Journal of
Imaging
Editorial
Deep Learning and Vision Transformer for Medical
Image Analysis
Yudong Zhang 1, * , Jiaji Wang 1 , Juan Manuel Gorriz 2 and Shuihua Wang 1
1 School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK;
[email protected] (J.W.); [email protected] (S.W.)
2 Department of Signal Theory, Networking, and Communications, University of Granada,
52005 Granada, Spain; [email protected]
* Correspondence: [email protected]; Tel.: +44-754-870-0453
Citation: Zhang, Y.; Wang, J.; Gorriz, J.M.; Wang, S. Deep Learning and Vision Transformer for Medical Image Analysis. J. Imaging 2023, 9, 147. https://doi.org/10.3390/jimaging9070147
Received: 15 June 2023
Accepted: 18 July 2023
Published: 21 July 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
J. Imaging 2023, 9, 147. https://doi.org/10.3390/jimaging9070147 https://www.mdpi.com/journal/jimaging

Artificial intelligence (AI) refers to the field of computer science theory and technology [1] that is focused on creating intelligent machines capable of simulating human intelligence [2]. AI systems [3] are designed to perform tasks that typically require human intelligence [4], such as perception, learning, reasoning [5], problem-solving [6], and decision-making [7].

Machine learning (ML) [8] is a subfield of AI that encompasses algorithms and statistical models, enabling computer systems to automatically learn from data, identify patterns, and make predictions or decisions without being explicitly programmed [9]. It involves the development of mathematical models and algorithms [10] that allow machines to iteratively process and analyze large datasets, learn from examples or experiences, and improve their performance over time. By leveraging ML theories and techniques [11], computers can discover complex patterns, extract meaningful insights, and generate reliable predictions, making ML a powerful tool for applications in fields such as finance, smart healthcare [12], the Internet of Things [13], natural language processing (NLP) [14], and recommendation systems.

Deep learning (DL) is a specialized branch of ML that focuses on the development and training of artificial neural networks with multiple layers of interconnected nodes [15], which are known as deep neural networks. It enables computers to automatically learn hierarchical representations of data, allowing for the extraction of intricate patterns and features from complex datasets [16]. DL leverages the power of large-scale computing and vast amounts of data [17] to enable neural networks to perform sophisticated tasks, such as image and speech recognition, NLP, and even autonomous decision-making. By emulating the structure and functionality of the human brain, DL has revolutionized AI by significantly enhancing the accuracy and performance of various applications [18] including medical image analysis (MIA) [19], while also demanding substantial computational resources.

Transformers are a revolutionary DL method that has greatly impacted the field of NLP. They are neural network models designed to process sequential data, such as sentences or paragraphs, by leveraging attention mechanisms. Unlike traditional recurrent neural networks (RNNs) [20] that process input sequentially, transformers [21] employ a parallelized approach, allowing for more efficient and scalable computation. By focusing on the relationships and dependencies between different words or tokens within a sequence, the transformer model excels at tasks like machine translation, text generation, sentiment analysis, and language understanding [22]. Transformers’ self-attention mechanisms enable them to capture contextual information effectively, resulting in state-of-the-art performance on a wide range of NLP benchmarks and applications. Transformers have become the foundation for many advanced language models, such as BERT, ChatGPT [23], and T5, and have significantly advanced the capabilities of language understanding and generation systems. Vision transformers (ViTs) [24] are an adaptation of the classical transformer architecture that applies self-attention mechanisms to process image data [25], making them a powerful model for tasks in computer vision and showcasing the extension of transformers’ effectiveness beyond NLP. Figure 1 shows the relationship between AI, ML, DL, and Transformers.
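To make the self-attention mechanism described above more concrete, the following is a minimal sketch of single-head scaled dot-product self-attention using only the Python standard library. It is an illustration under stated assumptions rather than the implementation of any model cited here: the helper names (`matmul`, `softmax`, `self_attention`, `random_matrix`), the dimensions, and the random projection matrices are all hypothetical, and a trained transformer would learn the projections and stack many such heads and layers.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: seq_len rows of d_model features (one row per token).
    w_q, w_k, w_v: d_model x d_head projections (random here, learned in practice).
    Returns (outputs, weights): seq_len x d_head and seq_len x seq_len.
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    # Every token is scored against every other token in one matrix product,
    # so there is no step-by-step recurrence over the sequence as in an RNN.
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for embeddings or learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_head = 5, 8, 4  # assumed toy sizes
tokens = random_matrix(seq_len, d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
outputs, weights = self_attention(tokens, w_q, w_k, w_v)
print(len(outputs), len(outputs[0]))            # sequence length, head dimension
print([round(sum(row), 6) for row in weights])  # each attention row sums to 1
```

Each row of `weights` is a probability distribution over all tokens in the sequence, which is one way to read the statement that the model focuses on relationships and dependencies between tokens.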
[Figure 1: diagram with the labels Artificial Intelligence, Machine Learning, Deep Learning, and Transformer and ViT.]
Figure 1. Relationship between AI, ML, DL, and Transformers.
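The text does not spell out how a vision transformer turns an image into a sequence that self-attention can process. In the commonly used ViT design, the image is cut into fixed-size patches and each patch is flattened into one token. The sketch below shows only that tokenization step, under the assumption of a small grayscale image stored as a list of rows; the function name `image_to_patch_tokens` and the 4x4 example image are hypothetical. In a full model, each token would additionally be linearly projected and combined with a position embedding before attention is applied.

```python
def image_to_patch_tokens(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened square patches.

    Returns one token per patch, in row-major patch order; each token is a
    flat list of patch_size * patch_size pixel values.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            tokens.append(patch)
    return tokens

# Hypothetical 4x4 "image" whose pixel values are simply their raster index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(image, patch_size=2)
print(len(tokens), len(tokens[0]))  # number of patches, pixels per patch
print(tokens[0])                    # the top-left 2x2 patch, flattened
```

The resulting list of tokens plays the same role for an image that a list of word embeddings plays for a sentence.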
Medical image analysis (MIA) [26] is an important field of application for AI. MIA involves a series of common procedures [27], starting with image acquisition, wherein medical imaging modalities capture anatomical or functional information. The acquired images then undergo preprocessing techniques [28] to correct artifacts, enhance quality, and standardize the data. Next, segmentation methods [29] are employed to separate and identify specific structures or regions of interest within the images. Registration techniques [30] are applied to align multiple images or different modalities for spatial correspondence.

Feature extraction algorithms [31] extract relevant quantitative or qualitative information from the segmented regions for subsequent analysis. Classification methods [32] are then utilized to classify the extracted features, enabling the identification of diseases or conditions. Visualization techniques [33] help in the interpretation and display of the analysis results for clinicians and researchers. Localization methods [34] precisely determine the spatial location of abnormalities or structures within the images, aiding in diagnosis and treatment planning. These procedures, shown in Figure 2, collectively contribute to the comprehensive analysis and interpretation of medical images, ultimately facilitating improved patient care and medical research [35].
[Figure 2: diagram labeled Medical Image Analysis with eight procedures: Acquisition, Preprocessing, Segmentation, Registration, Feature Extraction, Classification, Visualization, and Localization.]
Figure 2. Eight common procedures in medical image analysis.
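As a rough illustration of how the procedures in Figure 2 can chain together, the sketch below runs a toy pipeline on a synthetic 6x6 "scan" using only the Python standard library. Every function is a deliberately simple, hypothetical stand-in (min-max normalization for preprocessing, a fixed threshold for segmentation, an area rule for classification) and none of it reflects the methods of the cited works. Registration is omitted because it needs a second image to align against.

```python
def acquire():
    """Stand-in for acquisition: a synthetic 6x6 scan with one bright 2x2 blob."""
    image = [[10.0] * 6 for _ in range(6)]
    for r in (2, 3):
        for c in (3, 4):
            image[r][c] = 200.0
    return image

def preprocess(image):
    """Min-max normalize intensities to the range [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant image
    return [[(p - lo) / span for p in row] for row in image]

def segment(image, threshold=0.5):
    """Threshold segmentation: 1 marks the region of interest."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

def extract_features(image, mask):
    """Area and mean intensity of the segmented region."""
    pixels = [image[r][c]
              for r, row in enumerate(mask)
              for c, m in enumerate(row) if m]
    area = len(pixels)
    return {"area": area, "mean_intensity": sum(pixels) / area if area else 0.0}

def classify(features, min_area=3):
    """Toy decision rule standing in for a trained classifier."""
    return "abnormal" if features["area"] >= min_area else "normal"

def localize(mask):
    """Bounding box (top, left, bottom, right) of the region, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, m in enumerate(row) if m]
    if not coords:
        return None
    rows, cols = zip(*coords)
    return min(rows), min(cols), max(rows), max(cols)

def visualize(mask):
    """Text rendering of the mask for quick inspection."""
    return "\n".join("".join("#" if m else "." for m in row) for row in mask)

scan = preprocess(acquire())
mask = segment(scan)
features = extract_features(scan, mask)
label = classify(features)
box = localize(mask)
print(features, label, box)
print(visualize(mask))
```

In a real system, each stand-in would be replaced by a modality-specific or learned component, but the data flow from image to mask to features to decision would stay broadly similar.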
DL for MIA faces several challenges. Acquiring a sufficient quantity of high-quality annotated medical images can be challenging due to privacy concerns, limited availability, and the time-consuming process of manual annotation [36]. DL and ViT models often require a large amount of labeled data to achieve optimal performance, and this data may be limited for rare diseases [37] or specific subpopulations. Further, DL and ViT models typically have a large number of parameters, making them demanding and in need of substantial computational resources [38] for training and inference.

Author Contributions: Conceptualization, Y.Z. and J.W.; methodology, J.M.G. and S.W.; validation,
Y.Z. and J.W.; formal analysis, J.M.G. and S.W.; investigation, Y.Z.; resources, J.W.; data curation, J.M.G.
and S.W.; writing—original draft preparation, Y.Z. and J.W.; writing—review and editing, J.M.G. and
S.W.; supervision, J.M.G. and S.W.; project administration, Y.Z. and J.W.; funding acquisition, Y.Z.,
J.M.G. and S.W. All authors have read and agreed to the published version of the manuscript.
Funding: This paper was partially supported by MRC, UK (MC_PC_17171); Royal Society, UK
(RP202G0230); Hope Foundation for Cancer Research, UK (RM60G0680); GCRF, UK (P202PF11); Sino-
UK Industrial Fund, UK (RP202G0289); LIAS, UK (P202ED10, P202RE969); Data Science Enhancement
Fund, UK (P202RE237); Fight for Sight, UK (24NN201); Sino-UK Education Fund, UK (OP202006);
BBSRC, UK (RM32G0178B8); MCIN/AEI (10.13039/501100011033); FEDER ‘Una manera de hacer
Europa’ (RTI2018-098913-B100) by the Consejeria de Economia, Innovacion, Ciencia y Empleo (Junta
de Andalucia); FEDER (CV20-45250, A-TIC-080-UGR18, B-TIC-586-UGR20, and P20-00525).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Ghouri, A.M.; Khan, H.R.; Mani, V.; ul Haq, M.A.; Jabbour, A. An artificial-intelligence-based omnichannel blood supply chain:
A pathway for sustainable development. J. Bus. Res. 2023, 164, 113980. [CrossRef]
2. Cundall, P. Human intelligence seems capable of anything to me. New Sci. 2023, 246, 30.
3. Lee, M.C.M.; Scheepers, H.; Lui, A.K.H.; Ngai, E.W.T. The implementation of artificial intelligence in organizations: A systematic
literature review. Inf. Manag. 2023, 60, 103816. [CrossRef]
4. Raspanti, M.A.; Palazzani, L. Artificial intelligence and human intelligence: Contributions of christian theology and philosophy of
the person. Biolaw J.-Riv. Biodiritto 2022, 457–471. [CrossRef]
5. Saleem, K.; Saleem, M.; Ahmad, R.Z.; Javed, A.R.; Alazab, M.; Gadekallu, T.R.; Suleman, A. Situation-aware bdi reasoning to
detect early symptoms of covid 19 using smartwatch. IEEE Sens. J. 2023, 23, 898–905. [CrossRef]
6. Goudar, V.; Peysakhovich, B.; Freedman, D.J.; Buffalo, E.A.; Wang, X.J. Schema formation in a neural population subspace
underlies learning-to-learn in flexible sensorimotor problem-solving. Nat. Neurosci. 2023, 26, 879–890. [CrossRef]
7. Gomez, C.; Unberath, M.; Huang, C.M. Mitigating knowledge imbalance in ai-advised decision-making through collaborative
user involvement. Int. J. Hum.-Comput. Stud. 2023, 172, 102977. [CrossRef]
8. Shakibi, H.; Faal, M.Y.; Assareh, E.; Agarwal, N.; Yari, M.; Latifi, S.A.; Ghodrat, M.; Lee, M. Design and multi-objective
optimization of a multi-generation system based on pem electrolyzer, ro unit, absorption cooling system, and orc utilizing
machine learning approaches; a case study of australia. Energy 2023, 278, 127796. [CrossRef]
9. Bhowmik, R.T.; Jung, Y.S.; Aguilera, J.A.; Prunicki, M.; Nadeau, K. A multi-modal wildfire prediction and early-warning system
based on a novel machine learning framework. J. Environ. Manag. 2023, 341, 117908. [CrossRef]
10. Kozikowski, P. Machine learning for grouping nano-objects based on their morphological parameters obtained from sem analysis.
Micron 2023, 171, 103473. [CrossRef]
11. Vinod, D.N.; Prabaharan, S.R.S. Elucidation of infection asperity of ct scan images of COVID-19 positive cases: A machine
learning perspective. Sci. Afr. 2023, 20, e01681. [CrossRef]
12. Abd Rahman, N.H.; Zaki, M.H.M.; Hasikin, K.; Abd Razak, N.A.; Ibrahim, A.K.; Lai, K.W. Predicting medical device failure:
A promise to reduce healthcare facilities cost through smart healthcare management. PeerJ Comput. Sci. 2023, 9, e1279. [CrossRef]
[PubMed]
13. Yazdanpanah, S.; Chaeikar, S.S.; Jolfaei, A. Monitoring the security of audio biomedical signals communications in wearable iot
healthcare. Digit. Commun. Netw. 2023, 9, 393–399. [CrossRef]
14. Pyne, Y.; Wong, Y.M.; Fang, H.S.; Simpson, E. Analysis of ‘one in a million’ primary care consultation conversations using natural
language processing. BMJ Health Care Inform. 2023, 30, e100659. [CrossRef]
15. Ahmed, S.; Raza, B.; Hussain, L.; Aldweesh, A.; Omar, A.; Khan, M.S.; Eldin, E.T.; Nadim, M.A. The deep learning resnet101
and ensemble xgboost algorithm with hyperparameters optimization accurately predict the lung cancer. Appl. Artif. Intell. 2023,
37, 2166222. [CrossRef]
16. Tyson, R.; Gavalian, G.; Ireland, D.G.; McKinnon, B. Deep learning level-3 electron trigger for clas12. Comput. Phys. Commun.
2023, 290, 108783. [CrossRef]
17. Almutairy, F.; Scekic, L.; Matar, M.; Elmoudi, R.; Wshah, S. Detection and mitigation of gps spoofing attacks on phasor
measurement units using deep learning. Int. J. Electr. Power Energy Syst. 2023, 151, 109160. [CrossRef]
18. Alizadehsani, Z.; Ghaemi, H.; Shahraki, A.; Gonzalez-Briones, A.; Corchado, J.M. Dcservcg: A data-centric service code generation
using deep learning. Eng. Appl. Artif. Intell. 2023, 123, 106304. [CrossRef]
19. Zhang, Y.; Dong, Z. Medical imaging and image processing. Technologies 2023, 11, 54. [CrossRef]
20. Kessler, S.; Schroeder, D.; Korlakov, S.; Hettlich, V.; Kalkhoff, S.; Moazemi, S.; Lichtenberg, A.; Schmid, F.; Aubin, H. Predicting
readmission to the cardiovascular intensive care unit using recurrent neural networks. Digit. Health 2023, 9, 20552076221149529.
[CrossRef]
21. Alam, F.; Ananbeh, O.; Malik, K.M.; Odayani, A.A.; Hussain, I.B.; Kaabia, N.; Aidaroos, A.A.; Saudagar, A.K.J. Towards predicting
length of stay and identification of cohort risk factors using self-attention-based transformers and association mining: COVID-19
as a phenotype. Diagnostics 2023, 13, 1760. [CrossRef] [PubMed]
22. Fuad, K.A.A.; Chen, L.Z. A survey on sparsity exploration in transformer-based accelerators. Electronics 2023, 12, 2299. [CrossRef]
23. Gradonm, K.T. Electric sheep on the pastures of disinformation and targeted phishing campaigns: The security implications of
chatgpt. IEEE Secur. Priv. 2023, 21, 58–61. [CrossRef]
24. Hoshi, T.; Shibayama, S.; Jiang, X.A. Employing a hybrid model based on texture-biased convolutional neural networks and
edge-biased vision transformers for anomaly detection of signal bonds. J. Electron. Imaging 2023, 32, 023039. [CrossRef]
25. Chen, S.; Lu, S.; Wang, S.; Ni, Y.; Zhang, Y. Shifted window vision transformer for blood cell classification. Electronics 2023,
12, 2442. [CrossRef]
26. Apostolidis, K.D.; Papakostas, G.A. Digital watermarking as an adversarial attack on medical image analysis with deep learning.
J. Imaging 2022, 8, 155. [CrossRef]
27. Kiryati, N.; Landau, Y. Dataset growth in medical image analysis research. J. Imaging 2021, 7, 155. [CrossRef]
28. Wang, S. Advances in data preprocessing for biomedical data fusion: An overview of the methods, challenges, and prospects. Inf.
Fusion 2021, 76, 376–421. [CrossRef]
29. Shan, C.X.; Li, Q.; Guan, X. Lightweight brain tumor segmentation algorithm based on multi-view convolution. Laser Optoelectron.
Prog. 2023, 60, 1010018. [CrossRef]
30. Baum, Z.M.C.; Hu, Y.P.; Barratt, D.C. Meta-learning initializations for interactive medical image registration. IEEE Trans. Med.
Imaging 2023, 42, 823–833. [CrossRef]
31. Shamna, N.V.; Musthafa, B.A. Feature extraction method using hog with ltp for content-based medical image retrieval. Int. J.
Electr. Comput. Eng. Syst. 2023, 14, 267–275. [CrossRef]
32. Hida, M.; Eto, S.; Wada, C.; Kitagawa, K.; Imaoka, M.; Nakamura, M.; Imai, R.; Kubo, T.; Inoue, T.; Sakai, K.; et al. Development
of hallux valgus classification using digital foot images with machine learning. Life 2023, 13, 1146. [CrossRef] [PubMed]
33. Niemitz, L.; van der Stel, S.D.; Sorensen, S.; Messina, W.; Sekar, S.K.V.; Sterenborg, H.; Andersson-Engels, S.; Ruers, T.J.M.;
Burke, R. Microcamera visualisation system to overcome specular reflections for tissue imaging. Micromachines 2023, 14, 1062.
[CrossRef] [PubMed]
34. Bodard, S.; Denis, L.; Hingot, V.; Chavignon, A.; Helenon, O.; Anglicheau, D.; Couture, O.; Correas, J.M. Ultrasound localization
microscopy of the human kidney allograft on a clinical ultrasound scanner. Kidney Int. 2023, 103, 930–935. [CrossRef]
35. Zhang, Y.; Gorriz, J.M. Deep learning in medical image analysis. J. Imaging 2021, 7, 74. [CrossRef]
36. Sylolypavan, A.; Sleeman, D.; Wu, H.H.; Sim, M. The impact of inconsistent human annotations on ai driven clinical decision
making. NPJ Digit. Med. 2023, 6, 26. [CrossRef]
37. Talesh, S.A.; Mahmoudi, S.; Mohebali, M.; Mamishi, S. A rare presentation of visceral leishmaniasis and epididymo-orchitis in a
patient with chronic granulomatous disease. Clin. Case Rep. 2023, 11, e7426. [CrossRef]
38. Court, L.E.; Fave, X.; Mackin, D.; Lee, J.; Yang, J.Z.; Zhang, L.F. Computational resources for radiomics. Transl. Cancer Res. 2016, 5,
340–348. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
