Research Article
Medical Image Classification Algorithm Based on Visual Attention Mechanism-MCNN
Received 4 August 2020; Revised 2 February 2021; Accepted 6 February 2021; Published 20 February 2021
Copyright © 2021 Fengping An et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Due to the complexity of medical images, traditional medical image classification methods have been unable to meet actual application needs. In recent years, the rapid development of deep learning theory has provided a technical approach for solving medical image classification tasks. However, deep learning faces the following problems in medical image classification. First, it is difficult to construct a deep learning model with excellent performance according to the characteristics of medical images. Second, current deep learning network structures and training strategies are poorly adapted to medical images. Therefore, this paper first introduces the visual attention mechanism into the deep learning model so that information can be extracted more effectively from medical images and reasoning can be realized at a finer granularity, which also increases the interpretability of the model. Additionally, to solve the problem of matching the deep learning network structure and training strategy to medical images, this paper constructs a novel multiscale convolutional neural network model that automatically extracts high-level discriminative appearance features from the original image; the loss function uses a Mahalanobis distance optimization model to obtain a better training strategy, which improves the robustness of the network model. The medical image classification task is completed by the above method. Based on these ideas, this paper proposes a medical classification algorithm based on a visual attention mechanism-multiscale convolutional neural network. Lung nodule and breast cancer images were classified with the proposed method. The experimental results show that its classification accuracy is not only higher than that of traditional machine learning methods but also improved over other deep learning methods, and the method has good stability and robustness.
For example, Suzuki et al. used a massive training artificial neural network for the reduction of false positives and the diagnosis of benign and malignant pulmonary nodules in nodule detection [8, 9] and achieved a good recognition effect; however, its adaptive ability was poor. Way et al. [10] used a support vector machine to classify benign and malignant lung nodules by extracting two-dimensional texture features, but it was less effective in classifying smaller lung nodules. Firmino et al. [11] used support vector machine algorithms trained on 420 lung cancer cases and then classified smaller lung nodules, with a classification accuracy of 94.4%; it is nonetheless still not ideal for the benign and malignant recognition of small lung nodules. Yu et al. [12] classified squamous cell carcinoma and adenocarcinoma in nonsmall cell lung cancer by automatic microscopic pathological image features, achieving a classification accuracy of more than 70% for lung adenocarcinoma and lung squamous cell carcinoma; however, it offered no good solution for distinguishing among large cell carcinoma, squamous cell carcinoma, and adenocarcinoma in lung cancer. The above analysis shows that although such methods have been promoted and applied to a certain extent in medical image classification and have achieved certain effects, they cannot be adaptively matched to medical image characteristics and cannot extract the medical feature information contained in the image, which makes their overall classification effect less than ideal. There remains a gap between their classification performance and the requirements for assisting doctors in making effective diagnoses [13, 14].

In 2006, Hinton and Salakhutdinov first proposed the concept of deep learning in Science [15], which introduced feature learning. The emergence of deep learning technology provides a new idea and technical approach for solving the problems of traditional machine learning medical image classification methods. Deep learning [15] has been widely used in computer vision [16-18], speech recognition [19, 20], and video analysis [21-23]. Therefore, medical image classification based on deep learning has attracted extensive attention and in-depth research from scholars in the field. Anthimopoulos et al. [24] used a deep learning network to classify an adult chest radiograph database and achieved good results. Plis et al. [25] and Suk et al. [26, 27] used a deep belief network and an autoencoder to classify brain magnetic resonance imaging, which can be used to determine whether a patient has Alzheimer's disease. Zhang et al. [28] used a restricted Boltzmann machine (RBM) to automatically extract image features from shear-wave elastography and used this method to classify breast tumors; the classification accuracy reached 93.4%. Cheng et al. [29] used a stacked autoencoder to classify breast ultrasound lesions and nonnodules; the classification performance was 10% higher than that of general methods. Kawahara and Hamarneh [30] used a multibranch CNN to classify skin lesions, which achieved a certain classification effect. Esteva et al. [31] trained a CNN on a dataset of 129,450 clinical images; the CNN classified and recognized cell carcinoma and benign seborrheic keratosis, and the experimental results showed that its skin cancer classification reached the level of a dermatologist. Bidart et al. [32] developed a method for autolocalization of breast cancer tissue sections using a fully convolutional neural network and divided the nuclear images from breast cancer tissue sections into lymphocytes, benign epithelial cells, and malignant epithelial cells; the final classification accuracy was 94.6%. Setio et al. [33] input 9 differently oriented patches extracted from the candidate images into separate networks and combined them in the fully connected layer to obtain the final classification output; the experimental results were clearly better than those of general methods. Nie et al. [34] analyzed the nature of MRI by training a 3D CNN to assess the survival rate of patients with high-grade glioma. Payan and Montana [35] and Hosseini-Asl et al. [36] used 3D convolutional neural networks to classify patients with Alzheimer's disease; the accuracy and robustness of the classifier were superior to several conventional classifiers.

Through the above analysis, deep learning theory has been widely promoted and applied in the field of medical image classification and has achieved good results. However, since medical images have more distinct and different feature information than natural images, the characteristics of medical images must be fully considered when constructing a deep learning model. How a deep learning model can obtain better medical image classification effects is therefore one of the most difficult problems in building such a model. Additionally, current deep learning network structures and training strategies are poorly adapted to medical images, which limits the accuracy of medical image classification based on deep learning. To fully exploit the characteristics of medical images, first, introducing a visual attention mechanism makes it possible to locate and extract effective information from medical images. Second, reasoning can be achieved at a finer granularity through the visual attention mechanism. Finally, the attention mechanism increases the interpretability of the model by visualizing attention. Additionally, to solve the problem of matching the deep learning network structure and training strategy to medical images, this paper constructs a novel multiscale convolutional neural network model that automatically extracts high-level discriminative appearance features from the original image; the loss function uses a Mahalanobis distance optimization model to obtain a better training strategy, which improves the robustness of the deep learning network model. The medical image classification task is completed by the above method. Based on this, this paper proposes a medical classification algorithm based on a visual attention mechanism-multiscale convolutional neural network.

Section 2 of this paper explains the deep learning model based on the visual attention mechanism proposed in this paper. Section 3 systematically describes the proposed multiscale convolutional neural network model. Section 4 introduces the medical classification algorithm based on the visual attention mechanism-multiscale convolutional neural network. Section 5 analyzes the proposed medical image classification algorithm and compares it with mainstream medical image classification algorithms. Finally, the full text is summarized and discussed.
2. Deep Learning Model Based on a Visual Attention Mechanism

2.1. Multiround Attention Memory Network Encoder. Given an image $f$ whose category is $c$, denoted $f_c$ in this article, first input the image into a standard long short-term memory network (LSTM) [37] and use it to compute the vector representation of the image $V_{f_c} \in \mathbb{R}^d$. The specific calculation formula is

$$V_{f_c} = \mathrm{LSTM}_f(f_c). \qquad (1)$$

In the formula, all problems share the encoder $\mathrm{LSTM}_f$. This paper proposes a network structure with multiple rounds of attention mechanisms, which is used to extract enough useful information from the cross-modal medical category fact memory and the visual fact memory. It thus enables more accurate identification of class $f_c$.

The vector representation $V_{f_c}$ of the class identification problem $f_c$ is associated with the two memory banks $M^I$ and $M^H$. The problem $V_{f_c}$ is first projected into the historical medical category memory $M^H$ by the following formulas to retrieve the category facts associated with the category identification question $f_c$:

$$u_0 = V_{f_c}, \qquad (2)$$

$$s_i = u_{j-1} \cdot m_i^H, \quad j = 1, \cdots, r, \qquad (3)$$

$$u_{j-1} \leftarrow u_{j-1} + \sum_{i=0}^{c-1} \alpha_i m_i^H, \quad \alpha_i = \exp(s_i) \Big/ \sum_{i=0}^{c-1} \exp(s_i), \qquad (4)$$

where $r$ represents the number of attention rounds and $s_i$ measures the similarity between $u_{j-1}$ and $m_i^H$. The result $u_{j-1}$ is recalculated each time the problem is projected onto the category memory, and this method treats it as the latest expression of the problem $f_c$ carrying category information. A neural network is then used to project it into the visual fact memory to retrieve the related visual category facts. The specific formulas are as follows:

$$h = \tanh\left(W_{f,h} M^I \oplus W_u u_{j-1} + b_h\right), \qquad (5)$$

$$p^I = \mathrm{softmax}\left(W_p h + b_p\right), \qquad (6)$$

$$u_j = u_{j-1} + \sum_{i=0}^{196} p_i^I m_i^I, \qquad (7)$$

where $W_{f,h}, W_u \in \mathbb{R}^{k \times d}$ and $b_h \in \mathbb{R}^k$. $\oplus$ marks the addition between a matrix and a vector. $h$ is the output of the single-layer neural network applied to the previous result through the nonlinear hyperbolic tangent function. $W_p \in \mathbb{R}^{k \times d}$ and $p^I \in \mathbb{R}^{196}$ represent the projection probability values (i.e., correlations) between each picture region and $u_{j-1}$. The output $u_j$ obtained by projecting onto the visual memory also contains the category and visual fact information related to question $f_c$. To this point, the model has performed a single round of projection over the cross-modal memories. The model then proceeds to the next round of the cross-modal attention mechanism through equation (7) until the preset total round limit $r$ is reached, which solves the problem of information transfer and reasoning.

Alternating projection calculations over the category and visual fact memories through $r$ rounds yield $u_r$, which contains the factual information needed to answer question $f_c$. It is then input to a single-layer neural network to obtain the final output of the encoder, specifically

$$e_c = \tanh(W_e u_r + b_e), \qquad (8)$$

where $W_e$ and $b_e$ are the weights and offsets of the fully connected neural network, respectively, and $e_c$ is the final encoded output of the problem $f_c$.

2.2. Generating and Discriminating Decoders

2.2.1. Generating Decoder. Given the problem $f_c$ and the encoder's final coded representation $e_c$, the corresponding class $a_c$ is generated by:

$$h_0 = e_c, \qquad (9)$$

$$h_i = \mathrm{LSTM}_g(h_{i-1}, x_{i-1}), \quad i = 1, \cdots, |a_c|, \qquad (10)$$

$$p_i = \mathrm{softmax}(W_g h_i + b_g), \qquad (11)$$

where $\mathrm{LSTM}_g$ is a generative LSTM decoding network, $h_i$ is the output of the $i$th step of $\mathrm{LSTM}_g$, $x_i$ is the vector representation corresponding to the $i$th category of the image $a_c$, the length of $a_c$ is $|a_c|$, and $p_i$ is the probability distribution of the category. During training, the method maximizes the probability of generating the correct category $a_c$. In the test evaluation phase, the probability of each candidate category is calculated first, and all candidate categories are then ordered by probability in descending order.

2.2.2. Discriminant Decoder. First, each candidate class is encoded by the following LSTM to obtain its corresponding vector expression:

$$h_{a_i} = \mathrm{LSTM}_d(a_i), \quad i = 1, \cdots, N, \qquad (12)$$

where $\mathrm{LSTM}_d$ is the encoder for all candidate categories and shares weights, and $h_{a_i}$ is the final encoded output of the $i$th candidate category $a_i$. Then, the similarity $s_i$ between the vectors is calculated by dot-product similarity:

$$s_i = e_c \cdot h_{a_i}. \qquad (13)$$

All the similarities $\{s_1, s_2, \cdots, s_N\}$ are then concatenated and input into a softmax classifier, which calculates the posterior probability of all candidate categories; namely,

$$p_a = \mathrm{softmax}(s_1 \Theta s_2 \Theta \cdots \Theta s_N), \qquad (14)$$
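To make the encoder concrete, the following is a minimal PyTorch sketch of the multiround cross-modal attention of equations (2) through (8). The dimensions and module names are assumptions for illustration: $d$ is the question-vector size from equation (1), $k$ is the hidden size of the single-layer projection network, and the visual memory is taken to have 196 regions, as the dimension of $p^I$ suggests. This is a sketch under those assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiRoundAttentionEncoder(nn.Module):
    """Sketch of the multi-round cross-modal attention encoder, eqs. (2)-(8)."""

    def __init__(self, d: int, k: int, rounds: int = 2):
        super().__init__()
        self.rounds = rounds
        self.w_fh = nn.Linear(d, k, bias=False)  # W_{f,h}, applied to visual memory
        self.w_u = nn.Linear(d, k, bias=True)    # W_u u + b_h, eq. (5)
        self.w_p = nn.Linear(k, 1, bias=True)    # W_p h + b_p, eq. (6)
        self.w_e = nn.Linear(d, d, bias=True)    # W_e u_r + b_e, eq. (8)

    def forward(self, u0, mem_h, mem_i):
        # u0: (d,) question vector V_{f_c}; mem_h: (c, d) category memory M^H;
        # mem_i: (196, d) visual fact memory M^I.
        u = u0
        for _ in range(self.rounds):
            # Eqs. (3)-(4): project the question onto the category memory.
            s = mem_h @ u                        # similarities s_i = u . m_i^H
            alpha = F.softmax(s, dim=0)
            u = u + alpha @ mem_h                # u <- u + sum_i alpha_i m_i^H
            # Eqs. (5)-(7): project the updated question onto the visual memory.
            h = torch.tanh(self.w_fh(mem_i) + self.w_u(u))  # broadcast add (the ⊕)
            p = F.softmax(self.w_p(h).squeeze(-1), dim=0)   # region relevance p^I
            u = u + p @ mem_i                    # u_j <- u_{j-1} + sum_i p^I_i m^I_i
        return torch.tanh(self.w_e(u))           # e_c, eq. (8)
```

Under these assumptions, a call such as `MultiRoundAttentionEncoder(d=512, k=256, rounds=2)(u0, mem_h, mem_i)` would return the encoded vector $e_c$ of equation (8), which the decoders of Section 2.2 then consume.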
The original triplet loss function is defined as

$$L(y, w) = \sum_{i=1}^{N} \max\left\{0,\; D(f(y_i), f(y_i^+)) - D(f(y_i), f(y_i^-)) + \alpha\right\}, \qquad (16)$$

where $D(\cdot,\cdot)$ is the Euclidean distance between two feature vectors, $\alpha$ represents the interval (margin) parameter, and $w$ represents the parameters of the deep convolutional network. The loss function constrains the distance of a "dissimilar image pair" to be greater than the distance of a "similar image pair" by a certain interval. However, when the feature distance of an image pair is very small, the gradient corresponding to each image also becomes small. This may cause the backpropagated gradient to vanish, so a suboptimal model is learned. In addition, the loss function does not restrict the within-class distance of similar images, which may leave the features of images of the same medical category with large intraclass variance in the learned feature space.

To solve the problems of the vanishing backpropagation gradient, of learning a suboptimal model, and of the loss function placing no constraint on the intraclass distance of similar image pairs, this paper proposes an improved triplet loss function to eliminate the defect that the original loss function has no adaptive ability. It is defined as follows:

$$L(y, w, M) = \frac{1}{N} \sum_{i=1}^{N} \Big( \underbrace{\max\left\{0,\; D_x(y_i, y_i^+, y_i^-, w, M)\right\}}_{\text{triplet-wise constraint}} + \underbrace{\mu \cdot D_p(y_i, y_i^+, w, M)}_{\text{pair-wise constraint}} \Big), \qquad (17)$$

where $N$ represents the total number of input triplet images, $w$ is a parameter of the deep convolutional network, $M$ represents a Mahalanobis distance matrix, and $\mu$ represents a balanced weighting factor. The distance functions $D_x$, $D_p$ are defined as follows:

$$D_x(y_i, y_i^+, y_i^-, w, M) = 1 - \frac{\left\| f_w(y_i) - f_w(y_i^-) \right\|_M^2}{\left\| f_w(y_i) - f_w(y_i^+) \right\|_M^2 + \alpha}, \qquad (18)$$

$$D_p(y_i, y_i^+, w, M) = \left\| f_w(y_i) - f_w(y_i^+) \right\|_M^2, \qquad (19)$$

where $\|\cdot\|_M^2$ represents the Mahalanobis distance between the two features and $\alpha$ represents the predefined interval between similar and dissimilar image pairs. The tripletwise constraint in the loss function requires the Mahalanobis distance of a matching pair to be less than that of a mismatched pair. The pairwise constraint is intended to constrain the distance between image features within a class, which makes the generated features robust to factors such as illumination, camera angle, blur, and occlusion. The distance constraints of the improved triplet loss function are illustrated in Figure 2. The improved triplet loss function can be used to optimize the proposed deep convolutional neural network model, allowing it to learn robust and discriminative medical image category feature information from the original image.

The proposed deep convolutional network model and the optimized triplet loss function are trained together by a stochastic gradient descent algorithm. The parameters $w$ of the deep convolutional network model and the Mahalanobis distance matrix $M$ are optimized, and the learning process iterates back and forth until the preset number of training rounds is reached, finally yielding an optimal network model. This optimization strategy is inspired by the optimization method of [41]. This article abbreviates $(y_i, y_i^+, y_i^-)$ as $y_i$ in the equations below. In each iteration, the Mahalanobis distance matrix $M$ is first fixed as a constant, and the objective function reduces to optimizing the deep network model parameters $w$.
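As a concrete reference, the following is a minimal PyTorch sketch of the improved loss of equations (17) through (19), with $M$ held as a learnable matrix initialized to the identity, as in the training procedure described below. The class and helper names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def mahalanobis_sq(x, y, m):
    """Squared Mahalanobis distance ||x - y||_M^2 = (x - y)^T M (x - y).
    x, y: (batch, d) feature batches; m: (d, d) metric matrix."""
    diff = x - y
    return torch.einsum("bi,ij,bj->b", diff, m, diff)

class ImprovedTripletLoss(nn.Module):
    """Sketch of eq. (17): triplet-wise term D_x of eq. (18) plus the
    pairwise term D_p of eq. (19), weighted by mu; alpha is the margin."""

    def __init__(self, d: int, alpha: float = 1.0, mu: float = 0.1):
        super().__init__()
        self.alpha = alpha
        self.mu = mu
        self.m = nn.Parameter(torch.eye(d))  # M initialized to the identity

    def forward(self, f_a, f_p, f_n):
        # f_a, f_p, f_n: features of the anchor y_i, similar y_i^+, dissimilar y_i^-.
        d_pos = mahalanobis_sq(f_a, f_p, self.m)   # ||f_w(y) - f_w(y+)||_M^2
        d_neg = mahalanobis_sq(f_a, f_n, self.m)   # ||f_w(y) - f_w(y-)||_M^2
        d_x = 1.0 - d_neg / (d_pos + self.alpha)   # eq. (18)
        d_p = d_pos                                # eq. (19)
        return (torch.clamp(d_x, min=0.0) + self.mu * d_p).mean()  # eq. (17)
```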
Therefore, the optimized triplet loss function simplifies to a form similar to the original triplet loss under the Euclidean distance. As the network parameters are updated and optimized, the network model extracts increasingly efficient medical image feature representations. From formula (17), the partial derivative with respect to the network parameters $w$ is calculated as follows:

$$\frac{\partial L(y, w, M)}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} d_x(y_i, w, M) + \frac{1}{N} \sum_{i=1}^{N} d_p(y_i, w, M), \qquad (20)$$

$$d_x(y_i, w, M) = \begin{cases} \dfrac{\partial D_x(y_i, w, M)}{\partial w}, & D_x(y_i, w, M) > 0, \\[2mm] 0, & D_x(y_i, w, M) \le 0, \end{cases} \qquad (21)$$

$$d_p(y_i, w, M) = \mu \cdot \frac{\partial D_p(y_i, w, M)}{\partial w}. \qquad (22)$$

According to $D_x(y_i, w, M)$ and $D_p(y_i, w, M)$ as defined previously, their gradients with respect to $w$ are as follows:

$$\frac{\partial D_x(y_i, w, M)}{\partial w} = \tilde{M} \left( f_w(y_i) - f_w(y_i^-) \right) \left( \frac{\partial f_w(y_i)}{\partial w} - \frac{\partial f_w(y_i^-)}{\partial w} \right) g_1 + \tilde{M} \left( f_w(y_i) - f_w(y_i^+) \right) \left( \frac{\partial f_w(y_i)}{\partial w} - \frac{\partial f_w(y_i^+)}{\partial w} \right) g_2, \qquad (23)$$

$$\frac{\partial D_p(y_i, w, M)}{\partial w} = \tilde{M} \left( f_w(y_i) - f_w(y_i^+) \right) \left( \frac{\partial f_w(y_i)}{\partial w} - \frac{\partial f_w(y_i^+)}{\partial w} \right), \qquad (24)$$

where $\tilde{M} = M + M^T$, $g_1 = -\left(\| f_w(y_i) - f_w(y_i^-) \|_M^2 + \alpha\right) / \left(\| f_w(y_i) - f_w(y_i^+) \|_M^2 + \alpha\right)^2$, and $g_2 = \left(\| f_w(y_i) - f_w(y_i^-) \|_M^2 + \alpha\right) / \left(\| f_w(y_i) - f_w(y_i^+) \|_M^2 + \alpha\right)^2$. It can be seen from equations (23) and (24) that, given the feature values $f_w(y_i)$, $f_w(y_i^+)$, and $f_w(y_i^-)$ and the gradients $\partial f(y_i)/\partial w$, $\partial f(y_i^+)/\partial w$, and $\partial f(y_i^-)/\partial w$, the gradient of each image in the triplet can be calculated. Therefore, with the Mahalanobis distance matrix $M$ fixed to a constant, the gradient update for $w$ in formula (17) can be obtained by standard forward and backward propagation through the network model for each image in the triplet.

After this phase of each round of iterations, all training images can be mapped to a particular feature space by the updated deep network model. In the second stage, given the triplet feature representations $f_w(y_i)$, $f_w(y_i^+)$, and $f_w(y_i^-)$, the objective becomes optimizing the semipositive definite distance metric matrix $M$ in formula (17), which maps feature representations from the feature space to an appropriate distance space. An iterative projected stochastic gradient descent algorithm can therefore be used to optimize the objective function. Specifically, at each iteration, the parameter $M$ is updated along the direction of gradient descent to reduce the target loss, and the matrix $M$ is then projected onto the feasible set $S^+$ containing all semipositive definite matrices. From formula (17), the partial derivative with respect to the Mahalanobis distance matrix $M$ is calculated as follows:

$$\frac{\partial L(y, w, M)}{\partial M} = \frac{1}{N} \sum_{i=1}^{N} d_x^*(y_i, w, M) + \frac{1}{N} \sum_{i=1}^{N} d_p^*(y_i, w, M), \qquad (25)$$

$$d_x^*(y_i, w, M) = \begin{cases} \dfrac{\partial D_x(y_i, w, M)}{\partial M}, & D_x(y_i, w, M) > 0, \\[2mm] 0, & D_x(y_i, w, M) \le 0, \end{cases} \qquad (26)$$

$$d_p^*(y_i, w, M) = \mu \cdot \frac{\partial D_p(y_i, w, M)}{\partial M}. \qquad (27)$$

Given the conditions on $D_x(y_i, w, M)$ and $D_p(y_i, w, M)$, their gradients with respect to the parameter matrix $M$ are as follows:

$$\frac{\partial D_x(y_i, w, M)}{\partial M} = g_1 \cdot C_{ii^-} + g_2 \cdot C_{ii^+}, \qquad (28)$$

$$\frac{\partial D_p(y_i, w, M)}{\partial M} = C_{ii^+}, \qquad (29)$$

where $C_{ij} = (f(y_i) - f(y_j))(f(y_i) - f(y_j))^T$. When minimizing formula (17) while optimizing $M$, it must be ensured that $M$ is a semipositive definite matrix. Thus, after each gradient iteration, the gradient-updated matrix $M$ is projected onto the cone $S^+$ of all semidefinite matrices. The projection is achieved by matrix diagonalization: $M = V \Delta V^T$ denotes the eigenvalue decomposition of the distance metric matrix $M$, where $V$ is the orthogonal matrix of eigenvectors of the metric matrix and $\Delta$ is the diagonal matrix of its eigenvalues. The diagonal matrix can be further split into $\Delta = \Delta^+ + \Delta^-$, where $\Delta^+$ contains all positive eigenvalues and $\Delta^-$ contains all negative eigenvalues. Matrix $M$ is then projected onto the cone in mathematical form as:

$$P_{S^+} = V \Delta^+ V^T. \qquad (30)$$

Formula (30) rejects all negative eigenvalues of matrix $M$ after each iteration update by zeroing them, which guarantees the semidefiniteness of the matrix $M$.

The optimization algorithm for formula (17) optimizes the deep network model parameters $w$ and the Mahalanobis distance matrix $M$ using the update rules in formulas (20), (25), and (30). The main learning process is as follows: first, initialize $M$ and $w$, setting $M$ to the identity matrix, and train for the predefined number of iterations. In the first stage, the Mahalanobis distance matrix $M$ in formula (17) is fixed, and the network parameters $w$ are optimized using a mini-batch stochastic gradient descent algorithm. In the second stage, the parameters $w$ are fixed, the medical classification images are mapped to the feature space through the updated deep network model, and the Mahalanobis distance matrix $M$ is then iteratively optimized using the gradient projection procedure described above.
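The two-stage alternation just described can be sketched as follows, again as an assumption-laden illustration rather than the authors' implementation: `project_psd` realizes the projection of equation (30) by eigendecomposition and truncation of negative eigenvalues, and `train_round` alternates the $w$-update of formulas (20) to (24) with a projected gradient step on $M$ per formulas (25) to (30). The names `model`, `loss_fn` (the `ImprovedTripletLoss` sketched earlier), `triplet_loader`, and `opt_w` are assumptions.

```python
import torch

def project_psd(m: torch.Tensor) -> torch.Tensor:
    """Eq. (30): eigendecompose M = V Δ V^T and keep only Δ+, the positive
    eigenvalues, so that M remains a positive semidefinite distance metric."""
    m = 0.5 * (m + m.T)                                   # enforce symmetry
    eigvals, eigvecs = torch.linalg.eigh(m)
    return eigvecs @ torch.diag(eigvals.clamp(min=0.0)) @ eigvecs.T

def train_round(model, loss_fn, triplet_loader, opt_w, lr_m=1e-3):
    # Stage 1: M fixed, optimize the network parameters w by mini-batch SGD.
    loss_fn.m.requires_grad_(False)
    for y, y_pos, y_neg in triplet_loader:
        loss = loss_fn(model(y), model(y_pos), model(y_neg))
        opt_w.zero_grad()
        loss.backward()
        opt_w.step()
    # Stage 2: w fixed, take a gradient step on M, then project back onto S+.
    loss_fn.m.requires_grad_(True)
    for p in model.parameters():
        p.requires_grad_(False)
    y, y_pos, y_neg = next(iter(triplet_loader))
    loss = loss_fn(model(y), model(y_pos), model(y_neg))
    loss.backward()
    with torch.no_grad():
        loss_fn.m -= lr_m * loss_fn.m.grad     # gradient descent step on M
        loss_fn.m.copy_(project_psd(loss_fn.m))  # projection of eq. (30)
        loss_fn.m.grad = None
    for p in model.parameters():
        p.requires_grad_(True)
```

Repeating `train_round` for the preset number of rounds corresponds to the back-and-forth iteration described above.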
The lung nodule database established by JSRT contains chest radiographs from 15 different medical institutions around the world, including 154 cases with pulmonary nodules and 93 cases without pulmonary nodules. Each image in the database is 2048 × 2048 pixels, and the diameters of the lung nodules range from 5 to 40 mm. The JSRT lung nodule database grades nodules as very obvious, obvious, inconspicuous, not obvious, and extremely inconspicuous according to how readily the pulmonary nodules can be detected. In this experiment, only lung nodules in the obvious category were classified; a total of 150 lung nodule images met the test criteria. Some example pictures are shown in Figure 4.

To better serve the later classification test, the experimental images from the JSRT lung nodule database were enhanced using the gray histogram method, as shown in Figure 5. The enhanced images greatly improve the contrast between the lung nodules and the surrounding tissue structure.

The deep learning model used in this experiment was implemented in PyTorch and trained on a Titan-X GPU. The model is the visual attention mechanism-multiscale convolutional neural network proposed in Sections 2 and 3 of this paper. The initial learning rate was set to 0.01 and was reduced to one-tenth of its value when training reached 50 and 80 epochs; training lasted 160 epochs. In all training sessions, the proposed model was trained with the stochastic gradient descent method, and the number of samples per batch was set to 128; this schedule is sketched below.
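The stated schedule maps directly onto a standard PyTorch optimizer and learning-rate scheduler. The following skeleton assumes a `model`, a `train_loader` with batch size 128, and a task loss (cross-entropy is used here as a placeholder; the improved triplet loss described earlier would slot in the same way), and is illustrative rather than the authors' training script.

```python
import torch

# SGD, initial learning rate 0.01, decayed to one-tenth at epochs 50 and 80,
# 160 epochs in total, batch size 128, per the setup described above.
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[50, 80], gamma=0.1)

for epoch in range(160):
    for images, labels in train_loader:      # batches of 128 samples
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                         # lr: 0.01 -> 0.001 -> 0.0001
```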
5.1.2. Classification Results and Analysis. The medical image classification algorithm proposed in this paper and other mainstream medical image classification algorithms were used to classify the lung nodule database established by JSRT. The classification results are shown in Table 1.

Table 1: Comparison of the classification results of different classification algorithms on the lung nodule database established by JSRT (%).

Method type    Classification accuracy
[43]           95.37
[44]           95.96
[45]           98.12
[46]           98.23
[47]           98.38
Our            99.86

Table 1 shows that the classification accuracy of the medical classification algorithm based on the visual attention mechanism-multiscale convolutional neural network proposed in this paper is improved over both the traditional machine learning algorithms and the other deep learning algorithms, so the method has certain advantages. Specifically, the traditional machine learning methods proposed in [43, 44] achieve classification accuracies of 95.37% and 95.96%, respectively. Although their accuracy reaches 95%, they are still the least effective among the listed classification methods because traditional machine learning trains poorly on the JSRT lung nodule database, which directly leaves their image classification accuracy weaker than that of the other categories of methods.
The recognition accuracies of the deep learning methods proposed in [45-47] are 98.12%, 98.23%, and 98.38%, respectively. Their classification accuracy is above 98%, more than 1% higher than the traditional machine learning methods, mainly because a deep learning model trained on the JSRT lung nodule database yields a more reasonable and reliable image classification model. The classification accuracy of the proposed method is 99.86%, the highest among all methods and close to fully accurate classification, which verifies that the proposed method improves the accuracy of image classification and has better stability and robustness. This is mainly because, compared with the methods proposed in [43-47], the method in this paper not only introduces the visual attention mechanism into the deep learning model but also improves the network structure and loss function of the deep learning model. Through these two optimizations, the modeling ability and nonlinear classification ability of the deep learning model are improved, and the best classification effect is achieved.

5.2. Breast Cancer Classification Experiment. To further verify the effect of the proposed algorithm on medical image classification, this section classifies and tests the Wisconsin Breast Cancer Database (WBCD) [48] and compares the results with mainstream medical image classification algorithms.

5.2.1. Database Introduction and Test Process Description. The WBCD was established by the University of California, Irvine, and contains a total of 699 human breast tissue samples, of which 458 are benign and 241 are malignant. Each sample consists of 9 cytological features of benign or malignant breast fine-needle aspirates. Each feature takes a specific value according to how strongly the sample expresses that feature, with integer values between 1 and 10. Additionally, since the dataset includes 16 samples lacking specific attribute values, these samples with missing data are treated as invalid in the experiments of this paper, and the experiments were performed only on the remaining 683 samples.

The deep learning model used in this experiment was implemented in PyTorch and trained on a Titan-X GPU. The model is the visual attention mechanism-multiscale convolutional neural network proposed in Sections 2 and 3 of this paper. The initial learning rate was set to 0.001, and the batch size was 16. The momentum was set to 0.3, and the weight decay parameter was set to 0.0004. The pooling size was the same in all dimensions, and the stride was set to 2.

5.2.2. Classification Results and Analysis. The WBCD database was classified and identified by the medical image classification algorithm proposed in this paper and by other mainstream medical image classification algorithms. For a fairer comparison, this experiment reports the average accuracy over both a 1 : 1 and a 2 : 1 training/test split, each averaged over 20 runs (a sketch of this protocol appears at the end of this subsection). The classification results are shown in Table 2.

Table 2: Comparison of the classification results of different classification algorithms on the WBCD database (%).

Method type              Classification accuracy
Rough cotraining [49]    92.78
Bayesian [50]            93.39
Neural networks [51]     95.61
CNN [52]                 98.53
DeepNet1 [53]            99.02
DeepNet2 [54]            99.23
Our                      99.89

Table 2 shows that the classification accuracy of the medical classification algorithm based on the visual attention mechanism-multiscale convolutional neural network proposed in this paper is improved over both the traditional machine learning algorithms and the other deep learning algorithms, so the method has obvious advantages. Specifically, the traditional machine learning methods proposed in [49-51] achieve classification accuracies of 92.78%, 93.39%, and 95.61%, respectively. Although their accuracy exceeds 92%, they still have the lowest recognition accuracy among the listed classification methods because traditional machine learning trains poorly on the WBCD database. The recognition accuracies of the deep learning methods proposed in [52-54] reach 98.53%, 99.02%, and 99.23%, respectively. Their classification accuracy is over 98%, more than 3% higher than the traditional machine learning methods, because the deep learning models obtain a more reasonable and reliable image classification model from WBCD training. The classification accuracy of the proposed method is 99.89%, the highest among all methods and close to fully accurate classification. This fully verifies that the proposed method can extract the various feature information of the WBCD database to the greatest extent, and it further proves that the deep learning model proposed in this paper is better than the deep learning methods proposed in [52-54], mainly because the proposed method better optimizes the network architecture of the deep learning model and introduces the visual attention mechanism.
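A minimal sketch of the evaluation protocol described in this subsection is given below, assuming the WBCD attributes have been read into a NumPy array; `evaluate` is a hypothetical stand-in for training the proposed model on a split and returning its test accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def drop_missing(raw: np.ndarray) -> np.ndarray:
    """Drop WBCD records with missing attribute values, leaving 683 of the
    699 samples; the '?' encoding of missing values is an assumption based
    on the UCI distribution of the dataset."""
    mask = ~np.any(raw == "?", axis=1)
    return raw[mask]

def average_accuracy(features, labels, train_fraction, runs=20):
    """Mean test accuracy over `runs` random splits at the given ratio:
    train_fraction = 1/2 for the 1:1 split, 2/3 for the 2:1 split."""
    scores = []
    for seed in range(runs):
        x_tr, x_te, y_tr, y_te = train_test_split(
            features, labels, train_size=train_fraction, random_state=seed)
        scores.append(evaluate(x_tr, y_tr, x_te, y_te))  # hypothetical helper
    return float(np.mean(scores))
```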
In summary, the experiments on the lung nodule and breast cancer medical image databases show that traditional medical image classification algorithms suffer from low recognition accuracy and poor stability in medical image classification tasks. The classification accuracy of the deep learning classification algorithms on the above databases is significantly better than that of the traditional machine learning algorithms, which proves the advantages of the deep learning model. In addition, the medical image classification algorithm based on the visual attention mechanism-multiscale convolutional neural network proposed in this paper obtains better classification performance than the other deep classification algorithms because the proposed deep learning model not only solves the problem of the model network architecture but also introduces a visual attention mechanism that is particularly effective for medical image classification.

6. Conclusion

Since medical images have more distinct and different feature information than natural images, medical image characteristics must be fully considered when establishing a deep learning model for medical image classification. Therefore, this paper first adds the visual attention mechanism to the deep learning model; the attention mechanism increases the interpretability of the model by visualizing attention and allows the feature information of the medical image to be extracted more effectively. To solve the problem of matching the deep learning network structure and training strategy to medical images, a novel multiscale convolutional neural network model is constructed that automatically extracts high-level discriminative appearance features from the original image and improves the robustness of the network model. Based on this, this paper proposes a medical classification algorithm based on a visual attention mechanism-multiscale convolutional neural network.

The results of the lung nodule and breast cancer classification experiments show that the classification accuracy of the proposed deep learning medical image classification method is the highest, reaching 99.86% and 99.89%, respectively. This is because this paper better solves the problems of the network structure of the deep learning model and the interpretability of the model. Additionally, the proposed method extracts the feature information of medical images more effectively, which benefits the lung nodule and breast cancer classification experiments. Therefore, the deep learning medical image classification method proposed in this paper achieves the best classification accuracy.

Data Availability

The data used to support the findings of this study are included within the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Acknowledgments

This paper is supported by the Natural Science Foundation of Jiangsu Province (No. BK20201479), the National Natural Science Foundation of China (No. 61701188), the China Postdoctoral Science Foundation (No. 2019M650512), and the Hebei IoT Monitoring Engineering Technology Research Center funded project (No. IOT202004).

References

[1] P. Suetens, Fundamentals of Medical Imaging, Cambridge University Press, 2017.
[2] B. J. Erickson, P. Korfiatis, and Z. Akkus, "Machine learning for medical imaging," Radiographics, vol. 37, no. 2, pp. 505–515, 2017.
[3] M. Frid-Adar, I. Diamant, and E. Klang, "GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification," Neurocomputing, vol. 321, pp. 321–331, 2018.
[4] R. Zhang, Y. Zheng, and T. W. C. Mak, "Automatic detection and classification of colorectal polyps by transferring low-level CNN features from nonmedical domain," IEEE Journal of Biomedical and Health Informatics, vol. 21, no. 1, pp. 41–47, 2017.
[5] N. Dey and A. Ashour, Classification and Clustering in Biomedical Signal Processing, IGI Global, Hershey, 2016.
[6] A. Meyer-Baese and V. J. Schmid, Pattern Recognition and Signal Analysis in Medical Imaging, Elsevier, 2014.
[7] M. C. Lee, L. Boroczky, and K. Sungur-Stasik, "Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction," Artificial Intelligence in Medicine, vol. 50, no. 1, pp. 43–53, 2010.
[8] K. Suzuki, S. G. Armato III, and F. Li, "Massive training artificial neural network (MTANN) for reduction of false positives in computerized detection of lung nodules in low-dose computed tomography," Medical Physics, vol. 30, no. 7, pp. 1602–1617, 2003.
[9] K. Suzuki, F. Li, and S. Sone, "Computer-aided diagnostic scheme for distinction between benign and malignant nodules in thoracic low-dose CT by use of massive training artificial neural network," IEEE Transactions on Medical Imaging, vol. 24, no. 9, pp. 1138–1150, 2005.
[10] T. W. Way, B. Sahiner, and H. P. Chan, "Computer-aided diagnosis of pulmonary nodules on CT scans: improvement of classification performance with nodule surface features," Medical Physics, vol. 36, no. 7, pp. 3086–3098, 2009.
[11] M. Firmino, G. Angelo, and H. Morais, "Computer-aided detection (CADe) and diagnosis (CADx) system for lung cancer with likelihood of malignancy," Biomedical Engineering Online, vol. 15, no. 1, pp. 2–9, 2016.
[12] K. H. Yu, C. Zhang, and G. J. Berry, "Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features," Nature Communications, vol. 7, pp. 12474–12481, 2016.
[13] J. G. Lee, S. Jun, and Y. W. Cho, "Deep learning in medical imaging: general overview," Korean Journal of Radiology, vol. 18, no. 4, pp. 570–584, 2017.
[14] K. Suzuki, "Overview of deep learning in medical imaging," Radiological Physics and Technology, vol. 10, no. 3, pp. 257–273, 2017.
[15] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[16] A. Kendall and Y. Gal, "What uncertainties do we need in Bayesian deep learning for computer vision?," Advances in Neural Information Processing Systems, pp. 5574–5584, 2017.
[17] C. Szegedy, V. Vanhoucke, and S. Ioffe, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826, Las Vegas, 2016.
[18] N. Akhtar and A. Mian, "Threat of adversarial attacks on deep learning in computer vision: a survey," IEEE Access, vol. 6, pp. 14410–14430, 2018.
[19] D. Amodei, S. Ananthanarayanan, and R. Anubhai, "Deep speech 2: end-to-end speech recognition in English and Mandarin," in International Conference on Machine Learning, pp. 173–182, Los Angeles, 2016.
[20] H. M. Fayek, M. Lech, and L. Cavedon, "Evaluating deep learning architectures for Speech Emotion Recognition," Neural Networks, vol. 92, pp. 60–68, 2017.
[21] N. Takahashi, M. Gygli, and L. Van Gool, "AENet: learning deep audio features for video analysis," IEEE Transactions on Multimedia, vol. 20, no. 3, pp. 513–524, 2017.
[22] D. Tran, L. Bourdev, and R. Fergus, "Learning spatiotemporal features with 3D convolutional networks," in Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497, Santiago, 2015.
[23] T. Mei and C. Zhang, "Deep learning for intelligent video analysis," in Proceedings of the 25th ACM International Conference on Multimedia, pp. 1955–1956, Mountain View, 2017.
[24] M. Anthimopoulos, S. Christodoulidis, and L. Ebner, "Lung pattern classification for interstitial lung diseases using a deep convolutional neural network," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1207–1216, 2016.
[25] S. M. Plis, D. R. Hjelm, and R. Salakhutdinov, "Deep learning for neuroimaging: a validation study," Frontiers in Neuroscience, vol. 8, pp. 229–238, 2014.
[26] H. I. Suk and D. Shen, "Deep learning-based feature representation for AD/MCI classification," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 583–590, Nagoya, 2013.
[27] H. I. Suk, S. W. Lee, and D. Shen, "Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis," NeuroImage, vol. 101, pp. 569–582, 2014.
[28] Q. Zhang, Y. Xiao, and W. Dai, "Deep learning based classification of breast tumors with shear-wave elastography," Ultrasonics, vol. 72, pp. 150–157, 2016.
[29] J. Z. Cheng, D. Ni, and Y. H. Chou, "Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans," Scientific Reports, vol. 6, pp. 24454–24465, 2016.
[30] J. Kawahara and G. Hamarneh, "Multi-resolution-tract CNN with hybrid pretrained and skin-lesion trained layers," International Workshop on Machine Learning in Medical Imaging, pp. 164–171, 2016.
[31] A. Esteva, B. Kuprel, and R. A. Novoa, "Dermatologist-level classification of skin cancer with deep neural networks," Nature, vol. 542, no. 7639, pp. 115–118, 2017.
[32] R. Bidart, M. J. Gangeh, and M. Peikari, "Localization and classification of cell nuclei in post-neoadjuvant breast cancer surgical specimen using fully convolutional networks," in Medical Imaging: Digital Pathology, pp. 10581–10589, International Society for Optics and Photonics, 2018.
[33] A. A. A. Setio, F. Ciompi, and G. Litjens, "Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1160–1169, 2016.
[34] D. Nie, H. Zhang, and E. Adeli, "3D deep learning for multimodal imaging-guided survival time prediction of brain tumor patients," in International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 212–220, Athens, 2016.
[35] A. Payan and G. Montana, "Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks," 2015, https://arxiv.org/abs/1502.02506.
[36] E. Hosseini-Asl, G. Gimel'farb, and A. El-Baz, "Alzheimer's disease diagnostics by a deeply supervised adaptable 3D convolutional network," 2016, https://arxiv.org/abs/1607.00556.
[37] K. Greff, R. K. Srivastava, and J. Koutník, "LSTM: a search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222–2232, 2017.
[38] K. He, X. Zhang, and S. Ren, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, 2016.
[39] S. Ding, L. Lin, and G. Wang, "Deep feature learning with relative distance comparison for person re-identification," Pattern Recognition, vol. 48, no. 10, pp. 2993–3003, 2015.
[40] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: a unified embedding for face recognition and clustering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 815–823, Boston, 2015.
[41] P. Kontschieder, M. Fiterau, and A. Criminisi, "Deep neural decision forests," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1467–1475, Santiago, 2015.
[42] A. M. R. Schilham, B. van Ginneken, and M. Loog, "A computer-aided diagnosis system for detection of lung nodules in chest radiographs with an evaluation on a public database," Medical Image Analysis, vol. 10, no. 2, pp. 247–258, 2006.
[43] G. Wu, X. Zhang, and S. Luo, "Lung segmentation based on customized active shape model from digital radiography chest images," Journal of Medical Imaging and Health Informatics, vol. 5, no. 2, pp. 184–191, 2015.
[44] T. Lan, S. Chen, and Y. Li, "Lung nodule detection based on the combination of morphometric and texture features," Journal of Medical Imaging and Health Informatics, vol. 8, no. 3, pp. 464–471, 2018.
[45] M. Keming and D. Zhuofu, "Lung nodule image classification based on ensemble machine learning," Journal of Medical Imaging and Health Informatics, vol. 6, no. 7, pp. 1679–1685, 2016.
[46] S. L. Fernandes, V. P. Gurupur, and H. Lin, "A novel fusion approach for early lung cancer detection using computer aided diagnosis techniques," Journal of Medical Imaging and Health Informatics, vol. 7, no. 8, pp. 1841–1850, 2017.