
Review and Analysis of Deep Neural Network Models For Alzheimer's Disease Classification Using Brain Medical Resonance Imaging


Received: 3 August 2022 | Accepted: 7 December 2022

DOI: 10.1049/ccs2.12072

REVIEW

Cognitive Computation and Systems

Review and analysis of deep neural network models for Alzheimer's disease classification using brain medical resonance imaging

Shruti Pallawi | Dushyant Kumar Singh

MNNIT Allahabad, Prayagraj, India

Correspondence
Shruti Pallawi.
Email: [email protected]

Abstract

Alzheimer's disease is a progressive, irreversible neurological disorder in which the patient suffers severe memory loss. It is the seventh largest cause of death across the globe. As there is not yet a cure for this disease, the only way to control it is early diagnosis. Deep learning techniques are widely preferred for classification tasks because of their high accuracy on large datasets. The main focus of this paper is on fine‐tuning and evaluating deep convolutional networks for Alzheimer's disease classification. An empirical analysis of various deep learning‐based neural network models has been carried out. The architectures evaluated include InceptionV3, ResNet with 50 and 101 layers, and DenseNet with 169 layers. The dataset, taken from Kaggle, is publicly available and comprises four classes that represent the various stages of Alzheimer's disease. In our experiment, the accuracy of DenseNet improved consistently with the number of epochs, resulting in a 99.94% testing accuracy score, better than the rest of the architectures. Although the results obtained are satisfactory, future research can apply transfer learning to other deep models such as Inception V4 and AlexNet to increase accuracy and decrease computational time. Future work can also use other datasets such as ADNI or OASIS, and positron emission tomography, diffusion tensor imaging neuroimages and their combinations, for better results.

KEYWORDS
artificial intelligence, artificial neural networks, computer vision, machine learning, supervised learning

1 | INTRODUCTION

Alzheimer's disease is considered one of the leading causes of dementia; it generally starts slowly and the situation gradually worsens. The main cause of Alzheimer's disease (AD) is low brain activity and blood flow in the brain. According to the WHO, the total number of patients suffering from AD doubles every five years and can reach up to 152 million by 2050 [1]. The treatment and therapy of this disease are very expensive, and diagnosis takes a very long time. The tragic fact about this disease is that its cause is still unknown. Many factors are responsible for it. Among them, 70% are regarded as genetic; other factors can be severe head injuries, hypertension, depression, high blood pressure and other cognitive disorders [2, 3]. Alzheimer's disease not only endangers the patient's health but even threatens their life. The very early symptoms of this disease include partial memory loss and difficulty in remembering recent events. In the advanced stage, the patient starts suffering from language problems, mood swings and various behavioural issues [4, 5]. The body gradually starts losing its functions, which may ultimately lead to death. Although the exact cause of this disease is not known, its effect can be minimised with early identification and timely medication [6, 7].

There are various modalities available for the diagnosis of AD, like MRI (magnetic resonance imaging), positron emission

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the
original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
© 2023 The Authors. Cognitive Computation and Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Shenzhen University.

Cogn. Comput. Syst. 2023;5:1–13. wileyonlinelibrary.com/journal/ccs2 1


25177567, 2023, 1, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ccs2.12072, Wiley Online Library on [10/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

tomography (PET), computed tomography and cerebrospinal fluid. Even though different modalities or their combination can be used for diagnosis, the majority of practical methods use MRI. In this study, MRI of the brain, which is safe and painless, is used for the AD classification task. MRI can clearly distinguish between the grey and white matter of the brain by providing high resolution for soft tissues. For this reason, MRI‐based analysis is mostly preferred in clinical research for the diagnosis of AD. The dataset used in our work is taken from Kaggle; it is publicly available and consists of 6400 MRI images divided into four classes. Based on the damage to brain cells and the patient's health condition, AD can be classified into four stages: moderate demented, mild demented, very mild demented and non‐demented [8, 9]. The exact cause of Alzheimer's disease has not yet been clearly identified. According to most recent studies, it is attributed to a high deposition of amyloid β proteins, which are responsible for the death of brain cells and thus block the transmission of signals. The loss of tissue in various sub‐regions of the brain causes atrophy in the hippocampus, cerebral cortex and other components of the brain.

In the past few years, the Convolutional Neural Network (CNN) has emerged as the most popular technique for medical image classification, yielding strong performance. However, training any neural network requires extra hardware, computational resources and high‐dimensional data. Deep neural networks need good hyperparameters and careful tuning; without them, underfitting, overfitting and various other training issues can arise during classification and feature extraction. Figure 1 below shows the various stages of Alzheimer's disease as seen in brain MRI.

To deal with these issues in CNNs, researchers have adopted another popular learning technique called Transfer Learning (TL), which is widely used for different applications in the medical field [10, 11]. This technique reuses the parameters and resources of a pre‐trained model. Transfer learning is efficient for various computer vision classification tasks at minimum computation cost. In our study, a comparative analysis of deep learning models, namely InceptionV3, ResNet50, ResNet101 and DenseNet169, has been carried out using TL [12]. The main objectives of our research work are as follows:

• Performing experiments on various deep learning architectures for the identification of the stages of Alzheimer's disease using brain MRI.
• Comparative analysis of each model on the basis of various performance metrics.
• Analysis of the learning curves obtained, to determine the accuracy of each model during training and validation.

FIGURE 1 Different stages of Alzheimer's disease.



2 | RELATED WORK

Several researchers have applied various deep learning methods to the classification and detection of AD. Al‐Farabi Z. Nagashbayev et al. [8] proposed a Capsule Network for the classification of Normal Control (NC) and AD. Their model significantly reduces the number of parameters trained for 3D inputs relative to the original capsule networks. The framework works for subject‐dependent experiments and is useful when the number of GPUs (Graphics Processing Units) and MRI images is limited, but it is not suited to subject‐independent experiments. Shuomei Chen et al. [6] introduced the VBM‐DARTL method for pre‐processing sMRI data. Various splitting methods are adopted for the evaluation and classification of pre‐processed ADNI (Alzheimer's Disease Neuroimaging Initiative) subsets based on five convolutional neural networks, LeNet5, VGGNet, AlexNet, GoogleNet and ResNet, to obtain a significant effect on the AD classification task. S. Sambath Kumar et al. [13] introduced the entropy slicing method for feature extraction and transfer learning (VGG16) for ternary classification of AD, NC and Mild Cognitive Impairment (MCI). The slicing method selects the MRI slices that are most informative during the training stages. By using TL, their model can effectively reduce the complexity of pre‐processing and unreliability, thus increasing the system's performance and accuracy. Muskan Kapoor et al. [12] used a cognitive evaluation score as a biomarker. Cognitive scores assigned to input features help in the early detection of AD. They evaluated four different models, Support Vector Machine (SVM), Random Forest (RF), SVM with RF and Artificial Neural Network, for their accuracy scores and performance. Among these, RF gave the best result as it can effectively handle missing values and reduces overfitting in the model.

Karim Aderghal et al. [1] proposed a CNN model with intermediate fusion and late fusion techniques. In their work, the shrinkage of the hippocampus region is observed for the diagnosis of AD. A majority vote gives the best result for the classification of AD/NC in the late fusion technique and sagittal projection, whereas it does not perform well for AD/MCI classification. Fan Zhang et al. [7] proposed a VGG19 model based on the multi‐modal deep learning technique for AD diagnosis. In their work, two different neural networks are trained on MRI and PET images, and the result is compared with the clinical psychological diagnosis. The proposed model combines the clinical neuropsychological diagnosis with the neuroimaging diagnosis, bringing it closer to the clinician's diagnostic process and making it easy to implement. Richa Jain et al. [9] used the TL approach for accurately classifying brain sMRI (structural MRI) slices for AD classification. The VGG16 model used in their work performs well on binary as well as ternary classification.

Jyoti Islam et al. [4] proposed the Inception V4 model in their work. Tian Bai et al. [5] proposed an image enhancement technique for brain slices based on the Generative Adversarial Network that provides a feature extraction tool to generate better accuracy and high‐level features for the ternary classification of CN, MCI and AD on the OASIS dataset. Junren Pan et al. [10] use DecGAN for the detection of neural circuits for AD. The proposed model efficiently extracts complementary information between diffusion tensor imaging and fMRI (functional MRI), but the limitation of their work was that the way the internal neural circuit affected the development of AD was not known. Wen Junhao et al. [3] proposed a CNN model for the classification of AD using four different approaches: slice based, patch based, region of interest based and subject based. Finally, a framework was used for the comparative analysis of the different CNN‐based approaches. Bangyal Waqas Haider et al. [14] proposed a CNN‐based approach for the identification of AD. Machine learning techniques were also used for the performance comparison. The results were generated using three strategies, 10‐fold cross‐validation and the grid search technique.

3 | PROPOSED METHODOLOGY

In the proposed work, we have evaluated and compared four different deep learning models, InceptionV3, ResNet50, ResNet101 and DenseNet169, for the classification and detection of Alzheimer's disease. For the identification of Alzheimer's disease, accurate and fast models are desired so that appropriate measures can be applied at an earlier stage.

3.1 | InceptionV3

In our experiment, a pre‐trained InceptionV3 model has been used. InceptionV3 is a modification of the base model, InceptionV1 [15], also known as GoogLeNet, which was introduced in 2014. The traditional convolutional network consisted of multiple deep layers, resulting in overfitting of the data. To overcome this problem, InceptionV1 introduced the concept of multiple filters of different sizes at the same level. This concept made the model wider instead of deeper. Figure 2 shows the architecture of InceptionV1, which is the base model for InceptionV3.

Figure 3 shows the architecture of InceptionV3. InceptionV3 was introduced in 2015 with 42 layers and a lower error rate. Many modifications were made in the InceptionV3 model, such as the factorisation of larger convolutions into smaller convolutions [18]. For example, in order to decrease the computational cost, two 3 × 3 convolutional layers were used instead of one 5 × 5 convolutional layer. Next was the use of asymmetric convolutions, where instead of one 3 × 3 convolution, a 1 × 3 convolution followed by a 3 × 1 convolution was used. A further modification was the use of an auxiliary classifier [19]. It improved the convergence of very deep neural networks and also helped in combating the problem of vanishing gradients in very deep networks. Besides this, the activation dimensions of the network filters in InceptionV3 were expanded in

FIGURE 2 InceptionV1 model architecture [16].
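The saving from the factorisations described for InceptionV3 can be checked with simple arithmetic. The sketch below counts convolution weights (biases ignored) for layers whose input and output both have C channels. The helper name `conv_weights` and the value C = 64 are assumptions chosen only for illustration and do not come from the paper.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * channels_in * channels_out


# Hypothetical channel width, used only to make the comparison concrete.
C = 64

# Factorising one 5 x 5 convolution into two stacked 3 x 3 convolutions.
one_5x5 = conv_weights(5, 5, C, C)
two_3x3 = 2 * conv_weights(3, 3, C, C)

# Asymmetric factorisation: one 3 x 3 becomes a 1 x 3 followed by a 3 x 1.
one_3x3 = conv_weights(3, 3, C, C)
asymmetric_pair = conv_weights(1, 3, C, C) + conv_weights(3, 1, C, C)

print(f"two 3x3 vs one 5x5: {two_3x3 / one_5x5:.2f} of the weights")
print(f"1x3 + 3x1 vs one 3x3: {asymmetric_pair / one_3x3:.2f} of the weights")
```

Under these assumptions, two stacked 3 × 3 layers need 18/25 of the weights of one 5 × 5 layer, and the 1 × 3 plus 3 × 1 pair needs 6/9 of the weights of one 3 × 3 layer; both ratios are independent of C.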

FIGURE 3 InceptionV3 model architecture [17].

order to reduce the grid size of the feature maps obtained. Fine‐tuning of InceptionV3 was performed using weights pre‐trained on ImageNet for the classification and identification of the disease.

3.2 | ResNet

Very deep neural networks often suffer from vanishing gradient problems. Residual networks were proposed in 2015 by researchers to solve the problem of vanishing/exploding gradients. Figure 4 shows a skip connection in the ResNet architecture, where the intermediate layer is skipped by the model.

In this network, the concept of a skip connection is used to propagate information across the layers. These skip connections bypass the training of a few layers and connect directly to the output. The main benefit of adding these skip connections to the network is that if any layer impairs the performance of the architecture, it is skipped through regularisation. ResNet [16]

is a type of network‐in‐network (NIN) architecture as it consists of many residual units stacked together. The collection of these residual units forms a building block for the ResNet architecture. Residual networks are easy to optimise and also gain accuracy as the depth of the network increases. The ResNet architecture follows two design principles: 1) for each layer, the number of filters should be the same for the same output feature map size; 2) if the size of the feature map is halved, the number of filters is doubled in order to maintain the complexity of the network. In our work, the ResNet model with 50 and 101 layers has been used and loaded with pre‐trained weights from ImageNet for the identification of AD [20].

ResNet50: ResNet50 is a 50‐layer CNN (48 convolutional layers, one MaxPool layer and one average pool layer). A small change was made in ResNet50 from the previous architectures: shortcut connections skip three layers in this architecture, with 1 × 1 convolution layers.

ResNet101: ResNet101 is constructed using 101 layers. It has more three‐layer blocks than ResNet50. ResNet101 is fast and more accurate than ResNet50, and even after the increase in depth, the complexity of this network is not increased.

FIGURE 4 Skip connection in ResNet [16].

3.3 | DenseNet

DenseNet [17] is a convolutional network architecture that is densely connected. Since all the layers are directly connected in a feed‐forward manner, there is a maximum flow of information and gradient between the layers in the network, which makes it easy to train. In this network, each layer obtains additional input from all the preceding layers and passes its own feature maps to all subsequent layers. DenseNet concatenates features from different layers, unlike ResNet, which uses element‐wise addition. Inception networks also use concatenation, but DenseNets are simpler and more efficient. The architecture of the DenseNet model is shown in Figure 5; it contains three dense blocks in which the layers of each block are densely connected with one another.

The DenseNet network becomes thinner and more compact, with fewer channels, as feature maps are received from all preceding layers, and so this network is better in terms of computational and memory requirements. It also helps in reducing overfitting on tasks with small training sets because of the regularising effect of dense connections. For the task of Alzheimer's disease classification, the DenseNet model with 169 layers has been used by loading the model with pre‐trained weights from ImageNet.

DenseNet169: The depth of DenseNet169 is 169 layers, but the number of parameters is still comparatively lower than that of other models. The vanishing gradient problem is also handled very well in this architecture. In this architecture, the last fully connected layer was replaced by a 256‐node fully connected layer. Further, a 128‐node fully connected layer and, at last, 10 fully connected layers with softmax activation output were added.

4 | EXPERIMENTATIONS AND RESULTS

Experiments were run on a Central Processing Unit with a 3.20 GHz Core i7 processor and 16 GB RAM. The operating system used is 64‐bit Windows 11. The OpenCV, Keras and Theano libraries are used for implementing the software. No GPU was used.

4.1 | Dataset and pre‐processing

We trained and evaluated various deep learning models to classify and detect Alzheimer's disease on MRI images of the human brain. An open and freely available Kaggle dataset was used for this study. The dataset consists of 6400 images that are divided into four classes, namely non‐demented (3200 images), mild demented (896 images), moderate demented (64 images) and very mild demented (2240 images), representing the various stages of AD. The images are first resized to 76 × 76 pixels. The data were then normalised by dividing all the pixel values by 255 to make them compatible with the initial values of the network. The synthetic minority oversampling technique was further used in order to balance our dataset. The dataset is split into three sets, the training, validation and test sets, in the ratio of 70:20:10. The training data is used for model learning; validation data is the sample data that is used for

FIGURE 5 DenseNet model architecture [17].
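The difference between the element‐wise addition used by ResNet skip connections and the concatenation used by DenseNet can be sketched with plain Python lists standing in for feature maps. All names and numbers below (the toy vectors, 64 initial channels and a growth of 32 channels per layer) are illustrative assumptions, not values reported in the paper.

```python
def residual_merge(x, fx):
    """ResNet-style skip connection: element-wise addition, width unchanged."""
    if len(x) != len(fx):
        raise ValueError("element-wise addition needs matching shapes")
    return [a + b for a, b in zip(x, fx)]


def dense_merge(x, fx):
    """DenseNet-style connection: concatenation, width grows."""
    return x + fx


def dense_block_input_widths(initial_channels, growth_rate, num_layers):
    """Channels seen by each layer of a dense block when every layer
    appends `growth_rate` new feature maps to everything before it."""
    return [initial_channels + growth_rate * i for i in range(num_layers)]


x = [1.0, 2.0]    # toy feature vector entering a layer
fx = [0.5, -1.0]  # toy output of that layer

added = residual_merge(x, fx)        # same width as x
concatenated = dense_merge(x, fx)    # width of x plus width of fx
widths = dense_block_input_widths(64, 32, 4)

print(added, concatenated, widths)
```

In this sketch the residual path keeps the feature width fixed, while the dense path lets each layer see every earlier feature map, so a layer only has to add a small number of new channels of its own.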



evaluating the model and tuning its parameters. The test data is used for the prediction and final evaluation of the model.

4.2 | Training

In this work, we have performed the classification and identification of Alzheimer's disease using deep neural networks. Our focus is on the comparative analysis of various deep learning models, namely InceptionV3, ResNet with 50 and 101 layers, and DenseNet with 169 layers. For each experiment, the categorical cross‐entropy loss and the accuracy metric are used, evaluated on the test dataset. Each model was trained for a total of 50 epochs. All the networks were trained using the RMSProp (Root Mean Squared Propagation) optimisation algorithm, which helps in fast convergence. The learning rate was set to 0.001 with a batch size of 32. The ReLU (Rectified Linear Unit) activation function and the Batch Normalisation technique were applied. The results of the experiments for each model are presented below.

4.3 | Performance measures

In the proposed work, Accuracy, Precision, Recall and F1‐score have been evaluated. The equations for obtaining each metric are given below.

a. Accuracy: It indicates the fraction of all predictions made by the model that are correct [21]. The accuracy history graph depicts the training and validation accuracy of the model.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

Where TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative.

b. Precision: It is also known as the positive predictive value. It gives the ratio of true positives to total positive predictions. Basically, it tells us how many of our model's positive predictions were actually positive.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

Precision mainly focusses on FP, that is, the Type 1 error. A false positive means incorrectly labelling a non‐infected patient as disease‐infected. Hence the value of FP should always be as low as possible.

c. Recall: Also known as the True Positive rate. It tells us, out of all positive points, how many were predicted positive.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

Recall mainly focuses on FN, that is, the Type 2 error. A false negative means identifying a disease‐infected patient as non‐infected, which can be risky. A recall close to one signifies that the model has very few FN, and a value less than 0.5 means the classifier has a high number of FN.

d. F1‐score: F1‐score is a way to combine both precision and recall in one measure when the impact of both FP and FN is equally important.

$$\mathrm{F1\text{-}score} = \frac{2 \times (\mathrm{Precision} \times \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

The value of the F1‐score is high if both precision and recall are high. The value of the F1‐score lies between 0 and 1.

4.4 | Evaluated results for InceptionV3

The results of the experiments performed on the InceptionV3 model are given below.

Table 1 gives the evaluation metrics for InceptionV3. Here the precision for each class is relatively high, which shows that the model produces few false positives. Recall for the very mild demented class, however, is low, which shows that this class contains false negatives; consequently, its F1‐score is also low.

TABLE 1 Evaluation metric for InceptionV3.

Class                 Recall   Precision   F1‐score   Support
Non‐Demented          0.69     0.79        0.74       205
Very Mild Demented    0.11     1.00        0.20       9
Mild Demented         0.77     0.77        0.77       334
Moderate Demented     0.70     0.63        0.66       288

Table 2 shows the loss and accuracy on training and validation data after every 10 epochs. The training and validation loss kept decreasing with the growing number of epochs, to 0.19 and 0.28, respectively, whereas the training and validation accuracy gradually increased to 93.63% and 89.45%, respectively. Figure 6 shows the confusion matrix for the above‐obtained result.

The model was trained for 50 epochs, and an average training accuracy of 99.77% was obtained. The results obtained show that the model fitted the dataset well.

4.5 | Evaluated results for ResNet50 & ResNet101

In this work, ResNet with 50 layers and 101 layers was used for experimentation, and the results obtained are as follows:

Table 3 presents the evaluation metrics for ResNet50 and ResNet101. For ResNet50, in the case of Mild Demented, where

TABLE 2 Loss & accuracy during training & validation after every 10 epochs.

No. of epochs   Training accuracy   Training loss   Validation accuracy   Validation loss
10              75.93               0.56            79.00                 0.46
20              85.13               0.38            85.16                 0.36
30              89.06               0.31            88.13                 0.31
40              90.91               0.26            89.01                 0.30
50              93.63               0.19            89.45                 0.28

FIGURE 6 Confusion matrix of InceptionV3.
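Equations (1) to (4) can be applied per class to a multi‐class confusion matrix such as the one in Figure 6 by treating one class as positive and all the others as negative. The sketch below does this in plain Python. The function name `per_class_metrics` and the 4 × 4 matrix are assumptions made up for illustration; they are not the values behind Figure 6.

```python
def per_class_metrics(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j.

    Returns one dict per class with accuracy, precision, recall and F1,
    computed one-vs-rest from TP, TN, FP and FN.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # class k missed
        fp = sum(matrix[i][k] for i in range(n)) - tp  # others called k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        })
    return results


# Hypothetical confusion matrix for four classes (rows: true, columns: predicted).
example = [
    [50, 5, 3, 2],
    [4, 40, 6, 0],
    [2, 8, 30, 10],
    [0, 1, 9, 30],
]

for class_index, metrics in enumerate(per_class_metrics(example)):
    print(class_index, {name: round(value, 2) for name, value in metrics.items()})
```

The guards against zero denominators are a design choice of this sketch: a class that is never predicted gets a precision of 0 rather than raising an error.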

TABLE 3 Evaluation metric for ResNet50 & ResNet101.

                      ResNet50                                   ResNet101
Class                 Recall   Precision   F1‐score   Support    Recall   Precision   F1‐score   Support
Non‐Demented          0.83     0.80        0.81       639        0.76     0.85        0.80       639
Very Mild Demented    1.00     1.00        1.00       635        1.00     0.99        1.00       635
Mild Demented         0.43     0.79        0.56       662        0.86     0.56        0.68       662
Moderate Demented     0.73     0.51        0.60       624        0.24     0.45        0.31       624

precision is high and recall is low, the model is accurate when it classifies a sample as positive but identifies only a few of the positive samples, whereas in the case of Moderate Demented, where recall is high and precision is low, the model classifies most of the positive samples correctly but produces many false positives. For ResNet101, in the case of Moderate Demented, both precision and recall are low, which indicates that the model has a large number of both false positives and false negatives.

Table 4 shows the loss and accuracy on training and validation data after every 10 epochs. The training and validation loss kept decreasing with the increasing number of epochs, to 0.67 and 0.56 for ResNet50 and 0.64 and 0.62 for ResNet101, respectively, whereas the training and validation accuracy

TABLE 4 Summary of training and validation accuracy and loss at certain epochs.

                ResNet50                                                          ResNet101
No. of epochs   Training accuracy   Training loss   Validation accuracy   Validation loss   Training accuracy   Training loss   Validation accuracy   Validation loss
10              59.88               0.87            65.82                 0.70              57.70               0.88            58.69                 0.80
20              65.25               0.77            72.51                 0.62              64.36               0.76            69.04                 0.65
30              67.96               0.72            74.41                 0.59              68.05               0.69            70.41                 0.62
40              69.08               0.69            76.66                 0.56              69.38               0.66            71.73                 0.61
50              70.13               0.67            76.66                 0.56              70.62               0.64            71.73                 0.62
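The loss columns in Tables 2 and 4 report the categorical cross‐entropy named in Section 4.2. As a reminder of what that number measures, the sketch below computes it for one‐hot labels over four classes and averages it over a small batch. The function names, the class order and the predicted probabilities are assumptions made up for illustration; they are not outputs of the models in this paper.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one sample: -sum over classes of y_true[c] * log(y_pred[c]).

    `eps` guards against log(0) when a predicted probability is exactly zero.
    """
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def mean_loss(batch_true, batch_pred):
    """Average the per-sample loss over a batch."""
    losses = [categorical_cross_entropy(t, p)
              for t, p in zip(batch_true, batch_pred)]
    return sum(losses) / len(losses)


# Two hypothetical samples with one-hot labels over four classes.
batch_true = [[0, 0, 1, 0], [1, 0, 0, 0]]
batch_pred = [[0.1, 0.1, 0.7, 0.1], [0.4, 0.3, 0.2, 0.1]]

loss = mean_loss(batch_true, batch_pred)
print(round(loss, 4))
```

Only the probability assigned to the true class contributes to each sample's loss, so the loss falls towards zero as the model becomes more confident in the correct class.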

FIGURE 7 Confusion matrix for (a) ResNet 50 and (b) ResNet 101.

gradually increased to 70.13% and 76.66% for ResNet50 and 70.62% and 71.73% for ResNet101, respectively. Figure 7 presents the confusion matrices for the above‐obtained results for both ResNet models.

The average training accuracy obtained after training the models for 50 epochs was 79.65% for ResNet50 and 75.26% for ResNet101. The results obtained are not as good as those of InceptionV3.

4.6 | Evaluated results for DenseNet169

DenseNet169 is a convolutional network with 169 layers in which all the layers are directly connected with each other. The experimental results obtained from this model are as follows.

Table 5 gives the evaluation metrics for DenseNet169. Here the precision and recall for each class are high, which shows that the model produces few false positives and few false negatives. Therefore, the value of the F1‐score is also high for each class.

TABLE 5 Evaluation metric for DenseNet169.

Class                 Recall   Precision   F1‐score   Support
Non‐Demented          0.97     0.95        0.96       639
Very Mild Demented    1.00     1.00        1.00       635
Mild Demented         0.92     0.91        0.91       662
Moderate Demented     0.89     0.91        0.90       624

Table 6 presents the loss and accuracy on training and validation data after every 10 epochs. The training and validation loss kept decreasing with the growing number of epochs, to 0.08 and 0.25, respectively, whereas the training and validation accuracy gradually increased to 97.17% and 93.26%, respectively. Figure 8 shows the confusion matrix for the above‐obtained result for the DenseNet169 model.

The above result shows that DenseNet169 gives the best performance, with the highest training accuracy of 97.17% after the model was trained for 50 epochs, which is the best accuracy obtained among all four architectures.

5 | DISCUSSION AND ANALYSIS

Recent work on deep learning models shows that deeper models are efficient to train and are more accurate. However, as the depth of the model increases, other challenges also increase, such as vanishing gradients, degradation, the internal covariate shift problem and higher computational cost.

Therefore, to deal with these problems, various strategies have been used in different architectures, such as skip connections, TL, batch normalisation and optimisation methods. The adopted TL concept helped in boosting the accuracy and reducing the execution time.

First of all, data augmentation is done to enhance our dataset, and the augmented images are extracted for our proposed CNN model. After augmentation, oversampling has been performed using the synthetic minority oversampling technique in order to balance our dataset. We used the TL approach to train the pre‐trained InceptionV3, ResNet50, ResNet101 and DenseNet169 CNN models. Evaluation of our models was done on unseen testing data. A comparative analysis of all four models is shown in Table 7, where the number of parameters, training time and accuracy obtained during experimentation are summarised.

In this section we have discussed the experimental analysis of the proposed models. All the architectures have been trained on the same modalities, data augmentation and feature types with the

TABLE 6 Training and validation accuracy and loss at certain epochs.

No. of epochs   Training accuracy (%)   Training loss   Validation accuracy (%)   Validation loss
10              92.53                   0.21            91.06                     0.28
20              96.47                   0.10            92.63                     0.25
30              97.49                   0.07            93.12                     0.25
40              97.44                   0.07            93.21                     0.25
50              97.17                   0.08            93.26                     0.25
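
As a rough illustration of how saturation can be read off the values in Table 6, the following Python sketch finds the first logged epoch after which the loss no longer improves noticeably. The function name `plateau_epoch` and the tolerance `min_delta` are assumptions made for this example and are not part of the original experiments.

```python
# Loss values per logged epoch, copied from Table 6.
val_loss = {10: 0.28, 20: 0.25, 30: 0.25, 40: 0.25, 50: 0.25}
train_loss = {10: 0.21, 20: 0.10, 30: 0.07, 40: 0.07, 50: 0.08}


def plateau_epoch(history, min_delta=0.01):
    """Return the first logged epoch after which the loss never improves
    by at least min_delta (a hypothetical tolerance), or None."""
    epochs = sorted(history)
    for i, epoch in enumerate(epochs[:-1]):
        later = [history[e] for e in epochs[i + 1:]]
        # The curve is treated as flat if no later value is meaningfully lower.
        if all(history[epoch] - loss < min_delta for loss in later):
            return epoch
    return None


print("validation loss flat from epoch", plateau_epoch(val_loss))
print("training loss flat from epoch", plateau_epoch(train_loss))
```

With the Table 6 values and a tolerance of 0.01, the validation loss is flagged as flat from epoch 20 onward and the training loss from epoch 30 onward. A different tolerance would give a different answer.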

FIGURE 8 Confusion matrix of DenseNet169.
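
The synthetic minority oversampling step used to balance the dataset can be sketched in a few lines of Python. This is only a minimal illustration of the interpolation idea behind SMOTE, assuming each image has already been flattened into a numeric feature vector. The function `smote`, its parameters and the toy data are hypothetical and do not reproduce the implementation used in the experiments.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new point lies on the segment between x and its neighbour.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class with two features per sample (illustrative only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
print(len(new_points), "synthetic samples created")
```

Because each synthetic sample is a convex combination of two existing minority samples, it stays inside the region already covered by that class instead of duplicating existing samples exactly.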

TABLE 7 Comparative analysis of all four models.

Model         No. of layers   No. of parameters (in millions)   Training accuracy (%)   Validation accuracy (%)   Testing accuracy (%)   Training time
InceptionV3   48              23.03                             99.77                   89.45                     88.87                  1 h 30 min
ResNet50      50              28.61                             79.65                   76.66                     74.45                  2 h 52 min
ResNet101     101             47.68                             75.26                   71.73                     71.84                  4 h 55 min
DenseNet169   169             100.12                            99.94                   93.26                     94.34                  5 h 48 min
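
The gap between training accuracy and validation or testing accuracy is one simple indicator of how well a model fits. The short Python sketch below computes this gap from the values in Table 7. The helper name `gaps` is an assumption for this example, and the calculation is plain subtraction in percentage points.

```python
# (training, validation, testing) accuracy in percent, copied from Table 7.
table7 = {
    "InceptionV3": (99.77, 89.45, 88.87),
    "ResNet50": (79.65, 76.66, 74.45),
    "ResNet101": (75.26, 71.73, 71.84),
    "DenseNet169": (99.94, 93.26, 94.34),
}


def gaps(train, val, test):
    """Return (train - validation, train - test) in percentage points."""
    return round(train - val, 2), round(train - test, 2)


for model, accuracies in table7.items():
    val_gap, test_gap = gaps(*accuracies)
    print(f"{model}: train-val gap {val_gap}, train-test gap {test_gap}")
```

By this measure, the training and validation accuracy of DenseNet169 differ by 6.68 percentage points, and InceptionV3 shows the widest gap among the four models at 10.32 percentage points.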



same dataset for the detection and classification of AD. We have used five metrics to evaluate the performance of our model. The learning curves obtained depict how the model performs with the growing number of epochs on the training and validation sets. The training set is the portion of the dataset which is used to train the model initially, while the validation set validates the performance of the model during learning.

TABLE 8 Performance measures using accuracy, precision, recall, F1‐score and loss.

            InceptionV3   ResNet50   ResNet101   DenseNet169
Accuracy    99.77%        79.65%     75.26%      99.94%
Precision   79.75%        77.50%     71.25%      94.25%
Recall      56.75%        74.75%     71.50%      94.50%
F1‐score    59.25%        74.25%     69.75%      94.2%
Loss        0.19          0.67       0.64        0.08

In Table 8, the performance metrics of all four models are summarised. In the above table, we observe that the precision of DenseNet169, 94%, is the highest among the models, while ResNet101 showed the lowest, 71%. Similarly, the recall for DenseNet169 is 95%, while InceptionV3 had the lowest, about 57%. We also observed that for the InceptionV3 model, where precision and recall were 80% and 57%, respectively, the F1‐score obtained was only 59%, while in the other models both the precision and recall values were high, and thus the F1‐score obtained was also high.

The loss curve illustrates the error made by the model, that is, how far its predictions deviate from the true labels. Large loss values play an important role in producing faulty results during testing. Training loss is measured after each batch, whereas validation loss is measured after each epoch.

The plot of the loss function, comparing the behaviour of the training and validation loss obtained during the training process, is presented in Figure 9. From the loss curves obtained, we observe that both the loss curves for InceptionV3

FIGURE 9 Learning curve for training and validation loss for (a) Inception V3, (b) ResNet50, (c) ResNet101, and (d) DenseNet169.
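
The precision, recall and F1‐score values in Table 8 summarise a multi‐class confusion matrix. The source does not state which averaging scheme was used. The Python sketch below assumes macro averaging (the per‐class scores are averaged with equal weight) and uses a hypothetical two‐class confusion matrix that is not taken from the paper. The function name `macro_metrics` is likewise an assumption for this example.

```python
def macro_metrics(cm):
    """Return macro-averaged (precision, recall, F1) for a confusion matrix
    where cm[i][j] counts samples of true class i predicted as class j."""
    n = len(cm)
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical confusion matrix for illustration only.
cm = [[9, 1],
      [5, 5]]
precision, recall, f1 = macro_metrics(cm)
harmonic = 2 * precision * recall / (precision + recall)
print(f"macro precision={precision:.4f} recall={recall:.4f} F1={f1:.4f}")
print(f"harmonic mean of macro precision and recall={harmonic:.4f}")
```

Under macro averaging, the F1‐score is the mean of the per‐class F1‐scores, so it need not equal the harmonic mean of the averaged precision and recall. This would be consistent with Table 8, where the InceptionV3 F1‐score of 59.25% is lower than the harmonic mean of its precision and recall (about 66%), although the averaging scheme remains an assumption.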

FIGURE 10 Loss graph for all the four models.

FIGURE 11 Learning curve for training and validation accuracy for (a) ResNet50, (b) DenseNet169, (c) InceptionV3, and (d) ResNet101.

FIGURE 12 Accuracy graph for all the four models.

and ResNet50 are gradually decreasing till the end, whereas for ResNet101 the loss curves almost saturate after 30 epochs and for DenseNet169 after 20 epochs. We also observe that the gap between the validation curve and the training curve is not large for any model. Thus, all our models acquire a good fit, except in the case of ResNet50, where the validation loss curve is slightly lower than the training loss curve and the validation loss is gradually decreasing, which we interpret as a slight overfit. Among all architectures, DenseNet169 gave the best result with the minimum loss of 0.08. Figure 10 represents the training and validation loss of all four models in the form of a bar graph, where the loss for DenseNet169 is minimum.

Figure 11 depicts the accuracy of all four models, comparing their performance through the training and validation accuracy curves. We can observe that both accuracy curves of InceptionV3 and ResNet50 are steadily increasing, while in the case of ResNet101 the accuracy saturated after about 35 epochs and that of DenseNet169 after only 20 epochs of the training process.

Moreover, the gap between the training and validation curves is not large; therefore, we can say that our model acquires a good fit. Among all the architectures, DenseNet169 gave the best result, as the training accuracy is around 99% and the validation accuracy, at around 93%, falls short of it by only about 6 percentage points. In ResNet50, the model shows a slight overfit, as the validation curve is slightly higher than the training curve. Figure 12 represents the training, validation and testing accuracy of all four models in the form of a bar graph, where the accuracy for DenseNet169 is maximum.

6 | CONCLUSION AND FUTURE SCOPE

In our work, fine‐tuning and evaluation of deep neural networks for the detection and classification of Alzheimer's disease have been performed. InceptionV3, ResNet with 50 and 101 layers and DenseNet with 169 layers are the architectures that have been evaluated. Among all the above architectures, DenseNet gave the best result by making continuous improvement in accuracy with the increase in the number of epochs without any overfitting or performance deterioration. DenseNet169 obtained the best result of 99.94% accuracy on training data and 93.26% accuracy on validation data, beating the rest of the architectures. Therefore, we can consider DenseNet an efficient architecture for the classification of Alzheimer's disease. Although the results obtained were satisfactory, further research can be done to reduce the computational time.

CONFLICT OF INTEREST STATEMENT
The authors declare that there are no conflicts of interest regarding the publication of this article.

DATA AVAILABILITY STATEMENT
Data openly available in a public repository that issues datasets with DOIs.

ORCID
Shruti Pallawi https://orcid.org/0000-0002-9129-9989
How to cite this article: Pallawi, S., Singh, D.K.: Review and analysis of deep neural network models for Alzheimer's disease classification using brain medical resonance imaging. Cogn. Comput. Syst. 5(1), 1–13 (2023). https://doi.org/10.1049/ccs2.12072