Review and Analysis of Deep Neural Network Models For Alzheimer's Disease Classification Using Brain Medical Resonance Imaging
DOI: 10.1049/ccs2.12072
REVIEW
Accepted: 7 December 2022
KEYWORDS
artificial intelligence, artificial neural networks, computer vision, machine learning, supervised learning
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the
original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
© 2023 The Authors. Cognitive Computation and Systems published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology and Shenzhen University.
tomography (PET), computed tomography and cerebrospinal fluid. Even though different modalities or their combination can be used for diagnosis, the majority of practical methods use MRI. In this study, MRI of the brain is used for the AD classification task because it is safe and painless. MRI can clearly distinguish between the grey and white matter of the brain by providing high resolution for soft tissues. For this reason, MRI‐based analysis is mostly preferred in clinical research for the diagnosis of AD. The dataset used in our work is taken from Kaggle; it is publicly available and consists of 6400 MRI images divided into four classes. Based on the damage to brain cells and the patient's health condition, AD can be classified into four stages: moderate demented, mild demented, very mild demented and non‐demented [8, 9]. The exact cause of Alzheimer's disease has not yet been clearly identified. According to most recent studies, it is attributed to a high deposition of amyloid β proteins, which are responsible for the death of brain cells and thus block the transmission of signals. The loss of tissue in various sub‐regions of the brain causes atrophy in the hippocampus, cerebral cortex and other components of the brain.
In the past few years, the Convolutional Neural Network (CNN) has emerged as the most popular technique for medical image classification, yielding strong performance. However, training any neural network requires extra hardware, computational resources and high‐dimensional data. Deep neural networks also need good hyperparameters and careful tuning; without them, training can suffer from underfitting, overfitting and other issues during classification and feature extraction. Figure 1 below represents various stages of Alzheimer's disease obtained from brain MRI.
To deal with these issues in CNNs, researchers have adopted another popular learning technique called Transfer Learning (TL), which is widely used for different applications in the medical field [10, 11]. This technique makes use of the parameters and resources of a pre‐trained model. Transfer Learning is efficient for various computer vision classification tasks at minimum computation cost. In our study, a comparative analysis of deep learning models, namely InceptionV3, ResNet50, ResNet101 and DenseNet169, has been done using the concept of TL [12]. The main objectives of our research work are as follows:

• Performing experiments on various deep learning architectures for the identification of stages of Alzheimer's disease using brain MRI.
• Comparative analysis of each model on the basis of various performance metrics.
• Analysis of various learning curves obtained for determining the accuracies of each model during training and validation.
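The Transfer Learning idea described above, reusing the parameters of a pre‐trained model and training only a small new part, can be illustrated with a toy sketch. This is not the authors' pipeline: `frozen_backbone`, `train_head` and `predict` are hypothetical names, the "backbone" is a fixed hand‐written function standing in for a pre‐trained CNN, and the data are invented one‐dimensional points. The only point it makes is that the backbone is never updated while a new classification head is fitted on its features.

```python
import math

def frozen_backbone(x):
    """Hypothetical stand-in for a pre-trained feature extractor.
    Nothing in this function is ever updated during training."""
    return [x, x * x]

def train_head(inputs, labels, epochs=300, lr=0.1):
    """Fit only a new logistic-regression head on top of the frozen features."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(inputs, labels):
            feats = frozen_backbone(x)
            z = sum(w * f for w, f in zip(weights, feats)) + bias
            prob = 1.0 / (1.0 + math.exp(-z))
            error = prob - y  # gradient of the cross-entropy loss w.r.t. z
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias

def predict(x, weights, bias):
    """Classify one input using the frozen backbone and the trained head."""
    feats = frozen_backbone(x)
    z = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1 if z > 0 else 0

# Invented toy data: class 1 for positive inputs, class 0 for negative ones.
inputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_head(inputs, labels)
predictions = [predict(x, weights, bias) for x in inputs]
```

In a real setting the frozen function would be a network such as DenseNet169 with ImageNet weights, and the head would be the newly added fully connected layers; only the head's parameters would be trained at first.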
order to reduce the grid size of feature maps obtained. Fine‐tuning of Inception V3 was performed by using pre‐trained weights of ImageNet for the classification and identification of the disease.

3.2 | ResNet

Very deep neural networks often suffer from vanishing gradient problems. Residual networks were proposed in 2015 by researchers to solve the problem of vanishing/exploding gradients. Figure 4 shows a skip connection in the ResNet architecture, where the intermediate layer is skipped by the model.
In this network, the concept of a skip connection is used to propagate information across the layers. These skip connections bypass a few layers and add their input directly to the output of those layers. The main benefit of adding these skip connections to the network is that if any of the layers impairs the performance of the architecture, it is skipped through regularisation. ResNet [16]
25177567, 2023, 1, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ccs2.12072, Wiley Online Library on [10/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
PALLAWI and SINGH
is a type of network‐in‐network (NIN) architecture, as it consists of many residual units stacked together. The collection of these residual units forms a building block for the ResNet architecture. Residual networks are easy to optimise and also gain accuracy as the depth of the network increases. The ResNet architecture follows two design principles: 1) layers that produce the same output feature map size have the same number of filters; 2) if the size of the feature map is halved, the number of filters is doubled in order to maintain the complexity of the network. In our work, the ResNet model with 50 and 101 layers has been used and loaded with pre‐trained weights from ImageNet for the identification of AD [20].
ResNet50: ResNet‐50 is a 50‐layer CNN (48 convolutional layers, one MaxPool layer and one average pool layer). A small change was made in ResNet50 from the previous architectures: shortcut connections skip three layers in this architecture, with 1 × 1 convolution layers.
ResNet101: ResNet101 is constructed using 101 layers. It has more three‐layer blocks than ResNet50. ResNet101 is fast and more accurate than ResNet50, and even after the increase in depth, the complexity of this network is not increased.

DenseNet model has been shown in Figure 5, containing three dense blocks where the layers in each block are densely connected with one another.
The DenseNet network becomes thinner and more compact, with fewer channels, as feature maps are received from all preceding layers, and so this network is better in terms of computational and memory requirements. It also helps in reducing overfitting on tasks with small training sets because of the regularising effect of dense connections. For the task of Alzheimer's disease classification, the DenseNet model with 169 layers has been used by loading the model with pre‐trained weights from ImageNet.
DenseNet169: The depth of DenseNet169 is 169 layers, but the number of parameters is still comparatively lower than that of other models. The vanishing gradient problem is also handled very well in this architecture. Here, the last fully connected layer was replaced by a 256‐node fully connected layer. Further, a 128‐node fully connected layer and finally a 10‐node fully connected layer with softmax activation output were added.

4 | EXPERIMENTATIONS AND RESULTS
evaluating the data and its parameter tuning. The test data is used for the prediction and final evaluation of the model.

4.2 | Training

In this work, we have performed the classification and identification of Alzheimer's disease using deep neural networks. Our focus is on the comparative analysis of various deep learning models: InceptionV3, ResNet with 50 and 101 layers, and DenseNet with 169 layers. For each experiment, categorical cross‐entropy loss and the accuracy metric are used, evaluated on the test dataset. Each model was trained for a total of 50 epochs. All the networks were trained using the RMSProp (Root Mean Square Propagation) optimisation algorithm, which helps in fast convergence. The learning rate was set to 0.001 with a batch size of 32. The ReLU (Rectified Linear Unit) activation function and the Batch Normalisation technique were applied. The results of the experiments for each model are presented below.

4.3 | Performance measures

In the proposed work, Accuracy, Precision, Recall and F1‐score have been evaluated. The equations for obtaining each metric are given below.

a. Accuracy: It indicates the proportion of correct predictions among all predictions made by the model [21]. The accuracy history graph depicts the training and validation accuracy of the model.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

where TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative.

Recall mainly focuses on FN, that is, the Type 2 error. A false negative means identifying a disease‐infected patient as non‐infected. Recall close to one signifies that our model has very few FN, and a value less than 0.5 means our classifier has a high number of FN.
d. F1‐score: F1‐score is a way to combine both precision and recall in one measure when the impact of both FP and FN is equally important.

$$\text{F1-score} = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}} \tag{4}$$

The value of the F1 score is high if both precision and recall are high. The value of the F1 score lies between 0 and 1.

4.4 | Evaluated results for InceptionV3

The results of the experiments performed on the InceptionV3 model are given below.
Table 1 gives the evaluation metric for InceptionV3. Here the precision for each class is high, which shows that the model has very few false positives, whereas the recall for the very mild demented class is low, which shows that this class has some false negatives; consequently, its F1‐score is also low.
Table 2 shows the loss and accuracy on training and validation data after every 10 epochs. The training and validation loss kept decreasing with the growing number of epochs to 0.19 and 0.28, respectively, whereas the training and validation accuracy gradually increased to 93.63% and 89.45%, respectively. Figure 6 shows the confusion matrix for the above‐obtained result.
The model was trained for 50 epochs, and an average training accuracy of 99.77% was obtained. The results obtained show that the model fitted the dataset well.
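The performance measures of Section 4.3 can be sketched from the four confusion counts as follows. Accuracy and F1‐score follow Equations (1) and (4); the precision and recall formulas are the standard definitions, TP / (TP + FP) and TP / (TP + FN), and are an assumption here because their equations are not reproduced in this text. The function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1-score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)           # Equation (1)
    # Standard definitions (assumed, not quoted from the article):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * (precision * recall) / denom if denom else 0.0  # Equation (4)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Invented counts for a single class treated as "positive".
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these invented counts, recall (0.80) is lower than precision (about 0.89) because there are more false negatives than false positives, which is the situation the text describes for classes with low recall.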
                      ResNet50                                 ResNet101
Class                 Recall  Precision  F1‐score  Support     Recall  Precision  F1‐score  Support
Non‐Demented          0.83    0.80       0.81      639         0.76    0.85       0.80      639
Very Mild Demented    1.00    1.00       1.00      635         1.00    0.99       1.00      635
Mild Demented         0.43    0.79       0.56      662         0.86    0.56       0.68      662
Moderate Demented     0.73    0.51       0.60      624         0.24    0.45       0.31      624
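As a consistency check on the per‐class values in the table above, the sketch below averages the ResNet50 columns without weighting by support (a macro average). The article does not state how the summary figures in Table 8 were computed, so the macro average is an assumption; under that assumption the results match the ResNet50 precision, recall and F1‐score reported in Table 8 (77.50%, 74.75% and 74.25%). The names `resnet50_per_class` and `macro_average` are hypothetical.

```python
# Per-class ResNet50 values copied from the table above.
resnet50_per_class = {
    "Non-Demented":       {"recall": 0.83, "precision": 0.80, "f1": 0.81},
    "Very Mild Demented": {"recall": 1.00, "precision": 1.00, "f1": 1.00},
    "Mild Demented":      {"recall": 0.43, "precision": 0.79, "f1": 0.56},
    "Moderate Demented":  {"recall": 0.73, "precision": 0.51, "f1": 0.60},
}

def macro_average(per_class, key):
    """Unweighted mean of one metric over all classes (assumed averaging scheme)."""
    return sum(row[key] for row in per_class.values()) / len(per_class)

macro_precision = macro_average(resnet50_per_class, "precision")
macro_recall = macro_average(resnet50_per_class, "recall")
macro_f1 = macro_average(resnet50_per_class, "f1")
```

Because the class supports are nearly equal (624 to 662 samples), a support‐weighted average would give similar but not identical numbers.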
precision is high and recall is low, which means that when the model classifies a sample as positive it is usually correct, but it identifies only a few of the positive samples. In the case of moderate demented, where recall is high and precision is low, the model classifies most of the positive samples correctly but produces many false positives. For ResNet101, in the case of moderate demented, both precision and recall are low, which indicates that the model has a large number of both false positives and false negatives.
Table 4 shows the loss and accuracy on training and validation data after every 10 epochs. The training and validation loss kept decreasing with the increasing number of epochs to 0.67 and 0.56 for ResNet50 and 0.64 and 0.62 for ResNet101, respectively, whereas the training and validation accuracy
TABLE 4 Summary of training and validation accuracy and loss at certain epochs.
                 ResNet50                                                               ResNet101
No. of epochs    Training accuracy  Training loss  Validation accuracy  Validation loss    Training accuracy  Training loss  Validation accuracy  Validation loss
10               59.88              0.87           65.82                0.70               57.70              0.88           58.69                0.80
FIGURE 7 Confusion matrix for (a) ResNet 50 and (b) ResNet 101.
gradually increased to 70.13% and 76.66% for ResNet50 and 70.62% and 71.73% for ResNet101, respectively. Figure 7 presents the confusion matrix for the above‐obtained result for both the ResNet models.
The average training accuracy obtained after training the models for 50 epochs was 79.65% for ResNet50 and 75.26% for ResNet101. These results are not as good as those of InceptionV3.

4.6 | Evaluated results for DenseNet169

DenseNet169 is a convolutional network with 169 layers in which all the layers are directly connected with each other. The experimental results obtained from this model are as follows.
Table 5 gives the evaluation metric for DenseNet169. Here the precision and recall for each class are high, which shows that the model produces few false positives and few false negatives. Therefore, the F1‐score is also high for each class.

TABLE 5 Evaluation metric for DenseNet169.
Class                 Recall  Precision  F1‐score  Support
Non‐Demented          0.97    0.95       0.96      639
Very Mild Demented    1.00    1.00       1.00      635
Mild Demented         0.92    0.91       0.91      662
Moderate Demented     0.89    0.91       0.90      624

Table 6 presents the loss and accuracy on training and validation data after every 10 epochs. The training and validation loss kept decreasing with the growing number of epochs to 0.08 and 0.25, respectively, whereas the training and validation accuracy gradually increased to 97.17% and 93.26%, respectively. Figure 8 shows the confusion matrix for the above‐obtained result for the DenseNet169 model.
The above result shows that DenseNet169 gives the best performance, with the highest training accuracy of 97.17% after the model was trained for 50 epochs, which is the best accuracy obtained among all four architectures.

5 | DISCUSSION AND ANALYSIS

Recent work on deep learning models shows that deeper models are efficient to train and are more accurate. However, with the increase in depth of the model, other challenges also increase, such as vanishing gradients, degradation, the internal covariate shift problem and increased computational cost.
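The vanishing gradient challenge named above, and the way a skip connection counters it, can be illustrated with a deliberately simplified scalar sketch. It assumes every layer has the same small local derivative, which real networks do not; in practice the quantities are Jacobian matrices. A plain layer contributes a factor F'(x) to the back‐propagated gradient, while a residual block y = x + F(x) contributes 1 + F'(x). The function names and the values 0.01 and 50 are hypothetical.

```python
def gradient_through_plain_stack(local_grad, depth):
    """Chain rule through `depth` plain layers, each with derivative `local_grad`."""
    grad = 1.0
    for _ in range(depth):
        grad *= local_grad
    return grad

def gradient_through_residual_stack(local_grad, depth):
    """Each residual block computes y = x + F(x), so dy/dx = 1 + F'(x)."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_grad
    return grad

# Illustrative values only: 50 layers, each with a small local derivative.
plain = gradient_through_plain_stack(0.01, 50)
residual = gradient_through_residual_stack(0.01, 50)
```

In this toy setting the plain stack's gradient shrinks to a negligible value, whereas the residual stack's gradient stays of order one because the identity path always passes the signal through.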
Therefore, to deal with these problems, various strategies have been used in different architectures, such as skip connections, TL, batch normalisation and optimisation methods. The adopted TL concept helped in boosting the accuracy and reducing the execution time.
First of all, data augmentation is done to enhance our dataset, and the augmented images are extracted for our proposed CNN model. After augmentation, oversampling has been performed using the synthetic minority oversampling technique in order to balance our dataset. We used the TL approach to train pre‐trained InceptionV3, ResNet50, ResNet101 and DenseNet169 CNN models. Evaluation of our models was done on unseen testing data. A comparative analysis of all four models is shown in Table 7, where the number of parameters, training time and accuracy obtained during experimentation have been summarised.
In this section we have discussed the experimental analysis of the proposed models. All the architectures have been trained on the same modalities, data augmentation and feature types with the same dataset for the detection and classification of AD. We have used five metrics to evaluate the performance of our models. The learning curves obtained depict how a model performs with the growing number of epochs on the training and validation sets. The training set is the portion of the dataset used to train the model initially, while the validation set validates the performance of the model during learning.

TABLE 8 Performance measures using accuracy, precision, recall, F1‐score and loss.
             InceptionV3  ResNet50  ResNet101  DenseNet169
Accuracy     99.77%       79.65%    75.26%     99.94%
Precision    79.75%       77.50%    71.25%     94.25%
Recall       56.75%       74.75%    71.50%     94.50%
F1‐score     59.25%       74.25%    69.75%     94.2%
Loss         0.19         0.67      0.64       0.08

In Table 8 the performance metrics of all four models have been summarised. In this table, we observe that the precision of DenseNet169, at about 94%, is the highest among the models, while ResNet101 showed about 71%. Similarly, the recall for DenseNet169 is about 95%, while InceptionV3 had the lowest, at about 57%. We also observed that for the InceptionV3 model, where precision and recall were about 80% and 57%, respectively, the F1‐score obtained was only about 59%, while in the other models, as both the precision and recall values were high, the F1‐score obtained was also high.
The loss curve illustrates the error made by the model. It shows how far the predictions have departed from the true labels, and such errors play an important role in producing faulty results during testing. Training loss is measured after each batch, whereas validation loss is measured after each epoch.
The plot of the loss function, comparing the behaviour of training and validation loss obtained during the training process, is presented in Figure 9. From the above‐obtained loss curves, we observe that both the loss curves for InceptionV3
FIGURE 9 Learning curve for training and validation loss for (a) Inception V3, (b) ResNet50, (c) ResNet101, and (d) DenseNet169.
FIGURE 10 Loss graph for all the four models.
FIGURE 11 Learning curve for training and validation accuracy for (a) ResNet50, (b) DenseNet169, (c) InceptionV3, and (d) ResNet101.
and ResNet50 are gradually decreasing till the end, whereas for ResNet101 the loss curves almost saturate after 30 epochs and those for DenseNet169 after 20 epochs. We also observe that the gap between the validation curve and training curve is not large for any model. Thus, all our models acquire a good fit, except in the case of ResNet50, where the validation loss is slightly lower than the training loss and is still gradually decreasing, which suggests that the model is slightly under‐fitted rather than over‐fitted. Among all architectures, DenseNet169 gave the best result with the minimum loss of 0.08. Figure 10 below represents the training and validation loss of all four models in the form of a bar graph, where the loss for DenseNet169 is minimum.
Figure 11 depicts the accuracy of all four models, comparing their performance through the training and testing accuracy curves. We can observe that both the accuracy curves of InceptionV3 and ResNet50 are steadily increasing, while for ResNet101 the accuracy saturates after about 35 epochs and for DenseNet169 after only 20 epochs of training.
Moreover, the gap between the training and validation curves is not large; therefore we can say that our models acquire a good fit. Among all the architectures, DenseNet169 gave the best result, as the training accuracy is around 99% and the validation accuracy, at around 93%, falls short of it by only about 6 percentage points. In ResNet50, the validation curve is slightly higher than the training curve, which again points to slight under‐fitting rather than over‐fitting. Figure 12 below represents training, validation and testing accuracy of all four models in the form of a bar graph, where the accuracy for DenseNet169 is maximum.

6 | CONCLUSION AND FUTURE SCOPE

In our work, fine‐tuning and evaluation of deep neural networks for the detection and classification of Alzheimer's disease have been performed. InceptionV3, ResNet with 50 and 101 layers and DenseNet with 169 layers are the architectures that have been evaluated. Among all the above architectures, DenseNet gave the best result, improving continuously in accuracy with the increase in the number of epochs without any overfitting or performance deterioration. DenseNet169 obtained the best result of 99.94% accuracy on training data and 93.26% accuracy on validation data, beating the rest of the architectures. Therefore, we can consider DenseNet an efficient architecture for the classification of Alzheimer's disease. Although the results obtained were satisfactory, further research can be done to reduce the computational time.

CONFLICT OF INTEREST STATEMENT
The authors declare that there are no conflicts of interest regarding the publication of this article.

DATA AVAILABILITY STATEMENT
Data openly available in a public repository that issues datasets with DOIs.

ORCID
Shruti Pallawi https://orcid.org/0000-0002-9129-9989

REFERENCES
1. Aderghal, K., Benois‐Pineau, J., Afdel, K.: Classification of sMRI for Alzheimer's disease diagnosis with CNN: single Siamese networks with 2D+ε approach and fusion on ADNI. In: Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval (2017)
2. Szegedy, C., et al.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
3. Wen, J., et al.: Convolutional neural networks for classification of Alzheimer's disease: overview and reproducible evaluation. Med. Image Anal. 63, 101694 (2020). https://doi.org/10.1016/j.media.2020.101694
4. Islam, J., Zhang, Y.: A novel deep learning based multi‐class classification method for Alzheimer's disease detection using brain MRI data. In: International Conference on Brain Informatics. Springer, Cham (2017)
5. Bai, T., et al.: A novel Alzheimer's disease detection approach using GAN‐based brain slice image enhancement. Neurocomputing 492, 353–369 (2022). https://doi.org/10.1016/j.neucom.2022.04.012
6. Chen, S., et al.: Alzheimer's disease classification using structural MRI based on convolutional neural networks. In: 2020 2nd International Conference on Big‐Data Service and Intelligent Computation (2020)
7. Zhang, F., et al.: Multi‐modal deep learning model for auxiliary diagnosis of Alzheimer's disease. Neurocomputing 361, 185–195 (2019). https://doi.org/10.1016/j.neucom.2019.04.093
8. Nagashbayev, Al‐F., Fatih Demirci, M.: Alzheimer's disease classification using capsule networks on structural MRI. In: 2020 5th International Conference on Biomedical Imaging, Signal Processing (2020)
9. Jain, R., et al.: Convolutional neural network‐based Alzheimer's disease classification from magnetic resonance brain images. Cognit. Syst. Res. 57, 147–159 (2019). https://doi.org/10.1016/j.cogsys.2018.12.015
10. Pan, J., et al.: DecGAN: decoupling Generative Adversarial Network detecting abnormal neural circuits for Alzheimer's disease. arXiv preprint arXiv:2110.05712 (2021)
11. Roy, S.S., et al.: Deep convolutional neural network for environmental sound classification via dilation. Journal of Intelligent & Fuzzy Systems Preprint 43(2), 1–7 (2022). https://doi.org/10.3233/jifs‐219283
12. Kapoor, M., et al.: Early diagnosis of Alzheimer's disease using machine learning based methods. In: 2021 Thirteenth International Conference on Contemporary Computing (IC3‐2021) (2021)
13. Sambath Kumar, S., Nandhini, M.: Entropy slicing extraction and transfer learning classification for early diagnosis of Alzheimer diseases with sMRI. ACM Trans. Multimed Comput. Commun. Appl 17(2), 1–22 (2021). https://doi.org/10.1145/3383749
14. Bangyal, W.H., et al.: Constructing domain ontology for alzheimer disease using deep learning based approach. Electronics 11(12), 1890 (2022). https://doi.org/10.3390/electronics11121890
15. Szegedy, C., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
16. Andrade, L., et al.: Shearlets as Feature Extractor for Semantic Edge Detection: The Model‐Based and Data‐Driven Realm (2019)
17. García, Z., et al.: Mosquito Larvae Image Classification Based on DenseNet and Guided Grad‐CAM (2019). https://doi.org/10.1007/978‐3‐030‐31321‐0_21
18. Fulton, L.V., et al.: Classification of Alzheimer's disease with and without imagery using gradient boosted machines and ResNet‐50. Brain Sci. 9(9), 212 (2019). https://doi.org/10.3390/brainsci9090212
19. Szegedy, C., et al.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
20. Kabir, A., et al.: Multi‐classification based Alzheimer's disease detection with comparative analysis from brain MRI scans using deep learning. In: TENCON 2021‐2021 IEEE Region 10 Conference (TENCON). IEEE (2021)
21. Wang, H.: Research on MRI classification method of Alzheimer's Disease brain based on convolutional neural network. In: Proceedings of the 2nd International Symposium on Artificial Intelligence for Medicine Sciences (2021)

How to cite this article: Pallawi, S., Singh, D.K.: Review and analysis of deep neural network models for Alzheimer's disease classification using brain medical resonance imaging. Cogn. Comput. Syst. 5(1), 1–13 (2023). https://doi.org/10.1049/ccs2.12072