Batch 03 Entire Report


COMPARATIVE ANALYSIS OF MACHINE LEARNING

APPROACHES FOR EARLY DETECTION OF


ALZHEIMER’S DISEASE

A PROJECT REPORT

Submitted by

AKILESHWARAN.S (310118104004)

SATHISH KUMAR.R (310118104059)

in partial fulfillment for the award of the degree


of
BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING


ANAND INSTITUTE OF HIGHER TECHNOLOGY

ANNA UNIVERSITY: CHENNAI 600 025


JUNE 2022
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “COMPARATIVE ANALYSIS OF MACHINE
LEARNING METHODS FOR EARLY DETECTION OF ALZHEIMER’S DISEASE” is a
bonafide work of AKILESHWARAN.S (310118104004) and SATHISH KUMAR.R
(310118104059) who carried out the project work under my supervision.

SIGNATURE
Dr. S. Roselin Mary, Ph.D.,
HEAD OF THE DEPARTMENT
Department of CSE
Anand Institute of Higher Technology
Kazhipattur
Chennai – 603-103

SIGNATURE
A. Malathi, M.Tech
SUPERVISOR
ASSISTANT PROFESSOR
Department of CSE
Anand Institute of Higher Technology, Kazhipattur
Chennai – 603-103

Submitted to Project and Viva Voce Examination held on………………

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

First and foremost, we thank the Almighty for showering his abundant
blessings on us to successfully complete the project. Our sincere thanks to
our Honorable Founder “Kalvivallal” Late Thiru T. Kalasalingam, B.Com.

Our sincere thanks and gratitude to our “SevaRatna”, Dr. Sridharan,
M.Com., M.B.A., M.Phil., Ph.D., Chairman, and Dr. Mrs. S. Arivazhagi,
M.B.B.S., Secretary, for giving us support during the project work. We
convey our thanks to Dr. P. Suresh Mohan Kumar, Ph.D., Principal, for
the support towards the successful completion of this project.

We wish to thank our Project Guide, Assistant Professor A. Malathi,
M.Tech., Project Coordinator, Assistant Professor M. Maheswari, M.E.,
and Head of the Department Dr. S. Roselin Mary, Ph.D., for their
co-ordination, guidance and constant encouragement in completing this
project.

We also thank all the Staff members of the Department of Computer
Science and Engineering for their commendable support and
encouragement to go ahead with the project in reaching perfection.

Last but not least, our sincere thanks to all our parents and friends
for their continuous support and encouragement in the successful
completion of our project.

ABSTRACT

The most common cause of dementia in older adults is Alzheimer’s disease
(AD). With no known cure for AD, there is an important need to identify
behavioral tasks and biomarkers that can accurately assess and/or predict
disease progression in asymptomatic patients, since treatment is most likely
to be effective at an early stage. Machine Learning, a branch of Artificial
Intelligence, allows computers to benefit from vast and complex datasets. Our
survey reviews various analysis and evaluation techniques for the early
detection of AD using machine learning approaches that draw on a variety of
probabilistic and optimization techniques. Some studies have suggested that
MRI features may predict the rate of decline in AD and may guide therapy in
the future. However, to reach that stage, clinicians and researchers will
have to make use of machine learning techniques that can accurately predict
the progress of a patient from mild cognitive impairment to dementia. Our
goal is to develop a front-end webpage, based on our survey, that will assist
clinicians in predicting the onset of Alzheimer’s disease early on. The survey
examines the accuracies of a variety of machine learning algorithms and the
biomarkers associated with the disease. The Random Forest classifier produced
the highest accuracy among the classifiers, 99.1%, and was therefore used for
our front-end prediction model.

TABLE OF CONTENTS

CHAPTER TITLE PAGE

ABSTRACT iv

LIST OF FIGURES vii

LIST OF ABBREVIATION viii

LIST OF TABLES ix

1. INTRODUCTION 1
1.1 OBJECTIVE 3
1.2 SCOPE 3

2. LITERATURE SURVEY 4

3. ANALYSIS 18

3.1 SYSTEM ANALYSIS 18


3.1.1 Problem definition 18
3.1.2 Existing System 18
3.1.3 Proposed System 19
3.2 REQUIREMENT ANALYSIS 19
3.2.1 Functional Requirements 19
3.2.2 Non-Functional Requirements 20
3.2.3 Software Analysis 21
3.2.4 Hardware Specification 21
3.2.5 Software Specification 21

4. DESIGN 22

4.1 OVERALL ARCHITECTURE 22


4.2 UML DIAGRAMS 23
4.2.1 Use Case Diagram 23
4.2.2 Class Diagram 24
4.2.3 Sequence Diagram 25
4.2.4 Collaboration Diagram 26
4.2.5 Activity Diagram 27

5. IMPLEMENTATION 29

5.1 MODULES 29
5.1.1 Dataset Analysis
5.1.2 Dataset Pre-processing 30
5.1.3 Model Training & Testing 30
5.1.4 Model Evaluation & Deployment 31
6. TESTING 32

7. RESULT AND DISCUSSION 43

8. USER MANUAL 45

9. CONCLUSION 46

10. FUTURE ENHANCEMENT

APPENDICES
APPENDIX 1 BASE PAPER
APPENDIX 2 SCREENSHOTS
APPENDIX 3 PUBLICATION
REFERENCES

LIST OF FIGURES

FIGURE NO FIGURE DESCRIPTION PAGE NO

4.1 Overall Architecture for entire system 22

4.2 Use Case Diagram for entire system 23

4.3 Class Diagram for entire system 24

4.4 Sequence Diagram for entire system 25

4.5 Collaboration Diagram for entire system 26

4.6 Activity Diagram for entire system 27

7.1 – 7.15 Accuracies generated by dropping dataset 46-50

LIST OF TABLES

TABLE NO TABLE NAME PAGE NO

4.1 Attributes from Oasis Dataset 29

6.1 Test Case Design 34

6.2 Test Case Log 41

7.1 Accuracies generated by dropping features 54

7.2 Accuracies generated by dropping features 54

LIST OF ABBREVIATIONS

SYMBOLS ABBREVIATIONS

ML Machine Learning

MRI Magnetic Resonance Imaging

AD Alzheimer’s Disease

MCI Mild Cognitive Impairment

HCI High cognitive Impairment

CHAPTER 1
INTRODUCTION
In the modern era, many people suffer from brain diseases, and Alzheimer’s
disease is one of them; it affects memory and destroys neurons in the brain.
In its initial state, the disease makes it difficult to remember some
incidents; as time goes on, it increasingly impairs the ability to think.
Finally, it leads to dementia. People affected by Alzheimer’s disease (AD)
cannot perform even simple tasks and may not remember their own name. Earlier,
it affected people aged 60-90; nowadays it affects people aged 35-60. The
disease is categorized into initial, middle and final stages. The initial
stage is Mild Cognitive Impairment (MCI), which shows the early symptoms of
Alzheimer’s; not everyone at this stage goes on to develop the disease, but in
the next two stages it develops into Alzheimer’s disease and is quite
difficult to cure.
This project presents a survey of recent machine learning approaches for
detecting brain diseases such as AD. Ten recent articles on Alzheimer’s
disease are reviewed, considering diverse machine learning approaches,
modalities, datasets, etc. The OASIS dataset, which is used most frequently in
the reviewed articles as a primary source of brain disease data, is discussed.
Moreover, a brief overview of different feature extraction techniques that
are used in diagnosing brain diseases is provided. Finally, key findings from
the reviewed articles are summarized and a number of major issues related to
machine learning based brain disease diagnostic approaches are discussed.

1.1 OBJECTIVE

• To survey the accuracies generated as a result of feature manipulations.

• To prepare a survey for analyzing which supervised classifiers result in
the highest accuracy.

• To develop a front-end model using our survey.

• To obtain values from demographic data, clinical tests and MRI scan
reports, such as eTIV, nWBV and ASF.

• To submit the obtained values for detecting Alzheimer’s Disease (AD).

1.2 SCOPE

• Developing a survey and a prediction model for early detection of
Alzheimer’s disease can give clinicians a better chance of confirming
whether or not patients are properly diagnosed with Alzheimer’s.

• In this way, one can predict whether an individual has a chance of
acquiring AD, with reduced time and effort.

• Our web-based application does not demand technicians, doctors, surgeons
or clinicians to operate it. Any novice can easily use the webpage by
simply entering the necessary data from demographics and MRI scan reports.

CHAPTER 2
LITERATURE SURVEY

Title : Early Detection of Alzheimer’s Disease with Blood Plasma
Proteins using Support Vector Machines

Author : Suhad Al-Shoukry, Taha H. Rassem (Senior Member, IEEE),
and Nasrin M. Makbol

Publication : 2020 IEEE International Conference on Imaging Systems and
Techniques (IST)
Concept Discussed:

This paper aims to identify potential blood-based non-amyloid biomarker
panels for early detection of AD. It also evaluates how well the novel panels
detect Mild Cognitive Impairment (MCI). The approach is based mainly on the
Support Vector Machine (SVM) technique.

Work Done:

The blood plasma data used in this paper were obtained from the Alzheimer’s
Disease Neuroimaging Initiative (ADNI) portal. During preprocessing, the data
were separated into two datasets. The authors then developed candidate models
and identified five novel candidate non-amyloid biomarker panels for early
detection of AD using a new approach, and evaluated the performance of these
panels.

Problem Identified:

Many recent studies have used computers to diagnose or detect AD, but most
machine detection methods are limited by congenital observations.

Knowledge Gained:

New insights about the disease were gained from understanding the
interactions between the proteins in disease subjects.

Gap:

This paper has several limitations: it does not support large amounts of
data, and it is suspected that the protein-related biomarkers would not give
better results.

Title : Alzheimer's Diseases Detection by Using Deep Learning
Algorithms: A Mini review

Author : Suhad Al-Shoukry, Taha H. Rassem (Senior Member, IEEE),
and Nasrin M. Makbol

Publication : 2021 IEEE Journal of Biomedical and Health Informatics

Concept Discussed:

This paper focuses on published work regarding early detection of
Alzheimer’s disease using deep learning algorithms. Multiple deep learning
techniques and certain machine learning techniques are reviewed in the paper.

Work Done:

Deep learning and machine learning (SVM) techniques, their performance, and
image processing are discussed in this paper.

Problem Identified:

The ultimate purpose of these neuroscientific approaches is to improve early
detection and complete the treatment plan of individuals at high risk of
Alzheimer's disease and AD-related cognitive decline from a computational
perspective.

Knowledge Gained:

Knowledge was gained about deep learning techniques. The generative
architecture can be subdivided into four sections, including Recurrent Neural
Network (RNN), Deep Auto-Encoder (DAE), and Deep Belief Networks (DBN).

Gap:
DL is extremely expensive to train due to complex data models.
Moreover, deep learning requires expensive GPUs.

Title : Optimized One vs. One approach in multiclass classification for
early Alzheimer’s Disease and Mild Cognitive Impairment diagnosis.

Author : Castillo-Barnes, Francisco Jesus Martinez Murcia, Javier
Ramirez, and Juan M. Gorriz

Publication : 2021 IEEE International Conference Imaging Systems and
Techniques (IST)

Concept Discussed:

This paper proposes a novel multiclass classification approach that
addresses the outlier detection problem, uses pairwise t-test feature
selection, projects the selected features onto a Partial-Least-Squares
multiclass subspace, and applies one-versus-one error correction output codes
classification.

Work Done:

PET, MRI and other biomarker datasets were obtained from the Alzheimer’s
Disease Neuroimaging Initiative (ADNI) database. These datasets go through a
common preprocessing step that centers the data to zero mean, followed by
feature extraction and classification of the disease. The performance
estimates used in this work can be considered “pessimistic”.
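The zero-mean centering step mentioned above can be illustrated with a minimal sketch. It is not the reviewed paper's code: the function name `center_columns` and the toy values are hypothetical, chosen only to show what centering each feature column does.

```python
from statistics import fmean


def center_columns(rows):
    """Subtract each column's mean so that every feature has zero mean."""
    n_cols = len(rows[0])
    # Mean of each feature (column) across all samples (rows).
    means = [fmean(row[j] for row in rows) for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in rows]
    return centered, means


# Toy feature matrix (hypothetical values, not ADNI data).
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered, means = center_columns(data)
```

The same means would normally be computed on the training data only and reused to center the test data, so that no test information leaks into preprocessing.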

Problem Identified:

Detecting Alzheimer’s Disease in its early stages is crucial for patient care
and drug development. Motivated by this fact, the neuroimaging community
has extensively applied machine learning techniques to the early diagnosis
problem with promising results.
Knowledge Gained:

This paper focuses on the development of CAD systems for the multiclass
classification of 4 classes: Healthy Controls (HC), Mild Cognitive Impaired
(MCI) subjects, Mild Cognitive Impaired subjects that converted into
Alzheimer’s Disease during the study (MCI), and Alzheimer’s Disease (AD)
patients. The challenge provided preprocessed MRI data of the different
classes to allow participants to propose optimized CAD systems, based on the
finding that combining multiple anatomical measures improves the
classification and early diagnosis of AD.
Gap:

Only a limited number of samples is used to diagnose Alzheimer’s, and the
preprocessing of outliers reveals a high redundancy in the original data.
Some mismatches between training fit and final test estimates can be
expected, limiting the capability of the system to reach its highest
performance at test level.

Title : Early Detection of Amyloid β Pathology in Alzheimer’s Disease by
Molecular MRI.

Author : Celia M. Dong, Audrey S. Guo, Anthea To, Kannie W.Y. Chan,
Aviva S.F. Chow, Liming Bian, Alex T.L. Leong, a

Publication : 2021, Hong Kong Research Grants Council and Guangdong Key.
Technologies for Alzheimer's Disease Diagnosis and Treatment.

Concept Discussed:

Alzheimer’s disease (AD) is a degenerative brain disease and the most
common cause of dementia. Early stage β-amyloid oligomers (AβOs) and late
stage Aβ plaques are the pathological hallmarks of AD brains. The aim of this
paper is to detect Aβ pathologies at the early and late stages of AD.

Work Done:

In previous studies, the authors developed a new curcumin-conjugated
magnetic nanoparticle (Cur-MNPs) to target the Aβ pathologies. In this paper,
they investigate the feasibility of using Cur-MNPs to detect Aβ pathologies
at the early and late stages of AD in transgenic AD mice, using MRI datasets
of mouse brains. They find that Cur-MNPs are able to target not only the Aβ
pathologies but also the Aβ oligomers at the early stage of AD progression.
Taken together, this presents a powerful imaging approach in the pursuit of
early AD diagnosis and drug development.

Problem Identified:

A key obstacle to preventing degeneration, even if effective drugs are
developed, is the subtlety of the cognitive changes that occur early in the
disease. On diagnosis, AD is usually already at a mild to moderate stage.
Current treatments cannot stop the progression of dementia, and therefore
recent therapeutic approaches are shifting to early intervention aimed at
halting neurodegeneration before damage accumulates.

Knowledge Gained:

Knowledge was gained about the curcumin-conjugated magnetic nanoparticle
(Cur-MNPs), Aβ pathologies and the visualized molecular approach.

Gap:
In this paper, the authors used MRI datasets of mouse brains instead of
human brains to detect the Aβ pathologies. The accuracy may therefore differ
between mice and humans.

Title : Graph Wavelet-Based Multilevel Graph Coarsening and Its
Application in Graph CNN for Alzheimer's Disease Detection.

Author : Himanshu Padole (Member, IEEE), Shiv Dutt Joshi, and Tapan
K. Gandhi

Publication : IEEE 2021 International Conference

Concept Discussed:

Beyond classical applications like graph partitioning and graph
visualization, graph coarsening has recently been applied in Graph
Convolutional Neural Network (GCNN) architectures to perform the pooling
operation in the graph domain. The modified GCNN architecture is then used as
a graph signal classifier for the early detection of Alzheimer's disease.

Work Done:

The existing GCNN architecture is modified by applying the authors’
proposed graph coarsening method to perform the pooling and Laplacian
operator. In general, graph coarsening can be thought of as a two-step process
consisting of graph downsampling and graph reduction. An AD detection model
is developed for the early detection of AD in which the modified GCNN
architecture is used as a graph signal classifier, and it achieved
state-of-the-art AD detection performance.

Problem Identified:

Many real-life problems are limited by the large size of the graphs involved,
as the computational complexity of most algorithms in GSP grows polynomially
with the graph size.

Knowledge Gained:

From this paper, knowledge was gained about Graph Convolutional Neural
Networks, graph partitioning, graph visualization, Laplacian optimization and
graph coarsening.

Gap:

GCNN architecture pooling and optimization of the Laplacian operator require
a large amount of space.

Title : Machine Learning and Deep Learning Approaches for Brain Disease
Diagnosis: Principles and Recent Advances

Author : Aisha B. Rahman. Shahriar Kamal

Publication : Journal of Ambient Intelligence and Humanized Computing

Concept Discussed:

In this paper, the authors presented a review of recent machine learning
and deep learning approaches for detecting four brain diseases: Alzheimer's
disease (AD), brain tumor, epilepsy, and Parkinson's disease.

Work Done:

The paper covers work on the prognosis and prediction of Alzheimer’s disease
using machine and deep learning methods. In particular, it reveals recent
trends in machine and deep learning, including the types of data being used
and the performance of machine learning and deep learning methods in
predicting the early stages of Alzheimer’s.

Problem identified:

To make the diagnosis of the disease easier, to detect the disease in its early
stages and use the machine and deep learning algorithm efficiently.

Knowledge Gained:

This paper presented a survey on the four most dangerous brain disease
detection processes using machine and deep learning. The survey reveals some
important insights into contemporary ML/DL techniques in the medical field
used in today's brain disorder research.

Gap:

Parameters were used to compare the accuracy. The deep and machine
learning techniques were not discussed.

Title : Toward Noninvasive Quantification of Brain Radioligand Binding by
Combining Electronic Health Records and Dynamic PET Imaging Data
Author : Arthur Mikhno, Francesca Zanderigo, R. Todd Ogden, J. John Mann
Publication : 2021 IEEE Journal of Biomedical and Health Informatics

Concept Discussed:

Quantitative analysis of positron emission tomography (PET) brain imaging
data requires a metabolite-corrected arterial input function (AIF) for
estimation of distribution volume and related outcome measures. Collecting
arterial blood samples adds risk, cost, measurement error, and patient
discomfort to PET studies.

Work Done:

The paper introduces nSIME, a new SIME framework to enable fully
noninvasive quantitative PET imaging. This framework replaces blood-based
radioligand measures with a robust PK input function model and multiple
noninvasive constraints based on machine learning on EHR data.

Problem identified:

The data was split into training/validation/test sets at the very beginning,
and only the training/validation sets were used for model selection.

Knowledge Gained:

This survey on the four most dangerous brain disease detection processes
using machine and deep learning. The survey reveals some important insights
into contemporary ML/DL techniques in the medical field used in today's brain
disorder research.

Gap:

The authors designed an original big-data analytics tool toward a
noninvasive nSIME quantitative method for PET imaging. However, it is complex
to implement.

Title : Alzheimer’s disease prediction using a machine learning algorithm

Authors : Neelaveni and Geetha Devasena

Publications : 2020 6th International Conference on Advanced
Computing & Communication Systems (ICACCS)

Concept Discussed:

This paper uses machine learning algorithms, namely Support Vector Machine
and Decision Tree, to predict Alzheimer’s disease using psychological
parameters such as MMSE, age, number of visits and education.

Work Done:

The R-fMRI scans are preprocessed after being taken from the database.
The feature selection includes volumetric and thickness measurements.
The SVM achieved an accuracy of 85% and the Decision Tree 83%.

Problem identified:

The problem is to classify between different subjects using machine learning
algorithms like Support Vector Machine, Naïve Bayes and K-nearest neighbor.

Knowledge Gained:

Machine learning algorithms like Support Vector Machine and Decision Tree
were used, and they predict the disease with different accuracies. Each
algorithm is trained on 70% of the dataset and tested on the remaining 30%.
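A 70%/30% hold-out split of this kind can be sketched in a few lines. This is only an illustration under stated assumptions, not the reviewed paper's procedure: the function name `holdout_split`, the fixed seed and the stand-in records are all hypothetical.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records reproducibly and split them into train and test parts."""
    shuffled = list(records)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten stand-in records (hypothetical): 7 go to training and 3 to testing.
train, test = holdout_split(range(10))
```

Shuffling before splitting matters when the records are ordered, for example by visit date or by diagnosis, since an unshuffled split would then give unrepresentative train and test sets.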

Gap:
The source of the R-fMRI images was not mentioned, and only a small number of
parameters and ML classification techniques were used.


Title : A Novel AI-Based System for Detection and Severity Prediction of
Dementia Using MRI
Author : Varun Jain, Om Nankar, Daryl Jacob Gerrish, Shilpa Gite, Shruti
Patil, and Ketan Kotecha
Publication : 2021 IEEE Journal of Biomedical and Health Informatics

Concept Discussed:

This model can predict MCI with an accuracy of 74% and can classify
dementia into four categories depending upon its prominence in the MRI scan.
This novel approach helps verify the differentiating features of the MRI scans
learned by the CNN model during training. The authors have also utilized
Visual Explainable A.I. (XAI) and have used Grad CAM to visually represent
the internal working of the model.

Work Done:

The CNN model was evaluated on MRI datasets from ADNI. Feature extraction
and prediction of Alzheimer’s disease were carried out, and the CNN achieved
an accuracy of 85%.

Problem identified:

Dementia is a symptom of Alzheimer's Disease (AD) that affects many people
around the globe each year. There is no effective cure for this disease, and
it can prove deadly to the patient if left untreated or undetected.

Knowledge Gained:

Since there is a need to explore multiclass classification in dementia with
a sufficient number of MRI images, the paper presented a novel D-BAC system
that uses a GAN-based data augmentation technique.

Gap:

The paper addressed the critical problem of imbalanced datasets using GAN
augmentation to balance the class labels and created a newly balanced dataset.

Title : Classification and Visualization of Alzheimer’s Disease using
Volumetric Convolutional Neural Network and Transfer Learning

Authors : Kanghan Oh, Young Chul-Chung, Ko Woon Kim, Woo Sung Kim &
II-Seok Oh

Publications : Journal of Ambient Intelligence and Humanized
Computing (2022)

Concept Discussed:

This paper used convolutional auto encoder (CAE)-based unsupervised
learning for the AD vs. NC classification task, and supervised transfer
learning to solve the pMCI vs. sMCI classification task, with the ADNI
dataset. The results demonstrated that the proposed approach achieved
accuracies of 86.60% and 73.95% for the AD and pMCI classification tasks
respectively.

Work Done:

The aim of this paper is to find a way to encourage the end-to-end learning
of a CNN-based model for AD/NC/MCI classification, so that it ultimately has
the capacity to obtain and analyze an explainable visualization map without
human intervention. The authors devised a notion of the end-to-end learning
hierarchy, and their work was built upon level 3, for which only intensity and
spatial normalization are considered. The authors believed that this method is
capable of exploiting the full ability of CNNs.

Problem identified:

To date, the analysis of neuroimaging data, such as those obtained from
magnetic resonance imaging (MRI), positron emission tomography, functional
MRI (fMRI), and diffusion tensor imaging, has primarily been performed by
experts such as radiologists and physicians, thus requiring a high degree of
specialization.

Knowledge Gained:

Knowledge was gained about the gradient visualization method, the
convolutional auto encoder (CAE) and CNNs.

Gap:

This project has several limitations: First, as the number of subjects used
for the training and test phases was still small for encouraging end-to-end
learning, any performance improvement compared with the prior conventional
models is limited.

CHAPTER 3

ANALYSIS

3.1 SYSTEM ANALYSIS


3.1.1 Problem definition

The aim is to develop a front-end webpage for early prediction of the disease
from demographic data, clinical data provided by neuro physicians and data
from MRI scans. Our project applies ML classifiers to the OASIS dataset and
records their accuracies while dropping a couple of features at a time, in
different combinations, so that the accuracy of each classifier can be
surveyed under each combination. The webpage then predicts demented,
non-demented or converted from the inputs submitted in its required fields.
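The feature-dropping survey described above can be sketched as a loop over feature combinations. The sketch below is illustrative only and makes several assumptions: a simple nearest-centroid classifier stands in for the report's ML classifiers, the function names are hypothetical, and the six records are toy values invented for the example, not rows of the OASIS dataset.

```python
from itertools import combinations
from statistics import fmean


def nearest_centroid_accuracy(train, test, features):
    """Fit a nearest-centroid classifier on `features` and score it on `test`."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append([x[f] for f in features])
    # One centroid per class: the mean of each kept feature.
    centroids = {
        y: [fmean(col) for col in zip(*rows)] for y, rows in by_label.items()
    }

    def predict(x):
        point = [x[f] for f in features]
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(point, centroids[y])),
        )

    hits = sum(predict(x) == y for x, y in test)
    return hits / len(test)


def survey_dropped_features(train, test, all_features, n_drop=2):
    """Record the accuracy obtained for every combination of dropped features."""
    results = {}
    for dropped in combinations(all_features, n_drop):
        kept = [f for f in all_features if f not in dropped]
        results[dropped] = nearest_centroid_accuracy(train, test, kept)
    return results


# Toy records (hypothetical values, NOT taken from the OASIS dataset).
train = [
    ({"MMSE": 29, "nWBV": 0.78, "Age": 70}, "Nondemented"),
    ({"MMSE": 30, "nWBV": 0.76, "Age": 80}, "Nondemented"),
    ({"MMSE": 20, "nWBV": 0.70, "Age": 72}, "Demented"),
    ({"MMSE": 22, "nWBV": 0.68, "Age": 78}, "Demented"),
]
test = [
    ({"MMSE": 28, "nWBV": 0.77, "Age": 74}, "Nondemented"),
    ({"MMSE": 21, "nWBV": 0.69, "Age": 76}, "Demented"),
]

survey = survey_dropped_features(train, test, ["MMSE", "nWBV", "Age"])
```

On this toy data, dropping both MMSE and nWBV leaves only Age, which cannot separate the two classes, so that combination scores lower than the others. The same loop could be repeated once per classifier to build a table of accuracies per classifier and per dropped combination.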

3.1.2 Existing System

In existing models, MRI scan images are uploaded manually to the computer's
storage and then specified in the module by the path at which the image files
are stored. Once the path is specified, the image undergoes processing, which
usually involves Image Restoration, Linear Filtering, Independent Component
Analysis, Pixelation, Grey Scaling, Template Matching, etc. Values are then
obtained from the processed image and trained and tested against a dataset to
determine particular ranges for each definitive value. The model is thus made
ready for detection and detects the presence of the disease using the values
generated from the fed images, of integer and float data types, as inputs.
Finally, the output states whether or not the patient is diagnosed with
dementia.
Disadvantages

• Requirement of Image Processing every single time for extracting values.

• Lengthier processing times.

• Requires more processing power.

3.1.3 Proposed System

The prediction model focuses on developing a front end based on our survey.
The ML classifiers are trained on our dataset, and a survey is developed to
gather the accuracies of the various classifiers while a couple of features
present in the dataset are dropped in various combinations. Through these
efforts, we are able to find the classifier with the highest recorded
accuracy. Values obtained from demographics, clinical tests by
neuro-physicians and MRI scan reports are then submitted for processing at
the backend. Finally, the project predicts whether or not the person is
likely to be affected by dementia.
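The values submitted to the backend can be pictured as a small validated record. The following is a minimal sketch rather than the project's actual code: the class name `PatientRecord`, its field names and the example values are hypothetical. The checks rely only on the facts that an MMSE score lies between 0 and 30 and that nWBV is a fraction of brain volume.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical container for the values entered on the webpage."""

    age: int
    education_years: int
    mmse: int      # Mini-Mental State Examination score
    etiv: float    # estimated total intracranial volume
    nwbv: float    # normalized whole brain volume
    asf: float     # atlas scaling factor

    def __post_init__(self):
        # Reject values that cannot be valid before they reach a classifier.
        if not 0 <= self.mmse <= 30:
            raise ValueError("MMSE score must lie between 0 and 30")
        if not 0.0 < self.nwbv < 1.0:
            raise ValueError("nWBV is a fraction and must lie between 0 and 1")
        if self.etiv <= 0 or self.asf <= 0:
            raise ValueError("eTIV and ASF must be positive")

    def to_feature_vector(self):
        """Return the fields as a flat list of floats, in declaration order."""
        return [float(value) for value in astuple(self)]


# Example values are illustrative only.
record = PatientRecord(age=75, education_years=12, mmse=27,
                       etiv=1500.0, nwbv=0.73, asf=1.17)
features = record.to_feature_vector()
```

Validating at the point of entry keeps obviously mistyped values away from the trained classifier, which would otherwise return a prediction for any numeric input.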
Advantages:

• Easier for psychologists and psychiatrists to use for early
identification of the possibility of a patient having an onset of
Alzheimer’s disease.

• MRI report values and attributes are helpful for surveying the patient.

• Accuracy of Random Forest is generally very high.

• Its efficiency is particularly notable in large datasets.

• ML classifiers are easy to implement and interpret, and efficient to train.

• It allows us to determine the unbiased relationship between two variables
by controlling for the effects of other variables.
3.2 REQUIREMENT ANALYSIS
Requirement analysis, also called requirement engineering, is the process of
determining user expectations for a new or modified product. It encompasses
the tasks of analyzing, documenting, validating and managing software or
system requirements. The requirements should be documentable, actionable,
measurable, testable and traceable, related to identified business needs or
opportunities, and defined to a level of detail sufficient for system design.

3.2.1 Functional Requirements

It is a technical specification requirement for the software product. It is
the first step in the requirement analysis process and lists the requirements
of a particular software system, including functional, performance and
security requirements. The function of the system depends mainly on the
quality of the hardware used to run the software with the given functionality.
Usability
It specifies how easy the system must be to use. It is easy to ask queries in
any format, short or long; the Porter stemming algorithm helps produce the
desired response for the user.
Robustness
It refers to a program that performs well not only under ordinary conditions
but also under unusual conditions. It is the ability of the system to cope
with errors caused by irrelevant queries during execution.
Security
Security is the state of providing protected access to resources. The system
provides high security, as unauthorized users cannot access it.

Reliability
Reliability describes how often the software fails. It is often expressed as
MTBF (Mean Time Between Failures).
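As a worked illustration of the MTBF measure, the sketch below computes it as the total operating time divided by the number of failures observed in that time. The function name and the example figures are hypothetical and are not measurements from this project.

```python
def mean_time_between_failures(total_operating_hours, failure_count):
    """Return MTBF in hours: operating time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return total_operating_hours / failure_count


# Hypothetical figures: 1000 hours of operation with 4 failures.
mtbf_hours = mean_time_between_failures(1000, 4)
```

A higher MTBF means failures are rarer, so the value can be used to compare the reliability of two builds run over the same operating period.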
Compatibility
The system is supported by all recent web browser versions. Using any web
server, such as localhost, gives the system a real-time experience.
Flexibility
The project is flexible in that it has the ability to run in different
environments and to be executed by different users.
Safety
Safety is a measure taken to prevent trouble. Every query is processed in a
secure manner without letting others know one’s personal information.

3.2.2 Non-Functional Requirements

Portability
It is the usability of the same software in different environments. The project
can be run in any operating system.
Performance
These requirements determine the resources required, time interval,
throughput and everything that deals with the performance of the system.
Accuracy
The result of a query is very accurate, and information is retrieved at high
speed. The degree of security provided by the system is high and effective.
Maintainability
Maintainability defines how easy it is to maintain the system, that is, to
analyze, change and test the application. Maintaining this project is simple,
as further updates can be easily done without affecting its stability.
Feasibility Study
The feasibility of the project is analyzed in this phase, and a business proposal
is put forth with a very general plan for the project and some cost estimates.
During system analysis the feasibility study of the proposed system is to be
carried out. This is to ensure that the proposed system is not a burden to the
company. For feasibility analysis, some understanding of the major
requirements for the system is essential.
The feasibility study investigates the problem and the information needs of
the stakeholders. It seeks to determine the resources required to provide an
information systems solution, the cost and benefits of such a solution.
The goal of the feasibility study is to consider alternative information
systems solutions, evaluate their feasibility, and propose the alternative most
suitable to the organization. The feasibility of a proposed solution is evaluated
in terms of its components.
Economic Feasibility
This study is carried out to check the economic impact that the system will
have on the organization. The amount of funds that the company can pour into
the research and development of the system is limited, so the expenditures must
be justified. The developed system was well within the budget, which was
achieved because most of the technologies used are freely available. Only the
customized products had to be purchased.
Technical Feasibility
This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place a
high demand on the available technical resources, as this would lead to high
demands being placed on the client. The developed system must have modest
requirements, as only minimal or null changes are required for implementing
this system.
Social Feasibility
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system efficiently.
The user must not feel threatened by the system, but must instead accept it as a
necessity.
3.3.1 Hardware Specification

System : Core i5 – 4th Gen

Storage : 500 GB SATA SSD


Monitor : 15’’ LED
Input Device : Keyboard, Mouse
Ram : 2GB

3.3.2 Software Specification

Operating System : Windows 10

Coding Language : Python 3.8

Development Environment : PyCharm IDE


CHAPTER 4

DESIGN

4.1 OVERALL ARCHITECTURE

The OASIS longitudinal dataset is preprocessed using a Min-Max Scaler. The samples
are then split into training and testing sets. The ML classifiers are trained on the
training set and evaluated on the testing set. The evaluated model is then deployed
in our front-end webpage to predict the disease.

Fig 4.1 Overall Architecture for entire system


4.2 UML Diagram

Unified Modelling Language (UML) is simply another graphical
representation of a common semantic model. The proposed system has been
designed by using use case diagram, class diagram, sequence diagram,
collaboration diagram, state chart diagram and component diagram.

4.2.1 Use Case Diagram

The use case diagram consists of the actors and the use cases. The actors
of the system are the user, the system holder and the device controller, and the
use cases are authentication, checking credentials, basic ON/OFF, allow/deny
user, storing NLP commands, input through voice commands, deriving data,
intrusion detection and service maintenance. Fig 4.2 describes the use case
diagram for the Adaptive Automation System (AAS).

Fig 4.2 Use case diagram for entire system


Users and non-users can log in to the device through the Admin. The device
controller then decides on the entry of the user. The light can be turned on or
off by the system holder, who can also alert the emergency system if any
poisonous gas is detected or the temperature exceeds the normal level. This is
maintained by system maintenance.

4.2.2 Class Diagram

A class diagram models the static view of an application. Class
diagrams are the only diagrams which can be directly mapped to object-
oriented languages and are thus widely used at the time of construction. They
are used for general conceptual modeling of the structure of the application, and
for detailed modeling that translates the models into programming code. Class
diagrams can also be used for data modeling. The classes in a class diagram
represent both the main elements and interactions in the application, and the
classes to be programmed.

Fig 4.3 Class Diagram for entire system


4.2.3 Sequence Diagram

The control flow between the various participants or entity roles of the
corresponding system, in the form of messages, is represented in the Sequence
Diagram. The participants are represented within rectangular objects. The
swim line or lifeline that is drawn below every participant represents the
lifetime of the corresponding participant. The UML representation of a class is a
rectangle containing three compartments stacked vertically. The top
compartment shows the class’s name. The middle compartment lists the class’s
attributes. The bottom compartment lists the class’s operations, known as the
methods of the class. A class diagram consists of any number of classes
connected by lines, which may have arrows at one or both ends. These lines
define the relationships, also called associations, between the classes. The
lines carry multiplicities to represent the number of instances of the classes.

Fig 4.4 Sequence Diagram for entire system


4.2.4 Collaboration Diagram

A collaboration diagram is one of the interaction diagrams. It consists of
a set of objects related in a particular context and the interactions among
those objects. It can also be described as the set of messages exchanged
among the objects that collaborate in that context.

Fig 4.5 Collaboration Diagram for entire system

4.2.5 Activity Diagram

The activity diagram is another important diagram in UML, used to describe the
dynamic aspects of the system. An activity diagram is basically a flowchart that
represents the flow from one activity to another. An activity can be
described as an operation of the system. The control flow is drawn from one
operation to another using the different components of the activity diagram. Some
of these components are the Start/Stop symbol, Action symbol, Join and
Fork symbols, Decision symbol and Connector symbol.
Fig 4.6 Activity Diagram for entire system
4.2.6 PSEUDOCODE

Gaussian Naïve Bayes:

Input:
    Training dataset T,
    F = (f1, f2, f3, ..., fn) in testing dataset. // values of the predictor variables

Output:
    A class of testing dataset.

Steps:
    1. Read the training dataset T;
    2. Calculate the mean and standard deviation of the predictor variables in each class;
    3. Repeat
           Calculate the probability of fi using the Gaussian density equation in each class;
       Until the probability of all predictor variables (f1, f2, f3, ..., fn) has been calculated.
    4. Calculate the likelihood for each class;
    5. Get the greatest likelihood;
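The Gaussian Naïve Bayes steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the project's code: the function names, the small epsilon added to the standard deviation, and the two-feature sample rows (MMSE, nWBV) are assumptions made for the example. Log-probabilities are summed instead of multiplying densities, which gives the same ranking of classes while avoiding numerical underflow.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Step 2: per-class prior plus mean and standard deviation of each feature."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            # Assumed safeguard: a tiny epsilon avoids division by zero.
            stats.append((mean, math.sqrt(var) + 1e-9))
        model[label] = (len(rows) / len(X), stats)
    return model

def log_gauss(x, mean, std):
    """Logarithm of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def predict_gaussian_nb(model, row):
    """Steps 3-5: score every class and return the one with the greatest likelihood."""
    scores = {}
    for label, (prior, stats) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gauss(x, mean, std) for x, (mean, std) in zip(row, stats))
    return max(scores, key=scores.get)

# Hypothetical (MMSE, nWBV) rows, for illustration only.
X = [[29, 0.78], [30, 0.80], [28, 0.76], [18, 0.68], [20, 0.70], [16, 0.66]]
y = ["Nondemented"] * 3 + ["Demented"] * 3
nb_model = fit_gaussian_nb(X, y)
```

Calling predict_gaussian_nb(nb_model, row) on a new row returns the class label with the highest score.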


Support Vector Machine (SVM):

Candidate SV = {closest pair from opposite classes}
while there are violating points do
    Find a violator
    Candidate SV = candidate SV ∪ violator
    if any α_p < 0 due to addition of c to S then
        candidate SV = candidate SV \ p
        repeat till all such points are pruned
    end if
end while
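The pseudocode above builds the support-vector set incrementally. As a simpler runnable illustration of the same idea of correcting margin violators, the sketch below trains a linear SVM by batch sub-gradient descent on the regularised hinge loss. This is a swapped-in technique, not the incremental procedure listed above, and the function names, hyper-parameters (lam, lr, epochs) and toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on lam/2 * ||w||^2 + mean hinge loss; labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # a violator: inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data.
X_svm = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
y_svm = [1, 1, -1, -1]
w_svm, b_svm = train_linear_svm(X_svm, y_svm)
```

Only points with margin below 1 contribute to the update, which mirrors the role of violators in the pseudocode.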

K-Nearest Neighbor:

1. Load the training and test data
2. Choose the value of K
3. For each point in test data:
    - find the Euclidean distance to all training data points
    - store the Euclidean distances in a list and sort it
    - choose the first k points
    - assign a class to the test point based on the majority of classes present in
      the chosen points.
4. End
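A minimal Python sketch of the K-Nearest Neighbor steps above is given here. The function name, the default k = 3 and the sample rows are assumptions for illustration. In practice the features would be scaled first, since Euclidean distance is dominated by whichever attribute has the largest numeric range.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, point, k=3):
    """Classify one test point by majority vote among its k nearest training points."""
    distances = sorted(
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(row, point))), label)
        for row, label in zip(train_X, train_y))
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Hypothetical (MMSE, nWBV) rows, for illustration only.
knn_X = [[29, 0.78], [30, 0.80], [28, 0.76], [18, 0.68], [20, 0.70], [16, 0.66]]
knn_y = ["Nondemented"] * 3 + ["Demented"] * 3
```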

XG-Boost:

Initialization:
    1. Given training data from the instance space
       S = {(x_1, y_1), ..., (x_m, y_m)} where x_i ∈ X and y_i ∈ Y = {-1, +1}.
    2. Initialize the distribution D_1(i) = 1/m.

Algorithm:
    for t = 1, ..., T do
        Train a weak learner h_t : X → R using distribution D_t.
        Determine weight α_t of h_t.
        Update the distribution over the training set:
            D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t
        where Z_t is a normalization factor chosen so that D_{t+1} will be a distribution.
    end for

Final score:
    f(x) = \sum_{t=1}^{T} α_t h_t(x) and H(x) = sign(f(x))
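The distribution update listed above is the classical AdaBoost-style reweighting scheme, and one round of it can be sketched as follows. The pseudocode does not say how the weight α_t is determined, so the formula used here, α_t = 0.5 ln((1 - ε_t) / ε_t) with ε_t the weighted error, is an assumption taken from standard AdaBoost. The function and variable names are illustrative only.

```python
import math

def boosting_round(D, y, h):
    """One reweighting round.

    D: current distribution over samples, y: true labels, h: weak-learner outputs
    (labels and outputs in {-1, +1}). Requires a weighted error strictly between 0 and 1.
    """
    eps = sum(d for d, yi, hi in zip(D, y, h) if yi != hi)   # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)                  # assumed AdaBoost choice
    unnormalised = [d * math.exp(-alpha * yi * hi) for d, yi, hi in zip(D, y, h)]
    Z = sum(unnormalised)                                    # normalization factor Z_t
    return alpha, [u / Z for u in unnormalised]

y_true = [1, 1, -1, -1]
h_out = [1, 1, -1, 1]          # the last sample is misclassified
D1 = [1 / 4] * 4               # D_1(i) = 1/m
alpha1, D2 = boosting_round(D1, y_true, h_out)
```

After the round, the misclassified sample carries more weight than before, so the next weak learner concentrates on it.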


Linear Regression:

1: Start with random weights: w_1, ..., w_n, b
2: for every point (x_1, x_2, ..., x_n, y) do
3:     for i = 1, 2, ..., n do
4:         Update w_i ← w_i - α(ŷ - y) x_i
5:     Update b ← b - α(ŷ - y)
6: Repeat until error is small
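The update rule above can be sketched in Python as follows. The learning rate, the fixed number of epochs used in place of the "until error is small" test, the random seed and the toy data generated from y = 2x + 1 are all assumptions for illustration.

```python
import random

def sgd_linear_regression(X, y, lr=0.05, epochs=2000, seed=0):
    """Fit y ≈ w.x + b with the per-point updates of the pseudocode."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(X[0]))]   # step 1: random weights
    b = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):                                   # step 6: fixed budget
        for xi, yi in zip(X, y):                              # step 2: every point
            y_hat = sum(wj * xj for wj, xj in zip(w, xi)) + b
            error = y_hat - yi
            w = [wj - lr * error * xj for wj, xj in zip(w, xi)]  # steps 3-4
            b -= lr * error                                   # step 5
    return w, b

X_lin = [[0.0], [1.0], [2.0], [3.0]]
y_lin = [1.0, 3.0, 5.0, 7.0]     # generated from y = 2x + 1
w_lin, b_lin = sgd_linear_regression(X_lin, y_lin)
```

On this noiseless data the learned weight and bias approach 2 and 1 respectively.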

Random Forest:

1: procedure RANDOM FOREST
2:     for t = 1 to T do
3:         Draw n points D_t with replacement from D
4:         Build full decision/regression tree on D_t
               BUT: each split only considers k features, picked uniformly at random
               (new features for every split)
5:         Prune tree to minimize out-of-bag error
6:     end for
7:     Average all T trees
8: end procedure
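The randomisation steps of the procedure above (bootstrap sampling in step 3, the per-split feature subset in step 4, and the aggregation in step 7) can be sketched as below. Tree construction itself is omitted. The helper names are assumptions, and majority voting is shown as the classification counterpart of averaging.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Step 3: draw n indices with replacement; also return the out-of-bag indices."""
    drawn = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(drawn))
    return drawn, out_of_bag

def random_feature_subset(n_features, k, rng):
    """Step 4: the k features that a single split is allowed to consider."""
    return sorted(rng.sample(range(n_features), k))

def majority_vote(tree_predictions):
    """Step 7 for classification: the most common label across the T trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(42)
drawn, oob = bootstrap_indices(10, rng)
features = random_feature_subset(n_features=8, k=3, rng=rng)
```

The out-of-bag indices are the samples a given tree never saw, which is why they can be used to estimate its error in step 5.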
Table 4.1 Attributes from OASIS dataset

Attribute | Description
Subject ID | Unique ID of the patient
MRI ID | Unique ID generated after conducting MRI on the patient
Group | Converted (previously normal but demented later), Demented or Nondemented
Visit | Number of visits to detect dementia status
MR Delay | Not known
Gender | Male, Female
Hand | Handedness (all subjects were right-handed, so this column is dropped)
Age | 18-96
Education | Not known
SES | Socioeconomic status as assessed by the Hollingshead Index of Social Position and classified into categories from 1 (highest status) to 5 (lowest status)
Clinical info:
MMSE | Mini-Mental State Examination score (range is from 0 = worst to 30 = best)
CDR | Clinical Dementia Rating (0 = no dementia, 0.5 = very mild AD, 1 = mild AD, 2 = moderate AD)
Derived anatomic volumes:
eTIV | Estimated total intracranial volume, mm3
nWBV | Normalized whole-brain volume, expressed as a percent of all voxels in the atlas-masked image that are labeled as gray or white matter by the automated tissue segmentation process
ASF | Atlas scaling factor (unitless). Computed scaling factor that transforms native-space brain and skull to the atlas target (i.e., the determinant of the transform matrix)
CHAPTER 5

IMPLEMENTATION

5.1 MODULES

• Dataset Analysis
• Dataset Pre-processing
• Model Training & Testing
• Model Evaluation & Deployment

5.1.1 Dataset Analysis

The dataset used in this survey has been obtained from the widely used
data repository ADNI (Alzheimer’s Disease Neuroimaging Initiative). This
dataset has been widely used, mainly to detect Alzheimer’s disease at the
earliest stages. A detailed description of the ADNI dataset follows.

The ADNI dataset includes data recorded from North American male and
female individuals who are “Cognitively Normal”, or who have “Early Mild
Cognitive Impairment”, “Late Mild Cognitive Impairment” or “Alzheimer’s
Disease”. The dataset used in this survey contains 502 attributes for 1737
participants. It is longitudinal, since it contains data from multiple
visits per patient: ADNI records each individual’s examinations at
different monthly intervals (from 0 to 120 months).

5.1.2 Dataset Pre-Processing

This step contains all the pre-processing functions needed to process the
input dataset. First, the data is split into train and test data files; then
pre-processing such as normalization is applied so that features measured on
different scales become comparable.

Some exploratory data analysis is also performed, such as examining the response
variable distribution, along with data quality checks for null or missing values.
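As an illustration of the normalization step, the following sketch implements min-max scaling (the scaler named in Section 4.1) in plain Python. It is only a sketch: the function names and the sample (Age, MMSE, nWBV) rows are assumptions. The column ranges are learned on the training rows alone and then reused for the test rows, so that no information from the test data leaks into training.

```python
def fit_min_max(rows):
    """Learn the (min, max) of every column from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, ranges):
    """Scale each column to [0, 1] using ranges learned on the training data."""
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0
         for value, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]

# Hypothetical (Age, MMSE, nWBV) training rows, for illustration only.
train = [[60.0, 30.0, 0.80], [75.0, 22.0, 0.70], [90.0, 15.0, 0.65]]
ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
```

Test rows would be passed through apply_min_max with the same ranges; values outside the training range then fall slightly outside [0, 1], which is expected.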
(i) Dataset cleaning

Data cleaning plays a significant role when processing a large number of
datasets from the mental health survey. When invalid or null data is used,
the final results become unreliable; thus all irrelevant or inaccurate
data is removed.

Data cleaning may be performed interactively with data wrangling tools, or
as batch processing through scripting. The datasets are cleansed to obtain
high-quality data from the available datasets.
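A scripted cleaning pass of the kind described above might look like the following sketch, which uses only the Python standard library. The inline CSV sample, the column subset and the helper names are assumptions for illustration. Two common options are shown: dropping incomplete rows, and filling the blanks of one numeric column with its median.

```python
import csv
import io
import statistics

# Hypothetical extract with two missing values.
RAW = """Subject ID,Age,SES,MMSE
S1,75,2,27
S2,80,,22
S3,68,3,
S4,71,1,30
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def drop_incomplete(rows):
    """Keep only rows in which every field is non-empty."""
    return [r for r in rows if all(v.strip() for v in r.values())]

def impute_median(rows, column):
    """Alternative to dropping: fill blanks in one numeric column with its median."""
    known = [float(r[column]) for r in rows if r[column].strip()]
    median = statistics.median(known)
    return [{**r, column: r[column] if r[column].strip() else str(median)}
            for r in rows]

rows = load_rows(RAW)
complete = drop_incomplete(rows)
imputed = impute_median(rows, "SES")
```

Dropping rows is simplest but shrinks a small dataset further, whereas imputation keeps every subject at the cost of introducing estimated values.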

(ii) Feature Extraction:

In this step, feature extraction and selection are performed using methods
from the scikit-learn Python library. For feature selection, methods such as
simple bag-of-words and n-grams, followed by term-frequency weighting such as
TF-IDF (term frequency-inverse document frequency), have been used. Feature
extraction can also be done by finding the correlation among the dataset
attributes using a heat map.
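The correlation values that a heat map displays can be computed as in the sketch below, which evaluates the Pearson correlation coefficient for every pair of columns. The function names and the sample values are assumptions for illustration; plotting the resulting matrix is left to whichever charting library is in use.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                     sum((y - mean_b) ** 2 for y in b))
    return cov / norm

def correlation_matrix(columns):
    """Nested dictionary holding the correlation of every pair of columns."""
    names = list(columns)
    return {p: {q: pearson(columns[p], columns[q]) for q in names} for p in names}

# Hypothetical values, for illustration only.
columns = {
    "eTIV": [1400.0, 1500.0, 1600.0, 1700.0],
    "ASF": [1.25, 1.17, 1.10, 1.03],
    "MMSE": [29.0, 22.0, 30.0, 18.0],
}
corr = correlation_matrix(columns)
```

Pairs with a coefficient close to +1 or -1 carry largely redundant information, which is one way to decide which features can be dropped together.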

5.1.3 Model Training & Testing

The training data is the dataset that is used to train an ML algorithm. It
consists of the sample output data and the corresponding sets of input data that
have an influence on the output.

The training data is run through the algorithm, and the processed output is
compared against the sample output. The result of this comparison is used to
modify the model.

This iterative process is called “model fitting”. The accuracy of the training
dataset or the validation dataset is critical for the precision of the model.
Model training in machine learning is the process of feeding an ML
algorithm with data to help it identify and learn good values for all attributes
involved.

There are several types of machine learning, of which the most
common one is supervised learning.

In this module, classifiers such as Random Forest, Support Vector Machine,
Gaussian Naïve Bayes, K-Nearest Neighbor, Linear Regression and XG-Boost
are used to train the model on the cleaned dataset after dimensionality
reduction.

5.1.4 Model Evaluation and Deployment

In this module, the trained machine learning model is tested using the
OASIS longitudinal .csv dataset file, after establishing connectivity with the
dataset file. The ML classifiers are trained and tested on a 7:3 split of the
data.

This survey then obtains the accuracy of each classifier and identifies the
one with the highest accuracy in comparison with the other classifiers.
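The 7:3 split and the accuracy measurement can be sketched as follows. The function names, the fixed random seed and the placeholder data are assumptions for illustration; accuracy is expressed as a percentage, in line with the results chapter.

```python
import random

def train_test_split_70_30(rows, labels, seed=0):
    """Shuffle the sample indices and keep 70% for training, 30% for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(round(0.7 * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

# Placeholder data: ten single-feature samples with alternating labels.
samples = [[i] for i in range(10)]
targets = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split_70_30(samples, targets)
```

Each classifier would be fitted on X_train and y_train, and its predictions for X_test compared with y_test through accuracy().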

The module containing the classifier code runs in the background
while a front end developed with HTML is used to gather demographic data,
data from clinical tests by a neuro-physician and data from MRI
scans. Demographic data consists of Gender, Age, Symptoms in years and
Socio-economic status. Data from clinical tests by a neuro-physician includes
the Clinical Dementia Rating (CDR) on a scale from 0 to 2 and the Mini-Mental
State Examination (MMSE) score on a scale from 0 to 30. Data from MRI scans
requires inputting the Estimated Total Intracranial Volume, Normalized Whole
Brain Volume and Atlas Scaling Factor (the last of which is unitless). Finally,
the module running these classifiers at the back end and an HTML webpage
served by Flask, which collects this combination of data, are deployed to the
runtime environment.
CHAPTER 6

TESTING

TESTING AND VALIDATION

Testing is the process of evaluating software during the development process or at
the end of the development process to determine whether it satisfies the specified
business requirements. Validation testing ensures that the product actually meets the
client's needs. It can also be defined as demonstrating that the product fulfills its
intended use when deployed in an appropriate environment. Validation testing workflow:
validation testing can be best demonstrated using the V-Model. The software/product
under test is evaluated during this type of testing.

Activities:

• Unit Testing
• Integration Testing
• System Testing
• User Acceptance Test

TESTING LEVELS

Functional Testing

Functional testing is a type of testing which verifies that each function of the
software application operates in conformance with the requirement specification. This
testing mainly involves black box testing, and it is not concerned about the source
code of the application. Every functionality of the system is tested by providing
appropriate input, verifying the output and comparing the actual result with the
expected results. The testing can be done either manually or using automation.
Examples of Functional Testing Types:

• Unit testing

• Smoke testing
• User Acceptance

• Integration Testing

• Regression testing

Non-Functional Testing

Non-functional testing is a type of testing that checks the non-functional
aspects of a software application. It is explicitly designed to test the
readiness of a system as per non-functional parameters, which are never
addressed by functional testing. A good example of a non-functional test would
be to check how many people can simultaneously log in to the software.
Non-functional testing is as important as functional testing and affects
client satisfaction.

Examples of Non-functional Testing Types

• Performance Testing

• Stress Testing

• Scalability

• Usability Testing

• Load Testing

DIFFERENT STAGES OF TESTING

Unit Testing

Unit testing is a level of software testing where individual units/
components of a software are tested. The purpose is to validate that each unit of
the software performs as designed. A unit is the smallest testable part of any
software. It usually has one or a few inputs and usually a single output. In
procedural programming, a unit may be an individual program, function,
procedure, etc. In object-oriented programming, the smallest unit is a method,
which may belong to a base/super class, abstract class or derived/child class.
(Some treat a module of an application as a unit. This is to be discouraged as
there will probably be many individual units within that module.) Unit testing
frameworks, drivers, stubs, and mock/fake objects are used to assist in unit
testing.

Unit Testing Benefits:

Unit testing increases confidence in changing and maintaining code. If good
unit tests are written and run every time any code is changed, any defects
introduced by the change can be caught promptly. Also, because code has to be
made less interdependent for unit testing to be possible, the unintended
impact of changes to any code is smaller. Code also becomes more reusable: in
order to make unit testing possible, code needs to be modular, and modular
code is easier to reuse.

Integration Testing

Integration Testing is a level of software testing where individual units
are combined and tested as a group. The purpose of this level of testing is to
expose faults in the interaction between integrated units. Test drivers and test
stubs are used to assist in Integration Testing.

System Testing

System Testing is a level of software testing where a complete and
integrated software is tested. The purpose of this test is to evaluate the system’s
compliance with the specified requirements. The ISTQB definition of system
testing is: the process of testing an integrated system to verify that it meets
specified requirements. Analogy: during the process of manufacturing a
ballpoint pen, the cap, the body, the tail, the ink cartridge and the ballpoint are
produced separately and unit tested separately. When two or more units are
ready, they are assembled and Integration Testing is performed. When the
complete pen is integrated, System Testing is performed.
BUILD THE TEST PLAN

Any project can be divided into units that can be processed further in
detail. A testing strategy is then carried out for each of these units.
Unit testing helps to identify possible bugs in an individual component, so
that the faulty component can be identified and its errors rectified.

6.1 TEST CASES

Table 6.1 Test cases design

S.No | Test Case ID | Test Description | Test Procedure | Test Input | Expected Result | Actual Result
1 | S101 | To check whether the site is opening properly or not. | Open a web browser or search engine on the user's laptop. | Click the Search button in the web browser after entering the URL. | Web application should open with the expected output screen. | Application opens with the expected output.
2 | S102 | To make sure the user enters the gender, age and symptoms in years. | Click on the given fields to enter Gender, Age and Symptoms in Years and to provide Socio-Economic status. | Enter the Male/Female option, Age and Symptoms in Years in integer format, and select the suitable option for Socio-Economic status. | The given inputs should suffice for execution in the backend. | The given inputs are stored at the backend.
3 | S103 | To make sure that the user provides the Clinical Diagnosis Rating and Mini Mental State Examination data for clinical tests conducted by a neuro-physician. | Choose the required value from the range of ratings for the Clinical Diagnosis Rating. Click on the field for providing the Mini Mental State Examination score. | Select the suitable option for the Clinical Diagnosis Rating. Enter the Mini Mental State Examination score in integer format. | The given inputs should suffice for execution in the backend. | The given inputs are stored at the backend.
4 | S104 | To make sure that the user provides the eTIV, nWBV and ASF present in the MRI scan data. | Click on the fields for providing eTIV, nWBV and Atlas Scaling Factor. | Enter the values for eTIV, nWBV and Atlas Scaling Factor in the proper format, as present in the MRI scan report. | The given inputs should suffice for execution in the backend. | The given inputs are stored at the backend.
5 | S105 | To check whether the user clicks on the submit button to know whether the patient is prone to be affected by dementia. | Submit the data for predicting whether the patient is demented or not. | Click on the submit button to process the given data. | Once the data is processed after submission, the prediction result will be visible to the user. | The prediction result is visible to the user.
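Test cases S102 to S104 depend on the form values being in the expected format and range. A server-side check along those lines could be sketched as below. The field names (gender, mmse, cdr, etiv, nwbv, asf) and the error messages are hypothetical, while the permitted CDR values and the MMSE range are the ones listed in Table 4.1.

```python
ALLOWED_CDR = {0.0, 0.5, 1.0, 2.0}   # scale listed in Table 4.1

def validate_form(form):
    """Return a list of error messages; an empty list means the inputs are usable.

    `form` maps hypothetical field names to the raw strings submitted by the page.
    """
    errors = []
    if form.get("gender") not in ("M", "F"):
        errors.append("gender must be M or F")
    try:
        if not 0 <= int(form.get("mmse", "")) <= 30:
            errors.append("MMSE must be between 0 and 30")
    except ValueError:
        errors.append("MMSE must be an integer")
    try:
        if float(form.get("cdr", "")) not in ALLOWED_CDR:
            errors.append("CDR must be 0, 0.5, 1 or 2")
    except ValueError:
        errors.append("CDR must be a number")
    for field in ("etiv", "nwbv", "asf"):
        try:
            if float(form.get(field, "")) <= 0:
                errors.append(field + " must be positive")
        except ValueError:
            errors.append(field + " must be a number")
    return errors

good_form = {"gender": "F", "mmse": "27", "cdr": "0.5",
             "etiv": "1500", "nwbv": "0.74", "asf": "1.17"}
```

Rejecting out-of-range values before they reach the classifier keeps the prediction step from silently scoring impossible inputs.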
6.2 TEST LOG
Table 6.2 Test log

S.No | Test ID | Test Description | Test Status (PASS/FAIL)
1 | S101 | To check the user interface for getting proper inputs. | PASS
2 | S102 | To obtain the demographic data such as Gender, Age, Symptoms in years and Socio-Economic status. | PASS
3 | S103 | To obtain the clinical diagnosis rating and mini mental state examination data from a neuro-physician. | PASS
4 | S104 | To obtain the eTIV, nWBV & ASF values. | PASS
5 | S105 | To submit all the obtained values and to click on the submit button. | PASS
6 | S106 | To get the result whether the individual is prone to get affected by dementia or not. | PASS
7 | S107 | To go back to the home page for performing the same for another individual. | PASS
CHAPTER 7

RESULTS AND DISCUSSION

Various experiments were conducted to identify an efficient way to predict
Alzheimer’s disease in advance. The three datasets were analyzed to identify the
best ML technique, and the efficiency of the various ML techniques was evaluated.
The results and discussion below show the areas that are improved or more
efficient in the proposed system.

Random forest vs other ML techniques

A random forest is a machine learning technique that is used to solve regression
and classification problems. It utilizes ensemble learning, a technique that
combines many classifiers to provide solutions to complex problems. A random
forest consists of many decision trees, and it establishes its outcome from the
predictions of those trees: by averaging their outputs for regression, or by
majority vote for classification. Increasing the number of trees generally makes
the outcome more stable. A random forest mitigates the limitations of a single
decision tree: it reduces overfitting and increases precision. In the experiments
reported here, it was more accurate than the other ML algorithms. It also provides
an effective way of handling missing data and gives a reasonable prediction without
hyper-parameter tuning. In every tree of a random forest, a subset of features is
selected randomly at each node’s splitting point. The comparison of ML techniques
shows that Random Forest gives better results than the other techniques. Finally,
the accuracy of the ML techniques is evaluated by dropping important features of
the dataset, as shown below.
TABULATIONS

The results obtained by dropping pairs of features from the OASIS
dataset are discussed below.

Table 7.1: Accuracies (%) generated by dropping features

Algorithm / Dropped Features | eTIV & ASF | eTIV & nWBV | ASF & CDR | MMSE & SES | CDR & SES | nWBV & Age | Group & ASF | Group & Age
Random Forest | 99.1 | 98.9 | 98.2 | 97.2 | 96.05 | 95.8 | 90.09 | 89.19
Support Vector Machine | 50.45 | 68.47 | 50.45 | 50.45 | 50.45 | 50.45 | 50.45 | 50.45
Gaussian Naïve Bayes | 97.3 | 97.4 | 97.6 | 96.8 | 95.5 | 95.4 | 90.09 | 89.19
K-Nearest Neighbor | 44.14 | 83.79 | 55.86 | 53.19 | 48.65 | 48.75 | 48.65 | 60.36
Logistic Regression | 91.89 | 98.1 | 89.19 | 92.79 | 69.37 | 89.19 | 70.27 | 89.19
XG-Boost | 98.1 | 96.1 | 97.9 | 95.8 | 95.89 | 95.1 | 90.09 | 89.19

Table 7.2: Accuracies (%) generated by dropping features

Algorithm / Dropped Features | EDUC & MRI ID | Sub ID & Visit | MR Delay & Visit | EDUC & SES | MMSE & MR Delay | MRI ID & Sub ID | M/F & Visit
Random Forest | 88.78 | 88.13 | 87.57 | 87.12 | 86.87 | 85.97 | 85.25
Support Vector Machine | 50.45 | 50.45 | 50.45 | 50.45 | 50.45 | 50.45 | 50.45
Gaussian Naïve Bayes | 87.83 | 88.1 | 86.87 | 83.76 | 86.07 | 85.04 | 85.14
K-Nearest Neighbor | 50.45 | 47.75 | 55.86 | 54.05 | 53.15 | 50.45 | 48.65
Logistic Regression | 76.58 | 72.07 | 86.4 | 82.7 | 82.99 | 79.28 | 71.17
XG-Boost | 85.71 | 88.1 | 85.6 | 86.62 | 85.45 | 84.76 | 84.95

This survey reveals that the Random Forest classifier achieved the highest
accuracy, 99.1%, among all the ML classifiers when pairs of features were dropped
in different combinations.
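The comparison can be reproduced from the tabulated figures with a few lines of Python. The sketch below copies the accuracies from Table 7.1 into a dictionary (the variable names are illustrative) and reports, for each classifier, its best and its mean accuracy over the eight feature pairs.

```python
# Accuracies (%) copied from Table 7.1, one list per classifier.
TABLE_7_1 = {
    "Random Forest": [99.1, 98.9, 98.2, 97.2, 96.05, 95.8, 90.09, 89.19],
    "Support Vector Machine": [50.45, 68.47, 50.45, 50.45, 50.45, 50.45, 50.45, 50.45],
    "Gaussian Naive Bayes": [97.3, 97.4, 97.6, 96.8, 95.5, 95.4, 90.09, 89.19],
    "K-Nearest Neighbor": [44.14, 83.79, 55.86, 53.19, 48.65, 48.75, 48.65, 60.36],
    "Logistic Regression": [91.89, 98.1, 89.19, 92.79, 69.37, 89.19, 70.27, 89.19],
    "XG-Boost": [98.1, 96.1, 97.9, 95.8, 95.89, 95.1, 90.09, 89.19],
}

best_per_model = {name: max(values) for name, values in TABLE_7_1.items()}
mean_per_model = {name: sum(values) / len(values) for name, values in TABLE_7_1.items()}

best_model = max(best_per_model, key=best_per_model.get)
best_on_average = max(mean_per_model, key=mean_per_model.get)
```

Looking at the mean as well as the maximum guards against picking a classifier that does well on only one feature combination.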
GRAPHS
The graphs were drawn for each combination of dropped features, based on the
tabulation above.
[Bar chart: accuracy (%) per classifier, eTIV & ASF dropped]
Fig 7.1 Accuracies generated by dropping eTIV & ASF

[Bar chart: accuracy (%) per classifier, eTIV & nWBV dropped]
Fig 7.2 Accuracies generated by dropping eTIV & nWBV

[Bar chart: accuracy (%) per classifier, ASF & CDR dropped]
Fig 7.3 Accuracies generated by dropping ASF & CDR

[Bar chart: accuracy (%) per classifier, MMSE & SES dropped]
Fig 7.4 Accuracies generated by dropping MMSE & SES

[Bar chart: accuracy (%) per classifier, CDR & SES dropped]
Fig 7.5 Accuracies generated by dropping CDR & SES

[Bar chart: accuracy (%) per classifier, nWBV & Age dropped]
Fig 7.6 Accuracies generated by dropping nWBV & Age

[Bar chart: accuracy (%) per classifier, Group & ASF dropped]
Fig 7.7 Accuracies generated by dropping Group & ASF

[Bar chart: accuracy (%) per classifier, Group & Age dropped]
Fig 7.8 Accuracies generated by dropping Group & Age

[Bar chart: accuracy (%) per classifier, EDUC & MRI ID dropped]
Fig 7.9 Accuracies generated by dropping EDUC & MRI ID

[Bar chart: accuracy (%) per classifier, Sub ID & Visit dropped]
Fig 7.10 Accuracies generated by dropping Sub ID & Visit

[Bar chart: accuracy (%) per classifier, MR Delay & Visit dropped]
Fig 7.11 Accuracies generated by dropping MR Delay & Visit

[Bar chart: accuracy (%) per classifier, EDUC & SES dropped]
Fig 7.12 Accuracies generated by dropping EDUC & SES

[Bar chart: accuracy (%) per classifier, MMSE & MR Delay dropped]
Fig 7.13 Accuracies generated by dropping MMSE & MR Delay

[Bar chart: accuracy (%) per classifier, MRI ID & Sub ID dropped]
Fig 7.14 Accuracies generated by dropping MRI ID & Sub ID

[Bar chart: accuracy (%) per classifier, M/F & Visit dropped]
Fig 7.15 Accuracies generated by dropping M/F & Visit


CHAPTER 8

USER MANUAL

Installing Python

Step 1: A Python 3.8 setup pop-up window will appear.

Step 2: Ensure that the Install for all users radio button is selected.

Step 3: Click Next > button.

Step 4: A new Python 3.8 setup pop-up window, Select Destination, will appear.

Step 5: The default directory will appear in the bottom as C:\Python37\

Step 6: Click the Next > button.

Step 7: A new Python 3.8 setup pop-up window will appear.

Step 8: Use the default customization, which selects the Python Interpreter and
all its libraries (about 50 MB).

Step 9: Click the Next > button.

Step 10: Click the Yes button on the following window.

Step 11: A new Python 3.8 setup pop-up window will appear.

Step 12: Click the Finish button.

Installing Anaconda Navigator

Step 1: Download Anaconda Navigator .exe or .zip(extract) from the internet.

Step 2: After downloading, click on the .exe file, an Anaconda Navigator Setup
pop-up will appear.
Step 3: Accept the Terms and Conditions along with End User License Agreement.

Step 4: Check the mandatory check-boxes throughout the setup wizard.

Step 5: Leave the remaining default configurations in the navigator setup as preset.
Step 6: Specify the path where the navigator is to be accessed.

Step 7: Click Finish once the setup ends.

Step 8: Now the Anaconda Navigator is installed and is ready for use.

Installing PyCharm IDE

Step 1: Download latest PyCharm IDE .exe or .zip(extract) from the internet.

Step 2: After downloading, click on the .exe file, a Setup pop-up will appear.

Step 3: Accept the Terms and Conditions along with End User License Agreement.

Step 4: Check the mandatory check-boxes throughout the setup wizard.

Step 5: Leave the remaining default configurations in the IDE setup as preset.

Step 6: Specify the path where the IDE is to be accessed.

Step 7: Click Finish once the IDE setup ends.

Step 8: Now the PyCharm IDE is installed and is ready for use.

Executing the modules

Step 1: Open PyCharm IDE, click on Settings, navigate to Python Interpreter
and specify the interpreter by mapping the Anaconda Navigator application’s
file path.

Step 2: Install all the necessary open-source packages available through the
PyCharm package library.

Step 3: After the packages are installed, the necessary ones can be imported
in the code file.

Step 4: Open the code file and compile it to check whether there are any errors.

Step 5: Multiple code files are used across modules. Make sure the necessary
packages are installed. Once the preliminary checks are done, compile
and run the files.

Step 6: While running the files containing a web app or webpage, make sure they
are run in a proper environment. Here the Flask package is imported and
used for hosting the web page.

Step 7: After the various modules are executed, the desired outputs are made
visible to the user. In the case of the webpage, click on the localhost
hyperlink shown in the execution window, and the webpage will become
available for use.

Step 8: Enter the demographic data, the data from clinical tests by a neuro-physician
and the data from the MRI scan report.

Step 9: Enter proper values that are within the applicable range.

Step 10: Once the values are entered, click on the submit button. Another page
then follows, showing the predicted result, that is, whether the concerned
individual is demented or non-demented.
CHAPTER 9

CONCLUSION

This project has presented a comparative analysis of ML techniques for
the early detection of Alzheimer’s disease. When all of the classifiers were
tested by dropping pairs of features in different combinations, Random Forest
generated the highest accuracy, 99.1%. As a result, the Random Forest
classifier emerges as the best option for detecting Alzheimer's disease in its
earlier stages.
CHAPTER 10

FUTURE ENHANCEMENT

As a scope for future enhancement, the methods used in building the prediction models can
be improved by using customized classifiers that fuse two or more of the best existing
ones, that is, those that have given the highest accuracy so far. Further, there is room
for improvement in the overall design of the user interface and user experience. A more
finely tuned dataset, subject to availability, can act as a catalyst in developing
prediction models that detect the presence of Alzheimer’s disease earlier and more simply
than the models currently in use. These enhancements may make the system more optimized
and effective to use for confirming a diagnosis of the disease.
APPENDIX I
BASE PAPER I
2020 6th International Conference on Advanced Computing & Communication Systems (ICACCS)

Alzheimer Disease Prediction using Machine Learning Algorithms
J. Neelaveni, M. S. Geetha Devasana
Computer Science and Engineering Department,
Sri Ramakrishna Engineering College-641022
Coimbatore, India
Abstract---Alzheimer disease is the one amongst neurodegenerative disorders. Though the
symptoms are benign initially, they become more severe over time. Alzheimer's disease is a
prevalent sort of dementia. This disease is challenging one because there is no treatment
for the disease. Diagnosis of the disease is done but that too at the later stage only.
Thus if the disease is predicted earlier, the progression or the symptoms of the disease
can be slow down. This paper uses machine learning algorithms to predict the Alzheimer
disease using psychological parameters like age, number of visit, MMSE and education.

Keywords- Alzheimer disease, mild cognitive impairment, machine learning algorithms,
psychological parameters.

I. INTRODUCTION

Alzheimer disease is caused by both genetic and environmental factors, those affects the
brain of a person over time. The genetic changes guarantee a person will develop this
disease.

This disease breaks the brain tissue over time. It occurs to people over age 65. However
people live with this disease for about 9 years and about 1 among 8 people of age 65 and
over have this disease.

MMSE (Mini Mental State Examination) score is the main parameter used for prediction of
the disease. This score reduces periodically if the person is affected. Those people
having MCI have a serious risk of growing dementia. When the fundamental MCI results in a
loss of memory, the situation expects to develop to dementia due to this kind of disease.

There is no treatment to cure Alzheimer's disease. In advanced stages of the disease,
complications like dehydration, malnutrition or infection occurs which leads to death. The
diagnosis at MCI stage will help the person to focus on healthy approach of life, and good
planning to take care of memory loss.

II. LITERATURE SURVEY

Ronghui Ju et.al, suggested method of deep learning along with the brain network and
clinical significant information like age, ApoE gene and gender of the subjects for
earlier examination of Alzheimer’s [1]. Brain network was arranged, calculating functional
connections in the brain region by employing the resting-state functional magnetic
resonance imaging (R-fMRI) data. To produce a detailed discovery of the early AD, a deep
network like autoencoder is used where functional connections of the networks are
constructed and are susceptible to AD and MCI. The dataset is taken from the ADNI
database. The classification model consists of the early diagnosis, initially
preprocessing of raw R-fMRI is done [1]. Then, the time series data (90 × 130 matrix) is
obtained and that indicates blood-oxygen levels in each and every region of brain and
changes a long period. Then, a brain network is built and transformed to a 90 × 90 time
series data correlation matrix. The targeted autoencoder model is used which is a three
layered model which gives intellectual growth of the nervous system then excerpts brain
networks attributes completely [1]. When finite amount of data cases is taken, k-fold
cross verification was implemented mainly to avoid the over fitting complication.

K.R.Kruthika et.al, proposed a method called multistage classifier by using machine
learning algorithms like Support Vector Machine, Naive Bayes and K-nearest neighbor to
classify between different subjects [2]. PSO (particle swarm optimization) which is a
technique that best selects the features was enforced to obtain best features. Naturally
image retrieving process requires two stages: the first stage involves generating features
so that it reproduces the query image and then later step correlates those features with
already gathered in database [2]. The PSO algorithm is used to select the finest
biomarkers that show AD or MCI. The data is
978-1-7281-5197-7/20/$31.00 ©2020 IEEE 101
taken from Alzheimer's disease Neuroimaging Initiative (ADNI) database. The MRI scans are
preprocessed first after taking from the database. The feature selection includes
volumetric and thickness measurements. Then the optimum feature lists were obtained from
PSO algorithm [2]. The Gaussian Naïve Bayes, K-Nearest Neighbor, Support vector machine
was used to distinguish between the subjects. Here a 2 stage classifier was used where in
the initial stage GNB classifier was used to classify the objects between AD, MCI and NC
and in later stages SVM and KNN were used to analyze the object based on the performance
of the initial one [2]. Control Based Image Retrieval was used for retrieving images from
the database.

Ruoxuan Cuia et.al, proposed a model where longitudinal analysis is performed on
consecutive MRI and is essential to design and compute the evolution of disease with time
for the purpose of more precise diagnosis [3]. The actual process uses those features of
morphological anomaly of the brain and the longitudinal difference in MRI and constructed
classifier for distinguishing between the distinct groups. The MRI brain images of 6 time
points that is for consecutive intervals in a gap of six months are taken as inputs from
ADNI database [3]. Then feature learning is done with the 3D Convolutional Neural Network.
The CNN is followed by a pooling layer and have many ways for pooling, like collecting
mean value otherwise the maximal, or definite sequence of neuron in the section. But for
studying the characteristics, the convolutional operation of 2×2×2 is applied so that a
linear combination is studied for pooling of neurons [3]. The fully connected layer has
neurons that produce output of all neurons in a linear combination, which are taken from
preceding layer and then is moved through nonlinearity. Finally for the last fully
connected, a softmax layer is particularly used and then tuned finely for back-propagation
to predict the class probability [3]. The result of each node varies from 0 to 1, and the
total of nodes will always be 1. Finally the classification includes the deep network
construction including the 3D CNN training and RNN model training. Then the results of
fully connective layers are directly mapped using a softmax function [3]. The initial
parameters that were trained by both 3 dimensional CNN and the RNN network are established
and then only the uppermost fully connective layer parameters and the softmax layer that
was used for prediction are adjusted so that the dimensional and longitude features were
united for distinct identification.

Fan Zhang et.al, proposed a multi-modal model where medical images are used for training.
It is done by two separate convolutional neural networks. Here an auxiliary diagnosis
using a deep learning model is used [4]. The two separate independent CNN for extracting
the characteristics from both the MR scans and also the PET scans is used. It is obtained
by a sequence of forward procreation convolution and the downer sampling method. The
outputs cohesion is calculated by correlation analysis for the two networks. The structure
of CNN consists of sampling layer in the down, a convolution row or a fully connective
row, pooling layer and finally the output row [4]. Then it computes the correlation using
Pearson correlation coefficient method in between the prognosis of MRI scans and PET
images. The main idea of the correlation search is for regulating the output of both
neuroimaging examinations whether they were persistent or not. The purpose of identifing
and classifing is finished by using the output layer called softmax logistic regression
method [4]. The benefit of this process is, it merges clinical neuropsychological results
with neuroimaging results.

III. IMPLEMENTATION OF SUPPORT VECTOR MACHINE

SVM is directed study model that classifies by separating the objects using a hyperplane.
It can be used for both classification and regression. The hyperplanes are drawn with the
help of the margins. The main goal is to maximize the distance between the hyperplane and
the margin.

The margins are drawn with the help of support vectors that are belonging to the objects.
The main advantage of SVM is that it can distinguish linear and non-linear objects. Fig.1
shows the steps in predicting the Alzheimer disease using machine learning algorithms.

classifier = svm (formula=age, visit, MMSE, EDUC ., data = train, type = 'C-classification', kernel = 'linear')

The required packages for SVM classifier in R are caret and e1071 packages. The formula
consists of the fields that are considered for prediction. The basic type c-classification
and linear kernel is chosen. They both mostly depend on the data used.

The psychological parameters are given as the input for the classifier. When the
classifier is



trained and given for testing, it predicts the output with an accuracy of 85%.

[Fig 1 Block Diagram: psychological parameters are fed to an SVM classifier and a Decision
tree classifier, followed by a performance comparison (accuracy).]

IV. IMPLEMENTATION OF DECISION TREE

Decision tree is a supervised learning model that uses a set of rules to find a solution.
It can solve any type and variety of problems. It can also be used for both classification
and regression. A small change in the data also can give a great impact on the output.

For a continuous variable regression trees can be used and for categorical variable
classification trees can be used. The decision tree consists of the following nodes:

• Root node: It is the starting point of the tree.
• Internal node: It represents the decision point of the problem that leads to the
  solution.
• Leaf node: They are the final or last nodes of the entire tree.

The algorithm for decision tree classifier is as follows:

model <- rpart (formula = age, visits, MMSE, EDUC~ ., data = alzhe, method = "class")

The formula consists of the fields that are considered for the prediction of the Alzheimer
disease. The method class indicates the classification trees. The packages used here are
party, rpart, and rpart.plot. The package ctree() can also be used to analyze the decision
tree.

V. RESULTS

TABLE I COMPARISON OF PREDICTION ACCURACY OF MACHINE LEARNING ALGORITHMS

ALGORITHM                 ACCURACY
Support Vector Machine    85%
Decision Tree             83%

The dataset has various parameters but only the significant parameters that greatly help
in predicting the disease like MMSE score, age, number of visits, and education of the
patients were used. When the machine learning algorithms like Support Vector Machine,
Decision Tree were used, they predict the disease with different accuracies. Each
algorithm is trained with 70% training dataset and tested with 30% test dataset.

The conduct of the algorithms is compared based on their accuracy. Then the dataset is
partitioned according to that ratio and when the algorithms are compared the best one is
selected and can be used for next stage of prediction.

VI. CONCLUSION

Machine learning approach to predict the Alzheimer disease using machine learning
algorithms is successfully implemented and gives greater prediction accuracy results. The
model predicts the disease in the patient and also distinguishes between the cognitive
impairment.

The future work can be done by combining both brain MRI scans and the psychological
parameters to predict the disease with higher accuracy using machine learning algorithms.
When they are combined, the disease could be predicted with a higher accuracy in the
earlier stage itself.

REFERENCES

[1] K.R.Kruthika, Rajeswari, H.D.Maheshappa, “Multistage classifier-based approach for
    Alzheimer’s Disease prediction and retrieval”, Informatics in Medicine Unlocked, 2019.



[2] Ronghui Ju, Chenhui Hu, Pan Zhou, and Quanzheng Li, “Early Diagnosis of Alzheimer’s
    Disease Based on Resting-State Brain Networks and Deep Learning”, IEEE/ACM
    transactions on computational biology and bioinformatics, vol. 16, no. 1,
    January/February 2019.
[3] Ruoxuan Cuia, Manhua Liu, “RNN-based longitudinal analysis for diagnosis of
    Alzheimer’s disease”, Informatics in Medicine Unlocked, 2019.
[4] Fan Zhang, Zhenzhen Li, Boyan Zhang, Haishun Du, Binjie Wang, Xinhong Zhang,
    “Multi-modal deep learning model for auxiliary diagnosis of Alzheimer’s disease”,
    NeuroComputing, 2019.
[5] Chenjie Ge, Qixun Qu, Irene Yu-Hua Gu, Asgeir Store Jakola, “Multi-stream multi-scale
    deep convolutional networks for Alzheimer’s disease detection using MR images”,
    NeuroComputing, 2019.
[6] Tesi, N., van der Lee, S.J., Hulsman, M., Jansen, I.E., Stringa, N., van Schoor, N. et
    al, “Centenarian controls increase variant effect sizes by an average twofold in an
    extreme case-extreme control analysis of Alzheimer's disease”, Eur J Hum Genet.
    2019;27:244–253.
[7] J. Shi, X. Zheng, Y. Li, Q. Zhang, S. Ying, "Multimodal neuroimaging feature learning
    with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s
    disease", IEEE J. Biomed. Health Inform., vol. 22, no. 1, pp. 173-183, Jan. 2018.
[8] M. Liu, J. Zhang, P.-T. Yap, D. Shen, "View-aligned hypergraph learning for
    Alzheimer's disease diagnosis with incomplete multi-modality data", Med. Image Anal.,
    2017, vol. 36, pp. 123-134.
[9] Hansson O, Seibyl J, Stomrud E, Zetterberg H, Trojanowski JQ, Bittner T, “CSF
    biomarkers of Alzheimer’s disease concord with amyloid-β PET and predict clinical
    progression: A study of fully automated immunoassays in BioFINDER and ADNI cohorts”,
    Alzheimers Dement 2018;14:1470–81.
[10] Van der Lee SJ, Teunissen CE, Pool R, Shipley MJ, Teumer A, Chouraki V, “Circulating
     metabolites and general cognitive ability and dementia: Evidence from 11 cohort
     studies”, Alzheimer’s Dement 2018;14:707–22.
[11] Kauppi Karolina, Dale Anders M, “Combining Polygenic Hazard Score With Volumetric MRI
     and Cognitive Measures Improves Prediction of Progression from Mild Cognitive
     Impairment to Alzheimer’s Disease”, Frontiers in Neuroscience, 2018.
[12] Grassi M, Loewenstein DA, Caldirola D, Schruers K, Duara R, Perna G, “A
     clinically-translatable machine learning algorithm for the prediction of Alzheimer's
     disease conversion: further evidence of its accuracy via a transfer learning
     approach”, Int Psychogeriatr, 2018 14:1–9. doi: 10.1017/S1041610218001618.
[13] Nation, D.A., Sweeney, M.D., Montagne, A., Sagare, A.P., D’Orazio, L.M., Pachicano,
     M. et al, “Blood-brain barrier breakdown is an early biomarker of human cognitive
     dysfunction”, Nat Med. 2019;25:270–276.



APPENDIX II
SCREENSHOT
User Interface for data entry:

User Interface for detecting Demented Individual:


User Interface for detecting Converted Individual:

User Interface for detecting Non-Demented Individual:


APPENDIX III
PUBLICATIONS
Vol-8 Issue-3 2022 IJARIIE-ISSN(O)-2395-4396

Comparative Analysis of Machine Learning Approaches for Early
Detection of Alzheimer’s Disease
Akileshwaran S1, Sathish Kumar R2, Malathi A3, Maheswari M4, Roselin Mary S5
1. Student, Computer Science and Engineering, Anand Institute of Higher Technology,
Chennai, India.
2. Student, Computer Science and Engineering, Anand Institute of Higher Technology,
Chennai, India.
3. Assistant Professor, Computer Science and Engineering, Anand Institute of Higher
Technology, Chennai, India.
4. Assistant Professor, Computer Science and Engineering, Anand Institute of Higher
Technology, Chennai, India.
5. Head of Department, Computer Science and Engineering, Anand Institute of Higher
Technology, Chennai, India.

ABSTRACT

The purpose of this survey is to compare the accuracy of Machine Learning algorithms in predicting
Alzheimer's disease early. Machine Learning algorithms are applied comparatively to a variety of
biomarkers correlated with the disease in order to examine the efficiency of those techniques.
Based on this analysis, the best-performing algorithm can be employed to detect Alzheimer’s
disease in advance. In this paper we look for better ways to predict Alzheimer’s disease
when other chronic conditions are present.

Keywords: Random Forest (RF), Support Vector Machine (SVM), Gaussian Naïve Bayes (GNB),
Logistic Regression (LR), K-Nearest Neighbor (KNN), XG-Boost, Machine Learning (ML), Features,
Classifiers

1.INTRODUCTION
A disease’s likelihood of being treated successfully greatly increases if it is diagnosed as soon
as possible. In recent years, the use of Machine Learning (ML) has surged through all fields of
science, and the field of neurology is undergoing a revolution thanks to it. Medical science has
benefitted from the application of Machine Learning to improve the prediction and detection of
Alzheimer’s disease. Through this effort, we aim to find the most accurate technique for detecting
different brain diseases, which can be employed for future betterment. The disease takes its name
from the doctor who first described it, in a woman whose symptoms were memory loss, language
problems, and unpredictable behavior. When her brain was examined, abnormal clumps and tangled
bundles of fibers were found; research on these findings followed, and the disease was named
Alzheimer's after the doctor. The abnormal protein clumps are known as amyloid plaques and the
tangled fibers as neurofibrillary tangles. The disease starts in the hippocampus and spreads over
the brain; as a result, neurons die and the brain tissue shrinks, eventually causing complete
memory loss.
We also aim to find better ways to manage dementia when other chronic conditions are present. By
developing a prediction model for early detection of Alzheimer's disease, we can give doctors a
better chance of helping asymptomatic patients and preventing further complications. This means
that the diagnosis criteria and treatment plan for Alzheimer’s disease need to be revised.
Determining whether inflammatory reactions are persistent is critical for diagnosing and treating
Alzheimer’s disease.

17365 ijariie.com 4139

2. RELATED WORKS
The types of data used and the efficiency of machine learning approaches in predicting the early
stages of Alzheimer's disease have been highlighted as recent trends in machine learning [1]. The
"MRI and Alzheimer's" dataset, provided by the Open Access Series of Imaging Studies (OASIS)
project, was used to predict Alzheimer's disease or dementia in adult patients; SVM was the best
of the models in that system, with better accuracy, recall, area under the curve, and F1 score
[2]. As rapid progress in neuroimaging techniques has created large-scale multimodal neuroimaging
data, the application of deep learning to early diagnosis and automated categorization of
Alzheimer's disease (AD) has recently attracted a lot of interest [3]. One work presents a deep
convolutional neural network-based pipeline for Alzheimer's disease diagnosis using magnetic
resonance scans, together with a four-way classifier to predict AD [4]. Another proposes a method
for end-to-end learning of a volumetric convolutional neural network (CNN) model for four binary
classifications [5]. Non-amyloid, blood-based biomarker panels have been studied for early
detection of Alzheimer's disease; that work focuses on evaluating the performance of novel markers
such as A2M, ApoE, BNP, Eot3, RAGE, and SGOT in identifying mild cognitive impairment (MCI) [6].
Multiple deep learning and machine learning techniques were reviewed. A novel multiclass
classification strategy uses one-versus-one error-correcting output codes classification and
pairwise t-test feature selection to handle the outlier identification problem [8]. The
pathological hallmarks of AD brains are early-stage β-amyloid oligomers (AβOs) and late-stage Aβ
plaques; the intention of that initiative is to detect Aβ abnormalities in the early and late
phases of Alzheimer's disease [9]. Classical applications such as graph partitioning, graph
visualization, and graph coarsening have recently been utilized in Graph Convolutional Neural
Network (GCNN) architectures to perform graph pooling; the modified GCNN architecture is then used
as a graph signal classifier to detect early-stage Alzheimer's disease [10]. A review of
contemporary machine learning and deep learning approaches for detecting four brain diseases,
namely Alzheimer's disease (AD), brain tumors, epilepsy, and Parkinson's disease, sought to
determine the most accurate technique for detecting different brain diseases that can be used in
the upcoming years [11]. A metabolite-corrected arterial input function (AIF) is required for
quantitative analysis of PET brain imaging data in order to estimate distribution volume and
related outcome measures, but PET studies that collect arterial blood samples add risk, cost,
measurement inaccuracy, and patient discomfort [12]. Machine learning algorithms have used
psychological parameters including age, number of visits, MMSE, and education to predict
Alzheimer's disease, with support vector machine and decision tree techniques [13]. The combined
high-order network (CHON) constructs the FCN by combining static, dynamic, and high-level
information, whereas the GCN is utilized to integrate non-image information to improve the
classifier's performance [14]. ResNet18 and DenseNet201 were utilized to perform the AD multiclass
classification task [15]. A further work offers a survey, analysis, and critical review of recent
work on the early diagnosis of Alzheimer's disease using machine learning techniques [16]. Thus,
by referencing the work done in related articles, we were able to conceive a survey with the help
of our dataset [17] for developing our front-end model for the early detection of Alzheimer’s
disease.

3.EXISTING SYSTEM
Previously, users had to manually enter MRI images into the system, after which values were
calculated using image restoration, linear filtering, pixelation, grey scaling, and template
matching. The values extracted from the image were then trained and evaluated against a dataset to
establish precise ranges for each decisive value. Thus, using the values generated from the input
images as integer and float inputs, the presence of the disease could be detected. Finally, the
output indicates whether or not the patient has been diagnosed with dementia.

4.PROPOSED SYSTEM
Through our survey, we aim to establish a front end. We train supervised ML classifiers on our
dataset and survey the accuracies of the different classifiers. We were able to select the
classifier with the highest accuracy by eliminating a few of the features present in our dataset
in various combinations. We also acquire values from demographics, neuro-physicians' clinical
tests, and MRI scan reports, and submit them to the backend for processing. Finally, we can
forecast whether or not a person is at risk of developing dementia.
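The idea of eliminating features in combinations can be sketched as below. This is a minimal illustration, assuming that features are dropped two at a time as in the result figures; the list holds only the feature names that appear in those figures, not the full set of dataset columns.

```python
from itertools import combinations

# Feature names as they appear in the result figures; the full column
# list of the dataset is not reproduced here.
FEATURES = ["eTIV", "ASF", "nWBV", "CDR", "MMSE", "SES"]


def feature_subsets_after_dropping_pairs(features):
    """Yield (dropped_pair, remaining_features) for every pair of features."""
    for pair in combinations(features, 2):
        remaining = [f for f in features if f not in pair]
        yield pair, remaining


# Each remaining feature list would be used to train and score every classifier.
for dropped, kept in feature_subsets_after_dropping_pairs(FEATURES):
    print("drop", dropped, "-> train on", kept)
```

With six features this enumerates 15 pairs; the paper reports results for five of them.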


5. IMPLEMENTATION
The system is divided into four modules. The first module is Dataset Analysis, the process of
evaluating, cleansing, transforming, and modelling data with the goal of identifying relevant
information, informing conclusions, and supporting decision-making. The second module is Dataset
Preprocessing, which cleans the data and balances the classes using the Synthetic Minority
Oversampling Technique (SMOTE), increasing the accuracy and efficiency of a machine learning
model. The third module is Model Training and Testing, in which supervised classification
algorithms are trained and tested on our dataset. The fourth module is Model Deployment, in which
we developed the user interface.
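The core step of SMOTE, creating a synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbours, can be sketched in plain Python. This is an illustrative simplification under assumed toy data, not the library routine used in the project; the function name and parameters are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Other minority samples, ordered by distance from `base`.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up minority-class samples with two features each.
minority_class = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2)]
new_points = smote_like_oversample(minority_class, n_new=4)
print(new_points)
```

Because every synthetic point is an interpolation, it always lies between two existing minority samples rather than duplicating one of them.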

5.1 MACHINE LEARNING


Machine Learning is the process of training a computer to apply its past experience to solve a
problem given to it. The machine can process and analyze a large amount of data and form
abstractions from it. This allows it to discover correlations and insights that are not readily
apparent to the human eye.

5.2 CLASSIFICATION
Transforming raw data samples into machine-readable data is known as preprocessing or data
cleaning. Deriving the relevant characteristics of the data is called feature extraction. After
the features are extracted, the data can be labeled. The method by which the machine decides how
to label data is called a classifier or classification. The Machine Learning techniques used in
this paper are RF, GNB, SVM, XGB, KNN, and LR. Deep learning, by contrast, uses several layers of
nonlinear processing units. The output of a unit is passed as input to the next unit. Throughout
this ordered flow of data, each level transforms the data it receives into a more condensed form
to be passed to the next level. ML classifiers to detect brain diseases can be classified as
shown in Figure A.

Figure A: Classification of ML

5.3 DATASET


The dataset was collected from the OASIS longitudinal collection in the Kaggle repository. Kaggle
(www.kaggle.com) is a repository containing over 50,000 publicly available datasets. Each record
is labeled as Demented, Non-demented, or Converted. The dataset is an open-source .csv data set
that can be used by anyone. Initially, it consisted of 374 patients' data in rows and features in
15 columns, all of the patients being right-handed and aged 18 to 96 years. Both male and female
patients were present. One hundred of them aged above 60 were diagnosed with very mild to moderate
AD. Each MRI session comprises three to four T1-weighted scans, giving a high contrast-to-noise
ratio. Here, the total volume of the brain and the estimated intracranial volume are used for
analyzing normal aging and Alzheimer's disease.

5.4 ML TECHNIQUES
Machine learning algorithms employ computational methods to "learn" information directly from data
rather than depending on a model based on a predetermined equation. As the number of samples
available for learning grows, the algorithms adaptively improve their performance.

5.4.1 Gaussian Naive Bayes


It calculates membership probabilities for each class, such as the likelihood that a certain
record or data point belongs to that class. The most likely class is the one with the highest
probability. When working with continuous data, a common assumption is that the continuous values
associated with each class follow a normal (or Gaussian) distribution.
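A minimal sketch of this idea follows: each class gets a prior and a per-feature mean and variance, and a new sample is assigned to the class with the highest log posterior. The toy values and label names are made up for illustration and are not taken from the dataset.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        columns = list(zip(*rows))
        model[y] = {
            "prior": len(rows) / len(samples),
            "means": [mean(c) for c in columns],
            # A small constant keeps the variance away from zero.
            "vars": [pvariance(c) + 1e-9 for c in columns],
        }
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for sample x."""
    def log_posterior(stats):
        total = math.log(stats["prior"])
        for value, mu, var in zip(x, stats["means"], stats["vars"]):
            total += -0.5 * math.log(2 * math.pi * var)
            total += -((value - mu) ** 2) / (2 * var)
        return total
    return max(model, key=lambda y: log_posterior(model[y]))


# Toy data: (MMSE-like score, volume-like ratio); values are made up.
X = [(29, 0.78), (28, 0.76), (30, 0.80), (20, 0.68), (18, 0.70), (22, 0.66)]
y = ["nondemented"] * 3 + ["demented"] * 3
nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, (27, 0.77)))
```

Working with log probabilities avoids numerical underflow when many features are multiplied together.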

5.4.2 Support Vector Machine


SVM analyzes data for classification and regression. It creates a hyperplane, or a set of
hyperplanes in a high-dimensional space, that separates the classes. The aim is to maximize the
margin between the hyperplane and the nearest data points.
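The margin mentioned above is the smallest distance from any data point to the hyperplane. The sketch below computes it for a fixed, hypothetical hyperplane and toy points; it illustrates the quantity an SVM maximizes and does not train one.

```python
import math


def signed_distance(weights, bias, point):
    """Signed distance of a point from the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return score / math.sqrt(sum(w * w for w in weights))


def margin(weights, bias, points):
    """Smallest absolute distance from any point to the hyperplane."""
    return min(abs(signed_distance(weights, bias, p)) for p in points)


# Hypothetical 2-D hyperplane x1 + x2 - 3 = 0 and four toy points.
w, b = (1.0, 1.0), -3.0
data = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (2.0, 3.0)]

# The sign of the distance gives the predicted side (class) of each point.
print([signed_distance(w, b, p) > 0 for p in data])
print(margin(w, b, data))
```

Training an SVM amounts to searching for the weights and bias that make this margin as large as possible while keeping the classes on opposite sides.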

5.4.3 K-Nearest Neighbor (KNN)


A distance metric is at the heart of this classifier. The more accurately that metric captures
label similarity, the better. It is a non-parametric technique: a new sample is labeled by the
majority vote, or for ratings the average, of its k nearest neighbors. It is a simple technique to
understand and to implement. A peculiarity of the KNN algorithm is that it is sensitive to the
local structure of the data.
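A minimal sketch of the classifier, assuming Euclidean distance as the metric and made-up two-dimensional points, is given below.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters labeled "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

Because the vote uses only the k closest points, features on very different scales should be normalized first, otherwise the largest-valued feature dominates the distance.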

5.4.4 XG-Boost
Gradient boosting is a machine learning method used in classification and regression tasks, among
others. A loss function is to be minimized, which means each new model should bring the loss lower
than the previous result. Decision trees are used to reduce the loss function. After training, to
predict for a new data point, the outputs of all the constructed trees are combined to give the
final value.
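The boosting loop can be sketched with one-split trees (stumps) on a single feature. This is plain gradient boosting for regression with a squared loss, chosen here for brevity; it is not the XGBoost library, and the data, learning rate, and number of rounds are assumptions.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Find the 1-D threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        l_val, r_val = mean(left), mean(right)
        error = sum((r - l_val) ** 2 for r in left) + sum(
            (r - r_val) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, l_val, r_val)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps to the residuals (negative gradient of squared loss)."""
    base = mean(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, l_val, r_val = fit_stump(xs, residuals)
        stumps.append((threshold, l_val, r_val))
        predictions = [
            p + learning_rate * (l_val if x <= threshold else r_val)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate


def boosted_predict(model, x):
    """Sum the base value and the scaled output of every stump."""
    base, stumps, learning_rate = model
    return base + sum(
        learning_rate * (l if x <= t else r) for t, l, r in stumps
    )


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
gb_model = boost(xs, ys)
print(boosted_predict(gb_model, 2), boosted_predict(gb_model, 5))
```

Each round fits a new stump to what the current ensemble still gets wrong, so the prediction for a new point is the base value plus the scaled contributions of all stumps.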

5.4.5 Logistic Regression
It is a predictive-analysis algorithm that relies on the concept of probability. This method is
used for binary classification problems. It describes the data and the relationship between one
or more nominal or ratio-level independent variables and a dependent binary variable.
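The probability in question comes from the sigmoid function applied to a weighted sum of the inputs. The sketch below fits a single weight and bias by batch gradient descent on made-up, centred data; the learning rate and epoch count are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight and bias for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy, made-up data: one centred feature, label 1 for larger values.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(sigmoid(w * -2.5 + b) < 0.5, sigmoid(w * 2.5 + b) > 0.5)
```

A sample is assigned to the positive class when its predicted probability exceeds 0.5, which is the same as the weighted sum being positive.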

5.4.6 Random Forest
Random Forest is a flexible, easy-to-use Machine Learning algorithm. It is one of the most
effective Machine Learning algorithms and is used for both classification and regression. It is an
ensemble learning method that builds a multitude of decision trees during training and outputs the
mode of their predictions for classification, or the mean for regression.
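The ensemble idea can be sketched with very small trees. In the illustration below each "tree" is a one-split stump trained on a bootstrap sample and a randomly chosen feature, and the forest returns the mode of the votes. The stump rule, the toy data, and the two-class restriction are simplifying assumptions, not the project's model.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Threshold one feature at the midpoint between the two class means.
    Works for at most two classes, which is enough for this toy example."""
    classes = sorted(set(labels))
    if len(classes) == 1:  # bootstrap sample that contains one class only
        return lambda row: classes[0]
    means = {
        c: sum(r[feature] for r, l in zip(rows, labels) if l == c)
        / labels.count(c)
        for c in classes
    }
    low, high = sorted(classes, key=means.get)
    threshold = (means[low] + means[high]) / 2
    return lambda row: low if row[feature] <= threshold else high


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def forest_predict(forest, row):
    """Return the mode (majority vote) of the individual predictions."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Toy data: (MMSE-like score, volume-like ratio); values are made up.
rows = [(29, 0.78), (28, 0.76), (30, 0.80), (20, 0.68), (18, 0.70), (22, 0.66)]
labels = ["nondemented"] * 3 + ["demented"] * 3
forest = fit_forest(rows, labels)
print(forest_predict(forest, (27, 0.77)), forest_predict(forest, (19, 0.67)))
```

Averaging many trees trained on different samples and features reduces the variance of any single tree, which is the usual explanation for the method's strong accuracy.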


6. RESULT AND DISCUSSION


The results obtained by dropping pairs of features from the OASIS dataset are discussed here.

Table-1: Accuracy by dropping couple of features in different combinations.

Table-2: Accuracy results generated by dropping couple of features in different combinations

6.1 GRAPHS FOR RESULTS

The graphs were drawn for combination of dropped features based on the tabulation above.

[Bar charts, one per dropped feature pair. Each plots accuracy on a 0 to 120 axis for Random
Forest, Support Vector Machine, Gaussian Naïve Bayes, K-Nearest Neighbor, Logistic Regression,
and XG-Boost.]

Figure 1: Accuracies generated by dropping eTIV & ASF

Figure 2: Accuracies generated by dropping eTIV & nWBV

Figure 3: Accuracies generated by dropping ASF & CDR

Figure 4: Accuracies generated by dropping MMSE & SES

Figure 5: Accuracies generated by dropping CDR & SES

6.2 RESULTS

Figure 6: User Interface for data entry.

Figure 7: User Interface for detecting Demented Individual


Figure 8: User Interface for detecting Converted Individual

Figure 9: User Interface for detecting Non-Demented Individual

The graphs were plotted from the tabulations mentioned above. Figures 1 to 5 show the accuracies
obtained after dropping the feature pairs eTIV & ASF, eTIV & nWBV, ASF & CDR, MMSE & SES, and
CDR & SES, respectively.

From the bar graphs and tables above, we conclude that, of all the classifiers trained and tested
on the dataset, the Random Forest classifier produced the highest accuracy, 99.1%. This result was
obtained by discarding the combination of eTIV and ASF and using a 7:3 splitting ratio.
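The procedure described above (drop a pair of features, split the data 7:3, train, and measure accuracy) can be sketched as follows. This is a minimal illustration rather than the code used in this work: the data is synthetic, the helper names (`split_7_3`, `accuracy_without`, `ablation_table`) are hypothetical, and the `NearestCentroid` class is a stand-in classifier chosen only so that the sketch runs with the Python standard library. It is not one of the six classifiers compared in the paper.

```python
import random

# Feature names follow the OASIS columns named in the text.
FEATURES = ["SES", "MMSE", "CDR", "eTIV", "nWBV", "ASF"]

# The five dropped-feature pairs plotted in Figures 1 to 5.
PAIRS = [("eTIV", "ASF"), ("eTIV", "nWBV"), ("ASF", "CDR"),
         ("MMSE", "SES"), ("CDR", "SES")]


class NearestCentroid:
    """Stand-in classifier (an assumption, NOT one of the paper's models)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One mean vector per class label.
        self.centroids = {
            label: [s / counts[label] for s in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def closest(row):
            return min(
                self.centroids,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(row, self.centroids[label])
                ),
            )
        return [closest(row) for row in X]


def split_7_3(X, y, seed=0):
    """Shuffle the samples and return a 70% train / 30% test split."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = int(round(0.7 * len(order)))
    train, test = order[:cut], order[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy_without(X, y, dropped, make_model=NearestCentroid, seed=0):
    """Test accuracy (%) after removing the features named in `dropped`."""
    keep = [i for i, name in enumerate(FEATURES) if name not in dropped]
    reduced = [[row[i] for i in keep] for row in X]
    X_tr, y_tr, X_te, y_te = split_7_3(reduced, y, seed)
    predictions = make_model().fit(X_tr, y_tr).predict(X_te)
    correct = sum(p == t for p, t in zip(predictions, y_te))
    return 100.0 * correct / len(y_te)


def ablation_table(X, y, pairs, make_model=NearestCentroid):
    """Map each dropped feature pair to the resulting test accuracy."""
    return {pair: accuracy_without(X, y, pair, make_model) for pair in pairs}


def synthetic_data(n=200, seed=1):
    """Illustrative random data only; it does not resemble OASIS values."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice(["Demented", "Nondemented"])
        shift = 1.0 if label == "Demented" else -1.0
        X.append([rng.gauss(shift, 1.0) for _ in FEATURES])
        y.append(label)
    return X, y
```

Calling `ablation_table(*synthetic_data(), PAIRS)` returns one accuracy per dropped pair. Any object that exposes `fit(X, y)` and `predict(X)` can be passed as `make_model`, so a Random Forest or another of the compared classifiers could be substituted for the stand-in.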

Based on this survey, we developed our front-end model. Figure 6 shows the home page of our User
Interface, which contains fields for input values from three categories of data: demographic data,
data from clinical tests conducted by a neuro-physician, and data from MRI scans. Figures 7, 8, and 9
show the output after all the values are submitted, for a patient detected as demented, converted,
and non-demented, respectively.
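The flow from the data-entry form to a detected class can be sketched as follows. This is a hedged illustration, not the front-end code of this work: the report names the three input categories but does not list the exact fields, so the grouping in `FORM_FIELDS` is an assumption based on the column names of the OASIS longitudinal dataset, and `form_to_features` and `classify_submission` are hypothetical helper names. The `model` argument stands for any trained classifier with a `predict` method.

```python
# ASSUMPTION: field-to-category grouping is illustrative, based on the
# OASIS longitudinal column names; the report does not list the fields.
FORM_FIELDS = {
    "demographic": ["Age", "EDUC", "SES"],
    "clinical": ["MMSE", "CDR"],
    "mri": ["eTIV", "nWBV", "ASF"],
}

# Class labels as used in the "Group" column of the OASIS longitudinal data.
GROUPS = ("Nondemented", "Demented", "Converted")


def form_to_features(form):
    """Flatten submitted form values into one numeric feature vector."""
    features = []
    for category, names in FORM_FIELDS.items():
        for name in names:
            raw = form.get(name)
            if raw is None or str(raw).strip() == "":
                raise ValueError(f"missing {category} field: {name}")
            features.append(float(raw))
    return features


def classify_submission(form, model):
    """Return the predicted group for one submitted form."""
    label = model.predict([form_to_features(form)])[0]
    if label not in GROUPS:
        raise ValueError(f"unexpected label: {label!r}")
    return label
```

Rejecting empty fields before calling the model keeps an incomplete submission from being scored as if it were a complete patient record.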

7. CONCLUSION
In this paper, we have presented a comparative analysis of ML techniques for the early detection of
Alzheimer’s disease. We tested each of our classifiers while dropping pairs of features in different
combinations, and Random Forest generated the highest accuracy, 99.1%, among all of them. As a
result, the Random Forest classifier emerges as the best of the options we evaluated for detecting
Alzheimer's disease in its earlier stages.

8. REFERENCES
[1]. Khan, Aunsia, and Muhammad Usman. "Early diagnosis of Alzheimer's disease using machine
learning techniques: A review paper." 2015 7th International Joint Conference on Knowledge
Discovery, Knowledge Engineering and Knowledge Management (IC3K).
[2]. Bari Antor, Morshedul, et al. "A comparative analysis of machine learning algorithms to predict
alzheimer’s disease." Journal of Healthcare Engineering 2021 (2021).
[3]. Jo, Taeho, Kwangsik Nho, and Andrew J. Saykin. "Deep learning in Alzheimer's disease:
diagnostic classification and prognostic prediction using neuroimaging data." Frontiers in aging
neuroscience 11 (2019): 220.
[4]. Farooq, Ammarah, et al. "A deep CNN based multi-class classification of Alzheimer's disease
using MRI." 2017 IEEE International Conference on Imaging systems and techniques (IST).
IEEE, 2017.
[5]. Oh, Kanghan, et al. "Classification and visualization of Alzheimer’s disease using volumetric
convolutional neural network and transfer learning." Scientific Reports 9.1 (2019): 1-16.
[6]. Eke, Chima S., et al. "Early Detection of Alzheimer's Disease with Blood Plasma Proteins Using
Support Vector Machines." IEEE Journal of Biomedical and Health Informatics 25.1 (2020):
218-226.
[7]. Al-Shoukry, S., T. H. Rassem, and N. M. Makbol. "Alzheimer’s diseases detection by using
deep learning algorithms: a mini-review." IEEE Access 8 (2020): 77131-77141.
[8]. Jimenez-Mesa, Carmen, et al. "Optimized One vs One approach in multiclass classification for
early Alzheimer’s disease and mild cognitive impairment diagnosis." IEEE Access 8 (2020):
96981-96993.
[9]. Dong, Celia M., et al. "Early Detection of Amyloid β Pathology in Alzheimer’s Disease by
Molecular MRI." 2020 42nd Annual International Conference of the IEEE Engineering in
Medicine & Biology Society (EMBC). IEEE, 2020.
[10]. Padole, Himanshu, Shiv Dutt Joshi, and Tapan K. Gandhi. "Graph wavelet-based multilevel
graph coarsening and its application in graph-CNN for alzheimer’s disease detection." IEEE
Access 8 (2020): 60906-60917.
[11]. Khan, Protima, et al. "Machine learning and deep learning approaches for brain disease
diagnosis: Principles and recent advances." IEEE Access 9 (2021): 37622-37655.
[12]. Mikhno, Arthur, et al. "Toward noninvasive quantification of brain radioligand binding by
combining electronic health records and dynamic PET imaging data." IEEE journal of
biomedical and health informatics 19.4 (2015): 1271-1282.
[13]. Neelaveni, J., MS Geetha Devasena, and G. Gopu. "A comparative study on the application of
machine learning algorithms for neurodegenerative disease prediction." Handbook of Decision
Support Systems for Neurological Disorders. Academic Press, 2021. 283-302.
[14]. Jain, Varun, et al. "A Novel AI-Based System for Detection and Severity Prediction of Dementia
Using MRI." IEEE Access 9 (2021): 154324-154346.
[15]. Song, Xuegang, Ahmed Elazab, and Yuexin Zhang. "Classification of mild cognitive
impairment based on a combined high-order network and graph convolutional network." IEEE
Access 8 (2020): 42816-42827.
[16]. Odusami, Modupe, Rytis Maskeliūnas, and Robertas Damaševičius. "An Intelligent System for
Early Recognition of Alzheimer’s Disease Using Neuroimaging." Sensors 22.3 (2022): 740.
[17]. Dataset source - EDA for predicting Dementia -
https://www.kaggle.com/code/sid321axn/eda-for-predicting-dementia/data?select=oasis_longitudinal.csv

INTERNATIONAL JOURNAL OF ADVANCE
RESEARCH AND INNOVATIVE IDEAS IN EDUCATION

The Board of International Journal of Advance Research and Innovative Ideas in Education

IJARIIE is hereby Awarding this Certificate to


AKILESHWARAN S
In Recognition of the Publication of the Paper Entitled
COMPARATIVE ANALYSIS OF MACHINE LEARNING APPROACHES FOR EARLY DETECTION OF
ALZHEIMER’S DISEASE
Published in E-Journal
Volume-8 Issue-3 2022

Paper Id : 17365
ISSN(O) : 2395-4396 Editor In Chief
www.ijariie.com
INTERNATIONAL JOURNAL OF ADVANCE
RESEARCH AND INNOVATIVE IDEAS IN EDUCATION

The Board of International Journal of Advance Research and Innovative Ideas in Education

IJARIIE is hereby Awarding this Certificate to


SATHISH KUMAR R
In Recognition of the Publication of the Paper Entitled
COMPARATIVE ANALYSIS OF MACHINE LEARNING APPROACHES FOR EARLY DETECTION OF
ALZHEIMER’S DISEASE
Published in E-Journal
Volume-8 Issue-3 2022

Paper Id : 17365
ISSN(O) : 2395-4396 Editor In Chief
www.ijariie.com
INTERNATIONAL JOURNAL OF ADVANCE
RESEARCH AND INNOVATIVE IDEAS IN EDUCATION

The Board of International Journal of Advance Research and Innovative Ideas in Education

IJARIIE is hereby Awarding this Certificate to


MALATHI A
In Recognition of the Publication of the Paper Entitled
COMPARATIVE ANALYSIS OF MACHINE LEARNING APPROACHES FOR EARLY DETECTION OF
ALZHEIMER’S DISEASE
Published in E-Journal
Volume-8 Issue-3 2022

Paper Id : 17365
ISSN(O) : 2395-4396 Editor In Chief
www.ijariie.com
INTERNATIONAL JOURNAL OF ADVANCE
RESEARCH AND INNOVATIVE IDEAS IN EDUCATION

The Board of International Journal of Advance Research and Innovative Ideas in Education

IJARIIE is hereby Awarding this Certificate to


MAHESWARI M
In Recognition of the Publication of the Paper Entitled
COMPARATIVE ANALYSIS OF MACHINE LEARNING APPROACHES FOR EARLY DETECTION OF
ALZHEIMER’S DISEASE
Published in E-Journal
Volume-8 Issue-3 2022

Paper Id : 17365
ISSN(O) : 2395-4396 Editor In Chief
www.ijariie.com
INTERNATIONAL JOURNAL OF ADVANCE
RESEARCH AND INNOVATIVE IDEAS IN EDUCATION

The Board of International Journal of Advance Research and Innovative Ideas in Education

IJARIIE is hereby Awarding this Certificate to


DR. ROSELIN MARY. S
In Recognition of the Publication of the Paper Entitled
COMPARATIVE ANALYSIS OF MACHINE LEARNING APPROACHES FOR EARLY DETECTION OF
ALZHEIMER’S DISEASE
Published in E-Journal
Volume-8 Issue-3 2022

Paper Id : 17365
ISSN(O) : 2395-4396 Editor In Chief
www.ijariie.com
REFERENCES:

1. Khan, A., & Usman, M. (2015, November). Early diagnosis of Alzheimer's disease
using machine learning techniques: A review paper. In 2015 7th International Joint
Conference on Knowledge Discovery, Knowledge Engineering and Knowledge
Management (IC3K) (Vol. 1, pp. 380-387). IEEE.
2. Bari Antor, M., Jamil, A. H. M., Mamtaz, M., Monirujjaman Khan, M., Aljahdali,
S., Kaur, M., ... & Masud, M. (2021). A comparative analysis of machine learning
algorithms to predict alzheimer’s disease. Journal of Healthcare Engineering, 2021.
3. Jo, T., Nho, K., & Saykin, A. J. (2019). Deep learning in Alzheimer's disease:
diagnostic classification and prognostic prediction using neuroimaging data.
Frontiers in aging neuroscience, 11, 220.
4. Farooq, A., Anwar, S., Awais, M., & Rehman, S. (2017, October). A deep CNN
based multi-class classification of Alzheimer's disease using MRI. In 2017 IEEE
International Conference on Imaging systems and techniques (IST) (pp. 1-6). IEEE.
5. Oh, K., Chung, Y. C., Kim, K. W., Kim, W. S., & Oh, I. S. (2019). Classification
and visualization of Alzheimer’s disease using volumetric convolutional neural
network and transfer learning. Scientific Reports, 9(1), 1-16.
6. Eke, C. S., Jammeh, E., Li, X., Carroll, C., Pearson, S., & Ifeachor, E. (2020). Early
Detection of Alzheimer's Disease with Blood Plasma Proteins Using Support Vector
Machines. IEEE Journal of Biomedical and Health Informatics, 25(1), 218-226.
7. Al-Shoukry, S., Rassem, T. H., & Makbol, N. M. (2020). Alzheimer’s diseases
detection by using deep learning algorithms: a mini-review. IEEE Access, 8,
77131-77141.
8. Jimenez-Mesa, C., Illán, I. A., Martin-Martin, A., Castillo-Barnes, D., Martinez-
Murcia, F. J., Ramirez, J., & Gorriz, J. M. (2020). Optimized One vs One approach
in multiclass classification for early Alzheimer’s disease and mild cognitive
impairment diagnosis. IEEE Access, 8, 96981-96993.
9. Dong, C. M., Guo, A. S., To, A., Chan, K. W., Chow, A. S., Bian, L., ... & Wu, E.
X. (2020, July). Early Detection of Amyloid β Pathology in Alzheimer’s Disease by
Molecular MRI. In 2020 42nd Annual International Conference of the IEEE
Engineering in Medicine & Biology Society (EMBC) (pp. 1100-1103). IEEE.
10. Padole, H., Joshi, S. D., & Gandhi, T. K. (2020). Graph wavelet-based multilevel
graph coarsening and its application in graph-CNN for alzheimer’s disease detection.
IEEE Access, 8, 60906-60917.
11. Khan, P., Kader, M. F., Islam, S. R., Rahman, A. B., Kamal, M. S., Toha, M. U., &
Kwak, K. S. (2021). Machine learning and deep learning approaches for brain
disease diagnosis: principles and recent advances. IEEE Access, 9, 37622-37655.
12. Mikhno, A., Zanderigo, F., Ogden, R. T., Mann, J. J., Angelini, E. D., Laine, A. F.,
& Parsey, R. V. (2015). Toward noninvasive quantification of brain radioligand
binding by combining electronic health records and dynamic PET imaging data.
IEEE journal of biomedical and health informatics, 19(4), 1271-1282.
13. Neelaveni, J., Devasena, M. G., & Gopu, G. (2021). A comparative study on the
application of machine learning algorithms for neurodegenerative disease prediction.
In Handbook of Decision Support Systems for Neurological Disorders
(pp. 283-302). Academic Press.
14. Jain, V., Nankar, O., Jerrish, D. J., Gite, S., Patil, S., & Kotecha, K. (2021). A novel
AI-based system for detection and severity prediction of dementia using MRI. IEEE
Access, 9, 154324-154346.
15. Song, X., Elazab, A., & Zhang, Y. (2020). Classification of mild cognitive
impairment based on a combined high-order network and graph convolutional
network. IEEE Access, 8, 42816-42827.
16. Odusami, M., Maskeliūnas, R., & Damaševičius, R. (2022). An Intelligent System
for Early Recognition of Alzheimer’s Disease Using Neuroimaging. Sensors, 22(3),
740.
17. Dataset source: EDA for predicting Dementia.
https://www.kaggle.com/code/sid321axn/eda-for-predicting-dementia/data?select=oasis_longitudinal.csv
