Deep Transfer Learning Based Parkinsons Disease Detection Using Optimized Feature Selection

This document presents a research paper that proposes a deep transfer learning model for detecting Parkinson's disease using optimized feature selection from handwriting data. The model uses a genetic algorithm and K-nearest neighbor technique to select optimal features, achieving a detection accuracy over 95%, precision of 98%, AUC of 0.90, and loss of 0.12. This outperforms other machine learning and deep learning baseline models. The paper also reviews related works applying machine learning and deep learning to Parkinson's disease detection using different data types and assessment methods.


Received 3 December 2022, accepted 12 December 2022, date of publication 3 January 2023, date of current version 12 January 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3233969

Deep Transfer Learning Based Parkinson’s Disease Detection Using Optimized Feature Selection
SURA MAHMOOD ABDULLAH 1, THEKRA ABBAS 2, MUNZIR HUBIBA BASHIR 3, ISHFAQ AHMAD KHAJA 3, MUSHEER AHMAD 3, NAGLAA F. SOLIMAN 4, AND WALID EL-SHAFAI 5,6
1 Department of Computer Sciences, University of Technology, Baghdad 10066, Iraq
2 Department of Computer Sciences, College of Science, Mustansiriyah University, Baghdad 14022, Iraq
3 Department of Computer Engineering, Jamia Millia Islamia, New Delhi 110025, India
4 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia
5 Security Engineering Laboratory, Computer Science Department, Prince Sultan University, Riyadh 11586, Saudi Arabia
6 Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt

Corresponding author: Musheer Ahmad ([email protected])


This work was supported by the Deanship of Scientific Research, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia,
under Project PNURSP2023R66.

ABSTRACT Parkinson’s disease (PD) is a chronic neurological disease that progresses slowly and whose symptoms resemble those of other diseases. Early detection and diagnosis of PD are crucial for prescribing proper treatment so that patients can lead productive and healthy lives. The disease is characterized by tremors, muscle rigidity, slowness of movement, and impaired balance, along with other psychiatric symptoms. The dynamics of handwritten records serve as one of the dominant mechanisms supporting PD detection and assessment. Several machine learning methods have been investigated for the early detection of this disease, but most of them rely on handcrafted feature extraction techniques and suffer from low accuracy, which cannot be tolerated when detecting such a chronic ailment. To this end, an efficient deep learning model is proposed to assist in the early detection of Parkinson’s disease. The significant contribution of the proposed model is the selection of the most optimal features, which leads to high accuracy. Feature optimization is performed by a genetic algorithm whose objective function is based on the K-Nearest Neighbour technique. The proposed model achieves a detection accuracy higher than 95%, a precision of 98%, an area under the curve of 0.90, and a loss of only 0.12. The performance of the proposed model is compared with some state-of-the-art machine learning and deep learning-based PD detection approaches to demonstrate the better detection ability of our model.

INDEX TERMS Parkinson’s disease, neurological disorder, handwritten records, transfer learning, deep
learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Mostafa M. Fouda.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023 3511
S. M. Abdullah et al.: Deep Transfer Learning Based Parkinson’s Disease Detection Using Optimized Feature Selection

I. INTRODUCTION
Parkinson’s disease (PD) is an incurable neurological disorder caused by a decrease in dopamine levels in the human brain. Dopamine is a neurotransmitter that helps send messages to the basal ganglia, the part of the brain responsible for movement and coordination control. Dopamine levels decrease when the cells in the basal ganglia that are responsible for the synthesis of dopamine die or become impaired. Parkinson’s symptoms may include tremors, restricted or slow movement (bradykinesia), compromised balance, impaired posture, involuntary movements (dyskinesia), stiff muscles, and speech and writing changes [1]. Parkinson’s disease can prove complex to diagnose since few clinical tests, such as blood tests, are involved. PD is most common in people above 60, but the disease can start early and not be diagnosed until too late. If the disease is detected at an earlier stage, it becomes easier to manage the symptoms and delay the deterioration caused by the disease [2]. The early onset of PD may result in finger tremors and halts during speech and movement. Finger tremors result in changes in handwriting, and thus people with Parkinson’s tend to have small and cramped handwriting. This handwriting is termed “micrographia” and can prove crucial in the early detection of PD. Patients can be diagnosed with PD by finding particular patterns in their handwriting that indicate micrographia or other deformations.

Deep learning methods have excelled at classification problems lately. Deep learning algorithms such as CNNs have achieved state-of-the-art accuracies in classification tasks. CNNs have been used widely for the classification of images, audio, and video. CNNs extract distinctive patterns from the given data to use for the final classification. Their ease of availability and usage makes CNNs an excellent choice for classification problems. Prior research has also shown that deep learning algorithms can work more efficiently than classical machine learning ones because transfer learning can be applied. Transfer learning reuses pre-trained CNNs for new use cases, with one or more layers added at the end [3]. Deep learning architectures include ResNets, EfficientNets, and MobileNets, among others.

Deep learning approaches have been used in the medical field for quite some time now. Deep learning models can interpret medical data such as X-ray images and MRI scans, which proves advantageous for diagnosis. With the advancement of AI over the past decade, its application in the medical field has also seen tremendous growth. In the medical field, the application of AI has great potential and is currently being used to diagnose or predict a variety of diseases. Studies indicate that deep learning methods can be far superior to other high-performing algorithms [4]. Using deep learning approaches for PD detection from handwriting data can prove beneficial, as deep learning methods have reached excellent accuracies. A deep learning architecture can be fed with image data comprising handwritten samples from affected and non-affected people to obtain a diagnosis.

In our proposed model, we use deep transfer learning models, a genetic algorithm, and the k-nearest neighbours technique to develop a system that efficiently classifies subjects as healthy or suffering from Parkinson’s disease by extracting features from handwritten records.

The rest of the paper is organized as follows: Section II highlights the related work done in recent years on Parkinson’s detection. Section III describes the materials and methods adopted for the proposed work. The dataset used for this study is described, followed by an explanation of the proposed framework, in Section IV. The experimental results and analysis of the proposed model, including the comparative analysis with existing models, are presented in Section V. Finally, Section VI recapitulates the proposed work as a conclusion.

II. RELATED WORKS
With the advances in deep/machine learning and AI technologies over the past couple of decades, these technologies have gained considerable prominence in various fields. Of late, deep learning methodologies have been used extensively in the medical world as well. The rapid growth of research in AI and deep learning has tremendously increased their market in the medical field for the diagnosis and prognosis of various diseases. Various studies have been conducted to diagnose PD using a variety of datasets. Many symptoms have also been studied for the detection of PD, such as olfactory loss, walking patterns, speech patterns, handwriting tests, and other motor skill tests.

Recently, Fang [5] proposed an improved KNN algorithm based on entropy weight for the detection of PD. The UCI dataset was considered for the study. To estimate the efficiency of this improved algorithm, a comparative analysis against existing approaches was carried out. The KNN (k-nearest neighbors), Random Forest, and Naïve Bayes algorithms were considered to verify the feasibility of the improved algorithm. A 5-fold cross-validation scheme was used. It was observed that, compared with the traditional methods, the improved KNN algorithm based on entropy weight showed a significant increase in accuracy.

Kuplan et al. [6] adopted a novel method for the classification of symptoms of PD using MRI scans. The main goal of the study was to explore more clinical data to demonstrate the efficacy of artificial intelligence for better detection of PD. Three classification tasks were carried out that focused on the stage and major symptoms of Parkinson’s. The symptoms included clinical stage, dementia status, and motor skills. After characterizing every patient based on their current condition, a novel model was introduced that combined the principles of handcrafted textural feature engineering, multiple feature selectors, patch-based learning, and IMV. The model showed outstanding performance for all classification tasks.

Gazda et al. [7] also recently proposed an ensemble of deep learning architectures for the detection of PD from offline handwriting. For this purpose, they used two datasets, namely PaHaW and NewHandPD. To improve the generality of the model, transfer learning was considered. The ensemble classifier consisted of 5 CNN models. Since the PaHaW dataset consists of 8 separate handwriting tasks, the prediction accuracy for all those tasks was calculated. Prediction accuracy was calculated for each specific task via each separate CNN as well as the ensemble classifier, and the results were then compared. To reduce the overall computational cost, an approach based on multiple fine-tuning was adopted. This approach provided competitive results. A detailed study and comparison with other works was provided.

Mohaghegh and Gascon [8] proposed a vision transformer (ViT) for handwritten data to detect Parkinson’s disease based on spiral and meander drawings. Their model comprised three different layers: a PyTorch layer (base model), a dropout layer (dropout value = 0.1), and finally a linear layer as the classifier. They used DeiT, pre-trained on ImageNet and self-supervised with DINO, as the base model. DeiT refers to the data-efficient image transformer, a type of vision transformer used for image classification. Using a 5-fold validation scheme, an accuracy of 92.37% was achieved with a standard deviation of 0.013.

Fratello et al. [9] carried out a study in which data was collected from 9 PD patients and 22 healthy people. The data was collected in collaboration with the Casa di Cura Le Terrazze institute, where every subject but one was right-handed. The participants ranged from 25 to 60 in age. An application was created in this study that helped the authors record data from subjects via a tablet for the recognition of patterns in handwritten data in order to diagnose PD. For classification, three models were proposed for three different kinds of handwritten data. Highly discriminative features were extracted using the Mann–Whitney test [10]. For the first 2 models, a linear SVM approach was considered, whereas for the third model a medium KNN was used. It was observed that the first 2 models provided accuracies of 71.6% and 75.5% respectively. The third model achieved an accuracy of 77.5%.

Loh et al. [11] used Electroencephalography (EEG) signals for the detection of Parkinson’s disease. The approach was considerably different from other approaches that generally used handwritten data for PD analysis. Gabor transforms were used to obtain spectrograms from the EEG signals after splitting them in half. Thus, from each of the EEG signals, two spectrograms were collected. A 2D-CNN based architecture was proposed to classify the spectrograms of the control group, PD patients with medication, and PD patients without medication. The study included 4 kinds of experiments: the first to classify all three groups, and the other three for binary classification of the groups in different combinations, i.e., control group and PD with medication, control group and PD without medication, and lastly PD with medication and PD without medication. The last layer of the proposed model differed according to the output of each experiment. For the first experiment, where the last layer has 3 outputs, a SoftMax function was used, whereas for the binary classification a sigmoid activation was used. The results were examined by applying a 10-fold cross-validation scheme. For experiment 1 the accuracy was the highest, with a value of 99.46%. The 2nd, 3rd and 4th experiments had accuracy values of 99.44%, 98.84% and 92.60% respectively.

Chakraborty et al. [12] developed a system to observe patterns in spirals and waves sketched by patients with PD and eventually detect the disease. For this system, voting ensemble classifiers were used along with two-dimensional CNNs for analyzing the patterns in sketches and detecting Parkinson’s disease. To validate and train their proposed system, a dataset with 204 images was considered. This dataset had 102 images of spiral sketches and an equal number of wave sketches. The complete study comprised 3 different sections. Section 1 comprises a generator section which serves the spiral and wave images of a specific patient. Section 2 defines the CNN architecture that is responsible for producing feature representations. After the features are extracted, they are sent through the final dense layer to obtain the predictions for every image. Section 3 defines the type of meta-classifiers that predict the probabilities and make final predictions. Logistic regression (LR) and Random Forest classifier (RF) are used for the development of the meta-classifiers. Both classifiers take the prediction probabilities of the wave and spiral CNNs as input and produce the outcome as Parkinson’s disease or healthy. The model achieved an accuracy of 93.3%, an average recall of 94% and an average precision of 93.5%. The work successfully achieved a multistage classifier system for the detection of Parkinson’s disease with the help of spiral and meander sketches. They leveraged two systems, namely an ensemble voting classifier and CNNs, which resulted in an average F1 score of 93.94%.

Nõmm et al. [13] presented research based on a group of 34 people equally divided into sub-groups of PD patients and healthy people, therefore having 17 subjects per group. The groups have a mean age of 69 years with a standard deviation of 4 years. For this research, special software was constructed for an iPad Pro equipped with a stylus to collect different data from the patients. The data was in the form of writing and drawing tests. An AlexNet architecture was proposed after the necessary data enhancement and augmentation. The final accuracy was observed to be 93%. The experimental results of their research included the application of deep learning-based networks in the area of Parkinson’s diagnosis.

Tuncer et al. [14] used voice signals to detect PD. A fusion of SVD (singular value decomposition) and minimum average maximum tree (MAMa) is proposed to find unique features in the voice signals. In the pre-processing phase the authors developed a new feature signal from three levels of the MAMa tree. After the feature signal was generated, SVD was applied to it for the extraction of features. The Relief feature selection method is implemented to extract about 50 different features. For classification, the KNN algorithm and 10-fold cross-validation are used. The experimental results show that with the KNN classifier, an accuracy of 92.46% was achieved. The algorithm proposed in the study can also be used for other signals such as ECG, EEG, PCG and EMG, and to detect several other diseases.

Das et al. [15], in their research, tested the performance of multiple CNNs on 2 datasets. The datasets in consideration were taken from Kaggle’s repository. The second dataset in question was provided by the authors of [16]. The images consisted of spiral and wave drawings in the first dataset and hand-drawn cube and triangle images in dataset 2. Two approaches were considered for the study. In the first approach, the CNNs like
VGG19, ResNet50, MobileNet-v2, Inceptionv3, Xception, and Inception-ResNet-v2 were trained from scratch on both datasets, whereas in the second approach transfer learning was applied. The authors investigated the usage of deep convolutional neural networks in the detection of Parkinson’s with the help of hand-drawn images. Fine-tuning gave better results as compared to approaches 1 and 2, which comprised training from scratch and using two shallow neural networks respectively. The models ResNet50 and MobileNet-V2 were found to outperform other leading CNNs.

Johri and Tripathi [17] developed a classifier made up of two modules for cheap and efficient diagnosis of Parkinson’s disease. Two separate datasets, namely the VGFR dataset and the voice impairment dataset, were used. The VGFR dataset is composed of the signals recorded for the reactions of subjects to the vertical ground force. The voice impairment dataset was contributed by Max Little of the University of Oxford and contains voice measurements of 91 people. The proposed deep learning method is used to detect two Parkinson’s symptoms, i.e., gait and speech impairment. The proposed model has 2 modules: the first is the VGFR spectrogram detector, which works on distorted walking styles, and the second is the voice impairment classifier, which is established on the basis of speech distortion of the PD subjects. For the first module, the sensor readings per patient are converted to a spectrogram. This spectrogram shows a pattern which is acquired with the help of these signal values. The 2D spectrogram images are used as the inputs for the CNN. It was observed that for the VGFR module the accuracy came out to be 88.17% and for the voice impairment module the accuracy was 89.15%. In their study, the authors proposed a novel system based on the principles of bi-directional GRUs and 1D convolution. This approach was to detect the distinguishing patterns in the handwritten material acquired from people with and without PD. Promising performance values were achieved by the authors.

Tuncer and Dogan [18] implemented a novel multi-pooling technique for classification that used 8 pooling methods, commonly called an octopus-based method, to solve 3 classification problems. These were gender, PD, and gender + PD classification problems. The aim of using the octopus-based method is to keep the model lightweight, since this method doesn’t use any algorithm to optimize or update any weights during training. SVD and NCA have been utilized by the authors for feature extraction and selection respectively. Several other algorithms like SVM, KNN, logistic regression, and decision trees have been exploited for the classification phase. The results showed that KNN solved the PD problem the best, with a high accuracy. The authors also managed to solve all three problems with only 32 features.

Khatamino et al. [19] classified the HW dataset with the help of a CNN-based approach. The dataset was split into separate sections of spiral and spiral images. Spiral drawings were drawn by using 2D features of the dataset. Rescaling along with normalizing was done on these signals for square transformation. Data was converted into a square matrix and then fed to the proposed CNN model. Finally, the normalized square matrices were sketched. Two datasets were used, SST and DST, and both were made of 72 images, where 57 entries belonged to PD patients and 15 entries belonged to normal subjects. Early stopping was used to reduce the value of the loss and prevent a drop in validation accuracy. A CNN based on LOOCV and K-fold cross-validation was proposed that efficiently responds to the extracted features. Another achievement of the research was obtaining higher accuracies with fewer features. The model achieved over 88% accuracy.

A. LITERATURE GAP
It is important to address that PD is a life-altering disease and can cause long-term suffering. It is crucial to detect it with high precision and accuracy. While the above-mentioned papers present numerous techniques and methods to collect data from control subjects and PD subjects, further research is required to find the best-performing algorithms to finally diagnose PD with the utmost precision and accuracy. The majority of the research is directed towards the working of various leading deep learning models. In addition, common and generally used algorithms like K-nearest neighbours, Naïve Bayes and Random Forests have been exploited for the problem until now. However, there is scope for research in the area of adaptive heuristic algorithms such as the genetic algorithm. The working of heuristic algorithms for PD detection can be elucidated.

B. OUR CONTRIBUTIONS
Based on the literature reviewed and the existing research gaps, a novel transfer learning-based model is proposed for automatic detection of Parkinson’s disease which explores the merits of the genetic algorithm and K-nearest neighbor for efficient detection performance. In this study, the objective function of the GA is based on KNN, which is a distance-based algorithm. The model never learns a discriminative function during the process but only calculates distances between two vectors. By reducing the features, the complexity of traditional KNN is further decreased. Thus, the training complexity is highly reduced as compared to other traditional CNN based models. This paper has the following main contributions compared to the existing models investigated for Parkinson’s disease detection.
• Instead of employing a handcrafted feature extraction technique, an automatic feature extraction model based on transfer learning networks is suggested.
• Multiple transfer learning neural networks are employed to eliminate possible bias from any single TL network for precise and bona fide detection.
• Optimum feature selection out of the extracted stacked features from the TL networks is performed through a genetic algorithm and K-nearest neighbor procedure to achieve more accurate detection, unlike traditional CNN based models.
• The performance is compared with many existing PD detection methods to demonstrate the better performance of the proposed model.

III. MATERIALS AND METHODS
The different materials and methods adopted to develop the proposed PD detection model are described in this section. Primarily, the transfer learning models, K-nearest neighbor classifier, and genetic algorithm optimizer are explored in this proposed model.

A. TRANSFER LEARNING
Transfer learning makes it possible for pre-trained networks to be used for new use cases, which can save resources and improve efficiency. The general idea of transfer learning is to take previously gained knowledge and apply it to a newer problem with different data. Transfer learning also saves a lot of training time since a new model doesn’t need to be trained from scratch. This approach is also fruitful when enough data is not available. It allows a user to apply an entirely new dataset to solve completely different problems. It allows the user to specify the dimensions of the last layers as needed. Not only does the transfer learning approach allow users to change the dimensions of the output layer, it also allows them to fine-tune other hyper-parameters as well as the weights in the other layers of the pre-trained model. Typically, in transfer learning the starting layers are fixed or locked and resistant to any change, whereas the last layers are adjustable.

B. KNN CLASSIFIER
The k-nearest neighbor or k-NN algorithm is considered to be one of the most straightforward and uncomplicated machine learning algorithms. The algorithm is simple because it doesn’t learn any parameters, which is why it can also be called a non-parametric algorithm. The action on the data is performed during the last stages of the algorithm, which is why it is often called a lazy learning technique. The algorithm can be used for classification as well as regression purposes, but its most prominent application is in classification problems. The concepts of this algorithm are easy to understand and apply. The k represents the number of neighboring data points considered around the new data point. The algorithm compares the new data point with its k neighbors and then groups it with the most similar neighbors [20]. The value of k is generated randomly at the beginning of the algorithm and is usually taken within the range of 3-5. The similarity among the data points is determined by means of the distances between them. To be more specific, Euclidean distances are calculated between the new point and its neighbors. The new data point is assigned to the group of the neighbors with the smallest Euclidean distances. There is no specified way to determine an optimum value of k, so some trial and error is always expected. Still, the most commonly used value of k is 5; however, different problems might require changes according to the need. Very small values of k, e.g., 1 or 2, can prove to be noisy and eventually lead to misleading interpretations due to outliers. The KNN is utilized as the objective function during the feature optimization phase, and it is also applied to do the classification task after getting the optimized feature vector.

C. GENETIC ALGORITHM
Genetic algorithm (GA) is an adaptive heuristic algorithm whose working is governed by the principles of genes and natural selection. Genetic algorithms, being closely related to the theory of evolution, mimic the concept of natural selection. Natural selection refers to the survival of the species that are able to adapt or mutate according to the changes in their habitat and surroundings. The core idea behind natural selection is “survival of the fittest.” Every generation comprises a different set of individuals commonly known as a population. Every individual of a particular population acts as a point in the search space [21]. There are generally 5 phases of the GA: (1) Initial population, (2) Fitness function, (3) Selection, (4) Crossover, and (5) Mutation. The basic procedure of a GA is shown in Figure 1.

FIGURE 1. Genetic algorithm procedure.

Starting with the initial phase, called the initial population phase: this phase consists of the population in question. Each individual is represented using a unique string. Here, each individual acts as a solution to the problem that has to be solved. Once the population is considered, the next phase, known as the fitness function, starts. The fitness function, as the name suggests, displays the fitness level of an individual. An individual needs to be fit in order to survive in its habitat by being competitive against other individuals. The fitness function provides a score which represents the fitness level. The selection of the individuals is dependent on the fitness score. The third phase is the selection phase, where the fittest individuals are selected to pass on their genes to younger ones. In this phase a couple of individuals are selected on the basis of their fitness scores. These individuals can also be called parents. Parents with high fitness scores are more likely to be considered for reproducing offspring. After the selection phase, a very significant phase takes place, known as the crossover phase. Here random genes or sets of genes are selected from the chromosomes of both parents and are swapped with each other. Thus, the offspring contains half the genes of both parents. This new individual is then added to the existing population.

FIGURE 2. Examples from spiral and meander classes of the patient group.

The last phase of the genetic algorithm is the mutation phase. This mutation refers to slight changes in the genes of a newly created offspring. These changes are subject to the changing patterns of the environment. Mutation occurs so that the new population is able to deal with the changes and thrive, instead of failing to cope with the changes, which could lead to extinction. The algorithm is said to terminate when no new offspring contains any kind of mutation.
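
The interplay between the KNN classifier and the genetic algorithm described in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`knn_predict`, `fitness`, `ga_select_features`), the binary tournament selection, the single-point crossover, the fixed number of generations used as a stopping rule, and all default hyper-parameters are assumptions made only for illustration. Each individual is a boolean mask over feature columns, and its fitness is the validation accuracy of a Euclidean-distance KNN restricted to the masked columns.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=5):
    """Label each test row by majority vote among its k nearest training rows."""
    # Squared Euclidean distance between every test row and every training row.
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # train_y must hold non-negative integer class labels for bincount.
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])


def fitness(mask, train_x, train_y, val_x, val_y, k=5):
    """Fitness of one individual: KNN validation accuracy on the selected features."""
    if not mask.any():  # an empty feature subset cannot classify anything
        return 0.0
    pred = knn_predict(train_x[:, mask], train_y, val_x[:, mask], k)
    return float((pred == val_y).mean())


def ga_select_features(train_x, train_y, val_x, val_y, pop_size=20,
                       generations=10, mutation_rate=0.05, k=5, seed=0):
    """Search for a feature mask with a simple GA (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    n_feat = train_x.shape[1]  # needs at least 2 features for the crossover cut
    # Initial population: random boolean masks, one per individual.
    pop = rng.random((pop_size, n_feat)) < 0.5
    best_mask, best_score = pop[0].copy(), -1.0

    for _generation in range(generations):
        # Fitness function: score every individual in the population.
        scores = np.array([fitness(m, train_x, train_y, val_x, val_y, k)
                           for m in pop])
        top = int(scores.argmax())
        if scores[top] > best_score:
            best_mask, best_score = pop[top].copy(), float(scores[top])

        def pick_parent():
            # Selection: binary tournament, the fitter of two random individuals.
            a, b = rng.integers(pop_size, size=2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = []
        for _child in range(pop_size):
            p1, p2 = pick_parent(), pick_parent()
            # Crossover: genes before the cut come from p1, the rest from p2.
            cut = int(rng.integers(1, n_feat))
            child = np.concatenate([p1[:cut], p2[cut:]])
            # Mutation: flip each gene with a small probability.
            child ^= rng.random(n_feat) < mutation_rate
            children.append(child)
        pop = np.array(children)

    return best_mask, best_score
```

A call such as `mask, score = ga_select_features(train_x, train_y, val_x, val_y)` returns the best mask seen and its validation accuracy; the columns of `train_x` would be the stacked deep features in a setting like the one described here. The fixed generation count replaces the termination condition stated in the text purely to keep the sketch short and guaranteed to stop.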

IV. PROPOSED METHODOLOGY
In this section the proposed methodology for PD detection using handwritten data will be discussed. This section begins with a brief dataset description followed by the elaborated discussion of proposed methodology.

FIGURE 3. Examples from the spiral and meander classes of the control group.
or Patient). The images are resized to 256 × 256 for use in
A. DATASET DESCRIPTION Transfer Learning (TL) models. Because certain TL models
For the experimental part, NewHandPD [22] dataset has been have requirements for image sizes, this is done in advance
taken into consideration. The dataset has been published to prevent issues in the future. The images are read as
and made available to the public for research purposes. The RGB (3 channel images) since TL models normally work
dataset is entirely dedicated to the handwritten specimens on colored images only, as they were trained on colored
which prove to be beneficial for our work. The NewHandPD images. Additionally, image pixels gray values are scaled
contains images collected via a smart pen and a tablet down from 0 to 1 by dividing them by a factor of 256. Reason
respectively. The NewHandPD dataset was introduced by being, the deep learning models work better on values which
Pereira et al. [22]. This dataset is the extension of the HandPD are in the range of 0 and 1.
dataset [23]. There are 594 total images in the data set, 160 of
which are male and 104 of which are female. The Healthy C. PHASE-2: FEATURE EXTRACTION
Group and Patient Group are the two different sorts of groups In this work, feature extraction is carried out using three
that make up the data set. There are 315 samples overall in transfer learning models, specifically the following ones [20].
the healthy group and 279 samples total in the patient group.
In each category, samples are drawn from both males and 1) RESNET50
females. Depending on the sort of drawing that individual The ImageNet dataset was used to train the 50-layer deep
receives; the data set is split into three categories: Circle, neural network known as ResNet50. The network has learned
Meader, and Spiral. Each group has a depiction of the form a wide range of attributes as a result of being trained on
it stands for; for example, in the Circle group, a patient and more than a million images. The network’s input shape
a member who is in good health are asked to draw over (image input size) is 224 × 224 × 3. The network was first
the provided circle. The images of all groups are combined used for computer vision tasks, but as it advanced, it has
in our study into two categories: healthy and patient. The also shown promising performance in non-computer vision
images from all three groups that are healthy make up the applications.
healthy group. Images from all groups obtained from patients
make up the patient group. Some example images from the 2) VGG19
Patient group and healthy group are shown in Figure 2 and 3, Another transfer learning model that was educated on
respectively. ImageNet is VGG19. The network has 19 layers and
was trained using several different characteristics. Although
B. PHASE-1: DATA PREPROCESSING this network has also been utilized for computer vision
Each group’s images are read, and if they fall under the applications, it also performs well in other image-related
same category, they are preserved with that label (Healthy tasks.


3) INCEPTION-V3
The Inception model is based on the depth of the neural network that is employed. The model is made up of 48 symmetric and asymmetric layers, including convolution, pooling, and dropout layers. In terms of accuracy and computational cost, the v3 model outperforms its predecessors (v1 and v2).
These transfer learning models are used in our proposed work by freezing their training and removing their tops. Training of these models is stopped, and features are obtained using the weights from earlier training (ImageNet). The top of each model is a classification layer over the ImageNet classes, which are not necessary for our study. Therefore, we removed the top from each of the TL models and placed our own neural network (top) on each of them in order to employ the TL models effectively. The top is composed of a dense layer on top of a flatten layer. The number of features we extract from each model individually is the number of nodes given to the dense layer. The structure of an individual transfer learning model extracting features from input images is shown in Figure 4, whereas the whole feature extraction process through all three TL models is depicted in Figure 5.

FIGURE 5. Feature extraction process.
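The head described above (a flatten layer followed by a dense layer whose node count sets the number of extracted features) and the combination of the three models can be sketched with NumPy. This is only an illustration under stated assumptions: the backbone outputs are random stand-ins with made-up shapes instead of real frozen ResNet50, VGG19, and Inception-v3 activations, the dense weights are random rather than trained, and `head`, `backbone_outputs`, and `n_features` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

def head(feature_maps, n_features, rng):
    """Flatten each backbone output, then project it with one dense layer."""
    flat = feature_maps.reshape(feature_maps.shape[0], -1)   # flatten layer
    weights = rng.normal(size=(flat.shape[1], n_features))   # dense layer (random stand-in)
    return flat @ weights                                     # shape (n_images, n_features)

n_images, n_features = 4, 100

# Stand-ins for the frozen backbone outputs; shapes are illustrative only.
backbone_outputs = {
    "resnet50": rng.normal(size=(n_images, 2, 2, 8)),
    "vgg19": rng.normal(size=(n_images, 2, 2, 4)),
    "inception_v3": rng.normal(size=(n_images, 1, 1, 8)),
}

per_model = [head(x, n_features, rng) for x in backbone_outputs.values()]

# Horizontal stacking: one row per image, 3 * n_features columns.
stacked = np.hstack(per_model)
```

With `n_features = 100`, each image ends up with a 300-element row, so a dataset of N images yields an N × 300 feature matrix.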
FIGURE 4. Structure of each TL model.

One hundred features are taken from each model. After passing through each TL model, these features are stacked horizontally, giving each input image a final feature vector of length 300 (100+100+100). Following the feature extraction procedure, each image is represented by 300 features, so the entire dataset of 594 images has the shape (594 × 300).
The rationale behind employing three models is to eliminate bias toward any one model in particular. Our algorithm primarily focuses on the optimization process and is input independent due to the variety of features. By using three models we ensure that our study doesn’t produce biased results.

D. PHASE-3: FEATURE OPTIMIZATION
The features that were extracted in the earlier step are fed into this stage. The critical stage of the study occurs here, where a subset of features is chosen using an optimization approach and sent to the machine learning algorithm to assess performance. The results of the optimization process are the features that were chosen. This phase requires a suitable machine learning (ML) algorithm and a proper optimization method. In our study, the accuracy metric serves as the cornerstone of the optimization process, with the machine learning approach acting as the objective function. Since KNN has a low computational cost and is well suited to small datasets, we choose k-Nearest Neighbors as the objective function during the feature optimization phase. Due to its robust performance, the genetic algorithm is chosen as the optimization algorithm.
The feature vector optimization takes place as per the following GA process:
Step 1. Initialize the GA parameters.
Step 2. Generate random population (Initial Population).
Step 3. Calculate the fitness of each member of the initial population.
// calculate accuracy of each feature vector generated
Step 4. While iteration < Max_itr
Step 4.1: Choose two parents at random from the population and perform crossover over the parents. This process is continued until the crossover ratio from the entire population has been reached.
Step 4.2: Choose a parent from the population and perform mutation. This step is also repeated until the mutation ratio from the total population is attained.
Step 4.3: Calculate the fitness of the newly generated children.
Step 4.4: Select the top candidates from the extended population and forward them as the population for the next iteration.
Step 4.5: Go to Step 4.
Step 5. Stop and output the best vector produced.
In step 1, the genetic optimization algorithm parameters are initialized; these include the crossover ratio,


mutation ratio, population size, total iterations etc. A population of binary vectors, that is, vectors that contain only 0s and 1s, is generated in step 2. The population is produced at random using the chaotic logistic map approach. This approach generates random numbers by using a dynamic key, and the same pseudo-random number vectors are calculated every time the same key is used. In order to retain the influence of the complete feature set on our algorithm, an additional vector made up entirely of 1s is introduced to the population. In step 3, the accuracy metric serves as the parameter for fitness evaluation. The algorithm estimates accuracy by transforming the feature matrix in accordance with each candidate in the population.
The algorithm is then executed for a specified number of iterations. In each iteration, step 4 randomly chooses two parents, performs crossover on these two parents, and produces two children. Crossover is repeated until the specified crossover rate is reached. The total population generated then contains both parents and children. The algorithm then moves on to the mutation stage. A candidate is chosen at random from the entire population and subjected to mutation. The mutation process is likewise repeated until the population-wide mutation rate is attained. The population size is unaffected by this phase. Mutation is usually carried out to avoid being trapped in local optima. Following these two operations, each created child’s fitness is determined, and only those candidates that score highest on the fitness scale are kept. The hyperparameter n, which represents the algorithm’s population size, determines how many candidates are chosen for subsequent generations. Step 4 is repeated until the stopping condition is met. We go on to step 5 after exiting step 4, where the best candidate found in step 4 is output and frozen. The process then comes to an end with the optimized feature vector. The hyperparameters of the GA process are as follows.

Population size: 20
Total iterations: 200
Crossover rate: 70%
Mutation rate: 30%
Selection process: 20 candidates with best accuracies
Random seed: 13

A total of 300 features are input to our GA algorithm, which generates a matrix of size 20 × 300, where 20 is the population size and 300 is the vector size (size of an individual candidate in the population). The resultant vector also has a size of 300 and is multiplied by the feature matrix in order to evaluate results. The GA generates a total of 20 vectors as output; among them, only the best (with the highest objective score) is used in the following steps.
The model begins by reading the data from memory, then shaping and scaling it down. We have converted the images into 256×256×3 input vectors and have scaled them down by dividing by 256. The intuition behind this is that our model learns better when the data is scaled down. The input vectors are fed to three different networks, namely ResNet, VGG19 and Inception. The three networks give three different feature vectors. The resultant feature vectors are stacked together to get a single output vector, which we feed to the optimization algorithm, in our case the Genetic optimization algorithm (GA). As shown in the workflow diagram, GA comprises three phases: crossover, mutation, and selection, which give us the final output and performance evaluation metric. The complete procedure of the proposed methodology is presented below in stepwise form.
Step A. Load the NewHandPD Image dataset.
//Data pre-processing
Step B. Scale down the images by dividing each image’s values by 256. (Scaling creates floating point values which show better results.)
//Feature extraction
Step C. Pass each image through three neural networks: ResNet50, VGG19 and Inception_v3. Each model outputs a total of n features, thus each image is transformed into 3 × n features. The net result of this stage is a stack of feature maps, each of length 3 × n.
//Feature optimization
Step D. After obtaining feature maps from step C, the optimization algorithm is applied. The optimization algorithm applied in this study is the Genetic Algorithm (GA), which has three stages, i.e., Crossover, Mutation and Selection. The initial population ‘m’ (a population of binary vectors where 1 represents the presence of a feature and 0 represents its absence) is chosen. This step is followed by objective function evaluation. The objective function chosen here is KNN. Each binary vector is multiplied by the feature matrix, and the KNN algorithm is applied and evaluated on this matrix. Accuracy is calculated and stored against each vector in the population. The objective is to maximize accuracy (objective function). The GA process for feature optimization presented in Section D is applied.
Step E. After the optimization phase, the algorithm moves to the next step, which is returning the optimized vector. This vector is then used to test and evaluate the test set.
The schematic diagram of the proposed transfer learning and optimization-based Parkinson’s disease detection framework is shown in Figure 6.

FIGURE 6. Schematic diagram of the proposed framework for PD detection.

V. PERFORMANCE RESULTS
In this part, we present the findings of a series of tests designed to evaluate the performance of the suggested model. The experiment assessed the prediction capability of the proposed methodology on the NewHandPD dataset and determined how each feature subset contributed to the total classification accuracy. Various analyses of the results are carried out based on how the performance of the algorithm is assessed. The metrics used to assess the detection performance of the proposed model include Accuracy, Recall, Precision, and AUC, which are briefly formulated as follows.
Accuracy is expressed as:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
Recall is formulated as:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Mathematically, the precision is given by:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where,
TP stands for True Positives
TN stands for True Negatives
FP stands for False Positives
FN stands for False Negatives
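As a small worked illustration of the three formulas above, the following sketch computes them from confusion-matrix counts. The function name and the counts passed to it are hypothetical values chosen for the example; they are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, recall, and precision from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # share of actual positives that were found
    precision = tp / (tp + fp)     # share of predicted positives that are correct
    return {"accuracy": accuracy, "recall": recall, "precision": precision}

# Hypothetical counts, used only to exercise the formulas.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
# accuracy = 85/100, recall = 40/50, precision = 40/45
```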

A. BEHAVIOUR BASED ON POPULATION SIZE
In this analysis, the population size is varied from 10 to 50 with an increment of 5, and the iterations are set to 200. The peak accuracy at each population size is recorded. The obtained results are presented in graphical form in Figure 7.
At population sizes of 20 and 40, the highest accuracy of 95.29% is attained. The algorithm is frozen at population size 20 for further analyses in this study, since the loss was lowest at that population size.

FIGURE 7. Performance of proposed model versus population size (a) accuracy, (b) loss.

B. BEHAVIOUR BASED ON ITERATION COUNT
Here, the iteration count is varied from 50 to 500 with an increment of 50, and the population size is fixed at 20. The number 20 was chosen since it produced the best results in the prior analysis (95.29%). The highest accuracy achieved at each iteration count is noted. The plots in Figure 8 present the behavior and findings. The algorithm shows no signs of improvement beyond iteration 200, where the greatest average was attained (flat line in graph). The loss seen throughout this procedure dropped up to iteration 100, following which it began to increase until it reached iteration 200. The loss exhibits a sharp rise after the 200th cycle. For this reason, 200 is selected as the ideal iteration count in our study. Thus, the algorithmic parameters for further performance assessments in this study are 20 for the population and 200 for the iteration count.

C. PERFORMANCE ANALYSIS
The input feature matrix is split into a train and a test set, and the best vector acquired from the optimization procedure is multiplied with the feature map to produce the optimized feature map. The test set is used to calculate the KNN algorithm’s score, while the train set is used to fit the KNN algorithm. The confusion matrix is described


as follows. The dataset is split into a train set and a test set for assessment. 20% of the total data is held out for testing while the remaining 80% is provided to the train set. This leaves 119 samples in the test set for analysis. The confusion matrix shown in Figure 9 is obtained for the proposed model with a population size of 20 and an iteration count of 200. Total values successfully predicted are 105 (42+63), while incorrect predictions are 14 (12 + 2).

FIGURE 8. Performance of proposed model versus iteration count (a) accuracy, (b) loss.

FIGURE 9. Confusion matrix for population=20 and iteration=200.

At a population size of 20 and iteration count 200, the algorithm is frozen, and performance is evaluated in each iteration. Accuracy, train loss, test loss, and area under the curve (AUC) are the different metrics that are evaluated.

1) ACCURACY
The proposed algorithm achieves a maximal accuracy of 95.29%; after that, the algorithm overfits and stops training altogether. Accuracy gains are seen in each iteration of the algorithm, as visible in Figure 10. The algorithm achieves its peak at iteration 200 and exhibits steady behavior after that.

FIGURE 10. Accuracy of proposed model.

2) LOSS
In order to create an optimum feature map (X_train) at each iteration, the best vector generated for each population is multiplied by X_train to determine the training loss. This feature map is used to fit the KNN algorithm, which is subsequently tested on test data. The test loss is computed using X_test. The variation of the train and test loss with iteration is shown in Figures 11 and 12, respectively. The minimal test loss obtained is 0.09, while the minimum training loss is 0.01.

FIGURE 11. Training loss plot.

3) AREA UNDER CURVE (AUC)
The area under curve (AUC) represents the degree of separability. It reveals how well the proposed model can differentiate between classes. The greater the AUC, the better the model is at correctly classifying the classes, i.e., Healthy samples as Healthy and Patient samples as Patient. The AUC behavior for the proposed algorithm is shown in Figure 13. The highest AUC value obtained is 0.928, while the final AUC value, i.e., after the 200th iteration, is 0.9010.

4) RECALL
The recall acts as a measure of how well our algorithm detects True Positives. Recall reveals how many people we


accurately recognized as patients out of all those who truly are patients. Our model’s peak recall is 0.8690, and the final recall measure is 0.829. Recall serves as an indicator of how well our model can locate the relevant cases. It is also known as the True Positive Rate or Sensitivity.

FIGURE 12. Test loss plot.

FIGURE 13. Area under curve (AUC) plot.

FIGURE 14. Recall plot.

5) PRECISION
The proportion of True Positives to all predicted Positives is known as precision. In terms of our problem statement, that is the proportion of people we identify as having Parkinson’s disease who truly have it. The proposed model achieves a peak precision score of 1.00, as can be seen in Figure 15. However, it ends up being 0.985 at the end of the last iteration. Precision thus tells us how many of the flagged data points are relevant.

FIGURE 15. Precision plot.

D. COMPARISON ANALYSIS
In order to fairly assess the performance of the proposed framework, we compare the obtained results against some recently investigated PD detection schemes suggested in [5, 7-9, 12, 13, 17, 19, 25-29]. We prepared Table 1 to present various performance scores and to compare networks such as CNN, Random Forest, Linear SVM, AlexNet, LSTM and ESN with the proposed model. As depicted in the table, the accuracy obtained by the conventional models varies from 88.0% to 93.88%, and our model achieves 95.29% accuracy, which is better than all the listed recent schemes. The comparison of accuracies is also shown graphically in Figure 16. The table gives the performance comparison of every metric across the different architectures, which shows the better working of the proposed model and the evaluation parameters. In addition to accuracy, Table 1 also compares performance metrics such as recall, precision, loss, and AUC. It shows that the AUC, recall, precision and loss values are also comparable. The loss, which is only 0.12, along with the high accuracy is an indicator of the better detection ability of the proposed model as compared to the mentioned existing detection schemes. Hence, the comparison analysis supports the conclusion that the proposed model performs better.

E. DISCUSSION
We investigated the detection of Parkinson’s disease using an enhanced feature extraction procedure. The dataset is made up of drawings taken from both healthy and affected persons. The study creates feature maps for each input image using well-known transfer learning (TL) models. To obtain features, several TL models (ResNet, VGG, and Inception) are used. The stacked output from all three models is used to extract the features. The fundamental goal of employing three models is to eliminate bias toward any one model in particular. Our algorithm primarily focuses on the optimization process and is input independent due to the variety of features.
The feature extraction is followed by the optimization process, which in our work uses a genetic algorithm. Given the objective function (KNN), the genetic algorithm produces


a population of binary vectors and, over the course of its run, produces the best binary vectors. The GA operations, namely crossover, mutation and selection, are employed during its course. The algorithm produces the optimized vector, which is then used to evaluate the test set. The objective function of the algorithm is based on KNN, which is a distance-based algorithm. The algorithm never learns a discriminative function during the process but only calculates distances between vectors (lazy learner). By reducing features, the complexity of traditional KNN is further decreased. Thus, the training complexity is greatly reduced as compared to traditional CNN based models.

TABLE 1. Performance comparison of proposed model.

FIGURE 16. Comparison of accuracy from proposed model and other recent schemes.

The algorithm was evaluated on a number of dimensions, and the outcomes were interpreted in several ways. Population size vs accuracy, iterations vs accuracy, population size vs loss, and iterations vs loss are the analyses used to assess performance.
The population size is first set to 10 and then increased in steps of 5 up to 50, in order to assess accuracy and loss against population size. The iteration count was set to 200 and kept constant during this evaluation. The best accuracy was obtained at population sizes 20 and 40. The loss was recorded as 0.04 and 0.06 at population sizes 20 and 40, respectively. The population size of 20 was chosen for further analysis since it demonstrated the best accuracy and loss.
The second evaluation investigates accuracy and loss against iterations. Initially the iterations were set at 50 and increased in steps of 50 up to a maximum of 500. With the population size set at 20, the accuracy and loss are measured at each run. The algorithm shows the best convergence at iteration 200. The accuracy measured at iteration 200 was 95.29%, whereas the loss was at its minimum, i.e., 0.1, at iteration 100. After the 200th iteration the model overfits and shows a flat accuracy curve, whereas the loss increases after the 100th iteration and shows a spike after the 200th iteration.


We thus conclude that the method shows the best convergence at a population size of 20 and the 200th iteration. The algorithm’s final findings show an accuracy of 95.29 percent and a loss of 0.12.

VI. CONCLUSION
This paper presented a novel framework for the accurate detection of Parkinson’s disease through handwritten records available from the standard NewHandPD dataset. The proposed framework is based on transfer learning models such as ResNet, VGG19, and InceptionV3 so as to reduce the burden of training time. The collective features from the TL models are fed to the optimization process, which uses a genetic algorithm to get an optimized feature vector for better classification results. The optimization phase considers the accuracy as the fitness value and KNN as the objective function. The classification using the optimized features is done with the help of KNN, which is computationally less intensive. The performance of the proposed model is studied and assessed through various analyses. The proposed model was found to possess better classification accuracy than many recently investigated schemes. The loss is low, and the model shows good precision and other performance measures. The experimental and performance comparison analysis validated the better performance of the proposed model in accurately detecting Parkinson’s disease.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

ACKNOWLEDGMENT
The authors would like to acknowledge the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2023R66), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

REFERENCES
[1] B. R. Bloem, M. S. Okun, and C. Klein, “Parkinson’s disease,” Lancet, vol. 397, no. 10291, pp. 2284–2303, 2021.
[2] H. Li, C.-M. Pun, F. Xu, L. Pan, R. Zong, H. Gao, and H. Lu, “A hybrid feature selection algorithm based on a discrete artificial bee colony for Parkinson’s diagnosis,” ACM Trans. Internet Technol., vol. 21, no. 3, pp. 1–22, Aug. 2021.
[3] M. Kim, J. Yun, Y. Cho, K. Shin, R. Jang, H. J. Bae, and N. Kim, “Deep learning in medical imaging,” Neurospine, vol. 16, no. 4, p. 657, 2019.
[4] M. Bakator and D. Radosav, “Deep learning and medical diagnosis: A review of literature,” Multimodal Technol. Interact., vol. 2, no. 3, p. 47, Aug. 2018.
[5] Z. Fang, “Improved KNN algorithm with information entropy for the diagnosis of Parkinson’s disease,” in Proc. Int. Conf. Mach. Learn. Knowl. Eng. (MLKE), Feb. 2022, pp. 98–101.
[6] E. Kaplan, E. Altunisik, Y. E. Firat, P. D. Barua, S. Dogan, M. Baygin, F. B. Demir, T. Tuncer, E. Palmer, R.-S. Tan, P. Yu, J. Soar, H. Fujita, and U. R. Acharya, “Novel nested patch-based feature extraction model for automated Parkinson’s disease symptom classification using MRI images,” Comput. Methods Programs Biomed., vol. 224, Sep. 2022, Art. no. 107030.
[7] M. Gazda, M. Hires, and P. Drotar, “Ensemble of convolutional neural networks for Parkinson’s disease diagnosis from offline handwriting,” Dept. Comput. Inform., Intell. Inf. Syst. Lab, Tech. Univ. Kosice, Košice, Slovakia, Tech. Rep. 9, 2022.
[8] M. Mohaghegh and J. Gascon, “Identifying Parkinson’s disease using multimodal approach and deep learning,” in Proc. 6th Int. Conf. Innov. Technol. Intell. Syst. Ind. Appl. (CITISIA), Nov. 2021, pp. 1–6.
[9] M. Fratello, F. Cordella, G. Albani, G. Veneziano, G. Marano, A. Paffi, and A. Pallotti, “Classification-based screening of Parkinson’s disease patients through graph and handwriting signals,” Eng. Proc., vol. 11, no. 1, p. 49, 2021.
[10] A. Gold, “Understanding the mann-whitney test,” J. Property Tax Assessment Admin., vol. 4, no. 3, pp. 55–57, 2007.
[11] H. W. Loh, C. P. Ooi, E. Palmer, P. D. Barua, S. Dogan, T. Tuncer, M. Baygin, and U. R. Acharya, “GaborPDNet: Gabor transformation and deep neural network for Parkinson’s disease detection using EEG signals,” Electronics, vol. 10, no. 14, p. 1740, Jul. 2021.
[12] S. Chakraborty, S. Aich, J. Seong-Sim, E. Han, J. Park, and H.-C. Kim, “Parkinson’s disease detection from spiral and wave drawings using convolutional neural networks: A multistage classifier approach,” in Proc. 22nd Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2020, pp. 298–303.
[13] S. Nõmm, S. Zarembo, K. Medijainen, P. Taba, and A. Toomela, “Deep CNN based classification of the archimedes spiral drawing tests to support diagnostics of the Parkinson’s disease,” IFAC-PapersOnLine, vol. 53, no. 5, pp. 260–264, 2020.
[14] T. Tuncer, S. Dogan, and U. R. Acharya, “Automated detection of Parkinson’s disease using minimum average maximum tree and singular value decomposition method with vowels,” Biocybernetics Biomed. Eng., vol. 40, no. 1, pp. 211–220, Jan. 2020.
[15] A. Das, H. S. Das, A. Choudhury, A. Neog, and S. Mazumdar, “Detection of Parkinson’s disease from hand-drawn images using deep transfer learning,” in Proc. Congr. Intell. Syst. Singapore: Springer, Sep. 2020, pp. 67–84.
[16] L. S. Bernardo, A. Quezada, R. Munoz, F. M. Maia, C. R. Pereira, W. Wu, and V. H. C. De Albuquerque, “Handwritten pattern recognition for early Parkinson’s disease diagnosis,” Pattern Recognit. Lett., vol. 125, pp. 78–84, Jul. 2019.
[17] A. Johri and A. Tripathi, “Parkinson disease detection using deep neural networks,” in Proc. 12th Int. Conf. Contemp. Comput. (IC), Aug. 2019, pp. 1–4.
[18] T. Tuncer and S. Dogan, “A novel octopus based Parkinson’s disease and gender recognition method using vowels,” Appl. Acoust., vol. 155, pp. 75–83, Dec. 2019.
[19] P. Khatamino, I. Canturk, and L. Ozyilmaz, “A deep learning-CNN based system for medical diagnosis: An application on Parkinson’s disease handwriting drawings,” in Proc. 6th Int. Conf. Control Eng. Inf. Technol. (CEIT), Oct. 2018, pp. 1–6.
[20] C.-L. Liu, C.-H. Lee, and P.-M. Lin, “A fall detection system using K-nearest neighbor classifier,” Exp. Syst. Appl., vol. 37, no. 10, pp. 7174–7181, Oct. 2010.
[21] S. Katoch, S. S. Chauhan, and V. Kumar, “A review on genetic algorithm: Past, present, and future,” Multimedia Tools Appl., vol. 80, no. 5, pp. 8091–8126, 2021.
[22] C. R. Pereira, S. A. Weber, C. Hook, G. H. Rosa, and J. P. Papa, “Deep learning-aided Parkinson’s disease diagnosis from handwritten dynamics,” in Proc. 29th Conf. Graph., Patterns Images (SIBGRAPI), Oct. 2016, pp. 340–346.
[23] C. R. Pereira, D. R. Pereira, F. A. Silva, J. P. Masieiro, S. A. Weber, C. Hook, and J. P. Papa, “A new computer vision-based approach to aid the diagnosis of Parkinson’s disease,” Comput. Methods Programs Biomed., vol. 136, pp. 79–88, Nov. 2016.
[24] R. Ribani and M. Marengoni, “A survey of transfer learning for convolutional neural networks,” in Proc. 32nd SIBGRAPI Conf. Graph., Patterns Images Tuts. (SIBGRAPI-T), 2019, pp. 47–57.
[25] S. Xu and Z. Pan, “A novel ensemble of random forest for assisting diagnosis of Parkinson’s disease on small handwritten dynamics dataset,” Int. J. Med. Informat., vol. 144, Dec. 2020, Art. no. 104283.
[26] M. Moetesum, I. Siddiqi, N. Vincent, and F. Cloppet, “Assessing visual attributes of handwriting for prediction of neurological disorders—A case study on Parkinson’s disease,” Pattern Recognit. Lett., vol. 121, pp. 19–27, Apr. 2019.
[27] A. Parziale, C. A. Della, R. Senatore, and A. Marcelli, “A decision tree for automatic diagnosis of Parkinson’s disease from offline drawing samples: Experiments and findings,” in Proc. Int. Conf. Image Anal. Process. Cham, Switzerland: Springer, 2019, pp. 196–206.
[28] J. P. Folador, M. C. S. Santos, L. M. D. Luiz, L. A. P. De Souza, M. F. Vieira, A. A. Pereira, and A. De Oliveira Andrade, “On the use of histograms of oriented gradients for tremor detection from sinusoidal and spiral handwritten drawings of people with Parkinson’s disease,” Med. Biol. Eng. Comput., vol. 59, no. 1, pp. 195–214, Jan. 2021.
[29] L. Parisi, D. Neagu, R. Ma, and F. Campean, “Quantum ReLU activation for convolutional neural networks to improve diagnosis of Parkinson’s disease and COVID-19,” Exp. Syst. Appl., vol. 187, Jan. 2022, Art. no. 115892.


SURA MAHMOOD ABDULLAH received the B.Sc. degree in computer science (software) from the University of Technology, Iraq, in 2006, and the M.Sc. degree in software engineering from the Institute of Informatics of Higher Studies, Iraqi Commission for Computers and Informatics, Baghdad, Iraq, in 2013. She is a Lecturer with the Department of Computer Sciences, University of Technology, Baghdad. Her research interest includes artificial intelligence.

THEKRA ABBAS received the bachelor’s degree in computer science from the Department of Computer Sciences, University of Technology, Baghdad, Iraq, in 1987, the M.Sc. degree in computer science from Mustansiriyah University, Baghdad, and the Ph.D. degree in computer science from Central South University, Hunan, China. She is an Assistant Professor with the Department of Computer Sciences, Mustansiriyah University, where she is the Head of the Computer Science Department. She also leads and teaches modules in computer science. She has more than 25 years of experience, including extensive project management and supervision of research in related areas. Her research interests include information technology, big data, and multimedia.

MUNZIR HUBIBA BASHIR received the B.Tech. degree in computer engineering and technology from Amity University, Noida, India, in 2020, and the M.Tech. degree in computer engineering from Jamia Millia Islamia, New Delhi, India, in 2022. Her research interests include, but are not limited to, neural networks, pattern recognition and classification, and deep learning.

ISHFAQ AHMAD KHAJA received the B.Tech. degree from the University of Kashmir, Srinagar, Jammu and Kashmir, India, in 2017, and the M.Tech. degree from the Department of Computer Engineering, Jamia Millia Islamia, New Delhi, in 2020, where he is currently pursuing the Ph.D. degree with the Department of Computer Engineering. His research interests include machine learning, deep learning, cryptography, optimization techniques, and image processing.

MUSHEER AHMAD received the B.Tech. and M.Tech. degrees from the Department of Computer Engineering, Aligarh Muslim University, India, in 2004 and 2008, respectively, and the Ph.D. degree in chaos-based cryptography from the Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India. From 2007 to 2010, he was at the Department of Computer Engineering, Aligarh Muslim University. Since 2011, he has been an Assistant Professor with the Department of Computer Engineering, Jamia Millia Islamia. He has published over 100 research papers in internationally reputed refereed journals and conference proceedings of the IEEE/Springer/Elsevier. He has more than 2500 citations of his research works with an H-index of 30, i-10 index of 70, and cumulative impact factor of more than 200. He is listed among World’s Top 2% Scientists in studies conducted by Elsevier BV and Stanford University, in 2021 and 2022. His research interests include multimedia security, chaos-based cryptography, cryptanalysis, machine learning for security, image processing, and optimization techniques. He has served as a reviewer and a technical program committee member of many international conferences. He was a Referee of some renowned journals, such as Information Sciences, Signal Processing, Expert Systems With Applications, Journal of Information Security and Applications, IEEE Journal on Selected Areas in Communications (JSAC), IEEE Transactions on Cybernetics (TCYB), IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Industrial Informatics (TII), IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), IEEE Transactions on Circuits and Systems (TCAS), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), IEEE Transactions on Systems, Man, and Cybernetics: Systems, IEEE Transactions on Intelligent Transportation Systems (TITS), IEEE Transactions on Reliability (TR), IEEE Transactions on Network Science and Engineering (TNSE), IEEE Transactions on NanoBioscience, IEEE SYSJ, IEEE MultiMedia, IEEE Access, Wireless Personal Communications, Neural Computing and Applications, International Journal of Bifurcation and Chaos in Applied Sciences and Engineering, Chaos Solitons & Fractals, Physica A Statistical Mechanics and its Applications, Signal Processing: Image Communication, Neurocomputing, IET Information Security, IET Image Processing, Security and Communication Networks, Optik, Optics and Laser Technology, Complexity, Computers in Biology and Medicine, Computational and Applied Mathematics, and Concurrency and Computation.

NAGLAA F. SOLIMAN received the B.Sc., M.Sc., and Ph.D. degrees from the Faculty of Engineering, Zagazig University, Egypt, in 1999, 2004, and 2011, respectively. She has been with the Faculty of Computer Science, PNU, KSA, since 2015. She is a Teaching Staff Member with the Department of Electronics and Communications Engineering, Faculty of Engineering, Zagazig University, Egypt. Her current research interests include digital image processing, information security, multimedia communications, medical image processing, optical signal processing, big data, and cloud computing.

WALID EL-SHAFAI was born in Alexandria, Egypt. He received the B.Sc. degree (Hons.) in electronics and electrical communication engineering from the Faculty of Electronic Engineering (FEE), Menoufia University, Menouf, Egypt, in 2008, the M.Sc. degree from the Egypt-Japan University of Science and Technology (E-JUST), in 2012, and the Ph.D. degree from the Faculty of Electronic Engineering, Menoufia University, in 2019. Since January 2021, he has been a Postdoctoral Research Fellow with the Security Engineering Laboratory (SEL), Prince Sultan University (PSU), Riyadh, Saudi Arabia. He is currently working as a Lecturer and an Assistant Professor with the Electronics and Communication Engineering (ECE) Department, FEE, Menoufia University. He has several publications in his research areas in several reputable international and local journals and conferences. His research interests include wireless mobile and multimedia communications systems, image and video signal processing, efficient 2-D video/3-D multi-view video coding, multi-view video plus depth coding, 3-D multi-view video coding and transmission, quality of service and experience, digital communication techniques, cognitive radio networks, adaptive filter design, 3-D video watermarking, steganography, and encryption, error resilience and concealment algorithms for H.264/AVC, H.264/MVC, and H.265/HEVC video codec standards, cognitive cryptography, medical image processing, speech processing, security algorithms, software defined networks, the Internet of Things, medical diagnosis applications, FPGA implementations for signal processing algorithms and communication systems, cancellable biometrics and pattern recognition, image and video magnification, artificial intelligence for signal processing algorithms and communication systems, modulation identification and classification, image and video super-resolution and denoising, cybersecurity applications, malware and ransomware detection and analysis, deep learning in signal processing, and communication systems applications. He serves as a reviewer for several international journals.

