Vijayalakshmi 2020
Abstract— Today, there is a requirement to develop rapid, highly accurate medical diagnostic systems that enable earlier and
speedier recovery. The diagnosis of autism spectrum disorder (ASD) relies heavily on behavioral evaluation of the patients,
which takes considerable time and effort. Machine learning models are now used to explore the viability of detecting the
features that indicate whether autism is present or not. The proposed work addresses three autism clusters, the adolescent,
adult and child datasets, using the multi-classifier based regression (MCR) mechanism as a hybrid recommender model to improve
prediction accuracy. It is constructed by integrating Naïve Bayes and Random Forest classifiers with Logistic Regression
through a meta-classifier on unprocessed data, and is evaluated by considering the average of probabilities and the majority
voting of the multiple classifiers. Overall, with unprocessed data, the MCR model with majority voting performs well on the
adult and adolescent datasets, whereas there is no improvement in accuracy for the child dataset. To overcome this, the MCR
model is enhanced through data preprocessing: selection of behavioral and demographic attributes, data transformation and
normalization. The outcome of the model improved remarkably with majority voting on behavioral analysis for all three
datasets, whereas on demographic analysis majority voting suits the adult and child datasets. The adolescent dataset on
demographic attributes is handled using MCR with average of probabilities. To conclude, based on the overall observations and
interpretations, the new integrated MCR model performs better with processed data than with unprocessed data on all datasets,
and it gives confidence of being the fittest model for autism detection.
Keywords— Autism Spectrum Disorder, Machine Learning, Random Forest, Logistic Regression, Naïve Bayes, Prediction,
Classification.
978-1-7281-7213-2/20/$31.00 ©2020 IEEE
Authorized licensed use limited to: UNIVERSITY OF WESTERN ONTARIO. Downloaded on May 25,2021 at 12:05:28 UTC from IEEE Xplore. Restrictions apply.
I. INTRODUCTION
Autism Spectrum Disorder is a neurological developmental disorder that can affect a child from less than one year of age and
continues for the rest of their life. Globally, more than 1% of people (62 million) are affected by autism[3][4]. It appears
in early childhood as a complex mental disorder. Hence, substantial research is required on early diagnosis, so that better
medical recommendations can be provided with a better understanding of autism[1]. Owing to the tremendous growth of machine
learning, much interdisciplinary research is now under way, particularly in medical science, on image segmentation, disease
diagnosis, gene modeling and more[2][5][6]. Many contributions to autism diagnostics have been made using recent developments
in technology such as machine learning[7]. In this regard, the paper uses the dataset of this author, who collected it using
a mobile app for ASD screening based on user responses to 10 behavioral and 10 demographic attributes.
The objective of the proposed work is to classify effectively whether a child, adolescent or adult is likely to be autistic,
using the multi-classifier based regression (MCR) mechanism as a hybrid recommender model. The paper is organized as follows:
Section II surveys the literature on various machine learning algorithms, Section III gives the methodology adopted, Section IV
covers data analysis, Section V presents the experimental methods with results interpreted from various aspects, and Section VI
concludes with future enhancements.
II. LITERATURE SURVEY
On ASD behavioral attributes in children, LDA performed better than K-NN[10]. For five monogenetic disorders associated with
ASD, namely RTT, FXS, TSC, PMS and Timothy syndrome, the decision tree was observed to perform better than the SVM and the
Multilayer Perceptron. On ASD behavioral attributes, the RF classifier proved satisfactory, with
the results at 96%[12]. No study had previously investigated whether restricted kinematic features can be used to identify
ASD. In the results, KNN performed better than SVM, LDA, DT and RF with four kinematic features[13]. Using EEG signal
processing with learning models, a moderately low-cost diagnostic system can be developed to classify ASD[14].
A mobile app was developed to predict autism using the integrated Random Forest-CART and Random Forest-ID3[15]. In an
assessment of tree-based classifiers, J48 was the best compared with Logistic Model Tree, Random Forest, Reduced Error Pruned
Tree and Decision Stump[16]. An automatic sleep stage classification based on the random subspace method creates a decision
forest through random trees[17]. A Deep Learning classifier proved the best when compared with Decision Tree, Naïve Bayes,
k-NN and Random Tree applied to child, adolescent and adult ASD screening; the weakest was the Random Tree classifier[18].
A facial recognition based classification model by Wenbo Liu, Zhiding Yu, Bhiksha Raj, Li Yi, Xiaobing Zou and Ming Li
offered a different technique for ASD detection[19].

As a motivation, the proposed work aims to contribute a recommender system to society that gives a more accurate prediction
of ASD for earlier treatment. The system is modeled by integrating the Random Forest and Naïve Bayes classifiers with
regression to predict ASD in three categories of datasets (adult, child and adolescent), based on behavioral attributes as
well as demographic information, by considering the average of the different classifiers' probabilities and majority voting.
III. METHODOLOGY
Machine learning (ML) is the prevalent technology by which a machine learns from previous experience to identify the most
significant patterns in a given dataset and make predictions[20]. In (1), the supervised learning technique is represented
with a dataset D, a class C and a prediction variable Y. Stated simply, classification is the process of mapping each
record to a suitable class.

The supervised machine learning techniques used in the proposed work are:

A. Random Forest
A Decision Tree (DT) is a widespread classification mechanism that handles binary classification problems and can be
visualized. Random Forest is an ensemble technique based on decision trees that generates a forest as a group of decision
trees. The model overfitting that is a disadvantage of DT can be overcome with Random Forest. The forest's prediction is
obtained by voting across its trees, each of which is built on a random subset of the data and features.

B. Naïve Bayes
Naïve Bayes is a statistical classification technique that assumes conditional independence among its features.
Equation (2) is a Bayes classifier in which the class A is to be predicted by maximizing P(A|B) over the class values.

It is an effective predictive technique that performs efficiently on datasets with missing values.

C. Logistic Regression
Logistic regression is the customary method for analyzing the relationship between a binary class label and one or more
predictor variables. It is a statistical technique widely used to solve predictive problems through the probability of an
event. It is used to classify a categorical class label, a qualitative outcome, depending on the independent variables.

The proposed Multi-classifier based Regression (MCR) model is the amalgamation of the Random Forest and Naïve Bayes
classifiers with Logistic Regression. Fig 3.1 illustrates the methodology of this model, and the algorithm is described
below:

The experiment starts with choosing the appropriate dataset out of adult, child and adolescent. Once the dataset is
selected, the process checks whether data preprocessing is required. If not, the MCR model is applied directly to the
test set and the model performance is measured. Otherwise, the behavioral and demographic attributes are selected and
the MCR model is applied.

At the end of the process, the model's performance is compared in terms of accuracy, recall and ROC to confirm that it
performs better than the Random Forest, Naïve Bayes and Logistic Regression models.
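As a minimal sketch of the two combination rules named for the MCR model, the Python below takes the probability of the YES class from each base classifier and combines them either by average of probabilities or by majority voting. The function names, the 0.5 threshold and the example probabilities are assumptions made for illustration; the paper does not give them, and this is not the authors' implementation.

```python
from collections import Counter
from statistics import mean


def average_of_probabilities(yes_probs, threshold=0.5):
    """Average P(YES) over the base classifiers, then threshold once."""
    return "YES" if mean(yes_probs) >= threshold else "NO"


def majority_voting(yes_probs, threshold=0.5):
    """Threshold each classifier separately, then take the most common label."""
    votes = ["YES" if p >= threshold else "NO" for p in yes_probs]
    # With three base classifiers and two labels a tie cannot occur.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical P(YES) from Random Forest, Naive Bayes and Logistic Regression
# for one screening record (made-up numbers, not results from the paper).
example = [0.90, 0.45, 0.40]
by_average = average_of_probabilities(example)
by_vote = majority_voting(example)
print(by_average, by_vote)
```

In this example one confident classifier pulls the average above the threshold while two of the three classifiers vote NO, so the two rules can disagree on the same record.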
140 International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE 2020)
IV. DATASET ANALYSIS
The work is examined with version 1 of the ASD Screening Data: the Child Dataset (292, 21), the Adolescent Dataset (104, 21)
and the Adult Dataset (704, 21), collected from the UCI machine learning repository and provided by Fadi Fayez Thabtah,
which suit all classification and regression models[8][9]. The child dataset has 292 instances, the adolescent dataset has
104 and the adult dataset has 704, each with 21 attributes: 10 questionnaire items (AQ1-10) related to behavioral concerns,
10 demographic features and one class label. The information provided in the datasets is of great significance for
effective ASD diagnosis. The data description of all three datasets is given below:

The three datasets are split in a 60:40 ratio into a training set and a validation set. The class distribution of the
entire adult dataset is given below:
The class label NO accounts for 73% and YES for 26.8% of the given observations. The different levels of the factor
(nominal) attributes of the adult dataset are given below:

Data Preprocessing: The quality of prediction through machine learning improves with preprocessing. The model works on
two different subsets of attributes, (i) the behavioral set and (ii) the demographic set, which improves the computational
efficiency of the machine learning. The categorical data identified in each set is encoded as numerical data, and the
numerical values are then normalized to a common range for better mathematical data analysis.
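The encoding and normalization steps can be sketched as below. The paper does not state which encoder or normalization it used, so integer label encoding and min-max scaling to [0, 1] are assumptions, as are the function names and the sample values.

```python
def label_encode(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    codes = {category: index for index, category in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes


def min_max_normalize(values):
    """Rescale numeric values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Illustrative demographic values (assumed for this sketch, not taken from the dataset).
gender_codes, gender_mapping = label_encode(["m", "f", "f", "m"])
age_scaled = min_max_normalize([4, 11, 6, 5])
print(gender_codes, gender_mapping, age_scaled)
```

Scaling every numeric attribute to the same range keeps attributes with large raw values, such as age, from dominating attributes that are already 0 or 1, such as the AQ1-10 answers.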
i. Without Preprocessing:
Fig 5.1 shows a graphical representation of the MCR model on unprocessed data across all the performance metrics. It shows
that 100% accuracy is achieved on the adult and adolescent datasets with majority voting, but there is no improvement on
the child dataset.

A. Accuracy & Classification Error:
Accuracy is the performance metric used in predictive models for the quality of prediction: the percentage of correctly
classified instances out of the total number of records. Classification Error measures the deviation in prediction: the
percentage of incorrectly classified instances out of the total number of records. Both metrics can be measured using:
\[ \text{Accuracy} = \frac{\text{correctly classified instances}}{\text{total number of records}} \times 100 \]
\[ \text{Classification Error} = \frac{\text{incorrectly classified instances}}{\text{total number of records}} \times 100 \]
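The two definitions above, together with the true positive rate (TPR, also called recall) and false positive rate (FPR) that the results report, can be computed from the actual and predicted labels as in the following sketch. The function name, the dictionary keys and the five sample records are assumptions for illustration only.

```python
def classification_metrics(actual, predicted, positive="YES"):
    """Return accuracy, classification error, TPR and FPR as percentages."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    total = len(pairs)
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "classification_error": 100.0 * (fp + fn) / total,
        # Guard against division by zero when a class is absent.
        "tpr": 100.0 * tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": 100.0 * fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up labels for five records, not results from the paper.
actual = ["YES", "YES", "NO", "NO", "NO"]
predicted = ["YES", "NO", "NO", "NO", "YES"]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Accuracy and classification error always sum to 100%, so reporting both is a consistency check rather than two independent measurements.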
dataset. It achieved 100% TPR and 0% FPR, whereas it did not perform as well as Random Forest on the child dataset.
Fig 5.4 illustrates how the proposed MCR model works under average of probabilities and majority voting. It shows visually
that the MCR model with majority voting can be taken with confidence as the fittest model, except on the child dataset.
Table 5.2 presents an overview of the scope of the MCR model on all three datasets:

The interpretation of both behavioral and demographic attributes using the MCR model is provided in Table 5.3. It is
observed that the MCR model performs well on all three datasets with majority voting on behavioral analysis.

On demographic data analysis, owing to the RMSE factor, it is noticed that the adult dataset works better with MCR on
average of probabilities, and the other two datasets are handled effectively by the MCR model on majority voting.
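The surviving text cites RMSE without defining it. For reference, the usual definition for a probabilistic classifier is given below, under the assumption that $p_i$ is the predicted probability of the YES class for record $i$, $y_i \in \{0, 1\}$ is its true label, and $N$ is the number of validation records. These symbols are introduced here and do not appear in the paper.

```latex
\[
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( p_i - y_i \right)^2}
\]
```

Under this definition a lower RMSE means the predicted probabilities sit closer to the true labels, which is one way two combination rules with equal accuracy could still be ranked.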
to detect autism at an earlier stage for the right medical guidance and fast recovery. In future work, it can be
extended to behavioral and demographic data analysis on the version 2 datasets, with implementation.
VII. REFERENCES
[1] F. Hauck and N. Kliewer, “Machine Learning for Autism
Diagnostics: Applying Support Vector Classification,” Int’l Conf.
Heal. Informatics Med. Syst., pp. 120–123, 2017.
[2] Y. Q. Zhang and J. C. Rajapakse, Machine Learning in
Bioinformatics (Vol.4). John Wiley & Sons, 2009.
[3] American Psychiatric Association (2013). "Autism Spectrum
Disorder. 299.00 (F84.0)". Diagnostic and Statistical Manual of
Mental Disorders, Fifth Edition (DSM-5). Arlington, VA:
American Psychiatric Publishing. pp. 50–59.
[4] GBD 2015 Disease and Injury Incidence and Prevalence
Collaborators. (8 October 2016)
[5] A. R. Olivera et al., “Comparison of machine-learning algorithms
to build a predictive model for detecting undiagnosed diabetes -
ELSA-Brasil: accuracy study,” Sao Paulo Med. J., vol. 135, no. 3,
pp. 234–246, 2017.
[6] S. Wang and R. M. Summers, “Machine learning and radiology,”
Medical Image Analysis, vol. 16, no. 5. pp. 933–951, 2012.
[7] S. Narayanan, “Applying Machine Learning to Facilitate Autism
Diagnostics: Pitfalls and Promises,” J. Autism Dev. Disord., vol.
45, no. 5, pp. 1121–1136, 2015.
[8] F. F. Thabtah, “Autism Spectrum Disorder Screening: Machine
Learning Adaptation and DSM-5 Fulfillment,” in Proceedings of the
1st International Conference on Medical and Health Informatics
2017, 2017, pp. 1–6.
[9] F. F. Thabtah, “Autism Spectrum Disorder Tests App.” 2017. Wei-
Lun Chao, “Machine Learning Tutorial.” Graduate Institute of
Communication Engineering, National Taiwan University, Taiwan,
2011.
[10] Osman Altay & Mustafa Ulas. (2018). Prediction of the Autism
Spectrum Disorder Diagnosis with Linear Discriminant Analysis
Classifier and K-Nearest Neighbor in Children.
[11] V. Pream Sudha and M. S. Vijaya. (2018). Machine Learning-
Based Model for Identification of Syndromic Autism Spectrum
Disorder
[12] Shaon Bhatta Shuvo et al., “A Data Mining Based Approach to
Predict Autism Spectrum Disorder Considering Behavioral
Attributes”, 10th ICCCNT 2019, July 6-8, 2019, IIT Kanpur,
Kanpur, India.
[13] Z. Zhao et al., Applying Machine Learning to Identify Autism
With Restricted Kinematic Features, IEEE Access, 2019.
[14] Dilantha Haputhanthri et al., An EEG based Channel Optimized
Classification Approach for Autism Spectrum Disorder, IEEE
Conf, 2019.
[15] Kazi Shahrukh Omar et al., A Machine Learning Approach to
Predict Autism Spectrum Disorder, 2019 International Conference
on Electrical, Computer and Communication Engineering (ECCE),
7-9 February, 2019.
[16] Md. Shahriare Satu et al., Early Detection of Autism by Extracting
Features: A Case Study in Bangladesh, 2019 International
Conference on Robotics, Electrical and Signal Processing
Techniques (ICREST).
[17] Intan Nurma Yulita et al., Random Subspace Method for Sleep
Stage Classification of Autism Patients, ISRITI, 2018.
[18] Alrence Santiago Halibas et al., Performance Analysis of Machine
Learning Classifiers for ASD Screening, 2018 International
Conference on Innovation and Intelligence for Informatics,
Computing, and Technologies (3ICT).
[19] Wenbo Liu, Zhiding Yu, Bhiksha Raj, Li Yi, Xiaobing Zou &
Ming Li. (2015). Efficient Autism Spectrum Disorder Prediction
with Eye Movement: A Machine Learning Framework
[20] H. Zhang, “The Optimality of Naive Bayes,” Proc. Seventeenth Int.
Florida Artif. Intell. Res. Soc. Conf. FLAIRS 2004, vol. 1, no. 2,
pp. 1–6, 2004.