Vijayalakshmi 2020

This document proposes a hybrid recommender system called multi-classifier regression model (MCR) to improve autism detection accuracy. The MCR integrates naive Bayes and random forest classifiers with logistic regression through a meta-classifier. It is evaluated on autism datasets for adults, adolescents and children, considering the average probabilities and majority voting of classifiers. The MCR performs well on adult and adolescent datasets with unprocessed data, but not for children. Preprocessing like feature selection and normalization is shown to improve MCR accuracy for all three datasets, demonstrating its potential as an effective autism detection model.

A Hybrid Recommender System using Multi-Classifier Regression Model for Autism Detection


K. Vijayalakshmi, School of CSA, REVA University, Bangalore, India
Dr. M. Vinayakamurthy, School of CSA, REVA University, Bangalore, India
Dr. Anuradha, PG Department, STC College, Pollachi, India

Abstract— There is a growing need for rapid, highly accurate medical diagnostic systems that enable early diagnosis and speedy recovery. Observations of autism spectrum disorder (ASD) rely heavily on behavioral evaluation of the patient, which takes considerable time and effort. Machine learning models are now used to explore the viability of detecting features that indicate whether autism is present. The proposed work addresses three autism datasets (adolescent, adult and child) using a multi-classifier based regression (MCR) mechanism as a hybrid recommender model to improve prediction accuracy. MCR is constructed by integrating the Naïve Bayes and Random Forest classifiers with logistic regression through a meta-classifier. It is first applied to unprocessed data and evaluated using both the average of probabilities and the majority vote of the classifiers. On unprocessed data, MCR with majority voting performs well on the adult and adolescent datasets but gives no improvement in accuracy on the child dataset. To overcome this, the MCR model is enhanced with data preprocessing: selection of behavioral and demographic attributes, data transformation and normalization. With preprocessing, MCR with majority voting improves remarkably on the behavioral analysis for all three datasets, and it suits the adult and child datasets on the demographic analysis. The adolescent dataset on demographic attributes is best handled by MCR with the average of probabilities. Based on the overall observations, the integrated MCR model performs better with processed data than with unprocessed data on all datasets, giving confidence that it is the best-fitting model for autism detection among those evaluated.

Keywords— Autism Spectrum Disorder, Machine Learning, Random Forest, Logistic Regression, Naïve Bayes, Prediction,
Classification.

I. INTRODUCTION
Autism Spectrum Disorder (ASD) is a neurological developmental disorder that can affect a child from less than one year of age and continues for the rest of the person's life. Globally, more than 1% of people (62 million) are affected by autism [3][4]. It emerges in early childhood as a complex mental disorder. Hence, substantial research is required for early diagnosis, so that better medical recommendations can be provided based on a better understanding of autism [1]. With the tremendous growth of machine learning, much interdisciplinary research is now being carried out, notably in medical science for image segmentation, disease diagnosis, gene modeling and more [2][5][6]. Many contributions to autism diagnostics have used recent technological developments such as machine learning [7]. In this regard, this paper uses the datasets of Thabtah, collected through a mobile app for ASD screening. The datasets are based on user responses to 10 behavioral items and 10 items of demographic information that help to identify autism symptoms [8][9].

The objective of the proposed work is to classify effectively whether a child, adolescent or adult is likely to be autistic. It does so using the multi-classifier based regression (MCR) mechanism as a hybrid recommender model. The paper is organized as follows: Section II surveys the literature on various machine learning algorithms, Section III describes the methodology adopted, Section IV covers the data analysis, Section V presents the experimental methods and interprets the results from various aspects, and Section VI concludes with future enhancements.

II. LITERATURE SURVEY
For ASD behavioral attributes in children, LDA performed better than K-NN [10]. A study of five monogenetic disorders associated with ASD (RTT, FXS, TSC, PMS and Timothy syndrome) observed that a decision tree performed better than SVM and Multilayer Perceptron [11]. On behavioral attributes of ASD, an RF classifier gave satisfactory results, reaching 96% [12].

978-1-7281-7213-2/20/$31.00 ©2020 IEEE 139

Authorized licensed use limited to: UNIVERSITY OF WESTERN ONTARIO. Downloaded on May 25,2021 at 12:05:28 UTC from IEEE Xplore. Restrictions apply.

Restricted kinematic features had not previously been studied as a way to identify ASD. In that study, KNN with four kinematic features performed better than SVM, LDA, DT and RF [13]. Using EEG signal processing with learning models, a relatively low-cost diagnostic system can be developed to classify ASD [14].

A mobile app was developed to predict autism using integrated Random Forest-CART and Random Forest-ID3 models [15]. In an assessment of tree-based classifiers, J48 performed best compared with Logistic Model Tree, Random Forest, Reduced Error Pruned Tree and Decision Stump [16]. An automatic sleep stage classification approach for autism patients, based on the random subspace method, builds a decision forest from random trees [17]. For child, adolescent and adult ASD screening, a deep learning classifier proved the best when compared with Decision Tree, Naïve Bayes, k-NN and Random Tree, and Random Tree performed worst [18]. Wenbo Liu, Zhiding Yu, Bhiksha Raj, Li Yi, Xiaobing Zou and Ming Li proposed a face-recognition-based classification model as a different technique for ASD detection [19].

Motivated by these studies, this work aims to contribute a recommender system to society that gives more accurate ASD predictions for earlier treatment. The system integrates the Random Forest and Naïve Bayes classifiers with logistic regression. It predicts ASD in three categories of datasets (adult, child and adolescent), using behavioral attributes as well as demographic information. The classifiers are combined both by averaging their probabilities and by majority voting.

III. METHODOLOGY
Machine learning (ML) is the most prevalent technology that lets a machine learn by itself from previous experience. It identifies the most persuasive patterns in a given dataset in order to make accurate predictions [20]. In Equation (1), the supervised learning technique is represented by a dataset D, a class C and a prediction variable Y. Stated simply, classification is the process of mapping each record to a suitable class.

The supervised machine learning techniques used in the proposed work are:

A. Random Forest
The decision tree (DT) is a widespread classification mechanism that handles binary classification problems in an easily visualized form. Random Forest is an ensemble technique that generates a forest as a group of decision trees. Random Forest overcomes the main disadvantage of DT, which is model overfitting. Each tree is built on a random subset of the data, and the forest outputs the class that receives the most votes across its trees.

B. Naïve Bayes
Naïve Bayes is a statistical classifier that assumes its features are conditionally independent. Equation (2) is a Bayes classifier in which class A is predicted by maximizing P(A|B), where B denotes the observed attribute values. It is an effective predictive technique that performs efficiently even on datasets with missing values.

C. Logistic Regression
Logistic regression is the customary method for analyzing the relationship between a binary class label and one or more predictor variables. It is a statistical technique widely used for predictive problems, expressed through the probability of an event. It classifies a categorical class label in a qualitative manner, depending on the independent variables.

The proposed Multi-classifier based Regression (MCR) model combines the Random Forest and Naïve Bayes classifiers with Logistic Regression. Fig 3.1 illustrates the methodology of this model, and the corresponding algorithm is provided below:

The experiment starts by choosing the appropriate dataset out of adult, child and adolescent. Once the dataset is selected, the process checks whether data preprocessing is to be applied. If not, the MCR model is applied directly and its performance is measured on the test set. Otherwise, the behavioral and demographic attributes are selected and the MCR model is applied to them.

At the end of the process, the model's performance is compared in terms of accuracy, recall and ROC. The aim is to confirm that it performs better than the individual Random Forest, Naïve Bayes and Logistic Regression models.
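MCR combines its base classifiers in two ways: by averaging their class probabilities, and by majority voting over their predicted labels. A minimal sketch of these two rules follows. It assumes that each base classifier (Random Forest, Naïve Bayes, logistic regression) has already produced a probability for the YES class for every instance. The function names and the 0.5 threshold are hypothetical; this is not the authors' WEKA configuration.

```python
from typing import List, Sequence

# Hypothetical sketch of the two MCR combination rules. Each inner sequence
# holds one base classifier's predicted P(YES) for every instance.
THRESHOLD = 0.5  # assumed decision threshold for the YES class


def average_of_probabilities(probs: Sequence[Sequence[float]]) -> List[int]:
    """Average P(YES) over the classifiers, then threshold (1 = YES, 0 = NO)."""
    n_instances = len(probs[0])
    labels = []
    for i in range(n_instances):
        mean_p = sum(clf[i] for clf in probs) / len(probs)
        labels.append(1 if mean_p >= THRESHOLD else 0)
    return labels


def majority_vote(probs: Sequence[Sequence[float]]) -> List[int]:
    """Each classifier votes with its own thresholded label; the majority wins."""
    n_instances = len(probs[0])
    labels = []
    for i in range(n_instances):
        yes_votes = sum(1 for clf in probs if clf[i] >= THRESHOLD)
        # Strict majority: with an even number of classifiers a tie goes to NO
        # (an assumption); with three classifiers no tie is possible.
        labels.append(1 if 2 * yes_votes > len(probs) else 0)
    return labels


if __name__ == "__main__":
    # Made-up P(YES) values from RF, NB and LR for three instances.
    rf = [0.90, 0.40, 0.20]
    nb = [0.80, 0.45, 0.60]
    lr = [0.30, 0.95, 0.10]
    print(average_of_probabilities([rf, nb, lr]))  # [1, 1, 0]
    print(majority_vote([rf, nb, lr]))             # [1, 0, 0]
```

The second instance shows how the two rules can disagree. One very confident classifier pulls the average above 0.5, while the two classifiers leaning NO win under majority voting.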

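The Naïve Bayes decision rule chooses the class A that maximizes P(A|B). Under the conditional-independence assumption, this amounts to maximizing P(A) multiplied by the product of P(b_j | A) over the attribute values b_j. The sketch below is a minimal Bernoulli Naïve Bayes. It assumes 0/1 attributes (for example, the AQ1-10 answers, if coded that way) and Laplace smoothing. The function names are hypothetical, and this is not the WEKA implementation used in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[int, float], Dict[int, List[float]]]


def fit_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[int], alpha: float = 1.0) -> Model:
    """Estimate P(A) and P(b_j = 1 | A) for 0/1 attributes, with Laplace smoothing."""
    priors: Dict[int, float] = {}
    likelihoods: Dict[int, List[float]] = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
    return priors, likelihoods


def predict_bernoulli_nb(model: Model, x: Sequence[int]) -> int:
    """Return the class A maximizing log P(A) + sum_j log P(b_j | A)."""
    priors, likelihoods = model
    best_class, best_score = -1, -math.inf
    for c, prior in priors.items():
        # Logs avoid underflow when many attribute probabilities are multiplied.
        score = math.log(prior)
        for xj, pj in zip(x, likelihoods[c]):
            score += math.log(pj if xj == 1 else 1.0 - pj)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    # Toy 0/1 answers to three hypothetical AQ-style items; 1 = YES (ASD).
    X = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
    y = [1, 1, 0, 0]
    model = fit_bernoulli_nb(X, y)
    print(predict_bernoulli_nb(model, [1, 1, 1]))  # 1
    print(predict_bernoulli_nb(model, [0, 0, 0]))  # 0
```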
140 International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE 2020)

IV. DATASET ANALYSIS
The work uses version 1 of the ASD Screening Data: the Child dataset (292, 21), the Adolescent dataset (104, 21) and the Adult dataset (704, 21). These were collected from the UCI Machine Learning Repository, provided by Fadi Fayez Thabtah, and are suitable for classification and regression models [8][9]. The child dataset has 292 instances, the adolescent dataset 104 and the adult dataset 704, each with 21 attributes. These comprise 10 questionnaire items (AQ1-10) related to behavioral concerns, 10 demographic features and one class label. The information provided in these datasets is highly significant for effective ASD diagnosis. The data description of all three datasets is given below:

Each of the three datasets is split in a 60:40 ratio into training and validation sets. The class distribution of the entire adult dataset is given below:

The class label NO accounts for 73% and YES for 26.8% of the observations. The different levels of the factor (nominal) attributes of the adult dataset are given below:

Data Preprocessing: The quality of prediction through machine learning improves with preprocessing. The model works on two different subsets of attributes, (i) the behavioral set and (ii) the demographic set, which improves the computational efficiency of the learning. The categorical data identified in each set are encoded as numerical data. Normalization then transforms the numerical values to a common range for better mathematical analysis.
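The preprocessing steps above encode categorical attributes as numbers and normalize numeric values to a common range, and each dataset is split 60:40 into training and validation sets. These steps can be sketched in plain Python. The sketch assumes label encoding by sorted category order, min-max scaling to [0, 1] and a seeded random shuffle. The paper's experiments use WEKA, and the exact filters and settings are not stated, so the helper names and example values here are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def label_encode(column: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct category to an integer code (sorted order, an assumption)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Rescale values to the common range [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_60_40(rows: Sequence, seed: int = 42) -> Tuple[list, list]:
    """Shuffle a copy of the rows and return (training 60%, validation 40%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.6 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    gender = ["m", "f", "f", "m"]    # hypothetical demographic column
    age = [18.0, 25.0, 40.0, 64.0]   # hypothetical numeric column
    codes, mapping = label_encode(gender)
    print(codes, mapping)            # [1, 0, 0, 1] {'f': 0, 'm': 1}
    print(min_max_normalize(age))
    train, valid = split_60_40(list(range(704)))
    print(len(train), len(valid))    # 422 282
```

Label encoding gives nominal attributes arbitrary integer codes. For attributes with more than two levels, one-hot encoding is a common alternative when a learner might read those codes as ordered.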

Statistical models are used to measure model performance, and a summary of all numeric attributes of the adult dataset is provided below:

V. EXPERIMENTAL METHOD & INTERPRETATION OF RESULTS
The proposed work uses the Random Forest and Naïve Bayes classifiers with logistic regression to improve predictive ability by constructing the integrated MCR model. The chosen datasets have a two-class prediction label, and all the machine learning algorithms used in this work are binary classifiers, so the MCR model can be applied with confidence. The experiments used the WEKA tool, which supports both supervised and unsupervised machine learning. The experimental results for the unprocessed data are given below:

i. Without Preprocessing:

Fig 5.1 shows a graphical representation of the MCR model on unprocessed data across all the performance metrics. It shows visually that 100% accuracy is achieved on the adult and adolescent datasets with majority voting, but there is no improvement on the child dataset.

From the above, the lowest scorer is logistic regression, which can be improved by combining it with stronger classifiers in the integrated MCR model.

The evaluation criteria used to assess a classifier are accuracy, classification error, precision, recall and ROC. All these metrics require the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).

A. Accuracy & Classification Error:
Accuracy is the performance metric used in predictive models to measure the quality of prediction. It is the percentage of correctly classified instances out of the total number of records. Classification error measures the deviation in prediction as the percentage of incorrectly classified instances out of the total number of records. Both metrics can be computed as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Error} = \frac{FP + FN}{TP + TN + FP + FN}$$

From Fig 5.1, the MCR model with majority voting gives the best accuracy: 100% for both the adult and adolescent datasets and 98% for the child dataset.

B. Precision and Recall:
Precision is an evaluation metric that measures the relevance of the model's positive predictions. Recall, also referred to as sensitivity or hit ratio, measures the proportion of relevant instances that are correctly classified. The two metrics can be computed as:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

From Fig 5.2, MCR with majority voting achieves the best precision and recall for the adult and adolescent datasets, while Random Forest has higher precision and recall for the child dataset.

C. Receiver Operating Characteristic:
ROC measures and visualizes the discriminative ability of a binary classifier and is used to choose the right classifier based on optimal performance. From Fig 5.3, an ROC value of 1 indicates an optimal classifier, whereas a value of 0.5 indicates random guessing.

ROC values from Fig 5.1 indicate that the model is a possibly optimal model for prediction on the adult and adolescent
datasets. It achieved 100% TPR and 0% FPR, whereas on the child dataset it did not perform as well as Random Forest.
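All the metrics used in Section V can be derived from the confusion-matrix counts and the predicted probabilities. These are accuracy, classification error, precision, recall (TPR) and FPR, plus the Kappa statistic and RMSE later applied to the preprocessed data. A minimal sketch follows, assuming labels coded 1 = YES and 0 = NO. RMSE is computed here between the predicted YES probability and the 0/1 label. That definition is an assumption, and WEKA's exact computation for nominal classes may differ. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN and FN, treating 1 = YES as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == 1:
            counts["TP" if t == 1 else "FP"] += 1
        else:
            counts["FN" if t == 1 else "TN"] += 1
    return counts


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, error, precision, recall (TPR), FPR and Cohen's kappa."""
    c = confusion_counts(y_true, y_pred)
    tp, fp, tn, fn = c["TP"], c["FP"], c["TN"], c["FN"]
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Chance agreement: P(both say YES) + P(both say NO) from the marginals.
    p_e = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    return {
        "accuracy": accuracy,
        "error": (fp + fn) / n,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        # Kappa is undefined when p_e == 1; 0.0 is returned by convention here.
        "kappa": (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0,
    }


def rmse(y_true: Sequence[int], p_yes: Sequence[float]) -> float:
    """RMSE between the predicted P(YES) and the 0/1 label (assumed definition)."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, p_yes)) / len(y_true))


if __name__ == "__main__":
    print(evaluate([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))
    print(evaluate([1, 0, 1, 0], [1, 0, 1, 0])["kappa"])  # 1.0
    print(rmse([1, 0], [0.8, 0.2]))
```

A perfect prediction gives a kappa of exactly 1, which matches the reading in Section V that a Kappa statistic of 1 rules out a random-guess model.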

Fig 5.4 illustrates how the proposed MCR model works when considering the average of probabilities and majority voting. It clearly shows that MCR with majority voting is the best-fitting model, except for the child dataset. Table 5.2 presents an overview of the scope of the MCR model on all three datasets:

This clearly shows that applying the MCR model has no impact on the child dataset. To resolve this issue, the MCR model is combined with data preprocessing.

ii. With Preprocessing
The previous observations show that the MCR model needs improvement for the child dataset. To address this, preprocessing selects the behavioral and demographic attributes from the appropriate dataset. Data transformation converts the nominal features to numeric ones, as most ML algorithms perform better on numeric values. Statistical measures give better accuracy when all numeric attributes are brought to a common scale through normalization. To measure the effect of preprocessing on the MCR model, accuracy, the Kappa statistic and RMSE are used.

The Kappa statistic measures the agreement between the predicted classes and the actual classes, corrected for the agreement expected by chance. A value greater than 0 means that the classifier is doing better than chance; a value of 0 corresponds to random guessing. From Fig 5.5, both variants of MCR have a Kappa statistic of 1, so neither can be a random-guess model.

The interpretation of both behavioral and demographic attributes using the MCR model is provided in Table 5.3. In the behavioral analysis, the MCR model with majority voting performs well on all three datasets. In the demographic analysis, based on RMSE, the adult dataset works better with MCR using the average of probabilities. The other two datasets are handled effectively by MCR with majority voting.

Overall, the issue observed with the MCR model on unprocessed data is addressed: using MCR with majority voting, the child dataset now achieves better prediction in both analyses.

VI. CONCLUSION AND FUTURE WORK
This study predicts ASD in three version-1 datasets (adult, adolescent and child) using the proposed MCR classifier. Based on the various evaluation metrics, the proposed integrated classifier performed excellently, especially with data preprocessing. The study concludes that MCR with majority voting performs remarkably well in assessing the behavioral and demographic attributes of all three datasets. The most important observation of this work is that logistic regression alone is a weak classifier. When it is combined with the Random Forest and Naïve Bayes classifiers, however, the results show significant improvement across the different assessments.

Given the efficiency of the MCR model, it can offer society a better autism screening model, as a potentially very effective automated recommender system
to detect autism at an early stage, enabling the right medical guidance and faster recovery. Future work can extend the behavioral and demographic data analysis to the version 2 datasets, with an implementation.

VII. REFERENCES
[1] F. Hauck and N. Kliewer, “Machine Learning for Autism Diagnostics: Applying Support Vector Classification,” Int’l Conf. Heal. Informatics Med. Syst., pp. 120–123, 2017.
[2] Y. Q. Zhang and J. C. Rajapakse, Machine Learning in Bioinformatics (Vol. 4). John Wiley & Sons, 2009.
[3] American Psychiatric Association, “Autism Spectrum Disorder. 299.00 (F84.0),” in Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). Arlington, VA: American Psychiatric Publishing, 2013, pp. 50–59.
[4] GBD 2015 Disease and Injury Incidence and Prevalence Collaborators, 8 October 2016.
[5] A. R. Olivera et al., “Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study,” Sao Paulo Med. J., vol. 135, no. 3, pp. 234–246, 2017.
[6] S. Wang and R. M. Summers, “Machine learning and radiology,” Medical Image Analysis, vol. 16, no. 5, pp. 933–951, 2012.
[7] S. Narayanan, “Applying Machine Learning to Facilitate Autism Diagnostics: Pitfalls and Promises,” J. Autism Dev. Disord., vol. 45, no. 5, pp. 1121–1136, 2015.
[8] F. F. Thabtah, “Autism Spectrum Disorder Screening: Machine Learning Adaptation and DSM-5 Fulfillment,” in Proceedings of the 1st International Conference on Medical and Health Informatics 2017, 2017, pp. 1–6.
[9] F. F. Thabtah, “Autism Spectrum Disorder Tests App,” 2017. Wei-Lun Chao, “Machine Learning Tutorial,” Graduate Institute of Communication Engineering, National Taiwan University, Taiwan, 2011.
[10] O. Altay and M. Ulas, “Prediction of the Autism Spectrum Disorder Diagnosis with Linear Discriminant Analysis Classifier and K-Nearest Neighbor in Children,” 2018.
[11] V. Pream Sudha and M. S. Vijaya, “Machine Learning-Based Model for Identification of Syndromic Autism Spectrum Disorder,” 2018.
[12] S. B. Shuvo et al., “A Data Mining Based Approach to Predict Autism Spectrum Disorder Considering Behavioral Attributes,” in 10th ICCCNT 2019, IIT Kanpur, Kanpur, India, July 6–8, 2019.
[13] Z. Zhao et al., “Applying Machine Learning to Identify Autism With Restricted Kinematic Features,” IEEE Access, 2019.
[14] D. Haputhanthri et al., “An EEG based Channel Optimized Classification Approach for Autism Spectrum Disorder,” IEEE Conf., 2019.
[15] K. S. Omar et al., “A Machine Learning Approach to Predict Autism Spectrum Disorder,” in 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), February 7–9, 2019.
[16] Md. S. Satu et al., “Early Detection of Autism by Extracting Features: A Case Study in Bangladesh,” in 2019 International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST).
[17] I. N. Yulita et al., “Random Subspace Method for Sleep Stage Classification of Autism Patients,” ISRITI, 2018.
[18] A. S. Halibas et al., “Performance Analysis of Machine Learning Classifiers for ASD Screening,” in 2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT).
[19] W. Liu, Z. Yu, B. Raj, L. Yi, X. Zou and M. Li, “Efficient Autism Spectrum Disorder Prediction with Eye Movement: A Machine Learning Framework,” 2015.
[20] H. Zhang, “The Optimality of Naive Bayes,” Proc. Seventeenth Int. Florida Artif. Intell. Res. Soc. Conf. FLAIRS 2004, vol. 1, no. 2, pp. 1–6, 2004.
