Abstract—Autism spectrum disorder (ASD) is a disorder in which patients are unable to express themselves and interact with others. It is a growing concern that one in 59 children is now identified as an autism spectrum disorder patient. ASD starts in childhood, but its symptoms are sometimes not detected until adulthood. As a result, these children do not receive proper treatment at an early age, which causes more complexity in their health. Research shows that a diagnosis of autism at an earlier age is more reliable and stable. Therefore, our study aims to detect ASD at the earliest possible time, achieve higher accuracy than previous research, and reduce medical costs. In this paper, we aim to predict and distinguish between autistic and non-autistic children using a machine learning approach. First, we gathered as much data as possible from surveillance sources. We also set particular questions and tried to find the most accurate answers to all of them. Supervised learning algorithms are then applied to diagnose whether a child meets the symptoms of ASD. Among all the applied algorithms, KNN and Random Forest show the highest accuracy and speed of diagnosis. Above all, our final goal is to create an online tool that can provide machine learning-based analysis to a user to detect autism precisely at an early age.

Index Terms—ASD, Toddler, Machine learning, Prediction, Treatment, KNN, Random Forest Classifier

I. INTRODUCTION

... his or her communication skill through therapy. Symptoms start to show between the ages of 12 and 18 months, and outcomes improve if the disorder is detected earlier and treated accordingly [2]. We aim to detect autism at an early age so that the necessary steps can be taken to prevent it from getting worse. Early detection can also help avoid heavy future expenses, for example on developing social skills later in life. According to the WHO, one in every 160 children worldwide is diagnosed with ASD traits each year [3]. Treating ASD early is always the best option for toddlers, as they are still developing. Against this huge burden, only 200 psychiatrists and a limited number of other professionals are serving. Because doctors have to depend on observing the responses of toddlers as well as listening to the concerns of their parents, an ASD diagnosis is not easy to make at all. The objective of this work is therefore to detect ASD symptoms at an early age in minimal time, to search for the most accurate dataset in order to improve on the accuracy of previous research, and to use as much data as possible. Besides, this work focuses on developing a model using supervised machine learning techniques.
II. PROPOSED MODEL
A. Dataset Description
To apply supervised algorithms, we used a dataset of 1054 instances. It has been categorized into three areas: medical, health, and social science. The attribute types are categorical, continuous, and binary. The Q-CHAT and AQ screening tools each consist of 10 questions, and the item values were taken from the Kaggle and UCI ML repositories [4]. The dataset covers toddlers, of whom 30.76% are female and 69.24% are male, for a total of 1054 toddler cases. The following ten questions were asked of the parent, the subject, a caregiver, or medical staff. The possible answers to the Q-CHAT questions are 'Always', 'Usually', 'Sometimes', 'Rarely', and 'Never', which are encoded as '0' or '1' in the dataset. For question A10, '1' is allocated if the reply was Always, Usually, or Sometimes; otherwise it is '0'. For questions A1-A9, the response is recorded as '1' for the answers Sometimes, Rarely, or Never. If a child scores more than 3 on the Q-CHAT-10, the child is flagged as having ASD traits; otherwise, no ASD traits are detected. Fig. 1 and Fig. 2 describe the features of the data in the different columns [4].

Fig. 1: Details of variables mapping to the Q-CHAT-10 screening methods

The values were collected based on the Q-CHAT questionnaires. Most of the data are of boolean or binary type, which is fitting for the classifiers to compute and needs no preprocessing. Besides these, there are also some integer and string type fields which need conversion before we can use them in any classifier; otherwise we cannot reach the optimal result. Fig. 2 contains the data type of each field along with an explanation of the features and the data acquisition method.

Fig. 2: Features collected and their descriptions
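To make the scoring rule concrete, the following is a small illustrative sketch in Python (not the authors' code); the answer encoding follows the description above, and the function names are our own.

    # Illustrative sketch of the Q-CHAT-10 scoring rule described above.
    RESPONSES = ("Always", "Usually", "Sometimes", "Rarely", "Never")

    def qchat10_score(answers):
        """answers: 10 responses, one per item A1..A10, each drawn from RESPONSES."""
        score = 0
        for item, answer in enumerate(answers, start=1):
            if item <= 9:
                # A1-A9: 'Sometimes', 'Rarely' or 'Never' is recorded as 1
                score += answer in ("Sometimes", "Rarely", "Never")
            else:
                # A10: 'Always', 'Usually' or 'Sometimes' is recorded as 1
                score += answer in ("Always", "Usually", "Sometimes")
        return score

    def has_asd_traits(answers):
        # A total score above 3 flags potential ASD traits.
        return qchat10_score(answers) > 3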
B. Data preprocessing
The acquired dataset needed some modifications before we could test our classifier algorithms on it. We preprocessed the dataset in such a way that it was able to provide prime output. Previously we found that unsorted and unprocessed data affected our result scores hugely. To get rid of these problems we followed some particular steps so the algorithms would be able to give more precise results. Most of the values in the dataset were binary and boolean, based primarily on polar (yes/no) questions. However, a few of the question criteria required non-integer, non-boolean answers; these data were recorded in string format. For our algorithms to give optimal results we needed to convert these string values to binary, which we did using one-hot encoding. This technique was applied only to the values of the 'Ethnicity' column. One-hot encoding is a method that converts categorical variables into a form that can be given to ML algorithms to help them do a better job of prediction. For this process we first took the string values of the 'Ethnicity' column and then applied one-hot encoding to them to obtain unique binary correspondents. We also standardized the 'Age_Mons' and 'Qchat-10-Score' columns (scaling by the standard deviation), which transforms the values into the range of roughly -3 to 3. Lastly, we dropped the two columns that indicated the case number and who completed the Q-CHAT questionnaire, as they were not needed for our analysis. After this we were able to read the data for our experimentation; a rough sketch of these steps follows.
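The sketch below is a minimal illustration of this preprocessing with pandas and scikit-learn. The file name and the column names ('Case_No', 'Who completed the test', 'Ethnicity', 'Age_Mons', 'Qchat-10-Score') are assumptions for demonstration; the actual Kaggle/UCI file may name them differently.

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    # Hypothetical file and column names; adjust to the actual dataset.
    df = pd.read_csv("Toddler_Autism_dataset.csv")

    # Drop the two columns not needed for the analysis
    # (case number and who completed the questionnaire).
    df = df.drop(columns=["Case_No", "Who completed the test"])

    # One-hot encode the string-valued 'Ethnicity' column
    # into unique binary columns.
    df = pd.get_dummies(df, columns=["Ethnicity"])

    # Standardize the two numeric columns so values fall roughly within -3 to 3.
    scaler = StandardScaler()
    df[["Age_Mons", "Qchat-10-Score"]] = scaler.fit_transform(
        df[["Age_Mons", "Qchat-10-Score"]])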
Next we decided which ML algorithms to use for our experimentation based on our data size, the data features, and the target results we were looking for. Initially we chose the following algorithms:
• Random Forest Classifier
• Support Vector Machine
• Decision Tree Classifier
• Gaussian Naive Bayes Classifier
• K-Nearest Neighbor
• Logistic Regression
For these algorithms we need to import the required libraries individually for each of the classifiers. Our goal in applying these classifiers was to find the accuracy, precision, and recall scores, for which we imported the necessary libraries. For the KNN, Random Forest, and Naive Bayes classifiers we imported the AdaBoost and GradientBoosting classifiers along with the other required ones. Some of the common libraries used were pandas, numpy, matplotlib.pyplot, plot_roc_curve, etc. Our experiments were done using a Google Colab notebook, into which we imported the libraries and the preprocessed data to compute our target result. At the beginning of the process we set the features of the data to X and the output to y. We then split our dataset into a test set and a train set, where 20% of the dataset was used for testing and 80% for training. A sketch of this setup follows.
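The following is a minimal sketch of that setup, reusing the preprocessed frame 'df' from the earlier sketch. The target column name 'Class/ASD Traits' and its 'Yes'/'No' labels are assumptions; the classifier parameters mirror those reported later in the results section.

    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression

    # 'Class/ASD Traits' is an assumed name/encoding for the target column.
    y = df["Class/ASD Traits"].map({"Yes": 1, "No": 0})
    X = df.drop(columns=["Class/ASD Traits"])

    # 80/20 train/test split as described above.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    classifiers = {
        "Random Forest": RandomForestClassifier(),
        "SVM": SVC(gamma=0.7, C=1.0),
        "Decision Tree": DecisionTreeClassifier(),
        "Naive Bayes": GaussianNB(),
        "K-NN": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        print(name, "test accuracy:", clf.score(X_test, y_test))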
C. Model Description
The proposed model ensures a more accurate result for research on autism detection at an early age. We used different machine learning algorithms to get more accurate results: Support Vector Machine, Random Forest, Naive Bayes, KNN, Logistic Regression, and Decision Tree. First, we collected the questions by following a standard. From those required to answer them, we collected the answers and sent them to data preprocessing. Data preprocessing checks whether the data is in a non-matrix format or not, how many attributes and instances there are, and how many of them we need to run in a specific algorithm. In our case there are 1054 instances and 18 attributes, including the class variable. We have transformed the 'Ethnicity' column into unique binary values, and for data preprocessing we have also used standardization on 'Age_Mons' and 'Qchat-10-Score', which transforms the values into the range of roughly -3 to 3. After preprocessing we started applying our ML algorithms: Logistic Regression, Decision Tree, Random Forest, Naive Bayes, SVM, and KNN. Logistic Regression and Decision Tree were later excluded as they overfit our dataset. The other algorithms performed well; they neither overfit nor underfit, even though our dataset is small for such algorithms. The accuracy of all the algorithms was compared for better results. Furthermore, we found the scores for precision and recall; precision does not depend on accuracy, as the precision values are not always the same. After getting the precision, accuracy, and recall we compared the results and selected the best-fitting algorithm for our model. Lastly, we applied it in the application of our model. Our goal is to improve on the accuracy of the results of other autism detection research. Besides this, our second target is to get as much data as possible to train our model. We have planned to offer a mobile application of this model in the future, and for that we will need a funding sponsor.

III. RESULT AND ANALYSIS

Firstly, to build a model the important thing is to determine the goodness of the model, and the most valuable work is assessing how good its predictions are. A diagram of this kind, called a ROC curve, is shown later in Fig. 6. From a ROC curve, it can be determined how an algorithm performs by observing the AUC (area under the curve).
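As a rough illustration of judging a model by its AUC, a sketch using scikit-learn's roc_auc_score is given below; it assumes the fitted 'classifiers' dictionary and the test split from the earlier sketch.

    from sklearn.metrics import roc_auc_score

    # AUC summarizes the whole ROC curve in one number:
    # 1.0 means positives are always ranked above negatives, 0.5 is chance level.
    rf = classifiers["Random Forest"]          # fitted model from the sketch above
    scores = rf.predict_proba(X_test)[:, 1]    # probability of the positive class
    print("Random Forest AUC:", roc_auc_score(y_test, scores))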
Fig. 5: Confusion matrix parameters

True Positives (TP): True positives are correctly found values. This means the genuine result and the anticipated result are both yes.

True Negatives (TN): True negatives are also correctly found results. This means the genuine and anticipated results are both no.

False Positives (FP): This means the real result is no but the anticipated result is yes.

False Negatives (FN): This means the real result is yes but the anticipated result is no.

How well a model has performed can be measured by some parameters, namely Precision, Accuracy, F1 Score, and Recall. Using the four values above we can calculate these scores as follows.

Accuracy: Accuracy means how close the measured value is to the standard value; it is the proportion of the total number of predictions that are actually correct.

Accuracy = (TP + TN) / (TP + FP + FN + TN) (1)

Precision: Precision indicates the adjacency of two or more measurements to each other. If we take a measurement 3 times and every time we get the same value, then even if that value is not close to the standard value the measured value is considered precise, but not accurate. So precision does not depend on accuracy.

Precision = TP / (TP + FP) (2)

Recall (Sensitivity): Recall determines how many of the real positives a model captures by labeling them as true positives. So, recall is the ratio of accurately anticipated positive inspections to all the inspections that are actually positive.

Recall = TP / (TP + FN) (3)

F1 Score: F1 takes both false negatives and false positives into account and takes a weighted average. F1 is usually more useful than accuracy. To obtain a high F1 score, the false positive and false negative values need to be minimized. The details of these values are:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall) (4)

So, these parameters and the ROC curve show the performance of a model [6].

We have implemented four supervised machine learning classifier algorithms in this paper: K-NN, Naive Bayes, SVM, and Random Forest. For every experiment we used 20% of the data for testing and 80% for training the model.

SVM Classifier Experiment: The support vector machine is a different type of machine learning classifier that shows somewhat different results than other models. SVM is a model which works best for limited amounts of data, and as we are dealing with fewer than two thousand records, this algorithm was quite fitting for the task. Before proper preprocessing, the SVM showed almost 71-74% accuracy, which was not so good; after preprocessing it showed 83% accuracy. We used gamma = 0.7 and C = 1.0 as parameters when applying the classifier. The resulting accuracy was almost 83%, the precision score was 89%, and the Recall and F1 scores were both 88%. These scores can be seen in the results table below. The time it took to generate the result was 1.91 seconds, which is higher than some other models.

Naive Bayes Classifier Experiment: In the Naive Bayes experiment we used Gaussian Naive Bayes, which showed some promising results, better than the Support Vector Machine classifier. We kept the parameters for Naive Bayes at their defaults and did not apply any different parameters, as it showed good results with the default ones. It gave 89% accuracy, which is better than the first algorithm, and the precision score was 100%, which is impressive. The Recall score was 84%, which is not as good as before, and the F1 score was 91%. It took 1.53 seconds to complete the experiment and produce the result, which is the lowest among all the algorithms.

Random Forest Classifier Experiment: The third algorithm we used is the Random Forest algorithm, which was one of the best performers. We kept the parameters of the random forest as good as possible. The results it showed are better than the first two algorithms: 93% accuracy, which is the second-best result we got, and a precision of 92%, which is also good enough. The Recall and F1 scores were 100% and 96%, also better than the others. All the results of this algorithm were good except one thing: the time it took, 2.30 seconds, was the highest of all the algorithms.

K-NN Classifier Experiment: This algorithm was the last successful implementation in our experiments, and it showed the best results among all the algorithms. We used parameters of n_neighbors = 5, metric = 'minkowski', and p = 2; with the minkowski metric it performed better than with others. The accuracy it showed is 98%, the finest among all the results we got. The Precision, Recall, and F1 scores were 100%, 97%, and 99%, all good scores, higher than the others. The Precision score is better than some of the others, and precision indicates the adjacency of two or more measurements to each other. Also, a high precision rate means a low false positive rate, which means the score is high and this is good. Moreover, it should be mentioned that this algorithm managed its time well: it took only 1.55 seconds to complete the calculation.
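The four experiments above could be reproduced roughly along the following lines; this is a sketch, not the authors' notebook. The parameter choices follow the text (gamma = 0.7 and C = 1.0 for SVM; defaults for Gaussian Naive Bayes; n_neighbors = 5, minkowski, p = 2 for K-NN), scikit-learn computes equations (1)-(4) from TP, TN, FP, FN internally, and the exact scores and timings depend on the random split and hardware.

    import time
    from sklearn.metrics import (accuracy_score, precision_score,
                                 recall_score, f1_score)

    experiments = {
        "SVM": SVC(gamma=0.7, C=1.0),
        "Naive Bayes": GaussianNB(),  # default parameters, as described above
        "Random Forest": RandomForestClassifier(),
        "K-NN": KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2),
    }
    for name, clf in experiments.items():
        start = time.time()
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        elapsed = time.time() - start
        # Accuracy, Precision, Recall and F1 per equations (1)-(4):
        print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.2f} "
              f"precision={precision_score(y_test, y_pred):.2f} "
              f"recall={recall_score(y_test, y_pred):.2f} "
              f"F1={f1_score(y_test, y_pred):.2f} "
              f"time={elapsed:.2f}s")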
The results of applying the algorithms are given below:

Method          Accuracy   Precision   Recall   F1 Score   Time
SVM             0.83       0.89        0.88     0.88       1.91 sec
Naive Bayes     0.89       1.0         0.84     0.91       1.53 sec
Random Forest   0.93       0.92        1.0      0.96       2.30 sec
K-NN            0.98       1.0         0.97     0.99       1.55 sec

TABLE I: Results after applying classifier algorithms
Firstly, to obtain a good result from a dataset we had to do some preprocessing, which is necessary for a machine learning model to perform well. In machine learning we should do proper data preprocessing for future-proofing; that is, the acceptability of models and their ability to perform well also depend on preprocessing. What we did in preprocessing is keep all of the feature values binary except for three features. Of those three features, one was "Ethnicity", which was of string type. For the string type data we applied the encoding named one-hot encoding to transform the string data into binary but unique values. We could have used simple numerical values, but numerical values for 11 types of ethnicity would not give better results after applying the algorithms. One-hot encoding is different from simple binary coding because it represents each value as a unique value, with no dependency or ordering like numerical values have. For the other two fields, "Age_Mons" and "Qchat-10-Score", we applied standardization, which transformed the data, previously numerical values without any range, into the range of -3 to 3. This data preprocessing helped the algorithms perform better and obtain better results. For instance, the Support Vector Machine gave accuracy results of at most 71-73% without preprocessing, when the values of "Ethnicity" were converted into numerical values only; but after applying the preprocessing, the SVM did a better job and its accuracy increased by 10%. That is why we used this preprocessing for all the algorithms.

Secondly, we experimented with a total of six algorithms on our dataset: Random Forest, SVM, K-NN, Naive Bayes, Logistic Regression, and Decision Tree. Every one of them is a classifier algorithm. Of these six algorithms, two, namely Logistic Regression and Decision Tree, resulted in overfitting of the data. So we dropped these two algorithms and kept the best-performing algorithms in our experiments. From the remaining four algorithms, K-NN showed us the most promising result and took one of the lowest possible calculation times compared to the others. The second-best performing classifier algorithm was the Random Forest classifier. Among the algorithms we chose K-NN because closeness of features is the main basis of K-NN: a data point is classified based on how its neighbours are classified. It works better on data which are not complex, and our dataset is complication-free (noise-free) as we preprocessed the data. Also, it works well on small datasets. K-NN normally calculates the Euclidean distance of an unknown data point from all the points to find the nearest neighbour values; by default it takes the 5 nearest values by Euclidean distance to classify. We used 33 as the number of neighbours because that is the square root of our total data count, and the square root of the total data count is one of the best choices for the k value. The 33 nearest values by Euclidean distance did help to classify whether a child has autism or not. These are the reasons why K-NN performed really well in our model.

Thirdly, the algorithm giving the second-best result is Random Forest, and we used it because Random Forest shows no overfitting of data in its results. By using multiple trees it reduces the risk of overfitting. It also takes less training time, though it did the opposite in terms of timing in our case. The random forest method operates by constructing multiple decision trees during the training phase; from those decision trees, the random forest takes the majority-voted result, and that is why it can give higher accuracy in its results.

After that, the Naive Bayes classifier has given the third-best result, which is 89% accuracy. The Naive Bayes classifier works on the basis of conditional probability from Bayes' theorem. It is simple and very easy to implement. It is not sensitive to irrelevant features, and we found that it would have given good results even if we had irrelevant features. It needs less training data, and we chose it as we have a small amount of data, fewer than two thousand records. Moreover, it is fast: it took 1.53 seconds to train and calculate the results, the lowest compared to the other algorithms. But it gave less accuracy than the Random Forest and K-NN classifiers, because those algorithms could obtain better results, having certain advantages on our dataset.

In the case of the Support Vector Machine, it gave good results, but the lowest among the four algorithms. The Support Vector Machine creates a decision boundary by dividing the data into two categories. In splitting the data, the boundary should be placed where the maximum space is found separating the two classes. In our case the Support Vector Machine could not place the boundary in the best position, where it separates the two classes with maximum space; that is, the support vectors and the hyperplane were not as far apart as possible, which is why it showed poorer results than the others.
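Before turning to the ROC comparison in Fig. 6, here is a sketch of how such a figure can be drawn with scikit-learn. It uses RocCurveDisplay (the newer replacement for the plot_roc_curve helper mentioned earlier, which was removed in scikit-learn 1.2) and the fitted models from the earlier sketches.

    import matplotlib.pyplot as plt
    from sklearn.metrics import RocCurveDisplay

    ax = plt.gca()
    for name in ["Random Forest", "K-NN", "Naive Bayes", "SVM"]:
        # Each call draws one curve of true positive rate vs false positive
        # rate and reports its AUC in the legend.
        RocCurveDisplay.from_estimator(experiments[name], X_test, y_test,
                                       name=name, ax=ax)
    plt.title("ROC curves of the four classifiers")
    plt.show()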
The ROC curve in figure 6 demonstrates the true positive rate versus false positive rate of the algorithms.

Fig. 6: ROC curve comparing Random Forest Classifier with KNN, Gaussian NB and SVM

Here we can see that the Support Vector Machine has the lowest area under the curve, which is why its (orange) curve looks the way it does. The other curves show that Gaussian NB is a little less accurate than the K-NN and Random Forest classifier algorithms. This means the curves demonstrate that Random Forest and K-NN show better performance than the others.

IV. CONCLUSION AND FUTURE WORK

Our proposed model allows us to have a more accurate result in terms of detecting autism at an early age. The questions which are provided to the parents to identify whether their children
are at risk or not are set in a way that maintains their privacy. Using the dataset from the Q-CHAT and AQ tools, our proposed model can predict using SVM, Random Forest, Naive Bayes, and KNN with 83%, 93%, 89%, and 98% accuracy, respectively, in the case of toddlers. Supervised algorithms were selected to run on our dataset after preprocessing it. We used SVM, Random Forest, Naive Bayes, and KNN to get our output more accurately, and this outcome showed better performance compared to the others; our best results reached 93% and 98% accuracy. The main limitation of our model is the lack of sufficiently large data to train it. Among the limitations in the characteristics of the design or methodology that impacted or influenced the interpretation of our research, the prime candidate is the limited dataset. While implementing two of the algorithms we found that, because of the limits of our dataset, our results were overfitting. This modeling error occurs when a function corresponds too closely to a particular set of data; as a result, it fails to fit additional data, affecting the accuracy of predictions on future observations, and we had to drop those two algorithms. This was not helped even by preprocessing our data.
Previously, many have tried to detect autism at different age ranges, but we tried to emphasize early ASD detection. The purpose of choosing such an age limitation in our model is to get results that are as accurate as possible. With more accurate results and more data to train our model, tremendous work can be done. Many countries are struggling to detect autism as early as possible, but with our model and the set of questionnaires we collected, the problem can be addressed with far less effort. Our objective for future work is to collect as much data as possible from various sources and enhance the accuracy further. Moreover, we are thinking of building a user-friendly mobile application for end users based on our proposed model, so that any individual can use the application to predict early autism symptoms effortlessly and seek professional help if needed. Since diagnosing autism is quite a costly and lengthy process, it has been postponed for countless children. To conclude, with the help of our proposed model, individuals can be guided at a very early age, which will keep the situation from getting any worse and reduce the costs associated with delayed diagnosis.

REFERENCES

[1] J. Baio, "Prevalence of autism spectrum disorders: Autism and developmental disabilities monitoring network, 14 sites, United States, 2008. Morbidity and mortality weekly report. Surveillance summaries. Volume 61, number 3." Centers for Disease Control and Prevention, 2012.
[2] S. E. Bryson, L. Zwaigenbaum, and W. Roberts, "The early detection of autism in clinical practice," Paediatrics & Child Health, vol. 9, no. 4, pp. 219-221, 2004.
[3] P. Mesa-Gresa, H. Gil-Gómez, J.-A. Lozano-Quilis, and J.-A. Gil-Gómez, "Effectiveness of virtual reality for children and adolescents with autism spectrum disorder: an evidence-based systematic review," Sensors, vol. 18, no. 8, p. 2486, 2018.
[4] F. Thabtah and D. Peebles, "A new machine learning model based on induction of rules for autism detection," Health Informatics Journal, vol. 26, no. 1, pp. 264-286, 2020.
[5] N. Donges, "A complete guide to the random forest algorithm," Built In, 2019.
[6] Exsilio Solutions, "Accuracy, precision, recall & F1 score: Interpretation of performance measures," Nov 2016. [Online]. Available: https://blog.exsilio.com/all/accuracy-precision-recall-f1