VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Belagavi, Karnataka, India
A Synopsis on
“Analysis and Detection of Autism Spectrum Disorder Using Machine
Learning”
Submitted to RVITM Affiliated to Visvesvaraya Technological University (VTU Belagavi) in partial
fulfillment of the requirements for the award of degree of
BACHELOR OF ENGINEERING
in
ELECTRONICS AND COMMUNICATION ENGINEERING
By
Project Team No. PT15
Akshat Gupta 1RF20EC003
Archith P 1RF20EC011
Mohammed Nadeem 1RF20EC028
Vikrant Rana 1RF20EC053
Under the guidance of
Dr. Vikash Kumar
Assistant Professor Dept. of ECE,
RVITM
R V Educational Institutions
R V Institute of Technology and Management, Bengaluru
Department of Electronics and Communication Engineering
2023-24
ABSTRACT
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by a wide
range of behavioral and cognitive traits. Early detection and accurate diagnosis of ASD are critical for
enabling timely interventions that can significantly improve outcomes for affected individuals. This
project presents a comprehensive approach that harnesses the power of machine learning to facilitate the
detection and assessment of autism.
The proposed workflow involves the pre-processing of data, training, and testing with various ML
models, evaluation of results and prediction of ASD. The proposed method is evaluated on a publicly
available dataset. The dataset is collected based on the evaluation of 21 attributes that are found to be
common in patients suffering from ASD. Data pre-processing is a technique which transforms the raw
data into a meaningful and understandable format. Then the preprocessed data is used to train various
ML models and the models are evaluated on different metrics such as sensitivity, specificity and
accuracy.
The proposed project aims to take a significant step toward advancing the early detection and
assessment of ASD. The manual process of ASD diagnosis is unreliable due to the unavailability of
resources and expert opinion. Therefore, computerized diagnostic systems that use machine learning
architectures are proposed to learn the patterns in the provided data and to identify the severity of the
disease. The proposed ML model is expected to achieve higher performance on ASD detection than the
conventional approach.
MOTIVATION
The motivation for using machine learning algorithms to analyze and detect ASD is driven by several
important factors:
1. Early Intervention: Early detection of ASD is crucial for providing effective interventions and
support to people with ASD. The earlier the patient is diagnosed, the more effective
interventions can be, which can significantly improve long-term outcomes.
2. Diagnostic Challenges: ASD is a complex neurodevelopmental disorder with a wide range of
symptoms and varying degrees of severity. Diagnosing ASD based solely on clinical observation can be
challenging and time-consuming, and there is a need for more objective and accurate diagnostic tools.
3. Reduction of Healthcare Costs: Early and accurate diagnosis of ASD can lead to cost savings in
healthcare. It can help avoid misdiagnoses, unnecessary tests, and delays in accessing appropriate
interventions, thereby reducing the overall burden on healthcare systems.
4. Research Advancements: Machine learning can assist researchers in uncovering the underlying
mechanisms and causes of ASD by analyzing diverse datasets. This can lead to a deeper understanding
of the disorder and potentially the development of more targeted treatments.
5. Remote Screening: Machine learning models can be applied to remote screening, allowing for the
assessment of ASD risk factors and symptoms in individuals who may not have easy access to
specialized clinical facilities.
The motivation for using machine learning in the analysis and detection of Autism Spectrum Disorder is
driven by the potential to improve early diagnosis, reduce healthcare costs, advance research, and
ultimately enhance the quality of life for individuals with ASD and their families. Machine learning
offers the promise of more objective and data-driven approaches to understanding and addressing this
complex neurodevelopmental disorder.
LITERATURE REVIEW
TITLE FEATURE LIMITATION
Title: Muhammad Shuaib Qureshi et al., “Prediction and Analysis of Autism Spectrum Disorder Using
Machine Learning Techniques” (2023).
Feature: Support Vector Machine (SVM). SVM is a supervised classification technique that uses a line
(more generally, a hyperplane) to distinguish between two separate groups.
• SVM works relatively well when there is a clear margin of separation between classes.
• SVM is more effective in high dimensional spaces.
• SVM is effective in cases where the number of dimensions is greater than the number of samples.
• SVM is relatively memory efficient.
Limitation: When the classes are not linearly separable, the data must be mapped from its original
space to a higher (Nth) dimensional space, a step known as the kernel method. The SVM algorithm is not
suitable for large data sets. SVM does not perform very well when the data set is noisy, i.e. when the
target classes overlap.
Title: K. Vijayalakshmi et al., “A Hybrid Recommender System using Multi Classifier Regression Model
for Autism Detection” (2022).
Feature: Random Forest Classifier. The Random Forest decision tree method is a widespread
classification mechanism that handles binary classification problems through visualization. Random
Forest is a collaborative decision tree based technique that generates a forest as a group of decision
trees.
• The overfitting of a single decision tree (DT) can be overcome with Random Forest.
• Using voting across the randomly built trees of the forest, the best-scoring class is selected.
Limitation: It requires much computational power as well as resources, as it builds numerous trees to
combine their outputs. It also requires much time for training, as it combines many decision trees to
determine the class.
Title: Shirajul Islam et al., “Autism Spectrum Disorder Detection in Toddlers for Early Diagnosis Using
Machine Learning” (2021).
Feature: Logistic Regression (LR). The primary aim of Logistic Regression is to find the best-fitting
model that describes the relationship between the binomial character of interest and a set of
independent variables. It makes use of a logistic function to find an optimal curve to fit the data points.
• It makes no assumptions about distributions of classes in feature space.
• It can easily be extended to multiple classes (multinomial regression) and gives a natural
probabilistic view of class predictions.
Limitation: If the number of observations is smaller than the number of features, Logistic Regression
should not be used, as it may lead to overfitting. It can only be used to predict discrete outcomes.
Title: Sushama Rani Dutta et al., “A Machine Learning-based Method for Autism Diagnosis Assistance
in Children” (2021).
Feature: Naive Bayes (NB). NB is based around conditional probability (Bayes theorem) and counting.
The name “naïve” comes from its assumption of conditional independence of all input features. If this
assumption holds, an NB classifier converges much faster than a discriminative model like logistic
regression, so the amount of training data required is smaller.
Limitation: NB only works well with a limited number of features. Moreover, there is a high bias when
there is a small amount of data.
Title: Haibin Cai, Yinfeng Fang, Zhaojie Ju et al., “Sensing-enhanced Therapy System for Assessing
Children with Autism Spectrum Disorders: A Feasibility Study” (2020).
Feature: Decision trees can provide information about the importance of different features (questions,
variables, or symptoms) in the classification process. This can be valuable for understanding which
factors contribute most to autism detection. Decision trees can capture non-linear relationships between
features and the target variable, which can be important in identifying complex patterns in autism
diagnosis.
Limitation: Overfitting: Decision trees are prone to overfitting. Instability: Small changes in the data
can lead to different tree structures, making the model unstable.
Title: Andi W.R Emanuel et al., “Machine Learning Classifiers for Autism Spectrum Disorder” (2020).
Feature: K-Nearest Neighbors Algorithm. The k-nearest neighbors algorithm, also known as KNN or
k-NN, is a non-parametric, supervised learning classifier, which uses proximity to make classifications
or predictions about the grouping of an individual data point.
• Can handle large datasets.
• Accurate and effective.
Limitation: Sensitive to outliers. Computationally expensive.
Title: Che Zawiyah Che Hasan, Rozita Jailani et al., “ANN and SVM Classifiers in Identifying Autism
Spectrum Disorder Gait Based on Three-Dimensional Ground Reaction Forces” (2019).
Feature: XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm that has been
widely used for various classification tasks, including autism detection.
• Outstanding predictive performance.
• XGBoost can handle missing data natively by learning, during training, which branch missing values
should follow at each split.
Limitation: XGBoost is a complex algorithm with many hyperparameters to tune. XGBoost may not
perform well if the data quality is poor or if the dataset is biased.
PROBLEM STATEMENT
Autism spectrum disorder (ASD) is a disorder in which patients have difficulty expressing themselves
and interacting with others. It is a growing concern that one in 59 children has been identified as having
ASD. According to recent reports, about 20 million people in India are diagnosed with autism. ASD
begins in childhood, but its symptoms may not be detected until adulthood. As a result, many children
do not receive proper treatment at an early age, which leads to further complications in their health.
Research shows that a diagnosis of autism at an earlier age can be more reliable and stable. Therefore,
our proposed project aims to detect ASD as early as possible, to improve on the accuracy of previous
research, and to reduce medical costs.
Early detection and treatment are the most important steps in reducing the symptoms of ASD and
improving the quality of life of people with ASD. However, there is no medical test for the detection of
autism; ASD symptoms are usually recognized by observation. Although genes are assumed to be
responsible, scientists have not yet identified the exact causes of ASD. Genes are thought to affect
development in combination with environmental influences.
SCOPE & OBJECTIVES
Develop a Machine learning model that accurately detects Autism Spectrum Disorder:
1. Accurate Detection: Develop machine learning models that can accurately detect ASD from the
collected data, with a focus on achieving high sensitivity and specificity.
2. Early Detection: If applicable, design models that can identify signs of ASD in early childhood to
facilitate early intervention and support.
3. Interpretability: Ensure that the machine learning models provide interpretable results, enabling
healthcare professionals to understand the basis for ASD diagnosis.
4. Reduced Misdiagnosis: Minimize the risk of misdiagnosis and improve the reliability of ASD
diagnosis compared to traditional assessment methods.
METHODOLOGY
The proposed workflow, shown in Fig. 1, involves pre-processing of data, training and testing with the
specified models, evaluation of results, and prediction of ASD.
Preprocessing
Data pre-processing is a technique that transforms raw data into a meaningful and understandable
format. Real-world data is commonly incomplete and inconsistent because it contains many errors and
null values. Well pre-processed data generally yields better results. Various data pre-processing methods
are used to handle incomplete and inconsistent data, such as handling missing values, outlier detection,
data discretization, and data reduction (dimension and numerosity reduction). The problem of missing
values in this dataset is handled by imputation.
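The imputation step described above can be illustrated with a minimal sketch. It assumes that missing entries are represented as None, that numeric attributes are filled with the column mean, and that categorical attributes are filled with the most frequent value. The synopsis does not state which imputation strategy is used, so the function name, the strategy and the sample columns below are hypothetical.

```python
from statistics import mean, mode


def impute_column(values):
    """Fill None entries with the column mean (numeric) or mode (categorical).

    Assumption: this mean/mode strategy is illustrative only; the synopsis
    does not specify which imputation method is applied.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing observed in this column, so no fill value can be derived.
        return list(values)
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = mode(present)
    return [fill if v is None else v for v in values]


# Hypothetical attribute columns with missing entries.
ages = [4, None, 6, 5]
ethnicity = ["A", None, "B", "A"]

print(impute_column(ages))
print(impute_column(ethnicity))
```

Mean and mode are simple baselines; a more careful choice would depend on how many values are missing and on the type of each attribute.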
Training and Testing Model
The whole dataset is split into two parts, a training dataset and a testing dataset, in a ratio of 80:20.
For validation purposes, the training data is split again into a training dataset and a validation dataset,
also in an 80:20 ratio. Figure 2 shows the final training, testing and validation sets on which
classification is performed.
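The two-stage 80:20 split can be sketched as follows. This is an illustrative example using only the Python standard library; the helper name split, the fixed seed and the 100 stand-in records are assumptions and not part of the proposed implementation.

```python
import random


def split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and return (larger_part, held_out_part)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_held_out = round(len(rows) * test_fraction)
    return rows[n_held_out:], rows[:n_held_out]


# Stand-in for 100 dataset records (hypothetical).
records = list(range(100))

# First split: 80% training + validation, 20% testing.
train_val, test = split(records, test_fraction=0.2, seed=42)
# Second split: 80% training, 20% validation, taken from the training part.
train, val = split(train_val, test_fraction=0.2, seed=42)

print(len(train), len(val), len(test))  # 64 16 20
```

With this scheme the final proportions of the whole dataset are 64% training, 16% validation and 20% testing.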
Support Vector Machine (SVM)
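SVM is a supervised machine learning approach that is used for classification and regression. In its
basic form it is a linear classifier, and kernels extend it to non-linear problems. It is a pattern
recognition problem solver. Its margin maximization and regularization help to reduce overfitting,
although they do not eliminate it.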
Naïve Bayes (NB)
A naive Bayes classifier is a supervised learning algorithm. It is a generative model based on the joint
probability distribution of the features and the class. The naive Bayes concept is based on the
assumption that the features are independent of one another given the class. It requires less training
time than the SVM model.
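The independence assumption can be made concrete with a small sketch of a naive Bayes classifier for binary (0/1) features. The choice of binary features, the Laplace smoothing, the function names and the toy data are all assumptions made for illustration; the synopsis does not specify which naive Bayes variant is used.

```python
import math


def train_nb(X, y):
    """Estimate class priors and P(feature_j = 1 | class) with Laplace smoothing."""
    priors, likelihoods = {}, {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
    return priors, likelihoods


def predict_nb(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    priors, likelihoods = model

    def log_posterior(c):
        score = math.log(priors[c])
        # Independence assumption: per-feature log-likelihoods simply add up.
        for xj, p in zip(x, likelihoods[c]):
            score += math.log(p if xj == 1 else 1 - p)
        return score

    return max(priors, key=log_posterior)


# Hypothetical toy data: three yes/no screening answers per person.
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]

model = train_nb(X, y)
print(predict_nb(model, [1, 1, 0]))  # 1
print(predict_nb(model, [0, 0, 0]))  # 0
```

Training only requires counting, which is consistent with the short training time noted above.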
Convolutional Neural Network (CNN)
CNN is a deep learning technique used to build models for various problems. It is a feed-forward
neural network that is inspired by the human brain. A CNN model contains an input layer, an output
layer, and several layers in between, i.e. convolution layers, max-pooling layers, fully connected
layers, and normalization layers.
Logistic Regression (LR)
LR is a regression tool used to analyze binary dependent variables. Its output is a probability between
0 and 1, which is mapped to a class label of either 0 or 1. It can be used with continuous-valued input
features.
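How logistic regression turns continuous inputs into a 0 or 1 output can be sketched as below. The coefficients, the 0.5 threshold and the function names are hypothetical values chosen for illustration; in practice the weights would be learned from the training data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(weights, bias, features, threshold=0.5):
    """Threshold the probability to obtain a 0/1 class label."""
    return 1 if predict_proba(weights, bias, features) >= threshold else 0


# Hypothetical coefficients, not learned from any real dataset.
weights, bias = [1.5, -0.5], -1.0

print(predict_proba(weights, bias, [2.0, 1.0]))
print(predict_label(weights, bias, [2.0, 1.0]))  # 1
print(predict_label(weights, bias, [0.0, 1.0]))  # 0
```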
K-Nearest Neighbour (KNN)
KNN is a supervised learning approach and one of the simplest. It is used for classification as well as
regression problems. It assumes that similar data points lie close to each other. The ‘K’ indicates the
number of nearest neighbours to be considered when classifying a point. It should be chosen carefully
to reduce the error.
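The role of K can be seen in a minimal KNN classifier. The sketch below assumes Euclidean distance and majority voting, which are common defaults but are not stated in the synopsis; the function name and the toy points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D points forming two well-separated groups.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (1, 1), k=3))  # 0
print(knn_predict(train_X, train_y, (5, 4), k=3))  # 1
```

Every prediction computes the distance to all training points, so the cost grows with the size of the training set. An odd K avoids tied votes in binary classification.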
FLOWCHART
[Flowchart: Attributes -> Data Preprocessing -> Machine Learning Algorithm -> Classification -> Autism / No Autism]
Expected Outcome of the Project
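• The ML models are evaluated on metrics such as sensitivity, specificity and accuracy, which are
derived from the confusion matrix of a binary classification algorithm (true positives TP, false
positives FP, true negatives TN and false negatives FN).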
• The expected accuracy of the proposed model would be in the range of 80-98%
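The evaluation metrics named in the abstract (sensitivity, specificity and accuracy) follow from the confusion-matrix counts as sketched below. The label encoding (1 for ASD, 0 for no ASD), the function names and the sample predictions are assumptions made for illustration and are not results of the project.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # share of ASD cases that are detected
        "specificity": tn / (tn + fp),  # share of non-ASD cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
print(evaluate(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters for screening, since accuracy alone can hide missed ASD cases when the classes are imbalanced.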
Facilities Required for Proposed Work
Software:
Jupyter Notebook
TensorFlow framework
Programming languages:
Python
HTML (frontend)
CSS (frontend)
JavaScript (frontend)
Datasets:
The dataset is obtained from the UCI Machine Learning Repository in the following 3 categories:
Processor:
Intel Core i5 (10th Gen) with an NVIDIA graphics card (for training the models)
Bibliography/Reference
[1] Benjamin Gesundheit and Joshua P. Rosenzweig, “Editorial: Autism Spectrum Disorders (ASD)-
Searching for the Biological Basis for Behavioral Symptoms and New Therapeutic Targets”, published
online January 2023.
[2] Arodami Chorianopoulou, Efthymios Tzinis, Elias Iosif, Asimenia Papoulidi, Christina Papailiou,
Alexandros Potamianos, “Engagement detection for children with autism spectrum disorder”, 2023.
[3] Siriwan Sunsirikul and Tiranee Achalakul, “Associative Classification Mining in the Behavior
Study of Autism Spectrum Disorder”, vol.3, 2022.
[4] Beibin Li, Sachin Mehta, Deepali Aneja, Claire Foster, Pamela Ventola, Frederick Shic, Linda
Shapiro, “A Facial Affect Analysis System for Autism Spectrum Disorder”, 2022.
[5] Pratibha Vellanki, Thi Duong, Svetha Venkatesh, Dinh Phung, “Nonparametric Discovery of
Learning Patterns and Autism Subgroups from Therapeutic Data”, 2022.
[6] Paul Fergus, Basma Abdulaimma, Chris Carter, Sheena Round, “Interactive Mobile Technology for
Children with Autism Spectrum Condition (ASC)”, 2021.
[7] V.Y Tittagalla, R. R. P Wickramarachchi, G. W. C. N. Chandrarathne, N.M. D. M. B. Nanayakkara,
P. Samarasinghe, P. Rathnayake and M.G.N.M. Pemadasa, “Screening Tool for Autistic Children”,
2021.
[8] Daiki Mitsumoto, Takeshi Hori, Shigeki Sagayama, Hidenori Yamasue, Keiho Owada, Masaki
Kojima, Keiko Ochi, Nobutaka Ono, “Autism Spectrum Disorder Discrimination Based on Voice
Activities Related to Fillers and Laughter”, 2021.
[9] Tarannum Zaki, Muhammad Nazrul Islam, Md. Sami Uddin, Sanjida Nasreen Tumpa, Md. Jubair
Hossain, Maksuda Rahman Anti, Md. Mahedi Hasan, “Towards Developing a Learning Tool for
Children with Autism”, 2021.
[10] Ardiana Sula, Evjola Spaho, Keita Matsuo, Leonard Barolli, Rozeta Miho and Fatos Xhafa, “An
IoT-based System for Supporting Children with Autism Spectrum Disorder”, 2020.
[11] Haibin Cai, Yinfeng Fang, Zhaojie Ju, Cristina Costescu, Daniel David, Erik Billing, Tom Ziemke,
Serge Thill, Tony Belpaeme, Bram Vanderborght, David Vernon, Kathleen Richardson and Honghai Liu,
“Sensing-enhanced Therapy System for Assessing Children with Autism Spectrum Disorders: A
Feasibility Study”, 2020.
[12] Akshay Vijayan, S Janmasree, C Keerthana, L Baby Syla, “A Framework for Intelligent Learning
Assistant Platform Based on Cognitive Computing for Children with Autism Spectrum Disorder”, July
2019.
[13] Sushama Rani Dutta, Sujoy Datta, Monideepa Roy, “Using Cogency and Machine Learning for
Autism Detection from a Preliminary Symptom”, July 2019.
[14] Che Zawiyah Che Hasan, Rozita Jailani and Nooritawati Md Tahir, “ANN and SVM Classifiers in
Identifying Autism Spectrum Disorder Gait Based on Three-Dimensional Ground Reaction Forces”,
October 2019.
[15] D. P. Wall, R. Dally, R. Luyster, J.-Y. Jung, and T. F. DeLuca, “Use of artificial intelligence to
shorten the behavioral diagnosis of autism,” PLoS ONE, vol. 7, no. 8, p. e43855, 2012.
[16] D. Bone, S. L. Bishop, M. P. Black, M. S. Goodwin, C. Lord, and S. S. Narayanan, “Use of
machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and
multi-instrument fusion,” Journal of Child Psychology and Psychiatry, vol. 57, 2016.
[17] J. Kosmicki, V. Sochat, M. Duda, and D. Wall, “Searching for a minimal set of behaviors for autism
detection through feature selection-based machine learning,” Translational Psychiatry, vol. 5, no. 2, p.
e514, 2015.
[18] W. Liu, M. Li, and L. Yi, “Identifying children with autism spectrum disorder based on their face
processing abnormality: A machine learning framework,” Autism Research, vol. 9, no. 8, pp. 888–898,
2016.
[19] Kazi Shahrukh Omar, Prodipta Mondal, Nabila Shahnaz Khan, “A Machine Learning Approach to
Predict Autism Spectrum Disorder”, 7-9 February, 2019.
[20] Tania Akter, Md Shahriare Satu, Md Imran Khan, Mohammad Hanif Ali, Shahadat Uddin, Pietro
Lio, Julian MW Quinn and Mohammad Ali Moni, “Machine learning-based models for early stage
detection of autism spectrum disorders”, 2019.
Additional Information
Is the project proposed relevant to the Industry / Society or Institution?
The identification of autism using ML algorithms is a proposed project that is highly relevant to both
business and society. Reliable and effective ML algorithms for the diagnosis of autism can have
important commercial implications. Such algorithms can be used by medical firms and healthcare
professionals to give quicker and more precise diagnoses of autism, improving patient outcomes and
lowering costs.
The suggested project can also help society because it could significantly affect how autism is
diagnosed and treated. The proposed project can help in the early identification of autism, allowing for
prompt interventions and therapies to improve the quality of life for individuals with autism and their
families. It will do this by creating a dependable and effective instrument for the detection of autism.
All things considered, the suggested project on the diagnosis of autism using ML algorithms has huge
ramifications for both business and society, making it extremely relevant and significant.
Can the product or process developed in the project be taken up for filing a Patent?
No