
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Belagavi, Karnataka, India

A Report on
“Analysis and Detection of Autism Spectrum Disorder Using Machine
Learning”
Submitted to RVITM, affiliated to Visvesvaraya Technological University (VTU, Belagavi), in partial
fulfillment of the requirements for the award of the degree of

BACHELOR OF ENGINEERING
in
ELECTRONICS AND COMMUNICATION ENGINEERING
By
Project Team No. PT15

Akshat Gupta 1RF20EC003


Archith P 1RF20EC011
Mohammed Nadeem 1RF20EC028
Vikrant Rana 1RF20EC053

Under the guidance of


Dr. Vikash Kumar
Assistant Professor, Dept. of ECE,
RVITM

RV Educational Institutions
RV Institute of Technology and Management, Bengaluru
Department of Electronics and Communication Engineering
2023-24
RV INSTITUTE OF TECHNOLOGY AND MANAGEMENT®
(Affiliated to Visvesvaraya Technological University, Belagavi & Approved by AICTE, New Delhi)
Bengaluru-560076

DEPARTMENT OF
ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE
Certified that the project work titled “ANALYSIS AND DETECTION OF AUTISM
SPECTRUM DISORDER USING MACHINE LEARNING” is carried out by
AKSHAT GUPTA(1RF20EC003), ARCHITH P(1RF20EC011), MOHAMMED
NADEEM(1RF20EC028), and VIKRANT RANA(1RF20EC053), who are
bonafide students of RV Institute of Technology and Management, Bangalore, in partial
fulfillment for the award of degree of Bachelor of Engineering in Electronics and
Communication Engineering of the Visvesvaraya Technological University, Belagavi during
the year 2023-2024. It is certified that all corrections/suggestions indicated for the Internal
Assessment have been incorporated in the report deposited in the departmental library. The
project report has been approved as it satisfies the academic requirements in respect of project
work prescribed by the institution for the said degree.

Signature of Guide Signature of Head of the Department Signature of Principal


Dr. Vikash Kumar Dr. Prashant P Patavardhan Dr. Jayapal R

External Viva

Name of Examiners Signature with date

RV INSTITUTE OF TECHNOLOGY AND MANAGEMENT®
(Affiliated to Visvesvaraya Technological University, Belagavi & Approved by AICTE, New Delhi)
Bengaluru-560076

DEPARTMENT OF
ELECTRONICS AND COMMUNICATION ENGINEERING

DECLARATION

We, AKSHAT GUPTA-1RF20EC003, ARCHITH P-1RF20EC011


MOHAMMED NADEEM-1RF20EC028, and VIKRANT RANA-1RF20EC053,
the students of the sixth semester, DEPARTMENT OF ELECTRONICS AND
COMMUNICATION ENGINEERING, hereby declare that the project titled
“ANALYSIS AND DETECTION OF AUTISM SPECTRUM DISORDER
USING MACHINE LEARNING” has been carried out by us and submitted in
partial fulfillment for the award of the degree of Bachelor of Engineering in
ELECTRONICS AND COMMUNICATION ENGINEERING. We further declare
that this work has not been carried out by any other students for the award of a
degree in any other branch.
Place: Bengaluru Name: Signature:

Date: 1. AKSHAT GUPTA


2. ARCHITH P

3. MOHAMMED NADEEM

4. VIKRANT RANA
ACKNOWLEDGEMENT

The successful presentation of the PROJECT would be incomplete without the


mention of the people who made it possible and whose constant guidance crowned
our effort with success.

We would like to thank our Project Guide, Dr. Vikash Kumar, Assistant Professor,
Department of Electronics and Communication Engineering, RV Institute of
Technology and Management, Bengaluru, for his constant guidance and inputs.

We thank Dr. Prashant P Patavardhan, Professor and Head of the Department of


Electronics and Communication Engineering, RV Institute of Technology and
Management, Bengaluru, for his encouragement.

We would like to extend our gratitude to Dr. Jayapal R, Principal, RV Institute of


Technology and Management, Bengaluru, for providing us an opportunity to work on
a project in this institution.

We would like to thank all the Teaching and Non-Teaching Staff for their
cooperation.

We would like to extend our gratitude to the MANAGEMENT, RV Institute of


Technology and Management, Bengaluru, for providing all the facilities to present the
Project.

Finally, we extend our heartfelt gratitude to our family for their encouragement and
support without which we wouldn’t have come so far. Moreover, we thank all our
friends for their invaluable support and cooperation.

AKSHAT GUPTA-1RF20EC003
ARCHITH P-1RF20EC011
MOHAMMED NADEEM-1RF20EC028
VIKRANT RANA-1RF20EC053
ABSTRACT

Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by a wide
range of behavioral and cognitive traits. Early detection and accurate diagnosis of ASD are critical for
enabling timely interventions that can significantly improve outcomes for affected individuals. The
effects of ASD and the severity of its symptoms differ from person to person; common manifestations
include unusual behaviors, restricted interests, and social challenges. This project presents a
comprehensive approach that harnesses the power of machine learning to facilitate the detection and
assessment of autism.

The proposed workflow involves the pre-processing of data, training, and testing with various ML
models such as Decision Tree (DT), K-Nearest Neighbour (KNN), Naïve Bayes (NB), Random Forest
(RF), Logistic Regression (LR) and Support Vector Classifier (SVC), followed by a comparison of the
results and the prediction of ASD. The proposed method is evaluated on a publicly available dataset. The dataset is
collected based on the evaluation of 31 attributes that are found to be common in patients suffering
from ASD. Data pre-processing is a technique which transforms the raw data into a meaningful and
understandable format. Then the preprocessed data is used to train the various ML models and the
models are evaluated on different metrics such as sensitivity, specificity and accuracy.

The proposed project aims to provide a significant step toward advancing the early detection and
assessment of ASD. The manual process of ASD diagnosis is unreliable due to the unavailability of
resources and expert opinion. Therefore, computerized diagnostic systems, which use machine learning
architectures, are proposed to learn the patterns in the provided data and to identify the severity of the
disease. The proposed ML model can achieve high performance on ASD detection compared with the
conventional approach.

MOTIVATION

The motivation for using machine learning algorithms to analyze and detect ASD is driven by several
important factors:
1. Early Intervention: Early detection of ASD is crucial for providing effective interventions and
support to people with ASD. The earlier a patient is diagnosed, the more effective interventions can be,
which can significantly improve their long-term outcomes.
2. Diagnostic Challenges: ASD is a complex neurodevelopmental disorder with a wide range of
symptoms and varying degrees of severity. Diagnosing ASD based solely on clinical observation can be
challenging and time-consuming, and there is a need for more objective and accurate diagnostic tools.
3. Reduction of Healthcare Costs: Early and accurate diagnosis of ASD can lead to cost savings in
healthcare. It can help avoid misdiagnoses, unnecessary tests, and delays in accessing appropriate
interventions, thereby reducing the overall burden on healthcare systems.
4. Research Advancements: Machine learning can assist researchers in uncovering the underlying
mechanisms and causes of ASD by analyzing diverse datasets. This can lead to a deeper understanding
of the disorder and potentially the development of more targeted treatments.
5. Remote Screening: Machine learning models can be applied to remote screening, allowing for the
assessment of ASD risk factors and symptoms in individuals who may not have easy access to
specialized clinical facilities.

The motivation for using machine learning in the analysis and detection of Autism Spectrum Disorder
is driven by the potential to improve early diagnosis, reduce healthcare costs, advance research, and
ultimately enhance the quality of life for individuals with ASD and their families. Machine learning
offers the promise of more objective and data-driven approaches to understanding and addressing this
complex neurodevelopmental disorder.

LITERATURE REVIEW
Title: Muhammad Shuaib Qureshi et al., “Prediction and Analysis of Autism Spectrum Disorder Using Machine Learning Techniques” (2023).
Feature: Support Vector Machine (SVM). SVM is a supervised classification technique that uses a separating line (hyperplane) to distinguish between two groups. SVM works relatively well when there is a clear margin of separation between classes, is effective in high-dimensional spaces and in cases where the number of dimensions is greater than the number of samples, and is relatively memory efficient.
Limitation: The hyperplane dimension must be altered from one to the Nth dimension in this scenario, using what is called a kernel. The SVM algorithm is not suitable for large data sets, and it does not perform very well when the data set has more noise, i.e. when the target classes overlap.

Title: K. Vijayalakshmi et al., “A Hybrid Recommender System using Multi Classifier Regression Model for Autism Detection” (2022).
Feature: Random Forest Classifier. Random Forest is a widespread classification mechanism that handles binary classification problems; it is a collaborative decision-tree-based technique that generates a forest as a group of decision trees. The overfitting that is the main disadvantage of a single decision tree can be overcome with Random Forest, and using voting the best-scored tree is selected from the randomly built subtrees of the forest.
Limitation: It requires considerable computational power and resources, as it builds numerous trees and combines their outputs. It also requires much time for training, as it combines many decision trees to determine the class.

Title: Shirajul Islam et al., “Autism Spectrum Disorder Detection in Toddlers for Early Diagnosis Using Machine Learning” (2021).
Feature: Logistic Regression (LR). Logistic Regression's primary aim is to find the best-fitting model describing the relationship between the binomial character of interest and a set of independent variables. It makes use of a logistic function to find an optimal curve to fit the data points. It makes no assumptions about the distributions of classes in feature space, extends easily to multiple classes (multinomial regression), and provides a natural probabilistic view of class predictions.
Limitation: If the number of observations is less than the number of features, Logistic Regression should not be used; otherwise, it may lead to overfitting. It can only be used to predict discrete outcomes.

Title: Sushama Rani Dutta et al., “A Machine Learning-based Method for Autism Diagnosis Assistance in Children” (2021).
Feature: Naïve Bayes (NB). NB is based on conditional probability (Bayes' theorem) and counting; the name “naïve” comes from its assumption of conditional independence of all input features. If this assumption holds, an NB classifier converges much faster than a discriminative model such as logistic regression, so less training data is required.
Limitation: NB works well only with a limited number of features. Moreover, there is a high bias when there is a small amount of data.

Title: Haibin Cai, Yinfeng Fang, Zhaojie Ju et al., “Sensing-enhanced Therapy System for Assessing Children with Autism Spectrum Disorders: A Feasibility Study” (2020).
Feature: Decision trees can provide information about the importance of different features (questions, variables, or symptoms) in the classification process, which is valuable for understanding which factors contribute most to autism detection. They can also capture non-linear relationships between features and the target variable, which is important for identifying complex patterns in autism diagnosis.
Limitation: Overfitting: decision trees are prone to overfitting. Instability: small changes in the data can lead to different tree structures, making the model unstable.

Title: Andi W. R. Emanuel et al., “Machine Learning Classifiers for Autism Spectrum Disorder” (2020).
Feature: K-Nearest Neighbors (KNN). The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier which uses proximity to make classifications or predictions about the grouping of an individual data point. It can handle large datasets and is accurate and effective.
Limitation: Sensitive to outliers and computationally expensive.

Title: Che Zawiyah Che Hasan, Rozita Jailani et al., “ANN and SVM Classifiers in Identifying Autism Spectrum Disorder Gait Based on Three-Dimensional Ground Reaction Forces” (2019).
Feature: XGBoost (Extreme Gradient Boosting) is a powerful machine learning algorithm that has been widely used for various classification tasks, including autism detection. It offers outstanding predictive performance and can handle missing data by learning how to impute missing values during training.
Limitation: XGBoost is a complex algorithm with many hyperparameters to tune, and it may not perform well if the data quality is poor or the dataset is biased.
PROBLEM STATEMENT
Autism spectrum disorder (ASD) is a disorder in which patients find it difficult to express themselves
and to interact with others. It is a growing concern that roughly one in 59 children is identified as
having autism spectrum disorder, and according to recent reports about 20 million people in India are
diagnosed with autism. ASD begins in childhood, but in many cases the symptoms are detected only in
adulthood. As a result, these children are unable to receive proper treatment at an early age, which leads
to further complications in their health. Research shows that a diagnosis of autism at an earlier age is
more reliable and stable. Therefore, our proposed project aims to detect ASD as early as possible,
achieve higher accuracy than previous research, and reduce medical costs.

Early detection and treatment are the most important steps to decrease the symptoms of ASD and to
improve the quality of life of people with ASD. However, there is no definitive medical test for the
detection of autism; ASD symptoms are usually recognized through observation. Although human
genes are assumed to be responsible, scientists have not yet identified the exact causes of ASD; genetic
factors are believed to influence development in interaction with the environment.

SCOPE & OBJECTIVES


Develop a machine learning model that accurately detects Autism Spectrum Disorder:

1. Accurate Detection: Develop machine learning models that can accurately detect ASD from the
collected data, with a focus on achieving high sensitivity and specificity.
2. Early Detection: If applicable, design models that can identify signs of ASD in early childhood to
facilitate early intervention and support.
3. Interpretability: Ensure that the machine learning models provide interpretable results, enabling
healthcare professionals to understand the basis for ASD diagnosis.
4. Reduced Misdiagnosis: Minimize the risk of misdiagnosis and improve the reliability of ASD
diagnosis compared to traditional assessment methods.

METHODOLOGY

Fig.1 Steps in the proposed ASD detection solution

The proposed workflow, as shown in Fig.1, involves the pre-processing of data, training and testing
with the specified models, evaluation of the results, and prediction of ASD.

PREPROCESSING
Data pre-processing is a technique that transforms raw data into a meaningful and understandable
format. The data in a dataset can contain a large number of irrelevant and missing components, and
well pre-processed data generally yields better results. Various data pre-processing methods are used to
handle incomplete and inconsistent data, such as handling missing values, outlier detection, data
discretization and data reduction (dimension and numerosity reduction).
The dataset is saved in .csv format. CSV files are a file type that allows us to save tabular data, such as
spreadsheets. The model becomes unduly complex because the dataset contains several attributes in
text format. Therefore, all of the attributes are converted to numeric values in order to decrease the
training time and enhance the model's performance.
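As an illustration of this step, the following Python sketch converts the text-format attributes to numeric values and fills missing entries. It assumes a pandas DataFrame loaded from a CSV file; the file name and the choice of label encoding are assumptions for illustration, not the report's exact implementation.

# Sketch of the pre-processing step: load the CSV, fill missing values in
# text columns, and convert text attributes to numeric codes.
# The file name "asd_screening.csv" is an illustrative placeholder.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("asd_screening.csv")    # dataset saved in .csv format

# Replace missing entries in text columns before encoding
text_cols = df.select_dtypes("object").columns
df[text_cols] = df[text_cols].fillna("unknown")

encoders = {}
for col in text_cols:                    # every text-format attribute
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df.dtypes)                         # all attributes are now numeric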

TRAINING AND TESTING MODEL


The whole dataset is split into two parts, a training set and a testing set, in a ratio of 7:3 (70% of the
data for training and 30% for testing).
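A minimal sketch of this split with scikit-learn, continuing from the pre-processing sketch above; "Class" is an assumed name for the ASD label column.

# 70:30 train/test split of the pre-processed data
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Class"])   # predictor attributes
y = df["Class"]                  # ASD / non-ASD label (assumed column name)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)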

CLASSIFICATION
After the split, the data is classified using a variety of algorithms, including the Logistic Regression
algorithm, the Support Vector Classifier algorithm, the Naïve Bayes algorithm, the Decision Tree
algorithm, the K-Nearest Neighbour algorithm and the Random Forest algorithm. Each algorithm
yields a different set of evaluation metrics.
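The sketch below trains scikit-learn counterparts of these six classifiers on the split data and reports their test accuracy; the default hyperparameters shown are assumptions for illustration, not the exact settings used in this project.

# Train the six classifiers and compare test accuracy (continues from the
# split sketch above; hyperparameters are illustrative defaults).
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbour": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {acc:.4f}")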

ALGORITHM

The general algorithm followed for executing the project is:

Step 1: Obtain an appropriate dataset.
Step 2: Pre-process the dataset to obtain numeric values.
Step 3: Split the dataset into training and test sets.
Step 4: Implement the various machine learning algorithms.
Step 5: Feed the training set to each model.
Step 6: Tune the attributes accordingly to obtain good accuracy.
Step 7: Once a model is trained with the training data set, expose it to the test data.
Step 8: Obtain and plot the training and testing accuracy graphs.
Step 9: Generate the confusion matrix for the obtained results.
Step 10: Give any test input to the model to check its predictions (a sketch covering Steps 7-10 follows below).
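The following sketch illustrates Steps 7-10 for one of the models (KNN here, continuing from the earlier sketches); the range of k values and the plotting details are assumptions for illustration.

# Plot training/testing accuracy versus the number of neighbours, build the
# confusion matrix, and predict on one test sample (Steps 7-10).
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

ks = range(1, 21)
train_acc, test_acc = [], []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc.append(knn.score(X_train, y_train))   # training accuracy
    test_acc.append(knn.score(X_test, y_test))      # testing accuracy

plt.plot(ks, train_acc, label="Training accuracy")
plt.plot(ks, test_acc, label="Testing accuracy")
plt.xlabel("Number of neighbours (k)")
plt.ylabel("Accuracy")
plt.legend()
plt.show()                                           # Step 8

best_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(confusion_matrix(y_test, best_knn.predict(X_test)))   # Step 9
print(best_knn.predict(X_test.iloc[[0]]))                   # Step 10: one test input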

DATASET
The dataset considers 31 attributes based on which the models are trained. The attributes are:

Attribute | Type | Description
Age | Number | Age in years
Gender | String | Male or Female
Ethnicity | String | List of common ethnicities in text format (12 attributes)
Born with jaundice | Boolean (yes or no) | Whether the case was born with jaundice
Who is completing the test (User) | String | Parent, relative, self
Question 1 (A1) | Binary (0, 1) | S/he often notices small sounds when others do not (Child, Adolescent); S/he notices patterns in things all the time (Adult); Does your child look at you when you call his/her name? (Toddler)
Question 2 (A2) | Binary (0, 1) | S/he usually concentrates more on the whole picture, rather than the small detail (Child, Adolescent, Adult); How easy is it for you to get eye contact with your child? (Toddler)
Question 3 (A3) | Binary (0, 1) | In a social group, s/he can easily keep track of several different people's conversations (Child, Adolescent); I find it easy to do more than one thing at once (Adult); Does your child point to indicate that s/he wants something? (e.g. a toy that is out of reach) (Toddler)
Question 4 (A4) | Binary (0, 1) | S/he finds it easy to go back and forth between different activities (Child, Adolescent); If there is an interruption, s/he can switch back to what s/he was doing very quickly (Adult); Does your child point to share interest with you? (e.g. pointing at an interesting sight) (Toddler)
Question 5 (A5) | Binary (0, 1) | S/he does not know how to keep a conversation going with his/her peers (Child, Adolescent); I find it easy to read between the lines when someone is talking to me (Adult); Does your child pretend? (e.g. care for dolls, talk on a toy phone) (Toddler)
Question 6 (A6) | Binary (0, 1) | S/he is good at social chit-chat (Child, Adolescent); I know how to tell if someone listening to me is getting bored (Adult); Does your child follow where you are looking? (Toddler)
Question 7 (A7) | Binary (0, 1) | When s/he is read a story, s/he finds it difficult to work out the character's intentions or feelings (Child); When s/he was younger, s/he used to enjoy playing games involving pretending with other children (Adolescent); When I am reading a story, I find it difficult to work out the characters' intentions (Adult); If you or someone else in the family is visibly upset, does your child show signs of wanting to comfort them? (e.g. stroking hair, hugging them) (Toddler)
Question 8 (A8) | Binary (0, 1) | When s/he was in preschool, s/he used to enjoy playing games involving pretending with other children (Child); S/he finds it difficult to imagine what it would be like to be someone else (Adolescent); I like to collect information about categories of things (e.g. types of car, types of bird, types of train, types of plant, etc.) (Adult); Would you describe your child's first words as: (Toddler)
Question 9 (A9) | Binary (0, 1) | S/he finds it easy to work out what someone is thinking or feeling just by looking at their face (Child); S/he finds social situations easy (Adolescent); I find it easy to work out what someone is thinking or feeling just by looking at their face (Adult); Does your child use simple gestures? (e.g. wave goodbye) (Toddler)
Question 10 (A10) | Binary (0, 1) | S/he finds it hard to make new friends (Child, Adolescent); I find it difficult to work out people's intentions (Adult); Does your child stare at nothing with no apparent purpose? (Toddler)

Table 1: Dataset Attributes

FLOWCHART


PLATFORM/SOFTWARE USED

JUPYTER NOTEBOOK

Jupyter Notebook stands as a versatile, interactive platform that seamlessly integrates live code,
visualizations, and explanatory text within a single document. This open-source web-based tool
facilitates data analysis, scientific exploration, and machine learning by enabling users to combine
executable code cells with markdown-based text, allowing for an intuitive blend of computation and
storytelling. Its adaptable environment supports various programming languages, fostering
collaborative work and the creation of comprehensive reports that showcase code, results, and
descriptive insights all in one accessible space. Jupyter Notebook lets anyone create and execute
arbitrary Python code in the browser. It is ideal for machine learning, data analysis, and education. A
notebook is saved with an .ipynb extension.

PYTHON 3.10.0
Python libraries and frameworks offer a reliable environment which reduces software development
time significantly. Python is consistent, simple, flexible, platform independent and has a wide
community, which makes it well suited for machine learning. Python includes modular machine
learning libraries such as PyBrain, TensorFlow, Keras and NumPy, which offer many algorithms for
machine learning tasks. Version 3.10.0 of Python is used in this project.

MATLAB
MATLAB is a programming and numeric computing environment used by millions of engineers and
scientists to analyze data, develop algorithms, and create models. MATLAB provides professionally
developed toolboxes for signal and image processing, control systems, wireless communications,
computational finance, robotics, deep learning and AI and more. MATLAB combines a desktop
environment tuned for iterative analysis and design processes with a high-level programming
language. It includes the Live Editor for creating scripts that combine code, output, and formatted text
in an executable notebook. Prebuilt apps allow you to interactively perform iterative tasks.

DESIGN AND DEVELOPMENT
Decision Tree Algorithm:

% Fit a binary classification decision tree on the training predictors and labels
M = fitctree(X_train, Y_train, 'PredictorNames', features);

fitctree(X,Y) returns a fitted binary classification decision tree based on the input variables contained in
matrix X and output Y. The returned binary tree splits branching nodes based on the values of a column
of X.

KNN Algorithm:

% Train a KNN model for each candidate number of neighbours k
for k = 1:numNeighbors
    model = fitcknn(dataTrain(:, 1:end-1), dataTrain.category_encoded, 'NumNeighbors', k);
    % ... evaluate the model for this value of k ...
end

fitcknn returns a k-nearest neighbour classification model based on the predictor data X and response Y.
numNeighbors is the variable holding the total number of neighbour values to try, and k holds the value
considered in each iteration.

Naïve Bayes Algorithm:

% Train a naive Bayes model using the first num features as predictors
for num = 1:numFeatures
    model = fitcnb(dataTrain(:, 1:num), dataTrain.category_encoded);
    % ... evaluate the model for this number of features ...
end

fitcnb(X,Y) returns a multiclass naive Bayes model (Mdl), trained by the predictors X and class
labels Y. Additional options can be specified by one or more Name,Value pair arguments, using any of
the previous syntaxes. For example, you can specify a distribution to model the data, prior probabilities
for the classes, or the kernel smoothing window bandwidth.

Random Forest Algorithm:

% Train a bagged tree ensemble (random forest) with num trees
for num = 1:numLearners
    model = fitensemble(dataTrain, 'category_encoded', 'Bag', num, 'Tree', 'Type', 'classification');
    % ... evaluate the ensemble for this number of trees ...
end

fitensemble(Tbl, ResponseVarName, Method, NLearn, Learners) returns a trained ensemble model object
that contains the results of fitting an ensemble of NLearn classification or regression learners (Learners)
to all variables in the table Tbl. ResponseVarName is the name of the response variable in Tbl, and
Method is the ensemble-aggregation method.

Logistic Regression and Support Vector Classifier Algorithm:

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn import metrics

models = [LogisticRegression(), SVC(kernel='rbf')]

for model in models:
    model.fit(X, Y)

    print(f'{model} : ')
    # ROC-AUC is used here as the reported "accuracy" metric
    print('Training Accuracy : ', metrics.roc_auc_score(Y, model.predict(X)))
    print('Validation Accuracy : ', metrics.roc_auc_score(Y_val, model.predict(X_val)))
    print()

RESULTS AND DISCUSSIONS


• The ML models are evaluated on the following metrics: sensitivity, specificity and accuracy, all of
which are derived from the confusion matrix of a binary classification algorithm.

The confusion matrices of all the machine learning algorithms are shown below.

CONFUSION MATRICES

Fig.2 K Nearest Neighbour Fig.3 Naïve Bayes

Fig.4 Decision Tree Fig.5 Random Forest

Fig.6 Support Vector Classifier Fig.7 Logistic Regression

TRAINING ACCURACY AND TESTING ACCURACY PLOTS

Fig.8 K Nearest Neighbour Fig.9 Naïve Bayes

Fig.10 Decision Tree Fig.11 Random Forest

Fig.12 Support Vector Classifier and Logistic Regression

By displaying the true and false predictions for each class, the confusion matrix goes beyond
classification accuracy. A confusion matrix in the context of a binary classification job is a 2x2 matrix.

True Positive (TP): the number of cases where both the predicted and actual values are positive.
True Negative (TN): the number of cases where both the predicted and actual values are negative.
False Positive (FP): the number of cases predicted as positive that are actually negative.
False Negative (FN): the number of cases predicted as negative that are actually positive.
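As a worked illustration of how these counts yield the reported metrics, the sketch below computes sensitivity, specificity and accuracy from a confusion matrix; the label vectors are placeholders, not the project's actual results.

# Derive sensitivity, specificity and accuracy from TP, TN, FP and FN.
# The example label vectors are illustrative placeholders only.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 0])   # actual values
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 1, 0, 0])   # predicted values

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                  # true positive rate
specificity = tn / (tn + fp)                  # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"Sensitivity = {sensitivity:.4f}, Specificity = {specificity:.4f}, "
      f"Accuracy = {accuracy:.4f}")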

The figures Fig.2, Fig.3, Fig.4, Fig.5, Fig.6 and Fig.7 represent the confusion matrices of the KNN,
Naïve Bayes, Decision Tree, Random Forest, Support Vector Classifier and Logistic Regression
algorithms respectively.

The figures Fig.8, Fig.9 and Fig.11 illustrate the model accuracy on training and testing data.
These figures serve as crucial diagnostic tools, encapsulating the model's learning progress and
generalization capabilities. These visualizations offer insights into overfitting, hyperparameter
optimization, and the generalizing capabilities of the models by showcasing the evolution of
performance metrics across iterations or parameter variations. They enable us to assess model behavior,
detect issues like bias or variance and to compare different models, and communicate succinct
summaries of model performance, guiding the refinement and selection of optimal machine learning
models.

These graphs have two main curves: curve 1, for the training set, shown in blue, and curve 2, for the
test set, shown in orange. Both curves generally rise as the corresponding parameter increases,
indicating that the accuracy of our models improves. These values are measured by increasing the
number of neighbours in the KNN algorithm, the number of features in the Naïve Bayes algorithm and
the number of trees in the Random Forest algorithm.

By comparing the model predictions with the actual values and expressing the result as a percentage,
these plots show how well each model predicts. Fig.10 and Fig.12 represent the model accuracy on
training and testing data for the Decision Tree, Logistic Regression and Support Vector Classifier
algorithms respectively.

Fig.13 Importance of each feature in the constructed Decision Tree

Fig.13 illustrates the importance of each feature. The x-axis corresponds to the features, and the y-axis
shows their importance estimates. Feature importance is calculated based on how often a feature is used
for splitting nodes across the tree in the ensemble. The more frequently a feature is chosen for splitting
nodes, the higher its importance is considered. It is often used after training an ensemble or tree-based
(in this case Decision tree) model to understand which features contribute the most to the model's
predictive performance.
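A minimal sketch of how such importance estimates can be obtained and plotted with scikit-learn is given below; the impurity-based feature_importances_ attribute is an assumption and may differ from the exact calculation used for Fig.13.

# Estimate and plot feature importances from a fitted decision tree
# (continues from the earlier sketches).
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

importances = tree.feature_importances_      # one estimate per attribute
plt.bar(X_train.columns, importances)
plt.xticks(rotation=90)
plt.xlabel("Feature")
plt.ylabel("Importance estimate")
plt.tight_layout()
plt.show()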

Fig.14 : Tree structure obtained using Decision Tree Algorithm based on the above features

Fig.15 Importance of each feature in the Random Forest Algorithm
Random Forests explore more features compared to individual decision trees due to their ensemble
design. By building numerous trees on bootstrapped subsets of the data while considering a different
random subset of features for each tree, Random Forests encourage diversity among the trees. This
diversity promotes a broader exploration of the feature space, allowing the ensemble to capture a richer
representation of the relationships within the data. This approach often leads to improved
generalization, robustness against overfitting, and better overall performance compared to a single
decision tree.
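This ensemble design can be sketched with scikit-learn as follows; the bootstrap sampling and the per-split random feature subset shown are library defaults used as assumptions rather than the report's exact configuration.

# A random forest whose trees are built on bootstrapped samples and consider
# a random subset of features at each split (continues from earlier sketches).
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,       # number of trees in the forest
    bootstrap=True,         # each tree sees a bootstrapped sample of the data
    max_features="sqrt",    # random subset of features considered per split
    random_state=0,
).fit(X_train, y_train)

print("Test accuracy:", forest.score(X_test, y_test))
print("Aggregated feature importances:", forest.feature_importances_)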
Some of the decision trees from the Random Forest are shown below:

Fig.16 Few Tree Structures obtained using Random Forest Algorithm

ACCURACY SCORES

Fig.17 Accuracy of K Nearest Neighbour

Fig.18 Accuracy of Decision Tree

Fig.19 Accuracy of Random Forest Fig.20 Accuracy of Naïve Bayes

Fig.21 Accuracy of Logistic Regression and Support Vector Classifier

Model                          Sensitivity   Specificity   Accuracy
1. K-Nearest Neighbour         0.9696        0.9000        0.9505
2. Naïve Bayes                 0.9805        0.9130        0.9597
3. Decision Tree               0.8266        0.8490        0.8325
4. Logistic Regression         0.9345        0.6415        0.8375
5. Support Vector Classifier   0.9224        0.7272        0.8687
6. Random Forest               0.9568        1.0000        0.9670

Table 2: Comparison of the various models

CONCLUSION AND FUTURE SCOPE

Bibliography/Reference

[1] Benjamin Gesundheit and Joshua P. Rosenzweig, “Editorial: Autism Spectrum Disorders (ASD) -
Searching for the Biological Basis for Behavioral Symptoms and New Therapeutic Targets”, published
online January 2023.

[2] Arodami Chorianopoulou, Efthymios Tzinis, Elias Iosif, Asimenia Papoulidi, Christina Papailiou,
Alexandros Potamianos, “Engagement detection for children with autism spectrum disorder”, 2023.

[3] Siriwan Sunsirikul and Tiranee Achalakul, “Associative Classification Mining in the Behavior
Study of Autism Spectrum Disorder”, vol.3, 2022.

[4] Beibin Li, Sachin Mehta, Deepali Aneja, Claire Foster, Pamela Ventola, Frederick Shic, Linda
Shapiro, “A Facial Affect Analysis System for Autism Spectrum Disorder”, 2022.

[5] Pratibha Vellanki, Thi Duong, Svetha Venkatesh, Dinh Phung, “Nonparametric Discovery of
Learning Patterns and Autism Subgroups from Therapeutic Data”, 2022.

[6] Paul Fergus, Basma Abdulaimma, Chris Carter, Sheena Round, “Interactive Mobile Technology for
Children with Autism Spectrum Condition (ASC)”, 2021.

[7] V. Y. Tittagalla, R. R. P. Wickramarachchi, G. W. C. N. Chandrarathne, N. M. D. M. B. Nanayakkara,
P. Samarasinghe, P. Rathnayake and M. G. N. M. Pemadasa, “Screening Tool for Autistic Children”,
2021.

[8] Daiki Mitsumoto, Takeshi Hori, Shigeki Sagayama Hidenori Yamasue, Keiho Owada, Masaki
Kojima, Keiko Ochi, Nobutaka Ono, “Autism Spectrum Disorder Discrimination Based on Voice
Activities Related to Fillers and Laughter”, 2021.

[9] Tarannum Zaki, Muhammad Nazrul Islam, Md. Sami Uddin, Sanjida Nasreen Tumpa, Md. Jubair
Hossain, Maksuda Rahman Anti, Md. Mahedi Hasan, “Towards Developing a Learning Tool for
Children with Autism”, 2021.

[10] Ardiana Sula, Evjola Spaho, Keita Matsuo, Leonard Barolli, Rozeta Miho and Fatos Xhafa, “An
IoT-based System for Supporting Children with Autism Spectrum Disorder”, 2020.

[11] Haibin Cai, Yinfeng Fang, Zhaojie Ju, Cristina Costescu, Daniel David, Erik Billing, Tom Ziemke,
Serge Thill, Tony Belpaeme, Bram Vanderborght, David Vernon, Kathleen Richardson and Honghai
Liu, “Sensing-enhanced Therapy System for Assessing Children with Autism Spectrum Disorders: A
Feasibility Study”, 2020.

[12] Akshay Vijayan, S. Janmasree, C. Keerthana, L. Baby Syla, “A Framework for Intelligent Learning
Assistant Platform Based on Cognitive Computing for Children with Autism Spectrum Disorder”, July
2019.

[13] Sushama Rani Dutta, Sujoy Datta, Monideepa Roy, “Using Cogency and Machine Learning for
Autism Detection from a Preliminary Symptom”, July 2019.

[14] Che Zawiyah Che Hasan, Rozita Jailani and Nooritawati Md Tahir, “ANN and SVM Classifiers in
Identifying Autism Spectrum Disorder Gait Based on Three-Dimensional Ground Reaction Forces”,
October 2019.

[15] D. P. Wall, R. Dally, R. Luyster, J.-Y. Jung, and T. F. DeLuca, “Use of artificial intelligence to
shorten the behavioral diagnosis of autism,” PloS one, vol. 7, no. 8, p. e43855, 2019.

[16] D. Bone, S. L. Bishop, M. P. Black, M. S. Goodwin, C. Lord, and S. S. Narayanan, “Use of
machine learning to improve autism screening and diagnostic instruments: effectiveness, efficiency, and
multi-instrument fusion,” Journal of Child Psychology and Psychiatry, vol. 57, 2019.

[17] J. Kosmicki, V. Sochat, M. Duda, and D. Wall, “Searching for a minimal set of behaviors for
autism detection through feature selection-based machine learning,” Translational psychiatry, vol. 5, no.
2, p. e514, 2019.

[18] W. Liu, M. Li, and L. Yi, “Identifying children with autism spectrum disorder based on their face
processing abnormality: A machine learning framework,” Autism Research, vol. 9, no. 8, pp. 888–898,
2019.

[19] Kazi Shahrukh Omar, Prodipta Mondal, Nabila Shahnaz Khan, “A Machine Learning Approach to
Predict Autism Spectrum Disorder”, 7-9 February, 2019.

[20] Tania Akter, Md Shahriare Satu, Md Imran Khan, Mohammad Hanif Ali, Shahadat Uddin, Pietro
Lio, Julian M. W. Quinn, and Mohammad Ali Moni, “Machine learning-based models for early stage
detection of autism spectrum disorders”, 2019.
