
Using Supervised Machine Learning Algorithms for Autism

Spectrum Disorder Recognition

DISSERTATION

Submitted in partial fulfillment of the


Requirements for the award of the degree

of

Bachelor of Technology
in
Computer Science & Engineering

By:
Ananya Singh (08/CSE1/2020)
Arshleen Bhandari (12/CSE1/2020)
Harleen Kaur Aarora (43/CSE1/2020)
Himika Prabhat (52/CSE1/2020)

Under the guidance of: Dr. Aashish Bhardwaj

Department of Computer Science & Engineering


Guru Tegh Bahadur Institute of Technology

Guru Gobind Singh Indraprastha University


Dwarka, New Delhi
Year 2020-24
DECLARATION

We hereby declare that all the work presented in the dissertation entitled “Autism Spectrum
Disorder screening in adults using Supervised Machine Learning Algorithms” in the partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science & Engineering, Guru Tegh Bahadur Institute of Technology, affiliated to
Guru Gobind Singh Indraprastha University Delhi is an authentic record of our own work carried
out under the guidance of Dr. Aashish Bhardwaj.

Date: 08-12-2023
Ananya Singh (08/CSE1/2020)
Arshleen Bhandari (12/CSE1/2020)
Harleen Kaur Aarora (43/CSE1/2020)
Himika Prabhat (52/CSE1/2020)

CERTIFICATE

This is to certify that the dissertation entitled “Autism Spectrum Disorder screening in adults
using Machine Learning Algorithms”, which is submitted by Ms. Ananya Singh, Ms.
Arshleen Bhandari, Ms. Harleen Kaur and Ms. Himika Prabhat in partial fulfillment of the
requirements for the award of the degree of Bachelor of Technology in Computer Science &
Engineering, Guru Tegh Bahadur Institute of Technology, New Delhi is an authentic record of
the candidate’s own work carried out by them under our guidance. The matter embodied in this
thesis is original and has not been submitted for the award of any other degree.

Date: 08-12-2023

Aashish Bhardwaj
(Head of Department)
Computer Science & Engineering

ACKNOWLEDGEMENT

We express our sincere appreciation to our mentor, Dr. Aashish Bhardwaj, who serves as the
Head of the Department of Computer Science and Engineering (CSE). His guidance and
support have been invaluable throughout our dissertation process, and we are grateful for the
leadership that has played a crucial role in raising the quality of our work to its current
standard.

Without his expertise and encouragement, this achievement would not have been possible; he
has acted as a guiding light throughout the project. We are extremely grateful.

Date: 08-12-2023 Ananya Singh


(08/CSE1/2020)
[email protected]

Arshleen Bhandari
(12/CSE1/2020)
[email protected]

Harleen Kaur Aarora


(43/CSE1/2020)
[email protected]

Himika Prabhat
(52/CSE1/2020)
[email protected]

ABSTRACT

This project delves into the intricate challenge of identifying Autism Spectrum Disorder (ASD)
in adults through the strategic application of Supervised Machine Learning Algorithms. ASD,
marked by difficulties in social interaction, communication, and repetitive behaviors,
underscores the imperative for early detection to facilitate timely intervention. Employing a
diverse ensemble of algorithms, including Decision Trees, Random Forest, Support Vector
Machines, K Nearest Neighbors, Logistic Regression, Linear Discriminant Analysis, and
Quadratic Discriminant Analysis, the project aims to craft an advanced ASD screening tool with
transformative potential for diagnostic processes.

The primary goal is the construction of a screening mechanism adept at leveraging the unique
strengths of each algorithm, ultimately yielding a more accurate and robust identification model.
To achieve this, we meticulously curated a dataset that incorporates information from both ASD
and non-ASD adult subjects. This dataset serves as the training and evaluation ground for the
machine learning algorithms, ensuring a diverse and representative sample that authentically
mirrors real-world scenarios.

The project places significant emphasis on algorithmic diversity, undertaking the exploration of
various modeling approaches to attain a nuanced understanding of the intricate features
associated with ASD. Each algorithm contributes a distinct perspective, collectively enriching
the screening process. Interpretability becomes a focal point, with the overarching objective of
unraveling complex relationships between features and ASD identification.

Integral to the project are ethical considerations that underscore privacy, fairness, and the
responsible deployment of machine learning technologies within healthcare contexts. Ensuring
the ethical use of ASD screening tools is paramount not only for building public trust but also for
safeguarding the well-being of individuals undergoing assessments.

In the synthesis of the unique capabilities inherent in diverse machine learning algorithms, this
project aspires to redefine ASD screening. The anticipated outcomes hold promise for
significantly enhancing the quality of life for individuals on the autism spectrum by enabling
more accurate and efficient early detection. This aligns seamlessly with the broader objectives of
precision medicine and exemplifies the transformative potential of advanced technologies in
positively impacting the identification of neurodevelopmental disorders. In essence, this project
represents a concerted effort to seamlessly blend cutting-edge machine learning methodologies
with a compassionate approach, thereby enhancing ASD diagnostic processes and nurturing a
future where early intervention becomes not only more accessible but also more effective in
improving outcomes for those affected by ASD.
LIST OF FIGURES

S.No Figure Name Page No.


1.1 Supervised Machine Learning Algorithm 8
1.2 Random Forest Algorithm 8
1.3 Schematic diagram of SVM algorithm 9
1.4 K-Nearest Neighbors Algorithm 10
1.5 Logistic Curve 10
1.6 Multi-Layer Perceptron Learning 11
1.7 Model Complexity Graphs 19
1.8 Learning Curve 20
3.1 ER Diagram 32
3.2 Use Case Diagram 34
4.1 Support Vector Regression score 38
4.2 k Nearest Neighbours Regression score 38
4.3 Logistic Regression score 38
4.4 XGB Classifier Regression score 38
4.5 Random Forest Regression score 38
4.6 MLP Regression score 39
A-1 First five rows of the dataset 54
A-2 Value counts of unique values in the ethnicity column 54
A-3 Value counts of unique values in the relation column 55
A-4 Pie chart of the number of data points for each target 55
A-5 Plot of scores indicating the number of “yes” answers to the set of 10 questions 55
A-6 Count plot of the number of males and females 56
A-7 Count plot of the number of people from different ethnicities 56
A-8 Plot with 0 for people with autism and 1 for people without autism 57
A-9 Plots for the different countries in the dataset 57
A-10 Pair plot for A1-A10 58
A-11 Heat map for the correlation matrix of numeric columns 59
A-12 Count plot of the number of cases for each age group 59
A-13 Comparison between scores and the number of positive and negative cases 60
A-14 Normal Distribution of age values after log transformation 60
LIST OF TABLES

S.No Table Name Page No.


1.1 Attribute features and their description 13
5.1 Set of Questions 41
6.1 Results 44
CONTENTS

Chapter Page No.

Title page i
Declaration ii
Certificate iii
Acknowledgement iv
Abstract v
List of Tables and Figures vii
1. Introduction 1
1.1 Introduction to Autism Spectrum Disorder (ASD) 1
1.2 Significance of Early Detection in ASD 2
1.3 Role of Supervised Machine Learning in Healthcare 5
1.4 Libraries used 7
1.4.1 Numpy 7
1.4.2 Pandas 7
1.4.3 Matplotlib 7
1.4.4 Seaborn 7
1.4.5 SciKit Learn 7
1.5 Overview of Machine Learning Algorithms 8
1.5.1 Random Forests 8
1.5.2 Support Vector Machines (SVM) 9
1.5.3 k-Nearest Neighbors (kNN) 9
1.5.4 Logistic Regression 10
1.5.5 Multi-Layer Perceptron(MLP) 11
1.6 Crafting an Advanced ASD Screening Tool 12
1.7 Dataset Compilation and Characteristics 13
1.8 Emphasis on Algorithmic Diversity 17
1.9 Interpretability in ASD Identification 18
1.10 Ethical Considerations in Machine Learning for Healthcare 21
1.11 Anticipated Outcomes and Future Implications 23
2. Requirements Analysis with SRS 26
2.1 System Overview 26
2.1.1 System Description 26
2.1.2 System Features 26
2.2 Functional Requirements 26
2.2.1 Data Preprocessing 27
2.2.2 Algorithm Implementation 27
2.2.3 Model Evaluation 28
2.3 Non-Functional Requirements 28
2.3.1 Performance 28
2.3.2 Usability 28
2.3.3 Security 29
2.4 Constraints 29
2.4.1 Dataset Limitations 29
2.4.2 Availability of Algorithm 29
2.5 Appendix 29
2.5.1 Some References 29
2.5.2 Glossary 29
3. System Design 38
4. Test Plan
5. Body of Thesis
6. Results and Observations 61
7. Summary and Conclusions
8. Future Scope 66
References 68
Appendix A 55
Appendix B 63
Chapter One

INTRODUCTION


Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition that significantly


impacts an individual's social interaction, communication skills, and behavioral patterns. The
spectrum nature of ASD encompasses a wide range of symptoms and severity, making each
person's experience unique. Although ASD has historically been linked primarily with
childhood, it is increasingly acknowledged that it often extends into adulthood, bringing forth a
distinctive set of challenges for identification, diagnosis, and intervention.

1.1 Introduction to Autism Spectrum Disorder (ASD)

Autism Spectrum Disorder (ASD) is the name for a group of developmental disorders impacting
the nervous system. ASD symptoms range from mild to severe and mainly comprise language
impairment, challenges in social interaction, and repetitive behaviors. Many other possible
symptoms include anxiety, mood disorders, and Attention-Deficit/Hyperactivity Disorder (ADHD).
The core features of ASD include difficulties in social interaction, marked by challenges in
understanding and responding to social cues. Individuals with ASD may struggle with
establishing and maintaining relationships, interpreting nonverbal communication, and grasping
the nuances of social reciprocity. Additionally, communication difficulties are a hallmark of
ASD, with variations ranging from delayed language development to a complete absence of
spoken language. The manifestation of repetitive behaviors, such as stereotypical movements or
adherence to strict routines, further contributes to the intricate tapestry of ASD symptoms.
Recognizing the prevalence of ASD in adults has become increasingly crucial in recent years.
The traditional perception of autism as a childhood disorder has given way to a broader
understanding that acknowledges the persistence of symptoms into adulthood. The transition
from childhood to adulthood introduces new challenges in identifying and diagnosing ASD in a
population that may have developed coping mechanisms or masked their symptoms over time.
This shift in perspective underscores the critical need for effective screening tools and diagnostic
criteria tailored to the unique characteristics of adults with ASD.
Despite advancements in our comprehension of ASD, the diagnostic journey for adults
remains a complex and nuanced process. Unlike childhood diagnosis, which often involves
observations of early developmental markers, adult diagnosis relies on retrospective assessments
and an analysis of lifelong patterns of behavior. The subtleties of adult presentation necessitate a
comprehensive evaluation that considers not only the core symptoms of ASD but also the
individual's adaptive functioning and quality of life. Clinicians face the challenge of
distinguishing ASD from other mental health conditions that may share overlapping features,
further emphasizing the need for a nuanced approach to diagnosis.
The importance of timely intervention for adults with ASD cannot be overstated. Early
identification and intervention have been shown to improve outcomes and enhance the
individual's overall quality of life. However, the challenges of diagnosing ASD in adults may
lead to delayed access to appropriate support and services. This delay can impact various aspects
of an individual's life, including education, employment, and social relationships.

1.2 Significance of Early Detection in ASD

The significance of early detection in Autism Spectrum Disorder (ASD) cannot be overstated,
particularly when considering its profound implications for individuals, society, and healthcare
systems. Early identification of ASD in adults serves as a pivotal gateway to timely and targeted
interventions, laying the foundation for improved outcomes across various domains of life.
At an individual level, early detection acts as a catalyst for enhancing the quality of life for those
with ASD. The developmental trajectory of individuals with ASD is markedly influenced by
early intervention, impacting key areas such as social integration, communication skills, and
adaptive behaviors. The malleability of the human brain during early developmental stages
makes this period particularly receptive to therapeutic interventions. Therefore, identifying and
addressing ASD in its early stages can lead to more effective interventions, potentially mitigating
the impact of core symptoms and promoting the development of essential life skills.
Social integration stands out as a critical domain affected by early detection and intervention.
Individuals with ASD often encounter challenges in forming and maintaining social
relationships. Early identification allows for the implementation of targeted social skills training
and support, fostering improved social interactions and relationships. As a result, the individual's
ability to navigate social situations, understand social cues, and engage in reciprocal
communication can be significantly enhanced, positively influencing their overall well-being.
Communication skills, another core aspect of ASD, benefit immensely from early intervention.
Speech and language difficulties, ranging from delayed language acquisition to challenges in
pragmatic communication, are common among individuals with ASD. Early identification
enables the initiation of speech and language therapy tailored to the individual's specific needs,
promoting the development of effective communication strategies. This, in turn, has cascading
effects on various aspects of life, including academic achievement, vocational success, and
independent living.
Adaptive behaviors, encompassing daily living skills and functional independence, also
experience positive outcomes with early detection. Interventions targeting adaptive behaviors,
such as self-care, organization, and time management, contribute to the individual's ability to
lead a more independent and fulfilling life. Early identification allows for the implementation of
personalized strategies and support systems, empowering individuals with ASD to navigate the
challenges of daily living more effectively.
Beyond the individual realm, the societal and economic significance of early ASD detection is
substantial. A society that actively promotes early identification and intervention contributes to a
more inclusive environment. By recognizing and accommodating the diverse needs of
individuals with ASD, societal structures become more accessible, allowing for greater
participation and contribution from this population. This inclusivity has ripple effects, fostering a
society that values neurodiversity and promotes the well-being of all its members.
From an economic perspective, early detection holds the potential to reduce the long-term
burden on healthcare systems. Timely interventions that address the core symptoms of ASD may
decrease the need for extensive and costly support services in later stages of life. Additionally,
individuals who receive early intervention are more likely to develop skills that enhance their
independence, potentially reducing the demand for long-term care and support services.

1.3 Role of Supervised Machine Learning in Healthcare

The role of supervised machine learning in healthcare represents a groundbreaking frontier that
has the potential to revolutionize various facets of medical practice. The integration of machine
learning technologies into healthcare systems signifies a paradigm shift, ushering in
unprecedented opportunities for enhanced diagnosis, personalized treatment strategies, and
improved patient care. In the realm of neurodevelopmental disorders, such as Autism Spectrum
Disorder (ASD), supervised machine learning stands out as a particularly promising avenue,
offering the potential to significantly advance diagnostic accuracy and efficiency.

In the context of ASD, a multifaceted neurodevelopmental condition characterized by
challenges in social interaction, communication difficulties, and repetitive behaviors, the
traditional diagnostic process has often been intricate and time-consuming. The reliance on
clinical observations, behavioral assessments, and subjective evaluations has led to variations in
diagnostic outcomes and, at times, delayed identification. Here, supervised machine learning
brings forth a transformative approach by harnessing the power of computational algorithms to
analyze and interpret complex patterns within vast datasets.
One of the primary advantages of supervised machine learning in the context of ASD lies in
its ability to learn from existing datasets. These datasets may include a diverse range of
information, such as clinical assessments, neuroimaging data, genetic profiles, and behavioral
observations. Through a process of supervised training, machine learning algorithms can discern
intricate patterns and relationships within these datasets, ultimately creating models that can
generalize and apply their learning to new, unseen data. This capacity to recognize subtle
patterns is particularly relevant in the case of ASD, where the disorder's spectrum nature and
variability in symptom presentation pose challenges for conventional diagnostic approaches.
The application of supervised machine learning in ASD diagnosis involves training
algorithms on labeled datasets, where each instance is associated with a known outcome or
diagnosis. These algorithms learn to identify patterns indicative of ASD based on the features
present in the training data. Once the training phase is complete, the machine learning model can
be applied to new, unseen data to predict whether an individual may have ASD, providing a
valuable tool for clinicians in the diagnostic process.
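The labeled-data workflow described above can be sketched with scikit-learn on synthetic stand-in data. The dataset, features, and choice of model here are illustrative only, not the project's actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a labeled screening dataset:
# X holds feature vectors, y holds the known ASD / non-ASD labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Supervised training on labeled instances...
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ...then application to new, unseen data.
preds = model.predict(X_test)
print(model.score(X_test, y_test))
```

The held-out test set stands in for the "new, unseen data" of the text: the model never sees those labels during training.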
The potential benefits of incorporating supervised machine learning into ASD diagnosis are
manifold. Firstly, the accuracy of diagnoses may be significantly improved, as machine learning
models can analyze a wide array of data points simultaneously and identify subtle patterns that
may not be immediately apparent to human observers. This enhanced accuracy has the potential
to facilitate earlier and more precise identification of ASD, leading to timely interventions and
improved outcomes for individuals.
Moreover, the efficiency of the diagnostic process stands to benefit from the role of
supervised machine learning. The rapid analysis of diverse datasets allows for a streamlined and
objective assessment, reducing the time and resources traditionally required for a comprehensive
diagnosis. This efficiency is particularly critical in the context of neurodevelopmental disorders,
where early intervention has been shown to have a substantial impact on long-term outcomes.
However, it is essential to acknowledge the challenges and considerations associated with the
integration of machine learning in healthcare, including ethical concerns, data privacy, and the
interpretability of complex algorithms. Ensuring that machine learning models are transparent,
interpretable, and ethically sound is crucial for building trust among healthcare professionals and
the broader public.

Fig 1.1 Supervised Machine Learning Algorithm

1.4 Libraries used

1.4.1 Numpy
NumPy (Numerical Python) is an open source Python library that’s used in almost every field of
science and engineering. It’s the universal standard for working with numerical data in Python,
and it’s at the core of the scientific Python and PyData ecosystems. NumPy users include
everyone from beginning coders to experienced researchers doing state-of-the-art scientific and
industrial research and development. The NumPy API is used extensively in Pandas, SciPy,
Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.
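As a small illustration of NumPy's vectorised arithmetic, hypothetical binary answers to the ten screening questions can be summed row-wise into per-respondent scores (the values below are invented):

```python
import numpy as np

# Hypothetical binary answers (1 = "yes") of three respondents
# to the ten screening questions A1-A10.
answers = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
])

# A vectorised row-wise sum gives each respondent's screening score.
scores = answers.sum(axis=1)
print(scores)  # [7 2 9]
```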

1.4.2 Pandas
Pandas is mainly used for data analysis and associated manipulation of tabular data in
DataFrames. Pandas allows importing data from various file formats such as comma-separated
values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
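A minimal sketch of this workflow, reading a toy CSV (the columns and values are illustrative, not the project's dataset) and filtering rows:

```python
import io
import pandas as pd

# A tiny CSV in the general shape of the screening data.
csv_text = """age,gender,jaundice,result
25,m,no,7
34,f,yes,2
29,f,no,9
"""
df = pd.read_csv(io.StringIO(csv_text))

# Basic inspection and filtering, the bread and butter of preprocessing.
adults_over_28 = df[df["age"] > 28]
print(adults_over_28.shape)  # (2, 4)
```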

1.4.3 Matplotlib
Matplotlib is a plotting library for the Python programming language and its numerical
mathematics extension NumPy. It provides an object-oriented API for embedding plots into
applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK.
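A short example of the object-oriented API, using the non-interactive Agg backend so no GUI toolkit is required (the scores plotted are invented):

```python
import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt

scores = [7, 2, 9, 5, 6]
fig, ax = plt.subplots()
ax.bar(range(len(scores)), scores)
ax.set_xlabel("respondent")
ax.set_ylabel("screening score")
fig.savefig("scores.png")
```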

1.4.4 Seaborn
Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and
integrates closely with pandas data structures. Seaborn helps you explore and understand your
data.
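For instance, a count plot over a toy DataFrame (invented values) takes a single call and works directly with pandas columns:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"gender": ["m", "f", "f", "m", "f"],
                   "score": [7, 2, 9, 5, 6]})

# One bar per category, counted automatically from the column.
ax = sns.countplot(data=df, x="gender")
ax.figure.savefig("counts.png")
```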

1.4.5 SciKit Learn


Scikit-learn, a widely-used open-source machine learning library in Python, serves as a powerful
and versatile tool for developers, researchers, and data scientists. Designed to be user-friendly
and accessible, Scikit-learn provides a comprehensive set of functionalities for various machine
learning tasks, including classification, regression, clustering, and dimensionality reduction.
Developed on the principles of simplicity and efficiency, it seamlessly integrates with popular
data science libraries such as NumPy, SciPy, and Matplotlib, fostering a cohesive ecosystem for
machine learning and data analysis. With a vast array of algorithms and tools, Scikit-learn
empowers users to efficiently implement and experiment with machine learning models, making
it an indispensable resource for both beginners and seasoned practitioners in the field of data
science and artificial intelligence.

1.5 Overview of Machine Learning Algorithms

1.5.1 Random Forests Algorithm


Random Forest grows multiple decision trees which are merged together for a more accurate
prediction.
The logic behind the Random Forest model is that multiple uncorrelated models (the
individual decision trees) perform much better as a group than they do alone. When using
Random Forest for classification, each tree gives a classification or a “vote.” The forest chooses
the classification with the majority of the “votes.” When using Random Forest for regression, the
forest picks the average of the outputs of all trees.
The key here lies in the fact that there is low (or no) correlation between the individual
models—that is, between the decision trees that make up the larger Random Forest model. While
individual decision trees may produce errors, the majority of the group will be correct, thus
moving the overall outcome in the right direction.

Fig 1.2 Random Forest Algorithm
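The voting behaviour can be inspected directly in scikit-learn by querying each fitted tree (toy data; note that scikit-learn's RandomForestClassifier actually averages per-tree class probabilities, i.e. soft voting, which usually coincides with the hard majority vote collected here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)
forest = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

# Collect each individual tree's "vote" for one sample.
sample = X[:1]
votes = [int(tree.predict(sample)[0]) for tree in forest.estimators_]
majority = max(set(votes), key=votes.count)
print(majority, int(forest.predict(sample)[0]))
```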

1.5.2 Support Vector Machines (SVM)
The best way to understand the SVM algorithm is by focusing on its primary type, the SVM
classifier. The idea behind the SVM classifier is to find a hyperplane in an N-dimensional
space that divides the data points belonging to different classes. The hyperplane is chosen
based on margin: the hyperplane providing the maximum margin between the two classes is
selected. These margins are calculated using data points known as support vectors, the data
points nearest to the hyperplane, which help in orienting it.

Fig 1.3: Schematic diagram of SVM algorithm
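A minimal linear-SVM sketch on two separable 2-D clusters (toy data); the fitted classifier exposes the support vectors that orient the hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2-D.
X = np.array([[0, 0], [0.5, 0.5], [1, 0], [4, 4], [4.5, 5], [5, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The points nearest the separating hyperplane are the support vectors.
print(clf.support_vectors_)
print(clf.predict([[0.2, 0.1], [4.8, 4.6]]))
```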


1.5.3 k-Nearest Neighbors (kNN)
K-nearest neighbors (KNN) is a supervised learning algorithm used for both regression and
classification. KNN predicts the class of a test point by calculating the distance between the
test point and all the training points, then selecting the K training points closest to it. The
algorithm estimates the probability of the test point belonging to each class represented among
the ‘K’ neighbors, and the class with the highest probability is selected. In the case of
regression, the prediction is the mean of the ‘K’ selected training points.

Fig 1.4 k-Nearest Neighbors algorithm
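A one-dimensional toy example makes the neighbour voting visible: a query point near the low cluster collects three class-0 neighbours, one near the high cluster collects three class-1 neighbours:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Query at 2.5: nearest neighbours are 2, 3 and 1, all class 0.
print(knn.predict([[2.5]]), knn.predict_proba([[2.5]]))
```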
1.5.4 Logistic Regression
Logistic regression models the probability of a discrete outcome given an input variable. The
most common form of logistic regression models a binary outcome: something that can take two
values such as true/false, yes/no, and so on. Multinomial logistic regression can model
scenarios where there are more than two possible discrete outcomes. Logistic regression is a
useful analysis method for classification problems, where you are trying to determine which
category a new sample fits best; ASD screening, which asks whether an individual does or does
not meet the screening threshold, is exactly such a binary classification problem.

Fig 1.5 Logistic curve
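The logistic curve can be seen through predict_proba on a toy one-dimensional dataset (invented values): predicted probabilities rise smoothly from near 0 to near 1 across the class boundary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0], [1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1, 1])

lr = LogisticRegression().fit(X, y)

# predict_proba traces the logistic curve: probabilities, not hard labels.
probs = lr.predict_proba(X)[:, 1]
print(probs.round(2))
```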

1.5.5 Multi-Layer Perceptron(MLP)
A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP
consists of at least three layers of nodes. Except for the input nodes, each node is a neuron that
uses a nonlinear activation function. MLP utilizes a supervised learning technique called
backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from
a linear perceptron. It can distinguish data that is not linearly separable. Multilayer perceptrons
are sometimes colloquially referred to as ‘vanilla’ neural networks, especially when they have a
single hidden layer.

Fig 1.6 Multi-layer perceptron learning
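XOR is the classic dataset that a linear perceptron cannot separate but an MLP can. A small sketch with scikit-learn follows; whether the tiny network fits XOR exactly can depend on the random initialisation:

```python
from sklearn.neural_network import MLPClassifier

# XOR: not linearly separable, so a hidden layer is required.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=5000,
                    random_state=0).fit(X, y)
print(mlp.n_layers_)   # input + one hidden + output = 3 layers
print(mlp.predict(X))
```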

1.5.6 XGB Classifier Algorithm

XGBoost, short for eXtreme Gradient Boosting, is a powerful and versatile machine learning
algorithm renowned for its efficiency and performance in predictive modeling. As a supervised
learning method, XGBoost belongs to the ensemble learning category, combining the strengths
of multiple weak learners to create a robust and accurate classifier. Widely utilized in various
domains, from finance to healthcare, XGBoost excels in handling diverse data types, providing
exceptional predictive accuracy, and effectively managing complex relationships within datasets.
With its innovative gradient boosting framework and regularization techniques, XGBoost stands
as a go-to choice for tackling classification challenges and pushing the boundaries of predictive
analytics.
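xgboost ships a scikit-learn-compatible XGBClassifier; since that package may not be installed everywhere, the underlying gradient-boosting idea is sketched here with scikit-learn's own GradientBoostingClassifier, a related but simpler implementation (toy data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# Boosting fits weak learners sequentially, each one correcting
# the errors of its predecessors.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 random_state=2).fit(X_tr, y_tr)
print(gbm.score(X_te, y_te))
```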

1.6 Crafting an Advanced ASD Screening Tool

At the core of this ambitious project lies the fundamental goal of crafting an advanced Autism
Spectrum Disorder (ASD) screening tool that transcends the capabilities of individual
algorithms. The primary objective is to orchestrate a symphony of diverse machine learning
algorithms, strategically weaving together their distinctive strengths to form a screening
mechanism that surpasses the limitations inherent in any single approach. This endeavor seeks to
redefine the landscape of ASD diagnosis, with a focus on the unique challenges presented in the
context of adult ASD.
The significance of this undertaking is underscored by the recognition that the complexity of
ASD demands a multifaceted and adaptive approach. Individual algorithms, while powerful in
their own right, may exhibit limitations in capturing the intricate nuances and variability within
ASD data. Thus, the strategic amalgamation of a variety of algorithms becomes imperative, as it
holds the promise of synergistically enhancing the accuracy, sensitivity, and specificity of the
screening tool.
In the pursuit of constructing this groundbreaking screening mechanism, each algorithm is
chosen for its specific attributes and strengths. Decision Trees, known for their ability to unravel
complex decision-making processes, contribute by capturing patterns within ASD data that might
be challenging for human observers to discern. The ensemble then incorporates Random Forest,
which aggregates multiple Decision Trees, mitigating overfitting and enhancing the model's
generalization capacity. Support Vector Machines (SVM) add a layer of sophistication, excelling
in classifying data with intricate boundaries, a characteristic particularly relevant in the diverse
symptomatology of ASD.
The inclusion of K Nearest Neighbors (KNN) introduces a localized perspective, recognizing
the potential clustering of ASD symptoms within specific groups. Gaussian Naive Bayes, with its
probabilistic nature, accommodates the high-dimensional nature of ASD data, providing a robust
framework for classification. Logistic Regression, a classic yet powerful algorithm, contributes
its simplicity and interpretability to the ensemble.
The statistical rigor brought by Linear Discriminant Analysis (LDA) and Quadratic
Discriminant Analysis (QDA) enhances the tool's understanding of the underlying data
distributions associated with ASD. This diverse ensemble, collectively guided by each
algorithm's unique strengths, forms a screening mechanism that transcends the limitations of any
single methodology.
The project's commitment to overcoming the challenges associated with adult ASD diagnosis
is evident in its emphasis on innovation and adaptability. Adult ASD presents a distinct set of
complexities compared to childhood ASD, with individuals potentially developing coping
mechanisms or masking symptoms over time. The screening tool aims to address these nuances
by integrating algorithms that can discern patterns within retrospective data, offering a
comprehensive evaluation of an individual's lifelong behavioral patterns.
As this screening tool takes shape, rigorous training and validation processes are integral
components of its development. The iterative refinement of algorithmic parameters ensures that
the ensemble performs optimally, achieving a delicate balance between sensitivity and
specificity. The overarching goal is not merely the creation of a diagnostic tool but the
establishment of a pioneering solution that revolutionizes the diagnostic landscape for adult
ASD.
The potential impact of this advanced ASD screening tool extends beyond the individual level,
resonating in healthcare systems, research, and societal understanding. By providing a more
accurate and efficient means of identifying adult ASD, the tool has the potential to reduce the
diagnostic journey's intricacies, leading to earlier interventions and improved outcomes.
Additionally, the insights gained from the screening process contribute to the growing body of
knowledge surrounding adult ASD, fostering a more nuanced understanding of this complex
neurodevelopmental condition.
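One concrete way to combine several of the algorithms named above into a single screening model is scikit-learn's VotingClassifier. The sketch below uses synthetic data and a subset of the estimators and is illustrative only, not the project's final ensemble:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Hard voting: each base model casts one vote per sample,
# and the ensemble reports the majority class.
ensemble = VotingClassifier(estimators=[
    ("rf", RandomForestClassifier(random_state=3)),
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
], voting="hard").fit(X_tr, y_tr)
print(ensemble.score(X_te, y_te))
```

Switching to voting="soft" averages predicted probabilities instead, which often helps when the base models are well calibrated.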

1.7 Dataset Compilation and Characteristics

Table 1.1 Attribute features and their description


Attribute | Type | Description
Age | Number | Age in years
Gender | String | Male or female
Ethnicity | String | List of common ethnicities in text format
Born with jaundice | Boolean (yes or no) | Whether the case was born with jaundice
Family member with PDD | Boolean (yes or no) | Whether any immediate family member has a PDD
Who is completing the test | String | Parent, self, caregiver, medical staff, clinician, etc.
Country of residence | String | List of countries in text format
Used the screening app before | Boolean (yes or no) | Whether the user has used a screening app before
Screening method type | Integer (0, 1, 2, 3) | The type of screening method chosen based on age category (0 = toddler, 1 = child, 2 = adolescent, 3 = adult)
Question 1 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 2 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 3 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 4 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 5 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 6 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 7 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 8 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 9 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Question 10 answer | Binary (0, 1) | The answer code of the question based on the screening method used
Screening score | Integer | The final score obtained based on the scoring algorithm of the screening method used; computed in an automated manner

At the heart of every machine learning endeavor, the foundation upon which the entire system
rests is the quality and representativeness of the dataset. In the context of this ambitious project,
the dataset compilation is a meticulous process, carefully crafted to be the linchpin that shapes
the efficacy and reliability of the developed ASD screening tool. The dataset is not merely a
collection of data points but a comprehensive assembly, drawing upon a diverse pool of
information from both individuals with Autism Spectrum Disorder (ASD) and those without,
specifically focusing on adult subjects. This strategic compilation ensures that the training and
evaluation processes of the machine learning algorithms unfold on a rich canvas, reflecting the
complexity and diversity inherent in real-world scenarios.
The assembly of this dataset is characterized by a rigorous curation process, guided by a
commitment to inclusivity and authenticity. Information is collated from a spectrum of sources,
capturing the nuances of adult ASD from various perspectives. The inclusion of both ASD and
non-ASD subjects establishes a balanced and representative foundation, mimicking the
intricacies encountered in clinical settings. This deliberate approach is crucial for the machine
learning algorithms to develop a nuanced understanding of the patterns and features associated
with ASD, enhancing their ability to discriminate between individuals with and without the
disorder.

A key consideration in dataset compilation is the incorporation of comprehensive
demographic details. These details encompass a broad spectrum, including age, gender,
socioeconomic status, and educational background. Such demographic factors play a pivotal role
in understanding the multifaceted nature of ASD, considering its potential influence on the
manifestation and presentation of symptoms. The inclusion of this demographic diversity ensures
that the developed screening tool is sensitive to variations across different subgroups,
contributing to its robustness in real-world applications.
Behavioral traits, another integral component of the dataset, offer a granular view of the
individuals under consideration. The diverse and complex nature of ASD symptoms necessitates
a detailed examination of behavioral patterns, encompassing social interactions, communication
skills, and repetitive behaviors. By incorporating this multifaceted information, the dataset
captures the heterogeneity within the ASD population, allowing the machine learning algorithms
to discern subtle variations and tailor their predictions accordingly.

Relevant clinical indicators further enrich the dataset, providing a holistic perspective on the
individuals involved. These indicators may include diagnostic assessments, medical history, and
other clinical observations. By integrating such clinically significant information, the dataset
aligns closely with the diagnostic reality faced by healthcare professionals. This alignment is
crucial for the screening tool to be not only accurate but also clinically relevant, enhancing its
utility in real-world healthcare settings.
The meticulous nature of dataset compilation is underscored by a commitment to ethical
considerations and privacy protection. Anonymization and de-identification protocols are
rigorously applied to safeguard the privacy and confidentiality of the individuals contributing to
the dataset. This ethical approach is paramount in maintaining the integrity of the project and
ensuring that the dataset is handled responsibly in accordance with privacy regulations and
standards.
As the dataset takes shape, it becomes more than a collection of variables and values; it
becomes a dynamic representation of the diverse landscape of adult ASD. The richness of the
dataset is not just in its size but in its ability to encapsulate the complexity, variability, and
individuality of each participant. This depth ensures that the machine learning algorithms, when

exposed to this diverse array of information, are equipped to generalize and adapt to new, unseen
data, a crucial aspect for the screening tool's success in real-world applications.

1.8 Emphasis on Algorithmic Diversity

In the pursuit of constructing an advanced Autism Spectrum Disorder (ASD) screening tool, this
project places a deliberate and strategic emphasis on algorithmic diversity, recognizing the
nuanced and intricate nature of ASD. The research team, comprising multidisciplinary experts,
acknowledges that ASD manifests as a complex and heterogeneous spectrum of
neurodevelopmental conditions. In response to this complexity, the project adopts a
forward-thinking approach by exploring a diverse array of modeling approaches, seeking a
comprehensive and nuanced understanding of the myriad features associated with ASD.
The decision to employ a diverse ensemble of machine learning algorithms is rooted in the
fundamental acknowledgment that no single algorithm can encapsulate the multifaceted
dimensions of ASD. Each algorithm, whether Decision Trees, Random Forest, Support Vector
Machines, K Nearest Neighbors, or Logistic Regression, brings a unique set of strengths and
perspectives to the table. This diversity of approaches is not arbitrary but a strategic choice to
enhance the screening process, ensuring a well-rounded and adaptable tool that can effectively
navigate the intricacies of ASD diagnosis.
The utilization of Decision Trees is emblematic of the project's commitment to capturing
complex decision-making processes inherent in ASD. Decision Trees dissect intricate patterns
within the dataset, offering insights into the relationships between various features and their
influence on ASD identification. This methodological choice is particularly valuable in handling
the diversity of symptoms and presentations within the ASD population.
The inclusion of Random Forest as part of the ensemble underscores the commitment to
mitigating the risk of overfitting and enhancing generalization. By aggregating the outputs of
multiple Decision Trees, Random Forest brings a robustness to the screening tool, making it
adaptable to the heterogeneity observed in ASD data. This ensemble approach acknowledges the
variability in symptom presentation and ensures that the screening tool remains accurate across a
broad spectrum of ASD manifestations.
Support Vector Machines (SVM) contribute their unique perspective by excelling in
classifying data with intricate boundaries. Given the complex and variable nature of ASD

symptoms, SVM enhances the screening tool's capacity to discern subtle patterns and boundaries
within the dataset. This methodological choice reflects a proactive approach to addressing the
challenges posed by the diversity of ASD symptoms and manifestations.
K Nearest Neighbors (KNN), operating on the principle of similarity, offers a localized
perspective to the ensemble. This algorithm excels in identifying patterns that may exist in
localized clusters within the dataset. In the context of ASD, where symptom clusters may emerge
in specific subgroups, KNN ensures that the screening tool remains sensitive to localized
variations, fostering a more nuanced understanding of ASD in diverse populations.
Logistic Regression, a classic yet powerful algorithm, adds interpretability and simplicity to
the ensemble. Its suitability for binary classification tasks aligns well with the nature of ASD
identification, providing a foundational element in the amalgamation of diverse algorithms. The
transparency and interpretability of Logistic Regression complement the complexity introduced
by other algorithms, ensuring a cohesive and understandable ensemble.
The deliberate emphasis on algorithmic diversity in this project goes beyond a mere technical
choice; it is a strategic response to the inherent challenges posed by the intricate nature of ASD.
By embracing a variety of modeling approaches, the research team aims to cast a wide net,
capturing the diverse manifestations and presentations of ASD within the dataset. This holistic
and inclusive approach enhances the screening tool's adaptability to the rich and varied landscape
of ASD, ensuring that it remains effective across a spectrum of scenarios.
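The five algorithms named above can be compared side by side on a common dataset. The sketch below does this with scikit-learn and 5-fold cross-validation; the synthetic data and default hyperparameters are placeholders, not the project's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded screening dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Mean 5-fold cross-validated accuracy for each algorithm.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
```

A loop of this shape makes it easy to swap algorithms in and out while holding the data and evaluation protocol fixed, which is the practical payoff of algorithmic diversity.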

1.9 Interpretability in ASD Identification

In the intricate landscape of Autism Spectrum Disorder (ASD) identification, the spotlight in this
project falls distinctly on interpretability. Recognizing the inherent complexity of ASD, the
research team places a significant emphasis on unraveling the intricate relationships between
features and the identification of ASD. This endeavor is not merely a scientific pursuit; it is a
pragmatic approach that acknowledges the practical implications for clinicians, stakeholders, and
individuals navigating the ASD diagnostic process. By prioritizing interpretability, the project
strives not only to contribute to the scientific understanding of ASD but also to enhance the
transparency of the screening tool, making it more accessible and comprehensible for healthcare
professionals and individuals undergoing assessments.

Interpretability, in the context of machine learning algorithms applied to ASD identification,
refers to the ability to understand and explain the decisions made by the model. The complex
interplay of features contributing to ASD can be challenging to decipher, making interpretability
a crucial aspect of ensuring that the screening tool is not only accurate but also understandable
for those relying on its outcomes.
In the pursuit of interpretability, the project acknowledges the multifaceted nature of ASD
symptoms and the need for transparency in the decision-making process of the screening tool.
Unraveling the complex relationships between various features and ASD identification becomes
a scientific imperative, providing valuable insights into the patterns and markers that contribute
to the algorithm's predictions. This scientific endeavor extends beyond the confines of
algorithmic intricacies; it aspires to enhance our broader understanding of the factors influencing
ASD identification.

Fig 1.7 Model Complexity graph

Fig 1.8 Learning curve
Practically, interpretability holds immense value for healthcare professionals who utilize the
screening tool in clinical settings. Transparent models empower clinicians to comprehend and
trust the decisions made by the algorithm, fostering a collaborative and informed approach to
diagnosis. Interpretability becomes a bridge between advanced computational methodologies and
the clinical expertise of healthcare professionals, ensuring that the screening tool aligns
seamlessly with the nuanced realities of ASD assessments.
Moreover, interpretability addresses a critical aspect of ethical considerations in healthcare.
Transparent models provide individuals undergoing assessments with a clearer understanding of
the factors influencing their diagnosis. This transparency fosters trust and confidence in the
diagnostic process, empowering individuals to actively engage in discussions about their
healthcare journey. The project's commitment to interpretability aligns with the broader
movement toward patient-centered care, where individuals are not passive recipients of
diagnoses but active participants in their healthcare decisions.
The emphasis on interpretability also extends to the broader stakeholder community, including
educators, policymakers, and researchers. Transparent models enable stakeholders to
comprehend the factors influencing ASD identification, facilitating informed decision-making
and policy development. The project's commitment to interpretability reflects an awareness of

the collaborative nature of addressing complex societal challenges such as ASD, where diverse
stakeholders play pivotal roles in shaping interventions, support systems, and public policies.
The implementation of interpretable machine learning models, such as Logistic Regression
and Decision Trees, forms a strategic component of the project's approach. These models offer
not only accurate predictions but also a clear and understandable rationale for their decisions.
Logistic Regression, with its simplicity and interpretability, provides a foundational element in
the ensemble of algorithms. Decision Trees, by nature hierarchical and structured, unravel
complex decision-making processes in a visually intuitive manner, offering insights into the
features that contribute to ASD identification.
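As a small illustration of this interpretability (the feature names and data below are invented for the sketch), scikit-learn exposes Logistic Regression coefficients directly and can print a fitted Decision Tree's rules verbatim:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: one informative feature ("screening_score") and one
# mostly irrelevant one ("age") -- both names are hypothetical.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.2 * rng.normal(size=200) > 0).astype(int)
feature_names = ["screening_score", "age"]

# Logistic Regression: signed coefficients show each feature's
# direction of influence on the predicted probability.
lr = LogisticRegression().fit(X, y)
coefs = dict(zip(feature_names, lr.coef_[0]))

# Decision Tree: the fitted decision rules can be printed as text.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
```

A clinician can read the coefficient sign or the printed rule thresholds directly, which is exactly the transparency this section argues for.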
The project recognizes that the pursuit of interpretability does not entail a compromise on
accuracy or sophistication. Instead, it is an integrative approach that harmonizes the advanced
capabilities of machine learning with the necessity for transparency in the diagnostic process.
The interpretability of the screening tool is a feature, not a limitation, enriching the overall utility
and acceptance of the tool in real-world healthcare scenarios.
1.10 Ethical Considerations in Machine Learning for Healthcare

The seamless integration of machine learning into healthcare is an epochal advancement, but it
mandates an unwavering commitment to ethical considerations. Within this transformative
landscape, the current project stands as a beacon, emphasizing the fundamental principles of
privacy, fairness, and the responsible deployment of machine learning technologies within
healthcare contexts. Ethical considerations are not merely an ancillary concern; they are the
bedrock upon which the integrity and success of this project rest. The conscientious approach to
ethics is not only a moral imperative but is also instrumental in building and sustaining public
trust, while simultaneously safeguarding the well-being and privacy of individuals undergoing
ASD assessments. Striking an equilibrium between technological innovation and ethical
responsibility emerges as an indispensable facet for the successful implementation of ASD
screening tools that genuinely contribute to healthcare advancements.
Privacy, as a cornerstone of ethical considerations, takes precedence in the integration of
machine learning for healthcare applications. The project recognizes the sensitivity of health
data, especially in the context of neurodevelopmental disorders like ASD. Rigorous
anonymization and de-identification protocols are meticulously applied to the dataset to shield
the identities of individuals contributing to the project. By adhering to robust privacy measures,

the research team not only upholds ethical standards but also addresses concerns related to data
security and confidentiality, fostering a sense of trust among individuals participating in the
assessment process.
Fairness, another pivotal ethical principle, is explicitly acknowledged in the project's
approach. The machine learning algorithms are meticulously trained and validated to ensure that
they do not perpetuate biases or discriminate against specific demographic groups. The project
team actively addresses issues related to algorithmic fairness, striving to mitigate the risk of
unintended consequences and disparities in ASD identification. The commitment to fairness
aligns with the broader societal goal of creating healthcare technologies that are equitable and
accessible to diverse populations.
Beyond privacy and fairness, the responsible deployment of machine learning technologies is
a guiding ethical tenet. This involves careful consideration of the potential impact on individuals'
lives and the broader healthcare ecosystem. The project team critically evaluates not only the
accuracy and efficiency of the ASD screening tool but also its real-world implications.
Responsible deployment encompasses considerations of the tool's usability in clinical settings,
the interpretability of its outcomes, and the potential consequences of its recommendations. This
holistic approach ensures that the integration of machine learning into healthcare goes beyond
technical proficiency, actively considering its ethical implications in the service of the
individuals it aims to assist.
The ethical framework established by the project is pivotal for building and maintaining public
trust. Trust is foundational in healthcare, and its erosion can have far-reaching consequences. By
prioritizing privacy, fairness, and responsible deployment, the project aims to instill confidence
in individuals undergoing ASD assessments, healthcare professionals utilizing the screening tool,
and the broader public. Transparency about the ethical considerations and measures taken further
strengthens the bond of trust between the project and its stakeholders.
Furthermore, the project team recognizes the dynamic nature of ethical considerations in the
rapidly evolving field of machine learning and healthcare. Continuous monitoring and adaptation
of ethical protocols are integral to staying abreast of emerging challenges and ensuring that the
project remains aligned with evolving ethical standards. The commitment to ongoing ethical
evaluation reflects a proactive stance, acknowledging the dynamic interplay between technology
and ethical responsibilities.

In essence, the ethical considerations within the project extend far beyond compliance; they
embody a proactive and comprehensive approach to safeguarding the rights, privacy, and
well-being of individuals involved. The fusion of technological innovation with ethical
responsibility reflects a commitment to the highest standards of integrity, ensuring that the
benefits of machine learning in healthcare are realized without compromising fundamental
ethical principles. As the project advances, the ethical considerations embedded within its
framework not only guide its trajectory but also contribute to shaping a responsible and
trustworthy paradigm for the integration of machine learning into neurodevelopmental disorder
assessments.

1.11 Anticipated Outcomes and Future Implications

The anticipated outcomes of this visionary project extend far beyond the realms of technological
innovation, holding the promise of significantly enhancing the quality of life for individuals on
the autism spectrum. At the heart of these anticipated outcomes is the recognition that the early
detection of Autism Spectrum Disorder (ASD) is a pivotal gateway to improving long-term
outcomes for affected individuals. The screening tool, forged through the amalgamation of
diverse machine learning algorithms, is poised to revolutionize the landscape of ASD
identification by offering more accurate and efficient early detection capabilities.
The core objective of the screening tool aligns seamlessly with the broader aspirations of
precision medicine, heralding a future where healthcare interventions are tailored to the
individual characteristics and needs of each person. By harnessing the power of advanced
technologies, the project aims to usher in a new era in the identification of neurodevelopmental
disorders, particularly ASD. Precision medicine, characterized by personalized and targeted
approaches, becomes a tangible reality as the screening tool fine-tunes its predictions based on
the unique patterns and features exhibited by individuals on the autism spectrum.
The transformative potential of advanced technologies, particularly machine learning,
becomes evident through the lens of this project. The integration of cutting-edge methodologies
in the identification of ASD transcends traditional diagnostic paradigms, offering a more
nuanced and adaptive approach. The research team envisions a future where the diagnostic
journey for ASD is not only accurate but also characterized by efficiency, accessibility, and
compassion.

In essence, the project aspires to redefine ASD screening by seamlessly blending the prowess
of machine learning methodologies with a compassionate and person-centered approach. The
envisioned future is one where the screening process goes beyond a mere diagnostic tool,
becoming a conduit for early intervention that is not only more accessible but also more
effective. The compassion embedded in the approach acknowledges the unique challenges faced
by individuals on the autism spectrum and strives to create a screening tool that is not only
technologically advanced but also considerate of the diverse ways in which ASD may manifest.
The ultimate vision of the project is to foster a future where early intervention becomes a
transformative force in the lives of individuals affected by ASD. Early detection, facilitated by
the screening tool, opens avenues for timely and targeted interventions that can mitigate
challenges associated with ASD. These interventions may include specialized therapies,
educational support, and tailored strategies to enhance social integration and communication
skills.
Furthermore, the impact of the anticipated outcomes extends to the societal and economic
dimensions. By streamlining the identification process and facilitating early interventions, the
project holds the potential to reduce the long-term burden on healthcare systems. The economic
implications are profound, as early interventions can lead to improved outcomes, potentially
reducing the need for long-term care and support services.
As the project progresses, the insights gained from the screening tool contribute to the
evolving body of knowledge surrounding ASD. The data generated becomes a valuable resource
for researchers and clinicians, fostering a deeper understanding of the complex nature of
neurodevelopmental disorders. This, in turn, paves the way for future advancements in the field,
shaping new diagnostic and therapeutic strategies for ASD and potentially informing research in
other related domains.

Chapter Two

REQUIREMENTS ANALYSIS WITH SRS


2.1 System Overview


2.1.1 System Description

The ASD screening system will process the provided dataset, perform data preprocessing,
implement six supervised machine learning algorithms, and evaluate their performance. The
algorithms include Decision Trees, Random Forest, Support Vector Machines (SVM), K-Nearest
Neighbors (KNeighbors), Gaussian Naive Bayes (GaussianNB), and Logistic Regression.

2.1.2 System Features

2.1.2.1 Data Preprocessing

Clean and preprocess the provided dataset, handling missing or ill-formatted entries.

2.1.2.2 Algorithm Implementation

Implement the following supervised machine learning algorithms for ASD screening:
a. Decision Trees
b. Random Forest
c. Support Vector Machines (SVM)
d. K-Nearest Neighbors (KNeighbors)
e. Gaussian Naive Bayes (GaussianNB)
f. Logistic Regression

2.1.2.3 Model Evaluation

Evaluate the performance of each algorithm using appropriate metrics (e.g., accuracy, precision, recall, F1 score).

2.2 Functional Requirements

2.2.1 Data Preprocessing
2.2.1.1 Data Cleaning
The system shall clean the dataset by removing records with missing or ill-formatted entries.
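A minimal sketch of this cleaning step, assuming the dataset is loaded into a pandas DataFrame and that missing or ill-formatted entries are marked with '?' (the convention used in the UCI ASD screening CSV); the column values below are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative records; '?' marks a missing or ill-formatted entry,
# mirroring the convention used in the UCI ASD screening data.
raw = pd.DataFrame({
    "age":       ["26", "35", "?", "40"],
    "gender":    ["f", "m", "f", "?"],
    "ethnicity": ["White-European", "Latino", "Asian", "Black"],
})

# Normalize the missing-value marker, then drop incomplete records.
clean = raw.replace("?", np.nan).dropna().reset_index(drop=True)
clean["age"] = clean["age"].astype(int)  # restore the numeric dtype
```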

2.2.2 Algorithm Implementation


2.2.2.1 Random Forest

The system shall implement the Random Forest algorithm for ASD screening. Implemented
using scikit-learn in Python, Random Forest combines multiple decision trees to enhance
classification performance. This algorithm excels in handling diverse features such as age,
gender, ethnicity, and medical history, contributing to a comprehensive ASD screening solution.
With the flexibility to adjust hyperparameters like the number of trees and maximum depth,
Random Forest ensures adaptability to varying datasets. Through training on labeled data and
rigorous testing on a separate validation set, Random Forest enhances the predictive capabilities
of our ASD screening system.
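A hedged sketch of this setup with scikit-learn: the synthetic features stand in for the encoded screening attributes, and the hyperparameter values shown (number of trees, maximum depth) are placeholders rather than the tuned configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded screening features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# n_estimators (number of trees) and max_depth are the adjustable
# hyperparameters mentioned above; these values are placeholders.
clf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # accuracy on the held-out set
```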

2.2.2.2 Support Vector Machines (SVM)

The Support Vector Machines (SVM) algorithm is integral to our Autism Spectrum Disorder
(ASD) screening system for adults. Utilizing the scikit-learn library in Python, SVM employs
supervised learning to classify individuals into ASD-positive or ASD-negative categories based
on diverse features, including age, gender, ethnicity, and medical history.
The SVM implementation allows for customization, enabling the selection of different kernels
and regularization parameters. Training involves using a labeled dataset, while testing assesses
the model's predictive performance on a separate validation set. Hyperparameter tuning
optimizes SVM performance, and seamless integration within the overall system ensures a
cohesive approach to ASD screening for adults.
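The kernel and regularization choices mentioned above map to `SVC` parameters in scikit-learn. A sketch on synthetic data follows; feature scaling is included because SVMs are sensitive to feature magnitudes, and the parameter values are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the encoded screening features.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# kernel and C correspond to the kernel selection and regularization
# parameter discussed above; these values are illustrative.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X, y)
train_accuracy = model.score(X, y)
```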

2.2.2.3 K-Nearest Neighbors (KNeighbors)

The system shall implement the K-Nearest Neighbors (KNeighbors) algorithm for ASD
screening. KNN determines ASD likelihood by identifying the k-nearest data points in the
feature space. This algorithm accommodates diverse input features like age, gender, ethnicity,
and medical history, ensuring flexibility in our screening process. With customizable parameters

such as the number of neighbors (k), KNN offers adaptability to different datasets. The training
phase involves learning the relationships between features, and testing evaluates the model's
performance on a separate validation set, enriching our ASD screening system with a
proximity-based classification strategy.
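A toy sketch of the proximity-based idea: with k = 3, a query point is assigned the majority class of its three nearest training points. The coordinates below are illustrative, not real screening features.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters of already-encoded feature vectors.
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# n_neighbors is the customizable k described above.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Each query point takes the majority label of its 3 nearest points.
pred = knn.predict([[5, 4], [0, 2]])  # -> [1, 0]
```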

2.2.2.4 Logistic Regression


The system shall implement the Logistic Regression algorithm for ASD screening.
Logistic Regression stands as a key component in our Autism Spectrum Disorder (ASD)
screening system for adults, offering a robust method for binary classification. Implemented
using the scikit-learn library in Python, Logistic Regression models the probability of ASD
occurrence based on input features like age, gender, ethnicity, and medical history. The algorithm
employs a logistic function to make predictions, providing a clear distinction between
ASD-positive and ASD-negative outcomes. Training involves using a labeled dataset, and the
model's predictive accuracy is assessed on a separate validation set during testing. Logistic
Regression contributes to the overall accuracy and interpretability of our ASD screening system.
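A minimal sketch of this probabilistic behavior: `predict_proba` exposes the logistic function's output, and `predict` thresholds it at 0.5. The single feature and labels below are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One synthetic feature (e.g. a raw screening score) with labels
# 1 = ASD-positive, 0 = ASD-negative; values are illustrative.
X = np.array([[1], [2], [3], [7], [8], [9]])
y = np.array([0, 0, 0, 1, 1, 1])

lr = LogisticRegression().fit(X, y)

# predict_proba exposes the modelled probability of ASD occurrence;
# predict thresholds that probability at 0.5.
prob_positive = lr.predict_proba([[8]])[0, 1]   # well above 0.5
label = lr.predict([[2]])[0]                    # -> 0
```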

2.2.3 Model Evaluation


2.2.3.1 Performance Metrics
The system shall evaluate the performance of each algorithm using accuracy, precision, recall,
and F1 score.
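These four metrics are available directly in scikit-learn; the labels below are hypothetical, with 1 denoting ASD-positive.

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Hypothetical outcomes: 1 = ASD-positive, 0 = ASD-negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of P and R
```

Here three true positives, one false positive, and one false negative give 0.75 for all four metrics.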

2.3 Non-Functional Requirements


2.3.1 Performance
The system shall provide timely responses to user inputs, with a maximum processing time of 10
seconds for screening.

2.3.2 Usability

The user interface shall be intuitive and user-friendly, requiring minimal training for users to
navigate and input data.

2.3.3 Security

User data input and screening results shall be securely handled, and no sensitive information
shall be stored.

2.4 Constraints
2.4.1 Dataset Limitations
The system's performance is dependent on the quality and representativeness of the provided
dataset.

2.4.2 Availability of Algorithms


The availability of machine learning libraries for the chosen algorithms is a prerequisite for
implementation.

2.5 Appendix
2.5.1 Some References
- Prof. Fadi Thabtah. "Autism Spectrum Disorder Screening: Machine Learning
Adaptation and DSM-5 Fulfillment."
- UCI Machine Learning Repository. "Autistic Spectrum Disorder Screening Data for
Adult."

2.5.2 Glossary
ASD: Autism Spectrum Disorder
SRS: Software Requirements Specification

Chapter Three

SYSTEM DESIGN


3.1 ER Diagram

Entity-Relationship (ER) diagrams serve as a blueprint for database systems, elucidating the
connections between entities and their attributes. Entities are depicted as rectangles, representing
distinct data objects, while attributes within entities are illustrated as ovals. Diamonds signify
relationships between entities, elucidating how they interact.

Cardinality, often drawn in crow's foot notation, denotes the numerical associations between entities: the "one" side is drawn as a straight line, while the "many" side shows the crow's foot, conveying the one-to-many relationships crucial for database integrity.

ER diagrams empower database designers to conceptualize and communicate intricate data structures, ensuring efficient data organization and retrieval. Mastering these visualizations is paramount for effective database development and management.

Entities:
● Person:
Attributes: PersonID (Primary Key), Age, Gender, Ethnicity, BornWithJaundice,
FamilyMemberWithPDD, Completer (Who is completing the test), CountryOfResidence,
UsedAppBefore, ScreeningMethodType.

● ScreeningTest:
Attributes: TestID (Primary Key), DateConducted, Results.

● Question:
Attributes: QuestionID (Primary Key), QuestionText.

Relationships:
● Person takes Screening Test:
1. Relationship: One-to-Many (One person can take multiple tests; each test is taken
by one person).
2. Foreign Key: PersonID (in ScreeningTest).

● Screening Test includes Question:


1. Relationship: Many-to-Many (One test can include multiple questions; one
question can be part of multiple tests).

2. Associative Entity: TestQuestionAssociation (with attributes such as TestID,
QuestionID, and Answer).

Attributes and Data Types:

● Person: Age (Integer), Gender (String), Ethnicity (String), BornWithJaundice (Boolean), FamilyMemberWithPDD (Boolean), Completer (String), CountryOfResidence (String), UsedAppBefore (Boolean), ScreeningMethodType (String).
● ScreeningTest: DateConducted (Date), Results (String).
● Question: QuestionText (String).
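The entities, keys, and relationships above can be expressed as relational DDL. The sketch below runs the schema through Python's built-in sqlite3 (SQLite has no native Boolean or Date types, so those are approximated); it is an illustration of the design, not part of the screening system itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    PersonID            INTEGER PRIMARY KEY,
    Age                 INTEGER,
    Gender              TEXT,
    Ethnicity           TEXT,
    BornWithJaundice    INTEGER,  -- Boolean stored as 0/1
    FamilyMemberWithPDD INTEGER,  -- Boolean stored as 0/1
    Completer           TEXT,
    CountryOfResidence  TEXT,
    UsedAppBefore       INTEGER,  -- Boolean stored as 0/1
    ScreeningMethodType TEXT
);
CREATE TABLE ScreeningTest (      -- one Person takes many tests
    TestID        INTEGER PRIMARY KEY,
    DateConducted TEXT,           -- Date stored as ISO-8601 text
    Results       TEXT,
    PersonID      INTEGER REFERENCES Person(PersonID)
);
CREATE TABLE Question (
    QuestionID   INTEGER PRIMARY KEY,
    QuestionText TEXT
);
CREATE TABLE TestQuestionAssociation (  -- many-to-many link
    TestID     INTEGER REFERENCES ScreeningTest(TestID),
    QuestionID INTEGER REFERENCES Question(QuestionID),
    Answer     INTEGER,
    PRIMARY KEY (TestID, QuestionID)
);
""")
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
```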

Fig 3.1 ER diagram

2. Use Case Diagram

A Use Case Diagram is a visual representation that illustrates the functional requirements and
interactions between actors and a system. It provides a high-level view of how users (actors)
interact with a system and the system's responses.

Actors:
1. Actors are external entities (e.g., users, systems) that interact with the system.
2. Representation: Depicted as stick figures or labeled boxes outside the system boundary.
3. Purpose: Identify and define the roles of entities interacting with the system.

Use Cases:
1. Use cases represent the functionalities or actions that the system performs in response to
interactions from actors.
2. Representation: Depicted as ovals within the system boundary.
3. Purpose: Capture and visualize the system's behavioral aspects from a user's perspective.

Associations:
1. Associations connect actors with use cases, representing interactions or relationships.
2. Representation: Arrows connecting actors to use cases, indicating the flow of
communication.
3. Purpose: Illustrate how actors initiate and participate in specific system functionalities.

System Boundary:
1. The system boundary encapsulates all use cases and actors, defining the scope of the
system.
2. Representation: A box surrounding use cases and actors.
3. Purpose: Clearly demarcate the system's boundaries, separating internal functionalities
from external interactions.

Actors:

● User/Person: Represents individuals taking the screening test.


● ML Engineer: Analyzes screening results and ML predictions.

Use Cases:

● User/Person:
1. Provide personal data.

33
2. Take the screening test.
● ML Engineer/Developer:
1. View screening results.
2. Analyze ML predictions.

Fig 3.2 Use Case Diagram

Chapter Four

Test Plan

TEST PLAN

Autism Spectrum Disorder (ASD) has a profound impact on individuals' lives, necessitating
accurate and efficient screening methods. This test plan delves into the evaluation of various
machine learning algorithms applied to an ASD screening dataset for adults. The algorithms
under scrutiny include Logistic Regression, Support Vector Machine (SVM), k-Nearest
Neighbors (KNN), XGBoost Classifier, Random Forest, and Multi-Layer Perceptron (MLP).

Objective:
The primary objective is to assess the performance of these algorithms based on key metrics such
as training score, testing score, Mean Absolute Error (MAE), Mean Squared Error (MSE), and
Root Mean Squared Error (RMSE). Through rigorous testing, we aim to determine the efficacy
of each algorithm in accurately predicting ASD.
4.1 Data Preparation:
Input Data: Ensure the dataset is properly preprocessed: handle missing and NaN values,
encode categorical variables, and scale features appropriately.
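These preprocessing steps can be sketched as follows. The toy frame and its column names (age, ethnicity, A1_Score) are illustrative stand-ins, not the exact dataset schema:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy frame standing in for the screening data (illustrative columns).
df = pd.DataFrame({
    'age': [25.0, None, 40.0, 31.0],          # one missing value to handle
    'ethnicity': ['White', '?', 'Asian', 'White'],
    'A1_Score': [1, 0, 1, 1],
})

df['age'] = df['age'].fillna(df['age'].median())            # impute NaN values
df['ethnicity'] = df['ethnicity'].replace({'?': 'Others'})  # clean placeholder
df['ethnicity'] = LabelEncoder().fit_transform(df['ethnicity'])  # encode categorical

df[['age']] = StandardScaler().fit_transform(df[['age']])   # scale numeric feature
print(df.isna().sum().sum())  # 0 — no missing values remain
```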

4.2 Algorithm Configuration:


Parameter Tuning: Identify optimal hyperparameters for each algorithm using techniques like
grid search or random search, tailored to the unique characteristics of ASD screening data.
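A minimal grid-search sketch follows; the synthetic data and the parameter grid shown are illustrative, not the tuned values actually used in the project:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the screening features/labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate hyperparameters for SVM (illustrative grid).
grid = GridSearchCV(
    SVC(kernel='rbf'),
    param_grid={'C': [0.1, 1, 10], 'gamma': ['scale', 0.01]},
    cv=5, scoring='accuracy',
)
grid.fit(X, y)
print(grid.best_params_)
```

`GridSearchCV` exhaustively cross-validates every parameter combination; `RandomizedSearchCV` can be swapped in when the grid is large.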

4.3 Training and Testing:


Splitting Data: Divide the dataset into training and testing sets, ensuring a balanced
representation of ASD and non-ASD cases.
Model Training: Train each algorithm using the training dataset, monitoring convergence and
addressing overfitting concerns.
Model Testing: Apply trained models to the testing dataset, evaluating performance using
accuracy, precision, recall, and F1-score.
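The split/train/test cycle described above can be sketched as follows, with synthetic imbalanced data standing in for the ASD dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Synthetic imbalanced data standing in for ASD / non-ASD cases.
X, y = make_classification(n_samples=400, weights=[0.7, 0.3], random_state=0)

# stratify=y keeps the positive/negative ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
print(accuracy_score(y_test, pred), f1_score(y_test, pred))
```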

4.4 Evaluation Metrics:


4.4.1 Mean Squared Error (MSE):
Mean Squared Error is a commonly used metric to evaluate the performance of a regression
model. It provides a measure of the average squared difference between predicted and actual
values. MSE is calculated by taking the average of the squared differences between each
predicted and actual value.
● MSE penalizes larger errors more significantly due to the squaring of differences.
● It provides a measure of the model's precision in predicting values.

MSE = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²

Where Yᵢ is the actual value and Ŷᵢ is the predicted value.

4.4.2 Mean Absolute Error (MAE):

Mean Absolute Error is another metric for assessing the accuracy of a regression model. It
calculates the average absolute difference between predicted and actual values. Unlike MSE,
MAE does not penalize large errors as significantly and provides a more direct measure of the
model's accuracy.
● MAE is less sensitive to outliers compared to MSE.
● It gives an average measure of how much the predictions deviate from the actual values.

MAE = (1/n) Σᵢ₌₁ⁿ |Yᵢ − Ŷᵢ|

Where Yᵢ is the actual value and Ŷᵢ is the predicted value.

4.4.3 Root Mean Squared Error (RMSE):

Root Mean Squared Error is a modification of MSE that takes the square root of the average
squared differences between predicted and actual values. RMSE is often preferred when the
errors are expected to be in a similar unit as the target variable, providing a more interpretable
metric.
● RMSE shares similar characteristics with MSE but is presented in the original unit of the
target variable, making it easier to comprehend.
● It penalizes larger errors while still providing a measure of the model's accuracy.

RMSE = √MSE

Where RMSE stands for Root Mean Squared Error and MSE for Mean Squared Error.
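The three error metrics can be computed directly from their definitions; a small numeric sketch with illustrative labels:

```python
import numpy as np
from math import sqrt

# Illustrative actual and predicted values (one wrong prediction).
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

mae = np.mean(np.abs(y_true - y_pred))   # average absolute difference
mse = np.mean((y_true - y_pred) ** 2)    # average squared difference
rmse = sqrt(mse)                         # back in the original unit

print(mae, mse, rmse)  # 0.2 0.2 0.447...
```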

4.5 Final Values:

1. Support Vector Machine Algorithm

Fig 4.1: Support Vector Regression score

2. K Nearest Neighbours Algorithm

Fig 4.2: k Nearest Neighbours Regression score


3. Logistic Regression Algorithm

Fig 4.3: Logistic Regression score


4. XGB Classifier Algorithm

Fig 4.4: XGB Classifier Regression score


5. Random Forest Algorithm

Fig 4.5: Random Forest Regression score

6. MLP algorithm

Fig 4.6: MLP Regression score

Chapter Five

BODY OF THESIS

The dataset used in this project is based on the Quantitative Checklist for Autism in adults
screening method devised by Baron-Cohen. A set of 10 questions, listed in the table below, has
been used. The answers to these questions are mapped to binary values as the class type. These
values are assigned during the data collection process by means of answering the questionnaire.
The class value "Yes" is assigned if the questionnaire score is greater than 3, that is, there are
potential ASD traits. Otherwise, the class value "No" is assigned, implying no ASD traits.
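The scoring rule just described can be written directly; a minimal sketch (the answers list stands for the ten binary responses A1–A10):

```python
def classify_screening(answers):
    """Map a list of ten 0/1 answers to the class label.

    The class is "Yes" (potential ASD traits) when the total
    questionnaire score is greater than 3, otherwise "No".
    """
    score = sum(answers)
    return "Yes" if score > 3 else "No"

print(classify_screening([1, 1, 1, 1, 0, 0, 0, 0, 0, 0]))  # score 4 -> "Yes"
print(classify_screening([1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # score 2 -> "No"
```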

Table 5.1 Set of Questions

Dataset Variable    Description

A1     Person responding to you calling his/her name
A2     Ease of getting eye contact from the person
A3     Person pointing to objects he/she wants
A4     Person pointing to draw your attention to his/her interests
A5     If the person shows pretense
A6     Ease of the person to follow where you point/look
A7     If the person wants to comfort someone who is upset
A8     First words
A9     If the person uses basic gestures
A10    If the person daydreams/stares at nothing

Chapter Six

RESULTS

RESULTS AND OBSERVATIONS

The entire system rests on the quality and representativeness of the dataset. In the context of this
ambitious project, the dataset compilation is a meticulous process, carefully crafted to be the
linchpin that shapes the efficacy and reliability of the developed ASD screening tool. The dataset
is not merely a collection of data points but a comprehensive assembly, drawing upon a diverse
pool of information from both individuals with Autism Spectrum Disorder (ASD) and those
without, specifically focusing on adult subjects. This strategic compilation ensures that the
training and evaluation processes of the machine learning algorithms unfold on a rich canvas,
reflecting the complexity and diversity inherent in real-world scenarios.
The assembly of this dataset is characterized by a rigorous curation process, guided by a
commitment to inclusivity and authenticity. Information is collated from a spectrum of sources,
capturing the nuances of adult ASD from various perspectives. The inclusion of both ASD and
non-ASD subjects establishes a balanced and representative foundation, mimicking the
intricacies encountered in clinical settings. This deliberate approach is crucial for the machine
learning algorithms to develop a nuanced understanding of the patterns and features associated
with ASD, enhancing their ability to discriminate between individuals with and without the
disorder.

Accuracy = (TP + TN) / (TP + FP + TN + FN)


Where,
TP = True Positive
TN = True Negative
FP = False Positive
FN = False Negative
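As a quick sketch, the accuracy formula above in code; the confusion-matrix counts used here are illustrative, not the project's actual results:

```python
def accuracy(tp, tn, fp, fn):
    # Accuracy = (TP + TN) / (TP + FP + TN + FN)
    return (tp + tn) / (tp + fp + tn + fn)

# Illustrative confusion-matrix counts.
print(accuracy(tp=50, tn=40, fp=6, fn=4))  # 0.9
```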
We have calculated the accuracy using the above formula; the following table shows the
accuracy obtained for each algorithm.

Table 6.1 Results
Learning Algorithm Accuracy

Multi Layer Perceptron 73.80640 %

Random Forests 95.89874 %

SVM 88.36082 %

KNN 82.87602 %

Logistic Regression 92.45491 %

XGB Classifier 90.19123 %

Chapter Seven

SUMMARY AND CONCLUSION


The project focuses on predicting Autism Spectrum Disorder (ASD) screening results for adults
using machine learning algorithms. This analysis aims to provide valuable insights into the
potential effectiveness of various algorithms for ASD screening.

Dataset Overview:
The dataset comprises binary ASD scores (A1 to A10), demographic features (gender, ethnicity),
and other relevant attributes such as jaundice at birth, autism history, country of residence, and
family relationship. The selection of these features was driven by their potential impact on ASD
screening outcomes based on domain knowledge and preliminary exploratory data analysis
(EDA).

Data Preprocessing:
Data preprocessing steps included handling missing values, encoding categorical variables, and
ensuring the dataset is suitable for machine learning algorithms. The choice of preprocessing
techniques aimed at maintaining the integrity of the dataset and preparing it for effective model
training and evaluation.

Exploratory Data Analysis (EDA):


EDA highlighted patterns in ASD scores, distributions of demographic features, and potential
correlations. The insights gained during EDA informed the subsequent model selection and
evaluation, guiding the identification of features that could significantly influence ASD
screening outcomes.

Model Building and Evaluation

Model Selection:
The choice of algorithms (Logistic Regression, Support Vector Machine (SVM), K-Nearest
Neighbors (KNN), XGBoost Classifier, Random Forest, and Multi-Layer Perceptron (MLP))
was driven by their suitability for binary classification tasks. Logistic Regression and SVM are
well-established algorithms for binary classification, while KNN is known for its simplicity and
effectiveness. XGBoost and Random Forest are ensemble methods known for handling complex
relationships in data, and MLP is a versatile neural network architecture suitable for non-linear
patterns.

Feature Selection:
Features such as ASD scores, demographic information, and relevant medical history were
chosen based on their potential to contribute to ASD screening outcomes. ASD scores are direct
indicators, while demographic features provide contextual information that may influence

screening results. The inclusion of medical history features aims to capture additional factors that
could impact ASD.

Training and Testing:


The dataset was split into training and testing sets to train and evaluate the models, ensuring a
robust assessment of their predictive capabilities.

Model Evaluation:
Each algorithm's performance was evaluated using standard metrics such as accuracy, precision,
recall, and F1 score. The choice of these metrics was motivated by the need to strike a balance
between overall accuracy and the ability to correctly identify positive cases, which is crucial in
medical screening scenarios.
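These four metrics can be obtained together from scikit-learn; a minimal sketch on illustrative true labels and predictions (1 marks an ASD-positive case):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative labels and predictions (1 = ASD positive).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print('Accuracy :', accuracy_score(y_true, y_pred))   # 0.75
print('Precision:', precision_score(y_true, y_pred))  # 0.75
print('Recall   :', recall_score(y_true, y_pred))     # 0.75
print('F1 score :', f1_score(y_true, y_pred))         # 0.75
```

Precision and recall matter most here: a false negative (missed ASD case) carries a different cost from a false positive in a screening setting.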

Visualizations:
Visualizations, including box plots, bar graphs, etc, provided a comprehensive view of the
models' performance, facilitating informed decision-making. These visualizations were chosen to
effectively communicate the trade-offs between true positive rates and false positive rates.

Conclusion

The project demonstrated the effectiveness of machine learning models in ASD screening.
Among the algorithms tested, XGBoost Classifier and Random Forest displayed superior
performance in terms of accuracy and precision.

Feature Importance:
Feature importance analysis revealed key predictors influencing ASD screening outcomes.
Notably, specific ASD scores, age, and family relationships emerged as significant contributors.
The choice of these features was validated by their impact on the models' predictive power.
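Feature importances of this kind can be read directly from a fitted Random Forest; a sketch on synthetic data (the feature names listed are illustrative stand-ins for the dataset's columns):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 5 features, of which 2 are informative.
X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)

forest = RandomForestClassifier(random_state=0).fit(X, y)

names = ['A1_Score', 'A2_Score', 'age', 'jaundice', 'relation']  # illustrative
for name, imp in sorted(zip(names, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f'{name}: {imp:.3f}')
```

The importances sum to 1, so each value is the fraction of the forest's impurity reduction attributable to that feature.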

Practical Implications:
The developed models have practical implications for streamlining ASD screening processes,
potentially aiding healthcare professionals in early and accurate identification of ASD in adults.
The inclusion of specific features ensures the models capture relevant information for actionable
insights.

Limitations and Future Work:


Limitations, such as data constraints and potential biases, were acknowledged. Suggestions for
future work include collecting additional data and refining models for enhanced predictive
accuracy. The continuous exploration of additional features and algorithms may further improve
the models' performance.

FUTURE SCOPE

In the future, this project has the potential for improvement through the incorporation of novel
machine learning algorithms. Additionally, fine-tuning the existing algorithm by adjusting

various parameters crucial to its accuracy could further enhance the project. This optimization
aims to generate more precise models capable of effectively identifying Autism Spectrum
Disorder (ASD) in adults.
Furthermore, the project's advancement could involve the integration of a Deep Learning
neural network. This addition would contribute to uncovering additional concealed information
within the dataset. The advantages of refining existing algorithms or introducing new ones
extend beyond practical applications and can also be valuable for research purposes.
In addition to introducing new predictive techniques, the project will incorporate
advanced and more efficient visualization methods. These enhancements are geared towards
facilitating a clearer and more comprehensive understanding of the data, promoting better
visualization, and fostering in-depth discussions.

REFERENCES

[1] A. A. Abdullah, S. Rijal, and S. R. Dash, “Evaluation on Machine learning Algorithms for
Classification of Autism Spectrum Disorder (ASD),” Journal of Physics, vol. 1372, no. 1, p.
012052, Nov. 2019.

[2] A. N. Krishna, K. C. Srikantaiah, and C. Naveena, Integrated Intelligent Computing,


Communication and Security. Springer, 2018.

[3] “A Review on Predicting Autism Spectrum Disorder(ASD) meltdown using Machine
Learning Algorithms,” IEEE Conference Publication | IEEE Xplore, Nov. 18, 2021.

[4] “A supervised machine learning algorithm for arrhythmia analysis,” IEEE Conference
Publication | IEEE Xplore, 1997.

[5] B. Deepa and K. S. J. Marseline, “Exploration of Autism Spectrum Disorder using


Classification Algorithms,” Procedia Computer Science, vol. 165, pp. 143–150, Jan. 2019.

[6] C. Küpper et al., “Identifying predictive features of autism spectrum disorders in a clinical
sample of adolescents and adults using machine learning,” Scientific Reports, vol. 10, no. 1, Mar.
2020.

[7] C. M. Bishop, Pattern recognition and machine learning. Springer, 2016.

[8] D. Nguyen and J. Patrick, “Supervised machine learning and active learning in classification
of radiology reports,” Journal of the American Medical Informatics Association, vol. 21, no. 5,
pp. 893–901, Sep. 2014.

[9] D. K. S. R. et al., "Machine Learning based novel Autism Spectrum Disorder Screening,"
TURCOMAT, vol. 12, no. 3, pp. 4866–4879, May 2021.

[10] F. Thabtah and D. Peebles, “A new machine learning model based on induction of rules for
autism detection,” Health Informatics Journal, vol. 26, no. 1, pp. 264–286, Jan. 2019.

[11] H. Abdi, L. J. Williams, and D. Valentin, “Multiple factor analysis: principal component
analysis for multitable and multiblock data sets,” WIREs Computational Statistics, vol. 5, no. 2,
pp. 149–179, Feb. 2013.

[12] H. Bhavsar and A. Ganatra, “A Comparative Study of Training Algorithms for Supervised
Machine Learning,” International Journal of Soft Computing and Engineering (IJSCE), Jan.
2012.

[13] K. Hyde et al., “Applications of Supervised Machine Learning in Autism Spectrum


Disorder Research: a Review,” Review Journal of Autism and Developmental Disorders, vol. 6,
no. 2, pp. 128–146, Feb. 2019.

[14] K. S. Oma, P. Mondal, N. S. Khan, Md. R. K. Rizvi, and M. N. Islam, “A Machine Learning
Approach to Predict Autism Spectrum Disorder,” IEEE, Feb. 2019.

[15] M. Alteneiji, “Autism Spectrum Disorder Diagnosis using Optimal Machine Learning
Methods,” 2020.

[16] Md. M. Rahman, O. L. Usman, R. C. Muniyandi, S. Sahran, S. Mohamed, and R. A. Razak,
“A review of Machine learning Methods of feature selection and Classification for autism
Spectrum Disorder,” Brain Sciences, vol. 10, no. 12, p. 949, Dec. 2020.

[17] M. J. Maenner, M. Yeargin‐Allsopp, K. Van Naarden Braun, D. Christensen, and L. A.


Schieve, “Development of a machine learning algorithm for the surveillance of autism spectrum
disorder,” PLOS ONE, vol. 11, no. 12, p. e0168224, Dec. 2016.

[18] M. N. Murty and R. Raghava, Support vector machines and perceptrons: Learning,
Optimization, Classification, and Application to Social Networks. Springer, 2016.

[19] A. C. Müller and S. Guido, Introduction to Machine Learning with Python. O'Reilly Media,
2016.

[20] M. W. Berry, A. Mohamed, and B. W. Yap, Supervised and unsupervised learning for data
science. Springer Nature, 2019.

[21] N. Kühl, R. Hirt, L. Baier, B. Schmitz, and G. Satzger, “How to conduct rigorous supervised
machine learning in information Systems research: The Supervised Machine Learning Report
Card,” Communications of the Association for Information Systems, vol. 48, no. 1, pp. 589–615,
Jan. 2021.

[22] S. H. Lee, M. J. Maenner, and C. M. Heilig, “A comparison of machine learning algorithms


for the surveillance of autism spectrum disorder,” PLOS ONE, vol. 14, no. 9, p. e0222907, Sep.
2019.

[23] S. J. Moon, J. Hwang, R. K. Kana, J. Torous, and J. W. Kim, “Accuracy of Machine


Learning Algorithms for the diagnosis of Autism Spectrum Disorder: Systematic Review and
Meta-Analysis of Brain Magnetic Resonance Imaging Studies,” JMIR Mental Health, vol. 6, no.
12, p. e14108, Dec. 2019.

[24] S. Raj and S. Masood, “Analysis and detection of autism spectrum disorder using machine
learning techniques,” Procedia Computer Science, vol. 167, pp. 994–1004, Jan. 2020.

[25] S. R, “A machine learning way to classify autism spectrum Disorder,” Learning &
Technology Library (LearnTechLib), Mar. 30, 2021.

[26] S. Raschka, Python Machine Learning. Packt Publishing Ltd, 2015.

[27] S. Raschka, Y. Liu, V. Mirjalili, and D. Dzhulgakov, Machine Learning with PyTorch and
Scikit-Learn: Develop machine learning and deep learning models with Python. Packt
Publishing Ltd, 2022.

[28] S. Uddin, A. Khan, E. Hossain, and M. A. Moni, “Comparing different supervised machine
learning algorithms for disease prediction,” BMC Medical Informatics and Decision Making, vol.
19, no. 1, Dec. 2019.

[29] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: Data Mining,
Inference, and Prediction. Springer Science & Business Media, 2013.

[30] V. Nasteski, “An overview of the supervised machine learning methods,” Horizonti. Serija
B. Prirodno-matematički, Tehničko-tehnološki, Biotehnički, Medicinski Nauki I Zdravstvo, vol. 4,
pp. 51–62, Dec. 2017.

APPENDIX A

SCREEN SHOTS


Fig A-1 First Five rows of the dataset

Fig A-2 value_count of each unique value in the column ethnicity

Fig A-3 value_count of each unique value in the column relation

Fig A-4 Pie chart for the number of data for each target

Fig A-5 Plot where the scores indicate no. of yes for the set of 10 questions asked

Fig A-6 Plot where the count indicates no. of males and females

Fig A-7 Plot where count indicates no. of people from different ethnicities

Fig A-8 Plot with 0 for people having autism and 1 for people not having autism

Fig A-9 plots for different countries given in the dataset

Fig A-10 pair plot for A1-A10

Fig A-11 heat map for correlation matrix of numeric columns

Fig A-12 Count plot where count indicates number of cases for each age group

Fig A-13 Comparison between scores and number of positive and negative cases

Fig A-14 Normal distribution of the age values after log transformations

APPENDIX B

SOURCE CODE

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn import metrics
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import RandomOverSampler

import warnings
warnings.filterwarnings('ignore')

from google.colab import files


uploaded = files.upload()

df = pd.read_csv('new_excel.csv')
print(df.head())

df['ethnicity'].value_counts()

df['relation'].value_counts()

df = df.replace({'yes':1, 'no':0, '?':'Others', 'others':'Others'})

plt.pie(df['Class/ASD'].value_counts().values, autopct='%1.1f%%')
plt.show()

ints = []
objects = []
floats = []
for col in df.columns:
    if df[col].dtype == int:
        ints.append(col)
    elif df[col].dtype == object:
        objects.append(col)
    else:
        floats.append(col)

# Results: number of 'yes' answers (score 1) for each of the 10 questions

scores_df = df.loc[:, 'A1_Score':'A10_Score']
scores_df.sum().plot(kind='bar', color='black')
plt.title('Distribution of Scores (A1 to A10)')
plt.xlabel('Question Number')
plt.ylabel('Total Score')
plt.show()

#Gender
df['gender'].value_counts().plot(kind='bar', color='black')
plt.title('Distribution of Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()

#ethnicity graph
df['ethnicity'].value_counts().plot(kind='bar', color='black')
plt.title('Distribution of Ethnicity')
plt.xlabel('Ethnicity')
plt.ylabel('Count')
plt.show()

# jaundice at birth
df['jundice'].value_counts().plot(kind='bar', color='black')
plt.title('Distribution of Jundice')
plt.xlabel('Jundice')
plt.ylabel('Count')
plt.show()

# autism result
df['austim'].value_counts().plot(kind='bar', color='black')
plt.title('Distribution of Autism')
plt.xlabel('Autism')
plt.ylabel('Count')
plt.show()

#country result

df['contry_of_res'].value_counts().plot(kind='bar', color='black')
plt.title('Distribution of Country of Residence')
plt.xlabel('Country of Residence')
plt.ylabel('Count')
plt.show()

# pair plot for A1-A10


scores_df = df.loc[:, 'A1_Score':'A10_Score']
scores_df['Class/ASD'] = df['Class/ASD']
sns.pairplot(scores_df, hue='Class/ASD', palette='gray')
plt.title('Pair Plot of Scores')
plt.show()

# heat map for correlation matrix (seaborn already imported above)

# Selecting numeric columns for correlation matrix


numeric_columns = df.select_dtypes(include=['int64']).columns
correlation_matrix = df[numeric_columns].corr()

sns.heatmap(correlation_matrix, cmap='Greys', annot=True, fmt='.2f')


plt.title('Correlation Matrix Heatmap')
plt.show()

df = df[df['result']>-5]
df.shape

# This function makes groups by taking
# the age as a parameter
def convertAge(age):
    if age < 4:
        return 'Toddler'
    elif age < 12:
        return 'Kid'
    elif age < 18:
        return 'Teenager'
    elif age < 40:
        return 'Young'
    else:
        return 'Senior'

df['ageGroup'] = df['age'].apply(convertAge)

sns.countplot(x=df['ageGroup'], hue=df['Class/ASD'])
plt.show()

def add_feature(data):
    # Creating a column with all values zero
    data['sum_score'] = 0
    for col in data.loc[:, 'A1_Score':'A10_Score'].columns:
        # Updating the 'sum_score' value with scores
        # from A1 to A10
        data['sum_score'] += data[col]
    # Creating a combined indicator from the below three columns
    data['ind'] = data['austim'] + data['used_app_before'] + data['jundice']
    return data

df = add_feature(df)

sns.countplot(x=df['sum_score'], hue=df['Class/ASD'])
plt.show()

# Applying log transformations to remove the skewness of the data.


df['age'] = df['age'].apply(lambda x: np.log(x))

# Assuming df is your DataFrame


sns.histplot(df['age'], color='gray', kde=True)
plt.title('Distribution of Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

def encode_labels(data):
    for col in data.columns:
        # Here we check if the datatype
        # is object; if so, we encode it
        if data[col].dtype == 'object':
            le = LabelEncoder()
            data[col] = le.fit_transform(data[col])
    return data

df = encode_labels(df)

# Making a heatmap to visualize the correlation matrix
plt.figure(figsize=(10, 10))
sns.heatmap(df.corr() > 0.8, annot=True, cbar=False)
plt.show()

removal = ['age_desc', 'used_app_before', 'austim']


features = df.drop(removal + ['Class/ASD'], axis=1)
target = df['Class/ASD']

# Split the data into training and validation sets


X_train, X_val, Y_train, Y_val = train_test_split(features, target, test_size=0.2, random_state=10, stratify=target)

# Apply oversampling to balance the data


oversampler = RandomOverSampler(sampling_strategy='minority', random_state=0)
X_train_resampled, Y_train_resampled = oversampler.fit_resample(X_train, Y_train)

# Display the shapes of the resampled data


print("Shape of X_train_resampled:", X_train_resampled.shape)
print("Shape of Y_train_resampled:", Y_train_resampled.shape)

# Normalizing the features for stable and fast training.
# Using the oversampled training data as X / Y from here on
# (assumption: the resampled set is what the later cells train on).
X, Y = X_train_resampled, Y_train_resampled

scaler = StandardScaler()
X = scaler.fit_transform(X)
X_val = scaler.transform(X_val)

# Check for NaN values in X
nan_columns = np.isnan(X).any(axis=0)
nan_columns = np.where(nan_columns)[0].tolist()

print("Columns with NaN values:", nan_columns)

# Assuming X is your NumPy array


column_means = np.nanmean(X, axis=0) # Compute mean of each column ignoring NaN values

# Replace NaN values with column means


X = np.where(np.isnan(X), column_means, X)

from sklearn.linear_model import LogisticRegression


from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn import metrics # Make sure to import metrics

# Assuming X, Y, X_val, Y_val are defined and represent your data

# Scaling the features for stable and fast training.


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_val_scaled = scaler.transform(X_val)

# List of models
models = [LogisticRegression(), XGBClassifier(), SVC(kernel='rbf')]

# Training and evaluation loop
for model in models:
    model.fit(X_scaled, Y)

    print(f'{model} : ')
    print('Training AUC-ROC Score : ', metrics.roc_auc_score(Y, model.predict(X_scaled)))
    print('Validation AUC-ROC Score : ', metrics.roc_auc_score(Y_val, model.predict(X_val_scaled)))
    print()

from sklearn.neural_network import MLPClassifier


from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Assuming X, Y, X_val, Y_val are defined and represent your data

# Split the data into training and validation sets
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=10, stratify=Y)

# Scaling the features for stable and fast training.


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# List of models
models = [
LogisticRegression(),
XGBClassifier(),
SVC(kernel='rbf'),
MLPClassifier(random_state=42),
RandomForestClassifier(random_state=42),
KNeighborsClassifier()
]

# Training and evaluation loop
for model in models:
    model.fit(X_train_scaled, Y_train)

    print(f'{model} : ')
    print('Training AUC-ROC Score : ', metrics.roc_auc_score(Y_train, model.predict(X_train_scaled)))
    print('Validation AUC-ROC Score : ', metrics.roc_auc_score(Y_val, model.predict(X_val_scaled)))
    print()

from sklearn.metrics import mean_absolute_error, mean_squared_error


from math import sqrt

# Assuming X_train, X_test, Y_train, Y_test are defined and represent your training and testing data

# Training the models and reporting scores and error metrics
for model in models:
    model.fit(X_train_scaled, Y_train)

    # Training and testing scores
    training_score = model.score(X_train_scaled, Y_train)
    testing_score = model.score(X_val_scaled, Y_val)

    # Mean Absolute Error (MAE)
    y_pred_train = model.predict(X_train_scaled)
    y_pred_test = model.predict(X_val_scaled)
    mae_train = mean_absolute_error(Y_train, y_pred_train)
    mae_test = mean_absolute_error(Y_val, y_pred_test)

    # Mean Squared Error (MSE)
    mse_train = mean_squared_error(Y_train, y_pred_train)
    mse_test = mean_squared_error(Y_val, y_pred_test)

    # Root Mean Squared Error (RMSE)
    rmse_train = sqrt(mse_train)
    rmse_test = sqrt(mse_test)

    # Print results
    print(f'{model} : ')
    print('Training Score:', training_score)
    print('Testing Score:', testing_score)
    print('Mean Absolute Error (MAE) - Training:', mae_train, ' Testing:', mae_test)
    print('Mean Squared Error (MSE) - Training:', mse_train, ' Testing:', mse_test)
    print('Root Mean Squared Error (RMSE) - Training:', rmse_train, ' Testing:', rmse_test)
    print()

from sklearn.linear_model import LogisticRegression


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

# Assuming X, Y, X_val, Y_val are defined and represent your data

# Split the data into training and validation sets


X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.2, random_state=10, stratify=Y)

# Scaling the features for stable and fast training.


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Logistic Regression model


logreg_model = LogisticRegression()
logreg_model.fit(X_train_scaled, Y_train)

# Prediction on validation set


y_pred_val = logreg_model.predict(X_val_scaled)

# Performance metrics
accuracy = metrics.accuracy_score(Y_val, y_pred_val)
sensitivity = metrics.recall_score(Y_val, y_pred_val)
roc_auc = metrics.roc_auc_score(Y_val, logreg_model.predict_proba(X_val_scaled)[:, 1])

# Print results
print('Logistic Regression : ')
print('Accuracy:', accuracy)
print('Sensitivity:', sensitivity)
print('AUC:', roc_auc)
