
Received 13 December 2022, accepted 25 December 2022, date of publication 26 December 2022, date of current version 16 February 2023.


Digital Object Identifier 10.1109/ACCESS.2022.3232490

A Machine Learning Framework for Early-Stage Detection of Autism Spectrum Disorders
S. M. MAHEDY HASAN1 , MD PALASH UDDIN 2,3 , (Member, IEEE),
MD AL MAMUN1 , (Senior Member, IEEE), MUHAMMAD IMRAN SHARIF4 ,
ANWAAR ULHAQ 5 , AND GOVIND KRISHNAMOORTHY6
1 Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
2 Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur 5200, Bangladesh
3 School of Information Technology, Deakin University, Geelong, VIC 3220, Australia
4 Department of Computer Science, COMSATS University Islamabad, Wah Campus, Punjab 47040, Pakistan
5 School of Computing, Mathematics and Engineering, Charles Sturt University, Port Macquarie, NSW 2444, Australia
6 School of Psychology and Wellbeing, University of Southern Queensland, Ipswich, QLD 4305, Australia

Corresponding author: Anwaar Ulhaq ([email protected])


This work was supported by the Regional Australia Mental Health Research and Training Institute, Manna Institute, NSW, Australia, under
Grant 0000103935.

ABSTRACT Autism Spectrum Disorder (ASD) is a type of neurodevelopmental disorder that affects patients' everyday lives. Though the disorder is considered hard to cure completely, its severity can be mitigated through early intervention. In this paper, we propose an effective framework for
the evaluation of various Machine Learning (ML) techniques for the early detection of ASD. The proposed
framework employs four different Feature Scaling (FS) strategies i.e., Quantile Transformer (QT), Power
Transformer (PT), Normalizer, and Max Abs Scaler (MAS). Then, the feature-scaled datasets are classified
through eight simple but effective ML algorithms like Ada Boost (AB), Random Forest (RF), Decision Tree
(DT), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Logistic Regression (LR), Support Vector
Machine (SVM) and Linear Discriminant Analysis (LDA). Our experiments are performed on four standard
ASD datasets (Toddlers, Adolescents, Children, and Adults). Comparing the classification outcomes using
various statistical evaluation measures (Accuracy, Receiver Operating Characteristic: ROC curve, F1-score,
Precision, Recall, Matthews Correlation Coefficient: MCC, Kappa score, and Log loss), the best-performing
classification methods, and the best FS techniques for each ASD dataset are identified. After analyzing the
experimental outcomes of different classifiers on feature-scaled ASD datasets, it is found that AB predicted
ASD with the highest accuracy of 99.25%, and 97.95% for Toddlers and Children, respectively and LDA
predicted ASD with the highest accuracy of 97.12% and 99.03% for Adolescents and Adults datasets,
respectively. These highest accuracies are achieved while scaling Toddlers and Children with normalizer
FS and Adolescents and Adults with the QT FS method. Afterward, the ASD risk factors are calculated, and
the most important attributes are ranked according to their importance values using four different Feature
Selection Techniques (FSTs) i.e., Info Gain Attribute Evaluator (IGAE), Gain Ratio Attribute Evaluator
(GRAE), Relief F Attribute Evaluator (RFAE), and Correlation Attribute Evaluator (CAE). These detailed
experimental evaluations indicate that proper finetuning of the ML methods can play an essential role in
predicting ASD in people of different ages. We argue that the detailed feature importance analysis in this
paper will guide the decision-making of healthcare practitioners while screening ASD cases. The proposed
framework has achieved promising results compared to existing approaches for the early detection of ASD.

INDEX TERMS Autism spectrum disorder, machine learning, classification, feature scaling, feature
selection technique.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

The associate editor coordinating the review of this manuscript and approving it for publication was Santosh Kumar.

I. INTRODUCTION
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition associated with brain development that starts at an early

stage of life, impacting a person's social relationships and interactions [1], [2]. ASD involves restricted and repetitive behavioral patterns, and the word spectrum encompasses a wide range of symptoms and intensities [3], [4], [5]. Even though there is no cure for ASD, early intervention and proper medical care can make a significant difference in a child's development by improving the child's behaviors and communication skills [6], [7], [8]. Even so, the identification and diagnosis of ASD using traditional behavioral science are difficult and complex. Usually, autism is most commonly diagnosed at about two years of age, though it can also be diagnosed later based on its severity [9], [10], [11]. A variety of screening strategies are available to detect ASD as quickly as possible, but these diagnostic procedures are not always widely used in practice until there is already a serious risk of developing ASD. The authors in [12] provided a short and observable checklist that can be applied at different stages of a person's life, including toddlers, children, teens, and adults. Subsequently, the authors in [13] constructed the ASDTests mobile app for ASD identification as fast as possible, based on a range of questionnaire surveys and the Q-CHAT and AQ-10 methods. Consequently, they also created an open-source dataset utilizing the mobile phone app information and published the datasets on the publicly accessible University of California, Irvine (UCI) machine learning repository and Kaggle for further development in this area of study.

Over the past few years, several studies have been conducted incorporating various Machine Learning (ML) approaches to analyze and diagnose ASD as well as other diseases, such as diabetes, stroke, and heart failure, as quickly as possible [14], [15], [16]. The authors in [17] analyzed the ASD attributes utilizing Rule-based ML (RML) techniques and confirmed that RML helps classification models boost classification accuracy. The authors in [18] combined the Random Forest (RF) and Iterative Dichotomiser 3 (ID3) algorithms and produced predictive models for children, adolescents, and adults. The authors in [19] introduced a new evaluation tool, integrating the ADI-R and ADOS ML methods, and implemented different attribute encoding approaches to resolve data insufficiency, non-linearity, and inconsistency issues. Another study conducted by the authors in [13] demonstrates feature-to-class and feature-to-feature correlation values utilizing cognitive computing and implemented Support Vector Machines (SVM), Decision Tree (DT), and Logistic Regression (LR) as ASD diagnostic and prognosis classifiers [17]. In addition, the authors in [20] explored typically developing (TD) (N = 19) and ASD (N = 11) cases, in which a correlation-based attribute selection was used to determine the importance of the attributes. In 2015, the authors in [21] investigated ASD and TD children and recognized 15 preschool ASDs using only seven features. Besides that, they conveyed that cluster analysis might effectively analyze complex patterns to predict ASD phenotype and diversity. The authors in [22] contrasted the classifier accuracy of K-Nearest Neighbors (KNN), LR, Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Naive Bayes (NB), and SVM for adult ASD prediction. In [23], an ML model via induction of rules was proposed for autism detection, which includes testing on only one dataset and limited comparison. The authors in [17] used LR analysis to build an ML autism classification approach, which also lacks extensive validation and comparison. The authors in [24] scrutinized autism data and observed that 5 of the overall 65 characteristics are sufficient to distinguish ASD from attention deficit hyperactivity disorder (ADHD). In 2019, the authors in [25] constructed an RF-based model for the prediction of ASD utilizing behavioral features. In addition, the authors in [26] used LDA and KNN methods to identify ASD children between the ages of 4 and 11 years. In 2018, the authors in [27] suggested an ASD model based on the RF classifier for children between the ages of 4 and 11. The authors in [28] evaluated the predictive performance of the Deep Neural Network (DNN) in the diagnosis of ASD utilizing two distinct Adult datasets. In 2019, the authors in [18] constructed a smartphone application programming interface on RF-CART and RF-ID3 for the diagnosis of ASD at all ages. The authors in [29] assessed the performance of multiple SVM kernels in classifying ASD data for children and found that the polynomial kernel worked much better. The authors in [1] performed several feature selection techniques on four ASD datasets and found that the SVM classifier performed better for the RIPPER-based toddler subset, the correlation-based feature selection (CFS) and Boruta CFS intersect (BIC) method-based child subsets, and the CFS-based adult subset. Furthermore, they applied the Shapley Additive Explanations (SHAP) method to the various feature subsets that achieved the highest accuracy and ranked their features based on performance. The authors in [30] carried out ensemble ML approaches of Fuzzy K-Nearest Neighbor (FKNN), Kernel Support Vector Machines (KSVM), Fuzzy Convolution Neural Network (FCNN), and Random Forest (RF) to classify Parkinson's disease and ASD. Finally, the classification results are verified utilizing Leave-One-Person-Out Cross Validation (LOPOCV). The authors in [31] performed an evolutionary cultural optimization algorithm to optimize the weights of Artificial Neural Networks (ANN) in classifying three benchmark autism screening datasets of Toddlers, Children, and Adults. The authors in [32] performed an experimental analysis using 16 different ML models; among them, four bio-inspired algorithms, namely Gray Wolf Optimization (GWO), the Flower Pollination Algorithm (FPA), the Bat Algorithm (BA), and Artificial Bee Colony (ABC), were employed for optimizing the wrapper feature selection method in order to select the most informative features and to increase the accuracy of the classification models on genetic and personal characteristics datasets. Another study conducted by the authors in [33] combined three benchmark datasets (Toddlers, Adolescents, and Adults) and applied a Light Gradient Boosting Machine (LGBM) classifier to classify ASD. The authors in [34] utilized Extreme Learning Machines (ELM) and Random Vector Functional Link (RVFL) generalization


techniques to classify the Toddlers, Adolescents, and Adults datasets.

TABLE 1. Datasets description.

This study gathers four standard ASD datasets (Toddlers, Children, Adolescents, and Adults) and initially preprocesses the datasets (handling of missing values and encoding). Then, four Feature Scaling (FS) methods, including Quantile Transformer (QT), Power Transformer (PT), Normalizer, and Max Abs Scaler (MAS), are employed to map the datasets into an appropriate format for further assessment. Thereafter, the feature-scaled datasets are classified by eight simple but effective classification approaches (AB, RF, DT, KNN, Gaussian Naive Bayes (GNB), LR, SVM, and LDA), and the best classification models are identified. Meanwhile, we also explore the significance of the FS methods on each dataset by analyzing the experimental outcomes of the transformed datasets. Afterward, four Feature Selection Techniques (FSTs), i.e., Info Gain Attribute Evaluator (IGAE), Gain Ratio Attribute Evaluator (GRAE), Relief F Attribute Evaluator (RFAE), and Correlation Attribute Evaluator (CAE), are implemented to calculate the risk factors of ASD and rank the most important features of the feature-scaled Toddlers, Children, Adolescents, and Adults datasets. Accordingly, this study suggests that ML methods can be applied to help identify the most significant features for ASD detection based on the FST-based feature importance analysis, which will help physicians diagnose ASD cases accurately. Notice that the work presented in [35] may seem somewhat similar to ours. However, the notable differences are as follows. (i) We consider four promising FS methods (QT, PT, Normalizer, and MAS), whereas the three FS methods (Logarithmic, ZScore, and Sine) used in [35] are obsolete nowadays. (ii) After applying each FS method, we find the best FST from a list of IGAE, GRAE, RFAE, and CAE for each dataset to train the ML models, whereas [35] did not consider any such tuning of the FST methods. (iii) We consider eight simple but effective ML models for the prediction, whereas the ML models used in [35] are archaic in this domain. (iv) Finally, we compare more recent works with our proposed model, in contrast to [35]. To this end, the key contributions of this paper are summarized as follows.
• We develop a generalized ML framework for early-stage detection of ASD in people of different ages.
• We solve the imbalanced class distribution issue through Random Over Sampler to avoid the ML models being biased towards the majority class samples.
• We select the best Feature Scaling (FS) method to map each individual ASD dataset's feature values to improve the prediction performance.
• We investigate eight simple but effective ML approaches on each feature-scaled ASD dataset, analyze their classification performance, and identify the best FS technique for each ASD dataset.
• Furthermore, we calculate and analyze the feature importance values on each best feature-scaled ASD dataset based on four FSTs to identify the risk factors for ASD prediction.
• Finally, we perform extensive experiments and comparisons using four different standard ASD datasets.
The remaining part of the paper is organized as follows. Section 2 demonstrates the proposed research methodology and the materials used in the study. Section 3 analyzes the detailed experimental outcomes, while Section 4 discusses the comparative results of related works in this domain. Finally, Section 5 summarizes and concludes the observations and findings.

II. MATERIALS & METHODS
A. DATASET DESCRIPTION
We collect the four ASD datasets (Toddlers, Adolescents, Children, and Adults) from the publicly available repositories Kaggle and UCI ML [36], [37], [38], [39]. The authors in [13] created the ASDTests smartphone app for Toddlers, Children, Adolescents, and Adults ASD screening using QCHAT-10 and AQ-10. The application computes a score of 0 to 10 for every individual, where a final score of at least 6 out of 10 indicates a positive ASD screening. In addition, the ASD data obtained through the ASDTests app has been released as open-source databases in order to facilitate research in this area. The detailed descriptions of the Toddlers, Children, Adolescents, and Adults ASD datasets are given in Table 1 and Table 2.

B. METHOD OVERVIEW
This research aims to create an effective prediction model using different types of ML methods to detect autism in people of different ages. First of all, the datasets are collected, and then the preprocessing is accomplished via missing value imputation, feature encoding, and oversampling. The Mean Value Imputation (MVI) method is used to impute the missing values of the datasets. Then, the categorical feature values are converted to their equivalent numerical values using the One Hot Encoding (OHE) technique. Table 1 shows that all four datasets used in this work have an imbalanced class distribution problem. As such, a Random Over Sampler strategy is used to alleviate this issue. After completing the initial preprocessing, the datasets' feature values are scaled using four different FS techniques, i.e., QT, PT, Normalizer, and MAS (see their detailed operations in Table 3). The feature-scaled datasets are then classified using eight different ML classification techniques, i.e., AB, RF, DT, KNN, GNB, LR, SVM, and LDA. Comparing the classification outcomes of the classifiers on different feature-scaled


FIGURE 1. Sequential workflow for detecting ASD at an early stage.

TABLE 2. Feature description of the ASD datasets.

ASD datasets, the best-performing classification methods and the best FS techniques for each ASD dataset are identified. After those analyses, the ASD risk factors are calculated, and the most important attributes are ranked according to their importance values using four different FSTs, i.e., IGAE, GRAE, RFAE, and CAE (see the detailed operations in Table 4). To this end, Fig. 1 represents the proposed research pipeline to analyze the ASD datasets and calculate the risk factors that are most responsible for ASD detection.

C. MACHINE LEARNING METHOD
1) ADA BOOST (AB)
AB is a tree-based ensemble classifier that incorporates many weak classifiers to reduce misclassification errors [41].
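As a concrete sketch of the scaling-plus-classification pipeline evaluated under 10-fold cross-validation, the combination of one FS method with one classifier can be expressed with scikit-learn, the library used for the experiments in Section III. The synthetic data, the pairing of Normalizer with AB, and all parameter values below are illustrative assumptions, not the paper's datasets or tuned settings:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a preprocessed (imputed, encoded, oversampled)
# ASD dataset -- illustrative only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# One FS method (Normalizer) feeding one classifier (AB), evaluated with
# the same 10-fold cross-validation protocol used in the paper.
model = make_pipeline(Normalizer(),
                      AdaBoostClassifier(n_estimators=50, random_state=0))
scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
print(round(scores.mean(), 3))
```

Swapping Normalizer for QuantileTransformer, PowerTransformer, or MaxAbsScaler, and AB for any of the other seven classifiers, reproduces the grid of FS-classifier combinations the framework compares.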


TABLE 3. Detailed description of the different FS methods [40].

TABLE 4. Detailed description of the different FST methods [40].

It selects the training set and iteratively assigns the weights depending on the previous training precision for retraining the algorithm. In order to train any weak classifier, an arbitrary subset of the full training set is used, and AB assigns weights to each instance and classifier. The following equation defines the combination of the several weak classifiers:

H(x) = sign( Σ_{t=1}^{T} αt · ht(x) )   (9)

where H(x) defines the output of the final model through combining the weak classifiers, ht(x) represents the output of classifier t for input x, and αt specifies the weight assigned to that classifier. αt is calculated as follows:

αt = 0.5 · ln( (1 − E) / E )   (10)

where E denotes the error rate. The following equation is utilized to update the weight of each training sample-label pair (xi, yi):

Dt+1(i) = Dt(i) · exp(−αt · yi · ht(xi)) / Zt   (11)

where Dt+1 denotes the updated weight, Dt specifies the weight at the previous iteration, and Zt is the sum of all updated weights (a normalization factor).
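One boosting round of Eqs. (10)-(11) can be sketched in a few lines of plain Python; the function name and the toy ±1 predictions/labels below are illustrative, not from the paper:

```python
import math

def adaboost_round(weights, preds, labels):
    """One AB round: weighted error E, classifier weight alpha_t (Eq. 10),
    and the renormalized sample weights (Eq. 11)."""
    # Weighted error rate E of the weak classifier on this round.
    E = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
    alpha = 0.5 * math.log((1 - E) / E)                 # Eq. (10)
    # Misclassified samples (y * h(x) = -1) gain weight, the rest lose it.
    updated = [w * math.exp(-alpha * y * p)
               for w, p, y in zip(weights, preds, labels)]
    Z = sum(updated)                                    # normalizer Z_t
    return alpha, [w / Z for w in updated]              # Eq. (11)
```

For example, starting from uniform weights [0.25] * 4 with one misclassified sample, the misclassified sample's normalized weight rises to 0.5, so the next weak learner concentrates on it.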


2) RANDOM FOREST (RF)
RF is a decision tree-based ensemble classification method and follows a divide-and-conquer technique on the input dataset to create multiple decision-making trees (known as the forest) [42]. It works in two phases. At first, it creates a forest by combining 'N' decision trees, and in the second phase, it makes predictions with each tree generated in the first phase. The working process of the RF algorithm is illustrated below:
1) Select random samples from the training dataset.
2) Construct a decision tree for each training sample.
3) Select the value of 'N' to define the number of decision trees.
4) Repeat Steps 1 and 2.
5) For each test sample, find the predictions of each decision tree, and assign the test sample a class value based on majority voting.

3) DECISION TREE (DT)
DT follows a top-down approach to build a predictive model for class values by inducing decision-making rules from the training data [43]. This research utilized the information gain method to select the best attribute. Assume Pi is the probability that an instance xi ∈ D belongs to a class Ci, estimated by |Ci,D|/|D|. To classify instances in the dataset D, the required information is calculated by the following equation:

Info(D) = − Σ_{i=1}^{m} Pi · log2(Pi)   (12)

where Info(D) is the average amount of information needed to identify the class Ci of an instance xi ∈ D, and the objective of DT is to repeatedly divide D into sub-datasets D1, D2, ..., Dn. The following equation estimates InfoA(D):

InfoA(D) = Σ_{j=1}^{v} ( |Dj| / |D| ) · Info(Dj)   (13)

Finally, the following equation calculates the information gain value:

Gain(A) = Info(D) − InfoA(D)   (14)

4) K-NEAREST NEIGHBORS (KNN)
KNN classifies the test data by utilizing the training data directly, based on the value K indicating the number of nearest neighbors [43]. For each instance, it computes the distances to all the training instances and sorts them. Then, a majority voting technique is employed to assign the final class label to the test data. This research applies the Euclidean distance to calculate the distances among instances. The following equation represents the Euclidean distance calculation:

De = sqrt( Σ_{i=1}^{n} (Xi − Yi)² )   (15)

where De indicates the Euclidean distance, Xi denotes the testing sample values, Yi specifies the training sample values, and n represents the total number of sample values.

5) GAUSSIAN NAÏVE BAYES (GNB)
The GNB algorithm assumes a normal distribution and is used for classification when all the data values of a dataset are numeric [43]. To compute the probability values of any instance with respect to a class value, the mean and standard deviation are calculated for each attribute of the dataset. Consequently, for testing, when any instance arrives, it utilizes the mean and standard deviation values to calculate the probability of the test instance. The necessary equations are given below:

µ = (1/n) Σ_{i=1}^{n} xi   (16)

δ = sqrt( (1/(n−1)) Σ_{i=1}^{n} (xi − µ)² )   (17)

f(x) = ( 1 / (sqrt(2π) · δ) ) · e^{−(x−µ)² / (2δ²)}   (18)

where µ indicates the mean, δ represents the standard deviation, xi denotes the samples in a particular column, n indicates the total number of samples, and f(x) presents the conditional probability of the class value.

6) LOGISTIC REGRESSION (LR)
Based on a given dataset of independent variables, logistic regression calculates the likelihood that an event will occur, such as voting or not voting. Given that the result is a probability, the dependent variable's range is 0 to 1. In logistic regression, the odds, that is, the likelihood of success divided by the probability of failure, are transformed using the logit formula. The following formula expresses this logistic function, which is sometimes referred to as the log odds or the natural logarithm of odds [43]:

p = 1 / (1 + e^{−x})   (19)

where p denotes the probability of instance x. At the time of model training, for each instance x1, x2, x3, ..., xn the logistic coefficients will be b0, b1, b2, ..., bn. The stochastic gradient descent method estimates and updates the values of the coefficients:

v = b0·x0 + b1·x1 + ... + bn·xn   (20)

p = 1 / (1 + e^{−v})   (21)

Now, the following equation is used to update the values of the coefficients:

b = b + l · (y − p) · (1 − p) · p · x   (22)

Initially, all the coefficient values are 0, and y is the output value for each training sample, where l denotes the learning rate and x represents the biased input for b0 and is always 1. It updates the


values of the coefficients until it predicts the correct output at the training stage.

7) SUPPORT VECTOR MACHINE (SVM)
SVM is used to classify both linear and non-linear data and mostly works well for high-dimensional data with non-linear mapping. It explores the decision boundary or optimal hyperplane to separate one class from another. This study used the Radial Basis Function (RBF) as the kernel function; SVM automatically defines centers, weights, and thresholds, and reduces an upper bound of the expected test error [29], [44]. The following equation represents the RBF function:

K(x, x′) = exp( −||x − x′||² / (2δ²) )   (23)

where ||x − x′||² defines the squared Euclidean distance between the two feature samples and δ is a free parameter.

8) LINEAR DISCRIMINANT ANALYSIS (LDA)
LDA is a dimensionality reduction technique but can be used for classification by exploring a linear combination of features [45]. LDA uses the Bayes theorem to estimate the probability. Let us consider k classes and n training samples defined as {x1, x2, ..., xn} with classes zi ∈ {1, ..., k}. The distribution within each class is assumed to be Gaussian, φ(x|µk, Σ). The model estimation is defined as follows:

ak = ( Σ_{i=1}^{n} l(zi = k) ) / n   (24)

µk = ( Σ_{i=1}^{n} xi · l(zi = k) ) / ( Σ_{i=1}^{n} l(zi = k) )   (25)

Σ = ( Σ_{i=1}^{n} (xi − µ_{zi})(xi − µ_{zi})^T ) / n   (26)

where ak denotes the prior probability of class k, µk defines the mean of class k, l(·) is the indicator function, and Σ indicates the pooled sample covariance of the classes.

III. EXPERIMENTAL RESULTS ANALYSIS
A. EXPERIMENTAL SETUP
In order to conduct the experiments, an open-source cloud-based service named Google Colaboratory provided by Google is utilized. The scikit-learn package of the Python programming language is used to complete the data preprocessing, feature scaling, feature selection, and classification tasks. In this work, a 10-fold cross-validation technique [46], [47], [48] is utilized to construct prediction models using the four different ASD (Toddlers, Children, Adolescents, and Adults) datasets. In 10-fold cross-validation, the dataset is randomly divided into 10 equal folds. During model building, nine folds are used for training and the remaining one is used for testing. This procedure is repeated 10 times, and finally the results are averaged. Here, due to the lack of enough samples in the datasets, 10-fold cross-validation is used to prevent the model from overfitting, reduce the variance during model building, and generalize the model with a small amount of data. If we performed hold-out validation with a fixed test set, there would be a possibility of overfitting during model building, which would increase the variance, and thus the prediction model could not generalize to unseen test data. Various statistical evaluation measures, including accuracy, the Receiver Operating Characteristics (ROC) curve, F1-score, precision, recall, the Matthews Correlation Coefficient (MCC), the Kappa score, and Log loss, are considered to justify the experimental outcomes. The evaluation measures are calculated using the following formulae:

Accuracy = (TN + TP) / (TN + TP + FN + FP)   (27)

Precision = TP / (FP + TP)   (28)

Recall = TP / (FN + TP)   (29)

F1-Score = 2TP / (FN + FP + 2TP)   (30)

MCC = (TP · TN − FP · FN) / ( (TP+FP)(TP+FN)(TN+FP)(TN+FN) )^{1/2}   (31)

Kappa = (po − pe) / (1 − pe)   (32)

LogLoss = −( y · log(y′) + (1 − y) · log(1 − y′) )   (33)

The following terms represent the above equations: TP = True Positive; TN = True Negative; FP = False Positive; FN = False Negative; po is the relative observed agreement among raters; pe is the hypothetical probability of chance agreement; y is the actual/true value; and y′ is the prediction probability of each observation.

B. ANALYSIS ON ACCURACY
Accuracy represents the actual prediction performance of any classifier; a higher accuracy value indicates better prediction and less misclassification. The accuracy values of various classifiers on different feature-scaled datasets are presented in Table 5.
In this case, LDA delivers the best accuracy of 97.12% for the normalizer-scaled Adolescent dataset. Moreover, while investigating the results of the feature-scaled Adult dataset, it is seen that both the QT- and normalizer-scaled datasets perform better than the other FS methods. In both of these cases, LDA achieves the best accuracy value of 99.03%. Additionally, the accuracy values of various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets are contrasted in Fig. 2.

C. ANALYSIS ON PRECISION
Precision represents the positive predictive value; a higher precision means the true positives are high and the false positives are low. The precision values of various classifiers on different feature-scaled datasets are presented


FIGURE 2. Accuracy of the classifiers on different feature-scaled datasets.

TABLE 5. Accuracy of different ML classifiers on ASD datasets.

in Table 6. Analyzing the precision values of the Toddler dataset, it is found that the AB classifier provides the best precision of 99.95% while PT is used as the FS method. While reviewing the feature-scaled Children dataset, it is noticed that the LR classifier obtains the highest precision of 96.16% for MAS in classifying ASD. Furthermore, inspecting the feature-scaled Adolescent dataset, we observe that DT delivers the best precision of 97.25% while using PT as the FS method. Moreover, while investigating the results of the feature-scaled Adult dataset, it is seen that the QT-transformed dataset performs better than the other FS methods. In that case, SVM achieves the best precision value of 98.16%. Additionally,


FIGURE 3. Precision of the classifiers on different feature-scaled datasets.

TABLE 6. Precision of the different ML classifiers on ASD datasets.

Additionally, the precision values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets are contrasted in Fig. 3.

D. ANALYSIS ON RECALL
Recall represents the true positive rate; a higher value of recall means the true positive value is high and the false negative value is low, which indicates a better prediction. The recall values of the various ML classifiers on the different feature-scaled datasets are presented in Table 7. While reviewing the recall results of the feature-scaled Toddler dataset, it is observed that AB obtains the highest recall of 98.45% for the normalizer-scaled Toddler dataset.
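The precision and recall definitions above can be checked on a toy confusion matrix; the labels below are illustrative, not taken from the ASD datasets:

```python
# precision = TP / (TP + FP); recall = TP / (TP + FN)
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # 3 TP, 1 FN, 1 FP, 3 TN

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fp))                   # 0.75 (precision by hand)
print(tp / (tp + fn))                   # 0.75 (recall by hand)
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
```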


FIGURE 4. Recall of the classifiers on different feature-scaled datasets.

TABLE 7. Recall of the different ML classifiers on ASD datasets.

Investigating the feature-scaled Children datasets, we find that LR delivers the best recall value of 97.72% while using normalizer as the FS method. Moreover, inspecting the recall results of the feature-scaled Adolescent datasets, it is noticed that AB achieves the highest recall of 97.36% for the normalizer-scaled Adolescent datasets. Finally, we analyze the outcomes of the feature-scaled Adult datasets and find that RF, KNN, and LR deliver the highest recall of 100.00% for PT, while DT and KNN obtain the best recall of 100.00% for PT, and KNN and LR also obtain a 100.00% recall value for the MAS-scaled Adult datasets.


FIGURE 5. ROC of the classifiers on different feature-scaled datasets.

TABLE 8. ROC of the different ML classifiers on ASD datasets.

Besides, we also compare the recall values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets in Fig. 4.

E. ANALYSIS ON ROC
The ROC value indicates the ability of any classifier to distinguish between positive and negative classes. The ROC values of the various ML classifiers on the different feature-scaled datasets are presented in Table 8. While reviewing the ROC results of the feature-scaled Toddler dataset, it is observed that LR obtains the highest ROC of 99.99% for both QT and PT, and AB achieves 99.99% for the normalizer method. Investigating the feature-scaled Children dataset, it is found that GNB delivers the best ROC value of 99.73% using normalizer as the FS method.
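As a small illustration of what the ROC (AUC) value measures, scikit-learn's `roc_auc_score` can be applied to made-up probability scores (1.0 means perfect separation of the two classes, 0.5 means chance level); the values below are invented for the example:

```python
# ROC AUC: probability that a random positive is scored above a
# random negative by the classifier.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1

print(roc_auc_score(y_true, y_score))  # 0.75
```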


FIGURE 6. F1-score of the classifiers on different feature-scaled datasets.

TABLE 9. F1-score of the different ML classifiers on ASD datasets.

Moreover, inspecting the ROC results of the feature-scaled Adolescent dataset, we notice that both AB and LDA achieve the highest ROC of 99.72% for the QT and MAS-scaled datasets. Finally, we analyze the outcomes of the feature-scaled Adult datasets and find that LDA delivers the highest ROC value of 99.99% while using PT and normalizer as the FS methods. We compare the ROC values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets in Fig. 5.

F. ANALYSIS ON THE F1-SCORE
The F1-score takes the harmonic mean of the precision and recall values, and a higher value indicates a better prediction.
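The harmonic-mean definition above can be verified on toy numbers; the values are illustrative, not from the paper's tables:

```python
# F1 = harmonic mean of precision and recall
#    = 2 * P * R / (P + R) = 2*TP / (2*TP + FP + FN)
from sklearn.metrics import f1_score

precision, recall = 0.9, 0.6
harmonic_mean = 2 * precision * recall / (precision + recall)
print(round(harmonic_mean, 4))  # 0.72

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]        # TP=2, FN=1, FP=1
print(f1_score(y_true, y_pred))  # 2*2 / (2*2 + 1 + 1) = 4/6
```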


FIGURE 7. Kappa of the classifiers on different feature-scaled datasets.

TABLE 10. Kappa of the different ML classifiers on ASD datasets.

The F1-score values of the various ML classifiers on the different feature-scaled datasets are presented in Table 9. While reviewing the F1-score results of the feature-scaled Toddler dataset, we observe that AB obtains the highest F1-score of 99.14% for the normalizer-scaled Toddler dataset. Investigating the feature-scaled Children dataset, it is found that AB delivers the best F1-score value of 97.02% while using QT and normalizer as the FS methods. Moreover, inspecting the F1-score results of the feature-scaled Adolescent datasets, we notice that AB achieves the highest F1-score of 97.69% for the QT-scaled Adolescent dataset. Finally, we analyze the outcomes of the feature-scaled Adult dataset and notice that LDA delivers the highest F1-score value of 99.11% while using PT as the FS method.


FIGURE 8. Log loss of the classifiers on different feature-scaled datasets.

TABLE 11. Log loss of the different ML classifiers on ASD datasets.

We compare the F1-score values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets in Fig. 6.

G. ANALYSIS ON KAPPA
The kappa score measures the degree of agreement between the true class and the predicted class; a higher kappa value means a better prediction and indicates a higher degree of agreement between the actual and predicted values. The kappa values of the various ML classifiers on the different feature-scaled datasets are presented in Table 10. While reviewing the kappa results of the feature-scaled Toddler dataset, it is observed that both the normalizer and MAS-scaled datasets provide the best kappa values and outperform the other FS methods. Consequently, both LR and LDA obtain the highest kappa value of 99.31% for the normalizer and MAS-scaled Toddler datasets.
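The chance-corrected agreement that kappa measures can be computed on toy labels (illustrative, not from the ASD datasets):

```python
# Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
# agreement and p_e is the agreement expected by chance.
from sklearn.metrics import cohen_kappa_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]

# Here p_o = 5/6 (five matching labels) and p_e = 1/2 from the
# marginals, so kappa = (5/6 - 1/2) / (1 - 1/2) = 2/3.
print(round(cohen_kappa_score(y_true, y_pred), 4))  # 0.6667
```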


FIGURE 9. MCC of the classifiers on different feature-scaled datasets.

TABLE 12. MCC of the different ML classifiers on ASD datasets.

Investigating the feature-scaled Children datasets, it is found that AB delivers the best kappa value of 93.78% using normalizer as the FS method. Moreover, inspecting the kappa results of the feature-scaled Adolescent datasets, we notice that LDA achieves the highest kappa value of 94.02% for both the QT and PT-scaled datasets. Finally, we analyze the outcomes of the feature-scaled Adult datasets and see that both LR and LDA deliver the highest kappa value of 99.02% while using QT and normalizer as the feature scaling methods. Besides, we also compare the kappa values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets in Fig. 7.

H. ANALYSIS ON LOG LOSS
The log loss value indicates how close the prediction probability is to the true values; the lower the log loss value, the better the prediction. The log loss values of the various ML classifiers on the different feature-scaled datasets are presented in Table 11.


TABLE 13. Feature importance for the normalizer-scaled toddlers.

TABLE 14. Feature importance for the normalizer-scaled children.

While reviewing the log loss results of the feature-scaled Toddler and Children datasets, we observe that AB obtains the lowest log loss of 0.0802% and 0.98% for the normalizer-scaled Toddler and the QT and PT-scaled Children datasets, respectively. Furthermore, it is noticed that LDA achieves the lowest log loss of 1.12% for the QT, PT, and MAS-scaled Adolescent datasets. Finally, we analyze the outcomes of the feature-scaled Adult datasets and see that both LR and LDA deliver the lowest log loss value of 0.16% while using QT and normalizer as the feature scaling methods. Besides, we also compare the log loss values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets in Fig. 8.

I. ANALYSIS ON MCC
MCC takes all the coefficients of the confusion matrix, i.e., TP, TN, FN, and FP, into consideration to calculate the degree of correlation; a higher MCC represents a better prediction and a stronger correlation between the actual and predicted classes. While reviewing the MCC results of the feature-scaled Toddler dataset, we observe that both LR and LDA obtain the highest MCC of 99.31% for the normalizer and MAS-scaled Toddler datasets. Investigating the feature-scaled Children datasets, it is found that AB delivers the best MCC value of 93.88% using normalizer as the FS method. Moreover, inspecting the MCC results of the feature-scaled Adolescent datasets, we notice that LDA achieves the highest MCC of 94.25% for both the QT and PT-scaled datasets. Finally, we analyze the outcomes of the feature-scaled Adult datasets and find that both LR and LDA deliver the highest MCC value of 99.03% while using QT as the feature scaling method. Besides, we also compare the MCC values of the various ML classifiers on the feature-scaled Toddlers, Children, Adolescents, and Adult datasets in Fig. 9.
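The four confusion-matrix terms combine into MCC as shown in this toy computation (labels are illustrative, not from the paper's data):

```python
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
from sklearn.metrics import matthews_corrcoef

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]   # TP=2, TN=3, FP=0, FN=1

# By hand: (2*3 - 0*1) / sqrt(2 * 3 * 3 * 4) = 6 / sqrt(72)
print(round(matthews_corrcoef(y_true, y_pred), 4))  # 0.7071
```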


TABLE 15. Feature importance for the QT-scaled adolescents.

TABLE 16. Feature importance for the QT-scaled adults.

IV. DISCUSSION AND EXTENDED COMPARISON
In the previous section, we analyzed four different ASD datasets to build prediction models for people at different stages of life. In order to do this, we applied various FS methods to those ASD datasets, classified them utilizing eight different simple but effective ML classifiers, and determined how the FS methods affect the classification performance. Furthermore, we also employed four different FSTs to compute the importance of the features that are most responsible for ASD prediction. Inspecting the experimental findings, the best-performing classifier models predicted ASD on the Toddlers, Children, Adolescents, and Adults datasets, respectively, with accuracies of AB (99.25%), AB (97.95%), LDA (97.12%), and LDA (99.03%); ROC values of AB, LR (99.99%), GNB (99.73%), AB, LDA (99.72%), and LDA (99.99%); F1-scores of AB (99.14%), AB (97.02%), AB (97.69%), and LDA (99.11%); precision values of AB (99.95%), LR (96.16%), DT (97.25%), and SVM (98.16%); recall values of AB (98.45%), LR (97.72%), AB (97.36%), and RF, DT, KNN, LR (100%); MCC values of LR, LDA (99.31%), AB (93.88%), LDA (94.25%), and LR, LDA (99.03%); kappa values of LR, LDA (99.31%), AB (93.78%), LDA (94.02%), and LR, LDA (99.02%); and log loss values of AB (0.0802%), AB (0.98%), LDA (1.12%), and LR, LDA (0.16%). After analyzing the experimental outcomes of the different classifiers on the feature-scaled ASD datasets, it is found that AB for Toddlers and Children and LDA for Adolescents and Adults outperformed the other ML classifiers in terms of classification performance. Besides, the experimental outcomes implied that the normalizer FS method for Toddlers, the normalizer FS method for Children, the QT FS method for Adolescents, and the QT FS method for Adults showed better performance. Additionally, we calculated the feature importance using the IGAE, GRAE, RFAE, and CAE FST methods


TABLE 17. Comparison with other works.

on the normalizer-scaled Toddlers, normalizer-scaled Children, QT-scaled Adolescents, and QT-scaled Adults to enumerate the risk factors for ASD prediction. The quantitative results are provided in Table 13, Table 14, Table 15, and Table 16. This feature importance analysis helps healthcare practitioners decide on the most important features while screening ASD cases. To this end, we provide the comparative results of our work with other recent studies in Table 17.

V. CONCLUSION
In this work, we proposed a machine-learning framework for ASD detection in people of different ages (Toddlers, Children, Adolescents, and Adults). We show that predictive models based on ML techniques are useful tools for this task. After completing the initial data processing, the ASD datasets were scaled using four different feature scaling techniques (QT, PT, normalizer, MAS) and classified using eight different ML classifiers (AB, RF, DT, KNN, GNB, LR, SVM, LDA). We then analyzed each feature-scaled dataset's classification performance and identified the best-performing FS and classification approaches. We considered different statistical evaluation measures, such as accuracy, ROC, F1-score, precision, recall, Matthews correlation coefficient (MCC), kappa score, and log loss, to justify the experimental findings. Consequently, our proposed prediction models based on ML techniques can be utilized as an alternative or even a helpful tool for physicians to accurately identify ASD cases in people of different ages. Additionally, the feature importance values were calculated to identify the most prominent features for ASD prediction by employing four different FSTs (IGAE, GRAE, RFAE, and CAE). Therefore, the experimental analysis of this research will allow healthcare practitioners to take into account the most important features while screening ASD cases. The limitation of our research work is that the amount of data was not sufficient to build a generalized model for people of all stages. In the future, we intend to collect more data related to ASD and construct a more generalized prediction model for people of any age to improve the detection of ASD and other neuro-developmental disorders.




MUHAMMAD IMRAN SHARIF received the B.S. and M.S. degrees in computer science from COMSATS University Islamabad, Wah Campus, Pakistan, in 2019 and 2021, respectively. His research interests include medical imaging, machine learning, computer vision, artificial intelligence, and pattern recognition.
S. M. MAHEDY HASAN received the B.Sc.
degree in computer science and engineering from
the Rajshahi University of Engineering and Tech-
nology (RUET), Bangladesh. He is currently serv-
ing as an Assistant Professor with the Department
of Computer Science and Engineering, RUET.
Before joining RUET, he was a Lecturer at the
Department of Computer Science and Engineer-
ing, Bangabandhu Sheikh Mujibur Rahman Sci-
ence and Technology University (BSMRSTU),
Bangladesh, in 2019. His research interests include computer vision, pattern
recognition, machine learning, deep learning, transfer learning, biomedical
engineering, bioinformatics, natural language processing, text mining, and
pedagogy.

ANWAAR ULHAQ received the Ph.D. degree in artificial intelligence from Monash University, Australia. He is currently working as a Senior Lecturer (AI) with the School of Computing, Mathematics, and Engineering, Charles Sturt University, Australia. He has developed national and international recognition in computer vision and image processing. His research has been featured 16 times in national and international news venues, including ABC News and IFIP (UNESCO). He is an Active Member of IEEE, ACS, and the Australian Academy of Sciences. As the Deputy Leader of the Machine Vision and Digital Health Research Group (MaViDH), he provides leadership in artificial intelligence research and leverages his leadership vision and strategy to promote AI research by mentoring junior researchers in AI and supervising HDR students, devising plans to increase research impact.

MD PALASH UDDIN (Member, IEEE) received the B.Sc. degree in computer science and engineering from Hajee Mohammad Danesh Science and Technology University (HSTU), Bangladesh, and the M.Sc. degree in computer science and engineering from the Rajshahi University of Engineering and Technology, Bangladesh. He is currently pursuing the Ph.D. degree with the School of Information Technology, Deakin University, Australia. He is also an Academic Faculty Member with HSTU. His research interests include machine learning, federated learning, blockchain, and remote sensing image analysis.
and leverages his leadership vision and strategy to promote AI research by
mentoring junior researchers in AI and supervising HDR students devising
plans to increase research impact.

MD AL MAMUN (Senior Member, IEEE) received the B.Sc. degree in computer science and engineering from the Rajshahi University of Engineering and Technology (RUET), Bangladesh, in 2005, and the Ph.D. degree in computer science from the University of New South Wales (UNSW), Canberra, Australia, in 2011. He is currently working as a Professor with the Department of Computer Science and Engineering, RUET. He has published more than 50 publications in several international journals and conferences. His research interests include satellite image mining (image compression, change detection, prediction and forecasting, and adaptive linear and non-linear modeling), computer vision (pattern recognition and image classification, object recognition, feature extraction, and nonlinear image classification), machine learning, and data mining. He has served as a Reviewer for various IEEE-sponsored conferences and journals, such as IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, and IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. He attended several national and international conferences and served as a member of the organizing committee or technical program committee (TPC). He is also serving as the Director of the ICT Cell, RUET; an Executive Member of the IEEE Computer Society Bangladesh Chapter; an Adviser of the IEEE Computer Society Student Branch, RUET; and an Executive Member of the Robotic Foundation, Eastern Region, Bangladesh.

GOVIND KRISHNAMOORTHY is currently a Clinical Psychologist and a Senior Lecturer with the School of Psychology and Wellbeing, University of Southern Queensland, Australia. His research and clinical practice focus on improving mental health and educational outcomes for children and adolescents. He has collaborated with health services, schools, and community services in implementing place-based and systems approaches to support developmental disorders and mental health concerns in children, adolescents, and their families.
