ADET Model: Real-Time Autism Detection via Eye Tracking Using Retinal Scan Images
Abstract
Background: Deficits in attention to social stimuli are more common in children affected by autism spectrum disorder (ASD). The development of visual attention is therefore one of the most vital elements for detecting autism. Eye-tracking technology is a promising method for identifying an early autism biomarker based on children’s abnormal visual patterns.
Objective: Eye-tracking retinal scan path images are generated from eyeball movements while children watch a screen; they capture the eye projection sequences, which help to analyze the children’s behavior. The Shi-Tomasi corner detection method, implemented with OpenCV, identifies the corners of the eye-gaze movement in these images.
Methods: In the proposed ADET model, a corner detection-based vision transformer (CD-ViT) technique is utilized to diagnose autism at an early stage. The transformer model divides the input images into patches, which are fed into the transformer encoder. Once the features are extracted and refined via remora optimization, the vision transformer is fine-tuned to resolve the binary classification task. The vision transformer model, aided by the corner detection technique, thus acts as the cornerstone of the proposed work. This study uses a dataset of 547 eye-tracking retinal scan path images from both autistic and non-autistic children.
Results: Experimental results show that the suggested ADET framework achieves a classification accuracy that is 38.31%, 23.71%, 13.01%, 1.56%, and 18.26% better than the RM3ASD, MLP, SVM, CNN, and SVM methods, respectively.
Conclusions: These results strongly suggest that this screening method can assist medical professionals in providing efficient and accurate autism detection.
Keywords
eye tracking, autism, vision transformer, eyeball movement, image
Received: 27 April 2024; accepted: 28 October 2024
1 Introduction
Autism is the most prevalent kind of Pervasive Developmental Disorder (PDD). ASD is characterized by communication and social difficulties, as well as the emergence of limited and repetitive behavioral patterns.1 An estimated 10 million individuals in India have ASD, and a 2022 WHO report states that one in every 100 children globally has ASD.2 A greater likelihood of developing autism has been scientifically associated with a variety of factors, including preterm delivery, environmental factors, genetics, maternal health concerns, and advanced parental age.3 Autism may start to show symptoms in some children as early as twelve months of age, while it may take 24 months or more for other children.4 Generally, the early symptoms of autism arise between the ages of 1 and 2.
Department of Information Technology, National Engineering College, Information and Communication Engineering, Anna University, Chennai, India
Corresponding author:
Jesu Mariyan Beno Ranjana, Department of Information Technology, National Engineering College, Information and Communication Engineering, Anna
University, Chennai, India.
Email: [email protected]
Lack of eye contact and reduced responsiveness to sound are some early indicators of autism in children younger than one year of age.5
Some two- and three-year-old children with ASD show the following symptoms: limited communication, heightened or reduced sensitivity to sensory stimuli, difficulty following simple instructions, rejection of affection, overactive or easily distracted behavior, and repetitive actions such as hand flapping, spinning, and making unusual noises. The typical co-occurrence of ASD with other neurodevelopmental problems and medical comorbidities makes it difficult to detect and diagnose accurately.6,7 Several research investigations have demonstrated the positive and negative consequences of conducting evaluations without objective diagnostic testing.8 A lack of appropriate professional education and training has resulted in missed opportunities for detecting autism in its early stages.6 Research investigations have used various input modes, such as brain signal analysis, brain image analysis, biometrics, sensory inputs, facial expression images, and eye tracking.9
Eye-tracking studies discussed by many researchers make the identification of ASD promising. Eye tracking collects eye movement and eye gaze data from toddlers and preschool children and serves to determine the direction of eye gaze and the latency of eye movements.10 A child with ASD shows different eye gaze and latency patterns when data are collected from different devices, and the eye movements recorded by the tracker device are used as diagnostic data.11–14 Eye movements can be categorized into four types: scan paths, fixations, blink rate, and saccades. Recently, eye-tracking technology has been applied to diagnosing ASD with new techniques.15 Eye contact is normally one of the most important parts of non-verbal and social communication, and children diagnosed with ASD typically exhibit abnormal eye movements.16
Eye movement-based ASD detection has been extensively studied, yet certain issues remain. First, existing techniques have mostly used statistical analysis to compare the eye gaze movements of autistic and non-autistic children;17 although the requirements of such analysis are high, it gives a better classification result. Second, many of the algorithms use only fixation values, which record how often the eye movements are stationary.18 But the actual movement of the eyes is a dynamic process, and the relationship between fixations also provides crucial data.19
Many models have been developed for detecting ASD among children. In 2019, Y. Tao and M.L. Shyu proposed SP-ASDNet, which utilizes both LSTM and CNN networks to detect ASD; it achieves 74.22% classification accuracy but requires considerable computational power. In 2019, M. Krol et al.22 created a powerful algorithm by comparing eye-movement sequences, statistically comparing the cross-validated accuracies, and finding the differences between temporal scan path features. However, because the method has no explicit dimension-reduction function, new high-dimensional points cannot be added without re-running the entire analysis.
In 2020, Eraslan et al.23 designed Scanpath Trend Analysis (STA), which combines a collection of eye-movement pathways into a single representative path and determines the trending path of a group of users on a webpage. The dataset in this case includes only six web pages, which is insufficient to fully explore the impact of the characteristics. In 2021, S. Zhang et al.25 established a strategy for the joint analysis of children’s eye-tracking and EEG recordings; it analyzes the connection between eye-tracking and EEG records and emphasizes their functional relationship. The experimental results show a classification accuracy of 95% for identifying ASD children. However, multimodal fusion analysis often relies on simple fusion strategies, so the classification model’s performance is limited.
In 2020, Roth et al.20 demonstrated an interactive dyadic system that combines multiple communication channels for recording non-verbal behavior and finds differences using computer-aided diagnosis. However, the limited study sample and the random sampling of age and gender do not provide sustainable results. In 2021, Akter et al.24 created a k-means clustering algorithm using an eye-tracking dataset; it gave stable results, and the evaluation metrics justified the performance of the different classifiers. This is helpful for diagnosing ASD for better treatment, but restricting the age range of the children would sharpen the investigation and detect autism more precisely.
In 2022, Ahmed et al.26 used eye-tracking scan path images for diagnosing ASD by developing three artificial intelligence approaches: machine learning, deep learning, and hybrid models. The dataset was balanced, and the models were adjusted and modified to extract deep features and solve the overfitting problem. Misclassification arises very rarely, but the computational cost of the hybrid model is too high. In 2022, Gaspar et al.27 took the scan path pictures of 219 ASD and 328 typically developing children and used a metaheuristic approach, the Giza Pyramids Construction algorithm, to optimize a kernel extreme learning machine, achieving a classification accuracy of 98.8%. Future work would include more participants and record additional eye movements.
In 2019, G. Wan et al.21 compared the fixation durations of TD and ASD children using 10-s videos of a speaking female. This approach does not identify functional-level indicators such as adaptive behavior and IQ; moreover, the sample size is very small, and it does not target any specific age group. According to these previous studies, most existing techniques used eye tracking to show that children with autism have gaze patterns distinct from those of typical children. The approaches discussed above have certain shortcomings, including low classification accuracy and high computational cost in autism detection. Therefore, in this paper a novel ADET model is proposed to diagnose autism at an early stage using the corner detection-based vision transformer (CD-ViT) technique.
The primary contributions of this study are:
• Collecting eye-movement scan path images from autistic and non-autistic children. Pre-processing is the most important part; it balances the images in the dataset and enhances the eye-movement images.
• Applying the Shi-Tomasi corner detection method, implemented with OpenCV, to identify the corners of the eye gaze movement in the images.
• Building a vision transformer model to train on the dataset; it shows promising results, demonstrating better performance on eye-based image classification tasks.
• Finally, comparing the model with different algorithms, against which it gives better performance.
The remainder of the paper is arranged as follows: Section 2 presents the proposed methodology, Section 3 presents the experimental evaluation, and Section 4 concludes the study.
2 Proposed methodology
The methodology gives a brief description of the autism eye retinal scan path image classification process. The input eye retinal scan path image is initially preprocessed by adjusting the intensity level of the image, removing redundant images, and removing noise. The second stage identifies the corners of the images and marks them separately, which helps to train the model easily and effectively.
The third stage is feature extraction: features are extracted using the vision transformer model, which splits the images into patches. Each patch is treated as a token, and the image is trained with self-attention-based approaches. Moreover, the corner vision transformer model differentiates the eye retinal scan path features and classifies them. The proposed ADET model’s corner detection image-based vision transformer (CD-ViT) technique is illustrated in Figure 1.
2.1.1 Shi-Tomasi corner detection. The Shi-Tomasi (ST) algorithm is an enhancement of the traditional Harris corner detection algorithm and, in general, produces better corners. This part describes the conceptual foundations of the ST algorithm. The Harris algorithm’s primary method is to traverse the image with a local window and determine whether the grayscale (gs) values change significantly. If the gs values inside the window (as shown on the gradient map) exhibit notable differences, there is a corner in the area where the window is situated.
First, a mathematical model is established to identify the windows that significantly alter the grayscale values. The grayscale pixel value at a given spot in the grayscale image is used as the starting value when the window’s centre is positioned there; when the window is moved slightly along the x and y axes, the change in the pixel gs value at that place reflects the movement. In the simplest case, when every pixel in the window represents an average filtering kernel with a weight of 1, the formula for the variance in pixel gs values that arises from shifting the window in different directions is as follows:
R(s, t) = \sum_{x,y} z(x, y)\,[I(x + s, y + t) - I(x, y)]^2 \quad (1)
Following the expansion with Taylor’s formula, the approximation is provided by:

R \approx [s, t] \left( \sum_{x,y} z(x, y) \begin{bmatrix} r_x^2 & r_x r_y \\ r_x r_y & r_y^2 \end{bmatrix} \right) \begin{bmatrix} s \\ t \end{bmatrix} \quad (2)
The following expression can roughly be obtained for minor local displacements [s, t]:

R \approx [s, t]\, N \begin{bmatrix} s \\ t \end{bmatrix} \quad (3)
When the matrix N is diagonalized, the grayscale change rates along the X and Y axes are represented by the eigenvalues 𝛾1 and 𝛾2, respectively:

N = \sum_{x,y} z(x, y) \begin{bmatrix} r_x^2 & r_x r_y \\ r_x r_y & r_y^2 \end{bmatrix} = \begin{bmatrix} \gamma_1 & 0 \\ 0 & \gamma_2 \end{bmatrix} \quad (5)
The corner response function for the Harris corner identification technique is:
A = 𝛾1𝛾2 − G(𝛾1 + 𝛾2)² (6)
As part of the Harris corner detection technique, the corner response function A is thresholded (A > threshold) to find its local maxima. The Shi-Tomasi approach is an enhancement of Harris’s, in which a point is deemed a corner if the minimum eigenvalue (𝛾1 or 𝛾2) is greater than a minimal threshold value.
The ST corner detection algorithm’s corner response function is:
A = min(𝛾1 , 𝛾2 ) (7)
Figure 2 illustrates the operation of the ST corner detection algorithm, which proceeds as follows:
• Initially, the method computes a corner quality score at each pixel, using either the Shi-Tomasi or the Harris response.
• Subsequently, non-maximum suppression is carried out, keeping only the local maxima in each 3 × 3 neighborhood.
• Next, all corners with a quality score below QualityLevel ∗ maxx,y QualityScore(x, y), the best corner score, are eliminated. For example, if the best corner has a quality score of 1500 and the quality level is 0.01, all corners with a quality score of less than 15 are discarded.
• The remaining corners are then sorted by quality score in descending order.
• Finally, the function discards each corner for which a stronger corner exists at a distance shorter than minDistance.
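As an illustration of these steps, the following is a minimal Python/OpenCV sketch of Shi-Tomasi detection via goodFeaturesToTrack; the file name and parameter values are illustrative assumptions, not the exact settings used in this study.

```python
# A minimal sketch of the Shi-Tomasi detector described above, using OpenCV's
# goodFeaturesToTrack. The file name and parameter values are illustrative.
import cv2

img = cv2.imread("scanpath.png")                 # hypothetical scan path image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # corner response works on grayscale

# goodFeaturesToTrack scores each pixel with A = min(gamma_1, gamma_2), rejects
# scores below qualityLevel * (best score), applies non-maximum suppression,
# and enforces a minimum spacing between the surviving corners.
corners = cv2.goodFeaturesToTrack(
    gray,
    maxCorners=100,     # keep at most the 100 strongest corners
    qualityLevel=0.01,  # e.g. best score 1500 -> corners scoring below 15 are dropped
    minDistance=10,     # discard a corner if a stronger one lies within 10 pixels
)

# Mark the detected corners for visual inspection (cf. Figure 3).
if corners is not None:
    for x, y in corners.reshape(-1, 2).astype(int):
        cv2.circle(img, (int(x), int(y)), 3, (0, 0, 255), -1)
cv2.imwrite("scanpath_corners.png", img)
```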
Figure 3. Eye tracking retinal scan path image after applying ADET model corner detection technique.
2.1.2 Patch the dataset images. The corners can be identified using a computer vision library, which provides a built-in function for this purpose. Shi-Tomasi corner detection scores each pixel with the scoring function R:

R = min(𝜆1, 𝜆2)

where 𝜆1 and 𝜆2 are the eigenvalues of the resulting matrix. A pixel is regarded as a corner if its R value is higher than the threshold value. The output image is named ECorner(x), into which the eye image E(x) is converted after Shi-Tomasi corner detection.
Figure 3 shows the ADET model corner detection applied to eye-tracking retinal scan path images; it identifies all the corners along the eyeball movement trajectory.
2.1.3 Positional embedding. Positional embedding preserves the positional information of each embedded patch and indicates the sequential position of all the patches: a positional index is assigned to every patch in one-dimensional order. The resulting sequence of positional embeddings is the input to the transformer encoder.
2.1.4 Transformer encoder. A learnable class token Xc is first prepended to the positionally embedded patches, so the learnable embedding is a sequence of embedded patches together with the class token. The input to the first encoder layer, Z0, is described as follows:

Z_0 = [X_c; X_p^1 E; X_p^2 E; \ldots; X_p^N E] + E_{PI} \quad (8)

where

E \in \mathbb{R}^{(P^2 \cdot C) \times D} \quad (9)

E_{PI} \in \mathbb{R}^{(N+1) \times D} \quad (10)

Here Xc is the class label token, XpN are the patch images with N ∈ 1 to K, and EPI stores the positional information in sequential order.
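As a concrete reading of equation (8), the following PyTorch-style sketch builds Z0 from an input image; the framework, patch size P = 16, and embedding width D = 768 are assumptions for illustration, since the paper does not specify them.

```python
# A PyTorch-style sketch of equation (8): flatten patches, project with E,
# prepend the class token Xc, and add positional embeddings EPI.
# Sizes are illustrative assumptions (P = 16, C = 1, D = 768).
import torch
import torch.nn as nn

P, C, D = 16, 1, 768                       # patch size, channels, embedding dim
img = torch.randn(1, C, 256, 256)          # one preprocessed 256 x 256 eye image
N = (256 // P) ** 2                        # number of patches (here 256)

# Split the image into N non-overlapping P x P patches and flatten each one.
patches = img.unfold(2, P, P).unfold(3, P, P)           # (1, C, 16, 16, P, P)
patches = patches.reshape(1, C, N, P * P)
patches = patches.permute(0, 2, 1, 3).reshape(1, N, C * P * P)

E = nn.Linear(C * P * P, D)                             # learnable projection E
cls_token = nn.Parameter(torch.zeros(1, 1, D))          # class token Xc
pos_embed = nn.Parameter(torch.zeros(1, N + 1, D))      # positional embedding EPI

tokens = torch.cat([cls_token, E(patches)], dim=1)      # [Xc; Xp1 E; ...; XpN E]
z0 = tokens + pos_embed                                 # Z0, input to the encoder
```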
Every transformer encoder needs a class token at the 0th position when using a pre-trained model: when the patch sequence is sent as input to the encoder, one class token must be inserted as the first patch information. The encoder consists of several identical layers, and each layer contains two primary blocks, the multi-head self-attention (MSA) block and the FFN block.
2.1.5 Feed-forward network (FFN). This is the transformer encoder’s second block, made up of two fully connected layers with GELU activation. Each of the two encoder blocks is preceded by layer normalization (LN), and the output of each block is combined with its input through a residual connection.
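Since the residual formulas are not reproduced in the text, the following is a hedged PyTorch-style sketch of one encoder layer with pre-block layer normalization, an MSA block, and a two-layer GELU FFN, each wrapped in a residual connection; the layer sizes are illustrative.

```python
# A sketch of one ViT encoder layer as described above: LayerNorm precedes
# each block, and residual connections wrap the multi-head self-attention
# (MSA) block and the two-layer GELU feed-forward network (FFN).
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, dim=768, heads=12, mlp_dim=3072):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(          # two fully connected layers with GELU
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Linear(mlp_dim, dim),
        )

    def forward(self, z):
        h = self.ln1(z)
        z = z + self.msa(h, h, h)[0]       # residual connection around MSA
        z = z + self.ffn(self.ln2(z))      # residual connection around FFN
        return z

layer = EncoderLayer()
z1 = layer(torch.randn(1, 257, 768))       # one pass over the Z0 sequence
```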
2.1.6 Remora optimization. The remora optimization algorithm (ROA) is primarily inspired by the remora, a clever marine navigator that attaches itself to hosts such as whales and swordfish. The algorithm consists of two stages, exploitation and exploration, and remora behaviors such as mindful eating and free travel are used to construct its numerical expressions. Mode switching is accomplished with a single small trial step, and the remora factor, which drives convergence, can be employed to increase the optimization’s precision. Decisions on mode switching take into account phases such as experience attack, thoughtful eating, and free travel. These tactics aid the ROA algorithm in its pursuit of optimal outcomes. The ROA algorithm’s steps are listed below:
(i) Initialization
The remora with the best fitness is the current best solution, and the variables in the search space are represented by the dimension D. Remoras move to different positions according to the size of the pool. The present position of remora j is Cj = (Cj1, Cj2, …, Cjd), where d represents the dimension of a swimming remora and j indexes the remoras. Similarly, Cop = (C1*, C2*, …, Cd*) indicates the algorithm’s optimal solution. Every candidate solution has its own fitness value, written E(Cj) = E(Cj1, Cj2, …, Cjd), where E is the fitness function. The best fitness over all remora locations is given by E(Cop) = E(C1*, C2*, …, Cd*).
• SFO Strategy
When the remora is attached to the swordfish, its position can be updated and expressed as follows,
C_j^{s+1} = C_{op}^{s} - \left( \mathrm{rand}(0, 1) \times \frac{C_{op}^{s} + C_{rand}^{s}}{2} - C_{rand}^{s} \right) \quad (13)
Here, S is the maximum number of iterations, s indicates the current iteration, and Cj^(s+1) is the updated position of remora j. Cop^s is the best position found so far, and Crand^s represents a random remora position. These variables ensure that the algorithm can perform a global search. Moreover, the fitness value of the current iteration is obtained from the experience attack step, which is the basis for the random selection of the remora.
• Experience Attack
Remora’s change of host is estimated in this phase. It can be expressed as follows:

C_{att} = C_j^{s} + (C_j^{s} - C_{pre}) \times \mathrm{randn} \quad (14)

Here Cpre indicates the position of the previous generation, Catt indicates the tentative step, and randn is a suitably chosen random number; this movement supports the global search. The purpose of this stage is to compare the fitness value of the attempted solution, E(Catt), with that of the current solution, E(Cj^s). If the value of E(Catt) is smaller, meaning that E(Cj^s) > E(Catt), the remora switches host and achieves local optimization by adopting a new feeding method. If instead the attempted solution’s fitness is greater than the current one,

E(C_j^{s}) < E(C_{att}) \quad (15)

the previous solution is utilized once more.
Remora’s bond with the whale serves as the basis for the next movement, and the location is updated accordingly: the remora’s position depends on the whale, L is the distance between the hunter and the prey (frequently the appropriate solution), 𝛿 is a chosen random number lying in [−1, 1], and the random number a lies in [−2, −1], decreasing linearly.
• Host Feeding
This subsection belongs to the exploitation stage. Here the solution space is compressed to the host’s position space. The volume spaces of the host and the remora are proportional to the small movement step, denoted L, and T is used to locate the remora more precisely within the solution space. Figure 4 shows the flowchart of ROA.
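For illustration, a minimal NumPy sketch of the SFO (free-travel) update of equation (13) is given below; the population size, dimensionality, fitness function, and bounds are placeholder assumptions.

```python
# A minimal NumPy sketch of the ROA free-travel (SFO) update of equation (13);
# the population size, dimensionality, and fitness function are placeholders.
import numpy as np

rng = np.random.default_rng(0)
pop = rng.uniform(-1.0, 1.0, size=(10, 5))       # 10 remoras in a 5-D search space

def fitness(c):                                   # placeholder fitness E(.)
    return np.sum(c ** 2)

c_op = pop[np.argmin([fitness(c) for c in pop])]  # best position found so far

# SFO strategy: move each remora relative to the best and a random position.
for j in range(pop.shape[0]):
    c_rand = pop[rng.integers(pop.shape[0])]      # randomly selected remora
    pop[j] = c_op - (rng.random() * (c_op + c_rand) / 2.0 - c_rand)
```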
2.1.7 Classification layer. After the encoder’s output is passed to the classification task, the class labels are determined from the class token via the model’s softmax activation function.
y = \mathrm{LN}(Z_L^0) \quad (24)
During pre-training the classification head is an FFN, and it is replaced during the fine-tuning stage. Finally, the softmax function gives the class probabilities for classifying autistic and non-autistic children from eye-tracking retinal scan path images.
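A short sketch of this readout, corresponding to equation (24), is shown below; the tensor shapes and the linear head are illustrative assumptions continuing the encoder sketch above.

```python
# Sketch of the classification readout in equation (24): layer-normalize the
# class token from the final encoder layer and map it to the two class
# probabilities. Shapes continue the illustrative ViT sketch above.
import torch
import torch.nn as nn

dim = 768
ln = nn.LayerNorm(dim)
head = nn.Linear(dim, 2)                 # binary head: autism vs. non-autism

z = torch.randn(1, 257, dim)             # encoder output (class token + 256 patches)
y = ln(z[:, 0])                          # y = LN(Z_L^0), the class token
probs = head(y).softmax(dim=-1)          # softmax class probabilities
```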
Eye-tracking retinal scan path images record a sequence of consecutive fixations and saccades generated by the path of eye movement over a specific time. Detecting corners with the computer vision technique extracts all the corners from an image and thereby characterizes its contents. Detecting eye movement at various locations helps to distinguish an autistic child from a typically developing child. Here, Shi-Tomasi corner detection identifies the eye-movement image points by analyzing the intensity variations, and the corner points are categorized using the intensity values.
2.1.8 ASD and non-ASD. Two groups of participants are analyzed, and the model is trained on their retinal scan path eye-movement images E, which combine both autism and non-autism images. The autism-based eye retinal scan path images are represented as A, and the non-autism-based images as N:
E(x) = \begin{cases} A(x), & \text{for autism eye images} \\ N(x), & \text{for non-autism eye images} \end{cases} \quad (25)

where A(x) represents an autism image and N(x) a non-autism image.
Figure 5 shows the eye-tracking retinal scan path images of ASD and non-ASD children. The retinal scan path images are a sequence of consecutive fixation points and eyeball movements over a specific period of time, and they do not overlap one another.
2.2.1 Step 1: original retinal scan path eye image. The original retinal scan path eye image dataset contains 219 autism images and 328 non-autism images. The dataset is symbolized as 𝔼1, and each image in the dataset is symbolized as e1(x) ∈ 𝔼1, x = 1, 2, …, |𝔼1| = 547.

𝔼1 = {e1(1), e1(2), …, e1(i), …, e1(|𝔼|)} (26)
2.2.2 Step 2: convert color images to grayscale. The eye retinal scan path color images are transformed to grayscale while maintaining the brightness of the image; the grayscale dataset is symbolized as 𝔼2:

𝔼2 = GrayImage(𝔼1)

𝔼2 = {e2(1), e2(2), …, e2(i), …, e2(|𝔼|)} (27)
Every image in 𝔼2 is a grayscale image. The spatial size of the image does not change, but the color values do. After converting a color image to a grayscale image, the eye retinal scan path image’s size is Size{e2(i)} = {W2 ∗ H2 ∗ C2}, with C2 = 1.
2.2.3 Step 3: apply contrast stretching (CS). Contrast stretching (CS) is an image enhancement method that improves an image by expanding its range of intensity values; it is a kind of normalization that rescales the pixel intensities, and the stretched dataset is symbolized as 𝔼3. Suppose contrast stretching is applied to the ith image e2(i), i = 1, 2, …, |𝔼|; then the highest and lowest intensity values of the image are calculated.
The minimum and maximum grayscale values are calculated as:

\mu_{min}(i) = \min_{x=1}^{W_2} \min_{y=1}^{H_2} e_2(i|x, y) \quad (4.A)

\mu_{max}(i) = \max_{x=1}^{W_2} \max_{y=1}^{H_2} e_2(i|x, y) \quad (4.B)
Here, the pixel coordinates range over the height and width of the image. After contrast stretching, the image e3(i) is obtained as follows:

e_3(i) = \frac{e_2(i) - \mu_{min}(i)}{\mu_{max}(i) - \mu_{min}(i)} \quad (4.C)
As all images e3(i) are stretched by using contrast stretching,

𝔼3 = CS(𝔼2)

𝔼3 = {e3(1), e3(2), …, e3(i), …, e3(|𝔼|)} (28)

The size of the image does not change after contrast stretching; the size of the eye retinal scan path image is Size{e3(i)} = {W3 ∗ H3 ∗ C3}, with W3 = W2, H3 = H2, and C3 = 1.
2.2.4 Step 4: crop the image. The images are cropped by eliminating the undesirable portion of each image; the cropped dataset is symbolized as 𝔼4:

𝔼4 = Crop(𝔼3)

Parameters specify how many pixels to remove from the left, top, right, and bottom, denoted cl, ct, cr, and cb. The crop values, in pixels from the left, top, right, and bottom, are set as

ct = cb = 60; cl = cr = 140

𝔼4 = {e4(1), e4(2), …, e4(i), …, e4(|𝔼|)} (29)

The size of the image changes after cropping; the size of the eye retinal scan path image is Size{e4(i)} = {W4 ∗ H4 ∗ C4}.
Here, W4 = H4 = 360 and C4 = 1.
2.2.5 Step 5: down-sampling an image. Down-sampling reduces the size of each image; each image is resized, and the down-sampled dataset is symbolized as 𝔼5:

𝔼5 = DownSampled(𝔼4) = (𝔼4, [256, 256])

𝔼5 = {e5(1), e5(2), …, e5(i), …, e5(|𝔼|)} (30)

where DownSampled: 𝔼4 → 𝔼5 denotes the down-sampling function; 𝔼4 contains the original cropped images and 𝔼5 the down-sampled images. In this study, the size of the image changes after down-sampling. The size of the eye retinal scan path image is

Size{e5(i)} = {W5 ∗ H5 ∗ C5}

Here, W5 = H5 = 256 and C5 = 1.
Generally, down-sampling saves storage space; larger images also tend to cause overfitting, which decreases performance.
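Putting Steps 2-5 together, the following OpenCV/NumPy sketch mirrors the preprocessing chain; the crop margins and target size follow the values stated above, while the input file name and the assumed 480 × 640 source resolution are illustrative.

```python
# A sketch of preprocessing Steps 2-5: grayscale conversion, contrast
# stretching (eq. 4.C), cropping (ct = cb = 60, cl = cr = 140), and
# down-sampling to 256 x 256. The input file name is illustrative.
import cv2
import numpy as np

e1 = cv2.imread("scanpath.png")                        # Step 1: original image
e2 = cv2.cvtColor(e1, cv2.COLOR_BGR2GRAY)              # Step 2: grayscale, C = 1

mu_min, mu_max = float(e2.min()), float(e2.max())      # Step 3: contrast stretch
e3 = (e2.astype(np.float32) - mu_min) / (mu_max - mu_min)

ct, cb, cl, cr = 60, 60, 140, 140                      # Step 4: crop margins
e4 = e3[ct:e3.shape[0] - cb, cl:e3.shape[1] - cr]      # 360 x 360 for a 480 x 640 input

e5 = cv2.resize(e4, (256, 256), interpolation=cv2.INTER_AREA)  # Step 5: down-sample
```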
The following Algorithm 1 describes the main steps for training and testing the proposed ADET model.
The second column represents the image with corners detected using the Shi-Tomasi corner detection algorithm. The third column represents the patches of the corner-detected image. Finally, the fourth column represents the selected strong corners of the original image.
The proposed ADET model is trained to distinguish autistic from non-autistic children using eye-tracking retinal scan path images. The Adam optimizer is used and achieves good recognition accuracy, the number of training epochs is 50, and batch normalization is used to normalize the training images. Table 2 describes the parameter setup.
The dataset contains eye-tracking retinal scan path images and is used for detecting autism in children. In total there are 547 images, divided into autism and non-autism images; the split is described in Table 3.
Table 2. Parameter setup.

Configuration           Value
Optimizer               Adam
Epochs                  50
Batch size              16
Learning rate           1 × 10−4
Batch normalization     True
Execution environment   GPU
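As an illustration of the Table 2 configuration, a minimal PyTorch training-loop sketch follows; the model and the dummy dataset are placeholders standing in for the CD-ViT and the scan path images.

```python
# A minimal sketch of the training setup in Table 2 (Adam, lr = 1e-4,
# batch size 16, 50 epochs). The model and dataset are placeholders.
import torch
from torch.utils.data import DataLoader

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(256 * 256, 2))
train_set = [(torch.randn(1, 256, 256), torch.tensor(0))] * 32   # dummy data
loader = DataLoader(train_set, batch_size=16, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(50):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)   # binary classification loss
        loss.backward()
        optimizer.step()
```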
Table 4. Results on the eye-tracking retinal scan path dataset (detecting autism and non-autism children).

Metric        Value
Accuracy      97.27%
Precision     95.55%
Specificity   66.7%
Sensitivity   98.7%
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100

\text{Precision} = \frac{TP}{TP + FP} \times 100

\text{Specificity} = \frac{TN}{TN + FP} \times 100

\text{Sensitivity} = \frac{TP}{TP + FN} \times 100

The AUC is the area under the ROC curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity).
Here, True Positive (TP) describes the correctly classified ASD children; the number of non-ASD children correctly labeled as normal is called TN; the number of non-ASD children incorrectly classified as ASD children is called FP; and the number of ASD children incorrectly classified as non-ASD children is called False Negative (FN).
The preceding equations define the evaluation metrics of the suggested method, and the resulting values are reported in Table 4.
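The metrics above can be computed directly from a confusion matrix; the following sketch uses scikit-learn with placeholder labels and scores for illustration.

```python
# A minimal sketch computing the reported metrics, assuming y_true / y_pred
# are 0/1 arrays (1 = ASD); the labels and scores below are placeholders.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 1])                  # placeholder ground truth
y_scores = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.95])   # placeholder probabilities
y_pred = (y_scores >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + fp + tn + fn) * 100
precision = tp / (tp + fp) * 100
specificity = tn / (tn + fp) * 100
sensitivity = tp / (tp + fn) * 100
auc = roc_auc_score(y_true, y_scores)                  # area under the ROC curve
```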
The transformer model is one of the most efficient models for evaluating, training, and classifying medical images, and its performance depends strongly on the images’ visibility and clarity. The images are split for training, testing, and validation; the training images are processed by splitting them into patches and gathering their features.
The features are gathered and trained easily by using the corner detection process. This process extracts the features with an encoder mechanism, and the output layer performs the binary classification: for a given eye-tracking retinal scan path image, it indicates whether the child has autism or not. The results report the total number of images used for the training, testing, and validation processes, and the accuracy and loss curves of the suggested method are shown in Figures 8 and 9.
The Receiver Operating Characteristic (ROC) curve measures the algorithm’s performance during the evaluation phase; the closer the curve gets to the upper-left corner, the more efficiently the algorithm operates. The performance of the suggested method is depicted in Figure 10, with the x-axis representing the false positive rate (1 − specificity) and the y-axis representing the true positive rate (sensitivity).
Table 5 summarizes the performance of the existing models against the proposed algorithm. It can be concluded that our proposed model achieves higher accuracy and better performance than the existing models, as the suggested technique is trained and classifies with higher accuracy than the existing methodology. Even though the number of images is small, it achieves a better computational result by using the vision transformer model.
Table 5 indicates that the suggested ADET model achieves a classification accuracy that is 38.31%, 23.71%, 13.01%, 1.56%, and 18.26% better than the RM3ASD,7 MLP,24 SVM,9 CNN,10 and SVM1 methods, respectively.
Our proposed ADET model achieves a classification precision that is 39.28%, 25.66%, 13.64%, 5.11%, and 21.36% better than the RM3ASD,7 MLP,24 SVM,9 CNN,10 and SVM1 methods, respectively.
Our proposed ADET model achieves a sensitivity that is 35.26%, 31.78%, 25.04%, 35.68%, and 26.83% better than the RM3ASD,7 MLP,24 SVM,9 CNN,10 and SVM1 methods, respectively.
Our suggested ADET model achieves a specificity that is 20.36%, 18.24%, 14.69%, 3.95%, and 16.51% better than the RM3ASD,7 MLP,24 SVM,9 CNN,10 and SVM1 methods, respectively.
The statistical comparison uses the signed-rank test statistic

T = \sum_{j=1}^{S} \mathrm{sf}(y_{1j} - y_{2j}) \cdot K_j

where sf stands for the sign function, T for the test statistic, S for the sample size, y1j and y2j for the ranked pairs of the two distributions, and Kj for the rank.
4 Conclusion
Autism is a brain developmental disorder that affects children in their early stages and is prevalent all over the world. In the proposed work, an eye-tracking retinal scan path image dataset is evaluated using a transformer application; the consistency and quality of the eye-tracking data determine the accuracy of the suggested ADET model. The experimental results indicate that the suggested work is more effective: the ADET model’s corner detection technique raises the accuracy level, the Remora optimization algorithm improves the accuracy of feature selection, and the fine-tuned vision transformer model shows statistically better performance. The classification accuracy reaches 97.27%, and these results suggest that the model generalizes well in binary classification when compared to previous approaches. The suggested approach has also proven to be resilient with a small amount of training data. A significant flaw of the proposed approach is that it relies heavily on eye-tracking data, which is not always available or simple to gather, particularly in environments with limited resources or diversity. The model’s usability and scalability in real-world applications may be limited by its dependence on specific technology and controlled conditions, especially for large-scale or extensive screening operations. To address this issue, further work could improve the accuracy and generality of eye-tracking models by implementing reliable data augmentation techniques and sophisticated preprocessing approaches to manage variability in eye-tracking data. Additionally, a mobile application could be created with this screening methodology, incorporating eye-tracking data from children aged 6 to 24 months so that the screening mechanism for toddlers can be run independently on mobile devices. Moreover, electrooculogram (EOG) recordings could be used to create a dataset for detecting autism in children at an early stage by evaluating the artificial intelligence framework.
Acknowledgements
The authors would like to thank the National Engineering College, K.R. Nagar, Kovilpatti for their support by providing fellowship and
constructive suggestions that have helped to publish this research paper.
Informed consent
I certify that I have explained the nature and purpose of this study to the above-named individual, and I have discussed the potential benefits of participation in this study. The individual’s questions about this study have been answered, and we will always be available to address future questions.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
References
1. Oliveira JS, Franco FO, Revers MC, et al. Computer-aided autism diagnosis based on visual attention models using eye tracking.
Sci Rep 2021; 11: 1–11.
2. Hus Y and Segal O. Challenges surrounding the diagnosis of autism in children. Neuropsychiatr Dis Treat 2021; 17: 3509–3529.
3. Davidson C, Turner F, Gillberg C, et al. Using the live assessment to discriminate between autism spectrum disorder and
disinhibited social engagement disorder. Res Dev Disabil 2023; 134: 104415.
4. Jonsdottir SL, Saemundsen E, Gudmundsdottir S, et al. Implementing an early detection program for autism in primary healthcare:
screening, education of healthcare professionals, referrals for diagnostic evaluation, and early intervention. Res Autism Spectr
Disord 2020; 77: 101616.
5. Jeyarani RA and Senthilkumar R. Eye tracking biomarkers for autism spectrum disorder detection using machine learning and
deep learning techniques. Res Autism Spectr Disord 2023; 108: 102228.
6. Xia C, Chen K, Li K, et al. Identification of autism spectrum disorder via an eye-tracking based representation learning model. In:
Proceedings of the 7th International Conference on Bioinformatics Research and Applications, 2020, pp.59–65.
7. Mazumdar P, Arru G and Battisti F. Early detection of children with autism spectrum disorder based on visual exploration of
images. Signal Process Image Commun 2021; 94: 116184.
8. Alcaniz M, Chicchi-Giglioli IA and Carrasco-Ribelles LA. Eye gaze as a biomarker in the recognition of autism spectrum disorder using virtual reality and machine learning: a proof of concept for diagnosis. Autism Res 2021; 15: 131–145.
9. Zhao Z, Tang H, Zhang X, et al. Classification of children with autism and typical development using eye-tracking data from face-
to-face conversations: machine learning model development and performance evaluation. J. Med Internet Res 2021; 23: e29328.
10. Raj S and Masood S. Analysis and detection of autism spectrum disorder using machine learning techniques. Procedia Comput
Sci 2020; 167: 994–1004.
11. Ramji DR, Palagan CA, Nithya A, et al. Soft computing based color image demosaicing for medical image processing. Multimed
Tools Appl. 2020; 79: 10047–10063.
12. Safdar GA and Cheng X. Brain aneurysm classification via whale optimized dense neural network. Int J Data Sci Artif Intell 2024;
02: 63–67.
13. Hemamalini V, Anand L, Nachiyappan S, et al. Integrating biomedical sensors in detecting hidden signatures of COVID-19 with artificial intelligence. Measurement 2022; 194: 111054.
14. Jegatheesh A, Kopperundevi N and Anlin Sahaya Infant Tinu M. Brain aneurysm detection via firefly optimized spiking neural
network. Int J Current Bio-Med Eng 2023; 01: 23–29.
15. Kanhirakadavath MR and Chandran MSM. Investigation of eye-tracking scan path as a biomarker for autism screening using
machine learning algorithms. Diagnostics 2022; 12: 518.
16. Kollias KF, Syriopoulou-Delli CK, Sarigiannidis P, et al. The contribution of machine learning and eye-tracking technology in
autism spectrum disorder research: a systematic review. Electronics (Basel). 2021; 10: 2982.
17. Zammarchi G and Conversano C. Application of eye tracking technology in medicine: a bibliometric analysis. Vision 2021; 5: 56.
18. Tahri Sqalli M, Aslonov B, Gafurov M, et al. Eye tracking technology in medical practice: a perspective on its diverse applications.
Front Med Technol 2023; 5: 1253001.
19. Solovyova A, Danylov S, Oleksii S, et al. Early autism spectrum disorders diagnosis using eye-tracking technology. arXiv preprint arXiv:2008.09670, 2020.
20. Roth D, Jording M, Schmee T, et al. Towards computer aided diagnosis of autism spectrum disorder using virtual environments.
In: 2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), 2020, pp.115–122. IEEE.
21. Wan G, Kong X and Sun B. Applying eye tracking to identify autism Spectrum disorder in children. J Autism Dev Disord 2019;
49: 209–215.
22. Krol M and Krol ME. A novel eye movement data transformation technique that preserves temporal information: A demonstration
in a face processing task. Sensors-Basel 2019; 19: 2377.
23. Eraslan S, Yesilada Y, Yaneva V, et al. Autism detection based on eye movement sequences on the web: a scanpath trend analysis
approach. In: Proceedings of the 17th International Web for All Conference, 2020, pp.1–10.
24. Akter T, Ali MH, Khan MI, et al. Machine learning model to predict autism investigating eye-tracking dataset. In: Proceedings
of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), vol. 5–7, Dhaka,
Bangladesh, 2021, pp.383–387.
25. Zhang S, Chen D, Tang Y, et al. Children ASD evaluation through joint analysis of EEG and eye-tracking recordings with graph
convolution network. Front Hum Neurosci 2021; 15: 651349.
26. Ahmed IA, Senan EM, Rassem TH, et al. Eye tracking-based diagnosis and early detection of autism Spectrum disorder using
machine learning and deep learning techniques. Electronics (Basel). 2022; 11: 530.
27. Gaspar A, Oliva D, Hinojosa S, et al. An optimized kernel extreme learning machine for the classification of the autism spectrum
disorder by using gaze tracking images (May). Appl Soft Comput. 2022; 120: 108654.