

Detecting High-Functioning Autism in Adults Using Eye Tracking and Machine Learning
Victoria Yaneva, Le An Ha, Sukru Eraslan, Yeliz Yesilada, and Ruslan Mitkov

Manuscript received September 18, 2019; revised February 18, 2020; accepted April 24, 2020. Date of publication April 30, 2020; date of current version June 5, 2020. (Corresponding author: Victoria Yaneva.) Victoria Yaneva, Le An Ha, and Ruslan Mitkov are with the Research Institute for Information and Language Processing, University of Wolverhampton, Wolverhampton WV1 1LY, U.K. (e-mail: [email protected]; [email protected]; [email protected]). Sukru Eraslan and Yeliz Yesilada are with the Computer Engineering Program, Middle East Technical University Northern Cyprus Campus, 99738 Mersin, Turkey (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TNSRE.2020.2991675

Abstract — The purpose of this study is to test whether visual processing differences between adults with and without high-functioning autism captured through eye tracking can be used to detect autism. We record the eye movements of adult participants with and without autism while they look for information within web pages. We then use the recorded eye-tracking data to train machine learning classifiers to detect the condition. The data was collected as part of two separate studies involving a total of 71 unique participants (31 with autism and 40 control), which enabled the evaluation of the approach on two separate groups of participants, using different stimuli and tasks. We explore the effects of a number of gaze-based and other variables, showing that autism can be detected automatically with around 74% accuracy. These results confirm that eye-tracking data can be used for the automatic detection of high-functioning autism in adults and that visual processing differences between the two groups exist when processing web pages.

Index Terms — Autism, eye tracking, web, screening, diagnostic classification, detection.

I. INTRODUCTION

Many disorders and diseases that do not have a clinical biomarker are at risk of being either misdiagnosed or diagnosed during their later stages. One such neurodevelopmental disorder is Autism Spectrum Disorder (ASD), which affects communication and social interaction [1]. As autism is a highly heterogeneous condition, the term "spectrum" is used to signify the different types and levels of support that different individuals might need, where "high-functioning autism" signifies a high level of independence and ability. While individuals with high-functioning autism have a normal IQ range, they may process information differently, especially in situations requiring social interaction, the understanding of semantics and pragmatics, or the transfer of knowledge from one domain to another. Many people on the spectrum may also have atypical sensory processing and attention-shifting patterns (Section I-A), as well as a preference for specific routines [2].

Currently, the ASD diagnostic procedure is a highly subjective assessment process. It is restricted to behavioural, historical, and parent-report information [3], [4], which is then interpreted by a qualified clinician. The case-by-case basis of the decision is necessary as it allows clinicians to treat each patient according to their circumstances but, at the same time, it leads to a lack of consistency and reliability [5], [6].

Obtaining an early diagnosis is more likely when ASD symptoms are severe [5], and, conversely, people with high-functioning autism seeking a diagnosis in their adulthood are especially difficult to diagnose [7]. Some of the reasons are that the symptoms of high-functioning autism are not as obvious; that coping strategies developed throughout life (e.g., learning to avoid triggers) mask the presentation of relevant symptoms; and that, unlike for children, critical incidents with adults are not monitored by school staff or parents. It would therefore be beneficial to develop a screening method for identifying high-functioning autism that does not rely on parental and school reports and that is sensitive enough to capture the fine-grained differences between adults who are on the spectrum and those who are not.

In this paper, we test the hypothesis that visual processing differences between adults with and without high-functioning autism captured through eye tracking can be used to detect autism automatically. This approach is based on the idea that the eye-tracking data captures differences in the cognitive profiles of the two groups when executing information-searching tasks, and that these differences, as learned by a machine-learning classifier, can be used as a marker of the condition.

A. Autism Detection

The most rigorously validated autism-detection models for adults which use behavioural data are based on resting-state fMRI, owing to the availability of data sets collected in different centres and used as unseen data for evaluation. The accuracy of these classifiers varies between 79% [8] and 86% [3] for leave-one-out cross-validation (LOOCV) and between 71% [8] and 83% [9] when tested on unseen data. Another study using only LOOCV reports 76.7% [10]. The best result of 86% (subsequently 80% when evaluated on unseen data) is based on training data from 12 participants with ASD and 12 control participants and achieves 100% sensitivity (recall) and 66.7% specificity. While these studies provide a promising direction in autism detection, collecting the fMRI data is a very expensive and obtrusive procedure and is not suitable for pregnant women, nor for people with sensory issues, metal implants, claustrophobia, head trauma, etc., which limits the applicability of the approach. Nevertheless, to the best of our knowledge, these results represent the state of the art in automatic autism detection with behavioural data.

Studies using EEG and speech data report results from 10-fold cross-validation instead of LOOCV, where the accuracy is 94% for EEG data [11] and 93% for speech data [12]. These are potentially overoptimistic, as different data segments from the same participant are assigned to the training and testing sets, thus increasing the similarity between the two. In other words, this evaluation set-up does not correspond to real-world applications, where the system has to categorize a user, portions of whose data are not included in the training set (this record-wise versus subject-wise distinction is illustrated in the sketch at the end of this section).

The differences in visual attention between people with autism and neurotypical people are well documented in the literature (e.g., [13]–[18]). Atypical visual-attention patterns reflect higher-order differences in information processing, as the focus of attention directs the input of information from the environment. Visual attention is related to concentration, interest, perception, learning, the ability to form joint attention, cognitive effort and other indicators, the combination of which can be used to detect autism. For example, many people with ASD tend to avoid the eye region when looking at faces [13], [16], and this phenomenon has been extensively investigated in relation to social interaction difficulties, which are one of the diagnostic criteria for ASD. Furthermore, eye-tracking data from visual attention tasks has been shown to correlate well with brain activity differences. Evidence from a large sample of 294 ASD subjects suggests that the differences in visual attention to the eye region could be due to a smaller volume of the right anterior cerebellum in subjects with ASD and that "eye tracking may be a promising neuro-anatomically based stratifying biomarker of ASD" [13].

Eye tracking has mainly been explored as a biomarker of autism in infants, toddlers, and young children [14]–[17]. A large contribution of these studies relates to the challenges of accurately recording gaze in such young subjects, and they establish eye tracking as a promising direction for autism detection. With regard to young children (between the ages of four and six), [16] reports a study involving ASD and typically-developing children watching a silent video of a woman mouthing the English alphabet. The reported accuracy is 85.1%. This was achieved using five-fold cross-validation with no note on how the data was split (i.e., whether segments of the data of the same participant were used for training and testing, as in the EEG and speech studies above). In another study, [17] perform classification of children between the ages of four and 11 looking at pictures of faces. Using LOOCV, the reported accuracy is 88.51%. It was unclear, however, how severe the autism of the children from the ASD group was. The study presented here differs from previous research in that it involves adults as opposed to children, and it focuses on high-functioning autism, which can be challenging to detect.

B. Overview of the Proposed Approach

Unlike previous research, the focus of this paper is on testing whether gaze data can be used for fine-grained distinction between adults with and without high-functioning autism and whether data from familiar activities such as web-page processing can be useful for this purpose. The motivation behind this approach is first and foremost related to the fact that eye-movement data does not rely on subjective interpretation of whether or not a given behaviour occurred, and that: i) looking for information on web pages is a highly familiar and naturalistic task; ii) gaze can be reliably captured through eye tracking; and iii) eye-tracking data is relatively less costly and easier to record, as devices and software that use gaze for navigation or gaming have been available on the market for years (e.g., Samsung Galaxy S4, Tobii Eye Trackers for PC Gaming, etc.). It is important to note that the proposed screening method is not intended to substitute clinical diagnostic procedures where these are available. Rather, its purpose is to identify autistic traits at a wider level and to provide the means to refer for further assessment those people who might be at risk.

The experiments presented in this paper test our main hypothesis using data derived from different participant groups, stimuli and tasks in two independent data collection rounds involving a total of 71 unique adult participants (31 with high-functioning ASD and 40 control). In both rounds of data collection, the two groups of participants completed information-processing tasks while looking at web pages and having their eye movements recorded by an eye tracker. Using data from the first round only, we trained an initial autism-detection tool which achieved an accuracy of 75% and was presented in [19]. Encouraged by this result, we proceeded to collect more data and investigate new research questions using both data sets. As a result, the present manuscript has the following distinct objectives, which have not been previously explored:

1) To test whether classifiers based on eye-tracking data from visual processing tasks can be used to detect high-functioning autism in adults with consistent accuracy when trained on new data using a different stimulus set, as well as different participants and tasks. The consistency of the prediction across various conditions is informative of the reliability of the approach.
2) To test whether people with autism exhibit different visual processing patterns with and without specific information-location tasks across different time conditions. The implications of this question are related to distinguishing whether the atypical attention patterns are directly influenced by having a specific information-location task or exist independently as a natural approach to visual processing in the absence of an explicit instruction. In addition to task effects, here we are also concerned with the effects of task duration.
3) To investigate the effects on the classification accuracy of a number of factors that were not previously investigated. These include a number of gaze-based and page-related features, a larger number of tasks, particularly in terms of eliciting attention-shifting differences, as well as different approaches and granularity levels for defining the Areas of Interest (AOIs) on the web pages.
4) To aid independent research by making new data available.
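To make the cross-validation caveat from Section I-A concrete: the contrast is between record-wise splitting, where data segments from one participant may land in both the training and the testing set, and subject-wise splitting, where each participant is confined to one side. The sketch below is illustrative only — the data is synthetic, the participant counts are arbitrary, and Python with scikit-learn is used for brevity even though the classifiers in this paper were trained in R.

```python
# Record-wise vs. subject-wise cross-validation (illustrative, synthetic data).
# In this paper's setting, each row would be one "behaviour" (the feature
# vector for one AOI on one page) and `groups` the participant it came from.
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))           # 200 behaviours, 5 gaze features
y = rng.integers(0, 2, size=200)        # 0 = Control, 1 = ASD (random here)
groups = np.repeat(np.arange(20), 10)   # 20 participants, 10 behaviours each

# Record-wise: the same participant can appear on both sides of the split,
# letting the classifier exploit participant identity (overoptimistic).
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    print("record-wise participant overlap:", len(set(groups[tr]) & set(groups[te])))

# Subject-wise: whole participants are held out, matching the real use case
# of categorizing a user whose data was never seen during training.
for tr, te in GroupKFold(n_splits=5).split(X, y, groups):
    print("subject-wise participant overlap:", len(set(groups[tr]) & set(groups[te])))
```

The record-wise folds report a large participant overlap between training and testing, while every subject-wise fold reports zero — the property that the evaluation in Section III-C preserves.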


In the next sections, we refer to experiments conducted with the two rounds of data collection as Study 1 and Study 2, respectively. For the purpose of direct comparison and joint analysis, we also present key results from the study reported in [19], which are clearly marked in the relevant tables.

II. METHOD

Prior to commencing the data collection, ethical approval was sought and granted by the University of Wolverhampton Ethics Committee (Faculty of Science and Engineering). Eye-tracking data, code, and materials are available in our external repository at https://tinyurl.com/detectingautism.

A. Participants

Both Study 1 and Study 2 involved a group of participants with autism and a control group of participants without ASD. Data was collected from a total of 71 unique participants (31 ASD and 40 control) and retained for a total of 68 unique participants (28 ASD and 32 control). Six participants with ASD and three control-group participants took part in both experiments. The initial data collected for Study 1 comprised 18 ASD-group participants and 18 control-group participants, whereas Study 2 included 19 ASD-group participants and 25 control-group participants. Details about demographic characteristics are presented in Table I. All participants reported that they used the web daily, except for one in Study 2 who reported that she used the web less than once a month.

TABLE I
INFORMATION ABOUT THE PARTICIPANTS

Recruitment: All ASD participants were recruited through a UK charity organisation and the Enabling Centre at the University of Wolverhampton. For both institutions, they had to provide a copy of their formal diagnosis to access the provided services. The control-group participants were recruited through snowball sampling from the area of Birmingham, UK.

Inclusion and Exclusion Criteria: The inclusion criterion for the ASD group was a formal diagnosis of autism, and all participants met the ADOS diagnostic criteria [20]. Some participants were diagnosed before the introduction of DSM-5 in 2013, so other acceptable diagnoses were "High-Functioning Autism" or "Asperger's syndrome". All participants had to be over 18 years of age and able to use a computer. The exclusion criteria were a formal diagnosis of any degree of intellectual disability or a reading disorder, as well as conditions affecting vision that could not be corrected using glasses or lenses. For the control group, the inclusion and exclusion criteria were similar, except for having a diagnosis of autism. To ensure that no participants with a high incidence of autistic traits were included in the control group, all control participants completed the 50-item Autism Quotient (AQ) test [21]. This test is widely used as a screening tool for autism by general practitioners before providing a referral for expert diagnosis.

Excluded participants: In Study 1, data from five participants (three with ASD and two control) was excluded because of calibration issues and inaccuracies due to head movements. Subsequently, data from one randomly selected control participant was also excluded to have balanced classes. For Study 2, five control-group participants were excluded (one for having a high score on the AQ test, three for reporting school referrals for dyslexia diagnosis, and two for calibration issues).

Level of Independence: All participants were highly independent adults, none of whom relied on a caregiver. Twenty participants with ASD were either employed or enrolled in a higher-education degree, while eight lived independently in council housing and received disability benefits. The control-group participants were either employed or in education.

Education: Table I provides details on the number of years spent in education. In Study 2, we also asked about the highest level of completed degree. From the ASD group, eight people had completed high school (of them, two were enrolled in an undergraduate degree, three were in employment and three relied on benefits), eight had obtained Bachelor's degrees and three had Master's degrees. Of the control group, three people had completed high school (and were enrolled in an undergraduate degree), 10 people had Bachelor's degrees, five people had Master's degrees and one person had a Ph.D.

B. Materials

Web pages are hypothesized to be a particularly suitable stimulus set for several reasons. They offer a variety of both textual and visual stimuli organized for a specific semantic purpose, which allows for a number of visual searching strategies to be employed. Exploring web pages is also a highly familiar and naturalistic task for many people, which is important, as using data from everyday tasks has the potential to unveil differences that could be captured by means which are independent of laboratory equipment. Adults with autism are also known to focus for longer on images in text-and-image pairs compared to controls [22]. For web pages, the scanpaths of viewers with autism have been shown to exhibit higher variance [23]. These findings lead us to expect that web pages are suitable stimuli for investigating the use of visual processing differences for autism detection.

The materials used in Study 1 were six web pages, two of which had low visual complexity (Apple and Babylon), two had medium complexity (AVG and Yahoo) and two had high complexity (GoDaddy and BBC). The details for the selection of materials in Study 1 are presented in [19]. The screenshots of these pages are available in our open repository.

To select the web pages for Study 2, we followed a similar procedure to Study 1, where we analysed the top 100 websites listed by Alexa.com. We first removed any duplicates and pages that were not in English, as well as those designed as search engines and those that required authentication. We then computed the visual complexity scores for the home pages of the remaining websites by using the ViCRAM tool [24] and randomly selected eight websites. We ensured that the home pages of four of them had low visual complexity (WordPress, WhatsApp, Outlook, Netflix), while the home pages of the other four websites had high visual complexity (YouTube, Amazon, Adobe, BBC). The screenshots of these pages are also available in the repository.

C. Tasks

Both Study 1 and Study 2 included two different kinds of tasks, which can be found in our open repository. These tasks were developed based on the "hierarchy of information needs" as defined by [25], which differentiates between the basic acquisition of simple facts (e.g., browsing a web page), the ability to look up the answer to a question on the page (e.g., searching for the relevant element), and the ability to combine information from multiple facts in order to arrive at a new piece of information (referred to as "synthesis"). Study 1 included browse and search tasks, whereas Study 2 included browse and synthesis tasks. In each study, the two tasks were executed in a counterbalanced order for each participant. All questions were posed verbally, and answering the questions of the search and synthesis tasks required the participant to either locate the correct element or say what the answer was, respectively. No interaction with the web pages, such as scrolling, clicking, or typing, was involved.

Study 1: There were two different kinds of tasks:

• Browse task: Participants were instructed to explore the web pages and were free to focus for as long or as little on any given element. The time limit for this task was 120 seconds per page, but the participants could move on to the next page when they had finished browsing the current one. This task allowed for capturing differences in the visual processing of page elements without the interference of a specific task.

• Search task: Participants were required to locate a specific element on the web page in order to answer a question. An example is the question: "Can you find a telephone number for technical support and read it?". This was the most intensive task, as the time limit was 30 seconds for two questions per page.

Study 2: As will be seen, the Browse task from Study 1 gave promising classification results, but it was not possible to find out whether this was because of a different approach to browsing in the two groups, or because the ASD group spent longer on the task and had more fixation points (reasons for the longer times could have been related to a higher level of conscientiousness or following instructions more strictly). To clarify this, in Study 2 we capped the time limit for the Browse task at 30 seconds per page, which meant that each participant spent the same time browsing each page. It is important to note that this is not a direct comparison of two time limits for the same task, as we also had a different set of participants and web pages. Instead, the aim was to find out whether a discrimination signal would occur even if the possibility for longer times between participants were removed. With regard to the Search task, we hypothesized that not setting such a short time limit (30 sec) and increasing the level of difficulty could give better results for the classification of the two groups of participants (the time-limited condition had already provided conclusive results about differences in the searching strategies). In Study 2 we allowed longer times for the equivalent of the Search task (120 seconds per page) and added extra difficulty to it, as explained below.

• Browse task 2: The Browse task in Study 2 was the same as in Study 1, with the sole difference that time was limited to 30 seconds per page. Each page would change after the 30 seconds had passed.

• Synthesis task: Similar to the Search task in Study 1, the participants were required to answer questions about the information provided on the web page. However, these questions required the participants to locate at least two elements on the page and compare them, to arrive at a third piece of information, which was implicit (the correct answer). An example for the Netflix page is "How much more would you have to pay compared to the basic plan if you wanted to have ultra HD?". To answer this question, the participant had to locate the price of the basic plan and the price of the ultra-HD plan, and compare the two. Having the added difficulty of inferential thinking was a potentially plausible way to enhance the signal, as it better represents the different cognitive profiles of the two groups. The time limit per web page was 120 seconds, but the participants were free to move on earlier if ready.

D. Apparatus

Both studies used a 60 Hz Gazepoint GP3 video-based eye tracker with an accuracy of 0.5–1 degree of visual angle. The screenshots were presented on 19" and 17" LCD monitors for Studies 1 and 2, respectively. The distance between each participant and the eye tracker was controlled by using a system-integrated sensor, and was roughly 65 cm. The device recorded data from the right eye. Fixations were extracted using Gazepoint's built-in algorithm based on the position-variance technique, where a sequence of gaze data estimates spatially located within a local region are determined to belong to the current fixation, while subsequent data outside of this local region is identified as beginning a new fixation [26] (a generic sketch of this dispersion logic is given at the end of this section).

E. Procedure

All recordings were completed in a quiet room with only the researcher and the participant present. After getting familiar with the purpose and procedure of the experiment, all participants signed their consent forms, filled in the demographic questionnaires and the Autism Quotient test (control participants only), and then performed a nine-point calibration of the device. After a successful calibration, each participant was presented with the web pages in a randomized order, to control for fatigue and memory effects. The two tasks per study were presented in a counterbalanced order for each participant. All questions and answers were presented verbally after the relevant stimulus was presented on the screen. After the completion of the experiment, the participants were debriefed.
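Gazepoint does not publish the exact parameters of its built-in fixation filter, so the following is only a generic sketch of a dispersion-based ("position variance") fixation detector in the spirit of [26]. The 30-pixel dispersion and 100 ms duration thresholds are assumptions made for illustration, not the GP3's actual settings, and Python is used although no code accompanies the original study.

```python
# Generic dispersion-threshold fixation filter (sketch of the position-variance
# idea in Section II-D). Samples are (t, x, y) gaze estimates at 60 Hz.
from typing import List, Tuple

Sample = Tuple[float, float, float]            # (time s, x px, y px)
Fixation = Tuple[float, float, float, float]   # (t_start, t_end, cx, cy)

def detect_fixations(samples: List[Sample],
                     max_dispersion: float = 30.0,  # px; assumed threshold
                     min_duration: float = 0.1      # s; assumed threshold
                     ) -> List[Fixation]:
    fixations: List[Fixation] = []

    def close(window: List[Sample]) -> None:
        # Keep the window as a fixation only if it lasted long enough.
        if window and window[-1][0] - window[0][0] >= min_duration:
            fixations.append((window[0][0], window[-1][0],
                              sum(p[1] for p in window) / len(window),
                              sum(p[2] for p in window) / len(window)))

    window: List[Sample] = []
    for s in samples:
        window.append(s)
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        # Dispersion of the current local region: x extent plus y extent.
        if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
            close(window[:-1])   # the new sample left the local region...
            window = [s]         # ...so it begins a new candidate fixation
    close(window)                # flush the final window
    return fixations
```

At the GP3's 60 Hz sampling rate, the assumed 100 ms minimum duration corresponds to roughly six consecutive samples remaining within the local region.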


TABLE II
THE LIST OF THE FEATURES ALONG WITH THEIR EXPLANATION

Fig. 1. Types of Areas of Interest (from left to right): 2 × 2 generic grid, 4 × 4 generic grid and page-specific AOIs.

III. CLASSIFICATION EXPERIMENTS

A. Defining the Areas of Interest

In this paper, we extend the exploration of the effects of AOI granularity level on classification performance by adding more extraction configurations than previous work. This allows us to investigate the trade-off between capturing detailed page-specific information on one hand, and not introducing extra noise through too fine a segmentation on the other. We define the AOIs using two different approaches: (1) generic and (2) page-specific (see Figure 1). Both approaches are systematic and can be replicated with other web pages.

Page-specific AOIs: Page-specific AOIs correspond to visual elements on web pages. For identifying page-specific AOIs, we used the extended version of the Vision-Based Page Segmentation (VIPS) algorithm [27]. This algorithm divides web pages into their elements by using both their source code and visual representation. As a result, it arranges the elements in a hierarchical form, where the deeper levels contain more and smaller elements. Since a user study conducted by [27] shows that the fifth level is the most preferred level, we used the elements from that level. However, if a given task could be completed by fixating a single element, then we used the deeper levels. An example is presented in Figure 1. There were a total of 112 page-specific AOIs identified in the first study and a total of 213 page-specific AOIs in the second study.

Generic AOIs: The generic AOIs consisted of simple square grids. We previously experimented with a 2 × 2 grid, but now we also added a 4 × 4 grid for the data collected in both Studies 1 and 2. Unlike the page-specific AOIs, this grid-based segmentation ensures that none of the fixations are lost.

B. Features

We experiment with gaze-based and other features, as presented in Table II. We investigate 12 different cases: two eye-tracking studies, each with two types of tasks (Browse and Search for Study 1 and Browse and Synthesis for Study 2), and three kinds of AOIs for each task (Page-specific, 2 × 2 Generic, and 4 × 4 Generic). Therefore, we create a different table (i.e., dataset) for each case, where each row represents the values of the features for each AOI on each page for each participant.

C. Experimental Setting

Several experiments were performed, taking into account the different tasks and AOI identification approaches. A set of standard classification algorithms were initially tested by using R, where the logistic regression algorithm performed best.¹ The evaluation of performance was based on training and testing sets for 100 random splits of the data.² In Study 1, we report results using data from five randomly-selected participants per group for testing and data from the remaining ten participants per group for training. In Study 2, we report results from six random participants per group for testing and the remaining 13 participants per group for training. In both studies the best results were achieved when splitting the training and testing participants in a 70% : 30% ratio for each fold. This randomization was performed for each training and testing iteration. The model was first trained to predict which group a behaviour belongs to. In this context, a behaviour is considered a vector of feature values corresponding to a given AOI for a given participant and web page. After that, the evaluation compares the number of times a behaviour from that participant is classified as belonging to each group (ASD or Control) and considers the group with the higher number as the predicted group. Since we performed evaluation for 100 random splits, the accuracy for each split was then averaged and reported together with the 95% confidence intervals (CI).

¹ See repository for result comparison to other classifiers.
² We experimented with different training and testing sizes and report the best performing classifiers; however, the full results for each different data split and each page configuration are available in our repository.
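To summarise this protocol in runnable form: behaviours are classified individually, predictions are aggregated per participant by majority vote, and accuracy is averaged over 100 random subject-wise splits. The sketch below uses synthetic data and scikit-learn's logistic regression as a stand-in for the R implementation with default parameters that the study actually used; all sizes and names are illustrative assumptions, and the split is simplified to a plain 70:30 participant split rather than the per-group balancing described above.

```python
# Sketch of the Section III-C evaluation: behaviour-level classification,
# per-participant majority vote, 100 random 70:30 subject-wise splits.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_parts, beh_each, n_feat = 26, 48, 6                # illustrative sizes
part_ids = np.repeat(np.arange(n_parts), beh_each)   # participant of each row
labels = part_ids % 2                                # 0 = Control, 1 = ASD
X = rng.normal(size=(len(part_ids), n_feat)) + 0.3 * labels[:, None]

accuracies = []
for _ in range(100):                                 # 100 random splits
    ids = rng.permutation(n_parts)
    test_ids = set(ids[: int(0.3 * n_parts)].tolist())  # ~30% held out
    test = np.isin(part_ids, list(test_ids))
    clf = LogisticRegression().fit(X[~test], labels[~test])
    pred = clf.predict(X[test])
    correct = 0
    for pid in test_ids:
        votes = pred[part_ids[test] == pid]          # this participant's rows
        majority = int(votes.mean() >= 0.5)          # majority-vote label
        correct += int(majority == pid % 2)
    accuracies.append(correct / len(test_ids))

mean_acc = float(np.mean(accuracies))
half_ci = 1.96 * np.std(accuracies, ddof=1) / np.sqrt(len(accuracies))
print(f"accuracy = {mean_acc:.3f} (95% CI ±{half_ci:.3f})")
```

The per-participant vote is what allows a noisy behaviour-level classifier to still produce a stable participant-level decision, which is the quantity reported in Tables III and IV.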

IV. RESULTS

We trained several logistic-regression classifiers with the default parameters in R using combinations of feature sets. We used all gaze features together with non-gaze ones for all pages and selected combinations of pages. The main results are presented in Table III, with additional results for non-gaze variables that did not provide a clear pattern (namely Gender and Visual Complexity) presented in the repository. The results for individual pages for the Browse 2 and Synthesis tasks are presented in Table IV, while those for the remaining tasks are presented in the repository.

TABLE III
EVALUATION RESULTS FOR ALL TASKS (95% CONFIDENCE INTERVALS GIVEN IN PARENTHESES)

A. Browse Tasks

Both the 30-second condition and the 120-second condition provide a discrimination signal, indicating that, indeed, the two groups have different browsing strategies. The best results for the two conditions are achieved when using only the most discriminatory web pages (Apple and AVG for Browse 1 and Outlook for Browse 2; confusion matrix presented in the repository). The condition with a longer time limit, however, provides better results (i.e., 0.74 compared to 0.65), most likely due to the longer times spent by the ASD participants.³ A similar result of 0.71 is achieved both by extracting the data from page-specific AOIs and from 4 × 4 grid AOIs, and compares to 0.66 for the 2 × 2 grid, indicating that the AOI granularity level is of higher importance compared to AOI type for a Browse task with a generous time limit. For the 30-second time limit condition, page-specific AOIs captured the differences between the groups better than the generic grids (0.65 compared to 0.60 for the 2 × 2 grid and 0.61 for the 4 × 4 grid). These results are related to specific web pages providing a better signal than others. If we look at results where not all of the "best" pages are included, the best performance of 0.662 is achieved by combining AVG + GoDaddy + BBC for Browse 1. The results for the rest of the page combinations can be found in our repository. Other features such as AOI ID and Media ID do not improve the performance and, in many cases, slightly decrease the accuracy, except for Browse 1, where adding Media ID to the page-specific AOIs improves the result. In both tasks there is no clear pattern related to the visual complexity of the pages, as shown in Table IV and consistent with findings from [19]. The Gender variable generally lowered the results.

³ While this seems a probable reason for the better classification accuracy, it is also possible that the superior results might be related to the different pages and/or participants. While this possibility cannot be ruled out using the presented design, it remains a valid conclusion that there is a discrimination signal for both time conditions, participant groups and stimuli sets.

B. Search and Synthesis Tasks

The best result of 0.75 was achieved for the Search task using the Apple and Babylon web pages (Table III). This means that increasing the time limit and level of difficulty (i.e., the Synthesis task) did not succeed in providing a stronger discrimination signal but instead provided a comparable accuracy of 0.73. These results show that even when using a different participant group and stimuli set, eye-tracking data can be used to classify the two groups with a consistent level of accuracy. Notably, the 4 × 4 grid, which was not previously explored in [19], provides the same accuracy for the two conditions. This might be due to the 4 × 4 grid providing the same number of AOIs, while the page-specific AOIs for the different sets of web pages differed in number (discussed in Section V). Similar to all previous results, the best performance is achieved when using data from selected web pages (Apple + Babylon for the Search task and Amazon + Outlook for the Synthesis task). Interestingly, in all other tasks, combining the pages with the best individual results led to optimal accuracy, except for the Synthesis task, where the best performing individual page (BBC) was not included in any of the optimal combinations. In most cases, adding variables such as AOI ID and whether or not the AOI contained the correct answer for the Search task decreased the prediction accuracy. Again, no clear pattern was observed with regard to page visual complexity (Table IV), which was consistent with findings in [22]. The Gender variable generally lowered the results.

V. DISCUSSION

As can be seen from the results, i) all tasks provided a discrimination signal, and ii) the classifiers achieve a comparable accuracy to the one presented in the first study when using a different set of stimuli and participants. The best results were achieved using data from the Search task (0.75), followed by data from the first Browse task (0.74), the Synthesis task (0.73) and the second Browse task (0.65 for the 30-second condition). Examination of the 95% confidence intervals reveals that the models are robust in their predictions. Overall, these results confirm that using visual-processing differences as captured through eye-tracking data from web-searching tasks is a promising direction for the automatic detection of autism.


TABLE IV
EVALUATION RESULTS FOR INDIVIDUAL WEB PAGES FOR THE BROWSE (STUDY 2) AND SYNTHESIS TASKS WITH PAGE-SPECIFIC AOIS (95% CONFIDENCE INTERVALS GIVEN IN PARENTHESES)

Task Effects: The type of task affected the classification, where the best performance was achieved using the Search task and the time-unlimited Browse task. Increased task complexity did not amplify the discrimination signal (Synthesis). While the presented design was not intended to test this explicitly, a possible explanation may be that the higher-order inferential processes required to complete this task were executed covertly, without being reflected in the eye-tracking data. Therefore, future research should test whether adding together more data points from simple Search tasks would provide better accuracy, as opposed to designing more complex tasks.

Differences in Browsing Strategies: Study 1 did not provide conclusive results to decide whether the discrimination signal for the Browse task reflected different browsing strategies in the two groups or simply reflected longer browsing times within the ASD group. Limiting the time to 30 seconds in Study 2 showed that the two groups have different browsing strategies even when the time for browsing is controlled. This finding indicates that the visual processing of web pages, without the interference of a specific task, works differently in the two groups. The significance of this goes beyond the task of automatic autism detection, as it implies that people with autism may scan the elements of a web page in a different order or be drawn to specific elements (e.g., images) more than other elements (e.g., text), as suggested by [22]. Further analysis will be needed to confirm this.

AOI Identification Effects: We compared two types of AOIs: page-specific and generic. The results show that the AOI type has an effect on the performance. In the discussion here we aim to identify whether this significance is in favour of the AOI content (i.e., what the AOIs capture) or the AOI granularity. In terms of AOI content, information from the page-specific AOIs seems to offer superior performance, as the best results for Search, Browse 1, and Browse 2 were achieved using page-specific AOIs. At the same time, having generic AOIs for the two Browse tasks did not affect classification accuracy as much as it did for the Search task, indicating that defining the AOIs in a meaningful way with respect to the page is more beneficial for Search tasks, as it better captures differences in the visual-processing patterns. When all page elements have equal importance, as was the case with the two Browse tasks, the relevance of the areas to the content is lower. The effects of AOI content do not appear in isolation from granularity effects. While the 2 × 2 grid consistently provided the worst results for all tasks, the 4 × 4 grid had similar performance for Browse 1 and outperformed the page-specific AOIs for Synthesis. The latter could be explained by the different number of page-specific AOIs in the two studies as identified by the VIPS algorithm (112 for Study 1, corresponding to an average of 18.6 AOIs per page, and 213 for Study 2, corresponding to an average of 26.6 AOIs per page). Having more page-specific AOIs for Study 2 may have increased the level of noise and decreased accuracy. These results suggest a trade-off between having sufficiently detailed AOIs and not introducing too much noise. The optimal number of AOIs is best defined empirically.

Effects of Non-gaze Variables: The visual complexity of the individual web pages does not provide a clear pattern associated with better accuracy for any of the four tasks. Participant gender had a mostly negative effect on the results, while Media ID generally lowered the accuracy, except for the best results for Browse 1, where it introduced a peak in the level of accuracy. The variable related to the correct answer to a Search task had a slight positive effect on the Search task classifiers for the individual pages, but when added to the best classifier it reduced the accuracy from 0.75 to 0.73.

Comparison with Prior Work: As mentioned in Section I-A, studies using fMRI data represent the state of the art in autism detection using objective markers (as opposed to subjective ones). Our results are comparable to the accuracy range of 71% to 86.7% from the fMRI studies, with several important distinctions. First, some of the fMRI studies provide higher accuracy, and a few of them test the validity of their classifiers on unseen data collected in other centres. This is currently not feasible for our approach due to the task-specific nature of the data and the unavailability of comparable data collected independently. Nevertheless, we compare tasks using different stimuli and participant groups and we show that all the main effects from the first study are confirmed by the second one. These results are also slightly lower compared to the accuracy achieved using eye-tracking data for detecting autism in children. This is expected, because the comparatively subtler differences between adults with and without high-functioning autism make the task more challenging. It is also possible that the signal extracted using facial stimuli is stronger than the signal extracted using web pages.

Limitations: Some of the limitations of this study are that web-searching tasks are not suitable for very young children and that there is a relatively small number of participants (comparable with those of state-of-the-art fMRI studies for autism detection but smaller than the number needed for large validation studies). The current design is also not able to provide a conclusive explanation of some of the results, e.g., what makes some pages more suitable for eliciting larger between-group differences and why the Synthesis task did not result in a better classification accuracy than the Search task, even though it is more complex. We can only speculate that the reasons for this are related to the covert nature of the inferential process required to solve the Synthesis questions, or that the questions indicated too clearly where the participants needed to look, thus masking their natural searching patterns.

Future Work: In spite of the above limitations, the results from this study suggest various possibilities for future work. These include the development of a serious game that does not rely on an eye tracker but logs visual processing differences differently; using behavioural data obtained in a natural environment; and investigating whether gaze data could be used to detect other clinical conditions that exhibit atypical attention patterns, such as dementia, schizophrenia, and ADHD, among others. For pursuing all these goals, there are three general questions that need to be explored: i) what makes a good task for eliciting larger between-group differences; ii) how to record these behaviours outside of laboratory conditions; and, most importantly, iii) how to do this ethically.

VI. CONCLUSION

This paper presented two separate studies into automatic autism detection based on visual processing differences, using different participant and stimulus sets. Both studies confirmed the following main effects: i) that visual processing differences could potentially be used as a marker of autism with a comparable accuracy across participants and stimulus sets, ii) that people with autism process the information contained in web pages differently with and without specific information-location instructions and across different time conditions, and iii) that the content and granularity level of the areas of interest have an impact on the classification accuracy, while the visual complexity of the pages and the participant gender do not.

REFERENCES

[1] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, 5th ed. Arlington, VA, USA: American Psychiatric Association, 2013.
[2] U. Frith, Autism: Explaining the Enigma. Oxford, U.K.: Blackwell, 2003.
[3] A. Bernas, A. P. Aldenkamp, and S. Zinger, "Wavelet coherence-based classifier: A resting-state functional MRI study on neurodynamics in adolescents with high-functioning autism," Comput. Methods Programs Biomed., vol. 154, pp. 143–151, Feb. 2018.
[4] T. Falkmer, K. Anderson, M. Falkmer, and C. Horlin, "Diagnostic procedures in autism spectrum disorders: A systematic literature review," Eur. Child Adolescent Psychiatry, vol. 22, no. 6, pp. 329–340, Jun. 2013.
[5] L. D. Wiggins, J. Baio, and C. Rice, "Examination of the time between first evaluation and first autism spectrum diagnosis in a population-based sample," J. Develop. Behav. Pediatrics, vol. 27, no. 2, pp. S79–S87, Apr. 2006.
[6] M. Davidovitch, N. Levit-Binnun, D. Golan, and P. Manning-Courtney, "Late diagnosis of autism spectrum disorder after initial negative assessment by a multidisciplinary team," J. Develop. Behav. Pediatrics, vol. 36, no. 4, pp. 227–234, May 2015.
[7] C. Murphy et al., "Autism spectrum disorder in adults: Diagnosis, management, and health services development," Neuropsychiatric Disease Treat., vol. 12, pp. 1669–1686, Jul. 2016.
[8] J. S. Anderson, J. A. Nielsen, A. L. Froehlich, M. B. DuBray, T. J. Druzgal, A. N. Cariello, J. R. Cooperrider, B. A. Zielinski, C. Ravichandran, P. T. Fletcher, A. L. Alexander, E. D. Bigler, N. Lange, and J. E. Lainhart, "Functional connectivity magnetic resonance imaging classification of autism," Brain, vol. 134, no. 12, pp. 3742–3754, Dec. 2011.
[9] L. Q. Uddin et al., "Salience network-based classification and prediction of symptom severity in children with autism," J. Amer. Med. Assoc. Psychiatry, vol. 70, no. 8, pp. 869–879, Aug. 2013.
[10] M. Plitt, K. A. Barnes, and A. Martin, "Functional connectivity classification of autism identifies highly predictive brain features but falls short of biomarker standards," NeuroImage, Clin., vol. 7, pp. 359–366, 2015.
[11] S. Ibrahim, R. Djemal, and A. Alsuwailem, "Electroencephalography (EEG) signal processing for epilepsy and autism spectrum disorder diagnosis," Biocybernetics Biomed. Eng., vol. 38, no. 1, pp. 16–26, 2018.
[12] M. Asgari, A. Bayestehtashk, and I. Shafran, "Robust and accurate features for detecting and diagnosing autism spectrum disorders," in Proc. Interspeech, Jan. 2013, pp. 191–194.
[13] C. Laidi et al., "Cerebellar anatomical alterations and attention to eyes in autism," Sci. Rep., vol. 7, no. 1, Dec. 2017, Art. no. 12008.
[14] R. S. Hessels, I. T. C. Hooge, T. M. Snijders, and C. Kemner, "Is there a limit to the superiority of individuals with ASD in visual search?" J. Autism Develop. Disorders, vol. 44, no. 2, pp. 443–451, Feb. 2014.
[15] R. S. Hessels, R. Andersson, I. T. C. Hooge, M. Nyström, and C. Kemner, "Consequences of eye color, positioning, and head movement for eye-tracking data quality in infant research," Infancy, vol. 20, no. 6, pp. 601–633, Nov. 2015.
[16] G. Wan et al., "Applying eye tracking to identify autism spectrum disorder in children," J. Autism Develop. Disorders, vol. 49, no. 1, pp. 209–215, Jan. 2019.
[17] W. Liu, M. Li, and L. Yi, "Identifying children with autism spectrum disorder based on their face processing abnormality: A machine learning framework," Autism Res., vol. 9, no. 8, pp. 888–898, Aug. 2016.
[18] S. Eraslan, V. Yaneva, Y. Yesilada, and S. Harper, "Web users with autism: Eye tracking evidence for differences," Behav. Inf. Technol., vol. 38, no. 7, pp. 678–700, Jul. 2019.
[19] V. Yaneva, L. A. Ha, S. Eraslan, Y. Yesilada, and R. Mitkov, "Detecting autism based on eye-tracking data from Web searching tasks," in Proc. Internet Accessible Things. New York, NY, USA: ACM, Apr. 2018, p. 16.
[20] C. Lord et al., "Autism diagnostic observation schedule: A standardized observation of communicative and social behavior," J. Autism Develop. Disorders, vol. 19, no. 2, pp. 185–212, Jun. 1989.
[21] S. Baron-Cohen, S. Wheelwright, R. Skinner, J. Martin, and E. Clubley, "The autism-spectrum quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians," J. Autism Develop. Disorders, vol. 31, no. 1, pp. 5–17, 2001.
[22] V. Yaneva, I. Temnikova, and R. Mitkov, "Accessible texts for autism: An eye-tracking study," in Proc. 17th Int. ACM SIGACCESS Conf. Comput. Accessibility (ASSETS). New York, NY, USA: ACM, 2015, pp. 49–57, doi: 10.1145/2700648.2809852.
[23] S. Eraslan, V. Yaneva, Y. Yesilada, and S. Harper, "Do Web users with autism experience barriers when searching for information within Web pages?" in Proc. 14th Web All Conf. Future Accessible Work (W4A). New York, NY, USA: ACM, 2017, p. 20.
[24] E. Michailidou, "Visual complexity rankings and accessibility metrics," Ph.D. dissertation, School Comput. Sci., Univ. Manchester, Manchester, U.K., 2010.
[25] G. Marchionini, "Exploratory search: From finding to understanding," Commun. ACM, vol. 49, no. 4, pp. 41–46, Apr. 2006.
[26] R. J. Jacob, "Eye tracking in advanced interface design," in Virtual Environments and Advanced Interface Design. Oxford, U.K.: Oxford Univ. Press, 1995, pp. 258–288.
[27] M. E. Akpınar and Y. Yeşilada, "Vision based page segmentation algorithm: Extended and perceived success," in Current Trends in Web Engineering (Lecture Notes in Computer Science), vol. 8295, Q. Z. Sheng and J. Kjeldskov, Eds. New York, NY, USA: Springer, 2013, pp. 238–252.
[28] M. Roy and M. T. H. Chi, "Gender differences in patterns of searching the Web," J. Educ. Comput. Res., vol. 29, no. 3, pp. 335–348, Oct. 2003.

