Early detection of autism spectrum
BMJ Health & Care Informatics: first published as 10.1136/bmjhci-2022-100544 on 8 September 2022. Downloaded from https://informatics.bmj.com on 4 November 2024 by guest. Protected by
disorder in young children with machine learning using medical claims data

Yu-Hsin Chen,1 Qiushi Chen,1 Lan Kong,2 Guodong Liu2,3,4,5
► Additional supplemental material is published online (http://dx.doi.org/10.1136/bmjhci-2022-100544).

Received 06 January 2022
Accepted 19 August 2022

ABSTRACT
Methods … our study cohort. We developed logistic regression (LR) with least absolute shrinkage and selection operator and random forest (RF) models for predicting ASD diagnosis at ages of 18–30 months, using demographics, medical diagnoses and healthcare service procedures extracted from individuals' medical claims during early years postbirth as predictor variables.
Results For predicting ASD diagnosis at age of 24 months, the LR and RF models achieved the area under the receiver operating characteristic curve (AUROC) of 0.758 and 0.775, respectively. Prediction accuracy further increased with age. With predictor variables separated by outpatient and inpatient visits, the RF model for prediction at age of 24 months achieved an AUROC of 0.834, with 96.4% specificity and 20.5% positive predictive value at 40% sensitivity, representing a promising improvement over the existing screening tool in practice.
Conclusions Our study demonstrates the feasibility of using machine learning models and health claims data to identify children with ASD at a very young age. It is a promising approach for monitoring ASD risk in the general children population and for early detection of high-risk children for targeted screening.

WHAT IS ALREADY KNOWN ON THIS TOPIC
⇒ …sively to assess the risk of ASD in young children.

WHAT THIS STUDY ADDS
⇒ This study demonstrated the feasibility of predicting ASD diagnosis with promising accuracy based on an individual's medical record from health claims data using machine learning models.
⇒ Our prediction models were clinically interpretable, systematically identifying key predictors in line with known risk factors and symptoms among children with ASD in the literature.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
⇒ This study may serve as a basis for integrating predictive modelling into the health information system and the clinical workflow to enhance current ASD screening practice.

© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
For numbered affiliations see end of article.
Correspondence to Dr Qiushi Chen; q.chen@psu.edu

INTRODUCTION
Autism spectrum disorder (ASD) is a developmental disorder that involves persistent challenges in social interaction, speech and nonverbal communication, and restricted and repetitive behaviours.1 In the USA, the prevalence of ASD has increased substantially in the past two decades, with an estimated 1 in every 44 children identified with ASD by age 8.2 Although there exist evidence-based interventions which improve core symptoms in children with ASD, many children with ASD still experience long-term challenges with daily life, education and employment.3
Early diagnosis is the key to early intervention for improving the long-term outcomes of children with ASD. However, although growing evidence shows that accurate and stable diagnoses can be made by 2 years of age,4 in real-world settings the median age of ASD diagnosis is 50 months.2 To improve early diagnosis, the American Academy of Pediatrics (AAP) has recommended universal screening among all children at 18-month and 24-month well-child visits in primary care settings using the Modified Checklist for Autism in Toddlers (M-CHAT),5 a questionnaire that assesses toddlers' behaviour.6 However, growing evidence has shown that using M-CHAT alone may
not yield sufficient accuracy in detecting ASD cases, with a sensitivity below 40% and a positive predictive value (PPV) under 20%.7 8
In addition to ASD-specific behavioural questionnaires, general clinical and healthcare records may also contain meaningful signals to differentiate ASD risks among very young children. Studies have found that children with ASD are oftentimes accompanied by certain symptoms and medical issues such as gastrointestinal problems,9 infections10 11 and feeding problems.12 This implies that past diagnosis and healthcare encounter information, commonly available from health insurance claims or electronic health records (EHRs), could potentially be used for ASD risk prediction. In fact, medical claims and EHR data have been widely used in the health informatics literature for identifying disease-specific early phenotypes even before the hallmark symptoms start to manifest, such as for chronic diseases like heart failure,13 diabetes14 and Alzheimer's disease.15 In the context of ASD, health record data have been used to identify ASD subtypes16 17 and to predict suicidal risk in adolescents with ASD18; however, their use for predicting ASD diagnosis in young children has remained limited. To fill this gap, the objective of this study is to examine the feasibility of using large-scale real-world medical claims data to develop a prediction model for ASD diagnosis in young children, which can be used to support effective ASD screening strategies and facilitate early detection.

METHODS
Data source
We used the deidentified individual-level longitudinal healthcare claims data from the IBM MarketScan Commercial Claims and Encounters Database from 2005 to 2016. This database includes over 273 million unique individuals for both privately and publicly insured people in the USA.19 The claims data include baseline demographics (eg, sex, birth year, postal region), service providers, insurance plans, medical diagnoses (in International Classification of Diseases (ICD)-9/10 codes) and procedures (in Healthcare Common Procedure Coding System (HCPCS) and Current Procedural Terminology-4 codes) at each encounter of healthcare services.

Study population
We constructed an initial cohort consisting of young children with and without ASD (figure 1). The inclusion criteria for the ASD cohort are as follows: (1) having at least 2 outpatient or 1 inpatient ASD diagnosis encounters (299 for ICD-9 and F84 for ICD-10) throughout the existing records20 21; and (2) having continuous enrolment from 4 months to 30 months to ensure the completeness of health records from the claims data that can be used for diagnosis prediction at up to 30 months (online supplemental figure S1). To create the non-ASD cohort, we first identified individuals without any ASD diagnosis throughout their health records, then downsampled 5% of the population to obtain a computationally manageable yet sufficiently large subset of samples. To ensure patients had adequate follow-up time to receive a confirmed ASD diagnosis in the database, we restricted our selection of non-ASD patients by requiring a full enrolment period from 4 months to 60 months (online supplemental table S1).

Figure 1 Overview of study design for the predictive analysis. ASD, autism spectrum disorder; AUROC, area under receiver operating characteristic curve; AUPRC, area under precision-recall curve; LASSO, least absolute shrinkage and selection operator; PPV, positive predictive value.

Predictor variables for ASD diagnosis
We examined all diagnosis and procedure codes of a child's medical encounters available from as early as within 4 months after birth up to the age for prediction of ASD. We applied the Clinical Classifications Software (CCS),22 a commonly used tool in health informatics research, to aggregate the large number of distinct diagnosis and procedure codes into clinically meaningful groups (figure 1). The single-level CCS maps the ICD-9/10 and HCPCS codes to a substantially smaller yet practical set of 285 diagnosis and 231 procedure categories.22 We further removed same-day duplications of CCS codes after the mapping by counting at most one encounter of a specific CCS category for each person on each day.
To predict the ASD diagnosis at the age of 24 months in our base case model, in line with the age when a diagnosis can possibly be made by an experienced professional,4 we defined the predictor variables as the total number of encounters for each CCS category up to the age for prediction of 24 months. We also included sex and the encounters of emergency department visits, which are well-known clinically relevant factors associated with the autism population.23 Variables that were present in <1% of both the ASD and non-ASD cohorts were excluded.24 A total of 170 input predictor variables were included for prediction at the age of 24 months. Considering that the course of clinical events may follow a different pattern after an encounter with an ASD diagnosis, we excluded from our analysis any children who had at least one encounter with an ASD diagnosis code prior to the age for prediction.

Prediction model development and validation
We employed two machine learning methods, logistic regression (LR) and random forest (RF), which have been widely used for developing risk prediction models in various clinical settings. LR assumes that the independent variables are linearly related to the log odds and that the effects of multiple variables are additive, whereas RF is particularly suitable for exploiting nonlinear interactive effects in high-dimensional data. For the LR model, we also applied the least absolute shrinkage and selection operator (LASSO) as a feature selection technique to enforce the coefficients of weak predictors to be zero. The RF model was limited to up to 100 decision trees in the base case setting (other choices of the maximum number of trees were tested in sensitivity analysis).
To train our model, we sampled 10 000 ASD and 10 000 non-ASD subjects (N=20 000) from the initial cohort to build a large balanced training sample for maximising the discriminatory power learnt by the prediction model. To evaluate the model prediction performance, we created an independent imbalanced testing set (N=16 201) comprised of ASD and non-ASD patients from the remaining cohort, mutually exclusive from the training set. The testing set resembled the real-world estimate of ASD prevalence of 2.3% (ie, 1 in every 44) in the general population.2
We measured the prediction performance with sensitivity (also known as true positive rate or recall), specificity (or true negative rate) and PPV (or precision)25 at various selected risk thresholds. The model's overall discrimination ability was measured using the area under the receiver operating characteristic curve (AUROC). We also calculated the area under the precision-recall curve (AUPRC), where the precision-recall curve represents the relationship between PPV and sensitivity, and the F1 score, defined as the harmonic mean of PPV and sensitivity; both are suited for evaluating prediction performance on the imbalanced testing sample.26 27 To assess the stability and uncertainty of prediction performance, we repeated the training and testing set sampling, model training, testing and performance evaluation with 50 independent replications. The 95% CIs of all performance measures were reported.

Predicting ASD diagnosis at different ages
In addition to the base case prediction model, where the risk of ASD diagnosis was assessed based on clinical information up to 24 months, we compared the accuracy of ASD prediction with varying lengths of available medical history at (1) a younger age, 18 months, considering that universal ASD screening is recommended for children at both 18 months and 24 months5; and (2) an older age, 30 months, which is still a critical time point for monitoring developmental delays and considering early intervention.28 We followed the same approach as in the base case to exclude predictor variables of low frequency (resulting in 150 and 180 predictor variables in total for prediction at 18 and 30 months, respectively) and children with an ASD diagnosis prior to the age for prediction.

Identifying key predictor variables
We further explored how many and which key predictive variables had the most impact on the prediction performance using the Gini importance index from the RF model. We added variables incrementally following the order of the Gini index (ie, starting with the most important variable) and evaluated how the prediction accuracy changed as more variables were included. Selected key predictive variables were then compared with those identified by alternative strategies using (1) the absolute value of coefficients from the LASSO LR model and (2) the prevalence of each variable in the identified ASD cohort.

Separating inpatient and outpatient visits
Considering that the underlying severity of symptoms could potentially differ between inpatient hospitalisations and outpatient visits,29 we split the number of encounters for each diagnosis and procedure by inpatient and outpatient visit separately and augmented the prediction model with these more detailed encounter variables. We compared the prediction performance of the models using the augmented variables with our base case models.

Sensitivity analysis
We performed sensitivity analysis on several modelling assumptions to assess the robustness of our prediction models. Specifically, we strengthened the inclusion criteria for non-ASD subjects by requiring one additional year of enrolment, that is, increased from 4–60 months to 4–72 months. Furthermore, we assessed the potential loss of information due to excluding variables with <1% prevalence, to verify that such variable prescreening did not lose predictive information.
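The modelling setup described in this section can be sketched with scikit-learn. The sketch below uses a small synthetic feature matrix as a stand-in for the per-child CCS encounter counts; the data, sample sizes and variable names are illustrative assumptions, not the authors' actual data or code.

```python
# Sketch of the paper's modelling setup (synthetic data; not the authors' code).
# Features: hypothetical per-child counts of CCS diagnosis/procedure categories.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_features = 170                      # predictor count reported for the 24-month model

# Balanced training sample (the paper used 10 000 ASD + 10 000 non-ASD children).
X_train = rng.poisson(0.3, size=(2000, n_features)).astype(float)
y_train = np.repeat([1, 0], 1000)     # 1 = ASD, 0 = non-ASD
X_train[y_train == 1] += rng.poisson(0.1, size=(1000, n_features))  # weak synthetic signal

# LASSO-penalised logistic regression: the L1 penalty shrinks weak coefficients to zero.
lasso_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_lr.fit(X_train, y_train)

# Random forest capped at 100 trees, as in the paper's base case.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

n_kept = int(np.sum(lasso_lr.coef_ != 0))  # predictors surviving LASSO selection
print(f"LASSO kept {n_kept} of {n_features} predictors")
```

Evaluation on an imbalanced held-out set (as the paper does over 50 replications) would then score `predict_proba` outputs with AUROC and AUPRC rather than raw accuracy.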
Table 1 Performance of LASSO logistic regression and random forest models in prediction of autism spectrum disorder [numeric cell values not recoverable from the extraction; metrics were reported as % (95% CI) at sensitivity targets of 40%, 50% and 70%].
*The sensitivity threshold of 40% was selected to be comparable with the estimated sensitivity of 33%–39% for the existing autism-specific screening tools from real-world clinical settings.7 8
AUPRC, area under precision-recall curve; AUROC, area under receiver operating characteristic curve; LASSO, least absolute shrinkage and selection operator; PPV, positive predictive value.
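Operating points of the kind reported in Table 1 (specificity and PPV at a fixed sensitivity target) can be read off the ROC curve. The snippet below shows the mechanics on made-up scores with a roughly 2.3% prevalence; it is an illustration of the calculation, not a reproduction of the study's results.

```python
# Pick the risk threshold attaining a target sensitivity, then report specificity
# and PPV at that threshold (synthetic scores, not the study's data).
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.023, size=20000)           # ~2.3% prevalence, as in the test set
scores = rng.normal(0, 1, size=20000) + 1.2 * y_true  # cases score higher on average

fpr, tpr, thresholds = roc_curve(y_true, scores)
target_sensitivity = 0.40
idx = int(np.argmax(tpr >= target_sensitivity))       # first operating point reaching target

threshold = thresholds[idx]
pred_pos = scores >= threshold
tp = int(np.sum(pred_pos & (y_true == 1)))
fp = int(np.sum(pred_pos & (y_true == 0)))
specificity = 1 - fpr[idx]
ppv = tp / (tp + fp)
print(f"threshold={threshold:.3f}  sensitivity={tpr[idx]:.3f}  "
      f"specificity={specificity:.3f}  PPV={ppv:.3f}")
```

At a fixed sensitivity, PPV is driven down sharply by low prevalence, which is why the paper also reports AUPRC alongside AUROC for the imbalanced testing set.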
RESULTS
Predicting ASD diagnosis at age of 24 months
We identified a study cohort consisting of 12 743 ASD subjects and 25 833 non-ASD subjects (more details in online supplemental table S1). When predicting the ASD diagnosis at the age of 24 months in independent testing samples, the LR and RF models achieved an AUROC of 0.758 and 0.775, respectively. […] A setting of up to 100 trees in the RF model was deemed sufficient to achieve stable performance; further increasing the model complexity did not translate to an improvement in prediction accuracy (online supplemental table S2).

Predicting ASD diagnosis at different ages
Comparing the prediction models at the ages of 18, 24 and 30 months, we found that the prediction performance increased substantially with age. Specifically for the […]
Figure 2 Receiver operating characteristic curves (A) and precision-recall (PR) curves (B) for prediction of autism spectrum disorder (ASD) diagnosis at age of 24 months. The prevalence stands for the baseline 2.27% (ie, 1 in 44) ASD prevalence in the general population. AUC, area under curve; LR, logistic regression; RF, random forest.

Figure 4 Comparison of area under the receiver operating characteristic curve (AUROC) with combined versus separated inpatient and outpatient encounters by LASSO logistic regression (LR) and random forest (RF) models, at the age of 18, 24 and 30 months, respectively. Error bars represent the 95% CIs based on results from 50 replications of independent runs. LASSO, least absolute shrinkage and selection operator.

[…] were also highly consistent with high-prevalence variables, sharing 47 of the 50 most common variables in the ASD cohort (online supplemental figure S4).
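The incremental key-predictor analysis described in the Methods (ranking variables by the RF Gini importance and re-evaluating accuracy as variables are added) can be sketched as follows. The data are synthetic, with only the first two features made informative by construction, so the sketch illustrates the procedure rather than the study's findings.

```python
# Rank predictors by random-forest Gini importance, then re-fit and score AUROC
# as variables are added in importance order (synthetic data; procedure only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n, p = 3000, 30
X = rng.poisson(0.5, size=(n, p)).astype(float)
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 2.0          # only features 0 and 1 are informative
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]     # most important (Gini) first

for k in (1, 2, 5, p):
    cols = order[:k]
    rf_k = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:, cols], y)
    auc = roc_auc_score(y, rf_k.predict_proba(X[:, cols])[:, 1])
    print(f"top {k:2d} variables: in-sample AUROC = {auc:.3f}")
```

In the study this curve was computed on held-out test replications; the point of the procedure is that accuracy typically plateaus after a modest number of top-ranked variables.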
Prediction using separated inpatient and outpatient data
Separating inpatient and outpatient encounters further increased the AUROC for prediction at the age of 24 months to 0.766 (95% CI 0.762 to 0.769) in the LR model and 0.834 (95% CI 0.831 to 0.837) in the RF model. At the target sensitivity of 40%, the RF model achieved a higher specificity of 96.4% (95% CI 96.2% to 96.5%) with a PPV of 20.5% (95% CI 19.8% to 21.1%), outperforming the existing screening tool M-CHAT/F (with a sensitivity of 38.8%, specificity of 94.9% and PPV of 14.6%). We found that using claims data separated by inpatient and outpatient visits improved the prediction performance consistently across all ages (figure 4).

Robustness check and sensitivity analysis
With a more stringent inclusion criterion for non-ASD subjects, requiring a longer full enrolment period of up to 72 months (vs 60 months in our base case), we found that the prediction performance showed modest improvement (online supplemental table S3). This could be partially attributed to the fact that, with longer years to ascertain the non-ASD cohort, children would be less likely to be misclassified. We also verified that including the low-prevalence variables would not result in substantial differences but only marginal changes of AUROC within 0.01 across all model specifications.

Figure 3 Receiver operating characteristic curves (A) and precision-recall curves (B) for prediction of autism spectrum disorder at ages of 18, 24 and 30 months, respectively, by the random forest model. AUC, area under curve.

DISCUSSION
Early identification is vital for children with ASD to ensure their access to timely intervention and to optimise long-term outcomes. In this study, we demonstrated the feasibility of predicting ASD diagnosis at early ages using health claims data and machine learning models. We found that LASSO LR and RF models achieved an overall AUROC above 0.75 when predicting ASD diagnosis at age of 24 months. Our results also showed that prediction performance increased with age at the time of prediction. This is reasonable because more clinical information accumulated over a longer follow-up period since birth may contain more distinctive patterns to effectively differentiate children with ASD. The prediction models developed in our study are clinically interpretable. Key predictors, such as sex (male), developmental delays, gastrointestinal disorders, respiratory system infections and otitis media, showed strong predictive value for ASD diagnosis, in line with previous clinical studies that have found these symptoms to be associated with children with ASD. Finally, our study showed that separating inpatient and outpatient claims as predictors could further improve the prediction accuracy.
In our study, both LASSO LR and RF models showed promising accuracy in predicting ASD diagnosis based on an individual's medical claims data. This robust finding implies that there may exist distinct patterns in health conditions and health service needs among young children with ASD, well before the onset of most hallmark ASD behavioural symptoms. Such predictive signals can be easily extracted from electronic health records or medical claims administrative data, and used for the early identification of ASD cases. We also observed differences in performance between the two models. The RF model outperformed the LASSO LR model in general, likely because, with its tree-based model structure, the RF model is better at capturing complex interactive effects among the predictor variables to distinguish between ASD and non-ASD cases, whereas the LR model synthesises the effects of multiple variables additively. The advantage of the RF model became more salient when input variables were separated by inpatient and outpatient claims at a more granular level.
Our study has made an important contribution to applying health informatics in the field of ASD. Although there exists a plethora of literature identifying individual risk factors for ASD, using large healthcare service data and machine learning models to systematically predict ASD diagnosis has remained much less explored. Unlike existing clinical informatics studies that focused on detecting ASD subtypes,16 17 we aim to detect ASD cases among the general children population, that is, early detection. This could be particularly challenging due to the low prevalence of ASD in the general population (ie, a highly imbalanced dataset) and the scarcity of information available at such a young age. Nevertheless, our model showed promising prediction performance. The RF model with separated inpatient and outpatient encounters achieved a specificity of 96.4% at a sensitivity of 40% for ASD prediction at the age of 24 months, outperforming the accuracy of the existing ASD-specific screening tool (sensitivity: 38.8%; specificity: 94.9%) from a clinical observational study.7 It is worth noting that under a similar ASD prevalence (2.2%), our model showed a higher PPV (20.5% vs 14.6%).
Our prediction model for ASD diagnosis could have a significant impact on screening strategies for ASD in young children. Although the AAP guidelines recommend universal screening in all children, it has been debated that, without a perfect screening tool, universal screening may result in overburdened diagnostic services in the healthcare system, as these clinical resources are in extremely short supply.30 Our prediction models have demonstrated promising improvement over the existing ASD screening tool by using clinical information, and could potentially serve as a 'triaging tool' for identifying high-risk patients for diagnostic evaluation. Moreover, relying only on health claims data makes the models practically feasible to integrate into an EHR system or insurance claims database. This could further enable an automatic screening tool, which can continuously monitor an individual's risk as new diagnosis and procedure information emerges, and send reminders to patients or providers for a timely clinical assessment if necessary. On the other hand, it is possible that some diagnosis and procedure information appears only after a concern that the child may have autism already exists, such as following a positive screening event, which could alter the course of subsequent clinical events. As such, our prediction model is not designed to direct screening decisions, but rather to serve as a tool to enhance screening accuracy. If more detailed electronic health record data were available, the proposed risk prediction model could be further extended by incorporating screening results with clinical information, or by differentiating the clinical information before versus after the screening events, to further improve the accuracy of identifying high-risk ASD cases for further diagnostic evaluation.
Our study has several limitations. First, a diagnosis of ASD established only on the basis of existing diagnosis codes from claims data could sometimes be inaccurate and unreliable in practice. We followed a validated approach from the ASD health services research literature to identify the ASD cohort in our study.31 Second, the absence of ASD diagnosis codes in one's health record may not necessarily indicate that an individual does not have ASD, especially for children born in later years, due to limited follow-up time prior to the cut-off date in the database. Thus, we required full enrolment up to 60 months without ASD diagnoses to identify the non-ASD cohort, and verified the robustness of our base case results in a sensitivity analysis requiring full enrolment up to 72 months. Third, as autistic children are likely to have a wide range of comorbid conditions with various frequencies, our model may provide limited value for individuals who do not present comorbid conditions in their past healthcare encounter data. Our risk prediction model can be further augmented in future studies by information beyond the health claims database, such as ASD/developmental screening results and behaviour-related information from a more comprehensive EHR dataset. Lastly, the diagnosis and procedure codes in insurance claims data may be subject to variabilities and irregularities. Instead of the original detailed clinical codes, we used aggregated CCS categories for diagnoses and procedures as more robust clinical measures.

CONCLUSIONS
Using real-world health claims data and machine learning methods, we developed a prediction model that can successfully predict ASD diagnosis for children under 30 months with promising prediction accuracy. Our model also identified the important predictors for the diagnosis prediction, which showed meaningful clinical relevance and intuition. Our predictive modelling approach could potentially be generalised to broader clinical settings for predicting diseases that may show early signals in past healthcare service encounters in claims or EHR data. Future studies could explore the prediction of ASD diagnosis dynamically over time as new healthcare encounters occur, and investigate how validated risk prediction models could be integrated and used to inform ASD screening strategies.

Author affiliations
1The Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA
2Department of Public Health Sciences, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
3… College of Medicine, Hershey, Pennsylvania, USA
4Department of Pediatrics, The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA
5The Center for Applied Studies in Health Economics (CASHE), The Pennsylvania State University College of Medicine, Hershey, Pennsylvania, USA

Contributors GL, QC and Y-HC conceived of the presented idea, and developed it with support from GL and QC. Y-HC cleaned and preprocessed the data, developed prediction models, and performed model evaluations. All authors interpreted the model results. Y-HC and QC drafted the manuscript, which was critically revised by all authors. QC is the guarantor of the project.

Funding This work has been supported by a Penn State Social Science Research Institute Level 1 Seed Grant (QC, GL), a Penn State College of Engineering Multidisciplinary Research Seed Grant (Y-HC, QC, GL) and NIH R21 grant 1 R21 MH119480-01A1 (GL, LK).

Competing interests None declared.

Patient consent for publication Not applicable.

Provenance and peer review Not commissioned; externally peer reviewed.

Data availability statement Data may be obtained from a third party and are not publicly available.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

Open access This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

ORCID iDs
Yu-Hsin Chen http://orcid.org/0000-0002-3678-7517
Qiushi Chen http://orcid.org/0000-0003-4031-2669
Lan Kong http://orcid.org/0000-0001-6098-9445
Guodong Liu http://orcid.org/0000-0001-8683-0803

REFERENCES
1 American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub, 2013.
2 Maenner MJ, Shaw KA, Bakian AV, et al. Prevalence and characteristics of autism spectrum disorder among children aged 8 years - Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2018. MMWR Surveill Summ 2021;70:1–16.
3 McPheeters ML, Weitlauf A, Vehorn A, et al. Screening for autism spectrum disorder in young children: a systematic evidence review for the US Preventive Services Task Force. Rockville (MD): Agency for Healthcare Research and Quality (US), 2016.
4 Lord C, Risi S, DiLavore PS, et al. Autism from 2 to 9 years of age. Arch Gen Psychiatry 2006;63:694–701.
5 Lipkin PH, Macias MM, Council on Children with Disabilities, Section on Developmental and Behavioral Pediatrics. Promoting optimal development: identifying infants and young children with developmental disorders through developmental surveillance and screening. Pediatrics 2020;145. doi:10.1542/peds.2019-3449.
6 Robins DL, Fein D, Barton ML, et al. The modified checklist for autism in toddlers: an initial study investigating the early detection of autism and pervasive developmental disorders. J Autism Dev Disord 2001;31:131–44.
7 Guthrie W, Wallis K, Bennett A, et al. Accuracy of autism screening in a large pediatric network. Pediatrics 2019;144.
8 Carbone PS, Campbell K, Wilkes J, et al. Primary care autism screening and later autism diagnosis. Pediatrics 2020;146. doi:10.1542/peds.2019-2314.
9 Chaidez V, Hansen RL, Hertz-Picciotto I. Gastrointestinal problems in children with autism, developmental delays or typical development. J Autism Dev Disord 2014;44:1117–27.
10 Rosen NJ, Yoshida CK, Croen LA. Infection in the first 2 years of life and autism spectrum disorders. Pediatrics 2007;119:e61–9.
11 Adams DJ, Susi A, Erdie-Lalena CR, et al. Otitis media and related complications among children with autism spectrum disorders. J Autism Dev Disord 2016;46:1636–42.
12 Ledford JR, Gast DL. Feeding problems in children with autism spectrum disorders. Focus Autism Other Dev Disabl 2006;21:153–66.
13 Sideris C, Alshurafa N, Pourhomayoun M, et al. A data-driven feature extraction framework for predicting the severity of condition of congestive heart failure patients. Annu Int Conf IEEE Eng Med Biol Soc 2015;2015:2534–7.
14 Nguyen BP, Pham HN, Tran H, et al. Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. Comput Methods Programs Biomed 2019;182:105055.
15 Park JH, Cho HE, Kim JH, et al. Machine learning prediction of incidence of Alzheimer's disease using large-scale administrative health data. NPJ Digit Med 2020;3:46.
16 Lingren T, Chen P, Bochenek J, et al. Electronic health record based algorithm to identify patients with autism spectrum disorder. PLoS One 2016;11:e0159621.
17 Vargason T, Frye RE, McGuinness DL, et al. Clustering of co-occurring conditions in autism spectrum disorder during early childhood: a retrospective analysis of medical claims data. Autism Res 2019;12:1272–85.
18 Downs J, Velupillai S, George G, et al. Detection of suicidality in adolescents with autism spectrum disorders: developing a natural language processing approach for use in electronic health records. AMIA Annu Symp Proc 2017;2017:641–9.
19 IBM MarketScan research databases, 2020. Available: https://www.ibm.com/products/marketscan-research-databases
20 Burke JP, Jain A, Yang W, et al. Does a claims diagnosis of autism mean a true case? Autism 2014;18:321–30.
21 Coleman KJ, Lutsky MA, Yau V, et al. Validation of autism spectrum disorder diagnoses in large healthcare systems with electronic medical records. J Autism Dev Disord 2015;45:1989–96.
22 Agency for Healthcare Research and Quality, Rockville, MD. HCUP Clinical Classifications Software (CCS) for ICD-9-CM. Healthcare Cost and Utilization Project (HCUP) 2006-2009; 2020. www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp
23 Loomes R, Hull L, Mandy WPL. What is the male-to-female ratio in autism spectrum disorder? A systematic review and meta-analysis. J Am Acad Child Adolesc Psychiatry 2017;56:466–74.
24 He D, Mathews SC, Kalloo AN, et al. Mining high-dimensional administrative claims data to predict early hospital readmissions. J Am Med Inform Assoc 2014;21:272–9.
25 Hunink MGM, Weinstein MC, Wittenberg E. Decision making in health and medicine: integrating evidence and values. 2nd ed. Cambridge University Press, 2014.
26 Jeni LA, Cohn JF, De La Torre F. Facing imbalanced data: recommendations for the use of performance metrics. Int Conf Affect Comput Intell Interact Workshops 2013;2013:245–51.
27 Ozenne B, Subtil F, Maucort-Boulch D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol 2015;68:855–9.
28 Hyman SL, Levy SE, Myers SM, et al. Identification, evaluation, and management of children with autism spectrum disorder. Pediatrics 2020;145:e20193447.
29 Pottick K, Hansell S, Gutterman E, et al. Factors associated with inpatient and outpatient treatment for children and adolescents with serious mental illness. J Am Acad Child Adolesc Psychiatry 1995;34:425–33.
30 Siu AL, Bibbins-Domingo K, et al, US Preventive Services Task Force (USPSTF). Screening for autism spectrum disorder in young children: US Preventive Services Task Force recommendation statement. JAMA 2016;315:691–6.
31 Liu G, Pearl AM, Kong L, et al. Risk factors for emergency department utilization among adolescents with autism spectrum disorder. J Autism Dev Disord 2019;49:4455–67.