
Lung Cancer Prediction Using Electronic Claims Records: A Transformer-Based Approach

This document presents a framework for lung cancer prediction from electronic claims records using a transformer-based approach, and applies the model to the entire population of Taiwan. The model achieves a predictive power above 2.1 for all-stage lung cancer and around 2.0 for early-stage lung cancer, with average positive predictive values of 5% and 1%, respectively. The work establishes exclusion criteria to predict new incidences of lung cancer and analyzes both the prediction performance and the effect of the exclusion design.

Uploaded by Nakib Ahsan
Copyright © All Rights Reserved

6062 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 12, DECEMBER 2023

Lung Cancer Prediction Using Electronic Claims Records: A Transformer-Based Approach
Huan-Yu Chen, Graduate Student Member, IEEE, Hui-Min Wang, Ching-Heng Lin, Rob Yang, and Chi-Chun Lee, Senior Member, IEEE

Abstract—Electronic claims records (ECRs) are large-scale, longitudinal collections of individuals’ medical service-seeking actions. Compared to in-hospital electronic medical records (EMRs), ECRs are more standardized and span multiple sites. Recently, there have been studies showing promising results on modeling claims data for a wide range of medical applications. However, few of them address exclusion criteria in cohort selection to extract new incidence without prior signs, and they often lack emphasis on predicting cancer in early stages. In this work, we aim to design a lung cancer prediction framework using ECRs with a rigorous exclusion design and a state-of-the-art sequence-based transformer. Furthermore, this work presents one of the first results of applying a disease prediction model to the entire population of Taiwan. The results show over 2.1 predictive power, 5% average positive predictive value (PPV), and 0.668 area under the curve (AUC) for all-stage lung cancer, and around 2.0 predictive power, 1% average PPV, and 0.645 AUC for early-stage lung cancer in our dataset. Sub-cohort analysis could funnel a high-precision selective group into prioritized clinical examination. Onset analysis validates the effect of our exclusion criteria. This work presents comprehensive analyses of lung cancer prediction, and the proposed approach can serve as a state-of-the-art disease risk prediction framework on claims data.

Index Terms—Electronic claims records, lung cancer, transformer, deep learning.

Manuscript received 26 December 2022; revised 4 May 2023 and 13 August 2023; accepted 3 October 2023. Date of publication 12 October 2023; date of current version 6 December 2023. This work was supported by Johnson & Johnson under Grant NTHU 109A0198J6. (Huan-Yu Chen and Hui-Min Wang contributed equally to this work.) (Corresponding authors: Rob Yang; Chi-Chun Lee.)
Huan-Yu Chen and Chi-Chun Lee are with the Department of Electrical Engineering, National Tsing Hua University, Hsinchu 300044, Taiwan (e-mail: [email protected]; [email protected]).
Hui-Min Wang and Rob Yang are with the Lung Cancer Initiative, Johnson & Johnson Enterprise Innovation, Inc., New Brunswick, NJ 08901 USA (e-mail: [email protected]; yang.robert@gmail.com).
Ching-Heng Lin is with the Department of Medical Research, Taichung Veteran General Hospital, Taichung 40705, Taiwan (e-mail: [email protected]).
Digital Object Identifier 10.1109/JBHI.2023.3324191

2168-2194 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Khulna Univ of Engineering & Technology - KUET. Downloaded on April 17,2024 at 15:49:21 UTC from IEEE Xplore. Restrictions apply.
CHEN et al.: LUNG CANCER PREDICTION USING ELECTRONIC CLAIMS RECORDS 6063

I. INTRODUCTION
Digital collections of patients’ records were originally constructed for diverse functions of our healthcare system. Databases of electronic medical records (EMRs) and electronic claims records (ECRs) present themselves as valuable data sources, recording a wide variety of patients’ medical service-seeking information. EMRs are a collection of a patient’s status, biological monitoring data, lab examination results, specimen interpretation, treatments and medication, etc. They are all-around data intended to record comprehensive information about a patient’s diagnostic/intervention status and results. In contrast, claims records report the action items performed, as the final checklist for submitting an insurance claim after each clinical visit (an example is shown in Fig. 1). Recently, computational frameworks that rapidly sift through these large-scale data records have emerged for diverse clinical applications [1], [2]. Furthermore, promising results have been repeatedly observed when modeling large-scale digital collections of patients’ records with machine learning approaches; models of disease prediction, diagnosis, and progress monitoring have all been proposed (e.g., [3], [4], [5], [6], [7]).

Fig. 1. Sample of electronic claims record, which consists of visit date, demographic information (gender and birthday) and medical details (diagnosis and medication).

As a non-intrusive modality that is collected regularly, EMR datasets are often large-scale, which suits the data-hungry nature of deep learning models. In fact, a majority of previous works concentrate on using EMRs to build various predictive models. However, the infrastructure hosting EMRs is often limited to a single hospital or a few sites at most, which isolates it among hospital systems and makes it difficult to collect data across sites. Compared to EMRs, ECRs can be extended to broader scenarios. While claims data record medical actions, often without their results, they are usually owned by insurance companies or the government, which makes them more standardized across various health system infrastructures (e.g., clinics, local hospitals, medical centers). This unique nature makes it easier for claims data, as compared to EMRs, to grow into an even larger (often with a longer time span) and cross-site collection of real-world populations. When appropriately modeled, the diverse, large, longitudinal data points of a claims database can be used not only to train predictive models at the individual level but also to model clinical behaviors and statistics at the population level, for example, disease prevalence estimation [8], [9], [10], [11], modeling the relation between environmental risks and phenotypes [12], and healthcare cost analysis [13], [14].

While ECRs provide an opportunity for larger and more standardized data collections, claims data report clinical actions for reimbursement purposes and may be biased by financial considerations. Each claim data point is therefore a noisier proxy of a patient’s health status than a medical record. As a result, there are two key and unique challenges in using claims for model building: one is the need to handle the complex spatial-temporal nature of the data, and the other is the proper design of the target cohort so that model performance reflects a meaningful clinical usage scenario.

For the first challenge, proper frameworks for encoding codes and time sequences are essential to capture a patient’s health trajectory from the claims. Each data point contains the diagnosis and treatment of one medical visit, as a sample reflecting the patient’s health status. The diagnosis and treatment are often represented as codes from medical classification lists at different levels of granularity. The spatial difficulty lies in the trade-off between feature density and granularity in the coding space. Moreover, the irregular visit sequence reflects informative yet heterogeneous underlying temporal medical behaviors.

The other challenge is to design a clinically meaningful cohort for model evaluation. In common practice, most previous works on clinical event or disease detection only consider a patient’s first encounter as a new incidence, i.e., no history of the target diagnosis [15], [16]. This setup neglects the related complications and indicators leading to the target disease, which leads to overestimated model performance. Motivated by the natural reflectivity and the trajectory of medical behavior in claims data, one should form a stringent cohort with clinically meaningful new incidence by removing patients with related prior symptoms and screens when evaluating on ECRs.

In this work, our aim is to develop and validate a method that uses claims data for disease onset prediction, and we focus on lung cancer (LCa). Globally, lung cancer has been the most commonly diagnosed cancer and the leading cause of cancer mortality [17]. As evidenced by the survival rate at each lung cancer stage, global data show that the 5-year overall survival rate drops from 70% for stage I to 6% for stage IV [18]. Therefore, early detection, especially early-stage lung cancer prediction, is crucial for reducing mortality and improving the prognosis of lung cancer.

Conventional clinical screenings and examinations are the current standards for detection, albeit prioritized for high-risk patients under experts’ instruction rather than for the whole population. The American Cancer Society (ACS) recommends yearly lung cancer screening with low-dose computed tomography (LDCT) scans. The National Lung Screening Trial (NLST) reports that LDCT screening yields a 20% reduction in lung cancer mortality relative to conventional chest x-ray screening. Individuals screened with LDCT were also 7% less likely to die overall (from any cause) than those who got chest x-rays [19], [20]. However, LDCT screening is limited to those who are 55–74 years old, are current smokers or smokers who have quit in the past 15 years, and have at least a 30 pack-year smoking history [21]. Therefore, an effortless and realistic early screening framework for individuals that is operable at a larger scale (at best, population-wide) would be ideal for further advancing precision screening and improving diagnosis and/or intervention for LCa.

Specifically, this work introduces a transformer-based deep learning method for large-scale claims data modeling to perform early detection of lung cancer. A Vision Transformer (ViT)-based architecture is trained on claims data and evaluated on a rigorously designed cohort selection to ensure meaningful prediction of lung cancer incidence. As a tangible application, such an automated approach to lung cancer screening can help funnel high-risk individuals toward a low-dose CT scan or treatment.

The following are the key contributions of this work:
• A state-of-the-art transformer-based approach to modeling claims data, with the predictive framework applied for the first time in a population-wide evaluation (∼23 million people in Taiwan).
• Rigorous cohort design for evaluating new-incidence lung cancer prediction on claims data.
• A process for future lung cancer risk prediction that can be easily integrated into the current healthcare system to help significantly reduce the mortality rate of individuals with lung cancer.

The rest of the article is organized as follows. In Section II, we review the related literature on frameworks utilizing claims data. Section III describes our main framework for cohort design, data preprocessing, and model architecture, following Fig. 2. Section IV presents the results of our framework and discussions, and we conclude this work in Section V.

II. RELATED WORK
A. Electronic Claims Records (ECRs)
Most previous works compute trends and statistics based on the natural reflectivity of large claims data for various epidemiology studies, including trends in the prevalence of amyloid light-chain amyloidosis [22], trends in Lyme disease diagnoses [23], and frailty measures [24]. There are also works modeling claims with deep learning methods for diverse prediction tasks, e.g., hospital readmission risk in COPD [25], payer response prediction [26], and stroke prediction [27]. While claims data arguably reflect less precise information on a patient’s health status than EMRs, these works have shown competitive performance and the feasibility of using claims data for different clinical applications. However, few of these past works have stressed a rigorously designed cohort that ensures the predictive model is evaluated on clean, new incidence, potentially leading to overestimated performance and unrealistic clinical applications. Furthermore, none of these prior works evaluated their predictive model on a full population.

B. Transformer Frameworks
The Transformer was originally proposed for sequence modeling tasks such as language modeling and machine translation [28]. By relying solely on the attention mechanism, Transformer variants have continuously shown state-of-the-art performance across various sequence modeling tasks, such as those commonly found in the natural language processing and computer vision communities [29], [30]. Diverse Transformer-like models have been proposed for EMRs and were shown to outperform other deep learning architectures [31], [32], [33], [34]. The Transformer has achieved great success in various fields and has become a de facto standard for achieving state-of-the-art performance. In this work, our objective is to design a transformer-based framework for modeling ECRs and to evaluate it on the task of lung cancer prediction.
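To make the attention mechanism referred to above concrete, the following is a minimal single-head scaled dot-product self-attention in NumPy, following the standard formulation of [28]. It is an illustrative sketch, not the authors’ implementation: the random projection weights are assumptions, and the sizes (sequence length 16, embedding size 32) are chosen only to mirror the Transformer input shape reported in Section IV-A.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) sequence of token embeddings.
    w_q, w_k, w_v: (d_model, d_k) projection matrices.
    Returns the attended outputs (seq_len, d_k) and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)       # each row is a distribution over positions
    return weights @ v, weights


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 16, 32, 32           # illustrative sizes only
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 32) (16, 16)
```

The full Transformer encoder of [28] adds multiple heads, residual connections, layer normalization, and position-wise feed-forward sublayers around this core operation.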


Fig. 2. Overview of our framework, including data, model, and experiment design.

III. METHODOLOGY
A. Dataset
The dataset used in this work is from the National Health Insurance Research Database (NHIRD) of Taiwan. With data standardization at the national level, the database practically covers all subjects and constituents in Taiwan [35] (usage approved by NTHU IRB#REC10911HE098). Around 30 million diagnosis records and their corresponding 200 million medication records are collected per month. The claims database contains demographic variables, such as the insured person’s registration location, gender, and age, as well as examinations, diagnoses, prescriptions, and details of each outpatient visit or inpatient stay.

International Classification of Diseases, Clinical Modification (ICD) codes [36] are standardized medical codes used to document medical conditions and procedures in healthcare billing and medical records. ICD-9-CM and ICD-10-CM are the ninth and tenth revisions, respectively; each chapter represents related conditions and is divided into blocks for specific categories of conditions. For example, the neoplasms chapter of ICD-10-CM contains blocks for neoplasms affecting different parts of the human body. NHI Drug Codes are mapped to the WHO Anatomical Therapeutic Chemical Classification System (ATC code) [37], which assigns a unique code to each medication based on its therapeutic properties and the organ or system it affects. The ATC code has five levels of classification, with the first level representing the main anatomical group and the fifth level representing the specific medication.

Four subsets of NHIRD are constructed for our experiments.
1) Population Set: A set that includes all records of the nationwide population, used as an external testing set for performance evaluation and analysis.
2) Million Subset: A subset randomly sampled from the whole population set, preserving the natural distribution of the whole population. This subset is heavily used in prior works on NHIRD [38], [39], [40]. It lacks positive samples due to the low incidence rate of lung cancer (especially early-stage disease). Therefore, this subset is used mainly as a source of non-disease controls.
3) Catastrophic Subset: This subset includes all catastrophic illness patients, e.g., those with cancer, genetic disease, or congenital disease, and acts as the main source of positive subjects. Most of the lung cancer cases are extracted from this set.
4) Local Development Set: Our custom dataset for model development, which combines the Million Subset with the positive cases in the Catastrophic Subset. In this work, we train and comprehensively evaluate our model on the local development set, and we further apply the trained model to the whole population set.

B. Cohort Design
People between 45 and 65 years old are the main focus of this lung cancer prediction research. Because risk increases with age, we select this group, which has moderate risk but still leaves time and room to receive treatment. The task is to predict whether one will have lung cancer within three years after the index date, using three years of claim records before the index date as features. All patients who meet the criteria below are first gathered:
• adults aged 45–65 years on the index date;
• with at least 365 days of continuous observation time prior to the index date.
Most importantly, we additionally filter out patients through several exclusions to focus on new incidence of lung cancer. A carefully designed set of inclusion and exclusion rules restricts the cohort to new lung cancer incidence with barely any symptomatic signs beforehand. We call the exclusion process based on one’s inpatient history inpatient exclusion, and that based on one’s outpatient records outpatient exclusion; e.g., one is excluded under outpatient exclusion if one has a lung cancer history in outpatient records. We also focus on early-stage lung cancer, which includes only stage I and stage II lung cancer, since early detection leads to a higher survival rate by leaving room for treatment. Unlike traditional screening tests, which require more medical resources, our computational framework is not restricted to high-risk groups. The detailed inclusion and exclusion criteria are listed below.
1) Inclusion Criteria: Patients with lung cancer diagnosed within three years after the index date under code group 162 of ICD-9-CM, which includes “malignant neoplasm of trachea, bronchus, and lung”, excluding “benign carcinoid tumor of bronchus (209.61)” and “malignant carcinoid tumor of bronchus (209.21)”. Table II shows the concepts in code group 162 and their corresponding codes in ICD-10-CM.
2) Exclusion Criteria: Patients are discarded if they have any of the following conditions before the index date. These rules ensure that we predict on cases that are almost free of clear signs and that truly reflect new incidences.


• History of any cancer, carcinoembryonic antigen, and chemotherapy.
• History of related screens, i.e., computed tomography and chest x-ray for lung cancer.
• History of related symptoms, i.e., hemoptysis for lung cancer.
Table I shows the concepts and their corresponding codes used in the exclusion.

TABLE I
LIST OF CODES USED IN EXCLUSION CRITERIA

TABLE II
CONCEPT AND CODE UNDER USED IN SUBJECT INCLUSION

The exclusion design aims to evaluate the model’s performance on a clean dataset without prior signs, mitigating inflated accuracy in known high-risk subgroups. The model can identify challenging cases that current clinical guidelines would likely miss due to a lack of non-intrusive screening. Excluded patients can still be tested to gain additional insights. The study also shows the model’s predictive power across various baseline risks, and the model is not restricted to high-risk groups.

C. Feature Encoding
Feature arrays with time and code axes encode three years of records prior to the index date. Diagnosis and medication codes are used, with each cell holding the number of appearances of a code within a particular time period. The coding design follows the hierarchical structures of ICD and ATC codes, with different levels of coding granularity. The terms “First Level” and “Second Level” denote coarse and fine coding representations, respectively. Details of the code and time axes are as follows.
1) ICD-10-CM Codes for Diagnosis Encoding:
• First Level: A total of 21 chapters from the official classification guidelines, which are the top-level categories for diseases and health conditions.
• Second Level: A total of 283 blocks subdivided from the chapters, which represent more specific categories of conditions.
(Notation: code groups, e.g., “A00-B99”, “D50-D53”)
2) ATC Codes for Medication Encoding:
• First Level: A total of 14 main anatomical or pharmacological groups.
• Second Level: A total of 94 pharmacological or therapeutic subgroups.
(Notation: code categories, e.g., “A03”, “N02”)
3) Temporal Information Encoding: A step along the time axis represents the sum of code appearances within a 30-day window. The origin of the time axis is three years prior to the index date. (Notation: time frames, e.g., “0030-0059”, “1051-1080”; a larger number is closer to the index date.)
For example, if a patient was diagnosed with lung cancer (ICD code C34) on day 1 and again on day 20, the feature encoding would represent this as a cell located at (C30-C39, 0000-0029) with a value of 2, indicating that the diagnosis code appeared twice within that 30-day window. The rationale for a 30-day window is to capture the temporal progression of diagnoses and medication: follow-up appointments are usually scheduled monthly, and it may take several months to confirm an event.

D. Model Architecture
We propose a Transformer-based model, shown in Fig. 3, to derive a concise high-dimensional embedding of the code matrix derived from claims data, i.e., a 2D matrix consisting of the time axis and the vectorized diagnosis and medication codes. A classifier is built by stacking the encoded representation with each individual’s personal attributes, e.g., gender and age, to perform lung cancer prediction. This scalable approach can effectively combine an individual’s static personal attributes with the temporal nature of their clinical characteristics.
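The count encoding described in Section III-C, which produces the 2D code matrix consumed by the model above, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors’ preprocessing code: the `(day, code)` event format, the three-block excerpt of the ICD-10-CM second-level axis, and the helper names `icd10_block` and `encode_counts` are all hypothetical. Bins are taken as non-overlapping 30-day windows counted from day 0 of the three-year look-back, which gives 37 time steps and reproduces the worked example: C34 recorded on day 1 and day 20 yields a value of 2 in cell (C30-C39, 0000-0029).

```python
import numpy as np

WINDOW_DAYS = 30                             # one time step sums 30 days of claims
LOOKBACK_DAYS = 3 * 365                      # three years before the index date
N_BINS = -(-LOOKBACK_DAYS // WINDOW_DAYS)    # ceil(1095 / 30) = 37 time steps

# Hypothetical three-block excerpt of the ICD-10-CM second-level axis
# (the full axis described in the text has 283 blocks).
BLOCKS = [("C30", "C39"), ("J40", "J47"), ("R00", "R09")]
GROUP_INDEX = {f"{start}-{end}": i for i, (start, end) in enumerate(BLOCKS)}


def icd10_block(code: str):
    """Map an ICD-10-CM code such as 'C34.90' to its block label, e.g. 'C30-C39'."""
    category = code[:3]
    for start, end in BLOCKS:
        if start <= category <= end:         # equal-length codes compare lexicographically
            return f"{start}-{end}"
    return None                              # code falls outside this toy excerpt


def encode_counts(events, group_index=GROUP_INDEX, n_bins=N_BINS, window=WINDOW_DAYS):
    """Build a (time, code) count array from (day, code) events.

    `day` counts days from the origin of the look-back window (three years
    before the index date), so larger bin indices are closer to the index date.
    """
    counts = np.zeros((n_bins, len(group_index)), dtype=np.int32)
    for day, code in events:
        t = day // window                    # non-overlapping bins: days 0-29 -> bin 0
        group = icd10_block(code)
        if 0 <= t < n_bins and group in group_index:
            counts[t, group_index[group]] += 1
    return counts


# Worked example from the text: C34 recorded on day 1 and again on day 20.
x = encode_counts([(1, "C34.90"), (20, "C34.90")])
print(x.shape)                       # (37, 3)
print(x[0, GROUP_INDEX["C30-C39"]])  # 2
```

With the full 283-block diagnosis axis and 94-subgroup medication axis, the same routine would yield arrays of shape (37, 283) and (37, 94), matching the feature shapes given in Section IV-A.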


Fig. 3. Our main framework of prediction from claims data. Two claim-related information matrices are encoded by the network composed of CNN, ViT, and Transformer. We further concatenate the encoded vector with age and gender as the input to another DNN classifier to obtain the final prediction.

1) Transformer: The Transformer [28] is a sequence model designed around the attention mechanism for sequence-to-sequence learning. By stacking self-attention layers, Transformers can efficiently learn the relations of features between timesteps and generate an encoded vector summarizing the temporal information. We use the encoder part of the Transformer in this framework.

2) Vision Transformer: The Vision Transformer (ViT) [30] is a variant of the Transformer that relies fully on the attention mechanism and was originally applied only to visual data. The key idea is to first divide an image into patches; the 2D image patches are flattened and passed through a simple linear projection. The relations among the projected patches are then extracted with standard self-attention layers. The linear projection prior to self-attention encourages the network to identify a new task-discriminative space before temporally self-attending. ViT is a state-of-the-art building block in computer vision tasks and has become a strong successor to the well-known convolutional neural network variants. In our setting, we adapt a similar concept to model the spatial (code axis, indicating health conditions) and temporal (time axis, indicating longitudinal progression) dimensions. The original Vision Transformer uses a 16 × 16 patch size on 224 × 224 images, resulting in 196 patches per image. We aim for a comparable number of patches, roughly a hundred. After several hyperparameter trials, we settle on a patch size of 5 × 5, resulting in a total of 120 patches. Each “patch” in this case roughly represents 270 days on the time axis, 5 codes in first-level coding, and 10 codes in second-level coding.

3) Model Details: We propose a ViT-Transformer framework that comprises a 2D convolution as a reshaping layer, ViT stacks for spatial and temporal encoding, a Transformer as the more fine-grained temporal feature extractor, and several linear layers as the final classifier. Since ViT requires the feature dimensions to be divisible into patches, we first feed the raw feature arrays into a 2D convolution that reshapes them to numerically convenient dimensions. After reshaping, the features are divided into large patches before entering ViT, as we leave the temporal learning mainly to the Transformer. We follow the setup of the original ViT by using a fixed-size square grid: the input feature array is tiled into a grid of patches, i.e., split into non-overlapping squares, and each patch is then flattened into a vector representation. One could merge the Transformer into ViT by directly splitting the features into small patches along the time axis, but we find better performance with ViT followed by a Transformer on this task. We hypothesize that ViT outperforms CNN in this task due to the nature of our feature design. CNNs exploit spatial locality in dependencies, while ViT is a fully attention-based network: it relies less on two-dimensional neighborhood structure, and its attention layers are global interactions [30]. Our features describe medical treatment actions over time, and neighboring codes have a more subtle relationship than neighboring image pixels. The common average- or max-pooling operations in deep CNNs also dilute or miss the importance of specific treatments.

4) Model Output: ViT-Transformer acts as an encoder of temporal clinical information and extracts a concise representation of claim records. We then feed this representation, along with demographic information such as age and gender, into a stack of linear layers for an arbitrary downstream classification task. The output is a decision score bounded between 0 and 1. Depending on a cutoff threshold, a prediction of 0 suggests the absence of lung cancer, while a prediction of 1 suggests the presence of lung cancer.

E. Training and Evaluation Scenarios
Every subject has records spanning multiple years. To prevent overlapping samples from leaking training information, our evaluation protocol is set up in a time- and subject-independent scenario. For time independence, the training and validation year is defined as the index year, and testing is done on the year following the index year (the testing year). For subject independence, we further split subjects into five groups: three groups are used for training, another for validation, and the last for testing. We take one visit from each subject to reduce the potential bias of over-representing certain subjects in the dataset; if a subject has multiple visits within a year, only the last one is taken as a sample. The last medical visit is considered the most recent and confirmed medical status, as a subject may not have cancer initially but develop it by the end of the year. We show the distribution of the date of the last visit for each subject in Table IV and notice that the majority occur towards the end of the year, with a smaller proportion occurring at the beginning. Combining the above two strategies ensures both time and subject independence. We observe that utilizing all positive samples in training can be better than using only the limited samples that meet the exclusion criteria. Up-sampling to 1:1 also helps the learning process, given the imbalanced label distribution. To sum up, the model is trained on the up-sampled data of three groups in the index year and validated on the data of one group, after exclusion, also in the index year. Evaluation is performed on the data of the remaining group in the index+1 year. The experiment is repeated five times to obtain five-fold cross-validation results, and the metrics are aggregated across the evaluations.

TABLE III
DATA STATISTICS ON THE REMAINING SUBJECTS IN OUR DATASET AFTER OUTPATIENT EXCLUSION

TABLE IV
DISTRIBUTION OF THE DATE OF THE LAST VISIT FOR EACH SUBJECT SHOWS THAT THE MAJORITY OF VISITS OCCUR TOWARDS THE END OF THE YEAR, WHILE SOME OCCUR AT THE BEGINNING

IV. EXPERIMENTS
We construct our model with the local development set on index year 2003 and evaluate it on index year 2004. The model is further tested on the whole population set on index year 2015. Because of the imbalanced data distribution, several metrics are presented. Sub-cohort analysis is carried out given the natural reflectivity of ECRs. Lastly, we observe the effect of varying levels of exclusion on lung cancer onsets, as well as the performance over time. Table III shows the demographic statistics of our dataset.

A. Model Parameters and Training Details
Our features have shapes (37, 283) and (37, 94) in (time, code) format for diagnosis and medication, respectively. First, we reshape the features with a CNN layer followed by adaptive pooling to (20, 100) and (20, 50) in (width, height). The two feature maps are then concatenated along the code axis and tiled into (5, 5) patches to be fed into ViT. We perform mean pooling over the outputs of the patches to obtain a 512-length vector, which is then reshaped to an embedding size of 32 and a sequence length of 16 before entering the Transformer. The final classifier, which takes the output vector gathered from the Transformer, consists of a stack of three linear layers (514 → 64 → 8 → 1) that produces the output decision score. The hyperparameters are batch size 8192, learning rate 0.0005, and 50 epochs. For model selection, we choose the model with the highest average positive predictive value (PPV) on the validation set.

B. Comparison Models and Metrics
1) Comparison Models:
• ViT-Transformer: Our proposed framework, adapted from the state-of-the-art transformer-based method.
• CNN-LSTM: A deep neural network baseline consisting of stacked Convolutional Neural Network and Long Short-Term Memory layers.
• Xception: An architecture that factors convolution into multiple branches and utilizes depth-wise separable convolutions and residual connections, as used in [15].
• GBDT: Gradient boosting decision tree, a fast and efficient model based on tree ensembling.
• Conventional methods: Other conventional machine learning models for comparison, including LinearSVC, Lasso regression, Ridge regression, and random forest.
2) Presented Metrics:
• PPV @ X SEN: Positive predictive value (PPV) at various sensitivities.
• Avg PPV: Average precision.
• AUC: Area under the receiver operating characteristic curve.
• True Cases: Total positive subjects in the cohort.
• Size: Total number of people in the cohort.
• True/Size: Incidence rate, i.e., the number of true cases divided by the cohort size.

Authorized licensed use limited to: Khulna Univ of Engineering & Technology - KUET. Downloaded on April 17,2024 at 15:49:21 UTC from IEEE Xplore. Restrictions apply.
6068 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 12, DECEMBER 2023

• Predictive Power: Defined as the quotient of average PPV and incidence rate. We propose this metric for model comparison. Intuitively, predictive power indicates the model’s ability to precisely identify positive subjects when pooling from a group of screened subjects, relative to the background incidence rate.

TABLE V
COMPARISON OF RESULTS FROM VARIOUS MODELS

TABLE VI
EVALUATION ON LOCAL DEVELOPMENT SET UNDER VARIOUS EXCLUSION CRITERIA

TABLE VII
MEDICAL CODES USED AS RISK FACTORS

C. Prediction Results
Table V compares our results with those of other methods after outpatient exclusion (the most stringent exclusion rule). ViT-Transformer outperforms the other deep learning frameworks and the conventional models, with an average PPV of 5.019% and an AUC of 0.668 in all-stage lung cancer prediction, leading in both average PPV and AUC. For early-stage prediction, the performance of ViT-Transformer (0.954% average PPV, 0.645 AUC) is similar to that of Xception (0.989% average PPV, 0.629 AUC): slightly lower in average PPV but slightly higher in AUC. The low AUCs might come from the stringent design and the skewed positive/negative distribution. ViT-Transformer could better capture the complex and sparse dynamics of claims data.
Table VI shows the results on cohorts under various exclusion criteria. The evaluation on the full set without any exclusion shows a very high average PPV of 40%, and the high-risk groups even reach 97% PPV (at 0.05 sensitivity). However, prediction without exclusion is not a meaningful task, since it simply reflects past lung cancer diagnoses as an indication of future lung cancer occurrence. For early-stage lung cancer, the trend is similar to the all-stage counterpart: a high 33% average PPV on the full set, dropping to 0.945% average PPV on our outpatient exclusion set. Although the incidence rate is much lower in early-stage prediction after outpatient exclusion, its predictive power of 2.081 is still close to the all-stage predictive power of 2.114 (about 2x the background incidence rate). Note that the large performance differences between exclusion sets of different stringency further underscore the importance of properly designing the cohort to reflect the model’s predictive power on subjects of real clinical importance, i.e., in our case, new occurrences without any prior signs.

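The metrics listed in Section IV-B can be sketched in plain Python as follows. This is a minimal illustration, not the authors’ evaluation code: the function names are hypothetical, average PPV is assumed to be the usual non-interpolated average precision, PPV at a target sensitivity is read at the first rank where recall reaches the target, AUC uses the pairwise (Mann-Whitney) formulation, and tied scores are not handled specially when ranking.

```python
from typing import Sequence


def _ranked(scores: Sequence[float], labels: Sequence[int]):
    # Sort subjects by decreasing risk score; tied scores keep input order (assumption).
    return sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)


def ppv_at_sensitivity(scores, labels, target):
    """PPV (precision) at the first cut-off whose sensitivity reaches `target`."""
    n_pos = sum(labels)
    tp = 0
    for k, (_, y) in enumerate(_ranked(scores, labels), start=1):
        tp += y
        if tp / n_pos >= target:
            return tp / k
    return tp / len(labels)


def average_ppv(scores, labels):
    """Average precision: mean of the PPV observed at each true positive's rank."""
    n_pos = sum(labels)
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(_ranked(scores, labels), start=1):
        if y:
            tp += 1
            total += tp / k
    return total / n_pos


def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predictive_power(scores, labels):
    """Average PPV divided by the cohort incidence rate (True/Size)."""
    incidence = sum(labels) / len(labels)
    return average_ppv(scores, labels) / incidence


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 0]
print(ppv_at_sensitivity(scores, labels, 0.5))  # 1.0
print(roc_auc(scores, labels))                  # 0.875
print(average_ppv(scores, labels))              # about 0.833 (= 5/6)
print(predictive_power(scores, labels))         # about 2.5 (= (5/6) / (2/6))
```

The pairwise AUC is quadratic in cohort size and only suits small illustrations; a rank-based formula scales better. Read the other way, the reported all-stage figures after outpatient exclusion (5.019% average PPV, predictive power 2.114) imply a background incidence of roughly 5.019 / 2.114 ≈ 2.37%, assuming both numbers refer to the same cohort.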

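The shape bookkeeping in Section IV-A can be checked with plain arithmetic. The helper below is hypothetical and is not the authors’ code; it assumes the (5, 5) patches tile the concatenated map without overlap and that every linear layer of the classifier has a bias term. Note that the classifier input (514) is two units wider than the 512-length vector entering the Transformer; this section does not state what the two extra inputs are, so the sketch simply takes the layer widths as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineShapes:
    concat_map: tuple   # (width, height) after concatenating along the code axis
    num_patches: int    # number of (5, 5) patches fed to the ViT
    seq_shape: tuple    # (sequence length, embedding size) entering the Transformer
    head_params: int    # weights + biases of the linear classifier stack


def trace_shapes(diag_map=(20, 100), med_map=(20, 50), patch=(5, 5),
                 pooled_dim=512, seq_len=16, head_dims=(514, 64, 8, 1)):
    # Both pooled maps must share the width (time) axis to be concatenated.
    if diag_map[0] != med_map[0]:
        raise ValueError("pooled maps must have the same width")
    concat = (diag_map[0], diag_map[1] + med_map[1])

    # Non-overlapping tiling requires both axes to be divisible by the patch size.
    if concat[0] % patch[0] or concat[1] % patch[1]:
        raise ValueError("feature map is not divisible by the patch size")
    num_patches = (concat[0] // patch[0]) * (concat[1] // patch[1])

    # The pooled vector is reshaped into (sequence length, embedding size).
    if pooled_dim % seq_len:
        raise ValueError("pooled vector length must be divisible by seq_len")
    seq_shape = (seq_len, pooled_dim // seq_len)

    # A linear layer from i to o units has i * o weights and o biases.
    head_params = sum(i * o + o for i, o in zip(head_dims, head_dims[1:]))
    return PipelineShapes(concat, num_patches, seq_shape, head_params)


print(trace_shapes())
# PipelineShapes(concat_map=(20, 150), num_patches=120, seq_shape=(16, 32), head_params=33489)
```

With these settings the ViT sees 4 x 30 = 120 patches, and the three-layer head is small (about 33k parameters) compared with typical Transformer encoders.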
TABLE VIII
EVALUATION ON POPULATION SET UNDER VARIOUS EXCLUSION CRITERIA

TABLE IX
COMPARISON BETWEEN THE GENERAL CANCER MODEL AND THE LUNG CANCER SPECIFIC MODEL, ON LOCAL DEVELOPMENT SET AND WHOLE
POPULATION SET, RESPECTIVELY

TABLE X
AVG. PPV AND PRED. POWER ARE PRESENTED FOR DIFFERENT FEATURE AND PREDICTION PERIODS

D. Sub-Cohort Analysis
By applying various criteria, the subgroup with higher risks could become a specific cohort for high-precision lung cancer prediction, which can be used as a selective group for prioritized clinical examination or treatment. We carried out sub-cohort analysis based on prior knowledge such as demographic information and history of related symptoms.
1) Age: Fig. 4(a) shows that subjects aged between 55 and 65 are predicted as higher risk, with an average PPV of 6.49% compared with 5.02% for all subjects. This agrees with clinical practice, in which older age carries a higher chance of lung cancer. The predictions for subjects aged between 45 and 55, with a lower average PPV of 2.78%, are still useful, especially for early-stage lung cancer prediction, because of their relatively young age and eligibility for comprehensive treatment.
2) Gender: As shown in Fig. 4(b), males tend to be predicted as higher risk, and their predictions are better, with 6.09% average PPV, whereas females have a lower average PPV of 3.07%. We infer that, because males have a higher incidence rate, their claims data contain information that is easier to extract for prediction. The underlying patterns for females might be more heterogeneous and less observable in the claims data due to a much lower incidence rate.
3) Risk Factors: Risk factors are defined as known preceding conditions of lung cancer, as listed in Table VII. In Fig. 4(c), the prediction seems better at medium sensitivity, with around 10% PPV for all-stage lung cancer prediction. Subjects with risk factors have a relatively higher average PPV of 7.25% than sub-cohorts split by other attributes such as gender and age. Risk factors are thus validated as a strong indication of lung cancer and could act as a reliable pre-screening step to funnel patients into clinical follow-up.
4) Onsets: We split subjects by onset time to inspect the model performance on lung cancer events in the near and far future. Fig. 4(d) shows that the model has the best overall results on the sub-cohort with onset in 2–3 years, with an average PPV of 2.10%. This result is intriguing and clinically relevant. Intuitively, closer events should be easier to predict, yet the result shows that the model performs better on further events. We assume the exclusion has effectively removed subjects at high risk in the near future (who potentially have existing signs). A model that predicts further into the future is also more appealing, as near-future onset could probably be inferred more easily by physicians, whereas predicting 2–3 years ahead is clearly much more challenging (especially for subjects with no obvious prior signs, i.e., the outpatient exclusion set). We further extend the onset analysis to the population set to understand the effect of exclusion.

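The sub-cohort analysis amounts to grouping subjects by an attribute and recomputing the metrics within each group. The following is a minimal sketch under assumptions of our own: subjects are plain dictionaries with hypothetical keys (`age`, `gender`, `score`, `label`), the age bands use half-open intervals as an illustrative choice (boundary handling is not specified in the text), average PPV is non-interpolated average precision, and the six subjects are made up.

```python
from collections import defaultdict


def average_ppv(scores, labels):
    """Non-interpolated average precision over subjects ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")  # no true cases in this sub-cohort
    tp, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
            total += tp / k
    return total / n_pos


def age_band(subject):
    # Hypothetical banding mirroring the 45-55 and 55-65 groups of Fig. 4(a).
    age = subject["age"]
    if 45 <= age < 55:
        return "45-55"
    if 55 <= age < 65:
        return "55-65"
    return "other"


def evaluate_subcohorts(subjects, key):
    """Return size, true cases, average PPV, and predictive power per sub-cohort."""
    groups = defaultdict(lambda: ([], []))
    for s in subjects:
        scores, labels = groups[key(s)]
        scores.append(s["score"])
        labels.append(s["label"])
    report = {}
    for name, (scores, labels) in groups.items():
        avg_ppv = average_ppv(scores, labels)
        incidence = sum(labels) / len(labels)
        report[name] = {
            "size": len(labels),
            "true_cases": sum(labels),
            "avg_ppv": avg_ppv,
            "pred_power": avg_ppv / incidence if incidence else float("nan"),
        }
    return report


subjects = [
    {"age": 58, "gender": "M", "score": 0.9, "label": 1},
    {"age": 60, "gender": "F", "score": 0.4, "label": 0},
    {"age": 62, "gender": "M", "score": 0.7, "label": 0},
    {"age": 50, "gender": "F", "score": 0.6, "label": 1},
    {"age": 48, "gender": "M", "score": 0.8, "label": 0},
    {"age": 52, "gender": "F", "score": 0.3, "label": 0},
]
by_age = evaluate_subcohorts(subjects, age_band)
by_gender = evaluate_subcohorts(subjects, lambda s: s["gender"])
```

Passing the grouping rule as a function keeps one evaluation routine for age, gender, risk-factor, or onset splits. Because predictive power divides by each group’s own incidence, it stays comparable across sub-cohorts whose base rates differ, which is the situation the gender comparison above describes.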

TABLE XI
MOST FREQUENT CODES AS TOP FACTOR IN EACH SUBJECT FROM LUNG CANCER MODEL AND GENERAL CANCER MODEL, RESPECTIVELY
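Section IV-H builds Table XI by attributing each positive prediction to input codes with integrated gradients [41] (computed with Captum [42]) and counting the most frequent top code per subject. The sketch below illustrates the same two steps on a deliberately simplified stand-in: it replaces the ViT-Transformer with a hypothetical logistic model over per-code counts (so the gradient has a closed form), uses a zero baseline and a right Riemann sum along the path, and all code names, weights, and the score threshold are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy model: logistic regression over per-code claim counts.
CODES = ["dx_cough", "rx_inhaler", "dx_hypertension"]
W = [2.0, 1.0, -0.5]
B = -2.0


def predict(x, w=W, b=B):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def integrated_gradients(x, w=W, b=B, baseline=None, steps=64):
    """Approximate IG_i = (x_i - x'_i) * mean over the straight path of dF/dx_i."""
    if baseline is None:
        baseline = [0.0] * len(x)
    grad_sum = [0.0] * len(x)
    for k in range(1, steps + 1):
        alpha = k / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        p = predict(point, w, b)
        for i, wi in enumerate(w):
            # d sigmoid(w.x + b) / dx_i = p * (1 - p) * w_i
            grad_sum[i] += p * (1.0 - p) * wi
    return [(xi - bi) * g / steps for xi, bi, g in zip(x, baseline, grad_sum)]


def top_factor_counts(subjects, threshold):
    """Count how often each code is the top attribution among predicted positives."""
    counts = Counter()
    for x in subjects:
        if predict(x) < threshold:
            continue  # only subjects predicted positive are explained
        attr = integrated_gradients(x)
        top = max(range(len(CODES)), key=lambda i: attr[i])
        counts[CODES[top]] += 1
    return counts


subjects = [[1, 0, 0], [0, 1, 1], [0, 3, 0], [1, 0, 2], [2, 1, 0]]
print(top_factor_counts(subjects, threshold=0.4).most_common())
# [('dx_cough', 2), ('rx_inhaler', 1)]
```

A useful sanity check is completeness: the attributions of one subject should sum approximately to the score difference between the input and the baseline. In the paper’s setting, the threshold would instead be the score at 0.05 sensitivity, and the attributions would come from the trained network rather than a closed-form gradient.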

Fig. 4. Precision-recall curves, average PPV, and ROC-AUC on various sub-cohorts. Each color represents a sub-cohort. A curve closer to the upper-right corner implies better performance.

Fig. 5. Predictive power before and after exclusion. Many subjects with high risk have been dropped.


Fig. 6. Onset, number of diagnosed subjects, and capture rate before and after exclusion. The light bars indicate the total number of subjects with onset in each month after the index date, and the dark bars indicate the number of subjects correctly predicted as positive. The black line is the capture rate, i.e., the quotient of the dark and light bars. Many subjects with onset in the near future have been dropped.

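The Fig. 6 procedure (fix a threshold at roughly 0.8 recall, then compare captured and total positives per onset month) can be sketched as follows. This is an illustrative helper under assumptions of our own: onset is given as an integer month offset from the index date, a subject counts as captured when its score is at or above the threshold, negatives carry no onset, and the eight subjects are made up.

```python
import math
from collections import Counter


def threshold_for_recall(scores, labels, target_recall):
    """Highest score threshold whose recall (score >= threshold) reaches the target."""
    pos = sorted((s for s, y in zip(scores, labels) if y), reverse=True)
    # Number of positives that must be captured; the epsilon guards float round-up.
    k = max(1, math.ceil(target_recall * len(pos) - 1e-9))
    return pos[k - 1]


def capture_rate_by_month(scores, labels, onset_months, threshold):
    """Per onset month: (positives, captured positives, capture rate)."""
    total = Counter()     # light bars in Fig. 6
    captured = Counter()  # dark bars in Fig. 6
    for s, y, month in zip(scores, labels, onset_months):
        if not y:
            continue  # negatives have no onset
        total[month] += 1
        if s >= threshold:
            captured[month] += 1
    return {m: (total[m], captured[m], captured[m] / total[m]) for m in sorted(total)}


scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0]
onset = [1, 1, None, 2, 3, None, 3, None]

thr = threshold_for_recall(scores, labels, 0.8)
print(thr)  # 0.6
print(capture_rate_by_month(scores, labels, onset, thr))
# {1: (2, 2, 1.0), 2: (1, 1, 1.0), 3: (2, 1, 0.5)}
```

In this toy data the capture rate falls for the latest month, which resembles the qualitative pattern described for Fig. 6(a); it is not a reproduction of the paper’s numbers.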
E. Extending to Whole Population
We further extend the evaluation to the scale of the whole population, using it as an external validation set for the model trained on the local development set. Table VIII shows the results on cohorts under various exclusion criteria on the whole population set. The incidence rate in the population set is much lower than in our local development set, but a similar trend is observed across the different exclusion criteria: the performance drops as the exclusion becomes more stringent. We obtained average PPVs of 16.207%, 2.075%, and 0.318% and AUCs of 0.738, 0.679, and 0.668 on the full set, inpatient exclusion set, and outpatient exclusion set, respectively.

F. Onset Analysis
To understand which events the exclusion criteria eliminate in producing a clinically meaningful cohort, we examine the influence of exclusion by inspecting the onset of lung cancer events. In Fig. 5, which compares the prediction scores of the full set and the outpatient exclusion set along the threshold axis, many samples considered high risk (above 0.9) in the full set are eliminated in the outpatient exclusion set. This shows that our exclusion criteria can effectively filter out samples with obvious prior indications of lung cancer, i.e., cases that do not need an automated prediction model to screen them out.
Furthermore, in Fig. 6, we first select the threshold at which recall is around 0.8 and then compare the captured cases against time after the index date, in months. In the full set, most positive cases lie in the first month after the index date, whereas in the outpatient exclusion set more cases fall in later intervals. This indicates that the excluded samples are mostly high-risk samples in the near future. The gradually decreasing capture rate in Fig. 6(a) shows that the model tends to infer subjects with later onset as lower risk, whereas in Fig. 6(b) the capture rate becomes steady over time. The exclusion balances the performance between events in the near and far future, e.g., predicting the challenging events in the near future, which also improves model stability over time.

G. Background Cancer Model
To investigate whether the model trained on claims data captures lung-cancer-specific information (rather than simply indications of cancer events), we create another model trained on all cancers (not specific to lung cancer) to predict future cancer, as a baseline model of the background cancer information. Table IX shows that the lung cancer model performs better than the general cancer model. On the local development set, the lung cancer model achieves 5.019% average PPV in all-stage prediction, whereas the general cancer model achieves 3.459%, a gain of 1.56 percentage points; the lung cancer model also has a 0.225 percentage-point average PPV gain in early-stage prediction. On the population set, the lung cancer model is 0.022 and 0.008 percentage points superior to the general cancer model on all-stage and early-stage lung cancer, respectively. This indicates that our model is not simply picking up general symptoms of cancer (as reflected in diagnoses and medications) but is learning specific patterns tailored to lung cancer prediction.

H. Key Factors
By exploring model interpretability, we further investigate the key factors in our prediction models. We use integrated gradients to compute individual feature importance [41], [42]. The top-ranked codes of each subject predicted positive at 0.05 sensitivity are gathered, and the most frequent top codes over all subjects are extracted as the generic top factors, listed in Table XI. Despite the noise, slight differences between the top codes of the two models are observed; for example, more medications related to the respiratory system appear among the top positive factors of the lung cancer model, but not in the general cancer model. Note that, in our interpretation, these factors might not be the direct cause of the prediction but rather reflect the prevailing medical-seeking information of the positive subjects.

V. DISCUSSION
A. Demographic Variables Used
We chose to include age and gender as they are commonly used and important factors in clinical research [43], [44] and are widely available in medical and claims databases. Age is a well-known risk factor for many diseases, and gender is another key biological factor that can affect disease incidence and outcomes. However, other demographic factors, if available, such as race or socioeconomic status, could also be important predictors and could be included in future research.

B. Time Used in Experiments
Due to the limited data availability (ten years) in this study, we conduct experiments on different feature and prediction lengths, ranging from one to four years. As shown
in Table X, using longer past features improves predictive performance at the cost of increased computational resources. Conversely, predicting events further into the future becomes more challenging, as the predictive power decreases with increasing time horizons. We chose a three-year window as our main experimental setup, a window length also used in previous literature [45], [46].

C. Cross-Site Limitation
Integrating the proposed model into the healthcare system would be easy because of the well-standardized data formats of health insurance claims records, although some coding conventions may vary across sites. The model can be easily integrated into multi-level healthcare systems that record claims data regularly and in a standardized manner. However, extending a model trained on specific medical centers to other parts of the healthcare system might be challenging.
The proposed methodology can be adapted to other countries’ insurance systems, but specific findings on the NHIRD may not be directly applicable due to differences in socioeconomic factors and claims reimbursement structures. A well-maintained health system makes it more feasible to ensure clean and informative data, whereas incomplete claims data due to unbilled services, patients switching payers, and unstable insurance coverage in some countries could be challenging when building reliable computational frameworks.

VI. CONCLUSION
Non-intrusive claims data collection can yield a cross-site database, even at population scale. Claims datasets have great potential because population-wide medical service-seeking behaviors are recorded continuously and longitudinally. In this work, we address the modeling challenges of ECRs by proposing a transformer-based method, with all-stage and early-stage lung cancer prediction as a case study. Another key component is rigorously designing the cohort to reflect the performance of the prediction model. This work proposes a state-of-the-art transformer framework to encode spatial-temporal information of claims, demonstrates superior prediction performance on lung cancer prediction even at the population level, and presents a comprehensive analysis across diverse sub-cohorts and onset timing. The framework with stringent exclusion is designed not to artificially inflate the performance. Alternatively, a more lenient model could be feasible for solving general in-the-wild cases in real-world applications. In the future, we plan to investigate advanced encoding methods and representations for claims data to improve modeling power, apply the framework to various target diseases to extend its generalization, and finally evaluate it in cross-health-system settings (e.g., the U.S. and U.K.). The aim is to eventually apply this approach as a non-intrusive alternative tool for disease prediction.

REFERENCES
[1] C. Xiao, E. Choi, and J. Sun, “Opportunities and challenges in developing deep learning models using electronic health records data: A systematic review,” J. Amer. Med. Inform. Assoc., vol. 25, no. 10, pp. 1419–1428, 2018.
[2] M. Tayefi et al., “Challenges and opportunities beyond structured data in analysis of electronic health records,” Wiley Interdiscipl. Rev.: Comput. Statist., vol. 13, no. 6, 2021, Art. no. e1549.
[3] H. J. Kam and H. Y. Kim, “Learning representations for the early detection of sepsis with deep neural networks,” Comput. Biol. Med., vol. 89, pp. 248–255, 2017.
[4] S. Wang, J. Pathak, and Y. Zhang, “Using electronic health records and machine learning to predict postpartum depression,” in MEDINFO 2019: Health and Wellbeing E-Networks for All. Amsterdam, The Netherlands: IOS Press, 2019, pp. 888–892.
[5] J. Mullenbach, S. Wiegreffe, J. Duke, J. Sun, and J. Eisenstein, “Explainable prediction of medical codes from clinical text,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., vol. 1, 2018, pp. 1101–1111.
[6] Y. Shao, Q. T. Zeng, K. K. Chen, A. Shutes-David, S. M. Thielke, and D. W. Tsuang, “Detection of probable dementia cases in undiagnosed patients using structured and unstructured electronic health records,” BMC Med. Inform. Decis. Mak., vol. 19, no. 1, pp. 1–11, 2019.
[7] Y. Meng, W. Speier, M. K. Ong, and C. W. Arnold, “Bidirectional representation learning from transformers using multimodal electronic health record data to predict depression,” IEEE J. Biomed. Health Inform., vol. 25, no. 8, pp. 3121–3129, Aug. 2021.
[8] M. T. Wallin et al., “The prevalence of MS in the United States: A population-based estimate using health claims data,” Neurology, vol. 92, no. 10, pp. e1029–e1040, 2019.
[9] A. Postler et al., “Prevalence and treatment of hip and knee osteoarthritis in people aged 60 years or older in Germany: An analysis based on health insurance claims data,” Clin. Interv. Aging, vol. 13, 2018, Art. no. 2339.
[10] N.-P. Yang et al., “Estimated prevalence of osteoporosis from a nationwide health insurance database in Taiwan,” Health Policy, vol. 75, no. 3, pp. 329–337, 2006.
[11] J.-H. Hsu, I.-C. Chien, and C.-H. Lin, “Increased risk of ischemic heart disease in patients with bipolar disorder: A population-based study,” J. Affect. Disord., vol. 281, pp. 721–726, 2021.
[12] C. M. Lakhani, B. T. Tierney, A. K. Manrai, J. Yang, P. M. Visscher, and C. J. Patel, “Repurposing large health insurance claims data to estimate genetic and environmental contributions in 560 phenotypes,” Nature Genet., vol. 51, no. 2, pp. 327–334, 2019.
[13] S. M. Mohnen et al., “Healthcare costs of patients on different renal replacement modalities–analysis of Dutch health insurance claims data,” PLoS One, vol. 14, no. 8, 2019, Art. no. e0220800.
[14] S.-W. Cheng, C.-Y. Wang, J.-H. Chen, and Y. Ko, “Healthcare costs and utilization of diabetes-related complications in Taiwan: A claims database analysis,” Medicine, vol. 97, no. 31, 2018, Art. no. e11602.
[15] M. C.-H. Yeh, Y.-H. Wang, H.-C. Yang, K.-J. Bai, H.-H. Wang, and Y.-C. J. Li, “Artificial intelligence–based prediction of lung cancer risk using nonimaging electronic medical records: Deep learning approach,” J. Med. Internet Res., vol. 23, no. 8, 2021, Art. no. e26256.
[16] X. Wang et al., “Prediction of the 1-year risk of incident lung cancer: Prospective study using electronic health records from the State of Maine,” J. Med. Internet Res., vol. 21, no. 5, 2019, Art. no. e13260.
[17] Q. Yuan et al., “Performance of a machine learning algorithm using electronic health record data to identify and estimate survival in a longitudinal cohort of patients with lung cancer,” JAMA Netw. Open, vol. 4, no. 7, 2021, Art. no. e2114723.
[18] C. S. Kim and M. D. Jeter, “Radiation therapy for early stage non-small cell lung cancer,” in StatPearls [Internet]. Petersburg, FL, USA: StatPearls Publishing, 2021.
[19] National Lung Screening Trial Research Team et al., “The National Lung Screening Trial: Overview and study design,” Radiol., vol. 258, no. 1, 2011, Art. no. 243.
[20] National Lung Screening Trial Research Team, “Reduced lung-cancer mortality with low-dose computed tomographic screening,” New England J. Med., vol. 365, no. 5, pp. 395–409, 2011.
[21] R. A. Smith et al., “Cancer screening in the United States, 2018: A review of current American Cancer Society guidelines and current issues in cancer screening,” CA: A Cancer J. Clinicians, vol. 68, no. 4, pp. 297–316, 2018.
[22] T. P. Quock, T. Yan, E. Chang, S. Guthrie, and M. S. Broder, “Epidemiology of AL amyloidosis: A real-world study using US claims data,” Blood Adv., vol. 2, no. 10, pp. 1046–1053, 2018.
[23] A. M. Schwartz, K. J. Kugeler, C. A. Nelson, G. E. Marx, and A. F. Hinckley, “Use of commercial claims data for evaluating trends in Lyme disease diagnoses, United States, 2010–2018,” Emerg. Infect. Dis., vol. 27, no. 2, 2021, Art. no. 499.
[24] D. H. Kim, E. Patorno, A. Pawar, H. Lee, S. Schneeweiss, and R. J. Glynn, “Measuring frailty in administrative claims data: Comparative performance of four claims-based frailty measures in the US Medicare data,” J. Gerontol.: Ser. A, vol. 75, no. 6, pp. 1120–1125, 2020.
[25] X. Min, B. Yu, and F. Wang, “Predictive modeling of the hospital readmission risk from patients’ claims data using machine learning: A case study on COPD,” Sci. Rep., vol. 9, no. 1, pp. 1–10, 2019.
[26] B.-H. Kim, S. Sridharan, A. Atwal, and V. Ganapathi, “Deep claim: Payer response prediction from claims data with deep learning,” in Proc. Healthcare Syst., Population Health, Role Health-Tech (HSYS) Workshop 37th Int. Conf. Mach. Learn., 2020.
[27] C.-Y. Hung, W.-C. Chen, P.-T. Lai, C.-H. Lin, and C.-C. Lee, “Comparing deep neural network and other machine learning algorithms for stroke prediction in a large-scale population-based electronic medical claims database,” in Proc. IEEE 39th Annu. Int. Conf. Eng. Med. Biol. Soc., 2017, pp. 3110–3113.
[28] A. Vaswani et al., “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6000–6010.
[29] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Lang. Technol., vol. 1, 2019, pp. 4171–4186.
[30] A. Dosovitskiy et al., “An image is worth 16 × 16 words: Transformers for image recognition at scale,” in Proc. Int. Conf. Learn. Representations, 2021.
[31] Y. Li et al., “BEHRT: Transformer for electronic health records,” Sci. Rep., vol. 10, no. 1, pp. 1–12, 2020.
[32] Y. Li et al., “Hi-BEHRT: Hierarchical transformer-based model for accurate prediction of clinical events using multimodal longitudinal electronic health records,” IEEE J. Biomed. Health Inform., vol. 27, no. 2, pp. 1106–1117, Feb. 2023.
[33] E. Choi et al., “Learning the graphical structure of electronic health records with graph convolutional transformer,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 606–613.
[34] L. Rasmy, Y. Xiang, Z. Xie, C. Tao, and D. Zhi, “Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction,” NPJ Digit. Med., vol. 4, no. 1, pp. 1–13, 2021.
[35] C.-Y. Hsieh et al., “Taiwan’s National Health Insurance Research Database: Past and future,” Clin. Epidemiol., vol. 11, 2019, Art. no. 349.
[36] World Health Organization (WHO), “The ICD-10 classification of mental and behavioural disorders: Diagnostic criteria for research,” Geneva, Switzerland: World Health Organization, 1993.
[37] WHO Collaborating Centre for Drug Statistics Methodology, Guidelines for ATC Classification and DDD Assignment, Oslo, Norway: Norwegian Institute of Public Health, 2023.
[38] M. Koo, J.-T. Lai, E. Y.-L. Yang, T.-C. Liu, and J.-H. Hwang, “Incidence of vestibular schwannoma in Taiwan from 2001 to 2012: A population-based national health insurance study,” Ann. Otology Rhinol. Laryngol., vol. 127, no. 10, pp. 694–697, 2018.
[39] H.-Y. Chen, I.-C. Chen, Y.-H. Chen, C.-C. Chen, C.-Y. Chuang, and C.-H. Lin, “The influence of socioeconomic status on esophageal cancer in Taiwan: A population-based study,” J. Personalized Med., vol. 12, no. 4, 2022, Art. no. 595.
[40] C.-Y. Hung, C.-H. Lin, and C.-C. Lee, “Improving young stroke prediction by learning with active data augmenter in a large-scale electronic medical claims database,” in Proc. IEEE 40th Annu. Int. Conf. Eng. Med. Biol. Soc., 2018, pp. 5362–5365.
[41] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 3319–3328.
[42] N. Kokhlikyan et al., “Captum: A unified and generic model interpretability library for PyTorch,” 2020, arXiv [cs.LG].
[43] X. Wang, K. Ma, J. Cui, X. Chen, L. Jin, and W. Li, “An individual risk prediction model for lung cancer based on a study in a Chinese population,” Tumori J., vol. 101, no. 1, pp. 16–23, 2015.
[44] H. A. Katki, S. A. Kovalchik, C. D. Berg, L. C. Cheung, and A. K. Chaturvedi, “Development and validation of risk models to select ever-smokers for CT lung cancer screening,” JAMA, vol. 315, no. 21, pp. 2300–2311, 2016.
[45] S. A. Kovalchik et al., “A regression model for risk difference estimation in population-based case–control studies clarifies gender differences in lung cancer risk of smokers and never smokers,” BMC Med. Res. Methodol., vol. 13, no. 1, pp. 1–8, 2013.
[46] M. C. Tammemägi et al., “Development and validation of a multivariable lung cancer risk prediction model that includes low-dose computed tomography screening results: A secondary analysis of data from the National Lung Screening Trial,” JAMA Netw. Open, vol. 2, no. 3, 2019, Art. no. e190204.
