
This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2017.2789324, IEEE Access

Access-2017-07607 1

Predicting the Risk of Heart Failure with EHR


Sequential Data Modeling
B. Jin*, Senior Member, IEEE, C. Che*, Z. Liu, Shulong Zhang, Xiaomeng Yin, and X.P. Wei

Abstract—Electronic health records (EHRs) contain patient diagnostic records, physician records, and records of hospital departments. For heart failure, we can obtain massive unstructured data from EHR time series. By analyzing and mining these time-based EHRs, we can identify the links between diagnostic events and ultimately predict when a patient will be diagnosed. However, it is difficult to use the existing EHR data directly because they are sparse and non-standardized. Thus, this paper proposes an effective and robust architecture for heart failure prediction. The main contribution of this paper is to predict heart failure using a neural network (i.e., to predict the possibility of cardiac illness based on a patient's electronic medical data). Specifically, we employed one-hot encoding and word vectors to model the diagnosis events and predicted heart failure events using the basic principles of a long short-term memory (LSTM) network model. Evaluations based on a real-world dataset demonstrate the promising utility and efficacy of the proposed architecture in the prediction of the risk of heart failure.

Index Terms—Electronic health records, heart failure, risk prediction

I. INTRODUCTION

Heart failure, also referred to as congestive heart failure, occurs when the heart cannot pump enough blood to meet the body's needs [1]. The risk factors for heart failure include [2] high blood pressure, a prior heart attack, obesity, smoking, alcohol abuse, vitamin deficiencies, sleep apnea, heavy metal toxicity, an unhealthy diet (including animal fats and salt), and being sedentary. Heart failure is more common among people over the age of 65, overweight people, and those with a previous heart attack. The diagnostic method for heart failure is primarily based on the patient's medical and family histories, a physical examination, and test results. The signs and symptoms of heart failure are also common in other conditions. Thus, physicians identify any damage to a patient's heart and check how well the patient's heart pumps blood. These diagnostic methods produce massive sequential data, and it is a non-trivial task to perform an accurate diagnosis with such massive data, particularly in the early stages. Indeed, a method for the early diagnosis of heart failure with a low error rate is critically needed for clinical trials and treatments [3]. By analyzing these sequential datasets, we have an opportunity to provide early diagnoses and treatments for people who are likely to have heart failure and help them live longer, more active lives.

A preferred strategy to resolve the problems of accurate diagnosis and the delivery of targeted therapies is the frequent performance of complete physical evaluations [4]. However, complete and frequent physical evaluations would lead to data overload. Heart failure patients and society would benefit if we could provide an accurate, systematic diagnostic service for the population. To this end, this paper develops a new approach to this vital task using an enhanced long short-term memory network (LSTM) method and a data-driven framework.

Specifically, we treat each patient as a dynamic system that can be measured by a set of time series, such as the results of different lab tests, records, and medical indicators. Our key idea is to analyze these time series. A time series is a sequence that provides the value of a statistical indicator in the order of time [5]; it indicates the trend of the numerical value of the statistical index of the study object over a certain period. The traditional prediction methods based on time series primarily comprise the exponential smoothing method [6], the autoregressive integrated moving average (ARIMA) model [7], recurrent neural networks (RNNs) [8], and the long short-term memory (LSTM) network [9]. Currently, however, researchers often infer the diagnosis events with vectors in an unsupervised manner. In contrast, it is extremely valuable to model the diagnosis events with similarity learning. In this paper, we propose a novel method for diagnosis event modeling that combines one-hot encoding and word vectors and employs an LSTM approach for heart failure prediction with the modeled diagnosis events as the input. Experimental results on a real-world dataset demonstrate the performance of the improved diagnosis prediction method.

The rest of the paper is organized as follows. In Section 2,

September 1, 2017. Corresponding author: C. Che ([email protected]). * Equal contributions. This work was supported in part by the Natural Science Foundation of China Grants 61772110, 61402068, and 91546123.
B. Jin is with the School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian, China, Postal code 116024 (e-mail: [email protected]).
C. Che is with the Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, Dalian University, Dalian, China, Postal code 116622 (e-mail: [email protected]).
Z. Liu is now an undergraduate student with the School of Computer Science, Dalian University of Technology, Dalian, China, Postal code 116024 (e-mail: 2431871071@qq.com).
Shulong Zhang is with Affiliated Zhongshan Hospital of Dalian University, Dalian, China, Postal code 116001 (e-mail: [email protected]).
Xiaomeng Yin is with the First Affiliated Hospital of Dalian Medical University, Dalian, China, Postal code 116011 (e-mail: [email protected]).
X.P. Wei is with the School of Computer Science, Dalian University of Technology, Dalian, China, Postal code 116024 (e-mail: [email protected]).

2169-3536 (c) 2017 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

we introduce the background of time series analysis. Section 3 introduces the proposed modeling architecture, and we evaluate it using real-world data in Section 4. We discuss the related work in Section 5 and conclude our work and highlight future research directions in Section 6.

II. BACKGROUND

A. Autoregressive integrated moving average model (ARIMA)

The ARIMA model and the exponential smoothing model exhibit better performance with short-term time series prediction [10] and are thus suitable for numerical sequences. For non-numeric time series, a neural network can be constructed to solve the problem. In most time series predictions, deep-learning methods provide better predictions than the ARIMA model or the exponential smoothing model. Deep-learning algorithms are based on traditional neural networks.

B. Recurrent neural network (RNN)

With today's increasing computational power, deep learning has been used to build many complex neural networks, such as convolutional neural networks (CNNs) [11], recurrent neural networks (RNNs) [12], and deep neural networks (DNNs) [13]. These networks have enabled breakthroughs in natural language processing (NLP), image recognition, speech recognition, and other fields. RNNs are suitable for dealing with time-series prediction problems. An RNN consists of an input layer, a hidden layer, and an output layer. The result of the hidden layer is related to the input of the current layer and the output of the previous layer. Using this mechanism, an RNN gains the ability to remember historical results. Through transfer between the hidden layers, the previous information is passed to the next step of the sequence, which establishes the relationship across the time series.

C. Long short-term memory network (LSTM)

Hochreiter and Schmidhuber [14] proposed the LSTM model, a special RNN model, in 1997. To achieve long-term memory, a plain RNN needs to hook the state of the current hidden layer to the state of a hidden layer n levels earlier. This results in an exponential increase in the amount of computation, which in turn increases the time cost of the model. Thus, RNNs are not directly used for long-term memory calculations. The LSTM adds valve (gate) nodes to the layers of the original RNN network, which helps overcome the problems RNNs have with long-term memory calculations, and this approach has been widely used. Specifically, the LSTM adds three gates to the original RNN structure: an input gate, a forget gate, and an output gate. In recent years, many researchers have made minor changes to the LSTM model. One popular LSTM variant, introduced by Gers and Schmidhuber [15], adds "peephole connections" (i.e., the gate layers are allowed to look at the cell state). Another, larger variant is the gated recurrent unit (GRU) model proposed by Chung et al. [16], in which the forget gate and the input gate are combined into a single update gate. The GRU also merges the cell state and the hidden state and makes some other changes. The GRU can increase the persistence of the memories of RNNs and thus support longer sequences. Moreover, the GRU model is both simpler and more popular than the standard LSTM model.

III. METHODS

In contrast to the abovementioned methods, we develop a word-vector-enhanced LSTM framework that can jointly construct the LSTM model with a word-to-vector method learned with available supervising constraints. Our work is based on the basic LSTM model. We present the details of our approach in the following section.

A. Diagnostic event sequence preprocessing

The input of our framework is the patient's diagnostic event sequence. In this paper, we use two methods to process the diagnostic event sequence into the form of the model input. The first is the one-hot method [17], and the second is the word vector method [18].

The one-hot method represents each diagnostic event as a vector whose length is equal to the number of distinct diagnostic events. The vector contains a single 1, and the other cells are all 0s; the 1 corresponds to the current diagnostic event. One-hot coding is currently the most widely used method and is most convenient when there are only a few dimensions. However, one-hot encoding is not good at characterizing similarities between different words. For example, consider a vocabulary V in which each word w_i has a label. The word w_i can be expressed as a vector of length |V| with the one-hot method: the i-th element is 1, and the others are all 0s. Assuming that the second word is "Cardiac Failure" and the third word is "Heart Failure", then:

w_2 = [0, 1, 0, ..., 0]^T
w_3 = [0, 0, 1, ..., 0]^T    (5)

"Cardiac Failure" and "Heart Failure" are semantically the same, but the one-hot expression does not reflect the similarity between the two words.

The other method is the word vector model. As one of the distributed representation methods, the word vector model provides a way to directly calculate the similarity between two words. The basic idea of the model is to map each word into a fixed-length vector by studying a large corpus. In general, the vector length is much smaller than the size of the language's dictionary, usually between tens and hundreds of dimensions. All vectors make up the word vector space, and each vector represents a point in that space, so that the distance between points can be used to measure the similarity between two words. We use a three-layer neural network to construct a language model whose structure is illustrated in Figure 1.
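As a concrete illustration of the one-hot scheme described above, the short Python sketch below builds such vectors for a toy event vocabulary (the event names here are illustrative stand-ins, not codes from the paper's dataset) and shows that two semantically related events come out orthogonal:

```python
import numpy as np

def one_hot(event, vocabulary):
    """Represent a diagnostic event as a |V|-length vector with a single 1."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(event)] = 1.0
    return vec

# A toy event vocabulary V; real EHR vocabularies run to thousands of events.
V = ["hypertension", "cardiac failure", "heart failure", "diabetes"]

w2 = one_hot("cardiac failure", V)   # [0, 1, 0, 0]
w3 = one_hot("heart failure", V)     # [0, 0, 1, 0]

# Semantically similar events are orthogonal under one-hot encoding:
# their dot product is 0, so no similarity is captured.
print(float(w2 @ w3))  # 0.0
```

This orthogonality is exactly the weakness of equation (5): every pair of distinct events, related or not, is equally dissimilar.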

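By contrast, the word vector model places similar events close together. The sketch below measures that closeness with cosine similarity; the three-dimensional vectors are hand-set toys standing in for trained word2vec embeddings, and the event names are again illustrative:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: closeness of two word vectors in the embedding space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy fixed-length vectors standing in for trained embeddings (the paper uses
# word2vec); the dimension is far smaller than the vocabulary size |V|.
emb = {
    "cardiac failure": np.array([0.9, 0.1, 0.3]),
    "heart failure":   np.array([0.8, 0.2, 0.35]),
    "cataract":        np.array([-0.1, 0.9, -0.4]),
}

print(cosine(emb["cardiac failure"], emb["heart failure"]))  # close to 1
print(cosine(emb["cardiac failure"], emb["cataract"]))       # much lower
```

With trained embeddings, distances like these let the model treat "Cardiac Failure" and "Heart Failure" as near-synonyms rather than unrelated codes.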

Fig. 1 Three-layer neural network language model

C is a matrix of size |V| × m, where |V| is the total number of words in the corpus and m is the dimension of the word vector. C(w) denotes the word vector of word w. The entire model uses a unique set of word vectors and essentially consists of the following three layers:

Input layer: splices the word vectors C(w_{t-n+1}), ..., C(w_{t-2}), C(w_{t-1}) to obtain the input vector x.

Hidden layer: updates the state by calculating d + Hx directly, where d is the offset item, H is the h × (n-1)m hidden-layer weight matrix, and h represents the number of hidden units.

Output layer: there are |V| nodes in all. Each node y_i represents the unnormalized log-probability of the next word being word i. The softmax activation function is then used to normalize the output values y_i, which are computed according to the following formula:

y = b + Wx + U tanh(d + Hx)    (6)

B. Risk prediction

The LSTM model for heart failure risk prediction is illustrated in Figure 2. Each valve node uses the sigmoid function (σ) for calculation, taking the memory state of the network as input. The output '0' of the sigmoid layer corresponds to the closed state of the gate, and the output '1' corresponds to the open state. If the value of the output gate exceeds the threshold, the output is multiplied by the output of the current layer and taken as the input of the next layer; otherwise, it is forgotten. By controlling the opening and closing of the valves, the influence of the earlier part of the sequence on the final result is realized.

Fig. 2 LSTM network structure

The first part of Figure 2 (labeled "1") determines which information is discarded from the cell state. Here, h_{t-1} represents the state of the hidden layer at moment t-1, and x_t represents the input at moment t. This decision is made by the forget gate: the gate reads the values of h_{t-1} and x_t and outputs a value between 0 and 1 for the state C_{t-1} of each cell through the σ function. A "1" means everything is kept, and a "0" means everything is discarded. Equation (1) gives the forget gate:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (1)

The second part of Figure 2 (labeled "2") is used to update the cell status and includes a sigmoid layer and a tanh layer. The sigmoid layer determines which values need to be updated, and a new candidate value is created through the tanh layer. i_t and C̃_t are calculated from the sigmoid layer and the tanh layer, respectively:

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    (2)

The third part updates the cell state from C_{t-1} to C_t. First, C_{t-1} is multiplied by f_t, discarding the information that needs to be dropped. Then, i_t * C̃_t is added to obtain the new state C_t:

C_t = f_t * C_{t-1} + i_t * C̃_t    (3)

The last part produces the output value. The sigmoid function determines which part needs to be output. The cell state is passed through tanh (to obtain a value between -1 and 1) and multiplied by the sigmoid output; this yields the output part h_t:

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)    (4)

IV. DATA ANALYSIS

This paper used electronic health record (EHR) data from a real-world dataset related to congestive heart disease to perform the experiment. First, we extracted the records of patients who had had heart failure for more than four years. The dataset consists of two parts: dataset A and dataset B. Dataset A contains the diagnostic records of 5,393 patients who have been diagnosed with heart failure; the records mainly include patient IDs, recording times, diagnosis events, and diagnosis times. Dataset B contains the diagnostic records of 17,001 patients who have not been diagnosed with heart failure; the records mainly include patient IDs, recording times, and diagnostic events.

Figure 3 illustrates the distribution of patients in dataset A according to the period covered by their diagnostic records. The number of days diagnosed in Figure 3 refers to the number of days that elapsed from when the patient began the diagnosis process and treatment until the illness was determined. Figure 4 shows the same distribution for the patients in dataset B. As illustrated in Figures 3 and 4, the diagnostic periods of the patients in dataset A mainly fall within six months, whereas those in dataset B mainly fall within three months.
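The gate equations (1)-(4) above can be traced in a minimal NumPy sketch of a single LSTM cell update; the weight shapes and toy dimensions here are illustrative, not the paper's configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following equations (1)-(4).

    W and b hold the weights/biases of the forget, input, candidate, and
    output transforms, each applied to the concatenation [h_{t-1}, x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # (1) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # (2) input gate
    c_tilde = np.tanh(W["C"] @ z + b["C"])    # (2) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # (3) new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])        # (4) output gate
    h_t = o_t * np.tanh(c_t)                  # (4) new hidden state
    return h_t, c_t

# Toy sizes: 4-dimensional event vectors, 3 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) for k in "fiCo"}
b = {k: np.zeros(n_hid) for k in "fiCo"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.standard_normal((5, n_in)):    # a length-5 input sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (3,)
```

Because o_t lies in (0, 1) and tanh(C_t) lies in (-1, 1), every component of the hidden state h_t stays strictly inside (-1, 1) regardless of sequence length.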


Fig. 3. Distribution of patients in dataset A

Fig. 4. Distribution of patients in dataset B

Next, we extract each patient's diagnostic record from the dataset, including the patient number, diagnostic event number, diagnostic time, and time of illness. For dataset B, the time of illness is unknown; thus, the time of the last diagnosis was taken as the time of illness. Then, each patient was given a sick label: those in dataset A were marked as sick, and those in dataset B were marked as not sick. The data formats of datasets A and B are presented in Tab. 1 and Tab. 2, respectively.

Tab. 1. Data format of dataset A
Patient ID | Recording timestamp | Event ID | Event description | Diagnostic timestamp
145976     | 74050               | 1694     | pentoxifylline    | 74995
145976     | 74074               | 1075     | cataract          | 74995
...        | ...                 | ...      | ...               | ...
999874887  | 75025               | 963      | heart disease     | 75025

Tab. 2. Data format of dataset B
Patient ID | Recording timestamp | Event ID | Event description
17185      | 74084               | 1082     | obesity and overuse
17185      | 74094               | 1245     | chronic renal failure
...        | ...                 | ...      | ...
999987850  | 75101               | 655      | diabetes

V. EXPERIMENTAL RESULTS

In this paper, the phrases "embedded vector" and "one-hot encoding" refer to the manner in which the diagnostic events were handled. "Embedded vector" uses the word2vec tool to perform word vector training on the patients' diagnostic events and represents the 1,864 diagnostic events by word vectors. "One-hot encoding" represents each diagnostic event directly with a 1,864-dimension vector in which the cell corresponding to the event is set to 1 and the others are set to 0. The data were divided into five folds using a 5-fold cross-validation method: four folds were used as the training set and one fold as the validation set. The training set was rotated, and this process was repeated five times for verification. The word vector data were likewise processed into five training sets and five validation sets.

To evaluate the performance of the proposed model, we compared it with several state-of-the-art alternatives as baselines, including logistic regression (LR), random forest (RF), and AdaBoost. We used the receiver operating characteristic (ROC), precision-recall (PR), area under the curve (AUC), and F1 score metrics to evaluate the proposed method [19]. The experimental results are presented in Tables 3 and 4.

Tab. 3. Experimental results for the one-hot processing
Model    | ROC-AUC | PR-AUC | F1 Score
LSTM     | 0.6483  | 0.2401 | 0.2787
LR       | 0.5335  | 0.1456 | 0.0012
RF       | 0.5480  | 0.1589 | 0.0022
AdaBoost | 0.5955  | 0.1780 | 0.0473

Tab. 4. Experimental results for the word vector embedding processing
Model    | ROC-AUC | PR-AUC | F1 Score
LSTM     | 0.6827  | 0.2678 | 0.2186
LR       | 0.6633  | 0.2251 | 0.0394
RF       | 0.6270  | 0.2025 | 0.0060
AdaBoost | 0.6302  | 0.2124 | 0.1336

The average results of the proposed method and the three baselines (i.e., ROC-AUC, PR-AUC, and F1 score) were obtained after 5-fold cross validation. From the experimental results, we can draw the following conclusions:

a. The accuracy of LSTM disease prediction is higher than those of the LR, RF, and AdaBoost algorithms, which indicates that the LSTM model is superior.

b. A comparison of the two tables shows that the LSTM model using word embedding vectors to represent the patient diagnostic events outperforms the model using one-hot processing.

VI. CASE STUDY

As presented in Table 5, 1,064 different diagnostic events were obtained by numbering the events. Each event has a label based on its diagnostic event contents.

Tab. 5. Sample of the diagnostic events
Label | Patient ID | Event IDs           | Diagnostic timestamps    | Sick timestamp
False | 297022117  | 1257, 655, 930, ... | 74451, 74456, 74520, ... | 74982
True  | 483534609  | 117, 655, 339, ...  | 74020, 74027, 74134, ... | 74134
True  | 288857360  | 1186, 931, 147, ... | 74397, 74408, 74924, ... | 74924
...   | ...        | ...                 | ...                      | ...

Figure 5 illustrates a case study of a patient's diagnosis process. This patient began by being diagnosed with acute myocarditis. The next day, he was diagnosed with other lung-related diseases. A week later, he was again diagnosed with acute myocarditis. A month later, he was diagnosed with heart disease, and finally, he was diagnosed with heart failure. As described in Figure 5, this patient's diagnostic record forms a time series. The patients are independent of each other. However, there is a relationship between a patient's diagnostic events,


namely, previous diagnostic events have an influence on future diagnostic results.

Fig. 5 Patient diagnosis process: I40 Acute myocarditis (ID 806) → 1 day → A0222 Salmonella pneumonia (ID 1279) → 1 week → I40 Acute myocarditis (ID 806) → 1 month → I42 Cardiomyopathy (ID 280) → 1 day → I50 Heart Failure (ID 1352)

VII. RELATED WORKS

Traditional time series methods using linear models for low-dimensional data have been widely applied to EHRs; e.g., modeling the progression of chronic kidney disease to kidney failure using the Cox proportional hazards model [20], modeling the progression of Alzheimer's disease using the hidden Markov model [21] and the fused group Lasso [22], modeling the progression of glaucoma using a 2-dimensional continuous-time hidden Markov model [23], modeling the progression of lung disease using graphical models with the Gaussian process [24], modeling the progression of chronic obstructive pulmonary disease using the Markov jump process [25], and modeling the progression of multiple diseases using the Hawkes process [26]. These previous works were not able to model high-dimensional non-linear relations.

Deep learning methods have recently led to a renaissance of neural network-based models. Hochreiter and Schmidhuber [27] proposed long short-term memory (LSTM), which exhibited impressive performance in numerous sequence-based tasks such as handwriting recognition, acoustic modeling of speech, language modeling, and language translation. Hammerla et al. [28] applied restricted Boltzmann machines to time series data collected from wearable sensors to predict the state of Parkinson's disease patients. Lipton et al. [29] used LSTM for multilabel diagnosis prediction using pediatric ICU time series data (e.g., heart rate, blood pressure, glucose level, etc.). Both of these latter studies used multivariate time series data from patients but focused on very different clinical conditions with continuous time series data.

The prediction and earlier detection of heart failure could lead to improved outcomes through patient engagement and more assertive treatment. Previous work on the early detection of heart failure has relied on conventional modeling techniques such as logistic regression (LR) and support vector machines (SVMs), using features that represent the aggregation of events in an observation window and exclude the temporal relations among events in that window. In contrast, recurrent neural network (RNN) methods capture temporal patterns that are present in longitudinal data. RNN models have proven effective in many difficult machine-learning tasks, such as image captioning [30] and language translation [31]. Extending these methods to health data is sensible. We borrowed from the prior work to leverage similar representations of medical concepts through word vectors, but we focused on temporal modeling in the use of LSTM to predict heart failure.

VIII. CONCLUSIONS

In this paper, we propose a novel predictive model framework for heart failure diagnosis using LSTM methods. Compared to popular methods such as LR, RF, and AdaBoost, our method exhibits superior performance in the prediction of heart failure diagnosis. In the experimental data analysis and preprocessing, we used one-hot encoding and word embedding vectors to represent the patient diagnostic events. By analyzing the results, we reveal the importance of respecting the sequential nature of clinical records. Future work will include incorporating expert knowledge into our framework and expanding our approach to additional health care applications.

REFERENCES

[1] Huang, H., Huang, B., Li, Y., Huang, Y., Li, J., Yao, H., ... & Wang, J. (2014). Uric acid and risk of heart failure: a systematic review and meta-analysis. European Journal of Heart Failure, 16(1), 15-24.
[2] Ford, I., Robertson, M., Komajda, M., Böhm, M., Borer, J. S., Tavazzi, L., ... & SHIFT Investigators. (2015). Top ten risk factors for morbidity and mortality in patients with chronic systolic heart failure and elevated heart rate: the SHIFT Risk Model. International Journal of Cardiology, 184, 163-169.
[3] Choi, E., Schuetz, A., Stewart, W. F., & Sun, J. (2016). Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association, 24(2), 361-370.
[4] Hripcsak, G., & Albers, D. J. (2012). Next-generation phenotyping of electronic health records. Journal of the American Medical Informatics Association, 20(1), 117-121.
[5] Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control. John Wiley & Sons.
[6] Bianchi, F. M., De Santis, E., Rizzi, A., & Sadeghian, A. (2015). Short-term electric load forecasting using echo state networks and PCA decomposition. IEEE Access, 3, 1931-1943.
[7] Pati, J., Kumar, B., Manjhi, D., & Shukla, K. K. (2017). A comparison among ARIMA, BP-NN and MOGA-NN for software clone evolution prediction. IEEE Access.
[8] Su, Y. T., Lu, Y., Chen, M., & Liu, A. A. (2017). Spatiotemporal joint mitosis detection using CNN-LSTM network in time-lapse phase contrast microscopy images. IEEE Access.
[9] Zhu, G., Zhang, L., Shen, P., & Song, J. (2017). Multimodal gesture recognition using 3D convolution and convolutional LSTM. IEEE Access.
[10] Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159-175.
[11] Jenkins, G. M., & Alavi, A. S. (1981). Some aspects of modelling and forecasting multivariate time series. Journal of Time Series Analysis, 2(1), 1-47.
[12] Brown, R. G. (1957). Exponential smoothing for predicting demand. Operations Research, 5(1). INFORMS.
[13] Box, G. E. P., Jenkins, G. M., & Reinsel, G. C.


(1976). Linear nonstationary models. Time Series Analysis, Fourth Edition, 93-136.
[14] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[15] Gers, F. A., & Schmidhuber, J. (2000). Recurrent nets that time and count. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), Vol. 3. IEEE.
[16] Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
[17] Hinton, G. E. (1986). Learning distributed representations of concepts. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Vol. 1.
[18] Bengio, Y., et al. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137-1155.
[19] Buckley, C., & Voorhees, E. M. (2004). Retrieval evaluation with incomplete information. In Proceedings of SIGIR '04.
[20] Tangri, N., Stevens, L., Griffith, J., et al. (2011). A predictive model for progression of chronic kidney disease to kidney failure. JAMA, 305(15), 1553-1559.
[21] Sukkar, R., Katz, E., Zhang, Y., Raunig, D., & Wyman, B. (2012). Disease progression modeling using hidden Markov models. In Engineering in Medicine and Biology Society, 2845-2848.
[22] Zhou, J., Liu, J., Narayan, V., & Ye, J. (2013). Modeling disease progression via multitask learning. NeuroImage, 78, 233-248.
[23] Liu, Y.-Y., Ishikawa, H., Chen, M., Wollstein, G., Schuman, J., & Rehg, J. (2013). Longitudinal modeling of glaucoma progression using 2-dimensional continuous-time hidden Markov model. In Medical Image Computing and Computer-Assisted Intervention (MICCAI), 444-451. Nagoya, Japan.
[24] Schulam, P., & Saria, S. (2015). A framework for individualizing predictions of disease trajectories by exploiting multi-resolution structure. In Advances in Neural Information Processing Systems (NIPS), 748-756. Montreal, Quebec, Canada.
[25] Wang, X., Sontag, D., & Wang, F. (2014). Unsupervised learning of disease progression models. In Knowledge Discovery and Data Mining (KDD), 85-94. New York, NY, USA.
[26] Choi, E., Du, N., Chen, R., Song, L., & Sun, J. (2015). Constructing disease network and temporal progression model via context-sensitive Hawkes process. In International Conference on Data Mining (ICDM), 721-726. Atlantic City, NJ, USA.
[27] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
[28] Hammerla, N., Fisher, J., Andras, P., Rochester, L., Walker, R., & Plotz, T. (2015). PD disease state assessment in naturalistic environments using deep learning. In AAAI, 1742-1748. Austin, Texas, USA.
[29] Lipton, Z., Kale, D., Elkan, C., & Wetzell, R. (2016). Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677.
[30] Karpathy, A., & Li, F. (2015). Deep visual-semantic alignments for generating image descriptions. In Computer Vision and Pattern Recognition (CVPR), 3128-3137. Boston, MA, USA.
[31] Cho, K., Van Merrienboer, B., Gulcehre, C., et al. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Empirical Methods in Natural Language Processing (EMNLP), 1724-1734. Doha, Qatar.
