
Project Draft for CS598-DLH Spring 2024

Temporal Pointwise Convolutional Networks for Length of Stay Prediction in the Intensive Care Unit (ICU)

Vibhor Jain, Priyank Jain
https://github.com/vibhor-github/lenght-of-stay.git

ABSTRACT
The duration of a patient's inpatient stay is crucial not only for their outcome but also for the effective planning and management of hospital resources, and it directly influences readmission rates as well. Accurately predicting the length of stay can positively impact mortality rates and aid in efficient resource allocation and management, thus enhancing patient satisfaction by minimizing unnecessary readmissions. Active research has explored various traditional machine learning methods for estimating the length of stay early on. However, deep learning techniques have demonstrated superior performance in healthcare research compared to traditional approaches. In our study, we propose employing a deep learning-based approach, specifically a CNN model, to enhance the prediction of patient length of stay using routinely collected inpatient data.

1. RELATED WORK
Numerous significant research endeavors have been undertaken to forecast the length of stay in healthcare settings. Early investigations, such as that by David H. Gustafson [8], utilized Bayesian regression techniques to demonstrate the feasibility of cost-effective length-of-stay prediction. Suresh et al. [32] leveraged neural network concepts like backpropagation to discern patterns in patient data for predictive purposes. Similarly, Clark et al. [5] applied Poisson regression methodology to estimate lengths of stay.

Recent years have seen a surge in interest in employing deep learning methodologies for length-of-stay prediction. Gentimis et al. [11], for instance, employed a basic neural network on the MIMIC-III [19] dataset, illustrating the development of a generalizable model with high predictive accuracy across diverse health conditions. Rocheteau et al. [29] utilized temporal convolutional neural networks (CNNs) to capture temporal trends and inter-feature relations for length-of-stay prediction, achieving notable improvements over baseline models like standard LSTM, channel-wise LSTM, and transformer models. Clinical notes, being rich repositories of patient information, have emerged as valuable data sources for prediction models. Huang et al. [16] devised ClinicalBERT, based on the BERT [7] model, to extract insights from clinical notes and enhance predictions of hospital readmission. Moreover, studies like that by Weissman et al. [33] have underscored the benefits of incorporating clinical notes in predictive models, showing enhancements in both length-of-stay and mortality predictions. Mullenbach et al. [24] contributed an approach for extracting ICD codes from clinical text, further enriching predictive modeling in healthcare contexts.

2. INTRODUCTION
Considerable efforts have been dedicated to predicting length of stay using various statistical machine learning and deep learning techniques. Additionally, researchers have increasingly recognized the value of clinical notes as a crucial source of patient health information. While previous studies have explored predicting length of stay using physiological data, none, to our knowledge, have investigated the potential of integrating both physiological data and clinical text for this purpose.

We introduce a neural architecture, NeuralLOS, designed to harness information from both clinical notes and physiological data. In the initial layers, each data source undergoes separate processing using distinct architectures to extract hidden states. These hidden states are then merged into a unified representation, which subsequently passes through additional layers to predict the remaining length of stay.

For the physiological data, we adopt a sliding window approach to capture temporal diagnostic information using a CNN model. Conversely, for the clinical notes, we generate embeddings using GloVe and apply either a CNN or an RNN architecture for further processing.

3. METHOD

3.1 Data
The data utilized in this project is sourced from the MIMIC-III critical care database [19; 27], encompassing de-identified health-related information such as demographics, vital signs, laboratory results, procedures, medications, caregiver notes, and more. Structured as a relational database, MIMIC-III offers versatility for various applications, including the focus of this project: length-of-stay prediction.
                Train       Test
# Patients      28,620      5,058
# ICU stays     35,621      6,281
Total samples   2,925,434   525,912

Table 1: Benchmark data for the length-of-stay data set

Pre-processing of the data relies on a standard benchmark [12] code, albeit requiring significant modifications to ensure performance and inclusion of clinical notes, which were absent in the original benchmark models.

The benchmark generates cleansed tabular features and label values for predicting risk assessment (mortality prediction), physiologic compensation, phenotype, and length of stay. Additionally, it provides code for generating prediction baseline benchmarks for result comparison. Table 1 presents statistics regarding length of stay, including the distribution between training and test datasets.

To enrich the features generated by the benchmark, clinical notes for each visit are appended. These notes are then converted into embeddings for model input. Length-of-stay prediction labels are assigned for each period length of the episode, with the label decreasing as the period length increases. CSV files are also generated for each patient/episode, comprising approximately hourly physiological data points such as Glasgow Coma Scale measures, glucose levels, oxygen saturation, blood pressure, heart rate, and temperature. Currently, the data lacks temporal organization, necessitating further investigation to potentially consolidate data into hourly intervals to align with our intended CNN architecture, which requires fixed-length periods.

Since the benchmark dataset lacks clinical notes, constructing a clinical note dataset is imperative, possibly by grouping notes into fixed-length time windows. Determining the optimal time window necessitates further investigation, particularly considering that the MIMIC-III dataset generally exhibits shorter stay lengths compared to the original paper utilizing clinical notes for health outcome prediction. Consequently, an 8-hour window may be excessively lengthy.

3.2 Pre-processing
The MIMIC-III database stands as a cornerstone in healthcare research, widely embraced by researchers. To kickstart our analysis, we utilize the benchmark [13] to preprocess the raw MIMIC-III data. This benchmark preprocessing yields tabular physiological data alongside true values indicating the remaining length of stay, complemented by additional patient information spanning the entire inpatient duration. This data is structured as a time series, capturing all available observations over time for each patient during each episode of admission.

However, the original benchmark code does not handle clinical notes. Thus, we have enhanced the benchmark code to incorporate the retrieval of clinical notes corresponding to the physiological data time series. These notes are segmented into sliding windows, and embeddings are generated from the raw notes, serving as inputs to our model.

We establish similar sliding windows for the preprocessed physiological data to encapsulate temporal information. Employing a 5-hour window size for both physiological data and note embeddings, we ensure a comprehensive representation of patient data. For instance, if a patient's stay spans 7 hours, the generated windows would be [0,1,2,3,4], [1,2,3,4,5], [2,3,4,5,6], and [3,4,5,6,7]. Each number denotes the hour from admission, aggregating information across all features for every hour within the window.

3.3 Winsorization
In our preliminary data analysis, we noted the presence of several extreme values within the dataset, which skewed the overall representation of the data. To address this issue, we implemented a winsorization technique, setting the threshold at 94% to mitigate the impact of outliers during the training process. Winsorization, a widely used statistical method for outlier treatment, involves capping extreme values to reduce their influence on the analysis. With a winsorization of 94%, we effectively trim 3% of extreme data from both ends of the dataset, ensuring a more balanced and reliable dataset for subsequent analysis.

3.4 Benchmark Models
Based on the original benchmark [13], we are conducting comparisons with a linear regression model and a simple LSTM model. Unlike the original paper, which employed an ordered classification approach for predicting the number of days remaining, we opt for simple regression, as well as a custom metric to better reflect real-world length-of-stay (LOS) usage patterns. This custom metric divides the LOS range into ten buckets: extremely short visits (less than one day), seven day-long buckets for each day of the first week, and two "outlier" buckets for stays over one and two weeks, respectively. By transforming the regression problem into an ordinal multiclass classification problem, we can use a Kappa score to measure this classification, as it accommodates ordered classes and their correlations.

We incorporate the standard LSTM model trained with simple regression to provide a more direct comparison with the linear regression baseline, which only trains against the raw LOS value. Common metrics for regression tasks, such as Mean Squared Error or Mean Absolute Difference, are typically employed for model comparison. However, the original benchmark did not use Mean Squared Error.

Furthermore, we enhance the benchmark preprocessing by integrating clinical notes, an aspect overlooked in the original benchmark. Despite leveraging code from the benchmark for this project, significant modifications were necessary to ensure compatibility with the latest versions of Keras and TensorFlow. Additionally, multiprocessing support was implemented to enhance the efficiency of the LOS task, as creating tensors for training was previously bottlenecked.

3.5 Clinical notes
Numerous studies and research endeavors [16; 7; 18] have highlighted the wealth of valuable patient information contained within clinical notes, demonstrating their efficacy in deep learning models. In our investigation, we assessed BioClinicalBERT [1], a freely accessible BERT model derived from BioBERT [23] and fine-tuned with MIMIC-III clinical notes, alongside BioSentVec [4], which leverages PubMed [3] to generate embeddings from MIMIC-III clinical notes. Both methods yield embeddings of similar shapes.
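As a minimal sketch (not our exact pipeline), note embeddings of this kind can be produced with the publicly released Bio_ClinicalBERT checkpoint [1] via the transformers library; the mean pooling shown here is an illustrative choice.

```python
# Sketch: embedding clinical notes with the public Bio_ClinicalBERT checkpoint.
# Pooling and truncation choices are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model.eval()

def embed_notes(texts):
    """Return one 768-dimensional vector per note (mean-pooled hidden states)."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding tokens
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

vecs = embed_notes(["Pt afebrile, vitals stable overnight.",
                    "CXR shows bilateral infiltrates."])
print(vecs.shape)  # torch.Size([2, 768])
```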
These embeddings serve as input to our NotesNet [see Figure 2], enabling the extraction of hidden states from the embeddings.

Layer#  Layer Name  #Input Params  #Output Params
1       Conv2d      5,440          87,040
2       Conv2d      87,040         174,080
2       MaxPool2d   174,080        34,816
3       Conv2d      34,816         69,632
4       Linear      69,632         8,192
5       Dropout     8,192          8,192
6       Linear      8,192          4,096
7       Linear      4,096          1,024

Table 2: PhysioNet: Layer-wise Parameters

Layer#  Layer Name  #Input Params  #Output Params
1       Conv2d      1,966,080      7,864,320
2       MaxPool2d   7,864,320      1,966,080
2       Conv2d      1,966,080      3,932,160
3       Conv2d      3,932,160      1,966,080
4       Linear      1,966,080      65,536
5       Dropout     65,536         65,536
6       Linear      65,536         16,384
7       Dropout     16,384         16,384
8       Linear      16,384         4,096
9       Linear      4,096          1,024

Table 3: NotesNet: Layer-wise Parameters

Figure 1: PhysioNet implementation

Figure 2: NotesNet implementation

3.6 LOS Design
The duration of an inpatient stay hinges on various factors, primarily the physiological data collected from the patient, such as blood pressure and temperature. However, these features can fluctuate throughout the duration of the stay. Healthcare centers typically monitor and record these features at regular intervals until the patient is discharged or deceased, introducing a temporal aspect to the dataset where the remaining length of stay is influenced by changes in the patient's condition over time.

Given the sequential nature of such datasets, opting for a recurrent neural network (RNN) model like LSTM [15] appears logical to capture the evolving information. Additionally, since MIMIC-III [27] is a sizable dataset, processing as much information as possible for model training seems sensible; neural network models generally perform better with larger datasets. However, training an RNN model on such a large dataset demands significant hardware resources.

To address these challenges, we propose a CNN [22]-based neural network architecture called NeuralLOS. We employ a sliding window approach to capture temporal information from the dataset. Each window comprises observations from the preceding few hours, which are then batched and shuffled to create the training dataset. The remaining length of stay at the end of the window serves as the true output. In our initial experiments, we utilize a 4-hour window. For example, if an inpatient admission record spans 6 hours, the windows would be [1,2,3,4], [2,3,4,5], and [3,4,5,6].

Regarding model design, we explored the possibility of using pre-trained CNN models like AlexNet [21] or ResNet [14] as our base model. However, many of these pre-trained models are optimized for image data, which may not suit our use case. Consequently, we opt for a simple 9-layer neural network for implementation.

The architecture of our model, depicted in Figures 1-3, comprises multiple levels. Our top-level model, EpisodeNet, is a fusion of two distinct models: PhysioNet and NotesNet.

3.6.1 PhysioNet
Figure 1 illustrates the architecture of PhysioNet, which comprises three convolution layers, one pooling layer, one dropout layer, and three fully connected linear layers. Detailed parameters of the model are provided in Table 2. This model is utilized to process the tabular physiological data for each batch. The output of the model is a tensor with dimensions (#batch size, 32), representing the hidden learned state of the model derived from the tabular physiological data.

3.6.2 NotesNet
Figure 2 depicts the architecture of NotesNet, which comprises three convolution layers, one pooling layer, two dropout layers, and four fully connected linear layers. We utilize BioClinicalBERT [1] to generate embeddings from the batch's notes. These embeddings serve as input to NotesNet, enabling the processing and learning of information for a given batch. The model's input is a tensor with dimensions (#batch size, #sentences, #embeddings), with #sentences set to 80 and #embeddings set to 768 for our model. The output of the model is a tensor with dimensions (#batch size, 32), representing the hidden learned state of the model derived from the notes data.

3.6.3 EpisodeNet
EpisodeNet serves as the top-level model, integrating both PhysioNet and NotesNet to process each batch of data. The architecture is depicted in Figure 3. The tabular physiological data from each batch undergoes processing via PhysioNet, while the corresponding notes embedding is processed through NotesNet. Subsequently, the output hidden states from both PhysioNet and NotesNet are concatenated and forwarded through a sequence of fully connected linear layers to predict the remaining length of stay as a regression output.

Figure 3: EpisodeNet implementation
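To make the design concrete, the sketch below illustrates the windowing and two-branch fusion described in this section. The small MLP stand-ins, the feature count, and the flat note embedding are simplifying assumptions; the real PhysioNet and NotesNet are the convolutional stacks of Tables 2 and 3, and NotesNet actually consumes a (batch, 80, 768) tensor.

```python
# Sketch of Section 3.6: sliding windows plus EpisodeNet-style fusion.
# Layer sizes and n_features=17 are illustrative assumptions only.
import torch
import torch.nn as nn

def sliding_windows(episode, size=4):
    """All contiguous `size`-hour windows of an episode's hourly features.
    episode: tensor of shape (hours, n_features). The training label for each
    window is the remaining LOS at the window's last hour."""
    return [episode[i:i + size] for i in range(episode.shape[0] - size + 1)]

class EpisodeNet(nn.Module):
    def __init__(self, window_hours=4, n_features=17, notes_dim=768, hidden=32):
        super().__init__()
        # Stand-in for PhysioNet: window of tabular data -> 32-d hidden state.
        self.physio_net = nn.Sequential(
            nn.Flatten(), nn.Linear(window_hours * n_features, hidden), nn.ReLU())
        # Stand-in for NotesNet: note embedding -> 32-d hidden state.
        self.notes_net = nn.Sequential(nn.Linear(notes_dim, hidden), nn.ReLU())
        # Fully connected head over the concatenated hidden states.
        self.head = nn.Sequential(nn.Linear(2 * hidden, 16), nn.ReLU(),
                                  nn.Linear(16, 1))

    def forward(self, physio, notes):
        h = torch.cat([self.physio_net(physio), self.notes_net(notes)], dim=1)
        return self.head(h).squeeze(-1)  # predicted remaining LOS (regression)

model = EpisodeNet()
physio = torch.randn(8, 4, 17)     # batch of 4-hour windows, 17 features/hour
notes = torch.randn(8, 768)        # one pooled note embedding per window
print(model(physio, notes).shape)  # torch.Size([8])
```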

3.7 Metrics
We selected three different metrics commonly used with regression models for our evaluations.

3.7.1 Mean Squared Error
MSE measures the average of the squared errors. It is calculated by taking the average of the square of the difference between the true values and the predicted values. It is sensitive to outliers and penalizes larger deviations between true and predicted values more heavily. It is given as:

$\text{MSE} = \frac{\sum (y_{\text{true}} - y_{\text{pred}})^2}{\#\text{samples}}$

3.7.2 Mean Absolute Error
MAE measures the average of the absolute errors. It is calculated by taking the average of the absolute value of the difference between the true values and the predicted values. It is less sensitive to outliers. It is given as:

$\text{MAE} = \frac{\sum |y_{\text{true}} - y_{\text{pred}}|}{\#\text{samples}}$

3.7.3 Mean Absolute Percentage Error
MAPE measures the average of the ratio of the absolute error to the true value; it is a version of the MAE normalized by the true values. It is given as:

$\text{MAPE} = \frac{100}{\#\text{samples}} \sum \left| \frac{y_{\text{true}} - y_{\text{pred}}}{y_{\text{true}}} \right|$
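For reference, the three metrics are straightforward to implement; the sketch below uses NumPy, with toy values standing in for real predictions.

```python
# Sketch: the three evaluation metrics of Section 3.7, implemented with NumPy.
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Undefined when y_true == 0; remaining LOS is assumed strictly positive.
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

y_true = np.array([12.0, 24.0, 48.0])  # remaining LOS in hours (toy values)
y_pred = np.array([10.0, 30.0, 50.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
```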
3.8 Hyper-parameter tuning and selection
In deep learning models, hyperparameter tuning stands as a pivotal step in identifying the most effective parameters for optimal model performance. In our approach, we conducted experiments involving various combinations of hyperparameters to ascertain the optimal settings for achieving the best results.

3.8.1 Number of epochs
We conducted experiments involving different numbers of epochs combined with various hyperparameters, such as learning rate and batch size, across multiple iterations. Throughout these experiments, we monitored the training loss and calculated metrics on the validation set for each epoch.

Our observations revealed that the training loss decreases rapidly and substantially during the initial few epochs, eventually flattening out at around 5 epochs for all iterations, as depicted in Figure 4. Additionally, while the mean squared error (MSE) for validation remains relatively stable up to 5 epochs, it begins to increase thereafter, as illustrated in Figure 5. This trend suggests potential overfitting of the model to the training set.

Based on these findings, we selected the number of epochs for our evaluation, considering the balance between model performance and avoidance of overfitting.

Figure 4: Trend of training losses over epochs

Figure 5: Trend of validation mean squared errors over epochs

3.8.2 Batch size and learning rate
Similarly, we experimented with different batch sizes in combination with different hyper-parameters. We selected 32, 128, and 256 as candidate batch sizes and ran multiple iterations with different learning rates. We collected metrics for all the combinations, as shown in Tables 4-6. Based on these metrics, the model performed best on the test dataset when trained with a batch size of 32 and a learning rate of 0.0001.

lr/batch size   32          128         256
0.01            6.821e+12   1.113e+12   3.555e+11
0.001           9.858e+4    1.009e+5    5.363e+4
0.0001          1.744e+4    9.187e+4    7.652e+4
0.00001         1.472e+5    1.427e+5    2.498e+5

Table 4: A comparison of mean squared error values on the test set for different batch sizes and learning rates

lr/batch size   32          128         256
0.01            3.099e+5    4.809e+4    1.601e+4
0.001           8.239e+1    8.474e+1    8.280e+1
0.0001          8.070e+1    8.297e+1    8.327e+1
0.00001         9.132e+1    1.001e+2    9.739e+1

Table 5: A comparison of mean absolute error values on the test set for different batch sizes and learning rates

3.8.3 Optimizer and loss function
Since we designed our model to predict a regression output, we experimented with two different loss functions: 1) L1Loss and 2) MSELoss. Based on the test results over multiple iterations, we observed that MSELoss is better suited to our model. We also explored stochastic gradient descent and Adam [20] optimizers for our model. Based on the experimental results, we selected Adam as the optimizer for our model.
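A minimal sketch of the selected configuration (Adam, MSELoss, batch size 32, learning rate 1e-4, about 5 epochs) is shown below; the dataset is assumed to yield (physio_window, note_embedding, remaining_los) triples, and the model is assumed to follow the EpisodeNet interface sketched earlier.

```python
# Sketch: training with the configuration selected in Section 3.8.
# Assumes `dataset` yields (physio_window, note_embedding, remaining_los).
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=5, lr=1e-4, batch_size=32):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    model.train()
    for epoch in range(epochs):
        total = 0.0
        for physio, notes, target in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(physio, notes), target)
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean train loss {total / len(loader):.4f}")
```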

4. RESULTS

4.1 Evaluation
The initial results of our benchmark models closely align with the findings reported in the benchmark study [12], with our results showing slight improvement. For instance, the original paper reported a mean absolute error (MAE) of 94.7 for a basic LSTM model, whereas our model achieved an MAE of 79.3. Notably, we conducted training using regression output rather than employing custom bins for classification, which was used in the original study. Our attempts at training with custom bins (e.g., 1 day, 2 days, 3 days, etc., up to 2 weeks) did not yield satisfactory results. Please refer to Table 7 for a comprehensive listing of results.

When evaluating a forecasting model, it is essential to understand two key aspects: a) the amount of historical data required to make accurate predictions and b) how closely the model's predictions align with the current state [28]. To assess the model's performance at different intervals, we initially applied Winsorization of 96% across the length of stay for each episode, removing data points below the 3rd percentile and above the 97th percentile. Additionally, we filtered out episodes lasting less than 60 hours to ensure consistent comparison across different time periods. The choice of 60 hours was made because it is close to one standard deviation of the average length of stay (66 hours). Figure 6 illustrates the distribution of episodes over the length of stay after applying Winsorization.

Figure 6: Histogram showing episodes over length of stay after Winsorization

Considering that the test set comprises 1,555 episodes, we observe a consistent number of data points at each period up to 60 hours, as depicted in Figure 7. By plotting the mean squared error at these different time periods, we gain insights into how the models perform as they access more information over time. The results are presented in Figure 8.

Figure 7: Distribution of data points across different lengths of stay

As anticipated, the error decreases over progressive time periods for all models, except for linear regression, whose error decreases until 50 hours but then rises again. Other models also exhibit an inflection point at 60 hours, except for NeuralLOS with full data and LSTM.

When comparing NeuralLOS using only physiological (tabular) data with a model augmented with Bio-ClinicalBERT note embeddings [16], we included a version of NeuralLOS trained on the same dataset as the model with notes: note processing consumes a significant amount of time, preventing training on the full dataset, so training NeuralLOS on the same smaller dataset lets us discern the differences. The model with notes appears to perform better than the tabular-only model overall, but not when considering stays longer than 60 hours.
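For illustration, the evaluation-time winsorization and episode filtering described above can be expressed with SciPy as in the sketch below; the trim fractions are an assumption, since the paper quotes 94% and 96% thresholds in different sections.

```python
# Sketch: evaluation-time winsorization and episode filtering (Section 4.1).
# Trim fractions of 3% per tail are illustrative.
import numpy as np
from scipy.stats.mstats import winsorize

los_hours = np.array([2.0, 30.0, 45.0, 66.0, 70.0, 90.0, 500.0])

# Cap values below the 3rd and above the 97th percentile.
capped = np.asarray(winsorize(los_hours, limits=[0.03, 0.03]))

# Keep only episodes of at least 60 hours for period-wise comparison.
kept = capped[capped >= 60]
print(capped, kept)
```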

We also investigated whether predictions become more accurate as the patient approaches the end of their stay. Using the same episodes, we categorized bins from 2 weeks (336 hours) down to 12 hours. The number of data points in each bin is illustrated in Figure 9.

lr/batch size   32          128         256
0.01            1.008e+4    1.528e+3    3.337e+3
0.001           1.636       1.845       1.765
0.0001          1.662       1.706       1.737
0.00001         2.349       2.839       2.596

Table 6: A comparison of mean absolute percentage error values on the test set for different batch sizes and learning rates
Model Data types MAE MSE MAPE
Linear regression Tabular 121.69 12,805,595 3.15
LSTM Tabular 79.28 16,889 81.10
LSTM Notes 101.44 28,201 0.72
PhysioNet (full data) Tabular 78.55 17,492 1.01
PhysioNet (Part data) Tabular 80.50 17,450 1.66
PhysioNet+Notes (Part data) Tabular+Notes 80.49 16,122 1.55

Table 7: Model results for length of stay prediction.

Figure 8: Mean Squared Error at different hours of stay

Figure 9: Number of data points at different remaining lengths of stay

Figure 10: Mean Squared Error at different remaining lengths of stay

As anticipated, all models exhibit improvement as they approach the end of the stay. Interestingly, linear regression outperforms the other models initially, showing an inflection point at 168 hours before exceeding the chart limit at 60 hours. On the other hand, the NeuralLOS and LSTM models start with lower effectiveness but show improvement around 120 hours.

While mean squared error (MSE) plots enable model comparison, visualizing the spread and degree of model accuracy is enhanced by plotting a histogram of deviation in hours. Figures 11 through 14 portray this distribution across the models.

Figure 11: Linear regression deviation distribution
Figure 12: LSTM deviation distribution
Figure 13: NeuralLOS deviation distribution
Figure 14: NeuralLOS with notes deviation distribution

The accuracy at each remaining length of stay is illustrated through a series of box plots in Figures 15 through 18.

Figure 15: NeuralLOS with notes deviation distribution
Figure 16: LSTM deviation distribution
Figure 17: NeuralLOS deviation distribution
Figure 18: NeuralLOS with notes deviation distribution

4.2 Infrastructure
We trained and evaluated our models on AWS Cloud Platform. The machine configuration is listed below:

Machine Type   n1-standard [16 vCPUs]
CPU Platform   Intel Broadwell
Memory         110 GB
GPU            NVIDIA Tesla T4
Storage        400 GB SSD

We utilize widely used Python libraries, including but not limited to PyTorch, Keras, TensorFlow, scikit-learn, Matplotlib, and pickle. Our code is accessible through a GitHub repository. It is important to note that the benchmark code is constrained by data-preprocessing-intensive tasks. Initially, the benchmark code lacked the capability to run in parallel and utilize GPU resources effectively. Consequently, we dedicated significant effort to implementing multi-threaded data preprocessing, aiming to maximize GPU utilization.

5. TEAM CONTRIBUTIONS
It was a collaborative effort, with both of us involved in various aspects and contributing to the planning, experimentation, and training phases of model development.

Vibhor spearheaded the setup of AWS environments for training the LSTM and linear regression models. He also led the efforts to adapt and upgrade the benchmark code to ensure compatibility with newer versions of libraries like TensorFlow and Keras. Addressing speed challenges, he implemented multiprocessing capabilities in the preprocessing routines used for creating training tensors. Additionally, he authored the Data and Evaluation sections of the report and developed the program responsible for aggregating results from all models.

Priyank played a key role in designing the NeuralLOS model architecture and implementing the dataset windowing techniques. He actively participated in model training and metric generation.

Both of us worked significantly on the generation of BioSentVec and BioClinicalBERT embeddings for the notes, and jointly generated the preprocessed data using the benchmark code.

6. CONCLUSIONS
Forecasting the length of a patient's stay is a critical challenge in healthcare. Obtaining an estimate of the remaining length of stay aids hospitals in better resource allocation for healthcare services. Additionally, it provides valuable insights for insurance companies to estimate expenses accurately. Leveraging NeuralLOS, we achieved impressive results in predicting length of stay. By comparing our model with various benchmark models and presenting results from different perspectives, we demonstrated its effectiveness. Although further refinement is required to enhance the model's performance, even in its current implementation, NeuralLOS yields superior results.

7. LIMITATIONS
One of the primary challenges we encountered was the scarcity of computational resources required to process the entire dataset. The embeddings of notes consume significant memory, and due to memory constraints, we were unable to accommodate the entire working set in memory. Additionally, since NeuralLOS involves computing a large number of parameters, utilizing GPUs was imperative to expedite training. Despite encountering some hurdles, we managed to secure access to a GPU in GCP with limited capacity. Consequently, we trained our EpisodeNet on a subset of the data. An intriguing observation we made was that the prediction accuracy of NeuralLOS improves for patients with longer stays. This improvement can be attributed to the accumulation of more information over time, enabling the model to make more accurate predictions.

8. RESOURCES
GitHub: https://github.com/vibhor-github/lenght-of-stay.git

9. ACKNOWLEDGEMENTS
We express our gratitude to Professor Jimeng Sun and all teaching assistants for their invaluable guidance and support throughout this work.

10. REFERENCES
[1] E. Alsentzer, J. R. Murphy, W. Boag, W.-H. Weng, D. Jin, T. Naumann, and M. B. A. McDermott. Publicly available clinical BERT embeddings, 2019.
[2] H. Baek, M. Cho, S. Kim, H. Hwang, M. Song, and S. Yoo. Analysis of length of hospital stay using electronic health records: A statistical and data mining approach. PLoS ONE, 13(4):e0195901, 2018.
[3] K. Canese and S. Weis. PubMed: the bibliographic database. In The NCBI Handbook [Internet], 2nd edition. National Center for Biotechnology Information (US), 2013.
[4] Q. Chen, Y. Peng, and Z. Lu. BioSentVec: creating sentence embeddings for biomedical texts. In 2019 IEEE International Conference on Healthcare Informatics (ICHI), Jun 2019.
[5] D. E. Clark and L. M. Ryan. Concurrent prediction of hospital mortality and length of stay from risk factors on admission. Health Services Research, 37(3):631–645, 2002.
[6] S. Cropley. The relationship-based care model: evaluation of the impact on patient satisfaction, length of stay, and readmission rates. JONA: The Journal of Nursing Administration, 42(6):333–339, 2012.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[8] D. H. Gustafson. Length of stay: prediction and explanation. Health Services Research, 3(1):12–34, 1968.
[9] J. Fang, J. Zhu, and X. Zhang. Prediction of length of stay on the intensive care unit based on Bayesian neural network. In Journal of Physics: Conference Series, volume 1631, page 012089. IOP Publishing, 2020.
[10] R. Figueroa, J. Harman, and J. Engberg. Use of claims data to examine the impact of length of inpatient psychiatric stay on readmission rate. Psychiatric Services, 55(5):560–565, 2004.
[11] T. Gentimis, A. J. Alnaser, A. Durante, K. Cook, and R. Steele. Predicting hospital length of stay using neural networks on MIMIC-III data. In 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pages 1194–1201, 2017.
[12] H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and A. Galstyan. Multitask learning and benchmarking with clinical time series data. Scientific Data, 6(1), Jun 2019.
[13] H. Harutyunyan, H. Khachatrian, D. C. Kale, G. Ver Steeg, and A. Galstyan. Multitask learning and benchmarking with clinical time series data. Scientific Data, 6(1):96, 2019.
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[15] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[16] K. Huang, J. Altosaar, and R. Ranganath. ClinicalBERT: Modeling clinical notes and predicting hospital readmission. arXiv preprint arXiv:1904.05342, 2019.
[17] S.-J. Jang, I. Yeo, D. N. Feldman, J. W. Cheung, R. M. Minutello, H. S. Singh, G. Bergman, S. C. Wong, and L. K. Kim. Associations between hospital length of stay, 30-day readmission, and costs in ST-segment–elevation myocardial infarction after primary percutaneous coronary intervention: a nationwide readmissions database analysis. Journal of the American Heart Association, 9(11):e015503, 2020.
[18] J. Yao, Y. Liu, M. Ghassemi, et al. Visualization of deep models on nursing notes and physiological data for predicting health outcomes through temporal sliding windows. In Explainable AI in Healthcare and Medicine, pages 115–129, 2021.
[19] A. E. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L. A. Celi, and R. G. Mark. MIMIC-III, a freely accessible critical care database. Scientific Data, 3:160035, 2016.
[20] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2017.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25:1097–1105, 2012.
[22] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio. Object recognition with gradient-based learning. In Shape, Contour and Grouping in Computer Vision, pages 319–345. Springer, 1999.
[23] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, Sep 2019.
[24] J. Mullenbach, S. Wiegreffe, J. Duke, J. Sun, and J. Eisenstein. Explainable prediction of medical codes from clinical text. arXiv preprint arXiv:1802.05695, 2018.
[25] K. J. Ottenbacher, P. M. Smith, S. B. Illig, R. T. Linn, G. V. Ostir, and C. V. Granger. Trends in length of stay, living setting, functional outcome, and mortality following medical rehabilitation. JAMA, 292(14):1687–1695, 2004.
[26] A. Peimankar and S. Puthusserypady. DENS-ECG: A deep learning approach for ECG signal delineation. Expert Systems with Applications, 165:113911, 2021.
[27] T. J. Pollard and A. E. Johnson. The MIMIC-III clinical database. http://dx.doi.org/10.13026/C2XW26, 2016.
[28] A. Rajkomar et al. Scalable and accurate deep learning with electronic health records. 6:18, 2018.
[29] E. Rocheteau, P. Liò, and S. Hyland. Temporal pointwise convolutional networks for length of stay prediction in the intensive care unit. arXiv preprint arXiv:2007.09483, 2020.
[30] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.
[31] M. Sotoodeh and J. C. Ho. Improving length of stay prediction using a hidden Markov model. AMIA Summits on Translational Science Proceedings, 2019:425, 2019.
[32] A. Suresh, K. Harish, and N. Radhika. Particle swarm optimization over back propagation neural network for length of stay prediction. Procedia Computer Science, 46:268–275, 2015. Proceedings of the International Conference on Information and Communication Technologies (ICICT 2014), 3–5 December 2014, Bolgatty Palace Island Resort, Kochi, India.
[33] G. E. Weissman, R. A. Hubbard, L. H. Ungar, M. O. Harhay, C. S. Greene, B. E. Himes, and S. D. Halpern. Inclusion of unstructured clinical text improves early prediction of death or prolonged ICU stay. Critical Care Medicine, 46(7):1125, 2018.
