Forecasting Patient Length of Stay For Optimal Hospital Resource Allocation
Karthikeyan P
Information Technology
Thiagarajar College of Engineering,
Madurai, India
[email protected]

Bharath M
Information Technology
Thiagarajar College of Engineering,
Madurai, India
[email protected]

Sriram B
Information Technology
Thiagarajar College of Engineering,
Madurai, India
[email protected]

Vishnuram M K
Information Technology
Thiagarajar College of Engineering,
Madurai, India
[email protected]
Abstract: Efficient management of hospital resources, especially bed allocation, plays a pivotal role in enhancing patient care quality and operational effectiveness. In our research, we propose a predictive modeling framework aimed at forecasting patient length of stay (LOS) to support optimal resource allocation in hospitals. Our methodology relies on data extracted from two key databases: Hospital_Main_DB and Hospital_Insights_DB. By extracting relevant features from Hospital_Main_DB and training machine learning models on Hospital_Insights_DB, we aim to predict patient LOS accurately. The predictive models are designed to be continuously updated as new data streams in, enabling real-time forecasting of bed counts based on projected patient LOS. This iterative approach empowers proactive resource planning and allocation, thereby strengthening hospital operational efficiency and the delivery of patient care. Additionally, we leverage interactive dashboards using Power BI to visualize and present the results of our predictive models. These dashboards provide stakeholders with actionable insights, facilitating data-driven decision-making processes. Through our research, we seek to underscore the value of predictive modeling in informing hospital resource allocation strategies, ultimately leading to improved healthcare outcomes.

Index Terms: Hospital Resource Allocation, Patient Length of Stay, Predictive Modeling, Machine Learning, Healthcare Analytics, Real-Time Forecasting, Data-Driven Decision Making, Power BI, Data Visualization.

I. INTRODUCTION

In modern healthcare systems, efficient management of hospital resources is paramount to ensure both quality patient care and operational efficiency. Among these resources, hospital bed allocation stands out as a critical area requiring meticulous attention. Effective bed allocation not only enhances patient outcomes but also contributes significantly to the overall efficacy of healthcare delivery. Patient length of stay (LOS) emerges as a pivotal factor influencing bed utilization and resource allocation planning within hospitals. LOS prediction models play a crucial role in offering insights into the anticipated duration of patient stays, thereby enabling hospitals to forecast resource requirements and optimize bed allocation accordingly. By accurately forecasting patient LOS, hospitals can streamline processes related to admissions, discharges, and transfers, ensuring timely access to care for all patients.

In this context, predictive modeling techniques present a promising avenue for forecasting patient LOS and guiding optimal resource allocation strategies. By harnessing data from hospital databases such as Hospital_Main_DB and its counterpart Hospital_Insights_DB, predictive models can analyze diverse medical parameters and historical patient data to deliver precise LOS predictions. This study aims to showcase the practical utility of predictive modeling in informing hospital resource allocation strategies through the development of a predictive LOS model. Leveraging data from Hospital_Main_DB and its corresponding Hospital_Insights_DB, we will train machine learning models tailored for LOS prediction.

Furthermore, we will explore the integration of predictive modeling outcomes into Power BI dashboards, providing stakeholders with actionable insights for proactive resource planning and allocation. Through this research endeavor, our goal is to contribute to ongoing initiatives aimed at optimizing hospital resource allocation and ultimately enhancing healthcare outcomes through informed data-driven decision-making processes.

II. PRIOR WORK

This study aimed to create a robust forecasting model for predicting Patient Length of Stay (LOS) in Emergency Departments (EDs) during the COVID-19 pandemic, with a focus on meeting a 4-hour target window. Utilizing data from a Detroit hospital between March and December 2020, four machine learning models were employed: logistic regression, gradient boosting, decision tree, and random forest. The gradient boosting model performed best, achieving an 85% accuracy rate and an F1-score of 0.88. Key factors for predicting prolonged ED stay included patient demographics, comorbidities, and ED operational metrics.
979-8-3503-7700-2/24/$31.00 ©2024 IEEE
Authorized licensed use limited to: UNIVERSITI TEKNOLOGI MARA. Downloaded on April 18,2025 at 19:21:50 UTC from IEEE Xplore. Restrictions apply.
The developed predictive framework can aid resource allocation and provide patients with more accurate LOS estimates, enhancing healthcare delivery [1].

This paper introduces a predictive model for hospital Length of Stay (LOS), focusing on departmental variations and predictive patterns. It highlights improved prediction rates compared to baseline algorithms, particularly in identifying patterns for specific patient groups. The background section discusses the significance of LOS adjustment for cost-saving and identifies influential factors. It compares various machine learning techniques used in previous studies, emphasizing the effectiveness of decision trees and Random Forest algorithms. Supervised classification algorithms are highlighted for predicting LOS. The review suggests potential improvements in grouping days and focusing efforts on medical specialties for better results [2].

During the COVID-19 pandemic, US healthcare faced unprecedented challenges, revealing shortcomings in existing pandemic planning guidelines. A new approach integrates system dynamics modeling and discrete-event simulation to support hospital planning effectively. This adaptable methodology, trained and validated using public data and hospital records, predicts COVID-19 cases and assesses bed utilization. By analyzing the impact of control measures and virus variants, it provides insights for pandemic preparedness planning. Key features include a dynamic prediction module for estimating disease transmission dynamics and a SEIR model for predicting spread. These models, calibrated with clinical evidence, offer reliable forecasting for hospital patient flow and readiness evaluation [3].

The paper proposes a two-stage classification model for optimizing healthcare resource allocation based on electronic patient records. It uses the CART analysis method to identify pivotal patient attributes influencing resource demands, such as principal procedure code, admission point, and operating surgeon. This approach is chosen for its ability to handle predictor variable interactions, its nonparametric nature, and its resilience to dimensionality concerns. These variables collectively account for a significant portion of variance in patients' lengths of stay (LOS) [4].

This study analyzes Intensive Care Unit (ICU) Length of Stay (LOS), which is crucial for healthcare quality and resource allocation. Previous methods lack interpretability and accuracy. Using data from 1,214 unplanned ICU admissions, advanced preprocessing, exploratory analysis, and the LASSO algorithm are employed to build a precise predictive model. Rigorous evaluation shows promising results: RMSPE of 0.88±0.13 days, MAE of 0.87±0.07 days, and R² of 0.35±0.09. Risk factors for LOS in survivor and non-survivor groups are investigated, with competitive performance against contemporary methods. This enhances patient flow management and facility throughput [5].

The paper presents a data-driven strategy for predicting prolonged lengths of stay (LOS) in appendectomy patients, which is crucial for managing healthcare costs. It introduces a support vector machine approach enhanced with resampling techniques to handle imbalanced datasets. Analyzing 557 cases from a medical center in Taiwan, the study demonstrates the system's efficacy in identifying patients at risk of extended hospital stays. These findings highlight the importance of data-driven methodologies in LOS management and the benefits of oversampling and cost-sensitive techniques for addressing sample imbalances [6].

The study explores the benefits of using summarized de-identified data obtained through k-means cluster analysis, addressing privacy concerns. It investigates the efficacy of analyzing clustered patient groups compared to the entire dataset for predicting mortality rates and ICU length of stay (LOS). Regression analysis on both original and clustered data from the MIMIC II dataset shows clustering significantly improves prediction accuracy for mortality rates and LOS. It also enhances prediction accuracy for the duration until the next emergency room visit among cancer patients. These findings underscore the potential of clustering techniques followed by regression analysis to improve prediction accuracy, offering valuable insights for health repository-based data analytics and clinical decision support methodologies [7].

This study investigates the prediction of length of stay (LOS) for pediatric patients diagnosed with respiratory illnesses. It employs decision tree methodologies including Bagging, AdaBoost, and Random Forest. Data from 11,206 patient records are analyzed, with preprocessing techniques applied. Evaluation involves two tests: bisection and periodic. Results indicate Bagging performs best in the bisection test, while AdaBoost slightly outperforms the others in the periodic test. The study underscores the effectiveness of these methodologies in predicting LOS for pediatric respiratory illness cases, emphasizing the significant influence of hospital treatment data on predictive models [8].

This study focuses on forecasting the length of stay (LOS) for stroke patients at King Fahad Bin Abdul-Aziz Hospital in Saudi Arabia. Effective resource allocation in stroke units and early LOS prediction are crucial for cost containment and service optimization. The methodology involves feature selection using information gain and constructing prediction models with various machine learning algorithms. The Bayesian network model achieved 81.28% accuracy, surpassing the C4.5 decision tree model with 77.1% accuracy, indicating superior predictive performance for LOS prediction in this context [9].

Heart failure is prevalent in Japan, leading to high hospitalization rates and significant societal burdens compared to Western countries, primarily due to longer stays. To address this, a recent study introduced a deep learning model that integrates weighted predictors to forecast hospitalization costs and length of stay using electronic health records. Under 5-fold cross-validation, the model demonstrates superior predictive accuracy compared to traditional linear regression methods, presenting potential advancements in healthcare resource management [10].

III. PROPOSED METHODS

We propose a predictive modeling approach designed to optimize hospital resource allocation by forecasting patient length of stay (LOS) and anticipating bed count adjustments based on these predictions. Leveraging data from Hospital_Main_DB and Hospital_Insights_DB, our approach extracts key features from the former to train machine learning models on the latter for LOS prediction. These models are then continuously updated with new data, providing real-time forecasts of bed counts based on anticipated LOS fluctuations. To facilitate decision-making
and resource allocation, we visualize the results through interactive Power BI dashboards. These dashboards offer insights into predicted LOS trends and corresponding bed count requirements, enabling healthcare administrators to make informed decisions on staffing, bed availability, and resource allocation. Figure-1 illustrates the system architecture, showcasing the seamless integration of data from both databases for LOS forecasting. It outlines the flow of data and the interaction between various components essential for model training and real-time forecasting. Additionally, Figure-2 depicts the database schema, elucidating the structure and relationships between tables within Hospital_Main_DB and Hospital_Insights_DB. This schema serves as a blueprint for data extraction and feature engineering, which are crucial steps in model training and refinement. Table-I shows the feature details. As part of our methodology, we conduct correlation analysis, as shown in Figure-3, to understand the relationships between variables. This analysis provides insights into the interdependencies among variables and aids in the identification of key predictors for LOS. Furthermore, ANOVA feature selection analysis, illustrated in Figure-4, refines our models by explaining the significance of different features in predicting LOS. This process guides the selection of the most influential variables for model development and refinement.

By integrating these figures into our main content, we provide a comprehensive overview of our methodology, covering data integration, feature engineering, model development, and refinement. This iterative approach aims to enhance hospital operational efficiency and healthcare outcomes by leveraging predictive analytics and data-driven decision-making to effectively manage fluctuating bed counts in response to changing patient LOS patterns.

IV. DATA COLLECTION AND PREPROCESSING

We utilized a dataset sourced from the Health Data repository hosted on a US Government website [11].

A. Dataset Features

Table I: Features Details

B. Exploratory Data Analysis (EDA)
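The correlation analysis and ANOVA feature scoring described in the Proposed Methods section can be illustrated with a small sketch. It is not the authors' implementation: the records, the feature names (`age`, `admission_type`), and the helper functions `pearson_r` and `anova_f` are hypothetical, and only the Python standard library is used. It computes a Pearson correlation between a numeric feature and LOS, and a one-way ANOVA F-statistic for a categorical feature.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of LOS values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy records: (age, admission_type, length_of_stay_days).
records = [
    (34, "elective", 2), (41, "elective", 3), (29, "elective", 2),
    (67, "emergency", 6), (72, "emergency", 8), (58, "emergency", 5),
    (80, "urgent", 9), (75, "urgent", 7), (63, "urgent", 6),
]

ages = [r[0] for r in records]
los = [r[2] for r in records]
age_los_corr = pearson_r(ages, los)

# Group LOS by admission type, then score the categorical feature with ANOVA.
by_type = {}
for _, admission_type, stay in records:
    by_type.setdefault(admission_type, []).append(stay)
admission_f = anova_f(list(by_type.values()))

print(f"Pearson r(age, LOS) = {age_los_corr:.3f}")
print(f"ANOVA F(admission_type) = {admission_f:.3f}")
```

A larger F-statistic means the mean LOS differs more between categories relative to the variation within them, which is the sense in which a feature would be ranked as more influential.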
C. Stratified Sampling Technique

We utilized a stratified sampling approach to address the distributional imbalance of the target variable, Length_of_Stay, which comprised 119 unique values. We partitioned the dataset into strata corresponding to each unique value and divided samples into two groups: one comprising predominant values (1, 2, 3, 4, 5, 6), and the other consisting of less common values. To ensure representativeness, 80% of samples from predominant values and 20% from less frequent values were allocated for training and testing. Random selection within each group prevented bias. This method aimed to create a representative sample capturing the variable's distributional nuances, enhancing model robustness and generalization.

Additionally, Figure-5 illustrates the stratified sampling process. We categorized the length of stay feature into two groups. The first group encompasses the most prevalent values: {1, 2, 3, 4, 5, 6}, representing the predominant lengths of stay. Meanwhile, the second group encompasses the less frequent values. This visualization provides a clear depiction of how the stratified sampling approach was implemented, ensuring that both predominant and less frequent values are adequately represented in the training and testing datasets. Through this approach we aimed to address distributional imbalances, enhancing the overall performance and generalization capability of our predictive models.

D. Prediction Model

In our investigation of multiple regression algorithms for predicting the Length of Stay variable in healthcare datasets, we evaluated their performance using 3-, 5-, and 10-fold cross-validation. Notably, the 10-fold validation method yielded the lowest Mean Squared Error (MSE) scores across the assessed algorithms. Specifically, the XGBoost regression model demonstrated good performance, attaining an average MSE of 8.66. Similarly, the Random Forest regression algorithm showcased promising results, with an average MSE of 8.65, highlighting its proficiency in handling nonlinear relationships and complex data structures. However, the Decision Tree regression model exhibited a slightly higher average MSE of 9.05, suggesting moderate performance. Strikingly, AdaBoost regression displayed significant underperformance, with an average MSE of 35.4, indicating challenges in effectively capturing the dataset's nuances. These findings underscore the practical utility of XGBoost and Random Forest regression models in accurately predicting Length of Stay in healthcare settings, emphasizing the critical role of algorithm selection tailored to dataset complexities. Further refinement and exploration are imperative, particularly in addressing the limitations observed with AdaBoost regression.

E. K-Fold Cross Validation

We assessed the performance of our regression models through three distinct K-fold cross-validation procedures to ensure a comprehensive evaluation. Employing 3-fold, 5-fold, and 10-fold strategies, we aimed to assess model robustness and generalization capabilities under different validation scenarios. By systematically varying the number of folds, we obtained valuable insights into model stability and reliability across various partitioning schemes, aiding in informed model selection and validation decisions.

To provide a detailed assessment of model performance, we present the results in the following tables. Table II shows the MSE for each fold and the average MSE for 3-fold cross-validation. Similarly, Table III displays performance metrics for 5-fold cross-validation, while Table IV presents MSE results for 10-fold cross-validation. These tables offer a comprehensive overview of the model's predictive accuracy and stability under different cross-validation scenarios, facilitating informed decisions regarding model selection and validation strategies.

Table II: 3-Fold Cross-Validation Results

Table IV: 10-Fold Cross-Validation Results

F. Results

We propose a predictive modeling approach to optimize hospital resource allocation, focusing on patient length of stay (LOS) forecasting. Leveraging data from Hospital_Main_DB and Hospital_Insights_DB, our models continuously update to provide real-time bed count forecasts based on LOS predictions. This dynamic forecasting enables hospitals to adjust bed counts in real time, responding to fluctuations in patient admissions and discharges.
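How predicted LOS could translate into a bed count forecast, as described in the Results subsection, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's pipeline: the patient list, the helper names `predicted_discharge` and `occupied_beds`, and the rule that a bed is occupied from the admission date up to (but not including) the predicted discharge date are all hypothetical.

```python
import math
from collections import Counter
from datetime import date, timedelta


def predicted_discharge(admission, predicted_los_days):
    """Admission date plus the predicted LOS, rounded up to whole days."""
    return admission + timedelta(days=max(1, math.ceil(predicted_los_days)))


def occupied_beds(patients, start, horizon_days):
    """Return (day, expected occupied beds) for each day in the horizon."""
    days = [start + timedelta(days=i) for i in range(horizon_days)]
    counts = Counter()
    for admission, predicted_los in patients:
        discharge = predicted_discharge(admission, predicted_los)
        for day in days:
            # Assumption: the bed is freed on the predicted discharge date.
            if admission <= day < discharge:
                counts[day] += 1
    return [(day, counts[day]) for day in days]


# Hypothetical (admission date, model-predicted LOS in days) pairs.
patients = [
    (date(2024, 4, 1), 2.4),
    (date(2024, 4, 1), 5.0),
    (date(2024, 4, 2), 1.2),
    (date(2024, 4, 3), 3.7),
]

forecast = occupied_beds(patients, date(2024, 4, 1), 7)
for day, beds in forecast:
    print(day.isoformat(), beds)
```

Re-running the function whenever the model issues new predictions would give the kind of continuously updated bed count that a dashboard could display.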
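For reference, the evaluation protocol of the Stratified Sampling and K-Fold Cross Validation subsections can be sketched in a few lines. Several things here are assumptions and not taken from the paper: the data are synthetic, the 80%/20% figure is read as a train/test fraction applied within each group (the description admits other readings), and a single-feature least-squares line stands in for the XGBoost, Random Forest, Decision Tree, and AdaBoost regressors. The function names are hypothetical.

```python
import random
from statistics import mean

FREQUENT = {1, 2, 3, 4, 5, 6}  # predominant LOS values named in the text


def stratified_split(rows, train_frac=0.8, seed=0):
    """Split rows into train/test, sampling each LOS group separately."""
    rng = random.Random(seed)
    groups = {"frequent": [], "rare": []}
    for row in rows:
        groups["frequent" if row[1] in FREQUENT else "rare"].append(row)
    train, test = [], []
    for members in groups.values():
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train += members[:cut]
        test += members[cut:]
    return train, test


def fit_line(rows):
    """Least-squares fit of LOS = a + b * x for a single feature x."""
    xs, ys = [r[0] for r in rows], [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in rows) / sxx if sxx else 0.0
    return my - b * mx, b


def kfold_mse(rows, k, seed=0):
    """Return the per-fold MSE list and its average for k-fold CV."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [r for j, f in enumerate(folds) if j != i for r in f]
        a, b = fit_line(training)
        scores.append(mean((y - (a + b * x)) ** 2 for x, y in held_out))
    return scores, mean(scores)


# Synthetic (severity score, LOS in days) pairs, for illustration only.
rng = random.Random(42)
data = []
for _ in range(300):
    severity = rng.uniform(0, 10)
    los = max(1, round(1 + 0.8 * severity + rng.gauss(0, 1.5)))
    data.append((severity, los))

train, test = stratified_split(data)
for k in (3, 5, 10):
    per_fold, average = kfold_mse(train, k)
    print(f"{k}-fold average MSE: {average:.2f}")
```

Splitting inside each group keeps the less frequent LOS values present in both the training and the test set, which is the stated purpose of the stratification; the per-fold and average MSE values correspond to the quantities reported in the cross-validation tables.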
Results show significant improvements in operational efficiency and patient care delivery, facilitated by informed decision-making through interactive Power BI dashboards. Figure-6 represents PowerBI_Patients_information, showcasing key patient demographics and admission details.

Figure 7 - Power-BI Medical_Encounters_Information

Figure 8 - Power-BI Resources Information

Figure-11 showcases Resources Information: predicted LOS, Admission Date, and Predicted Discharge Date, empowering hospital administrators with actionable insights into bed utilization and patient flow management. Overall, our study demonstrates the transformative potential of predictive modeling in enhancing hospital resource allocation strategies and healthcare outcomes.

G. Discussion

We emphasize the pivotal role of predictive modeling in optimizing hospital resource allocation. Leveraging data from Hospital_Main_DB and Hospital_Insights_DB, our approach enables real-time forecasts of patient length of stay (LOS), enhancing operational efficiency and patient care delivery. Continuous model updating and interactive Power BI dashboards empower decision-makers to make informed, proactive decisions, ultimately improving healthcare outcomes. Additionally, by continuously monitoring bed count fluctuations, our model provides insights into resource utilization trends, allowing for dynamic adjustments to meet evolving demand and optimize resource allocation.

V. CONCLUSIONS

Our study demonstrates the effectiveness of predictive modeling in optimizing hospital resource allocation, especially in bed management, through precise forecasting of patient length of stay (LOS). Leveraging machine learning and real-time data updates enables hospitals to proactively plan and distribute resources, enhancing operational efficiency and patient care. However, challenges persist, particularly in managing fluctuating bed counts.
While XGBoost and Random Forest regression algorithms show superior performance in accurately predicting LOS, challenges arise with AdaBoost regression, indicating the need for further refinement in predictive modeling approaches to address such fluctuations. This research contributes to advancing healthcare operations management, emphasizing the pivotal role of data-driven decision-making in improving healthcare outcomes.
ACKNOWLEDGMENTS
REFERENCES
[1] E. -E. Etu et al., "Prediction of Length of Stay in the Emergency
Department for COVID-19 Patients: A Machine Learning
Approach," in IEEE Access, vol. 10, pp. 42243-42251, 2022, doi:
10.1109/ACCESS.2022.3168045.
[2] J. M. P. Gutiérrez, M. -Á. Sicilia, S. Sanchez-Alonso and E. García-
Barriocanal, "Predicting Length of Stay Across Hospital
Departments," in IEEE Access, vol. 9, pp. 44671-44680, 2021, doi:
10.1109/ACCESS.2021.3066562.
[3] T. Zhang, Y. Lu, Y. Guan, X. Zhong and T. Hogan, "Data-Driven
Modeling and Analysis for COVID-19 Pandemic Hospital Beds
Planning," in IEEE Transactions on Automation Science and
Engineering, vol. 20, no. 3, pp. 1551-1564, July 2023, doi:
10.1109/TASE.2022.3224171.
[4] A. Kumar and H. Anjomshoa, "A Two-Stage Model to Predict
Surgical Patients’ Lengths of Stay From an Electronic Patient
Database," in IEEE Journal of Biomedical and Health Informatics,
vol. 23, no. 2, pp. 848-856, March 2019, doi:
10.1109/JBHI.2018.2819646.
[5] C. Li et al., "Prediction of Length of Stay on the Intensive Care Unit
Based on Least Absolute Shrinkage and Selection Operator,"
in IEEE Access, vol. 7, pp. 110710-110721, 2019, doi:
10.1109/ACCESS.2019.2934166.
[6] T. -H. Cheng and P. J. -H. Hu, "A Data-Driven Approach to Manage
the Length of Stay for Appendectomy Patients," in IEEE
Transactions on Systems, Man, and Cybernetics - Part A: Systems
and Humans, vol. 39, no. 6, pp. 1339-1347, Nov. 2009, doi:
10.1109/TSMCA.2009.2025510.
[7] M. Rouzbahman, A. Jovicic and M. Chignell, "Can Cluster-Boosted
Regression Improve Prediction of Death and Length of Stay in the
ICU?," in IEEE Journal of Biomedical and Health Informatics, vol.
21, no. 3, pp. 851-858, May 2017, doi:
10.1109/JBHI.2016.2525731.
[8] F. Ma, L. Yu, L. Ye, D. D. Yao and W. Zhuang, "Length-of-Stay
Prediction for Pediatric Patients With Respiratory Diseases Using
Decision Tree Methods," in IEEE Journal of Biomedical and Health
Informatics, vol. 24, no. 9, pp. 2651-2662, Sept. 2020, doi:
10.1109/JBHI.2020.2973285.
[9] A. R. Al Taleb, M. Hoque, A. Hasanat and M. B. Khan,
"Application of data mining techniques to predict length of stay of
stroke patients," 2017 International Conference on Informatics,
Health & Technology (ICIHT), Riyadh, Saudi Arabia, 2017, pp. 1-5,
doi: 10.1109/ICIHT.2017.7899004.
[10] X. Zhou, X. Zhu and K. Nakamura, "Prediction of Hospitalization
Cost and Length of Stay for Patients with Heart Failure Using Deep
Learning," 2022 IEEE 4th Global Conference on Life Sciences and
Technologies (LifeTech), Osaka, Japan, 2022, pp. 158-161, doi:
10.1109/LifeTech53646.2022.9754924.
[11] Health Data NY - https://health.data.ny.gov/ dated 01.04.2024