Process Mining Based Modeling and Analysis of Workflows in Clinical Care - A Case Study in a Chicago Outpatient Clinic
Abstract-The United States currently spends over 17% of its gross domestic product on healthcare and this expenditure will continue to rise in the next few decades. Improving the performance and the efficiency of the United States healthcare system is of practical value to lowering such expenditure. Discrete event modeling, which can capture the complex behaviors of healthcare systems and provide statistical estimations of 'what if' scenarios, is one of the most powerful and cost-effective methods for healthcare system improvement. It involves creating an abstract-level workflow model with an accurate view of the patient flow while considering the dynamic nature of healthcare processes. In this paper, an outpatient clinic in Chicago, Illinois, USA, is used as a case study to illustrate a process mining based method for healthcare process management and improvement. This method is able to discover meaningful knowledge, i.e., the workflow, of the clinical care processes by mining event logs. Based on the results from process mining, a discrete event simulation model is proposed to quantitatively analyze the clinical center. Sensitivity analyses have also been carried out to investigate the care activities with limited resources such as doctors and nurses. The results suggest that this methodology is a useful and flexible tool for healthcare process performance improvement.

Keywords-outpatient clinic; patient flow; process mining; healthcare; discrete event simulation

I. INTRODUCTION

The high and rising cost of healthcare is a nationwide problem in the United States. From 1990 to 2010, total U.S. healthcare expenditure consistently grew faster than the economy, reaching $2.6 trillion, or $8,402 per person and the equivalent of 17.9% of the U.S. gross domestic product (GDP) [1]. Many different factors, such as aging of the population, development of new treatments, and perverse incentives (third-party payers reimburse for procedures rather than outcomes performance), are behind the healthcare cost increase [2]. However, based on the World Health Report, the efficiency performance of the U.S. healthcare system is only ranked 37th in the world [3]. Considering that the U.S. is number one in terms of healthcare spending per person [3], there is an urgent need to improve operational efficiency, which would potentially lower the costs in the U.S. healthcare system.

Research on healthcare system efficiency improvement has attracted much attention, in which the Discrete Event Simulation (DES) based approach appears to be a dominant tool [4]. DES can present detailed analysis results for existing healthcare systems, predict the impact of potential changes, and provide a guideline for management. It has been used in various healthcare systems, such as the Accident & Emergency Department (A&E), inpatient facilities, outpatient clinics and other hospital units [5]. For example, Harper and Gamlin [6] developed a simulation model for a clinic and tested a number of different appointment schedules. Their results suggested that the waiting time of patients could be reduced significantly through improved management of the schedule system. Sinreich and Marmor [7] reported an Emergency Department (ED) model in a generic format by using Arena simulation software. They developed ED models based on their level of abstraction and suggested that their model is better in reusability than others. Ferrin et al. [8] proposed a DES model to analyze an ED in the U.S. Their investigations reduced the length of stay in the ED, increased the inpatient daily census and improved the hospital net operating margin.

On the other hand, analytical approaches for modeling healthcare systems have also been proposed and discussed. For example, Au et al. [9] developed a queuing-based predictive model to analyze ambulance bypass due to ED overflow, which could be used to estimate the probability of the ED reaching some designed capacity within time t for given initial conditions. Green et al. [10] investigated the variation of patient arrival rates in the ED and proposed a queuing model for staffing patterns to reduce the number of patients who leave without being seen.

In spite of the advances in analyzing and improving healthcare systems, new approaches are still needed for two reasons. First, many reported studies are unit or facility specific [5]. In other words, major efforts would be necessary to reuse these reported models. Second, many researchers assume the patient flow is clear and usually build the workflows from observation. However, this is not always the case. In reality, healthcare systems, especially large healthcare facilities, are characterized by highly complicated and remarkably flexible patient flows. Under such circumstances, it is challenging to get a precise and sufficiently detailed view of the patient flow in the health system.
B. Process Mining and Workflow Discovery
The goal of process mining is to analyze the event log and discover a structured patient flow for simulation. To achieve this, two different process mining techniques are applied in this paper and their results are discussed below.

Fig. 1. Boxplots of event duration time

a) α-Algorithm
We introduce some basic knowledge and mathematical notations about Petri nets as well as the α-algorithm. Some definitions that are helpful for the patient flow discovery are adopted from [17]. More detailed information on the concepts of the trace, the firing rule, firing sequences, preset $\bullet p$, and postset $p \bullet$ can be found in [17] and is not repeated here.

$a = p_i$, and $b = p_{i+1}$ $(1 \le i \le n-1)$. (2) $a \rightarrow_L b$ if and only if $a >_L b$ and $b \not>_L a$. (3) $a \#_L b$ if and only if $a \not>_L b$ and $b \not>_L a$. (4) $a \parallel_L b$ if and only if $a >_L b$ and $b >_L a$.

The α-algorithm is one of the first process mining techniques that can be used to produce a constructed Petri net based on a given log. The α-algorithm defines several typical process patterns such as sequencing, XOR-split, XOR-join, AND-split, and AND-join [17]. The basic idea of the α-algorithm is to examine the causal relationships between observed tasks in the log. In other words, the α-algorithm first discovers the logical order relationships between tasks. For example, one

In order to apply the α-algorithm and get a readable model, the column "Finish Time" in the event log is removed. Then we import the new event log into ProM [17], a software tool for process mining, and apply the α-algorithm. The model identified by the α-algorithm is shown in Fig. 2. In addition to the general drawbacks such as incompleteness, noise and unsoundness, the following specific drawbacks of the generated model are observed.
not a Workflow net as "Vitals" is disconnected from the rest of the model.

• Since there are many loops in Fig. 2, it can generate traces such as <..., Waiting, UrineSample, Waiting, UrineSample, ...>, which do not belong to the event log.

• The model generated using the α-algorithm does not represent the frequency of each trace.

There are various algorithms that could overcome the weaknesses of the α-algorithm. For example, the α+-algorithm [18] uses a pre- and post-processing phase to deal with short loops. There are also algorithms that use a completely different approach, e.g., genetic mining [19] and fuzzy mining [20]. Since their concepts are pretty similar, the fuzzy mining method is selected and discussed in detail next.

b) Fuzzy Mining
The main problem with the model in Fig. 2 is that it is too complicated for the purposes of simulation and managerial analysis. In other words, the discovered model is "spaghetti like": it shows all details without distinguishing what is crucial from what is unnecessary. Therefore, a method with high-level abstraction and clustering is needed. To overcome this problem, the fuzzy mining approach, which can provide meaningful abstraction of system processes and allow for different views with different levels of simplification for a particular process, is applied in this paper to discover the workflow of the clinic.

Fuzzy mining is a suitable process mining technology for unstructured processes. Different from the α-algorithm, fuzzy mining introduces two important and fundamental metrics: significance and correlation [20]. The first metric, significance, is used to measure the relative importance of behavior, including both event classes and the binary order relationships between them. One way of measuring significance is by frequency, i.e., relationships with higher frequency in the event log are more significant. The second metric, correlation, focuses on the precedence relationships over events. It indicates how closely two events follow each other. Based on these two metrics, fuzzy mining can reduce and focus the displayed event classes by employing two concepts, aggregation and abstraction, in the roadmap. To be more specific, less significant but highly correlated behavior can be aggregated in the model; less significant and lowly correlated behavior can be abstracted from the model. More details of the fuzzy mining algorithm can be found in [20].

Before applying fuzzy mining, we further analyze and simplify the event log as follows. We observe that the duration of "Waiting" is not independent of other events. That is, if the durations of other events are known, the duration of "Waiting" will be automatically determined. Therefore, we remove the "Waiting" events from the event log. This new event log is loaded into DISCO (Fluxicon Process Laboratories, Eindhoven, The Netherlands) [21] and analyzed by the built-in fuzzy mining approach. The final model generated is shown in Fig. 3, where the left side shows the corresponding frequency of each event and the right side shows the average duration time of each event.

IV. SIMULATION MODEL

In this section, we develop a DES model based on the discovered workflow in Fig. 3, which can be used later to estimate and improve the clinic's performance under different operation scenarios. The simulation model is implemented using the professional simulation software ProModel [22]. In this DES model, each procedure or event is treated as a "machine" with process operation time, required resources, and routing logic. The resources include doctors, nurses, and receptionists. The details of the proposed model are introduced as follows.

Fig. 3. The model created by the fuzzy mining

a) Patient Clustering: Based on the given information, there are two types of patients, low-risk and high-risk. The cases in the event log can be classified into two clusters by using the k-means clustering algorithm, where k = 2. After normalization for each event, a summary of clustering results is provided in Table III, which lists the centroid location of each of the k clusters. Since the total numbers for clusters C1 and C2 are 10052 and 9942, respectively, we assume that the arrival probabilities of the two types of patients are equal.

TABLE III SUMMARY OF CLUSTERING RESULTS AFTER NORMALIZATION
                Centroid
Cluster Name    Sign In   Check In   Vitals   Urine Sample   Diagnosis   Check Out
C1              0.26      0.50       0.68     0.34           0.72        0.30
C2              0.25      0.50       0.24     0.34           0.20        0.30

b) Patient Arrival: There is no arrival time recorded in the event log. Therefore, the starting time of the first event for each case is assumed to be the arrival time of each patient. The histogram of patient arrival by hours is shown in Fig. 4. Because patient arrivals are based on an appointment system, there is no significant difference between different hours during work days. Based on the data, the patient arrival is assumed to follow a Poisson distribution P(12.6).

c) Resources: Based on information from the clinic, we make the following assumptions about resources for the simulation model. (1) The care center has two receptionists, who are in charge of both "CheckIn" and "CheckOut" for patients. (2) The care center has one nurse who is in charge of taking the "Vitals" for patients. (3) The care center has three doctors who are in charge of the "Diagnosis". (4) All doctors and the nurse can handle both types of patients.
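The patient clustering step in a) can be illustrated with a small sketch. This is not the authors' implementation: it is a plain Lloyd's k-means with k = 2, run on synthetic cases that are generated around the Table III centroids (the raw event log is not reproduced here). The helper names `kmeans` and `synthetic_case`, the spread of 0.05 and the sample size of 200 per cluster are assumptions made only for illustration.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # keep the centroid of an empty cluster in place
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members)))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Table III centroids: Sign In, Check In, Vitals, Urine Sample, Diagnosis, Check Out.
C1 = (0.26, 0.50, 0.68, 0.34, 0.72, 0.30)
C2 = (0.25, 0.50, 0.24, 0.34, 0.20, 0.30)

data_rng = random.Random(42)


def synthetic_case(center, spread=0.05):
    """SYNTHETIC normalized durations around a centroid (not real log data)."""
    return tuple(min(1.0, max(0.0, data_rng.gauss(m, spread))) for m in center)


points = ([synthetic_case(C1) for _ in range(200)]
          + [synthetic_case(C2) for _ in range(200)])
centroids, labels = kmeans(points, k=2)
sizes = [labels.count(c) for c in range(2)]

# Cluster shares reported in the paper: 10052 and 9942 cases.
paper_share_c1 = 10052 / (10052 + 9942)
print(sizes, round(paper_share_c1, 4))
```

The last two lines show why the equal-probability assumption is reasonable: cluster C1 holds about 50.3% of the cases. Note also that the two centroids differ almost only in the "Vitals" and "Diagnosis" coordinates, so those two events carry the separation between the patient types.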
[Fig. 4. Histogram of patient arrivals by hour]
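The arrival assumption in b) can be sketched as follows. The paper does not state what the parameter in P(12.6) measures; the sketch assumes it is the mean gap between consecutive arrivals in minutes, drawn from a Poisson distribution, and that arrivals are accepted during an 8-hour window (8:00AM to 4:00PM, as in the baseline case). The function names are hypothetical, and the sampler is Knuth's multiplication method rather than anything taken from ProModel.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def arrival_times(mean_gap_min, last_arrival_min, rng):
    """Arrival instants in minutes after opening, up to the last accepted arrival."""
    times, t = [], 0
    while True:
        t += poisson_sample(mean_gap_min, rng)  # ASSUMPTION: 12.6 is a gap in minutes
        if t > last_arrival_min:
            return times
        times.append(t)


rng = random.Random(7)
day = arrival_times(12.6, 8 * 60, rng)
print(len(day), day[:5])
```

Under this reading, one day yields roughly 480 / 12.6, i.e. about 38 arrivals, which is of the same order as the daily throughput reported for the simulated scenarios.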
d) Operation Times: The operation times of each service procedure are collected from the event log. For the diagnosis process, patients in different clusters spend significantly different amounts of time. In addition, some patients may need "Diagnosis" twice and others may only need it once. Therefore, all operation times of "Diagnosis" are classified into four different groups: C1_1st_diagnosis, C1_2nd_diagnosis, C2_1st_diagnosis, and C2_2nd_diagnosis. The normal distribution is used for the operation times and the results are provided in Tables IV and V.

TABLE IV. THE NORMAL DISTRIBUTION OF EACH EVENT EXCEPT DIAGNOSIS
Duration (min)       Checking In   Checking Out   Urine Sample   Signing In   Vital Signs
Mean                 12.5          4.5            4.5            2.5          11.0
Standard Deviation   1.0           1.0            1.0            1.0          4.6

Fig. 5. The simulation model in ProModel

A. Baseline
In order to provide a quantitative view of the current performance of the clinic, the baseline case is simulated. Based on the information from the clinic, we assume that the office hours are from 8:00AM to 6:00PM (10 hrs/day) and the latest arrival time for patients is 4:00PM. In other words, the clinic will not accept any new patients after 4:00PM. Since the details about shifts are unknown, we assume the doctors take a twenty-minute rest every four hours. These assumptions are applied to all simulation cases in this section. The simulation runs for 10,000 hours. The simulation results are summarized in Table VI. The results are very close to the information extracted from the event log (average patient cycle time: 112.60 min vs 114.44 min, and average patient waiting time: 41.86 min vs 44.20 min).
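The structure of such a DES model can be sketched in a few lines of plain Python. This is a simplified illustration, not the ProModel model: it keeps only four stages (CheckIn, Vitals, Diagnosis, CheckOut), ignores "Signing In", "Urine Sample", repeated diagnoses and doctor breaks, and uses evenly spaced arrivals. The means and standard deviations for CheckIn, Vitals and CheckOut come from Table IV; the diagnosis parameters are hypothetical placeholders because Table V is not available in this text. The function name `simulate_day` is likewise an assumption.

```python
import heapq
import random

STAGES = [  # (stage name, resource, mean minutes, standard deviation minutes)
    ("CheckIn", "receptionist", 12.5, 1.0),   # Table IV
    ("Vitals", "nurse", 11.0, 4.6),           # Table IV
    ("Diagnosis", "doctor", 30.0, 5.0),       # HYPOTHETICAL values, Table V not shown
    ("CheckOut", "receptionist", 4.5, 1.0),   # Table IV
]


def simulate_day(capacity, arrivals, rng):
    """Return (cycle_times, waiting_times) in minutes, one entry per patient."""
    free = dict(capacity)                      # idle units per resource
    queues = {name: [] for name in capacity}   # FIFO waiting list per resource
    events = [(t, i, "request", i, 0) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    seq = len(events)                          # unique tie-breaker for the heap
    queued_at, waited, finished = {}, [0.0] * len(arrivals), {}

    def start(now, patient, stage):
        nonlocal seq
        _, resource, mean, std = STAGES[stage]
        free[resource] -= 1
        service = max(0.5, rng.gauss(mean, std))  # truncate to keep durations positive
        heapq.heappush(events, (now + service, seq, "release", patient, stage))
        seq += 1

    while events:
        now, _, kind, patient, stage = heapq.heappop(events)
        resource = STAGES[stage][1]
        if kind == "request":
            if free[resource] > 0:
                start(now, patient, stage)
            else:
                queued_at[patient] = now
                queues[resource].append((patient, stage))
        else:  # "release": the stage is finished and the resource becomes idle
            free[resource] += 1
            if queues[resource]:
                next_patient, next_stage = queues[resource].pop(0)
                waited[next_patient] += now - queued_at.pop(next_patient)
                start(now, next_patient, next_stage)
            if stage + 1 < len(STAGES):
                heapq.heappush(events, (now, seq, "request", patient, stage + 1))
                seq += 1
            else:
                finished[patient] = now
    cycle = [finished[i] - arrivals[i] for i in range(len(arrivals))]
    return cycle, waited


# ASSUMPTION: evenly spaced appointments, one every 12.6 minutes for 8 hours.
arrivals = [12.6 * i for i in range(38)]
for doctors in (3, 4, 5):
    capacity = {"receptionist": 2, "nurse": 1, "doctor": doctors}
    cycle, waited = simulate_day(capacity, arrivals, random.Random(1))
    print(doctors, round(sum(cycle) / len(cycle), 1),
          round(sum(waited) / len(waited), 1))
```

Because receptionists serve both CheckIn and CheckOut, the two stages compete for the same pool, which is why the sketch uses an event list rather than treating the stages as independent queues. Changing the `capacity` dictionary is the analogue of the resource scenarios studied with the full model; the printed numbers depend on the placeholder diagnosis times and should not be compared with the paper's tables.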
applying this change, the patient waiting time would significantly increase from 41.86 minutes to 101.46 minutes. In addition, the utilization rate of nurses would be very low, and the utilization rate of the receptionist would be too high. The simulation result indicates that it would be a bad choice to change one receptionist to one nurse.

TABLE VII THE PERFORMANCE INDEXES OF SCENARIO 1
Item                               Result
Average finished patients /10hrs   26.95 person/10hrs
Average patient cycle time         172.24 min
Average patient waiting time       101.46 min
Receptionist utilization rate      98.96%
Nurses utilization rate            34.70%
Doctor utilization rate            87.05%

C. Scenario 2: Adding More Doctors
In order to improve patients' satisfaction and reduce the waiting time, we increase the number of doctors and check the sensitivity of this resource. After the revision, we re-run the simulation and the results are listed in Table VIII. It has been shown that adding a doctor could not always significantly reduce the waiting time. After the number of doctors reaches 4, additional doctors will not provide significant benefit.

TABLE VIII THE PERFORMANCE INDEXES OF SCENARIO 2
Index                              4 Doctors            5 Doctors
Average finished patients /10hrs   40.44 person/10hrs   41.02 person/10hrs
Average patient cycle time         83.48 min            80.05 min
Average patient waiting time       12.67 min            9.10 min
Receptionist utilization rate      63.67%               63.79%
Nurses utilization rate            84.31%               84.33%
Doctor utilization rate            69.65%               56.74%

VI. CONCLUSIONS
In this paper, a process mining based method is introduced to analyze the event log and discover a useful workflow for an outpatient clinic. The workflow is then used to develop a DES model to evaluate the clinic performance and estimate the potential improvement. The DES model can accurately reproduce the observed behaviors in the clinic, identify the bottleneck, and predict the impact of improvement efforts for decision making purposes. The case study shows a big potential of process mining techniques in healthcare systems.

With process mining, there is a possibility to obtain a DES model in a semi-automatic way. However, current process mining approaches still have some flaws. For example, if we do not use the dependence relationships of "Waiting" to simplify the event log, the workflow discovered by fuzzy mining will still be too complicated for simulation. Future work can be focused on developing more robust process mining technologies that could automatically address such issues for the purpose of healthcare system application.

REFERENCES
[1] A. B. Martin, D. Lassman, B. Washington, and A. Catlin, "Growth in US health spending remained slow in 2010; Health share of gross domestic product was unchanged from 2009," Health Affairs, vol. 31, pp. 208-219, 2012.
[2] R. S. Kaplan and M. E. Porter, "How to solve the cost crisis in health care," Harv Bus Rev, vol. 89, pp. 46-52, 2011.
[3] C. J. Murray and J. Frenk, "Ranking 37th-Measuring the performance of the US health care system," New England Journal of Medicine, vol. 362, pp. 98-99, 2010.
[4] J. Wang, J. Li, and P. Howard, "A system model of work flow in the patient room of hospital emergency department," Health Care Management Science, pp. 1-11, 2013.
[5] M. M. Günal and M. Pidd, "Discrete event simulation for performance modelling in health care: a review of the literature," Journal of Simulation, vol. 4, pp. 42-51, 2010.
[6] P. R. Harper and H. M. Gamlin, "Reduced outpatient waiting times with improved appointment scheduling: a simulation modelling approach," OR Spectrum, vol. 25, pp. 207-222, 2003.
[7] D. Sinreich and Y. N. Marmor, "A simple and intuitive simulation tool for analyzing emergency department operations," in Simulation Conference, 2004. Proceedings of the 2004 Winter, vol. 2, pp. 1994-2002, 2004.
[8] D. M. Ferrin, M. J. Miller, and D. L. McBroom, "Maximizing hospital financial impact and emergency department throughput with simulation," in Simulation Conference, 2007 Winter, pp. 1566-1573, 2007.
[9] L. Au, G. Byrnes, C. Bain, M. Fackrell, C. Brand, D. Campbell, and P. Taylor, "Predicting overflow in an emergency department," IMA Journal of Management Mathematics, vol. 20, pp. 39-49, 2009.
[10] L. V. Green, J. Soares, J. F. Giglio, and R. A. Green, "Using queueing theory to increase the effectiveness of emergency department provider staffing," Academic Emergency Medicine, vol. 13, pp. 61-68, 2006.
[11] L. Wen, J. Wang, W. M. van der Aalst, B. Huang, and J. Sun, "A novel approach for process mining based on event types," Journal of Intelligent Information Systems, vol. 32, pp. 163-190, 2009.
[12] R. S. Mans, M. H. Schonenberg, M. Song, W. M. P. Aalst, and P. J. M. Bakker, "Application of Process Mining in Healthcare - A Case Study in a Dutch Hospital," in Biomedical Engineering Systems and Technologies, vol. 25, A. Fred, J. Filipe, and H. Gamboa, Eds. Springer Berlin Heidelberg, 2009, pp. 425-438.
[13] C. McGregor, C. Catley, and A. James, "A process mining driven framework for clinical guideline improvement in critical care," in Learning from Medical Data Streams 13th Conference on Artificial Intelligence in Medicine (LEMEDS), vol. 765, 2012.
[14] P. B. Jensen, L. J. Jensen, and S. Brunak, "Mining electronic health records: towards better research applications and clinical care," Nature Reviews Genetics, vol. 13, pp. 395-405, 2012.
[15] R. S. Mans, W. M. van der Aalst, R. J. Vanwersch, and A. J. Moleman, "Process mining in healthcare: Data challenges when answering frequently posed questions," in Process Support and Knowledge Representation in Health Care. Springer, 2013, pp. 140-153.
[16] H. Darabi and A. Sharabiani, "An outpatient clinic's workflow log," MIE, University of Illinois at Chicago, 2013.
[17] W. M. van der Aalst, Process mining: Discovery, conformance and enhancement of business processes. Springer, 2011.
[18] L. Wen, J. Wang, and J. Sun, "Detecting implicit dependencies between tasks from event logs," in Frontiers of WWW Research and Development-APWeb 2006. Springer, 2006, pp. 591-603.
[19] A. A. De Medeiros and A. Weijters, "Genetic process mining," in Applications and Theory of Petri Nets 2005, volume 3536 of Lecture Notes in Computer Science, 2005.
[20] C. W. Günther and W. M. Van Der Aalst, "Fuzzy mining-adaptive process simplification based on multi-perspective metrics," in Business Process Management. Springer, 2007, pp. 328-343.
[21] http://fluxicon.com/disco/
[22] http://www.promodel.com/