On Mining Clinical Pathway Patterns From Medical Behaviors: Artificial Intelligence in Medicine
Article info

Article history: Received 7 June 2011; Received in revised form 21 May 2012; Accepted 10 June 2012

Keywords: Clinical pathway analysis; Pattern mining; Process mining; Clinical workflow log

Abstract

Objective: Clinical pathway analysis, as a pivotal issue in ensuring specialized, standardized, normalized
and sophisticated therapy procedures, is receiving increasing attention in the field of medical informatics.
Clinical pathway pattern mining is one of the most important components of clinical pathway analysis
and aims to discover which medical behaviors are essential/critical for clinical pathways, and also where
temporal orders of these medical behaviors are quantified with numerical bounds. Even though existing
clinical pathway pattern mining techniques can tell us which medical behaviors are frequently performed
and in which order, they seldom precisely provide quantified temporal order information of critical
medical behaviors in clinical pathways.
Methods: This study adopts process mining to analyze clinical pathways. The key contribution of the paper
is to develop a new process mining approach to find a set of clinical pathway patterns given a specific
clinical workflow log and minimum support threshold. The proposed approach not only discovers which
critical medical behaviors are performed and in which order, but also provides comprehensive knowledge
about quantified temporal orders of medical behaviors in clinical pathways.
Results: The proposed approach is evaluated on real-world data sets extracted from Zhejiang
Huzhou Central Hospital of China for six specific diseases, i.e., bronchial lung cancer, gastric
cancer, cerebral hemorrhage, breast cancer, infarction, and colon cancer, over two years (2007.08–2009.09).
Compared with the general sequence pattern mining algorithm, the proposed approach consumes less
processing time, generates a considerably smaller number of clinical pathway patterns, and scales linearly
in execution time as the size of the data sets increases.
Conclusion: The experimental results indicate the applicability of the proposed approach, based on which
it is possible to discover clinical pathway patterns that can cover most frequent medical behaviors that
are most regularly encountered in clinical practice. Therefore, it holds significant promise in research
efforts related to the analysis of clinical pathways.
© 2012 Elsevier B.V. All rights reserved.
0933-3657/$ – see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.artmed.2012.06.002
36 Z. Huang et al. / Artificial Intelligence in Medicine 56 (2012) 35–50
clinical pathway, or if patients normally undergo their radical resection of colon cancer surgeries in 3–7 days after admission, and are typically discharged 7–10 days after surgery. We note that temporal relationships among medical behaviors are also called chronicles [19–21], which not only allow the researcher to set a relative order of medical behavior occurrences in clinical pathways, but also allow one to quantify the time gaps between behaviors. In the process of clinical pathway analysis, this quantification is very useful as it allows the researcher to distinguish between situations in clinical pathways that have the same medical behaviors, but different time spreadings. Indeed, different time spreadings of the same set of medical behaviors may indicate, for example, that the same patient therapy behaviors are realized in different contexts. The essential medical behaviors and chronicle information form the backbone of clinical pathway patterns and should be conserved. This task, called clinical pathway pattern mining, is one of the most important aspects of clinical pathway analysis.

Many techniques have been proposed for clinical pathway pattern mining, to extract knowledge and information in clinical pathways, and help analysts to redesign/optimize clinical pathways. Most of these techniques are based on the experiences and knowledge of clinical experts, or are oriented to clinical data statistical analysis such as the statistics of pathway coincidence rate and abort rate, etc. [6,22]. In such techniques, the analysts interpret large amounts of collected medical behaviors, and elaborate clinical pathway patterns, piece after piece, which can be a very tedious process. In addition, it appears that the analysis of the results is somehow influenced by perceptions, e.g., medical behaviors in clinical pathways are often normative in the sense that they state what should be done rather than describing the actual medical behaviors in clinical pathways. As a result, it tends to be a rather subjective process.

Another possible approach uses data mining and machine learning technologies to measure medical behavior from clinical workflow logs. This is also called process mining [23–26]. Process mining, as a valuable set of techniques, has been widely studied in the business process management domain. It uses workflow logs to record business process execution information, to mine the actual behaviors in business processes, and discover business process patterns. Based on process execution data, with its logic and reasoning ability, process mining guarantees integrity, objectivity and universality of the discovered process patterns [26,27].

Process mining can be an objective way of analyzing clinical pathways as it is not biased by perceptions or normative behaviors. Note that the medical behaviors in patient-care journeys can be recorded into clinical workflow logs through various kinds of hospital information systems. This can be used to verify and analyze medical services. In addition, it effectively reflects the real executing conditions in clinical pathways. Consequently, as Fig. 1 shows, process mining can be applied to analyze all kinds of medical behaviors and to mine frequent clinical pathway patterns, which can often indicate critical medical behaviors in patient-care journeys, and can also provide a valuable reference for clinical experts to help them redesign and continuously optimize clinical pathways.

Taking into account these reasons and considering the fact that clinical workflow logs are often recorded by hospital information systems and are easy to collect, we adopt process mining to analyze clinical pathways. In particular, the key objective of our paper is to mine comprehensive clinical pathway patterns given a specific clinical workflow log and a minimum support threshold, i.e., the complete discovery consists in discovering all clinical pathway patterns with respect to the clinical workflow log such that the support degree of each pattern is larger than a minimum support threshold. The support degree of a clinical pathway pattern with respect to a clinical workflow log is the number of clinical pathway traces in the log which contain medical behaviors of the pattern, and satisfy the temporal constraints (i.e., chronicles) of the pattern.

However, the diversity of medical behaviors and the complexity of chronicle information among medical behaviors in clinical pathways are far higher than those of common business processes. Traditional process mining techniques have many problems and challenges when used for mining clinical pathway patterns [27,28]. Although many process mining techniques can tell us which medical behaviors are frequently performed and in which order, they seldom provide precise chronicle information about the critical medical behaviors in clinical pathways for further decision support. In addition, applying traditional process mining techniques may generate spaghetti-like pathway patterns that are difficult to comprehend for clinical experts [27,29]. Such incomprehensible patterns are of little help in clinical pathway redesign and optimization efforts.

Therefore, it is necessary to develop a new process mining technique to effectively mine clinical pathway patterns. To this end, our main contribution, in this paper, is to develop a novel process mining approach to mine clinical pathway patterns from medical behaviors. In comparison to the traditional process mining techniques, the proposed approach can discover closed clinical pathway patterns. It can also answer questions related to the most common (likely) behavior, the pathway traces that share/capture a desired behavior, the time spans between desired behaviors in clinical
pathways, etc. Our approach is evaluated via real-world data sets from Zhejiang Huzhou Central Hospital of China.

This paper is organized as follows. Section 2 introduces the concept of the clinical pathway and process mining and provides a brief overview of process mining in health-care. Section 3 formulates a clinical pathway pattern mining problem. In Section 4, we introduce our approach for mining clinical pathway patterns from medical behaviors, which are regularly recorded in clinical workflow logs. Evaluation of the proposed approach is shown in Section 5. Section 6 summarizes our conclusions and considers future research directions.

2. Background

In this section, we provide some background on clinical pathways, and then review related work on process mining and its various applications in health-care.

2.1. Understanding clinical pathway

Clinical pathways aim to coordinate the patient-care process by a team of health-care professionals for a specific diagnosis or procedure [18]. In essence, a clinical pathway describes the functional knowledge pertaining to an institution's clinical practices in terms of time-sensitive and outcome-driven processes, represented as a combination of plans, tasks, decisions, resources and care providers that essentially resembles a workflow [1,30].

Hunter and Segrott point out that a clinical pathway is a specific trajectory of the sequencing and timing of practitioners’ care [6]. Bryan et al. [31] describe a clinical pathway as “a map of the process involved in managing a common clinical condition or situation”. In fact, such a trajectory or map defines a set of medical behaviors, and the time that these behaviors take to occur. As shown in Table 1,¹ a clinical pathway consists of different categories of medical behaviors, namely multidisciplinary clinical activities, and their dependencies with the following characteristics [6,32]: (1) clinical activities are spread along a predefined time-line, which is the expected length of stay (LOS) in the hospital, and each activity represents a specific clinical task (e.g., a medical order, a radiological examination test, etc.). For instance, a clinical activity such as the surgery of lung cancer is preferably performed in the time span between the fourth day and the seventh day of LOS. In addition, there are specific starting/ending activity types in clinical pathways. For example, the starting/ending activity types of clinical pathways published by the Ministry of Health of China are admission and discharge. Moreover, certain temporal relations exist between clinical activities. For example, a color ultrasound examination should be performed on the first day after admission, and another color ultrasound examination should be performed on the day before discharge. (2) A clinical pathway normally enumerates regular medical behaviors that are expected to occur in patient-care journeys and serve as checkpoints for the performance of the pathway. We note that medical behaviors in this study are represented as flexible, transparent, and re-usable pieces of functionality that consist of one or several clinical activities required to set up a clinical solution. (3) These medical behaviors, as basic alternatives, can be applied in considering specific patient states in clinical pathways. A clinical pathway may generate a set of regular medical behaviors with occasional variants. Note that respective model adaptations result in large collections of process model variants that are derived from the same process model, but differ slightly in structure [33]. In particular, variants are common in clinical settings because the changes in the patient state make the applied medical behaviors inappropriate, such that patient-care has to be adjusted. As a matter of fact, a clinical pathway frequently entails improvisation or ad hoc combinations. Often unexpected delays and iterations might occur if new medical behaviors are not substituted or short-cuts are not taken. Over time, these improvisations may become accepted as formal procedures, but in the short run, they would appear as variants on the standardized pathway.

¹ This is a translation of a bronchial lung cancer clinical pathway published by the Ministry of Health of China. For the original version, please refer to: http://www.moh.gov.cn/publicfiles/business/cmsresources/mohyzs/cmsrsdocument/doc4905.doc.

2.2. Process mining and its applications in health-care

The goal of process mining is to extract information (e.g., process or organizational models) from workflow logs, i.e., process mining describes a family of a-posteriori analysis techniques exploiting the information recorded in workflow logs [26,34]. Typically, these approaches assume that it is possible to record events sequentially such that each event refers to an activity (i.e., a well-defined step in the process) and is related to a particular case (i.e., a process case).

Process mining addresses the problem that most “process/system owners” have limited information about what is actually happening [23,24,35]. In practice, there is often a significant gap between what is prescribed or supposed to happen, and what happens in reality [36]. Only a concise assessment of reality, which process mining strives to deliver, can help to verify process patterns, and ultimately be used in process redesign efforts [35,36]. Discovering frequently occurring temporal patterns in process cases facilitates intelligent and automatic extraction of useful knowledge to support business decision-making. Similarly, data mining techniques are exploited in workflow management contexts to mine frequent workflow execution patterns [37]. The sequence of activities within a process, the execution cost and the reliability of the process can be predicted by using the process path mining technique [38]. Based on the process patterns and process paths, unexpected but useful knowledge about the process is extracted to help the user make appropriate decisions. For more details on process mining, please refer to [26,36].

The application of process mining in health-care is a relatively unexplored field, although it has already been attempted by some authors [27], who have devised a methodology based on process mining in order to support business process analysis in health-care. Their methodology includes process mining techniques that are especially useful in health-care environments, given the characteristics of health-care processes. A case study was conducted in the Hospital of São Sebastião in Portugal by gathering data from the hospital information system and analyzing the data set by utilizing a set of process mining techniques for the selected radiological examination processes.

To the best of our knowledge, the approaches that are most similar to the one presented in this paper are highlighted in [39,40]. In [39], Klundert et al. present a model to measure clinical pathway adherence, which can cope with variations in pathways and deviations from pathways. They evaluated their method by using real-life data from the years 2001–2005 at the Maastricht University Medical Centre (MUMC). Lin et al. [40] reported a data mining technique that was developed to discover the time dependency pattern of clinical pathways for managing brain stroke. The mining of time dependency patterns allows us to discover patterns of process execution sequences and to identify the dependent relation between activities in a majority of cases. By obtaining the time dependency patterns, it is possible to predict the paths for new patients who are admitted into the hospital.

Note that the work mentioned above is only confined to one or several well-structured fragments of patient-linked treatment processes, such as radiological workflow. To the best of our knowledge,
Table 1
A portion of the bronchial lung cancer clinical pathway summary recommended by Ministry of Health of China.
Suitable for: Patient with first diagnosis of bronchial lung cancer (ICD-10: C34; D02.2) for local excision of pulmonary/lobectomy/pneumonectomy + systematic lymph node dissection/thoracotomy surgery (ICD-9-CM-3: 32.29/32.3-32.5)
Outpatient Service No.: Hospitalization No.: Admission Date: Discharge Date: Length of Stay: 14-21 days
Time Admission (Day 1) Pre-OP Day (Days 2-6) Operation(OP) Day (Days 4-7) Post-OP I (Days 5-8) Post-OP II (Days 6-12) Discharge (Days 13-21)
Higher authority physician rounds Indwelling catheter before surgery Higher authority Higher authority Remove incision suture
Preoperative evaluation Surgeon completes Resident completes Resident completes physician rounds
Medical history inquiry
Preoperative discussion and operation record progress note progress note and determine discharge
and physical examination
surgical planning Resident completes Observe chest drainage Review blood routine Resident completes discharge
Write patient record
Preoperative consultation postoperative course Note vital signs examination,biochemistry summary,medical
Issue laboratory orders
Resident completes medical records Higher authority physician and breath sounds and chest X-ray record homepage, etc.
and check request form
including progress and preoperative rounds in the lungs Remove chest drain and Inform patient and family
Attending rounds
log summary, superior physician records Observation of vital signs Encourage and dress incision according of issues after discharge
Set treatment plan
Sign the informed consent procedure, Account of illness and assist patients to patient condition Determine treatment
consent, consent authorization patient and family Bronchoscopy sputum Discontinue/adjust antibiotic postoperative pathology
kidney function examination, Body temperature, ECG, blood Stop atomization inhalation Remove suture
surgery under general anesthesia Normal diet
electrolytes, infectious disease pressure,respiration, pulse, blood Stop antimicrobial Dress the incision
No food or water intake Temporary Order:
screening,tumor markers check oxygen saturation monitoring Temporary Order: Inform of discharge
6 hours before surgery Blood routine, liver and
Lung function, arterial blood gas Record the amount of Remove closed chest Discharge with
Enema the night before surgery kidney function,
analysis, ECG, echocardiography chest drainage drainage tube medication
Preoperative skin preparation electrolytes examination
Sputum cytology, bronchoscopy+biopsy Continued catheterization, record Remove catheter Regular return visit
Preparation of blood transfusion Chest X-ray
Imaging: lateral chest X-ray, chest CT, 24-hour intake and output Dress the incision
Sedative drugs Other special advices
abdominal ultrasound or CT, whole Atomizing inhalation Review blood routine, liver
Preparation of antibacterial
body bone scan, brain MRI or CT Prophylactic antibiotics and kidney function,
drugs in surgery
When necessary: PET-CT or SPECT, Analgesic electrolytes examination
Other special advices
mediastinoscopy, 24-hour ambulatory Temporary Order: according to patient condition
ECG,percutaneous lung biopsy, etc. Other special advices Other special advices
Introduce the ward environment, Education, preoperative Observe changes in condition Observe patient condition Observe patient condition Observe patient condition
Nursing
facilities and equipment skin preparation Postoperative care of Care of psychological Care of psychological Care of psychological
Care
Admission nursing assessment Inform of no water and food intake psychological and life and life and life and life
Aid smoking cessation Respiratory exercises Maintain patency of airway Aid patient expectoration Aid patient expectoration Recovery instruction
Variance
No Yes, caused by: No Yes, caused by: No Yes, caused by: No Yes, caused by: No Yes, caused by: No Yes, caused by:
Record
1. 1. 1. 1. 1. 1.
2. 2. 2. 2. 2. 2.
Signature
Nurse
Signature
Physician
previous work is not yet involved in mining the core behaviors of health-care processes such as clinical pathways. Because of complex medical behaviors generated during clinical pathway execution, traditional process mining techniques have many problems and challenges when applied to clinical practice. They often generate spaghetti-like clinical pathway patterns that are incomprehensible to health-care professionals.

Our approach is different from the traditional process mining techniques, which typically document the start/end of each activity execution and therefore reflect the behavior of the implemented processes. Our approach is specific to mining clinical pathway patterns. Thus, given clinical workflow logs, this approach can discover understandable clinical pathway patterns that provide not only the sequential order of activities, but also precise information about the time span between different pairs of activities. These discovered patterns allow medical staff to realize/study which medical behaviors can be performed and the time periods during which these behaviors can be performed in clinical pathways.

3. Problem definition

The goal of mining clinical pathway patterns is to extract knowledge about target clinical pathways from medical behaviors. In order to analyze any clinical activity, and thus to discover interesting behavioral knowledge about this activity, it is necessary to collect observational data about the activity. In this study, we assume that it is possible to record medical behaviors represented as clinical events in patient-care journeys in clinical workflow logs. We also assume that the occurrence times of these clinical events are recorded in clinical workflow logs. In fact, many electronic medical record (EMR) systems record such information. In order to explain the kind of input needed for our approach, we first define the following concepts.

Definition 1 (Clinical event). Let $\mathbb{A}$ be a set of clinical activities, and $T$ the time domain. A clinical event $e$ is represented as $e = (a, t)$, where $a$ is the activity type of $e$ ($a \in \mathbb{A}$), and $t$ is the occurring time of event $e$ ($t \in T$). A clinical event is a clinical activity occurring at a particular time stamp.

For example, we let $(a, 1)$ be a particular clinical event, where $a$ is the activity type, i.e., admission, of the event, and 1 is the occurring time of the event.

In this study, we assume that clinical events are point-based events, which is the common assumption adopted by most pattern mining studies [41–43]. A point-based event is viewed as something that occurs at a certain point in time. In clinical pathways, however, events cannot always be represented as points. For instance, in patient-care journeys, medical behaviors may be represented as interval-based events, if we record when the medical behaviors are performed and how long the behaviors last. When clinical events are represented as intervals, an event can be described with three major characteristics: activity name, event starting time, and event ending time. However, an interval-based event can be represented by two point-based events. For example, as shown in Fig. 2,² an interval-based event “Post operation drain” can be represented as two point-based events, i.e., “Post operation drain begin”, and “Post operation drain end”. Thus, in this study, we simply represent each interval-based event as two corresponding point-based events, and develop an approach to discover clinical pathway patterns from point-based events.

Definition 2 (Clinical pathway trace). A clinical pathway trace is represented by $\sigma = \langle tid, \langle e_1, e_2, \ldots, e_n \rangle \rangle$, where $tid$ is the identifier of this trace and $\langle e_1, e_2, \ldots, e_n \rangle$ is a finite non-empty sequence of clinical events such that each event appears only once, and time is non-decreasing, i.e., for $1 \le i < j \le n$: $e_i \ne e_j$ and $e_i.t \le e_j.t$.

For example, as shown in Fig. 2, there are four clinical pathway traces. Each trace consists of a set of clinical events. These traces are represented as $\sigma_1$, $\sigma_2$, $\sigma_3$ and $\sigma_4$, respectively, in Table 2.

When checking if a clinical pathway trace appears in a sequence, we usually have to determine the relation between two events.

Definition 3 (Arrangement of events). In a clinical pathway trace, event $e_i$ must be placed before event $e_j$ if one of the following conditions holds:

1. $e_i.t < e_j.t$; or
2. $e_i.t = e_j.t$, and $e_i$’s activity type $e_i.a$ alphabetically precedes that of $e_j$.

Definition 4 (Clinical workflow log). Let $Trace$ be the set of all possible clinical pathway traces. A clinical workflow log $L$ is a set of traces $L \subseteq Trace$ such that each event appears at most once in the entire log, i.e., for any $\sigma_1, \sigma_2 \in L$: $\forall e_1 \in \sigma_1\ \forall e_2 \in \sigma_2$, $e_1 \ne e_2$ or $\sigma_1 = \sigma_2$.

Fig. 2 shows an example of a clinical workflow log of the bronchial lung cancer clinical pathway, which consists of four clinical pathway traces, i.e., $L = \{\sigma_1, \sigma_2, \sigma_3, \sigma_4\}$. Table 2 shows the details of the clinical event sequences of these traces.

Definition 5 (Temporal constraint). Let $\mathbb{A}$ be a set of clinical activities and let $T$ be the time domain. A clinical pathway temporal constraint is a 4-tuple $\tau = (a_1, a_2, t^-, t^+)$, denoted $(a_1, [t^-, t^+], a_2)$, where $a_1, a_2 \in \mathbb{A}$ are clinical activity types, and $t^-, t^+ \in T$ are the lower bound and the upper bound of the temporal constraint, such that $t^- \le t^+$. Two events $e_1$ and $e_2$ of a particular clinical pathway trace are said to satisfy the temporal constraint $(a_1, [t^-, t^+], a_2)$ if $e_1.a = a_1$, $e_2.a = a_2$, and $e_2.t - e_1.t \in [t^-, t^+]$.

For example, let $(a, [3, 4], g)$ be a temporal constraint. It enforces that $g$ appears between 3 and 4 time units after $a$. The clinical pathway traces $\sigma_1$, $\sigma_3$, and $\sigma_4$, as shown in Table 2, satisfy the temporal constraint $(a, [3, 4], g)$. For convenience, let $\tau.t^-$ and $\tau.t^+$ be the lower bound and upper bound of the temporal constraint $\tau$, respectively.

In addition, we define the frequency of a temporal constraint $\tau$ in a clinical workflow log $L$ as its number of occurrences in $L$ with respect to the total count of traces in $L$. A simple way of collecting the occurrences of $\tau$ in $L$ is to decide that each combination of events of each trace of $L$ that satisfies the temporal constraint $\tau$ is considered as an occurrence of $\tau$. For example, there are four occurrences of the temporal constraint $\tau = (a, [3, 9], g)$ in the clinical workflow log shown in Table 2. Thus, the frequency of $\tau$ is 100%. For the temporal constraint $\tau = (a, [3, 4], g)$, in contrast, the frequency in the clinical workflow log shown in Table 2 is 75%.

Definition 6 (Clinical activity sequence). Let $\mathbb{A}$ be a set of clinical activities. Let $A = \langle a_1, a_2, \ldots, a_k \rangle$ be an ordered clinical activity sequence, where $a_i \in \mathbb{A}$ for $1 \le i \le k$.

For example, we let $A = \langle a, g, v \rangle$ be a particular clinical activity sequence, which consists of three activities, i.e., Admission, Radical surgery, and Discharge. These three activities are performed sequentially in patients’ clinical pathway traces.
Fig. 2. A clinical workflow log example of bronchial lung cancer clinical pathway.
For example, we let $A = \langle a, g, v \rangle$ be a particular clinical activity sequence, and $\{(a, [3, 9], g), (a, [15, 23], v), (g, [12, 17], v)\}$ be a particular chronicle on $A$.

Note that a chronicle is a set of temporal constraints on a particular ordered activity sequence. It apparently suggests some sequential behavior between these activities. In particular, it satisfies the following property:

1. $A = \langle a_1, a_2, \ldots, a_k \rangle$ is an ordered clinical activity sequence; and,
2. $C$ is a chronicle on $A$ such that for all pairs $(a_i, a_j)$ of $A$ satisfying $i < j$, there exists a temporal constraint $\tau_{a_i a_j} \in C$, where $\tau_{a_i a_j}$ is denoted by $(a_i, [t^-_{a_i a_j}, t^+_{a_i a_j}], a_j)$.

$A$ is called the sequence of $\rho$, in the sense of frequent sequence pattern discovery [44]. In addition, we call $\rho = (A, C)$ a $k$-pattern, if $A = \langle a_1, a_2, \ldots, a_k \rangle$. For example, $\langle a \rangle$ is a 1-pattern, and $(\langle a, g, v \rangle, \{(a, [3, 9], g), (a, [15, 23], v), (g, [12, 17], v)\})$ is a 3-pattern.

1. $a_1 = e_{f(1)}.a$, $a_2 = e_{f(2)}.a$, $\ldots$, $a_k = e_{f(k)}.a$; and,
2. $e_{f(j)}.t - e_{f(i)}.t$ is in the temporal constraint $\tau_{a_i a_j}$, i.e., $t^-_{a_i a_j} \le e_{f(j)}.t - e_{f(i)}.t \le t^+_{a_i a_j}$ for $1 \le i \le j - 1$ and $2 \le j \le k$.

For example, we let $\rho = (\langle a, g, v \rangle, \{(a, [3, 9], g), (a, [15, 23], v), (g, [12, 17], v)\})$ be a clinical pathway pattern. Clearly, $\rho$ is supported by all four clinical pathway traces in Table 2.

Definition 10 (Sub clinical pathway pattern). A pattern $\rho' = (\langle b_1, b_2, \ldots, b_m \rangle, \{\tau_{b_l b_s} \mid 1 \le l < s \le m\})$ is a sub-pattern of another clinical pathway pattern $\rho = (\langle a_1, a_2, \ldots, a_k \rangle, \{\tau_{a_i a_j} \mid 1 \le i < j \le k\})$, if the following conditions are satisfied:

Definition 11 (Support). Let $L$ be a clinical workflow log, and $\rho$ be a clinical pathway temporal pattern. The support of $\rho$ in $L$, denoted $supp(\rho, L)$, is defined as:

Given a user-defined minimal support threshold, denoted as $minsupp$, the problem of clinical pathway pattern mining is the extraction of clinical pathway patterns $\rho$ from a clinical workflow log $L$ such that $supp(\rho, L) \ge minsupp$. Such a clinical pathway pattern is defined as being ‘frequent’.

Property 2. If a clinical pathway pattern is frequent, so are all of its sub-patterns. Accordingly, if a clinical pathway pattern is not frequent, then none of its super-patterns is frequent either.

Definition 12 (Closed clinical pathway pattern). Let $L$ be a clinical workflow log. A clinical pathway pattern $\rho = (\langle a_1, a_2, \ldots, a_k \rangle, \{\tau_{a_i a_j} \mid 1 \le i < j \le k\})$ is a closed pattern if it satisfies the following conditions:
4.1. Mining sequences of closed clinical pathway patterns

In this study, we propose a closed clinical pathway pattern's sequence mining algorithm, SCP-Miner, based on the ideas of classical sequence pattern mining algorithms. Before introducing the proposed SCP-Miner algorithm, the definitions of prefix, projection, and projected clinical workflow log are given as follows.

Definition 13 (Prefix). Let σ = ⟨e_1, e_2, ..., e_n⟩ be a clinical pathway trace, and A = ⟨a_1, a_2, ..., a_k⟩ be a clinical activity sequence. We say that A is a prefix of σ if and only if a_i = e_i.a for 1 ≤ i ≤ k ≤ n.

For example, a clinical activity sequence A = ⟨a, b, c⟩ is a prefix of the clinical pathway trace σ_1, as shown in Table 2.

Definition 14 (Projection). Let σ = ⟨e_1, e_2, ..., e_n⟩ be a clinical pathway trace, and A = ⟨a_1, a_2, ..., a_k⟩ be a clinical activity sequence. A sub clinical pathway trace β = ⟨b_1, b_2, ..., b_m⟩ is a projection of σ with respect to A if and only if:

1. There exists a strictly increasing function f on the indexes of σ satisfying e_f(1).a = a_1, e_f(2).a = a_2, ..., e_f(k).a = a_k where e_f(1), e_f(2), ..., e_f(k) ∈ σ and a_1, a_2, ..., a_k ∈ A;
2. A is a prefix of β; and,
3. the last m − k elements of β are the same as the last m − k elements of σ.

For example, if the clinical trace σ_1, as shown in Table 2, is projected by a sequence A = ⟨a, g⟩, a projection is obtained, i.e., ⟨(a, 1), (g, 4), (h, 4), (i, 4), (s, 7), (r, 8), (o, 13), (t, 15), (v, 16)⟩.

Definition 15 (Projected clinical workflow log). The projected clinical workflow log with respect to a clinical activity sequence A contains all the projections of A in the clinical workflow log L.

Algorithm 1 (The clinical activity sequence mining algorithm).
1: Procedure::SCP-Miner(L, minsupp)
2: Input:
3:   L is a clinical workflow log
4:   minsupp is a minimum support threshold value
5: Output:
6:   SCP is the set of sequences of closed clinical pathway patterns
7: Steps:
8:   Let SCP = ∅ be a set of sequences of closed clinical pathway patterns
9:   Let a be the clinical activity admission and L|a be a projected clinical workflow log of a
10:  Call SCPMiner(a, L|a, minsupp, SCP)
11:  Output SCP
12: End Procedure
13: Procedure::SCPMiner(A, L|A, minsupp, SCP)
14: Input:
15:   A is a temporal pattern
16:   L|A: a projected workflow log of A
17:   minsupp is a minimum support threshold value
18:   SCP is the set of sequences of closed clinical pathway patterns
19: Output:
20:   SCP is the set of sequences of closed clinical pathway patterns
21: Steps:
22:   Scan L|A and find all frequent clinical activities X_{k+1}
23:   If (A passes the forward checking) then
24:     If (A is closed with respect to SCP) then
25:       SCP = SCP ∪ {A}
26:     End If
27:   End If
28:   For each a_{k+1} in X_{k+1}
29:     Append a_{k+1} to A as A′
30:     Let L|A′ be the projected clinical workflow log of A′
31:     Check whether A′ is contained by any sibling pattern and both share the same projections
32:     If not, call SCPMiner(A′, L|A′, minsupp, SCP)
33:   End For
34:   return SCP
35: End Procedure
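Definitions 13 to 15, together with the scan for frequent activities in Line 22 of Algorithm 1, can be illustrated with a minimal Python sketch. The toy log, the activity names, the function names, and the choice of the leftmost embedding when a sequence matches a trace in several ways are assumptions made only for this illustration; they are not taken from Table 2 or from the authors' C# implementation. Support is computed here relative to the size of the full log.

```python
from collections import Counter
from typing import List, Optional, Tuple

Event = Tuple[str, int]   # (activity type, day of occurrence); hypothetical encoding
Trace = List[Event]


def is_prefix(seq: List[str], trace: Trace) -> bool:
    """Definition 13: seq equals the activity types of the first len(seq) events."""
    return len(seq) <= len(trace) and all(a == e[0] for a, e in zip(seq, trace))


def project(trace: Trace, seq: List[str]) -> Optional[Trace]:
    """Definition 14: events matched by seq (leftmost embedding, an assumption)
    followed by the untouched suffix of the trace; None if seq does not occur."""
    matched: Trace = []
    pos = 0
    for activity in seq:
        while pos < len(trace) and trace[pos][0] != activity:
            pos += 1
        if pos == len(trace):
            return None
        matched.append(trace[pos])
        pos += 1
    return matched + trace[pos:]


def projected_log(log: List[Trace], seq: List[str]) -> List[Trace]:
    """Definition 15: all projections of seq in the log."""
    return [p for p in (project(t, seq) for t in log) if p is not None]


def frequent_extensions(log: List[Trace], seq: List[str], minsupp: float) -> List[str]:
    """Activities that can be appended to seq while keeping support >= minsupp."""
    counts: Counter = Counter()
    for proj in projected_log(log, seq):
        # Count each activity once per trace, looking only at the suffix after seq.
        counts.update({a for a, _ in proj[len(seq):]})
    return sorted(a for a, c in counts.items() if c / len(log) >= minsupp)


# Hypothetical toy log (not the log of Table 2).
toy_log: List[Trace] = [
    [("admission", 1), ("xray", 2), ("surgery", 4), ("discharge", 9)],
    [("admission", 1), ("surgery", 3), ("xray", 5), ("discharge", 8)],
    [("admission", 1), ("xray", 1), ("discharge", 4)],
]

print(project(toy_log[0], ["admission", "surgery"]))
print(frequent_extensions(toy_log, ["admission", "surgery"], 0.5))
```

In this toy log, growing ⟨admission, surgery⟩ leaves discharge as the only frequent extension, because the third trace contains no surgery and xray follows surgery in only one of three traces.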
When we generate a clinical activity sequence, we need to do some closure checking to determine whether or not the generated sequence is closed. Note that clinical pathways may have been specified through starting/ending activity types, which can be used in closure checking. For example, the starting/ending activity types of clinical pathways published by the Ministry of Health of China are admission and discharge. Thus, the sequence can efficiently grow from a particular frequent 1-pattern, i.e., admission. This feature of clinical pathway patterns allows us to use a forward checking instead of bi-directional checking to determine if the generated sequence is closed.

Definition 16 (Forward checking). A clinical pathway pattern is not closed if the last activity of its sequence, A, is not discharge.3

3 Note that for different clinical pathways, there may be different starting/ending activity types. Thus, the activity type used in forward checking may be different. As a matter of fact, it could be more general to allow the user to specify a set of starting/ending activity types according to different clinical pathways.

The proposed SCP-Miner algorithm, outlined in Algorithm 1, consists of two phases. First, we scan a clinical workflow log L from the pre-known frequent 1-activity sequence, i.e., admission, and then build a projected clinical workflow log L|admission. Then, we recursively use a frequent k-activity sequence and its projected clinical workflow log to generate its frequent super-patterns at the next level in the frequent sequence tree, where k ≥ 1. For each frequent k-activity sequence, we build its projected clinical workflow log and find all frequent 1-activity sequences in the projected clinical workflow log. During this phase, we use a forward checking to determine if the frequent sequences generated are closed. If A is closed and the support of A is not less than minsupp, A is added into SCP.

Let us take the clinical workflow log, as shown in Table 2, as an example. We assume that minsupp = 0.5. First, we scan the projected clinical workflow log L|a from the activity a, i.e., admission. Next, we grow the frequent 1-activity sequence ⟨a⟩ to find its frequent super-patterns in the projected clinical workflow log. For example, if b is a frequent 1-activity pattern in a's projected clinical workflow log, then we can grow the frequent 1-activity sequence ⟨a⟩ by appending b to it, and thus obtain a frequent 2-activity sequence ⟨a, b⟩. Obviously, ⟨a, b⟩ is not closed. Therefore, we continue to grow the sequence by appending a frequent 1-activity sequence in its projected clinical workflow log. The work is recursively performed until we get the final sequences as follows: A_1 = ⟨a, b, c, d, e, g, h, i, j, s, r, k, l, t, v⟩, and A_2 = ⟨a, b, c, d, j, q, g, h, i, s, r, o, t, v⟩.

4.2. Mining chronicles on generated clinical activity sequences

Based on the set of clinical activity sequences generated by the algorithm SCP-Miner, we can mine chronicles on each sequence to generate closed clinical pathway patterns. Before we present our method of mining chronicles on each clinical activity sequence, we introduce the following concepts.

Definition 17 (Stricter chronicle). A chronicle C_A is stricter than another chronicle C′_A, denoted C_A ≺ C′_A, if ∀a_i, a_j ∈ A, [t−_{a_i a_j}, t+_{a_i a_j}] ⊂ [t′−_{a_i a_j}, t′+_{a_i a_j}].

For example, we let C_{a,g,v} = {(a, [3, 4], g), (a, [15, 23], v), (g, [12, 17], v)} and C′_{a,g,v} = {(a, [3, 9], g), (a, [15, 23], v), (g, [12, 17], v)} be two chronicles; then C_{a,g,v} ≺ C′_{a,g,v} since [3, 4] ⊂ [3, 9]. The relation ≺ is a partial order over chronicles.

Definition 18 (Chronicle relation). Given a particular clinical activity sequence A, we say a chronicle C_A "is child of" another chronicle
Z. Huang et al. / Artificial Intelligence in Medicine 56 (2012) 35–50 43
Fig. 4. A set of derived chronicles given the particular clinical activity sequence, A = {a, g, v}, and the particular clinical workflow log shown in Table 2.
C′_A if and only if C_A ≺ C′_A and there is no other chronicle C″_A satisfying the condition such that C_A ≺ C″_A ≺ C′_A.

Note that given a particular clinical workflow log L and a clinical activity sequence A, we can derive a set of chronicles from L. For example, as shown in Fig. 4, a set of chronicles, named G|{a,g,v}, is generated on a particular clinical activity sequence A = {a, g, v} given the clinical workflow log shown in Table 2. There is an arrow from the chronicle C_1 = {(a, [3, 9], g), (a, [15, 23], v), (g, [12, 17], v)} to the chronicle C_2 = {(a, [3, 4], g), (a, [15, 23], v), (g, [12, 17], v)} because C_1 is the parent of C_2, and an arrow from the chronicle C_2 to the chronicle C_8 = {(a, [3, 4], g), (a, [15, 20], v), (g, [12, 17], v)} because C_2 is the parent of C_8. However, there is no arrow from the chronicle C_1 to the chronicle C_8 because C_1 is not the parent of C_8. Note that these chronicles are organized in an acyclic directed graph, where nodes are chronicles and arrows represent "is child of" relations (respectively, "is parent of" relations).

Property 3. Let A be a particular clinical activity sequence. Given a particular clinical workflow log L, there is one and only one top chronicle derived from L, denoted as C_A^TOP, satisfying that there is no other derived chronicle C_A such that C_A^TOP ≺ C_A.

For example, we deduce from Fig. 4 that C_1 is the top chronicle with respect to the given clinical workflow log shown in Table 2. In order to derive the top chronicles from particular clinical workflow logs, we propose an algorithm, i.e., TC-Miner, as shown in Algorithm 2.

Algorithm 2 (The top chronicle generating algorithm).
1: Procedure::TC-Miner(A, L)
2: Input:
3:   A is a particular clinical activities sequence
4:   L is a clinical workflow log
5: Output:
6:   C_A^TOP is the top chronicle of A with respect to L
7: Steps:
8:   Let C_A^TOP = ∅
9:   For each activity pair (a_i, a_j) in A, do
10:    O(a_i, a_j) ← {(a_i, t_i)(a_j, t_j) | (a_i, t_i) ∈ σ ∧ (a_j, t_j) ∈ σ ∧ σ ∈ L}
11:    Ω(a_i, a_j) ← sort({(t_j − t_i) | (a_i, t_i)(a_j, t_j) ∈ O(a_i, a_j)})
12:    Let τ(a_i, a_j) = (a_i, [Ω(a_i, a_j)[0], Ω(a_i, a_j)[|Ω(a_i, a_j)| − 1]], a_j)
13:    C_A^TOP ← C_A^TOP ∪ {τ(a_i, a_j)}
14:  End For
15:  Return C_A^TOP
16: End Procedure

In the algorithm TC-Miner, for each activity pair (a_i, a_j) of clinical activity sequence A, the occurrences and the set of occurrence distances are calculated (Line 10 and Line 11) based on the input clinical workflow log L. Following this, the minimum and maximum occurrence distances of (a_i, a_j) are picked up to generate a particular temporal constraint τ on (a_i, a_j) (Line 12). All other temporal constraints on (a_i, a_j) are stricter than τ. Note that the frequency of τ is 100% with respect to L. At last, all possible temporal constraints are grouped together to generate a particular top chronicle.

Property 4. Let A be a particular clinical activity sequence. Given a particular clinical workflow log L, the frequency of the derived top chronicle C_A^TOP is 100%.

Note that since the frequency of each temporal constraint that is contained in the top chronicle is 100%, the frequency of the top chronicle is also 100%. For example, as shown in Fig. 4, the frequency of the top chronicle C_1 derived from the clinical workflow log depicted in Table 2 is 100%.

The top chronicle is the "seed" to mine stricter chronicles on particular clinical activity sequences with respect to a particular clinical workflow log. In this study, an algorithm, CCP-Miner, is presented to mine chronicles on clinical activity sequences so that the closed clinical pathway patterns can be generated. The algorithm CCP-Miner, outlined in Algorithm 3, implements chronicle
discovery on particular clinical activity sequences generated by the algorithm SCP-Miner. As shown in Line 9 of the algorithm, Γ stores a set of possible frequent chronicles for a particular sequence A given a particular clinical workflow log L. Each element in Γ would be combined with A to generate a closed clinical pathway pattern. Ψ is the set of candidates of chronicles. Initially, there is a top chronicle C^TOP in Ψ (Line 11). Then, CCP-Miner works as follows: it takes one candidate chronicle C from Ψ, calculates the support of (A, C), and adds its children to Ψ if (A, C) is frequent. The algorithm ends when Ψ is empty (Line 27).

In each iteration, a chronicle C in Ψ is chosen and removed from Ψ. Then, the support of (A, C) is calculated. If (A, C) (C ∈ Ψ) has a support of at least minsupp, C is added into Γ and Γ is updated, which removes from Γ any chronicle C′ such that C ≺ C′. Then, the procedure GetChildren generates all children of C.

Note that in the procedure GetChildren, each temporal constraint contained in the particular chronicle C gets stricter in order to generate a set of possible children of C (Lines 42–48). In detail, if a particular temporal constraint τ contained in C has stricter temporal constraints Θ learned by the procedure Strict, each element τ′ in Θ will take the place of τ in C to generate a new child chronicle C′ of C (Line 45). The generated child C′ is added to Ψ if C′ has never been added to Ψ before (Lines 22–26). By checking that C′ has never been added into Ψ, we ensure that the iteration will never process the same chronicle twice. This ensures that CCP-Miner will always terminate.

Algorithm 3 (Chronicles mining algorithm).
1: Procedure::CCP-Miner(A, L, minsupp)
2: Input:
3:   A is a sequence of a particular closed clinical pathway pattern
4:   L is a clinical workflow log
5:   minsupp is a minimum support threshold value
6: Output:
7:   Φ is a set of closed clinical pathway patterns
8: Steps:
9:   Let Γ = ∅
10:  Let C^TOP = TC-Miner(A, L) be the top chronicle with respect to L
11:  Let Ψ = {C^TOP}
12:  Repeat
13:    Let C be the first element of Ψ
14:    Ψ ← Ψ − {C}
15:    Let ρ = (A, C)
16:    If Supp(ρ, L) ≥ minsupp then
17:      Update(Γ, C)
18:    Else
19:      Go to Line 12
20:    End If
21:    Let Children = GetChildren(L, C)
22:    For each C′ ∈ Children do
23:      If C′ has never been added into Ψ before
24:        Ψ ← Ψ ∪ {C′}
25:      End If
26:    End For
27:  Until Ψ = ∅
28:  For each C in Γ
29:    Φ ← Φ ∪ {(A, C)}
30:  End For
31:  return Φ
32: End Procedure
33: Procedure::GetChildren(L, C)
34: Input:
35:   L is a clinical workflow log
36:   C is a chronicle with respect to L
37: Output:
38:   Children is a set of chronicles whose parent is C
39: Steps:
40:  Let Children = ∅
41:  Let T be the set of temporal constraints contained in C
42:  For each τ ∈ T
43:    Let Θ = Strict(L, τ, minsupp) be a set of stricter temporal constraints given L and τ
44:    For each τ′ in Θ
45:      Let C′ = C − {τ} + {τ′}
46:      Children ← Children ∪ {C′}
47:    End For
48:  End For
49:  return Children
50: End Procedure
51: Procedure::Strict(L, τ, minsupp)
52: Input:
53:   L is a clinical workflow log
54:   τ is a temporal constraint with respect to L
55:   minsupp is a minimum support threshold value
56: Output:
57:   Θ is a set of stricter temporal constraints given L and τ
58: Steps:
59:  Let Θ = ∅
60:  Let (a_i, a_j) be the activity pair of τ
61:  O(a_i, a_j) ← {(a_i, t_i)(a_j, t_j) | τ.t− ≤ t_i ≤ t_j ≤ τ.t+ ∧ (a_i, t_i) ∈ σ ∧ (a_j, t_j) ∈ σ ∧ σ ∈ L}
62:  Ω(a_i, a_j) ← sort({(t_j − t_i) | (a_i, t_i)(a_j, t_j) ∈ O(a_i, a_j)})
63:  Let k = |Ω(a_i, a_j)| − 1
64:  If k/|L| ≥ minsupp
65:    Θ ← {(a_i, [Ω(a_i, a_j)[l], Ω(a_i, a_j)[l + k − 1]], a_j) | 0 ≤ l ≤ |Ω(a_i, a_j)| − k + 1 ∧ Ω(a_i, a_j)[l] ≠ Ω(a_i, a_j)[l − 1] ∧ Ω(a_i, a_j)[l + k − 1] ≠ Ω(a_i, a_j)[l + k]}
66:  End If
67:  Return Θ
68: End Procedure

The procedure Strict is used to tighten a particular temporal constraint τ to generate a set of stricter temporal constraints Θ such that each one in Θ is frequent with respect to L. At first, we build a complete set of all occurrences of the temporal constraint τ with respect to the particular clinical workflow log L, denoted O(a_i, a_j) (Line 61), where (a_i, a_j) is the activity pair of τ. Formally, O(a_i, a_j) = {(a_i, t_i)(a_j, t_j) | (a_i, t_i) ∈ σ ∧ (a_j, t_j) ∈ σ ∧ t_i ≤ t_j ∧ σ ∈ L}. We then build and sort the set of occurrence distances, denoted Ω(a_i, a_j) (Line 62). Formally, Ω(a_i, a_j) = {t_j − t_i | (a_i, t_i)(a_j, t_j) ∈ O(a_i, a_j)}. Taking clinical workflow log L in Table 2 as an example, O(a, g) = {(a, 1)(g, 4), (a, 1)(g, 9), (a, 1)(g, 4), (a, 1)(g, 3)}, and Ω(a, g) = {3, 9, 3, 4}, i.e., Ω(a, g) = {3, 3, 4, 9}. Furthermore, for the activity pair (a_i, a_j), we build a set of candidate temporal constraints by applying a minimum support threshold. In particular, we adopt Cram's approach [21] to slide a window of width k = |Ω(a_i, a_j)| − 1 from the first occurrence of an element in Ω(a_i, a_j) to the last occurrence of an element in Ω(a_i, a_j) (Line 65) to generate a set of stricter temporal constraints w.r.t. τ, provided that the frequency of the generated temporal constraints is not less than the minimum support threshold, i.e., (|Ω(a_i, a_j)| − 1)/|L| ≥ minsupp (Line 64). For example, for the pair (a, g) with minsupp = 0.5, the window width k is |Ω(a, g)| − 1 = 3, after which we slide a window of width 3 over Ω(a, g) = {3, 3, 4, 9}: (a, [3, 4], g). As a result, a stricter temporal constraint τ′ = (a, [3, 4], g) is generated. Note that the window is from the first occurrence of an element in Ω(a_i, a_j) to the last occurrence of an element in Ω(a_i, a_j), i.e., Ω(a_i, a_j)[l] ≠ Ω(a_i, a_j)[l − 1] and Ω(a_i, a_j)[l + k − 1] ≠ Ω(a_i, a_j)[l + k], since some occurrences in Ω(a_i, a_j) may be identical with each other.

Note that the derived chronicles may not be frequent for a particular clinical activity sequence given a particular clinical workflow log L, even though each temporal constraint of those chronicles may be frequent. For example, as shown in Fig. 4, chronicle C_12 is not frequent on the clinical workflow log shown in Table 2, although the temporal constraints in C_12 are frequent. To this end, we check the frequency of both temporal constraints and chronicles with respect to a particular workflow log, respectively (Line 16, and Line 64).

The time complexity of the algorithm CCP-Miner has a strong dependence on the generated chronicle number and temporal constraint number. If there are many chronicles and each chronicle has many temporal constraints, the overall complexity of the proposed approach will grow exponentially. Suppose that there is a k-pattern, such that there are C_k^2/2 pairs of clinical activities.
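As a side illustration of the window-based tightening performed by the procedure Strict (Lines 62 to 65 of Algorithm 3) and of the bounds chosen by TC-Miner, the following minimal Python sketch reproduces the (a, g) example given above. The function names and the tuple encoding of a temporal constraint are hypothetical, the log size |L| = 4 is an assumption inferred from the four listed occurrences of (a, g), and the index bound of Line 65 is interpreted so that every window stays inside Ω; this is a sketch, not the authors' implementation.

```python
from typing import List, Tuple

# (a_i, (t_minus, t_plus), a_j); a hypothetical encoding of a temporal constraint
Constraint = Tuple[str, Tuple[int, int], str]


def top_constraint(distances: List[int], pair: Tuple[str, str]) -> Constraint:
    """Loosest constraint on a pair: smallest and largest occurrence distance."""
    omega = sorted(distances)
    return (pair[0], (omega[0], omega[-1]), pair[1])


def strict(distances: List[int], pair: Tuple[str, str],
           log_size: int, minsupp: float) -> List[Constraint]:
    """Slide a window covering k = |omega| - 1 distances over the sorted distances."""
    omega = sorted(distances)
    k = len(omega) - 1
    if k < 1 or k / log_size < minsupp:      # frequency test of Line 64
        return []
    result: List[Constraint] = []
    for l in range(len(omega) - k + 1):      # every window position that fits
        lo, hi = omega[l], omega[l + k - 1]
        # A window must start at the first copy of its lower bound and
        # end at the last copy of its upper bound (duplicates in omega).
        starts_new_value = l == 0 or omega[l - 1] != lo
        ends_value_run = l + k == len(omega) or omega[l + k] != hi
        if starts_new_value and ends_value_run:
            result.append((pair[0], (lo, hi), pair[1]))
    return result


# Occurrence distances of (a, g) as stated in the text: {3, 9, 3, 4}.
print(top_constraint([3, 9, 3, 4], ("a", "g")))
print(strict([3, 9, 3, 4], ("a", "g"), log_size=4, minsupp=0.5))
```

With these inputs the sketch yields the top constraint (a, [3, 9], g) and the single stricter constraint (a, [3, 4], g), in agreement with the example in the text; the second window position is rejected because it would start in the middle of the run of identical distances 3.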
This implies that C_k^2/2 temporal constraints in each chronicle can be built on it. For each pair of clinical activities, we assume the number of occurrences in a clinical workflow log is x. Then, at most, there are altogether x^(C_k^2/2) = O(x^(k^2)) different chronicles that can be built on it. Since this new factor x^(k^2) is very high, we were forced to find strategies that limit its impact on the discovery time. This included allowing the user to define some milestone clinical activities in advance so that only the pairs between these milestone activities and other interesting activities are considered in building the chronicle graph, and then mining chronicle information on the activities of interest. For example, we assume that clinical activities admission, surgery and discharge are three milestone activities, and users may only be interested in mining the temporal constraints between admission, surgery, discharge and other clinical activities, respectively, regardless of the temporal constraints among other clinical activities except for admission, surgery, and discharge. Thus, based on the proposed algorithm TC-Miner, we present a top chronicle generating algorithm based on milestone activities, i.e., MATC-Miner, outlined in Algorithm 4.

Algorithm 4 (The top chronicle generating algorithm based on milestone activities).
1: Procedure::MATC-Miner(A, B, L)
2: Input:
3:   A is a particular clinical activities sequence
4:   B is the set of milestone clinical activities that users select
5:   L is a clinical workflow log
6: Output:
7:   C_A^TOP is the top chronicle with the limitation of B
8: Steps:
9:   Let C_A^TOP = ∅
10:  For each (a_i, a_j) ⊂ A where a_i ∈ B or a_j ∈ B do
11:    O(a_i, a_j) ← {(a_i, t_i)(a_j, t_j) | (a_i, t_i) ∈ σ ∧ (a_j, t_j) ∈ σ ∧ σ ∈ L}
12:    Ω(a_i, a_j) ← sort({(t_j − t_i) | (a_i, t_i)(a_j, t_j) ∈ O(a_i, a_j)})
13:    Let τ(a_i, a_j) = (a_i, [Ω(a_i, a_j)[0], Ω(a_i, a_j)[|Ω(a_i, a_j)| − 1]], a_j)
14:    C_A^TOP ← C_A^TOP ∪ {τ(a_i, a_j)}
15:  End For
16:  Return C_A^TOP
17: End Procedure

Combining algorithms SCP-Miner, MATC-Miner, and CCP-Miner, a clinical pathway pattern mining approach is presented. Note that users can run the algorithms several times until they are satisfied by the results. It makes the platform more proactive in pattern elaboration, and thus less tedious for the human analyst. This is the reason why milestone activities are selected by users in advance. The advantage of this strategy is that the analyst can modify and refine his mining request and run the mining process again with the new request, and continue on iteratively. Another advantage is that the mining process can be practical and thus, users can search clinical pathway traces for very complex pattern structures.

5. Experiment

In this section, we compare the proposed approach with sequential pattern mining algorithms on multiple clinical workflow logs, discuss our empirical evaluation, and illustrate how our approach can contribute to clinical pathway redesign.

5.1. Comparison with sequential pattern mining algorithms

The proposed approach consists of two steps: frequent closed clinical activity sequence mining and frequent chronicle mining on discovered clinical activity sequences, provided particular clinical workflow logs. In the experiments, we first evaluated the performance of the proposed algorithm SCP-Miner in mining frequent closed clinical activity sequences. To the best of our knowledge, two efficient frequent closed sequence pattern mining algorithms have been proposed, i.e., CloSpan [45], and BIDE [46]. The algorithm CloSpan follows a candidate maintenance-and-test paradigm over the set of already mined closed sequence candidates. Using CloSpan for mining long sequences or for mining with very low support thresholds tends to be prohibitively expensive [46]. The algorithm BIDE adopts a closure checking scheme, called BI-Directional Extension, which mines closed sequences without candidate maintenance. Performance studies [46] have shown that BIDE is more efficient than CloSpan.

However, it may not be efficient to adopt BIDE in mining frequent closed clinical activity sequences directly. As we have mentioned above, clinical pathways have their specific characteristics (e.g., specific starting/ending activity types, etc.). It is, therefore, necessary to design a specific closure checking scheme instead of BI-Directional Extension of BIDE in mining clinical activity sequences. To this end, we present a specific clinical activity sequence mining algorithm, i.e., SCP-Miner, in Section 4.1. In this study, we compare the performance of the proposed algorithm SCP-Miner with the algorithm BIDE [46] using a set of real-life clinical workflow logs of clinical pathways of six diseases recorded by the EMR system in Zhejiang Huzhou Central Hospital of China. The system was brought on-line in August 2007, and the collected data are from 2007/08 to 2009/09. The details of the experimental data set are shown in Table 3. All experiments were performed on a Lenovo compatible PC with an Intel Pentium IV 2.8 GHz CPU and 4 GB of main memory, running Microsoft Windows 7. The algorithms were implemented using Microsoft C#. All run-times in the figures are in seconds.

Table 3
Six diseases' clinical workflow logs used in the experiments.

Diseases                # of clinical pathway traces   # of clinical events   # of clinical activities
Bronchial lung cancer   48                             3405                   225
Gastric cancer          100                            8024                   274
Cerebral hemorrhage     262                            27,949                 520
Breast cancer           157                            4539                   46
Infarction              445                            23,106                 513
Colon cancer            52                             4840                   292

We must mention that algorithm BIDE can mine frequent closed clinical activity sequences without chronicle information on the sequences. However, it is possible to compare the proposed algorithm SCP-Miner with BIDE, and algorithms SCP-Miner + MATC-Miner + CCP-Miner with BIDE, respectively, in order to investigate the performance of the proposed approach. In particular, for SCP-Miner + MATC-Miner + CCP-Miner, we let admission, surgery and discharge be the milestone activities in order to generate the top chronicle. This means that we consider the temporal constraints between admission, surgery and discharge and other activities in the sequences of closed clinical pathway patterns, regardless of the temporal constraints between those activities except for admission, surgery and discharge.

In addition, we note that for algorithms SCP-Miner and BIDE, we applied the pseudo projection technique in order to save both time and memory space. The main idea of the pseudo projection technique is that instead of generating numerous physical projections in main memory, one can register the index of the projected position with its sequence identifier in the sequence [41,47,44]. Through the indexes, it can easily divide the searching space and then retrieve all the necessary information for finding frequent sequential patterns [41,47,44].

In order to illustrate the proposed approach practically, we compared the proposed SCP-Miner, SCP-Miner + MATC-Miner + CCP-Miner, and BIDE from the perspectives of discovered pattern numbers, run-times, and scalability, respectively.

Fig. 5 summarizes the number of frequent patterns and run-times of the bronchial lung cancer clinical workflow log, which consists
of 48 traces. As shown in Fig. 5(A), the algorithm SCP-Miner can mine sequences of closed clinical pathway patterns without chronicle information on the generated patterns. It outperforms algorithm BIDE in terms of run-times of mining for closed clinical pathway patterns of bronchial lung cancer. In addition, we can see that by applying SCP-Miner + MATC-Miner + CCP-Miner, the run-time performance approaches that of algorithm BIDE and SCP-Miner as the minimum support threshold increases. As indicated in [41], the most important factor influencing run-times of mining frequent patterns is not whether the algorithms or patterns are complicated or not, but whether the mining generates a large set of patterns, resulting in a longer processing time for these patterns.

Fig. 5(B)4 shows the experimental results on the discovered number of patterns in comparison with SCP-Miner and SCP-Miner + MATC-Miner + CCP-Miner. As shown in Fig. 5(B), when the minimum support increases, the number of patterns discovered by SCP-Miner + MATC-Miner + CCP-Miner decreases, and is almost equal to the number of sequences discovered by the algorithm SCP-Miner. This results in reducing the processing time. The experimental results confirm this conclusion and reveal the advantages of the proposed approach in mining closed clinical pathway patterns.

4 We must mention that the algorithm BIDE discovers the same number of patterns as the algorithm SCP-Miner, since SCP-Miner is designed based on the principle of BIDE except for using the forward closure checking scheme instead of BI-Directional Extension. Thus, we have not presented the mined results of BIDE in the discovered number of patterns.

Similar to the mining results of the bronchial lung cancer clinical workflow log, the experimental results on the other five diseases, as shown in Figs. 6–10, indicate the feasibility of the proposed approach. In comparison to the relative efficiency of SCP-Miner and BIDE, SCP-Miner always outperforms BIDE. As shown in part (A) of these figures, the run-time of the algorithm SCP-Miner increases very slowly as the minimum support decreases. Furthermore, the run-time of SCP-Miner + MATC-Miner + CCP-Miner approaches that of BIDE and SCP-Miner as the minimum support threshold increases. As indicated in part (B) of these figures, as the minimum support increases, SCP-Miner generates quite a small set of patterns, even at the low minimum support threshold. As well, the number of patterns discovered by SCP-Miner + MATC-Miner + CCP-Miner is almost equal to that of SCP-Miner, especially as the minimum support increases. The experimental results indicate that the proposed approach is suitable to mine closed clinical pathway patterns.

Next, we study how the proposed approach performs with the increasing size of a clinical workflow log. Fig. 11 shows how SCP-Miner, SCP-Miner + MATC-Miner + CCP-Miner, and BIDE scale up as the number of input clinical pathway traces is increased from 100 to 445. We note that all the experiments were performed on the infarction clinical workflow log with the same minimum support threshold of 0.25%. The execution times are normalized with respect to the time for the 100 input traces. It can be observed that both SCP-Miner and SCP-Miner + MATC-Miner + CCP-Miner have a linear scalability in terms of the run-times against the increasing number of traces.

5.2. Discussion

We have implemented and tested the proposed approach using Microsoft C#. Fig. 12 depicts a screen-shot of mining results of the breast cancer process. At the top of Fig. 12, clinical activities that belong to one closed clinical pathway pattern are listed sequentially along the time-line of patient LOS. In addition, temporal constraints between the milestone activities (i.e., admission, surgery, discharge) and other interesting activities are shown on the bottom of Fig. 13. We note that users can either select all clinical activities of one pathway pattern in order to display the temporal relations, or select several interesting activities and display their temporal relations with milestone activities on the figure. The discovered clinical process patterns have been evaluated by the medical staff at the Zhejiang Huzhou Central Hospital of China, who understand the beneficial effects of the clinical process mining of medical behaviors. They also fully understand the mining results of our approach. They indicate that the mining results of our approach: (1) allow clinical activities to be clearly spread along the time-line of patient LOS; (2) allow for certain temporal relationships to explicitly exist between the activities; and (3) let a clinical process pattern enumerate regular medical behaviors that are expected to occur in patient-care journeys, which serve as checkpoints for the performance of the patient-care journey. We would like to mention that physicians at the Zhejiang Huzhou Central Hospital of China are satisfied with the mined results. The evaluations received from medical staff indicate that the proposed approach has the ability to find a clear characterization of possible clinical pathway patterns for particular diseases.

Finally, we use a simple example to illustrate how the discovered patterns can contribute to clinical pathway redesign. Fig. 13(A) shows a fragment of the bronchial lung cancer clinical pathway recommended by the Chinese Ministry of Health. Fig. 13(B) shows a fragment of the bronchial lung cancer pathway pattern defined by physicians at the Zhejiang Huzhou Central Hospital of China. As well, Fig. 13(C) highlights a fragment of a discovered pattern from the collected logs. These three pattern fragments consist of three milestone activities, i.e., admission, surgery and discharge, and the temporal constraints among the three activities. We can see that the temporal constraints in the three pattern fragments reveal the different time spans between any two clinical activities. For example, in the recommended clinical pathway, activity surgery is assumed to be performed after 4–7 days of admission, and activity discharge is assumed to be performed after 8–14 days of surgery; in the physicians' defined pattern, activity surgery is assumed to be performed after 4 days of admission, and activity discharge is assumed to be performed after 8 days of surgery, while in actual patient-care journeys, the activity surgery occurs between 3 and 4 days after admission, and clinical activity discharge occurs
determine a threshold value to cut off exceptional or incorrectly logged behavior.

6. Conclusion and future work

Clinical pathways are standardized patient-care processes. Hospital managers have presented their requirements to use tools to analyze medical behaviors in patient-care processes so as to continuously (re)design and optimize clinical pathways [4]. The approach proposed in this study can be viewed as a technology that contributes to this purpose. Our goal is to extract explicit clinical pathway patterns from medical behaviors, which are recorded in clinical workflow logs. Thus, the challenge is to create clinical pathway patterns given a log, such that discovered patterns are consistent with the observed dynamic behavior. The experimental results indicate that the proposed approach provides the ability to discover clinical pathway patterns that cover the most frequent medical behaviors which are regularly encountered in clinical practice.

As mentioned above, discovered clinical pathway patterns have been evaluated by clinical experts and hospital managers from the Zhejiang Huzhou Central Hospital of China. These individuals indicate that the proposed approach can provide a consistent characterization of all possible clinical pathway patterns for particular diseases. Notably, the full development of a clinical pathway modeling and analysis tool is nearing completion, and the tool will be employed in the EMR system in the hospital.

questions need to be answered before discovered clinical pathway patterns can play an essential role in clinical applications.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 81101126. The authors are especially thankful for the positive support received from the Zhejiang Huzhou Central Hospital of China as well as to all medical staff involved. In addition, the authors would like to thank the anonymous reviewers for their constructive comments on an earlier draft of this paper.

References

[1] Wakamiya S, Yamauchi K. What are the standard functions of electronic clinical pathways? International Journal of Medical Informatics 2009;78(8):543–50.
[2] Lenz R, Reichert M. IT support for healthcare processes-premises, challenges, perspectives. Data & Knowledge Engineering 2007;61(1):39–58.
[3] Lenz R, Blaser R, Beyer M, Heger O, Biber C, Bäumlein M, et al. IT support for clinical pathways-lessons learned. International Journal of Medical Informatics 2007;76(3):S397–402.
[4] Schuld J, Schäfer T, Nickel S, Jacob P, Schilling MK, Richter S. Impact of IT-supported clinical pathways on medical staff satisfaction. a prospective longitudinal cohort study. International Journal of Medical Informatics 2011;80(3):151–6.
[5] Quaglini S, Stefanelli M, Lanzola G, Caporusso V, Panzarasa S. Flexible guideline-based patient careflow systems. Artificial Intelligence in Medicine 2001;22(1):65–80.
[6] Hunter B, Segrott J. Re-mapping client journeys and professional identities: a review of the literature on clinical pathways. International Journal of Nursing Studies 2008;45:608–25.
Given the relevance of the analysis of medical behaviors and the [7] Weiland DE. Why use clinical pathways rather than practice guidelines? Amer-
problems that experts have in making good clinical pathway mod- ican Journal of Surgery 1997;174:592–5.
els, we will continue to work on the topics mentioned in this paper. [8] Uzark K. Clinical pathways for monitoring and advancing congenital heart dis-
ease care. Progress in Pediatric Cardiology 2003;18:131–9.
Note that the bottleneck of clinical pathway pattern mining is not [9] Loeb M, Carusone SC, Goeree R, Walter SD, Brazil K, Krueger P, et al. Effect of
based on whether users can derive the complete set of clinical path- a clinical pathway to reduce hospitalizations in nursing home residents with
way patterns efficiently, but rather on whether they can derive a pneumonia. Journal of the American Medical Association 2006;295:2503–10.
[10] Zand DJ, Brown KM, Konecki UL, Campbell JK, Salehi V, Chamberlain JM. Effec-
compact but high quality set of patterns that can cover most useful tiveness of a clinical pathway for the emergency treatment of patients with
medical behaviors in clinical practice. Although our study reveals inborn errors of metabolism. Pediatrics 2008;122:1191–5.
that the proposed approach is effective in analyzing medical behav- [11] Alexandrou D, Skitsas I, Mentzas G. A holistic environment for the design and
execution of self-adaptive clinical pathways. IEEE Transactions on Information
iors and discovering efficient clinical pathway patterns, there are Technology in Biomedicine 2011;15(1):108–18.
even more complex analysis and evaluation tasks that need to be [12] Huang Z, Lu X, Duan H. Using recommendation to support adaptive clinical
considered. pathways. Journal of Medical Systems 2011:1–12, 10.1007/s10916-010-9644-
3.
In fact, clinical experts at the Zhejiang Huzhou Central Hospital
[13] Combi C, Gozzi M, Oliboni B, Juarez JM, Marin R. Temporal similarity
of China, have indicated that, even though our approach is efficient measures for querying clinical workflows. Artificial Intelligence in Medicine
for mining precise and complete set of regular medical behaviors 2009;46(1):37–54.
[14] Lu X, Huang Z, Duan H. Supporting adaptive clinical treatment processes
in clinical pathways, there are still a number of infrequent behav-
through recommendations. Computer Methods and Programs in Biomedicine
iors that are missing in the discovered patterns. Note that these 2011.
infrequent behaviors may represent interesting variants in clinical [15] Huang Z, Lu X, Gan C, Duan H. Variation prediction in clinical processes. In:
pathways, and thus need to be discovered and analyzed. Approxi- Peleg M, Lavrac N, Combi C, editors. Artificial intelligence in medicine, vol. 6747
of Lecture Notes in Computer Science. Berlin/Heidelberg: Springer; 2011. p.
mate frequent patterns [44,51] could be a possible choice to handle 286–95.
variants in clinical pathways. As well, sequence clustering could [16] Peleg M, Mulyar N, van der Aalst WMP. Pattern-based analysis of computer-
also be used to classify and analyze variants in clinical pathways interpretable guidelines: don’t forget the context. Artificial Intelligence in
Medicine 2012;54(1):73–4.
[27,52,53]. However, the interesting questions that remain address [17] Campbell H, Hotchkiss R, Bradshaw N, Porteous M. Integrated care pathways.
the issues of how to design efficient algorithms for mining and British Medical Journal 1998;316:133–7.
detecting variants in clinical pathways, as well as, how to explain [18] Cheah J. Development and implementation of a clinical pathway programme
in an acute care general hospital in Singapore. International Journal for Quality
these variants in a maximum-informative manner. Much research in Health Care 2000;12:403–12.
is still needed to make such mining both effective and efficient. [19] Dousson C, Duong TV. Discovering chronicles with numerical time constraints
In addition, to make clinical pathway pattern mining an essen- from alarm logs for monitoring dynamic systems. In: Thomas Dean, editor.
Proceedings of the 16th international joint conference on artificial intelligence.
tial task in clinical practice, much research is needed to further
San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 1999. p. 620–6.
develop pattern-based mining methods. For example, how does [20] Dousson C, Maigat PL, France Telecom R&D. Chronicle recognition improve-
one construct an efficient classification model using the discov- ment using temporal focusing and hierarchization. In: Veloso Manuela M,
editor. Proceedings of the 20th international joint conference on artificial intel-
ered clinical pathway patterns in order to specialize in clinical
ligence. Menlo Park, CA: IJCAI/AAAI Press; 2007. p. 324–9.
pathways? What sorts of clinical pathway patterns are more effec- [21] Cram D, Mathern B, Mille A. A complete chronicle discovery approach: appli-
tive and discriminative than other patterns in treating particular cation to activity analysis. Expert Systems 2011.
patients? How does one measure the adherence between the dis- [22] Westbrook JI, Coiera EW, Gosling AS, Braithwaite J. Critical incidents and jour-
ney mapping as techniques to evaluate the impact of online evidence retrieval
covered clinical pathway patterns and actual medical behaviors, in systems on health care delivery and patient outcomes. International Journal of
order to assist medical staff to analyze clinical pathways? These Medical Informatics 2007;76:234–45.
[23] Agrawal R, Gunopulos D, Leymann F. Mining process models from workflow logs. In: Schek HJ, Saltor F, Ramos I, Alonso G, editors. Sixth international conference on extending database technology. London: Springer-Verlag; 1998. p. 469–83.
[24] Cook JE, Wolf AL. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology 1998;7(3):215–49.
[25] Yang W, Hwang S. A process-mining framework for the detection of healthcare fraud and abuse. Expert Systems with Applications 2006;31(1):56–68.
[26] van der Aalst WMP, Weijters AJMM, Maruster L. Workflow mining: discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering 2004;16(9):1128–42.
[27] Rebuge A, Ferreira DR. Business process analysis in healthcare environments: a methodology based on process mining. Information Systems 2012;37(2):99–116.
[28] Lang M, Bürkle T, Laumann S, Prokosch H-U. Process mining for clinical workflows: challenges and current limitations. In: Andersen SK, Klein GO, Schulz S, Aarts J, editors. Proceedings of MIE2008 the XXI international congress of the European Federation for Medical Informatics. 2008. p. 229–34.
[29] Günther WC, Rozinat A, van der Aalst WMP. Activity mining by global trace segmentation. In: van der Aalst WMP, Mylopoulos J, Sadeh NM, Shaw MJ, Szyperski C, Rinderle-Ma S, et al., editors. Business process management workshops, vol. 43 of Lecture Notes in Business Information Processing. Berlin/Heidelberg: Springer; 2010. p. 128–39.
[30] Daniyal A, Abidi S. Semantic web-based modeling of clinical pathways using the UML activity diagrams and OWL-S. In: David Riaño, editor. Knowledge representation for health-care. Data, processes and guidelines, vol. 5943 of Lecture Notes in Computer Science. Berlin/Heidelberg: Springer; 2010. p. 88–99.
[31] Bryan S, Holmes S, Prostlethwaite D, Carty N. The role of integrated care pathways in improving the client experience. Professional Nurse 2002;18(2):77–9.
[32] Ye Y, Jiang Z, Diao D, Yang D, Du G. An ontology-based hierarchical semantic modeling approach to clinical pathway workflows. Computers in Biology and Medicine 2009;39:722–32.
[33] Li C. Mining process model variants: challenges, techniques, examples. PhD thesis, University of Twente, The Netherlands; 2010.
[34] Darnton G, Darton M. Business process analysis. CA: International Thompson Business Press; 1997.
[35] Hwang S, Wang C, Yang W. Discovery of temporal patterns from process instances. Computers in Industry 2004;53:345–64.
[36] van der Aalst WMP, Reijers HA, Weijters AJMM, van Dongen BF, Alves de Medeiros AK, Song M, et al. Business process mining: an industrial application. Information Systems 2007;32(5):713–32.
[37] Greco G, Guzzo A, Manco G, Sacca D. Mining and reasoning on workflows. IEEE Transactions on Knowledge and Data Engineering 2005;17:519–34.
[38] Cardoso J, Lenic M. Web process and workflow path mining using the multimethod approach. International Journal of Business Intelligence and Data Mining 2006;1:304–28.
[39] van de Klundert J, Gorissen P, Zeemering S. Measuring clinical pathway adherence. Journal of Biomedical Informatics 2010;43(6):861–72.
[40] Lin F, Chen S, Pan S, Chen Y. Mining time dependency patterns in clinical pathways. International Journal of Medical Informatics 2001;62(1):11–25.
[41] Hu Y, Huang TCK, Yang HR, Chen YL. On mining multi-time-interval sequential patterns. Data & Knowledge Engineering 2009;68(10):1112–27.
[42] Wu SY, Chen YL. Discovering hybrid temporal patterns from sequences consisting of point- and interval-based events. Data & Knowledge Engineering 2009;68(11):1309–30.
[43] Chen YL, Wu SY, Wang YC. Discovering multi-label temporal patterns in sequence databases. Information Sciences 2011;181(3):398–418.
[44] Han J, Cheng H, Xin D, Yan X. Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery 2007;15:55–86.
[45] Yan X, Han J, Afshar R. CloSpan: mining closed sequential patterns in large datasets. In: Daniel Barbará, Chandrika Kamath, editors. Proceedings of the third SIAM international conference on data mining. San Francisco, CA, USA: SIAM; 2003. p. 166–77.
[46] Wang J, Han J, Li C. Frequent closed sequence mining without candidate maintenance. IEEE Transactions on Knowledge and Data Engineering 2007;19(8):1042–56.
[47] Tzvetkov P, Yan X, Han J. TSP: mining top-k closed sequential patterns. Knowledge and Information Systems 2005;7:438–57.
[48] Adlassnig K-P, Combi C, Das AK, Keravnou ET, Pozzi G. Temporal representation and reasoning in medicine: research directions and challenges. Artificial Intelligence in Medicine 2006;38(2):101–13.
[49] Sacchi L, Larizza C, Combi C, Bellazzi R. Data mining with temporal abstractions: learning rules from time series. Data Mining and Knowledge Discovery 2007;15(2):217–47.
[50] Combi C, Oliboni B. Visually defining and querying consistent multi-granular clinical temporal abstractions. Artificial Intelligence in Medicine 2012;54(2):75–101.
[51] Li C, Jea K. An adaptive approximation method to discover frequent itemsets over sliding-window-based data streams. Expert Systems with Applications 2011;38(10):13386–404.
[52] Ferreira DR, Zacarias M, Malheiros M, Ferreira P. Approaching process mining with sequence clustering: experiments and findings. In: Alonso G, Dadam P, Rosemann M, editors. vol. 4714 of the Lecture Notes in Computer Science. Berlin/Heidelberg: Springer; 2007. p. 360–74.
[53] Ferreira DR. Applied sequence clustering techniques for process mining. In: Cardoso J, van der Aalst W, editors. Handbook of research on business process modeling, information science reference. IGI Global; 2009. p. 492–513.