Published Paper
ARTICLE INFO

Article history:
Received 17 June 2014
Accepted 31 July 2014
Available online 9 August 2014

Keywords:
Clinical decision support
Bayesian networks
Evidence-based medicine
Evidence synthesis
Meta-analysis

ABSTRACT

Complex clinical decisions require the decision maker to evaluate multiple factors that may interact with each other. Many clinical studies, however, report ‘univariate’ relations between a single factor and outcome. Such univariate statistics are often insufficient to provide useful support for complex clinical decisions even when they are pooled using meta-analysis. More useful decision support could be provided by evidence-based models that take the interaction between factors into account. In this paper, we propose a method of integrating the univariate results of a meta-analysis with a clinical dataset and expert knowledge to construct multivariate Bayesian network (BN) models. The technique reduces the size of the dataset needed to learn the parameters of a model of a given complexity. Supplementing the data with the meta-analysis results avoids the need to either simplify the model – ignoring some complexities of the problem – or to gather more data. The method is illustrated by a clinical case study into the prediction of the viability of severely injured lower extremities. The case study illustrates the advantages of integrating combined evidence into BN development: the BN developed using our method outperformed four different data-driven structure learning methods, and a well-known scoring model (MESS) in this domain.

© 2014 Elsevier Inc. All rights reserved.
1. Introduction

It is a challenge to build effective decision-support models for complex clinical problems; such problems involve multiple interacting factors [1,2] and to account for both the factors and their interaction a ‘multivariate’ model is needed [3]; these can have many forms: our focus is on Bayesian networks. In general, a multivariate model can be built in a number of ways: (1) purely from data using statistical and machine learning techniques [4], (2) from a combination of clinical knowledge and data [5–7] or (3) from published literature using multivariate meta-analysis techniques [8]. Each of these techniques has been shown to be successful in certain conditions but in this paper, we focus on clinical problems where none of these techniques is sufficient, on its own, to build a useful decision support model. That is, our focus is on problems that are complex, important but also rare: their rarity makes it hard to collect very large datasets (so called ‘big data’); their complexity demands a sophisticated multivariate model but their importance ensures that a large number of relevant research studies is available.

In these domains, the first method of building models – purely from data – results in simple models that cannot deal with the complexity of the problem [1] because there is not enough data to support a complex model. The third approach fails because clinical studies rarely publish information detailed enough for multivariate meta-analysis [9]. Instead, many medical studies report ‘univariate’ relations between a single factor and an outcome. Randomised controlled trials, for example, analyse the effect of a single treatment by using randomisation to decrease the confounding effect of other variables [10]. Similarly, many observational studies report the relation between individual risk factors and outcomes even when their dataset contains information about multiple factors. The second approach – combining knowledge and data – could work but it ignores the large body of published evidence; our challenge is therefore to exploit the results of a meta-analysis of studies reporting univariate relations to supplement a dataset that is otherwise inadequate to support a complex multivariate model.

⇑ Corresponding author. Address: Risk and Information Management Research Group, Room CS332, School of Electronic Engineering and Computer Science, West Square, Queen Mary University of London, E1 4NS London, UK.
E-mail address: [email protected] (B. Yet).
http://dx.doi.org/10.1016/j.jbi.2014.07.018
1532-0464/© 2014 Elsevier Inc. All rights reserved.
374 B. Yet et al. / Journal of Biomedical Informatics 52 (2014) 373–385
review and some data about multivariate relations are available. However, the amount of data may be insufficient to learn the parameters of some relations in the BN.

3.1. Structure

A BN structure can be developed in two stages: selecting variables, and identifying the relations between those variables. Domain experts use the results of a meta-analysis to select the important variables for the BN. The experts review every variable that is considered to be clinically important in the meta-analysis. During the review, they define mechanistic relations between each of these variables and the outcome. These definitions enable us to (1) build a causal BN structure that is consistent with clinical knowledge and (2) identify the variables that are clinically important considering the aims and scope of the model. Variables that are outside the scope of the model are excluded even when they have a clinically significant effect in the meta-analysis.

The mechanistic relations between the observed factors and outcome may depend on clinical factors that are not available in the data or not examined in the meta-analysis. For example, the data may not distinguish between the measurements and true state of a variable, or may exclude a part of the important causal factors in the domain (see [6] and chapters 1 and 2 of [20] for a more detailed discussion of this issue). In this case, latent variables are used to model clinical knowledge in the BN.

3.2. Parameters

Meta-analysis of a univariate relation provides a probability conditioned on a single variable, such as $P(Y \mid X_1)$. Such a probability distribution cannot be directly used for a BN variable that is conditioned on multiple parents such as $P(Y \mid X_1, \ldots, X_n)$. In this section, we present a parameter learning method for combining the results of a univariate meta-analysis and data to learn the parameters of a BN variable that has multiple parents. Our method uses auxiliary Bayesian models to learn the parameters of the BN used for decision support. These auxiliary models are hierarchical models with a structure that is conceptually similar to the Bayesian meta-analysis model described in Section 2. We introduce the proposed method by a simple example in Section 3.2.1, and examine the application of this method to more complex BN models in Section 3.2.2. The conditional probabilities provided by a meta-analysis may be relevant to variables that are not directly linked in the BN structure. We examine this issue in Section 3.2.3.

3.2.1. Illustration of the parameter learning method

In this section, we introduce our parameter learning method with the simple BN shown in Fig. 2. This BN has 3 variables and each of its variables has 2 states:

$$X_1 = \{x_1^1, x_1^2\}, \qquad X_2 = \{x_2^1, x_2^2\}, \qquad Y = \{y^1, y^2\}$$

Table 1
NPT of the variable Y.

The conditional probability distribution (CPD) of a discrete variable is encoded in a node probability table (NPT) in a BN. Table 1 shows the NPT of the variable Y. We require 4 parameters for this NPT: $P(y^1 \mid x_1^1, x_2^1)$, $P(y^1 \mid x_1^1, x_2^2)$, $P(y^1 \mid x_1^2, x_2^1)$ and $P(y^1 \mid x_1^2, x_2^2)$.

The parameters of the variable Y can be learnt from data using the maximum likelihood estimate (MLE) approach. For example, $P(y^1 \mid x_1^1, x_2^1)$ can be estimated by dividing $M[y^1, x_1^1, x_2^1]$ by $M[x_1^1, x_2^1]$, where $M[y^1, x_1^1, x_2^1]$ represents the count of data instances where $Y = y^1$, $X_1 = x_1^1$ and $X_2 = x_2^1$, and $M[x_1^1, x_2^1]$ represents the count of data instances where $X_1 = x_1^1$ and $X_2 = x_2^1$.

$$P(y^1 \mid x_1^1, x_2^1) = \frac{M[y^1, x_1^1, x_2^1]}{M[x_1^1, x_2^1]}$$

Suppose we have a dataset, with a sample size of M = 250, to learn the parameters of the BN in Fig. 2. Fig. 3 shows a part of the relevant counts from this imaginary dataset. There are only 3 data instances where $Y = y^1$, $X_1 = x_1^1$ and $X_2 = x_2^1$, as shown by $M[y^1, x_1^1, x_2^1] = 3$ in this figure.

Our aim is to estimate the parameters of the BN. Although the overall sample size of the data is not small, there is not an adequate amount of data for learning some of the parameters. For example, there are only a few data instances to learn the probability $P(y^1 \mid x_1^1, x_2^1)$ since $M[y^1, x_1^1, x_2^1] = 3$ and $M[x_1^1, x_2^1] = 10$.

As well as the data, suppose we have the results of a meta-analysis that analyses the relation between Y and $X_1$. This meta-analysis pools the conditional probabilities $P(y^1 \mid x_1^1)$ reported in different studies. The result of the meta-analysis is reported by the mean, $\mu_{p_{new}}(y^1 \mid x_1^1)$, and variance, $\sigma^2_{p_{new}}(y^1 \mid x_1^1)$, of the predictive distribution of the pooled conditional probability (see Table 2). A way of calculating these statistics is described in Section 2.

Table 2
Predictive distribution parameters from the meta-analysis.

Meta-analysis of $P(y^1 \mid x_1^1)$      Predictive distribution parameters
$\mu_{p_{new}}(y^1 \mid x_1^1)$           0.2
$\sigma^2_{p_{new}}(y^1 \mid x_1^1)$      0.005

The results of the meta-analysis cannot be directly used for the BN parameters since the variable Y is conditioned on both $X_1$ and $X_2$ in the BN model whereas it is conditioned only on $X_1$ in the meta-analysis. In other words, the NPT of the variable Y has no parameter that corresponds directly to $P(y^1 \mid x_1^1)$ (see Table 1).

In the remainder of this section, we present a novel technique that combines the data shown in Fig. 3 and the meta-analysis results shown in Table 2 to learn the parameters $P(y^1 \mid x_1^1, x_2^1)$ and $P(y^1 \mid x_1^1, x_2^2)$ for the NPT of the variable Y. The generalisation of this method for a larger number of parents and states is described in Section 3.2.2.

Fig. 4 shows a BN representation of the implemented technique. The BN representation is divided into five components that are described in the remainder of this section:

1. Data: This part uses the binomial distribution to model the relation between the conditional probability distributions (CPD) that we aim to estimate and the observed counts in the data. For example, the number of data instances where $X_1 = x_1^1$, $X_2 = x_2^1$ and $Y = y^1$, shown by $M[y^1, x_1^1, x_2^1]$, has a binomial distribution where the probability parameter is $P(y^1 \mid x_1^1, x_2^1)$ and the number of trials parameter is $M[x_1^1, x_2^1]$. The binomial distributions used in this part are shown below:

$$M[y^1, x_1^1, x_2^1] \sim \mathrm{Binomial}(M[x_1^1, x_2^1],\ P(y^1 \mid x_1^1, x_2^1))$$

$$M[y^1, x_1^1, x_2^2] \sim \mathrm{Binomial}(M[x_1^1, x_2^2],\ P(y^1 \mid x_1^1, x_2^2))$$

2. Probability distributions for NPT: This part contains the CPDs that we aim to estimate for the NPT of Y. We assign uniform priors for these distributions; informative expert priors can also be used when available:

$$P(y^1 \mid x_1^1, x_2^1) \sim \mathrm{Uniform}(0, 1)$$
3. Marginalisation:

$$P(y^1 \mid x_1^1) = \sum_{X_2} P(y^1 \mid x_1^1, X_2)\, P(X_2) = P(y^1 \mid x_1^1, x_2^1)\, P(x_2^1) + P(y^1 \mid x_1^1, x_2^2)\, P(x_2^2)$$

4. Probabilities required for marginalisation: In order to calculate the marginalisation in part 3, we need the probability distributions of $P(x_2^1)$ and $P(x_2^2)$. We assign uniform priors for these variables. We also assign a constraint to ensure that the sum of $P(x_2^1)$ and $P(x_2^2)$ is equal to 1.

$$P(x_2^1) \sim \mathrm{Uniform}(0, 1)$$

$$P(x_2^2) \sim \mathrm{Uniform}(0, 1)$$

$$\sum_{X_2} P(X_2) = P(x_2^1) + P(x_2^2) = 1$$

5. Values from meta-analysis: The pooled estimate $\mu_{p_{new}}(y^1 \mid x_1^1)$ from the meta-analysis is modelled with the normal distribution truncated to a unit interval as it represents a probability value, denoted by $\mathrm{TNormal}_{[0,1]}(\mu, \sigma^2)$. We use $P(y^1 \mid x_1^1)$ from the marginalisation in part 3 and $\sigma^2_{p_{new}}(y^1 \mid x_1^1)$ from the predictive distribution as the mean and variance of this normal distribution respectively. The values from the meta-analysis are modelled as:

$$\mu_{p_{new}}(y^1 \mid x_1^1) \sim \mathrm{TNormal}_{[0,1]}(P(y^1 \mid x_1^1),\ \sigma^2_{p_{new}}(y^1 \mid x_1^1))$$

After the observations from the data and meta-analysis are entered into the BN (see Fig. 4), the posteriors for $P(y^1 \mid x_1^1, x_2^1)$ and $P(y^1 \mid x_1^1, x_2^2)$ can be calculated. Note that the NPT of Y requires point estimates for $P(y^1 \mid x_1^1, x_2^1)$ and $P(y^1 \mid x_1^1, x_2^2)$ whereas our model calculates the entire probability distribution of these parameters. Therefore, we take the mean of these distributions for the point estimates required for the NPT (see Section 17.4 of [21] for a discussion of the use of posterior distributions for BN parameters).

In the following section, we describe the generalisation of this technique for estimating the parameters of variables with more parents or states.

3.2.2. Application of the parameter learning method for more complex BNs

Let Y be a BN variable that has n parents, and $X = \{X_1, X_2, \ldots, X_n\}$ be the set of parents of Y (see Fig. 5). Both Y and its parents have multiple states:

$$Y = \{y^1, \ldots, y^k\}$$

$$X_i = \{x_i^1, \ldots, x_i^k\}$$

Fig. 5. BN with n parents used for illustrating the generalised parameter learning method.

Our dataset contains a total of M data instances about X and Y (see Table 3). We also have pooled conditional probability and variance estimates of the predictive distribution of $P(Y \mid X_i)$ from a meta-analysis (see Table 4). A way of calculating these predictive distributions is described in Section 2.

Table 3
Sample learning dataset.

       Y        X1         ...    Xn
1      $y^4$    $x_1^3$    ...    $x_2^2$
2      $y^2$    $x_1^2$    ...    $x_2^1$
:      :        :                 :
:      :        :                 :
M      $y^1$    $x_1^1$    ...    $x_2^4$

Fig. 6 shows an abstract graphical illustration of the generalised auxiliary parameter learning model. This model is a generalisation of the model shown in Fig. 4. This illustration is not a BN; it is a schema for building an auxiliary parameter learning model for any number of states and parent variables. The size of the auxiliary parameter learning model grows rapidly with increasing number of parents and states.

In Fig. 6, the variables shown by ellipses are unknown variables that will be estimated by the model. The variables shown by rounded rectangles are observed with the values from the meta-analysis, and the variables shown by rectangles are observed from the dataset. The constraints that sum probabilities to 1 are not included in this figure to simplify the illustration. By running this auxiliary model, we estimate probability distributions for the parameters $P(Y \mid X)$ required by the NPT of Y. Since the BN requires only a point estimate of the parameter, not the entire distribution, we use the mean of this distribution as the BN parameter.

According to our model, the data related to Y, i.e. $M[Y, X]$, is generated by the binomial distribution with the probability of success $P(Y \mid X)$ and the number of trials $M[X]$.

$$M[Y, X] \sim \mathrm{Binomial}(M[X],\ P(Y \mid X))$$

$M[Y, X]$ represents the count of data instances for specific values of $X_1, \ldots, X_n$ and Y. For example, $M[y^2, x_1^1, x_2^3, \ldots, x_n^4]$ represents the number of data instances where $Y = y^2$, $X_1 = x_1^1$, $X_2 = x_2^3$, ..., $X_n = x_n^4$. Similarly, $M[X]$ represents the number of data instances where $X_1, \ldots, X_n$ have certain values.

Our aim is to estimate the CPD $P(Y \mid X)$. We assign a uniform prior for this distribution; informative expert priors can also be used when available.

$$P(Y \mid X) \sim \mathrm{Uniform}(0, 1)$$

The meta-analysis results are conditioned on fewer variables than the CPD in the BN. Therefore, the expected values of the meta-analysis results are modelled as a marginalisation of the CPD. The meta-analysis provides the pooled conditional probability estimates about $P(Y \mid X_i)$ that are modelled as the marginalisation of $P(Y \mid X)$:

$$P(Y \mid X_i) = \sum_{X \setminus \{X_i\}} P(Y \mid X)\, P(X \setminus \{X_i\})$$

$P(X \setminus \{X_i\})$ is also estimated by the binomial distribution below, where M denotes the total number of data instances, and $M[X \setminus \{X_i\}]$ denotes the counts of data instances with $X \setminus \{X_i\}$. $P(X \setminus \{X_i\})$ has a uniform prior.

$$M[X \setminus \{X_i\}] \sim \mathrm{Binomial}(M,\ P(X \setminus \{X_i\}))$$

$$P(X \setminus \{X_i\}) \sim \mathrm{Uniform}(0, 1)$$

The pooled estimates from the meta-analysis, $\mu_{P_{new}(Y \mid X_i)}$, are modelled with a normal distribution truncated to the unit interval [0, 1] as they represent a probability. The mean of this distribution is the marginalisation of the CPD, i.e. $P(Y \mid X_i)$, and the variance $\sigma^2_{P_{new}(Y \mid X_i)}$ represents the degree of uncertainty we assign to the meta-analysis results.
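As a rough numerical illustration, the two-parent binary example of Section 3.2.1 can be approximated on a grid. The sketch below is not the implementation used in this paper. It takes $M[y^1, x_1^1, x_2^1] = 3$, $M[x_1^1, x_2^1] = 10$, $\mu = 0.2$ and $\sigma^2 = 0.005$ from the text, but the counts for $x_2^2$ (4 out of 40) are hypothetical, and $P(x_2^1)$ is fixed at a hypothetical value of 0.5 rather than given a uniform prior. The function names are illustrative only.

```python
import math

def norm_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tnormal_logpdf(x, mean, var):
    # Log density at x of a Normal(mean, var) truncated to [0, 1].
    sd = math.sqrt(var)
    z = (x - mean) / sd
    mass = norm_cdf((1.0 - mean) / sd) - norm_cdf((0.0 - mean) / sd)
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi)) - math.log(mass)

def binom_loglik(k, n, p):
    # Binomial log-likelihood in p, dropping the constant binomial coefficient.
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def posterior_means(k_a, n_a, k_b, n_b, p_x2_a, mu, var, grid=200):
    # Grid approximation of the posterior means of
    #   pa = P(y1 | x1^1, x2^1) and pb = P(y1 | x1^1, x2^2)
    # under uniform priors, binomial data, and a truncated-normal
    # observation of the meta-analysis mean mu.
    # ASSUMPTION: P(x2^1) is fixed at p_x2_a instead of being estimated.
    pts = [(i + 0.5) / grid for i in range(grid)]
    log_a = [binom_loglik(k_a, n_a, p) for p in pts]
    log_b = [binom_loglik(k_b, n_b, p) for p in pts]
    total = sum_a = sum_b = 0.0
    for pa, la in zip(pts, log_a):
        for pb, lb in zip(pts, log_b):
            # Marginalise the NPT parameters over X2.
            marginal = pa * p_x2_a + pb * (1.0 - p_x2_a)
            w = math.exp(la + lb + tnormal_logpdf(mu, marginal, var))
            total += w
            sum_a += w * pa
            sum_b += w * pb
    return sum_a / total, sum_b / total

# Data-only maximum likelihood estimate from the counts in the text.
mle_a = 3 / 10
# Counts 4 and 40 and the value 0.5 are hypothetical, for illustration only.
pa, pb = posterior_means(3, 10, 4, 40, 0.5, 0.2, 0.005)
print("MLE:", mle_a, "posterior means:", round(pa, 3), round(pb, 3))
```

With a very large variance the meta-analysis term becomes almost flat, so the posterior means fall back to those implied by the data and the uniform priors alone. This gives a simple way to see how much the pooled estimate moves a sparsely supported parameter.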
We enter the mean and variance of the predictive distribution in meta-analysis as observations for $\mu_{P_{new}(Y \mid X_i)}$ and $\sigma^2_{P_{new}(Y \mid X_i)}$. We use the truncated normal distribution as it is convenient to define the expected value and variance parameters for it, but $\mu_{P_{new}(Y \mid X_i)}$ may not be normally distributed as it represents a probability value between 0 and 1.

$$\mu_{P_{new}(Y \mid X_i)} \sim \mathrm{TNormal}_{[0,1]}(P(Y \mid X_i),\ \sigma^2_{P_{new}(Y \mid X_i)})$$

Table 4
Sample meta-analysis results.

                    $\mu_{P_{new}}$    $\sigma^2_{P_{new}}$
$P(Y \mid X_1)$     0.13               0.007
$P(Y \mid X_2)$     0.21               0.025
:                   :                  :
:                   :                  :
$P(Y \mid X_n)$     0.19               0.001

Finally, we introduce constraints to ensure that the sum of every probability distribution is equal to 1.

$$\sum_{Y} P(Y \mid X) = 1$$

$$\sum_{X \setminus \{X_i\}} P(X \setminus \{X_i\}) = 1$$

$$\sum_{Y} P(Y \mid X_i) = 1$$

3.2.3. Meta-analysis results for non-neighbour variables

The method described in Sections 3.2.1 and 3.2.2 assumes that the variables analysed in the meta-analysis are neighbours in the BN. In this section we look at how this assumption can be relaxed to handle the more general case where the BN contains other – intermediate – variables between the variables analysed in the meta-analysis. This situation is illustrated in Fig. 7; the meta-analysis combines the published probabilities for $P(Y \mid X)$ but the BN contains another variable I between X and Y so that the values in $P(Y \mid X)$ are no longer parameters of the BN. We examine how we can use information from a meta-analysis about non-neighbouring variables to estimate the parameters of a variable in the BN. In particular, we use $P(Y \mid X)$, calculated from a meta-analysis, to estimate $P(I \mid X)$, giving the parameters of the X → I relation, when we know or have data for $P(Y \mid I)$ describing the other intermediate relation I → Y.

Fig. 7. BN with an intermediate variable.

Since every variable in a BN is conditioned on its parents, $P(Y \mid X)$ provided from the meta-analysis is equal to:

$$P(Y \mid X) = \sum_{I} P(Y \mid I)\, P(I \mid X)$$

Based on this, we can estimate every parameter of $P(I \mid X)$ as:

$$P(i^k \mid X) = \frac{P(Y \mid X) - \sum_{m \in S} P(Y \mid i^m)\, P(i^m \mid X)}{P(Y \mid i^k)}$$

In this equation, S is the set of states of the variable I except the state $i^k$:

$$S = Val(I) \setminus \{i^k\}$$

Consequently, the parameters of $P(I \mid X)$ can be estimated given that meta-analysis provides us with $P(Y \mid X)$, and we know or have data to learn $P(Y \mid I)$. In order to get a point estimate for parameters of $P(I \mid X)$, the number of the states of I must not exceed the number of the states of Y. Otherwise, we can get an interval for the values in $P(I \mid X)$ but cannot estimate the exact values.

For example, let X, I and Y in Fig. 7 have two states each with the values $\{x^1, x^2\}$, $\{i^1, i^2\}$, $\{y^1, y^2\}$ respectively. Suppose a meta-analysis provides us with $P(y^1 \mid x^1) = 0.8$, and we learn the probabilities $P(y^1 \mid i^1) = 0.9$, $P(y^1 \mid i^2) = 0.3$ from the data. Since I has two states, $P(i^2 \mid x^1) = 1 - P(i^1 \mid x^1)$. From these values, we can calculate $P(i^1 \mid x^1)$ as:

$$P(i^1 \mid x^1) = \frac{P(y^1 \mid x^1) - P(y^1 \mid i^2)\, P(i^2 \mid x^1)}{P(y^1 \mid i^1)} = \frac{P(y^1 \mid x^1) - P(y^1 \mid i^2)\,(1 - P(i^1 \mid x^1))}{P(y^1 \mid i^1)} = 0.833$$

When I has more states than Y, we cannot get the exact values of the BN parameters but we can find an interval of possible values. Let I have the states $\{i^1, i^2, i^3\}$ instead of $\{i^1, i^2\}$. Since I has three states, $P(i^3 \mid x^1) = 1 - P(i^1 \mid x^1) - P(i^2 \mid x^1)$. Let $P(y^1 \mid i^3) = 0.5$, and the other values be the same as in the example above. The parameters of $P(i^1 \mid x^1)$ can be calculated by solving the equation below, for each independent value of Y:

$$P(i^1 \mid x^1) = \frac{P(Y \mid x^1) - \left(P(Y \mid i^2)\, P(i^2 \mid x^1) + P(Y \mid i^3)\, P(i^3 \mid x^1)\right)}{P(Y \mid i^1)}$$
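The calculations in this section reduce to solving linear equations in the unknown entries of $P(I \mid x^1)$. As a minimal sketch (the function names are illustrative and not part of the paper), the code below reproduces the two-state example and, for the three-state case with $P(y^1 \mid i^3) = 0.5$, scans candidate values of $P(i^1 \mid x^1)$ and keeps those for which all three probabilities stay within [0, 1]. It uses only the equation for $y^1$, which is the single independent equation when Y is binary.

```python
def solve_two_state(p_y_x, p_y_i1, p_y_i2):
    # Solve P(y|x) = P(y|i1) * p + P(y|i2) * (1 - p) for p = P(i1|x).
    return (p_y_x - p_y_i2) / (p_y_i1 - p_y_i2)

def three_state_interval(p_y_x, p_y_i1, p_y_i2, p_y_i3, steps=10000, eps=1e-9):
    # Scan p1 = P(i1|x) over [0, 1]. For each p1, the equation
    #   P(y|x) = P(y|i1) p1 + P(y|i2) p2 + P(y|i3) (1 - p1 - p2)
    # fixes p2, and p3 follows from the sum-to-one constraint.
    # Returns the smallest and largest feasible p1 found on the grid,
    # or None if no grid point is feasible.
    feasible = []
    for n in range(steps + 1):
        p1 = n / steps
        p2 = (p_y_x - p_y_i3 - (p_y_i1 - p_y_i3) * p1) / (p_y_i2 - p_y_i3)
        p3 = 1.0 - p1 - p2
        if all(-eps <= p <= 1.0 + eps for p in (p2, p3)):
            feasible.append(p1)
    if not feasible:
        return None
    return min(feasible), max(feasible)

# Two-state example: P(y1|x1) = 0.8, P(y1|i1) = 0.9, P(y1|i2) = 0.3.
p_two = solve_two_state(0.8, 0.9, 0.3)
# Three-state example: additionally P(y1|i3) = 0.5.
low, high = three_state_interval(0.8, 0.9, 0.3, 0.5)
print(round(p_two, 3), round(low, 3), round(high, 3))
```

Expert constraints of the kind discussed in this section could be added as further conditions inside the feasibility check.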
In this case, we cannot get an exact value for $P(I \mid x^1)$ as the intermediate variable I has more states than Y, resulting in more unknowns than equations. Instead, $P(I \mid x^1)$ can take any value as long as it satisfies the following conditions:

$$P(i^2 \mid x^1) = -1.5 + 2P(i^1 \mid x^1)$$

$$P(i^3 \mid x^1) = 2.5 - 3P(i^1 \mid x^1)$$

$$0 \le P(I \mid x^1) \le 1$$

We could use expert knowledge, by eliciting additional constraints, to narrow down the set of acceptable values for the parameters of I. In our example, $P(i^1 \mid x^1)$ can take any value between 0.75 and 0.83 to satisfy the conditions above. However, some of these values may not make sense to the domain experts, and we can eliminate these values by adding additional constraints. For example, the experts could say that $P(i^1 \mid x^1)$ should only take values above 0.8 and we could reflect it by adding the $P(i^1 \mid x^1) > 0.8$ constraint to the conditions above.

The technique described above can also be applied when more intermediate variables are present. Fig. 8 shows a BN that has n intermediate variables between Y and X. A similar case, where 2 intermediate variables are present, is encountered in the case study described in the following section (see Section 4.4.1). An exact estimate can be found given that we know or have data for $P(Y \mid X)$, $P(Y \mid I_k)$ for k = 2, ..., n, and that Y does not have fewer states than $I_1$.

$$P(i_1^k \mid X) = \frac{P(Y \mid X) - \sum_{m \in S} P(Y \mid I_n)\, P(I_n \mid I_{n-1}) \cdots P(I_2 \mid i_1^m)\, P(i_1^m \mid X)}{P(Y \mid i_1^k)}$$

where $S = Val(I_1) \setminus \{i_1^k\}$.

Fig. 8. BN with multiple intermediate variables.

4. Case-study

Using the method described in Section 3, we developed a BN to predict viability of a Lower Extremity with Vascular Trauma (LEVT). This section presents the development of the LEVT BN, and Section 5 presents its results.

4.1. Background

Injuries to the blood vessels of the lower extremity are potentially devastating and can result in death, severe disability or limb loss. Delays or errors in treatment decisions may lead to irreversible consequences and worsen outcome. One of the most difficult surgical decisions is whether to attempt salvage or perform an amputation of a severely injured extremity. Accurate risk stratification and outcome prediction, for a given injury pattern, has the potential to improve outcome by reducing delays and errors in decision-making.

Limb tissues may be permanently damaged as a direct consequence of the energy transfer during injury or die because of a prolonged disruption to their blood supply. The extent of tissue damage, or loss, is directly related to future outcome and is the primary determinant of the need for amputation. Following a lower extremity vascular injury, early reperfusion of the affected tissues is essential to ensure their viability. Reperfusion entails surgical reconstruction of the damaged blood vessels. Predicting the outcome of vascular reconstruction and the projected tissue viability would inform treatment decisions and risks. We developed a BN model for predicting the viability of a traumatic lower extremity with vascular injury after salvage is attempted. The BN is built in collaboration with the Trauma Sciences Unit at the Royal London Hospital and the United States Army Institute of Surgical Research (USAISR). Two trauma surgeons (the 2nd and 4th authors) were involved in development of the LEVT BN. A dataset of 521 lower extremity injuries and 487 patients collected by USAISR, and a systematic review and meta-analysis of the relevant prognostic factors were used to develop the LEVT BN.

4.2. Meta-analysis for lower extremity vascular trauma

A number of research studies that describe the factors that affect outcome following LEVT have been published. Our first step was to conduct a systematic review of these studies and perform a meta-analysis of the factors. The systematic review included 45 articles containing information regarding 3164 lower extremity repairs. The study protocol is published in the PROSPERO register of systematic reviews [22].

We used the model described in Section 2 to pool the relevant conditional probabilities and calculate the predictive distributions. The meta-analysis models were calculated using AgenaRisk [18]. Table 5 shows the means and variances of the posterior predictive distributions. In the following sections, we use these results to define the structure and parameters of a BN model.

4.3. Deriving the BN structure

The structure of the BN was defined using the methodology described in Section 3.1. The systematic review identified clinical factors that are potentially associated with the outcome of interest. A meta-analysis of these factors identified the strength of this association (see Table 5). A domain expert (2nd author) examined these variables and described the mechanistic relation between each of the variables and the outcome. These relations were modelled in a causal BN structure. Knowledge of mechanistic relations enabled us to identify variables outside the intended scope of the BN. For example, nerve injuries were not included in the model even though they are associated with an increased probability of amputation. The domain expert identified that although nerve injuries may affect function of the related tissues (sensation and movement) they do not affect viability of the tissue. As the intended scope of our model was to predict limb viability, this variable was excluded.

For some variables, our dataset held more detailed information than the results of the meta-analysis. For example, soft-tissue injury was identified as one of the most important prognostic variables by the meta-analysis, and the information in our dataset allowed us to model this variable in more detailed states for the BN. Similarly, detailed information on the degree of ischaemia was present in the dataset but not in the meta-analysis. Therefore, the BN models some variables in more detail than the information obtained from the meta-analysis.

Several latent variables were introduced as the domain expert identified the mechanistic relations between the observed clinical factors and outcomes. These variables were clinically important but neither the dataset nor the reviewed studies contained them as they cannot be directly observed [6]. For example, both soft tissue injuries and vessel injuries that require a graft repair have high probabilities of amputation in the meta-analysis. However, each of these factors is related to amputation through a different mechanism. Graft repairs can lead to amputation if the graft blocks or bursts, and thus disrupts the blood flow to the extremity. A variable representing the degree of ‘blood supply’ is required to model this relation. Although the degree of blood supply can be estimated
by several measurements, the precise state of this variable is difficult to observe and therefore is not available in the dataset. Soft tissue injuries lead to amputation if insufficient tissue remains to allow repair. Similarly, a latent variable representing the degree of ‘soft tissue cover’ is required to model this relation. Table 6 shows a list of the observed and latent variables in the LEVT BN structure. These variables and the LEVT BN structure are described in the following section.

Table 5
Mean and variances of the predictive distributions from the meta-analysis for P(amputation|clinical factor).

Clinical factor               $\mu_{P_{new}}$    $\sigma^2_{P_{new}}$
Arterial repair
  Graft                       0.11               0.009
  Primary repair              0.05               0.002
Anatomical site
  Femoral                     0.04               0.004
  Popliteal                   0.14               0.005
  Tibial                      0.10               0.018
Associated injuries
  MAI (a) – present           0.18               0.045
  MAI (a) – absent            0.09               0.006
  Soft tissue – present       0.26               0.066
  Soft tissue – absent        0.08               0.009
  Fracture – present          0.14               0.013
  Fracture – absent           0.02               0.001
  Nerve – present             0.12               0.022
  Nerve – absent              0.06               0.016
Complications
  Shock – present             0.13               0.047
  Shock – absent              0.06               0.030
  Ischaemia time > 6 h        0.24               0.050
  Ischaemia time ≤ 6 h        0.05               0.009
  CS (b) – present            0.28               0.008
  CS (b) – absent             0.06               0.002

(a) MAI = Arterial Injuries at Multiple Levels.
(b) CS = Compartment Syndrome.

Table 6
Observed and latent variables in LEVT BN.

Observed variables            Latent variables
Arterial Repair               Blood Supply
Anatomical Site               Ischaemic Damage
Multiple Levels (MAI)         Microcirculation
Soft Tissue Injury            Soft Tissue Cover
Associated Fracture
Shock
Ischaemia Time
Ischaemia Degree
Compartment Syndrome
Repair Failure
Number of Injured Tibials
Nonviable Extremity

The LEVT BN is divided into 5 components, corresponding to the 5 boxes shown in Fig. 9. A summary of the variables and relations in each of these components is shown below:

Lower extremity outcome: A viable lower extremity requires an adequate blood supply and sufficient viable soft tissue to allow a repair. The ‘Nonviable Extremity’ variable represents extremities that are amputated as a result of insufficient viable tissue. ‘Nonviable Extremity’ is the main outcome variable that the LEVT BN aims to predict.

Ischaemia: Tissue ischaemia results when there is an imbalance between the supply of oxygen to tissue and the tissue's oxygen requirements to sustain life. This results from a disruption in the blood supply to the tissue. Initially ischaemia may be reversible, but if prolonged it will result in permanent death of the affected tissues. Since our model is built for lower extremities with vascular injuries, most of the extremities within the scope of our model will be partly or completely ischaemic until the vascular injury is repaired. The severity of ischaemic damage depends on the time elapsed since the beginning of ischaemia (Ischaemia Time) and the degree of obstruction (Ischaemia Degree). A second important cause of ischaemia is the development of a complication called compartment syndrome. Compartment syndrome results when the swelling of injured tissues compresses the blood vessels, disrupting the blood supply.

Soft tissue damage: This part of the model predicts the projected amount of viable soft tissue in the lower extremity. A critical amount of soft tissue is necessary to repair the lower extremity and protect it from infection. Therefore, the degree of soft tissue cover is one of the main factors affecting limb viability. Our model estimates the amount of soft tissue (Soft Tissue Cover) based on the amount of non-viable tissue due to the direct damage from the injury (Soft Tissue Injury) and ischaemia (Ischaemic Damage). Certain injury types (Mechanism of Injury), such as blast injuries, are likely to cause more severe soft tissue injuries.

Success of arterial repair: This part of the model predicts the success of a vascular repair operation represented by the ‘Repair Failure’ variable. The ‘Arterial Repair’ variable represents the type of the repair operation, and has two states: ‘Graft’ and ‘Primary Repair’. ‘Graft’ represents bypassing of the injured artery by a vein harvested from the patient. ‘Primary Repair’ represents a simpler repair operation such as stitching of a small laceration in the artery. ‘Graft’ repairs have a higher rate of failure compared to ‘Primary Repair’ as this operation is more complex and applied to more severe cases. Injury characteristics often define the type of the arterial repair. For example, an arterial injury cannot be treated by primary repair if a significant part of the artery is missing, thus a graft is necessary.

The ‘Multiple Levels’ variable represents whether vascular injuries are present at multiple levels of the same extremity. Repairs of such injuries have a higher probability of failure as they are more likely to block.

The ‘Anatomical Site’ variable represents the location of the main arterial injury. Our model includes injuries above the knee (femoral artery), at the knee (popliteal artery) or below the knee (tibial arteries). Reconstruction of a femoral artery often has better outcomes compared to a popliteal or a tibial artery.

Blood circulation: The ‘Blood Supply’ variable represents the degree of blood supply to the lower extremity. This variable essentially depends on the ‘Repair Failure’ variable. If the vascular repair fails, the extremity will not have adequate blood supply; so there is a deterministic relation between the negative repair failure and inadequate blood supply. In other words, a repair failure leads to inadequate blood supply, and inadequate blood supply leads to a non-viable extremity in our model. However, a successful arterial repair may not guarantee adequate blood supply throughout the lower extremity; side factors including ‘Shock’ and ‘Microcirculation’ can also affect the outcomes.

The ‘Shock’ variable represents an overall deficiency of blood supply throughout the body. The ‘Microcirculation’ variable represents the severity of injury in the smaller vessels of the lower extremity.

A single vessel supplies blood to the lower extremity. This divides into three branches, called tibial arteries, below the knee. Modelling tibial arteries is important since, in this segment, limb
B. Yet et al. / Journal of Biomedical Informatics 52 (2014) 373–385 381
viability is related to the number of tibial arteries injured. In order to model this difference, we modified the BN structure for injuries below the knee by adding a variable about the number of injured tibial arteries. This modification is shown by the variable with dashed lines in Fig. 10. Our model assumes that a repair failure leads to a non-viable extremity if all 3 tibial arteries are injured. However, there is a chance of a successful outcome if only 1 or 2 tibial arteries are injured. Apart from this difference, the BN models for above the knee, at the knee and below the knee injuries are exactly the same.

factor (see Table 5). The variable equivalent to an unsuccessful outcome is ‘nonviable extremity’ in the LEVT BN but the meta-analysis results can also be used for the NPT of the ‘repair failure’ variable as (1) our model assumes a deterministic relation between an unsuccessful outcome and repair failure, and thus we know the parameters of the intermediate variables between them and (2) the parents of ‘repair failure’ can influence ‘nonviable extremity’ through only one pathway (see Section 3.2.3). For example, the ‘arterial repair’ variable can affect ‘nonviable extremity’ through the following pathway in our model:
Table 7
Amount of data available for learning parameters of repair failure variable.

AR^a                              Graft    Graft      Graft   Graft    Graft      Graft   Primary  Primary    ...
MAI^a                             True     True       True    False    False      False   True     True
AS^a                              Femoral  Popliteal  Tibial  Femoral  Popliteal  Tibial  Femoral  Popliteal
RF^a, number of observations:     14       6          2       71       115        38      1        3          ...

^a AR: Arterial Repair, MAI: Multiple Levels, AS: Anatomical Site, RF: Repair Failure.
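The case study applies the meta-analysis-based parameter learning only to parameters with fewer than 20 instances of relevant data. The following minimal sketch shows how the parent configurations visible in Table 7 could be partitioned by that threshold. The dictionary layout, the name `split_by_data_adequacy` and the choice to treat a count of exactly 20 as adequate are illustrative assumptions, not part of the published method; only the eight configurations shown in the table are included.

```python
# Illustrative sketch only: names and data layout are assumptions.
# Keys are (Arterial Repair, Multiple Levels, Anatomical Site) configurations;
# values are the numbers of observations reported in Table 7.
TABLE7_COUNTS = {
    ("Graft", True, "Femoral"): 14,
    ("Graft", True, "Popliteal"): 6,
    ("Graft", True, "Tibial"): 2,
    ("Graft", False, "Femoral"): 71,
    ("Graft", False, "Popliteal"): 115,
    ("Graft", False, "Tibial"): 38,
    ("Primary", True, "Femoral"): 1,
    ("Primary", True, "Popliteal"): 3,
}

MIN_INSTANCES = 20  # threshold used in the case study


def split_by_data_adequacy(counts, threshold=MIN_INSTANCES):
    """Split parent configurations into sparse and adequate groups.

    Sparse configurations (fewer than `threshold` observations) are the
    candidates for combining data with meta-analysis results; adequate
    ones can be learned from the data alone.
    """
    sparse = [cfg for cfg, n in counts.items() if n < threshold]
    adequate = [cfg for cfg, n in counts.items() if n >= threshold]
    return sparse, adequate


sparse_configs, adequate_configs = split_by_data_adequacy(TABLE7_COUNTS)
```

With the counts above, only the three graft repairs without multiple-level injury fall in the adequate group; every other visible configuration would need support from the meta-analysis.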
learned by combining the results of the meta-analysis with the data, and the values written in normal fonts are the parameters learned purely from the data. The results of these approaches differ substantially as the data gets smaller. The effects of this difference on the model performance are discussed in Section 5.1.

4.4.2. Latent variables

The BN contained several latent variables as described in Section 4.3 (see Table 6 for a list of these variables). Ranked nodes were used to model the NPT of these variables [24]. A ranked node is an approximation of the truncated normal distribution to the multinomial distribution with ordinal scale. We used the framework proposed by Fenton et al. [24] to elicit the parameters of ranked nodes. For each of the latent variables we first asked the domain experts to describe the relation between the variable and its parents. Afterwards, we selected a suitable ranked node function and elicited initial weights that imitate the described relation. We presented the behaviour of the ranked node under various combinations of observations to the domain experts, and refined the weights based on their comments.

4.4.3. Variables with adequate amount of data

After the parameters with insufficient or no data were defined, the remainder of the parameters were learned purely from the data. The expectation–maximisation (EM) algorithm [25] was used to learn those parameters as the dataset contained missing values. Small correction factors were used at the maximisation step of the EM algorithm to avoid zero probabilities. The parameters that were already defined in the previous steps were kept fixed while EM was applied. For example, EM was applied to the parameters of the ‘Arterial Repair’ variable with more than 20 instances of data. The other parameters of this variable had already been learnt by using meta-analysis results at this stage, so they were kept fixed while the EM was applied.

5. Results

The performance of the LEVT BN was evaluated using a 10-fold cross-validation [26]. We used multiple performance measures to assess the discrimination, calibration and accuracy of the model. Receiver operating characteristic (ROC) curves and sensitivity and specificity values were used to assess the discrimination, and the Hosmer–Lemeshow (HL) test [27] was used to assess the calibration. The HL test divides the data into multiple subgroups, and calculates a chi-square statistic comparing the observed outcomes to the outcomes expected by the model in each subgroup. Low p-values indicate a lack of calibration. In large datasets, small differences between the expected and observed outcomes can lead to low p-values in an HL test but the visual representation of this test provides a concise summary of the model calibration. The Brier score (BS) and Brier skill score (BSS) were used to assess the accuracy [28,29]. BS is the mean squared difference between the predicted probability and observed outcome. A BS of 0 indicates a perfect model and 1 is the worst score achievable. BSS measures the improvement of the model’s prediction relative to the average probability of the event in the data or another reference probability. A BSS of 1 indicates a perfect model, and a negative value indicates a worse prediction than the average probability. The area under the ROC curve (AUROC) of the LEVT BN was 0.91. When operated at 80% and 90% sensitivity, the specificity was 81% and 70% respectively. BS and BSS of the LEVT BN were 0.06 and 0.33 respectively. The BN was well calibrated with an HL statistic of 12.7 (p-value: 0.13). Fig. 11 shows a graphical representation of the HL tests. The predictions of the model are divided into deciles in this figure; the expected number of outcomes according to the
Table 8
Parameters learnt purely from data and from a combination of data and meta-analysis.
model and the number of outcomes in the data are compared for each decile.

5.1. Not compensating for the lack of data

The main reason for using our method for parameter learning was that the available data were insufficient to learn parts of the BN structure. Since the BN model in Fig. 8 has a complicated structure compared to the available data, a purely data driven parameter learning approach would probably overfit the model. To show the consequences of not doing anything to compensate for the lack of data, we learned the parameters of the same BN structure from data without using any information from the meta-analysis. The parameters of the variables that had no data (i.e. ranked nodes, see Section 4.4.2) were defined using the same parameter values elicited from experts. The data-driven parameter learning algorithm had poor discrimination, calibration and accuracy. The AUROC was 0.68, the specificity was 29% and 45% at the 90% and 80% sensitivity levels, and the HL test indicated poor calibration (p-value: 0.01). The BS and BSS were 0.10 and 0.02. Our method outperformed the data-driven parameter learning algorithm in all measures. In summary, the purely data driven parameter learning overfitted the training data as the data was inadequate to learn some parts of the BN parameters (Table 7). This underlines the need to exploit other sources of information such as published evidence. Our method overcomes the overfitting problem by using information from the meta-analysis when the data is insufficient.

5.2. Mangled extremity severity score

The mangled extremity severity score (MESS) [30] is a well-known scoring system [31] that was developed to provide decision support in the management of patients with severe lower extremity injuries. MESS calculates a score based on the injury mechanism, the degree of shock, the ischaemic status and the patient’s age. If the score is above a certain threshold value MESS recommends an early amputation. Our method outperformed MESS in predicting the ‘Nonviable Extremity’ variable (Fig. 12). MESS had an AUROC of 0.75. When operated at 90% and 80% sensitivity, its specificity was 40% and 60% respectively. The HL test indicated poor calibration (p-value = 0.01). BS and BSS of MESS could not be calculated as its outputs are not probabilities.

5.3. Purely data driven structure learning

We compared the performance of our method to four different data-driven structure learning algorithms. These algorithms learn both the structure and parameters from data so they do not use expert knowledge at all in model development. These algorithms avoid overfitting by penalising large BN structures. The following structure learning algorithms were used:

1. A score based learning algorithm: hill climbing (HC) algorithm with the BIC score [32–34].
2. A constraint based algorithm: grow shrink (GS) algorithm [33].
3. A combination of score and constraint based approaches: max–min hill climbing (MMHC) algorithm [35].
4. A score based EM algorithm: structural EM (SEM) algorithm with the BIC score [36].

The first three algorithms require complete datasets so we imputed the missing values in the dataset using the Amelia package [37] in the R statistical software for these algorithms (see [37] for a description of the imputation technique used in Amelia). The HC, GS and MMHC algorithms are readily implemented in the bnlearn package of R [38]. The SEM algorithm is able to handle missing values when learning structure, so it is not necessary to impute the missing values beforehand. We used the SEM algorithm implemented in the structure learning package of the Bayes net toolbox of Matlab [39].

Fig. 11. Calibration of the LEVT BN.
Fig. 12. ROC curves for the LEVT BN and MESS.
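The Brier score and Brier skill score used in Section 5 can be illustrated with a short sketch. The paper defines BS as the mean squared difference between predicted probability and observed outcome; the skill-score form BSS = 1 - BS / BS_ref used below is the common textbook definition and is an assumption here, as are the function names. The default reference forecast is the average event probability in the data, as described in the text.

```python
# Illustrative sketch only: function names and the BSS formula
# (1 - BS / BS_ref) are assumptions, not taken from the paper.

def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


def brier_skill_score(probabilities, outcomes, reference=None):
    """Improvement over a constant reference forecast.

    By default the reference is the average event probability in the data.
    1 is a perfect model; negative values are worse than the reference.
    """
    if reference is None:
        reference = sum(outcomes) / len(outcomes)
    bs_reference = brier_score([reference] * len(outcomes), outcomes)
    if bs_reference == 0:
        raise ValueError("reference forecast is already perfect; BSS is undefined")
    return 1.0 - brier_score(probabilities, outcomes) / bs_reference


# Small hypothetical example (not data from the study).
example_probs = [0.9, 0.1, 0.8, 0.2]
example_outcomes = [1, 0, 1, 0]
example_bs = brier_score(example_probs, example_outcomes)
example_bss = brier_skill_score(example_probs, example_outcomes)
```

Because the reference forecast depends on the event rate, a model can have a low BS on rare outcomes and still show only a modest BSS, which is why the two scores are reported together.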
Table 9
Results of our method and the structure learning algorithms.
Fig. 13. BN structures learned by (A) HC, (B) MMHC, (C) GS* and (D) SEM algorithms. *The GS algorithm learns the structure with undirected arcs.
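The operating points reported in Section 5 (specificity at 80% and 90% sensitivity) are read off the ROC curve. A minimal sketch of that lookup is given below. The function name, the convention that a case is predicted positive when its score is at or above the threshold, and the choice of the highest threshold that reaches the target sensitivity are all assumptions made for illustration.

```python
# Illustrative sketch only: names and threshold convention are assumptions.

def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Specificity at the highest threshold whose sensitivity reaches the target.

    scores: predicted probabilities (or any risk score, higher = more likely positive)
    labels: observed outcomes, 1 for the event and 0 otherwise
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes must be present")

    # Lowering the threshold can only increase sensitivity, so the first
    # threshold that reaches the target gives the best achievable specificity.
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        if true_pos / positives >= target_sensitivity:
            true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
            return true_neg / negatives
    return 0.0  # not reached when target_sensitivity <= 1


# Small hypothetical example (not data from the study).
demo_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
demo_labels = [1, 1, 0, 1, 0, 0]
demo_specificity = specificity_at_sensitivity(demo_scores, demo_labels, 1.0)
```

Comparing models at a fixed sensitivity, rather than at a fixed threshold, keeps the comparison meaningful when one of the models (such as a scoring system) does not output probabilities.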
The LEVT BN had a better AUROC and substantially better performance at the operating points with higher sensitivity levels (see Table 9). The LEVT BN had better BS and BSS than the structure learning algorithms. The LEVT BN and BNs developed by HC and GS approaches were well calibrated. Since the amount of data was insufficient to learn the relation between some variables, the structure learning algorithms avoided overfitting by learning simple BN structures. The BN structures learned by HC, MMHC, GS and SEM algorithms are shown in Fig. 13.

6. Conclusion

This paper presented a novel methodology to build BN decision support models from the results of a meta-analysis, expert knowledge and data. The main contribution of this methodology was a novel parameter learning technique that combines univariate statistics with data to learn multivariate BN parameters. Our method was successfully applied to a trauma case-study of severely injured lower extremities. We developed a BN model that accurately predicts the viability of a lower extremity with vascular trauma. The case study demonstrated the benefits of integrating different sources of evidence into BN development. In a 10-fold cross-validation, the BN built by our approach outperformed the MESS scoring system and four different data-driven structure learning techniques. The AUROC of the LEVT BN was 0.91; whereas it was 0.84 for the best performing structure learning technique, 0.75 for MESS, and 0.68 when meta-analysis values were not used for defining the LEVT BN parameters.

The techniques presented in this paper can be applied to a wider scope of problems than trauma care. Using our method, models that reflect the complexity of clinical decisions can be built even when there is insufficient patient data. Our method enables the use of information from other sources such as published evidence and meta-analysis of systematic reviews. It offers a Bayesian way of combining multivariate patient data with published statistics conditioned on a smaller number of variables.

As further research, our auxiliary parameter learning model could be expanded to a unified parameter learning framework that combines data, published evidence and domain knowledge. Qualitative expert constraints [7,40,41] could be integrated into our parameter learning method to incorporate expert knowledge alongside data and meta-analysis. Moreover, the variance estimated from the auxiliary parameter learning model (Section 3.2) could be used to show how well the parameters are understood. In our case study, we applied our parameter learning technique to the parameters that had fewer than 20 instances of relevant data. The effects of using different data thresholds could be explored. Finally, Bayesian parameter learning methods estimate the entire probability distribution of a parameter. The expected value of this distribution is used for the relevant NPT but the variance is often ignored. Ways of integrating the variance into inference and parameter estimation techniques could be investigated.

Acknowledgments

We thank two anonymous reviewers for their valuable comments and suggestions that have helped us to improve this paper.

References

[1] Buchan I, Winn J, Bishop C. A unified modelling approach to data intensive healthcare. The fourth paradigm: data-intensive scientific discovery. Redmond (WA): Microsoft Research; 2009.
[2] Marshall JC. Surgical decision-making: integrating evidence, inference, and experience. Surg Clin North Am 2006;86:201–15. http://dx.doi.org/10.1016/j.suc.2005.10.009.
[3] Moons KGM, Royston P, Vergouwe Y, Grobbee DE, Altman DG. Prognosis and prognostic research: what, why, and how? BMJ 2009;338:1317–20. http://dx.doi.org/10.1136/bmj.b375. b375.
[4] Daly R, Shen Q, Aitken S. Learning Bayesian networks: approaches and issues. Knowledge Eng Rev 2011;26:99–157.
[5] Flores JM, Nicholson AE, Brunskill A, Korb KB, Mascaro S. Incorporating expert knowledge when learning Bayesian network structure: a medical case study. Artif Intell Med 2011;53:181–204.
[6] Yet B, Perkins Z, Fenton N, Tai N, Marsh W. Not just data: a method for improving prediction with knowledge. J Biomed Inform 2014;48:28–37. http://dx.doi.org/10.1016/j.jbi.2013.10.012.
[7] Zhou Y, Fenton N, Neil M. Bayesian network approach to multinomial parameter learning using data and expert judgments. Int J Approx Reason 2014. http://dx.doi.org/10.1016/j.ijar.2014.02.008.
[8] Van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med 2002;21:589–624.
[9] Vickers AJ. Whose data set is it anyway? Sharing raw data from randomized trials. Trials 2006;7:15. http://dx.doi.org/10.1186/1745-6215-7-15.
[10] Rawlins M. De testimonio: on the evidence for decisions about the use of therapeutic interventions. Clin Med 2008;8:579–88.
[11] Greenhalgh T, Howick J, Maskrey N. Evidence based medicine: a movement in crisis? BMJ 2014;348:g3725. http://dx.doi.org/10.1136/bmj.g3725.
[12] Druzdzel MJ, Díez FJ. Combining knowledge from different sources in causal probabilistic models. J Mach Learn Res 2003;4:295–316.
[13] Lappenschaar M, Hommersom A, Lucas PJF, Lagro J, Visscher S. Multilevel Bayesian networks for the analysis of hierarchical health care data. Artif Intell Med 2013;57:171–83. http://dx.doi.org/10.1016/j.artmed.2012.12.007.
[14] Higgins J, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J R Stat Soc: Ser A (Stat Soc) 2009;172:137–59.
[15] Müller M, Wandel S, Colebunders R, Attia S, Furrer H, et al. Immune reconstitution inflammatory syndrome in patients starting antiretroviral therapy for HIV infection: a systematic review and meta-analysis. Lancet Infect Dis 2010;10:251–61. http://dx.doi.org/10.1016/S1473-3099(10)70026-8.
[16] Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health-care evaluation. John Wiley & Sons; 2004.
[17] Neil M, Tailor M, Marquez D. Inference in hybrid Bayesian networks using dynamic discretization. Stat Comput 2007;17:219–33.
[18] Agena Ltd. AgenaRisk: Bayesian network and simulation software for risk analysis and decision support. <http://www.agenarisk.com> [accessed 15.06.14].
[19] Lunn D, Spiegelhalter D, Thomas A, Best N. The BUGS project: evolution, critique and future directions. Stat Med 2009;28:3049–67.
[20] Fenton NE, Neil MD. Risk assessment and decision analysis with Bayesian networks. CRC Press; 2012.
[21] Koller D, Friedman N. Probabilistic graphical models: principles and techniques. Cambridge (Mass.): MIT Press; 2009.
[22] Perkins Z, Glasgow, Tai N. A systematic review of prognostic factors related to secondary amputation in patients with lower limb vascular trauma requiring surgical repair. PROSPERO; 2012. <http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42012002720> [accessed 14.03.14].
[23] Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Stat Sci 1992:457–72.
[24] Fenton NE, Neil M, Caballero JG. Using ranked nodes to model qualitative judgments in Bayesian networks. IEEE Trans Knowledge Data Eng 2007;19:1420–32.
[25] Lauritzen SL. The EM algorithm for graphical association models with missing data. Comput Stat Data Anal 1995;19:191–201.
[26] Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th international joint conference on artificial intelligence – IJCAI’95, vol. 2. San Francisco (CA), USA: Morgan Kaufmann Publishers Inc.; 1995. p. 1137–43.
[27] Hosmer DW, Lemeshow S. Goodness of fit tests for the multiple logistic regression model. Commun Stat Theory Methods 1980;9:1043–69.
[28] Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev 1950;78:1–3.
[29] Weigel AP, Liniger MA, Appenzeller C. The discrete Brier and ranked probability skill scores. Mon Weather Rev 2007;135:118–24.
[30] Johansen KAJ, Daines M, Howey T, Helfet D, Hansen Jr ST. Objective criteria accurately predict amputation following lower extremity trauma. J Trauma Acute Care Surg 1990;30:568–73.
[31] Bosse MJ, MacKenzie EJ, Kellam JF, Burgess AR, Webb LX, et al. A prospective evaluation of the clinical utility of the lower-extremity injury-severity scores. J Bone Joint Surg 2001;83:3–14.
[32] Korb KB, Nicholson AE. Bayesian artificial intelligence. CRC Press; 2004.
[33] Margaritis D. Learning Bayesian network model structure from data, University of Pittsburgh; 2003. <http://www.cs.cmu.edu/afs/cs.cmu.edu/user/dmarg/www/Papers/PhD-Thesis-Margaritis.pdf> [accessed 14.03.14].
[34] Schwarz G. Estimating the dimension of a model. Ann Stat 1978;6:461–4.
[35] Tsamardinos I, Brown LE, Aliferis CF. The max–min hill-climbing Bayesian network structure learning algorithm. Mach Learn 2006;65:31–78.
[36] Friedman N. The Bayesian structural EM algorithm. In: Proceedings of the fourteenth conference on uncertainty in artificial intelligence; 1998. p. 129–38. <http://dl.acm.org/citation.cfm?id=2074110> [accessed 28.05.13].
[37] Honaker J, King G, Blackwell M. Amelia II: a program for missing data; 2014. <http://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf> [accessed 15.07.14].
[38] Scutari M. Learning Bayesian networks with the bnlearn R package. J Stat Softw 2010;35:1–29.
[39] Francois O. Structure learning package for BayesNet toolbox; 2010. <http://ofrancois.tuxfamily.org/slp.html> [accessed 15.07.14].
[40] Feelders A, Van der Gaag LC. Learning Bayesian network parameters under order constraints. Int J Approx Reason 2006;42:37–53.
[41] Tong Y, Ji Q. Learning Bayesian networks with qualitative constraints. In: IEEE conference on computer vision and pattern recognition (CVPR 2008); 2008. p. 1–8. doi:http://dx.doi.org/10.1109/CVPR.2008.4587368.