Posterior Distributions For Likelihood Ratios in Forensic Science
Ardo van den Hout
Department of Statistical Science, University College London, UK
Ivo Alberink
Netherlands Forensic Institute,
Department of Digital Technology and Biometry, The Netherlands
Abstract
Hypothesis testing in forensic science is discussed and using posterior distributions for likelihood ratios is illustrated. Instead of eliminating the
uncertainty by integrating (Bayes factor) or by conditioning on parameter
values, uncertainty in the likelihood ratio is retained by parameter uncertainty derived from posterior distributions. A posterior distribution for
a likelihood ratio can be summarised by the median and credible intervals. Using the posterior mean of the distribution is not recommended.
An analysis of forensic data for body height estimation is undertaken. The
posterior likelihood approach has been criticised both theoretically and
with respect to applicability. This paper addresses the latter and illustrates an interesting application area.
Key Words: Bayes factor, Bayesian inference, body height estimation, hypothesis testing, posterior likelihood ratios.
Introduction
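\[
\frac{P(H_p \mid E, B)}{P(H_d \mid E, B)}
= \underbrace{\frac{p(E \mid H_p, B)}{p(E \mid H_d, B)}}_{\text{Bayes factor}}
\times \frac{P(H_p \mid B)}{P(H_d \mid B)},
\]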
where $p(\cdot)$ is a generic notation for a probability density function (for continuous
evidence data) or a probability mass function (for discrete evidence data). If
the Bayes factor is 1, then the evidence does not help to choose between $H_p$
and $H_d$. If the factor is between 0 and 1, then the evidence makes $H_d$ more
likely. If the factor is larger than 1, then the evidence makes $H_p$ more likely. For
applications of the Bayes factor in forensic practice, see Lindley (1977), Evett et
al. (1987), Wakefield et al. (1991), Sjerps and Kloosterman (2003), Aitken and
Taroni (2004), and Bozza et al. (2008).
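The role of the Bayes factor as a multiplier of the prior odds can be sketched in a few lines of Python. The numbers used here (a Bayes factor of 20 and a prior probability of 0.01 for $H_p$) are purely hypothetical and serve only to illustrate the arithmetic.

```python
def posterior_odds(bayes_factor, prior_odds):
    """Posterior odds of H_p versus H_d: Bayes factor times prior odds."""
    return bayes_factor * prior_odds


def odds_to_probability(odds):
    """Convert odds in favour of H_p into a probability of H_p."""
    return odds / (1.0 + odds)


# Hypothetical illustration: prior probability 0.01 for H_p, Bayes factor 20.
prior_odds = 0.01 / 0.99
post_odds = posterior_odds(20.0, prior_odds)
print(post_odds, odds_to_probability(post_odds))
```

A Bayes factor larger than 1 increases the odds of $H_p$, but the posterior probability can still be small when the prior odds are small.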
The definition of the Bayes factor above is in line with the formulation of
the Bayes factor within the statistical framework of model comparison. Given
second point: we hope to illustrate that forensic science provides interesting applications for Aitkin's method. With regard to the first point, Gelman et al. make
a strong case. But, as stated by Gelman et al. themselves, it does not imply
that Aitkin's approach is wrong. It just means that the approach is not purely
Bayesian. Nevertheless, for reasons of consistency, we will use Aitkin's terminology such as posterior distribution of the likelihood ratio and posterior probability
throughout this paper. Further discussion of the merits and the disadvantages of
the approach is presented in the conclusion.
Section 2 introduces terminology. In Section 3, the posterior distribution of
the likelihood ratio is explained within the context of forensic science. Section 4
presents an evaluation of evidence where the posterior distribution of the likelihood ratio is used for the measurement of body height. Background data in this
case consist of measurements on test persons. A comparison is made with the
Bayes factor approach. For the posterior sampling we use WinBUGS (Lunn et
al., 2000). Section 5 concludes the paper.
Terminology
For a continuous random variable, the likelihood ratio (LR) is the ratio of two
values of the probability function $p(x|\theta)$, given two values of model parameter
$\theta$, and data $x$. For values $\theta_1$ and $\theta_2$, we have $LR = p(x|\theta_1)/p(x|\theta_2)$, where, as
before, function $p(\cdot)$ is a generic notation for a probability density function or a
probability mass function.
Given two hypotheses $H_1$ and $H_2$ corresponding to models $M_1$ and $M_2$,
respectively, the Bayes factor (BF) in favour of $H_1$ is given by
\[
BF = \frac{p(x|H_1)}{p(x|H_2)}
= \frac{\int p(x|\theta, H_1)\,p(\theta|H_1)\,d\theta}{\int p(x|\theta, H_2)\,p(\theta|H_2)\,d\theta}.
\]
The BF is also called a marginal likelihood ratio as it is the ratio of two marginal
likelihoods. It is not necessarily the case that $p(x|\theta, H_1)$ is the same function as
$p(x|\theta, H_2)$. These probability functions are defined by $M_1$ and $M_2$, respectively.
The same holds for $p(\theta|H_1)$ and $p(\theta|H_2)$. It is because of this that the BF can
be used to compare non-nested models.
If, however, $M_1$ and $M_2$ are nested, i.e., one can be derived from the other by
restricting a subset of the parameters, then the BF is still different from the LR,
as the latter is defined for specific parameter values and the former is defined by
integrating out the parameters. It is only in the specific case where the priors
given by $p(\theta|H_1)$ and $p(\theta|H_2)$ identify parameter values with probability 1 (have
a point mass 1 at those values), that the BF reduces to a LR.
The LR can be used for simple null hypothesis testing. For example, let the
null hypothesis be given by $H_1: \theta = \theta_1$, and the alternative by $H_2: \theta \neq \theta_1$. If $\hat{\theta}$
is the maximum likelihood estimate, then the probability distribution of the test
statistic $-2\log[p(x|\theta_1)/p(x|\hat{\theta})]$ can be approximated by a chi-square distribution
with 1 degree of freedom (conditional on some assumptions). This is the well-established likelihood ratio test.
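As a minimal sketch of the likelihood ratio test, consider normal data with known standard deviation, for which the maximum likelihood estimate of the mean is the sample mean. The model, the function names and the data below are assumptions made for illustration only. The chi-square tail probability with 1 degree of freedom is computed from the identity $P(\chi^2_1 > s) = \mathrm{erfc}(\sqrt{s/2})$.

```python
import math


def normal_loglik(data, mean, sd):
    """Log-likelihood of i.i.d. N(mean, sd^2) observations."""
    return sum(-0.5 * math.log(2.0 * math.pi * sd * sd)
               - (x - mean) ** 2 / (2.0 * sd * sd) for x in data)


def lr_test_normal_mean(data, theta1, sd):
    """Likelihood ratio test of H1: theta = theta1 for a normal mean (sd known)."""
    theta_hat = sum(data) / len(data)            # maximum likelihood estimate
    stat = -2.0 * (normal_loglik(data, theta1, sd)
                   - normal_loglik(data, theta_hat, sd))
    stat = max(stat, 0.0)                        # guard against rounding below zero
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail
    return theta_hat, stat, p_value


# Hypothetical data, tested against theta1 = 0 with known sd = 1.
theta_hat, stat, p_value = lr_test_normal_mean([0.1, -0.2, 0.3, 0.4], 0.0, 1.0)
print(theta_hat, stat, p_value)
```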
The BF can also be used for null hypothesis testing. For $H_1$ and $H_2$, BF
is given by $p(x|\theta_1)/\int p(x|\theta)p(\theta)\,d\theta$, where $p(\theta)$ is the prior density under the
alternative hypothesis. In this case, $H_1$ is rejected if $BF < 1$ and close to zero,
and $H_2$ is rejected if $BF > 1$ and large.
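For a concrete instance of this BF, assume normal data with known standard deviation and, under $H_2$, a normal prior for $\theta$. These modelling choices and all names below are assumptions for illustration. In this case the marginal density of the sample mean under $H_2$ is again normal, so the BF is a ratio of two normal densities evaluated at the sample mean.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor_point_null(xbar, n, sigma, theta1, prior_mean, prior_sd):
    """BF in favour of H1: theta = theta1 against H2: theta ~ N(prior_mean, prior_sd^2).

    xbar is the mean of n observations from N(theta, sigma^2), sigma known.
    """
    se = sigma / math.sqrt(n)
    numerator = normal_pdf(xbar, theta1, se)
    # Integrating the normal likelihood over the normal prior adds the variances.
    denominator = normal_pdf(xbar, prior_mean, math.sqrt(se ** 2 + prior_sd ** 2))
    return numerator / denominator


# Hypothetical values: the BF is above 1 when xbar is near theta1, below 1 when far away.
print(bayes_factor_point_null(0.0, 10, 1.0, 0.0, 0.0, 1.0))
print(bayes_factor_point_null(1.5, 10, 1.0, 0.0, 0.0, 1.0))
```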
The following example of a Bayes factor in forensic practice is taken from Lucy
(2005, Section 12.5). An eyewitness height description of the male perpetrator is
modelled as a normal distribution with mean 1.816 metres and standard deviation
0.054. The prosecutor's hypothesis is $H_p$: perpetrator = suspect. The defender's
hypothesis is $H_d$: perpetrator $\neq$ suspect. The assumed population distribution
of men is normal with mean 1.775 and standard deviation 0.098. The evidence is
the height $E = 1.855$ of the suspect.
The Bayes factor is in this case equal to the probability density of $E$ under
$H_p$ divided by the probability density of $E$ under $H_d$. That is, $BF = f(E|\mu_p =
1.816, \sigma_p = 0.054)/f(E|\mu_d = 1.775, \sigma_d = 0.098) = 1.951$, where $f$ is the density
of a normal distribution with mean $\mu$ and standard deviation $\sigma$ (Lucy 2005).
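The calculation in this example is easy to check numerically. The following sketch evaluates the two normal densities with the values given above; the helper name `normal_pdf` is ours.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


E = 1.855                                   # height of the suspect
density_hp = normal_pdf(E, 1.816, 0.054)    # eyewitness description (H_p)
density_hd = normal_pdf(E, 1.775, 0.098)    # population of men (H_d)
bf = density_hp / density_hd
print(round(bf, 3))                         # should be close to the 1.951 quoted from Lucy (2005)
```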
We would like to add the following explanation in terms of the BF. The BF
in this case is defined as
\[
BF = \frac{p(E|H_p)}{p(E|H_d)}
= \frac{\int p(E|\theta, H_p)\,p(\theta|H_p)\,d\theta}{\int p(E|\theta, H_d)\,p(\theta|H_d)\,d\theta}. \tag{1}
\]
There are no background data. The models under both hypotheses are completely
specified normal distributions. This means that $p(\theta|H_p)$ specifies $\theta = (\mu_p, \sigma_p)$
with probability one. Likewise $p(\theta|H_d)$ specifies $\theta = (\mu_d, \sigma_d)$ with probability
one. As a result both integrals disappear in (1) and we end up with $p(E|\theta, H_p) =
f(E|\mu_p, \sigma_p)$ and $p(E|\theta, H_d) = f(E|\mu_d, \sigma_d)$.
Note that there is no uncertainty associated with the BF. Consider the case
where background data are used for the estimation of $\mu_d$ and $\sigma_d$. In that case,
the denominator of (1) would have been
\[
p(E|H_d, B) = \int p(E|\theta, H_d, B)\,p(\theta|H_d, B)\,d\theta
= \int p(E|\theta, H_d, B)\,\frac{p(B|\theta, H_d)\,p(\theta|H_d)}{p(B|H_d)}\,d\theta,
\]
where $p(B|\theta, H_d)$ is the likelihood and $p(\theta|H_d)$ is the prior density. Because
the BF is in this case defined conditional on background data $B$, there is still
no uncertainty associated with the BF. The uncertainty with respect to $\theta$ is
integrated out. Nevertheless, if a new data set $B$ were sampled, a different
BF would result. By conditioning on $B$, this sampling uncertainty is not
accounted for.
As an alternative method for simple null hypothesis testing, Aitkin (2010) advocates using a Bayesian framework but, instead of working with the BF,
proposes to consider the posterior distribution of the LR. Instead of eliminating
the uncertainty by maximising (LR test) or by integrating (BF), uncertainty in
the LR is retained by parameter uncertainty derived from the posterior distributions.
Bayesian inference focusses on the posterior density of parameters. If $\theta$ is
the parameter and $x$ are the data, then the posterior is given by $p(\theta|x) =
p(x|\theta)p(\theta)/p(x)$, where $p(x|\theta)$ is the likelihood of the data and $p(\theta)$ is the prior
density of $\theta$. Thus the posterior is proportional to the likelihood times the prior,
and this is written as $p(\theta|x) \propto p(x|\theta)p(\theta)$.
The posterior likelihood ratio approach is readily explained in terms of sampling. The LR is considered as a function of the parameters under both hypotheses. First, given $H_1: \theta = \theta_1$, the likelihood is a single value $L(\theta_1) = p(x|\theta_1)$.
Second, given $H_2: \theta \neq \theta_1$, $S$ parameter values $\theta^{(s)}$, $s = 1, \ldots, S$, are sampled from the posterior
$p(\theta|x)$ and for each value the likelihood $L(\theta^{(s)})$ is computed. Next, the $S$ ratios
$L(\theta_1)/L(\theta^{(s)})$ provide a random sample from the posterior of the LR.
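The sampling scheme can be sketched as follows for a simple case that is our own assumption and not taken from Aitkin (2010): normal data with known standard deviation and a flat prior on the mean, so that the posterior of the mean is normal around the sample mean. The data and the seed are hypothetical.

```python
import math
import random
import statistics


def log_likelihood(data, theta, sd):
    """Log-likelihood of i.i.d. N(theta, sd^2) observations."""
    return sum(-0.5 * math.log(2.0 * math.pi * sd * sd)
               - (x - theta) ** 2 / (2.0 * sd * sd) for x in data)


def posterior_lr_draws(data, theta1, sd, n_draws, seed=1):
    """Sample L(theta1) / L(theta) with theta drawn from the posterior p(theta | x)."""
    rng = random.Random(seed)
    xbar = sum(data) / len(data)
    post_sd = sd / math.sqrt(len(data))   # flat prior: theta | x ~ N(xbar, sd^2 / n)
    log_l1 = log_likelihood(data, theta1, sd)
    draws = []
    for _ in range(n_draws):
        theta = rng.gauss(xbar, post_sd)
        draws.append(math.exp(log_l1 - log_likelihood(data, theta, sd)))
    return draws


data = [1.2, 0.8, 1.1, 0.9, 1.0]          # hypothetical observations
draws = posterior_lr_draws(data, theta1=0.0, sd=1.0, n_draws=5000)
median_lr = statistics.median(draws)
prob_lr_below_1 = sum(d < 1.0 for d in draws) / len(draws)
print(median_lr, prob_lr_below_1)
```

The draws can then be summarised by the median, by percentiles, or by the proportion of draws below 1, which estimates the posterior probability that the LR is smaller than 1.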
At first sight, the setting in Aitkin (2010) is different from the forensic science
setting. For the former, there is a data set and a model, and the hypotheses
are about model parameters. For the latter, there is evidence $E$ and background
data $B$, and the hypotheses are about $E$, not about the model for $B$.
For the forensic science setting, we can define an LR given an estimate of
model parameters for $B$. This only works if we assume that both the prosecutor
and the defender accept the same model for $B$. If the model parameter vector is
denoted $\theta$, then we can define a likelihood ratio by the ratio of two probability functions
\[
\frac{p(E|H_p, \theta)}{p(E|H_d, \theta)}. \tag{2}
\]
(3)
Evaluation of evidence
In this section, the posterior of the likelihood ratio (2) is used for forensic data
for height estimation of a perpetrator. A comparison with the Bayes factor (3) is
made.
A perpetrator was clearly visible on a security camera and one image was chosen as the basis for the height measurement. Background data $B$ consist of additional
measurements of six test persons who were positioned in the same stance as the
perpetrator in front of the original camera (Edelman et al., 2010).
We use the following notation. Background data are measurements $m_i$, for
test persons $i = 1, 2, \ldots, 6$, and known true heights $h_i$. The model for the height
estimation is
\[
m_i = \alpha + h_i + \varepsilon_i \quad \text{with} \quad \varepsilon_i \sim N(0, \sigma^2), \tag{4}
\]
where $\alpha$ is the systematic measurement error; see Van den Hout and Alberink
(2010) for an extended model and details of the data. Let $\theta = (\alpha, \log(\sigma))$.
The evidence is the measured height $m_p$ of the perpetrator. The height of the
suspect is $h_s$. The prosecutor's hypothesis is $H_p$: perpetrator is suspect ($h_p = h_s$).
The defender's hypothesis is $H_d$: perpetrator is not suspect ($h_p \neq h_s$). Assume
that both the prosecutor and the defender accept model (4). The BF is given by
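\[
BF = \frac{p(m_p|H_p, B)}{p(m_p|H_d, B)}
= \frac{p(m_p|h_p = h_s, B)}{\int p(m_p|h_p = h, B)\,p(h)\,dh}
= \frac{\int p(m_p|\theta, h_p = h_s)\,p(\theta|B)\,d\theta}{\int \left[\int p(m_p|\theta, h_p = h)\,p(h)\,dh\right] p(\theta|B)\,d\theta}. \tag{5}
\]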
Let us assume that the height distribution of the population is given by $p(h) =
p(h|\mu_h, \sigma_h)$, a normal distribution with known mean $\mu_h$ and known standard
deviation $\sigma_h$. The conditional LR is given by
\[
LR = \frac{p(m_p|h_p = h_s, \theta)}{\int p(m_p|h_p = h, \theta)\,p(h|\mu_h, \sigma_h)\,dh}. \tag{6}
\]
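The integral in the denominator is given by
\[
\int p(m_p|h_p = h, \theta)\,p(h|\mu_h, \sigma_h)\,dh
= \frac{1}{\sqrt{2\pi(\sigma^2 + \sigma_h^2)}}
\exp\left(-\frac{1}{2}\,\frac{(m_p - \alpha - \mu_h)^2}{\sigma^2 + \sigma_h^2}\right),
\]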
see, e.g., Gelman et al. (2004, Section 2.6) for a similar computation. If $\theta$ is
treated as a fixed value, then there is no uncertainty associated with LR.
For the posterior of LR, we first sample $\theta$ from the posterior $p(\theta|B)$.
Second, we compute LR for each sampled $\theta$.
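The conditional LR for a fixed $\theta$ can be sketched in Python as below. The numerator follows from model (4) and the denominator uses the closed-form normal density given above; as a check, the same integral is also approximated with the trapezoidal rule. The function names and the integration range are our own choices, and the plugged-in values ($m_p = 1.885$, $h_s = 1.825$, $\mu_h = 1.806$, $\sigma_h = 0.1$, and the posterior means 0.029 and 0.012 for $\alpha$ and $\sigma$) are those reported later in this section.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def denominator_closed_form(m_p, alpha, sigma, mu_h, sigma_h):
    """Closed form of the integral over the population height h."""
    return normal_pdf(m_p, alpha + mu_h, math.sqrt(sigma ** 2 + sigma_h ** 2))


def denominator_trapezoid(m_p, alpha, sigma, mu_h, sigma_h, n_nodes=20001):
    """Trapezoidal approximation of the same integral (numerical check only)."""
    lo, hi = mu_h - 8.0 * sigma_h, mu_h + 8.0 * sigma_h
    step = (hi - lo) / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        h = lo + k * step
        value = normal_pdf(m_p, alpha + h, sigma) * normal_pdf(h, mu_h, sigma_h)
        total += value if 0 < k < n_nodes - 1 else 0.5 * value
    return total * step


def conditional_lr(m_p, h_s, alpha, sigma, mu_h, sigma_h):
    """LR for fixed theta = (alpha, log sigma): density under H_p over density under H_d."""
    numerator = normal_pdf(m_p, alpha + h_s, sigma)
    return numerator / denominator_closed_form(m_p, alpha, sigma, mu_h, sigma_h)


closed = denominator_closed_form(1.885, 0.029, 0.012, 1.806, 0.1)
numeric = denominator_trapezoid(1.885, 0.029, 0.012, 1.806, 0.1)
lr_at_posterior_means = conditional_lr(1.885, 1.825, 0.029, 0.012, 1.806, 0.1)
print(closed, numeric, lr_at_posterior_means)
```

Evaluating `conditional_lr` at a single value of $\theta$ gives one number without uncertainty; applying it to every sampled $\theta$ gives the sample from the posterior of LR.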
To obtain the posterior $p(\theta|B)$, we have to specify the prior of the model
parameter vector $\theta$. Gelman et al. (2004) discuss the definition of the prior
density in the context of the normal distribution, and also the sampling from the
resulting posterior. Various levels of informativeness and conjugacy are presented
by Gelman et al.
For the evaluation of evidence in the present setting, we specify an informative proper prior $p(\theta)$ without worrying about conjugacy as we will rely on the
automatic MCMC procedures in WinBUGS to do the sampling.
To compare the posterior likelihood ratio approach with the Bayes factor (5),
we approximate the integrals in the latter by using the trapezoidal rule (with 500
nodes). This computation includes the estimation of the marginal density $p(B)$
since the posterior for $\theta$ is given by $p(\theta|B) = p(B|\theta)p(\theta)/p(B)$. In general, the
estimation of a marginal density can be complex (Carlin and Louis, 2009). Since $\theta$
consists of only two parameters, numerical approximation of the integrals works
fine. Sampling from the posterior of LR is undertaken in WinBUGS (Lunn et
al., 2000). WinBUGS is freely available software for the Bayesian analysis of
statistical models using Markov chain Monte Carlo (MCMC) methods, see also
www.mrc-bsu.cam.ac.uk/bugs. Code is provided in the Appendix. For the
inference in this application, the MCMC consisted of two chains, each with a
burn-in of 10000, and a further 10000 updates for inference. Convergence of the
MCMC was checked by using the diagnostic tools provided within WinBUGS.

Table 1: Background data on measured heights and true heights of test persons,
and measured height of perpetrator (in metres).

                    Test persons                                 Perpetrator
                    1      2      3      4      5      6
True height         1.950  1.795  1.865  1.755  1.910  1.825
Measured height     1.964  1.832  1.900  1.780  1.937  1.865    1.885
Evidence $m_p$ and background data for the height estimation are presented
in Table 1. The population distribution of Dutch Caucasian men is assumed to
be normal with mean $\mu_h = 1.806$ and standard deviation $\sigma_h = 0.1$ (Statistics
Netherlands, www.cbs.nl, 2006). This specifies $p(h|\mu_h, \sigma_h)$. For the prior of $\theta$ we
assume $p(\theta) = p(\alpha, \log(\sigma)) = p(\alpha)p(\log(\sigma))$, and furthermore $\alpha \sim N(0, 0.1)$ and
$\log(\sigma) \sim U(-10, 0)$. These priors are informative and take into account that the
measurements are in meters.
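The whole analysis can be approximated without MCMC, because $\theta$ has only two components. The sketch below is not the WinBUGS analysis used in this paper: it evaluates the posterior on a regular grid over $(\alpha, \log(\sigma))$, using the data listed in the Appendix, and then summarises the LR for $h_s = 1.825$ together with a grid version of the BF in (5). The grid limits and sizes are our own choices, and $N(0, 0.1)$ is read here as a normal distribution with standard deviation 0.1, which is an assumption. The output is not expected to match the reported figures exactly.

```python
import math

# Background data B from the Appendix (metres).
H_TRUE = [1.950, 1.795, 1.865, 1.755, 1.910, 1.825]
M_MEAS = [1.964, 1.832, 1.900, 1.780, 1.937, 1.865]
M_P, H_S = 1.885, 1.825        # evidence and the suspect height considered
MU_H, SIGMA_H = 1.806, 0.1     # population height distribution
PRIOR_SD_ALPHA = 0.1           # ASSUMPTION: N(0, 0.1) read as standard deviation 0.1


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_posterior(alpha, logsigma):
    """Unnormalised log p(theta | B) for theta = (alpha, log sigma)."""
    sigma = math.exp(logsigma)
    ss = sum((m - h - alpha) ** 2 for m, h in zip(M_MEAS, H_TRUE))
    log_lik = -len(M_MEAS) * logsigma - ss / (2.0 * sigma * sigma)
    # The U(-10, 0) prior on log sigma is constant on the grid used below.
    log_prior = -0.5 * (alpha / PRIOR_SD_ALPHA) ** 2
    return log_lik + log_prior


def posterior_lr_on_grid(n_alpha=161, n_logsigma=401):
    """Return (alpha, LR numerator, LR denominator, posterior weight) per grid cell."""
    cells = []
    for i in range(n_alpha):
        alpha = -0.05 + 0.16 * i / (n_alpha - 1)
        for j in range(n_logsigma):
            logsigma = -10.0 + 10.0 * j / (n_logsigma - 1)
            sigma = math.exp(logsigma)
            num = normal_pdf(M_P, alpha + H_S, sigma)
            den = normal_pdf(M_P, alpha + MU_H, math.sqrt(sigma ** 2 + SIGMA_H ** 2))
            cells.append((alpha, num, den, log_posterior(alpha, logsigma)))
    top = max(c[3] for c in cells)
    raw = [math.exp(c[3] - top) for c in cells]
    total = sum(raw)
    return [(c[0], c[1], c[2], w / total) for c, w in zip(cells, raw)]


def summarise(cells):
    lr = sorted((num / den, w) for _, num, den, w in cells)
    cumulative, median = 0.0, lr[-1][0]
    for value, w in lr:
        cumulative += w
        if cumulative >= 0.5:          # weighted posterior median of the LR
            median = value
            break
    return {
        "alpha_mean": sum(a * w for a, _, _, w in cells),
        "lr_median": median,
        "lr_mean": sum(value * w for value, w in lr),
        "prob_lr_below_1": sum(w for value, w in lr if value < 1.0),
        # Grid version of (5): ratio of posterior expectations.
        "bayes_factor": (sum(n * w for _, n, _, w in cells)
                         / sum(d * w for _, _, d, w in cells)),
    }


summary = summarise(posterior_lr_on_grid())
print(summary)
```

The grid replaces posterior sampling by posterior weights, so the median, the mean and the probability that LR is smaller than 1 are weighted summaries rather than sample summaries.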
Bayesian inference using WinBUGS yields a posterior mean 0.029 for $\alpha$ with
95% credible interval (CI) (0.017, 0.042). So there is a systematic overestimation
of the height of about 3 cm. For $\sigma$ the figures are 0.012 (0.006, 0.024). The
posterior density $p(\theta|B)$ has a regular shape and is depicted in Figure 1.
We will illustrate the evaluation of the evidence $m_p = 1.885$ for various values
of the height of the suspect $h_s$. Say that the suspect has the same height as the
perpetrator. In that case $m_p - \alpha \approx 1.885 - 0.029 = 1.856 = h_s$. If this is indeed
the case we would expect the value 1 to be located in the left tail of the density
of LR because it is likely that the suspect is the perpetrator and hence the mean
of LR should be larger than 1. In other words, $P(LR < 1)$ should be small.
For the same reason, we would expect BF to be larger than 1. This is indeed
[Figure 1: Posterior density $p(\theta|B)$, plotted against $\alpha$ and $\log(\sigma)$.]
The value $h_s = 1.825$ illustrates a situation where the extra information of the
posterior of the LR is of particular use. The BF is estimated at 0.53. This is dissimilar to the posterior median 0.15 of the LR, whereas the posterior mean 0.521
of the LR is close to the BF. Where the BF gives no uncertainty information,
the sampled values of the LR allow many possible quantities to be estimated to
assess whether the evidence is in favour of the defender's hypothesis. This turns out
not to be the case: the 95% CI for the LR is (< 0.01, 2.86), which includes the value
1. Probability $P(LR < 1)$ is estimated at 0.82.
For the value $h_s = 1.825$, we investigate the sensitivity of the results with
regard to the specification of the prior $p(\theta) = p(\alpha, \log(\sigma))$. First, we use priors
which are less informative. We specify $\alpha \sim N(0, 1)$ and $\log(\sigma) \sim U(-10, 5)$.
Given that measurements are in meters, these priors do not contain much information. For the LR, we obtain median 0.150 and 95% CI (< 0.01, 3.03); the BF
is estimated at 0.54. Next we specify $\alpha \sim N(0, 0.05)$ and $\log(\sigma) \sim U(-10, -3)$.
The prior for $\alpha$ implies that about 95% of the systematic error falls within the
interval (-10 cm, 10 cm); the prior for $\sigma$ implies that $\sigma$ is less than 10 cm. These
priors are informative, but are still reasonable for this case. For the LR, we obtain
median 0.145 and CI (< 0.01, 2.80); the BF is estimated at 0.51. Given these
alternative specifications of the priors, results are very similar to the previous
results.
Conclusion
is required, but it is also not fully Bayesian since it does not use the Bayes factor
for hypothesis testing.
The application concerned forensic data where heights were estimated on the
basis of images from a security camera. The posterior mean of the likelihood ratio
was similar to the Bayes factor. With samples available from the posterior of the
likelihood ratio, an all-round inference was possible by investigating posterior
percentiles and credible intervals.
As stated in the introduction, Gelman et al. (2010) criticise the posterior
likelihood ratio approach by arguing that it is incompatible with a Bayesian
perspective, and that it does not seem to be useful for common applications in
statistics. We hope to have shown in this paper that forensic science is an area
where the approach seems useful. The points raised by Gelman et al. (2010) with
respect to using vague priors, comparing discrete hypotheses, and the problem
with product of posteriors, are not applicable in our setting: in forensic science,
it makes sense to use vague prior densities for the parameters in the model for the
background data, researchers are interested in comparing discrete hypotheses,
and, at least in the current application, there is no assessment of a product of
posteriors.
Nevertheless, we acknowledge that there are still important issues in the posterior likelihood ratio approach that need further attention. Using the posterior
distribution of LR for hypothesis testing can be seen as a hybrid of Bayesian
and frequentist methods. It is not fully Bayesian, but it is also not a frequentist
analysis. This ambiguity causes interpretation problems. For example, in a fully
Bayesian framework, a 95% credible interval of a parameter means that the posterior probability that the parameter lies in that interval is 0.95. A frequentist
95% confidence interval means that given a large number of repeated samples,
95% of the estimated confidence intervals include the true value of the parameter. What are the properties of the credible intervals for LR that we computed
in the current application?
Using the posterior likelihood ratio has a wide range of possible applications
in forensic practice. Computationally it is a feasible method to evaluate evidence.
It takes into account the uncertainty with regard to inference from background
data and at the same time allows prior knowledge to be modelled.
Appendix
WinBUGS code used in the evaluation of evidence. For more information on the
software and MCMC sampling see www.mrc-bsu.cam.ac.uk/bugs.
Data:
list(h = c(1.950, 1.795, 1.865, 1.755 ,1.910, 1.825) ,
m = c(1.964, 1.832, 1.900, 1.780, 1.937, 1.865))
Inits:
list(alpha=0, logsigma= -4)
list(alpha=0.02, logsigma= -5)
Model:
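model{
  # Model for measurement: m[i] = alpha + h[i] + error, error ~ N(0, sigma^2).
  # dnorm in WinBUGS takes a mean and a precision (tau = 1/sigma^2).
  for(i in 1:6){
    mu[i] <- h[i] + alpha
    m[i] ~ dnorm(mu[i], tau)
  }
  # Evaluation of evidence: suspect height and measured height of perpetrator.
  h_s <- 1.825
  m.p <- 1.885
  # Under H_p: normal density of m.p with mean h_s + alpha and sd sigma.
  pi <- 3.141593
  p_Hp <- 1/(sqrt(2*pi)*sigma)*exp(-1/2*tau*pow(m.p-(h_s+alpha),2))
  # Under H_d: population height integrated out (variances add).
  mu_h.pop <- 1.806
  var_h.pop <- 0.01
  tau_h.pop <- 1/var_h.pop
  p_Hd <- 1/sqrt(2*pi*(var+var_h.pop))*exp(-1/(2*(var+var_h.pop))*pow(m.p-alpha-mu_h.pop,2))
  # LR:
  LR <- p_Hp/p_Hd
  # Strength of evidence: pprob is 1 if LR <= c, so its posterior mean estimates P(LR <= c).
  c <- 1
  pprob <- step(c-LR)
  # Converting precision to sd and var:
  tau <- pow(sigma,-2)
  var <- pow(tau,-1)
  # Priors (the second argument of dnorm is a precision):
  alpha ~ dnorm(0,0.1)
  logsigma ~ dunif(-10,0)
  sigma <- exp(logsigma)
}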
References
Aitken, C.G.G. and F. Taroni (2004). Statistics and the Evaluation of Evidence
for Forensic Scientists, Second Edition, Wiley, Chichester, UK.
Aitken, C.G.G. and D. Lucy (2004). Evaluation of trace evidence in the form of
multivariate data, Journal of the Royal Statistical Society: Series C (Applied Statistics), 53, 109-122.
Aitkin, M. (1991). Posterior Bayes factors (with discussion), Journal of the Royal
Statistical Society, Series B, 53, 111-142.
Aitkin, M. (1997). The calibration of P-values, posterior Bayes factors and
the AIC from the posterior distribution of the likelihood, Statistics and
Computing, 7, 253-261.
Aitkin, M. (2010). Statistical Inference. An Integrated Bayesian/Likelihood
Approach, Chapman & Hall/CRC, Boca Raton, FL.
Aitkin, M., R.J. Boys, and T. Chadwick (2005). Bayesian point null hypothesis
testing via the posterior likelihood ratio, Statistics and Computing, 15, 217-230.
Bernardo, J.M. and A.F.M. Smith (2000). Bayesian Theory. Wiley, Chichester,
UK.
Bozza, S., F. Taroni, R. Marquis and M. Schmittbuhl (2008). Probabilistic
evaluation of handwriting evidence: likelihood ratio for authorship, Journal
of the Royal Statistical Society: Series C (Applied Statistics), 57, 329-341.
Carlin, B.P. and T.A. Louis (2009). Bayesian Methods for Data Analysis. Third
Edition, Chapman & Hall, London.
Curran, J.M. (2005). An introduction to Bayesian credible intervals for sampling
error in DNA profiles, Law, Probability and Risk, 4, 115-126.
Dempster, A.P. (1974). The direct use of likelihood for significance testing,
in O. Barndorff-Nielsen, P. Blaesild and G. Schou (eds.), Proc. Conf.