Bayesian Decision and Risk Analysis Lecture Notes 2022 QMUL
Norman Fenton
CRC Press
ISBN: 9781138035119
2018
Slide 3
2nd Edition Chapter 1:
Introduction
Slide 4
WINNER: INTERNATIONAL STATISTIC OF THE YEAR: 69

This is the annual number of Americans killed, on average, by lawnmowers - compared to two Americans killed annually, on average, by immigrant Jihadist terrorists.

The figure was highlighted in a viral tweet this year from Kim Kardashian in response to a migrant ban proposed by President Trump; it had originally appeared in a Richard Todd article for the Huffington Post.

Todd's statistics and Kardashian's tweet successfully highlighted the huge disparity between (i) the number of Americans killed each year (on average) by 'immigrant Islamic Jihadist terrorists' and (ii) the far higher average annual death tolls among those 'struck by lightning', killed by 'lawnmowers', and in particular 'shot by other Americans'.

Todd and Kardashian's use of these figures shows how everyone can deploy statistical evidence to inform debate and highlight misunderstandings of risk in people's lives.

Judging panel member Liberty Vittert said: 'Everyone on the panel was particularly taken by this statistic and its insight into risk - a key concept in both statistics and everyday life. When you consider that this figure was put into the public domain by Kim Kardashian, it becomes even more powerful because it shows anyone, statistician or not, can use statistics to illustrate an important point and illuminate the bigger picture.'

Tweet by Kim Kardashian that earned "International Statistic of the Year" 2017
Note: Because of the particular 10-year period chosen (2007-2017) the terrorist attack
statistics do not include the almost 3000 deaths on 9/11 and also a number of other attacks
that were ultimately classified as terrorist attacks.
Slide 6
A significant dissenter was Nassim Nicholas Taleb – a well-known expert on risk and 'randomness'. He exposed a fundamental problem with the statistic.
Slide 7
Slide 8
Slide 9
Causal view of lawnmower versus terrorist attack deaths
Slide 10
Cost-benefit trade-off analysis required for informed decision-making
Slide 11
Is data alone enough to inform
decision making?
• What contextual and situational factors
cause the risk? How do these vary?
• What about novelty? The unknown is
dangerous (…but also an opportunity)
• Is the past a reliable predictor of the
future? If not what is being ignored?
• If we can determine the causal process
that generates the data we can potentially
control it.
Slide 12
There is more to assessing risk than statistics

• Medical
  – Lifestyle factors and symptoms
  – Treatment efficacy
• Legal
  – DNA match
  – Alibis
  – Witnesses
• Safety
  – Rare events
  – Choose between alternatives
• Financial
  – Liquidity risk
  – Operational risk
• Reliability
  – Complex software
  – Novel technology

• 'Gut-feel' decisions or calculations on the back of an envelope are fundamentally inadequate.
• Need for scientific transparency and articulation of causal models.
• Need to understand strengths and limitations of statistics and data analysis
• Focus on Bayesian and Causal Networks
Slide 13
2nd Edition Chapter 2:
Slide 14
Predicting economic growth, the Normal distribution and its limitations

[Figure: Growth rate in GDP (%) over time from first quarter 1993 to first quarter 2008]
• What are the chances that the next year’s growth rate will lie between 1.5%
and 3.5%? (stable economy)
• What are the chances that the growth will be less than 1.5% in each of the
next three quarters?
• What are the chances that within a year there will be negative growth?
(recession)
Slide 15
To answer these questions we need a model.......

The distribution is symmetric around the midpoint, which is called the mean of the distribution. Because of this, exactly half of the distribution is greater than the mean and half less than the mean.

The fitted Normal model has a mean of 2.96 and a standard deviation of 0.75. Answer: about 0.0003%, which is less than one in thirty thousand.
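A minimal Python sketch (not part of the original slides) of how such probabilities follow from the fitted Normal model, assuming quarterly growth rates are independent draws from a Normal distribution with mean 2.96 and standard deviation 0.75:

from math import erf, sqrt

def normal_cdf(x, mean, sd):
    # P(X <= x) for a Normal(mean, sd) variable
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 2.96, 0.75   # values quoted on the slide

# Chance the growth rate lies between 1.5% and 3.5% ("stable economy")
p_stable = normal_cdf(3.5, mean, sd) - normal_cdf(1.5, mean, sd)

# Chance of growth below 1.5% in each of the next three quarters
# (treating the quarters as independent draws - an assumption)
p_low_three_quarters = normal_cdf(1.5, mean, sd) ** 3

# Chance of negative growth ("recession") in a single quarter
p_negative = normal_cdf(0.0, mean, sd)

print(p_stable, p_low_three_quarters, p_negative)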
Slide 17
What happened next?
[Figure: quarterly GDP growth rate (%), 1993 Q2 to 2009 Q2]

Within less than a year the growth rate was below –5%. According to the model, a growth rate below –5% would be expected less often than once in the lifetime of the universe.

Conditions in 2008 were unlike any that had previously been seen. The standard statistical approaches inevitably fail in such cases.

Not only is the spread of the distribution much wider, but it is clearly not 'Normal' because it is not symmetric.
Slide 19
Patterns and Randomness – School League Tables

• Scores achieved (on an objective quality criteria) by the set of state schools in one council district in the UK. Scores provide 'choice' for parents
• School 38 achieved a significantly higher score than the next best school, and its score (175) is over 52% higher than the lowest ranked school, number 41 (score 115).
• Based on the impressive results of School 38 parents clamour to ensure that their child gets a place at this school.
• How would you feel if, instead of your child being awarded a place in School 38, he/she was being sent to school 41?

Position  School  Score      Position  School  Score
1         38      175        26        45      144
2         43      164        27        46      143
3         44      163        28        1       142
4         25      158        29        18      142
5         31      158        30        22      141
6         47      158        31        26      141
7         11      155        32        4       140
8         23      155        33        14      140
9         48      155        34        29      140
10        40      153        35        39      139
11        7       151        36        8       138
12        30      151        37        5       136
13        6       150        38        17      136
14        9       149        39        34      136
15        33      149        40        3       134
16        19      148        41        24      133
17        10      147        42        36      131
18        12      147        43        37      131
19        32      147        44        15      130
20        2       146        45        21      130
21        27      146        46        16      128
22        42      146        47        13      120
23        28      145        48        20      116
24        35      145        49        41      115
25        49      145
Slide 23
The Black Swan

[Figure: a spurious correlation with 'Number of people who die tangled in their bedsheets', taken from http://tylervigen.com]
• Which drug is effective for weight loss?
• For drug Precision the mean weight loss is 5 lbs and every one of the 100 subjects in the study loses between 4.5 lb and 5.5 lb.
• For drug Oomph the mean weight loss is 20 lbs and every one of the 100 subjects in the study loses between 10 lb and 30 lb.
• Classical statistical testing with p-values favours drug Precision

p-values reward low variation more than magnitude of impact
The Shotgun Fallacy

Given a large number of variables and a sufficiently large data set it is almost inevitable that a statistically significant correlation will be discovered between at least one pair of variables.
Slide 29
Spurious relations?

[Diagram: Height and Intelligence appear correlated (an inappropriate causal link); both are driven by the common cause Age]
Slide 30
The Danger of Regression: Looking backwards when you need to look forwards

Suppose that you are blowing up a large balloon. After each puff you measure the surface area and record it.

What will the surface area be on the 24th puff?
Slide 32
Simpson’s paradox – can you trust
averages?
                 Fred           Jane
Year 1 average   50             40
Year 2 average   70             62
Overall average  60             51

                 Fred           Jane
Year 1 total     350 (7 x 50)   80 (2 x 40)
Year 2 total     210 (3 x 70)   496 (8 x 62)
Overall total    560            576
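A quick Python check (not on the original slide) of why the 'average of averages' misleads, using the assignment counts implied by the totals table (7 and 3 for Fred, 2 and 8 for Jane):

# (year average, number of assignments) per year
fred = [(50, 7), (70, 3)]
jane = [(40, 2), (62, 8)]

def avg_of_year_avgs(years):
    return sum(a for a, _ in years) / len(years)

def overall_avg(years):
    return sum(a * n for a, n in years) / sum(n for _, n in years)

print(avg_of_year_avgs(fred), avg_of_year_avgs(jane))   # 60.0 51.0 -> Fred looks better
print(overall_avg(fred), overall_avg(jane))             # 56.0 57.6 -> Jane scores more per assignment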
Slide 34
Simpson’s paradox drug example
A new drug is being tested on a group of 800 people (400
men and 400 women) with a particular ailment.
Half of the people (randomly selected) are given the drug and
the other half are given a placebo.
                  Drug taken
                  No      Yes
Recovered   No    240     200
            Yes   160     200
For men: 70% (70 out of 100) taking the placebo recover, but only 60%
(180 out of 300) taking the drug recover.
For women: 30% (90 out of 300) taking the placebo recover, but only 20%
(20 out of 100) taking the drug recover.
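A short Python sketch (not on the original slide) recomputing the recovery rates from these counts; the aggregated rates reverse the within-sex comparison, which is Simpson's paradox:

# (recovered, total) for each sex and treatment, taken from the slide
data = {
    ("men", "placebo"): (70, 100),
    ("men", "drug"): (180, 300),
    ("women", "placebo"): (90, 300),
    ("women", "drug"): (20, 100),
}

# Stratified by sex: the placebo beats the drug for both men and women
for sex in ("men", "women"):
    for treat in ("placebo", "drug"):
        rec, tot = data[(sex, treat)]
        print(sex, treat, rec / tot)

# Aggregated over sex: the drug appears to beat the placebo
for treat in ("placebo", "drug"):
    rec = sum(data[(s, treat)][0] for s in ("men", "women"))
    tot = sum(data[(s, treat)][1] for s in ("men", "women"))
    print("overall", treat, rec / tot)   # placebo 0.4, drug 0.5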
Slide 38
How we
measure risk
can
dramatically
change our
perception
of risk
Slide 39
How we
measure risk
can
dramatically
change our
perception
of risk
Slide 40
Absolute vs Relative Risk

• Relative risk is being reported, not absolute risk.
• Really interested in knowing the actual chance of dying if we drink regularly vs if we do not.
• Of 200,000 deaths 8 are from mouth cancer. This means 6 of those 8 drank wine regularly and two did not (tripled risk 6/2).
• But what is the actual chance of dying from mouth cancer if you drink regularly? 0.0012% to 0.002%.
Slide 41
2nd Edition Chapter 3:
Slide 42
Are you more likely to die in a car crash when the weather is good or bad?
Temperature (T)

N = 2.144 × T + 243.55

The inevitable temptation arising from such results is to infer causal links such as, in this case, that higher temperatures cause more fatalities.

Slide 43
Newspaper headline: "New research proves that driving in winter is actually safer than at any other time of the year!"
Slide 44
Assessing Risk of Road
Fatalities: Causal model
We have information in a database
about temperature, number of fatal
crashes, and number of miles
travelled. These are therefore often
called ‘objective’ factors.
• Risk Definition 2:
Slide 49
...But this does not tell us what we need to know
Armageddon risk: large meteor strikes the Earth
Slide 52
Armageddon Bayesian Network
Slide 53
Lesson Summary
Slide 3
We want a unifying way to quantify uncertainty for diverse types of uncertain events
• Where we have a "good" understanding of the uncertainty
– "The next toss of a coin will be a head."
– “The next roll of a die will be a 6.”
• Where we have a “poor” understanding of the uncertainty
– “USA will win the next World Cup.”
– “My bank will suffer a major loss tomorrow.”
– “A hurricane will destroy the White House within the next 5 years.”
• Where there is incomplete information about an event that already happened:
– “Oliver Cromwell spoke more than 3000 words on 23 April 1654.”
– “OJ Simpson murdered his wife.”
– “You (the reader) have an as yet undiagnosed form of cancer.”
• Even an “unknown” event:
– “My bank will be forced out of business in the next two years as a result
of a threat that we do not yet know about.”
The Frequentist measure of uncertainty

• Frequentist definition of chance of an elementary event is the frequency with which that event would be observed over an infinite number of repeated experiments.
  – The chance of an event is simply the sum of the frequencies of the elementary outcomes of the event divided by the total number of events
  – If an experiment has n equally likely elementary events then the chance of any event is m/n where m is the number of elementary events for the event
• Assumes repeatability of the experiment: the experiment is repeatable many times under identical conditions.
• Assumes independence of experiments: the outcome of one experiment does not influence the result of any subsequent experiment.
Experiments, Outcomes and Events
Slide 8
Joint Experiments

"Disease X" (yes, no) and "Test for disease X" (pos, neg):
(yes, pos), (yes, neg), (no, pos), (no, neg)

Toss a coin and roll a die:
(H,1), (H,2), (H,3), (H,4), (H,5), (H,6)
(T,1), (T,2), (T,3), (T,4), (T,5), (T,6)
Slide 10
Joint Events and Marginalization
using Balls in an Urn
Slide 11
Calculating number of events using
Combinations
The number of combinations of n things taken r at a time:

Comb(n, r) = n! / (r! (n − r)!)

The number of elementary events in the UK lottery is:

Comb(49, 6) = 49! / (6! (49 − 6)!) = 13,983,816
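A minimal Python sketch of the same calculation (math.comb computes n! / (r! (n − r)!)):

from math import comb, factorial

print(comb(49, 6))                                        # 13983816 elementary events in the UK lottery
print(factorial(49) // (factorial(6) * factorial(43)))    # the same value from the formula directly
print(1 / comb(49, 6))                                    # chance of one ticket matching all six numbers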
Slide 12
Calculating number of events using
Permutations
Number of possible orderings = 52! = 80658175170943878571660636856403766975289505440883277824000000000000

P(order) = 1/52! ≈ 0

Slide 13
Determining probability using a repeated coin tossing experiment
But surely subjective measures
are irrational?
• Any subjective measure of uncertainty
cannot be validated
• Different experts will give different
subjective measurements
• They are therefore non-scientific and objective measures are to be preferred
Slide 15
But can all uncertain problems
be expressed as frequency
problems?
• Consider issues even with coin tossing
– Is any coin perfectly fair? Throw
100,000 times and observe 50,001
heads.
– What happens if you observe 100
Heads in 100 tosses? Is assumption
of fairness even relevant?
• Any attempt to measure uncertainty
inevitably involves some subjective
judgement about
– The mechanism generating the
events
– The information available to you when
making your model and decisions
• How do you assess uncertainty when
frequentist assumptions don’t hold?
Slide 16
Combining Subjective and Objective
information
• More than a few drinks later the Casino closes forcing you to gamble
elsewhere. You know the only place open is Shady Sam’s but you
have never been. The doormen give you a hard time, there are
prostitutes at the bar and hustlers all around. Yet you decide to play
the same dice game.
• What is the probability of a six?
Slide 17
Combining Subjective and Objective
information
Slide 18
Causal Revelation and Absence of Information

1. Modeller: My model contains two variables: 'lose job' causes 'cannot pay debt'. If a borrower loses employment they cannot pay the debt back 90% of the time.
2. Observer: But by losing income they can still pay the debt 10% of the time. Why is that? This looks odd. How can they still have a 10% chance of paying the debt without a job?
3. Modeller: Because they could sell their house and can still pay.
4. Observer: OK, that isn't in the model, let's add it (the model now has two causes for 'can pay debt': lose job and sell house).
5. Observer: But if the borrower loses their job but doesn't sell their house, what's the chance of paying the debt?
6. Modeller: Answer – 5%.
7. Observer: How could someone still pay? There must be some other reason.
8. Modeller: Perhaps they could sell their grandmother into slavery?
9. Observer: OK, sounds a bit extreme, but let's add that to the model. What's the chance of not paying the debt now?
10. Modeller: If the borrower loses their job, doesn't sell the house and doesn't sell their grandmother into slavery, then the chance is 1%.
11. Observer: But why 1%?
12. Modeller: Because they may rob a bank!
13. Observer: OK, let's add that to the model
...dialogue continues
• At some point the modeller reveals all possible causal mechanisms and achieves a zero probability of the borrower not paying their debt in the presence of all possible causes, thus rendering the model deterministic
• Our probabilities represent:
  • Causal mechanisms that are NOT in the model
  • Our lack of information about possible causes
• What is or isn't in the model depends on our cognitive revelation, imagination, experience and availability of information

Einstein said: 'God does not play dice with the universe'
The Subjectivist Viewpoint

• Probability as an expression of a rational agent's degrees of belief about uncertain propositions
• Rational agents may disagree. There is no "one correct probability"
• A rational agent will update and adapt their model and probabilities when new (relevant) information becomes available
• If she receives feedback her assessed probabilities will in the limit converge to observed frequencies
• With enough information, and the same assumptions, different observers will converge on the same probability
Frequentist versus Subjective Viewpoints

Frequentist:
• Can be legitimately applied only to repeatable problems
• Must be a unique number capturing an objective property in the real world
• Only applies to events generated by a random process
• Can never be considered for unique events

Subjective:
• Is an expression of a rational agent's degrees of belief about uncertain propositions
• Rational agents may disagree. There is no "one correct" measure
• Can be assigned to unique events
Slide 22
2nd Edition Chapter 5:
Slide 23
Probability Notation and Examples

• Event, E, is "next toss of a coin is a head".
• Probability of E is P(E)
• Assignment of values depends on beliefs and context:
  – Truly fair: P(E) = 0.5
  – 100 tosses, 20 heads: P(E) = 0.2
  – Inspected the coin and seen it has two heads: P(E) = 1.0
• Experiment has exhaustive events:
  – {head, tail}, or
  – {head, tail, side}
Probability Axioms:

• Axiom 5.1 (Unit measure): The probability of any event is a number between zero and one.

0 ≤ P(E) ≤ 1

P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
Joint probability of independent events (Multiplication rule)

Observation 5.1 Probability of independent events: If E1 and E2 are independent events, then the probability that both E1 and E2 happen is equal to the probability of E1 times the probability of E2:

P(E1 ∩ E2) = P(E1) × P(E2)

P(E1 ∩ E2) = P(E1, E2)
Joint probability of dependent events

Observation 5.2 Probability of dependent events: If E1 and E2 are dependent events, then the probability that both E1 and E2 happen is equal to the probability of E1 times the probability of E2 given E1:

P(E1 ∩ E2) = P(E1) × P(E2 | E1)
Slide 29
Simple Example – Six on Two Dice?
P(E) = P(E1 = 5 ∪ E2 = 6)
     = P(E1 = 5) + P(E2 = 6) − P(E1 = 5 ∩ E2 = 6)
     = 1/6 + 1/6 − 0
     = 1/3
Joint event – two Dice

P(Dice1 = {5,6} ∪ Dice2 = {5,6})

Probability of 5 or 6 on one or both of two dice, denoted Dice A, Dice B.

Cleaner notation:

P(A) = P(B) = P(5 ∪ 6) = P(5) + P(6) − P(5 ∩ 6) = 1/6 + 1/6 − 0 = 1/3

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/3 + 1/3 − (1/3 × 1/3) = 2/3 − 1/9 = 5/9

P(A ∪ B) = 1 − P(¬A, ¬B) = 1 − (2/3 × 2/3) = 5/9
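A brute-force Python check (not on the original slide) of this result, enumerating the 36 equally likely outcomes of the two dice:

from fractions import Fraction

outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Event A: die 1 shows 5 or 6; event B: die 2 shows 5 or 6
favourable = [o for o in outcomes if o[0] in (5, 6) or o[1] in (5, 6)]

print(Fraction(len(favourable), len(outcomes)))    # 5/9
print(1 - Fraction(2, 3) * Fraction(2, 3))         # 1 - P(not A)P(not B) = 5/9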
Probability
distributions
Slide 36
Joint probability distribution
• The joint probability distribution of A with states {a1, a2, a3} and B
with states {b1, b2, b3, b4} is the probability distribution for the joint
event (A, B):
• Can call A and B variables with states rather than experiments and
events
Slide 37
(Random) Variable Types
• Types of variables (defined by state type):
– Discrete Labeled {Spurs, West Ham, Chelsea}
– Boolean {True, False}
– Continuous Value E.g. {…, -0.5, 0.1, 0.7, 0.7001, 0.8, 0.9501, ….}
– Integer Value E.g. {…, -1, 0, 1, 2, 3, 4, ….., n}
– Interval E.g. {…., ]0,1], ]1, 2], ….., ]n-1, n] }
• Infinite
– Continuous E.g. { ]-infinity, 0], ….,]0,1], ]1, 2], …..]0, +infinity] }
Slide 38
Joint probability distribution and probability of marginalized events
Slide 39
Marginalization
Slide 40
Dealing with more than two Variables

If each of n variables has 10 states then we need 10^n probabilities!

Slide 41
Fundamental Rule of conditional probability

Axiom 5.4 Probability of dependent events: If A and B are dependent events, then the probability that event 'B occurs given that A has already occurred' is:

P(B | A) = P(A ∩ B) / P(A) = P(A, B) / P(A)

Theorem 5.3:

P(A, B) = P(A ∩ B) = P(B | A) P(A)
Example – Dependent Events

P(A) = 0.4
P(B | A) = 0.7
P(A, B) = P(A) × P(B | A) = 0.4 × 0.7 = 0.28

The Chain Rule

We can therefore decompose any joint probability into a series of 'chained' conditional probability statements, e.g.:

P(A, B, C) = P(B ∩ C ∩ A) = P(A | C ∩ B) P(B ∩ C) = P(A | B ∩ C) P(C | B) P(B) = P(A | B, C) P(C | B) P(B)
Summary
P(A | B) = P(A, B) / P(B)   Fundamental rule of probability

P(A | B)   Conditional probability
Slide 45
Binomial Distribution Example
• Factory mass produces components with a failure rate, F, of 20% per year
P(F) = 0.2    P(not F) = 0.8
• Customer buys 5 components and needs to predict number that will fail within one year
of use
• Experiment with six outcomes: {0, 1, 2, 3, 4, 5} (Assume components are independent)
• Joint failure events:

P(not F) × P(not F) × P(not F) × P(not F) × P(not F) = 0.8^5 = 0.32768

P(F) × P(F) × P(F) × P(F) × P(F) = 0.2^5 = 0.00032

P(not F) × P(not F) × P(not F) × P(F) × P(F) = 0.2^2 × 0.8^3 = 0.02048

• How many ways can 2 components fail? Comb(5, 2) = 5! / (2! (5 − 2)!) = 10

• Probability of 2 failing is: Comb(5, 2) × 0.2^2 × 0.8^3 = 10 × 0.02048 = 0.2048

• In general, with Comb(n, x) = n! / (x! (n − x)!), for example for n = 3 and p = 1/3:

P(X = 1) = Comb(3, 1) × (1/3)^1 × (1 − 1/3)^2 = 3 × (1/3) × (2/3)^2 = 4/9
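A short Python sketch of the full Binomial distribution for the five components, using P(F) = 0.2 from the slide:

from math import comb

p_fail, n = 0.2, 5

# P(X = x) = Comb(n, x) * p^x * (1 - p)^(n - x)
for x in range(n + 1):
    print(x, round(comb(n, x) * p_fail**x * (1 - p_fail)**(n - x), 5))
# 0 -> 0.32768, 1 -> 0.4096, 2 -> 0.2048, 3 -> 0.0512, 4 -> 0.0064, 5 -> 0.00032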
Slide 47
Monty Hall Game Show

We have three doors, behind which one has a valuable prize and two have something worthless. After the contestant chooses one of the three doors, Monty Hall (who knows which door has the prize behind it) always reveals a door (other than the one chosen) that has a worthless item behind it. The contestant can now choose to switch doors or stick to his or her original choice.
Slide 48
Monty Hall Game Show Answer
[Table (garbled in extraction): each row shows an arrangement of the prize (X) and worthless items (r) behind the doors and whether switching or sticking wins for that arrangement; switching wins in two thirds of the arrangements and sticking in one third]
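A simple Monte Carlo sketch (not part of the original slides) confirming that switching wins about 2/3 of the time and sticking about 1/3:

import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # Monty opens a door that is neither the chosen door nor the prize door
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / trials

print("stick :", play(switch=False))   # ~0.33
print("switch:", play(switch=True))    # ~0.67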
Slide 3
All Probabilities are Conditional
The probability of every event is actually conditioned on K - the background
knowledge or context. So if A is event “roll a 4 on a die”
• P(A | K) = 1/6 where K is the assumption that the die is genuinely fair and
there can be no outcome other than 1, 2, 3, 4, 5, 6.
• P(A | K) = 1/6 where K is the assumption that there can be no outcome
other than 1, 2, 3, 4, 5, 6 and I have no reason to suspect that any one is
more likely than any other.
• P(A | K) = 1/8 where K is the assumption that the only outcomes are 1, 2, 3,
4, 5, 6, ‘lost’, ‘lands on edge’ and all are equally likely.
• P(A | K) = 1/10 where K is the assumption that the results of an experiment,
in which we rolled the die 200 times and observed the outcome “4” 20
times, is representative of the frequency of “4” that would be obtained in
any number of rolls.
Slide 4
Updating Beliefs when we Observe
Evidence
H E
(Hypothesis) (Evidence)
Slide 5
Rev
Bayes
Slide 6
Derivation of Bayes Theorem

From the fundamental rule:

P(H | E) = P(H, E) / P(E)

P(E | H) = P(H, E) / P(H)

⇒ Bayes theorem:

P(H | E) = P(E | H) P(H) / P(E)
• Boolean variables {H, not H}:

P(H | E) = P(E | H) P(H) / P(E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | ¬H) P(¬H)]

• More generally, for mutually exclusive and exhaustive hypotheses h1, …, hn:

P(hi | E) = P(E | hi) P(hi) / Σ_{i=1..n} P(E | hi) P(hi)
• In a particular chest clinic 5% of all patients who have been to the clinic are ultimately diagnosed as having lung cancer (H):
  P(H = true) = 0.05
• We want to update our belief in the hypothesis given the evidence: P(H | E) = ?
• We know from the fundamental rule that:

That is about 49 or 50 people.

So about 1 out of 50 who test positive actually have the disease
Slide 14
Alternative
event tree
representation
Slide 15
Harvard Medical School using Bayes’
Theorem
H: {disease, ¬disease},  E: {positive, negative}

P(H = disease) = 1/1000
P(E = positive | H = disease) = 1.0
P(E = positive | H = ¬disease) = 0.05
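A minimal Python sketch of the Bayes calculation with these numbers:

p_disease = 1 / 1000          # prior P(H = disease)
p_pos_given_disease = 1.0     # P(E = positive | disease)
p_pos_given_healthy = 0.05    # P(E = positive | no disease)

# Marginal probability of a positive test result
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes theorem: P(disease | positive)
posterior = p_pos_given_disease * p_disease / p_pos
print(round(posterior, 4))    # ~0.0196, i.e. roughly 1 in 50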
Slide 16
But the good news is you never have to do
those calculations manually
Slide 17
Marginalization Notation
• Each time we marginalize the result is dependent on the current state of the
joint probability model
• As we add new evidence this state changes and Bayesian notation can make
this change look confusing!
P(B) = Σ_A P(B | A) P(A)

• We now know A = True and marginalize again:

P(B) = Σ_A P(B | A = True) P(A = True)

Is the P(B) in each equation the same?
P(H | E) = P(E | H)
Slide 19
Prosecutor’s and Defendant’s
Fallacies
• “Suppose a crime has been committed. DNA found at the scene matches the
defendant. It is of a type which is present in 1 in a 1000 people.”
• Prosecutor’s fallacy
– “There is a 1 in a 1000 chance that the defendant would have the DNA
match if he were innocent. Thus, there is a 99.9% chance that he is guilty.”
– This simply (and wrongly) assumes P(H | E) = P(E | H) and also ignores the
prior P(H)
• Defendant’s fallacy
– “This crime occurred in a city of 8,000,000 people. Hence, this blood type
would be found in approximately 8,000 people. The evidence has provided
a probability of 1 in 8,000 that the defendant is guilty and thus has no
relevance.”
– This provides a correct posterior P(H | E) assuming prior P(H) = 1 in
8,000,000 but ignores the change in the posterior from the prior.
Example: Legal Reasoning
– The Birmingham Six
Slide 21
Example: Legal Reasoning
– The Birmingham Six
• Subsequent investigation showed that Nitro-glycerine traces
could be deposited by many common materials, including
playing cards. Roughly 50% of the population had such
traces. Therefore assume 𝑃 𝐸 = 𝑁𝐼𝑇𝑅𝑂 𝐻 = ¬𝐻𝐸) = 0.5
• Assume prior is 𝑃(𝐻 = 𝐻𝐸) = 0.05
• What is 𝑃 𝐻 = 𝐻𝐸 𝐸 = 𝑁𝐼𝑇𝑅𝑂)?
LR = P(E | H) / P(E | not H)  –  the Likelihood Ratio (LR)
Slide 24
Second Order Probability
• Let us assume someone has smuggled a die out of either Shady Sam’s or Honest
Joe’s, but we do not know which casino it has come from. We wish to determine the
source of the die from (a) a prior belief about where the die is from and (b) data
gained from rolling the die a number of times.
• Assume: P(Joe) ≡ P(p = 1/6) = 0.7,  P(Sam) ≡ P(p = 1/12) = 0.3
• Data consists of one “6” and nineteen “not 6” results.
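A short Python sketch of the calculation implied here: the Binomial likelihood of one "6" in 20 rolls under each hypothesis, combined with the priors:

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

prior_joe, prior_sam = 0.7, 0.3
lik_joe = binom_pmf(1, 20, 1 / 6)    # P(data | die from Honest Joe's), ~0.104
lik_sam = binom_pmf(1, 20, 1 / 12)   # P(data | die from Shady Sam's), ~0.319

p_data = prior_joe * lik_joe + prior_sam * lik_sam   # marginal likelihood, ~0.1688
print(p_data)
print("P(Joe | data) =", prior_joe * lik_joe / p_data)   # ~0.43
print("P(Sam | data) =", prior_sam * lik_sam / p_data)   # ~0.57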
P(data) = 0.168756

Slide 25
The null hypothesis H and p-values
Slide 26
Hypothesis test for coin
Cumulative probability = 0.99 at 63 heads

• If you do 100 flips you would reject the null hypothesis that it is a fair coin once you had seen more than 62 heads
Slide 27
Lindley’s Paradox
• Assume we have 999 ‘fair’ coins and one coin known to be biased toward
‘heads’. Select a coin randomly. 𝑃(𝐻0 ) = 0.999 , 𝑃(𝐻1 ) = 0.001
• Assume:
– 𝑃(𝐻0 : 𝑝 = 0.5)
– 𝑃(𝐻1 : 𝑝 = 0.9)
• As good experimenters we set a p-value of 0.01 in advance. Then we must
reject 𝐻0 if we see X = 63
• Use binomial to calculate probability of evidence X = 63
Slide 28
Lindley’s Paradox
• Then by Bayes:
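A minimal Python sketch of the Bayesian side of the paradox, assuming the observation is the 63 heads in 100 tosses from the previous slide:

from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

prior_h0, prior_h1 = 0.999, 0.001    # fair coin vs coin biased towards heads
lik_h0 = binom_pmf(63, 100, 0.5)     # ~0.0027, yet X = 63 is rejected at the 0.01 level
lik_h1 = binom_pmf(63, 100, 0.9)     # vanishingly small: 63 heads is far too FEW for a 0.9 coin

posterior_h0 = prior_h0 * lik_h0 / (prior_h0 * lik_h0 + prior_h1 * lik_h1)
print(posterior_h0)                  # effectively 1: the posterior overwhelmingly favours the fair coin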
Slide 29
2nd Edition Chapter 7:
Slide 30
Since it is important for Norman to arrive on time for work, a
number of people (including Norman himself) are interested in
the probability that he will be late. Since Norman usually
travels to work by train, one of the possible causes for Norman
being late is a train strike. Because it is quite natural to reason
Slide 31
Simple Risk Assessment Problem - Probabilities
P(N = True) = P(N = True | T = True) P(T = True) + P(N = True | T = False) P(T = False)
            = 0.8 × 0.1 + 0.1 × 0.9
            = 0.17

P(T = False | N = True) = P(N = True | T = False) P(T = False) / P(N = True)
                        = (0.1 × 0.9) / 0.17
                        = 0.52941
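The same calculation as a short Python sketch:

p_strike = 0.1                                   # P(Train strike = True)
p_late_given = {True: 0.8, False: 0.1}           # P(Norman late = True | Train strike)

# Marginalisation: P(Norman late = True)
p_late = p_late_given[True] * p_strike + p_late_given[False] * (1 - p_strike)   # 0.17

# Bayes: revise belief in a train strike after learning that Norman is late
p_strike_given_late = p_late_given[True] * p_strike / p_late                    # ~0.47059
print(p_late, p_strike_given_late, 1 - p_strike_given_late)                     # 0.17, 0.47059, 0.52941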
Slide 32
Simple Risk
Assessment
Problem –
Automatic in
AgenaRisk
Slide 33
Accounting
for Multiple
Causes and
Effects
P(O, M, T, N) = P(M | O, T) P(O) P(N | T) P(T)
Slide 34
Calculations – Martin Late
Slide 35
Calculations – Martin Is Late given Norman
is Late
P(M = True | N = True) = Σ_{O,T} P(M = True | O, T) P(O) P(T | N = True)
Slide 36
Simple Risk
Assessment
Problem –
Automatic in
AgenaRisk
Slide 37
Let's add an edge!

Complex Case

Declare the conditional probability tables to be the same:

P(N | T, O) = P(M | T, O)
Slide 38
Calculations – Complex Case
Slide 39
Calculations – Complex Case
P(O, T | N = True) = P(N = True, O, T) / P(N = True) = P(N = True | O, T) P(O) P(T) / P(N = True)

Slide 40
Calculations – ‘Martin Is Late’ given
‘Norman is Late’ in Complex Case
P(O, T | N = True) = P(N = True | O, T) P(O) P(T) / P(N = True)
Slide 41
Calculations – ‘Martin Is Late’ given
‘Norman is Late’ in Complex Case
P(O, T | N = True) = P(N = True | O, T) P(O) P(T) / P(N = True)
Note: Notice that the denominator in all cases is constant at 0.446 – it acts as a normalization constant to ensure all probabilities sum to one.

Slide 42
Calculations – ‘Martin Is Late’ given
‘Norman is Late’ in Complex Case
P(M = True | N = True) = Σ_{O,T} P(M = True | O, T) P(O, T | N = True)
Slide 43
Calculations – Complex Case Proof
P(M | O, T, N) = P(M, O, T, N) / P(O, T, N)          First use the Fundamental Rule

               = P(M | O, T) P(N | O, T) P(O) P(T) / Σ_M P(M | O, T) P(N | O, T) P(O) P(T)

Next calculate the marginal P(M):

P(M) = Σ_{N,O,T} P(M | O, T) P(N | O, T) P(O) P(T) / Σ_{M,N,O,T} P(M | O, T) P(N | O, T) P(O) P(T)

Since Σ_{M,N,O,T} P(M | O, T) P(N | O, T) P(O) P(T) = 1:

P(M) = Σ_{N,O,T} P(M | O, T) P(N | O, T) P(O) P(T)
     = Σ_{O,T} P(M | O, T) Σ_N P(O, T | N) P(N)
     = Σ_{O,T} P(M | O, T) P(O, T) = P(M)

Slide 44
Lesson Summary
Introduction to Bayesian
Networks and AgenaRisk
Slide 3
Bayesian and Bayesian
Network Applications
• Intelligent search
• Collaborative filtering
• Recommendation engines
• Machine learning
• Expert systems
• Data mining
• Risk assessment
• Computer vision
Slide 4
Definition
Slide 5
Simple Model
Slide 8
The crucial graphical feature of a BN
Slide 9
BN for the Asia example model
𝑃(𝐴)
𝑃(𝑆)
𝑃(𝐶|𝑆)
𝑃(𝐵|𝑆)
𝑃(𝑇𝐵|𝐴)
𝑃(𝑇𝐵𝑜𝐶|𝑇𝐵, 𝐶)
𝑃(𝑋|𝑇𝐵𝑜𝐶)
𝑃(𝐷|𝑇𝐵𝑜𝐶, 𝐵)
Slide 10
Properties of BNs
• Computations on full joint probability distribution not
feasible for large problems.
• Exploit conditional independence assumptions to
reduce combinatorial explosion
• Relatively easier to elicit conditional probability tables
from experts than ask for joint probabilities
• Causal structure easier to understand than
mathematics
• Fast algorithms are available to compile and execute
BNs (Pearl and Lauritzen & Spiegelhalter)
• Evidence is propagated throughout BN by exploiting
Bayes Theorem
• No need to calculate by hand nor use standard
analytic formulations (e.g. conjugacy)
• Forecasts can be done with incomplete evidence
Slide 11
• https://www.agenarisk.com/installation
-upgrade-guide
Download • If you have any installation or setup issues
Link DO NOT contact AgenaRisk!
• Post to the forum….
AgenaRisk License
Instructions
• Floating License Server:
– Installed in ITL
– You can download it onto your laptop
or any other machines
– Runs on Windows/Mac/Linux
– Must have live internet connection
• Floating License Server address:
FLOAT-427DB0-29EAEF-82D719-82D4FA-D3B991-CE9D76
• RTFM – AgenaRisk user manual
Slide 13
Practical Session 1
Start AgenaRisk
Open Asia Model from \Model Library\Introductory\Asia\Asia.ast
Open AgenaRisk 10 User Manual
Explore different graph views
Examine Risk table view
Batched evidence
Soft evidence
Hide nodes
Add label, picture, edge annotations
Add notes
Perform sensitivity analysis
Slide 15
How to
build a BN
Slide 16
How to
Execute
and Query
a BN
Slide 17
Lesson Summary
Slide 3
Marginalization by Variable Elimination
P(A, T, L, E, X) = P(A) P(T | A) P(L) P(E | T, L) P(X | E)

P(T) = Σ_{A,L,E,X} P(A) P(T | A) P(L) P(E | T, L) P(X | E)
Slide 5
Causal (Evidential) Trail (Serial)
Slide 6
Causal Evidential Trail (Serial)
Slide 7
Common Cause (Diverging)
Slide 8
Common Cause (Diverging)
Slide 10
Common Effect (Converging)
Slide 11
Common Effect (Converging) Example
P(C | A, B) as a CPT (entries shown as joint probabilities):

        a1          a2
        b1    b2    b1    b2
c1      .7    .5    .4    .8
c2      .3    .5    .6    .2

Instantiate C = c1:

        a1          a2
        b1    b2    b1    b2
c1      .7    .5    .4    .8
c2      0     0     0     0

P(A) = (.5, .5)
P(B) = (.46, .54)

then Instantiate A = a2:

        a1          a2
        b1    b2    b1    b2
c1      0     0     .4    .8
c2      0     0     0     0

P(A) = (0, 1)
P(B) = (.33, .66)   – Changed!

Note: evidence instantiation is equivalent to multiplying 0 or 1 into the NPT entries

Slide 12
Determining d-separation
• Enter evidence on A
• G is d-separated from A – the path to G is blocked
• J is updated by F; J is updated by L
• BUT no evidence is entered on L or a descendant of L (there is none), so that converging connection is not opened
Slide 13
Evidence: B = b and M = m
Overview of the Junction Tree algorithm

• Construct junction tree from BN graph
• Reduce graph to junction tree containing serial connections with no loops
• Each converging BN fragment joined together into a single cluster
• Diverging and serial BN fragments serially connected to allow blocking of evidence flows
• Propagate evidence through junction tree
• When evidence is entered calculate changes to the BN locally
• Propagate impact of evidence globally through junction tree
• Use message passing to update likelihoods throughout BN
• Calculations done using:
  • Bayes theorem
  • Fundamental rule
  • Marginalisation
From Bayesian Network to Junction Tree
Slide 15
From Bayesian Network to Junction Tree
Slide 16
Creating a Moral Graph
[Figure: step-by-step moralisation and triangulation of the BN graph over nodes A, B, C, D, E, F, G, H]

Clusters: GEH, GEC, DEF, ACE, ABD, ADE (separators include AE and E)

Slide 18
Cluster Identification
Slide 20
Two Step Propagation – Collection
ACE —[CE]— CEG —[EG]— GEH
Slide 21
Two Step Propagation – Distribution
ACE —[CE]— CEG —[EG]— GEH
Slide 22
Table Notation used during propagation
Slide 23
Propagation
Cluster chain: AB —[B]— BC

t(B) = Σ_A t(AB)                     (separator table before evidence)
t*(B) = Σ_C t*(BC)                   (updated separator after evidence enters BC)
t*(AB) = t(AB) × t*(B) / t(B)        (the 'trick': multiply by the new separator table and divide by the old one)

* indicates evidence instantiated in the cluster

Slide 24
Monty Hall Game Show – Simple
Solution
Monty Hall Game Show – More
Complex Solution
Simpson’s paradox drug example
A new drug is being tested on a group of 800 people (400
men and 400 women) with a particular ailment.
For men: 70% (70 out of 100) taking the placebo recover, but only 60%
(180 out of 300) taking the drug recover.
For women: 30% (90 out of 300) taking the placebo recover, but only 20%
(20 out of 100) taking the drug recover.
Slide 28
Simpson’s paradox explained
You cannot ignore Sex as a cause, though ideally want to design drug
trial so that drug is administered equally across all relevant strata,
including sex. Slide 30
Proposition 1: Correlation implies causation. No!
Refuting the
Assertion ‘If
There Is No
Correlation
Then There
Cannot be Proposition 2: Causation implies correlation. No!
Causation’
Causal model
Question: Is it beneficial to take the drug?
Slide 32
Modelling Counterfactuals
Suppose we know that a person who took the drug did not recover. Without knowing the
sex of the person, what is the probability that the person would have recovered if they had
not taken the drug?
Slide 33
Lesson Summary
Slide 3
Growth in Probability Table Size
Suppose that we decided to change the states of the water level nodes to be:
very low, low, medium, high, very high. Then the NPT now has 5 × 5 × 4 = 100
cells. If we add another parent with 5 states then this jumps to 500 entries,
and if we increase the granularity of rainfall to, say, 10 states then the NPT has
1,250 cells.
• Generally the most difficult to handle since few ‘non-manual’ tricks can be applied
• Only main non-manual tricks available are to use
• Comparative expressions
• Partitioned expressions
Slide 5
Using Comparative Expressions
Slide 6
Boolean Nodes
• Any state is the complement of the other and they are mutually exclusive
• OR function: if (A == “True” || B== “True”, “True”, “False”)
• AND function: if (A == “True” && B== “True”, “True”, “False”)
OR function (columns A,B = TT, TF, FT, FF):
True    1  1  1  0
False   0  0  0  1

AND function (columns A,B = TT, TF, FT, FF):
True    1  0  0  0
False   0  1  1  1
OR Example
Slide 8
OR Example – Assigned and Calculated Marginals
Slide 9
OR Example Diagnostic Inference
Slide 10
Naïve Bayes
Classifier
Model
Slide 12
AND Example
Slide 13
The M from N operator
Slide 14
NoisyOR
Like the OR function but where there is uncertainty (e.g even if all
the causal factors are “True” we cannot say with certainty that the
person will suffer a heart attack before 60).
Slide 15
NoisyOR
Y = NoisyOR(X1, v1, X2, v2, …, Xn, vn, l)

Equivalent to:

P(Y = true | X1, …, Xn) = 1 − (1 − l) × ∏_{Xi is true} (1 − vi)

Where l is the leak value and vi is the weight (probability) associated with cause Xi
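A minimal Python sketch of this NoisyOR function; the cause weights below are hypothetical illustration values, not figures from the slides:

def noisy_or(true_causes, weights, leak=0.0):
    # P(Y = true | causes) = 1 - (1 - leak) * product of (1 - v_i) over the causes that are true
    p_false = 1.0 - leak
    for cause in true_causes:
        p_false *= 1.0 - weights[cause]
    return 1.0 - p_false

weights = {"smoker": 0.3, "high blood pressure": 0.2, "obese": 0.1}   # hypothetical v_i values

print(noisy_or([], weights, leak=0.05))                      # no causes true: just the leak, 0.05
print(noisy_or(["smoker", "obese"], weights, leak=0.05))     # 1 - 0.95*0.7*0.9 = 0.4015
print(noisy_or(list(weights), weights, leak=0.05))           # all causes true, still below 1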
Slide 16
NoisyOR Example
Slide 17
NoisyOR Example Calculation
Slide 18
Ranked Nodes and Functions
Ranked nodes exploit the fact that there is essentially an underlying numerical scale
Slide 19
Truncated
Normal
(TNormal)
Distribution
Slide
20
Ranked nodes underlying (hidden)
scales
Ranked nodes with TNormal NPTs
Slide 22
Ranked Nodes - The Weighted Mean Function
Slide 23
2nd Edition Chapter 10:
Slide 24
Functions and Continuous Distributions
Slide 25
Approximating a continuous distribution using
discrete intervals
Slide 26
Joint and conditional probabilities
• Joint: 𝑓(𝑥1 , . . . , 𝑥𝑗 , . . . , 𝑥𝑘 )
• Conditional (chain rule): f(x1, ..., xj, ..., xk) = f(x1 | x2, ..., xk) f(x2 | x3, ..., xk) ⋯ f(xk)
• Fundamental rule:

f(x | y) = f(x, y) / f(y)
Slide 28
Dynamic Discretization
P(X = false) = 0.5,  P(X = true) = 0.5

P(Y | X) = Normal(μ1, σ1²) if X = false;  Normal(μ2, σ2²) if X = true

Analytical solution: P(Y) = ½ N(Y | μ1, σ1²) + ½ N(Y | μ2, σ2²)

Slide 29
Dynamic Discretization – Car Costs Example
Slide 30
Dynamic Discretization – Car Costs Example
Features:
• Hybrid model with continuous
variables conditioned on discrete
• Use scenarios to evaluate options
• Statistical distributions are “mixed” by
partitioning functions acting like IF
statements
Slide 31
Parameter Learning using Dynamic Discretization
Slide 32
Maximum likelihood (frequentist) estimate
μ̂_p = p̄ = (Σ_{i=1..n} p_i) / n = 0.664

σ̂²_p = Σ_{i=1..n} (p_i − p̄)² / (n − 1) = 0.0391
• Parameters unknown
• Data known
• Set sensible priors and likelihoods
μ ~ Uniform(0, 1)
σ² ~ Uniform(0, 1)
p ~ TNormal(μ, σ², 0, 1)

P(p | μ, σ², data)

Slide 34
Bayesian Result
Slide 35
Comparing Frequentist and Bayesian Results
Frequentist
Bayesian
Slide 36
Second Order Probability
• Let us assume someone has smuggled a die out of either Shady Sam’s or Honest
Joe’s, but we do not know which casino it has come from. We wish to determine the
source of the die from (a) a prior belief about where the die is from and (b) data
gained from rolling the die a number of times.
• Assume: P(Joe) ≡ P(p = 1/6) = 0.7,  P(Sam) ≡ P(p = 1/12) = 0.3
• Data consists of one “6” and nineteen “not 6” results.
P(data) = 0.168756
Slide 37
Fixed value for 𝑝 Unknown value for 𝑝
Slide 38
Risk Aggregation
• Sum of a collection of financial assets or events, where
each asset or event is modelled as a variable:
• In cyber security we might estimate the number of
network breaches over a year and, for each breach,
have in mind the severity of loss (in terms of lost
availability, lost data or lost system integrity).
• In insurance we might have a portfolio of insurance
policies and expect a frequency of claims to be made
in each year, with an associated claim total.
• In operational risk we might be able to forecast the
frequency and severity of classes of events and then
wish to aggregate these into a total loss distribution
for all events (the so-called Loss Distribution Approach, LDA).
Slide 39
Risk Aggregation Example
• Where:
S0 = Binomial(0.2, 100),  S1 = Normal(50, 100),  S2 = Uniform(0, 50)
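A Monte Carlo sketch of this aggregation (an assumption: the total is taken as the simple sum S0 + S1 + S2, Binomial(0.2, 100) is read as p = 0.2 with n = 100 trials, and Normal(50, 100) as mean 50 with variance 100):

import random
import statistics

def sample_total():
    s0 = sum(random.random() < 0.2 for _ in range(100))   # Binomial(p = 0.2, n = 100)
    s1 = random.gauss(50, 100 ** 0.5)                      # Normal(mean = 50, variance = 100)
    s2 = random.uniform(0, 50)                             # Uniform(0, 50)
    return s0 + s1 + s2

totals = sorted(sample_total() for _ in range(100_000))
print("mean ~", statistics.mean(totals))                   # ~ 20 + 50 + 25 = 95
print("99th percentile ~", totals[int(0.99 * len(totals))])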
Slide 40
Risk Aggregation
Slide 41
Compound Sum Analysis
Slide 42
Lesson Summary
Slide 3
Use d-connections for eliciting knowledge?
Slide 4
Do arc directions represent inference or causality?
Slide 6
Cause to effect and effect to cause equivalent
Slide 9
Cause Consequence Idiom
Slide 11
Joining Cause Consequence Idioms
Slide 12
Risk as Cause-Consequence
Drive Drive
fast? Speed fast?
warnings? Crash?
Make
Make
Crash? meeting?
meeting?
Seat
Nerves?
belt?
Win
Injury? contract?
Slide 13
More Complex Example
Slide 14
Measurement Idiom
Case 1: When the inaccuracy is fixed and Case 2 (Indicators): Only indirect measures are
known because direct measurement is possible
possible
Slide 16
Measurement Idioms: Implicit and Explicit
Slide 17
Data mined
Bayesian Network
model
Assumption 1. Where a
single expert provides their
opinion
Assumption 2. Where a
single expert is allowed to
make three repeated
judgments on the same
product
Assumption 3. Where
different independent
experts are used
Assumption 4. Where
different experts are used
but they suffer from the
same inaccuracy, that is,
they are dependent in some
significant and important
way
Slide 22
Definitional/Synthesis Idiom
velocity = distance / time
• Case 2: Hierarchical Definitions
Slide 23
Definitional/Synthesis Idiom -divorcing
Slide 24
• The induction idiom is simply a
model of statistical induction to
learn some parameter that
might then be used in some
other BN idiom.
• For example, we might learn the
accuracy of a thermometer and
then use the mean accuracy as a
probability value in some NPT
using the measurement idiom.
• None of the reasoning is
explicitly causal.
• The focus of Bayesian statistics is
induction and inference.
Induction Idiom
Induction Idiom: Model example
Asymmetry: Impossible paths
Person descends a staircase where there is a risk they will slip.
If they slip they fall either forwards or backwards:
Slips Yes No
Forwards 0.1 ??
Backward 0.9 ??
Slide 27
Asymmetry and Event Trees
• But suppose problem extended:
▪ If the person falls forwards they will be startled (but otherwise unharmed).
▪ If the person falls backwards they might break their fall or not.
▪ If they break their fall they will be bruised, but if not they will suffer a head
injury.
• Can always simply use BN equivalent of event tree.
• Event tree is asymmetric – some future states are unreachable from past states:
Impossible paths!
No Outcome OK
Slips
Slide 28
Asymmetry: Bayesian Network Solutions
• BN solution 1:
• BN solution 2:
Slide 29
Mountain pass problem
Mountain pass problem: We want to arrive at an appointment to visit a friend in
the next town. We can either take a car or go by train. The car journey can be
affected by bad weather, which might close the mountain pass through which
the car must travel. The only events that might affect the train journey are
whether the train is running on schedule or not; bad weather at the pass is
irrelevant to the train journey.
Slide 30
Mountain pass: obvious BN solutions do not work
Not only does the make appointment While it is possible to ease the
node have many impossible states, but problems associated with the NPT for
the model fails to enforce the crucial make appointment by introducing
assumption that ‘take car’ and ‘take synthetic nodes as shown here the
train’ are mutually exclusive problem of mutual exclusivity is still
alternatives with corresponding not addressed
mutually exclusive paths.
Slide 31
Mountain pass: BN solution
Slide 32
Lesson Summary
Slide 3
Risk and ‘Systems Perspective’
Slide 4
Operational Risk Terminology
Role of Causation
Slide 6
Swiss
Cheese
Model for
Rare
Catastrophic
Events
Slide 7
Swiss
Cheese
Model for
Rare
Catastrophic
Events
Slide 8
Bow Tie Model
Slide 9
Fault Tree Example as BN
if(Power_Failure == "True" || Computer_Failure__OR == "True", "True", "False")

if(CPU == "True" || Digital_I_O == "True", "True", "False")

mfromn(2, Power_Supply_1 == "True", Power_Supply_2 == "True", Power_Supply_3 == "True")
Slide 10
Fault Tree Example as BN
Slide 11
Common Causes, Diversity and Resilience
Common cause
Event Trees
Example: Derailment Events
Slide 14
Event Tree for Derailment Example
Slide 16
‘Soft Systems’ Approach
Slide 18
‘Soft Systems’ Approach Example
Slide 19
• What risks can occur?
• Can they occur in my process?
• How rare are they?
Operational • How reliable are our controls?
• How good is our internal and external data?
risk in • What is likely level of losses?
finance • What is worst case scenario?
• How can we improve?
• How much capital should we set aside?
Example financial
accident – rogue
trading
Slide 21
Resiliency Perspective
Slide 22
Rogue Trading
Process Controls:
1. Trade request • Front office control environment (the
2. Conduct Trade control environment affects the
probability of unauthorised trading)
3. Registration of Trade
• Back office reconciliation checks
4. Reconciliation check (performed per trade)
5. Settlement and Netting • Market positions and results
6. Entering of Trade on Trading monitoring, Value-At-Risk (VAR)
Books calculation (periodical)
• Audit checks (periodical but not as
often as the market checks)
Slide 23
Rogue Trading States
Slide 24
Influencers on Controls
Slide 25
Loss Model
Slide 26
Loss Model
E - Events
C - Controls
O – Operational
Failures
D – Dependency
Factors
F – Failures
P(E, C, O, F, D) = ∏_{t=1..T} ∏_{s=1..t} ∏_{j=1..m} ∏_{i=1..n} ∏_{k=1..o} P(E_t | E_{t−1}, C_t) P(C_t | O_{C_t}) P(O_j | F_{O_j}, D_{O_j}) P(D_k | O_{C_{t−s}}) P(F_i) P(C_0)

Slide 27
Conditional Probability Tables
P(E_t | E_{t−1}, C_t)

P(C_1 = fail | O_1, O_2) = 1 if O_1 ∪ O_2 = fail, 0 otherwise
Slide 28
Executing the Loss Model
Slide 29
Scenarios and Stress Testing
[Figure: distribution of Loss ($) showing the median loss and the 99% percentile (1-in-100-year loss)]
• Consider a scenario comprising of two events: market crash, 𝑀, and a rogue trading,
𝑅, event.
• Each of these is judged to have a percentage probability of occurring of 10% and 3%
respectively.
• For each of these we have discrete Boolean nodes in our BN, and associated with each state we must assign a loss distribution, LM and LR.
• For 𝐿𝑀 this might be:
𝑃(𝐿𝑀 |𝑀 = 𝐹𝑎𝑙𝑠𝑒)~𝑁(10,100)
𝑃(𝐿𝑀 |𝑀 = 𝑇𝑟𝑢𝑒)~𝑁(500,10000)
• For 𝐿𝑅 this might be:
𝑃(𝐿𝑅 |𝑅 = 𝐹𝑎𝑙𝑠𝑒)~0
𝑃(𝐿𝑅 |𝑅 = 𝑇𝑟𝑢𝑒)~𝑁(250,10000)
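A Monte Carlo sketch of the total loss distribution for this two-event scenario (Normal(μ, σ²) read as mean/variance; the output is illustrative only, not the AgenaRisk result):

import random

def sample_total_loss():
    market_crash = random.random() < 0.10
    rogue_trading = random.random() < 0.03
    loss_m = random.gauss(500, 10000 ** 0.5) if market_crash else random.gauss(10, 100 ** 0.5)
    loss_r = random.gauss(250, 10000 ** 0.5) if rogue_trading else 0.0
    return loss_m + loss_r

losses = sorted(sample_total_loss() for _ in range(200_000))
print("median loss ~", losses[len(losses) // 2])
print("99% percentile (1-in-100-year loss) ~", losses[int(0.99 * len(losses))])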
$687m
Slide 32
Stress Testing – Tail Analysis
Slide 33
Historical interest rates
f(Y | X ≥ 4) = ∫_X f(X ≥ 4, Y) / f(X ≥ 4) dX    ...looks tricky

P(Z = true) = 1 if X ≥ 4, 0 if X < 4

Result is: E(Y | X ≥ 4) = 60
Slide 35
Cyber Security
Modelling
• Cyber security analysis involves the
modelling of vulnerabilities in an
organisation’s information infrastructure
that might be maliciously attacked,
compromised and exploited by external
or internal agents
• Features:
– Physical assets (networks,
servers, mobile phones, etc.),
people
– Processes and procedures that
might contain security ‘holes’,
bugs, unpatched/un-updated
systems
– Other features that might present
themselves as vulnerabilities
Slide 36
Slide 37
Cyber Terminology
Slide 3
People v Collins (1964–68)

In 1964, Malcolm and Janet Collins were convicted of mugging and robbing an elderly woman in an alleyway in Los Angeles. The victim had described her assailant as a young blonde woman, and another witness saw a blonde woman with a ponytail run out of the alley and jump into a waiting yellow car driven by a black man with a moustache and beard. The Collinses were a local couple who "matched" these various characteristics.

A maths instructor assigned the following approximate estimates to the probability of the particular characteristics:
• Yellow car: 1/10
• Man with moustache: 1/4
• Woman with ponytail: 1/10
• Woman with blonde hair: 1/3
• Black man with beard: 1/10
• Interracial couple in car: 1/1000

Use the product rule to get a 1 in 12 million chance that the couple were innocent
Slide 4
Revising beliefs when you get forensic ‘match’
evidence
Imagine 1,000 other people were also at the scene. Fred has size 13. About 10 out of the 1,000 people have size 13, so Fred is one of 11 with size 13. So there is a 10/11 chance that Fred is NOT guilty – very different from the prosecution claim of 1%.
Prosecutor’s and Defendant’s
Fallacies
• “Suppose a crime has been committed. DNA found at the scene matches the
defendant. It is of a type which is present in 1 in a 1000 people.”
• Prosecutor’s fallacy
– “There is a 1 in a 1000 chance that the defendant would have the DNA
match if he were innocent. Thus, there is a 99.9% chance that he is guilty.”
– This simply (and wrongly) assumes P(H | E) = P(E | H) and also ignores the
prior P(H)
• Defendant’s fallacy
– “This crime occurred in a city of 8,000,000 people. Hence, this blood type
would be found in approximately 8,000 people. The evidence has provided
a probability of 1 in 8,000 that the defendant is guilty and thus has no
relevance.”
– This provides a correct posterior P(H | E) assuming prior P(H) = 1 in
8,000,000 but ignores the change in the posterior from the prior.
What have the law and Bayes’ in common?
Slide 11
Example: Legal Reasoning
– The Birmingham Six
Slide 12
Example: Legal Reasoning
– The Birmingham Six
• Subsequent investigation showed that Nitro-glycerine traces
could be deposited by many common materials, including
playing cards. Roughly 50% of the population had such
traces. Therefore assume 𝑃 𝐸 = 𝑁𝐼𝑇𝑅𝑂 𝐻 = ¬𝐻𝐸) = 0.5
• Assume prior is 𝑃(𝐻 = 𝐻𝐸) = 0.05
• What is 𝑃 𝐻 = 𝐻𝐸 𝐸 = 𝑁𝐼𝑇𝑅𝑂)?
LR = P(E | H) / P(E | not H)  –  the Likelihood Ratio (LR)
Slide 16
DNA Profiles
• DNA is formed from four chemical “bases” that bind together in pairs, called
“base pairs.” Each person’s DNA contains millions of such base pairs.
• A person’s DNA profile is determined by analyzing just a small number of
regions of the DNA, known as loci or markers. At each locus there are two
alleles, one inherited from mother and one from the father:
Slide 17
DNA Profiles
• The more closely two people are related, the more likely they are to
share genotypes.
• Getting a ‘reference’ sample from suspect is usually straightforward
but ‘questioned’ sample from crime scene can be more difficult
because it may be small or degraded (so-called ‘low template’ DNA
samples)
• How much of the original DNA sample can be determined from the
low template DNA sample?
– Suppose only two loci identifiable for Fred, D21 and vW1 with reference probabilities 0.09 and
0.14
» Fred's RMP will be 0.09 × 0.14 = 0.0126 = 1.26%
» A DNA match is only approx. 1 in 100!
• Even more problematic when we have mixed samples from more than
one person
• Also judgements about the peaks are made by forensic specialists
and algorithms
• Idea of a match is actually quite complex! Slide 19
DNA Profiles - Issues
Slide 20
The Case of R v Sally Clark (1998–2003)

In 1999, Sally Clark was convicted of the murder of her two young children who had died one year apart. The prosecution case relied partly on flawed statistical evidence presented by paediatrician Professor Sir Roy Meadow to counter the hypothesis that the children had died as a result of Sudden Infant Death Syndrome (SIDS) rather than murder. He asserted that there was a 1 in 73 million probability of two SIDS deaths in the same family.

• Assumes two deaths would be independent events, and hence that the assumed probability of 1/8500 for a single SIDS death could be multiplied by 1/8500.
• This (very low) probability is assumed to be equivalent to the probability of Sally Clark's innocence (prosecutor's fallacy).
• The (prior) probability of a SIDS death was considered in isolation, without comparing it with the (prior) probability of the proposed alternative, namely of a child being murdered by a parent.
Slide 21
Sally Clark Model – Original Trial
Slide 22
Sally Clark Model – Re-Trial
Slide 23
2nd Edition Chapter 16:
Slide 24
Legal Arguments
Slide 25
.. this is a
typical real
legal BN
Evidence idiom
Slide 29
Example Evidence Idiom
Slide 30
Evidence Accuracy Idiom
Slide 31
Evidence Accuracy Idiom DNA Example
Slide 32
Evidence Accuracy Idiom DNA Example
Slide 33
Idioms to deal with Motive and Opportunity
Slide 34
Idiom for Dependency Between
Different Types of Evidence
Slide 35
Camera Dependence
Suppose, for example, that the two pieces of evidence for ‘defendant present at scene’ were images from two
video cameras. If the cameras were of the same make and were pointing at the same spot then there is clear
dependency between the two pieces of evidence: if we know that one of the cameras captures an image of a
person matching the defendant, there is clearly a very high chance that the same will be true of the other
camera, irrespective of whether the defendant really was or was not present. Conversely, if one of the
cameras does not capture such an image, there is clearly a very high chance that the same will be true of the
other camera, irrespective of whether the defendant really was not present.
Slide 36
Alibi evidence
H2 influences A1 Slide 37
Explaining Away Idiom
• Explaining away idiom simple common consequence:
Enforces
mutual Slide 38
exclusivity
Agatha Christie’s play
Witness for the Prosecution
Tyrone Power, Marlene Dietrich, Charles Laughton
Academy Award Best Picture Winner 1957 Slide 39
Incriminating Evidence
Slide 40
Romaine, the Loyal Wife
There were also several pieces of exonerating evidence: the maid admitted that
she disliked Vole; the maid was previously the sole beneficiary in Miss French's
will; Vole’s blood type was the same as Miss French’s, and thus also matched
the blood found on his cuffs; Vole claimed that he had cut his wrist slicing ham;
Vole had a scar on his wrist to back this claim.
There was one other critical piece of defence evidence: Vole’s wife, Romaine,
was to testify that Vole had returned home at 9.30pm. This would place him far
away from the crime scene at the time of Miss French’s death. Slide 41
Romaine – Witness for the Prosecution!
Then during the trial Romaine was called as a witness for the
prosecution. Dramatically, she changed her story and testified
that Vole had returned home at 10.10pm, with blood on his
cuffs, and had proclaimed: ‘I’ve killed her’!
Slide 42
Scandal! Communist Overseas Lover
Just as the case looked hopeless for Vole, a mystery woman supplied the defence
lawyer with a bundle of letters. Allegedly these were written by Romaine to her
overseas lover (who was a communist!). In one letter she planned to fabricate her
testimony in order to incriminate Vole, and rejoin her lover.
This new evidence had a powerful impact on the judge and jury. The key witness
for the prosecution was discredited, and Vole was acquitted.
Slide 43
The dénouement.....
After the court case, Romaine revealed to the defence lawyer that she had
forged the letters herself. There was no lover overseas.
She reasoned that the jury would have dismissed a simple alibi from a
devoted wife; instead, they could be swung by the striking discredit of the
prosecution’s key witness!
Slide 44
Agatha Christie’s play
Witness for the Prosecution
Slide 45
Agatha Christie’s play
Witness for the Prosecution
Slide 46
Will Bayes be Accepted in the Law?
Lesson Summary
Slide 3
Estimating Reliability
Slide 4
Discrete Reliability Modeling
• Model assumptions:
𝑝𝑓𝑑~𝐵𝑒𝑡𝑎(𝛼, 𝛽, 0,1)
𝑓~𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(𝑛, 𝑝𝑓𝑑)
𝑛~𝑈𝑛𝑖𝑓𝑜𝑟𝑚(1,10000)
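A minimal sketch of the conjugate Beta-Binomial update behind this model, fixing the number of demands n for simplicity and using illustrative prior counts rather than the slide's values:

import random

# Prior: pfd ~ Beta(alpha, beta); data: f failures in n independent demands
# Conjugacy gives the posterior pfd ~ Beta(alpha + f, beta + n - f)
alpha, beta = 1, 999      # prior: roughly 1 'past' failure in 1000 demands (illustrative)
f, n = 0, 5000            # new evidence: no failures observed in 5000 demands (illustrative)

post_alpha, post_beta = alpha + f, beta + n - f
print("posterior mean pfd ~", post_alpha / (post_alpha + post_beta))   # ~1.7e-4

# Confidence that the reliability target pfd < 0.01 is met, by sampling the posterior
samples = [random.betavariate(post_alpha, post_beta) for _ in range(100_000)]
print("P(pfd < 0.01) ~", sum(s < 0.01 for s in samples) / len(samples))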
Slide 5
Beta Priors for 𝑝𝑓𝑑
𝑝𝑓𝑑~𝐵𝑒𝑡𝑎(𝛼, 𝛽, 0,1)
𝛼 is the number of
“past” successes
𝛽 is the number of
“past” failures
Reflects confidence in
people and processes
Slide 6
Discrete Reliability Example
Slide 7
Confidence in Target
Reliability target (𝑝𝑓𝑑 < 0.01)
90%
10%
Slide 9
Censoring
Slide 10
Challenging TTF estimation problem
with imperfect data
Solution
• Model TTF for each system from failure data
• Handle observations as censored (or not)
• Use hierarchical model with hyper-parameters to mix
super-class and individual systems
Slide 11
IF(pfd > 100, "True", "False")

IF(pfd > 200, "True", "False")

{λ_i, i = 1, ..., 5} ~ Gamma(α, β)

t_ij ~ Exp(λ_i), j = 1, ..., n_i,  i = 1, ..., 5

Slide 12
Dynamic Fault Trees
Slide 13
Dynamic Fault Trees
CSP gate:  τ_AND = max_i τ_i

WSP gate:  τ_WSP = τ_main if τ_spare^sb < τ_main;  τ_main + τ_spare^act if τ_spare^sb > τ_main

OR gate:   τ_OR = min_i τ_i

PAND gate: τ_PAND = τ_2 if τ_1 < τ_2;  ∞ otherwise
Slide 14
HCAS system example
IF(τ_CPUT < 100, "Fail", "On")

τ_CPU = τ_P if τ_B^sb < τ_P;  τ_P + τ_B^act if τ_B^sb > τ_P

τ_T = min(τ_CS, τ_SS)
Slide 15
HCAS example solution
𝑀𝑇𝑇𝐹 = 351
Slide 17
Software Defect Prediction
Slide 18
Simplified Defects Model
Slide 19
Node Probability Tables
Slide 20
Prior model
Shows the marginal distributions of the simple model before any evidence
has been entered. So this represents our uncertainty before we enter any
specific information about this product.
Slide 21
Scenario 1
Zero defects found and fixed in testing AND problem complexity is ‘High’
Slide 22
Scenario 2
Operational usage is “Very High”. Replicates the apparently counter-
intuitive empirical observations whereby a product with no defects found in
testing has a high number of defects post-release.
Slide 23
Scenario 3
Slide 25
Lesson Summary