Computer Science CPSC 322: Bayesian Networks: Construction
Lecture 20
Bayesian Networks:
Construction
Lecture Overview
• Recap lecture 19
• Bayesian networks: construction
• Defining Conditional Probabilities in a Bnet
• Considerations on Network Structure (time
permitting)
Chain Rule
• Allows representing a Joint Probability Distribution
(JPD) as the product of conditional probability
distributions
Chain Rule example
P(A,B,C,D)
= P(D|A,B,C) × P(A,B,C)
= P(D|A,B,C) × P(C|A,B) × P(A,B)
= P(D|A,B,C) × P(C|A,B) × P(B|A) × P(A)
= P(A) P(B|A) P(C|A,B) P(D|A,B,C)
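The decomposition above can be checked numerically. The sketch below is only an illustration: the joint distribution is made up (random weights, normalized), and the helper names `marginal` and `chain_rule_product` are assumptions, not part of the lecture.

```python
import itertools
import random

random.seed(0)  # fixed seed so the sketch is reproducible

VARS = ("A", "B", "C", "D")
assignments = list(itertools.product([True, False], repeat=len(VARS)))

# A made-up joint distribution: random positive weights, normalized to sum to 1.
weights = [random.random() + 0.01 for _ in assignments]
total = sum(weights)
joint = {a: w / total for a, w in zip(assignments, weights)}


def marginal(prefix):
    """P(first len(prefix) variables = prefix), summing out the remaining ones."""
    return sum(p for a, p in joint.items() if a[:len(prefix)] == prefix)


def chain_rule_product(a):
    """P(A) * P(B|A) * P(C|A,B) * P(D|A,B,C) for one full assignment a."""
    product = 1.0
    for i in range(1, len(a) + 1):
        # P(X_i | X_1..X_{i-1}) = P(X_1..X_i) / P(X_1..X_{i-1})
        product *= marginal(a[:i]) / marginal(a[:i - 1])
    return product


# Largest gap between a JPD entry and its chain-rule product (floating-point noise only).
max_error = max(abs(joint[a] - chain_rule_product(a)) for a in assignments)
print(max_error)
```

The chain rule holds for any joint distribution, so the gap is zero up to rounding regardless of the weights chosen.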
Why does the chain rule help us?
We will see how, under specific circumstances (independence
among variables), this rule makes the representation more compact
Marginal Independence
• Intuitively: if X ╨ Y, then
• learning that Y=y does not change your belief in X
• and this is true for all values y that Y could take
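A minimal sketch of this definition, assuming the joint distribution of X and Y is given as a small table. The function name `is_marginally_independent` and both example tables are hypothetical.

```python
import itertools


def is_marginally_independent(joint, tol=1e-9):
    """joint maps (x, y) -> P(X=x, Y=y).

    Returns True if P(X=x | Y=y) equals P(X=x) for every x and every y
    with P(Y=y) > 0.
    """
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    for x, y in itertools.product(xs, ys):
        if p_y[y] == 0:
            continue  # conditioning on an impossible value is undefined
        if abs(joint[(x, y)] / p_y[y] - p_x[x]) > tol:
            return False
    return True


# Hypothetical table built as P(X) * P(Y): learning Y never changes the belief in X.
independent = {(x, y): px * py
               for x, px in ((True, 0.3), (False, 0.7))
               for y, py in ((True, 0.6), (False, 0.4))}

# Hypothetical table where X and Y tend to agree: learning Y does change the belief in X.
dependent = {(True, True): 0.4, (True, False): 0.1,
             (False, True): 0.1, (False, False): 0.4}

print(is_marginally_independent(independent), is_marginally_independent(dependent))
```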
Exploiting marginal independence
Given the binary variables A, B, C, D:
To specify P(A,B,C,D) one needs the JPD below (2^4 = 16 rows)
A B C D P(A,B,C,D)
T T T T
T T T F
T T F T
T T F F
T F T T
T F T F
T F F T
T F F F
F T T T
F T T F
F T F T
F T F F
F F T T
F F T F
F F F T
F F F F
To specify P(A)×P(B)×P(C)×P(D) one needs the four tables below (2 rows each)
A P(A)
T
F
B P(B)
T
F
C P(C)
T
F
D P(D)
T
F
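As a rough illustration of the saving, the sketch below rebuilds all 16 JPD entries from just four numbers, assuming A, B, C, D are marginally independent. The probability values are hypothetical placeholders, since the slide leaves the tables blank.

```python
import itertools

# Hypothetical values for P(A=T), P(B=T), P(C=T), P(D=T); not from the lecture.
p_true = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4}
names = list(p_true)


def joint_entry(values):
    """P(A,B,C,D) = P(A) * P(B) * P(C) * P(D) under marginal independence."""
    product = 1.0
    for name, value in zip(names, values):
        product *= p_true[name] if value else 1.0 - p_true[name]
    return product


# All 16 rows of the full table, derived from only 4 stored numbers.
full_table = {values: joint_entry(values)
              for values in itertools.product([True, False], repeat=len(names))}

print(len(p_true), "stored numbers ->", len(full_table), "JPD rows")
```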
Conditional Independence
[Figure: network with nodes Power-w0, Lit-l1, Up-s2]
Conditional vs. Marginal Independence
Two variables can be conditionally but not marginally independent
• ExamGrade and AssignmentGrade
[Figure: network with nodes Understood Material, Assignment Grade, Exam Grade]
If A, B, C, D are Boolean variables
P(D | A,B,C) is given by the following table
A B C P(D=T|A,B,C) P(D=F|A,B,C)
T T T
T T F
T F T
T F F
F T T
F T F
F F T
F F F
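One possible way to hold such a table in code, as a sketch: store only P(D=T | A,B,C) for each of the 8 parent combinations and derive the P(D=F | A,B,C) column as its complement. The 0.5 entries are placeholders because the slide leaves the values blank; the names `cpt_d_true` and `p_d` are assumptions.

```python
import itertools

parent_names = ("A", "B", "C")

# One row per combination of parent values, in the same order as the table above.
rows = list(itertools.product([True, False], repeat=len(parent_names)))

# Placeholder value for every row; the source table does not give the numbers.
cpt_d_true = {row: 0.5 for row in rows}


def p_d(d, a, b, c):
    """P(D=d | A=a, B=b, C=c); the D=F column is 1 minus the D=T column."""
    p_true = cpt_d_true[(a, b, c)]
    return p_true if d else 1.0 - p_true


print(len(rows))
```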
Bayesian Networks: Intuition
[Figure: example networks over the nodes Power-w0, Lit-l1; Understood Material, Assignment Grade, Exam Grade; Smoking At Sensor, Fire, Alarm]
Belief (or Bayesian) networks
Def. A Belief network consists of
• a directed, acyclic graph (DAG) where each node is associated
with a random variable Xi
• A domain for each variable Xi
• a set of conditional probability distributions for each node Xi given
its parents Pa(Xi) in the graph
P (Xi | Pa(Xi))
How to build a Bayesian network
1. Define a total order over the random variables: (X1, …,Xn)
2. Apply the chain rule
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)
where X1, …, Xi-1 are the predecessors of Xi in the total order defined over the variables
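The two steps can be sketched as a small helper that, given a total order, lists the chain-rule factors with each variable conditioned on its predecessors. The function name `chain_rule_factors` is hypothetical.

```python
def chain_rule_factors(order):
    """Return the factors P(Xi | X1, ..., Xi-1) for a given total order."""
    factors = []
    for i, x in enumerate(order):
        predecessors = order[:i]
        if predecessors:
            factors.append(f"P({x} | {','.join(predecessors)})")
        else:
            factors.append(f"P({x})")  # the first variable has no predecessors
    return factors


print(" ".join(chain_rule_factors(["X1", "X2", "X3", "X4"])))
```

A different total order gives a different, equally valid factorization; the order only matters later, when choosing parents.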
Example for BN construction: Fire Diagnosis
You want to diagnose whether there is a fire in a building
• You can receive reports (possibly noisy) about whether everyone is
leaving the building
• If everyone is leaving, this may have been caused by a fire alarm
• If there is a fire alarm, it may have been caused by a fire or by
tampering
• If there is a fire, there may be smoke
Start by choosing the random variables for this domain, here all are Boolean:
• Tampering (T) is true when the alarm has been tampered with
• Fire (F) is true when there is a fire
• Alarm (A) is true when there is an alarm
• Smoke (S) is true when there is smoke
• Leaving (L) is true if there are lots of people leaving the building
• Report (R) is true if the sensor reports that lots of people are leaving the
building
Next apply the procedure described earlier
Example for BN construction: Fire Diagnosis
1. Define a total ordering of variables:
- Let’s choose an order that follows the causal sequence of events
- Fire (F), Tampering (T), Alarm (A), Smoke (S), Leaving (L), Report (R)
2. Apply the chain rule
P(F,T,A,S,L,R) =
P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
We will do steps 3, 4 and 5 together, for each element P(Xi | X1, …, Xi-1) of
the factorization
3. For each variable Xi, choose the parents Parents(Xi) by evaluating
conditional independencies, so that
P(Xi | X1, …, Xi-1) = P(Xi | Parents(Xi))
4. Rewrite
P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | Parents(Xi))
5. Construct the Bayesian network
Fire Diagnosis Example
P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
Fire (F) is the first variable in the ordering, X1. It does not have
parents.
[Figure: network so far, with node Fire]
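Fire Diagnosis Example
P(F) P(T | F) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
Tampering (T) is independent of Fire, so P(T | F) = P(T) and Tampering has no parents:
P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
[Figure: network so far, with nodes Tampering, Fire]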
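Fire Diagnosis Example
P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
Alarm (A) depends on both Fire and Tampering, so P(A | F,T) is kept and Parents(A) = {Fire, Tampering}.
[Figure: network so far, with nodes Tampering, Fire, Alarm]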
Fire Diagnosis Example
P(F) P(T) P(A | F,T) P(S | F,T,A) P(L | F,T,A,S) P(R | F,T,A,S,L)
Smoke (S) is conditionally independent of Tampering and Alarm given Fire, so P(S | F,T,A) = P(S | F):
P(F) P(T) P(A | F,T) P(S | F) P(L | F,T,A,S) P(R | F,T,A,S,L)
[Figure: network so far, with nodes Tampering, Fire, Alarm, Smoke]
Fire Diagnosis Example
P(F) P(T) P(A | F,T) P(S | F) P(L | F,T,A,S) P(R | F,T,A,S,L)
Leaving (L) is conditionally independent of Fire, Tampering and Smoke given Alarm, so P(L | F,T,A,S) = P(L | A):
P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | F,T,A,S,L)
[Figure: network so far, with nodes Tampering, Fire, Alarm, Smoke, Leaving]
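Fire Diagnosis Example
P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | F,T,A,S,L)
Report (R) is conditionally independent of all the other variables given Leaving, so P(R | F,T,A,S,L) = P(R | L):
P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)
[Figure: network with nodes Tampering, Fire, Alarm, Smoke, Leaving, Report]
The result is the Bayesian network above, and its corresponding, very
compact factorization of the original JPD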
Fire Diagnosis Example
P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L)
5. Construct the Bayesian Net (BN)
• Nodes are the random variables
• Draw a directed arc from each variable in Pa(Xi) to Xi
• Define a conditional probability table (CPT) for each variable Xi:
• P(Xi | Pa(Xi))
[Figure: the resulting network, with arcs Fire → Alarm, Tampering → Alarm, Fire → Smoke, Alarm → Leaving, Leaving → Report]
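Step 5 can be sketched directly from the parent sets read off the factorization P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L). The dictionary layout and the names `arcs`, `is_acyclic` and `factor` are illustrative assumptions.

```python
# Total order and parent sets from the Fire Diagnosis example.
order = ["F", "T", "A", "S", "L", "R"]
parents = {"F": [], "T": [], "A": ["F", "T"], "S": ["F"], "L": ["A"], "R": ["L"]}

# One directed arc from each parent to its child.
arcs = [(p, x) for x in order for p in parents[x]]

# Parents are always chosen among the predecessors in the total order,
# so every arc points "forward" and the graph cannot contain a cycle.
position = {x: i for i, x in enumerate(order)}
is_acyclic = all(position[p] < position[x] for p, x in arcs)


def factor(x):
    """The CPT that must be defined for variable x: P(x | Pa(x))."""
    return f"P({x})" if not parents[x] else f"P({x} | {','.join(parents[x])})"


factorization = " ".join(factor(x) for x in order)
print(arcs)
print(factorization)
```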
Example for BN construction: Fire Diagnosis
• We are not done yet: we must specify the Conditional Probability Table
(CPT) for each variable. All variables are Boolean.
• How many probabilities do we need to specify for this Bayesian network?
• For instance, how many probabilities do we need to explicitly specify
for Fire?
A. 1 B. 2 C. 4 D. 8
Answer: A. P(Fire) needs 1 probability, P(Fire = T),
because P(Fire = F) = 1 - P(Fire = T)
Example for BN construction: Fire Diagnosis
P(Fire=t)
0.01
How many probabilities do we need to specify for this Bayesian network?
A. 6 B. 12 C. 20 D. 2^6 - 1
Example for BN construction: Fire Diagnosis
P(Tampering=t) P(Fire=t)
0.02 0.01
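Example for BN construction: Fire Diagnosis
Tampering Fire P(Alarm=t|T,F)
t t …
t f 0.85
f t 0.99
f f 0.0001
Fire P(Smoke=t|F)
t …
f 0.01
Alarm P(Leaving=t|A)
t 0.88
f 0.001
Leaving P(Report=t|L)
t 0.75
f 0.01
Once we have the CPTs in the network, we can compute any entry of the JPD
P(Tampering=t, Fire=f, Alarm=t, Smoke=f, Leaving=t, Report=t) =
P(Tampering=t) × P(Fire=f) × P(Alarm=t | Tampering=t, Fire=f) × P(Smoke=f | Fire=f) × P(Leaving=t | Alarm=t) × P(Report=t | Leaving=t)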
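The JPD entry asked for above can be worked out from the CPT values that appear on the slides. This is a sketch under two assumptions: the CPT row "t f 0.85" is read as Tampering=t, Fire=f, and the unlabeled row "f 0.01" is read as P(Smoke=t | Fire=f). The variable names are illustrative.

```python
# CPT entries taken from the slides (see the reading assumptions in the lead-in).
p_tampering_t = 0.02                 # P(Tampering=t)
p_fire_t = 0.01                      # P(Fire=t)
p_alarm_t_given_t_t_f_f = 0.85       # P(Alarm=t | Tampering=t, Fire=f)
p_smoke_t_given_fire_f = 0.01        # P(Smoke=t | Fire=f), assumed reading
p_leaving_t_given_alarm_t = 0.88     # P(Leaving=t | Alarm=t)
p_report_t_given_leaving_t = 0.75    # P(Report=t | Leaving=t)

# P(T=t, F=f, A=t, S=f, L=t, R=t) as a product of one entry per CPT.
joint_entry = (
    p_tampering_t
    * (1 - p_fire_t)                 # P(Fire=f)
    * p_alarm_t_given_t_t_f_f
    * (1 - p_smoke_t_given_fire_f)   # P(Smoke=f | Fire=f)
    * p_leaving_t_given_alarm_t
    * p_report_t_given_leaving_t
)
print(round(joint_entry, 6))  # about 0.011
```

That is, 0.02 × 0.99 × 0.85 × 0.99 × 0.88 × 0.75, roughly 0.011.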
Bayesian Networks: Types of Query/Inference
• Diagnostic: P(F|L=t)=?
• Predictive: Fire happens, F=t, i.e. P(F=t)=1
• Mixed: There is no fire, F=f; P(A|F=f,L=t)=?
• Intercausal: Person smokes next to sensor, S=t; P(F|A=t,T=t)=?
[Figure: one network fragment per query type, over the nodes Fire, Alarm, Leaving, Smoking at Sensor]
Compactness
• In a Bnet, how many rows do we need to explicitly
store for the CPT of a Boolean variable Xi with k
Boolean parents?
• A CPT for a Boolean variable Xi with k Boolean parents
has 2^k rows for the combinations of parent values
• If each variable has no more than k parents, the complete
network requires specifying at most n·2^k numbers
• For k << n, this is a substantial improvement:
• the numbers required grow linearly with n, vs. O(2^n) for
the full joint distribution
• E.g., if we have a Bnet with 30 Boolean variables, each
with 5 parents
• Need to specify 30 × 2^5 = 960 probabilities
• But we need 2^30 for the JPD
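The counts above can be reproduced with a few lines. This is a sketch: the helper name `cpt_rows` is hypothetical, and the parent counts for the Fire Diagnosis network are read off its factorization P(F) P(T) P(A | F,T) P(S | F) P(L | A) P(R | L).

```python
def cpt_rows(num_parents):
    """Rows in the CPT of a Boolean variable with Boolean parents:
    one stored P(X=t | parents) per combination of parent values."""
    return 2 ** num_parents


# The 30-variable example from the slide.
n, k = 30, 5
bnet_numbers = n * cpt_rows(k)   # 30 * 2^5
jpd_numbers = 2 ** n             # 2^30

# Fire Diagnosis network: number of parents of each variable.
fire_num_parents = {"Fire": 0, "Tampering": 0, "Alarm": 2,
                    "Smoke": 1, "Leaving": 1, "Report": 1}
fire_numbers = sum(cpt_rows(count) for count in fire_num_parents.values())

print(bnet_numbers, jpd_numbers, fire_numbers)
```

For the Fire Diagnosis network this gives 1 + 1 + 4 + 2 + 2 + 2 = 12 probabilities, compared with 2^6 - 1 = 63 independent entries for the full JPD over six Boolean variables.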
Realistic BNet: Liver Diagnosis
Source: Onisko et al., 1999
Are there wrong network structures?
• How can a network structure be wrong?
• If it misses directed edges that are required
• E.g. an edge is missing below, making Fire conditionally
independent of Alarm given Tampering and Smoke
[Figure: Fire Diagnosis network over Leaving, Tampering, Report, Alarm, Smoke, Fire, with an edge missing]