
CSE 473: Artificial Intelligence

Bayesian Networks: Inference

Hanna Hajishirzi

Many slides over the course adapted from either Luke Zettlemoyer, Pieter Abbeel, Dan Klein, Stuart Russell or Andrew Moore
Outline

§ Bayesian Networks Inference


§ Exact Inference: Variable Elimination
§ Approximate Inference: Sampling
Bayes Net Representation
Reachability (D-Separation)
! Question: Are X and Y conditionally independent given evidence vars {Z}?
! Yes, if X and Y "separated" by Z
! Look for active paths from X to Y
! No active paths = independence!
! A path is active if each triple is active:
! Causal chain A → B → C where B is unobserved (either direction)
! Common cause A ← B → C where B is unobserved
! Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
! All it takes to block a path is a single inactive segment
(Figure: active triples create dependence; inactive triples give independence.)
Bayes Net Joint Distribution
Example: Alarm Network (B → A ← E, A → J, A → M)

  B    P(B)            E    P(E)
  +b   0.001           +e   0.002
  -b   0.999           -e   0.998

  B    E    A    P(A|B,E)
  +b   +e   +a   0.95
  +b   +e   -a   0.05
  +b   -e   +a   0.94
  +b   -e   -a   0.06
  -b   +e   +a   0.29
  -b   +e   -a   0.71
  -b   -e   +a   0.001
  -b   -e   -a   0.999

  A    J    P(J|A)          A    M    P(M|A)
  +a   +j   0.9             +a   +m   0.7
  +a   -j   0.1             +a   -m   0.3
  -a   +j   0.05            -a   +m   0.01
  -a   -j   0.95            -a   -m   0.99
Probabilistic Inference
§ Probabilistic inference: compute a desired probability from
other known probabilities (e.g. conditional from joint)

§ We generally compute conditional probabilities


§ P(on time | no reported accidents) = 0.90
§ These represent the agent's beliefs given the evidence

§ Probabilities change with new evidence:


§ P(on time | no accidents, 5 a.m.) = 0.95
§ P(on time | no accidents, 5 a.m., raining) = 0.80
§ Observing new evidence causes beliefs to be updated
Inference
! Inference: calculating some useful quantity from a joint probability distribution
! Examples:
! Posterior probability: P(Q | E_1 = e_1, ..., E_k = e_k)
! Most likely explanation: argmax_q P(Q = q | E_1 = e_1, ...)
Inference by Enumeration
§ General case:
§ Evidence variables: E_1 = e_1, ..., E_k = e_k
§ Query* variable: Q
§ Hidden variables: H_1, ..., H_r
(All variables: the query, the evidence, and the hidden variables together)

§ We want: P(Q | e_1, ..., e_k)
§ First, select the entries consistent with the evidence
§ Second, sum out H to get joint of Query and evidence
§ Finally, normalize the remaining entries to conditionalize (the three steps are sketched in code below)

§ Obvious problems:
§ Worst-case time complexity O(d^n)
§ Space complexity O(d^n) to store the joint distribution
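To make the select / sum-out / normalize pipeline concrete, here is a minimal Python sketch. The toy joint over three binary variables (Q, H, E) and the helper name enumerate_query are illustrative inventions of these notes, not material from the slides:

# Full joint P(Q, H, E) stored explicitly: O(d^n) entries for n variables.
joint = {
    ("+q", "+h", "+e"): 0.10, ("+q", "+h", "-e"): 0.05,
    ("+q", "-h", "+e"): 0.20, ("+q", "-h", "-e"): 0.05,
    ("-q", "+h", "+e"): 0.15, ("-q", "+h", "-e"): 0.15,
    ("-q", "-h", "+e"): 0.05, ("-q", "-h", "-e"): 0.25,
}

def enumerate_query(joint, e="+e"):
    # 1) select entries consistent with the evidence E = e
    rows = {k: p for k, p in joint.items() if k[2] == e}
    # 2) sum out the hidden variable H, leaving P(Q, e)
    p_q = {}
    for (q, h, _), p in rows.items():
        p_q[q] = p_q.get(q, 0.0) + p
    # 3) normalize to get P(Q | e)
    z = sum(p_q.values())
    return {q: p / z for q, p in p_q.items()}

print(enumerate_query(joint))   # ≈ {'+q': 0.6, '-q': 0.4}

Every entry of the joint is touched once, which is exactly the O(d^n) cost flagged above.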
Inference in BN by Enumeration

! Given unlimited time, inference in BNs is easy
! Reminder of inference by enumeration by example (alarm network: B → A ← E, A → J, A → M):

P(B | +j, +m) ∝ P(B, +j, +m)
  = ∑_{e,a} P(B, e, a, +j, +m)
  = ∑_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
  = P(B) P(+e) P(+a|B,+e) P(+j|+a) P(+m|+a) + P(B) P(+e) P(-a|B,+e) P(+j|-a) P(+m|-a)
  + P(B) P(-e) P(+a|B,-e) P(+j|+a) P(+m|+a) + P(B) P(-e) P(-a|B,-e) P(+j|-a) P(+m|-a)

Inference by Enumeration?

P(Antilock | observed variables) = ?
Variable Elimination
§ Why is inference by enumeration so slow?
§ You join up the whole joint distribution before you
sum out the hidden variables
§ You end up repeating a lot of work!

§ Idea: interleave joining and marginalizing!


§ Called Variable Elimination
§ Still NP-hard, but usually much faster than inference
by enumeration

§ We'll need some new notation to define VE


Review

§ Joint distribution: P(X,Y)
§ Entries P(x,y) for all x, y
§ Sums to 1

  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

§ Selected joint: P(x,Y)
§ A slice of the joint distribution
§ Entries P(x,y) for fixed x, all y
§ Sums to P(x)

  T     W     P
  cold  sun   0.2
  cold  rain  0.3
Review

§ Family of conditionals: P(X | Y)
§ Multiple conditionals
§ Entries P(x | y) for all x, y
§ Sums to |Y|

  T     W     P
  hot   sun   0.8
  hot   rain  0.2
  cold  sun   0.4
  cold  rain  0.6

§ Single conditional: P(Y | x)
§ Entries P(y | x) for fixed x, all y
§ Sums to 1

  T     W     P
  cold  sun   0.4
  cold  rain  0.6
Review

§ Specified family: P(y | X)
§ Entries P(y | x) for fixed y, but for all x
§ Sums to ... who knows!

  T     W     P
  hot   rain  0.2
  cold  rain  0.6

§ In general, when we write P(Y_1 ... Y_N | X_1 ... X_M)
§ It is a "factor," a multi-dimensional array
§ Its values are all P(y_1 ... y_N | x_1 ... x_M)
§ Any assigned X or Y is a dimension missing (selected) from the array
Inference
§ Inference is expensive with enumeration

§ Variable elimination:
§ Interleave joining and marginalization: Store
initial results and then join with the rest
Example: Traffic Domain
§ Random Variables (network: R → T → L)
§ R: Raining
§ T: Traffic
§ L: Late for class!
§ First query: P(L)

  P(R):           P(T|R):            P(L|T):
  +r   0.1        +r   +t   0.8      +t   +l   0.3
  -r   0.9        +r   -t   0.2      +t   -l   0.7
                  -r   +t   0.1      -t   +l   0.1
                  -r   -t   0.9      -t   -l   0.9
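To follow the coming operations in code, it helps to fix a factor representation. In the minimal sketch below (an assumption of these notes, not the course's code), each factor is a pair (variables, table), where the table maps tuples of values to probabilities. Later sketches reuse these three names:

# Each factor: (tuple of variable names, {tuple of values: probability}).
# These tables transcribe the three CPTs above.
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})

P_T_given_R = (("R", "T"), {
    ("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
    ("-r", "+t"): 0.1, ("-r", "-t"): 0.9,
})

P_L_given_T = (("T", "L"), {
    ("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
    ("-t", "+l"): 0.1, ("-t", "-l"): 0.9,
})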
Variable Elimination Outline
§ Maintain a set of tables called factors
§ Initial factors are local CPTs (one per node)

  P(R):           P(T|R):            P(L|T):
  +r   0.1        +r   +t   0.8      +t   +l   0.3
  -r   0.9        +r   -t   0.2      +t   -l   0.7
                  -r   +t   0.1      -t   +l   0.1
                  -r   -t   0.9      -t   -l   0.9

§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are

  P(R):           P(T|R):            P(+l|T):
  +r   0.1        +r   +t   0.8      +t   +l   0.3
  -r   0.9        +r   -t   0.2      -t   +l   0.1
                  -r   +t   0.1
                  -r   -t   0.9

§ VE: Alternately join factors and eliminate variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
§ Just like a database join
§ Get all factors over the joining variable
§ Build a new factor over the union of the variables involved

§ Example: Join on R (P(R) and P(T|R) merge into a single factor over R, T)

  P(R):           P(T|R):            P(R,T):
  +r   0.1        +r   +t   0.8      +r   +t   0.08
  -r   0.9        +r   -t   0.2      +r   -t   0.02
                  -r   +t   0.1      -r   +t   0.09
                  -r   -t   0.9      -r   -t   0.81

§ Computation for each entry: pointwise products (see the sketch below)
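A hedged sketch of the join operation on the factor representation fixed earlier (the function name join is ours):

def join(f, g):
    """Pointwise product of two factors, over the union of their variables."""
    f_vars, f_tab = f
    g_vars, g_tab = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out_tab = {}
    for f_vals, f_p in f_tab.items():
        asg = dict(zip(f_vars, f_vals))
        for g_vals, g_p in g_tab.items():
            g_asg = dict(zip(g_vars, g_vals))
            # Database-style join: keep only rows where shared variables agree.
            if all(asg.get(v, val) == val for v, val in g_asg.items()):
                row = {**asg, **g_asg}
                out_tab[tuple(row[v] for v in out_vars)] = f_p * g_p
    return (out_vars, out_tab)

# join(P_R, P_T_given_R) gives P(R,T), e.g. (+r, +t) -> 0.1 * 0.8 = 0.08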
Example: Multiple Joins

Join R: P(R) and P(T|R) combine into P(R,T); P(L|T) is carried along unchanged.

  P(R):           P(T|R):            P(R,T):
  +r   0.1        +r   +t   0.8      +r   +t   0.08
  -r   0.9        +r   -t   0.2      +r   -t   0.02
                  -r   +t   0.1      -r   +t   0.09
                  -r   -t   0.9      -r   -t   0.81

  P(L|T):
  +t   +l   0.3
  +t   -l   0.7
  -t   +l   0.1
  -t   -l   0.9
Example: Multiple Joins (continued)

Join T: P(R,T) and P(L|T) combine into the full joint P(R,T,L).

  P(R,T):             P(L|T):            P(R,T,L):
  +r   +t   0.08      +t   +l   0.3      +r   +t   +l   0.024
  +r   -t   0.02      +t   -l   0.7      +r   +t   -l   0.056
  -r   +t   0.09      -t   +l   0.1      +r   -t   +l   0.002
  -r   -t   0.81      -t   -l   0.9      +r   -t   -l   0.018
                                         -r   +t   +l   0.027
                                         -r   +t   -l   0.063
                                         -r   -t   +l   0.081
                                         -r   -t   -l   0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
§ Shrinks a factor to a smaller one
§ A projection operation
§ Example: sum out R from P(R,T) to get P(T)

  P(R,T):             P(T):
  +r   +t   0.08      +t   0.17
  +r   -t   0.02      -t   0.83
  -r   +t   0.09
  -r   -t   0.81
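The matching sum-out operation, in the same illustrative style:

def eliminate(f, var):
    """Sum out `var`: project the factor onto its remaining variables."""
    f_vars, f_tab = f
    out_vars = tuple(v for v in f_vars if v != var)
    out_tab = {}
    for vals, p in f_tab.items():
        asg = dict(zip(f_vars, vals))
        key = tuple(asg[v] for v in out_vars)
        out_tab[key] = out_tab.get(key, 0.0) + p
    return (out_vars, out_tab)

# eliminate(join(P_R, P_T_given_R), "R") gives P(T): {("+t",): 0.17, ("-t",): 0.83}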
Multiple Elimination

Sum out R: P(R,T,L) → P(T,L); then sum out T: P(T,L) → P(L).

  P(R,T,L):                P(T,L):             P(L):
  +r   +t   +l   0.024     +t   +l   0.051     +l   0.134
  +r   +t   -l   0.056     +t   -l   0.119     -l   0.866
  +r   -t   +l   0.002     -t   +l   0.083
  +r   -t   -l   0.018     -t   -l   0.747
  -r   +t   +l   0.027
  -r   +t   -l   0.063
  -r   -t   +l   0.081
  -r   -t   -l   0.729
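Using the two sketched helpers, the join-everything-first order reproduces the tables above:

# Join everything first, then sum out R and T (the slow, enumeration-like order).
joint = join(join(P_R, P_T_given_R), P_L_given_T)   # P(R,T,L), 8 entries
P_L = eliminate(eliminate(joint, "R"), "T")          # P(L)
print(P_L[1])   # ≈ {("+l",): 0.134, ("-l",): 0.866}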
P(L): Marginalizing Early!

Join R, then immediately sum out R, before ever joining on T.

  P(R):           Join R → P(R,T):     Sum out R → P(T):
  +r   0.1        +r   +t   0.08       +t   0.17
  -r   0.9        +r   -t   0.02       -t   0.83
                  -r   +t   0.09
  P(T|R):         -r   -t   0.81
  +r   +t   0.8
  +r   -t   0.2
  -r   +t   0.1
  -r   -t   0.9

  P(L|T) is carried along unchanged at each step:
  +t   +l   0.3
  +t   -l   0.7
  -t   +l   0.1
  -t   -l   0.9
Marginalizing Early (aka VE*)

Join T, then sum out T.

  P(T):           P(L|T):            Join T → P(T,L):      Sum out T → P(L):
  +t   0.17       +t   +l   0.3      +t   +l   0.051       +l   0.134
  -t   0.83       +t   -l   0.7      +t   -l   0.119       -l   0.866
                  -t   +l   0.1      -t   +l   0.083
                  -t   -l   0.9      -t   -l   0.747

* VE is variable elimination
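The same query with early marginalization, again using the sketched helpers: R is summed out before P(L|T) ever enters a join, so no three-variable factor is built:

# Same answer, marginalizing early: no factor ever has more than two variables.
P_T = eliminate(join(P_R, P_T_given_R), "R")   # P(T): {("+t",): 0.17, ("-t",): 0.83}
P_L = eliminate(join(P_T, P_L_given_T), "T")   # ≈ {("+l",): 0.134, ("-l",): 0.866}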
Traffic Domain
R → T → L; query: P(L) = ?

! Inference by Enumeration:
  P(L) = ∑_t ∑_r P(L|t) P(r) P(t|r)
  Join on r, join on t, eliminate r, eliminate t

! Variable Elimination:
  P(L) = ∑_t P(L|t) ∑_r P(r) P(t|r)
  Join on r, eliminate r, join on t, eliminate t
Marginalizing Early! (aka VE)

Pipeline: Join R → Sum out R → Join T → Sum out T

  P(R):           Join R → P(R,T):    Sum out R → P(T):    Join T → P(T,L):     Sum out T → P(L):
  +r   0.1        +r   +t   0.08      +t   0.17            +t   +l   0.051      +l   0.134
  -r   0.9        +r   -t   0.02      -t   0.83            +t   -l   0.119      -l   0.866
                  -r   +t   0.09                           -t   +l   0.083
  P(T|R):         -r   -t   0.81                           -t   -l   0.747
  +r   +t   0.8
  +r   -t   0.2
  -r   +t   0.1
  -r   -t   0.9

  P(L|T) is carried along unchanged until the Join T step:
  +t   +l   0.3
  +t   -l   0.7
  -t   +l   0.1
  -t   -l   0.9
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

  P(R):           P(T|R):            P(L|T):
  +r   0.1        +r   +t   0.8      +t   +l   0.3
  -r   0.9        +r   -t   0.2      +t   -l   0.7
                  -r   +t   0.1      -t   +l   0.1
                  -r   -t   0.9      -t   -l   0.9

§ Computing P(L | +r), the initial factors become:

  P(+r):          P(T|+r):           P(L|T):
  +r   0.1        +r   +t   0.8      +t   +l   0.3
                  +r   -t   0.2      +t   -l   0.7
                                     -t   +l   0.1
                                     -t   -l   0.9

§ We eliminate all vars other than query + evidence
Evidence II
§ Result will be a selected joint of query and evidence
§ E.g. for P(L | +r), we'd end up with:

  P(+r, L):            Normalize → P(L | +r):
  +r   +l   0.026      +l   0.26
  +r   -l   0.074      -l   0.74

§ To get our answer, just normalize this!
§ That's it! (see the code sketch below)
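A sketch of evidence selection and normalization in the same toy framework (select and normalize are hypothetical helper names):

def select(f, var, val):
    """Restrict a factor to rows consistent with observed evidence var = val."""
    f_vars, f_tab = f
    i = f_vars.index(var)
    return (f_vars, {vals: p for vals, p in f_tab.items() if vals[i] == val})

def normalize(f):
    """Rescale a factor's entries so they sum to 1."""
    f_vars, f_tab = f
    z = sum(f_tab.values())
    return (f_vars, {vals: p / z for vals, p in f_tab.items()})

# P(L | +r): select the evidence, eliminate T (and the now-trivial R), normalize.
f = join(select(P_R, "R", "+r"), select(P_T_given_R, "R", "+r"))
f = eliminate(join(f, P_L_given_T), "T")
f = eliminate(f, "R")
print(normalize(f)[1])   # ≈ {("+l",): 0.26, ("-l",): 0.74}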
General Variable Elimination
§ Query: P(Q | E_1 = e_1, ..., E_k = e_k)

§ Start with initial factors:
§ Local CPTs (but instantiated by evidence)

§ While there are still hidden variables (not Q or evidence):
§ Pick a hidden variable H
§ Join all factors mentioning H
§ Eliminate (sum out) H

§ Join all remaining factors and normalize (the loop is sketched in code below)
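Putting the pieces together, one possible rendering of this loop in the sketch framework (assuming the join, eliminate, and normalize helpers from earlier; the elimination order is simply the order of hidden_vars):

def variable_elimination(factors, hidden_vars):
    """Sketch of the VE loop. Evidence is assumed already selected into
    `factors`; `hidden_vars` are all variables except query + evidence."""
    factors = list(factors)
    for h in hidden_vars:
        mentioning = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)          # join all factors mentioning h...
        factors.append(eliminate(joined, h))  # ...then sum h out
    result = factors[0]
    for f in factors[1:]:                     # join whatever remains, normalize
        result = join(result, f)
    return normalize(result)

# variable_elimination([P_R, P_T_given_R, P_L_given_T], ["R", "T"])
# returns P(L) ≈ {("+l",): 0.134, ("-l",): 0.866}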


Variable Elimination Bayes Rule

Start / Select: evidence +a selects rows of P(A|B); then join on B and normalize.

  P(B):           P(A|B):            Join on B → P(+a, B):    Normalize → P(B | +a):
  +b   0.1        +b   +a   0.8      +a   +b   0.08           +b   8/17
  -b   0.9        +b   -a   0.2      +a   -b   0.09           -b   9/17
                  -b   +a   0.1
                  -b   -a   0.9
Example
Query: P(B | +j, +m)

Choose A: join all factors mentioning A, then sum out A

Example (continued)

Choose E: join the remaining factors mentioning E, then sum out E

Finish with B: join the remaining factors

Normalize
Variable Elimination
(Alarm network: B → A ← E, A → J, A → M)

P(B, j, m) = ∑_{A,E} P(B, j, m, A, E)
           = ∑_{A,E} P(B) P(E) P(A | B, E) P(m | A) P(j | A)
           = ∑_E P(B) P(E) ∑_A P(A | B, E) P(m | A) P(j | A)
           = ∑_E P(B) P(E) ∑_A P(m, j, A | B, E)
           = ∑_E P(B) P(E) P(m, j | B, E)
           = P(B) ∑_E P(m, j, E | B)
           = P(B) P(m, j | B)
Another Variable Elimination Example

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables) all factors generated are of size 2, as they all only have one variable (Z, Z, and X_3 respectively).
Variable Elimination Ordering

! For the query P(X_n | y_1, ..., y_n), work through the following two different orderings, as done in the previous slide: Z, X_1, ..., X_{n-1} and X_1, ..., X_{n-1}, Z. What is the size of the maximum factor generated for each of the orderings?

! Answer: 2^{n+1} versus 2^2 (assuming binary)
! In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity

! The computational and space complexity of variable elimination is determined by the largest factor

! The elimination ordering can greatly affect the size of the largest factor.
! E.g., previous slide's example: 2^n vs. 2

! Does there always exist an ordering that only results in small factors?
! No!
Exact Inference: Variable Elimination
§ Remaining Issues:
§ Complexity: exponential in tree width (size of the
largest factor created)
§ Best elimination ordering? NP-hard problem

§ What you need to know:


§ Should be able to run it on small examples, understand
the factor creation / reduction flow
§ Better than enumeration: saves time by marginalizing
variables as soon as possible rather than at the end

§ We have seen a special case of VE already


§ HMM Forward Inference
Variable Elimination
! Interleave joining and marginalizing
! d^k entries computed for a factor over k variables with domain sizes d
! Ordering of elimination of hidden variables can affect size of factors generated
! Worst case: running time exponential in the size of the Bayes' net
