
Bayesian Networks

Philipp Koehn

6 April 2017



Outline 1

● Bayesian Networks

● Parameterized distributions

● Exact inference

● Approximate inference



2

bayesian networks



Bayesian Networks 3

● A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions

● Syntax
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents:
P(Xi ∣ Parents(Xi))

● In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values



Example 4

● Topology of network encodes conditional independence assertions:

● Weather is independent of the other variables

● Toothache and Catch are conditionally independent given Cavity



Example 5

● I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary
doesn’t call. Sometimes it’s set off by minor earthquakes.
Is there a burglar?

● Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

● Network topology reflects “causal” knowledge


– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call



Example 6



Compactness 7

● A conditional probability table for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values

● Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p)

● If each variable has no more than k parents, the complete network requires O(n ⋅ 2^k) numbers

● I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

● For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)
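
● As a cross-check on the O(n ⋅ 2^k) count, a minimal Python sketch tallying the burglary net's parameters from its parent counts:

# One independent number per CPT row: a Boolean node with k Boolean parents needs 2^k numbers
parent_counts = {"Burglary": 0, "Earthquake": 0, "Alarm": 2, "JohnCalls": 1, "MaryCalls": 1}

network_numbers = sum(2 ** k for k in parent_counts.values())   # 1 + 1 + 4 + 2 + 2
full_joint_numbers = 2 ** len(parent_counts) - 1                # 2^5 - 1

print(network_numbers, full_joint_numbers)                      # 10 31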



Global Semantics 8

● Global semantics defines the full joint distribution as the product of the local
conditional distributions:
P(x1, . . . , xn) = ∏_{i=1}^{n} P(xi ∣ parents(Xi))

● E.g., P (j ∧ m ∧ a ∧ ¬b ∧ ¬e)

= P (j∣a)P (m∣a)P (a∣¬b, ¬e)P (¬b)P (¬e)


= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
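
● The same product can be evaluated directly; a minimal Python sketch using only the CPT entries quoted above:

# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p_j_given_a     = 0.9
p_m_given_a     = 0.7
p_a_given_nb_ne = 0.001
p_not_b         = 0.999
p_not_e         = 0.998

joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(round(joint, 6))   # 0.000628, i.e. ≈ 0.00063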



Local Semantics 9

● Local semantics: each node is conditionally independent of its nondescendants given its parents

● Theorem: Local semantics ⇔ global semantics



Markov Blanket 10

● Each node is conditionally independent of all others given its Markov blanket: parents + children + children’s parents



Constructing Bayesian Networks 11

● Need a method such that a series of locally testable assertions of conditional independence guarantees the required global semantics
1. Choose an ordering of variables X1, . . . , Xn
2. For i = 1 to n
   add Xi to the network
   select parents from X1, . . . , Xi−1 such that
   P(Xi ∣ Parents(Xi)) = P(Xi ∣ X1, . . . , Xi−1)

● This choice of parents guarantees the global semantics:

P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi ∣ X1, . . . , Xi−1) (chain rule)
                 = ∏_{i=1}^{n} P(Xi ∣ Parents(Xi)) (by construction)



Example 12

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)?



Example 13

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)?



Example 14

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)? No
● P (B∣A, J, M ) = P (B∣A)?
● P (B∣A, J, M ) = P (B)?



Example 15

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)? No
● P (B∣A, J, M ) = P (B∣A)? Yes
● P (B∣A, J, M ) = P (B)? No
● P (E∣B, A, J, M ) = P (E∣A)?
● P (E∣B, A, J, M ) = P (E∣A, B)?



Example 16

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)? No
● P (B∣A, J, M ) = P (B∣A)? Yes
● P (B∣A, J, M ) = P (B)? No
● P (E∣B, A, J, M ) = P (E∣A)? No
● P (E∣B, A, J, M ) = P (E∣A, B)? Yes



Example 17

● Deciding conditional independence is hard in noncausal directions


● (Causal models and conditional independence seem hardwired for humans!)
● Assessing conditional probabilities is hard in noncausal directions
● Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed



Example: Car Diagnosis 18

● Initial evidence: car won’t start


● Testable variables (green), “broken, so fix it” variables (orange)
● Hidden variables (gray) ensure sparse structure, reduce parameters



Example: Car Insurance 19



Compact Conditional Distributions 20

● CPT grows exponentially with number of parents; CPT becomes infinite with continuous-valued parent or child

● Solution: canonical distributions that are defined compactly

● Deterministic nodes are the simplest case: X = f(Parents(X)) for some function f

● E.g., Boolean functions: NorthAmerican ⇔ Canadian ∨ US ∨ Mexican

● E.g., numerical relationships among continuous variables:

∂Level/∂t = inflow + precipitation − outflow − evaporation



Compact Conditional Distributions 21

● Noisy-OR distributions model multiple noninteracting causes


– parents U1 . . . Uk include all causes (can add leak node)
– independent failure probability qi for each cause alone
Ô⇒ P(X ∣ U1 . . . Uj , ¬Uj+1 . . . ¬Uk) = 1 − ∏_{i=1}^{j} qi

Cold  Flu  Malaria  P(Fever)  P(¬Fever)

F     F    F        0.0       1.0
F     F    T        0.9       0.1
F     T    F        0.8       0.2
F     T    T        0.98      0.02 = 0.2 × 0.1
T     F    F        0.4       0.6
T     F    T        0.94      0.06 = 0.6 × 0.1
T     T    F        0.88      0.12 = 0.6 × 0.2
T     T    T        0.988     0.012 = 0.6 × 0.2 × 0.1

● Number of parameters linear in number of parents
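
● A minimal Python sketch of the noisy-OR rule; the per-cause failure probabilities q_Cold = 0.6, q_Flu = 0.2, q_Malaria = 0.1 are the ones behind the table above:

# Noisy-OR: P(¬Fever | active causes) = product of the active causes' failure probabilities
q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}

def p_fever(active_causes):
    """P(Fever = true) given the set of causes that are present."""
    p_no_fever = 1.0
    for cause in active_causes:
        p_no_fever *= q[cause]
    return 1.0 - p_no_fever

print(p_fever([]))                          # 0.0
print(p_fever(["Flu", "Malaria"]))          # 0.98
print(p_fever(["Cold", "Flu", "Malaria"]))  # 0.988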



Hybrid (Discrete+Continuous) Networks 22

● Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

● Option 1: discretization—possibly large errors, large CPTs
● Option 2: finitely parameterized canonical families

● 1) Continuous variable, discrete+continuous parents (e.g., Cost)
  2) Discrete variable, continuous parents (e.g., Buys?)



Continuous Child Variables 23

● Need one conditional density function for child variable given continuous
parents, for each possible assignment to discrete parents

● Most common is the linear Gaussian model, e.g.,:

P(Cost = c ∣ Harvest = h, Subsidy? = true)
= N(a_t h + b_t, σ_t)(c)
= (1 / (σ_t √(2π))) exp( −(1/2) ((c − (a_t h + b_t)) / σ_t)^2 )
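
● A minimal Python sketch of this conditional density; the parameters a_t, b_t, σ_t below are illustrative placeholders, not values from the slides:

import math

def linear_gaussian(c, h, a_t, b_t, sigma_t):
    """Density of Cost = c given Harvest = h when Cost ~ N(a_t * h + b_t, sigma_t^2)."""
    z = (c - (a_t * h + b_t)) / sigma_t
    return math.exp(-0.5 * z * z) / (sigma_t * math.sqrt(2 * math.pi))

# Hypothetical parameters: cost falls by 0.5 per unit of harvest around a base price of 10
print(linear_gaussian(5.0, 10.0, a_t=-0.5, b_t=10.0, sigma_t=1.0))   # ≈ 0.399 (the peak of the density)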



Continuous Child Variables 24

● All-continuous network with LG distributions Ô⇒ full joint distribution is a multivariate Gaussian

● Discrete+continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values



Discrete Variable w/ Continuous Parents 25

● Probability of Buys? given Cost should be a “soft” threshold:

● Probit distribution uses integral of Gaussian:

Φ(x) = ∫_{−∞}^{x} N(0, 1)(u) du
P(Buys? = true ∣ Cost = c) = Φ((−c + µ)/σ)



Why the Probit? 26

● It’s sort of the right shape

● Can view as hard threshold whose location is subject to noise



Discrete Variable 27

● Sigmoid (or logit) distribution also used in neural networks:


P(Buys? = true ∣ Cost = c) = 1 / (1 + exp(−2(−c + µ)/σ))

● Sigmoid has similar shape to probit but much longer tails:
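
● A minimal Python sketch comparing the two soft thresholds; µ and σ are illustrative placeholders, and Φ is computed from the error function via Φ(x) = ½(1 + erf(x/√2)):

import math

MU, SIGMA = 10.0, 2.0   # hypothetical threshold location and noise scale

def probit_buys(c):
    x = (-c + MU) / SIGMA
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))          # Phi(x)

def logit_buys(c):
    return 1.0 / (1.0 + math.exp(-2 * (-c + MU) / SIGMA))    # sigmoid

for c in (4, 8, 10, 12, 16):
    print(c, round(probit_buys(c), 4), round(logit_buys(c), 4))
# Both fall from ≈1 to ≈0 around c = µ, but the logit decays more slowly in the tails.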



28

inference



Inference Tasks 29

● Simple queries: compute posterior marginal P(Xi ∣ E = e)
e.g., P(NoGas ∣ Gauge = empty, Lights = on, Starts = false)

● Conjunctive queries: P(Xi, Xj ∣E = e) = P(Xi∣E = e)P(Xj ∣Xi, E = e)

● Optimal decisions: decision networks include utility information;
probabilistic inference required for P(outcome ∣ action, evidence)

● Value of information: which evidence to seek next?

● Sensitivity analysis: which probability values are most critical?

● Explanation: why do I need a new starter motor?



Inference by Enumeration 30

● Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation

● Simple query on the burglary network


P(B∣j, m)
= P(B, j, m)/P (j, m)
= αP(B, j, m)
= α ∑e ∑a P(B, e, a, j, m)

● Rewrite full joint entries using product of CPT entries:


P(B∣j, m)
= α ∑e ∑a P(B)P (e)P(a∣B, e)P (j∣a)P (m∣a)
= αP(B) ∑e P (e) ∑a P(a∣B, e)P (j∣a)P (m∣a)

● Recursive depth-first enumeration: O(n) space, O(d^n) time



Enumeration Algorithm 31

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
inputs: X, the query variable
        e, observed values for variables E
        bn, a Bayesian network with variables {X} ∪ E ∪ Y
Q(X) ← a distribution over X, initially empty
for each value xi of X do
    extend e with value xi for X
    Q(xi) ← ENUMERATE-ALL(VARS[bn], e)
return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
if EMPTY?(vars) then return 1.0
Y ← FIRST(vars)
if Y has value y in e
    then return P(y ∣ Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return ∑y P(y ∣ Pa(Y)) × ENUMERATE-ALL(REST(vars), ey)
         where ey is e extended with Y = y
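
● A minimal Python rendering of the same algorithm for the burglary network; the CPT entries not shown in this text (e.g. P(a ∣ b, e) = 0.95) follow the usual textbook figure and should be read as assumed:

# variable -> (parents, CPT); the CPT maps a tuple of parent values to P(variable = true | parents)
BURGLARY_NET = {
    "B": ((), {(): 0.001}),
    "E": ((), {(): 0.002}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (("A",), {(True,): 0.90, (False,): 0.05}),
    "M": (("A",), {(True,): 0.70, (False,): 0.01}),
}
VARS = ["B", "E", "A", "J", "M"]   # topological order

def prob(var, value, event):
    """P(var = value | parents(var)), read off the CPT."""
    parents, cpt = BURGLARY_NET[var]
    p_true = cpt[tuple(event[p] for p in parents)]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, event):
    if not variables:
        return 1.0
    first, rest = variables[0], variables[1:]
    if first in event:
        return prob(first, event[first], event) * enumerate_all(rest, event)
    return sum(prob(first, v, event) * enumerate_all(rest, dict(event, **{first: v}))
               for v in (True, False))

def enumeration_ask(query, evidence):
    dist = {value: enumerate_all(VARS, dict(evidence, **{query: value}))
            for value in (True, False)}
    total = sum(dist.values())
    return {value: p / total for value, p in dist.items()}   # normalize

print(enumeration_ask("B", {"J": True, "M": True}))
# ≈ {True: 0.284, False: 0.716}, i.e. P(b | j, m) ≈ 0.28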



Evaluation Tree 32

● Enumeration is inefficient: repeated computation
e.g., computes P(j∣a)P(m∣a) for each value of e



Inference by Variable Elimination 33

● Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid recomputation

P(B ∣ j, m)
= α P(B) ∑e P(e) ∑a P(a∣B, e) P(j∣a) P(m∣a)
  (the five terms are the factors for B, E, A, J, and M)
= α P(B) ∑e P(e) ∑a P(a∣B, e) P(j∣a) f_M(a)
= α P(B) ∑e P(e) ∑a P(a∣B, e) f_J(a) f_M(a)
= α P(B) ∑e P(e) ∑a f_A(a, b, e) f_J(a) f_M(a)
= α P(B) ∑e P(e) f_ĀJM(b, e) (sum out A)
= α P(B) f_ĒĀJM(b) (sum out E)
= α f_B(b) × f_ĒĀJM(b)



Variable Elimination Algorithm 34

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
inputs: X, the query variable
        e, evidence specified as an event
        bn, a belief network specifying joint distribution P(X1, . . . , Xn)
factors ← [ ]; vars ← REVERSE(VARS[bn])
for each var in vars do
    factors ← [MAKE-FACTOR(var, e) ∣ factors]
    if var is a hidden variable then factors ← SUM-OUT(var, factors)
return NORMALIZE(POINTWISE-PRODUCT(factors))
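
● A minimal sketch of the two factor operations the algorithm relies on, pointwise product and summing out a variable; the (variable list, table) representation is illustrative, not the book's exact data structure:

from itertools import product

# A factor is (variables, table), where table maps a tuple of Boolean values
# (one per variable, in order) to a number.
def pointwise_product(f1, f2):
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = t1[tuple(row[v] for v in vars1)] * t2[tuple(row[v] for v in vars2)]
    return (out_vars, table)

def sum_out(var, factor):
    vars_, table = factor
    i = vars_.index(var)
    out = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return (vars_[:i] + vars_[i + 1:], out)

# Toy demonstration with f_J(a) and f_M(a) from the burglary query
# (0.05 and 0.01 are the usual textbook entries, assumed here):
f_J = (["A"], {(True,): 0.90, (False,): 0.05})
f_M = (["A"], {(True,): 0.70, (False,): 0.01})
print(sum_out("A", pointwise_product(f_J, f_M)))   # ([], {(): 0.6305})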



Irrelevant Variables 35

● Consider the query P(JohnCalls ∣ Burglary = true)

P(J∣b) = α P(b) ∑e P(e) ∑a P(a∣b, e) P(J∣a) ∑m P(m∣a)

Sum over m is identically 1; M is irrelevant to the query

● Theorem 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

● Here
– X = JohnCalls, E = {Burglary}
– Ancestors({X} ∪ E) = {Alarm, Earthquake}
⇒ MaryCalls is irrelevant

● Compare this to backward chaining from the query in Horn clause KBs



Irrelevant Variables 36

● Definition: moral graph of Bayes net: marry all parents and drop arrows

● Definition: A is m-separated from B by C iff separated by C in the moral graph

● Theorem 2: Y is irrelevant if m-separated from X by E

● For P(JohnCalls ∣ Alarm = true), both Burglary and Earthquake are irrelevant



Complexity of Exact Inference 37

● Singly connected networks (or polytrees)


– any two nodes are connected by at most one (undirected) path
– time and space cost of variable elimination are O(d^k n)

● Multiply connected networks


– can reduce 3SAT to exact inference Ô⇒ NP-hard
– equivalent to counting 3SAT models Ô⇒ #P-complete



38

approximate inference



Inference by Stochastic Simulation 39

● Basic idea
– Draw N samples from a sampling distribution S
– Compute an approximate posterior probability P̂
– Show this converges to the true probability P

● Outline
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process
whose stationary distribution is the true posterior



Sampling from an Empty Network 40

function PRIOR-SAMPLE(bn) returns an event sampled from bn
inputs: bn, a belief network specifying joint distribution P(X1, . . . , Xn)
x ← an event with n elements
for i = 1 to n do
    xi ← a random sample from P(Xi ∣ parents(Xi))
         given the values of Parents(Xi) in x
return x
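
● A minimal Python sketch of PRIOR-SAMPLE on the sprinkler network used in the following slides; CPT entries not printed in this text (e.g. P(Sprinkler = true ∣ Cloudy = true) = 0.1) are the standard textbook values and are assumed here:

import random

# variable -> (parents, P(variable = true | parent values)), in topological order
SPRINKLER = {
    "Cloudy":    ((), {(): 0.5}),
    "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
    "Rain":      (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
    "WetGrass":  (("Sprinkler", "Rain"), {(True, True): 0.99, (True, False): 0.90,
                                          (False, True): 0.90, (False, False): 0.00}),
}

def prior_sample(bn):
    """Sample one complete event from the network's prior distribution."""
    event = {}
    for var, (parents, cpt) in bn.items():      # dicts preserve the topological order (Python 3.7+)
        p_true = cpt[tuple(event[p] for p in parents)]
        event[var] = random.random() < p_true
    return event

print(prior_sample(SPRINKLER))
# e.g. {'Cloudy': True, 'Sprinkler': False, 'Rain': True, 'WetGrass': True},
# an event this process generates with probability 0.5 × 0.9 × 0.8 × 0.9 = 0.324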



Examples 41–47

(Figures: a step-by-step sampling run through the network, one variable at a time.)


Sampling from an Empty Network 48

● Probability that PRIOR-SAMPLE generates a particular event

S_PS(x1 . . . xn) = ∏_{i=1}^{n} P(xi ∣ parents(Xi)) = P(x1 . . . xn)
i.e., the true prior probability

● E.g., S_PS(t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P(t, f, t, t)

● Let N_PS(x1 . . . xn) be the number of samples generated for event x1, . . . , xn

● Then we have lim_{N→∞} P̂(x1, . . . , xn) = lim_{N→∞} N_PS(x1, . . . , xn)/N
                                            = S_PS(x1, . . . , xn)
                                            = P(x1 . . . xn)

● That is, estimates derived from PRIOR-SAMPLE are consistent

● Shorthand: P̂ (x1, . . . , xn) ≈ P (x1 . . . xn)



Rejection Sampling 49

● P̂(X∣e) estimated from samples agreeing with e

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X ∣ e)
local variables: N, a vector of counts over X, initially zero
for j = 1 to N do
    x ← PRIOR-SAMPLE(bn)
    if x is consistent with e then
        N[x] ← N[x] + 1 where x is the value of X in x
return NORMALIZE(N[X])

● E.g., estimate P(Rain ∣ Sprinkler = true) using 100 samples
27 samples have Sprinkler = true
Of these, 8 have Rain = true and 19 have Rain = false

● P̂(Rain ∣ Sprinkler = true) = NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩

● Similar to a basic real-world empirical estimation procedure
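
● A minimal Python sketch of rejection sampling, reusing SPRINKLER and prior_sample from the sketch after the PRIOR-SAMPLE slide:

from collections import Counter

def rejection_sampling(query, evidence, bn, n):
    """Estimate P(query | evidence) by discarding samples that disagree with the evidence."""
    counts = Counter()
    for _ in range(n):
        sample = prior_sample(bn)               # from the earlier sketch
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[query]] += 1
    total = sum(counts.values()) or 1           # guard against rejecting everything
    return {value: counts[value] / total for value in (True, False)}

print(rejection_sampling("Rain", {"Sprinkler": True}, SPRINKLER, 10000))
# roughly {True: 0.3, False: 0.7}; about 70% of the samples are thrown away here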



Analysis of Rejection Sampling 50

● P̂(X∣e) = α N_PS(X, e) (algorithm defn.)
          = N_PS(X, e)/N_PS(e) (normalized by N_PS(e))
          ≈ P(X, e)/P(e) (property of PRIOR-SAMPLE)
          = P(X∣e) (defn. of conditional probability)

● Hence rejection sampling returns consistent posterior estimates

● Problem: hopelessly expensive if P (e) is small

● P (e) drops off exponentially with number of evidence variables!



Likelihood Weighting 51

● Idea: fix evidence variables, sample only nonevidence variables, and weight each sample by the likelihood it accords the evidence

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X ∣ e)
local variables: W, a vector of weighted counts over X, initially zero
for j = 1 to N do
    x, w ← WEIGHTED-SAMPLE(bn, e)
    W[x] ← W[x] + w where x is the value of X in x
return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
x ← an event with n elements; w ← 1
for i = 1 to n do
    if Xi has a value xi in e
        then w ← w × P(Xi = xi ∣ parents(Xi))
        else xi ← a random sample from P(Xi ∣ parents(Xi))
return x, w
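
● A minimal Python sketch of both functions, again assuming the SPRINKLER definition from the earlier prior-sampling sketch:

import random
from collections import defaultdict

def weighted_sample(bn, evidence):
    """Fix evidence variables, sample the rest, and weight by the evidence likelihood."""
    event, weight = {}, 1.0
    for var, (parents, cpt) in bn.items():
        p_true = cpt[tuple(event[p] for p in parents)]
        if var in evidence:
            event[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            event[var] = random.random() < p_true
    return event, weight

def likelihood_weighting(query, evidence, bn, n):
    totals = defaultdict(float)
    for _ in range(n):
        event, weight = weighted_sample(bn, evidence)
        totals[event[query]] += weight
    norm = sum(totals.values())
    return {value: totals[value] / norm for value in (True, False)}

print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True}, SPRINKLER, 10000))
# ≈ {True: 0.32, False: 0.68} for the standard sprinkler CPTs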



Likelihood Weighting Example 52

w = 1.0



Likelihood Weighting Example 53

w = 1.0



Likelihood Weighting Example 54

w = 1.0



Likelihood Weighting Example 55

w = 1.0 × 0.1



Likelihood Weighting Example 56

w = 1.0 × 0.1



Likelihood Weighting Example 57

w = 1.0 × 0.1



Likelihood Weighting Example 58

w = 1.0 × 0.1 × 0.99 = 0.099



Likelihood Weighting Analysis 59

● Sampling probability for WEIGHTED-SAMPLE is

S_WS(z, e) = ∏_{i=1}^{l} P(zi ∣ parents(Zi))

● Note: pays attention to evidence in ancestors only
Ô⇒ somewhere “in between” prior and posterior distribution

● Weight for a given sample z, e is

w(z, e) = ∏_{i=1}^{m} P(ei ∣ parents(Ei))

● Weighted sampling probability is

S_WS(z, e) w(z, e)
= ∏_{i=1}^{l} P(zi ∣ parents(Zi)) ∏_{i=1}^{m} P(ei ∣ parents(Ei))
= P(z, e) (by standard global semantics of network)

● Hence likelihood weighting returns consistent estimates
but performance still degrades with many evidence variables
because a few samples have nearly all the total weight



Approximate Inference using MCMC 60

● “State” of network = current assignment to all variables


● Generate next state by sampling one variable given Markov blanket
Sample each variable in turn, keeping evidence fixed

function MCMC-ASK(X, e, bn, N) returns an estimate of P(X ∣ e)
local variables: N[X], a vector of counts over X, initially zero
                 Z, the nonevidence variables in bn
                 x, the current state of the network, initially copied from e
initialize x with random values for the variables in Z
for j = 1 to N do
    for each Zi in Z do
        sample the value of Zi in x from P(Zi ∣ mb(Zi))
              given the values of MB(Zi) in x
        N[x] ← N[x] + 1 where x is the value of X in x
return NORMALIZE(N[X])

● Can also choose a variable to sample at random each time
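
● A minimal Gibbs-sampling sketch for the sprinkler query on the following slides, reusing SPRINKLER from the earlier sketch; each nonevidence variable is resampled from its Markov blanket using the formula given on the Markov Blanket Sampling slide:

import random
from collections import Counter

def markov_blanket_score(bn, var, value, state):
    """Unnormalized P(var = value | mb(var)): P(var | parents) × Π_children P(child | its parents)."""
    state = dict(state, **{var: value})
    def p(v):
        parents, cpt = bn[v]
        p_true = cpt[tuple(state[q] for q in parents)]
        return p_true if state[v] else 1.0 - p_true
    score = p(var)
    for child, (parents, _) in bn.items():
        if var in parents:
            score *= p(child)
    return score

def gibbs_ask(query, evidence, bn, n):
    state = dict(evidence)
    nonevidence = [v for v in bn if v not in evidence]
    for v in nonevidence:                       # random initial state for nonevidence variables
        state[v] = random.choice([True, False])
    counts = Counter()
    for _ in range(n):
        for v in nonevidence:
            p_t = markov_blanket_score(bn, v, True, state)
            p_f = markov_blanket_score(bn, v, False, state)
            state[v] = random.random() < p_t / (p_t + p_f)
        counts[state[query]] += 1
    total = sum(counts.values())
    return {value: counts[value] / total for value in (True, False)}

print(gibbs_ask("Rain", {"Sprinkler": True, "WetGrass": True}, SPRINKLER, 20000))
# wanders over the four states and converges towards ≈ {True: 0.32, False: 0.68}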



The Markov Chain 61

● With Sprinkler = true, WetGrass = true, there are four states:

● Wander about for a while, average what you see



MCMC Example 62

● Estimate P(Rain ∣ Sprinkler = true, WetGrass = true)

● Sample Cloudy or Rain given its Markov blanket, repeat.
Count number of times Rain is true and false in the samples.

● E.g., visit 100 states
31 have Rain = true, 69 have Rain = false

● P̂(Rain ∣ Sprinkler = true, WetGrass = true)
= NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩

● Theorem: chain approaches stationary distribution:
long-run fraction of time spent in each state is exactly
proportional to its posterior probability



Markov Blanket Sampling 63

● Markov blanket of Cloudy is Sprinkler and Rain

● Markov blanket of Rain is Cloudy, Sprinkler, and WetGrass

● Probability given the Markov blanket is calculated as follows:

P(x′i ∣ mb(Xi)) = P(x′i ∣ parents(Xi)) ∏_{Zj ∈ Children(Xi)} P(zj ∣ parents(Zj))

● Easily implemented in message-passing parallel systems, brains

● Main computational problems


– difficult to tell if convergence has been achieved
– can be wasteful if Markov blanket is large:
P (Xi∣mb(Xi)) won’t change much (law of large numbers)



Summary 64

● Bayes nets provide a natural representation for (causally induced) conditional independence
● Topology + CPTs = compact representation of joint distribution
● Generally easy for (non)experts to construct
● Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
● Continuous variables Ô⇒ parameterized distributions (e.g., linear Gaussian)
● Exact inference by variable elimination
– polytime on polytrees, NP-hard on general graphs
– space = time, very sensitive to topology
● Approximate inference by LW, MCMC
– LW does poorly when there is lots of (downstream) evidence
– LW, MCMC generally insensitive to topology
– Convergence can be very slow with probabilities close to 1 or 0
– Can handle arbitrary combinations of discrete and continuous variables

