
Bayesian Networks

Philipp Koehn

6 April 2017



Outline 1

● Bayesian Networks

● Parameterized distributions

● Exact inference

● Approximate inference




bayesian networks



Bayesian Networks 3

● A simple, graphical notation for conditional independence assertions


and hence for compact specification of full joint distributions

● Syntax
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents:
P(Xi ∣ Parents(Xi))

● In the simplest case, conditional distribution represented as


a conditional probability table (CPT) giving the
distribution over Xi for each combination of parent values



Example 4

● Topology of network encodes conditional independence assertions:

● Weather is independent of the other variables

● Toothache and Catch are conditionally independent given Cavity



Example 5

● I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary
doesn’t call. Sometimes it’s set off by minor earthquakes.
Is there a burglar?

● Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls

● Network topology reflects “causal” knowledge


– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call



Example 6 (figure: the burglary network with its conditional probability tables)



Compactness 7

● A conditional probability table for Boolean Xi with k Boolean parents has 2^k


rows for the combinations of parent values

● Each row requires one number p for Xi = true


(the number for Xi = false is just 1 − p)

● If each variable has no more than k parents,


the complete network requires O(n ⋅ 2^k) numbers

● I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

● For burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)



Global Semantics 8

● Global semantics defines the full joint distribution as the product of the local
conditional distributions:
P (x1, . . . , xn) = ∏i=1..n P (xi∣parents(Xi))

● E.g., P (j ∧ m ∧ a ∧ ¬b ∧ ¬e)

= P (j∣a)P (m∣a)P (a∣¬b, ¬e)P (¬b)P (¬e)


= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
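
● Below is a minimal Python sketch of this product-of-CPTs computation for the burglary network (the CPT values are the standard burglary-network numbers consistent with the calculation above; the dictionary layout and function name are choices made here for illustration):

# Each CPT maps a tuple of parent values to P(X = true | parents).
cpt = {'B': {(): 0.001}, 'E': {(): 0.002},
       'A': {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       'J': {(True,): 0.90, (False,): 0.05},
       'M': {(True,): 0.70, (False,): 0.01}}
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}

def joint(event):
    """P(x1, ..., xn) = product over i of P(xi | parents(Xi))."""
    p = 1.0
    for var in ['B', 'E', 'A', 'J', 'M']:          # any topological order works
        p_true = cpt[var][tuple(event[u] for u in parents[var])]
        p *= p_true if event[var] else 1.0 - p_true
    return p

print(joint({'B': False, 'E': False, 'A': True, 'J': True, 'M': True}))
# 0.000628..., i.e. the 0.9 × 0.7 × 0.001 × 0.999 × 0.998 computation above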



Local Semantics 9

● Local semantics: each node is conditionally independent


of its nondescendants given its parents

● Theorem: Local semantics ⇔ global semantics



Markov Blanket 10

● Each node is conditionally independent of all others given its


Markov blanket: parents + children + children’s parents



Constructing Bayesian Networks 11

● Need a method such that a series of locally testable assertions of


conditional independence guarantees the required global semantics
1. Choose an ordering of variables X1, . . . , Xn
2. For i = 1 to n
add Xi to the network
select parents from X1, . . . , Xi−1 such that
P(Xi ∣ Parents(Xi)) = P(Xi ∣ X1, . . . , Xi−1)

● This choice of parents guarantees the global semantics:


P(X1, . . . , Xn) = ∏i=1..n P(Xi ∣ X1, . . . , Xi−1) (chain rule)
= ∏i=1..n P(Xi ∣ Parents(Xi)) (by construction)



Example 12

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)?



Example 13

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)?



Example 14

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)? No
● P (B∣A, J, M ) = P (B∣A)?
● P (B∣A, J, M ) = P (B)?



Example 15

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)? No
● P (B∣A, J, M ) = P (B∣A)? Yes
● P (B∣A, J, M ) = P (B)? No
● P (E∣B, A, J, M ) = P (E∣A)?
● P (E∣B, A, J, M ) = P (E∣A, B)?



Example 16

● Suppose we choose the ordering M, J, A, B, E

● P (J∣M ) = P (J)? No
● P (A∣J, M ) = P (A∣J)? P (A∣J, M ) = P (A)? No
● P (B∣A, J, M ) = P (B∣A)? Yes
● P (B∣A, J, M ) = P (B)? No
● P (E∣B, A, J, M ) = P (E∣A)? No
● P (E∣B, A, J, M ) = P (E∣A, B)? Yes



Example 17

● Deciding conditional independence is hard in noncausal directions


● (Causal models and conditional independence seem hardwired for humans!)
● Assessing conditional probabilities is hard in noncausal directions
● Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed



Example: Car Diagnosis 18

● Initial evidence: car won’t start


● Testable variables (green), “broken, so fix it” variables (orange)
● Hidden variables (gray) ensure sparse structure, reduce parameters



Example: Car Insurance 19



Compact Conditional Distributions 20

● CPT grows exponentially with number of parents


CPT becomes infinite with continuous-valued parent or child

● Solution: canonical distributions that are defined compactly

● Deterministic nodes are the simplest case:


X = f (P arents(X)) for some function f

● E.g., Boolean functions


NorthAmerican ⇔ Canadian ∨ US ∨ Mexican

● E.g., numerical relationships among continuous variables

∂Level/∂t = inflow + precipitation − outflow − evaporation



Compact Conditional Distributions 21

● Noisy-OR distributions model multiple noninteracting causes


– parents U1 . . . Uk include all causes (can add leak node)
– independent failure probability qi for each cause alone
Ô⇒ P (X∣U1 . . . Uj , ¬Uj+1 . . . ¬Uk ) = 1 − ∏i=1..j qi

Cold  Flu  Malaria  P (Fever)  P (¬Fever)


F F F 0.0 1.0
F F T 0.9 0.1
F T F 0.8 0.2
F T T 0.98 0.02 = 0.2 × 0.1
T F F 0.4 0.6
T F T 0.94 0.06 = 0.6 × 0.1
T T F 0.88 0.12 = 0.6 × 0.2
T T T 0.988 0.012 = 0.6 × 0.2 × 0.1

● Number of parameters linear in number of parents
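
● A minimal Python sketch of the noisy-OR rule, using the per-cause failure probabilities implied by the fever table above (q_cold = 0.6, q_flu = 0.2, q_malaria = 0.1):

def noisy_or(q, active_causes):
    """P(effect | causes) = 1 − product of q_i over the causes that are present."""
    p_all_fail = 1.0
    for cause, q_i in q.items():
        if cause in active_causes:
            p_all_fail *= q_i
    return 1.0 - p_all_fail

q = {'Cold': 0.6, 'Flu': 0.2, 'Malaria': 0.1}
print(noisy_or(q, {'Cold', 'Flu', 'Malaria'}))   # 0.988, matching the last row of the table
print(noisy_or(q, {'Cold'}))                     # 0.4, matching the T F F row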



Hybrid (Discrete+Continuous) Networks 22

● Discrete (Subsidy? and Buys?); continuous (Harvest and Cost)

● Option 1: discretization—possibly large errors, large CPTs


Option 2: finitely parameterized canonical families

● 1) Continuous variable, discrete+continuous parents (e.g., Cost)


2) Discrete variable, continuous parents (e.g., Buys?)



Continuous Child Variables 23

● Need one conditional density function for child variable given continuous
parents, for each possible assignment to discrete parents

● Most common is the linear Gaussian model, e.g.,:

P (Cost = c ∣ Harvest = h, Subsidy? = true)
= N (at h + bt, σt)(c)
= (1 / (σt √(2π))) exp( −(1/2) ((c − (at h + bt)) / σt)^2 )
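
● A minimal Python sketch of this linear Gaussian density (the parameter values a_t, b_t, sigma_t used in the example call are made up for illustration, not taken from the lecture):

import math

def linear_gaussian(c, h, a_t, b_t, sigma_t):
    """Density of Cost = c given Harvest = h: Normal with mean a_t*h + b_t and std dev sigma_t."""
    mu = a_t * h + b_t
    return math.exp(-0.5 * ((c - mu) / sigma_t) ** 2) / (sigma_t * math.sqrt(2 * math.pi))

# e.g. with a_t = -0.5 a larger harvest shifts the expected cost down
print(linear_gaussian(c=7.0, h=6.0, a_t=-0.5, b_t=10.0, sigma_t=1.0))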



Continuous Child Variables 24

● All-continuous network with LG distributions


Ô⇒ full joint distribution is a multivariate Gaussian

● Discrete+continuous LG network is a conditional Gaussian network i.e., a


multivariate Gaussian over all continuous variables for each combination of
discrete variable values



Discrete Variable w/ Continuous Parents 25

● Probability of Buys? given Cost should be a “soft” threshold:

● Probit distribution uses integral of Gaussian:


Φ(x) = ∫−∞..x N (0, 1)(x) dx
P (Buys? = true ∣ Cost = c) = Φ((−c + µ)/σ)
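
● A minimal Python sketch of the probit model; Φ is the standard normal CDF, computed here via the error function (the µ and σ values in the example calls are illustrative, not from the lecture):

import math

def probit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) = Phi((-c + mu) / sigma)."""
    z = (-c + mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(probit_buys(c=8.0, mu=10.0, sigma=2.0))    # ≈ 0.84: low cost, likely to buy
print(probit_buys(c=12.0, mu=10.0, sigma=2.0))   # ≈ 0.16: high cost, unlikely to buy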



Why the Probit? 26

● It’s sort of the right shape

● Can view as hard threshold whose location is subject to noise



Discrete Variable 27

● Sigmoid (or logit) distribution also used in neural networks:


P (Buys? = true ∣ Cost = c) = 1 / (1 + exp(−2 (−c + µ)/σ))

● Sigmoid has similar shape to probit but much longer tails:
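
● For comparison, a minimal sketch of the logit version with the same illustrative µ and σ as in the probit sketch above:

import math

def logit_buys(c, mu, sigma):
    """P(Buys? = true | Cost = c) = 1 / (1 + exp(-2(-c + mu)/sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (-c + mu) / sigma))

print(logit_buys(c=8.0, mu=10.0, sigma=2.0))   # ≈ 0.88, close to the probit value but with heavier tails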




inference



Inference Tasks 29

● Simple queries: compute posterior marginal P(Xi∣E = e)


e.g., P (NoGas∣Gauge = empty, Lights = on, Starts = false)

● Conjunctive queries: P(Xi, Xj ∣E = e) = P(Xi∣E = e)P(Xj ∣Xi, E = e)

● Optimal decisions: decision networks include utility information;


probabilistic inference required for P (outcome∣action, evidence)

● Value of information: which evidence to seek next?

● Sensitivity analysis: which probability values are most critical?

● Explanation: why do I need a new starter motor?



Inference by Enumeration 30

● Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation

● Simple query on the burglary network


P(B∣j, m)
= P(B, j, m)/P (j, m)
= αP(B, j, m)
= α ∑e ∑a P(B, e, a, j, m)

● Rewrite full joint entries using product of CPT entries:


P(B∣j, m)
= α ∑e ∑a P(B)P (e)P(a∣B, e)P (j∣a)P (m∣a)
= αP(B) ∑e P (e) ∑a P(a∣B, e)P (j∣a)P (m∣a)

● Recursive depth-first enumeration: O(n) space, O(d^n) time



Enumeration Algorithm 31

function ENUMERATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, observed values for variables E
          bn, a Bayesian network with variables {X} ∪ E ∪ Y
  Q(X) ← a distribution over X, initially empty
  for each value xi of X do
    extend e with value xi for X
    Q(xi) ← ENUMERATE-ALL(VARS[bn], e)
  return NORMALIZE(Q(X))

function ENUMERATE-ALL(vars, e) returns a real number
  if EMPTY?(vars) then return 1.0
  Y ← FIRST(vars)
  if Y has value y in e
    then return P (y ∣ Pa(Y)) × ENUMERATE-ALL(REST(vars), e)
    else return ∑y P (y ∣ Pa(Y)) × ENUMERATE-ALL(REST(vars), ey)
      where ey is e extended with Y = y
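
● A compact Python sketch of the enumeration algorithm, run on the burglary network (CPT values as in the earlier joint-probability sketch; p_of is a small helper introduced here to look up P(X = x ∣ parents)):

cpt = {'B': {(): 0.001}, 'E': {(): 0.002},
       'A': {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001},
       'J': {(True,): 0.90, (False,): 0.05},
       'M': {(True,): 0.70, (False,): 0.01}}
parents = {'B': [], 'E': [], 'A': ['B', 'E'], 'J': ['A'], 'M': ['A']}
order = ['B', 'E', 'A', 'J', 'M']                   # topological order of the variables

def p_of(var, value, e):
    p_true = cpt[var][tuple(e[u] for u in parents[var])]
    return p_true if value else 1.0 - p_true

def enumerate_all(variables, e):
    if not variables:
        return 1.0
    y, rest = variables[0], variables[1:]
    if y in e:                                       # evidence variable: multiply in its probability
        return p_of(y, e[y], e) * enumerate_all(rest, e)
    return sum(p_of(y, v, {**e, y: v}) * enumerate_all(rest, {**e, y: v})
               for v in (True, False))               # hidden variable: sum it out

def enumeration_ask(x, e):
    q = {v: enumerate_all(order, {**e, x: v}) for v in (True, False)}
    z = sum(q.values())
    return {v: p / z for v, p in q.items()}

print(enumeration_ask('B', {'J': True, 'M': True}))  # ≈ {True: 0.284, False: 0.716}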



Evaluation Tree 32

● Enumeration is inefficient: repeated computation


e.g., computes P (j∣a)P (m∣a) for each value of e



Inference by Variable Elimination 33

● Variable elimination: carry out summations right-to-left,


storing intermediate results (factors) to avoid recomputation
P(B∣j, m)
= α P(B) ∑e P (e) ∑a P(a∣B, e) P (j∣a) P (m∣a)
  (the five factors correspond to B, E, A, J, M respectively)
= αP(B) ∑e P (e) ∑a P(a∣B, e)P (j∣a)fM (a)
= αP(B) ∑e P (e) ∑a P(a∣B, e)fJ (a)fM (a)
= αP(B) ∑e P (e) ∑a fA(a, b, e)fJ (a)fM (a)
= αP(B) ∑e P (e)fĀJM (b, e) (sum out A)
= αP(B)fĒĀJM (b) (sum out E)
= αfB (b) × fĒĀJM (b)



Variable Elimination Algorithm 34

function ELIMINATION-ASK(X, e, bn) returns a distribution over X
  inputs: X, the query variable
          e, evidence specified as an event
          bn, a belief network specifying joint distribution P(X1, . . . , Xn)
  factors ← [ ]; vars ← REVERSE(VARS[bn])
  for each var in vars do
    factors ← [MAKE-FACTOR(var, e) ∣ factors]
    if var is a hidden variable then factors ← SUM-OUT(var, factors)
  return NORMALIZE(POINTWISE-PRODUCT(factors))
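
● A minimal Python sketch of the two factor operations this algorithm relies on, pointwise product and summing out, with factors represented as dictionaries from value tuples to numbers (this representation and the helper names are choices made here for illustration, not the lecture's code):

from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Multiply two factors over Boolean variables; each maps a value tuple (in its variable order) to a number."""
    all_vars = vars1 + [v for v in vars2 if v not in vars1]
    result = {}
    for values in product((True, False), repeat=len(all_vars)):
        a = dict(zip(all_vars, values))
        result[values] = (f1[tuple(a[v] for v in vars1)] *
                          f2[tuple(a[v] for v in vars2)])
    return result, all_vars

def sum_out(var, f, variables):
    """Eliminate var from factor f by summing over its values."""
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    result = {}
    for values, p in f.items():
        key = values[:i] + values[i + 1:]
        result[key] = result.get(key, 0.0) + p
    return result, new_vars

# e.g. combine f_J(a) = P(j|a) and f_M(a) = P(m|a) into one factor over A, then sum A out of it
fJ = {(True,): 0.90, (False,): 0.05}
fM = {(True,): 0.70, (False,): 0.01}
fJM, vs = pointwise_product(fJ, ['A'], fM, ['A'])
print(sum_out('A', fJM, vs))    # ({(): 0.6305}, [])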



Irrelevant Variables 35

● Consider the query P (JohnCalls∣Burglary = true)


P (J∣b) = αP (b) ∑e P (e) ∑a P (a∣b, e)P (J∣a) ∑m P (m∣a)
Sum over m is identically 1; M is irrelevant to the query

● Theorem 1: Y is irrelevant unless Y ∈ Ancestors({X} ∪ E)

● Here
– X = JohnCalls, E = {Burglary}
– Ancestors({X} ∪ E) = {Alarm, Earthquake}
⇒ MaryCalls is irrelevant

● Compare this to backward chaining from the query in Horn clause KBs



Irrelevant Variables 36

● Definition: moral graph of Bayes net: marry all parents and drop arrows

● Definition: A is m-separated from B by C iff separated by C in the moral graph

● Theorem 2: Y is irrelevant if m-separated from X by E

● For P (JohnCalls∣Alarm = true), both


Burglary and Earthquake are irrelevant



Complexity of Exact Inference 37

● Singly connected networks (or polytrees)


– any two nodes are connected by at most one (undirected) path
– time and space cost of variable elimination are O(d^k n)

● Multiply connected networks


– can reduce 3SAT to exact inference Ô⇒ NP-hard
– equivalent to counting 3SAT models Ô⇒ #P-complete




approximate inference



Inference by Stochastic Simulation 39

● Basic idea
– Draw N samples from a sampling distribution S
– Compute an approximate posterior probability P̂
– Show this converges to the true probability P

● Outline
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process
whose stationary distribution is the true posterior



Sampling from an Empty Network 40

function PRIOR-SAMPLE(bn) returns an event sampled from bn
  inputs: bn, a belief network specifying joint distribution P(X1, . . . , Xn)
  x ← an event with n elements
  for i = 1 to n do
    xi ← a random sample from P(Xi ∣ parents(Xi))
      given the values of Parents(Xi) in x
  return x
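
● A minimal Python sketch of prior sampling on the sprinkler network used in the following examples (the CPT values are the standard Cloudy/Sprinkler/Rain/WetGrass numbers, consistent with the 0.324 calculation a few slides below):

import random

cpt = {'Cloudy':    {(): 0.50},
       'Sprinkler': {(True,): 0.10, (False,): 0.50},
       'Rain':      {(True,): 0.80, (False,): 0.20},
       'WetGrass':  {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.00}}
parents = {'Cloudy': [], 'Sprinkler': ['Cloudy'], 'Rain': ['Cloudy'],
           'WetGrass': ['Sprinkler', 'Rain']}
order = ['Cloudy', 'Sprinkler', 'Rain', 'WetGrass']   # topological order

def prior_sample():
    """Sample each variable in topological order from P(Xi | parents(Xi))."""
    x = {}
    for var in order:
        p_true = cpt[var][tuple(x[u] for u in parents[var])]
        x[var] = random.random() < p_true
    return x

print(prior_sample())   # e.g. {'Cloudy': True, 'Sprinkler': False, 'Rain': True, 'WetGrass': True}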



Examples 41–47 (figure slides): a step-by-step run of PRIOR-SAMPLE on the sprinkler network, sampling Cloudy, Sprinkler, Rain, and WetGrass in turn.


Sampling from an Empty Network 48

● Probability that PRIOR-SAMPLE generates a particular event

SPS (x1 . . . xn) = ∏i=1..n P (xi∣parents(Xi)) = P (x1 . . . xn)
i.e., the true prior probability

● E.g., SPS (t, f, t, t) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324 = P (t, f, t, t)

● Let NPS (x1 . . . xn) be the number of samples generated for event x1, . . . , xn

● Then we have limN→∞ P̂ (x1, . . . , xn) = limN→∞ NPS (x1, . . . , xn)/N
= SPS (x1, . . . , xn)
= P (x1 . . . xn)

● That is, estimates derived from PRIOR-SAMPLE are consistent

● Shorthand: P̂ (x1, . . . , xn) ≈ P (x1 . . . xn)



Rejection Sampling 49

● P̂(X∣e) estimated from samples agreeing with e

function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P (X∣e)
  local variables: N, a vector of counts over X, initially zero
  for j = 1 to N do
    x ← PRIOR-SAMPLE(bn)
    if x is consistent with e then
      N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N[X])

● E.g., estimate P(Rain∣Sprinkler = true) using 100 samples


27 samples have Sprinkler = true
Of these, 8 have Rain = true and 19 have Rain = false
● P̂(Rain∣Sprinkler = true) = NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
● Similar to a basic real-world empirical estimation procedure
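
● A minimal Python sketch of rejection sampling for this query, reusing cpt, parents, order, and prior_sample from the prior-sampling sketch above (counts vary from run to run):

import random
# assumes cpt, parents, order, prior_sample as defined in the prior-sampling sketch

def rejection_sampling(X, evidence, n_samples):
    """Estimate P(X | evidence) by throwing away samples that disagree with the evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n_samples):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[X]] += 1
    total = counts[True] + counts[False]
    return {v: c / total for v, c in counts.items()} if total else None

print(rejection_sampling('Rain', {'Sprinkler': True}, 10000))
# roughly {True: 0.3, False: 0.7}; note that about 70% of the samples are rejected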



Analysis of Rejection Sampling 50

● P̂(X∣e) = αNPS (X, e) (algorithm defn.)
= NPS (X, e)/NPS (e) (normalized by NPS (e))
≈ P(X, e)/P (e) (property of PRIOR-SAMPLE)
= P(X∣e) (defn. of conditional probability)

● Hence rejection sampling returns consistent posterior estimates

● Problem: hopelessly expensive if P (e) is small

● P (e) drops off exponentially with number of evidence variables!



Likelihood Weighting 51

● Idea: fix evidence variables, sample only nonevidence variables,


and weight each sample by the likelihood it accords the evidence

function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P (X∣e)
  local variables: W, a vector of weighted counts over X, initially zero
  for j = 1 to N do
    x, w ← WEIGHTED-SAMPLE(bn)
    W[x] ← W[x] + w where x is the value of X in x
  return NORMALIZE(W[X])

function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  x ← an event with n elements; w ← 1
  for i = 1 to n do
    if Xi has a value xi in e
      then w ← w × P (Xi = xi ∣ parents(Xi))
      else xi ← a random sample from P(Xi ∣ parents(Xi))
  return x, w
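
● A minimal Python sketch of likelihood weighting on the sprinkler network, again reusing cpt, parents, and order from the prior-sampling sketch above (estimates vary from run to run):

import random
# assumes cpt, parents, order as defined in the prior-sampling sketch

def weighted_sample(evidence):
    """Fix evidence variables, sample the rest, and weight by the likelihood of the evidence."""
    x, w = {}, 1.0
    for var in order:
        p_true = cpt[var][tuple(x[u] for u in parents[var])]
        if var in evidence:
            x[var] = evidence[var]
            w *= p_true if evidence[var] else 1.0 - p_true
        else:
            x[var] = random.random() < p_true
    return x, w

def likelihood_weighting(X, evidence, n_samples):
    weights = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        x, w = weighted_sample(evidence)
        weights[x[X]] += w
    total = weights[True] + weights[False]
    return {v: wt / total for v, wt in weights.items()}

print(likelihood_weighting('Rain', {'Sprinkler': True, 'WetGrass': True}, 10000))
# ≈ {True: 0.32, False: 0.68} for this network and evidence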



Likelihood Weighting Example (slides 52–58, figure slides): one weighted sample is constructed step by step on the sprinkler network; the weight starts at w = 1.0, becomes w = 1.0 × 0.1 after weighting in the evidence Sprinkler = true, and ends at w = 1.0 × 0.1 × 0.99 = 0.099 after weighting in WetGrass = true.


Likelihood Weighting Analysis 59

● Sampling probability for WEIGHTED-SAMPLE is

SWS (z, e) = ∏i=1..l P (zi∣parents(Zi))

● Note: pays attention to evidence in ancestors only


Ô⇒ somewhere “in between” prior and
posterior distribution

● Weight for a given sample z, e is

w(z, e) = ∏i=1..m P (ei∣parents(Ei))

● Weighted sampling probability is

SWS (z, e) w(z, e)
= ∏i=1..l P (zi∣parents(Zi)) ∏i=1..m P (ei∣parents(Ei))
= P (z, e) (by standard global semantics of network)

● Hence likelihood weighting returns consistent estimates


but performance still degrades with many evidence variables
because a few samples have nearly all the total weight



Approximate Inference using MCMC 60

● “State” of network = current assignment to all variables


● Generate next state by sampling one variable given Markov blanket
Sample each variable in turn, keeping evidence fixed

function MCMC-ASK(X, e, bn, N) returns an estimate of P (X∣e)
  local variables: N[X], a vector of counts over X, initially zero
                   Z, the nonevidence variables in bn
                   x, the current state of the network, initially copied from e
  initialize x with random values for the variables in Z
  for j = 1 to N do
    for each Zi in Z do
      sample the value of Zi in x from P(Zi ∣ mb(Zi))
        given the values of MB(Zi) in x
      N[x] ← N[x] + 1 where x is the value of X in x
  return NORMALIZE(N[X])

● Can also choose a variable to sample at random each time
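
● A minimal Python sketch of this Gibbs sampler for P(Rain ∣ Sprinkler = true, WetGrass = true), reusing cpt, parents, and order from the prior-sampling sketch above; the Markov-blanket conditional is computed as described on the Markov Blanket Sampling slide below:

import random
# assumes cpt, parents, order as defined in the prior-sampling sketch
children = {v: [c for c in order if v in parents[c]] for v in order}

def p_of(var, value, state):
    p_true = cpt[var][tuple(state[u] for u in parents[var])]
    return p_true if value else 1.0 - p_true

def sample_from_markov_blanket(var, state):
    """P(x' | mb(X)) ∝ P(x' | parents(X)) × product over children Z of P(z | parents(Z))."""
    weights = {}
    for value in (True, False):
        s = {**state, var: value}
        w = p_of(var, value, s)
        for child in children[var]:
            w *= p_of(child, s[child], s)
        weights[value] = w
    return random.random() < weights[True] / (weights[True] + weights[False])

def gibbs_ask(X, evidence, n_steps):
    nonevidence = [v for v in order if v not in evidence]
    state = dict(evidence)
    for v in nonevidence:                       # random initial state for nonevidence variables
        state[v] = random.random() < 0.5
    counts = {True: 0, False: 0}
    for _ in range(n_steps):
        for v in nonevidence:
            state[v] = sample_from_markov_blanket(v, state)
            counts[state[X]] += 1
    total = counts[True] + counts[False]
    return {val: c / total for val, c in counts.items()}

print(gibbs_ask('Rain', {'Sprinkler': True, 'WetGrass': True}, 5000))
# converges toward the true posterior, roughly {True: 0.32, False: 0.68}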



The Markov Chain 61

● With Sprinkler = true, WetGrass = true, there are four states:

● Wander about for a while, average what you see



MCMC Example 62

● Estimate P(Rain∣Sprinkler = true, WetGrass = true)

● Sample Cloudy or Rain given its Markov blanket, repeat.


Count number of times Rain is true and false in the samples.

● E.g., visit 100 states


31 have Rain = true, 69 have Rain = false

● P̂(Rain∣Sprinkler = true, WetGrass = true)

= NORMALIZE(⟨31, 69⟩) = ⟨0.31, 0.69⟩

● Theorem: chain approaches stationary distribution:


long-run fraction of time spent in each state is exactly
proportional to its posterior probability



Markov Blanket Sampling 63

● Markov blanket of Cloudy is Sprinkler and Rain

● Markov blanket of Rain is


Cloudy, Sprinkler, and WetGrass

● Probability given the Markov blanket is calculated as follows:


P (x′i∣mb(Xi)) = P (x′i∣parents(Xi)) ∏Zj ∈Children(Xi) P (zj ∣parents(Zj ))

● Easily implemented in message-passing parallel systems, brains

● Main computational problems


– difficult to tell if convergence has been achieved
– can be wasteful if Markov blanket is large:
P (Xi∣mb(Xi)) won’t change much (law of large numbers)



Summary 64

● Bayes nets provide a natural representation for (causally induced)


conditional independence
● Topology + CPTs = compact representation of joint distribution
● Generally easy for (non)experts to construct
● Canonical distributions (e.g., noisy-OR) = compact representation of CPTs
● Continuous variables Ô⇒ parameterized distributions (e.g., linear Gaussian)
● Exact inference by variable elimination
– polytime on polytrees, NP-hard on general graphs
– space = time, very sensitive to topology
● Approximate inference by LW, MCMC
– LW does poorly when there is lots of (downstream) evidence
– LW, MCMC generally insensitive to topology
– Convergence can be very slow with probabilities close to 1 or 0
– Can handle arbitrary combinations of discrete and continuous variables

