Lecture Bayesian Networks
Philipp Koehn
6 April 2017
● Bayesian Networks
● Parameterized distributions
● Exact inference
● Approximate inference
Bayesian Networks
● Syntax
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents:
P(Xi ∣ Parents(Xi))
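As a concrete sketch, the burglary network from the running example can be encoded as plain Python dicts: each variable maps to its tuple of parents and a CPT giving P(variable = true ∣ parent values). The probability numbers below are illustrative placeholders, not values stated in the lecture.

```python
# A Bayes net as plain dicts: variable -> (parents, CPT), where the CPT
# maps a tuple of parent values to P(variable = True | parent values).
# All probabilities are illustrative placeholders.
burglary_net = {
    "Burglary":   ((), {(): 0.001}),
    "Earthquake": ((), {(): 0.002}),
    "Alarm":      (("Burglary", "Earthquake"),
                   {(True, True): 0.95, (True, False): 0.94,
                    (False, True): 0.29, (False, False): 0.001}),
    "JohnCalls":  (("Alarm",), {(True,): 0.90, (False,): 0.05}),
    "MaryCalls":  (("Alarm",), {(True,): 0.70, (False,): 0.01}),
}

def prob(net, var, value, assignment):
    """P(var = value | parents), read off the CPT under a full assignment."""
    parents, cpt = net[var]
    p_true = cpt[tuple(assignment[p] for p in parents)]
    return p_true if value else 1.0 - p_true
```

The directed, acyclic structure is implicit in the parent tuples; each node stores only its local conditional distribution.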
● I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary
doesn’t call. Sometimes it’s set off by minor earthquakes.
Is there a burglar?
● I.e., the number of parameters grows linearly with n, vs. O(2ⁿ) for the full joint distribution
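The contrast can be made concrete with a short parameter count, assuming n Boolean variables with at most k Boolean parents each:

```python
def cpt_parameters(n, k):
    # n Boolean variables, each with at most k Boolean parents:
    # at most n * 2**k CPT entries for the network,
    # vs. 2**n - 1 independent numbers for the full joint table.
    return n * 2**k, 2**n - 1

network_size, joint_size = cpt_parameters(30, 5)
# 30 variables with at most 5 parents each: 960 numbers for the network,
# over a billion for the explicit joint.
```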
● Global semantics defines the full joint distribution as the product of the local
conditional distributions:
P(x1, . . . , xn) = ∏i=1..n P(xi ∣ parents(Xi))
● E.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j∣a) P(m∣a) P(a∣¬b,¬e) P(¬b) P(¬e)
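Evaluating the global semantics on this event multiplies one local CPT entry per variable. The numbers below are illustrative placeholders (the slide does not list the CPT values):

```python
# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e),
# evaluated with placeholder CPT entries.
p_b, p_e = 0.001, 0.002      # P(b), P(e)
p_a_given_nb_ne = 0.001      # P(a | ¬b, ¬e)
p_j_given_a = 0.90           # P(j | a)
p_m_given_a = 0.70           # P(m | a)

p = p_j_given_a * p_m_given_a * p_a_given_nb_ne * (1 - p_b) * (1 - p_e)
# p is roughly 0.00063
```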
● P(J∣M) = P(J)? No
● P(A∣J, M) = P(A∣J)? P(A∣J, M) = P(A)? No
● P(B∣A, J, M) = P(B∣A)? Yes
● P(B∣A, J, M) = P(B)? No
● P(E∣B, A, J, M) = P(E∣A)? No
● P(E∣B, A, J, M) = P(E∣A, B)? Yes
∂Level/∂t = inflow + precipitation − outflow − evaporation
● Need one conditional density function for child variable given continuous
parents, for each possible assignment to discrete parents
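A common choice is a conditional linear-Gaussian model: one Gaussian density over the continuous child per assignment of the discrete parents, with a mean that is linear in the continuous parents. The sketch below invents a Cost-given-(Subsidy, Harvest) parameterization purely for illustration; all names and numbers are placeholders.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# One (slope, intercept, sigma) per value of the discrete parent Subsidy;
# the continuous parent is Harvest.  All numbers are invented placeholders.
cost_cpd = {
    True:  {"slope": -0.5, "intercept": 5.0,  "sigma": 1.0},
    False: {"slope": -0.5, "intercept": 10.0, "sigma": 1.0},
}

def p_cost(cost, subsidy, harvest):
    """Conditional density of Cost given one discrete and one continuous parent."""
    d = cost_cpd[subsidy]
    return gaussian_pdf(cost, d["slope"] * harvest + d["intercept"], d["sigma"])
```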
Inference
● Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation
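A sketch of that idea in Python, inference by enumeration: variables are visited in topological order, evidence variables contribute their CPT entry, and hidden variables are summed out on the fly, so the explicit joint table is never built. CPT numbers are illustrative placeholders.

```python
# Topological order: (name, parents, CPT mapping parent values -> P(True)).
# All probabilities are illustrative placeholders.
net = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]

def enumerate_all(variables, evidence):
    """Sum of the joint over all completions of evidence."""
    if not variables:
        return 1.0
    (name, parents, cpt), rest = variables[0], variables[1:]
    p_true = cpt[tuple(evidence[p] for p in parents)]
    if name in evidence:
        p = p_true if evidence[name] else 1.0 - p_true
        return p * enumerate_all(rest, evidence)
    # Hidden variable: sum it out.
    return sum((p_true if v else 1.0 - p_true) *
               enumerate_all(rest, {**evidence, name: v})
               for v in (True, False))

def query(x, evidence):
    """Normalized posterior distribution P(x | evidence)."""
    dist = {v: enumerate_all(net, {**evidence, x: v}) for v in (True, False)}
    z = dist[True] + dist[False]
    return {v: p / z for v, p in dist.items()}
```

For example, `query("B", {"J": True, "M": True})` gives the posterior probability of a burglary given that both neighbors call.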
● Here
– X = JohnCalls, E = {Burglary}
– Ancestors({X} ∪ E) = {Alarm, Earthquake}
⇒ MaryCalls is irrelevant
● Compare this to backward chaining from the query in Horn clause KBs
● Definition: moral graph of Bayes net: marry all parents and drop arrows
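That definition translates almost directly into code; the sketch below moralizes a parent map into an undirected edge set (variable names are from the burglary example).

```python
def moral_graph(parents):
    """parents: node -> tuple of parents.  Returns undirected edges as frozensets."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))   # drop arrow directions
        for i, p in enumerate(ps):             # "marry" all pairs of co-parents
            for q in ps[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
# Moralizing adds the marriage edge {B, E} between Alarm's parents.
```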
Approximate Inference
● Basic idea
– Draw N samples from a sampling distribution S
– Compute an approximate posterior probability P̂
– Show this converges to the true probability P
● Outline
– Sampling from an empty network
– Rejection sampling: reject samples disagreeing with evidence
– Likelihood weighting: use evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process
whose stationary distribution is the true posterior
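The first two items in the outline can be sketched together: prior sampling draws each variable in topological order from its CPT, and rejection sampling keeps only the samples that agree with the evidence. CPT numbers are illustrative placeholders.

```python
import random

# Topological order: (name, parents, CPT mapping parent values -> P(True)).
# All probabilities are illustrative placeholders.
net = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]

def prior_sample(rng=random):
    """Draw one complete event from the network's prior."""
    event = {}
    for name, parents, cpt in net:
        p_true = cpt[tuple(event[q] for q in parents)]
        event[name] = rng.random() < p_true
    return event

def rejection_sample(x, evidence, n, rng=random):
    """Estimate P(x | evidence) by discarding samples inconsistent with evidence."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        s = prior_sample(rng)
        if all(s[k] == v for k, v in evidence.items()):
            counts[s[x]] += 1
    total = counts[True] + counts[False]
    return None if total == 0 else {v: c / total for v, c in counts.items()}
```

Rejection sampling wastes work when the evidence is unlikely: almost every sample is thrown away, which motivates likelihood weighting below.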
● Let NPS(x1, . . . , xn) be the number of samples generated for the event x1, . . . , xn
w = 1.0
w = 1.0 × 0.1
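The running weight w shown above can be sketched end to end: likelihood weighting clamps each evidence variable to its observed value and multiplies w by the probability of that value, instead of sampling it. The network and CPT numbers below are illustrative placeholders.

```python
import random

# Topological order: (name, parents, CPT mapping parent values -> P(True)).
# All probabilities are illustrative placeholders.
net = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]

def weighted_sample(evidence, rng=random):
    """Return (event, w): evidence variables are clamped, each clamp scales w."""
    w, event = 1.0, dict(evidence)
    for name, parents, cpt in net:
        p_true = cpt[tuple(event[q] for q in parents)]
        if name in evidence:
            w *= p_true if evidence[name] else 1.0 - p_true
        else:
            event[name] = rng.random() < p_true
    return event, w

def likelihood_weighting(x, evidence, n, rng=random):
    """Estimate P(x | evidence) from n weighted samples."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        event, w = weighted_sample(evidence, rng)
        totals[event[x]] += w
    z = totals[True] + totals[False]
    return {v: t / z for v, t in totals.items()}
```

Every sample is used, but samples whose weights are tiny contribute little, so convergence still degrades when the evidence is improbable under the prior.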