Bayesian Networks: Chapter 14, Sections 1–4
© Artificial Intelligence, spring 2013, Peter Ljunglöf; based on AIMA slides by Stuart Russell and Peter Norvig, 2004
Bayesian networks
A simple, graphical notation for conditional independence assertions
and hence for compact specification of full joint distributions
Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ “directly influences”)
– a conditional distribution for each node given its parents:
P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented
as a conditional probability table (CPT) giving the
distribution over Xi for each combination of parent values
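As a concrete illustration, a CPT can be stored as a table keyed by parent values (a minimal Python sketch, not from the slides; it uses the Alarm CPT from the burglary example later in these slides, and the helper name `p` is my own):

```python
# A CPT for a Boolean variable, keyed by tuples of parent values.
# Each entry stores P(X = true | parents); the numbers are the Alarm CPT
# from the burglary example.
cpt_alarm = {
    (True, True):   0.95,   # P(a | b, e)
    (True, False):  0.94,   # P(a | b, ¬e)
    (False, True):  0.29,   # P(a | ¬b, e)
    (False, False): 0.001,  # P(a | ¬b, ¬e)
}

def p(cpt, value, parent_values):
    """Look up P(X = value | parents); P(false) is 1 minus the stored number."""
    p_true = cpt[parent_values]
    return p_true if value else 1.0 - p_true

print(p(cpt_alarm, True, (True, False)))    # 0.94
print(p(cpt_alarm, False, (False, False)))  # 0.999
```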
Example
The topology of a network encodes conditional independence assertions:
(Figure: Weather stands alone; Cavity is the parent of both Toothache and Catch, so Toothache and Catch are conditionally independent given Cavity.)
Example
I’m at work. My neighbor John calls to say my alarm is ringing, but my
neighbor Mary doesn’t call. Sometimes it’s set off by minor earthquakes. Is
there a burglar?
Variables: Burglar, Earthquake, Alarm, JohnCalls, MaryCalls
The network topology reflects our “causal” knowledge:
– a burglar can trigger the alarm
– an earthquake can trigger the alarm
– the alarm can cause Mary to call
– the alarm can cause John to call
Example contd.
(Figure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls.)

P(B) = .001        P(E) = .002

B  E  P(A|B,E)
T  T  .95
T  F  .94
F  T  .29
F  F  .001

A  P(J|A)          A  P(M|A)
T  .90             T  .70
F  .05             F  .01
Compactness
A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values. Each row needs only one number p for P(Xi = true), since the number for false is just 1 − p. For the burglary network, 1 + 1 + 4 + 2 + 2 = 10 numbers suffice, compared with 2^5 − 1 = 31 for the full joint distribution.
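The compactness claim can be checked by counting CPT entries (a small sketch; each Boolean node with k Boolean parents needs 2^k numbers, and the parent counts below are taken from the burglary network):

```python
# CPT sizes for the burglary network: one number per row, 2**k rows
# for a Boolean node with k Boolean parents.
num_parents = {"Burglary": 0, "Earthquake": 0, "Alarm": 2,
               "JohnCalls": 1, "MaryCalls": 1}
cpt_numbers = sum(2**k for k in num_parents.values())
full_joint_numbers = 2**len(num_parents) - 1  # independent entries in the joint
print(cpt_numbers, full_joint_numbers)  # 10 31
```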
Global semantics
The global semantics defines the full joint distribution
as the product of the local conditional distributions:

P(x1, . . . , xn) = Π_{i=1}^{n} P(xi | parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b, ¬e) P(¬b) P(¬e)
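Plugging in the CPT numbers from the burglary network, this product can be evaluated directly (a minimal sketch; the variable names are my own):

```python
# Evaluate P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) as a product of CPT entries
# from the burglary network.
p_j_given_a = 0.90       # P(j | a)
p_m_given_a = 0.70       # P(m | a)
p_a_given_nb_ne = 0.001  # P(a | ¬b, ¬e)
p_nb = 1 - 0.001         # P(¬b)
p_ne = 1 - 0.002         # P(¬e)

joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * p_nb * p_ne
print(round(joint, 6))  # 0.000628
```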
Markov blanket
Theorem: Each node is conditionally independent of all others given its
Markov blanket: parents + children + children’s parents
(Figure: node X with parents U1 . . . Um, children Y1 . . . Yn, and the children's other parents Z1j . . . Znj.)
Constructing Bayesian networks
We need a method such that a series of locally testable assertions of
conditional independence guarantees the required global semantics
1. Choose an ordering of variables X1, . . . , Xn
2. For i = 1 to n
add Xi to the network
select parents from X1, . . . , Xi−1 such that
P(Xi | Parents(Xi)) = P(Xi | X1, . . . , Xi−1)
This choice of parents guarantees the global semantics:
P(X1, . . . , Xn) = Π_{i=1}^{n} P(Xi | X1, . . . , Xi−1)   (chain rule)
                  = Π_{i=1}^{n} P(Xi | Parents(Xi))        (by construction)
Example
Suppose we choose the ordering M, J, A, B, E
(Figure: nodes MaryCalls and JohnCalls added.)
P(J|M) = P(J)?
Example
Suppose we choose the ordering M, J, A, B, E
(Figure: nodes MaryCalls, JohnCalls and Alarm added.)
P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)?
Example
Suppose we choose the ordering M, J, A, B, E
(Figure: nodes MaryCalls, JohnCalls, Alarm and Burglary added.)
P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)?
P(B|A, J, M) = P(B)?
Example
Suppose we choose the ordering M, J, A, B, E
(Figure: nodes MaryCalls, JohnCalls, Alarm, Burglary and Earthquake added.)
P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)? Yes
P(B|A, J, M) = P(B)? No
P(E|B, A, J, M) = P(E|A)?
P(E|B, A, J, M) = P(E|A, B)?
Example
Suppose we choose the ordering M, J, A, B, E
(Figure: nodes MaryCalls, JohnCalls, Alarm, Burglary and Earthquake added.)
P(J|M) = P(J)? No
P(A|J, M) = P(A|J)? P(A|J, M) = P(A)? No
P(B|A, J, M) = P(B|A)? Yes
P(B|A, J, M) = P(B)? No
P(E|B, A, J, M) = P(E|A)? No
P(E|B, A, J, M) = P(E|A, B)? Yes
Example contd.
(Figure: resulting network for the ordering M, J, A, B, E: MaryCalls → JohnCalls; MaryCalls and JohnCalls → Alarm; Alarm → Burglary; Alarm and Burglary → Earthquake.)
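The parent sets implied by the answers above (M: none; J: M; A: J, M; B: A; E: A, B) give a quick parameter count (a sketch; the comparison figure of 10 is the count for the causal ordering shown earlier):

```python
# Numbers needed for the ordering M, J, A, B, E: 2**k per Boolean node
# with k parents, using the parent sets implied by the answers above.
num_parents = {"MaryCalls": 0, "JohnCalls": 1, "Alarm": 2,
               "Burglary": 1, "Earthquake": 2}
print(sum(2**k for k in num_parents.values()))  # 13 (the causal ordering needs 10)
```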
Example contd.
The chosen ordering of the variables can have a big impact on the size of
the network! Network (b) has 2^5 − 1 = 31 numbers, exactly as many as
the full joint distribution
Inference tasks
Simple queries: compute posterior marginal P(Xi | E = e)
e.g., P(Burglar | JohnCalls = true, MaryCalls = true)
or shorter, P(B | j, m)
Conjunctive queries: P(Xi, Xj | E = e) = P(Xi | E = e) P(Xj | Xi, E = e)
Optimal decisions: decision networks include utility information;
probabilistic inference required for P(outcome | action, evidence)
Value of information: which evidence to seek next?
Sensitivity analysis: which probability values are most critical?
Explanation: why do I need a new starter motor?
Inference by enumeration
Slightly intelligent way to sum out variables from the joint without actually
constructing its explicit representation
Simple query on the burglary network:
P(B | j, m)
= P(B, j, m)/P(j, m)
= α P(B, j, m)
= α Σe Σa P(B, e, a, j, m)
(where e and a are the hidden variables)
Rewrite full joint entries using products of CPT entries:
P(B | j, m)
= α Σe Σa P(B) P(e) P(a|B, e) P(j|a) P(m|a)
= α P(B) Σe P(e) Σa P(a|B, e) P(j|a) P(m|a)
Recursive depth-first enumeration: O(n) space, O(d^n) time
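The enumeration can be sketched directly in Python (a toy implementation for this specific query, assuming Boolean variables and the CPT numbers from the burglary network; all names are my own):

```python
# Inference by enumeration for P(B | j, m) on the burglary network.
# Each CPT stores P(node = True | parents); False is 1 minus that.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(j | A)
P_M = {True: 0.70, False: 0.01}   # P(m | A)

def pt(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full-joint entry as a product of CPT entries (global semantics)."""
    return (pt(P_B, b) * pt(P_E, e) * pt(P_A[(b, e)], a)
            * pt(P_J[a], j) * pt(P_M[a], m))

def query_B_given_jm():
    # Sum out the hidden variables e and a, then normalize over B.
    unnorm = {}
    for b in (True, False):
        unnorm[b] = sum(joint(b, e, a, True, True)
                        for e in (True, False) for a in (True, False))
    alpha = 1.0 / sum(unnorm.values())
    return {b: alpha * p for b, p in unnorm.items()}

print(query_B_given_jm())  # P(b | j, m) ≈ 0.284
```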
Evaluation tree
(Figure: depth-first evaluation tree for the b branch, splitting first on e with P(e) = .002 and P(¬e) = .998 under P(b) = .001; the subproducts P(j|a) P(m|a) are recomputed in every branch.)
Inference by variable elimination
Variable elimination: carry out summations right-to-left,
storing intermediate results (factors) to avoid recomputation
P(B | j, m) = α P(B) Σe P(e) Σa P(a|B, e) P(j|a) P(m|a)
            = α f1(B) Σe f2(E) Σa f3(A, B, E) f4(A) f5(A)
(where f1, f2, f4 and f5 are 2-element vectors, and f3 is a 2 × 2 × 2 matrix)
Sum out A to get the 2 × 2 matrix f6, and then E to get the 2-vector f7:
f6(B, E) = Σa f3(A, B, E) × f4(A) × f5(A)
         = f3(a, B, E) × f4(a) × f5(a) + f3(¬a, B, E) × f4(¬a) × f5(¬a)
f7(B) = Σe f2(E) × f6(B, E) = f2(e) × f6(B, e) + f2(¬e) × f6(B, ¬e)
Finally:
P(B | j, m) = α f1(B) × f7(B)
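The factor computation can be mirrored in code (a sketch in plain Python; factors are dicts keyed by variable values, the factor names follow the slide, and the helper names are my own):

```python
# Variable elimination for P(B | j, m), using the factors f1..f7 above.
TF = (True, False)
f1 = {b: (0.001 if b else 0.999) for b in TF}        # P(B)
f2 = {e: (0.002 if e else 0.998) for e in TF}        # P(E)
pa = {(True, True): 0.95, (True, False): 0.94,       # P(a | B, E)
      (False, True): 0.29, (False, False): 0.001}
f3 = {(a, b, e): (pa[(b, e)] if a else 1 - pa[(b, e)])
      for a in TF for b in TF for e in TF}
f4 = {a: (0.90 if a else 0.05) for a in TF}          # P(j | A)
f5 = {a: (0.70 if a else 0.01) for a in TF}          # P(m | A)

# f6(B, E) = Σa f3(a, B, E) f4(a) f5(a)   (A summed out)
f6 = {(b, e): sum(f3[(a, b, e)] * f4[a] * f5[a] for a in TF)
      for b in TF for e in TF}
# f7(B) = Σe f2(e) f6(B, e)               (E summed out)
f7 = {b: sum(f2[e] * f6[(b, e)] for e in TF) for b in TF}

unnorm = {b: f1[b] * f7[b] for b in TF}
alpha = 1.0 / sum(unnorm.values())
posterior = {b: alpha * p for b, p in unnorm.items()}
print(posterior)  # P(b | j, m) ≈ 0.284
```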
Irrelevant variables
Consider the query P(JohnCalls | Burglary = true):

P(J|b) = α P(b) Σe P(e) Σa P(a|b, e) P(J|a) Σm P(m|a)

The sum over m is identically 1, so MaryCalls is irrelevant to the query and can be removed before inference.
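The claim can be checked numerically: summing over m only multiplies each term by Σm P(m|a) = 1, so dropping MaryCalls leaves P(j|b) unchanged (a sketch using the burglary CPT numbers; names are my own):

```python
# Verify that MaryCalls is irrelevant to P(j | b): summing over m
# contributes a factor Σm P(m|a) = 1.
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94}  # P(a | b, e), b = True only
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pt(p_true, v):
    return p_true if v else 1.0 - p_true

TF = (True, False)
# With m summed out explicitly:
with_m = sum(pt(P_E, e) * pt(P_A[(True, e)], a) * P_J[a] * pt(P_M[a], m)
             for e in TF for a in TF for m in TF)
# With MaryCalls removed from the query entirely:
without_m = sum(pt(P_E, e) * pt(P_A[(True, e)], a) * P_J[a]
                for e in TF for a in TF)
print(abs(with_m - without_m) < 1e-12)  # True
```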
Summary
Bayes nets provide a natural representation for (causally induced)
conditional independence
Topology + CPTs = compact representation of joint distribution
Generally easy for (non)experts to construct
Probabilistic inference tasks can be computed exactly:
– variable elimination avoids recomputations
– irrelevant variables can be removed, which reduces complexity