Probabilistic Reasoning
REASONING SYSTEMS
Bayes' Rule
Why is this rule useful?
• Causal knowledge is what experience typically gives us (C: cause, E: effect), e.g., a fault in a technical system causing an observable symptom
• Diagnostic inference runs the other way: from an observed effect back to its likely cause
[Figure: causal inferences point from cause to effect; diagnostic inferences point from effect back to cause]
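A minimal numeric sketch of the diagnostic direction; all numbers here are illustrative assumptions, not taken from the slides:

# Diagnostic inference with Bayes' rule: P(C | E) = P(E | C) P(C) / P(E).
# C: a fault in a technical system, E: an observed error message.
p_c = 0.01              # prior probability of the fault (assumed)
p_e_given_c = 0.90      # causal knowledge: the fault usually triggers the error
p_e_given_not_c = 0.05  # false-alarm rate (assumed)

# Total probability of observing the error at all.
p_e = p_e_given_c * p_c + p_e_given_not_c * (1 - p_c)

# Diagnostic inference: from the observed effect back to the cause.
p_c_given_e = p_e_given_c * p_c / p_e
print(f"P(C | E) = {p_c_given_e:.3f}")  # ~0.154: the error alone is weak evidence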
Knowledge in uncertain domains
[Network: Weather is independent of the other variables; Cavity is the parent of Toothache and Catch]
Simple Bayesian Network
• Example: Alarm
  – A new burglar alarm is fairly reliable at detecting a burglary
  – It also responds on occasion to minor earthquakes
  – Two neighbors, John and Mary, have promised to call when they hear the alarm
  – John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then too
  – Mary likes loud music and sometimes misses the alarm altogether
  – Given the evidence of who has or has not called, we would like to estimate the probability of a burglary
[Network: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls]
Simple Bayesian Network

  Burglary:  P(B) = 0.001        Earthquake:  P(E) = 0.002

  Alarm:                     JohnCalls:        MaryCalls:
   B  E | P(A|B,E)            A | P(J|A)        A | P(M|A)
   1  1 |  0.95               1 |  0.90         1 |  0.70
   1  0 |  0.94               0 |  0.05         0 |  0.01
   0  1 |  0.29
   0  0 |  0.001

• Each node carries a conditional distribution (CPT) given its parents
Semantics of Bayesian Networks
• General idea
  – The joint distribution can be expressed as a product of local conditional probabilities
  – Every entry in the joint probability distribution can be calculated from the information in the network
  – Generic entry: P(x1, …, xn) = ∏i P(xi | parents(Xi))
Representing the full joint distribution
• Example
  – The alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call
  – P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063
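A minimal Python sketch that encodes the CPTs above as dictionaries and recomputes this entry; the representation is this sketch's own choice, not something prescribed by the slides:

# CPTs of the burglary network (True-probabilities only; each row sums to 1).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a=true | b, e)
P_J = {True: 0.90, False: 0.05}                      # P(j=true | a)
P_M = {True: 0.70, False: 0.01}                      # P(m=true | a)

def pr(p_true, value):
    """Probability of a boolean value, given the probability it is true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """One entry of the full joint: the product of the local conditionals."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

print(joint(b=False, e=False, a=True, j=True, m=True))  # ~0.00063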
Method for Constructing Bayesian Networks
• Chain rule
  – Any joint distribution can be rewritten as a product of conditionals:
    P(x1, …, xn) = P(xn | xn−1, …, x1) P(xn−1 | xn−2, …, x1) ⋯ P(x1) = ∏i P(xi | xi−1, …, x1)
  – If for every variable Xi we can choose parents(Xi) ⊆ {Xi−1, …, X1} such that
    P(xi | xi−1, …, x1) = P(xi | parents(Xi)),
    then the chain rule reduces exactly to the Bayesian network semantics above
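A worked instance, reconstructed from the example entries on the surrounding slides: applying the chain rule to the burglary network in the order B, E, A, J, M and then using the independencies of the example (E does not depend on B; J and M depend directly only on A, cf. P(M|J,A,E,B) = P(M|A) below) gives

  P(b ∧ e ∧ a ∧ j ∧ m)
  = P(m | j, a, e, b) P(j | a, e, b) P(a | e, b) P(e | b) P(b)
  = P(m | a) P(j | a) P(a | b, e) P(e) P(b)

which is exactly the product form used in the joint-entry example above.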
Construction of Bayesian Networks
• Intuition
  – Parents of node Xi should contain all those nodes in X1, …, Xi−1 that directly influence Xi
  – Example: MaryCalls
    • M is influenced by Burglary and Earthquake, but not directly
    • M is directly influenced only by Alarm; whether John calls has no influence on M
    • Therefore: P(M|J,A,E,B) = P(M|A)
General Procedure
• Choose an ordering of the variables X1, …, Xn
• For i = 1 to n:
  – add Xi to the network
  – select as its parents a minimal set of nodes from X1, …, Xi−1 such that P(Xi | Xi−1, …, X1) = P(Xi | parents(Xi))
• (A code sketch of this loop follows below)
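A minimal sketch of the construction loop; the helper choose_parents is a hypothetical placeholder standing in for the conditional-independence judgments (domain knowledge) that the procedure assumes:

def build_network(ordering, choose_parents):
    """Construct the parent sets of a Bayesian network, one node at a time.

    choose_parents(x, predecessors) must return a minimal subset of the
    predecessors with P(x | predecessors) = P(x | parents); this judgment
    is domain knowledge and is simply assumed to be available here.
    """
    parents = {}
    for i, x in enumerate(ordering):
        predecessors = ordering[:i]
        parents[x] = choose_parents(x, predecessors)
        # Only earlier nodes may become parents, so the result is acyclic.
        assert set(parents[x]) <= set(predecessors)
    return parents

# The burglary example with the "causes first" ordering B, E, A, J, M:
direct_influences = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
print(build_network(["B", "E", "A", "J", "M"],
                    lambda x, pred: direct_influences[x]))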
Notes
• The construction method guarantees that the network is acyclic
  – because each node is connected only to earlier nodes
• Redundancies
  – There are no redundant probability values
  – Minor exception: one entry in each row of each conditional probability table is redundant, because each row sums to 1 (e.g., P(¬x2|x1) = 1 − P(x2|x1))
Node Ordering
• Local structure (example)
  – 30 nodes, each with at most 5 parents
  – 30 × 2^5 = 960 numbers for the Bayesian network, vs. > 1 billion for the full joint distribution
• Construction
  – not trivial
  – each variable is directly influenced by only a few others
  – set the parent nodes "appropriately" ➞ network topology
  – "direct influencers" first
  – thus: the correct order is important
• Order in which to add nodes
  – add the roots first
  – then the variables they directly influence
  – and so on, down to the leaves
• What happens with a "wrong" order?
Example ordering
• Let us consider the burglary example again, with two alternative orderings:
  – M, J, A, B, E
  – M, J, E, B, A
Example
[Slides 19–24 (figures): step-by-step construction of the network for the first ordering, M, J, A, B, E]
Example ordering (2)
• Order: M, J, E, B, A
[Network: MaryCalls (1), JohnCalls (2), Earthquake (3), Burglary (4), Alarm (5), added in that order]
• Resulting network
  – 31 probabilities (1 + 2 + 4 + 8 + 16)
  – as many as the full joint distribution
  – thus: a bad choice
• All three networks represent the same probability distribution
• The last two versions
  – simply fail to represent all the conditional independence relationships
  – and end up specifying a lot of unnecessary numbers instead
Conditional independence relations in Bayesian networks
• Before
  – "numerical" (global) semantics, given by the full probability distribution
  – from this we derived the conditional independencies
• Idea now
  – the opposite direction: topological (local) semantics
  – the network topology specifies conditional independencies
  – from these we derive the numerical semantics
Conditional independence relations in Bayesian networks
• General idea
  – A node is conditionally independent of its non-descendants given its parents
  – A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents, that is, given its Markov blanket
• Example
  – J is independent of B and E given A, i.e. P(J|A,B,E) = P(J|A)
Conditional independence relations in Bayesian networks
[Network: Subsidy? and Harvest are the parents of Cost; Cost is the parent of Buys?]
Inference tasks
• Inference by enumeration (example: the burglary network; see the sketch below)
  – A query is answered by summing joint entries over the hidden variables, here E and A:
    P(B | j, m) = α P(B, j, m) = α Σe Σa P(B, j, m, e, a)
  – Using the network semantics, each joint entry is a product of CPT entries:
    P(b | j, m) = α Σe Σa P(b) P(e) P(a|b,e) P(j|a) P(m|a)
• Variable elimination
  – Summations can be pushed inwards past every factor that does not mention the summation variable:
    Σx f1 × ⋯ × fk = f1 × ⋯ × fi Σx fi+1 × ⋯ × fk = f1 × ⋯ × fi × fX̄
    (f1, …, fi do not depend on x; fX̄ is the new factor obtained by summing x out)
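A minimal sketch of this enumeration for the burglary query, reusing the CPT encoding from the joint-entry sketch above; the code layout is this sketch's own assumption:

import itertools

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def pr(p_true, value):
    return p_true if value else 1.0 - p_true

def posterior_burglary(j, m):
    """P(B | j, m): sum the joint over the hidden variables E and A for
    each value of B, then normalize with alpha = 1 / (sum over both values)."""
    unnormalized = []
    for b in (True, False):
        total = sum(pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
                    * pr(P_J[a], j) * pr(P_M[a], m)
                    for e, a in itertools.product((True, False), repeat=2))
        unnormalized.append(total)
    alpha = 1.0 / sum(unnormalized)
    return [alpha * u for u in unnormalized]

print(posterior_burglary(j=True, m=True))  # ~[0.284, 0.716]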
Direct sampling methods
[Slides 51–54 (figures): the prior-sampling procedure generates events from the joint distribution by sampling each variable in topological order, conditioned on the values already drawn for its parents; illustrated on the sprinkler network]
Computing answers
[Slides 55–56 (figures): an answer P(x) is estimated by the fraction of samples in which x holds, N(x)/N, which converges to the true probability as N grows]
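A minimal prior-sampling sketch for the sprinkler network used on the following slides; the CPT values are the standard textbook numbers and are an assumption here, since the slides' figures are not reproduced:

import random

P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.0}  # P(WetGrass | Sprinkler, Rain)

def prior_sample():
    """One event from the joint: sample each variable in topological order,
    conditioning on the values already drawn for its parents."""
    c = random.random() < 0.5                   # Cloudy
    s = random.random() < (0.1 if c else 0.5)   # Sprinkler | Cloudy
    r = random.random() < (0.8 if c else 0.2)   # Rain | Cloudy
    w = random.random() < P_WET[(s, r)]         # WetGrass | Sprinkler, Rain
    return {"Cloudy": c, "Sprinkler": s, "Rain": r, "WetGrass": w}

# Computing answers: estimate P(x) as N(x) / N, the fraction of samples with x.
N = 100_000
print(sum(s["Rain"] for s in (prior_sample() for _ in range(N))) / N)  # ~0.5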
Rejection sampling
• Generate samples from the prior distribution and reject all samples that do not match the evidence e; the answer is estimated from the samples that remain
• Let P̂(X|e) be the estimated distribution. Then, from the definition just given,
  P̂(X|e) = α N(X, e) = N(X, e) / N(e)
  where N(X, e) counts the accepted samples for each value of X and N(e) is the total number of accepted samples
• Example: suppose 27 of our samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false; then
  P(Rain | Sprinkler = true) ≈ NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
• Analogy: how often does it rain the day after we have observed aurora borealis? Simply ignore all those days with no aurora borealis…
• The weakness: when the evidence is rare, almost every sample is rejected, so rejection sampling is very inefficient
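A minimal rejection-sampling sketch built on prior_sample() from the sketch above; function and variable names are this sketch's own:

def rejection_sample(query_var, evidence, n):
    """Estimate P(query_var | evidence): discard every sample that
    contradicts the evidence, then count the survivors and normalize."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[query_var]] += 1
    total = counts[True] + counts[False]
    return {value: c / total for value, c in counts.items()}

print(rejection_sample("Rain", {"Sprinkler": True}, 100_000))
# approaches P(Rain | Sprinkler = true) = <0.3, 0.7>; note that roughly 70%
# of the generated samples have Sprinkler = false and are thrown away.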
Likelihood weighting
• Idea: generate only samples that are consistent with the evidence
  – fix the evidence variables at their observed values
  – sample only the non-evidence variables, top-down as before
  – weight each sample by the likelihood of the evidence given the sampled parent values
• Example: the query P(Rain | Cloudy = true, WetGrass = true)
• For any particular value x of X, the estimated posterior probability can be calculated as the normalized total weight of the samples in which X = x:
  P̂(x | e) = α × (sum of the weights of the samples with X = x)
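A minimal likelihood-weighting sketch for exactly this query, with the same assumed sprinkler CPTs as above; the evidence variables Cloudy and WetGrass are fixed rather than sampled, and each sample carries the likelihood of that evidence as its weight:

import random

P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90, (False, False): 0.0}

def weighted_sample_rain():
    """One weighted sample for P(Rain | Cloudy = true, WetGrass = true)."""
    w = 1.0
    c = True                                   # evidence: Cloudy = true ...
    w *= 0.5                                   # ... weight by P(Cloudy = true)
    s = random.random() < (0.1 if c else 0.5)  # sample Sprinkler | Cloudy
    r = random.random() < (0.8 if c else 0.2)  # sample Rain | Cloudy
    w *= P_WET[(s, r)]                         # evidence: weight by P(w | s, r)
    return r, w

# Estimate: total weight of samples with Rain = x, normalized over all weight.
totals = {True: 0.0, False: 0.0}
for _ in range(100_000):
    r, w = weighted_sample_rain()
    totals[r] += w
print(totals[True] / (totals[True] + totals[False]))  # ~0.976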
Summary
This chapter has described Bayesian networks, a well-developed representation for uncertain knowledge. Bayesian networks play a role roughly analogous to that of propositional logic for definite knowledge.
• A Bayesian network is a directed acyclic graph whose nodes correspond to random variables; each node has a conditional distribution for the node given its parents.
• Bayesian networks provide a concise way to represent conditional independence relationships in the domain.
• A Bayesian network specifies a full joint distribution; each joint entry is defined as the product of the corresponding entries in the local conditional distributions. A Bayesian network is often exponentially smaller than the full joint distribution.
Summary (2)
• Inference in Bayesian networks means computing the probability distribution of a set of query variables, given a set of evidence variables. Exact inference algorithms, such as variable elimination, evaluate sums of products of conditional probabilities as efficiently as possible.
• In polytrees (singly connected networks), exact inference takes time linear in the size of the network. In the general case, the problem is intractable.
• Stochastic approximation techniques such as likelihood weighting and Markov chain Monte Carlo can give reasonable estimates of the true posterior probabilities in a network and can cope with much larger networks than can exact algorithms.