
PROBABILISTIC REASONING SYSTEMS

In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory.
Outline

• Knowledge in uncertain domains
• Probabilistic Networks
• Semantics of Bayesian Networks
  – Global Semantics
  – Local Semantics
• Efficient representation of conditional distributions
• Exact inference in Bayesian Networks
• Approximate inference in Bayesian Networks
• Summary

2
Bayes' Rule

Why is this rule useful?

• Causal experience relates cause to effect (C: cause, E: effect), e.g. a mistake in a technical system (cause) makes the system behavior show a symptom (effect); causal inferences run from cause to effect.
• Diagnostic inference runs in the opposite direction: from an observed effect back to its likely cause.

  P(C|E) = P(E|C) P(C) / P(E)

This simple equation underlies all modern AI systems for probabilistic inference.

Bayes' Rule 3
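To make the diagnostic direction concrete, here is a minimal Python sketch (not from the slides; the numbers are illustrative assumptions, not lecture values):

```python
# Hypothetical illustration: diagnostic inference with Bayes' rule.
# P(cause | effect) = P(effect | cause) * P(cause) / P(effect)
p_cause = 0.01                   # prior: probability of a fault in the system (assumed)
p_effect_given_cause = 0.97      # causal knowledge: the fault produces the symptom (assumed)
p_effect_given_not_cause = 0.05  # false-alarm rate of the symptom (assumed)

# total probability of observing the symptom
p_effect = (p_effect_given_cause * p_cause
            + p_effect_given_not_cause * (1 - p_cause))

p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
print(f"P(cause | effect) = {p_cause_given_effect:.3f}")  # ≈ 0.164
```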
Knowledge in uncertain domains

• Joint probability distribution
  – delivers answers to any question that exists in the domain
  – Problem: intractable with a large number of variables
  – Specification: probabilities are difficult to assess for atomic events

• Complexity
  – Independence and conditional independence reduce complexity

• Bayesian Networks
  – Data structure that represents the dependencies between variables
  – Specification of the joint distribution

Knowledge in uncertain domains 4
Syntax

• Graph-theoretical structure
  – Set of variables as nodes (discrete or continuous)
  – Each node corresponds to a random variable
  – Directed acyclic graph (DAG); links express causal dependencies between variables

• Conditional probability tables
  – For each node, a table of conditional probabilities
  – The table gives the distribution of the node given its parents, P(Xi|Parents(Xi))

(Figure: example network with nodes Weather, Cavity, Toothache, Catch; Cavity is the parent of Toothache and Catch.)

Probabilistic Networks 5
Simple Bayesian Network

• Example: Alarm
  – a new burglar alarm is fairly reliable at detecting a burglary
  – it also responds on occasion to minor earthquakes
  – two neighbors, John and Mary, have promised to call when they hear the alarm
  – John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then too
  – Mary likes loud music and sometimes misses the alarm altogether
  – Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Probabilistic Networks 6
Simple Bayesian Network

Burglary: P(B) = 0.001        Earthquake: P(E) = 0.002

Alarm:
  B E | P(A)
  1 1 | 0.95
  1 0 | 0.94
  0 1 | 0.29
  0 0 | 0.001

JohnCalls:        MaryCalls:
  A | P(J)        A | P(M)
  1 | 0.90        1 | 0.70
  0 | 0.05        0 | 0.01

(conditional distributions for each node)

Probabilistic Networks 7
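The CPTs above can be written down directly as data. A minimal sketch, with variable names of my own choosing:

```python
# CPTs of the burglary network from the slide, encoded as plain dictionaries.
# Keys of the alarm table are (burglary, earthquake); values are P(variable = true | parents).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)
```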
Semantics of Bayesian Networks

• Two views on the semantics
  1. Global Semantics: the first is to see the network as a representation of the joint probability distribution
  2. Local Semantics: the second is to view it as an encoding of a collection of conditional independence statements

• The two views are equivalent
  – the first is helpful in understanding how to construct networks
  – the second is helpful in designing inference procedures

Semantics of Bayesian Networks 8
Representing the full joint distribution

• General idea
  – The joint distribution can be expressed as a product of local conditional probabilities
  – Every entry in the joint probability distribution can be calculated from the information in the network
  – Generic entry: P(x1, …, xn) = Π_i P(xi | parents(Xi))

(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Global Semantics 9
Representing the full joint distribution

• Example (CPTs as on slide 7)
  – The alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call
  – P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063

Global Semantics 10
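A short sketch (again with my own variable names) that reproduces this calculation from the CPT values on slide 7:

```python
# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p_j_given_a = 0.90
p_m_given_a = 0.70
p_a_given_not_b_not_e = 0.001
p_not_b = 1 - 0.001
p_not_e = 1 - 0.002

joint = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(round(joint, 6))  # 0.000628, i.e. ≈ 0.00063
```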
Method for Constructing Bayesian Networks

• Generic rule
  – uses conditional probabilities: P(x1, …, xn) = Π_i P(xi | parents(Xi))
  – this gives the semantics, but not how to construct a network
  – implicitly it encodes conditional independence assertions, which help the knowledge engineer

• Reformulate the rule
  – rewrite the joint distribution using the product rule:
    P(x1, …, xn) = P(xn | xn−1, …, x1) P(xn−1, …, x1)

• Repeat the process
  – each conjunctive probability is reduced to a conditional probability and a smaller conjunction
  – finally: one big product

Global Semantics 11
Chain rule

• Compare with the chain rule
  P(x1, …, xn) = Π_i P(xi | xi−1, …, x1)
  This reveals that the specification of the joint distribution is equivalent to the general assertion
  P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi))
  (as long as Parents(Xi) ⊆ {Xi−1, …, X1})

• I.e.:
  – This last condition is satisfied by labeling the nodes in any order that is consistent with the partial order implicit in the graph structure.
  – The Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents.

Global Semantics 12
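For completeness, a compact LaTeX rendering of the telescoping argument sketched above (the standard chain-rule derivation; notation mine):

```latex
\begin{align*}
P(x_1,\dots,x_n)
  &= P(x_n \mid x_{n-1},\dots,x_1)\, P(x_{n-1},\dots,x_1) \\
  &= P(x_n \mid x_{n-1},\dots,x_1)\, P(x_{n-1} \mid x_{n-2},\dots,x_1) \cdots P(x_2 \mid x_1)\, P(x_1) \\
  &= \prod_{i=1}^{n} P(x_i \mid x_{i-1},\dots,x_1)
   \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathit{parents}(X_i)\bigr),
\end{align*}
```

where the last step uses the assertion P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi)).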
Construction of Bayesian Networks

• Important while constructing
  – We need to choose parents for each node such that this property holds.

• Intuition
  – The parents of node Xi should contain all those nodes in X1, …, Xi−1 that directly influence Xi
  – Example: MaryCalls (M)
    • M is influenced by B and E, but not directly
    • M is influenced directly by A; given A, whether J calls provides no additional evidence
    • Therefore: P(M | J, A, E, B) = P(M | A)

(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Global Semantics 13
General Procedure

1. Choose the set of relevant variables Xi that describe the domain.
2. Choose an ordering for the variables.
   (Any ordering works, but some orderings work better than others, as we will see.)
3. While there are variables left:
   a) Pick a variable Xi and add a node to the network for it.
   b) Set Parents(Xi) to some minimal set of nodes already in the net such that the conditional independence property is satisfied.
   c) Define the conditional distribution P(Xi | Parents(Xi)).
   (A small code sketch of this loop follows below.)

Global Semantics 14
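A minimal, hypothetical sketch of the loop above for the burglary network; `direct_influencers` stands in for the domain knowledge used to pick a minimal parent set, and the CPTs are left as placeholders for the knowledge engineer to fill in:

```python
# Hypothetical sketch of the network-construction procedure (not the lecture's code).
order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]  # step 2: chosen ordering

# Domain knowledge: which variables directly influence each variable (assumed).
direct_influencers = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

network = {}   # maps each node to its parents and a CPT to be filled in later
added = []     # nodes already in the network
for x in order:                                                  # step 3: add nodes one by one
    parents = [p for p in added if p in direct_influencers[x]]   # step 3b: minimal parent set
    network[x] = {"parents": parents, "cpt": None}               # step 3c: CPT defined later
    added.append(x)

print(network["Alarm"]["parents"])   # ['Burglary', 'Earthquake']
```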
Notes

• The construction method guarantees that the network is acyclic
  – because each node is connected only to earlier nodes.

• Redundancies
  – No redundant probability values
  – Exception: one entry in each row of each conditional probability table is redundant, since P(¬x2|x1) = 1 − P(x2|x1).

• This means that it is impossible for the knowledge engineer or domain expert to create a Bayesian network that violates the axioms of probability!

Global Semantics 15
Compactness

• Compactness
  – A Bayesian network is a complete and non-redundant representation of a domain
  – It can be much more compact than the full joint distribution
  – This is important in practice
  – Compactness is an example of a property we call local structure (or sparse coding)

• Local structure (sparseness)
  – Each sub-component is connected to only a limited number of other components
  – Complexity: linear instead of exponential
  – In most domains each variable is directly influenced by at most k others; with n variables a node then needs at most 2^k conditional probabilities, so the whole network needs n · 2^k numbers
  – In contrast, the full joint distribution contains 2^n numbers

Global Semantics 16
Node Ordering

• Local structure (example)
  – 30 nodes, each with at most 5 parents
  – 960 numbers for the BN, > 1 billion for the full joint distribution (see the calculation after this slide)

• Construction
  – not trivial
  – each variable should be directly influenced by only a few others
  – set the parent nodes "appropriately" ➞ network topology
  – "direct influencers" first
  – thus: the correct order is important

• Order in which to add nodes
  – root first
  – then the direct influencers
  – then down to the leaves
  – What happens with a "wrong" order?

Global Semantics 17
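The claimed numbers follow directly from the n · 2^k versus 2^n comparison on the previous slide:

```latex
n \cdot 2^{k} = 30 \cdot 2^{5} = 960
\qquad \text{versus} \qquad
2^{n} = 2^{30} = 1\,073\,741\,824 \;>\; 10^{9}.
```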
Example ordering

• Let us consider the burglary example again.

• Suppose we decide to add the nodes in the order
  – M, J, A, B, E
  – M, J, E, B, A

(Network for reference: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Global Semantics 18
Example / Example contd. (slides 19–24)

(Figures only: step-by-step construction of the network as the nodes are added in the chosen ordering; these slides contain no further text.)
Example ordering (2)

• Order: M, J, E, B, A
  (nodes added in this order: 1 MaryCalls, 2 JohnCalls, 3 Earthquake, 4 Burglary, 5 Alarm)

• Resulting network
  – requires 31 probabilities
  – as many as the full joint distribution
  – thus: a bad choice

• All three networks represent the same probability distribution

• The last two versions
  – simply fail to represent all the conditional independence relationships
  – and end up specifying a lot of unnecessary numbers instead.

Global Semantics 25
Conditional independence relations in Bayesian networks

• Before
  – "numerical" (global) semantics based on the joint probability distribution
  – from this, derive the conditional independencies

• Idea now
  – opposite direction: topological (local) semantics
  – specify the conditional independencies
  – from these, derive the numerical semantics

Local Semantics 26
Conditional independence relations in Bayesian networks

• General idea
  – A node is conditionally independent of its non-descendants given its parents
  – A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents, that is, given its Markov blanket

• Examples (burglary network)
  – J is independent of B and E given A, i.e. P(J|A,B,E) = P(J|A)
  – B is independent of J and M given A and E, i.e. P(B|A,E,J,M) = P(B|A,E)

Local Semantics 27
Conditional independence relations in Bayesian networks

• A node X is conditionally independent of its non-descendants (e.g., the Zij's) given its parents (the Uij's).

• A node X is conditionally independent of all other nodes in the network given its Markov blanket.

Local Semantics 28
Compact conditional distributions

Efficient Representation of conditional distributions 29
Compact conditional distributions

• Example (noisy-OR): the probabilities that each cause alone fails to produce fever
  P(¬fever | cold, ¬flu, ¬malaria) = 0.6
  P(¬fever | ¬cold, flu, ¬malaria) = 0.2
  P(¬fever | ¬cold, ¬flu, malaria) = 0.1

Efficient Representation of conditional distributions 30
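Assuming the noisy-OR model with no leak term, the full CPT follows from these three inhibition probabilities; a short sketch:

```python
from itertools import product

# Noisy-OR (assumed model): P(¬fever | parents) is the product of the
# inhibition probabilities q of the parents that are true; q-values from the slide.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

for cold, flu, malaria in product([False, True], repeat=3):
    state = {"cold": cold, "flu": flu, "malaria": malaria}
    p_no_fever = 1.0
    for cause, value in state.items():
        if value:
            p_no_fever *= q[cause]
    print(state, "P(fever) =", round(1 - p_no_fever, 3))
# e.g. cold ∧ flu ∧ ¬malaria gives P(fever) = 1 - 0.6 * 0.2 = 0.88
```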
Bayesian nets with continuous variables

(Figure: network with the discrete node Subsidy? and the continuous node Harvest as parents of the continuous node Cost, which in turn is the parent of the discrete node Buys?)

Efficient Representation of conditional distributions 31
Continuous child variables

Efficient Representation of conditional distributions 32

Continuous child variables

Efficient Representation of conditional distributions 33
Discrete variable w/ continuous parents

(Figure: the Subsidy?/Harvest/Cost/Buys? network again)

Efficient Representation of conditional distributions 34

Discrete variable w/ continuous parents

Efficient Representation of conditional distributions 35

Discrete variable w/ continuous parents

36
Inference tasks

Exact inference by enumeration 37


Enumeration algorithm

Exact inference by enumeration 38


Inference by enumeration

P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a)

Rewriting the joint entries as products of CPT entries:

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)

P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩


Exact inference by enumeration 39
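A small sketch (loop structure mine, CPT values from slide 7) that reproduces the unnormalized and normalized values above:

```python
# Enumerate over the hidden variables Earthquake (e) and Alarm (a)
# to compute P(B | j, m) for the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a=true | b, e)
P_J = {True: 0.90, False: 0.05}                       # P(j=true | a)
P_M = {True: 0.70, False: 0.01}                       # P(m=true | a)

unnormalized = {}
for b in (True, False):
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_E[e] * p_a * P_J[a] * P_M[a]
    unnormalized[b] = P_B[b] * total      # 0.00059224 for b=true, 0.0014919 for b=false

alpha = 1 / sum(unnormalized.values())
print({b: round(alpha * p, 3) for b, p in unnormalized.items()})
# {True: 0.284, False: 0.716}
```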
Evaluation tree

Exact inference by enumeration 40


Inference by variable elimination

Exact inference by enumeration 41


Inference by variable elimination

Exact inference by enumeration 42


Variable elimination: Basic operations

Pointwise product of factors f1 and f2:

  f1(x1, …, xj, y1, …, yk) × f2(y1, …, yk, z1, …, zl) = f(x1, …, xj, y1, …, yk, z1, …, zl)

  E.g., f1(a, b) × f2(b, c) = f(a, b, c)

Summing out a variable X from a product of factors:
  – move any constant factors outside the summation
  – add up submatrices in the pointwise product of the remaining factors

  Σx f1 × ··· × fk = f1 × ··· × fi Σx fi+1 × ··· × fk = f1 × ··· × fi × fX̄

  where X is the variable to be summed out, assuming f1, …, fi do not depend on X

Exact inference by enumeration 43
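A minimal sketch of these two operations, with factors represented as dictionaries from value tuples to numbers (a representation chosen for illustration, not the lecture's; Boolean variables only, for brevity):

```python
from itertools import product

# A factor is (variables, table) where table maps a tuple of values
# (one per variable, in the listed order) to a number.
def pointwise_product(f1, f2):
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product([False, True], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        v1 = t1[tuple(assign[v] for v in vars1)]
        v2 = t2[tuple(assign[v] for v in vars2)]
        table[vals] = v1 * v2
    return out_vars, table

def sum_out(var, factor):
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vars, out

# f1(a, b) × f2(b, c) = f(a, b, c); then sum out b (illustrative numbers).
f1 = (["a", "b"], {(False, False): 0.3, (False, True): 0.7,
                   (True, False): 0.9, (True, True): 0.1})
f2 = (["b", "c"], {(False, False): 0.2, (False, True): 0.8,
                   (True, False): 0.6, (True, True): 0.4})
f12 = pointwise_product(f1, f2)
print(sum_out("b", f12)[1])
```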




Variable elimination algorithm

Exact inference by enumeration 46


Complexity of exact inference

Exact inference by enumeration 47


Clustering algorithms

• The variable elimination algorithm is simple and efficient for answering individual queries.
• For computing the posterior probabilities of all the variables in a network it can be less efficient: O(n²).
• Using clustering algorithms (also known as join tree algorithms), this can be reduced to O(n).
• The basic idea of clustering is to join individual nodes of the network to form cluster nodes in such a way that the resulting network is a polytree.

Exact inference by enumeration 48
Clustering algorithms

• A multiply connected network can be converted into a polytree by combining the Sprinkler and Rain nodes into a cluster node called Sprinkler+Rain.
• The two Boolean nodes are replaced by a mega-node that takes on four possible values: TT, TF, FT, FF. The mega-node has only one parent, the Boolean variable Cloudy, so there are two conditioning cases.

Exact inference by enumeration 49


APPROXIMATE INFERENCE IN BAYESIAN
NETWORKS
• Randomized sampling algorithms, also called Monte
Carlo algorithms
• Provide approximate answers whose accuracy depends
on the number of samples generated
• Monte Carlo algorithms are used in many branches of
science to estimate quantities that are difficult to
calculate exactly.
• Here: sampling applied to the computation of posterior
probabilities
• Two families of algorithms: direct sampling and Markov
chain sampling

50
Direct sampling methods

• Generation of samples from a known probability distribution
• Example:
  P(Coin) = ⟨0.5, 0.5⟩
• Sampling from this distribution is exactly like flipping the coin: with probability 0.5 it will return heads, and with probability 0.5 it will return tails.

Approximate inference in BN 51
Direct sampling methods

Approximate inference in BN 52
Direct sampling methods

Approximate inference in BN 53
Direct sampling methods

• PRIOR-SAMPLE generates samples from the prior joint distribution specified by the network.
• Each sampling step depends only on the parent values:
  S_PS(x1, …, xn) = P(x1, …, xn)

Approximate inference in BN 54
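A sketch of PRIOR-SAMPLE for the sprinkler network. The CPT entries appearing later in these slides (P(Cloudy)=0.5, P(Sprinkler|cloudy)=0.1, P(Rain|cloudy)=0.8, P(WetGrass|¬sprinkler, rain)=0.9) are used; the remaining entries are assumed textbook values:

```python
import random

# Sprinkler network CPTs; entries not given in the slides are assumed.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                     # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}                     # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,   # P(WetGrass=true | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    """Sample [Cloudy, Sprinkler, Rain, WetGrass] in topological order."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return [c, s, r, w]

print(prior_sample())   # e.g. [True, False, True, True]
```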
Computing answers

• Answers are computed by counting the actual samples generated.
• Say N is the total number of samples and N_PS(x1, …, xn) is the number of times the event x1, …, xn occurs in the samples. Then the estimate is
  P̂(x1, …, xn) = N_PS(x1, …, xn) / N, which converges to P(x1, …, xn) for large N.

Approximate inference in BN 55
Computing answers

• For example, consider the event produced earlier: [true, false, true, true]. The sampling probability for this event is
  S_PS(true, false, true, true) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324
• Hence, in the limit of large N, we expect 32.4% of the samples to be of this event.

Approximate inference in BN 56
Rejection sampling

Approximate inference in BN 57
Rejection sampling

• Let P̂(X|e) be the estimated distribution. Then, from the definition just given,
  P̂(X|e) = NORMALIZE(N_PS(X, e)) ≈ P(X, e) / P(e) = P(X|e)
• Rejection sampling therefore produces a consistent estimate of the true probability.

Approximate inference in BN 58
Rejection sampling

• Estimate P(Rain | Sprinkler = true) using 100 samples. Of the 100 that we generate, suppose that 73 have Sprinkler = false and are rejected, while 27 have Sprinkler = true.
• Of the 27, 8 have Rain = true and 19 have Rain = false.
• Thus,
  P(Rain | Sprinkler = true) ≈ NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Approximate inference in BN 59
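A sketch of rejection sampling for the same query, reusing the (partly assumed) sprinkler CPTs from the prior-sampling sketch:

```python
import random

# Rejection sampling for P(Rain | Sprinkler = true) in the sprinkler network.
P_C, P_S = 0.5, {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

counts = {True: 0, False: 0}
for _ in range(10000):
    c, s, r, w = prior_sample()
    if s:                       # keep only samples consistent with the evidence Sprinkler=true
        counts[r] += 1

n = sum(counts.values())
print({rain: round(k / n, 3) for rain, k in counts.items()})
# roughly {True: 0.3, False: 0.7}, cf. the ⟨0.296, 0.704⟩ estimate above
```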
Rejection sampling

How often does it rain the day after we have observed aurora
borealis? Ignoring all those days with no aurora borealis…
60
Likelihood weighting

• Likelihood weighting avoids the inefficiency of


rejection sampling
• It generates only events that are consistent with
the evidence e.
• It is a particular instance of the general statistical
technique of importance sampling, tailored for
inference in Bayesian networks.
• Let’s see how it works…

Approximate inference in BN 61
Likelihood weighting

Approximate inference in BN 62
Likelihood weighting

For the query P(Rain | Cloudy = true, WetGrass = true), start with weight w = 1:

• Cloudy is an evidence variable with value true. Therefore, we set
  w ← w × P(Cloudy = true) = 0.5.
• Sprinkler is not an evidence variable, so sample from P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩; suppose this returns false.
• Similarly, sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; suppose this returns true.
• WetGrass is an evidence variable with value true. Therefore, we set
  w ← w × P(WetGrass = true | Sprinkler = false, Rain = true) = 0.5 × 0.9 = 0.45.

Approximate inference in BN 63
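The same walk-through as a sketch; one call to weighted_sample reproduces the logic above, and accumulating the weights gives the estimate (CPT values partly assumed, as before):

```python
import random

# Likelihood weighting for the sprinkler network:
# one weighted sample for the query P(Rain | Cloudy=true, WetGrass=true).
P_C, P_S = 0.5, {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def weighted_sample():
    w = 1.0
    c = True                      # evidence: Cloudy = true
    w *= P_C                      # w ← w × P(Cloudy = true) = 0.5
    s = random.random() < P_S[c]  # sample Sprinkler given Cloudy
    r = random.random() < P_R[c]  # sample Rain given Cloudy
    w *= P_W[(s, r)]              # evidence: WetGrass = true
    return r, w

# Estimate by accumulating weights for Rain = true / false.
totals = {True: 0.0, False: 0.0}
for _ in range(10000):
    r, w = weighted_sample()
    totals[r] += w
norm = sum(totals.values())
print({rain: round(w / norm, 3) for rain, w in totals.items()})
```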
Likelihood weighting

• The weight for a given sample x is the product of the likelihoods for each evidence variable given its parents:
  w(z, e) = Π_i P(e_i | parents(E_i))
• Multiplying the last two equations, we see that the weighted probability of a sample has the particularly convenient form
  S_WS(z, e) · w(z, e) = P(z, e)

Approximate inference in BN 64
Likelihood weighting
• For any particular value x of X, the estimated posterior
probability can be calculated as follows:

• Hence, likelihood weighting returns consistent


estimates.
Approximate inference in BN 65
Inference by Markov chain simulation

• Markov chain Monte Carlo (MCMC) algorithms work quite differently from rejection sampling and likelihood weighting.
• MCMC generates each new state by making a random change to the current state (similar in spirit to simulated annealing).
• Gibbs sampling is a form of MCMC that is well suited to Bayesian networks.
• It starts with an arbitrary state and generates a next state by randomly sampling a value for one of the nonevidence variables Xi.
• Sampling for Xi is done conditioned on the current values of the variables in the Markov blanket of Xi.

Approximate inference in BN 66
Inference by Markov chain simulation

• Query: P(Rain | Sprinkler = true, WetGrass = true)
• Initial state is [true, true, false, true] (ordering: Cloudy, Sprinkler, Rain, WetGrass)
• Sample Cloudy:
  – sample from P(Cloudy | Sprinkler = true, Rain = false); suppose the result is Cloudy = false.
  – Then the new current state is [false, true, false, true].
• Sample Rain:
  – sample given the current values of its Markov blanket variables: P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true); suppose this yields Rain = true.
  – The new current state is [false, true, true, true].

Approximate inference in BN 67
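A sketch of Gibbs sampling for this query; the two Markov-blanket conditionals are computed by normalizing products of CPT entries (CPT values partly assumed, as before):

```python
import random

# Gibbs sampling for P(Rain | Sprinkler=true, WetGrass=true) in the sprinkler network.
P_C, P_S = 0.5, {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def bernoulli(p_true):
    return random.random() < p_true

def sample_cloudy(s, r):
    # P(c | Markov blanket) ∝ P(c) P(s|c) P(r|c)
    score = {c: (P_C if c else 1 - P_C)
                * (P_S[c] if s else 1 - P_S[c])
                * (P_R[c] if r else 1 - P_R[c]) for c in (True, False)}
    return bernoulli(score[True] / (score[True] + score[False]))

def sample_rain(c, s, w):
    # P(r | Markov blanket) ∝ P(r|c) P(w|s,r)
    score = {r: (P_R[c] if r else 1 - P_R[c])
                * (P_W[(s, r)] if w else 1 - P_W[(s, r)]) for r in (True, False)}
    return bernoulli(score[True] / (score[True] + score[False]))

# Evidence: Sprinkler = true, WetGrass = true; nonevidence: Cloudy, Rain.
c, s, r, w = True, True, False, True          # initial state [true, true, false, true]
counts = {True: 0, False: 0}
for _ in range(10000):
    c = sample_cloudy(s, r)                   # resample Cloudy given its Markov blanket
    r = sample_rain(c, s, w)                  # resample Rain given its Markov blanket
    counts[r] += 1
print({rain: round(k / 10000, 3) for rain, k in counts.items()})
```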
Inference by Markov chain simulation

Approximate inference in BN 68
Summary

This chapter has described Bayesian networks, a well-developed representation for uncertain knowledge. Bayesian networks play a role roughly analogous to that of propositional logic for definite knowledge.

• A Bayesian network is a directed acyclic graph whose nodes correspond to random variables; each node has a conditional distribution for the node given its parents.
• Bayesian networks provide a concise way to represent conditional independence relationships in the domain.
• A Bayesian network specifies a full joint distribution; each joint entry is defined as the product of the corresponding entries in the local conditional distributions. A Bayesian network is often exponentially smaller than the full joint distribution.

69
Summary (2)

• Inference in Bayesian networks means computing the probability distribution of a set of query variables, given a set of evidence variables. Exact inference algorithms, such as variable elimination, evaluate sums of products of conditional probabilities as efficiently as possible.
• In polytrees (singly connected networks), exact inference takes time linear in the size of the network. In the general case, the problem is intractable.
• Stochastic approximation techniques such as likelihood weighting and Markov chain Monte Carlo can give reasonable estimates of the true posterior probabilities in a network and can cope with much larger networks than can exact algorithms.

70
