
PROBABILISTIC REASONING SYSTEMS

In which we explain how to build reasoning systems that use network models to reason with uncertainty according to the laws of probability theory.
Outline

• Knowledge in uncertain domains
• Probabilistic Networks
• Semantics of Bayesian Networks
  – Global Semantics
  – Local Semantics
• Efficient representation of conditional distributions
• Exact inference in Bayesian Networks
• Approximate inference in Bayesian Networks
• Summary

2
Bayes' Rule

Why is this rule useful?

• Causal experience relates cause to effect (C: cause, E: effect), e.g. a mistake in a technical system (cause) makes the system behavior show a symptom (effect); causal inferences run from cause to effect.
• Diagnostic inference runs in the opposite direction: from an observed effect back to its likely cause.

  P(C|E) = P(E|C) P(C) / P(E)

This simple equation underlies all modern AI systems for probabilistic inference.

Bayes' Rule 3
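To make the diagnostic direction concrete, here is a minimal Python sketch (not from the slides; the numbers are illustrative assumptions, not lecture values):

```python
# Hypothetical illustration: diagnostic inference with Bayes' rule.
# P(cause | effect) = P(effect | cause) * P(cause) / P(effect)
p_cause = 0.01                   # prior: probability of a fault in the system (assumed)
p_effect_given_cause = 0.97      # causal knowledge: the fault produces the symptom (assumed)
p_effect_given_not_cause = 0.05  # false-alarm rate of the symptom (assumed)

# total probability of observing the symptom
p_effect = (p_effect_given_cause * p_cause
            + p_effect_given_not_cause * (1 - p_cause))

p_cause_given_effect = p_effect_given_cause * p_cause / p_effect
print(f"P(cause | effect) = {p_cause_given_effect:.3f}")  # ≈ 0.164
```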
Knowledge in uncertain domains

• Joint probability distribution
  – delivers answers to any question that exists in the domain
  – Problem: intractable with a large number of variables
  – Specification: probabilities are difficult to assess for atomic events

• Complexity
  – Independence and conditional independence reduce complexity

• Bayesian Networks
  – Data structure that represents the dependencies between variables
  – Specification of the joint distribution

Knowledge in uncertain domains 4
Syntax

• Graph-theoretical structure
  – Set of variables as nodes (discrete or continuous)
  – Each node corresponds to a random variable
  – Directed acyclic graph (DAG); links express causal dependencies between variables

• Conditional probability tables
  – For each node, a table of conditional probabilities
  – The table gives the distribution of the node given its parents, P(Xi|Parents(Xi))

(Figure: example network with nodes Weather, Cavity, Toothache, Catch; Cavity is the parent of Toothache and Catch.)

Probabilistic Networks 5
Simple Bayesian Network

• Example: Alarm
  – a new burglar alarm is fairly reliable at detecting a burglary
  – it also responds on occasion to minor earthquakes
  – two neighbors, John and Mary, have promised to call when they hear the alarm
  – John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then too
  – Mary likes loud music and sometimes misses the alarm altogether
  – Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Probabilistic Networks 6
Simple Bayesian Network

Burglary: P(B) = 0.001        Earthquake: P(E) = 0.002

Alarm:
  B E | P(A)
  1 1 | 0.95
  1 0 | 0.94
  0 1 | 0.29
  0 0 | 0.001

JohnCalls:        MaryCalls:
  A | P(J)        A | P(M)
  1 | 0.90        1 | 0.70
  0 | 0.05        0 | 0.01

(conditional distributions for each node)

Probabilistic Networks 7
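The CPTs above can be written down directly as data. A minimal sketch, with variable names of my own choosing:

```python
# CPTs of the burglary network from the slide, encoded as plain dictionaries.
# Keys of the alarm table are (burglary, earthquake); values are P(variable = true | parents).
P_B = 0.001                      # P(Burglary = true)
P_E = 0.002                      # P(Earthquake = true)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)
```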
Semantics of Bayesian Networks

• Two views on the semantics
  1. Global Semantics: the first is to see the network as a representation of the joint probability distribution
  2. Local Semantics: the second is to view it as an encoding of a collection of conditional independence statements

• The two views are equivalent
  – the first is helpful in understanding how to construct networks
  – the second is helpful in designing inference procedures

Semantics of Bayesian Networks 8
Representing the full joint distribution

• General idea
  – The joint distribution can be expressed as a product of local conditional probabilities
  – Every entry in the joint probability distribution can be calculated from the information in the network
  – Generic entry: P(x1, …, xn) = Π_i P(xi | parents(Xi))

(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Global Semantics 9
Representing the full joint distribution

• Example (CPTs as on slide 7)
  – The alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call
  – P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
    = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
    = 0.9 × 0.7 × 0.001 × 0.999 × 0.998
    ≈ 0.00063

Global Semantics 10
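A short sketch (again with my own variable names) that reproduces this calculation from the CPT values on slide 7:

```python
# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p_j_given_a = 0.90
p_m_given_a = 0.70
p_a_given_not_b_not_e = 0.001
p_not_b = 1 - 0.001
p_not_e = 1 - 0.002

joint = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
print(round(joint, 6))  # 0.000628, i.e. ≈ 0.00063
```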
Method for Constructing Bayesian Networks

• Generic rule
  – uses conditional probabilities: P(x1, …, xn) = Π_i P(xi | parents(Xi))
  – this gives the semantics, but not how to construct a network
  – implicitly it encodes conditional independence assertions, which help the knowledge engineer

• Reformulate the rule
  – rewrite the joint distribution using the product rule:
    P(x1, …, xn) = P(xn | xn−1, …, x1) P(xn−1, …, x1)

• Repeat the process
  – each conjunctive probability is reduced to a conditional probability and a smaller conjunction
  – finally: one big product

Global Semantics 11
Chain rule

• Compare with the chain rule
  P(x1, …, xn) = Π_i P(xi | xi−1, …, x1)
  This reveals that the specification of the joint distribution is equivalent to the general assertion
  P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi))
  (as long as Parents(Xi) ⊆ {Xi−1, …, X1})

• I.e.:
  – This last condition is satisfied by labeling the nodes in any order that is consistent with the partial order implicit in the graph structure.
  – The Bayesian network is a correct representation of the domain only if each node is conditionally independent of its predecessors in the node ordering, given its parents.

Global Semantics 12
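For completeness, a compact LaTeX rendering of the telescoping argument sketched above (the standard chain-rule derivation; notation mine):

```latex
\begin{align*}
P(x_1,\dots,x_n)
  &= P(x_n \mid x_{n-1},\dots,x_1)\, P(x_{n-1},\dots,x_1) \\
  &= P(x_n \mid x_{n-1},\dots,x_1)\, P(x_{n-1} \mid x_{n-2},\dots,x_1) \cdots P(x_2 \mid x_1)\, P(x_1) \\
  &= \prod_{i=1}^{n} P(x_i \mid x_{i-1},\dots,x_1)
   \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathit{parents}(X_i)\bigr),
\end{align*}
```

where the last step uses the assertion P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi)).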
Construction of Bayesian Networks

• Important while constructing
  – We need to choose parents for each node such that this property holds.

• Intuition
  – The parents of node Xi should contain all those nodes in X1, …, Xi−1 that directly influence Xi
  – Example: MaryCalls (M)
    • M is influenced by B and E, but not directly
    • M is influenced directly by A; given A, whether J calls provides no additional evidence
    • Therefore: P(M | J, A, E, B) = P(M | A)

(Network: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Global Semantics 13
General Procedure

1. Choose the set of relevant variables Xi that describe the domain.
2. Choose an ordering for the variables.
   (Any ordering works, but some orderings work better than others, as we will see.)
3. While there are variables left:
   a) Pick a variable Xi and add a node to the network for it.
   b) Set Parents(Xi) to some minimal set of nodes already in the net such that the conditional independence property is satisfied.
   c) Define the conditional distribution P(Xi | Parents(Xi)).
   (A small code sketch of this loop follows below.)

Global Semantics 14
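A minimal, hypothetical sketch of the loop above for the burglary network; `direct_influencers` stands in for the domain knowledge used to pick a minimal parent set, and the CPTs are left as placeholders for the knowledge engineer to fill in:

```python
# Hypothetical sketch of the network-construction procedure (not the lecture's code).
order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]  # step 2: chosen ordering

# Domain knowledge: which variables directly influence each variable (assumed).
direct_influencers = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}

network = {}   # maps each node to its parents and a CPT to be filled in later
added = []     # nodes already in the network
for x in order:                                                  # step 3: add nodes one by one
    parents = [p for p in added if p in direct_influencers[x]]   # step 3b: minimal parent set
    network[x] = {"parents": parents, "cpt": None}               # step 3c: CPT defined later
    added.append(x)

print(network["Alarm"]["parents"])   # ['Burglary', 'Earthquake']
```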
Notes

• The construction method guarantees that the network is acyclic
  – because each node is connected only to earlier nodes.

• Redundancies
  – No redundant probability values
  – Exception: one entry in each row of each conditional probability table is redundant, since P(¬x2|x1) = 1 − P(x2|x1).

• This means that it is impossible for the knowledge engineer or domain expert to create a Bayesian network that violates the axioms of probability!

Global Semantics 15
Compactness

• Compactness
  – A Bayesian network is a complete and non-redundant representation of a domain
  – It can be much more compact than the full joint distribution
  – This is important in practice
  – Compactness is an example of a property we call local structure (or sparse coding)

• Local structure (sparseness)
  – Each sub-component is connected to only a limited number of other components
  – Complexity: linear instead of exponential
  – In most domains each variable is directly influenced by at most k others; with n variables a node then needs at most 2^k conditional probabilities, so the whole network needs n · 2^k numbers
  – In contrast, the full joint distribution contains 2^n numbers

Global Semantics 16
Node Ordering

• Local structure (example)
  – 30 nodes, each with at most 5 parents
  – 960 numbers for the BN, > 1 billion for the full joint distribution (see the calculation after this slide)

• Construction
  – not trivial
  – each variable should be directly influenced by only a few others
  – set the parent nodes "appropriately" ➞ network topology
  – "direct influencers" first
  – thus: the correct order is important

• Order in which to add nodes
  – root first
  – then the direct influencers
  – then down to the leaves
  – What happens with a "wrong" order?

Global Semantics 17
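The claimed numbers follow directly from the n · 2^k versus 2^n comparison on the previous slide:

```latex
n \cdot 2^{k} = 30 \cdot 2^{5} = 960
\qquad \text{versus} \qquad
2^{n} = 2^{30} = 1\,073\,741\,824 \;>\; 10^{9}.
```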
Example ordering

• Let us consider the burglary example again.

• Suppose we decide to add the nodes in the order
  – M, J, A, B, E
  – M, J, E, B, A

(Network for reference: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls)

Global Semantics 18
Example / Example contd. (slides 19–24)

(Figures only: step-by-step construction of the network as the nodes are added in the chosen ordering; these slides contain no further text.)
Example ordering (2)

• Order: M, J, E, B, A
  (nodes added in this order: 1 MaryCalls, 2 JohnCalls, 3 Earthquake, 4 Burglary, 5 Alarm)

• Resulting network
  – requires 31 probabilities
  – as many as the full joint distribution
  – thus: a bad choice

• All three networks represent the same probability distribution

• The last two versions
  – simply fail to represent all the conditional independence relationships
  – and end up specifying a lot of unnecessary numbers instead.

Global Semantics 25
Conditional independence relations in Bayesian networks

• Before
  – "numerical" (global) semantics based on the joint probability distribution
  – from this, derive the conditional independencies

• Idea now
  – opposite direction: topological (local) semantics
  – specify the conditional independencies
  – from these, derive the numerical semantics

Local Semantics 26
Conditional independence relations in Bayesian networks

• General idea
  – A node is conditionally independent of its non-descendants given its parents
  – A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents, that is, given its Markov blanket

• Examples (burglary network)
  – J is independent of B and E given A, i.e. P(J|A,B,E) = P(J|A)
  – B is independent of J and M given A and E, i.e. P(B|A,E,J,M) = P(B|A,E)

Local Semantics 27
Conditional independence relations in Bayesian networks

• A node X is conditionally independent of its non-descendants (e.g., the Zij's) given its parents (the Uij's).

• A node X is conditionally independent of all other nodes in the network given its Markov blanket.

Local Semantics 28
Compact conditional distributions

Efficient Representation of conditional distributions 29
Compact conditional distributions

• Example (noisy-OR): the probabilities that each cause alone fails to produce fever
  P(¬fever | cold, ¬flu, ¬malaria) = 0.6
  P(¬fever | ¬cold, flu, ¬malaria) = 0.2
  P(¬fever | ¬cold, ¬flu, malaria) = 0.1

Efficient Representation of conditional distributions 30
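Assuming the noisy-OR model with no leak term, the full CPT follows from these three inhibition probabilities; a short sketch:

```python
from itertools import product

# Noisy-OR (assumed model): P(¬fever | parents) is the product of the
# inhibition probabilities q of the parents that are true; q-values from the slide.
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

for cold, flu, malaria in product([False, True], repeat=3):
    state = {"cold": cold, "flu": flu, "malaria": malaria}
    p_no_fever = 1.0
    for cause, value in state.items():
        if value:
            p_no_fever *= q[cause]
    print(state, "P(fever) =", round(1 - p_no_fever, 3))
# e.g. cold ∧ flu ∧ ¬malaria gives P(fever) = 1 - 0.6 * 0.2 = 0.88
```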
Bayesian nets with continuous variables

(Figure: network with the discrete node Subsidy? and the continuous node Harvest as parents of the continuous node Cost, which in turn is the parent of the discrete node Buys?)

Efficient Representation of conditional distributions 31
Continuous child variables

Efficient Representation of conditional distributions 32

Continuous child variables

Efficient Representation of conditional distributions 33
Discrete variable w/ continuous parents

(Figure: the Subsidy?/Harvest/Cost/Buys? network again)

Efficient Representation of conditional distributions 34

Discrete variable w/ continuous parents

Efficient Representation of conditional distributions 35

Discrete variable w/ continuous parents

36
Inference tasks

Exact inference by enumeration 37


Enumeration algorithm

Exact inference by enumeration 38


Inference by enumeration

P(B | j, m) = α P(B, j, m) = α Σ_e Σ_a P(B, j, m, e, a)

Rewriting the joint entries as products of CPT entries:

P(b | j, m) = α Σ_e Σ_a P(b) P(e) P(a | b, e) P(j | a) P(m | a)

P(B | j, m) = α ⟨0.00059224, 0.0014919⟩ ≈ ⟨0.284, 0.716⟩


Exact inference by enumeration 39
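A small sketch (loop structure mine, CPT values from slide 7) that reproduces the unnormalized and normalized values above:

```python
# Enumerate over the hidden variables Earthquake (e) and Alarm (a)
# to compute P(B | j, m) for the burglary network.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(a=true | b, e)
P_J = {True: 0.90, False: 0.05}                       # P(j=true | a)
P_M = {True: 0.70, False: 0.01}                       # P(m=true | a)

unnormalized = {}
for b in (True, False):
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_E[e] * p_a * P_J[a] * P_M[a]
    unnormalized[b] = P_B[b] * total      # 0.00059224 for b=true, 0.0014919 for b=false

alpha = 1 / sum(unnormalized.values())
print({b: round(alpha * p, 3) for b, p in unnormalized.items()})
# {True: 0.284, False: 0.716}
```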
Evaluation tree

Exact inference by enumeration 40


Inference by variable elimination

Exact inference by enumeration 41


Inference by variable elimination

Exact inference by enumeration 42


Variable elimination: Basic operations

Pointwise product of factors f1 and f2:

  f1(x1, …, xj, y1, …, yk) × f2(y1, …, yk, z1, …, zl) = f(x1, …, xj, y1, …, yk, z1, …, zl)

  E.g., f1(a, b) × f2(b, c) = f(a, b, c)

Summing out a variable X from a product of factors:
  – move any constant factors outside the summation
  – add up submatrices in the pointwise product of the remaining factors

  Σx f1 × ··· × fk = f1 × ··· × fi Σx fi+1 × ··· × fk = f1 × ··· × fi × fX̄

  where X is the variable to be summed out, assuming f1, …, fi do not depend on X

Exact inference by enumeration 43
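A minimal sketch of these two operations, with factors represented as dictionaries from value tuples to numbers (a representation chosen for illustration, not the lecture's; Boolean variables only, for brevity):

```python
from itertools import product

# A factor is (variables, table) where table maps a tuple of values
# (one per variable, in the listed order) to a number.
def pointwise_product(f1, f2):
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for vals in product([False, True], repeat=len(out_vars)):
        assign = dict(zip(out_vars, vals))
        v1 = t1[tuple(assign[v] for v in vars1)]
        v2 = t2[tuple(assign[v] for v in vars2)]
        table[vals] = v1 * v2
    return out_vars, table

def sum_out(var, factor):
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vars, out

# f1(a, b) × f2(b, c) = f(a, b, c); then sum out b (illustrative numbers).
f1 = (["a", "b"], {(False, False): 0.3, (False, True): 0.7,
                   (True, False): 0.9, (True, True): 0.1})
f2 = (["b", "c"], {(False, False): 0.2, (False, True): 0.8,
                   (True, False): 0.6, (True, True): 0.4})
f12 = pointwise_product(f1, f2)
print(sum_out("b", f12)[1])
```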




Variable elimination algorithm

Exact inference by enumeration 46


Complexity of exact inference

Exact inference by enumeration 47


Clustering algorithms

• The variable elimination algorithm is simple and efficient for answering individual queries.
• For computing the posterior probabilities of all the variables in a network it can be less efficient: O(n²).
• Using clustering algorithms (also known as join tree algorithms), this can be reduced to O(n).
• The basic idea of clustering is to join individual nodes of the network to form cluster nodes in such a way that the resulting network is a polytree.

Exact inference by enumeration 48
Clustering algorithms

• A multiply connected network can be converted into a polytree by combining the Sprinkler and Rain nodes into a cluster node called Sprinkler+Rain.
• The two Boolean nodes are replaced by a mega-node that takes on four possible values: TT, TF, FT, FF. The mega-node has only one parent, the Boolean variable Cloudy, so there are two conditioning cases.

Exact inference by enumeration 49


APPROXIMATE INFERENCE IN BAYESIAN
NETWORKS
• Randomized sampling algorithms, also called Monte
Carlo algorithms
• Provide approximate answers whose accuracy depends
on the number of samples generated
• Monte Carlo algorithms are used in many branches of
science to estimate quantities that are difficult to
calculate exactly.
• Here: sampling applied to the computation of posterior
probabilities
• Two families of algorithms: direct sampling and Markov
chain sampling

50
Direct sampling methods

• Generation of samples from a known probability distribution
• Example:
  P(Coin) = ⟨0.5, 0.5⟩
• Sampling from this distribution is exactly like flipping the coin: with probability 0.5 it will return heads, and with probability 0.5 it will return tails.

Approximate inference in BN 51
Direct sampling methods

Approximate inference in BN 52
Direct sampling methods

Approximate inference in BN 53
Direct sampling methods

• PRIOR-SAMPLE generates samples from the prior joint distribution specified by the network.
• Each sampling step depends only on the parent values:
  S_PS(x1, …, xn) = P(x1, …, xn)

Approximate inference in BN 54
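A sketch of PRIOR-SAMPLE for the sprinkler network. The CPT entries appearing later in these slides (P(Cloudy)=0.5, P(Sprinkler|cloudy)=0.1, P(Rain|cloudy)=0.8, P(WetGrass|¬sprinkler, rain)=0.9) are used; the remaining entries are assumed textbook values:

```python
import random

# Sprinkler network CPTs; entries not given in the slides are assumed.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                     # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}                     # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,   # P(WetGrass=true | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    """Sample [Cloudy, Sprinkler, Rain, WetGrass] in topological order."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return [c, s, r, w]

print(prior_sample())   # e.g. [True, False, True, True]
```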
Computing answers

• Answers are computed by counting the actual samples generated.
• Say N is the total number of samples and N_PS(x1, …, xn) is the number of times the event x1, …, xn occurs in the samples. Then the estimate is
  P̂(x1, …, xn) = N_PS(x1, …, xn) / N, which converges to P(x1, …, xn) for large N.

Approximate inference in BN 55
Computing answers

• For example, consider the event produced earlier: [true, false, true, true]. The sampling probability for this event is
  S_PS(true, false, true, true) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324
• Hence, in the limit of large N, we expect 32.4% of the samples to be of this event.

Approximate inference in BN 56
Rejection sampling

Approximate inference in BN 57
Rejection sampling

• Let P̂(X|e) be the estimated distribution. Then, from the definition just given,
  P̂(X|e) = NORMALIZE(N_PS(X, e)) ≈ P(X, e) / P(e) = P(X|e)
• Rejection sampling therefore produces a consistent estimate of the true probability.

Approximate inference in BN 58
Rejection sampling

• Estimate P(Rain | Sprinkler = true) using 100 samples. Of the 100 that we generate, suppose that 73 have Sprinkler = false and are rejected, while 27 have Sprinkler = true.
• Of the 27, 8 have Rain = true and 19 have Rain = false.
• Thus,
  P(Rain | Sprinkler = true) ≈ NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩

Approximate inference in BN 59
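A sketch of rejection sampling for the same query, reusing the (partly assumed) sprinkler CPTs from the prior-sampling sketch:

```python
import random

# Rejection sampling for P(Rain | Sprinkler = true) in the sprinkler network.
P_C, P_S = 0.5, {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

counts = {True: 0, False: 0}
for _ in range(10000):
    c, s, r, w = prior_sample()
    if s:                       # keep only samples consistent with the evidence Sprinkler=true
        counts[r] += 1

n = sum(counts.values())
print({rain: round(k / n, 3) for rain, k in counts.items()})
# roughly {True: 0.3, False: 0.7}, cf. the ⟨0.296, 0.704⟩ estimate above
```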
Rejection sampling

How often does it rain the day after we have observed aurora
borealis? Ignoring all those days with no aurora borealis…
60
Likelihood weighting

• Likelihood weighting avoids the inefficiency of


rejection sampling
• It generates only events that are consistent with
the evidence e.
• It is a particular instance of the general statistical
technique of importance sampling, tailored for
inference in Bayesian networks.
• Let’s see how it works…

Approximate inference in BN 61
Likelihood weighting

Approximate inference in BN 62
Likelihood weighting

For the query P(Rain | Cloudy = true, WetGrass = true), start with weight w = 1:

• Cloudy is an evidence variable with value true. Therefore, we set
  w ← w × P(Cloudy = true) = 0.5.
• Sprinkler is not an evidence variable, so sample from P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩; suppose this returns false.
• Similarly, sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; suppose this returns true.
• WetGrass is an evidence variable with value true. Therefore, we set
  w ← w × P(WetGrass = true | Sprinkler = false, Rain = true) = 0.5 × 0.9 = 0.45.

Approximate inference in BN 63
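The same walk-through as a sketch; one call to weighted_sample reproduces the logic above, and accumulating the weights gives the estimate (CPT values partly assumed, as before):

```python
import random

# Likelihood weighting for the sprinkler network:
# one weighted sample for the query P(Rain | Cloudy=true, WetGrass=true).
P_C, P_S = 0.5, {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def weighted_sample():
    w = 1.0
    c = True                      # evidence: Cloudy = true
    w *= P_C                      # w ← w × P(Cloudy = true) = 0.5
    s = random.random() < P_S[c]  # sample Sprinkler given Cloudy
    r = random.random() < P_R[c]  # sample Rain given Cloudy
    w *= P_W[(s, r)]              # evidence: WetGrass = true
    return r, w

# Estimate by accumulating weights for Rain = true / false.
totals = {True: 0.0, False: 0.0}
for _ in range(10000):
    r, w = weighted_sample()
    totals[r] += w
norm = sum(totals.values())
print({rain: round(w / norm, 3) for rain, w in totals.items()})
```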
Likelihood weighting

• The weight for a given sample x is the product of the likelihoods for each evidence variable given its parents:
  w(z, e) = Π_i P(e_i | parents(E_i))
• Multiplying the last two equations, we see that the weighted probability of a sample has the particularly convenient form
  S_WS(z, e) · w(z, e) = P(z, e)

Approximate inference in BN 64
Likelihood weighting
• For any particular value x of X, the estimated posterior
probability can be calculated as follows:

• Hence, likelihood weighting returns consistent


estimates.
Approximate inference in BN 65
Inference by Markov chain simulation

• Markov chain Monte Carlo (MCMC) algorithms work quite differently from rejection sampling and likelihood weighting.
• MCMC generates each new state by making a random change to the current state (similar in spirit to simulated annealing).
• Gibbs sampling is a form of MCMC that is well suited to Bayesian networks.
• It starts with an arbitrary state and generates a next state by randomly sampling a value for one of the nonevidence variables Xi.
• Sampling for Xi is done conditioned on the current values of the variables in the Markov blanket of Xi.

Approximate inference in BN 66
Inference by Markov chain simulation

• Query: P(Rain | Sprinkler = true, WetGrass = true)
• Initial state is [true, true, false, true] (ordering: Cloudy, Sprinkler, Rain, WetGrass)
• Sample Cloudy:
  – sample from P(Cloudy | Sprinkler = true, Rain = false); suppose the result is Cloudy = false.
  – Then the new current state is [false, true, false, true].
• Sample Rain:
  – sample given the current values of its Markov blanket variables: P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true); suppose this yields Rain = true.
  – The new current state is [false, true, true, true].

Approximate inference in BN 67
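A sketch of Gibbs sampling for this query; the two Markov-blanket conditionals are computed by normalizing products of CPT entries (CPT values partly assumed, as before):

```python
import random

# Gibbs sampling for P(Rain | Sprinkler=true, WetGrass=true) in the sprinkler network.
P_C, P_S = 0.5, {True: 0.1, False: 0.5}
P_R = {True: 0.8, False: 0.2}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def bernoulli(p_true):
    return random.random() < p_true

def sample_cloudy(s, r):
    # P(c | Markov blanket) ∝ P(c) P(s|c) P(r|c)
    score = {c: (P_C if c else 1 - P_C)
                * (P_S[c] if s else 1 - P_S[c])
                * (P_R[c] if r else 1 - P_R[c]) for c in (True, False)}
    return bernoulli(score[True] / (score[True] + score[False]))

def sample_rain(c, s, w):
    # P(r | Markov blanket) ∝ P(r|c) P(w|s,r)
    score = {r: (P_R[c] if r else 1 - P_R[c])
                * (P_W[(s, r)] if w else 1 - P_W[(s, r)]) for r in (True, False)}
    return bernoulli(score[True] / (score[True] + score[False]))

# Evidence: Sprinkler = true, WetGrass = true; nonevidence: Cloudy, Rain.
c, s, r, w = True, True, False, True          # initial state [true, true, false, true]
counts = {True: 0, False: 0}
for _ in range(10000):
    c = sample_cloudy(s, r)                   # resample Cloudy given its Markov blanket
    r = sample_rain(c, s, w)                  # resample Rain given its Markov blanket
    counts[r] += 1
print({rain: round(k / 10000, 3) for rain, k in counts.items()})
```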
Inference by Markov chain simulation

Approximate inference in BN 68
Summary

This chapter has described Bayesian networks, a well-developed representation for uncertain knowledge. Bayesian networks play a role roughly analogous to that of propositional logic for definite knowledge.

• A Bayesian network is a directed acyclic graph whose nodes correspond to random variables; each node has a conditional distribution for the node given its parents.
• Bayesian networks provide a concise way to represent conditional independence relationships in the domain.
• A Bayesian network specifies a full joint distribution; each joint entry is defined as the product of the corresponding entries in the local conditional distributions. A Bayesian network is often exponentially smaller than the full joint distribution.

69
Summary (2)

• Inference in Bayesian networks means computing the probability distribution of a set of query variables, given a set of evidence variables. Exact inference algorithms, such as variable elimination, evaluate sums of products of conditional probabilities as efficiently as possible.
• In polytrees (singly connected networks), exact inference takes time linear in the size of the network. In the general case, the problem is intractable.
• Stochastic approximation techniques such as likelihood weighting and Markov chain Monte Carlo can give reasonable estimates of the true posterior probabilities in a network and can cope with much larger networks than can exact algorithms.

70
