Bayesian Networks and Inference
Amarda Shehu
Bayesian Networks
Syntax
Semantics
Parameterized Distributions
Bayesian Networks
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link ≈ “directly influences”)
a conditional distribution for each node given its parents:
P(Xi | Parents(Xi))
I’m at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn’t
call. Sometimes it’s set off by minor earthquakes. Is there a burglar?
A CPT for a Boolean variable Xi with k Boolean parents has 2^k rows, one per combination of parent values; each row needs one number p for Xi = true (the entry for Xi = false is 1 − p)
If each variable has at most k parents, the complete network requires O(n · 2^k) numbers,
i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution
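For the burglary network: 1 + 1 + 4 + 2 + 2 = 10 numbers, vs. 2^5 − 1 = 31 for the full joint distribution over five Boolean variables.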
Global semantics: the full joint distribution is defined as the product of the local
conditional distributions: P(x1, ..., xn) = Π_i P(xi | parents(Xi))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j|a) P(m|a) P(a|¬b, ¬e) P(¬b) P(¬e)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998
≈ 0.00063
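A minimal Python sketch of the network's CPTs and the product above; CPT entries not quoted on these slides, such as P(a|b, e), are assumed to be the usual textbook values.

```python
# Burglary network CPTs (textbook values; entries not shown above,
# such as P(a|b,e), are the standard ones).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {  # P(A=true | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}   # P(J=true | A)
P_M = {True: 0.70, False: 0.01}   # P(M=true | A)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the local conditional probabilities."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) ≈ 0.00063, matching the computation above.
print(joint(b=False, e=False, a=True, j=True, m=True))
```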
Suppose we construct the network by adding the variables in the order M, J, A, B, E; which conditional independence assertions can we make at each step?
P(J | M) = P(J)? No
P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No
P(B | A, J, M) = P(B | A)? Yes
P(B | A, J, M) = P(B)? No
P(E | B, A, J, M) = P(E | A)? No
P(E | B, A, J, M) = P(E | A, B)? Yes
Need one conditional density function for child variable given continuous parents, for
each possible assignment to discrete parents
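A small illustrative sketch of such a conditional density: a continuous child Cost with a continuous parent Harvest and a discrete parent Subsidy, modeled as a linear Gaussian with separate parameters per discrete value. The variable names and parameter values here are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical linear-Gaussian parameters: one (slope, intercept, sigma) triple
# per value of the discrete parent Subsidy.
params = {
    True:  {"a": -0.5, "b": 5.0,  "sigma": 1.0},   # subsidized
    False: {"a": -1.0, "b": 10.0, "sigma": 1.5},   # not subsidized
}

def p_cost(cost, harvest, subsidy):
    """Conditional density p(cost | harvest, subsidy): linear Gaussian in the
    continuous parent, with a separate parameter set per discrete parent value."""
    p = params[subsidy]
    mu = p["a"] * harvest + p["b"]
    return gaussian_pdf(cost, mu, p["sigma"])
```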
Slightly intelligent way to sum out variables from the joint without actually constructing
its explicit representation
P(B | j, m)
= α P(B) Σ_e P(e) Σ_a P(a | B, e) P(j | a) P(m | a)
where the five factors correspond to the variables B, E, A, J, and M, respectively
Example of factors:
f4(A) is ⟨P(j|a), P(j|¬a)⟩ = ⟨0.90, 0.05⟩
f5(A) is ⟨P(m|a), P(m|¬a)⟩ = ⟨0.70, 0.01⟩
f3(A, b, E) is a matrix of two rows, ⟨P(a|b, e), P(¬a|b, e)⟩ and ⟨P(a|b, ¬e), P(¬a|b, ¬e)⟩
f3(A, B, E) is a 2×2×2 matrix (considering also b and ¬b).
equivalent to going bottom-up in tree, keeping track of both children in a vector, and
multiplying child with parent to “roll up” to higher level.
Generally:
Pointwise product of factors f1 and f2:
f1(x1, ..., xj, y1, ..., yk) × f2(y1, ..., yk, z1, ..., zl) = f(x1, ..., xj, y1, ..., yk, z1, ..., zl)
The variables of the product are the union of the variables of f1 and f2
Correct: P(j|A) × P(m|A) = P(j, m|A) (because J and M are conditionally independent
given their parent set A)
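A minimal sketch of the pointwise product, with factors stored as dictionaries keyed by tuples of Boolean values; the representation and helper name are just one possible choice.

```python
from itertools import product

def pointwise_product(f1, vars1, f2, vars2):
    """Pointwise product of two factors stored as dicts mapping assignments
    (tuples of bools, in the order given by the variable lists) to numbers.
    The result ranges over the union of the two variable lists."""
    out_vars = vars1 + [v for v in vars2 if v not in vars1]
    out = {}
    for assign in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, assign))
        key1 = tuple(env[v] for v in vars1)
        key2 = tuple(env[v] for v in vars2)
        out[assign] = f1[key1] * f2[key2]
    return out, out_vars

# f4(A) = <P(j|a), P(j|~a)>, f5(A) = <P(m|a), P(m|~a)>
f4 = {(True,): 0.90, (False,): 0.05}
f5 = {(True,): 0.70, (False,): 0.01}
f_jm, _ = pointwise_product(f4, ["A"], f5, ["A"])
# f_jm[(True,)] == 0.63 == P(j, m | a)
```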
“Summing out” A means pointwise product on each branch and sum up at parent
Let f4(A) × f5(A) be f(j, m, A) = ⟨P(j, m|a), P(j, m|¬a)⟩ (from previous slide)
Take the pointwise product of the A = a entries of f3(A, b, E) with f(j, m, a)
Take the pointwise product of the A = ¬a entries of f3(A, b, E) with f(j, m, ¬a)
Sum the two results, i.e., sum over A, to get a new factor f6(b, E)
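Putting the steps together, a rough numpy sketch that sums out A and then E to compute P(B | j, m); the alarm CPT entries not shown on these slides are assumed to be the usual textbook values, and the einsum calls simply perform the pointwise products followed by summation.

```python
import numpy as np

# Factors as numpy arrays indexed [var=true, var=false] along each axis.
f1_B = np.array([0.001, 0.999])                 # P(B)
f2_E = np.array([0.002, 0.998])                 # P(E)
pa = np.array([[0.95, 0.94],                    # P(a | B, E): rows B, cols E
               [0.29, 0.001]])                  # entries for (b,e),(b,~e),(~b,e) are textbook values
f3_ABE = np.stack([pa, 1 - pa])                 # axis 0: A = true / false
f4_A = np.array([0.90, 0.05])                   # P(j | A)
f5_A = np.array([0.70, 0.01])                   # P(m | A)

# Pointwise product f4 x f5, then multiply into f3 and sum out A -> f6(B, E)
f_jm_A = f4_A * f5_A                            # <P(j,m|a), P(j,m|~a)>
f6_BE = np.einsum('abe,a->be', f3_ABE, f_jm_A)  # sum over A

# Multiply by P(E) and sum out E -> f7(B)
f7_B = np.einsum('be,e->b', f6_BE, f2_E)

# Multiply by P(B) and normalize -> P(B | j, m) ≈ <0.284, 0.716>
unnorm = f1_B * f7_B
print(unnorm / unnorm.sum())
```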
Defn: moral graph of Bayes net: marry all parents and drop arrows
How to reduce time? Identify structure in BN similar to CSP setting: group variables
together to “reduce” network to a polytree
The exponential time cost is hidden in the combined CPTs, which can become
exponentially large
Or...
Basic idea:
Draw N samples from a sampling distribution S
Compute an approximate posterior probability P̂
Show this converges to the true probability P
Outline:
– Sampling from an empty network (direct sampling)
– Rejection sampling: reject samples disagreeing with the evidence
– Likelihood weighting: use the evidence to weight samples
– Markov chain Monte Carlo (MCMC): sample from a stochastic process whose
stationary distribution is the true posterior
Empty refers to the absence of any evidence: used to estimate joint probabilities
Main idea:
Once a parent is sampled, its value is fixed and used to sample its children
Events generated via this direct sampling occur in proportion to their joint probability
Example next
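A rough Python sketch of direct (prior) sampling for the sprinkler network used in the examples below; the CPT numbers are assumed to be the usual textbook values.

```python
import random

# Sprinkler network CPTs (usual textbook values).
P_C = 0.5
P_S = {True: 0.10, False: 0.50}                  # P(Sprinkler=true | Cloudy)
P_R = {True: 0.80, False: 0.20}                  # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,  # P(WetGrass=true | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    """Sample every variable in topological order, each from its CPT
    given the already-sampled values of its parents."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return {"Cloudy": c, "Sprinkler": s, "Rain": r, "WetGrass": w}

# The fraction of N samples matching an event estimates its joint probability.
N = 10_000
samples = [prior_sample() for _ in range(N)]
print(sum(s["Rain"] and s["WetGrass"] for s in samples) / N)
```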
Main idea:
When the given distribution is too hard to sample from directly, draw samples from an
easy-to-sample distribution (here, the network prior via direct sampling), then reject the
samples that are inconsistent with the hard-to-sample distribution, i.e., with the evidence
Example: to estimate P(Rain | Sprinkler = true), generate 100 samples of
(Cloudy, Sprinkler, Rain, WetGrass) via direct sampling
27 samples have Sprinkler = true, the event of interest
Of these, 8 have Rain = true and 19 have Rain = false, so
P̂(Rain | Sprinkler = true) = normalize(⟨8, 19⟩) ≈ ⟨0.296, 0.704⟩
Problem:
If e is a very rare event, most samples are rejected; hopelessly expensive if P(e) is small
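A sketch of rejection sampling for the query above, reusing prior_sample() and the CPTs from the direct-sampling sketch; as the number of samples grows, the estimate approaches the true posterior.

```python
from collections import Counter

def rejection_sample_rain_given_sprinkler(n):
    """Estimate P(Rain | Sprinkler=true): draw prior samples and keep
    only those consistent with the evidence."""
    counts = Counter()
    for _ in range(n):
        s = prior_sample()          # from the direct-sampling sketch above
        if s["Sprinkler"]:          # reject samples inconsistent with the evidence
            counts[s["Rain"]] += 1
    total = sum(counts.values())
    return {rain: c / total for rain, c in counts.items()}

print(rejection_sample_rain_given_sprinkler(10_000))
```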
Main idea:
Generate only events that are consistent with given values e of evidence variables E
Weight each sample by the likelihood it accords the evidence (how likely e is)
Rain is considered next; it is a nonevidence variable, so it is sampled from the network
and w does not change
w stays at 1.0 × 0.1 (the 0.1 is the likelihood of the evidence Sprinkler = true given the
sampled value Cloudy = true)
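A sketch of likelihood weighting over the same sprinkler network, reusing the CPT dictionaries from the direct-sampling sketch; the NETWORK list and weighted_sample() are illustrative names, not a fixed API.

```python
import random

# Network as (variable, parents, P(var=true | parent values)) in topological
# order; CPT dictionaries as in the direct-sampling sketch above.
NETWORK = [
    ("Cloudy",    [],                    lambda e: P_C),
    ("Sprinkler", ["Cloudy"],            lambda e: P_S[e["Cloudy"]]),
    ("Rain",      ["Cloudy"],            lambda e: P_R[e["Cloudy"]]),
    ("WetGrass",  ["Sprinkler", "Rain"], lambda e: P_W[(e["Sprinkler"], e["Rain"])]),
]

def weighted_sample(evidence):
    """Fix evidence variables and multiply their likelihood into the weight;
    sample every nonevidence variable from its CPT as usual."""
    w, event = 1.0, {}
    for var, _parents, p_true in NETWORK:
        p = p_true(event)
        if var in evidence:
            event[var] = evidence[var]
            w *= p if evidence[var] else 1 - p
        else:
            event[var] = random.random() < p
    return event, w

# e.g. with evidence Sprinkler=true: after that step w = 1.0 * 0.1 when
# Cloudy=true was sampled, matching the walkthrough above; Rain is then
# sampled normally and w is unchanged.
```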
Most samples will have very low weights, and the weighted estimate will be dominated by
the tiny fraction of samples that accord non-negligible likelihood to the evidence
Change framework: do not directly sample (from scratch), but modify preceding sample
Main idea:
Markov Chain Monte Carlo (MCMC) algorithm(s) generate each sample by making a
random change to a preceding sample
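A minimal Gibbs-sampling (MCMC) sketch for the sprinkler network with evidence Sprinkler = true and WetGrass = true, reusing the CPTs from the direct-sampling sketch; for a network this small the resampling distributions are computed from the full joint for clarity, although in general only the Markov blanket terms are needed.

```python
import random

def joint_sprinkler(c, s, r, w):
    """Full joint of the sprinkler network (CPTs from the sketch above)."""
    p = P_C if c else 1 - P_C
    p *= P_S[c] if s else 1 - P_S[c]
    p *= P_R[c] if r else 1 - P_R[c]
    p *= P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return p

def gibbs_rain_given_evidence(n, s=True, w=True):
    """Estimate P(Rain | Sprinkler=s, WetGrass=w) by Gibbs sampling: each step
    resamples one nonevidence variable conditioned on the current values of
    all the others (equivalently, on its Markov blanket)."""
    c, r = True, True                     # arbitrary initial state
    rain_true = 0
    for _ in range(n):
        # Resample Cloudy given the current Rain and the evidence.
        p1, p0 = joint_sprinkler(True, s, r, w), joint_sprinkler(False, s, r, w)
        c = random.random() < p1 / (p1 + p0)
        # Resample Rain given the current Cloudy and the evidence.
        p1, p0 = joint_sprinkler(c, s, True, w), joint_sprinkler(c, s, False, w)
        r = random.random() < p1 / (p1 + p0)
        rain_true += r
    return rain_true / n

print(gibbs_rain_given_evidence(10_000))
```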
♦ MCMC Analysis
♦ Stationarity
♦ Detailed Balance
Stationarity: π(x′) = Σ_x π(x) q(x → x′) for all x′
Detailed balance: π(x) q(x → x′) = π(x′) q(x′ → x) for all x, x′
Detailed balance implies stationarity:
Σ_x π(x) q(x → x′) = Σ_x π(x′) q(x′ → x) = π(x′) Σ_x q(x′ → x) = π(x′)
Relative approximation: |P(X | e) − P̂(X | e)| / P(X | e) ≤ ε