Bayesian Networks (Reduced)
Bayesian Networks
Bayesian Belief Networks (BNs)
Example BN
[Network structure: a → b, a → c; b → d, c → d; c → e]

P(A) = 0.001
P(B|A) = 0.3      P(B|¬A) = 0.001
P(C|A) = 0.2      P(C|¬A) = 0.005
P(D|B,C) = 0.1    P(D|B,¬C) = 0.01    P(D|¬B,C) = 0.01    P(D|¬B,¬C) = 0.00001
P(E|C) = 0.4      P(E|¬C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since each distribution has to add to one
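As an illustrative sketch (not part of the slides), these CPTs can be stored as plain Python dictionaries; the names parents, cpt, and p below are my own. Each table maps an assignment of a node's parents to P(node=true | parents), and the complementary probability is one minus that value.

# Illustrative sketch (assumed names): the example network's structure and CPTs
# as plain Python dicts. Each CPT maps an assignment of the node's parents to
# P(node=true | parents); P(node=false | parents) is one minus that value.
parents = {
    "A": (),
    "B": ("A",),
    "C": ("A",),
    "D": ("B", "C"),
    "E": ("C",),
}

cpt = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}

def p(node, value, assignment):
    """P(node=value | the parent values given in `assignment`)."""
    key = tuple(assignment[parent] for parent in parents[node])
    prob_true = cpt[node][key]
    return prob_true if value else 1.0 - prob_true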
Conditional independence and chaining

Conditional independence assumption:
P(x_i | π_i, q) = P(x_i | π_i)
where π_i is the set of parents of x_i and q is any set of variables
(nodes) other than x_i and its successors
π_i blocks the influence of other nodes on x_i and its successors
(q influences x_i only through the variables in π_i)

With this assumption, the complete joint probability distribution of all
variables in the network can be represented by (recovered from) local
CPDs by chaining these CPDs:
P(x_1, ..., x_n) = ∏_{i=1..n} P(x_i | π_i)
Chaining: Example
[Same network as before: a → b, a → c; b → d, c → d; c → e]
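For this network, chaining the local CPDs gives the full joint distribution:

P(a, b, c, d, e) = P(a) P(b|a) P(c|a) P(d|b,c) P(e|c)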
Topological semantics
A node is conditionally independent of its non-descendants given its parents
A node is conditionally independent of all other nodes in
the network given its parents, children, and children's
parents (also known as its Markov blanket)
The method called d-separation can be applied to decide
whether a set of nodes X is independent of another set Y,
given a third set Z
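As an illustration (not from the slides), the Markov blanket can be read directly off the graph structure. The helper below reuses the hypothetical parents dictionary from the earlier sketch:

def markov_blanket(node, parents):
    """Parents, children, and children's parents of `node`."""
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)          # the node itself is not in its own blanket
    return blanket

# For the example network, markov_blanket("C", parents) == {"A", "B", "D", "E"}:
# A is C's parent, D and E are its children, and B is D's other parent.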
Inference tasks
Simple queries: Compute the posterior marginal P(Xi | E=e)
E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
Conjunctive queries:
P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
Optimal decisions: Decision networks include utility
information; probabilistic inference is required to find
P(outcome | action, evidence)
Value of information: Which evidence should we seek next?
Sensitivity analysis: Which probability values are most
critical?
Explanation: Why do I need a new starter motor?
Approaches to inference
Exact inference
Enumeration
Belief propagation in polytrees
Variable elimination
Clustering / join tree algorithms
Approximate inference
Stochastic simulation / sampling methods
Markov chain Monte Carlo methods
Genetic algorithms
Neural networks
Simulated annealing
Mean field theory
Direct inference with BNs
Instead of computing the joint, suppose we just want the
probability for one variable
Exact methods of computation:
Enumeration
Variable elimination
Join trees: get the probabilities associated with every query
variable
Inference by enumeration
Add all of the terms (atomic event probabilities) from the
full joint distribution
If E are the evidence (observed) variables and Y are the
other (unobserved) variables, then:
P(X | e) = α P(X, e) = α Σ_y P(X, e, y)
where α is a normalizing constant
Each P(X, e, y) term can be computed using the chain rule
Computationally expensive!
Example: Enumeration
[Same network: a → b, a → c; b → d, c → d; c → e]
P(x_i) = Σ_{π_i} P(x_i | π_i) P(π_i)
Suppose we want P(D=true), and only the value of E is
given as true
P(d | e) = α Σ_a Σ_b Σ_c P(a, b, c, d, e)
         = α Σ_a Σ_b Σ_c P(a) P(b|a) P(c|a) P(d|b,c) P(e|c)
With simple iteration to compute this expression, there's
going to be a lot of repetition (e.g., P(e|c) has to be
recomputed every time we iterate over C=true)
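A minimal sketch (my own, not from the slides) of this brute-force enumeration, reusing the hypothetical parents/cpt/p helpers from the earlier sketch: it sums the chained joint over the hidden variables and normalizes at the end.

from itertools import product

def enumerate_query(query_var, query_val, evidence):
    """P(query_var=query_val | evidence) by brute-force enumeration."""
    order = ["A", "B", "C", "D", "E"]                 # topological order of the variables
    hidden = [v for v in order if v != query_var and v not in evidence]
    unnormalized = {}
    for qval in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(zip(hidden, values), **evidence)
            assignment[query_var] = qval
            term = 1.0
            for var in order:                         # chain rule: product of local CPDs
                term *= p(var, assignment[var], assignment)
            total += term
        unnormalized[qval] = total
    alpha = 1.0 / (unnormalized[True] + unnormalized[False])
    return alpha * unnormalized[query_val]

# For example, P(D=true | E=true):
# print(enumerate_query("D", True, {"E": True}))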
Exercise: Enumeration
Network: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart) = .8    p(study) = .6    p(fair) = .9

p(prep | smart, study):
                 smart    ¬smart
    study         .9       .7
    ¬study        .5       .1

p(pass | smart, prep, fair):
                 smart             ¬smart
                 prep    ¬prep     prep    ¬prep
    fair          .9      .7        .7      .2
    ¬fair         .1      .1        .1      .1

Query: What is the probability that a student studied, given that they pass the exam?
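A self-contained sketch (my own formulation, not from the slides) that encodes this exercise network and answers the query by enumeration, following the same pattern as the earlier sketch; all names below are illustrative:

from itertools import product

# p(node=true) for the root nodes; false is one minus the listed value
prior = {"smart": 0.8, "study": 0.6, "fair": 0.9}

# p(prep=true | smart, study), keyed by (smart, study)
p_prep = {(True, True): 0.9, (True, False): 0.5,
          (False, True): 0.7, (False, False): 0.1}

# p(pass=true | smart, prep, fair), keyed by (smart, prep, fair)
p_pass = {(True, True, True): 0.9, (True, False, True): 0.7,
          (False, True, True): 0.7, (False, False, True): 0.2,
          (True, True, False): 0.1, (True, False, False): 0.1,
          (False, True, False): 0.1, (False, False, False): 0.1}

def joint(smart, study, prep, fair, passed):
    """Full joint probability via the chain rule over this network."""
    pr = prior["smart"] if smart else 1 - prior["smart"]
    pr *= prior["study"] if study else 1 - prior["study"]
    pr *= prior["fair"] if fair else 1 - prior["fair"]
    pr *= p_prep[(smart, study)] if prep else 1 - p_prep[(smart, study)]
    pr *= p_pass[(smart, prep, fair)] if passed else 1 - p_pass[(smart, prep, fair)]
    return pr

# P(study | pass) = P(study, pass) / P(pass), both obtained by summing the joint
num = sum(joint(sm, True, pp, fa, True) for sm, pp, fa in product((True, False), repeat=3))
den = sum(joint(sm, st, pp, fa, True) for sm, st, pp, fa in product((True, False), repeat=4))
print("P(study | pass) =", num / den)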
Summary
Bayes nets
Structure
Parameters
Conditional independence
Chaining
BN inference
Enumeration
Variable elimination
Sampling methods