BayesianNetworks Reduced

Uploaded by

Suparna Sinha
Copyright
© © All Rights Reserved

CS 63

Bayesian Networks

Chapter 14.1-14.2; 14.4


Adapted from slides by Tim Finin and Marie desJardins.
Some material borrowed from Lise Getoor.
1
Outline
• Bayesian networks
– Network structure
– Conditional probability tables
– Conditional independence
• Inference in Bayesian networks
– Exact inference
– Approximate inference

2
Bayesian Belief Networks (BNs)

• Definition: BN = (DAG, CPD)


– DAG: directed acyclic graph (BN’s structure)
• Nodes: random variables (typically binary or discrete, but
methods also exist to handle continuous variables)
• Arcs: indicate probabilistic dependencies between nodes
(lack of link signifies conditional independence)
– CPD: conditional probability distribution (BN’s parameters)
• Conditional probabilities at each node, usually stored as a table
(conditional probability table, or CPT)
$P(x_i \mid \pi_i)$, where $\pi_i$ is the set of all parent nodes of $x_i$

– Root nodes are a special case: no parents, so just use priors in CPD:
$\pi_i = \emptyset$, so $P(x_i \mid \pi_i) = P(x_i)$

3
Example BN
Structure: a → b, a → c, b → d, c → d, c → e

P(A) = 0.001
P(B|A) = 0.3        P(B|¬A) = 0.001
P(C|A) = 0.2        P(C|¬A) = 0.005
P(D|B,C) = 0.1      P(D|B,¬C) = 0.01
P(D|¬B,C) = 0.01    P(D|¬B,¬C) = 0.00001
P(E|C) = 0.4        P(E|¬C) = 0.002

Note that we only specify P(A) etc., not P(¬A), since they have to add to one
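
One way to hold this network in code is a parent map plus one table per node, storing only the probability that the node is true, in line with the note above. The following is a minimal Python sketch, not part of the original slides: the names `PARENTS`, `CPT`, and `prob` are illustrative, and the rows for negated parents (for example 0.001 read as P(B|¬A)) are an assumed reading of the figure.

```python
# Parents of each node in the example network (a -> b, a -> c, b -> d, c -> d, c -> e).
PARENTS = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}

# Each table maps a tuple of parent values to P(node = True | parents).
# Assumption: the second entry for each repeated label is the negated-parent case.
CPT = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}


def prob(node, value, parent_values):
    """Return P(node = value | parents), reading parent values from a dict."""
    key = tuple(parent_values[p] for p in PARENTS[node])
    p_true = CPT[node][key]
    # P(not X | parents) is never stored; it is the complement of the stored entry.
    return p_true if value else 1.0 - p_true


print(prob("A", False, {}))                       # complement of P(A)
print(prob("D", True, {"B": True, "C": False}))   # entry for B true, C false
```

Root nodes need no special handling here: their parent tuple is empty, so the single table entry is the prior.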

4
Conditional independence and chaining
• Conditional independence assumption
– $P(x_i \mid \pi_i, q) = P(x_i \mid \pi_i)$,
where q is any set of variables (nodes) other than $x_i$ and its successors
– $\pi_i$ blocks influence of other nodes on $x_i$ and its successors
(q influences $x_i$ only through variables in $\pi_i$)
– With this assumption, the complete joint probability distribution of all
variables in the network can be represented by (recovered from) local
CPDs by chaining these CPDs:
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_i)$

5
Chaining: Example
Network: a → b, a → c, b → d, c → d, c → e

Computing the joint probability for all variables is easy:


P(a, b, c, d, e)
= P(e | a, b, c, d) P(a, b, c, d) by the product rule
= P(e | c) P(a, b, c, d) by cond. indep. assumption
= P(e | c) P(d | a, b, c) P(a, b, c)
= P(e | c) P(d | b, c) P(c | a, b) P(a, b)
= P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
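
The last line of the derivation can be evaluated directly: one local factor per node, multiplied together. The sketch below is illustrative Python, not from the original slides; it assumes the CPT values of the example BN, with the negated-parent rows being an assumed reading of the figure, and the helper names `local_prob` and `joint` are hypothetical.

```python
from itertools import product

PARENTS = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}
# P(node = True | parents); negated-parent rows are an assumed reading of the figure.
CPT = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}
ORDER = ["A", "B", "C", "D", "E"]


def local_prob(node, assignment):
    """P(node takes its assigned value | its parents' assigned values)."""
    p_true = CPT[node][tuple(assignment[p] for p in PARENTS[node])]
    return p_true if assignment[node] else 1.0 - p_true


def joint(assignment):
    """Chain the local CPDs: product over all nodes of P(x_i | parents of x_i)."""
    result = 1.0
    for node in ORDER:
        result *= local_prob(node, assignment)
    return result


# P(a, b, c, d, e) with every variable true:
# P(a) P(b|a) P(c|a) P(d|b,c) P(e|c) = 0.001 * 0.3 * 0.2 * 0.1 * 0.4
all_true = dict.fromkeys(ORDER, True)
print(joint(all_true))

# Sanity check: the 32 joint entries should sum to 1 (up to rounding).
total = sum(joint(dict(zip(ORDER, values)))
            for values in product([True, False], repeat=len(ORDER)))
print(total)
```

Five tables with 11 stored numbers in total are enough to recover all 32 entries of the full joint distribution.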

6
Topological semantics
• A node is conditionally independent of its non-
descendants given its parents
• A node is conditionally independent of all other nodes in
the network given its parents, children, and children’s
parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide
whether a set of nodes X is independent of another set Y,
given a third set Z
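
The Markov blanket can be read straight off the graph structure. The following Python sketch is an illustration only: it uses the parent sets of the five-node example network from the earlier slides, and the function name `markov_blanket` is hypothetical.

```python
# Parent sets of the example network (a -> b, a -> c, b -> d, c -> d, c -> e).
PARENTS = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}


def markov_blanket(node, parents):
    """Return the node's parents, children, and children's other parents."""
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])   # co-parents of each child
    blanket.discard(node)                # the node is not in its own blanket
    return blanket


print(sorted(markov_blanket("C", PARENTS)))  # ['A', 'B', 'D', 'E']
print(sorted(markov_blanket("B", PARENTS)))  # ['A', 'C', 'D']
```

For node b, the blanket contains c even though there is no arc between them, because b and c are both parents of d.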

7
Inference tasks
• Simple queries: Compute posterior marginal P(Xi | E=e)
– E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
• Conjunctive queries:
– P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: Decision networks include utility
information; probabilistic inference is required to find
P(outcome | action, evidence)
• Value of information: Which evidence should we seek next?
• Sensitivity analysis: Which probability values are most
critical?
• Explanation: Why do I need a new starter motor?
8
Approaches to inference
• Exact inference
– Enumeration
– Belief propagation in polytrees
– Variable elimination
– Clustering / join tree algorithms
• Approximate inference
– Stochastic simulation / sampling methods
– Markov chain Monte Carlo methods
– Genetic algorithms
– Neural networks
– Simulated annealing
– Mean field theory

9
Direct inference with BNs
• Instead of computing the joint, suppose we just want the
probability for one variable
• Exact methods of computation:
– Enumeration
– Variable elimination
• Join trees: get the probabilities associated with every query
variable

10
Inference by enumeration
• Add all of the terms (atomic event probabilities) from the
full joint distribution
• If E are the evidence (observed) variables, with observed values e,
and Y are the other (unobserved) variables, then:
$P(X \mid e) = \alpha P(X, e) = \alpha \sum_{y} P(X, e, y)$
where α = 1/P(e) is the normalizing constant
• Each P(X, e, y) term can be computed using the chain rule
• Computationally expensive!
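
The recipe above can be sketched generically: fix the evidence, loop over every combination of the unobserved variables, add up the joint entries, and normalize. The Python below is a minimal illustration rather than the slides' own code. It assumes the five-node example network from the earlier slides (negated-parent CPT rows are an assumed reading of the figure), and `enumerate_query` is a hypothetical name.

```python
from itertools import product

PARENTS = {"A": (), "B": ("A",), "C": ("A",), "D": ("B", "C"), "E": ("C",)}
# P(node = True | parents); negated-parent rows are an assumed reading of the figure.
CPT = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}


def joint(assignment):
    """Full joint entry for a complete assignment, via the chain rule."""
    result = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[p] for p in parents)]
        result *= p_true if assignment[node] else 1.0 - p_true
    return result


def enumerate_query(query, evidence):
    """Return {True: p, False: 1 - p} for P(query | evidence)."""
    hidden = [n for n in PARENTS if n != query and n not in evidence]
    unnormalized = {}
    for q_value in (True, False):
        total = 0.0
        # Sum the joint over every assignment y of the unobserved variables.
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query] = q_value
            assignment.update(zip(hidden, values))
            total += joint(assignment)
        unnormalized[q_value] = total
    alpha = 1.0 / sum(unnormalized.values())   # alpha = 1 / P(e)
    return {value: alpha * p for value, p in unnormalized.items()}


posterior = enumerate_query("D", {"E": True})
print(posterior[True], posterior[False])
```

The inner loop visits 2^k assignments for k unobserved Boolean variables, which is where the cost noted in the last bullet comes from.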

11
Example: Enumeration
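Network: a → b, a → c, b → d, c → d, c → e
• $P(x_i) = \sum_{\pi_i} P(x_i \mid \pi_i) P(\pi_i)$
• Suppose we want the probability that D is true, and the only
evidence is that E is true
• $P(d \mid e) = \alpha \sum_{a,b,c} P(a, b, c, d, e)$
$= \alpha \sum_{a,b,c} P(a) P(b \mid a) P(c \mid a) P(d \mid b, c) P(e \mid c)$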
• With simple iteration to compute this expression, there’s
going to be a lot of repetition (e.g., P(e|c) has to be
recomputed every time we iterate over C=true)
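
One standard way to cut down that repetition is to move each factor outside the sums that do not involve it, which is the idea variable elimination builds on. The Python sketch below compares the flat sum with the rearranged one. It is an illustration under assumptions: the CPT values are those of the example BN (negated-parent rows are an assumed reading of the figure), and the function names are hypothetical.

```python
BOOL = (True, False)

# P(node = True | parents) for the example network.
P_A = 0.001
P_B = {True: 0.3, False: 0.001}    # keyed by a
P_C = {True: 0.2, False: 0.005}    # keyed by a
P_D = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}   # keyed by (b, c)
P_E = {True: 0.4, False: 0.002}    # keyed by c


def pr(p_true, value):
    """Probability of `value` given the stored probability of True."""
    return p_true if value else 1.0 - p_true


def flat_sum(d, e):
    """Sum over a, b, c of the full five-factor product."""
    return sum(
        pr(P_A, a) * pr(P_B[a], b) * pr(P_C[a], c)
        * pr(P_D[(b, c)], d) * pr(P_E[c], e)
        for a in BOOL for b in BOOL for c in BOOL
    )


def nested_sum(d, e):
    """Same value, with P(a) and P(b|a) pulled out of the inner sums."""
    total = 0.0
    for a in BOOL:
        sum_b = 0.0
        for b in BOOL:
            sum_c = sum(pr(P_C[a], c) * pr(P_D[(b, c)], d) * pr(P_E[c], e)
                        for c in BOOL)
            sum_b += pr(P_B[a], b) * sum_c
        total += pr(P_A, a) * sum_b
    return total


def posterior_d_given_e(unnormalized):
    """Normalize the two unnormalized values for D, with E observed true."""
    weights = {d: unnormalized(d, True) for d in BOOL}
    alpha = 1.0 / sum(weights.values())
    return {d: alpha * w for d, w in weights.items()}


print(posterior_d_given_e(flat_sum))
print(posterior_d_given_e(nested_sum))
```

Both versions return the same posterior; the nested form simply performs fewer multiplications, and the saving grows with the size of the network.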
12
Exercise: Enumeration
Network: smart → prepared, study → prepared; smart → pass, prepared → pass, fair → pass

p(smart)=.8   p(study)=.6   p(fair)=.9

p(prep|…)    smart   ¬smart
study         .9      .7
¬study        .5      .1

p(pass|…)    smart            ¬smart
             prep    ¬prep    prep    ¬prep
fair          .9      .7       .7      .2
¬fair         .1      .1       .1      .1

Query: What is the probability that a student studied, given that they pass the exam?
13
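
A hand-computed answer to the exercise can be checked by enumeration. The Python sketch below is illustrative only and rests on an assumed reading of the slide's tables: smart and study are the parents of prepared; smart, prepared, and fair are the parents of pass; and the second row or column of each repeated label is the negated case. All names are hypothetical.

```python
from itertools import product

BOOL = (True, False)

P_SMART, P_STUDY, P_FAIR = 0.8, 0.6, 0.9

# P(prepared = True | smart, study), keyed by (smart, study).
P_PREP = {(True, True): 0.9, (False, True): 0.7,
          (True, False): 0.5, (False, False): 0.1}

# P(pass = True | smart, prepared, fair), keyed by (smart, prepared, fair).
P_PASS = {
    (True, True, True): 0.9, (True, False, True): 0.7,
    (False, True, True): 0.7, (False, False, True): 0.2,
    (True, True, False): 0.1, (True, False, False): 0.1,
    (False, True, False): 0.1, (False, False, False): 0.1,
}


def pr(p_true, value):
    """Probability of `value` given the stored probability of True."""
    return p_true if value else 1.0 - p_true


def joint(smart, study, fair, prep, passed):
    """Full joint entry, as the product of the five local CPD entries."""
    return (pr(P_SMART, smart) * pr(P_STUDY, study) * pr(P_FAIR, fair)
            * pr(P_PREP[(smart, study)], prep)
            * pr(P_PASS[(smart, prep, fair)], passed))


def p_study_given_pass():
    """P(study | pass): sum out smart, fair, prepared, then normalize."""
    weights = {}
    for study in BOOL:
        weights[study] = sum(
            joint(smart, study, fair, prep, True)
            for smart, fair, prep in product(BOOL, repeat=3)
        )
    return weights[True] / (weights[True] + weights[False])


print(p_study_given_pass())
```

Under these assumptions the posterior comes out a little above the prior p(study)=.6, since passing is modest evidence of having studied.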
Summary
• Bayes nets
– Structure
– Parameters
– Conditional independence
– Chaining
• BN inference
– Enumeration
– Variable elimination
– Sampling methods

14
