Bayesian Networks
Introduction
Suppose you are trying to determine
if a patient has pneumonia. You
observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty
breathing
Introduction
You would like to determine how likely it is that the patient has pneumonia, given that the patient has a cough, a fever, and difficulty breathing.
Introduction
Now suppose you order a chest x-ray and the results are positive. Your belief that the patient has pneumonia is now much higher.
Introduction
• In the previous slides, what you observed
affected your belief that the patient has
pneumonia
• This is called reasoning with uncertainty
• Wouldn't it be nice if we had some methodology for reasoning with uncertainty? Why, in fact, we do...
Bayesian Networks
• Bayesian networks help us reason with uncertainty
• In the opinion of many AI researchers, Bayesian
networks are the most significant contribution in
AI in the last 10 years
• They are used in many applications, e.g.:
– Spam filtering / Text mining
– Speech recognition
– Robotics
– Diagnostic systems
– Syndromic surveillance
Bayesian Networks (An Example)
[Figure: the running example network, in which A is the parent of B, and B is the parent of both C and D; each node has a conditional probability table, e.g. P(A = true) = 0.4, P(B = true | A = true) = 0.3, P(C = true | B = true) = 0.1, P(D = true | B = true) = 0.95]
Probability Primer: Random Variables
• A random variable is the basic element of
probability
• It refers to an event, and there is some degree of uncertainty as to the outcome of that event
• For example, the random variable A could
be the event of getting a heads on a coin flip
Boolean Random Variables
• We deal with the simplest type of random
variables – Boolean ones
• Take the values true or false
• Think of the event as occurring or not occurring
• Examples (Let A be a Boolean random variable):
A = Getting heads on a coin flip
A = It will rain today
A = There is a typo in these slides
Probabilities
We will write P(A = true) to mean the probability that A = true.
What is probability? It is the relative frequency with which an
outcome would be obtained if the process were repeated a large
number of times under similar conditions*
* Ahem…there's also the Bayesian definition, which says probability is your degree of belief in an outcome
Conditional Probability
• P(A = true | B = true) = Out of all the outcomes in which
B is true, how many also have A equal to true
• Read this as: “Probability of A conditioned on B” or
“Probability of A given B”
H = "Have a headache"
F = "Coming down with Flu"
P(H = true) = 1/10
P(F = true) = 1/40
P(H = true | F = true) = 1/2
The Joint Probability Distribution
• We will write P(A = true, B = true) to mean
“the probability of A = true and B = true”
• Notice that:
P(H = true | F = true) = Area of "H and F" region / Area of "F" region
                       = P(H = true, F = true) / P(F = true)
In general, P(X | Y) = P(X, Y) / P(Y)
The Joint Probability Distribution
• Joint probabilities can be between any number of variables, e.g. P(A = true, B = true, C = true)
• For each combination of variables, we need to say how probable that combination is
• The probabilities of these combinations need to sum to 1

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
(Sums to 1)
The Joint Probability Distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C, as sketched below
• Note: you may need to use marginalization and Bayes rule (both of which are not discussed in these slides)
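The following is a minimal Python sketch (not from the original slides) of how any probability over A, B and C can be read off the joint table above by marginalization; the names joint and prob are invented for illustration.

# Joint distribution P(A, B, C) from the table above, keyed by (A, B, C).
joint = {
    (False, False, False): 0.10,
    (False, False, True):  0.20,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.30,
    (True,  False, True):  0.10,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

def prob(**fixed):
    """Probability of the partial assignment in `fixed`, obtained by
    summing the joint table over the variables that are not mentioned."""
    names = ("A", "B", "C")
    return sum(p for values, p in joint.items()
               if all(values[names.index(n)] == v for n, v in fixed.items()))

print(prob(A=True))                          # P(A = true) = 0.6
print(prob(A=True, B=True) / prob(B=True))   # P(A = true | B = true) = 0.2 / 0.3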
Independence
Variables A and B are independent if any of
the following hold:
• P(A,B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)
Conditional Independence
Variables A and B are conditionally independent
given C if any of the following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)
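Continuing the sketch above, these definitions can be tested numerically against the joint table; the helper cond_independent is made up for illustration, and for that particular table A and B turn out not to be conditionally independent given C.

from itertools import product

def cond_independent(x, y, z):
    """Check P(x, y | z) == P(x | z) * P(y | z) for every combination of truth values."""
    for vx, vy, vz in product((False, True), repeat=3):
        pz = prob(**{z: vz})
        lhs = prob(**{x: vx, y: vy, z: vz}) / pz
        rhs = (prob(**{x: vx, z: vz}) / pz) * (prob(**{y: vy, z: vz}) / pz)
        if abs(lhs - rhs) > 1e-9:
            return False
    return True

print(cond_independent("A", "B", "C"))   # False for the joint table above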
A Bayesian Network
A Bayesian network is made up of:
1. A Directed Acyclic Graph (DAG) whose nodes are the variables
2. A set of conditional probability tables (CPTs), one per node, giving the probability of that node given its parents
[Figure: the example DAG, in which A is the parent of B, and B is the parent of both C and D]
Conditional Independence
The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2)
[Figure: node X with parents P1 and P2, non-descendants ND1 and ND2, and children C1 and C2]
The Joint Probability Distribution
Due to the Markov condition, we can
compute the joint probability distribution over
all the variables X1, …, Xn in the Bayesian net
using the formula:
P(X1 = x1, ..., Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))
Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) * P(C = true | B = true) * P(D = true | B = true)
= (0.4) * (0.3) * (0.1) * (0.95)
= 0.0114
Using a Bayesian Network Example
In the calculation above, the factorization P(A) * P(B | A) * P(C | B) * P(D | B) comes from the graph structure, while the numbers 0.4, 0.3, 0.1 and 0.95 come from the conditional probability tables of the corresponding nodes.
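Below is a minimal Python sketch (not from the original slides) of the example network and this chain-rule calculation. Only the four CPT entries shown above (0.4, 0.3, 0.1, 0.95) come from the example; the entries for the false parent values are made-up placeholders, and the names p_A, bernoulli and joint are invented for illustration.

p_A = 0.4                                  # P(A = true), from the example
p_B_given_A = {True: 0.3, False: 0.5}      # P(B = true | A); the A = false entry is assumed
p_C_given_B = {True: 0.1, False: 0.7}      # P(C = true | B); the B = false entry is assumed
p_D_given_B = {True: 0.95, False: 0.2}     # P(D = true | B); the B = false entry is assumed

def bernoulli(p_true, value):
    """P(variable = value) for a Boolean variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) using the factorization given by the graph:
    P(A) * P(B | A) * P(C | B) * P(D | B)."""
    return (bernoulli(p_A, a)
            * bernoulli(p_B_given_A[a], b)
            * bernoulli(p_C_given_B[b], c)
            * bernoulli(p_D_given_B[b], d))

print(joint(True, True, True, True))   # 0.4 * 0.3 * 0.1 * 0.95 = 0.0114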
Inference
• Using a Bayesian network to compute
probabilities is called inference
• In general, inference involves queries of the form:
P( X | E )
E = The evidence variable(s)
X = The query variable(s)
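The slides do not name a particular inference algorithm; the simplest option is inference by enumeration, which sums the joint distribution over every unobserved variable and then normalizes. A sketch, reusing the hypothetical joint() function defined above:

from itertools import product

def infer(query_var, evidence):
    """P(query_var = true | evidence), computed by enumerating the full joint.
    `evidence` maps variable names ("A", "B", "C", "D") to observed booleans."""
    names = ("A", "B", "C", "D")
    totals = {True: 0.0, False: 0.0}
    for values in product((False, True), repeat=4):
        assignment = dict(zip(names, values))
        if any(assignment[var] != val for var, val in evidence.items()):
            continue                      # inconsistent with the evidence
        totals[assignment[query_var]] += joint(*values)
    return totals[True] / (totals[True] + totals[False])

# e.g. P(B = true | C = true, D = true) in the example network
print(infer("B", {"C": True, "D": True}))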
Inference
[Figure: the pneumonia example as a Bayesian network, with HasPneumonia as the query variable and the observed symptoms as evidence]
How is the Bayesian network created?
1. Get an expert to design it
– Expert must determine the structure of the Bayesian network
• This is best done by modeling direct causes of a variable as
its parents
– Expert must determine the values of the CPT entries
• These values could come from the expert’s informed opinion
• Or an external source, e.g. census information
• Or they are estimated from data
• Or a combination of the above
Learning Bayesian Networks from Data
• Each possible structure contains information about the conditional independence relationships between A, B, C and D
• We would like a structure that contains conditional independence relationships that are supported by the data
• Note that we also need to learn the values in the CPTs from data
[Figure: several candidate network structures over A, B, C and D ("... or ... or ... or ?")]
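The slides do not give a concrete learning procedure. As a minimal sketch, assuming the structure is fixed and the data consist of complete records (one dict of variable -> Boolean per record; the format and names are made up), each CPT entry can be estimated by simple counting (maximum likelihood):

def estimate_cpt_entry(records, child, parent_values):
    """Maximum-likelihood estimate of P(child = true | parents = parent_values),
    i.e. the fraction of matching records in which the child is true."""
    matching = [r for r in records
                if all(r[p] == v for p, v in parent_values.items())]
    if not matching:
        return None                      # no data for this parent configuration
    return sum(r[child] for r in matching) / len(matching)

# e.g. estimate P(B = true | A = true) from four made-up records
data = [{"A": True, "B": True}, {"A": True, "B": False},
        {"A": False, "B": False}, {"A": True, "B": True}]
print(estimate_cpt_entry(data, "B", {"A": True}))   # 2/3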
Learning Bayesian Networks from Data
How does Bayesian statistics help?
1. I might have a prior belief about what the structure should look like.
2. I might have a prior belief about what the values in the CPTs should be.
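One common concrete way to encode a prior belief about a CPT entry is a Beta prior, which amounts to adding pseudo-counts before estimating. The sketch below assumes that form and reuses the made-up record format from the previous sketch.

def estimate_with_prior(records, child, parent_values, alpha=1.0, beta=1.0):
    """Posterior-mean estimate of P(child = true | parents = parent_values)
    under a Beta(alpha, beta) prior: pseudo-counts are added to the observed counts."""
    matching = [r for r in records
                if all(r[p] == v for p, v in parent_values.items())]
    trues = sum(r[child] for r in matching)
    return (trues + alpha) / (len(matching) + alpha + beta)

print(estimate_with_prior(data, "B", {"A": True}))   # (2 + 1) / (3 + 2) = 0.6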
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic
surveillance
Bayesian Networks in Syndromic
Surveillance
From: Goldenberg, A., Shmueli, G., Caruana,
R. A., and Fienberg, S. E. (2002). Early
statistical detection of anthrax outbreaks by
tracking over-the-counter medication sales.
Proceedings of the National Academy of
Sciences (pp. 5237-5249)
Population-wide ANomaly Detection
and Assessment (PANDA)
• A detector specifically for a large-scale
outdoor release of inhalational anthrax
• Uses a massive causal Bayesian network
• Population-wide approach: each person in
the population is represented as a
subnetwork in the overall model
Population-Wide Approach
[Figure: a global Anthrax Release node feeds a separate person-model subnetwork for each person in the population; each person model contains nodes such as Age Decile, Gender, ED Admit from Anthrax, ED Admit from Other, Respiratory CC When Admitted, and ED Admission]
Person Model (Initial Prototype)
[Figure: the person-model subnetwork instantiated with example values under the Anthrax Release node, e.g. Age Decile = 20-30 or 50-60, Gender = Female or Male, ED Admit from Anthrax = False, Respiratory CC When Admitted = Unknown, together with the ED Admit from Other and ED Admission nodes]