Bayesian Networks

1
Introduction
Suppose you are trying to determine
if a patient has pneumonia. You
observe the following symptoms:
• The patient has a cough
• The patient has a fever
• The patient has difficulty
breathing

2
Introduction
You would like to determine how
likely it is that the patient has pneumonia
given that the patient has a cough, a
fever, and difficulty breathing

We are not 100% certain that the
patient has pneumonia because of
these symptoms. We are dealing
with uncertainty!

3
Introduction
Now suppose you order a chest x-ray
and the results are positive.
Your belief that the patient has
pneumonia is now much higher.

4
Introduction
• In the previous slides, what you observed
affected your belief that the patient has
pneumonia
• This is called reasoning with uncertainty
• Wouldn’t it be nice if we had some
methodology for reasoning with
uncertainty? Why in fact, we do...

5
Bayesian Networks
• Bayesian networks help us reason with uncertainty
• In the opinion of many AI researchers, Bayesian
networks are the most significant contribution in
AI in the last 10 years
• They are used in many applications, e.g.:
– Spam filtering / Text mining
– Speech recognition
– Robotics
– Diagnostic systems
– Syndromic surveillance
6
Bayesian Networks (An Example)

From: Aronsky, D. and Haug, P.J., Diagnosing community-acquired pneumonia
with a Bayesian network, In: Proceedings of the Fall Symposium of the
American Medical Informatics Association, (1998) 632-636.
7
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic
surveillance

8
Probability Primer: Random Variables
• A random variable is the basic element of
probability
• It refers to an event whose outcome is
uncertain to some degree
• For example, the random variable A could
be the event of getting heads on a coin flip

9
Boolean Random Variables
• We deal with the simplest type of random
variables – Boolean ones
• Take the values true or false
• Think of the event as occurring or not occurring
• Examples (Let A be a Boolean random variable):
A = Getting heads on a coin flip
A = It will rain today
A = There is a typo in these slides

10
Probabilities
We will write P(A = true) to mean the probability that A = true.
What is probability? It is the relative frequency with which an
outcome would be obtained if the process were repeated a large
number of times under similar conditions*

[Figure: a region split into a red area and a blue area, labeled P(A = true) and P(A = false). The sum of the red and blue areas is 1.]

* Ahem… there’s also the Bayesian
definition, which says probability is
your degree of belief in an outcome

11
Conditional Probability
• P(A = true | B = true) = Out of all the outcomes in which
B is true, the fraction that also have A equal to true
• Read this as: “Probability of A conditioned on B” or
“Probability of A given B”

H = “Have a headache”
F = “Coming down with Flu”
P(H = true) = 1/10
P(F = true) = 1/40
P(H = true | F = true) = 1/2

[Figure: two overlapping regions labeled P(F = true) and P(H = true)]

“Headaches are rare and flu is rarer, but if
you’re coming down with flu there’s a 50-50
chance you’ll have a headache.”

12
The Joint Probability Distribution
• We will write P(A = true, B = true) to mean
“the probability of A = true and B = true”
• Notice that:
P(H = true | F = true)
  = (Area of "H and F" region) / (Area of "F" region)
  = P(H = true, F = true) / P(F = true)

[Figure: two overlapping regions labeled P(F = true) and P(H = true)]

In general, P(X|Y) = P(X,Y) / P(Y)
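
As a quick check of the definition, the headache/flu numbers can be plugged in directly. This is only a small sketch; the variable names are ours, not from the slides. It uses exact fractions so that no rounding creeps in.

```python
from fractions import Fraction

# Numbers from the headache/flu example.
p_h = Fraction(1, 10)          # P(H = true)
p_f = Fraction(1, 40)          # P(F = true)
p_h_given_f = Fraction(1, 2)   # P(H = true | F = true)

# Rearranging P(H | F) = P(H, F) / P(F) gives the joint probability.
p_h_and_f = p_h_given_f * p_f

# Applying the same definition the other way round gives P(F | H).
p_f_given_h = p_h_and_f / p_h

print(p_h_and_f)    # 1/80
print(p_f_given_h)  # 1/8
```

So only 1 in 80 people has both a headache and flu, and a headache on its own raises the chance of flu from 1/40 to 1/8.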
13
The Joint Probability Distribution
• Joint probabilities can be between
any number of variables
eg. P(A = true, B = true, C = true)
• For each combination of variables,
we need to say how probable that
combination is
• The probabilities of these
combinations need to sum to 1

A      B      C      P(A,B,C)
false  false  false  0.1
false  false  true   0.2
false  true   false  0.05
false  true   true   0.05
true   false  false  0.3
true   false  true   0.1
true   true   false  0.05
true   true   true   0.15
(Sums to 1)

14
The Joint Probability Distribution
• Once you have the joint probability
distribution, you can calculate any
probability involving A, B, and C
• Note: You may need to use
marginalization and Bayes' rule
(neither of which is discussed in
these slides)

Examples of things you can compute from the joint table for A, B, and C:

• P(A = true) = sum of P(A,B,C) in rows with A = true
• P(A = true, B = true | C = true) =
P(A = true, B = true, C = true) / P(C = true)
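
The two example computations can be sketched in a few lines of Python. The dictionary layout and the helper name `prob` are our own choices for illustration; the numbers are the ones in the joint table for A, B, and C.

```python
# Joint distribution P(A, B, C) from the table, keyed by (A, B, C).
joint = {
    (False, False, False): 0.10,
    (False, False, True):  0.20,
    (False, True,  False): 0.05,
    (False, True,  True):  0.05,
    (True,  False, False): 0.30,
    (True,  False, True):  0.10,
    (True,  True,  False): 0.05,
    (True,  True,  True):  0.15,
}

def prob(condition):
    """Sum the joint over all rows where condition(a, b, c) holds."""
    return sum(p for (a, b, c), p in joint.items() if condition(a, b, c))

# Marginalization: add up every row with A = true.
p_a = prob(lambda a, b, c: a)

# Conditioning: P(A = true, B = true | C = true).
p_c = prob(lambda a, b, c: c)
p_ab_given_c = prob(lambda a, b, c: a and b and c) / p_c

print(round(p_a, 4))           # 0.6
print(round(p_ab_given_c, 4))  # 0.3
```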
15
The Problem with the Joint
Distribution
• Lots of entries in the
table to fill up!
• For k Boolean random
variables, you need a
table of size 2^k
• How do we use fewer
numbers? Need the
concept of
independence

16
Independence
Variables A and B are independent if any of
the following hold:
• P(A,B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B)

This says that knowing the outcome of
A does not tell me anything new about
the outcome of B.
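
To see the definition in action, here is a small sketch that tests the first condition, P(A,B) = P(A) P(B), on the joint table for A, B, and C used earlier in these slides. The code layout is our own; only the eight probabilities come from the slides.

```python
from itertools import product

# P(A, B, C) in the same row order as the joint table:
# (false, false, false), (false, false, true), ..., (true, true, true).
values = [0.10, 0.20, 0.05, 0.05, 0.30, 0.10, 0.05, 0.15]
joint = dict(zip(product([False, True], repeat=3), values))

p_a = sum(p for (a, b, c), p in joint.items() if a)
p_b = sum(p for (a, b, c), p in joint.items() if b)
p_ab = sum(p for (a, b, c), p in joint.items() if a and b)

# Independence requires P(A, B) = P(A) * P(B).
independent = abs(p_ab - p_a * p_b) < 1e-9

print(round(p_ab, 4), round(p_a * p_b, 4), independent)  # 0.2 0.18 False
```

In that table A and B are therefore not independent: knowing A shifts the probability of B slightly.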
17
Independence
How is independence useful?
• Suppose you have n coin flips and you want to
calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n
values in the table
• If the coin flips are independent, then

P(C_1, \ldots, C_n) = \prod_{i=1}^{n} P(C_i)

Each P(Ci) table has 2 entries and there are n of them, for a
total of 2n values
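
The saving can be illustrated with a short sketch. The three head probabilities below are made up for the example; the point is only that the full 2^n-entry joint table can be regenerated from n small tables when the flips are independent.

```python
from itertools import product
from math import prod

# Hypothetical P(Ci = heads) for n = 3 independent coins (illustrative values).
p_heads = [0.5, 0.6, 0.9]

def joint(outcome):
    """P(C1, ..., Cn) for a tuple of booleans (True = heads), assuming independence."""
    return prod(p if heads else 1 - p for p, heads in zip(p_heads, outcome))

n = len(p_heads)
table = {o: joint(o) for o in product([False, True], repeat=n)}

print(len(table))                      # 8 entries in the full joint, i.e. 2^n
print(2 * n)                           # 6 numbers in the n two-entry tables
print(round(sum(table.values()), 10))  # 1.0
```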

18
Conditional Independence
Variables A and B are conditionally independent
given C if any of the following hold:
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C)
• P(B | A, C) = P(B | C)

Knowing C tells me everything about B. I don’t gain
anything by knowing A (either because A doesn’t
influence B or because knowing C provides all the
information knowing A would give)
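
A small numerical sketch may help here. All numbers below are hypothetical and chosen only for illustration: we build a joint distribution in which A and B each depend on C but not on each other, then check the second condition, P(A | B, C) = P(A | C).

```python
from itertools import product

# Hypothetical numbers, chosen only for illustration.
p_c = {True: 0.3, False: 0.7}          # P(C)
p_a_given_c = {True: 0.9, False: 0.2}  # P(A = true | C)
p_b_given_c = {True: 0.8, False: 0.1}  # P(B = true | C)

def bern(p_true, value):
    """Probability of a Boolean value given the probability of true."""
    return p_true if value else 1 - p_true

# Joint built so that A and B are conditionally independent given C.
joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([False, True], repeat=3)
}

def prob(pred):
    return sum(p for abc, p in joint.items() if pred(*abc))

# P(A = true | B = true, C = true) versus P(A = true | C = true).
lhs = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
rhs = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)

# Without conditioning on C, A and B are still dependent.
p_a, p_b = prob(lambda a, b, c: a), prob(lambda a, b, c: b)
p_ab = prob(lambda a, b, c: a and b)

print(round(lhs, 4), round(rhs, 4))          # 0.9 0.9
print(round(p_ab, 4), round(p_a * p_b, 4))   # 0.23 0.1271
```

Conditional independence given C does not imply plain independence: here A and B are correlated through C.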
19
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic
surveillance

20
A Bayesian Network
A Bayesian network is made up of:
1. A Directed Acyclic Graph

[Figure: graph with nodes A, B, C, and D and arrows A → B, B → C, B → D]

2. A set of tables for each node in the graph

A      P(A)
false  0.6
true   0.4

A      B      P(B|A)
false  false  0.01
false  true   0.99
true   false  0.7
true   true   0.3

B      D      P(D|B)
false  false  0.02
false  true   0.98
true   false  0.05
true   true   0.95

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1
A Directed Acyclic Graph
Each node in the graph is a
random variable

A node X is a parent of
another node Y if there is an
arrow from node X to node Y,
e.g. A is a parent of B

[Figure: the example graph over the nodes A, B, C, and D]

Informally, an arrow from
node X to node Y means X
has a direct influence on Y
22
A Set of Tables for Each Node
Each node Xi has a
conditional probability
distribution P(Xi | Parents(Xi))
that quantifies the effect of
the parents on the node

The parameters are the
probabilities in these
conditional probability tables
(CPTs)

[Figure: the example graph over A, B, C, and D, with the tables P(A), P(B|A), P(C|B), and P(D|B) shown next to it]
A Set of Tables for Each Node
Conditional Probability
Distribution for C given B

B      C      P(C|B)
false  false  0.4
false  true   0.6
true   false  0.9
true   true   0.1

For a given combination of values of the parents (B
in this example), the entries for P(C=true | B) and
P(C=false | B) must add up to 1
eg. P(C=true | B=false) + P(C=false | B=false) = 1

If you have a Boolean variable with k Boolean parents, this table
has 2^(k+1) probabilities (but only 2^k need to be stored)
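
Both statements are easy to check mechanically. The sketch below uses our own helper names (`rows_sum_to_one`, `table_sizes`) and the P(C|B) values from this slide. It verifies that each parent setting sums to 1 and counts the table entries for k Boolean parents.

```python
# CPT for C given B from the slide, keyed by (B, C).
cpt_c_given_b = {
    (False, False): 0.4,
    (False, True):  0.6,
    (True,  False): 0.9,
    (True,  True):  0.1,
}

def rows_sum_to_one(cpt, tol=1e-9):
    """Check that, for each parent value, the child's probabilities sum to 1."""
    totals = {}
    for (parent, _child), p in cpt.items():
        totals[parent] = totals.get(parent, 0.0) + p
    return all(abs(t - 1.0) < tol for t in totals.values())

def table_sizes(k):
    """(total entries, entries that must be stored) for k Boolean parents."""
    return 2 ** (k + 1), 2 ** k

print(rows_sum_to_one(cpt_c_given_b))  # True
print(table_sizes(1))                  # (4, 2)
print(table_sizes(3))                  # (16, 8)
```

Only half of the entries need storing because P(C=false | B) is always 1 - P(C=true | B).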
24
Bayesian Networks
Two important properties:
1. Encodes the conditional independence
relationships between the variables in the
graph structure
2. Is a compact representation of the joint
probability distribution over the variables

25
Conditional Independence
The Markov condition: given its parents (P1, P2),
a node (X) is conditionally independent of its
non-descendants (ND1, ND2)

[Figure: node X with parents P1 and P2, non-descendants ND1 and ND2, and children C1 and C2]

26
The Joint Probability Distribution
Due to the Markov condition, we can
compute the joint probability distribution over
all the variables X1, …, Xn in the Bayesian net
using the formula:

P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid Parents(X_i))

Where Parents(Xi) means the values of the Parents of the node Xi
with respect to the graph

27
Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) *
P(C = true | B = true) * P(D = true | B = true)
= (0.4)*(0.3)*(0.1)*(0.95)
= 0.0114

28
Using a Bayesian Network Example
Using the network in the example, suppose you want to
calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) *
P(C = true | B = true) * P(D = true | B = true)
(this factorization is from the graph structure)
= (0.4)*(0.3)*(0.1)*(0.95)
(these numbers are from the conditional probability tables)
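
The same calculation can be written as a short program. This is a sketch with our own data layout: each dictionary stores P(child = true | parent) taken from the CPTs of the example network, in which A is the parent of B and B is the parent of both C and D.

```python
from itertools import product

# CPTs of the example network, stored as P(child = true | parent value).
p_a_true = 0.4
p_b_true = {False: 0.99, True: 0.3}   # P(B = true | A)
p_c_true = {False: 0.6, True: 0.1}    # P(C = true | B)
p_d_true = {False: 0.98, True: 0.95}  # P(D = true | B)

def bern(p_true, value):
    """Probability of a Boolean value given the probability of true."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d):
    """P(A, B, C, D) as the product of each node's CPT entry given its parent."""
    return (bern(p_a_true, a) * bern(p_b_true[a], b)
            * bern(p_c_true[b], c) * bern(p_d_true[b], d))

print(round(joint(True, True, True, True), 6))  # 0.0114

# Sanity check: the 16 joint probabilities add up to 1.
total = sum(joint(*v) for v in product([False, True], repeat=4))
print(round(total, 10))  # 1.0
```

The network stores 1 + 2 + 2 + 2 = 7 numbers, yet it determines all 16 entries of the joint distribution.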

29
Inference
• Using a Bayesian network to compute
probabilities is called inference
• In general, inference involves queries of the form:
P( X | E )
E = The evidence variable(s)
X = The query variable(s)

30
Inference
[Figure: a Bayesian network with the nodes HasPneumonia, HasCough, HasFever, HasDifficultyBreathing, and ChestXrayPositive]

• An example of a query would be:
P( HasPneumonia = true | HasFever = true, HasCough = true)
• Note: Even though HasDifficultyBreathing and
ChestXrayPositive are in the Bayesian network, they are not
given values in the query (i.e. they do not appear either as query
variables or evidence variables)
• They are treated as unobserved variables
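
The slides give no CPTs for the pneumonia network, so the sketch below answers a query of the same shape on the earlier example network (A is the parent of B; B is the parent of C and D). It uses plain enumeration, which is the simplest exact method and not necessarily what a real system would use: sum the joint over every assignment consistent with the evidence, letting unobserved variables take both values. The function and variable names are ours.

```python
from itertools import product

VARS = ("A", "B", "C", "D")

# CPTs of the example network, stored as P(child = true | parent value).
p_a_true = 0.4
p_b_true = {False: 0.99, True: 0.3}   # P(B = true | A)
p_c_true = {False: 0.6, True: 0.1}    # P(C = true | B)
p_d_true = {False: 0.98, True: 0.95}  # P(D = true | B)

def bern(p_true, value):
    return p_true if value else 1.0 - p_true

def joint(w):
    """Joint probability of one full assignment w, a dict from name to bool."""
    return (bern(p_a_true, w["A"]) * bern(p_b_true[w["A"]], w["B"])
            * bern(p_c_true[w["B"]], w["C"]) * bern(p_d_true[w["B"]], w["D"]))

def query(target, evidence):
    """P(target = true | evidence), summing out all unobserved variables."""
    num = den = 0.0
    for values in product([False, True], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if any(world[name] != val for name, val in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(world)
        den += p
        if world[target]:
            num += p
    return num / den

# Query variable A, evidence C = true; B and D are unobserved.
print(round(query("A", {"C": True}), 4))  # 0.7407
```

Observing C = true raises the belief in A = true from the prior 0.4 to about 0.74, because C = true makes B = false more likely, and B = false is nearly certain when A is true is not the case but fairly likely when it is.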
31
The Bad News
• Exact inference is feasible in small to
medium-sized networks
• Exact inference in large networks takes a
very long time
• We resort to approximate inference
techniques which are much faster and give
pretty good results
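
The slides do not name a particular approximate technique. As one illustration, here is a sketch of rejection sampling on the small example network from earlier (A is the parent of B; B is the parent of C and D): draw many random samples from the network, throw away those that contradict the evidence, and count how often the query variable is true in the rest. The function names and the sample size are our own choices.

```python
import random

def sample_world(rng):
    """Draw one sample (a, b, c, d), sampling each node after its parent."""
    a = rng.random() < 0.4                    # P(A = true)
    b = rng.random() < (0.3 if a else 0.99)   # P(B = true | A)
    c = rng.random() < (0.1 if b else 0.6)    # P(C = true | B)
    d = rng.random() < (0.95 if b else 0.98)  # P(D = true | B)
    return a, b, c, d

def estimate_a_given_c(n_samples, seed=0):
    """Rejection-sampling estimate of P(A = true | C = true)."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_samples):
        a, _b, c, _d = sample_world(rng)
        if not c:
            continue  # sample contradicts the evidence C = true, discard it
        kept += 1
        hits += a
    return hits / kept

print(estimate_a_given_c(200_000))
```

For this network the exact answer is 20/27, about 0.741, and the estimate should land close to it. Rejection sampling wastes every sample that disagrees with the evidence, which is why practical systems often prefer other samplers.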

32
How is the Bayesian network created?
1. Get an expert to design it
– Expert must determine the structure of the Bayesian network
• This is best done by modeling direct causes of a variable as
its parents
– Expert must determine the values of the CPT entries
• These values could come from the expert’s informed opinion
• Or an external source eg. census information
• Or they are estimated from data
• Or a combination of the above

2. Learn it from data
– This is a much better option but it usually requires a large amount
of data
– This is where Bayesian statistics comes in!
33
Learning Bayesian Networks from Data
Given a data set, can you learn
what a Bayesian network with
variables A, B, C and D would
look like?

A      B      C      D
true   false  false  true
true   false  true   false
:      :      :      :
false  true   false  false
false  true   false  false
:      :      :      :

[Figure: several candidate network structures over A, B, C, and D, separated by "or" and followed by a question mark]

34
Learning Bayesian Networks from Data
• Each possible structure
contains information about the
conditional independence
relationships between A, B, C
and D
• We would like a structure that
contains conditional
independence relationships
that are supported by the data
• Note that we also need to learn
the values in the CPTs from
data

[Figure: several candidate network structures over A, B, C, and D, separated by "or" and followed by a question mark]
35
Learning Bayesian Networks from Data
How does Bayesian statistics help?
1. I might have a prior belief about what the
structure should look like.
2. I might have a prior belief about what the
values in the CPTs should be.

These beliefs get updated as I see more data

[Figure: the example graph over A, B, C, and D, with the CPT for D:]

B      D      P(D|B)
false  false  0.02
false  true   0.98
true   false  0.05
true   true   0.95
36
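
The slides do not say how the update is done. One common textbook choice, assumed here purely for illustration, is to put a Beta prior on each Boolean CPT entry and treat the prior as imaginary observations that are added to the real counts. The prior strength and the data counts below are hypothetical.

```python
def posterior_mean(prior_true, prior_false, n_true, n_false):
    """Posterior mean of a Boolean CPT entry under a Beta(prior_true, prior_false) prior."""
    return (prior_true + n_true) / (prior_true + prior_false + n_true + n_false)

# Hypothetical prior belief that P(D = true | B = true) is about 0.5,
# worth two imaginary observations (one true, one false).
prior_true, prior_false = 1.0, 1.0

# Hypothetical data: counts of rows with B = true, split by the value of D.
for n_true, n_false in [(0, 0), (9, 1), (95, 5), (950, 50)]:
    estimate = posterior_mean(prior_true, prior_false, n_true, n_false)
    print(n_true + n_false, "rows ->", round(estimate, 3))
```

With no data the estimate equals the prior belief of 0.5. As more rows arrive, the data dominates and the estimate approaches the observed frequency of 0.95.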
Learning Bayesian Networks from Data
• We won’t have enough time to describe how we
actually learn Bayesian networks from data
• If you are interested, here are some references:
– Gregory F. Cooper and Edward Herskovits. A Bayesian
Method for the Induction of Probabilistic Networks
from Data. Machine Learning, 9:309-347, 1992.
– David Heckerman. A Tutorial on Learning Bayesian
Networks. Technical Report MSR-TR-95-06,
Microsoft Research. 1995. (Available online)

37
Outline
1. Introduction
2. Probability Primer
3. Bayesian networks
4. Bayesian networks in syndromic
surveillance

38
Bayesian Networks in Syndromic
Surveillance
From: Goldenberg, A., Shmueli, G., Caruana,
R. A., and Fienberg, S. E. (2002). Early
statistical detection of anthrax outbreaks by
tracking over-the-counter medication sales.
Proceedings of the National Academy of
Sciences (pp. 5237-5249)

• Syndromic surveillance systems traditionally
monitor univariate time series
• Bayesian networks allow us to model
multivariate data and monitor it
39
What’s Strange About Recent
Events (WSARE) Algorithm
Bayesian networks are used to model the
multivariate baseline distribution for ED data

Date    Time   Gender  Age  Home Location  Many more…
6/1/03  9:12   M       20s  NE             …
6/1/03  10:45  F       40s  NE             …
6/1/03  11:03  F       60s  NE             …
6/1/03  11:07  M       60s  E              …
6/1/03  12:15  M       60s  E              …
:       :      :       :    :              :

40
Population-wide ANomaly Detection
and Assessment (PANDA)
• A detector specifically for a large-scale
outdoor release of inhalational anthrax
• Uses a massive causal Bayesian network
• Population-wide approach: each person in
the population is represented as a
subnetwork in the overall model

41
Population-Wide Approach
[Figure: three layers of nodes. Global nodes: Anthrax Release. Interface nodes: Location of Release, Time of Release. Person Model subnetworks: one for each person in the population.]

• Note the conditional independence
assumptions
• Anthrax is infectious but non-contagious
42
Population-Wide Approach
[Figure: three layers of nodes. Global nodes: Anthrax Release. Interface nodes: Location of Release, Time of Release. Person Model subnetworks: one for each person in the population.]

• Structure designed by expert judgment
• Parameters obtained from census data, training data,
and expert assessments informed by literature and
experience
43
Person Model (Initial Prototype)
[Figure: person model subnetworks for two people, both connected to the shared nodes Anthrax Release, Time Of Release, and Location of Release. Each person model contains the nodes Gender, Age Decile, Home Zip, Anthrax Infection, Other ED Disease, Respiratory from Anthrax, Respiratory CC From Other, Respiratory CC, Respiratory CC When Admitted, ED Admit from Anthrax, ED Admit from Other, and ED Admission.]
Person Model (Initial Prototype)
[Figure: the same two person models with example values filled in. One person: Gender = Female, Age Decile = 20-30, Home Zip = 15213, ED Admission = Yesterday. The other person: Gender = Male, Age Decile = 50-60, Home Zip = 15146, ED Admission = never. The values Unknown and False are also shown on nodes of the figure.]



What else does this give you?
1. Can model information such as the spatial
dispersion pattern, the progression of
symptoms and the incubation period
2. Can combine evidence from ED and OTC
data
3. Can infer a person’s work zip code from
their home zip code
4. Can explain the model’s belief in an
anthrax attack
46
Acknowledgements
• These slides were partly based on a tutorial
by Andrew Moore
• Greg Cooper, John Levander, John
Dowling, Denver Dash, Bill Hogan, Mike
Wagner, and the rest of the RODS lab

47
References
Bayesian networks:
• “Bayesian networks without tears” by Eugene Charniak
• “Artificial Intelligence: A Modern Approach” by Stuart
Russell and Peter Norvig
• “Learning Bayesian Networks” by Richard Neapolitan
• “Probabilistic Reasoning in Intelligent Systems: Networks
of Plausible Inference” by Judea Pearl

Other references:
• My webpage
http://www.eecs.oregonstate.edu/~wong

48
