EECS6895 Advanced Big Data Analytics, Lecture 6
Bayesian Network
Direction of Evolution
[Figure: progression from sensors and perception, through recognition and memory, to representation, reasoning, and strategy]
Introduction
Suppose a doctor is trying to
determine whether a patient has inhalational
anthrax. She observes the following
symptoms:
• The patient has a cough
• The patient has difficulty breathing
• The patient has a fever
Introduction
Dealing with uncertainty:
You would like to determine how likely it is
that the patient is infected with inhalational
anthrax given that the patient has a cough,
a fever, and difficulty breathing
Introduction
New evidence: an X-ray image shows that the
patient has a wide mediastinum.
Belief update: your belief that the patient is
infected with inhalational anthrax is now
much higher.
Introduction
[Figure: Bayesian network relating Has Anthrax to the observed symptoms]

Bayesian Network
[Figure: node Has Anthrax with its prior probabilities P(A = true) and P(A = false)]
Conditional Probability
• P(A = true | B = true): out of all the outcomes in which B is
true, the fraction in which A is also true
• Read as: “Probability of A given B”
Example:
F = “Have a fever”
C = “Coming down with a cold”
P(F = true) = 1/10
P(C = true) = 1/15
P(F = true | C = true) = 1/2
The Joint Probability Distribution
• P(A = true, B = true) :“the probability of A = true and B = true”
• Notice that:
P(F = true | C = true) = P(F = true, C = true) / P(C = true)
The Joint Probability Distribution
• Joint probabilities can be defined over any number of variables,
e.g. P(A = true, B = true, C = true)
• The entries of the joint table sum to 1
• Once you have the joint probability distribution, you can calculate
any probability involving A, B, and C
• Note: you may need to use marginalization and Bayes rule
• Examples of things you can compute: P(A = true), P(A = true | B = true), ...

A     B     C     P(A,B,C)
false false false 0.1
false false true  0.2
false true  false 0.05
false true  true  0.05
true  false false 0.3
true  false true  0.1
true  true  false 0.05
true  true  true  0.15
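To make the mechanics concrete, here is a minimal sketch (plain Python, table values copied from the slide) that computes a marginal by summing entries and a conditional via the definition above:

```python
# Joint distribution P(A, B, C) copied from the table above.
joint = {
    (False, False, False): 0.10, (False, False, True): 0.20,
    (False, True,  False): 0.05, (False, True,  True): 0.05,
    (True,  False, False): 0.30, (True,  False, True): 0.10,
    (True,  True,  False): 0.05, (True,  True,  True): 0.15,
}

# Marginalization: P(A = true) sums the joint over B and C.
p_a = sum(p for (a, b, c), p in joint.items() if a)

# Conditioning: P(A = true | B = true) = P(A = true, B = true) / P(B = true).
p_ab = sum(p for (a, b, c), p in joint.items() if a and b)
p_b = sum(p for (a, b, c), p in joint.items() if b)
print(p_a, p_ab / p_b)   # 0.6 and 0.2 / 0.3 ≈ 0.667
```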
Independence
How is independence useful?
• Suppose you have n coin flips and you want to
calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n
values in the table
• If the coin flips are independent, then only n values are needed:
P(C1, ..., Cn) = ∏_{i=1}^{n} P(Ci)
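A tiny illustration of the saving (a sketch with made-up bias values): under independence, n numbers determine the full 2^n-entry joint table.

```python
from itertools import product
from math import prod

p_heads = [0.5, 0.6, 0.7]   # P(C_i = heads) for n = 3 independent flips
joint = {flips: prod(p if f else 1 - p for f, p in zip(flips, p_heads))
         for flips in product([True, False], repeat=len(p_heads))}
print(len(joint), sum(joint.values()))   # 2**3 = 8 entries, sums to 1.0
```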
Conditional Independence
• C and A are conditionally independent given B if the following
holds:
P(C | A, B) = P(C | B)
• Example: “Cancer is a common cause of the two symptoms: a
positive X-ray and dyspnoea”
[Figure: node B = lung cancer with children A = positive X-ray and C = dyspnoea]

Bayesian Networks
A Bayesian network has two parts:
1. A directed acyclic graph (DAG) with one node per random variable
2. A set of tables, one for each node in the graph: the conditional
probability table (CPT)
[Figure: example DAG with A → B, B → C, and B → D]
A Set of Tables for Each Node
Each node Xi has a conditional probability distribution
P(Xi | Parents(Xi)) that quantifies the effect of its parents on the
node; a root node simply has a prior distribution.

A     P(A)        A     B     P(B|A)
false 0.4         false false 0.03
true  0.6         false true  0.97
                  true  false 0.6
                  true  true  0.4

B     C     P(C|B)        B     D     P(D|B)
false false 0.3           false false 0.01
false true  0.7           false true  0.99
true  false 0.8           true  false 0.04
true  true  0.2           true  true  0.96

[Figure: DAG A → B with B → C and B → D]
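As a concrete sketch, this network and its CPTs can be written in pgmpy (state 0 = false, 1 = true; the class is BayesianNetwork in recent releases, BayesianModel in older ones):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD

model = BayesianNetwork([('A', 'B'), ('B', 'C'), ('B', 'D')])
model.add_cpds(
    TabularCPD('A', 2, [[0.4], [0.6]]),                 # P(A)
    TabularCPD('B', 2, [[0.03, 0.6], [0.97, 0.4]],      # P(B | A)
               evidence=['A'], evidence_card=[2]),
    TabularCPD('C', 2, [[0.3, 0.8], [0.7, 0.2]],        # P(C | B)
               evidence=['B'], evidence_card=[2]),
    TabularCPD('D', 2, [[0.01, 0.04], [0.99, 0.96]],    # P(D | B)
               evidence=['B'], evidence_card=[2]),
)
assert model.check_model()   # each CPT column must sum to 1
```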
Bayesian Networks
Two important properties:
1. Encodes the conditional independence relationships between the
variables in the graph structure
2. Is a compact representation of the joint probability distribution
over the variables
[Figure: DAG A → B with B → C and B → D]
Conditional Independence
The probability distribution for each node depends only on its parents
C1 and C2 are conditionally independent given X
[Figure: parents P1, P2 → node X → children C1, C2]
The Joint Probability Distribution
Due to the conditional independence property, the
joint probability distribution over all the variables X1,
…, Xn in the Bayesian net can be computed using the
formula:
P(X1 = x1, ..., Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))
Using a Bayesian Network: Example
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) *
P(C = true | B = true) * P(D = true | B = true)
= (0.6)*(0.4)*(0.2)*(0.96)
= 0.04608
[Figure: DAG A → B with B → C and B → D]
Using a Bayesian Network: Example
P(A = true, B = true, C = true, D = true)
= P(A = true) * P(B = true | A = true) *
P(C = true | B = true) * P(D = true | B = true)   ← factorization from the graph structure
= (0.6)*(0.4)*(0.2)*(0.96)   ← values from the conditional probability tables
= 0.04608

A     P(A)      A     B     P(B|A)      B     C     P(C|B)      B     D     P(D|B)
false 0.4       false false 0.03        false false 0.3         false false 0.01
true  0.6       false true  0.97        false true  0.7         false true  0.99
                true  false 0.6         true  false 0.8         true  false 0.04
                true  true  0.4         true  true  0.2         true  true  0.96
Another example
• I'm at work; neighbor Jeff calls to say my alarm is ringing, but neighbor Mary doesn't call.
Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
[Figure: Burglary, Earthquake → Alarm → Jeff Calls, Mary Calls]
Bayesian Network for the Alarm Domain

B: Burglary  P(B) = .001        E: Earthquake  P(E) = .002

A: Alarm
B E  P(A)
T T  .95
T F  .94
F T  .29
F F  .001

M: Mary Calls      J: Jeff Calls
A  P(M)            A  P(J)
T  .70             T  .90
F  .01             F  .05

P(J = true, M = true, A = true, B = false, E = false)
= P(J = true | A = true) P(M = true | A = true) P(A = true | B = false, E = false) P(B = false) P(E = false)
= 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00063
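Here is a sketch of the same network in pgmpy (state 0 = false, 1 = true; CPT values from the tables above), using exact inference to answer the burglary question given the actual evidence (Jeff calls, Mary doesn't):

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

alarm = BayesianNetwork([('B', 'A'), ('E', 'A'), ('A', 'J'), ('A', 'M')])
alarm.add_cpds(
    TabularCPD('B', 2, [[0.999], [0.001]]),            # P(Burglary)
    TabularCPD('E', 2, [[0.998], [0.002]]),            # P(Earthquake)
    # P(A | B, E); columns ordered over (B, E) = (f,f), (f,t), (t,f), (t,t)
    TabularCPD('A', 2, [[0.999, 0.71, 0.06, 0.05],
                        [0.001, 0.29, 0.94, 0.95]],
               evidence=['B', 'E'], evidence_card=[2, 2]),
    TabularCPD('J', 2, [[0.95, 0.10], [0.05, 0.90]],   # P(Jeff calls | A)
               evidence=['A'], evidence_card=[2]),
    TabularCPD('M', 2, [[0.99, 0.30], [0.01, 0.70]],   # P(Mary calls | A)
               evidence=['A'], evidence_card=[2]),
)
posterior = VariableElimination(alarm).query(['B'], evidence={'J': 1, 'M': 0})
print(posterior)   # P(Burglary | Jeff calls, Mary doesn't call)
```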
Outline
• Introduction
• Probability Review
• Bayesian Network
• Inference methods
• Network Structure Learning
Inference
• How can one infer the (probabilities of) values of one or more
network variables, given observed values of others?
• P( X | E )
X = the query variable(s)
E = the evidence variable(s)
Inference: example
[Figure: Has Anthrax → Has Cough, Has Fever, Has Difficulty Breathing, Has Wide Mediastinum; query P(Has Anthrax | observed symptoms)]
Inference in Bayesian Network
• Exact inference:
Variable Elimination
Junction Tree
• Approximate inference:
Markov Chain Monte Carlo
Variational Methods
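As a taste of the approximate family, here is a minimal rejection-sampling sketch (plain Python, CPT values from the A → B → {C, D} example above) that estimates P(A = true | C = true) from forward samples; MCMC methods refine this basic idea:

```python
import random

def sample():
    """Forward-sample (A, B, C, D) using the CPT values above."""
    a = random.random() < 0.6
    b = random.random() < (0.4 if a else 0.97)
    c = random.random() < (0.2 if b else 0.7)
    d = random.random() < (0.96 if b else 0.99)
    return a, b, c, d

draws = [sample() for _ in range(100_000)]
kept = [a for a, b, c, d in draws if c]   # keep samples matching evidence C = true
print(sum(kept) / len(kept))              # ≈ P(A = true | C = true)
```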
From Bayesian Network to Junction Tree
• In a Bayesian network, node → random variable, and edges encode
conditional dependence among random variables ⇨ the foundation of
probabilistic inference
• Exact inference allows information to be propagated from one node to
another, but it is not straightforward: exact inference in general
networks is NP-hard
• The junction tree provides a structure for organizing exact inference
Constructing Junction Trees
1. Moralization: construct an undirected graph from the DAG
2. Triangulation: Selectively add arcs to the moral graph
3. Build a junction graph by identifying the cliques and separators
4. Build the junction tree by finding an appropriate spanning tree
Step 1: Moralization: marry the parents
[Figure: DAG G = (V, E) with nodes a–h, and its moral graph GM]
1. For all w ∈ V:
• For all u, v ∈ parents(w), add an edge u-v.
2. Undirect all edges.
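A minimal moralization sketch using networkx (assumed available; networkx also ships nx.moral_graph, which performs the same construction):

```python
import networkx as nx

def moralize(dag: nx.DiGraph) -> nx.Graph:
    moral = dag.to_undirected()                  # step 2: undirect all edges
    for w in dag.nodes:                          # step 1: marry the parents
        parents = list(dag.predecessors(w))
        for i, u in enumerate(parents):
            for v in parents[i + 1:]:
                moral.add_edge(u, v)
    return moral
```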
Step 2: Triangulation
[Figure: moral graph GM and triangulated graph GT with fill-in edges added]
A graph is triangulated (chordal) when every cycle of length ≥ 4 has a chord.
Step 3: Build the junction graph
• A junction graph for an undirected graph G is an undirected, labeled
graph.
• Clique: a subgraph that is complete and maximal.
• The nodes are the cliques in G.
• If two cliques intersect, they are joined in the junction graph by an
edge labeled with their intersection (the separator).
[Figure: from the Bayesian network G = (V, E) through the moral graph GM and triangulated graph GT to the junction graph GJ, whose nodes are the cliques (abd, ade, ace, ceg, def, egh) and whose edge labels are the separators, e.g. ceg ∩ egh = eg]
Step 4: Junction Tree
• A junction tree is a sub-graph of the junction graph
that
• Is a tree
• Contains all the cliques (spanning tree)
• Satisfies the running intersection property:
for each pair of nodes U, V, all nodes on the path
between U and V contain U ∩ V
Step 4: Junction Tree (cont.)
• Theorem: An undirected graph is triangulated if and only if its junction
graph has a junction tree
• Definition: The weight of a link in a junction graph is the number of
variables in its label. The weight of a junction tree is the sum of the
weights of its labels.
• Theorem: A sub-tree of the junction graph of a triangulated graph is a
junction tree if and only if it is a spanning tree of maximal weight
There are several methods to find a maximal-weight spanning tree (MST).
Kruskal's algorithm: successively choose a link of
maximal weight unless it creates a cycle.
[Figure: cliques abd and ace joined by candidate separators ad, ae, ce; the maximal-weight link is kept]
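A sketch of Kruskal's algorithm specialized to junction graphs (cliques represented as frozensets; the weight of a link is the size of the separator, i.e. the intersection):

```python
from itertools import combinations

def junction_tree_edges(cliques):
    """Maximal-weight spanning tree over cliques via Kruskal."""
    links = [(len(a & b), a, b) for a, b in combinations(cliques, 2)
             if a & b]                        # only intersecting cliques
    links.sort(key=lambda t: t[0], reverse=True)
    parent = {c: c for c in cliques}          # union-find forest

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]     # path halving
            c = parent[c]
        return c

    tree = []
    for _, a, b in links:                     # heaviest links first;
        ra, rb = find(a), find(b)             # skip links that close a cycle
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, a & b))        # (clique, clique, separator)
    return tree

cliques = [frozenset(s) for s in ('abd', 'ade', 'ace', 'ceg', 'def', 'egh')]
print(junction_tree_edges(cliques))
```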
Inference Using the Junction Tree
• Potential φ_X: a function that maps each instantiation x of a set of
variables X to a nonnegative real number
• Marginalization: suppose X ⊆ Y; then φ_X = ∑_{Y\X} φ_Y
• Constraints on potentials
1) Consistency property: for each clique X and neighboring separator S,
∑_{X\S} φ_X = φ_S
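A small sketch of potential marginalization under these definitions (potentials as dicts from assignment tuples to nonnegative reals; the variable names are illustrative):

```python
from collections import defaultdict

def marginalize(phi_Y, vars_Y, vars_X):
    """phi_X(x) = sum of phi_Y over assignments of Y \\ X consistent with x."""
    keep = [vars_Y.index(v) for v in vars_X]    # positions of X within Y
    phi_X = defaultdict(float)
    for assignment, value in phi_Y.items():
        phi_X[tuple(assignment[i] for i in keep)] += value
    return dict(phi_X)

# e.g. a potential over (B, C) marginalized onto (B,)
phi_bc = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.2}
print(marginalize(phi_bc, ['B', 'C'], ['B']))   # {(0,): 1.0, (1,): 1.0}
```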
Pomegranate
Generate Models with Pomegranate (greedy)
• Results can depend heavily on the samples
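A minimal sketch of greedy structure learning with pomegranate's long-standing (pre-1.0) API; the toy data array is a stand-in:

```python
import numpy
from pomegranate import BayesianNetwork   # pomegranate < 1.0 API

data = numpy.random.randint(2, size=(1000, 4))   # toy discrete samples
model = BayesianNetwork.from_samples(data, algorithm='greedy')
print(model.structure)   # learned parent sets, one tuple per variable
```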
Generate Acyclic Permutations
• For each node, randomly assign it to a level 1, 2, …, K
• Randomly pick two nodes:
• one node from level k
• the other node from level k+1
• Add a directed edge from the first to the second node
• Edges cannot skip levels or connect nodes in the same level
• This leveling system prevents generating networks with cycles
• Use PGMPY's K2Score to quantify network fit (see the sketch below)
http://www.lx.it.pt/~asmc/pub/talks/09-TA/ta_pres.pdf
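A sketch of this generator under stated assumptions (toy binary data; the function and variable names are illustrative), scored with pgmpy's K2Score:

```python
import random
import numpy as np
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import K2Score

def random_leveled_dag(columns, max_level, n_edges, tries=10_000):
    level = {c: random.randint(1, max_level) for c in columns}
    edges = set()
    for _ in range(tries):                     # bounded retries
        if len(edges) == n_edges:
            break
        u, v = random.sample(columns, 2)
        if level[v] == level[u] + 1:           # adjacent levels only: acyclic
            edges.add((u, v))
    return BayesianNetwork(edges)

data = pd.DataFrame(np.random.randint(2, size=(500, 4)),
                    columns=list('ABCD'))      # toy discrete dataset
model = random_leveled_dag(list(data.columns), max_level=3, n_edges=3)
print(K2Score(data).score(model))              # higher score = better fit
```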
[Figure: example run with max_level = 3, and the network with the best score]
How to handle larger datasets with many columns/nodes and high cardinality?
Use histogram binning to reduce the cardinality
• Instead of keeping many distinct values per column, reduce the
cardinality with a histogram (a fixed number of bins, or percentage-based bins)
• This also reduces the chance of a value appearing only once or twice
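A small binning sketch with pandas (the 'income' column is illustrative): pd.cut gives fixed-width bins and pd.qcut gives percentage-based ones:

```python
import pandas as pd

df = pd.DataFrame({'income': [12, 35, 47, 51, 88, 90, 150, 400]})
df['income_fixed'] = pd.cut(df['income'], bins=4)                 # equal width
df['income_pct'] = pd.qcut(df['income'], q=4, duplicates='drop')  # quartiles
print(df['income_fixed'].value_counts())
```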
[Figure: example run with max_level = 5]
Permutation + Pruning Algorithm
• Loop over tuples of 1, 2, … nodes, and run through each
permutation of node-edge connections within the tuples
• After every loop, look through the permutations that were generated
and keep the ones with the highest scores
• Scores are calculated by doing prediction on the leaf nodes (nodes
without any children)
[Figure: example networks with high scores]
Ardi Machine Learning includes Bayesian
Networks
[Architecture: ml-ui → ml-manager → ml-worker, backed by ml-db]
• ml-ui: gets the file from the UI, processes parameters, stores an entry
in the database, and reads the results
• ml-manager: waits for an available worker and requests a worker to take
the work
• ml-worker: reads the files and parameters, runs the Bayesian code
(BayesNetworkHandler.py, using Pomegranate and PGMPY), and returns results
• ml-db: stores data and parameters
Acknowledgements
• Some of the materials are based on work by the following:
• Dr. Cheng, Dr. Wong, Dr. Hamo, Dr. Silberstein, Dr. Huang,
Mr. Chang-Ogimoto, and others