An Introduction to Probabilistic Graphical Models

This document discusses probabilistic graphical models and their use in expert systems. It introduces probabilistic graphical models including probabilistic networks represented by graphs. It describes properties of conditional independence and how graphs can represent conditional independence relationships through d-separation. It discusses different types of graphs including undirected graphs, directed acyclic graphs, and how they can be used to represent factorizations and perform inference. It provides an example of the Lauritzen-Spiegelhalter algorithm for inference in graphical models.

PROBABILISTIC GRAPHICAL MODELS

David Madigan
Rutgers University

Expert Systems

•Explosion of interest in “Expert Systems” in the early 1980s

IF the infection is primary-bacteremia
AND the site of the culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that infection is bacteroid.

•Many companies (Teknowledge, IntelliCorp, Inference, etc.), many IPOs, much media hype

•Ad-hoc uncertainty handling

Uncertainty in Expert Systems

If A then C (p1)
If B then C (p2)

What if both A and B are true?

Then C is true with certainty factor (CF):

p1 + p2 × (1 − p1)

“Currently fashionable ad-hoc mumbo jumbo” (A.F.M. Smith)
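The combination rule above can be sketched in a few lines of Python. The function name `combine_cf` is hypothetical, and the sketch covers only the case shown on the slide: two rules with certainty factors in [0, 1] that support the same conclusion.

```python
def combine_cf(p1: float, p2: float) -> float:
    """Combine two certainty factors in [0, 1] that support the same conclusion."""
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("certainty factors must lie in [0, 1]")
    # Rule from the slide: p1 + p2 * (1 - p1)
    return p1 + p2 * (1.0 - p1)


# Illustrative values only (not from the slides).
cf = combine_cf(0.7, 0.5)
print(cf)
```

The rule is symmetric in p1 and p2, since it equals 1 − (1 − p1)(1 − p2), so the order in which the two rules fire does not matter.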
Eschewed Probabilistic Approach

•Computationally intractable
•Inscrutable
•Requires vast amounts of data/elicitation

e.g., n dichotomous variables need $2^n - 1$ probabilities to fully specify the joint distribution
Conditional Independence

$$X \perp\!\!\!\perp Y \mid Z \iff f_{X,Y|Z}(x, y \mid z) = f_{X|Z}(x \mid z)\, f_{Y|Z}(y \mid z)$$
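The definition can be checked numerically on a small discrete example. The sketch below is an illustration under stated assumptions: it builds the joint distribution of a three-variable binary chain A → B → C from made-up probabilities and tests the equivalent product form f(x, y, z) f(z) = f(x, z) f(y, z). The helper names `marginal` and `cond_indep` are hypothetical.

```python
from itertools import product

# Hypothetical numbers for a binary chain A -> B -> C (assumed for illustration).
pr_a = 0.7
pr_b_given_a = {1: 0.5, 0: 0.1}
pr_c_given_b = {1: 0.2, 0: 0.6}


def bern(p, x):
    """Probability of outcome x (1 or 0) when the event has probability p."""
    return p if x else 1.0 - p


# Joint distribution indexed by (a, b, c).
joint = {
    (a, b, c): bern(pr_a, a) * bern(pr_b_given_a[a], b) * bern(pr_c_given_b[b], c)
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(joint, keep):
    """Sum the joint over every coordinate not listed in `keep`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_indep(joint, i, j, k, tol=1e-12):
    """Check X_i independent of X_j given X_k via f(x,y,z) f(z) == f(x,z) f(y,z)."""
    p_ijk = marginal(joint, (i, j, k))
    p_ik = marginal(joint, (i, k))
    p_jk = marginal(joint, (j, k))
    p_k = marginal(joint, (k,))
    return all(
        abs(p_ijk[(x, y, z)] * p_k[(z,)] - p_ik[(x, z)] * p_jk[(y, z)]) <= tol
        for x, y, z in product((0, 1), repeat=3)
    )


print(cond_indep(joint, 0, 2, 1))  # A and C given B
print(cond_indep(joint, 0, 1, 2))  # A and B given C
```

Multiplying through by f(z) avoids dividing by zero when some state of Z has probability zero.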
Conditional Independence

•Suppose A and B are marginally independent: Pr(A), Pr(B), Pr(C|A,B) × 4 = 6 probabilities
•Suppose A and C are conditionally independent given B: Pr(A), Pr(B|A) × 2, Pr(C|B) × 2 = 5 probabilities

A → B → C

C ⊥⊥ A | B

•Chain with 50 variables requires 99 probabilities versus $2^{50} - 1$
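The counts on this slide follow from one rule: a binary variable with k binary parents needs 2^k probabilities, one per parent configuration. A minimal sketch, with hypothetical function names, that reproduces the figures 6, 5, and 99:

```python
def dag_params(parents: dict[str, list[str]]) -> int:
    """Number of probabilities needed for binary variables with the given parent sets."""
    return sum(2 ** len(pa) for pa in parents.values())


def full_joint_params(n: int) -> int:
    """Number of probabilities for an unrestricted joint over n binary variables."""
    return 2 ** n - 1


# A and B marginally independent, C depends on both.
collider = {"A": [], "B": [], "C": ["A", "B"]}
# Chain A -> B -> C.
chain3 = {"A": [], "B": ["A"], "C": ["B"]}
# Chain X1 -> X2 -> ... -> X50.
chain50 = {f"X{i}": ([f"X{i - 1}"] if i > 1 else []) for i in range(1, 51)}

print(dag_params(collider), dag_params(chain3), dag_params(chain50))
print(full_joint_params(50))
```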
Properties of Conditional Independence (Dawid, 1980)

For any probability measure P and random variables A, B, and C:

CI 1: A ⊥⊥ B [P] ⇒ B ⊥⊥ A [P]

CI 2: A ⊥⊥ B ∪ C [P] ⇒ A ⊥⊥ B [P]

CI 3: A ⊥⊥ B ∪ C [P] ⇒ A ⊥⊥ B | C [P]

CI 4: A ⊥⊥ B and A ⊥⊥ C | B [P] ⇒ A ⊥⊥ B ∪ C [P]

Some probability measures also satisfy:

CI 5: A ⊥⊥ B | C and A ⊥⊥ C | B [P] ⇒ A ⊥⊥ B ∪ C [P]

CI 5 is satisfied whenever P has a positive joint probability density with respect to some product measure
Markov Properties for Undirected Graphs

(Global) S separates A from B ⇒ A ⊥⊥ B | S

(Local) α ⊥⊥ V \ cl(α) | bd(α)
(Pairwise) α ⊥⊥ β | V \ {α,β}

(G) ⇒ (L) ⇒ (P)

[Figure: undirected graph on X1, X2, X3, X4, X5]

X2 ⊥⊥ X5, X4 | X1, X3   (1)
⇒ X2 ⊥⊥ X4 | X1, X3, X5   (2)

To go from (2) to (1) need X5 ⊥⊥ X2 | X1, X3 or CI5

Lauritzen, Dawid, Larsen & Leimer (1990)
Factorizations

A density f is said to “factorize according to G” if:

$$f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$$

• $\mathcal{C}$ is the set of cliques (maximal complete subgraphs) of G; the $\psi_C$ are the “clique potentials”

Proposition: If f factorizes according to a UG G, then it also
obeys the global Markov property

“Proof”: Let S separate A from B in G and assume V = A ∪ B ∪ S.

Let $\mathcal{C}_A$ be the set of cliques with non-empty intersection with A. Since S separates A from B, we must have B ∩ C = ∅ for all C in $\mathcal{C}_A$. Then:

$$f(x) = \prod_{C \in \mathcal{C}_A} \psi_C(x_C) \prod_{C \in \mathcal{C} \setminus \mathcal{C}_A} \psi_C(x_C) = f_1(x_{A \cup S})\, f_2(x_{B \cup S})$$
Markov Properties for Acyclic Directed Graphs
(Bayesian Networks)

(Global) S separates A from B in $(G_{An(A \cup B \cup S)})^m$ ⇒ A ⊥⊥ B | S

(Local) α ⊥⊥ nd(α) \ pa(α) | pa(α)

(G) ⇔ (L)

[Figure: two graphs on X1, X2, X3]

Lauritzen, Dawid, Larsen & Leimer (1990)

Factorizations

A density f admits a “recursive factorization” according to an ADG G if

$$f(x) = \prod_{v \in V} f(x_v \mid x_{pa(v)})$$

ADG Global Markov Property ⇔ $f(x) = \prod_{v \in V} f(x_v \mid x_{pa(v)})$

Lemma: If P admits a recursive factorization according to an ADG G, then P factorizes according to $G^m$ (and chordal supergraphs of $G^m$)

Lemma: If P admits a recursive factorization according to an ADG G, and A is an ancestral set in G, then $P_A$ admits a recursive factorization according to the subgraph $G_A$
Factorizations

[Figure: ADG with edges A→C, C→D, D→S, F→S, S→E, G→F, B→G, S→H, B→H]

p(A,B,C,D,E,F,G,H,S) = p(A)p(C|A)p(D|C)p(S|D,F)p(E|S)p(F|G)p(G|B)p(H|S,B)p(B)

⇒ p(S|A,B,C,D,E,F,G,H) ∝ p(S|D,F)p(E|S)p(H|S,B)

{D,F,E,H,B} is the “Markov Blanket” of S. It contains the parents of S, the children of S, and the other parents of the children of S.
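A minimal sketch of how a Markov blanket can be computed from parent sets. The parent sets below are read off the factorization on this slide; the function name `markov_blanket` is hypothetical.

```python
# Parent sets taken from the factors p(v | pa(v)) in the factorization above.
parents = {
    "A": [], "B": [], "C": ["A"], "D": ["C"], "S": ["D", "F"],
    "E": ["S"], "F": ["G"], "G": ["B"], "H": ["S", "B"],
}


def markov_blanket(parents, v):
    """Parents of v, children of v, and the other parents of v's children."""
    children = [c for c, pa in parents.items() if v in pa]
    blanket = set(parents[v]) | set(children)
    for c in children:
        blanket |= set(parents[c])
    blanket.discard(v)
    return blanket


blanket_s = markov_blanket(parents, "S")
print(sorted(blanket_s))
```

The blanket members are exactly the variables that share a factor with S, which is why only p(S|D,F), p(E|S), and p(H|S,B) survive in the conditional for S.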
Markov Properties for Acyclic Directed Graphs
(Bayesian Networks)

(Global) S separates A from B in $(G_{An(A \cup B \cup S)})^m$ ⇒ A ⊥⊥ B | S

(Local) α ⊥⊥ nd(α) \ pa(α) | pa(α)

(G) ⇒ (L): α ∪ nd(α) is an ancestral set; pa(α) obviously separates α from nd(α) \ pa(α) in $(G_{An(\alpha \cup nd(\alpha))})^m$

(L) ⇒ (factorization): induction on the number of vertices

d-separation

A chain π from a to b in an acyclic directed graph G is said to be blocked by S if it contains a vertex γ ∈ π such that either:
- γ ∈ S and arrows of π do not meet head to head at γ, or
- γ ∉ S, γ has no descendants in S, and arrows of π do meet head to head at γ
Two subsets A and B are d-separated by S if all chains from A to B are blocked by S
d-separation and the global Markov property

Let A, B, and S be disjoint subsets of a directed, acyclic graph, G. Then S d-separates A from B if and only if S separates A from B in $(G_{An(A \cup B \cup S)})^m$
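The equivalence above suggests one way to test d-separation: restrict to the smallest ancestral set containing A, B, and S, moralize, and check ordinary graph separation. The sketch below assumes the ADG is given as a dictionary of parent lists; all function names are hypothetical.

```python
from collections import deque


def ancestral_set(parents, nodes):
    """Smallest ancestral set containing `nodes`: the nodes plus all their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def moral_graph(parents, keep):
    """Undirected moral graph of the subgraph induced by the ancestral set `keep`."""
    adj = {v: set() for v in keep}
    for v in keep:
        pa = [p for p in parents[v] if p in keep]
        for p in pa:  # drop arrow directions
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(pa):  # "marry" parents of a common child
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    return adj


def d_separated(parents, a, b, s):
    """True if s separates a from b in the moralized ancestral graph."""
    a, b, s = set(a), set(b), set(s)
    adj = moral_graph(parents, ancestral_set(parents, a | b | s))
    queue, seen = deque(a), set(a)
    while queue:
        v = queue.popleft()
        if v in b:
            return False
        for w in adj[v]:
            if w not in seen and w not in s:
                seen.add(w)
                queue.append(w)
    return True


chain = {"A": [], "B": ["A"], "C": ["B"]}
collider = {"A": [], "C": [], "B": ["A", "C"]}
print(d_separated(chain, {"A"}, {"C"}, {"B"}))
print(d_separated(collider, {"A"}, {"C"}, {"B"}))
```

In the collider example, conditioning on B brings B into the ancestral set, and moralization then joins A and C, so they are no longer separated.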
UG – ADG Intersection

[Figure: small example graphs on A, B, C and on A, B, C, D, annotated with the independences they encode, including A ⊥⊥ C | B, and A ⊥⊥ B | C,D together with C ⊥⊥ D | A,B]
UG – ADG Intersection

[Figure: Venn diagram of UG and ADG model classes, with the decomposable models in the overlap; label “No CI5”]

•UG is decomposable if chordal
•ADG is decomposable if moral
•Decomposable ~ closed-form log-linear models
Chordal Graphs and RIP

•Chordal graphs (uniquely) admit clique orderings that have the Running Intersection Property

1. {V,T}
2. {A,L,T}
3. {L,A,B}
4. {S,L,B}
5. {A,B,D}
6. {A,X}

[Figure: chordal graph on V, T, L, A, S, X, D, B]

•The intersection of each set with those earlier in the list is fully contained in a previous set
•Can compute cond. probabilities (e.g. Pr(X|V)) by message passing (Lauritzen & Spiegelhalter, Dawid, Jensen)
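The Running Intersection Property can be checked mechanically. The sketch below uses the six cliques listed on this slide; the function name `has_running_intersection` is hypothetical.

```python
cliques = [
    {"V", "T"}, {"A", "L", "T"}, {"L", "A", "B"},
    {"S", "L", "B"}, {"A", "B", "D"}, {"A", "X"},
]


def has_running_intersection(ordering):
    """Each clique's overlap with all earlier cliques must lie inside one earlier clique."""
    seen = set()
    for i, clique in enumerate(ordering):
        if i > 0:
            overlap = clique & seen
            if not any(overlap <= earlier for earlier in ordering[:i]):
                return False
        seen |= clique
    return True


# Same cliques in a different order (chosen here to show a failing case).
shuffled = [cliques[0], cliques[3], cliques[1], cliques[2], cliques[4], cliques[5]]

print(has_running_intersection(cliques))
print(has_running_intersection(shuffled))
```

The property depends on the ordering, not only on the set of cliques: in the shuffled list, {A,L,T} overlaps the earlier cliques in {L,T}, which is contained in neither {V,T} nor {S,L,B}.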
Probabilistic Expert System

•Computationally intractable
•Inscrutable
•Requires vast amounts of data/elicitation

•Chordal UG models facilitate fast inference

•ADG models better for expert system applications: more natural to specify Pr( v | pa(v) )
Factorizations

UG Global Markov Property ⇔ $f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$

ADG Global Markov Property ⇔ $f(x) = \prod_{v \in V} f(x_v \mid x_{pa(v)})$
Lauritzen-Spiegelhalter Algorithm

•Moralize
•Triangulate

Algorithm is widely deployed in commercial software

L&S Toy Example

A → B → C
Pr(A)=0.7
Pr(B|A)=0.5 Pr(B|¬A)=0.1
Pr(C|B)=0.2 Pr(C|¬B)=0.6

ψ(A,B) ← Pr(B|A)Pr(A)
ψ(B,C) ← Pr(C|B)

Junction tree: AB - B - BC

ψ(A,B)    B      ¬B
A         0.35   0.35
¬A        0.03   0.27

ψ(B)      B      ¬B
          1      1

ψ(B,C)    C      ¬C
B         0.2    0.8
¬B        0.6    0.4

Message Schedule: AB → BC, BC → AB, Pr(A|C)

After AB → BC:

ψ(B)      B      ¬B
          0.38   0.62

ψ(B,C)    C      ¬C
B         0.076  0.304
¬B        0.372  0.248

After entering evidence C:

ψ(B,C)    C      ¬C
B         0.076  0
¬B        0.372  0
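The toy example can be replayed in plain Python. This is a sketch of the two-message schedule on the junction tree AB - B - BC, using the probabilities from the slide; the variable names and the dictionary representation of the potentials are choices made here, not part of the original. The final value of Pr(A|C) is computed by the sketch rather than quoted from the slides.

```python
vals = (True, False)

# Probabilities from the slide, keyed by the value of the conditioning variable.
pr_a = {True: 0.7, False: 0.3}
pr_b_given_a = {True: 0.5, False: 0.1}
pr_c_given_b = {True: 0.2, False: 0.6}


def bern(p, x):
    """Probability of outcome x when the event has probability p."""
    return p if x else 1.0 - p


# Initialise clique potentials and the separator potential.
psi_ab = {(a, b): pr_a[a] * bern(pr_b_given_a[a], b) for a in vals for b in vals}
psi_bc = {(b, c): bern(pr_c_given_b[b], c) for b in vals for c in vals}
sep_b = {b: 1.0 for b in vals}

# Message AB -> BC: marginalise AB onto B, rescale BC by new/old separator.
new_sep = {b: sum(psi_ab[(a, b)] for a in vals) for b in vals}
psi_bc = {(b, c): psi_bc[(b, c)] * new_sep[b] / sep_b[b] for b in vals for c in vals}
sep_b = new_sep
psi_bc_after_first_message = dict(psi_bc)

# Enter evidence C = true by zeroing the entries with C false.
psi_bc = {(b, c): (p if c else 0.0) for (b, c), p in psi_bc.items()}

# Message BC -> AB: marginalise BC onto B, rescale AB by new/old separator.
new_sep = {b: sum(psi_bc[(b, c)] for c in vals) for b in vals}
psi_ab = {(a, b): psi_ab[(a, b)] * new_sep[b] / sep_b[b] for a in vals for b in vals}
sep_b = new_sep

# psi_ab is now proportional to Pr(A, B | C); its total mass is Pr(C).
pr_c = sum(psi_ab.values())
pr_a_given_c = sum(psi_ab[(True, b)] for b in vals) / pr_c
print(pr_c, pr_a_given_c)
```

A direct calculation gives the same answer: Pr(C|A) = 0.5 × 0.2 + 0.5 × 0.6 = 0.4, Pr(C) = 0.7 × 0.4 + 0.3 × 0.56 = 0.448, so Pr(A|C) = 0.28 / 0.448 = 0.625.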
Other Theoretical Developments

Do the UG and ADG global Markov properties identify all the conditional independences implied by the corresponding factorizations?

Yes. Completeness for ADGs by Geiger and Pearl (1988); for UGs by Frydenberg (1988)

Graphical characterization of collapsibility in hierarchical log-linear models (Asmussen and Edwards, 1983)
Collapsibility

Clinic A
              Survival
Care      No     Yes    % not surviving
Less      3      176    1.7%
More      4      293    1.4%

Clinic B
              Survival
Care      No     Yes    % not surviving
Less      17     197    7.9%
More      2      23     8.0%

Pooled
              Survival
Care      No     Yes    % not surviving
Less      20     373    5.1%
More      6      316    1.9%
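The pooled table can be reproduced from the two clinic tables. The sketch below assumes the percentage column is the share of non-survivors, No / (No + Yes), which matches every figure on the slide; the names are hypothetical.

```python
# (No, Yes) survival counts from the two clinic tables.
counts = {
    "Clinic A": {"Less": (3, 176), "More": (4, 293)},
    "Clinic B": {"Less": (17, 197), "More": (2, 23)},
}


def non_survival_rate(no, yes):
    """Share of patients who did not survive."""
    return no / (no + yes)


# Collapse over clinic by adding the counts cell by cell.
pooled = {}
for care in ("Less", "More"):
    no = sum(counts[clinic][care][0] for clinic in counts)
    yes = sum(counts[clinic][care][1] for clinic in counts)
    pooled[care] = (no, yes)

for clinic, table in counts.items():
    for care, (no, yes) in table.items():
        print(clinic, care, f"{100 * non_survival_rate(no, yes):.1f}%")
for care, (no, yes) in pooled.items():
    print("Pooled", care, f"{100 * non_survival_rate(no, yes):.1f}%")
```

Within each clinic the two care levels have nearly equal rates, yet the pooled table shows a large gap, because the amount of care received differs sharply between the clinics.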
Collapsibility

[Figure: graph on Care, Clinic, Surv.]

Theorem: A graphical log-linear model L is collapsible onto A iff the boundary of every connected component of $A^c$ is complete.
Bayesian Learning for Discrete ADG’s

• Example: three binary variables

• Five parameters:
Local and Global Independence
Bayesian learning
Consider a particular state pa(v)+ of pa(v)
Equivalence Classes and Chain Graphs

• ADG models for a fixed set of vertices decompose into Markov equivalence classes:

[Figure: three ADGs on A, B, C encoding A ⊥⊥ C | B]

[Figure: ADGs D1, D2, D3 on a, b, c, d encoding A ⊥⊥ D | B,C]

[Figure: ADG D4 on a, b, c, d encoding B ⊥⊥ C | A and A ⊥⊥ D | B,C]
Why is this a problem?

• Repeating analyses for equivalent ADGs leads to significant computational inefficiencies.
• Ensuring that equivalent ADGs have equal posterior probabilities imposes severe constraints on prior distributions (Geiger and Heckerman, 1995).
• Bayesian model averaging procedures that average across ADGs assign weights to statistical models that are proportional to equivalence class sizes.
Equivalence Class Characterization

Theorem (Verma & Pearl, Glymour et al, Frydenberg, AMP94):

Two ADGs are Markov equivalent iff they have the same
skeletons and the same immoralities.
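The criterion in the theorem is easy to test directly. The sketch below assumes each ADG is given as a dictionary of parent lists over the same vertex set; the function names are hypothetical.

```python
from itertools import combinations


def skeleton(parents):
    """Set of undirected edges obtained by dropping arrow directions."""
    return {frozenset((p, v)) for v, pa in parents.items() for p in pa}


def immoralities(parents):
    """Configurations p -> v <- q in which p and q are not adjacent."""
    skel = skeleton(parents)
    found = set()
    for v, pa in parents.items():
        for p, q in combinations(sorted(pa), 2):
            if frozenset((p, q)) not in skel:
                found.add((frozenset((p, q)), v))
    return found


def markov_equivalent(d1, d2):
    """Same skeleton and same immoralities."""
    return skeleton(d1) == skeleton(d2) and immoralities(d1) == immoralities(d2)


chain = {"A": [], "B": ["A"], "C": ["B"]}         # A -> B -> C
reverse_chain = {"C": [], "B": ["C"], "A": ["B"]}  # A <- B <- C
fork = {"B": [], "A": ["B"], "C": ["B"]}           # A <- B -> C
collider = {"A": [], "C": [], "B": ["A", "C"]}     # A -> B <- C

print(markov_equivalent(chain, reverse_chain))
print(markov_equivalent(chain, fork))
print(markov_equivalent(chain, collider))
```

The first three graphs all encode A ⊥⊥ C | B and form one equivalence class; the collider shares their skeleton but has an immorality at B, so it stands alone.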

Definition: The essential graph D* associated with D is the graph

D* := ∪(D′ | D′ ~ D),

[Figure: graphs on vertices a, b, c, d, including the ADGs D1, D2, D3]
Essential Graphs
AMP (1995)

• Essential graphs are chain graphs

• D* is the unique smallest chain graph Markov equivalent to D
• A graph G = (V, E) is equal to D* for some ADG D if and only if G satisfies the following four conditions:
(i) G is a chain graph;
(ii) For every chain component t of G, $G_t$ is chordal;
(iii) The configuration a → b - c (an arrow into b followed by an undirected edge) does not occur as an induced subgraph of G;
(iv) Every arrow a → b ∈ G is strongly protected in G.

[Figure: the four configurations (a), (b), (c), (d) in which a → b is strongly protected; in (d), c1 ≠ c2]

also Meek (1995) and Chickering (1995)

What’s a Chain Graph?

“Equivalence”:
a ~ b iff a b

Chain Components (“Boxes”)

Chain Graphs

[Figure: Venn diagram of model classes labeled UG, ADG, Decomposable, and CG]

•Chain graph Markov property, Frydenberg (1990)

•Equivalence results (LWF, AMP, Meek, Studeny)

[Figure: graph on A, B, C, D]

C ⊥⊥ D | A,B or C ⊥⊥ D | A ?

Cox & Wermuth (1996)
