An Introduction to Probabilistic Graphical Models

This document discusses probabilistic graphical models and their use in expert systems. It introduces probabilistic graphical models including probabilistic networks represented by graphs. It describes properties of conditional independence and how graphs can represent conditional independence relationships through d-separation. It discusses different types of graphs including undirected graphs, directed acyclic graphs, and how they can be used to represent factorizations and perform inference. It provides an example of the Lauritzen-Spiegelhalter algorithm for inference in graphical models.

PROBABILISTIC GRAPHICAL

MODELS

David Madigan
Rutgers University

[email protected]
Expert Systems

• Explosion of interest in “Expert Systems” in the early 1980s

IF the infection is primary-bacteremia
AND the site of the culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that infection is bacteroid.

• Many companies (Teknowledge, IntelliCorp, Inference, etc.), many IPOs, much media hype

• Ad-hoc uncertainty handling


Uncertainty in Expert Systems

If A then C (p1)
If B then C (p2)

What if both A and B are true?

Then C is true with certainty factor (CF):

p1 + (p2 × (1 − p1))
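
A minimal sketch of the combination rule above, for illustration only. The function names `combine_cf` and `combine_all` are hypothetical, and folding more than two rules by repeated pairwise combination is an assumption that the slide does not state.

```python
from functools import reduce


def combine_cf(p1: float, p2: float) -> float:
    """Combine two certainty factors that support the same conclusion."""
    return p1 + p2 * (1.0 - p1)


def combine_all(factors):
    """Fold any number of certainty factors pairwise (assumed extension)."""
    return reduce(combine_cf, factors, 0.0)


# Hypothetical rule strengths: "If A then C (0.7)" and "If B then C (0.4)".
print(combine_cf(0.7, 0.4))
```

The rule is symmetric in p1 and p2 and never exceeds 1, but nothing in it corresponds to a probability model, which is the point of the criticism quoted on this slide.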

“Currently fashionable ad-hoc mumbo jumbo”


A.F.M. Smith
Eschewed Probabilistic Approach

•Computationally intractable
•Inscrutable
•Requires vast amounts of data/elicitation

e.g., for n dichotomous variables need 2^n − 1 probabilities to fully specify the joint distribution
Conditional Independence

X ⫫ Y | Z ⇔ f_{X,Y|Z}(x, y | z) = f_{X|Z}(x | z) f_{Y|Z}(y | z)
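
The definition can be checked numerically on a small discrete joint distribution. The sketch below tests the equivalent product form p(x,y,z) p(z) = p(x,z) p(y,z) for every cell. The function name `conditionally_independent`, the helper `bern`, and the two example distributions (a chain X → Z → Y with made-up numbers, and Z = X xor Y) are assumptions for illustration.

```python
from itertools import product


def conditionally_independent(joint, tol=1e-12):
    """Test X ⫫ Y | Z for a joint given as {(x, y, z): probability}."""
    xs = sorted({k[0] for k in joint})
    ys = sorted({k[1] for k in joint})
    zs = sorted({k[2] for k in joint})

    def p(x, y, z):
        return joint.get((x, y, z), 0.0)

    for z in zs:
        p_z = sum(p(x, y, z) for x, y in product(xs, ys))
        for x, y in product(xs, ys):
            p_xz = sum(p(x, y2, z) for y2 in ys)
            p_yz = sum(p(x2, y, z) for x2 in xs)
            # f(x,y|z) = f(x|z) f(y|z), multiplied through by p(z)^2
            if abs(p(x, y, z) * p_z - p_xz * p_yz) > tol:
                return False
    return True


def bern(prob, value):
    """Probability that a binary variable takes `value` when Pr(1) = prob."""
    return prob if value else 1.0 - prob


# Chain X -> Z -> Y (illustrative numbers): X ⫫ Y | Z should hold.
chain = {
    (x, y, z): bern(0.7, x) * bern(0.5 if x else 0.1, z) * bern(0.2 if z else 0.6, y)
    for x, y, z in product((0, 1), repeat=3)
}
# Z = X xor Y with fair coins X, Y: conditioning on Z makes X and Y dependent.
xor = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

print(conditionally_independent(chain), conditionally_independent(xor))
```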
Conditional Independence

• Suppose A and B are marginally independent: Pr(A), Pr(B), Pr(C|A,B) × 4 = 6 probabilities
• Suppose A and C are conditionally independent given B: Pr(A), Pr(B|A) × 2, Pr(C|B) × 2 = 5 probabilities

[Figure: chain on the vertices A, B, C]

C ⫫ A | B

• Chain with 50 variables requires 99 probabilities versus 2^50 − 1
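
The counts on this slide can be reproduced with a short sketch. It assumes binary variables throughout and represents a directed model as a dictionary from each vertex to its list of parents; the function names are hypothetical.

```python
def full_joint_params(n: int) -> int:
    """Free probabilities in an unrestricted joint over n binary variables."""
    return 2**n - 1


def dag_params(parents: dict) -> int:
    """One free probability per parent configuration of each binary vertex."""
    return sum(2 ** len(ps) for ps in parents.values())


def chain(n: int) -> dict:
    """Parent map of the chain X1 -> X2 -> ... -> Xn."""
    return {i: ([i - 1] if i > 1 else []) for i in range(1, n + 1)}


print(dag_params({"A": [], "B": [], "C": ["A", "B"]}))  # A, B marginally independent
print(dag_params(chain(3)), dag_params(chain(50)), full_joint_params(50))
```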
Properties of Conditional Independence (Dawid, 1980)

For any probability measure P and random variables A, B, and C:

CI 1: A ⫫ B [P] ⇒ B ⫫ A [P]

CI 2: A ⫫ B ∪ C [P] ⇒ A ⫫ B [P]

CI 3: A ⫫ B ∪ C [P] ⇒ A ⫫ B | C [P]

CI 4: A ⫫ B and A ⫫ C | B [P] ⇒ A ⫫ B ∪ C [P]

Some probability measures also satisfy:

CI 5: A ⫫ B | C and A ⫫ C | B [P] ⇒ A ⫫ B ∪ C [P]

CI5 is satisfied whenever P has a positive joint probability density with respect to some product measure
Markov Properties for Undirected Graphs

(Global) S separates A from B ⇒ A ⫫ B | S
(Local) α ⫫ V \ cl(α) | bd(α)
(Pairwise) α ⫫ β | V \ {α, β}

(G) ⇒ (L) ⇒ (P)

[Figure: undirected graph on the vertices X1, X2, X3, X4, X5]

X2 ⫫ X5, X4 | X1, X3 (1)
⇒ X2 ⫫ X4 | X1, X3, X5 (2)

To go from (2) to (1) need X5 ⫫ X2 | X1, X3? or CI5

Lauritzen, Dawid, Larsen & Leimer (1990)
Factorizations

A density f is said to “factorize according to G” if:

f(x) = Π_{C ∈ 𝒞} ψ_C(x_C)

• 𝒞 is the set of cliques of G (cliques are maximal complete subgraphs); the ψ_C are the “clique potentials”

Proposition: If f factorizes according to a UG G, then it also obeys the global Markov property

“Proof”: Let S separate A from B in G and assume V = A ∪ B ∪ S. Let 𝒞_A be the set of cliques with non-empty intersection with A. Since S separates A from B, we must have B ∩ C = ∅ for all C in 𝒞_A. Then:

f(x) = Π_{C ∈ 𝒞_A} ψ_C(x_C) · Π_{C ∈ 𝒞 \ 𝒞_A} ψ_C(x_C) = f_1(x_{A∪S}) f_2(x_{B∪S})
Markov Properties for Acyclic Directed Graphs
(Bayesian Networks)

(Global) S separates A from B in (G_an(A,B,S))^m ⇒ A ⫫ B | S
(Local) α ⫫ nd(α) \ pa(α) | pa(α)

(G) ⇔ (L)

[Figure: two graphs on the vertices X1, X2, X3]

Lauritzen, Dawid, Larsen & Leimer (1990)


Factorizations

A density f admits a “recursive factorization” according to an ADG G if

f(x) = Π_{v ∈ V} f(x_v | x_pa(v))

ADG Global Markov Property ⇔ f(x) = Π_{v ∈ V} f(x_v | x_pa(v))

Lemma: If P admits a recursive factorization according to an ADG G, then P factorizes according to G^m (and chordal supergraphs of G^m)

Lemma: If P admits a recursive factorization according to an ADG G, and A is an ancestral set in G, then P_A admits a recursive factorization according to the subgraph G_A
Factorizations

[Figure: ADG on the vertices A, B, C, D, E, F, G, H, S]

p(A,B,C,D,E,F,G,H,S) = p(A) p(C|A) p(D|C) p(S|D,F) p(E|S) p(F|G) p(G|B) p(H|S,B) p(B)

⇒

p(S|A,B,C,D,E,F,G,H) ∝ p(S|D,F) p(E|S) p(H|S,B)

{D,F,E,H,B} is the “Markov Blanket” of S. It contains the parents of S (D, F), the children of S (E, H), and the other parents of the children of S (B).
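
The Markov blanket can be read off the factorization mechanically. The sketch below encodes the parent sets implied by the factorization on this slide; the dictionary representation and the function name `markov_blanket` are assumptions for illustration.

```python
# Parent sets taken from p(A)p(C|A)p(D|C)p(S|D,F)p(E|S)p(F|G)p(G|B)p(H|S,B)p(B).
PARENTS = {
    "A": [], "B": [], "C": ["A"], "D": ["C"], "E": ["S"],
    "F": ["G"], "G": ["B"], "H": ["S", "B"], "S": ["D", "F"],
}


def markov_blanket(parents: dict, v: str) -> set:
    """Parents of v, children of v, and the children's other parents."""
    children = {c for c, ps in parents.items() if v in ps}
    co_parents = {p for c in children for p in parents[c]}
    return (set(parents[v]) | children | co_parents) - {v}


print(sorted(markov_blanket(PARENTS, "S")))
```

Only the factors that mention S survive when conditioning on everything else, which is why the blanket is exactly the set of variables appearing in p(S|D,F), p(E|S) and p(H|S,B).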
Markov Properties for Acyclic Directed Graphs
(Bayesian Networks)

(Global) S separates A from B in (G_an(A,B,S))^m ⇒ A ⫫ B | S
(Local) α ⫫ nd(α) \ pa(α) | pa(α)

(G) ⇒ (L): α ∪ nd(α) is an ancestral set; pa(α) obviously separates α from nd(α) \ pa(α) in (G_an(α ∪ nd(α)))^m

(L) ⇒ (factorization): induction on the number of vertices

d-separation

A chain π from a to b in an acyclic directed graph G is said to be blocked by S if it contains a vertex γ ∈ π such that either:
- γ ∈ S and arrows of π do not meet head to head at γ, or
- neither γ nor any of its descendants is in S, and arrows of π do meet head to head at γ
Two subsets A and B are d-separated by S if all chains from A to B are blocked by S
d-separation and Global Markov Property

Let A, B, and S be disjoint subsets of the vertices of a directed acyclic graph G. Then S d-separates A from B if and only if S separates A from B in (G_an(A,B,S))^m, the moral graph of the smallest ancestral set containing A ∪ B ∪ S
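
This equivalence gives a simple way to test d-separation without enumerating chains: restrict to the ancestral set, moralize, delete S, and check connectivity. The sketch below follows those steps under the assumption that the graph is given as a dictionary from every vertex to its list of parents; the function names and the example DAG (the nine-vertex graph from the earlier factorization slide) are chosen for illustration.

```python
from itertools import combinations


def ancestral_set(parents, nodes):
    """Smallest ancestral set containing `nodes`."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def moral_graph(parents, keep):
    """Undirected adjacency of the moralized subgraph induced by `keep`.

    `keep` must be ancestral, so every parent of a kept vertex is kept too.
    """
    adj = {v: set() for v in keep}
    for v in keep:
        for p in parents[v]:                       # drop edge directions
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(parents[v], 2):   # marry the parents
            adj[p].add(q)
            adj[q].add(p)
    return adj


def d_separated(parents, a, b, s):
    """True if S separates A from B in the moral graph of An(A ∪ B ∪ S)."""
    a, b, s = set(a), set(b), set(s)
    adj = moral_graph(parents, ancestral_set(parents, a | b | s))
    seen, stack = set(), list(a)
    while stack:
        v = stack.pop()
        if v in seen or v in s:
            continue
        seen.add(v)
        stack.extend(adj[v] - s - seen)
    return not (seen & b)


PARENTS = {
    "A": [], "B": [], "C": ["A"], "D": ["C"], "E": ["S"],
    "F": ["G"], "G": ["B"], "H": ["S", "B"], "S": ["D", "F"],
}
print(d_separated(PARENTS, {"D"}, {"F"}, set()))   # no common ancestor
print(d_separated(PARENTS, {"D"}, {"F"}, {"S"}))   # conditioning on a collider
```

Conditioning on S, or on its descendant E, brings S into the ancestral set, so moralization marries D and F and the separation fails.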
UG – ADG Intersection

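[Figure: example graphs on the vertices A, B, C (and D) illustrating models in the UG class, the ADG class, and their intersection. Independence statements shown: C ⫫ A | B; A ⫫ C | B; A ⫫ C; A ⫫ B | C,D and C ⫫ D | A,B]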
UG – ADG Intersection

[Figure: Venn diagram of the UG and ADG model classes; their intersection is the decomposable models]

• UG is decomposable if chordal
• ADG is decomposable if moral
• Decomposable ~ closed-form log-linear models

No CI5
Chordal Graphs and RIP

• Chordal graphs, and only chordal graphs, admit clique orderings that have the Running Intersection Property (RIP)

[Figure: chordal graph on the vertices V, T, L, A, S, X, D, B]

1. {V,T}
2. {A,L,T}
3. {L,A,B}
4. {S,L,B}
5. {A,B,D}
6. {A,X}

• The intersection of each set with the union of those earlier in the list is fully contained in a single earlier set
• Can compute conditional probabilities (e.g. Pr(X|V)) by message passing (Lauritzen & Spiegelhalter, Dawid, Jensen)
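
The Running Intersection Property is easy to verify for a given ordering. The following sketch checks the clique list from this slide; the function name `has_running_intersection` is hypothetical, and the second ordering is a deliberately scrambled example added for contrast.

```python
def has_running_intersection(cliques) -> bool:
    """Check that each clique's overlap with all earlier cliques
    lies inside one single earlier clique."""
    seen = set()
    for i, clique in enumerate(cliques):
        clique = set(clique)
        separator = clique & seen
        if i > 0 and not any(separator <= set(prev) for prev in cliques[:i]):
            return False
        seen |= clique
    return True


SLIDE_ORDER = [
    {"V", "T"}, {"A", "L", "T"}, {"L", "A", "B"},
    {"S", "L", "B"}, {"A", "B", "D"}, {"A", "X"},
]
# Same cliques, but {S,L,B} moved forward: {A,L,T} then overlaps {V,T} and
# {S,L,B} in {L,T}, which no single earlier clique contains.
SCRAMBLED = [
    {"V", "T"}, {"S", "L", "B"}, {"A", "L", "T"},
    {"L", "A", "B"}, {"A", "B", "D"}, {"A", "X"},
]

print(has_running_intersection(SLIDE_ORDER), has_running_intersection(SCRAMBLED))
```

Note that clique 5, {A,B,D}, overlaps the earlier cliques in {A,B}, which is contained in clique 3 rather than in the immediately preceding clique 4; the property only requires some earlier clique.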
Probabilistic Expert System

•Computationally intractable
•Inscrutable
•Requires vast amounts of data/elicitation

•Chordal UG models facilitate fast inference

• ADG models better for expert system applications: more natural to specify Pr(v | pa(v))
Factorizations

UG Global Markov Property ⇔ f(x) = Π_{C ∈ 𝒞} ψ_C(x_C)

ADG Global Markov Property ⇔ f(x) = Π_{v ∈ V} f(x_v | x_pa(v))
Lauritzen-Spiegelhalter Algorithm

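[Figure: ADG on the vertices A, B, C, D, E, F, G, H, S, shown next to its moralized and triangulated undirected version]

• Moralize
• Triangulate

Assignment of the conditional probabilities to clique potentials:

ψ(C,S,D) ← Pr(S|C,D)
ψ(A,E) ← Pr(E|A) Pr(A)
ψ(C,E) ← Pr(C|E)
ψ(F,D,B) ← Pr(D|F) Pr(B|F) Pr(F)
ψ(D,B,S) ← 1
ψ(B,S,G) ← Pr(G|S,B)
ψ(H,S) ← Pr(H|S)

Algorithm is widely deployed in commercial software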
L&S Toy Example

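[Figure: ADG A → B → C and the corresponding undirected graph on A, B, C]

Pr(A) = 0.7
Pr(B|A) = 0.5    Pr(B|¬A) = 0.1
Pr(C|B) = 0.2    Pr(C|¬B) = 0.6

ψ(A,B) ← Pr(B|A)Pr(A)
ψ(B,C) ← Pr(C|B)

Cliques AB and BC, with separator B. Initial potentials:

ψ(A,B)    B       ¬B
A         0.35    0.35
¬A        0.03    0.27

ψ(B)      B       ¬B
          1       1

ψ(B,C)    C       ¬C
B         0.2     0.8
¬B        0.6     0.4

Message Schedule: AB → BC, BC → AB, then Pr(A|C)

After the message AB → BC:

ψ(B)      B       ¬B
          0.38    0.62

ψ(B,C)    C       ¬C
B         0.076   0.304
¬B        0.372   0.248

After entering the evidence C:

ψ(B,C)    C       ¬C
B         0.076   0
¬B        0.372   0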
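
The message schedule of the toy example can be traced in a few lines of plain Python. This is a sketch of the two-clique case only, not a general junction-tree implementation; the variable names and the dictionary representation of potentials are assumptions, while the numbers are those on the slide.

```python
STATES = (True, False)


def bern(prob, value):
    """Probability that a binary variable takes `value` when Pr(True) = prob."""
    return prob if value else 1.0 - prob


pr_a = 0.7
pr_b_given_a = {True: 0.5, False: 0.1}
pr_c_given_b = {True: 0.2, False: 0.6}

# Initial potentials: psi(A,B) = Pr(B|A)Pr(A), psi(B,C) = Pr(C|B), separator = 1.
psi_ab = {(a, b): bern(pr_a, a) * bern(pr_b_given_a[a], b)
          for a in STATES for b in STATES}
psi_bc = {(b, c): bern(pr_c_given_b[b], c) for b in STATES for c in STATES}
sep_b = {b: 1.0 for b in STATES}

# Message AB -> BC: marginalize A out, rescale psi(B,C) by new/old separator.
new_sep = {b: sum(psi_ab[(a, b)] for a in STATES) for b in STATES}
psi_bc = {(b, c): v * new_sep[b] / sep_b[b] for (b, c), v in psi_bc.items()}
sep_b = new_sep
joint_bc = dict(psi_bc)  # at this point psi(B,C) equals Pr(B,C)

# Enter the evidence C = true by zeroing the entries with C = false.
psi_bc = {(b, c): (v if c else 0.0) for (b, c), v in psi_bc.items()}

# Message BC -> AB: marginalize C out, rescale psi(A,B).
new_sep = {b: sum(psi_bc[(b, c)] for c in STATES) for b in STATES}
psi_ab = {(a, b): v * new_sep[b] / sep_b[b] for (a, b), v in psi_ab.items()}
sep_b = new_sep

pr_c = sum(psi_ab.values())                                  # Pr(C)
pr_a_given_c = sum(psi_ab[(True, b)] for b in STATES) / pr_c
print(round(pr_c, 3), round(pr_a_given_c, 3))
```

After the second message ψ(A,B) holds Pr(A, B, C = true), so normalizing gives Pr(C) = 0.448 and Pr(A|C) = 0.28 / 0.448 = 0.625, which agrees with direct enumeration over the chain.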
Other Theoretical Developments

Do the UG and ADG global Markov properties identify all the conditional independences implied by the corresponding factorizations?

Yes. Completeness for ADGs by Geiger and Pearl (1988); for UGs by Frydenberg (1988)

Graphical characterization of collapsibility in hierarchical log-linear models (Asmussen and Edwards, 1983)
Collapsibility

Clinic A
                Survival
Care      No      Yes     % No
Less      3       176     1.7%
More      4       293     1.4%

Clinic B
                Survival
Care      No      Yes     % No
Less      17      197     7.9%
More      2       23      8.0%

Pooled
                Survival
Care      No      Yes     % No
Less      20      373     5.1%
More      6       316     1.9%
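
The effect of pooling can be recomputed from the counts. The sketch below derives the pooled table and compares the care/survival odds ratio within each clinic with the pooled one; the odds ratio is an added summary that the slide does not show, and all names are illustrative.

```python
# (non-survivors, survivors) by amount of care, per clinic, as on the slide.
COUNTS = {
    "Clinic A": {"Less": (3, 176), "More": (4, 293)},
    "Clinic B": {"Less": (17, 197), "More": (2, 23)},
}


def mortality(no: int, yes: int) -> float:
    """Proportion of non-survivors in one row of a table."""
    return no / (no + yes)


def odds_ratio(table: dict) -> float:
    """Odds of non-survival with less care relative to more care."""
    (less_no, less_yes), (more_no, more_yes) = table["Less"], table["More"]
    return (less_no * more_yes) / (less_yes * more_no)


# Collapse over clinic by summing the two tables cell by cell.
pooled = {
    care: tuple(sum(clinic[care][i] for clinic in COUNTS.values()) for i in range(2))
    for care in ("Less", "More")
}

for name, table in {**COUNTS, "Pooled": pooled}.items():
    rates = {care: round(100 * mortality(*cells), 1) for care, cells in table.items()}
    print(name, rates, "odds ratio:", round(odds_ratio(table), 2))
```

Within each clinic the odds ratio is close to 1, while the pooled table suggests a strong association between care and survival, so the model is not collapsible over clinic.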
Collapsibility

[Figure: undirected graph on the vertices Care, Clinic, Surv.]

Theorem: A graphical log-linear model L is collapsible onto A iff the boundary of every connected component of A^c is complete.
Bayesian Learning for Discrete ADGs

• Example: three binary variables

• Five parameters:
Local and Global Independence
Bayesian learning
Consider a particular state pa(v)+ of pa(v)
Equivalence Classes and Chain Graphs

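• ADG models for a fixed set of vertices decompose into Markov equivalence classes:

[Figure: three ADGs on the vertices A, B, C, each encoding A ⫫ C | B]

[Figure: ADGs D1, D2, D3 on the vertices a, b, c, d, each encoding A ⫫ D | B,C and B ⫫ C | A]

[Figure: ADG D4 on the vertices a, b, c, d, encoding A ⫫ D | B,C and B ⫫ C]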
Why is this a problem?

• Repeating analyses for equivalent ADGs leads to significant computational inefficiencies.
• Ensuring that equivalent ADGs have equal posterior
probabilities imposes severe constraints on prior
distributions (Geiger and Heckerman, 1995).
• Bayesian model averaging procedures that average across
ADGs assign weights to statistical models that are
proportional to equivalence class sizes.
Equivalence Class Characterization

Theorem (Verma & Pearl, Glymour et al., Frydenberg, AMP94): Two ADGs are Markov equivalent iff they have the same skeleton and the same immoralities.
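
The criterion translates directly into a test on two parent maps. In the sketch below an immorality is recorded as a pair of non-adjacent parents together with their common child; the dictionary representation and the function names are assumptions for illustration.

```python
from itertools import combinations


def skeleton(parents: dict) -> set:
    """Undirected edges obtained by dropping arrow directions."""
    return {frozenset((p, v)) for v, ps in parents.items() for p in ps}


def immoralities(parents: dict) -> set:
    """Configurations p -> v <- q in which p and q are not adjacent."""
    edges = skeleton(parents)
    found = set()
    for v, ps in parents.items():
        for p, q in combinations(sorted(ps), 2):
            if frozenset((p, q)) not in edges:
                found.add((frozenset((p, q)), v))
    return found


def markov_equivalent(d1: dict, d2: dict) -> bool:
    """Same vertices, same skeleton, same immoralities."""
    return (set(d1) == set(d2)
            and skeleton(d1) == skeleton(d2)
            and immoralities(d1) == immoralities(d2))


chain_fwd = {"A": [], "B": ["A"], "C": ["B"]}      # A -> B -> C
chain_bwd = {"A": ["B"], "B": ["C"], "C": []}      # A <- B <- C
fork = {"A": ["B"], "B": [], "C": ["B"]}           # A <- B -> C
collider = {"A": [], "B": ["A", "C"], "C": []}     # A -> B <- C

print(markov_equivalent(chain_fwd, chain_bwd), markov_equivalent(chain_fwd, fork))
print(markov_equivalent(chain_fwd, collider))
```

The two chains and the fork all encode A ⫫ C | B and form one equivalence class; the collider shares their skeleton but has an immorality at B, so it stands alone.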

Definition The essential graph D* associated with D is the graph


D* := ∪(D’|D’ ~ D),

[Figure: graphs on the vertices a, b, c, d, including the ADGs D1, D2, D3]
Essential Graphs
AMP (1995)

• Essential graphs are chain graphs
• D* is the unique smallest chain graph Markov equivalent to D
• A graph G = (V, E) is equal to D* for some ADG D if and only if G satisfies the following four conditions:
(i) G is a chain graph;
(ii) For every chain component t of G, G_t is chordal;
(iii) The configuration a → b – c does not occur as an induced subgraph of G;
(iv) Every arrow a → b ∈ G is strongly protected in G.

[Figure: the four configurations (a), (b), (c), (d) in which an arrow a → b is strongly protected, involving a third vertex c or, in (d), two vertices c1 ≠ c2]

also Meek (1995) and Chickering (1995)


What’s a Chain Graph?

“Equivalence”:
a ~ b iff a and b are joined by an undirected path

Chain Components (“Boxes”): the equivalence classes of ~


Chain Graphs

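[Figure: Venn diagram of the UG and ADG model classes, overlapping in the decomposable models, within the chain graph (CG) models]

• Chain graph Markov property, Frydenberg (1990)
• Equivalence results (LWF, AMP, Meek, Studeny)

[Figure: chain graph on the vertices A, B, C, D]

C ⫫ D | A,B or C ⫫ D | A?

Cox & Wermuth (1996)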
