Lecture 4: Exact Inference (Scribe Notes)
In practice, exact inference is not used widely, and most probabilistic inference algorithms are approximate.
Nevertheless, it is important to understand exact inference and its limitations.
There are two typical tasks with graphical models: inference and learning. Given a graphical model M that
describes a unique probability distribution P_M, inference means answering queries about P_M, e.g., P_M(X|Y).
Learning means obtaining a point estimate of the model M from data D. In statistics, however, both inference
and learning are commonly referred to as either inference or estimation. From the Bayesian perspective,
for example, learning p(M|D) is itself an inference problem. When not all variables are observable,
computing point estimates of M also requires inference to impute the missing data.
1.1 Likelihood
One of the simplest queries one may ask is likelihood estimation. The likelihood estimation of a probability
distribution is to compute the probability of the given evidence, where evidence is an assignment of values
to a subset of variables. Likelihood estimation involves marginalization of the other variables. Formally, let
e and x = (x_1, \ldots, x_k) denote the evidence and the remaining variables, respectively; the likelihood of e is

P(e) = \sum_{x_1} \cdots \sum_{x_k} P(x_1, \ldots, x_k, e).
From the perspective of computational statistics, calculating this is computationally expensive because the
computation involves exploring exponentially many configurations.
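To see why this is expensive, here is a minimal brute-force sketch in Python (the function name, the toy joint, and the variable names are illustrative assumptions, not from the lecture): it enumerates all configurations of the hidden variables, which is exactly the exponential blow-up described above.

import itertools

def brute_force_likelihood(joint, var_domains, evidence):
    """Sum the joint over all configurations of the non-evidence variables."""
    hidden = [v for v in var_domains if v not in evidence]
    total = 0.0
    # Enumerate every configuration of the hidden variables: k^|hidden| terms.
    for values in itertools.product(*(var_domains[v] for v in hidden)):
        assignment = dict(zip(hidden, values))
        assignment.update(evidence)
        total += joint(assignment)
    return total

# Toy joint over three independent binary variables (purely illustrative).
p = {"A": 0.6, "B": 0.3, "C": 0.8}
joint = lambda a: ((p["A"] if a["A"] else 1 - p["A"])
                   * (p["B"] if a["B"] else 1 - p["B"])
                   * (p["C"] if a["C"] else 1 - p["C"]))
domains = {v: [0, 1] for v in "ABC"}
print(brute_force_likelihood(joint, domains, {"C": 1}))  # P(C = 1) = 0.8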
Another type of queries is the conditional probability of variables given evidence. The probability of variables
X given evidence e is
P(X|e) = \frac{P(X, e)}{P(e)} = \frac{P(X, e)}{\sum_x P(X = x, e)}.

This is called the a posteriori belief in X given evidence e.
As in likelihood estimation, calculating a conditional probability involves marginalizing out the variables that
are not of interest (i.e., the summation in the equation above). We will cover efficient algorithms for
marginalization later in this class.
Calculating a conditional probability goes by different names depending on the query nodes. When the
query node is a terminal variable in a directed graphical model, the inference process is called prediction.
If the query node is an ancestor of the evidence, the inference process is called diagnosis, e.g., computing the
probability of a disease or fault given observed symptoms. For instance, in a deep belief network (a model
built by stacking restricted Boltzmann machines into multiple hidden layers), the hidden layers are inferred
given the data.

y1   y2   P(y1, y2)
0    0    0.35
0    1    0.05
1    0    0.30
1    1    0.30

Table 1: An example joint distribution over two binary variables, used below to illustrate the MPA query.
We may query the most probable assignment (MPA) for a subset of variables in the domain. Given evidence
e, query variables Y , and the other variables Z, the MPA of Y is
MPA(Y | e) = \arg\max_{y} P(y | e) = \arg\max_{y} \sum_{z} P(y, z | e).
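Table 1 shows why the MPA can differ from maximizing each marginal separately. From the table, P(y_1 = 1) = 0.3 + 0.3 = 0.6 > P(y_1 = 0) = 0.4 and P(y_2 = 0) = 0.35 + 0.3 = 0.65 > P(y_2 = 1) = 0.35, so maximizing the two marginals individually suggests (y_1, y_2) = (1, 0), whose joint probability is only 0.3. The joint MPA is instead (y_1, y_2) = (0, 0), with probability 0.35.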
2 Approaches to Inference
There are two types of inference techniques: exact inference and approximate inference. Exact inference
algorithms calculate the exact value of probability P (X|Y ). Algorithms in this class include the elimination
algorithm, the message-passing algorithm (sum-product, belief propagation), and the junction tree algo-
rithms. We will not cover the junction tree algorithms in this class, as they are outdated and confusing.
Exact inference on arbitrary graphical models is NP-hard; however, we can improve efficiency for particular
families of graphical models. Approximate inference techniques include stochastic
simulation and sampling methods, Markov chain Monte Carlo methods, and variational algorithms.
Figure 1: (a) a directed chain A → B → C → D → E; (b) an undirected chain A − B − C − D − E; (c) a hidden Markov model (chain conditional random field) with hidden states y_1, \ldots, y_T; (d) a more complex network (a food web).
3 Elimination
3.1 Chains
Let’s first consider inference on the simple chain in Figure 1a. The probability P (E = e) can be calculated
as
P(e) = \sum_d \sum_c \sum_b \sum_a P(a, b, c, d, e).
This naive summation enumerates over exponentially many configurations of the variables and thus is inef-
ficient. But if we exploit the chain structure, the marginal probability can be calculated as

P(e) = \sum_d \sum_c \sum_b \sum_a P(a) P(b|a) P(c|b) P(d|c) P(e|d)
     = \sum_d P(e|d) \sum_c P(d|c) \sum_b P(c|b) \sum_a P(a) P(b|a)
     = \sum_d P(e|d) \sum_c P(d|c) \sum_b P(c|b) P(b)
     = \cdots

Each summation involves enumerating over only two variables, and there are four such summations.
Therefore, in general, the time complexity of the elimination algorithm is O(nk^2), where n is the number
of variables and k is the number of possible values of each variable. That is, for simple chains, exact inference
can be done in polynomial time, as opposed to the exponential time of the naive approach.
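A minimal sketch of this computation for the chain of Figure 1a, assuming tabulated CPDs generated at random (the names and the random potentials are illustrative, not from the lecture): each elimination step is one vector-matrix product, so the total cost is O(nk^2) rather than O(k^n).

import numpy as np

# Hypothetical tabulated CPDs for the chain A -> B -> C -> D -> E, each with k states.
k = 3
rng = np.random.default_rng(0)
p_a = rng.dirichlet(np.ones(k))                               # P(A)
cpds = [rng.dirichlet(np.ones(k), size=k) for _ in range(4)]  # cpd[i, j] = P(next = j | prev = i)

def chain_marginal(p_a, cpds):
    """Eliminate A, B, C, D in turn; each step is one O(k^2) vector-matrix product."""
    message = p_a                    # m(a) = P(a)
    for cpd in cpds:
        message = message @ cpd      # m(next) = sum_prev m(prev) P(next | prev)
    return message                   # = P(E)

p_e = chain_marginal(p_a, cpds)
print(p_e, p_e.sum())                # a valid distribution over E (sums to 1)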
Let's consider the hidden Markov model in Figure 1c. The conditional probability of variable Y_i given X = (x_1, \ldots, x_T) is proportional to the joint marginal P(y_i, x_1, \ldots, x_T), which can be calculated as

P(y_i, x_1, \ldots, x_T) = \sum_{y_1} \cdots \sum_{y_{i-1}} \sum_{y_{i+1}} \cdots \sum_{y_T} P(y_1, \ldots, y_T, x_1, \ldots, x_T)
= \sum_{y_1} \cdots \sum_{y_{i-1}} \sum_{y_{i+1}} \cdots \sum_{y_T} P(y_1) P(x_1|y_1) P(y_2|y_1) \cdots P(y_T|y_{T-1}) P(x_T|y_T)
= \sum_{y_2} \cdots \sum_{y_{i-1}} \sum_{y_{i+1}} \cdots \sum_{y_T} P(x_2|y_2) \cdots P(y_T|y_{T-1}) P(x_T|y_T) \sum_{y_1} P(y_1) P(x_1|y_1) P(y_2|y_1)
= \sum_{y_2} \cdots \sum_{y_{i-1}} \sum_{y_{i+1}} \cdots \sum_{y_T} P(x_2|y_2) \cdots P(y_T|y_{T-1}) P(x_T|y_T) m(x_1, y_2)
= \sum_{y_3} \cdots \sum_{y_{i-1}} \sum_{y_{i+1}} \cdots \sum_{y_T} P(x_3|y_3) \cdots P(y_T|y_{T-1}) P(x_T|y_T) m(x_1, x_2, y_3)
= \cdots

Normalizing the result over y_i then gives P(y_i | x_1, \ldots, x_T).
In fact, m(x1 , y2 ) is equal to the marginal probability P (x1 , y2 ).
Let’s consider the undirected chain in Figure 1b. With the elimination algorithm, the marginal probability
P (E = e) can be calculated similarly as
P(e) = \frac{1}{Z} \sum_d \sum_c \sum_b \sum_a \phi(b, a) \phi(c, b) \phi(d, c) \phi(e, d)
     = \frac{1}{Z} \sum_d \sum_c \sum_b \phi(c, b) \phi(d, c) \phi(e, d) \sum_a \phi(b, a)
     = \cdots
In general, we can view the task at hand as that of computing the value of an expression of the form

\sum_z \prod_{\phi \in \mathcal{F}} \phi,

where \mathcal{F} is a set of factors and z is the set of variables to be summed out.
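The general case can be sketched with a single primitive that eliminates one variable from a set of factors. Below is a minimal, illustrative Python implementation; the (scope, table) factor representation and the function name eliminate are assumptions, not from the lecture.

from itertools import product

def eliminate(factors, z, domains):
    """Sum variable z out of the product of the factors that mention z.

    factors: list of (scope, table) pairs, where scope is a tuple of variable
             names and table maps a tuple of values (ordered as scope) to a number.
    Returns the untouched factors plus the newly created factor over z's neighbors.
    """
    touching = [f for f in factors if z in f[0]]
    rest = [f for f in factors if z not in f[0]]
    new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {z}))
    new_table = {}
    for vals in product(*(domains[v] for v in new_scope)):
        assignment = dict(zip(new_scope, vals))
        total = 0.0
        for z_val in domains[z]:                      # sum over the eliminated variable
            assignment[z] = z_val
            prod = 1.0
            for scope, table in touching:             # product of the factors touching z
                prod *= table[tuple(assignment[v] for v in scope)]
            total += prod
        new_table[vals] = total
    return rest + [(new_scope, new_table)]

Running eliminate once per non-query variable, in some chosen order, and multiplying the factors that remain gives the desired marginal up to normalization.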
4 Variable Elimination
We can extend the elimination algorithm to arbitrary graphical models. For a directed graphical model,
P(X_1, e) = \sum_{x_n} \cdots \sum_{x_3} \sum_{x_2} \prod_{i \in V} P(x_i | pa_i).
Note that the elimination algorithm has no benefit if the innermost term includes all variables, that is, if x_i
depends on all the other variables. However, in most problems, the number of variables in the innermost
term is smaller than the total number of variables.
For undirected graphical models,
P(X_1 | e) = \frac{\phi(X_1, e)}{\sum_{x_1} \phi(x_1, e)},

where \phi(X_1, e) denotes the unnormalized factor obtained after eliminating all other variables.
Let's consider the graphical model in Figure 1d. The joint probability distribution factorizes as

P(a, b, c, d, e, f, g, h) = P(a) P(b) P(c|b) P(d|a) P(e|c, d) P(f|a) P(g|e) P(h|e, f).

Suppose we query P(A | h̃) and eliminate the variables in the order H, G, F, E, D, C, B.
We condition on the evidence node H by fixing its value to h. To treat marginalization and conditioning as
formally equivalent, we can define an evidence potential δ(h = h̃) whose value is one if the inner statement
is true and zero otherwise. Then, we obtain
P(H = \tilde{h} | e, f) = \sum_h P(h | e, f) \delta(h = \tilde{h}).
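In an implementation, conditioning can thus be handled by just appending one more factor. A small sketch, assuming the (scope, table) factor representation used in the elimination sketch earlier (the helper name is hypothetical):

def evidence_potential(var, observed, domains):
    """delta(var = observed): a single-variable factor that selects the evidence value."""
    table = {(value,): 1.0 if value == observed else 0.0 for value in domains[var]}
    return ((var,), table)

Appending this factor to the factor list and then eliminating H with the routine above has the same effect as slicing every table at H = h̃.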
Therefore, eliminating H, G, F, E, D, C, and B in turn produces a sequence of intermediate factors, the last of which is m_b(a), and

P(a | \tilde{h}) = \frac{P(a, \tilde{h})}{P(\tilde{h})} = \frac{P(a) m_b(a)}{\sum_a P(a) m_b(a)}.
In general, each elimination step has the form

m_x(y_1, \ldots, y_k) = \sum_x m'_x(x, y_1, \ldots, y_k), \qquad \text{where } m'_x(x, y_1, \ldots, y_k) = \prod_{i=1}^{k} m_i(x, y_{c_i}).

The first equation requires |Val(X)| \cdot \prod_i |Val(Y_{c_i})| additions, and the second requires k \cdot |Val(X)| \cdot \prod_i |Val(Y_{c_i})| multiplications. Therefore the computational complexity is exponential in the number of variables in the intermediate factor.
5 Graph Elimination

The equations in the previous section describe the elimination process from the perspective of mathematics.
We can also describe it from the perspective of graphs, via the graph elimination algorithm.
There are two main steps in the graph elimination algorithm: moralization and (undirected) graph
elimination.
5.1 Moralization
Moralization is the process of converting a directed acyclic graph (DAG) into an "equivalent" undirected graph.
Its procedure is:
(1) For every node, connect ("marry") all of its parents by adding undirected edges between them.
(2) Drop the directions of all edges.
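For concreteness, here is a minimal sketch of moralization in Python, assuming the DAG is given as a dictionary mapping every node to the list of its parents (the function name and representation are illustrative); it is applied to the network of Figure 1d.

from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG given as {node: [parents]}."""
    edges = set()
    for child, pas in parents.items():
        for p in pas:                              # keep every edge, dropping its direction
            edges.add(frozenset((p, child)))
        for p1, p2 in combinations(pas, 2):        # "marry" all co-parents of each node
            edges.add(frozenset((p1, p2)))
    return edges

# The network of Figure 1d.
parents = {"a": [], "b": [], "c": ["b"], "d": ["a"], "e": ["c", "d"],
           "f": ["a"], "g": ["e"], "h": ["e", "f"]}
print(frozenset(("c", "d")) in moralize(parents))  # True: the parents of e are married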
Now we can interpret the elimination algorithm from the perspective of graph elimination. As shown in
Figure 2 below, each summation step in elimination can be represented by an elimination step in the graph
elimination algorithm. Moreover, the intermediate terms in elimination correspond to the elimination cliques
resulting from the graph elimination algorithm (Figure 3a), and we can also construct a clique tree to
represent the elimination process (Figure 3b).
Figure 3: (a) Elimination cliques; (b) the corresponding clique tree.
5.4 Complexity
The overall complexity is determined by the size of the largest elimination clique. A "good" elimination
ordering keeps the largest clique relatively small. The tree-width k is introduced to study this problem: it is
defined as one less than the smallest achievable cardinality of the largest elimination clique, ranging
over all possible elimination orderings. However, finding k, as well as the "best" elimination ordering, is
NP-hard. For some graphs, such as stars (Figure 4a, k = 2 − 1 = 1) and trees (Figure 4b, k = 2 − 1 = 1), we
can easily determine the tree-width, while for graphs like the Ising model (Figure 4c) it is very hard to
compute k.
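To make the role of the ordering concrete, the following sketch (the function name and the edge-list representation are assumptions) runs undirected graph elimination with a given ordering and reports the size of the largest elimination clique; the tree-width is the minimum of this size over all orderings, minus one.

def max_elimination_clique(edges, order):
    """Eliminate nodes in the given order; return the size of the largest elimination clique."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    largest = 0
    for x in order:
        clique = nbrs.get(x, set()) | {x}       # x together with its current neighbors
        largest = max(largest, len(clique))
        remaining = nbrs.pop(x, set())
        for u in remaining:                     # connect the remaining neighbors to each other
            nbrs[u].discard(x)
            nbrs[u] |= remaining - {u}
    return largest

# A star with hub 0 and five leaves: eliminating the leaves first gives cliques of
# size 2 (tree-width 2 - 1 = 1), while eliminating the hub first creates a 6-clique.
star = [(0, i) for i in range(1, 6)]
print(max_elimination_clique(star, [1, 2, 3, 4, 5, 0]))  # 2
print(max_elimination_clique(star, [0, 1, 2, 3, 4, 5]))  # 6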
6 Message Passing
Figure 5: Tree GMs: from left to right undirected tree, directed tree, polytree
One limitation of elimination is that it only answers one query (e.g., about one node). To answer
other queries (nodes) as well, the notion of a message is introduced. Each step of elimination is actually a
message passed on a clique tree. Although different queries have different clique trees, the messages passed
through the tree can be reused.
There are mainly three types of trees (Figure 5): undirected tree, directed tree and polytree. In fact,
directed and undirected trees are equivalent, since:
(1) Undirected trees can be converted to directed by choosing a root and directing all edges away from it.
(2) A directed tree and the corresponding undirected tree make the same conditional independence assertions.
(3) Parameterizations are essentially the same: p(x) = \frac{1}{Z} \prod_{i \in V} \psi(x_i) \prod_{(i,j) \in E} \psi(x_i, x_j).
We can show that elimination on trees is equivalent to message passing along tree branches. As shown in
Figure 6a, let m_{ji}(x_i) denote the factor resulting from eliminating variables from below up to i, which is a
function of x_i:

m_{ji}(x_i) = \sum_{x_j} \Big( \psi(x_j) \psi(x_i, x_j) \prod_{k \in N(j) \setminus i} m_{kj}(x_j) \Big)
To efficiently compute the marginal distributions of all nodes in a graph, a Message Passing Protocol (Figure
6b) is introduced: a node can send a message to a neighbor when (and only when) it has received messages
from all of its other neighbors. Based on this protocol, a naive approach is to treat each node in turn as
the root and execute the message passing algorithm for it. The complexity of this naive approach is O(NC),
where N is the number of nodes and C is the complexity of a complete message passing pass.
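A compact sketch of this protocol on an undirected tree with pairwise potentials, assuming NumPy arrays for ψ(x_i) and ψ(x_i, x_j); psi_edge[(j, i)] is assumed to hold ψ(x_j, x_i) with rows indexed by x_j, and the function names are illustrative, not from the lecture. A single shared cache realizes the reuse of messages across queries.

import numpy as np

def message(j, i, nbrs, psi_node, psi_edge, cache):
    """m_{ji}(x_i) = sum_{x_j} psi(x_j) psi(x_i, x_j) prod_{k in N(j) minus i} m_{kj}(x_j)."""
    if (j, i) not in cache:
        m = psi_node[j].copy()
        for k in nbrs[j]:
            if k != i:                              # protocol: wait for all other neighbors
                m *= message(k, j, nbrs, psi_node, psi_edge, cache)
        cache[(j, i)] = psi_edge[(j, i)].T @ m      # sum over x_j
    return cache[(j, i)]

def marginal(i, nbrs, psi_node, psi_edge, cache):
    """P(x_i) is proportional to psi(x_i) times the product of all incoming messages."""
    belief = psi_node[i].copy()
    for j in nbrs[i]:
        belief *= message(j, i, nbrs, psi_node, psi_edge, cache)
    return belief / belief.sum()

# Toy tree: the chain 0 - 1 - 2 with binary variables (the potentials are assumptions).
nbrs = {0: [1], 1: [0, 2], 2: [1]}
psi_node = {i: np.ones(2) for i in nbrs}
coupling = np.array([[2.0, 1.0], [1.0, 2.0]])
psi_edge = {(i, j): coupling for i in nbrs for j in nbrs[i]}
cache = {}                                          # one shared cache: messages are reused
for i in nbrs:
    print(i, marginal(i, nbrs, psi_node, psi_edge, cache))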