Week-9
● Compatibility:
● Refers to how well the values of variables in a clique "agree"
with each other.
● High compatibility means the configuration is likely or
desirable.
● Affinity:
● Refers to the strength of the relationship between variables
in a clique.
● High affinity means the variables strongly influence each
other.
Potential Functions
● Formula:
● $P(X) = \frac{1}{Z}\prod_{c}\Psi_c(X_c)$
● $\Psi_c(X_c)$: potential function for clique $c$.
● $Z$: partition function (normalizing constant).
● $\Psi_c(X_c)$ must be non-negative.
● $Z$ ensures $P(X)$ is a valid probability distribution.
Partition Function (Z)
● $Z = \sum_{X}\prod_{c}\Psi_c(X_c)$
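To make the factorization and the role of $Z$ concrete, here is a minimal Python sketch over three binary variables A, B, C arranged in a chain (the pairwise potential values are made up for illustration, not taken from the slides): it computes the partition function by brute force and normalizes the product of potentials.

```python
import itertools
import numpy as np

# Hypothetical pairwise potentials for a chain A - B - C of binary variables.
# Any non-negative tables are allowed; these favour equal neighbouring values.
psi_AB = np.array([[4.0, 1.0],
                   [1.0, 4.0]])
psi_BC = np.array([[3.0, 1.0],
                   [1.0, 3.0]])

# Partition function: sum the unnormalized product over every configuration.
Z = sum(psi_AB[a, b] * psi_BC[b, c]
        for a, b, c in itertools.product((0, 1), repeat=3))

def P(a, b, c):
    """P(A=a, B=b, C=c) = (1/Z) * psi_AB(a, b) * psi_BC(b, c)."""
    return psi_AB[a, b] * psi_BC[b, c] / Z

# After dividing by Z the values form a valid probability distribution.
total = sum(P(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
print(Z, P(0, 0, 0))
```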
[Figure: example undirected graph over variables A, B, C]
● Statement:
● Any probability distribution consistent with the factorization over a
graph can be expressed using potential functions of the form:
$\Psi_c(X_c) = \exp\{-E(X_c)\}$
● $E(X_c)$: energy function (can be any real-valued function).
● Key Points:
● Energy functions simplify the representation of potential functions.
● High energy corresponds to low probability, and vice versa.
Energy Functions and Intuition
● Energy functions $E(X_c)$ are derived from data.
● High-count configurations in data are assigned low energy
(high probability).
● Energy functions are not restricted to being normalized.
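One simple way to realize this (an illustrative assumption, not the only choice) is to set each energy to the negative log of the empirical frequency of a clique configuration, so frequent configurations get low energy and therefore high potential:

```python
import numpy as np
from collections import Counter

# Hypothetical observed configurations of a clique of two binary variables.
data = [(0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]

counts = Counter(data)
total = len(data)

def energy(xc, eps=1e-6):
    """Energy = -log(relative frequency); unseen configurations get a large energy."""
    return -np.log(counts.get(xc, 0) / total + eps)

# Potential = exp(-energy): high-count (low-energy) configurations get high potential.
for xc in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(xc, energy(xc), np.exp(-energy(xc)))
```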
Markov Random Fields (MRFs)
● Definition:
● Undirected graphical models are also called Markov Random Fields.
● Each variable is conditionally independent of all non-neighbors given its
immediate neighbors.
● Key Points:
● MRFs are commonly used in image processing (e.g., pixel labeling).
● Example: Lattice structure for modeling images.
Pixel Labeling with MRFs
● Each pixel is associated with a random variable: its label (e.g., foreground or background).
● Potential functions are defined for edges (pairs of neighboring pixels).
● Inference involves finding the configuration of labels with the lowest
energy.
● Example:
○ For a 3x3 image, there are 9 pixels, each with a label (foreground or
background).
○ The goal is to assign labels such that the overall energy is minimized.
Training MRFs
● Dataset Preparation:
● Given an image dataset where each pixel needs to be labeled
(e.g., foreground/background segmentation).
● Defining Potentials:
● Node potentials: Derived from observed pixel values using a
classifier (e.g., logistic regression).
● Edge potentials: Learned from co-occurrence statistics in labeled
training data.
Example of MRF Training
● Energy Function:
● The energy function is formulated as:
$E(X) = \sum_i \psi_i(X_i) + \sum_{(i,j)\in E} \psi_{ij}(X_i, X_j)$
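To make the two terms concrete, here is a minimal sketch for the 3x3 foreground/background example mentioned earlier. The unary costs are random placeholders standing in for classifier outputs, and the pairwise term is a simple Potts-style penalty on disagreeing neighbours (one common choice; the slides do not fix a particular form):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 3, 3                       # 3x3 image, 9 pixels
unary = rng.random((H, W, 2))     # placeholder node potentials psi_i(X_i), labels {0: bg, 1: fg}
beta = 0.5                        # strength of the Potts pairwise penalty

def energy(labels):
    """E(X) = sum_i psi_i(X_i) + sum_{(i,j) in E} psi_ij(X_i, X_j) on the 4-connected grid."""
    e = sum(unary[r, c, labels[r, c]] for r in range(H) for c in range(W))
    for r in range(H):
        for c in range(W):
            if r + 1 < H:                                  # vertical neighbour
                e += beta * (labels[r, c] != labels[r + 1, c])
            if c + 1 < W:                                  # horizontal neighbour
                e += beta * (labels[r, c] != labels[r, c + 1])
    return e

# Brute-force search over all 2^9 labelings for the minimum-energy configuration
# (only feasible for tiny images; real MRF inference uses graph cuts or message passing).
best = min((np.array(bits).reshape(H, W) for bits in np.ndindex(*(2,) * (H * W))),
           key=energy)
print(best, energy(best))
```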
Hidden Markov Models (HMMs)
● What is an HMM?
● A graphical model representing probabilistic dependencies.
● Hidden states generate observed sequences.
● Based on the Markov assumption: The present state depends
only on the previous state.
Components of an HMM
● A directed chain structure where each state depends only on the previous state.
● Observations depend only on the corresponding hidden state.
Example - Part-of-Speech Tagging
● Question: What is the probability of observing the sequence (Walk, Shop) given the model?
Calculated Example
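The model's transition and emission tables were given on the slide and are not reproduced here, so the sketch below uses hypothetical values for a two-hidden-state HMM (state names "Rainy"/"Sunny" are assumptions) just to show how the forward algorithm would evaluate P(Walk, Shop):

```python
import numpy as np

# Hypothetical HMM parameters (the actual tables were given in the slide figure).
states = ["Rainy", "Sunny"]
obs_symbols = ["Walk", "Shop", "Clean"]

pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # A[i, j] = P(state_j at t+1 | state_i at t)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],            # B[i, k] = P(observation_k | state_i)
              [0.6, 0.3, 0.1]])

def forward_probability(obs):
    """P(observation sequence) via the forward algorithm: alpha_t(i) sums over
    all state paths ending in state i that emit the first t observations."""
    idx = [obs_symbols.index(o) for o in obs]
    alpha = pi * B[:, idx[0]]                  # alpha_1(i) = pi_i * B_i(o_1)
    for t in idx[1:]:
        alpha = (alpha @ A) * B[:, t]          # alpha_{t+1}(j) = sum_i alpha_t(i) A_ij B_j(o_{t+1})
    return alpha.sum()

print(forward_probability(["Walk", "Shop"]))
```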
Inference in Graphical Models
[Figure: Bayesian network with nodes C, D, I, G (grade), S (SAT), L (letter), J (Job), H (happy)]
Inference Query
● $P(J) = \sum_{C,D,I,G,S,L,H} P(C,D,I,G,S,L,J,H)$
● Naive summation: $O(2^8)$ for binary variables.
Variable Elimination (Intuition)
● Eliminate C
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_1(D) = \sum_C P(C)\,P(D \mid C)$
● New factor over D
Step-by-Step Elimination-2
● Eliminate D
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_2(G,I) = \sum_D P(G \mid I,D)\,\tau_1(D)$
● New factor over G,I.
Step-by-Step Elimination-3
● Eliminate I
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_3(G,S) = \sum_I \tau_2(G,I)\,P(I)\,P(S \mid I)$
Step-by-Step Elimination-4
● Eliminate S
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_4(J,L,G) = \sum_S \tau_3(G,S)\,P(J \mid L,S)$
Step-by-Step Elimination-5
● Eliminate L
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● Compute $\tau_5(J,G) = \sum_L \tau_4(J,L,G)\,P(L \mid G)$
Step-by-Step Elimination-6
● Eliminate G
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● τ5(J,G)P(H∣G,J)
● Compute $\tau_6(J,H) = \sum_G \tau_5(J,G)\,P(H \mid G,J)$
Step-by-Step Elimination-7
● Eliminate H
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● τ5(J,G)P(H∣G,J)
● τ6(J,H)
● Compute $P(J) = \sum_H \tau_6(J,H)$
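The whole elimination above can be reproduced numerically. The sketch below uses randomly generated binary CPTs (hypothetical, since the actual tables are not part of the slides) and np.einsum to carry out exactly the same sequence of factor computations, checking the result against naive summation over the full joint:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional table normalized over its last axis (the child variable)."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# Hypothetical binary CPTs for the network used in the slides.
# Axis order: conditioning variables first, child variable last.
P_C = random_cpt(2)            # P(C)
P_D_C = random_cpt(2, 2)       # P(D | C)    -> [C, D]
P_I = random_cpt(2)            # P(I)
P_G_ID = random_cpt(2, 2, 2)   # P(G | I, D) -> [I, D, G]
P_S_I = random_cpt(2, 2)       # P(S | I)    -> [I, S]
P_L_G = random_cpt(2, 2)       # P(L | G)    -> [G, L]
P_J_LS = random_cpt(2, 2, 2)   # P(J | L, S) -> [L, S, J]
P_H_GJ = random_cpt(2, 2, 2)   # P(H | G, J) -> [G, J, H]

# Variable elimination in the order C, D, I, S, L, G, H (as in the slides).
tau1 = np.einsum('c,cd->d', P_C, P_D_C)              # tau1(D)   = sum_C P(C) P(D|C)
tau2 = np.einsum('idg,d->gi', P_G_ID, tau1)          # tau2(G,I) = sum_D P(G|I,D) tau1(D)
tau3 = np.einsum('gi,i,is->gs', tau2, P_I, P_S_I)    # tau3(G,S) = sum_I tau2(G,I) P(I) P(S|I)
tau4 = np.einsum('gs,lsj->jlg', tau3, P_J_LS)        # tau4(J,L,G) = sum_S tau3(G,S) P(J|L,S)
tau5 = np.einsum('jlg,gl->jg', tau4, P_L_G)          # tau5(J,G) = sum_L tau4(J,L,G) P(L|G)
tau6 = np.einsum('jg,gjh->jh', tau5, P_H_GJ)         # tau6(J,H) = sum_G tau5(J,G) P(H|G,J)
P_J = tau6.sum(axis=1)                               # P(J)      = sum_H tau6(J,H)

# Sanity check against naive summation over the full joint (2^8 entries).
joint = np.einsum('c,cd,i,idg,is,gl,lsj,gjh->cdigsljh',
                  P_C, P_D_C, P_I, P_G_ID, P_S_I, P_L_G, P_J_LS, P_H_GJ)
assert np.allclose(P_J, joint.sum(axis=(0, 1, 2, 3, 4, 5, 7)))
print(P_J)  # marginal distribution over J; sums to 1
```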
Computational Complexity
● Induced Width: Size of the largest clique formed during
elimination.
● Determines complexity: O(exp(induced width))
● Good Order: Always eliminate variables that disconnect the graph first (like C).
● Rule of Thumb: Start with variables that have the fewest
connections!
● Fill-in Edges: Edges added during elimination to form cliques.
Treewidth and Induced Width
● Definition:
● Treewidth: Minimal induced width across all possible
elimination orderings.
● Induced Width: Size of the largest clique formed during
variable elimination.
● Key Points:
● Measures graph complexity for inference.
● For trees: Treewidth = 1 (if defined as max clique size - 1).
● Elimination order impacts induced width (leaf-to-root is
optimal for trees).
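A small sketch of how induced width depends on the elimination order (the graph here is an assumed 5-node chain, not one from the slides): eliminating from the leaves inward keeps every intermediate clique at size 2, while eliminating a middle node first creates a fill-in edge and a size-3 clique.

```python
import itertools

def induced_width(adj, order):
    """Size of the largest clique (variable plus its current neighbours) formed
    while eliminating variables in `order` from an undirected graph given as
    adjacency sets."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs) + 1)             # clique of v and its neighbours
        for a, b in itertools.combinations(nbrs, 2):  # add fill-in edges
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:                                # v is now eliminated
            adj[u].discard(v)
    return width

# Chain A - B - C - D - E (a tree), querying E.
chain = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'},
         'D': {'C', 'E'}, 'E': {'D'}}
print(induced_width(chain, ['A', 'B', 'C', 'D']))   # leaf-to-root: 2 (treewidth = 2 - 1 = 1)
print(induced_width(chain, ['C', 'A', 'B', 'D']))   # middle node first: 3 (fill-in edge B-D)
```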
Limitations of Variable Elimination
● Problem:
● Intermediate factors must be recomputed for every new query (e.g., P(H) after P(J)).
● No reuse of calculations.
Belief Propagation (The "Time-Saver")
● Cache intermediate results ("messages") between nodes.
● Like writing down sub-totals while cleaning:
○ Compute "message" from C to D once.
○ Reuse it for all future queries involving D
● Works perfectly for trees (no loops).
● Approximate for complex graphs (but still faster).
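A minimal sketch of this caching idea on an assumed 3-node chain MRF (made-up potentials): forward and backward messages are computed once, and every node marginal is then read off by reusing them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical chain MRF X0 - X1 - X2 over binary variables (any positive tables would do).
phi = [rng.random(2) + 0.1 for _ in range(3)]          # node potentials
psi = [rng.random((2, 2)) + 0.1 for _ in range(2)]     # psi[i] couples X_i and X_{i+1}

# Forward messages f[i]: everything to the left of X_i, summed out once.
f = [np.ones(2)]
for i in range(2):
    f.append(psi[i].T @ (phi[i] * f[i]))

# Backward messages b[i]: everything to the right of X_i, summed out once.
b = [np.ones(2), np.ones(2), np.ones(2)]
for i in (1, 0):
    b[i] = psi[i] @ (phi[i + 1] * b[i + 1])

def marginal(i):
    """Any node marginal reuses the cached messages instead of re-running elimination."""
    m = phi[i] * f[i] * b[i]
    return m / m.sum()

# Sanity check against the brute-force joint.
joint = np.einsum('a,b,c,ab,bc->abc', phi[0], phi[1], phi[2], psi[0], psi[1])
joint /= joint.sum()
assert np.allclose(marginal(1), joint.sum(axis=(0, 2)))
print([marginal(i) for i in range(3)])
```

On a tree the same two sweeps give exact marginals for every node; on graphs with loops the analogous scheme is only approximate, as noted above.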
Key Intuition
[Figure: message-passing example on a small graph with nodes R, S, G]
Question-1
In the undirected graph given below, how many terms will be there in its potential
function factorization?
a) 7
b) 3
c) 5
d) 9
e) None
Question-1 - Correct answer
In the undirected graph given below, how many terms will be there in its
potential function factorization?
a) 7
b) 3
c) 5
d) 9
e) none
a) A, E
b) B, F
c) A, D
d) B, D
e) None of the above
Question-3 - Correct answer
a) A, E
b) B, F
c) A, D
d) B, D
e) None of the above
Correct option: (e). All pairs have an alternate route to each other that does not pass through C.
Question-4
Consider the following statements about Hidden Markov Models (HMMs):
I. The ”Hidden” in HMM refers to the fact that the state transition probabilities are unknown.
II. The ”Markov” property means that the current state depends only on the previous state.
III. The ”Hidden” aspect relates to the underlying state sequence that is not directly observable.
IV. The ”Markov” in HMM indicates that the model uses matrix operations for calculations.
Which of the statements correctly describe the ”Hidden” and ”Markov” aspects of Hidden Markov
Models?
a) I and II
b) I and IV
c) II and III
d) III and IV
Question-4
I. The “Hidden” in HMM refers to the fact that the state transition probabilities are unknown. (Incorrect: the transition probabilities are part of the model; what is hidden is the state sequence.)
II. The “Markov” property means that the current state depends only on the previous state. (Correct.)
III. The “Hidden” aspect relates to the underlying state sequence that is not directly observable. (Correct.)
IV. The “Markov” in HMM indicates that the model uses matrix operations for calculations. (Incorrect: “Markov” refers to the dependence of each state only on the previous state, not to matrix operations.)
Question-4 - Correct answer
Consider the following statements about Hidden Markov Models (HMMs):
I. The ”Hidden” in HMM refers to the fact that the state transition probabilities are unknown.
II. The ”Markov” property means that the current state depends only on the previous state.
III. The ”Hidden” aspect relates to the underlying state sequence that is not directly observable.
IV. The ”Markov” in HMM indicates that the model uses matrix operations for calculations.
Which of the statements correctly describe the ”Hidden” and ”Markov” aspects of Hidden Markov Models?
a) I and II
b) I and IV
c) II and III
d) III and IV
Correct answer: (c) II and III
Question-5
For the given graphical model, what is the optimal variable elimination order when trying to calculate P(E=e)?
a) A, B, C, D
b) D, C, B, A
c) A, D, B, C
d) D, A, C, A
Question-5 - Correct answer
For the given graphical model, what is the optimal variable elimination order when trying to calculate P(E=e)?
a) A, B, C, D
b) D, C, B, A
c) A, D, B, C
d) D, A, C, A
a) I and II
b) II and III
c) I, II, and IV
d) I, III, and IV
e) II, III, and IV
Question-6
a) I and II
b) II and III
c) I, II, and IV
d) I, III, and IV
e) II, III, and IV
Next Session:
Tuesday:
01-Apr-2025
6:00 - 8:00 PM