Week-9
● Compatibility:
● Refers to how well the values of variables in a clique "agree"
with each other.
● High compatibility means the configuration is likely or
desirable.
● Affinity:
● Refers to the strength of the relationship between variables
in a clique.
● High affinity means the variables strongly influence each
other.
Potential Functions
● Formula:
● $P(X) = \frac{1}{Z}\prod_{c}\Psi_c(X_c)$
● $\Psi_c(X_c)$: potential function for clique $c$.
● $Z$: partition function (normalizing constant).
● $\Psi_c(X_c)$ must be non-negative.
● $Z$ ensures $P(X)$ is a valid probability distribution.
Partition Function (Z)
● $Z = \sum_{X}\prod_{c}\Psi_c(X_c)$
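To make the factorization and the role of $Z$ concrete, here is a minimal Python sketch over three binary variables A, B, C arranged in a chain (the pairwise potential values are made up for illustration, not taken from the slides): it computes the partition function by brute force and normalizes the product of potentials.

```python
import itertools
import numpy as np

# Hypothetical pairwise potentials for a chain A - B - C of binary variables.
# Any non-negative tables are allowed; these favour equal neighbouring values.
psi_AB = np.array([[4.0, 1.0],
                   [1.0, 4.0]])
psi_BC = np.array([[3.0, 1.0],
                   [1.0, 3.0]])

# Partition function: sum the unnormalized product over every configuration.
Z = sum(psi_AB[a, b] * psi_BC[b, c]
        for a, b, c in itertools.product((0, 1), repeat=3))

def P(a, b, c):
    """P(A=a, B=b, C=c) = (1/Z) * psi_AB(a, b) * psi_BC(b, c)."""
    return psi_AB[a, b] * psi_BC[b, c] / Z

# After dividing by Z the values form a valid probability distribution.
total = sum(P(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
print(Z, P(0, 0, 0))
```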
[Figure: example undirected graph over variables A, B, C]
● Statement:
● Any probability distribution consistent with the factorization over a
graph can be expressed using potential functions of the form:
$\Psi_c(X_c) = \exp\{-E(X_c)\}$
● $E(X_c)$: energy function (can be any real-valued function).
● Key Points:
● Energy functions simplify the representation of potential functions.
● High energy corresponds to low probability, and vice versa.
Energy Functions and Intuition
● Energy functions $E(X_c)$ are derived from data.
● High-count configurations in data are assigned low energy
(high probability).
● Energy functions are not restricted to being normalized.
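One simple way to realize this (an illustrative assumption, not the only choice) is to set each energy to the negative log of the empirical frequency of a clique configuration, so frequent configurations get low energy and therefore high potential:

```python
import numpy as np
from collections import Counter

# Hypothetical observed configurations of a clique of two binary variables.
data = [(0, 0), (0, 0), (0, 0), (1, 1), (1, 1), (0, 1)]

counts = Counter(data)
total = len(data)

def energy(xc, eps=1e-6):
    """Energy = -log(relative frequency); unseen configurations get a large energy."""
    return -np.log(counts.get(xc, 0) / total + eps)

# Potential = exp(-energy): high-count (low-energy) configurations get high potential.
for xc in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(xc, energy(xc), np.exp(-energy(xc)))
```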
Markov Random Fields (MRFs)
● Definition:
● Undirected graphical models are also called Markov Random Fields.
● Each variable is conditionally independent of all non-neighbors given its
immediate neighbors.
● Key Points:
● MRFs are commonly used in image processing (e.g., pixel labeling).
● Example: Lattice structure for modeling images.
Pixel Labeling with MRFs
● Each pixel is associated with a random variable: its label (e.g., foreground or background).
● Potential functions are defined for edges (pairs of neighboring pixels).
● Inference involves finding the configuration of labels with the lowest
energy.
● Example:
○ For a 3x3 image, there are 9 pixels, each with a label (foreground or
background).
○ The goal is to assign labels such that the overall energy is minimized.
Training MRFs
● Dataset Preparation:
● Given an image dataset where each pixel needs to be labeled
(e.g., foreground/background segmentation).
● Defining Potentials:
● Node potentials: Derived from observed pixel values using a
classifier (e.g., logistic regression).
● Edge potentials: Learned from co-occurrence statistics in labeled
training data.
Example of MRF Training
● Energy Function:
● The energy function is formulated as:
$E(X) = \sum_i \psi_i(X_i) + \sum_{(i,j)\in E} \psi_{ij}(X_i, X_j)$
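To make the two terms concrete, here is a minimal sketch for the 3x3 foreground/background example mentioned earlier. The unary costs are random placeholders standing in for classifier outputs, and the pairwise term is a simple Potts-style penalty on disagreeing neighbours (one common choice; the slides do not fix a particular form):

```python
import numpy as np

rng = np.random.default_rng(0)

H, W = 3, 3                       # 3x3 image, 9 pixels
unary = rng.random((H, W, 2))     # placeholder node potentials psi_i(X_i), labels {0: bg, 1: fg}
beta = 0.5                        # strength of the Potts pairwise penalty

def energy(labels):
    """E(X) = sum_i psi_i(X_i) + sum_{(i,j) in E} psi_ij(X_i, X_j) on the 4-connected grid."""
    e = sum(unary[r, c, labels[r, c]] for r in range(H) for c in range(W))
    for r in range(H):
        for c in range(W):
            if r + 1 < H:                                  # vertical neighbour
                e += beta * (labels[r, c] != labels[r + 1, c])
            if c + 1 < W:                                  # horizontal neighbour
                e += beta * (labels[r, c] != labels[r, c + 1])
    return e

# Brute-force search over all 2^9 labelings for the minimum-energy configuration
# (only feasible for tiny images; real MRF inference uses graph cuts or message passing).
best = min((np.array(bits).reshape(H, W) for bits in np.ndindex(*(2,) * (H * W))),
           key=energy)
print(best, energy(best))
```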
Hidden Markov Models (HMMs)
● What is an HMM?
● A graphical model representing probabilistic dependencies.
● Hidden states generate observed sequences.
● Based on the Markov assumption: The present state depends
only on the previous state.
Components of an HMM
● A directed chain structure where each state depends only on the previous state.
● Observations depend only on the corresponding hidden state.
Example - Part-of-Speech Tagging
● Question: What is the probability of observing the sequence (Walk, Shop) given the model?
Calculated Example
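The model's transition and emission tables were given on the slide and are not reproduced here, so the sketch below uses hypothetical values for a two-hidden-state HMM (state names "Rainy"/"Sunny" are assumptions) just to show how the forward algorithm would evaluate P(Walk, Shop):

```python
import numpy as np

# Hypothetical HMM parameters (the actual tables were given in the slide figure).
states = ["Rainy", "Sunny"]
obs_symbols = ["Walk", "Shop", "Clean"]

pi = np.array([0.6, 0.4])                 # initial state distribution
A = np.array([[0.7, 0.3],                 # A[i, j] = P(state_j at t+1 | state_i at t)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],            # B[i, k] = P(observation_k | state_i)
              [0.6, 0.3, 0.1]])

def forward_probability(obs):
    """P(observation sequence) via the forward algorithm: alpha_t(i) sums over
    all state paths ending in state i that emit the first t observations."""
    idx = [obs_symbols.index(o) for o in obs]
    alpha = pi * B[:, idx[0]]                  # alpha_1(i) = pi_i * B_i(o_1)
    for t in idx[1:]:
        alpha = (alpha @ A) * B[:, t]          # alpha_{t+1}(j) = sum_i alpha_t(i) A_ij B_j(o_{t+1})
    return alpha.sum()

print(forward_probability(["Walk", "Shop"]))
```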
Inference in Graphical Models
[Figure: Bayesian network with nodes C, D, I, G (grade), S (SAT), L (letter), J (Job), H (happy)]
Inference Query
● $P(J) = \sum_{C,D,I,G,S,L,H} P(C,D,I,G,S,L,J,H)$
● Naive summation: $O(2^8)$ for binary variables.
Variable Elimination (Intuition)
● Eliminate C
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_1(D) = \sum_C P(C)\,P(D \mid C)$
● New factor over D
Step-by-Step Elimination-2
● Eliminate D
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_2(G,I) = \sum_D P(G \mid I,D)\,\tau_1(D)$
● New factor over G,I.
Step-by-Step Elimination-3
● Eliminate I
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_3(G,S) = \sum_I \tau_2(G,I)\,P(I)\,P(S \mid I)$
Step-by-Step Elimination-4
● Eliminate S
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● Compute $\tau_4(J,L,G) = \sum_S \tau_3(G,S)\,P(J \mid L,S)$
Step-by-Step Elimination-5
● Eliminate L
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● Compute $\tau_5(J,G) = \sum_L \tau_4(J,L,G)\,P(L \mid G)$
Step-by-Step Elimination-6
● Eliminate G
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● τ5(J,G)P(H∣G,J)
● Compute $\tau_6(J,H) = \sum_G \tau_5(J,G)\,P(H \mid G,J)$
Step-by-Step Elimination-7
● Eliminate H
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● τ5(J,G)P(H∣G,J)
● τ6(J,H)
● Compute $P(J) = \sum_H \tau_6(J,H)$
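The whole elimination above can be reproduced numerically. The sketch below uses randomly generated binary CPTs (hypothetical, since the actual tables are not part of the slides) and np.einsum to carry out exactly the same sequence of factor computations, checking the result against naive summation over the full joint:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional table normalized over its last axis (the child variable)."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# Hypothetical binary CPTs for the network used in the slides.
# Axis order: conditioning variables first, child variable last.
P_C = random_cpt(2)            # P(C)
P_D_C = random_cpt(2, 2)       # P(D | C)    -> [C, D]
P_I = random_cpt(2)            # P(I)
P_G_ID = random_cpt(2, 2, 2)   # P(G | I, D) -> [I, D, G]
P_S_I = random_cpt(2, 2)       # P(S | I)    -> [I, S]
P_L_G = random_cpt(2, 2)       # P(L | G)    -> [G, L]
P_J_LS = random_cpt(2, 2, 2)   # P(J | L, S) -> [L, S, J]
P_H_GJ = random_cpt(2, 2, 2)   # P(H | G, J) -> [G, J, H]

# Variable elimination in the order C, D, I, S, L, G, H (as in the slides).
tau1 = np.einsum('c,cd->d', P_C, P_D_C)              # tau1(D)   = sum_C P(C) P(D|C)
tau2 = np.einsum('idg,d->gi', P_G_ID, tau1)          # tau2(G,I) = sum_D P(G|I,D) tau1(D)
tau3 = np.einsum('gi,i,is->gs', tau2, P_I, P_S_I)    # tau3(G,S) = sum_I tau2(G,I) P(I) P(S|I)
tau4 = np.einsum('gs,lsj->jlg', tau3, P_J_LS)        # tau4(J,L,G) = sum_S tau3(G,S) P(J|L,S)
tau5 = np.einsum('jlg,gl->jg', tau4, P_L_G)          # tau5(J,G) = sum_L tau4(J,L,G) P(L|G)
tau6 = np.einsum('jg,gjh->jh', tau5, P_H_GJ)         # tau6(J,H) = sum_G tau5(J,G) P(H|G,J)
P_J = tau6.sum(axis=1)                               # P(J)      = sum_H tau6(J,H)

# Sanity check against naive summation over the full joint (2^8 entries).
joint = np.einsum('c,cd,i,idg,is,gl,lsj,gjh->cdigsljh',
                  P_C, P_D_C, P_I, P_G_ID, P_S_I, P_L_G, P_J_LS, P_H_GJ)
assert np.allclose(P_J, joint.sum(axis=(0, 1, 2, 3, 4, 5, 7)))
print(P_J)  # marginal distribution over J; sums to 1
```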
Computational Complexity
● Induced Width: Size of the largest clique formed during
elimination.
● Determines complexity: O(exp(induced width))
● Good Order: Always eliminate variables that disconnect the graph first (like C).
● Rule of Thumb: Start with variables that have the fewest
connections!
● Fill-in Edges: Edges added during elimination to form cliques.
Treewidth and Induced Width
● Definition:
● Treewidth: Minimal induced width across all possible
elimination orderings.
● Induced Width: Size of the largest clique formed during
variable elimination.
● Key Points:
● Measures graph complexity for inference.
● For trees: Treewidth = 1 (if defined as max clique size - 1).
● Elimination order impacts induced width (leaf-to-root is
optimal for trees).
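A small sketch of how induced width depends on the elimination order (the graph here is an assumed 5-node chain, not one from the slides): eliminating from the leaves inward keeps every intermediate clique at size 2, while eliminating a middle node first creates a fill-in edge and a size-3 clique.

```python
import itertools

def induced_width(adj, order):
    """Size of the largest clique (variable plus its current neighbours) formed
    while eliminating variables in `order` from an undirected graph given as
    adjacency sets."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs) + 1)             # clique of v and its neighbours
        for a, b in itertools.combinations(nbrs, 2):  # add fill-in edges
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:                                # v is now eliminated
            adj[u].discard(v)
    return width

# Chain A - B - C - D - E (a tree), querying E.
chain = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B', 'D'},
         'D': {'C', 'E'}, 'E': {'D'}}
print(induced_width(chain, ['A', 'B', 'C', 'D']))   # leaf-to-root: 2 (treewidth = 2 - 1 = 1)
print(induced_width(chain, ['C', 'A', 'B', 'D']))   # middle node first: 3 (fill-in edge B-D)
```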
Limitations of Variable Elimination
● Problem:
● Intermediate factors must be recomputed for every new query (e.g., P(H) after P(J)).
● No reuse of calculations.
Belief Propagation (The "Time-Saver")
● Cache intermediate results ("messages") between nodes.
● Like writing down sub-totals while cleaning:
○ Compute "message" from C to D once.
○ Reuse it for all future queries involving D
● Works perfectly for trees (no loops).
● Approximate for complex graphs (but still faster).
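A minimal sketch of this caching idea on an assumed 3-node chain MRF (made-up potentials): forward and backward messages are computed once, and every node marginal is then read off by reusing them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical chain MRF X0 - X1 - X2 over binary variables (any positive tables would do).
phi = [rng.random(2) + 0.1 for _ in range(3)]          # node potentials
psi = [rng.random((2, 2)) + 0.1 for _ in range(2)]     # psi[i] couples X_i and X_{i+1}

# Forward messages f[i]: everything to the left of X_i, summed out once.
f = [np.ones(2)]
for i in range(2):
    f.append(psi[i].T @ (phi[i] * f[i]))

# Backward messages b[i]: everything to the right of X_i, summed out once.
b = [np.ones(2), np.ones(2), np.ones(2)]
for i in (1, 0):
    b[i] = psi[i] @ (phi[i + 1] * b[i + 1])

def marginal(i):
    """Any node marginal reuses the cached messages instead of re-running elimination."""
    m = phi[i] * f[i] * b[i]
    return m / m.sum()

# Sanity check against the brute-force joint.
joint = np.einsum('a,b,c,ab,bc->abc', phi[0], phi[1], phi[2], psi[0], psi[1])
joint /= joint.sum()
assert np.allclose(marginal(1), joint.sum(axis=(0, 2)))
print([marginal(i) for i in range(3)])
```

On a tree the same two sweeps give exact marginals for every node; on graphs with loops the analogous scheme is only approximate, as noted above.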
Key Intuition
[Figure: message-passing example on a small graph with nodes R, S, G]
Question-1
In the undirected graph given below, how many terms will be there in its potential
function factorization?
a) 7
b) 3
c) 5
d) 9
e) None
Question-1 - Correct answer
In the undirected graph given below, how many terms will be there in its
potential function factorization?
a) 7
b) 3
c) 5
d) 9
e) none
a) A, E
b) B, F
c) A, D
d) B, D
e) None of the above
Question-3 - Correct answer
a) A, E
b) B, F
c) A, D
d) B, D
e) None of the above
Correct option: (e). All pairs have an alternate route to each other that does not pass through C.
Question-4
Consider the following statements about Hidden Markov Models (HMMs):
I. The ”Hidden” in HMM refers to the fact that the state transition probabilities are unknown.
II. The ”Markov” property means that the current state depends only on the previous state.
III. The ”Hidden” aspect relates to the underlying state sequence that is not directly observable.
IV. The ”Markov” in HMM indicates that the model uses matrix operations for calculations.
Which of the statements correctly describe the ”Hidden” and ”Markov” aspects of Hidden Markov
Models?
a) I and II
b) I and IV
c) II and III
d) III and IV
Question-4
I. The “Hidden” in HMM refers to the fact that the state transition probabilities are unknown. (Incorrect: the transition probabilities are part of the model; what is hidden is the state sequence.)
II. The “Markov” property means that the current state depends only on the previous state. (Correct.)
III. The “Hidden” aspect relates to the underlying state sequence that is not directly observable. (Correct.)
IV. The “Markov” in HMM indicates that the model uses matrix operations for calculations. (Incorrect: “Markov” refers to the dependence of each state only on the previous state, not to matrix operations.)
Question-4 - Correct answer
Consider the following statements about Hidden Markov Models (HMMs):
I. The ”Hidden” in HMM refers to the fact that the state transition probabilities are unknown.
II. The ”Markov” property means that the current state depends only on the previous state.
III. The ”Hidden” aspect relates to the underlying state sequence that is not directly observable.
IV. The ”Markov” in HMM indicates that the model uses matrix operations for calculations.
Which of the statements correctly describe the ”Hidden” and ”Markov” aspects of Hidden Markov Models?
a) I and II
b) I and IV
c) II and III
d) III and IV
Correct answer: (c) II and III
Question-5
For the given graphical model, what is the optimal variable elimination order when trying to calculate P(E=e)?
a) A, B, C, D
b) D, C, B, A
c) A, D, B, C
d) D, A, C, A
Question-5 - Correct answer
For the given graphical model, what is the optimal variable elimination order when trying to calculate P(E=e)?
a) A, B, C, D
b) D, C, B, A
c) A, D, B, C
d) D, A, C, A
a) I and II
b) II and III
c) I, II, and IV
d) I, III, and IV
e) II, III, and IV
Question-6
a) I and II
b) II and III
c) I, II, and IV
d) I, III, and IV
e) II, III, and IV
Next Session:
Tuesday:
01-Apr-2025
6:00 - 8:00 PM