
Introduction to Machine Learning

- Prof. Balaraman Ravindran | IIT Madras

Problem Solving Session (Week-9)


Shreya Bansal
PMRF PhD Scholar
IIT Ropar
Week-9 Contents

1. Undirected Graph Models


2. Hidden Markov Model
3. Variable elimination
4. Tree width and Belief Propagation
Introduction to Undirected Graphical Models
Introduction to Undirected Graphical Models

● Undirected Graphical Models (UGMs) represent joint


probability distributions using potential functions.
● Unlike Directed Graphical Models (Bayesian Networks),
UGMs do not impose restrictions on the form of potential
functions.
● UGMs are used to model relationships where directionality is
not inherent (e.g., pixel neighborhoods in images).
Potential Functions

● Potential functions Ψc(Xc) are associated with cliques in the


graph.
Cliques in Undirected Graphical Models

● A clique in an undirected graph is a subset of nodes where:


○ Every pair of nodes in the subset is connected by an
edge. (Fully connected)
○ For the factorization we typically use maximal cliques: no
additional node can be added to the subset while
maintaining full connectivity.
Types of Cliques
Maximal Clique: A clique that cannot be extended by adding another node.

Non-Maximal Clique: A clique that is part of a larger clique.


Potential Functions

● Potential functions Ψc(Xc) are associated with cliques in the


graph.
● They capture the "compatibility" or "affinity" between
variables in a clique.
Compatibility and Affinity

● Compatibility:
● Refers to how well the values of variables in a clique "agree"
with each other.
● High compatibility means the configuration is likely or
desirable.
● Affinity:
● Refers to the strength of the relationship between variables
in a clique.
● High affinity means the variables strongly influence each
other.
Potential Functions

● Potential functions Ψc(Xc) are associated with cliques in the graph.


● They capture the "compatibility" or "affinity" between variables in a
clique.
● Properties:
● Ψc(Xc) ≥ 0 (non-negative).
● Not restricted to being probabilities (unlike conditional probabilities in
directed models).
● Example:
● For a clique c={X1,X2} , Ψc(X1 , X2) measures how likely X1 and X2 are to
take specific values together.
Compatibility in a Clique
Example: Pixel Labeling in Images

● Clique: Two neighboring pixels in an image.


● Potential Function:
● Assign high compatibility if neighboring pixels have the
same label (e.g., both foreground or both background).
● Assign low compatibility if neighboring pixels have different
labels.
● Result:
● Encourages smoothness in pixel labeling (e.g., large regions
of foreground or background).
Probability Distribution in UGMs

● Formula:
● P(X)=(1/Z) ∏cΨc(Xc)
● Ψc(Xc): Potential function for clique c.
● Z: Partition function (normalizing constant).
● Ψc(Xc) must be non-negative.
● Z ensures P(X) is a valid probability distribution.
Partition Function (Z)

● Z= ∑X ∏cΨc(Xc)

● Z sums over all possible configurations of X.


● Computationally expensive for large graphs (e.g., 2^n
configurations for n binary variables).
Example: Product Over Cliques

Graph: a chain A - B - C

● Cliques: {A,B} and {B,C}
● Variables: A, B, and C are binary variables (can take values 0 or 1).
● Task 1: Calculate the partition function Z.
● Task 2: Calculate P(X) for each configuration.
(The worked potential tables are shown on the slides; a sketch follows below.)
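A minimal Python sketch of this example, assuming illustrative potential tables for Ψ{A,B} and Ψ{B,C} (the actual tables from the slides are not reproduced here); it enumerates all eight configurations to compute Z and then P(A,B,C):

# Product-over-cliques sketch for the chain A - B - C with hypothetical potentials.
from itertools import product

# psi[(x1, x2)] = potential value; the numbers below are illustrative only.
psi_AB = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi_BC = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}

# Partition function: Z = sum over all (A, B, C) of psi_AB(A, B) * psi_BC(B, C)
Z = sum(psi_AB[(a, b)] * psi_BC[(b, c)]
        for a, b, c in product((0, 1), repeat=3))

# P(A, B, C) = (1 / Z) * psi_AB(A, B) * psi_BC(B, C)
for a, b, c in product((0, 1), repeat=3):
    p = psi_AB[(a, b)] * psi_BC[(b, c)] / Z
    print(f"P(A={a}, B={b}, C={c}) = {p:.4f}")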
Challenges in UGMs

● Potential functions Ψc(Xc) are unrestricted (unlike


conditional probabilities in directed models).
● Normalization is required to ensure P(X) is a valid
probability distribution.
● Inference and computation of Z are computationally
intensive.
● Example: For a graph with 20 binary variables, Z requires
summing over 2^20 = 1,048,576 configurations.
Factorization in Undirected vs. Directed Graphs
● Directed Graphs (Bayesian Networks): Factorize as a product of conditional probabilities:
P(X1, X2, ..., Xn) = ∏i P(Xi ∣ Parents(Xi))
● Dependencies are encoded through parent-child relationships.
● Example: Disease diagnosis models.
● Undirected Graphs (MRFs & CRFs) : Factorize as a product of potential functions over cliques:
𝑃(𝑋)=(1/Z) ∏cΨc(Xc)
● No directionality; dependencies are symmetric.
● Example: Image segmentation using MRFs.
● Key Differences
● Directed Graphs: Encode causal relationships, easy sampling.
● Undirected Graphs: More flexible but harder for inference.
Why does factorization use cliques?
● Factorization in undirected graphs uses cliques because of
the Hammersley-Clifford Theorem, which states that if a
probability distribution 𝑃(𝑋) is positive and follows the
Markov property with respect to an undirected graph, then it
can be factorized into a product of potential functions over
the maximal cliques of the graph.
𝑃(𝑋)=(1/Z) ∏cΨc(Xc)
Why Cliques?
● Complete Local Dependencies:
○ A clique is a fully connected subset of nodes, meaning all
variables within the clique directly interact.
○ Factorization over cliques ensures that no dependencies are
ignored.
● Markov Property Preservation:
○ The clique potentials maintain local conditional independence,
ensuring consistency with the graphical structure.
● Simplified Computation:
○ Using maximal cliques reduces redundancy and avoids
unnecessary factorization over non-maximal subsets.
Hammersley-Clifford Theorem

● Statement:
● Any probability distribution consistent with the factorization over a
graph can be expressed using potential functions of the form:
Ψc(Xc)=exp{−E(Xc)}
● E(Xc): Energy function (can be any real-valued function).

● Key Points:
● Energy functions simplify the representation of potential functions.
● High energy corresponds to low probability, and vice versa.
Energy Functions and Intuition
● Energy functions E(Xc) are derived from data.
● High-count configurations in data are assigned low energy
(high probability).
● Energy functions are not restricted to being normalized.
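A small illustrative sketch of this idea, assuming hypothetical configuration counts for a two-variable clique (not from the lecture): energies are set to the negative log of empirical frequencies, so frequent configurations get low energy and, through Ψc(Xc) = exp{−E(Xc)}, high potential.

# Deriving clique energies from (hypothetical) data counts.
import math

counts = {(0, 0): 40, (0, 1): 5, (1, 0): 5, (1, 1): 50}   # hypothetical counts
total = sum(counts.values())

energy = {cfg: -math.log(c / total) for cfg, c in counts.items()}  # E(Xc)
psi = {cfg: math.exp(-e) for cfg, e in energy.items()}             # Psi = exp(-E)

print(energy[(1, 1)], energy[(0, 1)])  # frequent config gets the lower energy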
Markov Random Fields (MRFs)
Markov Random Fields (MRFs)

● Definition:
● Undirected graphical models are also called Markov Random Fields.
● Variables are independent of non-neighbors given their immediate
neighbors.

● Key Points:
● MRFs are commonly used in image processing (e.g., pixel labeling).
● Example: Lattice structure for modeling images.
Pixel Labeling with MRFs
● Each pixel is a random variable (e.g., foreground or background).
● Potential functions are defined for edges (pairs of neighboring pixels).
● Inference involves finding the configuration of labels with the lowest
energy.
● Example:
○ For a 3x3 image, there are 9 pixels, each with a label (foreground or
background).
○ The goal is to assign labels such that the overall energy is minimized.
Pixel Labeling with MRFs
Training MRFs

● Node potentials: Derived from observed pixel values.


● Edge potentials: Learned from co-occurrence statistics in the
data.
● Energy functions are often learned using maximum likelihood
or logistic regression.
Example of MRF Training

● Dataset Preparation:
● Given an image dataset where each pixel needs to be labeled
(e.g., foreground/background segmentation).
● Defining Potentials:
● Node potentials: Derived from observed pixel values using a
classifier (e.g., logistic regression).
● Edge potentials: Learned from co-occurrence statistics in labeled
training data.
Example of MRF Training
● Energy Function:
● The energy function is formulated as:

E(X) = ∑i ψi(Xi) + ∑(i,j)∈E ψij(Xi, Xj)

● Where ψi(Xi) represents the node potentials and ψij(Xi, Xj) represents the edge potentials.


● Parameter Learning:
● Use Maximum Likelihood Estimation (MLE) to estimate parameters.
● Gradient-based optimization methods (e.g., Stochastic Gradient Descent) help refine parameters.
● Inference for Prediction:
● Given a new image, inference (e.g., Graph Cuts, Belief Propagation) finds the most probable pixel
labels.
Example of MRF Training

● Problem:

Given a noisy grayscale image, classify each pixel as foreground (1) or


background (0) using MRF.

● Step 1: Define the Graph Structure

Each pixel is a node in an undirected graph.

Edges connect neighboring pixels (e.g., 4-nearest neighbors).

Each node 𝑋𝑖 takes values 0 (background) or 1 (foreground).


Example of MRF Training

● Step 2: Define Node and Edge Potentials


● Node Potential ψi(Xi): The probability of a pixel being foreground or
background based on its observed intensity Yi.
● Example: ψi(Xi) = P(Yi ∣ Xi = 1) if foreground, or P(Yi ∣ Xi = 0) if background,
where the probabilities can be estimated from histograms of training images.
● Edge Potential ψij(Xi, Xj): Encourages smoothness (neighboring pixels
prefer similar labels).
● ψij(Xi, Xj) = exp(−β |Yi − Yj|), where β controls smoothness.
Example of MRF Training

● Step 3: Compute the Energy Function


● The total energy is: E(X) = ∑i ψi(Xi) + ∑(i,j)∈E ψij(Xi, Xj)
● For a 3x3 pixel patch:
● Observed intensities: (3x3 intensity table shown on the slide)

● Assumed foreground intensity range: 200-255

● Assumed background intensity range: 0-100
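A minimal sketch of evaluating E(X) for one candidate labeling of a 3x3 patch, assuming hypothetical intensities and labels (the slide's intensity table is not reproduced here). The node term charges one unit of energy when a label disagrees with its pixel's assumed intensity range, and the edge term penalizes disagreeing neighbour labels with weight exp(−β|Yi − Yj|), one common contrast-sensitive reading of the smoothness potential above.

# Energy of a candidate labeling X for a 3x3 patch (hypothetical data).
import math

Y = [[220, 210,  40],      # hypothetical observed intensities
     [230,  90,  50],
     [ 60,  70,  30]]
X = [[1, 1, 0],            # candidate labeling: 1 = foreground, 0 = background
     [1, 0, 0],
     [0, 0, 0]]
beta = 0.05

def node_energy(x, y):
    """Zero energy when the label matches the pixel's assumed intensity range."""
    looks_foreground = y >= 200            # foreground range: 200-255
    return 0.0 if (x == 1) == looks_foreground else 1.0

E = 0.0
for i in range(3):
    for j in range(3):
        E += node_energy(X[i][j], Y[i][j])
        # 4-neighbour edges: look right and down so each edge is counted once
        for di, dj in ((0, 1), (1, 0)):
            ni, nj = i + di, j + dj
            if ni < 3 and nj < 3 and X[i][j] != X[ni][nj]:
                E += math.exp(-beta * abs(Y[i][j] - Y[ni][nj]))

print(f"E(X) = {E:.3f}")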
Challenges in Inference

● Exact inference is computationally hard, especially in graphs


with loops.
● Approximate inference methods are often used.
● Trees allow for exact inference, but loops complicate the
process.
Applications of MRFs

● Image segmentation and labeling.


● Conditional Random Fields (CRFs): Extension of MRFs for
structured prediction tasks.
● Widely used in computer vision and natural language processing.
Introduction to HMMs

● What is an HMM?
● A graphical model representing probabilistic dependencies.
● Hidden states generate observed sequences.
● Based on the Markov assumption: The present state depends
only on the previous state.
Components of an HMM

1. States (X): Hidden variables.


2. Observations (Y): The observable sequence.
3. Transition Probabilities (A): Probabilities of moving from
one state to another.
4. Emission Probabilities (B): Probability of an observation
given a state.
5. Initial Probabilities (π): Probability distribution of the
starting state.
Graphical Representation

● A directed chain structure where each state depends only on the previous state.
● Observations depend only on the corresponding hidden state.
Example - Part-of-Speech Tagging

● States (X): Noun (N), Verb (V), Adjective (A)


● Observations (Y): Words in a sentence ("dog", "barks", "loudly")
● Transition Probabilities (A):
P(N → V) = 0.5, P(N → A) = 0.3, etc.
● Emission Probabilities (B):
P("dog" | N) = 0.6, P("barks" | V) = 0.7, etc.
Calculated Example

(State-transition diagram for states R and S with observations W, Sh, C shown on the slide.)

Given:

States: Rainy (R), Sunny (S)

Observations: Walk (W), Shop (Sh), Clean (C)

Transition Probabilities (A):

P(R|R) = 0.7, P(S|R) = 0.3

P(R|S) = 0.4, P(S|S) = 0.6

Emission Probabilities (B):

P(W|R) = 0.1, P(Sh|R) = 0.4, P(C|R) = 0.5

P(W|S) = 0.6, P(Sh|S) = 0.3, P(C|S) = 0.1

Initial Probabilities (π): P(R) = 0.6, P(S) = 0.4
Calculated Example

● Question
What is the probability of observing the sequence (Walk,
Shop) given the model?
Calculated Example
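A minimal forward-algorithm sketch for this model, using the probabilities listed above; summing the final forward values gives P(Walk, Shop) = 0.1038.

# Forward algorithm for the Rainy/Sunny HMM above, observation sequence (Walk, Shop).
states = ["R", "S"]
pi = {"R": 0.6, "S": 0.4}
A = {"R": {"R": 0.7, "S": 0.3},               # A[current][next] = P(next | current)
     "S": {"R": 0.4, "S": 0.6}}
B = {"R": {"W": 0.1, "Sh": 0.4, "C": 0.5},    # B[state][obs] = P(obs | state)
     "S": {"W": 0.6, "Sh": 0.3, "C": 0.1}}

obs = ["W", "Sh"]

# alpha[t][s] = P(o_1, ..., o_t, X_t = s)
alpha = [{s: pi[s] * B[s][obs[0]] for s in states}]
for o in obs[1:]:
    prev = alpha[-1]
    alpha.append({s: sum(prev[sp] * A[sp][s] for sp in states) * B[s][o]
                  for s in states})

print(sum(alpha[-1].values()))  # P(Walk, Shop) = 0.0552 + 0.0486 = 0.1038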
Inference in Graphical Models

● Two Core Problems:


● Inference: Given a model, answer queries (e.g., marginals, conditionals).
○ Example: P(Job∣Grade)
● Learning:
○ Parameter Learning: Estimate potentials/CPDs given the
graph.
○ Structure Learning: Discover the graph from data (harder).
● Challenge:
● Marginalizing over large joint distributions is computationally
expensive.
Example Model (Student Scenario)

● Variables (nodes in the network): Coherence (C), Difficulty (D), Intelligence (I),
Grade (G), SAT (S), Letter (L), Job (J), Happy (H).

● Joint distribution:
P(C,D,I,G,S,L,J,H) = P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)

● Inference Query:
P(J) = ∑C,D,I,G,S,L,H P(C,D,I,G,S,L,J,H)

● Naive summation: O(2^8) terms for binary variables.
Variable Elimination (Intuition)

● Goal: Push sums inward to minimize computation.


● Key Idea:
● Factorize: Express joint as product of local potentials
(CPDs/factors).
● Eliminate Variables Sequentially: Marginalize one variable at
a time, updating remaining factors.
● Example Elimination Order:
● C→D→I→S→L→G→H
Step-by-Step Elimination-1

● Eliminate C
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)

● Compute τ1(D)=∑CP(C)P(D∣C)
● New factor over D
Step-by-Step Elimination-2

● Eliminate D
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)

● Compute τ2(G,I)=∑DP(G∣I,D)τ1(D)
● New factor over G,I.
Step-by-Step Elimination-3

● Eliminate I
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)

● Compute τ3(G,S)=∑Iτ2(G,I)P(I)P(S∣I)
Step-by-Step Elimination-4

● Eliminate S
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)

● Compute τ4(J,L,G)=∑Sτ3(G,S)P(J∣L,S)
Step-by-Step Elimination-5

● Eliminate L
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)

● Compute τ5(J,G)=∑Lτ4(J,L,G)P(L∣G)
Step-by-Step Elimination-6

● Eliminate G
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● τ5(J,G)P(H∣G,J)

● Compute τ6(J,H)=∑Gτ5(J,G)P(H∣G,J)
Step-by-Step Elimination-7

● Eliminate H
● P(C) P(D∣C) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ1(D) P(I) P(G∣I,D) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● P(I) τ2(G,I) P(S∣I) P(L∣G) P(J∣L,S) P(H∣G,J)
● τ3(G,S)P(L∣G) P(J∣L,S) P(H∣G,J)
● τ4(J,L,G)P(L∣G) P(H∣G,J)
● τ5(J,G)P(H∣G,J)
● τ6(J,H)

● Compute P(J)=∑Hτ6(J,H)
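A minimal NumPy sketch of this elimination sequence, assuming randomly generated (hypothetical) binary CPDs just to show the mechanics; each einsum call implements one τ computation from the slides.

# Variable elimination C, D, I, S, L, G, H for the student network, hypothetical CPDs.
import numpy as np

rng = np.random.default_rng(0)

def random_cpd(*shape):
    """Random conditional distribution: the last axis sums to 1."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

# CPDs indexed as P[parents..., child]; all variables are binary.
P_C = random_cpd(2)           # P(C)
P_D_C = random_cpd(2, 2)      # P(D | C), axes (C, D)
P_I = random_cpd(2)           # P(I)
P_G_ID = random_cpd(2, 2, 2)  # P(G | I, D), axes (I, D, G)
P_S_I = random_cpd(2, 2)      # P(S | I), axes (I, S)
P_L_G = random_cpd(2, 2)      # P(L | G), axes (G, L)
P_J_LS = random_cpd(2, 2, 2)  # P(J | L, S), axes (L, S, J)
P_H_GJ = random_cpd(2, 2, 2)  # P(H | G, J), axes (G, J, H)

tau1 = np.einsum('c,cd->d', P_C, P_D_C)            # tau1(D) = sum_C P(C) P(D|C)
tau2 = np.einsum('idg,d->gi', P_G_ID, tau1)        # tau2(G, I)
tau3 = np.einsum('gi,i,is->gs', tau2, P_I, P_S_I)  # tau3(G, S)
tau4 = np.einsum('gs,lsj->jlg', tau3, P_J_LS)      # tau4(J, L, G)
tau5 = np.einsum('jlg,gl->jg', tau4, P_L_G)        # tau5(J, G)
tau6 = np.einsum('jg,gjh->jh', tau5, P_H_GJ)       # tau6(J, H)
P_J = tau6.sum(axis=1)                             # P(J) = sum_H tau6(J, H)

print(P_J, P_J.sum())  # the marginal sums to 1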
Computational Complexity
● Induced Width: Size of the largest clique formed during
elimination.
● Determines complexity: O(exp(induced width))
● Good Order: Always eliminate variables that disconnect the
graph first (like C).
● Rule of Thumb: Start with variables that have the fewest
connections!
● Fill-in Edges: Edges added during elimination to form cliques.
Treewidth and Induced Width
● Definition:
● Treewidth: Minimal induced width across all possible
elimination orderings.
● Induced Width: Size of the largest clique formed during
variable elimination.
● Key Points:
● Measures graph complexity for inference.
● For trees: Treewidth = 1 (if defined as max clique size - 1).
● Elimination order impacts induced width (leaf-to-root is
optimal for trees).
Limitations of Variable Elimination

● Problem:
● Recomputing intermediates for new queries (e.g., P(H) after
P(J)).
● No reuse of calculations.
Belief Propagation (The "Time-Saver")
● Cache intermediate results ("messages") between nodes.
● Like writing down sub-totals while cleaning:
○ Compute "message" from C to D once.
○ Reuse it for all future queries involving D
● Works perfectly for trees (no loops).
● Approximate for complex graphs (but still faster).
Example of Belief Propagation

Consider this Bayesian Network (tree):

Rain (R) → Wet Grass (G)
Sprinkler (S) → Wet Grass (G)

We want to compute:

P(R=1∣G=1) (probability it rained given the grass is wet).

P(S=1∣G=1) (probability the sprinkler was on given the grass is wet).
Step 1: Define Probabilities

● Assume binary variables (0/1).
● Priors: P(R=1) = 0.2, P(S=1) = 0.1
● Conditional Probabilities:
● P(G=1∣R=1,S=0) = 0.9 (grass gets wet if it rains)
● P(G=1∣R=0,S=1) = 0.8 (grass gets wet if sprinkler is on)
● P(G=1∣R=1,S=1) = 0.99 (both rain and sprinkler)
● P(G=1∣R=0,S=0) = 0.01 (neither)
Step 2: Factor Graph Representation

● The joint distribution factors as:
P(R,S,G) = P(R) ⋅ P(S) ⋅ P(G∣R,S)
● Priors:
P(R) = [0.8, 0.2] (for R=0, R=1)
P(S) = [0.9, 0.1] (for S=0, S=1)
● Conditional Probability Table (CPT) for G, P(G∣R,S): given by the problem
(table shown on the slide).
Step 2: Factor Graph Representation (continued)

● Parent vs. Child Nodes:

○ Parents (R, S): Directly influence the child (G).

○ Child (G): Depends on its parents (via P(G∣R,S)).

● Message Types:

○ Child-to-parent message: Sent from G to R (or S):
"Given my observed value, here's how likely each of your states is."

○ Parent-to-child message: Sent from R or S to G:
"Here's my current belief about my state."

● Goal: Compute how observing G=1 updates beliefs about the
parents R and S.
Message from Child G to Parent R

(Message computation shown on the slide.)

How R Updates Its Belief

(Update computation shown on the slide.)

Message from Child G to Parent S

(Message computation shown on the slide.)

Key Intuition

● Child-to-parent message: Summarizes how well each parent


state explains the observed child, accounting for uncertainty
in the other parents.
● Parent-to-child message: Shares the parent’s current belief
about itself.
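A minimal sketch that recovers the two posteriors by direct enumeration, using the priors and CPT from Step 1; in a tree this small, enumeration is numerically equivalent to the child-to-parent message passing described above.

# Posteriors P(R=1 | G=1) and P(S=1 | G=1) for the Rain/Sprinkler/Wet-Grass tree.
from itertools import product

P_R = {0: 0.8, 1: 0.2}                     # prior on Rain
P_S = {0: 0.9, 1: 0.1}                     # prior on Sprinkler
P_G1_given_RS = {                          # P(G=1 | R, S)
    (0, 0): 0.01, (0, 1): 0.80,
    (1, 0): 0.90, (1, 1): 0.99,
}

# Joint P(R, S, G=1) for every parent configuration.
joint = {(r, s): P_R[r] * P_S[s] * P_G1_given_RS[(r, s)]
         for r, s in product((0, 1), repeat=2)}

p_g1 = sum(joint.values())                                        # P(G=1)
p_r1 = sum(v for (r, _s), v in joint.items() if r == 1) / p_g1    # P(R=1 | G=1)
p_s1 = sum(v for (_r, s), v in joint.items() if s == 1) / p_g1    # P(S=1 | G=1)

print(f"P(G=1)     = {p_g1:.4f}")   # 0.2530
print(f"P(R=1|G=1) = {p_r1:.4f}")   # about 0.719 (rain is the likelier cause)
print(f"P(S=1|G=1) = {p_s1:.4f}")   # about 0.331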
Key Formula

(Message-passing update equations shown on the slide.)
MAP Inference (Finding the "Best" Configuration)

● Goal: Find the most probable assignment (e.g., best


job/happiness combo).
● Trick: Replace sums with max in variable elimination:
○ Compute maxC P(C)P(D∣C) instead of ∑C P(C)P(D∣C).
○ Track which C value gave the max (e.g., C=1).
○ Repeat for all variables.
● Result: Probability of the best configuration + the
configuration itself (see the sketch below).
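A minimal sketch of the max trick on the first elimination step, assuming hypothetical values for P(C) and P(D∣C); it keeps both the max score and the argmax so the best configuration can be read off at the end.

# Max-product version of the first elimination step, with hypothetical numbers.
P_C = {0: 0.6, 1: 0.4}
P_D_given_C = {(0, 0): 0.7, (0, 1): 0.3,   # key: (C, D)
               (1, 0): 0.2, (1, 1): 0.8}

# Instead of tau1(D) = sum_C P(C) P(D|C), take max_C and remember the argmax.
tau1_max = {}
best_C = {}
for d in (0, 1):
    scores = {c: P_C[c] * P_D_given_C[(c, d)] for c in (0, 1)}
    best_C[d] = max(scores, key=scores.get)
    tau1_max[d] = scores[best_C[d]]

print(tau1_max)  # {0: 0.42, 1: 0.32}
print(best_C)    # {0: 0, 1: 1} : which C achieved the max for each value of D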
Assignment-9 (Cs-101- 2024) (Week-9)

Source
Question-1
In the undirected graph given below, how many terms will be there in its potential
function factorization?

a) 7
b) 3
c) 5
d) 9
e) None
Question-1- Correct answer

In the undirected graph given below, how many terms will be there in its
potential function factorization?

a) 7
b) 3
c) 5
d) 9
e) none

Correct options: (b)-three cliques {A,D,E}, {A,B,C,D}, {B,C,F}


Question-2

Consider the following directed graph:


Based on the d-separation rules, which of the following statements is true?

a) A and C are conditionally independent given B


b) A and E are conditionally independent given D
c) B and E are conditionally dependent given C
d) A and C are conditionally dependent given D and E
Question-2-Explanation

a) A and C are conditionally independent given B: A→B→C

Chain structure: when the middle node B is observed, the path is blocked, so A and C are independent. (True)

b) A and E are conditionally independent given D

Chain structure: when D is observed, the path is blocked, so A and E are independent. (True)

c) B and E are conditionally dependent given C

Chain structure: when C is observed, the path is blocked, so B and E are actually independent given C. (False)

d) A and C are conditionally dependent given D and E

A→D→E←C contains a collider; with D and E observed, this path is activated, so A and C become dependent. (True)
Question-2- Correct answer

Consider the following directed graph: Based on the d-separation rules,


which of the following statements is true?

a) A and C are conditionally independent given B


b) A and E are conditionally independent given D
c) B and E are conditionally dependent given C
d) A and C are conditionally dependent given D and E

Correct options: (a) (b) (d)


Question-3

Consider the following undirected graph: In the undirected graph given


above, which nodes are conditionally independent of each other given C?
Select all that apply.

a) A, E
b) B, F
c) A, D
d) B, D
e) None of the above
Question-3 - Correct answer

Consider the following undirected graph: In the undirected graph given


above, which nodes are conditionally independent of each other given C?
Select all that apply.

a) A, E
b) B, F
c) A, D
d) B, D
e) None of the above

Correct options: (e)-All pairs have an alternate route to each other that
does not pass through C
Question-4
Consider the following statements about Hidden Markov Models (HMMs):
I. The ”Hidden” in HMM refers to the fact that the state transition probabilities are unknown.
II. The ”Markov” property means that the current state depends only on the previous state.
III. The ”Hidden” aspect relates to the underlying state sequence that is not directly observable.
IV. The ”Markov” in HMM indicates that the model uses matrix operations for calculations.
Which of the statements correctly describe the ”Hidden” and ”Markov” aspects of Hidden Markov
Models?

a) I and II
b) I and IV
c) II and III
d) III and IV
Question-4-Explanation
I. The "Hidden" in HMM refers to the fact that the state transition
probabilities are unknown. (False: the transition probabilities are known
model parameters; it is the state sequence that is hidden.)
II. The "Markov" property means that the current state depends only on the
previous state. (True)
III. The "Hidden" aspect relates to the underlying state sequence that is not
directly observable. (True)
IV. The "Markov" in HMM indicates that the model uses matrix operations
for calculations. (False: "Markov" refers to the property that the next state
depends only on the current state.)
Question-4 - Correct answer
Consider the following statements about Hidden Markov Models (HMMs):
I. The ”Hidden” in HMM refers to the fact that the state transition probabilities are unknown.
II. The ”Markov” property means that the current state depends only on the previous state.
III. The ”Hidden” aspect relates to the underlying state sequence that is not directly observable.
IV. The ”Markov” in HMM indicates that the model uses matrix operations for calculations.
Which of the statements correctly describe the ”Hidden” and ”Markov” aspects of Hidden Markov Models?

a) I and II
b) I and IV
c) II and III
d) III and IV

Correct options: (c)


Question-5

For the given graphical model, what is the optimal variable elimination
order when trying to calculate P(E=e)

a) A, B, C, D
b) D, C, B, A
c) A, D, B, C
d) D, A, C, A
Question-5 - Correct answer

For the given graphical model, what is the optimal variable elimination
order when trying to calculate P(E=e)

a) A, B, C, D
b) D, C, B, A
c) A, D, B, C
d) D, A, C, A

Correct options: (a)


Question-6
Consider the following statements regarding belief propagation:
I. Belief propagation is used to compute marginal probabilities in graphical models.
II. Belief propagation can be applied to both directed and undirected graphical models.
III. Belief propagation guarantees an exact solution when applied to loopy graphs.
IV. Belief propagation works by passing messages between nodes in a graph.
Which of the statements correctly describe the use of belief propagation?

a) I and II
b) II and III
c) I, II, and IV
d) I, III, and IV
e) II, III, and IV
Question-6-Explanation

I. Belief propagation is used to compute marginal probabilities in graphical
models. (True)
II. Belief propagation can be applied to both directed and undirected
graphical models. (True)
III. Belief propagation guarantees an exact solution when applied to loopy
graphs. (False: it is exact on trees; on loopy graphs it is only approximate.)
IV. Belief propagation works by passing messages between nodes in a
graph. (True)
Question-6 - Correct answer
Consider the following statements regarding belief propagation:
I. Belief propagation is used to compute marginal probabilities in graphical models.
II. Belief propagation can be applied to both directed and undirected graphical models.
III. Belief propagation guarantees an exact solution when applied to loopy graphs.
IV. Belief propagation works by passing messages between nodes in a graph.
Which of the statements correctly describe the use of belief propagation?

a) I and II
b) II and III
c) I, II, and IV
d) I, III, and IV
e) II, III, and IV

Correct options: (c)


Question-7
HMMs are used for finding these. Select all that apply

a) Probability of a given observation sequence


b) All possible hidden state sequences given an observation sequence
c) Most probable observation sequence given the hidden states
d) Most probable hidden states given the observation sequence
Question-7 - Correct answer
HMMs are used for finding these. Select all that apply

a) Probability of a given observation sequence


b) All possible hidden state sequences given an observation sequence
c) Most probable observation sequence given the hidden states
d) Most probable hidden states given the observation sequence

Correct options: (a)(d)


Suggestions and Feedback

Next Session:

Tuesday:
01-Apr-2025
6:00 - 8:00 PM
