AI60201 2024 Endsem Solutions

The document outlines the examination details for a course on Graphical and Generative Models for Machine Learning at IIT Kharagpur, including specific instructions and a breakdown of questions across three sections. It covers topics such as graphical models, variational inference, GANs, and clustering methods. The exam consists of both short answer and detailed problem-solving questions, requiring students to demonstrate their understanding of complex machine learning concepts.

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

End-Spring Semester Examination 2023-24


Date of Examination: 19th April 2024 Session: FN Duration: 3 Hrs Full Marks: 80

Subject No. : AI60201 Subject : Graphical and Generative Models for Machine Learning
Department/Center/School: Centre of Excellence in Artificial Intelligence

Specific charts, graph paper, log book etc., required

Special Instructions (if any) : All calculations must be shown in detail

Section A (Answer all questions)

Q1. Provide short answers to each question [4x5=20 marks]

i) Explain why directed graphical models and undirected graphical models do not
represent the same set of probability distributions.

- You need to show at least one graphical model whose DGM and moralized
UGM versions do not have the same set of conditional independence relations.
Examples include the head-to-head (collider) structure or the "square" (4-cycle) graph.

ii) Explain how message-passing inference can be used to infer any
intermediate latent state of a sequence of observations that follows an HMM.

- You are expected to draw the HMM, show the message paths and define the
message formulas (sum-product)
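
A minimal numpy sketch of the sum-product (forward-backward) computation of p(Z_t | x_1..x_T) for a discrete HMM; the transition matrix A, emission matrix B, initial distribution pi and queried index t are illustrative inputs, not part of the question:

import numpy as np

def hmm_marginal(pi, A, B, obs, t):
    # pi: (K,) initial distribution, A: (K,K) with A[i,j] = p(Z_{s+1}=j | Z_s=i),
    # B: (K,M) with B[k,m] = p(x=m | Z=k), obs: list of observed symbols.
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))            # forward messages  ~ p(x_1..x_s, Z_s)
    beta = np.zeros((T, K))             # backward messages ~ p(x_{s+1}..x_T | Z_s)
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()          # normalise for numerical stability
    for s in range(1, T):
        alpha[s] = (alpha[s - 1] @ A) * B[:, obs[s]]
        alpha[s] /= alpha[s].sum()
    beta[T - 1] = 1.0
    for s in range(T - 2, -1, -1):
        beta[s] = A @ (B[:, obs[s + 1]] * beta[s + 1])
        beta[s] /= beta[s].sum()
    post = alpha[t] * beta[t]           # combine the two messages at the queried node
    return post / post.sum()            # p(Z_t = k | x_1..x_T)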

iii) Explain the "reparameterization trick" in the context of Variational Autoencoders.

- You should write the formulation and why it is needed for training a VAE
(backpropagation cannot flow through an intermediate sampling step)
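
A minimal sketch of the step itself, assuming a Gaussian posterior with encoder outputs mu and sigma (names are illustrative):

import numpy as np

def reparameterize(mu, sigma, rng=np.random.default_rng()):
    # z = mu + sigma * eps with eps ~ N(0, I).  The randomness is isolated in eps,
    # so in an autodiff framework gradients can flow back through mu and sigma
    # (the encoder outputs) even though z is a sampled quantity.
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps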

iv) Explain "mode collapse" in GANs and how it can be addressed.

- At least one solution should be mentioned. Bi-GAN was discussed in class;
any other solution will also be accepted if explained properly.

v) You have trained a denoising diffusion model using N images of size m x n.
Explain how you will use it to generate new images from Gaussian noise.

- Sampling algorithm with the full expression and an explanation of the terms
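
For reference, a sketch of the standard DDPM-style ancestral sampling loop, assuming a trained noise-prediction network eps_theta and a noise schedule betas (both names are assumptions for illustration):

import numpy as np

def ddpm_sample(eps_theta, T, betas, shape, rng=np.random.default_rng()):
    # Ancestral sampling: start from pure Gaussian noise x_T and denoise step by step.
    # eps_theta(x_t, t) -> predicted noise; betas: (T,) noise schedule.
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                     # x_T ~ N(0, I), shape = (m, n)
    for t in range(T - 1, -1, -1):
        z = rng.standard_normal(shape) if t > 0 else 0.0
        eps = eps_theta(x, t)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * z               # add noise except at the last step
    return x                                           # x_0: a generated m x n image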


Q2. i) Define a Dirichlet Process Mixture Model, and show how it can be used for an infinite
topic model with appropriate choice of hyperparameters. [10 marks]

- A DPMM is not the same as the CRP (though related); writing only the CRP
equations will earn partial marks. For topic models, we have documents, each
having its own distribution over a shared set of topics. This requires a
hierarchical DPMM. The topics are the mixture components, drawn from the
base distribution (a Dirichlet).

ii) Explain the approach of variational inference with necessary equations. Show how it can
be applied for inference in Hidden Markov Models. [4+6=10 marks]

- This is self-explanatory. The expression for q(Z_t) should be written, as
derived in class before the midsem.
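
For reference, the standard fully factorized (mean-field) update for a discrete HMM takes the form below; this is a sketch and may differ in notation from the in-class derivation:

log q(Z_t = k) = E_{q(Z_{t-1})}[ log p(Z_t = k | Z_{t-1}) ] + E_{q(Z_{t+1})}[ log p(Z_{t+1} | Z_t = k) ] + log p(x_t | Z_t = k) + const.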

Section B (Answer any 2 questions)

Q3. Draw a Bayesian Network for the given model in plate notation. Derive the Gibbs
Sampling updates for Z1, Z2, Z3. State the full algorithm for inference (including burn-in,
sample collection etc). Assume the parameters are known. [2+4+4=10 marks]

π ~ Dir([a_1, …, a_K]),  Z2(i) ~ Cat(π),  Z3(i,j) ~ N(u_k, s_k^2) where k = Z2(i),  X3(i,j) ~ N(Z3(i,j), σ^2)

- π outside any plate; a large plate (over i) containing Z2, and within it a
smaller plate (over j) containing Z3 and X3. Arrows: π -> Z2, Z2 -> Z3, Z3 -> X3.

- For Gibbs Sampling, p(Z2(i) | Z2(-i), π, Z3) ∝ p(Z2(i) | π) * ∏_j p(Z3(i,j) | Z2(i))

p(Z3(i,j) | Z2(i), X3(i,j)) ∝ p(Z3(i,j) | Z2(i)) * p(X3(i,j) | Z3(i,j))

The expression for each of these PDFs (Gaussian/Categorical) needs to be written out.
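
A sketch of one Gibbs sweep for this model in numpy, assuming the parameters π, u_k, s_k^2 and σ^2 are known as stated (array names are illustrative):

import numpy as np

def gibbs_step(Z2, Z3, X3, pi, u, s2, sigma2, rng=np.random.default_rng()):
    # Z2: (N,) cluster labels, Z3, X3: (N, M) arrays; u, s2: (K,) known component params.
    N, M = X3.shape
    K = len(pi)
    for i in range(N):
        # p(Z2(i)=k | ...) ∝ pi_k * prod_j N(Z3(i,j); u_k, s_k^2)
        log_p = np.log(pi) + np.array([
            -0.5 * np.sum((Z3[i] - u[k]) ** 2 / s2[k] + np.log(2 * np.pi * s2[k]))
            for k in range(K)])
        p = np.exp(log_p - log_p.max())
        Z2[i] = rng.choice(K, p=p / p.sum())
        # p(Z3(i,j) | ...) is Gaussian: precisions add, means are precision-weighted
        k = Z2[i]
        prec = 1.0 / s2[k] + 1.0 / sigma2
        mean = (u[k] / s2[k] + X3[i] / sigma2) / prec
        Z3[i] = mean + rng.standard_normal(M) / np.sqrt(prec)
    return Z2, Z3

The full inference algorithm repeats such sweeps from a random initialization, discards an initial burn-in portion, and then collects (possibly thinned) samples to approximate the posterior.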

Q4. A variational autoencoder generates 2x1 vectors by a feedforward neural network
whose architecture is given below. Similarly, the architecture of the encoder is also
provided. Suppose you have a sample of Z = 1.2 from the base distribution N(0,1). Calculate
the average loss function over 5 reconstructions, using e = {-0.5, 0.8, 1.3, -2.2, 0.1} for the
reparameterization. [10 marks]

- The decoder represents a function X = f(z) s.t. X = [6z^2 + 4, 8z^2 - 8]. We start with
z = 1.2 and calculate the corresponding X (call it X0). This X is now passed to
the encoder, which estimates u(X) = (X2 - X1)/2 = z^2 - 6 and S(X) = (4X1 - 3X2)/2 = 20.

- Now, to calculate the loss, we need to see how well the decoder can reconstruct
X0 from the encoder's output. We compute Z ~ N(u(X0), S(X0)) [Z = u(X0) + e*S(X0)],
then X = f(Z), and then the loss ||X - X0||^2 + KL(N(0,1) || N(u(X0), S(X0))). To
calculate the new value of Z, we use the reparameterization trick, for which we are
provided with 5 values of e. We calculate the loss for each of them and then take
the average.
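
A numerical sketch of this computation, following the expressions above; it assumes S(X0) is used as a standard deviation and takes the KL term in the direction written above:

import numpy as np

def f(z):                                   # decoder: X = f(z)
    return np.array([6 * z**2 + 4, 8 * z**2 - 8])

z0 = 1.2
X0 = f(z0)                                  # [12.64, 3.52]
u = (X0[1] - X0[0]) / 2                     # encoder mean  = z0^2 - 6 = -4.56
S = (4 * X0[0] - 3 * X0[1]) / 2             # encoder scale = 20

def kl_std_normal_vs(mu, sigma):
    # KL(N(0,1) || N(mu, sigma^2)) for univariate Gaussians
    return np.log(sigma) + (1 + mu**2) / (2 * sigma**2) - 0.5

eps = [-0.5, 0.8, 1.3, -2.2, 0.1]
losses = []
for e in eps:
    z = u + e * S                           # reparameterized sample
    X = f(z)
    losses.append(np.sum((X - X0) ** 2) + kl_std_normal_vs(u, S))
print(np.mean(losses))                      # average loss over the 5 reconstructions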

Q5. i) Develop a Normalizing Flow Model that takes an input vector Z (1xD) and converts it
into a vector of the form [a_1 Z_1, …, a_d Z_d, b/Z_{d+1}, …, b/Z_D] at each stage. The original input
is Z ~ N(0, I), where 0 is the D-dim zero vector and I is the DxD identity matrix. Derive the PDF of the
output vector X after 2 stages of the transformation above.

ii) We have N observations of X. How do we estimate the parameters (a_1, …, a_d, b)?

- Need to derive the expression for X in terms of Z. This will help to calculate
p(X) in terms of p(Z), which is known, via the change-of-variables formula
(including the Jacobian determinant of each stage). For (ii), need to write the
joint PDF p(X1)*p(X2)*…*p(XN) and then maximize it w.r.t. the parameters.
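
A sketch of the change-of-variables computation under these transformations, assuming the coefficients a_1..a_d (array a) and b are given; for (ii) the resulting log-densities of the N observations would be summed and maximized w.r.t. (a, b):

import numpy as np

def forward(z, a, b, d):
    # One stage: x_i = a_i * z_i for i <= d, x_i = b / z_i for i > d
    x = z.copy()
    x[:d] = a * z[:d]
    x[d:] = b / z[d:]
    return x

def inverse_and_logdet(x, a, b, d):
    # Invert one stage and return log|det dx/dz| evaluated at that z.
    # The Jacobian is diagonal: a_i for i <= d and -b/z_i^2 for i > d.
    z = x.copy()
    z[:d] = x[:d] / a
    z[d:] = b / x[d:]
    logdet = np.sum(np.log(np.abs(a))) + np.sum(np.log(np.abs(b) / z[d:] ** 2))
    return z, logdet

def log_px(x, a, b, d, stages=2):
    # log p(X) after `stages` identical transformations of Z ~ N(0, I)
    logdet_total = 0.0
    for _ in range(stages):
        x, logdet = inverse_and_logdet(x, a, b, d)
        logdet_total += logdet
    D = x.size
    log_pz = -0.5 * np.sum(x ** 2) - 0.5 * D * np.log(2 * np.pi)
    return log_pz - logdet_total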

[5+5=10 marks]

Section C (Answer any 2 questions)

Q6. Consider a GAN whose generator has the same architecture as the decoder of the VAE
in Q4. The discriminator is a logistic regression classifier, whose weight vector is [2 1] with
bias 0. The dataset has 5 observations: [(1, -2), (2, -4), (-2, 4), (5, -10), (-3, 6)].

i) Drawing 5 samples from the generator using noise values Z = {-0.5, 0.8, 1.3, -2.2,
0.1}, evaluate the GAN objective function.

ii) Suggest new weights of the generator, so that the objective function improves
w.r.t the generator. Similarly, suggest new weights of the discriminator so that
objective function improves w.r.t the discriminator.

iii) Explain the GAN objective function from the perspective of J-S divergence.

[4+4+2=10 marks]
[I have given a max of 6 marks for doing the first part correctly]

- The "decoder" from Q4 needs to be used, not the "encoder". For each value of Z,
it gives us X = f(Z), whose formula is again [6z^2 + 4, 8z^2 - 8]. Thus we get 5
values of X (drawn from X_gen). We also have 5 samples from X_data. To calculate
the GAN objective function, we calculate the average values of log(D(x)) and
log(1 - D(x)) using these, where D(x) = 1/(1 + exp(-w.x)) with w = [2 1]. This gives
us the GAN objective function value.
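
A numerical sketch of part (i), using the Q4 decoder as the generator and D(x) = sigmoid(w.x) with w = [2, 1]; log D and log(1 - D) are computed in a numerically stable form:

import numpy as np

w = np.array([2.0, 1.0])

def G(z):                                   # generator (the Q4 decoder)
    return np.array([6 * z**2 + 4, 8 * z**2 - 8])

def log_D(x):                               # log sigmoid(w.x)
    return -np.logaddexp(0.0, -(w @ x))

def log_1mD(x):                             # log(1 - sigmoid(w.x))
    return -np.logaddexp(0.0, (w @ x))

X_data = np.array([[1, -2], [2, -4], [-2, 4], [5, -10], [-3, 6]], dtype=float)
Z = [-0.5, 0.8, 1.3, -2.2, 0.1]
X_gen = [G(z) for z in Z]

# V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))]
value = np.mean([log_D(x) for x in X_data]) + np.mean([log_1mD(x) for x in X_gen])
print(value)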

- We can easily see that the true data is of the form (z, -2z) while we are
generating [6z^2 + 4, 8z^2 - 8]. So to improve the objective w.r.t. the generator, we
must change the weights of the decoder accordingly. Similarly, we find that the
discriminator is totally confused (D(x) = 0.5) for X_data, so its weights have to be
changed so that it gives a high response for X_data (D(x) → 1).

Q7. Consider a labelled dataset as given below. Two models have been used to generate
10 observations each, with class labels. Using different measures of generative model
evaluation, compare the two models against the training data. Use a suitably trained
Bayesian Classifier for classification purposes. [10 marks]

Training data:
ID   1     2     3     4     5      6      7      8      9    10
X   21.5  24.8  29.4  27.6  -12.2  -15.7  -13.6  -11.1   0.5  -0.7
Y    A     A     A     A     B      B      B      B      C     C

Model M1:
ID   1     2     3     4     5     6     7     8     9    10
X   12.1  18.3  16.7  14.5  13.6  -7.4  -1.2  -4.6  -6.8  -2.0
Y    A     A     A     A     A     B     B     B     B     B

Model M2:
ID   1     2     3     4     5     6     7    8     9    10
X    4.8   1.2   3.4  -2.4  -3.2  -1.6   1.8  0.3  -0.5  -1.2
Y    A     A     A     B     B     B     C    C     C     C

- We can use standard criteria like diversity, sharpness and inception score
as discussed in class. For sharpness, we need a classifier trained on the
original data to classify the generated samples, but the classifier should be
probabilistic (a Bayesian classifier is specified). For each of the 20 generated
observations, the classifier will give a probability distribution over class labels,
whose entropy can be calculated. The average entropy is then calculated separately
for the M1 samples and the M2 samples and compared for sharpness (lower
entropy: higher sharpness). For diversity, the generated class label
distributions should be compared to the original class label distribution w.r.t.
cross-entropy. Other criteria (e.g., based on mean/variance etc.) are acceptable
as long as they are applied consistently.
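
A sketch of these computations, assuming a Gaussian (naive) Bayes classifier on the single feature X; the specific metric definitions here (average predictive entropy for sharpness, label-distribution cross-entropy for diversity) are one consistent choice, not the only acceptable one:

import numpy as np

train_x = np.array([21.5, 24.8, 29.4, 27.6, -12.2, -15.7, -13.6, -11.1, 0.5, -0.7])
train_y = np.array(list("AAAABBBBCC"))
classes = np.array(["A", "B", "C"])

# Fit per-class Gaussians and class priors on the training data
mu  = np.array([train_x[train_y == c].mean() for c in classes])
var = np.array([train_x[train_y == c].var()  for c in classes])
pri = np.array([(train_y == c).mean()        for c in classes])

def predict_proba(x):
    log_p = np.log(pri) - 0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()

def avg_entropy(xs):                      # lower average entropy -> sharper samples
    ps = np.array([predict_proba(x) for x in xs])
    return -np.mean(np.sum(ps * np.log(ps + 1e-12), axis=1))

def label_cross_entropy(gen_y):           # compare label distributions for diversity
    p = np.array([(train_y == c).mean() for c in classes])
    q = np.array([(gen_y == c).mean()   for c in classes])
    return -np.sum(p * np.log(q + 1e-12))

m1_x = np.array([12.1, 18.3, 16.7, 14.5, 13.6, -7.4, -1.2, -4.6, -6.8, -2.0])
m1_y = np.array(list("AAAAABBBBB"))
m2_x = np.array([4.8, 1.2, 3.4, -2.4, -3.2, -1.6, 1.8, 0.3, -0.5, -1.2])
m2_y = np.array(list("AAABBBCCCC"))

print("sharpness (avg entropy):", avg_entropy(m1_x), avg_entropy(m2_x))
print("diversity (label CE):   ", label_cross_entropy(m1_y), label_cross_entropy(m2_y))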

Q8. i) Consider an online clustering problem, where the data arrives one by one, and each
point may either be placed in an existing cluster or in a new cluster, according to the Chinese
Restaurant Process. Considering the mixture components are N(u, 25) with a base distribution of
N(0, 25) on u and α = 2, demonstrate how the dataset below will be clustered according to
the CRP. How can we control the number of clusters formed?

21.5 -15.7 0.5 -12.2 27.6 24.8 -13.6 -11.1 29.4 -0.7

- For online clustering, we cannot use Gibbs Sampling. Instead, for each datapoint,
we calculate its clustering distribution (the probability of joining each existing
cluster or creating a new cluster). For these calculations we need the CRP formula:

P(Z_{n+1} = k) ∝ (n_k / (n + α)) * N(x; u_k, 25), where u_k for each cluster is estimated as
the mean of the datapoints already assigned to it. A new cluster is opened with probability
proportional to α / (n + α) times the likelihood of x under a fresh component; since α appears
in this term, increasing α increases the number of clusters formed.
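
For illustration, a sketch of the online assignment in Python, assuming greedy (argmax) assignment of each arriving point; for the new-cluster term, integrating u out of the base N(0, 25) against the component N(u, 25) gives a predictive of N(x; 0, 50):

import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

data = [21.5, -15.7, 0.5, -12.2, 27.6, 24.8, -13.6, -11.1, 29.4, -0.7]
alpha = 2.0
clusters = []                                     # list of lists of assigned points
for x in data:
    n = sum(len(c) for c in clusters)
    scores = [len(c) / (n + alpha) * normal_pdf(x, np.mean(c), 25.0) for c in clusters]
    scores.append(alpha / (n + alpha) * normal_pdf(x, 0.0, 50.0))   # new-cluster option
    k = int(np.argmax(scores))                    # greedy assignment (could also sample)
    if k == len(clusters):
        clusters.append([x])
    else:
        clusters[k].append(x)
print([np.round(c, 1).tolist() for c in clusters])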

ii) We have D-dimensional real-valued measurements of a geophysical variable X at N
points, with lat-lon-alt positions {(x1,y1,z1), …, (xN,yN,zN)}. Using Gaussian Process
Regression, how can we estimate X at a new position (x',y',z')? [6+4=10 marks]

- The main thing to note here is that the (x,y,z) vectors denote the locations where
we have measurements, so we calculate the mean function and covariance function
over those locations. For example, Σ(x, x') = exp(-||x - x'||^2), where x, x' are 3D location vectors.
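
A sketch of the resulting prediction, assuming a zero prior mean, the squared-exponential kernel above, and a small noise term (both the kernel and noise level are illustrative choices):

import numpy as np

def kernel(P1, P2):
    # Squared-exponential kernel over 3-D location vectors
    d2 = np.sum((P1[:, None, :] - P2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2)

def gp_predict(P, X, p_new, noise=1e-6):
    # P: (N, 3) measurement locations, X: (N,) or (N, D) measured values,
    # p_new: (3,) new location.  Returns posterior mean and variance at p_new.
    K = kernel(P, P) + noise * np.eye(len(P))
    k_star = kernel(P, p_new[None, :])[:, 0]
    w = np.linalg.solve(K, X)
    mean = k_star @ w
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)
    return mean, var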
