AI60201 Module 3 & 4 Problems

The document contains a series of questions related to various machine learning models, including binary image generation, variational autoencoders, Deep Boltzmann Machines, Normalizing Flows, GANs, and clustering with Dirichlet Processes. Each question requires mathematical derivations, probability distributions, and parameter estimations based on given observations and model architectures. The document also discusses advanced concepts such as Gibbs Sampling, Chinese Restaurant Processes, and Gaussian Processes in the context of real-valued observations.

Q1. I want to generate a 3x3 binary image by sequentially generating pixels row-wise.

The first pixel X(1,1) follows Bernoulli(0.5). Each subsequent pixel X(i,j) follows a Bernoulli distribution
with parameter h(i,j) equal to 0.5 times the mean of the previously generated pixel values {X(<=i,<=j)},
i.e. those above and to the left of it.

i) Write a general expression for p(X), where X is any 3x3 binary image, according to this
model.
ii) Calculate the probability of generating an image where the central pixel (i.e. X(2,2)) is
different from the remaining 8 pixels (which are all equal).
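A minimal brute-force sketch of this model (assuming the sub-rectangle {X(<=i,<=j)} excludes the pixel currently being generated) that evaluates p(X) for any image and the event asked for in (ii):

```python
import numpy as np

def image_prob(X):
    """p(X) for a 3x3 0/1 array X under the row-wise sequential model.
    h(i,j) = 0.5 * mean of the already-generated pixels in the sub-rectangle
    above/left of (i,j); the first pixel uses h = 0.5."""
    p = 1.0
    for i in range(3):
        for j in range(3):
            if i == 0 and j == 0:
                h = 0.5
            else:
                prev = [X[a, b] for a in range(i + 1) for b in range(j + 1)
                        if (a, b) != (i, j)]
                h = 0.5 * np.mean(prev)
            p *= h if X[i, j] == 1 else 1 - h
    return p

# Part (ii): central pixel differs from the eight (equal) surrounding pixels.
total = 0.0
for centre, rest in [(1, 0), (0, 1)]:
    X = np.full((3, 3), rest)
    X[1, 1] = centre
    total += image_prob(X)
print(total)
```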

Q2. A variational autoencoder can generate data X as 2x2 matrices as follows: i) Two random variables
Z1, Z2 are sampled from N(a,1) and N(b,1) independently, ii) these are passed to a neural network
with 1 hidden layer which produces 4 output values – X(1,1), X(1,2), X(2,1), X(2,2). The decoder
network architecture is given below.

i) What is the probability distribution over the space of 2x2 real-valued matrices that is
induced by the autoencoder?
ii) I have N observations from the VAE: [X1, X2, …, XN] (all 2x2 matrices). Assuming all edges
between Z and V have equal weight ‘w1’, and all edges between V and X also have equal
weight ‘w2’, estimate the parameters a, b, w1, w2.
iii) Now consider another encoder network with 1 hidden layer of 3 nodes and 2 output
nodes. The edge weights are again constrained in the same way as in the decoder (as
described in (ii)). The output nodes represent the means of the Gaussian-distributed
codes (the variance is fixed to 1). Given the N observations, explain how the encoder and
decoder parameters will be estimated, with the necessary derivations.

Solution Sketch:

i) Z1~N(a,1), Z2~N(b,1). V1=w1*(Z1+Z2), V2=w1*Z1, V3=w1*Z2, so V1~N(w1(a+b), 2w1²),
V2~N(w1a, w1²), V3~N(w1b, w1²). Also, X11=w2*V1, X12=w2*V2, X21=w2*V2, X22=w2*(V1+V3)
= w1w2*(Z1+2Z2).

So the marginals are X11~N(w1w2(a+b), 2w1²w2²), X12~N(w1w2a, w1²w2²), X21~N(w1w2a, w1²w2²),
X22~N(w1w2(a+2b), 5w1²w2²). Note that the four components are not independent: all of them are linear
functions of (Z1, Z2), and X12 = X21 exactly, so the induced p(X) is a degenerate (rank-2) jointly Gaussian
distribution over 2x2 matrices whose marginals are the ones above.

ii) From the observations, calculate sample means [m11, m12, m21, m22] and sample
variances [s11, s12, s21, s22]. Considering N large enough, we can have equations like
w1w2(a+b) = m11, w1w2a = m12 = m21, w1w2(a+2b) = m22 etc. Solving these, we can find
a, b, and w1w2. We can estimate w1, w2 individually if we have some prior on them.
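A short moment-matching sketch for (ii), assuming the observations are available as an array `Xs` of shape (N, 2, 2); only the products w1w2·a, w1w2·b and |w1w2| are identifiable without a prior on w1 or w2:

```python
import numpy as np

def estimate_from_moments(Xs):
    """Xs: array of shape (N, 2, 2) holding the N observed matrices."""
    m = Xs.mean(axis=0)                    # sample means m11, m12, m21, m22
    s = Xs.var(axis=0)                     # sample variances s11, s12, s21, s22
    w1w2a = 0.5 * (m[0, 1] + m[1, 0])      # X12 and X21 share mean w1*w2*a
    w1w2b = m[0, 0] - w1w2a                # since m11 = w1*w2*(a+b)
    w1w2 = np.sqrt(0.5 * (s[0, 1] + s[1, 0]))   # Var(X12) = w1^2*w2^2; sign not identifiable
    a, b = w1w2a / w1w2, w1w2b / w1w2
    # w1 and w2 individually cannot be separated from the data alone; a prior on
    # one of them is needed to split the product w1*w2.
    return a, b, w1w2
```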

iii) We first develop the augmented dataset [(X1, e1, f1), …, (XN, eN, fN)] by sampling the
noise ei~N(0,1), fi~N(0,1) (for the reparameterization trick). The loss function L(Xi) has two
parts: a reconstruction error L1(Xi) and a KL divergence L2(Xi). For input Xi to the encoder,
we can easily derive expressions for the two outputs, say c(Xi) and d(Xi). So the code
variables have distributions N(c(Xi),1) and N(d(Xi),1), while according to the model these
are N(a,1) and N(b,1). So L2(Xi) = ½*(a-c(Xi))² + ½*(b-d(Xi))² [using the formula for KL
divergence between two unit-variance Gaussians]. Again, the code for Xi as calculated by the encoder is
[c(Xi)+ei, d(Xi)+fi], which is equivalent to sampling from N(c(Xi),1) and N(d(Xi),1). The decoder then
calculates X’ where X’(1,1)=w2*w1*(c(Xi)+d(Xi)+ei+fi), X’(1,2)=X’(2,1)=w2*w1*(c(Xi)+ei), and
X’(2,2)=w2*w1*(c(Xi)+ei+2*d(Xi)+2*fi). Accordingly, we have L1(Xi)=||X’-Xi||². We now
need to calculate the gradient of L=Σi(L1(Xi)+L2(Xi)) with respect to each parameter in both the
encoder and the decoder, and run gradient descent.
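A minimal end-to-end sketch of this optimization, using PyTorch autograd in place of hand-derived gradients. The fully connected, weight-shared linear encoder below (hypothetical scalar weights u1, u2) is an assumed architecture, since the encoder figure is not reproduced here:

```python
import torch

def train_vae(X, steps=2000, lr=1e-2):
    """X: tensor of shape (N, 2, 2). Returns fitted (a, b, w1, w2)."""
    a  = torch.zeros((), requires_grad=True)   # prior mean of Z1
    b  = torch.zeros((), requires_grad=True)   # prior mean of Z2
    w1 = torch.ones((), requires_grad=True)    # shared decoder weight, Z -> V
    w2 = torch.ones((), requires_grad=True)    # shared decoder weight, V -> X
    u1 = torch.ones((), requires_grad=True)    # shared encoder weight, X -> hidden (assumed)
    u2 = torch.ones((), requires_grad=True)    # shared encoder weight, hidden -> code means (assumed)
    opt = torch.optim.SGD([a, b, w1, w2, u1, u2], lr=lr)
    for _ in range(steps):
        # Assumed fully connected, weight-shared linear encoder: each hidden unit sees
        # u1*sum(X) and both code means equal u2*(sum of the 3 hidden units). With this
        # sharing the two code means coincide; the figure's architecture may differ.
        s = X.sum(dim=(1, 2))
        c = d = u2 * 3 * (u1 * s)                          # c(Xi), d(Xi)
        e, f = torch.randn_like(c), torch.randn_like(c)    # reparameterization noise
        z1, z2 = c + e, d + f
        # Decoder exactly as in the problem statement.
        v1, v2, v3 = w1 * (z1 + z2), w1 * z1, w1 * z2
        Xr = torch.stack([w2 * v1, w2 * v2, w2 * v2, w2 * (v1 + v3)], dim=1).view(-1, 2, 2)
        recon = ((Xr - X) ** 2).sum(dim=(1, 2))            # L1(Xi)
        kl = 0.5 * (a - c) ** 2 + 0.5 * (b - d) ** 2       # L2(Xi)
        loss = (recon + kl).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return a.item(), b.item(), w1.item(), w2.item()
```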

Q3. Consider a Deep Boltzmann Machine to model 4D real-valued observations. There are 3 hidden
layers, with 2, 2, 1 hidden nodes, where the 2 nodes on the lowest hidden layer are real-valued in (0,1),
while the other 3 are binary. The edge potential functions are defined as usual, i.e. S(x,y)=exp(-x*y*w)
where ‘w’ is the edge weight.

i) Calculate the density p(X) over the 4D real space as defined by the model (assuming the
edge weights are all 1).
ii) We are provided with N observations [X1, ….. XN], each of which is a 4D vector. How do we
estimate the suitable model parameters (edge weights)?

Solution Sketch:

In addition to the observation X=[X1,X2,X3,X4] there are latent variables [Y1,Y2] (real), [Z1,Z2] (binary),
and V (binary). With all edge weights equal to 1,

p(X, Y, Z, V) ∝ S(X1,Y1)*…*S(X4,Y1)*S(X1,Y2)*…*S(X4,Y2)*S(Y1,Z1)*S(Y1,Z2)*S(Y2,Z1)*S(Y2,Z2)*S(Z1,V)*S(Z2,V)

= exp(-(X1+X2+X3+X4)*(Y1+Y2) - (Y1+Y2)*(Z1+Z2) - (Z1+Z2)*V),

where the proportionality constant is the partition function obtained by integrating/summing this expression
over all variables. To obtain p(X), we marginalize the latent variables: Y1, Y2 are eliminated by integration
over (0,1), while Z1, Z2, V are summed over {0,1}.

For edge weight estimation, we need the approach of computing gradients using contrastive
divergence, as discussed. For this, we need to sample different variables through Gibbs Sampling, for
which we need to derive the distributions, such as p(V|Z1,Z2), p(Z1|V,Y1,Y2), p(Y1|Z1,Z2,X1,X2,X3,X4)
etc. These are easy to calculate based on the edge potential functions.
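A small numerical sketch of the marginalization in (i), assuming all edge weights are 1 and using a simple midpoint grid for the integrals over Y1, Y2; the result is still only proportional to p(X) until the global partition function is computed as well:

```python
import itertools
import numpy as np

def unnorm_joint(X, Y, Z, V, w=1.0):
    """exp(-w*(sum(X)*(Y1+Y2) + (Y1+Y2)*(Z1+Z2) + (Z1+Z2)*V)), the unnormalized joint."""
    sX, sY, sZ = sum(X), sum(Y), sum(Z)
    return np.exp(-w * (sX * sY + sY * sZ + sZ * V))

def unnorm_pX(X, grid=100):
    """Marginal of a 4D observation X, up to the global partition function:
    Y1, Y2 integrated over (0,1) by a midpoint rule, Z1, Z2, V summed over {0,1}."""
    ys = (np.arange(grid) + 0.5) / grid
    total = 0.0
    for y1 in ys:
        for y2 in ys:
            for z1, z2, v in itertools.product([0, 1], repeat=3):
                total += unnorm_joint(X, (y1, y2), (z1, z2), v)
    return total / grid ** 2

print(unnorm_pX([0.5, 1.0, -0.3, 0.2]))
```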

Q4. Consider a Normalizing Flow model, where we start with a 2x2 matrix Z where each entry follows
N(0,1). In each step, we carry out the operation Z(i+1) = A(i)*Z(i)+B(i) where A(i) is an invertible 2x2
matrix, and B(i) is any 2x2 matrix.

i) In the special case that A(i)=i*I (I is the 2x2 identity) and B(i)=[[i 0], [0 i]] at each step, calculate
the distributions of Z(2), Z(3), etc.
ii) Given any observation X (a 2x2 matrix), calculate the corresponding Z in the general case
of A and B.
iii) Given a set of observations of 2x2 real matrices (X1, …. , XN), and assuming A(i)=a(i)*I,
B(i)=[[b(i) 0], [0 b(i)]], discuss how to estimate the parameters by maximum likelihood.
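A sketch of the likelihood computation for part (iii), assuming A(i)=a(i)*I and B(i)=b(i)*I as stated, so each step scales every entry of Z by a(i) and shifts only the diagonal entries by b(i). Maximizing this over the a(i), b(i) (e.g. by a grid search or gradient ascent) gives the maximum-likelihood estimate:

```python
import numpy as np

def inverse_flow(X, a, b):
    """Invert K steps Z(i+1) = a(i)*Z(i) + b(i)*I for a 2x2 observation X."""
    Z = np.array(X, dtype=float)
    for ai, bi in zip(reversed(a), reversed(b)):
        Z = (Z - bi * np.eye(2)) / ai
    return Z

def log_likelihood(Xs, a, b):
    """Change-of-variables log-likelihood under the N(0,1) base on each entry.
    Each step scales all 4 entries by a(i), so its log|det Jacobian| is 4*log|a(i)|."""
    ll = 0.0
    for X in Xs:
        Z0 = inverse_flow(X, a, b)
        ll += -0.5 * np.sum(Z0 ** 2) - 2 * np.log(2 * np.pi)   # log N(vec(Z0); 0, I)
        ll -= 4 * np.sum(np.log(np.abs(a)))
    return ll
```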
Q5. Consider a variable X0 ~ U(4,5). It is subjected to diffusion q(Xt|X0) according to a schedule (β1, β2
…..) where 0<βi<1. Calculate the marginal distribution of the diffusion stages X1, X2, …. Based on these,
calculate the denoising distributions q(Xt|Xt+1). Discuss how the denoising distribution can take a
sample from N(0,1) and convert it into a sample from U(4,5).
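A quick simulation sketch, assuming the standard DDPM forward step X_t = sqrt(1-β_t)·X_{t-1} + sqrt(β_t)·ε (the problem only says "diffusion according to a schedule"), which has marginal q(X_t|X_0) = N(sqrt(ᾱ_t)·X_0, 1-ᾱ_t) with ᾱ_t = Π(1-β_i); it illustrates how the U(4,5) start is pushed towards N(0,1):

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Simulate X_t = sqrt(1-beta_t)*X_{t-1} + sqrt(beta_t)*eps at every stage."""
    xs = [x0]
    for beta in betas:
        xs.append(np.sqrt(1 - beta) * xs[-1] + np.sqrt(beta) * rng.standard_normal(x0.shape))
    return xs

rng = np.random.default_rng(0)
x0 = rng.uniform(4, 5, size=100_000)      # samples from U(4,5)
betas = np.full(100, 0.1)                 # a hypothetical schedule
xs = forward_diffusion(x0, betas, rng)
# With alpha_bar_t = prod(1-beta_i), q(X_t|X_0) = N(sqrt(alpha_bar_t)*X_0, 1-alpha_bar_t),
# so as alpha_bar_t -> 0 the marginal approaches N(0,1) regardless of the U(4,5) start.
print(xs[-1].mean(), xs[-1].var())
```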

Q6. The generator of a GAN takes as input a random variable Z~N(0,1), and maps it to a 2D vector X
by a simple linear transformation with parameters (w1,w2). Find the generator’s distribution pGEN.

Now, we want to distinguish between vectors based on which component is larger using a binary label.
How do we make the generator into a conditional generator?

The data distribution (pDATA) is a GMM where the two modes are (100,1) and (1,100) with variance
20 each. We now need to build a discriminator based on logistic regression. Taking a random initial
value of the discriminator’s parameter, calculate the GAN objective by taking N samples from pGEN
and pDATA. Optimize the discriminator’s parameter w.r.t. these samples, and re-calculate the GAN
objective.

For generating Gaussian samples, use samples drawn from N(0,1): [0.54, 1.8, -2.26, 0.86, 0.32, -1.31,
-0.43, 0.34, 3.58, 2.77, -1.35, 3.03, 0.73, -0.06, 0.71, -0.21, -0.12, 1.49, 1.41, 1.42]
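A numerical sketch of the discriminator step. The linear generator z ↦ (w1·z, w2·z), the initial parameter values, the learning rate, and the plain gradient-ascent loop are illustrative choices, not the unique intended ones:

```python
import numpy as np

rng = np.random.default_rng(0)

z = np.array([0.54, 1.8, -2.26, 0.86, 0.32, -1.31, -0.43, 0.34, 3.58, 2.77,
              -1.35, 3.03, 0.73, -0.06, 0.71, -0.21, -0.12, 1.49, 1.41, 1.42])

# Generator: an assumed linear map z -> (w1*z, w2*z) with arbitrary weights.
w1, w2 = 1.0, 2.0
x_gen = np.stack([w1 * z, w2 * z], axis=1)                  # samples from pGEN

# Data: GMM with means (100,1) and (1,100), variance 20 per component (as stated).
means = np.array([[100.0, 1.0], [1.0, 100.0]])
comp = rng.integers(0, 2, size=len(z))
x_data = means[comp] + np.sqrt(20) * rng.standard_normal((len(z), 2))

def discriminator(x, theta):
    """Logistic-regression discriminator D(x) = sigmoid(theta . [x1, x2, 1])."""
    s = x @ theta[:2] + theta[2]
    return 1.0 / (1.0 + np.exp(-s))

def gan_objective(theta):
    """V(D) = E_data[log D(x)] + E_gen[log(1 - D(x))], estimated from the samples."""
    eps = 1e-12
    return (np.mean(np.log(discriminator(x_data, theta) + eps)) +
            np.mean(np.log(1 - discriminator(x_gen, theta) + eps)))

theta = np.array([0.01, -0.01, 0.0])            # arbitrary initial discriminator parameter
print("initial objective:", gan_objective(theta))

# Optimize the discriminator by simple gradient ascent on V(D).
for _ in range(500):
    d_data = discriminator(x_data, theta)
    d_gen = discriminator(x_gen, theta)
    X1 = np.hstack([x_data, np.ones((len(x_data), 1))])
    X0 = np.hstack([x_gen, np.ones((len(x_gen), 1))])
    grad = X1.T @ (1 - d_data) / len(X1) - X0.T @ d_gen / len(X0)
    theta += 0.01 * grad
print("optimized objective:", gan_objective(theta))
```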

Q7. We are interested in the task of 3x3 binary image generation. The class label indicates the number
of white pixels in the image (0-9). The class distribution p(Y) and the class-conditional distribution p(X|Y)
are specified for each class. Now a generative model f is developed, which also produces binary images.
How will you evaluate the model performance? If the class label is instead based on a weighted sum of the
pixels, how will this approach change?

Q8. Consider the following dataset of 3D real vectors. We wish to cluster them based on a Dirichlet
Process, for which we consider a Gaussian base distribution N(0, I) [I: 3x3 identity matrix] for the
Gaussian component mean, and a Gamma base distribution Γ(6, 2) on σ², where σ²I is the Gaussian
component covariance. The DP concentration parameter is α=2. Consider any arbitrary initial clustering
of the vectors. Demonstrate one full pass of Gibbs Sampling based on the CRP, which includes updating
the cluster indices and the Gaussian component parameters.

ID 1 2 3 4 5 6 7 8 9 10 11 12
X1 1.9 -4.6 0.2 7.5 -1.3 1.5 2.4 9.0 -3.0 -6.2 4.8 5.8
X2 2.7 -8.7 0.8 4.5 4.9 2.4 3.6 5.8 -7.2 -9.9 1.7 3.1
X3 4.1 4.3 2.1 -2.7 8.6 3.6 4.0 -1.2 6.2 3.2 -5.2 -2.2
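A simplified sketch of one such Gibbs pass, in the style of Neal's Algorithm 8 with a single auxiliary component. Γ(6, 2) is read here as shape 6 and rate 2, and the non-conjugate σ² update is approximated on a grid; both are assumptions, not the only valid choices:

```python
import numpy as np
from scipy.stats import multivariate_normal, gamma

rng = np.random.default_rng(0)
X = np.array([[1.9, 2.7, 4.1], [-4.6, -8.7, 4.3], [0.2, 0.8, 2.1], [7.5, 4.5, -2.7],
              [-1.3, 4.9, 8.6], [1.5, 2.4, 3.6], [2.4, 3.6, 4.0], [9.0, 5.8, -1.2],
              [-3.0, -7.2, 6.2], [-6.2, -9.9, 3.2], [4.8, 1.7, -5.2], [5.8, 3.1, -2.2]])
alpha = 2.0

def sample_params():
    """Draw (mu, sigma2) from the base: mu ~ N(0, I), sigma2 ~ Gamma(shape 6, rate 2)."""
    return rng.standard_normal(3), gamma.rvs(6, scale=1 / 2, random_state=rng)

z = rng.integers(0, 2, size=len(X))                        # arbitrary initial clustering
params = {k: sample_params() for k in np.unique(z)}

# --- One Gibbs pass over the cluster indices ---
for i in range(len(X)):
    counts = {k: np.sum(z == k) - (k == z[i]) for k in params}
    params = {k: v for k, v in params.items() if counts[k] > 0}   # drop emptied clusters
    new_k = max(list(params) + [-1]) + 1
    params[new_k] = sample_params()                         # auxiliary (potential new) cluster
    ks, w = list(params), []
    for k in ks:
        prior = alpha if k == new_k else counts.get(k, 0)
        mu, s2 = params[k]
        w.append(prior * multivariate_normal.pdf(X[i], mu, s2 * np.eye(3)))
    w = np.array(w) / np.sum(w)
    z[i] = ks[rng.choice(len(ks), p=w)]
    if z[i] != new_k:
        del params[new_k]

# --- Update component parameters given the new assignments ---
for k in list(params):
    Xk = X[z == k]
    _, s2 = params[k]
    # mu | data, sigma2 is conjugate: posterior N(m_n, v_n * I).
    v_n = 1.0 / (1.0 + len(Xk) / s2)
    m_n = v_n * Xk.sum(axis=0) / s2
    mu = m_n + np.sqrt(v_n) * rng.standard_normal(3)
    # The Gamma prior on sigma2 is not conjugate; approximate its posterior on a grid.
    grid = np.linspace(0.1, 30, 300)
    logp = (gamma.logpdf(grid, 6, scale=1 / 2) +
            np.array([np.sum(multivariate_normal.logpdf(Xk, mu, g * np.eye(3))) for g in grid]))
    p = np.exp(logp - logp.max()); p /= p.sum()
    params[k] = (mu, rng.choice(grid, p=p))

print("assignments after one pass:", z)
```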

Q9. Distance-Dependent Chinese Restaurant Process: All observations are real-valued vectors. Each
new observation joins a cluster depending not only on the size of the cluster (as in the usual CRP), but also
on its mean Euclidean distance from all the observations in that cluster. The DDCRP score between an
observation and a cluster is defined by these two quantities, and the assignment distribution is created by
applying a softmax function to these scores. Demonstrate this process on the above dataset.
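A sketch of the sequential assignment, using the Q8 dataset. The problem does not pin down how size and mean distance are combined, so the score below (log of the cluster size minus the mean distance, and log α for a new cluster) is only one assumed choice:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[1.9, 2.7, 4.1], [-4.6, -8.7, 4.3], [0.2, 0.8, 2.1], [7.5, 4.5, -2.7],
              [-1.3, 4.9, 8.6], [1.5, 2.4, 3.6], [2.4, 3.6, 4.0], [9.0, 5.8, -1.2],
              [-3.0, -7.2, 6.2], [-6.2, -9.9, 3.2], [4.8, 1.7, -5.2], [5.8, 3.1, -2.2]])

def ddcrp_assign(X, alpha=2.0):
    """Sequentially assign points to clusters with an assumed score
    score(x, cluster) = log(size) - mean ||x - member||, and log(alpha) for a new cluster."""
    clusters = [[0]]                                   # first point starts its own cluster
    for i in range(1, len(X)):
        scores = []
        for c in clusters:
            mean_dist = np.mean([np.linalg.norm(X[i] - X[j]) for j in c])
            scores.append(np.log(len(c)) - mean_dist)
        scores.append(np.log(alpha))                   # option of opening a new cluster
        s = np.array(scores)
        p = np.exp(s - s.max()); p /= p.sum()          # softmax over the scores
        k = rng.choice(len(p), p=p)
        if k == len(clusters):
            clusters.append([i])
        else:
            clusters[k].append(i)
    return clusters

print(ddcrp_assign(X))
```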
Q10. In case of Chinese Restaurant Process, calculate p(Z2|Z1=1). In case of DPMM, calculate
p(Z2|Z1=1, X2, X1).

Solution Sketch: p(Z2|Z1=1) = ∫p(Z2|π)p(π)dπ [Z1=1 holds by definition, i.e. it is a sure event]. In other
words, p(Z2=1|Z1=1) = E[π1], where π1 ~ Beta(1, α), so p(Z2=1|Z1=1) = 1/(1+α) and p(Z2=2|Z1=1) = α/(1+α).

In the case of a DPMM, p(Z2=1|Z1=1, X2, X1) ∝ p(X2|Z2=1, Z1=1, X1)*p(Z2=1|Z1=1, X1) =
f(X2, φ1)*(1/(1+α)), where φ1 denotes the parameter of the first component (or its posterior predictive,
if φ1 is integrated out given X1).

Q11. We are looking to estimate real-valued observations ‘y’ at points with 2D vector representations
‘x’, i.e. y=f(x). We consider a Gaussian Process prior over f, i.e. f ~ GP(u(x), C(x,x’)) with mean
function u(x)=x and covariance function C(x,x’) = exp(-||x-x’||²). Estimate y at
(0,0) based on the following observations of ‘f’. Show how the uncertainty varies as we use more
observations for this estimate.

ID 1 2 3 4 5
X1 3 2 -4 5 -6
X2 4 6 -2 -5 3
Y 0.17 0.14 0.18 0.12 0.13
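A compact sketch of the GP posterior at (0,0), adding one observation at a time to show how the predictive variance changes. Assumption: the GP is taken as zero-mean, since the stated mean function u(x)=x is vector-valued while y is scalar, and a tiny jitter stands in for noise-free observations:

```python
import numpy as np

X = np.array([[3, 4], [2, 6], [-4, -2], [5, -5], [-6, 3]], dtype=float)
y = np.array([0.17, 0.14, 0.18, 0.12, 0.13])
x_star = np.array([0.0, 0.0])

def kernel(a, b):
    """Covariance function from the problem: C(x, x') = exp(-||x - x'||^2)."""
    return np.exp(-np.sum((a - b) ** 2))

def gp_posterior(Xobs, yobs, x_star, jitter=1e-8):
    K = np.array([[kernel(p, q) for q in Xobs] for p in Xobs]) + jitter * np.eye(len(Xobs))
    k_star = np.array([kernel(x_star, p) for p in Xobs])
    mean = k_star @ np.linalg.solve(K, yobs)
    var = kernel(x_star, x_star) - k_star @ np.linalg.solve(K, k_star)
    return mean, var

# Show how the predictive uncertainty changes as more observations are used.
for n in range(1, len(X) + 1):
    m, v = gp_posterior(X[:n], y[:n], x_star)
    print(f"using {n} observations: mean={m:.4f}, variance={v:.4f}")
```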
