
E9 205 – Machine Learning For Signal Processing

Practice Midterm Exam


Date: Feb 23, 2022, 3:30pm

Instructions

1. This exam is open book. However, computers, mobile phones and other handheld devices
are not allowed.

2. Any reference materials used in the exam (other than materials distributed in the course channel) should be pre-approved by the instructor before the exam.

3. No additional resources (other than those pre-approved) are allowed for use in the exam.

4. Academic integrity and ethics of the highest order are expected.

5. Notation - bold symbols are vectors, capital bold symbols are matrices and regular symbols
are scalars.

6. Answer all questions.

7. Name your scanned answer file, in PDF format, as FirstName-LastName-midterm.pdf and upload it to the Teams channel.

8. All answer sheets should contain your name and SR number at the top.

9. Question number should be clearly marked for each response.

10. Total Duration - 90 minutes including answer upload

11. Total Marks - 100 points


1. MLSP Exam and grading - Prof. Raj is evaluating the midterm exam of the MLSP
course which was taken by N students. The exam had Q questions. From the answers
provided by the students, he finds the assignment variable xnq, where xnq = 1 indicates that the answer of student n to question q was correct and xnq = 0 indicates that it was incorrect. Here n ∈ {1, ..., N} and q ∈ {1, ..., Q}. Each question is assigned a latent difficulty δq and each student is associated with a latent ability αn. Prof. Raj uses a sigmoidal model for the conditional probability of the assignment variable xnq = 1 given the latent ability vector α = [α1, ..., αN]^T and the latent difficulty vector δ = [δ1, ..., δQ]^T. Specifically,

p(xnq = 1|α, δ) = σ(αn − δq )

where σ is the sigmoidal nonlinearity function. He plans to estimate the deterministic latent parameters in the model given the binary data matrix X of dimension N × Q containing the elements [xnq] (assuming that the variables xnq are i.i.d.).

(a) Find the total data likelihood under the given model for the MLSP exam.
(b) How can Prof. Raj apply gradient-based learning to estimate the latent ability of students, αn, and the latent difficulty of questions, δq, which maximize the total log-likelihood? (A numerical sketch follows this question.)

(Points 15)
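The following is a minimal numerical sketch of the gradient-based learning asked for in part (b), assuming plain gradient ascent on the total log-likelihood; the learning rate, iteration count, and toy data are illustrative choices and not part of the question.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_ability_difficulty(X, lr=0.1, n_iters=500):
    """Gradient ascent on log L = sum_{n,q} [ x_nq log sigma(alpha_n - delta_q)
    + (1 - x_nq) log(1 - sigma(alpha_n - delta_q)) ]."""
    N, Q = X.shape
    alpha = np.zeros(N)                    # latent ability per student
    delta = np.zeros(Q)                    # latent difficulty per question
    for _ in range(n_iters):
        P = sigmoid(alpha[:, None] - delta[None, :])   # p(x_nq = 1 | alpha, delta)
        R = X - P                                      # residuals x_nq - p_nq
        alpha += lr * R.sum(axis=1)        # d logL / d alpha_n =  sum_q (x_nq - p_nq)
        delta -= lr * R.sum(axis=0)        # d logL / d delta_q = -sum_n (x_nq - p_nq)
    return alpha, delta

# Illustrative usage on a random binary answer matrix
rng = np.random.default_rng(0)
X = (rng.random((20, 8)) > 0.5).astype(float)
alpha_hat, delta_hat = fit_ability_difficulty(X)
```

Note that the model is identifiable only up to a common shift of all αn and δq (adding the same constant to both leaves σ(αn − δq) unchanged), so in practice one of them is usually anchored.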

2. Bayesian Machine Learning - Varada is a budding stock market analyst. At the outset
of her job, she attempts to model the market data, denoted X = {xi, i = 1, ..., N}, using a GMM λ = {αc, µc, Σc}, c = 1, ..., C. Given only a small number of samples (N = 30), she decides to use a simple covariance matrix Σc = I. At this point, her colleague Vikas asks her
to apply Bayesian techniques instead of ML for parameter estimation of the Gaussian
mixture model. For the mixture component means, Vikas suggests using a prior density given by a Gaussian density,

p(µc) ∝ exp{ −(ρc/2) (µc − mc)^T (µc − mc) }
where ρc > 0 and mc are the hyper-parameters of the Gaussian distribution. Vikas also suggests not using any prior distribution for the weight parameters and treating the hyper-parameters as fixed quantities. Let Θ = {αc, µc}, c = 1, ..., C, denote the parameters of interest for estimating the model. While ML estimation uses arg maxΘ p(X|Θ), the Bayesian estimation suggested by Vikas uses the MAP rule arg maxΘ p(Θ|X), which is equivalent to maximizing p(X|Θ)p(Θ).

(a) The first challenge before Varada is to modify the EM algorithm (which maximizes the ML objective) to optimize the MAP rule for Bayesian estimation. How would you derive an iterative algorithm for Bayesian estimation? Show that your algorithm consistently improves the objective function at each iteration. (Points 10)

(b) Varada has managed to develop the EM algorithm for Bayesian estimation. Now she does some mathematical analysis and, to her delight, finds that her choice of prior distribution for µc obeys the conjugate density property (the EM-style lower bound for the posterior distribution is also a Gaussian distribution, like the prior distribution). How did she arrive at this property? What are the mean and covariance of the EM-style posterior Gaussian distribution? (Points 15)

(c) Using the solutions to the above parts, Varada proceeds to derive the iterative update rules for Θ = {αc, µc}, c = 1, ..., C. Can you perform the same derivation? (A sketch of the resulting updates follows this question.) (Points 10)
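As a rough sketch of where parts (a)-(c) lead, the MAP-EM updates below assume Σc = I, the Gaussian prior on µc with fixed hyper-parameters ρc and mc, and no prior on the weights; the closed-form mean update is the conjugate result the question asks you to derive, so treat this as one plausible instantiation rather than the answer.

```python
import numpy as np
from scipy.stats import multivariate_normal

def map_em_gmm(X, C, m, rho, n_iters=50):
    """MAP-EM for a GMM with identity covariances and Gaussian priors on the means.
    X: (N, D) data, m: (C, D) prior means m_c, rho: (C,) prior precisions rho_c."""
    N, D = X.shape
    alpha = np.full(C, 1.0 / C)          # mixture weights (no prior placed on them)
    mu = m.astype(float).copy()          # initialize means at the prior means
    for _ in range(n_iters):
        # E-step: responsibilities gamma_{ic} = p(component c | x_i, current params)
        log_r = np.stack([np.log(alpha[c]) +
                          multivariate_normal.logpdf(X, mu[c], np.eye(D))
                          for c in range(C)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        gamma = np.exp(log_r)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step
        Nc = gamma.sum(axis=0)           # effective counts per component
        alpha = Nc / N                   # weight update, unaffected by the prior
        for c in range(C):
            # MAP mean update: the prior acts like rho_c pseudo-observations at m_c
            mu[c] = (gamma[:, c] @ X + rho[c] * m[c]) / (Nc[c] + rho[c])
    return alpha, mu
```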

3. Revisiting LDA - LDA is the problem of finding a projection matrix W which maximizes Tr( (W^T Sb W) / (W^T Sw W) ) for data x1, ..., xN ∈ R^D, where

Sw = Σ_{c=1}^{C} Σ_{xk ∈ c} (xk − µc)(xk − µc)^T

ST = Σ_{k=1}^{N} (xk − µ)(xk − µ)^T

with C denoting the number of classes, µc being the within-class mean for class c, µ being the sample mean of the entire data, and the between-class scatter matrix Sb = ST − Sw.
Show that the LDA problem can be equivalently expressed using all the pairwise scatter matrices Sij = (xi − xj)(xi − xj)^T for i, j = 1, ..., N and the affinity matrix R of size N × N, whose (i, j)th element rij is defined as
rij = 1/Nk if xi and xj belong to the kth class,
rij = 0 otherwise.
Here, Nk is the number of data points belonging to class k. (A numerical sanity check of the first identity follows this question.) Specifically, show that
Sw = Σ_{i=1}^{N} Σ_{j=1}^{N} rij Sij

Sb = Σ_{i=1}^{N} Σ_{j=1}^{N} (1/N − rij) Sij

(Points 20)
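The identity for Sw can be sanity-checked numerically before (or after) deriving it; the snippet below compares the direct within-class scatter with the pairwise-sum form on random data. Be aware that, depending on whether the double sum is taken over ordered or unordered pairs, a constant factor may appear, and reconciling that factor is part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, C = 12, 3, 3
X = rng.normal(size=(N, D))
labels = rng.integers(0, C, size=N)

# Direct within-class scatter: Sw = sum_c sum_{x_k in c} (x_k - mu_c)(x_k - mu_c)^T
Sw = np.zeros((D, D))
for c in range(C):
    Xc = X[labels == c]
    mu_c = Xc.mean(axis=0)
    Sw += (Xc - mu_c).T @ (Xc - mu_c)

# Pairwise form: sum_i sum_j r_ij (x_i - x_j)(x_i - x_j)^T with r_ij = 1/N_k for same-class pairs
Sw_pair = np.zeros((D, D))
for i in range(N):
    for j in range(N):
        if labels[i] == labels[j]:
            Nk = np.sum(labels == labels[i])
            d = (X[i] - X[j])[:, None]
            Sw_pair += (1.0 / Nk) * (d @ d.T)

# The trace ratio exposes any constant factor between the two forms
print("trace ratio:", np.trace(Sw_pair) / np.trace(Sw))
```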
4. Line Mixture Model - A line mixture model (LMM) is the problem of fitting a mixture of lines to a 2-D dataset. Let zi = [xi, yi]^T, i ∈ {1, ..., N}, denote a set of 2-D data points. Each mixture component in the LMM is defined using a line fk(xi) = ak xi + bk, k ∈ {1, ..., K}, where K is the number of mixture components and ak, bk are the parameters of the line for the kth component. The pdf of zi is modeled as

p(zi|λ) = Σ_{k=1}^{K} αk N(yi; fk(xi), σk^2)

where σk^2 is the variance of the k-th mixture component and the model parameters are λ = {ak, bk, σk}, k = 1, ..., K. Given a set of N data points,

(a) Write down the Q function which will allow the EM estimation of λ.
(b) Find the iterative maximization steps for all the parameters in the model (a sketch of one plausible set of updates follows this question).

(Points 15)
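A rough EM implementation for the line mixture model is sketched below. The responsibility computation and the per-component weighted least-squares updates are one plausible instantiation of the steps asked for in (a) and (b); the exact update formulas should come out of your own Q-function derivation. The mixture weights αk are also updated here, even though the question lists only {ak, bk, σk} as λ.

```python
import numpy as np

def em_line_mixture(x, y, K, n_iters=100, seed=0):
    """EM for p(z_i | lambda) = sum_k alpha_k N(y_i; a_k x_i + b_k, sigma_k^2)."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    a = rng.normal(size=K)            # slopes a_k
    b = rng.normal(size=K)            # intercepts b_k
    sigma2 = np.ones(K)               # variances sigma_k^2
    alpha = np.full(K, 1.0 / K)       # mixture weights alpha_k
    for _ in range(n_iters):
        # E-step: responsibilities gamma_{ik} proportional to alpha_k N(y_i; f_k(x_i), sigma_k^2)
        mu = a[None, :] * x[:, None] + b[None, :]
        log_g = (np.log(alpha)[None, :]
                 - 0.5 * np.log(2.0 * np.pi * sigma2)[None, :]
                 - 0.5 * (y[:, None] - mu) ** 2 / sigma2[None, :])
        log_g -= log_g.max(axis=1, keepdims=True)
        gamma = np.exp(log_g)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step
        Nk = gamma.sum(axis=0)
        alpha = Nk / N
        A = np.stack([x, np.ones(N)], axis=1)          # design matrix for a line
        for k in range(K):
            w = np.sqrt(gamma[:, k])
            # Weighted least squares for (a_k, b_k): minimize sum_i gamma_ik (y_i - a_k x_i - b_k)^2
            coef, *_ = np.linalg.lstsq(A * w[:, None], y * w, rcond=None)
            a[k], b[k] = coef
            resid = y - (a[k] * x + b[k])
            sigma2[k] = (gamma[:, k] @ resid ** 2) / Nk[k]
    return alpha, a, b, sigma2
```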

5. Robust Principal Components - Vani is interested in applying PCA to a set of pure data {x1, ..., xN} where each xn is D-dimensional. She has read on the wiki that PCA finds the principal components as the eigenvectors of the sample covariance matrix Sxx with the largest eigenvalues. However, in measuring the data xn, she realizes that additive noise εn has also been introduced, i.e., the measured data available to her is yn, where yn = xn + εn, with εn having a sample mean of 0 and a sample covariance Σε which is full rank. The pure data xn and the noise εn are also uncorrelated. She realizes that applying PCA to the set of noisy data {y1, ..., yN} will not recover the principal components of the pure data. With this concern, she approaches Ryan, who is a student of the MLSP course. Ryan solves this problem using the Cholesky decomposition Σε = AA^T. Specifically, using the data transformation zn = A^{-1} yn, he is able to recover the principal components of the pure data. How would you go about finding the principal components of the pure data if you were Ryan? Further, for a PCA of K dimensions (K < D), how can Ryan estimate the reconstruction error in PCA using the data transformation approach? (A simulation sketch of the whitening step follows this question.)
(Points 15)
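The simulation below, a sketch under the stated assumptions (Σε known and full rank, noise uncorrelated with the pure data), illustrates why the whitening transform zn = A^{-1} yn helps: after whitening, the noise contributes an identity term to the sample covariance, so its eigen-structure no longer distorts the signal directions. Mapping the whitened components back to the original space and turning the shifted eigenvalues into a reconstruction-error estimate is exactly the derivation the question asks for.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 2000, 5

# Pure data with low-dimensional structure, plus full-rank correlated noise
x = rng.normal(size=(N, 2)) @ rng.normal(size=(2, D))        # pure data x_n
B = rng.normal(size=(D, D))
Sigma_eps = B @ B.T / D + 0.1 * np.eye(D)                    # noise covariance (full rank)
eps = rng.multivariate_normal(np.zeros(D), Sigma_eps, size=N)
y = x + eps                                                  # measured data y_n = x_n + eps_n

def cov(M):
    Mc = M - M.mean(axis=0)
    return Mc.T @ Mc / M.shape[0]

# Whitening transform from the question: z_n = A^{-1} y_n with Sigma_eps = A A^T
A = np.linalg.cholesky(Sigma_eps)
A_inv = np.linalg.inv(A)
z = y @ A_inv.T

# Key identity to exploit: Szz is approximately A^{-1} Sxx A^{-T} + I (noise becomes isotropic)
Szz = cov(z)
approx = A_inv @ cov(x) @ A_inv.T + np.eye(D)
print("max deviation:", np.abs(Szz - approx).max())          # sampling error only; shrinks as N grows

# Eigenvalues of Szz are the whitened-signal eigenvalues shifted up by 1;
# the largest ones (minus 1) carry the signal, the remaining ones sit near 1.
evals = np.linalg.eigvalsh(Szz)
print(np.round(evals, 2))
```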
