E9 205 - Machine Learning For Signal Processing: Practice Midterm Exam
Instructions
1. This exam is open book. However, computers, mobile phones and other handheld devices are not allowed.
2. Any reference materials used in the exam (other than materials distributed in the course channel) should be pre-approved by the instructor before the exam.
3. No additional resources (other than those pre-approved) are allowed for use in the exam.
4. Notation: bold symbols are vectors, capital bold symbols are matrices, and regular symbols are scalars.
5. All answer sheets should contain your name and SR number at the top.
(a) Find the total data likelihood under the given model for the MLSP exam.
(b) How can Prof. Raj apply gradient-based learning to estimate the latent ability of students $\alpha_n$ and the latent difficulty of questions $\delta_q$ which maximize the total log-likelihood? (See the sketch after this question.)
(Points 15)
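A minimal sketch of the gradient-based learning asked for in part (b). Since the full model statement is not given above, it assumes, purely for illustration, a Rasch-style observation model $p(\text{correct}_{nq} = 1) = \sigma(\alpha_n - \delta_q)$; the binary outcome matrix `Y`, the function names, and the learning rate are assumptions, not part of the exam.

```python
# Illustrative only: gradient ascent on the log-likelihood of an ASSUMED
# Rasch-style model p(y_nq = 1) = sigmoid(alpha_n - delta_q).
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_latent_traits(Y, n_iters=500, lr=0.05):
    """Y: N x Q binary matrix of student-question outcomes (assumed data format)."""
    N, Q = Y.shape
    alpha = np.zeros(N)   # latent ability of each student
    delta = np.zeros(Q)   # latent difficulty of each question
    for _ in range(n_iters):
        P = sigmoid(alpha[:, None] - delta[None, :])   # predicted correctness probabilities
        resid = Y - P                                  # gradient of log-likelihood w.r.t. the logits
        alpha += lr * resid.sum(axis=1)                # d logL / d alpha_n
        delta += lr * (-resid).sum(axis=0)             # d logL / d delta_q
    return alpha, delta
```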
2. Bayesian Machine Learning - Varada is a budding stock market analyst. At the outset of her job, she attempts to model the market data, denoted as $X = \{x_i,\ i = 1, \ldots, N\}$, using a GMM $\lambda = \{\alpha_c, \mu_c, \Sigma_c\}_{c=1}^{C}$. Given only a small number of samples ($N = 30$), she decides on using a simple covariance matrix $\Sigma_c = I$. At this point, her colleague Vikas asks her to apply Bayesian techniques instead of ML for the parameter estimation of the Gaussian mixture model. For the mixture component means, Vikas suggests using a Gaussian prior density,
$$p(\mu_c) \propto \exp\left\{-\frac{\rho_c}{2}(\mu_c - m_c)^T(\mu_c - m_c)\right\}$$
where $\rho_c > 0$ and $m_c$ are the hyper-parameters of the Gaussian distribution. Vikas also suggests not using any prior distribution for the weight parameters and treating the hyper-parameters as fixed quantities. Let $\Theta = \{\alpha_c, \mu_c\}_{c=1}^{C}$ denote the parameters of interest for estimating the model. While the ML rule is $\arg\max_\Theta p(X|\Theta)$, the Bayesian estimation suggested by Vikas uses the MAP rule $\arg\max_\Theta p(\Theta|X)$, which is equivalent to maximizing $p(X|\Theta)p(\Theta)$ over $\Theta$.
(a) The first challenge before Varada is to modify the EM algorithm (which maximizes the ML objective) to optimize the MAP rule for Bayesian estimation. How would you derive an iterative algorithm for Bayesian estimation? Show that your algorithm consistently improves the objective function at each iteration. (Points 10)
(b) Varada has managed to develop the EM algorithm for Bayesian estimation. Now she does some mathematical analysis and, to her delight, finds that her choice of prior distribution for $\mu_c$ obeys the conjugate density property (the EM-style lower bound for the posterior distribution is also a Gaussian distribution, like the prior distribution). How did she arrive at this property? What are the mean and covariance of the EM-style posterior Gaussian distribution? (Points 15)
(c) Using the above problem solutions, Varada proceeds to derive the iterative update rules for $\Theta = \{\alpha_c, \mu_c\}_{c=1}^{C}$. Can you perform the same derivation? (See the sketch after this question.) (Points 10)
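A minimal sketch of one possible MAP-EM iteration for parts (a)-(c), assuming $\Sigma_c = I$ and the Gaussian prior on $\mu_c$ given above; under these assumptions the mean update becomes a precision-weighted combination of the responsibility-weighted sample mean and the prior mean $m_c$, while the weights keep their usual ML update since they carry no prior. This is an illustrative sketch with made-up variable names, not the expected written derivation.

```python
# Hedged illustration of one MAP-EM iteration for the GMM of Question 2,
# assuming Sigma_c = I and the Gaussian prior p(mu_c) stated above.
# rho, m stand for the hyper-parameters rho_c and m_c; names are illustrative.
import numpy as np

def map_em_step(X, alpha, mu, rho, m):
    """X: N x D data, alpha: C weights, mu: C x D means, rho: C prior precisions, m: C x D prior means."""
    N, D = X.shape
    # E-step: responsibilities gamma_ic with Sigma_c = I
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)     # N x C squared distances
    log_gauss = -0.5 * sq_dist - 0.5 * D * np.log(2 * np.pi)
    log_post = np.log(alpha)[None, :] + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)                   # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: ML update for the weights (no prior), MAP update for the means
    Nc = gamma.sum(axis=0)                                            # effective counts per component
    alpha_new = Nc / N
    mu_new = (gamma.T @ X + rho[:, None] * m) / (Nc + rho)[:, None]   # precision-weighted mean
    return alpha_new, mu_new, gamma
```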
3. Revisiting LDA - The LDA is the problem of finding a projection matrix $W$ which maximizes $\mathrm{Tr}\left(\frac{W^T S_b W}{W^T S_w W}\right)$ for data $x_1, \ldots, x_N \in \mathbb{R}^D$, where
$$S_w = \sum_{c=1}^{C}\sum_{x_k \in c}(x_k - \mu_c)(x_k - \mu_c)^T, \qquad S_T = \sum_{k=1}^{N}(x_k - \mu)(x_k - \mu)^T$$
with $C$ denoting the number of classes, $\mu_c$ being the within-class mean for class $c$, $\mu$ being the sample mean of the entire data, and the between-class scatter matrix $S_b = S_T - S_w$.
Show that the LDA problem can be equivalently expressed using all the pairwise scatter matrices $S_{ij} = (x_i - x_j)(x_i - x_j)^T$ for all $i, j = 1, \ldots, N$ and the affinity matrix $R$ of size $N \times N$, whose $(i,j)$-th element $r_{ij}$ is defined as
$$r_{ij} = \begin{cases} \frac{1}{N_k} & \text{if } x_i, x_j \text{ belong to the } k\text{-th class,} \\ 0 & \text{otherwise.} \end{cases}$$
Here, $N_k$ is the number of data points belonging to class $k$. Specifically, show that
$$S_w = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} r_{ij}\, S_{ij}, \qquad S_b = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(\frac{1}{N} - r_{ij}\right) S_{ij}.$$
(Points 20)
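A small numerical check of the pairwise-scatter identities above, not part of the exam; the factor $\frac{1}{2}$ compensates for every pair being counted twice ($S_{ij}$ and $S_{ji}$) in the double sum. The data, labels, and class sizes below are arbitrary illustrative choices.

```python
# Numerical check of the pairwise-scatter identities for Question 3 (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 12, 3, 3
X = rng.normal(size=(N, D))
labels = np.repeat(np.arange(C), N // C)      # equal-sized classes for simplicity

# Direct definitions of S_w, S_T and S_b = S_T - S_w
mu = X.mean(axis=0)
S_T = sum(np.outer(x - mu, x - mu) for x in X)
S_w = np.zeros((D, D))
for c in range(C):
    Xc = X[labels == c]
    mu_c = Xc.mean(axis=0)
    S_w += sum(np.outer(x - mu_c, x - mu_c) for x in Xc)
S_b = S_T - S_w

# Pairwise form with affinity r_ij = 1/N_k for same-class pairs, 0 otherwise
Nk = np.bincount(labels, minlength=C)
R = np.where(labels[:, None] == labels[None, :], 1.0 / Nk[labels][:, None], 0.0)
S_w_pair = np.zeros((D, D))
S_b_pair = np.zeros((D, D))
for i in range(N):
    for j in range(N):
        Sij = np.outer(X[i] - X[j], X[i] - X[j])
        S_w_pair += 0.5 * R[i, j] * Sij
        S_b_pair += 0.5 * (1.0 / N - R[i, j]) * Sij

print(np.allclose(S_w, S_w_pair), np.allclose(S_b, S_b_pair))   # both should print True
```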
4. Line Mixture Model - A line mixture model (LMM) is the problem of fitting a mixture of lines to a 2-D dataset. Let $z_i = [x_i \ y_i]^T$, $i = 1, \ldots, N$, denote a set of 2-D data points. Each mixture component in the LMM is defined using a line $f_k(x_i) = a_k x_i + b_k$, $k = 1, \ldots, K$, where $K$ is the number of mixtures and $a_k, b_k$ are the parameters of the line for the $k$-th mixture component. The pdf of $z_i$ is modeled as
$$p(z_i|\lambda) = \sum_{k=1}^{K} \alpha_k\, \mathcal{N}\!\left(y_i;\, f_k(x_i),\, \sigma_k^2\right)$$
where $\sigma_k^2$ is the variance of the $k$-th mixture component and the model parameters are $\lambda = \{a_k, b_k, \sigma_k\}_{k=1}^{K}$. Given a set of $N$ data points,
(a) Write down the Q function which will allow the EM estimation of $\lambda$.
(b) Find the iterative maximization steps for all the parameters in the model. (See the sketch after this question.)
(Points 15)
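A minimal sketch of one EM iteration for the line mixture model, assuming the responsibilities follow from the mixture pdf above; the per-component line update is a responsibility-weighted least-squares fit, and the weights $\alpha_k$ are re-estimated here for completeness even though the exam lists $\lambda = \{a_k, b_k, \sigma_k\}_{k=1}^{K}$. Variable names are illustrative, not the expected written derivation.

```python
# Hedged sketch of one EM iteration for the line mixture model of Question 4.
import numpy as np

def lmm_em_step(x, y, alpha, a, b, sigma2):
    """x, y: length-N data arrays; alpha, a, b, sigma2: length-K parameter arrays."""
    N, K = len(x), len(a)
    # E-step: responsibilities gamma_ik proportional to alpha_k * N(y_i; a_k x_i + b_k, sigma_k^2)
    resid = y[:, None] - (a[None, :] * x[:, None] + b[None, :])            # N x K residuals
    log_lik = -0.5 * resid**2 / sigma2[None, :] - 0.5 * np.log(2 * np.pi * sigma2[None, :])
    log_post = np.log(alpha)[None, :] + log_lik
    gamma = np.exp(log_post - log_post.max(axis=1, keepdims=True))         # numerical stability
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: weighted least squares for each line, then weighted residual variance
    a_new, b_new, s2_new = np.empty(K), np.empty(K), np.empty(K)
    for k in range(K):
        w = gamma[:, k]
        A = np.stack([x, np.ones(N)], axis=1)                              # design matrix [x, 1]
        W = np.diag(w)
        coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)                   # weighted normal equations
        a_new[k], b_new[k] = coef
        r = y - (a_new[k] * x + b_new[k])
        s2_new[k] = (w * r**2).sum() / w.sum()                             # weighted residual variance
    alpha_new = gamma.sum(axis=0) / N
    return alpha_new, a_new, b_new, s2_new, gamma
```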