Assignment 11: Introduction To Machine Learning Prof. B. Ravindran
(d) $\lambda_{MLE} = \dfrac{n}{\sum_{i=1}^{n} x_i}$
(e) $\lambda_{MLE} = \dfrac{n-1}{\sum_{i=1}^{n} x_i}$
(f) $\lambda_{MLE} = \dfrac{\sum_{i=1}^{n} x_i}{n-1}$
Sol. (d)
The likelihood and the derivative of its logarithm:

$$L(\lambda; x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i; \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$$

$$\frac{d \ln L(\lambda; x_1, \ldots, x_n)}{d\lambda} = \frac{d}{d\lambda}\left(n \ln \lambda - \lambda \sum_{i=1}^{n} x_i\right) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$$

Setting the derivative to zero gives $\lambda_{MLE} = n / \sum_{i=1}^{n} x_i$.
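As a numerical sanity check on this derivation, the sketch below draws toy exponential data (the sample size and true rate are assumptions for illustration) and compares the closed-form estimate $n / \sum_i x_i$ against a brute-force grid search over the log-likelihood.

```python
import math
import random

random.seed(0)
lam_true = 2.0  # assumed true rate, for illustration only
# Draw toy samples from an Exponential(lam_true) distribution.
xs = [random.expovariate(lam_true) for _ in range(10_000)]

# Closed-form MLE from the derivation: lambda_hat = n / sum(x_i).
lam_mle = len(xs) / sum(xs)

def log_lik(lam):
    # log L(lambda) = n*log(lambda) - lambda * sum(x_i)
    return len(xs) * math.log(lam) - lam * sum(xs)

# A fine grid search should land on (nearly) the same maximizer.
grid = [i / 1000 for i in range(1, 5001)]
lam_grid = max(grid, key=log_lik)
print(lam_mle, lam_grid)
```

The two estimates agree up to the grid resolution, and both sit near the assumed true rate.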
2. (2 marks) Suppose we are trying to model a p dimensional Gaussian distribution. What is the
actual number of independent parameters that need to be estimated?
(a) 2
(b) p
(c) 2p
(d) p(p + 1)
(e) p(p + 1)/2
(f) p(p + 3)/2
Sol. (f)
Explanation: The mean vector has $p$ parameters. The covariance matrix is symmetric ($p \times p$) and hence has $\frac{p(p+1)}{2}$ independent parameters. In total: $p + \frac{p(p+1)}{2} = \frac{p(p+3)}{2}$.
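The count above can be checked mechanically; a small sketch (the function name is my own, not from the course) verifies that "mean plus upper triangle of the covariance" matches the closed form $p(p+3)/2$:

```python
def gaussian_param_count(p):
    """Independent parameters of a p-dimensional Gaussian:
    p for the mean vector, p*(p+1)//2 for the symmetric covariance."""
    return p + p * (p + 1) // 2

# Matches the closed form p(p+3)/2 from option (f) for every p checked.
for p in range(1, 20):
    assert gaussian_param_count(p) == p * (p + 3) // 2

print(gaussian_param_count(3))  # prints 9: 3 mean + 6 covariance entries
```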
(e) $\lambda_{MLE} = \dfrac{n-1}{\sum_{i=1}^{n} x_i}$
(f) $\lambda_{MLE} = \dfrac{\sum_{i=1}^{n} x_i}{n-1}$
Sol. (c)
Write the likelihood:
$$l(\lambda; x) = \prod_{i} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \lambda^{x_1 + x_2 + \cdots + x_n} \, \frac{e^{-n\lambda}}{x_1! \, x_2! \cdots x_n!}$$
Take the log, differentiate the log-likelihood with respect to $\lambda$, and set the derivative to 0; this gives $\lambda_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the sample mean.
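Carrying out those steps numerically confirms that the sample mean maximizes the Poisson likelihood; the sketch below uses assumed toy counts (the $x_i!$ terms are dropped since they do not depend on $\lambda$):

```python
import math

xs = [3, 1, 4, 1, 5, 9, 2, 6]   # assumed toy count data
lam_mle = sum(xs) / len(xs)      # the sample mean

def log_lik(lam):
    # log of prod_i lam^{x_i} e^{-lam} / x_i!, ignoring the
    # x_i! terms, which are constant in lam.
    return sum(xs) * math.log(lam) - len(xs) * lam

# A grid search over lambda should recover the sample mean.
grid = [i / 1000 for i in range(1, 20_000)]
best = max(grid, key=log_lik)
print(lam_mle, best)
```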
4. During parameter estimation for a GMM using data X, which of the following quantities are you minimizing (directly or indirectly)?
(a) Log-likelihood
(b) Negative Log-likelihood
(c) Cross-entropy
(d) Residual Sum of Squares (RSS)
Sol. (b)
5. (2 marks) In Gaussian Mixture Models, $\pi_i$ are the mixing coefficients. Select the correct conditions that the mixing coefficients need to satisfy for a valid GMM.
(a) $0 \le \pi_i \le 1 \;\; \forall i$
(b) $-1 \le \pi_i \le 1 \;\; \forall i$
(c) $\sum_i \pi_i = 1$
(d) $\sum_i \pi_i$ need not be bounded
Sol. (a), (c)
6. (2 marks) Expectation-Maximization, or the EM algorithm, consists of two steps: the E-step and the M-step. Using the following notation, select the correct set of equations used at each step of the algorithm.
Notation.
X : known/given variables/data
Z : hidden/unknown variables
θ : total set of parameters to be learned
θ^k : values of all the parameters after stage k
Q(·, ·) : the Q-function as described in the lectures
(a) E-step: $\mathbb{E}_{Z|X,\theta^{m-1}}\left[\log \Pr(X, Z \mid \theta)\right]$
(b) E-step: $\mathbb{E}_{Z|X,\theta}\left[\log \Pr(X, Z \mid \theta^{m})\right]$
(c) M-step: $\arg\max_{\theta} \sum_{Z} \Pr(Z \mid X, \theta^{m-1}) \cdot \log \Pr(X, Z \mid \theta)$
(d) M-step: $\arg\max_{\theta} Q(\theta, \theta^{m-1})$
(e) M-step: $\arg\max_{\theta} Q(\theta, \theta^{m-2})$
Sol. (a), (c), (d)
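To see how options (a), (c), and (d) fit together in practice, here is a minimal EM sketch for a two-component one-dimensional GMM. The toy data, initialization, and iteration count are my own assumptions for illustration, not from the lectures; the M-step uses the standard closed-form maximizer of $Q(\theta, \theta^{m-1})$.

```python
import math
import random

random.seed(1)
# Toy data: two well-separated Gaussian clusters (assumed).
data = [random.gauss(-2.0, 1.0) for _ in range(200)] + \
       [random.gauss(3.0, 1.0) for _ in range(200)]

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Initial parameters theta^0: mixing weights, means, variances (assumed).
pi = [0.5, 0.5]
mu = [-1.0, 1.0]
var = [1.0, 1.0]

for step in range(50):
    # E-step: responsibilities Pr(Z | X, theta^{m-1}), the weights
    # inside the expectation of options (a)/(c).
    resp = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: argmax_theta Q(theta, theta^{m-1}) in closed form, option (d).
    for k in range(2):
        nk = sum(r[k] for r in resp)
        pi[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk

print(sorted(mu))  # the means should approach the true centers -2 and 3
```

Note that each iteration never decreases the data log-likelihood, which is why maximizing $Q(\theta, \theta^{m-1})$ (rather than $Q(\theta, \theta^{m-2})$, option (e)) is the correct M-step.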