
Department of Electrical Engineering
IIT Hyderabad
EE5610/AI5000 - PRML

Class Test - 2                                              Max. Marks: 40

1. Let the target variable t be regressed from the observed inputs x as
   t̂ = y(x, w) = w^T φ(x). Let the model parameters be estimated from N
   training examples (x_{1:N}, t_{1:N}) using the least squares approach
   discussed in class, and denote the estimate by w^(N). Consider a scenario
   where we have access to M additional input points x_{N+1:N+M} for which
   the corresponding desired targets are not available. Let us use the model
   w^(N), trained on the original N points, to estimate the targets for these
   M points, t̂_{N+1:N+M}. Let us mix the new inputs and their predicted
   targets (x_{N+1:N+M}, t̂_{N+1:N+M}) with the original N points to create a
   bigger dataset with N + M points, and use this augmented dataset to train
   a new model w^(N+M). Does the new model w^(N+M) perform better than,
   worse than, or similarly to the original one w^(N)? Provide mathematical
   justification for your arguments. (10)
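
   If you want to sanity-check your argument numerically, the following is a
   minimal NumPy sketch of the setup above; the polynomial basis and the
   data-generating function are assumptions chosen only for illustration, not
   part of the question.

   import numpy as np

   rng = np.random.default_rng(0)

   def phi(x):
       # Example basis [1, x, x^2] (an assumed choice, not specified above).
       return np.stack([np.ones_like(x), x, x**2], axis=1)

   N, M = 20, 50
   x_lab = rng.uniform(-1, 1, N)
   t_lab = np.sin(np.pi * x_lab) + rng.normal(0, 0.1, N)   # labelled targets
   x_unl = rng.uniform(-1, 1, M)                           # unlabelled inputs

   # Least squares fit on the N labelled points.
   Phi_N = phi(x_lab)
   w_N = np.linalg.pinv(Phi_N) @ t_lab

   # Pseudo-label the M new points with the model's own predictions.
   t_hat = phi(x_unl) @ w_N

   # Refit on the augmented (N + M)-point dataset.
   Phi_aug = np.vstack([Phi_N, phi(x_unl)])
   t_aug = np.concatenate([t_lab, t_hat])
   w_NM = np.linalg.pinv(Phi_aug) @ t_aug

   print("w_N  =", w_N)
   print("w_NM =", w_NM)   # compare the two estimates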

2. Suppose you are experimenting with L1 and L2 regularization. Further,
   imagine that you are running gradient descent and at some iteration your
   weight vector is w = [1, ε]^T ∈ R^2, where ε > 0 is very small. With the
   help of this example, explain why the L2 norm does not encourage sparsity,
   i.e., why it will not try to drive ε to 0 to produce a sparse weight
   vector. Give a mathematical explanation.
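
   As a small numerical illustration (not part of the question), the snippet
   below compares the size of the regularizer's gradient on the second
   coordinate of w = [1, ε] for L1 versus L2; the value of lam is assumed.

   import numpy as np

   lam = 0.1
   for eps in [1e-1, 1e-3, 1e-6]:
       w = np.array([1.0, eps])
       grad_l2 = 2 * lam * w          # gradient of lam * ||w||_2^2
       grad_l1 = lam * np.sign(w)     # subgradient of lam * ||w||_1
       print(f"eps={eps:.0e}  L2 pull on eps: {grad_l2[1]:.1e}  "
             f"L1 pull on eps: {grad_l1[1]:.1e}")

   # The L2 pull on the small weight is proportional to eps and vanishes as
   # eps -> 0, so gradient steps shrink it but never push it exactly to zero;
   # the L1 pull stays constant at lam, which is what drives the coordinate
   # all the way to zero.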

3. When we perform least squares linear regression, we make certain
   idealized assumptions about the errors e_n, namely, that they are
   distributed as N(0, β^{-1}). In practice, departures from these
   assumptions occur, particularly in cases where the error distribution is
   heavier tailed than the Normal distribution (i.e., has more probability
   in the tails than the Normal).
   The least squares loss is sensitive to outliers, and hence robust
   regression methods are of interest. The problem with the least squares
   loss in the presence of outliers (i.e., when the noise term e_n can be
   arbitrarily large) is that it weighs each observation equally in getting
   parameter estimates. Robust methods, on the other hand, enable the
   observations to be weighted unequally. More specifically, observations
   that produce large residuals are down-weighted by a robust estimation
   method.
   In this problem, you will assume that the e_n are independent and
   identically distributed according to a Laplacian distribution, rather
   than a Gaussian. That is, each

       e_n ∼ Laplace(0, b) = (1/(2b)) exp(−|e_n|/b).
   (a) Provide the loss function J_L(w) whose minimization is equivalent to
   finding the ML estimate under the Laplacian noise model.
   (b) Suggest a method for optimizing the objective function in (a), and
   write down the update equations.
   (c) Why do you think that the above model provides a more robust fit to
   the data compared to the standard model assuming a Gaussian distribution
   of the noise terms?
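
   One possible optimizer for part (b) is plain sub-gradient descent on
   J_L(w) = Σ_n |t_n − w^T φ(x_n)|; the sketch below is only an illustration
   under that assumption, with a toy dataset and step-size schedule that are
   not part of the question.

   import numpy as np

   def fit_laplace(Phi, t, lr=0.05, n_iters=2000):
       w = np.zeros(Phi.shape[1])
       for i in range(n_iters):
           r = t - Phi @ w                          # residuals
           subgrad = -Phi.T @ np.sign(r)            # subgradient of sum_n |r_n|
           w -= (lr / np.sqrt(i + 1)) * subgrad     # diminishing step size
       return w

   rng = np.random.default_rng(1)
   x = rng.uniform(0, 1, 30)
   Phi = np.stack([np.ones_like(x), x], axis=1)
   t = 2.0 + 3.0 * x + rng.normal(0, 0.1, 30)
   t[0] += 20.0                                     # one gross outlier

   w_ls = np.linalg.pinv(Phi) @ t                   # ordinary least squares
   w_lap = fit_laplace(Phi, t)
   print("least squares:", w_ls, "  Laplacian/L1:", w_lap)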

4. When we have multiple independent outputs t = [t_1 t_2 ··· t_K]^T in
   linear regression, the model becomes

       p(t | x, W) = ∏_{k=1}^{K} N(t_k | w_k^T φ_k(x), β_k^{-1})

   Since the likelihood factorizes across dimensions, so do the ML
   estimates. Thus

       W = [w_1 w_2 ··· w_K]

   where w_k can be estimated, using the method discussed in class, from the
   k-th dimension of the data. Suppose we have the training data as follows:

       x     t
       0     [−1 −1]^T
       0     [−1 −2]^T
       0     [−2 −1]^T
       1     [ 1  1]^T
       1     [ 1  2]^T
       1     [ 2  1]^T

   Let each input x_n be embedded into a 2-D space using the following basis
   functions:

       φ(0) = [1 0]^T,    φ(1) = [0 1]^T

   The estimate of the output vector becomes

       t̂_n = W^T φ(x_n)

   where W is a 2×2 matrix. Compute the ML estimate for W from the above
   data.
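
   For checking purposes only, here is a quick NumPy sketch of the standard
   closed-form multi-output least squares estimate W = (Φ^T Φ)^{-1} Φ^T T
   applied to the data and basis functions given above; the variable names
   are assumptions for illustration.

   import numpy as np

   x = np.array([0, 0, 0, 1, 1, 1])
   T = np.array([[-1, -1], [-1, -2], [-2, -1],
                 [ 1,  1], [ 1,  2], [ 2,  1]], dtype=float)   # targets, 6 x 2

   # phi(0) = [1 0]^T and phi(1) = [0 1]^T give a one-hot design matrix.
   Phi = np.stack([1.0 - x, x], axis=1).astype(float)          # 6 x 2

   W_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ T)              # 2 x 2
   print(W_ml)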
