
MLPR Tutorial Sheet 1

You should attempt the tutorial questions before your tutorial.


I strongly recommend you discuss these questions—and the course in general—with your
peers. You could try to meet up with people from your tutorial group, or other people you
have met in lectures.
As a student, I’d attempt tutorial questions by myself at first, but usually couldn’t do
everything in a sensible amount of time. If you can’t do a part, skip it at first and move
on! After attempting what I could, I’d meet up with a friend from the class, and we’d pool
our understanding. Finally we’d ask someone else, or ask questions in a tutorial, about the
parts that neither of us could manage. Tutorials run more smoothly if you have agreed with
someone what you want to talk about!
Your tutorial session usually won’t discuss every part. The point of tutorials isn’t to give
you the answers: the answers will be made available after the week’s tutorials, and can be
discussed further on Hypothesis. The tutorial sessions are mainly useful for giving you
practice at explaining your thinking, and discussing particular points that the group might
find helpful.

1. Linear Regression and Linear Transformations:


Alice fits a function $f(x) = w^\top x$ to a training set of $N$ datapoints $\{x^{(n)}, y^{(n)}\}$ by least
squares. The inputs $x$ are $D$-dimensional column vectors.
Bob has heard that by transforming the inputs $x$ with a vector-valued function $\phi$, he
can fit an alternative function, $g(x) = v^\top \phi(x)$, with the same fitting code. He decides
to use a linear transformation $\phi(x) = Ax$, where $A$ is an invertible matrix.

a) Show that Bob’s procedure will fit the same function as Alice’s original procedure.

NB You don’t have to do any extensive mathematical manipulation. You also
don’t need a mathematical expression for the least-squares weights.
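
A quick numerical check can make part a) concrete before you argue it in general. This is a
minimal sketch, assuming NumPy is available; the inputs, targets, and matrix $A$ below are
random stand-ins, not anything from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 20, 3
X = rng.normal(size=(N, D))     # design matrix: rows are the inputs x^(n)
y = rng.normal(size=N)          # arbitrary stand-in targets
A = rng.normal(size=(D, D))     # a random square matrix, almost surely invertible

# Alice: least-squares fit to the raw inputs.
w = np.linalg.lstsq(X, y, rcond=None)[0]

# Bob: the same fitting code applied to transformed inputs phi(x) = A x.
# Each transformed row is (A x^(n))^T = x^(n)^T A^T, so the design matrix is X A^T.
v = np.linalg.lstsq(X @ A.T, y, rcond=None)[0]

# The two fitted functions agree on any test input (up to round-off).
x_test = rng.normal(size=D)
print(w @ x_test, v @ (A @ x_test))
```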

b) Could Bob’s procedure be better than Alice’s if the matrix $A$ is not invertible?

[If you need a hint, it may help to remind yourself of the discussion involving
invertible matrices in the pre-test answers.]

c) Alice becomes worried about overfitting, adds a regularizer $\lambda w^\top w$ to the
least-squares error function, and refits the model. Assuming $A$ is invertible, can Bob
choose a regularizer so that he will still always obtain the same function as Alice?

d) Bonus part: Only do this part this week if you have time. Otherwise review it later.
Suppose we wish to find the vector $v$ that minimizes the function
$$(y - \Phi v)^\top (y - \Phi v) + v^\top M v.$$

i) Show that $v^\top M v = v^\top \left(\frac{1}{2}M + \frac{1}{2}M^\top\right) v$, and hence that we can assume
without loss of generality that $M$ is symmetric.

ii) Why would we usually choose $M$ to be positive semi-definite in a regularizer,
meaning that $a^\top M a \ge 0$ for all vectors $a$?

iii) Assume we can find a factorization $M = AA^\top$. Can we minimize the function
above using a standard routine that can minimize $(z - Xw)^\top (z - Xw)$ with
respect to $w$?
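
If you want to check an answer to iii) numerically, the sketch below (again assuming
NumPy, with made-up $\Phi$, $y$, and $A$) compares the direct minimizer of the regularized
cost with one possible reduction to a plain least-squares routine; treat it as a check to
compare against your own construction, not as the model answer:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 30, 4
Phi = rng.normal(size=(N, D))
y = rng.normal(size=N)
A = rng.normal(size=(D, D))
M = A @ A.T                      # a positive semi-definite regularizer

# Direct minimizer of (y - Phi v)^T (y - Phi v) + v^T M v,
# obtained by setting the gradient to zero.
v_direct = np.linalg.solve(Phi.T @ Phi + M, Phi.T @ y)

# One reduction to plain least squares: since ||A^T v||^2 = v^T M v,
# stack A^T below Phi and zeros below y, then minimize ||z - X v||^2.
X_aug = np.vstack([Phi, A.T])
z_aug = np.concatenate([y, np.zeros(D)])
v_lstsq = np.linalg.lstsq(X_aug, z_aug, rcond=None)[0]

print(np.max(np.abs(v_direct - v_lstsq)))   # agrees to round-off
```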



2. Logistic Sigmoids:

a) Sketch, with pen and paper, a contour plot of the sigmoidal function

$$\phi(x) = \sigma(v^\top x + b),$$

for $v = [1\ 2]^\top$ and $b = 5$, where $\sigma(a) = 1/(1 + \exp(-a))$.

Indicate the precise location of the $\phi = 0.5$ contour on your sketch, and give at
least a rough indication of some other contours. Also mark the vector $v$ on your
diagram, and indicate how its direction is related to the contours.

Hints: What happens to $\phi$ as $x$ moves orthogonal (perpendicular) to $v$? What
happens to $\phi$ as $x$ moves parallel to $v$? To draw the $\phi = 0.5$ contour, it may help to
identify special places where it is easy to show that $\phi = 0.5$.

b) If $x$ and $v$ were three-dimensional, what would the contours of $\phi$ look like, and
how would they relate to $v$? (A sketch is not expected.)
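
After drawing your sketch for part a), you could check it numerically. This is a minimal
sketch assuming NumPy and Matplotlib are available; the grid range is an arbitrary choice:

```python
import numpy as np
import matplotlib.pyplot as plt

v = np.array([1.0, 2.0])
b = 5.0

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Evaluate phi(x) = sigma(v^T x + b) on a grid of two-dimensional inputs.
x1, x2 = np.meshgrid(np.linspace(-8, 4, 200), np.linspace(-8, 4, 200))
phi = sigmoid(v[0] * x1 + v[1] * x2 + b)

cs = plt.contour(x1, x2, phi, levels=[0.1, 0.3, 0.5, 0.7, 0.9])
plt.clabel(cs)
# Mark v, drawn from the point (0, -2.5), which satisfies v^T x + b = 0.
plt.arrow(0.0, -2.5, v[0], v[1], head_width=0.2)
plt.gca().set_aspect("equal")
plt.show()
```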

3. Radial Basis Functions (RBFs):


In this question we form a linear regression model for one-dimensional inputs: $f(x) =
w^\top \phi(x; h)$, where $\phi(x; h)$ evaluates the input at 101 basis functions. The basis functions

$$\phi_k(x) = e^{-(x - c_k)^2 / h^2}$$

share a common user-specified bandwidth $h$, while the positions of the centers are set
to make the basis functions overlap: $c_k = (k - 51)h/\sqrt{2}$, with $k = 1 \ldots 101$. The free
parameters of the model are the bandwidth $h$ and the weights $w$.
The model is used to fit a dataset with $N = 70$ observations, each with inputs $x \in
[-1, +1]$. Assume each of the observations also has outputs $y \in [-1, +1]$. The model
is fitted for any particular $h$ by transforming the inputs using that bandwidth into a
feature matrix $\Phi$, then minimizing the regularized least-squares cost:

$$C = (y - \Phi w)^\top (y - \Phi w) + 0.1\, w^\top w.$$
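
As a concrete reference for this setup, here is a minimal sketch, assuming NumPy, of
building the feature matrix $\Phi$ and minimizing $C$ for one value of $h$; the synthetic inputs
and outputs below are stand-ins for the dataset described above:

```python
import numpy as np

def rbf_features(x, h):
    """Map inputs x, shape (N,), to 101 RBF features with bandwidth h."""
    k = np.arange(1, 102)
    c = (k - 51) * h / np.sqrt(2)    # centers c_k = (k - 51) h / sqrt(2)
    return np.exp(-(x[:, None] - c[None, :])**2 / h**2)

rng = np.random.default_rng(2)
N = 70
x = rng.uniform(-1, 1, size=N)
y = np.sin(3 * x)                    # stand-in outputs, within [-1, +1]

h = 0.2
Phi = rbf_features(x, h)             # shape (70, 101)

# Minimize C = (y - Phi w)^T (y - Phi w) + 0.1 w^T w via the normal equations:
w = np.linalg.solve(Phi.T @ Phi + 0.1 * np.eye(101), Phi.T @ y)
```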

a) Explain why many of the weights will be close to zero when $h = 0.2$, and why
even more weights will probably be close to zero when $h = 1.0$.

b) It is suggested that we could choose $h$ by fitting $w$ for each $h$ in a grid of values,
and pick the $h$ which led to a fit with the smallest cost $C$. Explain whether this
suggestion is a good idea, or whether you would modify the procedure.

c) Another data set with inputs $x \in [-1, +1]$ arrives, but now you notice that all
of the observed outputs are larger, $y \in [1000, 1010]$. What problem would we
encounter if we applied linear regression as above to this data? How could
this problem be fixed?

MLPR:tut1 Iain Murray, http://www.inf.ed.ac.uk/teaching/courses/mlpr/2018/
