hw1 2025
Instructions
Your submission should include: 1) a .pdf file containing all the answers to the questions. For the
questions requiring you to plot figures, you should also include all the figures in the .pdf file. 2) source code
(e.g., .py files, Python notebooks, .m files, ...)
(a) [5 points] Let f (w) = w⊤Xb, where X ∈ R^{n×p} is an n × p matrix, and b is a p-dimensional column
vector. Compute ∇f (w) using the definition of gradient.
(b) [5 points] Let f (w) = tr(Bww⊤A), where A, B ∈ R^{n×n} are square matrices of size n × n, and
tr(A) denotes the trace of the square matrix A. Using the definition of gradient, compute ∇f (w).
(c) [5 points] Let f (w) = tr(Bww⊤ A). Compute the Hessian matrix H of f with respect to w using
the definition.
(d) [5 points] If
        A = [1  1; 1  2]   and   B = [2  −1; −1  3],
is f (w) a convex function? (Hint: you may use Python/Matlab/R for this question.)
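Following the hint, one way to probe convexity numerically is to estimate the Hessian of f by finite differences and inspect the signs of its eigenvalues (a twice-differentiable function is convex iff its Hessian is positive semidefinite). This is only a sketch: A and B come from the problem statement, everything else is illustrative scaffolding.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0]])
B = np.array([[2.0, -1.0], [-1.0, 3.0]])

def f(w):
    w = w.reshape(-1, 1)
    return np.trace(B @ w @ w.T @ A)

# Finite-difference estimate of the Hessian at w = 0
# (f is quadratic in w, so the estimate is essentially exact).
eps = 1e-4
n = 2
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n); ei[i] = eps
        ej = np.zeros(n); ej[j] = eps
        H[i, j] = (f(ei + ej) - f(ei) - f(ej) + f(np.zeros(n))) / eps**2

# Convex iff all eigenvalues of the (symmetrized) Hessian are >= 0.
print(np.linalg.eigvalsh((H + H.T) / 2))
```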
(e) [5 points] In Lecture 5, we defined the sigmoid function σ(a) = 1/(1 + e^{−a}). Let f (w) = log(σ(w⊤x)),
where log is the natural logarithm. Compute ∇f (w) using the definition of gradient.
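A central-difference gradient checker is a convenient way to sanity-check the hand-derived gradients in (a), (b), and (e). The sketch below checks f (w) = log(σ(w⊤x)); the particular vectors x and w0 are illustrative choices, not part of the assignment.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def num_grad(f, w, eps=1e-6):
    # Central-difference estimate of the gradient of f at w.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g

x = np.array([0.5, -1.0, 2.0])    # illustrative data point
w0 = np.array([0.1, 0.2, -0.3])   # illustrative weight vector
f = lambda w: np.log(sigmoid(w @ x))
print(num_grad(f, w0))            # compare against your derived formula
```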
(a) [5 points] Load the training data hw1xtr.dat and hw1ytr.dat into the memory and plot it on one
graph. Load the test data hw1xte.dat and hw1yte.dat into the memory and plot it on another
graph.
(b) [10 points] Add a column vector of 1’s to the features, then use the linear regression formula
discussed in Lecture 3 to obtain a 2-dimensional weight vector. Plot both the linear regression line
and the training data on the same graph. Also report the average error on the training set using
Eq. (1).
                 err = (1/m) Σ_{i=1}^{m} (w⊤x_i − y_i)²                    (1)
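The fit in (b) can be sketched with the least-squares normal equations from Lecture 3. The commented loadtxt lines use the file names from part (a); the active synthetic data is a stand-in so the sketch runs on its own.

```python
import numpy as np

# In the actual assignment, load the provided files instead:
#   x = np.loadtxt("hw1xtr.dat"); y = np.loadtxt("hw1ytr.dat")
# Synthetic stand-in data, purely for illustration:
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 40)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, 40)

X = np.column_stack([np.ones_like(x), x])   # prepend the column of 1's
w = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations: (X'X)w = X'y
train_err = np.mean((X @ w - y) ** 2)       # average error, Eq. (1)
print(w, train_err)
```

The same err computation applied to the test design matrix gives the test error for part (c).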
(c) [5 points] Plot both the regression line and the test data on the same graph. Also report the
average error on the test set using Eq. (1).
(d) [10 points] Implement the 2nd-order polynomial regression by adding the new feature x² to the inputs.
Repeat (b) and (c). Compare the training error and test error. Is it a better fit than linear
regression?
(e) [10 points] Implement the 3rd-order polynomial regression by adding new features x², x³ to the
inputs. Repeat (b) and (c). Compare the training error and test error. Is it a better fit than linear
regression and 2nd-order polynomial regression?
(f) [10 points] Implement the 4th-order polynomial regression by adding new features x², x³, x⁴ to
the inputs. Repeat (b) and (c). Compare the training error and test error. Compared with the
previous results, which order is the best for fitting the data?
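Parts (d)–(f) only change the design matrix, so a small helper that appends powers of x keeps the three fits uniform. A sketch, reusing the same normal-equation solve; the stand-in data below is illustrative and should be replaced by the hw1*.dat arrays.

```python
import numpy as np

def poly_design(x, order):
    # Columns: 1, x, x^2, ..., x^order (order=1 is plain linear regression).
    return np.column_stack([x ** k for k in range(order + 1)])

def fit_and_errors(Xtr, ytr, Xte, yte):
    w = np.linalg.solve(Xtr.T @ Xtr, Xtr.T @ ytr)
    err = lambda X, y: np.mean((X @ w - y) ** 2)   # Eq. (1)
    return w, err(Xtr, ytr), err(Xte, yte)

# Illustrative stand-in data (quadratic ground truth plus noise).
rng = np.random.default_rng(0)
xtr, xte = rng.uniform(-1, 1, 40), rng.uniform(-1, 1, 20)
ytr = xtr**2 + rng.normal(0, 0.05, 40)
yte = xte**2 + rng.normal(0, 0.05, 20)

for order in (1, 2, 3, 4):
    _, tr, te = fit_and_errors(poly_design(xtr, order), ytr,
                               poly_design(xte, order), yte)
    print(order, tr, te)
```

Note that training error can only decrease as the order grows (the models are nested); the comparison the questions ask for hinges on what the test error does.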
(b) [10 points] Plot the value of each weight parameter (including the bias term w0 ) as a function of λ.
(c) [5 points] Write a procedure that performs five-fold cross-validation on your training data (page 7
of Lecture 4). Use it to determine the best value for λ. Show the average error on the validation
set as a function of λ. Is it the same as the best λ in (a)? For the best fit, plot the test data and the
ℓ2 -regularized 4th-order polynomial regression line obtained.
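The ℓ2-regularized solve and the five-fold cross-validation loop can be sketched as below. Whether the bias w0 is penalized is a modeling choice; this sketch leaves it unpenalized, which is one common convention and may differ from Lecture 4 — check your notes. The stand-in data mimics the 4th-order design matrix from part 2(f).

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form l2-regularized least squares. The bias column (index 0)
    # is left unpenalized here -- an assumption, not from the assignment.
    P = np.eye(X.shape[1])
    P[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

def five_fold_cv(X, y, lams):
    folds = np.array_split(np.arange(len(y)), 5)
    avg_errs = []
    for lam in lams:
        errs = []
        for k in range(5):
            val = folds[k]
            tr = np.concatenate([folds[j] for j in range(5) if j != k])
            w = ridge_fit(X[tr], y[tr], lam)
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))  # Eq. (1)
        avg_errs.append(np.mean(errs))
    return np.array(avg_errs)

# Illustrative stand-in for the 4th-order design matrix.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
X = np.column_stack([x ** k for k in range(5)])
y = x**2 + rng.normal(0, 0.1, 50)

lams = np.array([0.0, 0.01, 0.1, 1.0, 10.0])
cv = five_fold_cv(X, y, lams)
print(lams[np.argmin(cv)], cv)
```

Refitting ridge_fit on the full training set at the λ minimizing cv gives the weights for the final plot against the test data.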