COMP3020 Machine Learning Fall 2024

Homework 2
Tran Anh Vu - V202100569
October 21, 2024

1 Perceptron
1.1 Exercise 1a
- Based on the diagram of the data distribution, the dataset is linearly separable, so a Perceptron can be trained to classify it perfectly.
- Initially, w_0 = [0, −1] and b_0 = 1/2. We iterate through the samples in order.
- Iteration 1:
• x_1 = [0, 0] ⇒ a = w_0^T x_1 + b_0 = 1/2. Since a · y < 0, the prediction is incorrect, so we update w_1 = w_0 + y · x_1 = [0, −1] and b_1 = b_0 + y = −0.5.
• x_2 = [0, 1] ⇒ a = w_1^T x_2 + b_1 = −3/2. Since a · y < 0, the prediction is incorrect, so we update w_2 = w_1 + y · x_2 = [0, 0] and b_2 = b_1 + y = 0.5.
• x_3 = [1, 0] ⇒ a = w_2^T x_3 + b_2 = 1/2. Since a · y > 0, the prediction is correct.
• x_4 = [1, 1] ⇒ a = w_2^T x_4 + b_2 = 1/2. Since a · y > 0, the prediction is correct.
- Iteration 2:
• x_1 = [0, 0] ⇒ a = w_2^T x_1 + b_2 = 1/2. Since a · y < 0, the prediction is incorrect, so we update w_3 = w_2 + y · x_1 = [0, 0] and b_3 = b_2 + y = −0.5.
• x_2 = [0, 1] ⇒ a = w_3^T x_2 + b_3 = −1/2. Since a · y < 0, the prediction is incorrect, so we update w_4 = w_3 + y · x_2 = [0, 1] and b_4 = b_3 + y = 0.5.
• x_3 = [1, 0] ⇒ a = w_4^T x_3 + b_4 = 1/2. Since a · y > 0, the prediction is correct.
• x_4 = [1, 1] ⇒ a = w_4^T x_4 + b_4 = 3/2. Since a · y > 0, the prediction is correct.
- Iteration 3:
• x_1 = [0, 0] ⇒ a = w_4^T x_1 + b_4 = 1/2. Since a · y < 0, the prediction is incorrect, so we update w_5 = w_4 + y · x_1 = [0, 1] and b_5 = b_4 + y = −0.5.
• x_2 = [0, 1] ⇒ a = w_5^T x_2 + b_5 = 1/2. Since a · y > 0, the prediction is correct.
• x_3 = [1, 0] ⇒ a = w_5^T x_3 + b_5 = −1/2. Since a · y < 0, the prediction is incorrect, so we update w_6 = w_5 + y · x_3 = [1, 1] and b_6 = b_5 + y = 0.5.
• x_4 = [1, 1] ⇒ a = w_6^T x_4 + b_6 = 5/2. Since a · y > 0, the prediction is correct.
- Iteration 4:
• x_1 = [0, 0] ⇒ a = w_6^T x_1 + b_6 = 1/2. Since a · y < 0, the prediction is incorrect, so we update w_7 = w_6 + y · x_1 = [1, 1] and b_7 = b_6 + y = −0.5.
• x_2 = [0, 1] ⇒ a = w_7^T x_2 + b_7 = 1/2. Since a · y > 0, the prediction is correct.
• x_3 = [1, 0] ⇒ a = w_7^T x_3 + b_7 = 1/2. Since a · y > 0, the prediction is correct.
• x_4 = [1, 1] ⇒ a = w_7^T x_4 + b_7 = 3/2. Since a · y > 0, the prediction is correct.
In a fifth pass, no sample is misclassified (in particular, x_1 now gives a = w_7^T x_1 + b_7 = −1/2 with y = −1, so a · y > 0), so training stops. Therefore, the perfect classifier for the dataset is the Perceptron with w* = [1, 1] and b* = −1/2.
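To double-check the trace, here is a minimal Python/NumPy sketch of the same update rule. The labels y_1 = −1 and y_2 = y_3 = y_4 = +1 are inferred from the sign checks above rather than stated explicitly, so treat them as an assumption.

import numpy as np

# Minimal perceptron sketch reproducing the trace above.
# Assumption: labels inferred from the sign checks are y = [-1, +1, +1, +1].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, 1.0])

w = np.array([0.0, -1.0])  # w_0
b = 0.5                    # b_0

for epoch in range(10):
    mistakes = 0
    for x_i, y_i in zip(X, y):
        a = w @ x_i + b
        if a * y_i < 0:        # incorrect prediction
            w = w + y_i * x_i  # perceptron update
            b = b + y_i
            mistakes += 1
    if mistakes == 0:          # a full error-free pass: converged
        break

print(w, b)  # [1. 1.] -0.5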

1.2 Exercise 1b
Assume that there exists a Perceptron that perfectly classifies the dataset, with parameters w* = [w_1, w_2] and b* = b. These parameters would have to satisfy the following system of inequalities:

b < 0                      (1)
w_1 + b ≥ 0                (2)
w_2 + b ≥ 0                (3)
w_1 + w_2 + b < 0          (4)

Adding (1) and (4) gives w_1 + w_2 + 2b < 0, while adding (2) and (3) gives w_1 + w_2 + 2b ≥ 0, which is a contradiction. Therefore, no Perceptron can perfectly classify this dataset.
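As an empirical illustration (not a substitute for the proof), the same update rule run on this dataset never completes an error-free pass. A minimal sketch, assuming the labels implied by inequalities (1)–(4): y = [−1, +1, +1, −1] for x = [0,0], [0,1], [1,0], [1,1].

import numpy as np

# Empirical illustration: the perceptron never converges on this dataset.
# Assumption: labels inferred from inequalities (1)-(4) are y = [-1, +1, +1, -1].
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])

w = np.zeros(2)
b = 0.0
for epoch in range(1000):
    mistakes = 0
    for x_i, y_i in zip(X, y):
        a = w @ x_i + b
        pred = 1.0 if a >= 0 else -1.0   # same sign convention as (1)-(4)
        if pred != y_i:
            w += y_i * x_i
            b += y_i
            mistakes += 1
    if mistakes == 0:
        break

print(mistakes)  # stays >= 1 on every pass: no convergence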

2 Linear Regression
2.1 Exercise 2a
The loss function is given as:
L(w) = ∥Xw − y∥^2 = (Xw − y)^T (Xw − y) = w^T X^T X w − 2 y^T X w + y^T y

Now, take the gradient of this with respect to w:

∂L(w)/∂w = 2 X^T (Xw − y)
This is the derivative of the loss function with respect to w.
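As a quick sanity check (not part of the derivation), the analytic gradient can be compared against a finite-difference approximation; the data, shapes, and seed below are arbitrary and purely illustrative.

import numpy as np

# Finite-difference check of the gradient 2 X^T (Xw - y).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=3)

loss = lambda v: np.sum((X @ v - y) ** 2)         # L(w) = ||Xw - y||^2
analytic = 2 * X.T @ (X @ w - y)                  # derived gradient
numeric = np.array([                              # central differences
    (loss(w + 1e-6 * e) - loss(w - 1e-6 * e)) / 2e-6
    for e in np.eye(3)
])

print(np.allclose(analytic, numeric, atol=1e-4))  # True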

2.2 Exercise 2b
From Exercise 2a, we have the derivative as:
∂L(w)/∂w = 2 X^T (Xw − y)
Setting this equal to zero:

X^T (Xw* − y) = 0

⇒ X^T X w* = X^T y

Since X has full column rank, X^T X is invertible. Thus, we can solve for w*:

w* = (X^T X)^{-1} X^T y
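A brief numerical check of this closed form on synthetic data (names, shapes, and seed are illustrative; X is full column rank with overwhelming probability): solving the normal equations should agree with NumPy's least-squares routine.

import numpy as np

# Closed-form OLS vs. numpy's least-squares solver on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))   # 50 samples, 4 features
y = rng.normal(size=50)

# w* = (X^T X)^{-1} X^T y, computed via solve() instead of an explicit inverse
w_closed = np.linalg.solve(X.T @ X, X.T @ y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_closed, w_lstsq))  # True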

2.3 Exercise 2c
We have:
L_ridge(w) = ∥Xw − y∥^2 + λ∥w∥^2 = (Xw − y)^T (Xw − y) + λ w^T w

Taking the derivative with respect to w:


∂L_ridge(w)/∂w = 2 X^T (Xw − y) + 2λw

Setting the derivative equal to zero, we have:

X^T (Xw* − y) + λw* = 0

⇒ X^T X w* + λw* = X^T y
⇒ (X^T X + λI) w* = X^T y
For λ > 0, X^T X + λI is positive definite: X^T X is always positive semidefinite (for any v, v^T X^T X v = ∥Xv∥^2 ≥ 0), and adding λI shifts every eigenvalue up by λ > 0. Therefore, X^T X + λI is invertible, and the solution is:

w* = (X^T X + λI)^{-1} X^T y
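The same kind of numerical check works for the ridge solution; a minimal sketch with synthetic data and an arbitrary λ > 0, verifying that the gradient derived above vanishes at w*.

import numpy as np

# Ridge closed form: w* = (X^T X + lam * I)^{-1} X^T y.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.1                      # arbitrary regularization strength

m = X.shape[1]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

grad = 2 * X.T @ (X @ w_ridge - y) + 2 * lam * w_ridge
print(np.allclose(grad, 0))    # True: the ridge gradient vanishes at w*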

2.4 Exercise 2d
Let X* and y* be the feature matrix and label vector of a new dataset such that ordinary least squares regression on this new dataset yields the same objective as L2-regularized regression on the original dataset. In other words:

∥X* w − y*∥^2 = ∥Xw − y∥^2 + λ∥w∥^2        (1)


 
Besides that, the squared norm of a stacked vector [v_1; v_2] is the sum of the squared norms of v_1 and v_2:

∥[v_1; v_2]∥^2 = ∥v_1∥^2 + ∥v_2∥^2
Therefore, we can 'compress' the expression ∥Xw − y∥^2 + λ∥w∥^2 into:

L(w) = ∥[Xw − y; √λ I w]∥^2 = ∥[X; √λ I] w − [y; 0]∥^2        (2)

where I is the m × m identity matrix and m is the number of features (columns of X). Note that the added block must be scaled by √λ rather than λ, since ∥√λ I w∥^2 = λ∥w∥^2.
Comparing (2) to (1), we can set X* = [X; √λ I] and y* = [y; 0] so that (1) is satisfied. To do this, we add m artificial samples to the dataset, constructed so that their input features form the matrix √λ I and their corresponding outputs are all zero:

X_artificial = √λ I,   y_artificial = 0

Hence, we can augment the original dataset (X, y) into the dataset (X*, y*) to achieve the same effect as L2 regularization while using ordinary least squares regression.
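A minimal sketch of this augmentation on synthetic data with an arbitrary λ: ordinary least squares on (X*, y*) should return the same weights as the ridge closed form.

import numpy as np

# OLS on the augmented dataset reproduces the ridge solution.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = rng.normal(size=50)
lam = 0.1
m = X.shape[1]

# m artificial samples: features sqrt(lam) * I, targets 0.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(m)])
y_aug = np.concatenate([y, np.zeros(m)])

w_ols_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

print(np.allclose(w_ols_aug, w_ridge))  # True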

3 Coding Questions
My code is documented with inline comments.
