Machine Learning Homework 1 Solutions
September 16, 2024
Problem 1:
Goal: To find a one-dimensional function that takes a scalar input and outputs a scalar, f : ℝ → ℝ:

$$f(x;\theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d$$
After splitting the dataset into two random halves, we compute the least-squares solution 𝜃* on the training half and evaluate the resulting regression model for d ranging from 1 to 100.
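The following is a minimal MATLAB sketch of this degree sweep, assuming the data is already loaded as column vectors x and y (the variable names and the random split are illustrative, not the original submission):

n   = numel(x);                        % x, y: column vectors of scalar data
idx = randperm(n);
tr  = idx(1:floor(n/2));               % one random half for training
te  = idx(floor(n/2)+1:end);           % the other half for testing

maxDeg   = 100;
trainErr = zeros(maxDeg, 1);
testErr  = zeros(maxDeg, 1);
for d = 1:maxDeg
    Xtr = x(tr) .^ (0:d);              % Vandermonde matrix: columns x^0 ... x^d
    Xte = x(te) .^ (0:d);
    theta = Xtr \ y(tr);               % least-squares solution theta*
    trainErr(d) = mean((y(tr) - Xtr * theta) .^ 2);
    testErr(d)  = mean((y(te) - Xte * theta) .^ 2);
end
plot(1:maxDeg, trainErr, 1:maxDeg, testErr);
legend('Train error', 'Test error'); xlabel('Degree d'); ylabel('Error');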
Regression model for d = 100:
When we use cross-validation and plot train error vs. test error, we find that the test error is minimized at degree d = 9, with an error value of 1.085044e+21.

As the graph shows, the test error reaches its minimum at d = 9 and increases afterwards, while the training error decreases and then stays constant once it reaches 0. This shows that the model starts to overfit as the degree increases, and hence the test error grows rapidly.
Problem 2:
Goal: To build a multivariate regression function f : ℝ¹⁰⁰ → ℝ, where the basis functions are of the form

$$f(x;\theta) = \sum_{i=1}^{k} \theta_i x_i$$
We use an ℓ2 penalty to regularize the model and reduce the risk of overfitting.
$$R_{\mathrm{reg}}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left(y_i - f(x_i;\theta)\right)^2 + \frac{\lambda}{2N} \|\theta\|^2$$
Partially differentiating the risk with respect to 𝜃 and setting the result to 0 gives the minimum-risk model:

$$\theta^* = (X^T X + \lambda I)^{-1} X^T Y$$
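In MATLAB, this closed-form solution is a one-liner; the sketch below assumes X is the N-by-100 data matrix, Y the N-by-1 target vector, and lam the regularization weight (all names illustrative):

% Closed-form ridge solution: (X'X + lambda*I)^{-1} X'Y
theta = (X' * X + lam * eye(size(X, 2))) \ (X' * Y);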
Applying two-fold cross-validation and plotting the error for values of 𝜆 in the range 0 to 1000, we obtain:
We can see that the test error is minimized at 𝜆 = 422. As 𝜆 increases further, the entries of 𝜃 shrink and the model slowly starts to underfit; hence the training error slowly increases along with the testing error.
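The following MATLAB sketch implements the two-fold sweep over 𝜆, reusing the X and Y assumed above (the fold split and the 0 to 1000 range follow the text; variable names are illustrative):

half = floor(size(X, 1) / 2);
X1 = X(1:half, :);      Y1 = Y(1:half);        % fold 1
X2 = X(half+1:end, :);  Y2 = Y(half+1:end);    % fold 2
lambdas = 0:1000;
cvErr = zeros(numel(lambdas), 1);
for j = 1:numel(lambdas)
    lam = lambdas(j);
    th1 = (X1' * X1 + lam * eye(size(X, 2))) \ (X1' * Y1);  % train on fold 1
    th2 = (X2' * X2 + lam * eye(size(X, 2))) \ (X2' * Y2);  % train on fold 2
    cvErr(j) = 0.5 * (mean((Y2 - X2 * th1) .^ 2) ...        % test on fold 2
                    + mean((Y1 - X1 * th2) .^ 2));          % test on fold 1
end
[~, best] = min(cvErr);
fprintf('Test error is minimized at lambda = %d\n', lambdas(best));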
Problem 3:
(i) To prove: For $g(z) = \frac{1}{1+e^{-z}}$, the identity $g(-z) = 1 - g(z)$ holds.

Proof:

$$g(z) = \frac{1}{1+e^{-z}} \quad \rightarrow (1)$$

Then, multiplying the numerator and denominator of $g(-z)$ by $e^{-z}$:

$$g(-z) = \frac{1}{1+e^{z}} = \frac{e^{-z}}{1+e^{-z}} = \frac{1+e^{-z}-1}{1+e^{-z}}$$
$$= \frac{1+e^{-z}}{1+e^{-z}} - \frac{1}{1+e^{-z}} = 1 - \frac{1}{1+e^{-z}} = 1 - g(z)$$
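As a quick numerical sanity check of this identity in MATLAB (illustrative only):

g = @(z) 1 ./ (1 + exp(-z));          % the sigmoid
z = linspace(-10, 10, 101);
max(abs(g(-z) - (1 - g(z))))          % ~0, up to floating-point error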
(ii) To prove:

$$g^{-1}(y) = \ln\left(\frac{y}{1-y}\right)$$

Proof: Since $g(g^{-1}(y)) = y$, we have

$$\frac{1}{1+e^{-g^{-1}(y)}} = y$$
$$\frac{1}{y} = 1 + e^{-g^{-1}(y)}$$
$$\frac{1}{y} - 1 = e^{-g^{-1}(y)}$$
$$\frac{1-y}{y} = e^{-g^{-1}(y)}$$
$$\ln\left(\frac{1-y}{y}\right) = -g^{-1}(y)$$
$$g^{-1}(y) = -\ln\left(\frac{1-y}{y}\right) = \ln\left(\frac{y}{1-y}\right)$$

Hence proved.
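A similar numerical check confirms that the derived logit inverts the sigmoid (illustrative only):

g    = @(z) 1 ./ (1 + exp(-z));
ginv = @(y) log(y ./ (1 - y));        % the derived inverse
y = linspace(0.01, 0.99, 99);
max(abs(g(ginv(y)) - y))              % ~0, up to floating-point error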
Problem 4:
Goal:
To implement a linear logistic regression algorithm for binary classification in MATLAB using gradient descent.

Empirical risk (cross-entropy loss):

$$R_{\mathrm{Emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[(1-y_i)\log\left(1-f(x_i;\theta)\right) + y_i \log\left(f(x_i;\theta)\right)\right]$$
We know that

$$f(x_i;\theta) = g(\theta^T x_i) \quad \text{and} \quad \frac{\partial}{\partial\theta}\, g(\theta^T x_i) = g(\theta^T x_i)\left(1 - g(\theta^T x_i)\right) x_i \quad \rightarrow (1)$$
Therefore,

$$R_{\mathrm{Emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[(1-y_i)\log\left(1-g(\theta^T x_i)\right) + y_i \log\left(g(\theta^T x_i)\right)\right]$$

$$\frac{\partial}{\partial\theta} R_{\mathrm{Emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left(\frac{y_i}{g(\theta^T x_i)} - \frac{1-y_i}{1-g(\theta^T x_i)}\right) \frac{\partial}{\partial\theta}\, g(\theta^T x_i)$$

Substituting (1):

$$\frac{\partial}{\partial\theta} R_{\mathrm{Emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left(y_i\left(1-g(\theta^T x_i)\right) - (1-y_i)\,g(\theta^T x_i)\right) x_i$$

$$\frac{\partial}{\partial\theta} R_{\mathrm{Emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left(y_i - y_i\,g(\theta^T x_i) + y_i\,g(\theta^T x_i) - g(\theta^T x_i)\right) x_i$$

$$\frac{\partial}{\partial\theta} R_{\mathrm{Emp}}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left(y_i - g(\theta^T x_i)\right) x_i$$
The gradient-descent update is $\theta \leftarrow \theta - \eta\, \frac{\partial}{\partial\theta} R_{\mathrm{Emp}}(\theta)$. Here, 𝜂 is the step size, and the iteration is stopped when the update becomes negligible in size, i.e., smaller than a tolerance 𝜀; both 𝜀 and 𝜂 are hyperparameters.
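A minimal MATLAB sketch of this training loop, assuming X is the N-by-d data matrix with a leading column of ones, y holds labels in {0, 1}, and eta and tol play the roles of 𝜂 and 𝜀 (names are illustrative, and the stopping rule is one reading of "negligible in size"):

g = @(z) 1 ./ (1 + exp(-z));                   % the sigmoid
theta = zeros(size(X, 2), 1);
eta   = 2;                                     % step size (eta), e.g. from the runs below
tol   = 0.01;                                  % tolerance (epsilon)
iters = 0;
while true
    grad  = -(1/size(X, 1)) * X' * (y - g(X * theta));  % derived gradient
    step  = eta * grad;
    theta = theta - step;                      % gradient-descent update
    iters = iters + 1;
    if norm(step) < tol, break; end            % stop when the update is negligible
end
pred = g(X * theta) >= 0.5;                    % threshold predictions at 0.5
err  = mean(pred ~= y);                        % binary classification error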
For 𝜂 = 3 and 𝜀 = 0.1, the model produces a 12.5% binary classification error (87.5% accuracy) and takes 19 iterations to converge.
For 𝜂 = 2 and 𝜀 = 0.03, the model produces a 10.5% binary classification error (89.5% accuracy) and takes 161 iterations to converge.
For 𝜂 = 2 and 𝜀 = 0.01, the model produces a 2% binary classification error (98% accuracy) and takes 1050 iterations to converge.
For 𝜂 = 2 and 𝜀 = 0.001, the model produces a 0% binary classification error (100% accuracy) and takes 27,784 iterations to converge.
The last model performs excellently, with 0% error and 100% accuracy, but it takes 27,784 iterations and keeps running long after the error reaches zero. We can therefore increase the step size and the tolerance so that the model converges soon after it attains zero error.
For 𝜂 = 4 and 𝜀 = 0.0025, the model produces a 0% binary classification error (100% accuracy) and takes just 10,292 iterations to converge. Here the model achieves 100% accuracy and converges right after the binary classification error reaches zero, so it runs in the least time.