
MACHINE LEARNING HOMEWORK 1 SOLUTIONS

HARISH BALAJI BOOMINATHAN (hb2917)

September 16, 2024

Problem 1:
Goal: To find a one-dimensional function that takes a scalar input and outputs a scalar, f : ℝ → ℝ.

Form of the function:

f(x; \theta) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d

where d is the degree of the polynomial.

Empirical Risk is given by


R_{emp}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - f(x_i; \theta) \right)^2

On partially differentiating with respect to θ and equating to 0, we get:


\theta^* = (X^T X)^{-1} X^T Y

After splitting the dataset into two random halves, we compute θ* on one half and evaluate the resulting regression model for d ranging from 1 to 100.

[Figure: fitted regression model for d = 100]

Using cross validation and plotting the train error against the test error, the minimum test error occurs at degree d = 9, with an error value of 1.085044e+21.
The test error reaches its minimum at d = 9 and increases afterwards, while the training error keeps decreasing and stays constant once it reaches 0. This shows that the model starts to overfit as the degree increases, which is why the test error grows rapidly.
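A minimal MATLAB sketch of this procedure, assuming the data has already been loaded as N-by-1 column vectors x and y (the variable names are illustrative, not from the original submission):

```matlab
% Polynomial regression via the normal equation, for degrees d = 1..100.
% Assumes x and y are N-by-1 column vectors already in the workspace.
idx      = randperm(numel(x));            % random split into two halves
half     = floor(numel(x)/2);
trainIdx = idx(1:half);
testIdx  = idx(half+1:end);

trainErr = zeros(100,1);
testErr  = zeros(100,1);
for d = 1:100
    Xtr = x(trainIdx).^(0:d);             % Vandermonde matrix: columns x^0 .. x^d
    Xte = x(testIdx).^(0:d);
    theta = (Xtr'*Xtr) \ (Xtr'*y(trainIdx));      % theta* = (X'X)^(-1) X'Y
    trainErr(d) = mean((y(trainIdx) - Xtr*theta).^2) / 2;
    testErr(d)  = mean((y(testIdx)  - Xte*theta).^2) / 2;
end
[~, bestD] = min(testErr);                % degree with minimum test error
plot(1:100, trainErr, 1:100, testErr); legend('Train error','Test error');
```

The backslash solve is used here in place of an explicit matrix inverse, which is numerically safer for high-degree polynomials.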

Problem 2:
Goal: To build a multivariate regression function f : ℝ^100 → ℝ, where the basis functions are of the form

f(x; \theta) = \sum_{i=1}^{k} \theta_i x_i

We add an ℓ2 penalty to the model to reduce the risk of overfitting:

R_{reg}(\theta) = \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - f(x_i; \theta) \right)^2 + \frac{\lambda}{2N} \|\theta\|^2

Partially differentiating the risk with respect to θ and equating it to 0, we get the model with minimum risk:

\theta^* = (X^T X + \lambda I)^{-1} X^T Y

Applying two-fold cross validation for values of λ in the range 0 to 1000 and plotting the resulting errors, we obtain:
The test error is minimized at λ = 422. As λ increases, the entries of θ shrink and the model slowly starts to underfit; hence the training error slowly rises while the test error increases.
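A minimal MATLAB sketch of this two-fold cross validation over λ, assuming the data is already loaded as an N-by-100 design matrix X and an N-by-1 target vector y (variable names are illustrative):

```matlab
% Ridge regression with two-fold cross validation over lambda.
% Assumes X is N-by-100 and y is N-by-1, already in the workspace.
idx   = randperm(size(X,1));
half  = floor(size(X,1)/2);
folds = {idx(1:half), idx(half+1:end)};

lambdas = 0:1000;
cvErr   = zeros(numel(lambdas),1);
for k = 1:numel(lambdas)
    lam = lambdas(k);
    err = 0;
    for f = 1:2
        tr = folds{3-f};  te = folds{f};           % train on one half, test on the other
        theta = (X(tr,:)'*X(tr,:) + lam*eye(size(X,2))) \ (X(tr,:)'*y(tr));
        err   = err + mean((y(te) - X(te,:)*theta).^2) / 2;
    end
    cvErr(k) = err / 2;                            % average test error over the two folds
end
[~, best]  = min(cvErr);
bestLambda = lambdas(best);                        % lambda with minimum test error
```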

Problem 3:

(i) To prove:

For g(z) = \frac{1}{1 + e^{-z}}, the identity g(-z) = 1 - g(z) holds.

Proof:

g(z) = \frac{1}{1 + e^{-z}} \qquad (1)

Then,

g(-z) = \frac{1}{1 + e^{z}}

Multiplying the numerator and denominator by e^{-z}, we get

g(-z) = \frac{e^{-z}}{e^{-z} + 1}

Adding and subtracting 1 in the numerator,

g(-z) = \frac{1 + e^{-z} - 1}{1 + e^{-z}}

g(-z) = \frac{1 + e^{-z}}{1 + e^{-z}} - \frac{1}{1 + e^{-z}}

g(-z) = 1 - \frac{1}{1 + e^{-z}}

Using equation (1):

g(-z) = 1 - g(z)

Hence proved.

(ii) To prove:

g^{-1}(y) = \ln\left(\frac{y}{1 - y}\right)

Proof:

We know that g(g^{-1}(y)) = y.

Therefore,

g(g^{-1}(y)) = \frac{1}{1 + e^{-g^{-1}(y)}} = y

\frac{1}{y} = 1 + e^{-g^{-1}(y)}

\frac{1}{y} - 1 = e^{-g^{-1}(y)}

\frac{1 - y}{y} = e^{-g^{-1}(y)}

Taking the natural logarithm of both sides,

\ln\left(\frac{1 - y}{y}\right) = \ln\left(e^{-g^{-1}(y)}\right)

Using the properties of logarithms,

\ln\left(\frac{1 - y}{y}\right) = -g^{-1}(y)

Therefore,

g^{-1}(y) = -\ln\left(\frac{1 - y}{y}\right)

Again using the properties of logarithms,

g^{-1}(y) = \ln\left(\frac{y}{1 - y}\right)

Hence proved.
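As a quick numerical sanity check (separate from the proofs), both identities can be verified in MATLAB; the function handles below are illustrative:

```matlab
% Numerical check of g(-z) = 1 - g(z) and g^{-1}(y) = ln(y/(1-y)).
g    = @(z) 1 ./ (1 + exp(-z));          % logistic function
ginv = @(y) log(y ./ (1 - y));           % claimed inverse (logit)

z = linspace(-5, 5, 101);
max(abs(g(-z) - (1 - g(z))))             % ~0, up to floating-point error
max(abs(ginv(g(z)) - z))                 % ~0, up to floating-point error
```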

Problem 4:

Goal:
To implement a linear logistic regression algorithm for binary classification in MATLAB using gradient descent.

Classification function:

f(x; \theta) = \left(1 + \exp(-\theta^T x)\right)^{-1}

that minimizes the empirical risk with the logistic loss:


R_{emp}(\theta) = \frac{1}{N} \sum_{i=1}^{N} (y_i - 1)\log\left(1 - f(x_i; \theta)\right) - y_i \log\left(f(x_i; \theta)\right)

R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} (1 - y_i)\log\left(1 - f(x_i; \theta)\right) + y_i \log\left(f(x_i; \theta)\right)

We know that,
f(x_i; \theta) = g(\theta^T x_i) \quad \text{and} \quad \frac{\partial}{\partial \theta} g(\theta^T x_i) = g(\theta^T x_i)\left(1 - g(\theta^T x_i)\right) x_i \qquad (1)

Therefore,
R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} (1 - y_i)\log\left(1 - g(\theta^T x_i)\right) + y_i \log\left(g(\theta^T x_i)\right)

Hence the gradient is,


\frac{\partial}{\partial \theta} R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} (1 - y_i)\left(\frac{1}{1 - g(\theta^T x_i)}\right)\left(-\frac{\partial}{\partial \theta} g(\theta^T x_i)\right) + y_i \left(\frac{1}{g(\theta^T x_i)}\right)\left(\frac{\partial}{\partial \theta} g(\theta^T x_i)\right)

\frac{\partial}{\partial \theta} R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \left(\frac{1}{g(\theta^T x_i)}\right) - (1 - y_i)\left(\frac{1}{1 - g(\theta^T x_i)}\right) \right) \left(\frac{\partial}{\partial \theta} g(\theta^T x_i)\right)

Using Equation (1):


\frac{\partial}{\partial \theta} R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i \left(\frac{1}{g(\theta^T x_i)}\right) - (1 - y_i)\left(\frac{1}{1 - g(\theta^T x_i)}\right) \right) g(\theta^T x_i)\left(1 - g(\theta^T x_i)\right) x_i

\frac{\partial}{\partial \theta} R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i\left(1 - g(\theta^T x_i)\right) - (1 - y_i)\, g(\theta^T x_i) \right) x_i

\frac{\partial}{\partial \theta} R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i - y_i\, g(\theta^T x_i) + y_i\, g(\theta^T x_i) - g(\theta^T x_i) \right) x_i

\frac{\partial}{\partial \theta} R_{emp}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left( y_i - g(\theta^T x_i) \right) x_i

We can apply batch gradient descent to obtain θ*:

\theta^{(t+1)} = \theta^{(t)} - \eta\,\frac{\partial}{\partial \theta} R_{emp}(\theta^{(t)})

Here, η is the step size; the iterations can be stopped when the descent step becomes negligible in size, i.e. smaller than a tolerance ε. Both ε and η are hyperparameters.

We can initialize θ^(0) with small random values.
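A minimal MATLAB sketch of the training loop described above, assuming X is an N-by-d design matrix (with a leading column of ones for the bias) and y is an N-by-1 vector of 0/1 labels; the variable names and the exact stopping rule are illustrative:

```matlab
% Batch gradient descent for linear logistic regression.
% Assumes X is N-by-d (first column all ones) and y is N-by-1 with labels in {0,1}.
g = @(z) 1 ./ (1 + exp(-z));              % logistic function
N = size(X,1);
theta = 0.01 * randn(size(X,2), 1);       % small random initialization of theta

eta     = 2;                              % step size (hyperparameter)
eps_tol = 0.001;                          % tolerance (hyperparameter)
iters   = 0;
while true
    grad  = -(1/N) * X' * (y - g(X*theta));   % gradient derived above
    theta = theta - eta * grad;               % batch gradient descent update
    iters = iters + 1;
    if norm(eta * grad) < eps_tol             % stop when the descent step is negligible
        break
    end
end
```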

For η = 3 and ε = 0.1, the model produces a 12.5% binary classification error (87.5% accuracy) and takes 19 iterations to converge.
For η = 2 and ε = 0.03, the model produces a 10.5% binary classification error (89.5% accuracy) and takes 161 iterations to converge.
For η = 2 and ε = 0.01, the model produces a 2% binary classification error (98% accuracy) and takes 1050 iterations to converge.
For η = 2 and ε = 0.001, the model produces a 0% binary classification error (100% accuracy) and takes 27784 iterations to converge.
This last model performs excellently, with 0% error and 100% accuracy, but it takes 27784 iterations and keeps running for many iterations after it first reaches zero error. We can therefore increase the step size and the tolerance so that the model converges right around the point where it attains zero error.

For η = 4 and ε = 0.0025, the model produces a 0% binary classification error (100% accuracy) and takes just 10,292 iterations to converge.
Here the model achieves 100% accuracy with 0% error, and it converges right after the binary classification error reaches zero, so it runs in the least amount of time.
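For completeness, a sketch of how the reported classification error and accuracy could be computed once θ is fitted, thresholding the logistic output at 0.5 (Xtest, ytest, theta, and iters are assumed to come from the sketch above):

```matlab
% Binary classification error and accuracy on a held-out set.
probs    = 1 ./ (1 + exp(-Xtest*theta));       % predicted probabilities
pred     = probs >= 0.5;                       % predicted labels in {0,1}
errRate  = mean(pred ~= ytest);                % binary classification error
accuracy = 1 - errRate;
fprintf('error = %.2f%%, accuracy = %.2f%%, iterations = %d\n', ...
        100*errRate, 100*accuracy, iters);
```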
