
CS-437/CS-5317, EE-414/EE-517: Deep Learning

Assignment # 1

Dated: September 19, 2023

Hello Everyone!
The objective of this assignment is to deepen your understanding of gradient descent algorithms and the impact of learning rate policies on their performance. You are welcome to discuss any issues in office hours or via email.

Submission Instructions: For questions that require a written response, write the answer in comments using '#' within your code. Submit your Jupyter Notebook file as
'Assignment1_<RollNumber>.ipynb' (e.g., Assignment1_21060003.ipynb).
Submission Deadline: October 3, 2023, 11:55 PM

1 Optimization Methods for Linear Regression (CLO 3): 70 points


Model: ŷᵢ = m·xᵢ + c, where m and c are the parameters to be trained over the dataset, and ŷᵢ is the estimate of the model for input xᵢ (with true/desired output value yᵢ).

Dataset: The dataset is based on the model y = ax + b + ε, where ε is sampled from a normal distribution with mean 0 and variance σ². Dataset samples are available as {xᵢ} and {yᵢ} for i = 1, …, N, where xᵢ is the input and yᵢ is its desired/true output value.
Cost Function: J = (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)², also known as the Mean Squared Error (MSE). This is the objective we aim to minimize.

1.1 Direct Method (10 points)


Optimization methods like gradient descent can be used to minimize the cost function of linear regression. However, there also exists an analytical solution for linear regression. Given below is the formula for this analytical solution, where θ is the minimizing parameter vector, X is the matrix of input variables (with a column of ones appended so that the intercept c is learned as well), and Y is the vector of target values.

θ = (XᵀX)⁻¹XᵀY

Use this formula to compute the exact solution for the given dataset.
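A minimal NumPy sketch of this computation is given below. The synthetic dataset here (a = 2, b = 1, σ = 0.5, N = 100) is an assumption for illustration only; in your submission, apply the formula to the dataset provided in the notebook.

```python
import numpy as np

# Illustrative dataset (parameters a=2, b=1, sigma=0.5 are assumptions).
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=100)   # y = ax + b + eps

# Design matrix with a column of ones so theta = [c, m] includes the intercept.
X = np.column_stack([np.ones_like(x), x])
theta = np.linalg.inv(X.T @ X) @ X.T @ y             # normal equation
c, m = theta
```

In practice, `np.linalg.lstsq` or `np.linalg.solve` is preferred over an explicit matrix inverse for numerical stability, but the expression above mirrors the formula as stated.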

1.2 Iterative Method (60 points)


In this section, you will employ gradient descent algorithms to optimize for the solution. The full-batch gradient descent algorithm (GD) has already been implemented for you. Modify the weight-update methods and the train method in the Linear Regression class to implement the following optimizers:

(a) Stochastic Gradient Descent (SGD) - A single optimizing step is taken over each individual example, so one epoch consists of N updates. Train the model on this dataset for 50 epochs, with the learning rate set to 0.001. Keep a record of the mean loss per epoch while training. You may also tweak these parameters to get better results.
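The per-example inner loop might be sketched as follows (the standalone variables m, c and the inline dataset are assumptions for illustration; in the assignment this logic belongs inside the Linear Regression class):

```python
import numpy as np

# Illustrative dataset (a=2, b=1, sigma=0.5 assumed).
rng = np.random.default_rng(1)
x = rng.uniform(-5, 5, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 100)

m, c, lr = 0.0, 0.0, 0.001
losses = []
for epoch in range(50):
    order = rng.permutation(len(x))          # shuffle each epoch
    epoch_loss = 0.0
    for i in order:                          # one update per example
        err = y[i] - (m * x[i] + c)          # residual for this sample
        m += lr * 2 * err * x[i]             # from dJ/dm = -2*err*x
        c += lr * 2 * err                    # from dJ/dc = -2*err
        epoch_loss += err ** 2
    losses.append(epoch_loss / len(x))       # mean loss per epoch
```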

(b) Mini-batch SGD - A single optimizing step encompasses a subset (mini-batch) of the whole dataset. This is midway between iterating over one sample at a time and iterating over the complete dataset in one go, offering the best of both worlds. Try out mini-batch sizes of 5, 10, and 20, and show results for each case. Comment on which batch size gives the lowest error. How does this method compare to SGD and GD?
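One way to organize the mini-batch comparison is sketched below; the helper name `train_minibatch` and the inline dataset are assumptions, not the provided class interface:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-5, 5, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 100)   # assumed dataset

def train_minibatch(batch_size, epochs=50, lr=0.001):
    m, c = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(x))
        epoch_loss = 0.0
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            err = y[idx] - (m * x[idx] + c)       # residuals for the batch
            m += lr * 2 * np.mean(err * x[idx])   # averaged gradient step
            c += lr * 2 * np.mean(err)
            epoch_loss += np.sum(err ** 2)
        losses.append(epoch_loss / len(x))
    return m, c, losses

results = {b: train_minibatch(b) for b in (5, 10, 20)}
```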
(c) SGD Nesterov Momentum - This performs the optimization step over each example in each iteration.
Train the model with a learning rate of 0.001 and momentum of 0.5 for 50 epochs. You can adjust
the parameters to get better results.
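The Nesterov update evaluates the gradient at a look-ahead point before applying the momentum step. A sketch for the parameter vector θ = [m, c] (names and the inline dataset are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-5, 5, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 100)    # assumed dataset

theta = np.zeros(2)                  # [m, c]
v = np.zeros(2)                      # velocity
lr, mu = 0.001, 0.5
for _ in range(50):
    for i in rng.permutation(len(x)):
        look = theta + mu * v                         # look-ahead point
        err = y[i] - (look[0] * x[i] + look[1])       # residual at look-ahead
        grad = np.array([-2 * err * x[i], -2 * err])  # gradient of squared error
        v = mu * v - lr * grad                        # update velocity
        theta = theta + v                             # apply step
m, c = theta
```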

(d) Adaptive Moment Estimation (Adam) - This performs the optimization step over the entire dataset per iteration. Train the model with a learning rate of 0.5 for 50 epochs. You can adjust the parameters to get better results.
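A full-batch Adam sketch is given below; the β₁ = 0.9, β₂ = 0.999, ε = 1e-8 values are the common defaults and are assumed here, as is the inline dataset:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-5, 5, 100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, 100)    # assumed dataset

theta = np.zeros(2)                               # [m, c]
mom, vel = np.zeros(2), np.zeros(2)               # first/second moment estimates
lr, b1, b2, eps = 0.5, 0.9, 0.999, 1e-8
for t in range(1, 51):                            # 50 full-batch steps
    err = y - (theta[0] * x + theta[1])
    grad = np.array([-2 * np.mean(err * x), -2 * np.mean(err)])
    mom = b1 * mom + (1 - b1) * grad              # first moment (momentum)
    vel = b2 * vel + (1 - b2) * grad ** 2         # second moment (scale)
    m_hat = mom / (1 - b1 ** t)                   # bias correction
    v_hat = vel / (1 - b2 ** t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
final_loss = np.mean((y - (theta[0] * x + theta[1])) ** 2)
```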

After training, plot the 'mean loss per epoch' versus the 'number of epochs' for each optimizer for comparison. This will give more insight into how training progressed with each epoch.
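A matplotlib sketch for the comparison plot; the `histories` dictionary here holds made-up placeholder values and should be replaced with the per-epoch loss lists you recorded during training:

```python
import os
import matplotlib
matplotlib.use("Agg")                      # headless backend; only saves a file
import matplotlib.pyplot as plt

# Placeholder loss histories (values are illustrative, not real results).
histories = {"GD": [5.0, 2.0, 1.2, 1.0], "SGD": [4.0, 1.8, 1.0, 0.8]}
for name, losses in histories.items():
    plt.plot(range(1, len(losses) + 1), losses, label=name)
plt.xlabel("Epoch")
plt.ylabel("Mean loss per epoch")
plt.legend()
plt.savefig("loss_comparison.png")
saved = os.path.exists("loss_comparison.png")
```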

Info: The widely used (but not strictly followed) definition of an epoch is the duration in which the algorithm completes one pass over the complete dataset. Thus, in the above case, after SGD has iterated over each sample one by one and there are no samples left, we say that one epoch has passed. After this, the algorithm iterates over the samples again, and we can keep doing this until a certain criterion is met, which in our case is the total number of epochs (50).

2 Learning Rate Decay (CLO 3): 30 points


In this section, you will add the learning rate decay functionality to the Linear Regression class from
Part 1 to allow for the following policies:

2.1 Constant Learning Rate


The equation given below shows that at each iteration the learning rate remains constant. Note that
η(0) is the initial learning rate while η(t) is the learning rate at the tth iteration. You can show its
implementation along with any optimizer of your choice from Part 1.

η(t) = η(0), ∀t ∈ [0, N ]

2.2 Auto-reduce Learning Rate


The idea is based on reducing the learning rate when the validation performance plateaus or diverges. When the network parameters have approached the vicinity of a local minimum in the loss landscape and further training does not improve validation performance, the learning rate is reduced.

In this case, at each iteration, the learning rate is scaled by γᵅ, where γ ∈ (0, 1) is the decay rate and α is the current epoch. An additional parameter p, called 'patience' (the number of epochs to wait for an improvement in validation performance before reducing the learning rate), is also used. Experiment with this strategy using any optimizer of your choice. Explain how this strategy affects convergence compared to your results in Part 1.

η(t) = η(0)γᵅ, ∀t ∈ [0, N]
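The patience logic might be sketched as below. The function name, the improvement tolerance, and the multiplicative decay on each plateau are assumptions for illustration; adapt the details to match the γᵅ schedule and your class interface:

```python
# Sketch of plateau-based learning rate reduction (names are assumptions).
def reduce_on_plateau(val_losses, lr0=0.1, gamma=0.5, patience=3):
    lr, best, wait = lr0, float("inf"), 0
    schedule = []
    for loss in val_losses:                 # one entry per epoch
        if loss < best - 1e-12:             # improvement resets the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:            # plateau detected
                lr *= gamma                 # decay the learning rate
                wait = 0
        schedule.append(lr)
    return schedule

# After three non-improving epochs (the patience), the rate is halved.
sched = reduce_on_plateau([1.0, 0.8, 0.8, 0.8, 0.8, 0.7], patience=3)
```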

2.3 Polynomial Decay


Implement the polynomial decay schedule, in which η(0) > η(N), to non-linearly decrease the learning rate between an upper and lower bound. Note that the value of the decay rate γ must be greater than zero.

η(t) = [η(N) − η(0)](t/N)^γ + η(0), ∀t ∈ [0, N]

Implement this policy with SGD Nesterov Momentum and explain your observations and the potential reasons for the optimizer's behaviour under the given policy.
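The schedule itself is a one-line function; a sketch (parameter names and the example η(0) = 0.1, η(N) = 0.001, γ = 2 values are assumptions) follows:

```python
# Polynomial decay: eta(t) = [eta(N) - eta(0)] * (t / N) ** gamma + eta(0).
# eta0/eta_n/gamma values here are illustrative assumptions.
def poly_decay(t, n_total, eta0=0.1, eta_n=0.001, gamma=2.0):
    return (eta_n - eta0) * (t / n_total) ** gamma + eta0

# The rate starts at eta0, ends at eta_n, and decreases monotonically.
rates = [poly_decay(t, 50) for t in range(51)]
```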

Thought-provoking practice questions


These will not be graded.
1. What difference is observed if the data is noisy? This can be explored by increasing the value of the standard deviation (or variance) of the distribution in the dataset generator.

2. There are other learning rate policies that can be employed, such as cyclical learning rates and the cosine annealing schedule. Further details are present in this paper. Implement any one of them and observe any difference in your results.
3. Explore other optimizers such as Adagrad, RMSprop, etc. This link discusses other optimizers in detail. Implement any one of them and observe any difference(s) in your results.
