HW 3
In the previous homework, we proved that the gradient descent (GD) algorithm converges and derived its convergence rate. In this homework, we will add a momentum term and study how it affects the convergence rate. The optimization procedure of gradient descent with momentum is given below:
$$
\begin{aligned}
w_{t+1} &= w_t - \eta z_{t+1} \\
z_{t+1} &= (1 - \beta) z_t + \beta g_t,
\end{aligned} \tag{2}
$$
where $g_t = \nabla L(w_t)$, $\eta$ is the learning rate, and $\beta$ controls how much averaging we apply to the gradient. Note that when $\beta = 1$, the above procedure is just ordinary gradient descent.
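For concreteness, here is a minimal NumPy sketch of the procedure in Eq. 2; the function name gd_with_momentum and its arguments are illustrative placeholders rather than anything defined in this homework.

import numpy as np

def gd_with_momentum(grad, w0, eta, beta, num_steps):
    """Run the averaged-gradient update of Eq. 2 (illustrative sketch).

    grad : callable returning the gradient of L at a given w
    w0   : initial parameter vector
    eta  : learning rate
    beta : gradient-averaging coefficient (beta = 1 recovers plain GD)
    """
    w = np.asarray(w0, dtype=float)
    z = np.zeros_like(w)                 # running average of gradients
    for _ in range(num_steps):
        g = grad(w)                      # g_t = grad L(w_t)
        z = (1 - beta) * z + beta * g    # z_{t+1} = (1 - beta) z_t + beta g_t
        w = w - eta * z                  # w_{t+1} = w_t - eta z_{t+1}
    return w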
Let's investigate the effect of this change. We'll see that this modification can actually 'accelerate' convergence by allowing larger learning rates. Recall the least-squares setting from the previous homework, where $L(w) = \|Xw - y\|^2$: the gradient descent iteration and the optimum are
$$
w_{t+1} = \left(I - 2\eta X^T X\right) w_t + 2\eta X^T y \tag{3}
$$
$$
w^* = (X^T X)^{-1} X^T y \tag{4}
$$
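One way to see why this iteration converges geometrically: since Eq. 4 implies $X^T y = X^T X w^*$, subtracting $w^*$ from both sides of Eq. 3 gives the error recursion
$$
w_{t+1} - w^* = \left(I - 2\eta X^T X\right) w_t + 2\eta X^T X w^* - w^* = \left(I - 2\eta X^T X\right)\left(w_t - w^*\right),
$$
so the error is multiplied by the same matrix at every step.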
The geometric convergence rate (in the sense of what base appears in a bound of the form $\text{rate}^t$) of this procedure is
You saw on the last homework that if we choose the learning rate that maximizes Eq. 5, the optimal learning rate $\eta^*$ is
$$
\eta^* = \frac{1}{\sigma_{\min}^2 + \sigma_{\max}^2}, \tag{6}
$$
where $\sigma_{\max}$ and $\sigma_{\min}$ are the maximum and minimum singular values of the matrix $X$. The corresponding optimal rate is
$$
\text{optimal rate} = \frac{(\sigma_{\max}/\sigma_{\min})^2 - 1}{(\sigma_{\max}/\sigma_{\min})^2 + 1}. \tag{7}
$$
Therefore, how fast ordinary gradient descent converges is determined by the ratio between the maximum and minimum singular values, as shown above.
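As a quick consistency check of Eqs. 6 and 7 (using the error recursion above): in the basis of right singular vectors of $X$, the error component along the $i$-th direction is multiplied by $(1 - 2\eta\sigma_i^2)$ at every step, and substituting $\eta^*$ from Eq. 6 makes the extreme directions contract at the same worst-case rate,
$$
\left|1 - \frac{2\sigma_{\min}^2}{\sigma_{\min}^2 + \sigma_{\max}^2}\right| = \left|1 - \frac{2\sigma_{\max}^2}{\sigma_{\min}^2 + \sigma_{\max}^2}\right| = \frac{\sigma_{\max}^2 - \sigma_{\min}^2}{\sigma_{\max}^2 + \sigma_{\min}^2} = \frac{(\sigma_{\max}/\sigma_{\min})^2 - 1}{(\sigma_{\max}/\sigma_{\min})^2 + 1},
$$
which matches Eq. 7.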
Now, let's consider using momentum to smooth the gradients before taking a step, as in Eq. 2:
$$
\begin{aligned}
w_{t+1} &= w_t - \eta z_{t+1} \\
z_{t+1} &= (1 - \beta) z_t + \beta\left(2 X^T X w_t - 2 X^T y\right)
\end{aligned} \tag{8}
$$
We can use the SVD of the matrix $X = U \Sigma V^T$, where $\Sigma = \operatorname{diag}(\sigma_{\max}, \sigma_2, \ldots, \sigma_{\min})$ has the same (potentially rectangular) shape as $X$. This allows us to reparameterize the parameters $w_t$ and the averaged gradients $z_t$ as follows:
$$
\begin{aligned}
x_t &= V^T (w_t - w^*) \\
a_t &= V^T z_t.
\end{aligned} \tag{9}
$$
(a) Please rewrite Eq. 8 in terms of the reparameterized variables $x_t[i]$ and $a_t[i]$, where $x_t[i]$ and $a_t[i]$ are the $i$-th components of $x_t$ and $a_t$, respectively.
(b) Notice that the above $2 \times 2$ vector/matrix recurrence has no external input. We can derive the $2 \times 2$ system matrix $R_i$ from above such that
$$
\begin{bmatrix} a_{t+1}[i] \\ x_{t+1}[i] \end{bmatrix} = R_i \begin{bmatrix} a_t[i] \\ x_t[i] \end{bmatrix}. \tag{10}
$$
Derive $R_i$.
(c) Use a computer to symbolically find the eigenvalues of the matrix $R_i$ (a SymPy sketch is given after part (g) below).
When are they purely real? When are they repeated and purely real? When are they complex?
(d) For the case when the eigenvalues are repeated, what is the condition on $\eta$, $\beta$, $\sigma_i$ that keeps them stable (strictly inside the unit circle)? What is the highest learning rate $\eta$, as a function of $\beta$ and $\sigma_i$, that results in repeated eigenvalues?
(e) For the case when the eigenvalues are real, what is the condition on $\eta$, $\beta$, $\sigma_i$ that keeps them stable (strictly inside the unit circle)? What is the upper bound on the learning rate? Express it in terms of $\beta$ and $\sigma_i$.
(f) For the case when the eigenvalues are complex, what is the condition on $\eta$, $\beta$, $\sigma_i$ that keeps them stable (strictly inside the unit circle)? What is the highest learning rate $\eta$, as a function of $\beta$ and $\sigma_i$, that results in complex eigenvalues?
(g) Now, apply what you have learned to the following problem. Assume that $\beta = 0.1$ and we have a problem with two singular values satisfying $\sigma_{\max}^2 = 5$ and $\sigma_{\min}^2 = 0.05$. What learning rate $\eta$ should we choose to get the fastest convergence for gradient descent with momentum? Compare how many iterations it takes to get within 99.9% of the optimal solution (starting at 0) using this learning rate and momentum with how many it would take using ordinary gradient descent.
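For part (c), here is a minimal SymPy sketch of one way to set up the symbolic computation. The entries r11 through r22 are placeholders for the $R_i$ you derive in part (b), and the symbols eta, beta, sigma stand for $\eta$, $\beta$, $\sigma_i$; none of these names come from the homework itself.

import sympy as sp

# Symbols you will need once you substitute your R_i:
# learning rate, momentum coefficient, and the i-th singular value.
eta, beta, sigma = sp.symbols('eta beta sigma', positive=True)

# Placeholder entries: replace r11, ..., r22 with your derived R_i.
r11, r12, r21, r22 = sp.symbols('r11 r12 r21 r22')
R = sp.Matrix([[r11, r12],
               [r21, r22]])

# Eigenvalues are returned as a {eigenvalue: multiplicity} dictionary.
for val, mult in R.eigenvals().items():
    print(sp.simplify(val), '(multiplicity', mult, ')')

# Real vs. repeated vs. complex is governed by the sign of the discriminant
# of the characteristic polynomial, trace(R)^2 - 4*det(R).
disc = sp.simplify(R.trace()**2 - 4 * R.det())
print('discriminant:', disc)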
where $x$ is the input signal and $h$ is the impulse response (also referred to as the filter). Please note that the convolution operation is to 'flip and drag': the filter is flipped before being slid across the input. For neural networks, however, we simply implement the convolutional layer without flipping, and such an operation is called correlation. Interestingly, in a CNN the two operations are equivalent in practice, because the filter weights are learned (initialized and then updated during training): even if you implement 'true' convolution, you just end up learning the flipped kernel. In this question, we will follow the true (flip-and-drag) convolution definition.
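To illustrate the flip (the signals x and h below are arbitrary examples chosen for this note, not part of the assignment), correlating with a time-reversed kernel reproduces true convolution:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])     # example input signal
h = np.array([1.0, 0.0, -1.0])         # example filter / impulse response

conv = np.convolve(x, h)                        # true 'flip and drag' convolution
corr = np.correlate(x, h[::-1], mode='full')    # correlation with the flipped kernel

print(np.allclose(conv, corr))   # True: the two operations agree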
Now let's consider a rectangular signal of length L (sometimes also called the "rect" for short, or, alternatively, the "boxcar" signal). This signal is defined as:
$$
x(n) = \begin{cases} 1 & n = 0, 1, 2, \ldots, L-1 \\ 0 & \text{otherwise} \end{cases}
$$
Here’s an example plot for L = 7, with time indices shown from -2 to 8 (so some implicit zeros are shown):
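The plot itself is not reproduced here, but a minimal NumPy/Matplotlib sketch (an editorial illustration, not part of the original handout) that generates it for L = 7 over indices −2 through 8 is:

import numpy as np
import matplotlib.pyplot as plt

L = 7
n = np.arange(-2, 9)                          # time indices -2, ..., 8
x = np.where((n >= 0) & (n <= L - 1), 1, 0)   # rect / boxcar of length L

plt.stem(n, x)
plt.xlabel('n')
plt.ylabel('x(n)')
plt.title('Rect (boxcar) signal, L = 7')
plt.show()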
3. Feature Dimensions of Convolutional Neural Networks
In this problem, we compute the output feature shape of convolutional layers and pooling layers, which are the building blocks of a CNN. (A small PyTorch sanity-check sketch is given after part (d) below.) Let's assume that the input feature shape is W × H × C, where W is the width, H is the height, and C is the number of channels of the input feature.
(a) A convolutional layer has 4 hyperparameters: the filter size (K), the padding size (P), the stride step size (S), and the number of filters (F). How many weights and biases are in this convolutional layer? And what is the shape of the output feature that this convolutional layer produces?
(b) A pooling layer has 2 hyperparameters: the stride step size (S) and the filter size (K). What is the output feature shape that this pooling layer produces?
(c) Let's assume that we have a CNN model which consists of L successive convolutional layers, where the filter size is K and the stride step size is 1 for every convolutional layer. What is the receptive field size?
(d) Consider a downsampling layer (e.g., a pooling layer or a strided convolutional layer). In this problem, we investigate the pros and cons of downsampling layers. Such a layer reduces the output feature resolution, which implies that the output features lose a certain amount of spatial information. Therefore, when we design a CNN, we usually increase the number of channels to compensate for this loss. For example, if we apply a max pooling layer with kernel size 2 and stride 2, we increase the number of output channels by a factor of 2. If we apply this max pooling layer, by how much does the receptive field increase? Explain the advantage of decreasing the output feature resolution from the perspective of reducing the amount of computation.
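As noted in the introduction to this problem, here is a minimal PyTorch sanity-check sketch with arbitrary example hyperparameters (an editorial illustration, not part of the assignment); it lets you compare your shape and parameter-count answers against actual tensors.

import torch
import torch.nn as nn

# Arbitrary example: input with C = 3 channels, H = W = 32.
x = torch.randn(1, 3, 32, 32)            # (batch, C, H, W)

# Example hyperparameters: K = 5, P = 2, S = 1, F = 8 filters.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5, stride=1, padding=2)
pool = nn.MaxPool2d(kernel_size=2, stride=2)   # K = 2, S = 2

y = conv(x)
z = pool(y)

print(y.shape)                             # output feature shape of the conv layer
print(z.shape)                             # output feature shape of the pooling layer
print(conv.weight.shape, conv.bias.shape)  # weight and bias shapes
print(sum(p.numel() for p in conv.parameters()))  # total parameter count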
(a) Implement the forward operation of the convolutional layer and the max pooling layer.
(b) Implement the three-layer ConvNet.
(c) Implement the forward operation of the spatial batch normalization layer.
(a) What sources (if any) did you use as you worked through the homework?
(b) If you worked with someone on this homework, who did you work with?
List names and student IDs. (In the case of a homework party, you can also just describe the group.)
(c) Roughly how many total hours did you work on this homework? Write it down here; you'll need to remember it for the self-grade form.
Contributors:
• Suhong Moon.
• Gabriel Goh.
• Anant Sahai.
• Dominic Carrano.
• Babak Ayazifar.
• Sukrit Arora.
• Fei-Fei Li.
• Sheng Shen.
• Jake Austin.
• Kevin Li.