
University of Michigan

EECS 504: Foundations of Computer Vision


Winter 2020. Instructor: Andrew Owens.

Problem Set 4: Backpropagation

Posted: Tuesday, February 4, 2020 Due: Tuesday, February 11, 2020

For Problem 4.1, please submit your written solution to Gradescope as a .pdf file.
For Problem 4.2, please submit your solution to Canvas as a notebook file (.ipynb), containing
the visualizations that we requested.

The starter code can be found at:


https://drive.google.com/open?id=1m04gqnMQXE6l0n3phzaxnViP-GVw8chA

We recommend editing and running your code in Google Colab, although you are welcome to
use your local machine instead.

Problem 4.1 Understanding backpropagation

Recall (as in Lecture 8) that we can represent a formula as a computation graph, which makes it
easier to reason about computing gradients. The following diagram is an example for the
equation $f(x, y, z) = (x + y)z$:

[Computation graph for $f(x, y, z) = (x + y)z$: $x$ and $y$ feed a $+$ node producing an intermediate value; that value and $z$ feed a $\times$ node producing $f(x, y, z)$.]

(a) In the same way, given the input $\vec{x} = [x_0, x_1]$ and $\vec{w} = [w_0, w_1, w_2]$, draw a computation graph for

$$f(\vec{x}, \vec{w}) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}.$$

Note: Please use the following operations: $+$, $\times$, $-$, $+1$, $\exp$, $\frac{1}{x}$. (1 point)

(b) Given $x = -2$, $y = 5$, $z = -4$, we can calculate the forward pass on the example diagram
given above:

[Forward pass on the example graph: $x = -2$ and $y = 5$ enter the $+$ node, giving $3$; $3$ and $z = -4$ enter the $\times$ node, giving $f(x, y, z) = -12$.]

In the same way, given $\vec{w} = [1, 3, -2]$ and $\vec{x} = [5, 8]$, calculate the forward pass of the equation
$$f(\vec{x}, \vec{w}) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}.$$
You can write these numbers directly on the diagram you drew in (a). (1 point)

(c) Recall that we can calculate the backward pass using the chain rule, which gives us $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$ as follows (let $q = x + y$):

[Backward pass on the example graph: starting from $\frac{\partial f}{\partial f} = 1$, the $\times$ node gives $\frac{\partial f}{\partial q} = -4$ and $\frac{\partial f}{\partial z} = 3$; the $+$ node then gives $\frac{\partial f}{\partial q}\frac{\partial q}{\partial x} = -4$ and $\frac{\partial f}{\partial q}\frac{\partial q}{\partial y} = -4$.]

In the same way, draw a new diagram and calculate $\frac{\partial f}{\partial \vec{w}}$, $\frac{\partial f}{\partial \vec{x}}$ on that diagram using the chain rule.

Note: You only need to write the numbers below the arrows. (1 point)
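
For reference, the forward and backward passes on this example graph can be reproduced in a few lines of Python; this sketch simply mirrors the chain-rule bookkeeping shown in the diagrams above and is not required for your written solution:

```python
# Manual forward/backward pass for the example graph f(x, y, z) = (x + y) * z,
# using the same values as above: x = -2, y = 5, z = -4.
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass via the chain rule, starting from df/df = 1
df_df = 1.0
df_dq = z * df_df    # d(q*z)/dq = z  -> -4
df_dz = q * df_df    # d(q*z)/dz = q  ->  3
df_dx = df_dq * 1.0  # dq/dx = 1      -> -4
df_dy = df_dq * 1.0  # dq/dy = 1      -> -4

print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```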

Note that there can be ambiguity in how we write the computation graph: we can choose primitives
that have simple local gradients. Notice that we define the operation $\sigma(x)$, called the
sigmoid, as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{1}$$
(d) (Optional) Show that the derivative of the sigmoid is $(1 - \sigma(x))\,\sigma(x)$. (0 points)
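
If you want a quick sanity check of the identity in (d) before (or instead of) a pencil-and-paper proof, a centered finite-difference comparison in NumPy works; this is only a sketch, and the same gradient-checking idea is handy for debugging the layers in Problem 4.2:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Compare the analytic expression (1 - sigmoid(x)) * sigmoid(x) against a
# centered finite-difference estimate of the derivative.
x = np.linspace(-5.0, 5.0, 11)
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = (1.0 - sigmoid(x)) * sigmoid(x)
print(np.max(np.abs(numeric - analytic)))  # should be on the order of 1e-10 or smaller
```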

(e) Draw a new computation graph of the equation $f(\vec{x}, \vec{w}) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$ using $\sigma(x)$ as
a node of the graph. Calculate the forward pass and backward pass, making use of the fact you
proved in (d). Comment on whether $\frac{\partial f}{\partial \vec{w}}$, $\frac{\partial f}{\partial \vec{x}}$ are the same as in (c). (1 point)

Problem 4.2 Multi-layer perceptron

In this problem, we will train a two-layer neural network to classify images. Our network
will have two layers, and a softmax layer to perform classification. We’ll train the network
to minimize a cross-entropy loss function (also known as softmax loss). The network uses a
ReLU nonlinearity after the first fully connected layer. In other words, the network has the
following architecture:
1) input, 2) fully connected layer, 3) ReLU, 4) fully connected layer, 5) softmax.

More concretely, we compute class probabilities c from an input image x as:

$$c = \mathrm{softmax}(W_2\,\mathrm{relu}(W_1 x + b_1) + b_2), \tag{2}$$

where $W_i$, $b_i$ are the parameters of the fully connected layers.

Figure 1: The CIFAR-10 dataset, which we’ll be classifying in this problem. [1]

The outputs of the second fully-connected layer are the scores for each class. You should not
use a deep learning library (e.g. PyTorch) for this problem: instead, you will implement it
from scratch.
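
As a concrete reading of equation (2) and the layer ordering above, here is a minimal NumPy sketch of the forward pass up to the class scores; the names and shapes are illustrative and not the starter code's interface:

```python
import numpy as np

def two_layer_scores(x, W1, b1, W2, b2):
    """Class scores for the network in equation (2).

    x: input batch, shape (N, D); W1: (D, H); b1: (H,); W2: (H, C); b2: (C,).
    Applying the softmax to each row of the returned (N, C) scores gives c.
    """
    hidden = x @ W1 + b1              # first fully connected layer
    hidden = np.maximum(hidden, 0.0)  # ReLU nonlinearity
    scores = hidden @ W2 + b2         # second fully connected layer: class scores
    return scores
```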

(a) Implement the fully connected, ReLU, and Softmax layers. (3 points)

Hint:

• Fully connected layer:
$$y = Wx + b \tag{3}$$

• ReLU:
$$y = \begin{cases} x, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

• Softmax:
$$y_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}} \tag{5}$$

Note: When you exponentiate even moderately large numbers in your softmax layer, the result
can overflow and numpy will return inf. To avoid these numerical issues, you
can first subtract the maximum value of the input to the softmax, exploiting the fact:

$$y_i = \frac{e^{x_i}/e^{\max(x)}}{\sum_{j=1}^{N} e^{x_j}/e^{\max(x)}} = \frac{e^{x_i - \max(x)}}{\sum_{j=1}^{N} e^{x_j - \max(x)}} \tag{6}$$
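
A minimal NumPy sketch of this trick, assuming a 2-D array of scores with one row per example (adapt the shapes to your own implementation):

```python
import numpy as np

def stable_softmax(scores):
    """Row-wise softmax using the max-subtraction trick from equation (6).

    scores: array of shape (N, C); returns probabilities of the same shape,
    with each row summing to 1.
    """
    shifted = scores - np.max(scores, axis=1, keepdims=True)  # max of each row becomes 0
    exp_scores = np.exp(shifted)                              # no overflow: exponents are <= 0
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
```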

(b) In this problem, implement the softmax classifier class. (1 point)

(c) In this problem, you need to set the model hyperparameters (hidden dim, learning rate,
lr decay, batch size). Run the given code and report the accuracy on the test set. If your method
is implemented correctly, you should obtain at least 45% accuracy on the test set. (1 point)

(d) Plot both the training and validation accuracy across iterations. (0.5 points)
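
A minimal matplotlib sketch for this plot, assuming your training loop records per-epoch accuracies in two lists (the names and values below are placeholders, not the starter code's variables):

```python
import matplotlib.pyplot as plt

# Placeholder histories; substitute whatever your training loop records.
train_acc_history = [0.30, 0.38, 0.42, 0.45]
val_acc_history = [0.29, 0.36, 0.40, 0.43]

plt.plot(train_acc_history, label='train')
plt.plot(val_acc_history, label='validation')
plt.xlabel('epoch')
plt.ylabel('classification accuracy')
plt.legend()
plt.show()
```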

Reference

[1] Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.

Acknowledgement

Part of the homework and the starter code are taken from a previous offering of EECS 442 by David Fouhey
and from CS231n at Stanford University by Fei-Fei Li, Justin Johnson, and Serena Yeung. Please
feel free to re-use our problems while similarly crediting us.
