PS 4
For Problem 4.1, please submit your written solution to Gradescope as a .pdf file.
For Problem 4.2, please submit your solution to Canvas as a notebook file (.ipynb), containing
the visualizations that we requested.
We recommend editing and running your code in Google Colab, although you are welcome to
use your local machine instead.
Recall from Lecture 8 that we can represent a formula as a computation graph, which can
make it easier to reason about computing gradients. The following diagram is an example for
the equation $f(x, y, z) = (x + y)z$:
[Computation graph: $x$ and $y$ feed a $+$ node; its output and $z$ feed a $\times$ node whose output is $f(x, y, z)$.]
(b) Given $x = -2$, $y = 5$, $z = -4$, we can run the forward pass on the example diagram
above:
[Forward pass: $x = -2$ and $y = 5$ enter the $+$ node, giving $x + y = 3$; that value and $z = -4$ enter the $\times$ node, giving $f(x, y, z) = -12$.]
In the same way, given $\vec{w} = [1, 3, -2]$ and $\vec{x} = [5, 8]$, calculate the forward pass of the equation
$$f(\vec{x}, \vec{w}) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}.$$
You can write the numbers directly on the diagram you drew in (a). (1 point)
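If you want to double-check the example numbers above (not your answer to this part), the forward pass of $f(x, y, z) = (x + y)z$ can be reproduced in a few lines of Python:

    # Sanity check of the example forward pass for f(x, y, z) = (x + y) * z.
    x, y, z = -2, 5, -4
    q = x + y      # intermediate node: q = 3
    f = q * z      # output node: f = -12
    print(q, f)    # prints: 3 -12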
(c) Recall that we can calculate the backward pass using the chain rule, which gives us
$\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$ as follows (let $q = x + y$):
[Backward pass: $\frac{\partial f}{\partial f} = 1$; at the $\times$ node, $\frac{\partial f}{\partial z} = q = 3$ and $\frac{\partial f}{\partial q} = z = -4$; through the $+$ node, $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}\frac{\partial q}{\partial x} = -4$ and $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}\frac{\partial q}{\partial y} = -4$.]
In the same way, draw a new diagram and calculate $\frac{\partial f}{\partial \vec{w}}$ and $\frac{\partial f}{\partial \vec{x}}$ on that diagram using the chain rule.
Note: you only need to write the number below each arrow. (1 point)
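Again, only for the worked example $f(x, y, z) = (x + y)z$ (not the sigmoid diagram you are asked to draw), the backward-pass values above can be checked against numerical finite differences; a minimal sketch:

    import numpy as np

    def f(x, y, z):
        return (x + y) * z

    x, y, z = -2.0, 5.0, -4.0
    q = x + y

    # Analytic gradients from the chain rule: df/dq = z, dq/dx = dq/dy = 1, df/dz = q.
    analytic = {"df/dx": z, "df/dy": z, "df/dz": q}

    # Numerical gradients via central finite differences.
    h = 1e-5
    numeric = {
        "df/dx": (f(x + h, y, z) - f(x - h, y, z)) / (2 * h),
        "df/dy": (f(x, y + h, z) - f(x, y - h, z)) / (2 * h),
        "df/dz": (f(x, y, z + h) - f(x, y, z - h)) / (2 * h),
    }
    print(analytic)  # {'df/dx': -4.0, 'df/dy': -4.0, 'df/dz': 3.0}
    print(numeric)   # approximately the same values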
Note that there can be ambiguity in how we write the computation graph: we can use primitives
that have simple local gradients. Notice that we define the operation $\sigma(x)$, called the sigmoid,
in the following way:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$
(d) (Optional) Show that the derivative of the sigmoid is $(1 - \sigma(x))\sigma(x)$. (0 points)
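If you attempt (d), one way to build confidence in the identity before deriving it is a quick numerical check; a minimal sketch (not a substitute for the derivation):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-5, 5, 11)
    h = 1e-5
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # finite-difference derivative
    analytic = sigmoid(x) * (1.0 - sigmoid(x))              # claimed closed form
    print(np.max(np.abs(numeric - analytic)))               # should be tiny (~1e-10 or smaller)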
Problem 4.2 Multi-layer perceptron
In this problem, we will train a two-layer neural network to classify images. Our network
will have two fully connected layers followed by a softmax to perform classification. We'll train the network
to minimize a cross-entropy loss function (also known as softmax loss). The network uses a
ReLU nonlinearity after the first fully connected layer. In other words, the network has the
following architecture:
1) input, 2) fully connected layer, 3) ReLU, 4) fully connected layer, 5) softmax.
Figure 1: The CIFAR-10 dataset, which we’ll be classifying in this problem. [1]
The outputs of the second fully-connected layer are the scores for each class. You should not
use a deep learning library (e.g. PyTorch) for this problem: instead, you will implement it
from scratch.
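As a rough illustration of the data flow only (the variable names and shapes below are illustrative and are not the starter code's API), the score computation of such a two-layer network might look like the following sketch; your actual implementation should follow the notebook's interfaces:

    import numpy as np

    # Illustrative shapes: N images flattened to D features, H hidden units, C classes.
    N, D, H, C = 4, 3072, 100, 10
    rng = np.random.default_rng(0)
    X = rng.standard_normal((N, D))
    W1, b1 = 0.01 * rng.standard_normal((D, H)), np.zeros(H)
    W2, b2 = 0.01 * rng.standard_normal((H, C)), np.zeros(C)

    hidden = np.maximum(0, X @ W1 + b1)   # first fully connected layer followed by ReLU
    scores = hidden @ W2 + b2             # second fully connected layer: one score per class
    print(scores.shape)                   # (4, 10); softmax (part (a)) turns these into probabilities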
(a) Implement the fully connected, ReLU, and Softmax layers. (3 points)
Hint:
• ReLU:
$$y = \begin{cases} x, & x \ge 0 \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
• Softmax:
$$y_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}} \qquad (5)$$
Note: When you exponentiate even moderately large numbers in your softmax layer, the result
can be so large that numpy returns inf. To avoid these numerical issues, you
can first subtract the maximum value of the input to the softmax, exploiting the fact that the
softmax is unchanged when a constant is subtracted from every input:
$$\frac{e^{x_i - c}}{\sum_{j=1}^{N} e^{x_j - c}} = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}} \quad \text{for any constant } c, \text{ e.g. } c = \max_j x_j.$$
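A minimal numpy sketch of this max-subtraction trick, assuming scores are stored row-wise in an N × C array (adapt the axis handling to the starter code's conventions):

    import numpy as np

    def stable_softmax(scores):
        # Shift each row by its maximum; subtracting a constant leaves the softmax unchanged.
        shifted = scores - scores.max(axis=1, keepdims=True)
        exp = np.exp(shifted)                       # all exponents are now <= 0, so no overflow
        return exp / exp.sum(axis=1, keepdims=True)

    big = np.array([[1000.0, 1001.0, 1002.0]])
    print(stable_softmax(big))   # finite probabilities; a naive exp(1000) would overflow to inf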
(c) In this part, you need to set the model hyperparameters (hidden dim, learning rate,
lr decay, batch size). Run the given code and report the accuracy on the test set. If your method
is implemented correctly, you should obtain at least 45% accuracy on the test set. (1 point)
(d) Plot both the training and validation accuracy across iterations. (0.5 points)
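A minimal matplotlib sketch, assuming you collect per-epoch accuracies in two Python lists (the variable names and values below are placeholders, not the starter code's):

    import matplotlib.pyplot as plt

    # Placeholder histories; replace with the accuracies you record during training.
    train_acc_history = [0.20, 0.31, 0.38, 0.43, 0.47]
    val_acc_history = [0.21, 0.30, 0.36, 0.40, 0.42]

    plt.plot(train_acc_history, label="train")
    plt.plot(val_acc_history, label="validation")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.title("Classification accuracy history")
    plt.show()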
Reference
[1] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images. 2009.
Acknowledgement
Part of the homework and the starter code are taken from previous offerings of EECS442 by David Fouhey
and from CS231n at Stanford University by Fei-Fei Li, Justin Johnson, and Serena Yeung. Please
feel free to re-use our problems while similarly crediting us.