Problem                Full Points    Your Score
1  Multiple Choice     10
2  Short Questions     12
3  Backpropagation      9
Total                  31
The purpose of this mock exam is to give you an idea of the type of problems and the
structure of the final exam. The mock exam is not graded. The final exam will most
likely consist of 90 graded points with a total time of 90 minutes.
• For each question, you’ll receive 2 points if all boxes are answered correctly (i.e. correct
answers are checked, wrong answers are not checked) and 0 otherwise.
• If you change your mind, please fill the box: ■ (interpreted as not checked)
• If you change your mind again, please circle the box (interpreted as checked)
4. (2 points) Which of the following optimization methods use first-order momentum?
Stochastic Gradient Descent
√ Adam
RMSProp
Gauss-Newton
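For reference (an addition, not part of the exam): the first-order momentum estimate that Adam maintains, in standard notation with gradient g_t and decay rate β₁, is

\[ m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t . \]

Plain SGD applies g_t directly, and vanilla RMSProp only tracks a second-moment (squared-gradient) estimate, which is why Adam is the only option checked above.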
5. (2 points) Making your network deeper by adding more parametrized layers will always...
√ slow down training and inference speed.
reduce the training loss.
improve the performance on unseen data.
√ (Optional: make your model sound cooler when bragging about it at parties.)
Solution:
The model performs better on unseen data than on training data; this should not
happen under normal circumstances. Possible explanations:
• Training and validation data sets are not drawn from the same distribution
• An error in the implementation
• ...
2. (2 points) You're working for a cool tech startup that receives thousands of job applications
every day, so you train a neural network to automate the entire hiring process.
Your model automatically classifies candidates' resumes and, accordingly, rejects
candidates or sends them job offers. Which of the following measures is more important for
your model? Explain.
\[ \text{Recall} = \frac{\text{True Positives}}{\text{Total Positive Samples}} \qquad \text{Precision} = \frac{\text{True Positives}}{\text{Total Predicted Positive Samples}} \]
Solution:
Precision: high precision means a low rate of false positives.
False negatives are okay: since we receive "thousands of applications", it's not too bad if
we miss a few candidates even when they'd be a good fit. However, we don't want
false positives, i.e. offering jobs to candidates who are not well suited.
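As a small illustration (added here; the counts are made up), both measures computed from raw counts in Python:

    # Hypothetical confusion-matrix counts for the resume classifier.
    tp = 40   # suitable candidates who received an offer (true positives)
    fp = 5    # unsuitable candidates who received an offer (false positives)
    fn = 15   # suitable candidates who were rejected (false negatives)

    recall = tp / (tp + fn)      # fraction of all suitable candidates found
    precision = tp / (tp + fp)   # fraction of sent offers that were justified
    print(f"recall={recall:.2f}, precision={precision:.2f}")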
3. (2 points) You're training a neural network for image classification on a very large
dataset. Your friend, who studies mathematics, suggests: "If you used Newton's
method for optimization, your neural network would converge much faster than with
gradient descent!" Explain whether this statement is true (1p) and discuss potential
downsides of following the suggestion (1p).
Solution:
Faster convergence in terms of the number of iterations (the "mathematical view"). (1 pt.)
However: computing or approximating the inverse Hessian is computationally very costly
and not feasible for high-dimensional parameter spaces. (1 pt.)
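A toy numpy sketch of the trade-off (illustrative, not from the exam): on a quadratic loss, a single Newton step jumps straight to the minimum, but it requires solving a linear system with the Hessian, which scales as O(n³) in the number of parameters:

    import numpy as np

    # Quadratic toy loss f(w) = 0.5 * w^T A w - b^T w with known Hessian A.
    A = np.diag([3.0, 2.0, 1.0])
    b = np.ones(3)
    w = np.zeros(3)

    grad = A @ w - b                         # gradient at w
    w_gd = w - 0.1 * grad                    # gradient descent step: O(n) work
    w_newton = w - np.linalg.solve(A, grad)  # Newton step: O(n^3) solve

    print(w_gd)      # small move toward the minimum
    print(w_newton)  # exactly A^{-1} b, the minimum, in one step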
4. (2 points) Your colleague trained a neural network using standard stochastic gradient
descent and L2 weight regularization with four different learning rates (shown below)
and plotted the corresponding loss curves (also shown below). Unfortunately, he
forgot which curve belongs to which learning rate. Please assign each of the learning rate
values below to the curve (A/B/C/D) it probably belongs to and explain your thoughts.

learning_rates = [3e-4, 4e-1, 2e-5, 8e-3]
[Figure: training loss vs. iteration (0–140) for four curves; legend includes Curve C (green) and Curve D (orange)]
Solution:
Curve A: 4e-1 = 0.4 (Learning Rate is way too high)
Curve B: 2e-5 = 0.00002 (Learning Rate is too low)
Curve C: 8e-3 = 0.008 (Learning Rate is too high)
Curve D: 3e-4 = 0.0003 (Good Learning Rate)
Solution:
Without non-linearities, our network can only learn linear functions, because the
composition of linear functions is again linear.
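A one-line check of this fact for two stacked affine layers (standard algebra, added for illustration):

\[ W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\, x + (W_2 b_1 + b_2) = W' x + b' , \]

so any stack of linear (affine) layers collapses to a single affine map.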
6. (3 points) When implementing a neural network layer from scratch, we usually implement
a `forward` and a `backward` function for each layer. Explain what these functions
do, potential variables that they need to save, which arguments they take, and what
they return.
Solution:
Forward function:
• takes the output from the previous layer, performs the layer's operation, and returns the result (1 pt.)
• caches values needed for gradient computation during backprop (1 pt.)
Backward function:
• takes the upstream gradient and the cached values, computes the local gradients, and returns the gradient with respect to the layer's input (plus the parameter gradients, used for the update) (1 pt.)
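A minimal sketch of this pattern for an affine (fully connected) layer, assuming numpy; the function names and cache convention are illustrative, not the course's reference implementation:

    import numpy as np

    def affine_forward(x, W, b):
        # Forward: take the previous layer's output, apply the operation,
        # and cache everything the backward pass will need.
        out = x @ W + b
        cache = (x, W)
        return out, cache

    def affine_backward(dout, cache):
        # Backward: take the upstream gradient and the cache, return the
        # gradient w.r.t. the input plus the parameter gradients.
        x, W = cache
        dx = dout @ W.T        # passed further back to the previous layer
        dW = x.T @ dout        # used for the weight update
        db = dout.sum(axis=0)  # used for the bias update
        return dx, dW, db

    # Usage: batch of 4 inputs with 3 features, 2 output units.
    x, W, b = np.random.randn(4, 3), np.random.randn(3, 2), np.zeros(2)
    out, cache = affine_forward(x, W, b)
    dx, dW, db = affine_backward(np.ones_like(out), cache)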
7. (0 points) Optional: Given a Convolution Layer with 8 filters, a filter size of 6, a stride
of 2, and a padding of 1. For an input feature map of 32 × 32 × 32, what is the output
dimensionality after applying the Convolution Layer to the input?
Solution:
\[ \frac{32 - 6 + 2 \cdot 1}{2} + 1 = 14 + 1 = 15 \quad \text{(1 pt.)} \]
15 × 15 × 8 (1 pt.)
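A quick way to sanity-check such computations (hypothetical helper, assuming square inputs and filters):

    def conv_output_size(n, filter_size, stride, padding):
        # Standard formula: floor((n + 2p - f) / s) + 1 per spatial dimension.
        return (n + 2 * padding - filter_size) // stride + 1

    side = conv_output_size(32, filter_size=6, stride=2, padding=1)
    print(side, side, 8)  # 15 15 8, i.e. a 15 x 15 x 8 output volume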
[Figure: network diagram specifying the inputs (i1, i2), the weights, and the biases, including b2 and b4]
(a) (3 points) Compute the output (o1, o2) from the input (i1, i2) and the network parameters
as specified above. Write down all calculations, including intermediate layer results.
Solution:
Forward pass:
(b) (1 point) Compute the mean squared error of the output (o1, o2) calculated above
and the target (t1, t2).
Solution:
\[ MSE = \frac{1}{2}(t_1 - o_1)^2 + \frac{1}{2}(t_2 - o_2)^2 = 0.5 \times 1.0 + 0.5 \times 4.0 = 2.5 \]
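The same arithmetic as a quick check in Python (the target/output values below are hypothetical, chosen only to reproduce the squared errors from the solution):

    import numpy as np

    # (t1 - o1)^2 = 1.0 and (t2 - o2)^2 = 4.0, as in the solution above.
    t = np.array([1.0, 2.0])
    o = np.array([0.0, 0.0])
    mse = np.mean((t - o) ** 2)  # mean over two outputs = 0.5*1.0 + 0.5*4.0
    print(mse)                   # 2.5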
(c) (5 points) Update the weight w21 using gradient descent with learning rate 0.1 as
well as the loss computed previously. (Please write down all your computations.)
Solution:
\[ w_{21}^{+} = w_{21} - lr \cdot \frac{\partial MSE}{\partial w_{21}} = 0.5 - 0.1 \cdot (-1.5) = 0.65 \]
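The update as executable arithmetic (the gradient value -1.5 is taken from the solution; its backprop derivation is not repeated here):

    # Plain gradient descent update for w21 with learning rate 0.1.
    w21, lr, dmse_dw21 = 0.5, 0.1, -1.5
    w21_new = w21 - lr * dmse_dw21
    print(w21_new)  # 0.65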
Additional Space for solutions. Clearly mark the problem your answers are
related to and strike out invalid solutions.