
CS5242 Neural Networks and Deep Learning

Quiz 1

Name ___________________ Student Number __A___________ NUSNET ID __E_____________

1. Which statement is correct?


A. Deep learning is a killer for all AI tasks.
B. One of the differences between expert systems and machine learning models is that the
latter learn from data, whereas expert systems typically consist of rules designed by
experts.
C. Deep learning models use a cascade of multiple layers of linear processing units for feature
extraction and transformation.
D. Neural networks model how the brain and neurons work.

2. Which statement is correct?


A. For a classification model, the cross-entropy loss function measures the difference between
the prediction and the ground truth; therefore, it is equal to the accuracy of the model.
B. Softmax and cross-entropy are usually used in multi-class, multi-label classification problems.
C. Gradient descent algorithms are usually applied to tune the model parameters and
hyper-parameters.
D. f(x) = ReLU(x) + 0.01x can be used as an activation function in an MLP.

3. Which statement is correct?


A. GD optimizes the average loss over all training data; therefore, it is always faster (in terms of
training time) than the SGD algorithm.
B. SGD has a faster convergence rate than mini-batch SGD since it has a lower chance of getting
stuck in a local optimum.
C. When we use Adam, we may need to decay the learning rate to get a smaller loss.
D. Mini-batch SGD is faster than SGD per iteration since it can fully utilize the computing units
of the GPU.

4. Which statement is correct?

A. The BP algorithm updates the parameters based on the gradients of the average loss w.r.t.
the parameters.
B. To add a new operator in a neural network, we need to implement the forward and
backward methods for the BP algorithm to call (see the sketch after this question).
C. For chain rule I, suppose there is an operator $y = f(x)$, where $x$ and $y$ could be scalars or
vectors or matrices. If the gradient of the final output $o$ w.r.t. $y$ is known, then
$\frac{\partial o}{\partial x} = \frac{\partial o}{\partial y} \frac{\partial y}{\partial x}$.
D. For chain rule I, suppose the variables $v_1, v_2, \dots, v_k$ ($k > 3$) follow the topological
order; to compute $\frac{\partial v_k}{\partial v_1}$, we have to compute
$\frac{\partial v_{k-1}}{\partial v_1}$ explicitly.
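For reference, a minimal sketch of the forward/backward interface described in option B, in Python with NumPy (the class and method names are illustrative, not from the course materials):

```python
import numpy as np

# A new operator exposes forward() and backward(); the BP algorithm calls
# backward() with the upstream gradient dL/dy and receives the gradients
# w.r.t. the operator's inputs.
class MatMul:
    def forward(self, W, x):
        self.W, self.x = W, x        # cache inputs for the backward pass
        return W @ x                 # y = Wx

    def backward(self, grad_y):
        # Chain rule: dL/dW = (dL/dy) x^T and dL/dx = W^T (dL/dy)
        return np.outer(grad_y, self.x), self.W.T @ grad_y

op = MatMul()
y = op.forward(np.ones((5, 4)), np.arange(4.0))  # y has shape (5,)
dW, dx = op.backward(np.ones(5))                 # dW: (5, 4), dx: (4,)
```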

5. Which statement is correct?


A. The number of layers in an MLP model is a hyper-parameter, whereas the mini-batch size is
not a hyper-parameter.
B. Underfitting is better than overfitting.
C. When the training error is high, we say the model is overfitting.
D. When the model capacity increases, the bias reduces, and the variance increases gradually.

6. Which statement is correct?

A. Adding the squared L2 norm of the parameters into the loss function prevents the model
parameters from becoming large; therefore, it regularizes the model and reduces the chance
of overfitting.
B. If we do not initialize the parameters randomly in an MLP model, then all parameters will be
the same throughout the training process.
C. The Adam algorithm always converges to better optima than mini-batch SGD with
momentum.
D. Early stopping cannot regularize the model since it does not change the model to add any
constraint.

7. Suppose we are applying SGD to train a model with two parameters $\mathbf{w} \in R^2$, and squared L2
norm regularization ($\frac{\lambda}{2}\|\mathbf{w}\|^2$) is also applied. Given the learning rate 0.5, the parameter values
as $\mathbf{w} = (1, 2)^T$, the gradient of the cross-entropy loss w.r.t. the parameters as $(2, -2)^T$, and the
coefficient $\lambda = 1$; after one step of updating, $\mathbf{w} = (-0.5, 2)^T$. (2 points)
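For reference, the regularized gradient is $(2, -2)^T + \lambda \mathbf{w} = (2, -2)^T + (1, 2)^T = (3, 0)^T$, so one SGD step gives $\mathbf{w} = (1, 2)^T - 0.5 \cdot (3, 0)^T = (-0.5, 2)^T$.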

8. Suppose $y = \mathbf{z}^T \mathbf{U}\,\mathrm{softmax}(\mathbf{W}\mathbf{x})$, $\mathbf{z} \in R^3$, $\mathbf{U} \in R^{3 \times 5}$, $\mathbf{W} \in R^{5 \times 4}$, $\mathbf{x} \in R^4$; the shape of
$\frac{\partial y}{\partial \mathbf{W}}$ is (5, 4).
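Since $y$ is a scalar, $\frac{\partial y}{\partial \mathbf{W}}$ has the same shape as $\mathbf{W}$. A quick autograd check, sketched in Python with PyTorch (the values are random placeholders; only the shapes come from the question):

```python
import torch

# Shapes from the question: z in R^3, U in R^{3x5}, W in R^{5x4}, x in R^4.
z = torch.randn(3)
U = torch.randn(3, 5)
W = torch.randn(5, 4, requires_grad=True)
x = torch.randn(4)

y = z @ U @ torch.softmax(W @ x, dim=0)  # y is a scalar
y.backward()
print(W.grad.shape)                      # torch.Size([5, 4])
```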

9. Suppose the Leaky ReLU is $f(x) = 0.01x$ if $x < 0$; $x$, otherwise. Given $\mathbf{x} = (2, -1)^T$, and the output
is $\mathbf{h}$; if the gradient of the loss w.r.t. $\mathbf{h}$ is $\frac{\partial L}{\partial \mathbf{h}} = (1, 1)^T$, then $\frac{\partial L}{\partial \mathbf{x}} = (1, 0.01)^T$.
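For reference, the Leaky ReLU acts elementwise, so $\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial h_i} \cdot f'(x_i)$, where $f'(x) = 1$ for $x \ge 0$ and $f'(x) = 0.01$ for $x < 0$; here $f'(2) = 1$ and $f'(-1) = 0.01$, giving $(1 \cdot 1, 1 \cdot 0.01)^T = (1, 0.01)^T$.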
