DL Quiz1
Beginner: 6 Marks
1. How can regression and density estimation models be used for classification? .5 Marks
A regression model can output class scores that are thresholded (or arg-maxed) for classification, and a density estimation model uses Bayes' rule to obtain class posteriors.
2. Assume that the input X to some scalar function f(.) is an n×m matrix. What is the dimensionality of the gradient
of f with respect to X? .5 Marks
Same as X, i.e., n × m, because each element of this matrix is the partial derivative of f with respect to the
corresponding element of X.
3. Which one is more powerful: a two-layer neural network without any activation function, or a two-layer binary
decision tree? Why? .5 Marks
NN. In a BDT, the number of decision nodes at depth 2 is 4 and the number of leaf nodes is 8.
4. What are the different types of learning methods in ML/DL? 1 Marks
Lecture 1, slide 11
5. Your binary classification network is ŷ = σ(ReLU(z)), where the predicted label of an input is chosen to be 1 when
ŷ ≥ 0.5 and 0 otherwise. What will happen to this network while training? 1 Marks
All samples will be labelled positive: since ReLU(z) ≥ 0 and σ(0) = 0.5, we always have ŷ = σ(ReLU(z)) ≥ 0.5.
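As a quick sanity check, a minimal PyTorch sketch (the range of pre-activations is arbitrary):

```python
import torch

# sigma(ReLU(z)) >= sigma(0) = 0.5 for every z, so the 0.5 threshold labels everything positive.
z = torch.linspace(-10, 10, steps=1001)
y_hat = torch.sigmoid(torch.relu(z))
print(y_hat.min().item())   # 0.5 -- never below the decision threshold
```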
6. What happens to the receptive field of a 1D convolutional network as more layers are added? .5 Marks
(a) It stays the same
(b) It decreases exponentially
(c) It increases linearly
(d) It increases non-linearly
7. Which of the following is true about dropout? .5 Marks
(a) Dropout leads to sparsity in the trained weights
(b) At test time, dropout is applied with inverted keep probability
(c) The larger the keep probability of a layer, the stronger the regularization of the weights in that layer
(d) None of the above
8. Can we remove the bias parameter from the fully-connected layer and the convolutional layer before the batch
normalization? 1 Marks
Mathematically, yes: the mean-subtraction step in BN cancels the bias, and BN itself has a learnable shift (bias) parameter.
However, if BN is applied after the activation function, then not always.
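For the "mathematically yes" case, a minimal PyTorch sketch (channel sizes are arbitrary) of the usual Conv → BN pattern with the redundant convolution bias removed:

```python
import torch.nn as nn

# The conv bias is redundant here: BN subtracts the per-channel mean (absorbing any
# constant offset) and then adds its own learnable shift parameter (beta).
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)
```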
9. According to the Universal Approximation Theorem, which type of function can a sufficiently wide feedforward
neural network approximate on a compact subset of Rⁿ? .5 Marks
(a) Only smooth functions
(b) Only polynomial functions
(c) Any continuous function
(d) Only functions with bounded derivatives
Intermediate: 5 Marks
1. Which of these methods will use less CPU RAM: (a) loading the model from individual weight files, or (b) sequential
model loading with the meta device? .5 Marks
(a) Loading from individual weight files.
2. What are the two broad ways to reduce/avoid overfitting? (Hint: think about function approximation in the ERM
framework.) .5 Marks
Reducing the hypothesis space (fewer candidate functions, e.g., a simpler DNN with fewer modules/layers); making the choice of f* less
dependent on the data (penalty on coefficients, margin maximization, ensemble methods).
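As one concrete instance of the second route, a minimal PyTorch sketch (model and hyperparameters are arbitrary) where weight decay puts a penalty on the coefficients:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # any model
# weight_decay adds an L2 penalty on the coefficients, making the chosen f* less
# dependent on the particular training sample.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
```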
3. Which of the following would you consider to be valid activation functions (elementwise non-linearities) to train
a neural net in practice, and why? .5 Marks
(a) ϕ(x) = − min(2, x)
(b) ϕ(x) = 0.9x² + 1
(c) ϕ(x) = min(x, 0.1x) if x ≥ 0; min(x, 0.1x) if x < 0
(d) ϕ(x) = max(x, 0.1x) if x ≥ 0; min(x, 0.1x) if x < 0
(a), (b), (c). (d) is a linear function (it reduces to ϕ(x) = x), and is therefore useless as an activation.
4. You are training a deep MLP (100 layers) on a binary classification task, using a sigmoid activation in the final
layer and a mixture of tanh and ReLU activations for all other layers. You notice weights to a subset of layers
stop updating after the first epoch of training, even though your network has not yet converged. Why is this
happening? Explain which of the options below might help. 2 Marks
(a) Increase the size of your training set
(b) Switch the ReLU activations with leaky ReLUs everywhere
(c) Add Batch Normalization before every activation
(d) Increase the learning rate
(b), (c).
This is the classic vanishing-gradient problem. Increasing the size of the training set (a) doesn't help, as the issue lies with the
learning dynamics of the network. Varying the learning rate (d) might help the network learn faster, but as the
problem states, the gradients to specific layers almost completely go to zero, so the issue seems to be localized
to those layers. (b) solves the problem of dying ReLUs by passing some gradient signal back through all ReLU
layers. (c) Adding BatchNorm before every activation ensures the tanh layers have inputs distributed closer
to the linear region of the activation, so the elementwise derivative across the layer evaluates closer to 1.
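A minimal PyTorch sketch of what (b) and (c) look like in code (layer widths are arbitrary):

```python
import torch.nn as nn

# (c) BatchNorm before each activation keeps tanh inputs near its linear region;
# (b) LeakyReLU passes a small gradient even for negative pre-activations.
block = nn.Sequential(
    nn.Linear(256, 256),
    nn.BatchNorm1d(256),
    nn.Tanh(),
    nn.Linear(256, 256),
    nn.BatchNorm1d(256),
    nn.LeakyReLU(negative_slope=0.01),
)
```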
5. An input of size [2 × 3 × 256 × 256] is passed through a 2D convolution layer with output channels (12), kernel size
(5,5), dilation (2,2), padding=valid, stride (1,3). Find the output shape. 1.5 Marks
(2 × 12 × 248 × 83). Using the output-size formula from PyTorch's Conv2d documentation: H_out = ⌊(256 − 2·(5−1) − 1)/1⌋ + 1 = 248 and W_out = ⌊(256 − 2·(5−1) − 1)/3⌋ + 1 = 83.
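The shape can also be checked directly in PyTorch (a sketch mirroring the question's hyperparameters):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=12, kernel_size=(5, 5),
                 dilation=(2, 2), padding=0, stride=(1, 3))  # padding=0 is "valid"
x = torch.randn(2, 3, 256, 256)
print(conv(x).shape)  # torch.Size([2, 12, 248, 83])
```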
Advanced: 4 Marks
1. After the first DL assignment, your friend Ram concludes that Dropout and Batch Normalization (BN) often lead
to a worse performance when they are combined together for CNNs (including ResNets). But your friend Sita
argues against it by showing a counter-example of Wide-ResNet (WRN). Is Ram’s conclusion wrong? Explain.
Dropout shifts the variance of a specific neural unit when we transfer the state of that network from training to
test. However, BN maintains its statistical variance, which is accumulated from the entire learning procedure,
in the test phase. The inconsistency of variances in Dropout and BN i.e., the “variance shift”) causes the
unstable numerical behavior in inference that leads to erroneous predictions finally. Meanwhile, the large feature
dimension in WRN further reduces the “variance shift” to bring benefits to the overall performance.
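The variance shift is easy to observe numerically; a small sketch (unit-variance inputs and p = 0.5 are arbitrary choices):

```python
import torch

torch.manual_seed(0)
x = torch.randn(100_000)          # roughly unit-variance features
drop = torch.nn.Dropout(p=0.5)    # inverted dropout: kept units are scaled by 1/(1-p)

drop.train()
var_train = drop(x).var().item()  # ~2.0: the variance a downstream BN sees during training
drop.eval()
var_eval = drop(x).var().item()   # ~1.0: the variance at test time (dropout is a no-op)
print(var_train, var_eval)
```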
2. You have a dataset where each example contains two features, x1 and x2, and a binary label as shown below.
You want to develop a model to perform binary classification using a single hidden layer with 4 neurons. If
you use the below mentioned activation function is it possible for this model to achieve perfect accuracy on this
dataset? If so, provide a set of weights that achieves perfect accuracy. If not, briefly explain why.
ϕ(x) = (x + |x|)/(2x²) if x ≥ 0; 0 if x ≤ 0
The key is to have each hidden node evaluate one of the sides of the separating square, and the output layer
checks that all conditions are true (or false, depending on how the hidden weights are set). For example: w1 =
(0, -1), b1 = 2; w2 = (-1, 0), b2 = 2; w3 = (0, 1), b3 = -1; w4 = (1, 0), b4 = -1; w_output = (1, 1, 1, 1), b_output = -4.
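A sketch of this construction, assuming (since the dataset figure is not reproduced here) that the positive points lie inside the square 1 ≤ x1, x2 ≤ 2 and that ϕ acts as a 0/1 indicator of a positive pre-activation:

```python
import numpy as np

# Hidden weights/biases from the answer above: each unit checks one side of the square.
W = np.array([[0, -1], [-1, 0], [0, 1], [1, 0]], dtype=float)
b = np.array([2, 2, -1, -1], dtype=float)
w_out, b_out = np.ones(4), -4.0

def phi(z):
    return (z > 0).astype(float)  # assumed step-like behaviour of the given activation

def predict(x):                   # x: a length-2 point (x1, x2)
    h = phi(W @ x + b)            # 1 only when the point is on the correct side
    return int(w_out @ h + b_out >= 0)  # positive only when all four conditions hold

print(predict(np.array([1.5, 1.5])))  # 1 -- inside the square
print(predict(np.array([0.5, 1.5])))  # 0 -- outside the square
```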
Bonus Questions.
1. Derive the gradient of the cross-entropy loss L = − Σ_{j=1}^{C} yj log(oj), where o = softmax(ŷ), with respect to the logits ŷ.
Where:
• y is the true (one-hot) label.
• ŷ is the predicted logits before applying softmax.
• oi is the i-th softmax output.
• C is the number of classes.
Apply the chain rule:

∂L/∂ŷi = (∂L/∂oi) · (∂oi/∂ŷi)

∂L/∂oi = − ∂/∂oi Σ_{j=1}^{C} yj log(oj) = − yi/oi

∂oi/∂ŷi = oi(1 − oi)

∂L/∂ŷi = − (yi/oi) · oi(1 − oi) = − yi(1 − oi)
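A small autograd check (with arbitrary example values) of the two factors used above, ∂L/∂oi = −yi/oi and the diagonal softmax term ∂oi/∂ŷi = oi(1 − oi):

```python
import torch

y_hat = torch.tensor([1.5, -0.3, 0.7])  # logits (arbitrary values)
y = torch.tensor([0.0, 1.0, 0.0])       # one-hot label
o = torch.softmax(y_hat, dim=0)

# Diagonal of the softmax Jacobian: do_i/dyhat_i = o_i * (1 - o_i).
jac = torch.autograd.functional.jacobian(lambda z: torch.softmax(z, dim=0), y_hat)
print(torch.allclose(torch.diag(jac), o * (1 - o)))  # True

# dL/do_i = -y_i / o_i, treating o as a free variable in L(o) = -sum_j y_j log(o_j).
o_free = o.detach().clone().requires_grad_(True)
(-(y * torch.log(o_free)).sum()).backward()
print(torch.allclose(o_free.grad, -y / o))           # True
```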
Beginner: 6 Marks
1. How can regression and density estimation models be used for classification? .5 Marks
2. Assume that the input X to some scalar function f(.) is an n×m matrix. What is the dimensionality of the gradient
of f with respect to X? .5 Marks
Same as X, i.e., n × m, because each element of this matrix is the partial derivative of f with respect to the
corresponding element of X.
3. Which one is more powerful - a two layer neural network without any activation function or a two layer binary
decision tree? Why? .5 Marks
4. What are the different types of learning methods in ML/DL? .5 Marks
5. Your binary classification network is ŷ = σ(ReLU(z)), where the predicted label of an input is chosen to be 1
when ŷ ≥ 0.5 and 0 otherwise. What will happen to this network while training? 1 Marks
6. Name three types of functions for which UAT doesn’t hold true. 1 Marks
Functions that are discontinuous, defined over an open (non-compact) interval, or defined over an infinitely wide domain.
7. Which of the following is true about dropout? .5 Marks
8. Can we remove the bias parameter from the fully-connected layer and the convolutional layer before the batch
normalization? 1 Marks
9. The maximal number of linear regions of functions computed by a single layer rectifier network with n0 inputs
and n1 hidden units is? .5 Marks
Σ_{j=0}^{n0} (n1 choose j)
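In code, the bound can be evaluated with math.comb (the helper name is mine):

```python
from math import comb

def max_linear_regions(n0: int, n1: int) -> int:
    """Maximal number of linear regions of a one-hidden-layer ReLU net: sum_{j=0}^{n0} C(n1, j)."""
    return sum(comb(n1, j) for j in range(n0 + 1))

print(max_linear_regions(n0=2, n1=4))  # C(4,0) + C(4,1) + C(4,2) = 1 + 4 + 6 = 11
```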
Intermediate: 5 Marks
1. Which of the following would you consider to be valid activation functions (elementwise non-linearities) to train
a neural net in practice, and why? .5 Marks
(a) ϕ(x) = − min(2, x)
(b) ϕ(x) = 0.9x² + 1
(c) ϕ(x) = min(x, 0.1x) if x ≥ 0; min(x, 0.1x) if x < 0
(d) ϕ(x) = max(x, 0.1x) if x ≥ 0; min(x, 0.1x) if x < 0
2. Name and briefly explain the three types of double descent phenomena in DL. .5 Marks
Model-wise: the classical double descent; sample-wise: there is a regime where more data hurts; epoch-wise: also
called grokking (very popular in LLMs).
3. Consider the MAP estimate, which maximizes f_{Y|X}(y|x) f_X(x). Let X be a continuous random variable with
PDF f_X(x) = 2x for 0 < x < 1, and 0 otherwise. Also suppose
4. After training the model architecture with cross-entropy loss, you find that the softmax classifier works well.
Specifically, the model achieves 100% accuracy on the training data. However, you observe that the training loss
hasn't quite reached zero. Can you fix this? If not, why? 2 Marks
Given the correct class c, the loss reduces to log Σᵢ e^{zᵢ} − log e^{z_c} = log Σᵢ e^{zᵢ} − z_c; as all the exponentials are strictly positive, this
loss cannot be zero unless the output probability is exactly 1, which is not achievable in finite computation.
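A quick PyTorch illustration (hypothetical 3-class logits): the loss keeps shrinking as the correct-class logit grows, but for any finite logits it stays strictly positive.

```python
import torch
import torch.nn.functional as F

target = torch.tensor([0])                      # correct class c = 0
for scale in [1.0, 5.0, 10.0]:
    logits = torch.tensor([[scale, 0.0, 0.0]])  # increasingly confident logits
    print(scale, F.cross_entropy(logits, target).item())
# loss = log(sum_i e^{z_i}) - z_c > 0 in exact arithmetic for any finite logits
# (with very large logits the printed value will eventually underflow to 0.0 in float32)
```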
5. An input of size [4 × 3 × 256 × 256] is passed through a 2D convolution layer with output channels (12), kernel size
(5,3), dilation (2,1), padding=valid, stride (1,3). Find the output shape. 1.5 Marks
(4 × 12 × 248 × 85). Using the output-size formula from PyTorch's Conv2d documentation: H_out = ⌊(256 − 2·(5−1) − 1)/1⌋ + 1 = 248 and W_out = ⌊(256 − 1·(3−1) − 1)/3⌋ + 1 = 85.
Advanced: 4 Marks
1. After the first DL assignment, your friend Ram concludes that Dropout and Batch Normalization (BN) often lead
to a worse performance when they are combined together for CNNs (including ResNets). But your friend Sita
argues against it by showing a counter-example of Wide-ResNet (WRN). Is Ram's conclusion wrong? Answer
Yes/No and explain.
2. You have a dataset where each example contains two features, x1 and x2, and a binary label as shown below.
You want to develop a model to perform binary classification using a single hidden layer with 4 neurons. If
you use the below mentioned activation function is it possible for this model to achieve perfect accuracy on this
dataset? If so, provide a set of weights that achieves perfect accuracy. If not, briefly explain why.
ϕ(x) = (x + |x|)/(2x²) if x ≥ 0; 0 if x ≤ 0
Bonus Question
1. Prove the following lower bound on the cross-entropy loss for an example with K classes, a softmax activation
with cross-entropy loss, and the ground-truth vector y given as a one-hot encoding. 1 Point
L_CE(ŷ, y) = − Σᵢ log ŷᵢ ≥ − K log( (Σᵢ ŷᵢ)/K )   (Jensen's inequality)
           = − K log(1/K)   (softmax sums to 1)
           = K log K
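A quick numerical sanity check of the bound (K = 5 is an arbitrary choice):

```python
import torch

# Empirical check that -sum_i log(yhat_i) >= K log K for random softmax outputs.
K = 5
bound = K * torch.log(torch.tensor(float(K)))  # K log K ~= 8.047 for K = 5
losses = [-(torch.log(torch.softmax(torch.randn(K), dim=0))).sum() for _ in range(1000)]
print(min(losses).item(), ">=", bound.item())  # the minimum stays above the bound
```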