
CSC413 Assignment 2

Deadline: Nov 7, 2023 by 6pm EST


Submission: Compile and submit a PDF report containing your written solutions. You may also submit an
image of your legible hand-written solutions. Submissions will be done on Markus.
Late Submission: Please see the syllabus for the late submission criteria. You must work individually on this
assignment.

Question 1. Dead Units (3 pts)


Consider the following neural network, where x ∈ R^2, h ∈ R^2, and y ∈ R^2:

\[ h = \mathrm{ReLU}\big(W^{(1)} x + b^{(1)}\big) \]
\[ y = w^{(2)} h + b^{(2)} \]

Suppose also that each element of x is between -1 and 1.

Part (a)
Come up with example values of the parameters W^(1) and b^(1) such that both hidden units h_1 and h_2 are dead.
Answer:
   
\[
W^{(1)} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
b^{(1)} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}
\]

Regardless of the input x, the pre-activation is z = W^(1) x + b^(1) = [−1, −1]^T, so each hidden unit equals ReLU(−1) = max(−1, 0) = 0; both h_1 and h_2 are therefore dead.
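As a quick numerical sanity check (a minimal sketch of my own, not part of the assignment; it assumes PyTorch is available), the chosen parameters make both hidden units output zero for every x with entries in [−1, 1]:

```python
import torch

# Parameters from the answer above: W^(1) = 0, b^(1) = [-1, -1]^T.
W1 = torch.zeros(2, 2)
b1 = torch.tensor([-1.0, -1.0])

# Many random inputs with entries in [-1, 1].
x = torch.rand(1000, 2) * 2 - 1

z = x @ W1.T + b1             # pre-activations: always [-1, -1]
h = torch.relu(z)             # hidden activations

print(h.abs().max().item())   # 0.0 -> both h_1 and h_2 are dead for every input
```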

Part (b)
Show that the gradients of y with respect to W^(1) and b^(1) are zero.
Answer:
Let z = W^(1) x + b^(1). Since z_1 = z_2 = −1 < 0 for every valid input, the ReLU derivative ∂h/∂z is zero there, so

\[
\frac{\partial y}{\partial W^{(1)}}
= \frac{\partial y}{\partial h} \cdot \left.\frac{\partial h}{\partial z}\right|_{z_1 = z_2 = -1} \cdot \frac{\partial z}{\partial W^{(1)}}
= w^{(2)} \cdot 0 \cdot x = 0
\]
\[
\frac{\partial y}{\partial b^{(1)}}
= \frac{\partial y}{\partial h} \cdot \left.\frac{\partial h}{\partial z}\right|_{z_1 = z_2 = -1} \cdot \frac{\partial z}{\partial b^{(1)}}
= w^{(2)} \cdot 0 \cdot I = 0
\]
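The same conclusion can be checked with autograd (again a sketch of my own; the values of w^(2) and b^(2) are arbitrary, since they do not affect the result):

```python
import torch

W1 = torch.zeros(2, 2, requires_grad=True)
b1 = torch.full((2,), -1.0, requires_grad=True)
w2 = torch.tensor([0.5, -2.0])   # arbitrary second-layer parameters
b2 = torch.tensor(0.3)

x = torch.rand(2) * 2 - 1        # any input with entries in [-1, 1]
h = torch.relu(W1 @ x + b1)      # h = [0, 0]: both units are dead
y = w2 @ h + b2

y.backward()
print(W1.grad)   # all zeros
print(b1.grad)   # all zeros
```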

Question 2. Dropout (3 pts)


Part (a)
In a dropout layer, instead of “zeroing out” activations at test time, we multiply the weights by 1 − p, where p
is the probability that an activation is set to zero during training. Explain why the multiplication by 1 − p is
necessary for the neural network to make meaningful predictions.
Answer:

During training, each activation is set to zero independently with probability p, so on average only a fraction 1 − p of the activations in a layer survive; this effectively trains a thinner sub-network on each batch. At test time the dropout layer is not applied, so every neuron contributes to the next layer, and each unit's input would be roughly 1/(1 − p) times larger than what the downstream weights saw on average during training. Multiplying the weights by 1 − p at test time keeps the expected value of those inputs the same as during training, so the network makes meaningful predictions without having to adapt to a sudden change in activation scale.
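A small simulation of this expectation argument (a sketch; the layer size, the value p = 0.5, and the positive weights are arbitrary choices of mine):

```python
import torch

torch.manual_seed(0)
p = 0.5                          # probability that an activation is zeroed during training
a = torch.rand(10)               # activations of one layer
w = torch.rand(10)               # positive weights feeding a single output unit

# Training: average the unit's input over many random dropout masks.
masks = (torch.rand(100_000, 10) > p).float()    # each activation kept with prob 1 - p
avg_train_input = ((masks * a) @ w).mean()

test_unscaled = a @ w            # test time, no dropout, unscaled weights
test_scaled = a @ ((1 - p) * w)  # test time with weights scaled by 1 - p

print(avg_train_input.item(), test_scaled.item())   # approximately equal
print(test_unscaled.item())                         # about 1 / (1 - p) times larger
```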

Part (b)
Explain the difference between model.train() and model.eval() modes of evaluating a network in PyTorch. Does the Dropout layer in PyTorch behave differently in these two modes? Feel free to look at the online documentation for PyTorch.
Answer:

model.train() puts the network in training mode: dropout layers are active and randomly zero a fraction of activations on each forward pass, which helps prevent overfitting. model.eval() puts the network in evaluation mode: dropout layers are disabled and pass all activations through unchanged, giving consistent, deterministic outputs during validation, testing, and inference. So yes, the Dropout layer in PyTorch behaves differently in the two modes.
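A minimal demonstration with torch.nn.Dropout (my own sketch; note that PyTorch uses "inverted" dropout, so during training it scales the surviving activations by 1/(1 − p) rather than scaling the weights by 1 − p at test time, and the expected values match either way):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()      # the mode set recursively by model.train()
print(drop(x))    # random zeros; surviving entries scaled to 1 / (1 - 0.5) = 2.0

drop.eval()       # the mode set recursively by model.eval()
print(drop(x))    # identity: all ones, deterministic
```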

Question 3. Bias-variance decomposition (4 pts)


Let D = {(x_i, y_i) | i = 1, ..., n} be a dataset obtained from the true underlying data distribution P, i.e. D ∼ P^n, and let h_D(·) be a classifier trained on D. Show the bias-variance decomposition

\[
\underbrace{E_{D,x,y}\big[(h_D(x) - y)^2\big]}_{\text{Expected test error}}
= \underbrace{E_{D,x}\big[(h_D(x) - \hat{h}(x))^2\big]}_{\text{Variance}}
+ \underbrace{E_{x,y}\big[(\hat{y}(x) - y)^2\big]}_{\text{Noise}}
+ \underbrace{E_{x}\big[(\hat{h}(x) - \hat{y}(x))^2\big]}_{\text{Bias}^2}
\]

where ĥ(x) = E_{D∼P^n}[h_D(x)] is the expected regressor over possible training sets, given the learning algorithm A, and ŷ(x) = E_{y|x}[y] is the expected label given x. As mentioned in the lecture, labels might not be deterministic given x. To carry out the proof, proceed in the following steps:

Part (a)
Show that the following identity holds:
\[
E_{D,x,y}\big[(h_D(x) - y)^2\big]
= E_{D,x}\big[(h_D(x) - \hat{h}(x))^2\big]
+ E_{x,y}\big[(\hat{h}(x) - y)^2\big]
\tag{1}
\]

Answer:
Reformulate (1) by adding and subtracting ĥ(x) inside the square:
\[
\begin{aligned}
E_{D,x,y}\big[(h_D(x) - y)^2\big]
&= E_{D,x,y}\Big[\big((h_D(x) - \hat{h}(x)) + (\hat{h}(x) - y)\big)^2\Big] \\
&= E_{D,x}\big[(h_D(x) - \hat{h}(x))^2\big]
 + 2\,E_{D,x,y}\big[(h_D(x) - \hat{h}(x))(\hat{h}(x) - y)\big]
 + E_{x,y}\big[(\hat{h}(x) - y)^2\big] \\
&= E_{D,x}\big[(h_D(x) - \hat{h}(x))^2\big] + E_{x,y}\big[(\hat{h}(x) - y)^2\big]
\end{aligned}
\]

Note that the second term in the above equation is zero because
\[
\begin{aligned}
E_{D,x,y}\big[(h_D(x) - \hat{h}(x))(\hat{h}(x) - y)\big]
&= E_{x,y}\Big[E_D\big[h_D(x) - \hat{h}(x)\big]\,(\hat{h}(x) - y)\Big] \\
&= E_{x,y}\Big[\big(E_D[h_D(x)] - \hat{h}(x)\big)(\hat{h}(x) - y)\Big] \\
&= E_{x,y}\big[(\hat{h}(x) - \hat{h}(x))(\hat{h}(x) - y)\big] \\
&= E_{x,y}[0] \\
&= 0
\end{aligned}
\]

Part (b)
Next, show
\[
E_{x,y}\big[(\hat{h}(x) - y)^2\big]
= E_{x,y}\big[(\hat{y}(x) - y)^2\big]
+ E_{x}\big[(\hat{h}(x) - \hat{y}(x))^2\big]
\tag{2}
\]

which completes the proof by substituting (2) into (1).

Answer:
Reformulate (2) by adding and subtracting ŷ(x) inside the square:
\[
\begin{aligned}
E_{x,y}\big[(\hat{h}(x) - y)^2\big]
&= E_{x,y}\Big[\big((\hat{h}(x) - \hat{y}(x)) + (\hat{y}(x) - y)\big)^2\Big] \\
&= E_{x}\big[(\hat{h}(x) - \hat{y}(x))^2\big]
 + 2\,E_{x,y}\big[(\hat{h}(x) - \hat{y}(x))(\hat{y}(x) - y)\big]
 + E_{x,y}\big[(\hat{y}(x) - y)^2\big] \\
&= E_{x}\big[(\hat{h}(x) - \hat{y}(x))^2\big] + E_{x,y}\big[(\hat{y}(x) - y)^2\big]
\end{aligned}
\]

Note that the second term in the above equation is also zero because
\[
\begin{aligned}
E_{x,y}\big[(\hat{h}(x) - \hat{y}(x))(\hat{y}(x) - y)\big]
&= E_{x}\Big[E_{y|x}\big[(\hat{h}(x) - \hat{y}(x))(\hat{y}(x) - y)\big]\Big] \\
&= E_{x}\Big[(\hat{h}(x) - \hat{y}(x))\,E_{y|x}\big[\hat{y}(x) - y\big]\Big] \\
&= E_{x}\Big[(\hat{h}(x) - \hat{y}(x))\big(\hat{y}(x) - E_{y|x}[y]\big)\Big] \\
&= E_{x}\big[(\hat{h}(x) - \hat{y}(x))(\hat{y}(x) - \hat{y}(x))\big] \\
&= E_{x}[0] \\
&= 0
\end{aligned}
\]
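The decomposition can also be verified numerically with a small Monte Carlo simulation (a sketch under assumptions of my own: a sin-plus-Gaussian-noise data distribution, a degree-1 polynomial regressor fit by least squares, and x evaluated on a uniform grid; none of this is specified in the assignment). The two printed values should agree up to Monte Carlo error, since the identity holds exactly in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_STD = 0.3

def sample_dataset(n=20):
    # Data distribution P: x ~ Uniform(0, 1), y = sin(2*pi*x) + Gaussian noise.
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, NOISE_STD, n)
    return x, y

xs = np.linspace(0, 1, 200)          # fixed grid of test inputs x
y_bar = np.sin(2 * np.pi * xs)       # yhat(x) = E[y | x]
noise = NOISE_STD ** 2               # E_{x,y}[(yhat(x) - y)^2]

# h_D: a degree-1 polynomial fit on each sampled dataset D.
preds = np.array([np.polyval(np.polyfit(*sample_dataset(), 1), xs)
                  for _ in range(2000)])
h_bar = preds.mean(axis=0)           # hhat(x) = E_D[h_D(x)]

variance = ((preds - h_bar) ** 2).mean()     # E_{D,x}[(h_D(x) - hhat(x))^2]
bias_sq = ((h_bar - y_bar) ** 2).mean()      # E_x[(hhat(x) - yhat(x))^2]

# Expected test error: fresh noisy labels for each trained regressor.
test_errors = [((pred - (y_bar + rng.normal(0, NOISE_STD, xs.shape))) ** 2).mean()
               for pred in preds]
expected_error = np.mean(test_errors)

print(f"variance + noise + bias^2 = {variance + noise + bias_sq:.4f}")
print(f"expected test error       = {expected_error:.4f}")   # approximately equal
```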

Part (c)
Explain in a sentence or two what overfitting means and which term in this formula represents it.
Answer:

Overfitting means the model fits the training data too closely, so it gives accurate predictions on the training data but not on new data. When the model overfits, the variance term of this formula will be very high and the bias term will be very low.
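To connect this to the simulation sketched after Part (b) (same made-up data distribution and sample size, still just an illustrative assumption of mine): replacing the degree-1 fit with a much more flexible degree-9 polynomial on the same 20-point datasets drives the bias term down and the variance term up, which is the overfitting regime described above.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 200)
y_bar = np.sin(2 * np.pi * xs)        # yhat(x) for the made-up data distribution

def variance_and_bias_sq(degree, trials=2000, n=20):
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
        preds.append(np.polyval(np.polyfit(x, y, degree), xs))
    preds = np.array(preds)
    h_bar = preds.mean(axis=0)
    return ((preds - h_bar) ** 2).mean(), ((h_bar - y_bar) ** 2).mean()

for degree in (1, 9):
    var, bias_sq = variance_and_bias_sq(degree)
    print(f"degree {degree}: variance = {var:.3f}, bias^2 = {bias_sq:.3f}")
```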
