
CS6910: Tutorial 3

VERSION I

1. Show that for a binary classification problem, minimising the cross
entropy loss is the same as minimising the KL divergence between the
true and predicted distributions.

2. You want your neural network based classification model to be highly
confident in addition to being accurate. One way of achieving this
is to ensure that the probability predicted for the correct class yi
is larger than the probabilities predicted for the other classes
by a significant margin ∆ (say, ≥ 0.3). How would you design a loss
function to ensure this? For example, if we have 3 classes and if the
correct class label is 0, and the probabilities predicted by the model
are [y0 = 0.58, y1 = 0.37, y2 = 0.05], then the model should incur: (i)
no loss for the correct class, (ii) a loss for assigning a probability of
0.37 to the 2nd class, since the difference between the probabilities of the
correct class and this incorrect class is only 0.21 (less than ∆), and
(iii) no loss for assigning a probability of 0.05 to the 3rd class, since
the difference in probability is greater than ∆.
Loss function =
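As a sanity check (not the intended derivation), here is a minimal NumPy sketch of one possible hinge-style margin penalty, using ∆ = 0.3 and the example probabilities above; the names margin_loss, probs and correct are illustrative only.

```python
import numpy as np

def margin_loss(probs, correct, delta=0.3):
    """Hinge-style penalty max(0, delta - (p_correct - p_j)), summed over the wrong classes j."""
    gap = probs[correct] - probs          # margin of the correct class over every class
    penalties = np.maximum(0.0, delta - gap)
    penalties[correct] = 0.0              # no loss for the correct class itself
    return penalties.sum()

probs = np.array([0.58, 0.37, 0.05])      # example from the question, correct class = 0
print(margin_loss(probs, correct=0))      # ~0.09: penalises class 1 (gap 0.21 < 0.3) but not class 2
```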

3. An ordered network is a network where the state variables can be
computed one at a time in a specified order. Given the ordered network
below, give a formula for calculating the ordered derivative ∂y3/∂y1
in terms of partial derivatives w.r.t. y1 and y2, where y1, y2 and y3
are the outputs of nodes 1, 2 and 3 respectively.

[Figure: ordered network with nodes 1, 2 and 3 producing outputs y1, y2 and y3.]
(a) dy3/dy1 = (∂y3/∂y2)(dy2/dy1) + ∂y3/∂y1
(b) dy3/dy1 = (∂y3/∂y2)(dy2/dy1)(∂y3/∂y1)
(c) dy3/dy1 = (∂y3/∂y2)(dy2/dy1) − ∂y3/∂y1
(d) None of the above.

4. Let φ1 (·) and φ2 (·) denote the sigmoid and the tanh functions,
respectively. Tick the correct options.
(a) φ1 (-ν) = φ1 (ν) and φ2 (-ν) = 1 - φ2 (ν)
(b) φ1 (-ν) = -φ1 (ν) and φ2 (-ν) = 1 - φ2 (ν)
(c) φ1 (-ν) = 1 - φ1 (ν) and φ2 (-ν) = -φ2 (ν)
(d) None of the above.

5. Consider vectors u, x ∈ Rn, and a matrix A ∈ Rn×n. The derivative of
a scalar f w.r.t. a vector x is a vector itself, given by

∇x f = (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn)

Derive the expressions for the following derivatives (gradients):

∇x (uᵀx),  ∇x (xᵀx)  and  ∇x (xᵀAx)

(a) uᵀ, xᵀ and Axᵀ
(b) uᵀ, 2xᵀ and 2Axᵀ
(c) u, 2x and 2Ax
(d) u, 2x and Ax
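Whatever option you pick, a finite-difference check on random vectors is a quick way to verify it; a minimal sketch assuming NumPy, with an illustrative helper num_grad:

```python
import numpy as np

def num_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at the point x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
n = 4
u, x = rng.normal(size=n), rng.normal(size=n)
A = rng.normal(size=(n, n))

print(np.allclose(num_grad(lambda v: u @ v, x), u))                  # candidate: u
print(np.allclose(num_grad(lambda v: v @ v, x), 2 * x))              # candidate: 2x
print(np.allclose(num_grad(lambda v: v @ A @ v, x), (A + A.T) @ x))  # (A + Aᵀ)x; equals 2Ax when A is symmetric
```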

6. A fair coin results in either Head (1) or Tail (0) with equal probability.
What is the entropy of the random variable indicating the outcome
of the toss? If instead, we had a biased coin with P(H) = 0.7, does
the entropy increase or decrease?

(a) With fair coin, entropy is 1. With biased coin, it is 0.88.
(b) With fair coin, entropy is 0.88. With biased coin, it is 1.
(c) With fair coin, entropy is 0.88. With biased coin, it is 0.66.
(d) With fair coin, entropy is 0.25. With biased coin, it is 0.90.
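To verify the arithmetic, a minimal sketch using base-2 logarithms:

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (in bits) of a coin that lands heads with probability p."""
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

print(entropy_bits(0.5))   # fair coin
print(entropy_bits(0.7))   # biased coin
```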

7. Recall the gradient descent update rule that comes from the Taylor
series when at each step we ensure L(θK+1 ) < L(θK ), where L is the
loss function. Suppose we are dealing with a quadratic loss function.
Can you come up with a better update rule such that we reach the
minimum more quickly?
Bonus question: Think about why this is not a widely used update
rule even though it is much faster than gradient descent.
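As a hint of the kind of rule being asked about, the sketch below uses an illustrative 1-D quadratic (the coefficients are arbitrary) and compares one plain gradient step with one step scaled by the inverse second derivative:

```python
# Illustrative 1-D quadratic: L(theta) = 3*theta**2 - 12*theta + 5, minimised at theta = 2.
grad = lambda t: 6 * t - 12     # L'(theta)
hess = 6.0                      # L''(theta), constant for a quadratic

theta0 = 10.0
theta_gd = theta0 - 0.1 * grad(theta0)     # one plain gradient-descent step (eta = 0.1)
theta_2nd = theta0 - grad(theta0) / hess   # one step scaled by the inverse curvature

print(theta_gd)    # 5.2 -- still far from the minimum
print(theta_2nd)   # 2.0 -- lands exactly on the minimum of a quadratic in one step
```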

8. Consider a fully connected network with 3 inputs x1 , x2 , x3 . Suppose
there are two hidden layers, each with 4 neurons having sigmoid activation
functions. Further, the output layer is a softmax layer. Assume that
all the weights in the network are set to 1 and all biases are set to 0.
Write down the output of the network as a function of x = [x1 , x2 , x3 ].
y=
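To check your closed-form expression on concrete inputs, here is a minimal NumPy forward pass under the stated assumptions (all weights 1, all biases 0, two sigmoid hidden layers of 4 neurons, softmax output); the question does not fix the number of output classes, so 2 is assumed here purely for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def forward(x, n_hidden=4, n_out=2):
    """All-ones weights, zero biases: two sigmoid hidden layers, then a softmax output."""
    h1 = sigmoid(np.ones((n_hidden, 3)) @ x)          # every unit sees x1 + x2 + x3
    h2 = sigmoid(np.ones((n_hidden, n_hidden)) @ h1)
    logits = np.ones((n_out, n_hidden)) @ h2          # identical pre-activations for every class ...
    e = np.exp(logits - logits.max())
    return e / e.sum()                                # ... so the softmax output is uniform

print(forward(np.array([1.0, -2.0, 0.5])))            # [0.5 0.5] regardless of x
```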

9. Consider the following computation,

[Diagram: x → σ → f(x)]

f (x) = tanh(w · x + b)

The value L is given by,

L = (1/2) (y − f(x))⁴

Here, x and y are constants and w and b are parameters that can be
modified. In other words, L is a function of w and b.
Derive the partial derivatives ∂L/∂w and ∂L/∂b.

(a) ∂L/∂w = 2(y − f(x))³ f(x)(1 − f(x))² x
    ∂L/∂b = 2(y − f(x))³ f(x)(1 − f(x))²
(b) ∂L/∂w = 2(y − f(x))³ (f(x)² − 1) x
    ∂L/∂b = 2(y − f(x))³ (f(x)² − 1)
(c) ∂L/∂w = 2(f(x) − y)³ f(x)²
    ∂L/∂b = 2(f(x) − y)³ f(x)² x
(d) ∂L/∂w = 2(y − f(x))³ f(x)(1 − f(x)) y x
    ∂L/∂b = 2(y − f(x))³ f(x)(1 − f(x)) y
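A finite-difference check with arbitrary values of x, y, w and b can discriminate between the options; a minimal sketch (the numbers chosen are purely illustrative):

```python
import numpy as np

x, y, w, b = 0.7, 0.3, 0.5, -0.2   # arbitrary illustrative constants and parameters
f = lambda w, b: np.tanh(w * x + b)
L = lambda w, b: 0.5 * (y - f(w, b)) ** 4

eps = 1e-6
dL_dw = (L(w + eps, b) - L(w - eps, b)) / (2 * eps)   # numerical dL/dw
dL_db = (L(w, b + eps) - L(w, b - eps)) / (2 * eps)   # numerical dL/db

# Evaluate each candidate expression at the same x, y, w, b and compare with these numbers.
print(dL_dw, dL_db)
```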

10. Let f(x, y) = x² + y²/100. Gradient descent with a fixed step size η is
run for finding the minimum value of f from an initial point (x0 , y0 ).
1. Give an expression for (xt , yt ) in terms of η and (x0 , y0 ).

2. Let (x0 , y0 ) = (10, 0); give the range of η for which convergence
to the solution is guaranteed.

3. Let (x0 , y0 ) = (2, 5); give the range of η for which convergence
to the solution is guaranteed.
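To experiment with parts 2 and 3 numerically, a minimal sketch (the step sizes tried are arbitrary):

```python
def run_gd(x0, y0, eta, steps=200):
    """Plain gradient descent on f(x, y) = x**2 + y**2 / 100."""
    x, y = x0, y0
    for _ in range(steps):
        x, y = x - eta * 2 * x, y - eta * y / 50   # df/dx = 2x, df/dy = y/50
    return x, y

for eta in (0.1, 0.9, 1.1):
    print(eta, run_gd(10.0, 0.0, eta))   # the x-coordinate blows up once eta exceeds 1
for eta in (0.1, 0.9, 1.1):
    print(eta, run_gd(2.0, 5.0, eta))
```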

11. Consider a multivariate linear regression problem where the output is
Ŷ = XW, where X ∈ Rm×n, m is the number of training samples, n
is the number of features, and Y denotes the true labels. The objective is
to minimize the squared error function, where Y, Ŷ ∈ Rm:

L(W) = (1/m) Σ_{i=1}^{m} (Yi − Ŷi)²

Derive the gradient ∂L/∂W for the gradient descent update rule.
Answer:
∂L/∂W =
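Whatever expression you derive, it can be validated against a numerical gradient on random data; a minimal sketch (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
X, Y = rng.normal(size=(m, n)), rng.normal(size=m)
W = rng.normal(size=n)

loss = lambda W: np.mean((Y - X @ W) ** 2)

eps, g = 1e-6, np.zeros(n)
for j in range(n):                       # central finite differences, one coordinate of W at a time
    e = np.zeros(n)
    e[j] = eps
    g[j] = (loss(W + e) - loss(W - e)) / (2 * eps)

print(g)   # compare with your closed-form dL/dW evaluated at the same X, Y, W
```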

12. Consider a binary classification problem. Which of the following loss
functions, when used with a deep neural network (≥ 1 hidden layer)
with non-linear activations, is a convex loss function? Provide a
proof for your answer.
(a) Cross Entropy
(b) Mean Squared error
(c) All of the above
(d) None of the above

13. Consider a multivariate linear regression problem where the output is
Ŷ = XW, where X ∈ Rm×n, m is the number of training samples, n
is the number of features, and Y denotes the true labels. The objective is
to minimize the squared error function, where Y, Ŷ ∈ Rm:

L(W) = (1/m) Σ_{i=1}^{m} (Yi − Ŷi)²

Find a closed form solution to this problem if it exists. Think about
why we use gradient descent (an iterative approach) in practice
instead.

(a) W = XᵀXXᵀY
(b) W = Xᵀ(XXᵀ)⁻¹Y
(c) W = (XᵀX)ᵀXY
(d) W = (XᵀX)⁻¹XᵀY
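The candidate formulas can be compared against a library least-squares solver on random, full-column-rank data; a minimal sketch (np.linalg.lstsq is used here only as a reference):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
X, Y = rng.normal(size=(m, n)), rng.normal(size=m)

W_ref = np.linalg.lstsq(X, Y, rcond=None)[0]   # reference least-squares solution

# Plug the same X, Y into each candidate and compare with W_ref, e.g. for option (d):
W_d = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(W_d, W_ref), W_ref)
```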

14. Suppose we train a deep neural network using the cross entropy loss
for classification. Now, instead of minimizing the cross-entropy loss,
suppose we change our objective function (J(θ)) to maximize the
probability of the correct class. What changes will have to be made
in our training setup?
(a) We cannot use backpropagation since it is applicable only in
scenarios where we are minimizing an objective function, not
maximizing it.
(b) We will have to change the update rule to θj := θj + α ∂J(θ)/∂θj.
(c) We do not need to change anything and the network will still
get trained properly without any modification.

15. Which of the following loss functions, when used with logistic
regression, is a convex loss function? Provide a proof for your
answer.
(a) Cross Entropy
(b) Mean Squared error
(c) All of the above
(d) None of the above

16. Suppose we have the following three points: x1 = (1, 1), x2 = (−1, 3),
x3 = (2, 4) and (y1 , y2 , y3 ) = (5, 11, 18). Find min_w Σ_{i=1}^{3}
(xiᵀw − yi)² and also the value of w that leads to this minimum
value.
(a) min value = 0, w = [1,4]
(b) min value = 0, w = [4,1]
(c) min value = 1, w = [2,5]
(d) min value = 1, w = [5,2]
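The sum of squares here is an ordinary least-squares problem, so a solver can confirm whichever option you pick; a minimal sketch:

```python
import numpy as np

X = np.array([[1.0, 1.0], [-1.0, 3.0], [2.0, 4.0]])   # x1, x2, x3 as rows
y = np.array([5.0, 11.0, 18.0])

w = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares w
print(w)                                   # optimal w
print(np.sum((X @ w - y) ** 2))            # minimum value of the objective
```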

17. Which of the following metrics can be used to measure the similarity
between two probability distributions?
(a) Jensen-Shannon divergence
(b) Kullback–Leibler (KL) divergence
(c) Cross-Entropy
(d) Mahalanobis divergence

18. Consider this function: x²y² + y²z² + z²x² = 0. Compute ∂x/∂y.

(a) y(x + z) / (x(y + z))
(b) y²(x + z) / (x²(y + z))
(c) y(x² − z²) / (x(y² − z²))
(d) y(x² + z²) / (x(y² + z²))
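To double-check the implicit differentiation symbolically, here is a minimal SymPy sketch that treats x as a function of y with z held constant:

```python
import sympy as sp

y, z = sp.symbols('y z')
x = sp.Function('x')(y)                        # x depends on y; z is held constant

F = x**2 * y**2 + y**2 * z**2 + z**2 * x**2    # left-hand side of the constraint F = 0
dxdy = sp.solve(sp.Eq(sp.diff(F, y), 0), sp.diff(x, y))[0]

print(sp.simplify(dxdy))                       # compare against the options above
```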

19. Consider a binary classification problem. Which of the following loss
functions, when used with a deep neural network (≥ 1 hidden layer)
with linear activations, is a convex loss function? Provide a proof for
your answer.

(a) Cross Entropy
(b) Mean Squared error
(c) All of the above
(d) None of the above

20. Consider this quadratic loss function J(θ). Which of the following
update equations will take minimum steps to reach from point 1 to
point 2?

(a) θ∗ = θ0 − 0.5 ∇θ J(θ0 )
(b) θ∗ = θ0 − H⁻¹ ∇θ J(θ0 ), where H is the Hessian of J at θ0
(c) θ∗ = θ0 − 4 ∇θ J(θ0 )
(d) θ∗ = θ0 − 2 ∇θ J(θ0 )

21. We are given astronomical data for star classification. Stars can be
classified into seven main types (O, B, A, F, G, K, M) based on their
surface temperatures. Additionally, there are sub-classes identified
based on their sizes: supergiants, giants, main-sequence stars, and
subdwarfs. Hence, a given training sample can be a supergiant star
of O type. What changes should be made to the standard feed forward
neural network to handle such cases where the classes are not mutually
exclusive?
(a) No changes are needed and we can model this problem with the
standard setup.
(b) Use sigmoid instead of softmax as the output activation func-
tion.
(c) Use Swish activation function instead of ReLU.
(d) Use binary cross-entropy loss for each class instead of categorical
cross-entropy loss.

22. An e-commerce company builds a feed forward neural network that
predicts how similar two products are. The network has 2 hidden
layers and an output layer.

Instead of a linear module, the company decides to have a quadratic
module in the first layer with b1 = 0, where x ∈ Rn. In addition to
this, they use the Swish activation function g(x) instead of ReLU. The loss
is cross-entropy.

a1 = xᵀW1 x
g(x) = x · sigmoid(x)

Compute the backpropagation updates of this network; specifically,
derive ∂L/∂a3 , ∂L/∂a2 , ∂L/∂a1 and ∂L/∂W1_{11} (the (1,1) entry of W1).
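For the quadratic module in isolation, a finite-difference check of ∂a1/∂W1 can validate the first step of your derivation; a minimal sketch with random illustrative values (this exercises only the quadratic form, not the full network):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
x, W1 = rng.normal(size=n), rng.normal(size=(n, n))

a1 = lambda W: x @ W @ x                 # quadratic module with b1 = 0

eps = 1e-6
E = np.zeros((n, n))
E[0, 0] = eps                            # perturb only the (1,1) entry of W1
num = (a1(W1 + E) - a1(W1 - E)) / (2 * eps)

print(num, x[0] * x[0])                  # da1/dW1[i,j] = x_i * x_j, so the (1,1) entry gives x1**2
```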

END
