Deep Learning Quiz Merged
Deep Learning
Assignment- Week 1
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2= 20
______________________________________________________________________________
QUESTION 1:
Which of the following is (are) region descriptor(s)? Choose the correct option.
I) Fourier descriptor II) Co-occurrence matrix III) Intensity histogram IV) Signature
a. Both I and IV
b. Only I
c. Both II and III
d. None of the above
Correct Answer: c
Detailed Solution:
______________________________________________________________________________
QUESTION 2:
Consider a two-class Bayes' Minimum Risk Classifier. The probability of class ω1 is P(ω1) = 0.4, P(x|ω1) = 0.65, P(x|ω2) = 0.5, and the loss matrix values are
λ11 = 0.1, λ12 = 0.9
λ21 = 0.85, λ22 = 0.15
a. 0.51
b. 0.61
c. 0.53
d. 0.39
Correct Answer: c
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
Detailed Solution:
P(ω2) = 1 - P(ω1) = 0.6
P(ω1|x) = P(ω1) · P(x|ω1) / P(x) = 0.26/0.56 ≈ 0.464 and P(ω2|x) = P(ω2) · P(x|ω2) / P(x) = 0.30/0.56 ≈ 0.536, where P(x) = 0.26 + 0.30 = 0.56.
The conditional risk of action α1 is R(α1|x) = λ11 P(ω1|x) + λ12 P(ω2|x) = 0.1 × 0.464 + 0.9 × 0.536 ≈ 0.53.
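The arithmetic above can be checked with a short script. It is a hedged sketch: it assumes the quantity asked for is the conditional risk R(α1|x), with all probabilities and loss values λij taken from the question.

```python
# Hedged sketch: assumes the asked-for quantity is the conditional risk R(alpha1|x).
# All probabilities and loss values lam[i][j] are taken from the question.
p_w1, p_w2 = 0.4, 0.6
px_w1, px_w2 = 0.65, 0.5
lam = [[0.1, 0.9],    # losses for deciding omega1 when truth is omega1/omega2
       [0.85, 0.15]]  # losses for deciding omega2

px = p_w1 * px_w1 + p_w2 * px_w2        # evidence P(x) = 0.56
post1 = p_w1 * px_w1 / px               # P(omega1|x)
post2 = p_w2 * px_w2 / px               # P(omega2|x)

r1 = lam[0][0] * post1 + lam[0][1] * post2   # R(alpha1|x), about 0.53
r2 = lam[1][0] * post1 + lam[1][1] * post2   # R(alpha2|x)
print(round(r1, 2))
```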
______________________________________________________________________________
QUESTION 3:
If the larger values of the gray-level co-occurrence matrix are concentrated around the main diagonal, then which one of the following will be true?
Correct Answer: c
Detailed Solution:
The options are self-explanatory. We cannot comment on the entropy based on the diagonal elements alone, because entropy depends on the randomness of the values, whereas the element difference moment will be low and the inverse element difference moment will be high.
______________________________________________________________________________
QUESTION 4:
Suppose the Fourier descriptor of a shape has K coefficients, and we remove the last few coefficients and use only the first m (m < K) coefficients to reconstruct the shape. What will be the effect of using the truncated Fourier descriptor on the reconstructed shape?
b. We will get only the fine details of the boundary of the shape.
d. Low frequency component of the boundary will be removed from contour of the
shape.
Correct Answer: a
Detailed Solution:
The low-frequency components of the Fourier descriptor capture the general shape properties of the object and the high-frequency components capture the finer detail. So, if we remove the last few components, the finer details will be lost, and as a result the reconstructed shape will be a smoothed version of the original shape. The boundary of the reconstructed shape will be a low-frequency approximation of the original shape boundary.
______________________________________________________________________________
QUESTION 5:
The signature descriptor of an unknown shape is given in the figure. Can you identify the unknown shape?
a. Circle
b. Square
c. Straight line
d. Cannot be predicted
Correct Answer: a
Detailed Solution:
The distance from the centroid to the boundary is the same for every value of θ. This is true for a circle with radius k.
QUESTION 6:
Signature descriptor of an unknown shape is given in the figure. If the value of k is 7 cm, what is the area of the unknown shape?
______________________________________________________________________________
QUESTION 7:
Which of the following is not a Co-occurrence matrix-based descriptor?
a. Entropy
b. Uniformity
c. Intensity histogram.
d. All of the above.
Correct Answer: c
Detailed Solution:
QUESTION 8:
Given an image I (fig 1), the gray-level co-occurrence matrix C (fig 2) can be constructed by specifying the displacement vector d = (dx, dy). Let the position operator be specified as (1, 1), which has the interpretation: one pixel to the right and one pixel below. (Both the image and the partial gray-level co-occurrence matrix are given in figures 1 and 2 respectively. Blank values and the 'X', 'Y' values in the gray-level co-occurrence matrix are unknown.)
2 0 2 0 1
0 1 1 2 2
2 1 2 2 1
1 2 2 0 1
1 0 1 2 0
Fig1: I
Fig 2: C
a. 0
b. 1
c. 2
d. 3
Correct Answer: d
Detailed Solution:
QUESTION 9:
What is the value of maximum probability descriptor?
a. 1/4
b. 3/12
c. 1/3
d. 3/16
Correct Answer: a
Detailed Solution:
Maximum probability = max(cij), where cij is the normalized co-occurrence matrix: 4/16 = 1/4.
______________________________________________________________________________
QUESTION 10:
The plot of the distance of different boundary points from the centroid of the shape, taken at various directions, is known as
a. Signature descriptor
b. Polygonal descriptor
c. Fourier descriptor.
d. Convex Hull
Correct Answer: a
Detailed Solution:
__________________________________________________________
************END***********
Deep Learning
Assignment- Week 2
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 2 = 20
______________________________________________________________________________
QUESTION 1:
Two random variables 𝑋1 and 𝑋2 follow Gaussian distributions with the following means and variances.
𝑋1~𝑁 (0, 3) 𝑎𝑛𝑑 𝑋2~𝑁 (0, 2)
Correct Answer: a
Detailed Solution:
As 𝑿𝟏 has a larger variance than 𝑿𝟐, the distribution of 𝑿𝟏 will be more spread out than that of 𝑿𝟐. So, the distribution of 𝑿𝟏 will be flatter than the distribution of 𝑿𝟐.
QUESTION 2:
In which scenario will the discriminant function be linear when a two-class Bayesian classifier is used to classify two classes of normally distributed points? Choose the correct option.
a. Only II
b. Both I and II
c. Only III
d. None of the above
Correct Answer: b
Detailed Solution:
QUESTION 3:
Choose the correct option regarding discriminant functions gi(x) for multiclass classification (x is the
feature vector to be classified).
Statement i: The risk value R(αi|x) in a Bayes minimum risk classifier can be used as a discriminant function.
Statement ii: The negative of the risk value R(αi|x) in a Bayes minimum risk classifier can be used as a discriminant function.
Statement iii: The negative of the a posteriori probability P(ωi|x) in a Bayes minimum error classifier can be used as a discriminant function.
Statement iv: The a posteriori probability P(ωi|x) in a Bayes minimum error classifier can be used as a discriminant function.
Correct Answer: d
Detailed Solution:
QUESTION 4:
If we choose the discriminant function 𝑔𝑖(𝑥) as a function of the posterior probability, i.e. 𝑔𝑖(𝑥) = 𝑓(𝑝(𝑤𝑖|𝑥)), then which of the following cannot be the function 𝑓(·)?
Correct Answer: b
Detailed Solution:
QUESTION 5:
For a two-class problem, the linear discriminant function is given by 𝑔(𝑥) = 𝑎ᵗ𝑦. What is the update rule for finding the weight vector 𝑎? Here 𝑦 is the augmented feature vector.
a. a(k + 1) = a(k) + η Σ y
b. a(k + 1) = a(k) − η Σ y
c. a(k + 1) = a(k − 1) − η a(k)
d. a(k + 1) = a(k − 1) + η a(k)
Correct Answer: a
Detailed Solution:
𝑎( 𝑘 + 1 ) = 𝑎( 𝑘 ) + 𝜂 ∑ 𝑦
QUESTION 6:
You are given some data points for two different classes.
Class 1 points: {(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)}
Class 2 points: {(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)}
Compute the mean vectors 𝜇1 and 𝜇 2 for these two classes and choose the correct option.
a. μ1 = [10, 10]ᵀ and μ2 = [6, 12]ᵀ
b. μ1 = [12, 8]ᵀ and μ2 = [12, 7]ᵀ
c. μ1 = [10, 7]ᵀ and μ2 = [10, 7]ᵀ
d. μ1 = [10, 8]ᵀ and μ2 = [12, 6]ᵀ
Correct Answer: d
Detailed Solution: Add the points for each class and divide the result by the number of points, i.e., 7.
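The same computation as a quick sketch (numpy assumed available):

```python
# Mean vectors of the two classes (numpy assumed available).
import numpy as np

class1 = np.array([(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)])
class2 = np.array([(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)])

mu1 = class1.mean(axis=0)   # [10. 8.]
mu2 = class2.mean(axis=0)   # [12. 6.]
print(mu1, mu2)
```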
QUESTION 7:
You are given some data points for two different classes.
Class 1 points: {(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)}
Class 2 points: {(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)}
Compute the covariance matrices Σ1 and Σ2 for these two classes and choose the correct option.
Correct Answer: d
Detailed Solution:
QUESTION 8:
You are given some data points for two different classes.
Class 1 points: {(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)}
Class 2 points: {(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)}
Assume that the points are samples from normal distributions and a two-class Bayesian classifier is used to classify them. Also assume the prior probabilities of the classes are equal, i.e.,
𝑝(𝜔1) = 𝑝(𝜔2)
Which of the following is true about the corresponding decision boundary used in the classifier?
(Choose correct option regarding the given statements)
Statement i: Decision boundary passes through the midpoint of the line segment joining the
means of two classes
Statement ii: Decision boundary will be orthogonal bisector of the line joining the means of two
classes.
Correct Answer: a
Detailed Solution:
Σ1 = Σ2 = Σ = [8.29 −0.85; −0.85 8.29], but Σ is not an identity matrix, so only option a is correct.
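A sketch of the covariance computation (numpy assumed; `bias=True` divides by N = 7, which reproduces the 58/7 ≈ 8.29 diagonal entries quoted in the solution):

```python
# Class covariance matrices (numpy assumed). bias=True divides by N = 7,
# matching the 58/7 ~ 8.29 diagonal entries quoted in the solution.
import numpy as np

class1 = np.array([(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)])
class2 = np.array([(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)])

sigma1 = np.cov(class1.T, bias=True)
sigma2 = np.cov(class2.T, bias=True)
print(sigma1.round(2))   # equal for both classes
```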
QUESTION 9:
You are given some data points for two different classes.
Class 1 points: {(11,11), (13,11), (8,10), (9,9), (7,7), (7,5), (15,3)}
Class 2 points: {(7,11), (15,9), (15,7), (13,5), (14,4), (9,3), (11,3)}
Classify the following two new samples (𝐴 = (6,11), 𝐵 = (14,3)) using K-nearest neighbor.
Where K=3. Use Manhattan distance as a distance function.
Given two points (x_1, y_1) and (x_2, y_2), the Manhattan Distance d between them is:
d = |x_1 - x_2| + |y_1 - y_2|
Correct Answer: c
Detailed Solution:
Calculate the Manhattan distance of each point of Class 1 and Class 2 from A = (6, 11) and find the closest 3 points. Of these, 2 points belong to class 1 and 1 point belongs to class 2, so A is assigned to class 1.
Follow a similar procedure for point B.
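The procedure can be sketched as a minimal 3-NN classifier with Manhattan distance (helper names are illustrative):

```python
# Minimal 3-NN classifier with Manhattan distance; helper names are illustrative.
from collections import Counter

class1 = [(11, 11), (13, 11), (8, 10), (9, 9), (7, 7), (7, 5), (15, 3)]
class2 = [(7, 11), (15, 9), (15, 7), (13, 5), (14, 4), (9, 3), (11, 3)]
data = [(p, 1) for p in class1] + [(p, 2) for p in class2]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def knn(x, k=3):
    # majority vote among the k nearest training points
    nearest = sorted(data, key=lambda pl: manhattan(pl[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((6, 11)), knn((14, 3)))   # A -> class 1, B -> class 2
```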
QUESTION 10:
Suppose you are solving a four-class problem. How many discriminant functions will you need?
a. 1
b. 2
c. 3
d. 4
Correct Answer: d
Detailed Solution:
For an n-class problem we need n discriminant functions.
Deep Learning
Assignment- Week 3
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Find the scalar projection of vector b = <-3, 2> onto vector a = <1, 1>. Choose the correct option.
a. 0
b. 1/√2
c. −1/√2
d. −1/2
Correct Answer: c
Detailed Solution:
The scalar projection of b onto a is given by (b · a)/|a| = (−3×1 + 2×1)/√(1² + 1²) = −1/√2.
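The same computation in a couple of lines:

```python
# Scalar projection of b onto a: (b . a) / |a|
import math

b = (-3, 2)
a = (1, 1)
proj = (b[0] * a[0] + b[1] * a[1]) / math.hypot(a[0], a[1])
print(proj)   # -1/sqrt(2), about -0.707
```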
QUESTION 2:
Suppose there is a feature vector represented as [1, 4, 3]. What is the distance of this feature vector from the separating plane x1 + 2x2 − 2x3 + 3 = 0? Choose the correct option.
a. 1
b. 5
c. 3
d. 2
Correct Answer: d
Detailed Solution:
The distance of a vector [y1, y2, y3] from the plane ax1 + bx2 + cx3 + d = 0 is given by
d = |ay1 + by2 + cy3 + d| / √(a² + b² + c²)
= |1×1 + 2×4 + (−2)×3 + 3| / √(1² + 2² + (−2)²) = 6/3 = 2
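The point-to-plane formula can be wrapped in a small helper (name is illustrative):

```python
# Distance from a 3D point to the plane a*x + b*y + c*z + d = 0
# (helper name is illustrative).
import math

def plane_distance(point, coeffs):
    a, b, c, d = coeffs
    x, y, z = point
    return abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)

print(plane_distance((1, 4, 3), (1, 2, -2, 3)))   # 6/3 = 2.0
```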
QUESTION 3:
If we employ SVM to realize two input logic gates, then which of the following will be true?
a. The weight vector for AND gate and OR gate will be same.
b. The margin for AND gate and OR gate will be same.
c. Both the margin and weight vector will be same for AND gate and OR
gate.
d. None of the weight vector and margin will be same for AND gate and
OR gate.
Correct Answer: b
Detailed Solution:
Although the weight vectors are not the same, the margin is the same.
QUESTION 4:
Suppose we have the below set of points with their respective classes as shown in the table.
Answer the following question based on the table.
X Y Class Label
1 1 +1
-1 -1 -1
2 2 +1
-1 2 +1
1 -1 -1
What can be a possible decision boundary of the SVM for the given points?
a. 𝑦=0
b. 𝑥=0
c. 𝑥=𝑦
d. 𝑥 +𝑦 = 1
Correct Answer: a
Detailed Solution:
QUESTION 5:
Suppose we have the below set of points with their respective classes as shown in the table.
Answer the following question based on the table.
X Y Class Label
1 1 +1
-1 -1 -1
2 2 +1
-1 2 +1
1 -1 -1
Find the decision boundary of the SVM trained on these points and choose which of the
following statements are true based on the decision boundary.
i) The point (-1,-2) is classified as -1
ii) The point (-1,-2) is classified as +1
iii) The point (1,-2) is classified as -1
iv) The point (1,-2) is classified as +1
Correct Answer: b
Detailed Solution:
The decision boundary is y = 0. For the point (−1, −2), −2 < 0, so the point is classified as −1. Similarly, for the point (1, −2), −2 < 0, so the point is classified as −1.
QUESTION 6:
The shape of the loss landscape during optimization of SVM resembles which structure?
a. Linear
b. Ellipsoidal
c. Non-convex with multiple possible local minimum
d. Paraboloid
Correct Answer: d
Detailed Solution:
In SVM the objective is to find the maximum-margin hyperplane (w), i.e., minimize ½‖w‖² subject to the margin constraints yi(wᵗxi + b) ≥ 1.
The above optimization is a quadratic optimization with a paraboloid landscape for the loss function.
______________________________________________________________________________
QUESTION 7:
How many local minima can be encountered while solving the optimization for maximizing the margin for SVM?
a. 2
b. 1
c. ∞ (infinite)
d. 0
Correct Answer: b
Detailed Solution:
In SVM the objective is to find the maximum-margin hyperplane (w), i.e., minimize ½‖w‖² subject to the margin constraints yi(wᵗxi + b) ≥ 1.
The above optimization is a quadratic optimization with a paraboloid landscape for the loss function. Since the shape is a paraboloid, there can be only 1 minimum, which is global.
______________________________________________________________________________
QUESTION 8:
Suppose we have one feature x ∈ R and a binary class y. The dataset consists of 3 points: p1: (x1, y1) = (−1, −1), p2: (x2, y2) = (1, 1), p3: (x3, y3) = (3, 1). Which of the following is true with respect to SVM?
a. Maximum margin will increase if we remove the point p2 from the training set.
b. Maximum margin will increase if we remove the point p3 from the training set.
c. Maximum margin will remain same if we remove the point p2 from the training set.
d. None of the above.
Correct Answer: a
Detailed Solution:
Here the point p2 is a support vector, if we remove the point p2 then maximum margin will
increase.
______________________________________________________________________________
QUESTION 9:
Choose the correct option regarding classification using SVM for two classes
Statement i: While designing an SVM for two classes, the equation yi(aᵗxi + b) ≥ 0 is used to choose the separating plane using the training vectors.
Statement ii: During inference, for an unknown vector xj, if yj(aᵗxj + b) ≥ 0, then the vector can be assigned class 1.
Statement iii: During inference, for an unknown vector xj, if (aᵗxj + b) > 0, then the vector can be assigned class 1.
Statement iv: While designing an SVM for two classes, the equation yi(aᵗxi + b) ≥ 1 is used to choose the separating plane using the training vectors.
Correct Answer: d
Detailed Solution:
___________________________________________________________________________
QUESTION 10:
Find the distance of the 3D point, 𝑃 = (−2, 4, 1) from the plane defined by
2𝑥 + 3𝑦 + 6𝑧 + 7 = 0?
a. 3
b. 4
c. 0
d. ∞ (infinity)
Correct Answer: a
Detailed Solution:
Distance = |2(−2) + 3(4) + 6(1) + 7| / √(2² + 3² + 6²) = |−4 + 12 + 6 + 7| / 7 = 21/7 = 3.
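The same formula as a quick check (helper name is illustrative):

```python
import math

def plane_distance(point, coeffs):
    # |ax + by + cz + d| / sqrt(a^2 + b^2 + c^2); helper name is illustrative
    a, b, c, d = coeffs
    x, y, z = point
    return abs(a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)

print(plane_distance((-2, 4, 1), (2, 3, 6, 7)))   # 21/7 = 3.0
```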
______________________________________________________________
************END*******
Deep Learning
Assignment- Week 4
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Let X and Y be two features used to discriminate between two classes. The values and class labels of the features are given here under. What is the minimum number of neuron layers required to design the neural network classifier?
X Y #Class
1 2 Class-II
0 0 Class-I
-2 -2 Class-I
3 2 Class-II
-1 -1 Class-I
a. 2
b. 1
c. 5
d. 4
Correct Answer: b
Detailed Solution:
Please refer to the lectures of week 4. Plot the feature points; they are linearly separable. Hence a single layer is able to do the classification task.
QUESTION 2:
Let us assume that we implement an AND function using a single neuron as shown below. The activation function 𝑓𝑁𝐿(∙) of our neuron is denoted as: f(y) = 0 for y < 30, f(y) = 1 for y ≥ 30. What would be a possible combination of the weights and bias?
a. Bias = 5, w1 = 5, w2 = 25
b. Bias = 10, w1 = 5, w2 = 5
c. Bias = 10, w1 = 15, w2 = 15
d. Bias = 5, w1 = 10, w2 = 10
Correct Answer: c
Detailed Solution:
For AND function, (w1*x1+w2*x2+bias) should be >= 30 only when x1 and x2 equal to 1
and the expression should be less than 30 for all other values of x1 and x2. Only option C
satisfies that. Please refer to the lectures of week 4.
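A brute-force check of all four options against the AND truth table, under the thresholded activation given in the question:

```python
# Brute-force check: which (bias, w1, w2) option realizes AND under
# the threshold activation f(y) = 1 iff y >= 30?
def fires(bias, w1, w2, x1, x2):
    return 1 if w1 * x1 + w2 * x2 + bias >= 30 else 0

options = {"a": (5, 5, 25), "b": (10, 5, 5), "c": (10, 15, 15), "d": (5, 10, 10)}
results = {name: all(fires(b, w1, w2, x1, x2) == (x1 & x2)
                     for x1 in (0, 1) for x2 in (0, 1))
           for name, (b, w1, w2) in options.items()}
print(results)   # only option "c" passes
```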
QUESTION 3:
Which among the following options give the range for a logistic function?
a. -1 to 1
b. -1 to 0
c. 0 to 1
d. 0 to infinity
Correct Answer: c
Detailed Solution:
QUESTION 4:
a. [0.58,0.11, 0.31]
b. [0.43,0.24, 0.33]
c. [0.60,0.10,0.30]
d. [0.67, 0.09,0.24]
Correct Answer: d
Detailed Solution:
SoftMax: σ(x_j) = e^(x_j) / Σ_{k=1}^{n} e^(x_k), for j = 1, 2, …, n.
Therefore, σ(3) = 0.67, and similarly for the other values.
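A softmax sketch. The question's actual input is not reproduced in this document; logits (3, 1, 2) are assumed here for illustration because they reproduce both σ(3) = 0.67 and option d.

```python
# Softmax sketch. Logits (3, 1, 2) are an assumption made for illustration:
# they reproduce sigma(3) = 0.67 and option d exactly.
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([3, 1, 2])
print([round(p, 2) for p in probs])   # [0.67, 0.09, 0.24]
```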
QUESTION 5:
Which of the following options is true?
a. In Batch Gradient Descent, a small batch of sample is selected randomly instead
of the whole data set for each iteration.
b. In Batch Gradient Descent, the whole data set is processed together for update
in each iteration.
c. Batch Gradient Descent considers only one sample for updates and has noisier
updates.
d. Batch Gradient Descent produces noisier updates than Stochastic Gradient
Descent
Correct Answer: b
Detailed Solution:
Batch Gradient Descent considers whole dataset for updates in each iteration.
QUESTION 6:
Choose the correct option:
Correct Answer: b
Detailed Solution:
Follow lecture 17
______________________________________________________________________________
QUESTION 7:
Choose the correct option about the assumptions that are generally made during optimization in machine learning.
Correct Answer: c
Detailed Solution:
QUESTION 8:
An artificial neuron receives n inputs x1, x2, x3, …, xn with weights w1, w2, w3, …, wn attached to the input links. The weighted sum _________________ is computed and passed on to a non-linear filter Φ, called the activation function, to release the output. Fill in the blank by choosing one option from the following.
a. Σi wi
b. Σi xi
c. Σi wi + Σi xi
d. Σi wi xi
Correct Answer: d
Detailed Solution:
QUESTION 9:
Consider the below neural network. p̂ is the output after applying the non-linearity function f_NL(·) on y. The non-linearity f_NL(·) is given as a step function, i.e., f(v) = 0 if v < 0; 1 if v ≥ 0.
Choose the correct outputs generated by the network when the inputs are
{𝑥 1 = 1, 𝑥 2 = 0, 𝑥 3 = 0} and {𝑥 1 = 0, 𝑥 2 = 1, 𝑥 3 = 1}. Outputs are in the same order as inputs.
a. 1, 1
b. 0, 0
c. 1, 0
d. 0, 1
Correct Answer: c
Detailed Solution:
For input {x1 = 1, x2 = 0, x3 = 0}: y = x1w1 + x2w2 + x3w3 = 2×1 − 1.5×0 + 1×0 = 2, so f(y) = 1 as y ≥ 0.
For input {x1 = 0, x2 = 1, x3 = 1}: y = 2×0 − 1.5×1 + 1×1 = −0.5, so f(y) = 0 as y < 0.
QUESTION 10:
Consider the below neural network. p̂ is the output after applying the non-linearity function f_NL(·) on y. The non-linearity f_NL(·) is given as a step function, i.e., f(v) = 0 if v < 0; 1 if v ≥ 0.
Choose the correct set of weights 𝑤1 , 𝑤2 and bias for which the network behaves as an OR
function.
a. 𝑤1 = 1, 𝑤2 = 1.5, 𝑏𝑖𝑎𝑠 = 1
b. 𝑤1 = 1, 𝑤2 = 0.5, 𝑏𝑖𝑎𝑠 = −1
c. 𝑤1 = 1, 𝑤2 = 1.5, 𝑏𝑖𝑎𝑠 = −1
d. 𝑤1 = 1, 𝑤2 = −0.5, 𝑏𝑖𝑎𝑠 = 1
Correct Answer: c
For the OR function, y = x1w1 + x2w2 + bias and f(y) = 1 when y ≥ 0. So x1w1 + x2w2 + bias should be ≥ 0 for (x1 = 1, x2 = 1), (x1 = 1, x2 = 0), (x1 = 0, x2 = 1) and should be < 0 for (x1 = 0, x2 = 0). Only option C satisfies this condition.
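The same exhaustive check in code, with the step activation f(y) = 1 iff y ≥ 0:

```python
# Exhaustive check of each (w1, w2, bias) option against the OR truth table,
# with the step activation f(y) = 1 iff y >= 0.
def fires(w1, w2, bias, x1, x2):
    return 1 if x1 * w1 + x2 * w2 + bias >= 0 else 0

options = {"a": (1, 1.5, 1), "b": (1, 0.5, -1), "c": (1, 1.5, -1), "d": (1, -0.5, 1)}
results = {name: all(fires(w1, w2, b, x1, x2) == (x1 | x2)
                     for x1 in (0, 1) for x2 in (0, 1))
           for name, (w1, w2, b) in options.items()}
print(results)   # only option "c" passes
```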
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 5
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
What is the main benefit of stacking multiple layers of neurons with non-linear activation functions over a single-layer perceptron?
Correct Answer: c
Detailed Solution:
QUESTION 2:
For a 2-class classification problem, what is the minimum number of nodes required for the
output layer of a multi-layered neural network?
a. 2
b. 1
c. 3
d. None of the above
Correct Answer: b
Detailed Solution:
Only 1 node is enough. We can expect that node to be activated (have high activation value)
only when class = +1 else the node should NOT be activated (have activation close to zero).
We can use the binary (2-class) cross entropy loss to train such a model.
QUESTION 3:
What will be the output from node 𝑎3 in the following neural network setup when the inputs are (𝑥1, 𝑥2) = (0, 1)? The activation function used in each of the three nodes 𝑎1, 𝑎2 and 𝑎3 is zero-thresholding, i.e., f(v) = 0 if v < 0; 1 if v ≥ 0.
a. -1
b. 0
c. 1
d. 0.5
Correct Answer: b
Detailed Solution:
______________________________________________________________________________
QUESTION 4:
Which basic logic gate is implemented by the following neural network setup? The activation function used in each of the three nodes 𝑎1, 𝑎2 and 𝑎3 is zero-thresholding, i.e., f(v) = 0 if v < 0; 1 if v ≥ 0.
a. AND
b. NOR
c. XNOR
d. XOR
Correct Answer: c
Detailed Solution:
For input (0, 0) the network output is 1, which matches the XNOR truth table.
QUESTION 5:
Find the gradient component ∂J/∂w1 for the network shown below if J(·) = (p̂ − p)² is the loss function, p is the target and the non-linearity f_NL(·) is the sigmoid activation function represented as σ(·).
a. 2p̂ × (1 − σ(y)) × x1
b. 2(p̂ − p) × σ(y) × (1 − σ(y)) × x1
c. 2(p̂ − p) × (1 − σ(y)) × x1
d. 2(1 − p) × (1 − σ(y)) × x1
Correct Answer: b
Detailed Solution:
J(·) = (p̂ − p)²
p̂ = f_NL(y)
y = x1w1 + x2w2 + x3w3 + 1
By the chain rule,
∂J/∂w1 = (∂J/∂p̂) · (∂p̂/∂y) · (∂y/∂w1)
∂J/∂p̂ = 2(p̂ − p), ∂p̂/∂y = σ(y) × (1 − σ(y)), ∂y/∂w1 = x1
∂J/∂w1 = 2(p̂ − p) × σ(y) × (1 − σ(y)) × x1
QUESTION 6:
Find the output 𝑝̂ corresponding to input {𝑥 1 = 1, 𝑥 2 = 1, 𝑥 3 = 0}, for the network shown
below. The non-linearity 𝑓𝑁𝐿 (∙) is the sigmoid activation function.
a. 0
b. 1
c. 0.5
d. 0.25
Correct Answer: c
Detailed Solution:
y = x1w1 + x2w2 + x3w3 + b = 1×2 − 1×1 + 0×1 − 1 = 0; p̂ = σ(y) = σ(0) = 0.5
QUESTION 7:
Find the gradient component ∂J/∂w2 for the network shown below if J(·) = (p̂ − p)² is the loss function, p = 1 is the target and the non-linearity f_NL(·) is the sigmoid activation function represented as σ(·).
a. −0.5
b. −1
c. 0
d. −0.25
Correct Answer: d
Detailed Solution:
J(·) = (p̂ − p)²
p̂ = f_NL(y)
y = x1w1 + x2w2 + x3w3 + b = 1×2 − 1×1 + 0×1 − 1 = 0; p̂ = σ(0) = 0.5
Using the chain rule,
∂J/∂w2 = (∂J/∂p̂) · (∂p̂/∂y) · (∂y/∂w2)
∂J/∂p̂ = 2(p̂ − p), ∂p̂/∂y = σ(y) × (1 − σ(y)), ∂y/∂w2 = x2
∂J/∂w2 = 2(p̂ − p) × σ(y) × (1 − σ(y)) × x2 = 2 × (0.5 − 1) × 0.5 × (1 − 0.5) × 1 = −0.25
QUESTION 8:
What will be the updated value of w2 after the first iteration from the current state of the network shown below if J(·) = (p̂ − p)² is the loss function, p = 1 is the target and the non-linearity f_NL(·) is the sigmoid activation function represented as σ(·)?
a. −0.5
b. −1
c. 0
d. −0.25
Correct Answer: a
Detailed Solution:
J(·) = (p̂ − p)²
p̂ = f_NL(y)
y = x1w1 + x2w2 + x3w3 + b = 1×2 − 1×1 + 0×1 − 1 = 0; p̂ = σ(0) = 0.5
Using the chain rule,
∂J/∂w2 = (∂J/∂p̂) · (∂p̂/∂y) · (∂y/∂w2)
∂J/∂p̂ = 2(p̂ − p), ∂p̂/∂y = σ(y) × (1 − σ(y)), ∂y/∂w2 = x2
∂J/∂w2 = 2(p̂ − p) × σ(y) × (1 − σ(y)) × x2 = 2 × (0.5 − 1) × 0.5 × (1 − 0.5) × 1 = −0.25
Updated w2 = w2 − η (∂J/∂w2) = −1 − 2 × (−0.25) = −1 + 0.5 = −0.5
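The full update can be traced numerically. The input x = (1, 1, 0), weights (2, −1, 1), bias −1, target p = 1 and learning rate η = 2 are read off the printed solution.

```python
# Tracing the gradient-descent update numerically; values read off the solution.
import math

def sigma(v):
    return 1 / (1 + math.exp(-v))

x1, x2, x3 = 1, 1, 0
w1, w2, w3, b = 2, -1, 1, -1
p, eta = 1, 2

y = x1 * w1 + x2 * w2 + x3 * w3 + b                     # 0
p_hat = sigma(y)                                        # 0.5
grad_w2 = 2 * (p_hat - p) * p_hat * (1 - p_hat) * x2    # -0.25
w2_new = w2 - eta * grad_w2                             # -0.5
print(grad_w2, w2_new)
```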
QUESTION 9:
Suppose a neural network has 3 input nodes x, y, z. There are 2 neurons, Q and F: Q = x + y and F = Q * z. What is the gradient of F with respect to x, y and z? Assume (x, y, z) = (−2, 5, −4).
a. (-4, 3, -3)
b. (-4, -4, 3)
c. (4, 4, -3)
d. (3, 3, 4)
Correct Answer: b
Detailed Solution:
F = Q·z, so ∂F/∂z = Q = x + y = 3
F = Q·z = (x + y)·z, so ∂F/∂x = z = −4 and ∂F/∂y = z = −4
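The same backward pass, written out as code:

```python
# Backprop by hand through F = (x + y) * z, as in the solution.
x, y, z = -2, 5, -4
Q = x + y          # 3
F = Q * z          # -12
dF_dz = Q          # 3
dF_dQ = z          # -4
dF_dx = dF_dQ * 1  # dQ/dx = 1  -> -4
dF_dy = dF_dQ * 1  # dQ/dy = 1  -> -4
print(dF_dx, dF_dy, dF_dz)   # -4 -4 3
```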
QUESTION 10:
Suppose a fully-connected neural network has a single hidden layer with 15 nodes. The input is
represented by a 5D feature vector and the number of classes is 3. Calculate the number of
parameters of the network. Consider there are NO bias nodes in the network?
a. 225
b. 75
c. 78
d. 120
Correct Answer: d
Detailed Solution:
Number of parameters = 5 × 15 (input to hidden) + 15 × 3 (hidden to output) = 75 + 45 = 120.
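The count follows directly from the layer sizes:

```python
# Parameter count of a no-bias fully-connected net with layer sizes 5 -> 15 -> 3.
layers = [5, 15, 3]
params = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))
print(params)   # 75 + 45 = 120
```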
************END*******
Deep Learning
Assignment- Week 6
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Which of the following is not true for PCA? Choose the correct option.
Correct Answer: d
Detailed Solution:
QUESTION 2:
What is the output of the sigmoid function for an input with dynamic range [0, ∞]?
a. [0, 1]
b. [−1, 1]
c. [0.5, 1]
d. [0.25, 1]
Correct Answer: c
Detailed Solution:
Sigmoid(x) = 1 / (1 + e^(−x))
If x = 0, Sigmoid(0) = 1 / (1 + e⁰) = 1 / (1 + 1) = 0.5
If x = ∞, Sigmoid(∞) = 1 / (1 + e^(−∞)) = 1 / (1 + 0) = 1
QUESTION 3:
A zero-bias autoencoder has 3 input neurons, 1 hidden neuron and 3 output neurons. If the network is perfectly trained using an input [2, 3, 5]ᵀ, what would be the values of the weights in the autoencoder?
a. W1 = [1 1 1], W2 = [2, 3, 5]ᵀ
b. W1 = [1 1 1], W2 = [0.2, 0.3, 0.5]ᵀ
c. W1 = [0.2 0.3 0.5], W2 = [1, 1, 1]ᵀ
d. W1 = [2 3 5], W2 = [1, 1, 1]ᵀ
Correct Answer: b
Detailed Solution:
y = W2 · W1 · x ····· (1)
If the network is perfectly trained, y = x = [2, 3, 5]ᵀ
Equation 1 is only satisfied if W1 = [1 1 1] and W2 = [0.2, 0.3, 0.5]ᵀ
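A quick numerical check of option (b), assuming linear units and no biases (numpy assumed available):

```python
# Numerical check of option (b): linear units, no biases (numpy assumed).
import numpy as np

x = np.array([2.0, 3.0, 5.0])
W1 = np.array([[1.0, 1.0, 1.0]])       # encoder: 3 inputs -> 1 hidden unit
W2 = np.array([[0.2], [0.3], [0.5]])   # decoder: 1 hidden unit -> 3 outputs

h = W1 @ x   # hidden code: [10.]
y = W2 @ h   # reconstruction, equals x up to float rounding
print(h, y)
```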
QUESTION 4:
A single hidden and no-bias autoencoder has 100 input neurons and 10 hidden neurons. What
will be the number of parameters associated with this autoencoder?
a. 1000
b. 2000
c. 2110
d. 1010
Correct Answer: b
Detailed Solution:
Number of parameters = 100 × 10 (encoder) + 10 × 100 (decoder) = 2000.
QUESTION 5:
Consider the 2-layer neural network shown below. The weights are represented as follows:
w^k_mn = weight between the nth node of the kth layer and the mth node of the (k−1)th layer. The 0th node is the bias node, fixed at 1, as depicted in the diagram.
e.g. w^1_32 = weight between the 2nd node of the hidden layer and the 3rd node of the input layer. Refer to the diagram. All weights have not been shown to maintain clarity.
Sigmoid activation function is applied to both the hidden layer and the output layer. The loss
function is defined as 𝐽(∙) = 0.5(𝑦 − 𝑡) 2 where 𝑡 is the true label.
a. 0.13, 0.54
b. 0.33, 0.52
c. 0.23, 0.51
d. 0.13, 0.51
Correct Answer: b
Detailed Solution:
Let the input vector be X = [1 1 0 1]ᵀ (the leading 1 is the bias node).
a = σ(W1 X), where
W1 X = [−0.4 0.2 0.4 −0.5; 0.2 −0.3 0.1 0.2] [1; 1; 0; 1] = [−0.7; 0.1]
σ([−0.7; 0.1]) = [0.33; 0.52]
QUESTION 6:
Consider the 2-layer neural network shown below. The weights are represented as follows:
w^k_mn = weight between the nth node of the kth layer and the mth node of the (k−1)th layer. The 0th node is the bias node, fixed at 1, as depicted in the diagram.
e.g. w^1_32 = weight between the 2nd node of the hidden layer and the 3rd node of the input layer. Refer to the diagram. All weights have not been shown to maintain clarity.
Sigmoid activation function is applied to both the hidden layer and the output layer. The loss
function is defined as 𝐽(∙) = 0.5(𝑦 − 𝑡) 2 where 𝑡 is the true label.
Find the final output at node 𝑦 for given input {𝑥 1 = 1, 𝑥 2 = 0, 𝑥 3 = 1}? Choose the closest
answer.
a. 0.13
b. 0.33
c. 0.48
d. 0.51
Correct Answer: c
Detailed Solution:
The hidden vector is A = [1 0.33 0.52]ᵀ, as calculated in the previous question.
y = σ(W2 A), where W2 A = [0.1 −0.3 −0.2] [1; 0.33; 0.52] = −0.1
σ(−0.1) = 0.48
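The forward pass can be reproduced in a few lines (numpy assumed). Note the solution rounds the hidden activations to [0.33, 0.52] before the output layer; at full precision the output is ≈ 0.47, and either way option c is the closest.

```python
# Forward pass with explicit bias nodes (numpy assumed).
import numpy as np

def sigmoid(v):
    return 1 / (1 + np.exp(-v))

X = np.array([1.0, 1.0, 0.0, 1.0])       # [bias, x1, x2, x3] for input (1, 0, 1)
W1 = np.array([[-0.4, 0.2, 0.4, -0.5],
               [0.2, -0.3, 0.1, 0.2]])
W2 = np.array([0.1, -0.3, -0.2])         # weights for [bias, a1, a2]

a = sigmoid(W1 @ X)                      # hidden activations, about [0.33, 0.52]
A = np.concatenate(([1.0], a))           # prepend the hidden-layer bias node
out = float(sigmoid(W2 @ A))
print(a.round(2), round(out, 2))
```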
QUESTION 7:
Consider the 2-layer neural network shown below. The weights are represented as follows:
w^k_mn = weight between the nth node of the kth layer and the mth node of the (k−1)th layer. The 0th node is the bias node, fixed at 1, as depicted in the diagram.
e.g. w^1_32 = weight between the 2nd node of the hidden layer and the 3rd node of the input layer. Refer to the diagram. All weights have not been shown to maintain clarity.
Sigmoid activation function is applied to both the hidden layer and the output layer. The loss
function is defined as 𝐽(∙) = 0.5(𝑦 − 𝑡) 2 where 𝑡 is the true label.
a. −0.09
b. −0.11
c. −0.13
d. −0.04
Correct Answer: d
Detailed Solution:
t = 1 and y = 0.48
Let a = W2 A = w^2_01 + w^2_11 a1 + w^2_21 a2 = [0.1 −0.3 −0.2] [1; 0.33; 0.52] = −0.1
∂J/∂w^2_11 = (∂J/∂y) · (∂y/∂a) · (∂a/∂w^2_11)
∂J/∂y = (y − t), ∂y/∂a = σ(a) × (1 − σ(a)), ∂a/∂w^2_11 = a1
∂J/∂w^2_11 = (y − t) × σ(a) × (1 − σ(a)) × a1 = (0.48 − 1) × 0.48 × (1 − 0.48) × 0.33 = −0.04
QUESTION 8:
Consider the 2-layer neural network shown below. The weights are represented as follows:
w^k_mn = weight between the nth node of the kth layer and the mth node of the (k−1)th layer. The 0th node is the bias node, fixed at 1, as depicted in the diagram.
e.g. w^1_32 = weight between the 2nd node of the hidden layer and the 3rd node of the input layer. Refer to the diagram. All weights have not been shown to maintain clarity.
Sigmoid activation function is applied to both the hidden layer and the output layer. The loss
function is defined as 𝐽(∙) = 0.5(𝑦 − 𝑡) 2 where 𝑡 is the true label.
a. −0.29
b. −0.1
c. −0.14
d. −0.04
Correct Answer: c
Detailed Solution:
t = 1 and y = 0.48
Let a = W2 A = w^2_01 + w^2_11 a1 + w^2_21 a2 = [0.1 −0.3 −0.2] [1; 0.33; 0.52] = −0.1
∂J/∂w^2_21 = (∂J/∂y) · (∂y/∂a) · (∂a/∂w^2_21)
∂J/∂y = (y − t), ∂y/∂a = σ(a) × (1 − σ(a)), ∂a/∂w^2_21 = a2
∂J/∂w^2_21 = (y − t) × σ(a) × (1 − σ(a)) × a2 = (0.48 − 1) × 0.48 × (1 − 0.48) × 0.52 = −0.07
Updated w^2_21 = w^2_21 − η (∂J/∂w^2_21) = −0.2 − 0.9 × (−0.07) = −0.2 + 0.06 = −0.14
QUESTION 9:
y = min(a, b) and a > b. What are the values of dy/da and dy/db?
a. 1, 0
b. 0, 1
c. 0, 0
d. 1, 1
Correct Answer: b
Detailed Solution:
Since a > b, y = min(a, b) = b. Hence dy/da = 0 and dy/db = 1.
QUESTION 10:
Let's say vectors a⃗ = {2, 4} and b⃗ = {n, 1} form the first two principal components after applying PCA. Under such circumstances, which among the following can be a possible value of n?
a. 2
b. -2
c. 0
d. 1
Correct Answer: b
Detailed Solution:
Principal components are mutually orthogonal, so a⃗ · b⃗ = 2n + 4 = 0, which gives n = −2.
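The orthogonality check behind the solution, evaluated over all four options:

```python
# Principal components must be mutually orthogonal, so a . b = 2n + 4 must be 0.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

for n in (2, -2, 0, 1):
    print(n, dot((2, 4), (n, 1)))   # only n = -2 gives 0
```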
____________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 7
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Select the correct option about the Sparse Autoencoder.
Statement 2: The idea is to encourage the network to learn an encoding and decoding which rely only on activating a small number of neurons
Correct Answer: c
Detailed Solution:
______________________________________________________________________________
QUESTION 2:
Select the correct option about denoising autoencoders.
Statement A: The loss is between the original input and the reconstruction from a noisy version
of the input
Correct Answer: d
Detailed Solution:
For a denoising autoencoder, both statements are true. Thus option (d) is correct
______________________________________________________________________________
QUESTION 3:
Which of the following autoencoder methods uses corrupted versions of the input?
a. Overcomplete design
b. Undercomplete Design
c. Sparse Design
d. Denoising Design
Correct Answer: d
Detailed Solution:
______________________________________________________________________________
QUESTION 4:
Which of the following autoencoder methods uses a hidden layer with fewer units than the
input layer?
a. Overcomplete design
b. Undercomplete Design
c. Sparse Design
d. Denoising Design
Correct Answer: b
Detailed Solution:
QUESTION 5:
Correct Answer: b
Detailed Solution:
Except option (b), all other options are true about autoencoders
____________________________________________________________________________
QUESTION 6:
Find the value of δ(t − 34) ∗ x(t + 56), where δ(t) is the delta function and ∗ is the convolution operation.
a. 𝑥(𝑡 + 56)
b. 𝑥(𝑡 + 32)
c. 𝑥(𝑡 + 22)
d. 𝑥(𝑡 − 56)
Correct Answer: c
Detailed Solution:
Convolution with a shifted impulse shifts the signal: δ(t − t₀) ∗ x(t) = x(t − t₀). Here, δ(t − 34) ∗ x(t + 56) = x(t + 56 − 34) = x(t + 22).
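The discrete analogue of the shifting property is easy to see with NumPy: convolving a signal with a delayed unit impulse delays the signal. A small sketch (a shift of 2 samples is used instead of 34 only to keep the arrays short):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Unit impulse delayed by 2 samples: delta[n - 2]
delta_shifted = np.array([0.0, 0.0, 1.0])

y = np.convolve(x, delta_shifted)   # full convolution

# The output is x delayed by 2 samples (zeros padded in front).
print(y)   # [0. 0. 1. 2. 3. 4. 5.]
```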
_____________________________________________________________________________
QUESTION 7:
Impulse response is the output of ________________system due to impulse input applied at
time=0. Fill in the blanks from the options below.
a. Linear
b. Time Varying
c. Time Invariant
d. Linear And Time Invariant
Correct Answer: d
Detailed Solution:
Impulse response is the output of an LTI system due to an impulse input applied at time t=0 or n=0.
The behaviour of an LTI system is characterized by its impulse response.
_________________________________________________________________________
QUESTION 8:
The impulse function is ___ when t=0. Fill in the blanks.
a. 1
b. 0
c. Infinity
d. None of the above
Correct Answer: a
Detailed Solution:
______________________________________________________________________________
QUESTION 9:
Given the image below where, Row 1: Original Input, Row 2: Noisy input, Row 3: Reconstructed
output. Choose one of the following variants of autoencoder that is most suited to get Row 3
from Row 2.
a. Stacked autoencoder
b. Sparse autoencoder
c. Denoising autoencoder
d. None of the above
Correct Answer: c
Detailed Solution:
Reconstruction of the original noise-free data from a noisy input is the task of a denoising autoencoder
____________________________________________________________________________
QUESTION 10:
Which of the following is true for Contractive Autoencoders?
a. penalizing instances where a small change in the input leads to a large change in
the encoding space
b. penalizing instances where a large change in the input leads to a small change in
the encoding space
c. penalizing instances where a small change in the input leads to a small change in
the encoding space
d. None of the above
Correct Answer: a
Detailed Solution:
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 8
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Which of the following is false about CNN?
Detailed Solution:
QUESTION 2:
The input image has been converted into a matrix of size 64 X 64 and a kernel/filter of size 5x5
with a stride of 1 and no padding. What will be the size of the convoluted matrix?
a. 5x5
b. 59x59
c. 64x64
d. 60x60
Correct Answer: d
Detailed Solution:
The size of the convoluted matrix is given by CxC where C=((I-F+2P)/S)+1, where C is the
size of the convoluted matrix, I is the size of the input matrix, F the size of the filter matrix,
S the stride and P the padding applied to the input matrix. Here P=0, I=64, F=5 and S=1.
Therefore, C = (64 − 5)/1 + 1 = 60 and the answer is 60x60.
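The size formula can be wrapped in a small helper (a sketch; the function name is ours), which also covers the valid-padding case in the next question:

```python
def conv_output_size(i, f, p=0, s=1):
    """Output side length of a convolution: ((I - F + 2P) / S) + 1."""
    return (i - f + 2 * p) // s + 1

print(conv_output_size(64, 5))   # 60 (this question: I=64, F=5, P=0, S=1)
print(conv_output_size(4, 3))    # 2  (valid padding: I=4, F=3, P=0, S=1)
```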
______________________________________________________________________________
QUESTION 3:
Filter size of 3x3 is convolved with matrix of size 4x4 (stride=1). What will be the size of output
matrix if valid padding is applied:
a. 4x4
b. 3x3
c. 2x2
d. 1x1
Correct Answer: c
Detailed Solution:
Valid padding means no padding is applied (P = 0). The output matrix after convolution
has dimension ((n – f + 2P)/S + 1) x ((n – f + 2P)/S + 1) = ((4 − 3)/1 + 1) x ((4 − 3)/1 + 1) = 2x2
______________________________________________________________________________
QUESTION 4:
Let us consider a Convolutional Neural Network having three different convolutional layers in
its architecture as:
Layer 3 of the above network is followed by a fully connected layer. If we give a 3-D
image input of dimension 39 X 39 to the network, then which of the following is the input
dimension of the fully connected layer.
a. 1960
b. 2200
c. 4563
d. 13690
Correct Answer: a
Detailed Solution:
The input image of dimension 39 X 39 X 3 convolves with 10 filters of size 3 X 3 with stride 1
and no padding. After these operations, we get an output of 37 X 37 X 10. Applying the size
formula to the remaining convolutional layers gives a final feature map whose flattened size
is 1960, the input dimension of the fully connected layer.
______________________________________________________________________________
QUESTION 5:
Suppose you have 64 convolutional kernels of size 3 x 3 with no padding and stride 1 in the first
layer of a convolutional neural network. You pass an input of dimension 1024x1024x3 through
this layer. What are the dimensions of the data which the next layer will receive?
a. 1020x1020x64
b. 1022x1022x64
c. 1021x1021x64
d. 1022x1022x3
Correct Answer: b
Detailed Solution:
Requires four hyperparameters: Number of filters K=64, their spatial extent F=3, the
stride S=1, the amount of padding P=0.
____________________________________________________________________________
QUESTION 6:
Consider a CNN model which aims at classifying an image as a rose, a marigold, a lily, or an
orchid (the test image contains exactly one of these at a time). The last (fully-
connected) layer of the CNN outputs a vector of logits, L, that is passed through a ____
activation that transforms the logits into probabilities, P. These probabilities are the model
predictions for each of the 4 classes.
a. Leaky ReLU
b. Tanh
c. ReLU
d. Softmax
Correct Answer: d
Detailed Solution:
Softmax works best if there is one true class per example, because it outputs a probability
vector whose entries sum to 1.
____________________________________________________________________________
QUESTION 7:
Suppose your input is a 300 by 300 color (RGB) image, and you use a convolutional layer with
100 filters that are each 5x5. How many parameters does this hidden layer have (without bias)
a. 2501
b. 2600
c. 7500
d. 7600
Correct Answer: c
Detailed Solution:
Each filter has 5 * 5 * 3 = 75 weights, and there are 100 such filters. As there is no bias, the
total number of parameters = 5 * 5 * 3 * 100 = 7500
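The count generalizes to filter_height × filter_width × input_channels × number_of_filters, plus one bias per filter if biases are used. A small sketch (the function name is ours):

```python
def conv_params(filter_h, filter_w, in_channels, num_filters, bias=False):
    # Each filter spans all input channels; optionally one bias per filter.
    per_filter = filter_h * filter_w * in_channels
    return num_filters * (per_filter + (1 if bias else 0))

print(conv_params(5, 5, 3, 100))             # 7500 (no bias)
print(conv_params(5, 5, 3, 100, bias=True))  # 7600 (option d, with bias)
```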
______________________________________________________________________________
QUESTION 8:
Which of the following activation functions can lead to vanishing gradients?
a. ReLU
b. Sigmoid
c. Leaky ReLU
d. None of the above
Correct Answer: b
Detailed Solution:
For sigmoid activation in its saturated regions, a large change in the input causes only a
small change in the output. Hence, the derivative becomes small. When more and more
layers use such an activation, the gradient of the loss function becomes very small, making
the network difficult to train.
___________________________________________________________________________
QUESTION 9:
Statement 1: Residual networks can be a solution for vanishing gradient problem
Statement 3: Residual networks can never be a solution for vanishing gradient problem
a. Statement 2 is correct
b. Statement 3 is correct
c. Both Statement 1 and Statement 2 are correct
d. Both Statement 2 and Statement 3 are correct
Correct Answer: c
Detailed Solution:
____________________________________________________________________________
QUESTION 10:
Input to SoftMax activation function is [0.5,0.5,1]. What will be the output?
a. [0.28,0.28,0.44]
b. [0.022,0.956, 0.022]
c. [0.045,0.910,0.045]
d. [0.42, 0.42,0.16]
Correct Answer: a
Detailed Solution:
SoftMax: σ(x_j) = e^{x_j} / Σ_{k=1}^{n} e^{x_k} for j = 1, 2, …, n
Therefore, σ(0.5) = e^{0.5} / (e^{0.5} + e^{0.5} + e^{1}) ≈ 0.28, and similarly for the other values
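The computation can be verified with a few lines of NumPy; the exact values are approximately [0.27, 0.27, 0.45], which matches option (a) up to the quiz's rounding:

```python
import numpy as np

x = np.array([0.5, 0.5, 1.0])
p = np.exp(x) / np.exp(x).sum()   # softmax over the logits

print(np.round(p, 2))   # [0.27 0.27 0.45], closest to option (a)
print(p.sum())          # probabilities sum to 1
```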
______________________________________________________________________
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
What can be a possible consequence of choosing a very small learning rate?
a. Slow convergence
b. Overshooting minima
c. Oscillations around the minima
d. All of the above
Correct Answer: a
Detailed Solution:
Choosing a very small learning rate can lead to slower convergence and thus option (a) is
correct.
______________________________________________________________________________
QUESTION 2:
The following is the equation of update vector for momentum optimizer. Which of the
following is true for 𝛾?
𝑉𝑡 = 𝛾𝑉𝑡−1 + 𝜂∇𝜃 𝐽(𝜃)
a. 𝛾 is the momentum term which indicates acceleration
b. 𝛾 is the step size
c. 𝛾 is the first order moment
d. 𝛾 is the second order moment
Correct Answer: a
Detailed Solution:
A fraction of the update vector of the past time step is added to the current update vector. 𝛾 is
that fraction which indicates how much acceleration you want and its value lies between 0 and 1.
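The update above can be sketched numerically on a 1-D example, J(θ) = θ² (the quadratic, γ = 0.9 and η = 0.1 are our choices, not from the question):

```python
gamma, eta = 0.9, 0.1     # momentum fraction and learning rate (our choice)
theta, v = 5.0, 0.0

def grad(theta):          # dJ/dtheta for J(theta) = theta**2
    return 2.0 * theta

for _ in range(100):
    v = gamma * v + eta * grad(theta)   # V_t = gamma * V_{t-1} + eta * grad
    theta -= v                          # parameter update using V_t

print(theta)   # small value, close to the minimum at 0
```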
______________________________________________________________________________
QUESTION 3:
Which of the following is true about momentum optimizer?
Correct Answer: d
Detailed Solution:
Options (a), (b) and (c) are all true for the momentum optimizer. Thus, option (d) is correct.
______________________________________________________________________________
QUESTION 4:
Let 𝐽(𝜃) be the cost function. Let the gradient descent update rule for 𝜃𝑖 be,
𝜃𝑖+1 = 𝜃𝑖 + ∇𝜃𝑖
a. −α dJ(θᵢ)/dθᵢ
b. α dJ(θᵢ)/dθᵢ
c. −dJ(θᵢ)/dθᵢ₊₁
d. dJ(θᵢ)/dθᵢ
Correct Answer: a
Detailed Solution:
Gradient descent update rule for θᵢ is
θᵢ₊₁ = θᵢ − α dJ(θᵢ)/dθᵢ, where α is the learning rate
______________________________________________________________________________
QUESTION 5:
A given cost function is of the form J(θ) = 6θ² − 6θ + 6. What is the weight update rule for
gradient descent optimization at step t+1? Consider α to be the learning rate.
a. 𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 − 1)
b. 𝜃𝑡+1 = 𝜃𝑡 + 6𝛼(2𝜃)
c. 𝜃𝑡+1 = 𝜃𝑡 − 𝛼(12𝜃 − 6 + 6)
d. 𝜃𝑡+1 = 𝜃𝑡 − 6𝛼(2𝜃 + 1)
Correct Answer: a
Detailed Solution:
∂J(θ)/∂θ = 12θ − 6
So, the weight update will be
θₜ₊₁ = θₜ − α(12θ − 6) = θₜ − 6α(2θ − 1)
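Running this update numerically shows convergence to the minimizer of J(θ) = 6θ² − 6θ + 6 at θ = 0.5 (a sketch; the initial value and α are our choices):

```python
alpha = 0.05
theta = 0.0
for _ in range(200):
    theta = theta - 6 * alpha * (2 * theta - 1)   # update from option (a)

print(round(theta, 4))   # 0.5, the minimizer of J
```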
______________________________________________________________________________
QUESTION 6:
If the first few iterations of gradient descent cause the function f(θ0,θ1) to increase rather than
decrease, then what could be the most likely cause for this?
Correct Answer: a
Detailed Solution:
If the learning rate were small enough, each gradient descent step would move a small way
downhill and decrease f(θ0,θ1) at least a little bit. If gradient descent instead increases the
objective value, that means the learning rate is too high.
______________________________________________________________________________
QUESTION 7:
For a function f(θ0,θ1), if θ0 and θ1 are initialized at a global minimum, then what should be the
values of θ0 and θ1 after a single iteration of gradient descent?
Correct Answer: b
Detailed Solution:
At a minimum (global or local), the derivative (gradient) is zero, so gradient descent will not
change the parameters; θ0 and θ1 stay at their initial values.
______________________________________________________________________________
QUESTION 8:
What can be one of the practical problems of exploding gradient?
a. Too large update of weight values leading to unstable network
b. Too small update of weight values inhibiting the network to learn
c. Too large update of weight values leading to faster convergence
d. Too small update of weight values leading to slower convergence
Correct Answer: a
Detailed Solution:
Exploding gradients are a problem where large error gradients accumulate and result in very
large updates to neural network model weights during training. This has the effect of your model
being unstable and unable to learn from your training data.
______________________________________________________________________________
QUESTION 9:
What are the steps for using a gradient descent algorithm?
1. Calculate error between the actual value and the predicted value
2. Update the weights and biases using gradient descent formula
3. Pass an input through the network and get values from output layer
4. Initialize weights and biases of the network with random values
5. Calculate gradient value corresponding to each weight and bias
a. 1, 2, 3, 4, 5
b. 5, 4, 3, 2, 1
c. 3, 2, 1, 5, 4
d. 4, 3, 1, 5, 2
Correct Answer: d
Detailed Solution:
Initialize random weights, and then start passing input instances and calculate error response
from output layer and back-propagate the error through each subsequent layers. Then update the
neuron weights using a learning rate and gradient of error. Please refer to the lectures of week 4.
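The ordering 4 → 3 → 1 → 5 → 2 can be sketched as a minimal training loop for a single linear neuron (the toy data y = 2x and all names are ours):

```python
import random

# Toy data: targets follow y = 2x (our choice)
data = [(x, 2.0 * x) for x in [0.0, 1.0, 2.0, 3.0]]

random.seed(0)
w, b = random.random(), random.random()   # 4. initialize with random values
lr = 0.05

for _ in range(500):
    for x, t in data:
        y = w * x + b                     # 3. forward pass through the network
        err = y - t                       # 1. error between actual and predicted
        dw, db = err * x, err             # 5. gradient for each weight and bias
        w -= lr * dw                      # 2. update with the gradient descent formula
        b -= lr * db

print(round(w, 2), round(b, 2))   # close to (2.0, 0.0)
```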
______________________________________________________________________________
QUESTION 10:
You run gradient descent for 15 iterations with learning rate 𝜂 = 0.3 and compute error after
each iteration. You find that the value of error decreases very slowly. Based on this, which of
the following conclusions seems most plausible?
Correct Answer: a
Detailed Solution:
The error is decreasing very slowly, so increasing the learning rate is the most plausible
remedy.
______________________________________________________________________________
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 10
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
a. Prevent overfitting
b. Faster convergence
c. Faster inference time
d. Prevent Co-variant shift
Correct Answer: c
Detailed Solution:
Inference time does not become faster due to batch normalization. It increases the computational
burden. So, inference time increases.
____________________________________________________________________________
QUESTION 2:
A neural network has 3 neurons in a hidden layer. The activations of the neurons for three
samples in a batch are [1, 2, 3]ᵀ, [0, 2, 5]ᵀ and [6, 9, 2]ᵀ respectively. What will be the value
of the mean if we use batch normalization in this layer?
a. [2.33, 4.33, 3.33]ᵀ
b. [2.00, 2.33, 5.66]ᵀ
c. [1.00, 1.00, 1.00]ᵀ
d. [0.00, 0.00, 0.00]ᵀ
Correct Answer: a
Detailed Solution:
(1/3) × ([1, 2, 3]ᵀ + [0, 2, 5]ᵀ + [6, 9, 2]ᵀ) = [2.33, 4.33, 3.33]ᵀ
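The per-neuron batch mean can be computed directly with NumPy (a sketch; the three activation vectors are from the question):

```python
import numpy as np

# Rows = samples in the batch, columns = the 3 hidden neurons
batch = np.array([[1.0, 2.0, 3.0],
                  [0.0, 2.0, 5.0],
                  [6.0, 9.0, 2.0]])

mean = batch.mean(axis=0)   # mean per neuron, taken over the batch
print(np.round(mean, 2))    # [2.33 4.33 3.33]
```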
______________________________________________________________________________
QUESTION 3:
How can we prevent underfitting?
Correct Answer: b
Detailed Solution:
Underfitting happens when the features are not capable enough to capture the data
distribution. We need to increase the feature size so the data can be fitted well.
______________________________________________________________________________
QUESTION 4:
How do we generally calculate mean and variance during testing?
Correct Answer: c
Detailed Solution:
We generally calculate batch mean and variance statistics during training and use the estimated
batch mean and variance during testing.
______________________________________________________________________________
QUESTION 5:
Which one of the following is not an advantage of dropout?
a. Regularization
b. Prevent Overfitting
c. Improve Accuracy
d. Reduce computational cost during testing
Correct Answer: d
Detailed Solution:
Dropout randomly zeroes some features during training, but while testing we don’t zero out
any feature. So there is no reduction of computational cost during testing.
______________________________________________________________________________
QUESTION 6:
What is the main advantage of layer normalization over batch normalization?
a. Faster convergence
b. Lesser computation
c. Useful in recurrent neural network
d. None of these
Correct Answer: c
Detailed Solution:
See the lectures/lecture materials.
______________________________________________________________________________
QUESTION 7:
While training a neural network for image recognition task, we plot the graph of training error
and validation error. Which is the best for early stopping?
a. A
b. B
c. C
d. D
Correct Answer: c
Detailed Solution:
Minimum validation point is the best for early stopping.
______________________________________________________________________________
QUESTION 8:
Which among the following is NOT a data augmentation technique?
Correct Answer: b
Detailed Solution:
A random shuffle of all the pixels of the image will distort the image and the neural network
will be unable to learn anything from it. So, it is not a data augmentation technique.
______________________________________________________________________________
QUESTION 9:
Which of the following is true about model capacity (where model capacity means the ability of
neural network to approximate complex functions)?
Correct Answer: a
Detailed Solution:
Dropout and learning rate have nothing to do with model capacity. If the number of hidden
layers increases, the number of learnable parameters increases; therefore, model capacity increases.
______________________________________________________________________________
QUESTION 10:
Batch Normalization is helpful because
Correct Answer: a
Detailed Solution:
Batch normalization layer normalizes the input.
______________________________________________________________________________
______________________________________________________________________________
************END*******
Deep Learning
Assignment- Week 11
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
Which of the following can be a target output of a semantic segmentation problem with 4 classes?
a.
0 1 0 1 1 0 0 0 1 0 0 0
0 1 0 0 0 0 1 0 0 1 0 0
1 0 0 0 0 0 0 0 0 0 0 0
I II III IV
b.
0 1 0 1 0 0 0 0 1 0 1 0
0 1 0 0 1 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 1
I II III IV
c.
0 1 0 1 0 0 0 0 1 0 0 0
0 1 0 0 0 1 1 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 1
I II III IV
d.
0 1 0 1 0 0 0 0 1 0 0 0
0 1 0 1 0 0 1 0 0 0 0 0
1 1 0 0 1 0 0 0 1 0 1 1
I II III IV
Correct Answer: c
Detailed Solution:
The target output should be a one-hot encoded vector at every pixel location: 1 if the pixel
belongs to that particular class, otherwise 0.
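A valid target therefore has exactly one active class per pixel, which can be checked mechanically. A sketch for option (c), transcribed as one 3×3 map per class (NumPy assumed):

```python
import numpy as np

# Option (c): one 3x3 map per class (I, II, III, IV)
maps = np.array([
    [[0, 1, 0], [0, 1, 0], [1, 0, 0]],   # class I
    [[1, 0, 0], [0, 0, 1], [0, 0, 0]],   # class II
    [[0, 0, 1], [1, 0, 0], [0, 0, 0]],   # class III
    [[0, 0, 0], [0, 0, 0], [0, 1, 1]],   # class IV
])

# Valid one-hot target: exactly one class active at every pixel
print((maps.sum(axis=0) == 1).all())   # True
```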
QUESTION 2:
Suppose you have a 1𝐷 signal 𝑥 = [1,2,3,4,5] and a filter 𝑓 = [1,2,3,4], and you perform stride
2 transpose convolution on the signal 𝑥 by the filter 𝑓 to get the signal 𝑦. What will be the
signal 𝑦 if we don’t perform cropping?
a. 𝑦 = [1,2,5,8,9,14,13,20,19,26,3,4]
b. 𝑦 = [1,2,3,4,5,4,3,2,1]
c. 𝑦 = [1,2,5,8,9,14,13,20,17,26,15,20]
d. 𝑦 = [0,0,5,8,9,14,13,20,19,26,0,0]
Correct Answer: c
Detailed Solution:
With stride 2, each input sample x[i] contributes x[i]·f starting at output position 2i, and
overlapping contributions are added:
y[0] = 1×1 = 1
y[1] = 1×2 = 2
y[2] = 1×3 + 2×1 = 5
y[3] = 1×4 + 2×2 = 8
y[4] = 2×3 + 3×1 = 9
y[5] = 2×4 + 3×2 = 14
y[6] = 3×3 + 4×1 = 13
y[7] = 3×4 + 4×2 = 20
y[8] = 4×3 + 5×1 = 17
y[9] = 4×4 + 5×2 = 26
y[10] = 5×3 = 15
y[11] = 5×4 = 20
So y = [1, 2, 5, 8, 9, 14, 13, 20, 17, 26, 15, 20], which is option (c).
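The same computation in a few lines of Python, using the scatter-and-add view of transpose convolution (the function name is ours):

```python
def transpose_conv1d(x, f, stride):
    # Each x[i] scatters x[i] * f into the output at offset i * stride
    y = [0] * ((len(x) - 1) * stride + len(f))
    for i, xi in enumerate(x):
        for j, fj in enumerate(f):
            y[i * stride + j] += xi * fj
    return y

print(transpose_conv1d([1, 2, 3, 4, 5], [1, 2, 3, 4], stride=2))
# [1, 2, 5, 8, 9, 14, 13, 20, 17, 26, 15, 20] -> option (c)
```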
______________________________________________________________________________
QUESTION 3:
What are the different challenges one face while creating a facial recognition system?
Correct Answer: d
Detailed Solution:
Please refer to the lecture of week 11.
______________________________________________________________________________
QUESTION 4:
Fully Convolutional Network or FCN became one of the major successful network
architectures. Can you identify the advantages of FCN which make it a successful
architecture for semantic segmentation?
Correct Answer: d
Detailed Solution:
Please refer to the lecture of week 11. It has a larger receptive field by using strided convolution
layers, it mixes global features, and the number of computations is reduced as the image
resolution is down-sampled.
____________________________________________________________________________
QUESTION 5:
In a Deep CNN architecture, the feature map before applying a max pool layer with a (2x2) kernel
is given below.
12 6 15 9
19 2 7 18
14 2 17 6
3 5 19 2
After few successive convolution layers, the feature map is again up-sampled using Max Un-
pooling. If the following feature map is present before Max-Unpooling layer, what will be the
output of the Max-Unpooling layer?
5 6
8 13
a.
0 0 0 0
5 0 0 6
8 0 0 0
0 0 13 0
b.
5 5 6 6
5 5 6 6
8 8 13 13
8 8 13 13
c.
5 0 6 0
0 0 0 0
8 0 13 0
0 0 0 0
d. None of the above
Correct Answer: a
Detailed Solution:
Max-Unpooling places each incoming value at the location where the maximum was recorded
during the corresponding max pooling (the positions of 19, 18, 14 and 19 in the original feature
map) and fills the remaining positions with zeros, giving option (a).
QUESTION 6:
What could be thought as disadvantage of Fully Convolutional neural network for semantic
segmentation addressed by other researchers?
a. It has a fixed receptive field, so an object of lesser size than the receptive
field will be missed by the network.
b. Down Sampling the image dimension over the depth makes the feature map
sparse.
c. It requires lot of computation.
d. None of the above
Correct Answer: a
Detailed Solution:
The fixed receptive field is the disadvantage of the fully convolutional network for semantic
segmentation addressed by the “Learning Deconvolution Network for Semantic Segmentation”
paper.
____________________________________________________________________________
QUESTION 7:
What will be the dice coefficient of the following two one-hot encoded vectors? (|A| = number of 1 bits)
A 1 0 1 0 0 0 1 1 1 0 0 1 0 1
B 1 0 0 0 0 1 1 1 0 0 0 1 0 0
a. 0.83
b. 0.41
c. 0.67
d. 0.90
Correct Answer: c
Detailed Solution:
No of 1 bit in A=7
No of 1 bit in B =5
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
Overlapping 1 bit =4
2∗|𝐴∩𝐵| 2∗4
Dice Coefficient = |𝐴|+|𝐵|
= = 0.67
5+7
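The same arithmetic as a small helper (a sketch; the two bit vectors are taken from the question):

```python
def dice(a, b):
    # Dice coefficient = 2 * |A intersect B| / (|A| + |B|) for binary vectors
    overlap = sum(x & y for x, y in zip(a, b))
    return 2 * overlap / (sum(a) + sum(b))

A = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]
B = [1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0]
print(round(dice(A, B), 2))   # 0.67
```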
______________________________________________________________________________
QUESTION 8:
What will be the value of dice coefficient between A and B?
Correct Answer: a
Detailed Solution:
|A|=7.82
|B|=8
0 0 0 0
A∩B= 0 0 0 0
0.89 0.85 0.88 0.91
0.99 0.97 0.95 0.97
|A∩B|=7.42
______________________________________________________________________________
QUESTION 9:
In FaceNet, why is the L2 normalization layer used?
a. To constrain the embedding function in a d-dimensional hyper-sphere.
b. For regularization of weight vector, i.e. L2 regularization.
c. For getting a sparse embedding function.
d. None of the above.
Correct Answer: a
Detailed Solution:
Using the L2 normalization layer we impose the constraint that ||f(x)||₂² = 1. This constrains
the embedding function to live on the d-dimensional hypersphere.
______________________________________________________________________________
QUESTION 10:
What is the use of Skip Connection in image denoising networks?
Correct Answer: d
Detailed Solution:
______________________________________________________________________________
***********END*******
Deep Learning
Assignment- Week 12
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10 Total mark: 10 X 1 = 10
______________________________________________________________________________
QUESTION 1:
During training a Variational Auto-encoder (VAE), it is assumed that 𝑃(𝑧|𝑥) ∼ 𝑁(0, 𝐼) i.e., given
an input sample, the encoder is forced to map its latent code to 𝑁(0, 𝐼). After the training is
over, we want to use the VAE as a generative model. What should be the best choice of
distribution from which we should sample a latent vector to generate a novel example?
a. 𝑁(0, 𝐼): Normal distribution with zero mean and identity covariance
b. 𝑁(1, 𝐼): Normal distribution with mean = 1 and identity covariance
c. Uniform distribution between [-1, 1]
d. 𝑁(−1, 𝐼): Normal distribution with mean = -1 and identity covariance
Correct Answer: a
Detailed Solution:
Since during training we forced the latent code to follow 𝑁(0, 𝐼), the decoder has learnt to map
latent codes from that distribution only. So, during sampling, if we provide vectors from any
other distribution, the decoder is unlikely to have encountered such vectors, leading to
unrealistic reconstructions. So, we should sample vectors from 𝑁(0, 𝐼) when using the
pre-trained VAE as a generative model.
______________________________________________________________________________
QUESTION 2:
When the GAN game has converged to its Nash equilibrium (when the Discriminator randomly
makes an error in distinguishing fake samples from real samples), what is the probability (of
belongingness to real class) given by the Discriminator to a fake generated sample?
a. 1
b. 0.5
c. 0
d. 0.25
Correct Answer: b
Detailed Solution:
Nash equilibrium is reached when the generated distribution, 𝑝𝑔 (𝑥) equals the original data
distribution, 𝑝𝑑𝑎𝑡𝑎 (𝑥), which leads to 𝐷(𝑥) = 0.5 for all 𝑥.
______________________________________________________________________________
QUESTION 3:
Why is re-parameterization trick used in VAE?
Correct Answer: b
Detailed Solution:
We cannot sample in a differentiable manner from within a computational graph present in a
neural network. Re-parameterization enables the sampling function to be present outside the
main computational graph which enables us to do regular gradient descent optimization.
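The trick amounts to writing z = μ + σ ⊙ ε with ε sampled from 𝑁(0, 𝐼) outside the graph, so that μ and σ remain deterministic nodes through which gradients can flow. A framework-free sketch (names are ours):

```python
import random

def reparameterize(mu, sigma):
    # eps is sampled outside the deterministic computation;
    # z = mu + sigma * eps is then differentiable w.r.t. mu and sigma.
    return [m + s * random.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

random.seed(0)
z = reparameterize(mu=[0.0, 1.0], sigma=[1.0, 0.5])
print(z)   # a 2-D latent sample

# With sigma = 0, the sample collapses to mu exactly.
print(reparameterize(mu=[3.0], sigma=[0.0]))   # [3.0]
```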
______________________________________________________________________________
QUESTION 4:
Which one of the following graphical models fully represents a Variational Auto-encoder (VAE)
realization?
Correct Answer: a
Detailed Explanation:
For practical realization of VAE, we have an encoder 𝑄(∙) which receives an input signal, 𝑥 and
generates a latent code, 𝑧. This part of the network can be denoted by 𝑄(𝑧|𝑥) and directed from
𝑥 to 𝑧. Next, we have a decoder section which takes the encoded z vector to reconstruct the input
signal, 𝑥. This part of the network is represented by 𝑃(𝑥|𝑧) and should be directed from 𝑧 to 𝑥.
______________________________________________________________________________
QUESTION 5:
Which one of the following computational graphs correctly depict the re-parameterization trick
deployed for practical Variational Auto-encoder (VAE) implementation? Circular nodes
represent random nodes in the models and the quadrilateral nodes represent deterministic
nodes.
a b c d
Correct Answer: a
Detailed Solution:
With the re-parameterization trick, the only random component in the network is the node of ∊
which is sampled from 𝑁(0, 𝐼). The other nodes of μ and σ are deterministic. Since ∊ is sampled
from outside the computational graph, the overall z vector also becomes deterministic
component for a given set of μ, σ and ∊. Also, if z is not deterministic, we cannot back propagate
gradients through it. Also, in the computation graph, the forward arrows will emerge from μ,σ,ϵ
towards z for computing the z vector.
______________________________________________________________________________
QUESTION 6:
For the following min-max game, at which state of (x, y) do we achieve the Nash equilibrium
(the state where change of one variable does not alter the state of the other variable)?
a. x = 0, y = −1
b. x = 0, y = 0
c. x = 0, y = 1
d. x = ∞ (infinite), y = 0
Correct Answer: b
Detailed Solution:
The Nash equilibrium is x = y = 0. It is the only state from which neither player can improve
the game outcome by unilaterally changing their own variable.
______________________________________________________________________________
QUESTION 7:
Which of the following losses can be used to optimize for generator’s objective (while training a
Generative Adversarial network) by MINIMIZING with gradient descent optimizer? Consider
cross-entropy loss,
and D(G(z)) = probability of belonging to real class as output by the Discriminator for a given
generated sample G(z).
a. CE(1, D(G(z)))
b. CE(1, -D(G(z)))
c. CE(1, 1 - D(G(z)))
d. CE(1, 1 / D(G(z)))
Correct Answer: a
Detailed Solution:
Except for option (a), none of the other objective functions is minimized at D(G(z)) = 1, which is
the goal of the Generator, i.e. to force the Discriminator to output probability = 1 for a generated
sample. The loss in option (a) is the only choice which keeps decreasing as D(G(z))
increases, noting that D(G(z)) ∈ [0, 1].
______________________________________________________________________________
QUESTION 8:
While training a Generative Adversarial network, which of the following losses CANNOT be
used to optimize for discriminator objective (while only sampling from the distribution of
generated samples) by MAXIMIZING with gradient ASCENT optimizer? Consider cross-entropy
loss,
and D(G(z)) = probability of belonging to real class as output by the Discriminator for a given
generated sample, G(z) from a noise vector, z.
a. CE(1, D(G(z)))
b. -CE(1, D(G(z)))
c. CE(1, 1 + D(G(z)))
d. -CE(1, 1 - D(G(z)))
Correct Answer: b
Detailed Solution:
During optimization of discriminator, when we sample from the distribution of fake/generated
distribution, we want D(G(z)) = 0. Since we want to use gradient ASCENT optimization, the
objective function should increase as we approach D(G(z)) = 0 while the objective value should
decrease with increase in value of D(G(z)). Apart from option (b), all other options satisfy the
above conditions.
______________________________________________________________________________
QUESTION 9:
For training VAE, we want to predict an unknown distribution of latent code given an observed
sample, i.e., P(z|x), but we approximate it with some distribution Q(z|x) which we can control
by varying some known parameters. Which of the following loss functions is used as a loss to
minimize?
a. −Σ_z Q(z|x) log [P(x, z)/Q(z|x)]
b. −Σ_x Q(z|x) log [P(x, z)/Q(z|x)]
c. Σ_z P(z|x) log [P(x, z)/Q(z|x)]
d. None of the above
Correct Answer: a
Detailed Solution:
Since we are trying to approximate P(z|x) with Q(z|x), we will try to minimize the KL
divergence KL(Q(z|x) || P(z|x)), which eventually leads to maximization of the well-known
variational lower bound Σ_z Q(z|x) log [P(x, z)/Q(z|x)].
So, we will minimize −Σ_z Q(z|x) log [P(x, z)/Q(z|x)]. See the lecture videos for detailed derivations.
______________________________________________________________________________
QUESTION 10:
Above figure shows latent vector subtraction of two concepts of “face with glasses” and
“glasses”. What is expected from the resultant vector?
a. glasses
b. face without glasses
c. face with 2 glasses
d. None of the above
Correct Answer: b
Detailed Solution:
It is expected that the VAE latent space follows vector arithmetic. Subtracting the “glasses”
vector from the “face with glasses” vector is therefore expected to yield a vector representing
a face without glasses.
_______________________________________________________________________
______________________________________________________________________________
************END*******