Introduction to Machine Learning - Unit 8 - Week 5

Week 5 : Assignment 5
The due date for submitting this assignment has passed.
1) [Question statement not recovered in the page capture.] 1 point

pk + (m − 1)k^2 + k

p^2 + (m − 1)pk + k

p^2 + (m − 1)pk + k^2

Yes, the answer is correct.
Score: 1
Accepted Answers:
pk + (m − 1)k^2 + k
2) Consider a neural network layer defined as y = ReLU(W x). Here x ∈ R^p is the input, y ∈ R^d is the output and W ∈ R^{d×p} is the parameter matrix. The ReLU activation (defined as ReLU(z) := max(0, z) for a scalar z) is applied element-wise to W x. Find ∂y_i/∂W_{ij}, where i = 1, …, d and j = 1, …, p. In the following options, I(condition) is an indicator function that returns 1 if the condition is true and 0 if it is false. 1 point

I(∑_{k=1}^{p} W_{ik} x_k ≤ 0) x_i

I(∑_{k=1}^{p} W_{ik} x_k > 0) x_j
https://onlinecourses.nptel.ac.in/noc25_cs46/unit?unit=60&assessment=313
I(∑_{k=1}^{p} W_{ik} x_k > 0) W_{ij} x_j

I(∑_{k=1}^{p} W_{ik} x_k ≤ 0) W_{ij} x_j

Yes, the answer is correct.
Score: 1
Accepted Answers:
I(∑_{k=1}^{p} W_{ik} x_k > 0) x_j
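The accepted answer can be checked numerically. The sketch below (the dimensions d = 4, p = 3 and the specific values of W and x are arbitrary illustrative choices, not part of the question) compares the indicator formula against central finite differences:

```python
import numpy as np

# Fixed toy values chosen so no pre-activation sits near zero
W = np.array([[0.5, -1.0, 2.0],
              [1.0, 1.0, -3.0],
              [-0.5, 0.25, 0.1],
              [2.0, -2.0, 0.5]])
x = np.array([1.0, -2.0, 0.5])
d, p = W.shape

relu = lambda z: np.maximum(0.0, z)

# Analytic gradient from the accepted answer: dy_i/dW_ij = I(sum_k W_ik x_k > 0) * x_j
pre = W @ x                                               # pre-activation, shape (d,)
analytic = (pre > 0).astype(float)[:, None] * x[None, :]  # shape (d, p)

# Central finite differences on each W_ij, observing its effect on y_i only
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(d):
    for j in range(p):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = (relu(Wp @ x)[i] - relu(Wm @ x)[i]) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```

Note that the answer has x_j, not x_i: W_{ij} only touches the i-th pre-activation, and it does so through the input coordinate x_j.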
3) Consider a two-layered neural network y = σ(W^{(B)} σ(W^{(A)} x)). Let h = σ(W^{(A)} x) denote the hidden layer representation. W^{(A)} and W^{(B)} are arbitrary weights. Which of the following statement(s) is/are true? Note: ∇_g(f) denotes the gradient of f w.r.t. g. 1 point

∇_h(y) depends on W^{(A)}

∇_{W^{(A)}}(y) depends on W^{(B)}

∇_{W^{(A)}}(h) depends on W^{(B)}

∇_{W^{(B)}}(y) depends on W^{(A)}

Yes, the answer is correct.
Score: 1
Accepted Answers:
∇_{W^{(A)}}(y) depends on W^{(B)}
∇_{W^{(B)}}(y) depends on W^{(A)}
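The two accepted statements follow from the chain rule: the gradient backpropagated to W^{(A)} carries a factor of W^{(B)}, and the gradient w.r.t. W^{(B)} passes through h, which is a function of W^{(A)}. A finite-difference sketch (the network sizes and weight values are arbitrary illustrative choices) shows the first dependence directly:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def grad_WA(WA, WB, x, eps=1e-6):
    """Finite-difference gradient of the scalar output y w.r.t. W^(A)."""
    y = lambda WA_: sigmoid(WB @ sigmoid(WA_ @ x))[0]
    g = np.zeros_like(WA)
    for idx in np.ndindex(WA.shape):
        Wp, Wm = WA.copy(), WA.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (y(Wp) - y(Wm)) / (2 * eps)
    return g

WA = np.array([[0.2, -0.4], [0.5, 0.3], [-0.6, 0.1]])  # hidden-layer weights
x = np.array([1.0, -0.5])
WB1 = np.array([[1.0, -1.0, 0.5]])                     # two different choices of W^(B)
WB2 = np.array([[-0.3, 0.8, 2.0]])

g1 = grad_WA(WA, WB1, x)
g2 = grad_WA(WA, WB2, x)
assert not np.allclose(g1, g2)   # grad w.r.t. W^(A) changes when only W^(B) changes
```

By contrast, h = σ(W^{(A)} x) does not involve W^{(B)} at all, so ∇_{W^{(A)}}(h) is unchanged when W^{(B)} changes, which is why that option is not accepted.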
4) Which of the following statement(s) about the initialization of neural network weights is/are true for a network that uses the sigmoid activation function? 1 point

Two different initializations of the same network could converge to different minima

For a given initialization, gradient descent will converge to the same minima irrespective of the learning rate.

Initializing all weights to the same constant value leads to undesirable results

Initializing all weights to very large values leads to undesirable results

Yes, the answer is correct.
Score: 1
Accepted Answers:
Two different initializations of the same network could converge to different minima
Initializing all weights to the same constant value leads to undesirable results
Initializing all weights to very large values leads to undesirable results
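The constant-initialization option can be illustrated directly: if every weight starts at the same value, all hidden units compute the same activation and receive the same gradient, so gradient descent can never make them differ. A toy sketch (the sizes, the 0.5 constant, the input, and the squared-error loss are illustrative assumptions):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy 2-layer net with every weight set to the same constant
p, k = 3, 4
W1 = np.full((k, p), 0.5)
w2 = np.full(k, 0.5)
x = np.array([0.2, -0.1, 0.4])
t = 1.0                                   # target

# Forward pass
h = sigmoid(W1 @ x)                       # all k hidden activations are identical
y = w2 @ h

# Backward pass for the squared error 0.5 * (y - t)^2
dy = y - t
dW1 = np.outer(dy * w2 * h * (1 - h), x)  # gradient w.r.t. the hidden-layer weights

# Every hidden unit gets exactly the same gradient row, so the symmetry
# between units is never broken by gradient descent.
assert np.allclose(h, h[0])
assert np.allclose(dW1, dW1[0])
```

Very large initial weights are undesirable for a related reason: they push the sigmoid into its flat saturated regions, where h(1 − h) ≈ 0 and the same gradient expression vanishes.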
5) Consider the following statements about the derivatives of the sigmoid (σ(x) = 1/(1 + exp(−x))) and tanh (tanh(x) = (exp(x) − exp(−x))/(exp(x) + exp(−x))) activation functions. Which of the following statement(s) is/are true? 1 point

σ′(x) = σ(x)(1 − σ(x))

0 < σ′(x) ≤ 1/4

tanh′(x) = (1/2)(1 − (tanh(x))^2)

0 < tanh′(x) ≤ 1

Yes, the answer is correct.
Score: 1
Accepted Answers:
σ′(x) = σ(x)(1 − σ(x))
0 < σ′(x) ≤ 1/4
0 < tanh′(x) ≤ 1
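The accepted identities and bounds are easy to confirm numerically (the grid of evaluation points is an arbitrary choice); note the true tanh derivative is 1 − tanh²(x), without the 1/2 factor of the rejected option:

```python
import numpy as np

xs = np.linspace(-10, 10, 2001)
sig = 1.0 / (1.0 + np.exp(-xs))

# Identity: sigma'(x) = sigma(x)(1 - sigma(x)), checked against central differences
eps = 1e-6
num_dsig = (1 / (1 + np.exp(-(xs + eps))) - 1 / (1 + np.exp(-(xs - eps)))) / (2 * eps)
assert np.allclose(num_dsig, sig * (1 - sig), atol=1e-8)

# Bounds: 0 < sigma'(x) <= 1/4 and 0 < tanh'(x) <= 1
dsig = sig * (1 - sig)
dtanh = 1 - np.tanh(xs) ** 2              # no 1/2 factor
assert dsig.max() <= 0.25 and dsig.min() > 0
assert dtanh.max() <= 1.0 and dtanh.min() > 0

# Both maxima are attained at x = 0 (index 1000 of the grid)
assert np.isclose(dsig[1000], 0.25) and np.isclose(dtanh[1000], 1.0)
```

The σ′ ≤ 1/4 bound is one reason deep sigmoid networks suffer vanishing gradients: each layer multiplies the backpropagated signal by at most 1/4.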
6) A geometric distribution is defined by the p.m.f. f(x; p) = (1 − p)^{x−1} p for x = 1, 2, …. Given the samples [4, 5, 6, 5, 4, 3] drawn from this distribution, find the MLE of p. 1 point

0.111

0.222

0.333

0.444

Yes, the answer is correct.
Score: 1
Accepted Answers:
0.222
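Setting the derivative of the log-likelihood ∑_i [(x_i − 1) log(1 − p) + log p] to zero gives the closed form p̂ = 1/x̄; here the sample mean is 27/6 = 4.5, so p̂ = 0.222. A quick check, including a brute-force grid search over the likelihood:

```python
import numpy as np

samples = np.array([4, 5, 6, 5, 4, 3])

# Closed form: maximizing sum((x_i - 1) log(1 - p) + log p) gives p_hat = 1 / mean(x)
p_hat = 1.0 / samples.mean()
assert np.isclose(p_hat, 0.2222, atol=1e-4)

# Sanity check by brute force over a fine grid of p values
grid = np.linspace(0.001, 0.999, 9990)
loglik = ((samples[:, None] - 1) * np.log(1 - grid) + np.log(grid)).sum(axis=0)
assert np.isclose(grid[np.argmax(loglik)], p_hat, atol=1e-3)
```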
7) Consider a Bernoulli distribution with p = 0.7 (true value of the parameter). We draw samples from this distribution and compute a MAP estimate of p by assuming a prior distribution over p. Let N(μ, σ^2) denote a Gaussian distribution with a mean μ and variance σ^2. Distributions are normalized as needed. Which of the following statement(s) is/are true? 1 point

If the prior is N(0.6, 0.1), we will likely require fewer samples for converging to the true value than if the prior is N(0.4, 0.1)

If the prior is N(0.4, 0.1), we will likely require fewer samples for converging to the true value than if the prior is N(0.6, 0.1)

With a prior of N(0.1, 0.001), the estimate will never converge to the true value, regardless of the number of samples used.

With a prior of U(0, 0.5) (i.e. uniform distribution between 0 and 0.5), the estimate will never converge to the true value, regardless of the number of samples used.
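The roles of these priors can be sketched on a grid. Assuming, for illustration, that the sample proportion has settled at the true value (7000 heads in 10000 tosses): a Gaussian prior adds only a mild quadratic log-penalty that the likelihood swamps as n grows, whereas a U(0, 0.5) prior assigns zero density above 0.5, so the posterior mode can never reach 0.7:

```python
import numpy as np

# Suppose n tosses produced exactly 0.7 * n heads (the limiting case)
n = 10000
heads = int(0.7 * n)

grid = np.linspace(1e-4, 1 - 1e-4, 9999)
loglik = heads * np.log(grid) + (n - heads) * np.log(1 - grid)

# Gaussian prior N(0.6, 0.1): the log-prior is a mild quadratic penalty,
# so the MAP estimate is pulled toward 0.7 as the data accumulate
log_prior_gauss = -0.5 * (grid - 0.6) ** 2 / 0.1
map_gauss = grid[np.argmax(loglik + log_prior_gauss)]
assert abs(map_gauss - 0.7) < 0.01

# Uniform prior on (0, 0.5): zero density above 0.5, so the posterior
# vanishes there and the MAP estimate is capped below 0.5 forever
log_prior_unif = np.where(grid < 0.5, 0.0, -np.inf)
map_unif = grid[np.argmax(loglik + log_prior_unif)]
assert map_unif < 0.5
```

The same mechanism distinguishes the two Gaussian options: a prior centered nearer the true value (0.6 vs 0.4) starts the MAP estimate closer to 0.7, so fewer samples are typically needed.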
8) Which of the following statement(s) about parameter estimation techniques is/are true? 1 point

To obtain a distribution over the predicted values for a new data point, we need to compute an integral over the parameter space.

The MAP estimate of the parameter gives a point prediction for a new data point.

The MLE of a parameter gives a distribution of predicted values for a new data point.

We need a point estimate of the parameter to compute a distribution of the predicted values for a new data point.

Yes, the answer is correct.
Score: 1
Accepted Answers:
To obtain a distribution over the predicted values for a new data point, we need to compute an integral over the parameter space.
The MAP estimate of the parameter gives a point prediction for a new data point.
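The first accepted statement can be made concrete with a Bernoulli likelihood under a flat prior (the 7-successes-in-10-trials data set is an illustrative assumption): the predictive probability of the next observation is an integral of the likelihood against the whole posterior, whereas the MAP estimate collapses the posterior to a single point:

```python
import numpy as np

# Flat Beta(1,1) prior over theta; data: 7 successes, 3 failures in 10 trials
s, f = 7, 3
grid = np.linspace(1e-4, 1 - 1e-4, 9999)

# Unnormalized posterior p(theta | D) proportional to theta^s (1 - theta)^f
post = grid ** s * (1 - grid) ** f
post /= post.sum()                       # normalize over the grid

# Predictive probability that the next trial succeeds:
# an integral (here a sum) over the whole parameter space
pred = (grid * post).sum()
assert np.isclose(pred, (s + 1) / (s + f + 2), atol=1e-4)   # closed form: 8/12

# The MAP estimate by itself gives only a point prediction
theta_map = grid[np.argmax(post)]
assert np.isclose(theta_map, s / (s + f), atol=1e-3)        # 0.7
```

Note the two answers differ (8/12 ≈ 0.667 vs 0.7): averaging over the posterior is not the same as plugging in its mode, which is exactly why a point estimate alone cannot stand in for the predictive distribution.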
10) Which of the following statement(s) about activation functions is/are NOT true? 1 point

Non-linearity of activation functions is not a necessary criterion when designing very deep neural networks