Week 2
1. Which of the following statements about the sigmoid function is NOT true?
(a) The derivative of the sigmoid function can be negative.
(b) The sigmoid function is continuous and differentiable.
(c) The sigmoid function maps any input value to a value between 0 and 1.
(d) The sigmoid function can be used as an activation function in neural networks.
Answer: (a)
Solution: The derivative of the sigmoid function is σ'(x) = σ(x)(1 − σ(x)), which is always positive since σ(x) lies strictly between 0 and 1.
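For reference, this can be checked numerically; the following is a minimal sketch (the sigmoid and its derivative formula are standard facts, not taken from the question):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The derivative of the sigmoid is sigma(x) * (1 - sigma(x)).
# Since sigma(x) is strictly between 0 and 1, the product is always positive.
xs = np.linspace(-10, 10, 11)
grads = sigmoid(xs) * (1 - sigmoid(xs))
print(grads)               # every entry is > 0
print((grads > 0).all())   # True
```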
2. How many boolean functions can be designed for 4 inputs?
a) 65,536
b) 8
c) 256
d) 64
Answer: a)
Solution: The number of boolean functions of n inputs is 2^(2^n). For n = 4, this gives 2^(2^4) = 2^16 = 65,536.
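As a sanity check, the count can be reproduced directly (a minimal sketch; the variable names are illustrative):

```python
n = 4                       # number of boolean inputs
rows = 2 ** n               # number of distinct input patterns (truth-table rows)
num_functions = 2 ** rows   # each row can independently map to 0 or 1
print(rows, num_functions)  # 16 65536
```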
3. How many neurons do you need in the hidden layer of a perceptron to learn any boolean
function with 4 inputs? (Only one hidden layer is allowed)
a) 16
b) 64
c) 56
d) 32
Answer: a)
Solution: The number of hidden neurons needed to represent any boolean function of n inputs is 2^n. For n = 4, this is 2^4 = 16.
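The count comes from a lookup-table style construction: one hidden neuron per input pattern, and an output neuron that ORs the neurons whose patterns map to 1. A minimal sketch of that construction (the ±1 pattern-matching weights and thresholds are one standard choice; the function and variable names are illustrative):

```python
from itertools import product

def make_network(f, n=4):
    """One-hidden-layer threshold network computing the boolean function f.

    The hidden layer has 2**n neurons, one per input pattern; hidden neuron p
    fires only when the input equals p. The output neuron fires if any hidden
    neuron whose pattern maps to 1 under f has fired.
    """
    patterns = list(product([0, 1], repeat=n))   # all 2**n input patterns

    def predict(x):
        hidden = []
        for p in patterns:
            # weight +1 where the pattern has a 1, -1 where it has a 0;
            # the weighted sum equals sum(p) only when x == p
            s = sum((1 if pi else -1) * xi for pi, xi in zip(p, x))
            hidden.append(1 if s >= sum(p) else 0)
        # output neuron: OR over the hidden neurons whose pattern maps to 1
        return 1 if sum(h for h, p in zip(hidden, patterns) if f(p)) >= 1 else 0

    return predict

# Check on 4-input parity: the 16-neuron hidden layer reproduces it exactly.
parity = lambda p: sum(p) % 2
net = make_network(parity)
assert all(net(x) == parity(x) for x in product([0, 1], repeat=4))
```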
4. We have a function that we want to approximate using 150 rectangles (towers). How many
neurons are required to construct the required network?
a) 301
b) 451
c) 150
d) 500
Answer: a)
Solution: To approximate one rectangle we need 2 neurons, so 150 towers require 300 neurons. One extra neuron is required to aggregate the towers, giving 301 neurons in total.
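The "2 neurons per tower" count comes from building each rectangle as the difference of two shifted sigmoids, with one extra neuron summing the towers. A minimal sketch under that interpretation (the target function sin(x), the interval, and the steepness value are illustrative choices, not part of the question):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tower(x, left, right, steepness=200.0):
    # One rectangle ("tower") = difference of two sigmoid neurons
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

# Approximate f(x) = sin(x) on [0, 2*pi] with 150 towers:
# 150 towers * 2 sigmoid neurons + 1 aggregation neuron = 301 neurons.
edges = np.linspace(0, 2 * np.pi, 151)      # 150 adjacent intervals
centers = (edges[:-1] + edges[1:]) / 2
heights = np.sin(centers)                   # height of each tower

x = np.linspace(0, 2 * np.pi, 1000)
approx = sum(h * tower(x, a, b) for h, a, b in zip(heights, edges[:-1], edges[1:]))
print(np.max(np.abs(approx - np.sin(x))))   # small approximation error
```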
5. What happens to the output of the sigmoid function as |x| becomes very large for input x?
Select all relevant options. (MSQ)
(a) The output approaches 0.5
(b) The output approaches 1.
(c) The output oscillates between 0 and 1.
(d) The output approaches 0.
Answer: (b),(d)
Solution: As |x| becomes very large (that is, as x approaches −∞ or +∞), the sigmoid function approaches 0 or 1 respectively.
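Numerically (a minimal sketch; the particular input values are illustrative):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
for x in [-50.0, -20.0, 20.0, 50.0]:
    print(x, sigmoid(x))   # ~0 for large negative x, ~1 for large positive x
```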
6. We have a classification problem with labels 0 and 1. We train a logistic model and find out
that w0 learned by our model is -17. We are to predict the label of a new test point x using
this trained model. If w^T x = 1, which of the following statements is True?
a) We cannot make any prediction as the value of w^T x does not make sense.
b) The label of the test point is 0.
c) The label of the test point is 1.
d) We cannot make any prediction as we do not know the value of b.
Answer: b)
Solution: For a logistic model, the predicted label for a test point x is given by ŷ = σ(w^T x + w0). Here σ(1 + (−17)) = σ(−16) ≈ 0, which is below 0.5, so the predicted label is 0.
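Plugging in the numbers from the question (a minimal sketch; the 0.5 decision threshold is the usual convention):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w0 = -17.0    # learned bias from the question
wTx = 1.0     # given value of w^T x for the test point
prob = sigmoid(wTx + w0)        # sigma(-16), roughly 1.1e-7
label = 1 if prob >= 0.5 else 0
print(prob, label)              # probability ~ 0, predicted label 0
```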
7. The figure below shows the output of a trained sigmoid neuron σ(w^T x + w0) plotted against w^T x. Which of the following statements is true?
[Figure: sigmoid curve plotted for w^T x between 12 and 20, with output values ranging from about 0.2 to 0.8; the curve passes through 0.5 at w^T x = 14.]
a) w0 = 14
b) w0 = −14
c) w > 0
d) w < 0
Answer: b)
Solution: The sigmoid equals 0.5 when w^T x + w0 = 0, i.e. when w0 = −w^T x. In the figure this happens at w^T x = 14, hence w0 = −14.
8. Suppose we have a function f(x1, x2) = x1^2 + 3x2 + 25 that we want to minimize using the gradient descent algorithm. We initialize (x1, x2) = (0, 0). What will be
the value of x1 after ten updates of the gradient descent process? (Let η be 1)
a) 0
b) −3
c) −4.5
d) −3
Answer: a)
Solution: The gradient of f(x1, x2) at a general point (x1, x2) is (2x1, 3).
After the first update, (x1, x2) = (0, 0) − (2·0, 3) = (0, −3).
Each further update gives (x1, x2) = (0, y) − (2·0, 3) = (0, y − 3).
Hence x1 remains 0 after ten updates.
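A quick numerical check of the ten updates (a minimal sketch following the settings in the question):

```python
# Gradient descent on f(x1, x2) = x1**2 + 3*x2 + 25 with eta = 1,
# starting from (0, 0); the gradient is (2*x1, 3).
eta = 1.0
x1, x2 = 0.0, 0.0
for _ in range(10):
    g1, g2 = 2 * x1, 3.0
    x1, x2 = x1 - eta * g1, x2 - eta * g2
print(x1, x2)   # x1 stays 0.0, x2 = -30.0
```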
9. Consider a function f(x) = x^3 − 3x^2 + 2. What is the updated value of x after the 2nd iteration
of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?
Answer: range(1.76,1.82)
Solution: The gradient of the function is f'(x) = 3x^2 − 6x = 3x(x − 2).
After the first update: x = 4 − 0.1 · 3 · 4 · (4 − 2) = 4 − 2.4 = 1.6.
After the second update: x = 1.6 − 0.1 · 3 · 1.6 · (1.6 − 2) = 1.6 + 0.192 = 1.792.
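A quick check of the two updates (a minimal sketch using the question's learning rate and starting point):

```python
# Gradient descent on f(x) = x**3 - 3*x**2 + 2 with learning rate 0.1
x, eta = 4.0, 0.1
for step in range(2):
    grad = 3 * x**2 - 6 * x    # f'(x) = 3x(x - 2)
    x = x - eta * grad
    print(step + 1, x)         # 1.6 after step 1, 1.792 after step 2
```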
10. What is the purpose of the gradient descent algorithm in machine learning?
a) To minimize the loss function
b) To maximize the loss function
c) To minimize the output function
d) To maximize the output function
Answer: a) To minimize the loss function
Solution: Gradient descent is an optimization algorithm used to find the values of the
parameters that minimize the loss function.