Question Bank - Deep Learning
Sl. No Questions
1. How many layers are deep learning algorithms constructed with?
A. 2 B. 3 C. 4 D. 5
2. Which of the following is well suited for perceptual tasks?
A. Feed-forward neural networks
B. Recurrent neural networks
C. Convolutional neural networks
D. Reinforcement Learning
3. Which neural network has only one hidden layer between the input
and output?
5. The number of nodes in the input layer is 10 and in the hidden layer
is 5. The maximum number of connections from the input layer to
the hidden layer is
A. 50 B. less than 50
C. more than 50 D. It is an arbitrary value
6. The perceptron algorithm is used for
8. The theorem that states a neural network with a single hidden layer
9. Which of the following statements is true about the representation
power of a multilayer network of sigmoid neurons?
2. Suppose we want to minimize the function $f(x_1, x_2) = x_1^2 + 3x_2 + 25$
using the gradient descent algorithm. We initialize $(x_1, x_2) = (0, 0)$.
What will be the value of $x_1$ after ten updates in the gradient descent
process? (Let η be 1.)
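A minimal sketch of the update rule for this question, assuming plain (vanilla) gradient descent on f(x1, x2) = x1^2 + 3*x2 + 25 with a fixed learning rate. The function names `grad_f` and `gradient_descent` are illustrative, not taken from the question bank.

```python
def grad_f(x1, x2):
    """Gradient of f(x1, x2) = x1**2 + 3*x2 + 25."""
    return 2.0 * x1, 3.0


def gradient_descent(x1, x2, eta=1.0, steps=10):
    """Apply `steps` plain gradient descent updates with learning rate eta."""
    for _ in range(steps):
        g1, g2 = grad_f(x1, x2)
        x1 -= eta * g1
        x2 -= eta * g2
    return x1, x2


x1_final, x2_final = gradient_descent(0.0, 0.0)
print(x1_final, x2_final)
```

The gradient with respect to x1 is 2*x1, which is zero at the starting point, so x1 never moves. Only x2 changes, and because f is linear in x2 it decreases without bound along that coordinate.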
8. What does the "winter of AI" refer to in the history of artificial
intelligence?
9. Given the following input values to a sigmoid neuron: x1: 0.72,
x2: 0.49, x3: 0.08, x4: 0.53, and x5: 0.27, what labels will the
sigmoid neuron predict for these inputs?
The following network doesn't contain any biases and the weights of
the network are given below:
W1 = [1 1 3 2 1 1 1 2 3], W2 = [1 1 2 3 2 1], W3 = [2 3].
The input to the network is x = [1 1 1].
(a) What is the predicted output for the given input x after
doing the forward pass?
(b) Compute and enter the loss between the output generated
by input x and the true output y.
7. Explain the representation power of MLPs.
1. ________ is the use of MLFFNN.
(a) to realize structure of MLP
(b) to solve pattern classification problem
(c) to solve pattern mapping problem
(d) to realize an approximation to an MLP
2. Pattern recall takes more time for ____
a) MLFNN
b) Basis function
c) Equal for both MLFNN and basis function
d) None of the mentioned
3. The function of GRNN is _____
a) function approximation task
b) pattern classification task
c) function approximation and pattern classification task
d) none of the mentioned
4. The activation function that can only give positive outputs
(greater than 0) is ___
7. ____ is a property of eigenvalues of a symmetric matrix.
(a) Eigenvalues are always positive
(b) Eigenvalues are always real
8. What is the determinant of a matrix with eigenvalues λ1 and λ2?
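The relationship between eigenvalues and the determinant can be checked numerically. The sketch below uses NumPy and a hypothetical 2x2 symmetric matrix `A` chosen only for illustration. It compares the product of the eigenvalues with the determinant computed directly.

```python
import numpy as np

# Hypothetical symmetric matrix, chosen only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# eigvalsh is intended for symmetric matrices and returns real
# eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(A)

det_from_eigs = float(np.prod(eigenvalues))  # product of the eigenvalues
det_direct = float(np.linalg.det(A))         # determinant computed directly

print(eigenvalues, det_from_eigs, det_direct)
```

Both values agree up to floating-point error, which is consistent with the determinant being the product of the eigenvalues.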
3. Consider a gradient profile ΔW = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5,
0.55, 0.56]. Assume $v_{-1} = 0$, $\epsilon = 0$, $\beta = 0.9$ and the learning rate
is $\eta_{-1} = 0.1$. Suppose that we use the Adagrad algorithm. What is
the value of $\eta_6 = \eta / \sqrt{v_6 + \epsilon}$?
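A minimal sketch of the Adagrad accumulator for this gradient profile. It assumes the gradients are indexed from t = 0, so the effective learning rate at t = 6 uses the first seven gradients. That indexing is an assumption, not something the question states. Adagrad has no decay factor, so the given β plays no role here.

```python
import math

grads = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]
eta, eps = 0.1, 0.0

v = 0.0  # accumulator before the first update, i.e. v_{-1} = 0
effective_lr = []
for g in grads:
    v += g * g  # Adagrad: accumulate squared gradients, no decay
    effective_lr.append(eta / math.sqrt(v + eps))

# Assumption: index 0 corresponds to t = 0, so index 6 is t = 6.
eta_6 = effective_lr[6]
print(round(eta_6, 4))
```

Because the accumulator only grows, the effective learning rate can only shrink over time. RMSProp addresses this with an exponential moving average.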
4. Examine the use of bias in a neural network.
$x_1 = [2, 2]$, $x_2 = [1, 2]$, $x_3 = [2, 1]$. Find
(a) the mean of the given data points $x_1, x_2, x_3$,
(b) the covariance matrix $C = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$,
(c) the eigenvector corresponding to the maximum eigenvalue of
the matrix.
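The three steps (mean, covariance matrix, leading eigenvector) can be sketched with NumPy for the data points [2 2], [1 2] and [2 1]. This assumes the covariance is normalised by 1/n rather than 1/(n-1). Variable names are illustrative.

```python
import numpy as np

# One data point per row.
X = np.array([[2.0, 2.0],
              [1.0, 2.0],
              [2.0, 1.0]])

mean = X.mean(axis=0)                   # (a) mean of the data points
centered = X - mean
C = centered.T @ centered / X.shape[0]  # (b) covariance with 1/n normalisation

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, so the last column is the eigenvector of the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
top_vec = eigvecs[:, -1]                # (c) leading eigenvector (unit norm)

print(mean, C, eigvals[-1], top_vec, sep="\n")
```

The sign of an eigenvector is arbitrary, so a solver may return either the vector or its negation. Both describe the same principal direction.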
2. What is gradient descent? Explain its types, advantages and
disadvantages in detail.
70 155
65 130
72 180
68 160
74 190
7. Write short notes on i) AdaGrad ii) RMSProp
1. Dimensionality reduction techniques are used to:
a) Dropout
b) L2 regularization
c) Early stopping
d) Dataset augmentation
7. The main purpose of a hidden layer in an under-complete
autoencoder is _____
(a) To increase the number of neurons in the network
(b) To reduce the number of neurons in the network
8. If the dimension of the hidden layer representation is more than
the dimension of the input layer, then what kind of autoencoder
do we have?
9. The primary advantage of autoencoders over PCA is ____
10. If the dimension of the input layer in an under-complete
autoencoder is 6, what is the possible dimension of the hidden
layer?
(a) The model will overfit the training data.
(b) The model will underfit the training data.
8. ____ is the regularization technique that is likely to produce a
sparse weight vector.
9. The main cause of the Dead ReLU problem in deep learning is
_____
6. Define L2 regularization.
S.No Questions
1. Derive the equation showing that adding Gaussian noise to the inputs
of a neural network is equivalent to L2 regularization.
5. What is noise injection? Describe its types and explain their benefits.
8. What is the computational complexity of computing the
softmax function in the output layer of a neural network?
9. Which of the following is an advantage of the CBOW model
compared to the Skip-gram model?
10. Which architecture has the highest number of layers?
7. Consider a convolution operation with an input image of size
256x256x3 and 40 filters of size 11x11x3, using a stride of 4
and a padding of 2. What is the height of the output?
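A small helper for the spatial output size of a convolution, using the common convention floor((n + 2p - k) / s) + 1. The function name `conv_output_size` is illustrative, and the floor rounding is an assumption that matches how most frameworks handle non-divisible strides.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output height (or width) for input size n and kernel size k."""
    return (n + 2 * padding - k) // stride + 1


# 256x256x3 input, 11x11x3 filters, stride 4, padding 2.
height = conv_output_size(256, 11, stride=4, padding=2)
print(height)
```

The same formula applies to the width. The output depth equals the number of filters, so 40 filters give 40 output channels.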
S.No Questions
1. Compare and contrast the architectural design of LeNet and
ResNet.
6. i) Consider a convolution operation with an input image of size
100x100x3 and a filter of size 8x8x3, using a stride of 1 and a
padding of 1. What is the output size?
7. Explain the features of ResNet and VGGNet.
S.No Questions
1. Which layer type is typically used to capture sequential
dependencies in an RNN?
a) Input layer
b) Hidden layer
c) Output layer
d) Activation layer
2. Which activation function is commonly used in the recurrent
layers of an RNN?
a) ReLU (Rectified Linear Unit)
b) Sigmoid
c) Tanh (Hyperbolic Tangent)
d) Softmax
3. What is the purpose of the bidirectional RNN architecture?
a) To handle sequential data in both forward and backward
directions
b) To reduce the computational complexity of the network
c) To adjust the learning rate during training
d) None of the above
4. What is the purpose of the peephole connections in a Long
Short-Term Memory (LSTM) network?
a) To allow the cell state to influence the gating mechanisms
b) To adjust the learning rate during training
c) To introduce non-linearity to the network
d) None of the above
5. What is the purpose of the cell state in an LSTM network?
a) To store long-term dependencies in the input sequence
b) To adjust the learning rate during training
c) To compute the gradients for backpropagation
d) None of the above
6. Which of the following is a common variant of the attention
mechanism?
8. Which of the following is the main disadvantage of using
BPTT?
2. Define GRU.
8. List out the main advantage of using GRUs over traditional
RNNs.
S.No Questions
1. Suppose that we need to develop an RNN model for sentiment
classification. The input to the model is a sentence composed
of five words and the output is the sentiment (positive or
negative). Assume that each word is represented as a vector of
length 70×1 and the output labels are one-hot encoded. Further,
the state vector St is initialized with all zeros of size 50×1. How
many parameters (including bias) are there in the network?
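One way to count the parameters is sketched below. It assumes a vanilla RNN of the form s_t = f(U x_t + W s_(t-1) + b), y = softmax(V s_T + c), with two output units for the one-hot labels. The names U, W, V, b and c are assumptions and do not come from the question.

```python
def rnn_param_count(input_dim, state_dim, output_dim):
    """Parameter count of a vanilla RNN with one bias per layer."""
    u = state_dim * input_dim   # input-to-state weights U
    w = state_dim * state_dim   # state-to-state weights W
    b = state_dim               # state bias b
    v = output_dim * state_dim  # state-to-output weights V
    c = output_dim              # output bias c
    return u + w + b + v + c


# Word vectors of length 70, state of size 50, two sentiment classes.
total = rnn_param_count(70, 50, 2)
print(total)
```

The weights are shared across time steps, so the sentence length of five words does not change the count.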