Question Bank - Deep Learning

This question bank contains a series of important questions on core deep learning concepts, including neural networks, gradient descent, and autoencoders. It covers topics such as the structure of neural networks, types of activation functions, and regularization techniques. The questions are designed to assess understanding and application of these concepts in practical scenarios.

Important Questions for all V Units

Sl.No  Questions
1. How many layers are deep learning algorithms constructed with?

A. 2 B. 3 C. 4 D. 5
2. Which of the following is well suited for perceptual tasks?
A. Feed-forward neural networks
B. Recurrent neural networks
C. Convolutional neural networks
D. Reinforcement Learning
3. Which neural network has only one hidden layer between the input and output?
A. Shallow neural network
B. Deep neural network
C. Feed-forward neural networks
D. Recurrent neural networks
4. Which of the following statements is true when you use 1×1 convolutions in a CNN?
A. It can help in dimensionality reduction
B. It can be used for feature pooling
C. It suffers less overfitting due to small kernel size
D. All of the above

5. The number of nodes in the input layer is 10 and in the hidden layer is 5. The maximum number of connections from the input layer to the hidden layer is
A. 50 B. less than 50
C. more than 50 D. an arbitrary value
6. The perceptron algorithm is used for
(a) Clustering data points (b) Finding the shortest path in a graph
(c) Classifying data (d) Solving optimization problems
7. Suppose we have a boolean function that takes 5 inputs x1, x2, x3, x4, x5, and an MP neuron with parameter θ = 1. For how many inputs will this MP neuron give output y = 1?
(a) 21 (b) 31 (c) 30 (d) 32

8. Which theorem states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?
(a) Bayes theorem (b) Central limit theorem
(c) Fourier's theorem (d) Universal approximation theorem

9. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?
(a) A multilayer network of sigmoid neurons can represent any Boolean function.
(b) A multilayer network of sigmoid neurons can represent any continuous function.
(c) A multilayer network of sigmoid neurons can represent any function.
(d) A multilayer network of sigmoid neurons can represent any linear function.
1. Given an MP neuron with inputs x1, x2, x3, x4, x5 and threshold θ = 3, where x5 is an inhibitory input, what will be the value of y for the input (1,1,1,0,1)?

2. Suppose we have a function f(x1, x2) = x1² + 3x2 + 25 which we want to minimize using the gradient descent algorithm. We initialize (x1, x2) = (0, 0). What will be the value of x1 after ten updates of the gradient descent process? (Let η be 1.)
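A minimal sketch (not part of the original question) that runs the stated updates; since ∂f/∂x1 = 2x1 is zero at the starting point, x1 never moves:

```python
# Gradient descent on f(x1, x2) = x1^2 + 3*x2 + 25 with learning rate eta = 1
eta = 1.0
x1, x2 = 0.0, 0.0
for _ in range(10):
    g1, g2 = 2 * x1, 3.0      # partial derivatives of f w.r.t. x1 and x2
    x1, x2 = x1 - eta * g1, x2 - eta * g2

print(x1)  # 0.0 -- x1 starts at the minimum along its axis, so it stays 0
```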

3. Given that the probability of Event A occurring is 0.20 and the probability of Event B occurring is 0.80, which event has the highest information content and which has the lowest?

4. Mention the function of MLP.


5. List some of the advantages and applications of backpropagation.
6. Write about Thresholding Logic in Deep Learning.
7. A neural network has two hidden layers with 5 neurons in each layer, an output layer with 3 neurons, and an input layer with 2 neurons. How many weights are there in total? (Don't assume any bias terms in the network.)
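A quick counting sketch (assuming fully connected layers and no biases, as the question states):

```python
# Weights between consecutive layers of a 2-5-5-3 network
layer_sizes = [2, 5, 5, 3]  # input, hidden 1, hidden 2, output
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)  # 2*5 + 5*5 + 5*3 = 50
```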

8. Define the "winter of AI" referred to in the history of artificial intelligence.

9. Given the following input values to a sigmoid neuron: x1 = 0.72, x2 = 0.49, x3 = 0.08, x4 = 0.53, and x5 = 0.27, what labels will the sigmoid neuron predict for these inputs?

10. Define gradient descent.


1. We are given the following dataset with features (x1, x2) and y as the label (-1, 1). If we apply the perceptron algorithm on the following dataset with w initialized as (0,0), what will be the value of w when the algorithm converges? (Start the algorithm from (2,2).)

2. The following diagram represents a neural network containing two hidden layers and one output layer. The input to the network is a column vector x ∈ R³. The activation function used in the hidden layers is sigmoid. The output layer doesn't contain any activation function, and the loss used is squared error loss (y_pred − y_true)². The network doesn't contain any biases, and the weights of the network are given below (rows listed left to right):

W1 = [1 1 3; 2 1 1; 1 2 3], W2 = [1 1 2; 3 2 1], W3 = [2 3]

The input to the network is x = [1 1 1]ᵀ and the target value is y = 8.

(a) What is the predicted output for the given input x after doing the forward pass?
(b) Compute the loss between the output generated by input x and the true output y.
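A forward-pass sketch (assuming the row-major reading of the flattened weight matrices above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = np.array([[1, 1, 3], [2, 1, 1], [1, 2, 3]])
W2 = np.array([[1, 1, 2], [3, 2, 1]])
W3 = np.array([2, 3])
x = np.array([1, 1, 1])
y_true = 8.0

h1 = sigmoid(W1 @ x)           # first hidden layer (sigmoid)
h2 = sigmoid(W2 @ h1)          # second hidden layer (sigmoid)
y_pred = W3 @ h2               # linear output layer
loss = (y_pred - y_true) ** 2  # squared error loss

print(y_pred, loss)  # roughly 4.955 and 9.27 under this reading
```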

3. Explain the concept of the perceptron and its types.

4. Elaborate on the Multi-Layer Perceptron (MLP) and discuss the layer-by-layer working of a multilayer perceptron with a neat diagram.

5. Explain backpropagation and illustrate the working of the forward and backward pass.
6. Check whether the given input-output pairs satisfy the given MP neuron:

f(x) = { 1, if x1 + x2 + x3 > 2; 0, otherwise }

i) y=1 for (x1, x2, x3) = (0,1,1)
ii) y=0 for (x1, x2, x3) = (0,0,1)
iii) y=1 for (x1, x2, x3) = (0,0,0)
iv) y=1 for (x1, x2, x3) = (1,1,1)
v) y=0 for (x1, x2, x3) = (1,0,1)
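A short sketch that checks each claimed pair against the neuron's rule:

```python
# MP neuron: fire (output 1) only when x1 + x2 + x3 exceeds the threshold 2
def mp_neuron(x):
    return 1 if sum(x) > 2 else 0

pairs = [((0, 1, 1), 1), ((0, 0, 1), 0), ((0, 0, 0), 1),
         ((1, 1, 1), 1), ((1, 0, 1), 0)]
for x, y in pairs:
    print(x, y, mp_neuron(x) == y)
# Only (1,1,1) fires, so pairs i) and iii) are not satisfied; the rest are
```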

7. Explain the representation power of MLPs.
1. ________ is the use of MLFFNN.
(a) to realize the structure of an MLP (b) to solve pattern classification problems
(c) to solve pattern mapping problems (d) to realize an approximation to an MLP
2. Pattern recall takes more time for _____
a) MLFNN
b) Basis function
c) Equal for both MLFNN and basis function
d) None of the mentioned
3. The function of GRNN is _____
a) function approximation task
b) pattern classification task
c) function approximation and pattern classification task
d) none of the mentioned
4. The activation function that can only give positive outputs greater than 0 is ___
(a) Sigmoid (b) ReLU (c) Tanh (d) Linear

5. The gradient of the function 2x² − 3y² + 4y − 10 at the point (0, 0) is
(a) 0i + 4j (b) 1i + 10j (c) 2i − 3j (d) −3i + 4j
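A quick symbolic check (not part of the original question) using SymPy:

```python
import sympy as sp

# Gradient of f(x, y) = 2x^2 - 3y^2 + 4y - 10, evaluated at (0, 0)
x, y = sp.symbols('x y')
f = 2 * x**2 - 3 * y**2 + 4 * y - 10
grad = [sp.diff(f, v) for v in (x, y)]       # [4*x, -6*y + 4]
print([g.subs({x: 0, y: 0}) for g in grad])  # [0, 4] -> 0i + 4j
```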
6. The variant of gradient descent that uses an estimate of the next gradient to update the current position of the parameters is
(a) Momentum optimization (b) Stochastic gradient descent
(c) Nesterov accelerated gradient descent (d) Adagrad

7. _____ is a property of the eigenvalues of a symmetric matrix.
(a) Eigenvalues are always positive (b) Eigenvalues are always real
(c) Eigenvalues are always negative (d) Eigenvalues can be complex numbers with a non-zero imaginary part

8. What is the determinant of a matrix with eigenvalues λ1 and λ2?
(a) λ1 + λ2 (b) λ1 - λ2 (c) λ1 * λ2 (d) λ1 / λ2


1. A team has a data set that contains 1000 samples for training a
feed-forward neural network. Suppose they decided to use
gradient descent algorithm to update the weights. How many
times do the weights get updated after training the network for
5 epochs using both batch gradient descent and stochastic
gradient descent algorithm?
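A back-of-the-envelope sketch (assuming one update per batch, as is standard):

```python
# 1000 training samples, 5 epochs
n_samples, n_epochs = 1000, 5

batch_gd_updates = n_epochs * 1     # whole dataset = one batch = one update per epoch
sgd_updates = n_epochs * n_samples  # one update per sample

print(batch_gd_updates, sgd_updates)  # 5 and 5000
```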

2. Justify why stochastic gradient algorithms result in more oscillations of the parameters during the training process of the neural network.

3. Consider a gradient profile ∇W = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]. Assume v₋₁ = 0, ε = 0, β = 0.9 and the learning rate η = 0.1. Suppose that we use the Adagrad algorithm; what is the value of η₆ = η / sqrt(v₆ + ε)?
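A minimal Adagrad sketch under the stated assumptions (v starts at 0, ε = 0, η = 0.1; note that vanilla Adagrad does not use the β given in the question — that parameter belongs to RMSProp):

```python
import math

grads = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]
eta, eps, v = 0.1, 0.0, 0.0

for t, g in enumerate(grads):
    v += g * g                           # Adagrad accumulates squared gradients
    if t == 6:                           # assuming eta_6 means time step t = 6
        print(eta / math.sqrt(v + eps))  # ~0.0636
```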
4. Examine the use of bias in a neural network.

5. What are eigenvalues and eigenvectors? What is the purpose of eigendecomposition?

6. List out the different types of Gradient Descent.


7. Define Adam.
8. Define Adagrad.
9. Mention the advantage of using mini-batch gradient descent
over batch gradient descent.
10. The probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which event has high information content and which has low information content?
S.No Questions
1. Consider the following data points x1, x2, x3 to answer the following questions:

x1 = [2 2]ᵀ, x2 = [1 2]ᵀ, x3 = [2 1]ᵀ. Find

(a) the mean x̄ of the given data points x1, x2, x3;
(b) the covariance matrix C = (1/n) Σᵢ (xᵢ − x̄)(xᵢ − x̄)ᵀ;
(c) the eigenvector corresponding to the maximum eigenvalue of the matrix C.
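A NumPy sketch (using the 1/n covariance convention written in the question):

```python
import numpy as np

X = np.array([[2, 2], [1, 2], [2, 1]], dtype=float)  # rows are x1, x2, x3

mean = X.mean(axis=0)            # (a) mean of the data points
D = X - mean
C = (D.T @ D) / len(X)           # (b) covariance matrix, 1/n convention

vals, vecs = np.linalg.eigh(C)   # (c) eigenpairs of the symmetric matrix C
print(mean, C, vecs[:, np.argmax(vals)], sep="\n")
```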
2. What is gradient descent? Explain its types, advantages and
disadvantages in detail.

3. Differentiate stochastic gradient descent and gradient descent.

4. Illustrate the functions and features of feedforward neural networks in detail.

5. Explain the AdaGrad optimizer in detail with its advantages and limitations.
6. What is the covariance between height and weight in the given dataset?

Height (in)  Weight (lb)
70           155
65           130
72           180
68           160
74           190
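A quick check of the computation (np.cov defaults to the sample 1/(n−1) convention; bias=True gives the population 1/n convention):

```python
import numpy as np

height = np.array([70, 65, 72, 68, 74])
weight = np.array([155, 130, 180, 160, 190])

print(np.cov(height, weight)[0, 1])             # sample covariance: 78.25
print(np.cov(height, weight, bias=True)[0, 1])  # population covariance: 62.6
```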

7. Write short notes on i) AdaGrad ii) RMSProp
1. Dimensionality reduction techniques are used to:
a. Increase the number of features in a dataset
b. Reduce the number of features in a dataset
c. Scale the features in a dataset
d. Balance the class distribution in a dataset

2. PCA finds the directions of maximum variance in the data by:
a. Singular Value Decomposition (SVD)
b. Eigendecomposition
c. Correlation analysis
d. T-distributed Stochastic Neighbor Embedding (t-SNE)

3. Which regularization technique is commonly used to prevent overfitting in autoencoders by penalizing large weights?
a) Early stopping
b) Dataset augmentation
c) L2 regularization
d) Instance Normalization

4. Which regularization technique involves creating new synthetic data points based on existing data samples?
a) Dropout
b) L2 regularization
c) Early stopping
d) Dataset augmentation

5. Which dimensionality reduction technique can handle both numerical and categorical features?
a. PCA
b. LDA
c. t-SNE
d. Multiple Correspondence Analysis
6. _____ is a measure of the amount of variance explained by a principal component in PCA.
(a) Eigenvalue (b) Covariance
(c) Correlation (d) Mean absolute deviation

7. The main purpose of a hidden layer in an under-complete autoencoder is _____
(a) To increase the number of neurons in the network (b) To reduce the number of neurons in the network
(c) To limit the capacity of the network (d) None of these

8. If the dimension of the hidden layer representation is more than the dimension of the input layer, then what kind of autoencoder do we have?
(a) Complete autoencoder (b) Under-complete autoencoder
(c) Overcomplete autoencoder (d) Sparse autoencoder

9. The primary advantage of autoencoders over PCA is ____
(a) Autoencoders are less prone to overfitting than PCA.
(b) Autoencoders are faster and more efficient than PCA.
(c) Autoencoders require less input data than PCA.
(d) Autoencoders can capture nonlinear relationships in the input data.

10. If the dimension of the input layer in an under-complete autoencoder is 6, what is the possible dimension of the hidden layer?
(a) 6 (b) 2 (c) 8 (d) 0


S.No Questions
1. What is the significance of principal components?
2. What are the different types of autoencoders?
3. What is the primary reason for adding corruption to the input
data in a denoising autoencoder?

4. How are autoencoders different from PCA?

5. What are undercomplete and overcomplete autoencoders?

6. Differentiate between sparse and denoising autoencoders.

7. Discuss contractive autoencoders.

1. Explain Principal Component Analysis (PCA) and list the advantages and disadvantages of PCA.

2. What are autoencoders? Explain different types of autoencoders in detail.

3. Compare different types of autoencoders.


1. What is the main advantage of using dropout regularization in
deep learning models?
A. It reduces the model's complexity.
B. It increases the size of the training dataset.
C. It improves the model's generalization ability.
D. It makes the model deeper.

2. Which regularization technique is particularly useful when dealing with imbalanced datasets?
A. Dropout regularization
B. L1 regularization
C. Weight decay
D. Data augmentation

3. In L2 regularization, what is the penalty term added to the loss function based on?
A. The absolute value of the weights
B. The square of the weights
C. The logarithm of the weights
D. The exponential of the weights
4. How does weight decay affect the loss function in neural
networks?
A. It adds a penalty term based on the absolute values of
the weights.
B. It increases the learning rate.
C. It decreases the batch size.
D. It adds random noise to the input data.
5. In dropout regularization, what is the probability of keeping a
neuron during training typically set to?
A. 0 B. 0.5 C. 1 D. 2
6. What is the effect of high bias on a model's performance?
(a) The model will overfit the training data. (b) The model will underfit the training data.
(c) The model will be unable to learn anything from the training data. (d) The model's performance will be unaffected by bias.
7. How can overfitting be prevented in deep learning?
(a) By increasing the complexity of the model (b) By decreasing the size of the training data
(c) By adding more layers to the model (d) By using regularization techniques such as dropout

8. _____ is the regularization technique likely to produce a sparse weight vector.
(a) L1 regularization (b) L2 regularization
(c) Dropout (d) Data augmentation

9. The main cause of the Dead ReLU problem in deep learning is _____
(a) High variance (b) High negative bias
(c) Overfitting (d) Underfitting


1. What is the bias-variance tradeoff, and why is it important in
machine learning?
2. How does L2 regularization differ from L1 regularization?

3. What are some applications of NLP data augmentation?

4. What makes batch normalization effective in deep networks?

5. What is Greedy Layerwise Pretraining and how does it work in deep neural networks?

6. Define L2 regularization.

S.No Questions
1. Derive an equation showing that adding Gaussian noise to the inputs of a neural network is equivalent to L2 regularization.

2. What are some common data augmentation techniques used in Natural Language Processing (NLP)? Give a brief explanation of each technique.

3. Explain ensemble methods. How does bagging work in an ensemble method?

4. What happens if we initialize all weights to 0, and what will happen during backpropagation?

5. What is noise injection, what are its types, and what are their benefits?

6. Explain better weight initialization methods.

7. Differentiate between Lasso and Ridge regularization.


1. Which layer type is typically used to extract local features in a
CNN?
a) Convolutional layer b) Pooling layer
c) Fully connected layer d) Activation layer

2. What is the purpose of the pooling layer in a CNN?
a) To reduce the spatial dimensions of the feature maps
b) To introduce non-linearity to the network
c) To adjust the weights and biases of the network
d) To compute the gradients for backpropagation
3. Which layer type is used to reduce the spatial dimensions in a
CNN?
a) Convolutional layer b) Pooling layer
c) Fully connected layer d) Activation layer
4. What is the purpose of the fully connected layers in a CNN?
a) To capture global patterns and make predictions
b) To reduce the spatial dimensions of the input data
c) To apply non-linear transformations to the feature maps
d) To initialize the weights and biases of the network
5. Which layer type is responsible for backpropagating the gradients and updating the network's parameters in a CNN?
a) Convolutional layer b) Pooling layer
c) Fully connected layer d) Activation layer
6. Which of the following is an advantage of using the skip-gram method over the bag-of-words approach?
(a) The skip-gram method is faster to train (b) The skip-gram method performs better on rare words
(c) The bag-of-words approach is more accurate (d) The bag-of-words approach is better for short text
7. We add incorrect pairs into our corpus to maximize the probability of words that occur in the same context and minimize the probability of words that occur in different contexts. This technique is called
(a) Hierarchical softmax (b) Contrastive estimation
(c) Negative sampling (d) Glove representations

8. What is the computational complexity of computing the softmax function in the output layer of a neural network?
(a) O(n) (b) O(n²) (c) O(n log n) (d) O(log n)

9. Which of the following is an advantage of the CBOW model compared to the Skip-gram model?
(a) It is faster to train (b) It requires less memory
(c) It performs better on rare words (d) All of the above

10. Which of the following architectures has the highest number of layers?
(a) AlexNet (b) GoogleNet (c) VGG (d) ResNet


S.No Questions
1. You are given the one hot representation of two words below:
CAR= [1, 0, 0, 0, 0], BUS= [0, 0, 0, 1, 0]

What is the Euclidean distance between CAR and BUS?
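A one-line check (the vectors differ in two coordinates, so the distance is sqrt(2)):

```python
import numpy as np

car = np.array([1, 0, 0, 0, 0])
bus = np.array([0, 0, 0, 1, 0])
print(np.linalg.norm(car - bus))  # sqrt(2) ≈ 1.414
```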

2. Summarize the relation between SVD and Word2Vec.

3. Consider an input image of size 100 x 100 x 3. Suppose that we use 10 kernels (filters) each of size 1 x 1, zero padding P = 1 and stride value S = 2. How many parameters are there? (Assume no bias terms.)
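A quick count (padding and stride change the output size, not the parameter count):

```python
# Each 1x1 kernel spans all 3 input channels:
# parameters = kernel_h * kernel_w * in_channels * num_kernels
print(1 * 1 * 3 * 10)  # 30 (no bias terms)
```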

4. Outline the motivation behind using multiple filters in one convolution layer.

5. Define deep art and deep dream.

6. Mention the disadvantage of using Hierarchical Softmax.

7. Consider a convolution operation with an input image of size 256x256x3 and 40 filters of size 11x11x3, using a stride of 4 and a padding of 2. What is the height of the output?
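A sketch using the standard output-size formula H_out = floor((H + 2P − K) / S) + 1:

```python
H, K, P, S = 256, 11, 2, 4       # input height, kernel, padding, stride
print((H + 2 * P - K) // S + 1)  # floor(249 / 4) + 1 = 63
```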
S.No Questions
1. Compare and contrast the architectural design of LeNet and
ResNet.

2. Explain briefly the different layers in a CNN.

3. How do visualization techniques help in understanding CNN behavior?

4. Explain the principles of guided backpropagation in CNN.

5. Differentiate between AlexNet and GoogleNet.

6. i) Consider a convolution operation with an input image of size 100x100x3 and a filter of size 8x8x3, using a stride of 1 and a padding of 1. What is the output size?

ii) Consider an input image of size 100×100×1. Suppose that we use a kernel of size 3×3, zero padding P=1 and stride value S=3. What will be the output dimension?

iii) Consider an input image of size 100×100×3. Suppose that we use 10 kernels (filters) each of size 1×1, zero padding P=1 and stride value S=2. How many parameters are there? (Assume no bias terms.)
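A combined sketch for all three parts, using the same output-size formula as above:

```python
def conv_out(size, kernel, pad, stride):
    # floor((N + 2P - K) / S) + 1
    return (size + 2 * pad - kernel) // stride + 1

print(conv_out(100, 8, 1, 1))  # i)  95 -> 95 x 95 output per filter
print(conv_out(100, 3, 1, 3))  # ii) 34 -> 34 x 34 x 1

# iii) parameters = kernel_h * kernel_w * in_channels * num_kernels
print(1 * 1 * 3 * 10)          # 30 (no bias terms)
```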

7. Explain the features of ResNet and VGGNet.
S.No Questions
1. Which layer type is typically used to capture sequential
dependencies in an RNN?
a) Input layer
b) Hidden layer
c) Output layer
d) Activation layer
2. Which activation function is commonly used in the recurrent
layers of an RNN?
a) ReLU (Rectified Linear Unit)
b) Sigmoid
c) Tanh (Hyperbolic Tangent)
d) Softmax
3. What is the purpose of the bidirectional RNN architecture?
a) To handle sequential data in both forward and backward
directions
b) To reduce the computational complexity of the network
c) To adjust the learning rate during training
d) None of the above
4. What is the purpose of the peephole connections in a Long
Short-Term Memory (LSTM) network?
a) To allow the cell state to influence the gating mechanisms
b) To adjust the learning rate during training
c) To introduce non-linearity to the network
d) None of the above
5. What is the purpose of the cell state in an LSTM network?
a) To store long-term dependencies in the input sequence
b) To adjust the learning rate during training
c) To compute the gradients for backpropagation
d) None of the above
6. Which of the following is a common variant of the attention mechanism?
(a) Self-attention (b) Multi-task attention
(c) Adversarial attention (d) Transfer learning attention


7. Which of the following is a common architecture used for sequence learning in deep learning?
a) Convolutional Neural Networks (CNNs) b) Autoencoders
c) Recurrent Neural Networks (RNNs) d) Generative Adversarial Networks (GANs)

8. Which of the following is the main disadvantage of using BPTT?
a) It is computationally expensive. b) It is difficult to implement.
c) It requires a large amount of data. d) It is prone to overfitting.


S.No Questions
1. Define Truncated BPTT.

2. Define GRU.

3. What is the correct sequence of operations performed by an LSTM at time step t?

4. Define the attention mechanism.

5. Define attention over images.

6. What is the purpose of the forget gate in an LSTM network?

7. What are the problems in the RNN architecture?

8. List the main advantage of using GRUs over traditional RNNs.
S.No Questions
1. Suppose that we need to develop an RNN model for sentiment
classification. The input to the model is a sentence composed
of five words and the output is the sentiments (positive or
negative). Assume that each word is represented as a vector of
length 70×1 and the output labels are one-hot encoded. Further,
the state vector St is initialized with all zeros of size 50×1. How
many parameters (including bias) are there in the network?
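A parameter-count sketch, assuming a vanilla RNN s_t = f(U·x_t + W·s_{t−1} + b) with a 2-way softmax output y = softmax(V·s_T + c), since the labels are one-hot over positive/negative:

```python
input_dim, state_dim, output_dim = 70, 50, 2

U = state_dim * input_dim   # 3500 input-to-state weights
W = state_dim * state_dim   # 2500 state-to-state weights
b = state_dim               #   50 state biases
V = output_dim * state_dim  #  100 state-to-output weights
c = output_dim              #    2 output biases

print(U + W + b + V + c)    # 6152
```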

2. Explain briefly Recurrent Neural Networks (RNNs).

3. What is an encoder-decoder model, and what is the advantage of using an encoder-decoder model for sequence-to-sequence tasks?

4. Write about the different types of attention mechanisms for images and their applications.

5. Describe the function and features of LSTM in detail.
