Insem2 Scheme
SECOND SESSIONAL EXAMINATION APRIL 2023: SCHEME
SUBJECT: DEEP LEARNING (DSE 5251)
(18/04/2023)
Time: 10:30-11:30 AM MAX. MARKS: 15
1 Consider an RNN with a single hidden layer containing 50 hidden units. The input to the network is a
sequence of 100-dimensional vectors, and the output is a sequence of 10-dimensional vectors. If there are
5 time steps, compute the total number of parameters in the network, assuming no biases are used. (1 mark)
Answer:
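A minimal worked computation, assuming the standard (Elman) RNN parameterization with weights shared across all time steps:
Input-to-hidden weights: 100 × 50 = 5000
Hidden-to-hidden (recurrent) weights: 50 × 50 = 2500
Hidden-to-output weights: 50 × 10 = 500
Total = 5000 + 2500 + 500 = 8000 parameters. Since the weights are shared across all 5 time steps, the number of time steps does not change the count.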
2 Consider the task of Video Captioning using an encoder-decoder network, where the input is a video and
the output is the caption. State the equation(s) for the encoder part of this network. (1 mark)
Answer:
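A representative answer, assuming the per-frame feature vectors x_1, …, x_T of the video are fed to an RNN encoder (notation is illustrative):
h_t = RNN(h_{t-1}, x_t), e.g. h_t = tanh(W x_t + U h_{t-1} + b), for t = 1, …, T
The final encoder state h_T initializes the decoder (s_0 = h_T).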
3 How does the input gate in an LSTM regulate the flow of new information? (1 mark)
Answer:
An LSTM cell consists of three main components: the input gate, the forget gate, and the output gate.
These gates control the flow of information in and out of the memory cell, allowing LSTMs to effectively
capture long-term dependencies in sequential data.
The input gate in an LSTM determines how much of the new input should be stored in the cell state and
how much of the existing cell state should be preserved. It is typically implemented as a sigmoid
activation applied to a combination of the current input and the previous hidden state (output) of the
LSTM; the sigmoid squashes values to between 0 and 1, realizing the gating mechanism. The equation for
the input gate of the LSTM is:
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
The output of the input gate (i_t) acts as a filter for the new input. A value close to 0 means that
the gate is closed, and no new information is allowed into the cell state; a value close to 1
means that the gate is open, and all the new information is allowed to flow into the cell state.
The input gate activation is then element-wise multiplied with the candidate values, which are
derived from the current input and the previous hidden state. The candidate values represent the
information that can be added to the cell state. This multiplication allows the input gate to
selectively update the cell state, preserving only the relevant information.
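For completeness, one standard formulation of the candidate values and the cell-state update the paragraph describes (notation follows the input-gate equation above and is illustrative):
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t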
Scheme: Description of the input gate (0.5 marks) + brief explanation of how it regulates the
flow of information (0.5 marks)
4 Briefly explain the solution for the exploding gradient problem in RNNs. (1 mark)
Answer:
The solution for the exploding gradient problem in RNNs is gradient clipping, which involves
rescaling the gradients if they exceed a certain threshold. This prevents the gradients from
growing too large and helps stabilize the training process. Gradient clipping can be implemented
in the following ways:
I. Clipping by value: each component g_i of the gradient g is clamped element-wise:
   if g_i > max_threshold then g_i ← max_threshold
   if g_i < min_threshold then g_i ← min_threshold
   end if
II. Clipping by norm: the whole gradient vector is rescaled if its norm exceeds a threshold:
   if ‖g‖ > threshold then g ← threshold · g / ‖g‖
   end if
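A minimal PyTorch sketch of both forms of clipping; the model, shapes, and threshold values below are illustrative, and in practice only one of the two calls would be used per training step:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20)   # illustrative model
out, _ = rnn(torch.randn(5, 3, 10))           # (seq_len, batch, features)
out.sum().backward()                          # populate .grad on the parameters

# I. Clipping by value: clamp every gradient component to [-0.5, 0.5]
torch.nn.utils.clip_grad_value_(rnn.parameters(), clip_value=0.5)

# II. Clipping by norm: rescale so the total gradient norm is at most 1.0
torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)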
5 A model applies an attention mechanism over the output feature map of a CNN, whose spatial size is 56 × 56. How many locations can the attention mechanism attend to? (1 mark)
Answer:
The attention mechanism typically operates on a spatial grid, attending to different locations
within the feature map. The number of locations corresponds to the spatial dimensions of the
feature map, which is 56 × 56 = 3136.
Scheme: Correct answer (1 mark)
6 Consider an RNN-based network which takes as input yesterday's and today's values to predict tomorrow's
value (the weights and biases are pretrained and mentioned in the diagram). Write down the
computations to predict tomorrow's value, given that yesterday's input was 0.2 and today's input
is 0.5. (2 marks)
Answer:
Unroll the RNN over the two time steps, using the pretrained weights and biases from the diagram
(w1: input weight, w2: recurrent weight, w3: output weight; b1, b2: biases):
Yesterday (x = 0.2): ReLU output h1 = ReLU(0.2 · w1 + b1)
Today (x = 0.5): ReLU output h2 = ReLU(0.5 · w1 + h1 · w2 + b1)
Predicted tomorrow's value: h2 · w3 + b2 = 0.693
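A minimal NumPy sketch of the unrolled computation; the weight and bias values below are placeholders, since the diagram's pretrained values are not reproduced here (the output equals 0.693 only with the actual pretrained values):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Placeholder parameters; the real values come from the diagram.
w1, w2, w3 = 1.0, 1.0, 1.0   # input, recurrent, output weights (hypothetical)
b1, b2 = 0.0, 0.0            # biases (hypothetical)

h1 = relu(0.2 * w1 + b1)             # ReLU output for yesterday (x = 0.2)
h2 = relu(0.5 * w1 + h1 * w2 + b1)   # ReLU output for today (x = 0.5)
y = h2 * w3 + b2                     # predicted tomorrow's value
print(y)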
Scheme:
Computing the ReLU output for today (0.5 marks)
Computing tomorrow's predicted value (1.5 marks)
7 Briefly explain how the issues of vanishing and exploding gradients arise in an RNN, using relevant
equations. (2 marks)
Answer:
The problem of vanishing and exploding gradients occurs during backpropagation through time
in RNNs, specifically while computing gradients. For instance, consider the following equation
for the gradient of the loss at time step t with respect to the recurrent weight W:
∂L_t/∂W = Σ_{k=1}^{t} (∂L_t/∂y_t)(∂y_t/∂h_t)(∂h_t/∂h_k)(∂h_k/∂W)
Let us analyze the term ∂h_t/∂h_k, as it is the most dependent on all the previous time steps:
∂h_t/∂h_k = Π_{i=k+1}^{t} ∂h_i/∂h_{i-1} --- (1)
We know that h_i = σ(W h_{i-1} + U x_i), where σ is the activation function. So:
∂h_i/∂h_{i-1} = diag(σ′(·)) W
As the activation functions and their derivatives are bounded functions, we can rewrite the
above as:
‖∂h_i/∂h_{i-1}‖ ≤ γ ‖W‖, where γ represents the bound on the derivative of the activation.
If λ = ‖W‖, then:
‖∂h_t/∂h_k‖ ≤ (γλ)^{t−k}
Hence, as t − k grows, the product in (1) shrinks exponentially when γλ < 1 (vanishing
gradients) and grows exponentially when γλ > 1 (exploding gradients).
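A quick numeric illustration of the bound, with assumed values γ = 1 and t − k = 50: if λ = 0.9, then (γλ)^{50} ≈ 0.005 and the gradient contribution all but vanishes; if λ = 1.1, then (γλ)^{50} ≈ 117 and it explodes.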
Scheme:
Briefly explaining the problem of vanishing and exploding gradients (0.5 marks)
The proof with equation (1.5 marks)
8 You have been given the task of designing a deep learning model which takes as input the names of
people in English and converts them to the closest corresponding letters in Hindi. With the help of a neat
block/architecture diagram and computations, explain the working of the encoder-decoder model for this
task. (3 marks)
Answer:
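A sketch of the computations such an answer would include, assuming a character-level RNN encoder-decoder; all symbols are illustrative:
Encoder (English characters x_1, …, x_T): h_t = RNN(h_{t-1}, x_t), with s_0 = h_T
Decoder (Hindi characters): s_t = RNN(s_{t-1}, e(ŷ_{t-1})), where e(·) embeds the previously generated character
Output distribution over the Hindi alphabet: P(y_t | y_{<t}, x) = softmax(V s_t + c)
Decoding starts from a <GO> token and stops when <STOP> is produced.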
Scheme:
Block diagram with all inputs, outputs, parameters marked (1 mark)
Computation with explanation (2 marks)
9 You have been given the task of designing a deep learning model for document classification. With the
help of a neat block/architecture diagram and computations, explain the working of the encoder-decoder
model with attention mechanism for this task. (3 marks)
Answer:
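A sketch of the attention computations such an answer would include, assuming an RNN encoder over the document's words and illustrative notation:
Encoder: h_j = RNN(h_{j-1}, x_j), for j = 1, …, T
Alignment scores: e_{jt} = v⊤ tanh(U_att s_{t-1} + W_att h_j)
Attention weights: α_{jt} = exp(e_{jt}) / Σ_k exp(e_{kt})
Context vector: c_t = Σ_j α_{jt} h_j
The context vector feeds the decoder state s_t, and the final state is passed through a softmax layer to produce the class probabilities.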
Scheme:
Block diagram (1.5 marks)
Computation (1.5 marks)