
Introduction to Large Language Models

Assignment 5

Number of questions: 8 Total marks: 6 × 1 + 2 × 2 = 10


_________________________________________________________________________

QUESTION 1: [1 mark]
Which of the following is a disadvantage of Recurrent Neural Networks (RNNs)?

a. Can only process fixed-length inputs.
b. Symmetry in how inputs are processed.
c. Difficulty accessing information from many steps back.
d. Weights are not reused across timesteps.

Correct Answer: c

Solution: Gradients vanish as they are backpropagated through many timesteps, so RNNs have difficulty accessing information from many steps back (see the lecture slides).


_______________________________________________________________________

QUESTION 2: [1 mark]

Why are RNNs preferred over fixed-window neural models?

a. They have a smaller parameter size.
b. They can process sequences of arbitrary length.
c. They eliminate the need for embedding layers.
d. None of the above.

Correct Answer: b
Solution: An RNN applies the same weights at every timestep, so it can process sequences of arbitrary length, unlike fixed-window models (see the lecture slides).
_________________________________________________________________________

QUESTION 3: [1 mark]

What is the primary purpose of the cell state in an LSTM?

a. Store short-term information.
b. Control the gradient flow across timesteps.
c. Store long-term information.
d. Perform the activation function.

Correct Answer: c
Solution: The cell state serves as the LSTM's long-term memory; the gates control what is written to, erased from, and read out of it.
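For reference, a standard formulation of the LSTM cell-state update (f_t: forget gate, i_t: input gate, c̃_t: candidate cell values) is:

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

Because this update is additive rather than a repeated matrix multiplication, information written to the cell state can persist across many timesteps, which is why it serves as long-term memory.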
_________________________________________________________________________

QUESTION 4: [1 mark]

In training an RNN, what technique is used to calculate gradients over multiple timesteps?
a. Backpropagation through Time (BPTT)
b. Stochastic Gradient Descent (SGD)
c. Dropout Regularization
d. Layer Normalization

Correct Answer: a
Solution: Backpropagation Through Time (BPTT) unrolls the RNN across timesteps and backpropagates gradients through the unrolled computation graph (see the lecture slides).
_________________________________________________________________________

QUESTION 5: [2 marks]

Consider a simple RNN:

● Input vector size: 3
● Hidden state size: 4
● Output vector size: 2
● Number of timesteps: 5

How many parameters are there in total?

a. 210
b. 190
c. 90
d. 42

Correct Answer: d
Solution: Weights are shared across timesteps, so the number of timesteps (5) does not affect the parameter count.
Input-to-hidden weights: 3 × 4 = 12
Hidden-to-hidden weights: 4 × 4 = 16
Hidden-to-output weights: 4 × 2 = 8
Bias terms: 4 (hidden) + 2 (output) = 6
Total: 12 + 16 + 8 + 6 = 42
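As a sanity check, here is a minimal Python sketch that reproduces the count above. It assumes one bias vector for the hidden update and one for the output, matching the solution's convention (note that some libraries, e.g. PyTorch's nn.RNN, use two hidden bias vectors and would report a different total):

```python
# Parameter count for the simple RNN in Question 5.
# Assumes one hidden bias and one output bias, as in the solution above.
input_size, hidden_size, output_size = 3, 4, 2

w_ih = input_size * hidden_size     # input-to-hidden weights:  3 * 4 = 12
w_hh = hidden_size * hidden_size    # hidden-to-hidden weights: 4 * 4 = 16
w_ho = hidden_size * output_size    # hidden-to-output weights: 4 * 2 = 8
biases = hidden_size + output_size  # bias terms: 4 + 2 = 6

print(w_ih + w_hh + w_ho + biases)  # 42 -- independent of the 5 timesteps
```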
_________________________________________________________________________

QUESTION 6: [1 mark]

What is the time complexity for processing a sequence of length 'N' by an RNN, if the input
embedding dimension, hidden state dimension, and output vector dimension are all 'd'?

a. O(N)
b. O(N²d)
c. O(Nd)
d. O(Nd²)

Correct Answer: d
Solution: The time complexity of processing a sequence of length N with an RNN depends on the computational cost of updating the hidden state at each timestep.
At each timestep, the RNN updates its hidden state h_t using the previous hidden state h_{t-1} and the current input x_t. This update involves matrix-vector multiplications:

I. Input-to-hidden transformation: W_x · x_t, where W_x is a d × d matrix, costing O(d²).
II. Hidden-to-hidden transformation: W_h · h_{t-1}, where W_h is also a d × d matrix, costing O(d²).
III. Activation function application: this is O(d) and negligible compared to the matrix multiplications.

Since these computations occur at every timestep, the total complexity for a sequence of length N is O(N · d²).
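The per-timestep cost is easy to see in a direct implementation. The sketch below is illustrative (the dimensions and random weights are made up for the example); each loop iteration performs two d × d matrix-vector products, so the full pass over N steps costs O(N · d²):

```python
import numpy as np

N, d = 5, 4                     # sequence length and dimension (example values)
rng = np.random.default_rng(0)
W_x = rng.normal(size=(d, d))   # input-to-hidden weights
W_h = rng.normal(size=(d, d))   # hidden-to-hidden weights
b = np.zeros(d)                 # hidden bias

h = np.zeros(d)                 # initial hidden state
xs = rng.normal(size=(N, d))    # input sequence: one d-dim vector per timestep

for x_t in xs:
    # Two O(d^2) matrix-vector products per step, plus an O(d) activation.
    h = np.tanh(W_x @ x_t + W_h @ h + b)
```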
_________________________________________________________________________

QUESTION 7: [1 mark]

Which of the following is true about Seq2Seq models?

(i) Seq2Seq models are always conditioned on the source sentence.
(ii) The encoder compresses the input sequence into a fixed-size vector representation.
(iii) Seq2Seq models cannot handle variable-length sequences.

a. (i) and (ii)
b. (ii) only
c. (iii) only
d. (i), (ii), and (iii)

Correct Answer: a
Solution: The decoder is conditioned on the source sentence through the encoder, and the encoder compresses the variable-length input sequence into a fixed-size vector representation, so (i) and (ii) are true. Statement (iii) is false: handling variable-length sequences is precisely what Seq2Seq models are designed for.

_________________________________________________________________________

QUESTION 8: [2 marks]

Given the following encoder and decoder hidden states, compute the attention scores. (Use
dot product as the scoring function)

Encoder hidden states: h1=[1,2], h2=[3,4], h3=[5,6]

Decoder hidden state: s=[0.5,1]

a. 0.00235, 0.04731, 0.9503
b. 0.0737, 0.287, 0.6393
c. 0.9503, 0.0137, 0.036
d. 0.6393, 0.0737, 0.287

Correct Answer: a
Solution:
Dot-product scores:
e1 = 1 × 0.5 + 2 × 1 = 0.5 + 2 = 2.5
e2 = 3 × 0.5 + 4 × 1 = 1.5 + 4 = 5.5
e3 = 5 × 0.5 + 6 × 1 = 2.5 + 6 = 8.5

Applying softmax to the scores:
α1 = e^2.5 / (e^2.5 + e^5.5 + e^8.5) ≈ 0.00235
α2 = e^5.5 / (e^2.5 + e^5.5 + e^8.5) ≈ 0.04731
α3 = e^8.5 / (e^2.5 + e^5.5 + e^8.5) ≈ 0.9503
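The arithmetic can be verified with a few lines of NumPy; this is just a check of the numbers above, not a full attention implementation:

```python
import numpy as np

encoder_states = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # h1, h2, h3
decoder_state = np.array([0.5, 1.0])                             # s

scores = encoder_states @ decoder_state          # dot products: [2.5, 5.5, 8.5]
weights = np.exp(scores) / np.exp(scores).sum()  # softmax
print(weights)  # ≈ [0.00235, 0.04731, 0.95033]
```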

_________________________________________________________________________
