
Exercise #2 28/4/2025

Problems with sequence models:


1. Vanilla RNN
2. LSTM
3. GRU
I. MCQ:
1. Weight-tying in sequence models:
A. Increases model parameters
B. Forces the decoder to use the same embedding matrix as the encoder
C. Only applies to convolutional networks
D. Prevents overfitting completely
2. What is the primary role of the hidden state in a vanilla RNN?
A. To store the model’s weights
B. To capture information from previous time steps
C. To compute the output activation function
D. To initialize the input embeddings
3. In a vanilla RNN, gradient vanishing occurs because:
A. The activation functions produce zero outputs
B. The chain of repeated multiplications by the same weight matrix shrinks gradients exponentially when its spectral radius < 1
C. The learning rate is too high
D. The loss function is not differentiable
4. Which activation function is commonly used inside RNN cells to mitigate vanishing gradients?
A. Sigmoid
B. Tanh
C. ReLU
D. Softmax
5. Which of the following helps prevent overfitting in RNNs during training?
A. Increasing the hidden state size without regularization
B. Applying dropout on recurrent connections
C. Removing all activation functions
D. Using a single-layer RNN only
6. Gradient clipping in RNN training is used to:
A. Prevent gradient explosion by capping the norm
B. Prevent gradient vanishing by boosting small gradients
C. Clip weights instead of gradients
D. Speed up backward propagation
7. In an LSTM cell, the forget gate controls:
A. How much new information can be added from the current input
B. How much past information to retain or discard
C. The activation function applied to the cell state
D. The learning rate for the weight updates
8. In an LSTM cell, the output gate controls:
A. How much of the cell state to expose as the hidden state
B. How much new information can be added to the cell state
C. How much past information to forget
D. The learning rate of the network
9. Which of the following is not a component of an LSTM cell?
A. Input gate
B. Forget gate
C. Reset gate
D. Output gate
10. A GRU differs from an LSTM because it:
A. Has separate input, forget, and output gates
B. Uses a single “update” gate instead of input+forget gates
C. Cannot model long-term dependencies
D. Always outperforms LSTMs on sequence tasks
11. In a GRU cell, the reset gate primarily controls:
A. How much past information to forget when computing the new candidate activation
B. The overall learning rate
C. The nonlinearity applied at the output
D. The dropout rates
12. A key advantage of GRUs over LSTMs is that GRUs:
A. Always achieve higher accuracy
B. Have fewer parameters and can train faster
C. Use convolutional operations
D. Require no gating mechanisms
13. The update gate in a GRU mechanism:
A. Determines how much past information to keep versus update with new input
B. Controls the final output activation
C. Resets the hidden state to zero
D. Computes the attention weights

II. Answer the following questions:


1. Explain why vanilla RNNs struggle with long-term dependencies. Sketch how gradients can vanish or explode through many time steps (a reference sketch of the relevant Jacobian product follows this list).
2. Describe the role of the forget gate in the LSTM cell. How does it help preserve long-term information?
3. Contrast the GRU’s update gate with the LSTM’s forget and input gates. Why might GRUs train faster?
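
A reference sketch for question 1, assuming the standard tanh cell h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h): backpropagating the loss from step T to an earlier step k multiplies one Jacobian per intermediate step,

\frac{\partial L}{\partial h_k} = \frac{\partial L}{\partial h_T} \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}, \qquad \frac{\partial h_t}{\partial h_{t-1}} = \mathrm{diag}\!\left(1 - h_t^{2}\right) W_{hh}.

Each factor's norm is at most \|W_{hh}\| times the largest tanh derivative (itself at most 1), so if the spectral norm of W_{hh} stays below 1 every factor contracts and the product shrinks roughly geometrically in T - k (vanishing gradients); if the Jacobians consistently expand some direction, the product grows geometrically (exploding gradients).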

III. Coding:
a) Implement a vanilla RNN to process sequences of length 50 with hidden size 16 (a starting-point sketch in PyTorch follows this list).
b) Train it on a synthetic task where the label depends only on the first input token.
c) Observe and report how test accuracy changes as sequence length increases from 10 to 100.
d) Explain your observations in terms of vanishing gradients.
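
A minimal PyTorch sketch for parts (a)-(c), assuming a synthetic task in which the label is simply the first token of the sequence; the names used here (make_first_token_task, VanillaRNNClassifier) are illustrative, not prescribed by the exercise:

import torch
import torch.nn as nn

def make_first_token_task(n_samples, seq_len, n_classes=2):
    # Synthetic data: random token sequences; the label is simply the first token.
    x = torch.randint(0, n_classes, (n_samples, seq_len))
    y = x[:, 0].clone()
    return x, y

class VanillaRNNClassifier(nn.Module):
    def __init__(self, n_tokens=2, embed_dim=8, hidden_size=16, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)  # tanh vanilla RNN
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))   # h: (batch, seq_len, hidden)
        return self.head(h[:, -1, :])    # classify from the final hidden state

def test_accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

for seq_len in (10, 25, 50, 100):        # part (c): vary the sequence length
    x_tr, y_tr = make_first_token_task(2000, seq_len)
    x_te, y_te = make_first_token_task(500, seq_len)
    model = VanillaRNNClassifier(hidden_size=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(200):                 # short full-batch training loop
        opt.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        opt.step()
    print(f"seq_len={seq_len:3d}  test accuracy={test_accuracy(model, x_te, y_te):.3f}")

With a plain tanh RNN, accuracy usually degrades toward chance as seq_len grows, because the gradient from the label must travel back through every time step to reach the first token, which is the observation part (d) asks you to explain.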

IV. Numerical problems with solutions

[1] Forward Pass in a Simple RNN


Given an RNN cell defined as:
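
For reference, a vanilla RNN cell is commonly defined as follows (the exercise's specific weights and inputs apply on top of this form):

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y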
[2] LSTM Gate Computation
For an LSTM cell, you are given the following:
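
For reference, the standard LSTM gate computations (with \sigma the logistic sigmoid and \odot element-wise multiplication) are:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)
h_t = o_t \odot \tanh(c_t)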
[3] GRU Reset and Update Gate Dynamics
In a GRU, the update and reset gates are computed as:
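
For reference, the gates of a standard GRU are commonly written as:

z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)   (update gate)
r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)   (reset gate)
\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

(Conventions differ on whether z_t or 1 - z_t weights the previous hidden state; either way, the update gate interpolates between h_{t-1} and the candidate \tilde{h}_t.)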
[4] Backpropagation Through a Single RNN Cell
Consider the RNN cell from problem 1:

Solution:
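
A generic sketch of this backward pass, assuming the standard cell h_t = \tanh(a_t) with a_t = W_{xh} x_t + W_{hh} h_{t-1} + b_h and upstream gradient g = \partial L / \partial h_t (the exercise's particular numbers are not used here):

\partial L / \partial a_t = g \odot (1 - h_t^{2})
\partial L / \partial W_{xh} = (\partial L / \partial a_t)\, x_t^\top
\partial L / \partial W_{hh} = (\partial L / \partial a_t)\, h_{t-1}^\top
\partial L / \partial b_h = \partial L / \partial a_t
\partial L / \partial h_{t-1} = W_{hh}^\top (\partial L / \partial a_t)
\partial L / \partial x_t = W_{xh}^\top (\partial L / \partial a_t)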

[5] Gradient of the Loss with Respect to the Input in an LSTM
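
For reference, under the standard LSTM equations above, x_t enters the cell only through the four gate pre-activations, so (writing W_{\cdot,x} for the input-facing block of each gate's weight matrix and \delta_\cdot for the loss gradient at that gate's pre-activation):

\partial L / \partial x_t = W_{f,x}^\top \delta_f + W_{i,x}^\top \delta_i + W_{c,x}^\top \delta_{\tilde{c}} + W_{o,x}^\top \delta_o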


[6] Sequence Classification Output Gradient

Problem:
A sequence classifier uses the final hidden state h_T of an RNN to predict class scores via
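
Assuming the common setup of a linear output layer followed by softmax and cross-entropy loss (the exercise's exact expression continues in the original statement), a sketch of the output gradient:

z = W_{hy} h_T + b_y, \quad \hat{y} = \mathrm{softmax}(z), \quad L = -\log \hat{y}_c
\partial L / \partial z = \hat{y} - e_c   (e_c: one-hot vector for the true class c)
\partial L / \partial h_T = W_{hy}^\top (\hat{y} - e_c), \qquad \partial L / \partial W_{hy} = (\hat{y} - e_c)\, h_T^\top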
