Exercise #2, 28-4-2025
III. Coding:
a) Implement a vanilla RNN to process sequences of length 50 with hidden size 16.
b) Train it on a synthetic task where the label depends only on the first input token.
c) Observe and report how test accuracy changes as sequence length increases from 10 to 100.
d) Explain your observations in terms of vanishing gradients.
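As a starting point for part (d), the following is a minimal sketch, not a full solution: it does no training, and only measures how much gradient from the final hidden state reaches the first time step as the sequence length grows. It assumes a tanh recurrence h_t = tanh(W h_{t-1} + u x_t) with scalar binary inputs and a uniform initialization in [-1/sqrt(16), 1/sqrt(16)]; none of these choices are fixed by the exercise, and the helper names (make_rnn, forward, grad_norm_at_first_step) are hypothetical.

```python
import math
import random

HIDDEN = 16  # hidden size from part (a)

def make_rnn(seed=0):
    """Random weights for h_t = tanh(W h_{t-1} + u * x_t); scalar inputs assumed."""
    rng = random.Random(seed)
    bound = 1.0 / math.sqrt(HIDDEN)  # assumed init range, not given in the exercise
    W = [[rng.uniform(-bound, bound) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
    u = [rng.uniform(-bound, bound) for _ in range(HIDDEN)]
    return W, u, rng

def forward(W, u, xs):
    """Run the RNN over xs and return the list of hidden states h_1..h_T."""
    h = [0.0] * HIDDEN
    states = []
    for x in xs:
        h = [math.tanh(sum(W[i][j] * h[j] for j in range(HIDDEN)) + u[i] * x)
             for i in range(HIDDEN)]
        states.append(h)
    return states

def grad_norm_at_first_step(W, states):
    """Norm of d(sum of h_T)/d(h_1), obtained by backpropagating through time."""
    g = [1.0] * HIDDEN  # gradient of sum(h_T) with respect to h_T
    for h in reversed(states[1:]):  # visit steps T, T-1, ..., 2
        d = [g[i] * (1.0 - h[i] ** 2) for i in range(HIDDEN)]  # through tanh
        g = [sum(W[i][j] * d[i] for i in range(HIDDEN)) for j in range(HIDDEN)]  # W^T d
    return math.sqrt(sum(v * v for v in g))

W, u, rng = make_rnn()
norms = {}
for T in (10, 25, 50, 100):
    xs = [rng.choice([0.0, 1.0]) for _ in range(T)]  # random binary tokens
    norms[T] = grad_norm_at_first_step(W, forward(W, u, xs))
    print(f"T={T:3d}  gradient norm reaching step 1: {norms[T]:.3e}")
```

Under these assumptions the printed norm should shrink rapidly with T, because each backward step multiplies the gradient by W^T and by tanh derivatives that are at most 1. Since the label in part (b) depends only on the first token, a shrinking gradient at step 1 means the weights receive almost no learning signal about that token for long sequences, which is one way to frame the accuracy trend asked for in part (c).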
Solution:
Problem:
A sequence classifier uses the final hidden state hT of an RNN to predict class scores via