Assignment-8 Task 1
In this course, so far, we have only discussed classifiers that work on non-sequential data (or, alternatively, data with a
fixed sequence).
a) Why might those algorithms have trouble predicting the correct output for sequential data (like time series, text, etc.) on
specific tasks? Give an example and explain it based on that.
b) To make better use of sequential data, the concept of Recurrent Neural Networks (RNNs) was introduced in the field of
neural networks. How do they differ from Feed-Forward Neural Networks?
c) Briefly explain the general idea of Backpropagation Through Time for training such networks, along with the inherent
problems of vanishing and exploding gradients.
d) How does the LSTM (Long Short-Term Memory) variant try to solve the vanishing gradient problem?
e) Summarize the main take-aways of this question in 4 sentences/bullet points!
a) Standard neural networks have trouble predicting the correct output for sequential data because they are not designed to
process sequences. Their input is typically a single fixed-size vector, so they consider neither the order in which the data
arrives nor the correlations between successive elements; they have no memory mechanism that could store earlier inputs and
relate them to later ones. For example, in text classification the sentences "man bites dog" and "dog bites man" contain
exactly the same words but describe very different events; a model that ignores word order receives the same input for both
and is therefore forced to predict the same output.
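To make this concrete, here is a minimal Python sketch (the vocabulary and sentences are made up purely for illustration): a feed-forward classifier fed with an order-invariant bag-of-words representation receives identical input vectors for the two sentences and hence cannot tell them apart.

```python
import numpy as np

# Toy example (made up for illustration): two sentences with the same words
# but very different meanings, because only the order differs.
vocab = {"man": 0, "bites": 1, "dog": 2}

def bag_of_words(sentence):
    """Order-invariant representation: one count per vocabulary word."""
    x = np.zeros(len(vocab))
    for word in sentence.split():
        x[vocab[word]] += 1
    return x

x1 = bag_of_words("man bites dog")
x2 = bag_of_words("dog bites man")

# A feed-forward classifier f(x) receives identical input vectors here,
# so it must give identical predictions -- the word order is lost before
# the model even runs.
print(np.array_equal(x1, x2))  # True
```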
b) Recurrent Neural Networks (RNNs) differ from Feed-Forward Neural Networks by introducing recurrent connections.
These connections create loops that allow information to persist, enabling the network to maintain a hidden state. This
hidden state captures dependencies between sequential inputs, making RNNs suitable for tasks where context or order
matters, such as language modeling or speech recognition.
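The difference can be sketched in a few lines of NumPy (the dimensions and random weights are arbitrary, chosen only for illustration): a feed-forward layer maps each input independently, while a recurrent step additionally consumes the previous hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 3, 4  # illustrative sizes, chosen arbitrarily

# Feed-forward layer: the output depends only on the current input x_t.
W_ff = rng.normal(size=(d_hidden, d_in))
def feedforward_step(x_t):
    return np.tanh(W_ff @ x_t)

# Vanilla RNN step: the output also depends on the previous hidden state,
# which carries information about everything seen so far.
W_xh = rng.normal(size=(d_hidden, d_in))
W_hh = rng.normal(size=(d_hidden, d_hidden))
def rnn_step(x_t, h_prev):
    return np.tanh(W_xh @ x_t + W_hh @ h_prev)

sequence = [rng.normal(size=d_in) for _ in range(5)]

h = np.zeros(d_hidden)
for x_t in sequence:
    y_ff = feedforward_step(x_t)  # identical x_t -> identical output, no context
    h = rnn_step(x_t, h)          # hidden state accumulates context over time
```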
c) Backpropagation Through Time (BPTT) is the training algorithm for RNNs. It unfolds the network over time, turning it
into a deep, unrolled network. However, during backpropagation, gradients can either become too small (vanishing
gradients) or too large (exploding gradients). This happens because gradients are multiplied through the chain of
recurrent connections during the unfolding process.
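A toy calculation illustrates the problem (a scalar linear recurrence, used purely for illustration): the gradient propagated through T unrolled steps is a product of T factors, which collapses towards zero if the factor is below 1 and blows up if it is above 1.

```python
# Toy illustration (scalar linear recurrence h_t = w * h_{t-1}): during BPTT,
# dh_T/dh_0 is a product of T identical factors w, i.e. w**T.
for w in (0.5, 1.5):      # |w| < 1 vs. |w| > 1, values chosen for illustration
    grad = 1.0
    for _ in range(50):   # 50 unrolled time steps
        grad *= w         # one factor per recurrent connection
    print(f"w = {w}: gradient after 50 steps = {grad:.3e}")

# Prints approximately:
#   w = 0.5: gradient after 50 steps = 8.882e-16   (vanishing)
#   w = 1.5: gradient after 50 steps = 6.377e+08   (exploding)
```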
d) Long Short-Term Memory (LSTM) networks are a type of RNN designed to address the vanishing gradient problem.
LSTMs use a more complex memory cell with three gates: an input, an output, and a forget gate. These gates regulate the flow
of information, allowing the network to selectively store, update, or discard information over long sequences. Because the
cell state is updated additively and only rescaled by the forget gate, gradients can flow backwards through it over many time
steps without being repeatedly squashed by activation functions and weight multiplications. This mechanism helps LSTMs
capture long-term dependencies more effectively than traditional RNNs.
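Below is a minimal NumPy sketch of one LSTM step following the standard gate equations (the packing of weights and biases into a `params` list and the chosen sizes are just for brevity); the key point is the additive cell-state update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following the standard gate equations."""
    W_f, W_i, W_o, W_c, b_f, b_i, b_o, b_c = params
    z = np.concatenate([h_prev, x_t])        # combined input to all gates

    f = sigmoid(W_f @ z + b_f)               # forget gate: keep/discard old cell content
    i = sigmoid(W_i @ z + b_i)               # input gate: how much new content to write
    o = sigmoid(W_o @ z + b_o)               # output gate: what to expose as h_t
    c_tilde = np.tanh(W_c @ z + b_c)         # candidate cell content

    # The cell state is updated *additively*; when f is close to 1, gradients
    # can flow back through c_t largely unscaled, mitigating vanishing gradients.
    c_t = f * c_prev + i * c_tilde
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with arbitrary sizes and random parameters.
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
params = [rng.normal(size=(d_h, d_h + d_in)) for _ in range(4)] + \
         [np.zeros(d_h) for _ in range(4)]
h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):       # a short input sequence
    h, c = lstm_step(x_t, h, c, params)
```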
e) Summary:
- Non-sequential classifiers struggle with tasks involving sequential data because they do not take temporal dependencies into account.
- RNNs, with their recurrent connections, excel at capturing sequential patterns, making them suitable for tasks like natural language processing.
- BPTT is used to train RNNs but faces the challenges of vanishing and exploding gradients, which hinder effective learning.
- LSTMs, with specialized memory cells and gating mechanisms, address the vanishing gradient problem in RNNs, enabling better handling of long-term dependencies in sequential data.