DL Module 5
Explain how the recurrent neural network (RNN) processes data sequences.
Recurrent Neural Networks (RNNs) are designed to process sequential data by retaining
information about previous inputs in their internal state. This allows RNNs to model dependencies
in sequences, making them ideal for tasks like time series forecasting, natural language
processing, and speech recognition.
1. Input Processing:
○ At each time step t, the RNN receives the current element xt of the input sequence.
2. Hidden State Update:
○ The hidden state is updated from the previous hidden state and the current input:
ht = tanh(Wxh xt + Whh ht−1 + bh)
3. Output Generation:
○ At each time step, the RNN can produce an output yt based on the hidden state:
yt = Why ht + by
(often passed through a softmax when the output is a probability distribution over classes).
4. Sequence Dependency:
○ The hidden state ht serves as a connection between time steps, allowing the model to
capture dependencies across the sequence.
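To make these four steps concrete, here is a minimal NumPy sketch of the recurrence. The weight names (Wxh, Whh, Why) and the tanh/linear choices follow the equations above but are illustrative assumptions, not a fixed standard:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Process a sequence step by step, carrying the hidden state forward."""
    h = np.zeros(W_hh.shape[0])      # initial hidden state h0
    outputs = []
    for x_t in xs:                   # 1. read the input at time step t
        # 2. update the hidden state from the previous state and current input
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        # 3. produce an output from the current hidden state
        y_t = W_hy @ h + b_y
        outputs.append(y_t)
    # 4. h now summarizes all inputs seen so far, linking the time steps
    return outputs, h
```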
In standard "causal" RNNs, the state at time t captures information from past inputs (x1,x2,…,xt−1)
and the current input (xt). However, many applications require predictions that depend on the entire
input sequence, including future inputs. For example, in speech recognition the correct interpretation of the current sound may depend on the next few phonemes. Bidirectional RNNs meet this need by combining one RNN that processes the sequence forward in time with another that processes it backward, so the output at time t depends on both past and future inputs (see the sketch below).
Applications
● Bidirectional RNNs have been very successful in applications such as handwriting recognition, speech recognition, and bioinformatics.
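As a rough illustration of the bidirectional idea, the sketch below (plain NumPy; the tanh cell and parameter names are assumptions made for brevity) runs one RNN forward and one backward over the same sequence and concatenates their states:

```python
import numpy as np

def simple_rnn(xs, W_x, W_h, b):
    """Plain tanh RNN that returns the hidden state at every time step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return states

def bidirectional_rnn(xs, fwd_params, bwd_params):
    """Run one RNN forward and another backward, then concatenate the states,
    so the representation at time t depends on both past and future inputs."""
    h_fwd = simple_rnn(xs, *fwd_params)
    h_bwd = simple_rnn(xs[::-1], *bwd_params)[::-1]   # reverse back to align
    return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
```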
Extensions to 2D Data
● For 2D inputs like images, the bidirectional approach can be extended by having RNNs
operate in four directions: up, down, left, and right.
● At each pixel (i,j), the output Oi,j is influenced by neighboring pixels and, potentially, by long-range dependencies across the image.
● Advantages over Convolutional Networks:
○ While CNNs focus on local interactions through filters, bidirectional RNNs can capture
long-range dependencies across the image.
○ Trade-off: RNNs for 2D data are computationally more expensive than CNNs.
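A simplified toy sketch of the four-direction idea is given below, under stated assumptions: a plain tanh update, and, for brevity, a single parameter set shared by all four scans, whereas a real model would learn separate parameters per direction:

```python
import numpy as np

def scan_2d(X, W_x, W_up, W_left, b):
    """One corner-to-corner scan of a 2D RNN: the state at (i, j) depends on the
    input at (i, j) and on the states of the pixels above and to the left."""
    H, W, _ = X.shape
    hidden = W_x.shape[1]
    Hmap = np.zeros((H, W, hidden))
    for i in range(H):
        for j in range(W):
            h_up = Hmap[i - 1, j] if i > 0 else np.zeros(hidden)
            h_left = Hmap[i, j - 1] if j > 0 else np.zeros(hidden)
            Hmap[i, j] = np.tanh(X[i, j] @ W_x + h_up @ W_up + h_left @ W_left + b)
    return Hmap

def four_direction_rnn(X, params):
    """Scan from all four corners by flipping the image, then concatenate,
    so O[i, j] can depend on context from every direction."""
    maps = []
    for flip_i, flip_j in [(False, False), (False, True), (True, False), (True, True)]:
        Xs = X[::-1] if flip_i else X
        Xs = Xs[:, ::-1] if flip_j else Xs
        Hmap = scan_2d(Xs, *params)
        if flip_j:
            Hmap = Hmap[:, ::-1]
        if flip_i:
            Hmap = Hmap[::-1]
        maps.append(Hmap)
    return np.concatenate(maps, axis=-1)
```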
Explain LSTM working principle along with equations.
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed to capture
long-term dependencies by managing information flow through gating mechanisms. LSTM cells
address the vanishing and exploding gradient problems in standard RNNs, making them effective
for tasks requiring long-term memory, such as speech and handwriting recognition.
At each time step t, the LSTM cell uses three gates and a memory cell to control the flow of information:
1. Forget gate: ft = σ(Wf xt + Uf ht−1 + bf) decides how much of the previous cell state to discard.
2. Input gate: it = σ(Wi xt + Ui ht−1 + bi) decides how much new information to store.
3. Candidate cell state: c̃t = tanh(Wc xt + Uc ht−1 + bc).
4. Cell state update: ct = ft ⊙ ct−1 + it ⊙ c̃t.
5. Output gate: ot = σ(Wo xt + Uo ht−1 + bo).
6. Hidden state: ht = ot ⊙ tanh(ct).
Here σ is the logistic sigmoid and ⊙ denotes element-wise multiplication. Because the cell state is updated additively and the gates regulate what is stored, forgotten, and exposed, gradients can flow across many time steps without vanishing.
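The equations above translate almost line for line into code. The following NumPy sketch of a single LSTM step is illustrative only; the way the parameters are packed into W, U, and b is an assumption of this example, not a standard API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b each hold the parameters of the
    forget, input, output gates and the candidate cell, in that order."""
    Wf, Wi, Wo, Wc = W
    Uf, Ui, Uo, Uc = U
    bf, bi, bo, bc = b
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)      # forget gate
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)      # input gate
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev + bo)      # output gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde              # additive cell-state update
    h_t = o_t * np.tanh(c_t)                        # new hidden state
    return h_t, c_t
```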
Write a note on Speech Recognition and NLP.
Speech recognition aims to map spoken language (acoustic signals) into the corresponding
sequence of words. The process involves the following key points:
Early Approaches:
● Early systems combined hidden Markov models (HMMs), which modeled the sequence of phonemes, with Gaussian mixture models (GMMs), which modeled the acoustic features.
● Deep neural networks later replaced the GMM component and significantly reduced error rates; end-to-end deep models can now map acoustic features directly to word or character sequences.
NLP:
● Natural Language Processing (NLP) uses computational models to understand and generate human language, with applications such as machine translation and question answering.
● Many NLP systems are built on language models, which define a probability distribution over sequences of words or characters; RNN-based models can capture long-range dependencies that fixed-window n-gram models miss.
Teacher Forcing:
Teacher forcing is a training strategy for sequence models with output-to-input recurrence: during training, the ground-truth output from the previous time step is fed as the next input, instead of the model's own prediction.
Why It Is Used:
1. Accelerates Training: By providing the correct output from the ground truth at each time
step, the model learns faster as it avoids compounding errors.
2. Prevents Error Accumulation: Using the model's own predictions can lead to cascading
errors when predictions deviate from the ground truth. Teacher forcing mitigates this issue
during training.
3. Stabilizes Learning: It ensures that the model stays on the correct path by aligning its
predictions with the ground truth sequence.
4. Improves Convergence: It often leads to faster convergence of the model compared to
training with predicted inputs.
Limitations:
● Exposure Bias: During inference, the model uses its own predictions as inputs, which can
differ from the training process where it always uses ground truth inputs. This mismatch can
lead to poor performance when the model is deployed.
● Dependency on Ground Truth: The model might over-rely on the ground truth during
training and fail to generalize when ground truth inputs are unavailable during inference.
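A minimal PyTorch-style sketch of the difference is shown below; the model sizes, the start-token convention, and the helper decode_loss are hypothetical choices made only for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes and modules, for illustration only.
vocab_size, emb_dim, hidden_dim = 1000, 32, 64
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.RNNCell(emb_dim, hidden_dim)
to_vocab = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

def decode_loss(target_seq, teacher_forcing=True):
    """target_seq: 1-D LongTensor of ground-truth token ids (first entry = start token).
    With teacher forcing, the ground-truth token is fed back at every step;
    without it, the model's own previous prediction is fed back instead."""
    h = torch.zeros(1, hidden_dim)            # initial hidden state
    inp = target_seq[0]                       # start token from the ground truth
    loss = torch.tensor(0.0)
    for t in range(1, len(target_seq)):
        h = cell(embed(inp).unsqueeze(0), h)  # one recurrent step
        logits = to_vocab(h)                  # scores over the vocabulary
        loss = loss + loss_fn(logits, target_seq[t].unsqueeze(0))
        if teacher_forcing:
            inp = target_seq[t]               # training: feed the correct token
        else:
            inp = logits.argmax(dim=-1).squeeze(0)  # inference: feed own prediction
    return loss / (len(target_seq) - 1)

# Usage sketch: loss = decode_loss(torch.tensor([1, 5, 7, 2])); loss.backward()
```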
Deep Recurrent Networks:
A recurrent network can be decomposed into three blocks of parameters: input-to-hidden, hidden-to-hidden, and hidden-to-output. In standard RNNs, each of these transformations is shallow, meaning it involves a single layer of computation (a learned affine transformation followed by a nonlinearity). Deep RNNs add depth to one or more of these blocks, most commonly by stacking several recurrent layers on top of each other.
Trade-offs
● Advantages: Adding depth enhances the model's capacity to process complex data and
extract high-level features.
● Challenges: Deeper architectures make optimization harder, as gradients must propagate
through longer paths, increasing the risk of vanishing or exploding gradients.
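As a small sketch of one common way to add depth (stacking recurrent layers so that each layer's hidden state becomes the input of the layer above), assuming a plain tanh cell and illustrative parameter names:

```python
import numpy as np

def deep_rnn_step(x_t, hs, layer_params):
    """One time step of a stacked ('deep') RNN.
    hs: list of previous hidden states, one per layer.
    layer_params: list of (W_x, W_h, b) tuples, one per layer."""
    new_hs = []
    inp = x_t
    for (W_x, W_h, b), h_prev in zip(layer_params, hs):
        h = np.tanh(W_x @ inp + W_h @ h_prev + b)  # shallow transformation per layer
        new_hs.append(h)
        inp = h                                    # feed this layer's state upward
    return new_hs                                  # top state feeds the output layer
```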