Deep Learning U4

1. Introduction to Deep Recurrent Neural Networks (RNNs)

• Definition: RNNs are designed to handle sequential data by maintaining a hidden state
that captures information from previous time steps. They are particularly effective for
tasks where the order of inputs is significant.

• Key Characteristics:

o Recurrent Connections: Enable the network to maintain and update a memory of previous inputs.

o Sequence Processing: Suitable for tasks like time series prediction, language
modeling, and sequence classification.
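A minimal numpy sketch of the recurrence (the sizes, random weights, and sequence length are illustrative assumptions) shows how the hidden state carries information from one step to the next:

```python
import numpy as np

# One recurrent layer: h_t = tanh(W_x x_t + W_h h_{t-1})
# All sizes and the random initialisation are illustrative assumptions.
n_in, n_h = 3, 4
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(n_h, n_in))
W_h = rng.normal(scale=0.1, size=(n_h, n_h))

def rnn_forward(xs):
    """Process a sequence step by step, updating the hidden state each time."""
    h = np.zeros(n_h)                      # initial hidden state (the "memory")
    states = []
    for x_t in xs:                         # the order of inputs matters
        h = np.tanh(W_x @ x_t + W_h @ h)   # new state depends on input and old state
        states.append(h)
    return states

states = rnn_forward(rng.normal(size=(5, n_in)))   # 5 time steps of 3-dim input
```

A deep RNN stacks several such recurrent layers, feeding each layer's sequence of states into the next layer.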

2. Backpropagation Through Time (BPTT)

• Definition: An extension of the backpropagation algorithm for training RNNs. It unrolls the RNN through time, treating it as a feedforward network.

• Steps:

o Unroll the Network: Create copies of the network for each time step.

o Forward Pass: Calculate the outputs for each time step.

o Calculate Loss: Compute the loss at each time step.

o Backward Pass: Backpropagate the loss through each time step to update the
weights.

• Challenges: Computationally expensive and can lead to vanishing/exploding gradients.
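The numpy sketch below makes the four steps concrete for a plain tanh RNN (the squared-error loss on the hidden state, the sizes, and the random initialisation are illustrative assumptions, not part of the notes):

```python
import numpy as np

# BPTT sketch for a plain tanh RNN.  The squared-error loss on the hidden
# state, the sizes, and the random initialisation are illustrative assumptions.
T, n_in, n_h = 5, 3, 4
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(n_h, n_in))
W_h = rng.normal(scale=0.1, size=(n_h, n_h))
xs = rng.normal(size=(T, n_in))               # input sequence
ys = rng.normal(size=(T, n_h))                # target sequence

# Unroll + forward pass: one copy of the cell per time step, states stored.
hs = [np.zeros(n_h)]
for t in range(T):
    hs.append(np.tanh(W_x @ xs[t] + W_h @ hs[-1]))

# Calculate loss: summed over every time step.
loss = 0.5 * sum(np.sum((hs[t + 1] - ys[t]) ** 2) for t in range(T))

# Backward pass: backpropagate through each time step, newest to oldest.
dW_x, dW_h = np.zeros_like(W_x), np.zeros_like(W_h)
dh_next = np.zeros(n_h)                       # gradient arriving from step t+1
for t in reversed(range(T)):
    dh = (hs[t + 1] - ys[t]) + dh_next        # local loss grad + recurrent grad
    dpre = dh * (1.0 - hs[t + 1] ** 2)        # back through the tanh nonlinearity
    dW_x += np.outer(dpre, xs[t])
    dW_h += np.outer(dpre, hs[t])
    dh_next = W_h.T @ dpre                    # hand the gradient to step t-1

# A weight update would then be, e.g., W_h -= learning_rate * dW_h
```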

3. Vanishing and Exploding Gradients

• Vanishing Gradients: Gradients become too small, causing the network to stop
learning effectively.

o Solution: Use activation functions like ReLU, and architectures like LSTMs and
GRUs.

• Exploding Gradients: Gradients become too large, causing unstable updates and
divergent behavior.

o Solution: Gradient clipping, which limits the gradient's magnitude.
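A minimal sketch of norm-based gradient clipping (numpy; the threshold of 5.0 is an arbitrary illustrative choice):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """If the gradient's L2 norm exceeds max_norm, rescale it so the norm
    equals max_norm; otherwise return it unchanged."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

# Example: an exploded gradient gets rescaled before the weight update
g = np.array([300.0, -400.0])              # norm = 500
print(np.linalg.norm(clip_gradient(g)))    # 5.0
```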

4. Truncated BPTT
• Definition: A method to reduce the computational load of BPTT by truncating the
backpropagation to a fixed number of time steps.

• Steps:

o Truncate the Sequence: Divide the sequence into smaller chunks.

o Apply BPTT: Perform BPTT within each chunk.

• Advantages: Reduces computational cost and mitigates vanishing/exploding gradients.
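A sketch of the chunking loop (numpy; the chunk length, sizes, and weights are illustrative assumptions, and the per-chunk backward pass is only indicated by a comment since it mirrors the full BPTT sketch above):

```python
import numpy as np

# Truncated BPTT sketch.  BPTT runs only inside each chunk; the hidden state
# is carried across chunks as a constant, so no gradient flows through
# chunk boundaries.
T, n_in, n_h, chunk_len = 100, 3, 4, 20
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(n_h, n_in))
W_h = rng.normal(scale=0.1, size=(n_h, n_h))
xs = rng.normal(size=(T, n_in))

h = np.zeros(n_h)                              # state carried between chunks
for start in range(0, T, chunk_len):
    chunk = xs[start:start + chunk_len]
    states = [h]                               # states kept for this chunk only
    for x_t in chunk:                          # forward pass within the chunk
        states.append(np.tanh(W_x @ x_t + W_h @ states[-1]))
    # ... backward pass over `states`, exactly as in the full BPTT sketch
    #     above, but spanning at most chunk_len time steps ...
    h = states[-1]                             # carry the state, not the gradient
```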

5. Gated Recurrent Units (GRUs)

• Definition: A type of RNN that uses gating mechanisms to control the flow of
information, addressing the vanishing gradient problem.

• Components:

o Update Gate: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

▪ Controls how much of the past information to retain.

o Reset Gate: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

▪ Controls how much of the past information to forget.

o New Memory Content: $\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])$

o Final Memory at Time t: $h_t = z_t * h_{t-1} + (1 - z_t) * \tilde{h}_t$
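The equations above translate directly into a single-step function; here is a minimal numpy sketch (the bias-free form follows the notes' equations, while the sizes and random weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W):
    """One GRU step; each weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                  # update gate
    r_t = sigmoid(W_r @ concat)                                  # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))   # new memory content
    return z_t * h_prev + (1 - z_t) * h_tilde                    # final memory h_t

# Example usage (sizes and random weights are illustrative)
n_h, n_in = 4, 3
rng = np.random.default_rng(0)
W_z, W_r, W = (rng.normal(scale=0.1, size=(n_h, n_h + n_in)) for _ in range(3))
h = gru_step(rng.normal(size=n_in), np.zeros(n_h), W_z, W_r, W)
```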

6. Long Short-Term Memory (LSTM)

• Definition: A type of RNN that uses a more complex gating mechanism to capture long-
term dependencies and solve the vanishing gradient problem.

• Components:

o Forget Gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

▪ Determines what information to discard from the cell state.

o Input Gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

▪ Decides what new information to add to the cell state.

o Cell State Update: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

o Cell State: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

o Output Gate: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

▪ Determines what part of the cell state to output.

o Hidden State: $h_t = o_t * \tanh(C_t)$
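A minimal numpy sketch of one LSTM step following the gate equations above (the sizes and random example weights are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One LSTM time step; each weight matrix acts on [h_{t-1}, x_t]."""
    concat = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ concat + b_f)        # forget gate: what to discard
    i_t = sigmoid(W_i @ concat + b_i)        # input gate: what to add
    C_tilde = np.tanh(W_C @ concat + b_C)    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde       # additive cell-state update
    o_t = sigmoid(W_o @ concat + b_o)        # output gate: what to expose
    h_t = o_t * np.tanh(C_t)                 # new hidden state
    return h_t, C_t

# Example usage with random weights (illustrative sizes: n_h=4, n_in=3)
n_h, n_in = 4, 3
rng = np.random.default_rng(0)
W = lambda: rng.normal(scale=0.1, size=(n_h, n_h + n_in))
b = lambda: np.zeros(n_h)
h, C = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h),
                 W(), W(), W(), W(), b(), b(), b(), b())
```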

7. Solving the Vanishing Gradient Problem with LSTMs

• Mechanism: LSTMs use cell states and gating mechanisms to maintain a constant flow
of gradients, preserving long-term dependencies and addressing the vanishing
gradient problem.

8. Encoding and Decoding in RNN Network

• Encoding: The process of converting input sequences into fixed-size context vectors
that capture essential information.

o Encoder: An RNN that processes the input sequence and produces a context
vector.

• Decoding: The process of generating output sequences from the context vectors.

o Decoder: An RNN that takes the context vector and generates the output
sequence.
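A minimal sketch of the encoder-decoder pattern with plain tanh RNN cells (all sizes, weights, and the fixed number of decoding steps are illustrative assumptions; a real decoder would also feed its previous output back in as input):

```python
import numpy as np

# Encoder-decoder sketch: the encoder compresses the input sequence into a
# fixed-size context vector; the decoder generates outputs from that vector.
n_in, n_h, n_out, T_out = 3, 4, 3, 5
rng = np.random.default_rng(0)
We_x = rng.normal(scale=0.1, size=(n_h, n_in))
We_h = rng.normal(scale=0.1, size=(n_h, n_h))
Wd_h = rng.normal(scale=0.1, size=(n_h, n_h))
Wd_y = rng.normal(scale=0.1, size=(n_out, n_h))

def encode(xs):
    """Encoder RNN: the final hidden state is the fixed-size context vector."""
    h = np.zeros(n_h)
    for x_t in xs:
        h = np.tanh(We_x @ x_t + We_h @ h)
    return h

def decode(context, steps=T_out):
    """Decoder RNN: start from the context vector and emit one output per step."""
    h, outputs = context, []
    for _ in range(steps):
        h = np.tanh(Wd_h @ h)
        outputs.append(Wd_y @ h)
    return outputs

outputs = decode(encode(rng.normal(size=(7, n_in))))   # 7-step input, 5-step output
```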

9. Attention Mechanism

• Definition: A technique that allows the model to focus on specific parts of the input
sequence when making predictions, enhancing performance on tasks with long-range
dependencies.

• Types:

o Additive Attention: Combines the encoder hidden states and the previous decoder state additively through a learned scoring function.

▪ Formula: $e_{ij} = v^T \tanh(W_h h_i + W_s s_{j-1})$

o Multiplicative (Dot-Product) Attention: Scores each encoder hidden state against the previous decoder state with a (weighted) dot product.

▪ Formula: $e_{ij} = h_i^T W_a s_{j-1}$
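A minimal numpy sketch of multiplicative attention following the second formula (the shapes and the softmax weighting over input positions are assumptions consistent with standard usage):

```python
import numpy as np

def dot_product_attention(encoder_states, decoder_state, W_a):
    """Multiplicative attention: score each encoder state h_i against the
    previous decoder state s_{j-1}, softmax the scores over input positions,
    and return the attention-weighted context vector."""
    scores = np.array([h_i @ W_a @ decoder_state for h_i in encoder_states])  # e_ij
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax -> attention weights
    context = weights @ encoder_states           # weighted sum of encoder states
    return context, weights

# Example usage with random states (sizes are illustrative)
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))      # six encoder hidden states of dimension 4
s = rng.normal(size=4)           # previous decoder state s_{j-1}
W_a = rng.normal(size=(4, 4))
context, alpha = dot_product_attention(H, s, W_a)
```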

10. Attention over Images


• Definition: Extends the attention mechanism to image data, allowing the model to
focus on specific regions of an image.

• Application: Used in image captioning, where the model generates descriptions based
on focused image regions.

o Example: Show, Attend, and Tell model for image captioning.

11. Hierarchical Attention

• Definition: A multi-level attention mechanism that allows the model to focus on different parts of the input at different levels of abstraction.

• Application: Used in hierarchical sequence processing, such as document classification and multi-level sequence modeling.

12. Directed Graphical Models

• Definition: Probabilistic models represented as directed graphs, where nodes represent random variables and edges represent conditional dependencies.

• Types:

o Bayesian Networks: Directed acyclic graphs representing joint probability distributions. Used for tasks like inference and learning in probabilistic models.

o Dynamic Bayesian Networks (DBNs): Extend Bayesian networks to model temporal sequences. Used in applications like speech recognition and time series prediction.
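As a worked example of the factorisation a directed graph encodes (the specific variables are illustrative, not from the notes): a Bayesian network over Rain, Sprinkler, and WetGrass with edges Rain → Sprinkler, Rain → WetGrass, and Sprinkler → WetGrass factorises the joint distribution as

$P(R, S, W) = P(R)\, P(S \mid R)\, P(W \mid R, S)$

and, in general, $P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{Pa}(x_i))$, where $\mathrm{Pa}(x_i)$ denotes the parents of node $x_i$ in the graph.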

13. Applications of Deep RNN in Image Processing

• Image Captioning: Combining CNNs for feature extraction and RNNs for sequence
generation to produce textual descriptions of images.

• Image Generation: Using RNNs to generate new images based on learned patterns and
sequences.

14. Applications of Deep RNN in Natural Language Processing (NLP)

• Text Generation: Generating coherent and contextually relevant text sequences based
on input data.

• Machine Translation: Translating text from one language to another using sequence-
to-sequence models with attention mechanisms.
• Sentiment Analysis: Analyzing the sentiment of text by capturing contextual
information and understanding the sentiment expressed.

15. Applications of Deep RNN in Speech Recognition

• Speech-to-Text: Converting spoken language into written text by capturing temporal dependencies in the audio signal.

• Speaker Identification: Recognizing and identifying speakers based on their unique speech patterns.

16. Applications of Deep RNN in Video Analytics

• Action Recognition: Identifying and classifying actions and activities in video sequences.

• Video Captioning: Generating textual descriptions of video content by combining visual features with RNN-based sequence generation.
