
Module 4

The document discusses recurrent neural networks (RNNs), detailing their structure, including recurrent neurons, memory cells, and various input-output sequence types. It covers training methods, data preparation for machine learning models, and forecasting techniques using linear models, simple RNNs, and deep RNNs, while addressing challenges like unstable gradients and short-term memory problems. Additionally, it introduces advanced architectures such as LSTM and GRU for improved performance.

Recurrent Neurons and Layers

1. The simplest RNN has just one neuron that:


● Receives inputs at each time step t
● Receives its own previous output from time step t-1
● At the first time step, with no previous output, it starts at 0
2. When expanded to a full RNN layer:
○ Every neuron receives the input vector x(t)
○ Every neuron receives the output vector from the previous time step, ŷ(t-1)
○ The inputs and outputs become vectors instead of scalars
3. Each recurrent neuron has two weight sets:
● wx: weights for current input x(t)
● wŷ: weights for previous outputs ŷ(t-1)
● For a full layer, these become matrices Wx and Wŷ
4. The output calculation for a single instance is:
ŷ(t) = ϕ(Wx⊺ x(t) + Wŷ⊺ ŷ(t-1) + b)
5. For a mini-batch, the output calculation becomes:
Ŷ(t) = ϕ(X(t) Wx + Ŷ(t-1) Wŷ + b) = ϕ([X(t) Ŷ(t-1)] W + b), where W is the vertical concatenation of Wx and Wŷ (a NumPy sketch of this computation follows this list)
Where:
● Ŷ(t) is an m × n_neurons matrix containing the layer's outputs at time step t for each instance in the mini-batch (m = batch size)
● X(t) is an m × n_inputs matrix containing the inputs for all instances
● Wx is an n_inputs × n_neurons matrix of connection weights for the current inputs
● Wŷ is an n_neurons × n_neurons matrix of connection weights for the previous outputs
● b is the bias vector of size n_neurons
6. Key characteristics:
● The output Ŷ(t) depends on both the current input X(t) and the previous output Ŷ(t-1)
● This creates a chain of dependencies going back to the first time step
● At t=0, previous outputs are initialized to zeros
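
As a rough illustration of the mini-batch formula in item 5, here is a minimal NumPy sketch of one forward step for a layer of recurrent neurons; the sizes and random values are made up for the example:

import numpy as np

m, n_inputs, n_neurons = 4, 3, 5              # batch size, input dims, neurons (arbitrary)
rng = np.random.default_rng(42)
X_t = rng.normal(size=(m, n_inputs))          # X(t): inputs at time step t
Y_prev = np.zeros((m, n_neurons))             # Ŷ(t-1): zeros at the first time step
Wx = rng.normal(size=(n_inputs, n_neurons))   # weights for the current inputs
Wy = rng.normal(size=(n_neurons, n_neurons))  # weights for the previous outputs
b = np.zeros(n_neurons)                       # bias vector

# Ŷ(t) = ϕ(X(t) Wx + Ŷ(t-1) Wŷ + b), with ϕ = tanh
Y_t = np.tanh(X_t @ Wx + Y_prev @ Wy + b)
print(Y_t.shape)                              # (4, 5), i.e. m × n_neurons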
Memory Cells
1. Memory in RNNs:
● A recurrent neuron's output at time t depends on all previous inputs
● This creates a form of memory in the network
● Any part of a neural network that maintains state across time steps
is called a memory cell
2. Basic Memory Cells:
○ A single recurrent neuron is a basic memory cell
○ A layer of recurrent neurons is also a basic memory cell
○ These basic cells can typically learn patterns about 10 steps long
○ The pattern length capability varies depending on the task
3. More Complex Cells:
● Later chapters cover more sophisticated cell types
● These can learn patterns roughly 10 times longer
● Pattern length still varies based on the task
4. Cell State Characteristics:
● Cell state at time t is denoted h(t) (h stands for "hidden")
● State is a function of:
○ Current inputs x(t)
○ Previous state h(t-1)
● Written as: h(t) = f(x(t), h(t-1))
5. Cell Output:
● Output at time t is denoted ŷ(t)
● Output is a function of:
○ Previous state
○ Current inputs
● In basic cells: output equals state
● In complex cells: output may differ from state
Input and Output Sequences
1. Sequence-to-Sequence (top-left):
● Takes a sequence and outputs a sequence
● Example: Power consumption forecasting where you input N days of data and
output predictions shifted by one day
● Best for tasks where input and output are naturally sequential and aligned

2. Sequence-to-Vector (top-right):
● Takes a sequence but only uses final output
● Example: Sentiment analysis of movie reviews, where words are the input
sequence and the output is a single sentiment score
● Good for classification/scoring of sequential data (a Keras sketch contrasting these first two modes follows this list)
3. Vector-to-Sequence (bottom-left):
● Takes a single vector repeatedly as input and produces a sequence
● Example: Image captioning, where a CNN-processed image is input and
the output is a sequence of words describing it
● Useful when generating sequential content from a fixed input
4. Encoder-Decoder (bottom-right):
● Combines sequence-to-vector (encoder) with vector-to-sequence
(decoder)
● Example: Language translation, where input sentence is encoded to a
vector, then decoded to target language
● Better than direct sequence-to-sequence for translation because it can
consider entire input context before generating output
● More complex implementation than the diagram suggests (covered in
Chapter 16)
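
In Keras, the difference between the first two patterns usually comes down to the return_sequences argument of a recurrent layer. A minimal sketch; the layer sizes and Dense heads are arbitrary choices for illustration:

import tensorflow as tf

# Sequence-to-vector: the recurrent layer returns only its final output
seq_to_vec = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, input_shape=[None, 1]),
    tf.keras.layers.Dense(1)                 # e.g., a single score per sequence
])

# Sequence-to-sequence: return_sequences=True yields one output per time step
seq_to_seq = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, return_sequences=True, input_shape=[None, 1]),
    tf.keras.layers.Dense(1)                 # applied at every time step
])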
Training RNNs
1. Basic Concept:
● BPTT involves unrolling the RNN through time
● Uses regular backpropagation principles on the unrolled network
● Consists of forward pass followed by backward pass

2. Forward Pass:
● Network processes the input sequence from start to finish
● Represented by dashed arrows in Figure 15-5
● Generates predictions Ŷ(0) through Ŷ(T) for each timestep
3. Loss Function:
● Evaluates the output sequence against the target sequence
● Format: ℒ(Y(0), Y(1), ..., Y(T); Ŷ(0), Ŷ(1), ..., Ŷ(T))
● Can selectively ignore certain outputs depending on the task
● Example: Sequence-to-vector RNNs only use the final output
4. Backward Pass:
● Gradients flow backward through the unrolled network
● They flow only through the outputs used in the loss calculation
● In the example, gradients flow only through Ŷ(2), Ŷ(3), and Ŷ(4)
5. Parameter Updates:
● The same parameters (W and b) are used at each time step
● Their gradients are therefore tweaked multiple times during backpropagation
● The final gradient descent step updates the parameters just as in regular backprop
Preparing Data for ML models
● The text describes preparing time series data for machine learning
models, with the goal of forecasting tomorrow's ridership based on
8 weeks (56 days) of past data.
● The concept of using sliding windows: Every 56-day window from
the past serves as training data, with the target being the value
immediately following each window.
● Keras provides two methods for creating time series datasets:

First method using timeseries_dataset_from_array():

import tensorflow as tf

my_series = [0, 1, 2, 3, 4, 5]
my_dataset = tf.keras.utils.timeseries_dataset_from_array(
    my_series,
    targets=my_series[3:],  # the targets are 3 steps into the future
    sequence_length=3,
    batch_size=2
)
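
Iterating over this dataset should yield batches of 3-step windows paired with their targets; a quick inspection sketch (the output shown is the expected result, not re-verified here):

for window_batch, target_batch in my_dataset:
    print(window_batch.numpy(), target_batch.numpy())
# Expected output, roughly:
# [[0 1 2]
#  [1 2 3]] [3 4]
# [[2 3 4]] [5]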
Alternative method using window():

dataset = tf.data.Dataset.range(6).window(4, shift=1, drop_remainder=True)
dataset = dataset.flat_map(lambda window_dataset: window_dataset.batch(4))

# Helper function for extracting windows
def to_windows(dataset, length):
    dataset = dataset.window(length, shift=1, drop_remainder=True)
    return dataset.flat_map(lambda window_ds: window_ds.batch(length))
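
One way to use this helper, consistent with the window length above, is to split each window into inputs and a target (a sketch; the slicing scheme is an illustrative choice):

# Each window of length 4 becomes 3 input steps plus 1 target step
dataset = to_windows(tf.data.Dataset.range(6), length=4)
dataset = dataset.map(lambda window: (window[:-1], window[-1]))
for inputs, target in dataset:
    print(inputs.numpy(), "->", target.numpy())
# e.g. [0 1 2] -> 3, [1 2 3] -> 4, [2 3 4] -> 5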


Final data preparation steps for the rail ridership example:

rail_train = df["rail"]["2016-01":"2018-12"] / 1e6
rail_valid = df["rail"]["2019-01":"2019-05"] / 1e6
rail_test = df["rail"]["2019-06":] / 1e6

seq_length = 56
train_ds = tf.keras.utils.timeseries_dataset_from_array(
    rail_train.to_numpy(),
    targets=rail_train[seq_length:],
    sequence_length=seq_length,
    batch_size=32,
    shuffle=True,
    seed=42
)
valid_ds = tf.keras.utils.timeseries_dataset_from_array(
    rail_valid.to_numpy(),
    targets=rail_valid[seq_length:],
    sequence_length=seq_length,
    batch_size=32
)
Forecasting Using Linear Model
Performance Results:
● The model achieved a validation MAE of approximately 37,866
● This performance is:
○ Better than naive forecasting
○ Worse than the SARIMA model

Key Model Characteristics:
● Uses Huber loss instead of MAE directly for better performance
● Implements early stopping to prevent overfitting
● Uses the SGD optimizer with momentum
● Monitors validation MAE for early stopping (a training sketch showing these settings follows the code snippet below)
Code Snippet:

tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=[seq_length])
])
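
The snippet above only defines the model. A minimal training sketch matching the characteristics listed above; the specific learning rate, momentum, patience, and epoch count are illustrative assumptions, not values taken from this document:

# Huber loss + SGD with momentum + early stopping on validation MAE (values are illustrative)
early_stopping_cb = tf.keras.callbacks.EarlyStopping(
    monitor="val_mae", patience=50, restore_best_weights=True)
opt = tf.keras.optimizers.SGD(learning_rate=0.02, momentum=0.9)
model.compile(loss=tf.keras.losses.Huber(), optimizer=opt, metrics=["mae"])
history = model.fit(train_ds, validation_data=valid_ds, epochs=500,
                    callbacks=[early_stopping_cb])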
Forecasting Using Simple RNN
Initial Simple RNN Implementation:

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(1, input_shape=[None, 1])
])
2. Input Shape Requirements:
● RNN layers expect 3D inputs: [batch size, time steps, dimensionality]
● input_shape omits the first dimension (the batch size)
● Time steps can be None (any size)
● Dimensionality is 1 for univariate time series
3. How the Simple RNN Works:
● Initial state h(init) starts at 0
● Each step processes current input and previous state
● Uses hyperbolic tangent (tanh) activation by default
● Outputs only the final value unless return_sequences=True
4. Problems with Initial Model:
● Validation MAE > 100,000 (poor performance)
● Only 3 parameters total (2 weights + 1 bias)
● Limited by tanh activation range (-1 to +1)
● Too simple for the complexity of the data (a possible fix is sketched below)
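
A common remedy for the issues above is to use a larger recurrent layer and put a linear Dense output layer on top, so predictions are no longer squashed into tanh's (-1, 1) range. The layer size below is an assumption for illustration:

# Larger recurrent layer + linear output layer, so outputs are not limited to (-1, 1)
univar_model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, input_shape=[None, 1]),
    tf.keras.layers.Dense(1)   # linear activation by default
])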
Forecasting Using Deep RNN
Code Snippet:

deep_model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(32, return_sequences=True, input_shape=[None, 1]),
    tf.keras.layers.SimpleRNN(32, return_sequences=True),
    tf.keras.layers.SimpleRNN(32),  # last recurrent layer returns only its final output
    tf.keras.layers.Dense(1)
])
Fighting the Unstable Gradients Problem
1. Common Deep Learning Techniques That Help:
● Good parameter initialization
● Faster optimizers
● Dropout
2. ReLU and Non-saturating Activation Functions:
● May not help as much with RNNs
● Can actually increase instability
● Risk of exploding outputs due to weight reuse across time steps
● Saturating functions like tanh are preferred (hence being the default)
3. Gradient Issues:
● Gradients can explode
● Solutions include:
○ Using smaller learning rates
○ Monitoring gradient size (via TensorBoard)
○ Using gradient clipping (see the sketch below)
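
Gradient clipping is available directly on Keras optimizers through the clipvalue and clipnorm arguments; a minimal sketch (the thresholds and learning rate are illustrative):

# Clip each gradient component to the range [-1, 1]
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipvalue=1.0)
# Alternatively, clip by each gradient's norm instead:
# optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)
deep_model.compile(loss=tf.keras.losses.Huber(), optimizer=optimizer, metrics=["mae"])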
4. Batch Normalization (BN) Limitations:
● Less effective with RNNs than with feedforward networks
● Cannot be used effectively between time steps
● When used in memory cells:
○ Same BN layer used at each time step
○ Same parameters regardless of input scale
○ Only slightly beneficial when applied to layer inputs
○ Not helpful when applied to hidden states
○ Can slow down training
5. Layer Normalization Benefits:
● Better suited for RNNs than batch normalization
● Normalizes across features dimension instead of batch dimension
● Advantages:
○ Can compute statistics on the fly at each time step
○ Works independently for each instance
○ Consistent behavior during training and testing
○ Doesn't need exponential moving averages
○ Learns scale and offset parameters for each input
6. Implementation:
● Used after linear combination of inputs and hidden states
● Requires defining a custom memory cell in Keras (see the sketch after this list)
● Cell's call() method needs to handle both:
○ Current time step inputs
○ Previous time step hidden states
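
A sketch of such a custom cell along the lines described above: it wraps a SimpleRNNCell with its activation disabled, applies layer normalization to the linear combination of inputs and hidden states, and only then applies the activation. The class name, layer size, and surrounding model are illustrative assumptions:

class LNSimpleRNNCell(tf.keras.layers.Layer):
    def __init__(self, units, activation="tanh", **kwargs):
        super().__init__(**kwargs)
        self.state_size = units
        self.output_size = units
        # Inner cell with no activation: layer norm is applied before the activation
        self.simple_rnn_cell = tf.keras.layers.SimpleRNNCell(units, activation=None)
        self.layer_norm = tf.keras.layers.LayerNormalization()
        self.activation = tf.keras.activations.get(activation)

    def call(self, inputs, states):
        # states holds the hidden state(s) from the previous time step
        outputs, new_states = self.simple_rnn_cell(inputs, states)
        norm_outputs = self.activation(self.layer_norm(outputs))
        return norm_outputs, [norm_outputs]

# The custom cell is wrapped in a generic RNN layer
custom_ln_model = tf.keras.Sequential([
    tf.keras.layers.RNN(LNSimpleRNNCell(32), return_sequences=True,
                        input_shape=[None, 1]),
    tf.keras.layers.Dense(1)
])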
Tackling the Short-Term Memory Problem

Two cell types address this:
1. LSTM
2. GRU
LSTM
Code Snippet:

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)  # 14 output values per time step
])
GRU
Code Snippet:

model = tf.keras.Sequential([
    tf.keras.layers.GRU(32, return_sequences=True, input_shape=[None, 5]),
    tf.keras.layers.Dense(14)
])
