This document discusses recurrent neural networks (RNNs) and their use for sequence modeling tasks. It provides an overview of long short-term memory (LSTM) and gated recurrent unit (GRU) RNNs, as well as 1D convolutional neural networks. The document motivates the use of RNNs by explaining that sequences are important in tasks like language processing, activity forecasting, and genome modeling. It then demonstrates how RNNs can model sequences by maintaining internal states, and describes how LSTMs and GRUs help address the vanishing gradient problem in simple RNNs. Finally, it provides a stock price prediction example to illustrate how to preprocess time series data and use RNNs for sequential regression tasks.
Recurrent Neural Networks
Anahita Zarei, Ph.D.
Overview
• Recurrent Networks
• LSTM
• GRU
• 1D Convnets
• Reading: 6.2, 6.3, 6.4 from Deep Learning with Python

Motivation for RNNs
• We previously saw that Convolutional Neural Networks (CNNs) form the basis of many state-of-the-art computer vision systems. However, we do not understand the world around us with vision alone.
• Sound, for one, also plays an important role. As humans, we communicate and express ideas through sequences of symbolic reductions and abstract representations.
• Naturally, we would want machines to understand this manner of processing sequential information, as it could help us resolve many of the sequential tasks we face in the real world.

Examples of Sequences
• You visit a foreign country and need to order in a restaurant.
• You want your car to perform a sequence of movements automatically so that it can park by itself.
• You want to understand how different sequences of adenine, guanine, thymine, and cytosine molecules in the human genome lead to differences in biological processes in the human body.
• What is common to all these examples?
• All are sequence modeling tasks. In such tasks, the training examples (vectors of words, a set of car movements, or a configuration of A, G, T, and C molecules) consist of multiple time-dependent data points.

Examples of Sequences
• Don't judge a book by its ___.
• How do you know what the next word is?
• You consider the relative positions of words and (subconsciously) perform some form of Bayesian inference, leveraging the sentences you have previously seen and their apparent similarity to this example.
• In other words, you used your internal model of the English language to predict the most probable word to follow.
• A language model describes the probability of a particular configuration of words occurring together in a given sequence.
• Such models are fundamental components of modern speech recognition and machine translation systems.
• They rely on modeling the likelihood of sequences of words.

Why RNNs?
• A major characteristic of densely connected networks and convnets is that they have no memory.
• Each input shown to them is processed independently, with no state kept between inputs.
• For example, these networks would likely treat both "this movie is a bomb" and "this movie is the bomb" as negative reviews, since they don't consider inter-word relationships and sentence structure.
• Therefore, in order to process a sequence or a time series, you have to show the entire sequence to the network at once.
• For instance, in the IMDB example, an entire movie review was transformed into a single large vector and processed in one go. Such networks are called feedforward networks.

Why RNNs?
• The previous architectures did not operate over a sequence of vectors.
• This prevents us from sharing any time-dependent information that may affect the likelihood of our predictions.
• In image classification, the fact that the network saw an image of a cat in the previous iteration does not help it classify the current image, because the class probabilities of the two instances are not temporally related. However, this independence becomes a problem in other settings, such as sentiment analysis.

RNN Architecture
• An RNN processes sequences by iterating through the sequence elements and maintaining a state containing information relative to what it has seen so far.
• An RNN has an internal loop; it saves relevant information in memory (also referred to as its state) and uses this information to perform predictions at subsequent time steps.
• The state of the RNN is reset between processing two different, independent sequences (such as two different IMDB reviews).
• So one sequence is still a single data point: a single input to the network.
• What changes is that this data point is no longer processed in a single step; rather, the network internally loops over the sequence elements.

RNN Architecture
• RNNs are characterized by their recurrence relation. In the simplest case, the output at time t combines the current input with the state carried over from the previous timestep:
  output_t = activation(W · input_t + U · state_t + b), where state_t is the output from the previous timestep.
• In practice, you'll always use a more elaborate model than this simple expression.
• SimpleRNN has a major issue: although it should theoretically be able to retain at time t information about inputs seen many timesteps before, in practice such long-term dependencies are impossible to learn.
• This is due to the vanishing gradient problem, an effect similar to what is observed in non-recurrent (feedforward) networks that are many layers deep: as you keep adding layers to a network, the network eventually becomes untrainable.
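To make the state-carrying loop and the recurrence above concrete, here is a minimal NumPy sketch of a simple RNN forward pass, in the spirit of the pseudocode in the assigned reading. The sizes and the tanh activation are illustrative choices, not values from the slides.

```python
import numpy as np

timesteps = 100        # number of timesteps in the input sequence
input_features = 32    # dimensionality of the input at each timestep
output_features = 64   # dimensionality of the state / output

inputs = np.random.random((timesteps, input_features))  # one toy input sequence
state_t = np.zeros((output_features,))                  # initial state: all zeros

# Random matrices stand in for the learned parameters W, U, and b.
W = np.random.random((output_features, input_features))
U = np.random.random((output_features, output_features))
b = np.random.random((output_features,))

outputs = []
for input_t in inputs:                      # iterate over the sequence elements
    # Combine the current input with the state carried over from the previous step.
    output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + b)
    outputs.append(output_t)
    state_t = output_t                      # the output becomes the state for the next step

final_outputs = np.stack(outputs, axis=0)   # shape: (timesteps, output_features)
```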
LSTM (Long Short-Term Memory)
• LSTM adds a way to carry information across many timesteps.
• Imagine a conveyor belt running parallel to the sequence you're processing.
• Information from the sequence can jump onto the conveyor belt at any point, be transported to a later timestep, and jump off, intact, when you need it.
• This is essentially what LSTM does: it saves information for later, thus preventing older signals from gradually vanishing during processing.
• The LSTM network provides a more complex solution to the problems of exploding and vanishing gradients.

GRU (Gated Recurrent Unit)
• The GRU can be considered the younger sibling of the LSTM.
• In essence, both leverage similar concepts to model long-term dependencies, such as remembering whether the subject of a sentence is plural when generating the sequence that follows.
• The underlying difference between GRUs and LSTMs is in their computational complexity.
• Simply put, LSTMs are more complex architectures that, while computationally expensive and time-consuming to train, perform very well at breaking the training data down into meaningful and generalizable representations.
• GRUs, on the other hand, while computationally less intensive, are more limited in their representational abilities than LSTMs.
• However, not all tasks require heavyset 10-layer LSTMs!
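In Keras, SimpleRNN, LSTM, and GRU are drop-in alternatives for the recurrent layer, which makes the trade-off above easy to test. A minimal sketch for a text task such as the IMDB reviews mentioned earlier; the vocabulary size, embedding dimension, layer width, and optimizer are illustrative assumptions, not values from the slides.

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# A small sentiment-style model; only the recurrent layer changes between variants.
model = Sequential()
model.add(Embedding(10000, 32))            # map word indices to 32-dimensional vectors
model.add(LSTM(32))                        # swap in GRU(32) for a cheaper model,
                                           # or SimpleRNN(32) for the basic recurrence
model.add(Dense(1, activation='sigmoid'))  # binary sentiment output
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
```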
Example – Stock Market
• The goal of this exercise is to predict the movement of stock prices.
• We will use the S&P 500 dataset and select a random stock to prepare for sequential modeling.
• The dataset comprises historical stock prices (opening, high, low, and closing prices) for all current S&P 500 large-cap companies traded on the American stock market.
• We acknowledge the stochasticity embedded in market trends: in reality there is a lot of randomness that often escapes even the most predictive of models. Investor behavior is hard to foresee, as investors act on many different motives.

Importing the Data

Visualizing the Data
• We select a random stock (American Airlines Group, AAL) out of the 505 different stocks in our dataset.
• Note that the data is sorted by date, since we are dealing with a time series prediction problem where the order of the sequence is very important to our task.
• We then visually display our data by plotting the high and low prices (on a given day) in sequential order of occurrence.
• We observe that, while slightly different from one another, the high and low prices follow the same pattern.
• Hence, it would be redundant to use both of these variables for predictive modeling, as they are highly correlated. We pick just the high values.
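The "Importing the Data" and "Visualizing the Data" slides correspond to code that is not reproduced in the deck. Below is a hedged reconstruction; the file name all_stocks_5yr.csv, the column names (Name, date, high, low), and the ticker spelling 'AAL' are assumptions about the public S&P 500 dataset, not details taken from the slides.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the S&P 500 price history (assumed file and column names).
df = pd.read_csv('all_stocks_5yr.csv')

# Keep one stock (American Airlines Group) and preserve chronological order.
aal = df[df['Name'] == 'AAL'].sort_values('date')

# High and low prices track each other closely, which is why only one is kept later.
plt.plot(aal['high'].values, label='high')
plt.plot(aal['low'].values, label='low')
plt.legend()
plt.title('AAL daily high and low prices')
plt.show()
```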
Convert to Numpy Array
• We convert the high-price column into a NumPy array.
• We do so by calling values on that column, which returns its NumPy representation.

Train, Validation, and Test Splits
• We use 70% of our data for training, 15% for validation, and 15% for testing.

Visualizing the Data Subsets
• We visualize the unnormalized training, validation, and testing segments of the AAL stock data.
• Note that the test data lies in the range of roughly $40 to $55 over the time frame it covers, while the training data lies in the range of roughly $25 to $50+ over its respectively longer span of observation.

Normalizing the Data
• Recall that you need to normalize data for various machine learning tasks.
• You also need to reshape your data from a 1D array into a 2D array, since the scaler expects 2D input.

Normalizing Data
• Recall that we normalize the data using parameters computed from the training set.
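A hedged sketch of the conversion, the 70/15/15 split, and the normalization described above. The MinMaxScaler choice and all variable names other than trainNorm (which appears later in the slides) are assumptions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# NumPy representation of the high-price column (continuing from the snippet above).
high = aal['high'].values

# 70% training, 15% validation, 15% test, keeping chronological order.
n = len(high)
train_end = int(n * 0.70)
val_end = int(n * 0.85)
train, val, test = high[:train_end], high[train_end:val_end], high[val_end:]

# The scaler expects a 2D array, hence the reshape from (n,) to (n, 1).
scaler = MinMaxScaler()
trainNorm = scaler.fit_transform(train.reshape(-1, 1))  # parameters fit on training data only
valNorm = scaler.transform(val.reshape(-1, 1))          # reuse the training parameters
testNorm = scaler.transform(test.reshape(-1, 1))
```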
Creating Sequences
• In order to train the RNN, we need to organize our time series into segments of n consecutive values in a given sequence.
• The output for each training sequence corresponds to the stock price some timesteps into the future.
• We have two variables, look_back and foresight:
• look_back refers to the number of stock prices we keep in a given observation.
• foresight refers to the number of steps between the last data point in the observed sequence and the data point we aim to predict.
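The sequence-building code itself is not reproduced in the deck. The helper below is a minimal sketch of the look_back / foresight scheme, with the indexing chosen so that the number of sequences equals len(series) - look_back - foresight; the function name and the specific values 7 and 6 are assumptions.

```python
import numpy as np

def create_sequences(series, look_back, foresight):
    """Slice a series into (window of look_back values, value foresight steps ahead) pairs."""
    X, y = [], []
    # Each window must leave room for `foresight` further steps beyond its end,
    # so this yields len(series) - look_back - foresight sequences.
    for i in range(len(series) - look_back - foresight):
        X.append(series[i:i + look_back])            # look_back consecutive prices
        y.append(series[i + look_back + foresight])  # the price to predict
    return np.array(X), np.array(y)

trainX, trainY = create_sequences(trainNorm, look_back=7, foresight=6)
```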
Sequences
• What will be the length of trainNorm after creating the sequences?
• 648 - 7 - 6 = 635, i.e., the number of training points minus look_back minus foresight.

Creating Sequences for Validation and Test
• You typically need to experiment with different values of look_back and foresight to assess how larger look_back and foresight values each affect the predictive power of your model.
• In practice, you will experience diminishing returns on either side for both values.

Reshaping the Data for Keras Layers
• We need to prepare a 3D tensor of shape (nb_samples, look_back, num_features).

Imports

Simple LSTM

Fitting Simple LSTM

Error Plot for Simple LSTM

Simple LSTM Performance on Test Set

Simple GRU

Fitting Simple GRU

Error Plot for Simple GRU

Simple GRU Performance on Test Set
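The "Imports", "Simple LSTM", and "Simple GRU" slides above correspond to code and result plots that are not reproduced in the deck. The sketch below shows what a simple Keras model for this setup could look like; the layer width, optimizer, epoch count, and batch size are assumptions, not values from the slides.

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
import matplotlib.pyplot as plt

look_back, foresight, num_features = 7, 6, 1

# Sequences for validation and test are built the same way as for training.
valX, valY = create_sequences(valNorm, look_back, foresight)
testX, testY = create_sequences(testNorm, look_back, foresight)

# Ensure the 3D shape (nb_samples, look_back, num_features) expected by Keras.
trainX = trainX.reshape((trainX.shape[0], look_back, num_features))
valX = valX.reshape((valX.shape[0], look_back, num_features))
testX = testX.reshape((testX.shape[0], look_back, num_features))

# Simple LSTM: one recurrent layer followed by a single regression output.
model = Sequential()
model.add(LSTM(32, input_shape=(look_back, num_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

history = model.fit(trainX, trainY,
                    epochs=20, batch_size=32,
                    validation_data=(valX, valY))

# Error plot: training vs. validation loss per epoch.
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.legend()
plt.show()

# Test-set performance; the "simple GRU" variant swaps LSTM(32) for keras.layers.GRU(32).
print(model.evaluate(testX, testY))
```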
Sequence Processing with Convnets
• We saw that convnets perform particularly well on computer vision problems.
• This is due to their ability to extract features from local input patches, allowing for representation modularity and data efficiency.
• The same properties that make convnets excel at computer vision also make them highly relevant to sequence processing.
• Time can be treated as a spatial dimension, like the height or width of a 2D image.
• Convnets can be competitive with RNNs on certain sequence-processing problems, usually at a considerably cheaper computational cost.

Understanding 1D Convolution for Sequence Data
• The convolution layers introduced previously were 2D convolutions, extracting 2D patches from image tensors and applying an identical transformation to every patch.
• In the same way, you can use 1D convolutions, extracting local 1D patches (subsequences) from sequences.
• Such 1D convolution layers can recognize local patterns in a sequence.
• Because the same input transformation is performed on every patch, a pattern learned at a certain position in a sentence can later be recognized at a different position, making 1D convnets translation invariant (for temporal translations).
• For instance, a 1D convnet processing sequences of characters using convolution windows of size 5 should be able to learn words of length 5 or less, and recognize these words in any context in an input sequence.

1D Pooling for Sequence Data
• In Keras, you use a 1D convnet via the Conv1D layer, which has an interface similar to Conv2D.
• It takes as input 3D tensors with shape (samples, time, features) and returns similarly shaped 3D tensors.
• The convolution window is a 1D window on the temporal axis: axis 1 in the input tensor.
• Keep in mind that you can use larger convolution windows with 1D convnets.
• With a 2D convolution layer, a 3 × 3 convolution window contains 3 × 3 = 9 feature vectors; but with a 1D convolution layer, a convolution window of size 3 contains only 3 feature vectors. You can thus easily afford 1D convolution windows of size 7 or 9.
• A minimal Conv1D sketch is included after the references below.

Advantages and Drawbacks of RNNs

References
• Hands-On Neural Networks with Keras by Niloy Purkait
• Deep Learning with Python by François Chollet
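As referenced in the 1D-pooling slide above, here is a minimal Conv1D sketch wired for the same (samples, look_back, 1) stock tensors used earlier; the filter count, window size, and optimizer are illustrative assumptions, not values from the slides.

```python
from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

look_back, num_features = 7, 1

# A small 1D convnet taking the same 3D input as the recurrent models above.
model = Sequential()
model.add(Conv1D(32, 5, activation='relu', input_shape=(look_back, num_features)))
model.add(GlobalMaxPooling1D())   # collapse the temporal axis to a single feature vector
model.add(Dense(1))               # single regression output, as before
model.compile(optimizer='adam', loss='mse')
model.summary()
```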