Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) that addresses the vanishing gradient problem and allows learning of long-term dependencies. An LSTM memory cell contains gates that regulate the flow of information into and out of the cell through multiplicative interactions. These gates allow the network to store and access information over long periods of time. LSTMs have achieved state-of-the-art results on sequence modeling tasks such as speech recognition, machine translation, and image captioning.


Long Short-Term Memory

Akshay Sood
Introduction
● Feedforward neural networks: connections flow only forward (no cycles), so the output depends only on the current input
Recurrent Neural Networks (RNNs)
● Networks with feedback loops (recurrent edges)
● Output at current time step depends on current input as well as previous state (via recurrent edges)

[Figure: RNN unfolded in time]
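For reference, the recurrence that the unfolded diagram represents, in the standard vanilla-RNN form (the weight names W_xh, W_hh, W_hy and the biases are notation assumed here, not taken from the slides):

    h_t = tanh(W_xh · x_t + W_hh · h_{t-1} + b_h)
    y_t = W_hy · h_t + b_y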
Training RNNs
● Backpropagation Through Time (BPTT)
○ Regular (feedforward) backprop applied to RNN unfolded in time
○ Truncated BPTT: approximate the full gradient by backpropagating only a fixed number of time steps (see the sketch below)
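A minimal PyTorch-style sketch of truncated BPTT, assuming a toy per-step regression task; the sizes, chunk length k, and random data are illustrative assumptions, not part of the slides:

    import torch
    import torch.nn as nn

    k, input_size, hidden_size = 20, 8, 32                 # assumed truncation length and sizes
    lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
    readout = nn.Linear(hidden_size, 1)
    optimizer = torch.optim.Adam(list(lstm.parameters()) + list(readout.parameters()))

    x = torch.randn(1, 200, input_size)                    # toy sequence: 1 batch, 200 time steps
    y = torch.randn(1, 200, 1)                             # toy per-step targets
    state = None
    for t in range(0, x.size(1), k):
        if state is not None:
            state = tuple(s.detach() for s in state)       # truncate: gradients stop at the chunk boundary
        out, state = lstm(x[:, t:t+k], state)
        loss = nn.functional.mse_loss(readout(out), y[:, t:t+k])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Only the most recent k steps contribute to each gradient update, which keeps memory bounded at the cost of ignoring longer-range dependencies.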
Training RNNs
● Problem: can’t capture long-term dependencies due to vanishing/exploding gradients during backpropagation

[Figure: RNN unfolded in time, showing gradients propagated back over many steps]
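Why this happens: by the chain rule, the gradient reaching a state t steps in the past contains a product of t per-step Jacobians; if their norms are consistently below (above) 1, the product shrinks (grows) exponentially with t. As a rough numeric illustration with an assumed per-step factor of 0.9, the signal from 100 steps back is scaled by 0.9^100 ≈ 2.7 × 10^-5, i.e. it is effectively lost.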
Long Short-Term Memory networks (LSTMs)
● A type of RNN architecture that addresses the vanishing/exploding gradient problem and allows learning of long-term dependencies

● Recently risen to prominence with state-of-the-art performance in speech recognition, language modeling, translation, image captioning
LSTMs
Central Idea: A memory cell (also called a memory block) which can maintain its state over time, consisting of an explicit memory (aka the cell state vector) and gating units which regulate the information flow into and out of the memory.

[Figure: memory cell schematic]

LSTM Memory Cell

[Figure: LSTM memory cell]

LSTM Memory Cell

[Figure: simplified schematic of the memory cell, for reference; each gate is drawn as a sigmoid layer followed by pointwise multiplication]
Cell state vector
● Represents the memory of the LSTM
● Undergoes changes via forgetting of old memory (forget gate) and addition of new memory (input gate)

Cell state vector

[Figure: cell state vector]


Gates

● Gate: sigmoid neural net layer followed by pointwise multiplication operator

● Gates control the flow of information to/from the memory

● Gates are controlled by a concatenation of the output from the previous time step and the current input, and optionally the cell state vector (so-called peephole connections)
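In equations (standard LSTM notation, assumed here rather than copied from the slides): writing [h_{t-1}, x_t] for the concatenation of the previous output and the current input, each gate computes

    gate_t = σ(W · [h_{t-1}, x_t] + b)

a vector of values between 0 and 1 that is multiplied pointwise with the signal it regulates, where W and b are learned parameters.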
Forget Gate
● Controls what information to throw away from memory
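In the standard formulation, using the notation above:

    f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

A component of f_t near 0 erases the corresponding component of the cell state; a component near 1 keeps it.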
Input Gate
● Controls what new information is added to cell state from current input
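The standard input gate and candidate memory (the "input tanh layer" mentioned under training below):

    i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
    C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)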
Memory Update
● The cell state vector aggregates the two components (old memory via the forget gate and new memory via the input gate)
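In the standard formulation, with ⊙ denoting pointwise multiplication:

    C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t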
Output Gate
● Conditionally decides what to output from the memory
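In the standard formulation:

    o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
    h_t = o_t ⊙ tanh(C_t)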
LSTM Memory Cell Summary
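Putting the four equations above together, a minimal NumPy sketch of a single LSTM time step; the fused weight layout, sizes, and random initialization are illustrative assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM step. W has shape (4*hidden, hidden+input); b has shape (4*hidden,)."""
        n = h_prev.shape[0]
        z = W @ np.concatenate([h_prev, x_t]) + b    # all four pre-activations at once
        f = sigmoid(z[0*n:1*n])                      # forget gate
        i = sigmoid(z[1*n:2*n])                      # input gate
        c_cand = np.tanh(z[2*n:3*n])                 # candidate memory (input tanh layer)
        o = sigmoid(z[3*n:4*n])                      # output gate
        c_t = f * c_prev + i * c_cand                # memory update
        h_t = o * np.tanh(c_t)                       # gated output
        return h_t, c_t

    # Toy usage with assumed sizes and random weights
    input_size, hidden_size = 8, 16
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((4 * hidden_size, hidden_size + input_size))
    b = np.zeros(4 * hidden_size)
    h, c = np.zeros(hidden_size), np.zeros(hidden_size)
    for x_t in rng.standard_normal((100, input_size)):   # a length-100 toy input sequence
        h, c = lstm_step(x_t, h, c, W, b)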
LSTM Training
● Backpropagation Through Time (BPTT) most common
● What weights are learned?
○ Gates (input/output/forget)
○ Input tanh layer
● Outputs depend on the task (see the sketch below):
○ Single output prediction for the whole sequence (e.g. sequence classification)
○ One output at each time step (sequence labeling)
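A short sketch of the two readout choices, using PyTorch's nn.LSTM with assumed sizes:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
    head = nn.Linear(32, 5)                      # assumed 5 output classes
    x = torch.randn(8, 40, 16)                   # 8 sequences, 40 time steps each
    outputs, (h_n, c_n) = lstm(x)
    whole_seq_pred = head(h_n[-1])               # one prediction per sequence (final hidden state)
    per_step_pred = head(outputs)                # one prediction at every time step (sequence labeling)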
Deep LSTMs

● Deep LSTMs can be created by stacking multiple LSTM layers vertically, with the output sequence of one layer forming the input sequence of the next (in addition to recurrent connections within the same layer)

● Increases the number of parameters - but given sufficient data, performs significantly better than single-layer LSTMs (Graves et al. 2013)

● Dropout usually applied only to non-recurrent edges, including between layers (see the sketch below)
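A minimal sketch of a stacked ("deep") LSTM using PyTorch, with assumed sizes; in nn.LSTM the dropout argument is applied to the outputs of every layer except the last, i.e. only on the non-recurrent connections between layers:

    import torch
    import torch.nn as nn

    deep_lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2,
                        dropout=0.5, batch_first=True)
    x = torch.randn(4, 50, 64)                   # 4 sequences, 50 steps, 64 features
    outputs, (h_n, c_n) = deep_lstm(x)           # outputs come from the top layer: (4, 50, 128)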
Bidirectional RNNs
● Data is processed in both directions by two separate hidden layers, which are then fed forward into the same output layer (see the sketch below)

● Bidirectional RNNs can better exploit context in both directions; e.g., bidirectional LSTMs perform better than unidirectional ones in speech recognition (Graves et al. 2013)
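A minimal sketch of a bidirectional LSTM in PyTorch (sizes assumed); the forward and backward hidden states are concatenated at each time step, so any output layer on top sees context from both directions:

    import torch
    import torch.nn as nn

    bilstm = nn.LSTM(input_size=64, hidden_size=128, bidirectional=True, batch_first=True)
    x = torch.randn(4, 50, 64)
    outputs, _ = bilstm(x)                       # outputs: (4, 50, 256) = forward and backward states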
LSTMs for Machine Translation (Sutskever et al. 2014)

● Encoder and decoder LSTMs: one LSTM encodes the source sentence into a fixed-size vector (its final hidden and cell states), and a second LSTM generates the translation from that vector (see the sketch below)
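A rough sketch of the encoder-decoder idea in PyTorch; the vocabulary sizes, dimensions, start token, and greedy decoding loop are illustrative assumptions, not the exact setup of Sutskever et al.:

    import torch
    import torch.nn as nn

    emb_dim, hid_dim, src_vocab, tgt_vocab = 32, 64, 100, 100
    src_embed = nn.Embedding(src_vocab, emb_dim)
    tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
    encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
    decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
    project = nn.Linear(hid_dim, tgt_vocab)

    src = torch.randint(0, src_vocab, (1, 12))         # one toy source sentence of 12 token ids
    _, state = encoder(src_embed(src))                 # encode; (h_n, c_n) summarizes the source

    token = torch.zeros(1, 1, dtype=torch.long)        # assume id 0 is the start-of-sentence token
    for _ in range(15):                                # greedy decoding, one target token at a time
        out, state = decoder(tgt_embed(token), state)  # decoder starts from the encoder's final state
        token = project(out[:, -1]).argmax(dim=-1, keepdim=True)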


Demos
● Handwriting generation demo: http://www.cs.toronto.edu/~graves/handwriting.html
● Music composition: http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/
● Image captioning and other stuff: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
References
● Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780. (The original paper on LSTMs; the forget gate was added later)
● http://colah.github.io/posts/2015-08-Understanding-LSTMs/ (A great blog post introducing LSTMs)
● Lipton, Zachary C., John Berkowitz, and Charles Elkan. "A critical review of recurrent neural networks for sequence learning." arXiv preprint arXiv:1506.00019 (2015). (A nice review of RNNs, including LSTMs, bidirectional RNNs and state-of-the-art applications)
● https://deeplearning4j.org/lstm (Another nice introduction to recurrent networks and LSTMs, with code examples - Deeplearning4j is a deep learning platform for Java and Scala)
● Sutskever, I., Vinyals, O., & Le, Q. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems. (A paper that proposes two LSTMs (one for encoding, one for decoding) for machine translation)
● Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (A paper that proposes deep bidirectional LSTMs for speech recognition)
● Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. (Paper introducing image captioning using ConvNet + LSTM)
● https://medium.com/@shiyan/understanding-lstm-and-its-diagrams-37e2f46f1714 (Neat LSTM explanation diagrams)
● http://deeplearning.net/tutorial/lstm.html (Tutorial applying LSTM to sentiment analysis)
● https://xkcd.com/1093/
Other useful links
● http://deeplearning.net/tutorial/lstm.html
● https://github.com/zhongkaifu/RNNSharp
● http://blog.leanote.com/post/[email protected]/RNN-and-LSTM-List
● https://deeplearning4j.org/lstm
● https://apaszke.github.io/lstm-explained.html
● https://medium.com/@shiyan/understanding-lstm-and-its-diagrams-37e2f46f1714
