Unit 5.6: LSTM


PYTHON PROGRAMMING & DATA SCIENCE

Long Short-Term Memory Networks (LSTM)


Long Short-Term Memory Networks:
Long Short-Term Memory networks (LSTMs) are the most popular and effective way to deal with the gradient problems (vanishing and exploding gradients) that affect standard recurrent networks.
Long-Term Dependencies:
Suppose we want to predict the last word in the text:
“The clouds are in the ______.”
The most obvious answer is “sky.”
Consider this sentence:
“I have been staying in Spain for the last 10 years…I can speak fluent ______.”
The word we predict will depend on the previous few words in context.
Here we need the context of Spain to predict the last word in the text, and the
most suitable answer to this sentence is “Spanish.”
The gap between the relevant information and the point where it is needed can become very large. LSTMs help us solve this problem.
Long Short-Term Memory Networks:
LSTMs are a special kind of recurrent neural network, capable of learning long-term dependencies; remembering information for long periods is their default behavior.
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.
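For comparison, one step of such a standard RNN can be sketched in a few lines of NumPy; the layer sizes and random weights are illustrative, not taken from the slides:

import numpy as np

hidden_size, input_size = 4, 3
rng = np.random.default_rng(42)

W_h = rng.normal(size=(hidden_size, hidden_size))   # recurrent weights
W_x = rng.normal(size=(hidden_size, input_size))    # input weights
b = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)      # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)   # current input x_t

# The whole repeating module of a vanilla RNN: a single tanh layer.
h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b)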
LSTMs also have this chain-like structure, but the repeating module is structured differently: instead of a single neural network layer, there are four layers interacting in a very special way.
Workings of LSTMs:
Diagrammatically, the working of an LSTM can be broken into a three-step process.


Step 1: Decide how much past data it should remember
The first step in the LSTM is to decide which information should be omitted from the cell state at that particular time step.
A sigmoid function (the forget gate) determines this.
It looks at the previous hidden state h_{t-1} along with the current input x_t and outputs, for each number in the cell state, a value between 0 (completely forget) and 1 (completely keep).
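A minimal NumPy sketch of this forget gate; the dimensions and random weights are illustrative, since the slides give no concrete numbers:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

W_f = rng.normal(size=(hidden_size, hidden_size + input_size))  # forget-gate weights
b_f = np.zeros(hidden_size)                                     # forget-gate bias

h_prev = np.zeros(hidden_size)      # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)   # current input x_t

# f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f): 0 means forget, 1 means keep.
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)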
Step 2: Decide how much this unit adds to the current state 
This second layer has two parts: a sigmoid function and a tanh function.
The sigmoid function (the input gate) decides which values to let through (0 to 1).
The tanh function gives weight to the values that are passed, deciding their level of importance (-1 to 1).
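A matching NumPy sketch of this second step; the sizes are again illustrative, and f_t stands in for the forget-gate output from step 1:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

W_i = rng.normal(size=(hidden_size, hidden_size + input_size))  # input-gate weights
W_c = rng.normal(size=(hidden_size, hidden_size + input_size))  # candidate-value weights
b_i, b_c = np.zeros(hidden_size), np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)      # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)   # current input x_t
c_prev = np.zeros(hidden_size)      # previous cell state C_{t-1}
f_t = np.full(hidden_size, 0.5)     # placeholder forget gate from step 1

z = np.concatenate([h_prev, x_t])
i_t = sigmoid(W_i @ z + b_i)        # sigmoid: which values to let through (0 to 1)
c_tilde = np.tanh(W_c @ z + b_c)    # tanh: importance of candidate values (-1 to 1)

# New cell state: forget part of the old state, then add the gated candidate.
c_t = f_t * c_prev + i_t * c_tilde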
Step 3: Decide what part of the current cell state makes it to the output
The third step is to decide what the output will be.
First, we run a sigmoid layer, which decides what parts of the cell state make
it to the output.
Then, we put the cell state through tanh to push the values to be between -1
and 1 and multiply it by the output of the sigmoid gate.
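A sketch of this final step in the same style; c_t stands in for the cell state produced in step 2:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(2)

W_o = rng.normal(size=(hidden_size, hidden_size + input_size))  # output-gate weights
b_o = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)      # previous hidden state h_{t-1}
x_t = rng.normal(size=input_size)   # current input x_t
c_t = rng.normal(size=hidden_size)  # placeholder cell state from step 2

# The sigmoid gate decides which parts of the cell state make it to the output.
o_t = sigmoid(W_o @ np.concatenate([h_prev, x_t]) + b_o)

# Push the cell state to between -1 and 1 with tanh, then gate it.
h_t = o_t * np.tanh(c_t)            # new hidden state / output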
Applications of LSTM:
Some well-known applications of LSTM include:
1. Language modelling
2. Machine translation
3. Image captioning
4. Handwriting generation
5. Question-answering chatbots
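In practice these gates are rarely written by hand; a deep-learning library supplies the LSTM layer. A minimal Keras sketch for a sequence task such as language modelling (the vocabulary size, sequence length, and layer sizes below are illustrative, not from the slides):

import tensorflow as tf

vocab_size, embed_dim, seq_len = 5000, 64, 40   # illustrative values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),          # token embeddings
    tf.keras.layers.LSTM(128),                                 # one LSTM layer
    tf.keras.layers.Dense(vocab_size, activation="softmax"),   # next-word distribution
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.build(input_shape=(None, seq_len))   # batches of token-id sequences
model.summary()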
