Module 6
Recurrent Neural Networks
Recurrent Neural Network
▪ RNNs are called recurrent because they perform the same task for every element of a sequence, with the output being dependent on the previous computations
▪ Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far
▪ An RNN typically predicts one output at each time step. Conceptually, the network is unrolled across the time steps of the sequence
▪ Then, it calculates the error at each time step and adds up all of the individual errors
▪ Following this, the network is rolled back up and the weights are updated (backpropagation through time)
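A minimal NumPy sketch of these three points, with assumed toy sizes and randomly initialized weights (the names Wx, Wh, Wy are illustrative, not from the slides):

import numpy as np

# Assumed toy dimensions: 3 time steps, 4 input features, 5 hidden units, 1 output
T, n_in, n_hid, n_out = 3, 4, 5, 1
rng = np.random.default_rng(0)
Wx = rng.normal(size=(n_hid, n_in))   # input-to-hidden weights (shared across time)
Wh = rng.normal(size=(n_hid, n_hid))  # hidden-to-hidden weights (shared across time)
Wy = rng.normal(size=(n_out, n_hid))  # hidden-to-output weights (shared across time)

x = rng.normal(size=(T, n_in))        # one input per time step
y = rng.normal(size=(T, n_out))       # one target per time step
h = np.zeros(n_hid)

total_error = 0.0
for t in range(T):
    # The same computation at every step; h carries the "memory" of earlier steps
    h = np.tanh(Wx @ x[t] + Wh @ h)
    y_pred = Wy @ h
    total_error += 0.5 * np.sum((y_pred - y[t]) ** 2)  # error summed over time steps

# Backpropagation through time differentiates total_error w.r.t. the shared weights
print(total_error)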
Suppose we try to predict the last word in this text (input → output):
I’ve been staying in Spain for the last 10 years. I can speak fluent …………..
• In this case, the network needs the context of ‘Spain’ to predict the last word in this text, which is “Spanish”
• The gap between the word which we want to predict and the relevant information is very large, and this is known as a long-term dependency
▪ Now, if there is a really long dependency, there is a good probability that one of the gradients will approach zero, and this would lead to all of the gradients vanishing, i.e. ∂E/∂W = 0
▪ Such states no longer help the network to learn anything. This is known as the vanishing gradient problem
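A rough numerical illustration of why long dependencies cause this (the per-step factor is an assumed toy value, not from the slides): the gradient reaching an early time step is a product of one factor per step, so if each factor is below 1 the product collapses toward zero.

# Assume the per-step backprop factor (roughly tanh'(z) * w) is about 0.5
per_step_factor = 0.5
for steps in (5, 20, 50):
    print(steps, per_step_factor ** steps)  # 0.03125, ~9.5e-07, ~8.9e-16 -> effectively ∂E/∂W = 0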
Long Short-Term Memory (LSTM) networks are a special kind of RNN, explicitly designed to avoid the long-term dependency problem
Standard RNN
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer
LSTM
LSTMs also have this chain-like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way
The key to LSTMs is the cell state. The cell state is kind of like a conveyor belt. It runs straight down the entire
chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures
called gates
Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and
a pointwise multiplication operation
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be
let through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
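A small sketch of a gate with assumed toy values, showing the sigmoid layer and the pointwise multiplication:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

information = np.array([2.0, -3.0, 0.5])      # values trying to pass through the gate
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))  # ~0 "let nothing through", 0.5, ~1 "let everything through"
print(gate * information)                     # approximately [0.0, -1.5, 0.5]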
Step 1
The first step in our LSTM is to decide what information we’re going to throw away from the cell state. This
decision is made by a sigmoid layer called the “forget gate layer”
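In the standard LSTM notation (these symbols are not shown on the slide, but match the Ct-1, ft, and it used in Step 3), the forget gate looks at ht-1 and xt and outputs a number between 0 and 1 for every entry of the cell state:
ft = sigmoid(Wf · [ht-1, xt] + bf)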
Step 2
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a
sigmoid layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of
new candidate values, C~t, that could be added to the state
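In the same standard notation, the two parts of this step are:
it = sigmoid(Wi · [ht-1, xt] + bi)
C~t = tanh(WC · [ht-1, xt] + bC)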
Step 3
Then we have to update the old cell state, Ct-1, into the new cell state Ct. So, we multiply the old state (Ct-1) by ft,
forgetting the things we decided to forget earlier. Then we add (it * C~t). These are the new candidate values, scaled
by how much we decided to update each state value
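Written out in the same notation, the update is:
Ct = ft * Ct-1 + it * C~t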
Step 4
Finally, we’ll run a sigmoid layer which decides what part of the cell state we’re going to output. Then, we put the
cell state through tanh and multiply it by the output of the sigmoid gate, so that we only output the parts we
decided to
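In the same notation:
ot = sigmoid(Wo · [ht-1, xt] + bo)
ht = ot * tanh(Ct)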
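Putting Steps 1–4 together, here is a minimal NumPy sketch of a single LSTM time step; the weight names and toy sizes are assumptions for illustration, not values from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    # One LSTM time step following Steps 1-4 above (weight names are illustrative)
    z = np.concatenate([h_prev, x_t])    # [ht-1, xt]
    f_t = sigmoid(W_f @ z + b_f)         # Step 1: forget gate
    i_t = sigmoid(W_i @ z + b_i)         # Step 2: input gate
    C_tilde = np.tanh(W_C @ z + b_C)     # Step 2: candidate values
    C_t = f_t * C_prev + i_t * C_tilde   # Step 3: update the cell state
    o_t = sigmoid(W_o @ z + b_o)         # Step 4: output gate
    h_t = o_t * np.tanh(C_t)             # Step 4: new hidden state
    return h_t, C_t

# Toy usage with assumed sizes: 3 input features, 2 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 2
W = lambda: rng.normal(size=(n_hid, n_hid + n_in))
b = lambda: np.zeros(n_hid)
h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(rng.normal(size=n_in), h, C, W(), b(), W(), b(), W(), b(), W(), b())
print(h, C)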
Adding the LSTM layer with the output and input shape:
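The code listing on this slide did not survive extraction, so the snippet below is only a sketch of what this step typically looks like in Keras; the layer size (50 units) and input shape (60 time steps, 1 feature) are assumed values, not the originals.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units=50, input_shape=(60, 1)))  # LSTM layer with the input shape
model.add(Dense(units=1))                       # output layer producing a single value
model.compile(optimizer='adam', loss='mean_squared_error')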
[Diagram: Raw Data → Normalizing → Normalized Data]
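A common way to perform this normalization step (a sketch assuming scikit-learn's MinMaxScaler and toy raw values; the original code is not recoverable from the slide):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

raw_data = np.array([[10.0], [25.0], [40.0], [55.0]])  # assumed toy raw values
scaler = MinMaxScaler(feature_range=(0, 1))
normalized_data = scaler.fit_transform(raw_data)       # values scaled into [0, 1]
print(normalized_data)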
Fitting the model with the normalized values and the number of epochs set to 500:
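Again, the original listing is missing, so this is only a sketch of what fitting on the normalized values for 500 epochs typically looks like; it assumes the model built above and training arrays shaped (samples, time steps, features).

import numpy as np

X_train = np.random.rand(100, 60, 1)  # assumed normalized training inputs
y_train = np.random.rand(100, 1)      # assumed normalized targets

history = model.fit(X_train, y_train, epochs=500, batch_size=32, verbose=1)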