Recurrent Neural Network
NETWORKS:
Feedforward Network
• A trained feedforward network can be exposed to any random
collection of inputs; the inputs do not follow any particular order.
[Figure: single-layer network diagram with weights w_n1, …, w_21, bias input -1, threshold θ_1, and outputs y_n, …, y_2, y_1]
• $W_{ij} = \sum_{s=1}^{n} (2V_i^s - 1)(2V_j^s - 1)$
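As a minimal numpy sketch of this Hebbian storage rule, assuming the stored patterns V^s are given with entries in {0, 1} and mapped to ±1 as in the formula (the function name is just for illustration):

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian storage rule: W_ij = sum_s (2*V_i^s - 1)(2*V_j^s - 1), with W_ii = 0.

    `patterns` is an (n_patterns, n_neurons) array with entries in {0, 1}.
    """
    bipolar = 2 * np.asarray(patterns) - 1   # map {0, 1} -> {-1, +1}
    W = bipolar.T @ bipolar                  # sum of outer products over stored patterns
    np.fill_diagonal(W, 0)                   # no self-connections
    return W

# Example: store two 4-neuron patterns
W = hopfield_weights([[1, 0, 1, 0],
                      [1, 1, 0, 0]])
```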
Updating Rule
• The values of neurons i and j will converge if the weight
between them is positive. Similarly, they will diverge if the
weight is negative.
[1 1]^T, [-1 -1]^T, [-1 1]^T
Synchronous Update
• For the same initial vector [-1 -1]^T:
• Case I: If y_i(k) has changed from -1 to +1, i.e. Δy_i = +2, then net_i must be positive and ΔE will be negative.
• Case II: If y_i(k) has changed from +1 to -1, i.e. Δy_i = -2, then net_i must have been negative and ΔE will again be negative.
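Both cases follow from the standard Hopfield energy function; a short derivation, assuming symmetric weights ($w_{ij} = w_{ji}$, $w_{ii} = 0$) and that only neuron $i$ is updated:

$$E = -\tfrac{1}{2}\sum_{i}\sum_{j} w_{ij}\, y_i y_j + \sum_i \theta_i y_i,
\qquad
\Delta E = -\Delta y_i \Big(\sum_j w_{ij} y_j - \theta_i\Big) = -\Delta y_i\,\mathrm{net}_i .$$

So in Case I, $\Delta E = -(+2)\,\mathrm{net}_i < 0$ because $\mathrm{net}_i > 0$, and in Case II, $\Delta E = -(-2)\,\mathrm{net}_i < 0$ because $\mathrm{net}_i < 0$: the energy never increases.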
• Thus RNNs came into existence, solving the problem of ordered (sequential) inputs with the help of a hidden layer whose state is carried from one time step to the next.
• A loop allows information to be passed from one step of the network to the next.
• An RNN can be thought of as multiple copies of the same network, each passing a
message to its successor (see the sketch below).
• This chain-like nature reveals that RNNs are intimately related to sequences and
lists, and are the natural neural-network architecture to use for such data.
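A minimal numpy sketch of this unrolled view, assuming a plain tanh cell (the parameter names W_xh, W_hh, b_h are illustrative):

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, h0):
    """Apply the *same* cell at every step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = h0
    hidden_states = []
    for x_t in x_seq:                 # each iteration is one "copy" of the network
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)       # h is the message passed to the next step
    return hidden_states

# Toy usage: a sequence of three 4-dimensional inputs, hidden size 5
rng = np.random.default_rng(0)
x_seq = [rng.standard_normal(4) for _ in range(3)]
W_xh, W_hh, b_h = rng.standard_normal((5, 4)), rng.standard_normal((5, 5)), np.zeros(5)
states = rnn_forward(x_seq, W_xh, W_hh, b_h, h0=np.zeros(5))
```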
Sentiment classification
Music Generation
Named Entity Recognition
Loss function
• In the case of a recurrent neural network, the loss function L of
all time steps is defined based on the loss at every time step.
• $\mathcal{L}(y', y) = \sum_{t=1}^{T_y} \mathcal{L}(y'^{<t>}, y^{<t>})$
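A small sketch of this per-step summation, assuming one-hot targets and a cross-entropy per-step loss (both are illustrative choices, not the only possible ones):

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Per-step loss L(y'^<t>, y^<t>) for a one-hot target y and predicted probabilities y_hat."""
    return -float(np.sum(y * np.log(y_hat + 1e-12)))

def total_loss(y_hat_seq, y_seq):
    """L(y', y) = sum over t = 1..T_y of the per-step losses."""
    return sum(cross_entropy(y_hat, y) for y_hat, y in zip(y_hat_seq, y_seq))
```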
Training an RNN
• An RNN is trained with the backpropagation algorithm applied at every time
step, called backpropagation through time (BPTT).
• Because the same task is performed at each step, just with different inputs,
the parameters are shared across time steps; this reduces the total number of
parameters we need to learn.
•An event downstream in time depends upon, and is a function of, one or more events that
came before.
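A compact numpy sketch of BPTT for a plain tanh RNN with a squared-error output loss (biases omitted; all names are illustrative). Note how gradients for the shared weights are accumulated across every time step, and how the gradient message flows backward from each step to the one before it:

```python
import numpy as np

def bptt(x_seq, targets, W_xh, W_hh, W_hy, h0):
    # ---- forward pass: keep the states needed for the backward pass ----
    hs, ys = {-1: h0}, {}
    for t, x_t in enumerate(x_seq):
        hs[t] = np.tanh(W_xh @ x_t + W_hh @ hs[t - 1])
        ys[t] = W_hy @ hs[t]
    # ---- backward pass: walk the time steps in reverse ----
    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in (W_xh, W_hh, W_hy))
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(x_seq))):
        dy = ys[t] - targets[t]                 # dL/dy_t for squared error
        dW_hy += np.outer(dy, hs[t])
        dh = W_hy.T @ dy + dh_next              # gradient flowing into h_t
        draw = (1 - hs[t] ** 2) * dh            # backprop through tanh
        dW_xh += np.outer(draw, x_seq[t])       # same W_xh reused at every step
        dW_hh += np.outer(draw, hs[t - 1])      # same W_hh reused at every step
        dh_next = W_hh.T @ draw                 # message passed back to step t-1
    return dW_xh, dW_hh, dW_hy
```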
Limitation of RNN
• Recurrent Neural Networks work just fine when dealing with
short-term dependencies.
• The colour of milk is ---
• The missing word has nothing to do with the wider context of the statement.
• The RNN does not need to remember what was said before this, or what it meant.
Long Short-Term Memory (LSTM)
• The core concepts of an LSTM are the cell state and its various gates.
Forget Gate
• The forget gate looks at h_{t-1} and x_t, and outputs a number between 0 and 1 for each
number in the cell state C_{t-1}.
Input Gate
• This has two parts. First, a sigmoid layer called the “input gate layer” decides which values we’ll update.
• Second, a tanh layer creates a vector of new candidate values that could be added to the cell state.
• It’s now time to update the old cell state, C_{t-1}, into the new cell state C_t.
Output Gate
• Its job is to select useful information from the current cell state and expose it as the output, via the output gate.
• The special thing about LSTMs is that they can be trained to keep
information from long ago, without it washing out through time, and to
remove information that is irrelevant to the prediction.
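A minimal numpy sketch of one LSTM step, assuming the gates act on the concatenation [h_{t-1}, x_t]; the weight and bias names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)           # forget gate: numbers in [0, 1] scaling C_{t-1}
    i_t = sigmoid(W_i @ z + b_i)           # input gate: which values to update
    c_tilde = np.tanh(W_c @ z + b_c)       # candidate values to add to the state
    c_t = f_t * c_prev + i_t * c_tilde     # update old cell state C_{t-1} into C_t
    o_t = sigmoid(W_o @ z + b_o)           # output gate: what to expose
    h_t = o_t * np.tanh(c_t)               # output / hidden state at step t
    return h_t, c_t
```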
A single Unit
Gated Recurrent Unit
Update Gate
• The update gate z_t for time step t is calculated using the standard GRU formula
$z_t = \sigma(W^{(z)} x_t + U^{(z)} h_{t-1})$.
• The update gate helps the model to determine how much of the
past information (from previous time steps) needs to be passed
along to the future.
• The model can decide to copy all the information from the past, which
eliminates the risk of the vanishing gradient problem.
Reset gate
• This gate is used to decide how much of the past information
to forget.
Current memory content
• The new memory content uses the reset gate to store the relevant information from the past.
The model does not wash out the new input every single time; it keeps the
relevant information and passes it down to the next time steps of the network.
(×, ∗: element-wise multiplication)
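Putting the gates together, a minimal numpy sketch of one GRU step (the weight names and the convention of multiplying the update gate with the previous state follow the description above; sign conventions for z_t vary across references):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_h, U_h):
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)               # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)               # reset gate: how much past to forget
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev))   # current memory content
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde            # final memory at time step t
    return h_t
```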
Variants of RNNs
Bidirectional (BRNN)
• Sometimes it’s not just about learning from the past to predict
the future, but we also need to look into the future to fix the
past.
• The length of the input sequence and the length of the output
sequence need not necessarily be the same.
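A small sketch of the bidirectional idea, assuming two independent plain tanh RNNs whose hidden states are concatenated at each step (the parameter tuples are illustrative):

```python
import numpy as np

def birnn_forward(x_seq, fwd_params, bwd_params):
    def run(seq, params):
        W_xh, W_hh, b_h, h0 = params
        h, out = h0, []
        for x_t in seq:
            h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
            out.append(h)
        return out
    h_fwd = run(x_seq, fwd_params)                          # reads past -> future
    h_bwd = run(list(reversed(x_seq)), bwd_params)[::-1]    # reads future -> past, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(h_fwd, h_bwd)]
```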
RNNs in NLP: Language Modeling and
Generating Text
1. Given a sequence of words we want to predict the probability
of each word given the previous words.
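As a sketch, one RNN step of such a language model, assuming an embedding matrix E and a softmax output over the vocabulary (all names are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lm_step(word_idx, h_prev, E, W_xh, W_hh, W_hy, b_h, b_y):
    x_t = E[word_idx]                               # embed the current word
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    p_next = softmax(W_hy @ h_t + b_y)              # P(next word | previous words)
    return p_next, h_t
```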