Gated Recurrent Unit
The Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) introduced by Cho et al. in 2014 as a simpler alternative to the Long Short-Term Memory (LSTM) network. Like the LSTM, the GRU can process sequential data such as text, speech, and time series.
The basic idea behind the GRU is to use gating mechanisms to selectively update the hidden state of the network at each time step. These gates control the flow of information into and out of the hidden state. The GRU has two gating mechanisms, called the reset gate and the update gate. The reset gate determines how much of the previous hidden state should be forgotten when forming a candidate state, while the update gate determines how much of that candidate state replaces the previous hidden state. The output of the GRU is computed from the updated hidden state.
The equations used to calculate the reset gate, update gate, and hidden state
of a GRU are as follows:
Reset gate: r_t = sigmoid(W_r * [h_{t-1}, x_t])
Update gate: z_t = sigmoid(W_z * [h_{t-1}, x_t])
Candidate hidden state: h_t' = tanh(W_h * [r_t * h_{t-1}, x_t])
Hidden state: h_t = (1 - z_t) * h_{t-1} + z_t * h_t'
where W_r, W_z, and W_h are learnable weight matrices, x_t is the input at time step t, h_{t-1} is the previous hidden state, and h_t is the current hidden state. Here [a, b] denotes the concatenation of two vectors, and the product r_t * h_{t-1} is taken element-wise.
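To make the equations concrete, here is a minimal sketch of a single GRU step. It assumes NumPy; the names gru_step and sigmoid, the chosen weight shapes, and the omission of bias terms are illustrative assumptions rather than part of the original formulation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU time step following the equations above (biases omitted)."""
    concat = np.concatenate([h_prev, x_t])                        # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    z_t = sigmoid(W_z @ concat)                                   # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate state h_t'
    return (1 - z_t) * h_prev + z_t * h_cand                      # new hidden state h_t

Each call consumes one input vector and the previous hidden state and returns the updated hidden state for that time step.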
In summary, GRU networks are a type of RNN that use gating mechanisms to selectively update the hidden state at each time step, allowing them to model sequential data effectively. They have performed well in tasks such as language modeling, machine translation, and speech recognition.
Unlike the LSTM, the GRU uses fewer gates and does not maintain a separate internal cell state. The information that an LSTM recurrent unit stores in its internal cell state is folded directly into the hidden state of the Gated Recurrent Unit, and this single hidden state is passed on to the next time step. The components of a GRU are described below; a short sketch contrasting the GRU and LSTM interfaces in code follows this list:
1. Update gate (z): determines how much of the past information needs to be passed along into the future. It plays a role similar to the combined input and forget gates of an LSTM recurrent unit, deciding how much of the previous state to keep and how much to overwrite.
2. Reset gate (r): determines how much of the past information to forget when computing the candidate hidden state. When r is close to zero, the unit largely ignores the previous hidden state and starts afresh from the current input.
3. Candidate hidden state (h_t', sometimes called the current memory gate): it is often overlooked in a typical discussion of Gated Recurrent Unit networks. It is computed through the reset gate, much as the input modulation gate is a sub-part of the input gate in an LSTM: the tanh non-linearity keeps the candidate values zero-centred, and multiplying the previous hidden state by the reset gate reduces the effect that past information has on the new content being passed into the future.
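As a brief illustration of the missing internal cell state, the following sketch (assuming PyTorch's nn.GRU and nn.LSTM modules; the tensor sizes are arbitrary) shows that the LSTM returns an extra cell state c_n that the GRU does not keep.

import torch
import torch.nn as nn

x = torch.randn(8, 20, 32)   # batch of 8 sequences, 20 time steps, 32 features each

gru = nn.GRU(input_size=32, hidden_size=64, batch_first=True)
out_gru, h_n = gru(x)                        # only a hidden state is returned

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
out_lstm, (h_n_lstm, c_n) = lstm(x)          # hidden state plus internal cell state

print(h_n.shape)         # torch.Size([1, 8, 64])
print(c_n.shape)         # torch.Size([1, 8, 64]) -- the extra state the GRU does not carry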
When illustrated, the basic workflow of a Gated Recurrent Unit network is similar to that of a basic recurrent neural network. The main difference between the two lies inside each recurrent unit: Gated Recurrent Unit networks contain gates that modulate the current input and the previous hidden state.
Working of a Gated Recurrent Unit:
• Take the current input and the previous hidden state as input vectors.
• Calculate the values of the gates and the candidate hidden state using the equations given earlier.
• Combine the previous hidden state and the candidate hidden state through the update gate to obtain the new hidden state, which is passed on to the next time step.
In diagrams of a GRU cell, element-wise multiplication and vector addition or subtraction (addition of a negated vector) are usually drawn as separate nodes. The weight matrix W for each gate contains different weights for the current input vector and for the previous hidden state.
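As a rough sketch of this workflow (assuming NumPy; the dimensions, random weights, and variable names are illustrative), the same gate equations from earlier can be applied step by step over a sequence, with one hidden state carried forward the whole way.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5

# Randomly initialised weights; in practice these are learned by backpropagation.
W_r = 0.1 * rng.standard_normal((hidden_dim, hidden_dim + input_dim))
W_z = 0.1 * rng.standard_normal((hidden_dim, hidden_dim + input_dim))
W_h = 0.1 * rng.standard_normal((hidden_dim, hidden_dim + input_dim))

h_t = np.zeros(hidden_dim)                        # initial hidden state
sequence = rng.standard_normal((seq_len, input_dim))

for x_t in sequence:                              # the same unit is applied at every step
    concat = np.concatenate([h_t, x_t])
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    z_t = sigmoid(W_z @ concat)                                   # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_t, x_t]))      # candidate state
    h_t = (1 - z_t) * h_t + z_t * h_cand                          # updated hidden state

print(h_t)   # final hidden state summarising the whole sequence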