1. Definitions and Notations:
+ Let x_t be the input vector at time step t.
+ Let h_t be the hidden state vector at time step t.
+ Let y_t be the output vector at time step t.
+ Let W_xh be the weight matrix for the input-to-hidden connections.
+ Let W_hh be the weight matrix for the hidden-to-hidden connections.
+ Let W_hy be the weight matrix for the hidden-to-output connections.
+ Let b_h be the bias vector for the hidden layer.
+ Let b_y be the bias vector for the output layer (example shapes are sketched in code below).
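As a concrete illustration, the sketch below sets up these quantities with NumPy. The dimensions (input size 4, hidden size 8, output size 3), the small random initialization, and the variable names are arbitrary choices for the example, not values from these notes.

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration.
input_size, hidden_size, output_size = 4, 8, 3

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))    # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))   # hidden-to-hidden weights
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))   # hidden-to-output weights
b_h = np.zeros(hidden_size)                                     # hidden-layer bias b_h
b_y = np.zeros(output_size)                                     # output-layer bias b_y

x_t = rng.normal(size=input_size)    # an example input vector at time step t
h_prev = np.zeros(hidden_size)       # initial hidden state h_0
```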
2. Hidden State Calculation:
The hidden state h_t is a function of the current input x_t and the previous hidden state h_{t-1}. This captures the sequential nature of the data:
h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
+ tanh is the hyperbolic tangent function, which introduces non-linearity (see the NumPy sketch below).
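A minimal sketch of this update for a single time step, assuming the shapes from the setup above (the function name hidden_step is only illustrative):

```python
import numpy as np

def hidden_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Example: h_t = hidden_step(x_t, h_prev, W_xh, W_hh, b_h)
```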
3. Output Calculation:
The output y_t at each time step is calculated using the hidden state:

y_t = W_hy h_t + b_y
+ This formula maps the hidden state to the output space (the forward-pass sketch below unrolls both steps over a sequence).
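Putting the hidden-state update and the output mapping together, here is a sketch of the forward pass unrolled over a whole input sequence (rnn_forward is an illustrative name, not a standard API):

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll the RNN over a list of input vectors xs; return hidden states and outputs."""
    hs, ys = [], []
    h = h0
    for x_t in xs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # hidden state update
        hs.append(h)
        ys.append(W_hy @ h + b_y)                  # output: y_t = W_hy h_t + b_y
    return hs, ys
```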
4. Backpropagation Through Time (BPTT):
To train the RNN, we use backpropagation through time (BPTT), which unrolls the RNN for a certain number of time steps and calculates the gradient of the loss with respect to the weights.
+ Let L be the loss function (e.g., Mean Squared Error for regression), such as the squared error summed over all time steps: L = Σ_t (y_t - T_t)^2 (sketched in code below).
+ Here, T_t is the true target value at time step t.
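Assuming the squared-error loss summed over time steps described above, a small sketch of the loss computation (the outputs and targets are lists of vectors, as returned by the forward-pass sketch):

```python
import numpy as np

def sequence_loss(ys, targets):
    """L = sum over t of (y_t - T_t)^2, with ys and targets as lists of vectors."""
    return sum(float(np.sum((y_t - T_t) ** 2)) for y_t, T_t in zip(ys, targets))
```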
5. Gradient Calculation:
To update the weights using gradient descent, we compute the gradients of the loss with respect to the weights (a code sketch of the full backward pass follows this list):
+ Gradient w.r.t. the output weights W_hy:

  ∂L/∂W_hy = Σ_t (∂L/∂y_t) (∂y_t/∂W_hy) = Σ_t δ_t h_t^T

  where δ_t = ∂L/∂y_t is the error term at time step t.
+ Gradient w.r.t. the hidden weights W_hh and the input weights W_xh. Here δ_t^h is the error backpropagated to the hidden layer:

  δ_t^h = (W_hy^T δ_t + W_hh^T δ_{t+1}^h) ⊙ (1 - h_t^2)

  ∂L/∂W_hh = Σ_t δ_t^h h_{t-1}^T,   ∂L/∂W_xh = Σ_t δ_t^h x_t^T

  This uses the derivative of the tanh function: tanh'(z) = 1 - tanh^2(z).
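The sketch below puts the whole backward pass together for the squared-error loss above: it first unrolls the forward pass, then walks backwards through time accumulating the gradients. It is a minimal illustration of BPTT under those assumptions, not an optimized implementation, and the function name and argument order are arbitrary.

```python
import numpy as np

def bptt_gradients(xs, targets, h0, W_xh, W_hh, W_hy, b_h, b_y):
    """Forward pass, then backpropagation through time for L = sum_t (y_t - T_t)^2."""
    steps = len(xs)
    hs, ys = [], []
    h = h0
    for t in range(steps):                        # forward: unroll the RNN
        h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)

    dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
    db_h, db_y = np.zeros_like(b_h), np.zeros_like(b_y)
    dh_next = np.zeros_like(h0)                   # error arriving from time step t+1

    for t in reversed(range(steps)):              # backward: BPTT
        dy = 2.0 * (ys[t] - targets[t])           # delta_t = dL/dy_t
        dW_hy += np.outer(dy, hs[t])
        db_y += dy
        dh = W_hy.T @ dy + dh_next                # error reaching the hidden layer
        dh_raw = (1.0 - hs[t] ** 2) * dh          # tanh'(z) = 1 - tanh^2(z)
        db_h += dh_raw
        dW_xh += np.outer(dh_raw, xs[t])
        h_prev = hs[t - 1] if t > 0 else h0
        dW_hh += np.outer(dh_raw, h_prev)
        dh_next = W_hh.T @ dh_raw

    return dW_xh, dW_hh, dW_hy, db_h, db_y
```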
6. Weight Update Rule:
Using gradient descent, update each weight by moving in the direction of the negative gradient:
W_hy ← W_hy - η (∂L/∂W_hy)
W_hh ← W_hh - η (∂L/∂W_hh)
W_xh ← W_xh - η (∂L/∂W_xh)
+ η is the learning rate (see the update sketch below).
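A small sketch of this update rule, applied in place to each parameter; the learning rate of 0.01 and the helper name sgd_step are hypothetical choices for illustration:

```python
def sgd_step(params, grads, eta=0.01):
    """In-place gradient descent: W <- W - eta * dL/dW for each parameter/gradient pair."""
    for W, dW in zip(params, grads):
        W -= eta * dW

# Example usage with the quantities from the earlier sketches (xs, targets, h0 assumed given):
# sgd_step([W_xh, W_hh, W_hy, b_h, b_y],
#          bptt_gradients(xs, targets, h0, W_xh, W_hh, W_hy, b_h, b_y))
```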