
Recurrent Neural Network
Unit 3
Acceptor – Encoder – Transducer
Challenge of Long Term Dependencies

• The basic problem is that gradients propagated over many stages tend to either vanish or explode (see the numerical sketch below).
• Neural network optimization faces a difficulty when computational graphs become deep, e.g.,
  • Feedforward networks with many layers
  • RNNs that repeatedly apply the same operation at each time step of a long temporal sequence
• The difficulty with long-term dependencies arises from the exponentially smaller weights given to long-term interactions.
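A minimal numerical sketch of this behaviour, assuming NumPy and two illustrative 2×2 diagonal weight matrices (the matrices and vector below are hypothetical, chosen only to make the effect visible):

import numpy as np

# Hypothetical recurrent weight matrices, chosen only for illustration.
W_small = np.array([[0.5, 0.0],
                    [0.0, 0.9]])   # all eigenvalue magnitudes < 1
W_large = np.array([[1.2, 0.0],
                    [0.0, 1.1]])   # all eigenvalue magnitudes > 1

g = np.array([1.0, 1.0])           # stand-in for a gradient vector

for t in (1, 10, 50):
    # Back-propagating through t identical steps multiplies by W^t.
    shrunk = np.linalg.matrix_power(W_small, t) @ g
    grown = np.linalg.matrix_power(W_large, t) @ g
    print(t, np.linalg.norm(shrunk), np.linalg.norm(grown))

# The first norm decays toward zero (vanishing gradient);
# the second grows without bound (exploding gradient).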
Challenge of Long Term Dependencies
• When the recurrent weight matrix is applied repeatedly, its eigenvalues are raised to the power t, causing eigenvalues with magnitude less than one to decay to zero and eigenvalues with magnitude greater than one to explode (a short derivation is sketched below).
• To solve this problem, we need a special type of RNN that can handle long-term dependencies. This is where Long Short-Term Memory (LSTM) networks come into the picture.
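A short derivation, following the standard eigendecomposition argument for a linear recurrence without inputs and a symmetric weight matrix W (the symbols below are the conventional ones, not taken from these slides):

h^{(t)} = W^{\top} h^{(t-1)} \quad\Rightarrow\quad h^{(t)} = (W^{t})^{\top} h^{(0)}

If W = Q \Lambda Q^{\top} with orthogonal Q and diagonal \Lambda, then

h^{(t)} = Q \, \Lambda^{t} \, Q^{\top} h^{(0)}

so each eigenvalue \lambda_i appears raised to the power t: |\lambda_i| < 1 gives \lambda_i^{t} \to 0 (vanishing), while |\lambda_i| > 1 gives \lambda_i^{t} \to \infty (exploding).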
Leaky units
• Designing a model that operates at multiple time scales is one way to handle long-term dependencies.
• This allows some parts of the model to operate at fine-grained time scales and handle small details,
• while other parts operate at coarse time scales and more effectively transfer information from the distant past to the present.
• Various strategies for building both fine and coarse time scales are possible.
Leaky units
• Leaky units (leaky integrators) help models retain information from previous time steps while also allowing gradual forgetting of irrelevant information.
• One type of leaky unit is the Leaky Rectified Linear Unit (Leaky ReLU), an activation function that allows a small gradient in the negative region instead of being completely zero.

The Leaky ReLU function is defined as

f(x) = max(αx, x)

where α is a small positive slope applied to negative inputs (a code sketch of both ideas follows).
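A minimal Python sketch of both ideas, the Leaky ReLU activation and a leaky-integrator state update (function names and α values are hypothetical, chosen for illustration):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # f(x) = max(alpha * x, x): small slope alpha for x < 0 instead of zero.
    return np.maximum(alpha * x, x)

def leaky_integrator_step(mu_prev, v_t, alpha=0.9):
    # Leaky unit: running average of past values.
    # alpha close to 1 -> remembers the distant past (coarse time scale);
    # alpha close to 0 -> forgets quickly (fine time scale).
    return alpha * mu_prev + (1.0 - alpha) * v_t

# Toy usage
mu = 0.0
for v in (1.0, 0.5, -0.3, 2.0):
    mu = leaky_integrator_step(mu, v)
print(leaky_relu(np.array([-2.0, 3.0])), mu)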
Skip connections and dropouts

• Skip connections: these are direct connections from a neuron at one time step to a neuron at a much later time step (not just the next time step).
• By adding these skip connections, the network can more easily capture long-range dependencies in the data. For example, a variable or feature from 10 time steps ago can influence the current time step directly, helping the network learn from earlier events or states (see the sketch after this list).
• Mixed delayed and single-step connections: even with these delayed connections, gradients can still explode or vanish over time, because the network has both immediate (1-step) and delayed connections, and gradients can still grow exponentially in certain cases.
• However, the presence of both types of connections (delayed and single-step) allows the network to capture a broader range of temporal dependencies, improving its ability to model sequences with varying time scales.
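A minimal sketch of a skip connection across time, assuming NumPy; the delay d, the tanh cell, and all names below are illustrative, not a prescribed architecture:

import numpy as np

def rnn_with_time_skips(inputs, W, U, W_skip, d=10):
    # inputs: sequence of input vectors x_t; W, U, W_skip: weight matrices.
    # Each state depends on the previous state (1-step connection) and,
    # through W_skip, on the state d steps earlier (delayed connection).
    hidden = [np.zeros(W.shape[0])]
    for t, x_t in enumerate(inputs, start=1):
        h_prev = hidden[t - 1]
        h_skip = hidden[t - d] if t - d >= 0 else np.zeros_like(h_prev)
        h_t = np.tanh(U @ x_t + W @ h_prev + W_skip @ h_skip)
        hidden.append(h_t)
    return hidden[1:]

# Toy usage with hypothetical sizes
rng = np.random.default_rng(0)
xs = [rng.standard_normal(3) for _ in range(30)]
W = 0.1 * rng.standard_normal((4, 4))
U = 0.1 * rng.standard_normal((4, 3))
W_skip = 0.1 * rng.standard_normal((4, 4))
states = rnn_with_time_skips(xs, W, U, W_skip, d=10)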
Skip connections and dropouts
• Active removal of connections: one way to enforce different time scales is to actively remove shorter (length-one) connections and replace them with longer connections.
• There are two basic strategies for setting the time constants used by leaky units (both are sketched in code below):
  1. One strategy is to fix them manually to values that remain constant, for example by sampling their values from some distribution once at initialization time.
  2. Another strategy is to make the time constants free parameters and learn them.

Having such leaky units at different time scales appears to help with long-term dependencies.
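A minimal sketch contrasting the two strategies (the sigmoid parameterisation in strategy 2 is one common way, assumed here, to keep a learned α in (0, 1); class and variable names are hypothetical):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Strategy 1: fix the time constants once at initialization,
# e.g. sample alpha for each unit from a distribution and never change it.
fixed_alphas = np.random.default_rng(0).uniform(0.5, 0.99, size=8)

# Strategy 2: treat the time constants as free parameters and learn them.
class LearnedLeakyUnits:
    def __init__(self, n_units):
        # Unconstrained parameter; alpha = sigmoid(a) stays in (0, 1)
        # and can be updated by gradient descent with the other weights.
        self.a = np.zeros(n_units)

    def step(self, mu_prev, v_t):
        alpha = sigmoid(self.a)
        return alpha * mu_prev + (1.0 - alpha) * v_t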
