
UNIT–V

Sequence Modeling: Recurrent and Recursive Nets

Unfolding Computational Graphs, Recurrent Neural Networks, Bidirectional RNNs, Deep Recurrent Networks, Recursive Neural Networks, The Challenge of Long-Term Dependencies, Optimization for Long-Term Dependencies, Explicit Memory.

Unfolding Computational Graphs

The basic recurrence of an RNN (Eq. 10.4) is:

h(t) = f(h(t-1), x(t); θ)

It says that the current hidden state h(t) is a function f of the previous hidden state h(t-1) and the current input x(t), where θ denotes the parameters of f. The network typically learns to use h(t) as a kind of lossy summary of the task-relevant aspects of the past sequence of inputs up to time t.

Unfolding maps the compact recurrent graph on the left of the figure below to the unrolled graph on the right (both are computational graphs of an RNN without an output o). The black square indicates an interaction with a delay of one time step, from the state at time t to the state at time t + 1.

Unfolding/parameter sharing is better than using different parameters per position: there are fewer parameters to estimate, and the model generalizes to sequences of various lengths.
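
A minimal NumPy sketch of this shared-parameter recurrence, assuming tanh as the transition function f (all names here are illustrative, not taken from the book):

import numpy as np

def rnn_unroll(x_seq, h0, W, U, b):
    """Unfold the recurrence h(t) = tanh(W h(t-1) + U x(t) + b),
    reusing the same parameters W, U, b at every time step."""
    h = h0
    states = []
    for x_t in x_seq:                         # one iteration per time step
        h = np.tanh(W @ h + U @ x_t + b)      # same shared parameters (theta)
        states.append(h)
    return states

# Toy usage: 5 time steps, 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
x_seq = [rng.normal(size=3) for _ in range(5)]
W = rng.normal(size=(4, 4))
U = rng.normal(size=(4, 3))
b = np.zeros(4)
states = rnn_unroll(x_seq, np.zeros(4), W, U, b)
print(len(states), states[-1].shape)          # 5 (4,)

Because the same W, U, b are reused at every step, the same function works for a sequence of any length.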

10.2 Recurrent Neural Network

Variation 1 of RNN (basic form): hidden-to-hidden connections, sequence output, as in Fig 10.3.

The basic equations that define this RNN (see Eq. 10.6 on p. 385 of the book) take the standard form:

a(t) = b + W h(t-1) + U x(t)
h(t) = tanh(a(t))
o(t) = c + V h(t)
ŷ(t) = softmax(o(t))
The total loss for a given sequence of x values paired with a sequence of y values is then just the sum of the losses over all the time steps. For example, if L(t) is the negative log-likelihood of y(t) given x(1), . . . , x(t), then summing over time steps gives the loss for the sequence (Eq. 10.7):

L = Σ_t L(t) = - Σ_t log p_model( y(t) | x(1), . . . , x(t) )

• Forward Pass: the runtime is O(τ) and cannot be reduced by parallelization, because the forward propagation graph is inherently sequential; each time step can only be computed after the previous one.
• Backward Pass: see Section 10.2.2.
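
A hedged NumPy sketch of this O(τ) forward pass and summed loss, following the equations above (function and parameter names are illustrative; y_t is assumed to be an integer class index):

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnn_forward_loss(x_seq, y_seq, params):
    """Sequential forward pass; returns the summed negative
    log-likelihood of each y(t) given x(1), ..., x(t)."""
    W, U, V, b, c = params
    h = np.zeros(b.shape[0])
    total_loss = 0.0
    for x_t, y_t in zip(x_seq, y_seq):    # inherently sequential: h(t) needs h(t-1)
        a = b + W @ h + U @ x_t           # a(t) = b + W h(t-1) + U x(t)
        h = np.tanh(a)                    # h(t) = tanh(a(t))
        o = c + V @ h                     # o(t) = c + V h(t)
        y_hat = softmax(o)                # yhat(t) = softmax(o(t))
        total_loss += -np.log(y_hat[y_t]) # L(t) = -log p(y(t) | x(1..t))
    return total_loss

The loop cannot be parallelized across time steps because each h(t) depends on h(t-1).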

Variation 2 of RNN: output-to-hidden connections, sequence output. As shown in Fig 10.4, it produces an output at each time step and has recurrent connections only from the output at one time step to the hidden units at the next time step.

Teacher forcing (Section 10.2.1, p. 385) can be used to train an RNN such as that of Fig 10.4 (above), where only output-to-hidden connections exist, i.e. hidden-to-hidden connections are absent.

In teacher forcing, the model is trained to maximize the conditional probability of the current output y(t) given both the x sequence so far and the previous output y(t-1); that is, the gold-standard output of the previous time step is fed back during training instead of the model's own prediction.
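
A minimal sketch of one teacher-forced step for this kind of architecture (hypothetical names; the recurrence runs from the previous ground-truth output, not from a previous hidden state):

import numpy as np

def teacher_forced_step(x_t, y_prev, params):
    """One training-time step of a Fig 10.4-style RNN: the hidden state is
    conditioned on the previous *ground-truth* output y_prev, so there is
    no hidden-to-hidden connection."""
    W_xh, W_yh, W_ho, b_h, b_o = params
    h_t = np.tanh(W_xh @ x_t + W_yh @ y_prev + b_h)  # output-to-hidden recurrence
    o_t = W_ho @ h_t + b_o                           # logits for y(t)
    return h_t, o_t

# At test time, y_prev would instead be the model's own previous prediction,
# since the gold-standard output is not available.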

Variation 3 of RNN: hidden-to-hidden connections, single output. As in Fig 10.5, recurrent connections between hidden units read an entire sequence and then produce a single output.
What is a Recurrent Neural Network (RNN)?
A Recurrent Neural Network (RNN) is a type of neural network in which the output from the previous step is fed as input to the current step. In traditional neural networks, all the inputs and outputs are independent of each other, but in cases where we need to predict the next word of a sentence, the previous words are required, and hence there is a need to remember them. RNNs solve this issue with the help of a hidden layer. The main and most important feature of an RNN is its hidden state, which remembers some information about the sequence; this state is also referred to as the memory state, since it retains the previous inputs to the network. An RNN uses the same parameters for each input, because it performs the same task on all inputs or hidden layers to produce the output. This reduces the number of parameters, unlike other neural networks.
Recurrent Neural Network

How does an RNN differ from a Feedforward Neural Network?

Artificial neural networks that do not have looping nodes are called feedforward neural networks. Because all information is only passed forward, this kind of neural network is also referred to as a multi-layer neural network. In a feedforward network, information moves unidirectionally from the input layer, through any hidden layers that are present, to the output layer. These networks are appropriate for tasks such as image classification, where input and output are independent. Nevertheless, their inability to retain previous inputs makes them less useful for sequential data analysis.
Bi-directional Recurrent Neural Network
A bidirectional recurrent neural network (BRNN) is a neural network architecture designed to process sequential data. BRNNs process input sequences in both the forward and backward directions so that the network can use information from both past and future context in its predictions. This is the main distinction between BRNNs and conventional recurrent neural networks.

There are four types of RNNs based on the number of inputs and outputs
in the network.
1. One to One
2. One to Many
3. Many to One
4. Many to Many
One to One
This type of RNN behaves the same as any simple neural network; it is also known as a Vanilla Neural Network. In this network, there is only one input and one output.

One to One RNN


One To Many
In this type of RNN, there is one input and many outputs associated with it. One of the most common examples of this network is image captioning, where, given an image, we predict a sentence consisting of multiple words.

One to Many RNN


Many to One
In this type of network, many inputs are fed to the network at several states of the network, generating only one output. This type of network is used in problems like sentiment analysis, where we give multiple words as input and predict only the sentiment of the sentence as output.
Many to One RNN
Many to Many
In this type of neural network, there are multiple inputs and multiple outputs corresponding to a problem. One example of this is language translation, where we provide multiple words from one language as input and predict multiple words in the second language as output.

Many to Many RNN

Bidirectional RNNs
A BRNN has two distinct recurrent hidden layers, one of which
processes the input sequence forward and the other of which processes it
backward. After that, the results from these hidden layers are collected
and input into a prediction-making final layer. Any recurrent neural
network cell, such as Long Short-Term Memory (LSTM) or Gated
Recurrent Unit, can be used to create the recurrent hidden layers.
The BRNN functions similarly to conventional recurrent neural
networks in the forward direction, updating the hidden state depending
on the current input and the prior hidden state at each time step. The
backward hidden layer, on the other hand, analyses the input sequence
in the opposite manner, updating the hidden state based on the current
input and the hidden state of the next time step.
Compared to conventional unidirectional recurrent neural networks, the
accuracy of the BRNN is improved since it can process information in
both directions and account for both past and future contexts. Because
the two hidden layers can complement one another and give the final
prediction layer more data, using two distinct hidden layers also offers a
type of model regularisation.
In order to update the model parameters, the gradients are computed for both the forward and backward passes of the backpropagation-through-time procedure that is typically used to train BRNNs. At inference time, the input sequence is processed by the BRNN in a single forward pass, and predictions are made based on the combined outputs of the two hidden layers.

Bi-directional Recurrent Neural Network

Working of Bidirectional Recurrent Neural Network

1. Inputting a sequence: A sequence of data points, each represented as a vector with the same dimensionality, is fed into the BRNN. Sequences may have different lengths.
2. Dual Processing: The data are processed in both the forward and backward directions. In the forward direction, the hidden state at time step t is determined from the input at that step and the hidden state at step t-1. In the backward direction, the hidden state at step t is calculated from the input at step t and the hidden state at step t+1 (see the sketch after this list).
3. Computing the hidden state: A non-linear activation function on
the weighted sum of the input and previous hidden state is used to
calculate the hidden state at each step. This creates a memory
mechanism that enables the network to remember data from earlier
steps in the process.
4. Determining the output: A non-linear activation function is used to
determine the output at each step from the weighted sum of the
hidden state and a number of output weights. This output has two
options: it can be the final output or input for another layer in the
network.
5. Training: The network is trained through a supervised learning
approach where the goal is to minimize the discrepancy between the
predicted output and the actual output. The network adjusts its
weights in the input-to-hidden and hidden-to-output connections
during training through backpropagation.
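
A hedged NumPy sketch of the dual forward/backward processing described above (all names and dimensions are illustrative; a simple tanh cell stands in for LSTM/GRU):

import numpy as np

def brnn_forward(x_seq, params):
    """Run two independent recurrent passes over the sequence and
    concatenate their hidden states at each time step."""
    Wf, Uf, bf, Wb, Ub, bb = params
    T, H = len(x_seq), Wf.shape[0]

    h_fwd = [None] * T
    h = np.zeros(H)
    for t in range(T):                        # forward direction: step t-1 -> t
        h = np.tanh(Wf @ h + Uf @ x_seq[t] + bf)
        h_fwd[t] = h

    h_bwd = [None] * T
    h = np.zeros(H)
    for t in reversed(range(T)):              # backward direction: step t+1 -> t
        h = np.tanh(Wb @ h + Ub @ x_seq[t] + bb)
        h_bwd[t] = h

    # Combined representation seen by the final prediction layer.
    return [np.concatenate([h_fwd[t], h_bwd[t]]) for t in range(T)]

The concatenated forward and backward states give the output layer access to both past and future context at every position.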

Recursive Neural Network

A branch of machine learning and artificial intelligence (AI) known as


"deep learning" aims to replicate how the human brain analyses
information and learns certain concepts. Deep Learning's foundation is
made up of neural networks. These are intended to precisely identify
underlying patterns in a data collection and are roughly modelled after the
human brain. Deep Learning provides the answer to the problem of
predicting the unpredictable.

A subset of deep neural networks called recursive neural networks (RvNNs) is capable of learning structured and detailed data. By repeatedly applying the same set of weights to structured inputs, an RvNN produces a structured prediction. "Recursive" refers to the network being applied to its own output.

Recursive neural networks are capable of handling hierarchical data because of their deep, tree-like structure. In a tree structure, parent nodes are created by joining child nodes. There is a weight matrix for every child-parent bond, and comparable children share the same weights. To allow for recursive operations and the use of the same weights, the number of children for each node in the tree is fixed. RvNNs are employed when it is necessary to parse a whole sentence.

We sum the products of the weight matrices (W_i) and the children's representations (C_i), and apply the transformation f to obtain the parent node's representation:

h = f( Σ_{i=1}^{c} W_i C_i )
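
A minimal recursive (tree-structured) sketch of this parent computation, assuming f = tanh and a fixed number of children per node (the class and field names are hypothetical):

import numpy as np

class Node:
    def __init__(self, children=None, embedding=None):
        self.children = children or []   # internal node: list of child Nodes
        self.embedding = embedding       # leaf node: fixed input vector

def compose(node, W_list, f=np.tanh):
    """Recursively compute h = f(sum_i W_i C_i) bottom-up over the tree.
    The same weight matrices W_list are reused at every parent node."""
    if not node.children:                # leaf: return its input embedding
        return node.embedding
    child_reps = [compose(c, W_list, f) for c in node.children]
    return f(sum(W @ C for W, C in zip(W_list, child_reps)))

# Toy usage: binary tree over 3-dim leaf embeddings.
rng = np.random.default_rng(0)
W_list = [rng.normal(size=(3, 3)) for _ in range(2)]   # one matrix per child slot
leaf = lambda: Node(embedding=rng.normal(size=3))
tree = Node(children=[Node(children=[leaf(), leaf()]), leaf()])
print(compose(tree, W_list))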

1. What are long-term dependencies?

Long-term dependencies are the situations where the output of an RNN


depends on the input that occurred many time steps ago. For instance,
consider the sentence "The cat, which was very hungry, ate the mouse".
To understand the meaning of this sentence, you need to remember that
the cat is the subject of the verb ate, even though they are separated by a
long clause. This is a long-term dependency, and it can affect the
performance of an RNN that tries to generate or analyze such sentences.

2. Why are long-term dependencies hard to learn?

The main reason why long-term dependencies are hard to learn is that
RNNs suffer from the vanishing or exploding gradient problem. This
means that the gradient, which is the signal that tells the network how to
update its weights, becomes either very small or very large as it
propagates through the network. When the gradient vanishes, the network
cannot learn from the distant inputs, and when it explodes, the network
becomes unstable and produces erratic outputs. This problem is caused by
the repeated multiplication of the same matrix, which represents the
connections between the hidden units, at each time step.
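
A small NumPy illustration of this repeated-multiplication effect (the matrix, the scale values, and the step count are illustrative, and the activation derivative is ignored for simplicity):

import numpy as np

rng = np.random.default_rng(0)
H, T = 8, 50                                   # hidden size, number of time steps

for scale in (0.5, 1.5):                       # small vs. large recurrent weights
    W = scale * np.linalg.qr(rng.normal(size=(H, H)))[0]   # scaled orthogonal matrix
    grad = np.ones(H)                          # stand-in for dL/dh at the last step
    for _ in range(T):                         # backprop multiplies by W^T each step
        grad = W.T @ grad
    print(f"scale={scale}: gradient norm after {T} steps = {np.linalg.norm(grad):.3e}")

# scale < 1 -> the norm shrinks toward 0 (vanishing gradient);
# scale > 1 -> the norm blows up (exploding gradient).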

3. How can you solve the vanishing or exploding gradient problem?

One way to solve the vanishing or exploding gradient problem is to use a


different activation function for the hidden units. The activation function
determines how the units respond to the input and output signals. The
most common activation function for RNNs is the hyperbolic tangent
(tanh), which has a range between -1 and 1. However, this function can
cause the gradient to vanish if the input is too large or too small. A better
alternative is the rectified linear unit (ReLU), which has a range between
0 and infinity. This function can prevent the gradient from vanishing, but
it can also cause it to explode if the input is too large.

Another way to solve the vanishing or exploding gradient problem is to


use a different weight initialization method for the network. The weight
initialization method determines how the network assigns random values
to its weights before training. The most common method for RNNs is the
uniform initialization, which draws the weights from a uniform
distribution between -1 and 1. However, this method can cause the
weights to be too large or too small, which can affect the gradient. A
better alternative is orthogonal initialization, which initializes the weight matrix to be orthogonal, thereby preserving the norm of the gradient.
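
A hedged sketch of these two remedies in a simple NumPy RNN cell (names and scaling choices are illustrative, not prescribed by the text):

import numpy as np

def orthogonal_init(shape, rng):
    """Return an orthogonal matrix via QR decomposition, so repeated
    multiplication neither shrinks nor amplifies vector norms."""
    a = rng.normal(size=shape)
    q, _ = np.linalg.qr(a)
    return q

def relu(x):
    return np.maximum(0.0, x)          # gradient is 1 wherever x > 0, so it does not saturate

rng = np.random.default_rng(0)
H, D = 16, 8
W = orthogonal_init((H, H), rng)       # recurrent (hidden-to-hidden) weights
U = rng.normal(scale=1.0 / np.sqrt(D), size=(H, D))
b = np.zeros(H)

def rnn_step(h, x):
    return relu(W @ h + U @ x + b)     # ReLU activation instead of tanh

h = rnn_step(np.zeros(H), np.ones(D))  # toy usage of one recurrent step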

Explicit memory
Explicit memory is declarative memory because we consciously try to
recall a specific event or piece of information. Things we intentionally try
to recall or remember, such as formulas and dates, are all stored in
explicit memory. We utilize recalled information such as this during
everyday activities such as work or when running errands.

Explicit memory can be classed as either episodic or semantic. Episodic


memory is the memory of one's own personal past, while semantic
memories contain hard facts and concepts such as names.

MRI studies show that during recall of explicit short-term memories, the prefrontal cortex, the most recently evolved addition to the mammalian brain, is activated. Interestingly, there appears to be a separation in function between the left and right sides of the prefrontal cortex, with the right side more involved in spatial working memory and the left in verbal working memory.

The hippocampus, neocortex, and amygdala have been implicated during


the formation and storage of explicit long-term memory. The
hippocampus is found within the brain's temporal lobe and forms and
indexes memories about our own lives for later access.

We know this because Henry Molaison had his hippocampus removed during treatment for epilepsy in 1952 and, following the procedure, was unable to form any new memories of things he had done. He was, however, able to learn new skills and motor tasks, which are examples of implicit memory that do not rely on this region of the brain.
Examples of explicit memory include:

• Recalling phone numbers.
• Completing an exam.
• Remembering items on a list.
• Birth dates.
• Important event dates.
• Names.
• Locations.
• Country names.
