REPORT
CHAPTER 1
INTRODUCTION
Here is an example of how a neural network can identify a dog's breed based on its
features.
The image pixels of two different breeds of dogs are fed to the input layer of the
neural network.
The image pixels are then processed in the hidden layers for feature extraction.
The output layer produces the result to identify if it’s a German Shepherd or a
Labrador.
Such a feed-forward network does not need to memorize past outputs.
Several kinds of neural networks can help solve different business problems. Let's look at
one of them in detail: the Recurrent Neural Network (RNN).
RNN works on the principle of saving the output of a particular layer and feeding
this back to the input in order to predict the output of the layer.
Below is how you can convert a Feed-Forward Neural Network into a Recurrent
Neural Network:
An RNN solves these issues. It can handle sequential data, accepting the current input as
well as previously received inputs, and it can memorize previous inputs due to its internal
memory.
RNNs and feed-forward neural networks get their names from the way they
channel information.
Feed-forward neural networks have no memory of the input they receive and are
bad at predicting what’s coming next. Because a feed-forward network only considers the
current input, it has no notion of order in time. It simply can’t remember anything about
what happened in the past except its training.
The two images below illustrate the difference in information flow between an
RNN and a feed-forward neural network.
A usual RNN has a short-term memory. In combination with an LSTM, it also
has a long-term memory (more on that later).
Imagine you have a normal feed-forward neural network and give it the word
"neuron" as an input and it processes the word character by character. By the time it
reaches the character "r," it has already forgotten about "n," "e" and "u," which makes it
almost impossible for this type of neural network to predict which character would come
next.
Simply put: recurrent neural networks add the immediate past to the present.
Therefore, an RNN has two inputs: the present and the recent past. This is
important because the sequence of data contains crucial information about what is coming
next, which is why an RNN can do things other algorithms can't.
A feed-forward neural network assigns, like all other deep learning algorithms, a
weight matrix to its inputs and then produces the output. Note that RNNs apply weights to
the current and also to the previous input. Furthermore, a recurrent neural network will
also tweak the weights for both through gradient descent and backpropagation through
time (BPTT).
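To make this concrete, here is a minimal NumPy sketch, not taken from the report, of a single recurrent step. The layer sizes, the weight names W_xh, W_hh, and W_hy, and the toy random data are illustrative assumptions; training the weights with gradient descent and BPTT is omitted.

import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # weight applied to the current input x_t
W_hh = rng.normal(size=(4, 4))   # weight applied to the previous hidden state h_(t-1)
W_hy = rng.normal(size=(2, 4))   # weight at the output layer
b_h = np.zeros(4)

def rnn_step(x_t, h_prev):
    # Combine the current input with the previous state, as described above.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(4)                          # the "memory" starts out empty
for x_t in rng.normal(size=(5, 3)):      # a toy sequence of 5 timesteps
    h, y = rnn_step(x_t, h)              # the same weights are reused at every step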
The input layer 'x' takes in the input to the neural network, processes it, and
passes it on to the middle layer.
The middle layer 'h' can consist of multiple hidden layers, each with its own
activation functions, weights, and biases. In a plain feed-forward network, the parameters
of these hidden layers are not affected by one another, i.e., the network has no memory. To
give the network memory, you can convert it into a recurrent neural network.
The Recurrent Neural Network will standardize the different activation functions
and weights and biases so that each hidden layer has the same parameters. Then, instead
of creating multiple hidden layers, it will create one and loop over it as many times as
required.
At each time step t, the hidden state and the output are computed as:

Ht = tanh(Whh · Ht-1 + Wxh · Xt)
Yt = Why · Ht

where:
Xt   -> input at time step t
Ht   -> current hidden state
Ht-1 -> previous hidden state
Wxh  -> weight at the input
Whh  -> weight at the recurrent (hidden) connection
Why  -> weight at output layer
Yt   -> output
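As a hedged illustration of this single looped layer, the sketch below uses a Keras SimpleRNN layer, which applies the same weights at every timestep of the input sequence. The input shape of 10 timesteps with 8 features and the unit counts are made-up values, not taken from the report.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 8)),       # 10 timesteps, 8 features per timestep
    tf.keras.layers.SimpleRNN(16),       # one hidden layer, looped over all timesteps
    tf.keras.layers.Dense(1),            # output layer, i.e. Yt = Why * Ht at the last step
])
model.compile(optimizer="adam", loss="mse")
model.summary()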
CHAPTER 2
APPLICATIONS OF RECURRENT NEURAL NETWORKS
2.1 IMAGE CAPTIONING
RNNs are used to caption an image by analyzing the activities present.
2.2 MACHINE TRANSLATION
Given an input in one language, RNNs can be used to translate the input into
different languages as output.
2.3 TYPES OF RECURRENT NEURAL NETWORKS
Based on the number of inputs and outputs, there are four types of RNNs:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
One to One: This type of neural network is known as the Vanilla Neural Network. It is used
for general machine learning problems that have a single input and a single output.
One to Many: This type of neural network has a single input and multiple outputs. Image
captioning is an example of this.
Many to One: This RNN takes a sequence of inputs and generates a single output. Sentiment
analysis is a good example of this kind of network, where a given sentence can be classified
as expressing positive or negative sentiment.
Many to Many: This RNN takes a sequence of inputs and generates a sequence of outputs.
Machine translation is an example of this. A short sketch of the last two variants is given below.
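As a minimal sketch, and only under assumed vocabulary size, sequence length, and unit counts, the Many to One and Many to Many variants might look like this in Keras.

import tensorflow as tf

# Many to One: a sequence of word indices in, one sentiment score out.
many_to_one = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
    tf.keras.layers.SimpleRNN(32),                      # only the final state is returned
    tf.keras.layers.Dense(1, activation="sigmoid"),     # positive vs. negative sentiment
])

# Many to Many: one output per input timestep (return_sequences=True).
many_to_many = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=32),
    tf.keras.layers.SimpleRNN(32, return_sequences=True),
    tf.keras.layers.Dense(50, activation="softmax"),    # e.g., one target-language token per step
])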
CHAPTER 3
ISSUES OF STANDARD RNNS
RNNs suffer from the problem of vanishing gradients. The gradients carry
information used in the RNN, and when the gradient becomes too small, the parameter
updates become insignificant. This makes the learning of long data sequences difficult.
Long training time, poor performance, and bad accuracy are the major issues caused by
gradient problems.
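The toy calculation below, with all numbers assumed purely for illustration, shows why the gradient shrinks: backpropagating through many tanh steps multiplies together many factors whose magnitude is below 1.

import numpy as np

w_hh = 0.5                # a recurrent weight with magnitude below 1
h = 0.8                   # some hidden activation
grad = 1.0
for t in range(50):       # backpropagate through 50 timesteps
    grad *= w_hh * (1 - np.tanh(h) ** 2)   # chain rule: weight times tanh derivative
print(grad)               # a vanishingly small number (around 1e-28 here)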
Now, let's discuss the most popular and efficient way to deal with gradient
problems, i.e., Long Short-Term Memory networks (LSTMs).
Suppose you want to predict the last word in the text: “The clouds are in the ______.”
The most obvious answer to this is the “sky.” We do not need any further context
to predict the last word in the above sentence.
Consider this sentence: “I have been staying in Spain for the last 10 years…I can
speak fluent ______.”
The word you predict will depend on the previous few words in context. Here, you
need the context of Spain to predict the last word in the text, and the most suitable answer
to this sentence is “Spanish.” The gap between the relevant information and the point
where it's needed may have become very large. LSTMs help you solve this problem.
In a typical RNN, one input is fed into the network at a time, and a single output is
obtained. In backpropagation through time, however, the current as well as the previous
inputs are used as input. This is called a timestep, and one timestep will consist of many
time series data points entering the RNN simultaneously.
Once the neural network has trained on a timestep and given you an output, that
output is used to calculate and accumulate the errors. After this, the network is rolled
back up, and the weights are recalculated and updated keeping the errors in mind.
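A hedged sketch of this unrolling with TensorFlow's GradientTape is shown below: the loss is accumulated at every timestep of the loop, and a single gradient call then backpropagates through all of them, which is BPTT. The shapes and random toy data are assumptions for illustration only.

import tensorflow as tf

cell = tf.keras.layers.SimpleRNNCell(8)
x_seq = tf.random.normal((1, 5, 3))       # batch of 1, 5 timesteps, 3 features
targets = tf.random.normal((1, 5, 8))
state = [tf.zeros((1, 8))]

with tf.GradientTape() as tape:
    loss = 0.0
    for t in range(5):                    # unroll the network over the timesteps
        output, state = cell(x_seq[:, t, :], state)
        loss += tf.reduce_mean((output - targets[:, t, :]) ** 2)

grads = tape.gradient(loss, cell.trainable_variables)   # errors flow back through every timestep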
All RNNs take the form of a chain of repeating modules of a neural network. In
standard RNNs, this repeating module has a very simple structure, such as a single
tanh layer.
LSTMs also have a chain-like structure, but the repeating module has a slightly different
structure. Instead of a single neural network layer, there are four layers interacting in a
special way.
Step 1: Decide How Much Past Data It Should Remember
Let the output of h(t-1) be “Alice is good in Physics. John, on the other hand, is good at
Chemistry.”
Let the current input at x(t) be “John plays football well. He told me yesterday over the
phone that he had served as the captain of his college football team.”
The forget gate realizes there might be a change in context after encountering the first full
stop. It compares with the current input sentence at x(t). The next sentence talks about
John, so the information on Alice is deleted. The position of the subject is vacated and
assigned to John.
Step 2: Decide How Much This Unit Adds to the Current State
In the second layer, there are two parts. One is the sigmoid function, and the other is the
tanh function. The sigmoid function decides which values to let through (0 or 1). The tanh
function gives weightage to the values that are passed, deciding their level of
importance (-1 to 1).
With the current input at x(t), the input gate analyzes the important information —
John plays football, and the fact that he was the captain of his college team is important.
“He told me yesterday over the phone” is less important; hence it's forgotten. This process
of adding some new information can be done via the input gate.
Step 3: Decide What Part of the Current Cell State Makes It to the Output
The third step is to decide what the output will be. First, we run a sigmoid layer, which
decides what parts of the cell state make it to the output. Then, we put the cell state
through tanh to push the values to be between -1 and 1 and multiply it by the output of the
sigmoid gate.
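The three steps can be summarized in a minimal NumPy sketch of one LSTM step; the weight names, dimensions, and toy data below are illustrative assumptions rather than a definitive implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])   # Step 1: forget gate decides what to drop from the cell state
    i = sigmoid(W["i"] @ z + b["i"])   # Step 2: input gate decides which values to let through (0 to 1)
    g = np.tanh(W["g"] @ z + b["g"])   #         candidate values weighted between -1 and 1
    c_t = f * c_prev + i * g           # updated cell state
    o = sigmoid(W["o"] @ z + b["o"])   # Step 3: output gate decides what part of the state to expose
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Toy dimensions: hidden size 4, input size 3.
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(4, 7)) for k in "figo"}
b = {k: np.zeros(4) for k in "figo"}
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, b)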
Let’s consider this example to predict the next word in the sentence: “John played
tremendously well against the opponent and won for his team. For his contributions,
brave ____ was awarded player of the match.”
There could be many choices for the empty space. The current input “brave” is an adjective,
and adjectives describe a noun. So, “John” could be the best output after “brave.”
CHAPTER 4
ADVANTAGES & DISADVANTAGES
The principal advantage of RNNs over ANNs is that an RNN can model a sequence of
data (i.e., a time series) so that each sample can be assumed to be dependent
on the previous ones.
Recurrent neural networks are also used with convolutional layers to extend the
effective pixel neighborhood.
CHAPTER 5
CONCLUSION
Recurrent Neural Networks stand at the foundation of the modern-day marvels of
artificial intelligence. They provide solid foundations for artificial intelligence
applications to be more efficient, more flexible in their accessibility, and, most
importantly, more convenient to use.
Moreover, the results of recurrent neural networks show the real value of
data in this day and age. They show how much can be
extracted from records and what this information can create in return. And that is
exceptionally inspiring.