Deep Learning

History of Deep Learning [DL]: The chain rule that underlies the back-propagation algorithm was invented in the seventeenth century (Leibniz, 1676; L'Hôpital, 1696). Beginning in the 1940s, function approximation techniques were used to motivate machine learning models such as the perceptron. The earliest models were based on linear models. Critics, including Marvin Minsky, pointed out several flaws of the linear model family, such as its inability to learn the XOR function, which led to a backlash against the entire neural network approach.
Efficient applications of the chain rule based on dynamic programming began to appear in the 1960s and 1970s.
Werbos (1981) proposed applying chain rule techniques for training artificial neural networks.
The idea was finally developed in practice after being independently rediscovered in different ways
(LeCun, 1985; Parker, 1985; Rumelhart et al., 1986a).
Following the success of back-propagation, neural network research gained popularity and reached a
peak in the early 1990s.
Afterwards, other machine learning techniques became more popular until the modern deep learning renaissance that began in 2006. The core ideas behind modern feedforward networks have not changed substantially since the 1980s. The same back-propagation algorithm and the same approaches to gradient descent are still in use. Most of the improvement in neural network performance from 1986 to 2015 can be attributed to two factors. First, larger datasets have reduced the degree to which statistical generalization is a challenge for neural networks. Second, neural networks have become much larger, because of more powerful computers and better software infrastructure. A small number of algorithmic
changes have also improved the performance of neural networks noticeably. One of these algorithmic
changes was the replacement of mean squared error with the cross-entropy family of loss functions.
Mean squared error was popular in the 1980s and 1990s but was gradually replaced by cross-entropy
losses and the principle of maximum likelihood as ideas spread between the statistics community and
the machine learning community. The other major algorithmic change that has greatly improved the
performance of feedforward networks was the replacement of sigmoid hidden units with piecewise
linear hidden units, such as rectified linear units. Rectification using the max{0, z} function was
introduced in early neural network models and dates back at least as far as the Cognitron and Neo-
Cognitron (Fukushima, 1975, 1980). For small datasets, Jarrett et al. (2009) observed that using
rectifying nonlinearities is even more important than learning the weights of the hidden layers. Random
weights are sufficient to propagate useful information through a rectified linear network, enabling the
classifier layer at the top to learn how to map different feature vectors to class identities. When more
data is available, learning begins to extract enough useful knowledge to exceed the performance of
randomly chosen parameters. Glorot et al. (2011a) showed that learning is far easier in deep rectified
linear networks than in deep networks that have curvature or two-sided saturation in their activation
functions. When the modern resurgence of deep learning began in 2006, feedforward networks
continued to have a bad reputation. From about 2006 to 2012, it was widely believed that feedforward
networks would not perform well unless they were assisted by other models, such as probabilistic
models. It is now known that, with the right resources and engineering practices, feedforward networks perform very well. Today, gradient-based learning in feedforward networks is used as a tool to
develop probabilistic models. Feedforward networks continue to have unfulfilled potential. In the
future, we expect they will be applied to many more tasks, and that advances in optimization algorithms
and model design will improve their performance even further.

Deep learning - Deep learning is a collection of machine learning techniques for learning feature hierarchies, based on artificial neural networks.

Example of Deep Learning

Consider a face-recognition example. We provide the raw image data to the input layer. The input layer then detects patterns of local contrast, differentiating regions on the basis of colors, luminosity, and so on. The first hidden layer combines these patterns into face features, fixating on eyes, nose, lips, and so forth. The next hidden layer matches those face features against face templates, so it determines the correct face, which is then passed to the output layer. Likewise, more hidden layers can be added to solve more complex problems, for example, identifying a particular kind of face by finer attributes such as complexion. So, as the number of hidden layers increases, we are able to solve more complex problems.

Architectures

o Deep Neural Networks
A deep neural network is a neural network with a certain level of complexity, meaning that several hidden layers are placed between the input and output layers. Such networks are highly proficient at modelling and processing non-linear associations.

o Deep Belief Networks
A deep belief network is a class of deep neural network that comprises multiple layers of belief networks.
Steps to perform DBN training:

1. With the help of the Contrastive Divergence algorithm, a layer of features is learned from the visible units.

2. Next, the formerly trained features are treated as visible units, and a further layer of features is learned from them.

3. Lastly, when the learning of the final hidden layer is accomplished, the whole DBN is trained.
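A minimal sketch of this greedy, layer-wise procedure, assuming scikit-learn's BernoulliRBM (which trains with Contrastive Divergence) is available; the random data, layer widths, and hyperparameters below are illustrative only:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.RandomState(0)
X = rng.rand(500, 64)           # illustrative data scaled to [0, 1]

layer_sizes = [32, 16]          # hypothetical hidden-layer widths
rbms, layer_input = [], X
for n_hidden in layer_sizes:
    # Steps 1-2: learn a layer of features from the current "visible" units
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, random_state=0)
    rbm.fit(layer_input)
    rbms.append(rbm)
    # the freshly learned features become the visible units of the next layer
    layer_input = rbm.transform(layer_input)

# After the final layer is trained, the stack of RBMs forms the pre-trained DBN.
```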

o Recurrent Neural Networks
It permits sequential as well as parallel computation, and it is similar in spirit to the human brain (a large feedback network of connected neurons). Because such networks can remember the important things about the inputs they have received, they are more precise.

Types of Deep Learning Networks

1. Feed Forward Neural Network

A feed-forward neural network is an Artificial Neural Network in which the nodes do not form a cycle. In this kind of neural network, all the perceptrons are organized in layers, such that the input layer takes the input and the output layer generates the output. Because the hidden layers do not link with the outside world, they are called hidden layers. Each perceptron in one layer is connected to every node in the subsequent layer, so all of the nodes are fully connected. There are no connections between nodes in the same layer, and there are no back-loops in the feed-forward network. To minimize the prediction error, the backpropagation algorithm can be used to update the weight values.

Applications:

o Data Compression

o Pattern Recognition

o Computer Vision
o Sonar Target Recognition

o Speech Recognition

o Handwritten Characters Recognition

2. Recurrent Neural Network

Recurrent neural networks are yet another variation of feed-forward networks. Here each neuron in the hidden layers receives an input with a specific time delay. A recurrent neural network mainly accesses the preceding information of the current iteration. For example, to guess the succeeding word in a sentence, one must know the words that were previously used. It not only processes the inputs but also shares weights across time steps, so the size of the model does not increase with the size of the input. However, recurrent neural networks have slow computational speed, do not consider any future input for the current state, and have trouble remembering information from far in the past.
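A minimal numpy sketch of the recurrence that makes this weight sharing across time possible; the matrices, sizes, and random data here are illustrative assumptions, not a particular library's API:

```python
import numpy as np

rng = np.random.RandomState(0)
input_size, hidden_size, seq_len = 8, 16, 5

# the SAME weights are reused at every time step
W_xh = rng.randn(hidden_size, input_size) * 0.1
W_hh = rng.randn(hidden_size, hidden_size) * 0.1
b_h  = np.zeros(hidden_size)

xs = [rng.randn(input_size) for _ in range(seq_len)]   # a toy input sequence
h  = np.zeros(hidden_size)                             # initial hidden state

for x_t in xs:
    # the hidden state carries information from all previous steps
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h.shape)  # (16,): a summary of the whole sequence seen so far
```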

Applications:

o Machine Translation

o Robot Control

o Time Series Prediction

o Speech Recognition

o Speech Synthesis

o Time Series Anomaly Detection

o Rhythm Learning

o Music Composition

3. Convolutional Neural Network

Convolutional Neural Networks are a special kind of neural network mainly used for image classification, clustering of images, and object recognition. They enable the construction of hierarchical image representations. To achieve the best accuracy on such tasks, deep convolutional neural networks are generally preferred over other neural network architectures.
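A minimal sketch of such a network, assuming TensorFlow/Keras is available; the 28x28 grayscale input shape, layer sizes, 10-class head, and training configuration are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# a small convolutional stack followed by a classifier head
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),               # e.g. grayscale images
    layers.Conv2D(16, (3, 3), activation="relu"),  # learn local image features
    layers.MaxPooling2D((2, 2)),                   # downsample feature maps
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),        # e.g. 10 image classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```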

Applications:

o Identify Faces, Street Signs, Tumors.

o Image Recognition.

o Video Analysis.
o NLP.

o Anomaly Detection.

o Drug Discovery.

o Checkers Game.

o Time Series Forecasting.

4. Restricted Boltzmann Machine

RBMs are yet another variant of Boltzmann Machines. Here the neurons in the input layer and the hidden layer have symmetric connections between them, but there are no connections within a layer. In contrast, unrestricted Boltzmann machines do have internal connections inside the hidden layer. These restrictions in RBMs help the model to train efficiently.

Applications:

o Filtering.

o Feature Learning.

o Classification.

o Risk Detection.

o Business and Economic analysis.

5. Autoencoders

An autoencoder neural network is another kind of unsupervised machine learning algorithm. Here the number of hidden cells is smaller than the number of input cells, but the number of input cells equals the number of output cells. An autoencoder network is trained to reproduce its input at the output, which forces it to find common patterns and generalize the data. Autoencoders are mainly used to obtain a smaller representation of the input, and they help reconstruct the original data from the compressed data. This algorithm is comparatively simple, as it only requires the output to be identical to the input.

o Encoder: Converts the input data into a lower-dimensional representation.

o Decoder: Reconstructs the input from the compressed representation.
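A minimal sketch of this encoder/decoder idea, assuming TensorFlow/Keras; the 784-dimensional input, the 32-dimensional bottleneck, and the commented training call are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

input_dim, code_dim = 784, 32   # e.g. flattened 28x28 images, 32-d code

# Encoder: compress the input to a lower-dimensional code
# Decoder: reconstruct the input from that code
autoencoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(code_dim, activation="relu"),     # bottleneck (the "code")
    layers.Dense(128, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"), # output matches the input size
])

# Trained to reproduce its own input, e.g.:
# autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=10)
```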

Applications:

o Classification.

o Clustering.

o Feature Compression.
Deep learning applications
o Self-Driving Cars
In self-driving cars, the system captures images of its surroundings and processes a huge amount of data to decide which action to take: turn left, turn right, or stop. Making these decisions reliably helps reduce the accidents that happen every year.

o Voice Controlled Assistance
When we talk about voice-controlled assistance, Siri is the first thing that comes to mind. You can tell Siri whatever you want it to do, and it will search for it and display the result for you.

o Automatic Image Caption Generation
Whatever image you upload, the algorithm generates a caption for it accordingly. For example, for a picture of a blue-colored eye, it will display the image with a matching caption at the bottom.

o Automatic Machine Translation
With automatic machine translation, deep learning lets us convert text from one language into another.

Limitations

o It only learns from what it observes.

o It can suffer from bias issues.

Advantages

o It lessens the need for feature engineering.

o It eliminates unnecessary costs.

o It can identify defects that are difficult to detect.

o It delivers best-in-class performance on many problems.

Disadvantages

o It requires an ample amount of data.

o It is quite expensive to train.

o It lacks a strong theoretical foundation.

McCulloch-Pitts Model of Neuron


o The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs: excitatory and inhibitory. The excitatory inputs have weights of positive magnitude and the inhibitory inputs have weights of negative magnitude. The inputs of the McCulloch-Pitts neuron can be either 0 or 1. It has a threshold function as its activation function, so the output signal yout is 1 if the input sum ysum is greater than or equal to a given threshold value, and 0 otherwise.


Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose, the connection weights and the threshold value of the activation function need to be correctly chosen. For a better understanding, let us consider an example:
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need
to decide when John will carry the umbrella. The situations are as follows:
 First scenario: It is not raining, nor it is sunny
 Second scenario: It is not raining, but it is sunny
 Third scenario: It is raining, and it is not sunny
 Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, I can consider the input
signals as follows:
 X1: Is it raining?
 X2 : Is it sunny?
So, the value of both inputs can be either 0 or 1. We can set the weights of both X1 and X2 to 1 and use a threshold value of 1, so that yout = 1 when ysum = x1 + x2 ≥ 1. The truth table for this case will be:

Situation x1 x2 ysum yout

1 0 0 0 0

2 0 1 1 1

3 1 0 1 1

4 1 1 2 1

The truth table built with respect to the problem is depicted above. From the
truth table, I can conclude that in the situations where the value of yout is 1, John needs to
carry an umbrella. Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
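A tiny Python check of this example; the function name and the scenario list are purely illustrative:

```python
def mcculloch_pitts(x1, x2, w1=1, w2=1, threshold=1):
    """McCulloch-Pitts neuron: output 1 if the weighted sum reaches the threshold."""
    ysum = w1 * x1 + w2 * x2
    return 1 if ysum >= threshold else 0

# the four scenarios: (is it raining?, is it sunny?)
for i, (x1, x2) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)], start=1):
    print(f"Scenario {i}: carry umbrella = {mcculloch_pitts(x1, x2)}")
# Scenarios 2, 3 and 4 give 1, matching the truth table above.
```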

Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers. To create a neural network we combine neurons so that the outputs of some neurons are the inputs of other neurons. A multi-layer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and any number of hidden layers, where each hidden layer can have any number of nodes.
Consider, for example, an MLP with three inputs and thus three input nodes, a hidden layer with three nodes, and an output layer with two nodes that gives two outputs. The nodes in the input layer take the input and forward it for further processing: each input node forwards its output to each of the three nodes in the hidden layer, and in the same way the hidden layer processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula σ(x) = 1/(1 + exp(-x)).
Now that we are done with the theory of the multi-layer perceptron, let's implement it in Python using the TensorFlow library.
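A minimal sketch of such an MLP in TensorFlow/Keras, matching the three-input, three-hidden-node, two-output example above; the sigmoid activations follow the text, while the optimizer, loss, and toy data are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# 3 inputs -> 3 hidden nodes -> 2 outputs, all with sigmoid activations
model = models.Sequential([
    layers.Input(shape=(3,)),
    layers.Dense(3, activation="sigmoid"),   # hidden layer
    layers.Dense(2, activation="sigmoid"),   # output layer
])

model.compile(optimizer="adam", loss="mse")

# toy data just to show a training call
X = np.random.rand(100, 3).astype("float32")
y = np.random.rand(100, 2).astype("float32")
model.fit(X, y, epochs=5, verbose=0)

print(model.predict(X[:2]))   # two predictions, each with two outputs
```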

Sigmoid Neuron
Sigmoid neurons are a fundamental building block of deep neural networks. Sigmoid neurons are similar to perceptrons and the MP neuron model, but the significant difference is that sigmoid neurons have a smoother decision boundary than perceptrons and the MP neuron model. In a sigmoid neuron, every input xi has a weight wi associated with it. The weights depict the importance of the input in the decision-making process. The output of the sigmoid ranges between zero and one, which we can interpret as a probability, rather than being exactly zero or one as in the perceptron model. One of the most commonly used sigmoid functions is the logistic function, which has a characteristic "S"-shaped curve. The sigmoid function is commonly used as an activation function in artificial neural networks. In feedforward neural networks, the sigmoid function is applied to each neuron's output, allowing the network to introduce non-linearity into the model. This non-linearity is important because it allows the neural network to learn more complex decision boundaries, which can improve its performance on specific tasks.

Advantages:

1. Produces output values between 0 and 1, which can be helpful for binary classification and logistic regression problems.

2. It is differentiable, meaning its derivative can be calculated, which makes it easy to optimize the network by adjusting the weights and biases of the neurons.

Disadvantages:

1. It can produce output values very close to 0 or 1, which can cause problems for the optimization algorithm.

2. The gradient of the sigmoid function becomes very small near output values of 0 or 1, which makes it difficult for the optimization algorithm to adjust the weights and biases of the neurons.
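A small numpy sketch illustrating the second disadvantage: the sigmoid's gradient σ'(x) = σ(x)(1 − σ(x)) shrinks towards zero as the output saturates near 0 or 1; the sample input values are chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative of the sigmoid

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  output={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.6f}")
# As the output saturates near 1, the gradient collapses towards 0,
# which is exactly what slows down weight updates during optimization.
```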

Here is a comparison of the sigmoid, ReLU, and tanh activation functions in terms of performance and optimization:

Activation Function | Performance | Optimization
Sigmoid | Good for binary classification and logistic regression problems. | Can have issues with optimization near output values of 0 or 1.
ReLU | Computationally efficient; however, it can suffer from the "dying ReLU" problem. | Can have issues with optimization with neurons that output 0.
tanh | Computationally efficient; its range is centered around zero, which can be helpful for specific problems. | Can have issues with optimization near output values of -1 or 1.

Feed forward neural network


Feed forward neural networks are artificial neural networks in which nodes do not
form loops. This type of neural network is also known as a multi-layer neural network
as all information is only passed forward.

During data flow, input nodes receive data, which travels through the hidden layers and exits through the output nodes. No links exist in the network that could be used to send information back from the output nodes.

A feed forward neural network approximates functions in the following way:


 An algorithm calculates a classifier described by a target function y = f*(x).
 Input x is therefore assigned to category y.
 According to the feed forward model, y = f(x; θ); the parameters θ are learned to give the closest approximation of the function f*.

Feed forward neural networks serve as the basis for object detection in photos, as
shown in the Google Photos app.

What is the working principle of a feed forward neural network?

When the feed forward neural network gets simplified, it can appear as a single-layer perceptron.

This model multiplies inputs with weights as they enter the layer. Afterward, the weighted input values are added together to get the sum. If the sum of the values is above a certain threshold, usually set at zero, the output value is 1; if it falls below the threshold, the output is -1.

As a feed forward neural network model, the single-layer perceptron is often used for classification, and machine learning can be integrated into it. Through training, the network adjusts its weights using the delta rule, which compares its outputs with the intended values.
This training and learning amounts to gradient descent. Multi-layered perceptrons update their weights in a similar way, but the process is known as back-propagation: the network's hidden layers are adjusted according to the error in the output values produced by the final layer.
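A minimal numpy sketch of such error-driven training for a single-layer perceptron with a ±1 threshold output (a simple variant of the delta rule); the AND-gate data and the learning rate are illustrative assumptions:

```python
import numpy as np

# toy dataset: the AND function, with targets in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)

w = np.zeros(2)
b = 0.0
lr = 0.1                                   # learning rate

def predict(x):
    # threshold at zero: output +1 above the threshold, -1 otherwise
    return 1.0 if x @ w + b > 0 else -1.0

for epoch in range(20):
    for x_i, t_i in zip(X, t):
        y_i = predict(x_i)
        # error-driven update: move the weights towards the intended value
        w += lr * (t_i - y_i) * x_i
        b += lr * (t_i - y_i)

print([predict(x_i) for x_i in X])   # expected: [-1.0, -1.0, -1.0, 1.0]
```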

Layers of feed forward neural network

 Input layer:
The neurons of this layer receive the input and pass it on to the other layers of the network. The number of features or attributes in the dataset must match the number of neurons in the input layer.

 Output layer:

Depending on the type of model being built, this layer represents the predicted output.

 Hidden layer:

The hidden layers separate the input and output layers. Depending on the type of model, there may be several hidden layers. The hidden layers contain several neurons that transform the input before passing it on to the next layer. The weights of this network are constantly updated to make prediction easier.

 Neuron weights:

Neurons are connected by weights, which measure their strength or magnitude. Input weights can be compared to the coefficients of a linear regression. A weight normally takes a value between 0 and 1.

 Neurons:

Feed forward networks use artificial neurons, which are adapted from biological neurons; a neural network consists of such artificial neurons. Neurons function in two ways: first, they compute the weighted sum of their inputs, and second, they apply an activation function to that sum. Activation functions can be either linear or nonlinear. Neurons have weights on their inputs, and during the learning phase the network learns these weights.

 Activation Function:

The activation function is where a neuron makes its decision: according to the activation function, the neuron makes a linear or nonlinear decision. Because the signal passes through many layers, the activation function also prevents the cascading effect from increasing neuron outputs without bound.

An activation function can be classified into three major categories: sigmoid, tanh, and Rectified Linear Unit (ReLU).

 Sigmoid:

Input values get mapped to output values between 0 and 1.


 Tanh:

Input values get mapped to output values between -1 and 1.

 Rectified linear Unit:

Only positive values are allowed to flow through this function. Negative values get
mapped to 0.

Functions in a feed forward neural network

Cost function

In a feed forward neural network, the cost function plays an important role. Minor adjustments to weights and biases have little effect on the categorized data points, so a smooth cost function is used to determine how to adjust the weights and biases to improve performance.

Following is a definition of the mean square error cost function:

C(w, b) = (1 / 2n) Σ_x ‖y(x) − a‖²

Where,
w = the weights gathered in the network
b = the biases
n = the number of training inputs
a = the vector of outputs from the network
x = an input
y(x) = the desired output for input x
‖v‖ = the usual length (norm) of vector v

Loss function

The loss function of a neural network is used to determine whether an adjustment needs to be made in the learning process. The number of neurons in the output layer equals the number of classes, and the loss shows the difference between the predicted and actual probability distributions. Following is the cross-entropy loss for binary classification:

L = −[ y log(p) + (1 − y) log(1 − p) ]

For multiclass classification, the cross-entropy loss generalizes to:

L = − Σ_c y_c log(p_c)

where the sum runs over the classes c, y_c is 1 for the true class and 0 otherwise, and p_c is the predicted probability of class c.
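A small numpy sketch computing these two costs on toy values; the numbers are purely illustrative:

```python
import numpy as np

# mean squared error between desired outputs y and network outputs a
y = np.array([1.0, 0.0, 1.0, 1.0])
a = np.array([0.9, 0.2, 0.8, 0.6])
mse = np.mean((y - a) ** 2) / 2          # the 1/2 factor matches the cost above
print("MSE cost:", mse)

# binary cross-entropy between targets y and predicted probabilities p
p = np.clip(a, 1e-12, 1 - 1e-12)         # avoid log(0)
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print("binary cross-entropy:", bce)
```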

Gradient learning algorithm

In the gradient descent algorithm, the next point is calculated by scaling the gradient at the current position by a learning rate and then subtracting the resulting value from the current position. The value is subtracted because we want to decrease the function (to increase it, we would add). This procedure can be written as:

p_{n+1} = p_n − η ∇f(p_n)

The gradient is scaled by the parameter η (the learning rate), which also determines the step size. Performance is significantly affected by the learning rate in machine learning.
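A tiny numpy illustration of this update rule on the simple quadratic f(p) = p²; the starting point, learning rate, and number of steps are illustrative choices:

```python
def f(p):
    return p ** 2            # a simple convex function to minimize

def grad_f(p):
    return 2 * p             # its gradient

p = 4.0                      # starting position
eta = 0.1                    # learning rate (step size)

for step in range(25):
    p = p - eta * grad_f(p)  # p_{n+1} = p_n - eta * grad f(p_n)

print(p, f(p))               # p is now close to the minimum at 0
```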

Output units

In the output layer, output units are those units that provide the desired output or prediction, thereby fulfilling the task that the neural network needs to complete. There is a close relationship between the choice of output units and the cost function. Any unit that can serve as a hidden unit can also serve as an output unit in a neural network.

Advantages of feed forward Neural Networks


 Machine learning can be boosted with feed forward neural networks'
simplified architecture.
 The multiple networks within feed forward networks operate independently,
with a moderated intermediary.
 Complex tasks need several neurons in the network.
 Neural networks can handle and process nonlinear data easily
compared to perceptrons and sigmoid neurons, which are otherwise
complex.
 A neural network deals with the complicated problem of decision
boundaries.
 Depending on the data, the neural network architecture can vary.
For example, convolutional neural networks (CNNs) perform
exceptionally well in image processing, whereas recurrent neural
networks (RNNs) perform well in text and voice processing.
 Neural networks need graphics processing units (GPUs) to handle
large datasets, which provide massive computational and hardware
performance. GPU-backed platforms such as Kaggle Notebooks and
Google Colab Notebooks are widely used.


Backpropagation Process in Deep Neural
Network
Backpropagation is one of the important concepts of a neural network.
Our task is to classify our data as well as possible. For this, we have to
update the weights and biases, but how can we do that in a deep neural
network? In the linear regression model, we use gradient descent to
optimize the parameters. Similarly, here we also use the gradient descent
algorithm via backpropagation. For a single training example, the
backpropagation algorithm calculates the gradient of the error function.
Backpropagation can be written as a function of the neural network.
Backpropagation algorithms are a set of methods used to efficiently train
artificial neural networks following a gradient descent approach that
exploits the chain rule. The main features of backpropagation are that it is
an iterative, recursive and efficient method for calculating the updated
weights, improving the network until it is able to perform the task for
which it is being trained.
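A minimal numpy sketch of backpropagation for a one-hidden-layer network with sigmoid activations and a squared-error loss; the layer sizes, toy data, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.RandomState(0)

# toy data: 4 examples, 3 features, 1 target each
X = rng.rand(4, 3)
y = rng.rand(4, 1)

W1, b1 = rng.randn(3, 5) * 0.1, np.zeros((1, 5))   # input -> hidden
W2, b2 = rng.randn(5, 1) * 0.1, np.zeros((1, 1))   # hidden -> output
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(1000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: apply the chain rule layer by layer
    d_out = (out - y) * out * (1 - out)      # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # gradient propagated to the hidden layer

    # gradient-descent updates of the weights and biases
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(float(np.mean((out - y) ** 2)))        # the squared error shrinks with training
```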

https://www.javatpoint.com/pytorch-backpropagation-process-in-deep-neural-network (check the numerical example there)
