Deep Learning
The chain rule that underlies the back-propagation algorithm was invented in the seventeenth century (Leibniz, 1676; L’Hôpital, 1696). Beginning in the 1940s, function approximation techniques were used to motivate machine learning models such as the perceptron. However, the earliest of these models were based on linear functions. Critics including Marvin Minsky pointed out several flaws of the linear model family, such as its inability to learn the XOR function, which led to a backlash against the entire neural network approach.
Efficient applications of the chain rule based on dynamic programming began to appear in the 1960s and 1970s.
Werbos (1981) proposed applying chain rule techniques for training artificial neural networks.
The idea was finally developed in practice after being independently rediscovered in different ways
(LeCun, 1985; Parker, 1985; Rumelhart et al., 1986a).
Following the success of back-propagation, neural network research gained popularity and reached a
peak in the early 1990s.
Afterwards, other machine learning techniques became more popular until the modern deep learning
renaissance that began in 2006. The core ideas behind modern feedforward networks have not changed
substantially since the 1980s. The same back-propagation algorithm and the same approaches to
gradient descent are still in use. Most of the improvement in neural network performance from 1986 to
2015 can be attributed to two factors. First, larger datasets have reduced the degree to which statistical
generalization is a challenge for neural networks. Second, neural networks have become much larger,
because of more powerful computers and better software infrastructure. A small number of algorithmic
changes have also improved the performance of neural networks noticeably. One of these algorithmic
changes was the replacement of mean squared error with the cross-entropy family of loss functions.
Mean squared error was popular in the 1980s and 1990s but was gradually replaced by cross-entropy
losses and the principle of maximum likelihood as ideas spread between the statistics community and
the machine learning community. The other major algorithmic change that has greatly improved the
performance of feedforward networks was the replacement of sigmoid hidden units with piecewise
linear hidden units, such as rectified linear units. Rectification using the max{0, z} function was
introduced in early neural network models and dates back at least as far as the Cognitron and Neo-
Cognitron (Fukushima, 1975, 1980). For small datasets, Jarrett et al. (2009) observed that using
rectifying nonlinearities is even more important than learning the weights of the hidden layers. Random
weights are sufficient to propagate useful information through a rectified linear network, enabling the
classifier layer at the top to learn how to map different feature vectors to class identities. When more
data is available, learning begins to extract enough useful knowledge to exceed the performance of
randomly chosen parameters. Glorot et al. (2011a) showed that learning is far easier in deep rectified
linear networks than in deep networks that have curvature or two-sided saturation in their activation
functions. When the modern resurgence of deep learning began in 2006, feedforward networks
continued to have a bad reputation. From about 2006 to 2012, it was widely believed that feedforward
networks would not perform well unless they were assisted by other models, such as probabilistic
models. It is now known that with the right resources and engineering practices, feedforward
networks perform very well. Today, gradient-based learning in feedforward networks is used as a tool to
develop probabilistic models. Feedforward networks continue to have unfulfilled potential. In the
future, we expect they will be applied to many more tasks, and that advances in optimization algorithms
and model design will improve their performance even further.
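As a rough illustration of the two algorithmic changes mentioned above, the following NumPy sketch (with arbitrarily chosen example values, not from any source above) contrasts the saturating gradient of the sigmoid with the constant gradient of the rectified linear function, and evaluates a binary cross-entropy loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

# Sigmoid saturates: its derivative s*(1-s) shrinks toward 0 for large |z|,
# which slows gradient-based learning in deep networks.
s = sigmoid(z)
print("sigmoid grad:", s * (1 - s))

# ReLU keeps a constant gradient of 1 for all positive inputs.
print("relu grad:   ", (z > 0).astype(float))

# Cross-entropy loss for a predicted probability p and true label y in {0, 1}.
def cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print("cross-entropy:", cross_entropy(1.0, 0.9))
```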
Deep learning - Deep learning is a collection of machine learning techniques for learning feature hierarchies, based on artificial neural networks.
Consider a face-recognition example: the raw image pixels are provided to the input layer. The input
layer picks up patterns of local contrast, distinguishing regions on the basis of colour, luminosity,
and so on. The first hidden layer then builds on this to detect facial features such as the eyes, nose,
and lips. The second hidden layer combines those facial features and matches them against a face
template, effectively determining the correct face, and the result is passed to the output layer.
Likewise, more hidden layers can be added to solve more complex problems, for example,
identifying a particular kind of face with a darker or lighter complexion. As the number of hidden
layers increases, the network can solve increasingly complex problems.
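A minimal Keras sketch of this idea of stacked hidden layers is shown below; the input size, layer widths, and number of output classes are illustrative assumptions rather than values from the example:

```python
import tensorflow as tf

# Minimal sketch (illustrative sizes): stacking hidden layers so each layer
# can build on the features extracted by the previous one.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(64 * 64,)),  # low-level patterns (edges, contrast)
    tf.keras.layers.Dense(128, activation="relu"),                          # mid-level parts (e.g., facial features)
    tf.keras.layers.Dense(64, activation="relu"),                           # higher-level combinations
    tf.keras.layers.Dense(10, activation="softmax"),                        # output classes (assumed 10)
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```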
Architectures
o Deep Neural Networks
A deep neural network is a neural network with a certain level of complexity, meaning that
several hidden layers sit between the input and output layers. Such networks are highly
proficient at modelling and processing non-linear relationships.
o Deep Belief Networks
A deep belief network is a class of deep neural network composed of multiple layers of belief
networks (typically built by stacking restricted Boltzmann machines).
Steps to perform DBN:
1. First, a layer of features is learned from the visible input units.
2. Next, the formerly trained features are treated as visible units, and a new layer of features is
learned from them.
3. Finally, this is repeated until every layer of the network has been trained.
o Recurrent Neural Networks
Recurrent neural networks permit parallel as well as sequential computation and, in this respect,
resemble the human brain, which is a large feedback network of connected neurons. Because they
can retain the important information from the inputs they have received, they can make more
precise predictions.
A feed-forward neural network is an artificial neural network in which the connections between
nodes do not form a cycle. In this kind of network, the perceptrons are organized into layers: the
input layer takes the input, and the output layer generates the output. Because the hidden layers do
not link to the outside world, they are called hidden layers. Each perceptron in one layer is
connected to every node in the subsequent layer, so all of the nodes are fully connected; there are
no connections between nodes within the same layer. There are no back-loops in a feed-forward
network. To minimize the prediction error, the backpropagation algorithm can be used to update
the weight values.
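The following NumPy sketch (with arbitrarily chosen layer sizes and random weights) illustrates how information flows strictly forward through fully connected layers, with no connections within a layer and no back-loops:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes: 4 inputs -> 5 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)

def forward(x):
    # Information flows strictly forward: input -> hidden -> output, no cycles.
    h = sigmoid(W1 @ x + b1)   # hidden layer, fully connected to the inputs
    y = sigmoid(W2 @ h + b2)   # output layer, fully connected to the hidden layer
    return y

print(forward(np.array([0.2, 0.4, 0.1, 0.9])))
```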
Applications:
o Data Compression
o Pattern Recognition
o Computer Vision
o Sonar Target Recognition
o Speech Recognition
Recurrent neural networks are yet another variation on feed-forward networks. Here, each neuron
in the hidden layers receives an input with a specific time delay, so the network has access to
information from preceding iterations. For example, to guess the next word in a sentence, one must
know which words were used before it. A recurrent network not only processes the inputs but also
shares its weights across time steps, so the size of the model does not grow with the size of the
input. The drawbacks of this architecture are slow computation, the inability to take any future
input into account for the current state, and difficulty retaining information from far in the past.
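A minimal sketch of a single recurrent step is shown below; the input and hidden-state sizes are illustrative assumptions. It shows how the shared weights and the previous hidden state let the network carry information forward through a sequence:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes: 3-dimensional inputs, 4-dimensional hidden state.
Wx = rng.normal(size=(4, 3))   # input-to-hidden weights (shared across time steps)
Wh = rng.normal(size=(4, 4))   # hidden-to-hidden weights (shared across time steps)
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input and the previous state,
    # which is how the network "remembers" earlier inputs in the sequence.
    return np.tanh(Wx @ x_t + Wh @ h_prev + b)

h = np.zeros(4)
sequence = [rng.normal(size=3) for _ in range(5)]
for x_t in sequence:
    h = rnn_step(x_t, h)
print("final hidden state:", h)
```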
Applications:
o Machine Translation
o Robot Control
o Speech Recognition
o Speech Synthesis
o Rhythm Learning
o Music Composition
Convolutional neural networks are a special kind of neural network used mainly for image
classification, clustering of images, and object recognition. They enable the unsupervised
construction of hierarchical image representations. To achieve the best accuracy, deep
convolutional neural networks are preferred over other neural networks.
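As a hedged illustration, the following Keras sketch builds a small convolutional network for image classification; the input shape, filter counts, and number of classes are assumptions chosen only for the example:

```python
import tensorflow as tf

# Illustrative input size (32x32 RGB images) and class count (10).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```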
Applications:
o Image Recognition.
o Video Analysis.
o NLP.
o Anomaly Detection.
o Drug Discovery.
o Checkers Game.
RBMs are yet another variant of Boltzmann machines. Here, the neurons in the input layer and the
hidden layer have symmetric connections between them, but there are no connections within either
layer. In contrast, ordinary Boltzmann machines do have internal connections inside the hidden
layer. These restrictions in RBMs help the model to train efficiently.
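A small NumPy sketch of this structure is given below (layer sizes and weights are illustrative). Because there are no connections inside a layer, the probability that each hidden unit is active given the visible units can be computed for all hidden units at once, which is part of what makes training efficient:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 6 visible units, 4 hidden units.
W = rng.normal(scale=0.1, size=(6, 4))   # symmetric connections between the two layers
b_hidden = np.zeros(4)

v = rng.integers(0, 2, size=6).astype(float)   # a binary visible vector

# With no intra-layer connections, the hidden units are conditionally
# independent given the visible units, so p(h_j = 1 | v) is a single
# matrix product followed by an element-wise sigmoid.
p_h_given_v = sigmoid(v @ W + b_hidden)
print("p(h_j = 1 | v):", p_h_given_v)
```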
Applications:
o Filtering.
o Feature Learning.
o Classification.
o Risk Detection.
5. Autoencoders
An autoencoder is an unsupervised neural network trained to reconstruct its own input: an encoder
compresses the input into a lower-dimensional representation, and a decoder reconstructs the input
from that representation.
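A minimal Keras autoencoder sketch is given below; the 784-dimensional input and the 32-dimensional code size are illustrative assumptions:

```python
import tensorflow as tf

# Illustrative sizes: 784-dimensional inputs compressed to a 32-dimensional code.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(32, activation="relu"),      # compressed representation
])
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),  # reconstruction of the input
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")      # trained to reproduce its own input
```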
Applications:
o Classification.
o Clustering.
o Feature Compression.
Deep learning applications
o Self-Driving Cars
A self-driving car captures the images around it and processes a huge amount of data to decide
which action to take: turn left, turn right, or stop. Acting on these decisions helps reduce the
accidents that happen every year.
Limitations
Advantages
Disadvantages
Scenario   x1   x2   ysum   yout
1          0    0    0      0
2          0    1    1      1
3          1    0    1      1
4          1    1    2      1
The truth table built with respect to the problem is depicted above. From the
truth table, I can conclude that in the situations where the value of yout is 1, John needs to
carry an umbrella. Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
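A small Python sketch that reproduces the yout column of this truth table is given below; the unit weights and the threshold of 1 are assumptions consistent with the table:

```python
# The two inputs are combined with unit weights; the output fires (1) when the
# weighted sum reaches the threshold of 1, reproducing the yout column above.
def perceptron(x1, x2, threshold=1):
    y_sum = 1 * x1 + 1 * x2
    return 1 if y_sum >= threshold else 0

for scenario, (x1, x2) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)], start=1):
    print(scenario, x1, x2, perceptron(x1, x2))
```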
Multi-layer Perceptron
A multi-layer perceptron is also known as an MLP. It consists of fully connected dense layers, which
transform any input dimension to the desired dimension. A multi-layer perceptron is a neural
network that has multiple layers. To create a neural network we combine neurons together so
that the outputs of some neurons are inputs of other neurons. A multi-layer perceptron has
one input layer with one neuron (or node) for each input, one output layer with a single node for
each output, and any number of hidden layers, where each hidden layer can have any number of
nodes. A schematic Multi-Layer Perceptron (MLP) is described below.
In this multi-layer perceptron there are three inputs and thus three input nodes, and the hidden layer
has three nodes. The output layer gives two outputs, therefore there are two output nodes. The
nodes in the input layer take the input and forward it for further processing: each input node
forwards its output to each of the three nodes in the hidden layer, and in the same way the hidden
layer processes the information and passes it to the output layer.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid
activation function takes real values as input and converts them to numbers between 0 and 1
using the sigmoid formula σ(x) = 1 / (1 + exp(-x)).
Now that we are done with the theory part of the multi-layer perceptron, let’s go ahead and
implement some code in Python using the TensorFlow library.
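Below is a minimal sketch of such an implementation. It assumes the architecture described above (three inputs, one hidden layer of three nodes, two outputs, sigmoid activations everywhere) and uses randomly generated dummy data only to demonstrate the training call:

```python
import numpy as np
import tensorflow as tf

# Matching the MLP described above: 3 inputs, one hidden layer with 3 nodes,
# and 2 output nodes, all using sigmoid activations.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation="sigmoid", input_shape=(3,)),  # hidden layer
    tf.keras.layers.Dense(2, activation="sigmoid"),                    # output layer
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# Dummy data just to show the training call (illustrative only).
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=(100, 2))
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.predict(X[:2]))
```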
Sigmoid Neuron
Sigmoid neurons are a fundamental building block of deep neural networks. They are similar to
perceptrons and the MP neuron model, but the significant difference is that sigmoid neurons are
smoother at the decision boundary than the perceptron and MP neuron models. In a sigmoid
neuron, every input xi has a weight wi associated with it. The weights depict the importance of the
input in the decision-making process. The output of a sigmoid neuron ranges between zero and
one, so it can be interpreted as a probability rather than a hard zero or one as in the perceptron
model. One of the most commonly used sigmoid functions is the logistic function, characterised by
an "S"-shaped curve. The sigmoid function is commonly used as an activation function in artificial
neural networks. In feedforward neural networks, the sigmoid function is applied to each neuron’s
output, allowing the network to introduce non-linearity into the model. This non-linearity is
important because it allows the neural network to learn more complex decision boundaries, which
can improve its performance on specific tasks.
Advantages:
1. Produces output values between 0 and 1, which can be helpful for binary classification and for interpreting outputs as probabilities.
Disadvantages:
1. It can produce output values close to 0 or 1, which can cause problems with the optimization
algorithm.
2. The gradient of the sigmoid function becomes very small near the output values of 0 or 1,
which makes it difficult for the optimization algorithm to adjust the weights and biases of the
neurons.
Here is a comparison table of the sigmoid, ReLU, and tanh activation functions in terms of their output range and gradient behaviour:

Function   Output range   Gradient behaviour
Sigmoid    (0, 1)         Saturates for large |x|; not zero-centred
Tanh       (-1, 1)        Saturates for large |x|; zero-centred
ReLU       [0, ∞)         Constant gradient of 1 for x > 0; zero for x < 0
During data flow, input nodes receive data, which travels through the hidden layers and exits
through the output nodes. There are no connections in the network that feed information back from
the output nodes.
Feed forward neural networks serve as the basis for object detection in photos, as
shown in the Google Photos app.
This model multiplies the inputs by weights as they enter the layer. The weighted input values are
then added together to obtain a sum. If the sum rises above a certain threshold, typically set at
zero, the output value is 1; if it falls below the threshold, the output is -1.
As a feed forward neural network model, the single-layer perceptron is often used for
classification. Single-layer perceptrons can also be trained: through training, the network adjusts
its weights using the delta rule, which compares the produced outputs with the intended values; a
small sketch of this training loop is given below.
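The following sketch illustrates training a single-layer perceptron in the spirit of the delta rule (weights updated in proportion to the difference between the target and the produced output). The OR-like task, the +1/-1 targets, the learning rate, and the number of epochs are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def step(z):
    return 1 if z >= 0 else -1   # threshold at zero, outputs +1 / -1 as described above

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([-1, 1, 1, 1])        # an OR-like task (illustrative)
w = rng.normal(scale=0.1, size=2)
b = 0.0
eta = 0.1                                # learning rate

for epoch in range(20):
    for x, t in zip(X, targets):
        y = step(w @ x + b)
        w += eta * (t - y) * x           # move weights toward the intended output
        b += eta * (t - y)

print([step(w @ x + b) for x in X])      # should reproduce the target outputs
```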
Input layer:
The neurons of this layer receive input and pass it on to the other layers of the
network. The number of features or attributes in the dataset must match the number of neurons in
the input layer.
Output layer:
Depending on the type of model being built, this layer provides the predicted feature.
Hidden layer:
Hidden layers separate the input and output layers. Depending on the type of model, there may be
several hidden layers. A hidden layer contains several neurons that transform the input before
transferring it to the next layer. The network's weights are constantly updated to make prediction
easier.
Neuron weights:
Weights describe the strength of the connection between two neurons; they scale how much
influence each input has on a neuron's output.
Neurons:
Feed forward networks are built from artificial neurons, which are adapted from biological
neurons. A neural network consists of artificial neurons. Neurons work in two steps: first, they
compute the weighted sum of their inputs, and second, they pass that sum through an activation
function. Activation functions can be either linear or non-linear. Each input to a neuron has an
associated weight, and during the learning phase the network learns these weights.
Activation Function:
Sigmoid:
Maps any real-valued input to an output between 0 and 1.
Rectified linear unit (ReLU):
Only positive values are allowed to flow through this function; negative values get mapped to 0.
Cost function
In a feed forward neural network, the cost function plays an important role. The
categorized data points are little affected by minor adjustments to weights and
biases.Thus, a smooth cost function can get used to determine a method of
adjusting weights and biases to improve performance.
A commonly used cost function is the quadratic (mean squared error) cost:
C(w, b) = (1/2n) Σx ‖y(x) − a‖²
Where,
w = the weights gathered in the network
b = biases
a = output vectors
x = input
n = the total number of training inputs
Loss function
The loss function of a neural network gets used to determine if an adjustment needs
to be made in the learning process.Neurons in the output layer are equal to the
number of classes. Showing the differences between predicted and actual
probability distributions. Following is the cross-entropy loss for binary classification.
Image source
In the gradient descent algorithm, the next point is calculated by scaling the gradient at the current
position by a learning rate and subtracting the resulting value from the current position (to decrease
the function; to increase it, the value would be added). The procedure can be written as:
p(n+1) = p(n) − η ∇f(p(n))
The gradient is scaled by the parameter η, which also determines the step size. The learning rate
significantly affects performance in machine learning.
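A minimal sketch of this update rule on a simple one-dimensional function f(p) = p² is shown below; the starting point and learning rate are arbitrary illustrative choices:

```python
# Gradient descent on f(p) = p**2, whose gradient is 2 * p (illustrative example).
def grad(p):
    return 2 * p

p = 5.0       # starting point
eta = 0.1     # learning rate, which scales the gradient and sets the step size

for _ in range(50):
    p = p - eta * grad(p)   # p(n+1) = p(n) - eta * grad f(p(n))

print(p)  # approaches the minimum at p = 0
```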
Output units
In the output layer, output units are those units that provide the desired output or
prediction, thereby fulfilling the task that the neural network needs to complete. There
is a close relationship between the choice of output units and the cost function. Any
unit that can serve as a hidden unit can also serve as an output unit in a neural
network.
For a numerical example of the backpropagation process, see: https://www.javatpoint.com/pytorch-backpropagation-process-in-deep-neural-network