Unit - 3-NNDL - Notes
Introduction to Deep Learning, Historical Trends in Deep Learning, Deep Feed-Forward
Networks, Gradient-Based Learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms
Introduction to Deep Learning:
Deep learning is the branch of machine learning that is based on artificial neural
network architectures. An artificial neural network (ANN) uses layers of interconnected nodes,
called neurons, that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer, one or more hidden
layers connected one after the other, and an output layer. Each neuron receives input from
the neurons of the previous layer or from the input layer. The output of one neuron becomes
an input to the neurons in the next layer of the network, and this process continues until
the final layer produces the output of the network.
The layers of the neural network transform the input data through a series of
nonlinear transformations, allowing the network to learn complex representations of the
input data.
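The layer-by-layer computation described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the sigmoid activation, the helper names `dense_layer` and `forward`, and the weights in the usage lines are all hypothetical choices made for the example.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as the (assumed) nonlinear activation."""
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of every
    output of the previous layer, adds its bias and applies the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Feed the input through each (weights, biases) pair in order; the output
    of one layer becomes the input of the next."""
    activation = x
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

# Hypothetical 2-3-1 network with made-up weights, for illustration only.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer: 3 neurons
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer: 1 neuron
]
output = forward([1.0, 2.0], layers)
```

Stacking more `(weights, biases)` pairs in `layers` adds more hidden layers without changing `forward`.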
[Figure: Relationship between Artificial Intelligence, Machine Learning, Deep Learning and Data Science]
Architectures:
Applications:
Data Compression
Pattern Recognition
Computer Vision
Sonar Target Recognition
Speech Recognition
Handwritten Characters Recognition
For example, to predict the next word in a sentence, one must know the words that came
before it. A recurrent network of this kind not only processes the inputs but also shares the
same weights across all time steps, whatever the length of the sequence.
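Sharing weights across time can be illustrated with a hypothetical scalar recurrent cell. The sketch assumes a tanh activation and the update h_t = tanh(w_x·x_t + w_h·h_(t−1) + b); the function name `rnn_forward` and all parameter values are illustrative only.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar recurrent cell: h_t = tanh(w_x * x_t + w_h * h_(t-1) + b).
    The same w_x, w_h and b are reused at every time step (weight sharing),
    and the hidden state h carries information about earlier inputs."""
    h = h0
    states = []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h + b)
        states.append(h)
    return states

# Hypothetical parameters; the same three numbers serve sequences of any length.
short_states = rnn_forward([1.0, 0.0], w_x=0.8, w_h=0.5, b=0.0)
long_states = rnn_forward([1.0, 0.0, 0.0, 0.0, 0.0], w_x=0.8, w_h=0.5, b=0.0)
```

The second input is 0, yet the second state is non-zero: the cell "remembers" the first input through h, which is what lets such a network use previous words when predicting the next one.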
Applications:
Machine Translation
Robot Control
Time Series Prediction & Anomaly Detection
Speech Recognition
Speech Synthesis
Rhythm Learning
Music Composition
Applications:
Identify Faces, Street Signs, and Tumors.
Image Recognition.
Video Analysis.
NLP.
Anomaly Detection.
Drug Discovery.
Checkers Game.
Time Series Forecasting.
Applications:
Filtering.
Feature Learning.
Classification.
Risk Detection.
Business and Economic Analysis.
1) The input is fed to the network as a row matrix of x1, x2 and 1, where the 1 is the
bias input.
[ x1 x2 1 ]
2) Multiply the inputs by the weights of the first and second linear models to obtain the
score that determines the probability of the point being in the positive region of each model.
This is done with a single matrix multiplication of the input row by the weight matrix.
3) Take the sigmoid of the scores; this gives the probability of the point being in the
positive region in both models.
4) Multiply the probabilities obtained in the previous step by the second set of weights.
As before, include an input of 1 for the bias whenever taking a combination of inputs.
We now use this non-linear model to produce an output that describes the probability of a
point being in the positive region. Take the point (2, 2). Along with the bias input, we
represent it as
[ 2 2 1 ]
Recall that the first linear model in the hidden layer is the line
$-4x_1 - x_2 + 12 = 0$.
This means that, to obtain the linear combination in the first model, the inputs are
multiplied by -4 and -1, and the bias input is multiplied by 12.
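For the second model, the inputs are multiplied by -1/5 and 1 and the bias input by 3,
which gives the linear combination $-\tfrac{1}{5}x_1 + x_2 + 3$ for the same point. Applying
the sigmoid to both linear combinations gives the probability of the point being in the
positive region of each model.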
The second layer contains the weights that dictate how the linear models of the first
layer are combined into the non-linear model. These weights are 1.5 and 1, with a bias
weight of 0.5.
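Now we multiply the probabilities from the first layer, with 1 appended for the bias, by
the second set of weights, and take the sigmoid of the result to obtain the final output
probability: $\sigma\big(1.5\,\sigma(s_1) + 1\cdot\sigma(s_2) + 0.5\big)$, where $s_1$ and
$s_2$ are the scores of the point under the two first-layer models.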
This is the complete math behind the feed-forward process, in which the inputs traverse
the entire depth of the neural network. In this example there is only one hidden layer, but
whether there is one hidden layer or twenty, the computation is the same for every hidden
layer.
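The numbers in the worked example above can be checked with a short script. It follows the notes' convention of appending 1 to each input row for the bias; the helper `matvec` is a hypothetical name, and the commented values are rounded.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(row, matrix):
    """Row vector (length r) times an r x c matrix, giving a row of length c."""
    return [sum(row[i] * matrix[i][j] for i in range(len(row))) for j in range(len(matrix[0]))]

# Point (2, 2) with a trailing 1 for the bias input.
x = [2.0, 2.0, 1.0]

# First layer: column 0 holds the first model (-4, -1, 12),
# column 1 holds the second model (-1/5, 1, 3).
W1 = [
    [-4.0, -0.2],
    [-1.0, 1.0],
    [12.0, 3.0],
]
scores = matvec(x, W1)                # about [2.0, 4.6]
probs = [sigmoid(s) for s in scores]  # about [0.881, 0.990]

# Second layer: weights 1.5 and 1, bias weight 0.5; append 1 for the bias input.
W2 = [[1.5], [1.0], [0.5]]
final_score = matvec(probs + [1.0], W2)[0]
output = sigmoid(final_score)         # about 0.943
```

So the point (2, 2) ends up with a probability of roughly 0.94 of lying in the positive region of the combined non-linear model.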
A feed-forward neural network is an artificial neural network in which the nodes do not
form a cycle. In this kind of neural network, all the perceptrons are organized in layers:
the input layer takes the input, and the output layer generates the output. Because the
hidden layers have no link to the outside world, they are called hidden layers.
Each perceptron in one layer is connected to every node in the next layer, so all the
nodes are fully connected. There are no connections, visible or invisible, between nodes in
the same layer, and there are no back-loops in a feed-forward network. To minimize the
prediction error, the back-propagation algorithm can be used to update the weight values.
The weights are then updated by gradient descent, also known as steepest descent. The
main objective of the gradient descent algorithm is to minimize the cost function
iteratively. To achieve this goal, it performs two steps at each iteration:
1. Calculate the first-order derivative of the function to compute the gradient, or
slope, of the function at the current point.
2. Move in the direction opposite to the gradient, taking a step of α times the gradient
from the current point: $w \leftarrow w - \alpha \nabla J(w)$, where $J(w)$ is the cost
function and α is the learning rate. The learning rate is a tuning parameter of the
optimization process that determines the length of the steps.
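The two steps can be sketched for a single parameter. The cost J(w) = (w − 3)², its derivative 2(w − 3) and the learning rates are hypothetical choices for illustration; the loop is plain gradient descent, w ← w − α·J′(w).

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    """Repeat: compute the gradient at w, then step against it by alpha times."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

def grad_j(w):
    """Derivative of the example cost J(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

w_min = gradient_descent(grad_j, w0=0.0, alpha=0.1, steps=100)
w_diverged = gradient_descent(grad_j, w0=0.0, alpha=1.1, steps=50)
```

With α = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step, so after 100 steps w is within about 1e-9 of 3. With α = 1.1 every step overshoots by more than it corrects and the iterates diverge, which is why the learning rate has to be tuned.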
What is a Cost Function?
The cost function measures the difference, or error, between the predicted values and
the actual values at the current parameters, and expresses it as a single real number.
It improves the efficiency of learning by giving the model feedback, so that it can
minimize the error and find a local or global minimum.
Gradient descent iterates along the direction of the negative gradient until the cost
function stops decreasing, that is, until it converges to a minimum, which need not be
zero. At this point, the model stops learning further.
Although the terms cost function and loss function are often used synonymously, there
is a minor difference between them.
NNDL –Unit-3-Notes Prepared by Assistant Professor V. Ravindranath
The difference concerns the scope of the error measured during training: the loss
function refers to the error on a single training example, while the cost function is the
average error across the entire training set.
To use a cost function, we make a hypothesis with initial parameters, compute the cost
over known data, and then modify the parameters with gradient descent to reduce the cost.
Hypothesis: Y = mx + c, where m is the slope of the line and c is the intercept on the
Y-axis.
Parameters: m, c
Cost function: J(m, c), the error of the hypothesis over the training data
Goal: minimize J(m, c)
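The hypothesis, parameters, cost function and goal can be tied together in a small example. The notes do not specify the cost function, so this sketch assumes mean squared error, with the squared error on one example as the loss; the data points (which lie exactly on Y = 2x + 1), the learning rate and the step count are hypothetical.

```python
def predict(m, c, x):
    """Hypothesis Y = m*x + c."""
    return m * x + c

def loss(m, c, x, y):
    """Assumed loss: squared error on a single training example."""
    return (predict(m, c, x) - y) ** 2

def cost(m, c, xs, ys):
    """Cost J(m, c): average loss over the whole training set (mean squared error)."""
    return sum(loss(m, c, x, y) for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, alpha=0.05, steps=2000):
    """Start from initial parameters m = c = 0 and reduce J(m, c) by gradient descent."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(m, c, x) - y for x, y in zip(xs, ys)]
        dm = 2.0 / n * sum(e * x for e, x in zip(errors, xs))  # dJ/dm
        dc = 2.0 / n * sum(errors)                              # dJ/dc
        m -= alpha * dm
        c -= alpha * dc
    return m, c

# Hypothetical data lying exactly on Y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
initial_cost = cost(0.0, 0.0, xs, ys)  # 21.0
m, c = fit(xs, ys)
```

Starting from a cost of 21, gradient descent recovers m ≈ 2 and c ≈ 1, where the cost is essentially zero because the data have no noise; on real data the minimum cost would generally stay above zero.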
Hidden Units:
The design of hidden units is an extremely active area of research, and it does not yet
have many definitive guiding theoretical principles.
Rectified linear units (ReLUs) are an excellent default choice of hidden unit.
It is usually impossible to predict in advance which kind of hidden unit will work best.
The design process consists of trial and error: intuiting that a kind of hidden unit may
work well, and then evaluating its performance on a validation set.
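Some hidden units are not differentiable at all input points; for example, the rectified
linear function $g(z) = \max(0, z)$ is not differentiable at $z = 0$.
The only difference between a rectified linear unit and a linear unit is that its output
is 0 across half its domain.
Its derivative is 1 everywhere that the unit is active.
Thus the gradient direction is far more useful than with activation functions that
introduce second-order effects.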
Rectified linear units are typically used on top of an affine transformation:
$h = g(W^\top x + b)$.
When initializing the parameters, it is good practice to set all elements of $b$ to a
small positive value, such as 0.1.
This makes it likely that the rectified linear units will be initially active for most
training samples and will allow the derivatives to pass through.
One drawback of rectified linear units is that they cannot learn via gradient-based
methods on examples for which their activation is zero.
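A minimal sketch of ReLU units on top of an affine transformation, h = g(Wᵀx + b), with every bias initialized to 0.1. The weight initialization (uniform values in [-0.01, 0.01]), the layer sizes and the input are assumptions for illustration. The last lines show the drawback just described: when a unit's pre-activation is negative, its derivative is 0, so no gradient reaches its weights for that example.

```python
import random

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of max(0, z): 1 where the unit is active, 0 otherwise
    (0 is used at z == 0, a common convention)."""
    return 1.0 if z > 0 else 0.0

def hidden_layer(x, W, b):
    """h = g(W^T x + b) with g = ReLU; W[i][j] connects input i to unit j."""
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    return z, [relu(zj) for zj in z]

random.seed(0)
n_in, n_units = 4, 8
# Assumed initialization: small random weights, every bias set to 0.1.
W = [[random.uniform(-0.01, 0.01) for _ in range(n_units)] for _ in range(n_in)]
b = [0.1] * n_units

x = [1.0, 0.5, -0.5, 2.0]
z, h = hidden_layer(x, W, b)
active = sum(relu_grad(zj) for zj in z)  # units that pass a gradient back

# Drawback: with a negative pre-activation the derivative is 0, so gradient-based
# learning cannot change the unit's incoming weights on this example.
z_dead, _ = hidden_layer(x, W, [-1.0] * n_units)
dead_grads = [relu_grad(zj) for zj in z_dead]
```

With these small weights the term Wᵀx is at most 0.04 in magnitude for this input, so the 0.1 bias keeps every unit active at the start, while a bias of -1 switches every unit off.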
Three generalizations of rectified linear units are based on using a non-zero slope
$\alpha_i$ when $z_i < 0$:
$h_i = g(z, \alpha)_i = \max(0, z_i) + \alpha_i \min(0, z_i)$.
1. Absolute value rectification fixes $\alpha_i = -1$ to obtain $g(z) = |z|$. It is used for
object recognition from images.
2. A leaky ReLU fixes $\alpha_i$ to a small value like 0.01.
3. A parametric ReLU (PReLU) treats $\alpha_i$ as a learnable parameter.
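The generalized form h_i = max(0, z_i) + α_i min(0, z_i) and its three special cases can be written directly. The default α = 0.01 for the leaky ReLU follows the notes; the initial α = 0.25 for the parametric version, the class name `ParametricReLU` and the single update step at the end are assumptions for illustration.

```python
def generalized_relu(z, alpha):
    """h = max(0, z) + alpha * min(0, z): slope 1 for z > 0, slope alpha for z < 0."""
    return max(0.0, z) + alpha * min(0.0, z)

def absolute_value_rectification(z):
    return generalized_relu(z, -1.0)  # alpha = -1 gives |z|

def leaky_relu(z, alpha=0.01):
    return generalized_relu(z, alpha)

class ParametricReLU:
    """Hypothetical PReLU unit whose alpha is learned like any other parameter."""

    def __init__(self, alpha=0.25):
        self.alpha = alpha

    def __call__(self, z):
        return generalized_relu(z, self.alpha)

    def grad_alpha(self, z):
        """dh/dalpha = min(0, z): alpha only receives gradient when z < 0."""
        return min(0.0, z)

# One hypothetical gradient step on alpha (upstream gradient 0.5, learning rate 0.1).
prelu = ParametricReLU()
prelu.alpha -= 0.1 * 0.5 * prelu.grad_alpha(-2.0)
```

Only the slope on the negative side differs between the three variants, which is why one function with an `alpha` argument covers all of them.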
Architecture Design:
The architecture of a neural network is the structure of interconnected nodes, called
neurons, that are organized in layers. The design of a neural network architecture is
important because it determines how the network functions and learns.
When designing a neural network architecture, you can consider things like:
Problem: Understand the problem you're trying to solve
Model objectives: Define what you want the model to do
Network type: Choose the type of network you want to use
Model complexity: Consider how complex the model should be
Layers and units: Decide how many layers and units the network should have
Activation functions: Choose the activation functions to use
Regularization and dropout: Decide how to use regularization and dropout
Optimization algorithm and learning rate: Select an optimization algorithm and
learning rate
Some types of neural network architectures include:
Convolutional Neural Networks (CNNs): Used for image processing and analysis,
such as image classification and facial recognition
Recurrent Neural Networks (RNNs): Used for processing sequential data, such as
time-series data
Generative Adversarial Networks (GANs): Used for generative tasks, where the
network automatically learns to generate new data that resembles the original dataset
Other types of neural network models include: Feedforward Neural Network, Long
Short-Term Memory (LSTM) Network, Gated Recurrent Unit (GRU) Network, Autoencoder,
and Radial Basis Function Network (RBFN).
Architecture Terminology
The word architecture refers to the overall structure of the network:
How many units should it have?
How should the units be connected to each other?
Most neural networks are organized into groups of units called layers.
Most neural network architectures arrange these layers in a chain structure,
with each layer being a function of the layer that preceded it.
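For a chain of fully connected layers, the choice of depth and width directly fixes how many parameters the network has. The hypothetical helper below counts them, assuming every layer is fully connected to the one before it and has one bias per unit.

```python
def count_parameters(layer_sizes):
    """Weights and biases in a chain of fully connected layers.
    layer_sizes[0] is the input width; each later entry is a layer's unit count.
    A layer with b units fed by a layer of a units has a * b weights and b biases."""
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases

small = count_parameters([2, 2, 1])       # (6, 3)
wide = count_parameters([784, 128, 10])   # (101632, 138)
```

For the 2-2-1 network of the feed-forward worked example this gives 6 weights and 3 biases, matching the four first-layer input weights, the two second-layer weights, and the bias values 12, 3 and 0.5 used there.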
Neural Network:
Neural networks are an information processing paradigm inspired by the human nervous
system. Just as the human nervous system has biological neurons, neural networks have
artificial neurons: mathematical functions derived from biological neurons.
The human brain is estimated to have about 86 billion neurons, each connected to an
average of about 10,000 other neurons. Each neuron receives signals through synapses,
which control the effect of each signal on the neuron.
Back Propagation:
The back-propagation algorithm works by computing the gradient of the loss function
with respect to each weight via the chain rule. It computes the gradient one layer at a
time, iterating backward from the last layer, to avoid redundant computation of
intermediate terms in the chain rule.
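The chain-rule bookkeeping can be seen on the smallest possible case: a scalar network with one hidden unit. This sketch assumes sigmoid activations and the loss L = ½(y − t)²; all names and numbers are illustrative. The backward pass reuses the intermediate term dL/dz2 when moving to the first layer, and the analytic gradients are checked against central finite differences.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, t, w1, w2):
    """h = sigmoid(w1*x), y = sigmoid(w2*h), L = 0.5*(y - t)^2.
    Returns the loss and dL/dw1, dL/dw2 computed from the output backward."""
    # Forward pass, keeping intermediate values for the backward pass.
    z1 = w1 * x
    h = sigmoid(z1)
    z2 = w2 * h
    y = sigmoid(z2)
    loss = 0.5 * (y - t) ** 2
    # Backward pass, last layer first.
    dL_dy = y - t
    dL_dz2 = dL_dy * y * (1.0 - y)  # sigmoid'(z2) = y(1 - y)
    dL_dw2 = dL_dz2 * h
    dL_dh = dL_dz2 * w2             # reuses dL_dz2 instead of recomputing it
    dL_dz1 = dL_dh * h * (1.0 - h)  # sigmoid'(z1) = h(1 - h)
    dL_dw1 = dL_dz1 * x
    return loss, dL_dw1, dL_dw2

# Compare against central finite differences at hypothetical values.
x, t, w1, w2, eps = 1.5, 0.2, 0.4, -0.7, 1e-6
_, g1, g2 = forward_backward(x, t, w1, w2)
num_g1 = (forward_backward(x, t, w1 + eps, w2)[0] - forward_backward(x, t, w1 - eps, w2)[0]) / (2 * eps)
num_g2 = (forward_backward(x, t, w1, w2 + eps)[0] - forward_backward(x, t, w1, w2 - eps)[0]) / (2 * eps)
```

The finite-difference check needs two extra forward passes per weight, whereas the backward pass obtains all gradients in one sweep; that saving is the point of back-propagation.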
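Parameters:
$x$ = input training vector, $x = (x_1, x_2, \ldots, x_n)$.
$t$ = target output vector, $t = (t_1, t_2, \ldots, t_m)$.
$\delta_k$ = error correction term for output unit $Y_k$.
$\delta_j$ = error correction term for hidden unit $Z_j$.
$\alpha$ = learning rate.
$v_{0j}$ = bias of hidden unit $Z_j$; $w_{0k}$ = bias of output unit $Y_k$.
$v_{ij}$ = weight from input unit $i$ to hidden unit $Z_j$; $w_{jk}$ = weight from hidden unit $Z_j$ to output unit $Y_k$.
$z_{in,j}$, $z_j$ = net input to and output of hidden unit $Z_j$; $y_{in,k}$, $y_k$ = net input to and output of output unit $Y_k$.
$f$ = activation function, with derivative $f'$.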
Training Algorithm:
Step 6: Each output unit Yk (k = 1 to m) receives the target pattern corresponding to the
input pattern, and its error correction term is calculated as:
$\delta_k = (t_k - y_k)\, f'(y_{in,k})$
Step 7: Each hidden unit Zj (j = 1 to a) sums its delta inputs from all units in the layer
above:
$\delta_{in,j} = \sum_{k=1}^{m} \delta_k\, w_{jk}$
The error information term is calculated as:
$\delta_j = \delta_{in,j}\, f'(z_{in,j})$
Step 8: Each output unit Yk (k = 1 to m) updates its bias and its weights (j = 1 to a).
The weight correction term is $\Delta w_{jk} = \alpha\, \delta_k\, z_j$ and the bias correction
term is $\Delta w_{0k} = \alpha\, \delta_k$. Therefore:
$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$
$w_{0k}(\text{new}) = w_{0k}(\text{old}) + \Delta w_{0k}$
Each hidden unit Zj (j = 1 to a) updates its bias and its weights (i = 1 to n). The weight
correction term is
$\Delta v_{ij} = \alpha\, \delta_j\, x_i$
and the bias correction term is
$\Delta v_{0j} = \alpha\, \delta_j$
Therefore:
$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$
$v_{0j}(\text{new}) = v_{0j}(\text{old}) + \Delta v_{0j}$
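Steps 6 to 8 can be sketched for a network with one hidden layer, together with the feed-forward pass that must precede them. The earlier steps of the algorithm are not reproduced in these notes, so that forward pass is an assumption. The sketch also assumes the binary sigmoid f(x) = 1/(1 + e^(−x)), whose derivative is f(x)(1 − f(x)); the function names, the list-of-lists weight layout with row 0 holding the biases, and the training pattern are hypothetical.

```python
import math
import random

def f(x):
    """Binary sigmoid activation (assumed; the notes leave f unspecified here)."""
    return 1.0 / (1.0 + math.exp(-x))

def f_prime_from_output(fx):
    """Binary sigmoid derivative written through its output: f'(x) = f(x)(1 - f(x))."""
    return fx * (1.0 - fx)

def train_step(x, t, v, w, alpha):
    """One back-propagation update for a single training pair (x, t).
    v[i + 1][j]: weight from input i to hidden unit j; v[0][j]: bias v0j.
    w[j + 1][k]: weight from hidden unit j to output unit k; w[0][k]: bias w0k.
    Returns the squared error measured before the update."""
    n, a, m = len(x), len(v[0]), len(w[0])
    # Feed-forward pass: z_j = f(z_in,j) and y_k = f(y_in,k).
    z = [f(v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n))) for j in range(a)]
    y = [f(w[0][k] + sum(z[j] * w[j + 1][k] for j in range(a))) for k in range(m)]
    # Step 6: delta_k = (t_k - y_k) f'(y_in,k).
    delta_k = [(t[k] - y[k]) * f_prime_from_output(y[k]) for k in range(m)]
    # Step 7: delta_in,j = sum_k delta_k w_jk, then delta_j = delta_in,j f'(z_in,j).
    # This uses the old w, since the updates below happen afterwards.
    delta_j = [
        sum(delta_k[k] * w[j + 1][k] for k in range(m)) * f_prime_from_output(z[j])
        for j in range(a)
    ]
    # Step 8, output layer: delta w_jk = alpha delta_k z_j, delta w_0k = alpha delta_k.
    for k in range(m):
        w[0][k] += alpha * delta_k[k]
        for j in range(a):
            w[j + 1][k] += alpha * delta_k[k] * z[j]
    # Step 8, hidden layer: delta v_ij = alpha delta_j x_i, delta v_0j = alpha delta_j.
    for j in range(a):
        v[0][j] += alpha * delta_j[j]
        for i in range(n):
            v[i + 1][j] += alpha * delta_j[j] * x[i]
    return 0.5 * sum((t[k] - y[k]) ** 2 for k in range(m))

# Hypothetical 2-3-1 network trained repeatedly on one pattern.
random.seed(1)
n, a, m = 2, 3, 1
v = [[random.uniform(-0.5, 0.5) for _ in range(a)] for _ in range(n + 1)]
w = [[random.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(a + 1)]
errors = [train_step([1.0, 0.0], [1.0], v, w, alpha=0.5) for _ in range(200)]
```

Because each correction term has the form α·(error term)·(input to that weight), the updates move the output toward the target, and the recorded error falls over the 200 repetitions.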