Unit-4

Artificial Neural Network


 An ANN has three kinds of layers: an input layer, hidden layers, and an output layer.
 Each ANN has a single input layer and a single output layer, but may have none, one, or many hidden layers.
 The structure of an ANN falls into one of several architectures, such as single-layer, multi-layer, feed-forward, and recurrent networks.
 Weights are associated with each input to a neuron in an ANN.
 An activation function is applied to the net input to calculate the output. The output is then compared with the target, and the weights are adjusted.
Components of ANN
 Inputs: Inputs are the set of values for which we need to predict an output value. They can be viewed as features or attributes in a dataset.
 Weights: Weights are real values attached to each input/feature; they convey the importance of the corresponding feature in predicting the final output.
 Bias: The bias shifts the activation function to the left or right. It is a constant that helps the model fit the given data as well as possible.
Components of ANN
 Summation Function: The summation function binds the weights and inputs together and calculates their sum.
 Activation Function: The activation function decides whether a neuron should be activated or not; that is, it decides, using simple mathematical operations, whether the neuron's input to the network is important in the process of prediction.
Neurons
 The building blocks for neural networks are artificial
neurons.
 These are simple computational units that have
weighted input signals and produce an output signal
using an activation function.
Basic Structure of ANN
Feed-forward Vs Back-propagation
 Two essential terms describe the movement of information through a network: feed-forward and back-propagation.
 Feed-forward propagation: the flow of information occurs in the forward direction. The input is used to calculate some intermediate function in the hidden layer, which is then used to calculate an output.
 Back-propagation: the weights of the network connections are repeatedly adjusted to minimize the difference between the actual output vector of the net and the desired output vector.
Perceptron
 A perceptron is a neural network unit that performs a precise computation to detect features in the input data.
 A perceptron is mainly used to classify data into two parts; therefore, it is also known as a Linear Binary Classifier.
 Given inputs x1 through xn, the output o(x1, …, xn) computed by the perceptron is:
o(x1, …, xn) = 1 if w0 + w1x1 + w2x2 + … + wnxn > 0, and −1 otherwise
where each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output.
 The perceptron model works in two important steps:
 Step 1: First, multiply all input values by their corresponding weights, then add the products to determine the weighted sum. Mathematically, the weighted sum is:
∑wi*xi = x1*w1 + x2*w2 + … + xn*wn
 A special term called the bias 'b' is added to this weighted sum to improve the model's performance:
∑wi*xi + b
 Step 2: An activation function is applied to the above weighted sum, which gives an output either in binary form or as a continuous value:
Y = f(∑wi*xi + b)
This step (activation) function is vital in ensuring that the output is mapped to {0, 1} or {−1, 1}.
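
As a minimal sketch of these two steps in code (the input, weight, and bias values below are illustrative assumptions, not values from the text):

import numpy as np

def perceptron(x, w, b):
    """Step 1: weighted sum plus bias; Step 2: step activation."""
    weighted_sum = np.dot(w, x) + b       # sum(wi * xi) + b
    return 1 if weighted_sum > 0 else -1  # step activation function

# Illustrative values (assumed for the demo)
x = np.array([1.0, 0.5])    # inputs x1, x2
w = np.array([0.4, -0.2])   # weights w1, w2
b = 0.1                     # bias

print(perceptron(x, w, b))  # -> 1, since 0.4*1.0 - 0.2*0.5 + 0.1 = 0.4 > 0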
Multi-layer Perceptron Model
 In a single-layer perceptron there are only input and output layers; it is a two-layer architecture.
 A multi-layer perceptron (MLP) is a neural network that has multiple layers.
 MLPs are more powerful than single-layer networks: they can learn functions that are not linearly separable.
 A multi-layer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and any number of hidden layers, where each hidden layer can have any number of nodes.
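
A minimal forward-pass sketch of such a network, assuming one hidden layer with sigmoid activations (the layer sizes and random values are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward pass: input layer -> one hidden layer -> output layer."""
    h = sigmoid(W_hidden @ x + b_hidden)  # hidden-layer activations
    o = sigmoid(W_out @ h + b_out)        # output-layer activations
    return o

rng = np.random.default_rng(0)
x = rng.normal(size=3)              # 3 input features
W_hidden = rng.normal(size=(4, 3))  # 4 hidden nodes, 3 inputs each
b_hidden = np.zeros(4)
W_out = rng.normal(size=(2, 4))     # 2 output nodes
b_out = np.zeros(2)

print(mlp_forward(x, W_hidden, b_hidden, W_out, b_out))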
Multi-Layer Perceptron Model
 A schematic diagram of a Multi-Layer Perceptron
(MLP) is depicted below :
Perceptron Training Rule
 The perceptron training rule updates each weight according to
wi ← wi + Δwi, where Δwi = η (t − o) xi
Here t is the target output, o is the output generated by the perceptron, and η is a small positive constant called the learning rate.
Activation Functions of Perceptron
 The activation function applies a step rule (converting the numerical output into +1 or −1) to check whether the output of the weighting function is greater than zero.
Delta & Gradient Descent Rule
 The perceptron rule finds a successful weight vector when the training examples are linearly separable, but it can fail to converge if the examples are not linearly separable.
Delta & Gradient Descent Rule
 A second training rule, called the delta rule, is designed to overcome this difficulty.
 The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors and find the weights that best fit the training examples.
 The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by
o(x) = w0 + w1x1 + w2x2 + … + wnxn
Delta & Gradient Descent Rule
 In order to derive a weight learning rule for linear units, we specify a measure for the training error of a hypothesis (weight vector), relative to the training examples:
E(w) = ½ ∑d∈D (td − od)²
where D is the set of training examples, td is the target output for training example d, and od is the output of the linear unit for training example d.
 The direction of steepest descent can be found by computing the derivative of E with respect to each component of the vector w.
 This vector derivative is called the gradient of E with respect to w, written as
∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]
 The training rule for gradient descent is:
w ← w + Δw
where
Δw = −η ∇E(w)
Here η is a positive constant called the learning rate, which sets the step size. The negative sign is present because we want to move the weight vector in the direction that decreases E.
 This training rule can also be written in its component form as:
wi ← wi + Δwi
where
Δwi = −η ∂E/∂wi
 Finally, evaluating ∂E/∂wi for the error measure above gives the practical update rule:
Δwi = η ∑d∈D (td − od) xid
where xid is the value of input xi for training example d.
Gradient Descent Algorithm
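A minimal sketch of batch gradient descent for the linear unit above, implementing the final update Δwi = η ∑d (td − od) xid (the learning rate, epoch count, and training data are illustrative assumptions):

import numpy as np

def train_linear_unit(X, t, eta=0.05, epochs=100):
    """Batch gradient descent for an unthresholded linear unit o = w . x."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1 for w0
    w = np.zeros(X.shape[1])                      # initialise each wi to zero
    for _ in range(epochs):
        o = X @ w                 # outputs od for every training example d
        w += eta * X.T @ (t - o)  # delta rule: eta * sum_d (td - od) * xid
    return w

# Illustrative training data for the target t = 2*x + 1 (assumed)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
t = np.array([1.0, 3.0, 5.0, 7.0])
print(train_linear_unit(X, t))  # approaches [1.0, 2.0]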
Derivation of Back-propagation
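As a sketch of what the derivation yields in practice, the following implements a single stochastic-gradient step for a one-hidden-layer sigmoid network, using the standard error terms δk = ok(1 − ok)(tk − ok) for output units and δh = oh(1 − oh) ∑k wkh δk for hidden units (the network shape and values are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.1):
    """One stochastic-gradient step for a one-hidden-layer sigmoid network."""
    # Forward pass
    h = sigmoid(W1 @ x)                       # hidden outputs oh
    o = sigmoid(W2 @ h)                       # network outputs ok
    # Backward pass: error terms
    delta_o = o * (1 - o) * (t - o)           # output units: ok(1-ok)(tk-ok)
    delta_h = h * (1 - h) * (W2.T @ delta_o)  # hidden units: oh(1-oh)*sum_k wkh*delta_k
    # Weight updates: delta_wji = eta * delta_j * xji
    W2 += eta * np.outer(delta_o, h)
    W1 += eta * np.outer(delta_h, x)
    return W1, W2

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(3, 2))  # 2 inputs -> 3 hidden units
W2 = rng.normal(scale=0.5, size=(1, 3))  # 3 hidden units -> 1 output
x, t = np.array([0.5, -1.0]), np.array([1.0])
W1, W2 = backprop_step(x, t, W1, W2)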
Generalization of Neural Networks
 Generalization describes a model's ability to react to new data: after being trained on a training set, the model can digest new data and make accurate predictions.
 A model's ability to generalize is central to its success.
 If a model has been trained too well on the training data, it will be unable to generalize. It will make inaccurate predictions when given new data, making the model useless even though it makes accurate predictions on the training data. This is called over-fitting. The inverse problem, under-fitting, also exists.
Generalization of Neural Networks
 Under-fitting happens when a model has not been trained enough on the data. Under-fitting makes the model just as useless: it is not capable of making accurate predictions, even on the training data.
 In any real-world application, the performance of an ANN mostly depends upon its generalization capability.
 Generalization of an ANN is its ability to handle unseen data.
 The generalization capability of the network is mostly determined by system complexity and the training of the network.
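
One common way to check generalization in practice is to hold out data that the model never sees during training and compare the training and test errors; a minimal sketch, assuming a simple linear model and synthetic data:

import numpy as np

# Illustrative data: a noisy line y = 2x + 1 (assumed for the demo)
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=40)
y = 2 * X + 1 + rng.normal(scale=0.1, size=40)

# Hold out 25% of the data; training never sees it
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

# Fit w, b by gradient descent on the training set only
w, b = 0.0, 0.0
for _ in range(500):
    err = y_train - (w * X_train + b)
    w += 0.1 * (err @ X_train) / len(X_train)
    b += 0.1 * err.mean()

def mse(Xs, ys):
    return np.mean((ys - (w * Xs + b)) ** 2)

# Similar train and test errors suggest the model generalizes;
# a low train error with a high test error indicates over-fitting.
print("train MSE:", mse(X_train, y_train), "test MSE:", mse(X_test, y_test))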
SOM Algorithm
 A Self-Organizing Map (SOM, or Kohonen Map) is a type of ANN.
 It follows an unsupervised learning approach and trains its network through a competitive learning algorithm.
 SOM is used for clustering and mapping (or dimensionality-reduction) techniques, mapping multidimensional data onto a lower-dimensional space, which reduces complex problems to forms that are easier to interpret.
 SOM has two layers: an input layer and an output layer.
SOM Algorithm
 The architecture of a Self-Organizing Map with two clusters and n input features per sample is shown below:
SOM Algorithm
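A simplified sketch of the competitive-learning loop behind SOM training, assuming two output clusters as in the architecture above; for brevity it updates only the winning unit and omits the neighbourhood function of a full SOM (data, learning rate, and epoch count are illustrative assumptions):

import numpy as np

def train_som(X, n_units=2, eta=0.5, epochs=20, seed=0):
    """Competitive learning: the winning (closest) unit moves toward the input."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(size=(n_units, X.shape[1]))  # one weight vector per output unit
    for epoch in range(epochs):
        lr = eta * (1 - epoch / epochs)          # decay the learning rate over time
        for x in X:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))  # best matching unit
            W[winner] += lr * (x - W[winner])    # pull the winner toward the input
    return W

# Illustrative data forming two obvious groups (assumed)
X = np.array([[0.1, 0.2], [0.0, 0.1], [0.9, 0.8], [1.0, 0.9]])
print(train_som(X))  # each row ends up near the centre of one group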
Deep Learning
 Deep learning is a machine learning technique.
 The inspiration for deep learning is the way the human brain filters information.
 The majority of modern deep learning architectures are based on ANNs.
Classification of Neural Networks:
 Shallow neural network: a shallow neural network has only one hidden layer between the input and output.
 Deep neural network: deep neural networks have more than one hidden layer. For instance, the GoogLeNet model for image recognition counts 22 layers.
 Nowadays, deep learning is used in many applications such as driverless cars, mobile phones, the Google Search engine, fraud detection, TVs, and so on.
