
UNIT IV:

Artificial Neural Networks - Introduction, Understanding the Biological Neuron, Exploring the Artificial
Neuron, Types of Activation Functions, Early Implementations of ANN, Architectures of Neural Network:
Single-layer feed forward network, Multi-layer feed forward ANNs , Recurrent network, Learning Process
in ANN, Back propagation, Deep Learning (TB-1-Ch-10).

Deep learning is a technique that mimics the working of the human brain (which receives its inputs through the sensory organs).

Deep Learning AI mimics the intricate neural networks of the human brain, enabling computers to
autonomously discover patterns and make decisions from vast amounts of unstructured data.

Artificial intelligence is the overarching system. Machine learning is a subset of AI.


Deep learning is a subfield of machine learning, and neural networks make up the
backbone of deep learning algorithms.

--> Artificial intelligence is used to classify machines that mimic human intelligence and human cognitive functions
like problem-solving and learning. AI uses predictions and automation to optimize and solve complex tasks that
humans have historically done, such as facial and speech recognition, decision making and translation.

--> Machine learning depends on human intervention to allow a computer system to identify patterns, learn,
perform specific tasks and provide accurate results.

--> Deep learning is a subset of machine learning. The primary difference between machine learning and deep
learning is how each algorithm learns and how much data each type of algorithm uses.
Deep learning automates much of the feature extraction piece of the process, eliminating some of the manual
human intervention required. It also enables the use of large data sets, earning the title of scalable machine
learning. A deep-learning model requires more data points to improve accuracy, whereas a machine-
learning model relies on less data given its underlying data structure.

--> Neural networks, also called artificial neural networks (ANNs) or simulated neural networks (SNNs), are a
subset of machine learning and are the backbone of deep learning algorithms. They are called “neural” because
they mimic how neurons in the brain signal one another.
Neural networks are made up of node layers – an input layer, one or more hidden layers, and an output
layer. Each node is an artificial neuron that connects to the next, and each has a weight and threshold value.
When one node’s output is above the threshold value, that node is activated and sends its data to the network’s
next layer. If it’s below the threshold, no data passes along.
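The firing rule described above can be sketched in a few lines of Python; the inputs, weights, and threshold below are made-up illustrative values, not taken from the text:

```python
# Minimal sketch of a single node: weighted sum of inputs compared to a threshold.
# The weights and threshold are illustrative values only.

def node_output(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The node "fires" (passes data on) only if the weighted sum exceeds the threshold.
    return 1 if weighted_sum > threshold else 0

print(node_output([1, 0, 1], [0.6, 0.4, 0.3], threshold=0.5))  # 0.9 > 0.5 -> 1 (fires)
print(node_output([0, 1, 0], [0.6, 0.4, 0.3], threshold=0.5))  # 0.4 < 0.5 -> 0 (does not fire)
```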

Using neural networks, speech and image recognition tasks can happen in minutes instead of the hours
they take when done manually. Google’s search algorithm is a well-known example of a neural network. Neural
networks are widely used in a variety of applications, including image recognition, predictive modeling and natural
language processing (NLP).

Applications of artificial neural networks


Image recognition was one of the first areas in which neural networks were successfully applied, but the technology's uses have since expanded to many more areas:

● Chatbots.
● NLP, translation and language generation.
● Stock market predictions.
● Delivery driver route planning and optimization.
● Drug discovery and development.
● Social media.
● Personal assistants.
The “deep” in deep learning refers to the depth of layers in a neural network. A neural network of more than three layers, including the input and the output layers, can be considered a deep-learning algorithm.

Artificial Neural Networks – Introduction:

We have learnt how the machine learning process maps to the human learning process. Now it is time to
see how the human nervous system has been mimicked in the computer world in the form of an
artificial neural network, or simply a neural network.

Machine learning is how a machine learns to perform tasks and improve with time from experience, either
with expert guidance or by itself. Machine learning, as we have seen, mimics the human form of learning.
On the other hand, human learning, or for that matter every action of a human being, is controlled by the
nervous system. In any human being, the nervous system coordinates the different actions by
transmitting signals to and from different parts of the body.
The nervous system is constituted of a special type of cell, called neuron or nerve cell, which has
special structures allowing it to receive or send signals to other neurons.
Neurons connect with each other to transmit signals to or receive signals from other neurons. This
structure essentially forms a network of neurons or a neural network.

The biological neural network is a massively large and complex parallel computing network. It is because
of this massive parallel computing network that the nervous system helps human beings perform actions
or take decisions at a speed and with an ease that even the fastest supercomputer in the world would envy.
For example, think of the superb flying catches taken by fielders in the cricket world cup; they are made
possible by this massively parallel complex network, i.e. the biological neural network.

The fascinating capability of the biological neural network has inspired the inception of artificial
neural network (ANN).

An ANN is made up of artificial neurons. An ANN is a machine designed to model the functioning of
the nervous system or, more specifically, the neurons. The only difference is that the biological form
of neuron is replicated in the electronic or digital form of neuron. Digital neurons or artificial neurons
form the smallest processing units of the ANNs.

Types of Neural Networks:


1. Feed Forward Neural Networks --> used for supervised learning in cases where the data to be learned is neither sequential nor time-dependent.
2. Recurrent Neural Networks --> used in speech recognition and natural language processing.
3. Convolutional Neural Networks --> useful for finding patterns in images to recognize objects, classes, and categories.

Advantages of ANN:
A. Parallel processing abilities
B. Information storage
C. Non-linearity
D. Fault Tolerance
E. Unrestricted input variables
F. Observation-based decision
G. Unorganized data processing
H. Ability to learn hidden relationship

UNDERSTANDING THE BIOLOGICAL NEURON:

The human nervous system has two main parts –


⮚ The central nervous system (CNS) consisting of the brain and spinal cord
⮚ The peripheral nervous system consisting of nerves and ganglia outside the brain and spinal cord.

The CNS integrates all information, in the form of signals, from the different parts of the body. The
peripheral nervous system, on the other hand, connects the CNS with the limbs and organs. Neurons are
basic structural units of the CNS. A neuron is able to receive, process, and transmit information in the
form of chemical and electrical signals.

“Neurons are the fundamental unit of the nervous system specialized to transmit information to
different parts of the body.” This is carried out in both chemical and electrical forms.
It has three main parts to carry out its primary functionality of receiving and transmitting information:
1. Dendrites – to receive signals from neighbouring neurons.
2. Soma – main body of the neuron which accumulates the signals coming from the different dendrites.
It ‘fires’ when a sufficient amount of signal is accumulated.
3. Axon – last part of the neuron which receives signal from soma, once the neuron ‘fires’, and passes it
on to the neighbouring neurons through the axon terminals (to the adjacent dendrite of the neighbouring
neurons).
There is a very small gap between the axon terminal of one neuron and the adjacent dendrite of the
neighbouring neuron. This small gap is known as synapse. The signals transmitted through synapse may
be excitatory or inhibitory.
The axon of a human neuron is 10–12 μm in diameter. Each synapse spans a gap of about a millionth of
an inch wide.
Note: Soma contains nucleus and other parts of the cell needed to sustain its life.

FIG. 10.1 Structure of a biological neuron (typical diagram of a biological neural network)

What is Artificial Neural Network?

The term "Artificial Neural Network" is derived from Biological neural networks that develop the
structure of a human brain. Similar to the human brain that has neurons interconnected to one another,
artificial neural networks also have neurons that are interconnected to one another in various layers of
the networks. These neurons are known as nodes.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell
nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Relationship between Biological neural network and artificial neural network:

An Artificial Neural Network, in the field of Artificial Intelligence, attempts to mimic the network of
neurons that makes up the human brain so that computers get the ability to understand things and make
decisions in a human-like manner.
The basic structure of a neural network consists of three main components: the input layer, the hidden layer, and the
output layer. A neural network can have one or multiple input, hidden, or output layers depending on complexity.

Information is received in the input layer, and the input node processes it, decides how to categorise it, and transfers it to the
next layer: the hidden layer.

A hidden layer receives information from the input layer or from other hidden layers. The number of hidden layers
varies with the type of neural network being used. At this point in the process, hidden layers take the input, process the
information from the previous layer, and then pass it on to the next layer, either another hidden layer or the output layer.

The output layer is the final layer in a neural network. After receiving the data from the hidden layer (or layers), the output layer
processes it and produces the output value.

EXPLORING THE ARTIFICIAL NEURON:

The biological neural network has been modelled in the form of ANN with artificial neurons simulating
the function of biological neurons.

FIG. 10.2 Structure of an artificial neuron

In Figure 10.2, input signals xi (x1, x2, …, xn) come to an artificial neuron. Each neuron has three major
components:

1. A set of ‘i’ synapses having weights wi. A signal xi forms the input to the i-th synapse having weight
wi. The value of a weight w may be positive or negative. A positive weight has an excitatory effect, while a
negative weight has an inhibitory effect on the output of the summation junction, ysum.
(Note: Excitatory effects on the neuron mean they increase the likelihood that the neuron will fire an action potential.
Inhibitory effects on the neuron mean they decrease the likelihood that the neuron will fire an action potential.)

2. A summation junction for the input signals, which are weighted by the respective synaptic weights. Because it
is a linear combiner or adder of the weighted input signals, the output of the summation junction, ysum,
can be expressed as follows:

ysum = Σ (i = 1 to n) wi · xi

[Note: Typically, a neural network also includes a bias which adjusts the input of the activation function. However, for the
sake of simplicity, we are ignoring bias for the time being. In the case of a bias ‘b’, the value of ysum would have been as
follows:

ysum = b + Σ (i = 1 to n) wi · xi ]

3. A threshold activation function (or simply activation function, also called squashing function)
results in an output signal only when an input signal exceeding a specific threshold value comes as an
input. It is similar in behavior to the biological neuron which transmits the signal only when the total
input signal meets the firing threshold.

[Note: An activation function is a function that is added into an artificial neural network in order to help the
network learn complex patterns in the data. When comparing with a neuron-based model that is in our brains,
the activation function is at the end deciding what is to be fired to the next neuron.]

Output of the activation function, yout, can be expressed as follows:

yout = f(ysum)

Activation functions are an integral building block of neural networks that enable them to learn complex patterns in data.
They transform the input signal of a node in a neural network into an output signal that is then passed on to the next layer.
Without activation functions, neural networks would be restricted to modelling only linear relationships between inputs and outputs.
By introducing non-linear behaviours, activation functions enable neural networks to learn non-linear relationships, which greatly
increases their flexibility and power to model complex and nuanced data.

Fig: Neuron

Input Layer: As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer: The hidden layer is present in-between the input and output layers. It performs all the
calculations to find hidden features and patterns.
Output Layer: The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and includes a
bias. This computation is represented in the form of a transfer function.

The determined weighted total is passed as an input to an activation function to produce the output.
Activation functions decide whether a node should fire or not. Only the nodes that fire pass their signals
on towards the output layer.
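A small sketch of this computation in Python/NumPy; the weights, bias, and the choice of a step activation here are illustrative assumptions, not the text's prescription:

```python
import numpy as np

def step(x):
    """Fire (output 1) only when the weighted total is non-negative."""
    return np.where(x >= 0, 1, 0)

# Illustrative values (not from the text): three inputs, their weights, and a bias.
x = np.array([0.2, 0.7, 0.1])
w = np.array([0.4, -0.3, 0.9])
b = -0.05

y_sum = np.dot(w, x) + b      # transfer function: weighted sum of the inputs plus bias
y_out = step(y_sum)           # activation function decides whether the node fires
print(y_sum, y_out)
```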

Types of Activation Functions:


There are different types of activation functions. The most commonly used activation functions are:
1. Identity function
Identity function is used as an activation function for the input layer. It is a linear function of the
form f(x) = x, i.e. the output remains the same as the input.

2. Threshold/step function
Step/threshold function is a commonly used activation function. As depicted in Figure 10.3a, the step
function gives 1 as output if the input is either 0 or positive. If the input is negative, the step function
gives 0 as output. Expressing mathematically,

f(x) = 1 if x ≥ 0; f(x) = 0 if x < 0

Fig: 10.3 Step and Threshold functions

The threshold function (depicted in Fig. 10.3b) is almost like the step function, with the only difference
being the fact that θ is used as a threshold value instead of 0. Expressing mathematically,

f(x) = 1 if x ≥ θ; f(x) = 0 otherwise
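Both definitions can be transcribed directly into Python (θ is whatever threshold value the designer chooses; the sample values below are illustrative):

```python
def step(x):
    """Step function: 1 for input >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def threshold(x, theta):
    """Threshold function: same as step, but compared against theta instead of 0."""
    return 1 if x >= theta else 0

print(step(-0.3), step(0.0), step(2.5))                        # 0 1 1
print(threshold(0.4, theta=0.5), threshold(0.6, theta=0.5))    # 0 1
```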
3. ReLU (Rectified Linear Unit) function
ReLU is the most popularly used activation function in the areas of convolutional neural networks and
deep learning. It is of the form

f(x) = max(0, x)

This means that f(x) is zero when x is less than zero and f(x) is equal to x when x is greater than or equal to
zero. Figure 10.4 depicts the curve for a ReLU activation function.

FIG. 10.4 ReLU function

This function is differentiable, except at a single point x = 0.


In that sense, the derivative of a ReLU is actually a sub-derivative.
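The same definition in code; the choice of 0 as the sub-derivative at x = 0 is a common convention and an assumption here, not something fixed by the text:

```python
def relu(x):
    """ReLU: f(x) = max(0, x)."""
    return x if x > 0 else 0.0

def relu_derivative(x):
    """Sub-derivative of ReLU: 1 for x > 0, 0 for x < 0; at x = 0 we pick 0 by convention."""
    return 1.0 if x > 0 else 0.0

print(relu(-2.0), relu(3.5))                          # 0.0 3.5
print(relu_derivative(-2.0), relu_derivative(3.5))    # 0.0 1.0
```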

4. Sigmoid function

It is the most commonly used activation function in neural networks. The need for the sigmoid function stems
from the fact that many learning algorithms require the activation function to be differentiable and hence
continuous. The step function is not suitable in those situations as it is not continuous. There are two types of
sigmoid function:
1. Binary sigmoid function
2. Bipolar sigmoid function

a. Binary Sigmoid Function:

f(x) = 1 / (1 + e^(−kx))

where k = steepness or slope parameter of the sigmoid function. By varying the value of k, sigmoid
functions with different slopes can be obtained. It has a range of (0, 1).
Fig 10.5: Sigmoid Function

The slope at origin is k/4. As the value of k becomes very large, the sigmoid function becomes a
threshold function.
b. Bipolar Sigmoid Function:
A bipolar sigmoid function, depicted in Figure 10.5b, is of the form

f(x) = (1 − e^(−kx)) / (1 + e^(−kx))
The range of values of sigmoid functions can be varied depending on the application. However, the
range of (−1, +1) is most commonly adopted.
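A sketch of both sigmoid variants with the steepness parameter k, using the standard forms reconstructed above (k defaults to 1 here as an assumption):

```python
import math

def binary_sigmoid(x, k=1.0):
    """Binary sigmoid: 1 / (1 + e^(-kx)), output range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k * x))

def bipolar_sigmoid(x, k=1.0):
    """Bipolar sigmoid: (1 - e^(-kx)) / (1 + e^(-kx)), output range (-1, +1)."""
    return (1.0 - math.exp(-k * x)) / (1.0 + math.exp(-k * x))

print(binary_sigmoid(0.0))                           # 0.5 (the slope at the origin is k/4)
print(binary_sigmoid(2.0, k=10.0))                   # close to 1: a large k approaches a threshold function
print(bipolar_sigmoid(-1.0), bipolar_sigmoid(1.0))   # symmetric about 0, values in (-1, +1)
```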


5. Hyperbolic tangent function:


Hyperbolic tangent function is another continuous activation function, which is bipolar in nature. It is a
widely adopted activation function for a special type of neural network known as the backpropagation
network (discussed elaborately in Section 10.8). The hyperbolic tangent function is of the form

f(x) = (e^x − e^(−x)) / (e^x + e^(−x))

This function is similar to the bipolar sigmoid function.
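In code, together with a quick numerical check that tanh(x) coincides with the bipolar sigmoid taken with k = 2, which follows from the two formulas above; the sample input is arbitrary:

```python
import math

def tanh_activation(x):
    """Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x), output range (-1, +1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def bipolar_sigmoid(x, k=1.0):
    return (1.0 - math.exp(-k * x)) / (1.0 + math.exp(-k * x))

x = 0.75
print(tanh_activation(x), math.tanh(x))   # same value; matches the library function
print(bipolar_sigmoid(x, k=2.0))          # tanh(x) equals the bipolar sigmoid with k = 2
```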


In some cases, it is desirable to have values ranging from −1 to +1. In that case, there will be a need to
reframe the activation function. For example, in the case of the step function, the revised definition would
be as follows:

f(x) = +1 if x ≥ 0; f(x) = −1 if x < 0
Properties of Neural Networks

• Many neuron-like threshold switching units


• Many weighted interconnections among units
• Highly parallel, distributed processing
• Emphasis on tuning weights automatically
• Input is high-dimensional discrete or real-valued (e.g., sensor input)
EARLY IMPLEMENTATIONS OF ANN:
1. McCulloch–Pitts model of neuron:
The McCulloch–Pitts neural model (depicted in Fig. 10.6), which was the earliest ANN model, has only
two types of inputs – excitatory and inhibitory. The excitatory inputs have weights of positive
magnitude and the inhibitory inputs have weights of negative magnitude. The inputs of the
McCulloch–Pitts neuron could be either 0 or 1. It has a threshold function as its activation function.
So, the output signal yout is 1 if the input ysum is greater than or equal to a given threshold value,
else 0.
Simple McCulloch–Pitts neurons can be used to design logical operations. For that purpose, the
connection weights need to be correctly decided along with the threshold function (rather the threshold
value of the activation function). Let us take a small example.

FIG. 10.6 McCulloch–Pitts neuron

Ex:
John carries an umbrella if it is sunny or if it is raining.
There are four given situations. We need to decide when John will carry the umbrella. The situations are
as follows:
● Situation 1 – It is not raining nor is it sunny.
● Situation 2 – It is not raining, but it is sunny.
● Situation 3 – It is raining, and it is not sunny.
● Situation 4 – Wow, it is so strange! It is raining as well as it is sunny.
To analyse the situations using the McCulloch–Pitts neural model, we can consider the input signals as
follows:
● x 1→ Is it raining?
● x 2→ Is it sunny?
So, the value of both x1 and x2 can be either 0 or 1. We can use the value of both weights w1 and w2 as 1
and a threshold value of the activation function as 1. So, the neural model will look as shown in Figure 10.7a.

FIG. 10.7 McCulloch–Pitts neural model (illustration)


Formally, we can say,

ysum = x1 + x2, and yout = 1 if ysum ≥ 1, else 0

The truth table built with respect to the problem is depicted in Figure 10.7b. From the truth table, we can
conclude that in the situations where the value of y is 1, John needs to carry an umbrella. Hence, he will
need to carry an umbrella in situations 2, 3, and 4. Surprised, as it looks like a typical logic problem
related to ‘OR’ function? Do not worry, it is really an implementation of logical OR using the
McCulloch–Pitts neural model.
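The same umbrella/OR example as a McCulloch–Pitts neuron in Python, using exactly the weights (1 and 1) and threshold (1) chosen above:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts neuron: output 1 if the weighted input sum reaches the threshold."""
    y_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_sum >= threshold else 0

# x1 = "Is it raining?", x2 = "Is it sunny?"; weights 1, 1 and threshold 1 implement logical OR.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = mcculloch_pitts([x1, x2], weights=[1, 1], threshold=1)
        print(f"raining={x1} sunny={x2} -> carry umbrella: {y}")
```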

2. Rosenblatt’s perceptron

Perceptron is a linear classifier which helps to classify the given input data.

Rosenblatt’s perceptron is built around the McCulloch–Pitts neural model. The perceptron, as depicted
in Figure 10.8, receives a set of inputs x1, x2, …, xn. The linear combiner or the adder node computes
the linear combination of the inputs applied to the synapses, with synaptic weights being w1, w2, …, wn.
Then, the hard limiter checks whether the resulting sum is positive or negative. If the input of the
hard limiter node is positive, the output is +1, and if the input is negative, the output is −1.
Mathematically, the hard limiter input is

ysum = Σ (i = 1 to n) wi · xi

However, the perceptron includes an adjustable value or bias as an additional weight w0. This additional
weight w0 is attached to a dummy input x0, which is always assigned a value of 1.
This consideration modifies the above equation to

ysum = Σ (i = 0 to n) wi · xi, with x0 = 1

The output is decided by the expression

yout = +1 if ysum > 0; yout = −1 if ysum < 0

The objective of perceptron is to classify a set of inputs into two classes, c1 and c2. This can be done
using a very simple decision rule – assign the inputs x0, x1, x2 , …, xn to c1 if the output of the
perceptron, i.e. yout , is +1 and c2 if yout is −1. So, for an n-dimensional signal space, i.e. a space for ‘n’
input signals x0, x1, x2, …, xn , the simplest form of perceptron will have two decision regions,
resembling two classes, separated by a hyperplane defined by

Σ (i = 0 to n) wi · xi = 0
FIG. 10.8 Rosenblatt’s perceptron

Therefore, for two input signals denoted by variables x1 and x2, the decision boundary is a straight line
of the form

w0 + w1·x1 + w2·x2 = 0

So, for a perceptron having the values of synaptic weights w0, w1, and w2 as −2, ½, and ¼, respectively,
the linear decision boundary will be of the form

−2 + (½)·x1 + (¼)·x2 = 0, i.e. 2·x1 + x2 = 8
So, any point (x1, x2) which lies above the decision boundary, as depicted by Figure 10.9, will be
assigned to class c1 and the points which lie below the boundary are assigned to class c2.

FIG. 10.9 Perceptron decision boundary

Let us examine if this perceptron is able to classify a set of points given below:
p1 = (5, 2) and p2 = (−1, 12) belonging to c1
p3 = (3, −5) and p4 = (−2, −1) belonging to c2
As depicted in Figure 10.10, we can see that on the basis of the activation function output, only points p1
and p2 generate an output of +1. Hence, they are assigned to class c1 as expected.
On the other hand, points p3 and p4, having a negative hard limiter input, generate an output of
−1. Hence, they are assigned to class c2, again as expected.

FIG. 10.10 Class assignment through perceptron


The same classification is obtained by mapping the points in
the input space, as shown in Figure 10.11.

FIG. 10.11 Classification by decision boundary


Thus, we can see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for two-dimensional space), decision
planes (for three-dimensional space), or decision hyperplanes (for n-dimensional space).
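A short sketch that checks the four points against the perceptron with w0 = −2, w1 = ½, w2 = ¼ (bias input x0 = 1; output +1 is class c1 and −1 is class c2):

```python
def perceptron(x1, x2, w0=-2.0, w1=0.5, w2=0.25):
    """Rosenblatt perceptron with bias weight w0 on a dummy input x0 = 1."""
    y_sum = w0 * 1 + w1 * x1 + w2 * x2
    return +1 if y_sum > 0 else -1          # hard limiter output

points = {"p1": (5, 2), "p2": (-1, 12), "p3": (3, -5), "p4": (-2, -1)}
for name, (x1, x2) in points.items():
    label = "c1" if perceptron(x1, x2) == +1 else "c2"
    print(name, label)                      # p1, p2 -> c1; p3, p4 -> c2
```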

Appropriate values of the synaptic weights w0 , w1 , w2 , …, wn can be obtained by training a


perceptron. However, one assumption for perceptron to work properly is that the two classes should
be linearly separable (as depicted in Figure 10.12a), i.e. the classes should be sufficiently separated
from each other. Otherwise, if the classes are non-linearly separable (as depicted in Figure 10.12b),
then the classification problem cannot be solved by perceptron.

FIG. 10.12 Class separability


A) Multi-layer perceptron
A basic perceptron works very successfully for data sets which possess linearly separable patterns.
However, in practical situations that is rarely the case. This was exactly the point driven home by
Minsky and Papert in their work (1969). They showed that a basic perceptron is not able to learn to
compute even a simple 2-bit XOR. Why is that so? Let us try to understand.

Figure 10.13 is the truth table highlighting the output of a 2-bit XOR function.

As the figure shows, the data is not linearly separable; only a curved decision boundary can separate the
classes properly.
To address this issue, the other option is to use two decision lines in place of one. Figure 10.14
shows how a linear decision boundary with two decision lines can clearly partition the data.

FIG. 10.14 Classification with two decision lines in XOR function output
This is the philosophy used to design the multi-layer perceptron model. The major highlights of this
model are as follows:
● The neural network contains one or more intermediate layers between the input and the output nodes,
which are hidden from both input and output nodes.
● Each neuron in the network includes a non-linear activation function that is differentiable.
● The neurons in each layer are connected with some or all the neurons in the previous layer.
The diagram in Figure 10.15 resembles a fully connected multi-layer perceptron with multiple
hidden layers between the input and output layers. It is called fully connected because any neuron in
any layer of the perceptron is connected with all neurons (or input nodes in the case of the first hidden
layer) in the previous layer. The signals flow from one layer to another layer from left to right.

FIG. 10.15 Multi-layer perceptron
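A hand-wired illustration of why one hidden layer is enough for the 2-bit XOR: one hidden threshold unit computes OR, another computes NAND, and the output unit ANDs them, which plays the role of the two decision lines of Figure 10.14. The weights and biases below are assumed values chosen for the illustration, not taken from the text.

```python
def step(x):
    return 1 if x >= 0 else 0

def xor_mlp(x1, x2):
    """Two-input XOR via one hidden layer: h1 = OR, h2 = NAND, output = AND(h1, h2).
    Weights and biases are hand-chosen for illustration."""
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)     # OR: fires when at least one input is 1
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)    # NAND: fires unless both inputs are 1
    return step(1.0 * h1 + 1.0 * h2 - 1.5)   # AND of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))     # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```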


ADALINE network model
Adaptive Linear Neural Element (ADALINE) is an early single-layer ANN developed by Professor
Bernard Widrow of Stanford University. As depicted in Figure 10.16, it has only one
output neuron. The output value can be +1 or −1. A bias input x0 (where x0 = 1) having a weight w0 is
added. The activation function is such that if the weighted sum is positive or 0, then the output is 1, else
it is −1. Formally, we can say,

yout = +1 if Σ (i = 0 to n) wi · xi ≥ 0; yout = −1 otherwise

FIG. 10.16 ADALINE network

The supervised learning algorithm adopted by the ADALINE network is known as Least Mean
Square (LMS) or Delta rule.
A network combining a number of ADALINEs is termed as MADALINE (many ADALINE).
MADALINE networks can be used to solve problems related to nonlinear separability.

Note: Both perceptron and ADALINE are neural network models. Both of them are classifiers for binary
classification. They have linear decision boundary and use a threshold activation function.
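A compact sketch of an ADALINE trained with the LMS (Delta) rule; the AND-gate data with ±1 targets, the learning rate, and the epoch count are illustrative assumptions:

```python
import numpy as np

# Inputs with a bias column x0 = 1; targets are +1 / -1 (an AND gate, chosen for illustration).
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)

w = np.zeros(3)     # weights w0 (bias), w1, w2
alpha = 0.1         # learning rate

for epoch in range(100):
    for x_i, t_i in zip(X, t):
        y_lin = np.dot(w, x_i)                # ADALINE adapts on the linear output ...
        w += alpha * (t_i - y_lin) * x_i      # ... using the LMS / Delta rule

y_out = np.where(X @ w >= 0, 1, -1)           # the threshold is applied only when classifying
print(w, y_out)                               # y_out should reproduce the targets
```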
Architectures of Neural Network:
ANN is a computational system consisting of a large number of interconnected units called artificial
neurons. The connection between artificial neurons can transmit signal from one neuron to another.
Some of the choices are listed below:
● There may be just two layers of neurons in the network – the input and output layer.
● Other than the input and output layers, there may be one or more intermediate ‘hidden’ layers of
neurons.
● The neurons may be connected with one or more of the neurons in the next layer.
● The neurons may be connected with all neurons in the next layer.
● There may be single or multiple output signals. If there are multiple output signals, they might be
connected with each other.
● The output from one layer may become input to neurons in the same or preceding layer.

The main architectures of artificial neural networks, considering the neuron disposition, how they are
interconnected and how its layers are composed, can be divided as follows:
1. Single-layer feedforward network
2. Multi-layer feedforward networks
3. Competitive network
4. Recurrent or feedback networks

a) Single-layer feed forward network:

It consists of only two layers as depicted in Figure 10.17 – the input layer and the output layer.
The input layer consists of a set of ‘m’ input neurons X1 , X2 , …, Xm connected to each of the ‘n’
output neurons Y1 , Y2 , …, Yn . The connections carry weights w11 , w12 , …, wmn . The input layer
of neurons does not conduct any processing – they pass the input signals to the output neurons.
The computations are performed only by the neurons in the output layer. So, though it has two
layers of neurons, only one layer is performing the computation. This is the reason why the network
is known as single layer in spite of having two layers of neurons. Also, the signals always flow from
the input layer to the output layer. Hence, this network is known as feed forward.
The net signal input to the output neurons is given by

ysum_k = Σ (i = 1 to m) xi · wik

for the k-th output neuron.


The signal output from each output neuron will depend on the activation function used.

FIG. 10.17 Single-layer feed forward
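A sketch of this computation with a weight matrix W of shape m × n and a step activation; all the values below are illustrative assumptions:

```python
import numpy as np

def step(x):
    return np.where(x >= 0, 1, 0)

m, n = 3, 2                       # m input neurons, n output neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(m, n))       # W[i, k] connects input neuron i to output neuron k

x = np.array([0.5, -1.0, 2.0])    # signals presented at the input layer (passed through unchanged)
y_sum = x @ W                     # net input to each output neuron: sum_i x_i * w_ik
y_out = step(y_sum)               # the output depends on the chosen activation function
print(y_sum, y_out)
```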

b) Multi-layer feed forward ANNs:


The multi-layer feed forward network is quite similar to the single-layer feed forward network,
except for the fact that there are one or more intermediate layers of neurons between the input
and the output layers. Hence, the network is termed as multi-layer. The structure of this network is
depicted in Figure 10.18.

FIG. 10.18 Multi-layer feed forward

A single perceptron can express only linear decision surfaces.

In contrast, multi-layer networks learned by the BACKPROPAGATION algorithm are capable of expressing
a rich variety of non-linear decision surfaces. A multi-layer network helps the model learn better from such
complex arrangements of the data points.

Each of the layers may have varying number of neurons.


For example, the one shown in Figure 10.18 has ‘m’ neurons in the input layer and ‘r’ neurons in the
output layer, and there is only one hidden layer with ‘n’ neurons.

The net signal input to the neuron in the hidden layer is given by

for the k-th hidden layer neuron.

The net signal input to the neuron in the output layer is given by

for the k-th output layer neuron

c) Competitive network:

The competitive network is almost the same in structure as the single-layer feed forward network.
The only difference is that the output neurons are connected with each other (either partially or
fully). Figure 10.19 depicts a fully connected competitive network.

In competitive networks, for a given input, the output neurons compete amongst themselves to
represent the input. It represents a form of unsupervised learning algorithm in ANN that is suitable to
find clusters in a data set.

FIG. 10.19 Competitive network


d) Recurrent network:
FIG. 10.20 Recurrent neural network
We have seen that in feed forward networks, signals always flow from the input layer towards the output
layer (through the hidden layers in the case of multi-layer feed forward networks), i.e. in one direction.
In the case of recurrent neural networks, there is a small deviation. There is a feedback loop, as depicted
in Figure 10.20, from the neurons in the output layer to the input layer neurons. There may also be
self-loops.
Neural networks imitate the mechanism of the human brain.

A neural network is composed of nodes, whereas the brain is composed of neurons.


Reference: Neural Network using Matlab (YouTube)
b is the bias, which is associated with the storage of information.
x1, x2, x3 are the signals entering the node and y is the output.
w1, w2, w3 are the corresponding weights of the signals.

With the example weights used there, the signal x2 has five times more effect than x1.
If we assign w1 = 0, then the x1 signal will not be transmitted, i.e. x1 will not have any effect on the
network.
Here a linear activation function is used (i.e. the output is the same as the input):

v = Wx + b

Practically, a linear activation function is not a good choice, because with linear activations a multi-layer
neural network mathematically collapses into a single-layer neural network, i.e. the hidden layers become
ineffective when a linear activation function is used for them.
Supervised Learning with training data:

In machine learning, we modify the model, whereas in a neural network, we modify the weights.
A neural network stores information in terms of weights.
The systematic way of modifying the weights is called the ‘learning rule’. The representative learning
rule of a single-layer neural network is the ‘Delta Rule’.
ei is the error of node i, and alpha is the learning rate (between 0 and 1).
If the alpha value is too low, the weights converge towards the solution very slowly.
If the alpha value is too high, the output wanders around the expected solution and may fail to converge.

Passing all the training data once through steps 2 to 5 of the procedure is called an epoch, i.e. one training
iteration over all the data represents one epoch.
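A sketch of one possible single-layer training loop using the delta rule for a sigmoid output node; the toy data, the two learning rates being compared, and the epoch count are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Toy data (illustrative): three inputs per sample, one sigmoid output node.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 1, 1], dtype=float)

def train(alpha, epochs=1000):
    w = np.zeros(3)
    for _ in range(epochs):                  # one pass over all the training data = one epoch
        for x, t in zip(X, T):
            y = sigmoid(np.dot(w, x))
            e = t - y                        # error of the output node
            delta = y * (1 - y) * e          # delta-rule term for a sigmoid node
            w += alpha * delta * x           # weight update scaled by the learning rate alpha
    return w

for alpha in (0.01, 0.9):
    w = train(alpha)
    print(alpha, np.round(sigmoid(X @ w), 3))   # the smaller alpha converges much more slowly
```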

MATLAB Approach:


LEARNING PROCESS IN ANN:

What is learning in the context of ANNs?


There are four major aspects which need to be decided:
1. The number of layers in the network
2. The direction of signal flow
3. The number of nodes in each layer
4. The value of weights attached with each interconnection between neurons

1. Number of layers:
A neural network may have a single layer or multi-layer.
In the case of a single layer, a set of neurons in the input layer receives signal, i.e. a single feature per
neuron, from the data set. The value of the feature is transformed by the activation function of the
input neuron. The signals processed by the neurons in the input layer are then forwarded to the
neurons in the output layer. The neurons in the output layer use their own activation function to
generate the final prediction.
More complex networks may be designed with multiple hidden layers between the input layer and
the output layer. Most of the multi-layer networks are fully connected.
2. Direction of signal flow:
In certain networks, termed as feed forward networks, signal is always fed in one direction, i.e. from
the input layer towards the output layer through the hidden layers, if there is any.
However, certain networks, such as the recurrent network, also allow signals to travel from the output
layer to the input layer.
This is also an important consideration for choosing the correct learning model.
3. Number of nodes in layers:
In the case of a multi-layer network, the number of nodes in each layer can be varied. However, the
number of nodes or neurons in the input layer is equal to the number of features of the input data
set. Similarly, the number of output nodes will depend on possible outcomes, e.g. number of classes in
the case of supervised learning. So, the number of nodes in each of the hidden layers is to be chosen
by the user. A larger number of nodes in the hidden layer helps in improving the performance.
However, too many nodes may result in overfitting as well as increased computational expense.
4. Weight of interconnection between neurons:
For solving a learning problem using ANN, we can start with a set of values for the synaptic weights
and keep doing changes to those values in multiple iterations. In the case of supervised learning, the
objective to be pursued is to reduce the number of misclassifications. Ideally, the iterations for
making changes in weight values should be continued till there is no misclassification.
However, in practice, such a stopping criterion may not be possible to achieve. Practical stopping
criteria may be that the rate of misclassification falls below a specific threshold value, say 1%, or that the
number of iterations reaches a maximum, say 25, etc. There may be other practical
challenges to deal with, such as the rate of misclassification not reducing progressively. This may
become a bigger problem when the number of interconnections and hence the number of weights
keeps increasing. There are ways to deal with those challenges, which we will see in more details in the
next section.
So, to summarize, learning process using ANN is a combination of multiple aspects – which include
deciding the number of hidden layers, number of nodes in each of the hidden layers, direction of signal
flow, and last but not the least, deciding the connection weights.

Multi-layer feed forward network is a commonly adopted architecture. It has been observed
that a neural network with even one hidden layer can be used to reasonably approximate any
continuous function. The learning method adopted to train a multi-layer feed forward network is
termed as backpropagation, which we will study in the next section.

Backpropagation:
In 1986, an efficient method of training an ANN was discovered. In this method, errors, i.e. difference
in output values of the output layer and the expected values, are propagated back from the output
layer to the preceding layers. Hence, the algorithm implementing this method is known as
backpropagation, i.e. propagating the errors backward to the preceding layers.
The backpropagation algorithm is applicable for multi-layer feed forward networks. It is a supervised
learning algorithm which continues to adjust the weights of the connected neurons with an objective
to reduce the deviation of the output signal from the target output. This algorithm consists of
multiple iterations, also known as epochs. Each epoch consists of two phases –
● A forward phase in which the signals flow from the neurons in the input layer to the neurons in
the output layer through the hidden layers. The weights of the interconnections and activation
functions are used during the flow. In the output layer, the output signals are generated.
● A backward phase in which the output signal is compared with the expected value. The
computed errors are propagated backwards from the output to the preceding layers. The errors
propagated back are used to adjust the interconnection weights between the layers.
The iterations continue till a stopping criterion is reached.
One main part of the algorithm is adjusting the interconnection weights. This is done using a
technique termed gradient descent. In simple terms, the algorithm calculates the partial derivative
of the cost function with respect to each interconnection weight to identify the ‘gradient’, or extent of change
of the weight required, to minimize the cost function. Quite understandably, therefore, the activation
function needs to be differentiable.
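A compact sketch of these two phases for a network with one hidden layer and sigmoid activations, trained on XOR by gradient descent; the architecture, initial weights, learning rate, and epoch count are assumptions for illustration, not the text's prescription:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# XOR training data (illustrative choice; any labelled data set could be used).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden weights and bias
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output weights and bias
alpha = 0.5                                     # learning rate

for epoch in range(10000):
    # Forward phase: signals flow from the input layer to the output layer through the hidden layer.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # Backward phase: compare output with target, propagate errors back,
    # and adjust the weights by gradient descent on the squared error.
    delta_out = (T - Y) * Y * (1 - Y)             # error term at the output layer
    delta_hid = (delta_out @ W2.T) * H * (1 - H)  # error term propagated back to the hidden layer
    W2 += alpha * H.T @ delta_out
    b2 += alpha * delta_out.sum(axis=0)
    W1 += alpha * X.T @ delta_hid
    b1 += alpha * delta_hid.sum(axis=0)

print(np.round(Y, 2))   # should move close to the XOR targets [0, 1, 1, 0]
```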

FIG. 10.21 Backpropagation algorithm


We have already seen that multi-layer neural networks have multiple hidden layers. During the
learning phase, the interconnection weights are adjusted on the basis of the errors generated by the
network, i.e. difference in the output signal of the network vis-à-vis the expected value. These errors
generated at the output layer are propagated back to the preceding layers. Because of the backward
propagation of errors which happens during the learning phase, these networks are also called back-
propagation networks or simply backpropagation nets. One such backpropagation net with one hidden
layer is depicted in Figure 10.22. In this network, X is the bias input to the hidden layer and Y is the bias
input to the output layer.
The net signal input to the hidden layer neurons is given by

for the k-th neuron in the hidden layer. If f is the activation function of the hidden layer, then
FIG. 10.22 Backpropagation net
The net signal input to the output layer neurons is given by

for the k-th neuron in the output layer. Note that the input signals to X and Y are assumed to be 1. If f is
the activation function of the output layer, then

If tk is the target output of the k-th output neuron, then the cost function defined as the squared error
of the output layer is given by

E = (1/2) · Σk (tk − yout_k)²

So, as a part of the gradient descent algorithm, the partial derivative of the cost function E has to be taken
with respect to each of the interconnection weights.
Mathematically, it can be represented as follows:

∂E / ∂wjk

for the interconnection weight between the j-th neuron in the hidden layer and the k-th neuron in the
output layer. This expression can be deduced to
If we assume Δwjk as the component of the weight adjustment needed for
weight wjk corresponding to the k-th output neuron, then

On the basis of this, the weights and bias need to be updated as follows:

Note that ‘α’ is the learning rate of the neural network.


In the same way, we can perform the calculations for the interconnection weights between the input
and hidden layers.
The weights and bias for the interconnection between the input and hidden layers need to be
updated as follows:

Note 1: There are two types of gradient descent algorithms. When the full data set is used in one shot to
compute the gradient, it is known as full batch gradient descent. In the case of stochastic gradient
descent, which is also known as incremental gradient descent, smaller samples of the data are taken
iteratively and used in the gradient computation.
The randomness introduced by sampling makes stochastic gradient descent less likely to get stuck in a
local minimum, and thus it tries to identify the global minimum.
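A schematic contrast of the two update styles on a tiny least-squares problem; the data and the linear model used as the cost are stand-ins chosen for illustration:

```python
import numpy as np

def gradient(w, X, t):
    """Gradient of the squared error of a linear model y = X @ w (a stand-in cost for illustration)."""
    return X.T @ (X @ w - t) / len(t)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # bias column plus one feature
t = np.array([1.0, 2.0, 3.0])                        # targets generated by y = 1 + x
alpha = 0.1

# Full batch gradient descent: one update per epoch, computed from the whole data set.
w_batch = np.zeros(2)
for _ in range(200):
    w_batch -= alpha * gradient(w_batch, X, t)

# Stochastic (incremental) gradient descent: one update per sample.
w_sgd = np.zeros(2)
for _ in range(200):
    for i in range(len(t)):
        w_sgd -= alpha * gradient(w_sgd, X[i:i+1], t[i:i+1])

print(np.round(w_batch, 3), np.round(w_sgd, 3))      # both approach w = [1, 1]
```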
Note 2:
A real-world simile for the gradient descent algorithm is a blind person trying to come down from a hill
top without anyone to assist. Since the person cannot see the path of descent, the only option is to check
in which direction the ground feels to slope downward.
One challenge of this approach arises when the person reaches a point which feels to be the lowest, as
all the points surrounding it are higher, but in reality it is not so. Such a point, which is a local
minimum and not the global minimum, may be deceiving and can stall the algorithm before it reaches the real
global minimum.

• The BACKPROPAGATION Algorithm learns the weights for a multilayer network, given a
network with a fixed set of units and interconnections.
• It employs gradient descent to attempt to minimize the squared error between the network
output values and the target values for these outputs.
• In BACKPROPAGATION algorithm, we consider networks with multiple output units rather than
single units as before, so we redefine E to sum the errors over all of the network output units.
DEEP LEARNING:

In a multi-layer neural network, as we keep increasing the number of hidden layers, the computation
becomes very expensive. Going beyond two to three layers becomes quite difficult computationally.
The only way to handle such intense computation is by using graphics processing unit (GPU)
computing.
When we have a smaller number of hidden layers – at most two to three – it is a normal
neural network, which is sometimes given the fancy name ‘shallow neural network’. However, when
the number of layers increases, it is termed a deep neural network. One of the earliest deep neural
networks had three hidden layers. Deep learning is a more contemporary branding of deep neural
networks, i.e. multi-layer neural networks having more than three layers.
Deep Learning, on the other hand, is just a type of Machine Learning, inspired by the
structure of a human brain. Deep learning algorithms attempt to draw similar conclusions
as humans would by continually analyzing data with a given logical structure. To achieve
this, deep learning uses a multi-layered structure of algorithms called neural networks.
Neural networks enable us to perform many tasks, such as clustering,
classification or regression. With neural networks, we can group or sort unlabeled
data according to similarities among the samples in this data. Or in the case of
classification, we can train the network on a labeled dataset in order to classify the samples
in this dataset into different categories.
The first advantage of deep learning over classical machine learning is that it removes the need for
so-called manual feature extraction.
