
Study material of Unit-6

Syllabus: Artificial Neural Networks: Neurons and biological motivation. Linear threshold units. Perceptrons: representational limitation and gradient descent training. Multilayer networks and backpropagation. Hidden layers and constructing intermediate, distributed representations. Overfitting, learning network structure, recurrent networks.

What is Artificial Intelligence (AI)?
Artificial Neural Networks (ANNs) are fundamental building blocks of Artificial Intelligence (AI) technology. ANNs are the basis of many machine-learning models; they simulate a learning process similar to that of the human brain. Simply put, ANNs give machines the capacity to achieve human-like performance (and beyond) on specific tasks.

In today's world, technology is growing very fast, and we come into contact with new technologies every day.

One of the booming technologies of computer science is Artificial Intelligence, which is ready to create a new revolution in the world by making intelligent machines. Artificial Intelligence is now all around us. It is currently at work in a variety of areas, ranging from general to specific, such as self-driving cars, playing chess, proving theorems, composing music, painting, etc.

AI is one of the fascinating and universal fields of computer science, and it has great scope in the future. AI aims to make a machine work like a human.

Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "a man-made thinking power."

So, we can define AI as:

"A branch of computer science by which we can create intelligent machines which can behave like humans, think like humans, and make decisions."

Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and solving problems.

With Artificial Intelligence you do not need to preprogram a machine to do some work; instead, you can create a machine with programmed algorithms which can work with its own intelligence, and that is the awesomeness of AI.

AI is not an entirely new idea: according to Greek myth, there were mechanical men in early days which could work and behave like humans.

Why Artificial Intelligence?

Before learning about Artificial Intelligence, we should know why AI is important and why we should learn it. Following are some main reasons to learn about AI:

o With the help of AI, you can create software or devices which can solve real-world problems easily and accurately, in areas such as health, marketing, traffic, etc.
o With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, or Siri.
o With the help of AI, you can build robots which can work in environments where human survival is at risk.
o AI opens a path to other new technologies, new devices, and new opportunities.

Goals of Artificial Intelligence

Following are the main goals of Artificial Intelligence:

1. Replicate human intelligence
2. Solve knowledge-intensive tasks
3. Build an intelligent connection of perception and action
4. Build a machine which can perform tasks that require human intelligence, such as:
o Proving a theorem
o Playing chess
o Planning a surgical operation
o Driving a car in traffic
5. Create a system which can exhibit intelligent behavior, learn new things by itself, demonstrate, explain, and advise its user.

What Comprises Artificial Intelligence?

Artificial Intelligence is not just a part of computer science; it is vast and draws on many other fields that contribute to it. To create AI, we should first know how intelligence is composed: intelligence is an intangible faculty of our brain which is a combination of reasoning, learning, problem solving, perception, language understanding, etc.

To achieve these capabilities in a machine or software, Artificial Intelligence requires the following disciplines:

o Mathematics
o Biology
o Psychology
o Sociology
o Computer Science
o Neuroscience
o Statistics

Advantages of Artificial Intelligence

Following are some main advantages of Artificial Intelligence:

o High accuracy with fewer errors: AI machines or systems make fewer errors and achieve high accuracy because they take decisions based on prior experience or information.
o High speed: AI systems can make decisions at very high speed; this is why an AI system can beat a chess champion at chess.
o High reliability: AI machines are highly reliable and can perform the same action many times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human would be risky.
o Digital assistance: AI can provide digital assistance to users; for example, AI technology is currently used by various e-commerce websites to show products matching customer requirements.
o Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars which can make our journeys safer and hassle-free, facial recognition for security purposes, natural language processing to communicate with humans in human language, etc.

Disadvantages of Artificial Intelligence

Every technology has some disadvantages, and the same goes for Artificial Intelligence. Advantageous as it is, it still has some disadvantages which we need to keep in mind while creating an AI system. Following are the disadvantages of AI:

o High cost: The hardware and software requirements of AI are very costly, as AI systems require a lot of maintenance to meet current requirements.
o Can't think outside the box: Even though we are making smarter machines with AI, they still cannot think outside the box; a robot will only do the work for which it is trained or programmed.
o No feelings and emotions: An AI machine can be an outstanding performer, but it does not have feelings, so it cannot form any kind of emotional attachment with humans, and it may sometimes be harmful to users if proper care is not taken.
o Increased dependency on machines: As technology advances, people become more dependent on devices and may use their own mental capabilities less.
o No original creativity: Humans are creative and can imagine new ideas; AI machines cannot match this power of human intelligence and cannot be truly creative and imaginative.

Artificial Neural Network(ANN)

What are Neurons in a Neural Network?


A layer consists of small individual units called neurons. A neuron in a neural
network can be better understood with the help of biological neurons. An artificial
neuron is similar to a biological neuron. It receives input from the other neurons,
performs some processing, and produces an output.

Now let's look at an artificial neuron.

Here, X1 and X2 are inputs to the artificial neuron, f(X) represents the processing done on the inputs, and y represents the output of the neuron.

What is an Artificial Neural Network (ANN)?

An artificial neural network (ANN) is a system that is based on a biological neural network, such as the brain. The brain has approximately 100 billion neurons, which communicate through electro-chemical signals. The neurons are connected through junctions called synapses. Each neuron receives thousands of connections from other neurons, constantly receiving incoming signals that reach the cell body. If the resulting sum of the signals surpasses a certain threshold, a response is sent through the axon. The ANN attempts to recreate a computational mirror of the biological neural network, although the two are not comparable, since the number and complexity of the neurons used in a biological neural network are many times greater than those in an artificial neural network.

An Artificial Neural Network (ANN) uses the processing of the brain as a basis to develop algorithms that can be used to model complex patterns and prediction problems.

Artificial Neural Networks contain artificial neurons which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network in a system. A layer can have just a dozen units or millions of units, depending on the complexity of the system. Commonly, an Artificial Neural Network has an input layer, an output layer, as well as hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. This data then passes through one or more hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides an output in the form of the Artificial Neural Network's response to the input data provided.

In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another. As the data transfers from one unit to another, the neural network learns more and more about the data, which eventually results in an output from the output layer.

An ANN is comprised of a network of artificial neurons (also known as "nodes"). These nodes are connected to each other, and the strength of each connection is assigned a value ranging from inhibition (-1.0) to excitation (+1.0). If the value of the connection is high, it indicates a strong connection. Within each node's design, a transfer function is built in. There are three types of neurons in an ANN: input nodes, hidden nodes, and output nodes.

Let's begin by first understanding how our brain processes information:

In our brain, there are billions of cells called neurons, which process information in the form of electric signals. External information/stimuli is received by the dendrites of a neuron, processed in the neuron cell body, converted to an output, and passed through the axon to the next neuron. The next neuron can choose to either accept it or reject it depending on the strength of the signal.

Now, let's try to understand how an ANN works:

Here, w1, w2, w3 give the strengths of the input signals.

As you can see from the above, an ANN is a very simplistic representation of how a brain neuron works.

Key Components of the Neural Network Architecture
(OR Representation of Neural Network)

The Neural Network architecture is made of individual units called neurons that mimic
the biological behavior of the brain.
Here are the various components of a neuron.

Neuron in Artificial Neural Network

Input - It is the set of features that are fed into the model for the learning process.
For example, the input in object detection can be an array of pixel values pertaining
to an image.
Weight - Its main function is to give importance to those features that contribute
more towards the learning. It does so by introducing scalar multiplication between
the input value and the weight matrix. For example, a negative word would impact
the decision of the sentiment analysis model more than a pair of neutral words.
Transfer function - The job of the transfer function is to combine multiple inputs
into one output value so that the activation function can be applied. It is done by a
simple summation of all the inputs to the transfer function.
Activation function - It introduces non-linearity into the working of the neuron. Without it, the output would just be a linear combination of the input values, and the network would not be able to model non-linear relationships.

In a neural network, the activation function defines whether a given node should be "activated" or not based on the weighted sum. Let's define this weighted-sum value as z. A step function or a purely linear function has drawbacks as an activation; the sigmoid function is one of the most popular activation functions. There are other functions as well, which we leave aside for now.

The transfer function translates the input signals to output signals. Four types of transfer functions are commonly used: unit step (threshold), sigmoid, piecewise linear, and Gaussian.

Bias - The role of the bias is to shift the value produced by the activation function. Its role is similar to that of a constant in a linear function.
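To make these components concrete, here is a minimal Python sketch of a single neuron; the input, weight, and bias values are made up for illustration, and sigmoid is used as the activation:

import numpy as np

def sigmoid(z):
    # activation function: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8])   # inputs X1, X2 (hypothetical values)
w = np.array([0.4, -0.7])  # weights (hypothetical values)
b = 0.1                    # bias shifts the activation input

z = np.dot(w, x) + b       # transfer function: weighted sum plus bias
y = sigmoid(z)             # neuron output, a value between 0 and 1
print(y)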
When multiple neurons are stacked together in a row, they constitute a layer, and multiple layers piled next to each other form a multi-layer neural network. The main components of this type of structure are described below.

Multi-layer neural network

Input Layer
The data that we feed to the model is loaded into the input layer from external
sources like a CSV file or a web service. It is the only visible layer in the complete
Neural Network architecture that passes the complete information from the outside
world without any computation.
Hidden Layers
The hidden layers are what make deep learning what it is today. They are intermediate layers that do all the computations and extract the features from the data.
There can be multiple interconnected hidden layers that account for searching for different hidden features in the data. For example, in image processing, the first hidden layers are responsible for lower-level features like edges, shapes, or boundaries. The later hidden layers, on the other hand, perform more complicated tasks like identifying complete objects (a car, a building, a person).
Output Layer
The output layer takes input from preceding hidden layers and comes to a final
prediction based on the model’s learnings. It is the most important layer where we
get the final result.
In the case of simple classification or regression models, the output layer generally has a single node. However, it is completely problem-specific and depends on the way the model was built.
Unit step (threshold)
The output is set at one of two levels, depending on whether the total input is
greater than or less than some threshold value.

Example of ANN
To make things clearer, let's understand an ANN using a simple example: a bank wants to assess whether to approve a customer's loan application, so it wants to predict whether the customer is likely to default on the loan. It has data like that shown below:

So, we have to predict Column X. A prediction closer to 1 indicates that the customer is more likely to default.

Let's try to create an Artificial Neural Network architecture loosely based on the structure of a neuron using this example:

In general, a simple ANN architecture for the above example could be:

Key Points related to the architecture:

1. The network architecture has an input layer, a hidden layer (there can be more than one), and an output layer. It is also called an MLP (Multi-Layer Perceptron) because of the multiple layers.

2. The hidden layer can be seen as a "distillation layer" that distills some of the important patterns from the inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only the important information from the inputs, leaving out the redundant information.

3. The activation function serves two notable purposes:

- It captures non-linear relationship between the inputs

- It helps convert the input into a more useful output.

In the above example, the activation function used is the sigmoid:

O1 = 1 / (1 + exp(-F))

where F = W1*X1 + W2*X2 + W3*X3

The sigmoid activation function creates an output with values between 0 and 1. There are other activation functions such as tanh, softmax, and ReLU. A small sketch of this forward pass appears after this list of key points.

4. Similarly, the hidden layer leads to the final prediction at the output layer:

O3 = 1 / (1 + exp(-F1))

where F1 = W7*H1 + W8*H2

Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g., 0.75) indicates a higher likelihood of the customer defaulting.

5. The weights W are the importance associated with the inputs. If W1 is 0.56 and
W2 is 0.92, then there is higher importance attached to X2: Debt Ratio than X1:
Age, in predicting H1.

6. The above network architecture is called a "feed-forward network", as you can see that input signals flow in only one direction (from inputs to outputs). We can also create "feedback" networks, where signals flow in both directions.

7. A good model with high accuracy gives predictions that are very close to the
actual values. So, in the table above, Column X values should be very close to
Column W values. The error in prediction is the difference between column W and
column X:

8. The key to getting a good model with accurate predictions is to find the "optimal values of W", the weights, that minimize the prediction error. This is achieved by the backpropagation algorithm, and it makes the ANN a learning algorithm, because the model improves by learning from its errors.

9. The most common optimization algorithm is called "gradient descent": iteratively, different values of W are tried and the prediction errors are assessed. To get the optimal W, the values of W are changed in small amounts and the impact on the prediction error is assessed. Finally, those values of W are chosen as optimal for which further changes in W no longer reduce the error.
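Putting points 3, 4, and 8 together, below is a small sketch of the forward pass for this loan example. The feature values and all the weights are invented for illustration; in practice the weights are learned by backpropagation and gradient descent.

import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

x = np.array([0.30, 0.56, 0.92])          # X1, X2, X3: scaled input features (hypothetical)
W_hidden = np.array([[0.56, 0.92, 0.10],  # weights W1..W3 into H1 (hypothetical)
                     [0.25, 0.40, 0.80]]) # weights W4..W6 into H2 (hypothetical)
w_out = np.array([0.70, 0.45])            # weights W7, W8 into the output neuron

h = sigmoid(W_hidden @ x)   # hidden layer: H = sigmoid(F), with F the weighted sum per neuron
o3 = sigmoid(w_out @ h)     # output: O3 = sigmoid(W7*H1 + W8*H2)
print(o3)                   # closer to 1 -> higher indication of default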

How do Artificial Neural Networks learn?

Artificial neural networks are trained using a training set. For example, suppose you want to teach an ANN to recognize a cat. It is shown thousands of different images of cats so that the network can learn to identify a cat. Once the neural network has been trained enough using images of cats, you need to check whether it can identify cat images correctly. This is done by making the ANN classify the images it is provided, deciding whether they are cat images or not. The output produced by the ANN is corroborated by a human-provided description of whether the image is a cat image or not. If the ANN classifies incorrectly, back-propagation is used to adjust whatever it has learned during training. Back-propagation is done by fine-tuning the weights of the connections between ANN units based on the error rate obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with the minimal possible error rate.

What are the types of Artificial Neural Networks?

There are different types of neural networks, but they are generally classified into
feed-forward and feed-back networks.

1. Feedforward Neural Network


The feedforward neural network is one of the most basic artificial neural networks. In this ANN, the data or input provided travels in a single direction. It enters the ANN through the input layer and exits through the output layer, while hidden layers may or may not exist. So the feedforward neural network has a front-propagated wave only and usually does not have backpropagation.

A feed-forward network is a non-recurrent network which contains inputs, outputs, and hidden layers; the signals can only travel in one direction. Input data is passed onto a layer of processing elements where calculations are performed. Each processing element makes its computation based upon a weighted sum of its inputs. The newly calculated values then become the input values that feed the next layer. This process continues until the signal has gone through all the layers and the output is determined. A threshold transfer function is sometimes used to quantify the output of a neuron in the output layer.

Feed-forward networks include:

a) Perceptrons (linear and non-linear) and
b) Radial Basis Function networks

2. Recurrent Neural Network


The Recurrent Neural Network saves the output of a layer and feeds this output
back to the input to better predict the outcome of the layer. The first layer in the
RNN is quite similar to the feed-forward neural network and the recurrent neural
network starts once the output of the first layer is computed. After this layer, each
unit will remember some information from the previous step so that it can act as
a memory cell in performing computations.

3. Convolutional Neural Network


A convolutional neural network has some similarities to the feed-forward neural network: the connections between units have weights that determine the influence of one unit on another. But a CNN has one or more convolutional layers that apply a convolution operation to the input and then pass the result as output to the next layer. CNNs have applications in speech and image processing and are particularly useful in computer vision.

4. Modular Neural Network


A Modular Neural Network contains a collection of different neural networks that
work independently towards obtaining the output with no interaction between
them. Each of the different neural networks performs a different sub-task by
obtaining unique inputs compared to other networks. The advantage of this
modular neural network is that it breaks down a large and complex computational
process into smaller components, thus decreasing its complexity while still
obtaining the required output.

5. Radial Basis Function Neural Network
Radial basis functions are functions that consider the distance of a point with respect to a center. RBF networks have two layers: in the first layer, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function networks are normally used to model data that represents an underlying trend or function.

Applications of Artificial Neural Networks

1. Social Media
Artificial Neural Networks are used heavily in social media. For example, take the 'People you may know' feature on Facebook, which suggests people you might know in real life so that you can send them friend requests. This effect is achieved by using Artificial Neural Networks that analyze your profile, your interests, your current friends, their friends, and various other factors to work out the people you might potentially know. Another common application of machine learning in social media is facial recognition. This is done by finding around 100 reference points on the person's face and then matching them with those already available in the database using convolutional neural networks.

2. Marketing and Sales


When you log onto e-commerce sites like Amazon and Flipkart, they recommend products to buy based on your previous browsing history. Similarly, if you love pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on your tastes and previous order history. This is true across all new-age marketing segments like book sites, movie services, hospitality sites, etc., and it is done by implementing personalized marketing. This uses Artificial Neural Networks to identify the customer's likes, dislikes, previous shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare
Artificial Neural Networks are used in oncology to train algorithms that can identify cancerous tissue at the microscopic level with the same accuracy as trained physicians. Various rare diseases that manifest in physical characteristics can be identified in their early stages by using facial analysis on patient photos. The full-scale implementation of Artificial Neural Networks in the healthcare environment can only enhance the diagnostic abilities of medical experts and ultimately lead to an overall improvement in the quality of medical care all over the world.
4. Personal Assistants
You have surely heard of Siri, Alexa, Cortana, etc., and probably used them on your phones! These personal assistants are an example of speech recognition; they use Natural Language Processing to interact with users and formulate responses accordingly. Natural Language Processing uses artificial neural networks that are built to handle many of these assistants' tasks, such as managing language syntax, semantics, correct speech, the ongoing conversation, etc.

Key advantages of ANNs:

ANNs have some key advantages that make them especially suitable for certain problems and situations:

1. ANNs have the ability to learn and model non-linear and complex relationships, which is really important because in real life many of the relationships between inputs and outputs are non-linear as well as complex.

2. ANNs can generalize: after learning from the initial inputs and their relationships, a network can infer unseen relationships on unseen data as well, making the model able to generalize and predict on new data.

3. Unlike many other prediction techniques, ANNs do not impose any restrictions on the input variables (such as how they should be distributed). Additionally, many studies have shown that ANNs can better model heteroskedasticity, i.e., data with high volatility and non-constant variance, given their ability to learn hidden relationships in the data without imposing any fixed relationships. This is very useful in financial time-series forecasting (e.g., stock prices), where data volatility is very high.

Perceptron
In Machine Learning and Artificial Intelligence, the perceptron is one of the most commonly encountered terms. It is a primary step in learning Machine Learning and Deep Learning technologies, and it consists of a set of weights, input values or scores, and a threshold. The perceptron is a building block of an Artificial Neural Network.

The perceptron was invented by Frank Rosenblatt in the mid-20th century (1957) for performing certain calculations to detect patterns in input data.

The perceptron is a linear Machine Learning algorithm used for the supervised learning of various binary classifiers. The algorithm enables a neuron to learn from the elements of a training set, processing them one by one. In this section on "Perceptron in Machine Learning," we will discuss the perceptron and its basic functions in brief. Let's start with a basic introduction to the perceptron.

What is the Perceptron model in Artificial Neural Networks / Machine Learning?
The perceptron is a Machine Learning algorithm for the supervised learning of various binary classification tasks. Further, the perceptron is also understood as an artificial neuron or neural network unit that helps to detect certain computations on input data in business intelligence.

The perceptron model is also regarded as one of the best and simplest types of Artificial Neural Networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.

Basic Components of Perceptron

Frank Rosenblatt invented the perceptron model as a binary classifier which contains three main components. These are as follows:

o Input Nodes or Input Layer:

This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node holds a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is another very important parameter of the perceptron's components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept in a linear equation.

o Activation Function:

This is the final and important component that helps to determine whether the neuron will fire or not. The activation function can be considered primarily as a step function.

Types of activation functions:

o Sign function
o Step function, and
o Sigmoid function

The data scientist chooses an activation function appropriate to the problem statement to produce the desired outputs. The choice of activation function (e.g., sign, step, or sigmoid) in perceptron models can depend on whether the learning process is slow or suffers from vanishing or exploding gradients.

How does the Perceptron work?

In Machine Learning, the perceptron is considered a single-layer neural network that consists of four main parameters: input values (input nodes), weights and bias, net sum, and an activation function. The perceptron model begins with the multiplication of all input values by their weights, then adds these products together to create the weighted sum. This weighted sum is then applied to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.

This step function or activation function plays a vital role in ensuring that the output is mapped between the required values, (0,1) or (-1,1). It is important to note that the weight of an input is indicative of the strength of a node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.

Perceptron Function

The perceptron function f(x) is obtained by multiplying the input 'x' with the learned weight coefficient 'w' and adding the bias 'b'.

Mathematically, we can express it as follows:

f(x) = 1 if w.x + b > 0

otherwise, f(x) = 0

o 'w' represents the real-valued weights vector
o 'b' represents the bias
o 'x' represents the vector of input values.

The perceptron model works in two important steps, as follows:

Step 1

First, multiply all input values by their corresponding weight values and then add the products to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:

∑wi*xi = x1*w1 + x2*w2 + … + xn*wn

Add a special term called the bias 'b' to this weighted sum to improve the model's performance:

∑wi*xi + b

Step 2

In the second step, an activation function is applied to the above-mentioned weighted sum, which gives us an output either in binary form or as a continuous value, as follows:

Y = f(∑wi*xi + b)
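As a minimal sketch of these two steps, the following perceptron uses a hypothetical weight vector and bias with a hard-limit (step) activation:

import numpy as np

def perceptron(x, w, b):
    z = np.dot(w, x) + b        # Step 1: weighted sum plus bias
    return 1 if z > 0 else 0    # Step 2: step activation f

x = np.array([1.0, 0.5])        # input values (hypothetical)
w = np.array([0.6, -0.4])       # weights (hypothetical)
b = -0.1                        # bias

print(perceptron(x, w, b))      # prints 1, since 0.6 - 0.2 - 0.1 = 0.3 > 0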

Characteristics of the Perceptron

The perceptron model has the following characteristics:

1. The perceptron is a machine learning algorithm for the supervised learning of binary classifiers.
2. In the perceptron, the weight coefficients are learned automatically.
3. Initially, weights are multiplied with the input features, and a decision is made whether the neuron fires or not.
4. The activation function applies a step rule to check whether the weighted sum is greater than zero.
5. A linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, there is an output signal; otherwise, no output is shown.

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as
follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single-Layer Perceptron Model:

This is one of the simplest types of Artificial Neural Networks (ANNs). A single-layer perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm does not start from recorded data; it begins with randomly allocated values for the weight parameters. It then sums up all the weighted inputs. If the total sum of all the inputs is more than a pre-determined value, the model is activated and shows the output value as +1.

If the outcome matches the pre-determined or threshold value, the performance of the model is considered satisfactory, and the weights are not changed. However, this model runs into discrepancies when multiple input values are fed into it. Hence, to obtain the desired output and minimize errors, some changes to the weights are necessary.

"A single-layer perceptron can learn only linearly separable patterns."

Multi-Layer Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model has the same basic structure but a greater number of hidden layers.

The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages, as follows:

o Forward stage: Activations propagate forward, starting from the input layer and terminating at the output layer.
o Backward stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual and the desired output is propagated backward, starting at the output layer and ending at the input layer.

Hence, a multi-layer perceptron model can be considered as multiple artificial neural network layers in which the activation function does not remain linear, unlike in a single-layer perceptron model. Instead of a linear function, the activation function can be a sigmoid, tanh, ReLU, etc.

A multi-layer perceptron model has greater processing power and can process linear and non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR, and NOR, as the sketch below illustrates for XOR.
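Here is a sketch of a two-layer perceptron computing XOR, a function a single-layer perceptron cannot learn. The weights are chosen by hand for illustration: one hidden unit acts as OR, the other as NAND, and the output unit ANDs them together.

import numpy as np

def step(z):
    # hard-limit (step) activation: 1 if z > 0, else 0
    return (np.asarray(z) > 0).astype(int)

W_hidden = np.array([[1.0, 1.0],     # hidden unit 1 behaves like OR
                     [-1.0, -1.0]])  # hidden unit 2 behaves like NAND
b_hidden = np.array([-0.5, 1.5])
w_out = np.array([1.0, 1.0])         # output unit behaves like AND
b_out = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)
    y = step(w_out @ h + b_out)
    print(x, "->", int(y))           # prints 0, 1, 1, 0: the XOR truth table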

Limitations of the Perceptron Model

A perceptron model has the following limitations:

o The output of a perceptron can only be a binary number (0 or 1) due to the hard-limit transfer function.
o A perceptron can only classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it is not easy to classify them properly.

Recurrent Neural Networks (RNNs)


The basic deep learning architecture has a fixed input size, and this acts as a blocker in scenarios where the input size is not fixed. Also, the decisions made by such a model are based only on the current input, with no memory of the past.
Recurrent Neural Networks work very well with sequences of data as input. Their functionality can be seen in solving NLP problems like sentiment analysis and spam filters, and time-series problems like sales forecasting and stock market prediction.

Recurrent Neural Networks have the power to remember what they have learned in the past and apply it to future predictions.

The input is sequential data fed into the RNN, which has a hidden internal state that gets updated every time it reads the next item in the input sequence. The internal hidden state is fed back to the model, and the RNN produces some output at every timestamp. The mathematical representation is given below:
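Assuming the common vanilla-RNN formulation (an assumption about notation), the recurrence is h_t = f_W(h_{t-1}, x_t), for example h_t = tanh(W_hh*h_{t-1} + W_xh*x_t), with output y_t = W_hy*h_t. A minimal Python sketch of one step, with illustrative sizes:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)  # update the hidden internal state
    y_t = W_hy @ h_t                           # output produced at this timestamp
    return h_t, y_t

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # input -> hidden weights
W_hh = rng.normal(size=(4, 4))   # hidden -> hidden (recurrent) weights
W_hy = rng.normal(size=(2, 4))   # hidden -> output weights

h = np.zeros(4)                              # initial hidden state
for x in rng.normal(size=(5, 3)):            # a sequence of 5 inputs, 3 features each
    h, y = rnn_step(x, h, W_xh, W_hh, W_hy)  # same parameters reused at every step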

Note: We use the same function and parameters at every timestamp.

The Long Short Term Memory Network (LSTM)


In a plain RNN, each prediction effectively looks only one timestamp back, so the network has a very short-term memory and does not use information from much further back.
To rectify this, we can take our Recurrent Neural Network structure and expand it by adding some more pieces to it.
The critical piece that we add is memory. We want the network to be able to remember what happened many timestamps ago. To achieve this, we need to add extra structures called gates to the artificial neural network structure.

• Cell state (c_t): It corresponds to the long-term memory content of the network.

• Forget gate: Some information in the cell state is no longer needed and is erased. The gate receives two inputs, x_t (the current timestamp input) and h_t-1 (the previous hidden state), which are multiplied with the relevant weight matrices before a bias is added. The result is passed through a sigmoid activation function, which outputs a value between 0 and 1 that decides how much of the information is retained or forgotten.
• Input gate: It decides what piece of new information is to be added to the cell state. It is similar to the forget gate, using the current timestamp input and the previous hidden state, the only difference being multiplication with a different set of weights.
• Output gate: The output gate's job is to extract meaningful information from the current cell state and provide it as an output.
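The gate descriptions above can be summarized in one step of computation. The sketch below assumes the common formulation in which the previous hidden state and current input are concatenated; names and shapes are illustrative, not a specific library's API.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    z = np.concatenate([h_prev, x_t])            # previous hidden state + current input
    f = sigmoid(W_f @ z + b_f)                   # forget gate: how much of c to keep
    i = sigmoid(W_i @ z + b_i)                   # input gate: how much new info to add
    c = f * c_prev + i * np.tanh(W_c @ z + b_c)  # update the long-term cell state
    o = sigmoid(W_o @ z + b_o)                   # output gate: what to expose
    h = o * np.tanh(c)                           # new hidden state (the step's output)
    return h, c

rng = np.random.default_rng(1)
H, X = 4, 3                                           # hidden and input sizes (illustrative)
Ws = [rng.normal(size=(H, H + X)) for _ in range(4)]  # W_f, W_i, W_c, W_o
bs = [np.zeros(H) for _ in range(4)]                  # b_f, b_i, b_c, b_o
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.normal(size=X), h, c, *Ws, *bs)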

Gradient Descent, Cost functions and Backpropagation in


Neural Networks

Gradient Descent Training


The human brain’s learning process is complicated, and research has barely
scratched the surface of how humans learn. However, the little that we do know is
valuable and helpful for building models. Unlike machines, humans do not need a
large quantity of data to comprehend how to tackle an issue or make logical
predictions; instead, we learn from our experiences and mistakes.

Humans learn through a process of synaptic plasticity. Synaptic plasticity is a term


used to describe how new neural connections are formed and strengthened after
gaining new information. In the same way that the connections in the brain are
strengthened and formed as we experience new events, we train artificial neural
networks by computing the errors of neural network predictions and strengthening
or weakening internal connections between neurons based on these errors.

Gradient Descent is a standard optimization algorithm. It is frequently the first optimization algorithm introduced for training machine learning models. Let's dissect the term "Gradient Descent" to get a better understanding of how it relates to machine learning algorithms.
A gradient is a measurement that quantifies the steepness of a line or curve. Mathematically, it details the direction of ascent or descent of a line.

Descent is the action of going downwards. Therefore, based on these two simple definitions, the gradient descent algorithm quantifies downward motion.

To train a machine learning algorithm, you strive to identify the weights and biases
within the network that will help you solve the problem under consideration.

For example, you may have a classification problem. When looking at an image, you want to determine whether the image is of a cat or a dog. To build your model, you train your algorithm with training data containing correctly labelled samples of cat and dog images.

While the example described above is classification, the problem could also be localization or detection. Nonetheless, how well a neural network performs on a problem is modelled as a function, more specifically a cost function; a cost function, sometimes called a loss function, measures how wrong a model is. The partial derivatives of the cost function influence the model's final choice of weights and biases.

Gradient Descent is the algorithm that facilitates the search for parameter values that minimize the cost function towards a local minimum or optimal accuracy.

Neural networks are impressive. Equally impressive is the capacity for a


computational program to distinguish between images and objects within images
without being explicitly informed of what features to detect.

It is helpful to think of a neural network as a function that accepts inputs (data) to produce an output prediction. The variables of this function are the parameters or weights of the neurons.

Therefore, the key task in solving a problem presented to a neural network is to adjust the values of the weights and biases in a manner that best represents the dataset.

The image below depicts a simple neural network that receives inputs (X1, X2, X3, …, Xn); these inputs are fed forward to neurons within the layer containing weights (W1, W2, W3, …, Wn). The inputs and weights undergo a multiplication operation, the results are summed together by an adder (Σ), and an activation function regulates the final output of the layer.

Figure 1: Image of a shallow neural network created by Author

To assess the performance of neural networks, a mechanism is required for quantifying the difference, or gap, between the neural network's prediction and the actual data sample value, yielding a factor that influences the modification of the weights and biases within the neural network.

The calculation of the error gap between the predicted value of a neural network and the actual value of a data sample is facilitated by the cost function.

Figure 2: Neural Network internal connections and predictions depicted

The image above illustrates a simple neural network architecture of densely


connected neurons that classifies images containing the digits 0–3. Each neuron in
the output layer corresponds to a digit. The higher the activations of the connection
to a neuron, the higher the probability outputted by the neuron. The probability
corresponds to the likelihood that the digit fed forward through the network is
associated with the activated neuron.

When a ‘3’ is fed forward through the network, we expect the connections
(represented by the arrows in the diagram) responsible for classifying a ‘3’ to have
higher activation, which results in a higher probability for the output neuron
associated with the digit ‘3’.

Several components are responsible for the activation of a neuron, namely biases,
weights, and the previous layer activations. These specified components have to be
iteratively modified for the neural network to perform optimally on a particular
dataset.

By leveraging a cost function such as mean squared error, we obtain information about the error of the network, which is used to propagate updates backwards through the network's weights and biases.

For completeness, below are examples of cost functions used within machine
learning:

• Mean Squared Error

• Categorical Cross-Entropy

• Binary Cross-Entropy

• Logarithmic Loss
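As an illustration, here is a minimal sketch of the first of these, mean squared error, on hypothetical predictions and targets:

import numpy as np

def mse(y_pred, y_true):
    # average of the squared prediction errors; smaller means a better fit
    return np.mean((y_pred - y_true) ** 2)

print(mse(np.array([0.9, 0.2, 0.7]), np.array([1.0, 0.0, 1.0])))  # ~0.0467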

We have covered how neural networks' performance is assessed through a technique that measures the error of the network's predictions. The rest of this section focuses on the relationship between gradient descent, backpropagation, and the cost function.

The image in Figure 3 illustrates a cost function plotted on the x and y axes, which hold values within the function's parameter space. Let's look at how neural networks learn by visualizing the cost function as an uneven surface plotted within the parameter space of the possible weight/parameter values.

Figure 3: Gradient Descent visualized

The blue points in the image above represent a step (an evaluation of parameter values in the cost function) in the search for a local minimum. The lowest point of the modelled cost function corresponds to the position of the weight values that yields the lowest value of the cost function. The smaller the cost function value, the better the neural network performs. Therefore, it is possible to modify the network's weights using the information gathered.

Gradient descent is the algorithm employed to guide the parameter values chosen at each step towards a minimum.

• Local Minimum: The minimum parameter values within a specified range


or sector of the cost function.

• Global Minimum: This is the smallest parameter value within the entire
cost function domain.

The gradient descent algorithm guides the search for values that minimize the
function at a local/global minimum by calculating the gradient of a differentiable
function and moving in the opposite direction of the gradient.
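A minimal sketch of this procedure on a one-parameter cost function C(w) = (w - 3)^2, whose gradient is 2(w - 3), looks like this; the learning rate and iteration count are arbitrary choices:

def gradient(w):
    # dC/dw for the cost C(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting value for the parameter
learning_rate = 0.1  # controls the step size
for _ in range(100):
    w -= learning_rate * gradient(w)  # step in the opposite direction of the gradient
print(w)             # approaches the minimum at w = 3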

Advantages of gradient descent and its variants:

1. Widely used: Gradient descent and its variants are widely used in
machine learning and optimization problems because they are effective
and easy to implement.
2. Convergence: Gradient descent and its variants can converge to a global
minimum or a good local minimum of the cost function, depending on
the problem and the variant used.
3. Scalability: Many variants of gradient descent can be parallelized and are
scalable to large datasets and high-dimensional models.
4. Flexibility: Different variants of gradient descent offer a range of trade-
offs between accuracy and speed, and can be adjusted to optimize the
performance of a specific problem.

Disadvantages of gradient descent and its variants:

1. Choice of learning rate: The choice of learning rate is crucial for the convergence of gradient descent and its variants. Choosing a learning rate that is too large can lead to oscillations or overshooting, while choosing a learning rate that is too small can lead to slow convergence or getting stuck in local minima.
2. Sensitivity to initialization: Gradient descent and its variants can be sensitive to the initialization of the model's parameters, which can affect the convergence and the quality of the solution.
3. Time-consuming: Gradient descent and its variants can be time-consuming, especially when dealing with large datasets and high-dimensional models. The convergence speed can also vary depending on the variant used and the specific problem.
4. Local optima: Gradient descent and its variants can converge to a local minimum instead of the global minimum of the cost function, especially in non-convex problems. This can affect the quality of the solution, and techniques like random initialization and multiple restarts may be used to mitigate this issue.

Backpropagation

Backpropagation is the method of iteratively adjusting the parameters that determine neuron outputs (the weights and biases) to reduce the cost function. In a neural network architecture, a neuron's inputs, including all the connections from neurons in the previous layer, determine its output.

The iterative mathematical process embedded in backpropagation calculates the partial derivatives of the cost function with respect to the weights, biases, and previous-layer activations to identify how each value affects the gradient of the cost function.

The minimization of the cost function by calculating the gradient leads towards a local minimum. In each iteration or training step, the weights in the network are adjusted by the calculated gradient, scaled by the learning rate, which controls the amount of modification made to the weight values. This process is repeated for each step taken during the training phase of a neural network, with the goal of being closer to a local minimum after each step.

Figure 4: Backwards propagation of errors (Backpropagation). Gif Source from:


3Blue1Brown, Chapter 3, Deep Learning

The name "backpropagation" comes from the process's literal meaning: "backward propagation of errors." The partial derivatives of the cost function quantify the error. By propagating the errors backwards through the network, the partial derivatives at the last layer (the layer closest to the output) are used to calculate the gradients of the second-to-last layer.

This propagation of errors through the layers, with the partial derivatives from a later layer reused in the current layer, continues until the first layer (the layer closest to the input) is reached.

Overfitting in Neural Network


With an increase in the number of parameters, neural networks gain the freedom to fit many types of datasets, which is what makes them so powerful. But sometimes this power is what makes a neural network weak. The network can lose control over the learning process, and the model tries to memorize each of the data points, causing it to perform well on the training data but poorly on the test dataset. This is called overfitting.

Overfitting often occurs when the model tries to fit data that is very noisy. A model that is overfitted is inaccurate because the learned trend does not reflect the reality present in the data. To overcome this, there are a few techniques that can be used.

Techniques to prevent overfitting in Neural Networks


(OR Methods to Avoid Overfitting of a Model)

You can identify that your model is overfitting when it works well on the training data but does not perform well on unseen, new data. You can also track the model's performance through concepts like bias and variance.

But how do we solve this problem? Here are some of the techniques you can use to effectively overcome the overfitting problem in your neural network.

Simplifying The Model

The first step when dealing with overfitting is to decrease the complexity of the
model. To decrease the complexity, we can simply remove layers or reduce the
number of neurons to make the network smaller. There is no general rule on how
much to remove or how large your network should be. But, if your neural network
is overfitting, try making it smaller.

Early Stopping

Early stopping is a form of regularization used while training a model with an iterative method, such as gradient descent. Since most neural networks learn by using gradient descent, early stopping is applicable to a wide range of problems. Iterative training updates the model so as to make it better fit the training data with each iteration. Up to a point, this also improves the model's performance on the test set. Past that point, however, improving the model's fit to the training data leads to increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the model begins to overfit; a sketch of such a rule follows.
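In the sketch below, train_one_epoch() and validation_loss() are hypothetical placeholders standing in for your own training and evaluation code:

import random

def train_one_epoch():
    pass                      # hypothetical: one pass of gradient descent over the data

def validation_loss():
    return random.random()    # hypothetical: loss measured on a held-out set

best_loss, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(200):
    train_one_epoch()
    loss = validation_loss()
    if loss < best_loss:
        best_loss, bad_epochs = loss, 0  # validation improved: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:       # no improvement for `patience` epochs
            break                        # stop before the model starts to overfit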

Use Data Augmentation

In the case of neural networks, data augmentation simply means increasing the size of the dataset, that is, increasing the number of images present in it. Some of the popular image augmentation techniques are flipping, translation, rotation, scaling, changing brightness, adding noise, etcetera.
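For example, two of these augmentations, flipping and adding noise, can be sketched on an image stored as a NumPy array (random data stands in for a real photo here):

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))   # random array standing in for a real RGB image

flipped = image[:, ::-1, :]       # horizontal flip
noisy = np.clip(image + rng.normal(scale=0.05, size=image.shape), 0.0, 1.0)  # add noise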

Use Regularization

Regularization is a technique to reduce the complexity of the model. It does so by adding a penalty term to the loss function. The most common techniques are known as L1 and L2 regularization.

The L1 penalty aims to minimize the absolute values of the weights; in the standard formulation, Loss = Error + lambda * Σ |wi|.

The L2 penalty aims to minimize the squared magnitudes of the weights; in the standard formulation, Loss = Error + lambda * Σ wi^2.

If the data is too complex to be modeled accurately, then L2 is a better choice, as it is able to learn the inherent patterns present in the data, while L1 is better if the data is simple enough to be modeled accurately. For most of the computer vision problems that I have encountered, L2 regularization almost always gives better results. However, L1 has the added advantage of being robust to outliers. So the correct choice of regularization depends on the problem that we are trying to solve.
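A minimal sketch of both penalty terms is below; the weights array and lambda value (lam) are hypothetical, and the penalty would be added to the data loss during training:

import numpy as np

def l1_penalty(weights, lam):
    return lam * np.sum(np.abs(weights))  # L1: sum of absolute weight values

def l2_penalty(weights, lam):
    return lam * np.sum(weights ** 2)     # L2: sum of squared weight values

weights = np.array([0.5, -1.2, 0.03])     # hypothetical model weights
data_loss = 0.37                          # hypothetical unregularized loss
total_loss = data_loss + l2_penalty(weights, lam=0.01)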

Use Dropouts

Dropout modifies the network itself. It randomly drops neurons from the neural
network during training in each iteration. When we drop different sets of neurons,
it’s equivalent to training different neural networks. The different networks will
overfit in different ways, so the net effect of dropout will be to reduce overfitting.
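Below is a sketch of the common "inverted dropout" variant applied to a layer's activations during training; the keep probability and layer shape are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 8))  # hypothetical layer outputs (4 samples, 8 units)
keep_prob = 0.8                        # keep ~80% of units each iteration

mask = rng.random(activations.shape) < keep_prob  # randomly drop ~20% of the units
dropped = activations * mask / keep_prob          # rescale so the expected value is unchanged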
