
Module-1: Introduction to Artificial Neural Networks

Contents:

• Introduction to ANN

• Biological neural network

• Architecture of ANN

• What is a perceptron and its components

• Perceptron learning algorithm

• Types of perceptron

• Activation functions and their properties

• Single layer perceptron and its limitations

• Multilayer perceptron algorithm

I. Introduction to ANN
The term "Artificial neural network" refers to a biologically inspired
sub-field of artificial intelligence modeled after the brain. An Artificial
neural network is usually a computational network based on biological
neural networks that construct the structure of the human brain. Similar
to a human brain has neurons interconnected to each other, artificial
neural networks also have neurons that are linked to each other in
various layers of the networks. These neurons are known as nodes.

This tutorial covers the main aspects of artificial neural
networks. In it, we discuss ANNs, adaptive resonance theory, Kohonen
self-organizing maps, building blocks, unsupervised learning, genetic
algorithms, and more.

What is an Artificial Neural Network?

The term "artificial neural network" is derived from the biological
neural networks that make up the structure of the human brain. Just as
the human brain has neurons interconnected with one another, artificial
neural networks have neurons that are interconnected with one another
in the various layers of the network. These neurons are known as nodes.

II. Biological neural network


The structure of a typical biological neural network maps directly onto
that of a typical artificial neural network: dendrites in the biological
neural network represent inputs in artificial neural networks, the cell
nucleus represents nodes, synapses represent weights, and the axon
represents the output.

Relationship between biological and artificial neural networks:

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output

An artificial neural network is an attempt, in the field of artificial
intelligence, to mimic the network of neurons that makes up a human
brain, so that computers have a way to understand things and make
decisions in a human-like manner. An artificial neural network is built
by programming computers to behave like interconnected brain cells.

There are around 86 billion neurons in the human brain, and each neuron
forms somewhere between 1,000 and 100,000 connections. In the human
brain, data is stored in a distributed manner, and we can extract more
than one piece of this data from memory in parallel when necessary. We
can say that the human brain is an incredibly powerful parallel
processor.

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we
first have to understand what a neural network consists of: a large
number of artificial neurons, termed units, arranged in a sequence of
layers. Let us look at the various types of layers available in an
artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:
As the name suggests, it accepts inputs in several different formats
provided by the programmer.

Hidden Layer:

The hidden layer sits between the input and output layers. It performs
all the calculations needed to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations via the hidden
layers, finally resulting in the output that is conveyed through this
layer.

The artificial neural network takes the inputs, computes their weighted
sum, and adds a bias. This computation is represented in the form of a
transfer function:

z = ∑(wi*xi) + b

The weighted total is then passed as input to an activation function,
which produces the output. Activation functions decide whether a node
should fire or not; only the nodes that fire pass a signal to the
output layer. There are various activation functions available, chosen
according to the sort of task we are performing.
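As a concrete illustration, here is a minimal Python/NumPy sketch of the
computation just described; the input values, weights, and bias below
are invented for illustration:

import numpy as np

def step(z):
    # Step activation: fire (output 1) if the weighted total
    # reaches the threshold of 0, otherwise output 0.
    return 1 if z >= 0 else 0

# Hypothetical inputs, weights, and bias for a single node.
x = np.array([0.5, 0.3, 0.2])   # input values
w = np.array([0.4, 0.7, -0.2])  # connection weights
b = -0.05                       # bias term

z = np.dot(w, x) + b  # transfer function: weighted sum of inputs plus bias
y = step(z)           # activation function decides whether the node fires
print(z, y)           # approximately 0.32, and 1: the node fires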

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Artificial neural networks can perform more than one task
simultaneously.

Storing data on the entire network:

Unlike traditional programming, where data is stored in a database, the
data an ANN uses is stored across the whole network, so the
disappearance of a few pieces of data in one place does not prevent the
network from working.

Capability to work with incomplete knowledge:

After training, an ANN may produce output even from incomplete data.
The loss of performance here depends on the significance of the missing
data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to select
representative examples and to train the network toward the desired
output by showing it these examples. The success of the network is
directly proportional to the chosen instances; if the problem is not
presented to the network in all its aspects, the network can produce
false output.

Having fault tolerance:

Corruption of one or more cells of an ANN does not prevent it from
generating output; this feature makes the network fault-tolerant.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of an
artificial neural network. An appropriate network structure is found
through experience and trial and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a
solution, it provides no insight into why or how it was reached, which
decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing
power, in accordance with their structure, so the approach depends on
the availability of suitable hardware.

Difficulty of showing the issue to the network:

ANNs can only work with numerical data, so problems must be converted
into numerical values before being introduced to the ANN. The
representation mechanism chosen here directly impacts the performance
of the network and relies on the user's abilities.

The duration of training is unknown:

Training is stopped when the error falls to a specific value, and
reaching this value does not guarantee optimal results.

III. Perceptron

The perceptron was introduced by Frank Rosenblatt in 1957. He proposed
a perceptron learning rule based on the original McCulloch-Pitts (MCP)
neuron. A perceptron is an algorithm for supervised learning of binary
classifiers. The algorithm enables a neuron to learn by processing the
elements of the training set one at a time.

Basic Components of Perceptron

The perceptron is a type of artificial neural network and a fundamental
concept in machine learning. The basic components of a perceptron are:

1. Input Layer: The input layer consists of one or more input neurons,
which receive input signals from the external world or from other
layers of the neural network.

2. Weights: Each input neuron is associated with a weight, which
represents the strength of the connection between the input neuron and
the output neuron.

3. Bias: A bias term is added to the input layer to provide the
perceptron with additional flexibility in modeling complex patterns in
the input data.

4. Activation Function: The activation function determines the output
of the perceptron based on the weighted sum of the inputs and the bias
term. Common activation functions used in perceptrons include the step
function, sigmoid function, and ReLU function.

5. Output: The output of the perceptron is a single binary value,
either 0 or 1, which indicates the class or category to which the input
data belongs.

6. Training Algorithm: The perceptron is typically trained using a
supervised learning algorithm such as the perceptron learning algorithm
or backpropagation. During training, the weights and biases of the
perceptron are adjusted to minimize the error between the predicted
output and the true output for a given set of training examples (see
the sketch after this list).

Overall, the perceptron is a simple yet powerful algorithm that can be
used to perform binary classification tasks, and it has paved the way
for the more complex neural networks used in deep learning today.
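To make the training step concrete, here is a minimal sketch of the
classic perceptron learning rule in Python/NumPy (a hedged
illustration, not a definitive implementation; the learning rate, epoch
count, and the AND example are arbitrary choices): for each
misclassified example, the weights and bias are nudged in proportion to
the error.

import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    # Classic perceptron learning rule for binary targets (0 or 1).
    w = np.zeros(X.shape[1])  # weights, initialized to zero
    b = 0.0                   # bias term
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0  # step activation
            error = target - pred                      # 0 if correct, +/-1 if not
            w += lr * error * xi                       # w <- w + lr * error * x
            b += lr * error                            # b <- b + lr * error
    return w, b

# Example: the linearly separable AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, xi) + b >= 0 else 0 for xi in X])  # [0, 0, 0, 1]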

Types of Perceptron:

1. Single layer: A single layer perceptron can learn only linearly
separable patterns.

2. Multilayer: A multilayer perceptron has two or more layers and
therefore greater processing power.

The perceptron algorithm learns the weights for the input signals in
order to draw a linear decision boundary.

Note: Supervised learning is a type of machine learning used to learn
models from labeled training data. It enables output prediction for
future or unseen data. Let us focus on the perceptron learning rule in
the next section.

How Does Perceptron Work?


As discussed earlier, the perceptron is considered a single-layer
neural network with four main parameters. The perceptron model begins
by multiplying all input values by their weights, then adds these
values to create the weighted sum. This weighted sum is then applied to
the activation function 'f' to obtain the desired output. This
activation function is also known as the step function and is
represented by 'f'.

The step function, or activation function, is vital in ensuring that
the output is mapped between (0, 1) or (-1, 1). Note that the weight of
an input indicates a node's strength. Similarly, the bias value gives
the ability to shift the activation function curve up or down.

Step 1: Multiply all input values by their corresponding weights and
add them to calculate the weighted sum. The mathematical expression is:

∑wi*xi = x1*w1 + x2*w2 + x3*w3 + ... + xn*wn

A term called the bias 'b' is added to this weighted sum to improve the
model's performance.

Step 2: An activation function is applied to the above weighted sum,
giving us an output either in binary form or as a continuous value, as
follows:

Y = f(∑wi*xi + b)
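As a short worked example of these two steps (all numbers are invented
for illustration):

x1, x2 = 1.0, 0.0    # inputs
w1, w2 = 0.6, -0.4   # weights
b = 0.1              # bias

weighted_sum = x1 * w1 + x2 * w2 + b   # Step 1: 0.6 + 0.0 + 0.1 = 0.7
Y = 1 if weighted_sum >= 0 else 0      # Step 2: the step function fires, so Y = 1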

Types of Perceptron models

We have already discussed the types of perceptron models in the
introduction. Here, we take a more detailed look at them:

1. Single Layer Perceptron model: One of the simplest types of ANN
(artificial neural network), it consists of a feed-forward network and
includes a threshold transfer function inside the model. The main
objective of the single-layer perceptron model is to analyze linearly
separable objects with binary outcomes; a single-layer perceptron can
learn only linearly separable patterns.

2. Multi-Layered Perceptron model: It is similar to a single-layer
perceptron model but has one or more hidden layers.
Forward Stage: Activation propagates from the input layer forward and
terminates at the output layer.

Backward Stage: In the backward stage, weight and bias values are
modified according to the model's requirements: the error between the
actual output and the predicted output is propagated backward, starting
at the output layer. A multilayer perceptron model has greater
processing power and can process both linear and non-linear patterns.
It can also implement logic gates such as AND, OR, XOR, XNOR, and NOR,
as the sketch below illustrates.
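To illustrate the forward and backward stages together, here is a
minimal NumPy sketch of a multilayer perceptron learning the XOR
function, which no single-layer perceptron can represent. The hidden
layer size, learning rate, iteration count, and random seed are
arbitrary choices, not values from the text:

import numpy as np

rng = np.random.default_rng(0)

# XOR truth table.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 4 nodes.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)
lr = 0.5

for _ in range(5000):
    # Forward stage: input layer -> hidden layer -> output layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward stage: propagate the error and modify weights and biases.
    d_out = (out - y) * out * (1 - out)   # error gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # error gradient at the hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # approaches [0, 1, 1, 0]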

Advantages:

 A multi-layered perceptron model can solve complex non-linear
problems.

 It works well with both small and large input data.

 It gives quick predictions after training.

 It achieves the same accuracy ratio with both big and small data.

Disadvantages:

 In a multi-layered perceptron model, computations are time-consuming
and complex.

 It is tough to predict how much each independent variable affects the
dependent variable.

 The model's functioning depends on the quality of the training.

Characteristics of the Perceptron Model

The following are the characteristics of a Perceptron Model:

1. It is a machine learning algorithm that uses supervised learning of
binary classifiers.

2. In a perceptron, the weight coefficients are learned automatically.

3. Initially, weights are multiplied with the input features, and then
a decision is made whether the neuron fires or not.

4. The activation function applies a step rule to check whether the
weighted sum is greater than zero.

5. A linear decision boundary is drawn, enabling the distinction
between the two linearly separable classes +1 and -1.

6. If the sum of all input values is more than the threshold value, the
output signal fires; otherwise, no output is shown.
Limitations of the Perceptron Model

The following are the limitations of a perceptron model:

1. The output of a perceptron can only be a binary number (0 or 1) due
to the hard-edge transfer function.

2. It can only classify linearly separable sets of input vectors. If
the input vectors are non-linearly separable, it cannot classify them
correctly.

IV. Activation Function

Definition

In artificial neural networks, an activation function is one that
outputs a smaller value for small inputs and a larger value if its
inputs exceed a threshold. An activation function "fires" if the inputs
are big enough; otherwise, nothing happens. An activation function,
then, is a gate that checks whether an incoming value is higher than a
threshold value.

Activation functions are helpful because they introduce non-linearities
into neural networks, enabling the networks to learn powerful
operations. If the activation functions were removed from a feedforward
neural network, the entire network could be refactored into a simple
linear function or matrix transformation of its input.

By generating a weighted total and then adding a bias to it, the
activation function determines whether a neuron should be turned on.
The activation function seeks to boost the nonlinearity of a neuron's
output.

Explanation: As we are aware, neurons in neural networks operate in
accordance with their weights, bias, and activation function. The
weights and biases of the neurons inside a neural network are modified
based on the error. This process is known as back-propagation.
Activation functions make back-propagation possible, since they provide
the gradients, along with the error, needed to adjust the weights and
biases.

Need of Non-linear Activation Functions

Without an activation function, a neural network is nothing more than
an interconnected regression model. The activation function transforms
the input non-linearly, allowing the network to learn and perform more
challenging tasks.

It is simply a procedure used to obtain a node's output, and it also
goes by the name transfer function.

The composition of two linear functions is itself a linear function, so
no matter how many hidden layers we add to a neural network, if they
are all linear they will behave in the same way as a single layer. A
neuron cannot learn complex structure if all it has is a linear model;
with a non-linear activation function, it can learn based on the
difference with respect to the error.

The two main categories of activation functions are:

o Linear Activation Function


o Non-linear Activation Functions

Linear Activation Function

As can be observed, the function is linear, so no region is employed to
restrict the function's output.

Non-linear Activation Functions

These restrict or reshape the output, introducing the non-linearity
that lets the network capture complexity in the input data that a
linear mapping cannot.

Common Activation Functions

o Linear Function

Equation: A linear function's equation is y = x, the same as the
equation of a straight line.

No matter how many layers we have, if they are all linear in nature,
the final activation function of the last layer is nothing more than a
linear function of the input to the first layer. The range is -inf to
+inf.

Uses: The linear activation function is applied only at the output
layer.

If we differentiate a linear function, the result no longer depends on
the input "x": the gradient is a constant, so weight updates cannot
reflect the error, and our algorithm won't exhibit any novel behaviour.

A good example of a regression problem is determining the cost of a
house. We can use linear activation at the output layer, since the
price of a house may take any large or small value. Even in this case,
the neural network's hidden layers must perform some sort of non-linear
function.

o Sigmoid Function

It is a function that is graphed in an "S" shape.

Equation: A(x) = 1 / (1 + e^(-x))

Nature: non-linear. Notice that while X values range from about -2 to
2, Y values are very steep: small changes in x cause significant shifts
in the value of Y. The function's range is 0 to 1.

Uses: The sigmoid function is typically employed in the output nodes of
a classification problem where the result may only be 0 or 1. Since the
value of the sigmoid function ranges only from 0 to 1, the result can
easily be predicted to be 1 if the value is greater than 0.5 and 0
otherwise.

o Tanh Function

An activation that often outperforms the sigmoid function is the
hyperbolic tangent (tanh) function. It is actually a mathematically
shifted and scaled version of the sigmoid function; the two are
comparable and derivable from each other.

Range of values: -1 to +1. Nature: non-linear.

Uses: Since its values range from -1 to 1, the mean of the hidden layer
activations in a neural network will be 0 or very near to it. This
helps to centre the data by bringing the mean close to 0, which greatly
facilitates learning for the following layer.

o ReLU (Rectified Linear Unit) Activation Function

Equation: A(x) = max(0, x). If x is positive, it outputs x; otherwise,
it outputs 0.

Value range: [0, inf)

Nature: non-linear, which allows us to backpropagate errors easily and
have the ReLU function activate many layers of neurons.

Uses: Because ReLU involves simpler mathematical operations than tanh
and sigmoid, it requires less computation time to run. The network is
sparse and efficient for computation, since only a limited number of
neurons are activated at any given time.

Simply put, ReLU learns considerably more quickly than the sigmoid and
tanh functions.

Currently, ReLU is the most widely used activation function, since
practically all convolutional neural networks and deep learning systems
employ it.

Both the function and its derivative are monotonic.

However, the problem is that all negative values instantly become zero,
which reduces the model's capacity to fit or learn from the data
effectively. Any negative input to a ReLU activation function
immediately becomes zero, which affects the resulting mapping by
collapsing all negative values.

o Softmax Function

Although it is a generalization of the sigmoid function, the softmax
function comes in handy when dealing with multiclass classification
problems.

It is used frequently when managing several classes, and is typically
present in the output nodes of image classification problems. The
softmax function divides each exponentiated output by the sum of all
outputs, squeezing the outputs for each category to between 0 and 1 so
that they sum to 1.

The softmax function is best applied in the output unit of a
classifier, where we are actually attempting to obtain the
probabilities that determine the class of each input.

The usual rule of thumb, if we are unsure which activation function to
apply, is to use ReLU in the hidden layers; it is employed in the
majority of cases these days.

The sigmoid function is a very logical choice for the output layer if
the problem is binary classification. If the output involves multiple
classes, softmax can be quite helpful in predicting the probabilities
for each class.
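For reference, here is a minimal NumPy sketch of the activation
functions discussed above; the sample input values are arbitrary:

import numpy as np

def linear(z):    # y = x; range (-inf, +inf)
    return z

def sigmoid(z):   # 1 / (1 + e^(-x)); range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):      # shifted/scaled sigmoid; range (-1, +1)
    return np.tanh(z)

def relu(z):      # max(0, x); range [0, +inf)
    return np.maximum(0.0, z)

def softmax(z):   # divides each exponentiated output by their sum
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 1.0, 3.0])
print(sigmoid(z))  # each value squashed into (0, 1)
print(relu(z))     # [0. 0. 1. 3.] -- negative values become zero
print(softmax(z))  # non-negative values that sum to 1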

Desirable properties of activation functions

1. Non-linearity

The purpose of the activation function is to introduce non-linearity
into the network, which in turn allows you to model a response variable
(aka target variable, class label, or score) that varies non-linearly
with its explanatory variables.

Non-linear means that the output cannot be reproduced from a linear
combination of the inputs.

Another way to think of it: without a non-linear activation function in
the network, a neural network, no matter how many layers it had, would
behave just like a single-layer perceptron, because summing these
layers would give just another linear function (see the definition just
above).

2. Continuously differentiable

This property is necessary for enabling gradient-based optimization
methods.

The binary step activation function is not differentiable at 0, and its
derivative is 0 for all other values, so gradient-based methods can
make no progress with it.

3. Range

When the range of the activation function is finite, gradient-based
training methods tend to be more stable, because pattern presentations
significantly affect only a limited set of weights.

When the range is infinite, training is generally more efficient,
because pattern presentations significantly affect most of the weights;
in this case, smaller learning rates are typically necessary.

4. Monotonic

When the activation function is monotonic, the error surface associated
with a single-layer model is guaranteed to be convex.

5. Approximates the identity near the origin

When an activation function has this property, the neural network will
learn efficiently when its weights are initialized with small random
values. When the activation function does not approximate the identity
near the origin, special care must be taken when initializing the
weights.

Introduction to Single Layer Perceptron


A single layer perceptron is a simple algorithm used in machine
learning. It is a linear classifier that divides data into two classes
using a line or hyperplane. The single layer perceptron is limited
because it cannot learn non-linear problems; it can only learn linearly
separable problems. In particular, it is unable to learn the XOR
problem, as demonstrated below.
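Reusing the train_perceptron sketch from Section III (so the same
assumptions apply), the XOR limitation can be demonstrated directly:

import numpy as np

# XOR: no single line can separate the outputs of 1 from the outputs of 0.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w, b = train_perceptron(X, y, epochs=1000)
preds = [1 if np.dot(w, xi) + b >= 0 else 0 for xi in X]
print(preds)  # never equals [0, 1, 1, 0]; no linear boundary exists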

Limitations of the Single Layer Perceptron Algorithm

The single layer perceptron is a very simple algorithm that can only
learn linear decision boundaries, which means it is not well suited to
more complex problems where the data is not linearly separable. On data
that is not linearly separable, the perceptron learning rule never
converges to a stable solution, so it may fail to find a good decision
boundary at all.

1. Limited to Linearly Separable Data

The single layer perceptron is a neural network with no hidden layers:
the inputs connect directly to a single output unit. It is a simple
model used to learn how to classify data. The single layer perceptron
is limited to linearly separable data, which means it can only learn to
classify data that can be separated by a line (or hyperplane). This
limitation means that the single layer perceptron cannot learn to
classify data that is not linearly separable.
2. Limited Memory Capacity and Range of Capabilities

The single layer perceptron is limited in terms of the amount of
information it can take in and process at one time. This limitation is
due to the fact that the perceptron only has a single layer of neurons.
As a result, the perceptron can only learn to recognize patterns that
are linearly separable. Additionally, the single layer perceptron is
limited in its range of capabilities: it can only learn to perform
simple tasks such as classification or regression.

3. Difficulty with Nonlinear Mappings

A single layer perceptron is limited in its ability to learn nonlinear
mappings. This is because the perceptron only has one layer of neurons
and can therefore only learn linear mappings. Nonlinear mappings are
more complex and require a more sophisticated neural network
architecture, such as a multilayer perceptron or a convolutional neural
network.

Other Limitations

In addition to the limitations discussed above, the single layer
perceptron has a few other notable limitations.

First, the single layer perceptron is unable to learn complex patterns,
because the model can only learn linear decision boundaries. If the
data contains non-linear patterns, the model will not be able to learn
them.

Second, the single layer perceptron is unable to learn interactions
between features. This means that if two features interact, the model
will not be able to capture it.

Third, the single layer perceptron is vulnerable to overfitting. This
means that if the training data is not representative of the real data,
the model will perform poorly on unseen data.

Fourth, the single layer perceptron is sensitive to noise. This means
that any noise in the training data will adversely affect the
performance of the model.

What is a Multilayer Perceptron Neural Network?


A multilayer perceptron (MLP) neural network belongs to the family of
feedforward neural networks. It is an artificial neural network in
which all nodes are interconnected with the nodes of adjacent layers.

Frank Rosenblatt first defined the word perceptron in his perceptron
program. The perceptron is the basic unit of an artificial neural
network: it defines the artificial neuron in the network. It is a
supervised learning algorithm that uses node values, activation
functions, inputs, and weights to calculate the output.

The multilayer perceptron (MLP) neural network passes data only in the
forward direction. All nodes are fully connected to the next layer:
each node passes its value to the following nodes only in the forward
direction. The MLP neural network uses the backpropagation algorithm
during training to increase the accuracy of the model.

Structure of the Multilayer Perceptron Neural Network

This network has three main layers that combine to form a complete
artificial neural network. These layers are as follows:

Input Layer

It is the initial or starting layer of the multilayer perceptron. It
takes input from the training data set and forwards it to the hidden
layer. There are n input nodes in the input layer, where the number of
input nodes depends on the number of features in the dataset. Each
input variable is distributed to each of the nodes of the hidden layer.

Hidden Layer

It is the heart of every artificial neural network: this layer performs
the computations of the neural network. The edges into the hidden layer
carry weights that are multiplied by the node values, and this layer
applies the activation function.

There can be one or more hidden layers in the model.

The number of hidden layer nodes should be chosen carefully: too few
nodes make the model unable to work efficiently with complex data,
while too many nodes can result in an overfitting problem.

Output Layer

This layer gives the estimated output of the neural network. The number
of nodes in the output layer depends on the type of problem: for a
single target variable, use one node; for an N-class classification
problem, the network uses N nodes in the output layer.

Working of the Multilayer Perceptron Neural Network

 The input nodes represent the features of the dataset.

 Each input node passes its input value to the hidden layer.

 In the hidden layer, each edge has a weight that is multiplied by the
corresponding input variable; all the resulting products arriving at a
hidden node are summed together to generate that node's output.

 The activation function is used in the hidden layer to identify the
active nodes.

 The output is passed to the output layer.

 The difference between the predicted and actual output is calculated
at the output layer.

 The model uses backpropagation after calculating the predicted
output; the sketch below traces these steps.
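The steps above can be traced in a minimal forward-pass sketch; the
inputs, weights, and target below are invented for illustration:

import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.8])                # input nodes, one per dataset feature

W1 = np.array([[0.2, -0.3, 0.5],
               [0.4,  0.1, -0.2]])      # edge weights, input -> hidden
b1 = np.array([0.1, 0.0, -0.1])
W2 = np.array([[0.3], [-0.5], [0.2]])   # edge weights, hidden -> output
b2 = np.array([0.05])

h = sigmoid(x @ W1 + b1)       # hidden layer: weighted sums, then activation
y_pred = sigmoid(h @ W2 + b2)  # output layer: the estimated output

y_true = np.array([1.0])
error = y_true - y_pred        # the difference backpropagation would use
print(y_pred, error)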
