CO2 - ANN Structure and Fundamentals - P1

Artificial Neural Network: Structure, Functionality, Architecture, Inference and Performance Measure
AIM

To familiarize students with the concept of the Perceptron

INSTRUCTIONAL OBJECTIVES

This session is designed to:

1. Explain the history of ANN.
2. Discuss the ANN model.
3. Understand the working of SLP and MLP.

LEARNING OUTCOMES

At the end of this session, you should be able to:

1. Know the history of ANN.
2. Write perceptron learning algorithms.
3. Design neural networks for different data.

Networks
Reticular Theory (1871-1873)
Joseph von Gerlach proposed that the nervous system is a single continuous network, as opposed to a network of many discrete cells.

Staining Technique (1871-1873)
Camillo Golgi discovered a chemical reaction that allowed him to examine nervous tissue in much greater detail than ever before. He was a proponent of reticular theory.

Neuron Doctrine (1888-1891)
Santiago Ramón y Cajal used Golgi's technique to study the nervous system and proposed that it is actually made up of discrete individual cells forming a network (as opposed to a single continuous network).

The Term Neuron (1888-1891)
The term "neuron" was coined by Heinrich Wilhelm Gottfried von Waldeyer-Hartz around 1891. He further consolidated the Neuron Doctrine.

Nobel Prize
Golgi (reticular theory) and Cajal (neuron doctrine) were jointly awarded the 1906 Nobel Prize in Physiology or Medicine, which led to lasting disagreement and controversy between the two scientists.

The Final Word
In the 1950s, electron microscopy finally confirmed the neuron doctrine by unambiguously demonstrating that nerve cells are individual cells interconnected through synapses (a network of many individual neurons).
Computers vs. Neural Networks

"Standard" Computers        Neural Networks
one CPU                     highly parallel processing
fast processing units       slow processing units
reliable units              unreliable units
static infrastructure       dynamic infrastructure

SESSION INTRODUCTION

• Neural networks are biologically inspired algorithms that attempt to mimic the functions of neurons in the brain.
• Each neuron acts as a computational unit, accepting input from the dendrites and outputting a signal through the axon terminals.
• Actions are triggered when a specific combination of neurons is activated. The human brain is made up of about 100 billion neurons.
• Neurons receive electric signals at the dendrites.
Artificial Neural Network
The major aspects of a parallel distributed model include:
 a set of processing units (cells).
 a state of activation for every unit, which is equivalent to the output of the unit.
 connections between the units; generally each connection is defined by a weight.
 a propagation rule, which determines the effective input of a unit from its external inputs.
 an activation function, which determines the new level of activation based on the effective input and the current activation.
 an external input for each unit.
 a method for information gathering (a learning rule).
 an environment within which the system must operate, providing input signals and, if necessary, error signals.
Biological Neuron
Dendrites: input
Cell body: processor
Synapse: link
Axon: output

• A neuron is connected to other neurons through about 10,000 synapses.
• A neuron receives input from other neurons.
• Inputs are combined.
• Once the combined input exceeds a critical level, the neuron discharges a spike: an electrical pulse that travels from the cell body, down the axon, to the next neuron(s).
• The axon endings almost touch the dendrites or cell body of the next neuron.
Biological Neuron
• Transmission of an electrical signal from one neuron to the next is effected by neurotransmitters.
• Neurotransmitters are chemicals that are released from the first neuron and bind to the second.
• This link is called a synapse.
• The strength of the signal that reaches the next neuron depends on factors such as the amount of neurotransmitter available.
USE CASES OF ANN LEARNING
1. Recognition and identification
- In general computing and telecommunications: speech, vision and handwriting recognition
- In finance: signature verification and bank note verification
2. Forecasting and prediction
- In finance: foreign exchange rate and stock market forecasting
- In agriculture: crop yield forecasting
- In marketing: sales forecasting
- In meteorology: weather prediction
3. Assessment
- In engineering: product inspection, monitoring and control
- In defence: target tracking
- In security: motion detection, surveillance image analysis and fingerprint matching

USE CASES OF ANN LEARNING

4. Travelling issues of sales professionals (the travelling salesman problem)
• This refers to finding an optimal path for travelling between the cities in a particular area. Neural networks help solve the problem of providing higher revenue at minimal cost: logistical considerations are enormous, and we have to find optimal travel paths for sales professionals moving from town to town.

5. Image compression
• The idea behind a data-compression neural network is to store, encode, and recreate the actual image. We can optimize the size of our data using image-compression neural networks, which makes them an ideal application for saving memory.
How do ANNs work?
• Let us now look at the model of an artificial neuron. The output is a function of the inputs, shaped by the weights and the transfer (activation) function.
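As a minimal sketch of this neuron model (the weights, bias value, and the choice of a sigmoid transfer function below are illustrative assumptions, not values from the slides):

```python
import math

def neuron_output(inputs, weights, bias):
    """Artificial neuron: weighted sum of inputs plus bias,
    passed through a sigmoid transfer function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-total))   # sigmoid squashes to (0, 1)

# Example with three inputs and hypothetical weights
print(neuron_output([0.5, 0.3, 0.2], [0.4, 0.7, 0.2], bias=0.1))
```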
McCulloch-Pitts Model
• McCulloch (a neuroscientist) and Pitts (a logician) proposed a highly simplified computational model of the neuron (1943).
• g aggregates the inputs and the function f takes a decision y ∈ {0, 1} based on this aggregation.
• The inputs can be excitatory or inhibitory.

y = f(g(x)) = 1 if g(x) ≥ θ
            = 0 if g(x) < θ

[Figure: a McCulloch-Pitts unit with inputs x1, x2, ..., xn ∈ {0, 1} and output y ∈ {0, 1}, shown implementing the AND function (θ = 3 for three inputs), the OR function (θ = 1), NOR and NOT. A circle at the end of an input indicates an inhibitory input: if any inhibitory input is 1, the output is 0.]

Source: NPTEL IITM - Deep Learning
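A minimal sketch of a McCulloch-Pitts unit in Python, assuming binary inputs; the thresholds used below (θ = 3 for a three-input AND, θ = 1 for OR) follow the figure, and treating NOT as a single inhibitory input is an illustrative choice:

```python
def mp_neuron(excitatory, inhibitory, theta):
    """McCulloch-Pitts unit: output 0 if any inhibitory input is 1,
    otherwise 1 iff the sum of excitatory inputs g(x) >= theta."""
    if any(inhibitory):                       # inhibitory input forces output 0
        return 0
    return 1 if sum(excitatory) >= theta else 0

print(mp_neuron([1, 1, 1], [], theta=3))      # AND of three inputs -> 1
print(mp_neuron([1, 0, 0], [], theta=1))      # OR of three inputs  -> 1
print(mp_neuron([], [1], theta=0))            # NOT 1 -> 0 (input is inhibitory)
print(mp_neuron([], [0], theta=0))            # NOT 0 -> 1
```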
Perceptron Model

The perceptron model is a more general computational model than the McCulloch-Pitts neuron. It takes an input, aggregates it (as a weighted sum) and returns 1 only if the aggregated sum is more than some threshold, else it returns 0.

Source: NPTEL IITM - Deep Learning
The threshold can be rewritten as shown above by moving θ to the left-hand side and treating it as a bias: a constant input x0 = 1 with a learnable weight w0 = -θ, so that the unit outputs 1 whenever the sum over all weighted inputs (including w0 · x0) is at least 0.

Source: NPTEL IITM - Deep Learning
• A single perceptron can only be used to implement linearly separable functions.
• It takes both real and boolean inputs and associates a set of weights with them, along with a bias (the threshold mentioned above).
• We learn the weights, and we get the function. Let's use a perceptron to learn an OR function.

OR Function Using A Perceptron

Source: NPTEL IITM - Deep Learning
XNOR Function Using MLP

XOR Function Using MLP

Perceptron Learning Algorithm
Our goal is to find a weight vector w that can perfectly classify the positive inputs (P) and the negative inputs (N) in our data; the update rule is sketched below.

Source: NPTEL IITM - Deep Learning
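The algorithm itself appears as a figure in the source slides; the sketch below is a reconstruction based on the update rules described on the following convergence slides (add x to w when x ∈ P and w·x < 0, subtract x when x ∈ N and w·x ≥ 0). The OR data at the bottom is an illustrative example, with a constant 1 prepended to each input so that the threshold is learned as a weight.

```python
def perceptron_learning(P, N, max_sweeps=1000):
    """Find w such that w.x >= 0 for all x in P and w.x < 0 for all x in N."""
    w = [0.0] * len(P[0])                         # start from an arbitrary w
    data = [(x, +1) for x in P] + [(x, -1) for x in N]
    for _ in range(max_sweeps):
        converged = True
        for x, label in data:
            dot = sum(wi * xi for wi, xi in zip(w, x))
            if label == +1 and dot < 0:           # x in P misclassified: w = w + x
                w = [wi + xi for wi, xi in zip(w, x)]
                converged = False
            elif label == -1 and dot >= 0:        # x in N misclassified: w = w - x
                w = [wi - xi for wi, xi in zip(w, x)]
                converged = False
        if converged:
            break
    return w

# Learning OR over two boolean inputs (leading 1 is the constant bias input)
P = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]   # OR = 1
N = [[1, 0, 0]]                          # OR = 0
print(perceptron_learning(P, N))
```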
Perceptron Convergence
These seemingly arbitrary operations on x and w help us learn a w that can perfectly classify P and N.
We have already established that when x belongs to P, we want w · x > 0.
Since the cosine of the angle between w and x is proportional to the dot product (cos α = (w · x) / (|w| |x|)):
• the angle between w and x should be less than 90° when x belongs to the class P, and
• the angle between them should be more than 90° when x belongs to the class N.

Source: NPTEL IITM - Deep Learning
So when we add x to w, which we do when x belongs to P and w · x < 0 (Case 1), we are essentially increasing the value of cos(α): the new dot product is w_new · x = (w + x) · x = w · x + x · x ≥ w · x, so the angle α between w and x decreases, which is what we desire.

A similar intuition works for the case when x belongs to N and w · x ≥ 0 (Case 2), where subtracting x from w decreases the dot product and increases the angle.

Source: NPTEL IITM - Deep Learning
Limitations of Linear Models
 Linear models: the output is produced by taking a linear combination of the input features and passing it through some monotonic function g (e.g., the sigmoid): y = g(w · x + b).
 This basic architecture is classically also known as the "Perceptron" (not to be confused with the Perceptron "algorithm", which learns a linear classification model).
 It cannot, however, learn nonlinear functions or nonlinear decision boundaries.

Source: NPTEL IITK
Limitations of Classic Non-Linear Models
 Non-linear models: kNN, kernel methods, generative classification, decision trees, etc.
 All have their own disadvantages:
 kNN and kernel methods are expensive when generating predictions.
 Kernel-based and generative models restrict the decision boundary to a particular class of functions, e.g. quadratic polynomials, Gaussian functions, etc.
 Decision trees require optimization over many arbitrary hyperparameters to give good results, and are (somewhat) expensive to generate predictions from.
 Not a deal-breaker: the most common competitor for deep learning on large datasets tends to be some decision-tree derivative.
 In general, non-linear ML models are COMPLICATED beasts.

Source: NPTEL IITK
Multi-layer Perceptron (MLP)
 An MLP consists of an input layer, an output layer, and one or more hidden layers.

[Figure: an MLP with an input layer (D = 3 visible units), a hidden layer (K = 2 hidden units) whose units/nodes act as new features, learnable weights on the connections, and an output layer with a scalar-valued output. The model can be thought of as a combination of the predictions of two simpler models.]

Source: NPTEL IITK


SESSION INTRODUCTION

• A multilayer perceptron is a type of feed-forward artificial neural network that generates a set of outputs from a set of inputs.
• An MLP is a neural network connecting multiple layers in a directed graph.
WHY MULTI-LAYER NETWORKS?

• A single perceptron can only express linear decision surfaces.
• Multilayer networks can express a rich variety of nonlinear decision surfaces.
MULTI-LAYER NETWORKS WITH NON-LINEAR UNITS

• Multiple layers of cascaded linear units still produce only linear functions.
• We prefer networks capable of representing highly nonlinear functions.
• What we need is a unit whose output is a nonlinear function of its inputs, but whose output is also a differentiable function of its inputs.
• One solution is the sigmoid unit: a unit very much like a perceptron, but based on a smoothed, differentiable threshold function.
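For reference, a sigmoid unit computes a smoothed, differentiable threshold of its net input; its derivative has the convenient product form used later in backpropagation (the logistic function is standard, the notation here is ours):

```latex
o = \sigma(\mathbf{w} \cdot \mathbf{x}), \qquad
\sigma(y) = \frac{1}{1 + e^{-y}}, \qquad
\frac{d\sigma}{dy} = \sigma(y)\bigl(1 - \sigma(y)\bigr)
```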
FEEDFORWARD NEURAL NETWORK

• Feedforward Neural
Networks are artificial neural
networks where the connections
between units do not form a cycle.
• Feedforward neural networks were the
first type of artificial neural network
devised.
• They are called feedforward because
information only travels forward in the
network (no loops), first through the
input nodes, then through the hidden
nodes (if present), and finally through
the output nodes.
BUILDING BLOCKS OF NEURAL NETWORK
Neurons:
The building blocks of neural networks are artificial neurons. These are simple computational units that have weighted input signals and produce an output signal using an activation function.

Neuron weights:
Neurons are connected by weights, which measure the strength or magnitude of each connection, similar to the coefficients in linear regression. Weights are typically initialized to small values, for example between 0 and 1.

Activation function:
An activation function decides whether a neuron should be activated or not: it decides whether the neuron's input to the network is important for the prediction, using simple mathematical operations.

ACTIVATION FUNCTIONS

TRAINING OF MLP
Initialization
• Initialize the weights and biases of the network.

Forward-propagation step
• Starting with the input layer, propagate the data forward to the output layer.
• Based on the output, calculate the error (the difference between the predicted and the known outcome).
• This error needs to be minimized.

Back-propagation step
• Backpropagate the error.
• Find its derivative with respect to each weight in the network, and update the model.

Repeat the forward-propagation and back-propagation steps above over multiple epochs to learn the ideal weights.


Neural Net with One Hidden Layer
 Each input x is transformed into several pre-activations using linear models: s_k = w_k · x.
 A nonlinear activation is applied to each pre-activation: h_k = g(s_k).
 A linear model is then learned on the new "features" h.
 Finally, the output is produced by applying an output function to v · h; this can even be the identity (e.g., for regression, y_n = s_n).
 The unknowns (the weights w_k and v) are learned by minimizing some loss function (squared, logistic, softmax, etc.).
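A minimal NumPy sketch of this computation, using D = 3 inputs and K = 2 hidden units as in the MLP figure; the random weights, the example input, and the choice of a sigmoid activation g are assumptions for illustration:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
D, K = 3, 2                       # 3 visible inputs, 2 hidden units
W = rng.normal(size=(K, D))       # hidden-layer weights w_k (learnable)
v = rng.normal(size=K)            # output-layer weights (learnable)

x = np.array([0.5, -1.0, 2.0])    # one input example (illustrative)
s = W @ x                         # pre-activations  s_k = w_k . x
h = sigmoid(s)                    # post-activations h_k = g(s_k): the new "features"
y = v @ h                         # scalar output    y = v . h
print(s, h, y)
```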
Neural Nets: A Compact Illustration
[Figure: a compact way of drawing a single-hidden-layer network. Each hidden node denotes a linear combination of the inputs followed by a nonlinear operation on the result; the hidden layer's pre-activation and post-activation are shown together as the value computed by that hidden node, and the final output is shown directly.]
 Note that different layers may use different non-linear activations, and the output layer may have none.

Source: NPTEL IITK


Activation Functions
[Figure: plots of the sigmoid, tanh, ReLU and Leaky ReLU activations h as functions of the pre-activation a.]
• For sigmoid as well as tanh, gradients saturate (become close to zero) as the function tends to its extreme values.
• tanh is preferred more than sigmoid: it helps keep the mean of the next layer's inputs close to zero (with sigmoid, it is close to 0.5).
• ReLU and Leaky ReLU are among the most popular activation functions.
• Leaky ReLU helps fix the "dead neuron" problem of ReLU when the pre-activation is a negative number.

Source: NPTEL IITK
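The four activations above can be written in a few lines of NumPy (the 0.01 slope for Leaky ReLU is a common default, not a value from the slide):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))        # output in (0, 1), mean of outputs near 0.5

def tanh(a):
    return np.tanh(a)                       # output in (-1, 1), mean of outputs near 0

def relu(a):
    return np.maximum(0.0, a)               # zero output (and zero gradient) for negative a

def leaky_relu(a, slope=0.01):
    return np.where(a > 0, a, slope * a)    # small slope avoids "dead" neurons

a = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(a), 3))
```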


MLP Can Learn Nonlinear Functions
 An MLP can be seen as a composition of multiple simpler models combined nonlinearly.

[Figure: a nonlinear classification problem. A standard single "Perceptron" classifier (no hidden units) produces a score that increases monotonically in one direction; this one-sided increase is not ideal for learning nonlinear decision boundaries. A multi-layer perceptron classifier (one hidden layer with 2 units) is obtained by composing two one-sided increasing score functions (using weights +1 and -1 to "flip" the second one before adding), giving a high score in the middle and a low score on either side of it: exactly what we want for the given classification problem. This can now learn a nonlinear decision boundary.]

 A single-hidden-layer MLP with a sufficiently large number of hidden units can approximate any function (Hornik, 1991).

Source: NPTEL IITK
The types of decision regions that can be formed by single-layer and multilayer perceptrons with one and two hidden layers are shown in the figure.
SOLVE THE PROBLEM USING SIGMOID
ACTIVATION FUNCTION – CALCULATE OUTPUTS

TOPICS TO BE COVERED

1. Back Propagation Overview
2. Back Propagation Algorithm
3. A Step-by-Step Back Propagation Example
   1. Gradient Descent Rule
   2. Delta Learning Rule
WHAT IS BACK PROPAGATION?

• Backpropagation is an algorithm used in artificial intelligence (AI) to fine-tune the weights of an artificial neural network and improve the accuracy of its outputs. Backpropagation is the process of tuning a neural network's weights to improve its prediction accuracy.
• There are two directions in which information flows in a neural network. Forward propagation (also called inference) is when data goes into the neural network and a prediction comes out.
• A neural network can be thought of as a group of connected input/output (I/O) nodes. The level of accuracy each node produces is expressed as a loss function (error rate).
BACK PROPAGATION OVERVIEW
Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces error rates and makes the model reliable by increasing its generalization.

"Backpropagation" in neural networks is short for "backward propagation of errors." It is a standard method of training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the weights in the network.
HOW BACKPROPAGATION WORKS: SIMPLE ALGORITHM
• Inputs X arrive through the preconnected path.
• The input is modeled using real weights W. The weights are usually selected randomly.
• Calculate the output for every neuron from the input layer, through the hidden layers, to the output layer.
• Calculate the error in the outputs:

  Error = Actual Output - Desired Output

• Travel back from the output layer to the hidden layer to adjust the weights so that the error is decreased.
• Keep repeating the process until the desired output is achieved.
A Step-by-Step BACK PROPAGATION Example

The goal of backpropagation is to optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
We work with a single training example: given inputs 0.05 and 0.10, we want the neural network to output 0.01 and 0.99.
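A hedged NumPy sketch of this example. The original walkthrough fixes specific initial weights that are not reproduced in these slides, so the network below uses an assumed small 2-2-2 architecture with sigmoid units, random initial weights, and the learning rate of 0.5 used later; the printed numbers will therefore differ from those on the following slides, but the forward pass, squared-error total, and delta-rule updates are the same steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])             # inputs from the example
target = np.array([0.01, 0.99])        # desired outputs from the example

rng = np.random.default_rng(1)         # assumed random initial weights and biases
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)   # input  -> hidden
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=2)   # hidden -> output
eta = 0.5                              # learning rate used later in the slides

for step in range(3):
    # forward pass
    h = sigmoid(W1 @ x + b1)
    out = sigmoid(W2 @ h + b2)
    E_total = 0.5 * np.sum((target - out) ** 2)         # sum of squared errors
    print(f"step {step}: outputs={np.round(out, 4)}, E_total={E_total:.5f}")

    # backward pass (delta rule for sigmoid units)
    delta_out = -(target - out) * out * (1 - out)        # dE/dnet at the outputs
    delta_hid = (W2.T @ delta_out) * h * (1 - h)         # error pushed back to the hidden layer
    W2 -= eta * np.outer(delta_out, h); b2 -= eta * delta_out
    W1 -= eta * np.outer(delta_hid, x); b1 -= eta * delta_hid
```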
BACK PROPAGATION WORKING METHOD

Calculating the Total Error
We can now calculate the error for each output neuron using the squared error function and sum them to get the total error:

E_total = Σ ½ (target - output)²

For example, the target output for o1 is 0.01 but the neural network outputs 0.75136507, so its error is:

E_o1 = ½ (0.01 - 0.75136507)² ≈ 0.274811

Repeating this process for o2 (remembering that its target is 0.99) gives its error, and the total error for the neural network is the sum of these errors: E_total = E_o1 + E_o2.
The Backwards Pass

Our goal with backpropagation is to update each of the weights in the network so that they cause the actual output to be closer to the target output, thereby minimizing the error for each output neuron and for the network as a whole.

Output Layer
Consider a weight w connecting a hidden unit (with output out_h) to output neuron o1. We want to know how much a change in w affects the total error E_total. By applying the chain rule we know that:

∂E_total/∂w = (∂E_total/∂out_o1) · (∂out_o1/∂net_o1) · (∂net_o1/∂w)
BACK PROPAGATION

We need to figure out each piece in this equation.
First, how much does the total error change with respect to the output?

∂E_total/∂out_o1 = -(target_o1 - out_o1)

When we take the partial derivative of the total error with respect to out_o1, the term for the other output neuron becomes zero, because out_o1 does not affect it, which means we are taking the derivative of a constant, which is zero.

Next, how much does the output of o1 change with respect to its total net input? The partial derivative of the logistic function is the output multiplied by 1 minus the output:

∂out_o1/∂net_o1 = out_o1 (1 - out_o1)

Finally, how much does the total net input of o1 change with respect to w? It is simply the hidden output out_h carried by that weight:

∂net_o1/∂w = out_h

Putting it all together:

∂E_total/∂w = -(target_o1 - out_o1) · out_o1 (1 - out_o1) · out_h

You'll often see this calculation combined in the form of the delta rule.

The Delta rule in machine learning and neural network environments is a specific type of backpropagation that helps to refine connectionist ML/AI networks, making connections between inputs and outputs with layers of artificial neurons.

In general, backpropagation has to do with recalculating the input weights of artificial neurons using a gradient method. Delta learning does this using the difference between a target activation and the actual obtained activation; with a linear activation function, the network connections are adjusted accordingly.

Another way to explain the Delta rule is that it uses an error function to perform gradient descent learning.
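Putting the three chain-rule factors together gives the usual delta-rule form for an output-layer weight w that carries the hidden output out_h into output unit o (notation ours, consistent with the steps above):

```latex
\delta_o = \frac{\partial E_{\text{total}}}{\partial net_o}
         = -(target_o - out_o)\, out_o (1 - out_o),
\qquad
\frac{\partial E_{\text{total}}}{\partial w} = \delta_o \, out_h
```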
Apply the Delta Rule

To decrease the error, we subtract this value from the current weight (optionally multiplied by a learning rate, eta, which we'll set to 0.5):

w_new = w - η · ∂E_total/∂w

We can repeat this process to get the new values of the other output-layer weights.
GOING BACKWARDS: BACK-PROPAGATION OF ERROR

• Computing the errors at the output is no more difficult than it was for the Perceptron, but working out what to do with those errors is harder. The method we are going to look at is called back-propagation of error, which makes it clear that the errors are sent backwards through the network.
• There are actually just three things that you need to know:
• the chain rule
• gradient descent
• the delta rule
BACK-PROPAGATION OF ERROR

• sum-of-squares error function

The weights of the network are trained so that the error goes downhill until it reaches a local minimum, just like a ball
rolling under gravity.

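In symbols, the sum-of-squares error and the downhill (gradient-descent) weight update take the standard form (η is the learning rate; this is the usual textbook formulation rather than a formula copied from the slide):

```latex
E(\mathbf{w}) = \tfrac{1}{2}\sum_{k}\bigl(y_k - t_k\bigr)^2,
\qquad
w_{ij} \leftarrow w_{ij} - \eta \,\frac{\partial E}{\partial w_{ij}}
```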
ACTIVATION FUNCTION

• So now we have a new form of error computation and a new activation function that decides whether or not a neuron should fire. We can differentiate it, so that when we change the weights, we do it in the direction that is downhill for the error, which means that we know we are improving the error function of the network.
• As far as the algorithm goes, we have fed our inputs forward through the network and worked out which nodes are firing.
• Now, at the output, we have computed the errors as the sum-squared difference between the outputs and the targets.
• What do we want to do next? Compute the gradient of these errors and use it to decide how much to update each weight in the network. We will do that first for the nodes connected to the output layer, and after we have updated those, we will work backwards through the network until we get back to the inputs again. There are just two problems:
• for the output neurons, we don't know the inputs;
• for the hidden neurons, we don't know the targets; for extra hidden layers, we know neither the inputs nor the targets, but even this won't matter for the algorithm we derive.
THE ADVANTAGES OF USING A BACKPROPAGATION ALGORITHM ARE AS
FOLLOWS:

• It does not have any parameters to tune except for the number of inputs.
• It is highly adaptable and efficient and does not require any prior knowledge
about the network.
• It is a standard process that usually works well.
• It is user-friendly, fast and easy to program.
• Users do not need to learn any special functions.

THE DISADVANTAGES OF USING A BACKPROPAGATION ALGORITHM ARE AS FOLLOWS:

• It prefers a matrix-based approach over a mini-batch approach.
• It is sensitive to noisy and irregular data.
• Performance is highly dependent on the input data.
• Training is time- and resource-intensive.
WEB REFERENCES

[1] https://www.guru99.com/backpropogation-neural-network.html
[2] https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
[3] http://neuralnetworksanddeeplearning.com/chap2.html
Conclusion
 Artificial neural networks are an imitation of biological neural networks, but much simpler ones.
 Computing has a lot to gain from neural networks. Their ability to learn by example makes them very flexible and powerful; furthermore, there is no need to devise an algorithm in order to perform a specific task.
REFERENCES

 Craig Heller, and David Sadava, Life: The Science of Biology, fifth
edition, Sinauer Associates, INC, USA, 1998.
 Introduction to Artificial Neural Networks, Nicolas Galoppo von Borries
 Tom M. Mitchell, Machine Learning, WCB McGraw-Hill, Boston, 1997.
Q. How does each neuron work in ANNs? What is back propagation?

A neuron: receives input from many other neurons; changes its internal state (activation) based on the current input; and sends one output signal to many other neurons, possibly including its input neurons (in which case the ANN is a recurrent network).

Back-propagation is a type of supervised learning, used at each layer to minimize the error between the layer's response and the actual data.
Self-Assessment Questions

1. A single-layer perceptron can learn ___________.
(a) Linear boundary
(b) Non-linear boundary
(c) Both
(d) Depends on data

2. The sigmoid activation function is used at the output layer for
(a) Binary class
(b) Multi-class
(c) Both
(d) None
THANK YOU

OUR TEAM
