Biological Neuron and Memory: Understanding The Basics of Neural Function and Memory Mechanisms
BIOLOGICAL NEURON:
• Definition:
• A neuron is a specialized cell transmitting nerve impulses; a nerve cell.
• Structure:
• Dendrites: Receive signals from other neurons.
• Cell Body (Soma): Contains the nucleus and processes incoming signals.
• Axon: Transmits signals away from the cell body.
• Synapse: The junction between two neurons where information is transmitted.
STRUCTURE OF A BIOLOGICAL NEURON:
FEATURES OF BIOLOGICAL NEURAL NETWORK
• Biological neural networks consist of interconnected neurons in the brain and nervous system.
• They process information through electrical and chemical signals, enabling complex behaviors and
cognitive functions.
• Robustness and fault tolerance: The decay of nerve cells does not seem to affect the performance
significantly.
• Flexibility: The network automatically adjusts to a new environment without using any
preprogrammed instructions.
• Ability to deal with a variety of data situations: The network can deal with information that is
fuzzy, probabilistic, noisy and inconsistent.
• Collective computation: The network performs routinely many operations in parallel and also a
given task in a distributed manner.
• Artificial neural networks are biologically inspired algorithms that attempt to mimic the
functions of neurons in the brain.
• Each neuron acts as a computational unit, accepting input through the dendrites and
sending its output signal through the axon terminals.
• Actions are triggered when a specific combination of neurons is activated. The
human brain is made up of about 100 billion neurons (more recent estimates put the figure closer to 86 billion).
• Neurons receive electrical signals at the dendrites and pass them along the axon.
KEY FEATURES:
Neurons:
• Basic building blocks of the nervous system.
• Comprise dendrites, cell body (soma), axon, and synapses.
Dendrites:
• Branch-like structures that receive signals from other neurons.
• Conduct electrical impulses toward the cell body.
Cell Body (Soma):
• Contains the nucleus and other organelles.
• Integrates incoming signals and generates outgoing signals.
Axon:
• Long, slender projection that conducts electrical impulses away from the cell body.
• Transmits signals to other neurons, muscles, or glands.
Synapses:
• Junctions between neurons where communication occurs.
• Consist of the presynaptic ending, synaptic cleft, and postsynaptic membrane.
STRUCTURE AND FUNCTION OF
SINGLE NEURON - ARTIFICIAL
NEURAL NETWORK (ANN)
OVERVIEW:
• Artificial Neural Networks (ANNs) are computational models inspired by the biological
neural networks in the brain.
• A single neuron in an ANN is also known as a perceptron.
COMPONENTS OF THE BASIC ARTIFICIAL NEURON
• Inputs: The set of values from which an output value is predicted. They
can be viewed as the features or attributes of a dataset.
• Weights: Real values attached to each input/feature; they
convey the importance of that feature in predicting the final output.
• Bias: Shifts the activation function to the left or right; it is
comparable to the y-intercept in the equation of a line.
• Summation Function: Combines the weights and
inputs by computing their weighted sum.
• Activation Function: Introduces non-linearity into the model.
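The components listed above can be put together in a few lines. This is a minimal sketch, not a reference implementation: the function name `neuron_output` and the numeric values are illustrative assumptions, and the sigmoid is just one possible activation.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    """Summation function (weighted sum plus bias) followed by the activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)

# Illustrative (made-up) values: two features, two weights, one bias.
y = neuron_output(inputs=[1.0, 0.5], weights=[0.4, -0.2], bias=0.1, activation=sigmoid)
print(y)
```

Swapping `sigmoid` for another function changes only the last step; the inputs, weights, bias, and summation stay the same.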
WHAT IS THE ROLE OF THE ACTIVATION
FUNCTIONS IN NEURAL NETWORKS?
• The idea behind the activation function is to introduce nonlinearity into the neural network
so that it can learn more complex functions.
• Without an activation function, the neural network behaves as a linear model: it can only learn
functions that are linear combinations of its input data.
• The activation function converts a neuron's net input into its output.
• The activation function is responsible for deciding whether a neuron should be activated, i.e.,
fired, or not.
• To make the decision, the neuron first calculates the weighted sum of its inputs and adds the bias; the activation function is then applied to this value.
• So, the basic purpose of the activation function is to introduce non-linearity into the output of
a neuron.
ACTIVATION FUNCTIONS
• Activation functions are functions applied to the weighted
sum of a neuron's inputs and bias; the result is used to decide whether the neuron is
activated or not.
• Activation functions play an integral role in neural networks by introducing nonlinearity.
• This nonlinearity allows neural networks to develop complex representations and
functions based on the inputs that would not be possible with a simple linear regression
model.
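As a rough sketch, the activation functions named in the slides that follow can be written as plain Python functions. The default threshold of 0 for the step function and the slope of 0.01 for Leaky ReLU are common choices assumed here, not values taken from the slides.

```python
import math

def linear(z, a=1.0):
    return a * z                      # output proportional to the input

def binary_step(z, threshold=0.0):
    return 1 if z >= threshold else 0 # fires only at or above the threshold

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) # output in (0, 1)

def tanh(z):
    return math.tanh(z)               # output in (-1, 1), zero centered

def relu(z):
    return max(0.0, z)                # zero for negative inputs

def leaky_relu(z, slope=0.01):
    return z if z > 0 else slope * z  # small negative slope instead of zero

for z in (-2.0, 0.0, 2.0):
    print(z, binary_step(z), round(sigmoid(z), 3), round(tanh(z), 3), relu(z), leaky_relu(z))
```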
TYPES OF ACTIVATION FUNCTIONS
LINEAR ACTIVATION FUNCTIONS
LIMITATIONS OF LINEAR ACTIVATION FUNCTION
• Backpropagation is of little use, as the derivative of the function is a constant and
has no relation to the input x.
• All layers of the neural network collapse into one if a linear activation function is
used. No matter how many layers the network has, the last layer is still a
linear function of the first layer. So, essentially, a linear activation function turns the neural
network into just one layer.
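The collapse can be checked numerically. The sketch below uses two small weight matrices with made-up values and omits biases for brevity: passing an input through both layers with an identity activation gives the same result as a single layer whose weight matrix is the product of the two.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Illustrative weights for two linear layers.
W1 = [[1.0, 2.0], [0.0, -1.0]]
W2 = [[0.5, 1.0], [3.0, 0.0]]
x = [2.0, -1.0]

two_layers = matvec(W2, matvec(W1, x))  # layer 1, then layer 2, identity activation
one_layer = matvec(matmul(W2, W1), x)   # a single layer with W = W2 * W1
print(two_layers, one_layer)
```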
BINARY STEP FUNCTION
LIMITATIONS OF BINARY STEP FUNCTION
SIGMOID / LOGISTIC ACTIVATION FUNCTION
LIMITATIONS OF SIGMOID ACTIVATION FUNCTION
TANH FUNCTION (HYPERBOLIC TANGENT)
ADVANTAGE OF TANH FUNCTION
• The output of the tanh activation function is zero centered; hence we can easily map the
output values as strongly negative, neutral, or strongly positive.
• Usually used in hidden layers of a neural network, as its values lie between -1 and 1;
therefore, the mean of the hidden layer's outputs comes out to be 0 or very close to it. This helps in
centering the data and makes learning for the next layer much easier.
LIMITATIONS OF TANH FUNCTION
RELU FUNCTION
ADVANTAGES OF RELU FUNCTION
• Since only a certain number of neurons are activated, the ReLU function is far more
computationally efficient than the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards a minimum of
the loss function due to its linear, non-saturating property.
LIMITATIONS OF RELU FUNCTION
LEAKY RELU FUNCTION
ADVANTAGES OF LEAKY RELU FUNCTION
LIMITATIONS OF LEAKY RELU FUNCTION
COMPARISON AND
CHARACTERISTICS OF BIOLOGICAL
AND ARTIFICIAL NEURAL NETWORK
RESEMBLANCE OF BIOLOGICAL AND ARTIFICIAL
NEURON
COMPARISON BETWEEN THE ANN AND BNN
MC-CULLOCH AND PITTS MODEL
McCulloch-Pitts Neuron Model
• McCulloch (neuroscientist) and Pitts (logician) proposed a highly
simplified computational model of the neuron in 1943.
• The MP neuron, as it is abbreviated, is the fundamental building block of
artificial neural networks. It is mainly used for
classification problems.
• The inputs are binary, x_1, x_2, ..., x_n ∈ {0, 1}, and so is the output, y ∈ {0, 1}.
• g aggregates the inputs and the function f takes a decision based on
this aggregation.
• The inputs can be excitatory or inhibitory.
• y = 0 if any inhibitory input is active; otherwise
\[ g(x) = \sum_{i=1}^{n} x_i \]
\[ y = f(g(x)) = \begin{cases} 1 & \text{if } g(x) \ge \theta \\ 0 & \text{if } g(x) < \theta \end{cases} \]
• θ is called the thresholding parameter.
• This is called Thresholding Logic.
• Given its inputs, the function aggregates them and takes a decision based on
the aggregation.
• Aggregation simply means the sum of these binary inputs. If the aggregated value
reaches or exceeds the threshold (g(x) ≥ θ), the output is 1; otherwise it is 0.
• For more details about the McCulloch-Pitts model, see the link below:
• https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1
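The thresholding logic above fits in a few lines. This is a minimal sketch: the function name `mp_neuron` and the split of the inputs into two separate lists (excitatory and inhibitory) are assumptions made for illustration.

```python
def mp_neuron(excitatory, inhibitory, theta):
    """McCulloch-Pitts neuron with binary inputs and a binary output.

    Any active inhibitory input forces the output to 0; otherwise the
    neuron fires when the sum of excitatory inputs reaches theta.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0

print(mp_neuron([1, 1], [], theta=2))   # both excitatory inputs on
print(mp_neuron([1, 0], [], theta=2))   # sum below the threshold
print(mp_neuron([1, 1], [1], theta=1))  # an inhibitory input is active
```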
Case Study: McCulloch-Pitts Model
• Consider two input signals:
• X1: Is it raining?
• X2: Is it sunny?
• The value of each input is either 1 or 0.
• Assumption: Use a weight of 1 for both X1 and X2.
• Keep the threshold at 1.
• Draw the simple McCulloch-Pitts model for this scenario.
• Write the truth table for this case study.
• Truth Table (follows from the weights and threshold above):
Scenario | X1 | X2 | Ysum | Yout
1 | 0 | 0 | 0 | 0
2 | 0 | 1 | 1 | 1
3 | 1 | 0 | 1 | 1
4 | 1 | 1 | 2 | 1
• Write the function for Ysum and Yout:
Ysum = X1*w1 + X2*w2 = X1 + X2
Yout = 1 if Ysum ≥ 1, else 0
• Conclusion:
- Whenever Yout is 1, John needs to bring an umbrella.
- In scenarios 2, 3, and 4, John has to bring an umbrella.
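The case study can be checked mechanically. The sketch below enumerates the four scenarios using the stated weights (1 and 1) and threshold (1); the function name `umbrella` and the scenario ordering are assumptions made for illustration.

```python
def umbrella(x1, x2, w1=1, w2=1, theta=1):
    """Return (Ysum, Yout) for the rain/sun case study."""
    y_sum = x1 * w1 + x2 * w2
    y_out = 1 if y_sum >= theta else 0
    return y_sum, y_out

# One row per scenario: (X1, X2, Ysum, Yout)
rows = [(x1, x2, *umbrella(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
for scenario, row in enumerate(rows, start=1):
    print(scenario, row)
```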
Implement the AND function using the McCulloch-Pitts model
Implement the XOR function using the MP neuron model
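One possible way to realise both functions is sketched below; it is not necessarily the construction shown on the original slides. AND needs a single MP neuron with threshold 2. XOR cannot be produced by a single MP neuron, so the sketch combines three of them, using inhibitory inputs to build "x1 and not x2" and "x2 and not x1" and then taking their OR. The helper names are illustrative.

```python
def mp_neuron(excitatory, inhibitory=(), theta=1):
    """MP neuron: any active inhibitory input vetoes firing."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0

def mp_and(x1, x2):
    # Fires only when both inputs are 1 (sum reaches the threshold 2).
    return mp_neuron([x1, x2], theta=2)

def mp_xor(x1, x2):
    z1 = mp_neuron([x1], inhibitory=[x2], theta=1)  # x1 AND NOT x2
    z2 = mp_neuron([x2], inhibitory=[x1], theta=1)  # x2 AND NOT x1
    return mp_neuron([z1, z2], theta=1)             # z1 OR z2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_and(x1, x2), mp_xor(x1, x2))
```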
The McCulloch-Pitts model and the perceptron model are both early computational models of artificial neurons, but they have some key differences:
1. Inventors and Time Period:
- McCulloch-Pitts Model: Proposed by Warren McCulloch and Walter Pitts in 1943.
- Perceptron Model: Developed by Frank Rosenblatt in 1957.
2. Architecture:
- McCulloch-Pitts Model: It is a simplified model of a biological neuron, consisting of binary inputs (0 or 1), weighted connections, and a threshold function to produce binary
outputs.
- Perceptron Model: The perceptron is an extension of the McCulloch-Pitts model that incorporates real-valued weights and a summation function followed by a threshold
activation function.
3. Activation Function:
- McCulloch-Pitts Model: Uses a binary threshold activation function. If the weighted sum of inputs exceeds a certain threshold, the neuron fires (output is 1); otherwise, it doesn't
(output is 0).
- Perceptron Model: Also uses a threshold activation function, but it can handle real-valued inputs and weights. The output is 1 if the weighted sum exceeds the threshold, and 0
otherwise.
4. Learning:
- McCulloch-Pitts Model: It does not incorporate a learning mechanism. The weights and thresholds are typically set manually.
- Perceptron Model: Introduced the concept of learning. Rosenblatt's perceptron learning algorithm updates the weights based on errors, allowing it to learn from data.
5. Capabilities:
- McCulloch-Pitts Model: Limited in its ability to learn complex patterns due to fixed, manually set weights.
- Perceptron Model: Can learn linearly separable patterns and is capable of performing binary classification tasks.
ARTIFICIAL NEURON
MODEL
SESSION - 5
CONTENT
• The first computational model of a neuron was proposed by Warren McCulloch and
Walter Pitts in 1943.
• The simplest binary classification can be achieved in the following way:
LIMITATIONS
• We attach a weight (w_i) to each input. Notice that we also add an input of constant value 1 with a
weight of −θ; this weight is called the bias.
• The inputs can be seen as neurons and are called the input layer. Together, these
neurons and the function form a perceptron.
• The binary classification function of the perceptron network is represented as follows:
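The bias trick described above can be illustrated directly. In this sketch (function names and values are illustrative assumptions), comparing the weighted sum against θ gives the same decision as appending a constant input of 1 with weight −θ and comparing against 0.

```python
def fires_with_threshold(x, w, theta):
    """Fire when the weighted sum reaches the threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def fires_with_bias(x, w, theta):
    """Same decision, with the threshold folded in as a bias weight of -theta."""
    x_aug = [1] + list(x)        # extra input fixed at 1
    w_aug = [-theta] + list(w)   # its weight is the bias
    return 1 if sum(wi * xi for wi, xi in zip(w_aug, x_aug)) >= 0 else 0

w, theta = [1, 1], 2
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, fires_with_threshold(x, w, theta), fires_with_bias(x, w, theta))
```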
PERCEPTRON MODEL
• Schematic Representation
• Net input:
\[ y_{in} = \sum_{i=1}^{n} w_i x_i + b \]
• Output:
\[ Y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases} \]
• f(y_in) is the activation function (step function)
TRAINING ALGORITHM
• Implementation of a two-input AND gate for bipolar inputs using Rosenblatt’s perceptron
model
MINSKY AND PAPERT
MODEL
INTRODUCTION
• Marvin Minsky and Seymour Papert are two influential figures in the field of artificial
intelligence and neural networks.
• Their book, "Perceptrons: An Introduction to Computational Geometry," published in
1969, is a seminal work that critically analyzed the capabilities and limitations of the
Perceptron model, a simple type of artificial neural network.
• Their analysis highlighted significant challenges in the field and spurred the development
of more complex neural network architectures.
KEY CONCEPTS
THE PERCEPTRON MODEL
LIMITATIONS OF THE PERCEPTRON
• Minsky and Papert provided mathematical proofs showing the Perceptron's limitations.
They rigorously analyzed the types of problems that could and could not be solved by
the Perceptron.
• Their proofs highlighted that any problem requiring a non-linear decision boundary
cannot be solved by a single-layer Perceptron.
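The limitation can be illustrated (though not proved) with a brute-force search. The sketch below tries every combination of two weights and a bias on a coarse grid, which is an arbitrary choice made here, and looks for a single threshold unit that reproduces a given truth table. A solution exists for AND but none is found for XOR.

```python
from itertools import product

def find_single_unit(targets, grid):
    """Search the grid for (w1, w2, b) such that step(w1*x1 + w2*x2 + b) matches targets."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for w1, w2, b in product(grid, repeat=3):
        outputs = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in inputs]
        if outputs == targets:
            return (w1, w2, b)
    return None

grid = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
and_solution = find_single_unit([0, 0, 0, 1], grid)
xor_solution = find_single_unit([0, 1, 1, 0], grid)
print("AND:", and_solution)
print("XOR:", xor_solution)
```

The empty result for XOR on this grid agrees with the general result: no straight line separates the XOR classes, so no choice of weights works.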
IMPLICATIONS FOR AI RESEARCH
• Criticism and Impact: Their work initially led to a decline in interest and funding for
neural network research during the 1970s, a period often referred to as the "AI Winter."
• Reevaluation and Revival: Despite the initial setback, their criticisms were crucial for
the eventual resurgence of neural networks. Researchers recognized the need for multi-
layer networks, leading to the development of more advanced models like multi-layer
perceptrons (MLPs) and deep neural networks.
ADVANCES FOLLOWING MINSKY AND
PAPERT'S WORK
MULTI-LAYER PERCEPTRONS (MLPS):
• Introduction of additional layers (hidden layers) between the input and output layers.
• Use of non-linear activation functions (e.g., sigmoid, tanh, ReLU) in hidden layers.
• Ability to solve non-linearly separable problems and learn complex patterns.
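As a small illustration of what a hidden layer adds, the sketch below solves XOR with two hidden threshold units and one output unit. The weights are set by hand for illustration (they are not learned, and they are not taken from the slides): one hidden unit computes OR, the other computes AND, and the output fires when OR is on but AND is off.

```python
def step(z):
    """Non-linear threshold activation."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1: x1 OR x2
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: x1 AND x2
    return step(h_or - h_and - 0.5)  # output: OR but not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```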
BACKPROPAGATION ALGORITHM:
NEURAL NETWORK ARCHITECTURES:
ACTIVATION FUNCTIONS
Session - 6
HEBBIAN LEARNING
RULE
SESSION - 7
BEGINNINGS OF ARTIFICIAL NEURON
Unsupervised Learning
HEBBIAN LEARNING RULE
• The Hebbian Learning Rule, also known as the Hebb Learning Rule, was proposed by Donald O. Hebb
in 1949.
• It is one of the earliest and simplest learning rules for neural networks.
• According to Hebb’s rule, the weights increase in proportion to the product of
input and output.
• This means that in a Hebb network, if two neurons are interconnected, the weights
associated with these neurons can be increased by changes in the synaptic gap.
• This network is suitable for bipolar data.
• The Hebbian learning rule is often illustrated on logic gates.
HEBBIAN LEARNING RULE
x input
y target output
TRAINING ALGORITHM
• Step 1: Initially the weights and the bias are set to zero:
w_i = 0 for all inputs i = 1 to n, where n is the total number of input neurons; b = 0
• Step 2: For each training pair s : t (input vector s, target output t), the activation function for the inputs is generally set as an identity
function: x_i = s_i.
• Step 3: The activation for the output is set to the target: y = t.
• Step 4: The weights and bias are adjusted using the formula:
w_i(new) = w_i(old) + x_i*y
b(new) = b(old) + y
• Step 5: Steps 2 to 4 are repeated for each input vector and target output.
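The training steps above can be sketched as a short loop. The function name `hebb_train` and the order in which the bipolar AND patterns are presented are assumptions made for illustration.

```python
def hebb_train(samples):
    """One pass of Hebbian learning over (input vector, target) pairs."""
    n = len(samples[0][0])
    w = [0] * n   # Step 1: weights start at zero
    b = 0         #         bias starts at zero
    for x, t in samples:
        y = t                                      # Step 3: output set to target
        w = [wi + xi * y for wi, xi in zip(w, x)]  # Step 4: w(new) = w(old) + x*y
        b = b + y                                  #         b(new) = b(old) + y
    return w, b

# Bipolar AND gate: the output is 1 only when both inputs are 1.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = hebb_train(and_samples)
print(w, b)
```

With these four patterns the weights end at [2, 2] and the bias at -2, so the sign of 2*x1 + 2*x2 - 2 reproduces the AND gate.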
HEBBIAN LEARNING RULE: AND GATE IMPLEMENTATION
Source: https://www.youtube.com/watch?v=xqMCEFPk2cY
Prof. Preethi J, IIT Bombay
w(new) = w(old) + x*y
b(new) = b(old) + y
• Implement a 2-input OR gate using Hebbian learning for bipolar inputs and draw the Hebb
network for the OR gate with the updated weights.
HEBBIAN LEARNING RULE: OR GATE IMPLEMENTATION
w(new) = w(old) + x*y
b(new) = b(old) + y
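A possible solution to the OR gate exercise is sketched below, applying the two update formulas to the four bipolar patterns. The presentation order of the patterns and the helper names are assumptions; a different order gives the same final weights, because the updates are simple sums.

```python
def hebb_train(samples):
    """Apply w(new) = w(old) + x*y and b(new) = b(old) + y once per pattern."""
    w = [0] * len(samples[0][0])
    b = 0
    for x, y in samples:
        w = [wi + xi * y for wi, xi in zip(w, x)]
        b += y
    return w, b

def predict(x, w, b):
    """Bipolar output: +1 if the net input is positive, otherwise -1."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if net > 0 else -1

# Bipolar OR gate: the output is -1 only when both inputs are -1.
or_samples = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
w, b = hebb_train(or_samples)
print(w, b, [predict(x, w, b) for x, _ in or_samples])
```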
BEGINNINGS OF ARTIFICIAL NEURON
Supervised Learning
PERCEPTRON
LEARNING
SESSION - 8
PERCEPTRON MODEL
• Schematic Representation
• The output of the perceptron network (y_out) is a function of the weighted sum of inputs:
• y_in = x_1 w_1 + x_2 w_2 + x_3 w_3 + ... + x_n w_n
• y_out = f(y_in)
\[ y = f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases} \]
• f(y_in) is the activation function (step function)
PERCEPTRON MODEL: LEARNING
• With a bias term, the net input becomes:
\[ y_{in} = \sum_{i=1}^{n} w_i x_i + b \]
TRAINING ALGORITHM
• Implementation of a two-input AND gate for bipolar inputs using Rosenblatt’s perceptron
model
Source: https://www.youtube.com/watch?v=CvbYumf_wSI
Credit: Mahesh Huddar
Perceptron Learning Rule
AND function (2 bipolar inputs and output) using Perceptron
• If y ≠ t, then the weights and bias are updated; otherwise they are left unchanged.
KEY SLIDES OF PERCEPTRON TRAINING
WORKED OUT EXAMPLE
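The worked example can be reproduced with a short training loop. This is a sketch under stated assumptions: the update used when y ≠ t is the standard perceptron rule, w_i(new) = w_i(old) + α·t·x_i and b(new) = b(old) + α·t, and the learning rate α = 1, threshold θ = 0, zero initial weights, and pattern order are all choices made here rather than values confirmed by the slides.

```python
def activation(y_in, theta=0.0):
    """Three-level step function used by the perceptron."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Repeat passes over the data until an epoch makes no weight change."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            y = activation(sum(wi * xi for wi, xi in zip(w, x)) + b, theta)
            if y != t:  # update only on a wrong (or undecided) output
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:
            return w, b, epoch
    return w, b, max_epochs

and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, epochs = train_perceptron(and_samples)
print(w, b, epochs)
```

Under these assumptions the weights settle at [1, 1] with bias -1; the second epoch makes no changes, which is the stopping condition.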
HEBB’S RULE VS PERCEPTRON LEARNING
RULE
• 1943: Artificial Neuron Model (McCulloch-Pitts)
DELTA LEARNING
RULE
SESSION 9 & 10
POPULAR LEARNING RULES IN ANN
DELTA LEARNING RULE
• https://www.youtube.com/watch?v=ktGm0WCoQOg
• https://www.youtube.com/watch?v=MUoEv1Hv0KM
Department of IR&D
Session - 11
AIM OF THE SESSION
To model complex functions by propagating input data through multiple layers to produce an output.
INSTRUCTIONAL OBJECTIVES
This session is designed to: define the architecture and function of feed-forward neural networks, and
describe and demonstrate how weights and biases are adjusted during training.
LEARNING OUTCOMES
At the end of this session, you should be able to: Design and implement a neural network model to
solve real-world problems.
FEED-FORWARD NEURAL NETWORK (FFNN)
• Feed-forward neural networks (FFNNs) are a type of artificial neural network that can be
used for analyzing pattern association, pattern classification, and pattern mapping. In
these tasks, the network is trained to recognize patterns in input data and map them to a
corresponding output.
• Pattern Classification is the process of assigning input data to one of several pre-defined
categories or classes. For example, an FFNN could be trained to classify images of animals into
categories such as "cat," "dog," or "bird." During training, the network is presented with a set of
input images and their corresponding categories, and it learns to map each image to its correct
category. Once trained, the network can be used to classify new images into these categories.
The process of training a feed-forward neural network involves the following steps:
1. Initialization: The weights and biases of the network are randomly initialized to small values.
2. Forward Propagation: The input data is fed through the network, and each neuron in each layer
calculates a weighted sum of its inputs, applies an activation function to this sum, and passes the
result on to the next layer.
3. Error Calculation: The difference between the actual output and the desired output is calculated,
and this error is used to adjust the weights and biases in the network.
4. Backward Propagation: The error is propagated backwards through the network, and the weights
and biases of each neuron are adjusted to minimize the error.
5. Repeat: Steps 2-4 are repeated many times until the network reaches a state where the error is
minimized, and the network is able to accurately map inputs to outputs.
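The five steps above can be sketched for a tiny network with two inputs, two hidden neurons, and one output, trained on the AND function. Everything specific here is an assumption made for illustration: the sigmoid activation, the squared-error measure, the learning rate, the number of epochs, the random seed, and the per-pattern (online) updates.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Step 2: weighted sum plus activation, layer by layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output

def mse(data, params):
    """Mean squared error of the network over the whole data set."""
    return sum((t - forward(x, *params)[1]) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: small random weights, zero biases.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    initial_error = mse(data, (W1, b1, W2, b2))
    for _ in range(epochs):  # Step 5: repeat
        for x, t in data:
            hidden, y = forward(x, W1, b1, W2, b2)
            d_out = (t - y) * y * (1.0 - y)  # Step 3: error times sigmoid derivative
            # Step 4: propagate the error back to the hidden layer.
            d_hid = [d_out * W2[j] * hidden[j] * (1.0 - hidden[j]) for j in range(n_hidden)]
            W2 = [W2[j] + lr * d_out * hidden[j] for j in range(n_hidden)]
            b2 += lr * d_out
            W1 = [[W1[j][i] + lr * d_hid[j] * x[i] for i in range(n_in)] for j in range(n_hidden)]
            b1 = [b1[j] + lr * d_hid[j] for j in range(n_hidden)]
    return initial_error, mse(data, (W1, b1, W2, b2))

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
initial_error, final_error = train(and_data)
print(initial_error, final_error)
```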
Perceptron
IMAGE SOURCE:https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53
AND gate perceptron
[Figure: perceptron for the AND gate with inputs x1 and x2 (weight 1 each) and bias b = -1, producing the AND output]
Multi layer Perceptron
Image source: https://www.researchgate.net/figure/A-hypothetical-example-of-Multilayer-Perceptron-Network_fig4_303875065
MLP
•The output y is calculated by:
\[ y_j(n) = \varphi_j(v_j(n)) = \varphi_j\left( \sum_{i=0}^{m} w_{ji}(n)\, y_i(n) \right) \]
where w_{j0}(n) is the bias (with the corresponding input fixed at y_0(n) = 1).
https://www.youtube.com/watch?v=tUeGI--71q8&list=PL2sEPpvG8-TPUjB10H1AQpcTmumWlpODe
Derivation of Back Propagation Algorithm
by Mahesh Huddar
https://www.youtube.com/watch?v=XN5IRqtFhOY
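Backpropagation algorithm
•Assume that a set of examples
\[ \{x(n), d(n)\}, \quad n = 1, \ldots, N \]
is given. x(n) is the input vector of dimension m_0 and d(n) is the desired response
vector of dimension M.
•Thus the error signal for the output neuron j will be
\[ e_j(n) = d_j(n) - y_j(n) \]
•We can derive a learning algorithm for an MLP by assuming an
optimisation approach based on the steepest descent direction,
i.e.
\[ \Delta w(n) = -\eta\, g(n) \]
where g(n) is the gradient vector of the cost function and η is the learning
rate.
Backpropagation algorithm
•The algorithm derived from the steepest descent direction is
called back-propagation.
•Define an instantaneous cost function as follows:
\[ E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n) \]
where C is the set of all output neurons.
•If we assume that there are N examples in the set, then the
average squared error is:
\[ E_{av} = \frac{1}{N} \sum_{n=1}^{N} E(n) \]
•We need to calculate the gradient with respect to E_av or with respect to
E(n). In the first case we calculate the gradient per
epoch (i.e. over all N patterns), while in the second the
gradient is calculated per pattern.
•In the case of E_av we have the Batch mode of the
algorithm. In the case of E(n) we have the Online or
Stochastic mode of the algorithm.
•Assume that we use the online mode for the rest of
the calculation. The gradient is defined as:
\[ g(n) = \frac{\partial E(n)}{\partial w_{ji}(n)} \]
•Using the chain rule of calculus we can write:
\[ \frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_j(n)} \, \frac{\partial e_j(n)}{\partial y_j(n)} \, \frac{\partial y_j(n)}{\partial v_j(n)} \, \frac{\partial v_j(n)}{\partial w_{ji}(n)} \]
•The first two factors are:
\[ \frac{\partial E(n)}{\partial e_j(n)} = e_j(n), \qquad \frac{\partial e_j(n)}{\partial y_j(n)} = -1 \]
•And,
\[ \frac{\partial y_j(n)}{\partial v_j(n)} = \varphi_j'(v_j(n)), \qquad \frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n) \]
•The equation regarding the weight corrections can be
written as:
\[ \Delta w_{ji}(n) = -\eta \frac{\partial E(n)}{\partial w_{ji}(n)} = \eta\, e_j(n)\, \varphi_j'(v_j(n))\, y_i(n) = \eta\, \delta_j(n)\, y_i(n) \]
where δ_j(n) = e_j(n) φ_j'(v_j(n)) is the local gradient of output neuron j.
•A hidden neuron j has no desired response of its own, so its local gradient is defined through the cost function, with k ranging over the output neurons:
\[ \delta_j(n) = -\frac{\partial E(n)}{\partial y_j(n)}\, \varphi_j'(v_j(n)), \qquad E(n) = \frac{1}{2} \sum_{k \in C} e_k^2(n) \]
•Then we have:
\[ \frac{\partial E(n)}{\partial y_j(n)} = \sum_{k \in C} e_k(n) \frac{\partial e_k(n)}{\partial y_j(n)} \]
•We use again the chain rule of differentiation to get
the partial derivative of e_k(n) with respect to y_j(n):
\[ \frac{\partial E(n)}{\partial y_j(n)} = \sum_{k \in C} e_k(n) \frac{\partial e_k(n)}{\partial v_k(n)} \frac{\partial v_k(n)}{\partial y_j(n)} \]
•Hence, since e_k(n) = d_k(n) − φ_k(v_k(n)):
\[ \frac{\partial e_k(n)}{\partial v_k(n)} = -\varphi_k'(v_k(n)) \]
•The local field v_k(n) is defined as:
\[ v_k(n) = \sum_{j=0}^{m} w_{kj}(n)\, y_j(n) \]
so that ∂v_k(n)/∂y_j(n) = w_kj(n), and therefore
\[ \frac{\partial E(n)}{\partial y_j(n)} = -\sum_{k \in C} e_k(n)\, \varphi_k'(v_k(n))\, w_{kj}(n) = -\sum_{k \in C} \delta_k(n)\, w_{kj}(n) \]
•Putting all together we find for the local gradient of a
hidden neuron j the following formula:
\[ \delta_j(n) = \varphi_j'(v_j(n)) \sum_{k \in C} \delta_k(n)\, w_{kj}(n) \]
•At the output layer:
\[ y_j^{(L)}(n) = o_j(n) \]
where o_j(n) is the jth component of the output
vector o. L is the total number of layers in the
network.
• Compute the error signal:
\[ e_j(n) = d_j(n) - o_j(n) \]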
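The output-neuron part of the derivation can be sanity-checked numerically. The sketch below uses a single output neuron with a sigmoid activation and made-up weights and inputs (all illustrative assumptions) and compares the analytic gradient, −e_j φ'(v_j) y_i, with a central finite-difference estimate of the derivative of E = ½ e_j².

```python
import math

def phi(v):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-v))

def phi_prime(v):
    s = phi(v)
    return s * (1.0 - s)

def cost(w, y_prev, d):
    """Instantaneous cost E = 0.5 * e^2 for one output neuron."""
    v = sum(wi * yi for wi, yi in zip(w, y_prev))
    e = d - phi(v)
    return 0.5 * e * e

w = [0.1, -0.3, 0.2]       # w[0] is the bias weight
y_prev = [1.0, 0.5, -1.0]  # y_prev[0] = 1 is the bias input
d = 1.0                    # desired response

v = sum(wi * yi for wi, yi in zip(w, y_prev))
e = d - phi(v)
delta = e * phi_prime(v)                   # local gradient of the output neuron
analytic = [-delta * yi for yi in y_prev]  # dE/dw_i = -delta * y_i

eps = 1e-6
numeric = []
for i in range(len(w)):
    w_plus = list(w)
    w_minus = list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric.append((cost(w_plus, y_prev, d) - cost(w_minus, y_prev, d)) / (2 * eps))

print(analytic)
print(numeric)
```

The weight correction is then η·δ·y_i, that is, −η times the gradient printed above.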
• Session no: 17
• Topic: Radial Basis Function Networks
INTRODUCTION
Source: https://www.cs.cmu.edu/afs/cs/academic/class/15883-f19/slides/rbf.pdf
Linear Perceptron
RBFN
BASIC ARCHITECTURE OF RBFN: 3 LAYERS
• Input layer
• Source nodes that connect the network to its
environment
• Hidden layer
• Hidden units provide a set of basis functions
• High dimensionality
• Output layer
• Linear combination of hidden functions
RBF NETWORK
Source: https://www.cs.cmu.edu/afs/cs/academic/class/15883-f19/slides/rbf.pdf
BASIC ARCHITECTURE OF RBFN:
MULTIPLE OUTPUTS FOR CLASSIFICATION
RADIAL BASIS FUNCTION
LEARNING PROCESS
• Training involves two stages: center selection and weight calculation.
• Center Selection: The centers of the RBF neurons can be determined using
clustering techniques like k-means or through other strategies based on the
dataset.
• Weight Calculation: The weights connecting the hidden RBF layer to the
output layer are typically calculated using linear regression or other techniques
such as the Moore-Penrose pseudo-inverse.
LEARNING RULE FOR RADIAL BASIS NETWORK
G = Gaussian function
Source: https://www.csd.uoc.gr/~hy476/lectures/WK4%20-%20Radial%20Basis%20Function%20Networks.ppt
Illustrative Example - XOR Problem
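A common textbook treatment of this example, which may differ in detail from the original slide, places two Gaussian units at (1, 1) and (0, 0). In the space of the two hidden outputs the XOR classes become linearly separable, so a linear output unit is enough. In the sketch below the Gaussian width is fixed so that φ = exp(−‖x − c‖²), and the output weights (−1, −1) and bias 0.95 are picked by hand for illustration rather than learned.

```python
import math

def gaussian(x, center):
    """Radial basis function exp(-||x - c||^2)."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq)

CENTERS = [(1, 1), (0, 0)]  # assumed centers
WEIGHTS = [-1.0, -1.0]      # hand-picked output weights
BIAS = 0.95                 # hand-picked output bias

def rbf_xor(x):
    hidden = [gaussian(x, c) for c in CENTERS]              # hidden layer
    net = sum(w * h for w, h in zip(WEIGHTS, hidden)) + BIAS  # linear output layer
    return 1 if net > 0 else 0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, [round(gaussian(x, c), 4) for c in CENTERS], rbf_xor(x))
```

The inputs (0, 1) and (1, 0) map to the same point in the hidden space, which is what allows a single straight line to separate the two classes there.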
MLP VS RBFN
RBFs for Classification
APPLICATIONS OF RBFN
i. Function Approximation:
• RBFNs are often used for function approximation tasks. Given input data and corresponding target
values, the network learns to approximate the underlying function.
• The RBF neurons capture the input-output relationship based on the chosen radial basis functions.
ii. Classification:
• RBFNs can be used for classification tasks by assigning classes based on the output of the network.
• Typically, a softmax function or other appropriate activation function is used in the output layer for
classification.
ADVANTAGES AND LIMITATIONS OF RBFN
i. Advantages:
• They have the ability to generalize well to unseen data if properly trained.
• RBFNs can be interpretable, as the center locations of RBF neurons often correspond to meaningful features.
ii. Limitations:
• The number of RBF neurons and their spread parameters need to be carefully chosen, which can require
domain knowledge or hyperparameter tuning.
• Training RBFNs can be computationally expensive due to the need for center selection and weight calculation.
SELF ASSESSMENT QUESTIONS
1.What is a key characteristic of Radial Basis Function (RBF) networks compared to traditional feedforward neural
networks?
•A) They have fewer layers.
•B) They use linear activation functions.
•C) They use radial basis functions in the hidden layer.
•D) They require more training data.
2.Which of the following functions is commonly used as a radial basis function in RBF networks?
•A) Sigmoid
•B) ReLU
•C) Gaussian
•D) Softmax
3. Which training algorithm is commonly used for adjusting the parameters of an RBF network?
•A) Gradient Descent
•B) Backpropagation
•C) Evolutionary Algorithms
•D) K-means Clustering
TERMINAL QUESTIONS
RESOURCES
Online Resources:
1.Tutorial on RBF Networks by Giuseppe Boccignone: A detailed tutorial covering the theory and practical aspects
of RBF networks, available on arXiv.
2.RBF Networks on Scholarpedia: A comprehensive entry providing an overview of RBF networks, their
mathematical foundations, and applications.
3.Wikipedia: Radial basis function - Wikipedia's entry on RBFs covers their mathematical formulation, applications,
and variations.
4.Stanford CS229 Lecture Notes: Radial Basis Function Networks - Lecture notes from Stanford's CS229 course
covering RBF networks and their applications.
ARTIFICIAL NEURAL NETWORKS
• Session no: 18
• Topic: Unsupervised Learning – Hamming network
Types of learning
• Supervised.
• Unsupervised.
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
Hamming Distance
Consider two bipolar vectors (i.e., each coordinate can take either of two values):
= 1 -1 -1 -1 -1 +1 -1 = 2
=2
Dot product of two bipolar vectors = No. of similarities – No. of dissimilarities
Source: https://www.youtube.com/watch?v=qR8-Ix_1Z3E
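The relation between the dot product and the Hamming distance of two bipolar vectors can be checked with a short sketch. The two vectors below are hypothetical; the function names are chosen only for this illustration.

```python
def hamming_distance(x, y):
    """Number of coordinates in which the two vectors differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def dot(x, y):
    """Ordinary dot product."""
    return sum(a * b for a, b in zip(x, y))

# Hypothetical bipolar vectors (each coordinate is +1 or -1).
x = [1, -1, -1, 1]
y = [1, 1, -1, -1]

n = len(x)
dissimilarities = hamming_distance(x, y)   # coordinates that differ
similarities = n - dissimilarities         # coordinates that agree

# For bipolar vectors: x . y = similarities - dissimilarities = n - 2 * HD
print(dot(x, y), similarities - dissimilarities, n - 2 * dissimilarities)
# prints: 0 0 0
```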
Hamming Network
• Hamming network is used to classify an input vector to one of
the pre-stored vectors/patterns (error detection and correction).
It can also be used for clustering patterns into pre-defined
number of clusters.
• All the nodes in the input layer are connected to all the nodes in
the output layer.
Source: https://www.youtube.com/watch?v=qR8-Ix_1Z3E
• Hamming network has an input layer and an output layer.
Hamming Network
• In the network diagram, shown on right, there are
o n=4 input nodes, and
o p=3 output nodes.
o 12 connections with associated 12 weights
• Each node in the output layer stores one vector.
So, the network on the right stores 3 vectors in the output layer.
Source: https://www.youtube.com/watch?v=qR8-Ix_1Z3E
Hamming Network
In our case, p=3 and n=4. Let us denote the 3 stored vectors as
Hamming Network
Let X = (x1, x2, x3, x4) denote a 4-dimensional input vector.
Earlier, we had
Then, we have
Now, this Hamming network computes the negative (-ve) Hamming distance between the input (X) and the stored patterns.
This permits us to associate an input vector with the pattern in the output layer that is most similar to
the input vector (classification), or to restore noisy patterns.
Hamming algorithm
• Step (1) : Specify the examples.
• Step (2) : Fix the weights matrix :
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
Hamming net: solved problem
The Hamming net has three examples :
e(1) = [1 1 -1 -1], e(2) = [-1 -1 -1 1], e(3) = [1 -1 -1 1]
We are given the following vector, and we have to classify it to the closest example :
V = [1 -1 1 1]
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
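The classification of V can be sketched in a few lines. The examples and V are taken from the problem statement; the helper name `negative_hamming_distance` is an assumption made for this illustration, and it uses the identity dot = n − 2·HD for bipolar vectors rather than any particular weight matrix from the slides.

```python
examples = {
    "e(1)": [1, 1, -1, -1],
    "e(2)": [-1, -1, -1, 1],
    "e(3)": [1, -1, -1, 1],
}
V = [1, -1, 1, 1]

def negative_hamming_distance(x, e):
    """Return -HD(x, e) for bipolar vectors, using dot = n - 2 * HD."""
    n = len(x)
    dot = sum(a * b for a, b in zip(x, e))
    return (dot - n) // 2   # dot - n is always even, so this is exact

# One score per stored example; the largest score is the closest example.
scores = {name: negative_hamming_distance(V, e) for name, e in examples.items()}
winner = max(scores, key=scores.get)

print(scores)   # {'e(1)': -3, 'e(2)': -2, 'e(3)': -1}
print(winner)   # e(3)
```

V differs from e(3) in only one coordinate, so it is classified to e(3).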
Hamming Net: Digit Recognition Task
Suppose the task is to recognize 1 of 8 signs recorded in the network “memory”. The picture
below shows 8 used signs:
The matrices that represent these signs are two-dimensional (3x5 pixels), but we write each of them as a
single row vector with 15 positions (15 dimensions). Black squares are ones, white squares are
zeros.
Now, we can follow the Hamming net algorithm, shown in the previous slide, to classify any digit
to one of these 8 classes.
Source: https://home.agh.edu.pl/~vlsi/AI/hamming_en/
SELF ASSESSMENT QUESTIONS
TERMINAL QUESTIONS
References
• Ahmed Hashmi, Chemoy Das. (2012). Neural Networks and its Application. www.slideshare.net
• K Ming Leung. (2007). Fixed Weight Competitive Nets : Hamming Net. Polytechnic University.
ARTIFICIAL NEURAL NETWORKS
• Session no: 19
• Topic: Maxnet Architecture
RECAP: Types of learning
• Supervised.
• Unsupervised.
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
RECAP: Hamming Network
Let X = (x1, x2, x3, x4) denote a 4-dimensional input vector. Then, we have
MAXNET
• MAXNET is a
• One-layer (the Hamming network is two-layer)
• Recurrent (the Hamming network is feed-forward)
• Competitive network
• It conducts a competition to determine which node
has the highest initial value.
• It depends on Winner-Take-All (WTA) policy
(Only one nonzero output).
Source: https://www.youtube.com/watch?v=T-b0HG9dEpM&t=4s
MAXNET
It is a subnet with ‘n’ nodes, which are all
completely interconnected.
In other words, from each node there are
connections to all ‘n’ nodes, including itself.
Source: https://www.youtube.com/watch?v=T-b0HG9dEpM&t=4s
MAXNET
• Every node in MAXNET receives inhibitory
inputs from all other nodes via ‘lateral’ (intra-layer)
connections.
• These connections have negative weights (-ε).
MAXNET
• Self-excitation weights, θ ≈ 1
• Mutual inhibition weights, ε ≤ 1/n
Source: https://www.youtube.com/watch?v=T-b0HG9dEpM&t=4s
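The competition in MAXNET can be sketched as a simple iteration. This is a minimal illustration assuming self-excitation θ = 1, mutual inhibition ε ≤ 1/n and a ramp activation that clips negative values to zero; the initial activations and ε = 0.2 are hypothetical values.

```python
def maxnet(initial, epsilon, theta=1.0, max_iters=100):
    """Iterate the winner-take-all competition until at most one node is nonzero."""
    a = list(initial)
    for _ in range(max_iters):
        if sum(1 for v in a if v > 0) <= 1:
            break
        total = sum(a)
        # Each node excites itself (theta) and is inhibited by all others (-epsilon).
        a = [max(0.0, theta * v - epsilon * (total - v)) for v in a]
    return a

# Hypothetical initial activations for n = 4 nodes, with epsilon = 0.2 <= 1/4.
final = maxnet([0.2, 0.4, 0.6, 0.8], epsilon=0.2)
winner = max(range(len(final)), key=lambda i: final[i])
```

With these values only the node that started with the highest activation (index 3) stays nonzero. If two nodes start with exactly the same maximum, this sketch does not break the tie.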
Hamming-Maxnet network
[Figure: Hamming-Maxnet network, from input to output]
Hamming-Maxnet network: solved problem
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
Hamming-Maxnet network: TB test
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
Hamming-Maxnet network: another solved problem
Source: https://www.youtube.com/watch?v=n-J_mpgPf8k
Hamming-Maxnet network: conclusions
Source: https://cs.wmich.edu/~elise/courses/cs6800/DC.pptx
SELF ASSESSMENT QUESTIONS
1. Which of the following best describes the learning rule used in Maxnet architecture?
A) Hebbian learning
B) Competitive learning
C) Backpropagation
D) None of the above
Answer: B) Competitive learning
TERMINAL QUESTIONS
References
• Ahmed Hashmi, Chemoy Das. (2012). Neural Networks and its Application. www.slideshare.net
• K Ming Leung. (2007). Fixed Weight Competitive Nets : Hamming Net. Polytechnic University.
• Karam Hatim, Mohammed Hamdi. (2009). Detection of Tuberculosis by using Artificial Neural Networks.
University of Mosul
ARTIFICIAL NEURAL NETWORKS
22AIP3204
CO-3
SESSION-15
https://www.nobelprize.org/prizes/physics/2024/popular-information/
Associative memory
Imagine that you are trying to remember a fairly unusual word that you rarely use, such
as one for that sloping floor often found in cinemas and lecture halls. You search your
memory. It’s something like ramp… perhaps rad…ial? No, not that. Rake, that’s it!
This process of searching through similar words to find the right one is reminiscent of
the associative memory that the physicist John Hopfield discovered in 1982.
The Hopfield network can store patterns and has a method for recreating them. When
the network is given an incomplete or slightly distorted pattern, the method can find
the stored pattern that is most similar.
https://www.nobelprize.org/prizes/physics/2024/popular-information/
The Hopfield network can be used to recreate data that contains noise or which has been partially erased
https://www.nobelprize.org/prizes/physics/2024/popular-information/
Hopfield Network
OVERVIEW
• Introduction
• Summary
Feedback Neural Networks - Introduction
➢Presents a detailed analysis of the pattern recognition tasks which can
be performed by feedback neural networks (FNN)
➢Most general form → For a set of neuron units, output of each unit is
fed as input to all other units and itself
$\mathbf{w}$ is a vector
Weight vector for the $j$-th unit of the output layer: $\mathbf{w}_j = [w_{j1}, w_{j2}, \ldots, w_{jM}]^T$
Analysis of Linear Auto-Associative Networks
➢ Objective → Associate a given pattern with itself during
training
➢ So in autoassociation → $b_l = a_l$, $l = 1, 2, 3, \ldots, L$
➢ Lack of accretive behavior during recall → Therefore not useful for storing information
➢ Linear units replaced with non-linear units makes pattern storage possible for these
networks
Analysis Of Pattern Storage Networks
➢ Objective is to store a given set of patterns so that any of them can be recalled
exactly when an approximate pattern is presented for input to the network
➢ Pattern recall should happen despite disturbances to features and their spatial
relations occurring due to:
→ Noise and distortion or
→ Natural variation of the pattern generation process
➢ Outputs of the processing units (neurons) at any instant of time define the
output state of the network at that instant
https://www.nobelprize.org/prizes/physics/2024/popular-information/
Analysis Of Pattern Storage Networks
➢ Feedback in the units and non-linear processing of the units → creates basins of
attraction
➢ Small deviations from the stable state can be measured using Hamming distance
➢ $2^N$ different states of a network with $N$ binary units, but only $N$ energy minima – so can
store only $N$ binary patterns
➢ If number of patterns > number of basins of attraction → hard storage problem – patterns
cannot be stored in the given network
➢ If number of patterns < number of basins of attraction → false wells or minima → state of
the network may settle into a false well → error in recall
➢ Two types
→ Continuous
→ Discrete
→ Continuous model :
• State update determined by activation dynamics
• Units have continuous non-linear output functions
→ Discrete model :
• State update is asynchronous
• Units have binary/bipolar output functions
Hopfield Network
The Hopfield Model
CONSIDER THE McCULLOCH-PITTS MODEL
• A Hopfield net is composed of binary threshold units with
recurrent connections between them
• Introduction
• The Hopfield Model
• The Energy Function
• Storing memories in a Hopfield Net
• Types of Hopfield Nets (Discrete, Continuous)
• Storage Capacity of a Hopfield Net
• Solved Problem Example
• Summary
Introduction
• In 1982, John Hopfield introduced an artificial neural network that stores and
retrieves memories in a way similar to the human brain. Here, each neuron is in one of two states: on or
off.
• Thus, similar to the human brain, the Hopfield model is stable in pattern
recognition.
The Hopfield Model
https://www.nobelprize.org/prizes/physics/2024/popular-information/
Energy Analysis of the Hopfield Network
• Discrete Hopfield Model
→ Associated with each state of the network, Hopfield proposed
an energy function whose value always either reduces or
remains the same with a change in the state of the network:
$v = -\frac{1}{2} \sum_{i} \sum_{j \neq i} w_{ij} s_i s_j + \sum_{i} \theta_i s_i$
Where
$\theta_i$ is the threshold value of the unit ‘i’
$s_i$ is the state of the ‘i’th unit
$s_j$ is the state of the ‘j’th unit
$w_{ij}$ are the symmetric weights
$v$ is the total energy of the network
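The claim that the energy never increases under a state change can be checked numerically. The sketch below assumes binary (0/1) units, symmetric weights with no self-connections, and an asynchronous rule that switches a unit on when its weighted input exceeds its threshold; the weights and thresholds are hypothetical values, not those of the worked example.

```python
from itertools import product

def energy(state, weights, thresholds):
    """v = -1/2 * sum_{i != j} w_ij s_i s_j + sum_i theta_i s_i."""
    n = len(state)
    pair_term = sum(weights[i][j] * state[i] * state[j]
                    for i in range(n) for j in range(n) if i != j)
    threshold_term = sum(t * s for t, s in zip(thresholds, state))
    return -0.5 * pair_term + threshold_term

def update_unit(state, weights, thresholds, i):
    """Asynchronously update unit i of a binary (0/1) state; returns a new tuple."""
    field = sum(weights[i][j] * state[j] for j in range(len(state)) if j != i)
    new_state = list(state)
    if field > thresholds[i]:
        new_state[i] = 1
    elif field < thresholds[i]:
        new_state[i] = 0
    return tuple(new_state)

# Hypothetical symmetric weights and thresholds for a 3-unit network.
weights = [[0, 1, -2], [1, 0, 1], [-2, 1, 0]]
thresholds = [0.5, -0.5, 0.0]

# Check every state and every single-unit update.
never_increases = all(
    energy(update_unit(s, weights, thresholds, i), weights, thresholds)
    <= energy(s, weights, thresholds) + 1e-12
    for s in product((0, 1), repeat=3) for i in range(3)
)
```

For this small network, `never_increases` evaluates to `True`: no single-unit update raises the energy.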
Energy Analysis of the Hopfield Network
• Discrete Hopfield Network
→ The energy profile (landscape) of the network is determined only by the
network architecture – number of units, output functions, threshold values,
connections between units and strength of the connections
Figure (b):
The inverse function
The storage capacity of the Hopfield Net
So $v(s_1 = 0, s_2 = 1, s_3 = 1) = v(0, 1, 1)$
$= -[w_{12} s_1 s_2 + w_{13} s_1 s_3 + w_{23} s_2 s_3] + [s_1 \theta_1 + s_2 \theta_2 + s_3 \theta_3]$
$= -[w_{12}(0) s_2 + w_{13}(0) s_3 + (0.4)] + [\theta_2 + \theta_3]$
$= -0.4 + \theta_2 + \theta_3 = -0.4 + (-0.2) + 0.7 = 0.1$
Problem Example – Continued …
$\theta_1 = -0.1$, $\theta_2 = -0.2$, $\theta_3 = 0.7$
So $v(s_1 = 1, s_2 = 0, s_3 = 0) = v(1, 0, 0)$
$= -[w_{12} s_1 s_2 + w_{13} s_1 s_3 + w_{23} s_2 s_3] + [s_1 \theta_1 + s_2 \theta_2 + s_3 \theta_3]$
$= 0 + s_1 \theta_1 = -0.1$
So $v(s_1 = 1, s_2 = 0, s_3 = 1) = v(1, 0, 1)$
$= -[w_{12} s_1 s_2 + w_{13} s_1 s_3 + w_{23} s_2 s_3] + [s_1 \theta_1 + s_2 \theta_2 + s_3 \theta_3]$
$= -w_{13} s_1 s_3 + s_1 \theta_1 + s_3 \theta_3$
$= -0.5(1)(1) + (-0.1) + 0.7 = -0.5 + 0.6$
$= 0.1$
Problem Example – Continued …
So $v(s_1 = 1, s_2 = 1, s_3 = 0) = v(1, 1, 0)$
$= -[w_{12} s_1 s_2 + w_{13} s_1 s_3 + w_{23} s_2 s_3] + [s_1 \theta_1 + s_2 \theta_2 + s_3 \theta_3]$
$= -w_{12} s_1 s_2 + s_1 \theta_1 + s_2 \theta_2 = -(-0.5) + [(-0.1) + (-0.2)]$
$= 0.5 - 0.3 = 0.2$
So $v(s_1 = 1, s_2 = 1, s_3 = 1) = v(1, 1, 1)$
$= -[w_{12} s_1 s_2 + w_{13} s_1 s_3 + w_{23} s_2 s_3] + [s_1 \theta_1 + s_2 \theta_2 + s_3 \theta_3]$
$= -[w_{12} + w_{13} + w_{23}] + [\theta_1 + \theta_2 + \theta_3]$
$= -[(-0.5) + 0.5 + (0.4)] + [(-0.1) + (-0.2) + 0.7]$
$= -0.4 + 0.4 = 0$
Problem Example – Continued …
BASE-10 VALUE    s1    s2    s3    v(s1, s2, s3)
0                0     0     0      0
1                0     0     1      0.7
2                0     1     0     -0.2
3                0     1     1      0.1
4                1     0     0     -0.1
5                1     0     1      0.1
6                1     1     0      0.2
7                1     1     1      0
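The energy table can be reproduced with a short script. The weights (w12 = −0.5, w13 = 0.5, w23 = 0.4) and thresholds are read off the arithmetic of the worked example. The final step, which lists the states whose energy cannot be lowered by flipping a single unit, is an added illustration and rests on that assumed stability criterion.

```python
from itertools import product

# Parameters read off the worked example.
w12, w13, w23 = -0.5, 0.5, 0.4
theta = (-0.1, -0.2, 0.7)

def v(s1, s2, s3):
    """Energy of the 3-unit network for binary states s1, s2, s3."""
    pair_term = w12 * s1 * s2 + w13 * s1 * s3 + w23 * s2 * s3
    threshold_term = s1 * theta[0] + s2 * theta[1] + s3 * theta[2]
    return -pair_term + threshold_term

# Round away floating-point noise; "+ 0.0" turns -0.0 into 0.0.
table = {s: round(v(*s), 10) + 0.0 for s in product((0, 1), repeat=3)}

def flip(state, i):
    """Return the state with unit i toggled."""
    return tuple(1 - b if k == i else b for k, b in enumerate(state))

# Assumed criterion: a state is stable if no single flip lowers the energy.
stable_states = [s for s in table
                 if all(table[flip(s, i)] >= table[s] for i in range(3))]

print(stable_states)   # [(0, 1, 0), (1, 0, 0), (1, 1, 1)]
```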
The Hopfield Net - Conclusions and Summary
INSTRUCTIONAL OBJECTIVES
LEARNING OUTCOMES
A dynamical system is a
system whose state evolves
with time over a state space
according to a fixed rule.
ACTIVATION OF NEURON
• $y_i$: the output of neuron $i$,
• $y_j$: the outputs of all neurons that feed into
neuron $i$,
• $w_{ij}$: the symmetric weight of the connection between neurons $i$ and $j$.
ACTIVATION & UPDATE
• If the sign of the input field differs from the sign of the neuron’s current
output, its state will flip to align with the field.
• If the sign of the input field matches the sign of the neuron’s
current output, it will stay the same.
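The flip-or-stay rule can be sketched for bipolar (+1/−1) neurons. The stored pattern and the noisy probe are hypothetical, and building the weights as an outer product of the stored pattern (a Hebbian-style rule) is an assumption made for this illustration.

```python
def local_field(state, weights, i):
    """Weighted sum of the outputs of all other neurons feeding neuron i."""
    return sum(weights[i][j] * state[j] for j in range(len(state)) if j != i)

def update_neuron(state, weights, i):
    """Flip neuron i if the sign of its field differs from its current output."""
    field = local_field(state, weights, i)
    if field * state[i] < 0:      # signs differ -> flip to align with the field
        state[i] = -state[i]
    return state                  # signs match (or field is zero) -> unchanged

# Hypothetical stored pattern; weights w_ij = p_i * p_j, no self-connections.
pattern = [1, -1, 1, -1]
n = len(pattern)
weights = [[pattern[i] * pattern[j] if i != j else 0 for j in range(n)]
           for i in range(n)]

# Probe with one corrupted coordinate (index 1), then update each neuron once.
state = [1, 1, 1, -1]
for i in range(n):
    update_neuron(state, weights, i)

print(state)   # [1, -1, 1, -1]
```

Only the corrupted neuron flips; the other three already agree with their input field.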
CALCULATION OF PARAMETERS
Each neuron knows only its own state and incoming inputs, and yet
a distributed pattern emerges from the network’s collective activity.
ACTIVATION FOR HOPFIELD
Hopfield used an energy function for the network, given by the equation below, whose
value decreases monotonically (or stays the same) as the network state is updated
HEBB RULE
1. Hopfield network is a
(a) Auto-Associative
(b) Set- associative
(c) Non memory
(d) Unsupervised
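number of clusters, m = 2
m = # of nodes in the output layer
dimensionality of vector, n = 4
n = # of nodes in the input layer
Algorithm
Step 0: Initialize the weights $w_{ij}$ (random values may be assumed).
i = 1,2,...,n (dimensionality of input vector)
j = 1,2,...,m (# of clusters [nodes in output layer])
Initialize the learning rate $\alpha$.
Step 1: Calculate the squared Euclidean distance for each j = 1 to m:
$D(j) = \sum_{i=1}^{n} (w_{ij} - x_i)^2$
Step 2: Find the winning unit index J, for which D(J) is minimum.
Step 3: Update the weights of the winning unit:
$w_{iJ}(\text{new}) = w_{iJ}(\text{old}) + \alpha [x_i - w_{iJ}(\text{old})]$
Step 4: Update the learning rate using the formula $\alpha(t+1) = 0.5\,\alpha(t)$
Example: number of input features (vector dimension) n = 4, number of clusters m = 2, initial weights between 0 and 1, $\alpha = 0.5$.
Case 1: 1st input vector, winning cluster J = 1
$w_{i1}(\text{new}) = w_{i1}(\text{old}) + 0.5 [x_i - w_{i1}(\text{old})]$
$w_{11}(\text{new}) = 0.2 + 0.5[0 - 0.2] = 0.1$
$w_{21}(\text{new}) = 0.4 + 0.5[0 - 0.4] = 0.2$
$w_{31}(\text{new}) = 0.6 + 0.5[1 - 0.6] = 0.8$
$w_{41}(\text{new}) = 0.8 + 0.5[1 - 0.8] = 0.9$
Updated weight matrix:
0.1    0.9
0.2    0.7
0.8    0.5
0.9    0.3
Case 2: 2nd input vector, winning cluster J = 2
$w_{i2}(\text{new}) = w_{i2}(\text{old}) + \alpha [x_i - w_{i2}(\text{old})]$
$w_{12}(\text{new}) = 0.9 + 0.5(1 - 0.9) = 0.95$
$w_{22}(\text{new}) = 0.7 + 0.5(0 - 0.7) = 0.35$
$w_{32}(\text{new}) = 0.5 + 0.5(0 - 0.5) = 0.25$
$w_{42}(\text{new}) = 0.3 + 0.5(0 - 0.3) = 0.15$
Subsequent update of the winning cluster's weights:
$w_{11}(\text{new}) = 0.025$
$w_{21}(\text{new}) = 0.3$
$w_{31}(\text{new}) = 0.45$
$w_{41}(\text{new}) = 0.475$
New updated weight matrix:
0.025    0.95
0.3      0.35
0.45     0.25
0.475    0.15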
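The winner-take-all weight update used in the worked example above can be sketched as follows. The initial weight matrix is inferred from the arithmetic of the example; the two input vectors are assumptions chosen to be consistent with that arithmetic, since they are not legible in the notes. Cluster indices are 0-based here.

```python
def squared_distance(x, weights, j):
    """D(j) = sum_i (w_ij - x_i)^2 for cluster (column) j."""
    return sum((weights[i][j] - x[i]) ** 2 for i in range(len(x)))

def train_step(x, weights, alpha):
    """Find the winning cluster and move its weights toward x; returns the winner."""
    m = len(weights[0])
    distances = [squared_distance(x, weights, j) for j in range(m)]
    winner = distances.index(min(distances))
    for i in range(len(x)):
        weights[i][winner] += alpha * (x[i] - weights[i][winner])
    return winner

# weights[i][j]: n = 4 input features (rows), m = 2 clusters (columns).
weights = [[0.2, 0.9], [0.4, 0.7], [0.6, 0.5], [0.8, 0.3]]
alpha = 0.5

# Assumed input vectors (hypothetical, consistent with the arithmetic above).
first_winner = train_step([0, 0, 1, 1], weights, alpha)
second_winner = train_step([1, 0, 0, 0], weights, alpha)
```

After these two steps the first column is approximately (0.1, 0.2, 0.8, 0.9) and the second is approximately (0.95, 0.35, 0.25, 0.15). The learning rate is kept fixed within the two steps; halving it would happen afterwards according to Step 4.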
Optimization in Neural Network
Session 25, 26
• Backpropagation is the most common method for optimization.
• Other methods like genetic algorithm, Tabu search, and simulated
annealing can be also used.
• When we talk about ANN optimization, the objective function is the mean
square error function (loss/cost function).
• We have to find the optimal values of the weights of the neural network to
minimize the objective function.
• Although gradient-based search techniques such as back-propagation
are currently the most widely used optimization techniques for
training neural networks, it has been shown that these gradient
techniques are severely limited in their ability to find global solutions.
• Global search techniques have been identified as a potential solution
to this problem.
• Two well-known global search techniques, Simulated Annealing and
the Genetic Algorithm can be used.
• Because of its ease of use, an overwhelming majority of these
applications have used some variation of the gradient technique,
backpropagation (BP) for optimizing the networks.
• Although backpropagation has unquestionably been a major factor
in the success of past neural network applications, it is plagued by
inconsistent and unpredictable performance.
• For a variety of complex functions, the genetic algorithm was able to
achieve better solutions for neural network optimization than
backpropagation.
• Another global search heuristic, Tabu Search (TS), is also able to
systematically achieve better solutions for optimizing the neural
network than those achieved by backpropagation.
• In addition to GA and TS another well-known global search heuristic is
simulated annealing.
• Simulated annealing has been shown to perform well for optimizing a
wide variety of complex problems.
• Although the most frequently used algorithm for optimizing neural
networks is backpropagation, it is likely to obtain local solutions.
• Simulated annealing, a global search algorithm, performs better than
backpropagation, but it also uses a point-to-point search.
• Both BP and SA have multiple user determined parameters which may
significantly impact the solution.
• Since there are no established rules for selecting these parameters,
solution outcome is based on chance.
• The genetic algorithm appears to be able to systematically obtain superior
solutions to simulated annealing for optimizing neural networks.
• The genetic algorithm’s process of moving from one population of
points to another enables it to discard potential local solutions and
also to achieve the superior solutions in a computationally more
efficient manner.
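The point-to-point character of simulated annealing can be seen in a small sketch that minimizes the mean square error of a single linear neuron. The data set, step size, starting temperature and cooling rate are hypothetical choices for illustration; as noted above, there are no established rules for selecting them.

```python
import math
import random

# Hypothetical training data generated by y = 2x + 1.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def mse(params):
    """Mean square error of the neuron y = w * x + b on DATA."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in DATA) / len(DATA)

def simulated_annealing(start, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Move from one point to a nearby one, sometimes accepting a worse point."""
    rng = random.Random(seed)
    current, current_loss = list(start), mse(start)
    best, best_loss = list(current), current_loss
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, 0.1) for p in current]
        candidate_loss = mse(candidate)
        delta = candidate_loss - current_loss
        # Always accept improvements; accept worse points with probability e^(-delta/T).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = list(current), current_loss
        temp *= cooling   # lower the temperature so worse moves become rarer
    return best, best_loss

start = [0.0, 0.0]
best_params, best_loss = simulated_annealing(start)
```

The search keeps a single current point at all times, in contrast to the population of points maintained by a genetic algorithm.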
Genetic Algorithm
• Heuristic search algorithm inspired by Charles Darwin’s theory of
natural evolution.
• Genetic algorithms are based on the ideas of natural selection and
genetics.
• They are an intelligent exploitation of random search, using
historical data to direct the search into the region of better
performance in the solution space.
• They are commonly used to generate high-quality solutions for
optimization problems and search problems.
• Genetic algorithms simulate the process of natural selection which
means those species that can adapt to changes in their environment
can survive and reproduce and go to the next generation.
• In simple words, they simulate “survival of the fittest” among
individuals of consecutive generations to solve a problem.
• Each generation consists of a population of individuals, and each
individual represents a point in the search space and a possible solution.
• Each individual is represented as a string of
character/integer/float/bits.
• This string is analogous to the Chromosome.
• Genetic algorithms are based on an analogy with the genetic
structure and behavior of chromosomes of the population.
• The foundation of GAs based on this analogy is as follows:
1. Individuals in the population compete for resources and mate.
2. Those individuals who are successful (fittest) then mate to create
more offspring than others.
3. Genes from the “fittest” parents propagate throughout the
generation; that is, sometimes parents create offspring that are better
than either parent.
4. Thus each successive generation is more suited to its environment.
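The selection, mating and propagation steps listed above can be sketched for the weight-optimization setting discussed earlier. Each chromosome is a string of two floats (the weight and bias of a single linear neuron) and fitness is the mean square error, lower being fitter. The data set, population size, crossover scheme and mutation size are hypothetical choices for this illustration.

```python
import random

# Hypothetical training data generated by y = 2x + 1.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def mse(chromosome):
    """Mean square error of the neuron y = w * x + b encoded by the chromosome."""
    w, b = chromosome
    return sum((w * x + b - y) ** 2 for x, y in DATA) / len(DATA)

def genetic_algorithm(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(pop_size)]
    history = []                                   # best error per generation
    for _ in range(generations):
        population.sort(key=mse)                   # fittest individuals first
        history.append(mse(population[0]))
        parents = population[:pop_size // 2]       # only the fittest half mates
        children = [population[0]]                 # elitism: keep the best as-is
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(mother, father)]  # crossover
            child = [g + rng.gauss(0.0, 0.1) for g in child]              # mutation
            children.append(child)
        population = children
    population.sort(key=mse)
    history.append(mse(population[0]))
    return population[0], history

best, history = genetic_algorithm()
```

Because the best individual is carried over unchanged, the best error in `history` never gets worse from one generation to the next.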
Search space
• The architecture of Learning Vector Quantization, with the number of classes in the
input data and n input features for any sample, is given below:
[Figure: LVQ architecture]
DYNAMICALLY DRIVEN RECURRENT NETWORKS (RNN)
22AIP3204 ARTIFICIAL NEURAL NETWORKS
RNN
KEY CONCEPTS IN DD-RNNS:
1. Dynamic Computation:
• DD-RNNs can incorporate dynamic decision-making processes at each time step, allowing the network to adjust how it processes inputs
based on context or historical information.
• This dynamic aspect could come from using additional attention mechanisms, gating mechanisms, or self-organization techniques that modify
the flow of information.
• Like traditional RNNs, DD-RNNs aim to maintain memory over time, but they often include mechanisms that help the network forget or
retain information more effectively based on the context of the input sequence.
• Techniques such as long short-term memory (LSTM) or gated recurrent units (GRUs) are commonly used in DD-RNNs to handle vanishing
gradient problems and improve memory retention.
• DD-RNNs can adjust the depth, width, or complexity of the network dynamically, allowing them to learn more efficient
representations of sequential data. This adaptability may allow the network to process time series data more efficiently.
• These networks can learn to modify their structure or the way information flows over time. The key idea is that the
network is not fixed, but can adapt during training to better handle the temporal relationships and dynamics of the
data.
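How a gating mechanism lets a recurrent unit forget or retain information can be sketched with a single scalar GRU-style cell. This is a minimal illustration of one common GRU formulation, not a description of any specific DD-RNN; all parameter values are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h_prev, p):
    """One time step of a scalar GRU-style cell with parameters in dict p."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])   # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])   # reset gate
    h_candidate = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # z near 0 retains the old state; z near 1 replaces it with the candidate.
    return (1.0 - z) * h_prev + z * h_candidate

# Hypothetical parameters.
params = {"wz": 1.0, "uz": 0.5, "bz": 0.0,
          "wr": 1.0, "ur": 0.5, "br": 0.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}

# Run the cell over a short input sequence, starting from h = 0.
h = 0.0
states = []
for x in [1.0, 0.5, -0.5]:
    h = gru_step(x, h, params)
    states.append(h)

# With a strongly negative update-gate bias the cell keeps its memory.
retain_params = dict(params, bz=-100.0)
h_retained = gru_step(1.0, 0.3, retain_params)   # stays close to 0.3
```

The gates are computed from the current input and the previous state, so how much is remembered changes with context at every time step.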
APPLICATIONS OF DD-RNNS:
1. Time Series Forecasting: DD-RNNs can be particularly useful for tasks where sequences
have complex temporal patterns, like financial forecasting, weather prediction, or
demand forecasting.
2. Speech Recognition and Natural Language Processing (NLP): DD-RNNs can process
dynamic sequences of text or audio more efficiently, allowing for better performance in
speech-to-text systems or language modeling.
3. Robotics and Control Systems: In robotics, DD-RNNs can be used to model dynamic
environments or predict sequences of actions to improve decision-making in real-time
systems.
COMPARISON WITH TRADITIONAL RNNS:
1. Traditional RNNs are simple in design and have difficulty handling long-term
dependencies due to the vanishing gradient problem. They use a fixed structure to
propagate information through time.
2. DD-RNNs, on the other hand, introduce more complex and adaptable components to
handle dynamic and complex temporal dependencies. They can adjust their architecture
or processing strategy to better fit the nature of the input sequence.
CONCLUSION
In essence, DD-RNNs are more flexible and capable of handling complex, dynamic systems
compared to traditional RNNs. They are useful in situations where the relationships within
sequential data evolve over time or require more sophisticated memory and attention
mechanisms.