Course Material Neural Updated
Prepared by
Ms. Rakhi Sharma
Assistant Professor(CSE-AI/DS)
Note: Examiner will set nine questions in total. Question one will be compulsory. Question one will have 6 parts of
2.5 marks each from all units and remaining eight questions of 15 marks each to be set by taking two questions from
each unit. The students have to attempt five questions in total, first being compulsory and selecting one from each
unit.
Objectives of the course:
1. To understand the different issues involved in the design and implementation of neural networks.
2. To study the basics of neural networks and their activation functions.
3. To understand the perceptron and its application in the real world.
4. To develop an understanding of essential NN concepts such as learning, feed forward and feed backward networks.
5. To design and build a simple NN model to solve a problem.
Unit-I
Overview of biological neurons: Structure of biological neuron, neurobiological analogy, Biological
neuron equivalencies to artificial neuron model, Evolution of neural network.
Activation Functions: Threshold functions, Signum function, Sigmoid function, Hyperbolic tangent (tanh) function,
Stochastic function, Ramp function, Linear function, Identity function.
ANN Architecture: Feed forward network, Feed backward network, single and multilayer network, fully
recurrent network.
Unit-II
McCulloch and Pitts Neural Network (MCP Model): Architecture, Solution of AND, OR function using
MCP model, Hebb Model: Architecture, training and testing, Hebb network for AND function.
Perceptron Network: Architecture, training, Testing, single and multi-output model, Perceptron for AND
function.
Linear function, application of linear model, linear separability, solution of OR function using linear
separability model.
Unit-III
Learning: Supervised, Unsupervised, reinforcement learning, Gradient Descent algorithm, generalized delta
learning rule, Hebbian learning, Competitive learning, Back propagation Network: Architecture, training
and testing.
Unit-IV
Associative memory: Auto associative and Hetero associative memory and their architecture, training
(insertion) and testing (Retrieval) algorithm using Hebb rule and Outer Product rule. Storage capacity,
Testing of associative memory for missing and mistaken data, Bidirectional memory
Reference Books:
1. David Kriesel, A Brief Introduction to Neural Networks, dkriesel.com, 2005
2. Gunjan Goswami, Introduction to Artificial Neural Networks, S.K. Kataria & Sons, 2012
3. Raul Rojas, Neural Networks: A Systematic Introduction, 1996.
4. S. Sivanandam, Introduction to Artificial Neural Networks, 2003
5. Jacek M. Zurada, Introduction to Artificial Neural Systems, Jaico Publ. House, 1994
6. S.N. Deepa, S.N. Sivanandam, Principles of Soft Computing, Wiley publication
Course Outcomes
The students will be able to:
1. Know the purpose of Artificial Neural Networks
2. Apply the concepts of activation, propagation functions
3. Work with supervised learning network paradigm
4. Work with unsupervised learning network paradigm
5. Know the purpose and working of Neural Networks memory concepts
Unit -1
• Input
• Weight
• Bias
• Activation Function
• Output
Artificial Neural Networks (ANNs) have a wide range of applications across various fields due to their ability to
model complex relationships and learn from data. Some of the key applications of ANNs include:
1. Image Recognition and Computer Vision: ANNs are used extensively in tasks such as image classification,
object detection, and facial recognition. Convolutional Neural Networks (CNNs) are a specialized type of
ANN designed for these tasks.
2. Natural Language Processing (NLP): ANNs are employed in NLP tasks like sentiment analysis, machine
translation, chatbots, and text generation. Recurrent Neural Networks (RNNs) and Transformers are
commonly used architectures for NLP.
3. Speech Recognition: ANNs, particularly recurrent and convolutional neural networks, are used in automatic
speech recognition systems, making voice assistants and dictation software possible.
4. Recommendation Systems: ANNs power recommendation algorithms in e-commerce, streaming services,
and social media platforms to suggest products, movies, music, or content to users.
5. Time Series Forecasting: ANNs can model complex temporal patterns, making them valuable for tasks like
stock price prediction, weather forecasting, and demand forecasting in supply chain management.
6. Autonomous Vehicles: ANNs are used for object detection, localization, and decision-making in autonomous
vehicles, enabling them to perceive and navigate their environment.
7. Healthcare: ANNs are employed in medical image analysis for tasks such as tumor detection in radiology,
disease diagnosis, and drug discovery.
8. Financial Modeling: ANNs are used for predicting stock prices, credit risk assessment, fraud detection, and
algorithmic trading.
9. Gaming: ANNs are used in game development for creating intelligent non-player characters (NPCs), game
AI, and procedural content generation.
10. Manufacturing and Quality Control: ANNs can be used to monitor and optimize manufacturing processes,
detect defects in products, and predict equipment failures.
11. Natural Resource Management: ANNs are applied in fields like agriculture to optimize crop yields, manage
resources efficiently, and predict disease outbreaks in plants.
12. Social Media Analysis: ANNs are used to analyze social media data for sentiment analysis, trend prediction,
and recommendation of content to users.
13. Drug Discovery: ANNs assist in drug discovery by predicting the biological activity of molecules and
speeding up the process of identifying potential drug candidates.
14. Energy Management: ANNs help in optimizing energy consumption in buildings and industrial processes,
leading to energy savings.
15. Robotics: ANNs are used for robot control, path planning, and object recognition, enabling robots to perform
tasks in various industries, including manufacturing and healthcare.
16. Anomaly Detection: ANNs are effective at identifying anomalies in data, making them useful for fraud
detection in finance, network security, and quality control in manufacturing.
17. Human Resource Management: ANNs can assist in talent acquisition and employee performance prediction.
18. Environmental Monitoring: ANNs are used for analyzing environmental data, such as predicting air quality,
weather forecasting, and monitoring wildlife.
Biological neuron    Artificial neuron
Soma                 Node
Dendrites            Input
Axon                 Output
The following table shows the comparison between ANN and BNN based on some criteria mentioned.
Evolution of Neural Network:
ANN during 1940s to 1960s
Some key developments of this era are as follows −
• 1943 − It has been assumed that the concept of neural network started with the work of physiologist,
Warren McCulloch, and mathematician, Walter Pitts, when in 1943 they modeled a simple neural
network using electrical circuits in order to describe how neurons in the brain might work.
• 1949 − Donald Hebb’s book, The Organization of Behavior, proposed that repeated activation
of one neuron by another strengthens the connection between them each time it is used.
• 1956 − An associative memory network was introduced by Taylor.
• 1958 − A learning method for McCulloch and Pitts neuron model named Perceptron was invented by
Rosenblatt.
• 1960 − Bernard Widrow and Marcian Hoff developed models called "ADALINE" and “MADALINE.”
ANN during 1960s to 1980s
Some key developments of this era are as follows −
• 1961 − Rosenblatt made an unsuccessful attempt but proposed the “backpropagation” scheme for
multilayer networks.
• 1964 − Taylor constructed a winner-take-all circuit with inhibitions among output units.
• 1969 − Minsky and Papert published Perceptrons, which analysed the limitations of single-layer perceptrons.
• 1971 − Kohonen developed Associative memories.
• 1976 − Stephen Grossberg and Gail Carpenter developed Adaptive resonance theory.
ANN from 1980s till Present
Some key developments of this era are as follows −
• 1982 − The major development was Hopfield’s Energy approach.
• 1985 − Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.
• 1986 − Rumelhart, Hinton, and Williams introduced the Generalised Delta Rule.
• 1988 − Kosko developed Bidirectional Associative Memory (BAM) and also gave the concept of Fuzzy Logic in ANN.
Activation Functions
The activation function of a neuron defines its output given its inputs. We will be talking about some popular
activation functions:
1. Sigmoid Function:
Description: Takes a real-valued number and scales it between 0 and 1. Large negative numbers become close to 0 and large
positive numbers become close to 1.
Range: (0,1)
Pros: As its range is between 0 and 1, it is ideal for situations where we need to predict the probability of an event as
an output.
Cons: The gradient values are significant in the range -3 to 3 but become much closer to zero beyond this range, which
almost kills the impact of the neuron on the final output. Also, sigmoid outputs are not zero-centered (they are centred
around 0.5), which leads to undesirable zig-zagging dynamics in the gradient updates for the weights.
Plot:
2. Tanh Function:
Description: Similar to sigmoid but takes a real-valued number and scales it between -1 and 1. It is often preferred over sigmoid
because its output is zero-centered.
Range: (-1,1)
Pros: The derivatives of tanh are larger than the derivatives of the sigmoid, which helps us minimize the cost
function faster.
Cons: Similar to sigmoid, the gradient values become close to zero for a wide range of values (this is known as the
vanishing gradient problem). Thus, the network refuses to learn or keeps learning at a very small rate.
Plot:
3. Softmax Function:
Description: Softmax function can be imagined as a combination of multiple sigmoids which returns the
probability of a datapoint belonging to each individual class in a multiclass classification problem.
Formula: softmax(z_i) = exp(z_i) / (exp(z_1) + exp(z_2) + ... + exp(z_K)), for K classes
Range: (0,1), sum of output = 1
Pros: Can handle multiple classes and give the probability of belonging to each class.
Cons: Should not be used in hidden layers as we want the neurons to be independent. If we apply it there, the outputs
become dependent on each other, because they are forced to sum to 1.
Plot:
4. ReLU Function:
Description: The rectified linear activation function, or ReLU for short, is a piecewise linear function that will output
the input directly if it is positive; otherwise, it will output zero. This is the default function, but modifying default
parameters allows us to use non-zero thresholds and to use a non-zero multiple of the input for values below the
threshold (called Leaky ReLU).
Formula: max(0,x)
Range: [0,inf)
Pros: Although ReLU looks and acts like a linear function, it is a nonlinear function allowing complex relationships
to be learned. It allows learning through all the hidden layers in a deep network because its derivative is 1 for positive
inputs and does not shrink towards zero.
Cons: It should not be used in the final output layer for either classification or regression tasks.
Plot:
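The four functions above can be sketched in plain Python as follows. This is a minimal illustration that uses only the standard library; the function names are chosen here for illustration, and the sigmoid is written in a numerically stable form (an assumption of this sketch, not something the notes require) so that large negative inputs do not overflow.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, output between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)

def tanh(x):
    """Hyperbolic tangent, output between -1 and 1."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def softmax(scores):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Compare the three scalar functions on a few sample inputs
for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x), tanh(x), relu(x))

# Softmax works on a whole vector of class scores
print(softmax([1.0, 2.0, 3.0]))
```

Note that sigmoid, tanh and ReLU act on one value at a time, while softmax needs the scores of all classes at once, which is one reason it is used in the output layer.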
"Synchronous activation" and "asynchronous activation" are terms that refer to how neurons in a neural network
update their activation state in response to input signals. These concepts are especially relevant when considering
the dynamics of recurrent neural networks (RNNs), where neurons can have feedback connections that influence
their own future activations. Let's explore these terms and their implications:
1. Synchronous Activation:
• In synchronous activation, all neurons in the network update their activation states simultaneously in
discrete time steps.
• Neurons process their inputs and calculate their new activations based on the inputs received from the
previous time step.
• This approach simplifies computation and can be easier to implement, but it might not capture certain
temporal dynamics as effectively.
• It's common in feedforward neural networks and some types of RNNs.
2. Asynchronous Activation:
• In asynchronous activation, neurons update their activation states individually and asynchronously
based on their internal state and incoming inputs.
• Neurons might not all update at the same time step; instead, they update whenever they receive input
that crosses their activation threshold.
• This approach allows for richer temporal dynamics and can better capture certain time-sensitive
behaviors, making it more biologically plausible.
• Asynchronous activation is often used in networks that require precise timing, such as spiking neural
networks or more complex RNN architectures.
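The difference between the two update schemes can be seen on a tiny example. The sketch below assumes a two-neuron recurrent network with bipolar states (+1/-1), a hypothetical weight matrix W in which each neuron inhibits the other, and a fixed update order for the asynchronous case. These choices are illustrative only; real asynchronous networks may pick the updating neuron randomly or by events.

```python
def sign(v):
    """Bipolar threshold: +1 if the net input is non-negative, else -1."""
    return 1 if v >= 0 else -1

def net_input(weights, state, i):
    """Weighted sum of all neuron states arriving at neuron i."""
    return sum(weights[i][j] * state[j] for j in range(len(state)))

def synchronous_step(weights, state):
    """All neurons compute their new value from the same old state."""
    return [sign(net_input(weights, state, i)) for i in range(len(state))]

def asynchronous_step(weights, state, order):
    """Neurons update one at a time; later neurons see earlier updates."""
    state = list(state)
    for i in order:
        state[i] = sign(net_input(weights, state, i))
    return state

# Hypothetical network: each neuron inhibits the other
W = [[0, -1],
     [-1, 0]]
s0 = [1, 1]

print(synchronous_step(W, s0))            # [-1, -1]
print(asynchronous_step(W, s0, [0, 1]))   # [-1, 1]
```

With synchronous updates both neurons flip together, so this network keeps oscillating between [1, 1] and [-1, -1]. With asynchronous updates the second neuron already sees the new value of the first, and the network settles in [-1, 1].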
Threshold functions:
Order of activation:
There are perhaps three activation functions you may want to consider for use in hidden layers; they are:
The output layer is the layer in a neural network model that directly outputs a prediction.
There are perhaps three activation functions you may want to consider for use in the output layer; they are:
• Linear
• Logistic (Sigmoid)
• Softmax
This is not an exhaustive list of activation functions used for output layers, but they are the most commonly used.
Threshold functions
Threshold functions, also known as activation functions, play a critical role in artificial neural networks. These
functions determine whether a neuron should be activated (produce an output) based on the total input it receives.
They introduce non-linearity to the network, allowing it to learn and represent complex relationships in data. Here
are some common types of threshold functions:
Step Function:
Description: One of the simplest threshold functions. It outputs 1 when the input exceeds a certain threshold and 0
otherwise. Due to its discontinuous nature, it's not commonly used in modern neural networks.
Sigmoid Function:
Description: A smooth, S-shaped curve that maps input values to the range (0, 1). It was popular in the past as an
activation function, but its vanishing gradient problem and convergence issues for deep networks have led to its
reduced usage.
Hyperbolic Tangent (Tanh) Function:
Description: Similar to the sigmoid function but centered around zero, producing outputs in the range (-1, 1). It can
suffer from vanishing gradient problems like the sigmoid.
Rectified Linear Unit (ReLU):
Description: Widely used in modern neural networks. It outputs the input as is if it's positive; otherwise, it
outputs zero. ReLU effectively mitigates vanishing gradient issues and promotes sparse activations.
Leaky ReLU:
Description: A variant of ReLU that allows a small gradient for negative inputs, addressing the "dying ReLU"
problem.
Parametric ReLU (PReLU):
Description: Similar to Leaky ReLU but with the slope for negative inputs being learned during training.
Exponential Linear Unit (ELU)
Description: Combines linearity for positive inputs with smoothness for negative inputs. Helps mitigate vanishing
gradient issues and supports negative values.
Signum function
The signum function simply gives the sign of the given value of x. For x greater than zero, the value of the
output is +1; for x less than zero, the value of the output is -1; and for x equal to zero, the output is
equal to zero. The signum function can be defined and understood from the below expression.
For a real number x, the function sgn x is defined by:
sgn x = 1 if 0 < x, sgn x = -1 if x < 0, sgn x = 0, otherwise.
Solved Examples:
Example 1: Find the result for the values of x, using the signum function
Output = {-1,-1,+1,0,+1,+1,-1}
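The definition of sgn x translates directly into code. The sketch below is a minimal illustration; the list of input values is hypothetical and is not the input of Example 1.

```python
def sgn(x):
    """Signum function: +1 for positive x, -1 for negative x, 0 for zero."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0

# Hypothetical sample inputs (not the values used in Example 1)
values = [-3.5, 0, 2, -1, 7]
print([sgn(v) for v in values])   # [-1, 0, 1, -1, 1]
```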
Sigmoid Function:
Description: Takes a real-valued number and scales it between 0 and 1. Large negative numbers become close to 0 and large
positive numbers become close to 1.
Range: (0,1)
Pros: As its range is between 0 and 1, it is ideal for situations where we need to predict the probability of an event as
an output.
Cons: The gradient values are significant in the range -3 to 3 but become much closer to zero beyond this range, which
almost kills the impact of the neuron on the final output. Also, sigmoid outputs are not zero-centered (they are centred
around 0.5), which leads to undesirable zig-zagging dynamics in the gradient updates for the weights.
Plot:
Applications
The sigmoid function's ability to transform any real number to one between 0 and 1 is advantageous in data
science and many other fields such as:
• In deep learning, as a non-linear activation function within neurons in artificial neural networks, to allow the
network to learn non-linear relationships between the data
• In binary classification, for example logistic regression, the sigmoid function is used to predict the probability
of a binary variable.
Although the sigmoid function is prevalent in the context of gradient descent, the gradient of the sigmoid
function is in some cases problematic. The gradient vanishes to zero for very low and very high input values,
making it hard for some models to improve.
For example, during backpropagation in deep learning, the gradient of a sigmoid activation function is used to
update the weights & biases of a neural network. If these gradients are tiny, the updates to the weights &
biases are tiny and the network will not learn.
Alternatively, other non-linear functions such as the Rectified Linear Unit (ReLU) are used, which are much less
affected by this problem.
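The vanishing gradient can be checked numerically. The derivative of the sigmoid is s(x) * (1 - s(x)), where s is the sigmoid itself, so its largest value is 0.25 at x = 0. The sketch below is a minimal illustration with hypothetical sample points; the function names are chosen here for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient is largest at 0 and shrinks quickly away from it
for x in (0.0, 3.0, 10.0):
    print(x, sigmoid_grad(x))

# In backpropagation one such factor is multiplied in per sigmoid layer.
# Even in the best case (0.25 each), 10 layers shrink the gradient to:
print(0.25 ** 10)
```

Since every sigmoid layer contributes a factor of at most 0.25, the weight updates in the early layers of a deep sigmoid network become very small.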
2. Tanh Function:
Description: Similar to sigmoid but takes a real-valued number and scales it between -1 and 1. It is often preferred over sigmoid
because its output is zero-centered.
Pros: The derivatives of tanh are larger than the derivatives of the sigmoid, which helps us minimize the cost
function faster.
Cons: Similar to sigmoid, the gradient values become close to zero for a wide range of values (this is known as the
vanishing gradient problem). Thus, the network refuses to learn or keeps learning at a very small rate.
Plot:
1. Stochastic Function: A stochastic function, also known as a random function, is a mathematical function that
introduces an element of randomness or uncertainty into its output. In other words, when you apply a
stochastic function to the same input multiple times, you may get different outcomes each time due to the
random nature of the function. Stochastic functions are often used in probabilistic models, simulations, and
scenarios where variability is a key factor.
2. Ramp Function: A ramp function, also called a unit ramp function or simply a ramp, is a mathematical
function that increases linearly with its input, starting from a specified point (usually the origin where the
input is zero). The ramp function is defined as follows:
ramp(x) = max(0, x)
In this definition, the ramp function outputs zero when the input x is negative and increases linearly for positive
values of x.
3. Linear Function: A linear function is a type of mathematical function that has a constant rate of change. It
represents a straight-line relationship between the input and the output. A linear function can be expressed as:
f(x) = ax + b
Where a is the slope (rate of change) of the line, and b is the y-intercept (the point where the line intersects the y-
axis).
4. Identity Function: The identity function is a simple mathematical function that returns its input unchanged.
In other words, for any input x, the identity function outputs the same value x. It is often denoted as:
f(x) = x
The identity function is useful in various mathematical contexts and can serve as a baseline or reference when
comparing other functions.
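The four functions described above can be written in a few lines. This is a minimal sketch: ramp, linear and identity follow the formulas given in the text, while stochastic_binary is only one possible stochastic unit (an assumption made here), which outputs 1 with probability given by a sigmoid of the input.

```python
import math
import random

def ramp(x):
    """ramp(x) = max(0, x)"""
    return max(0.0, x)

def linear(x, a=1.0, b=0.0):
    """f(x) = a*x + b, with slope a and intercept b."""
    return a * x + b

def identity(x):
    """f(x) = x"""
    return x

def stochastic_binary(x, rng):
    """Outputs 1 with probability sigmoid(x), otherwise 0 (assumed form)."""
    p = 1.0 / (1.0 + math.exp(-x))
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed only so that a run can be repeated

print(ramp(-2.0), ramp(3.0))          # 0.0 3.0
print(linear(2.0, a=3.0, b=1.0))      # 7.0
print(identity(5))                    # 5
# Same input every time, but the output may differ from call to call
print([stochastic_binary(0.0, rng) for _ in range(10)])
```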
• Perceptron
• Feed Forward Neural Network
• Multilayer Perceptron
• Convolutional Neural Network
• Radial Basis Functional Neural Network
• Recurrent Neural Network
• LSTM – Long Short-Term Memory
• Sequence to Sequence Models
• Modular Neural Network
Neural networks are the basis of deep learning, which is a part of artificial intelligence. Certain application scenarios are
too heavy or out of scope for traditional machine learning algorithms to handle. Neural networks step in for such
scenarios and fill the gap.
Artificial neural networks are inspired by the biological neurons within the human body which activate under certain
circumstances resulting in a related action performed by the body in response. Artificial neural nets consist of
various layers of interconnected artificial neurons powered by activation functions that help in switching them
ON/OFF. Like traditional machine learning algorithms, here too, there are certain values that neural nets learn in the training
phase.
Briefly, each neuron multiplies its inputs by weights (which start as random values), adds up the products and adds a
bias value; this sum is then passed to an appropriate activation function which decides the
final value to be given out of the neuron. There are various activation functions available as per the nature of input
values. Once the output is generated from the final neural net layer, the loss function (predicted output vs target output) is calculated, and
backpropagation is performed where the weights are adjusted to make the loss minimum. Finding optimal values of
weights is what the overall operation focuses around. Please refer to the following for better understanding:
Weights are numeric values that are multiplied by inputs. In backpropagation, they are modified to reduce the loss.
In simple words, weights are machine learned values from Neural Networks. They self-adjust depending on the
difference between predicted outputs and target outputs.
Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.
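The computation inside one neuron can be sketched as below. This is a minimal illustration that assumes a sigmoid activation and a squared-error loss; the input values, weights, bias and target are hypothetical.

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(predicted, target):
    """Loss comparing the predicted output with the target output."""
    return 0.5 * (predicted - target) ** 2

# Hypothetical values, chosen so that the weighted sum is exactly 0
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.0

y = neuron_output(inputs, weights, bias)
print(y)                       # 0.5, because sigmoid(0) = 0.5
print(squared_error(y, 1.0))   # 0.125
```

Training then consists of changing the weights and the bias so that this loss becomes smaller.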
There are many types of neural networks available or that might be in the development stage. They can be classified
depending on their: structure, data flow, neurons used and their density, layers and their depth, activation filters
etc.
Types of Neural Networks
We are going to discuss the following neural networks:
A. Perceptron
Perceptron model, proposed by Rosenblatt, is one of the simplest and oldest models of a neuron. It is the smallest
unit of a neural network that does certain computations to detect features in the input data. It
accepts weighted inputs and applies the activation function to obtain the output as the final result. Perceptron is also
known as TLU (threshold logic unit).
Perceptron is a supervised learning algorithm that classifies the data into two categories, thus it is a binary classifier.
A perceptron separates the input space into two categories by a hyperplane represented by the following equation:
w1x1 + w2x2 + ... + wnxn + b = 0
Advantages of Perceptron
Perceptrons can implement Logic Gates like AND, OR, or NAND.
Disadvantages of Perceptron
Perceptrons can only learn linearly separable problems such as boolean AND problem. For non-linear problems
such as the boolean XOR problem, it does not work.
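A perceptron for the AND function can be trained with the perceptron learning rule. The sketch below is a minimal illustration that assumes binary inputs and targets (0/1), a step activation with threshold 0, zero initial weights and a learning rate of 1; the function names are chosen here for illustration.

```python
def step(z):
    """Step activation: fire (1) if the net input is >= 0, else 0."""
    return 1 if z >= 0 else 0

def predict(w, b, x):
    return step(w[0] * x[0] + w[1] * x[1] + b)

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron rule: w = w + lr * (target - output) * x."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(w, b, x)
            if error != 0:
                w[0] += lr * error * x[0]
                w[1] += lr * error * x[1]
                b += lr * error
                errors += 1
        if errors == 0:   # every sample is classified correctly
            break
    return w, b

and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(and_samples)

for x, target in and_samples:
    print(x, "target:", target, "output:", predict(w, b, x))
```

Because AND is linearly separable, the loop stops after a few passes over the data. With XOR targets the same loop would run through all its epochs without ever reaching zero errors.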
B. Feed Forward Neural Networks
The number of layers depends on the complexity of the function. It has uni-directional forward propagation but no
backward propagation. Weights are static here. Inputs are multiplied by weights and fed to an activation function.
To do so, a classifying activation function or step activation function is used. For example: the neuron is
activated if its input is above the threshold (usually 0), and the neuron produces 1 as an output. The neuron is not activated if
its input is below the threshold (usually 0), which is considered as -1. They are fairly simple to maintain and are equipped to
deal with data which contains a lot of noise.
1. Cannot be used for deep learning [due to absence of dense layers and back propagation]
C. Multilayer Perceptron
• Speech Recognition
• Machine Translation
• Complex Classification
An entry point towards complex neural nets where input data travels through various layers of artificial
neurons. Every single node is connected to all neurons in the next layer, which makes it a fully connected neural
network. Input and output layers are present along with one or more hidden layers, i.e. at least three layers in total.
It has bi-directional propagation, i.e. forward propagation and backward propagation.
Inputs are multiplied with weights and fed to the activation function, and in backpropagation the weights are modified to
reduce the loss. In simple words, weights are machine learnt values from Neural Networks. They self-adjust
depending on the difference between predicted outputs and target outputs. Nonlinear activation functions are used,
followed by softmax as an output layer activation function.
Advantages of Multi-Layer Perceptron
1. Used for deep learning [due to the presence of dense fully connected layers and back propagation]
Disadvantages of Multi-Layer Perceptron:
D. Convolution Neural Network
• Image processing
• Computer Vision
• Speech Recognition
• Machine translation
Convolution neural network contains a three-dimensional arrangement of neurons instead of the standard
two-dimensional array. The first layer is called a convolutional layer. Each neuron in the convolutional layer only
processes the information from a small part of the visual field. Input features are taken in batch-wise, like a filter.
The network understands the images in parts and can compute these operations multiple times to complete the full
image processing. Processing may involve conversion of the image from RGB or HSI scale to grey-scale. Changes in
the pixel values then help to detect the edges, and images can be classified into different categories.
Propagation is uni-directional where the CNN contains one or more convolutional layers followed by pooling, and
bidirectional where the output of the convolution layer goes to a fully connected neural network for classifying the
images, as shown in the above diagram. Filters are used to extract certain parts of the image. In MLP the inputs are
multiplied with weights and fed to the activation function. Convolution uses ReLU, and MLP uses a nonlinear
activation function followed by softmax. Convolution neural networks show very effective results in image and
video recognition, semantic parsing and paraphrase detection.
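How a filter detects an edge can be shown on a very small grey-scale image. The sketch below is a minimal illustration in plain Python: the 4x4 image and the 2x2 filter are hypothetical, no padding is used, and the filter is slid over the image without flipping, as is usual in CNN layers. ReLU is applied to every output value.

```python
def conv2d_relu(image, kernel):
    """Slides the kernel over the image (no padding) and applies ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(max(0, total))   # ReLU
        output.append(row)
    return output

# Hypothetical image: dark (0) on the left half, bright (1) on the right half
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hypothetical filter that responds to a dark-to-bright change from left to right
kernel = [[-1, 1],
          [-1, 1]]

for row in conv2d_relu(image, kernel):
    print(row)   # each row is [0, 2, 0]
```

The output is non-zero only in the middle column, which is where the pixel values change, so the filter has located the vertical edge. Each output value depends only on a small 2x2 part of the image.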
E. Radial Basis Function Neural Network
Radial Basis Function Network consists of an input vector followed by a layer of RBF neurons and an output layer
with one node per category. Classification is performed by measuring the input’s similarity to data points from the
training set, where each neuron stores a prototype. This will be one of the examples from the training set.
When a new input vector [the n-dimensional vector that you are trying to classify] needs to be classified, each
neuron calculates the Euclidean distance between the input and its prototype. For example, if we have two classes,
i.e. class A and class B, and the new input to be classified is closer to the class A prototypes than to the class B
prototypes, then it is tagged or classified as class A.
Each RBF neuron compares the input vector to its prototype and outputs a value from 0 to 1 which is a measure of
similarity. When the input equals the prototype, the output of that RBF neuron will be 1, and as the
distance between the input and the prototype grows, the response falls off exponentially towards 0. The curve generated
out of the neuron’s response tends towards a typical bell curve. The output layer consists of a set of neurons [one per
category].
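The response of an RBF neuron can be sketched as below. This is a minimal illustration that assumes a Gaussian response exp(-beta * d^2), where d is the Euclidean distance to the prototype, and an output layer that simply adds the responses of the neurons of each class. The prototypes, the value of beta and the test point are hypothetical.

```python
import math

def rbf_response(x, prototype, beta=1.0):
    """Gaussian similarity: 1 when x equals the prototype, falls towards 0 with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-beta * squared_distance)

def classify(x, prototypes_by_class, beta=1.0):
    """Picks the class whose RBF neurons respond most strongly in total."""
    scores = {}
    for label, prototypes in prototypes_by_class.items():
        scores[label] = sum(rbf_response(x, p, beta) for p in prototypes)
    return max(scores, key=scores.get), scores

# Hypothetical prototypes taken from a training set with two classes
prototypes_by_class = {
    "A": [[1.0, 1.0], [1.5, 1.0]],
    "B": [[4.0, 4.0], [4.5, 3.5]],
}

print(rbf_response([1.0, 1.0], [1.0, 1.0]))   # 1.0, input equals the prototype
label, scores = classify([1.2, 0.9], prototypes_by_class)
print(label, scores)
```

The test point lies close to the class A prototypes, so the class A neurons give a response near 1 while the class B neurons give a response near 0.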
Application: Power Restoration
a. Powercut P1 needs to be restored first
b. Powercut P3 needs to be restored next, as it impacts more houses
c. Powercut P2 should be fixed last as it impacts only one house
F. Recurrent Neural Networks
Designed to save the output of a layer, a Recurrent Neural Network feeds this output back to the input to help in
predicting the outcome of the layer. The first layer is typically a feed forward neural network
followed by a recurrent neural network layer, where some information it had in the previous time-step
is remembered by a memory function. Forward propagation is implemented in this case. It stores
information required for its future use. If the prediction is wrong, the learning rate is employed to
make small changes, so that the network gradually moves towards making the right prediction during
backpropagation.
Advantages of Recurrent Neural Networks
1. They can model sequential data, where each sample can be assumed to be dependent on historical ones.
2. Used with convolution layers to extend the pixel effectiveness.
Disadvantages of Recurrent Neural Networks
LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a
‘memory cell’ that can maintain information in memory for long periods of time. A set of gates is used to control
when information enters the memory, when it is output, and when it is forgotten. There are three types of gates, viz.
input gate, output gate and forget gate. The input gate decides how much new information from the current sample is
written to memory; the output gate regulates the amount of data passed to the next layer; and the forget gate controls the
rate at which stored memory is discarded. This architecture lets them learn longer-term dependencies.
This is one of the implementations of LSTM cells, many other architectures exist.
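One time step of an LSTM unit can be sketched with scalar values. This is a minimal illustration of the commonly used gate equations, not of the specific implementation in the figure; all weights and biases in params are hypothetical, and the names (wf, uf, bf and so on) are chosen here for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; p holds hypothetical weights and biases."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])     # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])     # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])     # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])   # candidate memory content
    c = f * c_prev + i * g     # keep part of the old memory, add part of the new content
    h = o * math.tanh(c)       # pass part of the memory on as the output
    return h, c

# Hypothetical parameter values
params = {
    "wf": 0.5, "uf": 0.1, "bf": 1.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wo": 0.4, "uo": 0.3, "bo": 0.0,
    "wg": 0.9, "ug": 0.5, "bg": 0.0,
}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -1.0]:     # a short hypothetical input sequence
    h, c = lstm_step(x, h, c, params)
    print("x =", x, "h =", h, "c =", c)
```

The memory cell c is carried from one time step to the next, and the forget gate f decides how much of it survives, which is what allows information to be kept over many steps.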
G. Sequence to sequence models
A sequence to sequence model consists of two Recurrent Neural Networks. Here, there exists an encoder that
processes the input and a decoder that processes the output. The encoder and decoder work simultaneously, either
using the same parameters or different ones. This model, in contrast to a plain RNN, is particularly applicable in
those cases where the length of the input data is not equal to the length of the output data. While they possess
benefits and limitations similar to those of the RNN, these models are mainly applied in chatbots, machine translation, and
question answering systems.
Unit 2
McCulloch-Pitts Model
1. Neuron
2. Excitatory Input
3. Inhibitory Input
4. Output
1. Neuron
Neuron is a computational unit which has incoming input signals. The input signals are processed and an output is fired.
The neuron further consists of the following two elements –
• Summation Function
This simply calculates the sum of incoming inputs (excitatory).
• Activation Function
Essentially, the activation function in this case is the step function, which checks whether the summation is greater than or equal to a preset
threshold value; if yes, then the neuron should fire (i.e. output = 1), if not, the neuron should not fire (i.e. output = 0).
2. Excitatory Input
This is an incoming binary signal to the neuron, which can have only two values, 0 or 1. The value of 0 indicates that the
input is off, whereas the value of 1 indicates that the input is on.
3. Inhibitory Input
This is another type of input signal to the neuron. If this input is on, it will not allow the neuron to fire, even if there are other
excitatory inputs which are on.
4. Output
This is simply the output of the neuron which again can take only binary values of 0 or 1. The value of 0 indicates that
the neuron does not fire, the value of 1 indicates the neuron does fire.
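The four elements described above are enough to build the AND and OR functions. The sketch below is a minimal illustration; the function name and the threshold values (2 for AND, 1 for OR with two inputs) are choices made for this example.

```python
def mcp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts neuron with binary inputs and binary output."""
    if any(inhibitory):          # an active inhibitory input blocks firing
        return 0
    total = sum(excitatory)      # summation function
    return 1 if total >= threshold else 0   # step activation

def AND(x1, x2):
    return mcp_neuron([x1, x2], [], threshold=2)

def OR(x1, x2):
    return mcp_neuron([x1, x2], [], threshold=1)

print("x1 x2 AND OR")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), OR(x1, x2))

# An inhibitory input that is on prevents firing even if all excitatory inputs are on
print(mcp_neuron([1, 1], [1], threshold=2))   # 0
```

The only difference between the AND neuron and the OR neuron is the threshold: AND needs both inputs to be on, OR needs at least one.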
Hebb Model
For a neural net, the Hebb learning rule is a simple one.
Donald Hebb stated in 1949 that in the brain, learning is performed by the change in the synaptic
gap (strength). Hebb explained it:
“When an axon of cell A is near enough to excite cell B, and repeatedly or persistently
takes part in firing it, some growth process or metabolic change takes place in one or both
the cells such that A’s efficiency, as one of the cells firing B, is increased.”
According to the Hebb rule, the weight vector is found to increase proportionately to the product
of the input and the learning signal. Here the learning signal is equal to the neuron's output.
wi(new) = wi(old) + xiy
The Hebb rule is more suited for bipolar data than binary data. If binary data is used, the above weight
updation formula cannot distinguish two conditions namely;
1.A training pair in which an input unit is "on" and target value is "off."
2. A training pair in which both the input unit and the target value are "off."
Thus, there are limitations in Hebb rule application over binary data. Hence, the representation
using bipolar data is advantageous.
Flowchart of Hebb Training algorithm
Training Algorithm
Step 0: First initialize the weights. Basically in this network they may be set to zero, i.e., wi = 0
for i = 1 to n where "n" may be the total number of input neurons.
Step 1: Steps 2-4 have to be performed for each input training vector and target output pair, s: t.
Step 2: Input units activations are set. Generally, the activation function of input layer is identity
function: xi=si for i=1 to n
Step 3: Output units activations are set: y=t
Step 4: Weight adjustments and bias adjustments are performed:
wi(new) = wi(old) + xiy
b(new) = b(old) + y
The above five steps complete the algorithmic process. In Step 4, the weight updation formula can
also be given in vector form as
w(new) = w(old) + xy
Here the change in weight can be expressed as
Δw = xy
As a result,
w(new) = w(old) + Δw
The Hebb rule can be used for pattern association, pattern categorization, pattern classification
and over a range of other areas.
The AND function is simple and widely known: the output is 1 (SET/ON) only if
both inputs are 1 (SET/ON). In this example '-1' is used instead of '0'
because the Hebb network uses bipolar rather than binary data; with binary data the product term in
the above equations would be 0, which leads to a wrong calculation. The bipolar training table is:
x1    x2    b    target (y)
 1     1    1     1
 1    -1    1    -1
-1     1    1    -1
-1    -1    1    -1
Starting with Step 0, which initializes the weights and bias to 0, we get
w1 = w2 = b = 0
A) First input [x1,x2,b] = [1,1,1] and target y = 1. Using the initial weights as the old weights
and applying the Hebb rule (ith value of w(new) = ith value of w(old) + (ith value of x * y))
gives:
w1(new) = w1(old) + x1*y = 0 + 1 * 1 = 1
w2(new) = w2(old) + x2*y = 0 + 1 * 1 = 1
b(new) = b(old) + y = 0 + 1 = 1
These final weights act as the initial weights when the second input pattern is
presented. The weight changes here are:
Δw1 = x1*y = 1 * 1 = 1
Δw2 = x2*y = 1 * 1 = 1
Δb = y = 1
This completes the first pattern; we now continue with the second input from the table (2nd row).
B) Second input [x1,x2,b] = [1,-1,1] and target y = -1.
Note that the initial (old) weights here are the final (new) weights obtained from
the first input pattern, i.e. [w1,w2,b] = [1,1,1].
Δw1 = x1*y = 1 * -1 = -1
Δw2 = x2*y = -1 * -1 = 1
Δb = y = -1
w1(new) = 1 + (-1) = 0
w2(new) = 1 + 1 = 2
b(new) = 1 + (-1) = 0
Similarly, applying the same process to the third and fourth rows gives a new table as follows:
Fig 4. Final output table
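The whole Hebb training run for the bipolar AND function can be reproduced with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the third and fourth training rows are assumed to be (-1, 1) and (-1, -1), and a simple sign threshold at 0 is assumed for testing.

```python
def hebb_train(samples):
    """Apply the Hebb rule once to every (input, target) pair, starting from zero weights."""
    n = len(samples[0][0])
    w = [0] * n
    b = 0
    history = []
    for x, t in samples:
        y = t  # Step 3: the output activation is set to the target
        w = [wi + xi * y for wi, xi in zip(w, x)]  # wi(new) = wi(old) + xi*y
        b = b + y                                  # b(new) = b(old) + y
        history.append((list(w), b))
    return w, b, history


def predict(x, w, b):
    """Assumed test rule: bipolar output from the sign of the net input."""
    net = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1


# Bipolar AND: the target is 1 only when both inputs are 1.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w, b, history = hebb_train(and_samples)
for (x, t), (w_step, b_step) in zip(and_samples, history):
    print("input", x, "target", t, "-> weights", w_step, "bias", b_step)
print("final weights", w, "bias", b)
```

After the four patterns the weights are w1 = 2, w2 = 2 and b = -2, and `predict` returns the correct AND output for all four inputs.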
Perceptron model
The Perceptron is a machine learning algorithm for supervised learning of binary classification tasks.
Further, a perceptron is also understood as an artificial neuron, or neural network unit, that detects features in its input
data, for example in business intelligence applications.
The perceptron model is also treated as one of the simplest types of artificial neural networks. It is a
supervised learning algorithm for binary classifiers. Hence, we can consider it as a single-layer neural network with
four main parameters, i.e., input values, weights and bias, net sum, and an activation function.
In machine learning, a binary classifier is a function that decides whether an input, represented as a
vector of numbers, belongs to some specific class.
The perceptron is a linear classifier. In simple words, it is a classification
algorithm that makes its predictions with a linear predictor function combining a weight vector with the feature vector.
o Input Nodes or Input Layer:
This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each
input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. This is another important parameter
of the perceptron. The weight determines how strongly the associated input neuron influences
the output. Further, the bias can be considered as the intercept in a linear equation.
o Activation Function:
This is the final component, which determines whether the neuron will fire or not. The activation
function can be considered primarily as a step function.
Types of activation functions:
o Sign function
o Step function, and
o Sigmoid function
The choice of activation function (e.g., sign, step, or sigmoid) depends on the problem statement and
the desired outputs. It may also be changed if
the learning process is slow or suffers from vanishing or exploding gradients.
The step function or activation function plays a vital role in ensuring that the output is mapped to the required values,
(0,1) or (-1,1). It is important to note that the weight of an input is indicative of the strength of a node. Similarly, the
bias value gives the ability to shift the activation function curve up or down.
Step-1
In the first step, multiply all input values by their corresponding weight values and then add them to determine the
weighted sum. Mathematically, we can calculate the weighted sum as follows:
∑wi*xi = w1*x1 + w2*x2 + … + wn*xn
Add a special term called the bias 'b' to this weighted sum, which shifts the decision threshold and so improves the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function is applied with the above-mentioned weighted sum, which gives us output
either in binary form or a continuous value as follows:
Y = f(∑wi*xi + b)
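The two steps can be sketched in code as follows. The function names and the sample numbers are assumptions made for illustration; the step and sign functions are written with the convention that an input of exactly 0 maps to the upper value.

```python
import math


def weighted_sum(x, w, b):
    """Step 1: sum of wi*xi plus the bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def step(z):
    """Binary step: output in {0, 1}."""
    return 1 if z >= 0 else 0


def sign(z):
    """Sign function: output in {-1, 1}."""
    return 1 if z >= 0 else -1


def sigmoid(z):
    """Sigmoid: continuous output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron_output(x, w, b, activation=step):
    """Step 2: Y = f(sum(wi*xi) + b)."""
    return activation(weighted_sum(x, w, b))


# Illustrative values, not taken from the text.
x = [1.0, 0.0]
w = [0.5, -0.4]
b = -0.2

z = weighted_sum(x, w, b)
print("weighted sum:", z)
print("step   :", perceptron_output(x, w, b, step))
print("sign   :", perceptron_output(x, w, b, sign))
print("sigmoid:", perceptron_output(x, w, b, sigmoid))
```

The same weighted sum produces a binary output with the step or sign function and a continuous value with the sigmoid.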
The output obtained from the associator is a binary vector and hence output can be taken as input signal to
the response unit and classification can be performed.
There are n input neurons, 1 output neuron and a bias.
The input layer and output layer neurons are connected through directed communication links which are
associated with weights.
The goal of the perceptron net is to classify the input pattern as a member or not a member of a particular class.
Training Algorithm
Step 0: Initialize the weights and the bias (for easy calculation they can be set to 0). Also initialize the
learning rate α.
Step 1: Perform Steps 2-6 until the final stopping condition is false.
Step 2: Perform Steps 3-5 for each training pair indicated by s : t.
Step 3: The input layer containing input units is applied with identity activation functions:
xi = si
Step 4: Calculate the output of the network. To do so, first obtain the net input:
yin = b + ∑ xi wi (sum over i = 1 to n)
where "n" is the number of input neurons in the input layer. Then apply the activation function over the net input to
obtain the output:
               1   if yin > θ
y = f(yin) =   0   if -θ ≤ yin ≤ θ
              -1   if yin < -θ
Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output and the desired (target)
output.
If y ≠ t then
wi(new) = wi(old) + α t xi
b(new) = b(old) + α t
else
wi(new) = wi(old)
b(new) = b(old)
Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If this
condition is not met, then start again from Step 2.
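The training steps can be sketched as below for the bipolar AND function. The function names, the choice α = 1 and θ = 0, the update wi(new) = wi(old) + α t xi, and the `max_epochs` safety limit are assumptions made for this illustration.

```python
def activation(y_in, theta):
    """Three-valued activation: 1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Repeat passes over the training pairs until one pass changes no weight."""
    n = len(samples[0][0])
    w = [0.0] * n  # Step 0: weights and bias start at 0
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 4: net input
            y = activation(y_in, theta)
            if y != t:  # Step 5: adjust only when the output is wrong
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:  # Step 6: stop when a full pass makes no change
            return w, b, epoch
    return w, b, max_epochs


# Bipolar AND function.
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, epochs = train_perceptron(and_samples)
print("weights:", w, "bias:", b, "epochs:", epochs)
```

With these settings the weights settle at w1 = 1, w2 = 1, b = -1, and the second pass over the data confirms that nothing changes any more.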
Testing Algorithm
Step 0: The initial weights to be used here are taken from the training algorithm (the final weights obtained
during training).
Step 1: For each input vector X to be classified, perform Steps 2-3.
Step 2: Set activations of the input unit.
Step 3: Obtain the response of the output unit.
yin = b + ∑ xi wi (sum over i = 1 to n)
               1   if yin > θ
y = f(yin) =   0   if -θ ≤ yin ≤ θ
              -1   if yin < -θ
Multi-Layered Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model has the same basic structure but adds
one or more hidden layers.
The multi-layer perceptron model is trained with the backpropagation algorithm, which executes in two stages as
follows:
o Forward Stage: Activations are computed starting from the input layer and ending at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this
stage, the error between the actual output and the desired output is propagated backward from the output layer to the input
layer.
Hence, a multi-layered perceptron model can be considered as a neural network with several layers
in which the activation function is not restricted to a step or linear function, as it is in a single-layer perceptron model. Instead,
the activation function can be a sigmoid, tanh, ReLU, etc.
A multi-layer perceptron model has greater processing power and can process linear and non-linear patterns. Further,
it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR, NOR.
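As an illustration of why the extra layer matters, XOR, which a single perceptron cannot represent, can be built from two hidden units. The sketch below uses hand-picked weights rather than trained ones; the weights, thresholds and function names are assumptions chosen for this example.

```python
def step(z):
    """Hard-limit activation with output in {0, 1}."""
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """Two-layer network with fixed, hand-picked weights that computes XOR."""
    h_or = step(x1 + x2 - 0.5)    # hidden unit 1 behaves like OR
    h_and = step(x1 + x2 - 1.5)   # hidden unit 2 behaves like AND
    # Output fires when OR is on but AND is off.
    return step(h_or - h_and - 0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```

Each hidden unit draws one straight line in the input plane, and the output unit combines the two half-planes, which a single line cannot do.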
Perceptron Function
The perceptron function f(x) is obtained by multiplying the input 'x' by the learned weight coefficient
'w' and adding the bias 'b':
f(x) = 1, if w.x + b > 0
f(x) = 0, otherwise
o 'w' represents real-valued weights vector
o 'b' represents the bias
o 'x' represents a vector of input x values.
Characteristics of Perceptron
The perceptron model has the following characteristics.
o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit transfer function.
o A perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly
separable, a single perceptron cannot classify all of them correctly.
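Worked example (AND function): Weights w1 = 1.2, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5 are given.
For input (0, 0), target 0: w1*x1 + w2*x2 = 1.2*0 + 0.6*0 = 0
This is not greater than the threshold of 1, so the output = 0. Here the target is the same as the calculated output.
For input (0, 1), target 0: 1.2*0 + 0.6*1 = 0.6
This is not greater than the threshold of 1, so the output = 0. Here the target is the same as the calculated output.
For input (1, 0), target 0: 1.2*1 + 0.6*0 = 1.2
This is greater than the threshold of 1, so the output = 1. Here the target does not match the calculated output,
so the weights are updated with wi = wi + n(t - o)xi:
w1 = 1.2 + 0.5(0 - 1)(1) = 0.7
w2 = 0.6 + 0.5(0 - 1)(0) = 0.6
After updating, the weights are w1 = 0.7, w2 = 0.6, with Threshold = 1 and Learning Rate n = 0.5.
For input (0, 0), target 0: 0.7*0 + 0.6*0 = 0
This is not greater than the threshold of 1, so the output = 0. Here the target is the same as the calculated output.
For input (0, 1), target 0: 0.7*0 + 0.6*1 = 0.6
This is not greater than the threshold of 1, so the output = 0. Here the target is the same as the calculated output.
For input (1, 0), target 0: 0.7*1 + 0.6*0 = 0.7
This is not greater than the threshold of 1, so the output = 0. Here the target is the same as the calculated output.
For input (1, 1), target 1: 0.7*1 + 0.6*1 = 1.3
This is greater than the threshold of 1, so the output = 1. Here the target is the same as the calculated output.
Hence the final weights are w1 = 0.7 and w2 = 0.6, with Threshold = 1 and Learning Rate n = 0.5.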
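The example can be replayed in code. This sketch assumes that the example trains an AND gate on binary inputs presented in the order (0,0), (0,1), (1,0), (1,1), and that the update is wi = wi + n(t - o)xi; both assumptions are consistent with the stated starting and final weights. The function name is hypothetical, and the code continues with the next pattern after an update instead of restarting the pass, which gives the same final weights here.

```python
def train_and_gate(w, threshold, lr, max_epochs=10):
    """Threshold perceptron trained on the binary AND function."""
    samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, target in samples:
            net = sum(wi * xi for wi, xi in zip(w, x))
            output = 1 if net > threshold else 0
            if output != target:
                # Assumed update rule: wi = wi + lr * (target - output) * xi
                w = [wi + lr * (target - output) * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:  # a full pass without mistakes: training is done
            return w, epoch
    return w, max_epochs


final_w, epochs = train_and_gate([1.2, 0.6], threshold=1, lr=0.5)
print("final weights:", [round(v, 2) for v in final_w], "after", epochs, "passes")
```

Only one correction is needed, at input (1, 0), after which every pattern is classified correctly.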
Linear Function:
Linear Activation Function
f(x) = a.x
Properties:
1. Since the derivative f'(x) = a is constant, the gradient has no relation to the input x.
2. During back propagation the gradient is therefore the same constant for every input, so it does not depend on the change in input Δx.
Linear Models
Introduction to Linear Models
The linear model is one of the simplest models in machine learning. For classification it assumes that the data is linearly
separable, and it tries to learn the weight of each feature. Mathematically, it can be written as Y = W^T X, where X is the feature
matrix, Y is the target variable, and W is the learned weight vector. For a classification problem we apply a transformation
function or a threshold to convert the continuous-valued variable Y into a discrete category. Here
we will briefly look at linear and logistic regression, which are the regression and classification models,
respectively.
Linear Regression
Linear Regression is a statistical approach that predicts the result of a response variable by combining numerous
influencing factors. It attempts to represent the linear connection between features (independent variables) and the
target (dependent variables). The cost function enables us to find the best possible values for the model parameters.
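As a small illustration, a straight line can be fitted to a handful of points by minimizing the mean squared error cost. This is a sketch for the one-feature case using the closed-form least-squares solution; the function names and the sample data are assumptions, not part of the course text.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Cost function: mean squared error of the fitted line."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Illustrative data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept, "cost:", mse(xs, ys, slope, intercept))
```

Because the sample points lie on a line, the fit recovers slope 2 and intercept 1 with zero cost; with noisy data the cost would be positive but still minimal among all lines.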
Linear models have a wide range of applications in various fields due to their simplicity,
interpretability, and effectiveness in many scenarios. Here are some common applications of
linear models:
1. Regression Analysis:
• Predictive Modeling: Linear regression is used to model the relationship between a
dependent variable and one or more independent variables. It's widely used in fields like
economics, finance, and epidemiology to make predictions.
• Time Series Analysis: Linear models can be applied to time series data to forecast future
values based on historical trends.
2. Classification:
• Logistic Regression: This is a linear model used for binary classification tasks. It's widely
used in fields like healthcare for disease prediction and in marketing for customer churn
prediction.
3. Natural Language Processing (NLP):
• Text Classification: Linear models like logistic regression and linear SVMs are used for
tasks such as sentiment analysis, spam detection, and topic classification.
4. Image Processing:
• Image Classification: Linear models are applied to computer vision tasks for classifying
objects in images when feature extraction methods are used, like HOG (Histogram of
Oriented Gradients) or bag-of-words for images.
5. Economics:
• Demand Estimation: Linear models are used to estimate the relationship between the
demand for a product and various factors like price, income, and advertising spend.
6. Finance:
• Portfolio Optimization: Linear models help investors construct portfolios that optimize
returns while managing risk.
• Credit Scoring: Linear models are used to assess the creditworthiness of individuals or
businesses.
7. Engineering:
• Control Systems: Linear models are used in control engineering to model and control
various physical systems.
8. Biology and Life Sciences:
• Pharmacokinetics: Linear models are applied to study the distribution and elimination of
drugs in the body.
• Genomics: Linear models can be used for gene expression analysis and predicting
biological outcomes.
9. Social Sciences:
• Sociology: Linear models are used to analyze social phenomena and relationships.
• Psychology: Linear models can be used in psychological research to examine
relationships between variables.
10. Environmental Science:
• Environmental Impact Assessment: Linear models can assess the impact of various
factors on the environment.
11. Marketing:
• Market Response Models: Linear models are used to understand how marketing efforts
affect sales and customer behavior.
12. Quality Control:
• Manufacturing: Linear models can be applied to quality control processes to detect defects
or variations in production.
13. Operations Research:
• Linear Programming: Linear models are used to optimize resource allocation in logistics,
supply chain management, and transportation.
14. Physics:
• Particle Physics: Linear models are used in the analysis of experimental data to discover
new particles or phenomena.
15. Astronomy:
• Stellar Spectroscopy: Linear models can be used to analyze the spectral characteristics of
stars and celestial objects.
16. Social Media and Recommender Systems:
• Linear models can be used to build recommendation systems for suggesting products,
movies, or content to users based on their preferences and behavior.
Applying Linear Models
Example: Modeling linear relationships can help solve real-world applications. Consider the example
situations below, and note how different problem-solving methods may be used in each.
1. Nadia has $200 in her savings account. She gets a job that pays $7.50 per hour and she
deposits all her earnings in her savings account. Write the equation describing this problem in
slope-intercept form. How many hours would Nadia need to work to have $500 in her
account?
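One way to work this problem: the balance grows by $7.50 for each hour worked, starting from $200, so in slope-intercept form y = 7.5x + 200, where x is the number of hours and y the balance. Setting y = 500 gives 7.5x = 300, so x = 40 hours. The short sketch below checks the arithmetic; the variable and function names are chosen only for illustration.

```python
initial_balance = 200.0  # intercept: money already in the account
hourly_wage = 7.50       # slope: dollars added per hour worked


def balance(hours):
    """Balance after a given number of hours: y = 7.5x + 200."""
    return hourly_wage * hours + initial_balance


def hours_needed(target):
    """Solve target = 7.5x + 200 for x."""
    return (target - initial_balance) / hourly_wage


print("hours needed for $500:", hours_needed(500))
print("balance after 40 hours:", balance(40))
```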
Linear Separability
1. Definition: Linear separability is a property of a dataset where two or more classes of data
points can be perfectly separated using a straight line (in 2D) or a hyperplane (in higher
dimensions). In other words, there exists a line or hyperplane that can cleanly separate all
data points of one class from those of another class.
2. Visualization: In a two-dimensional space, think of a scatterplot with data points from two
different classes. If you can draw a single straight line on the plot in such a way that all data
points of one class are on one side of the line, and all data points of the other class are on the
opposite side, the data is linearly separable.
3. Mathematical Expression: Linear separability can be mathematically represented as follows:
If each data point is a feature vector (x1, x2, ..., xn) associated with a
class label (either 0 or 1), then the data is linearly separable if there exist coefficients (w0,
w1, w2, ..., wn) such that w0 + w1x1 + w2x2 + ... + wnxn > 0 for all points of one class and
w0 + w1x1 + w2x2 + ... + wnxn < 0 for all points of the other class.
4. Use in Machine Learning: Linear separability is an important concept in machine learning,
especially in binary classification tasks. If your data is linearly separable, you can use linear
classifiers like linear SVMs, logistic regression, or perceptrons to make accurate predictions.
5. Limitations: Not all real-world datasets are linearly separable. In many cases, data points
from different classes may overlap or be arranged in complex patterns that cannot be
separated by a single straight line or hyperplane. In such cases, more sophisticated machine
learning models or non-linear transformation techniques may be necessary.
6. Example: Imagine a dataset of flowers with two classes: red roses and blue violets. If the
dataset is linearly separable, you can find a line that cleanly separates the red roses from the
blue violets on a scatterplot. This line can serve as a decision boundary for classifying new
flowers.
7. Importance: Linear separability simplifies classification tasks and allows for the use of
simple and interpretable models. However, it's essential to assess whether a dataset is linearly
separable before choosing a classification algorithm, as non-linearly separable data may
require more complex model architectures.
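The mathematical condition can be tested directly on the OR and XOR functions. In this sketch the function name, the candidate line x1 + x2 - 0.5 = 0, and the coarse grid of weights are assumptions; the grid search is only a demonstration and not a proof that no separating line exists.

```python
def separates(w0, w1, w2, samples):
    """True if w0 + w1*x1 + w2*x2 is > 0 for every class-1 point and < 0 for every class-0 point."""
    for (x1, x2), label in samples:
        score = w0 + w1 * x1 + w2 * x2
        if label == 1 and score <= 0:
            return False
        if label == 0 and score >= 0:
            return False
    return True


OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# OR is separated by the line x1 + x2 - 0.5 = 0.
or_ok = separates(-0.5, 1, 1, OR)

# For XOR, try every combination on a coarse grid of weights from -4 to 4.
grid = [k / 2 for k in range(-8, 9)]
xor_found = any(
    separates(w0, w1, w2, XOR) for w0 in grid for w1 in grid for w2 in grid
)

print("OR separable with the chosen line:", or_ok)
print("separating line found for XOR on the grid:", xor_found)
```

The OR function is solved by a single line, while none of the tried lines works for XOR, which matches the fact that XOR is not linearly separable.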
We say a two-dimensional dataset is linearly separable if we can separate the positive from the negative
objects with a straight line.
It doesn’t matter if more than one such line exists. For linear separability, it’s sufficient to find only
one.
Conversely, no line can separate linearly inseparable 2D data.
Key Points
• Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
• Clustering algorithms group similar data points together based on their inherent characteristics.
• Feature extraction captures essential information from the data, enabling the model to make
meaningful distinctions.
• Label association assigns categories to the clusters based on the extracted patterns and
characteristics.
Example
Imagine you have a machine learning model and a large dataset of unlabeled images containing
both dogs and cats. The model has never been told which images show dogs and which show cats, and it has no pre-existing
labels or categories for these animals. Your task is to use unsupervised learning to separate the dogs from the
cats.
For instance, suppose it is given a collection of pictures of dogs and cats which it has never seen.
The machine has no idea about the features of dogs and cats, so it cannot name the categories 'dogs' and
'cats'. But it can group the pictures according to their similarities, patterns, and differences, i.e., it can
split the collection into two parts. The first may contain all pictures having dogs in them and
the second part may contain all pictures having cats in them. Nothing was learned beforehand, which
means there is no labeled training data and there are no examples.
Unsupervised learning allows the model to work on its own to discover patterns and information that was previously
undetected. It mainly deals with unlabelled data.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
• Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
Hebbian Learning
Hebbian Learning Rule, also known as Hebb Learning Rule, was proposed by Donald O Hebb. It is one
of the first and also easiest learning rules in the neural network. It is used for pattern classification. It is a
single layer neural network, i.e. it has one input layer and one output layer. The input layer can have
many units, say n. The output layer only has one unit. Hebbian rule works by updating the weights
between neurons in the neural network for each training sample.
Hebbian Learning Rule Algorithm :
1. Set all weights to zero, wi = 0 for i = 1 to n, and bias to zero.
2. For each input vector and target output pair, S : t, repeat steps 3-5.
3. Set activations for input units with the input vector: xi = si for i = 1 to n.
4. Set the corresponding output value to the output neuron, i.e. y = t.
5. Update weight and bias by applying the Hebb rule for all i = 1 to n:
wi(new) = wi(old) + xi y
b(new) = b(old) + y
Reinforcement Learning:
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to maximize cumulative
rewards in a given situation. Unlike supervised learning, which relies on a training dataset with predefined answers,
RL involves learning through experience. In RL, an agent learns to achieve a goal in an uncertain, potentially
complex environment by performing actions and receiving feedback through rewards or penalties.
Key Concepts of Reinforcement Learning
• Agent: The learner or decision-maker.
• Environment: Everything the agent interacts with.
• State: A specific situation in which the agent finds itself.
• Action: A move the agent can make; the set of actions contains all possible moves.
• Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behavior through trial and error. The agent takes actions within the
environment, receives rewards or penalties, and adjusts its behavior to maximize the cumulative reward. This
learning process is characterized by the following elements:
• Policy: A strategy used by the agent to determine the next action based on the current state.
• Reward Function: A function that provides a scalar feedback signal based on the state and action.
• Value Function: A function that estimates the expected cumulative reward from a given state.
• Model of the Environment: A representation of the environment that helps in planning by predicting future
states and rewards.
Example: Navigating a Maze
The problem is as follows: We have an agent and a reward, with many hurdles in between. The agent is supposed to
find the best possible path to reach the reward. The following example explains the problem more easily.
The example involves a robot, a diamond, and fire. The goal of the robot is to get the reward, which is the diamond,
and to avoid the hurdles, which are the fires. The robot learns by trying all the possible paths and then choosing the path
which gives it the reward with the fewest hurdles. Each right step gives the robot a reward and each wrong step
subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, that is, the
diamond.
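The idea of trying every path and keeping the one with the highest total reward can be sketched on a tiny grid. Everything specific here is an assumption made for illustration: the 3x3 layout, the reward values (-1 per ordinary step, -10 for fire, +10 for the diamond) and the function names. The code enumerates paths exhaustively, mirroring the description above; practical RL methods learn from sampled experience instead.

```python
# S = start, F = fire, D = diamond, . = free cell (hypothetical layout)
GRID = ["S.F",
        ".F.",
        "..D"]

STEP_REWARD = -1      # assumed small cost for every ordinary step
FIRE_REWARD = -10     # assumed penalty for stepping into fire
DIAMOND_REWARD = 10   # assumed reward for reaching the diamond


def cell_reward(cell):
    if cell == "F":
        return FIRE_REWARD
    if cell == "D":
        return DIAMOND_REWARD
    return STEP_REWARD


def all_paths(pos, visited):
    """Yield every path from pos to the diamond that never revisits a cell."""
    r, c = pos
    if GRID[r][c] == "D":
        yield [pos]
        return
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])
        if inside and (nr, nc) not in visited:
            for rest in all_paths((nr, nc), visited | {(nr, nc)}):
                yield [pos] + rest


def total_reward(path):
    """Sum the rewards of every cell entered after the start cell."""
    return sum(cell_reward(GRID[r][c]) for r, c in path[1:])


start = (0, 0)
best_path = max(all_paths(start, {start}), key=total_reward)
best_reward = total_reward(best_path)
print("best path:", best_path)
print("total reward:", best_reward)
```

With these assumed rewards the best path goes down the left column and along the bottom row, avoiding both fires.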
The learning rate is a hyperparameter that determines the size of the step taken in the weight update. A small
learning rate results in a slow convergence, while a large learning rate can lead to overshooting the minimum and
oscillating around the minimum. It’s important to choose an appropriate learning rate that balances the speed of
convergence and the stability of the optimization.
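The effect of the learning rate can be seen on a one-dimensional example. This sketch minimizes f(w) = w^2, whose gradient is 2w; the function, the starting point and the three learning rates are assumptions chosen for illustration.

```python
def gradient_descent(lr, w0=10.0, steps=20):
    """Minimize f(w) = w**2 with a fixed learning rate; returns all visited weights."""
    w = w0
    trajectory = [w]
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # gradient descent update
        trajectory.append(w)
    return trajectory


small = gradient_descent(lr=0.01)    # slow: each step shrinks w by only 2%
moderate = gradient_descent(lr=0.3)  # fast and stable
large = gradient_descent(lr=0.9)     # overshoots: w jumps across the minimum each step

print("small    lr, final w:", small[-1])
print("moderate lr, final w:", moderate[-1])
print("large    lr, first steps:", large[:4])
```

After 20 steps the small rate is still far from the minimum at 0, the moderate rate is practically there, and the large rate flips sign on every step. For this function any rate above 1.0 would make the steps grow instead of shrink.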
Variants of Gradient Descent
1) Batch Gradient Descent:
In batch gradient descent, the gradient of the loss function is computed with respect to the weights for the entire
training dataset, and the weights are updated after each iteration. This provides a more accurate estimate of the
gradient, but it can be computationally expensive for large datasets.
2) Stochastic Gradient Descent (SGD):
In SGD, the gradient of the loss function is computed with respect to a single training example, and the weights are
updated after each example. SGD has a lower computational cost per iteration compared to batch gradient descent,
but it can be less stable and may not converge to the optimal solution.
3) Mini-Batch Gradient Descent:
Mini-batch gradient descent is a compromise between batch gradient descent and SGD. The gradient of the loss
function is computed with respect to a small randomly selected subset of the training examples (called a mini-
batch), and the weights are updated after each mini-batch. Mini-batch gradient descent provides a balance between
the stability of batch gradient descent and the computational efficiency of SGD.
4) Momentum:
Momentum is a variant of gradient descent that incorporates information from the previous weight updates to help
the algorithm converge more quickly to the optimal solution. Momentum adds a term to the weight update that is
proportional to the running average of the past gradients, allowing the algorithm to move more quickly in the
direction of the optimal solution.
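The four variants differ only in how many examples feed each update and in whether past updates are reused. The sketch below fits y = w*x + b by mean squared error; the data set, learning rate, momentum coefficient and function names are assumptions for illustration, and the momentum term is written in the common velocity form.

```python
import random


def gradient(w, b, batch):
    """Gradient of the mean squared error of y ~ w*x + b over one batch."""
    n = len(batch)
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2 * (w * x + b - y) for x, y in batch) / n
    return gw, gb


def train(data, batch_size, lr=0.5, momentum=0.0, epochs=500, seed=0):
    """batch_size = len(data): batch GD; 1: SGD; anything in between: mini-batch GD."""
    rng = random.Random(seed)
    w = b = 0.0
    vw = vb = 0.0  # velocities that accumulate past updates
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for i in range(0, len(shuffled), batch_size):
            gw, gb = gradient(w, b, shuffled[i:i + batch_size])
            vw = momentum * vw - lr * gw
            vb = momentum * vb - lr * gb
            w += vw
            b += vb
    return w, b


# Noise-free illustrative data on the line y = 2x + 1.
data = [(x / 10, 2 * (x / 10) + 1) for x in range(10)]

results = {
    "batch": train(data, batch_size=len(data)),
    "sgd": train(data, batch_size=1),
    "mini-batch": train(data, batch_size=4),
    "momentum": train(data, batch_size=len(data), momentum=0.9),
}
for name, (w, b) in results.items():
    print(name, "w =", round(w, 3), "b =", round(b, 3))
```

On this noise-free data all four variants approach w = 2 and b = 1; on real data they trade off the cost of each update against the noise in the gradient estimate.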
Competitive Learning :
Artificial neural networks often utilize competitive learning models to classify input without the use of labeled
data. The process begins with an input vector (often a data set). This input is then presented to a network of
artificial neurons, each of which has its own set of weights, which act like filters. Each neuron computes a score
based on its weight and the input vector, typically through a dot product operation (a way of multiplying the input
information with the filter and adding the results together).
After the computation, the neuron that has the highest score (the "winner") is updated, usually by shifting its weights
closer to the input vector. This process is often referred to as the "Winner-Takes-All" strategy. Over time, neurons
become specialized as they get updated toward input vectors they can best match. This leads to the formation of
clusters of similar data, hence enabling the discovery of inherent patterns within the input dataset.
To illustrate how one can use competitive learning, imagine an eCommerce business wants to segment its customer
base for targeted marketing, but they have no prior labels or segmentation. By feeding customer data (purchase
history, browsing pattern, demographics, etc.) to a competitive learning model, they could automatically find
distinct clusters (like high spenders, frequent buyers, discount lovers) and tailor marketing strategies accordingly.
For this simple illustration, let's assume we have a dataset composed of 1-dimensional input vectors ranging from 1
to 10 and a competitive learning network with two neurons.
Step 1: Initialization
We start by initializing the weights of the two neurons to random values. Let's assume:
• Neuron 1 weight: 2
• Neuron 2 weight: 8
Step 2: Presenting the input vector
Now, we present an input vector to the network. Let's say our input vector is '5'.
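Step 3: Determining the winner
We calculate the distance between the input vector and the weights of the two neurons. The neuron with the weight
closest to the input vector 'wins.' This could be calculated using any distance metric, for example, the absolute
difference:
• Neuron 1 distance: |5 - 2| = 3
• Neuron 2 distance: |5 - 8| = 3
Both distances are equal here, so the tie has to be broken by some convention, for example by choosing Neuron 1.
Step 4: Updating the winner's weight
We adjust the winning neuron's weight to bring it closer to the input vector. If our learning rate (a tuning parameter
in an optimization algorithm that determines the step size at each iteration) is 0.5, the weight update would be:
new weight = old weight + learning rate × (input - old weight)
With Neuron 1 as the winner, this gives 2 + 0.5 × (5 - 2) = 3.5.
Step 5: Repeating for the remaining inputs
We repeat the process with all the other input vectors in the dataset, updating the weights after each presentation.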
Step 6: Convergence
After several iterations (also known as epochs), the neurons' weights will start to converge to the centers of their
corresponding input clusters. In this case, with 1-dimensional data ranging from 1 to 10, we could expect one neuron
to converge around the lower range (1 to 5) and the other around the higher range (6 to 10).
This process exemplifies how competitive learning works. Over time, each neuron specializes in a different cluster
of the data, enabling the system to identify and represent the inherent groupings in the dataset.
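The illustration can be run as a short script. Several details are assumptions: the update rule (move the winner a fraction of the way toward the input), breaking ties in favor of the first neuron, presenting the inputs 1 to 10 in order, and using a smaller learning rate of 0.1 for the longer training run. The function names are hypothetical.

```python
def winner_index(weights, x):
    """Index of the neuron whose weight is closest to x (the first one wins a tie)."""
    return min(range(len(weights)), key=lambda i: abs(x - weights[i]))


def update(weights, x, lr):
    """Move only the winning neuron's weight toward the input."""
    i = winner_index(weights, x)
    new_weights = list(weights)
    new_weights[i] = weights[i] + lr * (x - weights[i])
    return new_weights


def train(weights, inputs, lr, epochs):
    for _ in range(epochs):
        for x in inputs:
            weights = update(weights, x, lr)
    return weights


# One presentation of the input 5 with learning rate 0.5.
after_one = update([2.0, 8.0], 5.0, lr=0.5)
print("after presenting 5:", after_one)

# Repeated presentation of the inputs 1..10.
inputs = [float(v) for v in range(1, 11)]
final = train([2.0, 8.0], inputs, lr=0.1, epochs=50)
print("after training:", [round(v, 2) for v in final])
```

After training, the first neuron sits in the lower part of the range and the second in the upper part, so each one represents its own group of inputs.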
Backpropagation Algorithm
Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural
network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights allows
you to reduce error rates and make the model reliable by increasing its generalization.
Backpropagation in neural network is a short form for “backward propagation of errors.” It is a standard method of
training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the
weights in the network.
How Backpropagation Algorithm Works
The backpropagation algorithm in a neural network computes the gradient of the loss function with respect to a single weight by
the chain rule. It efficiently computes the gradient one layer at a time, unlike a naive direct computation. It computes the
gradient, but it does not define how the gradient is used. It generalizes the computation in the delta rule.
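A minimal sketch of the chain rule at work, assuming a network with one input, one sigmoid hidden unit, one sigmoid output unit and a squared-error loss (the network shape, the parameter values and the function names are assumptions for illustration). The gradients obtained layer by layer are compared with a numerical estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden activation
    y = sigmoid(w2 * h + b2)  # output activation
    return h, y


def loss(params, x, t):
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    """Gradients of the loss for [w1, b1, w2, b2], computed from the output layer backward."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - t) * y * (1 - y)            # dL/d(net input of output unit)
    grad_w2 = delta_out * h
    grad_b2 = delta_out
    delta_hidden = delta_out * w2 * h * (1 - h)  # error passed back through w2
    grad_w1 = delta_hidden * x
    grad_b1 = delta_hidden
    return [grad_w1, grad_b1, grad_w2, grad_b2]


def numerical_gradient(params, x, t, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, x, t) - loss(minus, x, t)) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]  # illustrative values for w1, b1, w2, b2
x, t = 1.5, 1.0

analytic = backprop(params, x, t)
numeric = numerical_gradient(params, x, t)
for name, a, n in zip(["w1", "b1", "w2", "b2"], analytic, numeric):
    print(name, "backprop:", round(a, 6), "numerical:", round(n, 6))
```

The output-layer error term is reused when computing the hidden-layer gradients, which is what makes the layer-by-layer computation efficient.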
Consider the following Back propagation neural network example diagram to understand:
Advantages:
1.For time series data, such as speech recognition, stock price forecasting, and natural language processing,
Backpropagation through time is very successful. It has the ability to recall information from the past and can handle
input data of any length.
2.It is a commonly used and well-known algorithm that has been proven to function effectively in real-world
settings.
3.Many well-liked deep learning frameworks, like TensorFlow and PyTorch, utilise Backpropagation through time.
Disadvantages:
1.Backpropagation through time can be computationally expensive and demands a lot of CPU power.
2. The backpropagation of errors over long sequences makes the BPTT training process unstable when the sequence
length is large, which causes the vanishing or exploding gradient problem.
3.The approach favors short-term memory since it only propagates errors backwards as far as the length of the
sequence.
Applications:
1.Natural Language Processing: Because Backpropagation through time can produce text sequences based on prior
inputs, it is frequently used for language modeling, text categorization, and machine translation.
2.Speech Recognition: In order to recognize spoken words and translate them into text, Backpropagation through
time is employed in speech recognition tasks.
3.Stock Price Prediction: Using historical data trends and Backpropagation through time, one may forecast stock
prices.
4. Music Generation: Using prior notes and musical patterns, Backpropagation through time can be utilized to create musical sequences.
Unit 4
Architecture
As shown in the following figure, the architecture of Auto Associative memory network
has ‘n’ number of input training vectors and similar ‘n’ number of output target vectors.
Hetero Associative memory
Similar to Auto Associative Memory network, this is also a single layer neural network. However, in
this network the input training vector and the output target vectors are not the same. The weights are
determined so that the network stores a set of patterns. Hetero associative network is static in nature,
hence, there would be no non-linear and delay operations.
Architecture
As shown in the following figure, the architecture of Hetero Associative Memory network
has ‘n’ number of input training vectors and ‘m’ number of output target vectors.
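A hetero-associative memory of this shape can be sketched with the outer product rule, where each stored pair s : t adds s(i) * t(j) to the weight between input unit i and output unit j. The pattern values, the function names and the recall rule (sign of the net input, with 0 mapped to -1) are assumptions for illustration; the two input patterns are chosen to be orthogonal so that recall is exact.

```python
def train_outer_product(pairs):
    """Build the n x m weight matrix as the sum of outer products s^T t."""
    n = len(pairs[0][0])
    m = len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for s, t in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += s[i] * t[j]
    return W


def recall(W, x):
    """Net input of each output unit, followed by a bipolar threshold."""
    m = len(W[0])
    y_in = [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(m)]
    return [1 if v > 0 else -1 for v in y_in]


# n = 4 input units, m = 2 output units, bipolar patterns.
s1, t1 = [1, 1, -1, -1], [1, -1]
s2, t2 = [1, -1, 1, -1], [-1, 1]

W = train_outer_product([(s1, t1), (s2, t2)])

noisy_s1 = [1, 1, -1, 1]  # s1 with its last component flipped
print("recall s1      :", recall(W, s1))
print("recall s2      :", recall(W, s2))
print("recall noisy s1:", recall(W, noisy_s1))
```

Both stored inputs return their own targets, and the input with one mistaken component is still mapped to the target of the pattern it resembles most.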
Applications of Associative memory :-
1. It can be only used in memory allocation format.
2. It is widely used in the database management systems, etc.
3. Networking: Associative memory is used in network routing tables to quickly find the path to a
destination network based on its address.
4. Image processing: Associative memory is used in image processing applications to search for
specific features or patterns within an image.
5. Artificial intelligence: Associative memory is used in artificial intelligence applications such as
expert systems and pattern recognition.
6. Database management: Associative memory can be used in database management systems to
quickly retrieve data based on its content.
Advantages of Associative memory :-
1. It is used where search time needs to be less or short.
2. It is suitable for parallel searches.
3. It is often used to speedup databases.
4. It is used in page tables used by the virtual memory and used in neural networks.
Disadvantages of Associative memory :-
1. It is more expensive than RAM.
2. Each cell must have storage capability and logical circuits for matching its content with
external argument.
Outer product rule
Storage capacity of Associative memory:
Associative memory, often referred to as content-addressable memory (CAM), is a type of computer memory that
allows data to be retrieved based on its content or attributes, rather than requiring a specific memory address. This
makes it well-suited for applications that involve searching for data or patterns within a large dataset.
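A minimal sketch of content-based retrieval is given below, assuming the stored words are small integers and that a mask selects which bit positions take part in the comparison. The function name cam_search, the 9-bit word values, and the mask convention (1 = compare, 0 = ignore) are hypothetical choices for illustration. A hardware memory compares all words in parallel, whereas this loop only imitates the result.

```python
def cam_search(words, argument, key_mask):
    """Return the addresses of all stored words that agree with `argument`
    in every bit position where `key_mask` has a 1."""
    return [addr for addr, word in enumerate(words)
            if ((word ^ argument) & key_mask) == 0]


# Hypothetical memory contents: three 9-bit words at addresses 0, 1 and 2.
words = [0b100111100, 0b101000001, 0b101111100]
argument = 0b101111100

# Compare only the three leftmost bits; the other six are ignored.
print(cam_search(words, argument, 0b111000000))  # addresses 1 and 2 match

# Compare all nine bits; only an exact copy of the argument matches.
print(cam_search(words, argument, 0b111111111))  # address 2 matches
```

The search returns addresses, not the other way round, which is the sense in which the memory is addressed by content.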
Testing associative memory for missing and mistaken data means checking that the memory system still retrieves
the desired data when the search key or the stored data is incomplete or contains errors. Common methods and
considerations for testing associative memory in such scenarios are listed below:
1. Incomplete or Missing Data:
• Test for Retrieval of Partial Data: Assess whether the associative memory can successfully retrieve
data even if only a part of the search key or data is provided. This is crucial in situations where some
attributes of the data might be missing or incomplete.
• Conduct Tests with Wildcards: Use wildcards or placeholders in the search keys to simulate missing
data. For example, use "don't care" bits that can match any value in the search key. Verify that the
memory can handle such incomplete queries.
• Evaluate Handling of NULL Values: In databases and certain applications, NULL values may
indicate missing data. Test whether the associative memory can correctly handle and retrieve data
containing NULL values.
2. Mistaken Data:
• Introduce Data Corruption: To test how well the associative memory handles mistaken or corrupted
data, deliberately introduce errors or incorrect values into the stored data. This may involve bit flips,
noise injection, or other forms of data corruption.
• Test Error Tolerance: Assess the memory's error tolerance by providing search keys with slight
variations or errors. Check if the memory can still retrieve the correct data, even when the input is not
a perfect match.
• Evaluate Error Correction: Some associative memory systems may have built-in error correction
capabilities. Test these features to ensure that they can successfully correct mistaken data and provide
the correct response.
3. Simulation and Benchmarking:
• Create Simulation Scenarios: Develop test scenarios that mimic real-world situations where missing
or mistaken data can occur. These scenarios should be representative of the specific application and
data types you're working with.
• Benchmark Performance: Measure the performance of the associative memory in terms of retrieval
accuracy, response time, and resource utilization when handling missing or mistaken data. Compare
the results against predetermined benchmarks or performance expectations.
4. Validation and Verification:
• Use Validation Data: Have a set of validation data with known missing or mistaken values. Test the
associative memory using this data to verify that it returns the expected results.
• Verification with Test Suites: Develop comprehensive test suites that cover various scenarios of
missing and mistaken data. This ensures thorough validation of the memory system.
5. Test Automation:
• Implement Test Automation: Use automated testing frameworks and scripts to systematically test the
associative memory's response to different missing and mistaken data scenarios. Automation helps
ensure consistency and repeatability.
6. Stress Testing:
• Subject the Memory to Stress Tests: Push the associative memory system to its limits by testing its
performance with large datasets, heavy workloads, and extreme conditions. This can help identify
potential issues under stress.
7. Real-World Data:
• Test with Real-World Data: Whenever possible, use actual data from the application domain to test
how the associative memory handles missing or mistaken data. Real-world data may present unique
challenges that synthetic data does not.
8. Error Reporting:
• Implement Error Reporting and Logging: Include mechanisms for the memory system to report errors
and log events related to missing and mistaken data. This can be valuable for diagnosing and
debugging issues during testing and in a production environment.
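For a neural associative memory, the missing-data and mistaken-data checks described above can be sketched as follows. The sketch assumes a single-layer Auto Associative network trained with the outer product (Hebb) rule on one bipolar pattern, a missing component encoded as 0, and a mistaken component encoded as a flipped sign. The stored pattern, the test keys, and the names train_auto and recall are hypothetical.

```python
def train_auto(patterns):
    """Weight matrix as the sum of outer products s^T s over all patterns."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                weights[i][j] += s[i] * s[j]
    return weights


def recall(weights, x):
    """Net input to each output unit, then a bipolar step activation."""
    n = len(weights)
    net = [sum(x[i] * weights[i][j] for i in range(n)) for j in range(n)]
    # Assumption: outputs are +1 only for a strictly positive net input.
    return [1 if v > 0 else -1 for v in net]


stored = [1, 1, 1, -1]
W = train_auto([stored])

# Each test key is a damaged copy of the stored pattern.
test_keys = {
    "exact key": [1, 1, 1, -1],
    "one missing entry": [0, 1, 1, -1],
    "two missing entries": [0, 0, 1, -1],
    "one mistaken entry": [-1, 1, 1, -1],
    "two mistaken entries": [-1, -1, 1, -1],
}

results = {name: recall(W, key) == stored for name, key in test_keys.items()}
for name, ok in results.items():
    print(f"{name}: {'recalled' if ok else 'not recalled'}")
```

In this small example the stored pattern is recovered from every key except the one with two mistaken entries, whose net input is zero for every unit. This shows that a mistaken component is more damaging than a missing one.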
Algorithm: