
Course Material Neural Updated

Uploaded by

Mohit Kumar
Copyright
© © All Rights Reserved

Course Material

Neural Network Fundamentals

Subject code : PCC-CSE-401G


Regulation :
Year :3rd
Semester :5th
Academic Year Aug 2023 - Dec 2023

Prepared by
Ms. Rakhi Sharma
Assistant Professor(CSE-AI/DS)

Department of Electrical Engineering


DPG Institute of Technology and Management
Gurugram, 122004 Haryana
Neural Networks Fundamentals
Course code PCC-CSE-401G
Category Professional Core Course
Course title Neural Networks Fundamentals
Scheme and Credits L T P Credits
3 0 3
Class work 25 Marks
Exam 75 Marks
Total 100 Marks
Duration of Exam 03 Hours

Note: Examiner will set nine questions in total. Question one will be compulsory. Question one will have 6 parts of
2.5 marks each from all units and remaining eight questions of 15 marks each to be set by taking two questions from
each unit. The students have to attempt five questions in total, first being compulsory and selecting one from each
unit.
Objectives of the course:

1. To understand the different issues involved in the design and implementation of neural networks.
2. To study the basics of neural networks and their activation functions.
3. To understand and use the perceptron and its applications in the real world.
4. To develop an understanding of essential NN concepts such as learning, feed forward, and feed backward.
5. To design and build a simple NN model to solve a problem.

Unit-I
Overview of biological neurons: Structure of biological neuron, neurobiological analogy, Biological
neuron equivalencies to artificial neuron model, Evolution of neural network.
Activation Functions: Threshold functions, Signum function, Sigmoid function, Hyperbolic tangent (tanh) function,
Stochastic function, Ramp function, Linear function, Identity function.
ANN Architecture: Feed forward network, Feed backward network, single and multilayer network, fully
recurrent network.
Unit-II
McCulloch and Pitts Neural Network (MCP Model): Architecture, Solution of AND, OR function using
MCP model, Hebb Model: Architecture, training and testing, Hebb network for AND function.
Perceptron Network: Architecture, training, Testing, single and multi-output model, Perceptron for AND
function.
Linear function, application of linear model, linear separability, solution of OR function using linear
separability model.
Unit-III
Learning: Supervised, Unsupervised, reinforcement learning, Gradient Descent algorithm, generalized delta
learning rule, Hebbian learning, Competitive learning, Back propagation Network: Architecture, training
and testing.
Unit-IV
Associative memory: Auto associative and Hetero associative memory and their architecture, training
(insertion) and testing (Retrieval) algorithm using Hebb rule and Outer Product rule. Storage capacity,
Testing of associative memory for missing and mistaken data, Bidirectional memory
Reference Books:
1. David Kriesel, A Brief Introduction to Neural Networks, dkriesel.com, 2005.
2. Gunjan Goswami, Introduction to Artificial Neural Networks, S.K. Kataria & Sons, 2012.
3. Raul Rojas, Neural Networks: A Systematic Introduction, 1996.
4. S. Sivanandam, Introduction to Artificial Neural Networks, 2003.
5. Jacek M. Zurada, Introduction to Artificial Neural Systems, Jaico Publ. House, 1994.
6. S.N. Deepa and S.N. Sivanandam, Principles of Soft Computing, Wiley.

Course Outcomes
The students will be able to:
1. Know the purpose of Artificial Neural Networks.
2. Apply the concepts of activation and propagation functions.
3. Work with the supervised learning network paradigm.
4. Work with the unsupervised learning network paradigm.
5. Know the purpose and working of neural network memory concepts.
Unit -1

Introduction to Neural Networks:


Neural networks are a class of machine learning models inspired by the structure and function of the human brain.
They are designed to recognize patterns and relationships within data, enabling them to perform tasks like image
and speech recognition, natural language processing, and more. Neural networks have gained widespread attention
and popularity due to their ability to learn complex patterns and solve a variety of problems that are difficult for
traditional algorithms to tackle.
Here are some key concepts related to neural networks:
1. Neuron (Node): The basic building block of a neural network is a neuron, also known as a node. It takes in
multiple inputs, performs a weighted sum of these inputs, applies an activation function, and produces an
output.
2. Layer: Neurons are organized into layers within a neural network. There are three main types of layers:
• Input Layer: This layer receives the initial data input and passes it to the subsequent layers.
• Hidden Layers: These layers come between the input and output layers. They perform complex
transformations on the data, enabling the network to learn intricate patterns.
• Output Layer: This layer produces the final output of the network's computation. The number of
neurons in the output layer corresponds to the number of classes in classification problems or the
dimensions of the target values in regression problems.
3. Weights and Biases: Each connection between neurons has an associated weight, which determines the
strength of the connection. Additionally, each neuron typically has a bias term that shifts the output.
4. Activation Function: The activation function introduces non-linearity into the model. It determines whether a
neuron should "fire" (produce an output) or not based on the weighted sum of its inputs. Common activation
functions include ReLU (Rectified Linear Activation), Sigmoid, and Tanh.
5. Feedforward Propagation: The process of passing input data through the network, layer by layer, to produce
an output is called feedforward propagation. This is how neural networks make predictions.
6. Backpropagation: Backpropagation computes the gradient of the error (the difference between predicted and
actual outputs) with respect to each weight and bias by propagating it backward through the network. These
gradients are then used to update the weights and biases and improve the network's performance.
7. Training: Training a neural network involves providing it with a labeled dataset and adjusting its weights and
biases iteratively to minimize the error. This is typically done using optimization algorithms like stochastic
gradient descent (SGD) or its variants.
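
The feedforward, backpropagation, and training steps above can be sketched for a single sigmoid neuron learning the OR function. This is a minimal sketch rather than a definitive implementation: the squared-error loss, the learning rate, the epoch count, and the names train_neuron and predict are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=5000, lr=0.5):
    """Train one sigmoid neuron with per-sample gradient descent (assumed settings)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            # Feedforward: weighted sum of inputs plus bias, then activation.
            z = w[0] * x[0] + w[1] * x[1] + b
            y = sigmoid(z)
            # Backward pass: chain rule on the loss 0.5 * (y - target) ** 2.
            delta = (y - target) * y * (1.0 - y)
            # Update: move each parameter against its gradient.
            w[0] -= lr * delta * x[0]
            w[1] -= lr * delta * x[1]
            b -= lr * delta
    return w, b


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_DATA)


def predict(x):
    """Classify an input as 1 when the neuron output is at least 0.5."""
    z = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if sigmoid(z) >= 0.5 else 0


print([predict(x) for x, _ in OR_DATA])
```

A full network repeats the same three steps, with the backward pass carrying each delta through every layer.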

An artificial neuron (perceptron) consists of:

• Input
• Weight
• Bias
• Activation Function
• Output
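
The five components listed above map directly onto a few lines of code. The sketch below is illustrative only: the function names and the hand-picked AND weights and bias are assumptions, not values taken from the course material.

```python
def step(z):
    """Threshold activation: output 1 when the net input is non-negative."""
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias, activation=step):
    """Input -> weighted sum -> add bias -> activation function -> output."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(net)


# Hypothetical weights and bias that make the neuron behave like logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
and_outputs = [neuron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # inputs in the order (0,0), (0,1), (1,0), (1,1)
```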

Artificial Neural Networks (ANNs) have a wide range of applications across various fields due to their ability to
model complex relationships and learn from data. Some of the key applications of ANNs include:
1. Image Recognition and Computer Vision: ANNs are used extensively in tasks such as image classification,
object detection, and facial recognition. Convolutional Neural Networks (CNNs) are a specialized type of
ANN designed for these tasks.
2. Natural Language Processing (NLP): ANNs are employed in NLP tasks like sentiment analysis, machine
translation, chatbots, and text generation. Recurrent Neural Networks (RNNs) and Transformers are
commonly used architectures for NLP.
3. Speech Recognition: ANNs, particularly recurrent and convolutional neural networks, are used in automatic
speech recognition systems, making voice assistants and dictation software possible.
4. Recommendation Systems: ANNs power recommendation algorithms in e-commerce, streaming services,
and social media platforms to suggest products, movies, music, or content to users.
5. Time Series Forecasting: ANNs can model complex temporal patterns, making them valuable for tasks like
stock price prediction, weather forecasting, and demand forecasting in supply chain management.
6. Autonomous Vehicles: ANNs are used for object detection, localization, and decision-making in autonomous
vehicles, enabling them to perceive and navigate their environment.
7. Healthcare: ANNs are employed in medical image analysis for tasks such as tumor detection in radiology,
disease diagnosis, and drug discovery.
8. Financial Modeling: ANNs are used for predicting stock prices, credit risk assessment, fraud detection, and
algorithmic trading.
9. Gaming: ANNs are used in game development for creating intelligent non-player characters (NPCs), game
AI, and procedural content generation.
10. Manufacturing and Quality Control: ANNs can be used to monitor and optimize manufacturing processes,
detect defects in products, and predict equipment failures.
11. Natural Resource Management: ANNs are applied in fields like agriculture to optimize crop yields, manage
resources efficiently, and predict disease outbreaks in plants.
12. Social Media Analysis: ANNs are used to analyze social media data for sentiment analysis, trend prediction,
and recommendation of content to users.
13. Drug Discovery: ANNs assist in drug discovery by predicting the biological activity of molecules and
speeding up the process of identifying potential drug candidates.
14. Energy Management: ANNs help in optimizing energy consumption in buildings and industrial processes,
leading to energy savings.
15. Robotics: ANNs are used for robot control, path planning, and object recognition, enabling robots to perform
tasks in various industries, including manufacturing and healthcare.
16. Anomaly Detection: ANNs are effective at identifying anomalies in data, making them useful for fraud
detection in finance, network security, and quality control in manufacturing.
17. Human Resource Management: ANNs can assist in talent acquisition and employee performance prediction.
18. Environmental Monitoring: ANNs are used for analyzing environmental data, such as predicting air quality,
weather forecasting, and monitoring wildlife.

Types of Neural Networks

Seven commonly used types of neural networks are described below.


• Multilayer Perceptron (MLP): A type of feedforward neural network with three or more layers, including
an input layer, one or more hidden layers, and an output layer. It uses nonlinear activation functions.
• Convolutional Neural Network (CNN): A neural network that is designed to process input data that has
a grid-like structure, such as an image. It uses convolutional layers and pooling layers to extract features
from the input data.
• Recursive Neural Network: A neural network that applies the same set of weights recursively over a
structured input of variable size, such as the parse tree of a sentence, to make structured predictions.
• Recurrent Neural Network (RNN): A type of neural network that makes connections between the neurons
in a directed cycle, allowing it to process sequential data.
• Long Short-Term Memory (LSTM): A type of RNN that is designed to overcome the vanishing gradient
problem in training RNNs. It uses memory cells and gates to selectively read, write, and erase
information.
• Sequence-to-Sequence (Seq2Seq): A type of neural network that uses two RNNs to map input sequences
to output sequences, such as translating one language to another.
• Shallow Neural Network: A neural network with only one hidden layer, often used for simpler tasks or
as a building block for larger networks.
Structure of Biological neural network:
The biological neural network is the intricate and complex network of interconnected neurons found in the nervous
systems of living organisms, including humans. It serves as the foundation for how the brain functions and enables
various cognitive, sensory, and motor functions.

Working of a Biological Neuron


As shown in the above diagram, a typical neuron consists of the following four parts with the help of which we can
explain its working −
• Dendrites − They are tree-like branches responsible for receiving information from the other neurons
the neuron is connected to. In a sense, they are like the ears of the neuron.
• Soma (cell body) − It is the cell body of the neuron, which contains the nucleus, and is responsible for
processing the information received from the dendrites.
• Axon − It is like a cable through which the neuron sends information.
• Synapses − They are the connections between the axon and the dendrites of other neurons.
ANN versus BNN (neurobiological analogy)
Before taking a look at the differences between the Artificial Neural Network (ANN) and the Biological Neural
Network (BNN), let us take a look at the similarities in terminology between the two.

Neurobiological Analogy and Biological neuron equivalencies to artificial neuron model

1. Neuron in Biological Networks vs. Neuron in Artificial Networks:


• Biological Neuron: In the brain, a neuron receives inputs from other neurons through its dendrites,
processes this information in its cell body, and generates electrical signals (action potentials) that travel
down its axon. These signals are transmitted to other neurons through synapses using
neurotransmitters.
• Artificial Neuron: In artificial neural networks, a neuron (also known as a node or unit) receives
weighted inputs from previous layers, computes a weighted sum of these inputs, applies an activation
function, and produces an output. This output is then propagated to subsequent layers.
2. Synapses and Weights:
• Biological Synapse: In biological networks, synapses play a crucial role in transmitting information
between neurons using neurotransmitters. The strength of the connection between neurons is adjusted
over time through a process called synaptic plasticity.
• Artificial Weights: In artificial networks, weights represent the strength of connections between
neurons. These weights are learned during the training process, where the network adjusts them to
minimize the difference between predicted and actual outputs.
3. Activation Function and Firing:
• Biological Firing: A biological neuron fires an action potential when the sum of its inputs crosses a
certain threshold. The action potential travels down the axon to transmit the signal.
• Artificial Activation: An artificial neuron's activation function determines whether it will "fire"
(produce an output) based on the weighted sum of its inputs. Common activation functions include the
step function, sigmoid, and ReLU.
4. Layers and Networks:
• Biological Network: Biological neurons are organized into complex networks that process information
and perform various functions.
• Artificial Network: Artificial neural networks consist of layers of interconnected artificial neurons. The
architecture of the network, including the number of layers and neurons per layer, is determined by the
problem being solved.
5. Learning and Plasticity:
• Biological Learning: Biological neural networks exhibit plasticity, allowing them to adapt and learn
from experiences. Synaptic strength can be adjusted through long-term potentiation (LTP) and long-
term depression (LTD).
• Artificial Learning: In artificial networks, learning involves adjusting the weights to minimize the error
between predicted and actual outputs. Backpropagation is a common method used for this purpose.
6. Hierarchy and Representation:
• Biological Hierarchy: In the brain, neurons are organized into hierarchical structures, allowing for the
representation of increasingly complex concepts and patterns.
• Artificial Representation: Artificial networks with multiple layers can learn hierarchical features and
representations from data, enabling them to capture intricate patterns.

Biological Neural Network (BNN)    Artificial Neural Network (ANN)
Soma                               Node
Dendrites                          Input
Synapse                            Weights or interconnections
Axon                               Output
The following table shows the comparison between ANN and BNN based on some criteria mentioned.
Evolution of Neural Network:
ANN during 1940s to 1960s
Some key developments of this era are as follows −
• 1943 − It has been assumed that the concept of neural network started with the work of physiologist,
Warren McCulloch, and mathematician, Walter Pitts, when in 1943 they modeled a simple neural
network using electrical circuits in order to describe how neurons in the brain might work.
• 1949 − Donald Hebb’s book, The Organization of Behavior, put forth the idea that repeated activation
of one neuron by another strengthens the connection between them each time it is used.
• 1956 − An associative memory network was introduced by Taylor.
• 1958 − A learning method for McCulloch and Pitts neuron model named Perceptron was invented by
Rosenblatt.
• 1960 − Bernard Widrow and Marcian Hoff developed models called "ADALINE" and “MADALINE.”
ANN during 1960s to 1980s
Some key developments of this era are as follows −
• 1961 − Rosenblatt proposed a “backpropagation” scheme for multilayer networks, although his attempt
was unsuccessful.
• 1964 − Taylor constructed a winner-take-all circuit with inhibitions among output units.
• 1969 − Minsky and Papert published Perceptrons, which analyzed the limitations of single-layer perceptrons.
• 1971 − Kohonen developed Associative memories.
• 1976 − Stephen Grossberg and Gail Carpenter developed Adaptive resonance theory.
ANN from 1980s till Present
Some key developments of this era are as follows −
• 1982 − The major development was Hopfield’s Energy approach.
• 1985 − Boltzmann machine was developed by Ackley, Hinton, and Sejnowski.
• 1986 − Rumelhart, Hinton, and Williams introduced the Generalised Delta Rule.
• 1988 − Kosko developed Bidirectional Associative Memory (BAM) and also gave the concept of Fuzzy Logic in ANN.

Activation Functions

The activation function of a neuron defines its output given its inputs. This section covers some popular
activation functions:

1. Sigmoid Function:

Description: Takes a real-valued number and squashes it into the range between 0 and 1. Large negative inputs map
close to 0 and large positive inputs map close to 1.

Formula: 1 / (1 + e^-x)

Range: (0, 1)

Pros: As its range is between 0 and 1, it is well suited to situations where we need to predict the probability of an
event as an output.

Cons: The gradient is significant roughly for inputs between -3 and 3 but approaches zero outside this range, which
almost eliminates the neuron's influence on the final output during learning. Also, sigmoid outputs are not
zero-centered (they are centred around 0.5), which leads to undesirable zig-zagging dynamics in the gradient updates
for the weights.

Plot:
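
The formula and the saturation behaviour described above can be checked numerically. The following is a minimal sketch; the function names are illustrative, and the split on the sign of x is only an assumption made to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-6.0, -3.0, 0.0, 3.0, 6.0):
    print(x, round(sigmoid(x), 4), round(sigmoid_grad(x), 4))
```

The gradient peaks at 0.25 for x = 0 and shrinks quickly as |x| grows, which is the saturation described under Cons.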
2. Tanh Function:

Description: Similar to sigmoid, but takes a real-valued number and scales it between -1 and 1. It is often preferred
over sigmoid because it is centred around 0, which leads to better convergence.

Formula: (e^x - e^-x) / (e^x + e^-x)

Range: (-1, 1)

Pros: The derivatives of tanh are larger than the derivatives of the sigmoid, which helps us minimize the cost
function faster.

Cons: Similar to sigmoid, the gradient values become close to zero for a wide range of inputs (this is known as the
vanishing gradient problem). Thus, the network stops learning or keeps learning at a very small rate.

Plot:
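
As a quick numerical check of the claims above, the sketch below compares the derivative of tanh with that of the sigmoid and verifies the identity tanh(x) = 2 * sigmoid(2x) - 1. The helper names are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


# At x = 0 the tanh gradient is 1.0 while the sigmoid gradient is 0.25.
print(tanh_grad(0.0), sigmoid_grad(0.0))

# tanh is a rescaled, zero-centred sigmoid.
x = 1.0
print(math.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)
```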

3. Softmax Function:

Description: The softmax function can be thought of as a combination of multiple sigmoids. It returns the probability
of a datapoint belonging to each individual class in a multiclass classification problem.

Formula: softmax(x_i) = e^(x_i) / (e^(x_1) + e^(x_2) + ... + e^(x_n))

Range: (0, 1), with the outputs summing to 1

Pros: Can handle multiple classes and give the probability of belonging to each class.

Cons: Should not be used in hidden layers, where we want the neurons to be independent. Softmax outputs always sum
to 1, so they are linearly dependent.

Plot:
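
A minimal softmax sketch follows. Subtracting the maximum score before exponentiating is a common numerical-stability trick and an assumption added here; it does not change the result. The example scores are made up.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # shift by the maximum to keep exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # hypothetical class scores
print(probs, sum(probs))
```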

4. ReLU Function:

Description: The rectified linear activation function, or ReLU for short, is a piecewise linear function that outputs
the input directly if it is positive and outputs zero otherwise. Variants use a non-zero threshold or a small non-zero
multiple of the input for values below the threshold (the latter is called Leaky ReLU).

Formula: max(0, x)

Range: [0, inf)

Pros: Although ReLU looks and acts like a linear function, it is nonlinear, which allows complex relationships to be
learned. Its derivative is 1 for all positive inputs, so gradients can flow through all the hidden layers of a deep
network.

Cons: It should not be used in the final output layer for either classification or regression tasks.

Plot:
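
ReLU and its leaky variant can be sketched in a few lines. The slope value 0.01 is a commonly used default and an assumption here, not a value given in the text.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: a small non-zero multiple of the input for negative values."""
    return x if x > 0 else slope * x


for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x))
```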

"Synchronous activation" and "asynchronous activation" are terms that refer to how neurons in a neural network
update their activation state in response to input signals. These concepts are especially relevant when considering
the dynamics of recurrent neural networks (RNNs), where neurons can have feedback connections that influence
their own future activations. Let's explore these terms and their implications:
1. Synchronous Activation:
• In synchronous activation, all neurons in the network update their activation states simultaneously in
discrete time steps.
• Neurons process their inputs and calculate their new activations based on the inputs received from the
previous time step.
• This approach simplifies computation and can be easier to implement, but it might not capture certain
temporal dynamics as effectively.
• It's common in feedforward neural networks and some types of RNNs.
2. Asynchronous Activation:
• In asynchronous activation, neurons update their activation states individually and asynchronously
based on their internal state and incoming inputs.
• Neurons might not all update at the same time step; instead, they update whenever they receive input
that crosses their activation threshold.
• This approach allows for richer temporal dynamics and can better capture certain time-sensitive
behaviors, making it more biologically plausible.
• Asynchronous activation is often used in networks that require precise timing, such as spiking neural
networks or more complex RNN architectures.
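
The difference between the two update schemes can be illustrated on a tiny recurrent network of two mutually inhibiting threshold neurons. This is a simplified sketch under stated assumptions: the weight matrix, the bipolar threshold, and the fixed update order used for the asynchronous case are all illustrative choices, not part of the course material.

```python
def threshold(z):
    """Bipolar threshold unit: +1 for non-negative net input, otherwise -1."""
    return 1 if z >= 0 else -1


def synchronous_step(state, weights):
    """All neurons update at once, each reading the previous state."""
    return [threshold(sum(w * s for w, s in zip(row, state))) for row in weights]


def asynchronous_sweep(state, weights, order):
    """Neurons update one at a time; later updates see earlier ones."""
    state = list(state)
    for i in order:
        state[i] = threshold(sum(w * s for w, s in zip(weights[i], state)))
    return state


W = [[0, -1], [-1, 0]]  # hypothetical weights: each neuron inhibits the other
start = [1, 1]

print(synchronous_step(start, W))            # both neurons flip together
print(asynchronous_sweep(start, W, [0, 1]))  # neuron 0 flips first, neuron 1 then stays
```

With these weights the synchronous network oscillates between two states, while the asynchronous sweep settles into a stable state.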
Order of activation:

Activation functions for Hidden layers:

There are perhaps three activation functions you may want to consider for use in hidden layers; they are:

• Rectified Linear Activation (ReLU)


• Logistic (Sigmoid)
• Hyperbolic Tangent (Tanh)
This is not an exhaustive list of activation functions used for hidden layers, but they are the most commonly used.

Activation for Output Layers

The output layer is the layer in a neural network model that directly outputs a prediction.

All feed-forward neural network models have an output layer.

There are perhaps three activation functions you may want to consider for use in the output layer; they are:

• Linear
• Logistic (Sigmoid)
• Softmax
This is not an exhaustive list of activation functions used for output layers, but they are the most commonly used.

Threshold functions

Threshold functions, also known as activation functions, play a critical role in artificial neural networks. These
functions determine whether a neuron should be activated (produce an output) based on the total input it receives.
They introduce non-linearity to the network, allowing it to learn and represent complex relationships in data. Here
are some common types of threshold functions:
Step Function:
Description: One of the simplest threshold functions. It outputs 1 when the input exceeds a certain threshold and 0
otherwise. Due to its discontinuous nature, it's not commonly used in modern neural networks.
Sigmoid Function:

Description: A smooth, S-shaped curve that maps input values to the range (0, 1). It was popular in the past as an
activation function, but its vanishing gradient problem and convergence issues for deep networks have led to its
reduced usage.
Hyperbolic Tangent (Tanh) Function:

Description: Similar to the sigmoid function but centered around zero, producing outputs in the range (-1, 1). It can
suffer from vanishing gradient problems like the sigmoid.
Rectified Linear Unit (ReLU):

Description: Widely used in modern neural networks. It outputs the input as is if it's positive; otherwise, it
outputs zero. ReLU effectively mitigates vanishing gradient issues and promotes sparse activations.
Leaky ReLU:

Description: A variant of ReLU that allows a small gradient for negative inputs, addressing the "dying ReLU"
problem.
Parametric ReLU (PReLU):

Description: Similar to Leaky ReLU but with the slope for negative inputs being learned during training.
Exponential Linear Unit (ELU)
Description: Combines linearity for positive inputs with smoothness for negative inputs. Helps mitigate vanishing
gradient issues and supports negative values.
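
Two of the functions described above, the step function and ELU, can be sketched as follows. The default threshold of 0 and the ELU parameter alpha = 1.0 are assumptions chosen for illustration.

```python
import math


def step(x, threshold=0.0):
    """Step function: 1 when the input exceeds the threshold, otherwise 0."""
    return 1 if x > threshold else 0


def elu(x, alpha=1.0):
    """ELU: linear for positive inputs, smooth exponential curve for negative ones."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


for x in (-5.0, -1.0, 0.0, 2.0):
    print(x, step(x), elu(x))
```

For negative inputs ELU approaches -alpha instead of dropping to zero, which is how it supports negative values.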

Signum function

The signum function gives the sign of a given value x. For x greater than zero the output is +1, for x less than
zero the output is -1, and for x equal to zero the output is 0. The signum function can be written as:
sgn(x) = +1 if x > 0, 0 if x = 0, -1 if x < 0.

Domain and Range of Signum Function


The domain of the signum function is all real numbers, i.e. R, and its range is the set {-1, 0, 1}.
Properties of signum function

For a real number x, the function sgn x is defined by:
sgn x = 1 if x > 0, sgn x = -1 if x < 0, and sgn x = 0 otherwise.

Solved Examples:
Example 1: Find the result for the values of x, using the signum function

x = {- 4.93, - 7.66, 12, 0, 4.2, 2.33333, -8.10}


Solution: Here we use the following signum function to find the output, for the input values of x.

x = {- 4.93, - 7.66, 12, 0, 4.2, 2.33333, -8.10}

Output = {-1,-1,+1,0,+1,+1,-1}
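
The solved example can be reproduced with a short function. This is a minimal sketch; the name sgn follows the notation used in the text.

```python
def sgn(x):
    """Signum: +1 for positive x, -1 for negative x, 0 for zero."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0


xs = [-4.93, -7.66, 12, 0, 4.2, 2.33333, -8.10]
outputs = [sgn(x) for x in xs]
print(outputs)  # [-1, -1, 1, 0, 1, 1, -1]
```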

Applications of Signum Function


Signum function has various applications in different fields. Some of its applications are:
• It is used to find the sign of the real number
• It helps to project a complex number on the unit circle
• A positively or negatively inclined line with the X-axis is obtained when the signum function is
integrated
• It can also be used to predict the probability of the occurrence of an event.
• It is also used for implementing the on and off switch in electronic devices.
• It also finds applications in a thermostat such that the system starts cooling above a specific
temperature and stops cooling below a specific temperature.

Sigmoid Function:

Description: Takes a real-valued number and squashes it into the range between 0 and 1. Large negative inputs map
close to 0 and large positive inputs map close to 1.

Formula: 1 / (1 + e^-x)

Range: (0, 1)

Pros: As its range is between 0 and 1, it is well suited to situations where we need to predict the probability of an
event as an output.

Cons: The gradient is significant roughly for inputs between -3 and 3 but approaches zero outside this range, which
almost eliminates the neuron's influence on the final output during learning. Also, sigmoid outputs are not
zero-centered (they are centred around 0.5), which leads to undesirable zig-zagging dynamics in the gradient updates
for the weights.

Plot:

Applications

The sigmoid function's ability to transform any real number to one between 0 and 1 is advantageous in data
science and many other fields such as:

• In deep learning, as a non-linear activation function within neurons in artificial neural networks, to allow the
network to learn non-linear relationships in the data.
• In binary classification with logistic regression, where the sigmoid function is used to predict the probability
of a binary variable.

Issues with the sigmoid function

Although the sigmoid function is prevalent in the context of gradient descent, the gradient of the sigmoid
function is in some cases problematic. The gradient vanishes to zero for very low and very high input values,
making it hard for some models to improve.

For example, during backpropagation in deep learning, the gradient of a sigmoid activation function is used to
update the weights & biases of a neural network. If these gradients are tiny, the updates to the weights &
biases are tiny and the network will not learn.

As an alternative, other non-linear functions such as the Rectified Linear Unit (ReLU) are used, whose gradient
does not vanish for positive inputs.

Hyperbolic Tanh Function:

Description: Similar to sigmoid, but takes a real-valued number and scales it between -1 and 1. It is often preferred
over sigmoid because it is centred around 0, which leads to better convergence.

Formula: (e^x - e^-x) / (e^x + e^-x)

Range: (-1, 1)

Pros: The derivatives of tanh are larger than the derivatives of the sigmoid, which helps us minimize the cost
function faster.

Cons: Similar to sigmoid, the gradient values become close to zero for a wide range of inputs (this is known as the
vanishing gradient problem). Thus, the network stops learning or keeps learning at a very small rate.

Plot:
1. Stochastic Function: A stochastic function, also known as a random function, is a mathematical function that
introduces an element of randomness or uncertainty into its output. In other words, when you apply a
stochastic function to the same input multiple times, you may get different outcomes each time due to the
random nature of the function. Stochastic functions are often used in probabilistic models, simulations, and
scenarios where variability is a key factor.

2. Ramp Function: A ramp function, also called a unit ramp function or simply a ramp, is a mathematical
function that increases linearly with its input, starting from a specified point (usually the origin where the
input is zero). The ramp function is defined as follows:

ramp(x) = max(0, x)
In this definition, the ramp function outputs zero when the input x is negative and increases linearly for positive
values of x.
3. Linear Function: A linear function is a type of mathematical function that has a constant rate of change. It
represents a straight-line relationship between the input and the output. A linear function can be expressed as:
f(x) = ax + b
Where a is the slope (rate of change) of the line, and b is the y-intercept (the point where the line intersects the y-
axis).
4. Identity Function: The identity function is a simple mathematical function that returns its input unchanged.
In other words, for any input x, the identity function outputs the same value x. It is often denoted as:

f(x) = x
The identity function is useful in various mathematical contexts and can serve as a baseline or reference when
comparing other functions.
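
The four functions above can be written out directly. The sketch below is illustrative: in particular, modelling the stochastic function as a unit that outputs 1 with probability sigmoid(x) is an assumption, one common choice among many, and the helper names are made up.

```python
import math
import random


def ramp(x):
    """Ramp: zero for negative inputs, linear for positive inputs."""
    return max(0.0, x)


def linear(x, a=1.0, b=0.0):
    """Linear function f(x) = a*x + b with slope a and intercept b."""
    return a * x + b


def identity(x):
    """Identity: returns the input unchanged."""
    return x


def stochastic_binary(x, rng):
    """Assumed stochastic unit: outputs 1 with probability sigmoid(x)."""
    p = 1.0 / (1.0 + math.exp(-x))
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [stochastic_binary(0.0, rng) for _ in range(1000)]
print(ramp(-3.0), ramp(2.5), linear(2.0, a=3.0, b=1.0), identity(-4))
print(sum(samples) / len(samples))  # fraction of 1s, close to 0.5 for x = 0
```

The same input gives different outputs across calls to stochastic_binary, while the other three functions are deterministic.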

Types of Neural Networks and Definition of Neural Network

By Great Learning Team, updated on Nov 23, 2022


This blog is custom-tailored to aid your understanding of different types of commonly used neural networks, how
they work, and their industry applications. The blog commences with a brief introduction to the working of neural
networks. We have tried to keep it very simple yet effective.
Types of neural networks models are listed below:

The nine types of neural networks are:

• Perceptron
• Feed Forward Neural Network
• Multilayer Perceptron
• Convolutional Neural Network
• Radial Basis Functional Neural Network
• Recurrent Neural Network
• LSTM – Long Short-Term Memory
• Sequence to Sequence Models
• Modular Neural Network

An Introduction to Artificial Neural Network

Neural networks are the basis of deep learning, a subfield of artificial intelligence. Certain application scenarios
are too heavy or out of scope for traditional machine learning algorithms to handle. Neural networks step in for such
scenarios and fill the gap.

Artificial neural networks are inspired by the biological neurons within the human body which activate under certain
circumstances resulting in a related action performed by the body in response. Artificial neural nets consist of
various layers of interconnected artificial neurons powered by activation functions that help in switching them
ON/OFF. Like traditional machine learning algorithms, neural nets learn certain values during the training
phase.

Briefly, each neuron multiplies its inputs by weights (typically initialised randomly), adds a bias value, and passes
the result to an appropriate activation function, which decides the final value given out by the neuron. Various
activation functions are available, chosen according to the nature of the input values. Once the output is generated
from the final neural net layer, a loss function (predicted output vs target output) is calculated, and
backpropagation is performed, in which the weights are adjusted to minimise the loss. Finding optimal values of the
weights is what the overall operation revolves around.

Weights are numeric values that are multiplied by inputs. In backpropagation, they are modified to reduce the loss.
In simple words, weights are the machine-learned values of a neural network. They self-adjust depending on the
difference between the predicted outputs and the target outputs.
Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.

• Input layer represents dimensions of the input vector.


• Hidden layer represents the intermediary nodes that divide the input space into regions with (soft)
boundaries. It takes in a set of weighted input and produces output through an activation function.
• Output layer represents the output of the neural network.
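
The flow described above (weighted inputs, bias, activation, loss, weight adjustment) can be sketched for a single neuron. This is a minimal illustration, not a full network: the sigmoid activation, the squared-error loss, the learning rate and all starting values are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_forward(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train_step(inputs, target, weights, bias, lr=0.5):
    """One gradient-descent step on the loss 0.5 * (y - target)^2."""
    y = neuron_forward(inputs, weights, bias)
    # Chain rule: dLoss/dz = (y - target) * sigmoid'(z) = (y - target) * y * (1 - y)
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


inputs, target = [1.0, 0.5], 1.0
weights, bias = [0.1, -0.2], 0.0  # assumed starting values

loss_before = 0.5 * (neuron_forward(inputs, weights, bias) - target) ** 2
for _ in range(100):
    weights, bias = train_step(inputs, target, weights, bias)
loss_after = 0.5 * (neuron_forward(inputs, weights, bias) - target) ** 2
```

After the repeated updates the loss is lower than before, which is what adjusting the weights to reduce the loss means in practice.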
Types of Neural Networks

There are many types of neural networks, some established and some still in the development stage. They can be
classified by their structure, data flow, the neurons used and their density, the number and depth of layers,
activation filters, etc.
We are going to discuss the following neural networks:

A. Perceptron

The perceptron model, proposed by Frank Rosenblatt and later analysed by Minsky and Papert, is one of the simplest
and oldest models of a neuron. It is the smallest unit of a neural network and performs certain computations to
detect features in the input data. It accepts weighted inputs and applies an activation function to obtain the final
output. The perceptron is also known as a TLU (threshold logic unit).

Perceptron is a supervised learning algorithm that classifies the data into two categories, thus it is a binary classifier.
A perceptron separates the input space into two categories by a hyperplane represented by the following equation:

∑wi*xi + b = 0

Advantages of Perceptron
Perceptrons can implement Logic Gates like AND, OR, or NAND.

Disadvantages of Perceptron
Perceptrons can only learn linearly separable problems, such as the boolean AND problem. They do not work for
problems that are not linearly separable, such as the boolean XOR problem.
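
The advantage and the limitation above can both be checked with a short sketch. The weights for AND, OR and NAND are hand-picked illustrative values, not learned ones, and the search over a coarse grid of weights is only a demonstration (not a proof) that no single perceptron reproduces XOR.

```python
from itertools import product


def perceptron(x1, x2, w1, w2, b):
    """Threshold logic unit: outputs 1 when w1*x1 + w2*x2 + b >= 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked (w1, w2, b) values for each gate (assumptions for illustration).
GATES = {"AND": (1, 1, -1.5), "OR": (1, 1, -0.5), "NAND": (-1, -1, 1.5)}
truth = {
    name: [perceptron(x1, x2, *params) for x1, x2 in INPUTS]
    for name, params in GATES.items()
}

# Try every (w1, w2, b) on a grid from -3.0 to 3.0 in steps of 0.5 for XOR.
grid = [i / 2 for i in range(-6, 7)]
xor_target = [0, 1, 1, 0]
xor_solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if [perceptron(x1, x2, w1, w2, b) for x1, x2 in INPUTS] == xor_target
]
```

The list `xor_solutions` stays empty, while each of the three gates is reproduced by one line (hyperplane) in the input space.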
B. Feed Forward Neural Networks

Applications of Feed Forward Neural Networks:

• Simple classification (where traditional machine-learning-based classification algorithms have limitations)
• Face recognition [Simple straight forward image processing]
• Computer vision [Where target classes are difficult to classify]
• Speech Recognition
This is the simplest form of neural network, where input data travels in one direction only, passing through
artificial neural nodes and exiting through output nodes. Input and output layers are always present; hidden layers
may or may not be. Based on this, they can be further classified as single-layered or multi-layered feed-forward
neural networks.

The number of layers depends on the complexity of the function. In the basic form described here, propagation is
uni-directional (forward only), with no backward propagation, and the weights are static. Inputs are multiplied by
weights and fed to an activation function, typically a classifying (step) activation function. For example, the
neuron is activated if the weighted sum is above the threshold (usually 0), and it produces 1 as an output. The neuron
is not activated if the weighted sum is below the threshold, and the output is then taken as -1. Such networks are
fairly simple to maintain and are equipped to deal with data which contains a lot of noise.
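
The forward-only pass with a step activation can be sketched as follows. The layer sizes, weights and biases are made-up values for illustration, and treating a sum exactly equal to the threshold as "not activated" is an assumption, since the text does not specify that case.

```python
def step(z, threshold=0.0):
    """Bipolar step activation: 1 above the threshold, otherwise -1."""
    return 1 if z > threshold else -1


def layer_forward(inputs, weights, biases):
    """weights[j] holds the incoming weights of neuron j in this layer."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Pass the data through each layer in turn; nothing flows backwards."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Static, hand-chosen weights (illustrative only).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.25]),  # hidden layer with 2 neurons
    ([[1.0, 1.0]], [-0.5]),                     # output layer with 1 neuron
]
out_a = feed_forward([1.0, 0.0], layers)
out_b = feed_forward([0.0, 1.0], layers)
```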

Advantages of Feed Forward Neural Networks

1. Less complex, easy to design & maintain


2. Fast and speedy [One-way propagation]
3. Highly responsive to noisy data
Disadvantages of Feed Forward Neural Networks:

1. Cannot be used for deep learning [due to absence of dense layers and back propagation]
C. Multilayer Perceptron

Applications on Multi-Layer Perceptron

• Speech Recognition
• Machine Translation
• Complex Classification
An entry point towards complex neural nets, where input data travels through various layers of artificial
neurons. Every single node is connected to all neurons in the next layer, which makes it a fully connected neural
network. Input and output layers are present along with one or more hidden layers, i.e. at least three layers in
total. It has bi-directional propagation, i.e. forward propagation and backward propagation.

Inputs are multiplied with weights and fed to the activation function, and in backpropagation the weights are
modified to reduce the loss. In simple words, weights are the machine-learnt values of a neural network. They
self-adjust depending on the difference between the predicted outputs and the target outputs. Nonlinear activation
functions are used in the hidden layers, followed by softmax as the output layer activation function.
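
A forward pass through a small fully connected network with a softmax output can be sketched as below. The choice of ReLU for the hidden layer, the layer sizes and all weights are assumptions made for the example; backpropagation is not shown.

```python
import math


def relu(z):
    return max(0.0, z)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = [relu(z) for z in dense(x, hidden_w, hidden_b)]
    return softmax(dense(hidden, out_w, out_b))


probs = mlp_forward(
    [1.0, 2.0],
    hidden_w=[[0.5, -0.5], [0.25, 0.75]], hidden_b=[0.0, 0.1],
    out_w=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], out_b=[0.0, 0.0, 0.0],
)
predicted_class = probs.index(max(probs))
```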
Advantages on Multi-Layer Perceptron

1. Used for deep learning [due to the presence of dense fully connected layers and back propagation]
Disadvantages on Multi-Layer Perceptron:

1. Comparatively complex to design and maintain
2. Comparatively slow (depends on number of hidden layers)

D. Convolutional Neural Network

Applications on Convolution Neural Network

• Image processing
• Computer Vision
• Speech Recognition
• Machine translation
A convolutional neural network contains a three-dimensional arrangement of neurons instead of the standard
two-dimensional array. The first layer is called a convolutional layer. Each neuron in the convolutional layer only
processes the information from a small part of the visual field. Input features are taken in patch by patch, like a
filter sliding over the image. The network understands the image in parts and can repeat these operations multiple
times to complete the full image processing. Processing involves conversion of the image from RGB or HSI scale to
grey-scale. Changes in the pixel value then help to detect the edges, and images can be classified into different
categories.

Propagation is uni-directional where the CNN contains one or more convolutional layers followed by pooling, and
bidirectional where the output of the convolution layer goes to a fully connected neural network for classifying the
images. Filters are used to extract certain parts of the image. In the MLP part, the inputs are multiplied with
weights and fed to the activation function. The convolution layers use ReLU, and the MLP uses a nonlinear activation
function followed by softmax. Convolutional neural networks show very effective results in image and video
recognition, semantic parsing and paraphrase detection.
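
How a filter picks out edges from changes in pixel value can be sketched with a plain-Python convolution. The 4x4 grey-scale image and the 2x2 vertical-edge filter are made-up values for illustration; real CNNs learn their filter values during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_map(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


# Grey-scale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to a dark-to-bright change
feature_map = relu_map(conv2d(image, vertical_edge))
```

The feature map is non-zero only in the middle column, where the pixel value changes, i.e. at the edge.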

Advantages of Convolution Neural Network:

1. Used for deep learning with few parameters
2. Fewer parameters to learn as compared to a fully connected layer
Disadvantages of Convolution Neural Network:

• Comparatively complex to design and maintain


• Comparatively slow [depends on the number of hidden layers]
E. Radial Basis Function Neural Networks

Radial Basis Function Network consists of an input vector followed by a layer of RBF neurons and an output layer
with one node per category. Classification is performed by measuring the input’s similarity to data points from the
training set where each neuron stores a prototype. This will be one of the examples from the training set.

When a new input vector [the n-dimensional vector that you are trying to classify] needs to be classified, each
neuron calculates the Euclidean distance between the input and its prototype. For example, if we have two classes,
class A and class B, and the new input is closer to the class A prototypes than to the class B prototypes, it is
tagged or classified as class A.

Each RBF neuron compares the input vector to its prototype and outputs a value from 0 to 1 which is a measure of
similarity. When the input equals the prototype, the output of that RBF neuron is 1, and as the distance between the
input and the prototype grows, the response falls off exponentially towards 0. The curve of the neuron's response is
a typical bell curve. The output layer consists of a set of neurons [one per category].
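
The behaviour of an RBF neuron can be sketched with a Gaussian response. The width parameter beta, the prototypes and the rule of summing responses per class with equal weights are simplifying assumptions; in a trained RBF network the output layer would use learned weights.

```python
import math


def rbf_response(x, prototype, beta=1.0):
    """Gaussian RBF: 1 when x equals the prototype, falling towards 0 with distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-beta * squared_distance)


def classify(x, prototypes):
    """prototypes is a list of (label, vector); the class with the largest total response wins."""
    scores = {}
    for label, vector in prototypes:
        scores[label] = scores.get(label, 0.0) + rbf_response(x, vector)
    return max(scores, key=scores.get)


prototypes = [("A", [0, 0]), ("A", [1, 0]), ("B", [5, 5]), ("B", [6, 5])]
label = classify([0.5, 0.5], prototypes)
```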
Application: Power Restoration
a. Powercut P1 needs to be restored first
b. Powercut P3 needs to be restored next, as it impacts more houses
c. Powercut P2 should be fixed last as it impacts only one house

F. Recurrent Neural Networks


Applications of Recurrent Neural Networks

• Text processing like auto suggest, grammar checks, etc.


• Text to speech processing
• Image tagger
• Sentiment Analysis
• Translation

A Recurrent Neural Network is designed to save the output of a layer and feed it back to the input to
help in predicting the outcome of the layer. The first layer is typically a feed forward neural network,
followed by a recurrent neural network layer in which some information from the previous time-step
is remembered by a memory function. Forward propagation is implemented in this case, and the network
stores information required for its future use. If the prediction is wrong, the learning rate is used to
make small changes during backpropagation, so that the network gradually moves towards making the
right prediction.
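
The idea of remembering information from the previous time-step can be sketched with a scalar recurrent cell. The tanh activation and the weight values are assumptions for illustration; training by backpropagation is not shown.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """New state = tanh(w_x * current input + w_h * previous state + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # the memory starts empty
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


states_with_signal = run_rnn([1.0, 0.0, 0.0])
states_silent = run_rnn([0.0, 0.0, 0.0])
```

In the first sequence the state stays above zero after the input has returned to 0, because the earlier input is carried forward through the state; in the silent sequence the state stays at 0.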
Advantages of Recurrent Neural Networks

1. Can model sequential data, where each sample can be assumed to be dependent on historical ones.
2. Used with convolution layers to extend the pixel effectiveness.
Disadvantages of Recurrent Neural Networks

1. Gradient vanishing and exploding problems


2. Training recurrent neural nets could be a difficult task
3. Difficult to process long sequential data using ReLU as an activation function.
Improvement over RNN: LSTM (Long Short-Term Memory) Networks

LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a
'memory cell' that can maintain information in memory for long periods of time. A set of gates is used to control
when information enters the memory, when it is output, and when it is forgotten. There are three types of gates:
input gate, output gate and forget gate. The input gate decides how much information from the latest sample will be
kept in memory; the output gate regulates the amount of data passed to the next layer; and the forget gate controls
the rate at which stored memory is discarded. This architecture lets them learn longer-term dependencies.
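
The role of the three gates can be sketched with a scalar LSTM cell. This follows one common formulation of the cell; the parameter layout and the extreme bias values used to force the gates open or closed are assumptions chosen to make the effect visible.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time-step of a scalar LSTM cell.

    params maps "forget", "input", "output" and "candidate"
    to a (w_x, w_h, bias) tuple.
    """
    def unit(name, squash):
        w_x, w_h, bias = params[name]
        return squash(w_x * x_t + w_h * h_prev + bias)

    forget = unit("forget", sigmoid)         # how much old memory to keep
    write = unit("input", sigmoid)           # how much new information to store
    expose = unit("output", sigmoid)         # how much memory to pass on
    candidate = unit("candidate", math.tanh)

    c_t = forget * c_prev + write * candidate
    h_t = expose * math.tanh(c_t)
    return h_t, c_t


# Forget gate pushed open and input gate pushed shut: the memory is preserved.
keep = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
        "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
# Same cell with the forget gate pushed shut: the memory is erased.
wipe = dict(keep, forget=(0.0, 0.0, -10.0))

_, c_kept = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.7, params=keep)
_, c_wiped = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.7, params=wipe)
```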

This is one of the implementations of LSTM cells, many other architectures exist.
G. Sequence to sequence models

A sequence to sequence model consists of two Recurrent Neural Networks: an encoder that processes the input and a
decoder that produces the output. The encoder and decoder work together, using either the same parameters or
different ones. This model, in contrast to a plain RNN, is particularly applicable in cases where the length of the
input data is not equal to the length of the output data. While they share the benefits and limitations of the RNN,
these models are mainly applied in chatbots, machine translation, and question answering systems.

Unit 2

McCulloch and Pitts Neural Network (MCP Model): Architecture

1. McCulloch-Pitts Model of Neuron


The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs:
excitatory and inhibitory. The excitatory inputs have weights of positive magnitude and the inhibitory
inputs have weights of negative magnitude. The inputs of the McCulloch-Pitts neuron can be either 0
or 1. It has a threshold function as its activation function, so the output signal y_out is 1 if the net
input y_sum is greater than or equal to a given threshold value, else 0. The diagrammatic representation
of the model is as follows:

McCulloch-Pitts Model

Simple McCulloch-Pitts neurons can be used to design logical operations.

Different parts of McCulloch-Pitts Neuron Model


McCulloch-Pitts neuron model’s anatomy is very simple and has following parts –

1. Neuron
2. Excitatory Input
3. Inhibitory Input
4. Output

1. Neuron
Neuron is a computational unit which has incoming input signals. The input signals are computed and an output is fired.
The neuron further consists of following two elements –

• Summation Function
This simply calculates the sum of the incoming (excitatory) inputs.

• Activation Function
In this case the activation function is the step function, which checks whether the summation is greater than or equal
to a preset threshold value. If yes, the neuron fires (i.e. output = 1); if not, the neuron does not fire (i.e. output = 0).

• Neuron fires: Output = 1, if Summation >= Threshold
• Neuron does not fire: Output = 0, if Summation < Threshold

2. Excitatory Input
This is an incoming binary signal to the neuron, which can have only two values, 0 or 1. The value of 0 indicates that
the input is off, whereas the value of 1 indicates that the input is on.

3. Inhibitory Input
This is another type of input signal to the neuron. If this input is on, it will not allow the neuron to fire, even if
there are other excitatory inputs which are on.

4. Output
This is simply the output of the neuron which again can take only binary values of 0 or 1. The value of 0 indicates that
the neuron does not fire, the value of 1 indicates the neuron does fire.
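
The four parts described above can be put together in a short sketch. The function name and the threshold values chosen for each logical operation are illustrative assumptions.

```python
def mcp_neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts neuron with binary inputs and a binary output."""
    if any(inhibitory):
        return 0  # an active inhibitory input prevents firing
    return 1 if sum(excitatory) >= threshold else 0


PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]

and_out = [mcp_neuron([a, b], [], threshold=2) for a, b in PAIRS]
or_out = [mcp_neuron([a, b], [], threshold=1) for a, b in PAIRS]
# "A AND NOT B": A is excitatory, B is inhibitory.
a_and_not_b = [mcp_neuron([a], [b], threshold=1) for a, b in PAIRS]
```

Changing only the threshold turns the same neuron from an AND gate into an OR gate, and adding an inhibitory input gives the "A AND NOT B" operation.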
Hebb Model
• For a neural net, the Hebb learning rule is a simple one.
• Donald Hebb stated in 1949 that in the brain, learning is performed by a change in the synaptic
gap (strength). Hebb explained it:
• "When an axon of cell A is near enough to excite cell B, and repeatedly or persistently
takes part in firing it, some growth process or metabolic change takes place in one or both
the cells such that A's efficiency, as one of the cells firing B, is increased."
• According to the Hebb rule, the weight vector is found to increase proportionately to the product
of the input and the learning signal. Here the learning signal is equal to the neuron's output.
wi(new) = wi(old) + xi*y
The Hebb rule is more suited for bipolar data than binary data. If binary data is used, the above weight
updation formula cannot distinguish two conditions namely;
1.A training pair in which an input unit is "on" and target value is "off."
2. A training pair in which both the input unit and the target value are "off."
Thus, there are limitations in Hebb rule application over binary data. Hence, the representation
using bipolar data is advantageous.
Flowchart of Hebb Training algorithm
Training Algorithm
• Step 0: First initialize the weights. In this network they may be set to zero, i.e., wi = 0
for i = 1 to n, where "n" is the total number of input neurons.
• Step 1: Steps 2-4 have to be performed for each input training vector and target output pair, s : t.
• Step 2: Input unit activations are set. Generally, the activation function of the input layer is the identity
function: xi = si for i = 1 to n
• Step 3: Output unit activations are set: y = t
• Step 4: Weight adjustments and bias adjustments are performed:
   wi(new) = wi(old) + xi*y
   b(new) = b(old) + y
The above five steps complete the algorithmic process. In Step 4, the weight updation formula can
also be given in vector form as
   w(new) = w(old) + xy
Here the change in weight can be expressed as
   Δw = xy
As a result,
   w(new) = w(old) + Δw
The Hebb rule can be used for pattern association, pattern categorization, pattern classification
and over a range of other areas.

Designing a Hebb network to implement AND function:

Fig 3. Training data table

The AND function is simple and widely known: the output is 1/SET/ON only if both the inputs
are 1/SET/ON. In the above example we have used '-1' instead of '0' because the Hebb network
uses bipolar data and not binary data. With binary data, the product term in the above equations
would be 0 whenever an input or the target is 0, so no weight change would occur and the
calculation would come out wrong.
Starting with Step 0, which initializes the weights and bias to '0', we get
w1=w2=b=0

A) First input [x1,x2,b]=[1,1,1] and target/y = 1. Now using the initial weights as old weight
and applying the Hebb rule (ith value of w(new) = ith value of w(old) + (ith value of x * y))
as follow;

w1(new) = w1(old) + (x1*y) = 0+1 * 1 = 1

w2(new) = w2(old) + (x2*y) = 0+1 * 1 = 1

b(new) = b(old) + y = 0+1 =1

Now the above final weights act as the initial weight when the second input pattern is
presented. And remember that weight change here is;

Δith value of w = ith value of x * y

hence weight changes relating to the first input are;

Δw1= x1y = 1*1=1

Δw2 = x2y = 1*1=1

Δb = y = 1

We have processed the first input, and now we move on to the second input from the table (2nd row).
B) Second input [x1,x2,b]=[1,-1,1] and target/y = -1.

Note that the initial (old) weights here are the final (new) weights obtained after
presenting the first input pattern, i.e. [w1,w2,b] = [1,1,1]

Weight change here is;

Δw1 = x1*y = 1*-1 = -1

Δw2 =x2*y = -1 * -1 = 1

Δb = y = -1

The new weights here are;

w1(new) = w1(old) + Δw1= 1–1 = 0

w2(new) = w2(old) + Δw2= 1+1 = 2

b(new) = b(old) + Δb= 1–1=0

similarly, using the same process for third and fourth row we get a new table as follows;
Fig 4. Final output table

Here the final weights we get are w1=2, w2=2, b=-2
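
The whole calculation can be reproduced with a short sketch. Only the first two training rows are worked out in the text above; the third and fourth rows used here are the usual bipolar AND rows and are an assumption, as the training table itself is not reproduced.

```python
def hebb_train(samples):
    """Apply w(new) = w(old) + x*y and b(new) = b(old) + y once per sample."""
    w1 = w2 = b = 0
    history = []
    for (x1, x2), y in samples:
        w1 += x1 * y
        w2 += x2 * y
        b += y
        history.append((w1, w2, b))
    return history


# Bipolar AND training pairs: ((x1, x2), target)
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
history = hebb_train(and_samples)
w1, w2, b = history[-1]

# The sign of the net input should match the target for every row.
net_inputs = [w1 * x1 + w2 * x2 + b for (x1, x2), _ in and_samples]
```

The entries of `history` are the weights after each input pattern, and the last entry matches the final weights stated above.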

Hebb network for AND function

Perceptron model
Perceptron is a Machine Learning algorithm for supervised learning of binary classification tasks.
Further, a Perceptron is also understood as an artificial neuron, or neural network unit, that performs computations
on the input data to detect features.

The Perceptron model is also treated as one of the simplest types of Artificial Neural Network. It is a supervised
learning algorithm for binary classifiers. Hence, we can consider it as a single-layer neural network with
four main parameters, i.e., input values, weights and Bias, net sum, and an activation function.

In Machine Learning, a binary classifier is a function that decides whether an input, represented as a vector of
numbers, belongs to some specific class.
The perceptron is a linear binary classifier. In simple words, it is a classification algorithm that makes its
prediction using a linear predictor function that combines a weight vector with the feature vector.

Basic Components of Perceptron


Mr. Frank Rosenblatt invented the perceptron model as a binary classifier which contains three main components.
These are as follows:

o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into the system for further processing. Each
input node contains a real numerical value.

o Weight and Bias:

The weight parameter represents the strength of the connection between units. This is another important parameter
of the Perceptron. Weight is directly proportional to the strength of the associated input neuron in deciding
the output. Further, Bias can be considered as the intercept in a linear equation.

o Activation Function:

This is the final and important component, which helps to determine whether the neuron will fire or not. The
Activation Function can be considered primarily as a step function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function
The data scientist chooses the activation function based on the problem statement and the desired outputs. The
activation function may differ (e.g., Sign, Step, and Sigmoid) between perceptron models, depending on whether the
learning process is slow or suffers from vanishing or exploding gradients.

How does Perceptron work?


In Machine Learning, Perceptron is considered as a single-layer neural network that consists of four main parameters
named input values (Input nodes), weights and Bias, net sum, and an activation function. The perceptron model begins
with the multiplication of all input values and their weights, then adds these values together to create the weighted
sum. The activation function 'f' is then applied to this weighted sum to obtain the desired output. Here the
activation function is the step function.

This step function or Activation function plays a vital role in ensuring that output is mapped between required values
(0,1) or (-1,1). It is important to note that the weight of input is indicative of the strength of a node. Similarly, an
input's bias value gives the ability to shift the activation function curve up or down.

Perceptron model works in two important steps as follows:


Step-1

In the first step, multiply all input values with the corresponding weight values and then add them to determine the
weighted sum. Mathematically, we can calculate the weighted sum as follows:

∑wi*xi = w1*x1 + w2*x2 + … + wn*xn

Add a special term called bias 'b' to this weighted sum to improve the model's performance.

∑wi*xi + b

Step-2

In the second step, an activation function is applied with the above-mentioned weighted sum, which gives us output
either in binary form or a continuous value as follows:

Y = f(∑wi*xi + b)
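
The two steps can be sketched as follows. The input values, weights and bias are made-up numbers; the step function gives the binary form of the output and the sigmoid gives a continuous value, as mentioned above.

```python
import math


def weighted_sum(inputs, weights, bias):
    """Step-1: sum of wi * xi over all inputs, plus the bias b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step(z):
    """Binary output: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def sigmoid(z):
    """Continuous output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


inputs, weights, bias = [1.0, 0.0, 1.0], [0.4, -0.3, 0.2], -0.5
z = weighted_sum(inputs, weights, bias)   # Step-1
binary_y = step(z)                        # Step-2 with a step function
continuous_y = sigmoid(z)                 # Step-2 with a sigmoid function
```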

Types of Perceptron Models


Based on the layers, Perceptron models are divided into two types. These are as follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single Layer Perceptron Model:

• The output obtained from the associator is a binary vector and hence output can be taken as input signal to
the response unit and classification can be performed.
• There are n input neurons, 1 output neuron and a bias.
• The input layer and output layer neurons are connected through a directed communication link which is
associated with weights.
• The goal of the perceptron net is to classify the input pattern as a member or not a member of a particular class.
Training Algorithm
Step 0: Initialize the weights and the bias (for easy calculation they can be set to 0). Also initialize the
learning rate α (0 < α <= 1). For simplicity α is set to 1.

Step 1: Perform Steps 2-6 until the final stopping condition is false.

Step 2: Perform Steps 3-5 for each training pair indicated by s:t.

Step 3: The input layer containing input units is applied with identity activation functions:

xi = si

Step 4: Calculate the output of the network. To do so, first obtain the net input:

yin = b + ∑xi*wi (i = 1 to n)

where "n" is the number of input neurons in the input layer. Then apply activations over the net input calculated to
obtain the output:

y = f(yin) =  1, if yin > θ
              0, if –θ <= yin <= θ
             –1, if yin < –θ

Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output and desired (target)
output.

If y is not equal to t then

wi(new) = wi(old) + α*t*xi

b(new) = b(old) + α*t

else

wi(new) = wi(old)

b(new) = b(old)

Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If this
condition is not met, then start again from Step 2.
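
The training steps above can be sketched in Python. The bipolar AND data set, the threshold value of 0 and the cap on the number of epochs are assumptions chosen for the example.

```python
def activation(y_in, theta):
    """Three-level output: 1 above theta, -1 below -theta, otherwise 0."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """samples is a list of (input vector, target) pairs in bipolar form."""
    n = len(samples[0][0])
    w = [0.0] * n          # Step 0: weights and bias start at 0
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:                                   # Steps 2-3
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))    # Step 4
            y = activation(y_in, theta)
            if y != t:                                         # Step 5
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                                        # Step 6
            return w, b, epoch
    return w, b, max_epochs


and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
weights, bias, epochs = train_perceptron(and_samples)
```

For this data set the weights stop changing in the second pass over the training pairs.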

Testing Algorithm
• Step 0: The initial weights to be used here are taken from the training algorithm (the final weights obtained
during training).
• Step 1: For each input vector X to be classified, perform Steps 2-3.
• Step 2: Set activations of the input unit.
• Step 3: Obtain the response of the output unit.

yin = b + ∑xi*wi (i = 1 to n)

y = f(yin) =  1, if yin > θ
              0, if –θ <= yin <= θ
             –1, if yin < –θ
Multi-Layered Perceptron Model:
Like a single-layer perceptron model, a multi-layer perceptron model has input and output layers, but it additionally
has one or more hidden layers between them.

The multi-layer perceptron model is trained with the Backpropagation algorithm, which executes in two stages as
follows:

o Forward Stage: In the forward stage, activations are computed starting from the input layer and terminating at the
output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this
stage, the error between the actual output and the desired output is propagated backward, starting at the output layer
and ending at the input layer.

Hence, a multi-layered perceptron model can be considered as an artificial neural network with multiple layers
in which the activation function is not restricted to a linear or step function as in a single-layer perceptron model.
Instead, the activation function can be sigmoid, TanH, ReLU, etc.

A multi-layer perceptron model has greater processing power and can process linear and non-linear patterns. Further,
it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR, NOR.
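
As an illustration of the extra power gained from a hidden layer, XOR can be built from two hidden neurons and one output neuron. The weights below are hand-picked rather than learned by backpropagation, and the step activation is an assumption made to keep the example short.

```python
def step(z):
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(a, b):
    """XOR = (a OR b) AND (a NAND b), using one hidden layer."""
    h_or = neuron([a, b], [1, 1], -0.5)          # hidden neuron 1: OR
    h_nand = neuron([a, b], [-1, -1], 1.5)       # hidden neuron 2: NAND
    return neuron([h_or, h_nand], [1, 1], -1.5)  # output neuron: AND


xor_out = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```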

Advantages of Multi-Layer Perceptron:

o A multi-layered perceptron model can be used to solve complex non-linear problems.


o It works well with both small and large input data.
o It helps us to obtain quick predictions after the training.
o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

o In Multi-layer perceptron, computations are difficult and time-consuming.


o In multi-layer Perceptron, it is difficult to predict how much each independent variable affects the dependent variable.
o The model functioning depends on the quality of the training.

Perceptron Function
The perceptron function 'f(x)' is obtained by multiplying the input 'x' with the learned weight coefficient 'w',
adding the bias 'b', and thresholding the result.

Mathematically, we can express it as follows:

f(x)=1; if w.x+b>0

otherwise, f(x)=0
o 'w' represents real-valued weights vector
o 'b' represents the bias
o 'x' represents a vector of input x values.

Characteristics of Perceptron
The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for supervised learning of binary classifiers.


2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and the decision is made whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weighted sum is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between the two linearly separable classes +1 and -1.
6. If the weighted sum of all input values is more than the threshold value, the neuron produces an output signal;
otherwise, no output is produced.

Limitations of Perceptron Model


A perceptron model has limitations as follows:

o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit transfer function.
o Perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not
linearly separable, it cannot classify them properly.

Perceptron for AND gate


Perceptron model implementation for AND GATE:

Truth Table of AND Logical GATE is,

Weights w1 = 1.2, w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5 are given

• For Training Instance 1: A=0, B=0 and Target = 0

wi.xi = 0*1.2 + 0*0.6 = 0

This is not greater than the threshold of 1, so the output = 0, Here the target is same as calculated output.

• For Training Instance 2: A=0, B=1 and Target = 0

wi.xi = 0*1.2 + 1*0.6 = 0.6

This is not greater than the threshold of 1, so the output = 0. Here the target is same as calculated output.

• For Training Instance 3: A=1, B=0 and Target = 0

wi.xi = 1*1.2 + 0*0.6 = 1.2

This is greater than the threshold of 1, so the output = 1. Here the target does not match with the calculated output.

Hence we need to update the weights using wi = wi + n*(target - output)*xi:

w1 = 1.2 + 0.5*(0 - 1)*1 = 0.7

w2 = 0.6 + 0.5*(0 - 1)*0 = 0.6

Now,

After updating, the weights are w1 = 0.7, w2 = 0.6, with Threshold = 1 and Learning Rate n = 0.5

For Training Instance 1: A=0, B=0 and Target = 0

wi.xi = 0*0.7 + 0*0.6 = 0

This is not greater than the threshold of 1, so the output = 0. Here the target is same as calculated output.

For Training Instance 2: A=0, B=1 and Target = 0

wi.xi = 0*0.7 + 1*0.6 = 0.6

This is not greater than the threshold of 1, so the output = 0. Here the target is same as calculated output.

For Training Instance 3: A=1, B=0 and Target = 0

wi.xi = 1*0.7 + 0*0.6 = 0.7

This is not greater than the threshold of 1, so the output = 0. Here the target is same as calculated output.

For Training Instance 4: A=1, B=1 and Target = 1

wi.xi = 1*0.7 + 1*0.6 = 1.3

This is greater than the threshold of 1, so the output = 1. Here the target is same as calculated output.

Hence the final weights are w1= 0.7 and w2 = 0.6, Threshold = 1 and Learning Rate n = 0.5.
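
The worked example can be reproduced with a short sketch. The update rule wi = wi + n*(target - output)*xi is the standard perceptron rule and is consistent with the change from w1 = 1.2 to w1 = 0.7 above. Unlike the hand calculation, the sketch continues with the next training instance after an update instead of restarting from instance 1; for this data it arrives at the same weights.

```python
def predict(x, w, threshold=1.0):
    """Output 1 only if the weighted sum is greater than the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0


def train(samples, w, lr=0.5, threshold=1.0, max_epochs=10):
    w = list(w)
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            output = predict(x, w, threshold)
            if output != target:
                w = [wi + lr * (target - output) * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:  # every instance is classified correctly
            break
    return w


and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
final_w = train(and_gate, w=[1.2, 0.6])
outputs = [predict(x, final_w) for x, _ in and_gate]
```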
Linear Function:
Linear Activation Function

The equation for Linear activation function is:

f(x) = a.x

When a = 1 then f(x) = x and this is a special case known as identity.

Properties:

1. Range is -infinity to +infinity
2. Provides a convex error surface so optimisation can be achieved faster
3. df(x)/dx = a, which is constant, so gradient descent gains little information from it
Limitations:

1. Since the derivative is constant, the gradient has no relation with the input
2. The gradient passed back during back propagation is constant, so it does not depend on the change in input (delta x)
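
Both limitations can be seen in a small numeric sketch. The weights and biases are made-up values. The second part illustrates a related, well-known consequence that the text does not state explicitly: stacking layers with linear activations still gives one linear function.

```python
def linear_activation(z, a=1.0):
    return a * z


def numeric_slope(f, z, h=1e-6):
    """Central-difference estimate of df/dz at the point z."""
    return (f(z + h) - f(z - h)) / (2 * h)


# The slope is the same constant 'a' wherever it is measured.
slopes = [
    numeric_slope(lambda z: linear_activation(z, a=0.5), z)
    for z in (-10.0, 0.0, 10.0)
]


def two_layer_linear(x):
    """Two layers with linear activations (weights and biases are made up)."""
    hidden = linear_activation(2.0 * x + 1.0, a=0.5)
    return linear_activation(3.0 * hidden - 2.0, a=1.0)


# 3 * (0.5 * (2x + 1)) - 2 simplifies to the single linear function 3x - 0.5.
collapsed_ok = all(
    abs(two_layer_linear(x) - (3.0 * x - 0.5)) < 1e-12
    for x in (-2.0, 0.0, 1.5)
)
```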

Linear Models
Introduction to Linear Models

The linear model is one of the simplest models in machine learning. It assumes that the data is linearly separable
and tries to learn the weight of each feature. Mathematically, it can be written as Y = W^T X, where X is the feature
matrix, Y is the target variable, and W is the learned weight vector. For a classification problem, we apply a
transformation function or a threshold to convert the continuous-valued variable Y into a discrete category. Here
we will briefly learn linear and logistic regression, which are the regression and classification task models,
respectively.
Linear Regression

Linear Regression is a statistical approach that predicts the result of a response variable by combining numerous
influencing factors. It attempts to represent the linear connection between features (independent variables) and the
target (dependent variables). The cost function enables us to find the best possible values for the model parameters.
A detailed discussion on linear regression is presented in a different article.
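
For a single feature, the best-fitting line can be found in closed form. The sketch below uses ordinary least squares with the mean squared error as the cost function; the data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + b for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Cost function: average squared gap between prediction and target."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
cost = mean_squared_error(xs, ys, w, b)
```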

Applications of Linear Models

Linear models have a wide range of applications in various fields due to their simplicity,
interpretability, and effectiveness in many scenarios. Here are some common applications of
linear models:

1. Regression Analysis:
• Predictive Modeling: Linear regression is used to model the relationship between a
dependent variable and one or more independent variables. It's widely used in fields like
economics, finance, and epidemiology to make predictions.
• Time Series Analysis: Linear models can be applied to time series data to forecast future
values based on historical trends.
2. Classification:
• Logistic Regression: This is a linear model used for binary classification tasks. It's widely
used in fields like healthcare for disease prediction and in marketing for customer churn
prediction.
3. Natural Language Processing (NLP):
• Text Classification: Linear models like logistic regression and linear SVMs are used for
tasks such as sentiment analysis, spam detection, and topic classification.
4. Image Processing:
• Image Classification: Linear models are applied to computer vision tasks for classifying
objects in images when feature extraction methods are used, like HOG (Histogram of
Oriented Gradients) or bag-of-words for images.
5. Economics:
• Demand Estimation: Linear models are used to estimate the relationship between the
demand for a product and various factors like price, income, and advertising spend.
6. Finance:
• Portfolio Optimization: Linear models help investors construct portfolios that optimize
returns while managing risk.
• Credit Scoring: Linear models are used to assess the creditworthiness of individuals or
businesses.
7. Engineering:
• Control Systems: Linear models are used in control engineering to model and control
various physical systems.
8. Biology and Life Sciences:
• Pharmacokinetics: Linear models are applied to study the distribution and elimination of
drugs in the body.
• Genomics: Linear models can be used for gene expression analysis and predicting
biological outcomes.
9. Social Sciences:
• Sociology: Linear models are used to analyze social phenomena and relationships.
• Psychology: Linear models can be used in psychological research to examine
relationships between variables.
10. Environmental Science:
• Environmental Impact Assessment: Linear models can assess the impact of various
factors on the environment.
11. Marketing:
• Market Response Models: Linear models are used to understand how marketing efforts
affect sales and customer behavior.
12. Quality Control:
• Manufacturing: Linear models can be applied to quality control processes to detect defects
or variations in production.
13. Operations Research:
• Linear Programming: Linear models are used to optimize resource allocation in logistics,
supply chain management, and transportation.
14. Physics:
• Particle Physics: Linear models are used in the analysis of experimental data to discover
new particles or phenomena.
15. Astronomy:
• Stellar Spectroscopy: Linear models can be used to analyze the spectral characteristics of
stars and celestial objects.
16. Social Media and Recommender Systems:
• Linear models can be used to build recommendation systems for suggesting products,
movies, or content to users based on their preferences and behavior.
Applying Linear Models

Example: Modeling linear relationships can help solve real-world problems. Consider the example
situation below.
1. Nadia has $200 in her savings account. She gets a job that pays $7.50 per hour and she
deposits all her earnings in her savings account. Write the equation describing this problem in
slope-intercept form. How many hours would Nadia need to work to have $500 in her
account?
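One way to work this example: the starting balance is the intercept and the hourly wage is the slope, so the balance after x hours is y = 7.5x + 200. Setting y = 500 and solving for x gives the hours needed. The short sketch below does the same arithmetic; the variable names are chosen only for illustration.

```python
initial_balance = 200.0   # intercept: money already in the account
hourly_wage = 7.50        # slope: dollars added per hour worked
target = 500.0


def balance(hours):
    """Slope-intercept form: y = 7.5 * x + 200."""
    return hourly_wage * hours + initial_balance


# Solve 500 = 7.5 * x + 200 for x
hours_needed = (target - initial_balance) / hourly_wage
print(hours_needed)            # 40.0
print(balance(hours_needed))   # 500.0
```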

Linear Separability

1. Definition: Linear separability is a property of a dataset where two or more classes of data
points can be perfectly separated using a straight line (in 2D) or a hyperplane (in higher
dimensions). In other words, there exists a line or hyperplane that can cleanly separate all
data points of one class from those of another class.
2. Visualization: In a two-dimensional space, think of a scatterplot with data points from two
different classes. If you can draw a single straight line on the plot in such a way that all data
points of one class are on one side of the line, and all data points of the other class are on the
opposite side, the data is linearly separable.
3. Mathematical Expression: Linear separability can be mathematically represented as follows:
Suppose each data point is a feature vector (x1, x2, ..., xn) associated with a class label
(either 0 or 1). The data is linearly separable if there exist coefficients (w0, w1, w2, ..., wn)
such that w0 + w1x1 + w2x2 + ... + wnxn > 0 for all points of one class and
w0 + w1x1 + w2x2 + ... + wnxn < 0 for all points of the other class.
4. Use in Machine Learning: Linear separability is an important concept in machine learning,
especially in binary classification tasks. If your data is linearly separable, you can use linear
classifiers like linear SVMs, logistic regression, or perceptrons to make accurate predictions.
5. Limitations: Not all real-world datasets are linearly separable. In many cases, data points
from different classes may overlap or be arranged in complex patterns that cannot be
separated by a single straight line or hyperplane. In such cases, more sophisticated machine
learning models or non-linear transformation techniques may be necessary.
6. Example: Imagine a dataset of flowers with two classes: red roses and blue violets. If the
dataset is linearly separable, you can find a line that cleanly separates the red roses from the
blue violets on a scatterplot. This line can serve as a decision boundary for classifying new
flowers.
7. Importance: Linear separability simplifies classification tasks and allows for the use of
simple and interpretable models. However, it's essential to assess whether a dataset is linearly
separable before choosing a classification algorithm, as non-linearly separable data may
require more complex model architectures.

Linearly Separable 2D Data

We say a two-dimensional dataset is linearly separable if we can separate the positive from the negative
objects with a straight line.
It doesn’t matter if more than one such line exists. For linear separability, it’s sufficient to find only
one.
Conversely, no line can separate linearly inseparable 2D data.

Linearly Separable n-Dimensional Data


The equivalent of a 2D line is a hyperplane in n-dimensional space. Assuming that the features are
real, the equation of a hyperplane is:
$w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n = 0$
Solution of OR function using linear separability model.
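A possible solution can be sketched as follows. The OR function is linearly separable, so a single threshold unit with suitable weights reproduces its truth table. The weights w0 = -0.5, w1 = 1, w2 = 1 are one assumed choice among many (they define the line x1 + x2 = 0.5), and the function name `or_unit` is illustrative.

```python
def or_unit(x1, x2, w0=-0.5, w1=1.0, w2=1.0):
    """Threshold unit: outputs 1 when w0 + w1*x1 + w2*x2 > 0, else 0."""
    net = w0 + w1 * x1 + w2 * x2
    return 1 if net > 0 else 0


# OR truth table as (x1, x2, target)
truth_table = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
outputs = [or_unit(x1, x2) for x1, x2, _ in truth_table]
targets = [t for _, _, t in truth_table]
print(outputs)   # [0, 1, 1, 1]
```

The point (0, 0) falls on the negative side of the line and the other three points fall on the positive side, which is exactly the separability condition.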
Unit-III
Supervised and Unsupervised learning
Machine learning is a field of computer science that gives computers the ability to learn without being
explicitly programmed. Supervised learning and unsupervised learning are two main types of machine
learning.
In supervised learning, the machine is trained on a set of labeled data, which means that the input data is
paired with the desired output. The machine then learns to predict the output for new input data.
Supervised learning is often used for tasks such as classification, regression, and object detection.
In unsupervised learning, the machine is trained on a set of unlabeled data, which means that the input
data is not paired with the desired output. The machine then learns to find patterns and relationships in
the data. Unsupervised learning is often used for tasks such as clustering, dimensionality reduction, and
anomaly detection.
What is Supervised learning?
Supervised learning is a type of machine learning algorithm that learns from labeled data. Labeled data is
data that has been tagged with a correct answer or classification.
Supervised learning, as the name indicates, involves a supervisor acting as a teacher. The machine is
trained on well-labelled data, meaning each training example is already tagged with the correct answer.
After training, the machine is given a new set of examples, and the learned model uses what it inferred
from the training data to predict the correct outcome for them.
For example, a labeled dataset of images of Elephant, Camel and Cow would have each image tagged
with either “Elephant”, “Camel” or “Cow”.
Key Points:
• Supervised learning involves training a machine from labeled data.
• Labeled data consists of examples with the correct answer or classification.
• The machine learns the relationship between inputs (fruit images) and outputs (fruit labels).
• The trained machine can then make predictions on new, unlabeled data.
Example:
Let’s say you have a basket of fruit and want to identify one item from its image. The machine would first analyze the image to
extract features such as its shape, color, and texture. Then, it would compare these features to the features
of the fruits it has already learned about. If the new image’s features are most similar to those of an
apple, the machine would predict that the fruit is an apple.
For instance, suppose you are given a basket filled with different kinds of fruits. Now the first step is to
train the machine with all the different fruits one by one like this:
• If the shape of the object is rounded and has a depression at the top, is red in color, then it will be
labeled as –Apple.
• If the shape of the object is a long curving cylinder having Green-Yellow color, then it will be
labeled as –Banana.
Now suppose that, after training, the machine is given a new fruit from the basket, say a banana,
and asked to identify it.
The machine has already learned from the previous data and now has to apply that knowledge. It first
examines the shape and color of the fruit, identifies it as a banana, and puts it in the Banana category.
Thus the machine learns from training data (the basket of fruits) and then applies the knowledge to
test data (the new fruit).
Types of Supervised Learning
Supervised learning is classified into two categories of algorithms:
• Regression: A regression problem is when the output variable is a real value, such as “dollars” or
“weight”.
• Classification: A classification problem is when the output variable is a category, such as “red” or
“blue”, or “disease” or “no disease”.
Supervised learning deals with or learns with “labeled” data. This implies that some data is already
tagged with the correct answer.
1- Regression
Regression is a type of supervised learning that is used to predict continuous values, such as house prices,
stock prices, or sales volumes. Regression algorithms learn a function that maps from the input features
to the output value.
Some common regression algorithms include:
• Linear Regression
• Polynomial Regression
• Support Vector Machine Regression
• Decision Tree Regression
• Random Forest Regression
2- Classification
Classification is a type of supervised learning that is used to predict categorical values, such as whether a
customer will churn or not, whether an email is spam or not, or whether a medical image shows a tumor
or not. Classification algorithms learn a function that maps from the input features to a probability
distribution over the output classes.
Some common classification algorithms include:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• Naive Bayes
Evaluating Supervised Learning Models
Evaluating supervised learning models is an important step in ensuring that the model is accurate and
generalizable. There are a number of different metrics that can be used to evaluate supervised learning
models, but some of the most common ones include:
For Regression
• Mean Squared Error (MSE): MSE measures the average squared difference between the predicted
values and the actual values. Lower MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square root of MSE, representing the standard
deviation of the prediction errors. Similar to MSE, lower RMSE values indicate better model
performance.
• Mean Absolute Error (MAE): MAE measures the average absolute difference between the
predicted values and the actual values. It is less sensitive to outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared measures the proportion of the variance in
the target variable that is explained by the model. Higher R-squared values indicate better model
fit.
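The four regression metrics above can be computed directly from their definitions. The following is a small sketch on made-up numbers; the function name `regression_metrics` and the sample values are illustrative only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total variance in the target
    ss_res = sum(e * e for e in errors)               # variance left unexplained
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For these values the errors are -1, 0, 1 and 0, so MSE and MAE are both 0.5 and R-squared is 0.9.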
For Classification
• Accuracy: Accuracy is the percentage of predictions that the model makes correctly. It is
calculated by dividing the number of correct predictions by the total number of predictions.
• Precision: Precision is the percentage of positive predictions that the model makes that are actually
correct. It is calculated by dividing the number of true positives by the total number of positive
predictions.
• Recall: Recall is the percentage of all positive examples that the model correctly identifies. It is
calculated by dividing the number of true positives by the total number of positive examples.
• F1 score: The F1 score is the harmonic mean of precision and recall, calculated as
2 × (precision × recall) / (precision + recall). It combines the two into a single number.
• Confusion matrix: A confusion matrix is a table that shows the number of predictions for each
class, along with the actual class labels. It can be used to visualize the performance of the model
and identify areas where the model is struggling.
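The classification metrics can likewise be derived from the four cells of a binary confusion matrix. The sketch below assumes labels 1 (positive) and 0 (negative); the function name and the sample predictions are made up for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1 and a 2x2 confusion matrix."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive and t != positive:
            fp += 1
        elif p != positive and t == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Rows are actual classes (negative, positive); columns are predicted classes
    confusion = [[tn, fp], [fn, tp]]
    return accuracy, precision, recall, f1, confusion


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
accuracy, precision, recall, f1, confusion = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1, confusion)
```

Here there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four scores come out to 0.75.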
Applications of Supervised learning
Supervised learning can be used to solve a wide variety of problems, including:
• Spam filtering: Supervised learning algorithms can be trained to identify and classify spam emails
based on their content, helping users avoid unwanted messages.
• Image classification: Supervised learning can automatically classify images into different
categories, such as animals, objects, or scenes, facilitating tasks like image search, content
moderation, and image-based product recommendations.
• Medical diagnosis: Supervised learning can assist in medical diagnosis by analyzing patient data,
such as medical images, test results, and patient history, to identify patterns that suggest specific
diseases or conditions.
• Fraud detection: Supervised learning models can analyze financial transactions and identify
patterns that indicate fraudulent activity, helping financial institutions prevent fraud and protect
their customers.
• Natural language processing (NLP): Supervised learning plays a crucial role in NLP tasks,
including sentiment analysis, machine translation, and text summarization, enabling machines to
understand and process human language effectively.
Advantages of Supervised learning
• Supervised learning uses labeled data from previous experience to produce outputs for new data.
• Helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new sample.
• We have complete control over choosing the number of classes we want in the training data.
Disadvantages of Supervised learning
• Classifying big data can be challenging.
• Training needs a lot of computation time.
• Supervised learning cannot handle all complex tasks in Machine Learning.
• It requires a labelled data set.
• It requires a training process.
What is Unsupervised learning?
Unsupervised learning is a type of machine learning that learns from unlabeled data. This means that the
data does not have any pre-existing labels or categories. The goal of unsupervised learning is to discover
patterns and relationships in the data without any explicit guidance.
Unsupervised learning is the training of a machine using information that is neither classified nor labeled
and allowing the algorithm to act on that information without guidance. Here the task of the machine is to
group unsorted information according to similarities, patterns, and differences without any prior training
of data.
Unlike supervised learning, no teacher is provided, which means the machine is not given any labeled
examples to train on. The machine must therefore find the hidden structure in unlabeled data by itself.
For example, you can use unsupervised learning to examine a collection of animal data and distinguish
between several groups according to the traits and behavior of the animals. These groupings might
correspond to various animal species, allowing you to categorize the creatures without depending on
labels that already exist.

Key Points
• Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
• Clustering algorithms group similar data points together based on their inherent characteristics.
• Feature extraction captures essential information from the data, enabling the model to make
meaningful distinctions.
• Label association assigns categories to the clusters based on the extracted patterns and
characteristics.
Example
Imagine you have a large dataset of unlabeled images containing both dogs and cats. The model has no
pre-existing labels or categories for these animals, and nobody has told it what a dog or a cat looks like.
Your task is to use unsupervised learning to separate the dogs from the cats.
Because the machine has no idea about the features of dogs and cats, it cannot name the groups ‘dogs’ and
‘cats’. But it can group the images according to their similarities, patterns, and differences, so the
collection can be split into two parts: the first containing mostly images of dogs and the second
containing mostly images of cats. Nothing was learned beforehand, which means there was no labeled
training data or examples.
It allows the model to work on its own to discover patterns and information that was previously
undetected. It mainly deals with unlabelled data.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories of algorithms:
• Clustering: A clustering problem is where you want to discover the inherent groupings in the data,
such as grouping customers by purchasing behavior.
• Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.

Hebbian Learning
The Hebbian Learning Rule, also known as the Hebb Learning Rule, was proposed by Donald O. Hebb. It is one
of the first and also easiest learning rules for neural networks. It is used for pattern classification. It is
applied to a single-layer neural network, i.e. one with an input layer and an output layer. The input layer can
have many units, say n. The output layer has only one unit. The Hebbian rule works by updating the weights
between neurons in the neural network for each training sample.
Hebbian Learning Rule Algorithm :
1. Set all weights to zero, wi = 0 for i = 1 to n, and bias to zero.
2. For each training pair s : t (input vector : target output), repeat steps 3-5.
3. Set activations for input units with the input vector: xi = si for i = 1 to n.
4. Set the corresponding output value to the output neuron, i.e. y = t.
5. Update weight and bias by applying Hebb rule for all i = 1 to n:
$w_i(\text{new}) = w_i(\text{old}) + x_i\, y$
$b(\text{new}) = b(\text{old}) + y$
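The steps above can be sketched in a few lines of code. This example assumes the standard Hebb update, in which each weight grows by x_i * y and the bias grows by y, and trains on the AND function with bipolar values (+1 and -1). The choice of AND and the function names are illustrative.

```python
def hebb_train(samples, n_inputs):
    """Apply the Hebb rule once to every (input vector, target) pair."""
    weights = [0.0] * n_inputs   # step 1: all weights zero
    bias = 0.0                   # step 1: bias zero
    for x, t in samples:         # step 2: loop over the training pairs
        y = t                    # step 4: output is set to the target
        for i in range(n_inputs):
            weights[i] += x[i] * y   # step 5: w_i(new) = w_i(old) + x_i * y
        bias += y                    # step 5: b(new) = b(old) + y
    return weights, bias


def hebb_predict(weights, bias, x):
    net = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0 else -1


# Bipolar AND: output is +1 only when both inputs are +1
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias = hebb_train(and_samples, n_inputs=2)
predictions = [hebb_predict(weights, bias, x) for x, _ in and_samples]
print(weights, bias, predictions)
```

After one pass over the four patterns the weights are [2, 2] and the bias is -2, which classifies all four AND patterns correctly.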

Reinforcement Learning:
Reinforcement Learning (RL) is a branch of machine learning focused on making decisions to maximize cumulative
rewards in a given situation. Unlike supervised learning, which relies on a training dataset with predefined answers,
RL involves learning through experience. In RL, an agent learns to achieve a goal in an uncertain, potentially
complex environment by performing actions and receiving feedback through rewards or penalties.
Key Concepts of Reinforcement Learning
• Agent: The learner or decision-maker.
• Environment: Everything the agent interacts with.
• State: A specific situation in which the agent finds itself.
• Action: A move the agent can make; the set of all possible moves is the action space.
• Reward: Feedback from the environment based on the action taken.
How Reinforcement Learning Works
RL operates on the principle of learning optimal behavior through trial and error. The agent takes actions within the
environment, receives rewards or penalties, and adjusts its behavior to maximize the cumulative reward. This
learning process is characterized by the following elements:
• Policy: A strategy used by the agent to determine the next action based on the current state.
• Reward Function: A function that provides a scalar feedback signal based on the state and action.
• Value Function: A function that estimates the expected cumulative reward from a given state.
• Model of the Environment: A representation of the environment that helps in planning by predicting future
states and rewards.
Example: Navigating a Maze
The problem is as follows: we have an agent and a reward, with many hurdles in between. The agent is supposed to
find the best possible path to reach the reward. The following example illustrates the problem.

Picture a grid containing a robot, a diamond, and fire. The goal of the robot is to get the reward, which is the
diamond, and avoid the hurdles, which are the fire. The robot learns by trying the possible paths and then choosing
the path that reaches the reward with the fewest hurdles. Each right step gives the robot a reward and each wrong
step subtracts from the robot's reward. The total reward is calculated when it reaches the final reward, the
diamond.

Main points in Reinforcement learning –


• Input: The input should be an initial state from which the model will start.
• Output: There are many possible outputs, as there are a variety of solutions to a particular problem.
• Training: The training is based upon the input. The model returns a state, and the user decides whether to
reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
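To make the trial-and-error idea concrete, here is a minimal sketch using tabular Q-learning, a specific RL algorithm that the text above does not name. The environment is a made-up one-dimensional corridor of five cells with fire at the left end (reward -1) and a diamond at the right end (reward +1). All constants and names are assumptions chosen for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 0 = fire, cell 4 = diamond
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor


def step(state, action):
    """Return (next_state, reward, done) for the hypothetical corridor."""
    nxt = state + action
    if nxt == 0:
        return nxt, -1.0, True    # stepped into the fire
    if nxt == N_STATES - 1:
        return nxt, 1.0, True     # reached the diamond
    return nxt, 0.0, False


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 2, False                  # always start in the middle
        while not done:
            a = rng.randrange(2)                # explore with random actions
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
    return q


q = train()
# Greedy policy for the non-terminal cells 1, 2 and 3
policy = ["left" if q[s][0] > q[s][1] else "right" for s in (1, 2, 3)]
print(policy)
```

The agent is never told where the diamond is. It only sees rewards and penalties, yet the learned values end up favouring the moves that lead toward the diamond and away from the fire.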

What is Gradient Descent?


Gradient descent is an optimization algorithm that is used to minimize the loss function in a machine learning
model. The goal of gradient descent is to find the set of weights (or coefficients) that minimize the loss function.
The algorithm works by iteratively adjusting the weights in the direction of the steepest decrease in the loss
function.
How does Gradient Descent Work?
The basic idea of gradient descent is to start with an initial set of weights and update them in the direction of the
negative gradient of the loss function. The gradient is a vector of partial derivatives that represents the rate of change
of the loss function with respect to the weights. By updating the weights in the direction of the negative gradient, the
algorithm moves towards a minimum of the loss function.

The learning rate is a hyperparameter that determines the size of the step taken in the weight update. A small
learning rate results in a slow convergence, while a large learning rate can lead to overshooting the minimum and
oscillating around the minimum. It’s important to choose an appropriate learning rate that balances the speed of
convergence and the stability of the optimization.
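The update described above is usually written as w_new = w_old - (learning rate) × (gradient of the loss at w_old). As a minimal sketch, the code below applies this rule to the one-parameter loss L(w) = (w - 3)^2, whose minimum is at w = 3. The loss function and the names used are assumptions chosen for illustration.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, recording every iterate."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * grad(w)   # move in the negative gradient direction
        history.append(w)
    return w, history


def loss(w):
    return (w - 3.0) ** 2


def loss_grad(w):
    return 2.0 * (w - 3.0)


w_final, history = gradient_descent(loss_grad, w0=0.0)
print(w_final)
```

With a learning rate of 0.1 the iterates approach 3 steadily. With a learning rate of 1.0 on this particular loss, each step overshoots so far that the iterate jumps between 0 and 6 and never settles, which illustrates the oscillation mentioned above.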
Variants of Gradient Descent
1) Batch Gradient Descent:
In batch gradient descent, the gradient of the loss function is computed with respect to the weights over the entire
training dataset, and the weights are updated once per pass through the data. This provides a more accurate estimate
of the gradient, but it can be computationally expensive for large datasets.
2) Stochastic Gradient Descent (SGD):
In SGD, the gradient of the loss function is computed with respect to a single training example, and the weights are
updated after each example. SGD has a lower computational cost per iteration compared to batch gradient descent,
but it can be less stable and may not converge to the optimal solution.
3) Mini-Batch Gradient Descent:
Mini-batch gradient descent is a compromise between batch gradient descent and SGD. The gradient of the loss
function is computed with respect to a small randomly selected subset of the training examples (called a mini-
batch), and the weights are updated after each mini-batch. Mini-batch gradient descent provides a balance between
the stability of batch gradient descent and the computational efficiency of SGD.
4) Momentum:
Momentum is a variant of gradient descent that incorporates information from the previous weight updates to help
the algorithm converge more quickly to the optimal solution. Momentum adds a term to the weight update that is
proportional to the running average of the past gradients, allowing the algorithm to move more quickly in the
direction of the optimal solution.
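The four variants differ mainly in how many examples feed each update and whether past gradients are reused. The sketch below covers them with one function: `batch_size` selects batch, stochastic, or mini-batch behaviour, and `beta` adds a momentum term. The one-weight model y = w * x, the data, and all parameter values are assumptions for illustration.

```python
import random


def train(xs, ys, batch_size, lr=0.01, epochs=500, beta=0.0, seed=0):
    """Fit y ~ w * x by gradient descent on the mean squared error.

    batch_size = len(xs) gives batch gradient descent, 1 gives SGD,
    and anything in between gives mini-batch gradient descent.
    beta > 0 adds momentum (an accumulated sum of past gradients).
    """
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                      # new random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of (w*x - y)^2 over the current batch
            grad = sum(2 * x * (w * x - y) for x, y in batch) / len(batch)
            velocity = beta * velocity + grad  # with beta = 0 this is just grad
            w -= lr * velocity
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]                      # generated by y = 2x
w_batch = train(xs, ys, batch_size=4)
w_sgd = train(xs, ys, batch_size=1)
w_mini = train(xs, ys, batch_size=2)
w_momentum = train(xs, ys, batch_size=4, beta=0.9)
print(w_batch, w_sgd, w_mini, w_momentum)
```

On this noise-free toy data every variant reaches w close to 2. The differences in stability and cost described above only become visible on larger, noisier datasets.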

Delta Learning Rule


It was developed by Bernard Widrow and Marcian Hoff. It is a supervised learning rule and requires a continuous
(differentiable) activation function. It is also known as the Least Mean Square (LMS) method, and it minimizes the
error over all the training patterns.
It is based on a gradient descent approach, which is repeated until the error is acceptably small. It states that the
modification in the weight of a node is proportional to the product of the error and the input, where the error is the
difference between the desired and actual output.
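It is computed as follows. Assume:
(x1, x2, x3, ..., xn) –> input vector
(w1, w2, w3, ..., wn) –> weights
t = target (desired) output
y = actual output
f = activation function, with derivative f'
α = learning rate
wo = old (current) weight
wnew = new weight
δw = change in weight
Then:
$net = \sum_i w_i x_i$
$y = f(net)$
$\text{Error} = t - y$
$\text{Learning signal: } e = (t - y)\, f'(net)$
$\delta w_i = \alpha\, x_i\, e = \alpha\, x_i\, (t - y)\, f'(net)$
$w_{new} = w_o + \delta w$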
The updating of weights can only be done if there is a difference between the target and actual output(i.e., error)
present:
case I: when t=y
then there is no change in weight
case II: else
wnew=wo+δw
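As a rough sketch of the rule in action, the code below trains a single neuron with a sigmoid activation on the OR function, changing each weight by α × input × (t - y) × f'(net) after every pattern. The choice of sigmoid, the OR data, the learning rate and the number of epochs are assumptions for illustration, not values taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_delta(samples, alpha=0.5, epochs=5000):
    """Train one sigmoid neuron with the delta rule; returns [bias, w1, w2]."""
    weights = [0.0, 0.0, 0.0]              # bias weight followed by w1, w2
    for _ in range(epochs):
        for inputs, t in samples:
            x = (1.0,) + tuple(inputs)     # constant 1.0 feeds the bias weight
            net = sum(w * xi for w, xi in zip(weights, x))
            y = sigmoid(net)
            e = (t - y) * y * (1.0 - y)    # learning signal: (t - y) * f'(net)
            for i in range(len(weights)):
                weights[i] += alpha * x[i] * e   # wnew = wo + alpha * x_i * e
    return weights


def predict(weights, inputs):
    x = (1.0,) + tuple(inputs)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_delta(or_samples)
predictions = [round(predict(weights, x)) for x, _ in or_samples]
print(predictions)
```

When the output already equals the target, the learning signal is zero and the weights stay unchanged, which matches case I above.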
For more on the delta learning rule, see:
https://www.mldawn.com/what-is-the-delta-rule-part-1/

Competitive Learning :
Artificial neural networks often utilize competitive learning models to classify input without the use of labeled
data. The process begins with an input vector (one sample from the data set). This input is then presented to a
network of artificial neurons, each of which has its own set of weights, which act like filters. Each neuron computes
a score based on its weights and the input vector, typically through a dot product operation (a way of multiplying
the input information with the filter and adding the results together).

After the computation, the neuron that has the highest score (the "winner") is updated, usually by shifting its weights
closer to the input vector. This process is often referred to as the "Winner-Takes-All" strategy. Over time, neurons
become specialized as they get updated toward input vectors they can best match. This leads to the formation of
clusters of similar data, hence enabling the discovery of inherent patterns within the input dataset.

To illustrate how one can use competitive learning, imagine an eCommerce business wants to segment its customer
base for targeted marketing, but they have no prior labels or segmentation. By feeding customer data (purchase
history, browsing pattern, demographics, etc.) to a competitive learning model, they could automatically find
distinct clusters (like high spenders, frequent buyers, discount lovers) and tailor marketing strategies accordingly.

The Competitive Learning Process: A Step-by-Step Example

For this simple illustration, let's assume we have a dataset composed of 1-dimensional input vectors ranging from 1
to 10 and a competitive learning network with two neurons.

Step 1: Initialization

We start by initializing the weights of the two neurons to random values. Let's assume:
• Neuron 1 weight: 2
• Neuron 2 weight: 8
Step 2: Presenting the input vector

Now, we present an input vector to the network. Let's say our input vector is '5'.

Step 3: Calculating distance

We calculate the distance between the input vector and the weights of the two neurons. The neuron with the weight
closest to the input vector 'wins.' This could be calculated using any distance metric, for example, the absolute
difference:

• Neuron 1 distance: |5-2| = 3


• Neuron 2 distance: |5-8| = 3
Since both distances are equal, we can choose the winner randomly. Let's say Neuron 1 is the winner.

Step 4: Updating weights

We adjust the winning neuron's weight to bring it closer to the input vector. If our learning rate (a tuning parameter
in an optimization algorithm that determines the step size at each iteration) is 0.5, the weight update would be:

• Neuron 1 weight: 2 + 0.5*(5-2) = 3.5


• Neuron 2 weight: 8 (unchanged)
Step 5: Iteration

We repeat the process with all the other input vectors in the dataset, updating the weights after each presentation.

Step 6: Convergence

After several iterations (also known as epochs), the neurons' weights will start to converge to the centers of their
corresponding input clusters. In this case, with 1-dimensional data ranging from 1 to 10, we could expect one neuron
to converge around the lower range (1 to 5) and the other around the higher range (6 to 10).

This process exemplifies how competitive learning works. Over time, each neuron specializes in a different cluster
of the data, enabling the system to identify and represent the inherent groupings in the dataset.
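The worked example can be reproduced with a short sketch. It uses the same starting weights (2 and 8), the absolute difference as the distance, and resolves ties in favour of the first neuron, which matches the choice made in Step 3. The function names, the fixed presentation order, and the smaller learning rate used for the full training run are assumptions for illustration.

```python
def winner_index(weights, x):
    """Index of the neuron whose weight is closest to the input."""
    distances = [abs(x - w) for w in weights]
    return distances.index(min(distances))   # ties go to the lowest index


def present(weights, x, lr):
    """Winner-takes-all update: only the winning neuron moves toward x."""
    i = winner_index(weights, x)
    weights[i] += lr * (x - weights[i])
    return i


# Steps 1 to 4 of the worked example: input 5, learning rate 0.5
weights = [2.0, 8.0]
present(weights, 5.0, lr=0.5)
first_step = list(weights)
print(first_step)            # [3.5, 8.0]

# Steps 5 and 6: repeat over the whole dataset 1..10 for several epochs
weights = [2.0, 8.0]
for epoch in range(20):
    for x in range(1, 11):
        present(weights, float(x), lr=0.1)
print(weights)
```

After training, the first neuron's weight stays below the second neuron's weight, so each neuron ends up responsible for a different part of the input range.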

Backpropagation Algorithm
Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural
network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights allows
you to reduce error rates and make the model reliable by increasing its generalization.
Backpropagation in neural network is a short form for “backward propagation of errors.” It is a standard method of
training artificial neural networks. This method helps calculate the gradient of a loss function with respect to all the
weights in the network.
How Backpropagation Algorithm Works
The backpropagation algorithm computes the gradient of the loss function with respect to each weight using the
chain rule. It works efficiently, one layer at a time from the output layer backward, unlike a naive direct
computation that would treat each weight separately. It computes the gradient, but it does not define how the
gradient is used. It generalizes the computation in the delta rule.
The steps of training a backpropagation network are:

1. Inputs X arrive through the preconnected path.


2. The input is modeled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron from the input layer, to the hidden layers, to the output layer.
4. Calculate the error in the outputs:
Error = Actual Output - Desired Output
5. Travel back from the output layer to the hidden layer to adjust the weights such that the error is decreased.
Keep repeating the process until the desired output is achieved.
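The five steps can be sketched for a very small network. The example below assumes a 2-2-1 network with sigmoid units, a squared-error loss, fixed starting weights (so the run is repeatable), and the OR function as training data. None of these choices come from the text; they are only there to make the steps concrete.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical 2-2-1 network. Starting weights are fixed for repeatability;
# in practice they are usually chosen at random (step 2).
w_hidden = [[0.5, -0.4], [0.3, 0.8]]   # w_hidden[j][i]: input i -> hidden j
b_hidden = [0.1, -0.1]
w_out = [0.7, -0.6]                    # w_out[j]: hidden j -> output
b_out = [0.05]                         # one-element list, updated in place


def forward(x):
    """Step 3: compute hidden activations and the network output."""
    h = [sigmoid(w_hidden[j][0] * x[0] + w_hidden[j][1] * x[1] + b_hidden[j])
         for j in range(2)]
    y = sigmoid(w_out[0] * h[0] + w_out[1] * h[1] + b_out[0])
    return h, y


def train_step(x, t, lr=0.5):
    h, y = forward(x)
    # Step 4: error (actual - desired), scaled by the sigmoid derivative
    delta_out = (y - t) * y * (1.0 - y)
    # Step 5: send the error back through the output weights
    delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(2)]
    for j in range(2):
        w_out[j] -= lr * delta_out * h[j]
        b_hidden[j] -= lr * delta_hidden[j]
        for i in range(2):
            w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
    b_out[0] -= lr * delta_out


data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def total_error():
    return sum(0.5 * (forward(x)[1] - t) ** 2 for x, t in data)


error_before = total_error()
for epoch in range(2000):
    for x, t in data:
        train_step(x, t)
error_after = total_error()
print(error_before, error_after)
```

The hidden-layer error terms are computed before any weight is changed, so every update in a step uses gradients from the same forward pass.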
Why We Need Backpropagation?
Most prominent advantages of Backpropagation are:
• Backpropagation is fast, simple and easy to program
• It has no parameters to tune apart from the numbers of input
• It is a flexible method as it does not require prior knowledge about the network
• It is a standard method that generally works well
• It does not need any special mention of the features of the function to be learned.
What is a Feed Forward Network?
A feedforward neural network is an artificial neural network where the nodes never form a cycle. This kind of neural
network has an input layer, hidden layers, and an output layer. It is the first and simplest type of artificial neural
network.

Types of Backpropagation Networks


Two Types of Backpropagation Networks are:
• Static Back-propagation
• Recurrent Backpropagation
Static back-propagation:
It is one kind of backpropagation network which produces a mapping of a static input to a static output. It is
useful for solving static classification problems like optical character recognition.
Recurrent Backpropagation:
In recurrent backpropagation, the activations are fed forward until they settle to a fixed value. After that, the
error is computed and propagated backward.
The main difference between these two methods is that the mapping is rapid in static back-propagation, while it
is nonstatic in recurrent backpropagation.
History of Backpropagation
• In 1961, the basic concepts of continuous backpropagation were derived in the context of control theory by
Henry J. Kelly and Arthur E. Bryson.
• In 1969, Bryson and Ho gave a multi-stage dynamic system optimization method.
• In 1974, Werbos stated the possibility of applying this principle in an artificial neural network.
• In 1982, Hopfield brought his idea of a neural network.
• In 1986, by the effort of David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, backpropagation
gained recognition.
• In 1993, Wan was the first person to win an international pattern recognition contest with the help of the
backpropagation method.

Variations in Back Propagation:


Different types of Backpropagation (4 types)
There are four main types of backpropagation:
1. Static backpropagation
2. Recurrent backpropagation
3. Resilient backpropagation
4. Backpropagation through time
1. Static backpropagation
Static backpropagation involves calculating the gradient of the error function with respect to the network weights,
which is then used to modify these weights and enhance the performance of the network.
Core Concept:
The fundamental idea behind static backpropagation involves calculating the gradient of the error function with
respect to the network weights. This is done by applying the chain rule of calculus, which allows us to represent the
derivative of a function with respect to its argument in terms of the derivatives of its intermediate variables.
Equation:
The equation for static backpropagation is as follows:
∂E/∂w = ∂E/∂y * ∂y/∂s * ∂s/∂w
Where E is the error function, w is a weight of the network, y is the output of the network, and s is an intermediate
variable, typically the weighted sum of inputs that is passed to the activation function.
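The chain rule above can be checked numerically. The sketch below assumes a single sigmoid neuron with s = w*x + b, y = sigmoid(s), and squared error E = 0.5*(y - target)^2; these choices and all numeric values are illustrative assumptions, not part of the course text.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def error(w, x, b, target):
    y = sigmoid(w * x + b)
    return 0.5 * (y - target) ** 2

def gradient_chain_rule(w, x, b, target):
    s = w * x + b
    y = sigmoid(s)
    dE_dy = y - target        # derivative of the squared error
    dy_ds = y * (1.0 - y)     # derivative of the sigmoid
    ds_dw = x                 # derivative of the weighted sum
    return dE_dy * dy_ds * ds_dw

w, x, b, target = 0.8, 1.5, -0.2, 1.0
analytic = gradient_chain_rule(w, x, b, target)

# Central finite difference as an independent check of the same derivative.
eps = 1e-6
numeric = (error(w + eps, x, b, target) - error(w - eps, x, b, target)) / (2 * eps)
print(analytic, numeric)
```

The two printed values should agree to several decimal places, which is a common way to validate a hand-written gradient before using it for training.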
Advantages:
The following are some benefits of static backpropagation:
1. Able to handle intricate nonlinear input–output interactions.
2. Can be applied to a variety of tasks, including speech recognition, image recognition, and natural language
processing.
3. The network is capable of being trained via supervised learning, which enables it to learn from labeled data.
Disadvantages:
The following are some of the drawbacks of static backpropagation:
1. For big networks, it may be computationally expensive.
2. Training may get stuck in a local minimum of the error function, leaving the error at an unfavorable value.
3. Needs a significant volume of labeled data for training.
Applications:
Static backpropagation has several uses, such as the following:
1. Image recognition: Static backpropagation can be used to train a neural network to recognize images, for example by
identifying handwritten digits in a dataset.
2. Speech recognition: Static backpropagation can be used to train a neural network to convert speech into text.
3. Natural language processing: Static backpropagation can be used to train a neural network to comprehend and
produce natural language text.
2. Recurrent backpropagation
Recurrent backpropagation is a popular technique for training recurrent neural networks. It involves computing
the gradients of the loss function with respect to the network's parameters and then changing the parameters to
minimize the loss. Recurrent neural networks have an internal feedback loop that enables data to be transferred from
one time step to the next. Because the parameters of the network are shared across time steps, it is necessary to properly
combine the gradients from each time step in order to update the parameters.
Equation
Recurrent backpropagation uses a similar equation to the conventional backpropagation algorithm, but it calculates
weight updates by adding the gradients from the current time step and all prior time steps, with older gradients scaled
down by a forgetting factor. With the learning rate applied to the whole sum, it is shown below:
Δwij(n) = -η [ ∂E(n) / ∂wij(n) + λ ∂E(n-1) / ∂wij(n) + λ^2 ∂E(n-2) / ∂wij(n) + ... + λ^(n-1) ∂E(1) / ∂wij(n) ]
where Δwij(n) is the weight update for the j-th neuron in the i-th layer at time step n, η is the learning rate, λ is the
forgetting factor, and ∂E(t) / ∂wij(n) is the partial derivative of the error with respect to the weight wij at time step t.
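The weighted sum of gradients can be sketched in a few lines. This example assumes that the learning rate η scales every term of the sum and that the per-step gradients are already available as plain numbers; the function name and the sample values are hypothetical.

```python
def discounted_update(gradients, learning_rate, forgetting):
    """Weight change at step n from gradients[0..n-1], oldest first.

    gradients[t - 1] stands for dE(t)/dw; the gradient from step t is
    scaled by forgetting ** (n - t), so older steps count for less.
    """
    n = len(gradients)
    total = 0.0
    for t, g in enumerate(gradients, start=1):
        total += forgetting ** (n - t) * g
    return -learning_rate * total

grads = [0.4, -0.2, 0.1]   # illustrative dE(1)/dw, dE(2)/dw, dE(3)/dw
delta = discounted_update(grads, learning_rate=0.5, forgetting=0.5)
print(delta)
```

With a forgetting factor of 0 only the most recent gradient contributes, and with a factor of 1 every time step contributes equally.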
Advantages:
1. Recurrent backpropagation (RBP) is a good choice for sequence problems because it can handle sequences of different lengths.
2. The feedback links in recurrent backpropagation allow the network to recall earlier inputs.
3. Recurrent backpropagation allows sequential data to be handled in real time.
Disadvantages:
1. Due to the feedback connections and the requirement to handle sequences of various lengths, training with recurrent
backpropagation can be computationally expensive.
2. When the sequence length is large, recurrent backpropagation can suffer from vanishing or exploding gradients.
Applications:
1. Recurrent backpropagation is used for speech recognition, where the input is an audio signal.
2. Recurrent backpropagation is used to process text input that is written in natural language.
3. Recurrent backpropagation is used to forecast future values in time series data, including stock prices and the
weather.
3. Resilient backpropagation
Resilient backpropagation (Rprop) is a neural network learning algorithm that adjusts weights and biases using only
the sign of the gradient. Its fundamental principle is to adapt the step size of each weight individually: the step grows
while the gradient keeps the same sign across successive updates and shrinks when the sign flips. The algorithm is made to
be resistant to redundant, noisy, or unsuitable training data.
Equation:
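The weight update rule for resilient backpropagation is given by:
Δw_ij(t) = -sign(g_ij(t)) * Δ_ij(t)
where g_ij(t) is the gradient of the error function with respect to the weight w_ij at time t, sign() is the sign function,
Δw_ij(t) is the weight change at time t, and Δ_ij(t) is the positive step size of that weight. The step size is derived
from the previous one: Δ_ij(t) = η+ * Δ_ij(t-1) if g_ij(t) and g_ij(t-1) have the same sign, and Δ_ij(t) = η- * Δ_ij(t-1) if
their signs differ, with 0 < η- < 1 < η+.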
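A minimal sketch of a sign-based update with per-weight step sizes is given below. It assumes the variant that skips the weight change right after a sign flip (often called iRprop-), the commonly quoted factors 1.2 and 0.5, and a toy quadratic error whose two weights have very different gradient magnitudes; the function names and the test problem are hypothetical.

```python
def sign(x):
    return (x > 0) - (x < 0)

def rprop_minimize(grad_fn, weights, steps=200, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    weights = list(weights)
    step_sizes = [step_init] * len(weights)
    prev_grads = [0.0] * len(weights)
    for _ in range(steps):
        grads = grad_fn(weights)
        for i, g in enumerate(grads):
            agreement = g * prev_grads[i]
            if agreement > 0:      # same sign as last time: speed up
                step_sizes[i] = min(step_sizes[i] * eta_plus, step_max)
            elif agreement < 0:    # sign flipped: we overshot, slow down
                step_sizes[i] = max(step_sizes[i] * eta_minus, step_min)
                g = 0.0            # skip this update after a flip
            # Only the sign of the gradient is used, never its magnitude.
            weights[i] -= sign(g) * step_sizes[i]
            prev_grads[i] = g
    return weights

def toy_gradient(w):
    # Gradient of E = (w0 - 3)^2 + 100 * (w1 + 1)^2 (illustrative only).
    return [2.0 * (w[0] - 3.0), 200.0 * (w[1] + 1.0)]

solution = rprop_minimize(toy_gradient, [0.0, 0.0])
print(solution)
```

Both weights approach their minima at a similar pace even though one gradient is a hundred times larger than the other, because the update ignores gradient magnitude.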
Advantages:
1. Since resilient backpropagation considers only the gradient's sign, it is less susceptible to noisy data and can
converge more quickly than other gradient descent techniques.
2. Since it adjusts the step size of weight updates, it is more resistant to redundant or poorly conditioned training
data.
3. Compared to other gradient descent techniques, resilient backpropagation requires less processing.
Disadvantages:
1. Since resilient backpropagation does not take into account the gradient's magnitude, it may become stuck in local
minima.
2. More hyperparameter adjustment may be necessary than with other gradient descent techniques.
Applications:
1. Resilient backpropagation has been used in numerous neural network topologies for a variety of tasks,
including image classification, speech recognition, and stock price prediction.
2. It can be used to enhance the performance of deep learning architectures like convolutional neural networks
(CNNs) and recurrent neural networks (RNNs).
3. Additionally, resilient backpropagation has been applied to a variety of optimization problems in robotics, computer
vision, and control theory.
4. Backpropagation through time
Backpropagation through time (BPTT) is used in recurrent neural networks (RNNs) to compute gradients for each weight in
the network. Like the training procedure of any other neural network, it can be used to learn sequences of input/output
patterns. However, unlike feedforward networks, an RNN also preserves the "memory" of earlier inputs in the sequence,
which makes BPTT especially useful for time series data. BPTT extends the backpropagation technique that is used to train
feedforward neural networks to RNNs.
Equation:
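The Backpropagation through time algorithm uses the chain rule of differentiation to compute the
gradient of the error function with respect to the weights of the recurrent neural network. In its standard form, for a
network unrolled over T time steps, the algorithm is based on the following equation:
∂E/∂W = Σ_{t=1..T} ∂E_t/∂W, with ∂E_t/∂W = Σ_{k=1..t} ∂E_t/∂h_t * ∂h_t/∂h_k * ∂h_k/∂W
Where E is the total error, E_t is the error at time step t, W is a shared weight of the network, h_t is the hidden state at
time step t, and ∂h_k/∂W is the direct dependence of h_k on W with h_(k-1) held fixed.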
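A minimal sketch of this computation is given below, assuming a hypothetical scalar RNN h_t = tanh(w * h_(t-1) + u * x_t) with h_0 = 0 and a squared error at every time step. The network is run forward over the whole sequence, and the error is then propagated backward one time step at a time while the gradients of the shared weights w and u are accumulated. All names and values are illustrative.

```python
import math

def forward(w, u, xs):
    hs = [0.0]  # hs[0] is the initial hidden state
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, targets):
    hs = forward(w, u, xs)
    return sum(0.5 * (h - d) ** 2 for h, d in zip(hs[1:], targets))

def bptt(w, u, xs, targets):
    hs = forward(w, u, xs)
    grad_w = grad_u = 0.0
    da_next = 0.0  # error signal arriving from time step t + 1
    for t in range(len(xs), 0, -1):
        # Error at h_t: local error plus what flows back through w.
        dh = (hs[t] - targets[t - 1]) + w * da_next
        da = dh * (1.0 - hs[t] ** 2)   # through the tanh nonlinearity
        grad_w += da * hs[t - 1]       # shared weight: sum over time
        grad_u += da * xs[t - 1]
        da_next = da
    return grad_w, grad_u

xs, targets = [0.5, -1.0, 0.25, 0.8], [0.2, -0.3, 0.1, 0.4]
w, u = 0.7, -0.5
grad_w, grad_u = bptt(w, u, xs, targets)

# Finite-difference check of the gradient with respect to w.
eps = 1e-6
numeric_w = (loss(w + eps, u, xs, targets) - loss(w - eps, u, xs, targets)) / (2 * eps)
print(grad_w, numeric_w)
```

The factor w applied at every backward step is what makes gradients shrink or grow over long sequences, which is the source of the vanishing and exploding gradient problem.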

Advantages:
1. Backpropagation through time is very successful for sequential tasks such as speech recognition, stock price
forecasting, and natural language processing. It can recall information from earlier in the sequence and can handle
input sequences of any length.
2. It is a commonly used and well-known algorithm that has been proven to work effectively in real-world
settings.
3. Many popular deep learning frameworks, such as TensorFlow and PyTorch, use backpropagation through time to train recurrent networks.
Disadvantages:
1. Backpropagation through time can be computationally expensive and demands a lot of processing power.
2. Backpropagating errors over long sequences makes BPTT training unstable, which causes the vanishing or exploding
gradient problem.
3. The approach favors short-term memory since it only propagates errors backwards as far as the length of the
sequence.
Applications:
1. Natural Language Processing: Because backpropagation through time can produce text sequences based on prior
inputs, it is frequently used for language modeling, text categorization, and machine translation.
2. Speech Recognition: Backpropagation through time is employed in speech recognition tasks to recognize spoken
words and translate them into text.
3. Stock Price Prediction: Using historical data trends and backpropagation through time, one may forecast stock
prices.
4. Music Generation: Using prior notes and musical patterns, backpropagation through time can be used to create musical sequences.
Unit 4

Associative memory networks:


These kinds of neural networks work on the basis of pattern association: they can store
different patterns and, when producing an output, return one of the stored patterns by
matching it against the given input pattern. These types of memories are also called
Content-Addressable Memory (CAM). Associative memory searches the stored patterns in parallel,
treating them as data files.

Following are the two types of associative memories we can observe −

• Auto Associative Memory


• Hetero Associative memory
Auto Associative Memory
This is a single layer neural network in which the input training vector and the output target vectors are
the same. The weights are determined so that the network stores a set of patterns.

Architecture

As shown in the following figure, the architecture of the Auto Associative memory network
has ‘n’ input units for the training vector and the same number ‘n’ of output units for the target vector.
Hetero Associative memory
Similar to Auto Associative Memory network, this is also a single layer neural network. However, in
this network the input training vector and the output target vectors are not the same. The weights are
determined so that the network stores a set of patterns. Hetero associative network is static in nature,
hence, there would be no non-linear and delay operations.

Architecture

As shown in the following figure, the architecture of the Hetero Associative Memory network
has ‘n’ input units for the training vector and ‘m’ output units for the target vector.
Applications of Associative memory :-
1. It can only be used in memory allocation format.
2. It is widely used in database management systems.
3. Networking: Associative memory is used in network routing tables to quickly find the path to a
destination network based on its address.
4. Image processing: Associative memory is used in image processing applications to search for
specific features or patterns within an image.
5. Artificial intelligence: Associative memory is used in artificial intelligence applications such as
expert systems and pattern recognition.
6. Database management: Associative memory can be used in database management systems to
quickly retrieve data based on its content.
Advantages of Associative memory :-
1. It is used where the search time needs to be short.
2. It is suitable for parallel searches.
3. It is often used to speed up databases.
4. It is used in the page tables of virtual memory systems and in neural networks.
Disadvantages of Associative memory :-
1. It is more expensive than RAM.
2. Each cell must have storage capability and logic circuits for matching its content with
an external argument.
Outer product rule
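The outer product (Hebb) rule builds the weight matrix of an associative memory by summing, over all stored pairs, the outer product of the input vector s and the target vector t: W = Σ_p s(p)^T t(p). The sketch below assumes bipolar (+1/-1) patterns, a hetero-associative memory with 4 inputs and 2 outputs, and a threshold that maps a net input of zero to +1; the patterns and function names are hypothetical.

```python
def outer_product_weights(pairs):
    """W[i][j] = sum over stored pairs of s[i] * t[j]."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    weights = [[0] * m for _ in range(n)]
    for s, t in pairs:
        for i in range(n):
            for j in range(m):
                weights[i][j] += s[i] * t[j]
    return weights

def recall(weights, x):
    """Net input to each output unit, followed by a bipolar threshold."""
    m = len(weights[0])
    net = [sum(x[i] * weights[i][j] for i in range(len(x))) for j in range(m)]
    return [1 if v >= 0 else -1 for v in net]

# Two illustrative pairs; the input vectors are orthogonal to each other.
pairs = [([1, 1, -1, -1], [1, -1]),
         ([1, -1, 1, -1], [-1, 1])]
weights = outer_product_weights(pairs)
for s, t in pairs:
    print(s, "->", recall(weights, s), "expected", t)
```

Because the stored input vectors are orthogonal, presenting one of them produces a net input proportional to its own target, with no interference from the other pair.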
Storage capacity of Associative memory:
Associative memory, often referred to as content-addressable memory (CAM), is a type of computer memory that
allows data to be retrieved based on its content or attributes, rather than requiring a specific memory address. This
makes it well-suited for applications that involve searching for data or patterns within a large dataset.

Factors Affecting Storage Capacity:


• Number of Memory Cells: The storage capacity of an associative memory is primarily determined by the
number of memory cells it contains. Each memory cell can store a specific data item and its associated tag.
More memory cells allow for the storage of more data items.
• Size of Memory Cells: The size of each memory cell affects the amount of data that can be stored in it.
Larger memory cells can hold more information. This size can be measured in bits, bytes, or other units
depending on the technology used.
• Technology Used: The technology employed to implement the associative memory plays a crucial role in
determining its storage capacity. Different technologies have different characteristics and limits. For
instance, optical associative memories may have different storage capacities compared to electronic ones.
• Hardware Limitations: The hardware used to build the associative memory system can also affect its storage
capacity. This includes factors like the number of memory chips or modules in use.

Testing associative memory for missing and mistaken data involves ensuring that the memory system can accurately
retrieve the desired data even when it's incomplete or contains errors. Here are some common methods and
considerations for testing associative memory in such scenarios:
1. Incomplete or Missing Data:
• Test for Retrieval of Partial Data: Assess whether the associative memory can successfully retrieve
data even if only a part of the search key or data is provided. This is crucial in situations where some
attributes of the data might be missing or incomplete.
• Conduct Tests with Wildcards: Use wildcards or placeholders in the search keys to simulate missing
data. For example, use "don't care" bits that can match any value in the search key. Verify that the
memory can handle such incomplete queries.
• Evaluate Handling of NULL Values: In databases and certain applications, NULL values may
indicate missing data. Test whether the associative memory can correctly handle and retrieve data
containing NULL values.
2. Mistaken Data:
• Introduce Data Corruption: To test how well the associative memory handles mistaken or corrupted
data, deliberately introduce errors or incorrect values into the stored data. This may involve bit flips,
noise injection, or other forms of data corruption.
• Test Error Tolerance: Assess the memory's error tolerance by providing search keys with slight
variations or errors. Check if the memory can still retrieve the correct data, even when the input is not
a perfect match.
• Evaluate Error Correction: Some associative memory systems may have built-in error correction
capabilities. Test these features to ensure that they can successfully correct mistaken data and provide
the correct response.
3. Simulation and Benchmarking:
• Create Simulation Scenarios: Develop test scenarios that mimic real-world situations where missing
or mistaken data can occur. These scenarios should be representative of the specific application and
data types you're working with.
• Benchmark Performance: Measure the performance of the associative memory in terms of retrieval
accuracy, response time, and resource utilization when handling missing or mistaken data. Compare
the results against predetermined benchmarks or performance expectations.
4. Validation and Verification:
• Use Validation Data: Have a set of validation data with known missing or mistaken values. Test the
associative memory using this data to verify that it returns the expected results.
• Verification with Test Suites: Develop comprehensive test suites that cover various scenarios of
missing and mistaken data. This ensures thorough validation of the memory system.
5. Test Automation:
• Implement Test Automation: Use automated testing frameworks and scripts to systematically test the
associative memory's response to different missing and mistaken data scenarios. Automation helps
ensure consistency and repeatability.
6. Stress Testing:
• Subject the Memory to Stress Tests: Push the associative memory system to its limits by testing its
performance with large datasets, heavy workloads, and extreme conditions. This can help identify
potential issues under stress.
7. Real-World Data:
• Test with Real-World Data: Whenever possible, use actual data from the application domain to test
how the associative memory handles missing or mistaken data. Real-world data may present unique
challenges that synthetic data does not.
8. Error Reporting:
• Implement Error Reporting and Logging: Include mechanisms for the memory system to report errors
and log events related to missing and mistaken data. This can be valuable for diagnosing and
debugging issues during testing and in a production environment.
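The first two checks in the list above can be sketched for a small neural associative memory. This example assumes an auto-associative network trained with the outer product rule on bipolar patterns, with self-connections set to zero; missing components are represented by 0 and mistaken components by a flipped sign. The patterns and helper names are hypothetical.

```python
def train_auto(patterns):
    """Outer product rule with a zero diagonal (no self-connections)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, x):
    n = len(x)
    return [1 if sum(x[i] * w[i][j] for i in range(n)) >= 0 else -1
            for j in range(n)]

def with_missing(pattern, positions):
    """Blank out the given positions to simulate missing data."""
    return [0 if i in positions else v for i, v in enumerate(pattern)]

def with_mistakes(pattern, positions):
    """Flip the sign at the given positions to simulate mistaken data."""
    return [-v if i in positions else v for i, v in enumerate(pattern)]

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_auto([p1, p2])

missing_ok = recall(w, with_missing(p1, {2, 3})) == p1
mistaken_ok = recall(w, with_mistakes(p1, {0})) == p1
print("missing data recovered:", missing_ok)
print("mistaken data recovered:", mistaken_ok)
```

A fuller test suite would repeat these checks over many patterns and corruption levels to find the point at which recall starts to fail.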

Bidirectional Associative Memory (BAM):


Bidirectional Associative Memory (BAM) is a supervised learning model in artificial neural networks.
It is a hetero-associative memory: for an input pattern, it returns another pattern that is potentially of a
different size. This behavior is very similar to the human brain. Human memory is necessarily
associative. It uses a chain of mental associations to recover a lost memory, such as associating faces with
names or exam questions with answers.
To associate one type of object with another in this way, a recurrent neural network (RNN) is
needed that receives a pattern on one set of neurons as input and generates a related, but different, output
pattern on another set of neurons.
Why BAM is required?
The main objective to introduce such a network model is to store hetero-associative pattern pairs.
This is used to retrieve a pattern given a noisy or incomplete pattern.
BAM Architecture:
When BAM accepts an input of n-dimensional vector X from set A then the model recalls m-dimensional
vector Y from set B. Similarly when Y is treated as input, the BAM recalls X.
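A minimal sketch of this two-way recall is given below. It assumes bipolar patterns, a weight matrix built as the sum of outer products of the stored (X, Y) pairs, and a threshold that keeps the previous state when the net input is exactly zero; the patterns and function names are hypothetical.

```python
def bipolar(net, previous):
    """Threshold each unit; a net input of zero keeps the old state."""
    return [1 if v > 0 else -1 if v < 0 else p for v, p in zip(net, previous)]

def train_bam(pairs):
    n, m = len(pairs[0][0]), len(pairs[0][1])
    return [[sum(x[i] * y[j] for x, y in pairs) for j in range(m)]
            for i in range(n)]

def forward(w, x, previous_y):
    """X layer to Y layer, using W."""
    net = [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
    return bipolar(net, previous_y)

def backward(w, y, previous_x):
    """Y layer to X layer, using the transpose of W."""
    net = [sum(y[j] * w[i][j] for j in range(len(w[0]))) for i in range(len(w))]
    return bipolar(net, previous_x)

def recall_bam(w, x, max_iters=10):
    """Bounce between the layers until neither pattern changes."""
    x, y = list(x), [0] * len(w[0])
    for _ in range(max_iters):
        new_y = forward(w, x, y)
        new_x = backward(w, new_y, x)
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y

x1, y1 = [1, 1, 1, 1, -1, -1, -1, -1], [1, 1, -1]
x2, y2 = [1, -1, 1, -1, 1, -1, 1, -1], [-1, 1, 1]
w = train_bam([(x1, y1), (x2, y2)])

noisy_x1 = [-1, 1, 1, 1, -1, -1, -1, -1]   # first component flipped
recalled_x, recalled_y = recall_bam(w, noisy_x1)
print(recalled_x, recalled_y)
print(backward(w, y2, [0] * len(x2)))       # recall X from Y
```

Starting from a noisy X, the network settles on the stored pair, and presenting a stored Y retrieves its X through the transposed weights.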

Bidirectional processing also appears in recurrent sequence models, a concept distinct from BAM:

1. Forward and Backward Processing:


• In a bidirectional neural network, the input sequence is processed in two directions: forward
(from the beginning to the end of the sequence) and backward (from the end to the
beginning of the sequence). This means that each element in the sequence is considered in
both its natural order and in reverse order.
2. Bidirectional Recurrent Neural Networks (BiRNNs):
• Bidirectional memory is often used in the context of Bidirectional Recurrent Neural
Networks (BiRNNs). A BiRNN is a type of recurrent neural network that consists of two
separate recurrent layers: one that processes the sequence forward and another that
processes it backward.
3. Combining Forward and Backward Information:
• In a BiRNN, the outputs from the forward and backward layers are typically combined in
some way to provide a more comprehensive representation of the input sequence. Common
methods for combining these two representations include concatenation, element-wise
addition, or other more complex operations.
4. Advantages of Bidirectional Memory:
• Improved Contextual Understanding: Bidirectional memory allows the network to capture
context from both past and future elements in a sequence. This can be particularly useful in
tasks where understanding the context of a data point requires information from both
directions, such as in natural language processing.
• Enhanced Feature Extraction: By processing the data in both directions, the network can
potentially extract more meaningful and informative features from the input sequence. This
can lead to improved performance in tasks like sequence-to-sequence prediction, sentiment
analysis, or named entity recognition.
• Handling Long-Range Dependencies: Bidirectional networks can better handle long-range
dependencies in sequences, which can be challenging for unidirectional networks. For
example, in language modeling, understanding the meaning of a word may require
considering words that are far apart in the text.
5. Use Cases:
• Natural Language Processing: Bidirectional RNNs, such as Bidirectional LSTM (Long
Short-Term Memory) or Bidirectional GRU (Gated Recurrent Unit), are commonly used in
tasks like machine translation, text classification, and named entity recognition.
• Speech Recognition: Bidirectional RNNs can be employed to process speech signals for
tasks like speech recognition and understanding, where capturing context from both past
and future phonemes or words is essential.
• Time Series Analysis: Bidirectional networks can be applied to time series data to improve
predictions and detect patterns that depend on both past and future data points.
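The forward and backward passes and their combination can be sketched as follows. This example assumes a scalar tanh recurrent unit in each direction and combines the two by pairing the states at every position (the scalar analogue of concatenation); the weights and input values are hypothetical.

```python
import math

def run_rnn(xs, w_in, w_rec):
    """Simple scalar recurrent layer; returns the state after each input."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def bidirectional_states(xs, fwd_params, bwd_params):
    forward = run_rnn(xs, *fwd_params)
    # Process the reversed sequence, then flip back to align positions.
    backward = run_rnn(xs[::-1], *bwd_params)[::-1]
    # Pair the two views of each position (concatenation).
    return [(f, b) for f, b in zip(forward, backward)]

xs = [0.5, -1.0, 0.25]
states = bidirectional_states(xs, (0.8, 0.5), (0.6, -0.4))
for position, (f, b) in enumerate(states):
    print(position, f, b)
```

At each position the first value summarizes the inputs seen so far and the second summarizes the inputs still to come, so a later change in the sequence affects the backward state of earlier positions but not their forward state.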

Algorithm:
