Introduction To ANN
PREPARED BY,
Mahesh Kumar V B
Assistant Professor
1. Introduction
New models of computing for pattern recognition tasks are inspired by the structure and performance
of the biological neural network. However, these models are not expected to reach the performance of the
biological network, for several reasons. First, the operation of a biological neuron and its
interconnections is not yet fully understood. Second, simulating the vast number of neurons and
their asynchronous operation presents significant challenges. Despite these limitations, basic computing
units can emulate some features of biological networks.
Biological neural networks possess several characteristics that make them superior to even sophisticated AI
systems for pattern recognition tasks:
• Robustness and Fault Tolerance: Nerve cell decay minimally affects performance.
• Flexibility: Networks adapt to new environments without pre-programmed instructions.
• Handling Data Variety: Capable of managing fuzzy, probabilistic, noisy, and inconsistent
information.
• Collective Computation: Operates many tasks in parallel and distributes tasks effectively.
Biological Neural Networks
1. Structure of a Neuron
o A neuron consists of:
▪ Cell Body (Soma): Contains the nucleus and supports basic cellular functions.
▪ Dendrites: Receive signals from other neurons.
▪ Axon: Transmits signals to other neurons or muscle fibers via synapses.
▪ Synaptic Junctions (Synapses): Points of connection between neurons where signal
transmission occurs.
2. Signal Transmission
o At synapses, signals are transmitted chemically through neurotransmitter substances.
o Transmission process:
▪ Neurotransmitters modify electrical potential in the receiving neuron.
▪ If the potential reaches the firing threshold (the resting potential is about -60 mV to -70 mV), the neuron fires an electrical pulse.
▪ Pulse travels down the axon as a sequence of fixed-strength signals.
3. Neuron Types
o Sensory Neurons: Receive information from sensory organs like eyes or ears.
o Interneurons: Transmit signals between neurons.
o Motor Neurons: Transmit signals to muscle fibers.
4. Physical Characteristics
o Sizes:
▪ Cell body: 10-80 micrometers (µm).
▪ Dendrites and axons: Few µm in diameter.
▪ Synaptic gap width: About 200 nanometers (nm).
o Length:
▪ Internal neurons: 0.01 mm in the human brain.
▪ Limb neurons: Up to 1 meter.
5. Signal Propagation
o Speed: 0.5-2 meters per second in human brain cells.
o Signal stops at synapses; transmission across synapses mainly chemical.
6. Synaptic Activity
o Neurotransmitters affect postsynaptic neuron conductance:
▪ Excitatory synapses: Depolarize, promoting neuron activation.
▪ Inhibitory synapses: Hyperpolarize, reducing neuron activation.
o Synaptic Plasticity: Strength of synaptic connections adjusts based on activity, influencing
learning and memory.
7. Neural Complexity
o Types of Neurons: Differ in dendritic branching, axon length, and structural details.
o Connectivity: Vast network of neurons with converging (inputs) and diverging (outputs)
connections.
o Human Cortex: Contains about 10^11 neurons, with each neuron receiving input from about
10^4 synapses.
8. Challenges and Research
o Understanding neural network operations remains complex despite basic principles being
universal.
o Research focuses on understanding brain function through simple neuron unit behaviours and
connections.
Comparison of Computers and Biological Neural Networks
1. Speed:
o Computer: Executes instructions in nanoseconds (10^-9 seconds).
o Biological Neural Networks (BNNs): Neuronal events occur in milliseconds (10^-3
seconds).
o Implication: Computers process information nearly a million times faster than BNNs.
2. Processing:
o Computer: Operates sequentially, one instruction after another.
o BNNs: Perform massively parallel operations with fewer computational steps per operation.
o Implication: Despite being slower, BNNs excel in parallel processing, advantageous for
certain complex tasks.
3. Size and Complexity:
o BNNs: Approximately 10^11 neurons and 10^15 interconnections in the human brain.
o Computers: Conventional computing is limited by fixed architecture and memory size.
o Implication: The vast size and interconnectivity of BNNs enable complex pattern recognition
tasks beyond current computer capabilities.
4. Storage:
o Computer: Stores information in memory locations, overwriting old data.
o BNNs: Store information in the strengths of interconnections (synaptic weights), adaptable
without overwriting.
o Implication: BNNs exhibit associative memory, allowing recall from partial or noisy inputs,
unlike conventional computers.
5. Fault Tolerance:
o BNNs: Information is distributed across the network, allowing functionality even with
damaged neurons or connections.
o Computers: Memory corruption leads to irretrievable data loss.
o Implication: BNNs are fault-tolerant due to distributed information storage, unlike
computers which are prone to data loss.
6. Control Mechanism:
o Computer: Centralized control unit manages all computing activities.
o BNNs: Neurons operate autonomously based on local information, transmitting outputs to
connected neurons.
o Implication: BNNs lack a centralized control mechanism, relying on decentralized
processing akin to distributed computing tasks.
7. Adaptability and Learning:
o BNNs: Adjust synaptic strengths (learning) to store new information and adapt to
environmental changes.
o Computers: Programmed with fixed instructions and require reprogramming for significant
changes.
o Implication: BNNs demonstrate learning and adaptation capabilities crucial for cognitive
tasks and memory formation.
8. Complexity Handling:
o BNNs: Excel at handling fuzzy, inconsistent, and probabilistic data due to associative and
distributed nature of information.
o Computers: Require precise, logical rules and struggle with ambiguous data.
o Implication: BNNs mimic human cognitive abilities, making them suitable for complex
pattern recognition tasks beyond traditional computing.
The field of neural networks has evolved significantly since its inception. Key developments and
contributions have laid a strong theoretical and conceptual foundation. Table 1.1 provides an overview of
some of these significant milestones.
• Warren McCulloch and Walter Pitts proposed a model of a computing element known as the
McCulloch-Pitts neuron, which performs a weighted sum of the inputs followed by a threshold logic
operation. This model was capable of performing logical computations using combinations of these
neurons. However, the model's primary limitation was the fixed weights, preventing it from learning
from examples.
• Marvin Minsky developed a learning machine where connection strengths could be adapted
automatically, paving the way for adaptive learning systems.
• Frank Rosenblatt proposed the perceptron model with adjustable weights via the perceptron
learning law. This model showed convergence for pattern classification problems that are linearly
separable. Although it was shown that multilayer perceptrons could perform any pattern classification
task, there was no systematic learning algorithm to adjust the weights.
• Bernard Widrow and Marcian Hoff introduced the Adaline model and the LMS (Least Mean
Squares) learning algorithm. This algorithm proved its convergence and was successfully used in
adaptive signal processing.
• Marvin Minsky and Seymour Papert demonstrated the limitations of the perceptron model,
highlighting the need for suitable learning algorithms for multilayer networks.
• John Hopfield conducted an energy analysis of feedback neural networks, showing the existence of
stable equilibrium states in a network with symmetric weights and asynchronous state updates. This
resurgence of interest in neural networks laid the groundwork for further developments.
• Ackley, Hinton, and Sejnowski proposed the Boltzmann machine, a feedback neural network with
stochastic neuron units. These neurons used probabilistic update rules, and the machine included
hidden units to aid in pattern storage.
• Rumelhart, Hinton, and Williams (1986) demonstrated the generalized delta rule (backpropagation),
which allowed systematic adjustment of weights in a multilayer feedforward neural network to learn
implicit mappings in input-output pairs.
Table 1.1 (excerpt): Milestones in neural networks alongside related developments
Neural network milestones:
• McCulloch and Pitts (1943): model of the neuron
• Minsky and Papert (1969): no learning algorithm for multilayer perceptrons (MLP)
Related developments:
• von Neumann (1946): general-purpose electronic computer
• Little (1974): Ising model and neural networks
• Little and Shaw (1978): stochastic law for neural networks, spin glasses
Basics of Artificial Neural Networks
Processing Unit:
• An artificial neural network (ANN) is a simplified model of a biological neural network, consisting
of interconnected processing units.
• Each processing unit has a summing part and an output part.
o The summing part receives N input values, weights each input, and computes a weighted
sum known as the activation value.
o The output part produces a signal from the activation value.
o Weights can be positive (excitatory) or negative (inhibitory).
o Inputs and outputs can be discrete or continuous, and deterministic, stochastic, or fuzzy.
Interconnections:
• Processing units are interconnected in a specific topology to perform pattern recognition tasks.
o Inputs to a processing unit may come from outputs of other units or external sources.
o Outputs of a unit can be fed to several other units, including itself.
o The connection strength, or weight, determines how much output one unit receives from
another.
o In an ANN with N units, each unit has a unique activation value and output value at any time.
o The set of activation values defines the activation state, and the set of output values defines
the output state of the network.
Operations:
• Each unit receives inputs from connected units or external sources and computes a weighted sum of
these inputs.
• The activation value determines the unit's output state.
• The output values, along with other inputs, determine the activation and output states of other units.
• Activation dynamics describes how activation values change over time, determining the activation
state space (all possible activation states) and the output state space (all possible output states).
• The trajectory of activation states over time describes the activation dynamics of the network.
• Synaptic dynamics refers to changes in the connection weights over time, forming a weight vector
that defines the network's long-term memory.
• Adjusting weights to store patterns is known as learning, governed by a learning law or learning
algorithm.
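To make these operations concrete, here is a minimal Python sketch of a single processing unit; the sigmoid output function, function name, and all numeric values are illustrative assumptions, not part of the notes:

import math

def unit_output(inputs, weights, bias):
    # Summing part: weighted sum of inputs minus the bias gives the activation value
    activation = sum(w * a for w, a in zip(weights, inputs)) - bias
    # Output part: a nonlinear output function of the activation (here a sigmoid)
    return 1.0 / (1.0 + math.exp(-activation))

# Example: a unit with two inputs
print(unit_output([0.5, 0.2], [0.4, -0.3], 0.1))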
Update:
In an ANN, the states of the units may be updated synchronously (all units at once) or asynchronously
(one unit at a time). In biological neural networks, the activation dynamics and updates are much more
complex than in the simplified ANN models. ANN models and their governing equations are designed
according to the specific pattern recognition tasks they aim to handle.
Models of Neuron
1. McCulloch-Pitts Model
The McCulloch-Pitts (MP) model is one of the earliest models of an artificial neuron. It simplifies the
functioning of a biological neuron by using a mathematical model to describe the activation and output of
the neuron.
• Inputs: The model takes M input values a1,a2,...,aM.
• Weights: Each input is associated with a weight w1,w2,...,wM
• Bias: A bias term θ is included.
• Activation Function: The activation x is computed as the weighted sum of the inputs minus the bias:
x = w1a1 + w2a2 + ... + wMaM − θ
• Output Function: The output s is a nonlinear function f(x) of the activation value x. The original MP model used a binary step function:
s = f(x) = 1 if x > 0, and s = 0 otherwise
Three commonly used nonlinear functions (binary, ramp and sigmoid) are shown in Figure 1.3, although
only the binary function was used in the original MP model.
In the MP model, the weights are fixed. Hence, a network using this model does not have the capability of
learning. Moreover, the original model allows only binary output states, operating at discrete time steps.
Example
Consider a neuron with two inputs (a1 and a2), weights w1 = 0.5 and w2 = 0.5, and bias θ = 0.5. For a1 = 1 and a2 = 1, the activation is x = 0.5 + 0.5 − 0.5 = 0.5 > 0, so s = 1; for every other input combination x ≤ 0, so s = 0. The neuron therefore implements the logical AND function.
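A minimal Python sketch of this example follows (assuming, as above, that the step function fires only for strictly positive activation):

def mp_neuron(a1, a2, w1=0.5, w2=0.5, theta=0.5):
    x = w1 * a1 + w2 * a2 - theta   # activation: weighted sum minus bias
    return 1 if x > 0 else 0        # binary step output function

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "->", mp_neuron(a1, a2))  # prints the AND truth table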
2. Perceptron
The Perceptron, introduced by Frank Rosenblatt, improves on the MP model by incorporating learning
through weight adjustments.
• Inputs and Weights: Similar to the MP model, with weights that are adjustable.
• Activation Function: As in the MP model, the activation is the weighted sum of the inputs minus a threshold:
x = w1a1 + w2a2 + ... + wMaM − θ
• Output Function: Uses a binary step function like the MP model:
s = f(x)
• Learning: The weights are adjusted using the perceptron learning law:
Δwi = η (b − s) ai
where b is the target output, s is the actual output, and η is the learning rate.
Example
Suppose a Perceptron has two inputs (a1 and a2), initial weights w1 = 0.1 and w2 = 0.1, and a learning rate η = 0.01. If the inputs are a1 = 1 and a2 = 0 with target output b = 1, the activation is x = 0.1 × 1 + 0.1 × 0 = 0.1 > 0, so s = 1. Since s already equals b, the error b − s is zero and the weights are unchanged. Had the target been b = 0, the update Δw1 = η(b − s)a1 = −0.01 would reduce w1 to 0.09.
The perceptron learning law gives a step-by-step procedure for adjusting the weights. Whether the
weight adjustment converges depends on the nature of the desired input-output pairs to be
represented by the model. The perceptron convergence theorem enables us to determine whether the given
pattern pairs are representable. If the weight values converge, the corresponding problem is said
to be represented by the perceptron network.
3. Adaline
ADAptive LINear Element (ADALINE) is a computing model proposed by Widrow and is shown in
Figure 1.6.
The main distinction between Rosenblatt's perceptron model and Widrow's Adaline model is that, in
the Adaline, the analog activation value (x) is compared directly with the target output (b). In other words, the output
is a linear function of the activation value (x). The equations that describe the operation of an Adaline are as
follows:
x = w1a1 + w2a2 + ... + wMaM
s = f(x) = x
Δwi = η (b − s) ai = η (b − x) ai
Example:
Consider an Adaline with inputs a1 = 1 and a2 = 0, weights w1 = 0.1 and w2 = 0.1, target output b = 1, and learning rate η = 0.1 (illustrative values). The activation is x = 0.1, the error is b − x = 0.9, and the weight updates are Δw1 = η(b − x)a1 = 0.09 and Δw2 = 0, giving new weights w1 = 0.19 and w2 = 0.1.
This weight update rule minimises the mean squared error (b − s)^2, averaged over all inputs. Hence it is called the
Least Mean Squared (LMS) error learning law. This law is derived using the negative gradient of the error
surface in the weight space. Hence it is also known as a gradient descent algorithm.
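A short Python sketch of Adaline training with the LMS rule; the training patterns, learning rate, and epoch count below are illustrative assumptions:

def lms_train(patterns, eta=0.1, epochs=50):
    w = [0.0, 0.0]                                        # initial weights
    for _ in range(epochs):
        for a, b in patterns:
            x = sum(wi * ai for wi, ai in zip(w, a))      # linear output s = x
            err = b - x                                   # error against target b
            w = [wi + eta * err * ai for wi, ai in zip(w, a)]  # LMS update
    return w

# Learn the linear mapping b = a1 - a2 from three samples
print(lms_train([((1, 0), 1), ((0, 1), -1), ((1, 1), 0)]))  # approaches [1.0, -1.0]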
Neural networks learn by adjusting their synaptic weights based on specific rules known as learning laws.
These laws describe how weights should be updated in response to input data and the network's output.
• Short Term Memory (STM): Modeled by the activation state of the network, representing temporary
information held during processing.
• Long Term Memory (LTM): Encoded in the synaptic weights, representing the knowledge acquired
through learning.
Learning Laws
Learning laws are models of synaptic dynamics, describing how weights (synaptic connections) are updated
over time. They are often expressed as learning equations.
The weight vector wi for the i-th processing unit at time t + 1 is given by:
wi(t + 1) = wi(t) + Δwi(t)
where Δwi(t) is the change in the weight vector prescribed by the particular learning law.
1. Hebb's Law
• Principle: The synaptic strength increases when both the presynaptic and postsynaptic neurons are
simultaneously active. The weight change is proportional to the product of the input signal and the
output signal:
Δwi = η si a
where η is the learning rate, si is the output signal of the i-th unit, and a is the input vector.
• Characteristics:
o Unsupervised Learning: The network learns from the input patterns without needing a
desired output.
o Weight Initialization: Weights are initialized to small random values around zero.
• Example:
o Input vector a = (0.5, 0.2)
o Output signal si = 1
o Learning rate η = 0.1
o Weight change: Δwi = η si a = 0.1 × 1 × (0.5, 0.2) = (0.05, 0.02)
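In Python, this single Hebbian update is (values from the example above):

eta, s = 0.1, 1                     # learning rate and output signal
a = [0.5, 0.2]                      # input vector
w = [0.0, 0.0]                      # weights initialised near zero
w = [wi + eta * s * ai for wi, ai in zip(w, a)]   # Hebb: delta_w = eta * s * a
print(w)                            # approximately [0.05, 0.02]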
2. Perceptron Learning Law
• Principle: The weights are adjusted based on the error between the desired output and the actual output:
Δwi = η [bi − sgn(wTa)] a
This law is used in perceptrons with bipolar output functions.
Key Points:
• Bipolar Output Functions: This law is applicable only for bipolar output functions, also known as
the discrete perceptron learning law.
• Error Correction: Weights are adjusted only if the actual output si is incorrect. If si is correct, the
term [bi−sgn(wTa)] becomes zero, resulting in no change to the weights.
• Supervised Learning: The perceptron learning law is a supervised learning law, requiring a desired
output for each input.
• Weight Initialization: Weights can be initialized to any random values as the initial values are not
critical. The weights will converge to the final values through repeated use of the input-output pattern
pairs.
• Convergence: The weights will converge to the correct values if the input-output pattern pairs are
representable by the system.
3. Delta Learning Law
• Principle: The change in the weight vector is proportional to the error between the desired output
and the actual output, modulated by the derivative of the output function:
Δwi = η [bi − f(wTa)] f'(wTa) a
This is also known as the continuous perceptron learning law.
Key Points:
• Differentiable Output Function: The Delta Learning Law requires a differentiable activation
function because the weight update rule depends on the derivative of the output function.
• Error Correction: The weights are adjusted based on the error [bi−f(wTa)] between the desired
output and the actual output.
• Supervised Learning: It is a supervised learning law, meaning it requires a desired output for each
input.
• Weight Initialization: Weights can be initialized to any random values. They are not critical initially
because they will eventually converge to the correct values.
• Convergence: The weights will converge to their final values through repeated iterations using the
input-output pattern pairs. The convergence can be enhanced by adding more layers of processing
units between the input and output layers.
• Generalization: The Delta Learning Law can be extended to multiple layers in a feedforward
network, forming the basis for the backpropagation algorithm used in training deep neural networks.
Through repeated iterations with different input-output pairs, the weights will eventually converge to values
that minimize the error for the given dataset. This process is more efficient when using multiple layers in the
network, allowing for more complex representations and improved learning.
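A single delta-law update with a differentiable (sigmoid) output function can be sketched as follows; all numeric values are illustrative:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta_step(w, a, b, eta):
    x = sum(wi * ai for wi, ai in zip(w, a))   # activation
    s = sigmoid(x)                             # differentiable output f(x)
    fprime = s * (1.0 - s)                     # derivative f'(x) of the sigmoid
    # delta law: w_i <- w_i + eta * (b - f(x)) * f'(x) * a_i
    return [wi + eta * (b - s) * fprime * ai for wi, ai in zip(w, a)]

print(delta_step([0.1, 0.1], [1.0, 0.0], 1.0, 0.5))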
4. Widrow-Hoff LMS Learning Law
• Principle: Also known as the Least Mean Squared (LMS) error learning law, it adjusts the weights
to minimize the mean squared error between the desired output and the actual output:
Δwi = η [bi − wTa] a
It is a special case of the delta learning law with a linear output function.
Key Points:
• Supervised Learning: It is a supervised learning law, requiring a desired output for each input.
• Linear Output Function: The output function is assumed to be linear, i.e., f(x)=x
• Error Correction: The weights are adjusted based on the error [bi−wTa] between the desired output
and the actual output.
• Gradient Descent: The weight update is proportional to the negative gradient of the error, leading to
minimization of the mean squared error.
• Initialization: Weights can be initialized to any values.
• Convergence: The input-output pattern pairs are applied multiple times to achieve convergence.
Convergence is not guaranteed for any arbitrary training data set.
5. Correlation Learning Law
• Principle: The change in the weight vector is directly proportional to the product of the desired output and the input vector:
Δwi = η bi a
This is a supervised learning law, as it uses the desired output to adjust the weights.
Characteristics:
• Supervised Learning: Uses the desired output value to adjust the weights.
• Weight Initialization: Weights are typically initialized to small random values close to zero.
• Relation to Hebbian Learning: This can be viewed as a special case of Hebbian learning, but it is
supervised.
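The correlation update differs from the Hebbian sketch only in using the desired output bi in place of the actual output; the values below are illustrative:

eta, b = 0.1, 1                     # learning rate and desired output
a = [0.5, 0.2]                      # input vector
w = [wi + eta * b * ai for wi, ai in zip([0.0, 0.0], a)]  # delta_w = eta * b * a
print(w)                            # approximately [0.05, 0.02]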
6. Outstar Learning Law
The Outstar Learning Law is a supervised learning algorithm used to adjust the weights of a neural network
layer so that its output pattern captures the characteristics of the desired response. This learning law is
typically used together with instars in networks for data compression.
Key Points:
• Supervised Learning: It is a supervised learning law, requiring a desired output pattern for each
input.
• Desired Response: The weight adjustment aims to make the output pattern of the network layer
match the desired response vector b = (b1, b2, ..., bM):
Δwjk = η (bj − wjk), for j = 1, 2, ..., M
• Active Unit: The k-th unit is the only active unit in the input layer during learning.
• Initialization: The weight vectors are initialized to zero before training.
• Data Compression: The Outstar Learning Law is used with networks of instars to capture input and
output pattern characteristics, aiding in data compression.
Example:
Consider a neural network layer with 3 output neurons and 1 active input neuron. The Outstar Learning
Law adjusts the weights toward a given desired output pattern, as sketched below.
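A minimal Python sketch of this outstar example, assuming an illustrative desired pattern and learning rate; repeated updates move the weights of the active unit toward b:

def outstar_step(w, b, eta):
    # delta_w_j = eta * (b_j - w_j) for the single active input unit
    return [wj + eta * (bj - wj) for wj, bj in zip(w, b)]

w = [0.0, 0.0, 0.0]                 # weights initialised to zero
b = [1.0, 0.5, 0.2]                 # desired output pattern (illustrative)
for _ in range(50):
    w = outstar_step(w, b, eta=0.2)
print(w)                            # close to b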
By understanding and applying these learning laws, neural networks can be trained effectively to recognize
patterns, adapt to new data, and perform a variety of tasks in machine learning and artificial intelligence.