ANN TUTORIAL
UNIT I FUNDAMENTALS OF ANN
Fundamentals of ANN – Biological Neurons and Their Artificial Models – Types of ANN
– Properties – Different Learning Rules – Types of Activation Functions – Training of
ANN – Perceptron Model (Both Single &Multi-Layer) – Training Algorithm – Problems
Solving Using Learning Rules and Algorithms – Linear Separability Limitation and Its
Over Comings
1. FUNDAMENTALS OF ANN
The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor, and only the massive interaction between all cells and their parallel processing makes the brain's abilities possible. Figure 1 represents a human biological nervous unit; the various parts of the biological neural network (BNN) are marked in Figure 1.
Dendrites are branching fibres that extend from the cell body or soma.
The soma, or cell body, of a neuron contains the nucleus and other structures, and supports chemical processing and the production of neurotransmitters.
The axon is a single fibre that carries information away from the soma to the synaptic sites of other neurons (dendrites and somas), muscles, or glands.
The axon hillock is the site of summation for incoming information. At any moment, the collective influence of all neurons that conduct impulses to a given neuron determines whether or not an action potential will be initiated at the axon hillock and propagated along the axon.
Myelin sheath consists of fat-containing cells that insulate the axon from electrical
activity. This insulation acts to increase the rate of transmission of signals. A gap exists
between each myelin sheath cell along the axon. Since fat inhibits the propagation of electricity,
the signals jump from one gap to the next.
Nodes of Ranvier are the gaps (about 1 μm) between myelin sheath cells. Since fat
serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical
impulse along the axon.
A synapse is the point of connection between two neurons, or between a neuron and a muscle or a gland. Electrochemical communication between neurons takes place at these junctions.
Terminal buttons of a neuron are the small knobs at the end of an axon that release
chemicals called neurotransmitters.
A processing unit sums the inputs and then applies a non-linear activation function (i.e. a squashing/transfer/threshold function).
An output line transmits the result to other neurons.
An artificial neuron consists of three basic components: weights, a threshold (bias), and a single activation function. An artificial neural network (ANN) model based on biological neural systems is shown in Figure 2.
Supervised learning
Unsupervised learning
Reinforced learning
Hebbian learning
Gradient descent learning
Competitive learning
Stochastic learning
1.3.1.1 Supervised learning
Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern.
1.3.1.2 Unsupervised learning
In this learning method the target output is not presented to the network. It is as if there is no teacher to present the desired patterns; hence the system learns on its own by discovering and adapting to structural features in the input patterns.
1.3.1.3 Reinforced learning
In this method a teacher, though available, does not present the expected answer but only indicates whether the computed output is correct or incorrect. This information helps the network in the learning process.
This rule was proposed by Hebb and is based on correlative weight adjustment; it is the oldest learning mechanism inspired by biology. In it, the input-output pattern pairs (xi, yi) are associated by the weight matrix W, known as the correlation matrix.
It is computed as
W = Σi xi yi^T ----------- eq (1)
Here yi^T is the transpose of the associated output vector yi. Numerous variants of the rule have been proposed.
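As a concrete illustration, the sketch below builds the correlation matrix from two bipolar pattern pairs using NumPy; the specific patterns are arbitrary examples, not taken from the notes.

import numpy as np

# Hebbian (correlation) learning sketch: the weight matrix is the sum of
# outer products of each input pattern with its associated output pattern.
X = np.array([[1, -1, 1], [-1, 1, 1]])   # two bipolar input patterns x_i
Y = np.array([[1, -1], [-1, 1]])         # associated output patterns y_i

W = np.zeros((X.shape[1], Y.shape[1]))
for x, y in zip(X, Y):
    W += np.outer(x, y)                  # W = sum_i x_i y_i^T

# Recall: the sign of x W reproduces the stored association for these pairs.
print(np.sign(X @ W))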
Gradient descent learning is based on the minimization of an error E defined in terms of the weights and the activation function of the network. It is required that the activation function employed by the network be differentiable, as the weight update depends on the gradient of the error E.
Thus if Δwij is the weight update of the link connecting the i-th and j-th neurons of the two neighbouring layers, then Δwij is defined as
Δwij = η (∂E/∂wij) ----------- eq (2)
where η is the learning rate parameter and ∂E/∂wij is the error gradient with reference to the weight wij.
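A minimal numeric sketch of one such update, assuming a single linear neuron with squared error E = ½(t − y)²; the values of x, w, t and η are made up for illustration. Note that descent moves against the gradient, so the signed update is Δw = −η ∂E/∂w.

import numpy as np

x = np.array([1.0, 0.5, -1.0])   # input vector (assumed example)
w = np.array([0.2, -0.1, 0.4])   # current weights w_ij
t = 1.0                          # target output
eta = 0.1                        # learning rate η

y = w @ x                        # actual output of the linear neuron
grad = -(t - y) * x              # dE/dw for E = 0.5*(t - y)^2
w = w - eta * grad               # move against the gradient of E
print(w)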
In competitive learning, those neurons which respond strongly to the input stimuli have their weights updated.
When an input pattern is presented, all neurons in the layer compete and the winning neuron undergoes weight adjustment; hence it is a winner-takes-all strategy.
The different learning laws or rules, with their features, are given in Table 1 below.
Table 1: Different learning laws with their weight details and learning type
1.4 TYPES OF ACTIVATION FUNCTIONS
1.4.1. Linear Function
Linear functions are the simplest form of activation function; refer to Figure 4. Here f(x) is just an identity function, usually used in simple networks. It collects the input and produces an output which is proportional to the given input. This is better than a step function because it gives multiple output values, not just True or False.
1.4.2. Binary Step Function (with threshold θ) (aka Heaviside Function or Threshold Function)
f(x) = 1 if x ≥ θ
f(x) = 0 if x < θ ----------- eq (4)
The binary step function is shown in Figure 4. It is also called the Heaviside function; in some literature it is also known as the threshold function. Equation 4 gives the output of this function.
1.4.3. Binary Sigmoid
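The binary sigmoid squashes its input smoothly into the range (0, 1): f(x) = 1 / (1 + e^(−kx)). Since the figures for these functions are not reproduced here, the following sketch implements the linear, binary step and binary sigmoid activations in Python; the threshold θ and slope k are assumed parameters.

import numpy as np

def linear(x):
    return x                            # identity: output proportional to input

def binary_step(x, theta=0.0):
    return np.where(x >= theta, 1, 0)   # Heaviside / threshold function, eq (4)

def binary_sigmoid(x, k=1.0):
    return 1.0 / (1.0 + np.exp(-k * x)) # smooth squashing into (0, 1)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(linear(x), binary_step(x), binary_sigmoid(x))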
1.5 PERCEPTRON MODEL
1.5.1 Simple Perceptron for Pattern Classification
f(net) = 1 if net > θ; 0 if −θ ≤ net ≤ θ; −1 if net < −θ ----------- eq (7)
Equation 7 gives the bipolar activation function, which is the most common function used in perceptron networks. Figure 7 represents a single-layer perceptron network. The inputs arising from the problem space are collected by the sensors and fed to the association units. Association units are responsible for associating the inputs based on their similarities; each unit groups similar inputs, hence the name association unit.
A single input from each group is given to the summing unit. Weights are randomly fixed initially and assigned to these inputs. The net value is calculated using the expression
net = b + Σi xi wi ----------- eq (8)
This value is given to the activation function unit to get the final output response. The actual output is compared with the target (desired) output; if they are the same, training can stop, otherwise the weights have to be updated, since there is an error. The error is given as δ = b − s, where b is the desired/target output and s is the actual outcome of the machine. The weights are updated based on the perceptron learning law as given in equation 9:
wi(new) = wi(old) + η δ xi ----------- eq (9)
Step 1: Initialize the weights and bias. For simplicity, set the weights and bias to zero. Set the learning rate in the range of zero to one.
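The remaining steps of the training algorithm are not reproduced above, so here is a minimal runnable sketch of the whole loop for the AND gate with bipolar inputs and targets, using the update w ← w + η(b − s)x from equation 9; the stopping rule "no errors in an epoch" is an assumption.

import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])  # bipolar inputs
T = np.array([1, -1, -1, -1])                       # AND targets

w = np.zeros(2)                  # Step 1: weights and bias start at zero
bias, eta = 0.0, 1.0

for epoch in range(10):
    errors = 0
    for x, b in zip(X, T):
        s = 1 if (w @ x + bias) >= 0 else -1        # actual output
        if s != b:                                  # error δ = b - s
            w += eta * (b - s) * x                  # perceptron learning law
            bias += eta * (b - s)
            errors += 1
    if errors == 0:              # stop when all patterns are classified
        break
print(w, bias)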
1.5.3 Multi-Layer Perceptron Model
Figure 8 is the general representation of a multi-layer perceptron network. In between the input and output layers there are some more layers, known as hidden layers.
1.6 LINEARLY SEPARABLE & LINEARLY INSEPARABLE TASKS
Perceptrons are successful only on problems with a linearly separable solution space. Figure 9 represents both linearly separable and linearly inseparable problems. Perceptrons cannot handle, in particular, tasks which are not linearly separable (known as linearly inseparable problems). Sets of points in two-dimensional space are linearly separable if the sets can be separated by a straight line. Generalizing, a set of points in n-dimensional space is called linearly separable if the classes can be separated by a hyperplane, as represented in Figure 9.
A single-layer perceptron can be used for linear separation, for example the AND gate. But it cannot be used for nonlinear, inseparable problems (for example the XOR gate). Consider Figure 10.
Here a single decision line cannot separate the zeros and ones linearly; at least two lines are required to separate the zeros and ones, as shown in Figure 10. Hence single-layer networks cannot be used to solve inseparable problems. To overcome this problem we go for the creation of convex regions.
Convex regions can be created by multiple decision lines arising from multi-layer networks. A single-layer network cannot be used to solve an inseparable problem; hence we go for a multi-layer network, thereby creating convex regions which solve the inseparable problem.
Select any two points in a region and draw a straight line between them. If the points selected and the line joining them both lie inside the region, then that region is known as a convex region.
2. MULTI LAYER NETWORKS
Single-layer networks cannot be used to solve linearly inseparable problems; they can only be used to solve linearly separable problems.
Single-layer networks cannot solve complex problems.
Single-layer networks cannot be used when a large input-output data set is available.
Single-layer networks cannot capture the complex information available in the training pairs.
Any neural network which has at least one layer in between the input and output layers is called a multi-layer network.
Layers present in between the input and output layers are called hidden layers.
An input layer neural unit just collects the inputs and forwards them to the next higher layer.
Hidden layer and output layer neural units process the information fed to them and produce an appropriate output.
Multi-layer networks provide optimal solutions for arbitrary classification problems.
Multi-layer networks use linear discriminants, where the inputs are first transformed non-linearly.
2.3 BACK PROPAGATION NETWORKS (BPN)
2.3.1 BPN Algorithm
The algorithm for BPN is classified into four major steps, as follows:
Algorithm
I. Initialization of weights
Step 1: Initialize the weights to small random values near zero
Step 2: While the stop condition is false, do steps 3 to 10
Step 3: For each training pair, do steps 4 to 9
Step 4: Each input xi is received and forwarded to the higher (next hidden) layer
Step 5: Each hidden unit sums its weighted inputs as follows:
Zinj = Woj + Σ xi Wij
Applying Activation function
Zj = f(Zinj)
This value is passed to the output layer
Step 6: The output unit sums its weighted inputs:
yink = Vok + Σ Zj Vjk
Applying Activation function
Yk = f(yink)
V. Updating of Weights & Biases
Step 9: continued:
New Weight is
Wij(new) = Wij(old) + Δwij
Vjk(new) = Vjk(old) + ΔVjk
New bias is
Woj(new) = Woj(old) + Δwoj
Vok(new) = Vok(old) + ΔVok
Step 10: Test for Stop Condition
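Putting Steps 4 to 10 together, the sketch below performs one complete backpropagation step for a single training pair, keeping the notes' naming (W, Woj for input-to-hidden weights and biases; V, Vok for hidden-to-output). The data, layer sizes and learning rate are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0]);  t = np.array([1.0])       # one training pair
W = rng.normal(0, 0.1, (2, 3));  Wo = np.zeros(3)    # small random init (Step 1)
V = rng.normal(0, 0.1, (3, 1));  Vo = np.zeros(1)
eta = 0.5
f = lambda a: 1.0 / (1.0 + np.exp(-a))               # binary sigmoid

# Forward pass (Steps 4-6)
z_in = Wo + x @ W;  z = f(z_in)                      # hidden layer
y_in = Vo + z @ V;  y = f(y_in)                      # output layer

# Backward pass: error terms use f'(a) = f(a)(1 - f(a)) for the sigmoid
delta_k = (t - y) * y * (1 - y)                      # output error term
delta_j = (delta_k @ V.T) * z * (1 - z)              # hidden error term

# Step 9: weight and bias updates
V += eta * np.outer(z, delta_k);  Vo += eta * delta_k
W += eta * np.outer(x, delta_j);  Wo += eta * delta_j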
2.3.2 Merits
2.3.3 Demerits
2.4 COUNTER PROPAGATION NETWORKS (CPN)
This network was proposed by Hecht-Nielsen in 1987. It implements both supervised and unsupervised learning; it is actually a combination of two neural architectures: (a) a Kohonen layer (unsupervised) and (b) a Grossberg layer (supervised). It provides a good solution where long training is not tolerated. CPN functions like a look-up table with generalization. The training pairs may be binary or continuous. CPN produces a correct output even when the input is partially incomplete or incorrect. The main types of CPN are (a) full counter propagation and (b) forward-only counter propagation. Figure 2 represents the architectural diagram of the CPN network.
• Forward-only nets are the simplified form of full counter propagation networks
• Forward-only nets are used for approximation problems
The first layer is the Kohonen layer, which uses the competitive learning law. The procedure used here is: when an input is provided, the weighted net value is calculated for each node. Then the node with the maximum output is selected and the signals from the other neurons are inhibited. The output from the winning neuron only is provided to the next higher layer, which is the supervised Grossberg layer. Grossberg processing is similar to that of a normal supervised algorithm.
Step 1: Initialize the weights & learning rate to a small random value near zero
Step 2: While the stop condition is false, do steps 3 to 9
Step 3: Set the X input layer activations to vector X
Step 4: Each input xi is received and forwarded to the higher layer (Kohonen layer)
Step 5: Each Kohonen unit sums its weighted inputs as follows.
Inputs and weights are normalised, then the net value is calculated as
Kinj = Woj + X·W (in vector form)
Applying the activation function,
Kj = f(Kinj)
Step 5A: The winning cluster is identified (the node with the maximum output is selected as the winner; only this output is forwarded to the next Grossberg layer, and all other units' outputs are inhibited).
For clustering the inputs xi, the Euclidean distance norm function is used:
Dj = Σi (xi − vij)² + Σk (yk − wkj)²
Dj should be minimum.
Step 6: Update the weights over the calculated winner unit Kj
Step 7: Test for the stop condition of Phase I
(Phase I: input X layer to Z cluster layer; Phase II: Z cluster layer to Y output layer)
Step 8: Repeat steps 5, 5A and 6 for the Phase II layers
Step 9: Test for the stop condition of Phase II
2.4.2 Merits
A combination of unsupervised (Phase I) and supervised (Phase II) learning
The network works like a look-up table
Fast and coarse approximation
Reported to be around 100 times faster to train than the BPN model
2.4.3 Demerits
Learning phase requires intensive calculations
Selection of number of Hidden layer neurons is an issue
Selection of number of Hidden layers is also an issue
Network gets trapped in Local Minima
2.5 BI-DIRECTIONAL ASSOCIATIVE MEMORIES
Figure 3 represents the BAM architecture. BAM contains 'n' neurons in the X layer and 'm' neurons in the Y layer. Both the X and Y layers can act as input or output layers. The weights for the X-to-Y direction are taken as W, and the weights for the Y-to-X direction are W^T (the transpose). If binary or bipolar activations are used, it is known as a discrete BAM; if continuous activations are used, it comes under the continuous BAM type.
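A minimal discrete-BAM sketch, assuming bipolar example patterns: the weight matrix is the sum of outer products of the stored pairs, and recall bounces activations between the two layers using W and W^T.

import numpy as np

X = np.array([[1, 1, -1, -1], [-1, -1, 1, 1]])   # patterns on the X layer
Y = np.array([[1, -1], [-1, 1]])                 # associated Y patterns

W = sum(np.outer(x, y) for x, y in zip(X, Y))    # n x m weight matrix

def recall(x, steps=5):
    y = np.sign(x @ W)              # X layer -> Y layer uses W
    for _ in range(steps):
        x = np.sign(y @ W.T)        # Y layer -> X layer uses W^T
        y = np.sign(x @ W)
    return x, y

print(recall(np.array([1, 1, -1, 1])))  # noisy input still recalls pair 1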
2.5.2 Algorithm
The following steps explain the procedural flow of the Bi-directional Associative Memory.
2.5.3 Merits
2.5.4 Demerits
• Incorrect convergence
• Memory capacity is limited: the number of stored patterns 'm' must be less than the number of neurons 'n' in the smaller layer
• Sometimes the network learns patterns which were not provided to it
2.5.5 Applications of BAM
• Fault Detection
• Pattern Association
• Real Time Patient Monitoring
• Medical Diagnosis
• Pattern Mapping
• Pattern Recognition systems
• Optimization problems
• Constraint satisfaction problems
2.6 ADAPTIVE RESONANCE THEORY (ART)
How can a network learn a new pattern without forgetting the old traces (patterns), and how can it adapt to a changing environment (input)? Remembering previously learned vectors (stability) while still responding to changes in the patterns (plasticity) is the problem. ART uses competitive learning with a self-regulating control structure to solve this PLASTICITY–STABILITY dilemma. The simplified ART diagram is given below in Figure 5.
2.6.2 Comparison Layer: takes a 1-D input vector and transfers it to the best match in the recognition field (the best match is the neuron in the recognition unit whose weight vector most closely matches the input vector).
2.6.3 Recognition Unit: produces an output proportional to the quality of the match. In this way the recognition field allows a neuron to represent the category to which the input vector is classified.
Vigilance parameter: after the input vector is classified, a reset module compares the strength of the match to the vigilance parameter (defined by the user). Higher vigilance produces fine, detailed memories, while a lower vigilance value gives more general memories.
2.6.4 Reset module: compares the strength of the recognition-phase match. When the vigilance threshold is met, training starts; otherwise the neurons are inhibited until a new input is provided.
There are two sets of weights: (1) bottom-up weights, from the F1 layer to the F2 layer, and (2) top-down weights, from the F2 layer to the F1 layer.
Fast learning: happens in ART 1. Weight changes are rapid and take place during resonance; the network is stabilized when a correct match at the cluster unit is reached.
Slow learning: used in ART 2. The weight change is slow and does not reach equilibrium in each learning iteration, so more memory is required to store more input patterns (to reach stability).
Images adapted from Laurene Fausett, “Fundamentals of Neural Networks: Architectures, Algorithms and Applications”, Prentice Hall publications.
• Pattern Recognition
• Pattern Restoration
• Pattern Generalization
• Pattern Association
• Speech Recognition
• Image Enhancement
• Image Restoration
• Facial Recognition systems
• Optimization problems
• Used to solve Constraint satisfaction problems
The net is a fully interconnected neural net, in the sense that each unit is connected to every other unit. The net has symmetric weights with no self-connections: Wij = Wji and Wii = 0.
Only one unit updates its activation at a time and each unit continues to receive an
external signal in addition to the signal from the other units in the net. The asynchronous
updating of the units allows a function, known as an energy or Lyapunov function, to be found
for the net.
The basic diagram for Hopfield networks is given in Figure 7. Here no separate learning algorithm is used, and no hidden units or layers are used. Patterns are simply stored by learning their energies, similar to the human brain storing and retrieving memory patterns. Some patterns or images are stored, and when a similar noisy input is provided the network recalls the related stored pattern. A neuron can be ON (+1) or OFF (−1), and the neurons can change state between +1 and −1 based on the inputs they receive from other neurons. A Hopfield network is trained to store patterns (memories); it can recognize a previously learned (stored) pattern from partial (noisy) inputs.
Based on the activation functions used, Hopfield networks can be classified into two types:
Discrete Hopfield network – uses a discrete activation function
Continuous Hopfield network – uses a continuous activation function
Hopfield networks use a Lyapunov energy function. The energy function guarantees that the network reaches a stable local minimum energy state which resembles one of the stored patterns.
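A small sketch of storage and asynchronous recall with the Lyapunov energy, assuming the standard outer-product (Hebbian) storage rule and bipolar example patterns:

import numpy as np

patterns = np.array([[1, 1, 1, -1, -1, -1],
                     [1, -1, 1, -1, 1, -1]])
n = patterns.shape[1]

W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p)                  # outer-product storage rule
np.fill_diagonal(W, 0)                   # Wii = 0; W is symmetric by build

def energy(s):
    return -0.5 * s @ W @ s              # Lyapunov energy, never increases

def recall(s, sweeps=5, seed=0):
    s = s.copy()
    rng = np.random.default_rng(seed)
    for _ in range(sweeps):              # asynchronous: one unit at a time
        for i in rng.permutation(n):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

noisy = np.array([1, 1, 1, -1, -1, 1])   # pattern 1 with one flipped bit
print(recall(noisy), energy(recall(noisy)))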
2.7.4 Algorithm
Demerits
Incorrect convergence
Memory capacity is limited: the number of stored patterns 'm' must be small relative to the number of neurons 'n'
Sometimes the network learns patterns which were not provided to it
3. SELF-ORGANIZING AND SPECIAL NETWORKS
Developed by the Finnish professor Teuvo Kohonen in the 1980s, this network is also known as a Topology Preserving Map. The name is used because the locations of the nodes vary at the start of the training procedure, and once the network has learned the given input patterns the topology (the locations of the neural nodes) becomes fixed.
Each node is provided with a weight vector, which is simply the position of that node in the input space or map. The job of training is to adjust these weight vectors so that the distance in the map reduces; the weights move towards the inputs. Thus, from a higher dimension the map reduces to two dimensions. This is the dimensionality reduction process. After training, SOM can classify an input by selecting the nearest node (smallest distance), i.e., the node whose weight vector is closest to the input vector.
This transformation is performed in an orderly manner. SOM uses a two-dimensional discretized input space, known as a map, for its operation. Instead of error-correction learning, SOM uses competitive (winner-takes-all) learning.
Step 1: Initialize the Weights Wij. Initialize the learning rate and topological neighbourhood
parameters
Step 2: While the stop condition is false, do steps 3 to 9
Step 3: For each input vector x, do steps 4 to 6
Step 4: For each j calculate D(j) = Σi (Wij − xi)²
Step 5: Find the index J for which D(j) is minimum
Step 6: For all units j within a specified neighbourhood of J, and for all i:
Wij(new) = Wij(old) + α [xi − Wij(old)]
Step 7: Update Learning Rate
Step 8: Reduce the radius of Topological Neighbourhood at specific time periods
Step 9: Test for stop condition
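A compact sketch of Steps 1 to 9 for a one-dimensional map of 10 nodes on random 2-D data; the map size, decay schedules and data set are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
data = rng.random((200, 2))              # input vectors x
W = rng.random((10, 2))                  # Step 1: one weight vector per node
alpha, radius = 0.5, 3

for epoch in range(20):
    for x in data:
        d = ((W - x) ** 2).sum(axis=1)   # Step 4: D(j) = sum_i (w_ij - x_i)^2
        j = int(np.argmin(d))            # Step 5: winning node index J
        lo, hi = max(0, j - radius), min(len(W), j + radius + 1)
        W[lo:hi] += alpha * (x - W[lo:hi])   # Step 6: neighbourhood update
    alpha *= 0.9                         # Step 7: decay the learning rate
    if epoch % 5 == 4 and radius > 0:
        radius -= 1                      # Step 8: shrink the neighbourhood
print(W)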
• Initialization
• Competition
For each given input pattern, the neurons calculate a discriminant function (here we use the Euclidean distance function). This discriminant function acts as the basis for the competition among the neurons. The neuron with the smallest distance value is selected as the winning neuron (winner-takes-all law).
• Cooperation
The winning neuron determines the spatial location of the excited neurons in its neighbourhood (topological map). Thus, cooperation between the neurons is established by the winning neuron in that rearranged neighbourhood.
• Adaptation
The winning neuron, by adjusting its weight values, tries to minimize the discriminant function (distance value) between itself and the inputs. When similar inputs are provided, the response of the winning neuron is enhanced.
• Easy to interpret
• Dimensionality Reduction
• Capable of handling different types of classification problems
• Can cluster large, complex input sets
• SOM training Time is less
• Simple algorithm
• Easy to implement
Demerits
It does not build a generative model for the data, i.e., the model does not understand how the data is created.
It does not behave well with categorical data, and even worse with mixed-type data.
Preparing the model is slow, and it is hard to train against slowly evolving data.
• Character Recognition
• Speech Recognition
• Texture Recognition
• Image Clustering
• Data Clustering
• Classification problems
• Dimensionality reduction applications
• Seismic analysis
• Failure Analysis etc
2.2 LEARNING VECTOR QUANTIZATION (LVQ)
2.2.1 Algorithm
Step 1: Initialize the weight vectors to the 'm' training vectors, where 'm' is the number of different classes/clusters. Start the learning rate α near zero (a small value)
Step 2: While the stop condition is false, do steps 3 to 6
Step 3: For each input training vector X, do steps 4 to 5
Step 4: Find J such that D(J) is minimum
Step 5: Update the weights of the Jth neural unit as given below:
If T = CJ then
WJ(new) = WJ(old) + α [x − WJ(old)] [move the weight vector W towards the input X]
If T ≠ CJ then
WJ(new) = WJ(old) − α [x − WJ(old)] [move the weight vector W away from the input X]
Step 6: Reduce the learning rate α
Step 7: Test for the stop condition (either a fixed number of iterations is reached, or the learning rate α has reached a very small value)
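A minimal sketch of this LVQ loop on a toy two-class data set; the data, the prototype initialisation and the decay factor are illustrative assumptions.

import numpy as np

X = np.array([[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1]])
T = np.array([1, 0, 1, 0])                 # class labels of training vectors
W = X[:2].copy();  C = T[:2].copy()        # Step 1: prototypes from the data
alpha = 0.3

for _ in range(10):
    for x, t in zip(X, T):
        j = int(np.argmin(((W - x) ** 2).sum(axis=1)))  # Step 4: winner J
        if t == C[j]:
            W[j] += alpha * (x - W[j])     # move toward the input (T = CJ)
        else:
            W[j] -= alpha * (x - W[j])     # move away from the input
    alpha *= 0.8                           # Step 6: reduce the learning rate
print(W)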
2.2.2 Merits
Demerits
o All input feature vectors are connected to the middle (hidden) layer
o Hidden nodes are connected into groups, and each group denotes a particular class 'K'
o Each node present in the hidden layer corresponds to a Gaussian function centered on its feature vector for that Kth class
o All of the Gaussian function outputs of a group/class are fed to the Kth output unit
o Hence, we have only 'K' output units
o PNN is closely related to the Parzen window PDF estimator (mixed Gaussian estimator)
o For any output node 'K', all Gaussian values (of the previous hidden layer) for that output class are summed up
o This summed value is scaled to form a probability density function (PDF)
o If class 1 contains 'P' feature vectors and class 2 contains 'Q' feature vectors, then P nodes are present in the hidden layer for class 1 and Q nodes for class 2
o The Gaussian function for any input x (with stored feature vector xi and spread σ) is given as
gi(x) = exp( −‖x − xi‖² / (2σ²) )
2.3.1. Algorithm
Figure 3: General Probabilistic Neural Network Architecture Diagram
Cascade correlation addresses both the slow rate of convergence and the fixing of the architecture during training by dynamically adding hidden units to the architecture, but only the minimum number necessary to achieve the specified error tolerance for the training set.
Furthermore, a two-step weight-training process ensures that only one layer of weights is being
trained at any time.
A cascade correlation net consists of input units, hidden units, and output units. Input
units are connected directly to output units with adjustable weighted connections.
Connections from inputs to a hidden unit are trained when the hidden unit is added to
the net and are then frozen. Connections from the hidden units to the output units are adjustable.
Cascade correlation starts with a minimal network, consisting only of the required input and
output units (and a bias input that is always equal to 1). This net is trained until no further
improvement is obtained; the error for each output unit is then computed (summed over all
training patterns).
Next, one hidden unit is added to the net in a two-step process. During the first step, a
candidate unit is connected to each of the input units, but is not connected to the output units.
The weights on the connections from the input units to the candidate unit are adjusted to
maximize the correlation between the candidate's output and the residual error at the output
units. The residual error is the difference between the target and the computed output,
multiplied by the derivative of the output unit's activation function, i.e., the quantity that would
be propagated back from the output units in the backpropagation algorithm. When this training
is completed, the weights are frozen and the candidate unit becomes a hidden unit in the net.
The second step in which the new unit is added to the net now commences. The new
hidden unit is connected to the output units, the weights on the connections being adjustable.
Now all connections to the output units are trained. (The connections from the input units are
trained again, and the new connections from the hidden unit are trained for the first time.)
Figure 4. Schematic Representation of Cascade Correlation Network
2.4.1. Merits
2.4.2. Applications
o General Regression Neural Networks (GRNNs) were proposed by D.F. Specht in 1991
o GRNN is a single-pass learning network
o General Regression Neural Networks use a Gaussian activation function in their hidden layer
o GRNN is based on function approximation (function estimation) procedures
o The output is estimated using a weighted average of the outputs of the training dataset, where each weight is calculated from the Euclidean distance between the training data point and the test data point
o If the distance is large the weight is very small; if the distance is small, more weight is given to that output
o Contains 4 layers: (1) Input layer (2) Hidden (pattern) Layer (3) Summation Layer (4)
Output (division) Layer
o GRNN's estimator is given by the equation
Y(x) = Σi Y(xi) · e^(−di²/2σ²) / Σi e^(−di²/2σ²)
where x = input
xi = training sample
Y(xi) = output for sample i
di² = squared Euclidean distance between x and xi
e^(−di²/2σ²) = activation function value (taken as the weight)
σ = spread constant (the only unknown parameter)
Select the σ for which the MSE is minimum. To calculate the optimum value of σ, first divide the samples into two parts; one part is used to train and the other to test the network. Apply the GRNN to the test data based on the training data and calculate the MSE for different σ. Select the minimum MSE and its corresponding σ. The architecture diagram of the GRNN is given in Figure 5.
Figure 5. General Regression Neural Network
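A short sketch of the GRNN estimator above: the prediction is the Gaussian-weighted average of the training outputs, and looping over several σ values mimics the holdout search for the minimum-MSE spread. The data set (samples of y = x²) is an illustrative assumption.

import numpy as np

def grnn_predict(x, X_train, y_train, sigma):
    d2 = ((X_train - x) ** 2).sum(axis=1)    # squared Euclidean distances
    w = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian weights
    return (w * y_train).sum() / w.sum()     # weighted average of outputs

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])     # samples of y = x^2

for sigma in (0.1, 0.5, 1.0):                # try several spread values
    pred = grnn_predict(np.array([1.5]), X_train, y_train, sigma)
    print(sigma, pred)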
Consider the characters given in Figure 6. The objective is to recognise a particular alphabet, say 'A' in this example. Using image analysis models, the particular alphabet is segmented and converted into intensity (gray scale) pixel values. The general workflow is shown in Figure 7. The first procedure is segmentation, the process of subdividing the image into sub-blocks. The alphabet "A" is isolated using appropriate segmentation procedures such as thresholding, region growing, or edge-detector-based algorithms.
Figure 6. Input to Character recognition system
Figure 8a. Character pattern values; Figure 8b. Character pattern conversion into intensity
Figures 6, 7 and 8 are adapted from Praveen Kumar et al. (2012), “Character Recognition using Neural Network”, IJST, vol. 3, issue 2, pp. 978-981.
From Figure 8b, texture features, shape features and/or boundary features etc. can be extracted. These feature values are known as exemplars, and they are the actual input to the neural network. Consider any neural network: its input is the feature table created as explained in the above process, shown in Figure 9. This table is provided as input to the neural system.
Figure 9 adapted from Yusuf Perwej et al. (2011), “Neural Networks for Handwritten English Alphabet Recognition”, International Journal of Computer Applications (0975 – 8887), Volume 20, No. 7, April 2011.
Figure 10. ANN implementation of character recognition system
Figure 10 adapted from Anita Pal et al. (2010), “Handwritten English Character Recognition Using Neural Network”, International Journal of Computer Science & Communication, Vol. 1, No. 2, July-December 2010, pp. 141-144.
If the feature sets match between the trained and current input features, the output produces “1”, denoting that the particular alphabet is recognised; otherwise it produces “0”, not recognised.
4. INTRODUCTION TO FUZZY LOGIC
Classical set - Operations and properties - Fuzzy Set - Operations and properties -
Problems, Classical Relations - Operations and Properties, Fuzzy Relations - Operations
and Properties - Compositions - Membership function - FLCS - Need for FLC -
Fuzzification - Defuzzification.
4.1 INTRODUCTION
The classical set theory is built on the fundamental concept of “set” of which an
individual is either a member or not a member. A sharp, crisp, and unambiguous distinction
exists between a member and a nonmember for any well-defined “set” of entities in this theory,
and there is a very precise and clear boundary to indicate if an entity belongs to the set.
Namely, in the classical set theory, it is not allowed that an element is in a set and not
in the set at the same time. Thus, many real-world application problems cannot be described
and handled by the classical set theory, including all those involving elements with only partial
membership of a set. On the contrary, fuzzy set theory accepts partial memberships, and,
therefore, in a sense generalizes the classical set theory to some extent.
Fuzzy logic is an extension of Boolean logic, introduced by Lotfi Zadeh in 1965, based on the
mathematical theory of fuzzy sets, which is a generalization of the classical set theory. By
introducing the notion of degree in the verification of a condition, thus enabling a condition to
be in a state other than true or false, fuzzy logic provides a very valuable flexibility for
reasoning, which makes it possible to take into account inaccuracies and uncertainties. In order
to introduce the concept of fuzzy sets, we first review the elementary set theory of classical
mathematics. It will be seen that the fuzzy set theory is a very natural extension of the classical
set theory, and is also a rigorous mathematical notion.
Fuzzy logic is defined as a multivalued logic with various degrees of values for its member elements. Fuzzy logic is based on "degrees of truth" rather than the (1 or 0) Boolean logic on which the modern computer is based.
4.2.2 Classical Sets
Let A and B be two subsets of the universe X. The principal operations and properties are shown below.
Commutativity: A ∪ B = B ∪ A; A ∩ B = B ∩ A …….. (5)
Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C; A ∩ (B ∩ C) = (A ∩ B) ∩ C …….. (6)
Distributivity: A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) …….. (7)
Idempotency: A ∪ A = A; A ∩ A = A …….. (8)
Identity: A ∪ Ø = A; A ∩ X = A; A ∩ Ø = Ø; A ∪ X = X …….. (9)
Transitivity: If A ⊆ B and B ⊆ C, then A ⊆ C ……… (10)
Involution: A′′ = A ……… (11)
4.2.3 Fuzzy Sets
A fuzzy set is a set with a smooth boundary. Fuzzy logic is based on the theory of fuzzy
sets, which is a generalization of the classical set theory. Saying that the theory of fuzzy sets is
a generalization of the classical set theory means that the latter is a special case of fuzzy sets
theory. To put it in set-theoretic terms, the classical set theory is a subset of the theory of fuzzy sets.
A fuzzy set is defined as a set containing elements that have varying degrees of membership, with values in the range zero to one.
Union: μA∪B(x) = max( μA(x), μB(x) ) …. (15)
Intersection: μA∩B(x) = min( μA(x), μB(x) ) …. (16)
Complement: μA′(x) = 1 − μA(x) …. (17)
Fuzzy logic is based on fuzzy set theory, which is a generalization of the classical set theory. Classical sets are also called crisp sets, and classical logic is also known as Boolean or binary logic.
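These operations reduce to element-wise max, min and complement on membership vectors, as the following sketch (with made-up membership values) shows; it also previews why the excluded middle axioms fail for fuzzy sets.

import numpy as np

mu_A = np.array([0.2, 0.7, 1.0, 0.4])   # membership values of A (assumed)
mu_B = np.array([0.5, 0.3, 0.8, 0.9])   # membership values of B (assumed)

union        = np.maximum(mu_A, mu_B)   # eq (15): max(μ_A, μ_B)
intersection = np.minimum(mu_A, mu_B)   # eq (16): min(μ_A, μ_B)
complement_A = 1.0 - mu_A               # eq (17): 1 - μ_A

# Excluded middle fails for fuzzy sets: A ∪ A' ≠ X and A ∩ A' ≠ Ø.
print(np.maximum(mu_A, complement_A))   # not all ones
print(np.minimum(mu_A, complement_A))   # not all zeros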
Figure 1 gives the membership function characterizing the subset 'good' quality of service. Let X be a set; a fuzzy subset A of X is characterized by a membership function. Input 1 is the quality of service, with subsets poor, good and excellent. Input 2 is the quality of food, with subsets awful and delicious. Consider the output tip amount, with subsets low, medium and high. Refer to Figure 3. For 'good' a value of '1' is taken and for 'poor' a value of '0' is taken. This is how a membership function converts a classical value into a fuzzy value.
The shape of the membership function is chosen arbitrarily, by following the advice of an expert or by statistical studies: sigmoid, hyperbolic tangent, exponential, Gaussian or any other form can be used.
A fuzzy set is defined by a function that maps objects in a domain of concern into their
membership value in a set. Such a function is called the membership function.
A fuzzy set, then, is a set containing elements that have varying degrees of membership
in the set. This idea is in contrast with classical, or crisp, sets because members of a crisp set
would not be members unless their membership is full, or complete, in that set (i.e., their
membership is assigned a value of 1). Elements in a fuzzy set, because their membership need
not be complete, can also be members of other fuzzy sets on the same universe.
Figure 2: Diagrams for (a) crisp set boundary and (b) fuzzy set boundary
Examples
The total number of elements in a universe X is called its cardinal number, denoted n_X, where x is a label for individual elements in the universe.
Collections of elements within a universe are called sets, and collections of elements
within sets are called subsets. We define the null set, ∅, as the set containing no elements, and
the whole set, X, as the set of all elements in the universe.
A fuzzy set is prescribed by vague or ambiguous properties; hence its boundaries are
ambiguously specified. Fuzzy set theory permits the gradual assessment of the membership of
elements in a set, described with the aid of a membership function valued in the real unit [0,1].
Examples
Words like young, tall, good or high are fuzzy. There is no single quantitative value which defines the term young. For some people, age 25 is young, and for others, age 35 is young. The concept young has no boundary. Refer to Figure 3: the linguistic terms middle age, old age, etc. represent values of the member element AGE.
Two special properties of set operations are known as the excluded middle axioms and De Morgan's principles. These properties are enumerated here for two sets A and B. The excluded middle axioms are very important because they are the only set operations described here that are not valid for both classical sets and fuzzy sets. There are two excluded middle axioms. The first, called the axiom of the excluded middle, deals with the union of a set A and its complement; the second, called the axiom of contradiction, represents the intersection of a set A and its complement. For crisp sets:
A ∪ A′ = X (axiom of the excluded middle)
A ∩ A′ = Ø (axiom of contradiction)
All properties of classical sets also hold for fuzzy sets, except for the excluded middle and contradiction axioms. These two axioms do not form part of the basic axiomatic structure of fuzzy sets: since fuzzy sets can overlap, a set and its complement can also overlap. The excluded middle axioms, extended for fuzzy sets, are expressed as
A ∪ A′ ≠ X
A ∩ A′ ≠ Ø
Venn diagrams comparing the excluded middle axioms for classical (crisp) sets and fuzzy sets are shown in Figure 4 below.
Figure 4: (a) crisp set A and its complement; (b) fuzzy A ∪ A′ ≠ X; (c) crisp A ∪ A′ = X
Control systems abound in our everyday life; perhaps we do not see them as such, because some of them are larger than what a single individual can deal with, but they are ubiquitous. For example, economic systems are large, global systems that can be controlled; ecosystems are large, amorphous, and long-term systems that can be controlled.
Control systems are sometimes divided into two classes. If the objective of the control
system is to maintain a physical variable at some constant value in the presence of disturbances,
the system is called a regulatory type of control, or a regulator. The second class of control
systems is set point tracking controllers. In this scheme of control, a physical variable is
required to follow or track some desired time function. An example of this type of system is an
automatic aircraft landing system, in which the aircraft follows a “ramp” to the desired
touchdown point.
A number of assumptions are implicit in a fuzzy control system design. Six basic
assumptions are commonly made whenever a fuzzy rule-based control policy is selected.
1. The plant is observable and controllable: state, input, and output variables are usually available for observation and measurement or computation.
2. There exists a body of knowledge comprising a set of linguistic rules, engineering common sense, intuition, or a set of input-output measurement data from which rules can be extracted.
3. A solution exists.
4. The control engineer is looking for a "good enough" solution, not necessarily the optimum one.
5. The controller will be designed within an acceptable range of precision.
6. The problems of stability and optimality are not addressed explicitly; such issues are still open problems in fuzzy controller design.
A control system is a set of hardware components which regulates, alters or modifies the behavior of the system. A fuzzy control system uses approximation so that nonlinearity and data or knowledge incompleteness are reduced. The general block diagram is shown in Figure 6.
Fuzzy relations are used to map elements of one universe, say X, to those of another universe, say Y, with the help of the Cartesian product. The "strength" of the relation is measured with a membership function having "degrees" of strength on the unit interval [0,1]. Hence, a fuzzy relation is a mapping from the Cartesian space X × Y to the interval [0,1], where the strength of the mapping is expressed by the membership function.
The cardinality of a fuzzy set is infinite; the cardinality of a fuzzy relation between two or more universes is also infinite.
Let R and S be fuzzy relations on the Cartesian space X × Y. Then the following operations apply for the membership values (they are similar to the same operations on crisp sets):
Union: μR∪S(x, y) = max( μR(x, y), μS(x, y) )
Intersection: μR∩S(x, y) = min( μR(x, y), μS(x, y) )
Complement: μR′(x, y) = 1 − μR(x, y)
The excluded middle axioms for fuzzy relations do not result, in general, in the null relation, O, or the complete relation, E:
R ∪ R′ ≠ E ...........(24)
R ∩ R′ ≠ O ..............(25)
Let A be a fuzzy set on universe X and B be a fuzzy set on universe Y; then the Cartesian product between the fuzzy sets A and B will result in a fuzzy relation R, which is given as
R = A × B ⊂ X × Y ...........(26)
with membership function
μR(x, y) = min( μA(x), μB(y) ) ........(27)
The Cartesian product defined by equation (27) is implemented in the same way as the cross product of two vectors. The Cartesian product is not the same as the arithmetic product; it employs the idea of pairing of elements among sets. For example, if a fuzzy set A has four elements and a fuzzy set B has five elements, then the resulting fuzzy relation R will be represented by a matrix of size 4 × 5, that is, R will have four rows and five columns.
If R is a fuzzy relation on X × Y and S is a fuzzy relation on Y × Z, their composition is
T = R ∘ S .........(28)
Fuzzy max–min composition is defined in terms of the membership functions as
μT(x, z) = max over y of [ min( μR(x, y), μS(y, z) ) ] .........(29)
and fuzzy max–product composition is defined in terms of the membership function-theoretic notation as
μT(x, z) = max over y of [ μR(x, y) · μS(y, z) ] .........(30)
It should be noted that neither crisp nor fuzzy compositions are commutative in general; that is, R ∘ S ≠ S ∘ R.
Different types of composition are (1) max–min, (2) max–product, (3) max–max, (4) min–min, (5) min–max, etc. Compositions provide more information, which reduces the impreciseness present in the problem.
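A small sketch of max–min and max–product composition on made-up relation matrices R (on X × Y) and S (on Y × Z):

import numpy as np

R = np.array([[0.6, 0.3], [0.2, 0.9]])            # fuzzy relation on X x Y
S = np.array([[1.0, 0.5, 0.3], [0.8, 0.4, 0.7]])  # fuzzy relation on Y x Z

def max_min(R, S):
    # eq (29): mu_T(x, z) = max over y of min(mu_R(x, y), mu_S(y, z))
    return np.max(np.minimum(R[:, :, None], S[None, :, :]), axis=1)

def max_product(R, S):
    # eq (30): replace min with an ordinary product
    return np.max(R[:, :, None] * S[None, :, :], axis=1)

print(max_min(R, S))        # composition is not commutative in general
print(max_product(R, S))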
4.7 FUZZIFICATION
4.7.1 Fuzzification
Inference
Intuition
Rank ordering
Using GA
Using ANN
Inductive reasoning
Meta rules
Fuzzy statistics
These are some methods used to generate membership values and thereby convert a crisp value into a fuzzy one.
In an FLS, a rule base is constructed to control the output variable. A fuzzy rule-based system consists of simple IF-THEN rules with a condition and a conclusion. A sample fuzzy rule for an air conditioner system is given below.
Table 1 shows the matrix representation of the fuzzy rules for the above FLS. Rows contain the values that the current room temperature can take, columns are the values for the target temperature, and each cell is the resulting command. For instance, if the temperature is cold and the target is warm, then the command is heat.
IF x is A THEN y is B
where x and y are linguistic variables; A and B are linguistic values determined by fuzzy sets
on the universe of discourse X and Y, respectively.
In the field of artificial intelligence (machine intelligence), there are various ways to
represent knowledge. Perhaps the most common way to represent human knowledge is to form
it into natural language expressions of the type IF premise (antecedent), THEN conclusion
(consequent). The form is commonly referred to as the IF–THEN rule-based form; this form is
generally referred to as the deductive form.
It does not, however, capture the deeper forms of knowledge usually associated with
intuition, structure, function, and behavior of the objects around us simply because these latter
forms of knowledge are not readily reduced to linguistic phrases or representations; this deeper form is referred to as inductive.
The fuzzy rule-based system is most useful in modeling some complex systems that can
be observed by humans because they make use of linguistic variables as their antecedents and
consequents; as described here these linguistic variables can be naturally represented by fuzzy
sets and logical connectives of these sets.
4.9 DEFUZZIFICATION
Centroid method
Weighted average method
Mean-max membership method
Centre of sums method
Centre of largest area method
First (or last) of maxima method, etc.
Any of the above methods can be used, based on the level of intelligent control required.
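As an example of the first method, the sketch below computes the centroid (centre of gravity) z* = Σ z·μ(z) / Σ μ(z) of an aggregated output set on a discretised universe; the triangular output set is a made-up example.

import numpy as np

z  = np.linspace(0, 10, 101)                  # discretised output universe
mu = np.clip(1 - np.abs(z - 6) / 3, 0, None)  # an aggregated fuzzy output set

z_star = (z * mu).sum() / mu.sum()            # discrete centroid
print(z_star)                                 # ~6 for this symmetric set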
REFERENCE BOOKS
1. Timothy Ross, “Fuzzy Logic with Engineering Applications”, McGraw Hill, 1997.
3. George J. Klir and Bo Yuan, “Fuzzy Sets and Fuzzy Logic: Theory and Applications”, Prentice Hall, 1995.
UNIT 5 FLCS, CLASSIFICATION & APPLICATIONS
Fuzzy decision making -Types, Fuzzy Rule Based System, Knowledge Based System, Non-
linear Fuzzy Control system - Fuzzy Classification - Hard C Means - Fuzzy C Means.
Applications of fuzzy - Water level controller, Fuzzy image Classification, Speed control
of motor.
5.1 KNOWLEDGE BASE
The knowledge base is the place where the different information related to the problem is stored; it is the collection of expert knowledge about the problem. Advantages of a fuzzy knowledge base:
• Comprehensibility
• Parsimony
• Modularity
• Uncertainty
• Parallelism
• Robust
A collection of rules representing different environments is called a rule base system. Rules are framed by assigning relationships to fuzzy linguistic variables. In an FLS, a rule base is constructed to control the output variable. A fuzzy rule-based system consists of simple IF-THEN rules with a condition and a conclusion. A sample fuzzy rule for an air conditioner system is given below.
These IF-THEN rules are the basis for fuzzy reasoning. IF-THEN rules are of different types, as follows.
Table 1 shows the matrix representation of the fuzzy rules for the above FLS. Rows contain the values that the current room temperature can take, columns are the values for the target temperature, and each cell is the resulting command. For instance, if the temperature is cold and the target is warm, then the command is heat.
Table 1: Fuzzy rule matrix example for Temperature control problem
This block is responsible for the fuzzy control outcomes. Making decisions under uncertainty is tough, since we have to handle bulk information. The different classifications in this category are:
Most control situations are more complex than we can deal with mathematically. In this situation, fuzzy control can be developed, provided a body of knowledge about the control process exists and can be formed into a number of fuzzy rules. A simple FLCS is shown in Figure 1. For example, suppose an industrial process output is given in terms of the pressure. We can calculate the difference between the desired pressure and the output pressure, called the pressure error (e), and the difference between the desired rate of change of the pressure, dp/dt, and the actual pressure rate, called the pressure error rate (ė). Also assume that knowledge can be expressed in the form of IF–THEN rules such as:
IF pressure error (e) is “positive big (PB)” or “positive medium (PM)” and
IF pressure error rate (˙e) is “negative small (NS),” THEN heat input change is
“negative medium (NM).”
The linguistic variables defining the pressure error, “PB” and “PM,” and the pressure
error rate, “NS” and “NM,” are fuzzy, but the measurements of both the pressure and pressure
rate as well as the control value for the heat (the control variable) ultimately applied to the
system are precise (crisp). An input to the industrial process (physical system) comes from the
controller. The physical system responds with an output, which is sampled and measured by
some device. If the measured output is a crisp quantity, it can be fuzzified into a fuzzy set. This fuzzy output is then considered as the fuzzy input into a fuzzy controller, which consists of linguistic rules. The output of the fuzzy controller is then another series of fuzzy sets. Since most physical systems cannot interpret fuzzy commands (fuzzy sets), the fuzzy controller output must be converted into crisp quantities using defuzzification methods. These crisp (defuzzified) control-output values then become the input values to the physical system, and the entire closed-loop cycle is repeated.
5.4.1 Fuzzy Logic Based Water Level Controller
Step 1: Identify the i/p and o/p variables. Here the minimum and maximum levels are the inputs, and the valve position is the output variable.
Step 2: Assign appropriate membership functions and perform the fuzzification process. i/p1 is the water level, i/p2 is the error rate. The valve position is the output being controlled.
The membership graphs for the i/p and o/p variables are given in Figures 2(a), (b) and (c). For the water level three membership functions are used, and for the error rate again three membership functions are used.
Fig 2(a): i/p membership function for water level; Fig 2(b): i/p membership function for error rate
For the output variable valve position, we consider open slow, open fast, close slow, close fast and no change as the linguistic variables, as shown in Figure 2(c).
Rule 1: IF water level is min AND error rate is negative THEN valve position is open fast
Rule 2: IF water level is max AND error rate is positive THEN valve position is close fast
Similarly, rules are framed for the remaining conditions, and the rule outcomes are aggregated. The final outcome is the defuzzified value of this aggregated output; based on this value the valve opens or closes. A sketch of this inference chain is given below.
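The following is a minimal two-rule Mamdani-style sketch of this controller, with min for AND, max for aggregation and centroid defuzzification; the triangular membership functions and normalised universes are illustrative assumptions, not the exact curves of Figure 2.

import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with feet a, c and peak b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0)

level, err = 0.2, -0.6                   # crisp inputs (normalised units)

# Rule 1: level is min AND error rate is negative -> open fast
w1 = min(tri(level, -0.5, 0.0, 0.5), tri(err, -2.0, -1.0, 0.0))
# Rule 2: level is max AND error rate is positive -> close fast
w2 = min(tri(level, 0.5, 1.0, 1.5), tri(err, 0.0, 1.0, 2.0))

v = np.linspace(-1, 1, 201)              # valve universe: -1 close, +1 open
open_fast  = tri(v, 0.5, 1.0, 1.5)       # output membership functions
close_fast = tri(v, -1.5, -1.0, -0.5)

# Mamdani inference: clip each consequent by its rule strength, aggregate
# with max, then defuzzify with the centroid method.
agg = np.maximum(np.minimum(w1, open_fast), np.minimum(w2, close_fast))
print((v * agg).sum() / agg.sum())       # positive value => open the valve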
5.4.2 Fuzzy Logic Based Image Classification
Fuzzy logic addresses the vagueness and ambiguity present in an image. Fuzzy logic is a powerful tool that represents and analyses human knowledge in the form of fuzzy rules. Figure 3 represents a general fuzzy-based image analysis procedure. In this block diagram the input image is first converted into a fuzzy image: the image intensities are fuzzified by assigning membership values to the intensities present in the image. Then, based on expert knowledge, rules are created as given below.
Figures 3 & 4 are adapted from: http://imageprocessingplace.com/downloads_V3/root_downloads/tutorials/fuzzy_image_processing.pdf
Figure 4 shows how the original image pixels are converted into fuzzy values. After obtaining the resultant image (fuzzified intensities), other image processing procedures like enhancement, segmentation, object recognition, clustering or classification can be implemented. The general image analysis procedures can then be applied to get the Region Of Interest (ROI).
5.4.3 Fuzzy Logic Based Speed Control of a DC Motor
As the basic rule for designing any fuzzy-based application, we have to identify the input and output variables. In this example let us consider voltage and error rate as the inputs and speed as the output variable. The various components present in a DC motor are shown in Figure 5. DC motors are used in various applications like electric trains, cranes, trolleys, rolling mills, robotic manipulators, etc.
Consider the input variable voltage. By applying triangular membership functions, the crisp voltage values are converted into fuzzy values, as shown in Figure 6. Similarly, Figure 7 shows the variable error rate with three triangular membership functions. For the output variable speed, three membership functions, namely less, average and large speed, are considered, as shown in Figure 8. All these function values are normalised in the range '0' to '1'. Figure 9 shows the rule-based system generated for this application (using MATLAB software); around 8 rules have been formed as an example. Figure 10 shows the rules firing and the defuzzified output.
Figure 8: Membership graph for the output “Speed”
Figure 10: Rule Viewer
The last graph on the right side is the defuzzified output. This is the value which is given to the final device.
5.5 CLUSTERING
Clustering is a process in which large sets of data are grouped into clusters of smaller sets of similar data. Fuzzy c-means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. The method was developed by Dunn in 1973 and improved by Bezdek in 1981, and it is frequently used in pattern recognition applications. The fuzzy c-means algorithm is an extended version of the hard c-means technique: fuzzy c-means is the more adaptable procedure, whereas hard c-means is not a flexible process. FCM assigns membership values to each input data point with respect to each cluster centre (cluster head). The distance between the cluster head and the current input data point in each cluster is measured; for this purpose, the Euclidean distance measure is used. FCM is similar to the k-means algorithm.
5.5.1 Fuzzy C-Means Algorithm
Step 1: Initialize the centroids ci, i = 1, ..., c. This is typically achieved by randomly selecting c points from among all of the data points.
Step 2: Determine the membership matrix U.
Step 3: Compute the dissimilarity (cost) function; stop if its improvement over the previous iteration is below a threshold.
Step 4: Compute the new centroids ci from the updated memberships and return to Step 2; iteration can also be stopped when ||U(k+1) − U(k)|| < threshold.
The performance of the algorithm depends on the initial positions of the centroids, so the algorithm gives no guarantee of an optimum solution. By contrast, the hard k-means algorithm executes a sharp classification, in which each object is either assigned to a class or not.
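A compact FCM sketch (fuzziness m = 2) that alternates the membership and centroid updates until the centroids stop changing; the synthetic two-cluster data and tolerances are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
c, m = 2, 2.0
C = X[rng.choice(len(X), c, replace=False)]      # Step 1: initial centroids

for _ in range(100):
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-9
    U = 1.0 / (d ** (2 / (m - 1)))               # Step 2: memberships ~ d^(-2/(m-1))
    U /= U.sum(axis=1, keepdims=True)            # normalise over clusters
    U_m = U ** m
    C_new = (U_m.T @ X) / U_m.sum(axis=0)[:, None]   # Step 4: new centroids
    if np.linalg.norm(C_new - C) < 1e-6:         # stop when updates vanish
        break
    C = C_new
print(C)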
5.6.2 Advantages
5.6.3 Disadvantages
REFERENCE BOOKS
1. Timothy Ross, “Fuzzy Logic with Engineering Applications”, McGraw Hill, 1997.
3. George J. Klir and Bo Yuan, “Fuzzy Sets and Fuzzy Logic: Theory and Applications”, Prentice Hall, 1995.