
CHAPTER 7

Representations, Description and Recognition


(6 Hrs.)

Bal Krishna Subedi


CDCSIT, TU
CONTENTS
Introduction to some descriptors (Chain codes,
Signatures, Shape Numbers, Fourier Descriptors),
Patterns and pattern classes, Decision-Theoretic
Methods, Overview of Neural Networks in Image
Processing, Overview of pattern recognition.
Human Perception
What is Pattern Recognition?
• Pattern recognition can be defined as the categorization of input
data into identifiable classes via the extraction of significant
features or attributes of the data from a background of irrelevant
detail.
• “The assignment of a physical object or event to one of several
prespecified categories” – Duda and Hart.
• Pattern recognition is about guessing or predicting the unknown
nature of an observation, a discrete quantity such as black or
white, one or zero, sick or healthy, real or fake.
• For example, a pattern could be a fingerprint image, a
handwritten word, a human face, or a speech signal. The pattern
recognition problems are important in a variety of engineering
and scientific disciplines such as biology, psychology, medicine,
marketing, artificial intelligence, computer vision and remote
sensing.
Pattern Recognition Applications
An Example
An Example : Decision Process
An Example : Selecting Features
An Example : Cost of Error
An Example : Multiple Features
An Example : Decision Boundaries
Pattern Recognition Systems
The Design Cycle
Summary
Neural Network
• Neural networks represent a brain metaphor for information
processing.
• Neural computing refers to a pattern recognition methodology
for machine learning. The resulting model from neural
computing is often called an artificial neural network (ANN)
or neural network (NN).
• Due to their ability to learn from the data, their nonparametric
nature (i.e., no rigid assumptions), and their ability to
generalize, neural networks have been shown to be promising
in many forecasting and business classification applications.
Basic Concepts of Neural Networks
The human brain is composed of special cells called neurons.
Neural network elements:
- Nucleus: The central processing portion of a neuron
- Soma: The main body of the neuron in which the cell nucleus is
contained
- Dendrite: The part of a biological neuron that provides inputs to
the cell
- Axon: An outgoing connection (i.e., terminal) from a biological
neuron
- Synapse: The connection (where the weights are) between
processing elements in a neural network
Artificial Neural Network
- Neural concepts are usually implemented as software
simulations of massively parallel processes that involve
processing elements (also called artificial neurons)
interconnected in a network structure.
- Connections between neurons have an associated weight.
- Each neuron calculates a weighted sum of the incoming neuron
values, transforms this input, and passes on its neural value as
the input to subsequent neurons or external outputs.
Units of Neural Network:

Nodes (units):
Nodes represent the cells of a neural network.
Links:
Links are directed arrows that show the propagation of information from one node to
another.
Activation:
Activations are the inputs to or outputs from a unit.
Weight:
Each link has a weight associated with it, which determines the strength and sign of the
connection.
Activation function:
A function used to derive the output activation from the input activations to a given
node is called the activation function.
Bias Weight:
The bias weight is used to set the threshold for a unit; the unit is activated when the
weighted sum of real inputs exceeds the bias weight.
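To make these definitions concrete, the following is a minimal sketch (in Python; the weights, inputs, and threshold are invented for illustration, not taken from the slides) of a single unit computing a weighted sum of its input activations and comparing it against a bias weight:

```python
import numpy as np

def step_unit(inputs, weights, bias_weight):
    """A single unit: weighted sum of input activations (sum_i w_i * x_i),
    activated (output 1) only if the sum exceeds the bias weight."""
    weighted_sum = np.dot(weights, inputs)
    return 1 if weighted_sum > bias_weight else 0

# Illustrative values (not from the slides): two inputs on weighted links.
x = np.array([0.7, 0.2])
w = np.array([0.5, 0.9])
print(step_unit(x, w, bias_weight=0.4))  # 0.35 + 0.18 = 0.53 > 0.4 -> 1
```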
Why use neural networks?
Neural networks, with their remarkable ability to derive meaning from
complicated or imprecise data, can be used to extract patterns and detect trends
that are too complex to be noticed by either humans or other computer techniques.
A trained neural network can be thought of as an "expert" in the category of
information it has been given to analyze. Other advantages include:
1. Adaptive learning: An ability to learn how to do tasks based on the data given
for training or initial experience.
2. Self-Organisation: An ANN can create its own organisation or representation
of the information it receives during learning time.
3. Real Time Operation: ANN computations may be carried out in parallel, and
special hardware devices are being designed and manufactured which take
advantage of this capability.
4. Fault Tolerance via Redundant Information Coding: Partial destruction of a
network leads to the corresponding degradation of performance. However, some
network capabilities may be retained even with major network damage.
Network structures:
Feed-forward networks: Feed-forward ANNs allow signals to travel one
way only; from input to output. There is no feedback (loops) i.e. the output of
any layer does not affect that same layer. Feed-forward ANNs tend to be
straightforward networks that associate inputs with outputs. They are
extensively used in pattern recognition. This type of organization is also
referred to as bottom-up or top-down.
Feedback networks (Recurrent networks)
Feedback networks can have signals traveling in both directions by introducing loops
in the network. Feedback networks are very powerful and can get extremely
complicated. Feedback networks are dynamic; their 'state' is changing continuously
until they reach an equilibrium point. They remain at the equilibrium point until the
input changes and a new equilibrium needs to be found. Feedback architectures are
also referred to as interactive or recurrent.
Types of Feed Forward Neural Network:
Single-layer neural networks (perceptrons):
A neural network in which all the inputs are connected directly to the
outputs is called a single-layer neural network, or a perceptron network.
Since each output unit is independent of the others, each weight affects
only one of the outputs.

Multilayer neural networks (perceptrons):
A neural network which contains an input layer, an output layer and some
hidden layers is called a multilayer neural network. The advantage of
adding hidden layers is that it enlarges the hypothesis space. Layers of
the network are normally fully connected.
Perceptron Learning Theory:
• The term "Perceptrons" was coined by Frank RosenBlatt in
1962 and is used to describe the connection of simple neurons
into networks. These networks are simplified versions of the
real nervous system where some properties are exagerrated
and others are ignored. For the moment we will concentrate on
Single Layer Perceptrons.
• So how can we achieve learning in our model neuron? We
need to train them so they can do things that are useful. To do
this we must allow the neuron to learn from its mistakes.
• There is in fact a learning paradigm that achieves this, it is
known as supervised learning and works in the following
manner.
i. Set the weights and thresholds of the neuron to random values.
ii. Present an input.
iii. Calculate the output of the neuron.
iv. Alter the weights to reinforce correct decisions and discourage
wrong decisions, hence reducing the error. So, for the network to
learn, we increase the weights on the active inputs when we
want the output to be active, and decrease them when we want
the output to be inactive.
v. Present the next input and repeat steps iii–v.
Perceptron Learning Algorithm:
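As a rough illustration of the supervised procedure above, here is a minimal sketch in Python (the AND-gate data, learning rate, and epoch count are assumptions made for this example, not taken from the slides):

```python
import numpy as np

def train_perceptron(X, targets, lr=0.1, epochs=50):
    """Single-layer perceptron trained with the rule above: increase
    weights on active inputs when the output should be active,
    decrease them when it should be inactive."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # step i: random weights
    b = rng.uniform(-0.5, 0.5)               # random threshold (bias) weight
    for _ in range(epochs):
        for x, t in zip(X, targets):         # step ii: present an input
            y = 1 if np.dot(w, x) + b > 0 else 0   # step iii: output
            w = w + lr * (t - y) * x         # step iv: reinforce/discourage
            b = b + lr * (t - y)
    return w, b

# Assumed toy task: learn the AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, t)
print([1 if np.dot(w, x) + b > 0 else 0 for x in X])  # -> [0, 0, 0, 1]
```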
Backpropagation
• It is a supervised learning method, and is an implementation of the
Delta rule. It requires a teacher that knows, or can calculate, the
desired output for any given input.
• It is most useful for feed-forward networks (networks that have no
feedback, or simply, that have no connections that loop). The term is
an abbreviation for "backwards propagation of errors".
Backpropagation requires that the activation function used by the
artificial neurons (or "nodes") is differentiable.
• As the algorithm's name implies, the errors (and therefore the
learning) propagate backwards from the output nodes to the inner
nodes. So technically speaking, backpropagation is used to calculate
the gradient of the error of the network with respect to the network's
modifiable weights.
1. An error value is calculated for each node in the output layer.
2. The weights feeding into each node, in this layer, are adjusted
according to the error value for that node (in a similar way to the
previous example).
3. The error, for each of the nodes, is then attributed to each of
the nodes in the previous layer (on the basis of the strength of
the connection). Thus the error is passed back through the
network.
4. Steps 2 and 3 are repeated, i.e., the nodes in the preceding
layer are adjusted, until the errors have been propagated backwards
through the entire network, finally reaching the input layer.
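As a concrete illustration of these four steps, here is a compact NumPy sketch for a network with one hidden layer; the XOR data, sigmoid activation, learning rate, and iteration count are assumptions chosen for the example, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(A):
    # append a constant-1 column so each layer has a trainable bias weight
    return np.hstack([A, np.ones((A.shape[0], 1))])

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # assumed XOR inputs
T = np.array([[0], [1], [1], [0]], float)              # desired outputs

W1 = rng.normal(0, 1, (3, 3))   # (2 inputs + bias) -> 3 hidden nodes
W2 = rng.normal(0, 1, (4, 1))   # (3 hidden + bias) -> 1 output node
lr = 0.5

for _ in range(20000):
    Xb = add_bias(X)
    H = sigmoid(Xb @ W1)                        # forward pass: hidden layer
    Hb = add_bias(H)
    Y = sigmoid(Hb @ W2)                        # forward pass: output layer
    d_out = (Y - T) * Y * (1 - Y)               # step 1: output-layer error
    d_hid = (d_out @ W2[:-1].T) * H * (1 - H)   # step 3: pass error back
    W2 -= lr * Hb.T @ d_out                     # step 2: adjust output weights
    W1 -= lr * Xb.T @ d_hid                     # step 4: adjust earlier layer

print(np.round(Y.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```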
Learning in ANN
Supervised learning
- Uses a set of inputs for which the desired outputs are known
- Example: Back propagation algorithm
• Unsupervised learning
- Uses a set of inputs for which no desired outputs are known.
- The system is self-organizing; that is, it organizes itself
internally. A human must examine the final categories to assign
meaning and determine the usefulness of the
results.
- Example: Self-organizing map
Applications
Classification
• Classification includes a broad range of decision-theoretic
approaches to the identification of images (or parts thereof).
• All classification algorithms are based on the assumption that
the image in question depicts one or more features (e.g.,
geometric parts in the case of a manufacturing classification
system, or spectral regions in the case of remote sensing) and
that each of these features belongs to one of several distinct
and exclusive classes.
Classification
• The classes may be specified a priori by an
analyst (as in supervised classification) or
automatically clustered (as in unsupervised
classification) into sets of prototype classes,
where the analyst merely specifies the number
of desired categories.
Classification
• Image classification analyzes the numerical properties
of various image features and organizes data into
categories. Classification algorithms typically employ
two phases of processing: training and testing.
• In the initial training phase, characteristic properties
of typical image features are isolated and, based on
these, a unique description of each classification
category, i.e. training class, is created. In the
subsequent testing phase, these feature-space
partitions are used to classify image features.
Role of Image Classifier
• The image classifier performs the role of a
discriminant – discriminates one class against
others
• Discriminant value highest for one class, lower
for other classes (multiclass)
• Discriminant value positive for one class,
negative for another class (two class)
Features
• Features are attributes of the data elements
based on which the elements are assigned to
various classes.
• E.g., in satellite remote sensing, the features
are measurements made by sensors in
different wavelengths of the electromagnetic
spectrum – visible/ infrared /
microwave/texture features
Features
• In medical diagnosis, the features may be the
temperature, blood pressure, lipid profile,
blood sugar, and a variety of other data
collected through pathological investigations
• The features may be qualitative (high,
moderate, low) or quantitative.
Features
• Feature can be defined as any distinctive aspect, quality or characteristic
which may be symbolic (e.g., color) or numeric (e.g., height).
• The combination of d-features is represented as a d-dimensional column
vector called a feature vector. The d-dimensional space defined by the
feature vector is called feature space. Objects are represented as points in
feature space. This representation is called a scatter plot.
• Pattern is defined as composite of features that are characteristic of an
individual. In classification, a pattern is a pair of variables {x,w} where x is a
collection of observations or features (feature vector) and w is the concept
behind the observation (label). The quality of a feature vector is related to
its ability to discriminate examples from different classes (Figure 1.1).
Examples from the same class should have similar feature values, while
examples from different classes should have different feature values.
Features
Feature Selection
• To distinguish objects of different types,
characteristics that can produce descriptive
parameters should be decided.
• The measured quantity of such a selection is
called a feature.
• Proper selection simplifies the problem, but
improper selection increases complexity.
• Good features should be reliable, independent,
and discriminating, and few in number.
Good Features Characteristics

Discrimination:
• Features should take different values for
object belonging to different class.
• Ex: A diameter could be a good feature to sort
the fruits as it takes significantly different
values for different fruits like grapes and
apples.
Good Features Characteristics
Reliability:
• Features should be similar for all objects of
the same class.
• Ex: Color may be a poor feature for apples if
they occur in varying degrees of redness; i.e., a
green apple and a red apple might differ
significantly even though they both belong to
the class of apples.
Good Features Characteristics
Independence:
• Various features in multi-feature system
should be uncorrelated with each other.
• Ex: Weight and diameter would constitute
highly correlated features, since both reflect the
same underlying property, the size of the fruit.
Small in number:
• The more features, the more complexity; training
and assigning weights becomes more
difficult.
Good Features Characteristics
• The goal of a classifier is to partition feature space into class-
labeled decision regions. Borders between decision regions
are called decision boundaries (Figure 1.2).
Good Features Characteristics
• If the characteristics or attributes of a class are known, individual objects
might be identified as belonging or not belonging to that class. The
objects are assigned to classes by observing patterns of distinguishing
characteristics and comparing them to a model member of each class.
• Pattern recognition involves the extraction of patterns from data, their
analysis and, finally, the identification of the category (class) each of the
patterns belongs to. A typical pattern recognition system contains a
sensor, a preprocessing mechanism (segmentation), a feature extraction
mechanism (manual or automated), a classification or description
algorithm, and a set of examples (training set) already classified or
described (post-processing) (Figure 1.3).
Classification
• Supervised learning is where there are input variables
(x) and an output variable (Y), and an algorithm is
used to learn the mapping function from the input to
the output:
Y = f(X)
• The goal is to approximate the mapping function so
well that when there is new input data (x) then the
output variables (Y) for that data can be predicted.
Classification
• Unsupervised learning is where there is only input
data (X) and no corresponding output variables.
Unsupervised learning finds hidden patterns or
intrinsic structures in data. It is used to draw
inferences from datasets consisting of input data
without labeled responses.
Supervised and unsupervised learning with a real-life example

• Suppose we have a basket filled with different kinds of fruits.
• The task is to arrange them into groups.
• We have four types of fruits: Apple, Banana, Grapes, and Cherry.
Supervised Learning:
• We already know about the physical characters of fruits
• So arranging the same type of fruits at one place is easy now
• In data mining terminology this earlier work is called training the data
• Things are learnt from the training data; this is because of the response
variable
• Response variable means just a decision variable
• We can observe response variable below (FRUIT NAME)
Classification
No. | SIZE  | COLOR | SHAPE                                     | FRUIT NAME
1   | Big   | Red   | Rounded shape with depression at the top  | Apple
2   | Small | Red   | Heart-shaped to nearly globular           | Cherry
3   | Big   | Green | Long curving cylinder                     | Banana
4   | Small | Green | Round to oval, bunch shape, cylindrical   | Grape
Classification
• Suppose we have taken a new fruit from the basket; we then
observe the size, color, and shape of that particular fruit.
• If the size is big, the color is red, and the shape is rounded
with a depression at the top, you confirm the
fruit as an apple and put it in the apple group.
• We can observe in the table that a column was
labeled as “FRUIT NAME“. This is called a response
variable.
Classification
Unsupervised Learning:
• Suppose we have a basket and it is filled with some
different types of fruits and our task is to arrange
them as groups.
• This time, we don’t know anything about the fruits;
this is the first time we have seen them. We have no
clue about them.
• So, how will you arrange them?
• We will take a fruit and arrange the fruits by
considering the physical characters of each particular
fruit.
Classification
• Suppose we have considered color.
• Then we will arrange them on considering base condition
as color.
• Then the groups will be something like this.
• RED COLOR GROUP: apples & cherry fruits.
• GREEN COLOR GROUP: bananas & grapes.
• So now we will take another physical character such as size.
– RED COLOR AND BIG SIZE: apple.
– RED COLOR AND SMALL SIZE: cherry fruits.
– GREEN COLOR AND BIG SIZE: bananas.
– GREEN COLOR AND SMALL SIZE: grapes.
• Here we did not learn anything beforehand, meaning there
was no training data and no response variable.
Representation and Description
Chain Codes
Signatures
Some Simple Descriptors
Shape Numbers
Fourier Descriptors
If only the first P Fourier coefficients are used (the remaining
coefficients set to zero), the inverse DFT yields only an
approximation of the boundary:

ŝ(k) = (1/K) Σu=0..P−1 a(u) e^(j2πuk/K), k = 0, 1, …, K−1

where a(u) = Σk=0..K−1 s(k) e^(−j2πuk/K) are the descriptors of the
K-point complex boundary sequence s(k).
Fourier Descriptors

• This is a way of using the Fourier transform to
analyze the shape of a boundary.
• The x-y coordinates of the boundary are treated as
the real and imaginary parts of a complex number.
• Then the list of coordinates is Fourier transformed
using the DFT.
• The Fourier coefficients are called the Fourier
descriptors.
• The basic shape of the region is determined by the
first several coefficients, which represent lower
frequencies.
• Higher frequency terms provide information on the
fine detail of the boundary.
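A minimal NumPy sketch of this procedure (the square boundary and the choice of keeping the P lowest-|frequency| coefficients are assumptions made for the example; note that in the FFT's output ordering the low frequencies sit at both ends of the array):

```python
import numpy as np

def fourier_descriptors(xs, ys):
    """Boundary coordinates as complex numbers s(k) = x(k) + j*y(k);
    the DFT coefficients a(u) are the Fourier descriptors."""
    s = np.asarray(xs, float) + 1j * np.asarray(ys, float)
    return np.fft.fft(s)

def reconstruct(a, P):
    """Approximate the boundary from only the P lowest-frequency
    coefficients; the discarded high-frequency terms carried fine detail."""
    kept = np.zeros_like(a)
    idx = np.argsort(np.abs(np.fft.fftfreq(len(a))))[:P]
    kept[idx] = a[idx]
    s_hat = np.fft.ifft(kept)
    return s_hat.real, s_hat.imag

# Assumed example: 64 points traced around a unit square.
edge = np.linspace(0, 1, 16, endpoint=False)
xs = np.concatenate([edge, np.ones(16), 1 - edge, np.zeros(16)])
ys = np.concatenate([np.zeros(16), edge, np.ones(16), 1 - edge])
a = fourier_descriptors(xs, ys)
x8, y8 = reconstruct(a, P=8)   # a rounded, low-detail version of the square
```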
Fourier Descriptors
Pattern and Pattern Classes
Object Recognition
Recognition Based on Decision-Theoretic Methods
Minimum Distance Classifier
Matching By Correlation
Neural Network
Neural Network : Training Algorithm

References:
http://en.wikipedia.org/wiki/Hopfield_network
http://home.agh.edu.pl/~vlsi/AI/hamming_en/
Hamming Neural Network
Hopfield Neural Network
Classification
• Supervised
– Parallelepiped
– Minimum Distance
– Maximum Likelihood (Bayes Rule): Parametric or Non-parametric
• Unsupervised (Clustering)
– K-Means
– ISODATA
Supervised Classification
• Supervised classification can be distribution-free
or statistical.
• Distribution-free methods do not require
knowledge of probability distributions.
• Statistical techniques are based on
probability distributions and can be both
parametric as well as non-parametric.
Supervised Classification
• A Classification Procedure is said to be
supervised if the user defines decision rules or
provides training sets for guiding the machine.
• General Procedure includes
1. Set up a Classification Scheme:
a) Be appropriate to the scale and resolution
b) Be appropriate to the application
c) Include background classes that can be confused with
the target classes.
Supervised Classification…
2. Select the features to be used in the
classification (feature extraction)
a) Reduce Redundancy
b) Define useful features
3. Characterize the classes in terms of selected
features.
4. Determine the parameters (if any) required for the
classifier
5. Perform the classification
6. Evaluate the result
Supervised Classification…
• The classifier has the advantage of an analyst
or domain knowledge, which can be used to
guide the classifier to learn the
relationship between the data and the classes.
• The number of classes, prototype pixels for
each class can be identified using this prior
knowledge.
Parallelepiped Classifier
• The parallelepiped classifier is essentially a
thresholding operation in multiple bands.
• The simplest case is with a single variable (1 spectral
band), where a pixel is assigned to a particular class
if its gray value is greater than some minimum and
less than some maximum.
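A minimal sketch of that thresholding rule for multi-band pixels (Python; the two-band class boxes and pixel values are invented for the example). Note that with overlapping boxes the first matching class wins here, which is exactly the ambiguity discussed two slides below:

```python
import numpy as np

def parallelepiped_classify(pixels, class_boxes, unclassified=-1):
    """Assign each pixel to the first class whose per-band [min, max]
    box contains all of its band values; otherwise leave unclassified."""
    labels = np.full(len(pixels), unclassified)
    for label, (lo, hi) in class_boxes.items():
        inside = np.all((pixels >= lo) & (pixels <= hi), axis=1)
        labels[inside & (labels == unclassified)] = label
    return labels

# Assumed two-band example: rectangles for class 0 and class 1.
boxes = {0: (np.array([10, 5]), np.array([40, 30])),
         1: (np.array([35, 60]), np.array([80, 120]))}
pixels = np.array([[20, 10], [50, 90], [90, 200]])
print(parallelepiped_classify(pixels, boxes))  # -> [ 0  1 -1]
```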
Parallelepiped Classification…
• Advantages:
• Models data distributions as a rectangle in measurement
space.
• Simple to set up
• Easy to understand
• Very fast (real time)
• Disadvantages
• Actual data distributions often do not fit the rectangular model
well.
Parallelepiped Classification
• Even though the classes do not overlap, the
rectangular boundaries defined by the classifier do
overlap.
• Within the overlap region the classifier cannot
distinguish between the two classes.
• A more powerful and adaptable classification
scheme is needed.
Minimum Distance Classifier
• Defines classes in terms of the distance from a
prototype vector usually the mean vector from the
class.
• A discriminant function is defined in terms of distance
from the mean: di(x) = 1/‖x − µi‖, where µi is the mean
vector for the ith class; the discriminant is thus highest
for the class whose mean is closest.
Minimum Distance Classifier
• Simplest kind of supervised classification
• The method:
- Calculate the mean vector for each class
- Calculate the statistical (Euclidean)
distance from each pixel to class mean vector
- Assign each pixel to the class it is closest
to
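A minimal sketch of that three-step method (Python; the class means and pixels are invented for the example):

```python
import numpy as np

def minimum_distance_classify(pixels, class_means):
    """Assign each pixel to the class with the closest mean vector,
    using the Euclidean distance ||x - mu_i||."""
    labels = np.array(list(class_means))
    means = np.stack(list(class_means.values()))            # K x d
    dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]

# Assumed example: per-class mean vectors computed from training pixels.
means = {"water": np.array([20.0, 15.0]), "soil": np.array([70.0, 90.0])}
pixels = np.array([[25.0, 20.0], [60.0, 80.0]])
print(minimum_distance_classify(pixels, means))  # ['water' 'soil']
```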
Minimum Distance Classifier…
Advantages
• Somewhat better description of distribution.

Disadvantages
• Much more computation than for the
parallelepiped method.
Maximum Likelihood Classification
• Calculates the likelihood of a pixel being in
different classes conditional on the available
features, and assigns the pixel to the class
with the highest likelihood
Maximum Likelihood Classification
• The likelihood of a feature vector x to be in class Ci is
taken as the conditional probability P(Ci|x).
• We need to compute P(Ci|x), that is the conditional
probability of class Ci given the pixel vector x.
• It is not possible to directly estimate the conditional
probability of a class given the feature vector.
• Instead, it is computed indirectly in terms of the
conditional probability of feature vector x given that
it belongs to class Ci.
Maximum Likelihood Classification
• P(Ci|x) is computed using Bayes’ Theorem in terms
of P(x|Ci)
• P(Ci|x) = P(x|Ci) P(Ci) / P(x)
• x is assigned to class Cj such that
• P(Cj|x) = Maxi P(Ci|x), i=1…K, the number of classes.
• P(Ci) is the prior probability of occurrence of class i in
the image
• P(x) is the multivariate probability density function of
feature x.
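A minimal sketch of this rule under a Gaussian assumption for P(x|Ci) (Python with SciPy; the class means, covariances, and priors are invented for the example). Because P(x) is the same for every class, it can be dropped from the comparison:

```python
import numpy as np
from scipy.stats import multivariate_normal

def max_likelihood_classify(x, class_stats, priors):
    """Assign x to the class Cj maximizing P(x|Ci) * P(Ci);
    the common denominator P(x) is omitted."""
    scores = {c: multivariate_normal.pdf(x, mean=m, cov=S) * priors[c]
              for c, (m, S) in class_stats.items()}
    return max(scores, key=scores.get)

# Assumed example: Gaussian class models estimated from training data.
stats = {"A": (np.array([0.0, 0.0]), np.eye(2)),
         "B": (np.array([3.0, 3.0]), np.eye(2))}
priors = {"A": 0.5, "B": 0.5}
print(max_likelihood_classify(np.array([2.5, 2.8]), stats, priors))  # -> B
```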
Maximum Likelihood Classification
Advantages and Disadvantages:
• Normally classifies every pixel no matter how far it is
from a class mean
• Slowest method – more computationally intensive
• Normally distributed data assumption is not always
true, in which case the results are not likely to be
very accurate
• Thresholding condition can be introduced into the
classification rule to separately handle ambiguous
feature vectors
Maximum Likelihood: Non-Parametric Classifiers
• The class-conditional probability density function P(x|Ci)
is estimated using the frequency of measurement
vectors in the training data.
• An important advantage is that any pattern, however
irregular it may be, can be characterized exactly.
• Major shortcomings are:
• Difficult to obtain large enough training sample to
adequately characterize the PDF.
• Require massive memory or clever programming
Maximum Likelihood: Parametric Classifiers
• Uses a parameterized model to describe the
distribution P(x|Ci).
• The functional forms are generally well defined;
the probabilities are completely specified once the
means, variances, and covariances are
known.
K-NN Classifier
• K-nearest neighbour classifier
• Simple in concept, time consuming to
implement
• For a pixel to be classified, find the K closest
training samples (in terms of feature vector
similarity or smallest feature vector distance)
• Among the K samples, find the most frequently
occurring class Cm
• Assign the pixel to class Cm
K-NN Classifier
The algorithm runs in the following manner:
• Determine k = the number of nearest neighbors.
• Calculate the distance (e.g., Euclidean) between the test
sample and all training samples.
• Sort the distances (shortest gets rank one)
and determine the nearest neighbors of the sample.
• Gather the category classes of the nearest neighbors.
• Use simple majority voting to predict the class of the test
sample:
the class that occurs most frequently among the
nearest neighbors wins.
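A minimal sketch of these steps (Python; the two-class toy data and k = 3 are assumptions for the example):

```python
import numpy as np
from collections import Counter

def knn_classify(test_sample, train_X, train_y, k=3):
    """Classify by majority vote among the k nearest training samples
    (Euclidean distance)."""
    dists = np.linalg.norm(train_X - test_sample, axis=1)  # distances
    nearest = np.argsort(dists)[:k]                        # k closest
    votes = Counter(train_y[i] for i in nearest)           # gather classes
    return votes.most_common(1)[0][0]                      # majority vote

# Assumed toy data: two classes clustered around (0,0) and (5,5).
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], float)
y = np.array(["minus", "minus", "minus", "plus", "plus", "plus"])
print(knn_classify(np.array([4.5, 4.0]), X, y, k=3))   # -> "plus"
```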
K-NN Classifier

In the adjoining figure there are two sets of data, represented as plus and
minus regions. We want to find out where the symbol X lies. In fig. (a),
with K = 1, it falls in the minus region because within the calculated
distance the minus neighbors have the majority. Fig. (b) is not
decisive because with K = 2 the measured distances hold almost equal
numbers from each class. In the third figure the point goes to the plus
region as the plus neighbors form the majority.
Unsupervised Learning
• Data do not have a target attribute.
• The process involves exploring the data and finding some
intrinsic properties in them.
• A technique named clustering is used, a way to
find similarity among data.
• No a priori information is provided, unlike in
the supervised case; hence the name unsupervised
learning.
Unsupervised Learning
• When access to domain knowledge or the experience
of an analyst is missing, the data can still be analyzed
by numerical exploration, whereby the data are
grouped into subsets or clusters based on statistical
similarity
• In the absence of reliable training data it is possible
to understand the structure of the data using
statistical methods such as clustering algorithms
• Popular clustering algorithms are k-means and
ISODATA.
Clustering :An illustration
• The data set has four natural groups of data
points. i.e. 4 natural clusters.
Why Clustering?
Problem domain:
• Grouping of people of similar sizes to make “small”,
“medium” and “large” T-shirts.
• Tailor-made for each person: too expensive.
• One size fits all: does not fit all.
• In marketing, segment customers according to their
similarities.
• In fact, clustering is one of the most utilized data mining
techniques. It finds application in medicine, psychology,
botany, marketing and many more.
• In recent years, owing to the rapid increase of online documents,
text clustering has become important.
K-Means Algorithm
• An unsupervised clustering algorithm
• “K” stands for number of clusters, it is typically a user
input to the algorithm; some criteria can be used to
automatically estimate K
• K-means algorithm is iterative in nature
• It converges; however, only a local minimum is
guaranteed
• Works only for numerical data
• Easy to implement
K-Means Algorithm
Given k, algorithms work as follows:
1) Randomly choose k data points (seeds) to be
the initial centroids (cluster centers).
2) Assign each data point to the closest
centroid.
3) Re-compute the centroids using the current
cluster memberships.
4) If a convergence criterion is not met, go to 2.
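A minimal sketch of these four steps (Python; the toy data, seed, and convergence test are assumptions for the example):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Steps 1-4 above: random seeds, assign points to the closest
    centroid, recompute centroids, stop when assignments stabilize."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # step 1
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        new_labels = np.argmin(dists, axis=1)             # step 2
        if np.array_equal(new_labels, labels):
            break                                         # step 4: converged
        labels = new_labels
        for j in range(k):                                # step 3
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Assumed toy data: two well-separated groups.
X = np.array([[1, 1], [1.5, 2], [1, 1.5], [8, 8], [8.5, 9], [9, 8]])
labels, centers = k_means(X, k=2)
print(labels)   # e.g. [0 0 0 1 1 1] (cluster ids may be swapped)
```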
K-Means Algorithm
K-Means Example
Pros and Cons of K means
Strengths:
• Simple : easy to understand and to implement
• Efficient : Time Complexity: O(tkn) where n is the
number of data points, k is the number of clusters,
and t is the number of iterations.
Since both t and k are small, k-means is
considered a linear algorithm and is the most
popular clustering algorithm.
Pros and Cons of K means
Weakness of K-Means
• The algorithm is only applicable if the mean is
defined. For categorical data, k-modes is used, where
the centroid is represented by the most frequent values.
• Sensitive to outliers. Outliers are the data points
that are very far away from other data points. It
could be error in the data recording or some
special data points with very different values.
ISODATA
• Iterative Self-Organizing Data Analysis Technique (the
last A added to make the acronym sound better)
• ISODATA is a method of unsupervised classification
• Don’t need to know the number of clusters
• Algorithm splits and merges clusters
• User defines threshold values for parameters
• Computer runs algorithm through many iterations
until threshold is reached
ISODATA
• ISODATA unsupervised classification calculates class means evenly
distributed in the data space then iteratively clusters the remaining
pixels using minimum distance techniques.
• Each iteration recalculates means and reclassifies pixels with respect
to the new means.
• Iterative class splitting, merging, and deleting is done based on input
threshold parameters.
• All pixels are classified to the nearest class unless a standard
deviation or distance threshold is specified, in which case some
pixels may be unclassified if they do not meet the selected criteria.
• This process continues until the number of pixels in each class
changes by less than the selected pixel change threshold or the
maximum number of iterations is reached
How ISODATA works:
1) Cluster centers are randomly placed and pixels are
assigned based on the shortest distance to center
method
2) The standard deviation within each cluster, and the
distance between cluster centers is calculated
- Clusters are split if one or more standard
deviation is greater than the user-defined threshold
- Clusters are merged if the distance between
them is less than the user-defined threshold
How ISODATA works:
3) A second iteration is performed with the new
cluster centers
4) Further iterations are performed until:
i) the average inter-center distance falls below
the user-defined threshold,
ii) the average change in the inter-center
distance between iterations is less than a threshold, or
iii) the maximum number of iterations is reached
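A highly simplified sketch of this split/merge loop (Python; the thresholds, toy data, and the single-feature split heuristic are assumptions made here — real ISODATA also enforces minimum cluster sizes, deletion rules, and per-iteration limits):

```python
import numpy as np

def isodata(X, k_init=2, std_max=1.5, merge_dist=1.0, max_iter=10, seed=0):
    """Simplified ISODATA: nearest-center assignment, then split any
    high-spread cluster and merge any centers closer than a threshold."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k_init, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = np.argmin(
            np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        groups = [X[labels == j] for j in range(len(centers))]
        next_centers = []
        for g in (g for g in groups if len(g) > 0):
            c, s = g.mean(axis=0), g.std(axis=0)
            if len(g) > 1 and s.max() > std_max:
                # split along the feature with the largest spread
                offset = np.zeros_like(c)
                offset[np.argmax(s)] = s.max()
                next_centers += [c + offset, c - offset]
            else:
                next_centers.append(c)
        merged = []                       # merge centers that are too close
        for c in next_centers:
            for i, m in enumerate(merged):
                if np.linalg.norm(c - m) < merge_dist:
                    merged[i] = (m + c) / 2
                    break
            else:
                merged.append(c)
        centers = np.array(merged)
    return centers

# Assumed toy data: three groups, deliberately started with only 2 centers.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (20, 2)) for m in ([0, 0], [4, 0], [2, 4])])
print(len(isodata(X, k_init=2)))   # often 3 after splitting
```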
Advantages of ISODATA
• Don’t need to know much about the data
beforehand
• Little user effort required
• ISODATA is very effective at identifying
spectral clusters in data
Drawbacks of ISODATA
• May be time consuming if data is very
unstructured
• Algorithm can spiral out of control leaving only
one class
KNN vs. K-means
• Many people get confused between these two statistical
techniques, K-means and K-nearest neighbor. Some of the
differences:
• K-means is an unsupervised learning technique (no dependent
variable) whereas KNN is a supervised learning algorithm
(a dependent variable exists).
• K-means is a clustering technique which tries to split data
points into K clusters such that the points in each cluster tend
to be near each other, whereas K-nearest neighbor
determines the classification of a point by combining the
classifications of the K nearest points.
HOPFIELD NEURAL NETWORK
• A Hopfield neural network is a type of artificial neural network
invented by John Hopfield in 1982.
• It usually works by first learning a number of binary patterns
and then returning the one that is the most similar to a given
input
• It is composed of one layer of nodes.
• All the nodes act as both inputs and outputs.
• Each node is connected to all the others but not to itself.
A discrete Hopfield network operates in a discrete fashion: the
input and output patterns are discrete vectors, which can be
either binary (0, 1) or bipolar (+1, −1) in nature. The network
has symmetrical weights with no self-connections, i.e.,
wij = wji and wii = 0.
Architecture

Following are some important points to keep in mind about the
discrete Hopfield network:
• This model consists of neurons with one inverting and one
non-inverting output.
• The output of each neuron should be the input of other
neurons but not the input of self.
• Weight/connection strength is represented by wij.
• Connections can be excitatory as well as inhibitory: a
connection is excitatory if the output of the neuron is the same
as its input, otherwise inhibitory.
• Weights should be symmetrical, i.e. wij = wji
Every pair of units i and j in a Hopfield network has a
connection that is described by the connectivity weight wij.
The connections in a Hopfield net typically have the following
restrictions:
• wii = 0 (no unit has a connection with itself)
• wij = wji (connections are symmetric)
Construction of Hopfield network
• Each input i has weight wi.
• After each node is updated, an output is produced.
• Weighted sum of all inputs = Σi wi xi
• The connection weight from node j to node i is wij.
• The weight matrix W should obey the following two rules:
Symmetry: wij = wji
No self-connection: wii = 0
Construction of Hopfield network
• W = [wij] is the n × n matrix of connection weights, symmetric
(wij = wji) with a zero diagonal (wii = 0).
• Inputs x = (x1, x2, …, xn)T, where each xj is either 1 or −1.
Training Algorithm
Case 1: Binary input patterns

For storing a set of binary patterns s(p), p = 1 to P,
where s(p) = (s1(p), s2(p), …, si(p), …, sn(p)),
the weight matrix is given by

wij = Σp=1..P [2si(p) − 1][2sj(p) − 1], for i ≠ j
Training Algorithm
Case 2: For bipolar patterns (1, −1)

The weight matrix is given by:

wij = Σp=1..P si(p) sj(p), for i ≠ j
Testing Algorithm
Step 1: Initialize the weights to store patterns.
Step 2: While the activations of the network have not converged,
perform steps 3 to 9.
Step 3: For each input vector x, perform steps 4 to 8.
Step 4: Make the initial activations of the network equal to the
external input vector x:
yi = xi, i = 1 to n
Step 5: Perform steps 6 to 8 for each unit Yi (units are updated in
random order).
Testing Algorithm
Step 6: Calculate the net input of the network:
yin,i = xi + Σj yj wji
Step 7: Apply the activation function over the net input to calculate
the output:
yi = 1 if yin,i > θi
yi unchanged if yin,i = θi
yi = 0 if yin,i < θi
Here, θi is the threshold of unit i.
Testing Algorithm
Step 8: Transmit the obtained output yi to all the other
units. Thus, the activation vectors are updated.
Step 9: Test the network for convergence.
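Putting the bipolar training rule and the testing loop together, here is a minimal sketch (Python; the stored patterns and the noisy probe are invented, and as a simplification the external input is used only to initialize the activations, with all thresholds θi taken as 0):

```python
import numpy as np

def hopfield_train(patterns):
    """Bipolar Hebbian storage: w_ij = sum_p s_i(p) s_j(p), i != j;
    W is symmetric with a zero diagonal (no self-connections)."""
    P = np.asarray(patterns, float)
    W = P.T @ P
    np.fill_diagonal(W, 0)
    return W

def hopfield_recall(W, x, sweeps=10, seed=0):
    """Asynchronous recall: start from the input pattern, update units
    one at a time in random order until the activations stop changing."""
    rng = np.random.default_rng(seed)
    y = np.array(x, float)
    for _ in range(sweeps):
        changed = False
        for i in rng.permutation(len(y)):
            net = W[i] @ y                     # net input to unit i
            new = 1.0 if net >= 0 else -1.0    # bipolar threshold at 0
            changed = changed or (new != y[i])
            y[i] = new
        if not changed:
            return y                           # converged to a stable state
    return y

# Assumed example: store two bipolar patterns, recall from a noisy cue.
p1 = np.array([1, -1, 1, -1, 1, -1])
p2 = np.array([1, 1, 1, -1, -1, -1])
W = hopfield_train([p1, p2])
noisy = np.array([1, -1, 1, -1, 1, 1])   # p1 with its last bit flipped
print(hopfield_recall(W, noisy))          # -> recovers p1
```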
Energy Function
• Hopfield defined the energy function of the network using the
network architecture. The quantity is called "energy" because
it either decreases or stays the same as network units are
updated.
• The architecture includes the number of neurons, their output
functions, threshold values, the connections between neurons,
and the strengths of the connections.
• At each iteration of the processing of the network, the energy
value decreases, and the network reaches a stable state when
its energy value reaches a minimum.
Energy Function
• For a given state xi of the network with connection weights
wij, where wij = wji and wii = 0:

E = −(1/2) Σi Σj wij xi xj

• Update xi to xi′ and let the new energy be E′.
• Then E′ ≤ E.
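Continuing the sketch above (thresholds assumed zero, matching the recall rule used there), the energy of a state can be computed directly; evaluating it on the noisy probe and on the stored pattern shows the energy decreasing as the network settles:

```python
def hopfield_energy(W, y):
    """E = -(1/2) * sum_ij w_ij y_i y_j for symmetric W, zero diagonal;
    each asynchronous update leaves E unchanged or lowers it."""
    return -0.5 * y @ W @ y

# With W, noisy, p1 from the previous sketch: for those assumed patterns
# the energy drops from -2 (noisy state) to -14 (stored pattern p1).
print(hopfield_energy(W, noisy), hopfield_energy(W, p1))
```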
Hamming Network: Overview
• Solves binary pattern recognition problems, e.g.
recognizing apples and oranges.
• Uses both feedforward and feedback/recurrent
neural layers.
• Objective: to decide which prototype vector is
closest to the input vector.
• Working principle: calculating Hamming
distances between the input vector and the weight
vectors of the output neurons.
Hamming Distance
• The distance between two vectors is equal to
the number of elements that are different.
• The greater the match between patterns, the smaller
the Hamming distance.
• The node with the minimum Hamming distance to the
input vector is the winner.
• Hamming distance = number of different bits.
Hamming Network
Architecture
Feed Forward Layer
• Performs correlation, or inner product,
between each of the prototype patterns and
the input pattern.
• Uses a linear transfer function.
• Inner product:
– largest when the vectors point in the same
direction
– smallest if they point in opposite directions
Recurrent Layer
• Initialized with the output of the feedforward
layer.
• Choosing the learning rate and the number of
neurons is important.
• Competitive layer: neurons compete with
each other to determine a winner.
• After the competition, only one neuron will have a
nonzero output.
Hamming Algorithm
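As a rough illustration of the two layers described above, here is a minimal sketch for bipolar vectors (Python; the prototypes, the probe, and the ε = 0.1 MAXNET inhibition constant are assumptions made for this example):

```python
import numpy as np

def hamming_classify(x, prototypes, eps=0.1, max_iter=100):
    """Sketch of a Hamming network for bipolar vectors.
    Feedforward layer: a_i = (p_i . x + n) / 2 = number of matching bits,
    i.e. n minus the Hamming distance to prototype p_i.
    Recurrent (competitive) layer: each neuron inhibits the others
    until only the best match keeps a nonzero output."""
    P = np.asarray(prototypes, float)
    n = P.shape[1]
    a = (P @ x + n) / 2.0            # feedforward: match scores
    for _ in range(max_iter):        # recurrent competition (MAXNET)
        a_new = np.maximum(0, a - eps * (a.sum() - a))  # lateral inhibition
        if np.count_nonzero(a_new) <= 1:
            return int(np.argmax(a_new))
        a = a_new
    return int(np.argmax(a))

# Assumed example: two 4-bit bipolar prototype patterns.
protos = np.array([[1, -1, 1, -1], [-1, 1, -1, 1]])
x = np.array([1, -1, 1, 1])          # Hamming distance 1 to prototype 0
print(hamming_classify(x, protos))   # -> 0
```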
