
Pattern Recognition

Course Instructor
Prof. Jyotsna Singh
COURSE CONTENT:
Unit 1 – Basics of pattern recognition: Design principles of a pattern
recognition system, Learning and adaptation, Pattern recognition
approaches; Mathematical foundations: Linear algebra, Probability
theory, Expectation, mean and covariance, Normal distribution,
multivariate normal densities, Chi-squared test.
What is pattern recognition?
During their evolution, animals and human beings have developed
sophisticated skills for recognizing “patterns” acquired by their
sense organs:
skills for fast recognition of predators, fast detection of features
that distinguish friends from foes, etc.
Pattern recognition skills were initially developed for the struggle
for existence; they were later refined to support high-level abilities
(e.g., writing, painting).
Human pattern recognition may be regarded as the
identification of patterns within the data collected by our sense
organs.
It is worth noting that pattern recognition is largely a subconscious
activity for human beings.
The challenge of pattern recognition
[Theo Pavlidis, Why general AI is so hard?, http://theopavlidis.com]

We need to replicate complex transformations that the human/animal
brain has evolved over millions of years.
We have to deal with the fact that pattern-recognition processing is
not unidirectional and is also affected by factors other than the
input (the “context”).
An engineering definition of pattern
recognition
• Pattern recognition aims to build machines able to recognize patterns,
just as aeronautical engineering aims to build airplanes able to fly.
Pattern recognition can be defined as the scientific discipline that studies
theories and methods for designing machines that are able to recognize
patterns in noisy data…
Pattern recognition has an “engineering” nature, as its final goal is the design of
“machines”
(R.P.W. Duin, F. Roli, D. de Ridder, Pattern Recognition Letters, 2002)

• Nowadays, replicating human pattern recognition performance for
a large variety of tasks (building a general pattern recognition system)
is still impossible.

• But we can be successful on limited and well-understood tasks!

Phases of Pattern Recognition
Typically, three different phases are identified in a pattern recognition
process: classification, recognition, and interpretation.
These three phases are strictly linked, and are carried out jointly,
progressively, and in an unconscious way by human beings.

– Classification: assigning a “pattern” to a category/class (e.g., assigning
symbols in a document to alphabet letters);
– Recognition: recognizing a complex object made up of component
patterns (e.g., recognizing a “word” after having classified its letters);
– Interpretation: understanding the meaning of the whole stream of input
data (for instance, interpreting the semantic meaning of a text after the
recognition of its words).

• This course focuses on pattern classification. We use the term
recognition instead of classification when the context makes the meaning
clear and there is no ambiguity.
How to characterize patterns
Different representations of patterns:
◦ A vector of numerical features
◦ First-order logical formulas
◦ Graphs
◦ Semantic networks, etc.

The best representation:
patterns belonging to the same class should be close in the feature
space, and far away from patterns of different classes.
How to characterize patterns
Statistical approach (in this course, we focus on this approach)
◦ Patterns characterized with a numerical feature vector
◦ Classification model is based on probability density functions
Syntactic approach
◦ Patterns are described with “rules”
◦ Classification model is a “grammar”
Structural approach
◦ Patterns are represented with graphs
◦ Classification model is a matching algorithm between graphs
Symbolic approach
◦ Patterns are described with symbolic models
◦ Classification models are based on logical inference and symbolic processing
Dimensionality reduction
When dealing with high dimensional data, it is often useful to
reduce the dimensionality by projecting the data to a lower
dimensional subspace which captures the “essence” of the
data.
A simple example may be represented by the projection of
some 3D data down to a 2D plane.
The 2D approximation is quite good, since most points lie
close to this subspace; reducing to 1D would result in a
rather poor approximation.
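A minimal sketch of such a projection, assuming we use principal component analysis (PCA) from scikit-learn on synthetic 3D data that lies close to a plane:

```python
# Project 3D data down to 2D with PCA; the data are synthetic and
# chosen so that most of the variance lies in a 2D subspace.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
plane = np.array([[1.0, 0.0, 0.2],
                  [0.0, 1.0, 0.1]])          # basis of a 2D plane in 3D
X = rng.normal(size=(200, 2)) @ plane        # points on the plane
X += 0.05 * rng.normal(size=X.shape)         # small off-plane noise

pca = PCA(n_components=2)                    # keep the 2 largest-variance directions
X2 = pca.fit_transform(X)                    # the 2D approximation
print(pca.explained_variance_ratio_.sum())   # close to 1: little information lost
```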
A hypothetical example of pattern classification
Suppose that a fish packing plant wants to automate the
process of sorting incoming fish on a conveyor belt, in order to
discriminate salmon from sea bass by processing images
acquired by a digital camera…
[Example from the DHS book: Duda, Hart, and Stork]
Classification of salmon and sea bass

• A large part of the acquired image is not useful for classifying
the objects of interest, as much of it is “background”; the image
background is not interesting for our classification task.
• We need to identify “regions of interest” (ROIs) which contain
the patterns we want to classify,
• and then to characterize such patterns (regions) with “features”,
the most informative measures for the classification task at hand.
Supervised learning: Classification
The main goal in a classification task is to assign an unknown
input sample to one of a number of classes, which are considered to be
known a priori.
E.g.: optical character recognition (OCR), authorship recognition given a text,
music classification (e.g., genre, author, song), among many others.
Data representation: we should encode the relevant information that
resides in the raw data in an efficient and information-rich way.
The data representation is usually done by transforming the raw data
(measured by a sensing device) into a new space, in the form of patterns.
Each pattern is represented by a vector x ∈ Rl, which is also known as a
feature vector.
Segmentation
We pre-process images to enhance
quality (e.g. to enhance image contrast)
and to identify “patterns” of interest.
For instance, we “segment” images to
separate the background from the regions
containing fish.
Image segmentation is often used in face
recognition, in order to identify the image
regions which contain the most relevant
information.

In this application, segmentation is often
based on the detection of “skin” regions.
Representing data by features
The elements of such a vector are known as features; they represent
the most elemental and distinctive characteristics of the original data
in the transformed space.
A few features are able to represent a much larger amount of data:
in some way, they “synthesize” the original data while keeping the
most significant part of its information content.
Each pattern becomes a single point in an l-dimensional space,
known as the feature space or the input space.
Feature
A feature is any distinctive aspect, quality, or characteristic
◦ Features may be symbolic (e.g., color) or numeric (e.g., height)
Definitions
◦ The combination of d features is represented as a d-dimensional column vector
called a feature vector
◦ The d-dimensional space defined by the feature vector is called the feature space
◦ Objects are represented as points in feature space. This representation is called a
scatter plot
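A minimal sketch of such a scatter plot, assuming two hypothetical features (length, lightness) and synthetic values for two fish classes:

```python
# Plot patterns from two classes as points in a 2D feature space.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
salmon = rng.normal(loc=[10.0, 4.0], scale=0.8, size=(50, 2))    # (length, lightness)
sea_bass = rng.normal(loc=[14.0, 6.5], scale=0.8, size=(50, 2))

plt.scatter(salmon[:, 0], salmon[:, 1], marker='*', label='salmon')
plt.scatter(sea_bass[:, 0], sea_bass[:, 1], marker='x', label='sea bass')
plt.xlabel('length')
plt.ylabel('lightness')
plt.legend()
plt.show()
```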
Feature Extraction
Regions of interest (containing “patterns”) are usually
characterized with a set of physical measures, called features
(e.g., length, color, shape of the fish).
Features are compact representations of patterns that facilitate
their correct classification; each pattern is characterized by a set
of features (e.g., length, lightness, area, etc.).
Pattern classification is about assigning class labels to patterns
(“objects”); patterns are described by a set of measurements,
also called features (or attributes).
Hereafter, we assume that each pattern is described by a feature
vector with d elements: x = (x1, x2, …, xd).
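A minimal sketch of one pattern as such a feature vector, with purely illustrative feature names and values:

```python
# One fish pattern as a d-dimensional feature vector (d = 4 here).
import numpy as np

feature_names = ["length", "lightness", "width", "area"]  # hypothetical features
x = np.array([12.3, 5.1, 3.4, 41.8])                      # x = (x1, x2, x3, x4)
print(dict(zip(feature_names, x)))
```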
Features generation and feature selection

The previously described process of representing raw data in a
feature space is known as the feature generation stage.
Usually, one starts with some large number K of candidate features
and eventually selects the l most informative ones.
The latter step is performed via an optimization procedure and is
known as the feature selection stage.
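A minimal sketch of this selection step, assuming we use scikit-learn's SelectKBest with a univariate score on synthetic data:

```python
# Keep the l most informative of K candidate features.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))            # 100 patterns, K = 20 candidate features
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # labels depend only on features 0 and 3

selector = SelectKBest(score_func=f_classif, k=2)   # select l = 2 features
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))           # indices of the chosen features
```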
What makes a “good” feature vector?
The quality of a feature vector is related to its ability to discriminate
examples from different classes
◦ Examples from the same class should have similar feature values
◦ Examples from different classes should have different feature values
Class
Intuitively, a class contains similar objects, whereas objects
from different classes are dissimilar
(salmon and sea bass belong to two different classes).
We can assume that there are c possible classes in the problem,
denoted Ω = {ω1, ω2, ..., ωc}.
Each pattern belongs to one of the c classes of the set Ω;
each pattern has a class label.
The feature values are arranged as a d-dimensional vector.
The real space is called the feature space, each axis corresponding
to a physical feature; plotting the patterns as points in this space
gives a scatter plot.
Classifiers
The task of a classifier is to partition feature
space into class-labeled decision regions
◦ Borders between decision regions are called
decision boundaries
◦ The classification of a feature vector x consists of
determining which decision region it belongs to,
and assigning x to that class

A classifier can be represented as a set of discriminant functions
gi(x), i = 1, …, c: the classifier assigns a feature vector x to class ωi
if gi(x) > gj(x) for all j ≠ i.
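A minimal sketch of this decision rule, with linear discriminant functions and made-up weights purely for illustration:

```python
# Assign x to the class whose discriminant function g_i(x) is largest.
import numpy as np

W = np.array([[1.0, -0.5],    # weights of g_1 (class omega_1)
              [-0.3, 0.8],    # weights of g_2 (class omega_2)
              [0.2, 0.1]])    # weights of g_3 (class omega_3)
b = np.array([0.0, 0.1, -0.2])

def classify(x):
    g = W @ x + b             # evaluate all g_i(x) = w_i . x + b_i
    return int(np.argmax(g))  # index of the winning class

print(classify(np.array([2.0, 1.0])))
```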
Training a classifier
Once decided upon the input space, the next step is to train a
classifier.
This is achieved by first selecting a set of data whose classes are
known, which comprises the training set.
This is a set of pairs (xk, yk), k = 1, . . . , K, where yk is the
output variable denoting the class to which xk belongs, known as the
corresponding class label.
The class labels yk take values over a discrete set, e.g., {1, 2, . . .
, M}, for an M-class classification task.
Training a classifier
Based on the training data, it is possible to design a function
f, which predicts the output label, given the input.
This function is known as the classifier.
In general, we need to design a set of such functions, but we
start by considering a simple case: the two-class classification task,
in which yk ∈ {−1, +1}.
Once the classifier has been designed, the system is ready for
predictions.
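A minimal sketch of such a two-class classifier, assuming a linear f of the form sign(w·x + b) trained with scikit-learn's Perceptron (one possible choice) on synthetic data:

```python
# Learn a linear two-class classifier from labeled training pairs (x_k, y_k).
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)),    # class -1
               rng.normal(loc=+1.0, size=(50, 2))])   # class +1
y = np.array([-1] * 50 + [+1] * 50)

clf = Perceptron().fit(X, y)       # estimate w and b from the training set
print(clf.predict([[0.8, 1.2]]))   # predict the label of a new pattern
```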
Example of classification task-1
Given an unknown pattern, we form the corresponding
feature vector x from the raw data and,
depending on the value of f(x), the pattern is classified
into one of the classes.
Initially, we are given the set of training points in the
two-dimensional space (two features used, x1, x2).
In the figure, stars belong to class ω1 and crosses to the other
class, ω2, of the two-class classification task.
Feature Space and Decision Boundary

• How to reduce error? We could use two features instead of only one.
• In a d-dimensional feature space, we will have a decision boundary
instead of a simple threshold value.
• What is the best decision boundary (classifier)?

Example of classification task-1
Figure illustrates the classification task.

The linear classifier has been designed to separate the
training data into the two classes, having on its positive side the
points coming from one class and on its negative side those of the
other.
Example of classification task-1

The “blue” circle, whose class is unknown, is classified to the same class as the “star”
points, since it lies on the positive side of the classifier.
Classification Model
After the extraction of a set of features to characterize patterns,
we should select a classification model using such features to
classify patterns.
Let us assume a very simple classification model based on a
simple heuristic rule:
A sea bass is generally longer than a salmon
We can rewrite this heuristic rule more formally as follows:
◦ if l > l* then fish = sea bass, else fish = salmon
The threshold value l* can be a heuristic value that the managers of
the fish plant know; otherwise we should estimate it.
How can we estimate l*?
◦ We need a set of samples/examples of the two fish categories
(called the “training set”), as in the sketch below.
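A minimal sketch of this rule, with l* estimated from synthetic training lengths; the midpoint-of-means estimate is just one simple choice:

```python
# Heuristic threshold rule: length > l* -> sea bass, else salmon.
import numpy as np

rng = np.random.default_rng(4)
salmon_lengths = rng.normal(loc=10.0, scale=1.5, size=100)     # labeled training
sea_bass_lengths = rng.normal(loc=14.0, scale=1.5, size=100)   # examples

# One simple estimate of l*: the midpoint between the two class means.
l_star = (salmon_lengths.mean() + sea_bass_lengths.mean()) / 2.0

def classify(length):
    return "sea bass" if length > l_star else "salmon"

print(l_star, classify(12.8))
```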
Classification model
Computation of parameters of the classification model by a
“training set”

The threshold value l* can be estimated from the
empirical distributions obtained from a “training set”
(a set of examples of the two fish categories).

✔ In this example, we have only one parameter whose value has to be
estimated for our classifier (l = l*).
✔ In general, we will have a set of parameters.
Computation of parameters of the
classification model by a “training set”

The information to design a classifier is usually in the
form of a labeled data set D (called the design or training
set):
D = {x1, x2, …, xn}
xi = (xi1, xi2, …, xid), i = 1, …, n
Each xi belongs to one of the c classes (xi ∈ ωj, j = 1, …, c).
In the previous example, D is the data set used to
compute the empirical distributions of the length of the
two fish types.
This allows us to estimate the threshold value l* that
discriminates between salmon and sea bass.
Classification models
This simple example suggests a more general classification model:
we could estimate the two probability functions P(length | salmon)
and P(length | sea bass),
and then make a probabilistic decision…

• If the feature l (the “length” of the fish) does not allow
a good discrimination between the two classes, we can check
whether a different feature can do better.
• The “lightness” feature is more effective: it distinguishes the
two kinds of fish much better.
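A minimal sketch of such a probabilistic decision, assuming the class-conditional densities are estimated with histograms from synthetic training data and the class priors are equal:

```python
# Estimate P(length | class) from training data, then pick the class
# whose estimated density is larger at the measured length.
import numpy as np

rng = np.random.default_rng(5)
salmon = rng.normal(10.0, 1.5, 1000)      # training lengths per class
sea_bass = rng.normal(14.0, 1.5, 1000)

bins = np.linspace(5, 20, 31)
p_salmon, _ = np.histogram(salmon, bins=bins, density=True)
p_bass, _ = np.histogram(sea_bass, bins=bins, density=True)

def classify(length):
    # locate the histogram bin of this length (equal class priors assumed)
    i = np.clip(np.digitize(length, bins) - 1, 0, len(p_salmon) - 1)
    return "salmon" if p_salmon[i] > p_bass[i] else "sea bass"

print(classify(11.2), classify(15.0))
```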
Cost function
In some applications, different errors have different
“costs”
For instance, misclassifying sea bass as salmon can have a higher
cost:
customers might dislike finding sea bass in a box that should
contain salmon.
Therefore, the threshold value l* should be adjusted in
order to take such “costs” into account.
Cost Versus Classification rate
The linear classifier was designed to minimize the overall misclassification rate.
Is this the best objective function for our fish processing plant?
The cost of misclassifying salmon as sea bass is that the end customer
will occasionally find a tasty piece of salmon when purchasing sea bass.
The cost of misclassifying sea bass as salmon is an end customer upset
at finding a piece of sea bass purchased at the price of salmon.
Intuitively, we could adjust the decision boundary to minimize this cost function
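A minimal sketch of this adjustment, assuming a one-feature threshold rule, synthetic training lengths, and illustrative cost values:

```python
# Choose the threshold that minimizes total expected cost, not raw error.
import numpy as np

rng = np.random.default_rng(6)
salmon = rng.normal(10.0, 1.5, 1000)      # training lengths, class "salmon"
sea_bass = rng.normal(14.0, 1.5, 1000)    # training lengths, class "sea bass"

COST_BASS_AS_SALMON = 5.0   # upset customer: the expensive mistake
COST_SALMON_AS_BASS = 1.0   # pleasant surprise: the cheap mistake

def total_cost(threshold):
    # decision rule: length > threshold -> "sea bass", else "salmon"
    bass_errors = np.sum(sea_bass <= threshold)    # sea bass labeled salmon
    salmon_errors = np.sum(salmon > threshold)     # salmon labeled sea bass
    return COST_BASS_AS_SALMON * bass_errors + COST_SALMON_AS_BASS * salmon_errors

candidates = np.linspace(8, 16, 200)
best = candidates[np.argmin([total_cost(t) for t in candidates])]
print(best)   # shifted below the midpoint, away from the costly error
```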
The issue of generalization
The recognition rate of our linear classifier (95.7%) met the
design specs, but we still think we can improve the performance
of the system
We then design an artificial neural network with five hidden layers, a
combination of logistic and hyperbolic tangent activation functions,
train it with the Levenberg-Marquardt algorithm and obtain an
impressive classification rate of 99.9975% with the following decision
boundary.
Satisfied with our classifier, we integrate
the system and deploy it to the fish
processing plant.
After a few days, the plant manager calls
to complain that the system is
misclassifying an average of 25% of the
fish.
What went wrong?
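The training-set figure was wildly optimistic: the network memorized the training data and failed to generalize. A minimal sketch of the standard guard against this, evaluating on a held-out test split (synthetic data, hypothetical network size):

```python
# Compare accuracy on the training data with accuracy on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=2000).fit(X_tr, y_tr)
print("train accuracy:", clf.score(X_tr, y_tr))   # optimistic estimate
print("test accuracy:", clf.score(X_te, y_te))    # honest estimate
```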
Regression I
Similarly to the classification task, the regression task
involves, to a large extent, the feature generation and
selection stages.
However, the output variable y is no longer discrete:
it takes continuous values, e.g., in an interval of the
real axis or in a complex-valued region.
The regression task is basically a curve fitting problem.
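A minimal sketch of regression as curve fitting, using a least-squares polynomial fit on noisy synthetic samples:

```python
# Fit a cubic polynomial to noisy samples of a curve, then predict.
import numpy as np

rng = np.random.default_rng(8)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)   # noisy observations

coeffs = np.polyfit(x, y, deg=3)      # least-squares fit of the coefficients
y_hat = np.polyval(coeffs, 0.25)      # predict the output at a new input
print(y_hat)
```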
Regression
Regression problems involving signals
For example, in financial applications one can predict
tomorrow’s stock market price given current market
conditions and all other related information.
Each piece of information is a measured value of a
corresponding feature.
Examples of prediction applications also include:
− Predicting the age of a viewer watching a given video on YouTube.
− Predicting the location in 3D space of a robot arm end effector, given
the control signals (torques) sent to its various motors.
− Predicting the temperature at any location inside a building using
weather data, time, door sensors, etc.

Signal and image restoration and de-noising are also
examples of regression tasks.
Questions ????
