Introduction
The process of classification/recognition (1)
[Block diagram: physical reality → transducers → pre-processing → feature extraction → classification → classes, with supervision and feedback paths]
– pre-processing: generates suitable data/signals according to the
purposes of the recognition process (e.g., by
normalization, equalization, calibration, filtering for
quality enhancement).
Feature (parameter) extraction is a major problem: it consists in extracting the information
contained in the data/signals and making it usable for the subsequent analysis, so it
depends on the characteristics of the classification process. A parameter derived from a
signal must be significant, i.e., have a high discriminative power, but at the same time it
should be obtainable with appropriate computational simplicity.
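As a rough illustration of this trade-off (not from the slides), here is a minimal Python/NumPy sketch that computes three parameters that are cheap to obtain from a raw 1-D signal and often carry discriminative power; the function name extract_features and the test signal are purely hypothetical.

```python
import numpy as np

def extract_features(signal: np.ndarray) -> np.ndarray:
    """Three computationally cheap parameters of a 1-D signal:
    mean level, total energy, and number of zero crossings."""
    mean = signal.mean()
    energy = np.sum(signal ** 2)
    zero_crossings = int(np.sum(np.diff(np.sign(signal)) != 0))
    return np.array([mean, energy, zero_crossings])

# Hypothetical test signal: a noisy 5 Hz sinusoid sampled 500 times.
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(500)
print(extract_features(x))
```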
The process of classification/recognition (2)
[Block diagram: physical reality → transducers → pre-processing → feature extraction → classification → classes]
An element to be classified (a sample or pattern),
characterized by the extracted parameters, is
assigned to a class with criteria such as the
maximum a posteriori probability, the minimum
distance, ...
If the result of the process is not considered satisfactory, a modification of the process
through feedback (or backtracking) is applied. The change can be more or less
extensive and may involve any module of the system: new sensors, insertion of a new
pre-processing step, extraction of new parameters, a change of the classification method, ...
Supervised classifiers
• There are two main types of classifiers:
– supervised
– unsupervised.
[Tree diagram of the six soil classes: root → Vegetated / Unvegetated; Vegetated → Forest, Crop fields, Pasture; Crop fields → Corn, Wheat, Sugar beet]
– The set of six considered soil classes (the leaves of the tree) is first
divided into two “macro-classes”: Vegetated vs. Unvegetated;
– then the five vegetated classes are grouped into Forest, Crop fields, and Pasture;
– finally, Crop fields is divided into Corn, Wheat, and Sugar beet.
– Classification can be applied repeatedly at each level of the tree,
starting from the root, or only once, at the leaf level. In the latter
case, labeled samples are then grouped according to macro-classes (see the sketch below).
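To make the two strategies concrete, below is a small, purely hypothetical Python sketch of the first one (repeated classification at each level): one decision function per internal node is applied from the root down to a leaf. The features (ndvi, height, row_pattern, crop) and thresholds are invented for illustration, not taken from the slides.

```python
def classify_hierarchically(sample, node_classifiers):
    """Walk the class tree from the root, applying one classifier
    per internal node until a leaf class is reached."""
    node = "root"
    while node in node_classifiers:
        node = node_classifiers[node](sample)  # descend one level
    return node  # a leaf, i.e., the final class

# Toy tree mirroring the slide: root -> Vegetated/Unvegetated;
# Vegetated -> Forest/Crop fields/Pasture; Crop fields -> Corn/Wheat/Sugar beet.
tree = {
    "root": lambda s: "Vegetated" if s["ndvi"] > 0.3 else "Unvegetated",
    "Vegetated": lambda s: "Forest" if s["height"] > 5.0
                 else ("Crop fields" if s["row_pattern"] else "Pasture"),
    "Crop fields": lambda s: {"corn": "Corn", "wheat": "Wheat"}.get(s["crop"], "Sugar beet"),
}

sample = {"ndvi": 0.6, "height": 1.0, "row_pattern": True, "crop": "wheat"}
print(classify_hierarchically(sample, tree))  # -> Wheat
```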
Distance measures
• Several classifiers, including some that explicitly use
discriminant functions, are based on the criterion of minimum
distance between the unknown samples and classes. It is
therefore important to provide a definition of distance in the
feature space.
• A distance, or metric, in the feature space is a function d(·,·)
satisfying the following properties:
– d(x, x) = 0;
– d(x, y) = d(y, x) > 0, ∀ x ≠ y;
– d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
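As a quick numerical sanity check (not part of the slides), the three properties can be tested on a finite set of points; the Python/NumPy sketch below does this for the Euclidean distance, with a hypothetical helper name is_metric_on.

```python
import numpy as np

def is_metric_on(points, d, tol=1e-12):
    """Check the three metric properties on a finite point set
    (a necessary, not sufficient, condition for d to be a metric)."""
    for x in points:
        if d(x, x) > tol:                          # d(x, x) = 0
            return False
        for y in points:
            if not np.isclose(d(x, y), d(y, x)):   # symmetry
                return False
            if x is not y and d(x, y) <= 0:        # positivity for x != y
                return False
            for z in points:                       # triangle inequality
                if d(x, z) > d(x, y) + d(y, z) + tol:
                    return False
    return True

euclidean = lambda x, y: np.linalg.norm(x - y)
points = [np.random.randn(3) for _ in range(10)]
print(is_metric_on(points, euclidean))  # True
```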
Distance measures in classification problems
• The Minkowski distance of order p,
$$d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p},$$
is commonly used: if p = 2 it corresponds to the Euclidean distance, while in the limit p → ∞ it reduces to
$$d_\infty(\mathbf{x}, \mathbf{y}) = \max_{i=1,2,\ldots,n} |x_i - y_i|.$$
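A minimal Python/NumPy sketch of this family of distances (illustrative only; the function name minkowski is ours):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p; p = np.inf yields the max form."""
    diff = np.abs(np.asarray(x) - np.asarray(y))
    if np.isinf(p):
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 0.0, 3.0])
print(minkowski(x, y, 1))       # city-block distance: 5.0
print(minkowski(x, y, 2))       # Euclidean distance: ~3.606
print(minkowski(x, y, np.inf))  # maximum coordinate difference: 3.0
```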
Similarity Measures
• A similarity measure is a quantitative measure of how much
two samples “resemble” each other.
– Typically, the range of a similarity measure s(·,·) is [−1, 1]:
if s(x, y) = 1, x and y are said to be similar.
– There are different types of similarity measures. A first class of
functions takes into account only the two samples involved in the
measure. This class includes the following:
– cosine-type measure:
$$s(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^t \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|} = \cos\theta$$
– Tanimoto measure:
$$s(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x}^t \mathbf{y}}{\mathbf{x}^t \mathbf{x} + \mathbf{y}^t \mathbf{y} - \mathbf{x}^t \mathbf{y}}$$
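Both measures are direct to implement from the formulas above; a short Python/NumPy sketch (illustrative only):

```python
import numpy as np

def cosine_similarity(x, y):
    """s(x, y) = x^t y / (||x|| ||y||) = cos(theta)."""
    return (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

def tanimoto(x, y):
    """s(x, y) = x^t y / (x^t x + y^t y - x^t y)."""
    xy = x @ y
    return xy / (x @ x + y @ y - xy)

x, y = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(cosine_similarity(x, y))  # cos(45 degrees) ~ 0.7071
print(tanimoto(x, y))           # 1 / (1 + 2 - 1) = 0.5
```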
Remark
• The above distances and similarity measures refer to the
case in which the feature space is ℝⁿ (real-valued features).
• Even in applications that do not involve real-valued features,
but binary, discrete, or symbolic features (e.g., strings of bits
or symbols), it is possible to introduce specific notions of
distance and similarity (e.g., the Hamming distance
between binary strings). However, we will not use them
hereinafter.
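For completeness, a one-function Python sketch of the Hamming distance mentioned above (illustrative only):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings
    of bits/symbols differ."""
    assert len(a) == len(b), "strings must have equal length"
    return sum(ca != cb for ca, cb in zip(a, b))

print(hamming("10110", "11100"))  # 2
```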
Feature normalization
• The data on which a classification algorithm operates are
typically represented by a finite set of samples, called a data set
(e.g., the set of pixels of an image).
– However, the measures that characterize the samples are
typically linked to physical quantities with different units of
measurement.
– The values of the various features can differ by orders of
magnitude (e.g., height in meters and weight in kilograms).
– This may occur either because of non-homogeneity of the units of
measurement in which the features are expressed, or because of
the different ranges in which the variables can take values.
• Solution: Feature normalization.
– Normalization can be described as a function that, starting
from the i-th original feature xi, returns the value xi′ = hi(xi), where
hi is an appropriate normalization function (i = 1, 2, ..., n).
Typical normalization functions
• Denoting the data set by X = {x1, x2, ..., xN}, the most typical
functions hi applied to the feature xi are the following:
– division by the maximum (over the data set):
$$x_i' = \frac{x_i}{x_{i,\max}}, \qquad x_{i,\max} = \max_{k=1,2,\ldots,N} x_{ki}$$
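A minimal Python/NumPy sketch of division by the maximum, assuming positive-valued features as the formula implies (columns = features, rows = samples; the data values are invented):

```python
import numpy as np

def normalize_by_max(X: np.ndarray) -> np.ndarray:
    """Divide each feature (column) by its maximum over the data set,
    i.e., x_i' = x_i / x_i,max; assumes positive-valued features."""
    return X / X.max(axis=0)

# Hypothetical data set: height in meters, weight in kilograms.
X = np.array([[1.60, 55.0],
              [1.75, 70.0],
              [1.90, 95.0]])
print(normalize_by_max(X))  # both columns now lie in (0, 1]
```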
[Figure: three scatter plots of classes ω1 (×) and ω2 (○) in the (x1, x2) feature plane, panels (a), (b), and (c)]
– Panel (a) presents a linearly separable data set; (b) and (c) show two
non-linearly separable data sets. Class ω1 in (c) is bimodal.
– In cases (a) and (c) statistical techniques work very well, while
case (b) presents a much more complex situation.
Arrangement of samples in the feature space
• The properties of a class also depend on the geometrical
arrangement of the related samples in the feature space.
– In particular, if the classes have preferential directions and/or are
strongly overlapping, some classification rules fail.
– Example — In general, it is difficult to separate the samples in a
feature-space region where the two classes intersect. In addition,
a classifier using only the information about the class
centroids would perform very poorly (see the sketch after the figure below).
[Figure: scatter plot of two strongly overlapping classes (× and ○) in the (x1, x2) feature plane; the samples intermingle where the classes intersect]
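To see why a centroid-only rule struggles here, the Python/NumPy sketch below (all parameters invented for illustration) builds two strongly overlapping synthetic Gaussian classes and measures the accuracy of a minimum-distance-to-centroid classifier; the overlap caps the achievable accuracy well below 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two strongly overlapping Gaussian classes in a 2-D feature space
# (means only 0.8 apart, unit spread -- hypothetical values).
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
X2 = rng.normal(loc=[0.8, 0.0], scale=1.0, size=(200, 2))

# Minimum-distance-to-centroid rule.
c1, c2 = X1.mean(axis=0), X2.mean(axis=0)

def classify(x):
    return 1 if np.linalg.norm(x - c1) <= np.linalg.norm(x - c2) else 2

acc = (np.mean([classify(x) == 1 for x in X1]) +
       np.mean([classify(x) == 2 for x in X2])) / 2
print(f"centroid-rule accuracy: {acc:.2f}")  # well below 1 due to the overlap
```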