Unit IV. Digital Image Classification
2. Definition of the clusters in the feature space. Here two approaches are
possible: supervised classification and unsupervised classification. In a
supervised classification, the operator defines the clusters during the training
process; in an unsupervised classification a clustering algorithm
automatically finds and defines a number of clusters in the feature space.
3. Selection of classification algorithm. Once the spectral classes
have been defined in the feature space, the operator needs to
decide on how the pixels (based on their DN-values) are assigned
to the classes. The assignment can be based on different criteria.
4. Running the actual classification. Once the training data have
been established and the classifier algorithm selected, the actual
classification can be carried out. This means that, based on its DN-
values, each individual pixel in the image is assigned to one of the
defined classes.
5. Validation of the result. Once the classified image has been
produced, its quality is assessed by comparing it to reference data
(ground truth). This requires selection of a sampling technique,
generation of an error matrix, and the calculation of error
parameters.
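As an illustration of the validation step, the following minimal sketch builds an error matrix from a handful of sample pixels and derives some common error parameters. The function names, the sample labels and the choice of putting classified labels in the rows and reference labels in the columns are assumptions made for this example only.

```python
import numpy as np

def error_matrix(reference, classified, n_classes):
    """Count sample pixels per (classified, reference) label pair.

    Assumed layout: rows = classified label, columns = reference label.
    """
    reference = np.asarray(reference)
    classified = np.asarray(classified)
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell occurs several times
    np.add.at(matrix, (classified, reference), 1)
    return matrix

def overall_accuracy(matrix):
    """Fraction of sample pixels on the diagonal (correctly classified)."""
    return np.trace(matrix) / matrix.sum()

# Hypothetical sample: class labels 0, 1, 2 at eight reference locations
reference = [0, 0, 1, 1, 2, 2, 2, 1]
classified = [0, 1, 1, 1, 2, 2, 0, 1]

m = error_matrix(reference, classified, n_classes=3)
accuracy = overall_accuracy(m)                 # 6 of 8 samples agree: 0.75
users_accuracy = np.diag(m) / m.sum(axis=1)    # per class, relative to classified totals
producers_accuracy = np.diag(m) / m.sum(axis=0)  # per class, relative to reference totals
print(m)
print(accuracy, users_accuracy, producers_accuracy)
```

In a real assessment the sample locations would come from the chosen sampling technique rather than from a hand-written list.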
Preparation for image classification
Image classification serves a specific goal: converting image data into thematic
data. In the application context, one is interested in the thematic
characteristics of an area (pixel) rather than in its reflection values. Thematic
characteristics such as land cover, land use, soil type or mineral type can be used
for further analysis and as input to models. In addition, image classification can also
be considered as data reduction: the n multispectral bands result in a
single-valued raster file.
With the particular application in mind, the information classes of interest need
to be defined and their spatio-temporal characteristics assessed. Based on these
characteristics the appropriate image data can be selected. Selection of the
adequate data set concerns the type of sensor, the relevant wavelength bands
and the date(s) of acquisition.
Before starting to work with the acquired data, a selection of the available
spectral bands may be made. Reasons for not using all available bands (for
example all seven bands of Landsat TM) lie in the problem of band correlation
and, sometimes, in limitations of hardware and software. Band correlation occurs
when the spectral reflection is similar for two bands, so that the second band
adds little new information.
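Band correlation can be inspected numerically before bands are selected. The sketch below is one possible way to do this with NumPy; the function name, the (bands, rows, columns) array layout and the synthetic three-band image are assumptions for illustration, not data from a real sensor.

```python
import numpy as np

def band_correlation(image):
    """Return the band-to-band correlation matrix.

    image: array of shape (bands, rows, cols) holding DN-values (assumed layout).
    """
    # Flatten each band into one row of pixel values
    bands = image.reshape(image.shape[0], -1).astype(float)
    return np.corrcoef(bands)

# Synthetic example: band 2 is almost a scaled copy of band 1, band 3 is unrelated
rng = np.random.default_rng(0)
band1 = rng.integers(0, 256, size=(50, 50)).astype(float)
band2 = 0.9 * band1 + rng.normal(0, 2, size=(50, 50))
band3 = rng.integers(0, 256, size=(50, 50)).astype(float)
image = np.stack([band1, band2, band3])

corr = band_correlation(image)
print(np.round(corr, 2))
```

A value close to 1 (or -1) between two bands suggests that one of them could be left out of the classification.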
Supervised image classification
One of the main steps in image classification is the ‘partitioning’ of the feature
space. In supervised classification this is realized by an operator who defines the
spectral characteristics of the classes by identifying sample areas (training
areas).
Supervised classification requires that the operator be familiar with the area of
interest. The operator needs to know where to find the classes of interest in the
area covered by the image. This information can be derived from ‘general area
knowledge’ or from dedicated field observations.
A sample of a specific class, comprising a number of training pixels, forms a
cluster in the feature space. The clusters, as selected by the operator:
• should form a representative data set for a given class; this means that the
variability of a class within the image should be taken into account.
• should not overlap with the other clusters, or only to a limited extent;
otherwise, a reliable separation is not possible. With a specific data set, some
classes may have significant spectral overlap, which, in principle, means that
these classes cannot be discriminated by image classification. Solutions are to
add other spectral bands and/or image data acquired at other moments.
Unsupervised image classification
Supervised classification requires knowledge of the area at hand. If this
knowledge is not sufficiently available or the classes of interest are not yet
defined, an unsupervised classification can be applied. In an unsupervised
classification, clustering algorithms are used to partition the feature space into
a number of clusters.
Several methods of unsupervised classification exist, their main purpose
being to produce spectral groupings based on certain similarities. In one of the
most common approaches, the user has to define the maximum number of
clusters in a data set. Based on this, the computer locates arbitrary mean
vectors as the centre points of the clusters. Each pixel is then assigned to a
cluster by the minimum-distance-to-cluster-centroid decision rule.
Once all the pixels have been labelled, recalculation of the cluster centre takes
place and the process is repeated until the proper cluster centres are found and
the pixels are labelled accordingly.
The iteration stops when the cluster centres do not change any more. At any
iteration, however, clusters with fewer than a specified number of pixels are
eliminated.
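The iterative procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of any particular software package: the function and parameter names are hypothetical, the arbitrary starting centres are taken to be randomly chosen pixels, and Euclidean distance is used for the minimum-distance rule.

```python
import numpy as np

def assign(pixels, centres):
    """Label each pixel with the index of its nearest cluster centre."""
    dist = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
    return dist.argmin(axis=1)

def cluster_pixels(pixels, max_clusters, min_pixels=1, max_iter=100, seed=0):
    """pixels: (n_pixels, n_bands) array of DN-values."""
    pixels = np.asarray(pixels, dtype=float)
    min_pixels = max(min_pixels, 1)  # empty clusters are always eliminated
    rng = np.random.default_rng(seed)
    # Arbitrary starting centres: randomly chosen pixels (an assumption)
    centres = pixels[rng.choice(len(pixels), size=max_clusters, replace=False)]
    for _ in range(max_iter):
        labels = assign(pixels, centres)
        counts = np.bincount(labels, minlength=len(centres))
        keep = counts >= min_pixels
        if not keep.all():
            if not keep.any():
                raise ValueError("min_pixels is too large: no cluster survives")
            # Eliminate small clusters; their pixels are reassigned next pass
            centres = centres[keep]
            continue
        # Recalculate each centre as the mean of the pixels assigned to it
        new_centres = np.array(
            [pixels[labels == k].mean(axis=0) for k in range(len(centres))]
        )
        converged = np.allclose(new_centres, centres)
        centres = new_centres
        if converged:
            break  # the centres did not change any more
    return centres, assign(pixels, centres)

# Synthetic two-band example with two well-separated spectral groups
rng = np.random.default_rng(1)
group_a = rng.normal([30, 40], 2, size=(50, 2))
group_b = rng.normal([200, 180], 2, size=(50, 2))
pixels = np.vstack([group_a, group_b])

centres, labels = cluster_pixels(pixels, max_clusters=2, min_pixels=5)
print(np.round(centres, 1))
```

The resulting clusters are purely spectral groupings; attaching a thematic meaning to each of them remains a task for the operator.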
Classification algorithms
After the training sample sets have been defined, classification of the
image can be carried out by applying a classification algorithm.
Several classification algorithms exist. The choice of the algorithm
depends on the purpose of the classification and the characteristics of
the image and training data. In the following, three classifier
algorithms are explained. The box classifier is explained first because
its simplicity helps in understanding the principle; in practice it is
hardly ever used. The Minimum Distance to Mean and the Maximum
Likelihood classifiers are the ones used in practice.
Box classifier
The box classifier is the simplest classification method. Upper and lower limits
are defined for each class. The limits may be based on the minimum and maximum
values, or on the mean and standard deviation per class. Together, the lower and
upper limits define a box-like area in the feature space, which is why it is called
the box classifier. The number of boxes depends on the number of classes. Box
classification is also known as parallelepiped classification since the opposite
sides are parallel (Figure 12.9). During classification, an unknown pixel is checked
to see whether it falls in any of the boxes. It is labelled with the class of the box
in which it falls. Pixels that do not fall inside any of the boxes are assigned the
unknown class, sometimes also referred to as the reject class.
The disadvantage of the box classifier is that the boxes of different classes
may overlap. In such a case, a pixel is arbitrarily assigned the label of the
first box it encounters.
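A minimal sketch of the box classifier is given below, assuming that the limits are taken from the minimum and maximum training values per band. The function names, class names and training values are hypothetical and serve only to show the principle, including the first-box-wins behaviour for overlapping boxes.

```python
import numpy as np

def train_boxes(training):
    """Derive one box per class from its training pixels.

    training: dict mapping class name -> (n_pixels, n_bands) list of DN-values.
    Limits are the per-band minimum and maximum (one of the two options in the text).
    """
    boxes = {}
    for name, samples in training.items():
        samples = np.asarray(samples, dtype=float)
        boxes[name] = (samples.min(axis=0), samples.max(axis=0))
    return boxes

def box_classify(pixel, boxes, unknown="unknown"):
    """Return the class of the first box that contains the pixel."""
    pixel = np.asarray(pixel, dtype=float)
    for name, (lower, upper) in boxes.items():  # dicts keep insertion order
        if np.all(pixel >= lower) and np.all(pixel <= upper):
            return name  # first matching box wins, also where boxes overlap
    return unknown  # reject class

# Hypothetical two-band training samples
training = {
    "water": [[10, 5], [14, 9], [12, 7]],
    "forest": [[12, 8], [30, 60], [25, 50]],
}
boxes = train_boxes(training)

print(box_classify([11, 6], boxes))     # inside the water box only
print(box_classify([20, 40], boxes))    # inside the forest box only
print(box_classify([13, 8.5], boxes))   # inside both boxes: first box is used
print(box_classify([100, 100], boxes))  # outside all boxes
```

The third pixel lies in the overlap of the two boxes and receives the label "water" only because that box is checked first, which illustrates why the result in overlap zones is arbitrary.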