Digital Image Classification

This document provides an outline for a lecture on digital image classification. It discusses the principles of image classification, including viewing data in image space, spectral space and feature space. The key steps of the image classification process are selection of image data, defining clusters through training, selecting a classification algorithm, running the classification, and validating results. Statistical classification approaches are covered, including defining distances between clusters. The objectives of the lecture are to explain these concepts and demonstrate classification algorithms using examples.


Lecture 9.

Digital Image
Classification

Ms. Betsy Mugo


Lecture Outline
 9. Digital image classification
 9.1. Introduction
 9.2. Objectives
 9.3. Overview
 9.4. Principle of Image Classification
 9.5. Process of Image Classification
 9.6. Statistical Approaches of Image Classification
 9.7. Object/knowledge base Image Classification
 9.8. Validation of classification results
 9.9. Challenges of image classification
 9.10. Summary
 9.11. Activity
 9.12. Further Reading
9.1. Introduction

 In this lecture we will focus on digital image classification. We will discuss the principle and process of digital image classification. Using examples we will demonstrate how the different algorithms work. We will also discuss the procedure for accuracy assessment as well as challenges encountered in the classification process.
9.2. Objectives

 At the end of this lecture you should be able to:-
– Explain the principle of image classification
– Distinguish between information and spectral classes
– Explain the role of image, feature and spectral space in image classification
– Describe the process of image classification
– Distinguish between statistical and knowledge-based classification
– Apply classification algorithms
– Validate classification results.
9.3. Overview

 Information Classes and Spectral Classes


– Information classes are those categories of interest that the analyst is actually
trying to identify in the imagery, such as different kinds of crops, different
forest types or tree species, different geologic units or rock types, etc.
– Spectral classes are groups of pixels that are uniform (or near-similar) with
respect to their brightness values in the different spectral channels of the data.

The objective is to match the spectral classes in the data to the information classes of interest. Rarely is there a simple one-to-one match between these two types of classes.
9.4. Principle of Image Classification(1/10)

 There are three views of Data namely:


 Image Space
 Spectral Space
 Feature Space.
9.4. Principle of Image Classification(2/10)

Image Space
 The concept here is to display the data samples in relation to one another
in a geometric sense thus providing a picture of the ground scene to the
viewer.
 The key use of imagery in multispectral processing is to help the analyst associate multispectral data points (pixels) with specific locations (points) in the ground scene.
 In the analysis process it is useful in labelling pixels in the data as training
samples.
 A digital image is a 2-D array of elements. In each element, the energy
reflected or emitted from the corresponding area on the earth‘s surface is
stored.
 The spatial arrangement of the measurements defines the image or image
space.
 Depending on the sensor, data are recorded in n-bands and digital image
elements are usually stored as 8-bit DN-values (range: 0-255).
9.4. Principle of Image Classification(3/10)

Structure of a multiband image


9.4. Principle of Image Classification(4/10)

Spectral Space
 The idea here is that if response versus wavelength conveys enough information to identify the contents of an individual pixel, this provides a fundamental simplicity that is important from a processing point of view.
 Each pixel could then be labelled individually, giving a much better result than a label associated with a neighbourhood of pixels.
 Response as a function of wavelength has useful characteristics, since it
provides the analyst with spectral information that is directly interpretable.
 Practically, this is not always feasible due to the fact that the earth
surface cover types tend to vary in a characteristic way,
– E.g. spectral response of a maize field is not uniform for all pixels in the field
but varies in a characteristic way about some mean value.
9.4. Principle of Image Classification(5/10)

Spectral Response Curve


9.4. Principle of Image Classification(6/10)

Feature Space
 For one pixel, the values in e.g. two bands can be regarded as components of a 2-D vector, the feature vector. E.g. a feature vector of (13,55) implies that DN values of 13 and 55 are stored for band 1 and band 2 respectively.
 Similarly, this can be extended to a 3-band situation.
 A graph showing the values of the feature vectors is called feature space or
feature space plot.
 NB. Plotting for a four or more dimensional cases is difficult. A practical
solution is to plot all possible combinations of two bands separately, e.g. for
4 bands six combinations namely 1&2; 1&3; 1&4; 2&3; 2&4; 3&4.
 Plotting the combinations of the values of all the pixels of one image yields
a large cluster of points referred to as a scatter plot.
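The enumeration of two-band plots described above can be sketched in a few lines of Python (an illustrative aside, not part of the original lecture material):

```python
from itertools import combinations

# All distinct two-band feature-space plots for a 4-band image
bands = [1, 2, 3, 4]
pairs = list(combinations(bands, 2))
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

For n bands this yields n(n-1)/2 combinations, i.e. six for four bands as stated above.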
9.4. Principle of Image Classification(7/10)

Feature Space Concept


9.4. Principle of Image Classification(8/10)

Distances and clusters in Feature Space

 Distances in feature space are expressed as Euclidean distances, and the units are DN since this is the unit of the axes.
 In a two-dimensional feature space, the distance can be calculated according to the Pythagorean theorem.
 E.g. the distance between (10,10) and (40,30) equals the square root of (40-10)² + (30-10)².
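The distance calculation can be sketched directly in Python; the worked example from the text is reproduced below:

```python
import math

# Euclidean distance between two feature vectors (DN values per band)
def euclidean_distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d = euclidean_distance((10, 10), (40, 30))
print(round(d, 2))  # sqrt(30^2 + 20^2) = sqrt(1300) ≈ 36.06
```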
9.4. Principle of Image Classification(9/10)

Image Classification Principle


 The scatterplot gives information about the distribution of pixel values in two bands of
an image.
 Note each class occupies a limited area in the feature space.
 Each cluster of feature vectors (class) occupies its own area in the feature space.
 This figure shows the basic assumption for image classification where a specific part
of the feature space corresponds to a specific class.
 Once the classes have been defined in the feature space, each image pixel can be
compared to these classes and assigned to a specific class.
 Note:
– classes need to have different spectral characteristics in order to be distinguished in image
classification.
– If classes do not have distinct clusters in the feature space, image classification can only give
results to a certain level of reliability.
 The principle of image classification is that a pixel is assigned to a class based on its
feature vector by comparing it to predefined clusters in the feature space. When this is
done for all image pixels in the image, the result is a classified image.
9.4. Principle of Image Classification(10/10)

Issues in image classification

 There are two main issues in image classification:-


– Definition of clusters: this is an interactive process and is done
during training process.
– Method for comparison: comparison of individual pixels with
the clusters is done using classifier algorithms.
9.5. Image Classification Process (1/5)

 Involves five steps:-


– Selection and preparation of image data
– Definition of clusters
– Selection of classification algorithm
– Running the actual classification
– Validation of the results.
9.5. Image Classification Process (2/5)

Selection and preparation of image data

 The objective of using image data should be well


understood, i.e. what objects or features are to be
extracted. This will further dictate:-
– The spatial resolution of the data
– The spectral resolution of the data
– The temporal resolution of the data
9.5. Image Classification Process (3/5)

Definition of Clusters(Training)

 Here two approaches are possible:-


– Supervised classification where the operator defines
the clusters during the training process.
– Unsupervised classification involves a clustering
algorithm automatically finding and defining a
number of clusters in the feature space.
9.5. Image Classification Process (4/5)

Selection of a classification algorithm

 The operator needs to decide how the pixels based on their DN-Values
are assigned to the classes. The choice will depend on a number of
criteria namely:-
– Purpose of the classification
– Characteristics of the image data and training data

 Classifier algorithms can be broadly classified into two groups:-
– Statistical approaches
 Box classifier
 Minimum Distance to Mean classifier
 Maximum likelihood classifier
– Knowledge based approaches
9.5. Image Classification Process (5/5)

Validation of the result

 Once classification has been done, the quality of the classified


image is assessed by comparing it to reference data (ground
truth).
 This involves:-
– Selection of a sampling technique
– Computation of accuracy measures.
– Interpretation of the accuracy measures.
9.6. Statistical Approaches (1/17)

 Sampling training areas for classification


9.6. Statistical Approaches (2/17)

Statistics for Classification: Variance and Covariance


9.6. Statistical Approaches (3/17)

Statistics for Classification: Definition of Distance


9.6. Statistical Approaches (4/17)

About Clustering Algorithms
 The operator defines the maximum number of clusters in a data set.
 On the basis of this, arbitrary mean vectors are located by the computer as the centre points of the clusters.
 Each pixel is then assigned to a cluster by the minimum-distance-to-cluster-centroid decision rule.
 Once all the pixels have been labelled, the cluster centres are recalculated and the process is repeated until the proper centres are found and the pixels labelled accordingly.
 Iteration stops when the cluster centres no longer change. At any iteration, clusters with fewer than a specified number of pixels are eliminated.
 After clustering, the closeness or separability of the clusters is analysed on the basis of intercluster distance or a divergence measure. Clusters are merged to reduce the number of unnecessary subdivisions in the data set. The operator can use:-
– Maximum number of clusters
– Distance between two cluster centres
– Radius of a cluster
– Minimum number of pixels
 as thresholds for cluster elimination.
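The assign-and-recalculate iteration described above can be sketched as a minimal k-means-style loop (an illustrative simplification: real ISODATA-type algorithms also split, merge and eliminate clusters as discussed):

```python
import numpy as np

def cluster(pixels, k, max_iter=100, seed=0):
    """Minimal k-means-style clustering of feature vectors (pixels: n x bands)."""
    rng = np.random.default_rng(seed)
    # arbitrary initial cluster centres chosen from the data
    centres = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(max_iter):
        # assign each pixel to the nearest cluster centre
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recalculate the cluster centres
        new = np.array([pixels[labels == c].mean(axis=0) for c in range(k)])
        if np.allclose(new, centres):  # stop when centres no longer change
            break
        centres = new
    return centres, labels

pixels = np.array([[10, 10], [11, 10], [10, 11],
                   [50, 50], [51, 50], [50, 51]], float)
centres, labels = cluster(pixels, 2)
print(len(centres))  # 2
```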
9.6. Statistical Approaches (5/17)

Hierarchical Clustering Algorithm
9.6. Statistical Approaches (6/17)

Non-Hierarchical Clustering Algorithm (ISODATA)


9.6. Statistical Approaches (7/17)

Example of a Clustered image


9.6. Statistical Approaches (8/17)

About Box Classifier


 Simple classification method
 Upper and lower limits for each class are defined, where limits may be based on:-
– Minimum and maximum values
– Mean
– Standard deviation per class.
 Defining a lower and an upper limit results in a box-like area in the feature space, hence the name box classifier.
 The number of boxes depends on the number of classes.
 Also known as parallelepiped classification since the opposite sides are parallel.
 During classification an unknown pixel is checked to see if it falls in any of the boxes and is labelled with the class of the box in which it falls. Pixels not falling inside any of the boxes are assigned to an unknown or reject class.
 A disadvantage of the box classifier is that where classes overlap, a pixel is arbitrarily assigned the label of the first box it is found in.
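The box classifier can be sketched as follows; the class names and DN limits are purely illustrative assumptions (in practice they would come from training statistics, e.g. mean ± n standard deviations):

```python
import numpy as np

def box_classify(pixel, boxes):
    """Assign a pixel to the first box (per-band lower/upper DN limits) that
    contains it; pixels outside every box go to the reject class."""
    for name, (lower, upper) in boxes.items():
        # note: with overlapping boxes the first match wins (arbitrarily),
        # which is exactly the disadvantage noted above
        if np.all(pixel >= lower) and np.all(pixel <= upper):
            return name
    return "unknown"

# illustrative two-band boxes
boxes = {
    "water":  (np.array([5, 5]),   np.array([20, 25])),
    "forest": (np.array([30, 60]), np.array([60, 110])),
}
print(box_classify(np.array([12, 18]), boxes))    # water
print(box_classify(np.array([200, 200]), boxes))  # unknown
```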
9.6. Statistical Approaches (9/17)

Box Classifier
9.6. Statistical Approaches (10/17)

About Minimum Distance to Mean Classifier


 The basis for the MDM classifier is the cluster centres.
 The Euclidean distances from an unknown pixel to the various cluster centres are calculated.
 The unknown pixel is then assigned to the class whose centre is nearest.
 The disadvantages of the MDM classifier include:
– Pixels at large distances from a cluster centre may still be assigned to that class. This necessitates definition of a threshold that limits the search distance.
– It does not take class variability into account, i.e. some clusters are small and dense while others are large and dispersed.
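A minimal sketch of the MDM rule, including the optional distance threshold that addresses the first disadvantage (class names and centres are illustrative assumptions):

```python
import numpy as np

def mdm_classify(pixel, centres, threshold=None):
    """Minimum Distance to Mean: assign the pixel to the class of the nearest
    cluster centre (Euclidean distance in DN). An optional threshold rejects
    pixels that lie far from every centre."""
    names = list(centres)
    dists = [np.linalg.norm(np.asarray(pixel) - centres[n]) for n in names]
    i = int(np.argmin(dists))
    if threshold is not None and dists[i] > threshold:
        return "unknown"
    return names[i]

centres = {"water": np.array([10.0, 12.0]), "forest": np.array([45.0, 80.0])}
print(mdm_classify([14, 16], centres))                  # water
print(mdm_classify([120, 200], centres, threshold=50))  # unknown
```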
9.6. Statistical Approaches (11/17)

Minimum Distance to Mean Classifier
9.6. Statistical Approaches (12/17)

Example of Classification by MDC


9.6. Statistical Approaches (13/17)
About Maximum Likelihood Classifier

 Considers not only the cluster centre but also its shape, size and orientation.
 This is achieved by computing a statistical distance based on the mean values and covariance matrix of the clusters.
 The statistical distance yields a probability value that tells us how likely it is that an observation x belongs to a specific cluster.
 A pixel is assigned to the cluster for which it has the highest probability.
 The assumption of maximum likelihood is that the statistics of the clusters have a normal (Gaussian) distribution.
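A sketch of the decision rule under the Gaussian assumption: each class is summarised by a mean vector and covariance matrix estimated from training samples (the class statistics below are invented for illustration). Comparing log-likelihoods is equivalent to comparing probabilities, and the constant term of the Gaussian density is the same for every class, so it is omitted:

```python
import numpy as np

def ml_classify(pixel, stats):
    """Maximum likelihood: stats maps class -> (mean vector, covariance matrix);
    the pixel gets the class with the highest Gaussian log-likelihood."""
    best, best_ll = None, -np.inf
    for name, (mean, cov) in stats.items():
        diff = np.asarray(pixel, float) - mean
        # log-density up to a constant: -0.5*(ln|C| + (x-m)' C^-1 (x-m))
        ll = -0.5 * (np.log(np.linalg.det(cov))
                     + diff @ np.linalg.inv(cov) @ diff)
        if ll > best_ll:
            best, best_ll = name, ll
    return best

stats = {
    "grass": (np.array([30.0, 60.0]), np.array([[4.0, 0.0], [0.0, 4.0]])),
    "soil":  (np.array([50.0, 40.0]), np.array([[25.0, 0.0], [0.0, 25.0]])),
}
print(ml_classify([33, 58], stats))  # grass
```

Note how the covariance matrix makes cluster size matter: a pixel equidistant from two means is assigned to the larger, more dispersed cluster, which the MDM classifier cannot do.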
9.6. Statistical Approaches (14/17)

Maximum Likelihood Classification


9.6. Statistical Approaches (15/17)

The Likelihood Function


9.6. Statistical Approaches (16/17)

Example of Maximum Likelihood Classification


9.6. Statistical Approaches (17/17)

Comparison of the Classifier algorithms


9.7. Knowledge Based Approaches
 Statistical approaches have proved inadequate for high resolution satellite images, particularly for urban applications, because the heterogeneous nature and variable object sizes cause spectral signatures to overlap.
 This has led to the development of knowledge based approaches.
 Such systems in principle incorporate knowledge about the objects of concern within the image data, represented within a knowledge base.
 In addition, map or GIS data (albeit with limited accuracy and currency) is in most cases available, giving an indication of the location of specific objects and serving as an initial symbolic scene description, thus forming a hypothesis about the objects suspected within the image to be interpreted.
 This hypothesis is tested against current image data and returns features, based on interpretation rules and a reliability measure. Within the interpretation, these features and their relationships are grouped and the hypothesis having the highest confidence value is selected, where confidence values are preset.
 Whereas conventional satellite image analysis techniques are restricted to pixel based classification, the new approaches employ a model-driven top-down approach together with a data-driven bottom-up process of image analysis.
Knowledge Domain
 Imaging Source
– This refers to basically all the information pertaining to the imaging source, which include spectral,
spatial, and temporal resolutions. The choice of sensor data to use is application dependent. An
understanding of the imaging source characteristics and the objects in the scene enhances the
interpretability of the objects.
 Application area
– Background knowledge in a certain discipline enhances the ability to accurately interpret features. This is
due to the fact that different earth features have different spectral, spatial and structural characteristics.
Moreover, the perception of an image scene is problem dependent, e.g. a forester would be interested in
tree cover and species as opposed to a geologist whose interest is basically in the rocks.
 Geographic Location
– Most of the problems under investigation are unique to the region e.g. mushrooming of unplanned
developments is a situation prevalent in the developing countries as opposed to the developed ones.
Consequently knowledge about the geographic area and its corresponding cultural and physical
conditions are essential for a better understanding and interpretation of the image features.
 Object knowledge
– Objects have different attributes e.g. shape, color, size, texture, height, site, orientation, function, etc.
Image interpreters employ these characteristics to identify and label different objects within a scene.
However, depending on the application some attributes contribute more evidence in the discrimination
process than others.
Knowledge Based Approaches
 Rule Based Systems
– These are also sometimes called production systems. They consist of a knowledge base which comprises
data (objects, facts, goals) and rules (condition, action) and an inference engine whose role is to assemble
rule instantiations in a conflict set from where one or more rules are selected based on some criteria e.g.
recency or specificity. The relationships are represented using the condition-action pairs, thus
– IF (condition)
– THEN (action)
– Where the condition is also referred to as premise, antecedent or Left Hand Side (LHS) and the action,
conclusion, consequent or Right Hand Side (RHS).
 Semantic networks
– These are graphical knowledge representations of nodes which represent objects, concepts or events and
are linked to each other through arcs which represent relationships between objects. IS-A and IS-PART-OF
are the mostly applied links. Like frames, semantic networks allow inheritance of properties from other
objects.
 Frames
– These are structural models used to represent groups of attributes that describe a given object, whereby
each attribute is stored in a slot which may contain default values, rules or procedures for changing the
values attached to the attributes. Information contained in a frame can be either procedural or declarative,
or both. A frame is defined by fixing the number of and type of slots.
 Hybrids
– They incorporate the attractive elements e.g. the representation paradigms into a single integrated
programming environment.
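The IF (condition) THEN (action) structure of a rule-based system can be sketched as a tiny production system; the spectral rules, band names and thresholds below are invented purely for illustration (a real system would derive them from the knowledge base):

```python
# A minimal rule-based sketch: each rule is a (condition, action) pair;
# the inference here is trivially "first rule whose condition fires wins".
def ndvi(pixel):
    nir, red = pixel["nir"], pixel["red"]
    return (nir - red) / (nir + red)

rules = [
    (lambda p: ndvi(p) > 0.4, "vegetation"),  # IF high NDVI THEN vegetation
    (lambda p: p["nir"] < 30, "water"),       # IF low NIR THEN water
    (lambda p: True,          "other"),       # default rule
]

def classify(pixel):
    for condition, action in rules:
        if condition(pixel):  # IF (condition / LHS)
            return action     # THEN (action / RHS)

print(classify({"nir": 120, "red": 40}))  # vegetation
```

Real production systems differ mainly in the inference engine, which resolves conflicts between multiple firing rules using criteria such as recency or specificity, as noted above.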
9.8. Validation of the Results
 Sources of errors:
– Data Acquisition Errors:
These could be attributed to sensor performance, platform stability, and viewing conditions. They can be reduced or compensated for by carrying out systematic corrections, e.g. by calibrating detector response with on-board light sources generating known radiances. Further corrections can be made using ancillary data, such as known atmospheric conditions, during the initial processing of the raw data.
– Data Modeling Errors:
The information extraction algorithms are normally not 100% efficient. They range from
simple statistical algorithms to expert systems employing cognitive knowledge in the
interpretation process. The ability to model objects within the scene determines how well they
are discriminated from each other. The limitations involved in modeling of real world objects
leads to some errors.
– Scene-dependent Errors:
These are application dependent and refer mainly to the location and classification accuracy.
The former refers to how well the mapped objects correspond to the real objects on the earth
whereas classification accuracy deals with how well an object has been identified on the
image.
Reference Data
 Reference data is normally generated from in situ investigation involving field measurements, or quite often from the interpretation of remotely sensed data at a higher scale or resolution, e.g. aerial images.
 Technical specifications of reference data.
– Temporal resolution necessitates that the reference data be acquired as close to the time of image acquisition as possible, due to inevitable landscape changes. This is especially critical for dynamic applications, for instance where the focus is on urban growth, to ensure that any changes extracted from the source information can be appropriately related to the reference data.
– The spatial resolution implies that the level of detail should allow for a reasonable comparison. In general, the data employed as the
reference are largely abstractions, often thematic, recording one or more surface types or themes while overlooking others,
depending on the purpose and the details being emphasized (level of generalization). On the other hand, the sensor takes a synoptic
view of all surface features with limitations of detector resolution and object characteristics. Consequently, when quantifying accuracy,
the lack of equivalence and totality should be taken into consideration
– Co-registration is one of the minimal requirements in accuracy assessment. This is to enable comparisons to be made either by
overlaying the extracted information on the reference data or by carrying out automatic statistical comparisons. Errors that would
arise from localization would be prevented or minimized if the two data sets are well co-registered.
– Sampling scheme and the number of reference points is an important consideration as this determines whether or not the evaluation is rigorous. Sampling schemes range from simple random to directed approaches, e.g. stratified and equalized random sampling. It has been reported that simple random sampling has the disadvantage of under-sampling, whereas the latter schemes tend to over-sample. Several researchers have different recommendations pertaining to the sample size. As a rule of thumb, a minimum of 50 samples per land use category has been suggested, but this should be increased for large areas, in which case the minimum number of samples should be 75 to 100 [Lillesand and Kiefer, 2000].
– The sample unit varies across different applications and this could be individual pixels, group of pixels (clusters) or polygons. It does
also depend on the reference data available, for instance where the reference data is vector then the sample unit should also be
polygons, whereas where high resolution remotely sensed data is used e.g. aerial photographs then the choice of whether single
pixels or clusters of pixels are used will depend on the level of processing or segmentation.
Accuracy Indices
 Different accuracy indices for purposes of evaluating extracted information have been reported in
literature, and they include producers accuracy, users accuracy, overall accuracy, average accuracy,
combined accuracy, and kappa coefficient of agreement . Within the framework of object extraction and
identification, they are used to establish in a quantitative way the correctness of object localization and
labeling.
– Producers accuracy refers to the number of correctly identified and labeled samples for a given class with respect
to the total number of reference samples for that category. Thus it gives an indication of the error of omission.
– On the other hand, the users accuracy is a measure of the error of commission and it is basically the number of
correctly identified and labeled samples for a given class with respect to the number of samples interpreted as
belonging to that class.
– The overall accuracy is a measure of the total correctly interpreted samples with respect to the entire number of
samples. Thus it gives an impression of how well the entire image has been interpreted.
– Average accuracy is the average of the individual categories in which case this could be either the producers or
users accuracy.
– The combined accuracy is the average of the overall and average accuracy. It minimizes the biases of the two indices: the overall accuracy tends to be biased towards classes with a large number of samples, whereas the reverse is true for the average accuracy.
– The kappa index is a measure of the agreement between the interpreted image and the reference data:

k = (observed accuracy − chance agreement) / (1 − chance agreement)

– where:-
 Observed accuracy: a measure of the agreement between the reference data and the automatic classifier,
 Chance agreement: a measure of the agreement between the reference data and a random classifier.
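The kappa formula above can be computed directly from an error matrix; a sketch in Python, where chance agreement is estimated from the row and column marginals (the 2x2 matrix used here is a made-up illustration):

```python
import numpy as np

def kappa(matrix):
    """Kappa coefficient from an error matrix
    (rows: classification result, columns: reference data)."""
    m = np.asarray(matrix, float)
    n = m.sum()
    observed = np.trace(m) / n                               # observed accuracy
    chance = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (observed - chance) / (1 - chance)

print(round(kappa([[30, 10], [10, 50]]), 3))  # 0.583
```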
Numerical Example

 Interpretation of the table:


– A total of 163 samples were collected.
– ABCD are the reference classes whereas abcd are the classes in the classification result.
READ:
– 53 samples of A were found in the real world (reference), whereas the classification result yields 61 cases of a, of which 35 cases agree.
 Overall accuracy is commonly cited as a measure of mapping accuracy (also known as Proportion Correctly Classified, PCC) and is the proportion of correctly classified pixels, i.e. the sum of the diagonal cells in the error matrix divided by the total number of pixels checked: (35 + 11 + 38 + 2)/163 = 53%. This is one figure for the result as a whole.
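The overall-accuracy arithmetic from the example can be checked in two lines (only the diagonal counts and total quoted above are used):

```python
# Overall accuracy (PCC) from the diagonal counts of the example error matrix
diagonal = [35, 11, 38, 2]
total = 163
overall = sum(diagonal) / total
print(f"{overall:.0%}")  # 53%
```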
Numerical Example Cont.
 Other measures are derived per class, e.g.:
 Error of omission refers to those sample points that are omitted in the interpretation result.
 For class A, 18 of the 53 reference samples were interpreted as b, c, or d. This results in an error of omission of 18/53 = 34%. Omission has to do with the reference data and hence relates to the columns in the error matrix.
 Error of commission has to do with the interpretation results and thus relates to the rows in the error matrix. It refers to incorrectly classified samples.
 E.g. for class d, only 2 of the 21 samples are correctly labelled, i.e. about 10% correct, giving an error of commission of about 90%.
 Errors of commission and omission are also referred to as Type I and Type II errors respectively.
 User accuracy is the corollary of commission error: the probability that a sample labelled as a certain class on the map actually belongs to that class.
 Producer accuracy is the corollary of omission error: the probability that a reference sample of a certain class has also been labelled as that class.
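The per-class measures follow directly from the row and column totals of the error matrix; a sketch, using a hypothetical 2-class matrix rather than the lecture's 4-class table (whose off-diagonal cells are not reproduced in the text):

```python
import numpy as np

def class_accuracies(matrix):
    """Per-class accuracies from an error matrix with rows = classification
    result and columns = reference data.
    User accuracy = 1 - commission error; producer accuracy = 1 - omission error."""
    m = np.asarray(matrix, float)
    user = np.diag(m) / m.sum(axis=1)      # correct / row total
    producer = np.diag(m) / m.sum(axis=0)  # correct / column total
    return user, producer

# hypothetical matrix for illustration
user, producer = class_accuracies([[35, 5], [15, 45]])
```

For this matrix the user accuracies are 0.875 and 0.75 (commission errors 12.5% and 25%), and the producer accuracies are 0.7 and 0.9 (omission errors 30% and 10%).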
9.9. Challenges of Image
Classification
 Image classification is a powerful technique for deriving thematic classes
from multi-band image data.
 Limitations of pixel based image classification are that:-
– It yields spectral classes rather than information classes
– Each pixel is assigned to one class only (the mixed pixel, or mixel, problem).
Spectral Classes
 These are classes linked directly to the spectral bands used in the classification, which are in turn linked to surface characteristics.
 During classification, a spectral class may be represented by several training classes:
– This could be due to variability within a spectral class. E.g. consider a class such as grass: there could be different types of grass, having different spectral characteristics. Furthermore, the same type of grass may have different spectral characteristics when considered over large areas, due for example to different soil and climatic conditions.
– A land use class may be comprised of several land cover classes, resulting in 1-to-1, 1-to-n, and n-to-1 relationships.
 The 1-to-n relationships are a serious problem and can only be solved by adding data/knowledge to the classification procedure.
 Added data could be:-
– Other RS data (e.g. other bands, moments, etc)
– Existing spatial data e.g. topographic maps, historical land inventories, road maps, etc.
 One example is to use historical land cover data and define the probability of certain land cover changes.
 Another example is to use elevation, slope and aspect information, where elevation differences play an important role in variations in surface cover types.
Mixed Pixels
 Each pixel is ideally supposed to be assigned to only one class.
 When dealing with relatively small pixels this is not a problem; however, large pixels mean that more land cover classes will occur within one pixel.
 The spectral value of the pixel is then an average of the reflectance of the land cover present within the pixel. This phenomenon is referred to as a mixed pixel or mixel.
 This calls for a different approach, e.g.:
– Fuzzy classification, which involves assigning the pixel to more than one thematic class
– Sub-pixel classifier algorithms (spectral mixture analysis)
 A more practical approach is to always select data with the appropriate spatial resolution.
Fuzzy Classification
 This involves the fuzzy set concept.
 Here a given pixel is said to have partial membership in more than
one category.
 Instead of having hard boundaries between classes in the spectral measurement space, fuzzy regions are established and membership grades are assigned to each pixel, describing how close the pixel measurement is to the means of all classes.
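One simple way to realise such membership grades is to make them inversely proportional to the distance from each class mean and normalise them to sum to 1. This is an illustrative scheme of our own, not a specific published fuzzy classifier:

```python
import numpy as np

def fuzzy_memberships(pixel, means):
    """Membership grade per class: inverse distance to each class mean,
    normalised so that the grades sum to 1 (illustrative scheme only)."""
    d = np.array([np.linalg.norm(np.asarray(pixel, float) - m) for m in means])
    inv = 1.0 / np.maximum(d, 1e-9)  # avoid division by zero at a class mean
    return inv / inv.sum()

grades = fuzzy_memberships([20, 20], [np.array([10.0, 10.0]),
                                      np.array([50.0, 50.0])])
```

Here the pixel is three times closer to the first mean than to the second, so the grades come out as 0.75 and 0.25: partial membership in both classes rather than a hard label.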
Sub-pixel Classifier Algorithm
 Most applications of sub-pixel classifiers make use of linear mixture models, whereby the observed spectral response from an area on the ground is assumed to be a linear mixture of the individual spectral signatures of the various land cover types present, which are referred to as endmembers. Mathematically:

F1 + F2 + ... + FN = Σ(i=1..N) Fi = 1

 where F1, F2, ..., FN represent the fractions of each of the N possible endmembers, and their sum must equal 1.
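Given the endmember signatures, the fractions can be recovered by least squares with the sum-to-one constraint appended as an extra equation. A sketch with invented two-band signatures (a full implementation would also constrain the fractions to be non-negative):

```python
import numpy as np

def unmix(pixel, endmembers):
    """Linear mixture model: solve pixel ≈ E @ F for the endmember fractions F,
    with the sum-to-one constraint added as an extra least-squares equation."""
    E = np.asarray(endmembers, float)        # bands x N endmember signatures
    A = np.vstack([E, np.ones(E.shape[1])])  # append the sum-to-one row
    b = np.append(np.asarray(pixel, float), 1.0)
    F, *_ = np.linalg.lstsq(A, b, rcond=None)
    return F

# Two bands, two endmembers (one per column); a 30/70 mixture is recovered
E = np.array([[10.0, 50.0],
              [20.0, 40.0]])
print(unmix([38.0, 34.0], E))  # ≈ [0.3, 0.7]
```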
9.10. Summary

 In this lecture we have discussed in detail the principle and process of digital image classification. The emphasis has been on the statistical approaches, although the concept behind knowledge based approaches has also been discussed. The procedure for validation of the classified results has also been discussed in terms of computation of accuracy indices and their interpretation.
9.12. Further Reading

 Fundamentals of Remote Sensing
http://www.ccrs.nrcan.gc.ca/ccrs/eduref/tutorial/tutore.html
 Digital Images and Processing Techniques
http://www.ccrs.nrcan.gc.ca/ccrs/eduref/exercise/digexece.html
