Image Classification: Unsupervised
Image Classification
Classification of images provides useful information about land cover, based on the
spectral-radiometric response of the cover type. This is done by one of two basic methods:
unsupervised or supervised classification. Unsupervised classification categorizes
continuous raster data into discrete thematic groups having similar spectral-radiometric
values. Supervised classification allows the analyst to define classes of interest. The
computer then calculates training statistics based on the definitions found in signature files
and assigns each pixel of the image to the class that it most closely resembles.
For unsupervised training and classification, ERDAS Imagine employs the ISODATA
clustering technique, which uses the statistics of the data to evaluate the similarities or
differences of the pixel values and then groups the pixels into separate classes. This process
takes several passes, or iterations, until it reaches a convergence threshold. The groups are
then defined by a signature file, which can be used to create a new raster layer of discrete
class values.
In this lab, we will classify two images with unsupervised classification techniques. The
Hajdúböszörmény, Hungary image consists of very large, distinct fields of agricultural
crops and an isolated city (recall that we used this image in the Image Enhancement Lab). The
U of M St. Paul campus is more typical of an urban area but does include crops along with
residential and commercial areas. The campus imagery is high-resolution NAIP, and it is
accompanied by LiDAR-derived data, which will be incorporated into the classification process.
Note: Since we are producing many files over the course of this lab, it is very important to
give the files descriptive names so you can locate the correct one quickly and easily. Also,
make sure you are saving all of the files that you produce to your own disk space. Finally,
remember that you can keep as many 2D Views open as you want, which will help you
evaluate any differences between classification methods.
Unsupervised Classification
Imagine uses the ISODATA (Iterative Self-Organizing Data Analysis Technique) algorithm
to perform an unsupervised classification, or clustering, of the image pixels into spectral
clusters. Each cluster represents a group of pixels which have similar spectral
characteristics in the input bands. The ISODATA clustering method starts by arbitrarily
establishing N cluster means based on the means and standard deviations of the bands in
the input file (N is a number specified by the user; it is essentially the number of
classes you are trying to identify). A minimum distance criterion is then used to assign each
pixel to the "nearest" cluster. The cluster means are then recalculated, and each individual
pixel is again compared to the new cluster means and assigned to the nearest cluster. This
process is repeated a specified number of times. For a complete description of the ISODATA
decision rules, refer to the ERDAS Field Guide.
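The iteration described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the ERDAS implementation: the function names are hypothetical, the starting means are simply spread from one standard deviation below to one standard deviation above the band means, and the pixels are assumed to be a floating-point array with one row per pixel and one column per band.

```python
import numpy as np

def initial_means(pixels, n_clusters):
    """Spread the starting means from (mean - std) to (mean + std) in every band.
    Assumption: this spacing is for illustration, not the exact ERDAS rule."""
    mu = pixels.mean(axis=0)
    sigma = pixels.std(axis=0)
    steps = np.linspace(-1.0, 1.0, n_clusters)[:, None]  # shape (n_clusters, 1)
    return mu + steps * sigma

def assign_to_nearest(pixels, means):
    """Minimum-distance rule: label each pixel with the index of the closest mean."""
    # Distance array has shape (n_pixels, n_clusters); fine for small examples.
    dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update_means(pixels, labels, means):
    """Recompute each cluster mean; an empty cluster keeps its previous mean."""
    new_means = means.copy()
    for k in range(len(means)):
        members = pixels[labels == k]
        if len(members) > 0:
            new_means[k] = members.mean(axis=0)
    return new_means

def cluster(pixels, n_clusters, n_iterations):
    """Repeat the assign/recalculate cycle a fixed number of times."""
    means = initial_means(pixels, n_clusters)
    for _ in range(n_iterations):
        labels = assign_to_nearest(pixels, means)
        means = update_means(pixels, labels, means)
    return assign_to_nearest(pixels, means), means
```

Calling cluster(pixels, 20, 25) on a pixel table would return one cluster number per pixel plus the final cluster means, which is the same kind of output the lab's cluster raster holds.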
We will generate several classifications and compare the results:
Step 1:
Hungary Classification, Set Up: First we will load the Hungary image into the viewer.
In the Raster ribbon tab, in the Classification group, click on Unsupervised and then on
Unsupervised Classification in the dropdown menu. The Unsupervised Classification
(Isodata) dialog box will open:
We’ll use the K Means method, which has simpler inputs than the ISODATA method
(see the notes at the end of this lesson for an extract from the version 2013 release notes that
discusses K Means and ISODATA). Set the number of classes to 20 for your first run,
but note that a larger number might be beneficial on a subsequent run. Maximum
Iterations is the maximum number of times that the ISODATA program should re-cluster the
data. This parameter prevents the program from running too long, or from potentially getting
"stuck" in a cycle without reaching the convergence threshold. A setting of 25 is appropriate
for the initial run. Note: if ISODATA fails to meet the convergence threshold within 25
iterations, changing your input bands (i.e., using a different subset of
the original TM bands) before increasing the number of iterations can be beneficial
(not necessary in this case; just use the subset of four layers given for the Hungary
image). In any case, the number of iterations should not exceed 50.
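The interplay between Maximum Iterations and the convergence threshold can be sketched as a stopping rule. The sketch below assumes that the convergence threshold is read as the fraction of pixels whose cluster assignment did not change between two consecutive iterations; that reading, the function names, and the 0.95 default are assumptions made for illustration (the ERDAS Field Guide gives the exact definition).

```python
import numpy as np

def fraction_unchanged(previous_labels, current_labels):
    """Share of pixels that kept the same cluster number between two iterations."""
    return float(np.mean(previous_labels == current_labels))

def should_stop(previous_labels, current_labels, iteration,
                max_iterations=25, convergence_threshold=0.95):
    """Stop when the iteration cap is hit or enough pixels have stopped moving.
    The default values are hypothetical examples, not required settings."""
    if iteration >= max_iterations:
        return True  # safety cap: prevents an endless cycle
    return fraction_unchanged(previous_labels, current_labels) >= convergence_threshold
```

With this rule, a run that keeps shuffling pixels between clusters still ends at the iteration cap, which is why the cap matters even when the threshold is never reached.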
The completed dialog should look something like this:
Examine your entries but do not click OK yet…
Click on the Initializing Options… button and select Principal Axis under
"Initialize Means Along:". This option specifies that the initial class clusters begin along
a line that defines the correlation between bands. The alternative is to start along a
45-degree line. Starting along the "Principal Axis" lets the initial clusters follow the
actual spread of the data rather than a fixed diagonal. Click Close on the File Statistics dialog box.
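One way to picture the Principal Axis option is to place the starting means along the first principal component of the band values. The sketch below is an assumption about how such an initialization could work, not the documented ERDAS procedure; the function name and the n_std parameter are hypothetical, and the input is assumed to be a floating-point pixel table with at least two bands.

```python
import numpy as np

def means_along_principal_axis(pixels, n_clusters, n_std=1.0):
    """Place starting means on the line of greatest variance through the data.

    pixels: array of shape (n_pixels, n_bands) with n_bands >= 2.
    n_std:  how many standard deviations along the axis the means should span
            (hypothetical parameter for this sketch).
    """
    mu = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)        # band-by-band covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    axis = eigvecs[:, -1]                     # direction of the largest variance
    spread = np.sqrt(eigvals[-1])             # standard deviation along that axis
    steps = np.linspace(-n_std, n_std, n_clusters)[:, None]
    return mu + steps * spread * axis
```

When two bands are negatively correlated, the principal axis slopes downward, so the starting means follow the data, whereas a 45-degree line would cut across it.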
For all other parameters, accept the default. Click OK to start the classification process. Watch
the progress bar to see how many iterations the program actually required to meet the
convergence threshold. The output file will contain a single data layer, and the value in
each pixel will be a cluster number. If you don’t pay attention to the progress bar and you
later want to know how many iterations were run or where you saved your
output file, you will need to view the log file. To access the log file, click on the File menu
in the main menu bar and select ‘Session’ then ‘Session Log’. Note how many interesting
logs there are of the things you have done so far this session!
Identifying Classes
Each of the classes has been assigned a color similar to its appearance in the original
image. You can override this color by clicking in the color patch for a given class (row)
and choosing a new color. Try changing the opacity setting to 0 for a class and see what
happens to the image.
Another way to assign colors, once the attribute table is open, is to click on the
Colors icon in the Table > Query group to bring up the colors dialog box:
If you set a Start Color and an End Color, ERDAS will create a color ramp for you, but
this may make it hard to differentiate classes. Click Apply and then Close to view
your changes. Either way, re-assigning colors might make the different
"Unsupervised Classes" more apparent, as in the example below:
3. Display cursor:
Click on the layer with the classified image name in the Contents window. Open an
Inquire cursor (Home > Inquire cursor) to bring a cross-hair cursor into this
display window. You can determine the cluster class number that corresponds to
the position of the display cursor by widening the Viewer information window
until the Class_Names column appears. The class name is usually the last column in
the cursor display box. Note that the class name matches the File Pixel value only
because we have not yet changed the entries in the Class_Names column.
You could use either attribute editor, but go ahead and close this raster editor.
We will use the raster Table > Show Attributes version for our work instead.
First, set all clusters to appear in black (Edit > Colors; recall the Colors icon in the Table >
Query group, available when you have the raster attribute table open. Make the start and end
colors both black). Then, select one single cluster to turn white (or another distinctive color).
Inspect 3 to 6 sites which are in this cluster and compare these areas to the same areas on
the TM false color composite (take advantage of your linked Inquire cursor). Use the
reference map from the Enhancement Lab to determine which spectral clusters correspond
to different information classes. Within the "Raster Attribute Editor" window you can
change the class names accordingly.
Note that a given cluster may represent more than one land class. It is up to you to decide if
there is just a "little bit" of confusion or "major" confusion. Major confusion could mean
rerunning the unsupervised classification with a different subset of bands and/or a larger
number of output clusters. An additional category ("unknown" or "mixed") may be added
if only a small percentage of the area is in clusters that are confused with several cover
types. Conversely, an information class may span two or more clusters. An example might
be alfalfa for hay, where some fields have been recently harvested (low amount of
vegetation) and some are nearing harvest time (high amount of vegetation).
Color Selector
Set the color you want the selected row(s) to be. Many ways to select colors exist. The RGB
color selection mode is the default. You can change the combinations of Red, Green, and
Blue to get different colors. For each component (R, G, or B), you can use your cursor to
change its value either by moving the little squares beside each color (left-hold and drag)
or by left-clicking the arrow keys. Alternatively, you can manipulate the "scroll-box" when
using the color wheel to change color or brightness, or you can move the black dot inside
the color wheel to change colors. There are also IHS and Color Name modes which you can
explore. After selecting the color you want, left-click on Apply to apply this color to the
cluster(s) you selected in the raster Table > Show Attributes editor. The pixels
corresponding to the selected cluster(s) will be instantly changed to the chosen color.
Thematic Recode
After completing your class naming and color assignments, save the file by
clicking on the save icon in the top menu. Close the raster table,
remove the cluster image from the 2D View, and reload it into the empty View.
Using the Recode operation, we can merge input rows together and, if desired,
apply new values to the class value of the row. As with the raster attribute
editor, there are two recode operations: one works on the file open
in the 2D window (Thematic > Recode [from the Edit Group]) and the other is a
freestanding routine. The freestanding routine is more stable and works better, so we will
use it: pick Raster > Thematic > Recode [from the Raster GIS Group].
You will be asked for the Input File whose classes you would like to recode and for the
Output File name for the recoded file. Enter those on the screen at the left. Then click on the
Setup Recode… button to specify the recode parameters. The window opens as below. (Resize
the Recode window so you see all the fields.) You can now go down the rows and enter a New
Code value in the appropriate rows to group all of your like rows into a single class. Your
screen will look something like this:
The values in the Red, Green and Blue columns will, of course, be dependent on the colors
you chose for the class in the previous steps. When changing row values, be sure to follow
this procedure:
1. Pick a class to work on.
2. Select the row(s) in this class by left-clicking in the row column (shift-click for
multiple selections).
3. Change the value in the New Value box to the desired new value for this class.
4. Click Change Selected Rows (any row that is selected will change to the New
Value).
[Do not change the value in the New Value column of the table itself. It will look like it
works, but the change doesn’t get saved.] Repeat until all of your classes have been
processed. Make a note of what names were assigned to each ‘New Value’.
When finished, make sure no rows are selected in the recode table (by right-clicking
in the Value column and clicking Select None) and click OK.
Remove all images from the 2D Views. Reload the new, recoded raster file that you
just saved. Notice that it should have only as many classes as you recoded to above.
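Conceptually, a recode is a lookup table applied to every pixel: each old cluster number is replaced by the new class value you assigned to it. The NumPy sketch below illustrates that idea under stated assumptions; it is not how ERDAS implements Recode. The function name and the dictionary format are hypothetical, the raster is assumed to hold non-negative integers, and any cluster left out of the mapping keeps its old value.

```python
import numpy as np

def recode(cluster_raster, new_codes):
    """Replace cluster numbers with merged class values via a lookup table.

    cluster_raster: integer array of cluster numbers (non-negative).
    new_codes:      dict {old_cluster_value: new_class_value}; clusters that are
                    not listed keep their original value (a choice made here).
    """
    max_value = int(cluster_raster.max())
    lookup = np.arange(max_value + 1)      # identity mapping by default
    for old, new in new_codes.items():
        if 0 <= old <= max_value:
            lookup[old] = new
    return lookup[cluster_raster]          # index the table with every pixel
```

For example, recode(raster, {1: 1, 2: 1, 3: 2, 4: 2}) would merge clusters 1 and 2 into class 1 and clusters 3 and 4 into class 2, leaving cluster 0 unchanged.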
The recoded output file no longer has the Class_Names attribute column
of the original file. You will have to re-create it using Table > Column
Properties in the Query group.
The properties window will open. Click New in the Columns area as below:
In the Title box, give the new column a meaningful name, like Class_Names. From the Type
drop down select a type of ‘String’. Then click OK. The new column should be added to your
attribute table. You can type in the appropriate class names and select colors for your
recoded classes using the raster Table > Show Attributes. Be sure to save your changes.
Step 2:
Hungary, cont. Next, using the same steps as above, we will classify the Hungary image a
second time using fewer classes, this time only 8. Be sure to use a different name for the
output cluster file. The Hungary image has 7 types of crops and some unknown
surfaces. We are not going to use fewer than 7 classes because that would force different
crop types into one class (although different crop types may reasonably end up combined
in one class, even with more classes, if they are spectrally similar). Be
sure to specify Initializing Option – Principal Axis again.
Compare the two classifications and see if some crops are misclassified in either case.
Critically think about the following questions: What are the strengths and weaknesses of
each classification result? How could you make improvements? Were the classes identified
by the ISODATA algorithm the same ones you would have identified in classifying the
image manually? What is the effect of varying the number of initial classes?
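One way to make the comparison concrete is to cross-tabulate the two classified rasters, counting how many pixels fall into each pair of classes. The sketch below is an optional illustration outside the ERDAS workflow; the function name is hypothetical, and both rasters are assumed to be integer arrays of the same shape with non-negative class values.

```python
import numpy as np

def cross_tabulate(map_a, map_b):
    """Count pixels for every (class in map_a, class in map_b) combination.

    Rows index the classes of map_a, columns the classes of map_b.
    """
    a = map_a.ravel()
    b = map_b.ravel()
    table = np.zeros((int(a.max()) + 1, int(b.max()) + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)   # unbuffered add, so repeated pairs all count
    return table
```

A row with large counts in several columns suggests that one class from the first run was split into several classes in the second run, which is a useful clue when judging the effect of the number of classes.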
Note that the ISODATA method takes into account only the variation in the DN values of the
input rasters. It does not consider spatial patterns or ground truth information; rather, it
is a statistical method for identifying spectral-radiometric differences in the imagery. We
will explore object-oriented classification in a later lab lesson.
Step 3:
U of M St. Paul Campus. We will now apply similar classification techniques to a NAIP
image of the St. Paul campus. Add the naip_2008_umn.tif image into 2D View #1. Open a
new viewer and load obiaintro_main-ndsm.tif into that viewer.
This raster is derived from a LiDAR point cloud and represents the nDSM (normalized
digital surface model) of the study area. The nDSM is created by subtracting the bare earth
raster, or digital elevation model (DEM), from the digital surface model (DSM), which is
created from only the first-return LiDAR points. We’ll look at this in more detail in the
LiDAR lab. Set up a linked Inquire cursor between the two images. What do the file pixel
values represent in the nDSM?
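The subtraction that produces an nDSM can be sketched with two co-registered elevation grids. This is a minimal illustration, assuming the DSM and DEM are arrays of the same shape and units; the function name is hypothetical, and clipping small negative differences to zero is an extra choice made here, not something stated in this lab.

```python
import numpy as np

def normalized_dsm(dsm, dem):
    """Subtract the bare-earth DEM from the first-return DSM, cell by cell."""
    ndsm = np.asarray(dsm, dtype=np.float64) - np.asarray(dem, dtype=np.float64)
    # Assumption: negative differences are treated as noise and set to zero.
    return np.clip(ndsm, 0.0, None)
```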
One of the most straightforward ways to use the nDSM is to incorporate it into the
image stack just like you would any other image layer. Create the following image stack:
Layer 1: NAIP Blue
Layer 2: NAIP Green
Layer 3: NAIP Red
Layer 4: NAIP NIR
Layer 5: nDSM
Tip: You will use the same image Layer Selection and Stacking tool but you will use the
Layer dropdown selector this time.
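Outside ERDAS, the same five-layer stack can be pictured as a single array in which each pixel carries five values. The sketch below assumes the four NAIP bands and the nDSM have already been read into arrays with identical dimensions; the function names are hypothetical, and no rescaling of the nDSM relative to the optical bands is attempted.

```python
import numpy as np

def build_stack(blue, green, red, nir, ndsm):
    """Stack the layers in the order used in this lab: B, G, R, NIR, nDSM."""
    layers = [np.asarray(layer, dtype=np.float32)
              for layer in (blue, green, red, nir, ndsm)]
    if len({layer.shape for layer in layers}) != 1:
        raise ValueError("all layers must have the same rows and columns")
    return np.stack(layers, axis=-1)          # shape (rows, cols, 5)

def to_pixel_table(stack):
    """Flatten the stack to one row per pixel and one column per layer."""
    return stack.reshape(-1, stack.shape[-1])
```

The pixel table is the form a clustering routine works on: the nDSM simply becomes a fifth column next to the four spectral values.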
Remove any images that may be in your 2D Views and remove all 2D Views except for
one. Load this new 5-layer stack into your viewer with Red=Layer 4, Green=Layer 3, and
Blue=Layer 2. What does it look like? Now set all colors to be Layer 5. Produce a
classification with 10 classes. Run another classification with 20 classes. Compare the
classified results with the original image of the area. Notice there is confusion between
some classes. Did increasing the number of classes resolve it at all?
We used a simple data fusion technique by directly incorporating the nDSM raster into
the “optical” stack. There are other techniques for using LiDAR, such as masks, that are
beyond the scope of this class. For example, a mask raster could be created with a value of
0 or 1 in each cell depending on whether its elevation is above a certain threshold (1) or not (0).
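A mask of this kind is a one-line threshold test. In the sketch below the function name is hypothetical and the threshold is whatever height the analyst chooses, in the same units as the nDSM.

```python
import numpy as np

def height_mask(ndsm, threshold):
    """Return 1 where the cell value is above the threshold and 0 elsewhere."""
    return (np.asarray(ndsm) > threshold).astype(np.uint8)
```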
Lesson 04 Outcomes
1. Understand how unsupervised classifications are implemented and have produced your
first classifications.
3. Understand how different types of landscapes and image resolution may affect image
classification.
Excerpt from the ERDAS Imagine 2013 Release notes about K-Means
Unsupervised Classification
Isodata and K-Means
The term "Iterative Self-Organizing Data Analysis Technique” (Isodata) describes a broad
algorithmic approach to performing unsupervised classification on imagery. It is iterative in that it
repeatedly performs an entire classification (outputting a thematic raster layer) and recalculates
statistics. "Self-Organizing" refers to the way in which it locates the clusters that are inherent in
the data.
In prior releases the form of Isodata implemented in ERDAS IMAGINE covered the subset of the
larger approach which is generally referred to as the K-Means method. This method requires you to
enter a predetermined number of output clusters. Pixels are iteratively classified into this number
of clusters without any deleting, splitting, or merging during the process.
With the 2013 release the Unsupervised capabilities available to IMAGINE Professional users has
been expanded to include a fuller set of Isodata options. In this expanded Isodata method you can
set a range for the number of clusters to be produced. This is because the Isodata algorithm can
perform cluster deletion, splitting, and merging between iterations.
One of the advantages of this enhanced Isodata option is that it provides for producing a greater
number classes in the tails of the image histograms – the dark / shadowed areas and the bright
areas.
Faster Classification
The new Unsupervised Classification dialog also provides an enhanced option for increasing the
speed of processing. This involves setting Skip Factors whereby the initial class clusters are built
up by the software using a sampling of the image pixels, thereby increasing the speed with which
the technique converges on a set of classes.
However in ERDAS IMAGINE 2013 an option has been added called “Add 1:1 Iteration” which
performs a final pass of the algorithm using all the original image pixels.
By this method a full resolution classification can be produced much more rapidly than by
traditional methods of running across the whole image, with little effect on the results.