Sindh Univ. Res. Jour. (Sci. Ser.) Vol. 48 (2) 315-318 (2016)

SINDH UNIVERSITY RESEARCH JOURNAL (SCIENCE SERIES)

K-Means and ISODATA Clustering Algorithms for Landcover Classification Using Remote Sensing

A. W. ABBAS, N. MINALLAH, N. AHMAD, S.A.R. ABID, M.A.A. KHAN

University of Engineering and Technology Peshawar, Pakistan


Received 17th June 2015 and Revised 28th April 2016

Abstract-The aim of this research is to analyze the performance of the unsupervised classification algorithms ISODATA (Iterative Self-Organizing Data Analysis Technique Algorithm) and K-Means in remote sensing. Both use iterative statistical techniques to automatically group pixels with similar spectral features into distinct clusters. This investigation used a remotely sensed test patch of Abbottabad, Pakistan, obtained from SUPARCO (Space and Upper Atmosphere Research Commission, Pakistan). The Abbottabad test patch consists of five bands, i.e. NDVI (Normalized Difference Vegetation Index), red, green, near infrared and far infrared. The ROIs (regions of interest) selected for land cover classification comprise five classes, i.e. water bodies, agriculture, settled area, forest and barren land. The first step was to preprocess the Abbottabad test patch by filtering, to improve classification performance and the homogeneity of neighboring pixels. The next step was to assess the accuracy of two pixel-based unsupervised classifiers, i.e. ISODATA and K-Means, on this test patch. Finally, the performance of both classifiers is evaluated by varying their parameters, to characterize the effect of each clustering algorithm and its class statistics on the overall classification outcome.
Keywords: K-Means; ISODATA; Clustering Algorithms

Correspondence Author: Nasru Minallah <[email protected]>

1. INTRODUCTION

Remote Sensing (RS) imagery is a vital source of information for observing the Earth's surface. In modern terms, RS is the use of aerial sensor technologies to detect and classify objects on Earth, including its surface, atmosphere and oceans, from a distance by means of propagated signals (Schowengerdt and Robert, 2007). RS image classification can be divided into supervised and unsupervised approaches, but it is often cumbersome to obtain the prior knowledge that supervised classification needs, because of image noise, varied characteristics and complex backgrounds. Therefore, unsupervised classification and cluster analysis are of great importance in RS studies (Wang and Cheng, 2010). Earth observation through remote sensing, pattern recognition, automatic classification, clustering, change detection, feature extraction and parameter estimation supports future decision-making on economic matters, disaster management, high crop yields, deforestation, and security and surveillance (Schowengerdt and Robert, 2007), (Liu et al., 2015).

With the advancement of technology, various classification approaches are deployed on remotely sensed images to extract the desired information. In (Venkateswaran et al., 2013), Venkateswaran presented a performance analysis of K-Means clustering for remotely sensed images. That paper proposes a novel method for unsupervised classification of multi-temporal optical images based on DWT feature extraction and K-Means clustering. After preprocessing, features are extracted from the optical image using the discrete wavelet transform, and feature reduction is then performed on the extracted features using energy-based selection. Finally, K-Means clustering is performed and analyzed using MATLAB and ground truth data to improve classification accuracy. Tanee Kamkhet (Kamkhet, 2012) analyzed THAICHOTE band characteristics using unsupervised pixel-based classification; each band was classified individually using the ISODATA and K-Means methods. In (Memarsadeghi et al., 2003), an efficient modified version of the ISODATA classifier is presented, which improves the running time of the classifier by storing the points in a kd-tree and estimating the dispersion of each cluster. Anil K. Jain (Jain, 2010) describes the K-Means clustering algorithm, first published in 1955, as the standard and simplest clustering algorithm: although many algorithms have been presented since then, K-Means is still widely used after 50 years, which shows how difficult it is to design a general-purpose clustering algorithm. That work also discusses general issues in the design of clustering algorithms, gives an overview and summary of clustering methods, and offers guidelines on recent research directions, i.e. large-scale data clustering, real-time feature selection, and ensemble and semi-supervised clustering.

The study area chosen for this research is the Abbottabad region of Hazara Division in Khyber Pakhtunkhwa province, northeastern Pakistan, with an altitude of 1,260 meters (4,134 ft) and a total area of 1,967 square kilometers (SMEDA). This region was selected because it has five types of land use, which supports the interpretation of the results.
These are agriculture, forest, settled area, barren land and water (Raza et al., 2012). The geographical position of the Abbottabad region and its land-use patterns are shown in Fig. 1 (a) and (b).

This paper presents a performance analysis of the K-Means and ISODATA clustering algorithms for land cover classification. The rest of the paper is structured into three sections: Section 2 describes the methodology, Section 3 presents the performance analysis of the results and the discussion, and Section 4 presents the conclusion and future work.

Fig. 1. (a) Abbottabad region geography; (b) land-use pattern of Abbottabad (legend: forest, agriculture, water, barren, settled)

2. METHODOLOGY
Pattern recognition approaches are commonly deployed to recognize the underlying patterns in remotely sensed data (Websource). In this research, the analysis started with data acquisition and pre-processing, followed by unsupervised classification with the K-Means and ISODATA methods; the final processing steps, post-classification and accuracy assessment, are discussed in the next section. All processing was done in ENVI 5.0.

Data Acquisition
Datasets for RS and clustering can come from a wide range of sources, such as satellite sensor data, ground-based sensor data, weather data, energy systems and so on. In this work, the dataset is a test patch of the Abbottabad region, KPK, Pakistan, acquired from a SUPARCO satellite. Once acquired, the dataset is preprocessed so that it is suitable for the subsequent processing steps.

Pre-processing
In this step, the test patch is loaded in ENVI 5.0 as an RGB color composite, with the green, red and near infrared bands in sequence, to obtain a true color image. The test patch has 5 bands, i.e. red, green, far infrared, near infrared and one added band, the Normalized Difference Vegetation Index (NDVI). The true color image is pre-processed by median filtering, using the Convolutions and Morphology filtering tool. The filtered image is then divided into 5 types of land, i.e. agriculture, forest, settled area, barren land and water bodies, using the ROI tool.

Implementation of Unsupervised Classification by the K-Mean and ISODATA Methods
Various clustering algorithms are currently deployed in remote sensing; the two best-known unsupervised classification algorithms are K-Means and ISODATA.

These algorithms are iterative in nature. They start from arbitrarily selected initial values that represent the properties of the clusters and affect the classification result.

In both approaches, the first step is to assign arbitrary initial values to the clusters. The second step is to classify each pixel to the nearest cluster. The third step is to calculate the mean of all pixels in each cluster. Steps 2 and 3 are repeated until the "change" between iterations is small. The "change" can be measured in two ways: either as the percentage of pixels that change cluster from one iteration to the next, or as the change in distance of the cluster mean vectors between iterations.

In addition, ISODATA improves on this scheme by splitting and merging clusters:
• Two clusters are merged if the distance between their centers is less than a certain threshold, or if the number of pixels in a cluster is fewer than a minimum limit.
• A cluster is split into two if its standard deviation exceeds a predefined value and its number of pixels is greater than twice the threshold for the minimum number of pixels (Tou and Gonzalez, 2012).

The K-Means Algorithm
Feature extraction is the most important step of any recognition system. The purpose of feature extraction is to capture the important characteristics of the image and classify the whole image using this small set of information. The selection of features directly affects the classification: good features result in a higher recognition success rate, and poor features in a lower one. In this paper, two types of features have been extracted.

The goal of K-Means is to reduce the variability within clusters. The sum of squared distances (termed errors) between each pixel and its assigned cluster center is minimized and used as the objective function:

$$SS_{\text{distances}} = \sum_{\forall x} \left[ x - C(x) \right]^2 \qquad (1)$$

where C(x) is the mean of the cluster to which pixel x is assigned.

The Mean Squared Error (MSE) is a measure of the within-cluster variability and is given by:

$$MSE = \frac{\sum_{\forall x} \left[ x - C(x) \right]^2}{(N - c)\, b} \qquad (2)$$
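The three iterative steps and the objective in equations (1) and (2) can be illustrated with a short NumPy sketch. It is not ENVI's implementation: the function names `kmeans`, `ss_distances` and `mse`, the Euclidean distance, and the choice of the first k pixels as initial means are assumptions made for illustration. Pixels are stored as an N × b array (N pixels, b bands), and the loop stops when the percentage of pixels that change cluster falls below the change threshold, which is the first of the two stopping criteria described above.

```python
import numpy as np


def ss_distances(pixels, labels, means):
    """Equation (1): sum over all pixels x of [x - C(x)]^2."""
    diffs = pixels - means[labels]          # x - C(x) for every pixel
    return float(np.sum(diffs ** 2))


def mse(pixels, labels, means):
    """Equation (2): SS divided by (N - c) * b."""
    n, b = pixels.shape                     # N pixels, b spectral bands
    c = means.shape[0]                      # c clusters
    return ss_distances(pixels, labels, means) / ((n - c) * b)


def kmeans(pixels, k, max_iter=10, change_threshold=5.0):
    """Minimal K-Means sketch with a percent-of-pixels change threshold.

    Assumption: the initial means are simply the first k pixels; this is
    for illustration only and is not how ENVI initialises its classes.
    """
    pixels = np.asarray(pixels, dtype=float)
    means = pixels[:k].copy()               # step 1: arbitrary initial values
    labels = np.full(len(pixels), -1)
    for _ in range(max_iter):
        # Step 2: classify each pixel to the nearest cluster mean (Euclidean).
        dists = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        changed = 100.0 * np.mean(new_labels != labels)  # % of pixels moved
        labels = new_labels
        # Step 3: recompute the mean of every non-empty cluster.
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                means[j] = members.mean(axis=0)
        if changed < change_threshold:      # stop when the "change" is small
            break
    return labels, means
```

For example, `kmeans(pixels, k=5, change_threshold=5.0)` mirrors the default K-Means settings reported in Section 3 (5 classes, change threshold 5.0), and `mse(pixels, labels, means)` then gives the within-cluster variability of equation (2) for that result. Computing the change as a percentage of reassigned pixels was chosen because it matches the Change Threshold field used in ENVI.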
In equation (2), b represents the number of spectral bands, N is the number of pixels and c is the number of clusters (Websource1). The K-Means implementation steps in ENVI 5.0 are shown in Fig. 2.

1. In the Toolbox, go to Classification, then Unsupervised Classification, and then K-Means Classification. The Input File dialog is prompted.
2. Perform optional spatial and spectral subsetting and/or masking, then click OK. The K-Means Parameters dialog appears.
3. In the fields provided, give values for the number of classes and the maximum number of iterations.
4. To end the iteration process, give a Change Threshold in percent (0-100%).
5. Optionally, enter standard deviation and distance limits in the Maximum Stdev From Mean and Maximum Distance Error fields, respectively.
6. Select Output to File or Memory.
7. Finally, press OK. For each iteration of the classifier, the status bar cycles from 0 to 100%. The resulting output is added to the Layer Manager.
Fig. 2. K-Means implementation workflow

The ISODATA Algorithm
ISODATA computes class means evenly distributed in the data space and then iteratively clusters the remaining pixels using minimum distance techniques. Each iteration recalculates the means and reclassifies the pixels with respect to the new means. In the K-Means approach, by contrast, the number of clusters K remains the same throughout the iterations, although it may turn out later that more or fewer clusters would fit the data better. This drawback is overcome in the ISODATA algorithm, which adjusts the number of clusters automatically during the iterations by merging similar clusters and splitting clusters with large standard deviations (Websource2). The implementation steps of ISODATA in ENVI 5.0 are shown in Fig. 3.

3. PERFORMANCE ANALYSIS OF RESULTS AND DISCUSSIONS
As shown in Fig. 1 (b), the Abbottabad region is divided into 5 types of land patterns; however, as discussed in (Kamkhet, 2012), the overall accuracy of either unsupervised classifier is not an indicator of the accuracy for each land pattern. Therefore, for cluster analysis, the true color filtered image of Abbottabad has been used as the test patch for the K-Means and ISODATA unsupervised classifiers, and their parameters have been varied for the performance analysis.

Using ENVI 5.0, Fig. 4 (1 1) shows the true color median-filtered image of the Abbottabad region used for further processing in the unsupervised classification. Fig. 4 (1 2) is processed with K-Means using the default values, number of classes = 5 and change threshold = 5.0; as a result, the image is clustered into 5 classes, i.e. red, green, blue, cyan and yellow. Fig. 4 (1 3) is clustered with ISODATA using the default initial values in ENVI: classes = 5-10, iterations = 1, change threshold = 5.0, minimum pixels per class = 1, maximum class stdv = 1.000, minimum class distance = 5.000 and maximum merge pairs = 2. As a result, 7 classes are formed: red, green, blue, cyan, yellow, purple and indigo.

Fig. 4 (2 1) and Fig. 4 (2 2) are obtained by changing the number of iterations from 1 to 10 for K-Means and ISODATA, respectively. As the number of iterations increases, the accuracy increases, the clusters and classes become more uniform and clear, and in ISODATA the number of resulting classes increases to the maximum of 10.

By changing the number of classes from 5 to 10 for K-Means and to 5-15 for ISODATA, keeping the other parameters at their defaults, the resulting images are Fig. 4 (2 3) and Fig. 4 (3 1), each with 10 classes. It is deduced that when the number of classes is the same for both unsupervised algorithms, the resulting image clusters are also the same. This is consistent with ISODATA being an extension of K-Means: when the number-of-classes parameter is the same, their classifications are also the same.

The change threshold parameter is analyzed in Fig. 4 (3 2) and Fig. 4 (3 3) by varying it from 5% to 10% for K-Means and ISODATA, respectively. No remarkable difference was seen between the default and the varied parameter for either unsupervised algorithm.

After the classification processing, the result in each band was re-checked for accuracy against the existing land cover data by cross-classification and tabulation. The results are divided into two parts: the overall accuracy of the whole image and the accuracy of each land cover class.

In post-classification, class statistics for K-Means and ISODATA clustering were computed with the default parameter values. Tables 1 and 2 give, for both clustering algorithms, the number and percentage of points in each class over the full scene of 768,258 points. In both cases, class 1 (red) has the largest share of points.
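The percentages in Tables 1 and 2 are each class's point count divided by the total number of points in the scene (the five K-Means counts sum to 768,258), and the overall and per-class accuracy obtained by cross-tabulation can be read off a confusion matrix. The sketch below shows both calculations. The function names are hypothetical, the counts are copied from Table 1, and the 2 × 2 confusion matrix is a made-up toy example rather than data from this study. Per-class accuracy is computed as the producer's accuracy (correctly mapped pixels divided by the reference pixels of that class); this is one common choice and an assumption, since the per-class measure is not stated in the text.

```python
def class_distribution(counts):
    """Return (total, {class: percentage}) for a {class: point count} mapping."""
    total = sum(counts.values())
    return total, {name: 100.0 * n / total for name, n in counts.items()}


def accuracy_from_confusion(matrix):
    """Overall and per-class (producer's) accuracy from a confusion matrix.

    matrix[i][j] is the number of pixels whose reference class is i and
    whose classified (mapped) class is j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    per_class = [row[i] / sum(row) if sum(row) else 0.0
                 for i, row in enumerate(matrix)]
    return correct / total, per_class


# K-Means class counts as listed in Table 1.
kmeans_counts = {
    "Class 1 (Red)": 197_431,
    "Class 2 (Green)": 177_599,
    "Class 3 (Blue)": 130_249,
    "Class 4 (Yellow)": 98_040,
    "Class 5 (Cyan)": 164_939,
}
total, pct = class_distribution(kmeans_counts)
print(total)                            # 768258
print(round(pct["Class 1 (Red)"], 3))   # 25.699

# Toy 2-class confusion matrix (illustrative numbers, not results from this study).
oa, per_class = accuracy_from_confusion([[50, 2], [3, 45]])
print(oa)                               # 0.95
```

The same `class_distribution` call applied to the seven ISODATA counts in Table 2 also gives a total of 768,258 points.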

1. In the Toolbox, go to Classification, then Unsupervised Classification, and then ISODATA Classification. The Input File dialog is prompted.
2. Perform optional spatial and spectral subsetting and/or masking, then click OK. The ISODATA Parameters dialog appears.
3. As ISODATA does not keep a fixed number of classes, give values for the minimum and maximum Number Of Classes, used for splitting and merging classes based on the input thresholds.
4. In the fields provided, give values for Maximum Iterations and the Change Threshold (0-100%).
5. For merging, enter a value in the Minimum # Pixels in Class field. If a class has fewer pixels than this minimum, the class is deleted and its pixels are placed in the class(es) nearest to them.
6. For splitting, enter the Maximum Class Stdv in its field. If the standard deviation of a class is larger than this threshold, the class is split into two classes.
7. Enter the minimum distance (in DN) between class means and the maximum number of merge pairs in the fields provided; both control the merging of classes.
8. To set the optional fields, enter values in the Maximum Stdev From Mean and Maximum Distance Error fields, respectively.
9. Select Output to File or Memory.
10. Finally, press OK. For each iteration of the classifier, the status bar cycles from 0 to 100%. The resulting output is added to the Layer Manager.
Fig. 3. ISODATA implementation workflow

Fig. 4. (1 1) Abbottabad region filtered image,

Table 1. Class Statistics for K-Means (class distribution)
Class/Band          Occurrence (points)   Occurrence (%)
Unclassified        0                     0.000
Class 1 (Red)       197,431               25.699
Class 2 (Green)     177,599               23.117
Class 3 (Blue)      130,249               16.954
Class 4 (Yellow)    98,040                12.761
Class 5 (Cyan)      164,939               21.469

Table 2. Class Statistics for ISODATA (class distribution)
Class/Band          Occurrence (points)   Occurrence (%)
Unclassified        0                     0.000
Class 1 (Red)       160,351               20.872
Class 2 (Green)     134,212               17.470
Class 3 (Blue)      104,971               13.664
Class 4 (Yellow)    86,166                11.216
Class 5 (Cyan)      72,337                9.416
Class 6 (Purple)    59,037                7.685
Class 7 (Indigo)    151,184               19.679

4. CONCLUSION
In a nutshell, it is concluded that unsupervised classification of land patterns with K-Means and ISODATA, varying the parameter values for the number of iterations, the number of classes and the change threshold, yields different degrees of accuracy, and the performance of each algorithm is consistent with the literature. In addition, in the post-classification class statistics, class 1 (red) has the largest share of points in both cases.

In future, the performance of these unsupervised algorithms will be compared with supervised algorithms, to relate the results to the literature and to assess their suitability for different application scenarios.

REFERENCES:
Jain, A. K. (2010). "Data clustering: 50 years beyond K-Means," Pattern Recognition Letters, Vol. 31, 651-666, Elsevier Science Inc., New York, NY, USA.

Kamkhet, T. (2012). "Analysis of THAICHOTE band characteristics using unsupervised pixel-based classification," 33rd Asian Conference on Remote Sensing (ACRS), Pattaya, Thailand, Vol. 1, 3: 395-401.

Liu, S., L. Bruzzone, F. Bovolo, M. Zanetti and P. Du (2015). "Sequential Spectral Change Vector Analysis for Iteratively Discovering and Detecting Multiple Changes in Hyperspectral Images," IEEE Transactions on Geoscience and Remote Sensing, Vol. 53, 8: 4363-4378.

Memarsadeghi, N., N. Goddard, S. Nathan and J. Le Moigne (2003). "A fast implementation of the ISODATA clustering algorithm," IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Toulouse, France, Vol. 3, 2057-2059.

Raza, A., I. A. Raja and S. Raza (2012). "Land-use change analysis of district Abbottabad Pakistan: taking advantage of GIS and remote sensing," A Scientific Journal of COMSATS - Science Vision, Vol. 18, No. 1-2, 675 pp.

Schowengerdt, D. and A. Robert (2007). "Remote sensing: models and methods for image processing," Academic Press (3rd ed.), ISBN 978-0-12-369407-2, 02 pp.

Tou, J. T. and R. C. Gonzalez (1994). "Pattern Recognition Principles," Addison-Wesley Publishing Company, Reading, Massachusetts.

Venkateswaran, K., N. Kasthuri, K. Balakrishnan and K. Prakash (2013). "Performance Analysis of K-Means Clustering For Remotely Sensed Images," International Journal of Computer Applications (0975-8887), Vol. 84, 12.
