Clustering Algorithm For Spatial Data Mining: An: A.Padmapriya, N.Subitha
International Journal of Computer Applications (0975 – 8887)
Volume 68– No.10, April 2013
[Figure: the knowledge discovery process. Labels: Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data mining → Patterns → Interpretation → Knowledge]
2. BACKGROUND STUDY
In general, data mining tasks can be classified into two categories: descriptive data mining and predictive data mining. The former describes the data set in a concise, summarized manner and presents interesting general properties of the data, whereas the latter constructs one or a set of models, performs inference on the available set of data, and attempts to predict the behavior of new data sets.

A data mining system may accomplish one or more of the following data mining tasks [1, 4].

1. Class description. Class description provides a concise and succinct summarization of a collection of data and distinguishes it from others. The summarization of a collection of data is called class characterization, whereas the comparison between two or more collections of data is called class comparison or discrimination. Class description should cover not only its summary properties, such as count,
sum, and average, but also its properties on data dispersion, such as variance and quartiles.

For example, class description can be used to compare European versus Asian sales of a company, identify the important factors which discriminate the two classes, and present a summarized overview.

2. Association. Association is the discovery of association relationships or correlations among a set of items. They are often expressed in rule form, showing attribute-value conditions that occur frequently together in a given set of data. An association rule of the form X→Y is interpreted as “database tuples that satisfy X are likely to satisfy Y”. Association analysis is widely used in transaction data analysis for directed marketing, catalog design, and other business decision-making processes.

Substantial research has been performed recently on association analysis, with efficient algorithms proposed, including the level-wise Apriori search, mining multiple-level and multi-dimensional associations, mining associations for numerical, categorical, and interval data, meta-pattern directed or constraint-based mining, and mining correlations.

3. Classification. Classification analyzes a set of training data (i.e., a set of objects whose class label is known) and constructs a model for each class based on the features in the data. A decision tree or a set of classification rules is generated by such a classification process, which can be used for better understanding of each class in the database and for classification of future data [1]. For example, one may classify diseases and help predict the kind of disease based on the symptoms of patients.

There have been many classification methods developed in the fields of machine learning, statistics, databases, neural networks, rough sets, and others. Classification has been used in customer segmentation, business modeling, and credit analysis.

4. Prediction. This mining function predicts the possible values of some missing data or the value distribution of certain attributes in a set of objects. It involves finding the set of attributes relevant to the attribute of interest (e.g., by some statistical analysis) and predicting the value distribution based on the set of data similar to the selected objects. For example, an employee’s potential salary can be predicted based on the salary distribution of similar employees in the company. Usually, regression analysis, generalized linear models, correlation analysis, and decision trees are useful tools in quality prediction. Genetic algorithms and neural network models are also popularly used in prediction.

5. Clustering. Clustering analysis identifies clusters embedded in the data, where a cluster is a collection of data objects that are “similar” to one another. Similarity can be expressed by distance functions, specified by users or experts. A good clustering method produces high quality clusters to ensure that the inter-cluster similarity is low and the intra-cluster similarity is high [10]. For example, one may cluster the houses in an area according to their house category, floor area, and geographical locations.

Data mining research has been focused on high quality and scalable clustering methods for large databases and multidimensional data warehouses.

6. Time-series analysis. Time-series analysis examines large sets of time-series data to find certain regularities and interesting characteristics, including search for similar sequence or subsequence patterns, periodicities, trends, and deviations. For example, one may predict the trend of the stock values for a company based on its stock history, business situation, competitors’ performance, and current market.

There are also other data mining tasks, such as outlier analysis. Identification of new data mining tasks to make better use of the collected data is itself an interesting research topic.

Applications

Data mining is a young discipline with wide and diverse applications, and there is still a nontrivial gap between general principles of data mining and data mining tools for particular applications.

1. Biomedical and DNA Data Analysis.
2. Financial Data Analysis.
3. Retail Industry.
4. Telecommunication Industry.

3. SPATIAL DATA MINING
Spatial data are data that have a spatial or location component, and they carry information that is more complex than classical data. A spatial database stores spatial data represented by spatial data types, together with the spatial relationships among the data [6, 8].
Spatial data mining is a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment, and planning [8].

Data Attributes

DATA = the (WHAT) dimension determines an attribute of an object.
SPATIAL DATA = (WHERE) & (WHAT) denotes attribute data referenced to a specific location.
The attributes of spatial objects are highly dependent on location and are often influenced by neighboring objects.
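The WHERE and WHAT distinction described under Data Attributes can be illustrated with a small sketch. It is only an assumption-laden example: the `SpatialRecord` type, its planar coordinates, the `value` attribute, and the `radius` parameter are hypothetical names introduced here, not part of the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialRecord:
    x: float      # WHERE: location (hypothetical planar coordinates)
    y: float
    value: float  # WHAT: a non-spatial attribute, e.g. a house price


def neighbors(target, records, radius):
    """Records other than `target` that lie within `radius` of it."""
    return [
        r for r in records
        if r is not target
        and math.dist((r.x, r.y), (target.x, target.y)) <= radius
    ]


def neighborhood_mean(target, records, radius):
    """Mean attribute value of the neighbours, or None if there are none."""
    near = neighbors(target, records, radius)
    return sum(r.value for r in near) / len(near) if near else None


houses = [
    SpatialRecord(0, 0, 100.0),
    SpatialRecord(1, 0, 110.0),
    SpatialRecord(0, 1, 90.0),
    SpatialRecord(50, 50, 300.0),
]
print(neighborhood_mean(houses[0], houses, radius=2.0))  # 100.0
print(neighborhood_mean(houses[3], houses, radius=2.0))  # None
```

Comparing a record's own attribute with its neighbourhood mean is one simple way to see how strongly an attribute depends on location.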
A spatial database stores a large amount of space-related data, such as maps, preprocessed remote sensing or medical imaging data, and VLSI chip layout data. Spatial databases carry topological and/or distance information, usually organized by sophisticated, multidimensional spatial indexing structures that are accessed by spatial data access methods and often require spatial reasoning, geometric computation, and spatial knowledge representation techniques [12].

Spatial Data Mining

Spatial data mining is the process [18] of discovering interesting and previously unknown, but potentially useful, patterns from large spatial datasets. Extracting interesting and useful patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data due to the complexity of spatial data types, spatial relationships, and spatial autocorrelation. Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment, and planning [8]. The collected data far exceed the human ability to analyze them. Recent studies on data mining have extended the scope of data mining from relational and transactional databases to spatial databases.

(A) Spatial Data Mining Methods

Spatial data mining uses various methods, some of which are listed below:

1. Generalization Based Knowledge Discovery
2. Clustering Methods
3. Aggregate Proximity Measuring
4. Spatial Association Rules

Among the four methods, this research is based on the clustering method.

Some of the applications of spatial data mining are listed below:

Geographic information systems
Geo marketing
Remote sensing
Image database exploration
Medical imaging
Navigation
Traffic control
Environmental studies

(D) Clustering Methods

The collection of clusters is known as clustering.
Goal: like generalization, to reveal relationships between spatial and non-spatial attributes.

There are various types of clustering, as follows:

1. Hierarchical Methods
These comprise two types of algorithms [9]:
Agglomerative Algorithm
Divisive Algorithm

2. Partitioning Methods
These comprise several types of algorithms [10]:
Nearest Neighbor Algorithm
Density Based Algorithm
K-Medoids Methods
K-Means Methods

3. Grid Based Methods

4. Methods Based on Co-occurrence of Categorical Data.
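Of the hierarchical methods listed above, the agglomerative approach is the easiest to sketch. The text only names the method, so the details below are illustrative assumptions rather than the authors' procedure: Euclidean distance, single-link merging (cluster distance is the closest pair of members), and a hypothetical stopping parameter `k` with 1 <= k <= number of points.

```python
import math


def single_link(a, b):
    """Distance between two clusters: the closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > k:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            ((i, j)
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for cluster in agglomerative(points, k=3):
    print(cluster)
```

A divisive algorithm works in the opposite direction, starting from one cluster that holds all points and splitting it step by step.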
Step 1: Randomly place the initial group centroids in the 2D space.

Step 2: Assign each object to the group that has the closest centroid.

Step 3: Recalculate the positions of the centroids.

Step 4: If the positions of the centroids didn't change

We have to re-assign each record in the dataset to the most similar cluster and re-calculate the arithmetic mean of all the clusters in the dataset. The arithmetic mean of a cluster is the arithmetic mean of all the records in that cluster.

For example, if a cluster contains two records, where the sets of measurements are
John = {20, 170, 80} and
Henry = {30, 160, 120},
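The steps and the cluster-mean rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: Euclidean distance is assumed as the similarity measure, the initial centroids are drawn from the data points rather than placed at arbitrary positions, and, because the text of Step 4 breaks off, the usual convention of stopping once the centroids no longer change is assumed. The names `cluster_mean`, `k_means`, `max_iter`, and `seed` are hypothetical.

```python
import math
import random


def cluster_mean(records):
    """Component-wise arithmetic mean of the records in one cluster."""
    n = len(records)
    return tuple(sum(values) / n for values in zip(*records))


def closest(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1 (assumed variant): use k distinct data points as initial centroids.
    centroids = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[closest(p, centroids)].append(p)
        # Step 3: recalculate the centroids; an empty group keeps its centroid.
        updated = [cluster_mean(g) if g else c
                   for g, c in zip(groups, centroids)]
        # Step 4 (assumed): stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return centroids, groups


john = (20, 170, 80)
henry = (30, 160, 120)
print(cluster_mean([john, henry]))  # (25.0, 165.0, 100.0)

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = k_means(data, k=2)
print(centroids)
```

For the two records in the example, the mean is taken attribute by attribute, so a cluster holding only John and Henry has the centroid (25, 165, 100).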