Analysis & Comparison of Efficient Techniques of
I. INTRODUCTION
Cluster analysis divides data into meaningful or useful
groups (clusters). If meaningful clusters are the goal, then the
resulting clusters should capture the natural structure of the
data. For example, cluster analysis has been used to group
related documents for browsing, to find genes and proteins
that have similar functionality, and to provide a grouping of
spatial locations prone to earthquakes. However, in other
cases, cluster analysis is only a useful starting point for other
purposes, e.g., data compression or efficiently finding the
nearest neighbors of points. Whether for understanding or
utility, cluster analysis has long been used in a wide variety of
fields: psychology and other social sciences, biology,
statistics, pattern recognition, information retrieval, machine
learning, and data mining.
The scope of this paper is modest: to provide an
introduction to cluster analysis in the field of data mining,
where we define data mining to be the discovery of useful, but
non-obvious, information or patterns in large collections of
data. Much of this paper is necessarily consumed with
providing a general background for cluster analysis, but we
also discuss a number of clustering techniques that have
recently been developed specifically for data mining.
II. K-MEANS ALGORITHM
The k-means algorithm is a simple iterative method to
partition a given dataset into a user-specified number of
clusters, k. This algorithm has been discovered by several
researchers across different disciplines; Gray and Neuhoff [6]
provide a nice historical background for k-means placed in
the larger context of hill-climbing algorithms. The algorithm
operates on a set of d-dimensional vectors, D = {xi | i = 1, . . .
, N}, where xi ∈ R^d denotes the ith data point. The algorithm is
initialized by picking k points in R^d as the initial k cluster
centroids.
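As a concrete illustration of the iteration described above, the classic Lloyd-style k-means loop can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation; the function name, parameters, and random initialization strategy are our own assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means sketch: pick k data points as initial centroids,
    then alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialize by picking k points from the dataset as centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points;
        # keep the old centroid if a cluster received no points.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # converged: centroids stopped moving
        centroids = new
    return centroids, labels
```

The update step uses the arithmetic mean, which is why (as noted later in this paper) the method is sensitive to outliers.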
The test object z is then assigned the majority class among its
k nearest neighbors:

    y′ = argmax_v Σ_{(xi, yi) ∈ Dz} I(v = yi)

where Dz is the set of z's k nearest training objects, v is a
class label, yi is the class label of the ith nearest
neighbor, and I(·) is an indicator function that returns the
value 1 if its argument is true and 0 otherwise.
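The majority-vote rule can be sketched in Python as follows. This is an illustrative sketch under our own naming assumptions (the paper prescribes no implementation); ties are broken by whichever class is counted first.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority-vote kNN: label x by the most common class among
    its k nearest training points."""
    # Distance from x to every training point.
    d = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors.
    nn = np.argsort(d)[:k]
    # Majority vote: argmax_v sum_i I(v == yi).
    return Counter(y_train[nn].tolist()).most_common(1)[0][0]
```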
V. CONCLUSION
Data mining is a broad area that integrates techniques from
several fields including machine learning, statistics, pattern
recognition, artificial intelligence, and database systems, for
the analysis of large volumes of data. There have been a large
number of data mining algorithms rooted in these fields to
perform different data analysis tasks.
Seeding K-means from a hierarchical clustering makes the
solution less sensitive to initialization, and the hierarchical
method additionally provides results at multiple resolutions.
The K-means algorithm is, however, sensitive to the presence of
outliers, since the mean is not a robust statistic. The
k-nearest neighbor (kNN) classifier is a more sophisticated
approach: it finds a group of k objects in the training set that
are closest to the test object, and bases the assignment of a
label on the predominance of a particular class in this
neighborhood. kNN classification is an easy-to-understand and
easy-to-implement classification technique. Despite its
simplicity, it can perform well in many situations. Apriori is a
seminal algorithm for finding frequent itemsets using candidate
generation. It is characterized as a level-wise complete search
algorithm using the anti-monotonicity of itemsets. We hope this
paper can inspire more researchers in data mining to further
explore these algorithms, including their impact and new
research issues.
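The level-wise search and anti-monotone pruning that characterize Apriori can be sketched in Python as follows. This is a minimal illustration of the idea only, with naive support counting by scanning all transactions; the original algorithm uses more efficient candidate counting, and all names here are our own.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining with anti-monotone pruning.
    transactions: iterable of frozensets; returns all frequent itemsets."""
    def support(itemset):
        # Naive support count: scan every transaction.
        return sum(itemset <= t for t in transactions)

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items
                 if support(frozenset([i])) >= min_support}]
    k = 1
    while frequent[-1]:
        # Candidate generation: join pairs of frequent k-itemsets.
        cands = {a | b for a in frequent[-1] for b in frequent[-1]
                 if len(a | b) == k + 1}
        # Anti-monotone pruning: every k-subset of a frequent
        # (k+1)-itemset must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in frequent[-1]
                        for s in combinations(c, k))}
        frequent.append({c for c in cands if support(c) >= min_support})
        k += 1
    return set().union(*frequent[:-1]) if frequent[:-1] else set()
```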
REFERENCES
[1] Agrawal R, Srikant R (1994) Fast algorithms for mining association
rules. In: Proceedings of the 20th VLDB Conference, pp 487–499