International Journal of Innovative Technology and Exploring Engineering (IJITEE)

ISSN: 2278-3075, Volume-1, Issue-3, August 2012

Analysis and Comparison of Efficient Techniques of Clustering Algorithms in Data Mining
Shiv Pratap Singh Kushwah, Keshav Rawat, Pradeep Gupta
Abstract: This paper presents a comparison of data mining algorithms for clustering. These algorithms are among the most influential data mining algorithms in the research community. For each algorithm, we provide a description of the algorithm, discuss its impact, and review current and further research on it. These algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.
Index Terms: cluster, data mining, clustering method, k-means.

I. INTRODUCTION
Cluster analysis divides data into meaningful or useful
groups (clusters). If meaningful clusters are the goal, then the
resulting clusters should capture the natural structure of the
data. For example, cluster analysis has been used to group
related documents for browsing, to find genes and proteins
that have similar functionality, and to provide a grouping of
spatial locations prone to earthquakes. However, in other
cases, cluster analysis is only a useful starting point for other
purposes, e.g., data compression or efficiently finding the
nearest neighbors of points. Whether for understanding or
utility, cluster analysis has long been used in a wide variety of
fields: psychology and other social sciences, biology,
statistics, pattern recognition, information retrieval, machine
learning, and data mining.
The scope of this paper is modest: to provide an
introduction to cluster analysis in the field of data mining,
where we define data mining to be the discovery of useful, but
non-obvious, information or patterns in large collections of
data. Much of this paper is necessarily consumed with
providing a general background for cluster analysis, but we
also discuss a number of clustering techniques that have
recently been developed specifically for data mining.
II. K-MEANS ALGORITHM
The k-means algorithm is a simple iterative method to
partition a given dataset into a user-specified number of
clusters, k. This algorithm has been discovered by several
researchers across different disciplines; Gray and Neuhoff [6]
provide a nice historical background for k-means placed in
the larger context of hill-climbing algorithms. The algorithm
operates on a set of d-dimensional vectors, D = {x_i | i = 1, . . .
, N}, where x_i ∈ R^d denotes the ith data point. The algorithm is
initialized by picking k points in R^d as the initial k cluster

representatives or centroids. Techniques for selecting these


initial seeds include sampling at random from the dataset,
setting them as the solution of clustering a small subset of the
data or perturbing the global mean of the data k times. Then
the algorithm iterates between two steps till convergence:
Step 1: Data Assignment. Each data point is assigned to its
closest centroid, with ties broken arbitrarily. This results in a
partitioning of the data.
Step 2: Relocation of means. Each cluster representative
is relocated to the center (mean) of all data points assigned to
it. If the data points come with a probability measure
(weights), then the relocation is to the expectations (weighted
mean) of the data partitions.
The algorithm converges when the assignments (and hence
the c_j values) no longer change. The algorithm execution is
visually depicted in Fig. 1. Note that each iteration needs N × k
comparisons, which determines the time complexity of one
iteration. The number of iterations required for convergence
varies and may depend on N, but as a first cut, this algorithm
can be considered linear in the dataset size.
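
To make the two steps concrete, a minimal Python/NumPy sketch of this iteration (Lloyd's procedure) is given below; random sampling from the dataset is assumed for the initial seeds, and the iteration cap is an arbitrary safeguard:

    import numpy as np

    def kmeans(X, k, n_iter=100, rng=None):
        # X: (N, d) array of data points; k: number of clusters.
        # Returns (centroids, assignments).
        rng = np.random.default_rng(rng)
        # initialize by sampling k points at random from the dataset
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Step 1: assign each point to its closest centroid (Euclidean distance)
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            # Step 2: relocate each centroid to the mean of the points assigned to it
            new_centroids = np.array([X[assign == j].mean(axis=0)
                                      if np.any(assign == j) else centroids[j]
                                      for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break  # assignments no longer change; the algorithm has converged
            centroids = new_centroids
        return centroids, assign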
One issue to resolve is how to quantify "closest" in the
assignment step. The default measure of closeness is the
Euclidean distance, in which case one can readily show that
the non-negative cost function

    Σ_{i=1}^{N} min_j ||x_i − c_j||²          (1)

will decrease whenever there is a change in the assignment or


the relocation steps, and hence convergence is guaranteed in a
finite number of iterations. The greedy-descent nature of
k-means on a non-convex cost also implies that the
convergence is only to a local optimum, and indeed the
algorithm is typically quite sensitive to the initial centroid
locations. Figure 2 illustrates how a poorer result is obtained
for the same dataset as in Fig. 1 for a different choice of the
three initial centroids. The local minima problem can be
countered to some extent by running the algorithm multiple
times with different initial centroids, or by doing limited local
search about the converged solution.

Manuscript received August 08, 2012.


Shiv Pratap Singh Kushwah, Department of CSE/IT, ITM Universe,
Gwalior, India
Keshav Rawat, Department of CSE/IT, ITM Universe, Gwalior, India
Pradeep Gupta, Department of CSE/IT, GEC, Gwalior, India


Fig. 1 Changes in cluster representative locations (indicated by "+" signs) and data assignments (indicated by color) during an execution of the k-means algorithm

In addition to being sensitive to initialization, the k-means


algorithm suffers from several other problems. First, observe
that k-means is a limiting case of fitting data by a mixture of k
Gaussians with identical, isotropic covariance matrices (Σ = σ²I),
when the soft assignments of data points to mixture
components are hardened to allocate each data point solely to
the most likely component. So, it will falter whenever the data
is not well described by reasonably separated spherical balls,
for example, if there are non-convex shaped clusters in the
data. This problem may be alleviated by rescaling the data to
whiten it before clustering, or by using a different distance
measure that is more appropriate for the dataset. For example,
information-theoretic clustering uses the KL-divergence to
measure the distance between two data points representing
two discrete probability distributions. It has been recently
shown that if one measures distance by selecting any member
of a very large class of divergences called Bregman
divergences during the assignment step and makes no other
changes, the essential properties of k-means, including
guaranteed convergence, linear separation boundaries and
scalability, are retained [2]. This result makes k-means
effective for a much larger class of datasets so long as an
appropriate divergence is used. k-means can be paired with
another algorithm to describe non-convex clusters. One first
clusters the data into a large number of groups using k-means.
These groups are then agglomerated into larger clusters using
single link hierarchical clustering, which can detect complex
shapes. This approach also makes the solution less sensitive to
initialization, and since the hierarchical method provides
results at multiple resolutions, one does not need to
pre-specify k either.
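
A minimal sketch of this two-phase approach, assuming SciPy's single-link hierarchical clustering is available and reusing the kmeans function sketched earlier (the choice of 50 intermediate groups is arbitrary):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    def two_phase_clustering(X, n_groups=50, n_clusters=5, rng=None):
        # Phase 1: over-cluster the data with k-means
        centroids, assign = kmeans(X, k=n_groups, rng=rng)
        # Phase 2: merge the centroids with single-link hierarchical clustering
        Z = linkage(centroids, method="single")
        merged = fcluster(Z, t=n_clusters, criterion="maxclust")
        # each data point inherits the merged label of its k-means group
        return merged[assign]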
The cost of the optimal solution decreases with increasing k
till it hits zero when the number of clusters equals the number
of distinct data-points. This makes it more difficult to (a)
directly compare solutions with different numbers of clusters
and (b) to find the optimum value of k. If the desired k is not
known in advance, one will typically run k-means with
different values of k, and then use a suitable criterion to select
one of the results. For example, SAS uses the
cube-clustering-criterion, while X-means adds a complexity
term (which increases with k) to the original cost function
(Eq. 1) and then identifies the k which minimizes this adjusted
cost. Alternatively, one can progressively increase the number
of clusters, in conjunction with a suitable stopping criterion.
Bisecting k-means achieves
this by first putting all the data into a single cluster, and then
recursively splitting the least compact cluster into two using
2-means. The celebrated LBG algorithm [6] used for vector
quantization doubles the number of clusters till a suitable
code-book size is obtained. Both these approaches thus
alleviate the need to know k beforehand.
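
A short sketch of bisecting k-means along the lines described above, reusing the kmeans function from the earlier sketch; measuring compactness by the within-cluster sum of squared errors is an assumption:

    import numpy as np

    def bisecting_kmeans(X, n_clusters, rng=None):
        clusters = [X]
        while len(clusters) < n_clusters:
            # split the least compact cluster (largest within-cluster SSE)
            sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
            worst = clusters.pop(int(np.argmax(sse)))
            centroids, assign = kmeans(worst, k=2, rng=rng)
            clusters.extend([worst[assign == 0], worst[assign == 1]])
        return clusters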
The algorithm is also sensitive to the presence of outliers,
since the mean is not a robust statistic. A preprocessing step to
remove outliers can be helpful.

III. KNN: K-NEAREST NEIGHBOR CLASSIFICATION

Fig. 2 Effect of an inferior initialization on the k-means results

One of the simplest, and rather trivial, classifiers is the Rote
classifier, which memorizes the entire training data and
performs classification only if the attributes of the test object
match one of the training examples exactly. An obvious
drawback of this approach is that many test records will not be

classified because they do not exactly match any of the
training records. A more sophisticated approach, k-nearest
neighbor (kNN) classification [4,10], finds a group of k
objects in the training set that are closest to the test object, and
bases the assignment of a label on the predominance of a
particular class in this neighborhood. There are three key
elements of this approach: a set of labeled objects, e.g., a set
of stored records; a distance or similarity metric to compute
distance between objects; and the value of k, the number of
nearest neighbors. To classify an unlabeled object, the
distance of this object to the labeled objects is computed, its
k nearest neighbors are identified, and the class labels of these
nearest neighbors are then used to determine the class label of
the object.
Figure 3 provides a high-level summary of the
nearest-neighbor classification method. Given a training set D
and a test object z = (x′, y′), the algorithm computes the
distance (or similarity) between z and all the training objects
(x, y) ∈ D to determine its nearest-neighbor list, D_z. (x is the
data of a training object, while y is its class. Likewise, x′ is the
data of the test object and y′ is its class.)
Once the nearest-neighbor list is obtained, the test object is
classified based on the majority class of its nearest neighbors:

    Majority Voting: y′ = argmax_v Σ_{(x_i, y_i) ∈ D_z} I(v = y_i)

where v is a class label, y_i is the class label of the ith nearest
neighbor, and I(·) is an indicator function that returns the
value 1 if its argument is true and 0 otherwise.
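
The voting rule can be made concrete with a short Python sketch (brute-force search over the training set and Euclidean distance are assumed):

    import numpy as np
    from collections import Counter

    def knn_classify(train_X, train_y, x_test, k=5):
        # distances from the test object to every labeled training object
        dists = np.linalg.norm(train_X - x_test, axis=1)
        nearest = np.argsort(dists)[:k]               # indices of the k closest objects
        votes = Counter(train_y[i] for i in nearest)  # majority vote among their labels
        return votes.most_common(1)[0][0]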

Fig. 3 The k-nearest neighbor classification algorithm

3.1 Issues with kNN:


There are several key issues that affect the performance of
kNN. One is the choice of k. If k is too small, then the result
can be sensitive to noise points. On the other hand, if k is too
large, then the neighborhood may include too many points
from other classes.
Another issue is the approach to
combining the class labels. The simplest method is to take a
majority vote, but this can be a problem if the nearest
neighbors vary widely in their distance and the closer
neighbors more reliably indicate the class of the object. A
more sophisticated approach, which is usually much less
sensitive to the choice of k, weights each object's vote by its
distance, where the weight factor is often taken to be the
reciprocal of the squared distance: w_i = 1/d(x′, x_i)². This
amounts to replacing the last step of the kNN algorithm with
the following distance-weighted vote:

    Distance-Weighted Voting: y′ = argmax_v Σ_{(x_i, y_i) ∈ D_z} w_i × I(v = y_i)
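
A sketch of this weighted variant, replacing the vote accumulation in the earlier kNN sketch (the small eps term is an assumption added to guard against a zero distance when the test point coincides with a training point):

    import numpy as np
    from collections import defaultdict

    def knn_classify_weighted(train_X, train_y, x_test, k=5, eps=1e-12):
        dists = np.linalg.norm(train_X - x_test, axis=1)
        nearest = np.argsort(dists)[:k]
        scores = defaultdict(float)
        for i in nearest:
            # w_i = 1 / d(x', x_i)^2
            scores[train_y[i]] += 1.0 / (dists[i] ** 2 + eps)
        return max(scores, key=scores.get)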

The choice of the distance measure is another important
consideration. Although various measures can be used to
compute the distance between two points, the most desirable
distance measure is one for which a smaller distance between
two objects implies a greater likelihood of having the same
class. Thus, for example, if kNN is being applied to classify
documents, then it may be better to use the cosine measure
rather than Euclidean distance. Some distance measures can
also be affected by the high dimensionality of the data. In
particular, it is well known that the Euclidean distance
measure becomes less discriminating as the number of
attributes increases. Also, attributes may have to be scaled to
prevent distance measures from being dominated by one of
the attributes. A number of schemes have been developed that
try to compute the weights of each individual attribute based
upon a training set [5].
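
For illustration, two small Python helpers along these lines (a sketch; min-max scaling is just one of several reasonable ways to normalize attributes):

    import numpy as np

    def cosine_distance(a, b):
        # often preferable to Euclidean distance for sparse document vectors
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def minmax_scale(X):
        # rescale each attribute to [0, 1] so no single attribute dominates the distance
        mins, maxs = X.min(axis=0), X.max(axis=0)
        return (X - mins) / np.where(maxs > mins, maxs - mins, 1.0)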
3.2 Impact of kNN:
kNN classification is an easy-to-understand and easy-to-implement
classification technique. Despite its simplicity, it
can perform well in many situations. In particular, a well-known
result by Cover and Hart [3] shows that the error of
the nearest-neighbor rule is bounded above by twice the Bayes
error under certain reasonable assumptions. Also, the error of
the general kNN method asymptotically approaches that of
the Bayes error and can be used to approximate it.
KNN is particularly well suited for multi-modal classes as
well as applications in which an object can have many class
labels. For example, for the assignment of functions to genes
based on expression profiles, some researchers found that
kNN outperformed SVM, which is a much more sophisticated
classification scheme [9].
3.3 Current and future research
Although the basic kNN algorithm and some of its variations,
such as weighted kNN and assigning weights to objects, are
relatively well known, some of the more advanced techniques
for kNN are much less known. For example, it is typically
possible to eliminate many of the stored data objects, but still
retain the classification accuracy of the kNN classifier. This is
known as "condensing" and can greatly speed up the
classification of new objects [7]. In addition, data objects can
be removed to improve classification accuracy, a process
known as "editing" [13]. There has also been a considerable
amount of work on the application of proximity graphs
(nearest neighbor graphs, minimum spanning trees, relative
neighborhood graphs, Delaunay triangulations, and Gabriel
graphs) to the kNN problem. Recent papers by Toussaint
[11,12], which emphasize a proximity graph viewpoint,
provide an overview of work addressing these three areas and
indicate some remaining open problems.

IV. THE APRIORI ALGORITHM


One of the most popular data mining approaches is to find
frequent itemsets from a transaction dataset and derive
association rules. Finding frequent itemsets (itemsets with
frequency larger than or equal to a user-specified minimum
support) is not trivial because of its combinatorial explosion.
Once frequent itemsets are obtained, it is straightforward to
generate association rules with confidence larger than or equal
to a user-specified minimum confidence. Apriori is a seminal
algorithm for finding frequent itemsets using candidate

generation [1]. It is characterized as a level-wise complete
search algorithm using the anti-monotonicity of itemsets: if an
itemset is not frequent, none of its supersets can be frequent.
By convention, Apriori assumes that items within a
transaction or itemset are sorted in lexicographic order.
Let the set of frequent itemsets of size k be Fk and their
candidates be Ck. Apriori first scans the database and
searches for frequent itemsets of size 1 by accumulating the
count for each item and collecting those that satisfy the
minimum support requirement. It then iterates on the
following three steps and extracts all the frequent itemsets.
1. Generate Ck+1, the candidates of frequent itemsets of size k + 1,
from the frequent itemsets of size k.
2. Scan the database and calculate the support of each
candidate frequent itemset.
3. Add those itemsets that satisfy the minimum support
requirement to Fk+1.
The Apriori algorithm is shown in Fig. 4. Function apriori-gen
in line 3 generates Ck+1 from Fk in the following two-step
process:
1. Join step: Generate Rk+1, the initial candidates of
frequent itemsets of size k + 1, by taking the union of two
frequent itemsets of size k, Pk and Qk, that have the first k − 1
elements in common:

    Rk+1 = Pk ∪ Qk = {item1, . . . , itemk−1, itemk, item′k}
    Pk = {item1, item2, . . . , itemk−1, itemk}
    Qk = {item1, item2, . . . , itemk−1, item′k}

where item1 < item2 < · · · < itemk < item′k.


2. Prune step: Check if all the itemsets of size k in Rk+1 are
frequent and generate Ck+1 by removing those that do not
pass this requirement from Rk+1 . This is because any subset
of size k of Ck+1 that is not frequent cannot be a subset of a
frequent itemset of size k + 1.
Function subset in line 5 finds all the candidates of the
frequent itemsets included in transaction t. Apriori, then,
calculates frequency only for those candidates generated this
way by scanning the database.
It is evident that Apriori scans the database at most kmax+1
times when the maximum size of frequent itemsets is set at
kmax .
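
The level-wise process described above can be sketched in Python as follows (a simplified sketch: transactions are given as sets of items, min_support is an absolute count, and a pairwise-union join stands in for the lexicographic prefix join of apriori-gen):

    from itertools import combinations
    from collections import defaultdict

    def apriori(transactions, min_support):
        transactions = [frozenset(t) for t in transactions]
        # F1: frequent items of size 1
        counts = defaultdict(int)
        for t in transactions:
            for item in t:
                counts[frozenset([item])] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent = dict(frequent)
        k = 1
        while frequent:
            # Join step: union pairs of frequent k-itemsets sharing k - 1 items
            candidates = set()
            fk = list(frequent)
            for i in range(len(fk)):
                for j in range(i + 1, len(fk)):
                    union = fk[i] | fk[j]
                    if len(union) == k + 1:
                        # Prune step: every k-subset must itself be frequent
                        if all(frozenset(s) in frequent
                               for s in combinations(union, k)):
                            candidates.add(union)
            # Scan the database and count the support of each candidate
            counts = defaultdict(int)
            for t in transactions:
                for c in candidates:
                    if c <= t:
                        counts[c] += 1
            frequent = {c: n for c, n in counts.items() if n >= min_support}
            all_frequent.update(frequent)
            k += 1
        return all_frequent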
Apriori achieves good performance by reducing the
size of candidate sets (Fig. 4). However, in situations with
very many frequent itemsets, large itemsets, or very low
minimum support, it still suffers from the cost of generating a
huge number of candidate sets and scanning the database
repeatedly to check a large set of candidate itemsets. In fact, it
is necessary to generate 2^100 candidate itemsets to obtain
frequent itemsets of size 100.

Fig. 4 Apriori algorithm


4.1 The impact of the algorithm
Many of the pattern-finding algorithms such as decision trees,
classification rules, and clustering techniques that are
frequently used in data mining have been developed in the
machine learning research community. Frequent pattern and
association rule mining is one of the few exceptions to this
tradition. The introduction of this technique boosted data
mining research and its impact is tremendous. The algorithm
is quite simple and easy to implement. Experimenting with an
Apriori-like algorithm is the first thing that data miners try to
do.
4.2 Current and further research
Since the Apriori algorithm was first introduced and as
experience has accumulated, there have been many attempts
to devise more efficient algorithms for frequent itemset
mining. Many of them share the same idea with Apriori in that
they generate candidates. These include the hash-based
technique, partitioning, sampling, and using a vertical data
format. The hash-based technique can reduce the size of
candidate itemsets. Each itemset is hashed into a corresponding
bucket by using an appropriate hash function. Since
a bucket can contain different itemsets, if its count is less than
the minimum support, the itemsets in that bucket can be
removed from the candidate sets. Partitioning can be used
to divide the entire mining problem into n smaller problems.
The dataset is divided into n non-overlapping partitions such
that each partition fits into main memory and each partition is
mined separately. Since any itemset that is potentially
frequent with respect to the entire dataset must occur as a
frequent itemset in at least one of the partitions, all the
frequent itemsets found this way are candidates, which can be
checked by accessing the entire dataset only once. Sampling
is simply to mine a randomly sampled small subset of the entire
data. Since there is no guarantee that we can find all the
frequent itemsets, the normal practice is to use a lower support
threshold. A trade-off has to be made between accuracy and
efficiency. Apriori uses a horizontal data format, i.e., frequent
itemsets are associated with each transaction. Using a vertical
data format means using a different format in which transaction
IDs (TIDs) are associated with each itemset. With this format,
mining can be performed by taking the intersection of TIDs.
The support count is simply the length of the TID set for the
itemset. There is no need to scan the database because the TID
set carries the complete information required for computing
support.
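
A tiny sketch of support counting under this vertical layout (the tid_lists mapping shown is hypothetical toy data):

    def vertical_support(tid_lists, itemset):
        # intersect the TID sets of the itemset's items; support = size of the intersection
        tids = set.intersection(*(tid_lists[item] for item in itemset))
        return len(tids)

    # hypothetical toy data: item -> set of transaction IDs
    tid_lists = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {1, 3}}
    print(vertical_support(tid_lists, ["a", "b"]))  # -> 2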
The most outstanding improvement over Apriori would be
a method called FP-growth (frequent pattern growth) that
succeeded in eliminating candidate generation [8]. It adopts a
divide and conquer strategy by (1) compressing the database
representing frequent items into a structure called FP-tree
(frequent pattern tree) that retains all the essential information
and (2) dividing the compressed database into a set of
conditional databases, each associated
with one frequent itemset and mining each one separately. It
scans the database only twice. In the first scan, all the frequent
items and their support counts (frequencies) are derived and
they are sorted in the order of descending support count in
each transaction. In the second scan, items in each transaction
are merged into a prefix tree and items (nodes) that appear in
common in different transactions are counted. Each node is
associated with an item and its count. Nodes with the same
label are linked by a pointer called node-link. Since items are
sorted in the descending order of frequency, nodes closer to
the root of the prefix tree are shared by more transactions,
thus resulting in a very compact representation that stores all
the necessary information. The pattern-growth algorithm
works on the FP-tree by choosing an item in the order of
increasing frequency and
extracting frequent itemsets that contain the
chosen item by recursively calling itself on the conditional
FP-tree. FP-growth is an order of magnitude faster than the
original Apriori algorithm.


V. CONCLUSION
Data mining is a broad area that integrates techniques from
several fields, including machine learning, statistics, pattern
recognition, artificial intelligence, and database systems, for
the analysis of large volumes of data. A large number of data
mining algorithms rooted in these fields have been developed
to perform different data analysis tasks.
Pairing k-means with hierarchical clustering makes the
solution less sensitive to initialization, and since the
hierarchical method provides results at multiple resolutions,
one does not need to pre-specify k; however, the k-means
algorithm is sensitive to the presence of outliers, since the
mean is not a robust statistic. The more sophisticated k-nearest
neighbor (kNN) classifier finds a group of k objects in the
training set that are closest to the test object and bases the
assignment of a label on the predominance of a particular
class in this neighborhood. kNN classification is an
easy-to-understand and easy-to-implement classification
technique, and despite its simplicity it can perform well in
many situations. Apriori is a seminal algorithm for finding
frequent itemsets using candidate generation. It is characterized
as a level-wise complete search algorithm using the
anti-monotonicity of itemsets. We hope this paper can inspire
more researchers in data mining to further explore these
algorithms, including their impact and new research issues.
REFERENCES
[1] Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th VLDB conference, pp 487–499
[2] Banerjee A, Merugu S, Dhillon I, Ghosh J (2005) Clustering with Bregman divergences. J Mach Learn Res 6:1705–1749
[3] Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inform Theory 13(1):21–27
[4] Fix E, Hodges JL, Jr (1951) Discriminatory analysis, nonparametric discrimination. USAF School of Aviation Medicine, Randolph Field, Tex., Project 21-49-004, Rept. 4, Contract AF41(128)-31, February 1951
[5] Han E (1999) Text categorization using weight adjusted k-nearest neighbor classification. PhD thesis, University of Minnesota, October 1999
[6] Gray RM, Neuhoff DL (1998) Quantization. IEEE Trans Inform Theory 44(6):2325–2384
[7] Hart P (1968) The condensed nearest neighbor rule. IEEE Trans Inform Theory 14:515–516
[8] Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 1–12
[9] Kuramochi M, Karypis G (2005) Gene classification using expression profiles: a feasibility study. Int J Artif Intell Tools 14(4):641–660
[10] Tan P-N, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson Addison-Wesley
[11] Toussaint GT (2002) Proximity graphs for nearest neighbor decision rules: recent progress. In: Interface-2002, 34th symposium on computing and statistics (theme: Geoscience and Remote Sensing). Ritz-Carlton Hotel, Montreal, Canada, 17–20 April 2002
[12] Toussaint GT (2002) Open problems in geometric methods for instance-based learning. JCDCG, pp 273–283
[13] Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2:408–420
