
International Journal of Engineering Research & Technology (IJERT)

ISSN: 2278-0181
Vol. 2 Issue 7, July - 2013

Analysis And Study Of K-Means Clustering Algorithm


Sudhir Singh and Nasib Singh Gill
Department of Computer Science & Applications
M. D. University, Rohtak, Haryana

Abstract

This paper studies the behavior of the K-means algorithm and attempts to overcome its limitations through a proposed algorithm. The standard K-means algorithm takes a long time when applied to a large database; the proposed clustering concept is therefore introduced to provide a quick and efficient clustering technique for large data sets. In this paper, the performance of the proposed algorithm is evaluated on the Max Hospital diabetic patient dataset.

Keywords: Clustering, K-means, Threshold, Outlier, Square Error.

1. Introduction

Clustering is the process of partitioning or grouping a given set of patterns into disjoint clusters, such that patterns in the same cluster are alike and patterns belonging to two different clusters are different. Clustering has been widely studied in a variety of application domains, and several algorithms have been proposed in the literature: CLARA, CLARANS [6], Focusing Techniques [4], P-CLUSTER [5], DBSCAN [3] and BIRCH [7]. The k-means method has been shown to be effective in producing good clustering results for many practical applications. However, a direct implementation of the k-means method requires time proportional to the product of the number of patterns and the number of clusters per iteration, which is computationally very expensive, especially for large datasets. We propose a novel algorithm for implementing the k-means method. It produces the same or comparable (up to round-off errors) clustering results as the direct k-means algorithm, with significantly better performance in most cases.

The rest of this paper is organized as follows. Section 2 reviews the k-means algorithm, Section 3 presents our proposed algorithm, Section 4 analyzes the time complexity of both algorithms, Section 5 describes the experimental results, and Section 6 concludes.

2. K-MEANS CLUSTERING

The K-means algorithm is one of the partitioning-based clustering algorithms [2]. The general objective is to obtain a fixed number of partitions/clusters that minimize the sum of squared Euclidean distances between objects and cluster centroids.

Let X = {x_i | i = 1, 2, ..., n} be a data set with n objects, let k be the number of clusters, and let m_j be the centroid of cluster c_j, where j = 1, 2, ..., k. The algorithm measures the distance between a data object and a centroid with the Euclidean distance formula [1]. For two points X and Y with N attributes (characteristics, in data mining terminology), it is defined by [5]:

d(X, Y) = \sqrt{ \sum_{i=1}^{N} (X_i - Y_i)^2 }    (1)

Equivalently, the distance between an object x_i and a centroid m_j is \sqrt{ \sum (x_i - m_j)^2 }.

Starting from an initial distribution of cluster centers in data space, each object is assigned to the cluster with the closest center, after which each center is updated as the center of mass of all objects belonging to that particular cluster. The procedure is repeated until convergence.
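As a small illustration, Equation (1) can be computed directly with NumPy; the two three-attribute objects below are hypothetical, and the function name is our own:

```python
import numpy as np

def euclidean_distance(x, y):
    """Equation (1): square root of the sum of squared attribute differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

# Two hypothetical objects with N = 3 attributes each.
print(euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```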


2.1. K-MEANS ALGORITHM [1]

INPUT:
// A set of n items to cluster.
D = {d1, d2, d3, ..., dn}
// A number of clusters (temporary clusters) chosen randomly, i.e. k.
// Below, K is the set of subsets of D forming the temporary clusters and C is the set of centroids of those clusters.
K = {k1, k2, k3, ..., kk},
C = {c1, c2, c3, ..., ck}
where k1 = {d1}, k2 = {d2}, k3 = {d3}, ..., kk = {dk}
and c1 = d1, c2 = d2, c3 = d3, ..., ck = dk
// here k <= n

OUTPUT:
// K is the set of subsets of D forming the final clusters and C is the set of centroids of these clusters.
K = {k1, k2, k3, ..., kk},
C = {c1, c2, c3, ..., ck}

Algorithm: K-means(D, K, C)
1. Arbitrarily choose k objects from D as the initial cluster centers.
2. Repeat:
3. (Re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster.
4. Update the cluster means, i.e., calculate the mean value of the objects for each cluster.
5. Until no change.
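To make steps 1-5 concrete, here is a minimal runnable sketch in Python/NumPy. It is our own illustration, not the paper's C# implementation, and all names (`kmeans`, `init`, etc.) are ours; the optional `init` argument, used in a later example, lets the caller fix the starting centroids.

```python
import numpy as np

def kmeans(D, k, init=None, max_iter=100, seed=None):
    """Direct K-means (steps 1-5 above). D is an (n, d) array of objects."""
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)
    # Step 1: arbitrarily choose k objects from D as the initial centers
    # (or accept caller-supplied centers via `init`).
    if init is None:
        C = D[rng.choice(len(D), size=k, replace=False)]
    else:
        C = np.asarray(init, dtype=float)
    for _ in range(max_iter):
        # Step 3: (re)assign each object to the nearest centroid, eq. (1).
        dists = np.linalg.norm(D[:, None, :] - C[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: update each centroid as the mean of its assigned objects
        # (an empty cluster keeps its old centroid).
        new_C = np.array([D[labels == j].mean(axis=0) if np.any(labels == j)
                          else C[j] for j in range(k)])
        # Step 5: stop when no centroid changes.
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C
```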

2.2. LIMITATIONS OF THE K-MEANS CLUSTERING ALGORITHM

A critical look at the available literature indicates the following shortcomings in existing K-means clustering algorithms [13]. (A small demonstration of limitation 2 follows this list.)

1. In partitioning-based K-means clustering algorithms, the number of clusters (k) needs to be determined beforehand.
2. The algorithm is sensitive to the initial seed selection (starting cluster centroids). Because of this it is susceptible to a local optimum and may miss the global optimum, i.e., it may converge to suboptimal solutions. A suboptimal classification may be found, requiring multiple runs with different initial conditions. The selection of spurious data points as a center may also leave a class with no data points, so that its center can never be updated.
3. It can model only spherical clusters; non-convex cluster shapes cannot be modeled by center-based clustering.
4. It is sensitive to outliers, since even a small number of outliers can substantially influence a mean value.
5. Because of the iterative scheme, the algorithm begins at starting cluster centroids and iteratively updates them to decrease the square error, but the number of iterations required is not known in advance, which is problematic for large data sets. It may take a huge number of iterations to converge; this number cannot be determined beforehand and may change from run to run. Results may also be poor with high-dimensional data.
6. It cannot be used for clustering problems whose results cannot fit in main memory, which is the case when the data set has very high dimensionality or the desired number of clusters is too big.
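As referenced above, a small hypothetical example of limitation 2, reusing the `kmeans` sketch from Section 2.1: four points at the corners of a tall rectangle, where a poor seeding converges to a stable but clearly worse split.

```python
import numpy as np

# Four hypothetical points at the corners of a 2 x 10 rectangle.
pts = np.array([[0., 0.], [0., 10.], [2., 0.], [2., 10.]])

# Seeding on the vertical midline splits the points by x-coordinate;
# the assignment is already stable, so K-means stops with SSE = 100.
labels, C = kmeans(pts, k=2, init=[[0., 5.], [2., 5.]])

# Seeding on the horizontal midline splits the points by y-coordinate,
# the global optimum here, with SSE = 4.
labels, C = kmeans(pts, k=2, init=[[1., 0.], [1., 10.]])
```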


3. PROPOSED CLUSTERING ALGORITHM

Input: // A set D of n objects to cluster and a threshold value Tth.
D = {d1, d2, d3, ..., dn}, Tth

Output: // A set K of k subsets of D as final clusters and a set C of centroids of these clusters.
K = {k1, k2, k3, ..., kk},
C = {c1, c2, c3, ..., ck}

Algorithm: Proposed-cluster(D, Tth)
1. Let k = 1.
2. Randomly choose an object from D; let it be p, and set k1 = {p}.
3. K = {k1}
4. c1 = p
5. C = {c1}
6. Assign a constant value to Tth.
7. For l = 2 to n do:
8.   Choose the next random point from D, other than the already chosen points; let it be q.
9.   Determine m, the index of the centroid cm (1 <= m <= k) in C whose distance to q is minimum, using eq. (1).
10.  If (distance <= Tth) then
11.    km = km ∪ {q}
12.    Calculate the new mean (centroid cm) for cluster km using eq. (2).
13.  Else k = k + 1
14.    kk = {q}
15.    K = K ∪ {kk}
16.    ck = q
17.    C = C ∪ {ck}
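A minimal runnable sketch of steps 1-17, again in Python/NumPy and under the same caveat (our own names, not the paper's implementation). Note the single pass over the data: the loop body runs exactly once per object, so the number of iterations is known in advance (advantage 4 in Section 3.1).

```python
import numpy as np

def threshold_cluster(D, t_th, seed=None):
    """Single-pass threshold clustering (steps 1-17 above)."""
    rng = np.random.default_rng(seed)
    D = np.asarray(D, dtype=float)
    order = rng.permutation(len(D))     # steps 2 and 8: visit objects randomly
    clusters = [[order[0]]]             # step 2: k1 = {p}
    C = [D[order[0]].copy()]            # steps 4-5: c1 = p
    for i in order[1:]:                 # step 7: the remaining n-1 objects
        q = D[i]
        # Step 9: nearest centroid under eq. (1).
        dists = np.linalg.norm(np.asarray(C) - q, axis=1)
        m = int(dists.argmin())
        if dists[m] <= t_th:            # step 10
            clusters[m].append(i)       # step 11: km = km U {q}
            C[m] = D[clusters[m]].mean(axis=0)   # step 12: new centroid
        else:                           # steps 13-17: open a new cluster
            clusters.append([i])
            C.append(q.copy())
    return clusters, np.asarray(C)
```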
3.1. ADVANTAGES OF THE PROPOSED CLUSTERING

A look at the available literature indicates the following advantages of the proposed clustering over the K-means clustering algorithm.

1. In K-means clustering algorithms the number of clusters (k) needs to be determined beforehand, but the proposed clustering algorithm does not require this: it generates the number of clusters automatically.
2. K-means depends upon the initial selection of cluster points, so it is susceptible to a local optimum and may miss the global optimum. The proposed clustering algorithm improves the chances of finding the global optimum.
3. K-means is sensitive to outliers, since a small number of outliers can substantially influence a mean value. In the proposed clustering algorithm outliers cannot influence the mean value; they can be easily identified and removed (if desired).
4. In K-means the number of iterations is not known in advance, but in the proposed clustering it is known.
5. Data are stored in secondary memory and data objects are transferred to main memory one at a time for clustering. Only the cluster representations, i.e. the centroids, are stored permanently in main memory to alleviate space limitations, so the space requirement of the proposed algorithm is very small: only the centroids of the clusters are needed. K-means requires more memory, since each object is stored permanently in memory along with the centroids.

4. TIME COMPLEXITY

4.1 TIME COMPLEXITY OF THE K-MEANS CLUSTERING ALGORITHM [1]

To calculate the running time of the K-means algorithm it is necessary to know the number of times each statement runs and the cost of running it. Sometimes the number of steps is not known, so it has to be assumed. For example, let the number of times the first statement runs, with cost m_1, be q (q >= 1). For each of the q iterations, the next statement, for i = 1, 2, ..., n where n is the number of data objects, runs n+1 times with cost m_2. For each q and for each n, the next statement runs k+1 times, where k is the number of clusters, with cost m_3. The 4th statement runs one time for each q and for each n with cost m_4. Calculating the new mean for each cluster requires k+1 runs for each q with cost m_5.

The running time of the algorithm is the sum of the running times of each statement executed, i.e.

T(n) = m_1 q + m_2 \sum_{j=1}^{q} (n+1) + m_3 \sum_{j=1}^{q} \sum_{i=1}^{n} (k+1) + m_4 \sum_{j=1}^{q} \sum_{i=1}^{n} 1 + m_5 \sum_{j=1}^{q} (k+1)

     = m_1 q + m_2 q (n+1) + m_3 q n (k+1) + m_4 q n + m_5 q (k+1)

     = m_1 q + m_2 q n + m_2 q + m_3 q n k + m_3 q n + m_4 q n + m_5 q k + m_5 q


     = (m_1 + m_2 + m_5) q + (m_2 + m_3 + m_4) q n + m_3 q n k.

In the worst case this is O(n^i) with 2 <= i < 3; in the best case it is O(n); in the average case it is O(n^2).

4.2 TIME COMPLEXITY OF THE PROPOSED CLUSTERING ALGORITHM

The time taken by an algorithm depends on the input data set: clustering a thousand data objects takes longer than clustering one object, and K-means and the proposed algorithm take different amounts of time to cluster the same data objects. In general, the time taken by an algorithm grows with the size of the input, so it is traditional to describe the running time of a program as a function of the size of its input. To do so, the terms "running time" and "size of input" must be defined more carefully. The most natural measure of input size is the number of objects in the input, represented here by n. The running time of an algorithm on a particular input is the number of primitive operations or "steps" executed, defined so as to be as machine-independent as possible: a constant amount of time is required to execute each line of the algorithm, and although one line may take a different amount of time than another, it is assumed that each execution of the ith line takes time m_i, where m_i is a constant. In the following discussion, the expression for the running time of both algorithms evolves from a messy formula that uses all the statement costs m_i to a much simpler notation that is concise and more easily manipulated. This simpler notation makes it easy to determine whether one algorithm is more efficient than another.

In the proposed clustering algorithm, as in incremental K-means, the number of times each statement runs is known. The 1st, 2nd, 3rd, 4th, 5th and 6th statements run one time only, with costs m_1, m_2, m_3, m_4, m_5 and m_6 respectively. The next statement, for i = 2, 3, ..., n, runs n times with cost m_7, where n is the number of data objects. The 8th statement finds the next random object to cluster, with cost m_8 (q runs in total). The 9th statement scans the centroid of each cluster with cost m_9, so it runs k+1 times, where k is the number of clusters. The rest of the statements form the if-then-else body, which runs n-1 times with condition cost m_10: let the if-part (steps 11-12) run r times with costs m_11 and m_12, and the else-part (steps 13-17) run n-1-r times with costs m_13, m_14, m_15, m_16 and m_17.

The running time of the algorithm is the sum of the running times of each statement executed, i.e.

T(n) = m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 n + m_8 q + m_9 \sum_{i=2}^{n} (k+1) + m_10 (n-1) + (m_11 + m_12) r + (m_13 + m_14 + m_15 + m_16 + m_17)(n-1-r)

     = m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + (m_7 + m_10 + m_13 + m_14 + m_15 + m_16 + m_17) n - (m_10 + m_13 + m_14 + m_15 + m_16 + m_17) + (m_11 + m_12 - m_13 - m_14 - m_15 - m_16 - m_17) r + m_9 \sum_{i=2}^{n} (k+1) + m_8 q.

For the worst case, let k increase with i (a new cluster formed at almost every step); then

\sum_{i=2}^{n} (k+1) = 2 + 3 + ... + n = n(n+1)/2 - 1

so T(n) = ... + m_9 (n(n+1)/2 - 1) + m_8 q, giving T(n) = O(n^2).

For the best case, let k = 1 for 2 <= i <= n; then \sum_{i=2}^{n} (k+1) = 2(n-1), so T(n) = O(n).

For the average case it will be O(n^i) with 1 <= i <= 2.

Table 1: Comparison of the algorithms' running times

Name of algorithm  | Worst case          | Average case        | Best case
K-means            | O(n^i), 2 <= i < 3  | O(n^2)              | O(n)
Proposed algorithm | O(n^2)              | O(n^i), 1 <= i <= 2 | O(n)
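As a compact restatement of the two derivations above (a summary sketch, not an additional result), the dominant terms can be written as:

```latex
% K-means: q passes over the data, each computing about n(k+1) distances,
% so the m_3 q n k term dominates:
T_{\text{K-means}}(n) = \Theta(q\,n\,k).
% Proposed: one pass; object i is compared with the k_i + 1 centroids
% existing at step i, giving
T_{\text{proposed}}(n) = \Theta\Big(\sum_{i=2}^{n}(k_i + 1)\Big),
\qquad 2(n-1) \le \sum_{i=2}^{n}(k_i + 1) \le \frac{n(n+1)}{2} - 1.
```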


5. Experimental Results

The proposed algorithm was implemented in Visual Studio 2008 (.NET) using the C# language, with Microsoft SQL Server 2008 as the backend. We evaluated the algorithm on the Max Hospital data set of diabetic patients. All the experimental results reported were obtained on an Intel Core i3 with a 3.0 GHz processor clock and 4 GB of memory, running Windows 7 Home Basic.

Table 2: Experimental results obtained by the proposed algorithm

Test case | Threshold value | Square error ×100 | Min. no. of objects in a cluster | No. of objects as outliers | No. of clusters formed
1 | 12 | 17.57 | 2 | 2  | 9
1 | 11 | 15.18 | 2 | 1  | 11
1 | 10 | 9.14  | 2 | 4  | 12
1 | 9  | 7.64  | 2 | 3  | 13
1 | 8  | 6.22  | 2 | 6  | 12
1 | 7  | 4.84  | 2 | 8  | 12
1 | 6  | 3.78  | 2 | 11 | 12
2 | 12 | 17.2  | 3 | 6  | 7
2 | 11 | 14.79 | 3 | 7  | 8
2 | 10 | 8.42  | 3 | 12 | 8
2 | 9  | 6.9   | 3 | 11 | 9
2 | 8  | 5.58  | 3 | 14 | 8
2 | 7  | 4.35  | 3 | 14 | 9
2 | 6  | 3.56  | 3 | 15 | 10
3 | 12 | 17.21 | 4 | 6  | 7
3 | 11 | 14.13 | 4 | 10 | 7
3 | 10 | 7.49  | 4 | 18 | 6
3 | 9  | 5.8   | 4 | 20 | 6
3 | 8  | 5.32  | 4 | 17 | 7
3 | 7  | 3.92  | 4 | 20 | 7
3 | 6  | 2.78  | 4 | 27 | 6

The table shows three test cases with a minimum number of objects per cluster of 2, 3 and 4 respectively, with the threshold value varying from 6 to 12 within each test case. For different threshold values we obtained different values of the square error, the number of objects flagged as outliers, and the number of clusters formed.

Figure 1: Graph for test case 1 (threshold value, number of clusters formed, and square error ×100).

Figure 2: Graph for test case 2 (threshold value, number of clusters formed, and square error ×100).
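A sweep in the spirit of Table 2 can be imitated on synthetic data with the `threshold_cluster` sketch from Section 3. The synthetic dataset, the minimum-cluster-size rule for flagging outliers, and the error accounting below are our assumptions; the paper's exact protocol on the hospital data is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the patient data: 200 objects in 4 loose groups.
data = rng.normal(scale=3.0, size=(200, 2)) + rng.integers(0, 4, (200, 1)) * 10.0

min_size = 2                      # as in test case 1
for t_th in range(12, 5, -1):     # threshold swept from 12 down to 6
    clusters, C = threshold_cluster(data, t_th, seed=0)
    kept = [(j, c) for j, c in enumerate(clusters) if len(c) >= min_size]
    outliers = sum(len(c) for c in clusters if len(c) < min_size)
    sq_err = sum(((data[c] - C[j]) ** 2).sum() for j, c in kept)
    print(t_th, len(kept), outliers, round(sq_err, 2))
```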


Figure 3: Graph for test case 3 (threshold value, number of clusters formed, and square error ×100).

The graphs above show that:

1. As the threshold value decreases, the square error decreases. A lower square error means more compact and better-separated clusters, so decreasing the threshold value increases cluster quality.
2. As the threshold value decreases, the number of clusters formed increases.
3. As the threshold value decreases, the number of objects flagged as outliers increases.

6. Conclusions

In this paper we presented an algorithm for performing K-means-style clustering. Our experimental results demonstrate that our scheme can improve on the direct K-means algorithm. This paper also analyzes the time complexity of K-means and of our proposed algorithm.

Several improvements to the basic strategy presented in this paper are possible. One approach would be to use the concept of the Nearest Neighbour Clustering Algorithm to improve the compactness of clusters.

7. References

1. Han, J. & Kamber, M. (2012). Data Mining: Concepts and Techniques. 3rd ed. Boston: Morgan Kaufmann Publishers.
2. Sudhir Singh, Nasib Singh Gill. Comparative Study of Different Data Mining Techniques: A Review. IJLTEMAS (www.ijltemas.in), Volume II, Issue IV, April 2013, ISSN 2278-2540.
3. M. Ester, H. Kriegel, J. Sander, and X. Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. of the 2nd Int'l Conf. on Knowledge Discovery and Data Mining, August 1996.
4. M. Ester, H. Kriegel, and X. Xu. Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. Proc. of the Fourth Int'l Symposium on Large Spatial Databases, 1995.
5. D. Judd, P. McKinley, and A. Jain. Large-Scale Parallel Data Clustering. Proc. of the Int'l Conference on Pattern Recognition, August 1996.
6. R. T. Ng and J. Han. Efficient and Effective Clustering Methods for Spatial Data Mining. Proc. of the 20th Int'l Conference on Very Large Databases, Santiago, Chile, pages 144–155, 1994.
7. T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: An Efficient Data Clustering Method for Very Large Databases. Proc. of the 1996 ACM SIGMOD Int'l Conf. on Management of Data, Montreal, Canada, pages 103–114, June 1996.
8. Sanjay Chakraborty, N. K. Nagwani. Performance Evaluation of Incremental K-means Clustering Algorithm. National Institute of Technology (NIT) Raipur, CG, India. IIJDWM, journal homepage: www.ifrsa.org.
9. Performance Analysis of Partitional and Incremental Clustering. Seminar Nasional Aplikasi Teknologi Informasi 2005 (SNATI 2005), ISBN 979-756-061-6, Yogyakarta, 18 June 2005.
10. Performance Evaluation of Incremental K-means Clustering Algorithm. IFRSA International Journal of Data Warehousing & Mining, Vol. 1, Issue 1, Aug 2011.
11. M. H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall, 2002.
12. R. C. Dubes, A. K. Jain. Algorithms for Clustering Data. Prentice Hall, 1988.
13. Stéphane Tufféry. Data Mining and Statistics for Decision Making. Wiley, page 251.
