Abstract— Information is one of the most important things in our lives, yet humans are naturally impatient when searching for information on the Internet: users want the right answer instantaneously, with minimal effort. News headlines can be used to categorize news types appropriately, and an appropriate news type makes it easier to choose a particular topic. Similarity between titles can be used to cluster news based on the news title. For that reason, the dataset of this research contains titles from online news sites. TF-IDF is used as the document preprocessing method, K-Means as the clustering method, and the Elbow method to optimize the number of clusters. The purity method is applied as an internal evaluation of the news title clustering. The SSE (Sum of Squared Errors) for each number of clusters is calculated and compared to optimize the number of clusters in the Elbow method, and the result of that comparison is evaluated using the internal evaluation called purity, where the purity value measures the conformity between a cluster and the ideal cluster. From the calculation of the Elbow method, the most optimal number of clusters is 8: there is a gap of 0.228 between the SSE values of 7 and 8 clusters, so the elbow shape is formed. The purity evaluation generates a value of 0.514 at 8 clusters, the highest value and the one closest to one among all tested numbers of clusters, which means it is the most ideal. The conclusion is that the Elbow method can be used to optimize the number of clusters in the K-Means clustering method.

Keywords—Clustering; K-Means; Elbow Method; Purity; TF-IDF
I. INTRODUCTION

Thousands and even millions of news items are generated by various news sites every day. Political, economic, socio-cultural, and entertainment news is continuously published through the Internet, while humans are naturally impatient when searching for information on the Internet; users want to get the right answer instantaneously with minimum effort [1].

In many applications, the title is the first thing users notice. Although it is only a word or a phrase, it can dramatically change the overall message of the content. A proper title is very important; therefore it should be descriptive, concise, and grammatically correct [2].

News headlines can be used to categorize news types, and an appropriate news type makes it easier for users to choose the particular topic they want. Grouping news by title can be done by looking at similarities between titles [3].

The three problems in the clustering process are: (i) determining the measure of similarity between different elements, (ii) applying efficient algorithms to find the groups of elements that are most similar in an unsupervised way, and (iii) obtaining descriptions that can characterize the elements of a cluster [4].

K-means is a reliable algorithm for the clustering process. The headline data are clustered so that appropriate classes are obtained. The problem that often arises is the determination of the number of clusters: the right number of news clusters will show the maximum resemblance within each generated class. The Elbow method can be a principled method to determine the exact value for the number of clusters, and the purity value is used to evaluate the result of the method, with the aim of producing the expected number of clusters.
II. RELATED WORKS

K-Means is a popular and simple clustering technique, but its results depend on the chosen cluster centers, so it can easily get stuck in a local optimum: since the cluster centers are randomly selected, they may be poorly chosen. A study by Aditi Anand Shetkar et al. proposed the K-means++ algorithm to solve this problem by spreading the cluster centers evenly. In K-means++, the first cluster center is selected randomly, and each subsequent center is chosen based on a computed probability. K-means++ gives better results than K-means [5].

Ahmad Izzuddin et al. used Principal Component Analysis (PCA) to reduce lecturer performance data before the k-means algorithm was applied, which proved effective in improving model quality. The study used the Davies-Bouldin Index (DBI) to measure cluster validity; the result showed that the DBI of the combined PCA and k-means approach is smaller than that of conventional k-means [6].

Ni Putu Eka Merliana et al. used the Elbow method to determine the best number of clusters in the k-means algorithm. First, the SSE value of each candidate number of clusters is calculated. Second, when the SSE value has dropped drastically and subsequently shows no significant change, the current point is taken as the elbow; that point is the optimal k value [7].

The cluster validity of the k-means and fuzzy c-means algorithms has been measured using purity and entropy. Entropy uses external class information, measured with reference to the class labels: the lower the entropy, the better the clustering, and entropy grows when the true classes of the objects within a cluster are more diverse, so a higher entropy means a worse cluster; the amount of disorder is found using entropy [8]. Satya Chaitanya Sripada et al. mentioned that a higher purity value indicates a good cluster, whereas entropy is an inverted measurement: the lower the entropy value, the better the clustering result.

Budi Santoso et al. used a Genetic Algorithm (GA) to optimize the initial cluster centers of the k-means algorithm. The k-means algorithm was applied to lecturer performance data from the Faculty of Computer Science of Brawijaya University in 2016. The result showed that the GA-K-means algorithm achieved a cluster quality 2.74% higher than the k-means algorithm without the Genetic Algorithm, where the cluster quality was measured using the Silhouette Coefficient method [9].

Kamaljit Kaur et al. compared the K-means method with Median-based K-means on datasets from the UCI machine learning repository, in which the Median-based k-means was used to select the initial centroids. The result showed that determining the initial centroids first is better than random selection [10].
III. METHOD

A. Proposed System
Optimizing the number of clusters of the K-Means method is the main purpose of this study. The Elbow method is chosen to determine the number of clusters, while purity is used to evaluate the result.
Figure 1. The proposed system: news titles are preprocessed (tokenization, stop word removal, term weighting, principal component analysis); k-means is run and the SSE is calculated for increasing numbers of clusters until the elbow is formed; the Elbow method determines the number of clusters and the result is evaluated using purity.

The dataset contains news titles collected from online news sites. The data are preprocessed in four steps: first tokenization, then stop word removal and TF-IDF term weighting, and finally dimensionality reduction using principal component analysis.

The K-means clustering method is applied with the number of clusters ranging from 2 to 10, while the SSE (Sum of Squared Errors) is calculated and recorded for each run. The SSE values are plotted on a graph to determine the corner of the elbow. Those results are then evaluated using purity, and each number of clusters generates its own purity value.

B. K-means
K-means is a simple unsupervised learning algorithm used to group data based on the Euclidean distance between data points [11]. K-means is a fast and simple clustering method with a small number of iterations. The algorithm divides the data into k sections, where the number of clusters is chosen by the user. The computer randomly selects objects and assigns them to one of the k clusters; the distance between each object and the center of each cluster is then calculated, resulting in an optimal cluster solution in which objects within a particular cluster are adjacent to each other [12].
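To make this step concrete, the following minimal sketch clusters a handful of invented titles with scikit-learn's KMeans on TF-IDF vectors; the titles, the choice of scikit-learn, and n_clusters=2 are illustrative assumptions, not taken from the paper's dataset.

```python
# A minimal sketch of the k-means step, assuming the titles have
# already been vectorized (here with TF-IDF). The titles are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

titles = [
    "election results announced today",
    "stock market falls sharply",
    "new football season kicks off",
    "parliament debates new budget",
]

X = TfidfVectorizer().fit_transform(titles)            # document-term matrix
km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

print(km.labels_)   # cluster assignment of each title
print(km.inertia_)  # SSE: sum of squared distances to the closest center
```

The km.inertia_ attribute is the SSE quantity recorded later for the elbow analysis.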
C. Preprocessing
1) Tokenize
Tokenization is the process of dividing a sentence into words by omitting commas, spaces, and special symbols in the sentence. This process generates tokens [5].

2) Stop word removal
Prepositions, articles, pronouns, etc. are the most common words in text documents and do not give any meaning to the document. These words are eliminated, as they are not required for text mining applications [13].

3) Weighting
A weighting step is done by calculating the frequency of terms in the documents; one of the methods used is TF-IDF. After the preprocessing of the documents, each document in the dataset is represented as an N-dimensional vector in the term space, where N denotes the number of words/terms. The document vectors are subjected to a standard weighting scheme such as Term Frequency-Inverse Document Frequency (TF-IDF). We need to calculate the Term Frequency (TF), the Inverse Document Frequency (IDF), and finally TF * IDF, i.e. the product of TF and IDF:

TF-IDF = TF * IDF
TF = K / T, where K = the number of occurrences of a particular word in document d and T = the number of words in document d.
IDF = D / DF, where D = the number of documents in the dataset and DF = the number of documents containing the particular word [5].
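As an illustration, the sketch below implements the three preprocessing steps exactly as defined above, including the paper's non-logarithmic IDF = D/DF (the more common TF-IDF variant uses log(D/DF)); the stop word list and sample sentences are toy assumptions.

```python
# A sketch of the preprocessing chain: tokenization, stop word removal,
# then TF = K/T and IDF = D/DF as defined in the paper (no logarithm).
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "it"}  # toy list

def tokenize(sentence):
    # split on non-letter characters, dropping commas, spaces, symbols
    return [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]

def preprocess(sentence):
    return [t for t in tokenize(sentence) if t not in STOP_WORDS]

def tf_idf(raw_docs):
    docs = [preprocess(d) for d in raw_docs]
    D = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency per word
    weights = []
    for d in docs:
        counts = Counter(d)                         # K: occurrences per word
        T = len(d)                                  # T: words in this document
        weights.append({w: (k / T) * (D / df[w]) for w, k in counts.items()})
    return weights

print(tf_idf(["The election results announced", "Election of the new budget"]))
```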
D. Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a technique for constructing new variables that are linear combinations of the original variables. The maximum number of these new variables equals the number of the old variables, and the new variables are uncorrelated with each other [6]. The algorithm is used to reduce data consisting of several ratios into a few indexes, each of which is a linear combination of all the initial ratios [14].
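A minimal sketch of this reduction step follows, using scikit-learn's PCA on a random stand-in matrix, since the paper's actual weighted vectors are not available; the 1000 x 2000 input shape is an assumption, while the 500-component target mirrors the reduction reported in the results section.

```python
# A sketch of the PCA reduction of the weighted document vectors.
# The input matrix is a random stand-in for 1000 weighted title vectors.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((1000, 2000))                 # stand-in TF-IDF-weighted data

pca = PCA(n_components=500)                  # 500 components, as in the paper
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                       # (1000, 500)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```

With a sparse TF-IDF matrix, scikit-learn's TruncatedSVD is the usual drop-in substitute, since dense PCA requires centering the data.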
E. Elbow method
This method looks at the percentage of variance explained as a function of the number of clusters. It is based on the idea that there is an optimal number of clusters for the k-means algorithm, beyond which adding another cluster does not contribute significantly [15]. The value of k is increased one by one, and the Sum of Squared Errors (SSE) is recorded for each value.
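The procedure can be sketched as follows: run k-means for each candidate k, record the SSE (inertia in scikit-learn), and look for the point where the drop between consecutive k values flattens out. The random stand-in data is an assumption; the k range of 2 to 10 is taken from the paper's experiment.

```python
# A sketch of the elbow procedure: record SSE for k = 2..10 and
# inspect where the decrease in SSE stops being significant.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((1000, 50))            # stand-in for the reduced title vectors

sse = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    sse[k] = km.inertia_              # SSE for this number of clusters

# the elbow is where the SSE drop between k and k+1 flattens out
drops = {k: sse[k] - sse[k + 1] for k in range(2, 10)}
print(sse)
print(drops)
```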
IV. RESULT

B. Stop word removal
The next process is stop word removal, which removes prepositions, articles, pronouns, etc.

C. Term weighting
The next step is term weighting, applied to the data, which is then optimized using the PCA technique so that the result is 500 data dimensions.

From the above processes, the data are processed using the k-means algorithm, starting from n = 2 up to n = 10. In this process, the SSE value of each run is calculated and recorded.

The next step is to compare the optimal cluster number obtained with the Elbow method against the purity calculation for each value of n. Purity is the sum, over all clusters, of the maximum number of members of a single class in each cluster, divided by the total amount of data, using the class labels as categories:

purity(Ω, C) = (1/N) Σ_k max_j |ω_k ∩ c_j|

where ω_k is the set of members of cluster k, c_j is the set of members of class j, and N is the total number of data points. In this study, the data processed amounted to 1000 titles with 4 labels, namely a, b, c, and d, so the purity value is the sum of the maximum value of each cluster divided by 1000. The purity values for n = 2 to n = 10 are shown in Table XI below.
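A small sketch of this purity computation is given below; the helper name and the toy labels are assumptions. Applied to the counts in Table II below, it reproduces the reported value: (66 + 250) / 1000 = 0.316.

```python
# A sketch of the purity evaluation: for each cluster, take the largest
# single-class count, sum these maxima, and divide by N.
from collections import Counter

def purity(labels_true, labels_pred):
    # group the true class labels by predicted cluster
    clusters = {}
    for true, pred in zip(labels_true, labels_pred):
        clusters.setdefault(pred, []).append(true)
    # majority-class count per cluster, summed, divided by N
    return sum(max(Counter(m).values()) for m in clusters.values()) / len(labels_true)

# toy check: 4 points, 2 clusters; cluster 0 is pure, cluster 1 is mixed
print(purity(["a", "a", "b", "c"], [0, 0, 1, 1]))  # (2 + 1) / 4 = 0.75
```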
TABLE II. DATA CLUSTER N=2
Category    a    b    c    d
Cluster 1   0    0    66   0
Cluster 2   250  250  184  250

With the number of clusters equal to 2, most of the data from every category is placed into cluster 2; only 66 titles from category c become members of cluster 1. The maximum value in cluster 1 is 66 and in cluster 2 it is 250, which yields a purity value of (66 + 250) / 1000 = 0.316.

Furthermore, the values n = 3 to n = 7 produce purity values ranging from 0.409 to 0.438, still lower than the purity value at n = 8, which is 0.514. The successive clustering results for n = 3 to n = 7 are shown in the following tables.

TABLE III. DATA CLUSTER N=3
Category    a    b    c    d

TABLE VI. DATA CLUSTER N=6
Category    a    b    c    d
Cluster 1   0    30   0    0
Cluster 2   0    0    50   0
Cluster 3   248  140  172  247
Cluster 4   0    0    28   0
Cluster 5   0    54   0    0
Cluster 6   2    26   0    3

TABLE VII. DATA CLUSTER N=7
Category    a    b    c    d
Cluster 1   0    30   0    0
Cluster 2   248  140  170  247
Cluster 3   0    0    40   0
Cluster 4   0    0    22   0
Cluster 5   0    0    18   0
Cluster 6   0    54   0    0
Cluster 7   2    26   0    3

The clustering result for n = 8, shown in Table VIII, produces a purity value of 0.514, which is the vertex of the elbow; based on the Elbow method, the value at this point should be the highest. This is supported by the value at n = 9, where the purity value decreases to 0.503, as shown in Table IX.
TABLE IX. DATA CLUSTER N=9
Category    a    b    c    d
Cluster 1   0    29   0    0
Cluster 2   0    29   0    2
Cluster 3   0    1    9    0
Cluster 4   0    0    39   0
Cluster 5   0    0    20   0
Cluster 6   0    0    18   0
Cluster 7   0    52   0    0
Cluster 8   20   39   22   77
Cluster 9   230  100  142  171

The maximum purity value is 0.514, which is generated at n = 8 and n = 10: the purity value increases back to 0.514 at n = 10, whose clustering result can be seen in Table X.
TABLE X. DATA CLUSTER N=10
Category    a    b    c    d
Cluster 1   0    21   0    0
Cluster 2   0    0    18   0
Cluster 3   0    32   0    0
Cluster 4   19   47   23   89
Cluster 5   229  92   141  158
Cluster 6   0    0    20   0
Cluster 7   0    1    9    0
Cluster 8   0    38   0    0
Cluster 9   0    0    39   0
Cluster 10  2    19   0    3
TABLE XI. PURITY
k value   purity
2         0.316
3         0.409
4         0.422
5         0.433
6         0.436
7         0.438
8         0.514
9         0.503
10        0.514

The clustering processes using the k-means algorithm resulted in the groupings shown in the tables above. Table XI displays the purity calculation result for each specified value of n.

Figure 2. SSE and Purity

In Figure 2, the SSE and purity curves intersect at the value n = 8, where the SSE value is 0.427 and the purity reaches its maximum of 0.514.

V. CONCLUSION

The Elbow method is used for determining the number of clusters in the k-means algorithm adopted in this study. The optimal point for determining the k value is the point at which there is a significant change in the SSE value, so that an angle is formed. The experiment was conducted on news headline data and evaluated internally through purity. The purity test results showed that the number of clusters selected by the Elbow method is the same as the best result of the internal purity evaluation measurement.
REFERENCES
[1] N. Gali, R. Mariescu-Istodor, and P. Fränti, "Using linguistic features to automatically extract web page title," Expert Syst. Appl., vol. 79, pp. 296–312, 2017.
[2] H. Yunhua et al., "Title Extraction from Bodies of HTML Documents and its Application to Web Page Retrieval," Res. Dev. Inf. Retr., pp. 250–257, 2005.
[3] I. Blokh and V. Alexandrov, "News clustering based on similarity analysis," Procedia Comput. Sci., vol. 122, pp. 715–719, 2017.
[4] A. Ahmad and L. Dey, "A k-mean clustering algorithm for mixed numeric and categorical data," Data Knowl. Eng., vol. 63, no. 2, pp. 503–527, 2007.
[5] A. A. Shetkar and S. Fernandes, "Text Categorization of Documents using K-Means and K-Means++ Clustering Algorithm," Int. J. Recent Innov. Trends Comput. Commun., vol. 4, pp. 485–489, Jun. 2016.
[6] A. Izzuddin, "Optimasi Cluster pada Algoritma K-Means dengan Reduksi Dimensi Dataset Menggunakan Principal Component Analysis untuk Pemetaan Kinerja Dosen," Energy J. Ilm. Ilmu-Ilmu Tek., vol. 5, no. 2, pp. 41–46, 2015.
[7] N. P. E. Merliana et al., "Analisa Penentuan Jumlah Cluster Terbaik Pada Metode K-Means," Semin. Nas. Multi Disiplin Ilmu & Call Papers Unisbank, pp. 978–979.
[8] S. C. Sripada, "Comparison of Purity and Entropy of K-Means Clustering and Fuzzy C Means Clustering," Indian J. Comput. Sci. Eng., vol. 2, no. 3, pp. 343–346, 2011.
[9] B. Santoso, I. Cholissodin, and B. D. Setiawan, "Optimasi K-Means untuk Clustering Kinerja Akademik Dosen Menggunakan Algoritme Genetika," J. Pengemb. Teknol. Inf. dan Ilmu Komput., vol. 1, no. 12, pp. 1652–1659, 2017.
[10] K. Kaur, D. Singh Dhaliwal, and R. Kumar Vohra, "Statistically Refining the Initial Points for K-Means Clustering Algorithm," Int. J. Adv. Res. Comput. Eng. Technol., vol. 2, no. 11, pp. 2278–1323, 2013.
[11] S. S. Jamadar and P. D. Y. Loni, "Efficient Cluster Head Selection Method Based On K-means Algorithm to Maximize Energy of Wireless Sensor Networks," Int. Res. J. Eng. Technol., vol. 3, no. 8, pp. 1579–1583, Aug. 2016.
[12] M. Anusha, "An Enhanced K-Means Genetic Algorithms for Optimal Clustering," Int. Conf. Comput. Intell. Comput. Res., vol. 14, pp. 1–13, 2016.
[13] M. Raghuvanshi and R. Patel, "An Improved Document Clustering with Multiview Point Similarity/Dissimilarity measures," Int. J. Eng. Comput. Sci., vol. 6, no. 2, pp. 20285–20288, 2017.
[14] S. Sharma, Applied Multivariate Techniques. John Wiley & Sons, Inc., 1996.
[15] P. Bholowalia and A. Kumar, "EBK-Means: A Clustering Technique based on Elbow Method and K-Means in WSN," Int. J. Comput. Appl., vol. 105, no. 9, pp. 17–24, 2014.
[16] E. Muningsih, "Optimasi jumlah cluster k-means dengan metode elbow untuk pemetaan pelanggan," Pros. Semin. Nas. ELINVO, pp. 105–114, Sep. 2017.
[17] H. Jain and R. Grover, "Clustering Analysis with Purity Calculation of Text and SQL Data using K-means Clustering Algorithm," IJAPRR, vol. IV, pp. 47–58, 2017.