Ijcset 2016060701

The document discusses and compares simple k-means clustering and modified k-means clustering techniques. It provides an overview of the simple k-means clustering algorithm and partitioning method. It also describes the WEKA tool and how it can be used to perform simple k-means clustering on sample medical data.

Saroj et al | IJCSET (www.ijcset.net) | July 2016 | Vol 6, Issue 7, 279-281

Review: Study on Simple k Mean and Modified K Mean Clustering Technique
Saroj
Student of Masters of Technology,
Department of Computer Science and Engineering,
JCDM College of Engineering, Sirsa, GJU, Hisar, Haryana, India

Kavita
Assistant Professor,
Department of Computer Science and Engineering,
JCDM College of Engineering, Sirsa, GJU, Hisar, Haryana, India

Abstract: The main aim of this review paper is to provide a comprehensive review of the simple k-means and modified k-means clustering techniques. Clustering is an active research topic in fields such as statistics, pattern recognition and machine learning. Cluster analysis is a data mining tool for large, multivariate databases. Clustering is a data mining technique in which data is divided into groups so that similar objects fall in the same group and dissimilar objects fall in different groups. Clustering is a typical example of unsupervised classification.

Keywords: Clustering, clustering technique: Simple K Mean and Modified k mean clustering.

I. INTRODUCTION
Due to the increased availability of computer hardware and software and the rapid computerization of business, large amounts of data have been collected and stored in databases. Researchers have estimated that the amount of information in the world doubles every 20 months [1]. However, raw data cannot be used directly. Its real value lies in extracting information that is useful for decision support. In most areas, data analysis was traditionally a manual process. When the size of data manipulation and exploration goes beyond human capabilities, people look for computing technologies to automate the process. Data mining is one of the youngest research activities in the field of computing science and is defined as the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data. Data mining is the process of analyzing data from different perspectives and summarizing it into useful information [2]. Data mining involves extracting, transforming and loading transaction data onto the data warehouse system. Data mining tasks include anomaly detection, association rule learning, classification, regression, summarization and clustering. Data mining is one of the most important research fields, owing to the expansion of both computer hardware and software technologies, which has led organizations to depend heavily on these technologies. Data mining concepts and methods can be applied in various fields like marketing, medicine, real estate, customer relationship management, engineering, web mining, etc. Various clustering algorithms based on different techniques have been designed and applied successfully to various data mining problems.

In this paper, clustering analysis is done by using simple k-means clustering and modified k-means clustering. Normalization and indexing are important preprocessing steps that standardize the values of all variables from their dynamic range into a specific range. Cluster analysis is a type of data mining technique that is used to find data segmentation and pattern information. By clustering the data, people can see the data distribution, observe the character of each cluster, and make further study of particular clusters. The aim of cluster analysis is that the objects in a group should be similar to one another and different from the objects in other groups. A clustering is better when the similarity within a group is greater and the difference between groups is greater. So we can say that raw data has to be processed by an algorithm to extract useful information from it. The most commonly used clustering algorithms are hierarchical, partitioning, density-based and grid-based algorithms. The popular clustering techniques suggested so far are either partition-based or hierarchical, but both approaches have their own advantages and disadvantages in terms of the number of clusters, shape of clusters, and cluster overlapping. Only when a clustering algorithm is applied to the raw data do we get useful clusters, as shown in the figure [2].

Fig 1: Stages of Clustering [3]


II. PARTITIONING CLUSTERING
Data objects are partitioned into non-overlapping clusters so that every object is in exactly one subset. Because checking all possible partitions of the data into subsets is computationally infeasible, greedy heuristic schemes are used in the form of iterative optimization. These are relocation schemes that iteratively reassign points between the k clusters [3].

Fig 2: Before and after partitioning

A. Simple K-Means Clustering
It is a partitioning method that finds mutually exclusive clusters of spherical shape. It generates a specific number of disjoint, flat (non-hierarchical) clusters. The k-means algorithm organizes objects into k partitions, where each partition represents a cluster. We start with an initial set of means and classify cases based on their distances to these centres. Next, we compute the cluster means again, using the cases assigned to each cluster; then we reclassify all cases based on the new set of means. We repeat this step until the cluster means do not change between successive steps. Finally, we calculate the cluster means once more and assign the cases to their permanent clusters [4].

1) Method for Simple K means clustering
Input: k = number of clusters; D = data set that contains n objects.
Output: Set of k clusters.
Method:
1. Randomly choose k objects from D as the initial cluster centres.
2. Repeat:
3. Reassign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster.
4. Update the cluster means, i.e. calculate the mean value of the objects for each cluster.
5. Until no change.

2) Dataset
A dataset is a set of data items and is a very basic concept of machine learning. A dataset is roughly equivalent to a two-dimensional spreadsheet or database table. In WEKA, a dataset is implemented by the weka.core.Instances class. A dataset is a collection of examples, each of class weka.core.Instance. Each Instance is made of a number of attributes, any of which can be nominal, numeric or a string. The data sets used here are medical data.

3) Overview of WEKA Tool
Working with WEKA does not require deep knowledge of data mining, which helps explain why it is a very popular data mining tool. WEKA is open source, freely available and platform-independent, and it provides many facilities, including a graphical user interface. WEKA is a landmark system in the history of the data mining and machine learning research communities, because it is the only toolkit that has gained such widespread adoption and survived for an extended period of time. It provides different algorithms for data mining and machine learning. Simple k-means clustering is done by using this tool. We give the different datasets as input to the WEKA tool, which processes the input and produces the output. In this way the clusters are formed.

Fig 3: Front view of WEKA tool

III. MODIFIED K MEAN CLUSTERING
The modified k-means approach is designed to improve the execution time, the number of iterations and the sum of squared errors, and it provides much better results compared with simple k-means clustering done by using the K Mean tool. It is simple to use and also provides a graphical user interface. In modified k-means clustering, the data is reduced by a normalization method, and parameters such as time taken, number of iterations and sum of squared errors are improved as a result. A normalization index method is used for the modification. In this method the input data is first reduced by indexing, so that the raw data is obtained in sequence; after that, the Euclidean distance between different clusters is calculated. Normalization is then applied to reduce the sum of squared errors and the number of iterations, resulting in less execution time to form the clusters.
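The method listed in Section II.A (steps 1 to 5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not WEKA's implementation: the function names `simple_k_means` and `squared_distance`, the `seed` and `max_iter` parameters, and the sample points are all hypothetical and introduced only for this example. The sketch also reports the number of iterations and the sum of squared errors, the two quantities the paper uses to compare the simple and modified approaches.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_k_means(data, k, seed=0, max_iter=100):
    """Cluster `data` (a list of numeric tuples) into k groups.

    Returns (centres, labels, iterations, sse). `seed` and `max_iter`
    are illustrative assumptions, not parameters named in the paper.
    """
    rng = random.Random(seed)
    # Step 1: randomly choose k objects from D as the initial cluster centres.
    centres = [tuple(p) for p in rng.sample(data, k)]
    labels = [None] * len(data)
    iterations = 0
    for _ in range(max_iter):  # Step 2: repeat
        iterations += 1  # the final pass that detects "no change" is counted too
        # Step 3: (re)assign each object to the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in data
        ]
        # Step 5: stop when no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: update each cluster mean from the objects assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared errors: squared distance of each object to its centre.
    sse = sum(squared_distance(p, centres[lab]) for p, lab in zip(data, labels))
    return centres, labels, iterations, sse


# Hypothetical two-dimensional sample with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, labels, iterations, sse = simple_k_means(points, k=2)
print(labels, iterations, round(sse, 3))
```

Because the initial centres are chosen at random, the cluster numbering and the iteration count depend on the seed; in general k-means can also settle in different local optima for different starting centres.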


REFERENCES
[1] Shraddha Shukla and Naganna S., "A Review on K-means Data Clustering Approach", International Journal of Information & Computation Technology, Volume 4, 2014.
[2] S. Anupama Kumar and M. N. Vijayalakshmi, "Relevance of data mining techniques in edification sector", International Journal of Machine Learning and Computing, Volume 3, Issue 1, February 2013.
[3] Saroj, Tripti Chaudhary, "Study on Various Clustering Techniques", International Journal of Computer Science and Information Technologies, Volume 6, Issue 3, 2015.
[4] Narender Kumar, Vishal Verma, Vipin Saxena, "Cluster analysis in data mining using k-means method", International Journal of Computer Applications, Volume 76, Issue 12, August 2013.
[5] Aastha Joshi, Rajneet Kaur, "A Review: Comparative Study of Various Clustering Techniques in Data Mining", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 3, March 2013.
[6] Amandeep Kaur Mann and Navneet Kaur, "Review paper on Clustering Techniques", Global Journal of Computer Science and Technology Software and Data Engineering, Volume 13, Issue 5, 2013.
[7] Bharat Chaudhary, Manan Parikh, "A Comparative Study of Clustering Algorithm using WEKA Tool", International Journal of Application or Innovation in Engineering and Management, Volume 1, Issue 2, October 2012.
[8] Vaishali R. Patel, Rupa G. Mehta, "Clustering Algorithms: A Comprehensive Survey", International Conference on Electronics, Information and Communication Systems Engineering, 2011.

Fig 4: Flow chart of Modified k means clustering

According to the flow chart, the first step is to provide the raw data, which is converted using indexing. The Euclidean distance is then calculated so that the data can be grouped into clusters of similar and dissimilar objects.
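The paper does not spell out the normalization formula or the indexing step, so the following Python sketch is only one plausible reading: it assumes min-max normalization, which maps every variable from its dynamic range into [0, 1], followed by the Euclidean distance calculation. The function names and the sample records are hypothetical, and the indexing step is left out because the text gives too little detail to reproduce it.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1].

    Min-max scaling is an assumption; the paper only says that values
    are standardized from a dynamic range into a specific range.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        )
        for row in rows
    ]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical records with two attributes on very different scales.
raw = [(20, 90.0), (40, 150.0), (60, 210.0)]
normalized = min_max_normalize(raw)

# Before scaling, the second attribute dominates the distance;
# after scaling, both attributes contribute equally.
print(euclidean_distance(raw[0], raw[1]))
print(euclidean_distance(normalized[0], normalized[1]))
```

Scaling matters for k-means because the algorithm assigns objects by distance: an attribute with a large numeric range would otherwise outweigh the others when clusters are formed.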

IV. CONCLUSION
K-means clustering is the most important type of partitioning clustering. Partitioning clustering is the approach in which objects are partitioned into clusters according to their distances. In k-means clustering, k cluster centres are chosen; objects that are closest to a given centre are placed in one group, and objects that are farther from it are placed in other groups. In this paper, simple k-means clustering has been described using the WEKA tool and medical data sets, i.e. Pima, Aids and Breast Cancer. Modified k-means clustering, on the other hand, has been described based on a normalization and indexing approach using .NET, which takes less time and yields a lower sum of squared errors when forming the clusters.

ACKNOWLEDGMENT
I would like to thank the Computer Science Engineering department of JCDMCOE, Sirsa for its support and for providing an environment for this research work.
