K-Means Clustering Using RapidMiner

Uploaded by

chamarilk

K-means Clustering

USING RAPIDMINER

Sanchit Kumar | Data Warehousing and Data Mining | April 27, 2016
Problem Statement

Sonia is a program director for a major health insurance provider. Recently she has
been reading in medical journals and other articles, and found a strong emphasis
on the influence of weight, gender and cholesterol on the development of coronary
heart disease. The research she’s read confirms time after time that there is a
connection between these three variables, and while there is little that can be done
about one’s gender, there are certainly life choices that can be made to alter one’s
cholesterol and weight. She begins brainstorming ideas for her company to offer
weight and cholesterol management programs to individuals who receive health
insurance through her employer. As she considers where her efforts might be most
effective, she finds herself wondering if there are natural groups of individuals who
are most at risk for high weight and high cholesterol, and if there are such groups,
where the natural dividing lines between the groups occur.

Algorithm Used
K-MEANS ALGORITHM

Formally, given a data set, D, of n objects, and k, the number of clusters to form,
the k-means algorithm organizes the objects into k partitions, where each
partition represents a cluster. The clusters are formed to optimize a distance-based
partitioning criterion: k-means minimizes the within-cluster sum of squared errors,
that is, the sum over all clusters of the squared Euclidean distances between each
object and the mean (centroid) of its cluster. As a result, the objects within a
cluster are “similar” to one another and “dissimilar” to objects in other clusters
in terms of the data set attributes.
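The definition above states what a good partition is but not how one is found. The usual approach, Lloyd's algorithm, alternates two steps: assign each object to the cluster whose mean is nearest, then recompute each mean from its current members, and stop when no assignment changes. The following is a minimal pure-Python sketch under our own assumptions (squared Euclidean distance, initial centroids drawn at random from the data, a fixed seed). The function names are ours, and this is not RapidMiner's implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and update steps until stable."""
    rng = random.Random(seed)
    # Start from k data points chosen at random (an assumption of this
    # sketch; k-means++ seeding is a common alternative).
    centroids = rng.sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: every object joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = centroid(members)
    return centroids, labels
```

Calling `kmeans(points, 4)` on (weight, cholesterol, gender) tuples would partition them into four clusters, like the model in this report, though cluster numbering and exact centroids depend on the initialization and need not match RapidMiner's output.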

Data Set Used

Using the insurance company’s claims database, Sonia extracts three attributes for
547 randomly selected individuals. The three attributes are the insured’s weight in
pounds as recorded on the person’s most recent medical examination, their last
cholesterol level determined by blood work in their doctor’s lab, and their gender.
As is typical in many data sets, the gender attribute uses 0 to indicate Female and 1
to indicate Male. We will use this sample data from Sonia’s employer’s database to
build a cluster model to help Sonia understand how her company’s clients, the
health insurance policy holders, appear to group together on the basis of their
weights, genders and cholesterol levels.

A data set has been prepared for this example, and is available as
Chapter06DataSet.csv on the book’s (Data Mining for the Masses) companion web
site.
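To repeat the analysis outside RapidMiner, the CSV can be read into numeric tuples. The sketch below assumes the file has a header row with columns named `Weight`, `Cholesterol` and `Gender`; we have not verified these names against Chapter06DataSet.csv, so adjust `COLUMNS` to match the real header. It also checks that gender uses the 0/1 coding described above.

```python
import csv
from typing import IO, List, Tuple

# Assumed column names (hypothetical); check them against the file's header.
COLUMNS = ("Weight", "Cholesterol", "Gender")


def load_points(handle: IO[str]) -> List[Tuple[float, float, float]]:
    """Read (weight, cholesterol, gender) tuples from an open CSV handle."""
    points = []
    for row in csv.DictReader(handle):
        weight, cholesterol, gender = (float(row[c]) for c in COLUMNS)
        # Gender is coded 0 = Female, 1 = Male; reject any other value.
        if gender not in (0.0, 1.0):
            raise ValueError(f"unexpected gender code: {row['Gender']!r}")
        points.append((weight, cholesterol, gender))
    return points
```

With the real file, pass the handle from `open("Chapter06DataSet.csv", newline="")` to `load_points`; a missing column raises `KeyError`, which is a quick way to spot a header mismatch.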

Applications of the Algorithm

K-means clustering is very flexible in its ability to group observations together. In
this example, it does not predict which insurance policy holders will or will not
develop heart disease. It simply takes the known indicators (the attributes) in a
data set and groups observations according to how similar their attribute values
are to each group's averages. Because a mean can be calculated for any attribute
that can be quantified, k-means clustering provides an effective way of grouping
observations based on what is typical or normal for each group. It also helps us
understand where one group ends and another begins, or in other words, where the
natural breaks occur between groups in a data set.
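Once the group averages (centroids) are known, deciding which group an observation belongs to is a nearest-mean comparison, and the natural break between two groups is the set of points equally distant from their centroids. The sketch below illustrates both ideas using only weight and cholesterol for readability. The centroid values in `EXAMPLE_CENTROIDS` are hypothetical numbers invented for this example, not the centroids of the model in this report.

```python
import math
from typing import Dict, Sequence, Tuple

Centroids = Dict[int, Sequence[float]]


def nearest_cluster(point: Sequence[float],
                    centroids: Centroids) -> Tuple[int, float]:
    """Return the id of the closest centroid and the Euclidean distance to it."""
    best = min(centroids, key=lambda cid: math.dist(point, centroids[cid]))
    return best, math.dist(point, centroids[best])


def break_margin(point: Sequence[float], centroids: Centroids,
                 a: int, b: int) -> float:
    """Distance to centroid a minus distance to centroid b.

    Negative: the point falls on a's side; positive: on b's side;
    zero: it lies exactly on the natural break between the two groups.
    """
    return math.dist(point, centroids[a]) - math.dist(point, centroids[b])


# Hypothetical (weight, cholesterol) centroids invented for illustration;
# they are NOT the centroids reported by the RapidMiner model.
EXAMPLE_CENTROIDS: Centroids = {0: (120.0, 110.0), 1: (160.0, 170.0),
                                2: (200.0, 230.0)}

if __name__ == "__main__":
    print(nearest_cluster((190.0, 215.0), EXAMPLE_CENTROIDS))
    print(break_margin((180.0, 200.0), EXAMPLE_CENTROIDS, 1, 2))
```

The second call evaluates the midpoint of centroids 1 and 2, which sits on the break between those two groups.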

The k-Means operator in RapidMiner allows data miners to set the number of
clusters they wish to generate, to dictate the number of sample means used to
determine the clusters, and to use a number of different algorithms to evaluate
means. While fairly simple in its set-up and definition, k-Means clustering is a
powerful method for finding natural groups of observations in a data set.
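Because k-means converges to a local optimum that depends on its starting centroids, tools commonly run it several times from different random starts and keep the solution with the lowest within-cluster sum of squared errors (SSE). As far as we understand, RapidMiner's k-Means operator exposes this through a *max runs* setting, but check the operator documentation for your version. The self-contained sketch below shows the idea in plain Python; the function names, the default of 10 runs and the iteration cap are our own choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _lloyd(points: List[Point], k: int, rng: random.Random,
           max_iter: int) -> Tuple[List[Point], List[int]]:
    """One k-means run (Lloyd's algorithm) from random initial centroids."""
    centroids = rng.sample(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: _sq(p, centroids[j])) for p in points]
        if new == labels:
            break  # assignments are stable
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def kmeans_best_of(points: List[Point], k: int, runs: int = 10,
                   max_iter: int = 100,
                   seed: int = 0) -> Tuple[float, List[Point], List[int]]:
    """Repeat k-means `runs` times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centroids, labels = _lloyd(points, k, rng, max_iter)
        sse = sum(_sq(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, centroids, labels)
    return best
```

A single run on three well-separated pairs of points can get stuck with two centroids inside one pair; keeping the best of several runs makes that outcome unlikely.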

Screenshots

Fig 1. Process View

Fig 2. Cluster Model

Fig 3. Centroid Table

Fig 4. Folder View of Cluster 3

Fig 5. Filtered View of the Data Belonging to Cluster 3

Evaluation

Sonia’s major objective in the hypothetical scenario posed at the beginning of this
report was to find natural breaks between different types of heart disease
risk groups. Using the k-Means operator in RapidMiner, we have identified four
clusters for Sonia, and we can now evaluate their usefulness in addressing Sonia’s
question.

The centroid table (Fig 3) shows that cluster 3 has the highest average weight and
cholesterol. With 0 representing Female and 1 representing Male, the mean of the
gender attribute equals the proportion of men, so cluster 3's gender mean of 0.591
indicates that about 59% of its members are men and about 41% are women.
Knowing that high cholesterol and weight are two key indicators of heart disease
risk that policy holders can do something about, Sonia would likely want to start
with the members of cluster 3 when promoting her new programs. She could then
extend her programming to include the people in clusters 2 and 0, which have the
next incrementally lower means for these two key risk factor attributes.
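The evaluation rests on two simple computations: the per-cluster mean of each attribute (the centroid table), and the fact that the mean of a 0/1-coded attribute is the share of ones. The sketch below computes both from rows that already carry a cluster label and orders clusters from highest to lowest mean weight, breaking ties by cholesterol; that ordering rule is our own assumption about how outreach might be prioritized. The rows in the test are made up and do not reproduce this report's clusters.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, float, float, int]  # (cluster, weight, cholesterol, gender)


def cluster_profiles(rows: Iterable[Row]) -> Dict[int, Dict[str, float]]:
    """Mean weight, mean cholesterol, male share and size for each cluster."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for cluster, weight, chol, gender in rows:
        s = sums[cluster]
        s[0] += weight
        s[1] += chol
        s[2] += gender  # summing 0/1 codes counts the men
        s[3] += 1       # member count
    return {
        c: {"weight": s[0] / s[3], "cholesterol": s[1] / s[3],
            "male_share": s[2] / s[3], "size": s[3]}
        for c, s in sums.items()
    }


def outreach_order(profiles: Dict[int, Dict[str, float]]) -> List[int]:
    """Cluster ids sorted by mean weight, then mean cholesterol, descending."""
    return sorted(profiles,
                  key=lambda c: (profiles[c]["weight"], profiles[c]["cholesterol"]),
                  reverse=True)
```

Here `male_share` is exactly the gender mean shown in the centroid table, so a value of 0.591 reads directly as 59.1% men.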

References

1. Book: Data Mining for the Masses - Matthew A. North
2. Book: Data Mining: Concepts and Techniques - Han, Kamber, Pei
3. Dataset: Data Mining for the Masses - companion web site
4. Video: K-Means Clustering in RapidMiner - YouTube
