Intro To K Means Clustering

K-means clustering is an unsupervised learning algorithm that groups similar data points into K clusters. It starts by randomly assigning data points to K clusters. It then repeatedly recomputes the cluster centroids and reassigns each point to its nearest centroid until cluster membership stops changing. A reasonable number of clusters K can be chosen heuristically with the elbow method. You plot the sum of squared errors (SSE) for several values of K and look for the "elbow" where the SSE stops dropping sharply.

Uploaded by

Abie D'first Hacker


Introduction to K Means Clustering

Reading Assignment
Chapter 10 of Introduction to Statistical Learning, by Gareth James et al.
K Means Clustering

K Means Clustering is an unsupervised learning algorithm that attempts to group similar points in your data into clusters.
So what does a typical clustering problem look like?
● Cluster Similar Documents
● Cluster Customers based on Features
● Market Segmentation
● Identify similar physical groups

● The overall goal is to divide data into distinct groups such that observations within each group are similar
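The goal can be made precise as an optimization problem. The block below sketches the formulation used in the ISL reading. The notation is assumed here: x_ij is the value of feature j for observation i, p is the number of features, C_k is the set of observations in cluster k, and \bar{x}_{kj} is the mean of feature j within C_k. K-means looks for the partition with the smallest total within-cluster variation:

```latex
\min_{C_1,\dots,C_K} \sum_{k=1}^{K} W(C_k),
\qquad
W(C_k) = \frac{1}{|C_k|} \sum_{i,\,i' \in C_k} \sum_{j=1}^{p} \left(x_{ij} - x_{i'j}\right)^2
       = 2 \sum_{i \in C_k} \sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{kj}\right)^2
```

The second equality is an algebraic identity. Averaging the pairwise squared distances within a cluster gives twice the sum of squared distances to the cluster mean. Minimizing this objective is therefore the same as minimizing the SSE used when choosing K. The iterative algorithm is only guaranteed to reach a local minimum of it.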

The K Means Algorithm

● Choose a number of clusters, “K”
● Randomly assign each point to a cluster
● Until clusters stop changing, repeat the following:
○ For each cluster, compute the cluster centroid by taking the mean vector of points in the cluster
○ Assign each data point to the cluster whose centroid is closest
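The loop above can be sketched in plain Python. This is a minimal sketch, not a reference implementation, and several parts of it are assumptions:

- the function names `kmeans`, `mean_vector` and `squared_distance`;
- dealing shuffled points round-robin as the random initial assignment, which keeps every cluster non-empty at the start;
- keeping an empty cluster's previous centroid;
- the `max_iter` safety cap.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Sketch of K-means starting from a random assignment of points to clusters.

    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Randomly assign each point to a cluster. Dealing shuffled indices
    # round-robin (an assumption) keeps every cluster non-empty at the start.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for position, index in enumerate(order):
        labels[index] = position % k

    centroids = [None] * k
    for _ in range(max_iter):
        # Centroid of each cluster = mean vector of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
            # An empty cluster keeps its previous centroid (an assumption).

        # Reassign each point to the closest centroid. A point only moves when
        # another centroid is strictly closer, so ties cannot cause cycling.
        changed = False
        for i, p in enumerate(points):
            best = labels[i]
            best_dist = squared_distance(p, centroids[best])
            for c in range(k):
                dist = squared_distance(p, centroids[c])
                if dist < best_dist:
                    best, best_dist = c, dist
            if best != labels[i]:
                labels[i] = best
                changed = True

        if not changed:  # cluster membership has stopped changing
            break
    return labels, centroids


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = kmeans(data, k=2, seed=42)
print(labels)
print(centroids)
```

The result depends on the random starting assignment, so different seeds can end in different local optima. A common remedy is to run several random starts and keep the clustering with the lowest SSE. R's `kmeans()` offers this through its `nstart` argument.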
Choosing a K Value

● There is no easy answer for choosing a “best” K value

● One way is the elbow method
First, run K-means and compute the sum of squared errors (SSE) for several values of k (for example 2, 4, 6, 8, etc.).
The SSE is the sum, over all clusters, of the squared distances between each member of a cluster and that cluster's centroid.
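The SSE definition translates directly into code. In symbols, SSE = Σ_k Σ_{x in C_k} ||x − μ_k||², where μ_k is the centroid of cluster k.

This is a minimal sketch with two assumptions: the points are equal-length tuples of numbers, and `labels[i]` gives the cluster of `points[i]`. The function name `sse` is hypothetical.

```python
def sse(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    points: equal-length tuples of numbers; labels[i]: cluster id of points[i].
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    total = 0.0
    for members in clusters.values():
        n = len(members)
        # The centroid is the mean vector of the cluster's members.
        centroid = [sum(coords) / n for coords in zip(*members)]
        for p in members:
            total += sum((x - c) ** 2 for x, c in zip(p, centroid))
    return total


# Two clusters: centroids (1, 0) and (10, 2); squared distances 1, 1, 4, 4.
print(sse([(0, 0), (2, 0), (10, 0), (10, 4)], [0, 0, 1, 1]))  # 10.0
```

For the four points above, split into two pairs, the SSE is 1 + 1 + 4 + 4 = 10.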

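If you plot the SSE against k, you will see that the error generally decreases as k gets larger. With more clusters, each cluster is smaller and its members lie closer to their centroid, so the distortion (another name for the SSE) is also smaller.
In the extreme case, every point is its own cluster and the SSE is zero. The SSE alone therefore cannot tell you which k is best.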
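The idea of the elbow method is to choose the k after which the SSE stops decreasing sharply, so that adding further clusters brings only small improvements.
This produces an "elbow effect" in the graph, as you can see in the following picture:
[Figure: SSE plotted against k, showing the elbow]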
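Reading the elbow off a plot is a judgment call. The hypothetical helper `elbow_k` below is not part of the slides. It picks the candidate k at which the SSE curve bends most sharply, measured as the increase in slope from the segment before k to the segment after it. It also accepts unevenly spaced candidates such as 2, 4, 6, 8.

The SSE values themselves would come from running K-means once per candidate k. In R, the `tot.withinss` component of a `kmeans()` result holds this total. Treat the helper's answer as a starting point for inspecting the plot, not as a rule.

```python
def elbow_k(ks, sses):
    """Return the candidate k where the SSE-versus-k curve bends most sharply.

    Heuristic (an assumption, not from the slides): for each interior k, take
    the slope of the segment after k minus the slope of the segment before it.
    The elbow is where the curve flattens the most, i.e. the largest difference.
    """
    if len(ks) != len(sses) or len(ks) < 3:
        raise ValueError("need at least three (k, SSE) pairs")
    if any(b <= a for a, b in zip(ks, ks[1:])):
        raise ValueError("ks must be strictly increasing")

    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        slope_before = (sses[i] - sses[i - 1]) / (ks[i] - ks[i - 1])
        slope_after = (sses[i + 1] - sses[i]) / (ks[i + 1] - ks[i])
        bend = slope_after - slope_before
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Made-up SSE values, for illustration only: steep drops up to k = 3, then flat.
print(elbow_k([1, 2, 3, 4, 5], [100.0, 50.0, 10.0, 8.0, 7.0]))  # 3
```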
Example with R

Let's go to RStudio and begin to explore an example, then you’ll have a project to test your understanding!
