K-Means With Elbow Method

Unsupervised learning is a type of machine learning that finds previously unknown patterns in data without pre-existing labels. There are two main types of unsupervised learning problems: clustering and association. Clustering aims to discover inherent groupings in the data, while association rule learning discovers rules that describe large portions of the data. K-means clustering is an algorithm that groups data points into k clusters by minimizing distances between points and cluster centroids, calculating new centroids iteratively until convergence.


Machine Learning

Unsupervised Learning
What is Unsupervised Learning?
• Unsupervised learning is a type of machine learning that finds previously
unknown patterns in a data set without pre-existing labels.
Unsupervised learning problems can be further grouped into clustering and
association problems.
• Clustering: A clustering problem is where you want to discover the
inherent groupings in the data, such as grouping customers by
purchasing behavior.
• Association: An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as "people
who buy X also tend to buy Y".
Unsupervised Learning Techniques
• K-means clustering
• Hierarchical clustering
• Principal Component Analysis (PCA)
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
K-means algorithm
• K-means clustering is an unsupervised machine learning algorithm used to group a
dataset into k clusters.
• The goal of the algorithm is to group similar data points together while maximizing the
dissimilarity between different clusters.
• It's widely used for various applications, including customer segmentation, image
compression, and data preprocessing.
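
In practice K-means is rarely coded from scratch; a minimal scikit-learn sketch is shown below. The data here is synthetic and purely illustrative, not from any real segmentation task.

```python
# Minimal scikit-learn sketch (synthetic data, purely illustrative).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                # 100 made-up 2-D points

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])                   # cluster index assigned to each point
print(kmeans.cluster_centers_)               # coordinates of the 3 final centroids
```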
Working of K-means algorithm
1. Initialization: Choose the number of clusters (k) you want to create and randomly initialize k
cluster centroids. Each centroid represents the center of a cluster.
2. Assignment Step: For each data point, calculate its distance from each of the k centroids. Assign
the data point to the cluster whose centroid is closest to it. This forms k initial clusters.
3. Update Step: Recalculate the centroids of the k clusters by taking the mean of all the data points
assigned to each cluster.
4. Repeat: Repeat the assignment and update steps iteratively until either a maximum number of
iterations is reached or the centroids stop changing significantly.
5. Convergence: The algorithm is considered to have converged when the centroids of the clusters
no longer change substantially between iterations.
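
The five steps above translate almost line for line into NumPy. The sketch below is a simplified illustration (the function name, tolerance, and random initialization from the data points are choices made for this example, and empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k random data points as the starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # 2. Assignment: distance of every point to every centroid, take the closest.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: each centroid becomes the mean of the points assigned to it.
        #    (Empty clusters are not handled in this sketch.)
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # 4./5. Repeat until the centroids stop changing significantly (convergence).
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels
```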
Working of K-means algorithm: Example
Step 1: Randomly initialize k = 3 cluster centroids.
• Centroid 1 = (2, 6) is associated with cluster 1.
• Centroid 2 = (5, 10) is associated with cluster 2.
• Centroid 3 = (6, 11) is associated with cluster 3.

Dataset:
Point | Coordinates
A1 | (2, 10)
A2 | (2, 6)
A3 | (11, 11)
A4 | (6, 9)
A5 | (6, 4)
A6 | (1, 2)
A7 | (5, 10)
A8 | (4, 9)
A9 | (10, 12)
A10 | (7, 5)
A11 | (9, 11)
A12 | (4, 6)
A13 | (3, 10)
A14 | (3, 8)
A15 | (6, 11)
Working of K-means algorithm: Example
The distance between two points a = (x₁, y₁) and b = (x₂, y₂) is the Euclidean distance:
Euclidean distance = √((x₂ - x₁)² + (y₂ - y₁)²)

For example, the distance of A1 (2, 10) from Centroid 1 (2, 6), with (x₁, y₁) = (2, 6) and (x₂, y₂) = (2, 10), is
√((2 - 2)² + (10 - 6)²) = √(0 + 16) = √16 = 4

Point | Distance from Centroid 1 (2, 6) | Distance from Centroid 2 (5, 10) | Distance from Centroid 3 (6, 11) | Assigned cluster
A1 (2, 10) | 4 | 3 | 4.123106 | Cluster 2
A2 (2, 6) | 0 | 5 | 6.403124 | Cluster 1
A3 (11, 11) | 10.29563 | 6.082763 | 5 | Cluster 3
A4 (6, 9) | 5 | 1.414214 | 2 | Cluster 2
A5 (6, 4) | 4.472136 | 6.082763 | 7 | Cluster 1
A6 (1, 2) | 4.123106 | 8.944272 | 10.29563 | Cluster 1
A7 (5, 10) | 5 | 0 | 1.414214 | Cluster 2
A8 (4, 9) | 3.605551 | 1.414214 | 2.828427 | Cluster 2
A9 (10, 12) | 10 | 5.385165 | 4.123106 | Cluster 3
A10 (7, 5) | 5.09902 | 5.385165 | 6.082763 | Cluster 1
A11 (9, 11) | 8.602325 | 4.123106 | 3 | Cluster 3
A12 (4, 6) | 2 | 4.123106 | 5.385165 | Cluster 1
A13 (3, 10) | 4.123106 | 2 | 3.162278 | Cluster 2
A14 (3, 8) | 2.236068 | 2.828427 | 4.242641 | Cluster 1
A15 (6, 11) | 6.403124 | 1.414214 | 0 | Cluster 3
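
The table above can be reproduced with a few lines of NumPy; a sketch using the 15 example points and the three initial centroids from the slides:

```python
import numpy as np

points = np.array([(2, 10), (2, 6), (11, 11), (6, 9), (6, 4), (1, 2), (5, 10),
                   (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8),
                   (6, 11)], dtype=float)                      # A1 ... A15
centroids = np.array([(2, 6), (5, 10), (6, 11)], dtype=float)  # C1, C2, C3

# Euclidean distance of every point to every centroid: shape (15, 3).
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
assigned = dists.argmin(axis=1) + 1                            # 1-based cluster index

for i, (d, c) in enumerate(zip(dists, assigned), start=1):
    print(f"A{i}: d1={d[0]:.3f}  d2={d[1]:.3f}  d3={d[2]:.3f}  -> Cluster {c}")
# e.g. A1: d1=4.000  d2=3.000  d3=4.123  -> Cluster 2
```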
Working of K-means algorithm: Example
Similarly, the distance of A1 (2, 10) from Centroid 2 (5, 10), with (x₁, y₁) = (5, 10) and (x₂, y₂) = (2, 10), is
√((2 - 5)² + (10 - 10)²) = √(9 + 0) = √9 = 3
(The full distance table is the same as on the previous slide.)
Working of K-means algorithm: Example
Likewise, the distance of A1 (2, 10) from Centroid 3 (6, 11), with (x₁, y₁) = (6, 11) and (x₂, y₂) = (2, 10), is
√((2 - 6)² + (10 - 11)²) = √(16 + 1) = √17 ≈ 4.123
(Again, the full distance table is the same as on the previous slide.)
Working of K-means algorithm: Example
Now, we will calculate the new centroid for each cluster.

In cluster 1, we have 6 points: A2 (2, 6), A5 (6, 4), A6 (1, 2), A10 (7, 5), A12 (4, 6), and A14 (3, 8).
To calculate the new centroid for cluster 1, we take the mean of the x and y coordinates of these points:
Mean = ((2 + 6 + 1 + 7 + 4 + 3)/6, (6 + 4 + 2 + 5 + 6 + 8)/6) = (23/6, 31/6)

Hence, the new centroid for cluster 1 is (3.833, 5.167).


Working of K-means algorithm: Example

In cluster 2, we have 5 points: A1 (2, 10), A4 (6, 9), A7 (5, 10), A8 (4, 9), and A13 (3, 10).
Mean = ((2 + 6 + 5 + 4 + 3)/5, (10 + 9 + 10 + 9 + 10)/5) = (20/5, 48/5)
Hence, the new centroid for cluster 2 is (4, 9.6).

In cluster 3, we have 4 points: A3 (11, 11), A9 (10, 12), A11 (9, 11), and A15 (6, 11).
Mean = ((11 + 10 + 9 + 6)/4, (11 + 12 + 11 + 11)/4) = (36/4, 45/4)
Hence, the new centroid for cluster 3 is (9, 11.25).
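
These centroid updates can be checked in one line of NumPy, continuing from the `points` and `assigned` arrays of the earlier sketch:

```python
# Continuing from the `points` and `assigned` arrays of the earlier sketch.
new_centroids = np.array([points[assigned == j].mean(axis=0) for j in (1, 2, 3)])
print(np.round(new_centroids, 3))
# cluster 1 -> (3.833, 5.167), cluster 2 -> (4.0, 9.6), cluster 3 -> (9.0, 11.25)
```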
Working of K-means algorithm: Example
Point | Distance from centroid 1 (3.833, 5.167) | Distance from centroid 2 (4, 9.6) | Distance from centroid 3 (9, 11.25) | Assigned cluster
A1 (2, 10) | 5.169 | 2.040 | 7.111 | Cluster 2
A2 (2, 6) | 2.013 | 4.118 | 8.750 | Cluster 1
A3 (11, 11) | 9.241 | 7.139 | 2.016 | Cluster 3
A4 (6, 9) | 4.403 | 2.088 | 3.750 | Cluster 2
A5 (6, 4) | 2.461 | 5.946 | 7.846 | Cluster 1
A6 (1, 2) | 4.249 | 8.171 | 12.230 | Cluster 1
A7 (5, 10) | 4.972 | 1.077 | 4.191 | Cluster 2
A8 (4, 9) | 3.837 | 0.600 | 5.483 | Cluster 2
A9 (10, 12) | 9.204 | 6.462 | 1.250 | Cluster 3
A10 (7, 5) | 3.171 | 5.492 | 6.562 | Cluster 1
A11 (9, 11) | 7.792 | 5.192 | 0.250 | Cluster 3
A12 (4, 6) | 0.850 | 3.600 | 7.250 | Cluster 1
A13 (3, 10) | 4.904 | 1.077 | 6.129 | Cluster 2
A14 (3, 8) | 2.953 | 1.887 | 6.824 | Cluster 2
A15 (6, 11) | 6.223 | 2.441 | 3.010 | Cluster 2
Working of K-means algorithm: Example
Now, we will calculate the new centroid for each cluster for the third iteration.
1. In cluster 1, we have 5 points: A2 (2, 6), A5 (6, 4), A6 (1, 2), A10 (7, 5), and A12 (4, 6). Taking
the mean of the x and y coordinates of these points, the new centroid for cluster 1 is (4, 4.6).
2. In cluster 2, we have 7 points: A1 (2, 10), A4 (6, 9), A7 (5, 10), A8 (4, 9), A13 (3, 10), A14 (3, 8),
and A15 (6, 11). Hence, the new centroid for cluster 2 is (4.143, 9.571).
3. In cluster 3, we have 3 points: A3 (11, 11), A9 (10, 12), and A11 (9, 11). Hence, the new
centroid for cluster 3 is (10, 11.333).
Working of K-means algorithm: Example
Point | Distance from centroid 1 (4, 4.6) | Distance from centroid 2 (4.143, 9.571) | Distance from centroid 3 (10, 11.333) | Assigned cluster
A1 (2, 10) | 5.758 | 2.186 | 8.110 | Cluster 2
A2 (2, 6) | 2.441 | 4.165 | 9.615 | Cluster 1
A3 (11, 11) | 9.485 | 7.004 | 1.054 | Cluster 3
A4 (6, 9) | 4.833 | 1.943 | 4.631 | Cluster 2
A5 (6, 4) | 2.088 | 5.872 | 8.353 | Cluster 1
A6 (1, 2) | 3.970 | 8.197 | 12.966 | Cluster 1
A7 (5, 10) | 5.492 | 0.958 | 5.175 | Cluster 2
A8 (4, 9) | 4.400 | 0.589 | 6.438 | Cluster 2
A9 (10, 12) | 9.527 | 6.341 | 0.667 | Cluster 3
A10 (7, 5) | 3.027 | 5.390 | 7.008 | Cluster 1
A11 (9, 11) | 8.122 | 5.063 | 1.054 | Cluster 3
A12 (4, 6) | 1.400 | 3.574 | 8.028 | Cluster 1
A13 (3, 10) | 5.492 | 1.221 | 7.126 | Cluster 2
A14 (3, 8) | 3.544 | 1.943 | 7.753 | Cluster 2
A15 (6, 11) | 6.705 | 2.343 | 4.014 | Cluster 2
Working of K-means algorithm: Example
Recomputing the centroids after the third iteration's assignments gives the same values as before:

1. Cluster 1 still contains the 5 points A2 (2, 6), A5 (6, 4), A6 (1, 2), A10 (7, 5), and A12 (4, 6),
so its centroid remains (4, 4.6).
2. Cluster 2 still contains the 7 points A1 (2, 10), A4 (6, 9), A7 (5, 10), A8 (4, 9), A13 (3, 10),
A14 (3, 8), and A15 (6, 11), so its centroid remains (4.143, 9.571).
3. Cluster 3 still contains the 3 points A3 (11, 11), A9 (10, 12), and A11 (9, 11), so its centroid
remains (10, 11.333).
Working of K-means algorithm: Example
Here, you can observe that no point has changed its cluster compared to the previous
iteration, and consequently the centroids also remain constant. The clusters have therefore
stabilized: the clusters obtained after the third iteration are the final clusters for the given
dataset. If we plot the clusters on a graph, they look as follows.
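
Since the plot itself is not included in this extract, here is a sketch that reproduces the final clustering with scikit-learn and draws the scatter plot; passing the slides' initial centroids via `init` with `n_init=1` makes the run follow the same iterations as the manual example:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

points = np.array([(2, 10), (2, 6), (11, 11), (6, 9), (6, 4), (1, 2), (5, 10),
                   (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8),
                   (6, 11)], dtype=float)
init = np.array([(2, 6), (5, 10), (6, 11)], dtype=float)  # the slides' initial centroids

km = KMeans(n_clusters=3, init=init, n_init=1).fit(points)
print(km.cluster_centers_)   # approx. (4, 4.6), (4.143, 9.571), (10, 11.333)

plt.scatter(points[:, 0], points[:, 1], c=km.labels_)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], marker="x", s=100)
plt.title("Final K-means clusters (k = 3)")
plt.show()
```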
How to Determine the Optimal K for K-Means?
1. The Elbow Method
2. The Silhouette Method (Assignment)
How to Determine the Optimal K for K-Means?
The Elbow Method: this is probably the best-known method for determining the optimal
number of clusters, although it is somewhat naive in its approach.
Calculate the Within-Cluster Sum of Squared Errors (WSS) for a range of values of k, and
choose the k beyond which increasing k yields only small reductions in WSS. In the plot of
WSS versus k, this point is visible as an elbow.
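
In scikit-learn the WSS of a fitted model is exposed as `inertia_`, so the elbow plot can be produced with a short loop; a sketch using the 15 example points (the range of k values is an arbitrary choice):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

points = np.array([(2, 10), (2, 6), (11, 11), (6, 9), (6, 4), (1, 2), (5, 10),
                   (4, 9), (10, 12), (7, 5), (9, 11), (4, 6), (3, 10), (3, 8),
                   (6, 11)], dtype=float)

ks = range(1, 9)
wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
       for k in ks]              # inertia_ = within-cluster sum of squared errors

plt.plot(list(ks), wss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("WSS (inertia)")
plt.title("Elbow method")
plt.show()
```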