Unsupervised Learning - Clustering

This document discusses unsupervised machine learning and clustering. It provides an overview of clustering techniques like k-means clustering. K-means clustering partitions unlabeled data into k clusters where each cluster has a centroid. It discusses how k-means works by assigning data points to the closest centroid, recomputing centroids, and repeating until convergence. The document notes some weaknesses of k-means like sensitivity to outliers, initial seeds, and not knowing the optimal number of clusters k beforehand.

Unsupervised Machine Learning - Clustering

May 2020

SECRET
Knowledge Share - Session plan

Topic | Application | Schedule
Overview of Machine learning and feature selection | Generic | 19-Feb
Regression - Supervised Machine Learning | Market Share Forecast/Inventory Obsolescence | 13-Mar
Classification - Supervised Machine Learning | Technician Attrition | 10-Apr
Clustering - Unsupervised Machine Learning | Dealer/Parts Clustering | 8-May
Bagging & Boosting - Ensemble Methods | Service Parts Forecasting | 5-Jun
Genetic Algorithm - Reinforcement Learning | Vehicle Route optimization | 3-Jul
Linear programming and mathematical optimization | Container Loading/Vanning | 31-Jul
Dimension Reduction & Pattern Search - Unsupervised Machine Learning | Generic | 28-Aug

Descriptive, Predictive & Prescriptive Analytics

Machine Learning Universe

What is clustering?

• The organization of unlabeled data into similarity groups called clusters.
• A cluster is a collection of data items that are “similar” to one another and “dissimilar” to data items in other clusters.
Historic application of clustering

Clustering techniques

Divisive

K-means
K-Means clustering
• K-means (MacQueen, 1967) is a partitional clustering algorithm.
• Let the set of data points D be {x1, x2, …, xn}, where xi = (xi1, xi2, …, xir) is a vector in X ⊆ R^r, and r is the number of dimensions.
• The k-means algorithm partitions the given data into k clusters:
– Each cluster has a cluster center, called the centroid.
– k is specified by the user.
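
The loop behind this partitioning (assign each point to its closest centroid, recompute the centroids, repeat until nothing changes) can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used in the session: the function names, the toy data, the random choice of seeds from the data points, and the rule for empty clusters are all assumptions made for the example.

```python
import math
import random


def closest(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(data, k, seed=0, max_iter=100):
    """Partition data into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(data, k)      # initial seeds: k data points picked at random
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest centroid.
        new_labels = [closest(x, centroids) for x in data]
        if new_labels == labels:         # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:                  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return centroids, labels


# Toy data (made up for illustration): two well-separated groups of three points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

On data this well separated, the first three points should end up sharing one label and the last three the other; which group is labelled 0 depends on the seeds.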
K-means clustering example: step 1
K-means clustering example: step 2
K-means clustering example: step 3
K-means clustering example
Weaknesses of K-means
• The algorithm is only applicable if the mean is defined.
– For categorical data, use k-modes, where the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to the initial seeds.
• The algorithm is sensitive to outliers.
– Outliers are data points that are very far away from other data points.
– Outliers could be errors in the data recording or some special data points with very different values.
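
A small numeric sketch may help with two of these points: a single outlier drags a mean-based centroid away from the bulk of its cluster, and categorical data has no mean, so the most frequent value stands in for it. The numbers and category values below are invented for illustration, and the median is shown only as a contrast; it is not part of the slides.

```python
from statistics import mean, median, mode

# One-dimensional cluster (made-up values) and the same cluster plus one outlier.
cluster = [2.0, 3.0, 4.0, 3.0]
with_outlier = cluster + [100.0]

print(mean(cluster))         # centroid of the clean cluster
print(mean(with_outlier))    # centroid pulled far toward the outlier
print(median(with_outlier))  # the median stays with the bulk of the data

# Categorical data: no mean exists, so a k-modes style "centroid" is the most frequent value.
colours = ["red", "blue", "red", "green"]
print(mode(colours))
```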
Optimal number of clusters

Within Cluster Sum of Squares (WCSS)
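
WCSS is the sum, over all clusters, of the squared distances from each point to its cluster centroid. One common way to choose k is to compute WCSS for several values of k and look for the point where it stops dropping sharply. The sketch below assumes that reading of the slide; the six points and the hand-picked partitions are hypothetical, chosen so that no clustering run is needed to see the effect.

```python
def wcss(clusters):
    """Sum over clusters of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        centroid = [sum(c) / len(points) for c in zip(*points)]
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total


# Three tight pairs of points (made up for illustration).
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(5.0, 0.0), (5.0, 1.0)]
c = [(10.0, 0.0), (10.0, 1.0)]

# Hand-picked partitions of the same six points for k = 1, 2, 3 and 6.
partitions = {
    1: [a + b + c],
    2: [a + b, c],
    3: [a, b, c],
    6: [[p] for p in a + b + c],
}
for k, clusters in partitions.items():
    print(k, wcss(clusters))
```

WCSS falls steeply from k = 1 to k = 3 and only marginally after that, which suggests k = 3 for this toy data. WCSS always reaches zero when every point is its own cluster, so the smallest value by itself is not a useful criterion.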
Sensitivity to initial seeds

Two random selections of seeds (centroids), each shown at Iteration 1 and Iteration 2.
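
The effect can be reproduced on a tiny example. The sketch below runs the same assign-and-update loop from two different sets of seeds on the four corners of a wide rectangle. The seeds are fixed by hand rather than drawn at random so that the result is repeatable; the data, the function names, and the use of WCSS to compare the two outcomes are assumptions made for this illustration.

```python
import math


def kmeans(data, seeds, max_iter=100):
    """Assign-and-update k-means loop from explicit initial seeds; returns (centroids, labels)."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
            for x in data
        ]
        if new_labels == labels:          # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


def wcss(data, centroids, labels):
    """Within-cluster sum of squared distances for a finished clustering."""
    return sum(math.dist(x, centroids[lab]) ** 2 for x, lab in zip(data, labels))


# Four corners of a wide rectangle: the natural clusters are the left pair and the right pair.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = kmeans(data, seeds=[(0.0, 0.0), (4.0, 0.0)])  # one seed on each side
bad = kmeans(data, seeds=[(0.0, 0.0), (0.0, 1.0)])   # both seeds on the left side

print(good[1], wcss(data, *good))
print(bad[1], wcss(data, *bad))
```

Both runs converge, but the second one settles on a top/bottom split with a much larger WCSS than the left/right split. A common mitigation is to run k-means from several different seeds and keep the result with the lowest WCSS.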


Outliers

K-means summary

• Despite its weaknesses, k-means is still the most popular clustering algorithm due to its simplicity and efficiency.
• There is no clear evidence that any other clustering algorithm performs better in general.
• Comparing different clustering algorithms is a difficult task. No one knows the correct clusters!
