Week 10: ML Course and Project


Unsupervised Learning

Dr. Saifal Talpur

27-11-23
Reminder
• "Machine learning is the subfield of computer science that gives computers the
ability to learn without being explicitly programmed."
Arthur Samuel, 1959

• What do we mean by machine learning? Most computer programs today are handcrafted by humans. Software engineers carefully craft every rule that governs how software behaves and then translate it into computer code.

4
What kind of AI do you know?
• Supervised learning

• In supervised learning, we need to have training examples (such as images of animals) and labels.
• If we have a large number of these labeled training examples, we can train a classifier to detect the subtle statistical patterns that differentiate dogs from all other animals.
• The classifier does not know what a dog fundamentally is. It only knows the statistical patterns that linked images to dogs during training.
• If a supervised learning classifier encounters something very different from the training data, it can get confused and will just output nonsense.
5
Other types of learning?
• While supervised learning makes up the majority of industrial AI, it requires labelled examples.
• Unsupervised learning: imagine that, as a bank, you have a large number of customers. You would like to group them into different market segments, but you don't know how.
• In this example, we would like an algorithm that looks at a lot of customer data and groups the customers into segments. This is an example of unsupervised learning.

6
Reinforcement
learning
• In reinforcement learning, we train agents that take actions in an environment, such as a self-driving car on the road or an asset manager taking positions. While we do not have labels, that is, we cannot say what the correct action is in any situation, we can assign rewards or punishments.

• For example, we could reward keeping a proper distance from the car in front.

7
Let us concentrate
today on Unsupervised
learning

8
What can we do with unsupervised
learning?
• Clustering
• K-means, K-means++
• CAH (agglomerative hierarchical clustering)
• DBSCAN

• Dimensionality reduction
• PCA
• Auto-encoder

• Generative models
• GAN

9
What Is
Clustering?

10
Clustering
Models
• So far, we have discussed supervised learning
where we were predicting a known class label
• Clustering models are unsupervised
• We are trying to learn and understand patterns in
unlabeled data
• The goal is to group similar data points into
segments/clusters
• You may hear "clustering" and "segmentation" both used to describe these models; they are synonymous
• Business stakeholders are often more familiar with "segmentation" than "clustering"

11
Mathematically

Divide data into meaningful, homogeneous subsets/clusters/classes:

• For a better understanding of the underlying processes of data generation
• As an initialization for other tasks (e.g., supervised classification)

12
Clustering
Main ingredients
• The number of clusters, k
• The distance between points, d
• Evaluation of the quality of clusters
• Comparison between different clustering results
• The optimization procedure

13
Clustering
Approaches
• Hierarchical (divisive or agglomerative)
• Centroid or partition-based
• Density-based
• Statistical modeling-based

14
Clustering Use
Cases
• Customer segmentation
• Rewards data misuse detection
• Segmentation on product and customer
strategy
• Anomaly detection

15
K‐Means
Clustering

16
K‐Means
Procedure
1. Select the number of clusters before running the model, often called k
2. Randomly choose k centroids (cluster centers)
   • K-Means++ can be used to reduce randomness by placing the initial cluster centers far apart
3. Calculate the distance of each data point to all cluster centers and assign each data point to the closest cluster
4. Find the new centroid of each cluster by taking the mean of all data points in the cluster
5. Use the new centroids and repeat steps 3 and 4 until the cluster centers stop moving (see the sketch below)
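As a rough illustration of these steps, a minimal sketch using scikit-learn and a toy 2-D dataset (the dataset and parameter values here are illustrative assumptions, not part of the lecture):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data: 500 points drawn around 4 blob centers (illustrative only)
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# init="k-means++" spreads the initial centroids far apart (step 2);
# fit_predict() alternates assignment and centroid updates (steps 3-5) until convergence
kmeans = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # final centroids
print(kmeans.inertia_)          # sum of squared distances to the closest centroid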

17
K-Means Visually

18
Image source: https://towardsdatascience.com/k-means-clustering-explained-4528df86a120
K-means: pitfalls

(Slides 19-23: figures illustrating pitfalls of K-means; images only.)
K-Means Pros and Cons

Pros
• Easy to interpret
• Scalable to large data sets

Cons
• Easy to overfit, and only a small number of features can be used
• Does not handle highly correlated features well
• Number of clusters has to be preset
• Can only draw linear boundaries; if your data has non-linear boundaries, it will not perform well
• Sensitive to outliers
• Slows down substantially as the number of samples increases, because distances between all data points and centroids must be recalculated with each adjustment
24
Clustering
Evaluation

25
Cluster Evaluation Metrics:
Inertia
• Inertia: the sum of squared distances of all samples to their closest centroid (cluster center)
• Distortion: the weighted sum of the squared distances from each data point to its centroid
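For concreteness, inertia can be computed by hand and compared with scikit-learn's attribute (a sketch, assuming the fitted kmeans model and data X from the earlier K-means example):

import numpy as np

# Squared distance from every point to every centroid; keep the minimum per point
d2 = ((X[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
inertia_manual = d2.min(axis=1).sum()
print(inertia_manual, kmeans.inertia_)  # should agree up to floating-point error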

26
Cluster Evaluation Metrics:
Inertia
• Inertia always decreases as the number of clusters grows, so look for a leveling-off point
• In the example plot, the curve levels off at around 4 to 5 segments
• There is no rule of thumb for a "good" inertia value; you can only compare multiple models to each other

27
Cluster Evaluation Metrics:
Distortion
• Distortion also always decreases, so look for a leveling-off point
• In the example plot, the curve levels off at around 4 to 5 segments
• Again, there is no rule of thumb for a "good" distortion value; you can only compare multiple models to each other

28
Image source: https://livebook.manning.com/concept/r/dunn-index
Cluster Evaluation Metrics: Elbow
Method
• Used to choose the optimal number of clusters
• Vary the number of clusters and monitor the evaluation metrics
• Look for where the slope becomes less steep and the metric improves less rapidly, showing "diminishing returns"

(Figure: elbow plot, with the steeper and less steep regions of the curve labelled.)

29
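A minimal sketch of the elbow method in code (assuming scikit-learn, matplotlib, and the toy X from the earlier K-means example):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

inertias = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)

# Plot k against inertia and pick the k where the curve bends (the "elbow")
plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()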
Cluster Evaluation Metrics:
Silhouette Score

Silhouette Score: for each sample, the difference between the mean inter-cluster distance (distance to the nearest other cluster) and the mean intra-cluster distance, normalized by the maximum of the two; the final score is the average over all samples.
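Written out, with a(i) the mean intra-cluster distance of sample i and b(i) the mean distance from i to the points of the nearest other cluster (the standard definition):

s(i) = (b(i) - a(i)) / max(a(i), b(i))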

30
Source: https://www.analyticsvidhya.com/blog/2021/05/k-mean-getting-the-optimal-number-of-clusters/
Cluster Evaluation Metrics:
Silhouette Score
• Individual scores vary from -1 to +1
• The silhouette score is the average across all data points

Interpretation
+1: the sample is far away from the neighboring clusters
0: the sample is on or very near the decision boundary between neighboring clusters
-1: the sample may have been assigned to the wrong cluster

31
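A small usage sketch (assuming scikit-learn and the X and labels from the earlier K-means example):

from sklearn.metrics import silhouette_score, silhouette_samples

print(silhouette_score(X, labels))          # average silhouette score across all samples
per_point = silhouette_samples(X, labels)   # individual scores in [-1, +1], used for silhouette plots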
Cluster Evaluation Metrics:
Silhouette Plots
• The thickness of each band represents the cluster size
• The silhouette score is shown on the horizontal axis

32
Image source: https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
33
Image source: https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
Tips and
Tricks

34
Incorporating Business Knowledge

• Unlike supervised learning, there is no "right answer" in segmentation
• Using evaluation metrics, we can exclude many "wrong" answers
• Always weight business value equally with the evaluation metrics
• Example: in the chart on the right, 2 clusters has the best evaluation metric, but 2 clusters would be useless to the business

35
Avoiding
Overfitting
• Especially with categorical variables, be mindful of
overfitting
• An overfitted clustering model may automatically
put customers into one segment because of one
variable even if that variable is not important
• Always be wary if 0% of a categorical variable
is in one segment
• This may happen and be correct but you should always
make sure it makes “business sense”
• More commonly it is caused by overfitting
• Example when not overfitted: Rewards segmentation
with no use of points in lower value segments
• Example when overfitted: A common product is used
by 0% of a segment
36
Data with
Outliers
• K-Means and hierarchical clustering are often ineffective on data with extreme outliers
• When this is an issue, all of the best customers may end up in one segment while the other segments look very similar
• The model focuses on parsing out the "best" data points and loses power on the "least valuable" data points
• In these cases, tend towards density-based clustering methods such as DBSCAN, or Gaussian Mixture Modeling
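For reference, a minimal Gaussian Mixture Model sketch (assuming scikit-learn and the toy X from earlier; the number of components is an illustrative assumption):

from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=4, random_state=42).fit(X)
gmm_labels = gmm.predict(X)        # hard cluster assignments
gmm_probs = gmm.predict_proba(X)   # soft (probabilistic) cluster memberships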

37
Utilize
Weights!
• Weights are one of the best tools you
have in both segmentation and predictive
models
• Often useful to give higher weight to rows
demonstrating patterns of high business value
• Often creates smaller “good” groups and larger
“worse” groups
• Example: In an automotive dealer repair
segmentation we weighted rows with higher
dealership repair spend and shorter recency as
being 20% more important
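One way to apply such row weights in K-means is scikit-learn's sample_weight argument. A sketch, assuming the toy X from earlier; the choice of which rows to up-weight is purely illustrative:

import numpy as np
from sklearn.cluster import KMeans

w = np.ones(len(X))
w[:100] = 1.2   # illustrative stand-in for "high business value" rows: counted 20% more
kmeans_w = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X, sample_weight=w)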

38
Criteria for Dividing
Clusters
The linkage criterion defines how the distance between two clusters is measured, and therefore which clusters are merged with one another.

• Ward's Linkage: minimizes the variance of the clusters; the merge chosen is the one with the smallest increase in variance. (Biased towards globular clusters; good with noisy data.)
• Average Linkage: uses the average distance between the points of the two clusters. (Biased towards globular clusters; good with noisy data.)
• Centroid Linkage: uses the distance between the cluster centroids (the mean of all data points in each cluster). (Good with noisy data; best with globular clusters.)
• Complete Linkage: uses the distance between the two farthest data points, one from each cluster. (Good with noisy data; often breaks data into large clusters; best with globular clusters.)
• Single Linkage: uses the distance between the two closest data points, one from each cluster. (Less impacted by outliers; prone to noise.)

39
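A brief agglomerative-clustering sketch with a selectable linkage (assuming scikit-learn and the toy X from earlier):

from sklearn.cluster import AgglomerativeClustering

# linkage can be "ward", "average", "complete", or "single"
agg = AgglomerativeClustering(n_clusters=4, linkage="ward")
agg_labels = agg.fit_predict(X)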
Criteria for Dividing Clusters

Source: https://dataaspirant.com/hierarchical-clustering-algorithm/#t-1608531820
Choosing the Number of Clusters
• When clusters are combined, a dendrogram of each combination is created
• A vertical line represents the distance between the two clusters being merged
• The larger the vertical distance, the more dissimilar the two clusters are from one another
• To choose the number of clusters, draw a horizontal line that cuts the dendrogram across the tallest vertical line (see the sketch below)

41
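A minimal dendrogram sketch (assuming SciPy, matplotlib, and the toy X from earlier):

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

Z = linkage(X, method="ward")   # records each merge and the distance at which it happens
dendrogram(Z)                   # cut across the tallest vertical line to choose the number of clusters
plt.show()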
Hierarchical Clustering Pros
and Cons
Pros
• No need to set the number of clusters before modeling
• There are more "levers to pull" and tweak to fit the model to your data

Cons
• More complex to understand and explain than K‐Means
• More difficult to tune
• Not scalable to large data sets

42
DBSCAN

43
DBSCAN

44
DBSCAN—Algorithm

Let ClusterCount = 0. For every point p:

1. If p is not a core point, assign a null label to it (e.g., zero).
2. If p is a core point, a new cluster is formed (with label ClusterCount := ClusterCount + 1). Then find all points density-reachable from p and assign them to the cluster (reassign the zero labels, but not the others).

Repeat this process until all of the points have been visited. (See the sketch below.)
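A short usage sketch (assuming scikit-learn and the toy X from earlier; eps and min_samples are illustrative values that need tuning per dataset):

from sklearn.cluster import DBSCAN

# eps: neighborhood radius; min_samples: how many neighbors make a point a core point
db = DBSCAN(eps=0.5, min_samples=5)
db_labels = db.fit_predict(X)   # label -1 marks noise points that belong to no cluster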

45
DBSCAN ‐ Large Eps

46
DBSCAN ‐ Optimal Eps

47
In application

48
DBSCAN

49
Thank You

50
