Clustering Machine Learning Algorithms (2)

Uploaded by

Amine Benattouch
Copyright
© All Rights Reserved
Clustering Machine Learning Algorithms
Outline

01 ML Categories
02 What is clustering?
03 K-Means
04 Hierarchical Clustering


Machine Learning Categories

(1) Supervised: Classification & Regression Problems
● Used to train machines using labeled data
● Takes labeled inputs and maps them to known outputs (you already know the target variable)
Machine Learning Categories

(2) Unsupervised: Clustering & Association Problems
● Uses unlabeled data to discover patterns and features in the data
● Understands patterns and trends in the data and discovers the output
Machine Learning Categories

(3) Reinforcement: Reward-Based Problems
● Uses an agent and an environment to produce actions and rewards
● Follows a trial-and-error method to arrive at the final solution
● Agent receives a reward after finishing the task
Clustering

Grouping similar objects together into clusters, according to some predefined similarity or dissimilarity measure


Clustering Methods

01 Partitional
02 Hierarchical
Partitional Clustering

Database of ‘n’ objects → ‘k’ partitions of the data

Satisfying:
- Each group contains at least one object
- Each object belongs to exactly one cluster

Process:
- Create an initial partitioning
- Use an iterative relocation technique to improve the partitioning
K-Means

Stop Condition
- A defined maximum number of iterations is reached
- Inertia no longer decreases, or decreases only insignificantly
(Inertia is the sum of squared distances between each point and the centroid of its cluster. It decreases from one iteration to the next, so the clusters become more compact.)
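The two stop conditions above can be illustrated with a minimal pure-Python sketch. The function name `kmeans`, the parameters `max_iter` and `tol`, and the choice of random data points as starting centroids are assumptions made for this illustration; they are not taken from the slides.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Run K-Means once; returns (centroids, labels, inertia, iterations)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random data points
    previous_inertia = float("inf")
    for iteration in range(1, max_iter + 1):  # stop condition 1: iteration cap
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Inertia: sum of squared distances to the assigned centroid.
        inertia = sum(squared_distance(p, centroids[label])
                      for p, label in zip(points, labels))
        # Stop condition 2: inertia no longer decreases significantly.
        if previous_inertia - inertia < tol:
            break
        previous_inertia = inertia
    return centroids, labels, inertia, iteration

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, inertia, iterations = kmeans(points, k=2)
print(labels, round(inertia, 3), iterations)
```

On this toy data the loop stops after a few iterations, well before `max_iter`, because the inertia stops improving once the two groups are separated.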
K-Means

Avoiding Local Optima

By minimizing initialization bias:
We run the K-Means algorithm from several different random initializations (for example 10) and calculate the inertia of each run.
We then keep the best run (lowest inertia).
In scikit-learn (Python) the number of runs is set via the n_init parameter of KMeans; in R's kmeans() the equivalent parameter is nstart.
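The restart strategy can be sketched in plain Python as follows. The helper names `run_kmeans` and `best_of_restarts` are hypothetical and exist only for this example. The argument `n_init` simply borrows the name of the scikit-learn parameter; this is not the scikit-learn implementation.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def run_kmeans(points, k, rng, max_iter=100):
    """One K-Means run from a random start; returns (inertia, labels)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return inertia, labels

def best_of_restarts(points, k, n_init=10, seed=0):
    """Run K-Means n_init times and keep the run with the lowest inertia."""
    rng = random.Random(seed)
    runs = [run_kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])

points = [(0, 0), (1, 0), (5, 0), (6, 0), (10, 0), (11, 0), (20, 0)]
best_inertia, best_labels = best_of_restarts(points, k=3, n_init=10)
print(best_inertia, best_labels)
```

Each run may settle in a different local optimum depending on its starting centroids. Taking the minimum inertia over all runs reduces the influence of an unlucky start.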
K-Means

Advantages
- Fast
- Can serve as a data reduction technique
K-Means

Disadvantages
- It has a tendency to identify clusters with the same size and volume (spherical shapes)
- Unable to identify elongated or non-convex clusters
K-Means

Practical Use

Text Mining
Predictive Marketing
Hierarchical Clustering

Database of ‘n’ objects → Dendrogram
Hierarchical Clustering

Top-Down (divisive)

Starts from a single cluster containing all objects and splits it repeatedly. Clustering continues until small groups of similar objects are obtained.
Hierarchical Clustering

Bottom-Up (agglomerative)

Starts with every object as its own cluster and merges the closest clusters. Clustering continues until a single cluster is obtained.
Hierarchical Clustering
Algorithm

Step 1: Consider every data point as an individual cluster.
Step 2: Calculate the proximity matrix between all clusters.
Step 3: Merge the two clusters that are most similar (closest to each other).
Step 4: Recalculate the proximity matrix for the new set of clusters.
Step 5: Repeat Steps 3 and 4 until only a single cluster remains.
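The five steps above can be sketched as a naive bottom-up loop. The function name `agglomerative`, the use of Euclidean distance, and the `linkage` argument (`min` for single link, `max` for complete link) are assumptions for illustration. The proximities are recomputed from scratch on every pass, which is simple but slow.

```python
def agglomerative(points, linkage=min):
    """Merge clusters bottom-up; returns the merges as (cluster_a, cluster_b, distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # Step 1: every point starts as its own cluster (a tuple of point indices).
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:  # Step 5: repeat until one cluster remains
        # Steps 2 and 4: proximity between every pair of current clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Step 3: merge the closest pair.
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return merges

points = [(0,), (1,), (5,), (6,), (20,)]
for left, right, distance in agglomerative(points):
    print(left, right, distance)
```

The sequence of merges, together with the distance at which each merge happens, is the information a dendrogram displays.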
Hierarchical Clustering
Linkage Methods

       A    B    C    D    E
A      0
B      1    0
C      2    2    0
D      2    5    3    0
E      3    4    6    6    0
Hierarchical Clustering
Linkage Methods

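After merging A and B (distances to the merged cluster A,B still to be computed, shown as ?):

       A,B  C    D    E
A,B    0
C      ?    0
D      ?    3    0
E      ?    6    6    0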
Hierarchical Clustering
Linkage Methods

Single Link

       A,B  C    D    E
A,B    0
C      2    0
D      2    3    0
E      3    6    6    0
Hierarchical Clustering
Linkage Methods

Complete Link

       A,B  C    D    E
A,B    0
C      2    0
D      5    3    0
E      4    6    6    0
Hierarchical Clustering
Linkage Methods

Average

       A,B  C    D    E
A,B    0
C      2    0
D      3.5  3    0
E      3.5  6    6    0
Hierarchical Clustering
Linkage Methods

Average: after merging (A,B) and C (distances to the merged cluster still to be computed, shown as ?):

          (A,B),C  D    E
(A,B),C   0
D         ?        0
E         ?        6    0
Hierarchical Clustering
Linkage Methods

Average

          (A,B),C  D     E
(A,B),C   0
D         3.33     0
E         4.33     6     0
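The entries in the linkage tables above can be checked with a short script. The dictionary `D` holds the pairwise distances from the original proximity matrix. The helper names `d` and `cluster_distance` are hypothetical and only serve this worked example.

```python
from itertools import product

# Pairwise distances copied from the original 5x5 proximity matrix.
D = {
    ("A", "B"): 1, ("A", "C"): 2, ("B", "C"): 2, ("A", "D"): 2, ("B", "D"): 5,
    ("C", "D"): 3, ("A", "E"): 3, ("B", "E"): 4, ("C", "E"): 6, ("D", "E"): 6,
}

def d(x, y):
    """Symmetric lookup of the distance between two single objects."""
    return D[(x, y)] if (x, y) in D else D[(y, x)]

def cluster_distance(c1, c2, linkage):
    """Distance between two clusters (strings of object names) under a linkage rule."""
    pair_distances = [d(x, y) for x, y in product(c1, c2)]
    if linkage == "single":
        return min(pair_distances)
    if linkage == "complete":
        return max(pair_distances)
    if linkage == "average":
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")

# Row for the merged cluster A,B under each rule.
for linkage in ("single", "complete", "average"):
    row = {other: cluster_distance("AB", other, linkage) for other in "CDE"}
    print(linkage, row)
# single   {'C': 2, 'D': 2, 'E': 3}
# complete {'C': 2, 'D': 5, 'E': 4}
# average  {'C': 2.0, 'D': 3.5, 'E': 3.5}

# Second average-link merge: (A,B) with C, then distances to D and E.
print(round(cluster_distance("ABC", "D", "average"), 2))  # 3.33
print(round(cluster_distance("ABC", "E", "average"), 2))  # 4.33
```

The values 3.33 and 4.33 are averages over all three original objects, (2 + 5 + 3) / 3 and (3 + 4 + 6) / 3. They are not averages of the previous table's entries.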
Hierarchical Clustering
Linkage Methods

- Single link criterion
- Complete link criterion
- Average criterion
- Centroid criterion: distance between cluster means
- Ward’s criterion: minimize total within-cluster variance
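For reference, the five criteria listed above are commonly written as follows. The notation is an assumption of this note rather than the slides': C_i and C_j are two clusters, d(x, y) is the distance between two objects, and mu_i is the mean of C_i.

```latex
\begin{align*}
\text{Single link:}   \quad & d(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} d(x, y) \\
\text{Complete link:} \quad & d(C_i, C_j) = \max_{x \in C_i,\; y \in C_j} d(x, y) \\
\text{Average:}       \quad & d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y) \\
\text{Centroid:}      \quad & d(C_i, C_j) = \lVert \mu_i - \mu_j \rVert \\
\text{Ward:}          \quad & \Delta(C_i, C_j) = \frac{|C_i|\,|C_j|}{|C_i| + |C_j|} \, \lVert \mu_i - \mu_j \rVert^2
\end{align*}
```

Ward's quantity is the increase in the total within-cluster sum of squares caused by merging C_i and C_j. At each step the pair with the smallest increase is merged.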
Hierarchical Clustering

Advantages
It overcomes the spherical shape problem of K-Means

Disadvantages
High complexity level because it makes a huge number of calculations between groups
Good luck!
