ML 8

This document discusses clustering algorithms, specifically k-means clustering. It begins with an introduction to clustering and unsupervised learning. It then describes the k-means clustering algorithm, including defining the objective function, explaining the steps of the algorithm, and discussing convergence. Examples are provided to demonstrate applying k-means to datasets. The document concludes by discussing the strengths and weaknesses of k-means clustering and techniques for evaluating clustering results.

Uploaded by

Tejas Sharma


Clustering Algorithm

Mr. Rohan Pillai

Assistant Professor
Department of Electrical Engineering, DTU
Supervised Learning
Unsupervised Learning
Introduction to Clustering
Can you spot the clusters here?
Group the following datapoints into 2 clusters:

1. Single Linkage (Agglomerative) Algorithm
2. K-Means Clustering Algorithm (K = 2 here)
Various aspects of Clustering
Distance (dissimilarity) measures
Cluster evaluation (a hard problem)
Optimal clusters?
Clustering techniques
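As a small illustration of distance (dissimilarity) measures, the sketch below computes two common ones. The slides do not say which measures they have in mind, so the choice of Euclidean and Manhattan distance here is an assumption, and the function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Straight-line (L2) distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Illustrative points only (not taken from the slides).
p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

k-means, as usually stated, uses the squared Euclidean distance, which is why its cluster centers are means.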
1. k-means Clustering Algorithm
k-means objective function
k-means algorithm
k-means convergence (stopping criterion)
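The objective and the two alternating steps are commonly written as below. The notation is an assumption (C_j is cluster j, \mu_j its center, x_i a datapoint), since the slide's own formula did not survive in this text.

```latex
% Objective: total squared distance of every point to its cluster center
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2

% Assignment step: each point joins the cluster of its nearest center
C_j = \{\, x_i : \lVert x_i - \mu_j \rVert^2 \le \lVert x_i - \mu_l \rVert^2 \ \text{for all } l \,\}

% Update step: each center moves to the mean of its members
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Neither step can increase J, and n points can be split into k clusters in only finitely many ways, so the loop stops. Typical stopping criteria are: no datapoint changes cluster, the centers move less than a small tolerance, or J decreases by less than a chosen threshold.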

k-means clustering example 1:
Use the k-means algorithm to divide the following dataset into three clusters.

Step 1: Randomly initialize the cluster centers (synaptic weights)

Step 2: Determine the cluster membership of each datapoint
Step 3: Re-estimate the cluster centers (adapt the synaptic weights)
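The three steps can be sketched in plain Python as follows. This is a minimal sketch, not the slides' own code: the dataset from the slide figure is not available in this text, so the nine points below are made up, and the names kmeans, sq_dist and mean_point are hypothetical. It assumes the initial centers are drawn from the datapoints themselves and that an empty cluster simply keeps its previous center.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k distinct datapoints as the initial centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # stopping criterion: memberships unchanged
            break
        labels = new_labels
        # Step 3: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    return centers, labels

# Made-up data: three loose groups of three points each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),
          (0.0, 10.0), (1.0, 10.0), (0.0, 11.0)]
centers, labels = kmeans(points, k=3)
print(centers)
print(labels)
```

The result depends on the seed, because different initial centers can lead to different final clusters.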
k-means clustering: Strengths & Weaknesses
Strengths

❑ Simple: easy to understand and to implement

❑ Relatively efficient: time complexity = O(tkn),

where n is the number of datapoints,
k is the number of clusters, and
t is the number of iterations. (Since both k and t are usually small compared with n, k-means is considered linear in n.)

❑ Always terminates: the procedure converges after a finite number of iterations, although possibly only to a local optimum

Weaknesses
❑ Does not necessarily find the optimal configuration (it can converge to a local minimum of the objective)

❑ The algorithm is only applicable if the mean is defined.

- For categorical data, use k-modes, where the centroid is represented by the most frequent values.

❑ The user needs to specify k.

❑ The algorithm is sensitive to outliers.

❑ The result is highly sensitive to the randomly selected initial cluster centers.
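For the categorical case mentioned in the list above, a k-modes style centroid can be sketched as the most frequent value of each attribute. This is an illustration only: the records and the names mode_centroid and mismatches are hypothetical, and counting mismatched attributes is assumed as the dissimilarity.

```python
from collections import Counter

def mode_centroid(records):
    """Per-attribute most frequent value of a list of categorical records."""
    return tuple(Counter(column).most_common(1)[0][0]
                 for column in zip(*records))

def mismatches(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

# Made-up cluster of (colour, size) records.
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
centroid = mode_centroid(cluster)
print(centroid)                                 # ('red', 'small')
print(mismatches(("blue", "large"), centroid))  # 2
```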

Effects of Outliers
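A tiny numeric sketch of why outliers matter: because a center is a mean, one far-away point can drag it well outside the group it is supposed to represent. The points are made up for illustration and mean_point is a hypothetical helper.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Four tightly grouped points plus one distant outlier (illustrative values).
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
outlier = (50.0, 50.0)

clean_center = mean_point(cluster)
shifted_center = mean_point(cluster + [outlier])
print(clean_center)    # (1.5, 1.5)
print(shifted_center)  # (11.2, 11.2)
```

With the outlier included, the center no longer lies anywhere near the four original points.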
Sensitivity to initial seeds
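The sketch below shows the seed effect on a deliberately simple case: four points at the corners of a wide rectangle, clustered with k = 2 from two hand-picked sets of initial centers. The data, the starting centers and the function name lloyd are assumptions made for illustration. Both runs converge, but to different partitions with very different sums of squared errors (SSE).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def lloyd(points, centers, max_iter=100):
    """Run k-means from the given initial centers; return centers, labels, SSE."""
    centers = list(centers)
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # memberships unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse

# Corners of a 4-by-1 rectangle (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: one center near each short side. Start B: centers stacked vertically.
_, good_labels, good_sse = lloyd(points, [(0.0, 0.5), (4.0, 0.5)])
_, bad_labels, bad_sse = lloyd(points, [(2.0, 0.0), (2.0, 1.0)])
print(good_labels, good_sse)  # [0, 0, 1, 1] 1.0
print(bad_labels, bad_sse)    # [0, 1, 0, 1] 16.0
```

A common remedy is to run the algorithm from several random starts and keep the result with the lowest SSE.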
Clustering validity problem
• Problem 1:
- Deciding the optimal number of clusters that fits a dataset
• Problem 2:
- Different clustering algorithms behave differently depending on
▪ The features of the dataset (geometry and density distribution of clusters)
▪ The input parameter values (e.g., for k-means, the initial cluster choices influence the result)
• So how do we know which clustering method is better or more suitable?
• We need a clustering quality criterion.
Clustering quality criteria
One way to find the number of clusters: the 'elbow method'
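A minimal sketch of the elbow method follows: run k-means for k = 1, 2, ... and record the sum of squared errors (SSE), then look for the k after which the SSE stops dropping sharply. The data, the names kmeans_sse and elbow_curve, and the choice to keep the best of several random restarts per k are all assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_sse(points, k, rng, max_iter=100):
    """Run k-means once from random initial centers; return the final SSE."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # memberships unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean_point(members)
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

def elbow_curve(points, max_k, restarts=10, seed=0):
    """Best SSE over several random restarts, for each k from 1 to max_k."""
    rng = random.Random(seed)
    return {k: min(kmeans_sse(points, k, rng) for _ in range(restarts))
            for k in range(1, max_k + 1)}

# Made-up data: three loose groups of three points each.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
          (10.0, 0.0), (11.0, 0.0), (10.0, 1.0),
          (0.0, 10.0), (1.0, 10.0), (0.0, 11.0)]
curve = elbow_curve(points, max_k=5)
for k, sse in curve.items():
    print(k, round(sse, 2))
```

On data like this, the SSE is expected to fall steeply up to the true number of groups and flatten afterwards; the bend in that curve is the "elbow". The exact values for k > 1 depend on the random restarts.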
Reference (slides adapted from):
• Andrew Moore, CMU (https://www.cs.cmu.edu/~./awm/tutorials/kmeans11.pdf)

• http://www.mit.edu/~9.54/fall14/slides/Class13.pdf

• https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf

• CC282 Unsupervised Learning (Clustering), Lecture 7, R. Palaniappan (2008)
