
Artificial Intelligence

Machine Learning
Unsupervised learning
Disclaimer Statement
In preparing these slides, material has been taken from various online
sources, including books, websites, research papers, and presentations.
The author does not intend to claim any of this material as her/his own.
This lecture (audio, video, slides, etc.) is prepared and delivered for
educational purposes only and is not intended to infringe upon copyrighted
material. Sources have been acknowledged where applicable. The views
expressed are the presenter's alone and do not necessarily represent those
of the original authors or the institution.
Unsupervised Learning
Training data: "examples" x only, with no labels:

x_1, ..., x_n,  x_i ∈ X ⊆ R^d

• Clustering/segmentation:

f : R^d → {C_1, ..., C_k}  (a set of k clusters)

Example: find clusters in a population, among fruits, or among species.


Unsupervised learning
[Scatter plot: Feature 1 vs. Feature 2 — find clusters in the population
using these two features.]
Unsupervised learning
[Scatter plot: Feature 1 vs. Feature 2, with the discovered clusters marked.]
Methods: k-means, Gaussian mixtures, hierarchical agglomerative clustering,
spectral clustering, DBSCAN, etc.
Clustering examples
• Clustering of the population by their demographics.
• Clustering of geographic objects (mineral deposits, houses, etc.).
• Clustering of stars.
• Audio signal separation. Example?
• Image segmentation. Example?
K-Means: example
Clustering of the population by their demographics.

K-Means: example
Clustering of geographic objects (mineral deposits, houses, etc.).

K-Means: example
Clustering of stars.
Clustering: K-Means
• Goal: assign each example (x_1, ..., x_n) to one of the k clusters
{C_1, ..., C_k}.

• µ_j is the mean of all examples in the j-th cluster.

• Minimize the within-cluster sum of squared distances:

  J = Σ_{j=1}^{k} Σ_{x_i ∈ C_j} ||x_i − µ_j||²
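The objective J can be evaluated directly from a set of clusters and their means. A minimal plain-Python sketch (the toy points, assignment, and means below are made up for illustration):

```python
def kmeans_objective(clusters, means):
    """Sum of squared distances of each 1-D point to its cluster mean (the K-means J)."""
    return sum((x - m) ** 2
               for points, m in zip(clusters, means)
               for x in points)

# Two toy 1-D clusters with their means:
# J = (1 - 1.5)^2 + (2 - 1.5)^2 + (10 - 10)^2 = 0.5
J = kmeans_objective([[1.0, 2.0], [10.0]], [1.5, 10.0])
```

K-means alternately reduces J by reassigning points and recomputing means, as the algorithm below makes explicit.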
Clustering: K-Means
Algorithm (K-Means):
  Initialize µ_1, ..., µ_k randomly.
  Repeat:
    Assign each point x_i to the cluster with the closest µ_j.
    Calculate the new mean for each cluster as follows:

      µ_j = (1 / |C_j|) Σ_{x_i ∈ C_j} x_i

  Until convergence*.

*Convergence: no change in the clusters OR the maximum number of
iterations is reached.
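The loop above can be sketched in plain Python. This is a minimal 1-D version for illustration (the toy data, initial means, and iteration cap are made up, and real use would rely on a library implementation):

```python
def kmeans_1d(xs, mus, max_iter=100):
    """Lloyd's algorithm on 1-D data: alternate assignment and mean-update steps."""
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the closest mean.
        clusters = [[] for _ in mus]
        for x in xs:
            j = min(range(len(mus)), key=lambda j: abs(x - mus[j]))
            clusters[j].append(x)
        # Update step: recompute each mean; keep the old mean if a cluster is empty.
        new_mus = [sum(c) / len(c) if c else m for c, m in zip(clusters, mus)]
        if new_mus == mus:   # convergence: means (hence assignments) unchanged
            break
        mus = new_mus
    return mus

# Two well-separated groups near 1 and near 10; the means migrate toward them.
data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
final_means = kmeans_1d(data, [0.0, 5.0])
```

Note the random initialization from the slide is replaced here by fixed starting means so the run is reproducible.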
K-Means: pros and cons
+ Easy to implement

BUT...

- Need to know k in advance
- Suffers from the curse of dimensionality
- Only finds a local minimum of J; sensitive to initialization
K-Means: questions
1. How to set k to optimally cluster the data?
2. How to evaluate your model?
3. How to cluster non-circular shapes?
K-Means: question 1
How to set k to optimally cluster the data?
G-means algorithm (Hamerly and Elkan, NIPS 2003):
1. Initialize k to be a small number.
2. Run k-means with those cluster centers, and store the resulting centers as C.
3. Assign each point to its nearest cluster.
4. Determine whether the points in each cluster fit a Gaussian distribution
(Anderson-Darling test).
5. For each cluster, if the points seem to be normally distributed, keep the
cluster center. Otherwise, replace it with two cluster centers.
6. Repeat from step 2 until no more cluster centers are created.
K-Means: question 2
How to evaluate your model?
• Not trivial (compared to counting the number of errors in classification).

• Internal evaluation: uses the same data. Look for high intra-cluster
similarity (documents within a cluster are similar) and low inter-cluster
similarity. E.g., the Davies-Bouldin index takes into account both the
distance inside the clusters and the distance between clusters. The lower
the value of the index, the wider the separation between different clusters,
and the more tightly the points within each cluster are grouped together.

• External evaluation: uses ground truth from external data. E.g., mutual
information, entropy, adjusted Rand index, etc.
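The Davies-Bouldin index mentioned above is simple enough to compute by hand. A minimal 1-D sketch (cluster spread is measured here as the mean absolute distance to the centroid; the toy clusters are made up, and libraries such as scikit-learn provide a ready-made version):

```python
def davies_bouldin(clusters):
    """Davies-Bouldin index for 1-D clusters: lower = better separated."""
    mus = [sum(c) / len(c) for c in clusters]            # centroids
    s = [sum(abs(x - m) for x in c) / len(c)             # average spread per cluster
         for c, m in zip(clusters, mus)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((s[i] + s[j]) / abs(mus[i] - mus[j])
                     for j in range(k) if j != i)
    return total / k

tight = davies_bouldin([[0.0, 2.0], [10.0, 12.0]])   # well separated: small index
loose = davies_bouldin([[0.0, 4.0], [3.0, 7.0]])     # overlapping: larger index
```

As the slide says, the well-separated clustering scores lower (better) than the overlapping one.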
K-Means: question 3
How to cluster non-circular shapes?

Other methods, such as spectral clustering, DBSCAN, BIRCH, etc., can
handle other cluster shapes.
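As an illustration of one density-based method that handles non-circular shapes, here is a minimal DBSCAN sketch in plain Python. The `eps`/`min_pts` values and the toy points are made-up assumptions; real use would reach for a library implementation:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN for 2-D points: density-connected clusters, noise = -1."""
    def neighbors(i):
        px, py = points[i]
        return [j for j, (qx, qy) in enumerate(points)
                if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:          # not a core point: provisionally noise
            labels[i] = -1
            continue
        cluster += 1                      # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:           # border point previously marked as noise
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:      # j is also a core point: keep expanding
                queue.extend(more)
    return labels

# Two dense groups plus one isolated noise point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Because clusters grow by chaining density-reachable neighbors rather than by distance to a single center, elongated or curved shapes that break k-means come out correctly here.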
