Lecture 3

Clustering

• What is Unsupervised learning?
• K-means clustering
• Hierarchical clustering
• Gaussian mixture model

What is Unsupervised Learning?

Unsupervised learning, also called Descriptive analytics, describes a family of methods for uncovering latent structure in data.
In Supervised learning, aka Predictive analytics, our data consisted of observations (xi, yi), xi ∈ R^p, i = 1, . . . , n. Such data is called labelled, and the yi are thought of as the labels for the data.
In Unsupervised learning, we just look at data xi, i = 1, . . . , n. This is called
unlabelled data.
Even if we have labels yi, we may still wish to temporarily ignore the yi and conduct unsupervised learning on the inputs xi.
Examples of clustering tasks:
• Identify similar groups of online shoppers based on their browsing and purchasing history.
• Identify similar groups of music listeners or movie viewers based on their ratings or recent listening/viewing patterns.
• Cluster input variables based on their correlations to remove redundant predictors from consideration.
• Cluster hospital patients based on their medical histories.
• Cluster labeled data to see how classes are separated by features.

Left: Data. Right: One possible way to cluster the data.


Here's a less clear example. How should we partition it?
Here's one reasonable clustering.
A clustering is a partition {C1, . . . , CK}, where each Ck denotes a subset of the observations.
Each observation belongs to one and only one of the clusters.
To denote that the i-th observation is in the k-th cluster, we write i ∈ Ck.
Method: K-means clustering

Main idea: A good clustering is one for which the within-cluster variation is as
small as possible.
The within-cluster variation for cluster Ck is some measure of the amount by which the observations within that cluster differ from one another.
We'll denote it by WCV(Ck).
Goal: Find C1, . . . , CK that minimize

    Σ_{k=1..K} WCV(Ck).
This says: Partition the observations into K clusters such that the WCV summed
up over all K clusters is as small as possible.
How to define within-cluster variation?
Goal: Find C1, . . . , CK that minimize Σ_{k=1..K} WCV(Ck).
Typically, we use squared Euclidean distance:

    WCV(Ck) = (1 / |Ck|) Σ_{i, i' ∈ Ck} Σ_{j=1..p} (xij − xi'j)²,

where |Ck| denotes the number of observations in cluster k.


To be clear: we're treating K as fixed ahead of time. We are not optimizing K as
part of this objective.
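As a concrete illustration, here is a small sketch (not from the slides) that evaluates this objective directly from the pairwise definition above; it assumes X is an (n, p) NumPy array and labels assigns each row to a cluster 0, . . . , K−1:

import numpy as np

def total_wcv(X, labels, K):
    """Sum of WCV(Ck) over k: for each cluster, add up the squared Euclidean
    distances between all (ordered) pairs of its points, divided by |Ck|."""
    total = 0.0
    for k in range(K):
        cluster = X[labels == k]
        if len(cluster) == 0:
            continue  # skip empty clusters in this simple sketch
        diffs = cluster[:, None, :] - cluster[None, :, :]   # all pairwise differences
        total += (diffs ** 2).sum() / len(cluster)
    return total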
Simple example
How do we minimize WCV?

It's computationally infeasible to actually minimize this criterion.
We essentially have to try all possible partitions of n points into K sets.
When n = 10, K = 4, there are 34,105 possible partitions.
When n = 25, K = 4, there are about 5 × 10^13…
We're going to have to settle for an approximate solution.
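For the curious, these counts are the number of ways to split n points into K non-empty groups (Stirling numbers of the second kind); a quick sketch, not from the slides, to reproduce them:

def num_partitions(n, k):
    """Number of ways to partition n points into k non-empty clusters,
    via the recurrence S(n, k) = k * S(n-1, k) + S(n-1, k-1)."""
    if k == 0:
        return 1 if n == 0 else 0
    if n == 0 or k > n:
        return 0
    return k * num_partitions(n - 1, k) + num_partitions(n - 1, k - 1)

print(num_partitions(10, 4))   # 34105
print(num_partitions(25, 4))   # roughly 5e13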
K-means algorithm

It turns out that we can rewrite WCV(Ck) more conveniently:

    WCV(Ck) = 2 Σ_{i ∈ Ck} Σ_{j=1..p} (xij − x̄kj)²,

where x̄k = (x̄k1, . . . , x̄kp) is just the average (centroid) of all the points in cluster Ck.
So, let's try the following (a code sketch follows the steps):
K-means algorithm:
1. Start by randomly partitioning the observations into K clusters.
2. Until the clusters stop changing, repeat:
a. For each cluster, compute the cluster centroid x¯k,
b. Assign each observation to the cluster whose centroid is the closest.
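A minimal NumPy sketch of these two steps (not from the slides); it assumes X is an (n, p) array and, for simplicity, does not handle clusters that become empty:

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Minimal K-means: random initial partition, then alternate the two steps."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=X.shape[0])   # step 1: random partition
    for _ in range(n_iter):
        # step 2a: compute each cluster's centroid
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # step 2b: assign each observation to the nearest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):     # clusters stopped changing
            break
        labels = new_labels
    return labels, centroids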
K-means demo with K = 3
Summary of K-means
We'd need to minimize Σ_{k=1..K} WCV(Ck).
It's infeasible to actually optimize this in practice, but K-means at least gives us
a so-called local optimum of this objective.
The result we get depends both on K, and also on the random initialization that
we wind up with.
It's a good idea to try different random starts and pick the best result among
them.
There's a method called K-means++ that improves how the clusters are
initialized.
A related method, called K-medoids, clusters based on distances to a representative point (the medoid), which is chosen to be one of the observations in each cluster.
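For reference, both ideas are available in scikit-learn's KMeans; a usage sketch with placeholder data (not from the slides): init="k-means++" uses the K-means++ initialization, and n_init controls the number of random starts, with the best run kept.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))   # placeholder data

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)

print(km.inertia_)           # within-cluster sum of squares of the best run
print(km.cluster_centers_)   # the K centroids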
Hierarchical clustering
K-means is an objective-based approach that requires us to pre-specify the number
of clusters K.
The answer it gives is somewhat random: it depends on the random initialization
we started with.
Hierarchical clustering is an alternative approach that does not require a pre-
specified choice of K, and which provides a deterministic answer (no
randomness).
We'll focus on bottom-up or agglomerative hierarchical clustering.
Top-down or divisive clustering is also good to know about, but we won't directly
cover it here.
Dendrogram
Left: Dendrogram obtained from complete linkage clustering
Center: Dendrogram cut at height 9, resulting in K = 2 clusters
Right: Dendrogram cut at height 5, resulting in K = 3 clusters
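With SciPy this workflow looks roughly as follows (a sketch with random placeholder data, not the slides' data; the cut heights are illustrative):

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(9, 2))   # placeholder data

Z = linkage(X, method="complete")    # agglomerative clustering, complete linkage
dendrogram(Z)                        # draw the dendrogram (requires matplotlib)

labels_a = fcluster(Z, t=9, criterion="distance")   # cut the tree at height 9
labels_b = fcluster(Z, t=5, criterion="distance")   # cut the tree at height 5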
Interpreting dendrograms

Observations 5 and 7 are similar to each other, as are observations 1 and 6.
Observation 9 is no more similar to observation 2 than it is to observations 8, 5 and 7.
This is because observations {2, 8, 5, 7} all fuse with 9 at height 1.8.
Linkages
Let dij = d(xi, xj) denote the dissimilarity (distance) between observations xi and xj.
At our first step, each cluster is a single point, so we start by merging the two
observations that have the lowest dissimilarity.
But after that…we need to think about distances not between points, but between
sets (clusters).
The dissimilarity between two clusters is called the linkage.
That is, given two sets of points, G and H, a linkage is a dissimilarity measure d(G, H) telling us how different the points in these sets are.
Let's look at some examples.
Common linkage types
Complete. Maximal inter-cluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B and record the largest of these dissimilarities.
Single. Minimal inter-cluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B and record the smallest of these dissimilarities.
Average. Mean inter-cluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B and record the average of these dissimilarities.
Centroid. Dissimilarity between the centroid for cluster A (a mean vector of length p) and the centroid for cluster B. Centroid linkage can result in undesirable inversions.
Ward. Merges the pair of clusters whose fusion gives the smallest increase in total within-cluster variance.
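All of these linkages are available through SciPy's linkage function; a sketch (not from the slides) with random placeholder data, cutting each tree into three clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(1).normal(size=(30, 2))   # placeholder data

for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)                     # build the dendrogram
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut it into 3 clusters
    print(method, np.bincount(labels)[1:])            # resulting cluster sizes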


Single linkage
dij = d(xi, xj) is the pairwise distance; the single linkage score dsingle(G, H) = min_{i ∈ G, j ∈ H} dij is the distance of the closest pair.
Complete linkage
The complete linkage score dcomplete(G, H) = max_{i ∈ G, j ∈ H} dij is the distance of the farthest pair.
Average linkage
The average linkage score daverage(G, H) = (1 / (|G||H|)) Σ_{i ∈ G, j ∈ H} dij is the average of all pairwise distances.
Shortcomings of Single and Complete linkage

Single and complete linkage have some practical problems:
• Single linkage suffers from chaining. In order to merge two groups, we only need one pair of points to be close, irrespective of all others. Therefore clusters can be too spread out, and not compact enough.
• Complete linkage avoids chaining but suffers from crowding. Because its score is based on the worst-case dissimilarity between pairs, a point can be closer to points in other clusters than to points in its own cluster. Clusters are compact, but not far enough apart.
• Average linkage tries to strike a balance. It uses average pairwise dissimilarity, so clusters tend to be relatively compact and relatively far apart.
CHAINING versus CROWDING
Shortcomings of average linkage
Average linkage has its own problems:
• Unlike single and complete linkage, average linkage doesn't give us a nice interpretation when we cut the dendrogram.
• Results of average linkage clustering can change if we simply apply a monotone increasing transformation to our dissimilarity measure (for example, replacing dij with dij²).
This can be a big problem if we're not sure precisely what dissimilarity measure we want to use.
Single and complete linkage do not have this problem: they depend only on the ordering of the dissimilarities, which a monotone transformation preserves. A small experiment is sketched below.
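A sketch of that experiment (not from the slides; SciPy, random placeholder data): squaring the distances is a monotone increasing transformation, and we check whether the merge order recorded in the linkage matrix changes.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

X = np.random.default_rng(0).normal(size=(20, 2))
d = pdist(X)         # condensed pairwise distance matrix
d2 = d ** 2          # a monotone increasing transformation of the dissimilarities

for method in ["single", "complete", "average"]:
    same = np.array_equal(
        linkage(d, method=method)[:, :2],    # which clusters merge, and in what order
        linkage(d2, method=method)[:, :2],
    )
    print(method, "same merge order:", same)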
Gaussian Mixture Model (GMM)

Multivariate Gaussian distribution: a random vector x ∈ R^p has the MVN(µ, Σ) distribution if its density is

    f(x) = (2π)^(−p/2) |Σ|^(−1/2) exp( −(1/2) (x − µ)' Σ^(−1) (x − µ) ).


Gaussian Mixture Model:
• Assume each observation has probability πk of coming from cluster k.
• Assume that all observations from cluster k are drawn randomly from a MVN(µk, Σk) distribution.
• In other words, we are assuming that there are latent class labels that we do not observe.
Expectation – Maximization Algorithm
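scikit-learn's GaussianMixture fits this model by the EM algorithm; a minimal usage sketch with placeholder data (not from the slides):

import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(300, 2))   # placeholder data

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(X)                       # parameters are estimated by EM

print(gmm.weights_)              # estimated mixing proportions pi_k
print(gmm.means_)                # estimated component means mu_k
resp = gmm.predict_proba(X)      # soft assignments: P(cluster k | observation i)
labels = gmm.predict(X)          # hard assignments: most probable cluster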
PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA using scikit-learn

import numpy as np
from sklearn.decomposition import PCA

# Placeholder for reading the data: X should be an (n, p) NumPy array.
X = np.random.default_rng(0).normal(size=(100, 5))

pca = PCA()
pca.fit(X)

print(pca.explained_variance_ratio_)   # fraction of variance explained by each component
print(pca.mean_)                       # per-feature means used to center the data
C = pca.components_                    # principal directions (one component per row)
Y = pca.transform(X)                   # scores: the data projected onto the components
