Lecture 3
Main idea: A good clustering is one for which the within-cluster variation is as
small as possible.
The within-cluster variation for cluster Ck is some measure of the amount by
which the observations within that cluster differ from one another.
We'll denote it by WCV(Ck).
Goal: Find C1, . . . , CK that minimize
    Σ_{k=1}^{K} WCV(Ck)
This says: Partition the observations into K clusters such that the WCV, summed
up over all K clusters, is as small as possible.
How to define within-cluster variation? The most common choice uses squared
Euclidean distance:
    WCV(Ck) = (1/|Ck|) Σ_{i,i' in Ck} Σ_{j=1}^{p} (xij − xi'j)^2,
i.e., the sum of squared distances over all pairs of observations in Ck, divided
by the cluster size |Ck|.
Goal: Find C1, . . . , CK that minimize
    Σ_{k=1}^{K} (1/|Ck|) Σ_{i,i' in Ck} Σ_{j=1}^{p} (xij − xi'j)^2
It's infeasible to optimize this exactly in practice, but K-means at least gives us
a so-called local optimum of this objective.
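To make the objective concrete, here is a minimal NumPy sketch (not from the
lecture; the toy data and labels are made up) that evaluates Σ_k WCV(Ck) for a
candidate partition, using the pairwise squared-distance definition above:

import numpy as np

def kmeans_objective(X, labels):
    # Sum of WCV(Ck) over clusters, where WCV(Ck) is the sum of squared
    # Euclidean distances over all pairs i, i' in Ck, divided by |Ck|.
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]                      # observations assigned to cluster k
        diffs = Xk[:, None, :] - Xk[None, :, :]  # all pairwise differences within Ck
        total += (diffs ** 2).sum() / len(Xk)    # WCV(Ck)
    return total

# Toy example: 6 points in 2 dimensions, assigned to K = 2 clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(kmeans_objective(X, labels))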
The result we get depends both on K and on the random initialization that we
wind up with.
It's a good idea to try different random starts and pick the best result among
them (see the sketch below).
There's a method called K-means++ that improves how the clusters are
initialized.
A related method, called K-medoids, clusters based on distances to a
representative point (the medoid) that is constrained to be one of the
observations in each cluster.
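As a concrete illustration (an assumed sketch, not the lecture's code; the data,
K = 3, and the number of restarts are all made-up choices), scikit-learn's KMeans
supports both K-means++ initialization and multiple random starts:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))      # stand-in data matrix (n = 100, p = 2)

km = KMeans(
    n_clusters=3,                  # K must be chosen in advance
    init="k-means++",              # K-means++ initialization
    n_init=10,                     # 10 random starts; the best objective value is kept
    random_state=0,
)
labels = km.fit_predict(X)

print(km.inertia_)                 # within-cluster sum of squares at the local optimum found
print(km.cluster_centers_)         # one centroid per cluster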
Hierarchical clustering
K-means is an objective-based approach that requires us to pre-specify the number
of clusters K.
The answer it gives is somewhat random: it depends on the random initialization
we started with.
Hierarchical clustering is an alternative approach that does not require a pre-
specified choice of K, and which provides a deterministic answer (no
randomness).
We'll focus on bottom-up or agglomerative hierarchical clustering.
Top-down or divisive clustering is also good to know about, but we won't directly
cover it here.
Dendrogram
Left: Dendrogram obtained from complete linkage clustering
Center: Dendrogram cut at height 9, resulting in K = 2 clusters
Right: Dendrogram cut at height 5, resulting in K = 3 clusters
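As an illustrative sketch (the data here are made up, and the cut heights in the
figure are specific to its dataset), SciPy can build a complete linkage dendrogram
and cut it to obtain a chosen number of clusters:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))                        # stand-in data matrix

Z = linkage(X, method="complete")                   # agglomerative clustering, complete linkage
# dendrogram(Z)                                     # draws the tree (requires matplotlib)

labels_K2 = fcluster(Z, t=2, criterion="maxclust")  # cut the tree to get K = 2 clusters
labels_K3 = fcluster(Z, t=3, criterion="maxclust")  # lower cut: K = 3 clusters
# fcluster(Z, t=h, criterion="distance") instead cuts at a specific height h
print(labels_K2)
print(labels_K3)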
Interpreting dendrograms
Some linkages (average linkage, for example) depend on the actual numeric values
of the dissimilarities, so a monotone transformation of the dissimilarity measure
can change the clustering they produce. This can be a big problem if we're not
sure precisely what dissimilarity measure we want to use.
Single and complete linkage do not have this problem: they depend only on the
ordering of the dissimilarities.
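A small demonstration of this point (an assumed sketch; the data are random):
squaring the dissimilarities is a monotone transformation, and complete linkage
returns the same partition either way.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

d = pdist(X)                  # pairwise Euclidean dissimilarities (condensed form)
labels_raw = fcluster(linkage(d, method="complete"), t=3, criterion="maxclust")
labels_sq = fcluster(linkage(d ** 2, method="complete"), t=3, criterion="maxclust")

# The merge decisions depend only on the ordering of the dissimilarities,
# so the two partitions agree (adjusted Rand index of 1).
print(adjusted_rand_score(labels_raw, labels_sq))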
Gaussian Mixture Models (GMM)
We are assuming that there are latent class labels that we do not observe.
Expectation-Maximization (EM) Algorithm
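A minimal scikit-learn sketch (the data, the number of components, and the seed
are illustrative assumptions, not from the lecture): GaussianMixture is fit by the
EM algorithm, and predict_proba returns the E-step responsibilities, i.e., the
posterior probabilities of the unobserved class labels.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in data: two Gaussian blobs whose class labels we pretend not to observe
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(4.0, 1.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0)  # fit with EM
gmm.fit(X)

print(gmm.weights_)          # estimated mixing proportions
print(gmm.means_)            # estimated component means
resp = gmm.predict_proba(X)  # responsibilities: P(latent label = k | x_i)
labels = gmm.predict(X)      # hard assignment to the most probable component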
PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA using scikit-learn
import numpy as np
from sklearn.decomposition import PCA

# Placeholder: load the data matrix X (n samples x p features); the file name is illustrative
X = np.loadtxt("data.csv", delimiter=",")

pca = PCA()
pca.fit(X)

print(pca.explained_variance_ratio_)  # proportion of variance explained by each component
print(pca.mean_)                      # per-feature means used to center the data

C = pca.components_                   # principal component loading vectors (one per row)
Y = pca.transform(X)                  # scores: the centered data projected onto the components
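As a side note (implied by scikit-learn's default behavior rather than stated in
the slides): transform centers the data with pca.mean_ and projects it onto the
rows of pca.components_, so the scores satisfy Y = (X - pca.mean_) @ C.T, which
can be checked with

print(np.allclose(Y, (X - pca.mean_) @ C.T))  # should print True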