Lecture_08_slides
Unsupervised learning
Announcements
▪ Unsupervised learning
• Clustering: k-Means
Introduction Linear regression Logistic regression
PCA
Projection of points onto a lower dimensional subspace
The new feature: w1x1 + w2x2
[Figure: data points plotted with axes Weight (x1) and Length]
Projection of a point onto a subspace
Subspace
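A minimal sketch of projecting a point onto a one-dimensional subspace, assuming the subspace is the line spanned by a direction w = (w1, w2). The names `project_onto_direction`, `x`, and `w` are illustrative and do not come from the slides.

```python
import numpy as np

def project_onto_direction(x, w):
    """Project point x onto the line spanned by w.

    Returns the scalar coordinate along w (the new feature)
    and the projected point expressed in the original space.
    """
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)      # normalise so that ||w|| = 1
    coord = float(np.dot(w, x))    # new feature: w1*x1 + w2*x2
    return coord, coord * w

# Example: project the point (3, 4) onto the x1 axis
coord, x_proj = project_onto_direction(np.array([3.0, 4.0]), [1.0, 0.0])
```

With a unit-length w, the scalar coordinate is exactly the weighted sum w1x1 + w2x2, which is why the projection can be read as a single new feature.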
Standardize data:
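A minimal sketch of standardization, assuming the usual convention of shifting each feature to mean 0 and scaling it to standard deviation 1. The function name `standardize` and the handling of constant features are assumptions, not taken from the slides.

```python
import numpy as np

def standardize(X):
    """Centre each column to mean 0 and scale it to standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0    # assumption: constant features are left unscaled
    return (X - mean) / std

# Two features on very different scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
Z = standardize(X)
```

After this step both features contribute on the same scale, so a feature measured in large units no longer dominates the variance.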
PCA objective:
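One common way to write the objective, assuming standardized data points x_i in R^d and a unit-norm direction w (these symbols are assumptions, not taken from the slides), is to look for the direction that maximizes the variance of the projected data:

```latex
\max_{w \in \mathbb{R}^d} \; \frac{1}{N} \sum_{i=1}^{N} \left( w^\top x_i \right)^2
\;=\; \max_{w \in \mathbb{R}^d} \; w^\top C \, w
\quad \text{subject to} \quad \|w\|_2 = 1,
\qquad
C = \frac{1}{N} \sum_{i=1}^{N} x_i x_i^\top .
```

Because C is symmetric, the maximizer is the eigenvector of C with the largest eigenvalue, which is where the eigenvalue decomposition enters.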
Eigenvalue decomposition of a symmetric matrix
Solution to PCA using eigenvalue decomposition
Principal components
Example: projection of data onto 2 principal components
PCA
Pseudo-code
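The steps named on the preceding slides (standardize, eigenvalue decomposition of a symmetric matrix, projection onto the principal components) can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the pseudo-code from the slide: the function name `pca`, the 1/N covariance normalization, and the synthetic data are all assumptions.

```python
import numpy as np

def pca(X, n_components):
    """Sketch of PCA via eigenvalue decomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)

    # 1. Standardize: zero mean, unit standard deviation per feature
    std = X.std(axis=0)
    std[std == 0] = 1.0                 # assumption: leave constant features unscaled
    Z = (X - X.mean(axis=0)) / std

    # 2. Covariance matrix of the standardized data (symmetric, d x d)
    C = Z.T @ Z / Z.shape[0]

    # 3. Eigenvalue decomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]      # columns are the principal components

    # 4. Project the data onto the principal components
    return Z @ components, components, eigvals[order]

# Synthetic data: the third feature is strongly correlated with the first
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=200)
scores, components, variances = pca(X, 2)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, so the eigenvalues are real and the eigenvectors orthonormal.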
PCA example - distinguishing texts
Defining features
Autoencoders
Autoencoder
Introduction
▪ Supervised learning
• Classification: logistic regression, k-NN, NN, Naive Bayes
• Regression: linear regression, NN
▪ Unsupervised learning
• Dimensionality reduction: PCA
• Clustering: k-means
Clustering - definition and motivation
Goal: group data points as a first step to understand the data set
▪ Toy example:
• Each data sample is a point in 2D
cluster 1
cluster 2
cluster 3 cluster 4
k-means approach to clustering
▪ k-Means idea:
• Identify k clusters of data points given N samples.
• Pick a representative (cluster center) for each cluster and add the other data points to the nearest cluster center.
k-Means
Preliminaries
▪ Determine the cluster centers so as to minimize the distance of each point to its
assigned cluster center
k-Means objective function
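The standard form of the k-means objective is given here as a sketch. The symbols x_i (data points) and c_i (index of the cluster assigned to point i) are assumptions, since the slide's own notation did not survive. The centers μ_j follow the notation used in the algorithm.

```latex
J(c_1, \dots, c_N, \mu_1, \dots, \mu_k)
= \sum_{i=1}^{N} \left\| x_i - \mu_{c_i} \right\|_2^2 ,
\qquad c_i \in \{1, \dots, k\} .
```

Minimizing J jointly over the assignments and the centers is hard in general, which is why a heuristic that alternates between the two is used.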
Algorithm - heuristic
1. Initialize {μ1, μ2, . . . , μk} (e.g., randomly)
2. While not converged
1. Assign each point …. to the nearest center
2. Update each center μj based on the points assigned to it
k-Means
Algorithm - Details
▪ Step 2.1: Assign each point…. to the nearest center
• For each point …, compute the Euclidean distance to every center
{μ1, μ2, . . . , μk}
• Find the smallest distance
• The point is said to be assigned to the corresponding cluster (note that each
point is assigned to a single cluster)
▪ Step 2.2: Update each center μj based on the points assigned to it
• Recompute each center μj as the mean of the points that were assigned to it
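Steps 2.1 and 2.2 can be sketched as two small functions. The names `assign_points` and `update_centers` are illustrative, and keeping the old center for an empty cluster is an assumption the slides do not address.

```python
import numpy as np

def assign_points(X, centers):
    """Step 2.1: return the index of the nearest center for each point."""
    # distances[i, j] = Euclidean distance from point i to center j
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)     # one cluster index per point

def update_centers(X, labels, centers):
    """Step 2.2: recompute each center as the mean of its assigned points."""
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members) > 0:            # assumption: an empty cluster keeps its old center
            new_centers[j] = members.mean(axis=0)
    return new_centers

# One iteration on four points and two centers
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[1.0, 1.0], [8.0, 1.0]])
labels = assign_points(X, centers)
centers = update_centers(X, labels, centers)
```

Here the first two points are closest to the first center and the last two to the second, so after the update the centers sit at the means (0, 1) and (10, 1).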
k-Means
Algorithm - Convergence
▪ Step 2 is repeated while k-Means has not converged
• Which criterion should be used to stop iterating?
• A fixed number of iterations? This is arbitrary, and too small a number can lead to
bad results
• The difference in assignments or center locations between two consecutive iterations can
be used as a criterion to stop the algorithm
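A minimal sketch of the full loop with both stopping criteria: the assignments no longer change, or the centers move by less than a tolerance. The function name `k_means`, the parameters `max_iter`, `tol`, and `seed`, and the initialization from random data points are assumptions. `max_iter` is included only as a safety cap.

```python
import numpy as np

def k_means(X, k, max_iter=100, tol=1e-6, seed=0):
    """Sketch of k-means that stops when assignments or centers stop changing."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise the centers as k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)

    for iteration in range(1, max_iter + 1):
        # Step 2.1: assign each point to the nearest center
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 2.2: recompute each center (an empty cluster keeps its old center)
        new_centers = np.array([
            X[new_labels == j].mean(axis=0) if np.any(new_labels == j) else centers[j]
            for j in range(k)
        ])
        # Stopping criteria: no change in assignments, or negligible center movement
        unchanged = np.array_equal(new_labels, labels)
        shift = np.linalg.norm(new_centers - centers)
        centers, labels = new_centers, new_labels
        if unchanged or shift < tol:
            break
    return centers, labels, iteration

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels, n_iter = k_means(X, k=2)
```

Checking the change between consecutive iterations lets the loop stop as soon as nothing moves, instead of relying on an arbitrary iteration count.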
PCA
Importance of standardisation:
Further reading: PCA on StatQuest; k-Means: chapter 4 of LinAlgebra book