Lecture_08_slides


Unsupervised learning
Announcements

▪ Last year's exam posted on the Moodle front page

▪ Quiz 1 grades posted.

▪ Next week (22.11.2023): Quiz 2, same format as last week

• Covers topics from 23.10 to 06.11
Outline

▪ Unsupervised learning

• Dimensionality reduction: Principal Component Analysis (PCA)

• Clustering: k-Means
[Course overview: Introduction, Linear regression, Logistic regression, Feature engineering, Data statistics, Naive Bayes, KNN, Clustering, Dimensionality reduction, Neural networks, Convolutional neural networks, Decision trees]
Review of data statistics, ex: Quiz 1 grades
Mean: 7.63, Mode: 8, Median: 8, Standard deviation: 1.85
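As a minimal illustration (not from the slides), these summary statistics can be computed with Python's standard library; the grade list below is made up, not the real quiz data.

```python
import statistics

grades = [8, 8, 7, 9, 5, 8, 10, 6, 8, 7]  # hypothetical grades, not the real data

print("Mean:", statistics.mean(grades))
print("Mode:", statistics.mode(grades))
print("Median:", statistics.median(grades))
print("Standard deviation:", statistics.pstdev(grades))  # population standard deviation
```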
Brief review of last lecture
Introduction
Unsupervised learning

Unsupervised learning is a type of machine learning that looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision.

In the following techniques, we do not use labels anymore!


Dimensionality reduction
Motivation
Intuition

We have samples described by a series of features.

We want to find a smaller set of new features that explain our samples, because:

• Fewer features are easier to visualize
• Some of the current features can be redundant
• Some of the current features are not very useful to describe our samples

[Figure: samples plotted against Feature 1 (x1) and Feature 2 (x2)]
Principal component analysis (PCA)
Approach to dimensionality reduction

How do we find this smaller set of new features?

PCA: find the best linear combination of the original features to create new features that explain our samples better.

[Figure: samples plotted against Feature 1 (x1) and Feature 2 (x2)]
PCA
Projection of points onto a lower dimensional subspace

How to find this smaller set of new features ?


x2

w1x1 + w2x2
The new feature
Length

A mix of length and weight that describe


our samples better

Weight x1
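As a minimal sketch (not from the slides), the snippet below computes the new 1D feature w1x1 + w2x2 for a few made-up samples and a hand-picked direction w; PCA would instead choose w from the data.

```python
import numpy as np

# Hypothetical 2D samples: columns are weight (x1) and length (x2)
X = np.array([[60.0, 160.0],
              [75.0, 180.0],
              [55.0, 150.0]])

# A unit-norm direction w = (w1, w2), picked by hand for illustration
w = np.array([0.6, 0.8])

# New 1D feature for each sample: w1*x1 + w2*x2
z = X @ w

# Projection of each sample back onto the line spanned by w
X_proj = np.outer(z, w)
print(z, X_proj, sep="\n")
```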
Projection of a point onto a subspace
Subspace

Distance of a point to a subspace

Projection of a point onto a subspace


Find a subspace to minimize the average distance of the data to it

Sum of distances of all data points to a subspace

PCA chooses a subspace that minimizes this sum


Formulation of PCA objective using the data matrix

Standardize data:

Frobenius norm of a matrix:

PCA objective:
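The formulas on this slide are not present in the extracted text; the sketch below gives one standard formulation (the notation is my assumption, not necessarily the slide's).

```latex
% Standardize data: subtract each feature's mean, divide by its standard deviation
\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}

% Frobenius norm of a matrix A
\lVert A \rVert_F = \sqrt{\textstyle\sum_{i}\sum_{j} A_{ij}^2}

% PCA objective: find an orthonormal W \in \mathbb{R}^{d \times k} minimizing the
% reconstruction error of the standardized N x d data matrix X
\min_{W^\top W = I_k} \; \lVert X - X W W^\top \rVert_F^2
```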
Eigenvalue decomposition of a symmetric matrix
Solution to PCA using eigenvalue decomposition
Principal components
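The solution itself appears as images in the original slides; the standard result, which I assume matches the slide's formulation, is sketched below.

```latex
% Eigenvalue decomposition of the symmetric sample covariance matrix
\Sigma = \tfrac{1}{N} X^\top X = V \Lambda V^\top,
\qquad \Lambda = \mathrm{diag}(\lambda_1 \ge \dots \ge \lambda_d)

% The principal components are the eigenvectors with the largest eigenvalues;
% stacking the top-k eigenvectors gives the optimal W, and the projected data is
W = [v_1, \dots, v_k], \qquad Z = X W \in \mathbb{R}^{N \times k}
```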
Example: projection of data onto 2 principal components
PCA
Pseudo-code
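The pseudo-code itself did not survive extraction; below is a minimal NumPy sketch of the usual recipe (standardize, eigendecompose the covariance, project onto the top-k eigenvectors). It is an illustration under these assumptions, not the slides' exact code.

```python
import numpy as np

def pca(X, k):
    """Project the N x d data matrix X onto its top-k principal components."""
    # 1. Standardize the data (zero mean, unit variance per feature)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Sample covariance matrix (d x d, symmetric)
    cov = (Xs.T @ Xs) / Xs.shape[0]

    # 3. Eigenvalue decomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]   # sort eigenvalues in descending order

    # 4. Keep the k eigenvectors with the largest eigenvalues
    W = eigvecs[:, order[:k]]           # d x k

    # 5. Project the data onto the principal components
    return Xs @ W                       # N x k

# Example usage with random data
X = np.random.randn(100, 5)
Z = pca(X, k=2)
print(Z.shape)                          # (100, 2)
```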
PCA example - distinguishing texts
Defining features

Each data sample is a document

There are d unique words in all the documents

A feature is positive for a document if the corresponding word appears in that document


TFIDF feature definition for documents (a standard formulation is sketched below)

Term frequency of word … in document …

Document frequency of word …

TFIDF for each word …
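The exact TF-IDF convention used on the slide is not visible in the extracted text; one common definition (an assumption, the slide's variant may differ) is:

```latex
% Term frequency of word w in document d
\mathrm{tf}(w, d) = \frac{\text{number of occurrences of } w \text{ in } d}{\text{total number of words in } d}

% Document frequency of word w, with N the number of documents
\mathrm{df}(w) = \frac{\text{number of documents containing } w}{N}

% TF-IDF feature of word w in document d
\mathrm{tfidf}(w, d) = \mathrm{tf}(w, d)\,\log\frac{1}{\mathrm{df}(w)}
```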


PCA example
Based on “Principal Component Analysis” lecture of Stanford EE104

Distinguishing texts: The Critique of Pure Reason by Immanuel Kant and The Problems of Philosophy by Bertrand Russell
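A rough sketch of the pipeline behind this example, assuming the two books are available as plain-text files split into chunks (the file names and chunking are hypothetical; the EE104 lecture's actual setup may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA

# Hypothetical inputs: lists of text chunks, one list per book
kant_chunks = open("kant.txt").read().split("\n\n")        # hypothetical file
russell_chunks = open("russell.txt").read().split("\n\n")  # hypothetical file
docs = kant_chunks + russell_chunks

# TF-IDF features for each chunk
X = TfidfVectorizer().fit_transform(docs).toarray()

# Project onto the first 2 principal components; chunks from the two authors
# are expected to separate in this 2D space
Z = PCA(n_components=2).fit_transform(X)
print(Z.shape)
```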
Dimensionality reduction
Other techniques

There are several other techniques for dimensionality reduction:

• Linear discriminant analysis (LDA)
• Generalized discriminant analysis (GDA)
• t-distributed Stochastic Neighbor Embedding (t-SNE)
• Autoencoders
Autoencoder
Introduction

▪ An autoencoder is a type of neural network often used for dimensionality reduction.

▪ Autoencoders are trained in an unsupervised manner, by minimizing the reconstruction error / loss $\sum_{i=1}^{N} L(x_i, \hat{x}_i)$

▪ Example: squared error $L(x_i, \hat{x}_i) = \lVert x_i - \hat{x}_i \rVert^2$
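A minimal PyTorch sketch of such an autoencoder, using fully connected layers and arbitrary sizes (this is not the convolutional model referenced on the next slide):

```python
import torch
import torch.nn as nn

# Minimal fully connected autoencoder for 28x28 images (sizes are illustrative)
class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),            # latent code
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
loss_fn = nn.MSELoss()                              # squared reconstruction error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, 1, 28, 28)                       # a dummy batch of images
loss = loss_fn(model(x), x)
loss.backward()
optimizer.step()
```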


Autoencoder
Autoencoder vs. PCA

▪ Top: some examples of the original MNIST test samples

▪ Middle: reconstructed output from an autoencoder with a latent space of 8 dimensions. This autoencoder uses convolutional layers and was trained on the MNIST training set.

▪ Bottom: reconstructed output from PCA with 8 reduced dimensions

Image credit: F. Fleuret, Deep Learning (EPFL)


Summary - dimensionality reduction
Used for
• Exploratory data analysis
• Visualizing data
• Help reduce overfitting by reducing feature dimension

PCA: an approach to dimensionality reduction


▪ Projects data onto a linear subspace
▪ Useful in case there is approximately linear dependence
between different features
▪ Easy to compute
▪ Connection to singular value decomposition (see Problem set 2)
Clustering
[Hand-drawn diagram: ML splits into supervised learning (classification and regression, e.g. linear / logistic regression, KNN, neural networks, Naive Bayes) and unsupervised learning (dimensionality reduction, e.g. PCA; clustering, e.g. k-means)]
Clustering - definition and motivation

Goal: group data points as a first step to understand the data set

Examples (see 4.4 and 4.5 in the LinAlgebra book):

• Clustering music genres: grouping music based on similarities in audio features (beats per minute, duration, loudness, number of words, etc.)
Clustering introduction

▪ Toy example:
• Each data sample is a point in 2D

[Figure: 2D points grouped into cluster 1, cluster 2, cluster 3, and cluster 4]
k-means approach to clustering

How to cluster data without labels?

Try to find similarity between groups of points

k-means: group points based on their proximity (in terms of distance in the feature space)
k-means
Introduction

▪ Given a set of unlabelled input samples, group the samples into k clusters (k ∈ ℕ)

▪ k-Means idea:
• Identify k clusters of data points given N samples.
• Find prototype (representative) points μ1, μ2, . . . , μk representing the center of each cluster and assign the other data points to the nearest cluster center.
k-Means
Preliminaries

▪ A single representative point for data:


k-Means
▪ Choose k clusters to represent data

▪ Determining the cluster a single point … belongs to

▪ Determining the cluster centres to minimize the distance of each point to its
assigned cluster
k-Means objective function
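The objective formula is not in the extracted text; the standard k-means objective, which I assume matches the slide, is:

```latex
% c(i) is the index of the cluster that point x_i is assigned to
J(c, \mu) = \sum_{i=1}^{N} \lVert x_i - \mu_{c(i)} \rVert^2
% k-means alternates between minimizing J over the assignments c and over the centers \mu
```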

Algorithm - heuristic
1. Initialize {μ1, μ2, . . . , μk} (e.g., randomly)
2. While not converged
1. Assign each point …. to the nearest center
2. Update each center μj based on the points assigned to it
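A minimal NumPy sketch of this heuristic (random initialisation, stopping when the centers stop moving; an illustration rather than the slides' exact code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize the centers with k randomly chosen data points
    centers = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iters):
        # 2.1 Assign each point to the nearest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # 2.2 Update each center as the mean of the points assigned to it
        # (a production version would also handle clusters that become empty)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

        # Stop when the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return labels, centers

# Example usage with random data
X = np.random.randn(200, 2)
labels, centers = kmeans(X, k=3)
```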
k-Means
Algorithm - Details
▪ Step 2.1: Assign each point … to the nearest center
• For each point …, compute the Euclidean distance to every center
{μ1, μ2, . . . , μk}
• Find the smallest distance
• The point is said to be assigned to the corresponding cluster (note that each
point is assigned to a single cluster)
▪ Step 2.2: Update each center μj based on the points assigned to it

• Recompute each center μj as the mean of the points that were assigned to it
k-Means
Algorithm - Convergence
▪ Step 2 is repeated while k-Means has not converged
• What criteria to stop iterating?
• A fixed number of iterations? It is arbitrary, and too small a number can lead to bad results
• The difference in assignments or center locations between two iterations can be used as a criterion to stop the algorithm

▪ k-Means does not always converge to the best solution



k-Means
Example
▪ Use the Palmer Penguins dataset
▪ With Flipper length against Bill length

[Figure: flipper length vs. bill length, with labels (left) and without labels (right)]


k-Means
Example
▪ Centroid initialisation
▪ First assignment
▪ Next assignments
▪ Final assignment

[Figures: successive k-means assignments on the penguin data]
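A rough sketch of how this example could be reproduced with scikit-learn, assuming the palmerpenguins package and its usual column names (an assumption; the slides may load the data differently):

```python
from palmerpenguins import load_penguins   # assumed helper: pip install palmerpenguins
from sklearn.cluster import KMeans

penguins = load_penguins().dropna()
X = penguins[["flipper_length_mm", "bill_length_mm"]].to_numpy()

# Cluster into k=3 groups (one per penguin species)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_[:10])
```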
Summary - Clustering
Used for understanding data
Examples:
Topic discovery in a large set of documents
Recommendation engines
Guessing missing entries

k-means: an approach to clustering


▪ Easy to implement and to interpret
▪ The k-means algorithm converges. However, the optimization is non-convex: the solution depends on the initial conditions
Further reading

▪ PCA: importance of standardisation; PCA on StatQuest
▪ k-means: Chapter 4 of the LinAlgebra book

▪ For coding of PCA / k-means, see SciKit Learn
