0% found this document useful (0 votes)

41 views23 pages

Machine Learning For Humans, Part 3 - Unsupervised Learning - by Vishal Maini - Machine Learning For Humans - Medium

This document discusses unsupervised learning techniques including clustering and dimensionality reduction. It explains k-means clustering, hierarchical clustering, principal component analysis (PCA), and singular value decomposition (SVD). Real-world applications are provided for each technique.

Uploaded by

dimo.michev

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

41 views23 pages

Machine Learning For Humans, Part 3 - Unsupervised Learning - by Vishal Maini - Machine Learning For Humans - Medium

Uploaded by

dimo.michev

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 23

Resume Membership

Search Medium Write

Published in Machine Learning for Humans

Vishal Maini Follow

Aug 19, 2017 · 10 min read · Listen

Machine Learning for Humans, Part 3:

Unsupervised Learning
Clustering and dimensionality reduction: k-means clustering,
hierarchical clustering, principal component analysis (PCA), singular
value decomposition (SVD)

This series is available as a full-length e-book! Download here. Free

for download, contributions appreciated (paypal.me/ml4h)

How do you find the underlying structure of a dataset? How do you

summarize it and group it most usefully? How do you effectively represent
data in a compressed format? These
3.4K are the
11 goals of unsupervised learning,
which is called “unsupervised” because you start with unlabeled data
(there’s no Y).

The two unsupervised learning tasks we will explore are clustering the data
into groups by similarity and reducing dimensionality to compress the data
while maintaining its structure and usefulness.

Examples of where unsupervised learning methods might be useful:

- An advertising platform segments the U.S. population into smaller

groups with similar demographics and purchasing habits so that
advertisers can reach their target market with relevant ads.
- Airbnb groups its housing listings into neighborhoods so that users
can navigate listings more easily.

- A data science team reduces the number of dimensions in a large

data set to simplify modeling and reduce file size.

In contrast to supervised learning, it’s not always easy to come up with

metrics for how well an unsupervised learning algorithm is doing.
“Performance” is often subjective and domain-specific.

Clustering
An interesting example of clustering in the real world is marketing data
provider Acxiom’s life stage clustering system, Personicx. This service
segments U.S. households into 70 distinct clusters within 21 life stage groups
that are used by advertisers when targeting Facebook ads, display ads, direct
mail campaigns, etc.

A selection of Personicx demographic clusters

Their white paper reveals that they used centroid clustering and principal
component analysis, both of which are techniques covered in this section.

You can imagine how having access to these clusters is extremely useful for
advertisers who want to (1) understand their existing customer base and (2)
use their ad spend effectively by targeting potential new customers with
relevant demographics, interests, and lifestyles.

You can actually find out which cluster you personally would belong to by answering a few simple questions in
Acxiom’s “What’s My Cluster?” tool.

Let’s walk through a couple of clustering methods to develop intuition for

how this task can be performed.

k-means clustering
“And k rings were given to the race of Centroids, who above all else, desire power.”
The goal of clustering is to create groups of data points such that points in
different clusters are dissimilar while points within a cluster are similar.

With k-means clustering, we want to cluster our data points into k groups. A
larger k creates smaller groups with more granularity, a lower k means larger
groups and less granularity.

The output of the algorithm would be a set of “labels” assigning each data
point to one of the k groups. In k-means clustering, the way these groups are
defined is by creating a centroid for each group. The centroids are like the
heart of the cluster, they “capture” the points closest to them and add them
to the cluster.

Think of these as the people who show up at a party and soon become the
centers of attention because they’re so magnetic. If there’s just one of them,
everyone will gather around; if there are lots, many smaller centers of
activity will form.

Here are the steps to k-means clustering:

1. Define the k centroids. Initialize these at random (there are also
fancier algorithms for initializing the centroids that end up
converging more effectively).
2. Find the closest centroid & update cluster assignments. Assign
each data point to one of the k clusters. Each data point is assigned
to the nearest centroid’s cluster. Here, the measure of “nearness” is
a hyperparameter — often Euclidean distance.
3. Move the centroids to the center of their clusters. The new
position of each centroid is calculated as the average position of
all the points in its cluster.
Keep repeating steps 2 and 3 until the centroid stop moving a lot at
each iteration (i.e., until the algorithm converges).

That, in short, is how k-means clustering works! Check out this visualization
of the algorithm — read it like a comic book. Each point in the plane is
colored according the centroid that it is closest to at each moment. You’ll
notice that the centroids (the larger blue, red, and green circles) start
randomly and then quickly adjust to capture their respective clusters.
Another real-life application of k-means clustering is classifying handwritten
digits. Suppose we have images of the digits as a long vector of pixel
brightnesses. Let’s say the images are black and white and are 64x64 pixels.
Each pixel represents a dimension. So the world these images live in has
64x64=4,096 dimensions. In this 4,096-dimensional world, k-means
clustering allows us to group the images that are close together and assume
they represent the same digit, which can achieve pretty good results for digit
recognition.
Hierarchical clustering
“Let’s make a million options become seven options. Or five. Or twenty? Meh, we
can decide later.”

Hierarchical clustering is similar to regular clustering, except that you’re

aiming to build a hierarchy of clusters. This can be useful when you want
flexibility in how many clusters you ultimately want. For example, imagine
grouping items on an online marketplace like Etsy or Amazon. On the
homepage you’d want a few broad categories of items for simple navigation,
but as you go into more specific shopping categories you’d want increasing
levels of granularity, i.e. more distinct clusters of items.

In terms of outputs from the algorithm, in addition to cluster assignments

you also build a nice tree that tells you about the hierarchies between the
clusters. You can then pick the number of clusters you want from this tree.

Here are the steps for hierarchical clustering:

1. Start with N clusters, one for each data point.
2. Merge the two clusters that are closest to each other. Now you
have N-1 clusters.

3. Recompute the distances between the clusters. There are several

ways to do this (see this tutorial for more details). One of them
(called average-linkage clustering) is to consider the distance
between two clusters to be the average distance between all their
respective members.
4. Repeat steps 2 and 3 until you get one cluster of N data points.
You get a tree (also known as a dendrogram) like the one below.
5. Pick a number of clusters and draw a horizontal line in the
dendrogram. For example, if you want k=2 clusters, you should draw a
horizontal line around “distance=20000.” You’ll get one cluster with
data points 8, 9, 11, 16 and one cluster with the rest of the data
points. In general, the number of clusters you get is the number of
intersection points of your horizontal line with the vertical lines
in the dendrogram.
Source: Solver.com. For more detail on hierarchical clustering, you can check this video out.

Dimensionality reduction
“It is not the daily increase, but the daily decrease. Hack away at the unessential.”
— Bruce Lee

Dimensionality reduction looks a lot like compression. This is about trying to

reduce the complexity of the data while keeping as much of the relevant
structure as possible. If you take a simple 128 x 128 x 3 pixels image (length x
width x RGB value), that’s 49,152 dimensions of data. If you’re able to reduce
the dimensionality of the space in which these images live without
destroying too much of the meaningful content in the images, then you’ve
done a good job at dimensionality reduction.

We’ll take a look at two common techniques in practice: principal

component analysis and singular value decomposition.

Principal component analysis (PCA)

First, a little linear algebra refresher — let’s talk about spaces and bases.

You’re familiar with the coordinate plane with origin O(0,0) and basis vectors
i(1,0) and j(0,1). It turns out you can choose a completely different basis and
still have all the math work out. For example, you can keep O as the origin
and choose the basis to vectors i’=(2,1) and j’=(1,2). If you have the patience
for it, you’ll convince yourself that the point labeled (2,2) in the i’, j’
coordinate system is labeled (6, 6) in the i, j system.
Plotted using Mathisfun’s “Interactive Cartesian Coordinates”

This means we can change the basis of a space. Now imagine much higher-
dimensional space. Like, 50K dimensions. You can select a basis for that
space, and then select only the 200 most significant vectors of that basis.
These basis vectors are called principal components, and the subset you
select constitute a new space that is smaller in dimensionality than the
original space but maintains as much of the complexity of the data as
possible.

To select the most significant principal components, we look at how much of

the data’s variance they capture and order them by that metric.

Another way of thinking about this is that PCA remaps the space in which
our data exists to make it more compressible. The transformed dimension is
smaller than the original dimension.

By making use of the first several dimensions of the remapped space only,
we can start gaining an understanding of the dataset’s organization. This is
the promise of dimensionality reduction: reduce complexity (dimensionality
in this case) while maintaining structure (variance). Here’s a fun paper
Samer wrote on using PCA (and diffusion mapping, another technique) to try
to make sense of the Wikileaks cable release.

Singular value decomposition (SVD)

Let’s represent our data like a big A = m x n matrix. SVD is a computation that
allows us to decompose that big matrix into a product of 3 smaller matrices
(U=m x r, diagonal matrix Σ=r x r, and V=r x n where r is a small number).
Here’s a more visual illustration of that product to start with:
The values in the r*r diagonal matrix Σ are called singular values. What’s cool
about them is that these singular values can be used to compress the original
matrix. If you drop the smallest 20% of singular values and the associated
columns in matrices U and V, you save quite a bit of space and still get a
decent representation of the underlying matrix.

To examine what that means more precisely, let’s work with this image of a
dog:
We’ll use the code written in Andrew Gibiansky’s post on SVD. First, we show
that if we rank the singular values (the values of the matrix Σ) by magnitude,
the first 50 singular values contain 85% of the magnitude of the whole matrix
Σ.
We can use this fact to discard the next 250 values of sigma (i.e., set them to
0) and just keep a “rank 50” version of the image of the dog. Here, we create a
rank 200, 100, 50, 30, 20, 10, and 3 dog. Obviously, the picture is smaller, but
let’s agree that the rank 30 dog is still good. Now let’s see how much
compression we achieve with this dog. The original image matrix is 305*275
= 83,875 values. The rank 30 dog is 305*30+30+30*275=17,430 — almost 5
times fewer values with very little loss in image quality. The reason for the
calculation above is that we also discard the parts of the matrix U and V that
get multiplied by zeros when the operation UΣ’V is carried out (where Σ’ is
the modified version of Σ that only has the first 30 values in it).
Unsupervised learning is often used to preprocess the data. Usually, that
means compressing it in some meaning-preserving way like with PCA or
SVD before feeding it to a deep neural net or another supervised learning
algorithm.

Onwards!
Now that you’ve finished this section, you’ve earned an awful, horrible,
never-to-be-mentioned-again joke about unsupervised learning. Here goes…

Person-in-joke-#1: Y would u ever need to use unsupervised tho?

Person-in-joke-#2: Y? there’s no Y.

Next up… Part 4: Neural Networks & Deep Learning!

Practice materials & further reading

3a — k-means clustering
Play around with this clustering visualization to build intuition for how the
algorithm works. Then, take a look at this implementation of k-means clustering
for handwritten digits and the associated tutorial.

3b — SVD
For a good reference on SVD, go no further than Andrew Gibiansky’s post.

Enter your email below if you’d like to stay up-to-date with future content
💌
On Twitter? So are we. Feel free to keep in touch — Vishal and Samer 🙌🏽

Part 1: Why Machine Learning Matters ✅

Part 2.1: Supervised Learning ✅

Part 2.2: Supervised Learning II ✅

Part 2.3: Supervised Learning III ✅

Part 3: Unsupervised Learning ✅

Part 4: Neural Networks & Deep Learning

Part 5: Reinforcement Learning

Appendix: The Best Machine Learning Resources

Machine Learning Artificial Intelligence Deep Learning Unsupervised Learning

Tech

Unsupervised Machine Learning in Python
100% (1)
Unsupervised Machine Learning in Python
89 pages
Grouping
No ratings yet
Grouping
98 pages
Final ML Unit3 May24
No ratings yet
Final ML Unit3 May24
154 pages
Unsupervised Learning
No ratings yet
Unsupervised Learning
66 pages
05 Clustering
No ratings yet
05 Clustering
96 pages
Lecture 2.1.1 To 2.1.2
No ratings yet
Lecture 2.1.1 To 2.1.2
97 pages
Presentation 28128 Content Document 20241126014005PM
No ratings yet
Presentation 28128 Content Document 20241126014005PM
80 pages
R20 Machine Learning Unit 4
No ratings yet
R20 Machine Learning Unit 4
49 pages
2021 Clustering
No ratings yet
2021 Clustering
50 pages
Clustering
No ratings yet
Clustering
75 pages
Chapter 1 Introduction
No ratings yet
Chapter 1 Introduction
47 pages
2002 Spring CS525 Lecture 2
No ratings yet
2002 Spring CS525 Lecture 2
37 pages
4 Clustring
No ratings yet
4 Clustring
48 pages
Module 4
No ratings yet
Module 4
63 pages
Clustering and K-Means Algorithm
No ratings yet
Clustering and K-Means Algorithm
81 pages
AIMLB PGP 2024 Session 12
No ratings yet
AIMLB PGP 2024 Session 12
46 pages
Chapter 1 Introduction
No ratings yet
Chapter 1 Introduction
49 pages
Unit-Iv Material
No ratings yet
Unit-Iv Material
24 pages
Clustering
No ratings yet
Clustering
75 pages
Cluster
No ratings yet
Cluster
20 pages
Cluster Analysis: Talha Farooq Faizan Ali Muhammad Abdul Basit
No ratings yet
Cluster Analysis: Talha Farooq Faizan Ali Muhammad Abdul Basit
16 pages
ML Module5 Clustering
No ratings yet
ML Module5 Clustering
71 pages
Slide TIF311 DM 10 11
No ratings yet
Slide TIF311 DM 10 11
49 pages
w6 Clustering
No ratings yet
w6 Clustering
29 pages
Clustering
No ratings yet
Clustering
32 pages
Lecture 01 - Unsupervised Learning (Optional)
No ratings yet
Lecture 01 - Unsupervised Learning (Optional)
57 pages
Lecture 8 - Clustering
No ratings yet
Lecture 8 - Clustering
23 pages
Lect 10 - Unsupervised Learning
No ratings yet
Lect 10 - Unsupervised Learning
50 pages
K-Means Data Clustering Approach: Jaipur National University
No ratings yet
K-Means Data Clustering Approach: Jaipur National University
43 pages
M5
No ratings yet
M5
40 pages
M5
No ratings yet
M5
40 pages
Clustering
No ratings yet
Clustering
38 pages
Introduction To Data Science: Clustering
No ratings yet
Introduction To Data Science: Clustering
45 pages
Module 5
No ratings yet
Module 5
43 pages
Clustering Data Mining
No ratings yet
Clustering Data Mining
27 pages
Chapter 5. Clustering Algorithms-Stud
No ratings yet
Chapter 5. Clustering Algorithms-Stud
44 pages
22AIP3101A Session 9
No ratings yet
22AIP3101A Session 9
38 pages
Working of K Means Algorithm - YashBhure
No ratings yet
Working of K Means Algorithm - YashBhure
14 pages
K - Means Clustering and Related Algorithms: Ryan P. Adams COS 324 - Elements of Machine Learning Princeton University
No ratings yet
K - Means Clustering and Related Algorithms: Ryan P. Adams COS 324 - Elements of Machine Learning Princeton University
18 pages
ML Unit-4
No ratings yet
ML Unit-4
23 pages
Clustering
No ratings yet
Clustering
29 pages
Topic Cheatsheet For GCP's Professional Machine Learning Engineer Beta Exam
No ratings yet
Topic Cheatsheet For GCP's Professional Machine Learning Engineer Beta Exam
2 pages
Clustering
No ratings yet
Clustering
84 pages
Unit 4
No ratings yet
Unit 4
74 pages
Unit-4 ML
No ratings yet
Unit-4 ML
16 pages
DSML-ML09. Unsupervised Learning
No ratings yet
DSML-ML09. Unsupervised Learning
69 pages
Cluster
100% (1)
Cluster
72 pages
Chapter 8
No ratings yet
Chapter 8
15 pages
Unit 4 Clustering - K-Means and Hierarchical
No ratings yet
Unit 4 Clustering - K-Means and Hierarchical
40 pages
Artificial Intelligence Lec 5
No ratings yet
Artificial Intelligence Lec 5
20 pages
FML Unit4
No ratings yet
FML Unit4
14 pages
DWDM Unit V Note
No ratings yet
DWDM Unit V Note
19 pages
Clustering Explanation
No ratings yet
Clustering Explanation
8 pages
4 Clustering
No ratings yet
4 Clustering
9 pages
1.supervised and Unsupervised
No ratings yet
1.supervised and Unsupervised
42 pages
Clustering and Visualisation of Data - 2020
No ratings yet
Clustering and Visualisation of Data - 2020
5 pages
Clustering
No ratings yet
Clustering
39 pages
Rotation V3 - Encrypt
No ratings yet
Rotation V3 - Encrypt
39 pages
Clustering in Machine Learning
No ratings yet
Clustering in Machine Learning
4 pages
Hierarchical Clustering: Required Data
No ratings yet
Hierarchical Clustering: Required Data
6 pages
K Mean Clustering1
No ratings yet
K Mean Clustering1
23 pages
Product Catalogue en
No ratings yet
Product Catalogue en
40 pages
Review of Automation and Robotics For The BioIndus
No ratings yet
Review of Automation and Robotics For The BioIndus
18 pages
NetApp ONTAP S3
No ratings yet
NetApp ONTAP S3
19 pages
Unit 5 - Design Concept (Sofrware Engineering) - NSG Academy
No ratings yet
Unit 5 - Design Concept (Sofrware Engineering) - NSG Academy
11 pages
The Influence of Social Media On Marketing
No ratings yet
The Influence of Social Media On Marketing
19 pages
Css q2 Week6 g12
No ratings yet
Css q2 Week6 g12
4 pages
Act. No. 02 Visual Acuity and Perception
No ratings yet
Act. No. 02 Visual Acuity and Perception
5 pages
Usim Conformance Testing Spec
No ratings yet
Usim Conformance Testing Spec
99 pages
Wifi Service Log
No ratings yet
Wifi Service Log
405 pages
Project Cost Estimate
100% (1)
Project Cost Estimate
2 pages
Grey Modern Company Resume
No ratings yet
Grey Modern Company Resume
2 pages
Mini Project 2 Semster
No ratings yet
Mini Project 2 Semster
35 pages
Quectel-Antenna-Brochure - V1 7 4
No ratings yet
Quectel-Antenna-Brochure - V1 7 4
20 pages
Introduction To The Motherboard Meet 1
No ratings yet
Introduction To The Motherboard Meet 1
8 pages
Samsung Ah68 02293b Users Manual 280484
No ratings yet
Samsung Ah68 02293b Users Manual 280484
39 pages
Makalah Group 8 B. Ingg
No ratings yet
Makalah Group 8 B. Ingg
19 pages
The Girl With The Broken Heart Lurlene Mcdaniel Download
No ratings yet
The Girl With The Broken Heart Lurlene Mcdaniel Download
27 pages
Panasonic Phone System KXT308
No ratings yet
Panasonic Phone System KXT308
6 pages
Stacks Notes
No ratings yet
Stacks Notes
21 pages
Data Mining Project
No ratings yet
Data Mining Project
5 pages
Create Windows 11 Bootable Usb Multiple Unattended Scripts - Google Search. 1pdf
No ratings yet
Create Windows 11 Bootable Usb Multiple Unattended Scripts - Google Search. 1pdf
1 page
52-199x Series Digital Mixers
No ratings yet
52-199x Series Digital Mixers
17 pages
Staad To Afes - Google Search
No ratings yet
Staad To Afes - Google Search
2 pages
wb3 Draft
No ratings yet
wb3 Draft
6 pages
Beautiful Architecture: Review
No ratings yet
Beautiful Architecture: Review
2 pages
Downloads F Biomassehacker en
No ratings yet
Downloads F Biomassehacker en
8 pages
Homework 3
No ratings yet
Homework 3
2 pages
Tara Coen, Bba: #204 - 13339 102A AVENUE, SURREY, BC, V3T 0C5::: (604) 290-8272
No ratings yet
Tara Coen, Bba: #204 - 13339 102A AVENUE, SURREY, BC, V3T 0C5::: (604) 290-8272
2 pages
Machine Learning with Clustering: A Visual Guide for Beginners with Examples in Python
From Everand
Machine Learning with Clustering: A Visual Guide for Beginners with Examples in Python
Artem Kovera
No ratings yet

Machine Learning For Humans, Part 3 - Unsupervised Learning - by Vishal Maini - Machine Learning For Humans - Medium

Uploaded by

Machine Learning For Humans, Part 3 - Unsupervised Learning - by Vishal Maini - Machine Learning For Humans - Medium

Uploaded by

Resume Membership

Search Medium Write

Published in Machine Learning for Humans

Vishal Maini Follow

Aug 19, 2017 · 10 min read · Listen

Machine Learning for Humans, Part 3:

This series is available as a full-length e-book! Download here. Free

How do you find the underlying structure of a dataset? How do you

Examples of where unsupervised learning methods might be useful:

- An advertising platform segments the U.S. population into smaller

- A data science team reduces the number of dimensions in a large

In contrast to supervised learning, it’s not always easy to come up with

A selection of Personicx demographic clusters

Let’s walk through a couple of clustering methods to develop intuition for

Here are the steps to k-means clustering:

Hierarchical clustering is similar to regular clustering, except that you’re

In terms of outputs from the algorithm, in addition to cluster assignments

Here are the steps for hierarchical clustering:

3. Recompute the distances between the clusters. There are several

Dimensionality reduction looks a lot like compression. This is about trying to

We’ll take a look at two common techniques in practice: principal

Principal component analysis (PCA)

To select the most significant principal components, we look at how much of

Singular value decomposition (SVD)

Person-in-joke-#1: Y would u ever need to use unsupervised tho?

Next up… Part 4: Neural Networks & Deep Learning!

Practice materials & further reading

More from Machine Learning for Humans 🤖👶

Part 1: Why Machine Learning Matters ✅

Part 2.2: Supervised Learning II ✅

Part 2.3: Supervised Learning III ✅

Part 3: Unsupervised Learning ✅

Part 4: Neural Networks & Deep Learning

Part 5: Reinforcement Learning

Appendix: The Best Machine Learning Resources

Machine Learning Artificial Intelligence Deep Learning Unsupervised Learning

You might also like