Unsupervised Learning - Clustering Cheatsheet - Codecademy

This document provides an overview of the K-Means clustering algorithm and how it can be implemented using scikit-learn. It explains that K-Means groups unlabeled data into K clusters based on centroid distances, and the steps are to initialize centroids randomly, assign points to closest centroids, and update centroid positions. It also discusses measuring clustering quality using inertia and finding the optimal K with the elbow method.

Uploaded by

Imane Loukili


14/11/2023 17:32 Unsupervised Learning: Clustering Cheatsheet | Codecademy

Clustering

K-Means: Inertia

Inertia measures how well a dataset was clustered by K-Means. It is calculated by measuring the distance between each data point and its assigned centroid, squaring this distance, and summing these squares over all data points in all clusters.
A good model is one with low inertia AND a low number of clusters (K). However, this is a tradeoff: as K increases, inertia decreases, reaching zero when every point is its own cluster.
To find the optimal K for a dataset, use the Elbow method: plot inertia against K and find the point where the decrease in inertia begins to slow. In the graph that accompanies this card on the original page, K=3 is the “elbow”.
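The inertia calculation and the Elbow method can be sketched as follows. This is a minimal illustration, not the cheatsheet's own code: the toy data, the variable names (`X`, `inertias`, `manual_inertia`) and the range of K values are assumptions chosen for the example. It relies on scikit-learn's `KMeans`, whose fitted `inertia_` attribute holds the sum of squared distances of samples to their closest cluster center.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
centers = [[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in centers])

# Elbow method: fit K-Means for several values of K and record the inertia.
inertias = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)

# Inertia by hand for K=3: squared distance of every point to its
# assigned centroid, summed over all points.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
assigned_centroids = model.cluster_centers_[model.labels_]
manual_inertia = ((X - assigned_centroids) ** 2).sum()

for k, value in zip(range(1, 7), inertias):
    print(k, round(value, 1))
```

Plotting `inertias` against K (for example with matplotlib) should show a sharp drop up to the true number of blobs and a much flatter curve afterwards; the bend is the elbow.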

Unsupervised Learning Basics

Patterns and structure can be found in unlabeled data using unsupervised learning, an important branch of machine learning. Clustering is one of the most widely used unsupervised learning techniques; it groups data points into clusters based on their similarity. Because most real-world data is unlabeled, unsupervised learning algorithms are widely applicable.
Possible applications of clustering include:

Search engines: grouping news topics and search results

Market segmentation: grouping customers based on geography, demographics, and behaviors

https://www.codecademy.com/learn/unsupervised-learning-skill-path/modules/clustering-skill-path/cheatsheet

K-Means Algorithm: Intro

K-Means is one of the most popular clustering algorithms. It uses an iterative technique to group unlabeled data into K clusters based on cluster centers (centroids). Points are assigned to clusters so that their distance to their own cluster's centroid is as small as possible.

1. Randomly place K centroids for the initial clusters.

2. Assign each data point to its nearest centroid.

3. Update each centroid's location to the average of the data points assigned to it.

Repeat Steps 2 and 3 until points no longer move between clusters and the centroids stabilize.

K-Means Algorithm: 2nd Step

After randomly choosing centroid locations for K-Means, each data sample is allocated to its closest centroid to start creating more precise clusters.
The distance between each data sample and every centroid is calculated, the minimum distance is selected, and each data sample is assigned a label that indicates its closest cluster.
The distance formula is implemented as distance() and used for each data point.
np.argmin() is used to find the index of the minimum distance, which identifies the cluster at that distance.

    # distance formula (Euclidean distance between two 2-D points)
    def distance(a, b):
        one = (a[0] - b[0]) ** 2  # squared difference in x
        two = (a[1] - b[1]) ** 2  # squared difference in y
        distance = (one + two) ** 0.5
        return distance
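The assignment step described above might look like this in practice. It is a small sketch under stated assumptions: the helper name `assign_labels` and the sample `points` and `centroids` arrays are made up for illustration and do not come from the cheatsheet.

```python
import numpy as np

def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def assign_labels(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    labels = np.zeros(len(points), dtype=int)
    for i, point in enumerate(points):
        # Distance from this point to every centroid.
        dists = [distance(point, c) for c in centroids]
        # np.argmin gives the position of the smallest distance,
        # which is the index of the closest centroid.
        labels[i] = np.argmin(dists)
    return labels

# Hypothetical data: four points and two centroids.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = np.array([[1.0, 1.5], [8.5, 9.0]])

labels = assign_labels(points, centroids)
print(labels)  # [0 0 1 1]
```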


Scikit-Learn Datasets

The scikit-learn library contains built-in datasets in its datasets module that are often used in machine learning problems like classification or regression.
Examples:

Iris dataset (classification)

Boston house-prices dataset (regression; available through load_boston() in older releases, removed in scikit-learn 1.2)

The format of these datasets is important to their use with algorithms. For example, each row of the Iris dataset is a sample (one flower), and each element within a sample is a feature (e.g., petal width).
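To make the sample/feature layout concrete, here is a short sketch that loads the Iris dataset with `datasets.load_iris()` and inspects its shape. The variable names `samples` and `first_sample` are chosen for this example only.

```python
from sklearn import datasets

iris = datasets.load_iris()

# iris.data is a 2-D array: one row per sample (flower),
# one column per feature (measurement).
samples = iris.data
print(samples.shape)       # (150, 4)
print(iris.feature_names)  # names of the four measurement columns

# A single sample is one row holding that flower's four features.
first_sample = samples[0]
print(first_sample)

# iris.target holds the known species label for each sample.
print(iris.target_names)
```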

K-Means Using Scikit-Learn

Scikit-Learn, or sklearn, is a machine learning library for Python that has a K-Means implementation that can be used instead of creating one from scratch.
To use it:

Import the KMeans class from the sklearn.cluster module and build a model with n_clusters

Fit the model to the data samples using .fit()

Predict the cluster that each data sample belongs to using .predict() and store these as labels

    from sklearn.cluster import KMeans

    model = KMeans(n_clusters=3)

    model.fit(data_samples)

    labels = model.predict(data_samples)
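As a runnable variant of the snippet above, the following sketch fills in `data_samples` with the Iris measurements. The choice of dataset, the `n_init` and `random_state` arguments (added only for repeatability) and the `new_sample` values are assumptions for illustration.

```python
from sklearn import datasets
from sklearn.cluster import KMeans

# Assumption: use the four Iris measurements as the data samples.
data_samples = datasets.load_iris().data

model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(data_samples)

# One cluster index (0, 1 or 2) per sample.
labels = model.predict(data_samples)

# The fitted centroids: one row per cluster, one column per feature.
print(model.cluster_centers_.shape)  # (3, 4)

# A fitted model can also assign a cluster to a previously unseen sample.
new_sample = [[5.0, 3.4, 1.5, 0.2]]  # hypothetical measurements
print(model.predict(new_sample))
```

Note that the cluster indices are arbitrary: cluster 0 in one run need not correspond to cluster 0 in another run with a different seed.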

Cross Tabulation Overview

Cross-tabulations involve grouping pieces of data together in order to examine their relationship in a different way. Relationships in the data are sometimes easier to see in a breakdown by category than in overall totals.
This technique is often performed in Python after running K-Means; the pandas function pd.crosstab() allows for comparison between the resulting cluster labels and user-defined labels for each data sample. In order to validate the results of a K-Means model with this technique, there must be user-defined labels for all data samples.

    import pandas as pd

    cross_tab = pd.crosstab(df['pred_labels'], df['user_labels'])
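A small worked example may help show what the resulting table looks like. The seven rows below are invented for illustration; only the column names `pred_labels` and `user_labels` follow the snippet above.

```python
import pandas as pd

# Hypothetical results: cluster index predicted by K-Means next to the
# known, user-defined label for the same sample.
df = pd.DataFrame({
    "pred_labels": [1, 1, 0, 0, 2, 2, 2],
    "user_labels": ["setosa", "setosa", "versicolor", "versicolor",
                    "virginica", "virginica", "versicolor"],
})

# Rows are predicted clusters, columns are user-defined labels,
# and each cell counts the samples with that combination.
cross_tab = pd.crosstab(df["pred_labels"], df["user_labels"])
print(cross_tab)
```

In this made-up table, cluster 1 contains only setosa samples, while cluster 2 mixes two virginica samples with one versicolor sample, which is the kind of confusion a cross-tabulation makes visible.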


K-Means: Reaching Convergence

In K-Means, after placing K random centroids, the data samples are repeatedly assigned to the nearest centroid and then the centroid locations are updated. This continues until the centroids' coordinates converge, or stop changing.
This sequence of events can be implemented in Python using a while loop. The loop continues until the difference between each element of the updated centroids and the corresponding element of the previous centroids_old is 0. This means the centroids have converged and the clusters are complete!
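One way to write that loop with NumPy is sketched below. The toy data, the `max_iter` safety cap and the handling of empty clusters are assumptions added for this example; the names `centroids` and `centroids_old` follow the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two 2-D blobs of 50 points each.
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2)),
])
k = 2
max_iter = 100  # assumption: safety cap so the loop always terminates

# Step 1: random centroids inside the range of the data.
centroids = rng.uniform(points.min(axis=0), points.max(axis=0), size=(k, 2))

error = np.ones(k)  # non-zero so the loop runs at least once
iterations = 0
while error.any() and iterations < max_iter:
    # Step 2: assign every point to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)

    # Remember the current centroids before moving them.
    centroids_old = centroids.copy()

    # Step 3: move each centroid to the mean of its assigned points.
    for j in range(k):
        members = points[labels == j]
        if len(members) > 0:  # an empty cluster keeps its old centroid
            centroids[j] = members.mean(axis=0)

    # Distance each centroid moved; all zeros means convergence.
    error = np.linalg.norm(centroids - centroids_old, axis=1)
    iterations += 1

print(iterations, centroids)
```

Comparing against exactly 0 works here because unchanged assignments reproduce exactly the same means; library implementations such as scikit-learn's typically stop on a small tolerance instead.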

K-Means Algorithm: 3rd Step

The third step of K-Means updates centroid locations.
After the data are assigned to their closest centroid in step 2, each cluster center is moved to the average of its assigned data points.
The NumPy np.mean() function is used to find the average x- and y-coordinates of all data points in each cluster, and these averages are stored as the new centroid locations.
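The update step can be illustrated with a few hand-picked points. The `points` and `labels` arrays here are hypothetical and stand in for the output of step 2.

```python
import numpy as np

# Hypothetical output of step 2: four points and the cluster each belongs to.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])
k = 2

centroids = np.zeros((k, 2))
for j in range(k):
    # Select the points assigned to cluster j ...
    members = points[labels == j]
    # ... and average their x and y coordinates separately (axis=0).
    centroids[j] = np.mean(members, axis=0)

print(centroids)
# [[1.25 1.5 ]
#  [8.5  8.75]]
```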


K-Means Algorithm: 1st Step

The first step of the K-Means clustering algorithm requires placing K random centroids, which will become the centers of the K initial clusters. This step can be implemented in Python using the NumPy np.random.uniform() function; the x- and y-coordinates are randomly chosen within the x and y ranges of the data points.
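A possible sketch of this initialization is shown below. The arrays `x` and `y`, the value of `k` and the fixed seed are assumptions made for the example.

```python
import numpy as np

# Hypothetical 2-D data: x and y hold the coordinates of six points.
x = np.array([1.0, 2.0, 1.5, 8.0, 9.0, 8.5])
y = np.array([1.0, 1.5, 2.5, 8.0, 9.5, 7.0])
k = 3

np.random.seed(0)  # only to make this sketch repeatable

# Draw k x-coordinates within the x range of the data,
# and k y-coordinates within the y range.
centroids_x = np.random.uniform(x.min(), x.max(), size=k)
centroids_y = np.random.uniform(y.min(), y.max(), size=k)

# Pair them up: one (x, y) row per centroid.
centroids = np.column_stack((centroids_x, centroids_y))
print(centroids)
```

Because the coordinates are drawn from the bounding box of the data, every starting centroid lies somewhere between the smallest and largest observed values on each axis.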
