
Cheat Sheet: Building Unsupervised Learning Models


Unsupervised learning models

UMAP
UMAP (Uniform Manifold Approximation and Projection) is used for dimensionality reduction.
Pros: High performance, preserves global structure.
Cons: Sensitive to parameters.
Applications: Data visualization, feature extraction.
Key hyperparameters:
n_neighbors: Controls the local neighborhood size (default = 15).
min_dist: Controls the minimum distance between points in the embedded space (default = 0.1).
n_components: The dimensionality of the embedding (default = 2).
Code syntax:
from umap.umap_ import UMAP
umap = UMAP(n_neighbors=15, min_dist=0.1, n_components=2)
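
Example (a minimal usage sketch, assuming the umap-learn package is installed and toy data from make_blobs):

# Minimal usage sketch (assumes the umap-learn package is installed)
from sklearn.datasets import make_blobs
from umap.umap_ import UMAP

X, _ = make_blobs(n_samples=500, centers=4, n_features=10, random_state=42)
umap_model = UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
X_embedded = umap_model.fit_transform(X)
print(X_embedded.shape)  # (500, 2)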

t-SNE
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a nonlinear dimensionality reduction technique.
Pros: Good for visualizing high-dimensional data.
Cons: Computationally expensive, prone to overfitting.
Applications: Data visualization, anomaly detection.
Key hyperparameters:
n_components: The number of dimensions for the output (default = 2).
perplexity: Balances attention between local and global aspects of the data (default = 30).
learning_rate: Controls the step size during optimization (default = 200).
Code syntax:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200)
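
Example (a minimal usage sketch, assuming toy data from make_blobs and matplotlib for plotting):

# Minimal usage sketch: 2-D t-SNE embedding of toy data, plotted with matplotlib
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, y = make_blobs(n_samples=500, centers=4, n_features=10, random_state=42)
tsne = TSNE(n_components=2, perplexity=30, learning_rate=200, random_state=42)
X_embedded = tsne.fit_transform(X)
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=10)
plt.show()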

PCA
PCA (Principal Component Analysis) is used for linear dimensionality reduction.
Pros: Easy to interpret, reduces noise.
Cons: Linear, may lose information in nonlinear data.
Applications: Feature extraction, compression.
Key hyperparameters:
n_components: Number of principal components to retain (default = 2).
whiten: Whether to scale the components (default = False).
svd_solver: The algorithm to compute the components (default = 'auto').
Code syntax:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
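
Example (a minimal usage sketch, assuming toy data from make_blobs; standardization added because PCA is scale-sensitive):

# Minimal usage sketch: PCA on standardized toy data
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=500, centers=3, n_features=6, random_state=42)
X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)  # variance captured by each component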

DBSCAN
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm.
Pros: Identifies outliers, does not require the number of clusters.
Cons: Difficult with varying density clusters.
Applications: Anomaly detection, spatial data clustering.
Key hyperparameters:
eps: The maximum distance between two points to be considered neighbors (default = 0.5).
min_samples: Minimum number of samples in a neighborhood to form a cluster (default = 5).
Code syntax:
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.5, min_samples=5)
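
Example (a minimal usage sketch, assuming toy data from make_blobs; the label -1 marks points treated as noise):

# Minimal usage sketch: DBSCAN on toy data; label -1 marks noise points
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, "clusters;", list(labels).count(-1), "noise points")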

HDBSCAN
HDBSCAN (Hierarchical DBSCAN) improves on DBSCAN by handling varying density clusters.
Pros: Better handling of varying densities.
Cons: Can be slower than DBSCAN.
Applications: Large datasets, complex clustering problems.
Key hyperparameters:
min_cluster_size: The minimum size of clusters (default = 5).
min_samples: Minimum number of samples to form a cluster (default = 10).
Code syntax:
import hdbscan
clusterer = hdbscan.HDBSCAN(min_cluster_size=5)
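
Example (a minimal usage sketch, assuming the hdbscan package is installed; the toy blobs are given different spreads to mimic varying densities):

# Minimal usage sketch (assumes the hdbscan package is installed)
import hdbscan
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=[0.4, 1.0, 2.0], random_state=42)
clusterer = hdbscan.HDBSCAN(min_cluster_size=5)
labels = clusterer.fit_predict(X)     # -1 marks noise, as in DBSCAN
print(clusterer.probabilities_[:5])   # per-point cluster-membership strength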

K-Means clustering
K-Means is a centroid-based clustering algorithm that groups data into k clusters.
Pros: Efficient, simple to implement.
Cons: Sensitive to initial cluster centroids.
Applications: Customer segmentation, pattern recognition.
Key hyperparameters:
n_clusters: Number of clusters (default = 8).
init: Method for initializing the centroids ('k-means++' or 'random', default = 'k-means++').
n_init: Number of times the algorithm will run with different centroid seeds (default = 10).
Code syntax:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
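
Example (a minimal usage sketch, assuming toy data from make_blobs): fit K-Means and inspect the centroids and the inertia.

# Minimal usage sketch: K-Means on toy data
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)  # one centroid per cluster
print(kmeans.inertia_)          # within-cluster sum of squared distances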

Associated functions used

make_blobs
Generates isotropic Gaussian blobs for clustering.
Code syntax:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

multivariate_normal
Generates samples from a multivariate normal distribution.
Code syntax:
from numpy.random import multivariate_normal
samples = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)

plotly.express.scatter_3d
Creates a 3D scatter plot using Plotly Express.
Code syntax:
import plotly.express as px
fig = px.scatter_3d(df, x='x', y='y', z='z')
fig.show()
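
Example (a minimal self-contained sketch; the DataFrame df of 3-D samples is built here for illustration):

# Minimal usage sketch: 3-D scatter plot of multivariate-normal samples
import pandas as pd
import plotly.express as px
from numpy.random import multivariate_normal

samples = multivariate_normal(mean=[0, 0, 0], cov=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], size=100)
df = pd.DataFrame(samples, columns=['x', 'y', 'z'])
fig = px.scatter_3d(df, x='x', y='y', z='z')
fig.show()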

geopandas.GeoDataFrame
Creates a GeoDataFrame from a Pandas DataFrame.
Code syntax:
import geopandas as gpd
gdf = gpd.GeoDataFrame(df, geometry='geometry')

geopandas.to_crs
Transforms the coordinate reference system of a GeoDataFrame.
Code syntax:
gdf = gdf.to_crs(epsg=3857)

contextily.add_basemap
Adds a basemap to a GeoDataFrame plot for context.
Code syntax:
import contextily as ctx
ax = gdf.plot(figsize=(10, 10))
ctx.add_basemap(ax)
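
Example (a minimal sketch combining the three geospatial helpers above; the toy longitude/latitude points are illustrative):

# Minimal usage sketch: build a GeoDataFrame, reproject it, and plot it over a basemap
import contextily as ctx
import geopandas as gpd
import pandas as pd

df = pd.DataFrame({'longitude': [-79.38, -73.57], 'latitude': [43.65, 45.50]})  # toy points
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['longitude'], df['latitude']), crs='EPSG:4326')
gdf = gdf.to_crs(epsg=3857)      # reproject to Web Mercator for web-tile basemaps
ax = gdf.plot(figsize=(10, 10))
ctx.add_basemap(ax)              # fetches basemap tiles (requires internet access)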


pca.explained_variance_ratio_
Returns the proportion of variance explained by each principal component.
Code syntax:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca.fit(X)
variance_ratio = pca.explained_variance_ratio_
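
Example (a minimal sketch; the 95% threshold is an illustrative choice): use the cumulative explained-variance ratio to decide how many components to keep.

# Minimal usage sketch: pick the smallest number of components covering ~95% of the variance
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

X, _ = make_blobs(n_samples=500, centers=3, n_features=8, random_state=42)
pca = PCA().fit(X)  # keep all components
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print(n_components, "components explain at least 95% of the variance")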

Authors
Jeff Grossman
Abhishek Gagneja

