Machine Learning Cheat Sheet
This cheat sheet provides a comprehensive guide to the top machine learning
algorithms, including their benefits, drawbacks, and practical applications.
If you enjoyed this, follow @chelseaintech on Instagram for more cheat sheets!
Machine Learning:
● Types of ML:
o Supervised Learning: Learns from labeled data where each data
point has an associated label or output value. Used for tasks like
classification (predicting categories) and regression (predicting
continuous values).
o Unsupervised Learning: Learns from unlabeled data where data
points don't have predefined labels. Used for tasks like clustering
(grouping similar data points) and dimensionality reduction
(reducing the number of features).
o Reinforcement Learning: Learns through trial and error interactions
with an environment. The agent receives rewards or penalties for its
actions, and aims to maximize its rewards over time.
● Common ML Tasks:
o Classification: Predicting a category (e.g., spam email or not)
o Regression: Predicting a continuous value (e.g., housing price)
o Clustering: Grouping similar data points together (e.g., customer
segmentation)
o Anomaly Detection: Identifying unusual data points (e.g., fraud
detection)
o Recommendation Systems: Recommending products, movies, etc.
(e.g., Netflix recommendations)
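The supervised/unsupervised split above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed; the data is a toy example, not from the original.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised: every data point comes with a label.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
category = clf.predict([[2.5]])[0]   # classification: predict a category

# Unsupervised: same points, no labels; the model groups them itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
groups = km.labels_                  # clustering: discovered group ids
```

The only difference in the two calls is whether `y` is passed to `fit` — that is the essence of the supervised/unsupervised distinction.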
Supervised Learning

| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Regression | Linear Regression | Linear model for regression tasks | Predicting housing prices based on features like square footage, number of bedrooms, and location | Simple to implement, interpretable | Assumes linear relationship, sensitive to outliers |
| Regression | Polynomial Regression | Extension of Linear Regression for non-linear relationships | Modeling the acceleration of a vehicle based on time and velocity | Flexible, handles non-linear relationships | Computationally expensive, overfitting risk |
| Regression | Ridge Regression | Regularized Linear Regression to prevent overfitting | Predicting stock prices, GDP growth, and inflation rates | Reduces overfitting, improves generalization | Requires hyperparameter tuning |
| Regression | Lasso Regression | Regularized Linear Regression for feature selection | Predicting customer retention strategy | Selects relevant features, reduces dimensionality | Computationally expensive, requires hyperparameter tuning |
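A short sketch of plain, Ridge, and Lasso regression side by side, assuming scikit-learn is installed. The data is synthetic, and the `alpha` values (the regularization-strength hyperparameter the table says needs tuning) are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_coef = np.array([3.0, 0.0, 0.0, 1.5, 0.0])  # only 2 of 5 features matter
y = X @ true_coef + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)  # can drive irrelevant coefficients to zero

print(np.round(lasso.coef_, 2))     # sparse: the three irrelevant features near 0
```

This is why the table lists "selects relevant features" for Lasso: its penalty can zero out coefficients entirely, while Ridge only shrinks them.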
| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Classification | Logistic Regression | Logistic model for binary classification | Credit risk modeling to predict whether a customer is likely to default on a loan | Interpretable, handles categorical features | Assumes linear relationship, sensitive to outliers |
| Classification | Decision Trees | Tree-based model for classification and regression | Diagnosing diseases based on symptoms reported by patients | Handles non-linear relationships, easy to interpret | Overfitting risk, sensitive to data quality |
| Classification | Support Vector Machines (SVM) | Max-margin model for classification and regression | Text classification, bioinformatics, image recognition | Robust to noise, handles high-dimensional data | Computationally expensive, requires hyperparameter tuning |
| Classification | Random Forest | Ensemble of decision trees for improved robustness and accuracy | Predicting customer behavior, medical diagnosis, fraud detection | Reduces overfitting, improves generalization | Computationally expensive, requires hyperparameter tuning |
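The classification algorithms above share one scikit-learn interface, which makes them easy to compare. A minimal sketch, assuming scikit-learn is installed and using its bundled iris dataset; note these are training-set scores only, so they overstate real accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# In-sample accuracy; a real comparison needs a held-out test set.
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
print(scores)
```

The fully grown decision tree scoring near-perfectly on its own training data is exactly the "overfitting risk" the table warns about.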
Unsupervised Learning
| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Clustering | K-Means Clustering | Clustering model for grouping similar data points | Customer segmentation and anomaly detection | Easy to implement, fast | Assumes spherical clusters, sensitive to initial conditions |
| Clustering | Hierarchical Clustering | Clustering model for building a hierarchy of clusters | Document organization based on similarity | Flexible, handles non-spherical clusters | Computationally expensive, requires hyperparameter tuning |
| Clustering | DBSCAN | Clustering model for discovering clusters of varying densities | Detecting air pollution patterns based on sensor data | Handles varying densities, robust to noise | Computationally expensive, requires hyperparameter tuning |
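A quick contrast of K-Means and DBSCAN, assuming scikit-learn is installed. K-Means needs the cluster count up front; DBSCAN discovers it, but the `eps`/`min_samples` values used here are arbitrary and are exactly the hyperparameters the table flags as needing tuning.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, DBSCAN

# Two synthetic blob-shaped clusters of 2-D points.
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # k fixed upfront
db = DBSCAN(eps=0.8, min_samples=5).fit(X)  # k discovered; label -1 means noise

print(set(km.labels_), set(db.labels_))
```

On stretched or nested cluster shapes the two diverge: K-Means still carves the space into spherical regions, while DBSCAN follows the density.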
| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Dimensionality Reduction | Principal Component Analysis (PCA) | Linear dimensionality reduction technique retaining most variability | Data visualization and feature extraction | Fast, easy to implement | Assumes linear relationships, sensitive to outliers |
| Dimensionality Reduction | t-SNE (t-Distributed Stochastic Neighbor Embedding) | Non-linear dimensionality reduction technique for visualizing high-dimensional data | Visualizing genomic data, extracting features from images | Handles non-linear relationships, preserves local structure | Computationally expensive, requires hyperparameter tuning |
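A sketch of both reduction techniques, assuming scikit-learn is installed. A 300-sample slice of the bundled digits dataset keeps t-SNE's runtime small, and the `perplexity` value is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:300]                                   # 300 samples, 64 features each

X_pca = PCA(n_components=2).fit_transform(X)  # linear projection to 2-D
# t-SNE is computationally expensive; running it on a PCA-reduced matrix
# first is a common speed-up:
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
    PCA(n_components=30).fit_transform(X))

print(X_pca.shape, X_tsne.shape)              # both (300, 2)
```

PCA's output is a fixed linear map you can reuse on new data; t-SNE only produces coordinates for the points it was fit on, which is why the table positions it as a visualization tool.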
| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Anomaly Detection | One-Class SVM | Model for detecting outliers and anomalies in data | Credit card fraud detection by identifying anomalous transactions | Robust to noise, handles high-dimensional data | Computationally expensive, requires hyperparameter tuning |
| Anomaly Detection | Local Outlier Factor (LOF) | Model for detecting outliers and anomalies in data | Detecting anomalous readings from IoT sensors | Handles varying densities, robust to noise | Computationally expensive, requires hyperparameter tuning |
| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
| --- | --- | --- | --- | --- | --- |
| Association Rule Mining | Apriori Algorithm | Algorithm for discovering frequent itemsets and association rules | Web usage mining to identify frequent browsing patterns | Fast, easy to implement | Assumes binary data, sensitive to support threshold |
| Association Rule Mining | Eclat Algorithm | Algorithm for discovering frequent itemsets and association rules | Software bug analysis by finding frequently co-occurring error conditions | Handles non-binary data, efficient | Computationally expensive, requires hyperparameter tuning |
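The Apriori idea above can be sketched in pure Python: count itemset support, keep only itemsets meeting the threshold, and extend only those (the pruning step). This is a toy illustration, not a library API; real work would use a package such as mlxtend.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    result, current = {}, [frozenset([i]) for i in sorted(items)]
    while current:
        # Count each candidate's support in one pass over the transactions.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        frequent = {c: v / n for c, v in counts.items() if v / n >= min_support}
        result.update(frequent)
        # Apriori pruning: only unions of frequent k-sets can be frequent (k+1)-sets.
        keys = list(frequent)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == len(a) + 1})
    return result

baskets = [frozenset(t) for t in
           [{"milk", "bread"}, {"milk", "eggs"}, {"milk", "bread", "eggs"}]]
print(frequent_itemsets(baskets, min_support=2 / 3))
```

On these three baskets with a 2/3 support threshold, {bread, eggs} is dropped (it appears in only one basket), so {milk, bread, eggs} is never even counted — that skipped count is the whole point of the pruning.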
Resources:
https://github.com/antoinebrl/awesome-ml-blogs