Machine Learning Cheat Sheet

This cheat sheet provides a comprehensive guide to the top machine learning
algorithms, including their benefits, drawbacks, and practical applications. If you
enjoyed this, follow @chelseaintech on Instagram for more cheat sheets.

Machine Learning:

Machine Learning is a field of computer science that allows computers to learn
from data without explicit programming. It involves building algorithms that can
improve their performance on a specific task over time.

● Types of ML:
o Supervised Learning: Learns from labeled data where each data
point has an associated label or output value. Used for tasks like
classification (predicting categories) and regression (predicting
continuous values).
o Unsupervised Learning: Learns from unlabeled data where data
points don't have predefined labels. Used for tasks like clustering
(grouping similar data points) and dimensionality reduction
(reducing the number of features).
o Reinforcement Learning: Learns through trial and error interactions
with an environment. The agent receives rewards or penalties for its
actions, and aims to maximize its rewards over time.
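
The trial-and-error loop of reinforcement learning can be sketched with a toy example. The snippet below is a minimal epsilon-greedy agent on a multi-armed bandit, not a full RL algorithm. The function name `run_epsilon_greedy`, the fixed noise-free rewards, and the values of `epsilon`, `steps`, and `seed` are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a toy multi-armed bandit.

    rewards[i] is the fixed (noise-free) reward for action i, a
    simplification that keeps the example easy to reason about.
    """
    rng = random.Random(seed)            # seeded for reproducibility
    estimates = [0.0] * len(rewards)     # agent's value estimate per action
    counts = [0] * len(rewards)          # how often each action was tried
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the best estimate so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])

        reward = rewards[action]         # feedback from the environment
        total_reward += reward
        counts[action] += 1
        # Incremental mean update of the estimate for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]

    best_action = max(range(len(rewards)), key=lambda a: estimates[a])
    return best_action, total_reward


best_action, total_reward = run_epsilon_greedy([1.0, 2.0, 5.0])
print(best_action, total_reward)
```

The agent starts out exploiting whichever action it tried first, and only occasional exploration lets it discover the action with the highest reward.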

● Common ML Tasks:
o Classification: Predicting a category (e.g., spam email or not)
o Regression: Predicting a continuous value (e.g., housing price)
o Clustering: Grouping similar data points together (e.g., customer
segmentation)
o Anomaly Detection: Identifying unusual data points (e.g., fraud
detection)
o Recommendation Systems: Recommending products, movies, etc.
(e.g., Netflix recommendations)
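
As a small illustration of the anomaly detection task, here is a hedged sketch that flags values far from the mean using z-scores. This is a simple statistical baseline, not one of the models listed later in this sheet. The function name `zscore_outliers`, the threshold of 2.0, and the sample amounts are assumptions for the example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)   # population standard deviation
    if sigma == 0:
        return []            # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts with one suspicious entry.
amounts = [10, 11, 9, 10, 12, 10, 11, 95]
print(zscore_outliers(amounts))  # [95]
```

A single extreme value inflates both the mean and the standard deviation, so on real data a more robust method is usually preferable.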

● Data for ML:
o Data is crucial for ML. It can be structured (tabular), unstructured
(text, images), or semi-structured (like JSON).
o Pre-processing steps like cleaning (handling missing values),
normalization (scaling features), and feature engineering (creating
new features) are often required.
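
The three pre-processing steps named above can be sketched in plain Python. This is a minimal illustration under assumed data. The helper names `impute_missing` and `min_max_scale`, the use of mean imputation, and the housing values are hypothetical choices, not a prescribed pipeline.

```python
from statistics import mean


def impute_missing(values):
    """Cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical housing records; None marks a missing value.
area_sqft = [1000, None, 2000, 1500]
bedrooms = [2, 3, 5, 2]

area_clean = impute_missing(area_sqft)     # missing entry becomes 1500
area_scaled = min_max_scale(area_clean)    # [0.0, 0.5, 1.0, 0.5]
# Feature engineering: derive a new feature from two existing ones.
area_per_bedroom = [a / b for a, b in zip(area_clean, bedrooms)]

print(area_scaled)
print(area_per_bedroom)
```

In practice, Pandas and scikit-learn provide these operations for whole tables at once; the plain-Python version only shows what each step does.
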
● Essential Algorithms:
1. Supervised Learning:

| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Regression | Linear Regression | Linear model for regression tasks | Predicting housing prices based on features like square footage, number of bedrooms, and location | Simple to implement, interpretable | Assumes linear relationship, sensitive to outliers |
| Regression | Polynomial Regression | Extension of Linear Regression for non-linear relationships | Modeling the acceleration of a vehicle based on time and velocity | Flexible, handles non-linear relationships | Computationally expensive, overfitting risk |
| Regression | Ridge Regression | Regularized Linear Regression to prevent overfitting | Predicting stock prices, GDP growth, and inflation rates | Reduces overfitting, improves generalization | Requires hyperparameter tuning |
| Regression | Lasso Regression | Regularized Linear Regression for feature selection | Predicting customer retention | Selects relevant features, reduces dimensionality | Computationally expensive, requires hyperparameter tuning |
| Classification | Logistic Regression | Logistic model for binary classification | Credit risk modeling to predict whether a customer is likely to default on a loan | Interpretable, handles categorical features | Assumes a linear relationship between features and log-odds, sensitive to outliers |
| Classification | Decision Trees | Tree-based model for classification and regression | Diagnosing diseases based on symptoms reported by patients | Handles non-linear relationships, easy to interpret | Overfitting risk, sensitive to data quality |
| Classification | Support Vector Machines (SVM) | Max-margin model for classification and regression | Text classification, bioinformatics, image recognition | Robust to noise, handles high-dimensional data | Computationally expensive, requires hyperparameter tuning |
| Classification | Random Forest | Ensemble of decision trees for improved robustness and accuracy | Predicting customer behavior, medical diagnosis, fraud detection | Reduces overfitting, improves generalization | Computationally expensive, requires hyperparameter tuning |
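
To make the first entry concrete, here is a minimal sketch of linear regression with a single feature, fitted by ordinary least squares. The function name `fit_simple_linear_regression` and the housing numbers are assumptions for illustration, and the data is constructed to lie exactly on a line so the fit is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: price (in $1000s) generated as 0.2 * sqft + 50.
sqft = [500, 1000, 1500, 2000]
price = [150, 250, 350, 450]

slope, intercept = fit_simple_linear_regression(sqft, price)
predicted = slope * 1200 + intercept   # prediction for a 1200 sqft house
print(slope, intercept, predicted)
```

The fitted slope and intercept are directly readable as "price per square foot" and "base price", which is the interpretability advantage listed for this model. A single extreme point would shift both, which is the sensitivity to outliers.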

2. Unsupervised Learning:

| Category | Algorithm | Description | Use Cases | Advantages | Disadvantages |
|---|---|---|---|---|---|
| Clustering | K-Means Clustering | Clustering model for grouping similar data points | Customer segmentation and anomaly detection | Easy to implement, fast | Assumes spherical clusters, sensitive to initial conditions |
| Clustering | Hierarchical Clustering | Clustering model for building a hierarchy of clusters | Document organization based on similarity | Flexible, handles non-spherical clusters | Computationally expensive, requires hyperparameter tuning |
| Clustering | DBSCAN | Density-based clustering model for discovering clusters of arbitrary shape | Detecting air pollution patterns based on sensor data | Handles arbitrary cluster shapes, robust to noise | Struggles with widely varying densities, requires hyperparameter tuning |
| Dimensionality Reduction | Principal Component Analysis (PCA) | Linear dimensionality reduction technique for retaining most variability | Data visualization and feature extraction | Fast, easy to implement | Assumes linear relationships, sensitive to outliers |
| Dimensionality Reduction | t-SNE (t-Distributed Stochastic Neighbor Embedding) | Non-linear dimensionality reduction technique for visualizing high-dimensional data | Dimensionality reduction for genomic data, extracting features from images | Handles non-linear relationships, preserves local structure | Computationally expensive, requires hyperparameter tuning |
| Anomaly Detection | One-Class SVM | Model for detecting outliers and anomalies in data | Credit card fraud detection by identifying anomalous transactions | Robust to noise, handles high-dimensional data | Computationally expensive, requires hyperparameter tuning |
| Anomaly Detection | Local Outlier Factor (LOF) | Model for detecting outliers and anomalies in data | Detecting anomalous readings from IoT sensors | Handles varying densities, robust to noise | Computationally expensive, requires hyperparameter tuning |
| Association Rule Mining | Apriori Algorithm | Algorithm for discovering frequent itemsets and association rules | Web usage mining to identify frequent browsing patterns | Fast, easy to implement | Assumes binary data, sensitive to support threshold |
| Association Rule Mining | Eclat Algorithm | Algorithm for discovering frequent itemsets and association rules | Software bug analysis by finding frequently co-occurring error conditions | Handles non-binary data, efficient | Computationally expensive, requires hyperparameter tuning |
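
As an illustration of the K-Means entry, the sketch below alternates the assignment and update steps on 2-D points. It is a bare-bones version for intuition only. The function name `kmeans`, the explicit starting centroids, and the sample points are assumptions, and passing the starting centroids by hand is a deliberate choice to show that the result depends on initial conditions.

```python
def kmeans(points, centroids, iterations=10):
    """Minimal k-means on 2-D points, starting from the given centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers described by two features, forming two groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, centroids=[(1, 1), (8, 8)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Library implementations such as scikit-learn's `KMeans` add smarter initialization and multiple restarts to reduce the dependence on the starting centroids.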

● Python libraries you need to know:

| Python Libraries | Primary Use Cases | Key Features |
|---|---|---|
| NumPy | Data manipulation | Mathematical functions for arrays |
| Pandas | Data manipulation, data analysis, data cleaning | DataFrame and Series objects, handling missing data, reading/writing various file formats |
| Matplotlib/Seaborn | Data visualization, statistical graphics | Wide range of plot types, highly customizable, integration with NumPy and Pandas |
| scikit-learn | Data preprocessing, model evaluation | Simple, efficient data mining and analysis |
| TensorFlow | Deep learning, numerical computation | Flexible architecture for deployment |
| PyTorch | Training neural networks | Strong GPU acceleration, dynamic computation graphs |

Resources:

Explore this curated compilation of outstanding technical blogs on machine
learning, spanning from cutting-edge research to deployment. Whether you seek
the latest breakthroughs or practical tutorials, these sites are essential for
staying informed and up-to-date.

https://github.com/antoinebrl/awesome-ml-blogs
