Supervised Learning Algorithms cheat sheet
by Dmytro Nikolaiev
Introduction
Supervised learning is the machine learning task of learning a function that maps an input to an
output based on example input-output pairs. A supervised learning algorithm analyzes the
training data and produces an inferred function, which can be used later for mapping new
examples.
The most popular supervised learning tasks are regression and classification.
The result of solving the regression task is a model that can make numerical predictions.
For example:
o Real estate value prediction
o Predicting your company's revenue next year
The result of solving the classification task is a model that can make class predictions.
For example:
o Spam detection
o Classifying news articles
The line between these tasks is sometimes fuzzy (e.g. predicting the probability of cancer
based on blood tests).
Hard classification algorithms predict whether a data point belongs to a particular class
without producing probability estimates.
Soft classification algorithms, in turn, also estimate the class conditional probabilities.
Some algorithms are designed only for binary classification problems (SVM, for example), so
they cannot be used for multi-class classification tasks directly. Instead, heuristic methods can be
used to split a multi-class classification problem into multiple binary classification datasets and
train a binary classifier on each:
OvR (one-vs-rest) - sometimes OvA (one-vs-all) - you have to train N classifiers for N
classes, but each on the full dataset.
OvO (one-vs-one) - you have to train N*(N-1)/2 classifiers for N classes, but each on a
subsample of your dataset containing only the two classes involved. Better suited for unbalanced samples.
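A minimal sketch of both strategies using scikit-learn's wrappers around a binary SVM; the dataset here is just a placeholder:

    from sklearn.datasets import load_iris
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)  # 3 classes

    # OvR: one binary classifier per class (3 here), each trained on the full dataset
    ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

    # OvO: one binary classifier per pair of classes (3*2/2 = 3 here), each trained
    # only on the samples of the two classes involved
    ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)

    print(ovr.predict(X[:3]), ovo.predict(X[:3]))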
Next, the following algorithms will be reviewed or mentioned (note that all of them solve both
classification and regression tasks, except for Linear Regression (regression only) and Logistic
Regression (classification only)):
Linear Regression
Logistic Regression
Support Vector Machines
k-Nearest Neighbors
Decision Tree
Ensemble methods:
o Bagging and Pasting
o Random Forest and Extra Trees
o Boosting
o Stacking and Blending
Simple Algorithms
I use the phrase simple algorithms not to mean that they are simple to implement (although
some of them really are), but to mean that they are standalone algorithms rather than the
ensemble methods that we will see later.
Linear Regression
In the simplest case, the regression task is to draw a line through the data points so that the error
between this line (the predictions) and the real values is minimal. In general, this is the problem of
minimizing a loss function, i.e. an optimization problem. Usually, the loss function is the MSE -
mean squared error (because of maximum likelihood estimation) - and the optimization algorithm
is gradient descent. That said, any other loss function or optimization algorithm can be used.
One of the important properties of linear regression is that the optimal parameters (according to
MSE, again because of maximum likelihood estimation) can be calculated in closed form with the simple Normal
Equation. However, this method does not scale well with a large number of features, so other
optimization methods can be applied instead.
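For reference, a standard way to write this (notation assumed here: X is the design matrix with a column of ones for the bias term, y is the target vector, theta is the parameter vector, and N is the number of samples):

    MSE(\theta) = \frac{1}{N} \| X\theta - y \|^2
    \hat{\theta} = (X^{\top} X)^{-1} X^{\top} y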
If the dependence in the data is more complex than a straight line, we can add powers of each feature
as new features (the PolynomialFeatures class from sklearn can be used) and then train a Linear
Regression model. This technique is called Polynomial Regression. The process of creating new
features (e.g. x^n, log(x), e^x, etc.) is called feature engineering and can significantly improve
linear model performance.
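A minimal sketch of Polynomial Regression with scikit-learn (the toy data and the degree are illustrative assumptions):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Toy data with a quadratic dependence plus noise (illustrative only)
    rng = np.random.RandomState(42)
    X = rng.uniform(-3, 3, size=(100, 1))
    y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(scale=0.3, size=100)

    # PolynomialFeatures adds x^2 as a new feature, then plain Linear Regression is fit
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, y)
    print(model.predict([[2.0]]))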
Another popular version of this algorithm is Bayesian Linear Regression, which predicts not only
values but also the uncertainty of those values, by building a confidence interval. This is possible thanks to
Bayes' theorem.
One of the most efficient ways to reduce overfitting and the influence of outliers in regression is
regularization. A regularization term is added to the loss function so that the regression coefficients
are kept as small as possible.
Regularized regression can also be used as a feature selection tool. Thanks to the properties of its
penalty, LASSO regression, for example, can eliminate insignificant features (set their coefficients exactly to zero).
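A minimal sketch of LASSO as a feature selector (the alpha value is an assumption and should be tuned, e.g. by cross-validation):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic data where only 5 of the 20 features are informative
    X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                           noise=5.0, random_state=0)

    lasso = Lasso(alpha=1.0)  # larger alpha -> stronger penalty -> more zero coefficients
    lasso.fit(X, y)

    # Features whose coefficients were driven exactly to zero can be dropped
    print("kept features:", np.flatnonzero(lasso.coef_))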
Main hyperparameters:
Pros:
Cons:
Logistic Regression
Like Linear Regression model, Logistic Regression (is also known as logit regression) computes
a weighted sum of the input features (plus bias) but instead of outputting this result directly, it
outputs the logistic of the result. The logistic is a sigmoid function, that outputs a number
between 0 and 1, so Logistic Regression is soft binary classifier that estimates the probability
that instance belongs to the positive class. Depends of some threshold different values of
accuracy/recall can be obtained. The same types of regularization as in Linear Regression can be
used.
The very similar probit regression uses a slightly different function - the probit function instead of the
sigmoid.
The Logistic Regression model can be generalized to support multiple classes directly, without
training multiple classifiers. This is called Softmax Regression (or Multinomial Logistic
Regression). This model computes a score for each class and then estimates the probability of
each class by applying the softmax function (also called the normalized exponential).
Logistic Regression is based on Linear Regression, so it inherits all the hyperparameters, pros, and cons of that
algorithm. What can be noted separately is its high interpretability, which is why it is
widely used in credit scoring and medical diagnostics.
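A minimal sketch of soft classification with a custom decision threshold (the dataset and the 0.3 threshold are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    clf = LogisticRegression()  # L2 regularization by default, strength controlled by C
    clf.fit(X, y)

    proba = clf.predict_proba(X)[:, 1]   # estimated probability of the positive class
    y_pred = (proba >= 0.3).astype(int)  # lowering the threshold trades precision for recall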
Support Vector Machines
The Support Vector Machine (SVM) looks for a separating hyperplane with the widest possible
margin between the classes. Two settings are distinguished:
Hard Margin Classification - it is assumed that instances of the same class are on the
same side of the separating hyperplane without exceptions.
Soft Margin Classification - allows violation of the decision boundary, which is regulated
by the regularization parameter.
In the case of a regression task, the SVM instead tries to draw a line so that as many instances as possible
fit inside the margin, "on the street".
Since SVM requires calculating distances between points, it also requires feature scaling.
The most important and mathematically elegant feature of SVM is that the solution of the Dual
Problem (which is the basis of SVM) does not depend on the features directly (as vectors), but
only on their pairwise scalar products. This allows us to replace the scalar product with a certain
function k(a, b), which is called the kernel. In fact, the kernel is a scalar product in some other
space. This procedure allows you to build nonlinear classifiers (which are actually linear in a
higher-dimensional space) without explicitly adding new features, and it is called the kernel trick.
The use of different kernels allows this algorithm to recover very complex dependencies in both
classification and regression tasks. The most popular kernels are:
polynomial
RBF - Gaussian Radial Basis Function
sigmoid and others
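A minimal sketch, assuming scikit-learn's SVC on a toy nonlinear dataset; the kernel and hyperparameter values are illustrative only:

    from sklearn.datasets import make_moons
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # A nonlinear toy problem: a linear kernel struggles here, an RBF kernel does not
    X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

    # Feature scaling matters for SVM, so it goes into the same pipeline
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X, y)
    print(model.score(X, y))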
One-class SVM can also be used for the anomaly detection problem.
Main hyperparameters:
kernel type
regularization parameter C - a penalty for each misclassified data point (usually 0.1 < C <
100)
kernel coefficient gamma (for RBF and similar kernels) - controls how tightly the regions
separating different classes fit the data; large gamma leads to overly specific class regions (overfitting)
(usually 0.0001 < gamma < 10)
Pros:
Cons:
k-Nearest Neighbors
The nearest neighbor algorithm, as a representative of metric methods, makes two hypotheses
about the data distribution:
Continuity hypothesis for regression - close objects correspond to close answers, and
Compactness hypothesis for classification - close objects correspond to the same class.
For a new object, we find its k nearest neighbors and aggregate their answers (majority vote for
classification, mean value for regression). The definition of nearest depends on the
distance metric that we want to use (Manhattan, Euclidean, etc.).
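A minimal sketch with scikit-learn; the dataset, k, and metric are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_iris(return_X_y=True)

    # Distance-based, so feature scaling is part of the pipeline
    knn = make_pipeline(StandardScaler(),
                        KNeighborsClassifier(n_neighbors=5, metric="euclidean"))
    knn.fit(X, y)
    print(knn.predict(X[:3]))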
Main hyperparameters:
k - number of neighbors
distance metric
Pros:
Cons:
Decision Trees
At each step, the training set is divided into two (or more) parts according to a particular splitting rule.
Usually these algorithms are greedy, which means they look for a locally optimal
solution at each specific step. The popular algorithms for building trees are:
ID3 (one of the oldest algorithms, Iterative Dichotomiser 3 was invented by Ross
Quinlan),
C4.5, C5.0 (extensions of the ID3 algorithm developed by the same author, which prune
the tree after it is built with ID3),
CART (Classification And Regression Tree is optimized for both classification (Gini
impurity as the measure) and regression (MSE as the measure) trees and is implemented in scikit-
learn).
[Figure: Decision Tree structure, using the example Decision Tree Classifier above. Image by Author]
Different impurity measures can be used for calculating information gain. The decision tree algorithm
then uses information gain to choose how to split a particular node; the most common measures are sketched below:
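Standard definitions (notation assumed here: p_k is the fraction of class-k samples in node S, and S_j are the child nodes produced by a split):

    \text{Gini impurity:} \quad G(S) = 1 - \sum_k p_k^2
    \text{Entropy:} \quad H(S) = - \sum_k p_k \log_2 p_k
    \text{Information gain:} \quad IG(S, \text{split}) = H(S) - \sum_j \frac{|S_j|}{|S|} H(S_j)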
So-called decision tree pruning performs better than simply limiting the depth of the
tree. This is the procedure in which we build a tree of full depth and then remove its insignificant
nodes. However, this process is more resource-intensive.
Main hyperparameters:
maximum depth of the tree - the smaller it is, the less overfitting; usually 10-20
minimum number of objects in a leaf - the greater it is, the less overfitting; usually 20+
Pros:
Simple interpretation
Simple implementation
Computational simplicity
Does not require feature preprocessing and can handle missing values
Feature importance can be calculated using information gain
Cons:
Unstable and variable (a consequence of the greedy algorithm) - a small change in the input
data can completely change the structure of the tree
High sensitivity to the content of the training set and to noise
Poorly recovers complex (non-linear) dependencies
Tendency to overfit at a large tree depth
Unlike linear models, they do not extrapolate (they can only predict values in the
range between the minimum and maximum values of the training set)
Ensemble Methods
Ensemble methods (also ensemble learning) are techniques that create multiple models and then
combine them to produce improved results. Ensemble methods usually produce more accurate
solutions than a single model would.
Bagging
Bagging stands for bootstrap aggregating.
Suppose we have a training set X_train of size N x M (N data points and M features). We then train n
models (often decision trees), each on its own random subsample X (of size N x M) of X_train. When X is
formed with replacement, the algorithm is called bagging; when X is formed without
replacement, it is called pasting. When this model makes a prediction, it actually gets n
predictions from the n different models and aggregates them: classification is computed from a
simple majority vote of the models, and regression is computed from the mean of the models'
predictions.
Pasting was originally designed for large datasets, when computing power is limited. Bagging,
on the other hand, can use the same samples many times, which is great for smaller datasets,
where it improves robustness.
This approach keeps the bias the same but decreases the variance, in the spirit of the Central Limit
Theorem: averaging many weakly correlated predictions reduces their variance. The more variable the base algorithms are, the lower the correlation of their predictions and,
accordingly, the better this averaging works (decision trees are a great choice).
If we are using bagging, there is a chance that some samples are never selected for a given bootstrap, while others
may be selected multiple times. In general, for a big dataset, about 37% of its samples are never
selected (since (1 - 1/N)^N tends to e^(-1) ≈ 0.37 for large N), and we can use them to evaluate our model. This is called Out-of-Bag scoring, or OOB
scoring.
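A minimal sketch with scikit-learn's BaggingClassifier (the default base model is a decision tree; hyperparameter values are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # bootstrap=True -> bagging (sampling with replacement); bootstrap=False -> pasting
    bagging = BaggingClassifier(
        n_estimators=100,
        max_samples=1.0,   # fraction of the training set drawn for each base model
        bootstrap=True,
        oob_score=True,    # evaluate on the ~37% of samples left out of each bootstrap
        random_state=0,
    )
    bagging.fit(X, y)
    print(bagging.oob_score_)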
Main hyperparameters:
type of models
n_estimators - the number of models in the ensemble
max_samples - the number of samples to take from train set to train each base model
max_features - the number of features to take from train set to train each base model
Pros:
Cons:
Complexity of interpretation
Does not cope well with a very large number of features or with sparse data
Trains and makes predictions significantly slower than linear models
Random Forest
Although bagging can be applied to all types of algorithms, bagging over decision
trees has become the most widespread. Since trees are unstable and variable, a good result is obtained. In
fact, random forest is bagging over decision trees combined with the random subspace method.
Suppose we have a training set X_train of size N x M (N data points and M features). We then train n
trees, each on its own random subsample X (of size N x m) drawn from X_train with replacement, where we also
take only a random subset of m (m < M) features. This is called the Random Subspace Method.
When this model makes a prediction, it actually gets n predictions from the n different trees and
aggregates them: classification is computed from a simple majority vote of the trees, and
regression is computed from the mean of the trees' predictions.
As with bagging, this approach keeps the bias the same but decreases the variance.
The related Isolation Forest algorithm can also be used for the anomaly detection problem.
Main hyperparameters:
n_estimators - the number of trees in the ensemble - the more the better
max_features - the number of features to draw from the training set to train each base tree - M/3
for regression and sqrt(M) for classification is recommended (where M is the total number of features)
max_depth - the maximum depth of the tree
min_samples_leaf - the minimum number of samples required to be at a leaf node
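A minimal sketch; the hyperparameter values are illustrative assumptions and should be tuned:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    forest = RandomForestClassifier(
        n_estimators=300,      # more trees rarely hurt quality, they just cost more time
        max_features="sqrt",   # random subset of features considered at each split
        max_depth=None,
        min_samples_leaf=1,
        n_jobs=-1,
        random_state=0,
    )
    forest.fit(X, y)
    print(forest.feature_importances_[:5])  # impurity-based feature importances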
Extra Trees
Extra Trees is related to the widely used random forest algorithm.
Unlike bagging and random forest, which train each decision tree on a bootstrap
sample of the training dataset, the Extra Trees algorithm trains each decision tree on the
whole training dataset.
Like random forest, the Extra Trees algorithm will randomly sample the features at
each split point of a decision tree.
Unlike random forest, which uses a greedy algorithm to select an optimal split point, the
Extra Trees algorithm randomly selects a split point.
It can often achieve performance as good as or better than the random forest algorithm,
and since it uses a simpler procedure to construct the decision trees that are members of the
ensemble, it also works faster.
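A minimal sketch comparing the two ensembles on the same toy data (illustrative only):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Same number of trees for both; Extra Trees typically trains faster
    for Model in (RandomForestClassifier, ExtraTreesClassifier):
        scores = cross_val_score(Model(n_estimators=200, random_state=0), X, y, cv=5)
        print(Model.__name__, scores.mean())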
Boosting
Boosting is an ensemble of weak learners (models whose prediction accuracy is only slightly better than
random) that are trained sequentially, where each subsequent model takes the errors of the
previous one into account.
The general idea of boosting can be implemented in different ways. The three most popular types
of boosting are:
AdaBoost
AdaBoost stands for Adaptive Boosting. This is a greedy iterative algorithm. At each step
it identifies misclassified data points and adjusts their weights so that the next model focuses
on them, minimizing the training error.
This version of boosting is sensitive to outliers.
Gradient Boosting
Gradient Boosting is also called Gradient Boosting Machine - GBM. As with any boosting
implementation, at each step this algorithm tries to minimize the errors made in the
previous steps. But instead of changing the weights (like AdaBoost), GBM trains the next
model on the residual errors of its predecessor. One of the implementations of GBM is
LightGBM.
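A minimal sketch of the residual-fitting idea with two regression trees built by hand (squared-error loss assumed; real libraries add a learning rate and many more rounds):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

    # The first weak model fits the targets directly
    tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)

    # The second weak model fits the residual errors of the first
    residuals = y - tree1.predict(X)
    tree2 = DecisionTreeRegressor(max_depth=2).fit(X, residuals)

    # The ensemble prediction is the sum of the stage predictions
    y_pred = tree1.predict(X) + tree2.predict(X)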
XGBoost
XGBoost stands for eXtreme Gradient Boosting. This implementation was designed for
speed and performance - it supports parallel training, GPUs, and Hadoop. XGBFIR is a great
library for XGBoost feature importance analysis.
Since the main implementations of boosting still use decision trees as base models, boosting,
like random forest, can determine the importance of features. Moreover, the popularity of boosting
has produced many libraries that allow a more detailed analysis (for example, the XGBFIR
library lets you analyze not only individual feature importances but also the importance of pairwise
and even three-way feature combinations).
Main hyperparameters:
Pros:
Cons:
Training is slower than random forest, because the learning process has to be strictly
sequential (although implementations like XGBoost or LightGBM mitigate
this)
Prone to overfitting
Works well only with sufficiently large datasets
Stacking
The architecture of a stacking model involves two or more base models, often referred to as
level-0 models, and a meta-model that combines the predictions of the base models, referred
to as a level-1 model.
Level-0 Models (Base Models): The training data is divided into K folds, and the base models are
trained on K-1 folds at a time, so that their out-of-fold predictions can be collected.
Level-1 Model (Meta-Model): A model that learns how to combine the predictions of the
base models in the best possible way.
Unlike bagging, in stacking, the models are typically different (e.g. not all decision
trees).
Unlike boosting, in stacking, a single model is used to learn how to best combine the
predictions from the contributing models (e.g. instead of a sequence of models that
correct the predictions of prior models).
Using a simple linear model as the meta-model often gives stacking the colloquial name
blending.
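A minimal sketch with scikit-learn's StackingClassifier; the choice of base models and meta-model is an illustrative assumption:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # Level-0 (base) models: deliberately different model families
    base_models = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ]

    # Level-1 (meta) model combines the base models' out-of-fold predictions (cv=5 folds)
    stack = StackingClassifier(estimators=base_models,
                               final_estimator=LogisticRegression(), cv=5)
    stack.fit(X, y)
    print(stack.score(X, y))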
Main hyperparameters:
Pros:
Cons:
Conclusions
The most popular supervised learning algorithms were described here (of course, there are
others). As a conclusion, I want to describe the process of choosing an algorithm to solve a
typical supervised learning task (classification or regression). It's very simple - you just need to
answer two questions.
Is your data sparse? If yes, then you will have to use linear methods. This is usually an SVM, and
with different kernels it will allow you to recover complex dependencies. Remember that linear
methods require data preprocessing, which can be problematic in some cases.
If your data is dense, you are luckier. Now everything depends on its amount: if there is a lot of
data, use boosting; otherwise, use random forest. Both of these algorithms are powerful and
resistant to noise and will give you good quality, but they take a long time to
train and predict. Also, remember that boosting is prone to overfitting.
What does a lot of data mean? How much is it? A threshold of about 100 thousand samples is usually
mentioned, but in any case you can (and most likely will) try different algorithms.
This is just a recommendation, and you should try different algorithms with different
hyperparameters to solve your task in the best way.