Machine Learning A-Z Course Downloadable Slides v1.5


Machine Learning A-Z
Course Slides

© SuperDataScience
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
Welcome to the course!

Dear student,

Welcome to the “Machine Learning A-Z” course brought to you by SuperDataScience. We are
super-excited to have you on board! In this class you will learn many interesting and useful
concepts while having lots of fun.

These slides may be updated from time to time. If this happens, you will be able to find them in
the course materials repository with a new version indicated in the filename.

We kindly ask that you use these slides only for the purpose of supporting your own learning
journey and we look forward to seeing you inside the class!

Enjoy machine learning,


Kirill & Hadelin

PS: if you are not yet enrolled in the course, you can find it here.

Who Are Your Instructors?

• Hello! My name is Kirill Eremenko
• I have a bachelor’s degree in Physics & Maths and a background in Data Analytics consulting
• I used to host the SuperDataScience podcast

• Hi there! My name is Hadelin de Ponteves
• I have a master’s in Machine Learning and I used to do Reinforcement Learning at Google
• I wrote a research paper on Machine Learning

We’ve been teaching online together since 2016 and over 1 Million students have enrolled in our
Machine Learning and Data Science courses. You can be confident that you are in good hands!

Data Preprocessing
The Machine Learning Process
Data Pre-Processing
• Import the data
• Clean the data
• Split into training & test sets
• Feature Scaling

Modelling
• Build the model
• Train the model
• Make predictions

Evaluation
• Calculate performance metrics
• Make a verdict

Training Set & Test Set

Train (80%): fit the model, e.g. ŷ = b0 + b1 X1 + b2 X2

Test (20%): compare predicted values ŷ vs. actual values y

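The 80/20 split can be sketched in plain Python (the function name, seed, and toy data here are illustrative — in practice a library routine such as scikit-learn's `train_test_split` does this):

```python
import random

def split_train_test(X, y, test_size=0.2, seed=42):
    # Shuffle indices reproducibly, then carve off the test fraction
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])

X = [[i] for i in range(10)]   # 10 observations, one feature each
y = list(range(10))
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # 8 2
```

The model only ever sees the training set; the test set stands in for future, unseen data.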
Feature Scaling

Normalization: X′ = (X − Xmin) / (Xmax − Xmin)   → values in [0 ; 1]

Standardization: X′ = (X − μ) / σ   → values mostly in [−3 ; +3]

Feature Scaling

Salary      Age
$70,000     45 yrs
$60,000     44 yrs
$52,000     40 yrs

(Row-to-row differences: $10,000 vs 1 yr, $8,000 vs 4 yrs — the two features live on very different scales)

Feature Scaling

Normalization: X′ = (X − Xmin) / (Xmax − Xmin)   → values in [0 ; 1]

Feature Scaling

$70,000     45 yrs
$60,000     44 yrs
$52,000     40 yrs

[Figure: after normalization both columns land in [0 ; 1] — e.g. $60,000 → (60,000 − 52,000) / (70,000 − 52,000) = 0.444 and 44 yrs → (44 − 40) / (45 − 40) = 0.8 — so the features are now directly comparable]
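Both scaling formulas are easy to apply directly; a minimal sketch (function names are ours, not from the slides):

```python
def normalize(xs):
    # Min-max normalization: X' = (X - Xmin) / (Xmax - Xmin), lands in [0, 1]
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    # Standardization: X' = (X - mean) / (population standard deviation)
    mu = sum(xs) / len(xs)
    sigma = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mu) / sigma for x in xs]

ages = [45, 44, 40]
print(normalize(ages))  # [1.0, 0.8, 0.0]
```

In practice the scaler is fit on the training set only and then applied to the test set with the same parameters, so no test-set information leaks into training.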
Regression
Simple Linear Regression

ŷ = b0 + b1 X1

ŷ — dependent variable
X1 — independent variable
b0 — y-intercept (constant)
b1 — slope coefficient

Simple Linear Regression
[Figure: potato yield y (tonnes) vs nitrogen fertilizer X1 (kg); each point represents a separate harvest]

ŷ = b0 + b1 X1
Potatoes [t] = b0 + b1 × Fertilizer [kg]

b0 = 8 [t]   (the yield with no fertilizer)
b1 = 3 [t/kg]   (+3 t of potatoes per +1 kg of fertilizer)

Ordinary Least Squares
Simple Linear Regression

Ordinary Least Squares:

[Figure: the fitted line through the potato data; for each harvest, the residual is the vertical distance from the point yᵢ to the line ŷᵢ]

residual: εᵢ = yᵢ − ŷᵢ

ŷ = b0 + b1 X1, with b0, b1 such that:

SUM(yᵢ − ŷᵢ)² is minimized

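For one predictor, the minimizing b0 and b1 have a well-known closed form, so the fit can be done by hand (the toy harvest data below is ours, constructed to lie exactly on ŷ = 8 + 3 X1):

```python
def fit_simple_ols(x, y):
    # Closed-form OLS solution for y^ = b0 + b1*x:
    #   b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    #   b0 = y_bar - b1 * x_bar
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical harvests lying exactly on Potatoes = 8 + 3 * Fertilizer
fertilizer_kg = [0, 1, 2, 3]
potatoes_t = [8, 11, 14, 17]
b0, b1 = fit_simple_ols(fertilizer_kg, potatoes_t)
print(b0, b1)  # 8.0 3.0
```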
Multiple Linear Regression

ŷ = b0 + b1 X1 + b2 X2 + … + bn Xn

ŷ — dependent variable
X1 … Xn — independent variables
b0 — y-intercept (constant)
b1 … bn — slope coefficients

Multiple Linear Regression

Potatoes [t] = 8 [t] + 3 [t/kg] × Fertilizer [kg] − 0.54 [t/°C] × AvgTemp [°C] + 0.04 [t/mm] × Rain [mm]

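With several predictors the coefficients are usually found numerically; a small sketch with NumPy's least-squares solver (the data here is hypothetical, generated exactly from known coefficients so the solver should recover them):

```python
import numpy as np

# Hypothetical data generated exactly from y = 8 + 3*x1 - 0.5*x2,
# so least squares should recover those coefficients
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0]])  # leading column of ones carries the intercept b0
y = 8 + 3 * X[:, 1] - 0.5 * X[:, 2]

# Solve min ||X b - y||^2 for b = (b0, b1, b2)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 6))
```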
Additional Reading

The Application of Multiple Linear Regression and


Artificial Neural Network Models for Yield Prediction
of Very Early Potato Cultivars before Harvest

Magdalena Piekutowska et al. (2021)

Link:

https://www.mdpi.com/2073-4395/11/5/885

R Squared

R Squared
[Figure: the fitted regression line ŷ vs the flat average line y_avg through the same data]

SS_res = SUM(yᵢ − ŷᵢ)²        SS_tot = SUM(yᵢ − y_avg)²

R² = 1 − SS_res / SS_tot

Rule of thumb (for our tutorials)*:
1.0  = Perfect fit (suspicious)
~0.9 = Very good
<0.7 = Not great
<0.4 = Terrible
<0   = Model makes no sense for this data

*This is highly dependent on the context

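The R² formula translates directly into code (toy numbers are ours):

```python
def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    y_avg = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_avg) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

y_actual = [8, 11, 14, 17]
y_predicted = [9, 10, 15, 16]
print(round(r_squared(y_actual, y_predicted), 3))  # 0.911
```

A perfect fit gives exactly 1.0, and a model doing worse than simply predicting the average gives a negative value.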
Adjusted R Squared
R² = 1 − SS_res / SS_tot        (R² — goodness of fit; greater is better)

Problem: add another predictor, e.g. ŷ = b0 + b1 X1 + b2 X2 + b3 X3
SS_tot doesn’t change
SS_res will decrease or stay the same (because Ordinary Least Squares drives SS_res to a minimum)
So R² never decreases when variables are added — even useless ones.

Solution:

Adj R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)

k — number of independent variables
n — sample size

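The adjustment is a one-liner; note how the same R² is penalized more as k grows (the example values are ours):

```python
def adjusted_r_squared(r2, n, k):
    # Penalize R^2 for the number of predictors k, given sample size n
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2, but more predictors -> lower adjusted R^2
print(round(adjusted_r_squared(0.9, n=50, k=3), 4))   # 0.8935
print(round(adjusted_r_squared(0.9, n=50, k=10), 4))  # 0.8744
```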
Assumptions of Linear Regression
Anscombe's quartet (1973):

Assumptions of Linear Regression

1. Linearity — linear relationship between Y and each X
2. Homoscedasticity — equal variance
3. Multivariate Normality — normality of error distribution
4. Independence — of observations; includes “no autocorrelation”
5. Lack of Multicollinearity — predictors are not correlated with each other
6. The Outlier Check — not an assumption, but an “extra”

Bonus

Download the Assumptions poster at:

superdatascience.com/assumptions

Additional Reading

Verifying the Assumptions of Linear Regression


in Python and R

Eryk Lewinson (2019)

Link:

towardsdatascience.com/verifying-the-assumptions-of-linear-regression-in-python-and-r-f4cd2907d4c0

Profit R&D Spend Admin Marketing State

192,261.83 165,349.20 136,897.80 471,784.10 New York


191,792.06 162,597.70 151,377.59 443,898.53 California
191,050.39 153,441.51 101,145.55 407,934.54 California
182,901.99 144,372.41 118,671.85 383,199.62 New York
166,187.94 142,107.34 91,391.77 366,168.42 California

y = b0 + b1*x1 + b2*x2 + b3*x3 + ???

Dummy Variables

Profit R&D Spend Admin Marketing State New York California

192,261.83 165,349.20 136,897.80 471,784.10 New York 1 0


191,792.06 162,597.70 151,377.59 443,898.53 California 0 1
191,050.39 153,441.51 101,145.55 407,934.54 California 0 1
182,901.99 144,372.41 118,671.85 383,199.62 New York 1 0
166,187.94 142,107.34 91,391.77 366,168.42 California 0 1

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1

Dummy Variables

Profit R&D Spend Admin Marketing State New York California

192,261.83 165,349.20 136,897.80 471,784.10 New York 1 0


191,792.06 162,597.70 151,377.59 443,898.53 California 0 1
191,050.39 153,441.51 101,145.55 407,934.54 California 0 1
182,901.99 144,372.41 118,671.85 383,199.62 New York 1 0

D2 = 1 − D1
166,187.94 142,107.34 91,391.77 366,168.42 California 0 1

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1 + b5*D2

Dummy Variables

Profit R&D Spend Admin Marketing State New York California

192,261.83 165,349.20 136,897.80 471,784.10 New York 1 0


191,792.06 162,597.70 151,377.59 443,898.53 California 0 1
191,050.39 153,441.51 101,145.55 407,934.54 California 0 1
182,901.99 144,372.41 118,671.85 383,199.62 New York 1 0
166,187.94 142,107.34 91,391.77 366,168.42 California 0 1

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*D1 + b5*D2

Always omit one dummy variable
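One way to build the dummy columns while omitting one of them (the helper name is ours; libraries such as pandas do this with a `drop_first` option):

```python
def one_hot_drop_first(values):
    # Encode a categorical column as dummy variables, omitting the first
    # category to avoid the dummy variable trap (since D2 = 1 - D1, keeping
    # both would make the dummies perfectly collinear with the intercept)
    categories = sorted(set(values))
    kept = categories[1:]  # always omit one dummy variable
    return [[1 if v == c else 0 for c in kept] for v in values]

states = ["New York", "California", "California", "New York", "California"]
print(one_hot_drop_first(states))  # [[1], [0], [0], [1], [0]]
```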
Why?

[Figure: y plotted against candidate predictors X1–X7]
5 methods of building models:
1. All-in
2. Backward Elimination      ⎫
3. Forward Selection         ⎬  Stepwise Regression
4. Bidirectional Elimination ⎭
5. Score Comparison

“All-in” – cases:
• Prior knowledge; OR
• You have to; OR
• Preparing for Backward Elimination

Backward Elimination
STEP 1: Select a significance level to stay in the model (e.g. SL = 0.05)

STEP 2: Fit the full model with all possible predictors

STEP 3: Consider the predictor with the highest P-value. If P > SL, go to STEP 4, otherwise go to FIN

STEP 4: Remove the predictor

STEP 5: Fit the model without this variable, then return to STEP 3


FIN: Your Model Is Ready

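The loop in STEPS 3–5 can be sketched as follows. Note the p-value provider here is a toy stand-in with fixed values — in a real run you would refit OLS (e.g. with statsmodels) after every removal and read the p-values off the new fit:

```python
def backward_elimination(predictors, p_values_for, sl=0.05):
    # STEPS 3-5 as a loop: drop the highest-p-value predictor while it
    # exceeds the significance level SL, refitting after each removal.
    kept = list(predictors)
    while kept:
        pvals = p_values_for(kept)           # stand-in for refitting OLS
        worst = max(kept, key=lambda p: pvals[p])
        if pvals[worst] > sl:
            kept.remove(worst)               # STEP 4: remove, STEP 5: refit
        else:
            break                            # all remaining significant: FIN
    return kept

# Toy stand-in: fixed p-values per predictor (a real refit would change them)
fixed = {"R&D Spend": 0.001, "Admin": 0.6, "Marketing": 0.07}
kept = backward_elimination(list(fixed), lambda ps: {p: fixed[p] for p in ps})
print(kept)  # ['R&D Spend']
```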
Forward Selection
STEP 1: Select a significance level to enter the model (e.g. SL = 0.05)

STEP 2: Fit all simple regression models y ~ xₙ. Select the one with the lowest P-value

STEP 3: Keep this variable and fit all possible models with one extra predictor added to the one(s) you
already have

STEP 4: Consider the predictor with the lowest P-value. If P < SL, go to STEP 3, otherwise go to FIN

FIN: Keep the previous model

Bidirectional Elimination
STEP 1: Select a significance level to enter and to stay in the model
e.g.: SLENTER = 0.05, SLSTAY = 0.05

STEP 2: Perform the next step of Forward Selection (new variables must have: P < SLENTER to enter)

STEP 3: Perform ALL steps of Backward Elimination (old variables must have P < SLSTAY to stay)

STEP 4: Repeat STEPS 2 and 3 until no new variables can enter and no old variables can exit

FIN: Your Model Is Ready

All Possible Models
STEP 1: Select a criterion of goodness of fit (e.g. Akaike criterion)

STEP 2: Construct All Possible Regression Models: 2^N − 1 total combinations

STEP 3: Select the one with the best criterion

FIN: Your Model Is Ready

Example: 10 columns means 2^10 − 1 = 1,023 models

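The 2^N − 1 count is just the number of non-empty predictor subsets, which is easy to verify:

```python
from itertools import combinations

def all_possible_models(columns):
    # Every non-empty subset of the predictors: 2^N - 1 candidate models
    models = []
    for r in range(1, len(columns) + 1):
        models.extend(combinations(columns, r))
    return models

cols = [f"X{i}" for i in range(1, 11)]  # 10 columns
print(len(all_possible_models(cols)))  # 1023
```

This exponential blow-up is why exhaustive score comparison is rarely practical beyond a handful of columns.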
In this section we learned:
1. How to create dummies for categorical IVs
2. How to avoid the dummy variable trap
3. Backward, Forward, Bidirectional, All Possible
4. We actually built a model. Step-By-Step!!
5. How to use adjusted R-squared in modelling
6. How to interpret coefficients of a MLR

Simple Linear Regression

Multiple Linear Regression

Polynomial Linear Regression

[Figures: y plotted against x1 — a straight-line fit vs. a curved (polynomial) fit]
Polynomial Linear Regression

ŷ = b0 + b1 X1 + b2 X1² + … + bn X1ⁿ
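The trick is that the model stays linear in its coefficients — only the inputs are expanded into powers. A minimal sketch of that expansion (function name is ours; scikit-learn's `PolynomialFeatures` does the same job):

```python
def polynomial_features(x, degree):
    # Expand one feature x into [x, x^2, ..., x^degree]; the model is still
    # linear in the coefficients b1..bn, hence "polynomial *linear* regression"
    return [x ** d for d in range(1, degree + 1)]

print(polynomial_features(2.0, 4))  # [2.0, 4.0, 8.0, 16.0]
```

After this expansion the multiple linear regression machinery applies unchanged.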
Support Vector Regression (SVR)

Vladimir Vapnik (1992)
Ordinary Least Squares ε-Insensitive Tube
Ordinary Least Squares objective: SUM(y − ŷ)² → min

[Figure: the ε-insensitive tube — a band of width ε on either side of the regression line; errors inside the tube are ignored]

(1/2) ‖w‖² + C Σᵢ₌₁ᵐ (ξᵢ + ξᵢ*) → min

Ordinary Least Squares: SUM(y − ŷ)² → min     vs.     ε-Insensitive Tube with Slack Variables ξᵢ and ξᵢ*

[Figure: points above the tube incur slack ξᵢ, points below incur ξᵢ*]

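The ε-insensitive idea is easiest to see in the loss itself — errors inside the tube are free, and only the overshoot (the slack) is paid for. A small sketch (function name and numbers are ours):

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    # Errors inside the epsilon-tube cost nothing; outside it, the cost is
    # the slack xi = |error| - epsilon (how far the point sticks out)
    return sum(max(0.0, abs(yt - yp) - epsilon)
               for yt, yp in zip(y_true, y_pred))

y_true = [1.0, 2.0, 3.0]
y_pred = [1.2, 2.0, 4.0]  # first error (0.2) is inside the tube, last (1.0) is not
print(epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5))  # 0.5
```

The full SVR objective additionally shrinks ‖w‖², trading flatness of the fit against the total slack via the constant C.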
Additional Reading:

Chapter 4 – Support Vector Regression


(from: Efficient Learning Machines:
Theories, Concepts, and Applications for
Engineers and System Designers)

By Mariette Awad & Rahul Khanna (2015)

Link:

https://core.ac.uk/download/pdf/81523322.pdf

[Figure: non-linear SVR — an ε-tube around a curved fit]
Section on SVM:
• SVM Intuition

Section on Kernel SVM:
• Kernel SVM Intuition
• Mapping to a higher dimension
• The Kernel Trick
• Types of Kernel Functions
• Non-linear Kernel SVR

Image source: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html

Decision Tree Regression

[Figure: scatter of the data in the (X1, X2) plane, carved into regions — Split 1 at X1 = 20, Split 2 at X2 = 170, Split 3 at X2 = 200, Split 4 at X1 = 40]

[Figure: Split 1 divides the plane at X1 = 20]
Tree so far:

X1 < 20  (Yes / No)
[Figure: Split 2 at X2 = 170 subdivides the X1 ≥ 20 region]
Tree so far:

X1 < 20
  No: X2 < 170
X1 < 20
  Yes: X2 < 200
  No:  X2 < 170
X1 < 20
  Yes: X2 < 200
  No:  X2 < 170
    Yes: X1 < 40
[Figure: each region is assigned the average y of its training points — 300.5, 65.7, −64.1, 0.7, and 1023]
X1 < 20
  Yes: X2 < 200
    Yes → 300.5
    No  → 65.7
  No: X2 < 170
    Yes: X1 < 40
      Yes → −64.1
      No  → 0.7
    No → 1023
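Prediction with the fitted tree is just a walk down the splits; the tree from the slides translates directly into nested conditionals:

```python
def predict(x1, x2):
    # Walk the splits of the fitted tree from the slides; each leaf returns
    # the average y of the training points that fall in its region
    if x1 < 20:
        return 300.5 if x2 < 200 else 65.7
    if x2 < 170:
        return -64.1 if x1 < 40 else 0.7
    return 1023

print(predict(10, 150))  # 300.5
print(predict(30, 100))  # -64.1
print(predict(50, 300))  # 1023
```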
Random Forest Regression
STEP 1: Pick at random K data points from the Training set.

STEP 2: Build the Decision Tree associated to these K data points.

STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2

STEP 4: For a new data point, make each one of your Ntree trees predict the value of Y
for the data point in question, and assign the new data point the average across all of the
predicted Y values.

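The steps above can be sketched as an ensemble of stand-in "trees". To keep the sketch self-contained, each stub tree simply predicts the mean y of its K sampled points — a real forest (e.g. scikit-learn's `RandomForestRegressor`) would fit genuine decision trees instead — but the bootstrap sampling and STEP 4's averaging are shown faithfully:

```python
import random

def make_stub_tree(X, y, k, rng):
    # STEPS 1-2: pick K random training points and "fit" a tree to them.
    # Stand-in tree: it just predicts the mean y of its K points; a real
    # decision tree would split on the features instead.
    idx = [rng.randrange(len(X)) for _ in range(k)]
    mean_y = sum(y[i] for i in idx) / k
    return lambda x: mean_y

def forest_predict(trees, x):
    # STEP 4: average the predictions of all Ntree trees
    return sum(tree(x) for tree in trees) / len(trees)

rng = random.Random(42)
X = [[i] for i in range(100)]
y = [float(i) for i in range(100)]
trees = [make_stub_tree(X, y, k=20, rng=rng) for _ in range(50)]  # STEP 3: Ntree = 50
print(round(forest_predict(trees, [50]), 1))  # near the overall mean of ~49.5
```

Averaging many noisy trees is the point: individual trees vary a lot, but their mean is far more stable.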
Classification

What is Classification?
Classification: a Machine Learning technique to identify the
category of new observations based on training data.

[Figures: customers likely to stay vs. likely to leave; photos of dogs vs. cats]

Logistic Regression
Logistic regression: predict a categorical dependent variable (e.g. will purchase health insurance: Yes / No) from a number of independent variables.

ln(p / (1 − p)) = b0 + b1 X1

[Figure: probability of taking up the offer vs Age (X1) — e.g. 81% ≥ 50% → predicted YES; 42% < 50% → predicted NO]

Logistic Regression

Will purchase health insurance (Yes / No), from: Age (X1), Income (X2), Level of Education (X3), Family or Single (X4)

ln(p / (1 − p)) = b0 + b1 X1 + b2 X2 + b3 X3 + b4 X4

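Solving the log-odds equation for p gives the familiar sigmoid, which is all that is needed to turn coefficients into probabilities and then into Yes/No predictions (the coefficients below are hypothetical, chosen for illustration):

```python
import math

def predict_probability(x, b0, b1):
    # Invert ln(p / (1 - p)) = b0 + b1*x  =>  p = 1 / (1 + e^-(b0 + b1*x))
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def classify(x, b0, b1, threshold=0.5):
    # Project the probability onto Yes/No at the 50% line
    return "YES" if predict_probability(x, b0, b1) >= threshold else "NO"

# Hypothetical coefficients: the chance of taking the offer rises with age
b0, b1 = -8.0, 0.2
print(round(predict_probability(45, b0, b1), 2))   # 0.73
print(classify(35, b0, b1), classify(60, b0, b1))  # NO YES
```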
Maximum Likelihood
[Figure: the fitted curve assigns each observation a probability — YES observations contribute p̂ᵢ to the likelihood, NO observations contribute (1 − p̂ᵢ)]

Likelihood = 0.03 x 0.54 x 0.92 x 0.95 x 0.98 x (1 – 0.01) x (1 – 0.04) x (1 – 0.10) x (1 – 0.58) x (1 – 0.96)

Likelihood = 0.00019939

Maximum Likelihood

Candidate curves and their likelihoods:

Likelihood = 0.00007418
Likelihood = 0.00012845
Likelihood = 0.00016553
Likelihood = 0.00019939

Best Curve ⇐ Maximum Likelihood (pick the curve whose likelihood is highest)

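The likelihood product from the slide can be checked directly in code:

```python
def likelihood(probs, outcomes):
    # Product over observations: p_i if the person said YES, (1 - p_i) if NO
    result = 1.0
    for p, said_yes in zip(probs, outcomes):
        result *= p if said_yes else (1 - p)
    return result

# Predicted probabilities from the slide, with the observed YES/NO labels
probs    = [0.03, 0.54, 0.92, 0.95, 0.98, 0.01, 0.04, 0.10, 0.58, 0.96]
outcomes = [True, True, True, True, True, False, False, False, False, False]
print(round(likelihood(probs, outcomes), 8))  # 0.00019939
```

Because such products of many small numbers underflow quickly, real implementations maximize the log-likelihood (a sum of logs) instead.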
K-Nearest Neighbors (K-NN)

Before K-NN: a new data point lies between Category 1 and Category 2.
After K-NN: the new data point is assigned to Category 1.

STEP 1: Choose the number K of neighbors

STEP 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance

STEP 3: Among these K neighbors, count the number of data points in each category

STEP 4: Assign the new data point to the category where you counted the most neighbors

Your Model is Ready

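The four steps fit in a few lines of plain Python (the toy training points are ours):

```python
import math
from collections import Counter

def knn_classify(train, new_point, k=5):
    # STEP 2: find the K nearest neighbors by Euclidean distance
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    neighbors = sorted(train, key=lambda item: dist(item[0], new_point))[:k]
    # STEPS 3-4: count the categories among them, assign the majority
    counts = Counter(category for _, category in neighbors)
    return counts.most_common(1)[0][0]

train = [((1, 1), "Category 1"), ((1, 2), "Category 1"), ((2, 1), "Category 1"),
         ((8, 8), "Category 2"), ((8, 9), "Category 2"), ((9, 8), "Category 2")]
print(knn_classify(train, (2, 2), k=5))  # Category 1 (3 neighbors vs 2)
```

Since the vote is distance-based, feature scaling matters a great deal for K-NN.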
STEP 1: Choose the number K of neighbors: K = 5

[Figure: the new data point among the Category 1 and Category 2 points]

Euclidean distance between P1(x1, y1) and P2(x2, y2):

d(P1, P2) = √((x2 − x1)² + (y2 − y1)²)
[Figure: among the K = 5 nearest neighbors — Category 1: 3 neighbors, Category 2: 2 neighbors → the new data point is assigned to Category 1]
Support Vector Machine (SVM)

[Figure: two classes in the (x1, x2) plane — which line best separates them?]
Maximum Margin

[Figure: the support vectors are the data points closest to the separating line]

Maximum Margin

[Figure: the Maximum Margin Hyperplane (Maximum Margin Classifier) sits midway between the Positive Hyperplane and the Negative Hyperplane, which pass through the Support Vectors]

Linearly Separable vs. Not Linearly Separable

[Figure: on the left a straight line separates the classes; on the right no straight line can]

A 1D example: map the points through f = x − 5, then through f = (x − 5)²

[Figures: on the x1 line the two classes cannot be separated by a single point; after squaring, a horizontal line separates them]
2D Space → Mapping Function → 3D Space (new z dimension)

[Figure: after mapping, a hyperplane separates the two classes in 3D]
3D Space → Projection → 2D Space

[Figure: the separating hyperplane projects back to a non-linear separator in the original 2D space]
Mapping to a Higher Dimensional Space
can be highly compute-intensive

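The f = (x − 5)² example can be made concrete. The data below is hypothetical (one class sits between the other's points on the line, so no single cut point works), and `map_through` is our own helper name:

```python
def map_through(points, f):
    # Add a second dimension computed from the first: (x, f(x))
    return [(x, f(x)) for x in points]

# Hypothetical 1D data: the red class sits between the green points
greens = [1, 2, 8, 9]
reds = [4, 5, 6]

f = lambda x: (x - 5) ** 2  # the squaring step from the slides
print(max(fx for _, fx in map_through(reds, f)))    # 1
print(min(fx for _, fx in map_through(greens, f)))  # 9
# Every red point now lies below f = 5 and every green point above it:
# a horizontal line separates the classes in the new dimension.
```

The kernel trick that follows achieves the same separating power without ever computing the mapped coordinates explicitly.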
The Kernel Trick

The Gaussian RBF kernel: K(x⃗, l⃗ⁱ) = e^(−‖x⃗ − l⃗ⁱ‖² / (2σ²))

[Figure: the RBF kernel surface — value 1 at the landmark l⃗ⁱ, decaying towards 0 with distance]

Image source: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html

[Figure: a landmark l⃗ placed in the 2D space — K(x⃗, l⃗) measures how close each point x⃗ is to the landmark]
With two landmarks (simplified formula): K(x⃗, l⃗¹) + K(x⃗, l⃗²)

Green when: K(x⃗, l⃗¹) + K(x⃗, l⃗²) > 0
Red when:   K(x⃗, l⃗¹) + K(x⃗, l⃗²) = 0
Types of Kernel Functions

Gaussian RBF Kernel:  K(x⃗, l⃗ⁱ) = e^(−‖x⃗ − l⃗ⁱ‖² / (2σ²))

Sigmoid Kernel:       K(X, Y) = tanh(γ · Xᵀ Y + r)

Polynomial Kernel:    K(X, Y) = (γ · Xᵀ Y + r)ᵈ, γ > 0

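The Gaussian RBF kernel is just a few lines to evaluate — 1 exactly at the landmark, falling towards 0 with distance, with σ controlling how fast:

```python
import math

def rbf_kernel(x, landmark, sigma=1.0):
    # K(x, l) = exp(-||x - l||^2 / (2 * sigma^2)):
    # equals 1 at the landmark and decays towards 0 with distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2 * sigma ** 2))

print(rbf_kernel((0.0, 0.0), (0.0, 0.0)))            # 1.0
print(round(rbf_kernel((1.0, 0.0), (0.0, 0.0)), 4))  # 0.6065
print(round(rbf_kernel((3.0, 0.0), (0.0, 0.0)), 4))  # 0.0111
```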
Section on SVR:
• SVR Intuition

Section on SVM:
• SVM Intuition

Section on Kernel SVM:
• Kernel SVM Intuition
• Mapping to a higher dimension
• The Kernel Trick
• Types of Kernel Functions
• Non-linear Kernel SVR

Image source: http://www.cs.toronto.edu/~duvenaud/cookbook/index.html
[Figures: Gaussian RBF kernels K(x⃗, l⃗ⁱ) = e^(−‖x⃗ − l⃗ⁱ‖² / (2σ²)) centred on the data points combine into the non-linear SVR prediction curve]
Bayes’ Theorem

[Figure: wrenches coming off machine m1 and machine m2]
Bayes’ Theorem:

P(A|B) = P(B|A) × P(A) / P(B)
Naive Bayes

[Figure: Salary (X2) vs Age (X1) — Category 1: Walks, Category 2: Drives]
[Scatter plot: a new data point (Age, Salary) to be classified as Walks or Drives]
P(Walks|X) = P(X|Walks) * P(Walks) / P(X)

#1 Prior Probability: P(Walks)
#2 Marginal Likelihood: P(X)
#3 Likelihood: P(X|Walks)
#4 Posterior Probability: P(Walks|X)
P(Drives|X) = P(X|Drives) * P(Drives) / P(X)

(#1 Prior Probability: P(Drives); #2 Marginal Likelihood: P(X); #3 Likelihood: P(X|Drives); #4 Posterior Probability: P(Drives|X))
P(Walks|X) vs. P(Drives|X)
P(Walks) = Number of Walkers / Total Observations = 10 / 30
P(X) = Number of Similar Observations / Total Observations = 4 / 30
P(X|Walks) = Number of Similar Observations Among those who Walk / Total Number of Walkers = 3 / 10
P(Walks|X) = (3/10 * 10/30) / (4/30) = 0.75
P(Drives|X) = (1/20 * 20/30) / (4/30) = 0.25
0.75 vs. 0.25

0.75 > 0.25

P(Walks|X) > P(Drives|X)
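The whole calculation can be reproduced in a few lines of Python, using the counts read off the scatter plot (30 observations: 10 walkers, 20 drivers, and 4 observations similar to X, of which 3 walk and 1 drives):

```python
# Naive Bayes posteriors for the new data point
p_walks = 10 / 30            # prior: P(Walks)
p_drives = 20 / 30           # prior: P(Drives)
p_x = 4 / 30                 # marginal likelihood: P(X)
p_x_given_walks = 3 / 10     # likelihood: P(X|Walks)
p_x_given_drives = 1 / 20    # likelihood: P(X|Drives)

posterior_walks = p_x_given_walks * p_walks / p_x
posterior_drives = p_x_given_drives * p_drives / p_x

print(round(posterior_walks, 2), round(posterior_drives, 2))  # 0.75 0.25
```

Since P(Walks|X) > P(Drives|X), the new data point is classified as Walks.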
[Scatter plot: the new data point is assigned to the Walks category]
P(Drives) = Number of Drivers / Total Observations = 20 / 30
P(X|Drives) = Number of Similar Observations Among those who Drive / Total Number of Drivers = 1 / 20
P(X) = Number of Similar Observations / Total Observations = 4 / 30

NOTE: P(X) is the same in both calculations.
P(Drives|X) vs. P(Walks|X)
P(X|Walks) * P(Walks) / P(X)  vs.  P(X|Drives) * P(Drives) / P(X)
[Scatter plot of x1 vs x2, split step by step:
Split 1: x2 = 60
Split 2: x1 = 50
Split 3: x1 = 70
Split 4: x2 = 20]
[Tree after Split 1: root node "X2 < 60" with Yes / No branches]
[Tree after Split 2: the "No" branch of "X2 < 60" splits on "X1 < 50"]
X2 < 60
├── Yes → X1 < 70 (Yes / No)
└── No → X1 < 50 (Yes / No)
X2 < 60
├── Yes → X1 < 70
│   ├── Yes → leaf
│   └── No → X2 < 20 (Yes / No leaves)
└── No → X1 < 50 (Yes / No leaves)
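The finished tree can be hand-coded as nested conditionals. The thresholds (60, 50, 70, 20) come from the splits above; the leaf labels are placeholders, since the slides do not name the terminal classes, and the placement of the X2 < 20 split is read off the scatter plot:

```python
def predict(x1, x2):
    """Hand-coded version of the decision tree built from the four splits."""
    if x2 < 60:
        if x1 < 70:
            return "leaf 1"
        # region x2 < 60 and x1 >= 70 is split once more on x2 < 20
        return "leaf 2" if x2 < 20 else "leaf 3"
    # region x2 >= 60 is split on x1 < 50
    return "leaf 4" if x1 < 50 else "leaf 5"

print(predict(30, 30))  # x2 < 60 and x1 < 70 -> leaf 1
```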
STEP 1: Pick at random K data points from the Training set.

STEP 2: Build the Decision Tree associated to these K data points.

STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2.

STEP 4: For a new data point, make each one of your Ntree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
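The four steps can be sketched in plain Python. To stay self-contained, each "tree" here is only a depth-1 stump that splits at the mean of its bootstrap sample — a real implementation would grow full decision trees — but the random sampling (STEPS 1–3) and majority vote (STEP 4) follow the scheme above:

```python
import random
from collections import Counter

def build_stump(sample):
    # Minimal stand-in for a tree: split at the mean x of the sample,
    # label each side with the majority class on that side.
    t = sum(x for x, _ in sample) / len(sample)
    left = [y for x, y in sample if x < t] or [y for _, y in sample]
    right = [y for x, y in sample if x >= t] or [y for _, y in sample]
    lmaj = Counter(left).most_common(1)[0][0]
    rmaj = Counter(right).most_common(1)[0][0]
    return lambda x: lmaj if x < t else rmaj

def random_forest(data, k=5, n_trees=25, seed=0):
    rng = random.Random(seed)
    # STEPS 1-3: build n_trees stumps, each on K random data points
    return [build_stump([rng.choice(data) for _ in range(k)])
            for _ in range(n_trees)]

def forest_predict(trees, x):
    # STEP 4: each tree votes, the majority wins
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]

data = [(1, "red"), (2, "red"), (3, "red"),
        (7, "blue"), (8, "blue"), (9, "blue")]
trees = random_forest(data)
print(forest_predict(trees, 2), forest_predict(trees, 8))
```

Averaging many trees built on different random subsets is what makes the ensemble more robust than any single tree.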
[Plot: logistic regression — the probability p̂ follows an S-shaped curve in X (e.g. age 20–50); actual DV values y sit at 0 and 1, and predicted values ŷ are obtained by projecting p̂ onto 0 (below the 0.5 line) or 1 (above it)]
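A small sketch of how the curve turns a probability into a 0/1 prediction. The intercept and slope below are made-up values for illustration, not fitted coefficients from the course:

```python
import math

b0, b1 = -4.0, 0.1  # hypothetical coefficients for illustration

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(x):
    p_hat = sigmoid(b0 + b1 * x)       # predicted probability p-hat
    y_hat = 1 if p_hat >= 0.5 else 0   # project onto 0/1 at the 0.5 line
    return p_hat, y_hat

for age in (20, 30, 40, 50):
    print(age, predict(age))
```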
[Plot: four example observations #1–#4 and their fitted probabilities p̂]
[Same plot: a False Positive (Type I Error) is an observation predicted as 1 that is actually 0; a False Negative (Type II Error) is an observation predicted as 0 that is actually 1]
Confusion Matrix & Accuracy
Confusion Matrix & Accuracy

                    Prediction
                    NEG                              POS
Actual   NEG        TRUE NEG                         FALSE POS (Type I Error)
         POS        FALSE NEG (Type II Error)        TRUE POS

Image source: nature.com
Confusion Matrix & Accuracy

                Prediction
                NEG     POS
Actual   NEG     43      12
         POS      4      41

Accuracy Rate: AR = Correct / Total = (TN + TP) / Total = 84 / 100 = 84%
Error Rate:    ER = Incorrect / Total = (FP + FN) / Total = 16 / 100 = 16%

(Type II Errors = False Negatives; Type I Errors = False Positives)
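The same numbers in code:

```python
# Counts from the confusion matrix above
tn, fp = 43, 12   # actual NEG row: true negatives, false positives
fn, tp = 4, 41    # actual POS row: false negatives, true positives

total = tn + fp + fn + tp
accuracy_rate = (tn + tp) / total   # 84 / 100
error_rate = (fp + fn) / total      # 16 / 100
print(accuracy_rate, error_rate)    # 0.84 0.16
```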
Additional Reading

"Understanding the Confusion Matrix from Scikit-learn"
Samarth Agrawal (2021)
Link: https://towardsdatascience.com/understanding-the-confusion-matrix-from-scikit-learn-c51d88929c79
Scenario 1: Accuracy Rate = Correct / Total → AR = 9,800 / 10,000 = 98%

                 ŷ (Predicted DV)
                  0        1
y (Actual)   0   9,700    150
             1      50    100
Scenario 2 — the model is replaced by one that always predicts 0:

                 ŷ (Predicted DV)
                  0        1
y (Actual)   0   9,850      0
             1     150      0

Accuracy Rate = Correct / Total → AR = 9,850 / 10,000 = 98.5%, higher than Scenario 1 even though this "model" never predicts the positive class.
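The accuracy paradox in code — the model that never predicts class 1 scores higher on accuracy while being useless:

```python
def accuracy(tn, fp, fn, tp):
    # fraction of correct predictions out of all predictions
    return (tn + tp) / (tn + fp + fn + tp)

scenario_1 = accuracy(tn=9_700, fp=150, fn=50, tp=100)  # a real model
scenario_2 = accuracy(tn=9_850, fp=0, fn=150, tp=0)     # always predicts 0
print(scenario_1, scenario_2)  # 0.98 0.985
```

This is why accuracy alone can be misleading on imbalanced data, and why the CAP curve in the next section is useful.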
[CAP analysis setup: Purchased (up to 10,000) vs Total Contacted (up to 100,000) — contacting customers at random yields purchases in direct proportion to the number contacted]
[CAP curves: Purchased % vs Total Contacted % — the Crystal Ball (perfect) model rises fastest, followed by a Good Model, a Poor Model, and the Random straight line]
Note:

CAP = Cumulative Accuracy Profile
ROC = Receiver Operating Characteristic
[CAP analysis: aR = area between the Good Model CAP and the Random Model line; aP = area between the Perfect Model CAP and the Random Model line]

Accuracy Ratio: AR = aR / aP (between 0 and 1; the closer to 1, the better the model)
Rule of thumb — take X = the Purchased % reached at Total Contacted = 50%:

90% < X < 100%   Too Good
80% < X < 90%    Very Good
70% < X < 80%    Good
60% < X < 70%    Poor
X < 60%          Rubbish
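A sketch of computing AR numerically with the trapezoidal rule. The model curve values below are made up for illustration; with an overall purchase rate of 10%, the perfect model reaches 100% after contacting the first 10% of customers:

```python
def area(xs, ys):
    # trapezoidal rule for the area under a piecewise-linear curve
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))

contacted = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # fraction of customers contacted
model = [0.0, 0.5, 0.8, 0.9, 0.95, 1.0]      # illustrative CAP of a good model
random_line = contacted                       # the random-selection diagonal
purchase_rate = 0.1
perfect = [min(x / purchase_rate, 1.0) for x in contacted]

aR = area(contacted, model) - area(contacted, random_line)
aP = area(contacted, perfect) - area(contacted, random_line)
print(round(aR / aP, 3))
```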
Clustering

What is Clustering?

Clustering – grouping unlabelled data
Supervised Learning (e.g. Regression, Classification) vs. Unsupervised Learning (e.g. Clustering)

Image source: mdpi.com/2073-8994/10/12/734
[Scatter plot: Spending Score vs Annual Income $ — clustering reveals groups in the unlabelled data]
K-Means
Clustering
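A minimal sketch of the K-Means algorithm on 2-D points, assuming Euclidean distance: pick K random points as initial centroids, assign each point to its nearest centroid, move each centroid to the mean of its cluster, and repeat until nothing changes:

```python
import random

def kmeans(points, k, iters=50, seed=1):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)               # initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                            # assignment step
            i = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                            + (p[1] - centroids[i][1]) ** 2)
            clusters[i].append(p)
        # update step: move each centroid to the mean of its cluster
        new = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
               if c else centroids[j]               # keep empty clusters put
               for j, c in enumerate(clusters)]
        if new == centroids:                        # converged
            break
        centroids = new
    return centroids, clusters

pts = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, 2)
print(sorted(centroids))
```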

The Elbow
Method

The Elbow Method

Within Cluster Sum of Squares:

WCSS = Σ over clusters c, Σ over points Pi in cluster c, distance(Pi, centroid of c)²
[Plot: WCSS vs number of clusters — WCSS always decreases as clusters are added; the "elbow" of the curve indicates the optimal number of clusters]
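The Within Cluster Sum of Squares can be computed directly. Below it is evaluated on a 1-D toy dataset for hand-picked centroid sets (illustrative, not fitted), showing why WCSS keeps dropping and where the elbow appears:

```python
def wcss(points, centroids):
    # each point contributes its squared distance to the nearest centroid
    return sum(min((x - c) ** 2 for c in centroids) for x in points)

points = [1, 2, 3, 10, 11, 12, 20, 21, 22]       # three obvious groups
for k, cents in enumerate([[11], [2, 16], [2, 11, 21]], start=1):
    print(k, wcss(points, cents))                # the drop flattens after K = 3
```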
K-Means++

[Running K-Means twice on the same data can produce different results depending on the random initialisation of the centroids — the motivation for K-Means++]
K-Means++

K-Means++ Initialization Algorithm:

Step 1: Choose the first centroid at random among the data points

Step 2: For each of the remaining data points, compute the distance (D) to the nearest of the already selected centroids

Step 3: Choose the next centroid among the remaining data points using weighted random selection, weighted by D²

Step 4: Repeat Steps 2 and 3 until all k centroids have been selected

Step 5: Proceed with standard k-means clustering
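These steps translate almost line-for-line into Python (2-D points; `random.choices` performs the D²-weighted selection, and a point already chosen gets weight 0, so only remaining points can be picked):

```python
import random

def kmeans_pp_init(points, k, seed=0):
    rng = random.Random(seed)
    centroids = [rng.choice(points)]                       # Step 1
    while len(centroids) < k:                              # Step 4
        # Step 2: squared distance to the nearest already-chosen centroid
        d2 = [min((x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids)
              for x, y in points]
        # Step 3: weighted random selection, weights proportional to D^2
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids  # Step 5: hand these to standard k-means

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
print(kmeans_pp_init(pts, 2))
```

Weighting by D² makes far-apart points much more likely to become centroids, which is what avoids the random initialisation trap.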
Hierarchical Clustering produces the same kind of result as K-Means, but arrives at it through a different process.

[Diagram: Before HC → HC → After HC]
STEP 1: Make each data point a single-point cluster → that forms N clusters

STEP 2: Take the two closest data points and make them one cluster → that forms N−1 clusters

STEP 3: Take the two closest clusters and make them one cluster → that forms N−2 clusters

STEP 4: Repeat STEP 3 until there is only one cluster

FIN
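A sketch of these steps in Python, using the distance between cluster centroids as the cluster-distance measure (one of several valid options), and stopping once a requested number of clusters remains:

```python
def centroid(cluster):
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def agglomerative(points, n_clusters=1):
    clusters = [[p] for p in points]          # STEP 1: N single-point clusters
    while len(clusters) > n_clusters:         # STEP 4: repeat until done
        best = None
        for i in range(len(clusters)):        # STEPS 2-3: find the two closest
            for j in range(i + 1, len(clusters)):
                ci, cj = centroid(clusters[i]), centroid(clusters[j])
                d = (ci[0] - cj[0]) ** 2 + (ci[1] - cj[1]) ** 2
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]            # ...and merge them
        del clusters[j]
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
print(agglomerative(pts, n_clusters=3))
```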
Euclidean Distance between two points P1(x1, y1) and P2(x2, y2):

distance = √((x2 − x1)² + (y2 − y1)²)
Distance Between Two Clusters:

• Option 1: Closest Points
• Option 2: Furthest Points
• Option 3: Average Distance
• Option 4: Distance Between Centroids
Consider the following dataset of N = 6 data points.

STEP 1: Make each data point a single-point cluster → that forms 6 clusters
STEP 2: Take the two closest data points and make them one cluster → that forms 5 clusters
STEP 3: Take the two closest clusters and make them one cluster → that forms 4 clusters
STEP 4: Repeat STEP 3 until there is only one cluster

FIN
[Dendrogram construction, step by step: points P1–P6 on the horizontal axis; the vertical axis records the dissimilarity (Euclidean distance) at which clusters are merged, until all points form one cluster]
[Setting a dissimilarity threshold on the dendrogram fixes the number of clusters: depending on where the horizontal cut is made, the same dendrogram yields 2, 4, or 6 clusters. A common heuristic: cut across the largest vertical distance that no extended horizontal line crosses — here that gives 2 clusters]
[Second example with nine points P1–P9: cutting across the largest vertical distance gives 3 clusters]
People who bought also bought …
User ID    Movies liked
46578      Movie1, Movie2, Movie3, Movie4
98989      Movie1, Movie2
71527      Movie1, Movie2, Movie4
78981      Movie1, Movie2
89192      Movie2, Movie4
61557      Movie1, Movie3

Potential Rules:
Movie1 → Movie2
Movie2 → Movie4
Movie1 → Movie3
Transaction ID    Products purchased
46578             Burgers, French Fries, Vegetables
98989             Burgers, French Fries, Ketchup
71527             Vegetables, Fruits
78981             Pasta, Fruits, Butter, Vegetables
89192             Burgers, Pasta, French Fries
61557             Fruits, Orange Juice, Vegetables
87923             Burgers, French Fries, Ketchup, Mayo

Potential Rules:
Burgers → French Fries
Vegetables → Fruits
Burgers, French Fries → Ketchup
Market Basket Optimisation / Movie Recommendation — the three measures, with illustrative numbers:

Support — e.g. 10 of 100 users watched M2: Support = 10 / 100 = 10%

Confidence — e.g. 7 of the 40 users who watched M1 also watched M2: Confidence = 7 / 40 = 17.5%

Lift = Confidence / Support = 17.5% / 10% = 1.75
Step 1: Set a minimum support and confidence

Step 2: Take all the subsets in transactions having higher support than minimum support

Step 3: Take all the rules of these subsets having higher confidence than minimum confidence

Step 4: Sort the rules by decreasing lift
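The three measures in code, using the illustrative counts from the lecture (100 users in total, 10 watched M2, 40 watched M1, 7 watched both — example numbers, not real data):

```python
total_users = 100
watched_m2 = 10        # users who watched M2
watched_m1 = 40        # users who watched M1
watched_both = 7       # users who watched both M1 and M2

support = watched_m2 / total_users        # 10 / 100 = 10%
confidence = watched_both / watched_m1    # 7 / 40 = 17.5%
lift = confidence / support               # 17.5% / 10% = 1.75
print(support, confidence, round(lift, 2))
```

A lift above 1 means watching M1 makes watching M2 more likely than it is for a random user, which is why rules are ranked by decreasing lift.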
Market Basket Optimisation / Movie Recommendation: Eclat uses only the Support measure.
Step 1: Set a minimum support

Step 2: Take all the subsets in transactions having higher support than minimum support

Step 3: Sort these subsets by decreasing support
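A sketch of Eclat on the market-basket table shown earlier in this section: compute the support of every item pair, keep the ones above a minimum support, and sort by decreasing support:

```python
from itertools import combinations

transactions = [
    {"Burgers", "French Fries", "Vegetables"},
    {"Burgers", "French Fries", "Ketchup"},
    {"Vegetables", "Fruits"},
    {"Pasta", "Fruits", "Butter", "Vegetables"},
    {"Burgers", "Pasta", "French Fries"},
    {"Fruits", "Orange Juice", "Vegetables"},
    {"Burgers", "French Fries", "Ketchup", "Mayo"},
]

def support(itemset):
    # fraction of transactions that contain every item in the set
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
min_support = 0.3                               # Step 1
frequent = [(pair, support(set(pair)))          # Step 2
            for pair in combinations(items, 2)
            if support(set(pair)) >= min_support]
frequent.sort(key=lambda ps: -ps[1])            # Step 3
print(frequent)
```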
[Illustration: five options (e.g. slot machines or ad designs), each with its own unknown reward distribution D1–D5 — the multi-armed bandit problem]

Examples used for educational purposes. No affiliation with Coca-Cola.
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
D5
D4
D3
D2
D1
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
[Animation frames: five bandit machines with return distributions D1–D5, refined round by round]
Thompson Sampling intuition:
• We construct distributions of where we think the μ* values (expected returns) will be
• I.e. we are NOT trying to guess the return distributions behind the machines
• In effect, we've generated our own bandit configuration, which we refine with each new round
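The intuition above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes Bernoulli bandits with made-up conversion rates, keeps one Beta distribution per machine, and each round plays the machine whose sampled draw is highest.

```python
import random

# Assumed setup: three Bernoulli bandits with hypothetical conversion rates.
# The algorithm never sees these rates, only the 0/1 returns it observes.
true_rates = [0.15, 0.05, 0.25]
n_machines = len(true_rates)
wins = [0] * n_machines     # observed 1-returns per machine
losses = [0] * n_machines   # observed 0-returns per machine

random.seed(42)
selections = []
for _ in range(2000):
    # Our own "generated bandit configuration": a Beta distribution per
    # machine over where its expected return might be. Sample each one.
    draws = [random.betavariate(wins[i] + 1, losses[i] + 1)
             for i in range(n_machines)]
    machine = draws.index(max(draws))
    selections.append(machine)
    # New round: observe the return and refine that machine's distribution.
    if random.random() < true_rates[machine]:
        wins[machine] += 1
    else:
        losses[machine] += 1
```

Over the 2000 rounds, the machine with the best true rate ends up being selected far more often than the others, even though the algorithm only ever reasoned about its own sampled beliefs.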
Upper Confidence Bound:
• Deterministic
• Requires an update at every round

Thompson Sampling:
• Probabilistic
• Can accommodate delayed feedback
• Better empirical evidence
Here’s what we will learn:
• Types of Natural Language Processing
• Classical vs Deep Learning Models
• End-to-end Deep Learning Models
• Bag-Of-Words

• Note: Seq2Seq and Chatbots are outside the scope of this course
[Diagram: Natural Language Processing and Deep Learning overlap to form DNLP; Seq2Seq is a subset of DNLP]
Some examples:
1. If / Else Rules (Chatbot)
2. Audio frequency components analysis (Speech Recognition)
3. Bag-of-words model (Classification)
4. CNN for text Recognition (Classification)
5. Seq2Seq (many applications)

Example training data for classification:

Comment → Pass/Fail
Great job! → 1
Amazing work. → 1
Well done. → 1
Very well written. → 1
Poor effort. → 0
Could have done better. → 0
Try harder next time. → 0

[Diagram: Seq2Seq encoder-decoder. The encoder reads "Hello Kirill, Checking if you are back ... EOS" (states h0–h3); the decoder generates "Yes I'm back EOS" (states g0–g2). Image Source: www.wildml.com]
[Diagram: End-to-end Deep Learning Models sit inside the intersection of NLP and DL]
The Bag-of-Words model starts with a vector of zeros:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... , 0]
(20,000 elements long: one position per word in the vocabulary, e.g. "if", "badminton", "table")

The first positions are reserved for special words: SOS (start of sentence) and EOS (end of sentence).
Hello Kirill, Checking if you are back to Oz. Let me know if you are around … Cheers, V

Counting how many times each vocabulary word occurs in the email fills in the vector:

[1, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, ... , 3]
(20,000 elements long)
Training Data:
Hey mate, have you read about Hinton's capsule networks?
Did you like that recipe I sent you last week?
Hi Kirill, are you coming to dinner tonight?
Dear Kirill, would you like to service your car with us again?
Are you coming to Australia in December?

Each training email is converted to its own 20,000-element count vector:
[1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, ... , 2]
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, ... , 0]
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, ... , 1]
[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, ... , 1]
[1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, ... , 1]

Image Source: www.helloacm.com
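The vector-building step can be sketched as follows. This is a minimal illustration with a tiny hypothetical vocabulary (the course uses 20,000 words); `bag_of_words` is a helper name chosen here, not a course function.

```python
# A minimal bag-of-words sketch: one count per vocabulary position.
vocabulary = ["hello", "kirill", "you", "are", "coming", "to"]

def bag_of_words(text, vocab):
    # Lowercase, strip punctuation, then count each vocabulary word.
    words = [w.strip(".,?!…") for w in text.lower().split()]
    return [words.count(word) for word in vocab]

email = "Hi Kirill, are you coming to dinner tonight?"
print(bag_of_words(email, vocabulary))  # [0, 1, 1, 1, 1, 1]
```

Each training email would be converted the same way, and the resulting count vectors fed into a classifier.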
[Charts: the growth of computing power on a log scale, with data points at 1956, 1980 (2x) and 2017 (25,600x). Sources: mkomo.com, nature.com, Time Magazine]
Geoffrey Hinton
Image Source: www.austincc.edu
[Diagram: a simple neural network, with an Input Layer (input values 1–3), a Hidden Layer, and an Output Layer (output value). Deep networks stack multiple Hidden Layers between the Input and Output Layers]
Supervised:
• Artificial Neural Networks: used for Regression & Classification
• Convolutional Neural Networks: used for Computer Vision
• Recurrent Neural Networks: used for Time Series Analysis

Unsupervised:
• Self-Organizing Maps: used for Feature Detection
• Deep Boltzmann Machines: used for Recommendation Systems
• AutoEncoders: used for Recommendation Systems
What we will learn in this section:
• The Neuron
• The Activation Function
• How do Neural Networks work? (example)
• How do Neural Networks learn?
• Gradient Descent
• Stochastic Gradient Descent
• Backpropagation
[Image: a biological neuron, with dendrites (receivers), the neuron body, and the axon (transmitter). Image Sources: www.austincc.edu, Wikipedia]
The artificial neuron (node) mimics this: input signals 1…m arrive at the neuron, and the neuron emits an output signal.
[Diagram: the neuron. Input values X1, X2, …, Xm connect to the neuron via synapses; the neuron produces an output value y]

• The inputs X1…Xm are the independent variables for one observation
• They should be standardized (mean 0, variance 1), or sometimes normalized, before being fed into the network
Additional Reading:
"Efficient BackProp" by Yann LeCun et al. (1998)
Link: http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
The output value y can be:
• Continuous (e.g. price)
• Binary (e.g. will exit yes/no)
• Categorical: in that case there are several output values y1, y2, …, yp (dummy variables, one per category)
Important: the input values X1…Xm and the output value y all refer to the same single observation, i.e. one row of the dataset.
Each synapse carries a weight: w1, w2, …, wm. The weights are what the network learns. What happens inside the neuron?

1st step: take the weighted sum of the inputs, Σi wi·xi
2nd step: apply the activation function φ to that sum: φ(Σi wi·xi)
3rd step: pass the resulting signal y on down the network
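The three steps can be sketched directly in code. The numbers below are illustrative, and a sigmoid is assumed as the activation function:

```python
import math

def neuron(inputs, weights):
    # 1st step: take the weighted sum of the inputs.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # 2nd step: apply the activation function (sigmoid here) to the sum.
    activation = 1.0 / (1.0 + math.exp(-weighted_sum))
    # 3rd step: pass the signal on as the output value y.
    return activation

y = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1])
print(y)  # sigmoid(0.3) ≈ 0.574
```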
The Activation Function: four common options.

• Threshold Function: φ(x) = 1 if x ≥ 0, otherwise 0 (a hard yes/no, jumping from 0 to 1)
• Sigmoid: φ(x) = 1 / (1 + e^(−x)) (smooth, outputs between 0 and 1)
• Rectifier (ReLU): φ(x) = max(x, 0) (one of the most used in practice)
• Hyperbolic Tangent (tanh): φ(x) = (1 − e^(−2x)) / (1 + e^(−2x)) (outputs between −1 and 1)
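The four activation functions above, written out as a plain-Python reference sketch:

```python
import math

def threshold(x):
    return 1.0 if x >= 0 else 0.0            # hard yes/no

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))        # smooth, between 0 and 1

def rectifier(x):
    return max(x, 0.0)                       # ReLU: clips negatives to 0

def tanh(x):
    # Same curve as math.tanh, written from the formula above.
    return (1 - math.exp(-2 * x)) / (1 + math.exp(-2 * x))

print(threshold(-2.0), sigmoid(0.0), rectifier(-2.0), tanh(0.0))
```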
Additional Reading:
"Deep Sparse Rectifier Neural Networks" by Xavier Glorot et al. (2011)
Link: http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf
Which activation function to pick? Assuming the dependent variable is binary (y = 0 or 1):

• If threshold activation function: the output is directly the predicted class (0 or 1)
• If sigmoid activation function: the output is the probability that y = 1
[Diagram: Input Layer (X1, X2, …, Xm) → Hidden Layer → Output Layer (y). Different layers can use different activation functions, e.g. a rectifier in the hidden layer and a sigmoid in the output layer]
How do Neural Networks work? Example: predicting house prices.

Input Layer: Area (feet²) X1, Bedrooms X2, Distance to city (miles) X3, Age X4
Output Layer: y = Price

With no hidden layer, the network is just a weighted sum:
Price = w1·x1 + w2·x2 + w3·x3 + w4·x4
[Diagram: the same four inputs feeding a Hidden Layer, then the Output Layer (y = Price). Each hidden neuron picks up on its own combination of inputs (one might combine Area and Distance to city, another Bedrooms and Age), and together these intermediate features give the network its extra power and flexibility]
How do Neural Networks learn?

[Diagram: the perceptron. Inputs X1…Xm with weights w1…wm produce the output value ŷ, which is compared against the actual value y]

The cost function measures the error of the prediction:

C = ½(ŷ − y)²

The cost is fed back into the network, the weights w1, …, wm get updated, and the observation is run through again, repeating until the cost is minimized.
When training on a full dataset, every row is fed through the same network (one shared set of weights). The total cost sums over all rows:

C = Σ ½(ŷ − y)²

After the whole dataset has gone through (one epoch), we adjust the weights w1, w2, …, wm and run it again.
Additional Reading:
"A list of cost functions used in neural networks, alongside applications", CrossValidated (2015)
Link: http://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
Gradient Descent. For a single weight w1, the cost C = ½(ŷ − y)² plotted against ŷ is a parabola; the best weight sits at the bottom of the curve, where C is minimal. Gradient descent finds it by repeatedly stepping downhill along the slope of the curve.
Why not just try every weight combination? Even the small house-price network above (4 inputs, a 5-neuron hidden layer, 1 output) already has 4×5 + 5 = 25 weights to tune.
Trying just 1,000 values per weight gives:

1,000 × 1,000 × … × 1,000 = 1,000^25 = 10^75 combinations

Sunway TaihuLight, the world's fastest supercomputer at the time: 93 PFLOPS = 93 × 10^15 operations per second

10^75 / (93 × 10^15) ≈ 1.08 × 10^58 seconds ≈ 3.4 × 10^50 years

Brute force is hopeless: this is the curse of dimensionality.

Image Source: neuralnetworksanddeeplearning.com
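The slide's arithmetic can be sanity-checked in a couple of lines:

```python
# 1,000 candidate values per weight, 25 weights, evaluated on a
# 93-PFLOPS machine (93 * 10**15 operations per second).
combinations = 1000 ** 25                  # = 10**75
flops = 93e15
seconds = combinations / flops             # about 1.08 * 10**58 seconds
years = seconds / (60 * 60 * 24 * 365)     # about 3.4 * 10**50 years
print(f"{seconds:.2e} s = {years:.2e} years")
```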
[Chart: gradient descent on C = ½(ŷ − y)². From the starting point, take steps in the downhill direction of the slope until reaching the minimum]
[Charts: plain (batch) gradient descent requires a convex cost function. With a non-convex curve it can get stuck in a local minimum instead of the global best]
Batch Gradient Descent: run the whole training set through the network, compute the summed cost C = Σ ½(ŷ − y)², and update the weights once per epoch.

Stochastic Gradient Descent: update the weights after every single row ("Upd w's" after each observation). The updates are noisier, but this helps avoid local minima and it typically converges faster.
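Stochastic gradient descent can be sketched on the simplest possible case: one linear neuron, ŷ = w·x + b, learning a made-up relationship y = 2x + 1. The data and hyperparameters below are illustrative; the weights are updated after every single row, as described above.

```python
import random

# Toy noiseless dataset for y = 2x + 1.
random.seed(0)
data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]

w, b = random.random(), random.random()   # random initial weights
lr = 0.1                                  # learning rate

for epoch in range(200):
    random.shuffle(data)                  # visit the rows in random order
    for x, y in data:
        y_hat = w * x + b
        error = y_hat - y                 # derivative of C = 1/2*(y_hat - y)**2 w.r.t. y_hat
        # Step each weight downhill along its gradient, one row at a time.
        w -= lr * error * x
        b -= lr * error

print(round(w, 3), round(b, 3))  # converges toward w = 2, b = 1
```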
Additional Reading:
"A Neural Network in 13 lines of Python (Part 2 - Gradient Descent)" by Andrew Trask (2015)
Link: https://iamtrask.github.io/2015/07/27/python-network-part2/
Additional Reading:
"Neural Networks and Deep Learning" by Michael Nielsen (2015)
Link: http://neuralnetworksanddeeplearning.com/chap2.html
Forward Propagation: information enters the input layer and flows left to right to produce the prediction ŷ.
Backpropagation: the error flows right to left, and all the weights are adjusted simultaneously according to how much each one contributed to the error.

Image Source: neuralnetworksanddeeplearning.com
Additional Reading:
"Neural Networks and Deep Learning" by Michael Nielsen (2015)
Link: http://neuralnetworksanddeeplearning.com/chap2.html
STEP 1: Randomly initialise the weights to small numbers close to 0 (but not 0).

STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.

STEP 3: Forward-Propagation: from left to right, the neurons are activated in a way that the impact of each
neuron's activation is limited by the weights. Propagate the activations until getting the predicted result ŷ.

STEP 4: Compare the predicted result to the actual result. Measure the generated error.

STEP 5: Back-Propagation: from right to left, the error is back-propagated. Update the weights according to
how much they are responsible for the error. The learning rate decides by how much we update the
weights.

STEP 6: Repeat Steps 1 to 5 and update the weights after each observation (Reinforcement Learning). Or:
Repeat Steps 1 to 5 but update the weights only after a batch of observations (Batch Learning).

STEP 7: When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
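The seven steps above can be sketched end to end on a tiny network. This is an illustration only, assuming a 2-input, 3-hidden, 1-output network with sigmoid activations trained on the OR function (the course builds its networks with a library instead):

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# STEP 1: randomly initialise the weights to small numbers close to 0.
# A constant 1.0 is appended to each input to act as a bias term.
w_hidden = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)]
w_out = [random.uniform(-0.1, 0.1) for _ in range(3)]

data = [([0.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1),
        ([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 1.0], 1)]
lr = 0.5  # learning rate: decides by how much we update the weights

def forward(x):
    # STEPS 2-3: input one observation; activations flow left to right.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    return hidden, sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

def total_cost():
    return sum(0.5 * (forward(x)[1] - y) ** 2 for x, y in data)

cost_before = total_cost()
for _ in range(2000):                 # STEP 7: redo more epochs
    for x, y in data:                 # STEP 6: update after each observation
        hidden, y_hat = forward(x)
        # STEP 4: measure the generated error at the output.
        delta_out = (y_hat - y) * y_hat * (1 - y_hat)
        # STEP 5: back-propagate, updating weights by their share of the error.
        for j, h in enumerate(hidden):
            delta_h = delta_out * w_out[j] * h * (1 - h)
            w_out[j] -= lr * delta_out * h
            for i in range(3):
                w_hidden[j][i] -= lr * delta_h * x[i]

cost_after = total_cost()
print(cost_before, "->", cost_after)
```

The summed cost drops sharply over the epochs as backpropagation shares the error out across both layers of weights.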
What we will learn in this section:
• What are Convolutional Neural Networks?
• Step 1 - Convolution Operation
• Step 1(b) - ReLU Layer
• Step 2 - Pooling
• Step 3 - Flattening
• Step 4 - Full Connection
• Summary
• EXTRA: Softmax & Cross-Entropy
Image Source: a talk by Geoffrey Hinton
Source: Google Trends
Yann LeCun
Facebook, Google
Input Image → CNN → Output: a Label (the image class)

E.g. the same face with different expressions: one photo → CNN → "Happy"; another → CNN → "Sad".
B/W Image (2×2 px): a 2d array, one value per pixel (Pixel 1 … Pixel 4).

Colored Image (2×2 px): a 3d array, where each pixel has three values, one per channel (Red, Green, Blue).
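As a quick sketch of the two representations (the pixel values here are made up):

```python
# A 2x2 black-and-white image: one intensity value per pixel -> 2d array.
bw_image = [
    [0, 255],
    [128, 64],
]

# A 2x2 colored image: three values (Red, Green, Blue) per pixel -> 3d array.
color_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

rows, cols = len(bw_image), len(bw_image[0])
channels = len(color_image[0][0])
print(rows, cols, channels)  # 2 2 3
```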
[Example: a small smiley-face image encoded as a 2d array of 0s and 1s, one value per pixel]
STEP 1: Convolution

STEP 2: Max Pooling

STEP 3: Flattening

STEP 4: Full Connection
Additional Reading:
"Gradient-Based Learning Applied to Document Recognition" by Yann LeCun et al. (1998)
Link: http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
Additional Reading:
"Introduction to Convolutional Neural Networks" by Jianxin Wu (2017)
Link: http://cs.nju.edu.cn/wujx/paper/CNN.pdf
The Convolution Operation: slide the Feature Detector (also called a filter or kernel) across the Input Image one stride at a time; at each position, multiply the overlapping cells element-wise and add them up. Each sum becomes one cell of the Feature Map.

Input Image (7×7):
0 0 0 0 0 0 0
0 1 0 0 0 1 0
0 0 0 0 0 0 0
0 0 0 1 0 0 0
0 1 0 0 0 1 0
0 0 1 1 1 0 0
0 0 0 0 0 0 0

Feature Detector (3×3):
0 0 1
1 0 0
0 1 1

Feature Map (5×5), with a stride of 1:
0 1 0 0 0
0 1 1 1 0
1 0 0 2 1
1 4 2 1 0
0 0 1 2 1
We create many feature maps to obtain our first convolution layer: each
feature detector produces its own feature map, so the convolutional layer
is a stack of feature maps.

Input Image  →  Convolutional Layer (Feature Maps)
Examples of classic convolution kernels (Image Source: docs.gimp.org/en/plug-in-convmatrix.html):

• Sharpen
• Blur
• Edge Enhance
• Edge Detect
• Emboss
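The same sliding-window convolution drives all of these image effects. As a sketch, here is one common sharpen kernel applied by hand in NumPy (the kernel values are a standard choice for illustration, not taken from the slides):

```python
import numpy as np

# One common 3x3 sharpen kernel: centre boosted, neighbours subtracted.
sharpen = np.array([
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
])

def apply_kernel(img, kernel):
    # Valid convolution: stride 1, no padding.
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# On a flat region the kernel changes nothing (its weights sum to 1)...
flat = np.full((4, 4), 7.0)
print(apply_kernel(flat, sharpen))   # all 7s

# ...but it amplifies contrast around an edge.
edge = np.array([[0, 0, 10, 10]] * 4, dtype=float)
print(apply_kernel(edge, sharpen))
```

Blur, emboss, and edge-detect effects work the same way with different kernel weights.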
Image Source: leonardoaraujosantos.gitbooks.io
Feature Maps + Rectifier

The Rectified Linear Unit (ReLU), y = max(0, x), is applied to every value in
each feature map of the convolutional layer. It replaces negative values with 0,
increasing non-linearity in the network.

Input Image  →  Convolutional Layer (Feature Maps)  →  Rectifier
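In code, applying the Rectifier to a feature map is a one-liner; a minimal sketch (the example values are made up):

```python
import numpy as np

# ReLU zeroes out the negative values in a feature map, element-wise.
def relu(x):
    return np.maximum(0, x)

feature_map = np.array([
    [ 2, -1,  0],
    [-3,  4, -2],
])
print(relu(feature_map))   # negatives become 0, positives pass through
```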
Image Source: http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf
Additional Reading:

Understanding Convolutional Neural Networks with a Mathematical Model

By C.-C. Jay Kuo (2016)

Link: https://arxiv.org/pdf/1609.04112.pdf
Additional Reading:

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

By Kaiming He et al. (2015)

Link: https://arxiv.org/pdf/1502.01852.pdf
Image Source: Wikipedia
Feature Map              Pooled Feature Map

0 1 0 0 0
0 1 1 1 0                     1 1 0
1 0 1 2 1    Max Pooling →    4 2 1
1 4 2 1 0                     0 2 1
0 0 1 2 1

A 2×2 box slides across the feature map with a stride of 2, and only the
maximum value inside each box is kept. This preserves the strongest feature
responses while reducing the size of the map and adding spatial invariance.
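The max pooling step can be sketched in NumPy as a check of the numbers above; the helper name `max_pool` is our own, and the window is clipped at the borders so the 5×5 map pools down to 3×3:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """2x2 max pooling with stride 2; partial windows at the
    borders are allowed (clipped by the array slicing)."""
    h, w = fmap.shape
    out_h = -(-h // stride)   # ceiling division to cover the edges
    out_w = -(-w // stride)
    out = np.zeros((out_h, out_w), dtype=fmap.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = fmap[i * stride:i * stride + size,
                             j * stride:j * stride + size].max()
    return out

feature_map = np.array([
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 2, 1],
    [1, 4, 2, 1, 0],
    [0, 0, 1, 2, 1],
])
print(max_pool(feature_map))
```

Running this reproduces the pooled feature map from the slides.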
Additional Reading:

Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition

By Dominik Scherer et al. (2010)

Link: http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf
Input Image  →(Convolution)→  Convolutional Layer  →(Pooling)→  Pooling Layer
Image Source: scs.ryerson.ca/~aharley/vis/conv/flat.html
Pooled Feature Map          Flattening

                                 1
1 1 0                            1
4 2 1              →             0
0 2 1                            4
                                 2
                                 1
                                 0
                                 2
                                 1

The pooled feature map is flattened row by row into a single column vector.
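In NumPy, flattening the pooled feature map row by row is a single call:

```python
import numpy as np

pooled = np.array([
    [1, 1, 0],
    [4, 2, 1],
    [0, 2, 1],
])

# Row-by-row flattening turns the 3x3 pooled map into a 9-element vector,
# ready to be fed into a fully connected network.
vector = pooled.flatten()
print(vector)   # [1 1 0 4 2 1 0 2 1]
```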
Pooling Layer  →(Flattening)→  Input layer of a future ANN

All of the pooled feature maps are flattened and concatenated into one long
vector, which becomes the input layer of a future ANN.
Input Image  →(Convolution)→  Convolutional Layer  →(Pooling)→  Pooling Layer  →(Flattening)→  Input layer of a future ANN
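Chaining the steps gives the whole forward pass up to the ANN input. A minimal NumPy sketch on the 7×7 image and 3×3 detector from the slides (the fully connected weights here are random placeholders, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_valid(img, k):
    kh, kw = k.shape
    return np.array([[np.sum(img[i:i + kh, j:j + kw] * k)
                      for j in range(img.shape[1] - kw + 1)]
                     for i in range(img.shape[0] - kh + 1)])

def relu(x):
    return np.maximum(0, x)

def max_pool(x, s=2):
    oh, ow = -(-x.shape[0] // s), -(-x.shape[1] // s)
    return np.array([[x[i * s:i * s + s, j * s:j * s + s].max()
                      for j in range(ow)]
                     for i in range(oh)])

image = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
])
detector = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 1]])

x = conv_valid(image, detector)   # convolution -> 5x5 feature map
x = relu(x)                       # rectifier
x = max_pool(x)                   # pooling -> 3x3
x = x.flatten()                   # flattening -> 9-vector
w = rng.normal(size=(9,))         # placeholder fully connected weights
score = float(x @ w)              # single output value
print(x, score)
```

In the full course implementation this last dot product is replaced by trained fully connected layers (e.g. Keras `Dense` layers), but the data flow is the same.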
X1, X2, …, Xm

Input Layer  →  Fully Connected Layer  →  Output Layer (output value)

The flattened vector (X1 … Xm) becomes the input layer of a fully connected
ANN, which combines the extracted features into the final output value.
Dog vs Cat example: during training, each neuron in the last fully connected
layer learns which features it responds to, and the two output neurons (Dog
and Cat) learn which of those neurons to listen to. Neurons with high
activations (e.g. 0.9, 1) that fire on dog-like features effectively vote for
the Dog output, while low activations (e.g. 0.1, 0.2) contribute little. The
network then produces probabilities for the two classes, such as 0.95 vs 0.05
for a clear-cut image or 0.79 vs 0.21 for a more ambiguous one.
Image Source: a talk by Geoffrey Hinton
Additional Reading:

The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3)

By Adit Deshpande (2016)

Link: https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
Softmax: the Dog and Cat output neurons produce raw scores z1 and z2, which do
not by themselves sum to 1. The softmax function squashes these scores into
class probabilities that do sum to 1 (e.g. 0.95 for Dog and 0.05 for Cat).
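A minimal NumPy sketch of softmax applied to two raw scores (the z values here are made-up examples, not taken from the slides):

```python
import numpy as np

# Softmax squashes raw output scores into probabilities that sum to 1.
def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, -1.0])       # raw scores z1, z2 for the two classes
p = softmax(z)
print(p, p.sum())               # probabilities summing to 1
```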
Cross-Entropy: the predicted probabilities (e.g. Dog 0.9, Cat 0.1) are compared
against the one-hot label (Dog 1, Cat 0) to measure how wrong the prediction is.
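The quantity being computed is the standard cross-entropy between the one-hot label distribution $p$ and the predicted distribution $q$:

```latex
H(p, q) = -\sum_{i} p_i \log q_i
```

For the prediction above, with $p = (1, 0)$ and $q = (0.9, 0.1)$, this gives $H = -\log 0.9 \approx 0.105$: only the probability assigned to the true class is penalised.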
              NN 1                             NN 2
Row   ŷ Dog  ŷ Cat  Dog  Cat       Row   ŷ Dog  ŷ Cat  Dog  Cat
#1     0.9    0.1    1    0         #1    0.6    0.4    1    0
#2     0.1    0.9    0    1         #2    0.3    0.7    0    1
#3     0.4    0.6    1    0         #3    0.1    0.9    1    0

Classification Error:   1/3 = 0.33       1/3 = 0.33
Mean Squared Error:     0.25             0.71
Cross-Entropy:          0.38             1.06
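The three metrics in the comparison can be reproduced directly from the table's predictions and one-hot labels; a minimal NumPy sketch (the function names are our own):

```python
import numpy as np

# Predictions and one-hot labels for the two networks from the table.
y_hat_nn1 = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])
y_hat_nn2 = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])
labels    = np.array([[1, 0],     [0, 1],     [1, 0]])

def classification_error(y_hat, y):
    # Fraction of rows where the predicted class differs from the label.
    return np.mean(np.argmax(y_hat, axis=1) != np.argmax(y, axis=1))

def mean_squared_error(y_hat, y):
    # Sum of squared errors per row, averaged over the rows.
    return np.mean(np.sum((y_hat - y) ** 2, axis=1))

def cross_entropy(y_hat, y):
    # -sum(p * log q) per row (natural log), averaged over the rows.
    return np.mean(-np.sum(y * np.log(y_hat), axis=1))

for name, y_hat in [("NN 1", y_hat_nn1), ("NN 2", y_hat_nn2)]:
    print(name,
          round(classification_error(y_hat, labels), 2),
          round(mean_squared_error(y_hat, labels), 2),
          round(cross_entropy(y_hat, labels), 2))
```

Both networks share the same classification error (0.33), yet MSE and especially cross-entropy show that NN 1's probabilities are much better calibrated, which is why cross-entropy is the preferred loss for classification.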
Additional Reading:

A Friendly Introduction to Cross-Entropy Loss

By Rob DiPietro (2016)

Link: https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
Additional Reading:

How to Implement a Neural Network: Intermezzo 2

By Peter Roelants (2016)

Link: http://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/