Machine Learning A-Z
Course Downloadable Slides V1.5
© SuperDataScience
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
Welcome to the course!
Dear student,
Welcome to the “Machine Learning A-Z” course brought to you by SuperDataScience. We are
super-excited to have you on board! In this class you will learn many interesting and useful
concepts while having lots of fun.
These slides may be updated from time to time. If this happens, you will find them in
the course materials repository, with the new version indicated in the filename.
We kindly ask that you use these slides only for the purpose of supporting your own learning
journey and we look forward to seeing you inside the class!
PS: if you are not yet enrolled in the course, you can find it here.
Who Are Your Instructors?
We’ve been teaching online together since 2016, and over 1 million students have enrolled in our
Machine Learning and Data Science courses. You can be confident that you are in good hands!
Data Preprocessing
The Machine Learning Process
Data Pre-Processing
• Import the data
• Clean the data
• Split into training & test sets
• Feature Scaling
Modelling
• Build the model
• Train the model
• Make predictions
Evaluation
• Calculate performance metrics
• Make a verdict
Training Set & Test Set
The data is split into a training set (80%), on which the model ŷ = b₀ + b₁X₁ + b₂X₂ is trained, and a test set (20%), on which the predicted values ŷ are compared against the actual values y.
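The 80/20 split above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's actual implementation (the course uses scikit-learn); the function name and the fixed seed are assumptions for reproducibility.

```python
import random

def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows reproducibly, then cut off the last
    test_ratio share of them as the test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)      # fixed seed -> same split every run
    n_test = int(len(rows) * test_ratio)
    return rows[:-n_test], rows[-n_test:]  # train, test

data = list(range(100))                    # 100 toy observations
train, test = train_test_split(data)
print(len(train), len(test))               # 80 20
```

The model is fit on `train` only; `test` is held back until evaluation so the performance metrics reflect unseen data.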
Feature Scaling
Normalization: X' = (X − X_min) / (X_max − X_min), which maps values into [0; 1].

Standardization: X' = (X − μ) / σ, which maps most values into roughly [−3; +3].
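Both formulas are direct one-liners. A stdlib-Python sketch (the salary numbers below come from the slides' worked example; in practice one would use scikit-learn's scalers, fit on the training set only):

```python
def normalize(xs):
    """Min-max normalization: X' = (X - X_min) / (X_max - X_min), range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Standardization: X' = (X - mu) / sigma (population std)."""
    mu = sum(xs) / len(xs)
    sigma = (sum((x - mu) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mu) / sigma for x in xs]

salaries = [70_000, 60_000, 52_000]
print([round(v, 3) for v in normalize(salaries)])   # [1.0, 0.444, 0.0]
```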
Feature Scaling — why it matters. Take three people with an annual salary and an age: (70,000 $, 45 yrs), (60,000 $, 44 yrs), (52,000 $, 40 yrs). On the raw scale, the salary differences (10,000 and 8,000) dwarf the age differences (1 and 4), so any distance-based comparison would be dominated by salary. After normalization, both features live in [0; 1] — e.g. the 60,000 $ salary maps to (60,000 − 52,000) / (70,000 − 52,000) ≈ 0.444 — and the two features become directly comparable.
Regression

Simple Linear Regression
ŷ = b₀ + b₁X₁

where ŷ is the dependent variable, X₁ the independent variable, b₀ the y-intercept (constant), and b₁ the slope coefficient.
Simple Linear Regression — example. Each point on the scatter plot represents a separate harvest: y is the potato yield [tonnes] and X₁ the amount of nitrogen fertilizer [kg].

Potatoes [t] = b₀ + b₁ × Fertilizer [kg]

Here b₀ = 8 [t] is the yield with no fertilizer, and b₁ = 3 [t/kg] means each extra kg of fertilizer adds 3 t of yield.
Ordinary Least Squares
For each harvest i, the residual is εᵢ = yᵢ − ŷᵢ: the vertical distance between the actual yield yᵢ and the value ŷᵢ predicted by the line ŷ = b₀ + b₁X₁. Ordinary Least Squares picks b₀ and b₁ so that the sum of squared residuals SUM(yᵢ − ŷᵢ)² is minimal.
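For simple linear regression, the OLS minimum has a closed form: b₁ = cov(x, y) / var(x) and b₀ = mean(y) − b₁·mean(x). A stdlib sketch — the toy harvest data is made exactly linear so that it recovers the slide's b₀ = 8 [t] and b₁ = 3 [t/kg]:

```python
def ols_fit(xs, ys):
    """Closed-form OLS for y-hat = b0 + b1*x:
    b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# illustrative harvests following yield = 8 + 3 * fertilizer exactly
fert    = [0.0, 1.0, 2.0, 3.0]
yield_t = [8.0, 11.0, 14.0, 17.0]
b0, b1 = ols_fit(fert, yield_t)
print(b0, b1)   # 8.0 3.0
```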
Multiple Linear Regression
ŷ = b₀ + b₁X₁ + b₂X₂ + … + bₙXₙ

where ŷ is the dependent variable and X₁, X₂, …, Xₙ are the independent variables.
Multiple Linear Regression — example:

Potatoes [t] = 8 [t] + 3 [t/kg] × Fertilizer [kg] − 0.54 [t/°C] × AvgTemp [°C] + 0.04 [t/mm] × Rain [mm]
Additional Reading
Link:
https://www.mdpi.com/2073-4395/11/5/885
R Squared
On the potato data (y = yield [tonnes] vs X₁ = nitrogen fertilizer [kg]), compare two fits: the regression line, whose predictions ŷᵢ give the residual sum of squares SS_res = SUM(yᵢ − ŷᵢ)², and the flat average line y_avg, which gives the total sum of squares SS_tot = SUM(yᵢ − y_avg)². R² compares the two.
Adjusted R Squared
R² = 1 − SS_res / SS_tot    (goodness of fit; greater is better)

Problem: extend the model to ŷ = b₀ + b₁X₁ + b₂X₂ + b₃X₃. SS_tot doesn't change, while SS_res = SUM(yᵢ − ŷᵢ)² will decrease or stay the same (this is because Ordinary Least Squares drives SS_res to a minimum). So adding a variable — even a useless one — can only raise R².

Solution:

Adj R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)

where k is the number of independent variables and n the sample size; the penalty term punishes extra variables that don't pull their weight.
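Both formulas above can be computed directly. A stdlib sketch with illustrative actual/predicted values (the numbers are arbitrary, chosen only to exercise the formulas):

```python
def r_squared(ys, preds):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def adj_r_squared(r2, n, k):
    """Adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

ys    = [8.0, 11.0, 14.0, 17.0]      # actual values
preds = [8.5, 10.5, 14.5, 16.5]      # model predictions
r2 = r_squared(ys, preds)
print(round(r2, 3), round(adj_r_squared(r2, n=4, k=1), 3))   # 0.978 0.967
```

Note how Adj R² is always below R², and the gap widens as k grows relative to n.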
Assumptions of Linear Regression
Anscombe's quartet (1973): four datasets with nearly identical summary statistics — and the same fitted regression line — yet completely different scatter plots. Moral: check the assumptions of linear regression visually; among them, watch for multicollinearity between predictors (X₁ ~ X₂).
Bonus
superdatascience.com/assumptions
Additional Reading
Link:
towardsdatascience.com/verifying-the-assumptions-of-linear-regression-in-python-and-r-f4cd2907d4c0
Building a Model

Example dataset: Profit (y) modelled from R&D Spend, Admin, Marketing and State (X₁–X₄). Why select variables at all? Keeping every column makes the model harder to interpret and can retain predictors that add nothing.

Backward Elimination:
STEP 1: Select a significance level SL to stay in the model (e.g. SL = 0.05)
STEP 2: Fit the full model with all possible predictors
STEP 3: Consider the predictor with the highest P-value. If P > SL, go to STEP 4, otherwise go to FIN
STEP 4: Remove the predictor, refit the model, and return to STEP 3

Forward Selection:
STEP 1: Select a significance level SL to enter the model (e.g. SL = 0.05)
STEP 2: Fit all simple regression models y ~ xₙ. Select the one with the lowest P-value
STEP 3: Keep this variable and fit all possible models with one extra predictor added to the one(s) you
already have
STEP 4: Consider the predictor with the lowest P-value. If P < SL, go to STEP 3, otherwise go to FIN

Bidirectional Elimination:
STEP 1: Select significance levels SL_ENTER and SL_STAY
STEP 2: Perform the next step of Forward Selection (new variables must have P < SL_ENTER to enter)
STEP 3: Perform ALL steps of Backward Elimination (old variables must have P < SL_STAY to stay)
STEP 4: Stop when no new variables can enter and no old variables can exit

FIN: Your Model Is Ready

All Possible Models: with N columns there are 2ᴺ − 1 candidate models to score — 10 columns means
1,023 models.
Polynomial Linear Regression

Where Multiple Linear Regression uses several predictors, Polynomial Linear Regression uses powers of one: ŷ = b₀ + b₁X₁ + b₂X₁² + … + bₙX₁ⁿ. It is still a linear model, because it is linear in the coefficients.

Support Vector Regression (SVR)

SVR fits a tube of width ε around the regression line: points inside the ε-insensitive tube cost nothing, while points outside it are penalized through slack variables ξᵢ (above the tube) and ξᵢ* (below it).

Additional Reading
Link:
https://core.ac.uk/download/pdf/81523322.pdf
Decision Tree Regression

The algorithm recursively splits the predictor plane (X₁, X₂) — the slides show splits such as X₁ < 20 and X₁ < 40, with further splits on X₂ (e.g. at 170 and 200) — and each split becomes a Yes/No node in the tree. The prediction for a new point is the average of y over the training points that fall in the same leaf (the slide's leaf values include 65.7, −64.1 and 0.7).
Random Forest Regression

STEP 1: Pick at random K data points from the training set
STEP 2: Build the decision tree associated with these K data points
STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2
STEP 4: For a new data point, make each one of your Ntree trees predict the value of y for
the data point in question, and assign the new data point the average across all of the
predicted y values.
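The four steps above can be sketched in stdlib Python. To stay short, each "tree" here is simplified to a one-split stump (a depth-1 regression tree) — an assumption of this sketch, not the course's full algorithm — but the sample-K-points / build-tree / average-predictions loop is exactly STEPS 1–4:

```python
import random

def fit_stump(points):
    """Depth-1 regression tree: pick the x-threshold minimizing the
    sum of squared errors of the two leaf averages."""
    best = None
    for t in sorted(p[0] for p in points)[1:]:          # candidate thresholds
        left  = [y for x, y in points if x <  t]
        right = [y for x, y in points if x >= t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                                    # degenerate sample: all x equal
        mean_y = sum(y for _, y in points) / len(points)
        return lambda x: mean_y
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def random_forest(points, n_trees=25, k=6, seed=0):
    rng = random.Random(seed)
    # STEPS 1 & 2, repeated Ntree times: sample K points, build a tree on them
    trees = [fit_stump([rng.choice(points) for _ in range(k)])
             for _ in range(n_trees)]
    # STEP 4: average the Ntree predictions
    return lambda x: sum(tree(x) for tree in trees) / n_trees

points = [(1, 10), (2, 11), (3, 9), (7, 30), (8, 31), (9, 29)]
predict = random_forest(points)
print(predict(2), predict(8))
```

Averaging many trees built on random subsets smooths out the jagged steps of a single decision tree.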
Classification
What is Classification?
Classification: a Machine Learning technique to identify the
category of new observations based on training data.
Logistic Regression
Logistic regression: predict a categorical dependent variable (e.g. will purchase health insurance: Yes / No) from a number of independent variables (e.g. age).

ln(p / (1 − p)) = b₀ + b₁X₁

The fitted curve gives the probability p of a "yes" at each age — e.g. p = 81% for an older customer and p = 42% for a younger one. With a 50% cut-off, observations with p ≥ 50% are predicted YES and those with p < 50% are predicted NO.
Logistic Regression

With several predictors — age, income, level of education, family or single — the model becomes:

ln(p / (1 − p)) = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + b₄X₄
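The single-predictor model can be fit by maximizing the log-likelihood with gradient ascent. A stdlib sketch — the ages and labels below are invented for illustration, and in practice one would use scikit-learn's `LogisticRegression`:

```python
import math

def sigmoid(z):
    """p = 1 / (1 + e^-z), the inverse of the log-odds ln(p/(1-p))."""
    return 1 / (1 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit ln(p/(1-p)) = b0 + b1*x by batch gradient ascent on the
    log-likelihood; the gradient per point is (y - p) and (y - p)*x."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

ages = [18, 22, 25, 30, 40, 45, 50, 60]   # hypothetical customers
took = [0,  0,  0,  0,  1,  1,  1,  1]    # took up the offer?
b0, b1 = fit_logistic(ages, took)
print(round(sigmoid(b0 + b1 * 45), 2))    # probability of "yes" at age 45
```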
Maximum Likelihood
To score a candidate curve, take each observation's predicted probability: for customers who took up the offer use p̂ᵢ (here 0.03, 0.54, 0.92, 0.95, 0.98), and for those who didn't use 1 − p̂ᵢ (here 1 − 0.01, 1 − 0.04, 1 − 0.10, 1 − 0.58, 1 − 0.96). The product is the likelihood of the curve given the data:

Likelihood = 0.03 × 0.54 × 0.92 × 0.95 × 0.98 × (1 − 0.01) × (1 − 0.04) × (1 − 0.10) × (1 − 0.58) × (1 − 0.96) = 0.00019939
Maximum Likelihood

Different candidate curves give different likelihoods — e.g. 0.00007418, 0.00012845, 0.00019939. Maximum likelihood estimation selects the curve with the highest likelihood: here the one scoring 0.00019939.
K-Nearest Neighbors (K-NN)

Before K-NN a new data point is unassigned; after K-NN it belongs to Category 1 or Category 2.

STEP 1: Choose the number K of neighbors (e.g. K = 5)
STEP 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance
STEP 3: Among these K neighbors, count the number of data points in each category
STEP 4: Assign the new data point to the category where you counted the most neighbors

The Euclidean distance between P₁(x₁, y₁) and P₂(x₂, y₂) is √((x₂ − x₁)² + (y₂ − y₁)²).

Example with K = 5: Category 1 contributes 3 neighbors and Category 2 contributes 2, so the new data point is assigned to Category 1.
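The four steps translate almost line-for-line into stdlib Python. The coordinates below are invented so that the 5 nearest neighbors reproduce the slide's 3-vs-2 vote:

```python
from collections import Counter

def knn_classify(train, new_point, k=5):
    """train: list of ((x1, x2), category).  STEP 2: sort by Euclidean
    distance and keep k; STEPS 3-4: majority vote among them."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    neighbors = sorted(train, key=lambda pc: dist(pc[0], new_point))[:k]
    votes = Counter(cat for _, cat in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "Category 1"), ((1, 2), "Category 1"), ((2, 1), "Category 1"),
         ((5, 5), "Category 2"), ((6, 5), "Category 2"), ((6, 6), "Category 2")]
print(knn_classify(train, (2, 2)))   # Category 1 (3 of the 5 nearest neighbors)
```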
Support Vector Machine (SVM)

The SVM separates the classes with the maximum margin hyperplane (the maximum margin classifier), flanked by the positive hyperplane and the negative hyperplane. The data points lying on these two boundaries are the support vectors — they alone determine the decision boundary.
When the classes are not linearly separable in the original space, a mapping function can lift the data into a higher-dimensional space where a separating hyperplane exists — e.g. mapping a 1D problem on x₁ into 2D, or projecting a 2D dataset into a 3D space with an extra coordinate z. The hyperplane found there projects back into the original space as a non-linear boundary.
Mapping to a Higher Dimensional Space can be highly compute-intensive. The kernel trick achieves the same effect without ever computing the mapping. The Gaussian RBF kernel, for a landmark l⃗ⁱ:

K(x⃗, l⃗ⁱ) = exp(−‖x⃗ − l⃗ⁱ‖² / (2σ²))

The kernel equals 1 at the landmark and decays towards 0 with distance; σ controls the width of the bump. Combining two landmarks, a point is classified green when K(x⃗, l⃗¹) + K(x⃗, l⃗²) > 0 and red when K(x⃗, l⃗¹) + K(x⃗, l⃗²) = 0.
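The RBF kernel formula above is a one-liner to evaluate. A stdlib sketch with two hypothetical landmarks, showing the decay with distance:

```python
import math

def rbf_kernel(x, landmark, sigma=1.0):
    """Gaussian RBF: K(x, l) = exp(-||x - l||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2 * sigma ** 2))

l1, l2 = (0.0, 0.0), (4.0, 4.0)          # two illustrative landmarks
near, far = (0.5, 0.5), (10.0, 10.0)
print(rbf_kernel(near, l1))               # close to 1: next to the landmark
print(rbf_kernel(far, l1))                # essentially 0: far from it
```

A larger σ widens the bump around each landmark, so more distant points still receive a noticeable kernel value.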
Naive Bayes

Example: given a person's features X = (Age, Salary), does the person walk or drive to work?

Bayes' theorem, applied once per class:

P(Walks|X) = P(X|Walks) × P(Walks) / P(X)
P(Drives|X) = P(X|Drives) × P(Drives) / P(X)

where P(Walks) is the prior probability, P(X|Walks) the likelihood, P(X) the marginal likelihood, and P(Walks|X) the posterior probability.

With 30 observations — 10 who walk, 20 who drive — and 4 observations similar to the new data point (3 walkers, 1 driver):

P(Walks|X) = (3/10 × 10/30) / (4/30) = 0.75
P(Drives|X) = (1/20 × 20/30) / (4/30) = 0.25

NOTE: the marginal likelihood P(X) = 4/30 is the same both times, so it can be dropped when only comparing the two posteriors. Since 0.75 > 0.25, the new data point is classified as Walks.
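The Walks-vs-Drives computation above is just three fractions per class. A stdlib sketch reproducing the slide's numbers:

```python
def posterior(likelihood, prior, marginal):
    """Bayes' theorem: P(class | X) = P(X | class) * P(class) / P(X)."""
    return likelihood * prior / marginal

# 30 people: 10 walk, 20 drive; 4 are similar to the new point (3 walk, 1 drive)
p_walks  = posterior(likelihood=3/10, prior=10/30, marginal=4/30)
p_drives = posterior(likelihood=1/20, prior=20/30, marginal=4/30)
print(round(p_walks, 2), round(p_drives, 2))   # 0.75 0.25
```

Because the marginal 4/30 divides both posteriors, comparing the numerators alone gives the same verdict — which is why implementations usually skip P(X).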
Decision Tree Classification

The algorithm recursively splits the (X₁, X₂) plane — here Split 1: X₂ < 60, then X₁ < 50 and X₁ < 70, then X₂ < 20 — and each split becomes a Yes/No node of the tree. A new data point is classified by following the questions down to a leaf.
Random Forest Classification

STEP 1: Pick at random K data points from the training set
STEP 2: Build the decision tree associated with these K data points
STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2
STEP 4: For a new data point, make each one of your Ntree trees predict the category to
which the data point belongs, and assign the new data point to the category that wins
the majority vote.
Confusion Matrix

Project the actual outcomes onto the fitted logistic curve and apply the 0.5 threshold: points below it are predicted ŷ = 0, points above it ŷ = 1. Two kinds of mistakes appear: a False Positive (Type I error) — predicted 1 when the actual value is 0 — and a False Negative (Type II error) — predicted 0 when the actual value is 1.
Confusion Matrix & Accuracy

                     Prediction
                     NEG          POS
Actual   NEG         TRUE NEG     FALSE POS
         POS         FALSE NEG    TRUE POS
Example with 100 observations:

                     Prediction
                     NEG     POS
Actual   NEG          43      12
         POS           4      41

Accuracy Rate: AR = Correct / Total = (TN + TP) / Total = 84/100 = 84%
Error Rate:    ER = Incorrect / Total = (FP + FN) / Total = 16/100 = 16%
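The two rates follow directly from the four cells. A stdlib sketch using the 43/12/4/41 matrix from the example:

```python
def accuracy_metrics(tn, fp, fn, tp):
    """Accuracy Rate and Error Rate from the confusion matrix cells."""
    total = tn + fp + fn + tp
    ar = (tn + tp) / total      # AR: correct predictions / total
    er = (fp + fn) / total      # ER: incorrect predictions / total
    return ar, er

ar, er = accuracy_metrics(tn=43, fp=12, fn=4, tp=41)
print(ar, er)   # 0.84 0.16
```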
Additional Reading
Link:
https://towardsdatascience.com/understanding-the-confusion-matrix-from-scikit-learn-c51d88929c79
The Accuracy Paradox

Scenario 1: on 10,000 customers a model scores TN = 9,700, FP = 150, FN = 50, TP = 100 — accuracy (9,700 + 100)/10,000 = 98%. Scenario 2: simply always predicting 0 gives TN = 9,850 and TP = 0, yet an even higher accuracy of 98.5% — despite the "model" predicting nothing at all. Accuracy alone is misleading on imbalanced data.

CAP Curve (Cumulative Accuracy Profile)

Plot the cumulative share of positive responses captured (vertical axis) against the share of the total contacted (horizontal axis). A random model traces the diagonal; a good model rises above it; a poor model hugs it; the perfect model captures all positives immediately.

To quantify a CAP, either compute AR = aR / aP — the area aR between the model's curve and the random line, divided by the area aP between the perfect model and the random line — or simply read off X, the percentage of positives captured at 50% contacted:

90% < X < 100%: Too Good (suspicious — check for overfitting or a forward-looking variable)
80% < X < 90%: Very Good
70% < X < 80%: Good
60% < X < 70%: Poor
X < 60%: Rubbish
What is Clustering?

Clustering: grouping unlabelled data.

Supervised Learning (e.g. Regression, Classification) learns from labelled examples; Unsupervised Learning (e.g. Clustering) finds structure in unlabelled data on its own.

K-Means Clustering
(The slides animate the K-Means algorithm step by step: choose the number K of clusters; place K centroids; assign each data point to its nearest centroid; move each centroid to the mean of its assigned points; repeat the last two steps until the assignments no longer change.)
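The animated loop can be sketched in stdlib Python. Fixed starting centroids keep the run deterministic (real implementations initialize randomly — see K-Means++ below for why that matters); the six points are invented to form two obvious groups:

```python
def k_means(points, centroids, steps=10):
    """Plain K-Means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points; repeat."""
    for _ in range(steps):
        clusters = [[] for _ in centroids]
        for p in points:                              # assignment step
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [                                 # update step
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```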
The Elbow Method

How to choose K? For each candidate K, run K-Means and compute the Within-Cluster Sum of Squares (WCSS): for clusters C₁, C₂, C₃, …, sum the squared distance of every point to its own cluster's centroid. WCSS always decreases as K grows, but the improvement falls off sharply at some point — plot WCSS against K and pick the K at the "elbow" of the curve.
K-Means++

The random initialization trap: running plain K-Means from different random starting centroids can produce different final clusterings of the same data. K-Means++ makes the initialization robust:

Step 1: Choose the first centroid at random among the data points
Step 2: For each of the remaining data points compute the distance (D)
to the nearest out of the already selected centroids
Step 3: Choose the next centroid among the remaining data points, with probability proportional to D²
Step 4: Repeat Steps 2 and 3 until all k centroids have been selected

Then proceed with the standard K-Means algorithm.
Hierarchical Clustering (Agglomerative)

STEP 1: Make each data point a single-point cluster — that forms N clusters
STEP 2: Take the two closest data points and make them one cluster — that forms N − 1
clusters
STEP 3: Take the two closest clusters and make them one cluster — that forms N − 2
clusters
STEP 4: Repeat STEP 3 until there is only one cluster
FIN

"Closest" uses the Euclidean distance √((x₂ − x₁)² + (y₂ − y₁)²); the distance between two clusters can be defined as the distance between their closest points, their furthest points, their centroids, or the average of all pairwise distances.
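The STEPS above can be sketched in stdlib Python, stopping once k clusters remain instead of merging all the way to one. This sketch assumes the closest-points (single-linkage) definition of cluster distance; the five points are invented:

```python
def agglomerative(points, k):
    """Agglomerative clustering: start with every point as its own
    cluster, repeatedly merge the two closest clusters until k remain."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    def cluster_dist(a, b):                       # single linkage: closest points
        return min(dist(p, q) for p in a for q in b)

    clusters = [[p] for p in points]              # STEP 1: N clusters
    while len(clusters) > k:                      # STEPS 2-4: merge closest pair
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return clusters

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
out = agglomerative(pts, k=2)
print(sorted(len(c) for c in out))   # [2, 3]
```

Recording the distance at each merge would reproduce the dendrogram discussed next.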
Dendrograms

The dendrogram records the merge history of the points (P1 … P6): each horizontal join marks a merge, and its height is the dissimilarity at which the merge happened. Cutting the dendrogram with a horizontal threshold yields a clustering — lower thresholds give more clusters (e.g. 6, 4 or 2). A common heuristic is to cut through the largest vertical distance that crosses no horizontal merge line: here that gives 2 clusters, and in the larger example (points up to P9) it gives 3 clusters.
Apriori

"People who bought also bought …" — association rule learning.

User ID    Movies liked
46578      Movie1, Movie2, Movie3, Movie4
98989      Movie1, Movie2
71527      Movie1, Movie2, Movie4
78981      Movie1, Movie2
89192      Movie2, Movie4
61557      Movie1, Movie3

The same idea applies to market baskets:

Transaction ID    Products purchased
46578             Burgers, French Fries, Vegetables
98989             Burgers, French Fries, Ketchup
71527             Vegetables, Fruits
78981             Pasta, Fruits, Butter, Vegetables
89192             Burgers, Pasta, French Fries
61557             Fruits, Orange Juice, Vegetables
87923             Burgers, French Fries, Ketchup, Mayo

Candidate rules look like Movie1 → Movie2, Movie1 → Movie3, or Burgers → French Fries. The algorithm:

Step 1: Set a minimum support and a minimum confidence
Step 2: Take all the subsets in transactions having higher support than minimum support
Step 3: Take all the rules of these subsets having higher confidence than minimum confidence
Step 4: Sort the rules by decreasing lift
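Apriori ranks rules by support, confidence and lift: support(I) is the share of transactions containing I, confidence(L → R) = support(L ∪ R) / support(L), and lift(L → R) = confidence(L → R) / support(R). A stdlib sketch using the movie lists from the table above:

```python
def support(itemset, transactions):
    """support(I) = share of transactions containing every item of I."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """confidence(L -> R) = support(L u R) / support(L)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """lift > 1 means R is more likely given L than on its own."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

lists = [
    {"Movie1", "Movie2", "Movie3", "Movie4"},
    {"Movie1", "Movie2"},
    {"Movie1", "Movie2", "Movie4"},
    {"Movie1", "Movie2"},
    {"Movie2", "Movie4"},
    {"Movie1", "Movie3"},
]
print(round(confidence({"Movie1"}, {"Movie2"}, lists), 2))  # 0.8
print(round(lift({"Movie1"}, {"Movie2"}, lists), 2))        # 0.96
```

Here the lift is just below 1: Movie2 is so popular overall that liking Movie1 barely changes the odds.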
Multi-Armed Bandits: UCB vs Thompson Sampling

(The slides simulate successive rounds against a generated bandit configuration.)

Upper Confidence Bound: deterministic; requires an update at every round.
Thompson Sampling: probabilistic; can accommodate delayed feedback; better empirical evidence.
Natural Language Processing

Here's what we will learn:
• Types of Natural Language Processing
• Classical vs Deep Learning Models
• End-to-end Deep Learning Models
• Bag-Of-Words

(The slides show a Venn diagram: Seq2Seq sits inside DNLP — deep natural language processing — which sits inside Natural Language Processing and overlaps Deep Learning.)

Some examples:
1. If / Else Rules (Chatbot)
2. Audio frequency components analysis (Speech Recognition)
3. Bag-of-words model (Classification)
4. CNN for text Recognition (Classification)
5. Seq2Seq (many applications)

Classification example — training data (Comment → Pass/Fail):
Great job! → 1
Amazing work. Well done. → 1
Yes → 1
Very well written. → 1
Poor effort. → 0
Could have done better. → 0
Try harder next time. → 0

Seq2Seq example: an encoder reads the input tokens (e.g. "I'm back", then EOS) into hidden states h₀, h₁, h₂, h₃, …, and a decoder (states g₀, g₁, g₂, …) generates the reply token by token.
Bag-Of-Words

Start from a long vector of zeros — one position per vocabulary word, plus positions reserved for special tokens such as SOS and EOS:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... , 0]

For the email "Hello Kirill, Checking if you are back to Oz. Let me know if you are around … Cheers, V", count how often each vocabulary word occurs and store the counts at the matching positions:

[1, 1, 0, 0, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, ... , 3]

Training Data — each email becomes such a count vector:

Hey mate, have you read about Hinton's capsule networks?
→ [1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, ... , 2]
Did you like that recipe I sent you last week?
→ [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 0, ... , 0]
Hi Kirill, are you coming to dinner tonight?
→ [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, ... , 1]
Dear Kirill, would you like to service your car with us again?
→ [1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, ... , 1]
Are you coming to Australia in December?
→ [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, ... , 1]
…

These vectors can then be fed to any classifier — a classical model (e.g. logistic regression or Naive Bayes), or a neural network for the deep-learning variant. (Image Source: www.helloacm.com)
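The counting itself is straightforward. A minimal stdlib sketch — the tiny vocabulary and the single catch-all slot for unknown words are simplifications of this sketch, not the course's full 20,000-word setup:

```python
def bag_of_words(text, vocab):
    """Count how many times each vocabulary word occurs in the text;
    words outside the vocabulary share one trailing catch-all slot."""
    counts = [0] * (len(vocab) + 1)
    index = {w: i for i, w in enumerate(vocab)}
    for word in text.lower().replace(",", " ").replace(".", " ").split():
        counts[index.get(word, len(vocab))] += 1
    return counts

vocab = ["hello", "kirill", "checking", "if", "you", "are", "back"]
vec = bag_of_words(
    "Hello Kirill, checking if you are back. Let me know if you are around.",
    vocab,
)
print(vec)   # [1, 1, 1, 2, 2, 2, 1, 4]
```

"if", "you" and "are" each appear twice, and the four out-of-vocabulary words ("let", "me", "know", "around") all land in the final slot.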
Deep Learning — why now?

The core ideas date back to the 1950s–1980s, but compute and storage have grown exponentially since (log-scale chart, Source: mkomo.com; the slide's growth figures range from 2× around 1980 to 25,600× by 2017), finally making it practical to train deep networks. (Further figure sources: nature.com; Time Magazine. Pictured: Geoffrey Hinton, a pioneer of the field; neuron image source: www.austincc.edu.)
An artificial neural network passes input values through an input layer, one or more hidden layers, and an output layer that produces the output value (the slide shows a small network with three inputs and one output).
Artificial Neural Networks — used for Regression & Classification (Supervised learning).

The Neuron. A biological neuron receives input signals through its dendrites and emits an output signal along its axon (Image Source: Wikipedia). Its artificial counterpart — a node — receives input signals 1 … m and produces an output signal.
The input values X₁ … Xₘ are the independent variables of a single observation, passed to the neuron along synapses. Inputs should be standardized (or normalized) before training.

Additional Reading: Efficient BackProp
Link:
http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf

The output value can be:
• Continuous (price)
• Binary (will exit yes/no)
• Categorical
One pass through the network handles a single observation: the input values X₁ … Xₘ and the output values y₁ … y_p all belong to that same observation.
Input value 1 X1
w1
wm
Input value m Xm
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
Input value 1 X1
w1
Input value 2 X2 w2
?
neuron y Output value
wm
Input value m Xm
NOT FOR DISTRIBUTION © SUPERDATASCIENCE www.superdatascience.com
Inside the neuron:

1st step: take the weighted sum of the inputs: Σᵢ wᵢxᵢ
2nd step: apply the activation function φ to that sum: φ(Σᵢ wᵢxᵢ)
3rd step: pass the resulting signal y on as the output value.
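The three steps above can be sketched as a single function. This is a minimal illustration, not the course's implementation; the input values, weights and choice of sigmoid activation are all assumptions for the example:

```python
import numpy as np

def neuron(x, w, phi):
    """One artificial neuron: 1) weighted sum, 2) activation, 3) output."""
    weighted_sum = np.dot(w, x)   # 1st step: sum_i w_i * x_i
    y = phi(weighted_sum)         # 2nd step: apply activation function phi
    return y                      # 3rd step: pass the signal on

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # input values X1..Xm (illustrative)
w = np.array([0.4, 0.3, -0.2])   # synapse weights w1..wm (illustrative)
print(neuron(x, w, sigmoid))     # sigmoid(-0.5) ≈ 0.3775
```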
The Activation
Function
© SuperDataScience
The activation function φ can take several forms:

Threshold Function: φ(x) = 1 if x ≥ 0, otherwise 0. A hard yes/no output between 0 and 1.
Sigmoid: φ(x) = 1 / (1 + e^(−x)). A smooth curve from 0 to 1, useful for probabilities.
Rectifier (ReLU): φ(x) = max(x, 0). One of the most popular choices for hidden layers.
Hyperbolic Tangent (tanh): φ(x) = (1 − e^(−2x)) / (1 + e^(−2x)). Like the sigmoid but ranging from −1 to 1.
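The four activation functions can be written in a few lines of numpy; a minimal sketch for experimenting with their shapes:

```python
import numpy as np

def threshold(x):
    """Hard step: 1 if x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    """Smooth squash into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def rectifier(x):
    """ReLU: max(x, 0)."""
    return np.maximum(x, 0.0)

def tanh(x):
    """Hyperbolic tangent, range (-1, 1)."""
    return np.tanh(x)

z = np.array([-2.0, 0.0, 2.0])
print(threshold(z))   # [0. 1. 1.]
print(rectifier(z))   # [0. 0. 2.]
```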
Additional Reading:
Deep Sparse Rectifier Neural Networks, by Xavier Glorot et al. (2011)
Link:
http://jmlr.org/proceedings/papers/v15/glorot11a/glorot11a.pdf
If threshold activation function: the neuron simply outputs 1 when the weighted sum passes the threshold, and 0 otherwise.

[Diagram: property valuation example. Input Layer with features such as Area (feet²) X1, Bedrooms X2, …, Age X4; Hidden Layer of neurons; Output Layer predicting the price.]
[Animation: each hidden-layer neuron activates only for certain combinations of the input features (for example Area with Bedrooms, or Bedrooms with Age); together their signals produce the output y = Price.]
[Diagram: the network produces ŷ, the predicted output value, which is compared with y, the actual value.]
The cost function measures the gap between prediction and truth: C = ½(ŷ − y)². The goal of training is to minimise C by adjusting the weights w1…wm.
[Diagram: one epoch. Every row of the dataset is fed through the same network (shared weights w1…wm), producing a ŷ for each row. The total cost sums over all rows: C = Σ ½(ŷ − y)². After C is computed, the weights are updated and the whole dataset is fed through again.]
Additional Reading:
A list of cost functions used in neural networks, alongside applications, CrossValidated (2015)
Link:
http://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
Gradient
Descent
© SuperDataScience
[Graph: for a single weight, plotting the cost C = ½(ŷ − y)² against ŷ gives a U-shaped curve; the best weight is the one at the bottom of the curve ("Best!").]
[Diagram: brute force does not scale. Even the small property-valuation network has 25 weights; trying every combination of weight values would overwhelm even a 93 PFLOPS supercomputer (93 × 10^15 floating-point operations per second). This is the curse of dimensionality, and the reason we use gradient descent instead.]
Stochastic
Gradient Descent
© SuperDataScience
[Graph: gradient descent. From the current position on the cost curve C = ½(ŷ − y)², measure the slope and step downhill; repeat until the minimum ("Best!") is reached.]
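Gradient descent on this one-weight cost curve fits in a few lines. A minimal sketch with illustrative values; the single observation, starting weight and learning rate are all assumptions:

```python
# Gradient descent on C(w) = 1/2 * (y_hat - y)^2 with y_hat = w * x.
x, y = 2.0, 4.0      # one observation (illustrative)
w = 0.0              # initial weight
lr = 0.1             # learning rate (assumption)

for _ in range(100):
    y_hat = w * x
    grad = (y_hat - y) * x   # dC/dw = (y_hat - y) * x, the slope
    w -= lr * grad           # step downhill along the slope

print(w)   # converges towards 2.0, where the cost is minimal
```

Each step moves the weight a fraction (the learning rate) of the slope downhill, which is exactly the "look at the angle, then roll the ball" picture from the slide.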
[Diagram: Batch Gradient Descent updates the weights ("Upd w's") once per pass over the whole dataset, whereas Stochastic Gradient Descent updates them after every single row.]
Additional Reading:
A Neural Network in 13 lines of Python (Part 2 – Gradient Descent), by Andrew Trask (2015)
Link:
https://iamtrask.github.io/2015/07/27/python-network-part2/
Additional Reading:
Neural Networks and Deep Learning, Chapter 2: How the backpropagation algorithm works, by Michael Nielsen
Link:
http://neuralnetworksanddeeplearning.com/chap2.html
Backpropagation
© SuperDataScience
Forward Propagation: information enters on the left and flows through the network to produce the prediction ŷ.
Backpropagation: the error is propagated from right to left, and all the weights are adjusted simultaneously, each according to how much it contributed to the error.
Link:
http://neuralnetworksanddeeplearning.com/chap2.html
STEP 1: Randomly initialise the weights to small numbers close to 0 (but not 0).
STEP 2: Input the first observation of your dataset in the input layer, each feature in one input node.
STEP 3: Forward-Propagation: from left to right, the neurons are activated in a way that the impact of each
neuron’s activation is limited by the weights. Propagate the activations until you get the predicted result ŷ.
STEP 4: Compare the predicted result to the actual result. Measure the generated error.
STEP 5: Back-Propagation: from right to left, the error is back-propagated. Update the weights according to
how much they are responsible for the error. The learning rate decides by how much we update the
weights.
STEP 6: Repeat Steps 2 to 5 and update the weights after each observation (Stochastic Gradient Descent), or
repeat Steps 2 to 5 but update the weights only after a batch of observations (Batch Gradient Descent).
STEP 7: When the whole training set has passed through the ANN, that makes an epoch. Redo more epochs.
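These steps can be sketched for a single neuron (rather than the full ANN from the course). The toy OR dataset, learning rate and epoch count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset (illustrative): 4 observations, 2 features, OR labels
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 1.])

# STEP 1: random weights close to 0 (but not 0)
w = rng.normal(0.0, 0.1, size=2)
b = 0.0
lr = 0.5                                  # learning rate (assumption)

for epoch in range(1000):                 # STEP 7: redo more epochs
    for xi, yi in zip(X, y):              # STEP 2: one observation at a time
        y_hat = sigmoid(np.dot(w, xi) + b)   # STEP 3: forward-propagation
        error = y_hat - yi                   # STEP 4: measure the error
        # STEP 5: update weights by how responsible they are for the error
        w -= lr * error * xi
        b -= lr * error
        # STEP 6: updating after each observation = stochastic gradient descent

preds = (sigmoid(X @ w + b) > 0.5).astype(float)
print(preds)   # learns OR: [0. 1. 1. 1.]
```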
Convolutional
Neural Networks
© SuperDataScience
What we will learn in this section:
• What are Convolutional Neural Networks?
• Step 1 - Convolution Operation
• Step 1(b) - ReLU Layer
• Step 2 - Pooling
• Step 3 - Flattening
• Step 4 - Full Connection
• Summary
[Diagram: Input Image → CNN → Label (image class)]
[Diagram: the same CNN classifies one face image as Happy and another as Sad.]
A black-and-white image is a 2D array of pixels (e.g. a 2×2 px image has pixels 1–4), each holding an intensity value. A coloured image is a 3D array: the same pixel grid repeated across a Red, a Green and a Blue channel.
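In numpy this difference is just an extra axis. A minimal sketch with made-up pixel values:

```python
import numpy as np

# A black-and-white image: a 2-D array of pixel intensities (0-255).
bw = np.array([[0, 255],
               [255, 0]], dtype=np.uint8)      # 2x2 px B/W image

# A coloured image: a 3-D array with a channel axis (R, G, B).
colour = np.zeros((2, 2, 3), dtype=np.uint8)   # 2x2 px, 3 channels
colour[0, 0] = (255, 0, 0)                     # top-left pixel pure red

print(bw.shape)      # (2, 2)
print(colour.shape)  # (2, 2, 3)
```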
[Example: a smiley face encoded as a grid of 0s and 1s. The computer only ever sees these numbers, not the picture.]
STEP 1: Convolution
STEP 2: Max Pooling
STEP 3: Flattening
STEP 4: Full Connection

Additional Reading:
Gradient-Based Learning Applied to Document Recognition, by Yann LeCun et al. (1998)
Link:
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
Step 1:
Convolution
© SuperDataScience
Additional Reading:
Introduction to Convolutional Neural Networks, by Jianxin Wu (2017)
Link:
http://cs.nju.edu.cn/wujx/paper/CNN.pdf
The convolution operation: slide a small Feature Detector (also called a filter or kernel, here 3×3) over the 7×7 binary Input Image, one stride at a time (here stride 1). At each position, multiply the overlapping cells element-wise and add them up; each position writes one number into the Feature Map, so the feature map is smaller than the input (here 5×5):

Feature Map:
0 1 0 0 0
0 1 1 1 0
1 0 1 2 1
1 4 2 1 0
0 0 1 2 1

High values mark the places where the feature was found. Applying many different feature detectors to the same input image produces many feature maps, which together form the Convolutional Layer.

[Animation: the 3×3 feature detector stepping across the input image, filling in the feature map cell by cell.]
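The sliding-window computation can be sketched directly in numpy. The image and kernel below are illustrative assumptions, not the exact matrices from the slide:

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' convolution as on the slide: slide the feature detector
    over the image, multiplying element-wise and summing at each spot."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Illustrative 7x7 binary image with a diagonal stroke
image = np.zeros((7, 7))
image[2, 2] = image[3, 3] = image[4, 4] = 1
kernel = np.eye(3)                 # a detector that responds to diagonals

feature_map = convolve2d(image, kernel)
print(feature_map.shape)   # (5, 5): smaller than the input
print(feature_map.max())   # strongest response where the diagonal lies
```

(Strictly speaking this is cross-correlation, which is how deep-learning libraries implement the "convolution" layer as well.)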
Image Source: docs.gimp.org/en/plug-in-convmatrix.html

Different kernels produce different effects on an image, for example:
• Sharpen
• Blur
• Edge Enhance
[Diagram: Input Image → Convolution → Convolutional Layer (a stack of feature maps).]
Step 1(b), the ReLU Layer: apply the Rectifier, φ(x) = max(x, 0), to every value of every feature map in the convolutional layer. This removes the negative values and increases non-linearity in the images.
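Applying the rectifier to a feature map is one numpy call; a minimal sketch with an illustrative feature map containing negatives:

```python
import numpy as np

feature_map = np.array([[ 1.0, -0.5],
                        [-2.0,  3.0]])   # illustrative values

rectified = np.maximum(feature_map, 0)   # ReLU: negatives become 0
print(rectified)   # [[1. 0.] [0. 3.]]
```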
Image Source: http://mlss.tuebingen.mpg.de/2015/slides/fergus/Fergus_1.pdf
Additional Reading:
Understanding Convolutional Neural Networks with A Mathematical Model
Link:
https://arxiv.org/pdf/1609.04112.pdf
Additional Reading:
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, by Kaiming He et al. (2015)
Link:
https://arxiv.org/pdf/1502.01852.pdf
Step 2:
Max Pooling
© SuperDataScience
Image Source: Wikipedia
[Diagram: the 5×5 feature map from the convolution step, about to be pooled.]
Max Pooling: move a 2×2 box over the Feature Map with a stride of 2 and record only the maximum value inside the box at each position. This keeps the strongest feature signals, makes the network robust to small shifts and distortions (spatial invariance), and shrinks the representation, reducing the number of parameters.

Feature Map:
0 1 0 0 0
0 1 1 1 0
1 0 1 2 1
1 4 2 1 0
0 0 1 2 1

→ Max Pooling →

Pooled Feature Map:
1 1 0
4 2 1
0 2 1
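The pooled feature map above can be reproduced in a few lines. This is a minimal sketch of 2×2 max pooling with stride 2; boxes that overhang the edge simply use the cells that exist, as on the slide:

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Keep the maximum inside each size x size box, moving by stride."""
    h, w = fmap.shape
    out_h = -(-h // stride)          # ceil division for edge overhang
    out_w = -(-w // stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = fmap[i*stride : i*stride+size,
                             j*stride : j*stride+size].max()
    return out

feature_map = np.array([[0, 1, 0, 0, 0],
                        [0, 1, 1, 1, 0],
                        [1, 0, 1, 2, 1],
                        [1, 4, 2, 1, 0],
                        [0, 0, 1, 2, 1]])

pooled = max_pool(feature_map)
print(pooled)             # [[1. 1. 0.] [4. 2. 1.] [0. 2. 1.]]
print(pooled.flatten())   # flattening: [1. 1. 0. 4. 2. 1. 0. 2. 1.]
```

The `flatten()` call at the end previews the next step: unrolling the pooled map row by row into a vector.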
Additional Reading:
Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition, by Dominik Scherer et al. (2010)
Link:
http://ais.uni-bonn.de/papers/icann2010_maxpool.pdf
[Diagram: Input Image → Convolution → Convolutional Layer → Pooling → Pooling Layer.]
Image Source: scs.ryerson.ca/~aharley/vis/conv/flat.html
Step 3:
Flattening
© SuperDataScience
Flattening: take the Pooled Feature Map

1 1 0
4 2 1
0 2 1

and unroll it, row by row, into a single column vector: (1, 1, 0, 4, 2, 1, 0, 2, 1). Doing this for every pooled feature map and stacking the results gives one long vector, which becomes the input layer of a future ANN.
[Diagram: the full pipeline so far. Input Image → Convolution → Convolutional Layer → Pooling → Pooling Layer → Flattening → Input layer of a future ANN.]
Step 4:
Full Connection
© SuperDataScience
[Diagram: the flattened vector X1…Xm feeds a fully connected ANN, whose output value is the predicted class (e.g. Cat).]
With two classes, the output layer gets two neurons: Dog and Cat.

[Animation: during training, the neurons of the last fully connected layer learn which features matter for each class. At prediction time each neuron fires with some strength (e.g. 0.9, 0.2, 0.1) and effectively "votes" for the output neurons: the Dog neuron learns to listen to the detectors of dog-like features, the Cat neuron to cat-like ones. The weighted votes are combined into final class probabilities such as 0.95 vs 0.05, or 0.79 vs 0.21 for a more ambiguous image.]
Image Source: a talk by Geoffrey Hinton
Summary
© SuperDataScience
Additional Reading:
The 9 Deep Learning Papers You Need To Know About, by Adit Deshpande
Link:
https://adeshpande3.github.io/adeshpande3.github.io/The-9-Deep-Learning-Papers-You-Need-To-Know-About.html
Softmax &
Cross-Entropy
© SuperDataScience
[Diagram: the two output neurons report Dog and Cat probabilities (0.95 and 0.05) that conveniently add up to 1. How?]
The raw scores z1 and z2 produced by the output neurons are squashed into probabilities by the softmax function:

f(z)j = e^(zj) / Σk e^(zk)

which guarantees that each output lies between 0 and 1 and that the outputs sum to 1.
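A minimal softmax sketch; the two raw scores below are illustrative assumptions chosen to land near the 0.95 / 0.05 split from the slide:

```python
import numpy as np

def softmax(z):
    """Turn raw scores z1..zk into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return e / e.sum()

z = np.array([1.0, -1.95])      # illustrative scores for Dog and Cat
p = softmax(z)
print(p)          # roughly [0.95, 0.05]
print(p.sum())    # exactly 1.0
```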
Cross-entropy then measures how far the predicted probabilities (e.g. Dog 0.9, Cat 0.1) are from the one-hot actual label (Dog 1, Cat 0):

L = −Σi yi · log(ŷi)

A confident correct prediction gives a small loss; a confident wrong one is punished heavily.
[Diagram: two networks, NN1 and NN2, evaluated on the same three rows. NN1 predicts (Dog, Cat) probabilities (0.9, 0.1), (0.1, 0.9), (0.4, 0.6); NN2 predicts (0.6, 0.4), (0.3, 0.7), (0.1, 0.9); the actual labels are (1, 0), (0, 1), (1, 0).]
NN1                               NN2
Row  Dog^  Cat^  Dog  Cat         Row  Dog^  Cat^  Dog  Cat
#1   0.9   0.1   1    0           #1   0.6   0.4   1    0
#2   0.1   0.9   0    1           #2   0.3   0.7   0    1
#3   0.4   0.6   1    0           #3   0.1   0.9   1    0

                       NN1          NN2
Classification Error   1/3 = 0.33   1/3 = 0.33
Mean Squared Error     0.25         0.71
Cross-Entropy          0.38         1.06

Both networks get the same classification error, yet MSE and especially cross-entropy reveal that NN1's probabilities are much closer to the truth.
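The three metrics from the table can be reproduced directly. A minimal sketch using the table's own numbers:

```python
import numpy as np

def classification_error(y_hat, y):
    """Fraction of rows where the predicted class differs from the actual."""
    return np.mean(np.argmax(y_hat, axis=1) != np.argmax(y, axis=1))

def mean_squared_error(y_hat, y):
    """Squared differences summed per row, averaged over rows."""
    return np.sum((y_hat - y) ** 2) / len(y)

def cross_entropy(y_hat, y):
    """-sum(y * log(y_hat)) averaged over rows (natural log)."""
    return -np.sum(y * np.log(y_hat)) / len(y)

y   = np.array([[1, 0], [0, 1], [1, 0]])             # actual (Dog, Cat)
nn1 = np.array([[0.9, 0.1], [0.1, 0.9], [0.4, 0.6]])
nn2 = np.array([[0.6, 0.4], [0.3, 0.7], [0.1, 0.9]])

for name, pred in [("NN1", nn1), ("NN2", nn2)]:
    print(name,
          round(classification_error(pred, y), 2),
          round(mean_squared_error(pred, y), 2),
          round(cross_entropy(pred, y), 2))
# NN1 0.33 0.25 0.38
# NN2 0.33 0.71 1.06
```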
Additional Reading:
A Friendly Introduction to Cross-Entropy Loss, by Rob DiPietro
Link:
https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
Additional Reading:
Link:
http://peterroelants.github.io/posts/neural_network_implementation_intermezzo02/