AST Day 2 Slides
Techniques (Day 2)
Neumann Chew C. H.
ITOM, Nanyang Business School.
[email protected]
Day 2 (Part 1a)
Source: Chew C.H. (2021) Artificial Intelligence, Analytics and Data Science Vol. 1 Core Concepts and Models, Chapter 2, Cengage.
The concept of a model is useful: it takes inputs X1, X2, X3, …, Xk and produces a prediction Ŷ of the target Y.

X1, X2, X3, …, Xk → Model → Ŷ

Error = Y − Ŷ

Example: Predicting housing price. What is Y? What is Ŷ? What are the Xs?
Model Complexity
• The size of the model (e.g. model parameters)
• The number of X variables.
• The greater the complexity, the lower the error on the dataset.
• Should we be happy if we reach zero error on the dataset?
• Scenario: Investment company selecting which stock to buy using a ML model.
Train-Test Split
Industry Standard Practice
• The Train-Test split is the industry standard for ML/AI/Analytics practice in Predictive Modeling.
• There are two limitations:
  – If Y is categorical, rare cases may appear in only one of the two subsets.
  – Data is sacrificed (from the model) to form a testset.
Train – Test Split
(Stratified version)
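A stratified split can be sketched in plain Python. This is a minimal illustration (the function name and toy labels are hypothetical); in practice libraries such as scikit-learn provide `train_test_split(..., stratify=y)`:

```python
import random

def stratified_split(labels, test_frac=0.3, seed=42):
    """Split indices into train/test while preserving the label mix,
    so rare categories of Y appear in both subsets."""
    rng = random.Random(seed)
    by_label = {}
    for i, y in enumerate(labels):
        by_label.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))  # at least 1 per class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["pass"] * 8 + ["fail"] * 2          # "fail" is the rare case
train_idx, test_idx = stratified_split(labels)
```

Both subsets now contain at least one "fail" observation, which a plain random split cannot guarantee.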
10-fold Cross Validation
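The fold bookkeeping behind k-fold cross validation can be sketched in plain Python (the function name is illustrative; libraries such as scikit-learn provide `KFold` for this):

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Yield (train, test) index pairs for k-fold cross validation.
    Every observation lands in the test fold exactly once, so no data
    is permanently sacrificed to form a testset."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]         # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold_indices(100, k=10))
```

Averaging a metric such as RMSE over the 10 test folds gives a more stable performance estimate than a single train-test split.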
Model Overfitting
Common Model Performance Metrics

Predict a Continuous Target Variable Y:
• RMSE (Root Mean Square Error)
• MAPE (Mean Absolute Prediction Error)
• Mean Directional Accuracy (MDA)

Predict a Categorical Variable Y:
• Confusion Matrix
• False Positive Rate
• False Negative Rate
RMSE – A popular metric to compute model prediction error on a continuous Y variable
Netflix used RMSE in their US$1 mil prize. Right/Wrong? What’s the implication?
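As a concrete sketch (toy numbers), RMSE squares each error before averaging, so one large miss is penalised more heavily than several small ones — the crux of the Netflix question:

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error over a dataset of n observations."""
    sq_errors = [(y - yhat) ** 2 for y, yhat in zip(actual, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

perfect = rmse([3, 5, 7], [3, 5, 7])      # zero error on every obs -> 0.0
off_by_2 = rmse([10, 20], [12, 18])       # every error has magnitude 2 -> 2.0
```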
Compare Different Models’ Performance
• The lower the RMSE on a testset (or the average over 10-fold CV), the better the model performance.
Watch Pre-class Lecture Videos 6.x or read main textbook Chapter 6.
Straight Line Equation

ŷ is typically used as an estimate of Y.

e ~ N(0, σ): Errors (aka Residuals) follow a Normal Distribution with mean 0 and a constant standard deviation.
Day 2 (Part 2a)
MULTICOLLINEARITY
Pre-Class Activity: Did Exercise 2.1 at Home
Multi-Collinearity
• When an X can be “easily explained” using all the other Xs in the model.
  – Example: A linear combination of X1, X3, X4 can explain 91% of X2.
• Why do you still need that X in the model?
• Creates instability in model coefficients, i.e. high variance in the model coefficient of that X.
Example of Multi-Collinearity: Predict Weight of Growing Child

Y = 4X1
Y = 2X1 + 2X2
Y = 10X1 − 6X2
Y = 1000X2 − 996X1
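The four coefficient sets above become indistinguishable when X2 is perfectly collinear with X1. A quick numerical check (assuming, purely for illustration, that X2 always equals X1):

```python
# Four very different coefficient sets for predicting the child's weight:
models = [
    lambda x1, x2: 4 * x1,
    lambda x1, x2: 2 * x1 + 2 * x2,
    lambda x1, x2: 10 * x1 - 6 * x2,
    lambda x1, x2: 1000 * x2 - 996 * x1,
]

# When X2 == X1 on every observation, all four predict identically, so
# the data cannot pin the coefficients down -- the "instability" (high
# variance in model coefficients) described on the earlier slide.
for x1 in [1, 2, 5, 10]:
    x2 = x1                                # perfectly collinear predictor
    predictions = [m(x1, x2) for m in models]
    assert len(set(predictions)) == 1      # all four models agree exactly
```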
Is multicollinearity to be avoided?
• No.
• Model Performance may still be very good.
• But do not interpret the model coefficients the standard way if your model is multi-collinear.
Day 2 (Part 2b)
Watch Pre-class Lecture Videos 7.x or read main textbook Chapter 7.
Day 2 (Part 3)
• Xs are unrestricted.
• Model output Ŷ can be any value within a reasonable range.
• But what if Y is a categorical variable?
• Has Disease X or not; Approve/Reject Loan Application; Pass/Fail; Very Happy/Happy/Neutral/Sad/Very Sad; A/B/C/D/E/F; Red/Green/Blue, …
Logistic Regression Model for Categorical Y is a 2-step process.
• Step 1: Find a function that takes the linear equation as input and outputs a probability.
• Step 2: Compare the probability against a threshold to classify Y = cat1 or Y = cat0. For binary outcomes Y, a popular choice of threshold is 50%.
What is the Logistic Function?
Logistic Function output is between 0 and 1

f(x) = 1 / (1 + e^(−x))

• Accepts any value for x.
• Output is between 0 and 1.
• Hence the Logistic function f(x) can be interpreted as a probability.
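A minimal sketch of the logistic function in Python:

```python
import math

def logistic(x):
    """Logistic function f(x) = 1 / (1 + e^(-x)): maps any real x into (0, 1)."""
    return 1 / (1 + math.exp(-x))

mid = logistic(0)        # 0.5: the midpoint of the S-curve
high = logistic(10)      # close to 1 for large positive x
low = logistic(-10)      # close to 0 for large negative x
```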
Logistic Function with multiple Xs

f(z) = 1 / (1 + e^(−z))

• Let z = b0 + b1·x1 + b2·x2 + … + bm·xm
• Model coefficients are optimised to fit the data.
• Xs affect P(Y = cat1) via the model coefficients.
Logistic Regression Model for Categorical Y is a 2-step process.

X1, X2, X3, …, Xm → Linear Equation → Logistic Function → classify Y = cat1 or Y = cat0

Step 1 (Linear Equation): z = b0 + b1·x1 + b2·x2 + … + bm·xm
Step 2 (Logistic Function): P(Y = cat1) = 1 / (1 + e^(−z))

• The Logistic Function takes the linear equation as input and outputs P(Y = cat1).
• P(Y = cat0) = 1 − P(Y = cat1)
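The two steps can be sketched end to end. The coefficient and input values below are hypothetical, not from any fitted model:

```python
import math

def predict(xs, coefs, intercept, threshold=0.5):
    """Two-step logistic-regression prediction for binary Y."""
    # Step 1: linear equation z = b0 + b1*x1 + ... + bm*xm
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    # Step 2: logistic function gives P(Y = cat1); compare to threshold
    p_cat1 = 1 / (1 + math.exp(-z))
    return ("cat1" if p_cat1 >= threshold else "cat0"), p_cat1

label, p = predict(xs=[2.0, 1.0], coefs=[1.5, -0.5], intercept=-1.0)
# z = -1.0 + 3.0 - 0.5 = 1.5, so P(Y = cat1) is about 0.82 -> classify as cat1
```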
Measuring Model Prediction Errors
If Y is continuous, the model prediction error can be calculated by considering:
• For each obs: Error = Actual Y value − Model Predicted Y
• Over the entire dataset with n obs: RMSE.
Confusion Matrix
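A confusion matrix and the two error rates can be computed from predictions as follows (a minimal sketch on toy data):

```python
def confusion_matrix(actual, predicted, positive="cat1"):
    """Tally true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for y, yhat in zip(actual, predicted):
        if yhat == positive:
            counts["TP" if y == positive else "FP"] += 1
        else:
            counts["FN" if y == positive else "TN"] += 1
    return counts

actual    = ["cat1", "cat1", "cat0", "cat0", "cat1"]
predicted = ["cat1", "cat0", "cat0", "cat1", "cat1"]
cm = confusion_matrix(actual, predicted)

# The two error rates from the metrics slide:
fpr = cm["FP"] / (cm["FP"] + cm["TN"])   # False Positive Rate
fnr = cm["FN"] / (cm["FN"] + cm["TP"])   # False Negative Rate
```

Raising or lowering the classification threshold shifts predictions between the two error types, which is how the threshold controls the error tradeoff.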
Odds of Event A is defined in terms of P(A)

O(A) ≡ P(A) / (1 − P(A))
Example: Probability & Odds of Heart Attack

P(Y = 1) = 1 / (1 + e^(−z))

Odds(Y = 1) ≡ P(Y = 1) / (1 − P(Y = 1)) = [1 / (1 + e^(−z))] ÷ [e^(−z) / (1 + e^(−z))] = e^z
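The simplification to e^z can be verified numerically for any z:

```python
import math

def odds(p):
    """Odds of an event: O(A) = P(A) / (1 - P(A))."""
    return p / (1 - p)

for z in [-2.0, 0.0, 1.5, 3.0]:
    p = 1 / (1 + math.exp(-z))                  # logistic probability P(Y = 1)
    assert abs(odds(p) - math.exp(z)) < 1e-9    # the odds simplify to e^z
```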
Odds Ratio for each predictor

For each continuous X variable:
Odds Ratio(Y = 1) = Odds(Y = 1 if X increases by 1 unit) / Odds(Y = 1 if X is status quo) = e^coef

For each categorical X variable (dummy level B vs baseline A):
Odds Ratio(Y = 1) = Odds(Y = 1 if X is B) / Odds(Y = 1 if X is baseline A) = e^coef
Request “Proof of Relationship between Logistic Reg Model Coef and Odds Ratio.PDF” from the instructor if you are interested in the proof.
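With hypothetical coefficients for a one-predictor model, one can check numerically that increasing a continuous X by 1 unit multiplies the odds by e^coef, regardless of the starting value of X:

```python
import math

b0, b1 = -1.0, 0.8      # hypothetical intercept and coefficient for X

def odds_y1(x):
    """Odds(Y = 1) for a one-predictor logistic model equals e^(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)

# Odds Ratio: odds after a 1-unit increase vs the status quo.
# It equals e^b1 no matter which value of X we start from:
for x in [0.0, 4.0, 9.5]:
    assert abs(odds_y1(x + 1) / odds_y1(x) - math.exp(b1)) < 1e-9
```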
Identifying the Risk Factors for Y to be cat1
• Two equivalent “tests”:
  – Which X variable has p-value < 5%?
  – Which X variable has an Odds Ratio 95% Confidence Interval excluding 1?
Hours Studying is a risk factor in Pass/Fail Exam (from p-value)
What’s special about Odds Ratio = 1?

Odds(Y = cat1 if B happens) / Odds(Y = cat1 if A happens) = 1

B leaves the odds of Y = cat1 unchanged relative to A, so X is not a risk factor.
What if Odds Ratio > 1?

Odds(Y = cat1 if B happens) / Odds(Y = cat1 if A happens) > 1

B increases the odds of Y = cat1 relative to A: X is a risk factor for cat1.
What if Odds Ratio < 1?

Odds(Y = cat1 if B happens) / Odds(Y = cat1 if A happens) < 1

B decreases the odds of Y = cat1 relative to A: X is protective against cat1.
The events A and B depend on the type of X
• If X is categorical, then dummy variables are created, and A is always the baseline level.
• If X is continuous, then A is the status quo and B is a 1-unit increase in X.
Odds Ratio for the predictor in passexam.csv
Quantifying Risk Factor with Odds Ratio = e^coef
Discuss Solution to Pre-Class Ex 2.3 Part B.
Day 2 (Part 4)
MULTI-CATEGORICAL Y
What if Y has 3 or more categorical outcomes?
• A/B/C/D/E
• Pass/borderline Pass/Fail
• 0/1/2
Discuss Solution to Pre-Class Ex 2.3 Part C.
Summary
• Categorical Y prediction can be achieved by
  – using the logistic function on a linear combination of Xs.
  – comparing the logistic function output against a threshold.
• Good habit to check the levels of the Y variable to avoid misinterpreting the software output.
• Confusion Matrix shows the performance (both correct and wrong predictions) of the logistic regression model.
• Changing the threshold can change the error tradeoffs.
Quiz 2
Reflection on your Learning
Q1: What is the most important thing that you learned today?
Q2: What is still confusing or difficult for you?
The End of Day 2
ANY QUESTIONS?
REMEMBER TO COMPLETE PRE-CLASS ACTIVITIES BEFORE DAY 3 CLASS (SEE CHECKLIST 3)