AST Day 2 Slides (uploaded to Scribd by Joel Lim)

Analytics Strategy and

Techniques (Day 2)

Neumann Chew C. H.
ITOM, Nanyang Business School.
[email protected]

Updated: 26 Aug 2023


Day 2 Schedule

2
Day 2 (Part 1a)

THE USEFULNESS OF A MODEL


3
Models
• Plan for the models that will be
tested.
• Some models have special
performance metrics in-built.
• Some models need extra code to
compute specific metrics.
• Models learnt in this course
– Linear Regression
– Logistic Regression
– CART
– Random Forest
4
Models* on the Explainability Scale

Linear Regression → Quantile Regression → Logistic Regression → CART → MARS → Neural Network → Deep Learning

Highest Explainability Power (White Box) … Lowest Explainability Power (Black Box)

*: Selected list of models on the Explainability Scale.

Source: Chew C.H. (2021) Artificial Intelligence, Analytics and Data Science Vol. 1 Core Concepts and Models, Chapter 2, Cengage.
The concept of a model is useful as it allows Xs.

X1, X2, X3, …, Xk → Model → Ŷ

Error = Y − Ŷ

Example: Predicting housing price. What is Y? What is Ŷ? What are the Xs?
6
Model Complexity

• The size of the model (e.g. the number of model parameters).
• The number of X variables.
• The greater the complexity, the lower the error on the training dataset.
• Should we be happy if we reach zero error on the dataset?
• Scenario: an investment company selecting which stock to buy using an ML model.
7
Train-Test Split

Source: Chew C.H. (2021) Artificial Intelligence, Analytics and Data Science Vol. 1 Core Concepts and Models, Chapter 2, Cengage.
Industry Standard Practice
• The Train-Test split is the industry standard for
ML/AI/Analytics practice in Predictive Modeling.
• There are two limitations:
– If Y is categorical, rare cases may appear in only one of the two
subsets.
– Data is sacrificed (from the model) to form a testset.

9
Train – Test Split
(Stratified version)

Source: Chew C.H. (2021) Artificial Intelligence, Analytics and Data Science Vol. 1 Core Concepts and Models, Chapter 2, Cengage.
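The course demonstrates the split in R (lecture videos 6.x); purely as a language-neutral sketch, here is a minimal stratified train-test split in stdlib Python. The function name and the 90/10 class mix are illustrative choices, not from the course materials:

```python
import random

def stratified_split(labels, test_frac=0.3, seed=42):
    """Split row indices so each class keeps roughly the same
    proportion in both the trainset and the testset."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for y, idx in by_class.items():
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))  # at least 1 per class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["yes"] * 90 + ["no"] * 10   # rare class: only 10% "no"
train, test = stratified_split(labels, test_frac=0.3)
```

Stratifying per class addresses the rare-category limitation noted on the previous slide: the rare "no" class is guaranteed to appear in both subsets.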
10-fold Cross Validation

Source: Chew C.H. (2021) Artificial Intelligence, Analytics and Data Science Vol. 1 Core Concepts and Models, Chapter 2, Cengage.
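The mechanics of 10-fold CV can be sketched in a few lines of stdlib Python (the helper name is illustrative): every observation lands in exactly one test fold, and the other nine folds form the trainset each time.

```python
import random

def kfold_indices(n, k=10, seed=1):
    """Return k (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k roughly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds if f is not folds[i] for j in f]
        splits.append((train, test))
    return splits

splits = kfold_indices(100, k=10)
```

A model would be fitted k times, once per split, and the k test-fold errors averaged.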
Model Overfitting

Source: Chew C.H. (2021) Artificial Intelligence, Analytics and Data Science Vol. 1 Core Concepts and Models, Chapter 2, Cengage.
Common Model Performance Metrics

Predict a Continuous Target Variable Y:
• RMSE (Root Mean Square Error)
• MAPE (Mean Absolute Percentage Error)
• Mean Directional Accuracy (MDA)

Predict a Categorical Variable Y:
• Confusion Matrix
• False Positive Rate
• False Negative Rate
13
RMSE – A popular metric to compute model
prediction error on a continuous Y variable

1. Why is this a good metric?

2. Can we use RMSE for Categorical Y variable? Explain.

3. Netflix used RMSE in their US$1 mil prize. Right/Wrong? What’s the implication?
14
Compare Different Models’ Performance
• The lower the RMSE on a testset (or averaged over the 10 folds of CV), the better the model performance.

15
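RMSE is one line of Python. The two "models" below are just hard-coded prediction vectors for illustration; in practice they would come from two fitted models scored on the same testset.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error over a testset."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

y_test  = [10.0, 12.0, 15.0]
model_a = [11.0, 11.0, 16.0]   # errors of -1, +1, -1
model_b = [13.0,  9.0, 18.0]   # errors of -3, +3, -3
rmse_a, rmse_b = rmse(y_test, model_a), rmse(y_test, model_b)
# Lower RMSE on the same testset => better predictive performance (model A here).
```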
Watch Pre-class Lecture Videos 6.x or read main textbook Chapter 6.

Day 2 (Part 1b)

CONTINUOUS Y VARIABLE AND


LINEAR REGRESSION
16
Review of the Linear Regression Model

y = b0 + b1·x1 + b2·x2 + ⋯ + bm·xm + e,   where e ~ N(0, σ)

• The right-hand side (without e) is a straight-line equation in the Xs.
• ŷ (y-hat) is typically used as an estimate of y.
• Errors (aka residuals) follow a Normal distribution with mean 0 and a constant standard deviation.
17
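To make the coefficient/residual relationship concrete, here is a hand-rolled ordinary-least-squares fit for the simple one-predictor case (the data is invented for illustration; the course uses R's lm() for the general case):

```python
def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1*x (one predictor)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]          # roughly y = 2x with noise
b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# OLS residuals sum to zero by construction (mean 0, as the slide states).
```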
Day 2 (Part 2a)

MULTICOLLINEARITY
18
Pre-Class Activity: Did Exercise 2.1 at Home

• Discuss Solution to Exercise 2.1.PDF


• The lm() function generates the linear regression model (see lecture
videos 6.x)
– Also shows how to do train-test split and compute RMSE.
• There is something strange in the linear regression model results in
Q3. Did you detect it?

19
Multi-Collinearity
• When an X can be “easily explained” using all the other Xs
in the model.
– Example: A linear combination of X1, X3, X4 can explain 91% of X2.
• Why do you still need that X in the model?
• Multicollinearity creates instability in the model coefficients, i.e. high variance in the coefficient of that X.

20
Example of Multicollinearity: Predict the Weight of a Growing Child

Ŷ = 4X1
Ŷ = 2X1 + 2X2
Ŷ = 10X1 − 6X2
Ŷ = 1000X2 − 996X1

• Many models can be used to predict Y when the X variables are collinear.
• Each model is equally accurate, yet the coefficients differ wildly.
21
Variance Inflation Factor (VIF) to Detect Multicollinearity

• Given a linear regression model with only continuous predictors, the variance inflation factor of Xj is

  VIF_j = 1 / (1 − R_j²)

• R_j² is the R² value of a linear regression of Xj ~ all other Xs in the model.
• Two popular cut-offs to conclude that Xj is collinear:
  – VIF > 5 (implies R² > 80%)
  – VIF > 10 (implies R² > 90%)
• VIF is for continuous-Xs-only models. Use the adjusted Generalised VIF if the model includes a categorical X.
  – Adj.GVIF is shown in the last column of the vif() output from the R package car.
• Two popular cut-offs using Adj.GVIF:
  – Adj.GVIF > sqrt(5) ≈ 2.24
  – Adj.GVIF > sqrt(10) ≈ 3.16
22
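The slide's vif() refers to the R package car. Purely to illustrate the formula VIF = 1/(1 − R²), here is a hand-rolled Python sketch for the special case of exactly two continuous predictors, where R_j² is just the R² of one predictor regressed on the other (the data is invented and deliberately collinear):

```python
def r_squared(x, y):
    """R^2 of the simple regression y ~ x (squared correlation)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def vif_two_predictors(x1, x2):
    """With exactly two predictors, VIF of each is 1 / (1 - R^2(x1, x2))."""
    return 1.0 / (1.0 - r_squared(x1, x2))

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]   # nearly 2 * x1: highly collinear
vif = vif_two_predictors(x1, x2)       # far above both cut-offs (5 and 10)
```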
Is multicollinearity to be avoided?
• No.
• Model Performance may still be very good.
• But do not interpret the model coefficients the standard
way if your model is multi-collinear.

23
Day 2 (Part 2b)

CATEGORICAL X VARIABLES AND


DUMMY VARIABLES
24
In-Class Activity: Do Exercise 2.2
Est: 20 mins.
• Clarify why we need dummy variables.
• Instructor shows how to do Q1 – Q3.
• Do (as much as you can) the remaining questions in Exercise
2.2.PDF

25
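In R, lm() creates dummy variables automatically from factor columns. To make the encoding concrete, here is an illustrative Python sketch (function name and data are mine): a categorical X with k levels becomes k − 1 dummy columns, and the baseline level is the all-zeros row.

```python
def make_dummies(values, baseline):
    """One 0/1 dummy column per non-baseline level; the baseline
    level is encoded as all zeros (k levels -> k-1 dummies)."""
    levels = sorted(set(values) - {baseline})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

region = ["North", "South", "East", "North", "East"]
dummies = make_dummies(region, baseline="North")
```

Each dummy's coefficient is then interpreted relative to the baseline level.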
Watch Pre-class Lecture Videos 7.x or
read main textbook Chapter 7.

Day 2 (Part 3)

CATEGORICAL Y VARIABLE AND


LOGISTIC REGRESSION
26
Linear Regression Model for Continuous Y

X1, X2, X3, …, Xm → Model: b0 + b1·x1 + b2·x2 + ⋯ + bm·xm → Ŷ

• Xs are unrestricted.
• Model output Ŷ can be any value within a reasonable range.
• But what if Y is a categorical variable?
  – Has Disease X or not; Approve/Reject loan application; Pass/Fail;
  – Very Happy/Happy/Neutral/Sad/Very Sad; A/B/C/D/E/F; Red/Green/Blue, …
27
Logistic Regression Model for Categorical Y is a 2-step process.

X1, X2, X3, …, Xm → Linear equation: b0 + b1·x1 + b2·x2 + ⋯ + bm·xm → a function based on the Xs that can be used as a probability P(Y = cat1) → Ŷ = cat1 or Ŷ = cat0

• Find a function that takes the linear equation as input and outputs a probability.
• Compare the probability against a threshold to decide Ŷ = cat1 or Ŷ = cat0.
• For binary outcomes Y, a popular choice of threshold is 50%.
28
What is the Logistic Function?

• Linear function: f(x) = 2 + 3x
• Quadratic function: f(x) = 4x² + 2x − 2
• Logistic function: f(x) = 1 / (1 + e^(−x))
29
Logistic Function Output is Between 0 and 1

f(x) = 1 / (1 + e^(−x))

• Accepts any value for x.
• Output is between 0 and 1.
• Hence the logistic function f(x) can be interpreted as a probability.
30
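These two properties are easy to verify numerically (a minimal sketch; the probe values are arbitrary):

```python
import math

def logistic(x):
    """Logistic function: maps any real x into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

probe = [logistic(x) for x in (-10, -1, 0, 1, 10)]
# Accepts any x; every output lies strictly between 0 and 1,
# and the function is monotonically increasing.
```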
Logistic Function with Multiple Xs

f(z) = 1 / (1 + e^(−z)), where z = b0 + b1·x1 + b2·x2 + ⋯ + bm·xm

• Model coefficients are optimised to fit the data.
• The Xs affect P(Y = cat1) via the model coefficients.
31
Logistic Regression Model for Categorical Y is a 2-step process.

X1, X2, X3, …, Xm → Linear equation: z = b0 + b1·x1 + b2·x2 + ⋯ + bm·xm → Logistic function: P(Y = cat1) = 1 / (1 + e^(−z)) → Ŷ = cat1 or Ŷ = cat0

• The logistic function takes the linear equation as input and outputs P(Y = cat1).
• P(Y = cat0) = 1 − P(Y = cat1).
32
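The full 2-step prediction fits in a few lines. The coefficients below are invented for illustration; in practice a fitted logistic regression model would supply them.

```python
import math

def predict_category(xs, coefs, intercept, threshold=0.5):
    """Step 1: linear equation z = b0 + sum(b_i * x_i).
    Step 2: logistic function gives P(Y = cat1); compare to threshold."""
    z = intercept + sum(b * x for b, x in zip(coefs, xs))
    p = 1.0 / (1.0 + math.exp(-z))
    return ("cat1" if p >= threshold else "cat0"), p

# Hypothetical fitted model: z = -1.0 + 1.2*x1 - 0.5*x2
label, p = predict_category([2.0, 1.0], coefs=[1.2, -0.5], intercept=-1.0)
```

Raising the threshold above P(Y = cat1) flips the same observation to cat0, which is how the error tradeoffs mentioned in the summary are tuned.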
Measuring Model Prediction Errors
If Y is continuous, the model prediction error can be
calculated by considering:
• For each obs: Error = Actual Y value – Model Predicted Y
• Over the entire dataset with n obs: RMSE.

If Y is binary (cat0 or cat1), then there are only two possible prediction errors for each obs:
• Model predicted Y = 1, but actually Y = 0. i.e. False Positive.
• Model predicted Y = 0, but actually Y = 1. i.e. False Negative.
• Over the entire dataset with n obs: Confusion Matrix.

33
Confusion Matrix

                     Predicted No   Predicted Yes
Actual No  (n = 60)       50              10
Actual Yes (n = 105)       5             100

• A confusion matrix compares the model's predicted values against the actual data values.
• The main-diagonal values (50 and 100) represent the number of correct model predictions.
• The off-diagonal values (10 and 5) represent the number of wrong model predictions.
34
Confusion Matrix

• True Positive rate = TP/Actual Yes = 100/105. Aka Sensitivity or Recall.


• False Positive rate = FP/Actual No = 10/60. Aka Type 1 error.
• True Negative rate = TN/Actual No = 50/60. Aka Specificity.
• False Negative rate = FN/Actual Yes = 5/105. Aka Type 2 error.
• Overall Accuracy = 150 / 165; Overall Error = 15 / 165.
35
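These rates can be checked directly from the slide's counts (TP = 100, FN = 5, FP = 10, TN = 50); a small Python sketch:

```python
def confusion_rates(tp, fn, fp, tn):
    """Standard rates derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # True Positive rate (Recall)
        "fpr":         fp / (fp + tn),   # False Positive rate (Type 1 error)
        "specificity": tn / (fp + tn),   # True Negative rate
        "fnr":         fn / (tp + fn),   # False Negative rate (Type 2 error)
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
    }

rates = confusion_rates(tp=100, fn=5, fp=10, tn=50)
# Matches the slide: sensitivity 100/105, FPR 10/60, accuracy 150/165, etc.
```

Note that sensitivity + FNR = 1 and specificity + FPR = 1, so the four rates carry only two independent numbers.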
Discuss Solution to Pre-Class Ex 2.3 Part A.

• Q1 reveals the main weakness of logistic regression.


• Q2 reveals a common misinterpretation of the model coefficient.

36
Odds of Event A is Defined in Terms of P(A)

Odds(A) ≡ P(A) / (1 − P(A))

• Typically expressed as two integers: a numerator and a denominator (e.g. "1 to 3").
37
Example: Probability & Odds of Heart Attack

• Event A: Heart Attack


• If P(A) = 0.25, what is the Odds(A)?
Odds (A) = 0.25/(1-0.25) = 1/3
Odds of A is 1 to 3.

• If P(A) = 0.75, what is the Odds(A)?


Odds(A) = 0.75/(1-0.75) = 3/1
Odds of A is 3 to 1.
38
Odds if P(Y = 1) is a Logistic Function

z = b0 + b1·X1 + b2·X2 + ⋯ + bm·Xm

P(Y = 1) = 1 / (1 + e^(−z))

Odds(Y = 1) ≡ P(Y = 1) / (1 − P(Y = 1)) = [1 / (1 + e^(−z))] ÷ [e^(−z) / (1 + e^(−z))] = e^z
39
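A quick numeric check of the slide examples and of the derivation above (the value z = 0.7 is arbitrary):

```python
import math

def odds_from_probability(p):
    """Odds(A) = P(A) / (1 - P(A))."""
    return p / (1.0 - p)

# Slide examples: P = 0.25 gives odds of 1 to 3; P = 0.75 gives odds of 3 to 1.
# Derivation check: if P(Y=1) is logistic in z, then Odds(Y=1) = e^z.
z = 0.7
p = 1.0 / (1.0 + math.exp(-z))
odds = odds_from_probability(p)
```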
Odds Ratio for Each Predictor

For each continuous X variable:

Odds Ratio(Y = 1) = Odds(Y = 1 if X increases by 1 unit) / Odds(Y = 1 if X is at the status quo) = e^coef

For each categorical X variable:

Odds Ratio(Y = 1) = Odds(Y = 1 if X is B) / Odds(Y = 1 if X is the baseline A) = e^coef

Request "Proof of Relationship between Logistic Reg Model Coef and Odds Ratio.PDF" from the instructor if you are interested in the proof.
40
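A numeric check that the odds ratio for a 1-unit increase in a continuous X equals e^coef, regardless of the starting value of X (the coefficients b0, b1 and the value x are invented for illustration):

```python
import math

def odds(z):
    """Odds(Y=1) when P(Y=1) is logistic in z; algebraically equals e^z."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p / (1.0 - p)

b0, b1 = -2.0, 0.8   # hypothetical fitted intercept and coefficient of X
x = 1.5              # hypothetical status-quo value of X
odds_ratio = odds(b0 + b1 * (x + 1)) / odds(b0 + b1 * x)
# Equals e^b1 no matter what x is, because e^(z+b1) / e^z = e^b1.
```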
Identifying the Risk Factors for Y to be cat1
• Two equivalent “tests”
– Which X variable has p-value < 5%
– Which X variable has Odds Ratio 95% Confidence
Interval excluding 1.

41
Hours Studying is a risk factor in
Pass/Fail Exam (from p-value)

• Its p-value is less than 5% (actual p-value = 1.67%).


• Hours studying has a positive association with passing the exam.
– Coefficient is positive.
42
Hours Studying is a risk factor in Pass/Fail Exam
(from Odds Ratio 95% Confidence Interval)

• Odds Ratio 95% CI excludes 1.


• Statistical conclusion is the same as using p-value.

43
What’s special about Odds Ratio = 1?
𝑂𝑑𝑑𝑠 𝑜𝑓 𝑌 = 𝑐𝑎𝑡1 𝑖𝑓 𝐵 ℎ𝑎𝑝𝑝𝑒𝑛𝑠
=1
𝑂𝑑𝑑𝑠 𝑜𝑓 𝑌 = 𝑐𝑎𝑡1 𝑖𝑓 𝐴 ℎ𝑎𝑝𝑝𝑒𝑛𝑠

• Odds of getting Y = cat1 is the same regardless of A or B.


• A or B does not affect Y.

44
What if Odds Ratio > 1?
𝑂𝑑𝑑𝑠 𝑜𝑓 𝑌 = 𝑐𝑎𝑡1 𝑖𝑓 𝐵 ℎ𝑎𝑝𝑝𝑒𝑛𝑠
>1
𝑂𝑑𝑑𝑠 𝑜𝑓 𝑌 = 𝑐𝑎𝑡1 𝑖𝑓 𝐴 ℎ𝑎𝑝𝑝𝑒𝑛𝑠

• Odds of getting Y = cat1 is higher if B occurs compared to A.


• Odds is related to probability. The higher the probability, the
higher the Odds, and vice versa.

45
What if Odds Ratio < 1?
𝑂𝑑𝑑𝑠 𝑜𝑓 𝑌 = 𝑐𝑎𝑡1 𝑖𝑓 𝐵 ℎ𝑎𝑝𝑝𝑒𝑛𝑠
<1
𝑂𝑑𝑑𝑠 𝑜𝑓 𝑌 = 𝑐𝑎𝑡1 𝑖𝑓 𝐴 ℎ𝑎𝑝𝑝𝑒𝑛𝑠

• Odds of getting Y = cat1 is lower if B occurs compared to A.

46
The events A and B depend on the type of X
• If X is categorical, then dummy variables are
created, and A is always the baseline level.
• If X is continuous, then A is the status quo and B
is a 1 unit increase in X.

47
Odds Ratio for the predictor in passexam.csv

• If a student studies for 1 more hour, what do you expect to happen to his odds of passing the exam?
1. Equals 1
2. More than 1
3. Less than 1

48
Quantifying Risk Factor with Odds Ratio = ecoef

• For predictor Hours: Odds Ratio = e1.5046 ≈ 4.5


• If student studies for 1 more hour, odds of passing
the exam increases by a factor of 4.5.
49
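Checking the arithmetic (the coefficient 1.5046 is the fitted value for Hours from the slide's model output):

```python
import math

odds_ratio = math.exp(1.5046)   # e^coef for the Hours predictor; about 4.5
```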
Identifying the Risk Factors for Y to be cat1
• Two equivalent “tests”
– Which X variable has p-value < 5%
– Which X variable has Odds Ratio 95% Confidence Interval
excluding 1.

50
Discuss Solution to Pre-Class Ex 2.3 Part B.

• Shows that Odds Ratios alone are sufficient to identify high-risk factors.

51
Day 2 (Part 4)

MULTI-CATEGORICAL Y
52
What if Y has 3 or more categorical outcomes?

• A/B/C/D/E
• Pass/borderline Pass/Fail
• 0/1/2

• We will only need to study the 3-outcome scenario, as the structure is similar for more than 3 outcomes.
• First, define the baseline level for Y e.g. Y = 0.
• Then Y = 1 is compared to baseline,
• and Y = 2 is compared to baseline, …etc.

53
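The baseline-category structure described above can be sketched for a 3-level Y: one linear equation per non-baseline level, each compared against Y = 0. The coefficients and the x value below are invented for illustration:

```python
import math

def multinomial_probs(x, coefs):
    """Baseline-category logit for a 3-level Y (levels 0, 1, 2).
    Level 0 is the baseline; one linear equation per other level."""
    z1 = coefs[1][0] + coefs[1][1] * x   # log-odds of Y=1 vs baseline Y=0
    z2 = coefs[2][0] + coefs[2][1] * x   # log-odds of Y=2 vs baseline Y=0
    denom = 1.0 + math.exp(z1) + math.exp(z2)
    return [1.0 / denom, math.exp(z1) / denom, math.exp(z2) / denom]

# Hypothetical coefficients: (intercept, slope) per non-baseline level.
probs = multinomial_probs(2.0, {1: (-1.0, 0.5), 2: (0.2, -0.3)})
```

With these numbers, z1 = −1.0 + 0.5·2.0 = 0, so the odds of Y = 1 versus the baseline equal e^0 = 1, i.e. P(Y = 1) = P(Y = 0) at x = 2.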
Discuss Solution to Pre-Class Ex 2.3 Part C.

• For 3 categorical Y, 2 linear equations and hence 2 logistic functions


are produced.
• Use Odds Ratios to determine statistical significance if p-values are not shown.

54
Summary
• Categorical Y prediction can be achieved by
– using logistic function on a linear combination of Xs.
– Comparing the logistic function against a threshold.
• Good habit to check the levels of the Y variable to avoid
misinterpreting the software output.
• Confusion Matrix shows the performance (both correct
and wrong predictions) of the logistic regression model.
• Changing the threshold can change the error tradeoffs.
55
Quiz 2

56
Reflection on your Learning

Q1: What is the most important thing that you learned today?

Q2: What is still confusing or difficult for you?

57
The End of Day 2
ANY QUESTIONS?
REMEMBER TO COMPLETE PRE-CLASS ACTIVITIES
BEFORE DAY 3 CLASS (SEE CHECKLIST 3)
58
