CPSC 4830 2025summer Lecture 3

The document covers the fundamentals of regression analysis, focusing on linear and logistic regression, including their formulas and assumptions. It discusses the importance of various tests for model validation, such as linearity, normality, and homoscedasticity, and provides guidance on how to address failures in these tests. Additionally, it highlights key metrics like p-value and R-squared, and offers practical examples for applying regression techniques in data analytics.


CPSC 4830

Data Mining for Data Analytics


Lecture 3
Regression
2 types:
1. Linear Regression (predicts values)

Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk

2. Logistic Regression (predicts class labels)

Linear Regression
Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Terminology:
True value (Y) = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Predicted value (Ŷ) = β0 + β1X1 + β2X2 + β3X3 + … + βkXk : Dependent Variable (DV)
Input variables (Xi) : Independent Variables (IV)
Note: Xi always refers to one IV, and X refers to ALL IVs
Assumptions:
1. Linearity: Each Xi has a linear relationship with the mean of Y
2. Normality: For any fixed value of X (all Xi fixed), Y is normally distributed
3. Homoscedasticity: The variance of the residual (error) is the same for any value of X
4. Independence: All Xi are independent of each other
Linear Regression: Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Ŷ = β0 + β1X1 + β2X2 + β3X3 + … + βkXk
Before building the model:
1. Linearity: Plot a scatter plot of Y against every Xi; transform or remove variables that are not linear
2. Independence: Compute the correlation matrix among X, or use the VIF (Variance Inflation Factor)
After building the model:
3. Normality: Check the residuals with a K-S test or Q-Q plot
4. Homoscedasticity: Q-Q plot the residuals (errors), or plot the residuals against ŷ, or against each Xi
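The VIF check in step 2 can be sketched with plain NumPy; the `vif` helper and the synthetic columns below are illustrative, not from the lecture:

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (n samples x k features).

    VIF_i = 1 / (1 - R_i^2), where R_i^2 comes from regressing X_i on the
    remaining features (plus an intercept). VIF near 1 means the feature is
    nearly independent of the others; values above ~5-10 flag collinearity.
    """
    n, k = X.shape
    out = []
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])    # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)               # independent of x1
x3 = x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1
X = np.column_stack([x1, x2, x3])
print([round(v, 1) for v in vif(X)])
```

The near-duplicate pair (x1, x3) shows up as huge VIFs, while the genuinely independent x2 stays near 1.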
What if the above tests fail?


5. Linearity: Consider higher-order terms or a transformation, then check again
6. Independence: Group the correlated features together, e.g., by PCA or averaging
7. Normality: Apply a data transformation or use robust regression
8. Homoscedasticity: Check the residual pattern and apply a matching remedy, e.g., a funnel shape -> transform Y; a non-linear pattern -> weighted least squares
Linear Regression: Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Ŷ = β0 + β1X1 + β2X2 + β3X3 + … + βkXk
The values we care about most:
1. p-value for each Xi: whether to keep it or drop it
2. R-squared: how much variance is explained
3. Slope and intercept of the model
Note: other values are useful too
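Where the slope, intercept, and R-squared come from can be sketched in closed form for a single IV (the toy x/y data below are made up; p-values additionally need the t-distribution, which libraries such as statsmodels report for you):

```python
import numpy as np

# Simple (one-IV) linear regression fitted in closed form, showing where
# the slope, intercept, and R-squared in a regression summary come from.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.1, 5.9, 8.2, 9.9])

slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # beta_1
intercept = y.mean() - slope * x.mean()                  # beta_0
y_hat = intercept + slope * x                            # predicted values
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot   # share of variance explained

print(round(slope, 3), round(intercept, 3), round(r_squared, 4))
```

A high R-squared here just says the line explains almost all of the variance in this toy data; it says nothing about the assumption checks above.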
Linear Regression: Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Ŷ = β0 + β1X1 + β2X2 + β3X3 + … + βkXk
Datasets for playing:
Kaggle
Interview Query
Telus

1. Predict brain weight of mammals based on their body weight (x01.txt)
2. Predict blood fat content based on age and weight (x09.txt)
3. Predict death rate from cirrhosis based on a number of other factors (x20.txt)
4. Predict selling price of houses based on a number of factors (x27.txt)
And so on…
Linear Regression Theory
Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Ŷ = β0 + β1X1 + β2X2 + β3X3 + … + βkXk
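The least-squares fit behind the theory can be sketched with NumPy on synthetic data whose true coefficients we know (the sample size, noise level, and beta values below are all made up for illustration):

```python
import numpy as np

# Multiple linear regression via least squares: find the beta vector that
# minimizes ||y - X_design @ beta||^2, and check it recovers the known
# coefficients used to generate the data.
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 3))                     # three IVs: X1, X2, X3
true_beta = np.array([1.0, 2.0, -0.5, 3.0])     # [beta0, beta1, beta2, beta3]
y = true_beta[0] + X @ true_beta[1:] + 0.1 * rng.normal(size=n)  # + epsilon

X_design = np.column_stack([np.ones(n), X])     # prepend intercept column
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(beta_hat, 2))
```

With plenty of data and small noise, the estimated betas land very close to the true ones; the residual that remains is the ε term of the model.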

Besides MSE, what other measures did you learn?


MSE, MAE, MAPE, RMSE
So, when should we use each?
MSE and RMSE are sensitive to outliers -> good when the data are stable
MAE is relatively insensitive to outliers -> good for unstable data
MAPE favours models that underestimate -> useful for budgeting
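The four measures can be computed side by side to see the outlier sensitivity in action (the toy y values below are made up for illustration):

```python
import math

# MSE squares errors, so one large miss dominates; MAE treats every unit of
# error equally; MAPE divides by the true value, so a large overestimate can
# exceed 100% while an underestimate is bounded -- one reason it can favour
# models that under-predict (handy for budget-style forecasts).
def mse(y, y_hat):  return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
def rmse(y, y_hat): return math.sqrt(mse(y, y_hat))
def mae(y, y_hat):  return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
def mape(y, y_hat): return sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

y_true = [100.0, 100.0, 100.0, 100.0]
y_stable = [98.0, 102.0, 99.0, 101.0]      # small errors everywhere
y_outlier = [100.0, 100.0, 100.0, 140.0]   # one large miss

print(mse(y_true, y_stable), mse(y_true, y_outlier))   # outlier dominates MSE
print(mae(y_true, y_stable), mae(y_true, y_outlier))   # MAE grows far less
```

The single 40-unit miss multiplies MSE by 160x but MAE by under 7x, which is the "sensitive to outliers" contrast on the slide.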
Polynomial Regression
Y = β0 + β1X + β2X^2 + β3X^3 + … + βkX^k + ε
Ŷ = β0 + β1X + β2X^2 + β3X^3 + … + βkX^k
Questions: When will we use linear regression? When will we consider 2nd order or higher?
When should we stop searching for a higher-order solution?
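One practical answer to "when to stop" is to watch the residual error as the order grows: once the order matches the underlying curve, extra terms stop helping much. A sketch with synthetic data whose true order is 2 (the data and degrees below are made up for illustration):

```python
import numpy as np

# Fit increasing polynomial orders and compare the mean squared residual.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 60)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)  # true order: 2

errors = {}
for degree in (1, 2, 3):
    coefs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    resid = y - np.polyval(coefs, x)
    errors[degree] = float(np.mean(resid ** 2))

print({d: round(e, 3) for d, e in errors.items()})
```

The jump from order 1 to order 2 removes most of the error; order 3 barely improves on order 2, which is the signal to stop.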
Model Capacity, Overfitting, Underfitting
Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
Ŷ = β0 + β1X1 + β2X2 + β3X3 + … + βkXk
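Underfitting and overfitting can be seen directly by fitting polynomials of rising degree on a held-out split (the sine data, degrees, and split below are made up for illustration):

```python
import numpy as np

# Train on half of the points, evaluate on the held-out half, and compare
# train/test error as model capacity (polynomial degree) grows.
rng = np.random.default_rng(7)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

train = np.arange(0, 40, 2)   # even indices for fitting
test = np.arange(1, 40, 2)    # odd indices held out

def mses(degree):
    """Train and test mean squared error for a polynomial of this degree."""
    coefs = np.polyfit(x[train], y[train], degree)
    tr = float(np.mean((y[train] - np.polyval(coefs, x[train])) ** 2))
    te = float(np.mean((y[test] - np.polyval(coefs, x[test])) ** 2))
    return tr, te

tr1, te1 = mses(1)   # low capacity: underfits, both errors high
tr5, te5 = mses(5)   # adequate capacity: errors near the noise level
tr9, te9 = mses(9)   # high capacity: train error falls further; watch the gap to test error
print((round(te1, 3), round(te5, 3), round(te9, 3)))
```

Raising the degree can only lower the training error (the smaller model is nested inside the larger one), but past the right capacity the held-out error stops improving: that widening train/test gap is the overfitting signal.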
Take home messages
• Regression: Linear regression and Logistic regression
• How to use them in Python
• What p-value, R-squared, intercept, and slope (beta or weight) are
• What the assumptions are
