This lecture introduces modeling concepts in data science, focusing on statistical models, particularly linear regression. It explains the modeling process, including defining models, estimating parameters, and evaluating model performance using metrics like Mean Squared Error (MSE) and Mean Absolute Error (MAE). The lecture also discusses the importance of loss functions and their impact on model accuracy and computational cost.


Lecture 4: Intro to Modeling and the Linear Regression Model

2025 SPRING · INTRO TO DATA SCIENCE
Where Are We?

What is a Model?

• A model is an idealized representation of a system.

• Example: we model the fall of an object on Earth as subject to a constant acceleration of 9.81 m/s² due to gravity.

• While this describes the behavior of our system, it is merely an approximation.

• It doesn't account for the effects of air resistance, local variations in gravity, etc.

• But in practice, it's accurate enough to be useful!

"Essentially, all models are wrong, but some are useful." (George Box, Statistician, 1919-2013)
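For instance, here is a quick worked instance of this constant-acceleration model (illustrative numbers, ignoring air resistance): an object dropped from rest has fallen

```latex
d(t) = \tfrac{1}{2} g t^{2}, \quad g \approx 9.81\ \mathrm{m/s^2}
\;\;\Rightarrow\;\;
d(2\,\mathrm{s}) \approx \tfrac{1}{2}(9.81)(4) \approx 19.6\ \mathrm{m}.
```

The model is simple enough to compute by hand, yet close enough to reality to be useful for many purposes.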

What is a Model?

• In data science, a model usually means a mathematical rule or function that describes the
relationships between variables.

• In this course, we will cover:


• Statistical Models

• Machine Learning Models

• Deep Learning Models

Example: A Statistical Model

[Figure: scatter plots of Sales vs. TV, Radio, and Newspaper advertising budgets]

Can we predict Sales using these three?

Perhaps we can do better using a model:

Sales ≈ f(TV, Radio, Newspaper)

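As a rough sketch of what estimating such an f might look like in code, assuming the advertising data lives in a file named Advertising.csv with columns TV, Radio, Newspaper, and Sales (the file name and the linear form of f are illustrative assumptions, not part of the lecture):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical file/column names; any dataset with these columns works.
ads = pd.read_csv("Advertising.csv")

X = ads[["TV", "Radio", "Newspaper"]]  # features
y = ads["Sales"]                       # response we want to predict

# One simple choice of f: a linear function of the three budgets.
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # fitted parameters
print(model.predict(X[:5]))            # predicted Sales for five markets
```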
A statistical model

• Here Sales is a response or target that we wish to predict; we generically refer to it as $Y$.

• TV, Radio, and Newspaper are features (also called inputs or predictors); we collectively name them $X$.

• We write our model as

$Y = f(X) + \epsilon$

where $\epsilon$ captures measurement errors and other discrepancies (we will come back to this later).

• A set of approaches for estimating $f$ is sometimes called a statistical learning procedure.

What is f(X) good for?

• 1. We can make predictions of $Y$ at new points $X = x$.

• 2. We can understand which components of $X = (X_1, X_2, \ldots, X_p)$ are important in explaining $Y$, and which are irrelevant.

• 3. Depending on the complexity of $f$, we may be able to understand how each component $X_i$ of $X$ affects $Y$.

The Modeling Process

• How should we represent the world?

• How do we quantify prediction error?

• How do we choose the best parameters of our model given our data?

• How do we evaluate whether this process gave rise to a good model?

The Modeling Process

Simple Linear Regression: Our First Model

• Simple Linear Regression (SLR) model: $y = \theta_0 + \theta_1 x + \epsilon$

• SLR is a parametric model, meaning we choose the "best" parameters for slope and intercept based on data.

• We often express $\theta = (\theta_0, \theta_1)$ as a single parameter vector.

• The sample-based estimate of the parameter $\theta$ is written $\hat{\theta}$, which in turn provides the estimate $\hat{y}$.

• Usually, we pick the parameters that appear "best" according to some criterion we choose.

Which $\theta$ is best?

• We need some metric of how "good" or "bad" our predictions are.

• Example: for every chapter of the novel Little Women, estimate the number of characters based on the number of periods in that chapter.
The Modeling Process

Loss Functions

• A loss function $L(y, \hat{y})$ characterizes the cost, error, or fit resulting from a particular choice of model or model parameters.

• The choice of loss function affects the accuracy and computational cost of estimation.

• The choice of loss function should depend on the estimation task:

  • Are outputs quantitative or qualitative?
  • Do we care about outliers?
  • Are all errors equally costly? (e.g., a false negative on a cancer test)

L2 and L1 Loss

• Squared (L2) loss: $L(y, \hat{y}) = (y - \hat{y})^2$
• Absolute (L1) loss: $L(y, \hat{y}) = |y - \hat{y}|$
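A minimal sketch of these two pointwise losses in Python (NumPy used for convenience):

```python
import numpy as np

def squared_loss(y, y_hat):
    # L2 loss on a single observation.
    return (y - y_hat) ** 2

def abs_loss(y, y_hat):
    # L1 loss on a single observation.
    return np.abs(y - y_hat)

print(squared_loss(22, 25))  # 9
print(abs_loss(22, 25))      # 3
```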
Residuals as Loss Function?

• Why don't we directly use the residual, $y - \hat{y}$, as the loss function?

• This doesn't work: big negative residuals would cancel out big positive residuals.
• Our predictions can be very off, yet the residuals can still average to zero (e.g., errors of +10 and −10 average out).

Empirical Risk is Average Loss over Data

• We care about how bad our model's predictions are for our entire data set, not just for one point.

• A natural measure, then, is the average loss (aka empirical risk) across all points.

• Given data $(x_1, y_1), \ldots, (x_n, y_n)$, the average loss is

$\hat{R}(\theta) = \frac{1}{n} \sum_{i=1}^{n} L(y_i, \hat{y}_i)$

• The average loss on the sample tells us how well the model fits the data (not the population), but hopefully these are close.

Empirical Risk is Average Loss over Data

The colloquial term for average loss depends on which loss function we choose: average squared (L2) loss is called Mean Squared Error (MSE), and average absolute (L1) loss is called Mean Absolute Error (MAE).

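A minimal sketch of the two resulting empirical risks, using made-up values for $y$ and $\hat{y}$:

```python
import numpy as np

def mse(y, y_hat):
    # Mean Squared Error: average L2 loss over the sample.
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    # Mean Absolute Error: average L1 loss over the sample.
    return np.mean(np.abs(y - y_hat))

y     = np.array([3.0, 5.0, 7.0])   # observed values (illustrative)
y_hat = np.array([2.5, 6.0, 6.5])   # model predictions
print(mse(y, y_hat))  # (0.25 + 1.0 + 0.25) / 3 = 0.5
print(mae(y, y_hat))  # (0.5 + 1.0 + 0.5) / 3 ≈ 0.667
```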
The Modeling Process

• We want to find the parameters $\hat{\theta}$ that minimize this objective function.
Minimizing MSE for the SLR Model

• Recall: we wanted to pick the regression line $\hat{y} = \hat{\theta}_0 + \hat{\theta}_1 x$.

• To minimize the (sample) Mean Squared Error:

$\hat{R}(\theta_0, \theta_1) = \frac{1}{n} \sum_{i=1}^{n} \big( y_i - (\theta_0 + \theta_1 x_i) \big)^2$

• To find the best values, we set the derivatives equal to zero to obtain the optimality conditions.
Partial Derivative of MSE with Respect to $\theta_0$, $\theta_1$

$\frac{\partial \hat{R}}{\partial \theta_0} = -\frac{2}{n} \sum_{i=1}^{n} \big( y_i - \theta_0 - \theta_1 x_i \big)$

$\frac{\partial \hat{R}}{\partial \theta_1} = -\frac{2}{n} \sum_{i=1}^{n} \big( y_i - \theta_0 - \theta_1 x_i \big)\, x_i$
Estimating Equations

• To find the best values, we set the derivatives equal to zero to obtain the optimality conditions:

(1) $\sum_{i=1}^{n} \big( y_i - \hat{\theta}_0 - \hat{\theta}_1 x_i \big) = 0$   (2) $\sum_{i=1}^{n} \big( y_i - \hat{\theta}_0 - \hat{\theta}_1 x_i \big)\, x_i = 0$

• To find the best $\hat{\theta}_0, \hat{\theta}_1$, we need to solve these estimating equations.
From Estimating Equations to Estimators

• Goal: choose $\hat{\theta}_0, \hat{\theta}_1$ to solve the two estimating equations (1) and (2) above.

• Dividing (1) by $n$ and rearranging gives:

$\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}$
From Estimating Equations to Estimators

Let's try (2) − (1) × $\bar{x}$, which gives

$\sum_{i=1}^{n} \big( y_i - \hat{\theta}_0 - \hat{\theta}_1 x_i \big)(x_i - \bar{x}) = 0$

Substituting $\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}$ and solving:

$\hat{\theta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Estimating Equations

• Plug in the definitions of the correlation $r$ and the standard deviations $\sigma_x, \sigma_y$:

$\hat{\theta}_1 = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2} = r\, \frac{\sigma_y}{\sigma_x}$

• Solve for the parameters:

$\hat{\theta}_1 = r\, \frac{\sigma_y}{\sigma_x}, \qquad \hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}$
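A small numerical check of these estimators, using illustrative data; np.polyfit serves as an independent least squares reference:

```python
import numpy as np

# Hypothetical sample data; any paired x, y arrays work here.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Closed-form SLR estimates from the estimating equations:
# theta1_hat = r * (sigma_y / sigma_x), theta0_hat = y_bar - theta1_hat * x_bar
r = np.corrcoef(x, y)[0, 1]
theta1_hat = r * y.std() / x.std()
theta0_hat = y.mean() - theta1_hat * x.mean()

# Cross-check against numpy's built-in least squares fit.
slope, intercept = np.polyfit(x, y, 1)
print(theta1_hat, slope)      # should match
print(theta0_hat, intercept)  # should match
```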
Minimizing MSE for the SLR Model (recap)

• Recall: we picked the regression line to minimize the (sample) Mean Squared Error, set the derivatives equal to zero to obtain the optimality conditions, and solved them to get $\hat{\theta}_1 = r\,\sigma_y/\sigma_x$ and $\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}$.
Estimating Equations

• Estimating equations are the equations that the model fit has to solve. They help us:
  • Derive the estimates.
  • Understand what our model is paying attention to.

For SLR:

• The residuals should average to zero (otherwise we should fix the intercept!).
• The residuals should be orthogonal to the predictor variable (or we should fix the slope!).
The Modeling Process

Evaluating Models

What are some ways to determine if our model was a good fit to our data?

1. Performance metrics: Root Mean Square Error (RMSE)

$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$

• A lower RMSE indicates more "accurate" predictions (lower "average loss" across the data).
• RMSE is in the same units as $y$.
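A one-function sketch of RMSE, reusing the made-up values from the earlier loss sketch:

```python
import numpy as np

def rmse(y, y_hat):
    # RMSE is the square root of the MSE, so it is in the same units as y.
    return np.sqrt(np.mean((y - y_hat) ** 2))

y     = np.array([3.0, 5.0, 7.0])  # illustrative observed values
y_hat = np.array([2.5, 6.0, 6.5])  # illustrative predictions
print(rmse(y, y_hat))  # sqrt(0.5) ≈ 0.707
```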
Four Mysterious Datasets (Anscombe’s quartet)

2. Visualization: look at a residual plot to visualize the difference between actual and predicted values.

• The four datasets each have the same mean of x, mean of y, SD of x, SD of y, and r value.

• Since our optimal least squares SLR model only depends on those quantities, they all have the same regression line and RMSE.

• However, only one of these four datasets makes sense to model using SLR.
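One quick way to verify the shared statistics, assuming seaborn's bundled copy of Anscombe's quartet is reachable (sns.load_dataset fetches it from seaborn's online data repository, so this needs an internet connection):

```python
import seaborn as sns

df = sns.load_dataset("anscombe")  # columns: dataset, x, y

# All four datasets share (nearly) identical summary statistics,
# and the same correlation, hence the same SLR fit.
for name, group in df.groupby("dataset"):
    print(name,
          group["x"].mean(), group["y"].mean(),
          group["x"].std(), group["y"].std(),
          group["x"].corr(group["y"]))
```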
Four Mysterious Datasets (Anscombe’s quartet)

The residual plot of a good regression shows no pattern.

The Modeling Process: Using a Different Model

The Constant Model

• You work at a local boba shop and want to estimate the sales each day.

• Here's your data from 5 randomly selected previous days, arbitrarily sorted by number of drinks sold:

{20, 21, 22, 29, 33}

What single number would you predict for a new day's sales?

A. 0
B. 25
C. 22
D. 100
E. Something else

This is a constant model.
The Constant Model

• The constant model summarizes the data by always "predicting" the same number, i.e., predicting a constant.

• It ignores any relationships between variables:
  • For instance, boba tea sales likely depend on the time of year, the weather, how the customers feel, whether school is in session, etc.
  • Ignoring these factors is a simplifying assumption.
The Constant Model

• The constant model is also a parametric, statistical model:

$y = \theta_0 + \epsilon$

• Our parameter $\theta_0$ is 1-dimensional.

• We now have no input into our model; we predict $\hat{y} = \hat{\theta}_0$.
• Like before, we can still determine the best $\theta_0$ that minimizes average loss on our data.
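A minimal sketch of the constant model as code; how to choose $\hat{\theta}_0$ is derived on the following slides:

```python
import numpy as np

def constant_predict(theta0_hat, n):
    # A constant model ignores any inputs and always predicts theta0_hat.
    return np.full(n, theta0_hat)

print(constant_predict(25.0, 3))  # [25. 25. 25.]
```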
The Modeling Process: Using a Different Model

Fit the Model: Rewrite MSE for the Constant Model

• Recall that Mean Squared Error (MSE) is the average squared loss (L2 loss) over the data:

$\hat{R}(\theta_0) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

• Given the constant model, $\hat{y}_i = \theta_0$, so:

$\hat{R}(\theta_0) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \theta_0)^2$

• We fit the model by finding the optimal $\hat{\theta}_0$ that minimizes the MSE.
Fit the Model: $\hat{\theta}_0 = \bar{y}$

• We can show that average squared loss is minimized by $\hat{\theta}_0 = \bar{y}$.

• Derivation: set the derivative of the MSE to zero and solve.

$\frac{d}{d\theta_0} \frac{1}{n} \sum_{i=1}^{n} (y_i - \theta_0)^2 = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \theta_0) = 0
\;\Rightarrow\;
\hat{\theta}_0 = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$

• This holds true regardless of what data sample you have.

• It provides some formal reasoning as to why the mean is such a common summary statistic.
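A brute-force sketch checking this on the boba data from earlier: evaluating the constant model's MSE over a grid of candidate $\theta_0$ values recovers the mean.

```python
import numpy as np

sales = np.array([20, 21, 22, 29, 33])

# Evaluate the constant model's MSE over a grid of candidate theta0 values.
thetas = np.linspace(0, 50, 5001)
mse = np.array([np.mean((sales - t) ** 2) for t in thetas])

best = thetas[np.argmin(mse)]
print(best, sales.mean())  # both ≈ 25.0: the mean minimizes MSE
```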
Revisit the Boba Shop Example

• You work at a local boba shop and want to estimate the sales each day.

• Here's your data from 5 randomly selected previous days, arbitrarily sorted by number of drinks sold:

{20, 21, 22, 29, 33}

A. 0
B. 25
C. 22
D. 100
E. Something else

Under MSE, we predict the mean of the previous five days' sales:

(20 + 21 + 22 + 29 + 33) / 5 = 25
[Loss] Comparing Two Different Models, Both Fit with MSE

[Fit] Comparing Two Different Models, Both Fit with MSE

The Modeling Process: Using a Different Loss Function

Fit the Model: Rewrite MAE for the Constant Model

• Recall that Mean Absolute Error (MAE) is the average absolute loss (L1 loss) over the data:

$\hat{R}(\theta_0) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

• Given the constant model, $\hat{y}_i = \theta_0$, so:

$\hat{R}(\theta_0) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \theta_0|$

• We fit the model by finding the optimal $\hat{\theta}_0$ that minimizes the MAE.

Exploring MAE: A Piecewise Function

• For the boba dataset {20, 21, 22, 29, 33}:

• Absolute (L1) loss on one observation: $|y_i - \theta_0|$

• MAE (Mean Absolute Error) across all data: $\frac{1}{5} \sum_{i=1}^{5} |y_i - \theta_0|$

[Figure: plotted as a function of $\theta_0$, the MAE is a piecewise linear function, minimized at $\theta_0 = 22$]
Fit the Model: Differentiation

• Away from the data points, the derivative of the MAE counts how many observations lie on each side of $\theta_0$:

$\frac{d}{d\theta_0} \frac{1}{n} \sum_{i=1}^{n} |y_i - \theta_0| = \frac{1}{n} \Big( \sum_{y_i < \theta_0} 1 \;-\; \sum_{y_i > \theta_0} 1 \Big)$
Fit the Model: Set Equal to 0

• Setting the derivative equal to zero, $\hat{\theta}_0$ needs to be such that there are an equal number of points to its left and right.

• This is the definition of the median!

• For example, in our boba tea dataset {20, 21, 22, 29, 33}, the middle point (22) is the median.
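The same brute-force sketch as before, but with MAE, recovers the median:

```python
import numpy as np

sales = np.array([20, 21, 22, 29, 33])

# Evaluate the constant model's MAE over a grid of candidate theta0 values.
thetas = np.linspace(0, 50, 5001)
mae = np.array([np.mean(np.abs(sales - t)) for t in thetas])

best = thetas[np.argmin(mae)]
print(best, np.median(sales))  # both ≈ 22.0: the median minimizes MAE
```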
MSE and MAE: Comparing Optimal Parameters

• For the constant model, MSE is minimized by the mean $\bar{y}$, while MAE is minimized by the median.

MSE and MAE: Comparing Loss Surfaces

• The MSE loss surface is smooth; the MAE loss surface is piecewise linear, with kinks at the data points.

MSE and MAE: Comparing Sensitivity to Outliers

• MSE is more sensitive to outliers than MAE, since squaring magnifies large errors.

MSE and MAE: Comparing Uniqueness of Solutions

• MSE has a unique minimizer; the MAE minimizer may not be unique (with an even number of points, any value between the two middle points minimizes the MAE).
Summary: Loss Optimization, Calculus, and… Critical Points?

• First, define the objective function as average loss.
  • Plug in L1 or L2 loss.
  • Plug in the model so that the resulting expression is a function of $\theta$.
• Then, find the minimum of the objective function:
  • 1. Differentiate with respect to $\theta$.
  • 2. Set equal to 0.
  • 3. Solve for $\hat{\theta}$.
• Recall critical points from calculus: a critical point could be a minimum, maximum, or saddle point!
  • We should technically also perform the second derivative test, i.e., show that the second derivative is positive at $\hat{\theta}$ (see the worked check below).
  • MSE has a property, convexity, that guarantees that $\hat{\theta}$ is a global minimum.
  • The proof of convexity for MAE is beyond this course.
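As a worked instance of that second derivative test, for the constant model under MSE:

```latex
R(\theta_0) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \theta_0)^2,\qquad
\frac{dR}{d\theta_0} = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \theta_0),\qquad
\frac{d^2 R}{d\theta_0^2} = 2 > 0.
```

Since the second derivative is a positive constant, the critical point $\hat{\theta}_0 = \bar{y}$ is indeed a global minimum; this is the convexity property mentioned above.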

