Lecture 4 - Intro to Modeling and Linear Regression
1
Where Are We?
2
What is a Model?
• A model is an idealized representation of a system.
• Example: we model the fall of an object on Earth as subject to a constant acceleration of 9.81 m/s² due to gravity.
• While this describes the behavior of our system, it is merely an approximation.
• It doesn't account for the effects of air resistance, local variations in gravity, etc.
• But in practice, it's accurate enough to be useful!
"Essentially, all models are wrong, but some are useful." (George Box, Statistician, 1919-2013)
3
What is a Model?
• In data science, a model usually means a mathematical rule or function that describes the
relationships between variables.
4
Example: A statistical model
[Figure: scatter plots of Sales vs. TV, Radio, and Newspaper]
5
A statistical model
Y = f(X) + ε
where ε captures measurement errors and other discrepancies (we will come back to this later).
6
What is f(X) good for?
• 1. With a good f, we can make predictions of Y at new points X = x.
• 2. We can understand which components of X = (X₁, X₂, …, Xₚ) are important in explaining Y, and which are irrelevant.
• 3. Depending on the complexity of f, we may be able to understand how each component Xᵢ of X affects Y.
7
The Modeling Process
8
The Modeling Process
9
Simple Linear Regression: Our First Model
• SLR is a parametric model, meaning we choose the "best" parameters for the slope and intercept based on data.
• The sample-based estimate of a parameter θ is written θ̂; plugging the estimates into the model gives the prediction ŷ of y.
• Usually, we pick the parameters that appear "best" according to some criterion we choose.
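In symbols (the standard formulation, filling in the equation the slide presents graphically), the SLR model and its fitted version are:

```latex
y = \theta_0 + \theta_1 x + \epsilon,
\qquad
\hat{y} = \hat{\theta}_0 + \hat{\theta}_1 x
```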
10
Which θ is best?
11
The Modeling Process
12
Loss Functions
• A loss function L(y, ŷ) characterizes the cost, error, or fit resulting from a particular choice of model or model parameters.
• The choice of loss function affects the accuracy and computational cost of estimation.
13
L2 and L1 Loss
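Written out for a single point (the standard definitions of these two losses):

```latex
L_2(y, \hat{y}) = (y - \hat{y})^2,
\qquad
L_1(y, \hat{y}) = |y - \hat{y}|
```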
14
Residuals as Loss Function?
15
Empirical Risk is Average Loss over Data
• We care about how bad our model's predictions are for our entire dataset, not just for one point.
• A natural measure, then, is the average loss (aka empirical risk) across all points.
• Given data {(xᵢ, yᵢ): i = 1, …, n}, the average loss is R(θ) = (1/n) Σᵢ L(yᵢ, ŷᵢ).
• The average loss on the sample tells us how well the model fits the data (not the population), but hopefully these are close.
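As a minimal sketch, the boba sales numbers {20, 21, 22, 29, 33} from a later slide, with a constant prediction of 25 (their mean, an illustrative choice), give:

```python
# Sketch: average loss (empirical risk) over a small dataset.

def l2_loss(y, y_hat):
    """Squared (L2) loss for a single point."""
    return (y - y_hat) ** 2

def l1_loss(y, y_hat):
    """Absolute (L1) loss for a single point."""
    return abs(y - y_hat)

def empirical_risk(loss, ys, y_hats):
    """Average loss across all points."""
    return sum(loss(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)

ys = [20, 21, 22, 29, 33]
preds = [25] * len(ys)  # constant model predicting the mean

mse = empirical_risk(l2_loss, ys, preds)  # average squared loss
mae = empirical_risk(l1_loss, ys, preds)  # average absolute loss
print(mse, mae)  # 26.0 4.8
```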
16
Empirical Risk is Average Loss over Data
The colloquial term for average loss depends on which loss function we choose: average L2 loss is called mean squared error (MSE), and average L1 loss is called mean absolute error (MAE).
17
The Modeling Process
18
Minimizing MSE for the SLR Model
• To find the best values, we set the derivatives of the MSE equal to zero to obtain the optimality conditions:
19
Partial Derivative of MSE with Respect to θ₀, θ₁
20
Estimating Equations
• To find the best values, we set the derivatives of the MSE equal to zero to obtain the optimality conditions:
• To find the best θ₀, θ₁, we need to solve the estimating equations.
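For the SLR model ŷ = θ₀ + θ₁x fit with MSE, the standard optimality conditions (reconstructing the equations the slide shows graphically) are:

```latex
\frac{\partial}{\partial \theta_0}\,\frac{1}{n}\sum_{i=1}^{n}(y_i - \theta_0 - \theta_1 x_i)^2
  = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \theta_0 - \theta_1 x_i) = 0
\qquad (1)
```

```latex
\frac{\partial}{\partial \theta_1}\,\frac{1}{n}\sum_{i=1}^{n}(y_i - \theta_0 - \theta_1 x_i)^2
  = -\frac{2}{n}\sum_{i=1}^{n}x_i\,(y_i - \theta_0 - \theta_1 x_i) = 0
\qquad (2)
```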
21
From Estimating Equations to Estimators
22
From Estimating Equations to Estimators
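Solving the two estimating equations yields the standard closed-form least-squares estimators:

```latex
\hat{\theta}_1
  = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\hat{\theta}_0 = \bar{y} - \hat{\theta}_1 \bar{x}
```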
23
Estimating Equations
• Estimating equations are the equations that the fitted model has to solve. They help us:
• Derive the estimates.
• Understand what our model is paying attention to.
For SLR:
• The residuals should average to zero (otherwise we should fix the intercept!).
• The residuals should be orthogonal to the predictor variable (otherwise we should fix the slope!).
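A quick sketch to check both properties, using the closed-form estimators on a small made-up dataset (the x and y values are illustrative, not from the slides):

```python
# Fit SLR by least squares, then verify the two estimating-equation
# properties: residuals average to zero and are orthogonal to x.

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least-squares slope and intercept.
theta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
theta0 = y_bar - theta1 * x_bar

residuals = [yi - (theta0 + theta1 * xi) for xi, yi in zip(x, y)]

mean_resid = sum(residuals) / n                           # ~0
dot_resid_x = sum(r * xi for r, xi in zip(residuals, x))  # ~0
print(mean_resid, dot_resid_x)
```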
26
The Modeling Process
27
Evaluating Models
What are some ways to determine if our model was a good fit to our data?
1. Performance metrics: Root Mean Square Error (RMSE)
• A lower RMSE indicates more "accurate" predictions (lower "average loss" across data)
• RMSE is in the same units as y.
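A short sketch of the metric, reusing the boba numbers and the constant prediction of 25 from earlier as an illustrative input:

```python
# RMSE is the square root of the MSE, so it is in the same units as y.
import math

def rmse(ys, y_hats):
    """Root mean squared error between observations and predictions."""
    n = len(ys)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats)) / n)

print(rmse([20, 21, 22, 29, 33], [25] * 5))  # sqrt(26) ≈ 5.10
```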
28
Four Mysterious Datasets (Anscombe’s quartet)
2. Visualization: Look at a residual plot to visualize the difference between actual and predicted values.
29
Four Mysterious Datasets (Anscombe’s quartet)
30
The Modeling Process: Using a Different Model
31
The Constant Model
32
The Constant Model
• The constant model summarizes the data by always "predicting" the same number—
i.e., predicting a constant.
• For instance, boba tea sales likely depend on the time of year, the weather, how the
customers feel, whether school is in session, etc.
• Ignoring these factors is a simplifying assumption.
33
The Constant Model
34
The Modeling Process: Using a Different Model
35
The Modeling Process: Using a Different Model
36
Fit the Model: Rewrite MSE for the Constant Model
• Recall that Mean Squared Error (MSE) is the average squared loss (L2 loss) over the data.
• We fit the model by finding the optimal θ₀ that minimizes the MSE.
37
Fit the Model: θ̂₀ = ȳ
• Derivation:
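For the constant model ŷ = θ₀, setting the derivative of the MSE to zero gives the sample mean (the standard derivation behind the slide):

```latex
\frac{d}{d\theta_0}\,\frac{1}{n}\sum_{i=1}^{n}(y_i - \theta_0)^2
  = -\frac{2}{n}\sum_{i=1}^{n}(y_i - \theta_0) = 0
\;\Longrightarrow\;
\hat{\theta}_0 = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}
```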
38
Revisit the Boba Shop Example
We will predict using the mean of the previous five days' sales:
A. 0
B. 25
C. 22
D. 100
E. Something else
39
[Loss] Comparing Two Different Models, Both Fit with MSE
40
[Fit] Comparing Two Different Models, Both Fit with MSE
41
The Modeling Process: Using a Different Loss Function
42
Fit the Model: Rewrite MAE for the Constant Model
• Recall that Mean Absolute Error (MAE) is the average absolute loss (L1 loss).
• We fit the model by finding the optimal θ₀ that minimizes the MAE.
43
Exploring MAE: A Piecewise function
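One way to see the piecewise structure: away from the data points, each term |yᵢ - θ₀| contributes slope ±1, so the derivative counts points on each side (a standard sketch):

```latex
\frac{d}{d\theta_0}\,\frac{1}{n}\sum_{i=1}^{n}|y_i - \theta_0|
  = \frac{1}{n}\left(\#\{i : y_i < \theta_0\} - \#\{i : y_i > \theta_0\}\right)
```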
45
Fit the Model: set equal to 0
• The optimal θ₀ must be chosen so that an equal number of data points lie to its left and right.
• For example, in our boba tea dataset {20, 21, 22, 29, 33}, the middle point (22) is the median.
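A quick numerical check that the median minimizes MAE on this dataset (scanning integer candidates is an illustrative choice, not part of the slide):

```python
# Check that the median (22) gives the smallest MAE on the boba dataset.

data = [20, 21, 22, 29, 33]

def mae(theta, ys):
    """Mean absolute error of the constant prediction theta."""
    return sum(abs(y - theta) for y in ys) / len(ys)

# Scan candidate constants; the median should win.
best = min(range(15, 40), key=lambda t: mae(t, data))
print(best, mae(best, data))  # 22 4.2
```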
46
MSE and MAE: Comparing Optimal Parameters
47
MSE and MAE: Comparing Loss Surfaces
48
MSE and MAE: Comparing Sensitivity to Outliers
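To illustrate the sensitivity comparison: the MSE-optimal constant (the mean) chases an outlier, while the MAE-optimal constant (the median) barely moves. The outlier value 1000 below is made up for illustration:

```python
# Compare how the mean and median react to a single extreme outlier.
from statistics import mean, median

data = [20, 21, 22, 29, 33]
with_outlier = data + [1000]

print(mean(data), mean(with_outlier))      # 25 vs 187.5 -- mean jumps
print(median(data), median(with_outlier))  # 22 vs 25.5  -- median barely moves
```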
49
MSE and MAE: Comparing Uniqueness of Solutions
50
Summary: Loss Optimization, Calculus, and…Critical Points?
51