3-Lecture 3-1

The document discusses the concepts of response and predictor variables in the context of multiple linear regression, emphasizing the relationship between predictors and the outcome variable. It covers the use of qualitative predictors, the creation of dummy variables, and the implications of collinearity and interaction effects on model interpretation. Additionally, it highlights the importance of residual analysis and the potential for overfitting when too many predictors or interaction terms are included.


Response vs. Predictor Variables

X: predictors, features, covariates.
Y: outcome, response variable, dependent variable.

In the advertising data there are n observations (rows) and p predictors (columns); TV, radio, and newspaper are the predictors, and sales is the response:

TV      radio   newspaper   sales
230.1   37.8    69.2        22.1
44.5    39.3    45.1        10.4
17.2    45.9    69.3         9.3
151.5   41.3    58.5        18.5
180.8   10.8    58.4        12.9
CS109A, PROTOPAPAS, RADER, TANNER 3
Multilinear Models

In practice, it is unlikely that any response variable Y depends solely on one predictor X. Rather, we expect that Y is a function of multiple predictors, f(X_1, ..., X_J). Using the notation we introduced last lecture,

\[ Y = (y_1, \dots, y_n), \quad X = (X_1, \dots, X_J), \quad X_j = (x_{1,j}, \dots, x_{i,j}, \dots, x_{n,j}), \]

we can still assume a simple form for f, a multilinear form:

\[ f(X_1, \dots, X_J) = \beta_0 + \beta_1 X_1 + \dots + \beta_J X_J \]

Hence, \(\hat f\) has the same form:

\[ \hat f(X_1, \dots, X_J) = \hat\beta_0 + \hat\beta_1 X_1 + \dots + \hat\beta_J X_J \]



Multiple Linear Regression

Given a set of observations,

\[ \{(x_{1,1}, \dots, x_{1,J}, y_1), \dots, (x_{n,1}, \dots, x_{n,J}, y_n)\}, \]

the data and the model can be expressed in vector notation,

\[ Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,J} \\ 1 & x_{2,1} & \cdots & x_{2,J} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,J} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_J \end{pmatrix}. \]



Multilinear Model, example

For our data,

\[ \mathit{Sales} = \beta_0 + \beta_1 \times \mathit{TV} + \beta_2 \times \mathit{Radio} + \beta_3 \times \mathit{Newspaper}. \]

In linear algebra notation,

\[ Y = \begin{pmatrix} \mathit{Sales}_1 \\ \vdots \\ \mathit{Sales}_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & \mathit{TV}_1 & \mathit{Radio}_1 & \mathit{News}_1 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & \mathit{TV}_n & \mathit{Radio}_n & \mathit{News}_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_3 \end{pmatrix}, \]

so that \( Y = X\beta \).



Multiple Linear Regression

The model takes a simple algebraic form:

\[ Y = X\beta + \epsilon \]

We will again choose the MSE as our loss function, which can be expressed in vector notation as

\[ \mathrm{MSE}(\beta) = \frac{1}{n} \lVert Y - X\beta \rVert^2. \]

Minimizing the MSE using vector calculus yields

\[ \hat\beta = (X^\top X)^{-1} X^\top Y = \underset{\beta}{\operatorname{argmin}}\; \mathrm{MSE}(\beta). \]
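The closed-form estimate can be checked numerically. The sketch below uses synthetic data (all numbers made up, not the advertising data); NumPy's least-squares solver computes the same minimizer as the normal equations, but in a numerically stable way:

```python
import numpy as np

# Synthetic data: n observations, J = 3 predictors (values are made up).
rng = np.random.default_rng(0)
n, J = 100, 3
X_raw = rng.uniform(0, 100, size=(n, J))
beta_true = np.array([2.0, 0.05, 0.1, 0.02])    # intercept + J slopes

X = np.column_stack([np.ones(n), X_raw])        # design matrix with a 1s column
y = X @ beta_true + rng.normal(0, 0.5, size=n)  # Y = X beta + eps

# lstsq solves the same least-squares problem as (X^T X)^{-1} X^T Y.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = np.mean((y - X @ beta_hat) ** 2)
```

With few predictors and modest noise, `beta_hat` lands close to `beta_true`, and the training MSE approaches the noise variance.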
Qualitative Predictors

So far, we have assumed that all variables are quantitative. But in practice, some predictors are often qualitative.

Example: The Credit data set contains information about balance, age, cards, education, income, limit, and rating for a number of potential customers.

Income   Limit  Rating  Cards  Age  Education  Gender  Student  Married  Ethnicity  Balance
14.890   3606   283     2      34   11         Male    No       Yes      Caucasian  333
106.02   6645   483     3      82   15         Female  Yes      Yes      Asian      903
104.59   7075   514     4      71   11         Male    No       No       Asian      580
148.92   9504   681     3      36   11         Female  No       No       Asian      964
55.882   4897   357     2      68   16         Male    No       Yes      Caucasian  331



Qualitative Predictors

If the predictor takes only two values, then we create an indicator or dummy variable that takes on two possible numerical values. For example, for gender we create a new variable:

\[ x_i = \begin{cases} 1 & \text{if the } i\text{th person is female} \\ 0 & \text{if the } i\text{th person is male} \end{cases} \]

We then use this variable as a predictor in the regression equation:

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i & \text{if the } i\text{th person is female} \\ \beta_0 + \epsilon_i & \text{if the } i\text{th person is male} \end{cases} \]
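A minimal sketch of this encoding, using made-up balances (not the actual Credit data). With a single binary dummy, OLS simply recovers the two group means:

```python
import numpy as np

# Made-up balances and a 0/1 dummy (1 = female), following the slide's coding.
balance = np.array([333.0, 903.0, 580.0, 964.0, 331.0, 1151.0])
x = np.array([0, 1, 0, 1, 0, 1])  # dummy variable x_i

X = np.column_stack([np.ones_like(balance), x])
beta0, beta1 = np.linalg.lstsq(X, balance, rcond=None)[0]

# beta0 is the male group mean; beta0 + beta1 is the female group mean,
# so beta1 is the average difference between the groups.
```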





Qualitative Predictors

Question: What is the interpretation of \(\beta_0\) and \(\beta_1\)?

• \(\beta_0\) is the average credit card balance among males,
• \(\beta_0 + \beta_1\) is the average credit card balance among females,
• and \(\beta_1\) is the average difference in credit card balance between females and males.

Example: Calculate \(\hat\beta_0\) and \(\hat\beta_1\) for the Credit data.

You should find \(\hat\beta_0 \approx \$509\) and \(\hat\beta_1 \approx \$19\).



More than two levels: One hot encoding

Often, the qualitative predictor takes more than two values (e.g. ethnicity in the Credit data).

In this situation, a single dummy variable cannot represent all possible values.

We create additional dummy variables:

\[ x_{i,1} = \begin{cases} 1 & \text{if the } i\text{th person is Asian} \\ 0 & \text{if the } i\text{th person is not Asian} \end{cases} \]

\[ x_{i,2} = \begin{cases} 1 & \text{if the } i\text{th person is Caucasian} \\ 0 & \text{if the } i\text{th person is not Caucasian} \end{cases} \]
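A sketch of this two-dummy coding for a three-level variable, with made-up labels. The level represented by all-zero dummies ("African American" here, matching the slide) becomes the baseline:

```python
import numpy as np

# Made-up ethnicity labels; "African American" is the baseline level
# (both dummies zero), matching the slide's coding.
ethnicity = np.array(["Caucasian", "Asian", "Asian",
                      "Asian", "Caucasian", "African American"])

x1 = (ethnicity == "Asian").astype(float)      # x_{i,1}
x2 = (ethnicity == "Caucasian").astype(float)  # x_{i,2}
X = np.column_stack([np.ones(len(ethnicity)), x1, x2])
```

In general, a qualitative predictor with K levels needs K − 1 dummy variables once an intercept is in the model; in practice `pandas.get_dummies(..., drop_first=True)` produces this coding.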

We then use these variables as predictors, and the regression equation becomes:

\[ y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i & \text{if the } i\text{th person is Asian} \\ \beta_0 + \beta_2 + \epsilon_i & \text{if the } i\text{th person is Caucasian} \\ \beta_0 + \epsilon_i & \text{if the } i\text{th person is African American} \end{cases} \]

Question: What is the interpretation of \(\beta_0\), \(\beta_1\), \(\beta_2\)?



Collinearity
Collinearity and multicollinearity refer to the case in which two or more predictors are correlated (related). In the Credit data, for example, limit and rating are highly correlated.

As a result, the regression coefficients are not uniquely determined. In turn, this hurts the interpretability of the model, since the coefficients are not unique and carry influences from other features.

Both limit and rating have positive coefficients, but it is hard to tell whether the balance is higher because of the rating or because of the limit. If we remove limit, we achieve almost the same model performance, but the coefficients change.
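This behavior can be simulated. The sketch below uses synthetic data in which rating is essentially a rescaled copy of limit (all coefficients made up): dropping rating barely changes the fit, but the coefficient on limit moves:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
limit = rng.uniform(1000, 10000, n)
rating = limit / 10 + rng.normal(0, 5, n)    # nearly a rescaled copy of limit
balance = 0.05 * limit + rng.normal(0, 30, n)

def fit(X, y):
    """OLS fit with intercept; returns coefficients and training MSE."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(X1, y, rcond=None)[0]
    return b, np.mean((y - X1 @ b) ** 2)

b_both, mse_both = fit(np.column_stack([limit, rating]), balance)
b_only, mse_only = fit(limit.reshape(-1, 1), balance)

# mse_both and mse_only are nearly identical, yet b_both splits the same
# signal between two columns, so the individual slopes are unstable.
```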
Beyond linearity

So far we assumed:

• a linear relationship between X and Y;
• that the residuals \( r_i = y_i - \hat y_i \) are uncorrelated (taking the average of the squared residuals to calculate the MSE implicitly assumes uncorrelated residuals).

These assumptions need to be verified using the data, by visually inspecting the residuals.



Residual Analysis

If the correct model is not linear, then

\[ y = \beta_0 + \beta_1 x + \phi(x) + \epsilon, \]

while our model, which assumes a linear relationship, is

\[ \hat y = \hat\beta_0 + \hat\beta_1 x. \]

Then the residuals, \( r = y - \hat y = \epsilon + \phi(x) \), are not independent of x.

In residual analysis, we typically create two types of plots:

1. a plot of \( r_i \) with respect to \( x_i \) or \( \hat y_i \). This allows us to compare the distribution of the noise at different values of \( x_i \) or \( \hat y_i \).
2. a histogram of \( r_i \). This allows us to explore the distribution of the noise independently of \( x_i \) or \( \hat y_i \).
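A numerical stand-in for plot (1), on synthetic data: when the true relationship contains a quadratic term, \( \phi(x) = x^2 \), the residuals of a linear fit retain structure. Since OLS residuals are orthogonal to x itself by construction, the sketch probes for leftover structure by correlating them with x² instead:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 200)
y_lin = 1 + 2 * x + rng.normal(0, 0.3, x.size)           # truly linear
y_quad = 1 + 2 * x + x**2 + rng.normal(0, 0.3, x.size)   # phi(x) = x^2

def linear_residuals(x, y):
    """Residuals of a simple linear fit y ~ beta0 + beta1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

r_lin = linear_residuals(x, y_lin)
r_quad = linear_residuals(x, y_quad)

# Near zero for the linear data; close to 1 for the quadratic data.
corr_lin = np.corrcoef(x**2, r_lin)[0, 1]
corr_quad = np.corrcoef(x**2, r_quad)[0, 1]
```

In practice one would plot `r_lin` and `r_quad` against `x` (and histogram them); the correlations here are just a scriptable proxy for what the eye sees in those plots.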
Residual Analysis

Left panel: the linear assumption is correct. There is no obvious relationship between the residuals and x, and the histogram of the residuals is symmetric and normally distributed.

Right panel: the linear assumption is incorrect. There is an obvious relationship between the residuals and x, and the histogram of the residuals is symmetric but not normally distributed.

Note: for multiple regression, we plot the residuals against the predicted values \( \hat y \), since there are too many x's and that could wash out the relationship.
Beyond linearity: synergy effect or interaction effect

We also assumed that the average effect on sales of a one-unit increase in TV is always \(\beta_1\), regardless of the amount spent on radio.

A synergy effect, or interaction effect, occurs when an increase in the radio budget changes the effectiveness of TV spending on sales.

We change

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \]

to:

\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon \]
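Fitting the interaction model only requires appending the product column \( X_1 X_2 \) to the design matrix. A sketch on synthetic data (true coefficients made up, not the actual advertising fit):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
# True model includes a synergy term (all coefficients made up).
sales = (3 + 0.04 * tv + 0.1 * radio + 0.001 * tv * radio
         + rng.normal(0, 0.5, n))

# Design matrix with the product column X1 * X2 appended.
X = np.column_stack([np.ones(n), tv, radio, tv * radio])
beta_hat, *_ = np.linalg.lstsq(X, sales, rcond=None)
```

Under this model, the effect of one more unit of TV is \( \beta_1 + \beta_3 \times \mathit{Radio} \); that is, it now depends on the radio budget.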





What does it mean?

For the Credit data with a student dummy \( x_{i,\text{student}} \), without the interaction term:

\[ \mathit{Balance} = \begin{cases} \beta_0 + \beta_1 \times \mathit{Income} & \text{if } x_{i,\text{student}} = 0 \\ (\beta_0 + \beta_2) + \beta_1 \times \mathit{Income} & \text{if } x_{i,\text{student}} = 1 \end{cases} \]

With the interaction term:

\[ \mathit{Balance} = \begin{cases} \beta_0 + \beta_1 \times \mathit{Income} & \text{if } x_{i,\text{student}} = 0 \\ (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \mathit{Income} & \text{if } x_{i,\text{student}} = 1 \end{cases} \]

Without the interaction, only the intercept differs for students; with it, the slope on Income changes as well.
Too many predictors, collinearity, and too many interaction terms lead to OVERFITTING!
