Lecture 3-1
Predictor Variables

X (the p predictors): predictors, features, covariates
Y: outcome, response variable, dependent variable
Multilinear Models
In practice, it is unlikely that any response variable Y depends solely on one predictor x.
Rather, we expect that Y is a function of multiple predictors, $f(X_1, \dots, X_J)$. Using the
notation we introduced last lecture,
$$
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix}
1 & x_{1,1} & \cdots & x_{1,J} \\
1 & x_{2,1} & \cdots & x_{2,J} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & \cdots & x_{n,J}
\end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_J \end{pmatrix},
$$

$$
Y = X\beta + \epsilon
$$
We will again choose the MSE as our loss function, which can be
expressed in vector notation as

$$
\operatorname{MSE}(\beta) = \frac{1}{n}\,\lVert Y - X\beta \rVert^2
$$
Minimizing the MSE using vector calculus yields

$$
\hat{\beta} = \left(X^\top X\right)^{-1} X^\top Y = \underset{\beta}{\operatorname{argmin}}\ \operatorname{MSE}(\beta).
$$
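As an illustrative sketch (not part of the original slides), the closed-form estimate above can be computed directly with NumPy; the data below is randomly generated just for the example.

```python
import numpy as np

# Sketch: a toy dataset with n observations and J predictors
rng = np.random.default_rng(0)
n, J = 100, 3
X_raw = rng.normal(size=(n, J))
beta_true = np.array([2.0, -1.0, 0.5, 3.0])        # intercept followed by J slopes
X = np.column_stack([np.ones(n), X_raw])           # prepend a column of ones for the intercept
Y = X @ beta_true + rng.normal(scale=0.1, size=n)  # Y = X beta + noise

# Closed-form OLS estimate: beta_hat = (X^T X)^{-1} X^T Y
# (solving the normal equations is numerically preferable to an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# MSE(beta_hat) = (1/n) * ||Y - X beta_hat||^2
mse = np.mean((Y - X @ beta_hat) ** 2)
print(beta_hat, mse)
```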
Multiple Linear Regression
Qualitative Predictors
An example row from the credit data:
Income 14.890, Limit 3606, Rating 283, Cards 2, Age 34, Education 11, Gender Male, Student No, Married Yes, Ethnicity Caucasian, Balance 333
Often, the qualitative predictor takes more than two values (e.g. ethnicity
in the credit data).
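As a sketch (not from the slides), a qualitative predictor with more than two levels is commonly encoded with dummy (0/1) variables, for example using pandas; the small DataFrame below only mimics the credit data's columns.

```python
import pandas as pd

# Sketch: a few made-up rows standing in for the credit data's Ethnicity column
df = pd.DataFrame({
    "Income": [14.890, 106.025, 104.593],
    "Ethnicity": ["Caucasian", "Asian", "African American"],
})

# One 0/1 column per level, dropping one level as the baseline,
# so a K-level qualitative predictor contributes K-1 columns to the design matrix.
dummies = pd.get_dummies(df["Ethnicity"], prefix="Ethnicity", drop_first=True, dtype=int)
X = pd.concat([df[["Income"]], dummies], axis=1)
print(X)
```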
So far we assumed:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon
$$

We change this to include an interaction term:

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon
$$
Without the interaction term, the two groups differ only in their intercept:

$$
x_{\text{student}} =
\begin{cases}
0: & \text{Balance} = \beta_0 + \beta_1 \times \text{Income} \\
1: & \text{Balance} = (\beta_0 + \beta_2) + \beta_1 \times \text{Income}
\end{cases}
$$

With the interaction term, the slope differs between the groups as well:

$$
x_{\text{student}} =
\begin{cases}
0: & \text{Balance} = \beta_0 + \beta_1 \times \text{Income} \\
1: & \text{Balance} = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \times \text{Income}
\end{cases}
$$
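As a minimal sketch (not from the slides), such an interaction between Income and a student dummy could be fit with the statsmodels formula API; the tiny DataFrame here is made up purely for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Sketch: toy data standing in for the credit data's Income, Student, Balance columns
df = pd.DataFrame({
    "Income":  [14.9, 106.0, 104.6, 148.9, 55.9, 80.2],
    "Student": ["No", "Yes", "No", "Yes", "No", "Yes"],
    "Balance": [333, 903, 580, 964, 331, 1151],
})

# 'Income * Student' expands to Income + Student + Income:Student, i.e.
# Balance = b0 + b1*Income + b2*x_student + b3*(Income * x_student) + error
model = smf.ols("Balance ~ Income * Student", data=df).fit()
print(model.params)
```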
Too many predictors, collinearity, and too many
interaction terms lead to OVERFITTING!