Classical Multiple Regression
Y = (y_1, ..., y_n)' is the n×1 vector of responses and X is the n×k matrix whose i-th row is (1, x_i1, ..., x_ip), i = 1, ..., n, so k = p + 1.
X is called the design matrix as though the researcher chose the values of x exogenously.
The critical assumption is actually that x is uncorrelated with the error ε.
Classical Linear Regression Assumptions
1. Y = Xβ + ε (linearity)
2. E[Y|X] = Xβ, or E[ε|X] = 0 (explanatory variables are exogenous, independent of the errors)
3. Var(Y|X) = σ²I (errors are iid: independent and identically distributed)
4. X is fixed
5. X has full column rank: n ≥ k and the columns of X are not linearly dependent
6. ε is normally distributed
Problems that might arise with these assumptions
1. wrong regressors, nonlinearity in the parameters, changing parameters
2. biased intercept
3. autocorrelation and heteroskedasticity
4. errors in variables, lagged values, simultaneous equation bias
5. multicollinearity
6. inappropriate tests
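As a concrete illustration, here is a minimal simulation sketch of a data set that satisfies the assumptions above (NumPy assumed; the sample size, coefficients, and noise level are arbitrary choices for illustration, not values from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 3                      # n observations, k = p + 1 columns (intercept plus p = 2 regressors)
beta = np.array([1.0, 2.0, -0.5])  # illustrative "latent" parameter vector
sigma = 1.5                        # illustrative error standard deviation

# Fixed design matrix with an intercept column and full column rank (assumptions 4 and 5).
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# iid normal errors, independent of X (assumptions 2, 3, and 6).
eps = rng.normal(scale=sigma, size=n)

# Linear data-generating process Y = X beta + eps (assumption 1).
Y = X @ beta + eps
```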
Least Squares Estimator
We want to know the latent values of the parameters β and σ², so we have to use
the data Y, X to create estimators. Start with β, and denote a guess of what β might be
by the letter b. If Y = Xb + e, then this is really a definition of the residual errors e
that result from the guess b. We want to make these small in a summed-squared sense:
min_b SSE = e'e = (Y - Xb)'(Y - Xb) = Y'Y - 2b'X'Y + b'X'Xb.
∂SSE/∂b = -2X'Y + 2X'Xb = 0, or
OLS estimator of β:
b = (X'X)⁻¹X'Y
In the special case of a single regressor, b = cov(X, Y)/var(X) = [cov(X, Y)/√(var(X)var(Y))]·√(var(Y)/var(X)) = r_XY (s_Y/s_X).
Hence, b is like a correlation between x and y when we do not standardize the scales of the variables.
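A short sketch of the estimator in code (NumPy assumed; the data-generating values are illustrative), computing b from the normal equations and confirming the single-regressor relation b = r_XY (s_Y/s_X):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 3.0 + 0.8 * x + rng.normal(scale=0.5, size=n)   # illustrative one-regressor model

X = np.column_stack([np.ones(n), x])                # design matrix with an intercept
b = np.linalg.solve(X.T @ X, X.T @ y)               # OLS: solve (X'X) b = X'Y

# In the single-regressor case the slope is cov(x, y)/var(x) = r_xy * (s_y / s_x).
slope_from_corr = np.corrcoef(x, y)[0, 1] * np.std(y, ddof=1) / np.std(x, ddof=1)
print(b[1], slope_from_corr)                        # the two slope values agree
```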
The residual vector e is by definition e = Y - Xb, or
e = Y - X(X'X)⁻¹X'Y = (I - X(X'X)⁻¹X')Y = MY,
where M = I - X(X'X)⁻¹X'. This matrix M is the centering matrix around the regression
line and is very much like the mean centering matrix H = I - 11'/n = I - 1(1'1)⁻¹1'.
Theorem: M is symmetric and idempotent (MM=M), tr(M)=n-k, MX=0.
Given the regression centering matrix M, the sum of squared errors is SSE = e'e = (MY)'(MY) = Y'M'MY = Y'MY.
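A numerical check of the theorem about M (NumPy assumed; the explicit inverse mirrors the formula rather than being an efficient implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)

# M = I - X (X'X)^{-1} X', the regression centering (residual-maker) matrix.
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(M, M.T))             # symmetric
print(np.allclose(M @ M, M))           # idempotent: MM = M
print(np.isclose(np.trace(M), n - k))  # tr(M) = n - k
print(np.allclose(M @ X, 0))           # MX = 0

e = M @ Y                              # residual vector e = MY
print(np.isclose(e @ e, Y @ M @ Y))    # SSE = e'e = Y'MY
```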
Theorem: E[b] = β, i.e., OLS is unbiased.
proof: E[b] = E[(X'X)⁻¹X'Y] = E[(X'X)⁻¹X'(Xβ + ε)] = E[β + (X'X)⁻¹X'ε]
= β + (X'X)⁻¹X'E[ε] = β.
Theorem: var[b] = σ²(X'X)⁻¹.
proof: var[b] = E[(b - β)(b - β)'] = E[((X'X)⁻¹X'ε)((X'X)⁻¹X'ε)']
= (X'X)⁻¹X'E[εε']X(X'X)⁻¹
= (X'X)⁻¹X'(σ²I)X(X'X)⁻¹ = σ²(X'X)⁻¹X'X(X'X)⁻¹ = σ²(X'X)⁻¹.
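A Monte Carlo sketch of the last two theorems (NumPy assumed; the replication count and parameter values are arbitrary): with X held fixed across repeated samples, the average of b should approach β and the sampling covariance of b should approach σ²(X'X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 20000
beta = np.array([1.0, 2.0, -0.5])
sigma = 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design, reused in every replication

XtX_inv = np.linalg.inv(X.T @ X)
bs = np.empty((reps, beta.size))
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    bs[r] = XtX_inv @ X.T @ Y            # b = (X'X)^{-1} X'Y

print(bs.mean(axis=0), beta)             # average of b is close to beta (unbiasedness)
print(np.cov(bs, rowvar=False))          # close to sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```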
Theorem: X'e = 0, the estimated errors are orthogonal to the data generating them.
Proof: X'e = X'MY = (X' - X'X(X'X)⁻¹X')Y = (X' - X')Y = 0Y = 0.
Now consider estimating σ². SSE = e'e = Y'MY = (Xβ + ε)'M(Xβ + ε) =
β'X'MXβ + 2ε'MXβ + ε'Mε. The first two terms are zero because MX = 0. Hence
e'e = ε'Mε = tr(ε'Mε) (note: the trace of a scalar is trivial) = tr(Mεε'). Given this, the
expected value of e'e is just tr(M E[εε']) = tr(Mσ²I) = σ²tr(M) = σ²(n-k). Hence
s² = e'e/(n-k) is an unbiased estimator of σ².
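A quick simulation check of this result, i.e. that E[e'e] = σ²(n-k) and so s² = e'e/(n-k) is unbiased (NumPy assumed; values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, reps = 60, 3, 20000
sigma = 2.0
beta = np.array([1.0, -1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

sse = np.empty(reps)
for r in range(reps):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    e = M @ Y
    sse[r] = e @ e

print(sse.mean(), sigma**2 * (n - k))    # E[e'e] = sigma^2 (n - k)
print((sse / (n - k)).mean(), sigma**2)  # so s^2 = e'e/(n - k) is unbiased for sigma^2
```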
R² = SSR/SST = 1 - SSE/SST = 1 - e'e/(Y'HY) = 1 - Y'MY/(Y'HY).
Adjusted R²: R̄² = 1 - [(n-1)/(n-k)](1 - R²).
Adding a variable with a t-stat > 1.0 will increase adjusted R². Notice: not t-stat > 1.96.
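A sketch computing R² and adjusted R² directly from the centering matrices M and H defined above (NumPy assumed; the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.7, -0.3]) + rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # regression centering matrix
H = np.eye(n) - np.ones((n, n)) / n                # mean centering matrix

sse = Y @ M @ Y                                    # e'e
sst = Y @ H @ Y                                    # total sum of squares around the mean
r2 = 1 - sse / sst
adj_r2 = 1 - (n - 1) / (n - k) * (1 - r2)
print(r2, adj_r2)
```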
The log likelihood contains the term -(Y - Xβ)'(Y - Xβ)/(2σ²); setting
∂ln L/∂σ² = -n/(2σ²) + (Y - Xβ)'(Y - Xβ)/(2σ⁴) = 0 gives σ̂²_MLE = e'e/n.
Note: divide by n, not n-k. Hence the MLE of σ² is biased.
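A small numerical comparison of the two estimators of σ² (NumPy assumed; simulated data): the MLE divides e'e by n, the unbiased estimator by n-k.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.7, -0.3]) + rng.normal(scale=2.0, size=n)

e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)  # OLS residuals
sigma2_mle = (e @ e) / n                       # MLE: divides by n, biased downward
s2 = (e @ e) / (n - k)                         # unbiased estimator: divides by n - k
print(sigma2_mle, s2)
```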
Theorem: If Y|X ~ N(Xβ, σ²I), then b ~ N(β, σ²(X'X)⁻¹), (n-k)s²/σ² ~ χ²_{n-k}, and b and s² are independent.
Confidence Intervals
Joint:
(b - β)'(X'X)(b - β) ≤ k s² F_{k,n-k}(α)
One at a time:
b_i ± SE(b_i) t_{n-k}(α/2)
Simultaneous:
b_i ± SE(b_i) √(k F_{k,n-k}(α))
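A sketch of the one-at-a-time and simultaneous intervals (NumPy and SciPy assumed; the 95% level and the simulated data are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 80, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.7, -0.3]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y
e = Y - X @ b
s2 = (e @ e) / (n - k)
se = np.sqrt(s2 * np.diag(XtX_inv))              # SE(b_i)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)    # one-at-a-time critical value
f_crit = stats.f.ppf(1 - alpha, dfn=k, dfd=n - k)

print(np.column_stack([b - t_crit * se, b + t_crit * se]))                            # one at a time
print(np.column_stack([b - np.sqrt(k * f_crit) * se, b + np.sqrt(k * f_crit) * se]))  # simultaneous
```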
Hypothesis Testing
H₀: Rβ = r, where R is q×k and r is q×1, imposing q linear restrictions on the k×1 vector β.
Let a be the OLS estimator of β subject to the above q restrictions: a minimizes e'e s.t. Ra = r. Let
b be the unconstrained OLS estimator of β. The likelihood ratio is λ = L_a/L_b. Define
LR = -2 ln(λ) = 2 ln(L_b) - 2 ln(L_a) = (1/σ²)(Y - Xa)'(Y - Xa) - (1/σ²)(Y - Xb)'(Y - Xb). If we
replace the unknown σ² with the unconstrained estimate e'e/(n-k) and divide the difference in sums
of squares by q, the resulting statistic is distributed F_{q,n-k}.
Hence we can test the restrictions Rβ = r by running the regression both with the constraints and
unconstrained and computing
F = [(SSE_constr - SSE_unconstr)/q] / [SSE_unconstr/(n-k)] ~ F_{q,n-k}.
The three classical tests approach the same restrictions from different directions:
Lagrange Multiplier Test: estimate only the constrained model; test whether the Lagrange multiplier on the constraint equals 0.
Likelihood Ratio Test: estimate both the constrained and unconstrained models; test whether ln(L_a) - ln(L_b) equals 0.
Wald Test: estimate only the unconstrained model; test whether Rb - r equals 0.
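A sketch of the constrained-versus-unconstrained F test (NumPy and SciPy assumed; the single restriction β₁ = β₂ and the simulated data are arbitrary examples, and the restricted estimator is computed with the standard restricted least squares formula, which the notes do not derive):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 0.6, 0.6]) + rng.normal(size=n)   # DGP happens to satisfy the restriction

# H0: R beta = r with q = 1 restriction, here beta_1 - beta_2 = 0.
R = np.array([[0.0, 1.0, -1.0]])
r = np.array([0.0])
q = R.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ Y                                    # unconstrained OLS
# Restricted least squares: pull b back onto the constraint set Ra = r.
a = b + XtX_inv @ R.T @ np.linalg.solve(R @ XtX_inv @ R.T, r - R @ b)

sse_unconstr = (Y - X @ b) @ (Y - X @ b)
sse_constr = (Y - X @ a) @ (Y - X @ a)

F = ((sse_constr - sse_unconstr) / q) / (sse_unconstr / (n - k))
p_value = stats.f.sf(F, dfn=q, dfd=n - k)                # compare with F_{q, n-k}
print(F, p_value)
```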