
3. Multiple Regression Analysis


The general linear regression with k explanatory variables is just an extension of the simple regression as follows:

(1)   y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i.

Because

(2)   \partial y_i / \partial x_{ij} = \beta_j,   j = 1, \ldots, k,

coefficient \beta_j indicates the marginal effect of variable x_j: the amount y is expected to change as x_j changes by one unit while the other variables are kept constant (ceteris paribus).

Multiple regression opens up several additional options to enrich the analysis and make the modeling more realistic compared to simple regression.
Example 3.1: Consider the hourly wage example. Enhance the model as

(3)   \log(w) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3,

where w = average hourly earnings, x_1 = years of education (educ), x_2 = years of labor market experience (exper), and x_3 = years with the current employer (tenure).
Dependent Variable: LOG(WAGE)
Method: Least Squares
Date: 08/21/12 Time: 09:16
Sample: 1 526
Included observations: 526
Variable Coefficient Std. Error t-Statistic Prob.
C 0.284360 0.104190 2.729230 0.0066
EDUC 0.092029 0.007330 12.55525 0.0000
EXPER 0.004121 0.001723 2.391437 0.0171
TENURE 0.022067 0.003094 7.133070 0.0000
R-squared 0.316013 Mean dependent var 1.623268
Adjusted R-squared 0.312082 S.D. dependent var 0.531538
S.E. of regression 0.440862 Akaike info criterion 1.207406
Sum squared resid 101.4556 Schwarz criterion 1.239842
Log likelihood -313.5478 Hannan-Quinn criter. 1.220106
F-statistic 80.39092 Durbin-Watson stat 1.768805
Prob(F-statistic) 0.000000
For example, the coefficient 0.092 means that, holding exper and tenure fixed, another year of education is predicted to increase the wage by approximately 9.2%.

Staying another year at the same firm (educ fixed, \Delta exper = \Delta tenure = 1) is expected to result in a salary increase of approximately 0.4% + 2.2% = 2.6%.
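The output above can be reproduced with any standard OLS routine. A minimal Python sketch using statsmodels is given below; the file name wage1.csv and its column names (wage, educ, exper, tenure) are assumptions about how the 526 observations might be stored.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Load the wage data; the file name and column names are assumptions.
df = pd.read_csv("wage1.csv")

# Dependent variable: log of average hourly earnings.
y = np.log(df["wage"])

# Explanatory variables: education, experience, tenure, plus a constant term.
X = sm.add_constant(df[["educ", "exper", "tenure"]])

# Fit by ordinary least squares and print the usual coefficient table.
results = sm.OLS(y, X).fit()
print(results.summary())
```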
Example 3.2: Consider the consumption function C = f(Y), where Y is income. Suppose the assumption is that as income grows the marginal propensity to consume decreases.

In simple regression we could try to fit a level-log model or a log-log model. One possibility also could be

\beta_1 = \beta_{1l} + \beta_{1q} Y,

where according to our hypothesis \beta_{1q} < 0. Thus the consumption function becomes

C = \beta_0 + (\beta_{1l} + \beta_{1q} Y) Y + u = \beta_0 + \beta_{1l} Y + \beta_{1q} Y^2 + u.

This is a multiple regression model with x_1 = Y and x_2 = Y^2.

This simple example demonstrates that we can meaningfully enrich simple regression analysis (even though we have essentially only two variables, C and Y) and at the same time get a meaningful interpretation for the above polynomial model.

The response of C to a one unit change in Y is now

\partial C / \partial Y = \beta_{1l} + 2\beta_{1q} Y.
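A minimal sketch of fitting such a quadratic consumption function in Python is given below; the data are simulated, and all numerical values are assumptions chosen only to illustrate a diminishing marginal propensity to consume.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data for illustration only; the coefficient values are assumptions.
n = 200
Y = rng.uniform(10, 100, size=n)                         # income
C = 5 + 0.9 * Y - 0.002 * Y**2 + rng.normal(0, 2, n)     # consumption, MPC falls with income

# Multiple regression with x1 = Y and x2 = Y^2.
X = sm.add_constant(np.column_stack([Y, Y**2]))
fit = sm.OLS(C, X).fit()
b0, b1l, b1q = fit.params

# Estimated marginal propensity to consume: dC/dY = b1l + 2*b1q*Y.
print("MPC at Y = 20:", b1l + 2 * b1q * 20)
print("MPC at Y = 80:", b1l + 2 * b1q * 80)
```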
Estimation

In order to estimate the model we replace classical assumption 3 as follows:

3. None of the independent variables is constant, and no observation vector of any independent variable can be written as a linear combination of the observation vectors of the other independent variables.

The estimation method again is OLS, which produces the estimates \hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k by minimizing

(4)   \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik})^2

with respect to the parameters.

Again the first order solution is to set the (k+1) partial derivatives equal to zero. The solution is straightforward, although the explicit forms of the estimators become complicated.
Matrix form

Using matrix algebra simplifies the notation in multiple regression considerably.

Denote the observation vector on y as

(5)   y = (y_1, \ldots, y_n)',

where the prime denotes transposition.

In the same manner, denote the data matrix on the x-variables, enhanced with ones in the first column, as the n \times (k+1) matrix

(6)   X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix},

where k < n (the number of observations, n, is larger than the number of x-variables, k).
Then we can present the whole set of regression equations for the sample,

(7)   y_1 = \beta_0 + \beta_1 x_{11} + \cdots + \beta_k x_{1k} + u_1
      y_2 = \beta_0 + \beta_1 x_{21} + \cdots + \beta_k x_{2k} + u_2
      \vdots
      y_n = \beta_0 + \beta_1 x_{n1} + \cdots + \beta_k x_{nk} + u_n,

in the matrix form as

(8)   \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
      \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}
      \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} +
      \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix},

or shortly

(9)   y = Xb + u,

where b = (\beta_0, \beta_1, \ldots, \beta_k)' is the parameter vector and u = (u_1, u_2, \ldots, u_n)' is the error vector.
The normal equations for the first order conditions of the minimization of (4) in matrix form are simply

(10)   X'X\,\hat{b} = X'y,

which gives the explicit solution for the OLS estimator of b as

(11)   \hat{b} = (X'X)^{-1} X'y,

where \hat{b} = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)' and the existence of (X'X)^{-1} is guaranteed by assumption 3.

The fitted model is

(12)   \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik},   i = 1, \ldots, n.
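Formula (11) translates directly into matrix code. The following is a minimal numpy sketch with simulated data (the dimensions and coefficient values are assumptions); np.linalg.solve is used on the normal equations instead of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a small data set: n observations, k explanatory variables.
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1), first column of ones
b_true = np.array([1.0, 0.5, -2.0, 0.3])                     # assumed population coefficients
y = X @ b_true + rng.normal(scale=0.5, size=n)

# OLS estimator b_hat = (X'X)^{-1} X'y, computed by solving the normal equations (10).
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fitted = X @ b_hat                                          # fitted values, equation (12)
print(b_hat)
```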
Remark 3.1: In general, single and multiple regression do not produce the same parameter estimates for the same independent variables. For example, if you fit the simple regression \tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1, where \tilde\beta_0 and \tilde\beta_1 are the OLS estimators, and fit the multiple regression \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2, then it turns out that \tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1, where \tilde\delta_1 is the slope coefficient from regressing x_2 on x_1. This implies that \tilde\beta_1 \ne \hat\beta_1 unless \hat\beta_2 = 0 or x_1 and x_2 are uncorrelated.
Goodness-of-Fit

Again, in the same manner as with the simple regression, we have

(13)   \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2

or

(14)   SST = SSE + SSR,

where

(15)   \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_k x_{ik}.
R-square:

R-square again denotes the sample variation of the fitted \hat{y}_i's as a proportion of the sample variation of the original y_i's:

(16)   R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}.

Again, as in the case of simple regression, R^2 can be shown to be the squared correlation coefficient between the actual y_i and the fitted \hat{y}_i. This correlation is called the multiple correlation:

(17)   R = \frac{\sum (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum (y_i - \bar{y})^2 \sum (\hat{y}_i - \bar{\hat{y}})^2}}.

Remark 3.2: \bar{\hat{y}} = \bar{y}.

Remark 3.3: R^2 never decreases when an explanatory variable is added to the model.
Adjusted R-square:

(18)   \bar{R}^2 = 1 - \frac{s_u^2}{s_y^2} = 1 - \frac{n-1}{n-k-1}(1 - R^2),

where

(19)   s_u^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2

is an estimator of the error variance \sigma_u^2 = \mathrm{Var}[u_i].

The square root of (19) is the so called standard error of the regression.
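The quantities in (13)-(18) can be computed directly from an OLS fit. A minimal helper sketch follows, assuming y is a numpy array and X already contains the constant column (so k equals the number of columns of X minus one).

```python
import numpy as np

def r_squared_stats(y, X, b_hat):
    """R-squared (16) and adjusted R-squared (18) for an OLS fit."""
    n, k_plus_1 = X.shape
    k = k_plus_1 - 1
    y_fitted = X @ b_hat
    ssr = np.sum((y - y_fitted) ** 2)                   # sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)                   # total sum of squares
    r2 = 1.0 - ssr / sst                                # (16)
    r2_adj = 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)   # (18)
    return r2, r2_adj
```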
3.3 Expected values of the OLS estimators

Given observation tuples (x_{i1}, x_{i2}, \ldots, x_{ik}, y_i), i = 1, \ldots, n, the classical assumptions now read:

Assumptions (classical assumptions):

1. y = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + u in the population.

2. {(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i)} is a random sample from the model above, implying uncorrelated residuals: Cov(u_i, u_j) = 0 for all i \ne j.

3. All independent variables, including the vector of constants, are linearly independent, implying that (X'X)^{-1} exists.

4. E[u | x_1, \ldots, x_k] = 0, implying E[u] = 0 and Cov(u, x_1) = \cdots = Cov(u, x_k) = 0.

5. Var[u | x_1, \ldots, x_k] = \sigma_u^2, implying Var[u] = \sigma_u^2.

Under these assumptions we can show that the estimators of the regression coefficients are unbiased. That is,

Theorem 3.1: Given observations on the x-variables,

(20)   E[\hat\beta_j] = \beta_j,   j = 0, 1, \ldots, k.
Using matrix notation, the proof of Theorem 3.1 is pretty straightforward. To do this, write

(21)   \hat{b} = (X'X)^{-1} X'y
             = (X'X)^{-1} X'(Xb + u)
             = (X'X)^{-1} X'X b + (X'X)^{-1} X'u
             = b + (X'X)^{-1} X'u.

Given X, the expected value of \hat{b} is

(22)   E[\hat{b}] = b + (X'X)^{-1} X' E[u] = b,

because by Assumption 4, E[u] = 0. Thus the OLS estimators of the regression coefficients are unbiased.

Remark 3.4: If z = (z_1, \ldots, z_n)' is a random vector, then E[z] = (E[z_1], \ldots, E[z_n])'. That is, the expected value of a vector is a vector whose components are the individual expected values.
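Theorem 3.1 can also be illustrated with a small Monte Carlo experiment: keep X fixed, redraw the error vector u repeatedly, re-estimate b, and average the estimates. A minimal sketch (all numerical values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design matrix
b_true = np.array([1.0, 2.0, -1.0])

estimates = []
for _ in range(5000):
    u = rng.normal(size=n)                  # new error draw in each replication
    y = X @ b_true + u
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))

# By Theorem 3.1 the Monte Carlo average of b_hat should be close to b_true.
print(np.mean(estimates, axis=0))
```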
Irrelevant variables in a regression

Suppose the correct model is

(23)   y = \beta_0 + \beta_1 x_1 + u,

but we estimate the model as

(24)   y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u.

Thus \beta_2 = 0 in reality. The OLS estimation results yield

(25)   \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2.

By Theorem 3.1, E[\hat\beta_j] = \beta_j; thus in particular E[\hat\beta_2] = \beta_2 = 0, implying that the inclusion of extra variables in a regression does not bias the results.

However, as will be seen later, they decrease the accuracy of estimation by increasing the variances of the OLS estimates.
Omitted variable bias

Suppose now, as an example, that the correct model is

(26)   y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,

but we misspecify the model as

(27)   y = \beta_0 + \beta_1 x_1 + v,

where the omitted variable is embedded into the residual term v = \beta_2 x_2 + u.

The OLS estimator of \beta_1 for specification (27) is

(28)   \tilde\beta_1 = \frac{\sum (x_{i1} - \bar{x}_1) y_i}{\sum (x_{i1} - \bar{x}_1)^2}.

From Equation (2.37) we have

(29)   \tilde\beta_1 = \beta_1 + \sum_{i=1}^{n} a_i v_i,

where

(30)   a_i = \frac{x_{i1} - \bar{x}_1}{\sum (x_{i1} - \bar{x}_1)^2}.
Thus, because E[v_i] = E[\beta_2 x_{i2} + u_i] = \beta_2 x_{i2},

(31)   E[\tilde\beta_1] = \beta_1 + \sum a_i E[v_i]
                      = \beta_1 + \sum a_i \beta_2 x_{i2}
                      = \beta_1 + \beta_2 \frac{\sum (x_{i1} - \bar{x}_1) x_{i2}}{\sum (x_{i1} - \bar{x}_1)^2},

i.e.,

(32)   E[\tilde\beta_1] = \beta_1 + \beta_2 \tilde\delta_1,

where \tilde\delta_1 is the slope coefficient of regressing x_2 on x_1, implying that \tilde\beta_1 is biased for \beta_1 unless x_1 and x_2 are uncorrelated (or \beta_2 = 0). This is called the omitted variable bias.

The direction of the omitted variable bias is as follows:

                  Corr(x_1, x_2) > 0    Corr(x_1, x_2) < 0
\beta_2 > 0       positive bias         negative bias
\beta_2 < 0       negative bias         positive bias
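The bias formula (32) is easy to check by simulation. In the sketch below (all numerical values are assumptions) x_2 depends on x_1 with slope 0.8, so the short regression that omits x_2 should be biased by roughly \beta_2 \times 0.8.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed population values for the illustration.
b0, b1, b2 = 1.0, 2.0, 3.0
n = 500

biased, unbiased = [], []
for _ in range(2000):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)        # x1 and x2 positively correlated
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)

    # Short regression: omit x2, as in specification (27).
    X_short = np.column_stack([np.ones(n), x1])
    biased.append(np.linalg.solve(X_short.T @ X_short, X_short.T @ y)[1])

    # Long regression: include both regressors, as in specification (26).
    X_long = np.column_stack([np.ones(n), x1, x2])
    unbiased.append(np.linalg.solve(X_long.T @ X_long, X_long.T @ y)[1])

# The short-regression slope is biased upward by about b2 * 0.8 = 2.4; the long one is not.
print("mean of beta1_tilde (x2 omitted):", np.mean(biased))
print("mean of beta1_hat (x2 included):", np.mean(unbiased))
```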
3.4 The variance of OLS estimators

Write the regression model

(33)   y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i,   i = 1, \ldots, n,

in the matrix form

(34)   y = Xb + u.

Then we can write the OLS estimators compactly as

(35)   \hat{b} = (X'X)^{-1} X'y.
Under the classical assumptions 1-5, and assuming X fixed, we can show that the variance-covariance matrix of \hat{b} is

(36)   Cov[\hat{b}] = \sigma_u^2 (X'X)^{-1}.

Variances of the individual coefficients are obtained from the main diagonal of the matrix, and can be shown to be of the form

(37)   Var[\hat\beta_j] = \frac{\sigma_u^2}{(1 - R_j^2)\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2},   j = 1, \ldots, k,

where R_j^2 is the R-square from regressing x_j on the other explanatory variables and the constant term.
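Formulas (36) and (37) give the same individual variances, which the following sketch verifies on simulated data (the design and the error variance are assumptions). Note that 1/(1 - R_j^2) is the familiar variance inflation factor.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated design with two correlated regressors (the values are assumptions).
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + rng.normal(scale=0.4, size=n)
X = np.column_stack([np.ones(n), x1, x2])
sigma2_u = 1.0                                            # assumed error variance

# (36): full variance-covariance matrix of b_hat.
cov_b = sigma2_u * np.linalg.inv(X.T @ X)

# (37): variance of beta_1_hat via R_1^2 from regressing x1 on the other regressors.
Z = np.column_stack([np.ones(n), x2])                     # constant + the other regressor
gamma = np.linalg.solve(Z.T @ Z, Z.T @ x1)
resid = x1 - Z @ gamma
R2_1 = 1.0 - resid @ resid / np.sum((x1 - x1.mean()) ** 2)
var_b1 = sigma2_u / ((1.0 - R2_1) * np.sum((x1 - x1.mean()) ** 2))

print(cov_b[1, 1], var_b1)                                # the two expressions agree
```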
Multicollinearity

In terms of linear algebra, we say that the vectors x_1, x_2, \ldots, x_k are linearly independent if

a_1 x_1 + \cdots + a_k x_k = 0

holds only if a_1 = \cdots = a_k = 0. Otherwise x_1, \ldots, x_k are linearly dependent. In such a case some a_\ell \ne 0 and we can write

x_\ell = c_1 x_1 + \cdots + c_{\ell-1} x_{\ell-1} + c_{\ell+1} x_{\ell+1} + \cdots + c_k x_k,

where c_j = -a_j / a_\ell; that is, x_\ell can be represented as a linear combination of the other variables.
In statistics the multiple correlation measures the degree of linear dependence. If the variables are perfectly linearly dependent, that is, if for example x_j is a linear combination of the other variables, then the multiple correlation R_j = 1.

A perfect linear dependence is rare between random variables. However, particularly between macroeconomic variables, dependencies are often high.
From the variance equation (37),

Var[\hat\beta_j] = \frac{\sigma_u^2}{(1 - R_j^2)\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2},

we see that Var[\hat\beta_j] \to \infty as R_j^2 \to 1. That is, the more linearly dependent the explanatory variables are, the larger the variance becomes. This implies that the coefficient estimates become increasingly unstable.

High (but not perfect) correlation between two or more explanatory variables is called multicollinearity.
Symptoms of multicollinearity:

(1) High correlations between explanatory variables.

(2) R^2 is relatively high, but the coefficient estimates tend to be insignificant (see the section on hypothesis testing).

(3) Some of the coefficients are of the wrong sign and some coefficients are at the same time unreasonably large.

(4) Coefficient estimates change much from one model alternative to another.
Example 3.3: Variable E_t denotes cost expenditures in a sample of Toyota Mark II cars at time point t, M_t denotes the mileage, and A_t the age.

Consider the model alternatives:

Model A:   E_t = \alpha_0 + \alpha_1 A_t + u_{1t}
Model B:   E_t = \beta_0 + \beta_1 M_t + u_{2t}
Model C:   E_t = \gamma_0 + \gamma_1 M_t + \gamma_2 A_t + u_{3t}
Estimation results (t-values in parentheses):

Variable     Model A     Model B     Model C
Constant     -626.24     -796.07        7.29
             (-5.98)     (-5.91)      (0.06)
Age             7.35                   27.58
              (22.16)                 (9.58)
Miles                      53.45     -151.15
                         (18.27)     (-7.06)
df                55          55          54
Adj. R^2       0.897       0.856       0.946
               368.6       437.0       268.3
Findings:

A priori, the slope coefficients \alpha_1, \beta_1, \gamma_1, and \gamma_2 should all be positive. However, the estimated coefficient on M_t in Model C is \hat\gamma_1 = -151.15 (!!?), whereas in Model B it is \hat\beta_1 = 53.45. The correlation between the two explanatory variables is r_{M,A} = 0.996!
Remedies:

In the collinearity problem the issue is that there is not enough information to reliably identify each variable's contribution as an explanatory variable in the model. Thus, in order to alleviate the problem:

(1) Use non-sample information, if available, to impose restrictions between coefficients.

(2) Increase the sample size if possible.

(3) Drop the most collinear variables (on the basis of R_j^2).

(4) If a linear combination (usually a sum) of the most collinear variables is meaningful, replace the collinear variables by the linear combination.
Variances of misspecified models

Consider again, as in (26), the regression model

(38)   y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u.

Suppose the following models are estimated by OLS:

(39)   \hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2

and

(40)   \tilde{y} = \tilde\beta_0 + \tilde\beta_1 x_1.

Then by (37)

(41)   Var[\hat\beta_1] = \frac{\sigma_u^2}{(1 - r_{12}^2)\sum (x_{i1} - \bar{x}_1)^2},

and in analogy to (2.40),

(42)   Var[\tilde\beta_1] = \frac{Var[\beta_2 x_2 + u]}{\sum (x_{i1} - \bar{x}_1)^2} = \frac{\sigma_u^2}{\sum (x_{i1} - \bar{x}_1)^2},

where r_{12} is the sample correlation between x_1 and x_2.
Thus Var[\hat\beta_1] \ge Var[\tilde\beta_1], and the inequality is strict if r_{12} \ne 0.

In summary (assuming r_{12} \ne 0):

(1) If \beta_2 \ne 0, then \tilde\beta_1 is biased, \hat\beta_1 is unbiased, and Var[\tilde\beta_1] < Var[\hat\beta_1].

(2) If \beta_2 = 0, then both \tilde\beta_1 and \hat\beta_1 are unbiased, but Var[\tilde\beta_1] < Var[\hat\beta_1].
Estimating the error variance \sigma_u^2

An unbiased estimator of the error variance Var[u] = \sigma_u^2 is

(43)   \hat\sigma_u^2 = \frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2,

where

(44)   \hat{u}_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_k x_{ik}.

The term n - k - 1 in (43) is the degrees of freedom (df).

It can be shown that

(45)   E[\hat\sigma_u^2] = \sigma_u^2,

i.e., \hat\sigma_u^2 is an unbiased estimator of \sigma_u^2.

\hat\sigma_u is called the standard error of the regression.
Standard errors of \hat\beta_j

The standard deviation of \hat\beta_j is the square root of (37), i.e.,

(46)   \sigma_{\hat\beta_j} = \sqrt{Var[\hat\beta_j]} = \frac{\sigma_u}{\sqrt{(1 - R_j^2)\sum (x_{ij} - \bar{x}_j)^2}}.

Substituting \sigma_u by its estimate \hat\sigma_u = \sqrt{\hat\sigma_u^2} gives the standard error of \hat\beta_j:

(47)   se(\hat\beta_j) = \frac{\hat\sigma_u}{\sqrt{(1 - R_j^2)\sum (x_{ij} - \bar{x}_j)^2}}.
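In matrix form, the standard errors (47) are the square roots of the diagonal elements of \hat\sigma_u^2 (X'X)^{-1}. A minimal sketch (assuming X contains the constant column):

```python
import numpy as np

def ols_standard_errors(y, X):
    """OLS coefficients and their standard errors, se(beta_hat_j) as in (47)."""
    n, k_plus_1 = X.shape
    b_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b_hat
    sigma2_hat = resid @ resid / (n - k_plus_1)       # (43): divide by n - k - 1
    cov_b = sigma2_hat * np.linalg.inv(X.T @ X)       # estimated Cov[b_hat], see (36)
    se = np.sqrt(np.diag(cov_b))
    return b_hat, se
```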
3.5 The Gauss-Markov Theorem

Theorem 3.2: Under the classical assumptions 1-5, \hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k are the best linear unbiased estimators (BLUEs) of \beta_0, \beta_1, \ldots, \beta_k, respectively.

BLUE:

Best: The variance of the OLS estimator is smallest among all linear unbiased estimators of \beta_j.

Linear: \hat\beta_j = \sum_{i=1}^{n} w_{ij} y_i.

Unbiased: E[\hat\beta_j] = \beta_j.