Chapter 3: Multiple Regression

Econometrics

Nguyen Van Quy

Data Science program - NEU

February 11, 2025



Multiple Regression Model



The need for multiple regressors?

Studying an economic relationship usually requires several independent variables.
Multiple regression is more flexible and better suited to different functional forms.
It gives better regression fits and predictions.
Examples: demand/supply, the Phillips curve, etc.



Model

Model with k explanatory variables

In population (PRF):
  E(yi) = β0 + β1 x1i + · · · + βk xki
  yi = β0 + β1 x1i + · · · + βk xki + ui

In sample (SRF):
  ŷi = β̂0 + β̂1 x1i + · · · + β̂k xki
  yi = β̂0 + β̂1 x1i + · · · + β̂k xki + ei

Intercept: β0 = E(y | x1 = 0, . . . , xk = 0)
Slope: βj = ∂E(y)/∂xj
If β1 = · · · = βk = 0: the model is overall insignificant.



Matrix form

y1 = β0 + β1 x11 + · · · + βk xk1 + u1
...
yn = β0 + β1 x1n + · · · + βk xkn + un

In matrix notation, with y = (y1, . . . , yn)′, β = (β0, β1, . . . , βk)′, u = (u1, . . . , un)′, and X the n × (k + 1) matrix whose i-th row is (1, x1i, . . . , xki):

y = Xβ + u
ŷ = Xβ̂
y = Xβ̂ + e



Interpret the coefficients

How do we interpret the equations below?

colGPA-hat = 1.29 + 0.453 hsGPA + 0.0094 ACT
colGPA-hat = 2.40 + 0.0271 ACT

Ceteris paribus interpretation: in the first equation, holding ACT fixed, one more point of high-school GPA predicts 0.453 more points of college GPA.
Changing more than one independent variable simultaneously: the predicted change in colGPA is the sum of the individual effects.



OLS estimation



OLS estimation
Find β̂j, j = 0, 1, . . . , k, that minimize

RSS = Σ_{i=1}^{n} ei² = Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²

In matrix form,

min_{β̂} S(β̂) = ‖y − Xβ̂‖²

We have S(β̂) = (y − Xβ̂)′(y − Xβ̂) = y′y − 2β̂′X′y + β̂′X′Xβ̂. Taking the first-order condition, one gets

X′Xβ̂ = X′y

If X has full column rank (no perfect multicollinearity among x1, . . . , xk), then

β̂_OLS = (X′X)⁻¹X′y    (1)



OLS fitted values

The sample average of the residuals is zero, and so the sample mean of the fitted values equals ȳ.
The sample covariance between each independent variable and the OLS residuals is zero. Consequently, the sample covariance between the OLS fitted values and the OLS residuals is zero.
The point (x̄1, x̄2, . . . , x̄k, ȳ) is always on the OLS regression line. A quick numerical check follows below.
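A minimal numerical check of these properties, as a sketch on simulated data (all names are illustrative):

set.seed(1)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2*x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

mean(resid(fit))                  # ~0: residuals average to zero
cov(fitted(fit), resid(fit))      # ~0: fitted values vs residuals
cov(x1, resid(fit))               # ~0: each regressor vs residuals
predict(fit, data.frame(x1 = mean(x1), x2 = mean(x2))) - mean(y)  # ~0: line passes through the means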



Geometric interpretation of OLS

We have
0 = X′y − X′Xβ̂ = X′e,
which means that e is perpendicular to every column of X, i.e., perpendicular to the vector space spanned by the columns of X.
The condition X′e = 0 is called the system of normal equations.
Notice that ŷ = Xβ̂ = P_X y, where P_X = X(X′X)⁻¹X′ is the orthogonal projector onto the vector space spanned by X.
Let M_X = I − P_X be the orthogonal projector onto the orthogonal complement of that space; then e = M_X y. A numerical sketch follows below.
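The projectors can be checked directly; a sketch continuing the simulated data above:

X <- cbind(1, x1, x2)                     # design matrix with a constant column
P <- X %*% solve(t(X) %*% X) %*% t(X)     # P_X: orthogonal projector onto col(X)
M <- diag(n) - P                          # M_X: projector onto the orthogonal complement
max(abs(P %*% y - fitted(fit)))           # ~0: yhat = P_X y
max(abs(M %*% y - resid(fit)))            # ~0: e = M_X y
max(abs(t(X) %*% (M %*% y)))              # ~0: normal equations X'e = 0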



OLS properties



Gauss - Markov assumptions

Gauss - Markov assumptions


1. (Linearity) y = β0 + β1 x1 + · · · + βk xk + u
2. (Zero conditional mean) E(u | X) = 0
3. (Random sampling) We have a random sample of n observations, {(xi1, xi2, . . . , xik, yi) : i = 1, . . . , n}.
4. (No perfect collinearity) Rank(X) = k + 1.
5. (Homoskedasticity) Var(ui) = σ² for all i = 1, . . . , n.
6. (No autocorrelation) Cov(ui, uj) = 0 for all i ≠ j.
7. (Normality) u ∼ N(0, σ²I).



Properties of OLS estimator

OLS estimator
Under Assumptions 1-4, the OLS estimator is unbiased: E(β̂_OLS) = β.
Under Assumptions 1-6, Var(β̂_OLS) = σ²(X′X)⁻¹.

Moreover,

Var(β̂j_OLS) = σ² / (TSSj (1 − Rj²))

where TSSj = Σi (xij − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all the other independent variables (including an intercept). A numerical check follows below.
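The variance formula can be verified against vcov() from lm; a sketch continuing the simulated fit above (s² replaces the unknown σ² in finite samples):

s2   <- sum(resid(fit)^2) / (n - 2 - 1)       # s^2 = RSS/(n - k - 1), here k = 2
TSS1 <- sum((x1 - mean(x1))^2)                # total sample variation in x1
R2.1 <- summary(lm(x1 ~ x2))$r.squared        # R^2 of x1 on the other regressors
s2 / (TSS1 * (1 - R2.1))                      # estimated Var(beta1-hat) from the formula
vcov(fit)["x1", "x1"]                         # matches the (x1, x1) entry of s^2 (X'X)^{-1}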



Properties of OLS estimator

BLUE
Under Assumptions 1-6, the OLS estimator β̂_OLS is the best linear unbiased estimator (BLUE) of β.

As n → ∞, β̂_OLS converges in probability to β (consistency).
Moreover, under Assumptions 1-6, the estimators are asymptotically normal.

MVUE
Under Assumptions 1-7, the OLS estimator β̂_OLS is also the minimum variance unbiased estimator of β.



Estimator of σ²

In Var(β̂), the variance of the random error, σ², is unknown; it is estimated by

s² = e′e / (n − (k + 1)) = Σ ei² / (n − k − 1)

Estimated variance-covariance matrix:

V̂ar(β̂) = s²(X′X)⁻¹

Standard error of an estimated coefficient:

Se(β̂j) = √V̂ar(β̂j)



Partialling out effect



Partialling Out interpretation

We focus on β̂1. We have

β̂1 = (Σ_{i=1}^{n} r̂i1 yi) / (Σ_{i=1}^{n} r̂i1²)

The residuals r̂i1 come from the regression of x1 on x2, . . . , xk.
So we can run a simple regression of y on r̂1 to obtain β̂1.
β̂1 then measures the effect of x1 on y after x2, . . . , xk have been partialled or netted out. A numerical check follows below.
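A minimal sketch of this partialling-out (Frisch-Waugh-Lovell) result on simulated data (names illustrative):

set.seed(2)
z1 <- rnorm(100); z2 <- rnorm(100) + 0.5*z1   # correlated regressors
w  <- 1 + 2*z1 + 3*z2 + rnorm(100)

r1 <- resid(lm(z1 ~ z2))            # residuals from regressing z1 on z2
coef(lm(w ~ r1))["r1"]              # slope of w on the residuals ...
coef(lm(w ~ z1 + z2))["z1"]         # ... equals beta1-hat from the multiple regression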



Simple and Multiple Regression Estimates

Consider k = 2. In the simple regression of y on x1, we have ỹ = β̃0 + β̃1 x1. In the multiple regression, ŷ = β̂0 + β̂1 x1 + β̂2 x2.
We have β̃1 = β̂1 + β̂2 δ̃1, where δ̃1 is the slope coefficient from the simple regression of x2 on x1.
The simple and multiple regression estimates are equal if
  the partial effect of x2 on ŷ is zero, i.e., β̂2 = 0, or
  x1 and x2 are uncorrelated in the sample, i.e., δ̃1 = 0.
Simple and multiple regression estimates are almost never identical, but we can use the formula above to characterize why they might be either very different or quite similar; see the sketch below.
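A quick check of the identity β̃1 = β̂1 + β̂2 δ̃1, continuing the simulated data from the previous sketch:

b1.simple <- coef(lm(w ~ z1))["z1"]               # beta1-tilde from the simple regression
b.mult    <- coef(lm(w ~ z1 + z2))                # beta-hats from the multiple regression
d1        <- coef(lm(z2 ~ z1))["z1"]              # delta1-tilde from regressing z2 on z1
b1.simple - (b.mult["z1"] + b.mult["z2"] * d1)    # ~0: the identity holds exactly in-sample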



Including Irrelevant Variables

One (or more) of the independent variables is included in the model even though it has no partial effect on y in the population.
Suppose we specify the model as

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

and this model satisfies Assumptions 1-4. However, x3 has no effect on y after x1 and x2 have been controlled for: β3 = 0.
There is no effect on the unbiasedness of any coefficient.
However, if x1 and x3 are highly correlated, then R1² is high, which leads to a large variance of β̂1.



Omitted Variables

Suppose we should regress y on x1 and x2, but instead we regress y on x1 only. Then the coefficient on x1 is generally biased.
The omitted variable bias is β2 δ̃1, where δ̃1 is the slope coefficient from the simple regression of x2 on x1.
The direction of the bias follows from the signs of β2 and δ̃1; a simulation sketch follows below.
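A simulation sketch of the bias under an assumed data-generating process (all numbers illustrative):

set.seed(3)
err <- replicate(2000, {
  x <- rnorm(50); z <- 0.8*x + rnorm(50)   # omitted regressor z is correlated with x
  yy <- 1 + 2*x + 3*z + rnorm(50)          # true model includes z
  coef(lm(yy ~ x))["x"] - 2                # estimation error of the short regression
})
mean(err)    # close to beta2 * delta1 = 3 * 0.8 = 2.4: positive bias, as the signs predict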



Omitted Variables

However, with three or more independent variables in the population model, omitting one is more problematic. For example, suppose the population model

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

satisfies Assumptions 1-4, but we omit x3 and estimate the model as

y = β̃0 + β̃1 x1 + β̃2 x2 + u

Suppose that x1 is correlated with x3.
It is clear that β̃1 is generally biased (for the same reason as before).
Moreover, β̃2 is also biased, even if x2 is uncorrelated with x3.
It is usually difficult to obtain the direction of the bias in β̃1 and β̃2.
Nevertheless, if we assume that x1 and x2 are uncorrelated, then we can study the direction of the bias.



Omitted Variables

Now we compare two estimators of β1. One comes from

ŷ = β̂0 + β̂1 x1 + β̂2 x2

and the other from

ỹ = β̃0 + β̃1 x1

When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and Var(β̃1) < Var(β̂1).
When β2 = 0, both β̃1 and β̂1 are unbiased, and Var(β̃1) < Var(β̂1).
Which should we choose between β̃1 and β̂1?



Goodness-of-fit



Sum of squares

Let ȳ = Σ yi / n.

Total sum of squares: TSS = Σ_{i=1}^{n} (yi − ȳ)², df = n − 1.
Explained/Regression sum of squares: ESS = Σ_{i=1}^{n} (ŷi − ȳ)², df = k.
Residual sum of squares: RSS = Σ_{i=1}^{n} (yi − ŷi)² = Σ_{i=1}^{n} ei², df = n − 1 − k.
TSS = ESS + RSS.



Goodness-of-fit

R-squared is the squared correlation between y and ŷ:

R² = ESS/TSS = 1 − RSS/TSS

It is interpreted as the proportion of the sample variation in y that is explained by the OLS regression line.
Adding a new explanatory variable, even an irrelevant one, artificially increases R².
Adjusted R-squared:

Ra² = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − [RSS/(n − k − 1)] / [TSS/(n − 1)] = 1 − s²/sy²

For models with different numbers of explanatory variables, only Ra² can be compared; a numeric sketch follows below.
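A sketch of this behavior: adding a pure-noise regressor raises R² but tends to lower Ra² (simulated data, names illustrative):

set.seed(4)
x <- rnorm(40); yr <- 1 + x + rnorm(40)
junk <- rnorm(40)                               # irrelevant regressor
summary(lm(yr ~ x))$r.squared                   # baseline R^2
summary(lm(yr ~ x + junk))$r.squared            # never lower than the baseline
summary(lm(yr ~ x))$adj.r.squared               # baseline adjusted R^2
summary(lm(yr ~ x + junk))$adj.r.squared        # typically falls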



Remark

If there is no constant in the model, R² has no meaning, because the way it is computed requires a constant term.
R² and adjusted R² are valid only for comparing models that have the same dependent variable, so they are inappropriate for comparing two models with y and log(y) as the dependent variable.
The (adjusted) R² is not enough to assess the relevance of a regression: we will need statistical tests.



Inference statistics



Inference with T-distribution

We know the distribution of each β̂j, but one quantity in it is still unknown: σ.
The unbiased estimator of σ² in the general multiple regression case is

σ̂² = RSS / (n − k − 1)

The standard error of β̂j is

Se(β̂j) = σ̂ / √(TSSj (1 − Rj²))

Using what we know about the distribution of β̂j (normal) and of RSS/σ² (χ²), we get

(β̂j − βj) / Se(β̂j) ∼ t_{n−k−1}



Inference with T-distribution

Statistic: t = (β̂j − βj*) / Se(β̂j)

Hypothesis pair                     Reject H0 if               P-value
H0: βj = βj*  vs  H1: βj ≠ βj*      |t| > t_{(n−k−1), α/2}     2 P(T > |t_obs|)
H0: βj ≤ βj*  vs  H1: βj > βj*      t > t_{(n−k−1), α}         P(T > t_obs)
H0: βj ≥ βj*  vs  H1: βj < βj*      t < −t_{(n−k−1), α}        P(T < t_obs)

Important t-test
H0: βj = 0 vs H1: βj ≠ 0, for j = 1, . . . , k.
If |t| = |β̂j| / Se(β̂j) > t_{(n−k−1), α/2}, reject H0: the coefficient is significant.



Inference of Coefficients

Confidence interval for a single coefficient:

β̂j ± t_{(n−k−1), α/2} Se(β̂j)

Inference on two coefficients, say β1 ± β2
Testing H0: β1 ± β2 = β*:

t = ((β̂1 ± β̂2) − β*) / Se(β̂1 ± β̂2)

Confidence interval: (β̂1 ± β̂2) ± t_{(n−k−1), α/2} Se(β̂1 ± β̂2),

where Se(β̂1 ± β̂2) = √(Se²(β̂1) + Se²(β̂2) ± 2 Cov(β̂1, β̂2)). A sketch of this computation in R follows below.
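A minimal sketch of the β1 + β2 case using coef() and vcov() from any fitted lm (here the simulated fit from earlier):

b <- coef(fit); V <- vcov(fit)
est <- b["x1"] + b["x2"]                               # beta1-hat + beta2-hat
se  <- sqrt(V["x1", "x1"] + V["x2", "x2"] + 2*V["x1", "x2"])
est / se                                               # t-statistic for H0: beta1 + beta2 = 0
est + c(-1, 1) * qt(0.975, df.residual(fit)) * se      # 95% confidence interval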



The test procedure : an example

Consider the following model, estimated over N individuals:

income = a + b × height + c × education

Suppose that the estimated parameter b̂ is close to zero.
We thus infer that the variable height could be irrelevant: the correlation between income and height could (should) be zero.
The "true" b should be zero.
But even if that is the case, it is very unlikely that we get exactly b̂ = 0 (due to sampling variation).
Given the computed b̂, there should be a way to assess whether the "true" b is in fact zero or not.



The test procedure : an example

Let us call H0 the hypothesis b = 0, and H1 the hypothesis b ≠ 0.
Should we consider H0 as true?
We know that for this model, (b̂ − b) / Se(b̂) ∼ t_{n−3}.
Is the latter still plausible if we take H0 for granted?
Taking H0 for granted means that we assume b = 0, so the statistic becomes t = b̂ / Se(b̂).
If, under H0, we find this value of t to be unlikely to belong to a t_{n−3} distribution, then we say that H0 was wrong.
Rejecting H0 ⇐⇒ the parameter b is significant.
Not rejecting H0 ⇐⇒ the parameter b is not significant.



Example

Regression results in a sample of 12 employees, in which wage depends on experience (exp, years), education (edu, years), and a dummy for male:

wage-hat_i = −4.9 + 0.41 exp_i + 0.83 edu_i + 1.2 male_i,   R² = 0.7575
            (4.38)  (0.098)     (0.299)      (1.125)

(standard errors in parentheses)

(a) Interpret the estimated slopes and the coefficient of determination.
(b) At the 5% level, test the significance of each slope.
(c) Give 95% confidence intervals for the significant slopes.
(d) Test the hypothesis that the slope of experience equals one.
(e) Test the hypothesis that the slope of experience is less than the slope of education, and give a confidence interval for the difference at the 5% level, knowing that the covariance of the two estimated slopes is 0.001. A numeric sketch for parts (b) and (d) follows below.
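A quick sketch for parts (b) and (d), using only the reported numbers (df = 12 − 3 − 1 = 8):

qt(0.975, 8)          # two-sided 5% critical value: about 2.306
0.41 / 0.098          # (b) t for H0: beta_exp = 0 -> about 4.18 > 2.306, reject
(0.41 - 1) / 0.098    # (d) t for H0: beta_exp = 1 -> about -6.02, |t| > 2.306, reject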



Some remarks

α = Type I error = P(reject H0 | H0 is true)
β = Type II error = P(accept H0 | H1 is true)
α is the significance level; what is the intuition of α?
1 − β is the power of a test: it indicates how powerful a test is in finding deviations from the null hypothesis H0.
Lowering α =⇒ increasing β. Why?
Since we cannot minimize both, we fix α (e.g., 5%) and try to find the test that minimizes β for this given α.



Some remarks

Dropping a useful variable can lead to inconsistent estimates, while keeping an unimportant variable only leads to a loss in precision.
Say we set α = 0.01 with a small sample size: estimates are then likely to have a large variance.
So even if the true parameter is not zero, its t-statistic is likely to be small, thus failing to reject H0 although it is false.
In that case, we might remove a relevant variable from the analysis simply because we have been too stringent about the size of the test.



Example

Suppose we are testing the hypothesis b = 0 while the true value is b = 0.1.
The probability that we reject the null (H0) depends on the standard error of b̂, and thus on the sample size.
The larger the sample, the smaller the standard error, so the more likely we are to reject H0.
Type II errors thus become increasingly unlikely as the sample size increases.
We can then decrease the size of the test α, e.g., to 1%.
Similarly, we can choose a size of 10% in small samples.



Correlation and Estimated Coefficient

Model: y = β0 + β1 x1 + · · · + βk xk + u
The sign of the correlation between xk and y and the sign of the estimated β̂k may differ.
Added-variable plot:
  Regress y = β0 + β1 x1 + · · · + βk−1 xk−1 + u1 and keep the residuals e1.
  Regress xk = α0 + α1 x1 + · · · + αk−1 xk−1 + u2 and keep the residuals e2.
  Plotting e1 against e2 gives the added-variable plot, which shows the relationship between y and xk after the other regressors are controlled for.
Partial correlation (a numeric sketch follows below):

r_{y,xk|x≠k} = t(β̂k) / √(t(β̂k)² + n − k − 1)



Prediction Interval

 
Forecast at x1 = x1*, . . . , xk = xk*, i.e., at the vector x* = (1, x1*, . . . , xk*)′.

Point estimate: y-hat* = β̂0 + β̂1 x1* + · · · + β̂k xk* = x*′β̂

Standard error: Se(pred) = s √(1 + x*′(X′X)⁻¹x*)

Prediction interval (see the R sketch below):

y-hat* ± t_{(n−k−1), α/2} Se(pred)
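In R, this is what predict() computes with interval = "prediction"; a sketch continuing the fit above (the forecast point is hypothetical):

new <- data.frame(x1 = 1, x2 = 0)                     # hypothetical x*
predict(fit, newdata = new, interval = "prediction")  # yhat* with prediction bounds
predict(fit, newdata = new, interval = "confidence")  # narrower: interval for E(y | x*)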



Inference with F-distribution
Testing for reducing the model
Full model:
y = β0 + β1 x1 + · · · + βk xk + u
Reduced model, after removing p explanatory variables:

y = β0 + β1 x1 + · · · + βk−p xk−p + u

Hypotheses:
H0: βk−p+1 = · · · = βk = 0 (the reduced model is correct)
H1: not H0 (the reduced model is not correct)
Statistic:

Fstat = [(RSS_Reduced − RSS_Full)/p] / [RSS_Full/(n − k − 1)] = (RSS_Reduced − RSS_Full) / (p × s²_Full)

If Fstat > f_{(p, n−k−1), α}, then reject H0.


Overall Significance Test

An alternative formula for the reduction test (valid if the dependent variable is unchanged):

Fstat = [(R²_Full − R²_Reduced)/p] / [(1 − R²_Full)/(n − k − 1)]

Most important F-test: for all slopes, i.e., p = k
H0: β1 = · · · = βk = 0 (the model is overall insignificant)
H1: not H0 (the model is overall significant)

Fstat = (R²_Full / k) / [(1 − R²_Full)/(n − k − 1)]

If Fstat > f_{(k, n−k−1), α}, then reject H0.



Linear Hypothesis Testing

Combined hypothesis, e.g., H0: (β1 = 2 and β2 = 3): we cannot use a t-test.
This is called 2 restrictions; in matrix form

[ 1 0 ] [ β1 ]   [ 2 ]
[ 0 1 ] [ β2 ] = [ 3 ]

General linear hypothesis (restrictions) on the coefficients: Cβ = d
Hypothesis pair:
H0: Cβ = d
H1: Cβ ≠ d
The number of "=" signs is the number of restrictions, p.



Linear Hypothesis Testing

Under H0, the full model becomes a reduced model.
Full model:

y = β0 + β1 x1 + β2 x2 + β3 x3 + u

Under the hypothesis β1 = 2 and β2 = 3, the reduced model is

y − 2x1 − 3x2 = β0 + β3 x3 + u

Fstat = [(RSS_Reduced − RSS_Full)/p] / [RSS_Full/(n − k − 1)], with critical value f_{(p, n−k−1), α}.



T-test and F-test

The t-test is for one restriction only:
  H0 contains one "=",
  H1 can be ≠, >, or <.
The F-test is for p restrictions, where p can be larger than 1:
  H1 contains ≠ only.
If a t-test and an F-test apply to the same hypothesis, then
  Fstat = (tstat)²,
  fcrit = (tcrit)²,
  and the t-test and F-test have the same P-value.



Example

Regression results in a sample of 12 observations:

wage-hat_i = −4.9 + 0.41 exp_i + 0.83 edu_i + 1.2 male_i
R² = 0.7575, RSS = 22.95, s = 1.694

At the 5% significance level:
1. Test the overall significance of the model.
2. After removing the variable male, R² = 0.723 and RSS = 26.202. Test for removing male.
3. Regressing wage on exp alone gives R² = 0.52 and RSS = 45.423. Test for reducing the model.
4. Test the hypothesis that the sum of the coefficients of exp and edu is 1, given that the reduced model has R² = 0.6883 and RSS = 24.597.



Example in R

Data on 12 employees: exp: experience (years); edu: education (years); male = 1 for male, 0 otherwise; wage.

exp    1  2  2  3  4  5  7 10 10 12 15 16
edu   13 12 16 11 15 15 10 15 13 11 13 15
male   1  1  0  0  1  0  1  0  0  1  1  0
wage   6  6 12  6 11  8  8 10 11 10 15 13



Matrix calculation

exp <- c(1,2,2,3,4,5,7,10,10,12,15,16)
edu <- c(13,12,16,11,15,15,10,15,13,11,13,15)
male <- c(1,1,0,0,1,0,1,0,0,1,1,0)
wage <- c(6,6,12,6,11,8,8,10,11,10,15,13)
intercept <- rep(1, 12)                     # column of ones for the intercept
explanatory <- data.frame(intercept, exp, edu, male)
X <- data.matrix(explanatory)               # design matrix X
y <- data.matrix(wage)                      # response vector y
beta <- solve(t(X) %*% X) %*% (t(X) %*% y)  # OLS: beta-hat = (X'X)^{-1} X'y
beta



Matrix calculation

fitted <- X %*% beta                  # fitted value vector
resid <- y - fitted                   # residual vector
resid.SS <- t(resid) %*% resid        # residual SS (1x1 matrix)
resid.SS <- as.vector(resid.SS)       # convert to a scalar
s.sq <- resid.SS/8                    # regression variance s^2 = RSS/(n-k-1) = RSS/8
cov.beta <- s.sq * solve(t(X) %*% X)  # covariance matrix of beta-hat
var.beta <- diag(cov.beta)            # variances of the coefficients
se.beta <- sqrt(var.beta)             # standard errors
t.beta <- beta/se.beta                # t-statistics
p.beta <- 2*(1 - pt(abs(t.beta), 8))  # P-values of the t-tests
TSS <- sum((wage - mean(wage))^2)     # total SS
R2 <- 1 - resid.SS/TSS                # R-squared
f <- (R2/3)/((1 - R2)/8)              # F-statistic for overall significance
p.ftest <- 1 - pf(f, 3, 8)            # P-value of the F-test



Output

#output
reg1 <-lm(wage ~ exp + edu + male)
summary(reg1)

#variance-covariance matrix
round(vcov(reg1),4)



Linear hypothesis testing
Install the package AER

install.packages("AER")
library(AER)

Test hypothesis: βexp = 1

linearHypothesis(reg1,"exp = 1")

Test hypothesis: βexp + βedu = 1

linearHypothesis(reg1, "exp + edu = 1")

Testing for deleting 2 variables edu and male

reg1 <- lm(wage ~ exp + edu + male)
reg2 <- lm(wage ~ exp)
anova(reg1, reg2)

