
Ch04.

Multiple Regression Analysis: Inference

Ping Yu

Faculty of Business and Economics


The University of Hong Kong



Sampling Distributions of the OLS Estimators


Statistical Inference in the Regression Model

Hypothesis tests about population parameters.


Construction of confidence intervals.
- These two tasks are closely related.

We need to study sampling distributions of the OLS estimators for statistical inference!
The OLS estimators are random variables.
We already know their expected values and their variances.
However, for hypothesis tests we need to know their distribution.
In order to derive their distribution we need additional assumptions.
Assumption about distribution of errors: normal distribution.

Standard Assumptions for the MLR Model (continued)

Assumption MLR.6 (Normality): $u_i$ is independent of $(x_{i1}, \ldots, x_{ik})$, and $u_i \sim N(0, \sigma^2)$.

- It is stronger than MLR.4 (zero conditional mean) and MLR.5 (homoskedasticity).

Discussion of the Normality Assumption

The error term is the sum of "many" different unobserved factors.


Sums of many independent and similarly distributed factors are approximately normally distributed (central limit theorem, or CLT).
Problems:
- How many different factors are there? Is the number large enough?
- The individual factors may have very heterogeneous distributions.
- How independent are the different factors?
- Are they additive?
The normality of the error term is an empirical question.
In some cases, the error distribution should be "close" to normal, e.g., test scores.
In many other cases, normality is questionable or impossible by definition.

Discussion of the Normality Assumption (continued)

Examples where normality cannot hold:


- Wages (nonnegative; also: minimum wage). [figure here]
- Number of arrests (takes on a small number of integer values). [figure here]
- Unemployment (indicator variable, takes on only 1 or 0).
In some cases, normality can be achieved through transformations of the
dependent variable (e.g. use log(wage) instead of wage).
Under normality, OLS is the best unbiased estimator (even among nonlinear estimators), i.e., it is the minimum-variance unbiased estimator.
Important: For the purposes of statistical inference, the assumption of normality
can be replaced by a large sample size (Chapter 5, will not be covered).
Terminology:
- MLR.1-MLR.5: Gauss-Markov assumptions
- MLR.1-MLR.6: classical linear model (CLM) assumptions

Figure: PDF of Wage and PMF of Number of Arrests: the minimum wage in HK is HK$34.5/hr, and in Beijing RMB24/hr at present.


Normal Sampling Distributions

Theorem 4.1: Under assumptions MLR.1-MLR.6,

$$\hat{\beta}_j \sim N\left(\beta_j, \mathrm{Var}(\hat{\beta}_j)\right).$$

Therefore,

$$\frac{\hat{\beta}_j - \beta_j}{\mathrm{sd}(\hat{\beta}_j)} \sim N(0,1).$$

The estimators are normally distributed around the true parameters with the variance that was derived earlier.
- Note that as before, we are conditioning on $\{x_i, i = 1, \ldots, n\}$.
The standardized estimators follow a standard normal distribution.
- Recall that if $X \sim N(\mu, \sigma^2)$, then

$$\frac{X - \mu}{\sigma} \sim N(0,1).$$
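A minimal simulation sketch of Theorem 4.1 (an addition, not part of the original slides; Python with numpy assumed): under normal errors, the slope estimator standardized by its true standard deviation behaves like a standard normal.

```python
# Sketch: under MLR.1-MLR.6, (beta1_hat - beta1) / sd(beta1_hat) ~ N(0,1).
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 50, 1.0, 2.0, 1.5
x = rng.uniform(0, 10, n)                              # regressors held fixed across replications
X = np.column_stack([np.ones(n), x])
sd_b1 = sigma * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])  # true sd of beta1_hat

z = []
for _ in range(20000):
    u = rng.normal(0, sigma, n)                        # normal errors (MLR.6)
    y = beta0 + beta1 * x + u
    b = np.linalg.lstsq(X, y, rcond=None)[0]           # OLS estimates
    z.append((b[1] - beta1) / sd_b1)

z = np.array(z)
print(z.mean(), z.std())                               # close to 0 and 1
```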

Testing Hypotheses about a Single Population Parameter: The t Test


t-Distribution for Standardized Estimators

Theorem 4.2: Under assumptions MLR.1-MLR.6,

$$\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{n-k-1} = t_{df}.$$

If the standardization is done using the estimated standard deviation (= standard error), the normal distribution is replaced by a t-distribution. [why? see the slide after introducing the t-distribution.]
The t-distribution is close to the standard normal distribution if $n - k - 1$ is large.
The t-distribution is named after Gosset (1908) [photo here], "The probable error of a mean". At the time, Gosset worked at the Guinness Brewery, which prohibited its employees from publishing in order to prevent the possible loss of trade secrets. To circumvent this barrier, Gosset published under the pseudonym "Student". Consequently, this famous distribution is known as Student's t rather than Gosset's t! The name "t" was popularized by R.A. Fisher (we will discuss him later).

History of the t-Distribution

William S. Gosset (1876-1937)

[Review] t-Distribution

If $Z$ is a standard normal variable,

$$Z \sim N(0,1),$$

and the variable $X$ has a $\chi^2$ (chi-square) distribution with $n$ degrees of freedom,

$$X \sim \chi^2_n,$$

independent of $Z$, then

$$\frac{Z}{\sqrt{X/n}} = \frac{\text{standard normal variable}}{\sqrt{\text{independent chi-square variable}/df}} \sim t_n,$$

a t-distribution with $n$ degrees of freedom.

[Review] t-Distribution (continued)

If $Z_1, \ldots, Z_n$ are independently distributed random variables such that $Z_i \sim N(0,1)$, $i = 1, \ldots, n$, then

$$X = \sum_{i=1}^n Z_i^2$$

follows the $\chi^2$ distribution with $n$ degrees of freedom.
Note that

$$\frac{1}{n}\sum_{i=1}^n Z_i^2 \to E\left[Z_i^2\right] = 1 \text{ as } n \to \infty,$$

so

$$t_n \to N(0,1) \text{ as } n \to \infty.$$

Recall that $\mathrm{Var}(Z_i) = E\left[Z_i^2\right] - E[Z_i]^2$, so $E\left[Z_i^2\right] = \mathrm{Var}(Z_i) + E[Z_i]^2 = 1 + 0^2 = 1$. Indeed, $E\left[\frac{1}{n}\sum_{i=1}^n Z_i^2\right] = E\left[Z_i^2\right]$ and $\mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^n Z_i^2\right) = \mathrm{Var}(Z_i^2)/n \to 0$.
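A small sketch (an addition; numpy and scipy assumed) that builds $t_n = Z/\sqrt{X/n}$ from its ingredients and checks an empirical quantile against the exact $t_n$ quantile; as $n$ grows, the quantiles approach the standard normal's.

```python
# Sketch: construct t_n = Z / sqrt(X/n), Z ~ N(0,1), X ~ chi2_n, and compare quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (1, 2, 5, 120):
    Z = rng.standard_normal(100000)
    X = rng.chisquare(n, 100000)       # sum of n squared standard normals
    T = Z / np.sqrt(X / n)             # should follow t_n
    print(n, np.quantile(T, 0.95), stats.t.ppf(0.95, n))  # empirical vs. exact
```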
Figure: Density of the t Distribution for $n = 1, 2, 5, \infty$

(*) Why Does the Standardized $\hat{\beta}_j$ Using Its SE Follow the t-Distribution?

We provide only a rough idea here. Note that

$$\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} = \frac{\left(\hat{\beta}_j - \beta_j\right)/\mathrm{sd}(\hat{\beta}_j)}{\sqrt{\dfrac{\sum_{i=1}^n \hat{u}_i^2/(n-k-1)}{SST_j(1-R_j^2)} \bigg/ \mathrm{Var}(\hat{\beta}_j)}} = \frac{\left(\hat{\beta}_j - \beta_j\right)\bigg/\sqrt{\dfrac{\sigma^2}{SST_j(1-R_j^2)}}}{\sqrt{\dfrac{\sum_{i=1}^n \hat{u}_i^2/(n-k-1)}{SST_j(1-R_j^2)} \bigg/ \dfrac{\sigma^2}{SST_j(1-R_j^2)}}} = \frac{\left(\hat{\beta}_j - \beta_j\right)/\mathrm{sd}(\hat{\beta}_j)}{\sqrt{\sum_{i=1}^n \left(\dfrac{\hat{u}_i}{\sigma}\right)^2 \bigg/ (n-k-1)}} \sim \frac{N(0,1)}{\sqrt{\chi^2_{n-k-1}/(n-k-1)}} = t_{n-k-1}.$$
t Statistic or t Ratio
Suppose the null hypothesis (for more general hypotheses, see below) is

$$H_0: \beta_j = 0.$$

The population parameter is equal to zero, i.e., after controlling for the other independent variables, there is no effect of $x_j$ on $y$.
The t statistic or t ratio of $\hat{\beta}_j$ for this $H_0$ is

$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}.$$

The t statistic will be used to test the above null hypothesis. The farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds true. But what does "far" away from zero mean?
This depends on the variability of the estimated coefficient, i.e., its standard deviation. The t statistic measures how many estimated standard deviations the estimated coefficient is away from zero.
If the null hypothesis is true,

$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)} = \frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{n-k-1}.$$

a: Testing against One-Sided Alternatives (greater than zero)

Goal: Define a rejection rule so that, if $H_0$ is true, it is rejected only with a small probability (= significance level, or level for short, e.g., 5%).
To determine the rejection rule, we need to decide on the relevant alternative hypothesis.
First consider a one-sided alternative of the form

$$H_1: \beta_j > 0.$$

Reject the null hypothesis in favor of the alternative hypothesis if the estimated coefficient is "too large" (i.e., larger than a critical value). (why?)
Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. [figure here]
In the figure, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases.
So the rejection rule is to reject if the t statistic is greater than 1.701.
Analogy: the evidence could hardly occur if you were innocent, in testing

$$H_0: \text{you are innocent} \quad \text{vs.} \quad H_1: \text{you are guilty}.$$
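The critical value 1.701 quoted above can be recovered from the t-distribution's quantile function; a one-line sketch (scipy assumed, any t-table gives the same number):

```python
# Sketch: 5% one-sided critical value with 28 df.
from scipy import stats
print(stats.t.ppf(0.95, 28))   # ~1.701; reject H0 if t > this value
```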

Figure: 5% Rejection Rule for the Alternative $H_1: \beta_j > 0$ with 28 df

The rejection region is the set of values of the t statistic at which we reject the null.

Example: Hourly Wage Equation

Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages.
The fitted regression line is

$$\widehat{\log(wage)} = \underset{(.104)}{.284} + \underset{(.007)}{.092}\,educ + \underset{(.0017)}{.0041}\,exper + \underset{(.003)}{.022}\,tenure$$
$$n = 526, \quad R^2 = .316$$

where standard errors appear in parentheses below the estimated coefficients.
Test

$$H_0: \beta_{exper} = 0 \quad \text{against} \quad H_1: \beta_{exper} > 0.$$

One would either expect a positive effect of experience on hourly wage or no effect at all.

Example: Hourly Wage Equation (continued)

The t statistic for $\hat{\beta}_{exper}$ is

$$t_{exper} = \frac{.0041}{.0017} \approx 2.41.$$

$df = n - k - 1 = 526 - 3 - 1 = 522$, quite large, so the standard normal approximation applies.
The 5% critical value is $c_{0.05} = 1.645$, and the 1% critical value is $c_{0.01} = 2.326$.
- 5% and 1% are conventional significance levels.
The null hypothesis is rejected because the t statistic exceeds the critical value.
The conclusion is that the effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.
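A short sketch of this test (numbers taken from the slide; scipy assumed):

```python
# Sketch: wage-example one-sided t test.
from scipy import stats
t_exper = .0041 / .0017                  # ~2.41
df = 526 - 3 - 1                         # 522
p_one_sided = stats.t.sf(t_exper, df)    # P(T > t) under H0
print(t_exper, p_one_sided)              # p ~ .008 < .01: reject even at the 1% level
```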

Testing against One-Sided Alternatives (less than zero)

We want to test

$$H_0: \beta_j = 0 \quad \text{against} \quad H_1: \beta_j < 0.$$

Reject the null hypothesis in favor of the alternative hypothesis if the estimated coefficient is "too small" (i.e., smaller than a critical value). (why?)
Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. [figure here]
In the figure, this is the point of the t-distribution with 18 degrees of freedom such that 5% of the cases are below the point.
So the rejection rule is to reject if the t statistic is less than $-1.734$.

Figure: 5% Rejection Rule for the Alternative $H_1: \beta_j < 0$ with 18 df

Example: Student Performance and School Size

Test whether smaller school size leads to better student performance.
The fitted regression line is

$$\widehat{math10} = \underset{(6.113)}{2.274} + \underset{(.00010)}{.00046}\,totcomp + \underset{(.040)}{.048}\,staff - \underset{(.00022)}{.00020}\,enroll$$
$$n = 408, \quad R^2 = .0541 \text{ (quite small)}$$

where
math10 = percentage of students passing 10th-grade math test
totcomp = average annual teacher compensation
staff = staff per one thousand students
enroll = school enrollment (= school size)

Example: Student Performance and School Size (continued)

Test

$$H_0: \beta_{enroll} = 0 \quad \text{against} \quad H_1: \beta_{enroll} < 0.$$

- Do larger schools hamper student performance, or is there no such effect?
The t statistic for $\hat{\beta}_{enroll}$ is

$$t_{enroll} = \frac{-.00020}{.00022} \approx -.91.$$

$df = n - k - 1 = 408 - 3 - 1 = 404$, quite large, so the standard normal approximation applies.
The 5% critical value is $c_{0.05} = -1.645$, and the 15% critical value is $c_{0.15} = -1.04$.
The null hypothesis is not rejected because the t statistic is not smaller than the critical value.
The conclusion is that one cannot reject the hypothesis that there is no effect of school size on student performance (not even at a lax significance level of 15%).

Example: Student Performance and School Size (continued)

Using an alternative specification of the functional form, we have

$$\widehat{math10} = \underset{(48.70)}{-207.66} + \underset{(4.06)}{21.16}\,\log(totcomp) + \underset{(4.19)}{3.98}\,\log(staff) - \underset{(0.69)}{1.29}\,\log(enroll)$$
$$n = 408, \quad R^2 = .0654 \text{ (slightly higher, but still quite small)}$$

Test

$$H_0: \beta_{\log(enroll)} = 0 \quad \text{against} \quad H_1: \beta_{\log(enroll)} < 0.$$

The t statistic for $\hat{\beta}_{\log(enroll)}$ is

$$t_{\log(enroll)} = \frac{-1.29}{0.69} \approx -1.87.$$

$c_{0.05} = -1.645$, and $t_{\log(enroll)} < c_{0.05}$, so the hypothesis that there is no effect of school size on student performance can be rejected in favor of the hypothesis that the effect is negative.
How large is the effect? Quite small:

$$-1.29 = \frac{\partial\, math10}{\partial \log(enroll)} = \frac{\partial\, math10}{\partial\, enroll/enroll} = \frac{-1.29/100}{1/100} = \frac{-0.0129}{+1\%},$$

i.e., a 1% increase in enrollment reduces math10 by only .0129 percentage points.

b: Testing against Two-Sided Alternatives

We want to test

$$H_0: \beta_j = 0 \quad \text{against} \quad H_1: \beta_j \neq 0.$$

Reject the null hypothesis in favor of the alternative hypothesis if the absolute value of the estimated coefficient is too large. (why?)
Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases. [figure here]
In the figure, these are the points of the t-distribution such that 5% of the cases lie in the two tails.
So the rejection rule is to reject if the t statistic is greater than 2.06 or less than $-2.06$.

Figure: 5% Rejection Rule for the Alternative $H_1: \beta_j \neq 0$ with 25 df

Example: Determinants of College GPA

The fitted regression line is

$$\widehat{colGPA} = \underset{(.33)}{1.39} + \underset{(.094)}{.412}\,hsGPA + \underset{(.011)}{.015}\,ACT - \underset{(.026)}{.083}\,skipped$$
$$n = 141, \quad R^2 = .234$$

where
skipped = average number of lectures missed per week
$df = n - k - 1 = 141 - 3 - 1 = 137$, quite large, so the standard normal approximation applies.
$t_{hsGPA} = 4.38 > c_{0.01} = 2.576$
$t_{ACT} = 1.36 < c_{0.10} = 1.645$
$|t_{skipped}| = |-3.19| > c_{0.01} = 2.576$
The effects of hsGPA and skipped are significantly different from zero at the 1% significance level. The effect of ACT is not significantly different from zero, not even at the 10% significance level.

"Statistically Significant" Variables in a Regression

If a regression coefficient is significantly different from zero in a two-sided test, the corresponding variable is said to be statistically significant.
If the number of degrees of freedom is large enough that the normal approximation applies, the following rules of thumb apply:

$$|t \text{ ratio}| > 1.645 \implies \text{statistically significant at the 10\% level}$$
$$|t \text{ ratio}| > 1.96 \implies \text{statistically significant at the 5\% level}$$
$$|t \text{ ratio}| > 2.576 \implies \text{statistically significant at the 1\% level}$$

1.96 ($\approx 2$) is a magic number in practice.
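These thresholds are just standard normal quantiles; a quick sketch (scipy assumed):

```python
# Sketch: rule-of-thumb two-sided critical values from the standard normal.
from scipy import stats
for level in (0.10, 0.05, 0.01):
    print(level, stats.norm.ppf(1 - level / 2))   # 1.645, 1.96, 2.576
```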

Guidelines for Discussing Economic and Statistical Significance

If a variable is statistically significant, discuss the magnitude of the coefficient to get an idea of its economic or practical significance.
The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
- E.g., $\log(enroll)$ in the above example is statistically significant, but economically insignificant.
If a variable is statistically and economically important but has the "wrong" sign, the regression model might be misspecified.
If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may think of dropping it from the regression.
If the sample size is small, effects might be imprecisely estimated, so the case for dropping insignificant variables is less strong.

c: Testing More General Hypotheses about a Regression Coefficient

The general null is stated as

$$H_0: \beta_j = a_j,$$

where $a_j$ is the hypothesized value of the coefficient.
The t statistic is

$$t = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard error}} = \frac{\hat{\beta}_j - a_j}{\mathrm{se}(\hat{\beta}_j)}.$$

The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic.

Example: Campus Crime and Enrollment

An interesting hypothesis is whether crime increases by one percent if enrollment is increased by one percent.
The fitted regression line is

$$\widehat{\log(crime)} = \underset{(1.03)}{-6.63} + \underset{(.11)}{1.27}\,\log(enroll)$$
$$n = 97, \quad R^2 = .585$$

where
crime = annual number of crimes on college campuses
$\hat{\beta}_{\log(enroll)}$ is different from one, but is this difference statistically significant? We want to test

$$H_0: \beta_{\log(enroll)} = 1 \quad \text{against} \quad H_1: \beta_{\log(enroll)} \neq 1.$$

The t statistic is

$$t = \frac{1.27 - 1}{.11} \approx 2.45 > 1.96 = c_{0.05},$$

so the null is rejected at the 5% level.
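A short sketch of this test against a hypothesized value of 1 (numbers from the slide; scipy assumed):

```python
# Sketch: campus-crime test of H0: beta = 1.
from scipy import stats
t = (1.27 - 1) / .11              # ~2.45
df = 97 - 1 - 1                   # 95
p = 2 * stats.t.sf(abs(t), df)    # two-sided p-value
print(t, p)                       # p ~ .016 < .05: reject at the 5% level
```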

Different Traditions of Hypothesis Testing

Rejection/Acceptance Dichotomy: Jerzy Neyman (1894-1981), Berkeley, and Egon Pearson (1895-1980), UCL. (Egon Pearson was the son of Karl Pearson.)

p-Value: R.A. Fisher (we will discuss more about him later).

d: Computing p-Values for t Tests

If the significance level is made smaller and smaller, there will be a point at which the null hypothesis can no longer be rejected.
The reason is that, by lowering the significance level, one increasingly guards against the error of rejecting a correct $H_0$.
The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test.
- The p-value is the significance level at which one is indifferent between rejecting and not rejecting the null hypothesis. [figure here]
- Alternatively, the p-value is the probability of observing a t statistic as extreme as we did if the null is true. [$p = P(|T| > |t|)$]
- A null hypothesis is rejected if and only if the corresponding p-value is smaller than the significance level. [$\alpha = P(|T| > c_\alpha)$, so $|t| > c_\alpha \iff p < \alpha$]
A small p-value is evidence against the null hypothesis because one would reject the null hypothesis even at small significance levels.
A large p-value is evidence in favor of the null hypothesis.
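A quick sketch computing the p-value for the case shown in the next figure, t = 1.85 with df = 40 (scipy assumed):

```python
# Sketch: two-sided p-value for t = 1.85, df = 40.
from scipy import stats
p = 2 * stats.t.sf(1.85, 40)   # P(|T| > 1.85)
print(p)                       # ~.072: reject at the 10% level, not at the 5% level
```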

Figure: Obtaining the p-Value Against a Two-Sided Alternative, When t = 1.85 and df = 40

P-values are more informative than tests at fixed significance levels because you
can choose your own significance level.

Confidence Intervals

Simple manipulation of the result in Theorem 4.2 implies that

$$P\Big(\underbrace{\hat{\beta}_j - c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j)}_{\text{lower bound of the CI}} \le \beta_j \le \underbrace{\hat{\beta}_j + c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j)}_{\text{upper bound of the CI}}\Big) = P\left(\left|\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)}\right| \le c_{0.05}\right) = 0.95, \quad (*)$$

where $c_{0.05}$ is the 5% critical value of the two-sided test, and 0.95 is called the confidence level.
Interpretation of the confidence interval:
- The bounds of the interval are random. (The length $= 2c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j)$ is also random!)
- In repeated samples, the interval constructed in this way will cover the population regression coefficient in 95% of the cases.
Analogy: catching a butterfly with a net.
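A minimal coverage simulation sketch of the butterfly-and-net analogy (an addition, not the slides' data; numpy/scipy assumed): across repeated samples, the 95% CI for the slope catches the true coefficient about 95% of the time.

```python
# Sketch: repeated-sampling coverage of the 95% CI for the slope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, beta1, cover = 30, 2.0, 0
c = stats.t.ppf(0.975, n - 2)                  # two-sided 5% critical value
for _ in range(5000):
    x = rng.uniform(0, 10, n)
    y = 1.0 + beta1 * x + rng.normal(0, 1.5, n)
    X = np.column_stack([np.ones(n), x])
    b, res = np.linalg.lstsq(X, y, rcond=None)[:2]
    s2 = res[0] / (n - 2)                      # estimate of sigma^2
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    cover += (b[1] - c * se <= beta1 <= b[1] + c * se)
print(cover / 5000)                            # ~0.95
```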

How to Construct CIs? Inverting the Test Statistic

Collect all $b_j$'s that are not rejected in the t test of $H_0: \beta_j = b_j$ vs. $H_1: \beta_j \neq b_j$. [why? see (*)]
Relationship between confidence intervals and hypothesis tests: if $b_j \notin$ CI, then we reject $H_0: \beta_j = b_j$ in favor of $H_1: \beta_j \neq b_j$ at the level of (1 - confidence level).
Confidence intervals for typical confidence levels:

$$P\left(\hat{\beta}_j - c_{0.01} \cdot \mathrm{se}(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + c_{0.01} \cdot \mathrm{se}(\hat{\beta}_j)\right) = 0.99,$$
$$P\left(\hat{\beta}_j - c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j)\right) = 0.95,$$
$$P\left(\hat{\beta}_j - c_{0.10} \cdot \mathrm{se}(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + c_{0.10} \cdot \mathrm{se}(\hat{\beta}_j)\right) = 0.90.$$

Use the rules of thumb: $c_{0.01} = 2.576$, $c_{0.05} = 1.96$ and $c_{0.10} = 1.645$.
- To catch a butterfly with a higher probability, we must use a larger net.

Example: Model of R&D Expenditures

The fitted regression line is

$$\widehat{\log(rd)} = \underset{(.47)}{-4.38} + \underset{(.060)}{1.084}\,\log(sales) + \underset{(.0128)}{.0217}\,profmarg$$
$$n = 32, \quad R^2 = .918 \text{ (very high)}$$

where
rd = firm's spending on R&D
sales = annual sales
profmarg = profits as a percentage of sales
$df = 32 - 2 - 1 = 29$, so $c_{0.05} = 2.045$.
The 95% CI for $\beta_{\log(sales)}$ is $1.084 \pm 2.045\,(.060) = (.961, 1.21)$. The effect of $\log(sales)$ on $\log(rd)$ is relatively precisely estimated, as the interval is narrow. Moreover, the effect is significantly different from zero because zero is outside the interval.
The 95% CI for $\beta_{profmarg}$ is $.0217 \pm 2.045\,(.0128) = (-.0045, .0479)$. The effect of profmarg on $\log(rd)$ is imprecisely estimated, as the interval is very wide. It is not even statistically significant because zero lies in the interval.
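A short sketch reproducing these two intervals (numbers from the slide; scipy assumed):

```python
# Sketch: 95% CIs for the R&D example, df = 29.
from scipy import stats
c = stats.t.ppf(0.975, 29)                 # ~2.045
for b, se in ((1.084, .060), (.0217, .0128)):
    print(b - c * se, b + c * se)          # (.961, 1.21) and (-.0045, .0479)
```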

Testing Hypotheses about a Single Linear Combination of the Parameters


Example: Return to Education at 2-Year vs. at 4-Year Colleges

Suppose the model is

$$\log(wage) = \beta_0 + \beta_1 jc + \beta_2 univ + \beta_3 exper + u,$$

where
jc = years of education at 2-year colleges
univ = years of education at 4-year colleges
Suppose we want to test

$$H_0: \beta_1 - \beta_2 = 0 \quad \text{vs.} \quad H_1: \beta_1 - \beta_2 < 0.$$

A possible test statistic would be

$$t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{\mathrm{se}(\hat{\beta}_1 - \hat{\beta}_2)}.$$

The difference between the estimates is normalized by the estimated standard deviation of the difference. The null hypothesis has to be rejected if the statistic is "too negative" to believe that the true difference between the parameters is equal to zero.

Example: Return to Education at 2-Year vs. at 4-Year Colleges (continued)

It is impossible to compute such a t statistic with standard regression output because

$$\mathrm{se}(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1 - \hat{\beta}_2)} = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1) + \widehat{\mathrm{Var}}(\hat{\beta}_2) - 2\,\widehat{\mathrm{Cov}}(\hat{\beta}_1, \hat{\beta}_2)},$$

where $\widehat{\mathrm{Cov}}(\hat{\beta}_1, \hat{\beta}_2)$ is usually not available in regression output.
Alternative method: Define $\theta_1 = \beta_1 - \beta_2$ and test $H_0: \theta_1 = 0$ vs. $H_1: \theta_1 < 0$.
Now, $\beta_1 = \theta_1 + \beta_2$. Inserting this into the original regression, we have

$$\log(wage) = \beta_0 + (\theta_1 + \beta_2)jc + \beta_2 univ + \beta_3 exper + u = \beta_0 + \theta_1 jc + \beta_2(jc + univ) + \beta_3 exper + u,$$

where $jc + univ$ is a new regressor, representing total years of college.
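A sketch of the reparameterization trick on simulated data (an addition; the coefficient values and variable constructions below are illustrative assumptions, not the slides' dataset): regressing on jc and (jc + univ) makes the coefficient on jc equal $\theta_1 = \beta_1 - \beta_2$, so its standard t statistic tests $H_0: \beta_1 = \beta_2$ directly.

```python
# Sketch: estimate theta1 = beta1 - beta2 via the transformed regression.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
jc = rng.poisson(1.0, n)
univ = rng.poisson(2.0, n)
exper = rng.uniform(0, 30, n)
logwage = 1.5 + .05 * jc + .07 * univ + .005 * exper + rng.normal(0, .4, n)

totcoll = jc + univ                        # new regressor: total years of college
X = np.column_stack([np.ones(n), jc, totcoll, exper])
b = np.linalg.lstsq(X, logwage, rcond=None)[0]
print(b[1])                                # estimates theta1 = .05 - .07 = -.02
```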

Example: Return to Education at 2-Year vs. at 4-Year Colleges (continued)

The fitted regression line is

$$\widehat{\log(wage)} = \underset{(.021)}{1.472} - \underset{(.0069)}{.0102}\,jc + \underset{(.0023)}{.0769}\,totcoll + \underset{(.0002)}{.0049}\,exper$$
$$n = 6{,}763, \quad R^2 = .222$$

Now,

$$t = \frac{-.0102}{.0069} = -1.48 \in (c_{0.05}, c_{0.10}) = (-1.645, -1.282),$$

so the null is rejected at the 10% level but not at the 5% level.
Alternatively, the p-value is

$$P(T < -1.48) = .070 \in (.05, .10),$$

and the 95% CI for $\theta_1$ is

$$-.0102 \pm 1.96\,(.0069) = (-.0237, .0003),$$

covering 0.
This method always works for single linear hypotheses.
Testing Multiple Linear Restrictions: The F Test


a: Testing Exclusion Restrictions

Consider the following model that explains major league baseball players' salaries:

$$\log(salary) = \beta_0 + \beta_1 years + \beta_2 gamesyr + \beta_3 bavg + \beta_4 hrunsyr + \beta_5 rbisyr + u,$$

where
salary = the 1993 total salary
years = years in the league
gamesyr = average games played per year
bavg = career batting average
hrunsyr = home runs per year
rbisyr = runs batted in per year
The hypotheses are

$$H_0: \beta_3 = \beta_4 = \beta_5 = 0 \quad \text{vs.} \quad H_1: H_0 \text{ is not true},$$

where "$H_0$ is not true" means that at least one of $\beta_3$, $\beta_4$ and $\beta_5$ is not zero.
This tests whether the performance measures have no effect and can be excluded from the regression.

Estimation of the Unrestricted Model

The estimated unrestricted model is

$$\widehat{\log(salary)} = \underset{(.29)}{11.19} + \underset{(.0121)}{.0689}\,years + \underset{(.0026)}{.0126}\,gamesyr + \underset{(.00110)}{.00098}\,bavg + \underset{(.0161)}{.0144}\,hrunsyr + \underset{(.0072)}{.0108}\,rbisyr$$
$$n = 353, \quad SSR_{ur} = 183.186, \quad R^2 = .6278$$

where the subscript ur on SSR indicates the SSR of the unrestricted model.
None of the three performance variables is statistically significant when tested individually.
Idea: How would the model fit (measured by the SSR) change if these variables were dropped from the regression?

Estimation of the Restricted Model

The estimated restricted model is

$$\widehat{\log(salary)} = \underset{(.11)}{11.22} + \underset{(.0125)}{.0713}\,years + \underset{(.0013)}{.0202}\,gamesyr$$
$$n = 353, \quad SSR_r = 198.311, \quad R^2 = .5971$$

where the subscript r on SSR indicates the SSR of the restricted model.
The sum of squared residuals (SSR) necessarily increases in the restricted model [why? recall from Chapter 3 the case with $H_0: \beta_{k+1} = 0$ and $q = 1$], but is the increase statistically significant?
The rigorous test statistic is

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q, n-k-1},$$

where $q = df_r - df_{ur} = (n - (k - q) - 1) - (n - k - 1)$ is the number of restrictions, and $n - k - 1 = df_{ur}$.
The relative increase of the sum of squared residuals when going from $H_1$ to $H_0$ follows an F-distribution (if the null hypothesis $H_0$ is correct).

History of the F-Distribution

Ronald A. Fisher (1890-1962), UCL

Ronald A. Fisher (1890-1962) is an iconic founder of modern statistical theory. The name of the F-distribution was coined by G.W. Snedecor in honor of R.A. Fisher. The p-value is also credited to him.

[Review] F-Distribution

If $X_1$ follows a $\chi^2$ distribution with $d_1$ degrees of freedom,

$$X_1 \sim \chi^2_{d_1},$$

and $X_2$ follows a $\chi^2$ distribution with $d_2$ degrees of freedom,

$$X_2 \sim \chi^2_{d_2},$$

independent of $X_1$, then

$$\frac{X_1/d_1}{X_2/d_2} = \frac{\text{chi-square variable}/df}{\text{independent chi-square variable}/df} \sim F_{d_1,d_2},$$

an F-distribution with degrees of freedom $d_1$ and $d_2$.
As in the t-distribution, $X_2/d_2 \to 1$ as $d_2 \to \infty$. So

$$F_{d_1,d_2} \to \chi^2_{d_1}/d_1$$

as $d_2 \to \infty$.

Figure: The 5% Critical Value and Rejection Region in an $F_{3,60}$ Distribution

We need only show that $(SSR_r - SSR_{ur})/\sigma^2 \sim \chi^2_q$ and is independent of $SSR_{ur}$ to show that $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,n-k-1}$, since from the t statistic, we already know that $SSR_{ur}/\sigma^2 \sim \chi^2_{n-k-1}$.

Test Decision in Example

The F statistic is

$$F = \frac{(198.311 - 183.186)/3}{183.186/(353 - 5 - 1)} \approx 9.55,$$

where $q = 3$, $n = 353$ and $k = 5$.
Since $F \sim F_{3,347} \approx \chi^2_3/3$, $c_{0.01} = 3.78$, the null is rejected.
Alternatively, the p-value $= P(F_{3,347} > 9.55) = 0.0000$, so the null hypothesis is overwhelmingly rejected (even at very small significance levels).
Discussion:
- If $H_0$ is rejected, we say that the three variables are "jointly significant".
- They were not significant when tested individually.
- The likely reason is multicollinearity between them.
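A short sketch reproducing this F test (numbers from the slides; scipy assumed):

```python
# Sketch: SSR form of the F statistic and its p-value.
from scipy import stats
q, df = 3, 353 - 5 - 1
F = ((198.311 - 183.186) / q) / (183.186 / df)
print(F, stats.f.sf(F, q, df))   # ~9.55, p-value ~0.0000
```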

(*) b: Relationship between F and t Statistics

When there is only one restriction, we can use both the t test and the F test.
It turns out that they are equivalent (in testing against two-sided alternatives) in the sense that $F = t^2$.
- Recall that

$$F_{1,n-k-1} = \frac{\chi^2_1/1}{\chi^2_{n-k-1}/(n-k-1)} = \left[\frac{N(0,1)}{\sqrt{\chi^2_{n-k-1}/(n-k-1)}}\right]^2 = t_{n-k-1}^2.$$

The t statistic is more flexible for testing a single hypothesis since it can be used to test against one-sided alternatives.
Since t statistics are easier to obtain than F statistics, there is no reason to use an F statistic to test a hypothesis about a single parameter.
The F statistic is intended to detect whether a set of coefficients is different from zero, but it is never the best test for determining whether a single coefficient is different from zero. The t test is best suited for testing a single hypothesis.
- It is possible that $\beta_1$ and/or $\beta_2$ is significant based on the t test, but $(\beta_1, \beta_2)$ are jointly insignificant based on the F test.
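A one-line numerical check of $F_{1,df} = t_{df}^2$ at the 5% two-sided level (scipy assumed):

```python
# Sketch: the F and squared-t critical values coincide for one restriction.
from scipy import stats
df = 40
print(stats.f.ppf(0.95, 1, df), stats.t.ppf(0.975, df) ** 2)   # both ~4.08
```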

c: The R-Squared Form of the F Statistic

Recall that

$$R^2 = 1 - \frac{SSR}{SST} \implies SSR = SST\left(1 - R^2\right).$$

As a result,

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{\left[SST(1 - R_r^2) - SST(1 - R_{ur}^2)\right]/q}{SST(1 - R_{ur}^2)/(n-k-1)} = \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n-k-1)}.$$

In the example,

$$F = \frac{(.6278 - .5971)/3}{(1 - .6278)/347} \approx 9.54,$$

very close to the result based on SSR (the difference is due to rounding error).
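The R-squared form in a short sketch (numbers from the slides):

```python
# Sketch: R-squared form of the F statistic for the baseball example.
q, df = 3, 347
F = ((.6278 - .5971) / q) / ((1 - .6278) / df)
print(F)   # ~9.54, matching the SSR form up to rounding
```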

e: The F Statistic for Overall Significance of a Regression

In the regression

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u,$$

we want to check whether $(x_1, \ldots, x_k)$ do not help to explain $y$.
Rigorously,

$$H_0: \beta_1 = \cdots = \beta_k = 0.$$

The restricted model is

$$y = \beta_0 + u,$$

which is a regression on a constant, and $\hat{\beta}_0 = \bar{y}$ from Assignment I.
The F statistic is

$$F = \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n-k-1)} = \frac{R_{ur}^2/k}{(1 - R_{ur}^2)/(n-k-1)} \sim F_{k,n-k-1},$$

where $q = k$ and $R_r^2 = 0$ from Assignment I.
The test of overall significance is reported in most regression packages; the null hypothesis is usually overwhelmingly rejected.

f: Testing General Linear Restrictions

Example: Test whether house price assessments are rational, where the model is

$$\log(price) = \beta_0 + \beta_1 \log(assess) + \beta_2 \log(lotsize) + \beta_3 \log(sqrft) + \beta_4 bdrms + u,$$

where
price = house price
assess = the assessed housing value (before the house was sold)
lotsize = size of the lot, in square feet
sqrft = square footage
bdrms = number of bedrooms
The null is

$$H_0: \beta_1 = 1,\; \beta_2 = \beta_3 = \beta_4 = 0.$$

$\beta_1 = 1$ means that if house price assessments are rational, a 1% change in the assessment should be associated with a 1% change in price.
$\beta_2 = \beta_3 = \beta_4 = 0$ means that, in addition, other known factors should not influence the price once the assessed value has been controlled for.

Example (continued)

The unrestricted regression is

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u.$$

The restricted regression is

$$y = \beta_0 + x_1 + u \implies y - x_1 = \beta_0 + u.$$

- The restricted model is actually a regression of $y - x_1$ on a constant, and the resulting $\hat{\beta}_0$ is the sample mean of $y - x_1$.
The test statistic is

$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} = \frac{(1.880 - 1.822)/4}{1.822/(88 - 4 - 1)} \approx .661,$$

where $SSR_{ur}$ is obtained from the next slide.

$$F \sim F_{4,83} \implies c_{0.05} = 2.50 \implies H_0 \text{ cannot be rejected}.$$
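A short sketch of this computation (numbers from the slides; scipy assumed):

```python
# Sketch: F test of the joint restriction beta1 = 1, beta2 = beta3 = beta4 = 0.
from scipy import stats
q, df = 4, 88 - 4 - 1
F = ((1.880 - 1.822) / q) / (1.822 / df)
print(F, stats.f.ppf(0.95, q, df))   # ~.66 vs. critical value ~2.5: H0 not rejected
```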

Regression Output for the Unrestricted Regression

The fitted unrestricted regression line is

$$\widehat{\log(price)} = \underset{(.570)}{.264} + \underset{(.151)}{1.043}\,\log(assess) + \underset{(.0386)}{.0074}\,\log(lotsize) - \underset{(.1384)}{.1032}\,\log(sqrft) + \underset{(.0221)}{.0338}\,bdrms$$
$$n = 88, \quad SSR = 1.822, \quad R^2 = .773$$

The F test works for general multiple linear hypotheses.
For all tests and confidence intervals, the validity of assumptions MLR.1-MLR.6 (esp. homoskedasticity) has been assumed. Tests may be invalid otherwise.
The p-value for the F test is defined similarly to that for the t test, see slide 51.
(*) Like the CI, the confidence region for $(\beta_1, \beta_2)$ can be constructed by collecting all values $(b_1, b_2)$ that cannot be rejected in testing $H_0: \beta_1 = b_1, \beta_2 = b_2$ vs. $H_1: \beta_1 \neq b_1$ or $\beta_2 \neq b_2$.
