Ch04 - MLR Inference - Ver2
Ping Yu
We need to study sampling distributions of the OLS estimators for statistical inference!
The OLS estimators are random variables.
We already know their expected values and their variances.
However, for hypothesis tests we need to know their distribution.
In order to derive their distribution we need additional assumptions.
Assumption about distribution of errors: normal distribution.
Assumption MLR.6 (Normality): $u_i$ is independent of $(x_{i1}, \ldots, x_{ik})$, and $u_i \sim N(0, \sigma^2)$.
Figure: PDF of Wage and PMF of Number of Arrests: the minimum wage in HK is HK$34.5/hr, and in Beijing RMB24/hr at present.
Therefore,
$$\frac{\hat{\beta}_j - \beta_j}{\mathrm{sd}(\hat{\beta}_j)} \sim N(0, 1).$$
The estimators are normally distributed around the true parameters with the
variance that was derived earlier.
- Note that, as before, we are conditioning on $\{x_i, i = 1, \ldots, n\}$.
The standardized estimators follow a standard normal distribution.
- Recall that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} \sim N(0, 1)$.
Replacing the unknown $\mathrm{sd}(\hat{\beta}_j)$ with its estimate $\mathrm{se}(\hat{\beta}_j)$,
$$\frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{n-k-1} = t_{df}.$$
[Review] t-Distribution
If $Z \sim N(0,1)$ and $X \sim \chi^2_n$ is independent of $Z$, then
$$\frac{Z}{\sqrt{X/n}} = \frac{\text{standard normal variable}}{\sqrt{\text{independent chi-square variable}/df}} \sim t_n.$$
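A minimal simulation sketch of this construction (assuming NumPy is available; the df value and replication count are illustrative):

import numpy as np

rng = np.random.default_rng(0)
df, reps = 5, 100_000
Z = rng.standard_normal(reps)        # Z ~ N(0, 1)
X = rng.chisquare(df, size=reps)     # X ~ chi-square with df degrees of freedom, independent of Z
T = Z / np.sqrt(X / df)              # should behave like a t_df variable
# The t distribution has heavier tails than the normal:
print(np.mean(np.abs(T) > 2), np.mean(np.abs(Z) > 2))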
If $Z_1, \ldots, Z_n$ are independently distributed random variables such that $Z_i \sim N(0,1)$, $i = 1, \ldots, n$, then
$$X = \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n.$$
- Note that $E\!\left(\sum_{i=1}^{n} Z_i^2 / n\right) = E(Z_i^2) = 1$ and $\mathrm{Var}\!\left(\sum_{i=1}^{n} Z_i^2 / n\right) = \mathrm{Var}(Z_i^2)/n \to 0$, so $X/n \to 1$ as $n \to \infty$.
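A quick numerical check of this concentration (a sketch assuming NumPy; the n values are illustrative):

import numpy as np

rng = np.random.default_rng(1)
for n in (10, 100, 10_000):
    Z = rng.standard_normal(n)
    X = np.sum(Z ** 2)               # X ~ chi-square with n degrees of freedom
    print(n, X / n)                  # X/n gets close to E(Z_i^2) = 1 as n grows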
Testing Hypotheses about a Single Population Parameter: The t Test
[Figure: density plot]
$$\frac{N(0,1)}{\sqrt{\chi^2_{n-k-1}/(n-k-1)}} = t_{n-k-1}$$
t Statistic or t Ratio
Suppose the null hypothesis (for more general hypotheses, see below) is
$$H_0: \beta_j = 0.$$
The population parameter is equal to zero, i.e., after controlling for the other independent variables, there is no effect of $x_j$ on $y$.
The t statistic or t ratio of $\hat{\beta}_j$ for this $H_0$ is
$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)}.$$
The t statistic will be used to test the above null hypothesis. The farther the
estimated coefficient is away from zero, the less likely it is that the null hypothesis
holds true. But what does "far" away from zero mean?
This depends on the variability of the estimated coefficient, i.e., its standard
deviation. The t statistic measures how many estimated standard deviations the
estimated coefficient is away from zero.
If the null hypothesis is true,
$$t_{\hat{\beta}_j} = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)} = \frac{\hat{\beta}_j - \beta_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{n-k-1}.$$
Goal: Define a rejection rule so that, if H0 is true, it is rejected only with a small
probability (=significance level or level for short, e.g., 5%).
To determine the rejection rule, we need to decide the relevant alternative
hypothesis.
First consider a one-sided alternative of the form
$$H_1: \beta_j > 0.$$
Reject the null hypothesis in favor of the alternative hypothesis if the estimated
coefficient is "too large" (i.e. larger than a critical value). (why?)
Construct the critical value so that, if the null hypothesis is true, it is rejected in, for
example, 5% of the cases. [figure here]
In the figure, this is the point of the t-distribution with 28 degrees of freedom that is
exceeded in 5% of the cases.
So the rejection rule is to reject if the t statistic is greater than 1.701.
Analogy: a defendant is convicted when evidence is observed that could hardly have arisen if he were not a criminal.
Rejection region is the set of values of t statistic at which we will reject the null.
Test whether, after controlling for education and tenure, higher work experience
leads to higher hourly wages.
The fitted regression line is
$$\widehat{\log(wage)} = \underset{(.104)}{.284} + \underset{(.007)}{.092}\,educ + \underset{(.0017)}{.0041}\,exper + \underset{(.003)}{.022}\,tenure$$
$$n = 526,\quad R^2 = .316$$
$$t_{exper} = \frac{.0041}{.0017} \approx 2.41.$$
$df = n - k - 1 = 526 - 3 - 1 = 522$, quite large, so the standard normal approximation applies.
The 5% critical value is $c_{0.05} = 1.645$, and the 1% critical value is $c_{0.01} = 2.326$.
- 5% and 1% are conventional significance levels.
The null hypothesis is rejected because the t statistic exceeds the critical value.
The conclusion is that the effect of experience on hourly wage is statistically
greater than zero at the 5% (and even at the 1%) significance level.
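The decision can be reproduced numerically; a sketch assuming SciPy is installed, with the estimate and standard error taken from the fitted equation above:

from scipy.stats import norm, t

b_hat, se = 0.0041, 0.0017
df = 526 - 3 - 1                              # n - k - 1 = 522
t_stat = b_hat / se                           # about 2.41
c05, c01 = norm.ppf(0.95), norm.ppf(0.99)     # ~1.645 and ~2.326 (normal approximation)
print(round(t_stat, 2), t_stat > c05, t_stat > c01)
# Exact t critical values barely differ because df is large:
print(round(t.ppf(0.95, df), 3), round(t.ppf(0.99, df), 3))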
We want to test
$$H_0: \beta_j = 0 \text{ against } H_1: \beta_j < 0.$$
Reject the null hypothesis in favor of the alternative hypothesis if the estimated
coefficient is "too small" (i.e., smaller than a critical value). (why?)
Construct the critical value so that, if the null hypothesis is true, it is rejected in, for
example, 5% of the cases. [figure here]
In the figure, this is the point of the t-distribution with 18 degrees of freedom so
that 5% of the cases are below the point.
So the rejection rule is to reject if the t statistic is less than $-1.734$.
$$\widehat{math10} = \underset{(6.113)}{2.274} + \underset{(.00010)}{.00046}\,totcomp + \underset{(.040)}{.048}\,staff - \underset{(.00022)}{.00020}\,enroll$$
$$n = 408,\quad R^2 = .0541 \text{ (quite small)}$$
where
math10 = percentage of students passing 10th-grade math test
totcomp = average annual teacher compensation
staff = staff per one thousand students
enroll = school enrollment (= school size)
Test
$$H_0: \beta_{enroll} = 0 \text{ against } H_1: \beta_{enroll} < 0.$$
- Do larger schools hamper student performance or is there no such effect?
The t statistic for $\hat{\beta}_{enroll}$ is
$$t_{enroll} = \frac{-.00020}{.00022} \approx -.91.$$
$df = n - k - 1 = 408 - 3 - 1 = 404$, quite large, so the standard normal approximation applies.
The 5% critical value is $c_{0.05} = -1.645$, and the 15% critical value is $c_{0.15} = -1.04$.
The null hypothesis is not rejected because the t statistic is not smaller than the
critical value.
The conclusion is that one cannot reject the hypothesis that there is no effect of
school size on student performance (not even for a lax significance level of 15%).
In an alternative specification with $\log(enroll)$ in place of $enroll$ (estimated coefficient $-1.29$ with standard error $.69$),
$$t_{\log(enroll)} = \frac{-1.29}{0.69} \approx -1.87.$$
$c_{0.05} = -1.645$ and $t_{\log(enroll)} < c_{0.05}$, so the hypothesis that there is no effect of school size on student performance can be rejected in favor of the hypothesis that the effect is negative.
How large is the effect? Quite small:
$$-1.29 = \frac{\partial\, math10}{\partial \log(enroll)} = \frac{\partial\, math10}{\partial\, enroll/enroll} = \frac{-1.29/100}{1/100} = \frac{-0.0129}{+1\%},$$
i.e., a 1% increase in enrollment is associated with a fall in the pass rate of only about .013 percentage points.
We want to test
$$H_0: \beta_j = 0 \text{ against } H_1: \beta_j \neq 0.$$
Reject the null hypothesis in favor of the alternative hypothesis if the absolute
value of the estimated coefficient is too large. (why?)
Construct the critical value so that, if the null hypothesis is true, it is rejected in, for
example, 5% of the cases. [figure here]
In the figure, these are the points of the t-distribution so that 5% of the cases lie in
the two tails.
So the rejection rule is to reject if the t statistic is greater than 2.06 or less than $-2.06$.
$$\widehat{colGPA} = \underset{(.33)}{1.39} + \underset{(.094)}{.412}\,hsGPA + \underset{(.011)}{.015}\,ACT - \underset{(.026)}{.083}\,skipped$$
$$n = 141,\quad R^2 = .234$$
where
skipped = average number of lectures missed per week
$df = n - k - 1 = 141 - 3 - 1 = 137$, quite large, so the standard normal approximation applies.
$t_{hsGPA} = 4.38 > c_{0.01} = 2.576$
$t_{ACT} = 1.36 < c_{0.10} = 1.645$
$|t_{skipped}| = |-3.19| > c_{0.01} = 2.576$
The effects of hsGPA and skipped are significantly different from zero at the 1%
significance level. The effect of ACT is not significantly different from zero, not
even at the 10% significance level.
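The same three decisions can be reproduced with a short script (a sketch assuming SciPy; the coefficients and standard errors are those reported above):

from scipy.stats import norm

coefs = {"hsGPA": (0.412, 0.094), "ACT": (0.015, 0.011), "skipped": (-0.083, 0.026)}
crit = {0.10: norm.ppf(0.95), 0.05: norm.ppf(0.975), 0.01: norm.ppf(0.995)}
for name, (b, se) in coefs.items():
    t_stat = b / se
    decisions = {level: abs(t_stat) > c for level, c in crit.items()}
    print(name, round(t_stat, 2), decisions)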
The test works exactly as before, except that the hypothesized value is
subtracted from the estimate when forming the statistic.
$$\widehat{\log(crime)} = \underset{(1.03)}{-6.63} + \underset{(.11)}{1.27}\,\log(enroll)$$
$$n = 97,\quad R^2 = .585$$
where
crime = annual number of crimes on college campuses
Although $\hat{\beta}_{\log(enroll)}$ is different from one, is this difference statistically significant? We want to test
$$H_0: \beta_{\log(enroll)} = 1 \text{ against } H_1: \beta_{\log(enroll)} \neq 1.$$
The t statistic is
$$t = \frac{1.27 - 1}{.11} \approx 2.45 > 1.96 = c_{0.05},$$
so the null is rejected at the 5% level.
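A sketch of the same calculation (assuming SciPy); the only change from testing against zero is subtracting the hypothesized value 1:

from scipy.stats import norm

b_hat, se, b_null = 1.27, 0.11, 1.0
t_stat = (b_hat - b_null) / se        # about 2.45
c05 = norm.ppf(0.975)                 # ~1.96 (two-sided 5% critical value; df = 95 is large)
print(round(t_stat, 2), abs(t_stat) > c05)   # True: reject H0 at the 5% level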
Rejection/Acceptance Dichotomy: J. Neyman and E.S. Pearson (the son of Karl Pearson).
p-Value: R.A. Fisher (we will discuss more about him later).
If the significance level is made smaller and smaller, there will be a point where the
null hypothesis cannot be rejected anymore.
The reason is that, by lowering the significance level, one guards more and more against the error of rejecting a correct $H_0$.
The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test.
- The p-value is the significance level at which one is indifferent between rejecting
and not rejecting the null hypothesis. [figure here]
- Alternatively, the p-value is the probability of observing a t statistic as extreme as we did if the null is true. [$p = P(|T| > |t|)$]
- A null hypothesis is rejected if and only if the corresponding p-value is smaller than the significance level. [$\alpha = P(|T| > c_\alpha)$, so $|t| > c_\alpha \iff p < \alpha$]
A small p-value is evidence against the null hypothesis because one would reject
the null hypothesis even at small significance levels.
A large p-value is evidence in favor of the null hypothesis.
Figure: Obtaining the P-Value Against a Two-Sided Alternative, When t = 1.85 and df = 40
P-values are more informative than tests at fixed significance levels because you
can choose your own significance level.
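For the t statistic in the figure ($t = 1.85$ with $df = 40$), the two-sided p-value can be computed directly (a sketch assuming SciPy):

from scipy.stats import t

t_stat, df = 1.85, 40
p_two_sided = 2 * t.sf(abs(t_stat), df)    # P(|T| > 1.85) for T ~ t_40, roughly 0.07
print(round(p_two_sided, 4))
# p < .10 but p > .05: reject at the 10% level, not at the 5% level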
Confidence Intervals
The 95% confidence interval (CI) for $\beta_j$ is
$$\hat{\beta}_j \pm c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j),$$
where $c_{0.05}$ is the 5% critical value of the two-sided test, and 0.95 is called the confidence level.
Interpretation of the confidence interval:
- The bounds of the interval are random. (The length $= 2\,c_{0.05} \cdot \mathrm{se}(\hat{\beta}_j)$ is also random!)
- In repeated samples, the interval that is constructed in the above way will cover
the population regression coefficient in 95% of the cases.
Analogy: catching a butterfly with a net.
Use the rules of thumb: $c_{0.01} = 2.576$, $c_{0.05} = 1.96$ and $c_{0.10} = 1.645$.
- To catch a butterfly with a higher probability, we must use a larger net (a wider interval).
$$\widehat{\log(rd)} = \underset{(.47)}{-4.38} + \underset{(.060)}{1.084}\,\log(sales) + \underset{(.0128)}{.0217}\,profmarg$$
$$n = 32,\quad R^2 = .918 \text{ (very high)}$$
where
rd = firm’s spending on R&D
sales = annual sales
profmarg = profits as percentage of sales
$df = 32 - 2 - 1 = 29$, so $c_{0.05} = 2.045$.
The 95% CI for $\beta_{\log(sales)}$ is $1.084 \pm 2.045\,(.060) = (.961, 1.21)$. The effect of $\log(sales)$ on $\log(rd)$ is relatively precisely estimated as the interval is narrow. Moreover, the effect is significantly different from zero because zero is outside the interval.
The 95% CI for $\beta_{profmarg}$ is $.0217 \pm 2.045\,(.0128) = (-.0045, .0479)$. The effect of $profmarg$ on $\log(rd)$ is imprecisely estimated as the interval is very wide. It is not even statistically significant because zero lies in the interval.
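A sketch of the interval computations for this example (assuming SciPy; the numbers are as reported above):

from scipy.stats import t

n, k = 32, 2
c05 = t.ppf(0.975, n - k - 1)             # ~2.045 with df = 29
for name, b, se in [("log(sales)", 1.084, 0.060), ("profmarg", 0.0217, 0.0128)]:
    lo, hi = b - c05 * se, b + c05 * se
    print(name, (round(lo, 4), round(hi, 4)), "significant at 5%:", not (lo <= 0 <= hi))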
Example: consider the model
$$\log(wage) = \beta_0 + \beta_1\, jc + \beta_2\, univ + \beta_3\, exper + u,$$
where
jc = years of education at 2-year colleges
univ = years of education at 4-year colleges
Suppose we want to test
$$H_0: \beta_1 - \beta_2 = 0 \text{ vs. } H_1: \beta_1 - \beta_2 < 0.$$
The t statistic is
$$t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{\mathrm{se}(\hat{\beta}_1 - \hat{\beta}_2)}.$$
Here $\mathrm{se}(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{\mathrm{se}(\hat{\beta}_1)^2 + \mathrm{se}(\hat{\beta}_2)^2 - 2\,\widehat{\mathrm{Cov}}(\hat{\beta}_1, \hat{\beta}_2)}$, where $\widehat{\mathrm{Cov}}(\hat{\beta}_1, \hat{\beta}_2)$ is usually not available in regression output.
- A simple trick: define $\theta_1 = \beta_1 - \beta_2$ and $totcoll = jc + univ$; then $\log(wage) = \beta_0 + \theta_1\, jc + \beta_2\, totcoll + \beta_3\, exper + u$, and $\mathrm{se}(\hat{\theta}_1)$ can be read directly off the reparameterized regression.
$$\widehat{\log(wage)} = \underset{(.021)}{1.472} - \underset{(.0069)}{.0102}\,jc + \underset{(.0023)}{.0769}\,totcoll + \underset{(.0002)}{.0049}\,exper$$
$$n = 6{,}763,\quad R^2 = .222$$
Now,
$$t = \frac{-.0102}{.0069} \approx -1.48 \in (c_{0.05}, c_{0.10}) = (-1.645, -1.282),$$
so the null is rejected at the 10% level but not at the 5% level.
Alternatively, the one-sided p-value is $P(T < -1.48) \approx .07$, and the 95% CI for $\beta_1 - \beta_2$ is $-.0102 \pm 1.96\,(.0069) = (-.0237, .0033)$, covering 0.
This method always works for single linear hypotheses.
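A sketch of the one-sided test on $\theta_1 = \beta_1 - \beta_2$ using the reparameterized regression output above (assuming SciPy):

from scipy.stats import norm

theta_hat, se = -0.0102, 0.0069       # coefficient and standard error on jc in the totcoll regression
t_stat = theta_hat / se               # about -1.48
p_one_sided = norm.cdf(t_stat)        # P(T < t) under H0, roughly 0.07 (df = 6759 is large)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(round(t_stat, 2), round(p_one_sided, 3), ci)   # the 95% CI covers 0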
Testing Multiple Linear Restrictions: The F Test
Consider the following model that explains major league baseball players' salaries:
$$\log(salary) = \beta_0 + \beta_1\, years + \beta_2\, gamesyr + \beta_3\, bavg + \beta_4\, hrunsyr + \beta_5\, rbisyr + u,$$
where
salary = the 1993 total salary
years = years in the league
gamesyr = average games played per year
bavg = career batting average
hrunsyr = home runs per year
rbisyr = runs batted in per year
The hypotheses are
$$H_0: \beta_3 = \beta_4 = \beta_5 = 0 \text{ against } H_1: H_0 \text{ is not true},$$
where "$H_0$ is not true" means "at least one of $\beta_3$, $\beta_4$ and $\beta_5$ is not zero".
This tests whether the performance measures have no effect, i.e., whether they can be excluded from the regression.
$$\widehat{\log(salary)} = \underset{(.29)}{11.19} + \underset{(.0121)}{.0689}\,years + \underset{(.0026)}{.0126}\,gamesyr + \underset{(.00110)}{.00098}\,bavg + \underset{(.0161)}{.0144}\,hrunsyr + \underset{(.0072)}{.0108}\,rbisyr$$
$$n = 353,\quad SSR_{ur} = 183.186,\quad R^2 = .6278$$
where the subscript ur in SSR indicates the SSR in the unrestricted model.
None of these three variables is statistically significant when tested individually.
Idea: How would the model fit (measured in SSR) be if these variables were
dropped from the regression?
$$\widehat{\log(salary)} = \underset{(.11)}{11.22} + \underset{(.0125)}{.0713}\,years + \underset{(.0013)}{.0202}\,gamesyr$$
$$n = 353,\quad SSR_r = 198.311,\quad R^2 = .5971$$
where the subscript r in SSR indicates the SSR in the restricted model.
The sum of squared residuals (SSR) necessarily increases in the restricted model [why? recall from Chapter 3 the case with $H_0: \beta_{k+1} = 0$ and $q = 1$], but is the increase statistically significant?
The rigorous test statistic is
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,\, n-k-1},$$
where $q$ is the number of restrictions (the numerator df) and $n - k - 1$ is the denominator df.
Ronald A. Fisher (1890-1962) is an iconic founder of modern statistical theory. The name of
F -distribution was coined by G.W. Snedecor, in honor of R.A. Fisher. The p-value is also
credited to him.
[Review] F -Distribution
If $X_1 \sim \chi^2_{d_1}$ and $X_2 \sim \chi^2_{d_2}$ is independent of $X_1$, then
$$\frac{X_1/d_1}{X_2/d_2} = \frac{\text{chi-square variable}/df}{\text{independent chi-square variable}/df} \sim F_{d_1, d_2},$$
an F-distribution with degrees of freedom $d_1$ and $d_2$.
As in the t-distribution, $X_2/d_2 \to 1$ as $d_2 \to \infty$. So $F_{d_1, d_2} \to \chi^2_{d_1}/d_1$ as $d_2 \to \infty$.
The F statistic is
$$F = \frac{(198.311 - 183.186)/3}{183.186/(353 - 5 - 1)} \approx 9.55,$$
where $q = 3$, $n = 353$ and $k = 5$.
Since $F \sim F_{3,347} \approx \chi^2_3/3$, the 1% critical value is $c_{0.01} = 3.78$; thus the null is rejected.
Alternatively, p-value $= P(F_{3,347} > 9.55) = 0.0000$, so the null hypothesis is overwhelmingly rejected (even at very small significance levels).
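A sketch of the computation for the baseball example (assuming SciPy; SSRs as reported above):

from scipy.stats import f

ssr_r, ssr_ur = 198.311, 183.186
n, k, q = 353, 5, 3
df_den = n - k - 1                                   # 347
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_den)       # about 9.55
print(round(F, 2), f.sf(F, q, df_den))               # the p-value is essentially zero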
Discussion:
- If H0 is rejected, we say that the three variables are "jointly significant".
- They were not significant when tested individually.
- The likely reason is multicollinearity between them.
When there is only one restriction, we can use both the t test and F test.
It turns out that they are equivalent (in testing against two-sided alternatives) in the
sense that F = t 2 .
- Recall that
$$F_{1,\, n-k-1} = \frac{\chi^2_1/1}{\chi^2_{n-k-1}/(n-k-1)} = \left[\frac{N(0,1)}{\sqrt{\chi^2_{n-k-1}/(n-k-1)}}\right]^2 = t_{n-k-1}^2.$$
The t statistic is more flexible for testing a single hypothesis since it can be used to
test against one-sided alternatives.
Since t statistics are easier to obtain than F statistics, there is no reason to use an F statistic to test hypotheses about a single parameter.
The F statistic is intended to detect whether a set of coefficients is different from
zero, but it is never the best test for determining whether a single coefficient is
different from zero. The t test is best suited for testing a single hypothesis.
- It is possible that $\beta_1$ and/or $\beta_2$ is significant based on the t test, but $(\beta_1, \beta_2)$ are jointly insignificant based on the F test.
The F statistic can equivalently be computed from the R-squareds of the two regressions (they share the same SST because the dependent variable is the same, and $SSR = SST(1 - R^2)$). In the example,
$$F = \frac{(.6278 - .5971)/3}{(1 - .6278)/347} \approx 9.54,$$
very close to the result based on SSR (the difference is due to rounding error).
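The R-squared form can be checked the same way (a sketch assuming SciPy):

from scipy.stats import f

r2_ur, r2_r = 0.6278, 0.5971
n, k, q = 353, 5, 3
df_den = n - k - 1
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_den)    # about 9.54
print(round(F, 2), f.sf(F, q, df_den))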
In the regression
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u,$$
we want to check whether $(x_1, \ldots, x_k)$ jointly do not help to explain $y$.
Rigorously,
$$H_0: \beta_1 = \cdots = \beta_k = 0.$$
The restricted model is
$$y = \beta_0 + u,$$
which is a regression on a constant only, and $\hat{\beta}_0 = \bar{y}$ from Assignment I.
The F statistic is
$$F = \frac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n-k-1)} = \frac{R_{ur}^2/k}{(1 - R_{ur}^2)/(n-k-1)} \sim F_{k,\, n-k-1},$$
since the restricted regression on a constant only has $R_r^2 = 0$ and the number of restrictions is $q = k$.
Example: Test whether house price assessments are rational, where the model is
$$\log(price) = \beta_0 + \beta_1 \log(assess) + \beta_2 \log(lotsize) + \beta_3 \log(sqrft) + \beta_4\, bdrms + u,$$
where
price = house price
assess = the assessed housing value (before the house was sold)
lotsize = size of the lot, in feet
sqrft = square footage
bdrms = number of bedrooms
The null is
$$H_0: \beta_1 = 1,\ \beta_2 = \beta_3 = \beta_4 = 0.$$
$\beta_1 = 1$ means that if house price assessments are rational, a 1% change in the assessment should be associated with a 1% change in price.
$\beta_2 = \beta_3 = \beta_4 = 0$ means that, in addition, other known factors should not influence the price once the assessed value has been controlled for.
Writing the model generically as $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + u$, the restricted model under $H_0$ is
$$y = \beta_0 + x_1 + u \Longrightarrow y - x_1 = \beta_0 + u,$$
so $SSR_r$ is obtained by regressing $y - x_1$ on a constant only.
$$\widehat{\log(price)} = \underset{(.570)}{.264} + \underset{(.151)}{1.043}\,\log(assess) + \underset{(.0386)}{.0074}\,\log(lotsize) - \underset{(.1384)}{.1032}\,\log(sqrft) + \underset{(.0221)}{.0338}\,bdrms$$
$$n = 88,\quad SSR = 1.822,\quad R^2 = .773$$
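A sketch of how the restricted SSR and the F statistic could then be obtained (assuming NumPy and SciPy; the array names log_price and log_assess are hypothetical placeholders for the data):

import numpy as np
from scipy.stats import f

def ssr_on_constant(y):
    """SSR from regressing y on a constant only (the OLS intercept is the sample mean)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y - y.mean()) ** 2))

def f_test_from_ssr(ssr_r, ssr_ur, q, n, k):
    """F statistic and p-value for q restrictions with denominator df = n - k - 1."""
    df_den = n - k - 1
    F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_den)
    return F, f.sf(F, q, df_den)

# With the actual data one would run, e.g.:
#   ssr_r = ssr_on_constant(log_price - log_assess)   # restricted model: y - x1 regressed on a constant
#   print(f_test_from_ssr(ssr_r, 1.822, q=4, n=88, k=4))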