AE 2023 Lecture4 PDF
Applied Econometrics
Dr. Le Anh Tuan
Hypothesis testing in empirical research
[Diagram: research workflow. Introduction (motivations, contributions) → Hypothesis (theory background, hypotheses) → Results (confirm hypothesis). The research question is turned into a statistical hypothesis that states a specific direction of influence.]
A Reminder
►Imagine you want to find out whether a new diet actually
helps people lose weight or whether it is completely
useless (most diets are)
►You’ve collected data about 100 people, who had been on
the diet for 8 weeks
►Let $d_i$ denote the difference in the weight of the $i$th person
after and before the diet:
$d_i = \text{weight}_{\text{after}} - \text{weight}_{\text{before}}$
►You’re testing the average effect of a diet, which means
you’re making a hypothesis about the population mean
of $d$, called μ
►First, you need to state:
►H0: the null hypothesis
►H1: the alternative hypothesis
A Reminder
► In hypothesis testing, the null hypothesis is something you’re trying
to disprove (reject) using the evidence in your data.
► In order to show the diet works, you’ll actually be disproving that it
doesn’t. Therefore, the null hypothesis will be:
H0: μ = 0 (on average, there’s no effect)
► The alternative hypothesis is a vague definition of what you’re trying
to show, e.g.:
H1: μ < 0 (on average, people lose weight)
► Next, you look at the average effect of diet in your data and find that,
say, $\bar{d} = -1.5$ (in your sample, on average, people lost 1.5 kg)
► Is that a reason to reject H0?
► We don’t know yet
► Even if H0 is actually true, we would not expect the sample
average to be exactly 0
► The question is whether –1.5 is sufficiently far away from 0
so that we can reject H0.
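How far is "sufficiently far" can be measured by dividing the sample average by its standard error. The snippet below is a minimal sketch of that calculation. The five weight changes are hypothetical illustration values (not the 100 observations mentioned above), and `t_stat_mean` is a name chosen here.

```python
import math
import statistics

def t_stat_mean(diffs, mu0=0.0):
    """t-statistic for H0: mu = mu0, computed from a sample of differences."""
    n = len(diffs)
    d_bar = statistics.mean(diffs)                 # sample average of d_i
    se = statistics.stdev(diffs) / math.sqrt(n)    # standard error of the mean
    return (d_bar - mu0) / se

# Hypothetical weight changes in kg (after minus before), for illustration only
sample_diffs = [-2.0, -1.0, -3.0, 0.0, -1.5]
t_value = t_stat_mean(sample_diffs)
print(f"mean = {statistics.mean(sample_diffs):.2f}, t = {t_value:.2f}")
```

The larger the t-statistic is in absolute value, the harder it is to attribute the sample average to chance alone.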
A Reminder
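►In hypothesis testing, we can make two kinds of mistakes: a Type I error (rejecting H0 when it is true) and a Type II error (failing to reject H0 when it is false).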
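A Type I error means rejecting a null hypothesis that is actually true. The simulation below is a rough sketch of what that looks like in practice: it repeatedly draws samples for which H0: μ = 0 holds and counts how often a left-tailed test at the 5% level rejects anyway. The sample size, number of repetitions, seed and the function name `type_one_error_rate` are assumptions made for this illustration.

```python
import math
import random
import statistics

def type_one_error_rate(n_reps=2000, n=100, crit=-1.645, seed=0):
    """Share of samples in which a true H0: mu = 0 is rejected (left-tailed test)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_reps):
        # Draw a sample for which H0 is true: the population mean is exactly 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        t = statistics.fmean(sample) / se
        if t < crit:              # rejecting here is a Type I error
            rejections += 1
    return rejections / n_reps

rate = type_one_error_rate()
print(f"simulated Type I error rate: {rate:.3f}")
```

The simulated rate should land close to the chosen significance level, which is exactly what the significance level is meant to control.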
Hypothesis Testing
► Consider the following multiple regression model:
$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + u_i$
Hypothesis Testing
► Assumption MLR.6 (Normality of error terms). The population error
$u$ is independent of the explanatory variables and is normally
distributed with zero mean and variance $\sigma^2$:
$u \sim \text{Normal}(0, \sigma^2)$, independently of $x_1, \dots, x_k$
It is assumed that the unobserved factors are normally distributed around
the population regression function.
Hypothesis Testing Process
5. Interpret. We either:
► Reject the null
► Fail to reject the null
► We NEVER accept the null
Testing Hypotheses about a Single Population Parameter: The t Test
Hypothesis Testing
► Under MLR.1 through MLR.6, the standardized estimator follows
Student’s t distribution with $n - k - 1$ degrees of freedom:
$\dfrac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1}$
Hypothesis Testing
► Null hypothesis (more general hypotheses are covered on later slides): $H_0: \beta_j = 0$
The population parameter is equal to zero, i.e.
after controlling for the other independent
variables, there is no effect of $x_j$ on $y$
► t-statistic (or t-ratio): $t = \dfrac{\hat{\beta}_j}{se(\hat{\beta}_j)}$
Whether the estimated coefficient is "far" from zero depends on its variability, i.e. its standard
deviation. The t-statistic measures how many estimated standard deviations
the estimated coefficient is away from zero.
Decision rule
► Choose a significance level (1%, 5%, or 10%)
► If the deviation of $\hat{\beta}_j$ from the null hypothesis value $\beta_j = 0$ is large enough, one
would reject H0
► Intuition: if t is very large (or very negative), then
a) the estimate $\hat{\beta}_j$ is far from $\beta_j$ (under H0) and/or
b) the standard error of the estimate is small relative to $\hat{\beta}_j - \beta_j$
$t\text{-stat} = \dfrac{\hat{\beta}_j}{se(\hat{\beta}_j)}$
Testing against one-sided alternatives
(greater than zero)
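Test $H_0: \beta_j = 0$ against $H_1: \beta_j > 0$
[Figure: t distribution with the rejection region in the right tail; 5% critical value $c_{0.05} = 1.701$]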
Testing against one-sided alternatives
(greater than zero)
► Example WAGE: Test whether, after controlling for education and
tenure, higher work experience leads to higher hourly wages
Testing against one-sided alternatives
(greater than zero)
► $t\text{-statistic} = \dfrac{0.041}{0.017} = 2.41$
► Degrees of freedom: $df = n - k - 1 = 526 - 3 - 1 = 522$
► At the 5% significance level:
► critical value $c_{0.05} = 1.645$
► t-statistic $> c_{0.05}$ → we reject the null
► At the 1% significance level:
► critical value $c_{0.01} = 2.326$
► t-statistic $> c_{0.01}$ → we reject the null
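The right-tailed decision in this example can be retraced in a few lines. This is a sketch that assumes the coefficient and standard error read from the slide (0.041 and 0.017) and uses the standard normal distribution for the critical values, which is reasonable here because the degrees of freedom (522) are large.

```python
from statistics import NormalDist

beta_hat, se = 0.041, 0.017          # estimate and standard error as read from the slide
t_stat = beta_hat / se               # H0: beta = 0, so nothing is subtracted

decisions = {}
for alpha in (0.05, 0.01):
    # Right-tail critical value under the normal approximation
    crit = NormalDist().inv_cdf(1 - alpha)
    decisions[alpha] = t_stat > crit
    outcome = "reject H0" if decisions[alpha] else "fail to reject H0"
    print(f"alpha = {alpha}: t = {t_stat:.2f}, critical value = {crit:.3f} -> {outcome}")
```

With a small sample, the critical values should come from the t distribution with n − k − 1 degrees of freedom instead.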
Testing against one-sided alternatives (less
than zero)
Test $H_0: \beta_j = 0$ against $H_1: \beta_j < 0$
Testing against one-sided alternatives (less than
zero)
► Example: Student performance and school size-MEAP93
► Test whether smaller school size leads to better student
performance
Variables in the regression: percentage of students passing maths test; average annual teacher compensation; staff per one thousand students; student enrollment (= school size)
Testing against one-sided alternatives (less than
zero)
► $t\text{-statistic} = \dfrac{-0.0002}{0.00022} = -0.91$
► Degrees of freedom: $df = n - k - 1 = 408 - 3 - 1 = 404$
► At the 5% significance level:
► critical value $c_{0.05} = -1.65$
► t-statistic $> c_{0.05}$ (it does not fall in the left-tail rejection region) → we cannot reject the null
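For a left-tailed test the rejection region sits below a negative critical value. The sketch below retraces the school-size example under the normal approximation (df = 404 is large), using the enrollment coefficient and standard error read from the slide.

```python
from statistics import NormalDist

beta_hat, se = -0.0002, 0.00022      # enrollment coefficient and its standard error
t_stat = beta_hat / se

# Left-tail critical value at the 5% level (normal approximation)
crit = NormalDist().inv_cdf(0.05)
reject = t_stat < crit               # reject only if t lies far enough below zero
print(f"t = {t_stat:.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```

The estimate has the expected negative sign, but it is less than one standard error away from zero, so the data do not support the claim at the 5% level.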
Testing against two-sided alternatives
Test $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$
Reject the null hypothesis in favour of the
alternative hypothesis if the absolute value of
the estimated coefficient is too large, i.e. if |t| exceeds the critical value.
► Test: $H_0: \beta_{hsGPA} = 0$ (no effect of high school GPA on college GPA)
$H_1: \beta_{hsGPA} \neq 0$ (high school GPA has an effect on college GPA)
Testing against two-sided alternatives
► $t_{hsGPA} = \dfrac{0.412}{0.094} = 4.38$
► Degrees of freedom: $df = n - k - 1 = 141 - 3 - 1 = 137$
► At the 1% significance level:
► critical value $c_{0.01} = 2.576$
► $|t_{hsGPA}| > c_{0.01}$ → we can reject the null
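In a two-sided test the significance level is split between both tails, and the absolute value of the t-statistic is compared with the critical value. A minimal sketch for the hsGPA example, assuming the normal approximation and the estimate 0.412 with standard error 0.094:

```python
from statistics import NormalDist

beta_hat, se = 0.412, 0.094          # hsGPA coefficient and its standard error
t_stat = beta_hat / se

alpha = 0.01
crit = NormalDist().inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
reject = abs(t_stat) > crit
print(f"|t| = {abs(t_stat):.2f}, critical value = {crit:.3f}, reject H0: {reject}")
```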
Summary
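► If a regression coefficient is significantly different from zero in a
two-sided test, the corresponding variable is said to be
“statistically significant”
► If the number of degrees of freedom is large enough so that
the normal approximation applies, the following rules of
thumb apply:
|t-ratio| > 1.645: statistically significant at the 10% level
|t-ratio| > 1.96: statistically significant at the 5% level
|t-ratio| > 2.576: statistically significant at the 1% level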
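The thresholds used in such rules of thumb are the two-sided critical values of the standard normal distribution. As a quick check, they can be computed with Python's standard library (the dictionary name `rule_of_thumb` is chosen here for illustration):

```python
from statistics import NormalDist

# Two-sided critical values of the standard normal for the usual levels
rule_of_thumb = {
    alpha: NormalDist().inv_cdf(1 - alpha / 2) for alpha in (0.10, 0.05, 0.01)
}
for alpha, crit in rule_of_thumb.items():
    print(f"|t| > {crit:.3f}: statistically significant at the {alpha:.0%} level")
```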
Testing more general hypotheses about a
regression coefficient
► Null hypothesis: $H_0: \beta_j = a_j$, where $a_j$ is the hypothesized value
of the coefficient
► t-statistic: $t = \dfrac{\hat{\beta}_j - a_j}{se(\hat{\beta}_j)}$
Testing more general hypotheses about
a regression coefficient
► Example: Campus crime and enrollment
► We would like to test whether crime increases by one percent
if enrollment is increased by one percent
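When the hypothesized value is not zero, it has to be subtracted from the estimate before dividing by the standard error. The sketch below assumes a log-log specification, so that "one percent more enrollment leads to one percent more crime" corresponds to H0: β = 1. The estimate and standard error are illustrative values assumed for this sketch, not figures taken from the slides, and `t_stat_general` is a name chosen here.

```python
def t_stat_general(beta_hat, a_j, se):
    """t-statistic for H0: beta_j = a_j."""
    return (beta_hat - a_j) / se

# Illustrative elasticity estimate and standard error (assumed values)
beta_hat, se = 1.27, 0.11
t_stat = t_stat_general(beta_hat, a_j=1.0, se=se)
print(f"t = {t_stat:.2f}")
```

Testing against zero in the same setting would answer a different question, namely whether enrollment has any effect on crime at all.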
Using p-Values for Hypothesis Testing
What is the p-value?
► Classical approach to hypothesis testing: first choose the
significance level, then test the hypothesis at the given
level of significance (e.g. 5%)
► However, there is no “correct” significance level.
► What is the smallest significance level at which the null
hypothesis would still be rejected?
► p-value is the smallest significance level at which
the null hypothesis would be rejected.
► Remember that the significance level describes the
probability of type I error.
→ the smaller the p-value, the more evidence there is in the
sample data against the null hypothesis and for the
alternative hypothesis.
→ if the p-value is less than our level of significance, we reject H0
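A p-value can be computed directly from the t-statistic once the alternative hypothesis is fixed. The helper below is a sketch that uses the standard normal distribution as an approximation to the t distribution, which is adequate only when the degrees of freedom are large. The function name `p_value` and its `alternative` argument are assumptions of this example.

```python
from statistics import NormalDist

def p_value(t_stat, alternative="two-sided"):
    """p-value of a t-statistic under the normal approximation (large df)."""
    z = NormalDist()
    if alternative == "greater":          # H1: beta_j > 0
        return 1 - z.cdf(t_stat)
    if alternative == "less":             # H1: beta_j < 0
        return z.cdf(t_stat)
    return 2 * (1 - z.cdf(abs(t_stat)))   # H1: beta_j != 0

p = p_value(2.41, alternative="greater")
print(f"p-value = {p:.4f}, reject at 5%: {p < 0.05}, reject at 1%: {p < 0.01}")
```

Reporting the p-value lets each reader apply their own significance level instead of relying on a single preset threshold.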
Discussing economic and statistical
significance
► If a variable is statistically significant, discuss the magnitude of
the coefficient to get an idea of its economic or practical
importance
► The fact that a coefficient is statistically significant does not
necessarily mean it is economically or practically significant!
► If a variable is statistically and economically important but has
the “wrong” sign, the regression model might be misspecified
► If a variable is statistically insignificant at the usual levels (10%,
5%, or 1%), one may think of dropping it from the regression
► If the sample size is small, effects might be imprecisely estimated
so that the case for dropping insignificant variables is less strong
Confidence Intervals
[Formula not recovered; its annotation marks the critical value of the two-sided test]
Confidence Intervals
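► A confidence interval (or interval estimate) for $\beta_j$ is the interval
given by
$\hat{\beta}_j \pm c \cdot se(\hat{\beta}_j)$
► At the 90% confidence level:
$\hat{\beta}_j - c \cdot se(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + c \cdot se(\hat{\beta}_j)$
with $c$ the critical value of the two-sided test at the 10% significance level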
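As a worked illustration, the interval can be computed from an estimate and its standard error. The sketch below reuses the hsGPA figures from the two-sided test example (0.412 and 0.094) and takes the critical value from the standard normal distribution, which assumes large degrees of freedom. `confidence_interval` is a name chosen here.

```python
from statistics import NormalDist

def confidence_interval(beta_hat, se, level=0.90):
    """Interval beta_hat ± c * se, with c from the normal approximation."""
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)   # two-sided critical value
    return beta_hat - c * se, beta_hat + c * se

lower, upper = confidence_interval(0.412, 0.094, level=0.90)
print(f"90% CI: [{lower:.3f}, {upper:.3f}]")
```

A higher confidence level gives a larger critical value and therefore a wider interval.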
Using Confidence Intervals for Hypothesis
Testing
► Confidence intervals can be used to easily carry out the two-tailed test
$H_0: \beta_j = a_j$
$H_1: \beta_j \neq a_j$
► The rule is as follows:
H0 is rejected at the 5% significance level (in favor of H1) if, and only if, $a_j$ is
not in the 95% confidence interval for $\beta_j$
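The equivalence between the interval rule and the t-test can be checked numerically. The sketch below implements both decisions under the normal approximation and compares them. The function names are chosen for this illustration, the estimate and standard error are the hsGPA figures from the earlier example, and the null value 0.5 is a hypothetical choice.

```python
from statistics import NormalDist

def reject_by_interval(beta_hat, se, a_j, alpha=0.05):
    """Two-sided test of H0: beta_j = a_j via the (1 - alpha) confidence interval."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = beta_hat - c * se, beta_hat + c * se
    return not (lower <= a_j <= upper)    # reject iff a_j lies outside the interval

def reject_by_t(beta_hat, se, a_j, alpha=0.05):
    """Same test, carried out with the t-statistic."""
    c = NormalDist().inv_cdf(1 - alpha / 2)
    return abs((beta_hat - a_j) / se) > c

for a_j in (0.0, 0.5):                    # 0.5 is a hypothetical null value
    by_ci = reject_by_interval(0.412, 0.094, a_j)
    by_t = reject_by_t(0.412, 0.094, a_j)
    print(f"a_j = {a_j}: interval rule rejects = {by_ci}, t-test rejects = {by_t}")
```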
Confidence Intervals – Example
► First interval: the effect of sales on R&D is relatively precisely estimated, as the interval is narrow. Moreover, the effect is significantly different from zero because zero is outside the interval.
► Second interval: this effect is imprecisely estimated, as the interval is very wide. It is not even statistically significant because zero lies in the interval.