
Econ321 2017 Tutorial 1

The document compares estimators of the mean and variance of a random sample. It shows that the sample mean and the average of the first two observations are both unbiased for the mean, and that the sample mean has the smaller variance when the sample size is greater than two, making it more efficient. It also shows that the sample variance with divisor n − 1 is unbiased, and closes with a computer lab reviewing basic OLS in Stata.

Uploaded by

Miriam Black

ECON321: Advanced Econometrics

1st Semester 2017


Tutorial 1

1. Suppose $Y_i$, where $i = 1, 2, \dots, n$, are i.i.d. with $E(Y_i) = \mu$. An estimator for $\mu$ is given by $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Alternatively, one could use $\tilde{Y} = \frac{Y_1 + Y_2}{2}$ to estimate $\mu$.
Show that both $\bar{Y}$ and $\tilde{Y}$ are unbiased estimators of $\mu$. Are they both consistent?
Which one would be better to use, and why?

Answer:
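• Both estimators are unbiased:

$$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\left[E(Y_1) + E(Y_2) + \dots + E(Y_n)\right] = \frac{1}{n}\left[\mu + \mu + \dots + \mu\right] = \frac{1}{n} n\mu = \mu$$

$$E(\tilde{Y}) = \frac{1}{2}\sum_{i=1}^{2} E(Y_i) = \frac{1}{2}\left[\mu + \mu\right] = \frac{1}{2} 2\mu = \mu$$
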
• $\bar{Y}$ is consistent by the Law of Large Numbers (recall the LLN).
$\tilde{Y}$ uses only the first two observations, so it does not change when we increase the sample size. It does not keep improving, so it cannot be a consistent estimator.

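• Variances are different. Writing $\sigma^2 = Var(Y_i)$ and using independence:

$$Var(\bar{Y}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(Y_i) = \frac{1}{n^2}\left[Var(Y_1) + Var(Y_2) + \dots + Var(Y_n)\right] = \frac{1}{n^2}\left[\sigma^2 + \sigma^2 + \dots + \sigma^2\right] = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n}$$

$$Var(\tilde{Y}) = \frac{1}{4}\sum_{i=1}^{2} Var(Y_i) = \frac{1}{4}\left[\sigma^2 + \sigma^2\right] = \frac{1}{4} 2\sigma^2 = \frac{\sigma^2}{2}$$

So the variance of $\bar{Y}$ is smaller than the variance of $\tilde{Y}$ as long as n is larger than 2. In
this case the sample mean is more efficient.

$$\frac{\sigma^2}{n} < \frac{\sigma^2}{2} \iff n > 2$$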
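
The comparison between the sample mean and the two-observation average can also be checked numerically. The following is a minimal simulation sketch, not part of the original tutorial: the normal distribution, the values mu = 5 and sigma = 2, the number of replications and the seed are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics

def simulate(n, reps=4000, mu=5.0, sigma=2.0, seed=321):
    """Return (mean, variance) of each estimator across `reps` simulated samples.

    mu, sigma, reps and seed are illustrative assumptions, not tutorial values.
    """
    rng = random.Random(seed)
    ybar_draws, ytilde_draws = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar_draws.append(sum(sample) / n)                # sample mean, uses all n
        ytilde_draws.append((sample[0] + sample[1]) / 2)  # first two observations only
    return {
        "ybar": (statistics.fmean(ybar_draws), statistics.pvariance(ybar_draws)),
        "ytilde": (statistics.fmean(ytilde_draws), statistics.pvariance(ytilde_draws)),
    }

# Theory: Var(ybar) = sigma^2 / n, while Var(ytilde) = sigma^2 / 2 for every n.
results = {n: simulate(n) for n in (2, 10, 100)}
for n, res in results.items():
    print(n, res["ybar"], res["ytilde"])
```

Under these assumptions both estimators should average close to 5, while only the variance of the sample mean shrinks as n grows. With n = 2 the two estimators coincide.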
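2. Suppose $Y_i \sim NID(\mu, \sigma^2)$ where $i = 1, 2, \dots, n$. An estimator for $\sigma^2$ is given by

$$\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \quad \text{where} \quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i .$$

Show that $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2$. Is $\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ also an unbiased estimator of $\sigma^2$?
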
Answer:
Begin by adding and subtracting the true mean from the expression inside the
summation sign, and use lower case letters to denote mean-deviated variables
($y_i = Y_i - \mu$ and $\bar{y} = \bar{Y} - \mu$):

$$\hat{\sigma}^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} \left((Y_i - \mu) - (\bar{Y} - \mu)\right)^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]$$

We’ll see why this is important after a few steps.

Now write out the expression in the summation sign:

$$\hat{\sigma}^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 + \sum_{i=1}^{n} \bar{y}^2 - 2\sum_{i=1}^{n} y_i \bar{y}\right]$$

Recognise that this can be written as:

$$\hat{\sigma}^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 + n\bar{y}^2 - 2n\bar{y}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right]$$

since $\sum_{i=1}^{n} y_i = n\bar{y}$.

Take the expectations of both sides and redefine things within the brackets:

$$E(\hat{\sigma}^2) = \frac{1}{n-1} E\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n} E(y_i^2) - n E(\bar{y}^2)\right] = \frac{1}{n-1}\left[n\sigma^2 - n\left(\frac{\sigma^2}{n}\right)\right]$$

Here’s where we make use of two results:

$$E(y_i^2) = E\left((Y_i - \mu)^2\right) = Var(Y_i) = \sigma^2$$

and:

$$E(\bar{y}^2) = E\left((\bar{Y} - \mu)^2\right) = Var(\bar{Y}) = Var\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2} Var\left(\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n}$$

We can now write the earlier expression as:

$$E(\hat{\sigma}^2) = \frac{1}{n-1}\left[n\sigma^2 - \sigma^2\right] = \frac{1}{n-1}\left[(n-1)\sigma^2\right] = \sigma^2$$

This proves that the formula $\hat{\sigma}^2$ is an unbiased estimator for $\sigma^2$.

If we start with $\tilde{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ we end up at the same place as above, but the term
outside of the brackets is slightly different:

$$E(\tilde{\sigma}^2) = \frac{1}{n}\left[n\sigma^2 - \frac{1}{n} n\sigma^2\right] = \frac{1}{n}\left[(n-1)\sigma^2\right] = \frac{(n-1)\sigma^2}{n}$$

Thus, $\tilde{\sigma}^2$ is not unbiased. On average, it will be slightly smaller than the true variance
(e.g., if n = 100, the expectation of $\tilde{\sigma}^2$ is 99% of the true variance). However, it is
consistent, because as n goes to infinity the bias disappears (and its variance also shrinks to zero).
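
The bias result for the divisor-n estimator can be illustrated with a short simulation. This is a sketch under stated assumptions rather than anything from the tutorial: the sample size n = 5, sigma = 3 (so the true variance is 9), the number of replications and the seed are arbitrary choices. `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n.

```python
import random
import statistics

def average_estimates(n, reps=10000, mu=0.0, sigma=3.0, seed=2017):
    """Average the two variance estimators over `reps` simulated normal samples.

    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    divisor_n_minus_1, divisor_n = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        divisor_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)
        divisor_n.append(statistics.pvariance(sample))         # sum of squares / n
    return statistics.fmean(divisor_n_minus_1), statistics.fmean(divisor_n)

n = 5
true_variance = 3.0 ** 2
mean_hat, mean_tilde = average_estimates(n)
# Theory: E(sigma-hat^2) = 9 and E(sigma-tilde^2) = (n - 1) / n * 9 = 7.2
print(mean_hat, mean_tilde, (n - 1) / n * true_variance)
```

A small n is used deliberately, since the factor (n − 1)/n is then far enough from one for the bias to be visible above simulation noise.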
COMPUTER LAB

The purpose of this lab is to review basic OLS using the econometric software package
STATA.
It is expected that you are already familiar with Stata.

• Now read in the prepared dataset for this tutorial. Type:

insheet using labour.csv, clear

You should see in the ‘Variables’ area four variables (date, obs, ur and lfpr). The first
is the quarter and year (ranging from Mar-86 to Dec-08). The second is the observation
number (ranging from 1 to 92). The third is the official Unemployment Rate
(% of the labour force actively seeking and available for work, based on the quarterly
Household Labour Force Survey (HLFS)). The fourth is the official Labour Force
Participation Rate (% of the working age population (15+) who are either employed or
unemployed, also from the HLFS).

At any point you can ‘look’ at these data. To examine all of the inputted data,
type:

list

(Note: you’ll need to hit the ‘enter key’ several times to scroll through these data.)

Or you can produce descriptive statistics by typing:

summarize

Be sure to use the US spelling! You should get the following:


. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        date |         0
         obs |        92        46.5    26.70206          1         92
          ur |        92    6.094565      2.0826        3.4       10.9
        lfpr |        92    65.81848    1.575983       63.3       69.3

The date variable is a string variable, and cannot be summarised in a numerical table.
The Unemployment Rate ranges from a low of 3.4% to a high of 10.9%, and has an
average slightly higher than 6.09% over the sample period. The average Labour
Force Participation Rate is nearly 65.82%.

• These are time series data for the New Zealand economy over this 92-quarter period
(23 years). There are some advantages in telling STATA that these are time series
data (you’ll see one in a minute). To do this we need to create an index for the
quarterly data (call it ‘qtr’) based on a baseline of qtr=0 indicating 1960q1 (this origin is set
arbitrarily in STATA), so that the first observation, Mar-86 (1986q1), has qtr=104. Type the following:
generate qtr=obs+103
tsset qtr, quarterly

The variable called ‘qtr’ now has a number and format (%tq – see the Variables area)
that recognises the date. Using the list command you can see that this last column
matches the first column (just a different date format).
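
The offset of 103 can be checked by hand. The sketch below mirrors the quarterly date convention in Python, counting quarters from 1960q1 as index 0; the helper name `tq_to_label` is hypothetical and is not a Stata function.

```python
def tq_to_label(index):
    """Convert a quarterly index (0 = 1960q1) to a label such as '1986q1'."""
    years_since_1960, quarter_offset = divmod(index, 4)
    return f"{1960 + years_since_1960}q{quarter_offset + 1}"

# qtr = obs + 103, so obs = 1 maps to index 104 and obs = 92 maps to index 195
first = tq_to_label(1 + 103)
last = tq_to_label(92 + 103)
print(first, last)
```

The first and last labels correspond to Mar-86 and Dec-08, the range of the date column described earlier.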

• Here’s one advantage of this time series format. We can plot the values of the two
labour market variables over our entire sample period by typing:

graph twoway connected ur qtr


[Figure: connected plot of ur (vertical axis, ticks 4 to 12) against qtr (horizontal axis, 1985q1 to 2010q1).]

graph twoway connected lfpr qtr


[Figure: connected plot of lfpr (vertical axis, ticks 62 to 70) against qtr (horizontal axis, 1985q1 to 2010q1).]

The data points are the solid dots in the diagram, and the term ‘connected’ connects
the dots. (Note: replacing ‘connected’ with ‘line’ would remove the dots and
replacing it with ‘scatter’ would remove the line in this command.) The UR has
recently increased after a long expansion period, while the LFPR continues to rise.

• Time to do some simple regression analysis. Let’s say that we suspect that the
aggregate participation rates depend on the state of the labour market in general and
the unemployment rate in particular. The model we have in mind looks like this:

$$lfpr_t = \beta_1 + \beta_2 ur_t + u_t$$

To estimate this model with OLS, type the following:

regress lfpr ur

You should get this output:

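      Source |       SS       df       MS              Number of obs =      92
-------------+------------------------------           F(  1,    90) =  299.26
       Model |  173.761767     1  173.761767           Prob > F      =  0.0000
    Residual |  52.2568688    90  .580631875           R-squared     =  0.7688
-------------+------------------------------           Adj R-squared =  0.7662
       Total |  226.018636    91  2.48372127           Root MSE      =  .76199

------------------------------------------------------------------------------
        lfpr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ur |  -.6635147   .0383552   -17.30   0.000    -.7397139   -.5873155
       _cons |   69.86231   .2468887   282.97   0.000     69.37182     70.3528
------------------------------------------------------------------------------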

Interpret the estimates for the coefficients, and the P-values.
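
As a cross-check when reading this output, the summary statistics in the header can be recomputed from the reported sums of squares, degrees of freedom and coefficient table. The sketch below only redoes that arithmetic in Python with the numbers printed above; it does not re-estimate the regression, and the variable names are illustrative.

```python
import math

# Values copied from the regression output above
ss_model, ss_resid, ss_total = 173.761767, 52.2568688, 226.018636
df_model, df_resid = 1, 90
coef_ur, se_ur = -0.6635147, 0.0383552

r_squared = ss_model / ss_total                    # share of variation explained
ms_resid = ss_resid / df_resid                     # residual mean square
f_stat = (ss_model / df_model) / ms_resid          # overall F statistic
root_mse = math.sqrt(ms_resid)                     # standard error of the regression
t_ur = coef_ur / se_ur                             # t statistic on ur
adj_r_squared = 1 - (1 - r_squared) * (df_model + df_resid) / df_resid

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_stat, 2),
      round(root_mse, 5), round(t_ur, 2))
```

With a single regressor the F statistic equals the square of the t statistic on ur, which is a useful consistency check on any output of this kind.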

• For two-variable regression models like this, it’s often helpful to show the scatter
diagram. This can be done by typing the following:

graph twoway scatter lfpr ur

There is an obvious negative relationship between these aggregate variables. When
the unemployment rate rises, the labour force participation rate shrinks.

[Figure: scatter plot of lfpr (vertical axis, ticks 62 to 70) against ur (horizontal axis, ticks 4 to 12).]
• But we can also easily draw the regression line estimated above by typing:

graph twoway (scatter lfpr ur) (lfit lfpr ur)

where ‘lfit’ stands for line fit.

You should get this:

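[Figure: scatter plot of lfpr (vertical axis, ticks 62 to 70) against ur (horizontal axis, ticks 4 to 12) with the fitted regression line overlaid. Legend: lfpr, Fitted values.]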

• By the way, if you want to produce heteroscedasticity-robust standard errors on your
coefficient estimates, you need to modify this regression command slightly:

regress lfpr ur, vce(robust)

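Linear regression                                      Number of obs =      92
                                                       F(  1,    90) =  279.35
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7688
                                                       Root MSE      =  .76199

------------------------------------------------------------------------------
             |               Robust
        lfpr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ur |  -.6635147    .039699   -16.71   0.000    -.7423836   -.5846458
       _cons |   69.86231   .2647262   263.90   0.000     69.33639    70.38824
------------------------------------------------------------------------------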

Not much of a change in this case.


• Finally, inspection of the scatter diagram suggests that the functional form for this
regression specification may be wrong. Suppose we suspect that participation
depends on both the unemployment rate and its squared value. We have a second-
degree polynomial (a quadratic) below.

$$lfpr_t = \delta_1 + \delta_2 ur_t + \delta_3 ur_t^2 + v_t$$

Creating this squared term is easy. It can be ‘generated’ by typing:



generate ur2=ur*ur

To see what you’ve done just type:

summarize ur ur2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          ur |        92    6.094565      2.0826        3.4       10.9
         ur2 |        92     41.4338    28.57753      11.56     118.81

And you can see below that these regressors will be highly correlated (not
surprisingly), which raises the issue of multicollinearity. However, it’s easy to
confirm that the collinearity isn’t perfect:

corr ur ur2

(obs=92)

             |       ur      ur2
-------------+------------------
          ur |   1.0000
         ur2 |   0.9877   1.0000
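
To see why a variable and its square are so strongly (but not perfectly) correlated over a range like this one, the sketch below computes the Pearson correlation on a hypothetical, evenly spaced grid spanning the sample minimum and maximum of ur. The grid is an assumption made for illustration; it is not the labour.csv data, so the result will not equal 0.9877.

```python
import math

# Hypothetical grid from 3.4 to 10.9 in steps of 0.1 (not the actual ur series)
grid = [3.4 + 0.1 * k for k in range(76)]
squares = [u * u for u in grid]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(grid, squares)
print(r)
```

Because all values are positive and the range is narrow relative to its level, the squared term is close to a linear function of the original variable, which is what drives the high correlation.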

Now estimate this new regression specification with the following:

regress lfpr ur ur2, vce(robust)

and get this output:


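Linear regression                                      Number of obs =      92
                                                       F(  2,    89) =  328.05
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8355
                                                       Root MSE      =   .6464

------------------------------------------------------------------------------
             |               Robust
        lfpr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ur |  -1.899395   .1776117   -10.69   0.000    -2.252305   -1.546484
         ur2 |   .0911841   .0121134     7.53   0.000      .067115    .1152532
       _cons |   73.61636   .5914972   124.46   0.000     72.44107    74.79165
------------------------------------------------------------------------------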

• Interpreting the estimated impact of the unemployment rate on the labour force
participation rate is a little tricky. It helps to use calculus. Take the derivative of the
dependent variable with respect to the explanatory variable:

$$\frac{\partial lfpr_t}{\partial ur_t} = \delta_2 + 2\delta_3 ur_t$$

This ‘marginal effect’ of a percentage point change in the unemployment rate on
participation depends on both $\delta_2$ and $\delta_3$, and is a linear function of the level of current
unemployment.

Replace the actual coefficients with their estimates in this sample, and ‘evaluate’ this
derivative at the sample mean for the unemployment rate:

$$\left.\frac{\partial lfpr_t}{\partial ur_t}\right|_{ur_t = \overline{ur}} = d_2 + 2 d_3 \overline{ur} \approx -1.899395 + 2(0.0911841)(6.094565) \approx -0.78794$$

This says that the marginal effect is larger in absolute value at the mean
unemployment rate than our earlier estimate. More importantly, it says that this
marginal effect depends on the current unemployment rate.
For example, in a very tight labour market (e.g., unemployment at 3.4%), the effect of
a one percentage-point rise in the unemployment rate is substantial:

$$\left.\frac{\partial lfpr_t}{\partial ur_t}\right|_{ur_t = 3.4} \approx -1.899395 + 2(0.0911841)(3.4) \approx -1.2793$$

In a very loose labour market (e.g., unemployment at 10.9%), the effect of a one
percentage-point rise in the unemployment rate is actually positive:

$$\left.\frac{\partial lfpr_t}{\partial ur_t}\right|_{ur_t = 10.9} \approx -1.899395 + 2(0.0911841)(10.9) \approx 0.0884184$$

We can determine the ‘breakeven point’ for the direction of this relationship. Set
the derivative equal to zero and solve for the unemployment rate:

$$\frac{\partial lfpr_t}{\partial ur_t} \approx -1.899395 + 2(0.0911841)\, ur_t = 0 \quad \Rightarrow \quad ur_t \approx 10.4152$$

We estimate in our sample that the depressing impact of rising unemployment on
participation stops when the unemployment rate hits approximately 10.4%.
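
The marginal-effect calculations above are simple arithmetic on the two slope estimates, so they are easy to reproduce. The sketch below redoes them in Python using the coefficient estimates from the quadratic regression output; the function name `marginal_effect` is illustrative.

```python
# Coefficient estimates on ur and ur2 from the quadratic regression output
d2, d3 = -1.899395, 0.0911841

def marginal_effect(ur):
    """Estimated derivative of lfpr with respect to ur, evaluated at `ur`."""
    return d2 + 2 * d3 * ur

at_mean = marginal_effect(6.094565)   # sample mean of ur
tight = marginal_effect(3.4)          # sample minimum of ur
loose = marginal_effect(10.9)         # sample maximum of ur
breakeven = -d2 / (2 * d3)            # ur at which the derivative is zero

print(round(at_mean, 5), round(tight, 4), round(loose, 7), round(breakeven, 4))
```

Note that the breakeven rate lies inside the sample range but very close to its upper end, so the positive effect at 10.9% rests on few observations.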

• You can also test whether or not these marginal effects are equal to zero at different
current unemployment rates. After you estimate the regression above, these Wald
tests can be called with the following command:

test ur+2*ur2*6.094565=0

The terms ‘ur’ and ‘ur2’ refer to coefficient estimates on these variables. The testing
procedure will do the algebra for you. You should get:
( 1) ur + 12.18913 ur2 = 0

F( 1, 89) = 457.61
Prob > F = 0.0000

We can reject this null hypothesis at better than a 0.1% level.

The test on the marginal effect in the tight labour market is written:

test ur+2*ur2*3.4=0

It’s also significantly different from zero.

( 1) ur + 6.8 ur2 = 0

F( 1, 89) = 175.48
Prob > F = 0.0000

The test on the marginal effect in the loose/depressed labour market is written:

test ur+2*ur2*10.9=0

It’s not significantly different from zero.


( 1) ur + 21.8 ur2 = 0

F( 1, 89) = 0.94
Prob > F = 0.3343
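
For a test of a single linear restriction, the F statistic equals the square of the corresponding t statistic. Under that relationship, the three test results above imply approximate standard errors for the estimated marginal effects. The sketch below backs these out in Python from the reported (rounded) F statistics, so the implied values are only approximate, especially where F is small.

```python
import math

d2, d3 = -1.899395, 0.0911841

# Unemployment rate -> reported F statistic from the three Wald tests above
reported_f = {6.094565: 457.61, 3.4: 175.48, 10.9: 0.94}

implied = {}
for ur, f_stat in reported_f.items():
    effect = d2 + 2 * d3 * ur          # estimated marginal effect at this ur
    t_abs = math.sqrt(f_stat)          # |t| implied by F = t^2 (one restriction)
    implied[ur] = (effect, t_abs, abs(effect) / t_abs)  # last item: implied std. error

for ur, (effect, t_abs, se) in implied.items():
    print(ur, round(effect, 4), round(t_abs, 2), round(se, 4))
```

The implied |t| is well above conventional critical values at the mean and in the tight labour market, and below one at 10.9%, matching the conclusions drawn from the reported p-values.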
