Lect 2

Uploaded by

vss.yt15

ECON4150 - Introductory Econometrics

Lecture 3: Review of Statistics & OLS

Stock and Watson Chapter 3-4


2

Lecture outline

• Comparing means from different populations

• Ideal randomized experiment

• Using the t-statistic when n is small

• Relationship between two random variables

• California test score data

• scatter plot

• sample covariance

• sample correlation

• Linear regression with 1 regressor

• derivation of the OLS estimators

• measures of fit ($R^2$ and SER)


3

Comparing means from different populations

• In the previous lecture we tested the hypothesis that the mean wage of individuals with a master's degree equals 60000

• Suppose we would like to test whether the mean wages of men and women with a master's degree differ by an amount $d_0$

$$H_0: \mu_{w_M} - \mu_{w_F} = d_0 \qquad H_1: \mu_{w_M} - \mu_{w_F} \neq d_0$$

• To test the null hypothesis against the two-sided alternative we follow the 4 steps with some adjustments
 
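Step 1: Estimate $(\mu_{w_M} - \mu_{w_F})$ by $\bar{W}_M - \bar{W}_F$

• Because a weighted average of 2 independent normal random variables is itself normally distributed, we have (with $\mathrm{Cov}(\bar{W}_M, \bar{W}_F) = 0$)

$$\bar{W}_M - \bar{W}_F \sim N\left(\mu_{w_M} - \mu_{w_F},\ \frac{\sigma^2_{w_M}}{n_M} + \frac{\sigma^2_{w_F}}{n_F}\right)$$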
4

Comparing means from different populations

 
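Step 2: Estimate $\sigma_{w_M}$ and $\sigma_{w_F}$ to obtain $SE(\bar{W}_M - \bar{W}_F)$

$$SE(\bar{W}_M - \bar{W}_F) = \sqrt{\frac{s^2_{w_M}}{n_M} + \frac{s^2_{w_F}}{n_F}}$$

Step 3: Compute the t-statistic

$$t^{act} = \frac{(\bar{W}_M - \bar{W}_F) - d_0}{SE(\bar{W}_M - \bar{W}_F)}$$

Step 4: Reject $H_0$ at a 5% significance level if

• $|t^{act}| > 1.96$
• or if the p-value $< 0.05$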
5

Comparing means from different populations

Suppose we have random samples of 500 men and 500 women with a master's degree and we would like to test whether the mean wages are equal:

$$H_0: \mu_{w_M} - \mu_{w_F} = 0 \qquad H_1: \mu_{w_M} - \mu_{w_F} \neq 0$$

Step 1: $\bar{W}_M - \bar{W}_F = 64159.45 - 53163.41 = 10996.04$

Step 2: $SE(\bar{W}_M - \bar{W}_F) = 1240.709$

Step 3: $t^{act} = \frac{(\bar{W}_M - \bar{W}_F) - 0}{SE(\bar{W}_M - \bar{W}_F)} = \frac{10996.04}{1240.709} = 8.86$

Step 4: Since we use a 5% significance level, we reject $H_0$ because $|t^{act}| = 8.86 > 1.96$
6

Comparing means from different populations


Difference in mean wages between men and women with a master's degree

This is how to do the test in Stata:

. ttest wage, by(female)

Two-sample t test with equal variances

   Group |    Obs       Mean   Std. Err.   Std. Dev.   [95% Conf. Interval]
       0 |    500   64159.45    847.7946    18957.26    62493.76   65825.13
       1 |    500   53163.41    905.8709    20255.89    51383.62    54943.2
combined |  1,000   58661.43    643.9819     20364.5    57397.72   59925.14
    diff |          10996.04    1240.709                 8561.34   13430.73

    diff = mean(0) - mean(1)                               t = 8.8627
Ho: diff = 0                              degrees of freedom = 998

    Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
 Pr(T < t) = 1.0000    Pr(|T| > |t|) = 0.0000    Pr(T > t) = 0.0000
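The t-statistic in the Stata output can be checked by hand from the group summary statistics. The Python sketch below is only an illustration and not part of the lecture's Stata workflow; the function name `two_sample_t` is hypothetical. It uses the standard error formula from Step 2, which gives the same number as the equal-variance formula here because both groups have 500 observations.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, d0=0.0):
    """Return (difference, standard error, t-statistic) for H0: mu_a - mu_b = d0."""
    diff = mean_a - mean_b
    # Step 2: standard error of the difference, from the two sample variances
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    # Step 3: t-statistic
    t_act = (diff - d0) / se
    return diff, se, t_act

# Summary statistics copied from the Stata output (group 0 = men, group 1 = women)
diff, se, t_act = two_sample_t(64159.45, 18957.26, 500, 53163.41, 20255.89, 500)

# Step 4: reject H0 at the 5% level if |t| > 1.96
reject = abs(t_act) > 1.96
print(f"diff = {diff:.2f}, SE = {se:.3f}, t = {t_act:.2f}, reject H0: {reject}")
```

Working from summary statistics keeps the example independent of the wage data set, which is not reproduced in these slides.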
7

Confidence interval for the difference in population means

• The method for constructing a confidence interval for 1 population mean can be easily extended to the difference between 2 population means

• A hypothesized value of the difference in means $d_0$ will be rejected if $|t| > 1.96$

• and will be in the confidence set if $|t| \le 1.96$

• Thus the 95% confidence interval for $(\mu_{w_M} - \mu_{w_F})$ consists of the values of $d_0$ within $\pm 1.96$ standard errors of $\bar{W}_M - \bar{W}_F$

95% confidence interval for $\mu_{w_M} - \mu_{w_F}$:

$$(\bar{W}_M - \bar{W}_F) \pm 1.96 \cdot SE(\bar{W}_M - \bar{W}_F)$$

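$$10996.04 \pm 1.96 \cdot 1240.709$$

$$\{8564.25,\ 13427.83\}$$

Stata reports the slightly wider interval {8561.34, 13430.73} because it uses the critical value of the t distribution with 998 degrees of freedom (about 1.9623) instead of 1.96.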
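The interval arithmetic, and the link between the confidence set and the t-test, can be sketched in a few lines of Python. This is an illustration only; `confidence_interval` is a hypothetical helper and the value `d0 = 9000` is an arbitrary example. With the normal critical value 1.96 the bounds come out a few units inside the ones Stata prints, which appear to be based on the t distribution with 998 degrees of freedom.

```python
def confidence_interval(estimate, se, critical_value=1.96):
    """Return (lower, upper) = estimate -/+ critical_value * se."""
    margin = critical_value * se
    return estimate - margin, estimate + margin

# Difference in mean wages and its standard error, taken from the slides
lower, upper = confidence_interval(10996.04, 1240.709)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")

# A hypothesized difference d0 is in the confidence set exactly when |t| <= 1.96
d0 = 9000.0  # arbitrary example value
t_stat = (10996.04 - d0) / 1240.709
print(abs(t_stat) <= 1.96, lower <= d0 <= upper)
```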
8

Comparing means from different populations


Example: An ideal randomized experiment

In this course we will focus on estimating causal effects:

the expected effect on Y of a change in X

A causal effect can be measured by an ideal randomized experiment:

• Subjects are selected by simple random sampling from the population of interest

• Subjects are randomly assigned to a treatment or control group

• Treatment group receives treatment of interest (X = 1), control group receives no treatment (X = 0).

• The mean causal effect is the difference between the mean outcome when treated and the mean outcome when untreated

$$\text{Mean causal effect} = \mu_{X=1} - \mu_{X=0}$$


9

Comparing means from different populations


Example: An ideal randomized experiment

If we want to know whether the treatment is effective we can test:

$$H_0: \mu_{X=1} - \mu_{X=0} = 0 \qquad H_1: \mu_{X=1} - \mu_{X=0} \neq 0$$

Step 1: Estimate $(\mu_{X=1} - \mu_{X=0})$ by computing the difference in mean outcomes of individuals in the treatment and control group: $\bar{Y}_{Treated} - \bar{Y}_{Control}$

Step 2: Compute $SE(\bar{Y}_{Treated} - \bar{Y}_{Control})$

Step 3: Compute $t^{act} = \frac{(\bar{Y}_{Treated} - \bar{Y}_{Control}) - 0}{SE(\bar{Y}_{Treated} - \bar{Y}_{Control})}$

Step 4: Reject the null hypothesis of no treatment effect at a 5% significance level if $|t^{act}| > 1.96$
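The four steps can be tried out on a simulated experiment. The Python sketch below is an illustration under made-up assumptions that are not from the lecture: 400 subjects, normally distributed outcomes with standard deviation 1, and a true treatment effect of 2. The helper name `difference_in_means_test` is hypothetical.

```python
import math
import random
import statistics

def difference_in_means_test(treated, control):
    """Return (estimated effect, standard error, t-statistic) for H0: no effect."""
    effect = statistics.mean(treated) - statistics.mean(control)       # Step 1
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))      # Step 2
    return effect, se, effect / se                                     # Step 3

random.seed(0)
n_subjects = 400
true_effect = 2.0   # assumed for the simulation only
# Random assignment: half of the subjects are drawn into the treatment group
treated_ids = set(random.sample(range(n_subjects), n_subjects // 2))

treated, control = [], []
for i in range(n_subjects):
    baseline = random.gauss(0.0, 1.0)   # outcome without treatment
    if i in treated_ids:
        treated.append(baseline + true_effect)
    else:
        control.append(baseline)

effect, se, t_act = difference_in_means_test(treated, control)
reject = abs(t_act) > 1.96                                             # Step 4
print(f"effect = {effect:.3f}, SE = {se:.3f}, t = {t_act:.2f}, reject H0: {reject}")
```

Because assignment is random, the two groups differ on average only through the treatment, so the difference in means estimates the mean causal effect.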
10

Using the t-statistic when n is small

• The test on the previous slide is based on the sample size n being large

• Especially in actual randomized experiments n can be small

• If the hypothesis test concerns 1 population mean, the t-statistic

$$t^{act} = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$$

  • is not normally distributed for small n!
  • has the Student t distribution in the special case that the population distribution of Y is normal.

• If the hypothesis test concerns the difference in 2 population means, the t-statistic

$$t^{act} = \frac{(\bar{Y}_M - \bar{Y}_F) - d_0}{SE(\bar{Y}_M - \bar{Y}_F)}$$

  • is not normally distributed for small n!
  • does not have a Student t distribution even if the population distributions are normal!
11

Relationship between two random variables

• In general, questions in econometrics involve a relationship between 2 (or more) random variables:

• What is the relation between education and earnings?

• What is the relation between interest rates and economic growth?

• What is the relation between the beer tax and traffic fatalities?

• What is the relation between class size and student test scores?

• In this and coming lectures we will focus on the last of these questions.
12

California test score data

• We will use a data set that contains data on test performance, school
characteristics and student demographic backgrounds.
• The data are from 420 districts in California.
• Data were obtained from the California Department of Education
• Main variables of interest:
• TestScore is the district average of the reading and math scores of
5th grade students
• ClassSize is defined as the number of students divided by the
number of full-time equivalent teachers in the district.
13

The relation between class size and test scores


• To examine the relation between class size and test scores we can make a scatter plot

• A scatter plot is a plot of n observations on $X_i$ and $Y_i$ in which each observation is represented by the point $(X_i, Y_i)$

[Figure: scatter plot of test score (vertical axis, 600 to 700) against class size (horizontal axis, 14 to 26)]
14

Sample covariance

• The covariance is a measure of the extent to which two random variables X and Y move together,

$$\mathrm{Cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$$

• The population covariance is unobserved but can be estimated by the sample covariance $s_{XY}$

$$s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

• If $(X_i, Y_i)$ are i.i.d. and have finite fourth moments ($E[X^4] < \infty$ and $E[Y^4] < \infty$), then

$$s_{XY} \xrightarrow{p} \sigma_{XY}$$

• The sample covariance between class size and test scores is $s_{CT} = -8.16$
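The sample covariance formula translates directly into code. The Python sketch below is an illustration only; `sample_covariance` is a hypothetical helper and the three data points are made up, not taken from the California data.

```python
def sample_covariance(x, y):
    """Sample covariance s_XY with the n - 1 divisor."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Made-up numbers, not the California data
class_size = [18.0, 20.0, 22.0]
test_score = [670.0, 655.0, 649.0]
print(sample_covariance(class_size, test_score))   # negative: larger classes, lower scores
print(sample_covariance(class_size, class_size))   # covariance of X with itself = sample variance
```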
15

Sample correlation

• What does it mean for the sample covariance between test scores and class size to equal -8.16?

• The units of the covariance are the units of test scores multiplied by the units of class size

• The sample correlation $r_{XY}$ is a unit-free measure of the strength of the linear association between X and Y and lies between -1 and 1

$$r_{XY} = \frac{s_{XY}}{s_X s_Y}$$

• The sample correlation between class size and test scores is $r_{CT} = -0.23$
16

Sample covariance and correlation in Stata

To compute the sample covariance in Stata:

. corr test_score class_size, covariance
(obs=420)

             |  test_s~e  class_~e
  test_score |    363.03
  class_size |  -8.15932   3.57895

To compute the sample correlation in Stata:

. corr test_score class_size
(obs=420)

             |  test_s~e  class_~e
  test_score |    1.0000
  class_size |   -0.2264    1.0000
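The correlation follows from the entries of the covariance matrix: the diagonal holds the sample variances and the off-diagonal entry is the sample covariance. A short Python check, given purely as an illustration with the numbers copied from the Stata output:

```python
import math

# Entries of the covariance matrix printed by Stata
var_test_score = 363.03      # sample variance of test_score
var_class_size = 3.57895     # sample variance of class_size
cov_ct = -8.15932            # sample covariance of class_size and test_score

# r_XY = s_XY / (s_X * s_Y), where s_X and s_Y are sample standard deviations
r_ct = cov_ct / (math.sqrt(var_class_size) * math.sqrt(var_test_score))
print(round(r_ct, 4))
```

The result agrees with the correlation Stata reports, up to rounding of the inputs.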
Linear regression with one regressor
18

Linear regression with one regressor

Suppose we would like to answer the following question:

What is the effect on district test scores if we increase district average class size by 1 student?

We would like to know

$$\beta_{ClassSize} = \frac{\Delta \text{Test score}}{\Delta \text{Class size}}$$

$\beta_{ClassSize}$ is the slope of a straight line relating test scores and class size

$$\text{Test score} = \beta_0 + \beta_{ClassSize} \times \text{Class size}$$

where $\beta_0$ is the intercept of the straight line.
19

Linear regression with one regressor

• The average test score in district i does not only depend on the average class size

• It also depends on factors such as

  • Quality of the teachers
  • Student background
  • Quality of textbooks
  • ...

• The equation describing the linear relation between test score and class size is better written as

$$\text{Test score}_i = \beta_0 + \beta_{ClassSize} \times \text{Class size}_i + u_i$$

where $u_i$ lumps together all other district characteristics that affect average test scores.
20

Terminology for the Linear Regression Model with One Regressor

The linear regression model with one regressor is denoted by

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

where

• $Y_i$ is the dependent variable

• $X_i$ is the independent variable or regressor

• $\beta_0 + \beta_1 X_i$ is the population regression line

• $\beta_0$ is the intercept of the population regression line (expected value of Y when X = 0)

• $\beta_1$ is the slope of the population regression line

• $u_i$ is the error term (all other factors determining $Y_i$)


21

Linear regression with one regressor

[Figure: plot against X in which the error terms $u_1$ and $u_6$ are marked]
22

Linear regression with one regressor


• In general we don't know $\beta_0$ and $\beta_1$ and we have to estimate them using a random sample of data.

• How do we find the line that fits the data best?

[Figure: scatter plot of test score (vertical axis, 600 to 700) against class size (horizontal axis, 14 to 26)]
23

The Ordinary Least Squares Estimator (OLS)

The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X

• Let $b_0$ and $b_1$ be estimators of $\beta_0$ and $\beta_1$

• The predicted value of $Y_i$ given $X_i$ using these estimators is $b_0 + b_1 X_i$

• The prediction mistake is

$$Y_i - (b_0 + b_1 X_i) = Y_i - b_0 - b_1 X_i$$

• The estimators of the slope and intercept that minimize

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

are called the ordinary least squares (OLS) estimators of $\beta_0$ and $\beta_1$


24

$\bar{Y}$ is the ordinary least squares estimator of $\mu_Y$

• Suppose there is no X, only Y

$$Y_i = \mu_Y + u_i$$

• Let m be an estimator of $\mu_Y$

• The least squares estimator minimizes

$$\sum_{i=1}^{n} (Y_i - m)^2$$

• Taking the derivative w.r.t. m and setting it to zero gives

$$\frac{\partial}{\partial m}\sum_{i=1}^{n} (Y_i - m)^2 = -2\sum_{i=1}^{n} (Y_i - m) = 0$$

$$-2\sum_{i=1}^{n} Y_i + 2 \cdot n \cdot m = 0$$

$$\frac{1}{n}\sum_{i=1}^{n} Y_i - m = 0$$

• Solving for m gives

$$m = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}$$
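The result can be checked numerically: over a grid of candidate values for m, the sum of squared deviations is smallest at the sample mean. The Python sketch below is an illustration only, with four made-up observations and a hypothetical helper `sum_of_squares`.

```python
def sum_of_squares(y, m):
    """Sum of squared deviations of the observations from a candidate value m."""
    return sum((yi - m) ** 2 for yi in y)

y = [2.0, 4.0, 5.0, 8.0]        # made-up observations
y_bar = sum(y) / len(y)          # sample mean

# Candidate values on a grid of width 0.1 around the sample mean
candidates = [y_bar + step / 10 for step in range(-20, 21)]
best = min(candidates, key=lambda m: sum_of_squares(y, m))
print(y_bar, best, sum_of_squares(y, y_bar))
```

A grid search is used only to make the minimization visible; the derivation above shows that the minimizer is exactly the sample mean.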
25

The Simple Linear Regression Model

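$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

• OLS minimizes the sum of squared prediction mistakes:

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$

• Step 1:

$$\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 = 0$$

• Step 2:

$$\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2 = 0$$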
26

Step 1: OLS estimator of β0

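$$\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{n} \hat{u}_i^2 = -2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$

Dividing by $-2n$:

$$\frac{1}{n}\left(\sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} \hat{\beta}_0 - \sum_{i=1}^{n} \hat{\beta}_1 X_i\right) = 0$$

$$\frac{1}{n}\sum_{i=1}^{n} Y_i - \frac{1}{n} n \hat{\beta}_0 - \hat{\beta}_1 \frac{1}{n}\sum_{i=1}^{n} X_i = 0$$

$$\bar{Y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{X} = 0$$

• This gives

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$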
27

Step 2: OLS estimator of β1

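$$\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2 = -2 \sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$

Divide by $-2$ and substitute $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$:

$$\sum_{i=1}^{n} X_i \left(Y_i - (\bar{Y} - \hat{\beta}_1 \bar{X}) - \hat{\beta}_1 X_i\right) = 0$$

Rewrite:

$$\sum_{i=1}^{n} X_i \left((Y_i - \bar{Y}) - (\hat{\beta}_1 X_i - \hat{\beta}_1 \bar{X})\right) = 0$$

Rewrite:

$$\sum_{i=1}^{n} X_i (Y_i - \bar{Y}) - \hat{\beta}_1 \sum_{i=1}^{n} X_i (X_i - \bar{X}) = 0$$

Algebra trick:

$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) - \hat{\beta}_1 \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) = 0$$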
28

Step 2: OLS estimator of β1

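Algebra trick:

$$\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \bar{Y} - \sum_{i=1}^{n} \bar{X} Y_i + \sum_{i=1}^{n} \bar{X}\bar{Y} \\
&= \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \bar{Y} - n\bar{X}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) + n\bar{X}\bar{Y} \\
&= \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
&= \sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \bar{Y} \\
&= \sum_{i=1}^{n} X_i (Y_i - \bar{Y})
\end{aligned}$$

By a similar reasoning:

$$\sum_{i=1}^{n} X_i (X_i - \bar{X}) = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})$$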
29

Step 2: OLS estimator of β1

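The first-order condition $\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2 = 0$ is equivalent to

$$\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) - \hat{\beta}_1 \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) = 0$$

Solving for $\hat{\beta}_1$ gives the OLS estimator:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})} = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})} = \frac{s_{XY}}{s_X^2}$$

The OLS predicted values $\hat{Y}_i$ and residuals $\hat{u}_i$ are:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

$$\hat{u}_i = Y_i - \hat{Y}_i$$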
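The two estimator formulas can be turned into a few lines of code. The Python sketch below is an illustration with made-up data; `ols_fit` is a hypothetical helper, not a library function. The last line checks the two first-order conditions: the residuals sum to zero and are uncorrelated with X in the sample.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) for the regression of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sum_xy / sum_xx            # the 1/(n-1) factors cancel
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

x = [1.0, 2.0, 3.0, 4.0]   # made-up data
y = [2.0, 4.0, 5.0, 8.0]
b0, b1 = ols_fit(x, y)

fitted = [b0 + b1 * xi for xi in x]                  # predicted values
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # OLS residuals
print(b0, b1)
print(sum(residuals), sum(xi * ui for xi, ui in zip(x, residuals)))
```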
30

The Simple Linear Regression Model


Example: Class size and test scores

$$\widehat{\text{TestScore}} = 698.9 - 2.28 \times \text{ClassSize}$$

[Figure: test score (vertical axis, 600 to 700) plotted against class size (horizontal axis, 15 to 25)]
31

The Simple Linear Regression Model


Example: Class size and test scores

$$\text{TestScore}_i = \beta_0 + \beta_1 \text{ClassSize}_i + u_i$$

. regress test_score class_size, robust

Linear regression                     Number of obs = 420
                                      F(1, 418)     = 19.26
                                      Prob > F      = 0.0000
                                      R-squared     = 0.0512
                                      Root MSE      = 18.581

             |              Robust
  test_score |      Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
  class_size |  -2.279808   .5194892   -4.39   0.000   -3.300945  -1.258671
       _cons |    698.933   10.36436   67.44   0.000    678.5602   719.3057

• $\hat{\beta}_1 = -2.28$: a reduction in class size by 1 student is associated with an increase in test scores of 2.28 points

• $\hat{\beta}_0 = 698.93$: the expected test score when class size is zero equals 698.93 (but what does it mean for class size to be zero?)
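The estimated line can be used to compute predicted test scores. The Python sketch below is an illustration using the coefficients from the Stata output; the class sizes 20 and 22 are arbitrary example values and `predicted_test_score` is a hypothetical helper. It also recovers the slope from the covariance matrix shown earlier, using $\hat{\beta}_1 = s_{XY} / s_X^2$.

```python
# Estimates copied from the Stata output
BETA0_HAT = 698.933       # _cons
BETA1_HAT = -2.279808     # coefficient on class_size

def predicted_test_score(class_size):
    """Predicted district test score on the estimated regression line."""
    return BETA0_HAT + BETA1_HAT * class_size

# Predicted change when class size falls from 22 to 20 students
change = predicted_test_score(20) - predicted_test_score(22)
print(round(predicted_test_score(20), 2), round(change, 2))

# The slope can also be recovered from the covariance matrix: s_XY / s_X^2
slope_from_cov = -8.15932 / 3.57895
print(round(slope_from_cov, 4))
```

These are predictions along the fitted line, not estimates of a causal effect.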
32

$\bar{Y}$ is the ordinary least squares estimator of $\mu_Y$

Example: test scores

The sample mean of district average test scores: $\overline{\text{TestScore}} = 654.16$

. mean test_score

Mean estimation                       Number of obs = 420

             |       Mean   Std. Err.   [95% Conf. Interval]
  test_score |   654.1565   .9297082    652.3291    655.984

As shown on slide 24 we can also obtain the sample mean by OLS

. regress test_score

      Source |          SS    df          MS     Number of obs = 420
                                                 F(0, 419)     = 0.00
       Model |           0     0           .     Prob > F      = .
    Residual |  152109.594   419  363.030056     R-squared     = 0.0000
                                                 Adj R-squared = 0.0000
       Total |  152109.594   419  363.030056     Root MSE      = 19.053

  test_score |      Coef.  Std. Err.        t   P>|t|   [95% Conf. Interval]
       _cons |   654.1565   .9297082   703.61   0.000    652.3291    655.984
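The two outputs are linked by simple arithmetic: in a regression on a constant only, the residual mean square is the sample variance of the test scores, its square root is the Root MSE, and dividing by $\sqrt{n}$ gives the standard error of the sample mean. A short Python check, offered as an illustration with numbers copied from the Stata output:

```python
import math

n = 420
residual_ms = 363.030056   # Residual MS from the regression on a constant
                           # (equal to the sample variance of test_score)

root_mse = math.sqrt(residual_ms)    # sample standard deviation of test_score
se_mean = root_mse / math.sqrt(n)    # standard error of the sample mean
print(round(root_mse, 3), round(se_mean, 4))
```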
33

Measures of fit

How well does the estimated regression line describe the data?

• Does the regressor X account for much or for little variation in Y ?

• Are the observations in the scatter plot clustered closely around the
regression line?

Two measures of how well the OLS line fits the data:

The $R^2$ measures the fraction of the variation in $Y_i$ explained/predicted by $X_i$

The standard error of the regression (SER) measures how far $Y_i$ typically is from its predicted value
34

The $R^2$

$R^2$ is the fraction of the sample variance of $Y_i$ explained/predicted by $X_i$

We can write

$$Y_i = \hat{Y}_i + \hat{u}_i$$

which implies that the $R^2$ is the ratio of the sample variance of $\hat{Y}_i$ to the sample variance of $Y_i$

$$R^2 = \frac{\text{Explained sum of squares}}{\text{Total sum of squares}} = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

The $R^2$ ranges from 0 to 1

• If $R^2 = 0$, $X_i$ explains none of the variation in $Y_i$

• If $R^2 = 1$, $X_i$ explains all of the variation in $Y_i$ ($Y_i = \hat{Y}_i$)

• In practice $0 < R^2 < 1$
35

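The $R^2$

The total sum of squares TSS can be decomposed into the explained sum of squares ESS and the residual sum of squares SSR:

$$TSS = ESS + SSR$$

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} \hat{u}_i^2$$

This implies that the $R^2$ can also be written as

$$R^2 = \frac{ESS}{TSS} = \frac{TSS - SSR}{TSS} = 1 - \frac{SSR}{TSS} = 1 - \frac{\sum_{i=1}^{n} \hat{u}_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$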
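The decomposition and the two expressions for $R^2$ can be verified numerically. The Python sketch below is an illustration with four made-up data points; it fits the OLS line with the formulas derived earlier and then computes the three sums of squares.

```python
x = [1.0, 2.0, 3.0, 4.0]   # made-up data
y = [2.0, 4.0, 5.0, 8.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# OLS estimates (slope = s_XY / s_X^2, intercept = Y_bar - slope * X_bar)
beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
beta0 = y_bar - beta1 * x_bar
fitted = [beta0 + beta1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained sum of squares
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares

r2_from_ess = ess / tss
r2_from_ssr = 1 - ssr / tss
print(tss, ess + ssr, r2_from_ess, r2_from_ssr)
```

The identity TSS = ESS + SSR relies on the regression including an intercept and being estimated by OLS.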
36

The $R^2$

Example: Class size and test scores

. regress test_score class_size, robust

Linear regression                     Number of obs = 420
                                      F(1, 418)     = 19.26
                                      Prob > F      = 0.0000
                                      R-squared     = 0.0512
                                      Root MSE      = 18.581

             |              Robust
  test_score |      Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
  class_size |  -2.279808   .5194892   -4.39   0.000   -3.300945  -1.258671
       _cons |    698.933   10.36436   67.44   0.000    678.5602   719.3057

$R^2 = 0.0512$

Note: the $R^2$ is uninformative about whether an increase in class size causes a reduction in test scores!
37

The standard error of the regression

• Another measure of fit is the SER.

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error $u_i$

$$SER = s_{\hat{u}} = \sqrt{s^2_{\hat{u}}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2}$$

It measures the spread of the observations around the regression line in the units of the dependent variable

• The divisor n-2 is used because 2 degrees of freedom were lost in estimating the two regression coefficients $\beta_0$ and $\beta_1$.
38

The standard error of the regression


Example: Class size and test scores

. regress test_score class_size, robust

Linear regression                     Number of obs = 420
                                      F(1, 418)     = 19.26
                                      Prob > F      = 0.0000
                                      R-squared     = 0.0512
                                      Root MSE      = 18.581

             |              Robust
  test_score |      Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
  class_size |  -2.279808   .5194892   -4.39   0.000   -3.300945  -1.258671
       _cons |    698.933   10.36436   67.44   0.000    678.5602   719.3057

In Stata the SER is denoted as Root MSE.

SER = 18.6
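The Root MSE can be reproduced approximately from numbers that appear elsewhere in these slides: the total sum of squares from the output of `regress test_score` and the $R^2$ of the class size regression. The Python sketch below is an illustration only; because the $R^2$ is rounded to four decimals, the result matches Stata's Root MSE only up to rounding.

```python
import math

n = 420
tss = 152109.594        # Total SS from the output of `regress test_score`
r_squared = 0.0512      # R-squared from `regress test_score class_size, robust`

ssr = (1 - r_squared) * tss       # since R^2 = 1 - SSR/TSS
ser = math.sqrt(ssr / (n - 2))    # divide by n - 2, not n
print(round(ser, 2))
```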
