
ECON2280 Introductory Econometrics

First Term, 2024-2025

The Simple Regression Model

September 2024

1 / 68
Definition of the Simple Regression Model

2 / 68
Definition of the Simple Regression Model

▶ The simple linear regression (SLR) model is also called the two-variable
linear regression model or the bivariate linear regression model.
▶ The SLR model is usually written as

y = β0 + β1 x + u

– β0 : the intercept parameter or the constant term
– β1 : the slope parameter, which is most often of main interest in
econometrics
– y : dependent variable; explained variable; response variable;
predicted variable; regressand
– x : independent variable; explanatory variable; control variable;
predictor variable; regressor; covariate
– u : error term; disturbance; unobservable

3 / 68
Interpretation of the SLR Model
▶ The SLR model tries to “explain variable y in terms of variable x” or
“study how y varies with changes in x”:
dy/dx = ∂y/∂x + (∂y/∂u)(∂u/∂x) = β1 + ∂u/∂x,

where ∂y/∂x = β1 is the ceteris paribus effect.

▶ Definition of causal effect of x on y: Other factors being equal


(ceteris paribus), how does y change when x changes?
Most economic questions are ceteris paribus questions.

▶ From the definition, the causal effect of x on y is ∂y/∂x = β1 , which is
equal to dy/dx only if ∂u/∂x = 0. That is, dy/dx captures the causal
effect of x on y only if ∂u/∂x = 0.
▶ The simple linear regression model is rarely applicable in practice,
but its discussion is useful for pedagogical reasons.

4 / 68
Two SLR Examples

▶ Example (Soybean Yield and Fertilizer):

yield = β0 + β1 fertilizer + u,

where β1 measures the effect of fertilizer on yield, holding all other


factors fixed, and u contains factors such as rainfall, land quality,
presence of parasites, ...

▶ Example (A Simple Wage Equation):

wage = β0 + β1 educ + u,

where β1 measures the change in hourly wage given another year of


education, holding all other factors fixed, and u contains factors
such as labor force experience, innate ability, tenure with current
employer, work ethic, ...

5 / 68
When Can We Estimate a Causal Effect?

▶ When ∂u/∂x = 0, dy/dx has a causal interpretation for each individual.
However, very often in practice, we are not able to estimate the
individual specific causal effect.
Technically, because we usually can observe only one pair of (x, y)
for each individual, we cannot identify the individual causal effect,
which requires y values for at least two x values.
▶ We therefore explore the change in E[y|x] in response to a change
in x, and discuss when this has a causal interpretation.

6 / 68
When Can We Estimate a Causal Effect?

▶ Zero conditional mean assumption:

E [u|x ] = 0

– The expected value of u is 0 for every slice of the population


endowed with the value x . To put it differently, x contains no
information about the mean of u.
– E [u|x ] = 0 =⇒ Cov (x , u) = 0. (But not vice versa.)
In practice, we just argue why x and u are correlated to
invalidate a causal interpretation of our estimator.
– E [u|x ] = 0 =⇒ E [u] = 0 (by the Law of Iterated
Expectation)

▶ We also refer to E[u|x] = 0 as the identification assumption (i.e.,
the assumption needed to identify the causal effect of x on E[y|x]).

7 / 68
When Can We Estimate a Causal Effect?

▶ The conditional mean independence assumption implies that

E[y|x] = E[β0 + β1 x + u|x]
       = β0 + β1 x + E[u|x]    (1)
       = β0 + β1 x

▶ The average value of y can be expressed as a linear function of x ,


although in general E [y |x ] can be any function of x .
▶ Equation (1) is the population regression function (PRF).
▶ dE[y|x]/dx = ∂E[y|x]/∂x = β1 . Hence β1 captures the change in the
expected value of y given a one-unit increase in x.
▶ The PRF is unknown. It is a theoretical relationship assuming a
linear model and conditional mean independence. We need to
estimate the PRF.
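
To make the PRF concrete, here is a small simulated sketch (hypothetical values β0 = 1, β1 = 0.5, chosen only for illustration): when E[u|x] = 0, the average of y within narrow bins of x lines up with β0 + β1 x.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100_000)
u = rng.normal(0, 2, size=x.size)   # E[u|x] = 0 holds by construction
y = 1.0 + 0.5 * x + u               # hypothetical PRF: E[y|x] = 1 + 0.5x

# Average y within narrow bins of x; these estimates of E[y|x] should
# sit close to the PRF evaluated at the bin centers.
edges = np.linspace(0, 10, 11)
centers = 0.5 * (edges[:-1] + edges[1:])
cond_means = [y[(x >= lo) & (x < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])]
print(np.round(cond_means, 2))           # ~ [1.25, 1.75, ..., 5.75]
print(np.round(1.0 + 0.5 * centers, 2))  # the PRF at the bin centers
```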

8 / 68
E(y|x) as a linear function of x

Figure 2.1: E(y|x) as a linear function of x. The conditional distribution of y
is centered about E(y|x) = β0 + β1 x at each of x1, x2, x3.

For example, suppose that x is the high school grade point average and y is the
college GPA, and we happen to know that E(colGPA|hsGPA) = 1.5 + 0.5 hsGPA. Of
course, in practice, we never know the population intercept and slope, but it is
useful to pretend that we do.

9 / 68
Exercise

Question: If we want to know the return to college education, i.e.,
how much more one can earn by going to college, what is the
regression equation to be estimated? What is the condition that
allows you to estimate a causal effect? Please discuss whether this
condition is likely to hold.

10 / 68
Solution
▶ We want to estimate the following wage equation

wage = β0 + β1 educ + u,

where educ = 1 if the individual is a college graduate and 0
otherwise, and u represents the unobservables, including
innate ability.
▶ The important condition for a causal effect is:
E[u|educ = 1] = E[u|educ = 0] = 0.

dE[wage|educ]/deduc = E[wage|educ = 1] − E[wage|educ = 0]
  = E[β0 + β1 educ + u|educ = 1] − E[β0 + β1 educ + u|educ = 0]
  = (β0 + β1) + E[u|educ = 1] − β0 − E[u|educ = 0]
  = (β0 + β1) − β0 = β1
▶ The conditional mean independence assumption is unlikely to
hold here because individuals with more education will also be
more intelligent on average.
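
As a numerical aside (a simulated sketch with a made-up wage gap of 5), the OLS slope on a binary regressor equals exactly the difference in sample group means, mirroring the population derivation above:

```python
import numpy as np

rng = np.random.default_rng(5)
educ = rng.integers(0, 2, size=1_000).astype(float)  # 1 = college graduate
wage = 10 + 5 * educ + rng.normal(0, 2, size=1_000)  # hypothetical DGP, β1 = 5

b1 = (np.sum((educ - educ.mean()) * (wage - wage.mean()))
      / np.sum((educ - educ.mean()) ** 2))
gap = wage[educ == 1].mean() - wage[educ == 0].mean()
print(np.isclose(b1, gap))   # True: OLS slope = difference in group means
```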
11 / 68
Exercise

Question: Suppose E[u|educ] = c ̸= 0. Can we still compare the
wage difference between those with and without a college degree to
get a causal estimate of the return to college education?

Solution:

dE[wage|educ]/deduc = E[wage|educ = 1] − E[wage|educ = 0]
  = E[β0 + β1 educ + u|educ = 1] − E[β0 + β1 educ + u|educ = 0]
  = (β0 + β1) + E[u|educ = 1] − β0 − E[u|educ = 0]
  = (β0 + β1) + c − β0 − c = β1

Yes: a constant conditional mean of u cancels out, so the comparison
still recovers β1.

12 / 68
Deriving the Ordinary Least Squares Estimates

13 / 68
Deriving the Ordinary Least Squares Estimates

A Random Sample

▶ In order to estimate the regression model we need data.

▶ We can write, for each i,

yi = β0 + β1 xi + ui

14 / 68
A Random Sample

▶ Savings and income for a random sample of 15 families, and the
population regression E[savings|income] = β0 + β1 income.

Figure 2.2: Scatterplot of savings and income for 15 families, and the
population regression E(savings|income) = β0 + β1 income.

▶ Given a random sample, how do we estimate the PRF?



15 / 68
Ordinary Least Squares (OLS) Estimation
▶ Let (β̂0 , β̂1 ) denote the estimates of the parameters (β0 , β1 ).

▶ The fitted (or predicted) value of y when x = xi is:

ŷi = β̂0 + β̂1 xi

▶ The residual for observation i is:

ûi = yi − ŷi = yi − β̂0 − β̂1 xi

▶ We choose (β̂0 , β̂1 ) to minimize the sum of squared residuals:

min_{β̂0 ,β̂1} Σ_{i=1}^n ûi² = min_{β̂0 ,β̂1} Σ_{i=1}^n (yi − β̂0 − β̂1 xi)²

▶ The minimizer can be found based on the first order conditions


(FOCs)

16 / 68
Ordinary Least Squares (OLS) Estimation

▶ Differentiate Σ_{i=1}^n (yi − β̂0 − β̂1 xi)² with respect to β̂0 :

−2 Σ_{i=1}^n (yi − β̂0 − β̂1 xi) = 0

▶ Divide by −2, and divide by n:

(1/n) Σ_{i=1}^n (yi − β̂0 − β̂1 xi) = 0    (2)

▶ To which assumption does this equation correspond?
▶ E[û] = 0, the sample analog of the population condition E[u] = 0.

17 / 68
Ordinary Least Squares (OLS) Estimation

▶ Differentiate Σ_{i=1}^n (yi − β̂0 − β̂1 xi)² with respect to β̂1 :

−2 Σ_{i=1}^n xi (yi − β̂0 − β̂1 xi) = 0

▶ Divide by −2, and divide by n:

(1/n) Σ_{i=1}^n xi (yi − β̂0 − β̂1 xi) = 0    (3)

▶ To which assumption does this equation correspond?
▶ Cov(x, û) = 0, the sample analog of Cov(x, u) = 0.

18 / 68
Ordinary Least Squares (OLS) Estimation

▶ From Equation (2),

β̂0 = ȳ − x̄ β̂1 ,

where x̄ = (1/n) Σ_{i=1}^n xi is the sample mean of x, and ȳ is similarly
defined.

▶ Substituting β̂0 into Equation (3), we have

β̂1 = Σ_{i=1}^n (yi − ȳ)xi / Σ_{i=1}^n (xi − x̄)xi
   = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)² = σ̂xy / σ̂x²    (4)

which is the OLS estimator for β1 .

▶ If the constant term is suppressed in the model, i.e., β0 = 0,

β̃1 = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi²
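
To make these formulas concrete, here is a minimal numpy sketch (made-up data, purely illustrative) computing β̂0 and β̂1 from Equations (2) and (4), plus the no-intercept slope β̃1:

```python
import numpy as np

# Hypothetical sample, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# Equation (4): slope = sample covariance of (x, y) / sample variance of x.
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - xbar * beta1_hat   # from Equation (2)

# Slope when the constant term is suppressed (regression through the origin).
beta1_tilde = np.sum(x * y) / np.sum(x ** 2)

print(beta1_hat, beta0_hat, beta1_tilde)
```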

19 / 68
Ordinary Least Squares (OLS) Estimation

Figure 2.4: Fitted values and residuals. Each ûi is the vertical distance between
yi and the fitted value ŷi on the line ŷ = β̂0 + β̂1 x.

▶ With the OLS estimates, we form the OLS regression line:

ŷ = β̂0 + β̂1 x

20 / 68
Example: CEO Salary and Return on Equity
▶ Suppose the SLR model is

salary = β0 + β1 roe + u

where salary is the CEO salary in thousands of dollars, and roe is


the return on equity of the CEO’s firm in percentage.
▶ The fitted regression is

hat(salary) = 963.191 + 18.501 roe
▶ β̂1 = 18.501 =⇒ if the return on equity increases by one
percentage point, then salary is predicted to change by $18,501.
▶ β̂0 = 963.191 =⇒ even if roe = 0, the predicted salary of
CEO is $963,191.
▶ Causal Interpretation of β̂1 ? Think about what factors are included
in u, and whether Cov (x , u) = 0.

21 / 68
Example: Voting Outcomes and Campaign Expenditures

▶ Suppose the SLR model is

voteA = β0 + β1 shareA + u

where voteA is the percentage of vote for candidate A, and shareA


is the percentage of total campaign expenditures spent by A.
▶ The fitted regression is

hat(voteA) = 26.81 + 0.464 shareA
▶ β̂1 = 0.464 =⇒ if candidate A’s share of spending increases
by one percentage point, he or she receives 0.464 (about one
half) percentage points more of the total vote.
▶ β̂0 = 26.81 =⇒ If candidate A does not spend any on
campaign, then he or she will receive about 26.81% of the
total vote.

22 / 68
Ordinary Least Squares (OLS) Estimation

Figure 2.5: The OLS regression line hat(salary) = 963.191 + 18.501 roe and the
(unknown) population regression function E(salary|roe) = β0 + β1 roe.

▶ The OLS regression line is also called the sample regression function
(SRF). The PRF is something fixed but unknown, while the SRF changes
with the realized sample.

23 / 68
Exercise

The following table contains the ACT scores and the GPA for eight
college students.

Student GPA ACT


1 2.8 21
2 3.4 24
3 3.0 26
4 3.5 27
5 3.6 29
6 3.0 25
7 2.7 25
8 3.7 30

And x̄ = 25.875, ȳ = 3.2125, Σ_{i=1}^n (xi − x̄)(yi − ȳ) = 5.8125, and
Σ_{i=1}^n (xi − x̄)² = 56.875.

24 / 68
Exercise

▶ Estimate the relationship between GPA and ACT using OLS:

hat(GPA) = β̂0 + β̂1 ACT

▶ How much higher is the GPA predicted to be if the ACT score


is increased by five points?
▶ Compute the fitted values and residuals for each observation.
▶ What is the predicted value of GPA when ACT = 20?

25 / 68
Solution

▶ β̂1 = 5.8125/56.875 = 0.1022,
β̂0 = 3.2125 − 0.1022 × 25.875 = 0.5681.
▶ If ACT is 5 points higher, the predicted GPA increases by
0.1022 × 5 = 0.511.
▶ hat(GPA)_1 = 0.5681 + 0.1022 × 21 = 2.7143, and
û1 = 2.8 − 2.7143 = 0.0857.
▶ When ACT = 20, hat(GPA) = 0.5681 + 0.1022 × 20 = 2.61.
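
A short numpy check of these hand calculations, using the eight observations from the table:

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30], dtype=float)
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

b1 = np.sum((act - act.mean()) * (gpa - gpa.mean())) / np.sum((act - act.mean()) ** 2)
b0 = gpa.mean() - b1 * act.mean()
print(round(b1, 4), round(b0, 4))               # 0.1022 0.5681

fitted = b0 + b1 * act                          # fitted value for each student
resid = gpa - fitted                            # residual for each student
print(round(fitted[0], 4), round(resid[0], 4))  # 2.7143 0.0857
print(round(b0 + b1 * 20, 2))                   # 2.61
```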

26 / 68
Properties of OLS on Any Sample of Data

27 / 68
Algebraic Properties of OLS Statistics
▶ Σ_{i=1}^n ûi = 0: some residuals are positive and others are negative,
so the fitted regression line lies in the middle of the data points.
– ȳ = ŷ-bar + û-bar = ŷ-bar: the sample average of the fitted values,
ŷ, is the same as the sample average of y.
– ȳ = β̂0 + β̂1 x̄: the fitted regression line passes through (x̄, ȳ).

▶ Σ_{i=1}^n xi ûi = 0:

Ĉov(x, û) = (1/n) Σ_{i=1}^n (xi − x̄)(ûi − û-bar)
          = (1/n) Σ_{i=1}^n xi (ûi − û-bar) = (1/n) Σ_{i=1}^n xi ûi = 0

▶ These two properties imply

Σ_{i=1}^n ŷi ûi = Σ_{i=1}^n (β̂0 + β̂1 xi)ûi = β̂0 Σ_{i=1}^n ûi + β̂1 Σ_{i=1}^n xi ûi = 0
=⇒ Ĉov(ŷ, û) = 0
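
These algebraic properties hold in any sample, by construction of the FOCs. A quick numerical verification on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)   # made-up DGP for illustration

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

print(np.isclose(uhat.sum(), 0))                 # residuals sum to zero
print(np.isclose(np.sum(x * uhat), 0))           # sample Cov(x, û) = 0
print(np.isclose(np.sum(yhat * uhat), 0))        # hence sample Cov(ŷ, û) = 0
print(np.isclose(y.mean(), b0 + b1 * x.mean()))  # line passes through (x̄, ȳ)
```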

28 / 68
Measures of Variation
▶ SST: the total sum of squares measures the total amount of variability
in the dependent variable:

SST = Σ_{i=1}^n (yi − ȳ)²

▶ SSE: the explained sum of squares represents the variation explained
by the regression:

SSE = Σ_{i=1}^n (ŷi − ŷ-bar)² = Σ_{i=1}^n (ŷi − ȳ)²

▶ SSR: the sum of squared residuals measures the total amount of
variability that the model does not explain:

SSR = Σ_{i=1}^n (ûi − û-bar)² = Σ_{i=1}^n ûi²

29 / 68
Measures of Variation: Sum of Squares

Figure: Total prediction errors yi − ȳ, in a scatterplot of log GDP per capita
growth against log settler mortality.

30 / 68
Measures of Variation: Sum of Squares

Figure: Residuals ûi = yi − ŷi, in the same scatterplot of log GDP per capita
growth against log settler mortality.

31 / 68
Measures of Variation
▶ It can be shown that SST = SSE + SSR:

SST = Σ_{i=1}^n (yi − ȳ)²
    = Σ_{i=1}^n [(yi − ŷi) + (ŷi − ȳ)]²
    = Σ_{i=1}^n [ûi + (ŷi − ȳ)]²
    = Σ_{i=1}^n ûi² + 2 Σ_{i=1}^n ûi (ŷi − ȳ) + Σ_{i=1}^n (ŷi − ȳ)²
    = SSR + 2 Σ_{i=1}^n ûi (ŷi − ȳ) + SSE
    = SSR + SSE,

where the cross term vanishes because Σ_{i=1}^n ŷi ûi = 0 and
Σ_{i=1}^n ûi = 0 (the algebraic properties above).

32 / 68
R-squared

▶ The R-squared of the regression, also called the coefficient of
determination, is defined as

R² = SSE/SST = 1 − SSR/SST ∈ [0, 1]

▶ R-squared measures the fraction of the total variation in y that is
explained by the regression.
▶ R-squared measures variation, not level; a constant cannot explain
variation (it explains only the level), so R² = 0 if only the
constant contributes to the regression.
▶ R-squared is defined only if there is an intercept; we need the
constant to absorb the level of y, and then use xi to explain the
variation of yi.
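
A short sketch (hypothetical data) computing R² both as SSE/SST and as 1 − SSR/SST:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)   # made-up data for illustration

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((yhat - y.mean()) ** 2)
ssr = np.sum(uhat ** 2)
print(np.isclose(sst, sse + ssr))   # SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)     # two equal ways to compute R²
```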

33 / 68
R-squared

Figure: Data patterns for R² = 0 (left) and R² = 1 (right).

▶ R² is often misinterpreted as "goodness of fit". Low R²s in
regression equations are not uncommon, especially for
cross-sectional analysis.
▶ Caution: A high R² does not necessarily mean that the regression
has a causal interpretation. With a low R², it is still possible
that β̂1 is a good estimate of the ceteris paribus relationship
between x and y.

34 / 68
Two Examples of R-Squared

▶ CEO Salary and Return on Equity:

hat(salary) = 963.191 + 18.501 roe,  n = 209, R² = 0.0132

The regression explains only 1.3% of the total variation in salaries.
▶ Voting Outcomes and Campaign Expenditures:

hat(voteA) = 26.81 + 0.464 shareA,  n = 173, R² = 0.856

The regression explains 85.6% of the total variation in election
outcomes.

35 / 68
Units of Measurement and Functional Form

36 / 68
Changing Units of Measurement
▶ Data scaling
– Predictions in different units
– Different interpretations

▶ Example:

wage = β0 + β1 educ + u

– wage is in dollars; educ is in years

▶ Original fitted regression:

hat(wage)_dollars = β̂0 + β̂1 educ_years

▶ Wage in cents rather than dollars?

hat(wage)_dollars = (1/100) hat(wage)_cents

37 / 68
Changing Units of Measurement
▶ Substitute:

(1/100) hat(wage)_cents = β̂0 + β̂1 educ_years
hat(wage)_cents = 100 β̂0 + 100 β̂1 educ_years

=⇒ the estimates of β0 and β1 are scaled by 100

▶ What if we want to measure educ in months?

educ_years = (1/12) educ_months

▶ Substitute:

hat(wage)_dollars = β̂0 + β̂1 (educ_months / 12)
                  = β̂0 + (β̂1 / 12) educ_months

=⇒ the estimate of β1 is scaled by 1/12; see the numerical sketch below
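
A minimal check (made-up data) that rescaling the variables rescales the OLS coefficients exactly as stated:

```python
import numpy as np

rng = np.random.default_rng(2)
educ_years = rng.uniform(8, 20, size=500)
wage_dollars = 2.0 + 0.6 * educ_years + rng.normal(size=500)  # hypothetical data

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(educ_years, wage_dollars)
b0_c, b1_c = ols(educ_years, 100 * wage_dollars)   # wage measured in cents
b0_m, b1_m = ols(12 * educ_years, wage_dollars)    # educ measured in months

print(np.isclose(b0_c, 100 * b0), np.isclose(b1_c, 100 * b1))  # both scaled by 100
print(np.isclose(b0_m, b0), np.isclose(b1_m, b1 / 12))         # slope scaled by 1/12
```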
38 / 68
Incorporating Nonlinearities in Simple Regression

▶ Not everything is linear in real life.

▶ Is the relationship between education and wage linear? Which has
the higher benefit?
– 3 more years after 6th grade?
– 3 more years after undergrad?

▶ Common ways to easily handle non-linearity

1. Take log of the dependent variable


2. Take log of the independent variable
3. Take logs of both

39 / 68
Log-level Model
▶ Regression of log wages on years of education:

ln(wage) = β0 + β1 educ + u

where ln(·) denotes the natural logarithm.


▶ This is often called a semi-log or log-linear regression model.

▶ The interpretation of the regression coefficient:

β1 = ∂ln(wage)/∂educ = (1/wage)(∂wage/∂educ) = (∂wage/wage)/∂educ,

where ∂wage/wage is the proportional change of wage.

▶ Or,

100 β1 = (100 ∂wage/wage)/∂educ = %∆wage/∆educ,

where %∆ is read as "percentage change of", and ∆ is read as
"change of".
40 / 68
Example: A Log Wage Equation

▶ The fitted regression line is

hat(ln(wage)) = 0.584 + 0.083 educ,

which implies

hat(wage) ≈ e^(0.584 + 0.083 educ).

▶ For example, if the current wage is $10 per hour (which implies that
educ = (ln(10) − 0.584)/0.083), suppose education is increased by one
year. Then:

∆wage = exp(0.584 + 0.083 [(ln(10) − 0.584)/0.083 + 1]) − 10
      = 10(e^0.083 − 1) = 0.865 ≈ 0.83

(∂wage/wage)/∂educ = (+$0.83/$10)/(+1 year) = 0.083
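
This arithmetic can be verified directly; a small sketch using the fitted coefficients above:

```python
import numpy as np

b0, b1 = 0.584, 0.083                  # fitted coefficients from the slide
educ0 = (np.log(10) - b0) / b1         # education level implying a $10 wage
wage0 = np.exp(b0 + b1 * educ0)        # equals 10 by construction
wage1 = np.exp(b0 + b1 * (educ0 + 1))  # wage after one more year of education

print(round(wage1 - wage0, 3))         # 0.865: exact dollar change
print(round(100 * b1, 1))              # 8.3: the semi-elasticity approximation
```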

41 / 68
Example: A Log Wage Equation

Figure 2.6: wage = exp(β0 + β1 educ), with β1 > 0.

▶ When the wage level is higher, the increase in wage for one more year
of education is larger, but the percentage increase of wage is the
same.

42 / 68
Log-log Model
▶ CEO Salary and Firm Sales:

ln(salary ) = β0 + β1 ln(sales) + u

▶ This changes the interpretation of the regression coefficient:

β1 = ∂ln(salary)/∂ln(sales) = (∂salary/salary)/(∂sales/sales)
   = %∆salary/%∆sales = elasticity

▶ The log-log form postulates a constant elasticity model, whereas the
semi-log form assumes a semi-elasticity model, with 100 β1 called the
semi-elasticity of y with respect to x. In the log-level wage model:

elasticity = ∂ln(wage)/∂ln(educ) = ∂ln(wage)/(∂educ/educ) = β1 educ,

which depends on educ. The elasticity is larger for a higher
education level.

43 / 68
Log-log Model

▶ The fitted regression line is

hat(ln(salary)) = 4.822 + 0.257 ln(sales),

which implies

hat(salary) ≈ e^(4.822 + 0.257 ln(sales)) = e^4.822 · sales^0.257.

▶ The salary increases by 0.257% for every 1% increase of sales.

44 / 68
Summary of Functional Forms Involving Logarithms

Table 2.3: Summary of Functional Forms Involving Logarithms

Model         Dependent Var.   Independent Var.   Interpretation of β1
Level-level   y                x                  ∆y = β1 ∆x
Level-log     y                log(x)             ∆y = (β1/100) %∆x
Log-level     log(y)           x                  %∆y = (100 β1) ∆x
Log-log       log(y)           log(x)             %∆y = β1 %∆x

In Table 2.3, x and y stand for the variables in their original form. The
model with y as the dependent variable and x as the independent variable is
called the level-level model because each variable appears in its level form.
The model with log(y) as the dependent variable and x as the independent
variable is called the log-level model. The level-log model arises less often
in practice.

45 / 68
The Meaning of “Linear” Regression

▶ The simple linear model also allows for certain nonlinear


relationships. So what does “linear” mean here?
▶ SLR models impose linearity in parameters β0 and β1 , instead of x
and y .

▶ An SLR model: cons = β0 + β1 inc + u

▶ A nonlinear regression model: cons = 1/(β0 + β1 inc) + u

▶ For most applications, choosing a model that can be put into the
linear regression framework is sufficient.

46 / 68
Expected Values and Variances of the OLS
Estimators

47 / 68
Statistical Properties of OLS Estimators
▶ Recall that the OLS estimators for β0 and β1 are:

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²  and  β̂0 = ȳ − x̄ β̂1 ,

where the data {(xi , yi ) : i = 1, ..., n} is random and depends on the


particular sample that has been drawn.
▶ The estimators are themselves random variables. The realized values
(i.e., estimates) depend on the random sample that is drawn.
▶ Important: OLS is an estimator. It’s a machine that we plug data
into and we get out estimates.
▶ What will the estimators estimate on average and how large is their
variability in repeated samples? i.e.,

E [β̂0 ] =?, E [β̂1 ] =?, and Var (β̂0 ) =?, Var (β̂1 ) =?

48 / 68
Standard Assumptions for the SLR Model

▶ Assumption SLR.1 (Linear in Parameters):

y = β0 + β1 x + u.

– In the population, the relationship between y and x is linear.


– The “linear” in linear regression means “linear in parameter”.

▶ Assumption SLR.2 (Random Sampling): The data


{(xi , yi ) : i = 1, ..., n} is a random sample drawn from the
population, i.e., each data point follows the population equation,

yi = β0 + β1 xi + ui .

49 / 68
Discussion of Random Sampling: Wage and Education

▶ The population consists, for example, of all workers of country A.

▶ In the population, a linear relationship between wages (or log


wages) and years of education holds.
▶ Draw completely randomly a worker from the population.

▶ The wage and the years of education of the worker drawn are
random because one does not know beforehand which worker is
drawn.
▶ Throw the worker back into the population and repeat the random
draw n times.
▶ The wages and years of education of the sampled workers are used
to estimate the linear relationship between wages and education.

50 / 68
Standard Assumptions for the SLR Model
Figure 2.7: Graph of yi = β0 + β1 xi + ui. Each observation yi deviates from the
PRF E(y|x) = β0 + β1 x by its error ui.

51 / 68
Standard Assumptions for the SLR Model

▶ Assumption SLR.3 (Sample Variation in the Explanatory Variable):

Σ_{i=1}^n (xi − x̄)² > 0

– The values of the explanatory variable are not all the same
(otherwise it would be impossible to study how much the
dependent variable changes when the explanatory variable
changes by one unit, β1).
– Note that Σ_{i=1}^n (xi − x̄)² is the denominator of β̂1. If
Σ_{i=1}^n (xi − x̄)² = 0, β̂1 is not defined.

▶ Assumption SLR.4 (Zero Conditional Mean): E[u|x] = 0

52 / 68
Standard Assumptions for the SLR Model

▶ For example, if y = wage and x = educ, then SLR.3 fails only if
everyone in the sample has the same amount of education (for example,
if everyone is a high school graduate; see Figure 2.3). If just one
person has a different amount of education, then SLR.3 holds, and the
estimates can be computed.

Figure 2.3: A scatterplot of wage against education when educi = 12 for all i.
53 / 68
Unbiasedness of OLS
▶ Theorem: Under assumptions SLR.1-SLR.4,

E [β̂0 ] = β0 and E [β̂1 ] = β1

for any values of β0 and β1 . That is, the OLS estimator is an
unbiased estimator.
▶ How to understand unbiasedness?

▶ The estimated coefficients may be smaller or larger, depending on


the sample that is the result of a random draw. (In a given sample,
estimates may differ considerably from true values.)
▶ However, on average, they are equal to the values that characterize
the true relationship between y and x in the population.
▶ "On average" means: if the random sampling and estimation were
repeated many times. A Monte Carlo sketch of this idea follows.
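
A minimal simulation sketch of unbiasedness (hypothetical values β0 = 1, β1 = 2): averaging the OLS estimates across many repeated random samples recovers the true parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, n, reps = 1.0, 2.0, 50, 5_000   # made-up population values

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 1, size=n)              # E[u|x] = 0 holds by construction
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates[r] = (y.mean() - b1 * x.mean(), b1)

print(estimates.mean(axis=0))   # close to (1.0, 2.0) across repeated samples
```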

54 / 68
Unbiasedness of OLS: Proof
▶ We condition on {xi , i = 1, ..., n}, i.e., the x values can be treated
as fixed. Now, the only randomness is from {ui , i = 1, ..., n}. Note that

β̂1 − β1 = Σ_{i=1}^n (xi − x̄)yi / Σ_{i=1}^n (xi − x̄)² − β1
(SLR.1, 2) = Σ_{i=1}^n (xi − x̄)(β0 + β1 xi + ui) / Σ_{i=1}^n (xi − x̄)² − β1
= β0 Σ_{i=1}^n (xi − x̄) / Σ_{i=1}^n (xi − x̄)²
  + β1 Σ_{i=1}^n (xi − x̄)xi / Σ_{i=1}^n (xi − x̄)²
  + Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² − β1
= Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² ,

where the last equality is because

Σ_{i=1}^n (xi − x̄) = 0 and Σ_{i=1}^n (xi − x̄)xi = Σ_{i=1}^n (xi − x̄)² > 0

55 / 68
Unbiasedness of OLS: Proof

▶ Taking expectations on both sides,

E[β̂1 |x] − β1 = E[ Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² | x ]
= Σ_{i=1}^n E[(xi − x̄)ui |x] / Σ_{i=1}^n (xi − x̄)²
(SLR.2, 4) = Σ_{i=1}^n (xi − x̄)E[ui |x] / Σ_{i=1}^n (xi − x̄)² = 0

▶ Further, since ȳ = β0 + β1 x̄ + ū,

E[β̂0 |x] = E[ȳ − β̂1 x̄ |x] = E[β0 − (β̂1 − β1)x̄ + ū |x]
= β0 − E[β̂1 − β1 |x] x̄ + E[ū |x] = β0 ,

where the last equality is because β̂1 is unbiased, and
E[ū |x] = (1/n) Σ_{i=1}^n E[ui |x] = 0 by Assumptions SLR.2 and SLR.4.

▶ The key assumption for unbiasedness is Assumption SLR.4.
56 / 68
Variances of the OLS Estimators

▶ Unbiasedness is not the only desirable property of the OLS


estimator.
▶ Depending on the sample, the estimates will be nearer or farther
away from the true population values.
▶ How far can we expect our estimates to be from the true population
values on average?
▶ Sampling variability is measured by the estimator’s variance.

57 / 68
The Same Mean But Different Dispersion

Figure: Random variables with the same mean but different distributions.

58 / 68
Homoskedasticity
▶ Assumption SLR.5 (Homoskedasticity): Var(ui |xi) = σ²
– The variability of the unobserved influences does not depend
on the value of the explanatory variable.

Figure 2.8: The simple regression model under homoskedasticity. The conditional
density f(y|x) has the same spread around E(y|x) = β0 + β1 x at every x.

59 / 68
Heteroskedasticity

▶ When Var(ui |xi) depends on xi , the error term is said to exhibit
heteroskedasticity.

Figure 2.9: Var(wage|educ) increasing with educ. The spread of f(wage|educ)
around E(wage|educ) = β0 + β1 educ grows with educ.

60 / 68
Variances of OLS Estimators
▶ Theorem: Under Assumptions SLR.1-SLR.5,

Var(β̂1) = σ² / Σ_{i=1}^n (xi − x̄)² = σ² / SSTx

Var(β̂0) = σ² (n⁻¹ Σ_{i=1}^n xi²) / Σ_{i=1}^n (xi − x̄)² = σ² (n⁻¹ Σ_{i=1}^n xi²) / SSTx

▶ The sampling variability of the estimated regression coefficients is
higher:
– the larger the variability of the unobserved factors, σ²
– the smaller the sample size, n
– the smaller the variation in the explanatory variable
(Note that SSTx = n times the sample variance of x.)

61 / 68
Variances of OLS Estimators

Figure: Relative difficulty in identifying β1. Samples with more variation in x
pin down the slope more precisely.

62 / 68
Variances of OLS Estimators: Proof

▶ We focus on Var(β̂1). We condition on {xi , i = 1, ..., n}.

Var(β̂1) = Var(β̂1 − β1) = Var( Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² )
= Var( Σ_{i=1}^n (xi − x̄)ui ) / SSTx²
(SLR.2) = Σ_{i=1}^n Var((xi − x̄)ui) / SSTx²
= Σ_{i=1}^n (xi − x̄)² Var(ui) / SSTx²
(SLR.5) = Σ_{i=1}^n (xi − x̄)² σ² / SSTx²
= σ² SSTx / SSTx² = σ² / SSTx

▶ The key assumption to get this simple formula for Var(β̂1) is
Assumption SLR.5.
▶ The only unknown component of Var(β̂1) and Var(β̂0) is σ².

63 / 68
Estimating the Error Variance

▶ Under SLR.4 and SLR.5, Var(ui |xi) = σ² = Var(ui).
The variance of u does not depend on x, i.e., it is equal to the
unconditional variance.
▶ The sample analog of Var(ui) is

σ̃² = (1/n) Σ_{i=1}^n (ûi − û-bar)² = (1/n) Σ_{i=1}^n ûi² = SSR/n.

Note that

ûi = β0 + β1 xi + ui − β̂0 − β̂1 xi = ui − (β̂0 − β0) − (β̂1 − β1)xi
=⇒ E[ûi − ui] = −E[β̂0 − β0] − E[β̂1 − β1] xi = 0

This is why we can use ûi to substitute for ui in the genuine sample
analog of Var(ui) = (1/n) Σ_{i=1}^n (ui − ū)².

64 / 68
Estimating the Error Variance
▶ Unfortunately, σ̃² is a biased estimator of σ².
▶ An unbiased estimator of the error variance can be obtained by
subtracting the number of estimated regression coefficients from the
number of observations:

σ̂² = (1/(n − 2)) Σ_{i=1}^n ûi² = SSR/(n − 2),

where n − 2 is called the degrees of freedom of {ûi}, i = 1, ..., n.


▶ Effectively, only n − 2 residuals are free to vary, since the other
two can be derived from these n − 2 residuals by solving the two FOCs.
▶ Theorem (Unbiased Estimation of σ 2 ) [proof not required]: Under
assumptions SLR.1-SLR.5,

E [σ̂ 2 ] = σ 2 .

65 / 68
SER and SE

▶ σ̂ = √σ̂² is called the standard error of the regression (SER).
▶ The estimated standard deviations of the regression coefficients are
called standard errors. They measure how precisely the regression
coefficients are estimated:

se(β̂1) = √(σ̂²/SSTx) = σ̂/√SSTx

se(β̂0) = √( σ̂² (n⁻¹ Σ_{i=1}^n xi²) / SSTx ) = σ̂ / √( SSTx / (n⁻¹ Σ_{i=1}^n xi²) )

That is, we plug in σ̂² for the unknown σ² in Var(β̂1) and Var(β̂0).


▶ se(β̂0 ) and se(β̂1 ) are also random variables.
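
A sketch (made-up data) computing the SER and both standard errors from the formulas above; the results agree with the nonrobust standard errors reported by packaged OLS routines such as statsmodels.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 3, size=n)   # hypothetical data with σ = 3

sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

sigma2_hat = np.sum(uhat ** 2) / (n - 2)       # unbiased estimator of σ²
ser = np.sqrt(sigma2_hat)                      # standard error of the regression
se_b1 = ser / np.sqrt(sst_x)
se_b0 = ser * np.sqrt(np.mean(x ** 2) / sst_x)
print(ser, se_b0, se_b1)
```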

66 / 68
