
ECON2280 Introductory Econometrics

First Term, 2024-2025

The Simple Regression Model

September 2024

1 / 68
Definition of the Simple Regression Model

2 / 68
Definition of the Simple Regression Model

▶ The simple linear regression (SLR) model is also called the two-variable
linear regression model or the bivariate linear regression model.
▶ The SLR model is usually written as

y = β0 + β1 x + u

– β0 : the intercept parameter or the constant term
– β1 : the slope parameter, which is most often of main interest in
econometrics
– y : dependent variable; explained variable; response variable;
predicted variable; regressand
– x : independent variable; explanatory variable; control variable;
predictor variable; regressor; covariate
– u : error term; disturbance; unobservable

3 / 68
Interpretation of the SLR Model
▶ The SLR model tries to “explain variable y in terms of variable x” or
“study how y varies with changes in x”:
dy/dx = ∂y/∂x + (∂y/∂u)(∂u/∂x) = β1 + ∂u/∂x,

where ∂y/∂x = β1 is the ceteris paribus effect.

▶ Definition of causal effect of x on y: Other factors being equal


(ceteris paribus), how does y change when x changes?
Most economic questions are ceteris paribus questions.

▶ From the definition, the causal effect of x on y is ∂y/∂x = β1 , which is
equal to dy/dx only if ∂u/∂x = 0. That is, dy/dx captures the causal
effect of x on y only if ∂u/∂x = 0.
▶ The simple linear regression model is rarely applicable in practice,
but its discussion is useful for pedagogical reasons.

4 / 68
Two SLR Examples

▶ Example (Soybean Yield and Fertilizer):

yield = β0 + β1 fertilizer + u,

where β1 measures the effect of fertilizer on yield, holding all other


factors fixed, and u contains factors such as rainfall, land quality,
presence of parasites, ...

▶ Example (A Simple Wage Equation):

wage = β0 + β1 educ + u,

where β1 measures the change in hourly wage given another year of


education, holding all other factors fixed, and u contains factors
such as labor force experience, innate ability, tenure with current
employer, work ethic, ...

5 / 68
When Can We Estimate a Causal Effect?

▶ When ∂u/∂x = 0, dy/dx has a causal interpretation for each individual.
However, very often in practice, we are not able to estimate the
individual specific causal effect.
Technically, because we usually can observe only one pair of (x, y)
for each individual, we cannot identify the individual causal effect,
which requires y values for at least two x values.
▶ We therefore explore the change in E[y|x] in response to a change
in x, and discuss when this has a causal interpretation.

6 / 68
When Can We Estimate a Causal Effect?

▶ Zero conditional mean assumption:

E [u|x ] = 0

– The expected value of u is 0 for every slice of the population


endowed with the value x . To put it differently, x contains no
information about the mean of u.
– E [u|x ] = 0 =⇒ Cov (x , u) = 0. (But not vice versa.)
In practice, we just argue why x and u are correlated to
invalidate a causal interpretation of our estimator.
– E [u|x ] = 0 =⇒ E [u] = 0 (by the Law of Iterated
Expectation)

▶ We also refer to E[u|x] = 0 as the identification assumption (i.e.,
the assumption needed to identify the causal effect of x on E[y|x]).

7 / 68
When Can We Estimate a Causal Effect?

▶ The conditional mean independence assumption implies that

E[y|x] = E[β0 + β1 x + u|x]
       = β0 + β1 x + E[u|x]    (1)
       = β0 + β1 x

▶ The average value of y can be expressed as a linear function of x ,


although in general E [y |x ] can be any function of x .
▶ Equation (1) is the population regression function (PRF).
▶ dE[y|x]/dx = ∂E[y|x]/∂x = β1 . Hence β1 captures the change in the
expected value of y given a one-unit increase in x.
▶ The PRF is unknown. It is a theoretical relationship assuming a
linear model and conditional mean independence. We need to
estimate the PRF.
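
To make the PRF concrete, here is a small simulated sketch (hypothetical values β0 = 1, β1 = 0.5, chosen only for illustration): when E[u|x] = 0, the average of y within narrow bins of x lines up with β0 + β1 x.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100_000)
u = rng.normal(0, 2, size=x.size)   # E[u|x] = 0 holds by construction
y = 1.0 + 0.5 * x + u               # hypothetical PRF: E[y|x] = 1 + 0.5x

# Average y within narrow bins of x; these estimates of E[y|x] should
# sit close to the PRF evaluated at the bin centers.
edges = np.linspace(0, 10, 11)
centers = 0.5 * (edges[:-1] + edges[1:])
cond_means = [y[(x >= lo) & (x < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])]
print(np.round(cond_means, 2))           # ~ [1.25, 1.75, ..., 5.75]
print(np.round(1.0 + 0.5 * centers, 2))  # the PRF at the bin centers
```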

8 / 68
E(y|x) as a linear function of x

Figure 2.1: E(y|x) as a linear function of x. The conditional distribution of y
is centered about E(y|x) = β0 + β1 x at each of x1, x2, x3.

For example, suppose that x is the high school grade point average and y is the
college GPA, and we happen to know that E(colGPA|hsGPA) = 1.5 + 0.5 hsGPA. Of
course, in practice, we never know the population intercept and slope, but it is
useful to pretend that we do.

9 / 68
Exercise

Question: If we want to know the return to college education, i.e.,
how much more one can earn by going to college, what is the
regression equation to be estimated? What is the condition that
allows you to estimate a causal effect? Please discuss whether this
condition is likely to hold.

10 / 68
Solution
▶ We want to estimate the following wage equation

wage = β0 + β1 educ + u,

where educ = 1 if the individual is a college graduate and 0
otherwise, and u represents the unobservables, including
innate ability.
▶ The important condition for a causal effect is:
E[u|educ = 1] = E[u|educ = 0] = 0.

dE[wage|educ]/deduc = E[wage|educ = 1] − E[wage|educ = 0]
  = E[β0 + β1 educ + u|educ = 1] − E[β0 + β1 educ + u|educ = 0]
  = (β0 + β1) + E[u|educ = 1] − β0 − E[u|educ = 0]
  = (β0 + β1) − β0 = β1
▶ The conditional mean independence assumption is unlikely to
hold here because individuals with more education will also be
more intelligent on average.
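
As a numerical aside (a simulated sketch with a made-up wage gap of 5), the OLS slope on a binary regressor equals exactly the difference in sample group means, mirroring the population derivation above:

```python
import numpy as np

rng = np.random.default_rng(5)
educ = rng.integers(0, 2, size=1_000).astype(float)  # 1 = college graduate
wage = 10 + 5 * educ + rng.normal(0, 2, size=1_000)  # hypothetical DGP, β1 = 5

b1 = (np.sum((educ - educ.mean()) * (wage - wage.mean()))
      / np.sum((educ - educ.mean()) ** 2))
gap = wage[educ == 1].mean() - wage[educ == 0].mean()
print(np.isclose(b1, gap))   # True: OLS slope = difference in group means
```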
11 / 68
Exercise

Question: Suppose E[u|educ] = c ̸= 0. Can we still compare the
wage difference between those with and without a college degree to
get a causal estimate of the return to college education?

Solution:

dE[wage|educ]/deduc = E[wage|educ = 1] − E[wage|educ = 0]
  = E[β0 + β1 educ + u|educ = 1] − E[β0 + β1 educ + u|educ = 0]
  = (β0 + β1) + E[u|educ = 1] − β0 − E[u|educ = 0]
  = (β0 + β1) + c − β0 − c = β1

Yes: a constant conditional mean of u cancels out, so the comparison
still recovers β1.

12 / 68
Deriving the Ordinary Least Squares Estimates

13 / 68
Deriving the Ordinary Least Squares Estimates

A Random Sample

▶ In order to estimate the regression model we need data.

▶ We can write, for each i,

yi = β0 + β1 xi + ui

14 / 68
A Random Sample

▶ Savings and income for a random sample of 15 families, and the
population regression E[savings|income] = β0 + β1 income.

Figure 2.2: Scatterplot of savings and income for 15 families, and the
population regression E(savings|income) = β0 + β1 income.

▶ Given a random sample, how do we estimate the PRF?



15 / 68
Ordinary Least Squares (OLS) Estimation
▶ Let (β̂0 , β̂1 ) denote the estimates of the parameters (β0 , β1 ).

▶ The fitted (or predicted) value of y when x = xi is:

ŷi = β̂0 + β̂1 xi

▶ The residual for observation i is:

ûi = yi − ŷi = yi − β̂0 − β̂1 xi

▶ We choose (β̂0 , β̂1 ) to minimize the sum of squared residuals:

min_{β̂0 ,β̂1} Σ_{i=1}^n ûi² = min_{β̂0 ,β̂1} Σ_{i=1}^n (yi − β̂0 − β̂1 xi)²

▶ The minimizer can be found based on the first order conditions


(FOCs)

16 / 68
Ordinary Least Squares (OLS) Estimation

▶ Differentiate Σ_{i=1}^n (yi − β̂0 − β̂1 xi)² with respect to β̂0 :

−2 Σ_{i=1}^n (yi − β̂0 − β̂1 xi) = 0

▶ Divide by −2, and divide by n:

(1/n) Σ_{i=1}^n (yi − β̂0 − β̂1 xi) = 0    (2)

▶ To which assumption does this equation correspond?
▶ E[û] = 0, the sample analog of the population condition E[u] = 0.

17 / 68
Ordinary Least Squares (OLS) Estimation

▶ Differentiate Σ_{i=1}^n (yi − β̂0 − β̂1 xi)² with respect to β̂1 :

−2 Σ_{i=1}^n xi (yi − β̂0 − β̂1 xi) = 0

▶ Divide by −2, and divide by n:

(1/n) Σ_{i=1}^n xi (yi − β̂0 − β̂1 xi) = 0    (3)

▶ To which assumption does this equation correspond?
▶ Cov(x, û) = 0, the sample analog of Cov(x, u) = 0.

18 / 68
Ordinary Least Squares (OLS) Estimation

▶ From Equation (2),

β̂0 = ȳ − x̄ β̂1 ,

where x̄ = (1/n) Σ_{i=1}^n xi is the sample mean of x, and ȳ is similarly
defined.

▶ Substituting β̂0 into Equation (3), we have

β̂1 = Σ_{i=1}^n (yi − ȳ)xi / Σ_{i=1}^n (xi − x̄)xi
   = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)² = σ̂xy / σ̂x²    (4)

which is the OLS estimator for β1 .

▶ If the constant term is suppressed in the model, i.e., β0 = 0,

β̃1 = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi²
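
To make these formulas concrete, here is a minimal numpy sketch (made-up data, purely illustrative) computing β̂0 and β̂1 from Equations (2) and (4), plus the no-intercept slope β̃1:

```python
import numpy as np

# Hypothetical sample, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# Equation (4): slope = sample covariance of (x, y) / sample variance of x.
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - xbar * beta1_hat   # from Equation (2)

# Slope when the constant term is suppressed (regression through the origin).
beta1_tilde = np.sum(x * y) / np.sum(x ** 2)

print(beta1_hat, beta0_hat, beta1_tilde)
```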

19 / 68
Ordinary Least Squares (OLS) Estimation

Figure 2.4: Fitted values and residuals. Each ûi is the vertical distance between
yi and the fitted value ŷi on the line ŷ = β̂0 + β̂1 x.

▶ With the OLS estimates, we form the OLS regression line:

ŷ = β̂0 + β̂1 x

20 / 68
Example: CEO Salary and Return on Equity
▶ Suppose the SLR model is

salary = β0 + β1 roe + u

where salary is the CEO salary in thousands of dollars, and roe is


the return on equity of the CEO’s firm in percentage.
▶ The fitted regression is

hat(salary) = 963.191 + 18.501 roe
▶ β̂1 = 18.501 =⇒ if the return on equity increases by one
percentage point, then salary is predicted to change by $18,501.
▶ β̂0 = 963.191 =⇒ even if roe = 0, the predicted salary of
CEO is $963,191.
▶ Causal Interpretation of β̂1 ? Think about what factors are included
in u, and whether Cov (x , u) = 0.

21 / 68
Example: Voting Outcomes and Campaign Expenditures

▶ Suppose the SLR model is

voteA = β0 + β1 shareA + u

where voteA is the percentage of vote for candidate A, and shareA


is the percentage of total campaign expenditures spent by A.
▶ The fitted regression is

hat(voteA) = 26.81 + 0.464 shareA
▶ β̂1 = 0.464 =⇒ if candidate A’s share of spending increases
by one percentage point, he or she receives 0.464 (about one
half) percentage points more of the total vote.
▶ β̂0 = 26.81 =⇒ If candidate A does not spend any on
campaign, then he or she will receive about 26.81% of the
total vote.

22 / 68
Ordinary Least Squares (OLS) Estimation

Figure 2.5: The OLS regression line hat(salary) = 963.191 + 18.501 roe and the
(unknown) population regression function E(salary|roe) = β0 + β1 roe.

▶ The OLS regression line is also called the sample regression function
(SRF). The PRF is something fixed but unknown, while the SRF changes
with the realized sample.

23 / 68
Exercise

The following table contains the ACT scores and the GPA for eight
college students.

Student GPA ACT


1 2.8 21
2 3.4 24
3 3.0 26
4 3.5 27
5 3.6 29
6 3.0 25
7 2.7 25
8 3.7 30

And x̄ = 25.875, ȳ = 3.2125, Σ_{i=1}^n (xi − x̄)(yi − ȳ) = 5.8125, and
Σ_{i=1}^n (xi − x̄)² = 56.875.

24 / 68
Exercise

▶ Estimate the relationship between GPA and ACT using OLS:

hat(GPA) = β̂0 + β̂1 ACT

▶ How much higher is the GPA predicted to be if the ACT score


is increased by five points?
▶ Compute the fitted values and residuals for each observation.
▶ What is the predicted value of GPA when ACT = 20?

25 / 68
Solution

▶ β̂1 = 5.8125/56.875 = 0.1022,
β̂0 = 3.2125 − 0.1022 × 25.875 = 0.5681.
▶ If ACT is 5 points higher, the predicted GPA increases by
0.1022 × 5 = 0.511.
▶ hat(GPA)_1 = 0.5681 + 0.1022 × 21 = 2.7143, and
û1 = 2.8 − 2.7143 = 0.0857.
▶ When ACT = 20, hat(GPA) = 0.5681 + 0.1022 × 20 = 2.61.
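
A short numpy check of these hand calculations, using the eight observations from the table:

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30], dtype=float)
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

b1 = np.sum((act - act.mean()) * (gpa - gpa.mean())) / np.sum((act - act.mean()) ** 2)
b0 = gpa.mean() - b1 * act.mean()
print(round(b1, 4), round(b0, 4))               # 0.1022 0.5681

fitted = b0 + b1 * act                          # fitted value for each student
resid = gpa - fitted                            # residual for each student
print(round(fitted[0], 4), round(resid[0], 4))  # 2.7143 0.0857
print(round(b0 + b1 * 20, 2))                   # 2.61
```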

26 / 68
Properties of OLS on Any Sample of Data

27 / 68
Algebraic Properties of OLS Statistics
▶ Σ_{i=1}^n ûi = 0: some residuals are positive and others are negative,
so the fitted regression line lies in the middle of the data points.
– ȳ = ŷ-bar + û-bar = ŷ-bar: the sample average of the fitted values,
ŷ, is the same as the sample average of y.
– ȳ = β̂0 + β̂1 x̄: the fitted regression line passes through (x̄, ȳ).

▶ Σ_{i=1}^n xi ûi = 0:

Ĉov(x, û) = (1/n) Σ_{i=1}^n (xi − x̄)(ûi − û-bar)
          = (1/n) Σ_{i=1}^n xi (ûi − û-bar) = (1/n) Σ_{i=1}^n xi ûi = 0

▶ These two properties imply

Σ_{i=1}^n ŷi ûi = Σ_{i=1}^n (β̂0 + β̂1 xi)ûi = β̂0 Σ_{i=1}^n ûi + β̂1 Σ_{i=1}^n xi ûi = 0
=⇒ Ĉov(ŷ, û) = 0
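
These algebraic properties hold in any sample, by construction of the FOCs. A quick numerical verification on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)   # made-up DGP for illustration

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

print(np.isclose(uhat.sum(), 0))                 # residuals sum to zero
print(np.isclose(np.sum(x * uhat), 0))           # sample Cov(x, û) = 0
print(np.isclose(np.sum(yhat * uhat), 0))        # hence sample Cov(ŷ, û) = 0
print(np.isclose(y.mean(), b0 + b1 * x.mean()))  # line passes through (x̄, ȳ)
```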

28 / 68
Measures of Variation
▶ SST: the total sum of squares measures the total amount of variability
in the dependent variable:

SST = Σ_{i=1}^n (yi − ȳ)²

▶ SSE: the explained sum of squares represents the variation explained
by the regression:

SSE = Σ_{i=1}^n (ŷi − ŷ-bar)² = Σ_{i=1}^n (ŷi − ȳ)²

▶ SSR: the sum of squared residuals measures the total amount of
variability that the model does not explain:

SSR = Σ_{i=1}^n (ûi − û-bar)² = Σ_{i=1}^n ûi²

29 / 68
Measures of Variation: Sum of Squares

Figure: Total prediction errors yi − ȳ, in a scatterplot of log GDP per capita
growth against log settler mortality.

30 / 68
Measures of Variation: Sum of Squares

Figure: Residuals ûi = yi − ŷi, in the same scatterplot of log GDP per capita
growth against log settler mortality.

31 / 68
Measures of Variation
▶ It can be shown that SST = SSE + SSR:

SST = Σ_{i=1}^n (yi − ȳ)²
    = Σ_{i=1}^n [(yi − ŷi) + (ŷi − ȳ)]²
    = Σ_{i=1}^n [ûi + (ŷi − ȳ)]²
    = Σ_{i=1}^n ûi² + 2 Σ_{i=1}^n ûi (ŷi − ȳ) + Σ_{i=1}^n (ŷi − ȳ)²
    = SSR + 2 Σ_{i=1}^n ûi (ŷi − ȳ) + SSE
    = SSR + SSE,

where the cross term vanishes because Σ_{i=1}^n ŷi ûi = 0 and
Σ_{i=1}^n ûi = 0 (the algebraic properties above).

32 / 68
R-squared

▶ The R-squared of the regression, also called the coefficient of
determination, is defined as

R² = SSE/SST = 1 − SSR/SST ∈ [0, 1]

▶ R-squared measures the fraction of the total variation in y that is
explained by the regression.
▶ R-squared measures variation, not level; a constant cannot explain
variation (it explains only the level), so R² = 0 if only the
constant contributes to the regression.
▶ R-squared is defined only if there is an intercept; we need the
constant to absorb the level of y, and then use xi to explain the
variation of yi.
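
A short sketch (hypothetical data) computing R² both as SSE/SST and as 1 − SSR/SST:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)   # made-up data for illustration

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
uhat = y - yhat

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((yhat - y.mean()) ** 2)
ssr = np.sum(uhat ** 2)
print(np.isclose(sst, sse + ssr))   # SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)     # two equal ways to compute R²
```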

33 / 68
R-squared

Figure: Data patterns for R² = 0 (left) and R² = 1 (right).

▶ R² is often misinterpreted as "goodness of fit". Low R²s in
regression equations are not uncommon, especially for
cross-sectional analysis.
▶ Caution: A high R² does not necessarily mean that the regression
has a causal interpretation. With a low R², it is still possible
that β̂1 is a good estimate of the ceteris paribus relationship
between x and y.

34 / 68
Two Examples of R-Squared

▶ CEO Salary and Return on Equity:

hat(salary) = 963.191 + 18.501 roe,  n = 209, R² = 0.0132

The regression explains only 1.3% of the total variation in salaries.
▶ Voting Outcomes and Campaign Expenditures:

hat(voteA) = 26.81 + 0.464 shareA,  n = 173, R² = 0.856

The regression explains 85.6% of the total variation in election
outcomes.

35 / 68
Units of Measurement and Functional Form

36 / 68
Changing Units of Measurement
▶ Data scaling
– Predictions in different units
– Different interpretations

▶ Example:

wage = β0 + β1 educ + u

– wage is in dollars; educ is in years

▶ Original fitted regression:

hat(wage)_dollars = β̂0 + β̂1 educ_years

▶ Wage in cents rather than dollars?

hat(wage)_dollars = (1/100) hat(wage)_cents

37 / 68
Changing Units of Measurement
▶ Substitute:

(1/100) hat(wage)_cents = β̂0 + β̂1 educ_years
hat(wage)_cents = 100 β̂0 + 100 β̂1 educ_years

=⇒ the estimates of β0 and β1 are scaled by 100

▶ What if we want to measure educ in months?

educ_years = (1/12) educ_months

▶ Substitute:

hat(wage)_dollars = β̂0 + β̂1 (educ_months / 12)
                  = β̂0 + (β̂1 / 12) educ_months

=⇒ the estimate of β1 is scaled by 1/12; see the numerical sketch below
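
A minimal check (made-up data) that rescaling the variables rescales the OLS coefficients exactly as stated:

```python
import numpy as np

rng = np.random.default_rng(2)
educ_years = rng.uniform(8, 20, size=500)
wage_dollars = 2.0 + 0.6 * educ_years + rng.normal(size=500)  # hypothetical data

def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0, b1 = ols(educ_years, wage_dollars)
b0_c, b1_c = ols(educ_years, 100 * wage_dollars)   # wage measured in cents
b0_m, b1_m = ols(12 * educ_years, wage_dollars)    # educ measured in months

print(np.isclose(b0_c, 100 * b0), np.isclose(b1_c, 100 * b1))  # both scaled by 100
print(np.isclose(b0_m, b0), np.isclose(b1_m, b1 / 12))         # slope scaled by 1/12
```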
38 / 68
Incorporating Nonlinearities in Simple Regression

▶ Not everything is linear in real life.

▶ Is the relationship between education and wage linear? Which has
the higher benefit?
– 3 more years after 6th grade?
– 3 more years after undergrad?

▶ Common ways to easily handle non-linearity

1. Take log of the dependent variable


2. Take log of the independent variable
3. Take logs of both

39 / 68
Log-level Model
▶ Regression of log wages on years of education:

ln(wage) = β0 + β1 educ + u

where ln(·) denotes the natural logarithm.


▶ This is often called a semi-log or log-linear regression model.

▶ The interpretation of the regression coefficient:

β1 = ∂ln(wage)/∂educ = (1/wage)(∂wage/∂educ) = (∂wage/wage)/∂educ,

where ∂wage/wage is the proportional change of wage.

▶ Or,

100 β1 = (100 ∂wage/wage)/∂educ = %∆wage/∆educ,

where %∆ is read as "percentage change of", and ∆ is read as
"change of".
40 / 68
Example: A Log Wage Equation

▶ The fitted regression line is

hat(ln(wage)) = 0.584 + 0.083 educ,

which implies

hat(wage) ≈ e^(0.584 + 0.083 educ).

▶ For example, if the current wage is $10 per hour (which implies that
educ = (ln(10) − 0.584)/0.083), suppose education is increased by one
year. Then:

∆wage = exp(0.584 + 0.083 [(ln(10) − 0.584)/0.083 + 1]) − 10
      = 10(e^0.083 − 1) = 0.865 ≈ 0.83

(∂wage/wage)/∂educ = (+$0.83/$10)/(+1 year) = 0.083
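
This arithmetic can be verified directly; a small sketch using the fitted coefficients above:

```python
import numpy as np

b0, b1 = 0.584, 0.083                  # fitted coefficients from the slide
educ0 = (np.log(10) - b0) / b1         # education level implying a $10 wage
wage0 = np.exp(b0 + b1 * educ0)        # equals 10 by construction
wage1 = np.exp(b0 + b1 * (educ0 + 1))  # wage after one more year of education

print(round(wage1 - wage0, 3))         # 0.865: exact dollar change
print(round(100 * b1, 1))              # 8.3: the semi-elasticity approximation
```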

41 / 68
Example: A Log Wage Equation

Figure 2.6: wage = exp(β0 + β1 educ), with β1 > 0.

▶ When the wage level is higher, the increase in wage for one more year
of education is larger, but the percentage increase of wage is the
same.

42 / 68
Log-log Model
▶ CEO Salary and Firm Sales:

ln(salary ) = β0 + β1 ln(sales) + u

▶ This changes the interpretation of the regression coefficient:

β1 = ∂ln(salary)/∂ln(sales) = (∂salary/salary)/(∂sales/sales)
   = %∆salary/%∆sales = elasticity

▶ The log-log form postulates a constant elasticity model, whereas the
semi-log form assumes a semi-elasticity model, with 100 β1 called the
semi-elasticity of y with respect to x. In the log-level wage model:

elasticity = ∂ln(wage)/∂ln(educ) = ∂ln(wage)/(∂educ/educ) = β1 educ,

which depends on educ. The elasticity is larger for a higher
education level.

43 / 68
Log-log Model

▶ The fitted regression line is

hat(ln(salary)) = 4.822 + 0.257 ln(sales),

which implies

hat(salary) ≈ e^(4.822 + 0.257 ln(sales)) = e^4.822 · sales^0.257.

▶ The salary increases by 0.257% for every 1% increase of sales.

44 / 68
Summary of Functional Forms Involving Logarithms

Table 2.3: Summary of Functional Forms Involving Logarithms

Model         Dependent Var.   Independent Var.   Interpretation of β1
Level-level   y                x                  ∆y = β1 ∆x
Level-log     y                log(x)             ∆y = (β1/100) %∆x
Log-level     log(y)           x                  %∆y = (100 β1) ∆x
Log-log       log(y)           log(x)             %∆y = β1 %∆x

In Table 2.3, x and y stand for the variables in their original form. The
model with y as the dependent variable and x as the independent variable is
called the level-level model because each variable appears in its level form.
The model with log(y) as the dependent variable and x as the independent
variable is called the log-level model. The level-log model arises less often
in practice.

45 / 68
The Meaning of “Linear” Regression

▶ The simple linear model also allows for certain nonlinear


relationships. So what does “linear” mean here?
▶ SLR models impose linearity in parameters β0 and β1 , instead of x
and y .

▶ An SLR model: cons = β0 + β1 inc + u

▶ A nonlinear regression model: cons = 1/(β0 + β1 inc) + u

▶ For most applications, choosing a model that can be put into the
linear regression framework is sufficient.

46 / 68
Expected Values and Variances of the OLS
Estimators

47 / 68
Statistical Properties of OLS Estimators
▶ Recall that the OLS estimators for β0 and β1 are:

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²  and  β̂0 = ȳ − x̄ β̂1 ,

where the data {(xi , yi ) : i = 1, ..., n} is random and depends on the


particular sample that has been drawn.
▶ The estimators are themselves random variables. The realized values
(i.e., estimates) depend on the random sample that is drawn.
▶ Important: OLS is an estimator. It’s a machine that we plug data
into and we get out estimates.
▶ What will the estimators estimate on average and how large is their
variability in repeated samples? i.e.,

E [β̂0 ] =?, E [β̂1 ] =?, and Var (β̂0 ) =?, Var (β̂1 ) =?

48 / 68
Standard Assumptions for the SLR Model

▶ Assumption SLR.1 (Linear in Parameters):

y = β0 + β1 x + u.

– In the population, the relationship between y and x is linear.


– The “linear” in linear regression means “linear in parameter”.

▶ Assumption SLR.2 (Random Sampling): The data


{(xi , yi ) : i = 1, ..., n} is a random sample drawn from the
population, i.e., each data point follows the population equation,

yi = β0 + β1 xi + ui .

49 / 68
Discussion of Random Sampling: Wage and Education

▶ The population consists, for example, of all workers of country A.

▶ In the population, a linear relationship between wages (or log


wages) and years of education holds.
▶ Draw completely randomly a worker from the population.

▶ The wage and the years of education of the worker drawn are
random because one does not know beforehand which worker is
drawn.
▶ Throw the worker back into the population and repeat the random
draw n times.
▶ The wages and years of education of the sampled workers are used
to estimate the linear relationship between wages and education.

50 / 68
Standard Assumptions for the SLR Model
Figure 2.7: Graph of yi = β0 + β1 xi + ui. Each observation yi deviates from the
PRF E(y|x) = β0 + β1 x by its error ui.

51 / 68
Standard Assumptions for the SLR Model

▶ Assumption SLR.3 (Sample Variation in the Explanatory Variable):

Σ_{i=1}^n (xi − x̄)² > 0

– The values of the explanatory variable are not all the same
(otherwise it would be impossible to study how much the
dependent variable changes when the explanatory variable
changes by one unit, β1).
– Note that Σ_{i=1}^n (xi − x̄)² is the denominator of β̂1. If
Σ_{i=1}^n (xi − x̄)² = 0, β̂1 is not defined.

▶ Assumption SLR.4 (Zero Conditional Mean): E[u|x] = 0

52 / 68
Standard Assumptions for the SLR Model

▶ For example, if y = wage and x = educ, then SLR.3 fails only if
everyone in the sample has the same amount of education (for example,
if everyone is a high school graduate; see Figure 2.3). If just one
person has a different amount of education, then SLR.3 holds, and the
estimates can be computed.

Figure 2.3: A scatterplot of wage against education when educi = 12 for all i.
53 / 68
Unbiasedness of OLS
▶ Theorem: Under assumptions SLR.1-SLR.4,

E [β̂0 ] = β0 and E [β̂1 ] = β1

for any values of β0 and β1 . That is, the OLS estimator is an
unbiased estimator.
▶ How to understand unbiasedness?

▶ The estimated coefficients may be smaller or larger, depending on


the sample that is the result of a random draw. (In a given sample,
estimates may differ considerably from true values.)
▶ However, on average, they are equal to the values that characterize
the true relationship between y and x in the population.
▶ "On average" means: if the random sampling and estimation were
repeated many times. A Monte Carlo sketch of this idea follows.
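
A minimal simulation sketch of unbiasedness (hypothetical values β0 = 1, β1 = 2): averaging the OLS estimates across many repeated random samples recovers the true parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, n, reps = 1.0, 2.0, 50, 5_000   # made-up population values

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.uniform(0, 10, size=n)
    u = rng.normal(0, 1, size=n)              # E[u|x] = 0 holds by construction
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates[r] = (y.mean() - b1 * x.mean(), b1)

print(estimates.mean(axis=0))   # close to (1.0, 2.0) across repeated samples
```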

54 / 68
Unbiasedness of OLS: Proof
▶ We condition on {xi , i = 1, ..., n}, i.e., the x values can be treated
as fixed. Now, the only randomness is from {ui , i = 1, ..., n}. Note that

β̂1 − β1 = Σ_{i=1}^n (xi − x̄)yi / Σ_{i=1}^n (xi − x̄)² − β1
(SLR.1, 2) = Σ_{i=1}^n (xi − x̄)(β0 + β1 xi + ui) / Σ_{i=1}^n (xi − x̄)² − β1
= β0 Σ_{i=1}^n (xi − x̄) / Σ_{i=1}^n (xi − x̄)²
  + β1 Σ_{i=1}^n (xi − x̄)xi / Σ_{i=1}^n (xi − x̄)²
  + Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² − β1
= Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² ,

where the last equality is because

Σ_{i=1}^n (xi − x̄) = 0 and Σ_{i=1}^n (xi − x̄)xi = Σ_{i=1}^n (xi − x̄)² > 0

55 / 68
Unbiasedness of OLS: Proof

▶ Taking expectations on both sides,

E[β̂1 |x] − β1 = E[ Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² | x ]
= Σ_{i=1}^n E[(xi − x̄)ui |x] / Σ_{i=1}^n (xi − x̄)²
(SLR.2, 4) = Σ_{i=1}^n (xi − x̄)E[ui |x] / Σ_{i=1}^n (xi − x̄)² = 0

▶ Further, since ȳ = β0 + β1 x̄ + ū,

E[β̂0 |x] = E[ȳ − β̂1 x̄ |x] = E[β0 − (β̂1 − β1)x̄ + ū |x]
= β0 − E[β̂1 − β1 |x] x̄ + E[ū |x] = β0 ,

where the last equality is because β̂1 is unbiased, and
E[ū |x] = (1/n) Σ_{i=1}^n E[ui |x] = 0 by Assumptions SLR.2 and SLR.4.

▶ The key assumption for unbiasedness is Assumption SLR.4.
56 / 68
Variances of the OLS Estimators

▶ Unbiasedness is not the only desirable property of the OLS


estimator.
▶ Depending on the sample, the estimates will be nearer or farther
away from the true population values.
▶ How far can we expect our estimates to be from the true population
values on average?
▶ Sampling variability is measured by the estimator’s variance.

57 / 68
The Same Mean But Different Dispersion

Figure: Random variables with the same mean but different distributions.

58 / 68
Homoskedasticity
▶ Assumption SLR.5 (Homoskedasticity): Var(ui |xi) = σ²
– The variability of the unobserved influences does not depend
on the value of the explanatory variable.

Figure 2.8: The simple regression model under homoskedasticity. The conditional
density f(y|x) has the same spread around E(y|x) = β0 + β1 x at every x.

59 / 68
Heteroskedasticity

▶ When Var(ui |xi) depends on xi , the error term is said to exhibit
heteroskedasticity.

Figure 2.9: Var(wage|educ) increasing with educ. The spread of f(wage|educ)
around E(wage|educ) = β0 + β1 educ grows with educ.

60 / 68
Variances of OLS Estimators
▶ Theorem: Under Assumptions SLR.1-SLR.5,

Var(β̂1) = σ² / Σ_{i=1}^n (xi − x̄)² = σ² / SSTx

Var(β̂0) = σ² (n⁻¹ Σ_{i=1}^n xi²) / Σ_{i=1}^n (xi − x̄)² = σ² (n⁻¹ Σ_{i=1}^n xi²) / SSTx

▶ The sampling variability of the estimated regression coefficients is
higher:
– the larger the variability of the unobserved factors, σ²
– the smaller the sample size, n
– the smaller the variation in the explanatory variable
(Note that SSTx = n times the sample variance of x.)

61 / 68
Variances of OLS Estimators

Figure: Relative difficulty in identifying β1. Samples with more variation in x
pin down the slope more precisely.

62 / 68
Variances of OLS Estimators: Proof

▶ We focus on Var(β̂1). We condition on {xi , i = 1, ..., n}.

Var(β̂1) = Var(β̂1 − β1) = Var( Σ_{i=1}^n (xi − x̄)ui / Σ_{i=1}^n (xi − x̄)² )
= Var( Σ_{i=1}^n (xi − x̄)ui ) / SSTx²
(SLR.2) = Σ_{i=1}^n Var((xi − x̄)ui) / SSTx²
= Σ_{i=1}^n (xi − x̄)² Var(ui) / SSTx²
(SLR.5) = Σ_{i=1}^n (xi − x̄)² σ² / SSTx²
= σ² SSTx / SSTx² = σ² / SSTx

▶ The key assumption to get this simple formula for Var(β̂1) is
Assumption SLR.5.
▶ The only unknown component of Var(β̂1) and Var(β̂0) is σ².

63 / 68
Estimating the Error Variance

▶ Under SLR.4 and SLR.5, Var(ui |xi) = σ² = Var(ui).
The variance of u does not depend on x, i.e., it is equal to the
unconditional variance.
▶ The sample analog of Var(ui) is

σ̃² = (1/n) Σ_{i=1}^n (ûi − û-bar)² = (1/n) Σ_{i=1}^n ûi² = SSR/n.

Note that

ûi = β0 + β1 xi + ui − β̂0 − β̂1 xi = ui − (β̂0 − β0) − (β̂1 − β1)xi
=⇒ E[ûi − ui] = −E[β̂0 − β0] − E[β̂1 − β1] xi = 0

This is why we can use ûi to substitute for ui in the genuine sample
analog of Var(ui) = (1/n) Σ_{i=1}^n (ui − ū)².

64 / 68
Estimating the Error Variance
▶ Unfortunately, σ̃² is a biased estimator of σ².
▶ An unbiased estimator of the error variance can be obtained by
subtracting the number of estimated regression coefficients from the
number of observations:

σ̂² = (1/(n − 2)) Σ_{i=1}^n ûi² = SSR/(n − 2),

where n − 2 is called the degrees of freedom of {ûi}, i = 1, ..., n.


▶ Effectively, only n − 2 residuals are free to vary, since the other
two can be derived from these n − 2 residuals by solving the two FOCs.
▶ Theorem (Unbiased Estimation of σ 2 ) [proof not required]: Under
assumptions SLR.1-SLR.5,

E [σ̂ 2 ] = σ 2 .

65 / 68
SER and SE

▶ σ̂ = √σ̂² is called the standard error of the regression (SER).
▶ The estimated standard deviations of the regression coefficients are
called standard errors. They measure how precisely the regression
coefficients are estimated:

se(β̂1) = √(σ̂²/SSTx) = σ̂/√SSTx

se(β̂0) = √( σ̂² (n⁻¹ Σ_{i=1}^n xi²) / SSTx ) = σ̂ / √( SSTx / (n⁻¹ Σ_{i=1}^n xi²) )

That is, we plug in σ̂² for the unknown σ² in Var(β̂1) and Var(β̂0).


▶ se(β̂0 ) and se(β̂1 ) are also random variables.
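
A sketch (made-up data) computing the SER and both standard errors from the formulas above; the results agree with the nonrobust standard errors reported by packaged OLS routines such as statsmodels.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 3, size=n)   # hypothetical data with σ = 3

sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

sigma2_hat = np.sum(uhat ** 2) / (n - 2)       # unbiased estimator of σ²
ser = np.sqrt(sigma2_hat)                      # standard error of the regression
se_b1 = ser / np.sqrt(sst_x)
se_b0 = ser * np.sqrt(np.mean(x ** 2) / sst_x)
print(ser, se_b0, se_b1)
```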

66 / 68
