
Introduction to Econometrics

THE SIMPLE REGRESSION MODEL:


DEFINITION AND ESTIMATION

Tobias Broer

January 24, 2023


Outline of today’s lecture

1. Definition of the Simple Regression Model
2. Deriving the Ordinary Least Squares Estimates
3. Properties of OLS on any Sample of Data
4. Units of Measurement and Functional Form
Learning outcomes

1. After this lecture, you will be familiar with the concept, and language, of
“OLS regression analysis"
2. You will know how to estimate the coefficients in the simple linear
regression model on the basis of a sample
3. You will know the properties of the OLS estimates, including goodness of
fit
4. And you will be able to interpret the estimated coefficients, and the effects of changing units of measurement and functional form
1. Definition of the Simple Regression Model

1. Consider a cross-sectional population (of which we will eventually have a random sample), with attributes including x and y
2. We are interested in “how y varies with changes in x" in this population
3. Examples:
I x is amount of fertilizer, and y is soybean yield.
I x is years of schooling, y is hourly wage.
Three issues

1. How do we allow factors other than x to affect y? There is never an exact relationship between two variables (in interesting cases).
2. What is the functional relationship between y and x?
3. How can we be sure we are capturing a ceteris paribus (all other factors held constant) relationship between y and x (as is so often the goal)?
Simple linear regression model

Assume the following relation in the population

y = β0 + β1 x + u, (1)

I Assumption 1a: Linear model, with β0 the intercept parameter and β1 the slope parameter
I Assumption 1b: All factors other than x affect y through the “error term” u
I Note: the relation between y and x is not symmetric
I Terminology:
I “Simple” = two-variable model
I “Linear” = linear model
I “Regression” has no useful meaning here (historical origin: “regression to the mean”)
I y is dependent / explained / response variable / regressand
I x is independent / explanatory / control variable / regressor
I u is error / disturbance term
Simple linear regression model

y = β0 + β1 x + u, (2)

Assumptions 1a and 1b solve the three issues


I Factors other than x affect y only through u.
I Functional relationship between y and x: linear
I Ceteris paribus relationship: ∆y = β1 ∆x + ∆u, so ∆y/∆x |_{∆u=0} = β1 tells us how y changes when x changes, holding all other factors (i.e. u) fixed.
EXAMPLE 1: Yield and Fertilizer

I A model to explain crop yield by fertilizer use is

yield = β0 + β1 fertilizer + u, (3)


I u contains land quality, rainfall on a plot of land, and so on.
I The slope parameter, β1 , is of primary interest: it tells us how yield
changes when the amount of fertilizer changes, holding all else fixed.
I Note: Is the effect of fertilizer really constant?
The linear function is probably not realistic here. The effect of fertilizer is
likely to diminish at large amounts of fertilizer.
EXAMPLE 2: Wage and Education

wage = β0 + β1 educ + u (4)

I u contains somewhat nebulous factors (“ability”) but also past workforce


experience and tenure on the current job.

∆wage = β1 ∆educ (5)

when ∆u = 0
I But: Is each year of education really worth the same dollar amount no
matter how much education one starts with?
Simple linear regression model

y = β0 + β1 x + u, (6)

I Note: We are mainly interested in the population coefficient β1 , which describes a ceteris paribus relation. If we know it, that’s enough.
I But: We typically do not know β1 . If we had data on y ,x,u, we could just
calculate β1 .
I But: u is unobserved. If we could manipulate x experimentally, we might
still identify β1 .
I But: We typically cannot change x, and have to identify β1 from given,
“observational" data.
I In other words, we only observe given yi and xi , where yi = β0 + β1 xi + ui
Simple linear regression model

I We only observe given yi = β0 + β1 xi + ui and xi


I Can we identify β1 for any arbitrary u?
NO!
I For example, suppose u = c + dx, so yi = (β0 + c) + (β1 + d)xi .
I Or, even worse, u = −β1 x, so yi = β0 and x appears unrelated to y.
I The problem here is that the unobserved u comoves with x (the argument is unchanged if u comoves with x only “on average”, i.e. up to a further error term).
I This implies that some of the observed comovement of x and y comes from the unobserved u. So we cannot hope to identify β1 from observations on {yi , xi } alone, as the simulation sketch below illustrates.
I So we must restrict our attention to certain kinds of situations where u fulfills additional assumptions.
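As a rough illustration (an addition, not part of the original slides), the following Python sketch simulates a population where u comoves with x; all numbers, including the comovement strength d, are made up. The slope recovered from the observed (x, y) pairs mixes β1 with that comovement:

```python
# Sketch: u comoves with x, so the regression of y on x does not recover beta1.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1 = 1.0, 2.0   # made-up population parameters
d = 0.5                   # made-up comovement between u and x

x = rng.normal(size=n)
u = d * x + rng.normal(size=n)   # u comoves with x "on average"
y = beta0 + beta1 * x + u

# slope recoverable from observed (x, y) alone:
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(slope)  # close to beta1 + d = 2.5, not the true beta1 = 2.0
```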
Additional assumptions on u

y = β0 + β1 x + u, (9)

I Assumption 2a: E (u) = 0 where E (·) is the expected value operator.


I NB: As long as we are not interested in β0 , assumption 2a is without loss
of generality: The presence of β0 in

y = β0 + β1 x + u (10)

allows us to assume E (u) = 0. If the average of u is different from zero,


we just adjust the intercept, leaving the slope the same. If α0 = E (u) then
we can write

y = (β0 + α0 ) + β1 x + (u − α0 ), (11)

where the new error, u′ = u − α0 , has a zero mean.


I The new intercept is β0 + α0 . The important point is that the slope, β1 ,
has not changed.
How about dependence between u and x?

y = β0 + β1 x + u, (12)

I We could assume u and x uncorrelated in the population:

Corr (x, u) = 0 (13)


I Zero correlation actually works for most purposes, and in this course.
I But it implies only that u and x are not linearly related. Ruling out only
linear dependence can cause problems with interpretation and makes
statistical analysis more difficult.
How about dependence between u and x?

y = β0 + β1 x + u, (14)

I We could instead assume u and x independent in the population.
I It turns out (full) independence is more than we need.
Mean independence

y = β0 + β1 x + u, (15)

I Assumption 2b: E (u|x) = E (u) for all values of x, where E (u|x) means
“the expected value of u given x.”
I We say u is mean independent of x.
I Note: Full independence also suffices, as it implies mean independence.
I For this course, Cov (x, u) = 0 also suffices.
Example 1: Fertilizer and yield

I Suppose u is “land quality” and x is fertilizer amount. Then E(u|x) = E(u) if fertilizer amounts are chosen independently of quality. This assumption is reasonable, but it assumes fertilizer amounts are assigned at random.
I Might fail if farmers have deployed more fertilizer on better / worse land.
Example 2: Wage equation

I Suppose u is “ability” and x is years of education. We need, for example,

E (ability |x = 8) = E (ability |x = 12) = E (ability |x = 16) (16)

so that the average ability is the same in the different portions of the population with an 8th grade education, a 12th grade education, and a four-year college education.
I Because people choose education levels partly based on ability, this
assumption is almost certainly false.
Zero conditional mean and population regression function

I Combining E(u|x) = E(u) (the substantive assumption) with E(u) = 0 (a normalization) gives

E (u|x) = 0, for all values of x (17)


I Called the zero conditional mean assumption.
I Because the expected value is a linear operator, E (u|x) = 0 implies

E (y |x) = β0 + β1 x + E (u|x) = β0 + β1 x, (18)

which shows that the population regression function (PRF) is a linear function of x.
I Regression analysis is essentially about explaining effects of explanatory
variables on average outcomes of y . Only if the PRF has slope β1 can we
hope to identify it from data using the techniques in this course.
I A different approach to simple regression ignores the causality issue and
just starts with a linear model for E (y |x) as a descriptive device.
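A small simulation sketch (an illustrative addition, with made-up parameters) shows the zero conditional mean assumption at work: when u is drawn independently of x, the average of y within narrow bins of x traces out the linear PRF:

```python
# Sketch: with E(u|x) = 0, bin averages of y trace out E(y|x) = beta0 + beta1*x.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta0, beta1 = 3.0, 0.8   # made-up population parameters
x = rng.uniform(0, 10, size=n)
u = rng.normal(scale=2.0, size=n)   # independent of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

for lo in (2.0, 5.0, 8.0):          # conditional means in three narrow bins
    mask = (x >= lo) & (x < lo + 0.5)
    mid = lo + 0.25
    print(mid, y[mask].mean(), beta0 + beta1 * mid)
# the bin averages of y line up with the straight line beta0 + beta1*x
```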
Illustration

I The straight line is the PRF, E (y |x) = β0 + β1 x. The conditional


distribution of y at three different values of x are superimposed.
I For a given value of x, we see a range of y values: remember,
y = β0 + β1 x + u, and u has a distribution in the population.
Summary so far

I Aim: test economic theory, quantify the ceteris paribus effect of x on y, forecast y
I Needed to define the functional form, and how factors other than x affect y, making sure we can isolate the ceteris paribus effect
I Assumed y = β0 + β1 x + u, the “simple linear regression model”
I Saw that, if we want to have some hope of gaining information about β1 from observations on yi , xi , i = 1, …, N, we need to restrict u
I Added two assumptions
I Assumption 2a: E(u) = 0
I Assumption 2b: E(u|x) = E(u)
I Implies a linear “population regression function” E[y|x] = β0 + β1 x
2. Deriving the Ordinary Least Squares Estimates
Deriving the Ordinary Least Squares Estimates

I Given data on x and y, how can we estimate the population parameters, β0 and β1 ?
I Let {(xi , yi ) : i = 1, 2, . . . , n} be a sample of size n (the number of
observations) from the population. Think of this as a random sample.
Deriving the Ordinary Least Squares Estimates

I The graph shows n = 15 families and the (usually unknown) population regression of saving on income.
I We observe yi and xi , but not ui . (However, we know ui is there.)
I Plug any observation into the population equation:

yi = β0 + β1 xi + ui (19)
where the i subscript indicates a particular observation.
I Strategy: Use our assumptions about u to “identify" β0 , β1 from observed
yi , xi .
Deriving the Ordinary Least Squares Estimates

I Zero conditional mean assumption E (u|x) = 0. Implies


1. E (u) = E (y − β0 − β1 x) = 0
2. Cov (x, u) = Cov (x, y − β0 − β1 x) = 0
in the population
I Remember, the first condition essentially defines the intercept.
I The second condition, stated in terms of the covariance, means that x and
u are uncorrelated. (We could have assumed Cov (u, x) = 0 directly).
I Note:
1. E(u|x) = 0 implies Cov(x, u) = E[(x − µx ) u] = E_x[(x − µx ) E(u|x)] = E_x[(x − µx ) · 0] = 0
2. With E(u) = 0, Cov(x, u) = 0 is the same as E(xu) = 0, because Cov(x, u) = E(xu) − E(x)E(u) = E(xu).
Deriving the Ordinary Least Squares Estimates

1. E (u) = E (y − β0 − β1 x) = 0
2. E (xu) = E [x(y − β0 − β1 x)] = 0
I These are the two conditions in the population that determine β0 and β1 .
I Method of moments: Use their sample analogs to determine β̂0 and β̂1 ,
the estimates from the data (Note the hats!):

N⁻¹ Σ_{i=1}^N (yi − β̂0 − β̂1 xi ) = 0 (20)

N⁻¹ Σ_{i=1}^N xi (yi − β̂0 − β̂1 xi ) = 0 (21)

I These are the “OLS normal equations”: two linear equations in the two unknowns β̂0 and β̂1 (solved directly in the sketch below).
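A minimal sketch, with illustrative data, of solving the two normal equations as a 2×2 linear system:

```python
# Sketch: solve the two OLS normal equations as a 2x2 linear system.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)

# sum(y - b0 - b1*x) = 0 and sum(x*(y - b0 - b1*x)) = 0, rearranged as A @ b = c:
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
c = np.array([y.sum(), (x * y).sum()])
b0_hat, b1_hat = np.linalg.solve(A, c)
print(b0_hat, b1_hat)
```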
Deriving the Ordinary Least Squares Estimates

N⁻¹ Σ_{i=1}^N (yi − β̂0 − β̂1 xi ) = 0 (22)

N⁻¹ Σ_{i=1}^N xi (yi − β̂0 − β̂1 xi ) = 0 (23)

I Solve (22) to get β̂0 = ȳ − β̂1 x̄, where ȳ = n⁻¹ Σ_{i=1}^n yi and x̄ = n⁻¹ Σ_{i=1}^n xi are the sample averages.
I Substitute this into (23):

Σ_{i=1}^n xi [yi − (ȳ − β̂1 x̄) − β̂1 xi ] = 0 (24)

I Simple algebra gives

Σ_{i=1}^n xi (yi − ȳ ) = β̂1 [ Σ_{i=1}^n xi (xi − x̄) ] (25)
Deriving the Ordinary Least Squares Estimates

Σ_{i=1}^n xi (yi − ȳ ) = β̂1 [ Σ_{i=1}^n xi (xi − x̄) ] (26)

I Remember:
Σ_{i=1}^n (xi − x̄) = 0
Σ_{i=1}^n xi (yi − ȳ ) = Σ_{i=1}^n (xi − x̄)(yi − ȳ ) = Σ_{i=1}^n (xi − x̄) yi
Σ_{i=1}^n xi (xi − x̄) = Σ_{i=1}^n (xi − x̄)²
I Use this to write (26) as

Σ_{i=1}^n (xi − x̄)(yi − ȳ ) = β̂1 [ Σ_{i=1}^n (xi − x̄)² ] (27)

I If Σ_{i=1}^n (xi − x̄)² > 0, we can write

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ ) / Σ_{i=1}^n (xi − x̄)² = Sample Covariance(xi , yi ) / Sample Variance(xi ) (28)
Deriving the Ordinary Least Squares Estimates

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ ) / Σ_{i=1}^n (xi − x̄)² = Sample Covariance(xi , yi ) / Sample Variance(xi ) (29)

I This formula for β̂1 shows us how to take the data we have and compute
the slope estimate. For reasons we will see, β̂1 is called the ordinary least
squares (OLS) slope estimate. We often refer to it as the slope estimate.
I It can be computed whenever the sample variance of the xi is not zero,
which only rules out the case where each xi is the same value. In other
words, we do not have to assume anything about the population to
calculate β̂1 .
I Once we have β̂1 , we compute β̂0 = ȳ − β̂1 x̄. This is the OLS intercept
estimate.
I These days, one lets a computer do the calculations, which can be tedious
even if n is small.
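A minimal sketch of that computation, using the closed-form expressions (28) and (29) on illustrative data:

```python
# Sketch: the closed-form OLS estimates of equations (28)-(29).
import numpy as np

def ols_simple(x, y):
    """Return (b0_hat, b1_hat) for the simple regression of y on x."""
    x_bar, y_bar = x.mean(), y.mean()
    b1_hat = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
print(ols_simple(x, y))   # same answer as solving the normal equations directly
```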
Why “Ordinary Least Squares Estimates"

I For any candidates β̂0 and β̂1 , define a fitted value for each data point i as

ŷi = β̂0 + β̂1 xi (30)

We have n of these. It is the value we predict for yi given that x has taken
on the value xi .
I The mistake we make is the residual:

ûi = yi − ŷi = yi − β̂0 − β̂1 xi , (31)

and we have n residuals.



I Suppose we measure the size of the mistake, for each i, by squaring the
residual: ûi2 ≥ 0. Then we add them all up:

Σ_{i=1}^n ûi² = Σ_{i=1}^n (yi − β̂0 − β̂1 xi )² (34)

I This quantity is called the sum of squared residuals.


I If we choose β̂0 and β̂1 to minimize the sum of squared residuals it can be
shown (using calculus or other arguments) that the solutions are the slope
and intercept estimates we obtained before.
Deriving the ‘Ordinary Least Squares Estimates’

min_{β̂0 , β̂1} Σ_{i=1}^n ûi² = Σ_{i=1}^n (yi − β̂0 − β̂1 xi )² (36)

FOC for β̂0 : Σ_{i=1}^n 2 (yi − β̂0 − β̂1 xi )(−1) = 0
⇒ Σ_{i=1}^n yi − n β̂0 − β̂1 Σ_{i=1}^n xi = 0
⇒ β̂0 = ȳ − β̂1 x̄

FOC for β̂1 : Σ_{i=1}^n 2 (yi − β̂0 − β̂1 xi )(−xi ) = 0
⇒ Σ_{i=1}^n xi yi − β̂0 Σ_{i=1}^n xi − β̂1 Σ_{i=1}^n xi² = 0
⇒ (substituting β̂0 = ȳ − β̂1 x̄) Σ_{i=1}^n xi (yi − ȳ ) = β̂1 [ Σ_{i=1}^n xi (xi − x̄) ]

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ ) / Σ_{i=1}^n (xi − x̄)² = Sample Covariance(xi , yi ) / Sample Variance(xi ) (37)

Same equations as before!
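To see the two derivations agree numerically, the following sketch (assuming scipy is available) minimises the sum of squared residuals and recovers the same estimates as the closed-form solution above:

```python
# Sketch: minimising the sum of squared residuals numerically gives the same
# estimates as the method-of-moments / closed-form solution.
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative data
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def ssr(params):
    b0, b1 = params
    return ((y - b0 - b1 * x) ** 2).sum()

res = minimize(ssr, x0=np.zeros(2))
print(res.x)   # matches (b0_hat, b1_hat) from the closed-form expressions
```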


Deriving the OLS estimators - 2 approaches

I Method of moments: impose population assumptions on sample


I OLS: minimise the sum of squared deviations of yi from the sample regression line (i.e. the residuals ûi )
I NB: the OLS regression coefficients can be derived on any sample of the
data, independently of the model
I But: only under certain assumptions about the true model can we hope
that our OLS regression coefficients are “good" estimators of some true
population coefficients
Interpreting OLS estimates

I Once we have the numbers β̂0 and β̂1 for a given data set, we write the
OLS regression line as a function of x:

ŷ = β̂0 + β̂1 x (38)


I The OLS regression line allows us to predict y for any (sensible) value of
x. It is also called the sample regression function.
I The intercept, β̂0 , is the predicted y when x = 0. (The prediction is
usually meaningless if x = 0 is not possible.)
I The slope, β̂1 , allows us to predict changes in y for any (reasonable)
change in x:

∆ŷ = β̂1 ∆x (39)


I If ∆x = 1, so that x increases by one unit, then ∆ŷ = β̂1 . So β̂1 is the “predicted change in y when x changes by 1 unit”.
EXAMPLE: Effects of Education on Hourly Wage (WAGE2.DTA)

I Data are from 1991 on men only. wage is reported in dollars per hour,
educ is number of completed years of schooling.
I The estimated equation is

ŵage = −5.12 + 1.43 educ (40)
n = 759 (41)
I Below we discuss the negative intercept. Literally, it says that wage is
predicted to be −$5.12 when educ = 0!
I Each additional year of schooling is estimated to be worth $1.43.
EXAMPLE: Effects of Education on Hourly Wage (WAGE2.DTA)

ŵage = −5.12 + 1.43 educ (42)
n = 759 (43)

I Note: We do not know the true population coefficients β0 and β1 . Rather, β̂0 = −5.12 and β̂1 = 1.43 are our estimates from this particular sample of 759 men. These estimates may or may not be close to the population values.
I If we obtain another sample of 759 men the estimates would almost
certainly change. But we can use the sampling distribution of the OLS
estimators to derive confidence intervals and test hypotheses about β0 and
β1 .
EXAMPLE: Effects of Education on Hourly Wage (WAGE2.DTA)

I The function

ŵage = −5.12 + 1.43 educ (44)

is the OLS (or sample) regression line.
I Plugging in educ = 0 gives the silly prediction ŵage = −5.12. Extrapolating outside the range of the data can produce strange predictions. There are no men in the sample with educ < 8.
I When educ = 8,

ŵage = −5.12 + 1.43(8) = 6.32 (45)
I The predicted hourly wage at eight years of education is $6.32, which we
can think of as our estimate of the average wage in the population when
educ = 8. But no one in the sample earns exactly $6.32: some earn more,
some earn less. One worker earns $6.25, which is close.
3. Properties of OLS on any Sample of Data

1. Algebraic Properties of OLS estimates, that hold mechanically / independently of the underlying population model
2. Goodness of fit: R 2
3. Properties of OLS on any Sample of Data

I Once we have the sample regression function

ŷ = β̂0 + β̂1 x (46)

we get the OLS fitted values by plugging the xi into the equation:

ŷi = β̂0 + β̂1 xi , i = 1, 2, . . . , n (47)


I The OLS residuals are

ûi = yi − ŷi = yi − β̂0 − β̂1 xi , i = 1, 2, . . . , n (48)


1. Algebraic Properties of OLS Statistics
From the first OLS normal equation (22),

N⁻¹ Σ_{i=1}^N (yi − β̂0 − β̂1 xi ) = 0 (49)

1. The OLS residuals always add up to zero:

Σ_{i=1}^n ûi = 0 (50)

2. Because yi = ŷi + ûi by definition,

n⁻¹ Σ_{i=1}^n yi = n⁻¹ Σ_{i=1}^n ŷi + n⁻¹ Σ_{i=1}^n ûi

and so

ȳ = ŷ̄ (51)

So the sample average of the actual yi is the same as the sample average of the fitted values ŷi .
Algebraic Properties of OLS Statistics 2

From the second OLS normal equation (23), N⁻¹ Σ_{i=1}^N xi (yi − β̂0 − β̂1 xi ) = 0:

4. The sample covariance (and therefore the sample correlation) between the explanatory variable and the residuals is always zero:

Σ_{i=1}^n xi ûi = 0 (52)

5. Because the ŷi are linear functions of the xi , the fitted values and residuals are uncorrelated, too:

Σ_{i=1}^n ŷi ûi = Σ_{i=1}^n (β̂0 + β̂1 xi ) ûi = β̂0 Σ_{i=1}^n ûi + β̂1 Σ_{i=1}^n xi ûi = 0 (53)

I Properties (50) to (53) hold by construction: β̂0 and β̂1 were chosen to make them true (see the verification sketch below).
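A quick verification sketch of properties (50) to (53) on simulated data (illustrative parameters only):

```python
# Sketch: verify the algebraic properties (50)-(53) on simulated data.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)   # made-up population

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

print(u_hat.sum())              # ~0: residuals add up to zero, (50)
print(y.mean() - y_hat.mean())  # ~0: same sample averages, (51)
print((x * u_hat).sum())        # ~0: residuals uncorrelated with x, (52)
print((y_hat * u_hat).sum())    # ~0: residuals uncorrelated with fits, (53)
```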
Algebraic Properties of OLS Statistics 3: ȳ = β̂0 + β̂1 x̄

I From (22), the point (x̄, ȳ) is always on the OLS regression line. That is, if we plug in the average for x, we predict the sample average for y:

N⁻¹ Σ_{i=1}^N (yi − β̂0 − β̂1 xi ) = 0
⇒ ȳ = β̂0 + β̂1 x̄ (54)

Again, we chose the estimates to make this true.

I This implies we can write the sample regression function in terms of “mean deviations” as

β̂0 = ȳ − β̂1 x̄
yi = β̂0 + β̂1 xi + ûi
⇒ yi − ȳ = β̂1 (xi − x̄) + ûi (55)

Sometimes this helps to interpret β̂1 . It also means we can estimate β1 from
N⁻¹ Σ_{i=1}^N (xi − x̄)(yi − ȳ − β̂1 (xi − x̄)) = 0.
2. Goodness-of-Fit

I For each observation, write yi = ŷi + ûi
I Define the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (or sum of squared residuals, SSR) as

SST = Σ_{i=1}^n (yi − ȳ )² (56)
SSE = Σ_{i=1}^n (ŷi − ȳ )² (57)
SSR = Σ_{i=1}^n ûi² (58)

I Each of these is a sample variance when divided by n (or n − 1). SST/n is the sample variance of yi , SSE/n is the sample variance of ŷi , and SSR/n is the sample variance of ûi .
Goodness-of-Fit

I By writing

SST = Σ_{i=1}^n (yi − ȳ )² = Σ_{i=1}^n [(yi − ŷi ) + (ŷi − ȳ )]² (59)
    = Σ_{i=1}^n [ûi + (ŷi − ȳ )]² (60)

and using that the fitted values and residuals are uncorrelated,

SST = SSE + SSR (61)

I Assuming SST > 0, we can define the fraction of the total variation in yi that is explained by xi (or the OLS regression line) as

R² = SSE/SST = 1 − SSR/SST (62)

I Called the R-squared of the regression.
I It can be shown to equal the square of the correlation between yi and ŷi . Therefore,

0 ≤ R² ≤ 1 (63)
Goodness-of-Fit

I R² = 0 means no linear relationship between yi and xi ; R² = 1 means a perfect linear relationship.
I As R² increases, the yi are closer and closer to falling on the OLS regression line.
I Do not fixate on R². It is a useful summary measure, but it tells us nothing about causality. Having a “high” R-squared is neither necessary nor sufficient to infer causality.
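A sketch computing SST, SSE, SSR and R² on simulated data, checking both the decomposition (61) and that R² equals the squared correlation between yi and ŷi :

```python
# Sketch: compute SST, SSE, SSR and R^2, and check the identities of this slide.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)   # made-up population

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = ((y - y.mean()) ** 2).sum()
sse = ((y_hat - y.mean()) ** 2).sum()
ssr = (u_hat ** 2).sum()
print(np.isclose(sst, sse + ssr))                   # SST = SSE + SSR, (61)
print(sse / sst, np.corrcoef(y, y_hat)[0, 1] ** 2)  # R^2 = corr(y, y_hat)^2
```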
4. Units of Measurement and Functional Form
Units of Measurement

I In the simple linear regression model y = β0 + β1 x + u, the coefficients are interpreted as follows:
1. β0 is the value of y when x = 0
2. β1 is the change in y when x changes by 1 unit, ceteris paribus (i.e. holding all else, including u, fixed). Sometimes easier to interpret from ∆y = β1 ∆x + ∆u
I It is very important to know how y and x are measured in order to interpret regression functions. Consider an equation estimated from CEOSAL1.DTA, where annual CEO salary is in thousands of dollars and the return on equity is a percent:

ŝalary = 963.191 + 18.501 roe (64)
n = 209, R² = .0132 (65)

I When roe = 0 (it never is in the data), ŝalary = 963.191. But salary is in thousands of dollars, so this is $963,191.
I A one percentage point increase in roe increases predicted salary by 18.501, or $18,501.
Units of Measurement

I What if we measure roe as a decimal number, rather than a percent?


Define

roedec = roe/100 (66)


I What will happen to the intercept, slope, and R² when we regress salary on roedec?
I To see the effect, substitute roe = 100 · roedec in (64).
I Nothing happens to the intercept: roedec = 0 is the same as roe = 0. But the slope will increase by a factor of 100. The goodness-of-fit does not change.
I The new regression is

ŝalary = 963.191 + 1850.1 roedec (67)
n = 209, R² = .0132 (68)

I Now a one percentage point change in roe is the same as ∆roedec = .01, and so we get the same effect as before.
Units of Measurement

I What if we measure salary in dollars, rather than thousands of dollars, so salarydol = 1000 · salary?
I Substitute salary = salarydol/1000 and simplify: both the intercept and slope get multiplied by 1000:

ŝalarydol = 963,191 + 18,501 roe (69)
n = 209, R² = .0132 (70)
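The following sketch replicates these rescaling effects on simulated data (not the CEOSAL1.DTA sample; all parameters are made up):

```python
# Sketch: units of measurement. Rescaling x by 1/100 multiplies the slope by
# 100; rescaling y by 1000 multiplies both coefficients by 1000.
import numpy as np

def fit(x, y):
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(4)
roe = rng.uniform(5, 25, size=200)                          # percent (made up)
salary = 900 + 20 * roe + rng.normal(scale=500, size=200)   # thousands (made up)

print(fit(roe, salary))          # baseline (b0, b1)
print(fit(roe / 100, salary))    # roedec: slope multiplied by 100
print(fit(roe, 1000 * salary))   # salarydol: both coefficients times 1000
```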
Using the Natural Logarithm in Simple Regression

I Recall the wage example:

ŵage = −5.12 + 1.43 educ (71)
n = 759, R² = .133 (72)
I Might be an okay approximation, but unsatisfying for a couple of reasons.
First, the negative intercept is a bit strange (even though the equation
gives sensible predictions for education ranging from 8 to 20).
I Second reason is more important: the dollar value of another year of
schooling is constant. So the 16th year of education is worth the same as
the second. We expect additional years of schooling to be worth more, in
dollar terms, than previous years.
I How can we incorporate an increasing effect? One way is to postulate a
constant percentage effect. We can approximate percentage changes using
the natural log (log for me, but also ln is common).
Using the Natural Logarithm in Simple Regression

I Now let the dependent variable be log(wage):

log(wage) = β0 + β1 educ + u (73)

Holding u fixed,

∆ log(wage) = β1 ∆educ (74)

so
β1 = ∆log(wage)/∆educ (75)
Using the Natural Logarithm in Simple Regression

β1 = ∆log(wage)/∆educ (76)

I For small changes of w from w1 : log(w) ≈ log(w1) + (w − w1)/w1 (first-order Taylor approximation)
I Thus, a small log change equals the proportionate change:
∆log(wage) = log(w2) − log(w1) ≈ [log(w1) + (w2 − w1)/w1 ] − log(w1) = ∆w/w1
I Note that a proportionate change ∆x/x (e.g. 0.05) corresponds to 100 · (∆x/x) percent (e.g. 5 percent), since ∆x/x = %∆x/100. Hence

∆log(wage) = (1/100) · %∆wage = β1 ∆educ + ∆u (77)

I This gives a simple interpretation of β1 : multiplying by 100,

100 β1 ≈ %∆wage when ∆educ = 1 (78)

I In this example, 100 β1 is often called the return to education (just like an investment).
I This measure is free of the units of measurement of wage (currency, price level).
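A sketch of the log-level model on simulated data (not the WAGE2.DTA sample; the 9 percent return used to generate the data is made up):

```python
# Sketch: log-level model. 100 * b1_hat approximates the percent change in wage
# per extra year of schooling.
import numpy as np

rng = np.random.default_rng(5)
educ = rng.integers(8, 21, size=1000).astype(float)
# generate log(wage) with a (made-up) 9 percent return to schooling:
log_wage = 0.6 + 0.09 * educ + rng.normal(scale=0.4, size=1000)

b1 = (((educ - educ.mean()) * (log_wage - log_wage.mean())).sum()
      / ((educ - educ.mean()) ** 2).sum())
print(100 * b1)   # ~9: each extra year of schooling raises wage by about 9%
```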
Using the Natural Logarithm in Simple Regression

I We still want to explain wage in terms of educ! This gives us a way to get
a (roughly) constant percentage effect.
I Predicting wage is more complicated but not usually the most interesting
question.
I The next graph shows the relationship between wage and educ when
u = 0.
Using the Natural Logarithm in Simple Regression

I We can also use the log on both sides of the equation to get constant
elasticity models. For example, if

log(salary ) = β0 + β1 log(sales) + u (79)

then

β1 = ∆log(salary)/∆log(sales) ≈ %∆salary/%∆sales (80)
I The elasticity is free of units of salary and sales.
I A constant elasticity model for salary and sales makes more sense than a
constant dollar effect.
I The elasticity does not change if sales is measured in, say, billions. (So, define lsalesbil = log(sales/1000).) Only the intercept will change, as the sketch below illustrates.
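A sketch of the constant-elasticity model on simulated data (illustrative parameters): rescaling sales shifts log(sales) by a constant, leaving the slope (the elasticity) unchanged:

```python
# Sketch: log-log (constant elasticity) model. Rescaling sales shifts log(sales)
# by a constant, so the slope (the elasticity) is unchanged; only the intercept
# moves.
import numpy as np

def fit(x, y):
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(6)
log_sales = rng.normal(7.0, 1.0, size=300)   # made-up, sales in millions
log_salary = 4.0 + 0.25 * log_sales + rng.normal(scale=0.3, size=300)

print(fit(log_sales, log_salary))                 # elasticity ~0.25
print(fit(log_sales - np.log(1000), log_salary))  # sales in billions: same slope
```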
Using the Natural Logarithm in Simple Regression

Model         Dep. Var.   Indep. Var.   Interpretation of β1
Level-Level   y           x             ∆y = β1 ∆x
Level-Log     y           log(x)        ∆y = (β1/100) %∆x
Log-Level     log(y)      x             %∆y = (100 β1) ∆x
Log-Log       log(y)      log(x)        %∆y = β1 %∆x
What is “linear"?

I The possibility of using the natural log to get nonlinear relationships


between y and x raises a question: What do we mean now by “linear”
regression?
I The answer is that the model is linear in the parameters, β0 and β1 .
I In other words, we must be able to write the model as g(y) = β0 + β1 f(x) + u for known transformations g(·) and f(·).
I We can use any transformations of the dependent and independent
variables to get interesting interpretations for the parameters.
Recap

I We defined the simple linear regression model

y = β0 + β1 x + u (81)
I We saw how we needed to assume E (u|x) = 0 to identify β0 , β1 from
observations on x, y .
I Alternatively, we could have assumed Cov (x, u) = 0, or independence of
x, u.
I We derived the OLS estimator of β1 ,

β̂1 = Σ_{i=1}^n (xi − x̄)(yi − ȳ ) / Σ_{i=1}^n (xi − x̄)² = Sample Covariance(xi , yi ) / Sample Variance(xi ) (82)

using the method of moments and OLS (with β̂0 = ȳ − β̂1 x̄).


I We showed algebraic properties of OLS estimates.
I And derived R 2 as a measure of goodness of fit.
I We also discussed changing the units of x and y , and using log(y), log(x).
