CH 02

Uploaded by

Pan Kan
Chapter 2: The Simple Regression Model

© 2016 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a
certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use. © kentoh/Shutterstock.

● Definition of the simple linear regression model

\[ y = \beta_0 + \beta_1 x + u \]

“Explains variable \( y \) in terms of variable \( x \)”

• \( y \): dependent variable, explained variable, response variable, …
• \( x \): independent variable, explanatory variable, regressor, …
• \( u \): error term, disturbance, unobservables, …
• \( \beta_0 \): intercept
• \( \beta_1 \): slope parameter

● Interpretation of the simple linear regression model

“Studies how \( y \) varies with changes in \( x \):”

\[ \frac{\Delta y}{\Delta x} = \beta_1 \quad \text{as long as} \quad \frac{\Delta u}{\Delta x} = 0 \]

• By how much does the dependent variable change if the independent variable is increased by one unit?
• The interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.

● The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.

● Example: Soybean yield and fertilizer

\[ yield = \beta_0 + \beta_1 fertilizer + u \]

• \( u \) contains rainfall, land quality, presence of parasites, …
• \( \beta_1 \) measures the effect of fertilizer on yield, holding all other factors fixed.

● Example: A simple wage equation

\[ wage = \beta_0 + \beta_1 educ + u \]

• \( u \) contains labor force experience, tenure with current employer, work ethic, intelligence, …
• \( \beta_1 \) measures the change in hourly wage given another year of education, holding all other factors fixed.

● Assumptions about the error term

• \( E(u \mid x) = E(u) \): the expected value of \( u \) does not depend on \( x \).
• \( E(u) = 0 \): the average value of \( u \) in the population is 0.

The second assumption is simply a normalization of \( u \) if there is an intercept in the equation. To see this, suppose
\[ y = \beta_0 + \beta_1 x + u \quad \text{and} \quad E(u) = \alpha_0 \neq 0. \]
This is equivalent to
\[ y = (\beta_0 + \alpha_0) + \beta_1 x + \tilde{u}, \quad \text{where } \tilde{u} = u - \alpha_0 \text{ and } E(\tilde{u}) = 0. \]

● Assumptions about the error term (continued)

Together, the two assumptions give \( E(u \mid x) = 0 \). This has an important implication that we’ll refer back to later: \( x \) and \( u \) are uncorrelated, \( \mathrm{Cov}(x, u) = E(xu) = 0 \).

Proof: By the law of iterated expectations, \( E(xu) = E[E(xu \mid x)] = E[x \, E(u \mid x)] = 0 \). Since \( E(u) = 0 \), \( \mathrm{Cov}(x, u) = E(xu) - E(x)E(u) = 0 \).

● Conditional mean independence assumption

\[ E(u \mid x) = 0 \]

The explanatory variable must not contain information about the mean of the unobserved factors.

● Example: wage equation

\[ wage = \beta_0 + \beta_1 educ + u, \quad \text{where } u \text{ contains e.g. intelligence …} \]

The conditional mean independence assumption is unlikely to hold because individuals with more education will also be more intelligent on average.
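
A small simulation can make the concern concrete. The sketch below is only an illustration under assumed numbers: the variable `ability`, the true slope of 0.5, and every coefficient in the data-generating process are hypothetical, not estimates from any real data set. It fits a least-squares line with `np.polyfit` and shows that the fitted slope drifts away from the true slope when the unobserved factor is correlated with education.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# Hypothetical unobserved factor that raises both education and wages.
ability = rng.normal(0.0, 1.0, n)
educ = 12.0 + 2.0 * ability + rng.normal(0.0, 1.0, n)

# The error term u contains ability, so E(u | educ) is not constant.
u = 1.0 * ability + rng.normal(0.0, 1.0, n)

true_slope = 0.5  # assumed population slope, for illustration only
wage = 1.0 + true_slope * educ + u

# Least-squares line of wage on educ; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(educ, wage, 1)

print("true slope:  ", true_slope)
print("fitted slope:", round(slope, 3))
```

Because education and the error term move together in this setup, the fitted slope attributes part of the effect of ability to education.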

● Population regression function (PRF)
• The conditional mean independence assumption implies that
\[ E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x \]
• This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.
• Interpretation? \( \beta_0 \) is the intercept and \( \beta_1 \) is the slope of this line.

[Figure: population regression function. For individuals with a given value of \( x \), the average value of \( y \) is \( E(y \mid x) = \beta_0 + \beta_1 x \).]

● Deriving the ordinary least squares estimates

● In order to estimate the regression model one needs data

● A random sample of \( n \) observations: \( \{(x_i, y_i) : i = 1, \dots, n\} \)

• First observation: \( (x_1, y_1) \)
• Second observation: \( (x_2, y_2) \)
• Third observation: \( (x_3, y_3) \)
• …
• n-th observation: \( (x_n, y_n) \)

Here \( y_i \) is the value of the dependent variable and \( x_i \) the value of the explanatory variable of the i-th observation.

● Each observation includes an error term “u”: \( y_i = \beta_0 + \beta_1 x_i + u_i \)

● We try to estimate this line by estimating \( \beta_0 \) and \( \beta_1 \):

Fitted values and residuals: \( \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i \), \( \hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i \)

● How should we estimate \( \beta_0 \) and \( \beta_1 \)?

● Minimize the sum of squared regression residuals

\[ \min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} \hat{u}_i^2, \quad \text{i.e.} \quad \min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 \]

● FOCs w.r.t. \( \hat\beta_0 \) and \( \hat\beta_1 \) are:

\[ \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \]
\[ \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0 \]

● First, rewrite the FOC w.r.t. \( \hat\beta_0 \) (divide by \( n \)):

\[ \bar{y} - \hat\beta_0 - \hat\beta_1 \bar{x} = 0 \quad \Rightarrow \quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \]

● Now, plug this into the FOC w.r.t. \( \hat\beta_1 \):

\[ \sum_{i=1}^{n} x_i (y_i - \bar{y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i) = 0 \quad \Rightarrow \quad \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) \]

● Ordinary Least Squares (OLS) estimates

\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \]

● But we usually write them in a slightly different form:

\[ \hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

● Showing that they are equivalent follows from the basic properties of summations.

● For the denominator,

\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i (x_i - \bar{x}) - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i (x_i - \bar{x}), \]
because \( \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \).

● I leave the numerator for your problem set.

● Why do we use the second form?

● Dividing numerator and denominator by \( n - 1 \) shows that \( \hat\beta_1 \) is the sample covariance of x and y divided by the sample variance of x. The denominator is positive whenever x varies in the sample, so \( \hat\beta_1 \) is positive (negative) if x and y are positively (negatively) correlated in the sample.
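
The two ways of writing the slope estimate can be checked numerically. The following is a minimal sketch on synthetic data; the sample size, the intercept 1.0, the slope 2.0 and all variable names are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

# Synthetic sample (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=200)

x_bar, y_bar = x.mean(), y.mean()

# First form: straight from the first-order conditions.
b1_foc = np.sum(x * (y - y_bar)) / np.sum(x * (x - x_bar))

# Second form: sum of cross-deviations over sum of squared deviations.
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept from the rewritten first-order condition.
b0 = y_bar - b1 * x_bar

print("slope (FOC form):   ", b1_foc)
print("slope (usual form): ", b1)
print("intercept:          ", b0)
```

Both slope formulas agree up to floating-point error, and the result also matches a generic least-squares fit such as `np.polyfit(x, y, 1)`.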

● Fit a regression line through the data points as well as possible:

[Figure: scatter plot of the data with the fitted regression line; one data point, the i-th, is marked as an example]

● Terminology

\[ salary = \beta_0 + \beta_1 roe + u \]

• \( salary \): salary in thousands of dollars
• \( roe \): average return on equity of the CEO’s firm (%)

● More often, we will just say that we are regressing salary on return on equity.
• Note that we regress the dependent variable on the independent variable, not vice versa.

● CEO salary and return on equity

● Fitted regression

\[ \widehat{salary} = \hat\beta_0 + 18.501 \, roe \]

• \( \hat\beta_0 \) is the intercept.
• If the return on equity increases by 1 percentage point, then salary is predicted to change by $18,501.

● What does the intercept tell us?

● Causal interpretation?

● Three possible stories:
• x -> y: When stock prices rise, CEOs get a pay raise
• z -> x & z -> y: Good CEOs tend to get a bigger paycheck, and their companies usually do well
• y -> x: High CEO compensation gives CEOs an incentive to work hard -> profit rises, stock price rises

● Assuming it is causal, is this a big effect?

[Figure: the fitted regression line (depends on the sample) compared with the unknown population regression line]
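
The point that the fitted line depends on the sample can be illustrated by drawing many samples from one known population line. This is a sketch under assumed values: the population parameters `beta0 = 1.0` and `beta1 = 2.0`, the sample size of 50, and the number of repetitions are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1 = 1.0, 2.0  # hypothetical population intercept and slope

slopes = []
for _ in range(1000):
    # Each pass draws a fresh random sample from the same population.
    x = rng.uniform(0.0, 1.0, 50)
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, 50)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares line
    slopes.append(slope)

slopes = np.array(slopes)
print("smallest fitted slope:", slopes.min())
print("average fitted slope: ", slopes.mean())
print("largest fitted slope: ", slopes.max())
```

Each sample gives a different fitted slope, while the population slope stays fixed; in this setup the fitted slopes scatter around the population value.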

● Wage and education

\[ wage = \beta_0 + \beta_1 educ + u \]

• \( wage \): hourly wage in dollars
• \( educ \): years of education

● Fitted regression

\[ \widehat{wage} = \hat\beta_0 + 0.54 \, educ \]

• \( \hat\beta_0 \) is the intercept.
• In the sample, one more year of education was associated with an increase in hourly wage of $0.54.

● How do we interpret β0?

● Causal interpretation?

● For now, assume a causal interpretation. Then what’s the return to your four years of college education?
• Assume you work 2000 hours per year (50 weeks x 40 hours/week)

4 x 0.54 x 2000 = $4,320 every year
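
The same back-of-the-envelope calculation, written out as a short sketch. The numbers come from the slide; the variable names are illustrative, and the causal reading of the slope is an assumption, as stated above.

```python
# Slope from the fitted wage regression: dollars per hour per year of education.
slope_per_year = 0.54
years_of_college = 4
hours_per_year = 50 * 40  # 50 weeks x 40 hours/week

# Predicted increase in hourly wage, scaled up to a full working year.
hourly_gain = years_of_college * slope_per_year
annual_gain = hourly_gain * hours_per_year

print(f"hourly gain: ${hourly_gain:.2f}")
print(f"annual gain: ${annual_gain:,.0f}")
```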

● Was it worth it?

● Voting outcomes and campaign expenditures (two parties)

\[ voteA = \beta_0 + \beta_1 shareA + u \]

• \( voteA \): percentage of the vote for candidate A
• \( shareA \): candidate A’s percentage of campaign expenditures

● Fitted regression

\[ \widehat{voteA} = \hat\beta_0 + 0.464 \, shareA \]

• \( \hat\beta_0 \) is the intercept.
• If candidate A’s share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote.
● What does β0 tell us?

● Causal interpretation?

● Three possible stories:


• x -> y: ?
• z -> x & z -> y: ?
• y -> x: ? (hint: policy influence)

● Properties of OLS on any sample of data

● Fitted values and residuals

• Fitted or predicted values: \( \hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i \)
• Deviations from the regression line (= residuals): \( \hat{u}_i = y_i - \hat{y}_i \)

● Algebraic properties of OLS regression (property I)

\[ \sum_{i=1}^{n} \hat{u}_i = 0 \]
Deviations from the regression line sum up to zero.

● Algebraic properties of OLS regression (property II)

\[ \sum_{i=1}^{n} x_i \hat{u}_i = 0 \]
Interpretation: the sample covariance between deviations and regressors is zero.

● Let’s get some intuition for these principles graphically...

[Figures, repeated for four candidate lines: a scatter plot of y against x with the candidate line, followed by a plot of that line’s residuals against x]

● Does it look like a convincing “best guess” of y given x?

● What’s wrong with the residuals? How do we fix it?

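● Algebraic properties of OLS regression (property III)

\[ \bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x} \]
The sample averages of y and x lie on the regression line.

Consider the fitted regression \( \hat{y} = \hat\beta_0 + \hat\beta_1 x \).

If we plug in \( x = \bar{x} \), we always get \( \hat{y} = \bar{y} \).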
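
The three algebraic properties hold in any sample, so they can be checked on arbitrary data. A minimal sketch, assuming synthetic data and illustrative variable names (`prop1`, `prop2`, `prop3` are just labels for the three checks):

```python
import numpy as np

# Any sample will do; these values are synthetic and purely illustrative.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 100)
y = 0.5 + 3.0 * x + rng.normal(0.0, 1.0, 100)

# OLS slope and intercept.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x   # fitted values
u_hat = y - y_hat     # residuals

prop1 = u_hat.sum()                       # property I: residuals sum to zero
prop2 = np.sum(x * u_hat)                 # property II: residuals orthogonal to x
prop3 = (b0 + b1 * x.mean()) - y.mean()   # property III: line passes through the means

print(prop1, prop2, prop3)
```

All three quantities are zero up to floating-point rounding, whatever data are used.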

● Let’s get some intuition for these principles graphically...

[Figure: scatter plot of y against x]

For example, CEO number 12’s salary was $526,023 lower than predicted using the information on his firm’s return on equity.

● Goodness-of-Fit

“How well does the explanatory variable explain the dependent variable?”

[Figure: two scatter plots of y against x, side by side]

● Measures of variation

• Total sum of squares, \( SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \): represents total variation in the dependent variable
• Explained sum of squares, \( SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \): represents variation explained by the regression
• Residual sum of squares, \( SSR = \sum_{i=1}^{n} \hat{u}_i^2 \): represents variation not explained by the regression

Dividing SST by \( n - 1 \) gives the sample variance of \( y \).

● Decomposition of total variation

\[ SST = SSE + SSR \]
(total variation = explained part + unexplained part)

To see this, write \( y_i - \bar{y} = \hat{u}_i + (\hat{y}_i - \bar{y}) \), so that
\[ SST = \sum_{i=1}^{n} \left[ \hat{u}_i + (\hat{y}_i - \bar{y}) \right]^2 = SSR + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + SSE \]

● So we are left wanting to show that the middle term is 0

● What are these terms equal to?

\[ \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = \hat\beta_0 \sum_{i=1}^{n} \hat{u}_i + \hat\beta_1 \sum_{i=1}^{n} x_i \hat{u}_i - \bar{y} \sum_{i=1}^{n} \hat{u}_i \]
Each sum on the right is zero by algebraic properties I and II.

● Goodness-of-fit measure (R-squared)

\[ R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST} \]
R-squared measures the fraction of the total variation that is explained by the regression.

● \( R^2 \) is always between 0 and 1
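
The decomposition of total variation and the resulting R-squared can be computed directly. The sketch below uses synthetic data; the sample size and the data-generating values are assumptions made for illustration only.

```python
import numpy as np

# Synthetic sample (illustrative values only).
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 150)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 150)

# OLS fit.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum(u_hat ** 2)               # residual sum of squares

r2 = sse / sst

print("SST:", sst)
print("SSE + SSR:", sse + ssr)
print("R-squared:", r2)
```

In simple regression, R-squared also equals the squared sample correlation between x and y, which gives an independent check on the calculation.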

● \( R^2 = 0 \)
x tells us nothing at all about y

[Figure: y and fitted values plotted against x]

● \( R^2 = 1 \)
All of the data points fall perfectly on the regression line

[Figure: scatter plot of y against x]

● CEO salary and return on equity

The regression explains only 1.3% of the total variation in salaries.

● Voting outcomes and campaign expenditures

The regression explains 85.6% of the total variation in election outcomes.

● Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!

● Changing units – Dependent variable
• Suppose we regress y on x and obtain \( \hat{y} = \hat\beta_0 + \hat\beta_1 x \)
• What happens if we change the unit of measurement of y?
• Suppose \( y^* = c \, y \) for some constant \( c \neq 0 \). Then \( \bar{y}^* = c \, \bar{y} \) and
\[ \hat\beta_1^* = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(c \, y_i - c \, \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = c \, \hat\beta_1 \]
• What about the intercept?
• Recall that \( \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \), so \( \hat\beta_0^* = c \, \bar{y} - c \, \hat\beta_1 \bar{x} = c \, \hat\beta_0 \)

● Let’s verify this in Stata

● Changing units – Independent variable
• Suppose we regress y on x and obtain \( \hat{y} = \hat\beta_0 + \hat\beta_1 x \)
• What happens if we change the unit of measurement of x?
• Suppose \( x^* = c \, x \) for some constant \( c \neq 0 \). Then \( \bar{x}^* = c \, \bar{x} \) and
\[ \hat\beta_1^* = \frac{\sum_{i=1}^{n} (c \, x_i - c \, \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (c \, x_i - c \, \bar{x})^2} = \frac{\hat\beta_1}{c} \]
• What about the intercept?
• Recall that \( \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} \), so \( \hat\beta_0^* = \bar{y} - \frac{\hat\beta_1}{c} \, c \, \bar{x} = \hat\beta_0 \)

● Let’s verify this in Stata

● Changing units

• May change the magnitude of our OLS estimates, but their interpretation remains the same
• No impact on \( R^2 \): it is unit independent
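
The slides verify the effect of changing units in Stata. As a stand-in, here is a hedged Python sketch on synthetic data; the helper `ols`, the scaling factors (1000 for y, 1/100 for x) and all data-generating values are hypothetical choices for illustration.

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope, R-squared) for a simple regression of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    u_hat = y - b0 - b1 * x
    r2 = 1.0 - np.sum(u_hat ** 2) / np.sum((y - y.mean()) ** 2)
    return b0, b1, r2

# Synthetic data: think of x as a percentage and y as thousands of dollars.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 50.0, 300)
y = 900.0 + 20.0 * x + rng.normal(0.0, 300.0, 300)

base = ols(x, y)
y_rescaled = ols(x, 1000.0 * y)  # y measured in dollars instead of thousands
x_rescaled = ols(x / 100.0, y)   # x measured as a fraction instead of a percent

print("original:  ", base)
print("y x 1000:  ", y_rescaled)
print("x / 100:   ", x_rescaled)
```

Multiplying y by 1000 multiplies both the intercept and the slope by 1000. Dividing x by 100 multiplies the slope by 100 and leaves the intercept unchanged. R-squared is the same in all three regressions.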

● Non-linear transformations
• More sophisticated transformations of the regression variables (beyond changing units), for example ln(y)

● Why might we want to take logs?
• Any ideas?

● Let’s see some possible justifications

● Wage and education

[Figure: hourly wage in dollars plotted against years of education (educ)]

● Does this look like a “good” regression?


© 2016 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
The Simple
Regression Model
● Let’s look at the residuals

[Figure: residuals (vertical axis, -5 to 15) plotted against educ (horizontal axis, 0 to 20)]

● What’s wrong with the residuals?


● Let’s look at the residuals

[Figure: density of the residuals (horizontal axis, -5 to 15; vertical axis, 0 to .25)]

● What’s wrong with the residuals?


● Incorporating nonlinearities: Semi-logarithmic form

● Regression of log wages on years of education

$$\log(wage) = \beta_0 + \beta_1 educ + u$$

where log(wage) is the natural logarithm of wage

● Let’s look at the residuals again

[Figure: residuals plotted against educ (0 to 20) in two panels: wage (residuals from -5 to 15) and log(wage) (residuals from -2 to 2)]
● Let’s look at the residuals again

[Figure: density of the residuals in two panels: wage (residuals from -5 to 15) and log(wage) (residuals from -2 to 2)]
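
The comparison of residuals from the level and the log specification can be imitated on simulated data. This is only a sketch: the data-generating process (log wages linear in education with a normal error) and all numbers are assumptions chosen for illustration, and `skewness` is a small helper defined here. Under that assumption the residuals from the wage regression are right-skewed, while those from the log(wage) regression are roughly symmetric.

```python
import numpy as np

def skewness(v):
    """Sample skewness: mean of the cubed standardized values."""
    v = np.asarray(v, dtype=float)
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

# Simulated (hypothetical) data in which log(wage) is linear in educ
rng = np.random.default_rng(1)
n = 2000
educ = rng.integers(6, 19, size=n).astype(float)
log_wage = 0.6 + 0.08 * educ + rng.normal(0.0, 0.5, size=n)
wage = np.exp(log_wage)

# Level model: wage on educ (np.polyfit returns slope first, then intercept)
slope_lvl, icpt_lvl = np.polyfit(educ, wage, 1)
resid_level = wage - (icpt_lvl + slope_lvl * educ)

# Semi-log model: log(wage) on educ
slope_log, icpt_log = np.polyfit(educ, log_wage, 1)
resid_log = log_wage - (icpt_log + slope_log * educ)

skew_level = skewness(resid_level)
skew_log = skewness(resid_log)
print("skewness of residuals, wage model:     ", skew_level)
print("skewness of residuals, log(wage) model:", skew_log)
```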
● Incorporating nonlinearities: Semi-logarithmic form

● Regression of log wages on years of education

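$$\log(wage) = \beta_0 + \beta_1 educ + u$$

where log(wage) is the natural logarithm of wage

● This changes the interpretation of the regression coefficient:

$$\beta_1 = \frac{\partial \log(wage)}{\partial educ} = \frac{1}{wage} \cdot \frac{\partial wage}{\partial educ} = \frac{\partial wage / wage}{\partial educ}$$

Here ∂wage/wage is the percentage change of wage (in decimal form) if years of education are increased by one year (∂educ = 1)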

● Fitted regression

The wage increases by 8.3% for every additional year of education (= return to another year of education)

For example:

Growth rate of wage is 8.3% per year of education

Increasing return to education
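
The 8.3% figure uses the approximation that a change in log(wage) times 100 is a percentage change in wage. A short sketch of the arithmetic, assuming a slope estimate of 0.083 (implied by the 8.3% above); the function names are made up for this illustration. The exact percentage change implied by a semi-log model is 100·(exp(β1·Δx) − 1), which is close to 100·β1·Δx when β1·Δx is small.

```python
import math

def approx_pct_change(beta1, delta_x=1.0):
    """Approximate % change in y for a change delta_x in x (log-level model)."""
    return 100.0 * beta1 * delta_x

def exact_pct_change(beta1, delta_x=1.0):
    """Exact % change in y implied by log(y) = b0 + b1*x for a change delta_x."""
    return 100.0 * (math.exp(beta1 * delta_x) - 1.0)

beta1 = 0.083  # slope implied by the 8.3% return to education

print("approximate change for one more year:", approx_pct_change(beta1))
print("exact change for one more year:      ", exact_pct_change(beta1))

# The gap widens for larger changes, e.g. four additional years of education
print("approximate change for four years:", approx_pct_change(beta1, 4.0))
print("exact change for four years:      ", exact_pct_change(beta1, 4.0))
```

For one additional year the exact change is about 8.65%, slightly above the approximate 8.3%.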

● Incorporating nonlinearities: Log-logarithmic form

● CEO salary and firm sales

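$$\log(salary) = \beta_0 + \beta_1 \log(sales) + u$$

where log(salary) is the natural logarithm of CEO salary and log(sales) is the natural logarithm of his/her firm’s sales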

● This changes the interpretation of the regression coefficient:

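$$\beta_1 = \frac{\partial \log(salary)}{\partial \log(sales)} = \frac{\partial salary / salary}{\partial sales / sales}$$

The numerator is the percentage change of salary if sales increase by 1% (the denominator)

Logarithmic changes are always percentage changes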

● CEO salary and firm sales: fitted regression

A 1% increase in sales is associated with a 0.257% increase in salary

● For example:

● The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model
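
The constant elasticity interpretation can be illustrated with a small simulation. This sketch does not use the CEO salary data: the data are generated with an assumed elasticity of 0.257 (taken from the slide) and an intercept and error spread that are purely hypothetical. It estimates the elasticity by regressing log(salary) on log(sales) and then computes the predicted salary change for a 1% increase in sales.

```python
import numpy as np

# Simulated (hypothetical) data with a constant elasticity of 0.257
rng = np.random.default_rng(2)
n = 5000
log_sales = rng.normal(8.0, 1.0, size=n)
log_salary = 4.8 + 0.257 * log_sales + rng.normal(0.0, 0.5, size=n)

# OLS of log(salary) on log(sales); polyfit returns slope first, then intercept
elasticity_hat, intercept_hat = np.polyfit(log_sales, log_salary, 1)

# Predicted % change in salary when sales rise by 1%, at any level of sales
pct_change_salary = 100.0 * (1.01 ** elasticity_hat - 1.0)

print("estimated elasticity:", elasticity_hat)
print("predicted % change in salary for +1% sales:", pct_change_salary)
```

The predicted percentage change is nearly identical to the slope itself and does not depend on the level of sales, which is what "constant elasticity" means.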

● Natural Log Transformations

Transforming the variables changes the interpretation of the slope parameter!
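
As a compact reminder, the standard interpretations for the four combinations of levels and natural logs can be written as follows. This is a summary of well-known approximations, laid out here as an assumption about what a summary would contain; the percentage statements hold approximately, for small changes.

```latex
\begin{array}{llll}
\text{Model} & \text{Dependent variable} & \text{Independent variable} & \text{Interpretation of } \beta_1 \\
\hline
\text{level-level} & y & x & \Delta y = \beta_1 \, \Delta x \\
\text{level-log} & y & \log(x) & \Delta y = (\beta_1 / 100) \, \%\Delta x \\
\text{log-level} & \log(y) & x & \%\Delta y = (100 \, \beta_1) \, \Delta x \\
\text{log-log} & \log(y) & \log(x) & \%\Delta y = \beta_1 \, \%\Delta x
\end{array}
```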

● Evaluating Our OLS Estimators

● Suppose we draw a random sample with two observations (x1, x2) from
some distribution with mean μ

● μ is unknown, but we can estimate it with:

• Estimator 1:

• Estimator 2:

● Which one is the better estimator?


● Unbiasedness
• First, let’s check whether they are biased.

• No, both estimators are unbiased.

● Variance
• Next, how far can we expect the two estimators to be from the true value?

• Estimator 1 has the smaller variance.
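
The comparison of two unbiased estimators can be made concrete with a simulation. The formulas of the two estimators are not recoverable from the slides, so the pair used below is a hypothetical choice: estimator 1 is the average of the two observations and estimator 2 is the first observation alone. The normal distribution and its parameters are also assumptions. Both are unbiased for the mean, but the first has half the variance of the second.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, reps = 5.0, 2.0, 100_000  # hypothetical population mean and sd

# Each row is one random sample with two observations (x1, x2)
samples = rng.normal(mu, sigma, size=(reps, 2))

est1 = samples.mean(axis=1)  # hypothetical estimator 1: (x1 + x2) / 2
est2 = samples[:, 0]         # hypothetical estimator 2: x1 only

print("mean of estimator 1:", est1.mean(), " variance:", est1.var())
print("mean of estimator 2:", est2.mean(), " variance:", est2.var())
```

Across repeated samples both averages are close to the true mean, while the spread of estimator 1 is noticeably smaller.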

● Expected values and variances of the OLS estimators

● The estimated regression coefficients are random variables because they are calculated from a random sample

The data are random and depend on the particular sample that has been drawn

● The question is what the estimators will estimate on average and how large their variability in repeated samples is

● Standard assumptions for the linear regression model

● Assumption SLR.1 (Linear in parameters)

$$y = \beta_0 + \beta_1 x + u$$

In the population, the relationship between y and x is linear

● Assumption SLR.2 (Random sampling)

$$\{(x_i, y_i): i = 1, \dots, n\}$$

The data are a random sample drawn from the population

$$y_i = \beta_0 + \beta_1 x_i + u_i$$

Each data point therefore follows the population equation

● Discussion of random sampling: Wage and education
• The population consists, for example, of all workers of country A
• In the population, a linear relationship between wages (or log
wages) and years of education holds
• Draw a worker completely at random from the population
• The wage and the years of education of the worker drawn are
random because one does not know beforehand which worker is
drawn
• Throw the worker back into the population and repeat the random draw
n times
• The wages and years of education of the sampled workers are used
to estimate the linear relationship between wages and education


The values drawn for the i-th worker: $(wage_i, educ_i)$

The implied deviation from the population relationship for the i-th worker:

$$u_i = wage_i - \beta_0 - \beta_1 educ_i$$

● Assumptions for the linear regression model (cont.)

● Assumption SLR.3 (Sample variation in the explanatory variable)

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$$

The values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable)

● Assumption SLR.4 (Zero conditional mean)

$$E(u_i \mid x_i) = 0$$

The value of the explanatory variable must contain no information about the mean of the unobserved factors

● Theorem 2.1 (Unbiasedness of OLS)

Under assumptions SLR.1 – SLR.4:

$$E(\hat{\beta}_0) = \beta_0, \qquad E(\hat{\beta}_1) = \beta_1$$

● Interpretation of unbiasedness
• The estimated coefficients may be smaller or larger, depending on
the sample that is the result of a random draw
• However, on average, they will be equal to the values that characterize
the true relationship between y and x in the population
• “On average” means if sampling were repeated, i.e. if drawing the
random sample and doing the estimation were repeated many times
• In a given sample, estimates may differ considerably from the true
values
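
The "repeated sampling" idea behind unbiasedness can be sketched as a Monte Carlo experiment. The population values (intercept 1.0, slope 0.5), the sample size, and the distributions below are all assumptions chosen for illustration. In every replication a new random sample is drawn from a population that satisfies SLR.1 to SLR.4 and OLS is re-estimated.

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1 = 1.0, 0.5   # hypothetical true population parameters
n, reps = 50, 5000        # sample size and number of repeated samples

b0_draws = np.empty(reps)
b1_draws = np.empty(reps)
for r in range(reps):
    x = rng.uniform(0.0, 10.0, size=n)   # random sample of the regressor
    u = rng.normal(0.0, 1.0, size=n)     # error drawn independently of x
    y = beta0 + beta1 * x + u            # population equation holds for each i
    # np.polyfit with degree 1 returns the slope first, then the intercept
    b1_draws[r], b0_draws[r] = np.polyfit(x, y, 1)

print("average slope estimate:    ", b1_draws.mean(), "(true value 0.5)")
print("average intercept estimate:", b0_draws.mean(), "(true value 1.0)")
print("smallest and largest slope:", b1_draws.min(), b1_draws.max())
```

The averages across replications are close to the true values, while individual estimates spread out around them.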

● Variances of the OLS estimators
• Depending on the sample, the estimates will be nearer or farther
away from the true population values
• How far can we expect our estimates to be away from the true
population values on average (= sampling variability)?
• Sampling variability is measured by the estimators’ variances

● Assumption SLR.5 (Homoskedasticity)

$$Var(u_i \mid x_i) = \sigma^2$$

The value of the explanatory variable must contain no information about the variability of the unobserved factors

• Note: this assumption was not necessary for unbiasedness


● Graphical illustration of homoskedasticity

The variability of the unobserved influences does not depend on the value of the explanatory variable

● An example of heteroskedasticity: Wage and education

The variance of the unobserved determinants of wages increases with the level of education
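
A simple way to see this kind of heteroskedasticity in data is to compare the spread of the OLS residuals at low and high levels of education. The sketch below uses simulated data in which the error standard deviation is assumed to be proportional to education; the cutoff at 12 years and all numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
educ = rng.integers(6, 19, size=n).astype(float)

# Hypothetical heteroskedastic errors: the standard deviation grows with educ
u = rng.normal(0.0, 0.3 * educ)
wage = 5.0 + 0.5 * educ + u

# OLS of wage on educ (slope first, then intercept)
slope, intercept = np.polyfit(educ, wage, 1)
resid = wage - (intercept + slope * educ)

# Residual variance among workers with at most / more than 12 years of education
var_low = resid[educ <= 12].var()
var_high = resid[educ > 12].var()
print("residual variance, educ <= 12:", var_low)
print("residual variance, educ > 12: ", var_high)
```

Under homoskedasticity the two variances would be roughly equal; here the residuals are clearly more dispersed for the more educated group.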

● Theorem 2.2 (Variances of the OLS estimators)

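Under assumptions SLR.1 – SLR.5:

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}, \qquad Var(\hat{\beta}_0) = \frac{\sigma^2 \, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$$

● What’s the intuition?

• Recall $\sigma^2 = Var(u \mid x)$, the variability of the error term
• $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is the total sample variation in the x’s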

● Variances of β1

Which gives you a more precise estimator of β1?


[Figure: two scatter plots against x (horizontal axes, 0 to 1), both with vertical axes from -2 to 6]

● Variances of β1

Which gives you a more precise estimator of β1?

[Figure: two scatter plots against x (horizontal axes, 0 to 1), one with a vertical axis from 0 to 1 and the other with a vertical axis from .3 to .7]

● Theorem 2.2 (Variances of the OLS estimators)

Under assumptions SLR.1 – SLR.5:

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}, \qquad Var(\hat{\beta}_0) = \frac{\sigma^2 \, n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$$

● Conclusion:
• The sampling variability of the estimated regression coefficients increases with the variability of the unobserved factors and decreases with the variation in the explanatory variable
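
The conclusion can be checked against the variance formula for the slope, σ²/SST_x, by simulation. This is a sketch under assumed values: the regressor values are held fixed across replications, the errors are homoskedastic normal draws, and the helper `slope_variance` and all numbers are made up for the illustration. It compares a narrow and a wide spread of x, and a small and a large error variance.

```python
import numpy as np

def slope_variance(x, sigma, reps, rng):
    """Simulated and theoretical variance of the OLS slope for fixed x."""
    x = np.asarray(x, dtype=float)
    x_dev = x - x.mean()
    sst_x = np.sum(x_dev ** 2)
    u = rng.normal(0.0, sigma, size=(reps, x.size))  # homoskedastic errors
    y = 1.0 + 0.5 * x + u                            # hypothetical population line
    # OLS slope in each replication: sum((x_i - xbar) * y_i) / SST_x
    slopes = y @ x_dev / sst_x
    return slopes.var(), sigma ** 2 / sst_x

rng = np.random.default_rng(6)
x_narrow = np.linspace(0.4, 0.6, 40)  # little variation in x
x_wide = np.linspace(0.0, 1.0, 40)    # more variation in x

sim_narrow, theory_narrow = slope_variance(x_narrow, 1.0, 20_000, rng)
sim_wide, theory_wide = slope_variance(x_wide, 1.0, 20_000, rng)
sim_noisy, theory_noisy = slope_variance(x_wide, 2.0, 20_000, rng)

print("narrow x, sigma=1:", sim_narrow, "vs formula", theory_narrow)
print("wide x,   sigma=1:", sim_wide, "vs formula", theory_wide)
print("wide x,   sigma=2:", sim_noisy, "vs formula", theory_noisy)
```

More variation in x lowers the variance of the slope estimator, and a larger error variance raises it, in line with the formula.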

● Estimating the error variance

$$Var(u_i \mid x_i) = \sigma^2 = Var(u_i)$$

The variance of u does not depend on x, i.e. it is equal to the unconditional variance

$$\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2$$

One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately this estimate would be biased

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$$

An unbiased estimate of the error variance can be obtained by subtracting the number of estimated regression coefficients from the number of observations
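
The bias from dividing by n instead of n − 2 is easiest to see in small samples. The following sketch uses assumed values (n = 10, error variance 4, fixed regressor values, a hypothetical population line) and repeats the estimation many times to compare the average of both variance estimates with the true error variance.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, sigma2 = 10, 20_000, 4.0   # hypothetical sample size and error variance
x = np.linspace(0.0, 1.0, n)        # regressor values held fixed
x_dev = x - x.mean()
sst_x = np.sum(x_dev ** 2)

# One row per replication
u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
y = 1.0 + 0.5 * x + u

# OLS estimates and residuals in every replication
b1 = y @ x_dev / sst_x
b0 = y.mean(axis=1) - b1 * x.mean()
resid = y - b0[:, None] - b1[:, None] * x
ssr = np.sum(resid ** 2, axis=1)    # sum of squared residuals

mean_biased = np.mean(ssr / n)          # divides by n
mean_unbiased = np.mean(ssr / (n - 2))  # divides by n - 2

print("true error variance:     ", sigma2)
print("average of SSR / n:      ", mean_biased)
print("average of SSR / (n - 2):", mean_unbiased)
```

With n = 10 the estimator that divides by n is too small on average by the factor (n − 2)/n = 0.8, while the degrees-of-freedom correction removes the bias.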

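● Theorem 2.3 (Unbiasedness of the error variance)

Under assumptions SLR.1 – SLR.5:

$$E(\hat{\sigma}^2) = \sigma^2$$

● Calculation of standard errors for regression coefficients

$$se(\hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_1)} = \sqrt{\hat{\sigma}^2 / SST_x}$$

$$se(\hat{\beta}_0) = \sqrt{\widehat{Var}(\hat{\beta}_0)} = \sqrt{\hat{\sigma}^2 \, n^{-1} \sum_{i=1}^{n} x_i^2 / SST_x}$$

Plug in $\hat{\sigma}^2$ for the unknown $\sigma^2$

The estimated standard deviations of the regression coefficients are called “standard
errors.” They measure how precisely the regression coefficients are estimated.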
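
Putting the pieces together, the standard errors can be computed from a single sample in a few lines. This is a minimal sketch that assumes the standard simple-regression formulas under SLR.1 to SLR.5 (error variance estimated with n − 2 degrees of freedom); the function name `ols_with_se` and the simulated data are illustrative.

```python
import numpy as np

def ols_with_se(x, y):
    """Simple OLS of y on x with homoskedasticity-based standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_dev = x - x.mean()
    sst_x = np.sum(x_dev ** 2)                 # total sample variation in x
    b1 = np.sum(x_dev * y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    sigma2_hat = np.sum(resid ** 2) / (n - 2)  # unbiased error variance estimate
    se_b1 = np.sqrt(sigma2_hat / sst_x)
    se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)
    return {"b0": b0, "b1": b1, "sigma2_hat": sigma2_hat,
            "se_b0": se_b0, "se_b1": se_b1}

# Simulated (hypothetical) sample
rng = np.random.default_rng(8)
x = rng.uniform(0.0, 10.0, size=100)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=100)

fit = ols_with_se(x, y)
for name, value in fit.items():
    print(name, "=", value)
```

A small standard error relative to the size of the coefficient indicates that the coefficient is estimated precisely.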
