CH 02

Uploaded by

Pan Kan


Chapter 2: The Simple Regression Model

“Explains variable $y$ in terms of variable $x$”

$y = \beta_0 + \beta_1 x + u$

• $y$: dependent variable, explained variable, response variable, …
• $x$: independent variable, explanatory variable, regressor, …
• $\beta_0$: intercept
• $\beta_1$: slope parameter
• $u$: error term, disturbance, unobservables, …

“Studies how $y$ varies with changes in $x$:”

$\Delta y = \beta_1 \Delta x$ as long as $\Delta u = 0$

• By how much does the dependent variable change if the independent variable is increased by one unit?
• The interpretation is only correct if all other things remain equal when the independent variable is increased by one unit

● The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons

● Example: Yield and fertilizer

$yield = \beta_0 + \beta_1\, fertilizer + u$

• $u$: rainfall, land quality, presence of parasites, …
• $\beta_1$: measures the effect of fertilizer on yield, holding all other factors fixed

● Example: A simple wage equation

$wage = \beta_0 + \beta_1\, educ + u$

• $u$: labor force experience, tenure with current employer, work ethic, intelligence, …
• $\beta_1$: measures the change in hourly wage given another year of education, holding all other factors fixed

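● The average value of $u$ in the population is 0: $E(u) = 0$
• The second assumption is simply a normalization of $u$ if there is an intercept in the equation. To see this, suppose $E(u) = \alpha \neq 0$ and $y = \beta_0 + \beta_1 x + u$.
• This is equivalent to $y = (\beta_0 + \alpha) + \beta_1 x + \tilde{u}$, where $\tilde{u} = u - \alpha$, so that $E(\tilde{u}) = 0$.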

● The zero conditional mean assumption, $E(u \mid x) = 0$, has an important implication that we’ll refer back to later.

Proof:
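One standard consequence of $E(u \mid x) = 0$, assumed here to be the implication in question, is that $u$ is uncorrelated with $x$. A sketch of the argument, using the law of iterated expectations:

```latex
\begin{aligned}
E(u)   &= E\big[E(u \mid x)\big] = E[0] = 0, \\
E(xu)  &= E\big[E(xu \mid x)\big] = E\big[x\,E(u \mid x)\big] = E[x \cdot 0] = 0, \\
\operatorname{Cov}(x,u) &= E(xu) - E(x)\,E(u) = 0 - E(x)\cdot 0 = 0 .
\end{aligned}
```

The two moment conditions $E(u) = 0$ and $E(xu) = 0$ have the same form as the first-order conditions that define the OLS estimates, with sample sums in place of expectations.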

● Conditional mean independence assumption: $E(u \mid x) = 0$
• The explanatory variable must not contain information about the mean of the unobserved factors

● Example: wage equation

$wage = \beta_0 + \beta_1\, educ + u$, where $u$ contains e.g. intelligence, …

• The conditional mean independence assumption is unlikely to hold because individuals with more education will also be more intelligent on average.

• This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable: $E(y \mid x) = \beta_0 + \beta_1 x$
• Interpretation of the intercept and the slope?
© 2016 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license
distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.

● Population regression function: $E(y \mid x) = \beta_0 + \beta_1 x$
• For individuals with a given value of $x$, the average value of $y$ is $\beta_0 + \beta_1 x$

● In order to estimate the regression model one needs data

● A random sample of $n$ observations: $\{(x_i, y_i) : i = 1, \dots, n\}$

• First observation: $(x_1, y_1)$
• Second observation: $(x_2, y_2)$
• Third observation: $(x_3, y_3)$
• …
• n-th observation: $(x_n, y_n)$
• $x_i$: value of the explanatory variable of the i-th observation
• $y_i$: value of the dependent variable of the i-th observation

● We try to estimate this line by estimating $\beta_0$ and $\beta_1$:

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ (fitted values), $\hat{u}_i = y_i - \hat{y}_i$ (residuals)

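● How should we estimate $\beta_0$ and $\beta_1$?

• Choose the estimates that minimize the sum of squared residuals, i.e.

$\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$

● FOCs w.r.t. $\hat{\beta}_0$ and $\hat{\beta}_1$ are:

$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
$\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

● But we usually write them in a slightly different form:

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

● Showing that they are equivalent follows from the basic properties of summations.

● I leave the numerator for your problem set (PS)

● The denominator is non-negative (and strictly positive whenever the $x_i$ are not all equal), so $\hat{\beta}_1$ is positive (negative) if x, y are positively (negatively) correlated in the sample.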
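The closed-form estimates and the first-order conditions can be checked numerically. The following is a minimal sketch in plain Python; the five data points are made up for illustration and the function name `ols_simple` is hypothetical.

```python
# Hypothetical toy data (not from the slides): x = regressor, y = outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_simple(x, y):
    """Return (b0, b1) from the closed-form OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sample co-variation of x and y; denominator: SST_x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    if sst_x == 0:
        raise ValueError("x has no sample variation; the slope is undefined")
    b1 = s_xy / sst_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_simple(x, y)

# First-order conditions evaluated at the estimates (both should be ~0).
foc_b0 = sum(yi - b0 - b1 * xi for xi, yi in zip(x, y))
foc_b1 = sum(xi * (yi - b0 - b1 * xi) for xi, yi in zip(x, y))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"FOC sums: {foc_b0:.2e}, {foc_b1:.2e}")
```

Both FOC sums come out at zero up to floating-point error, which is the equivalence between the two ways of writing the solution.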

[Figure: fitted regression line through the data points, with the i-th data point marked as an example]

● Example: CEO salary and return on equity

$salary = \beta_0 + \beta_1\, roe + u$

• $salary$: salary in thousands of dollars
• $roe$: average return on equity of the CEO’s firm (%)

● More often, we will just say that we are regressing salary on return on equity
• Note that it is “we are regressing the dependent variable on the independent variable”, not vice versa

● Fitted regression

$\widehat{salary} = \hat{\beta}_0 + 18.501\, roe$

• $\hat{\beta}_0$: intercept
• If the return on equity increases by one percentage point, then salary is predicted to change by $18,501

● What does the intercept tell us?

● Causal interpretation?
● Three possible stories:
• x -> y: When stock prices rise, CEOs get a pay raise
• z -> x & z -> y: Good CEOs tend to get a bigger paycheck and their
companies usually do well
• y -> x: High CEO compensation gives CEOs an incentive to work hard -> profit rises, stock price rises

● Assuming it is causal, is this a big effect?

[Figure: fitted regression line (depends on sample) versus the unknown population regression line]

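● Example: Wage and education

$wage = \beta_0 + \beta_1\, educ + u$

• $wage$: hourly wage in dollars
• $educ$: years of education

● Fitted regression

$\widehat{wage} = \hat{\beta}_0 + 0.54\, educ$

• $\hat{\beta}_0$: intercept
• In the sample, one more year of education was associated with an increase in hourly wage of $0.54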

● How do we interpret β0?

● Causal interpretation?
● For now, assume causal interpretation. Then what’s the return
of your four years of college education?
• Assume you work 2000 hours per year (50 weeks x 40 hours/week)

4 × 0.54 × 2000 = $4,320 every year
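The arithmetic can be reproduced in a few lines. This sketch uses only the numbers stated on the slide (the 0.54 slope, four years of college, and the 2000-hour work year); the variable names are illustrative.

```python
# Values taken from the slide; the 2000-hour work year is the slide's own assumption.
slope_per_year = 0.54      # estimated wage gain in dollars per hour, per extra year of education
years_of_college = 4
hours_per_year = 50 * 40   # 50 weeks x 40 hours/week

hourly_gain = years_of_college * slope_per_year   # dollars per hour
annual_gain = hourly_gain * hours_per_year        # dollars per year

print(f"hourly gain: ${hourly_gain:.2f}, annual gain: ${annual_gain:,.0f}")
```

Whether that annual gain makes college "worth it" also depends on costs and forgone earnings, which this calculation leaves out.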

● Was it worth it?

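● Example: Voting outcomes and campaign expenditures

$voteA = \beta_0 + \beta_1\, shareA + u$

• $voteA$: percentage of vote for candidate A
• $shareA$: percentage of campaign expenditures of candidate A

● Fitted regression

$\widehat{voteA} = \hat{\beta}_0 + 0.464\, shareA$

• $\hat{\beta}_0$: intercept
• If candidate A’s share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote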
● What does β0 tell us?

● Three possible stories:

• x -> y: ?
• z -> x & z -> y: ?
• y -> x: ? (hint: policy influence)

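● Fitted values and residuals

$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ (fitted or predicted values), $\hat{u}_i = y_i - \hat{y}_i$ (deviations from regression line = residuals)

● Algebraic properties of OLS regression

• $\sum_{i=1}^{n} \hat{u}_i = 0$: deviations from regression line sum up to zero
• $\sum_{i=1}^{n} x_i \hat{u}_i = 0$. Interpretation: the sample covariance between deviations and regressors is zero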

● Does it look like a convincing “best guess” of y given x?

[Figures: a sequence of plots against x, four of them showing residuals against x]

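● Sample averages of y and x lie on the regression line: $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$

• Consider the fitted regression $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
• If we plug in $x = \bar{x}$, we always get $\hat{y} = \bar{y}$.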
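The algebraic properties hold in any sample, so they can be verified on arbitrary numbers. A small sketch with made-up data (the values of `x` and `y` are purely illustrative):

```python
# Hypothetical data; any sample with variation in x works.
x = [0.1, 0.4, 0.5, 0.7, 0.9]
y = [1.2, 0.7, 2.5, 2.0, 3.6]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

sum_resid = sum(resid)                                   # residuals sum to zero
sum_x_resid = sum(xi * ui for xi, ui in zip(x, resid))   # residuals are orthogonal to x
line_at_x_bar = b0 + b1 * x_bar                          # the line passes through (x_bar, y_bar)

print(f"sum of residuals:     {sum_resid:.2e}")
print(f"sum of x * residuals: {sum_x_resid:.2e}")
print(f"line at x_bar = {line_at_x_bar:.4f}, y_bar = {y_bar:.4f}")
```

A line drawn by eye will generally fail at least one of these checks, which is one way to see that it is not the OLS line.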

• For example, CEO number 12’s salary was $526,023 lower than predicted using the information on his firm’s return on equity

“How well does the explanatory variable explain the dependent variable?”
[Figure: two panels plotted against x]

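● Measures of variation

• Total sum of squares, represents total variation in the dependent variable: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
• Explained sum of squares, represents variation explained by regression: $SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Residual sum of squares, represents variation not explained by regression: $SSR = \sum_{i=1}^{n} \hat{u}_i^2$
• Dividing SST by (n-1) gives the sample variance of y

● Decomposition of total variation

$SST = SSE + SSR$ (total variation = explained part + unexplained part)

● What are these terms equal to?

● R-squared

$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$

• R-squared measures the fraction of the total variation that is explained by the regression

● R2 is always between 0 and 1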
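The decomposition and the two equivalent expressions for R-squared can be computed directly. A minimal sketch on made-up data (the numbers are illustrative, not from the CEO or voting examples):

```python
# Hypothetical data for illustration.
x = [0.1, 0.4, 0.5, 0.7, 0.9]
y = [1.2, 0.7, 2.5, 2.0, 3.6]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                   # total variation
sse = sum((fi - y_bar) ** 2 for fi in fitted)              # explained variation
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # unexplained variation

r2 = sse / sst
r2_alt = 1 - ssr / sst

print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"R2 = {r2:.4f} (SSE/SST), {r2_alt:.4f} (1 - SSR/SST)")
```

The identity SST = SSE + SSR relies on the regression including an intercept and being estimated by OLS; it is not guaranteed for an arbitrary line.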

[Figure: y and fitted values plotted against x, two panels]

● CEO salary and return on equity

• The regression explains only 1.3% of the total variation in salaries

● Voting outcomes and campaign expenditures

• The regression explains 85.6% of the total variation in election outcomes

● Caution: A high R-squared does not necessarily mean that the regression has a causal interpretation!

• Suppose

● Let’s verify this in Stata

• Suppose

● Let’s verify this in Stata

• Changing the units of measurement may change the magnitude of our OLS estimates, but their interpretation remains the same
• No impact on R2, which does not depend on the units of measurement
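The slides run this check in Stata; the sketch below does the same in plain Python on made-up numbers. The two rescalings (the dependent variable multiplied by 1000, the regressor divided by 100) are assumptions chosen for illustration, and `fit` is a hypothetical helper.

```python
def fit(x, y):
    """Return (b0, b1, r2) for a simple OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - ssr / sst


# Hypothetical data: y in thousands of dollars, x in percent.
x = [5.0, 10.0, 12.0, 18.0, 25.0]
y = [900.0, 1100.0, 1000.0, 1400.0, 1300.0]

base = fit(x, y)
y_dollars = fit(x, [1000 * yi for yi in y])   # y measured in dollars
x_decimal = fit([xi / 100 for xi in x], y)    # x measured as a decimal

for label, (b0, b1, r2) in [("base", base), ("y * 1000", y_dollars), ("x / 100", x_decimal)]:
    print(f"{label:>9}: b0 = {b0:.3f}, b1 = {b1:.3f}, R2 = {r2:.4f}")
```

Multiplying y by a constant multiplies both estimates by that constant; dividing x by a constant multiplies only the slope by it. R-squared is identical in all three rows.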

● Why might we want to take logs?

• Any ideas?

● Let’s see some possible justifications

● Regression of wages on years of education

$wage = \beta_0 + \beta_1\, educ + u$, with $wage$ the hourly wage in dollars and $educ$ the years of education

[Figure: plot against educ]

● Does this look like a “good” regression?

[Figure: residuals plotted against educ]

● What’s wrong with the residuals?

[Figure: density of the residuals]

● Regression of log wages on years of education

$\log(wage) = \beta_0 + \beta_1\, educ + u$, where $\log(wage)$ is the natural logarithm of wage

[Figure: residuals plotted against educ, two panels: wage and log(wage)]
● Let’s look at the residuals again

[Figure: density of the residuals, two panels: one with residuals ranging from -5 to 15, one with residuals ranging from -2 to 2]

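● Regression of log wages on years of education

$\log(wage) = \beta_0 + \beta_1\, educ + u$, where $\log(wage)$ is the natural logarithm of wage

● This changes the interpretation of the regression coefficient:

$\beta_1 = \frac{\Delta \log(wage)}{\Delta educ} \approx \frac{\Delta wage / wage}{\Delta educ}$

• Numerator: percentage change of wage (as a proportion; multiply by 100 for percent)
• Denominator: … if years of education are increased by one year

● Fitted regression

$\widehat{\log(wage)} = \hat{\beta}_0 + 0.083\, educ$

• The wage increases by 8.3% for every additional year of education (= return to another year of education)

● For example:

• Growth rate of wage is 8.3% per year of education
• Increasing return to education: a constant percentage effect means larger dollar gains at higher levels of education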
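The 8.3% figure rests on the approximation that a change in logs is close to a proportional change. A short sketch comparing the approximation with the exact percentage change implied by the same slope (only the 0.083 coefficient comes from the slide):

```python
import math

b1 = 0.083   # slope from the fitted log(wage) regression on the slide

approx_pct = 100 * b1                   # log approximation: 100 * b1
exact_pct = 100 * (math.exp(b1) - 1)    # exact change implied by the log model

print(f"approximate: {approx_pct:.2f}%, exact: {exact_pct:.2f}%")

# The gap widens for larger changes, e.g. four additional years of education.
approx_4y = 100 * 4 * b1
exact_4y = 100 * (math.exp(4 * b1) - 1)
print(f"four years -> approximate: {approx_4y:.1f}%, exact: {exact_4y:.1f}%")
```

For a one-year change the two numbers are close, so reading the coefficient as a percentage is a reasonable shorthand; for multi-year comparisons the exact formula is the safer choice.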

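● CEO salary and firm sales

$\log(salary) = \beta_0 + \beta_1 \log(sales) + u$

• $\log(salary)$: natural logarithm of CEO salary
• $\log(sales)$: natural logarithm of his/her firm’s sales

● This changes the interpretation of the regression coefficient:

$\beta_1 = \frac{\Delta \log(salary)}{\Delta \log(sales)} \approx \frac{\Delta salary / salary}{\Delta sales / sales}$

• Numerator: percentage change of salary
• Denominator: … if sales increase by 1%
• Logarithmic changes are (approximately) percentage changes

● For example:

$\widehat{\log(salary)} = \hat{\beta}_0 + 0.257 \log(sales)$

• + 1% sales; + 0.257% salary

● The log-log form postulates a constant elasticity model, whereas the semi-log form assumes a semi-elasticity model

● Transforming the variables changes the interpretation of the slope parameter!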
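"Constant elasticity" means the percentage response of salary to a 1% change in sales is the same at every level of sales. A minimal sketch using the 0.257 slope from the slide; the scale constant and the two sales levels are hypothetical and do not affect the result.

```python
elasticity = 0.257   # slope from the fitted log-log regression on the slide
scale = 100.0        # hypothetical constant; it cancels out of percentage changes


def salary(sales):
    """Constant-elasticity relationship: log(salary) = log(scale) + elasticity * log(sales)."""
    return scale * sales ** elasticity


def pct_change_in_salary(sales, pct_change_in_sales=1.0):
    """Exact percentage change in salary when sales rise by the given percentage."""
    new_sales = sales * (1 + pct_change_in_sales / 100)
    return 100 * (salary(new_sales) / salary(sales) - 1)


small_firm = pct_change_in_salary(1_000.0)
large_firm = pct_change_in_salary(50_000.0)

print(f"small firm: {small_firm:.4f}%, large firm: {large_firm:.4f}%")
```

Both firms show the same response, and it is very close to the 0.257% read directly off the coefficient.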

● Suppose we draw a random sample with two obs (x1, x2) from some distribution with mean $\mu$

● $\mu$ is unknown, but we can estimate it with:

• Estimator 1:

• Estimator 2:

● Which one is a better estimator?

• No, both estimators are unbiased.

• Estimator 1 has smaller variance.
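A small simulation illustrates how two unbiased estimators can differ in variance. The two estimator formulas below are assumptions made for this sketch (an equally weighted average and an unequally weighted average), as are the normal population and its parameters.

```python
import random

rng = random.Random(0)       # fixed seed so the run is reproducible
mu, sigma = 5.0, 2.0         # hypothetical population mean and standard deviation
reps = 20000

est1, est2 = [], []
for _ in range(reps):
    x1 = rng.gauss(mu, sigma)
    x2 = rng.gauss(mu, sigma)
    est1.append((x1 + x2) / 2)           # assumed Estimator 1: sample average
    est2.append(0.25 * x1 + 0.75 * x2)   # assumed Estimator 2: unequal weights


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


mean1, mean2 = mean(est1), mean(est2)
var1, var2 = variance(est1), variance(est2)

print(f"Estimator 1: mean = {mean1:.3f}, variance = {var1:.3f}")
print(f"Estimator 2: mean = {mean2:.3f}, variance = {var2:.3f}")
```

Under these assumptions both averages centre on the true mean, while the equally weighted average has the smaller variance (theoretical values are 0.5σ² versus 0.625σ²).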

● The estimated regression coefficients are random variables because they are calculated from a random sample

• The data are random and depend on the particular sample that has been drawn

● The question is what the estimators will estimate on average and how large their variability in repeated samples is

● Assumption SLR.1 (Linear in parameters)

$y = \beta_0 + \beta_1 x + u$

• In the population, the relationship between y and x is linear

● Assumption SLR.2 (Random sampling)

$\{(x_i, y_i) : i = 1, \dots, n\}$

• The data is a random sample drawn from the population
• Each data point therefore follows the population equation: $y_i = \beta_0 + \beta_1 x_i + u_i$

● Discussion of random sampling: Wage and education
• The population consists, for example, of all workers of country A
• In the population, a linear relationship between wages (or log
wages) and years of education holds
• Draw a worker completely at random from the population
• The wage and the years of education of the worker drawn are random because one does not know beforehand which worker is drawn
• Throw the worker back into the population and repeat the random draw $n$ times
• The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education

$wage_i = \beta_0 + \beta_1\, educ_i + u_i$

• $(educ_i, wage_i)$: the values drawn for the i-th worker
• $u_i = wage_i - \beta_0 - \beta_1\, educ_i$: the implied deviation from the population relationship for the i-th worker

● Assumption SLR.3 (Sample variation in the explanatory variable)

$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$

• The values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable)

● Assumption SLR.4 (Zero conditional mean)

$E(u_i \mid x_i) = 0$

• The value of the explanatory variable must contain no information about the mean of the unobserved factors

● Interpretation of unbiasedness
• The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw
• However, on average, they will be equal to the values that characterize the true relationship between y and x in the population
• “On average” means if sampling was repeated, i.e. if drawing the random sample and doing the estimation was repeated many times
• In a given sample, estimates may differ considerably from true values
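Repeated sampling can be imitated with a simulation. The sketch below assumes a made-up population (true intercept 1, true slope 2, errors drawn independently of x so that SLR.1 to SLR.4 hold) and re-estimates the regression on many fresh samples.

```python
import random

rng = random.Random(42)      # fixed seed for reproducibility
beta0, beta1 = 1.0, 2.0      # hypothetical "true" population parameters
n, reps = 50, 3000


def ols(x, y):
    """Closed-form OLS estimates (b0, b1) for a simple regression."""
    m = len(x)
    x_bar, y_bar = sum(x) / m, sum(y) / m
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1


b0_draws, b1_draws = [], []
for _ in range(reps):
    x = [rng.uniform(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]   # independent of x, so E(u|x) = 0
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    b0_hat, b1_hat = ols(x, y)
    b0_draws.append(b0_hat)
    b1_draws.append(b1_hat)

mean_b0 = sum(b0_draws) / reps
mean_b1 = sum(b1_draws) / reps

print(f"average b0 = {mean_b0:.3f} (true {beta0}), average b1 = {mean_b1:.3f} (true {beta1})")
print(f"range of b1 across samples: {min(b1_draws):.2f} to {max(b1_draws):.2f}")
```

The averages sit close to the true values even though individual samples produce slopes well above or below 2, which is the distinction the bullets draw between "on average" and "in a given sample".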

● Variances of the OLS estimators
• Depending on the sample, the estimates will be nearer or farther away from the true population values
• How far can we expect our estimates to be away from the true population values on average (= sampling variability)?
• Sampling variability is measured by the estimators’ variances

● Assumption SLR.5 (Homoskedasticity)

$Var(u_i \mid x_i) = \sigma^2$

• The value of the explanatory variable must contain no information about the variability of the unobserved factors
• Note: this assumption was not necessary for unbiasedness

[Figure: homoskedasticity. The variability of the unobserved influences does not depend on the value of the explanatory variable]

[Figure: heteroskedasticity. The variance of the unobserved determinants of wages increases with the level of education]

Under assumptions SLR.1 – SLR.5:

$Var(\hat{\beta}_1) = \frac{\sigma^2}{SST_x}$

● What’s the intuition?

• Recall $\sigma^2 = Var(u \mid x)$, the variability of the error term
• $SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$ is the total sum of squared variation in the x’s

● Which gives you a more precise estimator of $\beta_1$?

[Figure: two scatter plots against x, side by side]

● Which gives you a more precise estimator of $\beta_1$?

[Figure: two further scatter plots against x, side by side]

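Under assumptions SLR.1 – SLR.5:

$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x}$
$Var(\hat{\beta}_0) = \frac{\sigma^2 n^{-1} \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2 n^{-1} \sum_{i=1}^{n} x_i^2}{SST_x}$

● Conclusion:
• The sampling variability of the estimated regression coefficients is higher the larger the variability of the unobserved factors, and lower the greater the variation in the explanatory variable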
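The slope-variance formula can be compared with the spread of slope estimates across simulated samples. This sketch holds the regressor values fixed and uses made-up parameters (σ = 1, an evenly spaced grid for x); all names are illustrative.

```python
import random

rng = random.Random(7)                 # fixed seed for reproducibility
beta0, beta1, sigma = 1.0, 2.0, 1.0    # hypothetical population parameters
n, reps = 20, 5000

x = [i / n for i in range(n)]          # fixed regressor values on [0, 1)
x_bar = sum(x) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)
theory_var = sigma ** 2 / sst_x        # Var(b1) = sigma^2 / SST_x

b1_draws = []
for _ in range(reps):
    # Homoskedastic errors: same sigma for every x.
    y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    b1_draws.append(b1)

m = sum(b1_draws) / reps
sim_var = sum((b - m) ** 2 for b in b1_draws) / (reps - 1)

# Spreading the same x values twice as far apart quadruples SST_x.
x_wide = [2 * xi for xi in x]
x_wide_bar = sum(x_wide) / n
theory_var_wide = sigma ** 2 / sum((xi - x_wide_bar) ** 2 for xi in x_wide)

print(f"simulated Var(b1) = {sim_var:.4f}, formula = {theory_var:.4f}")
print(f"formula with x spread twice as wide = {theory_var_wide:.4f}")
```

More variation in x, or less noise in u, both shrink the formula's value, matching the conclusion stated above.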

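● Estimating the error variance

$Var(u_i \mid x_i) = \sigma^2 = Var(u_i)$

• The variance of u does not depend on x, i.e. it is equal to the unconditional variance

$\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2$

• One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately this estimate would be biased

$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$

• An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients (here $n - 2$)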

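● Calculation of standard errors for regression coefficients

$se(\hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_1)} = \sqrt{\hat{\sigma}^2 / SST_x}$
$se(\hat{\beta}_0) = \sqrt{\widehat{Var}(\hat{\beta}_0)} = \sqrt{\hat{\sigma}^2 n^{-1} \sum_{i=1}^{n} x_i^2 / SST_x}$

• Plug in $\hat{\sigma}^2$ for the unknown $\sigma^2$
• The estimated standard deviations of the regression coefficients are called “standard errors.” They measure how precisely the regression coefficients are estimated.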
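Putting the pieces together, the error-variance estimate and both standard errors can be computed by hand. A minimal sketch on five made-up data points (illustrative only; with so few observations the numbers carry no substantive meaning):

```python
import math

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sst_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
b0 = y_bar - b1 * x_bar

resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
ssr = sum(ui ** 2 for ui in resid)

sigma2_tilde = ssr / n        # divides by n: biased downward
sigma2_hat = ssr / (n - 2)    # divides by n - 2: two coefficients were estimated

se_b1 = math.sqrt(sigma2_hat / sst_x)
se_b0 = math.sqrt(sigma2_hat * (sum(xi ** 2 for xi in x) / n) / sst_x)

print(f"SSR = {ssr:.4f}, sigma2_tilde = {sigma2_tilde:.4f}, sigma2_hat = {sigma2_hat:.4f}")
print(f"se(b0) = {se_b0:.4f}, se(b1) = {se_b1:.4f}")
```

The degrees-of-freedom correction matters most in small samples like this one; as n grows, the two variance estimates converge.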
