
CH 02 PPT Simple Linear Regression

This document discusses the simple linear regression model. It defines the model using an intercept, slope parameter, dependent and independent variables. It explains that the model studies how the dependent variable varies with changes in the independent variable, while holding other factors constant. Examples are provided of regressing soybean yield on fertilizer and hourly wage on education. The document also discusses estimating the regression model from a random sample using the method of Ordinary Least Squares.

Uploaded by YAHIA ADEL


Ch 02 PPT - simple linear regression

Econometrics (香港浸會大學, Hong Kong Baptist University)



Downloaded by YAHIA ADEL ([email protected])

The Simple
Regression Model

Chapter 2


Definition
Definition of the simple linear regression model

"Explains variable y in terms of variable x":

y = β0 + β1·x + u

β0: intercept; β1: slope parameter
y: dependent variable, explained variable, regressand, response variable, …
x: independent variable, explanatory variable, regressor, …
u: error term, disturbance, unobservables, …
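The population model can be sketched by drawing a single observation from a simulated population. A minimal sketch in Python; the parameter values and distributions here are hypothetical, chosen only for illustration:

```python
import random

# Hypothetical population parameters (illustration only)
beta0, beta1 = 1.0, 0.5
random.seed(42)

# One draw from the population model y = beta0 + beta1*x + u
x = random.uniform(0, 10)   # explanatory variable
u = random.gauss(0, 1)      # unobserved error term
y = beta0 + beta1 * x + u   # dependent variable
```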


Interpretation
Interpretation of the simple linear regression model

"Studies how y varies with changes in x":

Δy = β1·Δx, as long as Δu = 0

By how much does the dependent variable change if the independent variable is increased by one unit? This interpretation is only correct if all other things remain equal when the independent variable is increased by one unit.

The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.


Examples
Example: Soybean yield and fertilizer

yield = β0 + β1·fertilizer + u

u contains rainfall, land quality, presence of parasites, …
β1 measures the effect of fertilizer on yield, holding all other factors fixed.

Example: A simple wage equation

wage = β0 + β1·educ + u

u contains labor force experience, tenure with current employer, work ethic, intelligence, …
β1 measures the change in hourly wage given another year of education, holding all other factors fixed.


Conditional Mean Independence

When is there a causal interpretation? Conditional mean independence assumption:

E(u|x) = E(u) = 0

The explanatory variable must not contain information about the mean of the unobserved factors.

Example: wage equation, where u contains, e.g., intelligence. The conditional mean independence assumption is unlikely to hold, because individuals with more education will also be more intelligent on average.


Population Regression Function

Population regression function (PRF)

The conditional mean independence assumption implies that

E(y|x) = β0 + β1·x

This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.


Example of PRF

Population regression function

For individuals with a given value of x, the average value of y is E(y|x) = β0 + β1·x.


Random Sample
In order to estimate the regression model, one needs data: a random sample of n observations {(xi, yi): i = 1, …, n}.

First observation: (x1, y1)
Second observation: (x2, y2)
Third observation: (x3, y3)
…
n-th observation: (xn, yn)

yi: value of the dependent variable of the i-th observation; xi: value of the explanatory variable of the i-th observation.


Fitted Regression Line


Fit a regression line through the data points as well as possible:

ŷ = β̂0 + β̂1·x (fitted regression line)

For example, the i-th data point is (xi, yi).


OLS Estimates
What does "as good as possible" mean? Regression residuals:

ûi = yi − ŷi = yi − β̂0 − β̂1·xi

Minimize the sum of squared regression residuals:

min Σ ûi²

Ordinary Least Squares (OLS) estimates:

β̂1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²,  β̂0 = ȳ − β̂1·x̄
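The closed-form OLS estimates can be sketched in a few lines of Python. The sample below is simulated, since the slides' data sets (ceosal1, wage1, …) are not bundled with this document; the true parameter values are hypothetical:

```python
import random

random.seed(0)
beta0, beta1, n = 2.0, 3.0, 200

# Simulated sample in place of the slides' data sets
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]

xbar = sum(xs) / n
ybar = sum(ys) / n

# OLS slope: sample covariance(x, y) / sample variance(x)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar

# Residuals of the fitted line; OLS minimizes their sum of squares
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
```

With n = 200 observations, the estimates land close to the (hypothetical) true values of 2.0 and 3.0.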


Example 1 (ceosal1)
CEO salary and return on equity

salary: salary in thousands of dollars; roe: return on equity of the CEO's firm (in percent)

Fitted regression: salary = 963.191 + 18.501·roe

The intercept is 963.191. If the return on equity increases by one percentage point, salary is predicted to increase by 18.501, i.e. by $18,501.

Causal interpretation?


Example 1 (ceosal1)

salary c roe


Example 1 (ceosal1)


Example 1 Cont'd

Fitted regression line (depends on the sample) vs. the unknown population regression line.


Example 2 (wage1)
Wage and education

wage: hourly wage in dollars; educ: years of education

Fitted regression: in the sample, one more year of education was associated with an increase in hourly wage of $0.54.

Causal interpretation?


Example 2 (wage1)

wage c educ


Example 3 (vote1)
Voting outcomes and campaign expenditures (two parties)

voteA: percentage of the vote for candidate A; shareA: percentage of campaign expenditures spent on candidate A

Fitted regression: if candidate A's share of spending increases by one percentage point, he or she receives 0.464 percentage points more of the total vote.

Causal interpretation?


Example 3 (vote1)

votea c sharea


Properties of OLS
Properties of OLS on any sample of data

Fitted or predicted values: ŷi = β̂0 + β̂1·xi
Deviations from the regression line (= residuals): ûi = yi − ŷi

Algebraic properties of OLS regression:
(1) Deviations from the regression line sum up to zero: Σ ûi = 0
(2) The correlation between residuals and the regressor is zero: Σ xi·ûi = 0
(3) The sample averages of y and x lie on the regression line: ȳ = β̂0 + β̂1·x̄
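These three algebraic properties hold on any fitted sample and can be verified numerically. A minimal sketch with simulated data (all values hypothetical):

```python
import random

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 0.5 * x + random.gauss(0, 1) for x in xs]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]

prop1 = sum(resid)                              # residuals sum to zero
prop2 = sum(x * r for x, r in zip(xs, resid))   # zero covariance with x
prop3 = ybar - (b0 + b1 * xbar)                 # (x̄, ȳ) lies on the line
```

All three quantities are zero up to floating-point rounding.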


Example Illustration

For example, CEO number 12's salary was $526,023 lower than predicted using the information on his firm's return on equity.


Goodness of Fit
"How well does the explanatory variable explain the dependent variable?"

Measures of variation:
Total sum of squares, SST = Σ (yi − ȳ)², represents the total variation in the dependent variable.
Explained sum of squares, SSE = Σ (ŷi − ȳ)², represents the variation explained by the regression.
Residual sum of squares, SSR = Σ ûi², represents the variation not explained by the regression.


R-squared
Decomposition of total variation:

SST = SSE + SSR
(total variation = explained part + unexplained part)

Goodness-of-fit measure (R-squared / coefficient of determination):

R² = SSE/SST = 1 − SSR/SST

R-squared measures the fraction of the total variation that is explained by the regression.
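A short sketch computing SST, SSE and SSR on simulated data (values hypothetical) and checking the decomposition SST = SSE + SSR:

```python
import random

random.seed(2)
xs = [random.uniform(0, 10) for _ in range(150)]
ys = [2.0 - 0.3 * x + random.gauss(0, 1) for x in xs]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
     sum((x - xbar) ** 2 for x in xs)
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)               # total variation
sse = sum((f - ybar) ** 2 for f in fitted)           # explained variation
ssr = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation

r2 = sse / sst   # equivalently, 1 - ssr / sst
```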


Examples (Reporting Regression Results)

CEO salary and return on equity: the regression explains only 1.3 % of the total variation in salaries (R² ≈ 0.013).

Voting outcomes and campaign expenditures: the regression explains 85.6 % of the total variation in election outcomes (R² ≈ 0.856).

Caution: a high R-squared does not necessarily mean that the regression has a causal interpretation!


Unit of Measurement

If salary is measured in dollars (salarydol):
salarydol = 963,191 + 18,501·roe

If roe is measured as a decimal (roedec) instead of a percentage:
salary = 963.191 + 1,850.1·roedec

Changing the units of measurement of the dependent or the explanatory variable simply rescales the estimated coefficients.


Semi-logarithmic Form
Incorporating nonlinearities: semi-logarithmic form

Regression of log wages (natural logarithm) on years of education:

log(wage) = β0 + β1·educ + u

This changes the interpretation of the regression coefficient: 100·β1 is the approximate percentage change of wage if years of education are increased by one year:

%Δwage ≈ 100·β1·Δeduc
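Note that 100·β1 is only an approximation; the exact percentage change implied by the log model is 100·(exp(β1) − 1). A quick check with the coefficient of 0.083 from the wage example below:

```python
import math

b1 = 0.083  # semi-log coefficient on educ (from the wage1 example)

# Approximate interpretation: one more year of education -> ~8.3 % higher wage
approx_pct = 100 * b1

# Exact percentage change implied by the log model
exact_pct = 100 * (math.exp(b1) - 1)
```

The approximation error is small for coefficients near zero and grows with |β1|.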


Example (wage1)

log(wage) c educ


Example (wage1)
Fitted regression: the coefficient on educ is 0.083.

The wage increases by 8.3 % for every additional year of education (= the return to education). In other words, the growth rate of the wage is 8.3 % per year of education.


Log-logarithmic Form
Incorporating nonlinearities: log-logarithmic form

CEO salary and firm sales (natural logarithm of CEO salary and of his/her firm's sales):

log(salary) = β0 + β1·log(sales) + u

This changes the interpretation of the regression coefficient: β1 measures the percentage change of salary if sales increase by 1 %. Logarithmic changes are always (approximate) percentage changes.
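The constant-elasticity interpretation follows from differencing the logs on both sides of the model:

```latex
\log(salary) = \beta_0 + \beta_1 \log(sales) + u
\quad\Longrightarrow\quad
\Delta \log(salary) = \beta_1 \, \Delta \log(sales)

\beta_1 = \frac{\Delta \log(salary)}{\Delta \log(sales)}
        \approx \frac{\%\Delta salary}{\%\Delta sales}
```

so β1 is the elasticity of salary with respect to sales.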


Example (ceosal1)

log(salary) c log(sales)


Example (ceosal1)
CEO salary and firm sales: the fitted regression has a slope of 0.257.

For example: +1 % sales → +0.257 % salary.

The log-log form postulates a constant-elasticity model, whereas the semi-log form assumes a semi-elasticity model.


Statistical Properties of OLS Estimators

Expected values and variances of the OLS estimators

The estimated regression coefficients are random variables because they are calculated from a random sample: the data are random and depend on the particular sample that has been drawn.

The question is what the estimators estimate on average, and how large their variability in repeated samples is.


Assumptions for OLS


Standard assumptions for the linear regression model

Assumption SLR.1 (Linear in parameters)

y = β0 + β1·x + u — in the population, the relationship between y and x is linear.

Assumption SLR.2 (Random sampling)

The data {(xi, yi): i = 1, …, n} are a random sample drawn from the population. Each data point therefore follows the population equation: yi = β0 + β1·xi + ui.


Discussion of Random Sampling

Discussion of random sampling: wage and education

The population consists, for example, of all workers of country A. In the population, a linear relationship between wages (or log wages) and years of education holds.

Draw a worker completely at random from the population. The wage and the years of education of the worker drawn are random, because one does not know beforehand which worker is drawn.

Throw the worker back into the population and repeat the random draw n times.

The wages and years of education of the sampled workers are used to estimate the linear relationship between wages and education.


Illustration

The values drawn for the i-th worker: (xi, yi)

The implied deviation from the population relationship for the i-th worker: ui = yi − β0 − β1·xi


Assumptions for OLS (Cont'd)

Assumptions for the linear regression model (cont.)

Assumption SLR.3 (Sample variation in explanatory variable)

The values of the explanatory variable are not all the same (otherwise it would be impossible to study how different values of the explanatory variable lead to different values of the dependent variable).

Assumption SLR.4 (Zero conditional mean)

E(u|x) = 0 — the value of the explanatory variable must contain no information about the mean of the unobserved factors.


Unbiasedness of OLS

Theorem 2.1 (Unbiasedness of OLS)

Under assumptions SLR.1 – SLR.4: E(β̂0) = β0 and E(β̂1) = β1.

Interpretation of unbiasedness:
The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw. However, on average, they will be equal to the values that characterize the true relationship between y and x in the population. "On average" means: if sampling were repeated, i.e. if the random sample draw and the estimation were repeated many times. In a given sample, estimates may differ considerably from the true values.
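Unbiasedness can be illustrated by a small Monte Carlo experiment: repeat the sample draw and the estimation many times and average the slope estimates. The true parameter values below are hypothetical and the data simulated:

```python
import random

random.seed(3)
beta0, beta1, n, reps = 1.0, 2.0, 50, 2000

# x values held fixed; only the errors (and hence y) change across samples
xs = [random.uniform(0, 10) for _ in range(n)]
xbar = sum(xs) / n
sstx = sum((x - xbar) ** 2 for x in xs)

estimates = []
for _ in range(reps):
    ys = [beta0 + beta1 * x + random.gauss(0, 1) for x in xs]
    ybar = sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sstx
    estimates.append(b1)

# Average slope estimate across repeated samples is close to beta1
mean_b1 = sum(estimates) / reps
```

Individual estimates scatter around 2.0, but their average is very close to it.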


Assumptions for OLS (Cont'd)

Variances of the OLS estimators:
Depending on the sample, the estimates will be nearer to or farther away from the true population values. How far can we expect our estimates to be from the true population values on average (= sampling variability)? Sampling variability is measured by the estimators' variances.

Assumption SLR.5 (Homoskedasticity)

Var(u|x) = σ² — the value of the explanatory variable must contain no information about the variability of the unobserved factors.


Homoskedasticity
Graphical illustration of homoskedasticity

The variability of the unobserved influences does not depend on the value of the explanatory variable.


Heteroskedasticity
An example of heteroskedasticity: wage and education

The variance of the unobserved determinants of wages increases with the level of education.


Variances of OLS Estimators


Theorem 2.2 (Variances of the OLS estimators)

Under assumptions SLR.1 – SLR.5:

Var(β̂1) = σ² / Σ (xi − x̄)²,  Var(β̂0) = σ²·(n⁻¹ Σ xi²) / Σ (xi − x̄)²

Conclusion:
The sampling variability of the estimated regression coefficients is larger the larger the variability of the unobserved factors, and smaller the larger the variation in the explanatory variable.
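The variance formula for the slope can be checked by simulation: compare the empirical variance of the slope estimates across repeated samples with σ² / SSTx (all parameter values hypothetical):

```python
import random

random.seed(4)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.0, 50, 4000

xs = [random.uniform(0, 10) for _ in range(n)]   # x fixed across samples
xbar = sum(xs) / n
sstx = sum((x - xbar) ** 2 for x in xs)

b1s = []
for _ in range(reps):
    ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]
    ybar = sum(ys) / n
    b1s.append(sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sstx)

mean_b1 = sum(b1s) / reps
emp_var = sum((b - mean_b1) ** 2 for b in b1s) / reps  # simulated Var(b1)
theo_var = sigma ** 2 / sstx                            # Theorem 2.2 formula
```

The simulated variance matches σ²/SSTx up to Monte Carlo noise.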


Error Variance
Estimating the error variance

Var(u|x) = Var(u) = σ² — the variance of u does not depend on x, i.e. it is equal to the unconditional variance.

One could estimate the variance of the errors by calculating the variance of the residuals in the sample; unfortunately, this estimate would be biased.

An unbiased estimate of the error variance can be obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated regression coefficients:

σ̂² = SSR / (n − 2)
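A sketch of the unbiased error-variance estimator SSR/(n − 2) and the resulting standard error of the slope, again on simulated data with hypothetical parameter values:

```python
import math
import random

random.seed(5)
beta0, beta1, sigma, n = 1.0, 2.0, 1.5, 500

xs = [random.uniform(0, 10) for _ in range(n)]
ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]

xbar, ybar = sum(xs) / n, sum(ys) / n
sstx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sstx
b0 = ybar - b1 * xbar
ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

sigma2_hat = ssr / (n - 2)             # unbiased: divide by n - 2, not n
se_b1 = math.sqrt(sigma2_hat / sstx)   # standard error of the slope
```

With σ = 1.5, the estimate σ̂² lands close to the true error variance of 2.25.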


Unbiasedness of the Error Variance

Theorem 2.3 (Unbiasedness of the error variance)

Under assumptions SLR.1 – SLR.5: E(σ̂²) = σ².

Calculation of standard errors for the regression coefficients: plug in σ̂² for the unknown σ², e.g.

se(β̂1) = σ̂ / √( Σ (xi − x̄)² )

The estimated standard deviations of the regression coefficients are called "standard errors". They measure how precisely the regression coefficients are estimated.
