2 Simple Regression Model
Ani Katchova
Terminology
Regression model: $y = \beta_0 + \beta_1 x + u$
y is the dependent variable, x is the independent variable (one independent variable for a simple regression), u is the error term, and $\beta_0$ and $\beta_1$ are parameters.
Simple regression model example

| Dependent variable y ($) | Indep. variable x (experience) | Predicted value $\hat{y} = 20 + 0.5x$ | Residual $\hat{u} = y - \hat{y}$ |
|---|---|---|---|
| 21 | 2 | 20 + 0.5(2) = 21 | 21 − 21 = 0 |
| 21 | 1 | 20 + 0.5(1) = 20.5 | 21 − 20.5 = 0.5 |

[Figure: "Simple regression: actual and predicted values" — scatter of the data points (y in $, x in years of experience) with the fitted line.]
Simple regression: actual values, predicted values, and residuals
The regression line fits as well as possible through the data points.
Interpretation of coefficients
$$\hat{\beta}_1 = \frac{\Delta y}{\Delta x} = \frac{\text{change in } y}{\text{change in } x}$$
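For example, using the fitted line $\hat{y} = 20 + 0.5x$ from the earlier example slide:

```latex
% If x increases by 2 units, the predicted y increases by:
\[
\Delta \hat{y} = \hat{\beta}_1 \, \Delta x = 0.5 \times 2 = 1
\]
```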
Population regression function
Population regression function:
$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x$$
if $E(u \mid x) = 0$ (this assumption is called zero conditional mean).
For the population, the average value of the dependent variable can be expressed as a linear function of the independent variable.
Population regression function
• The population regression function shows the relationship between y and x for the population.
Population regression function
• For individuals with a particular x, the average value of y is $E(y \mid x) = \beta_0 + \beta_1 x$.
• Note that $x_1, x_2, x_3$ here refer to values of $x_i$, not to different variables.
Derivation of the OLS estimates
• For a regression model: $y = \beta_0 + \beta_1 x + u$
• We need to estimate the regression equation $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and find the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ by looking at the residuals
• $\hat{u} = y - \hat{y} = y - \hat{\beta}_0 - \hat{\beta}_1 x$
• Obtain a random sample of data with n observations $(x_i, y_i)$, where $i = 1, \dots, n$ indexes the observations
• The goal is to obtain as good a fit as possible of the estimated regression equation
Derivation of the OLS estimates
• Minimize the sum of squared residuals:
$$\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2$$
We obtain the OLS coefficients:
$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{cov(x, y)}{var(x)} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
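A minimal sketch of these formulas in Python (the data below is illustrative, not from the slides; numpy assumed available):

```python
import numpy as np

# Illustrative sample data (any (x, y) pairs work)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.5, 21.0, 21.6, 21.9, 22.6])

# OLS slope: sum of cross-deviations over sum of squared deviations of x
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# OLS intercept: the regression line passes through the point of means
beta0_hat = y.mean() - beta1_hat * x.mean()

print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")
```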
OLS properties
$$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$$
The sample averages of the dependent and independent variables are on the regression line.
$$\sum_{i=1}^{n} \hat{u}_i = 0$$
The residuals sum up to zero (note that we minimized the sum of squared residuals).
$$\sum_{i=1}^{n} x_i \hat{u}_i = 0$$
The covariance between the independent variable and the residuals is zero.
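A quick numerical check of all three properties, using the same illustrative data as the sketch above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.5, 21.0, 21.6, 21.9, 22.6])

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
u_hat = y - (beta0_hat + beta1_hat * x)  # residuals

# Property 1: the point of means lies on the regression line
assert np.isclose(y.mean(), beta0_hat + beta1_hat * x.mean())
# Property 2: the residuals sum to zero
assert np.isclose(u_hat.sum(), 0)
# Property 3: zero sample covariance between x and the residuals
assert np.isclose((x * u_hat).sum(), 0)
```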
Simple regression example: CEO's salary
Simple regression model explaining how return on equity (roe) affects CEO's salary.
Regression model
$$salary = \beta_0 + \beta_1 \, roe + u$$
Residuals
$$\hat{u} = salary - \widehat{salary}$$
We estimate the regression model to find the coefficients.
$\hat{\beta}_1$ measures the change in the CEO's salary associated with a one-unit increase in roe, holding other factors fixed.
Estimated equation and interpretation
• Estimated equation:
$$\widehat{salary} = \hat{\beta}_0 + \hat{\beta}_1 \, roe = 963.191 + 18.501 \, roe$$
• Salary is measured in thousand dollars; ROE (return on equity) is measured in %.
• $\hat{\beta}_1$ measures the change in the CEO's salary associated with a one-unit increase in roe, holding other factors fixed.
• Interpretation of $\hat{\beta}_1$: the CEO's salary increases by $18,501 for each 1% increase in ROE.
• Interpretation of $\hat{\beta}_0$: if the ROE is zero, the CEO's salary is $963,191.
Stata output for simple regression
. regress salary roe

| Variable | Coefficient (std. err.) |
|---|---|
| roe | 18.50* (11.12) |
| Constant | 963.2*** (213.2) |
| Observations | 209 |
| R-squared | 0.013 |

Standard errors in parentheses; stars denote significance levels.
Regression line for the sample vs. population regression function for the population
Estimated regression
[Figure: two panels plotting 1990 salary (0–15000, thousands $) against return on equity, 88-90 avg (0–60), showing the fitted regression line with true values, predicted values, and residuals marked.]
Actual, predicted values, and residuals

| roe | salary | predicted $\widehat{salary} = 963.191 + 18.501\, roe$ | residual $\hat{u} = salary - \widehat{salary}$ |
|---|---|---|---|
| 14.1 | 1095 | 1224 | -129 |
| 10.9 | 1001 | 1165 | -164 |
| 23.5 | 1122 | 1398 | -276 |
| 5.9 | 578 | 1072 | -494 |
| 13.8 | 1368 | 1219 | 149 |
| 20 | 1145 | 1333 | -188 |
| 16.4 | 1078 | 1267 | -189 |
| 16.3 | 1094 | 1265 | -171 |
| 10.5 | 1237 | 1157 | 80 |
| 26.3 | 833 | 1450 | -617 |

The mean salary is 1,281 ($1,281,000). The mean predicted salary is also 1,281. The mean of the residuals is zero.
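A sketch reproducing this table in Python from the ten observations shown (pandas assumed available):

```python
import pandas as pd

# Ten observations from the slide (roe in %, salary in thousand $)
df = pd.DataFrame({
    "roe":    [14.1, 10.9, 23.5, 5.9, 13.8, 20.0, 16.4, 16.3, 10.5, 26.3],
    "salary": [1095, 1001, 1122, 578, 1368, 1145, 1078, 1094, 1237, 833],
})

# Predicted values from the estimated equation and the implied residuals
df["salary_hat"] = 963.191 + 18.501 * df["roe"]
df["u_hat"] = df["salary"] - df["salary_hat"]

print(df.round(0))
# Note: the residuals average to zero over the full 209-observation
# estimation sample, not necessarily over this 10-row excerpt.
print("mean residual in excerpt:", round(df["u_hat"].mean(), 1))
```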
Simple regression example: wage
Simple regression model explaining how education affects wages for workers.
Regression model
$$wage = \beta_0 + \beta_1 \, educ + u$$
Residuals
$$\hat{u} = wage - \widehat{wage}$$
We estimate the regression model to find the coefficients.
$\hat{\beta}_1$ measures the change in wage associated with one more year of education, holding other factors fixed.
Estimated equation and interpretation
• Estimated equation:
$$\widehat{wage} = \hat{\beta}_0 + \hat{\beta}_1 \, educ = -0.90 + 0.54 \, educ$$
• Wage is measured in $/hour. Education is measured in years.
• $\hat{\beta}_1$ measures the change in a person's wage associated with one additional year of education, holding other factors fixed.
• Interpretation of $\hat{\beta}_1$: the hourly wage increases by $0.54 for each additional year of education.
• Interpretation of $\hat{\beta}_0$: if education is zero, a person's wage is -$0.90 (but no one in the sample has zero education).
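For example, the predicted hourly wage for a worker with 12 years of education is:

```latex
\[
\widehat{wage} = -0.90 + 0.54 \times 12 = 5.58 \ \text{\$/hour}
\]
```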
Stata output for simple regression
. reg wage educ
Variations
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$$
SST = SSE + SSR
• SST is the total sum of squares and measures the total variation in the dependent variable
• SSE is the explained sum of squares and measures the variation explained by the regression
• SSR is the residual sum of squares and measures the variation not explained by the regression
Note: some call SSE the error sum of squares and SSR the regression sum of squares, where R and E are confusingly reversed.
Goodness of fit measure
R-squared
• $R^2 = SSE/SST = 1 - SSR/SST$
• R-squared is the explained sum of squares divided by the total sum of squares.
• R-squared is a goodness of fit measure. It measures the proportion of total variation that is explained by the regression.
• An R-squared of 0.7 is interpreted as 70% of the variation being explained by the regression, with the rest due to error.
• An R-squared greater than 0.25 is considered a good fit.
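A minimal sketch of the decomposition and R-squared in Python, reusing the illustrative data from the earlier sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.5, 21.0, 21.6, 21.9, 22.6])

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()
y_hat = beta0_hat + beta1_hat * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
sse = np.sum((y_hat - y.mean()) ** 2)  # explained variation
ssr = np.sum((y - y_hat) ** 2)         # unexplained variation

assert np.isclose(sst, sse + ssr)      # SST = SSE + SSR
r_squared = sse / sst                  # equivalently 1 - ssr / sst
print(f"R-squared = {r_squared:.3f}")
```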
R-squared calculated
. reg wage educ
Log-linear form (also called semi-log)
• Linear regression model: $y = \beta_0 + \beta_1 x + u$
• Log-linear form: $\log(y) = \beta_0 + \beta_1 x + u$
• Instead of the dependent variable, use the log of the dependent variable.
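In this form the slope has an approximate percentage interpretation (a standard result, consistent with the interpretations on the following slides):

```latex
\[
\%\Delta y \approx 100 \cdot \beta_1 \cdot \Delta x
\]
```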
Linear vs log-linear form
[Figure: two panels plotting wage (left, 0–25) and log(wage) (right, -1 to 3) against educ (0–20), each with its fitted line.]
Linear form: wage increases by $0.54 for each additional year of education.
Log-linear form: wage increases by 8.2% for each additional year of education.
Example of data with logs

| Salary (thousand dollars) | lsalary | Sales (million dollars) | lsales |
|---|---|---|---|
| 1095 | 7.0 | 27595 | 10.2 |
| 1001 | 6.9 | 9958 | 9.2 |
| 1122 | 7.0 | 6126 | 8.7 |
| 578 | 6.4 | 16246 | 9.7 |
| 1368 | 7.2 | 21783 | 10.0 |
| 1145 | 7.0 | 6021 | 8.7 |
| 1078 | 7.0 | 2267 | 7.7 |
| 1094 | 7.0 | 2967 | 8.0 |
| 1237 | 7.1 | 4570 | 8.4 |
| 833 | 6.7 | 2830 | 7.9 |

Note that one unit is a thousand dollars for salary and a million dollars for sales.
Linear vs log-log form
[Figure: left panel plots 1990 salary, thousands $ (0–15000) against 1990 firm sales, millions $ (0–100000) with fitted values; right panel plots natural log of salary (5–10) against natural log of sales (4–12) with fitted values.]
Linear form: salary on sales. Log-log form: log salary on log sales.
Log-linear vs linear-log form
[Figure: left panel plots natural log of salary against sales with fitted values; right panel plots 1990 salary, thousands $ against the log of sales with fitted values.]
Log-linear form: log salary on sales. Linear-log form: salary on log sales.
Interpretation of coefficients

| | Linear | Log-log | Log-linear | Linear-log |
|---|---|---|---|---|
| Dependent variable | salary | lsalary | lsalary | salary |

Linear form: salary increases by 0.155 thousand dollars ($155) for each additional one million dollars in sales.
Log-log form: salary increases by 0.25% for every 1% increase in sales.
Log-linear form: salary increases by 0.0015% (= 0.000015 × 100) for each additional one million dollar increase in sales.
Linear-log form: salary increases by 2.629 (= 262.9/100) thousand dollars for each additional 1% increase in sales.
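The general interpretation rules behind these four statements (standard results, where $\beta_1$ is the slope coefficient):

```latex
\begin{align*}
\text{Linear:}\quad     & y = \beta_0 + \beta_1 x             & \Delta y &= \beta_1 \,\Delta x \\
\text{Log-log:}\quad    & \log(y) = \beta_0 + \beta_1 \log(x) & \%\Delta y &\approx \beta_1 \,\%\Delta x \\
\text{Log-linear:}\quad & \log(y) = \beta_0 + \beta_1 x       & \%\Delta y &\approx 100\,\beta_1 \,\Delta x \\
\text{Linear-log:}\quad & y = \beta_0 + \beta_1 \log(x)       & \Delta y &\approx (\beta_1/100)\,\%\Delta x
\end{align*}
```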
Review questions
1. Define regression model, estimated equation, and residuals.
2. What method is used to obtain the coefficients?
3. What are the OLS properties?
4. How is R-squared defined and what does it measure?
5. By taking logs of the variables, how does the interpretation of coefficients change?
Gauss-Markov assumptions
• The Gauss-Markov assumptions are standard assumptions for the linear regression model:
1. Linearity in parameters
2. Random sampling
3. No perfect collinearity (for simple regression: there is sample variation in the independent variable)
4. Exogeneity or zero conditional mean – regressors are not correlated with the error term
5. Homoscedasticity – the variance of the error term is constant
Assumption 1: linearity in parameters
$$y = \beta_0 + \beta_1 x + u$$
Assumption 2: random sampling
$$(x_i, y_i), \text{ where } i = 1, \dots, n$$
• The data are a random sample drawn from the population.
• Each observation follows the population equation $y = \beta_0 + \beta_1 x + u$.
• Example: data on workers (y = wage, x = education).
• The population is all workers in the U.S. (150 million).
• The sample is the workers selected for the study (1,000).
• Drawing randomly from the population means each worker has an equal probability of being selected.
• For example, if young workers are oversampled, this will not be a random/representative sample.
Assumption 3: no perfect collinearity
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$$
The independent variable must vary in the sample (nonzero sample variance of x).
Assumption 4: zero conditional mean (exogeneity)
$$E(u_i \mid x_i) = 0$$
• The expected value of the error term u given the independent variable x is zero.
• The expected value of the error must not differ based on the values of the independent variable.
• The errors must average out to zero for each value of x.
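An informal diagnostic sketch in the spirit of this assumption, using simulated data where the assumption holds by construction (not a formal test):

```python
import numpy as np

rng = np.random.default_rng(0)
educ = rng.integers(10, 19, size=2000).astype(float)  # years of education
u = rng.normal(0, 2, size=2000)                       # error with mean 0 given educ
wage = -0.90 + 0.54 * educ + u                        # population equation

# Fit OLS and compute residuals
b1 = np.sum((educ - educ.mean()) * (wage - wage.mean())) / np.sum((educ - educ.mean()) ** 2)
b0 = wage.mean() - b1 * educ.mean()
u_hat = wage - (b0 + b1 * educ)

# The residuals average to roughly zero at each education level
for level in np.unique(educ):
    print(int(level), round(u_hat[educ == level].mean(), 3))
```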
Example of zero conditional mean
Regression model
$$wage = \beta_0 + \beta_1 \, educ + u$$
Example of exogeneity vs endogeneity
[Figure: two panels plotting residuals against educ (10–18). Left panel (exogeneity, zero conditional mean): residuals centered on zero at every education level. Right panel (endogeneity, conditional mean not zero): modified residuals (uhat_modified) that rise with education.]
E(u|x) = 0: the error term has the same mean given education. E(u|x) > 0: ability/error is higher when education is higher.
Unbiasedness of the OLS estimators
• Gauss-Markov assumptions 1-4 (linearity, random sampling, no perfect collinearity, and zero conditional mean) lead to the unbiasedness of the OLS estimators:
$$E(\hat{\beta}_0) = \beta_0 \quad \text{and} \quad E(\hat{\beta}_1) = \beta_1$$
• The expected values of the sample coefficients $\hat{\beta}$ are the population parameters $\beta$.
• If we estimate the regression model with many random samples, the average of these coefficients will be the population parameter.
• For a given sample, the coefficients may be very different from the population parameters.
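A Monte Carlo sketch of this idea in Python (the population parameters $\beta_0 = 1$ and $\beta_1 = 2$ are chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1 = 1.0, 2.0  # population parameters (illustrative)

estimates = []
for _ in range(5000):  # estimate on many random samples
    x = rng.normal(5, 2, size=100)
    u = rng.normal(0, 1, size=100)  # E(u|x) = 0 holds by construction
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    estimates.append(b1)

# The average of the estimated slopes is close to the population slope
print(np.mean(estimates))  # approximately 2.0
```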
Assumption 5: homoscedasticity
• Homoscedasticity: $var(u_i \mid x_i) = \sigma^2$
• The variance of the error term $u$ must not differ with the independent variable $x$.
• Heteroscedasticity, $var(u_i \mid x_i) \neq \sigma^2$, is when the variance of the error term $u$ is not constant for each $x$.
Homoscedasticity vs heteroscedasticity
Homoscedasticity: $var(u \mid x) = \sigma^2$. Heteroscedasticity: $var(u \mid x) \neq \sigma^2$.
[Figure: two panels plotting residuals against educ. Left panel (homoscedasticity): residual spread is constant across education levels. Right panel (heteroscedasticity): residual spread widens as education increases.]
Unbiasedness of the error variance
We can estimate the variance of the error term as:
$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2$$
• The degrees of freedom (n-k-1) are corrected for the number of independent variables, k = 1.
Variances of the OLS estimators
$$var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x}$$
$$se(\hat{\beta}_1) = \sqrt{\widehat{var}(\hat{\beta}_1)} = \sqrt{\frac{\hat{\sigma}^2}{SST_x}}$$
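A sketch of these two estimators in Python, again with the illustrative data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.5, 21.0, 21.6, 21.9, 22.6])
n = len(x)

sst_x = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sst_x
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)  # error variance, df = n - 2
se_b1 = np.sqrt(sigma2_hat / sst_x)        # standard error of the slope
print(f"sigma2_hat = {sigma2_hat:.4f}, se(b1) = {se_b1:.4f}")
```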
Review questions
1. List and explain the 5 Gauss-Markov assumptions.
2. Which assumptions are needed for the unbiasedness of the coefficients?
3. Which assumptions are needed to calculate the variance of the OLS coefficients?
4. Is it possible to have zero conditional mean but heteroscedasticity?