Lec Topic2
September, 2024
1 / 68
Definition of the Simple Regression Model
2 / 68
Definition of the Simple Regression Model
y = β0 + β1 x + u
3 / 68
Interpretation of the SLR Model
▶ The SLR model tries to “explain variable y in terms of variable x” or
“study how y varies with changes in x”:
    dy/dx = ∂y/∂x + (∂y/∂u)·(∂u/∂x) = β1 + ∂u/∂x,
where ∂y/∂x = β1 is the ceteris paribus effect.
▶ From the definition, the causal effect of x on y is ∂y/∂x = β1, which is
equal to dy/dx only if ∂u/∂x = 0. That is, dy/dx captures the causal effect
of x on y only if ∂u/∂x = 0.
▶ The simple linear regression model is rarely applicable in practice
but its discussion is useful for pedagogical reasons.
4 / 68
Two SLR Examples
yield = β0 + β1 fertilizer + u,
wage = β0 + β1 educ + u,
5 / 68
When Can We Estimate a Causal Effect?
▶ When ∂u/∂x = 0, dy/dx has a causal interpretation for each individual.
However, very often in practice, we are not able to estimate the
individual-specific causal effect.
Technically, because we usually can observe only one pair (x, y)
for each individual, we cannot identify the individual causal effect,
which requires y values for at least two x values.
▶ We therefore explore the change in E[y|x] in response to a change
in x, and discuss when this has a causal interpretation.
6 / 68
When Can We Estimate a Causal Effect?
E [u|x ] = 0
7 / 68
When Can We Estimate a Causal Effect?
E[y|x] = E[β0 + β1 x + u | x]
       = β0 + β1 x + E[u|x]   (1)
       = β0 + β1 x
8 / 68
26 Part 1 Regression Analysis with Cross-Sectional Data
Figure 2.1: E(y|x) = β0 + β1 x as a linear function of x.
For example, suppose that x is the high school grade point average and y is the college
GPA, and we happen to know that E(colGPA|hsGPA) = 1.5 + 0.5 hsGPA. (Of course,
in practice we never know the population intercept and slope, but it is useful to pretend
that we do.)
9 / 68
Exercise
10 / 68
Solution
▶ We want to estimate the following wage equation
wage = β0 + β1 educ + u,
Solution:
dE[wage|educ]/deduc = E[wage|educ = 1] − E[wage|educ = 0]
                    = E[β0 + β1·educ + u | educ = 1] − E[β0 + β1·educ + u | educ = 0]
                    = (β0 + β1) + E[u|educ = 1] − β0 − E[u|educ = 0]
                    = (β0 + β1) + c − β0 − c = β1,
assuming E[u|educ] = c for some constant c (i.e., the mean of u does not depend on educ).
12 / 68
Deriving the Ordinary Least Squares Estimates
13 / 68
Deriving the Ordinary Least Squares Estimates
A Random Sample
    yi = β0 + β1 xi + ui
14 / 68
A Random Sample
There are several ways to motivate the following estimation procedure. We will use
(2.5) and an important implication of assumption (2.6): in the population, u is uncorrelated
with x. Therefore, u has zero expected value and the covariance between x and u is zero:
    E(u) = 0    [2.10]
▶ Savings and income of a random sample of 15 families, and the
population regression E[savings|income] = β0 + β1 income.
Figure 2.2: Scatterplot of savings and income for 15 families, and the population
regression E(savings|income) = β0 + β1 income.
15 / 68
Ordinary Least Squares (OLS) Estimation
▶ Let (β̂0 , β̂1 ) denote the estimates of the parameters (β0 , β1 ).
16 / 68
Ordinary Least Squares (OLS) Estimation
▶ Differentiate ∑_{i=1}^n (yi − β̂0 − β̂1 xi)² with respect to β̂0:
    ∑_{i=1}^n −2(yi − β̂0 − β̂1 xi) = 0
17 / 68
Ordinary Least Squares (OLS) Estimation
▶ Differentiate ∑_{i=1}^n (yi − β̂0 − β̂1 xi)² with respect to β̂1:
    ∑_{i=1}^n −2xi(yi − β̂0 − β̂1 xi) = 0
18 / 68
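Solving these two first-order conditions jointly yields the familiar closed-form OLS estimates. A minimal numerical sketch (the data vectors here are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical sample data (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Solving the two first-order conditions gives:
#   beta1_hat = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
#   beta0_hat = ybar - beta1_hat * xbar
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Cross-check against numpy's least-squares fit (highest degree first)
slope, intercept = np.polyfit(x, y, deg=1)
```

Both routes give identical estimates, because `polyfit` with `deg=1` solves the same least-squares problem.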
Ordinary Least Squares (OLS) Estimation
19 / 68
Ordinary Least Squares (OLS) Estimation
Figure 2.4: Fitted values and residuals — ûi is the residual for observation i, and the fitted (OLS) line is ŷ = β̂0 + β̂1 x.
20 / 68
Example: CEO Salary and Return on Equity
▶ Suppose the SLR model is
salary = β0 + β1 roe + u
    salaryˆ = 963.191 + 18.501 roe
▶ β̂1 = 18.501 =⇒ if the return on equity increases by one
percentage point, then salary is predicted to increase by $18,501.
▶ β̂0 = 963.191 =⇒ even if roe = 0, the predicted salary of
CEO is $963,191.
▶ Causal Interpretation of β̂1 ? Think about what factors are included
in u, and whether Cov (x , u) = 0.
21 / 68
Example: Voting Outcomes and Campaign Expenditures
voteA = β0 + β1 shareA + u
    voteAˆ = 26.81 + 0.464 shareA
▶ β̂1 = 0.464 =⇒ if candidate A’s share of spending increases
by one percentage point, he or she receives 0.464 (about one
half) percentage points more of the total vote.
▶ β̂0 = 26.81 =⇒ if candidate A spends nothing on the
campaign, then he or she is predicted to receive about 26.81% of the
total vote.
22 / 68
Figure: E(salary|roe) = β0 + β1 roe; the fitted intercept is 963.191.
▶ The OLS regression line is also called the sample regression function
(SRF). The PRF is something fixed but unknown, while the SRF
changes with each realized sample.
Example 2.4 (Wage and Education): For the population of people in the workforce in 1976, let y = wage, where wage is
measured in dollars per hour. Thus, for a particular person, if wage = 6.75, the hourly
wage is $6.75. Let x = educ denote years of schooling; for example, educ = 12
corresponds to a complete high school education. Since the average wage in the sample is
$5.90, the Consumer Price Index indicates that this amount is equivalent to $19.06 in
2010 dollars.
23 / 68
Exercise
The following table contains the ACT scores and the GPA for eight
college students.
And x̄ = 25.875, ȳ = 3.2125, ∑_{i=1}^n (xi − x̄)(yi − ȳ) = 5.8125, and
24 / 68
Exercise
25 / 68
Solution
26 / 68
Properties of OLS on Any Sample of Data
27 / 68
Algebraic Properties of OLS Statistics
▶ ∑_{i=1}^n ûi = 0: some residuals are positive and others are negative, so
the fitted regression line lies in the middle of the data points.
  – Since the residuals average to zero, the sample average of the
    fitted values, ŷi, is the same as the sample average of the yi.
  – ȳ = β̂0 + β̂1 x̄ : the fitted regression line passes through (x̄, ȳ).
▶ ∑_{i=1}^n xi ûi = 0: the sample covariance between the regressor and
the residuals is zero,
    Ĉov(x, û) = (1/n) ∑_{i=1}^n (xi − x̄)(ûi − ū̂) = (1/n) ∑_{i=1}^n xi ûi = 0.
28 / 68
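These algebraic properties hold exactly (up to floating-point error) in any sample, so they are easy to check numerically. A small sketch with simulated data (the data-generating parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)  # arbitrary DGP for illustration

# OLS estimates from the closed-form formulas
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x        # fitted values
u_hat = y - y_hat          # residuals

sum_resid = u_hat.sum()              # should be ~0
sum_x_resid = (x * u_hat).sum()      # should be ~0
mean_gap = y_hat.mean() - y.mean()   # should be ~0 (line passes through means)
```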
Measures of Variation
▶ SST: the total sum of squares measures the total amount of variability
in the dependent variable,
    SST = ∑_{i=1}^n (yi − ȳ)²
29 / 68
Measures of Variation
Sum of Squares: Total Prediction Errors
[Figure: scatterplot of Log GDP per capita growth, marking each observation's deviation from the sample mean (the total prediction errors).]
30 / 68
Measures of Variation
Sum of Squares: Residuals
[Figure: the same scatterplot of Log GDP per capita growth, now marking each observation's deviation from the fitted regression line (the residuals).]
31 / 68
Measures of Variation
▶ It can be shown that SST = SSE + SSR:
    SST = ∑_{i=1}^n (yi − ȳ)²
        = ∑_{i=1}^n [(yi − ŷi) + (ŷi − ȳ)]²
        = ∑_{i=1}^n [ûi + (ŷi − ȳ)]²
        = ∑_{i=1}^n ûi² + 2 ∑_{i=1}^n ûi(ŷi − ȳ) + ∑_{i=1}^n (ŷi − ȳ)²
        = SSR + 2 ∑_{i=1}^n ûi(ŷi − ȳ) + SSE
        = SSR + SSE,
where ∑_{i=1}^n ûi(ŷi − ȳ) = ∑_{i=1}^n ûi ŷi − ȳ ∑_{i=1}^n ûi = 0 by the
algebraic properties of OLS.
32 / 68
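The decomposition can be verified on any dataset, since the cross term vanishes by the residual properties of OLS. A sketch with simulated data (arbitrary DGP):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = 3.0 - 0.5 * x + rng.normal(size=40)  # arbitrary DGP for illustration

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)
SSE = np.sum((y_hat - y.mean()) ** 2)           # explained sum of squares
SSR = np.sum(u_hat ** 2)                        # residual sum of squares
cross = 2 * np.sum(u_hat * (y_hat - y.mean()))  # vanishes for OLS residuals
```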
R-squared
33 / 68
R-squared
[Figure: two panels of simulated data with fitted regression lines, illustrating different degrees of fit; axis details not recoverable.]
34 / 68
Two Examples of R-Squared
▶ CEO Salary and Return on Equity:
    salaryˆ = 963.191 + 18.501 roe
n = 209, R 2 = 0.0132
The regression explains only 1.3% of the total variation in salaries.
▶ Voting Outcomes and Campaign Expenditures:
    voteAˆ = 26.81 + 0.464 shareA
n = 173, R 2 = 0.856
The regression explains 85.6% of the total variation in election
outcomes.
35 / 68
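In the simple regression model, R² = SSE/SST = 1 − SSR/SST also equals the squared sample correlation between y and ŷ, which gives a quick consistency check. A sketch with simulated data (arbitrary DGP):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(size=80)
y = 0.5 + 1.5 * x + 0.3 * rng.normal(size=80)  # arbitrary DGP

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y - y_hat) ** 2)
r2_sums = 1 - SSR / SST                     # from the sums of squares
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2  # squared correlation of y and y-hat
```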
Units of Measurement and Functional Form
36 / 68
Changing Units of Measurement
▶ Data scaling: measuring wage in dollars versus cents,
    wageˆ_dollars = (1/100) · wageˆ_cents
37 / 68
Changing Units of Measurement
▶ Substitute:
    (1/100) · wageˆ_cents = β̂0 + β̂1 educ_years
    wageˆ_cents = 100 β̂0 + 100 β̂1 educ_years
  =⇒ the estimates of β0 and β1 are both scaled by 100.
▶ What if we want to measure educ in months?
    educ_years = (1/12) educ_months
▶ Substitute:
    wageˆ_dollars = β̂0 + β̂1 · (1/12) educ_months = β̂0 + (β̂1/12) educ_months
  =⇒ the estimate of β1 is scaled by 1/12.
38 / 68
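The coefficient rescalings above are exact algebraic identities, so re-running OLS on rescaled data must reproduce them. A sketch (variable names and data-generating parameters are hypothetical):

```python
import numpy as np

def ols(x, y):
    """Return (intercept, slope) from simple OLS."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(3)
educ_years = rng.uniform(8, 20, size=60)
wage_dollars = -5.0 + 1.2 * educ_years + rng.normal(size=60)  # hypothetical DGP

b0_d, b1_d = ols(educ_years, wage_dollars)
b0_c, b1_c = ols(educ_years, 100 * wage_dollars)  # wage in cents: both coefs x100
_, b1_m = ols(12 * educ_years, wage_dollars)      # educ in months: slope x(1/12)
```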
Incorporating Nonlinearities in Simple Regression
39 / 68
Log-level Model
▶ Regression of log wages on years of education:
    ln(wage) = β0 + β1 educ + u
    ln(wage)ˆ = 0.584 + 0.083 educ,
which implies
    wageˆ ≈ e^(0.584 + 0.083·educ).
▶ For example, if the current wage is $10 per hour, then
educ = (ln(10) − 0.584)/0.083. Suppose education is increased by one year.
Then:
    ∆wage = exp(0.584 + 0.083·[(ln(10) − 0.584)/0.083 + 1]) − 10 = 10·e^0.083 − 10 ≈ 0.865,
which is close to the linear approximation
    (∆wage/wage)/∆educ ≈ (+$0.83/$10)/(+1 year) = 0.083.
41 / 68
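The arithmetic in this example is easy to reproduce: the exact dollar change from one more year of education is 10·e^0.083 − 10, while the log approximation gives β̂1 × $10. A sketch using the slide's fitted coefficients:

```python
import numpy as np

b0, b1 = 0.584, 0.083                       # fitted log-level coefficients (from the slide)
educ0 = (np.log(10) - b0) / b1              # education level implying a $10 wage
exact = np.exp(b0 + b1 * (educ0 + 1)) - 10  # exact change: 10*e^0.083 - 10
approx = b1 * 10                            # linear approximation: 8.3% of $10
```

The exact change (≈ $0.865) and the approximation ($0.83) differ slightly because the log approximation is only first-order in β1.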
Example: A Log Wage Equation
▶ Using the same data as in Example 2.4, but with log(wage) as the
dependent variable, we obtain the following relationship:
    log(wage)ˆ = 0.584 + 0.083 educ    [2.44]
▶ When the wage level is higher, the increase in wage for one more
year of education is larger, but the percentage increase in wage is the
same.
42 / 68
Log-log Model
▶ CEO Salary and Firm Sales:
ln(salary ) = β0 + β1 ln(sales) + u
▶ By contrast, in the log-level wage model the elasticity is not constant:
    elasticity = ∂ln(wage)/∂ln(educ) = ∂ln(wage)/(∂educ/educ) = β1·educ,
43 / 68
Log-log Model
    ln(salary)ˆ = 4.822 + 0.257 ln(sales),
which implies that a 1% increase in sales is predicted to increase CEO
salary by about 0.257%.
44 / 68
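In the log-log model, β1 is a constant elasticity: a 1% increase in sales raises predicted salary by roughly 0.257% at any sales level. A sketch using the fitted coefficients (the sales values are hypothetical):

```python
import numpy as np

b0, b1 = 4.822, 0.257  # fitted log-log coefficients (from the slide)

def pct_salary_change(sales):
    """Predicted % change in salary when sales rise by 1% from `sales`."""
    s0 = np.exp(b0 + b1 * np.log(sales))
    s1 = np.exp(b0 + b1 * np.log(1.01 * sales))
    return 100 * (s1 / s0 - 1)

low = pct_salary_change(1_000.0)    # hypothetical small firm
high = pct_salary_change(50_000.0)  # hypothetical large firm
```

Both calls return the same percentage change, which is the point: the elasticity does not depend on the level of sales.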
Summary of Functional Forms Involving Logarithms
▶ For most applications, choosing a model that can be put into the
linear regression framework is sufficient.
46 / 68
Expected Values and Variances of the OLS
Estimators
47 / 68
Statistical Properties of OLS Estimators
▶ Recall that the OLS estimator for β0 and β1 are:
    β̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²   and   β̂0 = ȳ − β̂1 x̄
    E[β̂0] = ?, E[β̂1] = ?, and Var(β̂0) = ?, Var(β̂1) = ?
48 / 68
Standard Assumptions for the SLR Model
y = β0 + β1 x + u.
y i = β0 + β1 x i + u i .
49 / 68
Discussion of Random Sampling: Wage and Education
▶ The wage and the years of education of the worker drawn are
random because one does not know beforehand which worker is
drawn.
▶ Throw back worker into population and repeat random draw n
times.
▶ The wages and years of education of the sampled workers are used
to estimate the linear relationship between wages and education.
50 / 68
Standard Assumptions for the SLR Model
Figure 2.7: Graph of yi = β0 + β1 xi + ui — the PRF is E(y|x) = β0 + β1 x, and each observation yi deviates from it by its error ui.
51 / 68
Standard Assumptions for the SLR Model
– The values of the explanatory variable are not all the same
  (otherwise it would be impossible to study how much the
  dependent variable changes when the explanatory variable
  changes by one unit, β1).
– Note that ∑_{i=1}^n (xi − x̄)² is the denominator of β̂1. If
  ∑_{i=1}^n (xi − x̄)² = 0, β̂1 is not defined.
52 / 68
Standard Assumptions for the SLR Model
▶ For example, if y = wage and x = educ, then (2.18) fails only if everyone
in the sample has the same amount of education (for example, if everyone is a high school
graduate; see Figure 2.3). If just one person has a different amount of education, then
(2.18) holds, and the estimates can be computed.
53 / 68
Unbiasedness of OLS
▶ Theorem: Under assumptions SLR.1-SLR.4, E[β̂0] = β0 and E[β̂1] = β1;
that is, the OLS estimators are unbiased.
54 / 68
Unbiasedness of OLS: Proof
▶ We condition on {xi, i = 1, ..., n}, i.e., the x values can be treated as
fixed. Now, the only randomness is from {ui, i = 1, ..., n}. Note that
    β̂1 − β1 = ∑_{i=1}^n (xi − x̄)yi / ∑_{i=1}^n (xi − x̄)² − β1
             = ∑_{i=1}^n (xi − x̄)(β0 + β1 xi + ui) / ∑_{i=1}^n (xi − x̄)² − β1      (by SLR.1, SLR.2)
             = β0 ∑_{i=1}^n (xi − x̄) / ∑_{i=1}^n (xi − x̄)²
               + β1 ∑_{i=1}^n (xi − x̄)xi / ∑_{i=1}^n (xi − x̄)²
               + ∑_{i=1}^n (xi − x̄)ui / ∑_{i=1}^n (xi − x̄)² − β1
             = ∑_{i=1}^n (xi − x̄)ui / ∑_{i=1}^n (xi − x̄)²,
since ∑_{i=1}^n (xi − x̄) = 0 and ∑_{i=1}^n (xi − x̄)xi = ∑_{i=1}^n (xi − x̄)².
55 / 68
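The proof shows that β̂1 − β1 is a weighted average of the ui, so E[β̂1] = β1 when E[u|x] = 0. A Monte Carlo sketch (the DGP parameters are arbitrary; x is held fixed across replications, matching the conditioning argument):

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, n = 1.0, 2.0, 30
x = rng.uniform(0, 5, size=n)  # treated as fixed across replications
sstx = np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(5000):
    u = rng.normal(size=n)     # E[u|x] = 0 holds by construction
    y = beta0 + beta1 * x + u
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sstx
    estimates.append(b1)

avg_b1 = np.mean(estimates)    # should be close to beta1 = 2.0
```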
Unbiasedness of OLS: Proof
56 / 68
Variances of the OLS Estimators
57 / 68
The Same Mean But Different Dispersion
Figure: Random Variables with the Same Mean BUT Different Distributions — conditional densities f(y|x).
59 / 68
Heteroskedasticity
▶ When Var (ui |xi ) depends on xi , the error term is said to exhibit
heteroskedasticity.
Figure 2.9: Var(wage|educ) increasing with educ — the conditional densities f(wage|educ) become more spread out at higher education levels.
60 / 68
Variances of OLS Estimators
▶ Theorem: Under assumptions SLR.1-SLR.5,
    Var(β̂1) = σ² / ∑_{i=1}^n (xi − x̄)² = σ² / SSTx
    Var(β̂0) = σ² · n⁻¹ ∑_{i=1}^n xi² / ∑_{i=1}^n (xi − x̄)² = σ² · n⁻¹ ∑_{i=1}^n xi² / SSTx
61 / 68
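The formula Var(β̂1) = σ²/SSTx can likewise be checked by simulation (arbitrary DGP; a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, n = 1.5, 30
x = rng.uniform(0, 5, size=n)  # fixed across replications
sstx = np.sum((x - x.mean()) ** 2)

estimates = []
for _ in range(20000):
    y = 1.0 + 2.0 * x + rng.normal(scale=sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sstx
    estimates.append(b1)

mc_var = np.var(estimates)       # Monte Carlo variance of beta1_hat
theory_var = sigma ** 2 / sstx   # Var(beta1_hat) = sigma^2 / SSTx
```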
Variances of OLS Estimators
[Figure: simulated samples illustrating the sampling variability of the OLS estimators; axis details not recoverable.]
63 / 68
Estimating the Error Variance
Note that
64 / 68
Estimating the Error Variance
▶ Unfortunately, σ̃² = n⁻¹ ∑_{i=1}^n ûi² is a biased estimator of σ².
Adjusting for degrees of freedom gives the unbiased estimator
σ̂² = SSR/(n − 2):
    E[σ̂²] = σ².
65 / 68
SER and SE
▶ σ̂ = √σ̂² is called the standard error of the regression (SER).
▶ The estimated standard deviations of the regression coefficients are
called standard errors. They measure how precisely the regression
coefficients are estimated:
    se(β̂1) = √V̂ar(β̂1) = √(σ̂² / SSTx) = σ̂ / √SSTx
    se(β̂0) = √V̂ar(β̂0) = √(σ̂² · n⁻¹ ∑_{i=1}^n xi² / SSTx) = σ̂ / √(SSTx / (n⁻¹ ∑_{i=1}^n xi²))
66 / 68
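Putting the last two slides together: estimate σ² with the degrees-of-freedom correction, then plug into the standard-error formulas. A sketch with simulated data (arbitrary DGP):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.7 * x + rng.normal(size=n)  # arbitrary DGP

sstx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sstx
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)  # unbiased estimator of sigma^2
ser = np.sqrt(sigma2_hat)                  # standard error of the regression
se_b1 = ser / np.sqrt(sstx)                # se(beta1_hat) = sigma_hat / sqrt(SSTx)
se_b0 = ser * np.sqrt(np.mean(x ** 2) / sstx)  # se(beta0_hat)
```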