Simple Linear Regression Model
The Data Generating Process (DGP), or the population, is described by the following linear model:
$$Y_j=\beta_0+\beta_1 X_j+\varepsilon_j$$
Y_j is the j-th observation of the dependent variable Y (it is known)
X_j is the j-th observation of the independent variable X (it is known)
β_0 is the intercept term (it is unknown)
β_1 is the slope parameter (it is unknown)
ε_j is the j-th error, the j-th unobserved factor that, besides X, affects Y (it is unknown)
Since the values of X_j and Y_j are known but the values of β_0, β_1, and ε_j are unknown, the regression model that describes the relationship between X and Y is also unknown.
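As a concrete illustration of a DGP, the following Python sketch simulates data from such a model; the parameter values β_0 = 2, β_1 = 0.5, the uniform distribution of X, and the normal errors are arbitrary choices made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, chosen only for illustration
beta0, beta1 = 2.0, 0.5
n = 100

X = rng.uniform(0, 10, size=n)    # observed regressor X_j
eps = rng.normal(0, 1, size=n)    # unobserved errors eps_j
Y = beta0 + beta1 * X + eps       # observed dependent variable Y_j
```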
Graphically, the errors are the vertical distances between the observed data points and the values predicted by the linear regression model.
The OLS estimators β̂_0 and β̂_1 minimize the sum of squared deviations
$$O(\beta_0,\beta_1)=\sum_{j=1}^{n}\bigl(Y_j-\beta_0-\beta_1 X_j\bigr)^2$$
The 1st of the first order conditions (FOC) is
$$\frac{\partial O}{\partial\beta_0}=0 \iff \sum_{j=1}^{n}2\bigl(Y_j-\hat\beta_0-\hat\beta_1 X_j\bigr)(-1)=0$$
⇓
$$\hat\beta_0=\bar Y_n-\hat\beta_1\bar X_n$$
where:
$$\bar Y_n=\frac{1}{n}\sum_{j=1}^{n}Y_j \quad\text{is the sample average of } Y$$
$$\bar X_n=\frac{1}{n}\sum_{j=1}^{n}X_j \quad\text{is the sample average of } X$$
The 2nd of the first order conditions (FOC) is
$$\frac{\partial O}{\partial\beta_1}=0 \iff \sum_{j=1}^{n}2\bigl(Y_j-\hat\beta_0-\hat\beta_1 X_j\bigr)(-X_j)=0$$
⇓
$$\hat\beta_1=\frac{\sum_{j=1}^{n}(X_j-\bar X_n)(Y_j-\bar Y_n)}{\sum_{j=1}^{n}(X_j-\bar X_n)^2}=\frac{\frac{1}{n}\sum_{j=1}^{n}(X_j-\bar X_n)(Y_j-\bar Y_n)}{\frac{1}{n}\sum_{j=1}^{n}(X_j-\bar X_n)^2}=\frac{\text{sample covariance between }X\text{ and }Y}{\text{sample variance of }X}$$
If β̂_0 and β̂_1 are the OLS estimators, the predicted (fitted) values are defined as
$$\hat Y_j=\hat\beta_0+\hat\beta_1 X_j$$
while the residuals are defined as
$$\hat\varepsilon_j=Y_j-\hat Y_j=Y_j-\bigl(\hat\beta_0+\hat\beta_1 X_j\bigr)$$
The closer the residuals ε̂_j are to 0, the better the quality of the regression.
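A minimal numpy sketch of these formulas, assuming X and Y are one-dimensional arrays of equal length; the small dataset is hypothetical and used only to show the mechanics.

```python
import numpy as np

def ols_simple(X, Y):
    """OLS estimates of the intercept and slope via the covariance/variance formulas above."""
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    return b0, b1

# Hypothetical example data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b0, b1 = ols_simple(X, Y)
Y_hat = b0 + b1 * X    # predicted (fitted) values
resid = Y - Y_hat      # residuals
```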
Under the assumption that the conditional mean of Y is linear in X, i.e. E[Y | X] = β_0 + β_1 X, we have
$$E[\varepsilon\mid X]=E\bigl[Y-(\beta_0+\beta_1 X)\mid X\bigr]=E[Y\mid X]-E[\beta_0+\beta_1 X\mid X]=E[Y\mid X]-\{\beta_0+\beta_1 X\}=0$$
and, by the law of iterated expectations,
$$E[\varepsilon]=E\bigl[E[\varepsilon\mid X]\bigr]=E[0]=0$$
In conclusion, the expectation of the unobserved factors is not influenced by X and is equal to 0:
$$E[\varepsilon\mid X]=E[\varepsilon]=0$$
This implies that
$$E[X\varepsilon]=0$$
because, by the law of iterated expectations, E[Xε] = E[E[Xε | X]] = E[X·E[ε | X]] = E[X·0] = 0.
Moreover,
$$E[\hat\varepsilon_j]=0$$
3 important properties
We can derive 3 properties from the previous conclusions:
Property 1: the sample average of the fitted values Ŷ coincides with the sample average of Y
$$\bar{\hat Y}_n=\bar Y_n$$
Property 2: the sample average of the OLS residuals is always 0
$$\bar{\hat\varepsilon}_n=\frac{1}{n}\sum_{j=1}^{n}\hat\varepsilon_j=0$$
Property 3: the sample covariance between the regressors and the OLS residuals is always 0
$$\frac{1}{n}\sum_{j=1}^{n}\bigl(X_j-\bar X_n\bigr)\bigl(\hat\varepsilon_j-\bar{\hat\varepsilon}_n\bigr)=0$$
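The three properties can be checked numerically with a short sketch (the dataset is again hypothetical; the checks hold up to floating-point error):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
resid = Y - Y_hat

print(np.isclose(Y_hat.mean(), Y.mean()))   # Property 1: average of fitted values = average of Y
print(np.isclose(resid.mean(), 0.0))        # Property 2: average of residuals = 0
print(np.isclose(np.mean((X - X.mean()) * (resid - resid.mean())), 0.0))  # Property 3: cov(X, resid) = 0
```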
The explained sum of squares (SSE) measures the dispersion of the fitted values (the variance explained by the regression):
$$SSE=\sum_{j=1}^{n}\bigl(\hat Y_j-\bar Y_n\bigr)^2=\sum_{j=1}^{n}\bigl(\hat Y_j-\bar{\hat Y}_n\bigr)^2$$
The sum of squared residuals (SSR) measures the dispersion of the residuals (the variance explained by the residuals):
$$SSR=\sum_{j=1}^{n}\bigl(Y_j-\hat Y_j\bigr)^2=\sum_{j=1}^{n}\hat\varepsilon_j^{\,2}$$
Theorem: the total variance of the data is given by the sum of the variance explained by the regression and the variance explained by the residuals:
$$SST=SSE+SSR$$
where $SST=\sum_{j=1}^{n}(Y_j-\bar Y_n)^2$ is the total sum of squares of Y.
Of course:
- the smaller the SSR, the better the fit of the regression to the data:
$$SSR\ll SST \;\Rightarrow\; R^2\approx 1 \;(\text{good fit})$$
- the larger the SSR, the worse the fit of the regression to the data:
$$SSR\approx SST \;\Rightarrow\; R^2\approx 0 \;(\text{poor fit})$$
where R² = SSE/SST = 1 − SSR/SST is the coefficient of determination.
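A sketch of the decomposition and of R² on the same hypothetical data; np.isclose confirms SST = SSE + SSR up to floating-point error.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SSE = np.sum((Y_hat - Y.mean()) ** 2)   # explained sum of squares
SSR = np.sum((Y - Y_hat) ** 2)          # sum of squared residuals
SST = np.sum((Y - Y.mean()) ** 2)       # total sum of squares

print(np.isclose(SST, SSE + SSR))       # decomposition theorem
R2 = 1 - SSR / SST                      # coefficient of determination
print(R2)
```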
β̂_1 can also be expressed as
$$\hat\beta_1=\beta_1+\frac{\sum_{j=1}^{n}(X_j-\bar X_n)\,\varepsilon_j}{SST_X}$$
where $SST_X=\sum_{j=1}^{n}(X_j-\bar X_n)^2$ is the total sum of squares of X.
Then, by the law of iterated expectations, β̂_1 is an unbiased estimator of β_1, in the sense that
$$E[\hat\beta_1]=\beta_1$$
because E[β̂_1 | X] = β_1 (the numerator of the term above has conditional expectation 0, since E[ε_j | X] = 0), and hence E[β̂_1] = E[E[β̂_1 | X]] = E[β_1] = β_1.
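Unbiasedness can be illustrated with a small Monte Carlo sketch: averaging the OLS slope over many simulated samples (with hypothetical parameter values) should come close to the true β_1.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, n, reps = 2.0, 0.5, 50, 5000   # hypothetical values for the experiment

X = rng.uniform(0, 10, size=n)               # regressors held fixed across replications
slopes = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0, 1, size=n)
    Y = beta0 + beta1 * X + eps
    slopes[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)

print(slopes.mean())   # should be close to beta1 = 0.5
```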
β̂_0 can also be expressed as
$$\hat\beta_0=\beta_0+(\beta_1-\hat\beta_1)\,\bar X_n+\bar\varepsilon_n$$
where ε̄_n = (1/n) Σ_{j=1}^n ε_j is the sample average of the errors.
Then, by the law of iterated expectations, β̂_0 is an unbiased estimator of β_0, in the sense that
$$E[\hat\beta_0]=\beta_0$$
because E[β̂_0] = E[E[β̂_0 | X]] = E[β_0] = β_0.
Variance of the OLS estimators
Assume that
1) the DGP of (X_j, Y_j) is Y_j = β_0 + β_1 X_j + ε_j
2) E[ε_j] = 0
3) E[ε_j | X_1, …, X_n] = E[ε_j] = 0
4) the ε_j are independent
5) V[ε_j | X_1, …, X_n] = E[ε_j² | X_1, …, X_n] − (E[ε_j | X_1, …, X_n])² = E[ε_j² | X_1, …, X_n] = E[ε_j²] = σ_ε²
Then E[β̂_1² | X] = β_1² + σ_ε²/SST_X, so
$$V[\hat\beta_1\mid X]=\frac{\sigma_\varepsilon^2}{SST_X}$$
because
$$V[\hat\beta_1\mid X]=E[\hat\beta_1^2\mid X]-\bigl(E[\hat\beta_1\mid X]\bigr)^2=\beta_1^2+\frac{\sigma_\varepsilon^2}{SST_X}-\beta_1^2=\frac{\sigma_\varepsilon^2}{SST_X}$$
Note that:
- as the variance of the errors goes to 0, the variance of the estimator goes to 0, so the estimator becomes more and more precise: σ_ε² → 0 ⇒ V[β̂_1 | X] → 0
- as the sample size goes to +∞, the variance of the estimator goes to 0, so the estimator becomes more and more precise: n → +∞ ⇒ SST_X → +∞ ⇒ V[β̂_1 | X] → 0
Then E[β̂_0² | X] = β_0² + (σ_ε²/SST_X)·X̄_n² + σ_ε²/n, so
$$V[\hat\beta_0\mid X]=\frac{\sigma_\varepsilon^2\sum_{j=1}^{n}X_j^2}{n\,SST_X}$$
because
$$V[\hat\beta_0\mid X]=E[\hat\beta_0^2\mid X]-\bigl(E[\hat\beta_0\mid X]\bigr)^2=\beta_0^2+\frac{\sigma_\varepsilon^2}{SST_X}\bar X_n^2+\frac{\sigma_\varepsilon^2}{n}-\beta_0^2=\sigma_\varepsilon^2\left(\frac{\bar X_n^2}{SST_X}+\frac{1}{n}\right)=\frac{\sigma_\varepsilon^2\sum_{j=1}^{n}X_j^2}{n\,SST_X}$$
Note that:
- as the variance of the errors goes to 0, the variance of the estimator goes to 0, so the estimator becomes more and more precise: σ_ε² → 0 ⇒ V[β̂_0 | X] → 0
- as the sample size goes to +∞, the variance of the estimator goes to 0, so the estimator becomes more and more precise: n → +∞ ⇒ SST_X → +∞ ⇒ V[β̂_0 | X] → 0
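Both conditional-variance formulas can be checked with a Monte Carlo sketch that holds X fixed and resamples the errors (parameter values are hypothetical): the simulated variances of β̂_1 and β̂_0 should be close to σ_ε²/SST_X and σ_ε²·ΣX_j²/(n·SST_X).

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n, reps, sigma = 2.0, 0.5, 50, 5000, 1.0   # hypothetical values

X = rng.uniform(0, 10, size=n)            # condition on a fixed sample of regressors
SST_X = np.sum((X - X.mean()) ** 2)

b1s, b0s = np.empty(reps), np.empty(reps)
for r in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=n)
    b1s[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / SST_X
    b0s[r] = Y.mean() - b1s[r] * X.mean()

print(b1s.var(), sigma**2 / SST_X)                         # V[b1 | X] vs formula
print(b0s.var(), sigma**2 * np.sum(X**2) / (n * SST_X))    # V[b0 | X] vs formula
```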
Assume that
1) the DGP of (X_j, Y_j) is Y_j = β_0 + β_1 X_j + ε_j
2) E[ε_j] = 0
3) E[ε_j | X_1, …, X_n] = E[ε_j] = 0
4) the ε_j are independent
5) V[ε_j | X_1, …, X_n] = E[ε_j² | X_1, …, X_n] − (E[ε_j | X_1, …, X_n])² = E[ε_j² | X_1, …, X_n] = E[ε_j²] = σ_ε²
Then the estimator σ̂_ε² of the error variance is unbiased:
$$E[\hat\sigma_\varepsilon^2]=\sigma_\varepsilon^2$$
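The notes do not restate the formula for σ̂_ε² here; a standard choice in the simple regression model is the degrees-of-freedom-corrected estimator σ̂_ε² = SSR/(n − 2), which the following sketch assumes.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(X)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

# Assumed estimator: SSR / (n - 2), the usual degrees-of-freedom correction
# for a simple regression with two estimated parameters (b0 and b1)
sigma2_hat = np.sum(resid ** 2) / (n - 2)
print(sigma2_hat)
```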