
SIMPLE LINEAR REGRESSION MODEL

The Data Generating Process (DGP), or the population, is described by the following linear model:
$$Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j$$
- $Y_j$ is the j-th observation of the dependent variable Y (it is known)
- $X_j$ is the j-th observation of the independent variable X (it is known)
- $\beta_0$ is the intercept term (it is unknown)
- $\beta_1$ is the slope parameter (it is unknown)
- $\varepsilon_j$ is the j-th error, the j-th unobserved factor that, besides X, affects Y (it is unknown)

Since the values of $X_j$ and $Y_j$ are known but the values of $\beta_0$, $\beta_1$ and $\varepsilon_j$ are unknown, the regression model that describes the relationship between X and Y is itself unknown.

Graphically, the errors are the vertical distances between the observed data points and the values predicted by the linear regression model.
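As a concrete illustration, here is a minimal Python sketch that simulates a sample from such a DGP; the parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, error standard deviation 1) and the sample size are arbitrary choices for the example, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary "true" parameters of the DGP (unknown in a real application)
beta0, beta1, sigma_eps = 2.0, 0.5, 1.0
n = 100

X = rng.uniform(0, 10, size=n)           # observed independent variable
eps = rng.normal(0, sigma_eps, size=n)   # unobserved errors
Y = beta0 + beta1 * X + eps              # observed dependent variable
```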

The ordinary least squares criterion (OLS)


With the OLS method we find the estimators of the unknown parameters $\beta_0$ and $\beta_1$ that maximize the explanatory power of the linear model $\beta_0 + \beta_1 X_j$ and that, therefore, minimize the sum of the squared errors:
$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min \sum_{j=1}^{n} (Y_j - \beta_0 - \beta_1 X_j)^2 = \arg\min \sum_{j=1}^{n} \varepsilon_j^2$$
where $\sum_{j=1}^{n} (Y_j - \beta_0 - \beta_1 X_j)^2$ is the objective function $O(\beta_0, \beta_1)$, which must be differentiated with respect to $\beta_0$ and $\beta_1$ and set equal to 0 in order to find $\hat{\beta}_0$ and $\hat{\beta}_1$.

The 1st of the first order conditions (FOC) is


$$\frac{\partial O}{\partial \beta_0} = 0 \iff \sum_{j=1}^{n} 2\,(Y_j - \hat{\beta}_0 - \hat{\beta}_1 X_j)(-1) = 0 \iff \hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1 \bar{X}_n$$
where:
- $\bar{Y}_n = \frac{1}{n}\sum_{j=1}^{n} Y_j$ is the sample average of Y
- $\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j$ is the sample average of X
The 2nd of the first order conditions (FOC) is
$$\frac{\partial O}{\partial \beta_1} = 0 \iff \sum_{j=1}^{n} 2\,(Y_j - \hat{\beta}_0 - \hat{\beta}_1 X_j)(-X_j) = 0$$
$$\hat{\beta}_1 = \frac{\sum_{j=1}^{n} (X_j - \bar{X}_n)(Y_j - \bar{Y}_n)}{\sum_{j=1}^{n} (X_j - \bar{X}_n)^2} = \frac{\frac{1}{n}\sum_{j=1}^{n} (X_j - \bar{X}_n)(Y_j - \bar{Y}_n)}{\frac{1}{n}\sum_{j=1}^{n} (X_j - \bar{X}_n)^2} = \frac{\text{sample covariance between } X \text{ and } Y}{\text{sample variance of } X}$$

If $\hat{\beta}_0$ and $\hat{\beta}_1$ are the OLS estimators, the predicted (fitted) values are defined as
$$\hat{Y}_j = \hat{\beta}_0 + \hat{\beta}_1 X_j$$
while the residuals are defined as
$$\hat{\varepsilon}_j = Y_j - \hat{Y}_j = Y_j - (\hat{\beta}_0 + \hat{\beta}_1 X_j)$$
The closer the residuals $\hat{\varepsilon}_j$ are to 0, the better the quality of the regression.
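Continuing the simulated example above, a minimal sketch of these closed-form OLS formulas (sample covariance over sample variance for $\hat{\beta}_1$, then $\hat{\beta}_0 = \bar{Y}_n - \hat{\beta}_1 \bar{X}_n$), together with the fitted values and residuals:

```python
# OLS estimates from the closed-form solutions of the first order conditions
X_bar, Y_bar = X.mean(), Y.mean()
beta1_hat = np.sum((X - X_bar) * (Y - Y_bar)) / np.sum((X - X_bar) ** 2)
beta0_hat = Y_bar - beta1_hat * X_bar

Y_hat = beta0_hat + beta1_hat * X   # predicted (fitted) values
resid = Y - Y_hat                   # OLS residuals
```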

The crucial assumptions


Consider two random variables X and Y and the following regression model
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
We know that $E[Y \mid X] = \beta_0 + \beta_1 X$ and $\varepsilon = Y - (\beta_0 + \beta_1 X)$. Therefore
$$E[\varepsilon \mid X] = E[Y - (\beta_0 + \beta_1 X) \mid X] = E[Y \mid X] - E[\beta_0 + \beta_1 X \mid X] = E[Y \mid X] - (\beta_0 + \beta_1 X) = 0$$
So, by the Law of iterated expectations,
$$E[\varepsilon] = E[E[\varepsilon \mid X]] = E[0] = 0$$
In conclusion, the expectation of the unobserved factors is not influenced by X and is equal to 0:
$$E[\varepsilon \mid X] = E[\varepsilon] = 0$$
This implies that
$$E[X\varepsilon] = 0$$
because, by the Law of iterated expectations, $E[X\varepsilon] = E[E[X\varepsilon \mid X]] = E[X\,E[\varepsilon \mid X]] = E[X \cdot 0] = 0$.
Moreover,
$$E[\hat{\varepsilon}_j] = 0$$
3 important properties
We can derive 3 properties from the previous conclusions:
- Property 1: the sample average of the fitted values $\hat{Y}_j$ coincides with the sample average of Y:
$$\bar{\hat{Y}}_n = \bar{Y}_n$$
- Property 2: the sample average of the OLS residuals $\hat{\varepsilon}_j$ is 0:
$$\bar{\hat{\varepsilon}}_n = \frac{1}{n}\sum_{j=1}^{n} \hat{\varepsilon}_j = 0$$
- Property 3: the sample covariance between the regressors and the OLS residuals is always 0:
$$\frac{1}{n}\sum_{j=1}^{n} (X_j - \bar{X}_n)(\hat{\varepsilon}_j - \bar{\hat{\varepsilon}}_n) = 0$$
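These three properties can be checked numerically on the running simulated example (a sketch, using the fitted values and residuals computed above):

```python
# Property 1: sample average of the fitted values equals the sample average of Y
print(np.isclose(Y_hat.mean(), Y.mean()))   # True

# Property 2: sample average of the OLS residuals is 0
print(np.isclose(resid.mean(), 0.0))        # True

# Property 3: sample covariance between X and the OLS residuals is 0
cov_X_resid = np.mean((X - X.mean()) * (resid - resid.mean()))
print(np.isclose(cov_X_resid, 0.0))         # True
```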

Measures of goodness of prediction


The $R^2$ measures the proportion of the variance of Y that is explained by X. The $R^2$ varies between 0 and 1: the higher the $R^2$, the better the fit of the model to the data.
$$R^2 = \frac{SSE}{SST} = \frac{SST - SSR}{SST} = 1 - \frac{SSR}{SST}$$
Where:
- The Total sum of squares (SST) measures the data dispersion (total variance of the data)
$$SST = \sum_{j=1}^{n} (Y_j - \bar{Y}_n)^2$$
- The Explained sum of squares (SSE) measures the dispersion of the fitted values (variance explained by the regression)
$$SSE = \sum_{j=1}^{n} (\hat{Y}_j - \bar{Y}_n)^2 = \sum_{j=1}^{n} (\hat{Y}_j - \bar{\hat{Y}}_n)^2$$
- The Sum of squared residuals (SSR) measures the dispersion of the residuals (variance due to the residuals)
$$SSR = \sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2 = \sum_{j=1}^{n} \hat{\varepsilon}_j^2$$

Theorem: the total variance of the data is given by the sum of the variance explained by the regression and the variance due to the residuals:
$$SST = SSE + SSR$$
Of course:
- the smaller the SSR, the better the fit of the regression to the data:
$$SSR \ll SST \Rightarrow R^2 \approx 1 \ (\text{good fit})$$
- the larger the SSR, the worse the fit of the regression to the data:
$$SSR \approx SST \Rightarrow R^2 \approx 0 \ (\text{poor fit})$$
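Continuing the simulated example, a sketch that computes the three sums of squares, verifies the decomposition $SST = SSE + SSR$, and evaluates the $R^2$:

```python
# Goodness-of-fit measures on the running example
SST = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
SSE = np.sum((Y_hat - Y.mean()) ** 2)    # explained sum of squares
SSR = np.sum(resid ** 2)                 # sum of squared residuals

print(np.isclose(SST, SSE + SSR))        # True: SST = SSE + SSR
R2 = 1 - SSR / SST                       # equivalently SSE / SST
print(R2)
```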

Unbiasedness of the OLS estimators


Assume that
1) the DGP of $(X_j, Y_j)$ is $Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j$
2) $E[\varepsilon_j] = 0$
3) $E[\varepsilon_j \mid X_1, \ldots, X_n] = E[\varepsilon_j] = 0$
These 3 assumptions are enough to prove that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators of $\beta_0$ and $\beta_1$.
$\hat{\beta}_1$ can also be expressed as
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{j=1}^{n} (X_j - \bar{X}_n)\,\varepsilon_j}{SST_X}$$
where $SST_X = \sum_{j=1}^{n} (X_j - \bar{X}_n)^2$ is the total sum of squares of X.

From this, we derive that
$$E[\hat{\beta}_1 \mid X] = \beta_1$$
and then, by the law of iterated expectations, $\hat{\beta}_1$ is an unbiased estimator of $\beta_1$, in the sense that
$$E[\hat{\beta}_1] = \beta_1$$
because $E[\hat{\beta}_1] = E[E[\hat{\beta}_1 \mid X]] = E[\beta_1] = \beta_1$.
$\hat{\beta}_0$ can also be expressed as
$$\hat{\beta}_0 = \beta_0 + (\beta_1 - \hat{\beta}_1)\,\bar{X}_n + \bar{\varepsilon}_n$$
From this, we derive that
$$E[\hat{\beta}_0 \mid X] = \beta_0$$
and then, by the law of iterated expectations, $\hat{\beta}_0$ is an unbiased estimator of $\beta_0$, in the sense that
$$E[\hat{\beta}_0] = \beta_0$$
because $E[\hat{\beta}_0] = E[E[\hat{\beta}_0 \mid X]] = E[\beta_0] = \beta_0$.
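A small Monte Carlo sketch, continuing the simulated example (errors redrawn with X held fixed, which mimics conditioning on X), illustrates this unbiasedness: the average of the OLS estimates over many samples comes out close to the arbitrary true values $\beta_0 = 2$ and $\beta_1 = 0.5$ used in the simulation.

```python
# Monte Carlo illustration of the unbiasedness of the OLS estimators
reps = 5000
b0_draws = np.empty(reps)
b1_draws = np.empty(reps)

for r in range(reps):
    eps_r = rng.normal(0, sigma_eps, size=n)   # redraw the errors, keep X fixed
    Y_r = beta0 + beta1 * X + eps_r
    b1_draws[r] = np.sum((X - X.mean()) * (Y_r - Y_r.mean())) / np.sum((X - X.mean()) ** 2)
    b0_draws[r] = Y_r.mean() - b1_draws[r] * X.mean()

print(b0_draws.mean(), b1_draws.mean())        # close to beta0 and beta1
```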
Variance of the OLS estimators
Assume that
1) the DGP of $(X_j, Y_j)$ is $Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j$
2) $E[\varepsilon_j] = 0$
3) $E[\varepsilon_j \mid X_1, \ldots, X_n] = E[\varepsilon_j] = 0$
4) the $\varepsilon_j$ are independent
5) $V[\varepsilon_j \mid X_1, \ldots, X_n] = E[\varepsilon_j^2 \mid X_1, \ldots, X_n] - \left(E[\varepsilon_j \mid X_1, \ldots, X_n]\right)^2 = E[\varepsilon_j^2 \mid X_1, \ldots, X_n] = E[\varepsilon_j^2] = \sigma_\varepsilon^2$

Then $E[\hat{\beta}_1^2 \mid X] = \beta_1^2 + \dfrac{\sigma_\varepsilon^2}{SST_X}$, so
$$V[\hat{\beta}_1 \mid X] = \frac{\sigma_\varepsilon^2}{SST_X}$$
because $V[\hat{\beta}_1 \mid X] = E[\hat{\beta}_1^2 \mid X] - \left(E[\hat{\beta}_1 \mid X]\right)^2 = \beta_1^2 + \dfrac{\sigma_\varepsilon^2}{SST_X} - \beta_1^2 = \dfrac{\sigma_\varepsilon^2}{SST_X}$.

Note that:
- As the variance of the errors goes to 0, the variance of the estimator goes to 0, so the estimator gets more and more precise:
$$\sigma_\varepsilon^2 \to 0 \Rightarrow V[\hat{\beta}_1 \mid X] \to 0$$
- As the sample size goes to $+\infty$, the variance of the estimator goes to 0, so the estimator gets more and more precise:
$$n \to +\infty \Rightarrow SST_X \to +\infty \Rightarrow V[\hat{\beta}_1 \mid X] \to 0$$

Similarly, $E[\hat{\beta}_0^2 \mid X] = \beta_0^2 + \dfrac{\sigma_\varepsilon^2}{SST_X}\bar{X}_n^2 + \dfrac{1}{n}\sigma_\varepsilon^2$, so
$$V[\hat{\beta}_0 \mid X] = \frac{\sigma_\varepsilon^2 \sum_{j=1}^{n} X_j^2}{n\, SST_X}$$
because
$$V[\hat{\beta}_0 \mid X] = E[\hat{\beta}_0^2 \mid X] - \left(E[\hat{\beta}_0 \mid X]\right)^2 = \beta_0^2 + \frac{\sigma_\varepsilon^2}{SST_X}\bar{X}_n^2 + \frac{1}{n}\sigma_\varepsilon^2 - \beta_0^2 = \sigma_\varepsilon^2\left(\frac{\bar{X}_n^2}{SST_X} + \frac{1}{n}\right) = \frac{\sigma_\varepsilon^2 \sum_{j=1}^{n} X_j^2}{n\, SST_X}$$

Note that:
- As the variance of the errors goes to 0, the variance of the estimator goes to 0, so the estimator gets more and more precise:
$$\sigma_\varepsilon^2 \to 0 \Rightarrow V[\hat{\beta}_0 \mid X] \to 0$$
- As the sample size goes to $+\infty$, the variance of the estimator goes to 0, so the estimator gets more and more precise:
$$n \to +\infty \Rightarrow SST_X \to +\infty \Rightarrow V[\hat{\beta}_0 \mid X] \to 0$$
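On the running simulated example, where $\sigma_\varepsilon^2$ is known by construction, these two formulas can be evaluated directly and compared with the variability of the Monte Carlo estimates obtained above (a sketch):

```python
# Conditional variances of the OLS estimators from the formulas above
SST_X = np.sum((X - X.mean()) ** 2)

var_b1 = sigma_eps ** 2 / SST_X
var_b0 = sigma_eps ** 2 * np.sum(X ** 2) / (n * SST_X)

print(var_b1, b1_draws.var())   # formula vs. Monte Carlo variance of beta1_hat
print(var_b0, b0_draws.var())   # formula vs. Monte Carlo variance of beta0_hat
```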

Estimator of the error’s variance


How can we compute $V[\hat{\beta}_0 \mid X]$ and $V[\hat{\beta}_1 \mid X]$? $\sigma_\varepsilon^2$ is not observed!

Assume that
1) the DGP of $(X_j, Y_j)$ is $Y_j = \beta_0 + \beta_1 X_j + \varepsilon_j$
2) $E[\varepsilon_j] = 0$
3) $E[\varepsilon_j \mid X_1, \ldots, X_n] = E[\varepsilon_j] = 0$
4) the $\varepsilon_j$ are independent
5) $V[\varepsilon_j \mid X_1, \ldots, X_n] = E[\varepsilon_j^2 \mid X_1, \ldots, X_n] - \left(E[\varepsilon_j \mid X_1, \ldots, X_n]\right)^2 = E[\varepsilon_j^2 \mid X_1, \ldots, X_n] = E[\varepsilon_j^2] = \sigma_\varepsilon^2$

Therefore, the random variable
$$\hat{\sigma}_\varepsilon^2 = \frac{1}{n-2}\sum_{j=1}^{n} \hat{\varepsilon}_j^2 = \frac{SSR}{n-2}$$
is an unbiased estimator of the error's variance $\sigma_\varepsilon^2$, in the sense that
$$E[\hat{\sigma}_\varepsilon^2] = \sigma_\varepsilon^2$$
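In practice $\sigma_\varepsilon^2$ is replaced by this estimator. Continuing the simulated example, a sketch that computes $\hat{\sigma}_\varepsilon^2$ and plugs it into the variance formulas above to obtain estimated standard errors of the OLS coefficients:

```python
# Unbiased estimator of the error variance and estimated standard errors
sigma2_hat = SSR / (n - 2)

se_b1 = np.sqrt(sigma2_hat / SST_X)
se_b0 = np.sqrt(sigma2_hat * np.sum(X ** 2) / (n * SST_X))

print(sigma2_hat)        # close to the true sigma_eps**2 = 1.0 of the simulation
print(se_b0, se_b1)      # estimated standard errors of beta0_hat and beta1_hat
```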
