Econometrics 3
The deviation of an observation from the sample mean can be decomposed into an explained part and a residual part:

Yᵢ − Ȳ = (Ŷᵢ − Ȳ) + (Yᵢ − Ŷᵢ),

where eᵢ = Yᵢ − Ŷᵢ denotes the residual and Ŷᵢ = β̂0 + β̂1·Xᵢ is the fitted value.
We square both sides of the equation and sum over the observations, obtaining:
∑ⁿᵢ₌₁(Yᵢ − Ȳ)² = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² + ∑ⁿᵢ₌₁(Yᵢ − Ŷᵢ)² + 2·∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)·(Yᵢ − Ŷᵢ).

The OLS residuals satisfy ∑ⁿᵢ₌₁ eᵢ = 0 and ∑ⁿᵢ₌₁ eᵢ·Xᵢ = 0, so the cross term is zero and the decomposition simplifies to

∑ⁿᵢ₌₁(Yᵢ − Ȳ)² = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² + ∑ⁿᵢ₌₁(Yᵢ − Ŷᵢ)². (2.10)
If we would like to analyse the total sample variability, we can examine the variation of the
dependent variable about the sample mean 𝑌̅ :
SST = ∑ⁿᵢ₌₁(Yᵢ − Ȳ)².
Variance Analysis
The total sum of squares (SST) shows the total variation of Y around the sample mean. It consists of two parts: the explained sum of squares (SSR) and the residual sum of squares (SSE):
SSR = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)²,

SSE = ∑ⁿᵢ₌₁(Yᵢ − Ŷᵢ)².
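The decomposition can be checked numerically. Below is a minimal sketch with hypothetical data (the values are illustrative, not those of exercise 2.4): an OLS line is fitted, the three sums of squares are computed, and SST = SSR + SSE is verified.

```python
import numpy as np

# Hypothetical sample (illustrative values, not the textbook data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS estimates of slope and intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
sse = np.sum((y - y_hat) ** 2)         # residual sum of squares

# The cross term vanishes for OLS residuals, so the identity holds
assert abs(sst - (ssr + sse)) < 1e-9
```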
“Looking at the overall fit of an estimated model is useful not only for evaluating the quality of the
regression, but also for comparing models that have different data sets or combinations of independent
variables. We can never be sure that one estimated model represents the truth any more than another,
but evaluating the quality of the fit of the equation is one ingredient in a choice between different
formulations of a regression model. Be careful, however! The quality of the fit is a minor ingredient in
this choice, and many beginning researchers allow themselves to be overly influenced by it.”
Studenmund (2014)
Example 2.5
Let us give the decomposition of the variance for exercise 2.4.
SSR = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² = 972.9288,

SSE = ∑ⁿᵢ₌₁(Yᵢ − Ŷᵢ)² = 14.5836,

SST = SSR + SSE = 972.9288 + 14.5836 = 987.5124.
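The arithmetic of Example 2.5 can be verified directly from the two components:

```python
ssr = 972.9288   # explained sum of squares from Example 2.5
sse = 14.5836    # residual sum of squares from Example 2.5
sst = ssr + sse  # total sum of squares

assert abs(sst - 987.5124) < 1e-9
```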
The main purpose of regression analysis is to provide a good estimate of, and to explain the variation in, the dependent variable Y. The R-squared (coefficient of determination) measures what proportion of the total sum of squares is explained by the regression, i.e. how well the independent variable explains the dependent variable. R-squared is equal to the ratio of the explained sum of squares to the total sum of squares:
R² = SSR/SST = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² / ∑ⁿᵢ₌₁(Yᵢ − Ȳ)².
The indicator shows the proportion of the sample variation in Y that is explained by the independent variable (X). Consider first the case in which the coefficient takes its maximum value:

R² = 1.
In this case the fit is perfect and the residuals are zero (Fig. 2.8). Indeed, minimising the residual sum of squares is equivalent to maximising the coefficient of determination:
R² = SSR/SST = (SST − SSE)/SST = 1 − SSE/SST.
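The equivalence of the two forms of R² can be checked on any fitted regression; a sketch with hypothetical data (illustrative values only):

```python
import numpy as np

# Hypothetical data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.2, 4.9])

# OLS fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)

# Both forms of R-squared agree
r2_a = ssr / sst
r2_b = 1 - sse / sst
assert abs(r2_a - r2_b) < 1e-12
```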
R² is equal to zero if the dependent and independent variables are uncorrelated:

R² = SSR/SST = 0,

SSR = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² = 0.
In this case the regression line is horizontal: as the value of X changes, the fitted value of Y remains the same (Fig. 2.9):

Ŷᵢ = Ȳ.

The estimated intercept is then β̂0 = Ȳ, and R² = 0.
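A small sketch of the zero-R² case: the hypothetical data below are constructed so that X and Y have exactly zero sample covariance, so the fitted slope is zero and the regression line is horizontal at Ȳ.

```python
import numpy as np

# X and Y constructed so that their sample covariance is exactly zero
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([1.0, 4.0, 1.0, 4.0, 1.0])

# OLS fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)

assert abs(b1) < 1e-12             # horizontal regression line
assert abs(b0 - y.mean()) < 1e-12  # intercept equals the sample mean
assert ssr / sst < 1e-12           # R-squared is zero
```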
The greater the value of the coefficient (R²), the better the regression line fits the sample data.
What is an acceptable value of R²? There is no general rule for evaluating R²; however, a coefficient above 0.5 may represent a good fit. It is very important to identify the variables that significantly influence the dependent variable. Our estimation can produce a high coefficient of determination (R²) even though the chosen independent variable and the dependent variable are, in fact, both influenced by another variable.
Example 2.6
Try to answer the following question:
How well does the estimated regression function fit the data?
Let us compute the coefficient of determination for Example 2.4.

R² = SSR/SST = 972.9288/987.5124 = 0.985232.
On the basis of the result of R2 , we can state that 98.5232% of the total sum of squares is
explained by the regression equation.
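The computation in Example 2.6 can be reproduced directly from the sums of squares given in Example 2.5:

```python
ssr = 972.9288   # explained sum of squares (Example 2.5)
sst = 987.5124   # total sum of squares (Example 2.5)

r2 = ssr / sst
assert abs(r2 - 0.985232) < 1e-6   # matches the value reported in the text
```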
We started the second chapter by introducing the linear correlation coefficient. Finally, at the
end of the chapter, we return to a discussion of the correlation coefficient. The main reason for
this is that there is a relationship between the coefficient of determination (R2 ) and the linear
correlation coefficient. In the last chapter, we discuss the association between the two
indicators.
2.5. The relationship between 𝐑𝟐 and the linear correlation
coefficient
In the previous section, we found that the coefficient of determination is as follows:
R² = SSR/SST = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² / ∑ⁿᵢ₌₁(Yᵢ − Ȳ)².
The square of the linear correlation coefficient is equal to the coefficient of determination (R2 ):
R² = (∑ⁿᵢ₌₁(Xᵢ − X̄)·(Yᵢ − Ȳ))² / (∑ⁿᵢ₌₁(Xᵢ − X̄)² · ∑ⁿᵢ₌₁(Yᵢ − Ȳ)²) = r²ₓᵧ. (2.12)
Let us prove the statement:
SSR = ∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² = ∑ⁿᵢ₌₁(β̂0 + β̂1·Xᵢ − Ȳ)². (2.11)

Substituting the OLS intercept β̂0 = Ȳ − β̂1·X̄:

∑ⁿᵢ₌₁(Ŷᵢ − Ȳ)² = ∑ⁿᵢ₌₁(Ȳ − β̂1·X̄ + β̂1·Xᵢ − Ȳ)² = ∑ⁿᵢ₌₁(β̂1·(Xᵢ − X̄))² = β̂1²·∑ⁿᵢ₌₁(Xᵢ − X̄)².

Substituting the OLS slope

β̂1 = ∑ⁿᵢ₌₁(Xᵢ − X̄)·(Yᵢ − Ȳ) / ∑ⁿᵢ₌₁(Xᵢ − X̄)²,

we obtain

SSR = (∑ⁿᵢ₌₁(Xᵢ − X̄)·(Yᵢ − Ȳ))² / ∑ⁿᵢ₌₁(Xᵢ − X̄)².

Dividing by SST = ∑ⁿᵢ₌₁(Yᵢ − Ȳ)² yields (2.12).
Note that the association between the two indicators holds in this form only when there is a single independent variable and a single dependent variable. Later (in Chapter 5), we will examine
the validity of (2.12) in multiple regression models. Do not forget that the Pearson correlation
coefficient measures the strength and direction of the linear association between two variables.
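The identity R² = r²ₓᵧ proved above can also be checked numerically; a sketch with hypothetical data, comparing the regression R² with the squared Pearson correlation returned by np.corrcoef:

```python
import numpy as np

# Hypothetical data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 2.8, 4.1, 4.9, 6.2, 6.8])

# Regression R-squared from the OLS fit
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

# Squared Pearson correlation coefficient
r_xy = np.corrcoef(x, y)[0, 1]

assert abs(r2 - r_xy ** 2) < 1e-12
```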
Example 2.7
Let us compute the linear correlation coefficient for exercise 2.4.
r²ₓᵧ = 0.985232 = R².

There is a strong linear association between the dependent and independent variables.
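Taking the square root recovers the magnitude of the correlation coefficient; its sign is that of the estimated slope β̂1:

```python
import math

r2 = 0.985232            # coefficient of determination from Example 2.7
r_abs = math.sqrt(r2)    # magnitude of r_xy, approximately 0.9926

assert abs(r_abs - 0.9926) < 1e-3
```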