CH 2 Part II Handout
CHAPTER TWO
REGRESSION ANALYSIS
PART II: MULTIPLE LINEAR REGRESSION ANALYSIS
MLRMs also have some advantages over SLRMs. First, SLRMs are ill-suited to drawing ceteris paribus conclusions. To reach a ceteris paribus conclusion, the effect of all other factors must be controlled. In SLRMs, the effect of all other factors is assumed to be captured by the random term U. In that case a ceteris paribus interpretation would be possible only if all the factors included in the random error term were uncorrelated with X. This, however, is rarely realistic, as most economic variables are interdependent. In effect, in most SLRMs the coefficient of X misrepresents the partial effect of X on Y. This problem is known in econometrics as exclusion (omitted-variable) bias. Exclusion bias could be minimized in econometric analysis if we could explicitly control for the other relevant variables that affect Y. Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. This is important both for testing economic theories and for evaluating policy effects when we must rely on non-experimental data.
MLRMs are also flexible: a single MLRM can be used to estimate the partial effects of many variables on the dependent variable. In addition to their flexibility, MLRMs have higher explanatory power than SLRMs, because the larger the number of relevant explanatory variables in a model, the larger the part of the variation in Y that can be explained or predicted by the model.
Consider the following relationship among four economic variables: quantity demanded (Yᵢ), the price of the good (P₁), the price of a substitute good (P₂) and consumer income (M). Assuming a linear functional form, the true relationship can be modeled as follows:
𝒀𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑷𝟏 + 𝜷𝟐 𝑷𝟐 + 𝜷𝟑 𝑴 + 𝑼𝒊 … … … … … … … … . . (𝟑. 𝒂)
Where,
𝜷′𝒔 are fixed and unknown parameters, and 𝑼𝒊 is the population disturbance term.
𝜷𝟎 is the intercept.
𝜷𝟏 measures the change in 𝒀𝒊 with respect to 𝑷𝟏 , holding other factors fixed.
𝜷𝟐 measures the change in 𝒀𝒊 with respect to 𝑷𝟐 , holding other factors fixed.
𝜷𝟑 measures the change in 𝒀𝒊 with respect to 𝑴, holding other factors fixed.
✓ Equation (3. 𝑎) is a multiple regression model with three explanatory
variables.
Just as in the case of simple regression, the variable 𝑼𝒊 is the error term or disturbance
term. No matter how many explanatory variables we include in our model, there will
always be factors we cannot include, and these are collectively contained in 𝑼𝒊 . This
disturbance term is of similar nature to that in simple regression, reflecting:
- Omissions of other relevant variables
- Random nature of human responses
- Errors of aggregation and measurement, etc.
In this chapter, we first discuss the basic assumptions of multiple regression analysis, then develop the analysis for the case of two explanatory variables, and finally generalize the multiple regression model to the case of k explanatory variables using matrix algebra.
Since the population regression equation is unknown, it has to be estimated from sample data. For the case of two explanatory variables, the population regression model is
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ ………………………… (3.1a)
and (3.1a) has to be estimated by the sample regression function:
Ŷᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ ………………………… (3.1b)
Where, the β̂ⱼ are estimates of the βⱼ and Ŷᵢ is known as the predicted value of Yᵢ.
Therefore, given sample observations on Yᵢ, X₁ᵢ and X₂ᵢ, we can estimate (3.1a) by (3.2) as follows:
Yᵢ = β̂₀ + β̂₁X₁ᵢ + β̂₂X₂ᵢ + eᵢ ………………………… (3.2)
From (3.2) we can obtain the prediction errors (residuals) of the model as:
eᵢ = Yᵢ − Ŷᵢ = Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ ………………………… (3.3)
The method of ordinary least squares chooses the estimates that minimize the sum of squared residuals (prediction errors) of the model. Therefore, squaring and summing (3.3) over all sample observations, we get the sum of squared prediction errors of the model. That is,
∑eᵢ² = ∑(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ)² ………………………… (3.4)
Therefore, to obtain expressions for the least squares estimators, we partially differentiate ∑eᵢ² with respect to β̂₀, β̂₁ and β̂₂ and set the partial derivatives equal to zero:
∂(∑eᵢ²)/∂β̂₀ = −2∑(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 ……………………… (3.5)
∂(∑eᵢ²)/∂β̂₁ = −2∑X₁ᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 ……………………… (3.6)
∂(∑eᵢ²)/∂β̂₂ = −2∑X₂ᵢ(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ) = 0 ……………………… (3.7)
Simple manipulation of (3.5)-(3.7) produces the following three equations, called the OLS normal equations:
∑Yᵢ = nβ̂₀ + β̂₁∑X₁ᵢ + β̂₂∑X₂ᵢ ………………………… (3.9)
∑YᵢX₁ᵢ = β̂₀∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ ………………………… (3.10)
∑YᵢX₂ᵢ = β̂₀∑X₂ᵢ + β̂₁∑X₁ᵢX₂ᵢ + β̂₂∑X₂ᵢ² ………………………… (3.11)
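As a brief illustration (not part of the original handout), the normal equations (3.9)-(3.11) can be solved numerically as a 3×3 linear system. The sketch below uses NumPy and purely hypothetical data arrays y, x1 and x2.

```python
import numpy as np

# Hypothetical sample data, for illustration only
y  = np.array([3.0, 5.0, 6.0, 9.0, 10.0, 12.0, 13.0, 15.0])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
n  = len(y)

# Coefficient matrix and right-hand side of the normal equations (3.9)-(3.11)
A = np.array([
    [n,          x1.sum(),         x2.sum()],
    [x1.sum(),   (x1**2).sum(),    (x1 * x2).sum()],
    [x2.sum(),   (x1 * x2).sum(),  (x2**2).sum()],
])
b = np.array([y.sum(), (y * x1).sum(), (y * x2).sum()])

beta0_hat, beta1_hat, beta2_hat = np.linalg.solve(A, b)
print(beta0_hat, beta1_hat, beta2_hat)
```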
From (3.9) we obtain β̂₀:
β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂ ………………………… (3.12)
Substituting (3.12) in (3.10), we obtain:
∑YᵢX₁ᵢ = (Ȳ − β̂₁X̄₁ − β̂₂X̄₂)∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ
⇒ ∑YᵢX₁ᵢ = Ȳ∑X₁ᵢ − β̂₁X̄₁∑X₁ᵢ − β̂₂X̄₂∑X₁ᵢ + β̂₁∑X₁ᵢ² + β̂₂∑X₁ᵢX₂ᵢ
⇒ ∑YᵢX₁ᵢ − Ȳ∑X₁ᵢ = β̂₁(∑X₁ᵢ² − X̄₁∑X₁ᵢ) + β̂₂(∑X₁ᵢX₂ᵢ − X̄₂∑X₁ᵢ)
⇒ ∑YᵢX₁ᵢ − nȲX̄₁ = β̂₁(∑X₁ᵢ² − nX̄₁²) + β̂₂(∑X₁ᵢX₂ᵢ − nX̄₁X̄₂) ……………… (3.13)
We know that
∑(Xᵢ − X̄)(Yᵢ − Ȳ) = ∑XᵢYᵢ − nX̄Ȳ = ∑xᵢyᵢ
∑(Xᵢ − X̄)² = ∑Xᵢ² − nX̄² = ∑xᵢ²
where the lower-case letters denote deviations from the respective sample means.
Applying these identities to equation (3.13), equation (3.10) can be written in deviation form as follows:
∑x₁ᵢyᵢ = β̂₁∑x₁ᵢ² + β̂₂∑x₁ᵢx₂ᵢ ………………………… (3.14)
Following the same procedure, if we substitute (3.12) in (3.11), we obtain:
∑x₂ᵢyᵢ = β̂₁∑x₁ᵢx₂ᵢ + β̂₂∑x₂ᵢ² ………………………… (3.15)
Let’s put (3.14) and (3.15) together:
∑x₁ᵢyᵢ = β̂₁∑x₁ᵢ² + β̂₂∑x₁ᵢx₂ᵢ
∑x₂ᵢyᵢ = β̂₁∑x₁ᵢx₂ᵢ + β̂₂∑x₂ᵢ²
This is a system of two equations in the two unknowns β̂₁ and β̂₂, which can easily be solved (for example, by the matrix approach). Therefore, we obtain:
β̂₁ = [∑x₁ᵢyᵢ·∑x₂ᵢ² − ∑x₁ᵢx₂ᵢ·∑x₂ᵢyᵢ] / [∑x₁ᵢ²·∑x₂ᵢ² − (∑x₁ᵢx₂ᵢ)²] ……………… (3.17)
β̂₂ = [∑x₂ᵢyᵢ·∑x₁ᵢ² − ∑x₁ᵢx₂ᵢ·∑x₁ᵢyᵢ] / [∑x₁ᵢ²·∑x₂ᵢ² − (∑x₁ᵢx₂ᵢ)²] ……………… (3.18)
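A minimal sketch (assumed, not from the handout) of the closed-form expressions (3.12), (3.17) and (3.18) in deviation form, using the same hypothetical data as the previous sketch; the estimates should coincide with those obtained from the normal equations.

```python
import numpy as np

# Same hypothetical data as in the previous sketch
y  = np.array([3.0, 5.0, 6.0, 9.0, 10.0, 12.0, 13.0, 15.0])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])

# Deviations from the sample means (the lower-case variables of the text)
yd, x1d, x2d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()

den = (x1d**2).sum() * (x2d**2).sum() - (x1d * x2d).sum()**2
b1 = ((x1d*yd).sum() * (x2d**2).sum() - (x1d*x2d).sum() * (x2d*yd).sum()) / den  # (3.17)
b2 = ((x2d*yd).sum() * (x1d**2).sum() - (x1d*x2d).sum() * (x1d*yd).sum()) / den  # (3.18)
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()                                  # (3.12)
print(b0, b1, b2)
```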
The estimates β̂₁ and β̂₂ have partial effect, or ceteris paribus, interpretations. From the estimated equation we have
ΔŶ = β̂₁ΔX₁ + β̂₂ΔX₂
so we can obtain the predicted change in Y given the changes in X₁ and X₂. In particular, when X₂ is held fixed, so that ΔX₂ = 0, then ΔŶ = β̂₁ΔX₁. The key point is that, by including X₂ in our model, we obtain a coefficient on X₁ with a ceteris paribus interpretation.
As an illustration, suppose a log-wage equation has been estimated with a coefficient of 0.125 on years of education (Educ) and 0.085 on years of work experience (Exper).
How can one interpret the coefficients of Educ and Exper? (NB: the coefficients
have a percentage interpretation when multiplied by 100)
The coefficient 0.125 means that, holding Exper fixed, another year of education is predicted to increase wage by 12.5%, on average. Alternatively, if we take two people with the same level of experience, the coefficient on Educ is the proportionate difference in predicted wage when their education levels differ by one year. Similarly, the coefficient on Exper, 0.085, means that, holding Educ fixed, another year of related work experience is predicted to increase wage by 8.5%, on average.
3.3.3. The Coefficient of Determination (𝑹𝟐 ): The Case of Two explanatory variables
In the simple regression model, we introduced R² as a measure of the proportion of the variation in the dependent variable that is explained by variation in the explanatory variable. In the multiple regression model the same measure is relevant and the same formulas are valid, but now we speak of the proportion of the variation in the dependent variable explained by all the explanatory variables included in the model.
R² = ESS/TSS = 1 − RSS/TSS = 1 − ∑eᵢ²/∑yᵢ² ………………………… (3.19)
For the two-explanatory-variable model, the residual sum of squares is
RSS = ∑eᵢ² = ∑yᵢ² − β̂₁∑x₁ᵢyᵢ − β̂₂∑x₂ᵢyᵢ ………………………… (3.20)
∴ R² = ESS/TSS = (β̂₁∑x₁ᵢyᵢ + β̂₂∑x₂ᵢyᵢ)/∑yᵢ² ………………………… (3.21)
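As a quick check (a sketch, not from the handout), R² can be computed either from the residuals as in (3.19) or from the explained sum of squares as in (3.21); both give the same value. The data below are hypothetical.

```python
import numpy as np

# Hypothetical data (same arrays as the earlier sketches)
y  = np.array([3.0, 5.0, 6.0, 9.0, 10.0, 12.0, 13.0, 15.0])
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])

X = np.column_stack([np.ones_like(y), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]        # OLS estimates
e = y - (b0 + b1*x1 + b2*x2)                             # residuals

yd, x1d, x2d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()
TSS = (yd**2).sum()
r2_rss = 1 - (e**2).sum() / TSS                          # equation (3.19)
r2_ess = (b1*(x1d*yd).sum() + b2*(x2d*yd).sum()) / TSS   # equation (3.21)
print(r2_rss, r2_ess)
```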
Adjusted Coefficient of Determination (R̄²)
One major limitation of R² is that it can be made large by adding more and more variables, even if the added variables have no economic justification. Algebraically, this is because as variables are added the residual sum of squares (RSS) goes down (it can remain unchanged, but this is rare) and thus R² goes up; if the model contains n − 1 explanatory variables, then R² = 1. Manipulating a model just to obtain a high R² is not wise.
To overcome this limitation of R², we can “adjust” it in a way that takes into account the number of variables included in a given model. This alternative measure of goodness of fit, called the adjusted R² and often symbolized as R̄², is usually reported by regression programs. It is computed as:
R̄² = 1 − [∑eᵢ²/(n − k)] / [∑yᵢ²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k) ………………………… (3.22)
This measure does not always go up when a variable is added, because of the degrees-of-freedom term n − k in the formula. That is, the primary attraction of R̄² is that it imposes a penalty for adding additional regressors to a model. If a regressor is added to the model, RSS decreases, or at least remains constant; on the other hand, the degrees of freedom of the regression, n − k, always decrease.
An interesting algebraic fact is that if we add a new regressor to a model, R̄² increases if, and only if, the t statistic on the new regressor is greater than 1 in absolute value. Thus, we see immediately that R̄² could be used to decide whether a certain additional regressor should be included in the model. R̄² has an upper bound equal to 1, but it does not strictly have a lower bound, since it can take negative values. While solving one problem, this corrected measure of goodness of fit unfortunately introduces another one: it loses its interpretation, since R̄² is no longer the percentage of variation explained.
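A small sketch (assumed, not from the handout) of formula (3.22). Here k is taken to be the total number of estimated parameters, including the intercept, which matches the degrees of freedom n − k used above.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared as in (3.22); k counts all estimated parameters,
    including the intercept (an assumption made for this sketch)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Example: R^2 = 0.90 with n = 30 observations and k = 3 parameters
print(adjusted_r2(0.90, 30, 3))   # ~0.8926
```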
The general linear regression model with 𝒌 explanatory variables is written in the form:
𝒀𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝒊 + 𝜷𝟐 𝑿𝟐𝒊 + 𝜷𝟑 𝑿𝟑𝒊 … … . . +𝜷𝒌 𝑿𝒌𝒊 + 𝑼𝒊 … … … … … … (𝟑. 𝟐𝟑)
Since 𝒊 represents the 𝒊𝒕𝒉 observation, we shall have ‘𝒏’ number of equations with ‘𝒏’
number of observations on each variable.
𝒀𝟏 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝟏 + 𝜷𝟐 𝑿𝟐𝟏 + 𝜷𝟑 𝑿𝟑𝟏 … … . . +𝜷𝒌 𝑿𝒌𝟏 + 𝑼𝟏
𝒀𝟐 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝟐 + 𝜷𝟐 𝑿𝟐𝟐 + 𝜷𝟑 𝑿𝟑𝟐 … … . . +𝜷𝒌 𝑿𝒌𝟐 + 𝑼𝟐
𝒀𝟑 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝟑 + 𝜷𝟐 𝑿𝟐𝟑 + 𝜷𝟑 𝑿𝟑𝟑 … … . . +𝜷𝒌 𝑿𝒌𝟑 + 𝑼𝟑
…………………………………………………...............
𝒀𝒏 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝒏 + 𝜷𝟐 𝑿𝟐𝒏 + 𝜷𝟑 𝑿𝟑𝒏 … … . . +𝜷𝒌 𝑿𝒌𝒏 + 𝑼𝒏
The above system of equations can be expressed in compact form using matrix notation as:
Y = Xβ + U ………………………… (3.26)
Where:
Y is the (n × 1) column vector of observations on the dependent variable,
X is the (n × (k + 1)) matrix of observations on the explanatory variables, whose first column is a column of ones (for the intercept) and whose iᵗʰ row is (1, X₁ᵢ, X₂ᵢ, …, Xₖᵢ),
β = (β₀, β₁, β₂, …, βₖ)′ is the ((k + 1) × 1) column vector of parameters, and
U is the (n × 1) column vector of disturbances.
Equation (3.26) is the true population relationship of the variables in matrix form. By taking the conditional expected value of (3.26) for given values of the explanatory variables of the model, we get the population regression function (PRF) in matrix form as:
E(Y) = E(Xβ) + E(U)
E(Y) = Xβ, because E(U) = 0
Therefore, PRF: E(Y) = Xβ ………………………… (3.27)
In econometrics, equations like (3.26) are difficult to estimate, since estimating (3.26) would require observations on the entire population. As a result, in most econometric analyses the true population relationship in (3.26) is estimated from a sample relationship. The sample relationship among the variables, with k explanatory variables and n observations, can be set out in matrix form as follows:
Y = Xβ̂ + e ………………………… (3.28)
Where: β̂ is the ((k + 1) × 1) column vector of estimates of the true population parameters β,
Xβ̂ = Ŷ is the (n × 1) column vector of predicted values of the dependent variable Y, and
e is the (n × 1) column vector of the sample disturbance (error) terms, i.e., the residuals.
As in the two explanatory variables model, in the k-explanatory variable case the OLS
estimators are obtained by minimizing
∑eᵢ² = ∑(Yᵢ − β̂₀ − β̂₁X₁ᵢ − β̂₂X₂ᵢ − ⋯ − β̂ₖXₖᵢ)² ………………………… (3.29)
Where, ∑eᵢ² is the total squared prediction error (RSS) of the model.
In matrix notation, this amounts to minimizing e′e. That is:
e′e = [e₁  e₂  ⋯  eₙ] [e₁  e₂  ⋯  eₙ]′ = e₁² + e₂² + e₃² + ⋯ + eₙ² = ∑eᵢ²
∴ ∑eᵢ² = e′e ………………………… (3.30)
From (3.28) we can derive that e = Y − Ŷ = Y − Xβ̂.
Therefore, through substitution in equation (3.30), we get:
e′e = (Y − Xβ̂)′(Y − Xβ̂)
e′e = Y′Y − β̂′X′Y − Y′Xβ̂ + β̂′X′Xβ̂ ………………………… (3.31)
The order of the matrix Y′Xβ̂ is (1 × n)(n × (k + 1))((k + 1) × 1) = (1 × 1); it is a scalar.
Since the matrix Y′Xβ̂ is a scalar (it has only a single entry), it is equal to its transpose. That is:
Y′Xβ̂ = (Y′Xβ̂)′ = β̂′X′Y
Hence (3.31) becomes:
e′e = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂ ………………………… (3.32)
Minimizing e′e in (3.32) with respect to β̂, we get the expressions for the OLS estimates in matrix form as follows:
∂(∑eᵢ²)/∂β̂ = ∂(e′e)/∂β̂ = −2X′Y + 2X′Xβ̂
To get the above expression, we used the following rules of matrix differentiation:
∂(β̂′X′Y)/∂β̂ = X′Y,  and  ∂(β̂′X′Xβ̂)/∂β̂ = 2X′Xβ̂
Equating the above expression to the null vector, 0, we obtain:
−2X′Y + 2X′Xβ̂ = 0 ⇒ X′Xβ̂ = X′Y ➔ OLS Normal Equations
β̂ = (X′X)⁻¹X′Y ………………………… (3.33)
➢ Therefore, β̂ is the vector of the required least squares estimators: β̂₀, β̂₁, β̂₂, …, β̂ₖ.
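As an illustration (assumed, not part of the handout), equation (3.33) translates directly into NumPy; in practice, np.linalg.solve is preferred to forming the explicit inverse of X′X for numerical stability.

```python
import numpy as np

# Hypothetical data: n observations, k = 2 explanatory variables plus an intercept
rng = np.random.default_rng(0)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # first column of ones
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)            # simulated dependent variable

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y                  # equation (3.33)
beta_hat_solve = np.linalg.solve(X.T @ X, X.T @ Y)           # numerically safer equivalent
print(beta_hat, beta_hat_solve)
```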
3.3.5. The Coefficient of Determination (𝑹𝟐 ): The case of ‘k’ explanatory variables
The coefficient of multiple determination (𝑹𝟐 ) of MLRMs with ‘k’ number of
explanatory variables can be derived in matrix form as follows.
ESS = β̂′X′Y − nȲ² ………………………… (3.37)
Recall that,
R² = ESS/TSS = (β̂₁∑x₁ᵢyᵢ + β̂₂∑x₂ᵢyᵢ + ⋯ + β̂ₖ∑xₖᵢyᵢ)/∑yᵢ² = (β̂′X′Y − nȲ²)/(Y′Y − nȲ²)
∴ R² = (β̂′X′Y − nȲ²)/(Y′Y − nȲ²) ………………………… (3.38)
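A short sketch (hypothetical data, not from the handout) of the matrix expressions (3.37) and (3.38):

```python
import numpy as np

# Hypothetical data: n observations, two explanatory variables plus an intercept
rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
ybar = Y.mean()
ESS = beta_hat @ X.T @ Y - n * ybar**2      # equation (3.37)
TSS = Y @ Y - n * ybar**2
R2 = ESS / TSS                              # equation (3.38)
print(R2)
```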
Numerical Example-2:
▪ As an illustration, let’s rework the consumption-income example of Chapter 2.
Observations 1 2 3 4 5 6
Consumption Expenditure (Y) 4 4 7 8 9 10
Monthly Income (X) 5 4 8 10 13 14
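The worked solution for this example is not reproduced in this part of the handout; as a hedged sketch, applying the matrix formula (3.33) to these six observations (a single regressor plus an intercept) gives approximately β̂₀ ≈ 1.54 and β̂₁ ≈ 0.61:

```python
import numpy as np

# Data from Numerical Example-2: consumption expenditure (Y) and monthly income (X)
Y = np.array([4.0, 4.0, 7.0, 8.0, 9.0, 10.0])
X = np.column_stack([np.ones(6), [5.0, 4.0, 8.0, 10.0, 13.0, 14.0]])

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^(-1) X'Y, equation (3.33)
print(beta_hat)                                # approximately [1.536, 0.607]
```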
We know that, Var(β̂) = E[(β̂ − E(β̂))(β̂ − E(β̂))′] = E[(β̂ − β)(β̂ − β)′], since E(β̂) = β. Writing this out element by element:

E[(β̂ − β)(β̂ − β)′] =

[ E(β̂₁ − β₁)²               E[(β̂₁ − β₁)(β̂₂ − β₂)]    ⋯   E[(β̂₁ − β₁)(β̂ₖ − βₖ)] ]
[ E[(β̂₂ − β₂)(β̂₁ − β₁)]    E(β̂₂ − β₂)²                ⋯   E[(β̂₂ − β₂)(β̂ₖ − βₖ)] ]
[ ⋮                           ⋮                                 ⋮                     ]
[ E[(β̂ₖ − βₖ)(β̂₁ − β₁)]    E[(β̂ₖ − βₖ)(β̂₂ − β₂)]    ⋯   E(β̂ₖ − βₖ)²             ]

=

[ Var(β̂₁)         Cov(β̂₁, β̂₂)    ⋯   Cov(β̂₁, β̂ₖ) ]
[ Cov(β̂₂, β̂₁)    Var(β̂₂)          ⋯   Cov(β̂₂, β̂ₖ) ]
[ ⋮                 ⋮                       ⋮            ]
[ Cov(β̂ₖ, β̂₁)    Cov(β̂ₖ, β̂₂)    ⋯   Var(β̂ₖ)       ]
The above matrix is a symmetric matrix containing variances along its main diagonal and
covariance of the estimators everywhere else. This matrix is, therefore, called the
Variance-covariance matrix of least squares estimators of the regression slopes. Thus,
Var(β̂) = E[(β̂ − β)(β̂ − β)′] ………………………… (3.42)
From (3.40), we know that β̂ = β + (X′X)⁻¹X′U
⇒ β̂ − β = (X′X)⁻¹X′U ………………………… (3.43)
Substituting (3.43) in (3.42):
Var(β̂) = E[{(X′X)⁻¹X′U}{(X′X)⁻¹X′U}′]
= E[(X′X)⁻¹X′UU′X(X′X)⁻¹]
= (X′X)⁻¹X′E(UU′)X(X′X)⁻¹
= (X′X)⁻¹X′(σᵤ²Iₙ)X(X′X)⁻¹
= σᵤ²(X′X)⁻¹ ………………………… (3.44)
In the model with two explanatory variables (working with the slope coefficients in deviation form):

(β̂ − β) = [ (β̂₁ − β₁) ]
           [ (β̂₂ − β₂) ]

(β̂ − β)′ = [ (β̂₁ − β₁)   (β̂₂ − β₂) ]

∴ (β̂ − β)(β̂ − β)′ = [ (β̂₁ − β₁)²               (β̂₁ − β₁)(β̂₂ − β₂) ]
                       [ (β̂₁ − β₁)(β̂₂ − β₂)     (β̂₂ − β₂)²            ]

And, E[(β̂ − β)(β̂ − β)′] = [ Var(β̂₁)         Cov(β̂₁, β̂₂) ]
                             [ Cov(β̂₁, β̂₂)    Var(β̂₂)       ]

Evaluating σᵤ²(X′X)⁻¹ for this model in deviation form gives:

Var(β̂₁) = σᵤ²∑x₂² / [∑x₁²∑x₂² − (∑x₁x₂)²] ………………………… (3.46)

Var(β̂₂) = σᵤ²∑x₁² / [∑x₁²∑x₂² − (∑x₁x₂)²] ………………………… (3.47)

Cov(β̂₁, β̂₂) = −σᵤ²∑x₁x₂ / [∑x₁²∑x₂² − (∑x₁x₂)²] ………………………… (3.48)
The only unknown part of the variances and covariance of the estimators is σᵤ². Thus, we need an unbiased estimate of the variance of the population error, σᵤ². As we established in the simple regression model, σ̂ᵤ² = ∑eᵢ²/(n − 2) is an unbiased estimator of σᵤ² there. For MLRMs with two explanatory variables, we estimate three parameters including the constant term, and therefore:
σ̂ᵤ² = ∑eᵢ²/(n − 3) ………………………… (3.51)
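A hedged sketch (not from the handout) of how σ̂ᵤ² and the variance-covariance matrix σ̂ᵤ²(X′X)⁻¹ would be computed in general; it assumes a design matrix X whose first column is ones, so that p = k + 1 parameters are estimated.

```python
import numpy as np

def ols_varcov(X: np.ndarray, Y: np.ndarray):
    """Return (beta_hat, sigma2_hat, varcov) for OLS with an intercept column in X."""
    n, p = X.shape                                 # p = k + 1 parameters
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # equation (3.33)
    e = Y - X @ beta_hat
    sigma2_hat = (e @ e) / (n - p)                 # unbiased estimate of sigma_u^2
    varcov = sigma2_hat * np.linalg.inv(X.T @ X)   # Var(beta_hat) = sigma_u^2 (X'X)^(-1)
    return beta_hat, sigma2_hat, varcov

# Example with hypothetical data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=30)
b, s2, V = ols_varcov(X, Y)
print(b, np.sqrt(np.diag(V)))   # estimates and their standard errors
```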
This completes the discussion of the variance-covariance of the parameters. Now it is time to see the minimum variance property.
Minimum variance of β̂
To show that all the β̂ᵢ’s in the β̂ vector are best estimators, we have to prove that the variances obtained in (3.44) are the smallest among all other possible linear unbiased estimators. We follow the same procedure as in the single-explanatory-variable model: we first assume an alternative linear unbiased estimator and then establish that its variance is greater than that of the OLS estimator of the regression model.
A linear function of a normal random variable is itself normal. Thus, since β̂ is a linear combination of Y, its probability distribution will be normal if Y is normal.
That is, the sampling distributions of OLS estimates in MLRMs are normal with mean
values of the true values of their respective population parameters 𝜷 and variances given
by 𝝈𝟐𝒖 (𝑿′ 𝑿)−𝟏 .
Symbolically:
β̂ ~ N[β, σᵤ²(X′X)⁻¹] ………………………… (3.56)
or, equivalently, stating the spread in terms of the standard errors rather than the variances,
β̂ ~ N[β, √(σᵤ²(X′X)⁻¹)] ………………………… (3.57)
The normality of the sampling distributions of the OLS estimates around the true values of the population parameters implies that, under the assumptions of the multiple linear regression analysis (AMLRA), an OLS estimate is equally likely to overestimate or underestimate the true value of the population parameter in a particular sample. However, the most probable value of an estimate in a particular sample is the true value of the population parameter.
In multiple regression models, we will undertake two types of tests of significance. These
are test of individual and overall significance. Let’s examine them one by one.
1. Tests of Individual Significance
This is the process of verifying the statistical significance of each parameter estimate of a model individually; that is, checking whether the impact of a single explanatory variable on the dependent variable is significant or not after taking the impact of all other explanatory variables on the dependent variable into account. To elaborate on tests of individual significance, consider the following model of the determinants of Teff farm productivity.
Y = β̂₀ + β̂₁X₁ + β̂₂X₂ + eᵢ ………………………… (3.58)
Where: Y is total output of Teff per hectare of land, and X₁ and X₂ are the amount of fertilizer used and rainfall, respectively.
Given the above model, suppose we need to check whether the application of fertilizer (X₁) has a significant effect on Teff farm productivity, holding the effect of rainfall (X₂) constant; i.e., whether fertilizer (X₁) is a significant factor affecting Teff farm productivity after taking the impact of rainfall into account. In this case, we test the significance of β̂₁ while holding the influence of X₂ on Y constant. Mathematically, tests of individual significance involve testing the following two pairs of null and alternative hypotheses.
𝑨. 𝑯𝟎 : 𝜷𝟏 = 𝟎 B. 𝑯𝟎 : 𝜷𝟐 = 𝟎
𝑯𝑨 : 𝜷 𝟏 ≠ 𝟎 𝑯𝑨 : 𝜷𝟐 ≠ 𝟎
The null hypothesis in 𝐴 states that holding 𝑿𝟐 constant, 𝑿𝟏 has no significant (linear)
influence on 𝒀. Similarly, the null hypothesis in ‘𝑩’ states that holding 𝑿𝟏 constant,
X₂ has no influence on the dependent variable Y. To test the individual significance of parameter estimates in MLRMs, we can use the usual statistical testing techniques, such as the standard error test and the t-test. These require the standard error of the estimate; for β̂₁, for example:
SE(β̂₁) = √Var(β̂₁) = √{ σ̂ᵤ²∑x₂² / [∑x₁²∑x₂² − (∑x₁x₂)²] },  where σ̂ᵤ² = ∑eᵢ²/(n − 3)
Step 3: Make a decision; that is, accept or reject the null hypothesis. In this case, if 2SE(β̂₁) > β̂₁, accept the null hypothesis; that is, the estimate β̂₁ is not statistically significant, and vice versa.
For the t-test, the degrees of freedom equal the number of observations less the number of estimated parameters; in the case of the two-explanatory-variable model, the number of parameters is 3, so the degrees of freedom are n − 3.
Step 4: Compute the t-statistic (t_C) of the estimate under the null hypothesis. That is,
t_C = (β̂₁ − β₁) / SE(β̂₁)
Since β₁ = 0 under the null hypothesis, the computed t-statistic (t_C) of the estimate β̂₁ is
t_C = (β̂₁ − 0) / SE(β̂₁) ⇒ t_C = β̂₁ / SE(β̂₁)
Step 5: Compare t_C with the tabulated (critical) value t_t and make a decision.
If |t_C| < t_t, accept the null hypothesis; that is, β̂₁ is not significant at the chosen level of significance. This would imply that, holding X₂ constant, X₁ has no significant linear impact on Y.
If |t_C| > t_t, reject the null hypothesis and hence accept the alternative one; that is, β̂₁ is significant at the chosen level of significance. This would imply that, holding X₂ constant, X₁ has a significant linear impact on Y.
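A minimal sketch (assumed, not from the handout) of Steps 4 and 5: computing each coefficient's t-statistic, the two-sided critical value and the p-value with SciPy's t distribution. Data and parameter values are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical data with two explanatory variables
rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, 0.1]) + rng.normal(size=n)

p = X.shape[1]                                    # number of parameters (here 3)
beta = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta
sigma2 = (e @ e) / (n - p)
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t_stats = beta / se                               # t_C under H0: beta_j = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p)      # two-sided 5% critical value
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)
print(t_stats, t_crit, p_values)
```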
3.5.2 Test of the Overall Significance of MLRMs
This is the process of testing the joint significance of parameter estimates of the model. It
involves checking whether the variation in dependent variable of a model is significantly
explained by the variation in all explanatory variables included in the model. To elaborate
test of the overall significance consider a model:
Yᵢ = β̂₀ + β̂₁X₁ + β̂₂X₂ + β̂₃X₃ + ⋯ + β̂ₖXₖ + eᵢ
Given this model, we may be interested in knowing whether the variation in the dependent variable can be attributed to the variation in all the explanatory variables of the model or not. If no amount of the variation in the dependent variable can be attributed to the variation of the explanatory variables included in the model, then none of the explanatory variables included in the model is relevant; that is, all slope coefficient estimates will be statistically indistinguishable from zero. On the other hand, if a significant proportion of the variation in the dependent variable can be attributed to the variation in the explanatory variables, then at least one of the explanatory variables included in the model is relevant; that is, at least one of the slope coefficient estimates will be statistically different from zero (significant).
Thus, this test has the following null and alternative hypotheses to test:
𝑯𝟎 : 𝜷 𝟏 = 𝜷 𝟐 = 𝜷 𝟑 … … … … … … . . = 𝜷 𝒌 = 𝟎
𝑯𝑨 : 𝐴t least one of the 𝜷 is different from zero
The null hypothesis is a joint hypothesis: it states that none of the explanatory variables included in the model is relevant, in the sense that no amount of the variation in Y can be attributed to the variation in all the explanatory variables simultaneously. That means that if all the explanatory variables of the model change simultaneously, the value of Y is left unchanged.
How to approach test of the overall significance of MLRM?
If the null hypothesis is true, that is, if all the explanatory variables included in the model are irrelevant, then there would not be a significant difference in explanatory power between the models with and without all the explanatory variables. Thus, the test of the overall significance of MLRMs can be approached by testing whether the difference in explanatory power of the model with and without all the explanatory variables is significant or not. In this case, if the difference is insignificant we accept the null hypothesis, and we reject it if the difference is significant.
Similarly, this test can be done by comparing the sums of squared errors (RSS) of the model with and without all the explanatory variables. In this case, we accept the null hypothesis if the difference between the sums of squared errors of the model with and without all the explanatory variables is insignificant. The idea is straightforward: if all the explanatory variables are irrelevant, then including them in the model contributes an insignificant amount to the explanation of the dependent variable, and as a result the sample prediction error of the model would not fall significantly.
Let the Restricted Residual Sum of Square (RRSS) be the sum of squared errors of the
model without the inclusion of all the explanatory variables of the model, i.e., the residual
sum of square of the model obtained assuming that all the explanatory variables are
irrelevant (under the null hypothesis) and Unrestricted Residual Sum of Squares
(URSS) be the sum of squared errors of the model with the inclusion of all explanatory
variables in the model. It is always true that 𝑹𝑹𝑺𝑺 ≥ 𝑼𝑹𝑺𝑺 (why?). To elaborate these
concepts consider the following model
Yᵢ = β̂₀ + β̂₁X₁ + β̂₂X₂ + β̂₃X₃ + ⋯ + β̂ₖXₖ + eᵢ
This model is called the unrestricted model. The test of joint hypothesis is given by:
𝑯𝟎 : 𝜷 𝟏 = 𝜷 𝟐 = 𝜷 𝟑 … … … … … … . . = 𝜷 𝒌 = 𝟎
𝑯𝑨 : 𝐴t least one of the 𝜷 is different from zero
We know that:
Yᵢ = Ŷᵢ + eᵢ ⇒ eᵢ = Yᵢ − Ŷᵢ
∑eᵢ² = ∑(Yᵢ − Ŷᵢ)²
This sum of squared error is called unrestricted residual sum of square (URSS).
However, if the null hypothesis is assumed to be true, i.e., when all the slope coefficients
are zero the model shrinks to:
Yᵢ = β̂₀ + eᵢ
This model is called the restricted model. Applying OLS, we obtain:
β̂₀ = ∑Yᵢ/n = Ȳ ………………………… (3.59)
Therefore, eᵢ = Yᵢ − β̂₀, but β̂₀ = Ȳ, so
eᵢ = Yᵢ − Ȳ
∴ ∑eᵢ² = ∑(Yᵢ − Ȳ)² = ∑yᵢ² = TSS
The sum of squared error when the null hypothesis is assumed to be true is called
Restricted Residual Sum of Square (RRSS) and this is equal to the total sum of
square (TSS).
The ratio:
F = [(RRSS − URSS)/(K − 1)] / [URSS/(n − K)] ~ F(K−1, n−K) ………………………… (3.60)
has an F-distribution with K − 1 and n − K degrees of freedom for the numerator and denominator, respectively.
RRSS = TSS
URSS = ∑yᵢ² − β̂₁∑x₁y − β̂₂∑x₂y − ⋯ − β̂ₖ∑xₖy = RSS
i.e., URSS = RSS
F = [(TSS − RSS)/(K − 1)] / [RSS/(n − K)] ~ F(K−1, n−K)
F_C(K−1, n−K) = [ESS/(K − 1)] / [RSS/(n − K)] ………………………… (3.61)
➢ If the null hypothesis is not true, then the difference between RRSS and URSS (i.e., between TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, reject the null hypothesis if the computed value of F (the F test statistic) becomes too large, or if the P-value for the F-statistic is lower than the chosen level of significance (α), and vice versa.
In short, the Decision Rule is to
➢ Reject 𝑯𝟎 if 𝑭𝒄 > 𝑭𝟏−𝜶 (𝒌 − 𝟏, 𝒏 − 𝒌), 𝒐𝒓 𝑷 − 𝒗𝒂𝒍𝒖𝒆 < 𝜶, and vice versa.
➢ Implication: 𝑹𝒆𝒋𝒆𝒄𝒕𝒊𝒐𝒏 𝒐𝒇 𝑯𝟎 implies that the parameters of the model are jointly
significant or the dependent variable 𝒀 is linearly related to at least one of the
independent variables included in the model.
NB: From the ANOVA table above, we can obtain the F-statistics of the model.
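A hedged sketch (not from the handout) of the overall F-test in (3.60)-(3.61): the F statistic is computed from the restricted and unrestricted residual sums of squares, and its p-value from SciPy's F distribution. Here K counts all estimated parameters including the intercept (an assumption consistent with the n − K degrees of freedom above), and the data are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical data with k = 2 explanatory variables (K = 3 parameters)
rng = np.random.default_rng(3)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta
RSS = e @ e                                  # unrestricted residual sum of squares (URSS)
TSS = ((Y - Y.mean())**2).sum()              # restricted residual sum of squares (RRSS)
ESS = TSS - RSS

F = (ESS / (K - 1)) / (RSS / (n - K))        # equation (3.61)
p_value = stats.f.sf(F, K - 1, n - K)        # upper-tail p-value
print(F, p_value)
```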