
CHAPTER TWO
REGRESSION ANALYSIS
Part II: MULTIPLE LINEAR REGRESSION ANALYSIS

3.1. The Concept of Multiple Linear Regression Analysis


So far, we have been discussing the simplest form of regression analysis, simple linear regression analysis. However, a more realistic representation of real-world economic relationships is obtained from multiple regression analysis, because most economic variables are determined by more than one explanatory variable.

MLRMs also have some advantages over SLRMs. First, it is doubtful that an SLRM can support a ceteris paribus conclusion. To reach a ceteris paribus conclusion, the effect of all other factors should be controlled. In SLRMs, the effect of all other factors is assumed to be captured by the random term 𝑼. In this case, a ceteris paribus interpretation would be possible only if all factors included in the random error term were uncorrelated with X. This, however, is rarely realistic, as most economic variables are interdependent. In effect, in most SLRMs the coefficient of X misrepresents the partial effect of 𝑿 on 𝒀. This problem in econometrics is known as exclusion (omitted-variable) bias. Exclusion bias could be minimized in econometric analysis if we could explicitly control for the effects of all other variables that determine 𝒀. Multiple regression analysis is more amenable to ceteris paribus analysis because it allows us to explicitly control for many other factors that simultaneously affect the dependent variable. This is important both for testing economic theories and for evaluating policy effects when we must rely on non-experimental data.

MLRMs are also flexible. A single MLRM can be used to estimate the partial effects of many variables on the dependent variable. In addition to their flexibility, MLRMs also have higher explanatory power than SLRMs: the larger the number of explanatory variables in a model, the larger the part of 𝒀 that can be explained or predicted by the model.

Consider the following relationship among four economic variables: quantity demanded (Yi), price of the good (P1), price of a substitute good (P2), and consumer's income
(M). Assuming a linear functional form of the relationship, the true relationship can be modeled as follows:
𝒀𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑷𝟏 + 𝜷𝟐 𝑷𝟐 + 𝜷𝟑 𝑴 + 𝑼𝒊 … … … … … … … … . . (𝟑. 𝒂)
Where,
𝜷′𝒔 are fixed and unknown parameters, and 𝑼𝒊 is the population disturbance term.
𝜷𝟎 is the intercept.
𝜷𝟏 measures the change in 𝒀𝒊 with respect to 𝑷𝟏 , holding other factors fixed.
𝜷𝟐 measures the change in 𝒀𝒊 with respect to 𝑷𝟐 , holding other factors fixed.
𝜷𝟑 measures the change in 𝒀𝒊 with respect to 𝑴, holding other factors fixed.
✓ Equation (3. 𝑎) is a multiple regression model with three explanatory
variables.

Just as in the case of simple regression, the variable 𝑼𝒊 is the error term or disturbance
term. No matter how many explanatory variables we include in our model, there will
always be factors we cannot include, and these are collectively contained in 𝑼𝒊 . This
disturbance term is of similar nature to that in simple regression, reflecting:
- Omissions of other relevant variables
- Random nature of human responses
- Errors of aggregation and measurement, etc.

In this chapter, we first discuss the basic assumptions of multiple regression analysis, then proceed to the case of two explanatory variables, and finally generalize the multiple regression model to the case of k explanatory variables using matrix algebra.

3.2. The Basic Assumptions of the Classical Regression Analysis (Revisited)


These are necessary conditions to obtain desirable results from the application of OLS to
estimate MLRMs. These include:
1. Randomness of the error term: The variable 𝑼 is a real random variable.
2. Zero mean of the error term: 𝑬(𝑼𝒊 ⁄𝑿𝒊 ) = 𝟎
3. Homoskedasticity: The variance of each Ui is the same for all the Xi values,
i.e., E(Ui²) = σu² (constant variance)
4. Normality of Ui: The values of each Ui are normally distributed,
i.e., Ui ~ N(0, σu²)

5. No autocorrelation or serial correlation: The values of Ui (corresponding to Xi) are
independent of the values of any other Uj (corresponding to Xj) for i ≠ j,
i.e., E(UiUj) = 0 for i ≠ j
6. The values of the explanatory variables included in the model are fixed in repeated
sampling
7. Independence of 𝑼𝒊 and 𝑿𝒊 : Every disturbance term 𝑼𝒊 is independent of the
explanatory variables. i.e., 𝑬(𝑼𝒊 𝑿𝟏𝒊 ) = 𝑬(𝑼𝒊 𝑿𝟐𝒊 ) = 𝟎
➢ This condition is automatically fulfilled if we assume that the values of the
X’s are a set of fixed numbers in all (hypothetical) samples.
8. No perfect multicollinearity: The explanatory variables of the model are not
perfectly correlated. That is, no explanatory variable of the model is an exact linear
combination of the others. Perfect collinearity is a problem because the least squares
estimator cannot separately attribute variation in Y to the individual independent variables.
   ▪ Example: Suppose we regress weight (Y) on height measured in meters
     (X1) and height measured in centimeters (X2). How could we decide
     which regressor to attribute the change in weight to?
9. No model specification error
10. No error of aggregation and so on.
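
To make these assumptions concrete, here is a minimal Python simulation sketch (not part of the original handout; all names are illustrative) of a data-generating process that satisfies them: fixed regressors that are not perfectly collinear, and normal, homoskedastic, serially uncorrelated errors with zero mean.

import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.uniform(0, 10, n)            # explanatory variables: treated as fixed in repeated samples
X2 = rng.uniform(0, 5, n)             # not an exact linear combination of X1 (no perfect collinearity)
beta0, beta1, beta2 = 2.0, 1.5, -0.8  # true population parameters (unknown in practice)
sigma_u = 1.0
U = rng.normal(0.0, sigma_u, n)       # U_i ~ N(0, sigma_u^2): zero mean, homoskedastic, normal
Y = beta0 + beta1 * X1 + beta2 * X2 + U
print(U.mean(), U.var())              # sample analogues of E(U) = 0 and constant variance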

3.3. Estimation of a Multiple Regression Model


As in the case of SLRA, we can estimate the relationship between variables of a multiple
regression model using the method of Ordinary Least Squares (OLS). Thus, to
understand the nature of multiple regression analysis, we start our analysis with the case
of two explanatory variables, and then we shall extend this to the case of k-explanatory
variables.

3.3.1 A Model with Two Explanatory Variables


An MLRM with two explanatory variables is expressed mathematically as:

Yi = β0 + β1X1i + β2X2i + Ui … … … … … … … … … (3.1)

The conditional expected value of the above equation, given the explanatory variables of the model, is called the population regression function (PRF) and is given by:

E(Yi) = β0 + β1X1i + β2X2i … … … … … … … … (3.1a),   since E(Ui) = 0

Where, the β's are the population parameters:
β0 is referred to as the intercept, and
β1 and β2 are known as the slope coefficients of the PRF.

Since the population regression equation is unknown, it has to be estimated from sample data. That is, (3.1a) has to be estimated by the sample regression function as follows:

Ŷi = β̂0 + β̂1X1i + β̂2X2i … … … … … … … … … … … … (3.1b)

Where, the β̂j are estimates of the βj and Ŷ is known as the predicted value of Y.

Therefore, given sample observations on Yi, X1i and X2i, we can estimate (3.1a) by (3.2) as follows:

Yi = β̂0 + β̂1X1i + β̂2X2i + ei … … … … … … … … . . . (3.2)

From (3.2) we can obtain the prediction errors (residuals) of the model as:

ei = Yi − Ŷi = Yi − β̂0 − β̂1X1i − β̂2X2i … … … … … … … … . . (3.3)

The method of ordinary least squares chooses the estimates that minimize the sum of squared residuals of the model. Therefore, squaring and summing (3.3) over all sample values of the variables, we get the total squared prediction error of the model:

∑ei² = ∑(Yi − β̂0 − β̂1X1i − β̂2X2i)² … … … … … … … … (3.4)

To obtain expressions for the least squares estimators, we partially differentiate ∑ei² with respect to β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero:

∂[∑ei²]/∂β̂0 = −2∑(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 … … … … … … . (3.5)

∂[∑ei²]/∂β̂1 = −2∑X1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 … … … … … (3.6)

∂[∑ei²]/∂β̂2 = −2∑X2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 … … … … … (3.8)

Rearranging these first-order conditions produces the following three equations, called the OLS Normal Equations:

∑Yi = nβ̂0 + β̂1∑X1i + β̂2∑X2i … … … … … … … … … … . . . (3.9)

∑YiX1i = β̂0∑X1i + β̂1∑X1i² + β̂2∑X1iX2i … … … … . . … (3.10)

∑YiX2i = β̂0∑X2i + β̂1∑X1iX2i + β̂2∑X2i² … … … … . . … (3.11)

From (3.9) we obtain β̂0:

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 … … … … … … … … … … … … … … … . . (3.12)
Substituting (3.12) in (3.10), we obtain:

∑YiX1i = (Ȳ − β̂1X̄1 − β̂2X̄2)∑X1i + β̂1∑X1i² + β̂2∑X1iX2i

⇒ ∑YiX1i = Ȳ∑X1i − β̂1X̄1∑X1i − β̂2X̄2∑X1i + β̂1∑X1i² + β̂2∑X1iX2i

⇒ ∑YiX1i − Ȳ∑X1i = β̂1(∑X1i² − X̄1∑X1i) + β̂2(∑X1iX2i − X̄2∑X1i)

⇒ ∑YiX1i − nȲX̄1 = β̂1(∑X1i² − nX̄1²) + β̂2(∑X1iX2i − nX̄1X̄2) … … … . . . (3.13)

We know that:

∑(Xi − X̄)(Yi − Ȳ) = ∑XiYi − nX̄Ȳ = ∑xiyi
∑(Xi − X̄)² = ∑Xi² − nX̄² = ∑xi²

Substituting these identities into equation (3.13), equation (3.10) can be written in deviation form as follows:

∑x1iyi = β̂1∑x1i² + β̂2∑x1ix2i … … … … … … … . . (3.14)
Following the same procedure with equation (3.11), we obtain:

∑x2iyi = β̂1∑x1ix2i + β̂2∑x2i² … … … … … … … … … … … (3.15)

Let's put (3.14) and (3.15) together:

∑x1iyi = β̂1∑x1i² + β̂2∑x1ix2i
∑x2iyi = β̂1∑x1ix2i + β̂2∑x2i²

Therefore, β̂1 and β̂2 can easily be solved for using the matrix approach.

We can rewrite the above two equations in matrix form as follows:

[ ∑x1i²      ∑x1ix2i ] [ β̂1 ]   [ ∑x1iyi ]
[ ∑x1ix2i    ∑x2i²   ] [ β̂2 ] = [ ∑x2iyi ] … … … … . (3.16)

We can solve this system for β̂1 and β̂2 using Cramer's rule: each estimate equals the determinant of the coefficient matrix with the corresponding column replaced by the right-hand-side vector, divided by the determinant of the coefficient matrix.

Therefore, we obtain:

β̂1 = [∑x1iyi·∑x2i² − ∑x1ix2i·∑x2iyi] / [∑x1i²·∑x2i² − (∑x1ix2i)²] … … … … … … … … . . (3.17)

β̂2 = [∑x2iyi·∑x1i² − ∑x1ix2i·∑x1iyi] / [∑x1i²·∑x2i² − (∑x1ix2i)²] … … … … … … … … . . (3.18)
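
As a quick numerical illustration of (3.12), (3.17) and (3.18), the following Python sketch (illustrative only; the function and variable names are not from the handout) computes the three estimates from the deviation sums.

import numpy as np

def ols_two_regressors(y, x1, x2):
    """OLS estimates for Y = b0 + b1*X1 + b2*X2 + U via (3.12), (3.17) and (3.18)."""
    y_d, x1_d, x2_d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()   # deviation form
    s11, s22, s12 = (x1_d**2).sum(), (x2_d**2).sum(), (x1_d * x2_d).sum()
    s1y, s2y = (x1_d * y_d).sum(), (x2_d * y_d).sum()
    det = s11 * s22 - s12**2                          # common denominator of (3.17) and (3.18)
    b1 = (s1y * s22 - s12 * s2y) / det                # equation (3.17)
    b2 = (s2y * s11 - s12 * s1y) / det                # equation (3.18)
    b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()   # equation (3.12)
    return b0, b1, b2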

3.3.2. Interpretation of the OLS Regression Equation

More important than the details underlying the computation of the β̂j's is the interpretation of the estimated equation. Let's consider the case of two explanatory variables:

Ŷi = β̂0 + β̂1X1 + β̂2X2

The intercept β̂0 in the above equation is the predicted value of Y when X1 = 0 and X2 = 0. Sometimes, setting X1 and X2 both equal to zero is an interesting scenario; in other cases, it may not make sense. Nevertheless, the intercept is always needed to obtain a prediction of Y from the OLS regression line.

The estimates β̂1 and β̂2 have partial effect, or ceteris paribus, interpretations. From the above equation, we have

∆Ŷi = β̂1∆X1 + β̂2∆X2

so we can obtain the predicted change in Y given changes in X1 and X2. In particular, when X2 is held fixed, so that ∆X2 = 0, then ∆Ŷi = β̂1∆X1. The key point is that, by including X2 in our model, we obtain a coefficient on X1 with a ceteris paribus interpretation. This is why multiple regression analysis is so useful.
Similarly, ∆Ŷi = β̂2∆X2, holding X1 fixed.

Example 3.1: Suppose an econometrician has estimated the following wage model based on a sample of 100 individuals from a given city:

Predicted ln(Wagei) = 0.75 + 0.125Educi + 0.085Experi

Where, ln(Wage) is the natural logarithm of the hourly wage measured in ETB,
Educ is the educational attainment of the sample respondent measured in years of schooling, and
Exper is experience measured in years of related work experience.

How can one interpret the coefficients of Educ and Exper? (NB: the coefficients have a percentage interpretation when multiplied by 100.)

The coefficient 0.125 means that, holding Exper fixed, another year of education is predicted to increase wage by 12.5%, on average. Alternatively, if we take two people with the same level of experience, the coefficient on Educ is the proportionate difference in predicted wage when their education levels differ by one year. Similarly, the coefficient on Exper, 0.085, means that, holding Educ fixed, another year of related work experience is predicted to increase wage by 8.5%, on average.
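
Because the dependent variable is in logarithms, 12.5% is the approximate effect of one more year of education; the exact proportionate change implied by the model is exp(0.125) − 1. A one-line check (illustrative only):

import math

b_educ = 0.125                                       # coefficient on Educ in Example 3.1
print(b_educ * 100, (math.exp(b_educ) - 1) * 100)    # approximate 12.5% vs exact ≈ 13.3%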

3.3.3. The Coefficient of Determination (𝑹𝟐 ): The Case of Two explanatory variables
In the simple regression model, we introduced R² as a measure of the proportion of variation in the dependent variable that is explained by variation in the explanatory variable. In the multiple regression model the same measure is relevant, and the same formulas are valid, but now we speak of the proportion of variation in the dependent variable explained by all the explanatory variables included in the model.

The coefficient of determination is defined as:

R² = ESS/TSS = 1 − RSS/TSS = 1 − ∑ei²/∑yi² … … … … … … … … … … … … . . (3.19)

In the present model of two explanatory variables:

yi = β̂1x1i + β̂2x2i + ei
∴ ei = yi − β̂1x1i − β̂2x2i

∑ei² = ∑(yi − β̂1x1i − β̂2x2i)²
∑ei² = ∑ei(yi − β̂1x1i − β̂2x2i)
     = ∑eiyi − β̂1∑x1iei − β̂2∑x2iei
     = ∑eiyi                              since ∑x1iei = ∑x2iei = 0
∑ei² = ∑yi(yi − β̂1x1i − β̂2x2i)
∑ei² = ∑yi² − β̂1∑x1iyi − β̂2∑x2iyi

∑yi² = β̂1∑x1iyi + β̂2∑x2iyi + ∑ei² … … … … . . (3.20)
(Total sum of squares) = (Explained sum of squares) + (Residual sum of squares)

∴ R² = ESS/TSS = (β̂1∑x1iyi + β̂2∑x2iyi) / ∑yi² … … … … … … … … … . . (3.21)
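
The following short sketch (illustrative; it reuses the hypothetical ols_two_regressors helper sketched earlier) computes R² from (3.21).

import numpy as np

def r_squared_two_regressors(y, x1, x2, b1, b2):
    """R^2 = ESS/TSS as in (3.21), with all variables in deviation form."""
    y_d, x1_d, x2_d = y - y.mean(), x1 - x1.mean(), x2 - x2.mean()
    ess = b1 * (x1_d * y_d).sum() + b2 * (x2_d * y_d).sum()   # explained sum of squares
    tss = (y_d ** 2).sum()                                    # total sum of squares
    return ess / tss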

As in simple regression, R² is also viewed as a measure of the prediction ability of the model over the sample period, or as a measure of how well the estimated regression fits the data. If R² is high, the model is said to "fit" the data well; if R² is low, the model does not fit the data well.

Adjusted Coefficient of Determination (R̄²)

One major limitation of R² is that it can be made large by adding more and more variables, even if the added variables have no economic justification. Algebraically, as variables are added, the sum of squared errors (RSS) goes down (it can remain unchanged, but this is rare) and thus R² goes up. If the model contains n − 1 variables, then R² = 1. Manipulating the model just to obtain a high R² is not wise.
To overcome this limitation of R², we can "adjust" it in a way that takes into account the number of explanatory variables included in a given model. This alternative measure of goodness of fit, called the adjusted R² and often symbolized as R̄², is usually reported by regression programs. It is computed as:

R̄² = 1 − [∑ei²/(n − k)] / [∑yi²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k) … … … … … … (3.22)

This measure does not always go up when a variable is added, because of the degrees-of-freedom term n − k. That is, the primary attraction of R̄² is that it imposes a penalty for adding additional regressors to a model: if a regressor is added, RSS decreases or at least remains constant, but the degrees of freedom of the regression, n − k, always decrease.

An interesting algebraic fact is that if we add a new regressor to a model, R̄² increases if, and only if, the t statistic on the new regressor is greater than 1 in absolute value. Thus, we see immediately that R̄² could be used to decide whether a certain additional regressor should be included in the model. R̄² has an upper bound equal to 1, but it does not strictly have a lower bound, since it can take negative values. While solving one problem, this corrected measure of goodness of fit unfortunately introduces another: it loses its interpretation; R̄² is no longer the percentage of variation explained.
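
A minimal sketch of the adjustment in (3.22) (illustrative; here k counts all parameters including the intercept):

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 from (3.22)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Adding a regressor always raises R^2 a little, but the adjusted value can fall:
print(adjusted_r_squared(0.95, 30, 3), adjusted_r_squared(0.951, 30, 4))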

3.3.4. General Linear Regression Model and Matrix Approach


So far we have discussed regression models containing one or two explanatory variables. The econometric analysis of the simple regression model and of the model with two explanatory variables was carried out with ordinary algebra. However, a model with more than two explanatory variables is virtually intractable with this tool. For this reason, a multiple regression model with more than two explanatory variables is more easily estimated using matrix algebra.

The general linear regression model with 𝒌 explanatory variables is written in the form:
𝒀𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝒊 + 𝜷𝟐 𝑿𝟐𝒊 + 𝜷𝟑 𝑿𝟑𝒊 … … . . +𝜷𝒌 𝑿𝒌𝒊 + 𝑼𝒊 … … … … … … (𝟑. 𝟐𝟑)

Where (i = 1, 2, 3, … , n), n is the sample size, β0 is the intercept, β1 to βk are the partial slope coefficients, Ui is the stochastic disturbance term, and i denotes the ith observation.

Since 𝒊 represents the 𝒊𝒕𝒉 observation, we shall have ‘𝒏’ number of equations with ‘𝒏’
number of observations on each variable.
𝒀𝟏 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝟏 + 𝜷𝟐 𝑿𝟐𝟏 + 𝜷𝟑 𝑿𝟑𝟏 … … . . +𝜷𝒌 𝑿𝒌𝟏 + 𝑼𝟏
𝒀𝟐 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝟐 + 𝜷𝟐 𝑿𝟐𝟐 + 𝜷𝟑 𝑿𝟑𝟐 … … . . +𝜷𝒌 𝑿𝒌𝟐 + 𝑼𝟐
𝒀𝟑 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝟑 + 𝜷𝟐 𝑿𝟐𝟑 + 𝜷𝟑 𝑿𝟑𝟑 … … . . +𝜷𝒌 𝑿𝒌𝟑 + 𝑼𝟑
…………………………………………………...............
𝒀𝒏 = 𝜷𝟎 + 𝜷𝟏 𝑿𝟏𝒏 + 𝜷𝟐 𝑿𝟐𝒏 + 𝜷𝟑 𝑿𝟑𝒏 … … . . +𝜷𝒌 𝑿𝒌𝒏 + 𝑼𝒏
The above system of equations can be expressed in a compact form by using matrix
notation as:
[ Y1 ]   [ 1   X11   X21   ……   Xk1 ] [ β0 ]   [ U1 ]
[ Y2 ]   [ 1   X12   X22   ……   Xk2 ] [ β1 ]   [ U2 ]
[ Y3 ] = [ 1   X13   X23   ……   Xk3 ] [ β2 ] + [ U3 ]
[  ⋮ ]   [ ⋮    ⋮     ⋮            ⋮ ] [  ⋮ ]   [  ⋮ ]
[ Yn ]   [ 1   X1n   X2n   ……   Xkn ] [ βk ]   [ Un ]

i.e., Y = X·β + U

In short 𝒀 = 𝑿𝜷 + 𝑼 … … … … … … … … … … … … … … … … … … … … … … … . . (𝟑. 𝟐𝟔)


Where: 𝒀 is an (𝒏 × 𝟏) column vector of true values of 𝒀.
𝑿 is an (𝒏 × (𝒌 + 𝟏)) matrix of true values of 𝒌 explanatory variables of the
model where the first column of 1’𝑠 represents the intercept term.
𝜷 is a ((𝑲 + 𝟏) × 𝟏) column vector of the population parameters
𝜷𝟎 , 𝜷𝟏 , 𝜷𝟐 , … . , 𝜷𝑲 .
𝑼 is an (𝒏 × 𝟏) column vector of the population random disturbance (error)
term.

Equation (3.26) is the true population relationship of the variables in matrix format. By taking the conditional expected value of (3.26) for given values of the explanatory variables of the model, we get the population regression function (PRF) in matrix format as:

E(Y) = E(Xβ) + E(U)
E(Y) = Xβ,   because E(U) = 0

Therefore, PRF: E(Y) = Xβ … … … … … … … … … … … … (3.27)

In econometrics, equations like (3.26) are difficult to estimate, since estimating (3.26) would require observations on the entire population of all possible values of the variables of the model. As a result, in most econometric analyses the true population relationship in (3.26) is

estimated by a sample relationship. The sample relationship among variables with 'k' explanatory variables and 'n' observations can be set up in matrix format as follows:

Y = Xβ̂ + e … … … … … … … … … … … … … … … … … … … … … … … . . (3.28)

Where: β̂ is a ((k + 1) × 1) column vector of estimates of the true population parameters β,
Xβ̂ = Ŷ is an (n × 1) column vector of predicted values of the dependent variable Y, and
e is an (n × 1) column vector of the sample residuals (errors).

As in the two-explanatory-variables model, in the k-explanatory-variable case the OLS estimators are obtained by minimizing

∑ei² = ∑(Yi − β̂0 − β̂1X1i − β̂2X2i − ⋯ − β̂kXki)² … … … … … … … … (3.29)

Where, ∑ 𝒆𝟐𝒊 is the total squared prediction error (or RSS) of the model.
In matrix notation, this amounts to minimizing 𝒆′ 𝒆. That is:
e′e = [e1  e2  …  en] [e1  e2  …  en]′ = e1² + e2² + e3² + ⋯ + en² = ∑ei²

∴ ∑ei² = e′e … … … … … … … … … … … … … … … … . . (3.30)
From (3.28) we can derive that e = Y − Ŷ = Y − Xβ̂.

Therefore, substituting into equation (3.30), we get:

e′e = (Y − Xβ̂)′(Y − Xβ̂)
e′e = Y′Y − β̂′X′Y − Y′Xβ̂ + β̂′X′Xβ̂ … … … … … . . (3.31)

The matrix Y′Xβ̂ has order (1 × n) × (n × (k + 1)) × ((k + 1) × 1) = (1 × 1); it is a scalar. Since Y′Xβ̂ is a scalar (it has only a single entry), it is equal to its own transpose. That is:

Y′Xβ̂ = (Y′Xβ̂)′ = β̂′X′Y

e′e = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂ … … … … … . . (3.32)

Minimizing e′e in (3.32) with respect to β̂, we get the expressions for the OLS estimates in matrix format as follows:

∂Σei²/∂β̂ = ∂(e′e)/∂β̂ = −2X′Y + 2X′Xβ̂

To get the above expression, we used the following rules for differentiating matrix expressions:

∂(β̂′X′Y)/∂β̂ = X′Y,   and   ∂(β̂′X′Xβ̂)/∂β̂ = 2X′Xβ̂

Equating the above expression to the null vector 0, we obtain:

−2X′Y + 2X′Xβ̂ = 0   ⇒   X′Xβ̂ = X′Y   (the OLS Normal Equations)

β̂ = (X′X)⁻¹X′Y … … … … … … … … … … … … . … … … . (3.33)

Therefore, β̂ is the vector of the required least squares estimators β̂0, β̂1, β̂2, … , β̂k.
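
A compact numpy sketch of (3.33) (illustrative; a dedicated least-squares routine would normally be preferred numerically, but the normal-equation form mirrors the derivation):

import numpy as np

def ols_matrix(X_vars, y):
    """OLS via the normal equations: beta_hat = (X'X)^(-1) X'y, as in (3.33)."""
    X = np.column_stack([np.ones(len(y)), X_vars])   # first column of 1's for the intercept
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # solves (X'X) b = X'y
    residuals = y - X @ beta_hat
    return beta_hat, residuals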

3.3.5. The Coefficient of Determination (𝑹𝟐 ): The case of ‘k’ explanatory variables
The coefficient of multiple determination (𝑹𝟐 ) of MLRMs with ‘k’ number of
explanatory variables can be derived in matrix form as follows.

We know from (3.32) that:

Σei² = e′e = Y′Y − 2β̂′X′Y + β̂′X′Xβ̂

Since (X′X)β̂ = X′Y and ∑Yi² = Y′Y,

∴ e′e = Y′Y − 2β̂′X′Y + β̂′X′Y
e′e = Y′Y − β̂′X′Y … … … … … … … … … … … … … … … … … … … … (3.34)
β̂′X′Y = Y′Y − e′e … … … … … … … … … … … … … … … … … … … … (3.35)

We know that yi = Yi − Ȳ, and hence ∑yi² = ∑(Yi − Ȳ)².

∴ ∑yi² = ∑Yi² − nȲ²

In matrix notation:

∑yi² = Y′Y − nȲ² … … … … … … … … … … … … … … … … … … . (3.36)

Equation (3.36) is the total variation (TSS) of the model.

Explained sum of squares (ESS) = ∑yi² − ∑ei²
ESS = Y′Y − nȲ² − e′e   ⇒   ESS = Y′Y − nȲ² − (Y′Y − β̂′X′Y)

ESS = β̂′X′Y − nȲ² … … … … … … … … … … … . . (3.37)

Recall that R² = ESS/TSS = (β̂1∑x1iyi + β̂2∑x2iyi + … + β̂k∑xkiyi)/∑yi² = (β̂′X′Y − nȲ²)/(Y′Y − nȲ²)

∴ R² = (β̂′X′Y − nȲ²)/(Y′Y − nȲ²) … … … … … … … … … … … . . (3.38)

Numerical Example 2:

▪ As an illustration, let's rework the consumption–income example of Chapter 2.

Observations                     1    2    3    4    5    6
Consumption Expenditure (Y)      4    4    7    8    9   10
Monthly Income (X)               5    4    8   10   13   14

▪ Based on the above data,
a) Compute the OLS estimates using the matrix formulation.
b) Compute R² using the matrix formulation.

❑ Solution

a) For the one-explanatory-variable case, β̂ = [β̂0  β̂1]′ = (X′X)⁻¹X′Y, where

X′X = [ n      ∑Xi  ]          and          X′Y = [ ∑Yi   ]
      [ ∑Xi    ∑Xi² ]                             [ ∑XiYi ]

Using the above data, we obtain:

X′X = [  6     54 ]          and          X′Y = [  42 ]
      [ 54    570 ]                             [ 429 ]

➢ Now, find the inverse of the matrix X′X. Recall that the inverse of X′X can be found as:

(X′X)⁻¹ = (1/|X′X|)·Adj(X′X)

where Adj(X′X) is the transpose of the cofactor matrix of X′X, and cofactor aij = (−1)^(i+j)·|Minor ij|.

Therefore, the cofactor matrix of X′X is [570  −54; −54  6], and its transpose (the adjoint) is the same matrix [570  −54; −54  6]. The determinant of X′X is |X′X| = 6(570) − 54² = 504, so

(X′X)⁻¹ = (1/504) [ 570   −54 ]  =  [  1.131    −0.107 ]
                  [ −54     6 ]     [ −0.107    0.0119 ]

Therefore,

β̂ = [ β̂0 ] = (X′X)⁻¹X′Y = [  1.131   −0.107 ] [  42 ]  ≈  [ 1.54 ]
    [ β̂1 ]                [ −0.107   0.0119 ] [ 429 ]     [ 0.61 ]

b) Recall that R² = (β̂′X′Y − nȲ²)/(Y′Y − nȲ²)

β̂′X′Y = [1.536   0.607] [  42 ] ≈ 324.96
                        [ 429 ]

Y′Y = [Y1  Y2  …  Yn][Y1  Y2  …  Yn]′ = ∑Yi² = 326,   and   nȲ² = 6(7²) = 294

Therefore, R² = (324.96 − 294)/(326 − 294) = 30.96/32 ≈ 0.97
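
The worked example can be verified in a few lines of numpy (an illustrative check, not part of the original handout):

import numpy as np

Y = np.array([4, 4, 7, 8, 9, 10], dtype=float)
X = np.array([5, 4, 8, 10, 13, 14], dtype=float)
Xm = np.column_stack([np.ones_like(X), X])        # design matrix with a column of 1's

XtX, XtY = Xm.T @ Xm, Xm.T @ Y                    # [[6, 54], [54, 570]] and [42, 429]
beta_hat = np.linalg.solve(XtX, XtY)              # ≈ [1.536, 0.607]

n, Ybar = len(Y), Y.mean()
R2 = (beta_hat @ XtY - n * Ybar**2) / (Y @ Y - n * Ybar**2)   # equation (3.38), ≈ 0.97
print(beta_hat, R2)
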
3.4. Statistical Properties of OLS Estimators: Matrix Approach
As in the case of simple linear regression, the OLS estimators satisfy the Gauss–Markov theorem in multiple regression. That is, in the class of linear and unbiased estimators, the OLS estimators are best estimators. Now we are in a position to examine the desirable properties of the OLS estimators in matrix notation.

1. Linearity: Proposition: the β̂'s are linear estimators of β.

We know that β̂ = (X′X)⁻¹X′Y. To show the above proposition, let C = (X′X)⁻¹X′.

∴ β̂ = CY … … … … … … … … … … … … … … … … … … … . . (3.39)

Since C is a matrix of fixed values, equation (3.39) indicates that β̂ is linear in Y.
2. Unbiasedness: Proposition: the β̂'s are unbiased estimators of β.

β̂ = (X′X)⁻¹X′Y
β̂ = (X′X)⁻¹X′(Xβ + U)
β̂ = β + (X′X)⁻¹X′U … … … … … … … … … … … … … … … (3.40)

since (X′X)⁻¹X′X = I.

E(β̂) = E(β + (X′X)⁻¹X′U)
     = E(β) + E[(X′X)⁻¹X′U]

     = β + (X′X)⁻¹X′E(U)

E(β̂) = β … … … … … … … … . . … … … … … (3.41)    since E(U) = 0

Thus, the least squares estimators are unbiased estimators in MLRMs.
3. Minimum variance

Before showing that all the OLS estimators are best (i.e., possess the minimum variance property), it is important to derive their variances.

We know that:

Var(β̂) = E[(β̂ − E(β̂))(β̂ − E(β̂))′] = E[(β̂ − β)(β̂ − β)′]

                     [ E(β̂1 − β1)²              E[(β̂1 − β1)(β̂2 − β2)]   ……   E[(β̂1 − β1)(β̂k − βk)] ]
E[(β̂ − β)(β̂ − β)′] = [ E[(β̂2 − β2)(β̂1 − β1)]   E(β̂2 − β2)²              ……   E[(β̂2 − β2)(β̂k − βk)] ]
                     [          ⋮                          ⋮                             ⋮            ]
                     [ E[(β̂k − βk)(β̂1 − β1)]   E[(β̂k − βk)(β̂2 − β2)]   ……   E(β̂k − βk)²            ]

                     [ Var(β̂1)        Cov(β̂1, β̂2)   ……   Cov(β̂1, β̂k) ]
                   = [ Cov(β̂2, β̂1)   Var(β̂2)        ……   Cov(β̂2, β̂k) ]
                     [      ⋮               ⋮                    ⋮      ]
                     [ Cov(β̂k, β̂1)   Cov(β̂k, β̂2)   ……   Var(β̂k)      ]

The above matrix is symmetric, with the variances along its main diagonal and the covariances of the estimators everywhere else. It is therefore called the variance–covariance matrix of the least squares estimators. Thus,

Var(β̂) = E[(β̂ − β)(β̂ − β)′] … … … … … … … … … … … . (3.42)

From (3.40), we know that β̂ = β + (X′X)⁻¹X′U

⇒ β̂ − β = (X′X)⁻¹X′U … … … … … … … … … … … . (3.43)

Substituting (3.43) in (3.42):

Var(β̂) = E[{(X′X)⁻¹X′U}{(X′X)⁻¹X′U}′]
       = E[(X′X)⁻¹X′UU′X(X′X)⁻¹]
       = (X′X)⁻¹X′E(UU′)X(X′X)⁻¹
       = (X′X)⁻¹X′σu²In X(X′X)⁻¹

= σu²(X′X)⁻¹X′X(X′X)⁻¹

Var(β̂) = σu²(X′X)⁻¹ … … … … … … … … … … … (3.44)

Note: σu², being a scalar, can be moved in front of or behind a matrix, while the identity matrix In can be suppressed.

Thus, we obtain Var(β̂) = σu²(X′X)⁻¹, where

      [ n        ∑X1i       ……   ∑Xki    ]
X′X = [ ∑X1i     ∑X1i²      ……   ∑X1iXki ]
      [   ⋮         ⋮                 ⋮   ]
      [ ∑Xki     ∑X1iXki    ……   ∑Xki²   ]

We can, therefore, obtain the variance of any estimator, say β̂i, by taking the ith term on the principal diagonal of (X′X)⁻¹ and multiplying it by σu².

NB: here the X's are in their original (absolute) form, not in deviation form.

When the x's are in deviation form, we can write the multiple regression in matrix form as β̂ = (x′x)⁻¹x′y, where β̂ = [β̂1, β̂2, … , β̂k]′ and

      [ ∑x1²      ∑x1x2    ……   ∑x1xk ]
x′x = [ ∑x2x1     ∑x2²     ……   ∑x2xk ]
      [    ⋮         ⋮               ⋮ ]
      [ ∑xkx1     ∑xkx2    ……   ∑xk²  ]

The column vector β̂ above does not include the constant term β̂0. Under this formulation, the variances of the slope parameters in deviation form can be written as:

Var(β̂) = σu²(x′x)⁻¹ … … … … … … … … … … . . (3.45)

In particular, for MLRMs with two explanatory variables, the variances of the OLS estimates can be derived as follows. Such a model can be written in deviation form as:

yi = β1x1i + β2x2i + ui

Var(β̂) = E[(β̂ − β)(β̂ − β)′]

In this model,

(β̂ − β) = [ (β̂1 − β1) ]          and          (β̂ − β)′ = [(β̂1 − β1)   (β̂2 − β2)]
           [ (β̂2 − β2) ]

∴ (β̂ − β)(β̂ − β)′ = [ (β̂1 − β1) ] [(β̂1 − β1)   (β̂2 − β2)]
                    [ (β̂2 − β2) ]

And E[(β̂ − β)(β̂ − β)′] = [ E(β̂1 − β1)²              E[(β̂1 − β1)(β̂2 − β2)] ]
                         [ E[(β̂1 − β1)(β̂2 − β2)]    E(β̂2 − β2)²            ]

                       = [ Var(β̂1)        Cov(β̂1, β̂2) ]
                         [ Cov(β̂1, β̂2)   Var(β̂2)      ]

Therefore, in the case of two explanatory variables, x and x′ in deviation form are:

      [ x11   x21 ]
      [ x12   x22 ]                           [ x11   x12   ……   x1n ]
x =   [  ⋮     ⋮  ]          and          x′ = [ x21   x22   ……   x2n ]
      [ x1n   x2n ]

x′x = [ ∑x1²     ∑x1x2 ]
      [ ∑x1x2    ∑x2²  ]

(x′x)⁻¹ = Adj(x′x) / |x′x| = (1/D) [ ∑x2²       −∑x1x2 ] ,   where D = |x′x| = ∑x1²∑x2² − (∑x1x2)²
                                   [ −∑x1x2     ∑x1²   ]

Thus, Var(β̂) = σu²(x′x)⁻¹ = (σu²/D) [ ∑x2²       −∑x1x2 ]
                                    [ −∑x1x2     ∑x1²   ]

∴ Var(β̂1) = σu²∑x2² / [∑x1²∑x2² − (∑x1x2)²] … … … … … … … … … … (3.46)

Var(β̂2) = σu²∑x1² / [∑x1²∑x2² − (∑x1x2)²] … … … … … … … … … … … . . … (3.47)

Cov(β̂1, β̂2) = −σu²∑x1x2 / [∑x1²∑x2² − (∑x1x2)²] … … … … … … … … … . … (3.48)

The only unknown part of the variances and covariance of the estimators is σu². Thus, we need an unbiased estimate of the variance of the population error, σu². As we established in the simple regression model, σ̂u² = ∑ei²/(n − 2) is an unbiased estimator of σu² when there is one explanatory variable.

For k parameters (including the constant parameter),

σ̂u² = ∑ei²/(n − k) … … … … … … … … … … (3.49)

where ∑ei² = ∑yi² − β̂1∑x1y − β̂2∑x2y − ⋯ − β̂k∑xky … … … … … . … … (3.50)

For MLRMs with two explanatory variables, we have three parameters including the constant term, and therefore

σ̂u² = ∑ei²/(n − 3) … … … … … … … … … … … … … . (3.51)
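
A short sketch (illustrative; function and variable names are not from the handout) that estimates σ̂u² by (3.51) and then the standard errors implied by (3.46) and (3.47) for the two-regressor model:

import numpy as np

def standard_errors_two_regressors(y, x1, x2, b0, b1, b2):
    """sigma_hat^2 = RSS/(n - 3) as in (3.51); SE(b1), SE(b2) from (3.46) and (3.47)."""
    n = len(y)
    resid = y - (b0 + b1 * x1 + b2 * x2)
    sigma2_hat = (resid ** 2).sum() / (n - 3)              # unbiased estimate of sigma_u^2
    x1_d, x2_d = x1 - x1.mean(), x2 - x2.mean()
    det = (x1_d**2).sum() * (x2_d**2).sum() - (x1_d * x2_d).sum() ** 2
    se_b1 = np.sqrt(sigma2_hat * (x2_d**2).sum() / det)    # from (3.46)
    se_b2 = np.sqrt(sigma2_hat * (x1_d**2).sum() / det)    # from (3.47)
    return sigma2_hat, se_b1, se_b2
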
This completes the variances and covariances of the parameters. Now it is time to examine the minimum variance property.

Minimum variance of β̂

To show that all the β̂i's in the β̂ vector are best estimators, we have to prove that the variances obtained in (3.44) are the smallest among those of all other possible linear unbiased estimators. We follow the same procedure as in the single-explanatory-variable model: we first assume an alternative linear unbiased estimator and then establish that its variance is greater than that of the OLS estimator.

Assume that β̂* is an alternative unbiased and linear estimator of β, given by

β̂* = [(X′X)⁻¹X′ + A]Y

where A is a (k × n) arbitrary matrix that is a function of X and/or other non-stochastic variables, but not a function of Y.

∴ β̂* = [(X′X)⁻¹X′ + A][Xβ + U],   since Y = Xβ + U

⇒ β̂* = β + AXβ + [(X′X)⁻¹X′ + A]U … … … … … … … … … (3.53)

Taking expectations on both sides of the above expression, we have

E(β̂*) = β + AXβ + [(X′X)⁻¹X′ + A]E(U)
E(β̂*) = β + AXβ,   since E(U) = 0 … … … … … … … … … … … … … . . (3.54)

Since our requirement on the alternative β̂* is that it be an unbiased estimator of β, that is, E(β̂*) = β, the term AXβ should be a null vector. Thus, AX must be zero if β̂* = [(X′X)⁻¹X′ + A]Y is to be an unbiased estimator.

Let us now find the variance of this alternative estimator.

Given that AX = 0, equation (3.53) can be written as β̂* − β = [(X′X)⁻¹X′ + A]U.

Therefore, Var(β̂*) = E[(β̂* − β)(β̂* − β)′]

Var(β̂*) = E[[(X′X)⁻¹X′ + A]UU′[X(X′X)⁻¹ + A′]]
        = [(X′X)⁻¹X′ + A]E(UU′)[X(X′X)⁻¹ + A′]
        = σu²[(X′X)⁻¹X′X(X′X)⁻¹ + AX(X′X)⁻¹ + (X′X)⁻¹X′A′ + AA′]
        = σu²[(X′X)⁻¹ + AA′]

(since (X′X)⁻¹X′X = I, which can be suppressed, and AX = 0 so that X′A′ = (AX)′ = 0)

∴ Var(β̂*) = σu²(X′X)⁻¹ + σu²AA′ … … … … … … … … … … … . (3.55)

Therefore, Var(β̂*) exceeds Var(β̂) by the term σu²AA′, which proves that β̂ has the smallest variance and is thus the best estimator.

In conclusion, β̂ is the Best Linear Unbiased Estimator; that is to say, it is a BLUE estimator.

3.5. Evaluation of an Estimated MLRM Using Statistical Criteria


After estimation of an MLRM, the next task is evaluation of the statistical relevance of the variables included in the model. This can be done using the usual standard test techniques. However, to use the standard significance tests, we need normality of the probability distributions of the OLS parameter estimates (the β̂'s) in MLRMs.

3.5.1 The Probability Distributions of OLS’s Parameter estimates in MLRMs


In equation (3.39), we established that the OLS estimates in MLRMs are linear in Y. From the basic properties of the normal probability distribution, we know that any linear function of a normal random variable is itself normal. Thus, since β̂ is a linear combination of Y, its probability distribution will be normal if Y is normal.

From equation (3.26), we know:


𝒀 = 𝑿𝜷 + 𝑼
In the above equation, Xβ is a vector of fixed values, because the values of the X's are fixed in repeated samples and the β's are the true values of the population parameters. As a result, Y is linear in U; that is, the dependent variable Y is a linear combination of the values of the population random disturbance term. Consequently, since the population random disturbance term U is normal by assumption, it follows that the dependent variable Y is also normal. Since the β̂'s are linear combinations of another normal random variable (Y), it follows that the OLS estimates β̂ are themselves normal random variables. Therefore, the sampling distributions of the OLS estimates in MLRMs are normal.

That is, the sampling distributions of the OLS estimates in MLRMs are normal, with means equal to the true values of their respective population parameters β and variances given by σu²(X′X)⁻¹.

Symbolically:

β̂ ~ N[β, σu²(X′X)⁻¹] … … … … … … … … … . (3.56)

or, equivalently, written in terms of standard errors,

β̂ ~ N[β, √(σu²(X′X)⁻¹)] … … … … … … … … … . (3.57)

The normality of the sampling distributions of the OLS estimates around the true values of the population parameters implies that, under the assumptions of multiple linear regression analysis, any OLS estimate is equally likely to overestimate or underestimate the true value of the population parameter in a particular sample. But the most probable value for an estimate in a particular sample is the true value of the population parameter.
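
The following Monte Carlo sketch (illustrative only) makes this concrete: with the X's held fixed, repeatedly drawing new normal error vectors and re-estimating shows the β̂'s centred on the true β with covariance close to σu²(X′X)⁻¹, consistent with (3.56).

import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 2000
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])  # fixed across samples
beta_true, sigma_u = np.array([2.0, 1.5, -0.8]), 1.0

draws = np.empty((reps, 3))
for r in range(reps):
    U = rng.normal(0.0, sigma_u, n)                  # U ~ N(0, sigma_u^2 I)
    Y = X @ beta_true + U
    draws[r] = np.linalg.solve(X.T @ X, X.T @ Y)     # OLS estimate for this sample

print(draws.mean(axis=0))                            # ≈ beta_true (unbiasedness)
print(np.cov(draws, rowvar=False))                   # ≈ sigma_u^2 (X'X)^(-1)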

3.5.2 Statistical Significance Tests of Estimates in MLRMs (Hypothesis Testing)

In multiple regression models, we will undertake two types of significance tests: tests of individual significance and a test of overall significance. Let's examine them one by one.

1. Tests of Individual Significance

This is the process of verifying the individual statistical significance of each parameter estimate of a model, that is, checking whether the impact of a single explanatory variable on the dependent variable is significant after taking the impact of all other explanatory variables into account. To elaborate the test of individual significance, consider the following model of the determinants of Teff farm productivity:

Y = β̂0 + β̂1X1 + β̂2X2 + ei … … … … … … … … … (3.58)
Where: 𝒀 is total output of Teff per hectare of land, 𝑿𝟏 and 𝑿𝟐 are the amount of
fertilizer used and rainfall, respectively.
Given the above model, suppose we need to check whether the application of fertilizer (X1) has a significant effect on agricultural productivity holding the effect of rainfall (X2) on Teff farm productivity constant, i.e., whether fertilizer (X1) is a significant factor affecting Teff farm productivity after taking the impact of rainfall on Teff farm productivity into account. In this case, we test the significance of β̂1 holding the influence of X2 on Y constant. Mathematically, the test of individual significance involves testing the following two pairs of null and alternative hypotheses:

A.  H0: β1 = 0                    B.  H0: β2 = 0
    HA: β1 ≠ 0                        HA: β2 ≠ 0

The null hypothesis in 𝐴 states that holding 𝑿𝟐 constant, 𝑿𝟏 has no significant (linear)
influence on 𝒀. Similarly, the null hypothesis in ‘𝑩’ states that holding 𝑿𝟏 constant,
𝑿𝟐 has no influence on the dependent variable 𝒀. To test the individual significance of
parameter estimates in MLRMs, we can use the usual statistical test techniques. These include:

A. The Standard Error Test


The standard error test can be applied if the population variances of the parameter
estimates are known or if the sample size is sufficiently large (𝒏 > 𝟑𝟎). It can be used

to test the individual significance of parameter estimates at the 5% level of significance. To elaborate the procedure of the standard error test as a test of individual significance, consider a model of two explanatory variables given by Ŷ = β̂0 + β̂1X1 + β̂2X2. We will present the test procedure only for β̂1; the test procedure for β̂2 is done in the same way.
Step 1: State the null and the alternative hypotheses:
H0: β1 = 0 and HA: β1 ≠ 0

Step 2: Compute the standard error of the estimate:

SE(β̂1) = √Var(β̂1) = √[σ̂u²∑x2² / (∑x1²∑x2² − (∑x1x2)²)],   where σ̂u² = ∑ei²/(n − 3)
Step 3: Make a decision; that is, accept or reject the null hypothesis. In this case:

If 2·SE(β̂1) > |β̂1|, accept the null hypothesis. That is, the estimate β̂1 is not statistically significant at the 5% level of significance. This would imply that, holding X2 constant, X1 has no significant linear impact on Y.

If 2·SE(β̂1) < |β̂1|, reject the null hypothesis. That is, the estimate β̂1 is statistically significant at the 5% level of significance. This would imply that, holding X2 constant, X1 has a significant linear impact on Y.

B. The Student’s T-Test


If the sample size is small (n ≤ 30) and the population variances of the estimates are unknown, then we use the Student's t-test to perform the test of individual significance of parameter estimates at any chosen level of significance. The procedure is as follows.
Step 1: State the null and alternative hypotheses:
H0: β1 = 0 and HA: β1 ≠ 0

Step 2: Choose the level of significance (α).

Step 3: Determine the critical values and identify the acceptance and rejection regions of the null hypothesis at the chosen level of significance (α) and the degrees of freedom (n − k). To identify the critical value, divide the level of significance (α) by two and read the table value, tt, from the t-probability table at α/2 with (n − k) degrees of freedom, where n is the number of observations and k is the number of parameters in the model including the intercept term. In the case of the two-explanatory-variables model, the number of parameters is 3, so the degrees of freedom are n − 3.
Step 4: Compute the t-statistic (tC) of the estimate under the null hypothesis. That is,

tC = (β̂1 − β1) / SE(β̂1)

Since β1 = 0 under the null hypothesis, the computed t-statistic of the estimate β̂1 is

tC = (β̂1 − 0) / SE(β̂1)   ⇒   tC = β̂1 / SE(β̂1)
Step 5: Compare tC and tt and make a decision:

If |tC| < tt, accept the null hypothesis. That is, β̂1 is not significant at the chosen level of significance. This would imply that, holding X2 constant, X1 has no significant linear impact on Y.

If |tC| > tt, reject the null hypothesis and hence accept the alternative. That is, β̂1 is significant at the chosen level of significance. This would imply that, holding X2 constant, X1 has a significant linear impact on Y.
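
A compact sketch of Steps 4–5 (illustrative; it assumes scipy.stats is available for the critical value):

from scipy import stats

def t_test_slope(b_hat, se_b, n, k, alpha=0.05):
    """Two-tailed test of H0: beta = 0 against HA: beta != 0."""
    t_c = b_hat / se_b                               # Step 4: computed t-statistic
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)    # critical value at alpha/2 with n - k df
    return t_c, t_crit, abs(t_c) > t_crit            # Step 5: True means reject H0
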
2. Test of the Overall Significance of MLRMs
This is the process of testing the joint significance of the parameter estimates of the model. It involves checking whether the variation in the dependent variable is significantly explained by the variation in all the explanatory variables included in the model. To elaborate the test of overall significance, consider the model:

Yi = β̂0 + β̂1X1 + β̂2X2 + β̂3X3 + ⋯ + β̂kXk + ei
Given this model, we may be interested to know whether the variation in the dependent variable can be attributed to the variation in all the explanatory variables of the model or not. If no amount of the variation in the dependent variable can be attributed to the variation of the explanatory variables included in the model, then none of the explanatory variables included in the model is relevant; that is, all estimates of the slope coefficients will be statistically indistinguishable from zero. On the other hand, if a significant proportion of the variation in the dependent variable can be attributed to the variation in the explanatory variables, then at least one of the explanatory variables included in the model

is relevant; that is, at least one of the estimated slope coefficients will be statistically different from zero (significant).

Thus, this test has the following null and alternative hypotheses to test:
𝑯𝟎 : 𝜷 𝟏 = 𝜷 𝟐 = 𝜷 𝟑 … … … … … … . . = 𝜷 𝒌 = 𝟎
𝑯𝑨 : 𝐴t least one of the 𝜷 is different from zero
The null hypothesis of this joint test states that none of the explanatory variables included in the model is relevant, in the sense that no amount of the variation in Y can be attributed to the variation in all the explanatory variables simultaneously. That means that even if all the explanatory variables of the model change simultaneously, the value of Y is left unchanged.
How to approach test of the overall significance of MLRM?
If the null hypothesis is true, that is, if all the explanatory variables included in the model are irrelevant, then there would not be a significant difference in explanatory power between the model with and the model without all the explanatory variables. Thus, the test of the overall significance of MLRMs can be approached by testing whether the difference in explanatory power of the model with and without all the explanatory variables is significant. In this case, if the difference is insignificant we accept the null hypothesis, and we reject it if the difference is significant.

Equivalently, this test can be done by comparing the sum of squared errors (RSS) of the model with and without all the explanatory variables. In this case, we accept the null hypothesis if the difference between the sums of squared errors of the model with and without all the explanatory variables is insignificant. The intuition is straightforward: if all explanatory variables are irrelevant, then including them in the model contributes an insignificant amount to the model's explanatory power, and as a result the sample prediction error of the model would not fall significantly.

Let the Restricted Residual Sum of Square (RRSS) be the sum of squared errors of the
model without the inclusion of all the explanatory variables of the model, i.e., the residual
sum of square of the model obtained assuming that all the explanatory variables are
irrelevant (under the null hypothesis) and Unrestricted Residual Sum of Squares
(URSS) be the sum of squared errors of the model with the inclusion of all explanatory

variables in the model. It is always true that RRSS ≥ URSS (why?). To elaborate these concepts, consider the following model:

Yi = β̂0 + β̂1X1 + β̂2X2 + β̂3X3 + ⋯ + β̂kXk + ei

This model is called the unrestricted model. The test of the joint hypothesis is given by:

H0: β1 = β2 = β3 = ⋯ = βk = 0
HA: at least one of the β's is different from zero

We know that:

Yi = Ŷi + ei   ⇒   ei = Yi − Ŷi

∑ei² = ∑(Yi − Ŷi)²
This sum of squared errors is called the unrestricted residual sum of squares (URSS).

However, if the null hypothesis is assumed to be true, i.e., when all the slope coefficients are zero, the model shrinks to:

Yi = β̂0 + ei

This model is called the restricted model. Applying OLS, we obtain:

β̂0 = ∑Yi / n = Ȳ … … … … … … … … … … … … … … … . . (3.59)

Therefore, ei = Yi − β̂0, but β̂0 = Ȳ, so ei = Yi − Ȳ

∴ ∑ei² = ∑(Yi − Ȳ)² = ∑yi² = TSS

The sum of squared errors when the null hypothesis is assumed to be true is called the Restricted Residual Sum of Squares (RRSS), and it is equal to the total sum of squares (TSS).

The ratio:

F = [(RRSS − URSS)/(K − 1)] / [URSS/(n − K)]   ~   F(K−1, n−K) … … … … … … … . . (3.60)

has an F-distribution with K − 1 and n − K degrees of freedom for the numerator and denominator, respectively.

RRSS = TSS
URSS = ∑yi² − β̂1∑x1y − β̂2∑x2y − ⋯ − β̂k∑xky = RSS,   i.e., URSS = RSS

F = [(TSS − RSS)/(K − 1)] / [RSS/(n − K)]   ~   F(K−1, n−K)

Fc(K−1, n−K) = [ESS/(K − 1)] / [RSS/(n − K)] … … … … … … … … … … … … … (3.61)

If we divide the numerator and denominator of the above equation by TSS, then:

Fc = [(ESS/TSS)/(K − 1)] / [(RSS/TSS)/(n − K)]

∴ Fc = [R²/(K − 1)] / [(1 − R²)/(n − K)] … … … … … … … … … … … … … … … (3.62)

This implies that the computed value of F can be calculated either from ESS and RSS or from R² and 1 − R². This value is compared with the table value of F that leaves probability α in the upper tail of the F-distribution with K − 1 and n − K degrees of freedom.

➢ If the null hypothesis is not true, then the difference between RRSS and URSS (i.e., between TSS and RSS) becomes large, implying that the constraints placed on the model by the null hypothesis have a large effect on the ability of the model to fit the data, and the value of F tends to be large. Thus, reject the null hypothesis if the computed value of F (the F test statistic) is too large, or if the P-value for the F-statistic is lower than the chosen level of significance (α), and vice versa.

In short, the Decision Rule is to
➢ Reject H0 if Fc > F1−α(K − 1, n − K), or P-value < α, and vice versa.
➢ Implication: rejection of H0 implies that the parameters of the model are jointly significant, or that the dependent variable Y is linearly related to at least one of the independent variables included in the model.
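
A small sketch of the overall F-test based on (3.62) (illustrative; it again assumes scipy.stats for the critical value and P-value):

from scipy import stats

def overall_f_test(r2, n, k, alpha=0.05):
    """Overall significance test: Fc = [R^2/(k-1)] / [(1-R^2)/(n-k)], as in (3.62)."""
    f_c = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)   # upper-tail critical value
    p_value = stats.f.sf(f_c, dfn=k - 1, dfd=n - k)
    return f_c, f_crit, p_value, f_c > f_crit               # last entry: reject H0?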

The Analysis of Variance (ANOVA)

When we use Stata to estimate a given model, Stata produces a table at the top of the regression output. This table lists the results of what is called the Analysis of Variance (ANOVA). The purpose of the ANOVA table is to partition the total variation of the dependent variable into component parts: one due to its systematic association with the explanatory variables and the other due to random error, i.e., the residuals. It is a summary of the explanation of the variation in the dependent variable. The ANOVA table for the two-explanatory-variables model is given as follows:

Source of variation           Sum of Squares (SS)                        df        Mean Square (MS)
Explained variation (ESS)     β̂1∑x1iyi + β̂2∑x2iyi                        2         (β̂1∑x1iyi + β̂2∑x2iyi)/2
Unexplained variation (RSS)   ∑ei²                                       n − 3     σ̂² = ∑ei²/(n − 3)
Total variation (TSS)         ∑yi² = β̂1∑x1iyi + β̂2∑x2iyi + ∑ei²          n − 1

NB: From the ANOVA table above, we can obtain the F-statistic of the model.