

CHAPTER TWO
REGRESSION ANALYSIS

Part-I- Simple Linear Regression Analysis


2.1. Introduction
As you know, economic theories are mainly concerned with the relationships between economic variables. These relationships can be stated in mathematical terms as functional relationships, which define the dependence of one variable upon the other variable(s) in a specific form. The specific functional form may be linear, quadratic, logarithmic, exponential, or any other form.

This part introduces students to the theory underlying the simplest possible regression analysis, namely, two-variable regression, in which the dependent variable is linearly related to a single explanatory variable. This case is considered here not because of its practical adequacy, but because it presents the fundamental ideas of regression analysis as simply as possible. Moreover, as we shall see in Chapter Three, the more general multiple regression analysis, in which the regressand is related to two or more regressors, is a logical extension of the two-variable case. We shall first discuss the basic concept of regression, and then proceed to the core of simple linear regression analysis.

2.2. The Concept of Regression Analysis


The main goal of any econometric analysis is to establish an acceptable empirical causal relationship between economic variables. The single most important tool for establishing such a relationship in econometrics is regression analysis.

The term regression was introduced by Francis Galton. In his famous paper “Family Likeness in
Stature”, Galton found that, although there was a tendency for tall parents to have tall children and
for short parents to have short children, the average height of children born of parents of a given
height tended to move or “regress” toward the average height in the population as a whole.

Regression analysis is concerned with the study of the dependence of one variable (the
dependent variable) on one or more other variables (the explanatory variable(s)). In other words,
regression analysis is concerned with describing and evaluating the relationship between a given
variable, dependent variable, and one or more other variables, independent variable(s). The
objective of regression analysis is to estimate and/or predict the unknown (population) mean
value of the dependent variable in terms of the known values of the explanatory variables.


Regression analysis deals with statistical dependence among variables; but not with deterministic
dependence among variables. In statistical relationships, we essentially deal with random
(stochastic) variables; i.e., variables that have probability distributions.

For instance, an economist may be interested in studying the dependence of household monthly
consumption expenditure on household monthly disposable income. That is, our concern might
be with predicting the average consumption expenditure knowing household monthly
disposable income. Such an analysis is helpful in estimating the marginal propensity to consume
(MPC), that is, average change in consumption expenditure for, say, a unit change in disposable
income. To see how this can be done, consider Figure-2.1 below.

The figure shows the distribution of household monthly consumption expenditure in a hypothetical population corresponding to given or fixed values of household monthly disposable income. Notice that corresponding to any given household income there is a range, or distribution, of consumption expenditure. However, notice that despite the variability of consumption expenditure for a given value of household income, the average consumption expenditure generally increases as household income increases.

The line that passes through the average level of consumption expenditure for each level of
household income is known as the regression line. It shows how the average consumption
expenditure increases with the household’s income.

Fig. 2.1: Scatter plot of household monthly consumption expenditure (in Birr) against household monthly income (in Birr), with the fitted regression line.

In econometrics we exclusively deal with stochastic relationships. Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. The determination of the direction of causation should come from outside of statistics, i.e., economic theory. In other words, statistical relationships by themselves cannot logically imply
causation. To ascribe causality, one must appeal to ‘a priori’ or theoretical considerations.

In addition, regression analysis is closely related to correlation analysis, but conceptually there is a huge difference between the two. The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables. In regression analysis, however (as already noted), we try to predict the average value of the dependent variable on the basis of fixed values of the explanatory variables. (Check section 2.3 in the lecture ppt.)
Terminologies in Regression Theory
The variables in a regression relation consist of dependent and explanatory variables. The dependent
variable is the variable whose variation is being explained by the other variable(s). The explanatory
variable is the variable whose variation is used to explain the variation in the dependent variable.

In the literature on regression, the terms dependent variable and explanatory variable are described
variously. The following is a representative list of the various terminologies used in regression
analysis:
Dependent Variable Explanatory Variable
Explained variable Independent variable
Predictand Predictor
Regressand Regressor
Response Stimulus
Endogenous Exogenous
Outcome Covariate
Controlled variable Control variable
Note that regression analysis can be simple or multiple depending on the number of explanatory variables included in the analysis. That is, if we are studying the dependence of a variable on only a single explanatory variable, such as the dependence of consumption expenditure on the level of real income, the study is known as simple, or two-variable, regression analysis. However, if we are studying the dependence of one variable on more than one explanatory variable, such as the dependence of crop yield on rainfall, labor, farm size, fertilizer, etc., it is known as multiple regression analysis.

Types of Regression Models

On the basis of various criteria, such as the type of data and the nature of the dependent variable, regression models may be classified into various categories. What follows is a simple description of these categories.


Types of Regression Models

▪ By the nature of the dependent variable:
  - Continuous → Linear regression models
  - Discrete → Discrete choice models (Logit and/or Probit models)
▪ By the type of data:
  - Cross-sectional data
  - Panel data → Panel data models (Fixed effects, Random effects)
  - Time series data → Time series models (VAR, VECM, ARCH, GARCH, ARDL)
▪ By the type of independent variables:
  - Exogenous
  - Endogenous → Simultaneous equation models (ILS, 2SLS, IV)

2.3. Population Regression Function Versus Sample Regression Function


Population Regression Function (PRF)
The economic theory of consumption (in its simplest form) can be modeled as a stochastic relationship of the following form:

Yᵢ = α + βXᵢ + Uᵢ … … … (2.1)

[Dependent variable] [The regression line / explained part] [Random variable]

 Where, 𝒀𝒊 stands for the ith household consumption expenditure, 𝑋𝑖 represents the observed
ith household monthly income, 𝑈𝑖 represents all other variables that can affect households’
consumption spending.

The econometric model given in (2.1) above is called the population regression model or, simply, the population model.
 This population regression model is called the true relationship because Y, X and U represent their respective population values, and α and β are called the true parameters.

In the model there is only one factor X (income) to explain Y (Consumption expenditure). All the
other factors that affect Y are jointly captured by 𝑼. That means, a stochastic model is a model in
which the dependent variable is not only determined by the explanatory variable(s) included in the model but also by others which are not included in the model. That is, the first component is the part
of Y explained by changes in X, and the second is the part of Y not explained by X; that is to say, the change in Y due to the random influence of U.

We can distinguish two parts in the above population model: the systematic component, α + βXᵢ, and the random disturbance, Uᵢ. Calling the systematic component μᵢ, we can write:

μᵢ = α + βXᵢ

This equation is known as the population regression function (PRF) or population line. Therefore, as can be seen in Figure 2.3, μᵢ is a linear function of X with intercept α and slope β. However, we cannot estimate the population parameters α and β directly, since for economic reasons we cannot obtain population (census) data. Therefore, we always opt to draw samples and estimate the population parameters based on sample information.

Sample Regression Function (SRF)


The basic idea of a regression model is to estimate the population parameters, 𝜶 and 𝜷, from a given
sample.

The sample regression function (SRF) is the sample counterpart of the population regression
function (PRF). Since the SRF is obtained for a given sample, a new sample will generate different
estimates. The SRF, which is an estimate of the PRF, is given by:

Ŷᵢ = α̂ + β̂Xᵢ

It allows us to calculate the fitted value (Ŷᵢ) of Y when X = Xᵢ. In the SRF, α̂ and β̂ are estimators of the population parameters α and β. For each Xᵢ, we have an observed value (Yᵢ) and a fitted value (Ŷᵢ). The difference between Yᵢ and Ŷᵢ is called the residual, eᵢ:

eᵢ = Yᵢ − Ŷᵢ = Yᵢ − (α̂ + β̂Xᵢ)

▪ That is, the residual eᵢ is the difference between the sample value, Yᵢ, and the fitted value, Ŷᵢ.
➢ Rearranging the above equation, we obtain the sample regression function in its stochastic form:

Yᵢ = α̂ + β̂Xᵢ + eᵢ … … … … … … (2.2)

To sum up, α̂, β̂, Ŷᵢ and eᵢ are the sample counterparts of α, β, Yᵢ and Uᵢ, respectively. It is possible to calculate α̂ and β̂ for a given sample, but the estimates will change from sample to sample. On the contrary, α and β are fixed, but unknown. Therefore, right now, our major task is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF).


Fig. 2.3: Scatter plot of household monthly consumption expenditure against household monthly income, showing the population regression function (PRF), the sample regression function (SRF), and the corresponding deviations Uᵢ and eᵢ.

2.4. Methods of Estimation: The Classical Simple Linear Regression Analysis


Recall our Simple Linear Regression Model: 𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊

This econometric model of consumption hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), with slope coefficient β.

Such a model, describing the relationship between only two variables, is called a Simple Linear Regression Model. The term linear regression implies that the model is linear in the parameters; it may or may not be linear in the explanatory variables.

Specifying the model is the first stage of any econometric application. The next step is the
estimation of the numerical values of the parameters of economic relationships. As far as methods
of estimation are concerned, the parameters of the simple linear regression model can be estimated
by various methods. Three of the most commonly used methods are:
1. Ordinary Least Square Method (OLS)
2. Method of Moments (MM)
3. Maximum Likelihood Method (MLM)
But, having some desirable properties (linearity, unbiasedness, and minimum variance), the OLS method is the most popular method for estimating the regression parameters.

Instructor: Teklebirhan A. (Asst.Prof.) Page 6


CHAPTER TWO: REGRESSION ANALYSIS 2024

In this part, therefore, we shall estimate the two-variable PRF, Yᵢ = α + βXᵢ + Uᵢ, by the method of Ordinary Least Squares (OLS). However, the PRF is not directly observable. Thus, we shall estimate the PRF from the SRF: Yᵢ = α̂ + β̂Xᵢ + eᵢ.

To estimate 𝜶 and 𝜷 we have to collect data on 𝒀, 𝑿 and 𝑼. Nonetheless, we cannot get data on U
as it is stochastic and can never be observed. Therefore, in order to estimate the parameters, we
should guess the values of 𝑼𝒊 , i.e., make some plausible assumptions about the shape and
distribution of U.

2.4.1. The Basic Assumptions of the Classical Linear Regression Analysis (OLS)
The method of OLS is attributed to Carl Friedrich Gauss, a German Mathematician. OLS is an
econometric method used to derive estimates of the parameters of economic relationships from
statistical observations. However, it works under some restrictive assumptions. The most important
of these assumptions are discussed below.

1. The Model is Linear in Parameters.


The term “model” is broadly used to represent any phenomenon in a mathematical framework. A model is termed linear if it is linear in parameters and nonlinear if it is not. In other words, if all the partial derivatives of the dependent variable Y with respect to each of the parameters βᵢ are independent of the parameters, then the model is called a linear model. But if any of the partial derivatives of Y with respect to any of the βᵢ is not independent of the parameters, the model is called non-linear.

For example:
a. Y = β₀ + β₁X₁² + β₂√X₂ + Uᵢ is a linear model because ∂Y/∂βᵢ (i = 0, 1, 2) are independent of the parameters βᵢ (i = 0, 1, 2).
b. Y = β₀ + β₁²X₁ + β₂logX₂ + Uᵢ is a non-linear model because ∂Y/∂β₁ = 2β₁X₁ depends on β₁, although ∂Y/∂β₀ and ∂Y/∂β₂ are independent of any of the parameters β₀ or β₂.
c. Y = α + βX + Uᵢ is linear in both the parameters and the variables, so it satisfies the assumption.

 Note that the classical assumptions require the model to be linear in the parameters, regardless of whether the explanatory and dependent variables enter linearly or not.

2. 𝑼𝒊 is a Random Real Variable with mean value of zero


This means that the value which U may assume in any one period depends on chance. For each value of X, the random variable U may assume various values, some greater than zero and some smaller than zero, but if we considered all the positive and negative values of U, for any given value of X, they would have an average value equal to zero. In other words, the positive and
negative values of 𝑼 cancel each other.
Mathematically,
𝑬(𝑼𝒊 ) = 𝟎………………………………..…. (2.3)
The implication of this assumption is that the factors not explicitly included in the model and
therefore subsumed in 𝑼𝒊 , do not systematically affect the mean value of Y. i.e., the positive 𝑼𝒊
values cancel out the negative 𝑈𝑖 values so that their average effect on Y is zero.
Given 𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 , this assumption leads to the fact that: 𝑬(𝒀𝒊 ) = 𝜶 + 𝜷𝑿𝒊

3. The variance of the error term (Uᵢ) is constant across observations (the assumption of homoscedasticity)
For all values of X, the U’s will show the same dispersion around their mean. This implies that,
given the value of X, the variance of 𝑈𝑖 is the same (constant) for all observations.
Note: this assumption implies that the values of Y corresponding to various values of X have
constant variance.

[Figure: (a) Homoskedastic variance: the error has constant variance; (b) Heteroskedastic variance: the spread of the errors depends on X.]

Mathematically:

Var(Uᵢ) = E[Uᵢ − E(Uᵢ)]² = E(Uᵢ²) = σ², since E(Uᵢ) = 0.

This is called the homoscedasticity assumption, and the constant variance itself is called homoscedastic variance.

4. The random variable (U) has a Normal Distribution


This means the values of U (for each x) have a bell shaped symmetrical distribution about their zero
mean and constant variance, 𝝈𝟐 , i.e.,
𝑼𝒊 ~𝑵(𝟎, 𝝈𝟐 ) ……………………………..……(2.4)


➢ The reason for this assumption is that if U is normally distributed, so will be Y and the estimated regression coefficients, and
✓ this will be useful in performing tests of hypotheses and constructing confidence intervals for α and β.
➢ This assumption is required mainly for hypothesis testing (inference).

5. The parameters 𝜶 and 𝜷 are fixed

If this assumption is not adopted, the regression model would be very difficult to handle. In any case,
it may be acceptable to postulate that the model parameters are stable over time.
6. The random terms of different observations (Uᵢ, Uⱼ) are independent (i.e., no autocorrelation or serial correlation)
This means the value which the random term assumed in one period does not depend on the
value which it assumed in any other period.
Algebraically,

Cov(Uᵢ, Uⱼ) = E[(Uᵢ − E(Uᵢ))(Uⱼ − E(Uⱼ))]
Cov(Uᵢ, Uⱼ) = E(UᵢUⱼ) = 0, for i ≠ j … … … (2.5)
If errors are serially correlated, an increase in the error term in one time period affects the error term
in the next period. Autocorrelation is a serious problem in time series data. That is, the assumption
that there is no serial correlation can be unrealistic in time series.

7. The values of X are fixed in repeated samples.


This means that, in taking different samples on Y and X, the Xᵢ values are the same in all samples, but the Uᵢ values differ from sample to sample. That is, X is assumed to be non-stochastic.

8. Exogeneity: All explanatory variables are uncorrelated with the error term.
The explanatory variables are determined outside the model (they are exogenous). This means that
there is no correlation between the random variable and the explanatory variable(s).
What happens if this assumption is violated?
Suppose we have the model: 𝒀𝒊 = 𝜷𝟎 + 𝜷𝟏 𝑿𝒊 + 𝑼𝒊 . Suppose 𝑋𝑖 and 𝑈𝑖 are positively correlated,
i.e., when 𝑋𝑖 is large, 𝑈𝑖 tends to be large as well.

Why would Xᵢ and Uᵢ be correlated? Suppose you are trying to study the relationship between the price of burgers and the quantity sold across various restaurants in a given city, and you estimate the following model: Yᵢ = β₀ + β₁Price + Uᵢ. The problem with this model, however, is that the quality of burgers differs across restaurants. Quality should therefore be included as an explanatory variable, but if you fail to do so, it becomes part of Uᵢ. Since price and quality are highly positively correlated, Xᵢ and Uᵢ are positively correlated, and β̂₁ will be biased upward. This is called omitted variables bias.

Thus, to obtain an unbiased estimator, the exogeneity assumption is important. Algebraically, if two variables are unrelated, their covariance is zero:

I.e., Cov(Xᵢ, Uᵢ) = 0 … … … (2.6)

Proof:
Cov(Xᵢ, Uᵢ) = E[(Xᵢ − E(Xᵢ))(Uᵢ − E(Uᵢ))]
= E[(Xᵢ − E(Xᵢ))Uᵢ], given E(Uᵢ) = 0
= E(XᵢUᵢ) − E(Xᵢ)E(Uᵢ)
= E(XᵢUᵢ), given E(Uᵢ) = 0
= XᵢE(Uᵢ) = 0, given that the Xᵢ are fixed values.
∴ Cov(Xᵢ, Uᵢ) = 0

9. The Regression Model is correctly specified /No Model specification error


This means that the mathematical form of the model is correctly specified and all important
explanatory variables are included in it. In other words, there is no specification bias or error in the
model used in empirical analysis. Unfortunately, in practice one rarely specifies the correct model.
Hence, an econometrician must use judgment in choosing the model, i.e., in determining, on the basis of theoretical grounds, the type and number of variables entering the model and its functional form.

10. Variability in X values


The X values in a given sample must not all be the same. This means that X assumes different values in a given sample, but fixed values in hypothetical repeated samples. This assumption is very critical since without it, it would be impossible to estimate the parameters and regression analysis would fail. For example, if there is little variation in household income, we will not be able to explain much of the variation in the consumption expenditure of the households.
11. Absence of high multi-collinearity among explanatory variables (specific to Multiple
regression models)-(More in Chapter-3)
The explanatory variables are not perfectly correlated with each other. In other words, there is no
perfect linear relationship among the explanatory variables. This assumption however, does not
exclude non-linear relationships among the explanatory variables.

The Distribution of the Dependent Variable, Y


So far we have determined the distribution of the explanatory variables and the stochastic term. In
this section, we will determine the distribution of the dependent variable. Based on the assumptions
we discussed so far about the distributions of 𝑿 and 𝑼, we can establish that Y is normally
distributed with:
1. Mean: E(Yᵢ) = α + βXᵢ, and

2. Variance: Var(Yᵢ) = E(Uᵢ²) = σ²


Proof:
1. By definition, the expected value of Y is equal to its mean value. Therefore, the mean of 𝒀 is
given as:
𝑬(𝒀𝒊 ) = 𝑬(𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 )
= 𝜶 + 𝜷𝑿𝒊 Since 𝑬(𝑼𝒊 ) = 𝟎
 This is because, we know that 𝜶 and 𝜷 are parameters and hence, they are constant.
Besides, by assumption 7, the values of X are a set of fixed numbers implying that
𝑬(𝜶 + 𝜷𝑿𝒊 ) = 𝜶 + 𝜷𝑿𝒊 , and by assumption-2, 𝑬(𝑼𝒊 ) = 𝟎.

2. The variance of Y is given as: Var(Yᵢ) = E[Yᵢ − E(Yᵢ)]²

= E[α + βXᵢ + Uᵢ − (α + βXᵢ)]², via substitution
= E(Uᵢ²) = σ²
∴ Var(Yᵢ) = σ²
 Therefore, we can conclude that the variance of 𝒀 is the same as the variance of the stochastic
term.
3. The shape of the distribution of Y is normal
The shape of the distribution of Yᵢ is determined by the shape of the distribution of Uᵢ, which is normal by assumption 4. Since α and β are constants, they do not affect the distribution of Yᵢ. Furthermore, the values of the explanatory variable, Xᵢ, are a set of fixed values by assumption 7 and therefore do not affect the shape of the distribution of Yᵢ. Thus, the distribution of Y is normal, following the normality of the distribution of U.


∴ 𝒀𝒊 ~𝑵(𝜶 + 𝜷𝑿𝒊 , 𝝈𝟐 )……………………… (2.7)
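To make these distributional claims concrete, here is a minimal simulation sketch (Python with NumPy is assumed; the values α = 1.5, β = 0.6 and σ = 0.5 are hypothetical, chosen only for illustration). It draws many samples of Y for the same fixed X values, in line with assumptions 2, 4 and 7, and checks that the simulated mean and variance of Y agree with (2.7):

```python
import numpy as np

# Minimal simulation sketch of the classical model Y_i = alpha + beta*X_i + U_i.
# The parameter values (alpha=1.5, beta=0.6, sigma=0.5) are hypothetical.
# X is held fixed across samples (assumption 7); U ~ N(0, sigma^2) has zero
# mean and constant variance (assumptions 2-4).
rng = np.random.default_rng(42)
alpha, beta, sigma = 1.5, 0.6, 0.5
X = np.array([5.0, 4.0, 8.0, 10.0, 13.0, 14.0])  # fixed in repeated samples

n_reps = 100_000
U = rng.normal(0.0, sigma, size=(n_reps, X.size))  # fresh disturbances each sample
Y = alpha + beta * X + U                           # (n_reps, 6) array of Y draws

print("simulated E(Y_i):  ", Y.mean(axis=0).round(2))
print("theoretical E(Y_i):", (alpha + beta * X).round(2))  # alpha + beta*X_i
print("simulated Var(Y_i):", Y.var(axis=0).round(2))       # approx sigma^2 = 0.25
```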

2.4.2. Estimation of the SLRM by the Ordinary Least Squares (OLS) Method


OLS is the technique used to estimate the line that minimizes the sum of squared differences between the predicted and the actual values of the dependent variable.

From (2.2), we know that the simple regression model in its stochastic form is Yᵢ = α̂ + β̂Xᵢ + eᵢ. Estimation of α and β by the least squares (LS), or classical least squares (CLS), method involves finding values for the estimators α̂ and β̂ which minimize the sum of squared residuals (Σeᵢ²).

Rearranging the above equation, we obtain:

eᵢ = Yᵢ − (α̂ + β̂Xᵢ) … … … … … … (2.8)

which indicates that the residual value is a function of the numeric values of α̂ and β̂.

According to the least squares criterion, the best SRF (line) is obtained by minimizing the aggregate of squared prediction errors in the above equation. The rationale of this criterion is straightforward: the smaller the deviations of the true values of Y from the SRF (line) at each value of X in the sample, the better the SRF (line) fits the scatter of sample observations. Thus, the least squares criterion calls for the values of the parameters of a model to be determined in such a way as to minimize the sum of the squared deviations of the true values of Y from the SRF (line) at each value of X in the sample. The total squared prediction error of a regression line is obtained by adding the squares of the differences between the true values of Y and the estimated (predicted) values Ŷ at each value of X in the sample.

Symbolically,

Total squared prediction error: Σeᵢ² = Σ[Yᵢ − Ŷᵢ]²

i.e.,

Total squared prediction error: Σeᵢ² = Σ[Yᵢ − α̂ − β̂Xᵢ]² … … … (2.9)

where Ŷᵢ = α̂ + β̂Xᵢ.

To find the values of α̂ and β̂ that minimize this sum, we partially differentiate Σeᵢ² with respect to α̂ and β̂ and set the partial derivatives equal to zero:

1. ∂Σeᵢ²/∂α̂ = −2Σ(Yᵢ − α̂ − β̂Xᵢ) = 0 … … … (2.10)

2. ∂Σeᵢ²/∂β̂ = −2ΣXᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0 … … … (2.11)


Equations (2.10) and (2.11) are often called the first order conditions for the OLS estimates, a
term that comes from optimization using calculus. The name “ordinary least squares” comes
from the fact that these estimates minimize the sum of squared residuals.

Note at this point that the term in parentheses in equations (2.10) and (2.11) represents the residual, eᵢ = Yᵢ − α̂ − β̂Xᵢ. Hence, it is possible to rewrite (2.10) and (2.11) as −2Σeᵢ = 0 and −2ΣXᵢeᵢ = 0. It follows that:

Σeᵢ = 0 and ΣXᵢeᵢ = 0 … … … (2.12)

➢ Rearranging equation (2.10), we obtain:

ΣYᵢ = nα̂ + β̂ΣXᵢ … … … (2.13)

➢ If we rearrange equation (2.11), we obtain:

ΣYᵢXᵢ = α̂ΣXᵢ + β̂ΣXᵢ² … … … (2.14)

Equations (2.13) and (2.14) are called the OLS Normal Equations.


In operations with summations, the following notations and rules must be taken into account (all sums run from i = 1 to n, and a is a constant):

Σa = na;  ΣaXᵢ = aΣXᵢ;  Σ(Xᵢ + Yᵢ) = ΣXᵢ + ΣYᵢ

Solving the normal equations of OLS simultaneously, we get expressions with which the optimal numeric values of α̂ and β̂ are determined based on the least squares criterion. Dividing both sides of (2.13) by n and solving for α̂, we get the following expression:

α̂ = Ȳ − β̂X̄ … … … (2.15)

➢ Substituting the value of α̂ from (2.15) into (2.14), we get:

ΣYᵢXᵢ = ΣXᵢ(Ȳ − β̂X̄) + β̂ΣXᵢ²
ΣYᵢXᵢ = ȲΣXᵢ − β̂X̄ΣXᵢ + β̂ΣXᵢ²
ΣYᵢXᵢ − ȲΣXᵢ = β̂(ΣXᵢ² − X̄ΣXᵢ)
ΣYᵢXᵢ − nX̄Ȳ = β̂(ΣXᵢ² − nX̄²)

β̂ = (ΣYᵢXᵢ − nX̄Ȳ) / (ΣXᵢ² − nX̄²) … … … (2.16)
Alternatively, the terms in (2.16) can be rewritten in a somewhat different way as follows:

Σ(Xᵢ − X̄)(Yᵢ − Ȳ) = Σ(YᵢXᵢ − ȲXᵢ − X̄Yᵢ + X̄Ȳ)
= ΣYᵢXᵢ − ȲΣXᵢ − X̄ΣYᵢ + nX̄Ȳ
= ΣYᵢXᵢ − nȲX̄ − nX̄Ȳ + nX̄Ȳ

Σ(Xᵢ − X̄)(Yᵢ − Ȳ) = ΣYᵢXᵢ − nȲX̄ … … … (2.17)

Σ(Xᵢ − X̄)² = ΣXᵢ² − nX̄² … … … (2.18)

Substituting (2.17) and (2.18) in (2.16), we get:

β̂ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²

➢ Now, denoting (Xᵢ − X̄) as xᵢ and (Yᵢ − Ȳ) as yᵢ, we obtain:

β̂ = Σxᵢyᵢ / Σxᵢ² … … … (2.19)

➢ The expression in (2.19) for the slope coefficient is the formula in deviation form.
➢ Numerical Example 2.1: Estimation of a Keynesian consumption function

Given the Keynesian consumption function, Consᵢ = α + β·incᵢ + Uᵢ, where Consᵢ and incᵢ stand for the consumption expenditure and monthly income of the iᵗʰ household, respectively, and α and β are the intercept and the marginal propensity to consume, respectively, which we are interested in computing.

Given the consumption (Y) and income (X) data below, both in thousands of Birr, of six households, obtain the OLS estimates of α and β.

Table 2.1

Obs.  Yᵢ   Xᵢ   YᵢXᵢ  Xᵢ²  yᵢ = Yᵢ − Ȳ  xᵢ = Xᵢ − X̄  yᵢxᵢ  xᵢ² = (Xᵢ − X̄)²
1     4    5    20    25   −3           −4           12    16
2     4    4    16    16   −3           −5           15    25
3     7    8    56    64    0           −1            0     1
4     8    10   80    100   1            1            1     1
5     9    13   117   169   2            4            8    16
6     10   14   140   196   3            5           15    25
Sum   42   54   429   570   0            0           51    84

∴ Ȳ = 42/6 = 7, and X̄ = 54/6 = 9

β̂ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = 51/84 = 0.607

α̂ = Ȳ − β̂X̄ = 7 − 0.607(9) = 1.53

Therefore, the fitted regression equation (i.e., OLS regression line) is:


Ĉonsᵢ = 1.53 + 0.607·incᵢ
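The same estimates can be reproduced with a short computational sketch (Python with NumPy is assumed; the data are the six hypothetical households of Table 2.1). It applies the deviation-form formulas (2.19) and (2.15), and also verifies the first-order conditions (2.12):

```python
import numpy as np

# OLS for Numerical Example 2.1 via the deviation-form formulas:
# beta_hat = sum(x*y)/sum(x^2)  (2.19),  alpha_hat = Ybar - beta_hat*Xbar  (2.15).
Y = np.array([4.0, 4.0, 7.0, 8.0, 9.0, 10.0])    # consumption, thousands of Birr
X = np.array([5.0, 4.0, 8.0, 10.0, 13.0, 14.0])  # income, thousands of Birr

x = X - X.mean()   # deviations of X from its mean
y = Y - Y.mean()   # deviations of Y from its mean

beta_hat = (x * y).sum() / (x ** 2).sum()    # 51/84 = 0.607
alpha_hat = Y.mean() - beta_hat * X.mean()   # about 1.53 (1.536 at full precision)

print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.3f}")

# The OLS first-order conditions (2.12) hold for the fitted residuals:
e = Y - (alpha_hat + beta_hat * X)
print(round(e.sum(), 10), round((X * e).sum(), 10))  # both are (numerically) zero
```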

Regression Through the Origin (Estimation of a function with zero intercept)


In some cases economic theory postulates relationships which have a zero intercept. For example,
linear production functions of manufactured products should normally have zero intercept, since
output is zero when the factor inputs are zero. Thus, in such a case, the regression line passes
through the point (0, 0), as in figure 2.3 below. This is called regression through the origin.

Figure 2.3: Regression through the origin.


That means we fit the line Yᵢ = α + βXᵢ + Uᵢ subject to the restriction α = 0. To estimate β̂, the problem can be stated as a restricted minimization problem to which the Lagrangean method is applied.

We minimize: Σeᵢ² = Σ(Yᵢ − α̂ − β̂Xᵢ)²
Subject to: α̂ = 0

The composite function then becomes:

L = Σ(Yᵢ − α̂ − β̂Xᵢ)² − λα̂, where λ is a Lagrange multiplier.

We minimize the function with respect to α̂, β̂ and λ:

∂L/∂α̂ = −2Σ(Yᵢ − α̂ − β̂Xᵢ) − λ = 0 − − − − − − − (i)
∂L/∂β̂ = −2Σ(Yᵢ − α̂ − β̂Xᵢ)(Xᵢ) = 0 − − − − − − − (ii)
∂L/∂λ = −α̂ = 0 − − − − − − − (iii)

Substituting (iii) in (ii) and rearranging, we obtain:

ΣXᵢ(Yᵢ − β̂Xᵢ) = 0
ΣYᵢXᵢ − β̂ΣXᵢ² = 0

β̂ = ΣYᵢXᵢ / ΣXᵢ² … … … (2.20)

 This formula involves the actual values (observations) of the variables and not their deviation forms, as in the case of the unrestricted estimator β̂.

Numerical Example-2: Estimation of a supply function



Consider a supply function where firms will not produce any amount of the commodity if the market price is zero. A preliminary analysis of cross-sectional data based on a sample of 50 firms provides the following intermediate results: ΣQᵢPᵢ = 7500, ΣPᵢ² = 1000, P̄ = 15 and Q̄ = 150.

➢ Estimate the regression line and show the relationship graphically.

Solution:

β̂ = ΣYᵢXᵢ/ΣXᵢ² = ΣQᵢPᵢ/ΣPᵢ² = 7500/1000 = 7.5

Therefore, the estimated regression line is: Q̂ₛ = 7.5P
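As a quick computational check (a sketch in Python; only the aggregates reported above are needed), formula (2.20) gives the restricted slope directly:

```python
# Regression through the origin using the aggregates of Numerical Example-2:
# beta_hat = sum(Q*P) / sum(P^2), formula (2.20).
sum_QP = 7500.0   # sum of Q_i * P_i over the 50 firms
sum_P2 = 1000.0   # sum of P_i^2

beta_hat = sum_QP / sum_P2
print(beta_hat)   # 7.5, so the fitted line is Qs_hat = 7.5 * P
```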

2.5. Alternative Functional Forms and Interpretation of OLS Estimates


So far we have primarily dealt with models that are linear in both parameters and variables. In many cases, however, linear relationships are not adequate for economic applications. The OLS method can also be applied to models in which the endogenous variable and/or the exogenous variable have been transformed. Now, we will consider some commonly used regression models that may be non-linear in the variables but are linear in the parameters, or can be made so by suitable transformations of the variables. We will look at the interpretation of the coefficient β̂ in each case.

a) Linear Model
The coefficient measures the effect of the regressor X on Y. Let us look at this in detail. Consider the sample regression line of a SLRM:

Ŷᵢ = α̂ + β̂Xᵢ … … … (a)

Taking the first-order difference of both sides of (a) and rearranging, we obtain:

dŶᵢ = β̂ dXᵢ, i.e., dŶᵢ/dXᵢ = β̂ … … … (b)

Therefore, β̂ is the change in Y (in the units in which Y is measured) for a unit change in X (in the units in which X is measured).

For instance, in our consumption example above, the fitted linear model was Ĉonsᵢ = 1.53 + 0.607·incᵢ. Therefore, β̂ = 0.607 can be interpreted as: if income increases by 1 thousand Birr, consumption will, on average, increase by 0.607 thousand Birr.
➢ The linearity of such a model implies that a one-unit change in X always has the same effect on
Y, regardless of the value of X considered.

Instructor: Teklebirhan A. (Asst.Prof.) Page 16


CHAPTER TWO: REGRESSION ANALYSIS 2024

Estimation of elasticity from the estimated linear regression line


Elasticity measures the percentage change in the value of a variable as a result of a percentage change in the value of another variable. Symbolically,

Elasticity of Y to X: E_YX = %ΔY/%ΔX = (ΔY/ΔX)·(X/Y), or E_YX = (dY/dX)·(X/Y)

We can estimate the numerical value of the average elasticity of Y to X from the estimated linear regression model (Ŷᵢ = α̂ + β̂Xᵢ) as follows:

E_YX = (dY/dX)·(X̄/Ȳ) = β̂·(X̄/Ȳ)

where E_YX is the average elasticity of Y to X, Ȳ is the mean value of Y in the sample, and X̄ is the mean value of X in the sample.
Numerical Example

Compute the average price elasticity of supply based on the information given in Numerical Example-2. The average price elasticity of supply can be computed as follows:

E_QS,P = β̂·(P̄/Q̄) = 7.5 × (15/150) = 0.75

The result shows that the average price elasticity of supply is 0.75. This implies that, on average, the quantity supplied of the commodity would change by 0.75 percent if the price of the commodity changed by one percent (i.e., on average, the quantity supplied of the commodity is price inelastic).
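The same arithmetic as a one-line sketch (Python assumed), applying E_YX = β̂·(X̄/Ȳ) to the supply example:

```python
# Average elasticity from a fitted linear model: E_YX = beta_hat * (Xbar / Ybar),
# applied to the supply example (beta_hat = 7.5, Pbar = 15, Qbar = 150).
beta_hat, P_bar, Q_bar = 7.5, 15.0, 150.0
elasticity = beta_hat * (P_bar / Q_bar)
print(elasticity)   # 0.75 -> supply is price inelastic on average
```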

b) Log-Linear Model (or Exponential Model)

Some functional relationships are exponential and take the form Y = aˣ, where a (a constant) is the “base”, X is the exponent, and Y is “growing exponentially” as X increases. The most common base for exponential functions is the constant e (where e ≈ 2.7183).

Suppose the underlying relationship between two variables (X and Y) is given as: Yᵢ = e^(α+βXᵢ+Uᵢ). By taking natural logs of both sides of the above model, we obtain the following log-linear model:

ln(Yᵢ) = α + βXᵢ + Uᵢ … … … (c)

The corresponding sample regression function to (c) is the following:

ln(Ŷᵢ) = α̂ + β̂Xᵢ … … … (d)

The slope coefficient in this model measures the relative change in Y for a given absolute change in X:

I.e., β̂ = (relative change in the dependent variable)/(absolute change in the explanatory variable) = (ΔY/Y)/ΔX


If we multiply the relative change in Y by 100, equation (d) gives the percentage change, or the growth rate, in Y for an absolute change in the explanatory variable; i.e., 100 times β̂ gives the growth rate in Y.

In other words, taking first-order differences in (d) and then multiplying both sides by 100%, we obtain:

100·d ln(Ŷᵢ)% = (100·β̂)% dXᵢ

Therefore, if X increases by 1 unit, then Ŷ will increase by (100·β̂)%.

In this model, the elasticity is proportional to the explanatory variable:

I.e., E_YX = (dY/dX)·(X/Y) = (d ln(Y)/dX)·X = β̂·X … … … (e)

NB: d ln(Y) = dY/Y

Numerical Example-3: Return to Education

Suppose an econometrician is interested in studying the effect of years of education on hourly wage. He expects that each year of education increases the wage by a constant percentage. Therefore, based on a sample of 100 individuals from a given city, the following model is estimated to explain wages:

ln(ŴAGEᵢ) = 0.75 + 0.125·EDUCᵢ

where EDUC (education) is measured in years of schooling and WAGE is the hourly wage in ETB.

➢ Interpretation of the coefficient β̂ (the coefficient on EDUC has a percentage interpretation when it is multiplied by 100):
➢ I.e., for every additional year of education, the wage increases by 12.5%, on average.
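A small caveat worth illustrating: the “100·β̂ percent” reading is an approximation that works well for small coefficients; the exact percentage change in Y for a one-unit increase in X in a log-linear model is 100·(e^β̂ − 1). The sketch below (Python assumed) compares the two for the wage example:

```python
import math

# The "100*beta_hat percent" reading is the approximation; the exact percentage
# change in Y for a one-unit increase in X in a log-linear model is
# 100*(exp(beta_hat) - 1).
beta_hat = 0.125
approx = 100 * beta_hat                  # 12.5%
exact = 100 * (math.exp(beta_hat) - 1)   # about 13.3%
print(f"approximate: {approx:.1f}%, exact: {exact:.1f}%")
```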

c) Linear-Log Model

A linear-log model is a function where the dependent variable is defined as a linear function of the logarithm of the explanatory variable, as shown below:

Yᵢ = α + β lnXᵢ + Uᵢ … … … (f)

The corresponding fitted function is the following:

Ŷᵢ = α̂ + β̂ lnXᵢ … … … (g)

The slope coefficient in this model measures the absolute change in Y for a relative change in X:

I.e., β̂ = (absolute change in Y)/(relative change in X) = ΔY/(ΔX/X), or ΔY = β̂·(ΔX/X)
Therefore, taking first-order differences in (g), and then multiplying and dividing the right-hand side by 100, we have:

ΔŶᵢ = (β̂/100)·(100%·ΔlnXᵢ)

Therefore, if X increases by 1%, then Ŷ will increase by (β̂/100) units.

➢ And the elasticity of Y to X is given by:

E_YX = (dY/dX)·(X/Y) = (dY/d lnX)·(1/Y) = β̂·(1/Y) … … … (h)

NB: d lnX = dX/X

Numerical Example-4: Estimation of Expenditure on Dairy Products

Suppose an estimated model of expenditure on dairy products (in ETB) as a function of income (in ETB) is given as:

D̂airy = −12 + 7.5·ln(inc)

The interpretation of β̂: if the consumer's income increases by 1%, on average, the demand for dairy products will increase by 0.075 ETB.
d) Log-Log Model (Double-Log Model)

Sometimes, potential models are postulated in economic theory, such as the well-known Cobb-Douglas functional form. This form is very popular for estimating production and demand functions. A potential model with a unique explanatory variable is given by:

Yᵢ = e^α · Xᵢ^β · e^Uᵢ … … … (i)

This model is not linear in the parameters, but it is linearizable by taking natural logarithms, and the following is obtained:

ln(Yᵢ) = α + β lnXᵢ + Uᵢ … … … (j)

The corresponding fitted model to (j) is the following:

ln(Ŷᵢ) = α̂ + β̂ lnXᵢ … … … (k)

Taking first-order differences in (k), we obtain:

Δln(Ŷᵢ) = β̂ ΔlnXᵢ … … … (l)

The slope coefficient in this model measures the relative change in Y for a given relative change in X:

I.e., β̂ = (relative change in the dependent variable)/(relative change in the explanatory variable) = (ΔY/Y)/(ΔX/X)

Therefore, multiplying both sides of (l) by 100%, we obtain the percentage relationship:

Δln(Ŷᵢ)·100% = β̂ ΔlnXᵢ·(100%) … … … (m)

➢ Thus, in this model (m), β̂ represents the elasticity of Y with respect to X.


Therefore, if X increases by 1%, then Ŷ will increase by β̂%. It is important to remark that, in this model, β̂ is the estimated elasticity of Y with respect to X, for any value of X and Y. Consequently, in this model the elasticity is constant.

Numerical Example-5: Estimation of the Quantity Demand Function of Coffee: Log-Log Model

Suppose that, to examine the effect of the coffee price on the quantity demanded of coffee, an investigator specified the following log-log model:

ln(QDCoffᵢ) = α + β ln(Coffpriceᵢ) + Uᵢ

The fitted log-log model has been given as:

ln(Q̂Dcoff) = 3.5 − 5.25·ln(coffprice)

➢ Interpretation of the coefficient β̂: if the price of coffee increases by 1%, the quantity demanded of coffee will decrease by 5.25%. In this case, β̂ represents the estimated price elasticity of demand.

Summary: Interpretation of β̂ in Different Models

Model        If X increases by    Then Y will change by
Linear       1 unit               β̂ units
Linear-Log   1%                   (β̂/100) units
Log-Linear   1 unit               (β̂ × 100)%
Log-Log      1%                   β̂%
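Computationally, all four functional forms can be estimated with the same OLS routine once Y and/or X are log-transformed. The following sketch (Python with NumPy assumed; it reuses the hypothetical consumption/income data of Example 2.1 purely for illustration) makes this explicit:

```python
import numpy as np

# One OLS routine handles all four functional forms once Y and/or X are
# log-transformed. Data: the hypothetical figures from Example 2.1.
def ols(y, x):
    """Return (intercept, slope) of a simple OLS fit of y on x."""
    xd = x - x.mean()
    slope = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    return y.mean() - slope * x.mean(), slope

Y = np.array([4.0, 4.0, 7.0, 8.0, 9.0, 10.0])
X = np.array([5.0, 4.0, 8.0, 10.0, 13.0, 14.0])

print("linear:    ", ols(Y, X))                  # slope: units of Y per unit of X
print("log-linear:", ols(np.log(Y), X))          # 100*slope: % change in Y per unit of X
print("linear-log:", ols(Y, np.log(X)))          # slope/100: units of Y per 1% change in X
print("log-log:   ", ols(np.log(Y), np.log(X)))  # slope: elasticity of Y w.r.t. X
```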

Homework-1

1) Suppose you are interested in examining the effect of advertisement expenditure on the sales volume of firms producing and supplying a particular product. To this end, you have specified a simple linear regression model, Yᵢ = α + βXᵢ + Uᵢ, and collected the following cross-sectional data on six firms.

Sales (Y)   Advert (X)
6           3
8           4
9           6
5           4
4.5         2
9.5         5

a) Estimate α and β using the principle of least squares and interpret the results.
b) Estimate the average elasticity of sales volume with respect to advertisement expenditure.
2) Assume that the data on sales and advertisement given above are in natural logs.
a) Specify your double-log regression model, estimate the coefficients of your model by OLS and interpret the results.
b) Estimate the elasticity of sales volume with respect to advertisement expenditure.
2.6. Decomposition of the Variation of Y & “Goodness of Fit” of an Estimated Model
We have obtained the estimators by minimizing the sum of squared residuals. Once the estimation has been done, we can see how well our sample regression line fits the data. In other words, we can measure how well the explanatory variable, X, explains the dependent variable, Y. It is often useful to compute a number that summarizes how well the OLS regression line fits the data. The measures that indicate how well the sample regression line fits the data are called goodness-of-fit measures.
➢ Recall that

Yᵢ = α̂ + β̂Xᵢ + eᵢ … … … (2.2), where Ŷᵢ = α̂ + β̂Xᵢ

Yᵢ = Ŷᵢ + eᵢ … … … (2.21)

Summing (2.21) over the sample gives the following expression:

ΣYᵢ = ΣŶᵢ + Σeᵢ
ΣYᵢ = ΣŶᵢ, since Σeᵢ = 0

Dividing both sides of the above by n gives us:

ΣYᵢ/n = ΣŶᵢ/n  ⟹  Ȳ = Ȳ̂ … … … (2.22)

Now, subtract (2.22) from (2.21)


(𝑌𝑖 − 𝑌̅) = (𝑌̂𝑖 − 𝑌̅) + 𝑒𝑖 … … … … … … … … … … … … … . (2.23)

Let 𝑦𝑖 = 𝑌𝑖 − 𝑌̅ and 𝑦̂𝑖 = 𝑌̂𝑖 − 𝑌̅̂

Therefore, equation (2.23) in deviation form can be stated as:


𝑦𝑖 = 𝑦̂𝑖 + 𝑒𝑖 … … … … … … … … … … … … … … … … … . . (2.24)


Squaring and summing both sides of the above equation:

Σyᵢ² = Σ[ŷᵢ + eᵢ]²
Σyᵢ² = Σŷᵢ² + Σeᵢ² + 2Σŷᵢeᵢ … … … (2.25)
But, ∑ 𝑦̂𝑖 𝑒𝑖 = 0, How?
Proof
Recall that ŷᵢ = Ŷᵢ − Ȳ (since Ȳ = Ȳ̂). Moreover, we know that Ŷᵢ = α̂ + β̂Xᵢ and Ȳ = α̂ + β̂X̄.

Therefore, via substitution: ŷᵢ = (α̂ + β̂Xᵢ) − (α̂ + β̂X̄)
ŷᵢ = α̂ + β̂Xᵢ − α̂ − β̂X̄
ŷᵢ = β̂(Xᵢ − X̄)

∴ ŷᵢ = β̂xᵢ … … … (2.26), where xᵢ = Xᵢ − X̄

Hence, by plugging equation (2.26) into yᵢ = ŷᵢ + eᵢ and rearranging, we obtain:

eᵢ = yᵢ − β̂xᵢ … … … (2.27)

Thus, taking the sum of the product of equations (2.26) and (2.27), we get:

Σŷᵢeᵢ = Σ(β̂xᵢ)(yᵢ − β̂xᵢ)
= Σ[β̂xᵢyᵢ − β̂²xᵢ²]
= β̂[Σxᵢyᵢ − β̂Σxᵢ²]

But we know that β̂ = Σxᵢyᵢ/Σxᵢ². Therefore, with substitution, we obtain:

Σŷᵢeᵢ = β̂[Σxᵢyᵢ − (Σxᵢyᵢ/Σxᵢ²)·Σxᵢ²]
Σŷᵢeᵢ = β̂[Σxᵢyᵢ − Σxᵢyᵢ] = β̂·[0] = 0

∴ Σŷᵢeᵢ = 0
➢ Therefore, equation (2.25) becomes:

Σyᵢ² = Σŷᵢ² + Σeᵢ² … … … (2.28)

Or: Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σ(Yᵢ − Ŷᵢ)²

[Total variation] = [Explained variation] + [Unexplained variation]

In words: Total sum of squares (TSS) = Explained sum of squares (ESS) + Residual sum of squares (RSS)

I.e., TSS = ESS + RSS


The Coefficient of Determination (𝑹𝟐 ): A Measure of the “Goodness of Fit”

In econometrics, the ‘goodness of fit’ of a model is measured by a statistical index called coefficient
of determination (𝑹𝟐 ).

Definition

➢ The coefficient of determination (R²) is the proportion of the total variation in Y which is explained by the variation of the explanatory variable (X) included in the model:

Coefficient of determination (R²) = (Explained variation in Y)/(Total variation in Y)
= Explained sum of squares/Total sum of squares = ESS/TSS = Σŷᵢ²/Σyᵢ² … … … (2.29)

The notion of this index is straightforward in the sense that a model fits well if the explanatory variable (X) included in the model accounts for a large part of the variation in the actual values of Y. But if X is an irrelevant variable, the model would explain no part of the variation in the actual values of Y.

Equation (2.28), i.e., TSS = ESS + RSS, shows that the total variation in the observed Y values about their mean value can be partitioned into two parts: one attributable to the regression line and the other to random forces, because not all actual Y observations lie on the fitted line. Geometrically, we have the following figure.

Figure: Breakdown of the variation of 𝑌𝑖 into two components


From equation (2.26), we have ŷᵢ = β̂xᵢ. Squaring and summing both sides gives us:

Σŷᵢ² = β̂²Σxᵢ² … … … (2.30)

We can substitute (2.30) in (2.29) to obtain:

R² = ESS/TSS = β̂²Σxᵢ²/Σyᵢ² … … … (2.31)

= (Σxᵢyᵢ/Σxᵢ²)²·(Σxᵢ²/Σyᵢ²), since β̂ = Σxᵢyᵢ/Σxᵢ²

R² = (Σxᵢyᵢ)²Σxᵢ² / [(Σxᵢ²)²Σyᵢ²] = (Σxᵢyᵢ)² / (Σxᵢ²Σyᵢ²)

R² = [Σxᵢyᵢ / √(Σxᵢ²Σyᵢ²)]² … … … (2.32)

But we know that the term Σxᵢyᵢ/√(Σxᵢ²Σyᵢ²) is the formula of the correlation coefficient, r. For this reason, R² (= r²_YX) is called the square of the correlation coefficient.

In the regression context, r²_YX is a more meaningful measure than r, for the former tells us the proportion of variation in the dependent variable explained by the explanatory variable(s) and therefore provides an overall measure of the extent to which the variation in one variable determines the variation in the other. The latter does not have such value.

∴ R² = (Σyᵢxᵢ)² / (Σyᵢ²Σxᵢ²) … … … (2.33)

Alternatively, since RSS = TSS − ESS, R² becomes:

R² = (TSS − RSS)/TSS = 1 − RSS/TSS

∴ R² = 1 − Σeᵢ²/Σyᵢ² … … … (2.34)

Given that Σeᵢ² = Σyᵢ² − Σŷᵢ² and Σŷᵢ² = β̂²Σxᵢ²:

∴ Σeᵢ² = Σyᵢ² − β̂²Σxᵢ², or Σeᵢ² = Σyᵢ² − β̂Σyᵢxᵢ

Therefore, (2.34) can be rewritten as: R² = 1 − (Σyᵢ² − β̂²Σxᵢ²)/Σyᵢ²


➢ R² measures the part of the total variation in Y which is explained by the model. As a result, it is used as an indicator of the explanatory power, or goodness of fit, of a model. It shows the proportion of the total variation in Y which is attributable to the variation in X.

Interpretation of R²

For example, if R² is 0.9, this would mean that 90% of the total variation in the true values of Y is explained by the model and the remaining 10% is not explained by the model. Equivalently, it would mean that 90% of the total variation in the values of Y is explained or determined (caused) by the variation in the values of X. Therefore, the model fits well.

The limiting values of the coefficient of determination (𝑹𝟐 )


The coefficient of determination can assume any value lying between zero and one; the maximum and minimum values of R² are 1 and 0, respectively. That is to say, 0 ≤ R² ≤ 1.

We can prove the above proposition as follows:

Recall that Σyᵢ² = Σŷᵢ² + Σeᵢ²

➢ Dividing both sides by Σyᵢ², we obtain:

1 = Σŷᵢ²/Σyᵢ² + Σeᵢ²/Σyᵢ²  ⟹  1 = R² + Σeᵢ²/Σyᵢ²

R² = 1 − Σeᵢ²/Σyᵢ²

➢ Note that Σeᵢ²/Σyᵢ² is the proportion of the variation of the Y's around their mean, Ȳ, that is unexplained.
➢ If all the observations lie on the regression line, Ŷ = α̂ + β̂Xᵢ, there will be no scatter of points. In other words, the total variation of Y is explained completely by the estimated regression line, and consequently there will be no unexplained variation, i.e., Σeᵢ²/Σyᵢ² = 0, and hence R² = 1.
➢ On the other hand, if the regression line explains only part of the variation in Y, there will be some unexplained variation, i.e., Σeᵢ²/Σyᵢ² > 0. Therefore, R² < 1.

Finally, if the regression line does not explain any part of the variation of Y, then Σeᵢ²/Σyᵢ² = 1, since Σyᵢ² = Σeᵢ². Therefore, R² = 0. This shows that R² lies between 0 and 1.

Thus, the closer R² is to 1, the better the fit.


The Relationship between R² and β̂

R² = (Σyᵢxᵢ)(Σyᵢxᵢ) / (Σyᵢ²Σxᵢ²)

⇒ R² = [Σyᵢxᵢ/Σxᵢ²]·[Σyᵢxᵢ/Σyᵢ²], since β̂ = Σyᵢxᵢ/Σxᵢ²

⇒ R² = β̂·[Σyᵢxᵢ/Σyᵢ²]

⇒ β̂ = R²·[Σyᵢ²/Σyᵢxᵢ]
❑ Example 2.5:
➢ Based on the consumption model estimated in Example 2.1 above:
a) Find the total variation (TSS), the explained variation (ESS) and the unexplained variation (RSS).
b) Compute the coefficient of determination (R²) and interpret the result.

❑ Solution:
a) Recall that Σyᵢ² = Σŷᵢ² + Σeᵢ². From equation (2.26), we know that ŷᵢ = β̂xᵢ.

∴ Σŷᵢ² = β̂²Σxᵢ² and Σeᵢ² = Σyᵢ² − β̂²Σxᵢ²

➢ TSS = Σyᵢ² = 32 (from the yᵢ column of Table 2.1)
➢ ESS = Σŷᵢ² = β̂²Σxᵢ² = (0.607)²(84) = 30.95, and
➢ RSS = Σeᵢ² = Σyᵢ² − β̂²Σxᵢ² = 32 − 30.95 = 1.05

b) R² = Σŷᵢ²/Σyᵢ² = β̂²Σxᵢ²/Σyᵢ² = 30.95/32 = 0.97 = 97%

➢ Interpretation: 97% of the total variation in consumption expenditure is explained by the variation in household income, while the remaining 3% is due to other factors that are not included in the model.
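The decomposition can be verified numerically with a short sketch (Python with NumPy assumed; the data are again those of Table 2.1). Note that carrying β̂ at full precision gives ESS ≈ 30.96 and RSS ≈ 1.04, which match the hand-computed 30.95 and 1.05 up to rounding:

```python
import numpy as np

# TSS = ESS + RSS and R^2 for the consumption model of Examples 2.1 and 2.5.
Y = np.array([4.0, 4.0, 7.0, 8.0, 9.0, 10.0])
X = np.array([5.0, 4.0, 8.0, 10.0, 13.0, 14.0])

x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()

Y_hat = alpha_hat + beta_hat * X   # fitted values
e = Y - Y_hat                      # residuals

TSS = (y ** 2).sum()                   # 32
ESS = ((Y_hat - Y.mean()) ** 2).sum()  # ~30.96
RSS = (e ** 2).sum()                   # ~1.04

print(f"TSS = {TSS:.2f}, ESS = {ESS:.2f}, RSS = {RSS:.2f}")
print(f"R^2 = {ESS / TSS:.3f} = {1 - RSS / TSS:.3f}")  # about 0.97
```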

2.7. Evaluation of an Estimated Model


After estimation of a model, the next stage is to evaluate the estimated model. By evaluation of the
model means examining the ‘goodness’ of an estimated model. To judge on the ‘goodness’ of an

Instructor: Teklebirhan A. (Asst.Prof.) Page 26


CHAPTER TWO: REGRESSION ANALYSIS 2024

estimated econometrics model, there are three criteria. These are economic criterion, statistical
criterion (First order test) and econometric criterion (Second Order Tests).

Therefore, in the coming sections, we shall discuss the econometric and statistical criteria of model evaluation, in order.

2.7.1. Econometric Criterion: Statistical Desirable Properties of OLS Estimators and the
Gauss-Markov Theorem

There are various econometric methods with which we may obtain estimates of the parameters of economic relationships. We would like an estimate to be as close as possible to the value of the true population parameter, i.e., to vary within only a small range around the true parameter. How are we to choose, among the different econometric methods, the one that gives ‘good’ estimates? We need some criteria for judging the ‘goodness’ of an estimate.

The goodness of an estimate is judged based on how close the estimate is to the true population
parameter it represents. Therefore, the closeness of an estimator is the major criterion based on
which the goodness of an estimator is judged in econometric application.

‘Closeness’ of the estimate to the population parameter is measured by the mean and variance or
standard deviation of the sampling distribution of the estimates of the different econometric
methods. We assume the usual process of repeated sampling i.e., we assume that we get a very large
number of samples each of size ‘n’; we compute the estimates 𝜷 ̂ ′𝑠 from each sample, and for each
econometric method and we form their distribution. We next compare the mean (expected value) and
the variances of these distributions and we choose among the alternative estimates the one whose
distribution is concentrated as close as possible around the population parameter.

There are traditional criteria based on which the closeness of an estimate to the population parameter
can be determined. These are called desirable properties of Estimators (or estimates).
➢ Desirable properties of estimators fall into two categories:
1. Finite (small sample) properties of estimators, and
2. Infinite (large sample) or asymptotic properties of estimators.

1. Finite (Small Sample) Properties of Estimators.


The small-sample, or finite-sample, properties of the estimator 𝜷 ̂ refer to the properties of the
sampling distribution of 𝜷 ̂ for any sample of fixed size. The small-sample properties of the estimator,
̂ are defined in terms of the mean, 𝑬(𝜷
𝜷 ̂ ) and the variance, 𝑽𝒂𝒓(𝜷 ̂ ) of the finite-sample distribution
of the estimator, 𝛽̂ . In other words, these are desirable attributes of estimators under smaller sample
sizes. These include:
A. Unbiasedness
B. Minimum variance

C. Efficiency
D. Linearity
E. Minimum mean square error (MMSE)
F. Best, linear, unbiased (BLU)
A. Unbiased Estimator
According to this criterion a good estimator is one that produces an unbiased estimate. An estimate
is said to be unbiased if its bias is zero. The bias of an estimate is defined by the difference between
the expected value of the estimate and the value of the population parameter.
That is,
̂) − 𝜷
𝑩𝒊𝒂𝒔 𝒐𝒇 𝒂𝒏 𝑬𝒔𝒕𝒊𝒎𝒂𝒕𝒆 = 𝑬(𝜷
̂ ) is unbiased if
Thus, an estimate 𝑜𝑓(𝜷
̂ ) = 𝜷 … … … … … … … … … … … . (𝟐. 𝟑𝟓)
𝑬(𝜷

Figure 3.1: Biased and unbiased estimates of 𝜽


Unbiasedness is desirable but not sufficient alone. Unbiasedness is good to be attained along with
minimum variance. This is so because even an unbiased estimate could be far from the true
population parameter unless it has minimum variance.

B. Minimum variance estimator (Best Estimator)


An estimator is best if it has the smallest variance as compared to any other estimator obtained from other econometric methods. That is, an estimator β̂ is best if:

Var(β̂) < Var(β*)

E[β̂ − E(β̂)]² < E[β* − E(β*)]² … … … (2.36)

where β* is any other estimator of β obtained from another econometric technique.

C. Efficient Estimator
𝑬𝒇𝒇𝒊𝒄𝒊𝒆𝒏𝒄𝒚 = 𝑼𝒏𝒃𝒊𝒂𝒔𝒆𝒅𝒏𝒆𝒔𝒔 + 𝑴𝒊𝒏𝒊𝒎𝒖𝒎 𝑽𝒂𝒓𝒊𝒂𝒏𝒄𝒆


Efficiency is a desirable statistical property because from two unbiased estimators of the same
population parameter, we prefer the one that has the smaller variance, i.e., the one that is statistically
more precise.
➢ Let β̂ and β̃ be two unbiased estimators of the population parameter β, such that E(β̂) = β and E(β̃) = β. Then the estimator β̂ is efficient relative to the estimator β̃ if the variance of the finite-sample distribution of β̂ is less than that of β̃;
▪ i.e., if Var(β̂) ≤ Var(β̃) for all finite n, where E(β̂) = β and E(β̃) = β.
Note: Both estimators β̂ and β̃ must be unbiased, since the efficiency property refers only to the variances of unbiased estimators.

D. Minimum-Mean Square Error (MSE)


Suppose we have two estimates, β̂ and β*, of a parameter β obtained from two different methods of estimation. Suppose also that β̂ is unbiased but has a large variance compared with β*, while β* is biased. Which of the estimates would you prefer? Why? Which criterion would you apply to judge the relative goodness of the estimates?

The minimum mean-square-error (MSE) criterion is a combination of the unbiasedness and minimum variance criteria that is used to choose between unbiased and minimum-variance estimators. An estimator has minimum mean-square error if and only if it has the smallest expected squared deviation from the population parameter. The mean squared error (MSE) of an estimator β̂ is defined as:

MSE(β̂) = E[β̂ − β]² … … … (2.37)

where β is the true population parameter.

This is in contrast with the variance of β̂, which is defined as: Var(β̂) = E[β̂ − E(β̂)]²

The difference between the two is that Var(β̂) measures the dispersion of the distribution of β̂ around its mean, whereas MSE(β̂) measures dispersion around the true value of the parameter.

̂ ) from terms inside the bracket in (2.37) we get,


Adding and subtracting 𝑬(𝜷
𝑴𝑺𝑬(𝜷) = 𝑬[𝜷 ̂ − 𝑬(𝜷 ̂ ) − 𝜷]𝟐
̂ ) + 𝑬(𝜷

𝑬[𝜷 ̂ )]𝟐 + 𝑬[𝑬(𝜷


̂ − 𝑬(𝜷 ̂ ) − 𝜷]𝟐 + 𝟐𝑬[𝜷
̂ − 𝑬(𝜷
̂ )][𝑬(𝜷
̂ ) − 𝜷]

̂ − 𝑬(𝜷
But 𝑬[𝜷 ̂ )]𝟐 is the variance of the sampling distribution of 𝜷 ̂ ) − 𝜷]𝟐 is the square of
̂ and 𝑬(𝜷

̂ and 𝟐𝑬 [(𝜷
the bias of 𝜷 ̂ − 𝑬(𝜷
̂ )) (𝑬(𝜷
̂ ) − 𝜷)] = 𝟎.

̂ − 𝑬(𝜷
✓ B/c 𝟐𝑬 [(𝜷 ̂ )) (𝑬(𝜷 ̂ )𝟐 − 𝑬(𝜷
̂ ) − 𝜷)] = 𝟐 [𝑬(𝜷 ̂ )𝟐 − 𝜷𝑬(𝜷
̂ ) + 𝜷𝑬(𝜷
̂ )] = 𝟎
𝟐 𝟐
̂ ) − 𝜷] = 𝑬(𝜷
NB:𝑬[𝑬(𝜷 ̂ ) − 𝜷] and 𝑬[𝑬(𝜷
̂ )] = 𝑬(𝜷
̂ ), since the expected

value of a constant is simply the constant itself.


➢ Therefore,

𝑴𝑺𝑬(𝜷̂) = 𝑽𝒂𝒓(𝜷̂) + 𝑩𝒊𝒂𝒔𝟐(𝜷̂) … … … … … … … … . (𝟐. 𝟑𝟖)

Equation (2.38) shows that the MSE criterion is a comprehensive criterion that considers both the unbiasedness and the minimum variance criteria. Thus, using the MSE criterion we can choose between unbiased and best (minimum-variance) estimators: when the comparison is between an unbiased estimator and a minimum-variance estimator, the one with the minimum MSE is selected.

Example: Suppose we have two estimates 𝜷 ̂ and 𝜷∗ of a population parameter 𝜷 obtained from
two different econometric methods say OLS and maximum likelihood, respectively. Also
suppose that 𝜷̂ is unbiased and has a variance of 9. On the other hand, 𝑬(𝜷∗ ) = 𝟒 and its
̂ ) is 6, then which econometric method would you select?
variance is 8. If the 𝑬(𝜷

➢ Solution: Since 𝜷̂ is unbiased, 𝜷 = 𝑬(𝜷̂) = 𝟔.

𝑴𝑺𝑬(𝜷̂) = 𝑽𝒂𝒓(𝜷̂) + 𝑩𝒊𝒂𝒔𝟐(𝜷̂) = 𝟗 + 𝟎 = 𝟗 (since 𝜷̂ is unbiased)

𝑴𝑺𝑬(𝜷∗) = 𝑽𝒂𝒓(𝜷∗) + 𝑩𝒊𝒂𝒔𝟐(𝜷∗) = 𝟖 + (𝟒 − 𝟔)𝟐 = 𝟏𝟐

Since 𝑴𝑺𝑬(𝜷̂) < 𝑴𝑺𝑬(𝜷∗), 𝜷̂ has the smaller MSE, and OLS is preferred to maximum likelihood for estimating the model.
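
As an illustrative sketch (not part of the original example), the decomposition MSE = variance + bias² in (2.38) can be checked by simulation. The data-generating process, sample size, and shrinkage factor below are hypothetical choices made only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta_true = 6.0            # true population parameter (as in the example)
n, reps = 25, 20_000       # sample size and number of Monte Carlo replications

# Each replication: draw a sample and compute two estimators of beta_true.
samples = rng.normal(loc=beta_true, scale=5.0, size=(reps, n))
unbiased = samples.mean(axis=1)   # unbiased but relatively noisy
shrunk = 0.8 * unbiased           # biased, with smaller variance

for name, est in [("unbiased", unbiased), ("shrunk", shrunk)]:
    bias = est.mean() - beta_true
    var = est.var()
    mse = np.mean((est - beta_true) ** 2)
    # MSE should equal Var + Bias^2 up to simulation noise, as in (2.38)
    print(f"{name:9s} bias={bias:+.3f} var={var:.3f} mse={mse:.3f} "
          f"var+bias^2={var + bias**2:.3f}")
```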

E. Linear Estimator
An estimator is linear if it is a linear function of the dependent variable of the model. Linearity of an estimate does not determine how close the estimate is to the true population parameter. However, it is desirable in econometric analysis because it allows us to deduce the probability distribution of the estimate from the known probability distribution of the dependent variable of the model.

F. Best, Linear, and Unbiased Estimator (BLUE)


The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov Theorem, named after Carl Friedrich Gauss (a German mathematician) and Andrey Markov (a well-known Russian mathematician).

The theorem can be stated as: “Given the assumptions of the classical linear regression model, the OLS estimators, in the class of linear and unbiased estimators, have minimum variance; i.e., the OLS estimators are BLUE.” According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased and have minimum variance (i.e., they are the best of all linear unbiased estimators).


An estimator is called BLUE if it is:

a. Linear: a linear function of a random variable, such as the dependent variable 𝒀.
b. Unbiased: its average or expected value is equal to the true population parameter.
c. Minimum variance: it has the smallest variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator. According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties.

The detailed proof of these properties is presented below.


a. Linearity: Proposition: 𝜶̂ and 𝜷̂ are linear in Y.
Proof (for 𝜷̂): From (2.17), the OLS estimator of 𝜷 is given by:

𝜷̂ = ∑𝒙𝒊𝒚𝒊 / ∑𝒙𝒊𝟐 = ∑𝒙𝒊(𝒀𝒊 − 𝒀̅) / ∑𝒙𝒊𝟐 = (∑𝒙𝒊𝒀𝒊 − 𝒀̅∑𝒙𝒊) / ∑𝒙𝒊𝟐

(but ∑𝒙𝒊 = ∑(𝑿𝒊 − 𝑿̅) = ∑𝑿𝒊 − 𝒏𝑿̅ = 𝒏𝑿̅ − 𝒏𝑿̅ = 𝟎)

𝜷̂ = ∑𝒙𝒊𝒀𝒊 / ∑𝒙𝒊𝟐 ;  Now, let 𝒌𝒊 = 𝒙𝒊 / ∑𝒙𝒊𝟐 (𝑖 = 1, 2, … , 𝑛)

∴ 𝜷̂ = ∑𝒌𝒊𝒀𝒊 … … … … … … … … … … … … … … … … … … … … … … … … . . (𝟐. 𝟑𝟗)

✓ (2.39) shows that 𝜷̂ is a weighted average of the 𝒀’s, with 𝒌𝒊 serving as the weights:

𝜷̂ = 𝒌𝟏𝒀𝟏 + 𝒌𝟐𝒀𝟐 + 𝒌𝟑𝒀𝟑 + ⋯ + 𝒌𝒏𝒀𝒏

∴ 𝜷̂ is a linear estimator because it is a linear function of 𝒀.


Activity:
➢ Show that 𝜶̂ is linear in Y. Hint: 𝜶̂ = ∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝒀𝒊. Derive this relationship between 𝜶̂ and Y.
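
As a quick numerical check (a sketch with made-up data, not from the text), the weights 𝒌𝒊 in (2.39) can be computed directly and shown to reproduce the usual OLS formulas for both 𝜷̂ and 𝜶̂:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
X = rng.uniform(1, 10, size=n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=n)   # hypothetical data

x = X - X.mean()                  # deviations from the mean
k = x / np.sum(x**2)              # the k_i weights from (2.39)

beta_hat_weights = np.sum(k * Y)  # beta_hat as a linear function of Y
beta_hat_ols = np.sum(x * (Y - Y.mean())) / np.sum(x**2)  # textbook formula
print(beta_hat_weights, beta_hat_ols)          # identical up to rounding

alpha_hat = np.sum((1/n - X.mean() * k) * Y)   # the activity's weights
print(alpha_hat, Y.mean() - beta_hat_ols * X.mean())  # also identical
```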

b. Unbiasedness: Prove that 𝜶̂ and 𝜷̂ are unbiased estimators of the true parameters 𝜶 and 𝜷.
To show that 𝜶̂ and 𝜷̂ are unbiased estimators of their respective parameters means to prove that:

𝑬(𝜷̂) = 𝜷, and 𝑬(𝜶̂) = 𝜶

Proof (1): Prove that 𝜷̂ is an unbiased estimator of 𝜷.
We know that 𝜷̂ = ∑𝒌𝒊𝒀𝒊 = ∑𝒌𝒊(𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊)

𝜷̂ = 𝜶∑𝒌𝒊 + 𝜷∑𝒌𝒊𝑿𝒊 + ∑𝒌𝒊𝑼𝒊,  But ∑𝒌𝒊 = 𝟎 and ∑𝒌𝒊𝑿𝒊 = 𝟏:

∑𝒌𝒊 = ∑(𝒙𝒊 / ∑𝒙𝒊𝟐) = ∑𝒙𝒊 / ∑𝒙𝒊𝟐 = ∑(𝑿𝒊 − 𝑿̅) / ∑𝒙𝒊𝟐 = (∑𝑿𝒊 − 𝒏𝑿̅) / ∑𝒙𝒊𝟐 = (𝒏𝑿̅ − 𝒏𝑿̅) / ∑𝒙𝒊𝟐 = 𝟎

∴ ∑𝒌𝒊 = 𝟎


∑𝒌𝒊𝑿𝒊 = ∑𝒙𝒊𝑿𝒊 / ∑𝒙𝒊𝟐 = ∑(𝑿𝒊 − 𝑿̅)𝑿𝒊 / ∑𝒙𝒊𝟐 = (∑𝑿𝒊𝟐 − 𝑿̅∑𝑿𝒊) / (∑𝑿𝒊𝟐 − 𝒏𝑿̅𝟐) = (∑𝑿𝒊𝟐 − 𝒏𝑿̅𝟐) / (∑𝑿𝒊𝟐 − 𝒏𝑿̅𝟐) = 𝟏

∴ ∑𝒌𝒊𝑿𝒊 = 𝟏

𝜷̂ = 𝜷 + ∑𝒌𝒊𝑼𝒊  ⇒  𝜷̂ − 𝜷 = ∑𝒌𝒊𝑼𝒊 … … … … … … … … … … … (𝟐. 𝟒𝟎)

𝑬(𝜷̂) = 𝜷 + ∑𝒌𝒊𝑬(𝑼𝒊). Since the 𝑿𝒊 are assumed to be nonstochastic (fixed), the 𝒌𝒊 are nonstochastic too.

∴ 𝑬(𝜷̂) = 𝜷, since 𝑬(𝑼𝒊) = 𝟎

Therefore, 𝜷̂ is an unbiased estimator of 𝜷.
Proof (2): Prove that 𝜶̂ is an unbiased estimator of 𝜶.
From the proof of the linearity property, we know that: 𝜶̂ = ∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝒀𝒊

𝜶̂ = ∑[(𝟏/𝒏 − 𝑿̅𝒌𝒊)(𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊)], since 𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊

= 𝜶 + 𝜷(𝟏⁄𝒏)∑𝑿𝒊 + (𝟏⁄𝒏)∑𝑼𝒊 − 𝜶𝑿̅∑𝒌𝒊 − 𝜷𝑿̅∑𝒌𝒊𝑿𝒊 − 𝑿̅∑𝒌𝒊𝑼𝒊

= 𝜶 + 𝜷𝑿̅ + (𝟏⁄𝒏)∑𝑼𝒊 − 𝜷𝑿̅ − 𝑿̅∑𝒌𝒊𝑼𝒊, since ∑𝒌𝒊 = 𝟎 and ∑𝒌𝒊𝑿𝒊 = 𝟏

= 𝜶 + (𝟏⁄𝒏)∑𝑼𝒊 − 𝑿̅∑𝒌𝒊𝑼𝒊

𝜶̂ − 𝜶 = (𝟏⁄𝒏)∑𝑼𝒊 − 𝑿̅∑𝒌𝒊𝑼𝒊 = ∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝑼𝒊 … … … … … … … … … … . . (𝟐. 𝟒𝟏)

𝑬(𝜶̂) = 𝜶 + (𝟏⁄𝒏)∑𝑬(𝑼𝒊) − 𝑿̅∑𝒌𝒊𝑬(𝑼𝒊)

∴ 𝑬(𝜶̂) = 𝜶

Therefore, 𝜶̂ is an unbiased estimator of 𝜶.
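
The algebra above can be illustrated by a small Monte Carlo sketch (hypothetical parameter values, with X held fixed across replications as the assumptions require): averaging 𝜶̂ and 𝜷̂ over many samples should recover 𝜶 and 𝜷.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, sigma = 1.5, 0.6, 2.0   # hypothetical true parameters
n, reps = 20, 50_000
X = rng.uniform(0, 10, size=n)       # X fixed across replications
x = X - X.mean()
k = x / np.sum(x**2)                 # the k_i weights

U = rng.normal(0, sigma, size=(reps, n))
Y = alpha + beta * X + U             # one sample of Y per row
beta_hats = Y @ k                    # beta_hat = sum(k_i * Y_i), eq. (2.39)
alpha_hats = Y.mean(axis=1) - beta_hats * X.mean()

# Averages across replications should be close to the true alpha and beta
print(alpha_hats.mean(), beta_hats.mean())
```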

c. Minimum variance of 𝜶̂ and 𝜷̂
Now, we have to establish that, out of the class of linear and unbiased estimators of 𝜶 and 𝜷, the estimators 𝜶̂ and 𝜷̂ possess the smallest sampling variances. For this, we shall first obtain the variances of 𝜶̂ and 𝜷̂ and then establish that each has the minimum variance in comparison with the variances of other linear and unbiased estimators obtained by any econometric method other than OLS.
a. Variance of 𝜷̂

𝑽𝒂𝒓(𝜷̂) = 𝑬[𝜷̂ − 𝑬(𝜷̂)]𝟐 = 𝑬[𝜷̂ − 𝜷]𝟐 …………………………………… (2.42)

Substituting (2.40) into (2.42), we obtain:

𝑽𝒂𝒓(𝜷̂) = 𝑬(∑𝒌𝒊𝑼𝒊)𝟐

𝑽𝒂𝒓(𝜷̂) = 𝑬[𝒌𝟏𝟐𝒖𝟏𝟐 + 𝒌𝟐𝟐𝒖𝟐𝟐 + ⋯ + 𝒌𝒏𝟐𝒖𝒏𝟐 + 𝟐𝒌𝟏𝒌𝟐𝒖𝟏𝒖𝟐 + ⋯ + 𝟐𝒌𝒏−𝟏𝒌𝒏𝒖𝒏−𝟏𝒖𝒏]

= 𝑬[𝒌𝟏𝟐𝒖𝟏𝟐 + 𝒌𝟐𝟐𝒖𝟐𝟐 + ⋯ + 𝒌𝒏𝟐𝒖𝒏𝟐] + 𝑬[𝟐𝒌𝟏𝒌𝟐𝒖𝟏𝒖𝟐 + ⋯ + 𝟐𝒌𝒏−𝟏𝒌𝒏𝒖𝒏−𝟏𝒖𝒏]

= 𝑬[∑𝒌𝒊𝟐𝒖𝒊𝟐] + 𝑬[∑𝒌𝒊𝒌𝒋𝒖𝒊𝒖𝒋], 𝒊 ≠ 𝒋

= ∑𝒌𝒊𝟐𝑬(𝒖𝒊𝟐) + 𝟐∑𝒌𝒊𝒌𝒋𝑬(𝒖𝒊𝒖𝒋) = 𝝈𝟐∑𝒌𝒊𝟐, (since 𝑬(𝒖𝒊𝒖𝒋) = 𝟎)

∑𝒌𝒊𝟐 = ∑(𝒙𝒊 / ∑𝒙𝒊𝟐)𝟐 = ∑𝒙𝒊𝟐 / (∑𝒙𝒊𝟐)𝟐 = 𝟏 / ∑𝒙𝒊𝟐

𝑽𝒂𝒓(𝜷̂) = 𝝈𝟐∑𝒌𝒊𝟐 = 𝝈𝟐 / ∑𝒙𝒊𝟐

∴ 𝑽𝒂𝒓(𝜷̂) = 𝝈𝟐 / ∑𝒙𝒊𝟐 … … … … … … … … … (𝟐. 𝟒𝟑)

b. Variance of 𝜶̂

𝑽𝒂𝒓(𝜶̂) = 𝑬[𝜶̂ − 𝑬(𝜶̂)]𝟐 = 𝑬[𝜶̂ − 𝜶]𝟐 … … … … … … … … … … … … … … … … . . (𝟐. 𝟒𝟒)

Substituting equation (2.41) into (2.44), we get:

𝑽𝒂𝒓(𝜶̂) = 𝑬[∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝑼𝒊]𝟐

= ∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝟐𝑬(𝑼𝒊𝟐), since the cross-product terms vanish (𝑬(𝑼𝒊𝑼𝒋) = 𝟎 for 𝒊 ≠ 𝒋)

⇒ 𝑽𝒂𝒓(𝜶̂) = 𝝈𝟐∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝟐

= 𝝈𝟐∑(𝟏/𝒏𝟐 − (𝟐/𝒏)𝑿̅𝒌𝒊 + 𝑿̅𝟐𝒌𝒊𝟐)

= 𝝈𝟐(𝒏/𝒏𝟐 − (𝟐/𝒏)𝑿̅∑𝒌𝒊 + 𝑿̅𝟐∑𝒌𝒊𝟐)

= 𝝈𝟐(𝟏/𝒏 + 𝑿̅𝟐∑𝒌𝒊𝟐), since ∑𝒌𝒊 = 𝟎

= 𝝈𝟐(𝟏/𝒏 + 𝑿̅𝟐/∑𝒙𝒊𝟐), since ∑𝒌𝒊𝟐 = ∑𝒙𝒊𝟐/(∑𝒙𝒊𝟐)𝟐 = 𝟏/∑𝒙𝒊𝟐

Moreover, 𝟏/𝒏 + 𝑿̅𝟐/∑𝒙𝒊𝟐 = (∑𝒙𝒊𝟐 + 𝒏𝑿̅𝟐)/(𝒏∑𝒙𝒊𝟐) = ∑𝑿𝒊𝟐/(𝒏∑𝒙𝒊𝟐)

∴ 𝑽𝒂𝒓(𝜶̂) = 𝝈𝟐(∑𝑿𝒊𝟐 / 𝒏∑𝒙𝒊𝟐) … … … … … … … … … … … … … … … … … (𝟐. 𝟒𝟓)
We have now computed the variances of the OLS estimators. Next, we check whether these OLS estimators do possess the minimum variance property compared with the variances of other estimators of the true 𝜶 and 𝜷 other than 𝜶̂ and 𝜷̂.
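
Before the formal proof, here is a small simulation sketch (hypothetical values, X held fixed) that checks the formulas (2.43) and (2.45) against Monte Carlo variances:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, sigma = 1.5, 0.6, 2.0
n, reps = 20, 100_000
X = rng.uniform(0, 10, size=n)
x = X - X.mean()

var_beta_theory = sigma**2 / np.sum(x**2)                        # (2.43)
var_alpha_theory = sigma**2 * np.sum(X**2) / (n * np.sum(x**2))  # (2.45)

U = rng.normal(0, sigma, size=(reps, n))
Y = alpha + beta * X + U
beta_hats = Y @ (x / np.sum(x**2))
alpha_hats = Y.mean(axis=1) - beta_hats * X.mean()

print(var_beta_theory, beta_hats.var())    # each pair should roughly agree
print(var_alpha_theory, alpha_hats.var())
```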


To establish that 𝜶̂ and 𝜷̂ possess the minimum variance property, we compare their variances with the variances of other alternative linear and unbiased estimators of 𝜶 and 𝜷, say 𝜶̃ and 𝜷̃. Now, we want to prove that any other linear and unbiased estimator of the true population parameter obtained from any other econometric method has a larger variance than the corresponding OLS estimator.
➢ Let us first show the minimum variance of 𝜷̂ and then that of 𝜶̂.
1. Minimum variance of 𝜷̂
Suppose 𝜷̃ is an alternative linear and unbiased estimator of 𝜷, and let us define this alternative linear estimator of 𝜷 as follows:

𝜷̃ = ∑𝒘𝒊𝒀𝒊, where 𝒘𝒊 ≠ 𝒌𝒊, but 𝒘𝒊 = 𝒌𝒊 + 𝒄𝒊 (the 𝒄𝒊 being arbitrary constants)

𝜷̃ = ∑𝒘𝒊(𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊), since 𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊

𝜷̃ = 𝜶∑𝒘𝒊 + 𝜷∑𝒘𝒊𝑿𝒊 + ∑𝒘𝒊𝑼𝒊

𝑬(𝜷̃) = 𝜶∑𝒘𝒊 + 𝜷∑𝒘𝒊𝑿𝒊 + ∑𝒘𝒊𝑬(𝑼𝒊)

∴ 𝑬(𝜷̃) = 𝜶∑𝒘𝒊 + 𝜷∑𝒘𝒊𝑿𝒊, since 𝑬(𝑼𝒊) = 𝟎

Therefore, for 𝜷̃ to be an unbiased estimator of 𝜷, it must be true that:

∑𝒘𝒊 = 𝟎 and ∑𝒘𝒊𝑿𝒊 = 𝟏

But also, 𝒘𝒊 = 𝒌𝒊 + 𝒄𝒊:

∑𝒘𝒊 = ∑(𝒌𝒊 + 𝒄𝒊) = ∑𝒌𝒊 + ∑𝒄𝒊

Therefore, ∑𝒄𝒊 = 𝟎, since ∑𝒌𝒊 = ∑𝒘𝒊 = 𝟎

Again, ∑𝒘𝒊𝑿𝒊 = ∑(𝒌𝒊 + 𝒄𝒊)𝑿𝒊 = ∑𝒌𝒊𝑿𝒊 + ∑𝒄𝒊𝑿𝒊

Since ∑𝒘𝒊𝑿𝒊 = 𝟏 and ∑𝒌𝒊𝑿𝒊 = 𝟏 ⇒ ∑𝒄𝒊𝑿𝒊 = 𝟎

➢ From these values, we can derive ∑𝒄𝒊𝒙𝒊 = 𝟎, where 𝒙𝒊 = (𝑿𝒊 − 𝑿̅):

∑𝒄𝒊𝒙𝒊 = ∑𝒄𝒊(𝑿𝒊 − 𝑿̅) = ∑𝒄𝒊𝑿𝒊 − 𝑿̅∑𝒄𝒊

Since ∑𝒄𝒊𝑿𝒊 = 𝟎 and ∑𝒄𝒊 = 𝟎 ⇒ ∑𝒄𝒊𝒙𝒊 = 𝟎
To prove whether 𝜷̂ has minimum variance or not, let us compute 𝑽𝒂𝒓(𝜷̃) and compare it with 𝑽𝒂𝒓(𝜷̂).

𝑽𝒂𝒓(𝜷̃) = 𝑽𝒂𝒓(∑𝒘𝒊𝒀𝒊)

= ∑𝒘𝒊𝟐𝑽𝒂𝒓(𝒀𝒊)


∴ 𝑽𝒂𝒓(𝜷̃) = 𝝈𝟐∑𝒘𝒊𝟐, since 𝑽𝒂𝒓(𝒀𝒊) = 𝝈𝟐

But, ∑𝒘𝒊𝟐 = ∑(𝒌𝒊 + 𝒄𝒊)𝟐 = ∑𝒌𝒊𝟐 + ∑𝒄𝒊𝟐 + 𝟐∑𝒌𝒊𝒄𝒊

⇒ ∑𝒘𝒊𝟐 = ∑𝒌𝒊𝟐 + ∑𝒄𝒊𝟐, since ∑𝒌𝒊𝒄𝒊 = ∑𝒙𝒊𝒄𝒊/∑𝒙𝒊𝟐 = 𝟎

𝑽𝒂𝒓(𝜷̃) = 𝝈𝟐(∑𝒌𝒊𝟐 + ∑𝒄𝒊𝟐) = 𝝈𝟐∑𝒌𝒊𝟐 + 𝝈𝟐∑𝒄𝒊𝟐 = 𝝈𝟐(𝟏/∑𝒙𝒊𝟐) + 𝝈𝟐∑𝒄𝒊𝟐

∴ 𝑽𝒂𝒓(𝜷̃) = 𝑽𝒂𝒓(𝜷̂) + 𝝈𝟐∑𝒄𝒊𝟐

Given that the 𝒄𝒊 are arbitrary constants not all equal to zero, 𝝈𝟐∑𝒄𝒊𝟐 is a positive magnitude, i.e., it is greater than zero. Thus, 𝑽𝒂𝒓(𝜷̃) > 𝑽𝒂𝒓(𝜷̂), with equality only when all 𝒄𝒊 = 𝟎, in which case 𝜷̃ coincides with 𝜷̂. This proves that 𝜷̂ possesses the minimum variance property. In a similar way, we can prove that the least squares estimate of the intercept (𝜶̂) possesses minimum variance.
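
The Gauss-Markov inequality above can be illustrated numerically. In the sketch below (hypothetical data), the alternative weights 𝒘𝒊 = 𝒌𝒊 + 𝒄𝒊 are constructed so that ∑𝒄𝒊 = 𝟎 and ∑𝒄𝒊𝑿𝒊 = 𝟎, keeping 𝜷̃ linear and unbiased, and the simulated variance of 𝜷̃ exceeds that of the OLS 𝜷̂:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, sigma = 1.5, 0.6, 2.0
n, reps = 20, 100_000
X = rng.uniform(0, 10, size=n)
x = X - X.mean()
k = x / np.sum(x**2)

c = rng.normal(size=n) * 0.01
c -= c.mean()                        # enforce sum(c) = 0
c -= x * (c @ x) / np.sum(x**2)      # enforce sum(c*x) = 0, hence sum(c*X) = 0
w = k + c                            # alternative linear, unbiased weights

U = rng.normal(0, sigma, size=(reps, n))
Y = alpha + beta * X + U
print((Y @ k).var(), (Y @ w).var())  # Var(beta_tilde) > Var(beta_hat)
```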

2. Minimum Variance of 𝜶̂
We take a new estimator 𝜶̃, which we assume to be a linear and unbiased estimator of 𝜶. The least squares estimator 𝜶̂ is given by:

𝜶̂ = ∑(𝟏/𝒏 − 𝑿̅𝒌𝒊)𝒀𝒊

By analogy with the proof of the minimum variance property of 𝜷̂, let us use the weights 𝒘𝒊 = 𝒌𝒊 + 𝒄𝒊. Consequently:

𝜶̃ = ∑(𝟏/𝒏 − 𝑿̅𝒘𝒊)𝒀𝒊

Since we want 𝜶̃ to be an unbiased estimator of the true 𝜶, that is, 𝑬(𝜶̃) = 𝜶, we substitute 𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 into 𝜶̃ and find its expected value:

𝜶̃ = ∑(𝟏/𝒏 − 𝑿̅𝒘𝒊)(𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊)

𝜶̃ = ∑(𝜶/𝒏 + 𝜷𝑿𝒊/𝒏 + 𝑼𝒊/𝒏 − 𝜶𝑿̅𝒘𝒊 − 𝜷𝑿̅𝑿𝒊𝒘𝒊 − 𝑿̅𝒘𝒊𝑼𝒊)

𝜶̃ = 𝜶 + 𝜷𝑿̅ + ∑𝑼𝒊/𝒏 − 𝜶𝑿̅∑𝒘𝒊 − 𝜷𝑿̅∑𝑿𝒊𝒘𝒊 − 𝑿̅∑𝒘𝒊𝑼𝒊

For 𝜶̃ to be an unbiased estimator of the true 𝜶, the following must hold:

∑𝒘𝒊 = 𝟎 and ∑𝑿𝒊𝒘𝒊 = 𝟏 (so that the 𝑼-terms vanish in expectation, since 𝑬(𝑼𝒊) = 𝟎)

i.e., if ∑𝒘𝒊 = 𝟎 and ∑𝑿𝒊𝒘𝒊 = 𝟏 ⇒ ∑𝒄𝒊 = 𝟎 and ∑𝒄𝒊𝑿𝒊 = 𝟎.

As in the case of 𝜷̂, we need to compute 𝑽𝒂𝒓(𝜶̃) to compare with 𝑽𝒂𝒓(𝜶̂):

𝑽𝒂𝒓(𝜶̃) = 𝑽𝒂𝒓(∑(𝟏/𝒏 − 𝑿̅𝒘𝒊)𝒀𝒊)


𝑽𝒂𝒓(𝜶̃) = ∑(𝟏/𝒏 − 𝑿̅𝒘𝒊)𝟐𝑽𝒂𝒓(𝒀𝒊)

= 𝝈𝟐∑(𝟏/𝒏 − 𝑿̅𝒘𝒊)𝟐

= 𝝈𝟐∑(𝟏/𝒏𝟐 + 𝑿̅𝟐𝒘𝒊𝟐 − (𝟐/𝒏)𝑿̅𝒘𝒊)

= 𝝈𝟐(𝒏/𝒏𝟐 + 𝑿̅𝟐∑𝒘𝒊𝟐 − 𝟐𝑿̅(𝟏/𝒏)∑𝒘𝒊)

= 𝝈𝟐(𝟏/𝒏 + 𝑿̅𝟐∑𝒘𝒊𝟐), since ∑𝒘𝒊 = 𝟎

But, ∑𝒘𝒊𝟐 = ∑𝒌𝒊𝟐 + ∑𝒄𝒊𝟐

𝑽𝒂𝒓(𝜶̃) = 𝝈𝟐(𝟏/𝒏 + 𝑿̅𝟐(∑𝒌𝒊𝟐 + ∑𝒄𝒊𝟐))

𝑽𝒂𝒓(𝜶̃) = 𝝈𝟐(𝟏/𝒏 + 𝑿̅𝟐/∑𝒙𝒊𝟐) + 𝝈𝟐𝑿̅𝟐∑𝒄𝒊𝟐

𝑽𝒂𝒓(𝜶̃) = 𝝈𝟐(∑𝑿𝒊𝟐/𝒏∑𝒙𝒊𝟐) + 𝝈𝟐𝑿̅𝟐∑𝒄𝒊𝟐, but 𝝈𝟐(∑𝑿𝒊𝟐/𝒏∑𝒙𝒊𝟐) = 𝑽𝒂𝒓(𝜶̂)

∴ 𝑽𝒂𝒓(𝜶̃) = 𝑽𝒂𝒓(𝜶̂) + 𝝈𝟐𝑿̅𝟐∑𝒄𝒊𝟐

⇒ 𝑽𝒂𝒓(𝜶̃) > 𝑽𝒂𝒓(𝜶̂), since 𝝈𝟐𝑿̅𝟐∑𝒄𝒊𝟐 > 𝟎

Therefore, we have proved that the least squares (OLS) estimators are best, linear and unbiased (BLUE) estimators.

2) Large-Sample (Asymptotic) Properties of Estimators


It often happens that an estimator does not satisfy one or more of the desirable statistical properties
in small samples. But as the sample size increases indefinitely, the estimator possesses several
desirable statistical properties. These properties are known as the large-sample, or asymptotic,
properties.
The large-sample, or asymptotic, desirable properties of estimators refer to the properties of the sampling distribution of an estimator 𝜷̂ as the sample size n becomes indefinitely large, i.e., as the sample size 𝒏 approaches infinity (as 𝒏 → ∞). The asymptotic (large-sample) desirable properties of estimators are:
Asymptotic unbiasedness
Consistency
Asymptotic efficiency


A. Asymptotic unbiasedness
An estimator 𝜷̂ is an asymptotically unbiased estimator of 𝜷 if:

𝑨𝑬(𝜷̂) = 𝐥𝐢𝐦(𝒏→∞) 𝑬(𝜷̂𝒏) = 𝜷

In other words, 𝜷̂ is an asymptotically unbiased estimate of 𝜷 if its asymptotic bias is zero. The asymptotic bias of 𝜷̂ is the difference between the asymptotic mean of 𝜷̂ and the true value of the population parameter 𝜷. That is:

𝑨𝒔𝒚𝒎𝒑𝒕𝒐𝒕𝒊𝒄 𝑩𝒊𝒂𝒔 𝒐𝒇 𝜷̂ = 𝑨𝑬(𝜷̂𝒏) − 𝜷 = 𝐥𝐢𝐦(𝒏→∞) 𝑬(𝜷̂𝒏) − 𝜷

Thus, roughly speaking the bias of an asymptotically unbiased estimate vanishes as the size of the
sample gets sufficiently large. An unbiased estimate is also asymptotically unbiased but the converse
is not always true.

B. Consistency
An estimator is a consistent estimator if it results in a consistent estimate. An estimate 𝜷̂ of 𝜷 is consistent if the following two conditions are met simultaneously:

1. 𝜷̂ is asymptotically unbiased, i.e.,
𝐥𝐢𝐦(𝒏→∞) 𝑬(𝜷̂) = 𝜷

2. The distribution of 𝜷̂ degenerates, i.e.,
𝐥𝐢𝐦(𝒏→∞) 𝑽𝒂𝒓(𝜷̂) = 𝐥𝐢𝐦(𝒏→∞) 𝑬[𝜷̂ − 𝑬(𝜷̂)]𝟐 = 𝟎

A consistent estimate converges to the population parameter as the sample size gets sufficiently
large. For consistency property to hold, the bias and variance of the estimator/estimate both should
tend to zero as the sample size increases indefinitely. Thus, this criterion assumes that with larger
and larger samples which contain more information we will be able to obtain an increasingly
accurate estimate of the population parameter.
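
As a sketch of consistency in action (hypothetical parameters), the sampling variance of the OLS slope shrinks toward zero as n grows while its mean stays at the true 𝜷:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, beta, sigma = 1.5, 0.6, 2.0

for n in (10, 100, 1000):
    X = rng.uniform(0, 10, size=n)
    x = X - X.mean()
    U = rng.normal(0, sigma, size=(5_000, n))
    Y = alpha + beta * X + U
    b = Y @ (x / np.sum(x**2))        # OLS slope in each replication
    print(n, round(b.mean(), 3), round(b.var(), 6))  # variance shrinks with n
```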

C. Asymptotic Efficiency
An estimate 𝜷̂ is asymptotically efficient if the following conditions are met simultaneously:

1. 𝜷̂ is consistent. That is:
𝐥𝐢𝐦(𝒏→∞) 𝑬(𝜷̂) = 𝜷 and 𝐥𝐢𝐦(𝒏→∞) 𝑽𝒂𝒓(𝜷̂) = 𝐥𝐢𝐦(𝒏→∞) 𝑬[𝜷̂𝒏 − 𝑬(𝜷̂𝒏)]𝟐 = 𝟎

2. 𝜷̂ has the smallest asymptotic variance compared with any other consistent estimator 𝜷̃ obtained from another method. That is:


𝑨𝒔𝒚𝒎𝒑𝒕𝒐𝒕𝒊𝒄 𝑽𝒂𝒓𝒊𝒂𝒏𝒄𝒆(𝜷̂) < 𝑨𝒔𝒚𝒎𝒑𝒕𝒐𝒕𝒊𝒄 𝑽𝒂𝒓𝒊𝒂𝒏𝒄𝒆(𝜷̃)

The Variance of the Random Variable 𝑼𝒊 (𝝈𝒖𝟐)

Both the variances of 𝜶̂ and 𝜷̂ involve 𝝈𝒖𝟐, the variance of the population random disturbance term. However, it is difficult to obtain the value of 𝝈𝒖𝟐 because the population random disturbance term is unobservable. As a result, the true variances of 𝜶̂ and 𝜷̂ cannot be computed directly.

Therefore, in order to compute the variances of 𝜶̂ and 𝜷̂, we have to get an unbiased estimate of the variance of the disturbance term, 𝝈̂𝒖𝟐, from the variance of the sample error terms (residuals). An unbiased estimate of the variance of the random disturbance term, 𝝈̂𝒖𝟐, can be obtained from the sample residuals as follows:

𝝈̂𝒖𝟐 = ∑𝒆𝒊𝟐 / (𝒏 − 𝟐) … … … … … … … … … … … … … . . (𝟐. 𝟒𝟔)

To use 𝝈̂𝒖𝟐 in the expressions for the variances of 𝜶̂ and 𝜷̂, we have to prove that 𝝈̂𝒖𝟐 is an unbiased estimator of 𝝈𝒖𝟐, i.e., 𝑬(𝝈̂𝒖𝟐) = 𝑬(∑𝒆𝒊𝟐 / (𝒏 − 𝟐)) = 𝝈𝒖𝟐.

To prove that 𝝈̂𝒖𝟐 is an unbiased estimator of 𝝈𝒖𝟐, first we have to compute ∑𝒆𝒊𝟐 as follows.
Recall our population model:

𝒀𝒊 = 𝜶 + 𝜷𝑿𝒊 + 𝑼𝒊 … … … … … … … … . (𝟐. 𝟏)

and, averaging over the sample,

𝒀̅ = 𝜶 + 𝜷𝑿̅ + 𝑼̅ … … … … … … … … … (𝟐. 𝟒𝟕)

Subtracting (2.47) from (2.1), we obtain:

𝒚𝒊 = 𝜷𝒙𝒊 + (𝑼𝒊 − 𝑼̅) … … … … … … … (𝟐. 𝟒𝟖)

Also recall that:

𝒆𝒊 = 𝒚𝒊 − 𝜷̂𝒙𝒊 … … … … … … … … . . … (𝟐. 𝟒𝟗)

Therefore, substituting (2.48) into (2.49) gives:

𝒆𝒊 = 𝜷𝒙𝒊 + (𝑼𝒊 − 𝑼̅) − 𝜷̂𝒙𝒊 … … … … … … … . . (𝟐. 𝟓𝟎)

Collecting like terms, we obtain:

𝒆𝒊 = (𝑼𝒊 − 𝑼̅) − (𝜷̂ − 𝜷)𝒙𝒊

Squaring both sides and summing over the 𝒏 sample values yields:

∑𝒆𝒊𝟐 = ∑[(𝑼𝒊 − 𝑼̅) − (𝜷̂ − 𝜷)𝒙𝒊]𝟐

∑𝒆𝒊𝟐 = ∑(𝑼𝒊 − 𝑼̅)𝟐 + (𝜷̂ − 𝜷)𝟐∑𝒙𝒊𝟐 − 𝟐(𝜷̂ − 𝜷)∑𝒙𝒊(𝑼𝒊 − 𝑼̅)

Taking expected values we have:


𝑬(∑𝒆𝒊𝟐) = 𝑬[∑(𝑼𝒊 − 𝑼̅)𝟐] + 𝑬[(𝜷̂ − 𝜷)𝟐∑𝒙𝒊𝟐] − 𝟐𝑬[(𝜷̂ − 𝜷)∑𝒙𝒊(𝑼𝒊 − 𝑼̅)] … … … (𝟐. 𝟓𝟏)

The right-hand-side terms of (2.51) may be rearranged as follows:

a. 𝑬[∑(𝑼𝒊 − 𝑼̅)𝟐] = 𝑬(∑𝑼𝒊𝟐 + 𝒏𝑼̅𝟐 − 𝟐𝑼̅∑𝑼𝒊) = 𝑬(∑𝑼𝒊𝟐 − 𝑼̅∑𝑼𝒊)

= 𝑬(∑𝑼𝒊𝟐 − (∑𝑼𝒊)𝟐/𝒏) = ∑𝑬(𝑼𝒊𝟐) − (𝟏/𝒏)𝑬(∑𝑼𝒊)𝟐

= 𝒏𝝈𝒖𝟐 − (𝟏/𝒏)𝑬(𝑼𝟏 + 𝑼𝟐 + ⋯ + 𝑼𝒏)𝟐, since 𝑬(𝑼𝒊𝟐) = 𝝈𝒖𝟐

= 𝒏𝝈𝒖𝟐 − (𝟏/𝒏)𝑬(∑𝑼𝒊𝟐 + 𝟐∑𝑼𝒊𝑼𝒋), 𝒊 ≠ 𝒋

= 𝒏𝝈𝒖𝟐 − (𝟏/𝒏)𝒏𝝈𝒖𝟐 − (𝟐/𝒏)∑𝑬(𝑼𝒊𝑼𝒋) = 𝒏𝝈𝒖𝟐 − 𝝈𝒖𝟐, since 𝑬(𝑼𝒊𝑼𝒋) = 𝟎

𝑬[∑(𝑼𝒊 − 𝑼̅)𝟐] = 𝝈𝒖𝟐(𝒏 − 𝟏) … … … … … … … … … … (𝟐. 𝟓𝟐)

b. 𝑬[(𝜷̂ − 𝜷)𝟐∑𝒙𝒊𝟐] = ∑𝒙𝒊𝟐 ∙ 𝑬(𝜷̂ − 𝜷)𝟐

Given that the X’s are fixed in all samples, and knowing that 𝑬(𝜷̂ − 𝜷)𝟐 = 𝑽𝒂𝒓(𝜷̂) = 𝝈𝒖𝟐/∑𝒙𝒊𝟐, hence:

∑𝒙𝒊𝟐 ∙ 𝑬(𝜷̂ − 𝜷)𝟐 = ∑𝒙𝒊𝟐 ∙ (𝝈𝒖𝟐/∑𝒙𝒊𝟐) = 𝝈𝒖𝟐 … … … … … … … … … … … … … … … … … . (𝟐. 𝟓𝟑)

c. −𝟐𝑬[(𝜷̂ − 𝜷)∑𝒙𝒊(𝑼𝒊 − 𝑼̅)] = −𝟐𝑬[(𝜷̂ − 𝜷)(∑𝒙𝒊𝑼𝒊 − 𝑼̅∑𝒙𝒊)]

= −𝟐𝑬[(𝜷̂ − 𝜷)(∑𝒙𝒊𝑼𝒊)], since ∑𝒙𝒊 = 𝟎

But from (2.40), 𝜷̂ − 𝜷 = ∑𝒌𝒊𝑼𝒊; substituting it in the above expression, we obtain:

−𝟐𝑬[(𝜷̂ − 𝜷)∑𝒙𝒊(𝑼𝒊 − 𝑼̅)] = −𝟐𝑬[(∑𝒌𝒊𝑼𝒊)(∑𝒙𝒊𝑼𝒊)]

= −𝟐𝑬[(∑𝒙𝒊𝑼𝒊/∑𝒙𝒊𝟐)(∑𝒙𝒊𝑼𝒊)], since 𝒌𝒊 = 𝒙𝒊/∑𝒙𝒊𝟐

= −𝟐𝑬[(∑𝒙𝒊𝑼𝒊)𝟐/∑𝒙𝒊𝟐] = −𝟐𝑬[(∑𝒙𝒊𝟐𝑼𝒊𝟐 + 𝟐∑𝒙𝒊𝒙𝒋𝑼𝒊𝑼𝒋)/∑𝒙𝒊𝟐]

= −𝟐[∑𝒙𝒊𝟐𝑬(𝑼𝒊𝟐)/∑𝒙𝒊𝟐 + 𝟐∑𝒙𝒊𝒙𝒋𝑬(𝑼𝒊𝑼𝒋)/∑𝒙𝒊𝟐], 𝒊 ≠ 𝒋

= −𝟐𝝈𝒖𝟐, since 𝑬(𝑼𝒊𝟐) = 𝝈𝒖𝟐 and 𝑬(𝑼𝒊𝑼𝒋) = 𝟎


−𝟐𝑬[(𝜷̂ − 𝜷)∑𝒙𝒊(𝑼𝒊 − 𝑼̅)] = −𝟐𝝈𝒖𝟐 …………………. (2.54)

Consequently, equation (2.51) can be written in terms of (2.52), (2.53) and (2.54) as follows:

𝑬(∑𝒆𝒊𝟐) = 𝝈𝒖𝟐(𝒏 − 𝟏) + 𝝈𝒖𝟐 − 𝟐𝝈𝒖𝟐 = (𝒏 − 𝟐)𝝈𝒖𝟐 … … … … … … … … … … … . … (𝟐. 𝟓𝟓)

Therefore, if we define:

𝝈̂𝒖𝟐 = ∑𝒆𝒊𝟐 / (𝒏 − 𝟐)

its expected value is:

𝑬(𝝈̂𝒖𝟐) = (𝟏/(𝒏 − 𝟐))𝑬(∑𝒆𝒊𝟐) = 𝝈𝒖𝟐 … … … … … … … … … … … … … … … … … (𝟐. 𝟓𝟔)

Therefore, 𝝈̂𝒖𝟐 is an unbiased estimator of the true variance of the error term (𝝈𝒖𝟐).

The conclusion we can draw from the above proof is that we can substitute 𝝈̂𝒖𝟐 = ∑𝒆𝒊𝟐/(𝒏 − 𝟐) for 𝝈𝒖𝟐 in the variance expressions of 𝜶̂ and 𝜷̂, since 𝑬(𝝈̂𝒖𝟐) = 𝝈𝒖𝟐.

Hence, the formulas for the variances of 𝜶̂ and 𝜷̂ become:

𝑽𝒂𝒓(𝜷̂) = 𝝈̂𝒖𝟐(𝟏/∑𝒙𝒊𝟐) = (𝟏/(𝒏 − 𝟐))(∑𝒆𝒊𝟐/∑𝒙𝒊𝟐) … … … … … … … … . (𝟐. 𝟓𝟕)

𝑽𝒂𝒓(𝜶̂) = 𝝈̂𝒖𝟐(∑𝑿𝒊𝟐/𝒏∑𝒙𝒊𝟐) = ∑𝒆𝒊𝟐∑𝑿𝒊𝟐 / (𝒏(𝒏 − 𝟐)∑𝒙𝒊𝟐) … … … … … … . … . (𝟐. 𝟓𝟖)
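
Formulas (2.46), (2.57) and (2.58) translate directly into code. The following is a minimal sketch (the function name and any data passed to it are ours, not from the text) that estimates the model and the standard errors used in the tests below:

```python
import numpy as np

def ols_with_se(X, Y):
    """OLS for Y = alpha + beta*X + U, with variances from (2.57)-(2.58)."""
    n = len(Y)
    x = X - X.mean()                                   # deviations x_i
    beta_hat = np.sum(x * Y) / np.sum(x**2)            # slope estimate
    alpha_hat = Y.mean() - beta_hat * X.mean()         # intercept estimate
    e = Y - alpha_hat - beta_hat * X                   # residuals e_i
    sigma2_hat = np.sum(e**2) / (n - 2)                # eq. (2.46)
    se_beta = np.sqrt(sigma2_hat / np.sum(x**2))       # sqrt of (2.57)
    se_alpha = np.sqrt(sigma2_hat * np.sum(X**2) / (n * np.sum(x**2)))  # (2.58)
    return alpha_hat, beta_hat, se_alpha, se_beta
```

Calling ols_with_se(X, Y) on any paired sample returns the estimates together with the standard errors that the significance tests in the next section rely on.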

2.7.2. Statistical Inference: Statistical Tests of Significance of OLS Estimators (First-Order Tests)
In this section, we shall develop statistical criteria for the evaluation of an estimated model. Statistical criteria are developed based on statistical and probability theories. The application of statistical criteria to judge the goodness of a model is known as tests of statistical significance (TSS) or first-order tests of a model.

A test of the statistical significance of the parameter estimates determines whether the sample values of the parameter estimates are statistically different from zero or not. TSS of a model is mandatory because specification and sampling errors are inevitable, and as a result the estimated model could differ from the actual relationship of the variables.

Sample values of 𝜶̂ and 𝜷̂ might be different from zero by mere chance. That is, the sample values of 𝜶̂ and 𝜷̂ can be different from zero not because 𝜶 and 𝜷 are actually different from zero, but by mere chance in the particular sample. Therefore, in econometric applications it is mandatory to test whether the true population values of 𝜶 and 𝜷 are different from zero.

The true population values of 𝜶 and 𝜷 are unobservable (they cannot be known exactly), so how can we determine whether their values are zero or not? In econometric applications we use the sampling distributions of 𝜶̂ and 𝜷̂ together with their sample values to compute how likely it is for 𝜶 and 𝜷 to be zero. In this case, if it is unlikely for 𝜶 and 𝜷 to be zero, then 𝜶 and 𝜷 are said to be statistically different from zero, and the reverse conclusion is drawn if it is highly likely for 𝜶 and 𝜷 to be zero. If 𝜶 and 𝜷 are statistically different from zero, then we say 𝜶̂ and 𝜷̂ are statistically significant (statistically different from zero).

Therefore, the test of the statistical significance of the parameter estimates (𝜶̂ and 𝜷̂) refers to the process in which we use the sampling distributions of the OLS estimators 𝜶̂ and 𝜷̂, together with their sample values, to compute how likely it is for 𝜶 and 𝜷 to be zero, and thereby determine whether they are statistically different from zero.

Standard Statistical techniques of tests


There are several statistical techniques we may use to test the statistical significance of 𝜶̂ and 𝜷̂. In this section, we shall learn the most common test techniques, namely:
i. The Standard Error test,
ii. The Z-test &
iii. The t- test
iv. Confidence Interval Test/Approach
➢ All of these testing procedures reach the same conclusion.
However, to test the significance of the OLS parameter estimators using the above standard test
techniques, we need the following:
➢ The sampling distributions of the estimators 𝜶̂ and 𝜷̂.
➢ The assumption of normality of the error term.
We have already derived that:

𝑽𝒂𝒓(𝜷̂) = 𝝈̂𝒖𝟐/∑𝒙𝒊𝟐 = (𝟏/(𝒏 − 𝟐))(∑𝒆𝒊𝟐/∑𝒙𝒊𝟐)

𝑽𝒂𝒓(𝜶̂) = 𝝈̂𝒖𝟐∑𝑿𝒊𝟐/(𝒏∑𝒙𝒊𝟐) = ∑𝒆𝒊𝟐∑𝑿𝒊𝟐 / (𝒏(𝒏 − 𝟐)∑𝒙𝒊𝟐)


For the purpose of estimating the parameters, the assumption of normality is not used; but we use this assumption to test the significance of the parameter estimators, because the testing methods identified above are based on the normality assumption of the disturbance term. Hence, before we discuss the various testing methods, it is important to see whether the parameter estimates are normally distributed or not.

We have already assumed that the error term is normally distributed with mean zero and
variance 𝝈𝟐 , i.e., 𝑼𝒊 ~𝑵(𝟎, 𝝈𝟐 ). Similarly, we also proved that 𝒀𝒊 ~𝑵(𝜶 + 𝜷𝑿𝒊 , 𝝈𝟐 ). Now, we want
to show the following:
1. 𝜷̂ ~ 𝑵(𝜷, 𝝈𝒖𝟐/∑𝒙𝒊𝟐)

2. 𝜶̂ ~ 𝑵(𝜶, 𝝈𝒖𝟐∑𝑿𝒊𝟐/(𝒏∑𝒙𝒊𝟐)), where 𝝈𝒖𝟐 is estimated by 𝝈̂𝒖𝟐 = ∑𝒆𝒊𝟐/(𝒏 − 𝟐) = 𝑹𝑺𝑺/(𝒏 − 𝟐)

To show whether 𝜶̂ and 𝜷̂ are normally distributed or not, we make use of one property of the normal distribution: “... any linear function of a normally distributed variable is itself normally distributed.”

We have proved that 𝜶̂ and 𝜷̂ are linear in Y; it follows that:

𝜷̂ ~ 𝑵(𝜷, 𝝈𝒖𝟐/∑𝒙𝒊𝟐) and 𝜶̂ ~ 𝑵(𝜶, 𝝈𝒖𝟐∑𝑿𝒊𝟐/(𝒏∑𝒙𝒊𝟐))

Fig: The sampling distribution of OLS estimates (figure not reproduced here)


Now, let us see these testing methods one by one.

i) Standard error test


To apply this test, the size of the sample (n) has to be large enough (𝒏 > 𝟑𝟎). It must also be noted that the standard error test is used only as a rough test of the statistical significance of 𝜶̂ and 𝜷̂, at roughly the 5% (4.5%) level of significance.

This test helps us decide whether the estimates 𝜶̂ and 𝜷̂ are significantly different from zero, i.e. whether the sample from which they have been estimated might have come from a population whose true parameters are zero, i.e., 𝜶 = 𝟎 and/or 𝜷 = 𝟎.


Steps to the standard error test


Step-1: Specify the null and alternative hypotheses of the test
𝑯𝟎 : 𝜶 = 𝟎 𝒐𝒓 𝑯𝟎 : 𝜷 = 𝟎
𝑯𝑨 : 𝜶 ≠ 𝟎 𝒐𝒓 𝑯𝑨 : 𝜷 ≠ 𝟎
Step-2: Compute the standard errors of 𝜶̂ and 𝜷̂
As we have seen earlier, the population variances (and standard deviations) of 𝜶̂ and 𝜷̂ are unknown because the standard deviation of the population error term is unknown. Instead, we compute sample estimates of the standard errors (standard deviations) of 𝜶̂ and 𝜷̂ using the following formulas:
𝑺𝑬(𝜷̂) = √𝑽𝒂𝒓(𝜷̂)

𝑺𝑬(𝜶̂) = √𝑽𝒂𝒓(𝜶̂)

Recall that:

𝑺𝑬(𝜷̂) = √(𝝈̂𝒖𝟐/∑𝒙𝒊𝟐) = √(∑𝒆𝒊𝟐/((𝒏 − 𝟐)∑𝒙𝒊𝟐)), since 𝑽𝒂𝒓(𝜷̂) = 𝝈̂𝒖𝟐/∑𝒙𝒊𝟐 = ∑𝒆𝒊𝟐/((𝒏 − 𝟐)∑𝒙𝒊𝟐)

𝑺𝑬(𝜶̂) = √𝑽𝒂𝒓(𝜶̂) = √(𝝈̂𝒖𝟐∑𝑿𝒊𝟐/(𝒏∑𝒙𝒊𝟐)) = √(∑𝒆𝒊𝟐∑𝑿𝒊𝟐/(𝒏(𝒏 − 𝟐)∑𝒙𝒊𝟐))
Step-3: Compare the standard errors with the numerical values of 𝜶̂ and 𝜷̂.
Decision Rule (e.g., for the slope coefficient):
• If 𝟐𝑺𝑬(𝜷̂) > |𝜷̂|, accept the null hypothesis and reject the alternative hypothesis. We conclude that 𝜷̂ is statistically insignificant.
• If 𝟐𝑺𝑬(𝜷̂) < |𝜷̂|, reject the null hypothesis and accept the alternative hypothesis. We conclude that 𝜷̂ is statistically significant.

The basic notion of this test emanates from basic property of normal distribution which states that
about 95% of all possible values of any normally distributed variable are distributed within the range
of mean plus or minus twice the standard error of the variable (𝝁𝒙 ± 𝟐𝝈𝒙 ).

Rejection of the null hypothesis implies that it is improbable (less than 5% probable) to observe the sample values of 𝜶̂ and 𝜷̂ if the true values of 𝜶 and 𝜷 were zero. Thus, it is possible to conclude that the true values of 𝜶 and 𝜷 are statistically different from zero; hence 𝜶̂ and 𝜷̂ are statistically significant.

In other words, this means there is only a 5% chance for 𝜶̂ and 𝜷̂ to assume values beyond twice their respective standard errors if the null hypotheses 𝜶 = 𝟎 or 𝜷 = 𝟎 were true. Thus, given this


fact, it is possible to conclude that it would be improbable for the actual values of 𝜶̂ and 𝜷̂ to be greater than twice the values of their standard errors if their mean values (𝜶 and 𝜷) were zero.

Numerical example: Suppose that from a sample of size 𝒏 = 𝟏𝟓𝟎, we estimate the following
supply function.
𝑸 = 𝟐𝟎 + 𝟎. 𝟔𝑷
𝑺𝑬: (𝟏. 𝟓) (𝟎. 𝟎𝟐𝟓)
➢ Test the significance of the slope parameter at the 5% level of significance using the standard error test.

Solution: 𝟐𝑺𝑬(𝜷̂) = 𝟐(𝟎. 𝟎𝟐𝟓) = 𝟎. 𝟎𝟓 and 𝜷̂ = 𝟎. 𝟔
Thus, since 𝟐𝑺𝑬(𝜷̂) < 𝜷̂, we reject the null hypothesis that 𝜷 = 𝟎 at the 5% level of significance. This implies that the true value of 𝜷 is statistically different from zero; hence 𝜷̂ is statistically significant.

Note: The standard error test is an approximated test (which is approximated from the z-test
and t-test) and implies a two tail test conducted at 5% level of significance.
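
As a sketch, the rule of thumb behind this test is a one-line check (the helper name is ours, not standard):

```python
def standard_error_test(estimate, se):
    """Rough two-tail test at ~5%: significant if |estimate| > 2*SE (a sketch)."""
    return abs(estimate) > 2 * se

print(standard_error_test(0.6, 0.025))  # True -> beta_hat is significant
```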

Economic interpretation of the ‘standard error test’


The acceptance or rejection of the null hypothesis has a definite economic meaning. Namely, the acceptance of the null hypothesis 𝜷 = 𝟎 (the slope parameter is zero) implies that the explanatory variable to which this estimate relates does not in fact influence the dependent variable 𝒀 and should not be included in the model. In other words, acceptance of the null hypothesis implies that there is no scientifically meaningful (statistically acceptable) relationship between Y and X at the 5% level of significance.

ii) The Z-Test of OLS Estimates


This test is based on the standard normal (Gaussian) distribution and helps us test the statistical significance of 𝜶̂ and 𝜷̂ at any chosen level of significance. This test is applied if the population variances of 𝜶̂ and 𝜷̂ are known, or if the population variances are unknown but we have a sufficiently large sample (𝒏 > 𝟑𝟎). As we have seen, the population variances of the OLS estimates are unknown. Therefore, the Z-test is applicable if we have a sufficiently large sample (𝒏 > 𝟑𝟎) with which we estimate 𝜶̂ and 𝜷̂.

The general principle of this test is that it would be less probable for 𝜶 and 𝜷 to be zero if it were unlikely to observe the sample values 𝜶̂ and 𝜷̂ under the hypothesized zero values of

𝜶 and 𝜷. Thus, in this test we first compute the probabilities of the sample values of 𝜶̂ and 𝜷̂ under the hypothesized zero values of 𝜶 and 𝜷, and reject the null hypotheses of zero values of 𝜶 and 𝜷 if we find these probabilities to be small (relative to the level of significance).

Steps of the Z-test
Step-1: Determine the nature of the test (one-tail or two-tail).
The test will be a one-tail test if it is possible to determine the possible sign of 𝜶̂ or 𝜷̂ (positive or negative) using prior information. In this case, the test will be a right-tail test if the possible signs of 𝜶̂ and 𝜷̂ are only positive, and a left-tail test if the possible signs are only negative. On the other hand, the test must be a two-tail test if prior information about the possible signs of 𝜶̂ and 𝜷̂ is not available.

Step-2: Specify the null and the alternative hypotheses of the test.

For a two-tail test:  𝑯𝟎: 𝜶 = 𝟎 or 𝜷 = 𝟎;  𝑯𝑨: 𝜶 ≠ 𝟎 or 𝜷 ≠ 𝟎
For a one-tail (right-tail) test:  𝑯𝟎: 𝜶 = 𝟎 or 𝜷 = 𝟎;  𝑯𝑨: 𝜶 > 𝟎 or 𝜷 > 𝟎
For a one-tail (left-tail) test:  𝑯𝟎: 𝜶 = 𝟎 or 𝜷 = 𝟎;  𝑯𝑨: 𝜶 < 𝟎 or 𝜷 < 𝟎

Step-3: Choose the level of significance of the test (𝜶).
The level of significance is the subjective probability that a researcher considers too low to tolerate. It is the probability of making a ‘wrong’ decision, i.e., the probability of rejecting the null hypothesis (𝑯𝟎) while it is actually true, or the probability of committing a Type I error. It is customary in econometric research to choose the 5% or the 1% level of significance. A 5% level of significance means that in making our decision we allow (tolerate) five times out of a hundred to be ‘wrong’, i.e., to reject the hypothesis when it is actually true.

Step – 4: Determine the critical values as well as the rejection and acceptance regions of the
null hypothesis based on the nature of the test under the chosen level of significance. The
critical value/s is/are the maximum and/or minimum standardized value/s beyond which there is only
‘𝜶%’ (level of significance) chance for the estimate to assume values under the null hypothesis. The
critical value/s of the test is/are obtained directly from Z-distribution probability table. As a result,
the critical values are also known as table or theoretical values (𝒁𝒕 ).


For instance, if the level of significance is 5% and the test is a two-tail test, then the critical values can be obtained directly from the Z-probability table by determining the Z-values beyond which there is only a 5% chance for the estimate to assume values under the null hypothesis of zero values for 𝜶 and 𝜷. To obtain these values we read from the Z-probability table the Z-values beyond which there is only a 2.5% (= 𝜶⁄𝟐) chance for 𝜶̂ and 𝜷̂ if the hypothesized zero values of 𝜶 and 𝜷 were true. These values are −𝟏. 𝟗𝟔 and 𝟏. 𝟗𝟔. Thus, with these critical values the rejection and acceptance regions for the null hypothesis will be:

(acceptance region: −𝟏. 𝟗𝟔 < 𝒁 < 𝟏. 𝟗𝟔, centered on 𝜷 = 𝟎; rejection regions in the two tails)
The range of possible values of 𝜶̂ or 𝜷̂ bounded by the critical values is known as the acceptance region. The acceptance region contains (𝟏 − 𝜶)% of all possible values of the estimate under the null hypothesis; that is, there is a (𝟏 − 𝜶)% probability for the estimate to fall in the acceptance region if the null hypothesis were true. The remaining range of values outside the acceptance region is known as the rejection region. The rejection region contains 𝜶% of all possible values of an estimate under the null hypothesis; that is, it is an interval of values in which there is only an 𝜶% chance for the estimate to assume values given the null hypothesis.

Step-5: Compute the Z-values of 𝜶̂ and 𝜷̂ under the null hypothesis. These are the computed values (𝒁𝒄) of 𝜶̂ or 𝜷̂; they show the Z-values of 𝜶̂ and 𝜷̂ if the null hypothesis were true. To compute 𝒁𝒄 we use the Z-transformation formula:

𝒁𝒄 = (𝜶̂ − 𝜶)/𝑺𝑬(𝜶̂) or 𝒁𝒄 = (𝜷̂ − 𝜷)/𝑺𝑬(𝜷̂)

Where: 𝜶̂ and 𝜷̂ are the actual sample estimates of 𝜶 and 𝜷;
𝑬(𝜶̂) = 𝜶 and 𝑬(𝜷̂) = 𝜷 are the mean values of the estimates;
𝒁𝒄 is the computed value (Z-statistic) of 𝜶̂ or 𝜷̂.

To compute 𝒁𝒄 we use the hypothesized values of 𝜶 and 𝜷:

𝒁𝒄 = (Estimator − Hypothesized value) / (Standard Error of the estimator)

For instance, in the above hypotheses the hypothesized values of 𝜶 and 𝜷 are zero. Thus, the computed values (𝒁𝒄) of 𝜶̂ and 𝜷̂ will be:


⇒ 𝒁𝒄 = (𝜶̂ − 𝟎)/𝑺𝑬(𝜶̂) = 𝜶̂/𝑺𝑬(𝜶̂) and 𝒁𝒄 = (𝜷̂ − 𝟎)/𝑺𝑬(𝜷̂) = 𝜷̂/𝑺𝑬(𝜷̂)

Step-6: Compare the computed Z-values (𝒁𝒄) of 𝜶̂ and 𝜷̂ with the table values (𝒁𝒕), and hence make a decision about the null hypothesis (𝑯𝟎).
The null hypothesis will be rejected if it is improbable to observe the sample values of 𝜶̂ and 𝜷̂ when the null hypothesis is true. Thus, the null hypothesis will be rejected if:
 The computed Z-values (𝒁𝒄) of 𝜶̂ and 𝜷̂ exceed in absolute value the critical (table) values; i.e., reject 𝑯𝟎 if |𝒁𝒄| > |𝒁𝒕|; or
 The computed Z-values (𝒁𝒄) of 𝜶̂ and 𝜷̂ fall outside the acceptance region.

Rule of Thumb
In general, for the 5% level of significance the critical value of 𝒁 is 𝟏. 𝟗𝟔, which is approximately equal to 𝟐, so the decision rule suggests that we reject the null hypothesis if:

|𝒁𝑪𝒐𝒎𝒑𝒖𝒕𝒆𝒅| > 𝟏. 𝟗𝟔 ≅ |𝒁| > 𝟐

The decision rule is to reject the null hypothesis if |𝒁| > 𝟐. But we know that 𝒁𝒄 = 𝜷̂/𝑺𝑬(𝜷̂). Therefore, the condition becomes 𝜷̂/𝑺𝑬(𝜷̂) > 𝟐.

This implies |𝒁| would be greater than 2 if and only if 𝜷̂ > 𝟐𝑺𝑬(𝜷̂). This is the rule of thumb under the standard error test. Thus, the standard error test and the Z-test are identical; they are two ways of saying the same thing.

Example: Suppose that the following estimated cereal crop supply function is obtained from a
sample of 150 farmers.
𝑸 = 𝟔𝟎 + 𝟒𝑷
𝑺𝑬: (𝟓𝟎) (𝟏. 𝟓)
a. Test whether 𝜷̂ is statistically significant or not at the 5% level of significance, assuming that there is no prior information about the possible sign of the slope of the cereal crop supply function.
b. Test the statistical significance of 𝜷̂ at the 5% level of significance, assuming that you have ‘prior knowledge’ from the microeconomic theory of supply that the slope of the supply function is positive.
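
A sketch of both versions of this Z-test, using the reported 𝜷̂ = 4 and SE = 1.5 (scipy is assumed to be available):

```python
from scipy import stats

beta_hat, se = 4.0, 1.5
z = beta_hat / se                      # Z-statistic under H0: beta = 0 (~2.67)

# (a) two-tail test: no prior information on the sign of the slope
z_crit_two = stats.norm.ppf(0.975)     # 1.96
print(abs(z) > z_crit_two)             # True -> reject H0 at the 5% level

# (b) right-tail test: supply theory says the slope is positive
z_crit_right = stats.norm.ppf(0.95)    # 1.645
print(z > z_crit_right)                # True -> reject H0 at the 5% level
```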

iii) Student’s t-test


The Z-test is applicable for testing the validity of a hypothesis when we have a sufficiently large sample (𝒏 > 𝟑𝟎). However, sometimes our sample size for econometric analysis may be smaller. In such a situation, it is misleading to base our test on the Z-distribution, because with smaller samples (𝒏 < 𝟑𝟎) the sampling distributions of 𝜶̂ and 𝜷̂ do not follow the Z-distribution. Thus, we have to


look for another statistical technique of testing when the sample is too small to satisfactorily approximate the unknown population variances of 𝜶̂ and 𝜷̂. One competent statistical technique of testing for small samples is the t-test.

That is, with smaller samples the standardized estimates of 𝜶̂ and 𝜷̂ follow the t-distribution with n − k degrees of freedom (df), where k is the number of parameters in the model (here 𝜶 and 𝜷, so k = 2) and n is the size of the sample. Therefore, the TSS of 𝜶̂ and 𝜷̂ can be evaluated based on the t-test together with the t-transformation formula and the t-probability table. The actual values of 𝜶̂ and 𝜷̂ can be transformed into their equivalent t-values, with n − k degrees of freedom, by using the t-transformation formula as follows:
𝒕𝒄 = (Sample estimate − population value) / (Standard error of the sample estimate)

We can derive the t-values of the OLS estimates:

𝒕𝜶̂ = (𝜶̂ − 𝜶)/𝑺𝑬(𝜶̂) and 𝒕𝜷̂ = (𝜷̂ − 𝜷)/𝑺𝑬(𝜷̂) ➔ with 𝒏 − 𝟐 degrees of freedom

Where: 𝒕𝜶̂ and 𝒕𝜷̂ are the t-statistics of the estimates 𝜶̂ and 𝜷̂, respectively;
𝜶 and 𝜷 are the population parameters;
𝑺𝑬(𝜶̂) and 𝑺𝑬(𝜷̂) are the sample estimates of the true population standard deviations;
𝒏 − 𝟐 is the degrees of freedom, since k (the number of parameters) in the SLRM is 𝟐.

The t-distribution is symmetric, with mean equal to zero and variance (𝒏 − 𝟏)/(𝒏 − 𝟑), which approaches unity as n gets larger. Thus, as n increases, the t-distribution approaches the Z-distribution, which is symmetric with mean zero and unit variance. The probabilities of the t-distribution at different degrees of freedom have been tabulated by W. S. Gosset. Thus, by using the t-distribution probability table it is possible to compute the probability of the observed value of 𝜶̂ or 𝜷̂ under the hypothesized expected values (𝜶 or 𝜷) of the sampling distribution of 𝜶̂ or 𝜷̂ when the sample size is small (𝒏 ≤ 𝟑𝟎).

Steps to implement the t-test

All the procedures of the t-test are the same as those of the Z-test, except that the probabilities of the various t-values depend on the number of degrees of freedom (n − k).

Step-1: Specify the null and the alternative hypotheses based on the nature of the test (two-tail/one-tail).
Step-2: Compute 𝒕𝒄, the computed value of 𝒕 (the test statistic of the estimator), by taking the value of 𝜶 or 𝜷 in the null hypothesis:

𝒕𝒄 = (Estimator − Hypothesized value) / (Standard Error of the estimator)

i.e., 𝒕𝜷̂ = (𝜷̂ − 𝟎)/𝑺𝑬(𝜷̂) = 𝜷̂/𝑺𝑬(𝜷̂), for 𝜷


Where, in our case, the hypothesized value of the slope parameter is zero (i.e., 𝜷 = 𝟎).
Step-3: Choose the level of significance (be it 1%, 5% or 10%).
Step-4: Check whether the test is a one-tail or a two-tail test. If we are considering a two-tail test at the 5% level of significance, divide the level by two to obtain the critical value of t from the t-table.
Step-5: Obtain the critical (table) value of t, called 𝒕𝒕, at 𝜶⁄𝟐 and n − 2 df for a two-tail test in the SLRM (and define the critical region).
Step-6: Compare the computed value of 𝒕, 𝒕𝒄 (𝒕𝜷̂ for 𝜷), with the table value of 𝒕, 𝒕𝒕, and hence make a decision about the statistical significance of the estimate:
✓ If |𝒕𝜷̂| > 𝒕𝒕, reject 𝑯𝟎 and accept 𝑯𝑨. The conclusion is that 𝜷̂ is statistically significant.
✓ If |𝒕𝜷̂| < 𝒕𝒕, accept 𝑯𝟎 and reject 𝑯𝑨. The conclusion is that 𝜷̂ is statistically insignificant.

Numerical Example:
Recall that from a sample of size 𝒏 = 𝟔, we have estimated the following simple consumption
function (example-1):
𝑪𝒐𝒏𝒔̂ 𝒊 = 𝟏. 𝟓𝟑 + 𝟎. 𝟔𝟎𝟕𝒊𝒏𝒄𝒊
𝑺𝑬: (𝟎. 𝟓𝟒) (𝟎. 𝟎𝟓𝟔)
Where, the values in the brackets are standard errors.
Test the hypothesis that income does not affect consumption expenditure, using the t-test at the 5% level of significance.
Solution:
a. The hypothesis we want to test: 𝑯𝟎: 𝜷 = 𝟎 against 𝑯𝑨: 𝜷 ≠ 𝟎
b. The t-value of the test statistic is:

𝒕𝜷̂ = (𝜷̂ − 𝟎)/𝑺𝑬(𝜷̂) = 𝜷̂/𝑺𝑬(𝜷̂) = 𝟎. 𝟔𝟎𝟕/𝟎. 𝟎𝟓𝟔 ≅ 𝟏𝟎. 𝟖𝟒

c. Since the test is a two-tail test, we calculate 𝜶⁄𝟐 = 𝟎. 𝟎𝟓/𝟐 = 𝟎. 𝟎𝟐𝟓 and obtain the critical (table) value of ‘𝒕’ at 𝜶⁄𝟐 = 𝟎. 𝟎𝟐𝟓 and 4 degrees of freedom (df = 𝒏 − 𝟐 = 𝟔 − 𝟐). From the t-table, ‘𝒕𝒕’ at the 0.025 level of significance and 𝟒 df is 𝟐. 𝟕𝟕𝟔.
d. Since 𝒕𝜷̂ = 𝟏𝟎. 𝟖𝟒 > 𝒕𝒕 = 𝟐. 𝟕𝟕𝟔, we reject 𝑯𝟎.
e. This implies that 𝜷̂ is statistically significant.
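
The same calculation as a sketch in code (scipy assumed available):

```python
from scipy import stats

beta_hat, se, n = 0.607, 0.056, 6
t_stat = beta_hat / se                        # ~10.84 under H0: beta = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)  # 2.776 at 4 df, two-tail 5%
print(abs(t_stat) > t_crit)                   # True -> beta_hat is significant
```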

Hypothesis Testing using P-value Approach


The p-value is the actual probability of committing a Type I error in a particular test, i.e., the actual probability of rejecting the null hypothesis while it is true. It shows the chance of being wrong if we reject the null hypothesis. Thus, smaller p-values are evidence against the null hypothesis: we may reject the null hypothesis when the p-value is small relative to the level of significance.


In other words, the p-value is the lowest level of significance at which the observed value of a test statistic is significant (i.e., at which one rejects 𝑯𝟎). The p-value (or probability value) is the probability of obtaining a sample statistic (such as 𝜶̂ or 𝜷̂) at least as extreme as the one observed when the null hypothesis is true. The p-value of the sample outcome is compared with the level of significance (𝜶).

Steps to Test hypothesis using P-Value Approach


a. State the null and alternative hypothesis based on the nature of the test.
b. Determine the level of significance (𝜶).
c. Compute the test statistics based on either Z or t-transformation formula (i.e., find the
observed z-value or t-Value of the statistics).
d. Determine the P-value based on Z- or T-probability table.
e. Make decision by comparing the P-value with the chosen level of significance (𝜶).
Decision Rule
If the computed p-value < 𝜶, reject the null hypothesis and accept the alternative. That means 𝜷̂ is statistically significant.
If the computed p-value > 𝜶, accept the null hypothesis and reject the alternative. That means 𝜷̂ is statistically insignificant.
 Example: Suppose that the following estimated Wheat supply function is obtained from a
sample of 150 farmers.
𝑸 = 𝟏𝟎𝟎 + 𝟒𝑷
𝑺𝑬: (𝟏𝟎) (𝟏. 𝟓)
Test the statistical significance of 𝜷̂ using the p-value approach under the following conditions:
a. Assuming that there is no prior information about the possible sign of the slope of
Wheat supply function?
b. Assuming that you have ‘prior knowledge’ from micro economic theory of supply that
the slope of the above supply function is positive?
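
A sketch of the p-value computations for both cases, using z = 4/1.5 ≈ 2.67 (scipy assumed):

```python
from scipy import stats

beta_hat, se = 4.0, 1.5
z = beta_hat / se                     # large sample (n = 150), so Z is used

p_two = 2 * stats.norm.sf(abs(z))     # (a) two-tail p-value (~0.0077)
p_right = stats.norm.sf(z)            # (b) right-tail p-value (~0.0038)
print(p_two < 0.05, p_right < 0.05)   # both True -> reject H0 at the 5% level
```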
iv) Confidence Interval Estimation and Hypothesis Testing
Rejection of the null hypothesis does not mean that our estimates 𝜶̂ and 𝜷̂ are the correct estimates of the true population parameters 𝜶 and 𝜷. It simply means that our estimates come from a sample drawn from a population whose parameters 𝜶 and 𝜷 are different from zero.

Thus, we need to estimate the interval of values between which the true values of the population
parameters are expected to lie within a certain “degree of confidence”. In statistics, the process of
estimating an interval of values between which the true values of the population parameters are


expected to lie based on the sampling distribution of the sample estimates is called interval
estimation.

An interval of values constructed (estimated) to predict a range of values for the unknown population parameters 𝜶 and 𝜷, based on the sampling distributions of 𝜶̂ and 𝜷̂ and the actual sample values of 𝜶̂ and 𝜷̂, is known as a confidence interval for the population parameters 𝜶 and 𝜷. The degree of certainty we may assign to the interval to contain the true value of the population parameter within its range is known as the confidence level or confidence coefficient. The confidence coefficient is the probability for the interval to contain the true value of the population parameter within its range. In this respect, we say that with a given probability the population parameter will lie within the defined confidence interval.
I. Confidence interval from the Standard Normal Distribution (Z−Distribution)
The Z-distribution in SLRA is employed when the size of the sample is large (𝒏 > 𝟑𝟎). If the
sample is large, we can use the Z-transformation formula together with Z-distribution probability
table to construct confidence interval for 𝜷. To do this we must follow the following steps:

Step – 1: State the null and the alternative hypothesis


Step – 2: Choose the confidence coefficient (confidence level 𝛅)
This is the probability for the interval to contain the true value of 𝜷 within its range. It is customary
in econometric analysis to choose 95% confidence coefficient but one can choose any level of
confidence coefficient as s/he may wish.
Step – 3: Construct a 𝜹% standardized confidence interval.
✓ Determine standardized interval of values within which there are 𝜹% standardized values.
That is:
𝑷𝒓(𝒁𝟏 < 𝒁 < 𝒁𝟐 ) = 𝜹
For instance, if 𝜹 is 95% then the standardized confidence interval is (−𝟏. 𝟗𝟔, 𝟏. 𝟗𝟔). That is:
𝑷𝒓(−𝟏. 𝟗𝟔 < 𝒁 < 𝟏. 𝟗𝟔) = 𝟗𝟓%
Step-4: Change the standardized confidence interval into an actual-value confidence interval.

𝑷𝒓(𝒁𝟏 < 𝒁 < 𝒁𝟐) = 𝜹%

But we know from the Z-transformation formula that: 𝒁 = (𝜷̂ − 𝜷)/𝑺𝑬(𝜷̂)

Through substitution, we obtain:

𝑷𝒓(𝒁𝟏 < (𝜷̂ − 𝜷)/𝑺𝑬(𝜷̂) < 𝒁𝟐) = 𝜹%

𝑷𝒓(𝒁𝟏 ∙ 𝑺𝑬(𝜷̂) < 𝜷̂ − 𝜷 < 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂)) = 𝜹%


𝑷𝒓(−𝜷̂ + 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂) < −𝜷 < −𝜷̂ + 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂)) = 𝜹%

𝑷𝒓(𝜷̂ − 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂) > 𝜷 > 𝜷̂ − 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂)) = 𝜹%

𝑷𝒓(𝜷̂ − 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂) < 𝜷 < 𝜷̂ − 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂)) = 𝜹%

Thus, the actual 𝜹% confidence interval for 𝜷 will be (𝜷̂ − 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂), 𝜷̂ − 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂)).

For instance, if 𝜹 is 95%, then the actual 95% confidence interval for 𝜷 is obtained as follows:

−𝟏. 𝟗𝟔 < (𝜷̂ − 𝜷)/𝑺𝑬(𝜷̂) < 𝟏. 𝟗𝟔

−𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) < 𝜷̂ − 𝜷 < 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂)

−𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) − 𝜷̂ < −𝜷 < 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) − 𝜷̂

𝜷̂ + 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) > 𝜷 > 𝜷̂ − 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂)

𝜷 = 𝜷̂ ± 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂)

That is, if 𝜹 is 95%, then the actual 95% confidence interval for 𝜷 will be: 𝜷 = 𝜷̂ ± 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂)
Graphically, the acceptance region lies between 𝒁𝟏 = −𝟏. 𝟗𝟔 and 𝒁𝟐 = 𝟏. 𝟗𝟔 (figure not reproduced here).
The meaning of this confidence interval is that there is 95% chance for this interval to contain the
true value of the unknown parameter 𝜷 within its range.
Therefore, for any confidence level 𝜹%, the confidence interval for 𝜷 based on the Z-distribution is given as:

𝜷 = 𝜷̂ ± 𝒁𝒕 ∙ 𝑺𝑬(𝜷̂)

Note that: in the language of hypothesis testing, the confidence interval that we have established is
called the acceptance region and the area(s) outside the acceptance region is (are) called the critical
region(s), or region(s) of rejection of the null hypothesis. The lower and upper limits of the
acceptance region are called the critical values.

Step-5: Make Decision- we can use the established confidence intervals to test hypotheses about the
parameters, and the decision rules are:


➢ If the hypothesized value of 𝜷 in the null hypothesis lies within the confidence interval, accept 𝑯𝟎. The implication is that 𝜷̂ is statistically insignificant.
➢ If the hypothesized value of 𝜷 in the null hypothesis lies outside the limits, reject 𝑯𝟎. This indicates that 𝜷̂ is statistically significant.

Example: construct a 95% confidence interval for the population slope parameter 𝜷, for the
estimated Wheat supply function.
𝑸 = 𝟔𝟎 + 𝟒𝑷
𝑺𝑬: (𝟓𝟎) (𝟏. 𝟓)
Where, the values in the bracket are standard errors and 𝒏 = 𝟏𝟓𝟎 𝒇𝒂𝒓𝒎𝒆𝒓𝒔

a. Construct 95% confidence interval for the slope parameter


b. Test the significance of the slope parameter using the constructed confidence interval.
Solution:
a. The hypothesis: 𝑯𝟎: 𝜷 = 𝟎 against 𝑯𝑨: 𝜷 ≠ 𝟎
b. The limits within which the true 𝜷 lies at the 95% confidence level are:

𝜷 = 𝜷̂ ± 𝟏. 𝟗𝟔 ∙ 𝑺𝑬(𝜷̂), with 𝜷̂ = 𝟒 and 𝑺𝑬(𝜷̂) = 𝟏. 𝟓 (±𝟏. 𝟗𝟔 is the critical value of the test)

𝜷 = 𝟒 ± 𝟏. 𝟗𝟔(𝟏. 𝟓) = 𝟒 ± 𝟐. 𝟗𝟒

That is, the confidence interval is (𝟏. 𝟎𝟔, 𝟔. 𝟗𝟒).

c. The value of 𝜷 in the null hypothesis is zero, which lies outside the confidence interval. Therefore, we can reject the null hypothesis that the true 𝜷 is zero with 95% confidence. Hence, 𝜷̂ is statistically significant.

2. Confidence interval from the Student’s t-distribution.


As we said earlier, the t-distribution in the SLRM is applied when the sample size is small (𝒏 ≤ 𝟑𝟎). That is, if the sample size is small (𝒏 ≤ 𝟑𝟎), then we should use the t-probability distribution table together with the t-transformation formula to construct confidence intervals for 𝜶 and 𝜷.

The procedure for constructing a confidence interval with the t-distribution is similar to the one outlined for the Z-distribution; the only difference is that we must take into account the degrees of freedom while using the t-distribution probability table. Therefore, for any confidence level 𝜹%, the confidence intervals for 𝜶 and 𝜷 based on the t-distribution can be given as:

𝜶 = 𝜶̂ ± 𝒕𝜶⁄𝟐 ∙ 𝑺𝑬(𝜶̂) and 𝜷 = 𝜷̂ ± 𝒕𝜶⁄𝟐 ∙ 𝑺𝑬(𝜷̂)


Exercise: Given the estimated model of consumption:


𝑪𝒐𝒏𝒔̂ 𝒊 = 𝟏. 𝟓𝟑 + 𝟎. 𝟔𝟎𝟕𝒊𝒏𝒄𝒊
𝑺𝑬: (𝟎. 𝟓𝟒) (𝟎. 𝟎𝟓𝟔)
a. Construct 95% confidence interval for both 𝜶 𝑎𝑛𝑑 𝜷 based on t-distribution
b. Test the significance of the estimators using constructed confidence interval.
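
A sketch of the intervals for this exercise (n = 6, so 4 degrees of freedom; scipy assumed):

```python
from scipy import stats

n = 6
t_crit = stats.t.ppf(0.975, df=n - 2)              # 2.776 at 4 df
for name, est, se in [("alpha", 1.53, 0.54), ("beta", 0.607, 0.056)]:
    lo, hi = est - t_crit * se, est + t_crit * se
    significant = not (lo <= 0 <= hi)              # zero outside -> significant
    print(name, round(lo, 3), round(hi, 3), significant)
```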

2.8. Prediction using Simple Linear Regression Model

Forecasting using regression involves making predictions about the dependent variable based on
average relationships observed in the estimated model. Predicted values are values of the dependent
variable based on the estimated model and a prediction about the values of the independent
variables.
For a simple regression, the value of Y is predicted as:

𝒀̂ = 𝜶̂ + 𝜷̂𝑿𝒑

Where 𝒀̂ is the predicted value of the dependent variable, and 𝑿𝒑 is the given (predicted) value of the independent variable (input).
The estimated intercept and all the estimated slopes are used in the prediction of the value of the
dependent variable, even if a slope is not statistically significantly different from zero.
Example-1:
Suppose you have an estimated model of sales (Y) of firms producing a particular product as a
function of advertisement expenditure (X):
𝒀̂ = 𝟐. 𝟓𝑿𝟏
In addition, you have predicted value for the independent variable (advertisement expenditure
in ETB), 𝐗 𝟏 = 𝟐𝟎𝟎. Then the predicted value for Sales is 500 Units.
Example-2:
Recall our estimated consumption Model: 𝐂𝐨𝐧𝐬̂ 𝐢 = 𝟏. 𝟓𝟑 + 𝟎. 𝟔𝟎𝟕𝐢𝐧𝐜𝐢 . Based on this estimated
model, predict the consumption expenditure of a household whose income is 12 ETB.
Solution: 𝑪𝒐𝒏𝒔̂ 𝒊 = 𝟏. 𝟓𝟑 + 𝟎. 𝟔𝟎𝟕(𝟏𝟐) = 𝟖. 𝟖𝟏𝟒. That means a household with an income of 12 ETB will spend 8.814 ETB of its income on consumption.
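
As a one-function sketch of this prediction step (the function name is ours):

```python
def predict_consumption(income, alpha_hat=1.53, beta_hat=0.607):
    """Point prediction from the estimated consumption function."""
    return alpha_hat + beta_hat * income

print(predict_consumption(12))   # 8.814 ETB
```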
Reporting the Results of Regression Analysis
The results of the regression analysis derived are reported in conventional formats. It is not sufficient
merely to report the estimates of 𝜷’s.
➢ There are two conventional ways to report a regression result:
i. Equation form, i.e., by fitting the estimated coefficients in to the regression model and
ii. Table form


i. Equation Form (fitting the estimated coefficients into the regression model)

It is customary to present the estimated equation with standard errors placed in parentheses below the estimated parameter values. These results are supplemented by 𝑹𝟐 (usually placed to the right of the regression equation).
Example: The estimated consumption function can be presented as:
𝑪𝒐𝒏𝒔̂ 𝒊 = 𝟏. 𝟓𝟑 + 𝟎. 𝟔𝟎𝟕𝒊𝒏𝒄𝒊
𝑺𝑬: (𝟎. 𝟓𝟒) (𝟎. 𝟎𝟓𝟔) 𝑹𝟐 = 𝟎. 𝟗𝟕
The numbers in the parenthesis below the parameter estimates are the standard errors.
➢ Some econometricians report the t-values of the estimated coefficients in place of the standard
errors.
ii. Table Form
In this case, the estimated coefficients, the corresponding t-statistics, and some other indicators are
presented in tabular form.
Example: The estimated regression result of our consumption function can be presented using
table as follows:
