CHAPTER TWO
REGRESSION ANALYSIS
This part introduces students to the theory underlying the simplest possible regression analysis,
namely, the two variables regression in which the dependent variable is linearly related to a
single explanatory variable. This case is considered here, not because of its practical adequacy,
but because it presents the fundamental ideas of regression analysis as simply as possible.
Moreover, as we shall see in chapter three, the more general multiple regression analysis in
which the regressand is related to two or more regressors is a logical extension of the two-
variable case. We shall first discuss the basic concept of regression, and then proceed to the core
of simple linear regression analysis.
The term regression was introduced by Francis Galton. In his famous paper “Family Likeness in
Stature”, Galton found that, although there was a tendency for tall parents to have tall children and
for short parents to have short children, the average height of children born of parents of a given
height tended to move or “regress” toward the average height in the population as a whole.
Regression analysis is concerned with the study of the dependence of one variable (the
dependent variable) on one or more other variables (the explanatory variable(s)). In other words,
regression analysis is concerned with describing and evaluating the relationship between a given
variable, dependent variable, and one or more other variables, independent variable(s). The
objective of regression analysis is to estimate and/or predict the unknown (population) mean
value of the dependent variable in terms of the known values of the explanatory variables.
Regression analysis deals with statistical dependence among variables; but not with deterministic
dependence among variables. In statistical relationships, we essentially deal with random
(stochastic) variables; i.e., variables that have probability distributions.
For instance, an economist may be interested in studying the dependence of household monthly
consumption expenditure on household monthly disposable income. That is, our concern might
be with predicting the average consumption expenditure knowing household monthly
disposable income. Such an analysis is helpful in estimating the marginal propensity to consume
(MPC), that is, average change in consumption expenditure for, say, a unit change in disposable
income. To see how this can be done, consider Figure-2.1 below.
The line that passes through the average level of consumption expenditure for each level of
household income is known as the regression line. It shows how the average consumption
expenditure increases with the household’s income.
Fig. 2.1: Scatter plot of household consumption expenditure against household income (in Birr), with the regression line passing through the average consumption expenditure at each income level.
A statistical relationship, however strong, cannot by itself logically imply causation. To ascribe causality, one must appeal to ‘a priori’ or theoretical considerations, i.e., economic theory.
In addition, regression analysis is closely related to correlation analysis, but conceptually there is a fundamental difference between the two. The primary objective of correlation analysis is to measure the strength or degree of linear association between two variables. In regression analysis, however (as already noted), we try to predict the average value of the dependent variable on the basis of fixed values of the explanatory variables. (See section 2.3 in the lecture PPT.)
Terminologies in Regression Theory
The variables in a regression relation consist of dependent and explanatory variables. The dependent
variable is the variable whose variation is being explained by the other variable(s). The explanatory
variable is the variable whose variation is used to explain the variation in the dependent variable.
In the literature on regression, the terms dependent variable and explanatory variable are described
variously. The following is a representative list of the various terminologies used in regression
analysis:
Dependent Variable Explanatory Variable
Explained variable Independent variable
Predictand Predictor
Regressand Regressor
Response Stimulus
Endogenous Exogenous
Outcome Covariate
Controlled variable Control variable
Note that regression analysis can be simple or multiple depending on the number of variables included in the analysis. If we are studying the dependence of a variable on only a single explanatory variable, such as the dependence of consumption expenditure on the level of real income, the study is known as simple, or two-variable, regression analysis. However, if we are studying the dependence of one variable on more than one explanatory variable, such as the dependence of crop yield on rainfall, labor, farm size, fertilizer, etc., it is known as multiple regression analysis.
On the basis of various criteria, such as the type of data, the nature of the dependent variable, etc., regression models may be classified into various categories. What follows presents a simple description of these classes.
Consider the population model
Y_i = α + βX_i + U_i ………………………………………… (2.1)
where Y_i stands for the ith household's consumption expenditure, X_i represents the observed ith household's monthly income, and U_i represents all other variables that can affect the household's consumption spending.
The econometric model given in (2.1) above is called the population regression model or, simply, the population model.
This population regression model is called the true relationship because Y, X and U represent their respective population values, and α and β are called the true parameters.
In the model there is only one factor, X (income), to explain Y (consumption expenditure). All the other factors that affect Y are jointly captured by U. In this sense the model is stochastic: the dependent variable is determined not only by the explanatory variable(s) included in the model but also by others that are not included in it. That is, the first component is the part of Y explained by changes in X, and the second is the part of Y not explained by X, i.e., the change in Y due to the random influence of U.
We can distinguish two parts in the above population model: the systematic component, α + βX_i, and the random disturbance, U_i. Denoting the systematic component by μ_i, we can write:
μ_i = α + βX_i
This equation is known as the population regression function (PRF) or population line. Therefore, as can be seen in Figure 2.3, μ_i is a linear function of X with intercept α and slope β.
However, we can’t estimate the population parameters, 𝜶 and 𝜷 since we can’t have
population/census data for economic reasons. Therefore, we always opt to draw samples and
estimate population parameters based on sample information.
The sample regression function (SRF) is the sample counterpart of the population regression function (PRF). Since the SRF is obtained from a given sample, a new sample will generate different estimates. The SRF, which is an estimate of the PRF, is given by
Ŷ_i = α̂ + β̂X_i
and allows us to calculate the fitted value (Ŷ_i) of Y when X = X_i. In the SRF, α̂ and β̂ are estimators of the population parameters α and β. For each X_i, we have an observed value (Y_i) and a fitted value (Ŷ_i). The difference between Y_i and Ŷ_i is called the residual, e_i:
e_i = Y_i − Ŷ_i = Y_i − (α̂ + β̂X_i)
▪ That is, the residual e_i is the difference between the sample value, Y_i, and the fitted value, Ŷ_i.
➢ Rearranging the above equation, we obtain the sample regression function (SRF) in its stochastic form:
Y_i = α̂ + β̂X_i + e_i …………………………………… (2.2)
To sum up, α̂, β̂, Ŷ_i and e_i are the sample counterparts of α, β, μ_i and U_i, respectively. It is possible to calculate α̂ and β̂ for a given sample, but the estimates will change from sample to sample. On the contrary, α and β are fixed, but unknown. Therefore, our major task now is to estimate the population regression function (PRF) on the basis of the sample regression function (SRF).
Fig. 2.3: Scatter plot diagram with the Population Regression Function (PRF) and the Sample Regression Function (SRF); the vertical deviations from the PRF and the SRF are the disturbance U_i and the residual e_i, respectively.
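The following short simulation is a minimal illustrative sketch (not part of the original example; the parameter values and income range are assumed for illustration) of the distinction drawn in Fig. 2.3: data are generated from a known PRF, and OLS applied to the sample yields an SRF that approximates, but does not coincide with, the PRF.

```python
# Minimal sketch: PRF vs. SRF (all numbers below are illustrative assumptions)
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 120.0, 0.6            # true (population) parameters of the PRF
X = rng.uniform(2000, 12000, 30)    # household income (Birr), hypothetical sample
U = rng.normal(0, 300, 30)          # stochastic disturbance with E(U) = 0
Y = alpha + beta * X + U            # observed consumption generated by the PRF

# OLS applied to this particular sample gives the SRF
x, y = X - X.mean(), Y - Y.mean()
beta_hat = (x * y).sum() / (x ** 2).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
print(f"PRF: E(Y) = {alpha} + {beta}X")
print(f"SRF: Y_hat = {alpha_hat:.2f} + {beta_hat:.3f}X  (changes from sample to sample)")
```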
This econometric model of consumption hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), with intercept α and slope coefficient β.
Such a model that describes the relationship between only two variables is called a simple linear regression model. The term linear regression implies that the dependent variable is linear in the parameters; it may or may not be linear in the explanatory variables.
Specifying the model is the first stage of any econometric application. The next step is the
estimation of the numerical values of the parameters of economic relationships. As far as methods
of estimation are concerned, the parameters of the simple linear regression model can be estimated
by various methods. Three of the most commonly used methods are:
1. Ordinary Least Square Method (OLS)
2. Method of Moments (MM)
3. Maximum Likelihood Method (MLM)
Because its estimators possess desirable properties (linearity, unbiasedness, and minimum variance), the OLS method is the most popular method for estimating regression parameters.
In this part, therefore, we shall estimate the two-variable PRF, Y_i = α + βX_i + U_i, by the method of Ordinary Least Squares (OLS). However, the PRF is not directly observable. Thus, we shall estimate it from the SRF: Y_i = α̂ + β̂X_i + e_i.
To estimate 𝜶 and 𝜷 we have to collect data on 𝒀, 𝑿 and 𝑼. Nonetheless, we cannot get data on U
as it is stochastic and can never be observed. Therefore, in order to estimate the parameters, we
should guess the values of 𝑼𝒊 , i.e., make some plausible assumptions about the shape and
distribution of U.
2.4.1. The Basic Assumptions of the Classical Linear Regression Analysis (OLS)
The method of OLS is attributed to Carl Friedrich Gauss, a German Mathematician. OLS is an
econometric method used to derive estimates of the parameters of economic relationships from
statistical observations. However, it works under some restrictive assumptions. The most important
of these assumptions are discussed below.
1. The regression model is linear in the parameters.
For example:
a. Y = β₀ + β₁X₁² + β₂√X₂ + U_i
is a linear model because the partial derivatives ∂Y/∂β_i (i = 0, 1, 2) are independent of the parameters β_i (i = 0, 1, 2).
b. Y = β₀ + β₁²X₁ + β₂ log X₂ + U_i
is a non-linear model because ∂Y/∂β₁ = 2β₁X₁ depends on β₁, although ∂Y/∂β₀ and ∂Y/∂β₂ are independent of the parameters.
c. Y = α + βX + U_i is linear in both the parameters and the variables, so it satisfies the assumption.
Note that the classicals assumed that the model should be linear in the parameters regardless of
whether the explanatory and the dependent variables are linear or not.
2. The mean value of the error term (U_i) is zero.
This means that if we could take repeated values of U for any given value of X, they would have an average value equal to zero; in other words, the positive and negative values of U cancel each other.
Mathematically,
E(U_i) = 0 ………………………………..…. (2.3)
The implication of this assumption is that the factors not explicitly included in the model, and therefore subsumed in U_i, do not systematically affect the mean value of Y; i.e., the positive U_i values cancel out the negative U_i values so that their average effect on Y is zero.
Given Y_i = α + βX_i + U_i, this assumption implies that: E(Y_i) = α + βX_i
3.The variance of the error term (𝑼𝒊 ) is constant across observations (the assumption of
homoscedasticity)
For all values of X, the U's will show the same dispersion around their mean. This implies that, given the value of X, the variance of U_i is the same (constant) for all observations.
Mathematically, Var(U_i) = E(U_i²) = σ² (a constant) ………..…. (2.4)
Note: this assumption implies that the values of Y corresponding to various values of X also have constant variance.
4. The error term U_i is normally distributed.
➢ The reason for this assumption is that if U is normally distributed, so will be Y and the estimated regression coefficients, and
✓ this will be useful in performing tests of hypotheses and constructing confidence intervals for α and β.
➢ This assumption is required mainly for hypothesis testing (inference).
5. The parameters of the model are stable over the sample period.
If this assumption is not adopted, the regression model would be very difficult to handle. In any case, it is usually acceptable to postulate that the model parameters are stable over time.
6. The random terms of different observations (U_i, U_j) are independent (i.e., no autocorrelation or serial correlation).
This means that the value which the random term assumes in one period does not depend on the value it assumed in any other period.
Algebraically,
Cov(U_i, U_j) = E[(U_i − E(U_i))(U_j − E(U_j))] = E(U_i U_j) = 0 …………………..…. (2.5), for i ≠ j
If errors are serially correlated, an increase in the error term in one time period affects the error term
in the next period. Autocorrelation is a serious problem in time series data. That is, the assumption
that there is no serial correlation can be unrealistic in time series.
8. Exogeneity: All explanatory variables are uncorrelated with the error term.
The explanatory variables are determined outside the model (they are exogenous). This means that
there is no correlation between the random variable and the explanatory variable(s).
What happens if this assumption is violated?
Suppose we have the model Y_i = β₀ + β₁X_i + U_i, and suppose X_i and U_i are positively correlated, i.e., when X_i is large, U_i tends to be large as well.
Why would X_i and U_i be correlated? Suppose you are trying to study the relationship between the price of a burger and the quantity sold across the restaurants of a given city, and you estimate the following model: Y_i = β₀ + β₁Price + U_i. The problem with this model, however, is that the quality of the burger differs across restaurants. Quality should therefore be included as an explanatory variable, but if you fail to do so it becomes part of U_i. Since price and quality are highly positively correlated, X_i and U_i are positively correlated, and β̂₁ will be biased upward. This is called omitted variable bias.
Thus, to obtain unbiased estimators, the exogeneity assumption is important. Algebraically, if two variables are unrelated, their covariance is zero.
𝐼. 𝑒. , 𝑪𝒐𝒗(𝑿𝒊 𝑼𝒊 ) = 𝟎 … … … … … … … … … … … … … … … . . … . 2.6
Proof:
Cov(X_i, U_i) = E[(X_i − E(X_i))(U_i − E(U_i))]
 = E[(X_i − E(X_i))U_i], given E(U_i) = 0
 = E(X_i U_i) − E(X_i)E(U_i)
 = E(X_i U_i), given E(U_i) = 0
 = X_i E(U_i) = 0, given that the X_i are fixed (non-stochastic) values.
∴ Cov(X_i, U_i) = 0
The explanatory variable X must also vary across observations; if, for example, all sampled households had the same income, we will not be able to explain much of the variation in the consumption expenditure of the households.
11. Absence of high multi-collinearity among explanatory variables (specific to Multiple
regression models)-(More in Chapter-3)
The explanatory variables are not perfectly correlated with each other. In other words, there is no
perfect linear relationship among the explanatory variables. This assumption however, does not
exclude non-linear relationships among the explanatory variables.
From (2.2), we know that the simple regression model in its stochastic form is Y_i = α̂ + β̂X_i + e_i. Estimation of α and β by the least squares (LS), or classical least squares (CLS), method involves finding values for the estimators α̂ and β̂ which minimize the sum of squared residuals (Σe_i²).
According to the least squares criterion, the best SRF (line) is the one obtained by minimizing the sum of squared prediction errors. The rationale of this criterion is straightforward: the smaller the deviations of the true values of Y from the SRF (line) at each value of X in the sample, the better the SRF fits the scatter of sample observations. Thus, the least squares criterion calls for the values of the parameters of a model to be determined in such a way as to minimize the sum of the squared deviations of the true values of Y from the SRF (line) at each value of X in the sample. The total squared prediction error of a regression line is obtained by adding the squared differences between the true values of Y and the estimated (predicted) values Ŷ at each value of X in the sample.
Symbolically,
Total squared prediction error: Σe_i² = Σ(Y_i − Ŷ_i)²
i.e.,
Total squared prediction error: Σe_i² = Σ(Y_i − α̂ − β̂X_i)² ………………. (2.9)
where Ŷ_i = α̂ + β̂X_i.
To find the values of α̂ and β̂ that minimize this sum, we partially differentiate Σe_i² with respect to α̂ and β̂ and set the partial derivatives equal to zero:
1. ∂Σe_i²/∂α̂ = −2Σ(Y_i − α̂ − β̂X_i) = 0 …………………………………….. (2.10)
2. ∂Σe_i²/∂β̂ = −2ΣX_i(Y_i − α̂ − β̂X_i) = 0 ………………………………….. (2.11)
Equations (2.10) and (2.11) are often called the first order conditions for the OLS estimates, a
term that comes from optimization using calculus. The name “ordinary least squares” comes
from the fact that these estimates minimize the sum of squared residuals.
Note at this point that the term in parentheses in equations (2.10) and (2.11) represents the residual, e_i = Y_i − α̂ − β̂X_i. Hence, it is possible to rewrite (2.10) and (2.11) as −2Σe_i = 0 and −2ΣX_i e_i = 0. It follows that:
Σe_i = 0 and ΣX_i e_i = 0 …………………………………… (2.12)
➢ If we rearrange equation (2.10), we obtain the first normal equation:
ΣY_i = nα̂ + β̂ΣX_i ………………………………..…… (2.13)
➢ If we rearrange equation (2.11), we obtain the second normal equation:
ΣY_iX_i = α̂ΣX_i + β̂ΣX_i² ……………………………. (2.14)
(Here we use the summation rules Σa = na, ΣaX_i = aΣX_i and Σ(X_i + Y_i) = ΣX_i + ΣY_i, with all sums running from i = 1 to n.)
Solving the normal equations of OLS simultaneously, we get expressions with which the optimal numerical values of α̂ and β̂ are determined according to the least squares criterion. Dividing both sides of (2.13) by n and solving for α̂, we get:
α̂ = Ȳ − β̂X̄ ……………………………………..…… (2.15)
Substituting (2.15) into (2.14) and solving for β̂ gives:
β̂ = (ΣY_iX_i − nX̄Ȳ)/(ΣX_i² − nX̄²) ………….……………………..…. (2.16)
Alternatively, equation (2.16) can be rewritten in deviation form as follows:
Σ(X_i − X̄)(Y_i − Ȳ) = Σ(Y_iX_i − ȲX_i − X̄Y_i + X̄Ȳ)
 = ΣY_iX_i − ȲΣX_i − X̄ΣY_i + nX̄Ȳ
 = ΣY_iX_i − nȲX̄ − nX̄Ȳ + nX̄Ȳ
Σ(X_i − X̄)(Y_i − Ȳ) = ΣY_iX_i − nȲX̄ ………………………… (2.17)
Σ(X_i − X̄)² = ΣX_i² − nX̄² …………………………………. (2.18)
Therefore, β̂ = Σ(X_i − X̄)(Y_i − Ȳ)/Σ(X_i − X̄)² = Σx_iy_i/Σx_i², where x_i = X_i − X̄ and y_i = Y_i − Ȳ denote deviations from the sample means.
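As an illustration of equations (2.15)–(2.18), the following sketch computes α̂ and β̂ with NumPy on a small set of hypothetical observations (the data are assumed, not taken from the text); it also checks that the raw-sums formula and the deviation form give the same slope.

```python
# Sketch: OLS estimators from equations (2.15)-(2.18), with assumed (hypothetical) data
import numpy as np

X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])   # income, hypothetical
Y = np.array([3.0, 4.0, 6.0, 7.0, 8.0, 9.0])    # consumption, hypothetical
n = len(X)

# Equation (2.16): slope from raw sums
beta_hat = (np.sum(Y * X) - n * X.mean() * Y.mean()) / (np.sum(X ** 2) - n * X.mean() ** 2)
# Equation (2.15): intercept
alpha_hat = Y.mean() - beta_hat * X.mean()

# Equivalent deviation form, equations (2.17)-(2.18)
x, y = X - X.mean(), Y - Y.mean()
beta_hat_dev = np.sum(x * y) / np.sum(x ** 2)

print(alpha_hat, beta_hat, beta_hat_dev)   # beta_hat equals beta_hat_dev
```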
Given consumption (Y) and income (X) data, both in thousands of Birr, for six households, obtain the OLS estimates of α and β.
Therefore, the fitted regression equation (i.e., the OLS regression line) is:
Ĉons_i = 1.53 + 0.607·inc_i
Note that this formula involves the actual values (observations) of the variables, and not their deviation forms as in the case of the unrestricted estimator of β̂.
Consider a supply function where firms will not produce any amount of the commodity if the market price is zero. A preliminary analysis of cross-sectional data based on a sample of 50 firms provides the following intermediate results: ΣQ_iP_i = 7500, ΣP_i² = 1000, P̄ = 15 and Q̄ = 150.
β̂ = ΣY_iX_i/ΣX_i² = ΣQ_iP_i/ΣP_i² = 7500/1000 = 7.5
Therefore, the estimated regression line is: Q̂S = 7.5P
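A minimal sketch of the computation in this example, using only the intermediate sums quoted above:

```python
# Sketch of the restricted (no-intercept) estimator used in the supply example
sum_QP = 7500.0   # sum of Q_i * P_i, from the example
sum_P2 = 1000.0   # sum of P_i^2, from the example

beta_hat = sum_QP / sum_P2          # uses actual values, not deviations, as noted above
print(f"Estimated supply function: Q_hat = {beta_hat} * P")   # 7.5
```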
a) Linear Model
The coefficient measures the effect of the regressor X on Y. Let us look at this in detail. Consider the
sample regression line of a simple linear regression model (SLRM):
Ŷ_i = α̂ + β̂X_i ……………………………….. (a)
Taking the first-order differential of both sides of (a) and rearranging, we obtain:
dŶ_i = β̂ dX_i,  i.e.,  dŶ_i/dX_i = β̂ ………………… (b)
Therefore, β̂ is the change in Y (in the units in which Y is measured) for a one-unit change in X (in the units in which X is measured).
For instance, in our consumption example above the fitted linear model was
Ĉons_i = 1.53 + 0.607·inc_i.
Therefore, β̂ = 0.607 can be interpreted as follows: if income increases by 1 thousand Birr, consumption will, on average, increase by 0.607 thousand Birr.
➢ The linearity of such a model implies that a one-unit change in X always has the same effect on
Y, regardless of the value of X considered.
b) Log-Linear Model
Suppose the underlying relationship between two variables (X and Y) is given as: Y_i = e^(α + βX_i + U_i).
By taking natural logs of both sides of the above model, we obtain the following log-linear model:
ln(Y_i) = α + βX_i + U_i …………………………………… (c)
The corresponding sample regression function to (c) is:
ln(Ŷ_i) = α̂ + β̂X_i …………………………………… (d)
The slope coefficient in this model measures the relative change in Y for a given absolute change in X.
I.e., β̂ = (relative change in the dependent variable)/(absolute change in the explanatory variable) = (ΔY/Y)/ΔX
If we multiply the relative change in Y by 100, the slope in (d) gives the percentage change, or growth rate, in Y for a unit absolute change in the explanatory variable; i.e., 100·β̂ gives the growth rate in Y.
In other words, taking first differences in (d) and then multiplying both sides by 100, we obtain
100·d ln(Ŷ_i)% = (100·β̂)%·dX_i
Therefore, if X increases by 1 unit, then Ŷ will change by (100·β̂)%.
Where, EDUC (education) is measured in years of schooling and Wage is hourly wage in ETB.
c) Linear-Log Model
A linear-Log model is a function where the dependent variable is defined as a linear function of the
logarithm of the explanatory variable as shown below:
𝒀𝒊 = 𝜶 + 𝜷𝒍𝒏𝑿𝒊 + 𝑼𝒊 … … … … … … … … … … . . (𝒇)
The corresponding fitted function is the following:
Ŷ_i = α̂ + β̂ ln X_i ……………………………………... (g)
The slope coefficient in this model measures the absolute change in Y for a given relative change in X.
I.e., β̂ = (absolute change in Y)/(relative change in X) = ΔY/(ΔX/X),  or  ΔY = β̂·(ΔX/X)
Therefore, taking first differences in (g), and then multiplying and dividing the right-hand side by 100, we have
ΔŶ_i = (β̂/100)·(100·ΔX_i/X_i) = (β̂/100)·(%ΔX_i)
Therefore, if X increases by 1%, then Ŷ will increase by (β̂/100) units.
➢ And the elasticity of Y with respect to X is given by:
E_{Y,X} = (dY/dX)·(X/Y) = (dY/d ln X)·(1/Y) = β̂·(1/Y) …………………… (h)
NB: d ln X = dX/X
Numerical Example-5: Estimation of the quantity demanded function for coffee: Log-Log Model
Suppose that, to examine the effect of the price of coffee on the quantity demanded of coffee, an investigator specified the following log-log model:
ln(QDCoff) = α + β ln(Coffprice) + U_i
The fitted log-log model is given as:
ln(Q̂Dcoff) = 3.5 − 5.25 ln(coffprice)
➢ Interpretation of the coefficient β̂: if the price of coffee increases by 1%, the quantity demanded of coffee will decrease by 5.25%. In this case, β̂ represents the estimated price elasticity of demand.
Summary: Interpretation of β̂ in Different Models
Model      | If X increases by | Then Y will change by
Linear     | 1 unit            | β̂ units
Linear-Log | 1%                | (β̂/100) units
Log-Linear | 1 unit            | (β̂·100)%
Log-Log    | 1%                | β̂%
Homework-1
1) Suppose you are interested in examining the effect of advertisement expenditure on the sales volume of firms producing and supplying a particular product. To this end, you have specified a simple linear regression model described as Y_i = α + βX_i + U_i, and collected the following cross-sectional data on six firms.
5 4
4.5 2
9.5 5
a) Estimate 𝜶 and 𝜷 by using the principle of least squares and interpret the results.
b) Estimate the average elasticity of sales volume to advertisement expenditure?
2) Assume that the data on sales and advertisement given above is in natural logs,
a) Specify your double log regression model, estimate the coefficients of your model by OLS
and interpret the results.
b) Estimate the elasticity of sales volume to advertisement expenditure?
2.6.Decomposition of the Variation of Y & “Goodness of Fit” of an Estimated Model
We have obtained the estimators minimizing the sum of squared residuals. Once the estimation has
been done, we can see how well our sample regression line fits our data. In other words, we can
measure how well the explanatory variable, X, explains the dependent variable, Y. It is often useful
to compute a number that summarizes how well the OLS regression line fits the data. The measures that indicate how well the sample regression line fits the data are called goodness-of-fit measures.
➢ Recall that
𝑌𝑖 = 𝛼̂ + 𝛽̂ 𝑋𝑖 + 𝑒𝑖 … … … … … … … … … … … … (2.2)
Where, 𝑌̂𝑖 = 𝛼̂ + 𝛽̂ 𝑋𝑖
𝑌𝑖 = 𝑌̂𝑖 + 𝑒𝑖 …………………………………………… (2.21)
Summing (2.21) over the sample observations gives:
ΣY_i = ΣŶ_i + Σe_i
ΣY_i = ΣŶ_i, since Σe_i = 0
Dividing both sides of the above by n gives
ΣY_i/n = ΣŶ_i/n  ⇒  Ȳ equals the mean of the fitted values …………………… (2.22)
Σŷ_i e_i = β̂[Σx_iy_i − β̂Σx_i²] = β̂[Σx_iy_i − Σx_iy_i] = β̂·0 = 0, using β̂Σx_i² = Σx_iy_i
∴ Σŷ_i e_i = 0
➢ Therefore, equation (2.25) becomes
Σy_i² = Σŷ_i² + Σe_i² ……………….. (2.28)
Or, Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)²
[Total Variation] = [Explained Variation] + [Unexplained Variation]
In words,
Total Sum of Squares (TSS) = Explained Sum of Squares (ESS) + Residual Sum of Squares (RSS)
In econometrics, the ‘goodness of fit’ of a model is measured by a statistical index called coefficient
of determination (𝑹𝟐 ).
Definition
➢ The coefficient of determination (R²) is the proportion of the total variation of Y which is explained by the variation of the explanatory variable (X) included in the model.
Coefficient of Determination (R²) = (Explained variation in Y)/(Total variation in Y)
 = (Explained Sum of Squares)/(Total Sum of Squares) = ESS/TSS = Σŷ_i²/Σy_i² ………………. (2.29)
The notion behind this index is straightforward: a model fits well if the explanatory variable (X) included in the model accounts for a large part of the variation in the actual values of Y. But if X is an irrelevant variable, the model would explain no part of the variation in the actual values of Y.
Equation (2.28), 𝒊. 𝒆. , 𝑻𝑺𝑺 = 𝑬𝑺𝑺 + 𝑹𝑺𝑺 shows that the total variation in the observed Y values
about their mean value can be partitioned into two parts, one attributable to the regression line and
the other to random forces because not all actual Y observations lie on the fitted line.
Geometrically, we have the following figure.
From equation (2.26), we have ŷ_i = β̂x_i.
Squaring and summing both sides gives us: Σŷ_i² = β̂²Σx_i² …………………… (2.30)
We can substitute (2.30) in (2.29) to obtain:
R² = ESS/TSS = β̂²Σx_i²/Σy_i² …………………………………………….. (2.31)
 = (Σx_iy_i/Σx_i²)²·(Σx_i²/Σy_i²), since β̂ = Σx_iy_i/Σx_i²
 = (Σx_iy_i)²Σx_i²/[(Σx_i²)²Σy_i²] = (Σx_iy_i)²/(Σx_i²Σy_i²)
R² = [Σx_iy_i/√(Σx_i²Σy_i²)]² ………………………………………….. (2.32)
But we know that the term Σx_iy_i/√(Σx_i²Σy_i²) is the formula of the correlation coefficient, r. For this reason, in the two-variable model, R² is simply the square of the sample correlation coefficient between Y and X, i.e., R² = r²_{YX}.
In the regression context, 𝒓𝟐𝒀𝑿 is a more meaningful measure than r, for the former tells us the
proportion of variation in the dependent variable explained by the explanatory variable(s) and
therefore provides an overall measure of the extent to which the variation in one variable determines
the variation in the other. The latter does not have such value.
∴ R² = (Σy_ix_i)²/(Σy_i²Σx_i²) ………………………………………….. (2.33)
Alternatively, we know that RSS = TSS − ESS; hence R² becomes:
R² = (TSS − RSS)/TSS = 1 − RSS/TSS
∴ R² = 1 − Σe_i²/Σy_i² …………………………………………………….. (2.34)
Given that Σe_i² = Σy_i² − Σŷ_i², and Σŷ_i² = β̂²Σx_i²,
∴ Σe_i² = Σy_i² − β̂²Σx_i²  or  Σe_i² = Σy_i² − β̂Σy_ix_i
Therefore, (2.34) can be rewritten as R² = 1 − (Σy_i² − β̂²Σx_i²)/Σy_i².
➢ R² measures the part of the total variation in Y which is explained by the model. As a result, it is used as an indicator of the explanatory power, or goodness of fit, of a model: it shows the proportion of the total variation in Y that is attributable to the variation in X.
Interpretation of R²
For example, if R² is 0.9, this would mean that 90% of the total variation in the true values of Y is explained by the model and the remaining 10% of the total variation in Y is not explained by the model. Equivalently, it would mean that 90% of the total variation in the values of Y is explained or determined (caused) by the variation in the values of X. Therefore, the model has a good fit.
R² = 1 − Σe_i²/Σy_i²
➢ Note that Σe_i²/Σy_i² is the proportion of the variation of the Y's around their mean, Ȳ, that remains unexplained.
➢ If all the observations lie on the regression line Ŷ = α̂ + β̂X_i, there will be no scatter of points around the line. In other words, the total variation of Y is explained completely by the estimated regression line, and consequently there will be no unexplained variation, i.e., Σe_i²/Σy_i² = 0 and hence R² = 1.
➢ On the other hand, if the regression line explains only part of the variation in Y, there will be some unexplained variation, i.e., Σe_i²/Σy_i² > 0. Therefore, R² < 1.
➢ Finally, if the regression line does not explain any part of the variation of Y, then Σy_i² = Σe_i², so Σe_i²/Σy_i² = 1 and hence R² = 0. Thus, 0 ≤ R² ≤ 1.
The Relationship between R² and β̂
R² = (Σy_ix_i)(Σy_ix_i)/(Σy_i²Σx_i²)
⇒ R² = [Σy_ix_i/Σx_i²]·[Σy_ix_i/Σy_i²], since β̂ = Σy_ix_i/Σx_i²
⇒ R² = β̂·[Σy_ix_i/Σy_i²]
⇒ β̂ = R²·[Σy_i²/Σy_ix_i]
❑ Example: 2.5:
➢ Based on the consumption model estimated in example 2.1 above.
a) Find the total variation (TSS), explained variation (ESS) and Unexplained Variation
(RSS)?
b) Compute the coefficient of determination (𝑅 2 ) and interpret the result.
❑ Solution:
a) Recall that Σy_i² = Σŷ_i² + Σe_i².
From equation (2.26), we know that ŷ_i = β̂x_i,
∴ Σŷ_i² = β̂²Σx_i² and Σe_i² = Σy_i² − β̂²Σx_i²
➢ TSS = Σy_i² = 32 (from Table 2.1)
➢ ESS = Σŷ_i² = β̂²Σx_i² = (0.607)²(84) = 30.95, and
➢ RSS = TSS − ESS = 32 − 30.95 = 1.05
b) R² = ESS/TSS = 30.95/32 ≈ 0.967. That is, about 96.7% of the total variation in consumption expenditure is explained by the variation in income, so the model fits the data very well.
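A short sketch reproducing the computations of Example 2.5 from the intermediate sums quoted there (TSS = 32, Σx_i² = 84, β̂ = 0.607):

```python
# Sketch of the variation decomposition in Example 2.5
beta_hat = 0.607
sum_x2 = 84.0      # sum of squared deviations of income, from the example
TSS = 32.0         # total variation in consumption, from Table 2.1

ESS = beta_hat ** 2 * sum_x2        # explained variation, equation (2.30)
RSS = TSS - ESS                     # unexplained variation
R2 = ESS / TSS                      # coefficient of determination, equation (2.29)
print(f"ESS = {ESS:.2f}, RSS = {RSS:.2f}, R^2 = {R2:.3f}")
```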
To evaluate an estimated econometric model, there are three criteria: the economic criterion, the statistical criterion (first-order tests) and the econometric criterion (second-order tests).
In the coming sections, therefore, we shall discuss the econometric and statistical criteria of model evaluation, in that order.
2.7.1. Econometric Criterion: Statistical Desirable Properties of OLS Estimators and the
Gauss-Markov Theorem
There are various econometric methods with which we may obtain the estimates of the parameters of
economic relationships. We would like an estimate to be as close as possible to the value of the true population parameter, i.e., to vary within only a small range around the true parameter. How are we to choose
among the different econometric methods, the one that gives ‘good’ estimates? We need some
criteria for judging the ‘goodness’ of an estimate.
The goodness of an estimate is judged based on how close the estimate is to the true population
parameter it represents. Therefore, the closeness of an estimator is the major criterion based on
which the goodness of an estimator is judged in econometric application.
‘Closeness’ of the estimate to the population parameter is measured by the mean and variance or
standard deviation of the sampling distribution of the estimates of the different econometric
methods. We assume the usual process of repeated sampling i.e., we assume that we get a very large
number of samples each of size ‘n’; we compute the estimates 𝜷 ̂ ′𝑠 from each sample, and for each
econometric method and we form their distribution. We next compare the mean (expected value) and
the variances of these distributions and we choose among the alternative estimates the one whose
distribution is concentrated as close as possible around the population parameter.
There are traditional criteria based on which the closeness of an estimate to the population parameter
can be determined. These are called desirable properties of Estimators (or estimates).
➢ Desirable properties of estimators fall into two categories:
1. Finite (small) sample properties of estimators, and
2. Large sample (asymptotic) properties of estimators.
C. Efficiency
D. Linearity
E. Minimum mean square error (MMSE)
F. Best, linear, unbiased (BLU)
A. Unbiased Estimator
According to this criterion a good estimator is one that produces an unbiased estimate. An estimate
is said to be unbiased if its bias is zero. The bias of an estimate is defined by the difference between
the expected value of the estimate and the value of the population parameter.
That is,
Bias of an estimator = E(β̂) − β
Thus, an estimator β̂ is unbiased if
E(β̂) = β ……………………………………. (2.35)
C. Efficient Estimator
𝑬𝒇𝒇𝒊𝒄𝒊𝒆𝒏𝒄𝒚 = 𝑼𝒏𝒃𝒊𝒂𝒔𝒆𝒅𝒏𝒆𝒔𝒔 + 𝑴𝒊𝒏𝒊𝒎𝒖𝒎 𝑽𝒂𝒓𝒊𝒂𝒏𝒄𝒆
Efficiency is a desirable statistical property because from two unbiased estimators of the same
population parameter, we prefer the one that has the smaller variance, i.e., the one that is statistically
more precise.
➢ Let β̂ and β̃ be two unbiased estimators of the population parameter β, such that E(β̂) = β and E(β̃) = β. Then the estimator β̂ is efficient relative to β̃ if the variance of the finite-sample distribution of β̂ is less than that of β̃;
▪ i.e., if Var(β̂) ≤ Var(β̃) for all finite n, where E(β̂) = β and E(β̃) = β.
Note: Both estimators β̂ and β̃ must be unbiased, since the efficiency property refers only to the variances of unbiased estimators.
The mean square error of an estimator is defined as MSE(β̂) = E(β̂ − β)². The difference between the two measures is that Var(β̂) measures the dispersion of the distribution of β̂ around its mean, whereas MSE(β̂) measures the dispersion around the true value of the parameter.
MSE(β̂) = E(β̂ − β)² = E[(β̂ − E(β̂)) + (E(β̂) − β)]²
 = E[β̂ − E(β̂)]² + E[E(β̂) − β]² + 2E[(β̂ − E(β̂))(E(β̂) − β)]
But E[β̂ − E(β̂)]² is the variance of the sampling distribution of β̂, [E(β̂) − β]² is the square of the bias of β̂, and 2E[(β̂ − E(β̂))(E(β̂) − β)] = 0.
✓ This is because 2E[(β̂ − E(β̂))(E(β̂) − β)] = 2(E(β̂) − β)·E[β̂ − E(β̂)] = 2(E(β̂) − β)·[E(β̂) − E(β̂)] = 0.
NB: E[E(β̂) − β]² = [E(β̂) − β]² and E[E(β̂)] = E(β̂), since the expected value of a constant is the constant itself.
➢ Therefore,
MSE(β̂) = Var(β̂) + Bias²(β̂) ……………………….. (2.38)
Equation 2.38 shows that MSE criterion is a comprehensive criterion which considers both
unbiasedness and minimum variance criterion. Thus, by using MSE criterion we can make choice
between unbiased and best (minimum variance) estimators. In this case, if the comparison is between
unbiased and best estimators then, the one with minimum MSE must be selected.
Example: Suppose we have two estimates, β̂ and β*, of a population parameter β obtained from two different econometric methods, say OLS and maximum likelihood, respectively. Suppose that β̂ is unbiased and has a variance of 9, while E(β*) = 4 and its variance is 8. If E(β̂) = 6 (so that, β̂ being unbiased, the true value is β = 6), which econometric method would you select?
➢ Solution:
MSE(β̂) = Var(β̂) + Bias²(β̂) = 9 + 0 = 9 (since β̂ is unbiased)
MSE(β*) = Var(β*) + Bias²(β*) = 8 + (4 − 6)² = 12
Since MSE(β̂) < MSE(β*), β̂ has the smaller MSE, and OLS is preferred to maximum likelihood for estimating the model.
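A minimal sketch of this MSE comparison using equation (2.38); the numbers are those of the example above:

```python
# Sketch: MSE = variance + bias^2, equation (2.38)
def mse(variance: float, estimator_mean: float, true_beta: float) -> float:
    """Mean square error of an estimator around the true parameter value."""
    bias = estimator_mean - true_beta
    return variance + bias ** 2

true_beta = 6.0                                                          # beta_hat is unbiased, E(beta_hat) = 6
mse_ols = mse(variance=9.0, estimator_mean=6.0, true_beta=true_beta)     # 9 + 0 = 9
mse_ml  = mse(variance=8.0, estimator_mean=4.0, true_beta=true_beta)     # 8 + (-2)^2 = 12
print(mse_ols, mse_ml)   # the OLS estimator has the smaller MSE
```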
E. Linear Estimator
An estimator is linear if it is a linear function of the dependent variable of the model. Linearity of an
estimate does not determine the closeness of an estimate to the true population parameter. However,
it is desirable in econometric analysis because it is useful to deduce the probability distribution of an
estimate from the known probability distribution of the dependent variable of a model.
The Gauss-Markov theorem (GMT) can be stated as follows: “Given the assumptions of the classical linear regression model, the OLS estimators, in the class of linear and unbiased estimators, have minimum variance; i.e., the OLS estimators are BLUE.” According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased and have minimum variance (i.e., they are the best of all linear unbiased estimators).
a. Linearity: The OLS estimator β̂ is a linear function of the observed Y values.
β̂ = Σx_iY_i/Σx_i². Now, let k_i = x_i/Σx_i² (i = 1, 2, ….., n).
∴ β̂ = Σk_iY_i …………………………………………………………….. (2.39)
Since the k_i are fixed constants (the X's are assumed non-stochastic), β̂ is a linear function of the Y_i and hence a linear estimator.
b. Unbiasedness: Prove that α̂ and β̂ are unbiased estimators of the true parameters α and β.
To show that α̂ and β̂ are unbiased estimators of their respective parameters means to prove that:
E(β̂) = β, and E(α̂) = α
Proof (1): Prove that β̂ is an unbiased estimator of β.
We know that β̂ = Σk_iY_i = Σk_i(α + βX_i + U_i)
β̂ = αΣk_i + βΣk_iX_i + Σk_iU_i
But Σk_i = 0 and Σk_iX_i = 1:
Σk_i = Σ(x_i/Σx_i²) = Σx_i/Σx_i² = Σ(X_i − X̄)/Σx_i² = (ΣX_i − nX̄)/Σx_i² = (nX̄ − nX̄)/Σx_i² = 0
∴ Σk_i = 0
Σk_iX_i = Σx_iX_i/Σx_i² = Σ(X_i − X̄)X_i/Σx_i² = (ΣX_i² − X̄ΣX_i)/(ΣX_i² − nX̄²) = (ΣX_i² − nX̄²)/(ΣX_i² − nX̄²) = 1
∴ Σk_iX_i = 1
β̂ = β + Σk_iU_i  ⇒  β̂ − β = Σk_iU_i ……………………………… (2.40)
E(β̂) = β + Σk_iE(U_i); since the X_i are assumed to be non-stochastic (fixed), the k_i are non-stochastic too.
∴ E(β̂) = β, since E(U_i) = 0
Therefore, β̂ is an unbiased estimator of β.
Proof (2): Prove that α̂ is an unbiased estimator of α.
From the proof of the linearity property, we know that: α̂ = Σ(1/n − X̄k_i)Y_i
α̂ = Σ[(1/n − X̄k_i)(α + βX_i + U_i)], since Y_i = α + βX_i + U_i
 = α + β(1/n)ΣX_i + (1/n)ΣU_i − αX̄Σk_i − βX̄Σk_iX_i − X̄Σk_iU_i
 = α + βX̄ + (1/n)ΣU_i − βX̄ − X̄Σk_iU_i
 = α + (1/n)ΣU_i − X̄Σk_iU_i
α̂ − α = (1/n)ΣU_i − X̄Σk_iU_i = Σ(1/n − X̄k_i)U_i ……………………….. (2.41)
E(α̂) = α + (1/n)ΣE(U_i) − X̄Σk_iE(U_i)
∴ E(α̂) = α
Therefore, α̂ is an unbiased estimator of α.
c. Minimum variance of α̂ and β̂
Now, we have to establish that, out of the class of linear and unbiased estimators of α and β, the OLS estimators α̂ and β̂ possess the smallest sampling variances. For this, we shall first obtain the variances of α̂ and β̂ and then establish that each has the minimum variance in comparison with the variances of other linear and unbiased estimators obtained by econometric methods other than OLS.
a. Variance of β̂
Var(β̂) = E[β̂ − E(β̂)]² = E[β̂ − β]² …………………………………… (2.42)
Substituting (2.40) in (2.42), we obtain:
Var(β̂) = E(Σk_iU_i)²
 = E[k₁²u₁² + k₂²u₂² + … + k_n²u_n² + 2k₁k₂u₁u₂ + … + 2k_{n−1}k_n u_{n−1}u_n]
 = Σk_i²E(U_i²) + 2Σk_ik_jE(U_iU_j) = σ²Σk_i², since E(U_i²) = σ² and E(U_iU_j) = 0 for i ≠ j
∴ Var(β̂) = σ²/Σx_i² ……………………… (2.43), since Σk_i² = Σx_i²/(Σx_i²)² = 1/Σx_i²
b. Variance of α̂
Var(α̂) = E[α̂ − E(α̂)]² = E[α̂ − α]² ……………………………………….. (2.44)
Substituting equation (2.41) in (2.44), we get:
Var(α̂) = E[Σ(1/n − X̄k_i)U_i]²
 = Σ(1/n − X̄k_i)²E(U_i²), since the cross-product terms vanish because E(U_iU_j) = 0 for i ≠ j
⇒ Var(α̂) = σ²Σ(1/n − X̄k_i)²
 = σ²Σ(1/n² + X̄²k_i² − (2/n)X̄k_i)
⇒ Var(α̂) = σ²(1/n + X̄²Σk_i² − (2/n)X̄Σk_i)
 = σ²(1/n + X̄²Σk_i²), since Σk_i = 0
 = σ²(1/n + X̄²/Σx_i²), since Σk_i² = Σx_i²/(Σx_i²)² = 1/Σx_i²
Moreover, 1/n + X̄²/Σx_i² = (Σx_i² + nX̄²)/(nΣx_i²) = ΣX_i²/(nΣx_i²)
∴ Var(α̂) = σ²·ΣX_i²/(nΣx_i²) ……………………………………… (2.45)
We have now computed the variances of the OLS estimators. It is time to check whether these OLS estimators possess the minimum variance property compared with other linear and unbiased estimators of the true α and β, other than α̂ and β̂.
To establish that α̂ and β̂ possess the minimum variance property, we compare their variances with the variances of some other alternative linear and unbiased estimators of α and β, say α̃ and β̃. That is, we want to prove that any other linear and unbiased estimator of the true population parameters obtained from any other econometric method has a larger variance than the OLS estimators.
➢ Let us first show the minimum variance of β̂ and then that of α̂.
1. Minimum variance of β̂
Suppose β̃ is an alternative linear and unbiased estimator of β, defined as follows:
β̃ = Σw_iY_i ……………………………… (2.46)
where w_i ≠ k_i, but w_i = k_i + c_i
β̃ = Σw_i(α + βX_i + U_i), since Y_i = α + βX_i + U_i
β̃ = αΣw_i + βΣw_iX_i + Σw_iU_i
E(β̃) = αΣw_i + βΣw_iX_i + Σw_iE(U_i)
∴ E(β̃) = αΣw_i + βΣw_iX_i, since E(U_i) = 0
Therefore, for β̃ to be an unbiased estimator of β, it must be true that
Σw_i = 0 and Σw_iX_i = 1
But also, w_i = k_i + c_i:
Σw_i = Σ(k_i + c_i) = Σk_i + Σc_i
Therefore, Σc_i = 0, since Σk_i = Σw_i = 0
Again, Σw_iX_i = Σ(k_i + c_i)X_i = Σk_iX_i + Σc_iX_i
Since Σw_iX_i = 1 and Σk_iX_i = 1 ⇒ Σc_iX_i = 0
➢ From these values, we can derive Σc_ix_i = 0, where x_i = (X_i − X̄):
Σc_ix_i = Σc_i(X_i − X̄) = Σc_iX_i − X̄Σc_i
Since Σc_iX_i = 0 and Σc_i = 0 ⇒ Σc_ix_i = 0
To check whether β̂ has the minimum variance, let us compute Var(β̃) and compare it with Var(β̂):
Var(β̃) = Var(Σw_iY_i)
 = Σw_i²Var(Y_i) = σ²Σw_i² = σ²Σ(k_i + c_i)² = σ²(Σk_i² + Σc_i² + 2Σk_ic_i) = σ²Σk_i² + σ²Σc_i², since Σk_ic_i = Σc_ix_i/Σx_i² = 0
∴ Var(β̃) = Var(β̂) + σ²Σc_i²
Unless all the c_i are zero (in which case β̃ coincides with β̂), σ²Σc_i² is a positive magnitude, i.e., greater than zero. Thus, Var(β̃) > Var(β̂). This proves that β̂ possesses the minimum variance property. In a similar way, we can prove that the least squares estimator of the intercept (α̂) possesses minimum variance.
2. Minimum variance of α̂
We take a new estimator α̃, which we assume to be an alternative linear and unbiased estimator of α. The least squares estimator α̂ is given by:
α̂ = Σ(1/n − X̄k_i)Y_i
By analogy, the alternative linear and unbiased estimator can be written as α̃ = Σ(1/n − X̄w_i)Y_i, with w_i = k_i + c_i as before. Its variance is:
Var(α̃) = Σ(1/n − X̄w_i)²Var(Y_i)
 = σ²Σ(1/n − X̄w_i)²
 = σ²Σ(1/n² + X̄²w_i² − (2/n)X̄w_i)
 = σ²(n/n² + X̄²Σw_i² − 2X̄(1/n)Σw_i)
 = σ²(1/n + X̄²Σw_i²), since Σw_i = 0
Using Σw_i² = Σk_i² + Σc_i² (because Σk_ic_i = 0), we obtain:
Var(α̃) = σ²(1/n + X̄²/Σx_i²) + σ²X̄²Σc_i²
Var(α̃) = σ²·ΣX_i²/(nΣx_i²) + σ²X̄²Σc_i², but σ²·ΣX_i²/(nΣx_i²) = Var(α̂)
∴ Var(α̃) = Var(α̂) + σ²X̄²Σc_i²
⇒ Var(α̃) > Var(α̂), since σ²X̄²Σc_i² > 0
Therefore, we have proved that the least squares estimators α̂ and β̂ are best, linear and unbiased (BLU) estimators.
A. Asymptotic unbiasedness
An estimator β̂ is an asymptotically unbiased estimator of β if:
AE(β̂) = lim (n→∞) E(β̂_n) = β
In other words, β̂ is an asymptotically unbiased estimator of β if its asymptotic bias is zero. The asymptotic bias of β̂ is the difference between the asymptotic mean of β̂ and the true value of the population parameter β. That is:
Asymptotic bias of β̂ = AE(β̂_n) − β = lim (n→∞) E(β̂_n) − β
Thus, roughly speaking, the bias of an asymptotically unbiased estimator vanishes as the sample size gets sufficiently large. An unbiased estimator is also asymptotically unbiased, but the converse is not always true.
B. Consistency
An estimator is a consistent estimator if it results in a consistent estimate. An estimator β̂ of β is consistent if the following two conditions are met simultaneously:
1. β̂ is asymptotically unbiased, i.e., lim (n→∞) E(β̂) = β
2. The distribution of β̂ degenerates (collapses onto β), i.e., lim (n→∞) Var(β̂) = lim (n→∞) E[β̂ − E(β̂)]² = 0
A consistent estimate converges to the population parameter as the sample size gets sufficiently
large. For consistency property to hold, the bias and variance of the estimator/estimate both should
tend to zero as the sample size increases indefinitely. Thus, this criterion assumes that with larger
and larger samples which contain more information we will be able to obtain an increasingly
accurate estimate of the population parameter.
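The consistency idea can be illustrated with a small Monte Carlo sketch (the population parameter values below are assumed purely for illustration): as n grows, the simulated mean of the OLS slope stays at β while its simulated variance shrinks toward zero.

```python
# Sketch: consistency of the OLS slope, shown by simulation with assumed parameter values
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma = 2.0, 0.5, 1.0   # assumed population values

def ols_slope(n: int) -> float:
    X = rng.uniform(0, 10, n)
    Y = alpha + beta * X + rng.normal(0, sigma, n)
    x, y = X - X.mean(), Y - Y.mean()
    return np.sum(x * y) / np.sum(x ** 2)

for n in (10, 100, 1000, 10000):
    draws = np.array([ols_slope(n) for _ in range(500)])
    print(n, round(draws.mean(), 3), round(draws.var(), 5))  # mean -> beta, variance -> 0
```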
C. Asymptotic Efficiency
An estimator β̂ is asymptotically efficient if the following conditions are met simultaneously:
1. β̂ is consistent. That is:
lim (n→∞) E(β̂_n) = β and lim (n→∞) Var(β̂_n) = lim (n→∞) E[β̂_n − E(β̂_n)]² = 0
2. β̂ has the smallest asymptotic variance as compared with any other consistent estimator β̃ obtained from any other method.
b. E[Σ(U_i − Ū)²] = E(ΣU_i² − (ΣU_i)²/n) = ΣE(U_i²) − (1/n)E(ΣU_i)²
 = nσ²_u − (1/n)E(U₁ + U₂ + ….. + U_n)², since E(U_i²) = σ²_u
 = nσ²_u − (1/n)E(ΣU_i² + 2ΣU_iU_j), i ≠ j
 = nσ²_u − (1/n)(nσ²_u) − (2/n)ΣE(U_iU_j)
 = nσ²_u − σ²_u = (n − 1)σ²_u, since E(U_iU_j) = 0
c. −2E[(β̂ − β)Σx_i(U_i − Ū)] = −2E[(β̂ − β)(Σx_iU_i − ŪΣx_i)]
 = −2E[(β̂ − β)(Σx_iU_i)], since Σx_i = 0
 = −2E[(Σx_iU_i/Σx_i²)(Σx_iU_i)], since β̂ − β = Σk_iU_i and k_i = x_i/Σx_i²
 = −2E[(Σx_iU_i)²/Σx_i²] = −2E[(Σx_i²U_i² + 2Σx_ix_jU_iU_j)/Σx_i²]
 = −2[Σx_i²E(U_i²)/Σx_i² + 2Σx_ix_jE(U_iU_j)/Σx_i²], i ≠ j
 = −2σ²_u, since E(U_i²) = σ²_u and E(U_iU_j) = 0
∴ −2E[(β̂ − β)Σx_i(U_i − Ū)] = −2σ²_u …………………. (2.54)
Consequently, equation (2.51) can be written in terms of (2.52), (2.53) and (2.54) as follows:
E(Σe_i²) = (n − 2)σ²_u
Therefore, if we define
σ̂²_u = Σe_i²/(n − 2)
its expected value is
E(σ̂²_u) = [1/(n − 2)]·E(Σe_i²) = σ²_u ……………………………………… (2.56)
Therefore, σ̂²_u is an unbiased estimator of the true variance of the error term (σ²_u).
The conclusion that we can draw from the above proof is that we can substitute σ̂²_u = Σe_i²/(n − 2) for σ²_u in the variance expressions of α̂ and β̂, since E(σ̂²_u) = σ²_u.
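The following sketch (with assumed, hypothetical data) puts the pieces together: it estimates σ̂²_u = Σe_i²/(n − 2) as in (2.56) and substitutes it into the variance formulas (2.43) and (2.45) to obtain the standard errors of α̂ and β̂.

```python
# Sketch: estimated error variance and standard errors, equations (2.43), (2.45), (2.56)
import numpy as np

X = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])   # hypothetical data
Y = np.array([3.0, 4.0, 6.0, 7.0, 8.0, 9.0])
n = len(X)

x = X - X.mean()
beta_hat = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
e = Y - (alpha_hat + beta_hat * X)              # residuals

sigma2_hat = np.sum(e ** 2) / (n - 2)           # unbiased estimator of sigma^2, eq. (2.56)
var_beta_hat = sigma2_hat / np.sum(x ** 2)      # eq. (2.43) with sigma^2 replaced by its estimate
var_alpha_hat = sigma2_hat * np.sum(X ** 2) / (n * np.sum(x ** 2))   # eq. (2.45)
print(np.sqrt(var_alpha_hat), np.sqrt(var_beta_hat))   # SE(alpha_hat), SE(beta_hat)
```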
2.7.2. Statistical Inference: Statistical Test of Significance of OLS Estimators (First Order Tests)
In this section, we shall develop statistical criteria for the evaluation of an estimated model.
Statistical criteria are developed based on statistical and probability theories. The application of
statistical criteria to judge on the goodness of a model is known as tests of the statistical
significance (TSS) or first order tests of a model.
Testing the statistical significance of the parameter estimates means determining whether the sample values of the parameter estimates are statistically different from zero or not. TSS of a model is mandatory, because specification and sampling errors are inevitable, and as a result the estimated model could differ from the actual relationship between the variables.
Sample values of α̂ and β̂ might differ from zero by mere chance. That is, the sample values of α̂ and β̂ can be different from zero not because α and β are actually different from zero, but by a mere chance that happened in the particular sample. Therefore, in econometric applications it is mandatory to test whether the true population values of α and β are different from zero.
The true population values of α and β are unobservable, so how can we determine whether their values are zero or not? In econometric applications we use the sampling distributions of α̂ and β̂, together with the sample values of α̂ and β̂, to compute how likely it is for α and β to be zero. If it is very unlikely for α and β to be zero, then α and β are said to be statistically different from zero; the reverse conclusion is drawn if it is highly likely for α and β to be zero. If α and β are statistically different from zero, then we say that α̂ and β̂ are statistically significant (statistically different from zero).
Therefore, the test of the statistical significance of the parameter estimates (α̂ and β̂) refers to the process in which we use the sampling distributions of the OLS estimators α̂ and β̂, together with their sample values, to compute how likely it is for α and β to be zero, and hence to determine whether they are statistically different from zero.
Var(α̂) = σ̂²_u·ΣX_i²/(nΣx_i²) = Σe_i²·ΣX_i²/[n(n − 2)·Σx_i²]
For the purpose of estimating the parameters, the assumption of normality is not used; but we use this assumption to test the significance of the parameter estimates, because the testing methods identified above rest on the assumption that the disturbance term is normally distributed. Hence, before we discuss the various testing methods, it is important to see whether the parameter estimates are normally distributed or not.
We have already assumed that the error term is normally distributed with mean zero and variance σ², i.e., U_i ~ N(0, σ²). Similarly, we also showed that Y_i ~ N(α + βX_i, σ²). Now, we want to show the following:
1. β̂ ~ N(β, σ²_u/Σx_i²)
2. α̂ ~ N(α, σ²_u·ΣX_i²/(nΣx_i²)), where σ²_u is estimated by σ̂²_u = Σe_i²/(n − 2) = RSS/(n − 2)
To show whether α̂ and β̂ are normally distributed or not, we need to make use of one property of the normal distribution: “… any linear function of a normally distributed variable is itself normally distributed.”
Recall that:
SE(β̂) = √Var(β̂) = √(σ̂²_u/Σx_i²) = √(Σe_i²/[(n − 2)·Σx_i²]), since Var(β̂) = σ̂²_u/Σx_i² = Σe_i²/[(n − 2)·Σx_i²]
SE(α̂) = √Var(α̂) = √(σ̂²_u·ΣX_i²/(nΣx_i²)) = √(Σe_i²·ΣX_i²/[n(n − 2)·Σx_i²])
Step-3: Compare the standard errors with the numerical values of α̂ and β̂.
Decision Rule (e.g., for the slope coefficient):
• If 2SE(β̂) > β̂, accept the null hypothesis and reject the alternative hypothesis. We conclude that β̂ is statistically insignificant.
• If 2SE(β̂) < β̂, reject the null hypothesis and accept the alternative hypothesis. We conclude that β̂ is statistically significant.
The basic notion of this test emanates from basic property of normal distribution which states that
about 95% of all possible values of any normally distributed variable are distributed within the range
of mean plus or minus twice the standard error of the variable (𝝁𝒙 ± 𝟐𝝈𝒙 ).
Rejection of the null hypothesis implies that it is improbable (less than 5% probable) to observe the sample values of α̂ and β̂ if the true values of α and β were zero. Thus, it is possible to conclude that the true values of α and β are statistically different from zero; hence α̂ and β̂ are statistically significant.
In fact, it is possible to conclude that it would be improbable for the actual values of α̂ and β̂ to be greater than twice the values of their standard errors if their mean values (α and β) were zero.
Numerical example: Suppose that from a sample of size 𝒏 = 𝟏𝟓𝟎, we estimate the following
supply function.
𝑸 = 𝟐𝟎 + 𝟎. 𝟔𝑷
𝑺𝑬: (𝟏. 𝟓) (𝟎. 𝟎𝟐𝟓)
➢ Test the significance of the slope parameter at the 5% level of significance using the standard error test.
Solution: 2SE(β̂) = 2(0.025) = 0.05 and β̂ = 0.6
Thus, since 2SE(β̂) < β̂, we reject the null hypothesis that β = 0 at the 5% level of significance. This implies that the true value of β is statistically different from zero; hence β̂ is statistically significant.
Note: The standard error test is an approximated test (which is approximated from the z-test
and t-test) and implies a two tail test conducted at 5% level of significance.
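A minimal sketch of the standard error (rule-of-thumb) test applied to the supply-function example above:

```python
# Sketch: standard error (rule-of-thumb) test for the slope of the supply function
beta_hat = 0.6
se_beta = 0.025

if abs(beta_hat) > 2 * se_beta:
    print("Reject H0: beta = 0 -> the slope is statistically significant")
else:
    print("Do not reject H0: beta = 0 -> the slope is statistically insignificant")
```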
The general principle of this test is that it would be improbable for α and β to be zero if it were unlikely to observe the sample values α̂ and β̂ under the hypothesized zero values of α and β. Thus, in this test we first compute the probabilities of the sample values of α̂ and β̂ under the hypothesized zero values of α and β, and we reject the null hypotheses of zero values of α and β if we find these probabilities to be small (relative to the level of significance).
Steps of the Z-test
Step – 1: Determine the nature of the test (i.e., whether it is a one-tail or a two-tail test).
The test will be a one-tail test if it is possible to determine, using prior information, that the signs of α̂ and β̂ must be either positive or negative. In this case, the test will be a right-tail test if the possible signs of α̂ and β̂ are only positive, and a left-tail test if the possible signs of α̂ and β̂ are only negative. On the other hand, the test must be a two-tail test if prior information about the possible signs of α̂ and β̂ is not available.
Step – 2: Specify the null and the alternative hypotheses of the test.
Step – 3: Choose the level of significance of the test, α (not to be confused with the intercept parameter).
Level of significance is the subjective probability that a researcher considers as too low to assume. It
is the probability of making ‘wrong’ decision, i.e., the probability of rejecting the hypothesis(𝑯𝟎 )
while it is actually true or the probability of committing a type I error. It is customary in
econometric research to choose the 5% or the 1% level of significance. 5% level of significance
means that in making our decision we allow (tolerate) five times out of a hundred to be ‘wrong’ i.e.,
reject the hypothesis when it is actually true.
Step – 4: Determine the critical values as well as the rejection and acceptance regions of the
null hypothesis based on the nature of the test under the chosen level of significance. The
critical value/s is/are the maximum and/or minimum standardized value/s beyond which there is only
‘𝜶%’ (level of significance) chance for the estimate to assume values under the null hypothesis. The
critical value/s of the test is/are obtained directly from Z-distribution probability table. As a result,
the critical values are also known as table or theoretical values (𝒁𝒕 ).
For instance, if the level of significance is 5% and the test is two tail test, then the critical values
can be obtained directly from 𝒁 − 𝒑𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 𝒕𝒂𝒃𝒍𝒆 through determining the Z-values beyond
which there is only 5% chance for the estimate to assume values under the null hypothesis of zero
values for 𝜶 𝑎𝑛𝑑 𝜷. To obtain these values we read from Z-probability table that the Z-values
beyond which there are only 2.5% (= 𝜶⁄𝟐) chance for 𝜶 ̂ if the hypothesized zero values of
̂ 𝑎𝑛𝑑 𝜷
𝜶 𝑎𝑛𝑑 𝜷 were true. These values are −𝟏. 𝟗𝟔 𝑎𝑛𝑑 𝟏. 𝟗𝟔. Thus, with these critical values the
rejection and acceptance regions for the null-hypothesis will be:
−𝟏. 𝟗𝟔 𝜷=𝟎 𝟏. 𝟗𝟔
The range of possible values for 𝜶 ̂ 𝑜𝑟 𝜷̂ bounded by the critical values is known as acceptance
region. Acceptance region contains (1 − 𝛼)% of all possible values of the estimate under null
hypothesis. That is, there would be (1 − 𝛼)% probability for the estimate to be in the acceptance
region if the null-hypothesis were true. The remaining range of values out of the acceptance region is
known as rejection region. Rejection region contains 𝜶% of all possible values of an estimate under
the null-hypothesis. That is, the rejection region is an interval of values in which there is 𝜶% chance
for the estimate to assume values given the null hypothesis.
Step – 5: Compute the Z-values of α̂ and β̂ under the null hypothesis. These are the computed values (Z_c) of α̂ and β̂; they show the Z-values of α̂ and β̂ if the null hypothesis were true. To compute Z_c we use the Z-transformation formula:
Z_c = (α̂ − α)/SE(α̂)  or  Z_c = (β̂ − β)/SE(β̂)
⇒ Z_c = (α̂ − 0)/SE(α̂) = α̂/SE(α̂)  and  Z_c = (β̂ − 0)/SE(β̂) = β̂/SE(β̂)
Step – 6: Compare the computed Z-values (Z_c) of α̂ and β̂ with the table values (Z_t), and hence make a decision about the null hypothesis (H₀).
The null hypothesis will be rejected if it is improbable to observe the sample values of α̂ and β̂ when the null hypothesis is true. Thus, the null hypothesis will be rejected if:
• the computed Z-values (Z_c) of α̂ and β̂ are greater in absolute value than the critical (table) value, i.e., reject H₀ if |Z_c| > |Z_t|; or
• the computed Z-values (Z_c) of α̂ and β̂ fall outside the acceptance region.
Rule of Thumb
In general, for a 5% level of significance the critical value of Z is 1.96, which is approximately equal to 2. The decision rule about accepting or rejecting the null hypothesis therefore suggests that we reject the null hypothesis if:
|Z_computed| > 1.96 ≅ |Z| > 2
That is, the decision rule is to reject the null hypothesis if |Z| > 2.
But we know that Z_c = β̂/SE(β̂). Therefore, the rule becomes β̂/SE(β̂) > 2, which implies that Z is greater than 2 if and only if β̂ > 2SE(β̂). This is the rule of thumb under the standard error test. Thus, the standard error test and the Z-test are identical; they are two ways of saying the same thing.
Example: Suppose that the following estimated cereal crop supply function is obtained from a
sample of 150 farmers.
𝑸 = 𝟔𝟎 + 𝟒𝑷
𝑺𝑬: (𝟓𝟎) (𝟏. 𝟓)
̂ is statistically significant or not at 5% level of significance assuming that there is
a. Test whether 𝜷
no prior information about the possible sign of the slope of cereal crop supply function?
̂ at 5% level of significance assuming that you have ‘prior
b. Test the statistical significance of 𝜷
knowledge’ from micro economic theory of supply that the slope of supply function is positive?
we look for another statistical technique when the sample is too small to satisfactorily approximate the unknown population variances of α̂ and β̂. One competent statistical technique for small samples is the t-test.
That is, with smaller samples the standardized estimates of α̂ and β̂ follow the t-distribution with n − k degrees of freedom (df), where k is the number of parameters in the model (here α and β, so k = 2) and n is the sample size. Therefore, the statistical significance of α̂ and β̂ can be evaluated with the t-test, using the t-transformation formula and the t-probability table. The actual values of α̂ and β̂ can be transformed into their equivalent t-values, in units of the t-distribution with n − k degrees of freedom, by using the t-transformation formula as follows:
t_c = (sample mean − population mean)/(standard error of the sample mean)
We can derive the t-values of the OLS estimates:
t_α̂ = (α̂ − α)/SE(α̂)  and  t_β̂ = (β̂ − β)/SE(β̂)  ➔ each with n − 2 degrees of freedom
where t_α̂ and t_β̂ are the t-statistics of the estimates α̂ and β̂, respectively;
α and β are the population parameters;
SE(α̂) and SE(β̂) are the sample estimates of the true population standard deviations; and
n − 2 is the degrees of freedom, since k (the number of parameters) in the SLRM is 2.
The t-distribution is symmetric with mean equal to zero and variance (n − 1)/(n − 3), which approaches unity as n gets larger. Thus, as n increases the t-distribution approaches the Z-distribution, which is symmetric with mean zero and unit variance. The probabilities of the t-distribution at different degrees of freedom have been tabulated by W.S. Gosset. Thus, by using the t-distribution probability table it is possible to compute the probability of the observed value of α̂ or β̂ under the hypothesized expected values of the sampling distribution of α̂ or β̂ (α or β) when the sample size is small (n ≤ 30).
Step-1: Specify the null and the alternative hypotheses based on the nature of the test (two tail/one
tail).
Step-2: Compute 𝒕𝒄, the computed value of 𝒕 (the test statistic of the estimator), using the
value of 𝜶 or 𝜷 stated in the null hypothesis:
𝒕𝒄 = (Estimator − Hypothesized value) ⁄ (Standard Error of the estimator)
For 𝜷, this gives: 𝒕𝜷̂ = (𝜷̂ − 𝟎)⁄𝑺𝑬(𝜷̂) = 𝜷̂⁄𝑺𝑬(𝜷̂)
Where, in our case the hypothesized value of the slope parameter is zero, (𝑖. 𝑒. , 𝜷 = 𝟎)
Step-3: Choose the level of significance (be it 1%, 5% or 10%)
Step-4: Check whether the test is one-tail or two-tail. For a two-tail test at the 5% level of
significance, divide the significance level by two (𝜶⁄𝟐 = 𝟎.𝟎𝟐𝟓) when obtaining the critical value of t from the t-table.
Step-5: Obtain critical or table value of t, called 𝒕𝒕 at 𝜶⁄𝟐 and n-2 df for two tail test in SLRM (and
define the critical region).
Step-6: Compare the computed value of 𝒕, 𝒕𝒄 (𝒕𝜷̂ for 𝜷), with the table value of 𝒕, 𝒕𝒕, and hence make a
decision about the statistical significance of the estimate:
✓ If |𝒕𝜷̂| > 𝒕𝒕, reject 𝑯𝟎 and accept 𝑯𝑨. The conclusion is that 𝜷̂ is statistically significant.
✓ If |𝒕𝜷̂| < 𝒕𝒕, accept 𝑯𝟎 and reject 𝑯𝑨. The conclusion is that 𝜷̂ is statistically insignificant.
Numerical Example:
Recall that from a sample of size 𝒏 = 𝟔, we estimated the following simple consumption
function (example-1):
𝑪𝒐𝒏𝒔̂𝒊 = 𝟏.𝟓𝟑 + 𝟎.𝟔𝟎𝟕𝒊𝒏𝒄𝒊
𝑺𝑬: (𝟎.𝟓𝟒) (𝟎.𝟎𝟓𝟔)
Where, the values in the brackets are standard errors.
Test the hypothesis that income does not affect consumption expenditure, using the t-test
at the 5% level of significance.
Solution:
a. The hypothesis we want to test: 𝑯𝟎 : 𝜷 = 𝟎 𝒂𝒈𝒂𝒊𝒏𝒔𝒕 𝑯𝑨 : 𝜷 ≠ 𝟎
b. The t-value for the test statistic is:
𝒕𝜷̂ = (𝜷̂ − 𝟎)⁄𝑺𝑬(𝜷̂) = 𝜷̂⁄𝑺𝑬(𝜷̂) = 𝟎.𝟔𝟎𝟕⁄𝟎.𝟎𝟓𝟔 ≅ 𝟏𝟎.𝟖𝟒
c. Since the test is a two-tail test, we calculate 𝜶⁄𝟐 = 𝟎.𝟎𝟓⁄𝟐 = 𝟎.𝟎𝟐𝟓 and obtain the critical/table
value of '𝒕' at 𝜶⁄𝟐 = 𝟎.𝟎𝟐𝟓 and 4 degrees of freedom (df), i.e., 𝒏 − 𝟐 = 𝟔 − 𝟐 = 𝟒. From the t-table,
'𝒕𝒕' at the 0.025 level of significance and 𝟒 df is 𝟐.𝟕𝟕𝟔.
d. Since 𝒕𝜷̂ = 𝟏𝟎.𝟖𝟒 > 𝒕𝒕 = 𝟐.𝟕𝟕𝟔, we reject 𝑯𝟎 and accept 𝑯𝑨.
e. It implies that 𝜷̂ is statistically significant: income does affect consumption expenditure.
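As a quick numerical check (an added Python sketch, assuming scipy is available), steps b–d of this example can be reproduced as follows:

# t-test of H0: beta = 0 for the estimated consumption function (n = 6).
from scipy.stats import t

beta_hat, se_beta, n = 0.607, 0.056, 6
t_c = beta_hat / se_beta                     # computed t ≈ 10.84
t_t = t.ppf(1 - 0.05 / 2, df=n - 2)          # critical t at alpha/2 = 0.025 and 4 df ≈ 2.776
print(abs(t_c) > t_t)                        # True -> beta_hat is statistically significant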
The p-value is the lowest level of significance at which the observed value of a
test statistic is significant (i.e., at which one rejects 𝑯𝟎). Equivalently, the p-value (or probability value) is the
probability of obtaining a sample statistic (such as 𝜶̂ or 𝜷̂) at least as extreme as the one observed, when the null hypothesis is true. The p-
value for a sample outcome is compared with the level of significance (𝜶): 𝑯𝟎 is rejected if the p-value is less than 𝜶.
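For instance, the two-tail p-value for the consumption-function slope tested above (𝒕 ≈ 10.84 with 4 df) can be computed as follows (an illustrative sketch, assuming scipy is available):

# Two-tail p-value for the observed t-statistic of the slope estimate.
from scipy.stats import t

p_value = 2 * t.sf(10.84, df=4)    # sf is the upper-tail probability, 1 - cdf
print(p_value < 0.05)              # True -> reject H0 at the 5% level of significance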
Point estimates alone do not tell us how close they are likely to be to the true parameters. Thus, we need to estimate the interval of values between which the true values of the population
parameters are expected to lie with a certain "degree of confidence". In statistics, the process of
estimating an interval of values between which the true values of the population parameters are
expected to lie, based on the sampling distribution of the sample estimates, is called interval
estimation.
An interval of values constructed (estimated) to predict a range of values for the unknown population
parameters 𝜶 and 𝜷, based on some clues (i.e., the sampling distribution of 𝜶̂ and 𝜷̂ and the actual
sample values of 𝜶̂ and 𝜷̂ obtained from the sample), is known as a confidence interval for the population
parameters 𝜶 𝑎𝑛𝑑 𝜷. And the degree of certainty we may assign to the interval to contain the true
value of the population parameter within its range is known as confidence level or confidence
coefficient. Confidence coefficient is the probability for the interval to contain the true value of the
population parameter within its range. In this respect, we say that with a given probability the
population parameter will be within the defined confidence interval.
I. Confidence interval from the Standard Normal Distribution (Z−Distribution)
The Z-distribution is employed in SLRA when the size of the sample is large (𝒏 > 𝟑𝟎). If the
sample is large, we can use the Z-transformation formula together with the Z-distribution probability
table to construct a confidence interval for 𝜷. The derivation proceeds as follows:
𝑷𝒓(−𝜷̂ + 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂) < −𝜷 < 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂) − 𝜷̂) = 𝜹%
𝑷𝒓(𝜷̂ − 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂) > 𝜷 > 𝜷̂ − 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂)) = 𝜹%
𝑷𝒓(𝜷̂ − 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂) < 𝜷 < 𝜷̂ − 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂)) = 𝜹%
Thus, the actual 𝜹% confidence interval for 𝜷 will be: 𝜷̂ − 𝒁𝟐 ∙ 𝑺𝑬(𝜷̂) < 𝜷 < 𝜷̂ − 𝒁𝟏 ∙ 𝑺𝑬(𝜷̂).
For instance, if 𝜹% is 95%, then the actual 95% confidence interval for 𝜷 is obtained as follows (with 𝒁𝟏 = −𝟏.𝟗𝟔 and 𝒁𝟐 = 𝟏.𝟗𝟔):
−𝟏.𝟗𝟔 < (𝜷̂ − 𝜷)⁄𝑺𝑬(𝜷̂) < 𝟏.𝟗𝟔
−𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) < 𝜷̂ − 𝜷 < 𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂)
−𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) − 𝜷̂ < −𝜷 < 𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) − 𝜷̂
𝜷̂ + 𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) > 𝜷 > 𝜷̂ − 𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂)
That is, 𝜷̂ − 𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂) < 𝜷 < 𝜷̂ + 𝟏.𝟗𝟔 ∙ 𝑺𝑬(𝜷̂).
The meaning of this confidence interval is that there is a 95% chance that the interval contains the
true value of the unknown parameter 𝜷 within its range.
Therefore, for any confidence level, 𝜹%, the confidence interval for 𝜷 based on Z-distribution is
given as:
𝜷 = 𝜷̂ ± 𝒁𝒕 ∙ 𝑺𝑬(𝜷̂)
Note that: in the language of hypothesis testing, the confidence interval that we have established is
called the acceptance region and the area(s) outside the acceptance region is (are) called the critical
region(s), or region(s) of rejection of the null hypothesis. The lower and upper limits of the
acceptance region are called the critical values.
Step-5: Make Decision- we can use the established confidence intervals to test hypotheses about the
parameters, and the decision rules are:
➢ If the hypothesized value of 𝜷 in the null hypothesis lies within the confidence interval, accept
𝑯𝟎. The implication is that 𝜷̂ is statistically insignificant.
➢ If the hypothesized value of 𝜷 in the null hypothesis lies outside the interval, reject 𝑯𝟎. This
indicates that 𝜷̂ is statistically significant.
Example: Construct a 95% confidence interval for the population slope parameter 𝜷 for the
estimated wheat supply function below.
𝑸 = 𝟔𝟎 + 𝟒𝑷
𝑺𝑬: (𝟓𝟎) (𝟏. 𝟓)
Where, the values in the bracket are standard errors and 𝒏 = 𝟏𝟓𝟎 𝒇𝒂𝒓𝒎𝒆𝒓𝒔
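The computation is left to the reader in the notes; as a hedged worked sketch (assuming the Z-based interval applies because n = 150 is large, and assuming scipy is available):

# 95% Z-based confidence interval for the slope of the wheat supply function.
from scipy.stats import norm

beta_hat, se_beta = 4.0, 1.5
z = norm.ppf(0.975)                               # about 1.96
lower, upper = beta_hat - z * se_beta, beta_hat + z * se_beta
print(round(lower, 2), round(upper, 2))           # about 1.06 and 6.94
# Since the hypothesized value beta = 0 lies outside this interval, beta_hat is significant.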
II. Confidence interval from the t-Distribution
The procedure for constructing a confidence interval with the t-distribution is similar to the one outlined
for the Z-distribution; the only difference is that we must take the degrees of freedom into account when
using the t-distribution probability table. Therefore, for any confidence level, 𝜹%, the confidence
intervals for 𝜶 and 𝜷 based on the t-distribution are given as:
𝜶 = 𝜶̂ ± 𝒕𝜶⁄𝟐 ∙ 𝑺𝑬(𝜶̂)   and   𝜷 = 𝜷̂ ± 𝒕𝜶⁄𝟐 ∙ 𝑺𝑬(𝜷̂)
Where 𝒕𝜶⁄𝟐 is the critical value of t at 𝜶⁄𝟐 and 𝒏 − 𝟐 degrees of freedom.
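As an illustration (an added sketch, assuming scipy is available), the 95% t-based interval for the slope of the consumption function estimated earlier (𝜷̂ = 0.607, SE = 0.056, n = 6, hence 4 df) would be:

# 95% t-based confidence interval for the slope of the estimated consumption function.
from scipy.stats import t

beta_hat, se_beta, df = 0.607, 0.056, 4
t_crit = t.ppf(0.975, df)                         # about 2.776
lower, upper = beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta
print(round(lower, 3), round(upper, 3))           # about 0.452 and 0.762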
Forecasting using regression involves making predictions about the dependent variable based on
average relationships observed in the estimated model. Predicted values are values of the dependent
variable based on the estimated model and a prediction about the values of the independent
variables.
For a simple regression, the value of Y is predicted as:
𝒀̂ = 𝜶̂ + 𝜷̂ 𝑿𝒑
Where, 𝒀̂ is the predicted value of the dependent variable, and
𝑿𝒑 is the predicted value of the independent variable (input).
The estimated intercept and all the estimated slopes are used in the prediction of the value of the
dependent variable, even if a slope is not statistically significantly different from zero.
Example-1:
Suppose you have an estimated model of sales (Y) of firms producing a particular product as a
function of advertisement expenditure (X):
𝒀̂ = 𝟐.𝟓𝑿𝟏
In addition, you have predicted value for the independent variable (advertisement expenditure
in ETB), 𝐗 𝟏 = 𝟐𝟎𝟎. Then the predicted value for Sales is 500 Units.
Example-2:
Recall our estimated consumption model: 𝐂𝐨𝐧𝐬̂𝐢 = 𝟏.𝟓𝟑 + 𝟎.𝟔𝟎𝟕𝐢𝐧𝐜𝐢. Based on this estimated
model, predict the consumption expenditure of a household whose income is 12 ETB.
Solution: 𝐂𝐨𝐧𝐬̂𝐢 = 𝟏.𝟓𝟑 + 𝟎.𝟔𝟎𝟕(𝟏𝟐) = 𝟖.𝟖𝟏𝟒. That means a household with an income
of 12 ETB is predicted to spend 8.814 ETB on consumption.
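A minimal Python sketch (an addition to the notes; the function name is only illustrative) that reproduces the prediction in Example-2:

# Prediction from the estimated consumption model: Cons_hat = 1.53 + 0.607 * income.
def predict_consumption(income, alpha_hat=1.53, beta_hat=0.607):
    """Predicted consumption expenditure (ETB) for a given income (ETB)."""
    return alpha_hat + beta_hat * income

print(predict_consumption(12))   # 8.814, matching the hand computation above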
Reporting the Results of Regression Analysis
The results of a regression analysis are reported in conventional formats; it is not sufficient
merely to report the estimates of the 𝜷's.
➢ There are two conventional ways to report a regression result:
i. Equation form, i.e., by fitting the estimated coefficients into the regression model, and
ii. Table form