
HARAMAYA UNIVERSITY

COLLEGE OF AGRICULTURE AND


ENVIRONMENTAL SCIENCES

Econometrics
Kindineh Sisay
(Lecturer of Agricultural and Applied Economics)
Email: [email protected]

January, 2022
Haramaya University
CHAPTER OUTLINES
Unit 1: Fundamental concepts of Econometrics
Unit 2: Correlation Theory
Unit 3: Simple Linear Regression
Models
Unit 4: Multiple Regression Analysis
Unit 5: Econometric Problems
Unit 6: Non-linear regression and Time
series Econometrics
CHAPTER ONE
INTRODUCTION TO ECONOMETRICS

Outlines:
Definition and scope of econometrics

Why a Separate Discipline?

Economic models vs. econometric models

Methodology of econometrics

Desirable properties of an econometric model

Goals of econometrics
DEFINITION AND SCOPE

What is Econometrics?

It is a social science in which the tools of economic theory, mathematics and statistical inference are applied to the analysis of economic phenomena.
Econometrics is the science which integrates economic theory, economic statistics, and mathematical economics to investigate the empirical support of the general laws established by economic theory.
It is concerned with the measurement of economic relationships.
CONT’D…

In short, econometrics may be considered as the integration of economics, mathematics, and statistics for the purpose of providing numerical values for the parameters of economic relationships and verifying economic theories.
ECONOMETRICS VS. MATHEMATICAL ECONOMICS & ECONOMIC
THEORY

Mathematical economics states economic theory in


terms of mathematical symbols while economic
theory uses verbal exposition.
 There is no essential difference between them.
 Both express economic relationships in an exact or
deterministic form.
 Neither mathematical economics nor economic
theory allows for random elements which might
affect the relationship and make it stochastic.
CONT’D…
 Further, they do not provide numerical values for
the coefficients of economic relationships.
 Econometrics differs from mathematical economics
in that, it assumes random relationships among
economic variables.
 Econometric methods are designed to take into
account random disturbances which create deviations
from exact behavioral patterns.
Moreover, econometric methods provide numerical
values of the coefficients of economic relationships.
 Example: Law of Demand ...
CONT’D…
Econometrics vs. statistics

 An economic statistician gathers empirical data, records


them, tabulates them or charts them, and attempts to
describe the pattern in their development over time
 Economic statistics is mainly a descriptive aspect of
economics.
 It does not provide explanations of the development of
the various variables and it does not provide
measurements for the coefficients of economic
relationships.
CONT’D…

 Econometrics utilizes statistical data to

estimate quantitative economic relationships

and to test hypotheses about them.


CONT’D…
Economic models vs. Econometric models

What is a model for you?


A model is an organized set of relationships that
describes the functioning of an economic entity
It is a simplified representation of a real-world
process under a set of simplifying assumptions.
The real world system is represented by the
model in order to explain it, to predict it, and to
control it.
CONT’D…

 A given representation of real world system can be


a model if it fulfills the following requirements.

 A good model is both realistic and manageable.


CONT’D…
Economic models

 Any economic theory is an abstraction from the real world.
 It is an organized set of relationships that
describes the functioning of an economic entity
under a set of simplifying assumptions.
 All economic reasoning is ultimately based on
models.
CONT’D…
Economic models consist of the following three basic
structural elements.

 A set of variables
 A list of fundamental relationships and
 A number of strategic coefficients
CONT’D…
Econometric models
 They contain a random element which is
ignored by mathematical models and economic
theory which postulate exact relationships
between economic variables.
 Example: Economic theory postulates that the
demand for a commodity depends on its price,
on the prices of other related commodities, on
consumers’ income and on tastes.
 This is an exact relationship which can be
written mathematically as: Q = f(P, P0, Y, T), where P is the commodity's own price, P0 the prices of related commodities, Y consumers' income and T tastes.
CONT’D…
 The above demand equation is exact.
 However, many more factors may affect demand.
 In econometrics the influence of these 'other' factors is taken into account by introducing a random
variable, so that the relationship becomes Q = f(P, P0, Y, T) + Ui.

 The random term Ui (also called the error term or
disturbance term) is a surrogate variable for
important variables excluded from the model, errors
committed, and measurement errors.
METHODOLOGY OF ECONOMETRICS

 Starting with the postulated theoretical


relationships among economic variables,
econometric research or inquiry generally
proceeds along the following lines/stages:

 Specification of the model


 Estimation of the model
 Evaluation of the estimates
 Evaluation of the forecasting power of the
estimated model
CONT’D…
1. Specification of the model
In this step the econometrician has to express the relationships
between economic variables in mathematical form.

The step involves the determination of three important


issues:

i. Determine dependent and independent (explanatory) variables to be


included in the model,

ii. Determine a priori theoretical expectations about the size and sign
of the parameters of the function, and

iii. Determine mathematical form of the model (number of equations,


specific form of the equations, etc.).
CONT’D…
 Specification of the econometric model will be based on
economic theory and any available information related to the
phenomena under investigation.

 Moreover, knowledge of economic theory and familiarity


with the particular phenomenon being studied is also needed.

 It is the most important and the most difficult stage of any


econometric research.
CONT’D…
At this stage there is a considerable likelihood of committing errors or incorrectly specifying the
model.

The most common errors of specification are:


 Omissions of some important variables from the function.

 The omissions of some equations (for example, in


simultaneous equations model).

 The mistaken mathematical form of the functions.


CONT’D…
2. Estimation of the model

This is purely a technical stage which requires knowledge of


the various econometric methods, their assumptions and the
economic implications for the estimates of the parameters.

This stage includes the following activities:


 Gathering of the data on the variables included in the
model.

 Examination of the identification conditions of the function


(especially for simultaneous equations models).
CONT’D…
 Examination of the aggregations problems involved in the
variables of the function.

 Examination of the degree of correlation between the


explanatory variables (i.e., examination of the problem of
multicollinearity).

 Choice of appropriate econometric techniques for estimation,

i.e. to decide on a specific econometric method to be applied in
estimation; such as OLS, MLM, …
CONT’D…
3. Evaluation of the estimates
 Consists of deciding whether the estimates of the parameters
are theoretically meaningful and statistically significant.

 Enables the econometrician to evaluate the results of


calculations and determine the reliability of the results.

Criteria:
Economic a priori criteria: determined by economic theory
and refer to the size and sign of the parameters of economic
relationships.
CONT’D…
 Statistical criteria (first-order tests):
 These are determined by statistical theory and aim at the
evaluation of the statistical reliability of the estimates of the
parameters of the model.

o Correlation coefficient test, standard error test, t-test, F-


test, and R2-test are some of the most commonly used
statistical tests.

Econometric criteria (second-order tests): These are set by


the theory of econometrics and aim at the investigation of
whether the assumptions of the econometric method
employed are satisfied or not in any particular case.
CONT’D…
 The econometric criteria serve as a second order test (as
test of the statistical tests)

 i.e., they determine the reliability of the statistical


criteria;
 they help us establish whether the estimates have the
desirable properties of unbiasedness, consistency, etc.

 Econometric criteria aim at the detection of the violation or


validity of the assumptions of the various econometric
techniques.
CONT’D…
4. Evaluation of the forecasting power of the model

 Forecasting is one of the aims of econometric research.


 However, before using an estimated model for
forecasting by some way or another, the predictive power
and other requirements of the model need to be checked.

 It is possible that the model may be economically

meaningful and statistically and econometrically correct
for the sample period for which the model has been
estimated.
CONT’D…

 This stage involves the investigation of the stability


of the estimates and their sensitivity to the changes
in the size of the sample.

 The estimated function should perform

adequately outside the sample data, which requires
testing the model's performance with extra-sample (out-of-sample) data.
DESIRABLE PROPERTIES OF AN
ECONOMETRIC MODEL

The ‘goodness’ of an econometric model is judged

customarily based on the following desirable

properties:

 Theoretical plausibility: The model should be compatible with the

postulates of economic theory and adequately describe the
economic phenomena to which it relates.
CONT’D…
 Explanatory ability:
The model should be able to explain the observations
of the actual world.
It must be consistent with the observed behavior of
the economic variables whose relationship it
determines.
 Accuracy of the estimates of the parameter:
The estimates should possess (if possible) the
desirable properties of unbiasedness, consistency and
efficiency.
CONT’D…
 Forecasting ability: The model should produce
satisfactory predictions of future values of the
dependent variables.

 Simplicity: The model should represent the economic


relationships with maximum simplicity.

The fewer the equations and the simpler their mathematical

form, the better the model provided that the other desirable
properties are not affected by the simplifications of the model.
GOALS OF
ECONOMETRICS
Basically there are three main goals of
Econometrics.

 Analysis i.e., testing economic theory


 Policy making i.e., obtaining numerical
estimates of the coefficients of economic
relationships for policy simulations.
 Forecasting i.e., using the numerical
estimates of the coefficients in order to forecast
the future values of economic magnitudes.
 Use current and past economic data to predict future values of variables
such as inflation, GDP, stock prices, etc.
Review questions

• How would you define econometrics?


• How does it differ from mathematical economics and
statistics?
• Describe the main steps involved in any econometric
research.
• Differentiate between economic and econometric
models.
• What are the goals of econometrics?
UNIT 2: CORRELATION
THEORY

Basic concepts of Correlation

Types of correlation

Methods and Types of Measuring


Correlation
BASIC CONCEPTS OF CORRELATION

Economic variables have a great tendency of moving


together

There is a possibility that the change in one variable is


on average accompanied by the change of the other
variable. This situation is known as correlation.

Correlation may be defined as the degree of linear


relationship existing between two or more variables.

The degree of relationship existing between two


variables is called simple correlation.
CONT’D…
The degree of relationship connecting three or more
variables is called multiple correlation.

A correlation is also said to be partial if it studies the

degree of relationship between two variables keeping
all other variables connected with these two
constant.

Correlation may be linear, when all points (X, Y) on a

scatter diagram seem to cluster near a straight line, or
non-linear otherwise.
CONT’D…

Correlation is said to be linear if the change in one


variable brings a constant change of the other.

It may be non-linear if the change in one variable


brings a different change in the other.

Correlation may also be positive or negative.


CONT’D…
For example, the correlation between price of a
commodity and its quantity supplied is positive since as
price rises, quantity supplied will be increased and vice
versa.

Correlation is said to be negative if an increase or a


decrease in one variable is accompanied by a decrease or
an increase in the other in which both are changed with
opposite direction.

For example, the correlation between price of a


commodity and its quantity demanded.
METHODS AND TYPES OF MEASURING CORRELATION

 In correlation analysis there are two important things


to be addressed.

• These are the type of co-variation existed between variables


and its strength.

There are three methods of measuring correlation.


These are:
 The Scattered Diagram or Graphic Method
 The Simple Linear Correlation coefficient
 The coefficient of Rank Correlation
1. THE SCATTERED DIAGRAM OR
GRAPHIC METHOD
The scatter diagram is a rectangular diagram which can help us
in visualizing the relationship between two phenomena.
 It puts the data into X-Y plane by moving from the lowest data
set to the highest data set.
 It is a non-mathematical method of measuring the degree of
co-variation between two variables.
 Scatter plots usually consist of a large body of data.
CONT’D…
 The closer the data points come together and make a straight
line, the higher the correlation between the two variables, or the
stronger the relationship.

 If the data points make a straight line going from the origin out
to high x- and y-values, then the variables are said to have a
positive correlation.

 If the line goes from a high-value on the y-axis down to a high-


value on the x-axis, the variables have a negative correlation.
CONT’D…
 A perfect positive correlation = 1, perfect negative
correlation = -1.

 no correlation = 0.

 The closer the number is to 1 or -1, the stronger the


correlation, or the stronger the relationship between the
variables.

 The closer the number is to 0, the weaker the correlation.

 Two variables may have a positive correlation, negative


correlation, or they may be uncorrelated.
CONT’D…
 Two variables are said to be positively correlated if
they tend to change together in the same direction,
that is, if they tend to increase or decrease together.

 Two variables are said to be negatively correlated if


they tend to change in the opposite direction
CONT’D…

 When X increases Y decreases, and vice versa.

 For example, saving and household size are negatively


correlated. When HH size increases, saving decreases and, vice
versa.

 The scatter diagram indicates the strength of the relationship
between the two variables.

Exercise: Provide your own examples of positively and
negatively correlated variables, and of variables that are
independent (uncorrelated).
2. SIMPLE CORRELATION

For a precise quantitative measurement of the degree of correlation


between Y and X we use a parameter which is called the correlation
coefficient.

For example, if we measure the correlation between X and Y, the
population correlation coefficient is represented by ρxy and its sample
estimate by rxy.

The simple correlation coefficient is used to measure relationships


which are simple and linear only.
CONT’D…
It cannot help us in measuring non-linear as well as multiple
correlations.

Sample correlation coefficient is defined by the formula

$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{n\sum X_i^2 - (\sum X_i)^2}\,\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}}$$

or, in deviation form,

$$r_{xy} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.
CONT’D…
Example 2.1: The following table shows the quantity
supplied for a commodity with the corresponding price
values.

Q . Determine the type of correlation that


exists between these two variables.

Table 1: Data for computation of correlation coefficient


CONT’D…
Time period (in days) | Quantity supplied Yi (in tons) | Price Xi (in Birr)
1 10 2
2 20 4
3 50 6
4 40 8
5 50 10
6 60 12
7 80 14
8 90 16
9 90 18
10 120 20
CONT’D…
Y | X | xi = Xi − X̄ | yi = Yi − Ȳ | xi2 | yi2 | xiyi | XY | X2 | Y2
10 2 -9 -51 81 2601 459 20 4 100
20 4 -7 -41 49 1681 287 80 16 400
50 6 -5 -11 25 121 55 300 36 2500
40 8 -3 -21 9 441 63 320 64 1600
50 10 -1 -11 1 121 11 500 100 2500
60 12 1 -1 1 1 -1 720 144 3600
80 14 3 19 9 361 57 1120 196 6400
90 16 5 29 25 841 145 1440 256 8100
90 18 7 29 49 841 203 1620 324 8100
120 20 9 59 81 3481 531 2400 400 14400
Sum=610 110 0 0 330 10490 1810 8520 1540 47700
Mean=61 11
CONT’D…
n XY   X Y 10(8520)  (110)(610)
r  0.975
10(1540)  (110)(110) 10(47700)  (610)(610)
n  X 2  ( X ) 2 n  Y 2  ( Y ) 2

Or using the deviation form (Equation 2.2), the correlation coefficient can be
computed as:
1810
r 0.975
330 10490
CONT’D…
 There is a strong positive correlation between the quantity
supplied and the price of the commodity under consideration.

 NB: If the correlation coefficient is zero, it indicates that there


is no linear relationship between the two variables.

 If the two variables are independent, the value of correlation


coefficient is zero but zero correlation coefficient does not show
us that the two variables are independent.
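The same figure can be checked in a few lines of Python. This is a minimal sketch, assuming only numpy is installed, that applies both the raw-sum and the deviation-form formulas to the supply–price data of Table 1.

```python
import numpy as np

# Quantity supplied (Y, tons) and price (X, Birr) from Table 1
Y = np.array([10, 20, 50, 40, 50, 60, 80, 90, 90, 120])
X = np.array([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
n = len(Y)

# Raw-sum formula: r = (n*SXY - SX*SY) / sqrt((n*SX2 - (SX)^2)*(n*SY2 - (SY)^2))
num = n * np.sum(X * Y) - np.sum(X) * np.sum(Y)
den = np.sqrt(n * np.sum(X**2) - np.sum(X)**2) * np.sqrt(n * np.sum(Y**2) - np.sum(Y)**2)
r = num / den

# Deviation form: r = S(x*y) / sqrt(S(x^2) * S(y^2)), with x and y measured from their means
x, y = X - X.mean(), Y - Y.mean()
r_dev = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

print(round(r, 3), round(r_dev, 3))        # both about 0.97, as in the hand calculation
print(round(np.corrcoef(X, Y)[0, 1], 3))   # numpy's built-in Pearson r agrees
```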
PROPERTIES OF SIMPLE CORRELATION COEFFICIENT

The value of correlation coefficient always ranges between -1


and +1.

The correlation coefficient is symmetric. That means, the


correlation coefficient of X on Y, is the correlation coefficient of Y
on X.

The correlation coefficient is independent of change of origin


and change of scale.
CONT’D …

If X and Y variables are independent, the correlation


coefficient is zero. But the converse is not true.

The correlation coefficient has the same sign with that of


regression coefficients.

The correlation coefficient is the geometric mean of two


regression coefficients.
THE MAJOR LIMITATIONS OF THE
METHOD ARE:
The value of the coefficient is unduly affected by
the extreme values

The coefficient requires the quantitative


measurement of both variables.

The correlation coefficient always assumes linear


relationship regardless of the fact whether the
assumption is true or not.

The coefficient is liable to be misinterpreted.


3. THE RANK CORRELATION
COEFFICIENT

In many cases the variables may be qualitative (or binary


variables) and hence cannot be measured numerically.

For example, profession, education, preferences for particular


brands, are such categorical variables.

For such cases it is possible to use another statistic, the rank


correlation coefficient (or Spearman's correlation coefficient).

We rank the observations in a specific sequence for example in


order of size, importance, etc., using the numbers 1, 2, 3, …, n.
CONT’D…
In other words, we assign ranks to the data and measure
relationship between their ranks instead of their actual
numerical values.

Hence, the name of the statistic is given as rank correlation


coefficient.

If two variables X and Y are ranked in such way that the values
are ranked in ascending or descending order, the rank
correlation coefficient may be computed by the formula
CONT’D…

$$r' = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$

Where,
D = difference between ranks of corresponding pairs of
X and Y
n = number of observations.

 The values that r may assume range from + 1 to – 1.


 Two points are of interest when applying the rank correlation
coefficient.
 It does not matter whether we rank the observations in
ascending or descending order.
 However, we must use the same rule of ranking for both variables.
CONT’D… EXAMPLE

Brands of soap: A  B  C  D  E  F  G  H  I  J  K  L | Total
Person I:       9 10  4  1  8 11  3  2  5  7 12  6 |
Person II:      7  8  3  1 10 12  2  6  5  4 11  9 |
Di:             2  2  1  0 -2 -1  1 -4  0  3  1 -3 |
Di2:            4  4  1  0  4  1  1 16  0  9  1  9 | 50
CONT’D…
The rank correlation coefficient is

$$r' = 1 - \frac{6(50)}{12(12^2 - 1)} = 1 - \frac{300}{1716} \approx 0.825$$

This figure, about 0.83, shows a marked similarity of preferences of
the two persons for the various brands of soap.
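The same calculation can be sketched in Python; numpy and scipy are assumed to be installed, and the rankings are those reconstructed in the table above.

```python
import numpy as np
from scipy.stats import spearmanr

# Rankings of the 12 soap brands (A-L) by the two persons
person1 = np.array([9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6])
person2 = np.array([7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9])

# Spearman's formula: r' = 1 - 6*sum(D^2) / (n*(n^2 - 1))
D = person1 - person2
n = len(D)
r_rank = 1 - 6 * np.sum(D**2) / (n * (n**2 - 1))
print(round(r_rank, 3))  # about 0.83

# scipy's spearmanr gives the same value when there are no ties
print(round(spearmanr(person1, person2).correlation, 3))
```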
4. PARTIAL CORRELATION
COEFFICIENTS
A partial correlation coefficient measures the
relationship between any two variables, when all
other variables connected with those two are kept
constant.
For example, let us assume that we want to
measure the correlation between the number of
hot drinks (X1) consumed in a summer resort and
the number of tourists (X2) coming to that resort.
In order to measure the true correlation between
X1 and X2, we must find some way of accounting for
changes in a third variable, X3 (for example, weather conditions), which affects both.
CONT’D…
This is achieved with the partial correlation coefficient
between X1 and X2, when X3 is kept constant.

The partial correlation coefficient is determined in


terms of the simple correlation coefficients among the
various variables involved in a multiple relationship.

In our example there are three simple correlation


coefficients
CONT’D…
r12 = correlation coefficient between X1 and X2
r13 = correlation coefficient between X1 and X3
r23 = correlation coefficient between X2 and X3

The partial correlation coefficient between X1 and X2, keeping

the effect of X3 constant, is given by:

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
EXAMPLE
The following table gives data on the yield of corn
per acre(Y), the amount of fertilizer used(X1) and
the amount of insecticide used (X2). Compute the
partial correlation coefficient between the yield of
corn and the fertilizer used keeping the effect of
insecticide constant.
Year 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980

Y 40 44 46 48 52 58 60 68 74 80

X1 6 10 12 14 16 18 22 24 26 32

X2 4 4 5 7 9 12 14 20 21 24
ANSWER

Year | Y  | X1 | X2 | y   | x1  | x2 | x1y | x2y | x1x2 | x12 | x22 | y2
1971 | 40 |  6 |  4 | -17 | -12 | -8 | 204 | 136 |  96  | 144 |  64 | 289
1972 | 44 | 10 |  4 | -13 |  -8 | -8 | 104 | 104 |  64  |  64 |  64 | 169
1973 | 46 | 12 |  5 | -11 |  -6 | -7 |  66 |  77 |  42  |  36 |  49 | 121
1974 | 48 | 14 |  7 |  -9 |  -4 | -5 |  36 |  45 |  20  |  16 |  25 |  81
1975 | 52 | 16 |  9 |  -5 |  -2 | -3 |  10 |  15 |   6  |   4 |   9 |  25
ANSWER
ryx1 = 0.9854
ryx2 = 0.9917
rx1x2 = 0.9725

Then, keeping the effect of insecticide (X2) constant,

$$r_{yx_1.x_2} = \frac{r_{yx_1} - r_{yx_2}\,r_{x_1x_2}}{\sqrt{(1 - r_{yx_2}^2)(1 - r_{x_1x_2}^2)}} = \frac{0.9854 - (0.9917)(0.9725)}{\sqrt{(1 - 0.9917^2)(1 - 0.9725^2)}} \approx 0.70$$
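A small Python sketch applying the partial-correlation formula to the three simple coefficients above (numpy assumed; the inputs are the values from the corn-yield example):

```python
import numpy as np

# Simple correlations from the corn-yield example above
r_y1 = 0.9854   # corn yield vs fertilizer
r_y2 = 0.9917   # corn yield vs insecticide
r_12 = 0.9725   # fertilizer vs insecticide

# Partial correlation of yield and fertilizer, holding insecticide constant
r_y1_2 = (r_y1 - r_y2 * r_12) / np.sqrt((1 - r_y2**2) * (1 - r_12**2))
print(round(r_y1_2, 2))  # about 0.70
```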
Any Question?
UNIT 3: SIMPLE LINEAR
REGRESSION MODELS

Basic Concepts and Assumptions

Least Squares Criteria

Normal Equations of OLS

Coefficient of Correlation and Determination

Properties of OLS Estimators


 Hypothesis Testing
BASIC CONCEPTS AND
ASSUMPTIONS

There are two major ways of estimating regression functions.


ordinary least square (OLS) method and
maximum likelihood (MLH) method.
The ordinary least square method is the easiest and the most
commonly used method as opposed to the maximum likelihood
(MLH) method which is limited by its assumptions.
MLH method is valid only for large sample as opposed to the
OLS method which can be applied to smaller samples.
CONT’D…
Classical Assumptions
The error terms ‘Ui’ are randomly distributed or the disturbance
terms are not correlated. This means that there is no systematic
variation or relation among the values of the error terms (Ui and
Uj); where i = 1, 2, 3, …, j = 1, 2, 3, … and i ≠ j.
Cov (Ui , Uj) = 0 for population
Cov (ei, ej) = 0 for sample .
The disturbance terms ‘Ui’ have zero mean. This implies the
sum of the individual disturbance terms is zero.
CONT’D…
If this assumption is violated (the mean of Ui is not zero), the result is an upward or downward shift
in the regression function.
CONT’D…
The disturbance terms have constant variance in each period
(homoscedasticity).
if the variance of the error terms varies as sample size changes
or as the value of explanatory variables changes, then this
leads to heteroscedasticity problem.

Explanatory variables ‘Xi’ and disturbance terms ‘Ui’ are


uncorrelated or independent.
CONT’D…
All the co-variances of the successive values of the error term
are equal to zero.

 The value in which the error term assumed in one period does
not depend on the value in which it assumed in any other
period. non-autocorrelation or non-serial correlation.

 The explanatory variable Xi is fixed in repeated samples


(the explanatory variables are non-random).

 Each value of Xi does not vary for instance owing to


change in sample size.
CONT’D…
 Linearity of the model in parameters. The simple linear
regression requires linearity in parameters; but not necessarily
linearity in variables(what is important is transforming the data
as required).

 Normality assumption-The disturbance term Ui is assumed to


have a normal distribution with zero mean and a constant
variance.

 Explanatory variables should not be perfectly, linearly


and/or highly correlated.
CONT’D…

The relationship between variables (or the model) is correctly


specified.

The explanatory variables do not have identical value. This


assumption is very important for improving the precision of
estimators.
OLS METHOD OF ESTIMATION

Estimating a linear regression function using the


Ordinary Least Square (OLS) method is simply about
calculating the parameters of the regression
function for which the sum of square of the error
terms is minimized.
Suppose we want to estimate the population regression Yi = β0 + β1Xi + Ui.
Sample regression: Ŷi = β̂0 + β̂1Xi, with residuals ei = Yi − Ŷi.
CONT’D…

RSS = Residual Sum of Squares = Σei2 = Σ(Yi − β̂0 − β̂1Xi)2, which is minimized with respect to β̂0 and β̂1.

Note that the equation is a composite function and


we should apply a chain rule in finding the partial
derivatives with respect to the parameter
estimates.
CONT’D…

That is, the partial derivatives of the RSS with respect to β̂0 and β̂1 are set equal to zero, giving the normal equations.
CONT’D…

n XY  ( X )(  Y )
ˆ1 
Or
n X 2  ( X ) 2

_ _
 XY  n Y X and we have ˆ 0 Y  ˆ1 X

1  from above
_
 X i2  n X 2
CONT’D…

Example: Given the following sample data of three


pairs of ‘Y’ (dependent variable) and ‘X’
(independent variable).

Yi Xi
10 30
20 50
30 60
CONT’D…

A. find a simple linear regression


function; Y = f(X)

B. Interpret your result.

C. Predict the value of Y when X is


45.
CONT’D…
Solution
a. To fit the regression equation we do the following computations.
Yi | Xi | YiXi | Xi2
10 | 30 |  300 |  900
20 | 50 | 1000 | 2500
30 | 60 | 1800 | 3600
Sum: 60 | 140 | 3100 | 7000
Mean: Ȳ = 20, X̄ = 140/3

$$\hat\beta_1 = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{3(3100) - (140)(60)}{3(7000) - (140)^2} \approx 0.64$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X} = 20 - 0.64(140/3) = -10$$

Thus the fitted regression function is given by: $\hat{Y}_i = -10 + 0.64 X_i$
CONT’D…
B. Interpretation, the value of the intercept term, -10,
implies that the value of the dependent variable ‘Y’ is – 10
when the value of the explanatory variable is zero.

The value of the slope coefficient is a measure of the


marginal change in the dependent variable ‘Y’ when
the value of the explanatory variable increases by one.

 For instance, in this model, the value of ‘Y’ increases on


average by 0.64 units when ‘X’ increases by one.

C. Y= -10+(0.64) (45)=18.8
CONT’D…
 That means when X assumes a value of 45, the
value of Y on average is expected to be 18.8.

 The regression coefficients can also be obtained


by simple formulae by taking the deviations
between the original values and their means.

Then, the coefficients can be represented by


alternative formula:
CONT’D…
Example 2.5: Find the regression equation for the data under Example 2.4, using the
shortcut formula. To solve this problem we proceed as follows.
Yi | Xi | y   | x      | xy     | x2     | y2
10 | 30 | -10 | -16.67 | 166.67 | 277.78 | 100
20 | 50 |   0 |   3.33 |   0.00 |  11.11 |   0
30 | 60 |  10 |  13.33 | 133.33 | 177.78 | 100
Sum: 60 | 140 | 0 | 0 | 300.00 | 466.67 | 200
Mean: 20 | 46.67

Then

$$\hat\beta_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{300}{466.67} \approx 0.64, \qquad \hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X} = 20 - (0.64)(46.67) = -10$$

with results similar to the previous case.

PROPERTIES OF OLS ESTIMATORS

 There can be several samples of the same size


that can be drawn from the same population.
 For each sample, the parameter estimates have
their own specific numerical values. Therefore, the
parameter estimate have different values for
estimating a given true population parameter.
 Thus, the parameter estimates also have a
normal distribution with their associative mean and
variance.
THE MEAN AND VARIANCE OF THE
PARAMETER ESTIMATES

Formula for mean and variance of the respective parameter


estimates and the error term are given below:
1. The mean of β̂1: E(β̂1) = β1 (unbiasedness)
2. The variance of β̂1: var(β̂1) = σu2 / Σxi2
3. The mean of β̂0: E(β̂0) = β0
4. The variance of β̂0: var(β̂0) = σu2 ΣXi2 / (n Σxi2)
5. The estimated value of the variance of the error term: σ̂u2 = Σei2 / (n − k)
STANDARD ERROR OF THE
PARAMETERS
The standard error of each parameter estimate is the positive square root of its variance, e.g. se(β̂1) = √var(β̂1).
HYPOTHESIS TESTING

After estimation of the parameters, hypothesis testing follows.

We have to know that to what extent our estimates are


reliable enough and/or acceptable for further purpose.

That means, we have to evaluate the degree of


representativeness of the estimate to the true
population parameter.

Simply a model must be tested for its significance


before it can be used for any other purpose.
CONT’D…
The available test criteria are divided in to three
groups (Theoretical a priori criteria, statistical
criteria and econometric criteria).
1. Priori criteria set by economic theories are in
line with the consistency of coefficients of
econometric model to the economic theory.
2. Statistical criteria(first order tests), are set by
statistical theory and refer to evaluate the
statistical reliability of the model.
3. Econometric criteria refer to whether the
assumptions of an econometric model
employed in estimating the parameters are
fulfilled or not.
THE COEFFICIENT OF
DETERMINATION (R 2 )

The coefficient of determination is the measure of


the amount or proportion of the total variation of the
dependent variable that is determined or explained
by the model or the presence of the explanatory
variable in the model.

The total variation of the dependent variable is


measured from its arithmetic mean.
CONT’D…

The total variation of the dependent variable is given in the following form;

TSS=ESS + RSS

o which means total sum of square of the dependent variable is split into
explained sum of square and residual sum of square.
CONT’D…
The coefficient of determination is given by the formula

$$R^2 = \frac{\text{Explained Variation in } Y}{\text{Total Variation in } Y} = \frac{\sum(\hat{Y}_i - \bar{Y})^2}{\sum(Y_i - \bar{Y})^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} \qquad (2.16)$$

Since $\sum \hat{y}_i^2 = \hat\beta_1 \sum x_i y_i$, the coefficient of determination can also be given as

$$R^2 = \frac{\hat\beta_1 \sum x_i y_i}{\sum y_i^2}$$

Or

$$R^2 = 1 - \frac{\text{Unexplained Variation in } Y}{\text{Total Variation in } Y} = 1 - \frac{\sum(Y_i - \hat{Y}_i)^2}{\sum(Y_i - \bar{Y})^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2}$$
CONT’D…

The higher the coefficient of determination is the


better the fit.
Conversely, the smaller the coefficient of
determination is the poorer the fit.
One minus the coefficient of determination is called
the coefficient of non-determination, and it
gives the proportion of the variation in the
dependent variable that remained undetermined or
unexplained by the model.
TESTING THE SIGNIFICANCE OF
REGRESSION COEFFICIENT

Since the sample values of the intercept and the


coefficient are estimates of the true population
parameters, we have to test them for their statistical
reliability.

The significance of a model can be seen in terms of


the amount of variation in the dependent variable
that it explains and the significance of the
regression coefficients.
CONT’D…

 There are different tests that are available to test


the statistical reliability of the parameter estimates.
The following are the common ones;

 The standard error test


 The students t-test
 The standard normal test
1. THE STANDARD ERROR
TEST
This test first establishes the two hypotheses (the
null and alternative hypotheses). The two
hypotheses are given as follows:
H0: βi=0
H1: βi≠0
The standard error test is outlined as follows:
1. Compute the standard errors of the parameter
estimates, se(β̂i) = √var(β̂i), using the variance formulas given earlier.
CONT’D…

Compare the standard errors of the estimates with


the regression coefficients and make decision.
A) If the standard error of the estimate is less than
half of the absolute value of the regression coefficient, the
estimate is statistically significant.
If so, reject the null hypothesis and conclude
that the estimate is statistically significant.
CONT’D…

B) If the standard error of the estimate is greater


than half of the numerical value of the estimate,
the parameter estimate is not statistically reliable.

If so, accept the null hypothesis and

conclude that the estimate is not statistically
significant.
2. THE STUDENT T-TEST

In conditions where Z-test is not applied, t-test can


be used to test the statistical reliability of the
parameter estimates and vice versa.

The test depends on the degrees of freedom that the


sample has. The test procedures of t-test are similar
with that of the z-test.
CONT’D…
The procedures are outlined as follows;
 Set up the hypothesis.
 Determine the level of significance (usually a 5% level).
 Determine the tabulated value of t from the table with n-k
degrees of freedom, where k is the number of parameters
estimated.
 Determine the calculated value of t, $t_{cal} = \hat\beta_i / se(\hat\beta_i)$.

The test rule or decision is given as follows:

Reject H0 if $|t_{cal}| > t_{\alpha/2,\, n-k}$
3. THE STANDARD NORMAL
TEST
This test is based on the normal distribution.
The test is applicable if:
The standard deviation of the population is known irrespective
of the sample size
The standard deviation of the population is unknown provided
that the sample size is sufficiently large (n>30).
CONT’D…
The standard normal test or Z-test is outlined
as follows;
 Test the null hypothesis against the alternative
hypothesis
 Determine the level of significant (it is common
in applied econometrics to use 5% level of
significance).
 Determine the theoretical or tabulated value of Z
from the table.
 Make decision.
CONT’D…
The decision of statistical hypothesis testing consists
of two decisions; either accepting the null hypothesis
or rejecting it.
If |Zcal| < Ztab, accept the null hypothesis, while if |Zcal| ≥ Ztab, reject the null
hypothesis.
It is true that most of the times the null and
alternative hypotheses are mutually exclusive.
Accepting the null hypothesis means that rejecting
the alternative hypothesis and rejecting the null
hypothesis means accepting the alternative
hypothesis.
CONT’D…
Example: Suppose a regression gives a coefficient estimate of 29.48 with a
standard error of 36. Test the hypothesis that the true value of the parameter is 25, at the 5%
level of significance, using the standard normal test.
Solution: We have to follow the procedures of the test.
After setting up the hypotheses to be tested (H0: β = 25 against H1: β ≠ 25), the next step is to
determine the level of significance in which the test is carried out.
In the above example the significance level is given as 5%.
The third step is to find the theoretical value of Z at the specified level
of significance. From the standard normal table we get Z0.025 = 1.96.
The fourth step in hypothesis testing is computing the observed or
calculated value of the standard normal distribution:
Zcal = (29.48 − 25)/36 ≈ 0.12. Since the calculated value of the test statistic is less than the
tabulated value, the decision is to accept the null hypothesis and
conclude that the value of the parameter is 25.
CONFIDENCE INTERVAL
ESTIMATION OF THE
REGRESSION COEFFICIENTS
Rejecting the null hypothesis does not mean that the
parameter estimates are correct estimates of the
true population parameters.
It means that the estimate comes from the sample
drawn from the population whose population
parameter is significantly different from zero.

In order to define the range within which the true

parameter lies, we must construct a confidence
interval for the parameter. We can construct 100(1-α)%
confidence intervals for the sample regression
coefficients.
CONT’D…
NB. The standard error of a given coefficient is the positive
square root of the variance of the coefficient.
Variance of the intercept is given by

$$var(\hat\beta_0) = \sigma_u^2 \frac{\sum X_i^2}{n \sum x_i^2}$$

Variance of the slope is given by

$$var(\hat\beta_1) = \frac{\sigma_u^2}{\sum x_i^2}$$

Where $\sigma_u^2 = \dfrac{\sum e_i^2}{n - k}$

is the estimate of the variance of the random term and k
is the number of parameters to be estimated in the model.
CONT’D…
The standard errors are the positive square roots of the
variances, and the 100(1-α)% confidence interval for the
slope is given by:

$$\hat\beta_1 - t_{\alpha/2}(n-k)\,se(\hat\beta_1) \le \beta_1 \le \hat\beta_1 + t_{\alpha/2}(n-k)\,se(\hat\beta_1)$$

that is,

$$\beta_1 = \hat\beta_1 \pm t_{\alpha/2,\,n-k}\,se(\hat\beta_1), \qquad \beta_0 = \hat\beta_0 \pm t_{\alpha/2,\,n-k}\,se(\hat\beta_0)$$
CONT’D…
Example 2.6: The following table gives the quantity supplied
(Y in tons) and its price (X pound per ton) for a commodity
over a period of twelve years.

Y 69 76 52 56 57 77 58 55 67 53 72 64

X 9 12 6 10 9 10 7 8 12 6 11 8
SOLUTION

Data for computation of different parameters

Time | Y  | X  | XY  | X2  | Y2   | x  | y   | xy | x2 | y2  | Ŷ     | ei     | ei2
1    | 69 |  9 | 621 |  81 | 4761 |  0 |   6 |  0 |  0 |  36 | 63.00 |   6.00 |  36.00
2    | 76 | 12 | 912 | 144 | 5776 |  3 |  13 | 39 |  9 | 169 | 72.75 |   3.25 |  10.56
3    | 52 |  6 | 312 |  36 | 2704 | -3 | -11 | 33 |  9 | 121 | 53.25 |  -1.25 |   1.56
4    | 56 | 10 | 560 | 100 | 3136 |  1 |  -7 | -7 |  1 |  49 | 66.25 | -10.25 | 105.06
5    | 57 |  9 | 513 |  81 | 3249 |  0 |  -6 |  0 |  0 |  36 | 63.00 |  -6.00 |  36.00
6    | 77 | 10 | 770 | 100 | 5929 |  1 |  14 | 14 |  1 | 196 | 66.25 |  10.75 | 115.56
7    | 58 |  7 | 406 |  49 | 3364 | -2 |  -5 | 10 |  4 |  25 | 56.50 |   1.50 |   2.25
8    | 55 |  8 | 440 |  64 | 3025 | -1 |  -8 |  8 |  1 |  64 | 59.75 |  -4.75 |  22.56
9    | 67 | 12 | 804 | 144 | 4489 |  3 |   4 | 12 |  9 |  16 | 72.75 |  -5.75 |  33.06
10   | 53 |  6 | 318 |  36 | 2809 | -3 | -10 | 30 |  9 | 100 | 53.25 |  -0.25 |   0.06
11   | 72 | 11 | 792 | 121 | 5184 |  2 |   9 | 18 |  4 |  81 | 69.50 |   2.50 |   6.25
CONT’D…
1. Estimate the Coefficient of determination (R2), Coefficient
of non-determination (1- R2) and interpret the result.
2. Find the variance and standard error of the intercept
3. Find the variance and standard error of the slope
4. Run significance test of regression coefficients using the
following test methods
The standard error test
The students t-test

5. Fit the linear regression equation and determine the 95%


confidence interval for the slope.
SOLUTION
1. Estimate the Coefficient of determination (R2)

$$R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} = 1 - \frac{387}{894} = 1 - 0.43 = 0.57$$

This result shows that 57% of the variation in the quantity


supplied of the commodity under consideration is explained by
the variation in the price of the commodity; and the rest 43%
remain unexplained by the price of the commodity
CONT’D…
4. Run significance test of regression coefficients using fitted
regression line for the data given:

Yi 33.75  3.25 X i
(8.3) (0.9)

A. Standard Error test


Since the standard error is less than half of the numerical
value of the slope, we have to reject the null hypothesis and
conclude that is statistically significant.
CONT’D…
B. The students t-test
$$t_{cal} = \frac{\hat\beta_i}{se(\hat\beta_i)} = \frac{3.25}{0.8979} = 3.62$$

Further, the tabulated value of t (at the 5% level with 10 degrees of freedom) is 2.228.


When we compare these two values, the calculated t is greater
than the tabulated value. Hence, we reject the null hypothesis.
Rejecting the null hypothesis means, concluding that the price
of the commodity is significant in determining the quantity
supplied for the commodity.
CONT’D…
Generally, a two tail test of a null hypothesis at 5% level of
significance can be reduced to the following two t-rules.

1. If tcal is greater than 2 or less than -2, we reject the null

hypothesis.
2. If tcal lies between -2 and 2, we accept the null
hypothesis.
CONT’D…
5. To estimate confidence interval we need standard error
which is determined as follows:
$$\hat\sigma_u^2 = \frac{\sum e_i^2}{n-k} = \frac{387}{12-2} = 38.7$$

$$var(\hat\beta_1) = \frac{\hat\sigma_u^2}{\sum x^2} = 38.7\left(\frac{1}{48}\right) = 0.80625, \qquad se(\hat\beta_1) = \sqrt{0.80625} = 0.8979$$

$$\beta_1 = \hat\beta_1 \pm t_{\alpha/2,\,n-k}\,se(\hat\beta_1) = 3.25 \pm (2.228)(0.8979) = 3.25 \pm 2 \;\Rightarrow\; (1.25,\; 5.25)$$


PROPERTIES OF OLS
ESTIMATORS

The ideal or optimum properties that the OLS estimates


possess may be summarized by well known theorem
known as the Gauss-Markov Theorem.

Statement of the theorem: “Given the assumptions of the


classical linear regression model, the OLS estimators, in
the class of linear and unbiased estimators, have the
minimum variance, i.e. the OLS estimators are BLUE.”
CONT’D…

The least squares estimators are linear, unbiased and have

minimum variance (i.e. are best of all linear unbiased

estimators).

An estimator is called BLUE (Best, Linear, Unbiased


Estimator) if it is:

Linear: a linear function of a random variable, such as
the dependent variable Y.

Unbiased: its expected (mean) value is equal to the true population parameter.
CONT’D…

Minimum variance: It has a minimum variance in the class


of linear and unbiased estimators.

NB: An unbiased estimator with the least variance is known


as an efficient estimator.

According to the Gauss-Markov theorem, the OLS estimators


possess all the BLUE properties.
Any Question?
QUIZ

Given the following sample data of three pairs


of ‘Y’ (dependent variable) and ‘X’
(independent variable).
Yi Xi
10 30
20 50
30 60
QUIZ…

1. Fit the regression equation


2. Estimate the Coefficient of determination (R2), Coefficient of non-
determination (1- R2) and interpret the result.
3. Find the variance and standard error of the intercept
4. Find the variance and standard error of the slope
5. Run significance test of regression coefficients using the following
test methods
The standard error test
The students t-test

6. Fit the linear regression equation and determine the 95% confidence
interval for the slope.
UNIT 4: MULTIPLE REGRESSION
ANALYSIS

 Multiple Regression Models


 Notations and Assumptions
 Estimation of Partial regression
coefficients
 Analysis of Variance
 Hypothesis Testing
 Dummy Variable Regression Analysis
MULTIPLE REGRESSION MODEL
The multiple linear regression (population regression function),
in which we have one dependent variable Y and k explanatory
variables (X1, X2, ..., Xk), is given by:

$$Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + u_i$$

Where, β0 = the intercept = value of Y when all X's are zero
βi = the partial slope coefficients
ui = the random term
CONT’D…

But for the sake of simplicity possible regression


model with two-variables regression is presented
as:

Yi   0   1 X 1   2 X 2  u i
NOTATIONS AND ASSUMPTIONS

Randomness of ui - the variable u is a real random


variable.
Zero mean of ui - the random variable has a zero mean for
each value
Homoscedasticity of the random term - the random term
has constant variance. In other words, the variance is the
same for all the explanatory values.
Normality - the values of ui are normally distributed, with

$$E(u_i) = 0, \qquad E(u_i^2) = \sigma_u^2 \;(\text{constant}), \qquad u_i \sim N(0, \sigma_u^2)$$
ESTIMATION OF PARTIAL
REGRESSION COEFFICIENTS

The sample regression function will look like the following:

$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2, \qquad e_i = Y_i - \hat{Y}_i$$

We choose $\hat\beta_0$, $\hat\beta_1$ and $\hat\beta_2$ in such a way that $\sum e_i^2 = \sum_{i=1}^{n}(Y_i - \hat\beta_0 - \hat\beta_1 X_1 - \hat\beta_2 X_2)^2$ is minimum. The first-order conditions are:

$$\frac{\partial \sum e_i^2}{\partial \hat\beta_0} = 0 \qquad (3.5)$$

$$\frac{\partial \sum e_i^2}{\partial \hat\beta_1} = 0 \qquad (3.6)$$

$$\frac{\partial \sum e_i^2}{\partial \hat\beta_2} = 0 \qquad (3.7)$$
ESTIMATION OF PARTIAL
REGRESSION COEFFICIENTS
Solving equations (3.5), (3.6) and (3.7) simultaneously, we
obtain the system of normal equations given as follows:

$$\sum Y_i = n\hat\beta_0 + \hat\beta_1\sum X_{1i} + \hat\beta_2\sum X_{2i} \qquad (3.8)$$

$$\sum X_{1i}Y_i = \hat\beta_0\sum X_{1i} + \hat\beta_1\sum X_{1i}^2 + \hat\beta_2\sum X_{1i}X_{2i} \qquad (3.9)$$

$$\sum X_{2i}Y_i = \hat\beta_0\sum X_{2i} + \hat\beta_1\sum X_{1i}X_{2i} + \hat\beta_2\sum X_{2i}^2 \qquad (3.10)$$
CONT’D…
Then, letting

$$x_{1i} = X_{1i} - \bar{X}_1 \;(3.11), \qquad x_{2i} = X_{2i} - \bar{X}_2 \;(3.12), \qquad y_i = Y_i - \bar{Y} \;(3.13)$$

$$\hat\beta_1 = \frac{(\sum x_1 y)(\sum x_2^2) - (\sum x_2 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \qquad (3.14)$$

$$\hat\beta_2 = \frac{(\sum x_2 y)(\sum x_1^2) - (\sum x_1 y)(\sum x_1 x_2)}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} \qquad (3.15)$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X}_1 - \hat\beta_2\bar{X}_2 \qquad (3.16)$$
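A small Python sketch of formulas (3.14)–(3.16); the function is a hypothetical helper written for illustration, and the five data points are made-up numbers used only to cross-check the result against numpy's general least-squares solver.

```python
import numpy as np

def ols_two_regressors(Y, X1, X2):
    """OLS estimates for Y = b0 + b1*X1 + b2*X2 + u using the
    deviation-form formulas (3.14)-(3.16)."""
    y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()
    Sx1y, Sx2y = np.sum(x1 * y), np.sum(x2 * y)
    Sx1x1, Sx2x2, Sx1x2 = np.sum(x1**2), np.sum(x2**2), np.sum(x1 * x2)
    denom = Sx1x1 * Sx2x2 - Sx1x2**2
    b1 = (Sx1y * Sx2x2 - Sx2y * Sx1x2) / denom
    b2 = (Sx2y * Sx1x1 - Sx1y * Sx1x2) / denom
    b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
    return b0, b1, b2

# Hypothetical illustrative data
Y  = np.array([3.0, 5.0, 6.0, 9.0, 11.0])
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
print(ols_two_regressors(Y, X1, X2))

# Cross-check against the general least-squares solution
A = np.column_stack([np.ones_like(X1), X1, X2])
print(np.linalg.lstsq(A, Y, rcond=None)[0])   # should match the estimates above
```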
VARIANCE AND STANDARD
ERRORS OF OLS ESTIMATORS

 Estimating the numerical values of the parameters


is not enough in econometrics if the data are coming
from the samples.
 The standard errors derived are important for two
main purposes:
 To establish confidence intervals for the
parameters and
 To test statistical hypotheses.
 2
 2   
 X 1  x 2  X 2  x1  2 X 1 X 2  x1 x 2 
2 2
^2 1
 ^
Var   0   ui   
n 
 
  x1  x 2  ( x1 x 2 )
2 2 2

 
CONT’D…
$$SE(\hat\beta_0) = \sqrt{Var(\hat\beta_0)}$$

$$Var(\hat\beta_1) = \hat\sigma_u^2\,\frac{\sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}, \qquad SE(\hat\beta_1) = \sqrt{Var(\hat\beta_1)}$$

$$Var(\hat\beta_2) = \hat\sigma_u^2\,\frac{\sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2}, \qquad SE(\hat\beta_2) = \sqrt{Var(\hat\beta_2)}$$

$$\hat\sigma_u^2 = \frac{\sum e_i^2}{n-3}$$
COEFFICIENT OF MULTIPLE
DETERMINATION
 Is the measure of the proportion of the variation in the
dependent variable that is explained jointly by the
independent variables in the model.

$$R^2 = \frac{\sum \hat{y}^2}{\sum y^2} \qquad (3.24)$$

 However, every time we insert an additional explanatory
variable in the model, the R2 increases irrespective of the
improvement in the goodness-of-fit of the model.

$$R^2 = \frac{\hat\beta_1\sum x_1 y + \hat\beta_2\sum x_2 y}{\sum y^2} \qquad (3.25)$$
CONT’D…
 That means a higher R2 value may not imply that the
model is good. The adjusted R2 corrects for the number of regressors:

$$\bar{R}^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-k} \qquad (3.26)$$

where k is the number of parameters.
 In multiple linear regression, therefore, we better interpret
the adjusted R2 than the ordinary or unadjusted R2.
CONFIDENCE INTERVAL
ESTIMATION
Interpretation of the confidence interval: Values of the
parameter lying in the interval are plausible with 100(1-α)%
confidence.

$$\beta_i = \hat\beta_i \pm t_{\alpha/2,\,n-k}\,se(\hat\beta_i)$$
HYPOTHESIS TESTING

 Testing hypothesis about an individual partial regression


coefficient;

 Testing the overall significance of the estimated multiple


regression model;

 Testing if two or more coefficients are equal to one


another;

Testing the stability of the estimated regression model


over time/cross-sectional units
TESTING INDIVIDUAL REGRESSION
COEFFICIENTS
using the standard error test or the t-test

H 0 : ˆ1 0 H 0 : ˆ 2 0 H 0 : ˆ K 0
H : ˆ 0 H : ˆ 0
1 2
H 1 : ˆ K 0
1 1

A. Standard Error Test: decision rule is based on the relationship


between the numerical value of the parameter and the standard
error of the
1
same parameter.
S ( ˆ i )  ˆ i
2
If , we reject the null hypothesis, i.e. the estimate is
statistically significant.

Generalisation: The smaller the standard error, the stronger is the


evidence that the estimates are statistically significant.
CONT’D…

B. t-test – the more appropriate and formal way to test the hypothesis.
compute the t-ratios and compare them with the tabulated t-values and
make our decision.

Decision Rule: accept H0 if |tcal| < ttab; reject otherwise.

 Rejecting H0 means, the coefficient being tested is significantly

different from 0.

Accepting H0, on the other hand, means we don’t have sufficient

evidence to conclude that the coefficient is different from 0.


TESTING THE OVERALL SIGNIFICANCE OF REGRESSION MODEL

Hypotheses of such type are often called joint


hypotheses.
Testing the overall significance of the model means
testing the null hypothesis that none of the
explanatory variables in the model significantly
determine the changes in the dependent variable.

H 0 :  1   2 0
H 1 :  i 0, at least for one i.
CONT’D…

The test statistic for this test is given by:

$$F_{cal} = \frac{\sum \hat{y}^2 / (k-1)}{\sum e^2 / (n-k)}$$

Where, k is the number of parameters in the model.


CONT’D…
Overall significance test of a model are summarized
in the analysis of variance (ANOVA) table as follows:

Source of variation | Sum of squares | Degrees of freedom | Mean sum of squares   | Fcal
Regression          | SSE = Σŷ2      | k − 1              | MSE = Σŷ2 / (k − 1)   | F = MSE / MSR
Residual            | SSR = Σe2      | n − k              | MSR = Σe2 / (n − k)   |
Total               | SST = Σy2      | n − 1              |                       |
CONT’D…

These three sums of squares are:

$$SSE = \sum\hat{y}^2 = \sum(\hat{Y}_i - \bar{Y})^2 \quad \text{(Explained Sum of Squares)}$$

$$SSR = \sum e_i^2 = \sum(Y_i - \hat{Y}_i)^2 \quad \text{(Unexplained/Residual Sum of Squares)}$$

$$SST = \sum y^2 = \sum(Y_i - \bar{Y})^2 \quad \text{(Total Sum of Squares)}$$

$$SST = SSE + SSR$$
The test rule: reject H0 if Fcal ≥ Ftab


RELATIONSHIP BETWEEN F
AND R 2
Since $R^2 = \dfrac{\sum\hat{y}^2}{\sum y^2}$, we have $\sum\hat{y}^2 = R^2\sum y^2$ and, because $R^2 = 1 - \dfrac{\sum e^2}{\sum y^2}$, also $\sum e^2 = (1 - R^2)\sum y^2$.

Substituting into the F statistic:

$$F_{cal} = \frac{\sum\hat{y}^2/(k-1)}{\sum e^2/(n-k)} = \frac{R^2\sum y^2/(k-1)}{(1-R^2)\sum y^2/(n-k)} = \frac{(n-k)}{(k-1)}\cdot\frac{R^2}{(1-R^2)}$$
EXAMPLE

The following table shows a particular country’s imports (Y),


the level of Gross National Product(X1) measured in arbitrary

units, and the price index of imported goods (X2), over 12


years period.
CONT’D…

Year | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | 1970 | 1971
Y    |   57 |   43 |   73 |   37 |   64 |   48 |   56 |   50 |   39 |   43 |   69 |   60
X1   |  220 |  215 |  250 |  241 |  305 |  258 |  354 |  321 |  370 |  375 |  385 |  385
X2   |  125 |  147 |  118 |  160 |  128 |  149 |  145 |  150 |  140 |  115 |  155 |  152
CONT…
a. Estimate the coefficients of the economic relationship and fit the model.

Year | Y | X1 | X2 | x1 | x2 | y | x12 | x22 | x1y | x2y | x1x2 | y2

1960 57 220 125 -86.5833 -15.3333 3.75 7496.668 235.1101 -324.687 -57.4999 1327.608 14.0625

1961 43 215 147 -91.5833 6.6667 -10.25 8387.501 44.44489 938.7288 -68.3337 -610.558 105.0625

1962 73 250 118 -56.5833 -22.3333 19.75 3201.67 498.7763 -1117.52 -441.083 1263.692 390.0625

1963 37 241 160 -65.5833 19.6667 -16.25 4301.169 386.7791 1065.729 -319.584 -1289.81 264.0625

1964 64 305 128 -1.5833 -12.3333 10.75 2.506839 152.1103 -17.0205 -132.583 19.52731 115.5625

1965 48 258 149 -48.5833 8.6667 -5.25 2360.337 75.11169 255.0623 -45.5002 -421.057 27.5625

1966 56 354 145 47.4167 4.6667 2.75 2248.343 21.77809 130.3959 12.83343 221.2795 7.5625

1967 50 321 150 14.4167 9.6667 -3.25 207.8412 93.44509 -46.8543 -31.4168 139.3619 10.5625

1968 39 370 140 63.4167 -0.3333 -14.25 4021.678 0.111089 -903.688 4.749525 -21.1368 203.0625

1969 43 375 115 68.4167 -25.3333 -10.25 4680.845 641.7761 -701.271 259.6663 -1733.22 105.0625

1970 69 385 155 78.4167 14.6667 15.75 6149.179 215.1121 1235.063 231.0005 1150.114 248.0625

1971 60 385 152 78.4167 11.6667 6.75 6149.179 136.1119 529.3127 78.75022 914.8641 45.5625
Sum  | 639   | 3679     | 1684     | 0.0004 | 0.0004 | 0 | 49206.92 | 2500.667 | 1043.25 | -509 | 960.6667 | 1536.25
Mean | 53.25 | 306.5833 | 140.3333 | 0      | 0      | 0 |
CONT…

a. Estimate the parameters and fit the model.

b. Compute the variance and standard errors of the slopes.

c. Calculate and interpret the coefficient of determination


(the adjusted one).

d. Test the overall significance of the model.


CONT’D…

 Y 639 X 1 3679 X 2 1684 n 12

Y 639
Y   53.25
n 12

X 1
3679
X1   306.5833
n 12

X 2
1684
X2   140.3333
n 12
The summary results in deviation forms are then given by:

x x
2 2
1 49206.92 2 2500.667

 x y 1043.25
1 x 2 y  509

x x1 2 960.6667 y 2
1536.25
CONT’D…
Using equations (3.14) to (3.16) with the summary results above:

$$\hat\beta_1 = \frac{(1043.25)(2500.667) - (-509)(960.6667)}{(49206.92)(2500.667) - (960.6667)^2} \approx 0.025365$$

$$\hat\beta_2 = \frac{(-509)(49206.92) - (1043.25)(960.6667)}{122127241} \approx -0.21329$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1\bar{X}_1 - \hat\beta_2\bar{X}_2 = 53.25 - (0.025365)(306.5833) - (-0.21329)(140.3333) \approx 75.40512$$

CONT’D…
The fitted model is then written as: $\hat{Y}_i = 75.40512 + 0.025365X_1 - 0.21329X_2$

b) Compute the variance and standard errors of the slopes.
First, compute the estimate of the variance of the random term:

$$\hat\sigma_u^2 = \frac{\sum e_i^2}{n-3} = \frac{1401.223}{12-3} = \frac{1401.223}{9} = 155.69143$$

Variance and standard error of $\hat\beta_1$:

$$Var(\hat\beta_1) = \hat\sigma_u^2\,\frac{\sum x_2^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = 155.69143\left(\frac{2500.667}{122127241}\right) = 0.003188$$

$$SE(\hat\beta_1) = \sqrt{0.003188} = 0.056462$$

Variance and standard error of $\hat\beta_2$:

$$Var(\hat\beta_2) = \hat\sigma_u^2\,\frac{\sum x_1^2}{(\sum x_1^2)(\sum x_2^2) - (\sum x_1 x_2)^2} = 155.69143\left(\frac{49206.92}{122127241}\right) = 0.0627$$

$$SE(\hat\beta_2) = \sqrt{0.0627} = 0.25046$$
CONT’D…
Similarly, the standard error of the intercept is found to be 37.98177. The detail is left for you as
an exercise.
c) Calculate and interpret the coefficient of determination.
We can use the following summary results to obtain the R2:

$$\sum\hat{y}^2 = 135.0262, \qquad \sum e^2 = 1401.223, \qquad \sum y^2 = 1536.25 \;(\text{the sum of the above two})$$

Then,

$$R^2 = \frac{\hat\beta_1\sum x_1 y + \hat\beta_2\sum x_2 y}{\sum y^2} = \frac{(0.025365)(1043.25) + (-0.21329)(-509)}{1536.25} \approx 0.087894$$

$$\text{or } R^2 = 1 - \frac{\sum e^2}{\sum y^2} = 1 - \frac{1401.223}{1536.25} \approx 0.087894$$

Compute the adjusted R2:

$$\bar{R}^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-k} = 1 - (1 - 0.087894)\frac{12-1}{12-3} \approx -0.114796$$
CONT’D…
Test the significance of X1 and X2 in determining the changes in Y using the t-test.
The hypotheses are summarized in the following table:

Coefficient | Hypothesis         | Estimate | Std. error | Calculated t                      | Conclusion
β1          | H0: β1=0, H1: β1≠0 | 0.025365 | 0.056462   | t = 0.025365/0.056462 = 0.449249  | Do not reject H0 since |tcal| < ttab
β2          | H0: β2=0, H1: β2≠0 | -0.21329 | 0.25046    | t = -0.21329/0.25046 = -0.85159   | Do not reject H0 since |tcal| < ttab

The critical value (t at the 5% level, two-tailed, with 9 degrees of freedom) is 2.262. Like the standard error test, the t-test revealed
that both X1 and X2 are insignificant in determining the change in Y since the calculated t values
are both less than the critical value in absolute terms.
CONT’D…
d) Test the overall significance of the model. (Hint: use α = 0.05)
This involves testing whether at least one of the two variables X1 and X2 determines the changes
in Y. The hypothesis to be tested is given by:

$$H_0: \beta_1 = \beta_2 = 0$$
$$H_1: \beta_i \neq 0, \text{ at least for one } i$$

The ANOVA table for the test is given as follows:

Source of variation | Sum of squares  | Degrees of freedom  | Mean sum of squares       | Fcal
Regression          | Σŷ2 = 135.0262  | k − 1 = 3 − 1 = 2   | 135.0262 / 2 = 67.51309   | F = 67.51309 / 155.69 = 0.433634
Residual            | Σe2 = 1401.223  | n − k = 12 − 3 = 9  | 1401.223 / 9 = 155.69     |
Total               | Σy2 = 1536.25   | n − 1 = 12 − 1 = 11 |                           |

The tabulated F value (critical value) is F0.05(2, 9) = 4.26.


CONT’D…

Or…

$$F_{cal} = \frac{(n-k)}{(k-1)}\cdot\frac{R^2}{(1-R^2)} = \frac{(12-3)}{(3-1)}\cdot\frac{0.087894}{1 - 0.087894} \approx 0.4336$$

The calculated F value (0.4336) is less than the tabulated value

(4.26). Hence, we accept the null hypothesis and conclude that
there is no significant contribution of the variables X1 and
X2 to the changes in Y.
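The whole worked example can be reproduced with statsmodels; this is a sketch assuming the library is installed, and the printed values should match the hand computations above up to rounding.

```python
import numpy as np
import statsmodels.api as sm

# Imports (Y), GNP (X1) and import price index (X2), 1960-1971
Y  = np.array([57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60])
X1 = np.array([220, 215, 250, 241, 305, 258, 354, 321, 370, 375, 385, 385])
X2 = np.array([125, 147, 118, 160, 128, 149, 145, 150, 140, 115, 155, 152])

X = sm.add_constant(np.column_stack([X1, X2]))
res = sm.OLS(Y, X).fit()

print(res.params)                        # about (75.41, 0.0254, -0.2133)
print(res.bse)                           # coefficient standard errors
print(res.rsquared, res.rsquared_adj)    # about 0.088 and -0.11
print(res.fvalue, res.f_pvalue)          # F about 0.43, clearly insignificant
```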
Any Question?
EXTENSIONS OF REGRESSION MODELS

The relationship between Y and X can be non-linear rather


than linear.
The choice of a functional form for an equation is a vital
part of the specification of that equation.
The choice of a functional form almost always should be
based on an examination of the underlying economic
theory.
The one that comes closest to the underlying theory
should be chosen for the equation.
CONT’D…
Some Commonly Used Functional Forms
The Linear Form: It is based on the assumption that the slope
of the relationship between the independent variable and the
dependent variable is constant.

The slope is the coefficient itself: $\dfrac{\partial Y}{\partial X} = \beta_i$.

In this case the elasticity is not constant:

$$\eta_{Y,X} = \frac{\partial Y / Y}{\partial X / X} = \beta_i \frac{X}{Y}$$

Economic theory frequently predicts only the sign of a
relationship and not its functional form.
CONT’D…
Log-linear, double Log or constant elasticity model
The most common functional form that is non-linear in the
variable (but still linear in the coefficients) is the log-linear
form.
 A log-linear form is often used because the elasticities, and
not the slopes, are constant, i.e. η = β1 = constant.
CONT’D…
$$Y_i = \beta_0 X_i^{\beta_1}$$

With the random term included (in multiplicative form):

$$Y_i = \beta_0 X_i^{\beta_1} e^{U_i} \;\Rightarrow\; \ln Y_i = \ln\beta_0 + \beta_1 \ln X_i + U_i$$

(Figures omitted: a demand curve plotted against price in levels and in logs, and an output–input relationship.)
CONT’D…
The model is also called a constant elasticity model because
the coefficient of elasticity between Y and X (1) remains
constant.

$$\eta = \frac{\partial Y}{\partial X}\cdot\frac{X}{Y} = \frac{d\ln Y}{d\ln X} = \beta_1$$
CONT’D…
Semi-log Form

$$Y_i = \beta_0 + \beta_1 \ln X_{1i} + U_i \qquad \text{or} \qquad \ln Y_i = \beta_0 + \beta_1 X_{1i} + U_i$$

(Figure omitted: curves of the semi-log form for β1 > 0 and β1 < 0.)
CONT’D…
Polynomial Form

$$Y = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{2i} + U_i$$

$$\frac{\partial Y}{\partial X_1} = \beta_1 + 2\beta_2 X_1, \qquad \frac{\partial Y}{\partial X_2} = \beta_3$$

(Figures omitted: A) the impact of age on earnings; B) a typical cost curve.)
CONT…
Reciprocal Transformation (Inverse Functional Forms)

$$Y_i = \beta_0 + \beta_1\left(\frac{1}{X_{1i}}\right) + \beta_2 X_{2i} + U_i$$

As X1 grows large, the term β1/X1i approaches zero and Y approaches the asymptote β0.

(Figures omitted: the curve approaches β0 from above when β1 > 0 and from below when β1 < 0.)
DUMMY VARIABLE REGRESSION
ANALYSIS
Dummy variables are discrete variables taking a value of ‘0’ or
‘1’. They are often called ‘on’ ‘off’ variables, being ‘on’ when
they are 1.
Dummy variables can be used either as explanatory variables
or as the dependent variable.
When they act as the dependent variable there are specific
problems with how the regression is interpreted, however when
they act as explanatory variables they can be interpreted in the
same way as other variables.
CONT…
Generally, there are four types of measurement scales: nominal, ordinal, interval and ratio scale variables.
Regression models do not deal only with ratio scale variables; they can also
involve nominal and ordinal scale variables.
For example, consider Yi = β0 + β1D1i + β2D2i + ui,

where Yi = the (average) salary of public school teachers in state i, D1i = 1 if

the state is in the Northeast, 0 otherwise (i.e. in other regions of the country),
and D2i = 1 if the state is in the South, 0 otherwise (i.e. in other regions of the
country).
A rule of dummy variable regression is that if there are m categories, we need only m−1
dummy variables.
CONT…
Dummy variables that represent a change in policy:
 Intercept dummy variables, that pick up a change in the intercept of the regression
 Slope dummy variables, that pick up a change in the slope of the regression

Suppose we want to analyze the effect of sex (D) on farm productivity of

maize (y), where
Di = 1 if the household head is male

Di = 0 if the household head is female

$$y_i = \alpha + \beta D_i + u_i$$
CONT…
We can model this in the following way:
yi = α + βDi + ui
This produces an average maize farm productivity for female household heads (hhh) of E(y|Di = 0) = α.
 The average maize farm productivity of male hhh will be E(y|Di = 1) = α + β.
 If β is positive and statistically significant, this suggests that male hhh have higher maize farm productivity than female hhh.
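A minimal sketch of this dummy-variable regression, assuming Python with statsmodels and purely simulated data (not from any actual survey):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: D = 1 for male household heads, 0 for female
rng = np.random.default_rng(2)
D = rng.integers(0, 2, size=120).astype(float)
y = 15 + 3 * D + rng.normal(0, 2, size=120)   # maize productivity (e.g. qt/ha)

res = sm.OLS(y, sm.add_constant(D)).fit()
print(res.params)        # [alpha, beta]: female mean and the male-female gap
print(res.tvalues[1])    # t-test of whether sex has a significant effect
```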
CONT…

When we have a single dummy variable, we have information for both categories in the model.
Also note that Male = 1 − Female.
Thus having both a dummy for male and one for female is redundant.
As a result, we always omit one category, whose mean is captured by the model’s intercept (α, i.e. the female category here).
This omitted category is called the reference category.
CONT…

So far we have been dealing with variables that we can measure in quantitative terms.
However, there are cases where certain variables of great importance are qualitative in nature.
For instance, we may believe that the level of aggregate consumption expenditure depends not only on disposable income, but also on whether the country is in a period of war or peace.
During wartime we expect consumption to be lower than in peacetime.
CON’T…
One approach to this problem would simply be to estimate two separate
consumption functions and obtain two consumption equations.

There is, however, a more efficient procedure involving the estimation of only one equation, if we are willing to make certain assumptions.

Suppose that we hypothesize that war time controls do not alter the marginal
propensity to consume out of disposable income, but instead simply reduce the
average propensity to consume.
CON’T…
By this we mean that the slope remains the same, whereas the constant term becomes smaller in the wartime case.
With this assumption, the consumption function becomes
Ct = b0 + b1Ydt + b2Dt + ut ,   t = 1, 2, ..., n
CON’T…
where Dt = 0 during peacetime years
   Dt = 1 for war years
The equation above says that during peacetime, when Dt = 0, we have
Ct = b0 + b1Ydt + ut
which in a period of war (Dt = 1) becomes
Ct = (b0 + b2) + b1Ydt + ut
CON’T…
Suppose the time period under consideration has both war and peace periods.

Using the data, we could estimate the values of the coefficients in the equation with our standard multiple regression methods.
CON’T…
Suppose that we in fact did this and obtained the equation

ˆ 40  0.9Y  30 D
C t dt t

Let us say that the t-ratio corresponding to Dt was of sufficient size to suggest that the parameter b2 is not zero.

We would then conclude that the war had a significant negative effect on
consumption expenditures. The estimated consumption function would be
CON’T…

 Cˆ t 40  0.9Ydt , for years of peace

 Cˆ t 10  0.9Ydt , for war years

If consumption expenditures are measured in billions of dollars, a


comparison of the above two equation would then suggest that, for
corresponding levels of income, consumption expenditures were 30
billion dollars less during years of war.
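As a quick arithmetic check of this interpretation, the fitted equation reported above can be evaluated directly; the income level used below is arbitrary.

```python
def c_hat(yd, war):
    """Fitted consumption from the estimated equation above (billions of dollars)."""
    return 40 + 0.9 * yd - 30 * war

print(c_hat(500, war=0))   # 490 in peacetime
print(c_hat(500, war=1))   # 460 in wartime: 30 billion lower at the same income
```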
CON’T…
The use of dummy variables is an extremely powerful extension of regression
analysis.

It allows us to expand the scope of our analysis to encompass variables that
we cannot measure in quantitative terms.

With dummy variables, we can take account of the effects of important qualitative factors that influence the values of our dependent variable.
CON’T…
We can use as many dummy variables as we like as independent
variables in a regression equation, providing that we have a
sufficient number of observations to allow us to estimate the
equation.
But care should be taken to avoid the dummy variable trap.
For a qualitative variable with s categories, we choose one of them as the omitted category (without loss of generality, category 1) and define dummy variables D2, ..., Ds for the rest.
What is the dummy variable trap?
The coefficient of each dummy variable represents the increase in the intercept relative to that of the basic (omitted) category. If no category is omitted, there is no basic category for such a comparison; this is the dummy variable trap.
CON’T…

Observation Category X1 D1 D2 D3 D4
1 4 1 0 0 0 1
2 3 1 0 0 1 0
3 1 1 1 0 0 0
4 2 1 0 1 0 0
5 2 1 0 1 0 0
6 3 1 0 0 1 0
7 1 1 1 0 0 0
8 4 1 0 0 0 1
CON’T…
Mathematically, we have a special case of exact multicollinearity. If there is no
omitted category, there is an exact linear relationship between X1 and the dummy
variables.
X1 is the variable whose coefficient is b1. It is equal to 1 in all observations.
Usually we do not write it explicitly because there is no need to do so.

If there is an exact linear relationship among a set of variables, it is impossible in principle to estimate the separate coefficients of those variables. To understand this properly, one needs to use linear algebra.
CON’T…
If you tried to run the regression anyway, the regression
application should detect the problem and do one of two
things. It may simply refuse to run the regression.

Alternatively, it may run it, dropping one of the variables in the linear relationship, effectively defining the omitted category by itself.
There is another way of avoiding the dummy variable trap: drop the intercept (and X1). There is then no longer a problem, because there is no longer an exact linear relationship linking the variables.
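In practice the reference category is often handled by the software. A sketch, assuming Python with pandas (not mentioned in the slides), using the four-category variable from the earlier table:

```python
import pandas as pd

# Hypothetical 4-category variable, as in the table above
cat = pd.Series([4, 3, 1, 2, 2, 3, 1, 4], name="category")

# drop_first=True omits category 1 as the reference, leaving m-1 = 3 dummies
dummies = pd.get_dummies(cat, prefix="D", drop_first=True, dtype=float)
print(dummies)

# Keeping all four dummies together with an intercept would reproduce the trap:
# D_1 + D_2 + D_3 + D_4 would equal the constant column in every observation.
```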
UNIT 5: ECONOMETRIC
PROBLEMS

Assumptions Revisited
Non-normality
Heteroscedasticity
Autocorrelation
Multicollinearity
ASSUMPTIONS REVISITED

Two major problems arise in applying the classical linear regression model:
1. those due to assumptions about the specification of the model and about the disturbances, and
2. those due to assumptions about the data.
ASSUMPTIONS

 The regression model is linear in parameters.


 The values of the explanatory variables are fixed in repeated
sampling (non-stochastic).
 The mean of the disturbance (ui) is zero for any given value of
X
i.e. E(ui) = 0
 The variance of ui is constant i.e. homoscedastic
 There is no autocorrelation in the disturbance terms
 The explanatory variables are distributed independently of the ui.
CON’T…
 The number of observations must be greater than the
number of explanatory variables.
 There is no linear relationship (multicollinearity) among the
explanatory variables.

 The stochastic (disturbance) term ui are normally distributed

i.e., ui ~ N(0, ²)

 The regression model is correctly specified

i.e., no specification error.


VIOLATIONS OF
ASSUMPTIONS
The Zero Mean Assumption i.e. E(ui)=0

 If this assumption is violated, we obtain a biased estimate of the intercept term.
 But, since the intercept term is not very important, we can live with this.
 The intercept term often has no physical interpretation.
 The slope coefficients remain unaffected even if the assumption is violated.
Non-normality

 The OLS estimators are BLUE regardless of whether the ui are normally distributed or not.
 In addition, because of the central limit theorem, we can argue that the test procedures (the t-tests and F-tests) are still valid asymptotically, i.e. in large samples.
HETEROSCEDASTICITY: THE ERROR
VARIANCE IS NOT CONSTANT

If the error terms in the regression equation have a common variance, they are homoscedastic; if they do not have a common variance, we say they are heteroscedastic.
The basic questions to be addressed are:
What is the nature of the problem?
What are the consequences of the problem?
How do we detect (diagnose) the problem?
What remedies are available for the problem?
THE NATURE OF THE
PROBLEM

 In the case of homoscedastic disturbance terms, the spread around the mean is constant.
 But in the case of heteroscedastic disturbance terms, the variance changes with the explanatory variable.
 The problem of heteroscedasticity is likely to be more common in cross-sectional than in time-series data.
CAUSES OF HETEROSCEDASTICITY

 Following error-learning models, as people learn, their errors of behavior become smaller over time, and so do the error variance and standard error.
 As income grows, people have more discretionary income and hence more scope for choice about the disposition of their income. Hence, the variance of the regression error is more likely to increase with income.
 Improvements in data collection techniques will reduce errors (and hence the error variance).
CON’T…

 Existence of outliers might also cause heteroscedasticity.


 Misspecification of a model can also be a cause for
heteroscedasticity.
 Skewness in the distribution of one or more explanatory
variables included in the model is another source of
heteroscedasticity.

 Incorrect data transformation and incorrect functional form are also other sources.
CONSEQUENCES OF
HETEROSCEDASTICITY
 If the error terms of an equation are heteroscedastic:
 The estimators are still linear.
 The least squares estimators are still unbiased.
But there are three major consequences:
 Heteroscedasticity affects the minimum variance property.
 The OLS estimators are inefficient.
 Thus the test statistics (t-test and F-test) cannot be relied on in the face of uncorrected heteroscedasticity.
DETECTION OF
HETEROSCEDASTICITY
 There are no hard and fast rules (universally agreed upon
methods) for detecting the presence of heteroscedasticity.

 But some rules of thumb can be suggested.

 Most of these methods are based on the examination of the OLS residuals.
 There are informal and formal methods of detecting
heteroscedasticity.
CON’T…

1. Nature of the problem


 In many applications, heteroscedasticity is the rule rather than the exception.
2. Graphical method
 The squared residuals can be plotted either against Y or
against one of the explanatory variables.
 If there appears any systematic pattern, heteroscedasticity
might exist.

These two methods are informal methods.


CON’T…

3. Park Test: Park suggested a statistical test for heteroscedasticity based on the assumption that the variance of the disturbance term (σi²) is some function of the explanatory variable Xi.
4. Spearman’s Rank Correlation test

rs = 1 − 6·Σdi² / [N(N² − 1)]

where di is the difference between the ranks assigned to observation i.
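A sketch of how the rank-correlation check might be carried out, assuming Python with scipy and statsmodels and simulated heteroscedastic data: regress Y on X, then rank-correlate the absolute residuals with X.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Hypothetical data with error spread growing in X
rng = np.random.default_rng(3)
X = rng.uniform(1, 10, size=100)
y = 2 + 0.5 * X + rng.normal(0, 0.3 * X)   # scale of the error rises with X

resid = sm.OLS(y, sm.add_constant(X)).fit().resid

# Rank-correlate |residuals| with X; a significant rho suggests heteroscedasticity
rho, pval = stats.spearmanr(np.abs(resid), X)
print(rho, pval)
```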
CON’T…

5. Goldfeld and Quandt Test


 This is the most popular test and usually suitable for large
samples.
 The test can be used if the variance (σi²) is positively related to one of the explanatory variables in the regression model and if the number of observations is at least twice the number of parameters to be estimated.
For example, in the model
Yi = β0 + β1Xi + Ui
it may be assumed that σi² = σ²Xi².
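A sketch of the Goldfeld-Quandt test using the statsmodels implementation; the data are simulated for illustration and are not part of the slides.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

# Simulated data, sorted by X so that the sample split follows the ordering of X
rng = np.random.default_rng(4)
X = np.sort(rng.uniform(1, 10, size=100))
y = 2 + 0.5 * X + rng.normal(0, 0.3 * X)

exog = sm.add_constant(X)
# Splits the ordered sample, drops a middle fraction, compares the two error variances
fstat, pval, _ = het_goldfeldquandt(y, exog, drop=0.2)
print(fstat, pval)   # a small p-value leads us to reject homoscedasticity
```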
REMEDIAL MEASURES

Remedies for Heteroscedasticity

 Include previously omitted variable(s) if heteroscedasticity is suspected to be due to the omission of variables.
 Redefine the variables in a way that avoids heteroscedasticity. For example, instead of total income, we can use income per capita.
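A small sketch of the second remedy (hypothetical regional data, Python assumed): dividing total consumption and total income by population removes the size-driven spread in the errors.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical regional data: consumption and income totals scale with population,
# so the error spread grows with the size of the region
rng = np.random.default_rng(7)
pop = rng.uniform(1, 50, size=80)                        # population (millions)
income = pop * rng.uniform(2, 6, size=80)                # total income
cons = 0.8 * income + pop * rng.normal(0, 0.5, size=80)  # heteroscedastic error

# Redefining the variables in per-capita terms makes the error variance roughly constant
res = sm.OLS(cons / pop, sm.add_constant(income / pop)).fit()
print(res.params)   # slope close to the marginal propensity to consume (0.8)
```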
AUTOCORRELATION: ERROR TERMS ARE CORRELATED

 Autocorrelation is a problem which occurs when the assumption of no serial correlation (autocorrelation) between the disturbance terms Ui, i.e.
Cov(Ui, Uj) = 0 for i ≠ j,
is violated.

 Serial correlation implies that the error term from one time
period depends in some systematic way on error terms from
other time periods.
 Autocorrelation is more a problem of time series data than
cross-sectional data.
 If by chance, such a correlation is observed in cross-sectional
units, it is called spatial autocorrelation.
CAUSES OF
AUTOCORRELATION
 Inertia or sluggishness in economic time series is a common cause of autocorrelation.
 For example, GNP, production, price indices, employment, and unemployment exhibit business cycles.
 In an upswing, the value of a series at one point in time is greater than its previous values, and these successive periods (observations) are likely to be interdependent.
 Specification bias – exclusion of important variables or incorrect
functional forms
 Lags – in a time series regression
 Manipulation of data – if the raw data is manipulated (extrapolated or
interpolated)
CONSEQUENCES OF SERIAL CORRELATION

 The estimates of the parameters remain unbiased even in the presence of autocorrelation, provided the X’s and the u’s are uncorrelated.
 Serial correlation increases the variance of the OLS estimators.
 The variance of the disturbance term, Ui, may be underestimated.
 If the Ui’s are autocorrelated, then predictions based on the OLS estimates will be inefficient.
DETECTING
AUTOCORRELATION
1. A first indication of the existence of autocorrelation may be gained by plotting the residuals either against their own lagged values or against time.
2. Durbin-Watson Test
This test is appropriate only for the first order autoregressive
scheme.
CON’T…
This test is applicable if the underlying assumptions are
met:
The regression model includes an intercept term
The serial correlation is first order in nature
There are no missing observations in the data

The equation for the Durbin-Watson d statistic is

d = Σ_{t=2}^{N} (e_t − e_{t−1})² / Σ_{t=1}^{N} e_t²

where e_t is the OLS residual in period t.
REMEDIAL MEASURES FOR AUTOCORRELATION

1. The solution depends on the source of the problem.


If the source is omitted variables, the appropriate solution
is to include these variables.
If the source is misspecification of the mathematical form, the relevant approach will be to change the functional form.
2. If these sources are ruled out then the appropriate
procedure will be to transform the original data.
MULTICOLLINEARITY

 It exists when there is an exact (or near-exact) linear relationship between regressors.
 It appears when the assumption ‘no independent variable is a perfect linear function of one or more other independent variables’ is violated.
 Multicollinearity is not a condition that either exists or does not exist in economic functions, but rather an inherent phenomenon in most relationships due to the nature of economic magnitudes.
 It is considered serious if the VIF is greater than 10.
CONSEQUENCES OF
MULTICOLLINEARITY
 The estimates of the coefficients are statistically unbiased.
 When multicollinearity is present in a function, the variances
and therefore the standard errors of the estimates will
increase, although some econometricians argue that this is
not always the case.
 The computed t-ratios will fall i.e. insignificant t-ratios will be
observed in the presence of multicollinearity.
 A high R² but few significant t-ratios are expected in the presence of
multicollinearity.
DETECTING MULTICOLLINEARITY

 High R² but few significant t-ratios

 VIF and Tolerance test

 High pair-wise (simple) correlation coefficients among the regressors (explanatory variables).
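A sketch of the VIF check with statsmodels; the near-collinear regressors are simulated for illustration, and the VIF > 10 rule of thumb is the one mentioned above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical regressors where x2 is nearly a linear function of x1
rng = np.random.default_rng(6)
x1 = rng.normal(size=200)
x2 = 2 * x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)

exog = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, exog.shape[1]):          # skip the constant
    print(i, variance_inflation_factor(exog, i))
# VIFs for x1 and x2 far exceed 10, signalling serious multicollinearity
```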
REMEDIES FOR
MULTICOLLINEARITY
 Do Nothing
 Dropping one or more of the multicollinear variables
 Transformation of the variables
Two common transformations are:

to form a linear combination of the variables


to transform the equation into logs
 Increase the sample size
 Other Remedies (Factor analysis and Principal component
analysis or other techniques such as ridge regression).
Any Question…
GOOD LUCK !!!