Econometrics
Kindineh Sisay
(Lecturer of Agricultural and Applied Economics)
Email: [email protected]
January, 2022
Haramaya University
CHAPTER OUTLINES
Unit 1: Fundamental concepts of Econometrics
Unit 2: Correlation Theory
Unit 3: Simple Linear Regression Models
Unit 4: Multiple Regression Analysis
Unit 5: Econometric Problems
Unit 6: Non-linear regression and Time series
CHAPTER ONE
INTRODUCTION TO ECONOMETRICS
Outlines:
Definition and scope of econometrics
Methodology of econometrics
Goals of econometrics
DEFINITION AND SCOPE
What is Econometrics?
Econometrics is the application of statistical and mathematical methods to economic data in order to give empirical content to economic relationships and to test hypotheses about them.
CONT’D…
Economic models vs. Econometric models
• A set of variables,
• A list of fundamental relationships, and
• A number of strategic coefficients.
CONT’D…
Econometric models
They contain a random element, which is ignored by mathematical models and by economic theory, both of which postulate exact relationships between economic variables.
Example: Economic theory postulates that the demand for a
commodity depends on its price, on the prices of other related
commodities, on consumers’ income and on tastes.
This is an exact relationship which can be written mathematically as:
Q = b0 + b1P + b2P0 + b3Y + b4t
where Q is quantity demanded, P is the commodity's own price, P0 the prices of related commodities, Y consumers' income, and t tastes.
CONT’D…
The above demand equation is exact.
However, many more factors may affect demand.
In econometrics the influence of these 'other' factors is taken into account by introducing a random variable Ui into the model:
Q = b0 + b1P + b2P0 + b3Y + b4t + Ui
METHODOLOGY OF ECONOMETRICS
Stage 1: Specification of the model. In this stage we:
i. Determine the dependent and explanatory variables to be included in the model,
ii. Determine the a priori theoretical expectations about the size and sign of the parameters of the function, and
iii. Choose the mathematical form of the model (number of equations, linear or non-linear form).
It is the most important and the most difficult stage of any econometric research.
CONT’D…
At this stage there is a considerable likelihood of committing errors or incorrectly specifying the model.
Criteria:
o Correlation coefficient test, standard error test, t-test, F-test, and R²-test
are some of the most commonly used statistical tests.
Theoretical plausibility;
The fewer the equations and the simpler their mathematical form, the better the model, provided that the other desirable properties are not affected by the simplification.
GOALS OF ECONOMETRICS
The main goals of econometrics are analysis (testing economic theory), policy-making (supplying numerical estimates of coefficients for decision making), and forecasting (predicting future values of economic magnitudes).
UNIT 2: CORRELATION THEORY
Types of correlation
• Correlation describes the type of co-variation that exists between variables and its strength.
If the data points make a straight line going from the origin out to high x- and
y-values, then the variables are said to have a positive correlation.
If the line goes from a high-value on the y-axis down to a high-value on the
x-axis, the variables have a negative correlation.
CONT’D…
A perfect positive correlation = 1; a perfect negative correlation = −1; no correlation = 0.
The closer the number is to 1 or -1, the stronger the correlation, or the
stronger the relationship between the variables.
For example, saving and household size are negatively correlated. When HH
size increases, saving decreases and, vice versa.
The scatter diagram indicates the strength of the relationship between the two
variables.
Exercise: Provide your own examples of positively and negatively correlated variables.
2. SIMPLE CORRELATION
The simple correlation coefficient between X and Y is given by:

r = [nΣXiYi − (ΣXi)(ΣYi)] / √{[nΣXi² − (ΣXi)²][nΣYi² − (ΣYi)²]} ............ (2.1)

In deviation form:

rxy = Σxiyi / √(Σxi²·Σyi²) ............ (2.2)

Where, xi = Xi − X̄ and yi = Yi − Ȳ
CONT’D…
Example 2.1: The following table shows the quantity supplied for
a commodity with the corresponding price values.
Or, using the deviation form (Equation 2.2), the correlation coefficient can be computed as:

r = 1810 / √(330 × 10490) ≈ 0.975
CONT’D…
There is a strong positive correlation between the quantity supplied and the
price of the commodity under consideration.
If the two variables are independent, the correlation coefficient is zero; however, a zero correlation coefficient does not by itself show that the two variables are independent.
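As a check on Equation 2.1, here is a minimal Python sketch. The price (X) and quantity supplied (Y) figures below are hypothetical, since the example table's actual values are not reproduced above:

```python
import math

# Hypothetical price (X) and quantity supplied (Y) data
X = [2, 4, 5, 6, 8, 10]
Y = [10, 18, 20, 25, 31, 35]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)
sy2 = sum(y * y for y in Y)

# Equation 2.1: r = [nSXY - SX*SY] / sqrt([nSX^2 - (SX)^2][nSY^2 - (SY)^2])
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 4))
```

A value close to +1, as here, indicates a strong positive correlation between price and quantity supplied.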
PROPERTIES OF SIMPLE CORRELATION COEFFICIENT
If X and Y variables are independent, the correlation coefficient is zero. But
the converse is not true.
The correlation coefficient has the same sign with that of regression
coefficients.
In many cases the variables may be qualitative (or binary variables) and
hence cannot be measured numerically.
For such cases it is possible to use another statistic, the rank correlation coefficient (or Spearman's correlation coefficient).
We rank the observations in a specific sequence for example in order of size,
importance, etc., using the numbers 1, 2, 3, …, n.
CONT’D…
In other words, we assign ranks to the data and measure relationship between
their ranks instead of their actual numerical values.
If two variables X and Y are ranked in ascending or descending order, the rank correlation coefficient may be computed by the formula:

rs = 1 − 6ΣDi² / [n(n² − 1)]
CONT’D…
Where,
D = difference between ranks of corresponding pairs of X and Y
n = number of observations.
Brands of soap:  A   B   C   D   E   F   G   H   I   J   K   L   Total
Person I:        9  10   4   1   8  11   3   2   5   7  12   6
Person II:       7   8   3   1  10  12   2   6   5   4  11   9
Di:              2   2   1   0  −2  −1   1  −4   0   3   1  −3
Di²:             4   4   1   0   4   1   1  16   0   9   1   9    50
CONT’D…
The rank correlation coefficient is then rs = 1 − 6(50)/[12(12² − 1)] = 1 − 300/1716 ≈ 0.83, indicating fairly strong agreement between the two persons' rankings.
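Spearman's formula can be verified directly on the soap-brand ranks from the table:

```python
# Ranks assigned by two persons to 12 brands of soap (A-L), from the table above
person1 = [9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6]
person2 = [7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9]
n = len(person1)

# Squared rank differences Di^2
D2 = [(a - b) ** 2 for a, b in zip(person1, person2)]

# Spearman's formula: rs = 1 - 6*sum(D^2) / [n(n^2 - 1)]
rs = 1 - 6 * sum(D2) / (n * (n ** 2 - 1))
print(sum(D2), round(rs, 4))  # sum(D2) = 50
```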
The partial correlation coefficient between X1 and X2, keeping the effect of X3 constant, is given by:

r12.3 = (r12 − r13·r23) / √[(1 − r13²)(1 − r23²)]
EXAMPLE
The following table gives data on the yield of corn per acre (Y), the amount of fertilizer used (X1), and the amount of insecticide used (X2). Compute the partial correlation coefficient between the yield of corn and the fertilizer used, keeping the effect of insecticide constant.
Year 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
Y 40 44 46 48 52 58 60 68 74 80
X1 6 10 12 14 16 18 22 24 26 32
X2 4 4 5 7 9 12 14 20 21 24
ANSWER
Year Y X1 X2 x1 x2 y x1² x2² x1y x2y x1x2 y²
1971 40 6 4 -17 -12 -8 204 136 96 144 64 289
1972 44 10 4 -13 -8 -8 104 104 64 64 64 169
1973 46 12 5 -11 -6 -7 66 77 42 36 49 121
1974 48 14 7 -9 -4 -5 36 45 20 16 25 81
1975 52 16 9 -5 -2 -3 10 15 6 4 9 25
1976 58 18 12 1 0 0 0 0 0 0 0 1
1977 60 22 14 3 4 2 12 6 8 16 4 9
1978 68 24 20 11 6 8 66 88 48 36 64 121
1979 74 26 21 17 8 9 136 153 72 64 81 289
1980 80 32 24 23 14 12 322 276 168 196 144 529
Sum 570 180 120 0 0 0 956 900 524 576 504 1634
Mean 57 18 12
ANSWER
ryx1 = 0.9854
ryx2 = 0.9917
rx1x2 = 0.9725
Then the partial correlation coefficient is:
ryx1.x2 = (ryx1 − ryx2·rx1x2) / √[(1 − ryx2²)(1 − rx1x2²)] = (0.9854 − 0.9917 × 0.9725) / √[(1 − 0.9917²)(1 − 0.9725²)] ≈ 0.70
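The whole calculation can be reproduced from the raw corn data in a short Python sketch:

```python
import math

# Corn yield example: Y, fertilizer X1, insecticide X2 (1971-1980)
Y  = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X1 = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
X2 = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]

def corr(a, b):
    """Simple (Pearson) correlation coefficient in deviation form."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

r_yx1, r_yx2, r_x1x2 = corr(Y, X1), corr(Y, X2), corr(X1, X2)

# Partial correlation of Y and X1, holding X2 constant
r_yx1_x2 = (r_yx1 - r_yx2 * r_x1x2) / math.sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))
print(round(r_yx1, 4), round(r_yx2, 4), round(r_x1x2, 4), round(r_yx1_x2, 2))
```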
Any Question…
UNIT 3: SIMPLE LINEAR
REGRESSION MODELS
The value that the error term assumes in one period does not depend on the value it assumed in any other period (non-autocorrelation or non-serial correlation).
The values of Xi are fixed in repeated sampling; each value of Xi does not vary, for instance, owing to a change in sample size.
CONT’D…
Linearity of the model in parameters. The simple linear regression requires linearity in parameters, but not necessarily linearity in variables (what is important is transforming the data as required).
The explanatory variable does not take identical values across all observations (i.e. Var(X) ≠ 0). This assumption is very important for improving the precision of estimators.
OLS METHOD OF ESTIMATION
The OLS estimators of the simple linear regression model are:

β̂1 = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²]

Or, equivalently,

β̂1 = [ΣXY − nX̄Ȳ] / [ΣXi² − nX̄²], and β̂0 = Ȳ − β̂1X̄
CONT’D…
Yi Xi
10 30
20 50
30 60
CONT’D…
β̂1 = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²] = [3(3100) − (140)(60)] / [3(7000) − (140)²] = 900/1400 ≈ 0.64

C. Ŷ = −10 + (0.64)(45) = 18.8
CONT’D…
That means when X assumes a value of 45, the value of Y on
average is expected to be 18.8.
Then, using the deviation form:

β̂1 = Σxiyi / Σxi² = 300/466.67 ≈ 0.64, and β̂0 = Ȳ − β̂1X̄ = 20 − (0.64)(46.67) = −10
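The two computations above can be checked with a short Python sketch using the three data points given:

```python
# Data from the example: Y = (10, 20, 30), X = (30, 50, 60)
Y = [10, 20, 30]
X = [30, 50, 60]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2 = sum(x * x for x in X)

# OLS slope and intercept
b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
b0 = sy / n - b1 * sx / n

# Prediction at X = 45 with the rounded slope 0.64, as in the text
pred = -10 + 0.64 * 45
print(round(b1, 2), round(b0, 2), round(pred, 1))
```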
Formulae for the mean and variance of the respective parameter estimates and the error term are given below:
1. The mean of β̂1: E(β̂1) = β1
2. The variance of β̂1: Var(β̂1) = σu² / Σxi²
3. The mean of β̂0: E(β̂0) = β0
4. The variance of β̂0: Var(β̂0) = σu²·ΣXi² / (nΣxi²)
5. The estimated value of the variance of the error term: σ̂u² = Σei² / (n − k)
STANDARD ERROR OF THE
PARAMETERS
HYPOTHESIS TESTING
We have to know to what extent our estimates are reliable and/or acceptable for further purposes.
THE COEFFICIENT OF DETERMINATION (R²)
The total variation of the dependent variable is given in the following form;
TSS=ESS + RSS
o which means the total sum of squares of the dependent variable is split into the explained sum of squares and the residual sum of squares.
CONT’D…
The coefficient of determination is given by the formula:

R² = Explained Variation in Y / Total Variation in Y = Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)² = Σŷi² / Σyi² ............ (2.16)

Since ŷi = β̂1xi, the coefficient of determination can also be given as:

R² = β̂1Σxiyi / Σyi²

Or:

R² = 1 − Unexplained Variation in Y / Total Variation in Y = 1 − Σ(Yi − Ŷi)² / Σ(Yi − Ȳ)² = 1 − Σei² / Σyi²
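The decomposition TSS = ESS + RSS can be demonstrated numerically, here reusing the small three-point data set from earlier in this unit:

```python
# Illustrating TSS = ESS + RSS with the small data set used earlier in the unit
Y = [10, 20, 30]
X = [30, 50, 60]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

# OLS fit in deviation form
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * x for x in X]

TSS = sum((y - ybar) ** 2 for y in Y)                      # total variation
ESS = sum((f - ybar) ** 2 for f in fitted)                 # explained variation
RSS = sum((y - f) ** 2 for y, f in zip(Y, fitted))         # residual variation
R2 = ESS / TSS
print(round(TSS, 3), round(ESS + RSS, 3), round(R2, 3))
```

The first two printed values coincide, confirming the decomposition.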
CONT’D…
Since the sample values of the intercept and the coefficient are
estimates of the true population parameters, we have to test them
for their statistical reliability.
There are different tests that are available to test the statistical
reliability of the parameter estimates. The following are the
common ones;
This test first establishes the two hypotheses (the null and
alternative hypotheses). The two hypotheses are given as follows:
H0: βi=0
H1: βi≠0
The standard error test is outlined as follows:
1. Compute the standard errors of the parameter estimates.
2. Compare each standard error with half the numerical value of its estimate: if SE(β̂i) < ½β̂i, reject the null hypothesis; otherwise accept it.
CONT’D…
The test depends on the degrees of freedom that the sample has.
The test procedures of t-test are similar with that of the z-test.
CONT’D…
The procedures are outlined as follows;
Set up the hypothesis.
Determine the level of significance (usually a 5% level).
Determine the tabulated value of t from the table with n-k degrees of
freedom, where k is the number of parameters estimated.
Determine the calculated value of t.
Reject H0 if |tcal| > tα/2, n−k, where

tcal = β̂i / se(β̂i)
3. THE STANDARD NORMAL TEST
The procedure parallels the t-test and applies when the error variance is known or the sample is large.

CONFIDENCE INTERVAL ESTIMATION
In order to define the range within which the true parameter
lies, we must construct a confidence interval for the parameter.
We can construct 100(1 − α)% confidence intervals for the sample regression coefficients.
CONT’D…
NB. The standard error of a given coefficient is the positive square root of
the variance of the coefficient.
Variance of the intercept:

Var(β̂0) = σ̂u²·ΣXi² / (nΣxi²)

Variance of the slope:

Var(β̂1) = σ̂u² / Σxi²

Where σ̂u² = Σei² / (n − k) is the estimate of the variance of the random term and k is the number of parameters to be estimated in the model.
CONT’D…
The standard errors are the positive square roots of the variances, and the 100(1 − α)% confidence interval for the slope is given by:

β̂1 − tα/2(n − k)·se(β̂1) ≤ β1 ≤ β̂1 + tα/2(n − k)·se(β̂1)
Y 69 76 52 56 57 77 58 55 67 53 72 64
X 9 12 6 10 9 10 7 8 12 6 11 8
SOLUTION
5. Fit the linear regression equation and determine the 95% confidence interval
for the slope.
SOLUTION
1. Estimate the coefficient of determination (R²):

R² = 1 − Σei²/Σyi² = 1 − 387/894 = 1 − 0.43 = 0.57
This result shows that 57% of the variation in the quantity supplied of the commodity under consideration is explained by the variation in its price; the remaining 43% is left unexplained by the price of the commodity.
CONT’D…
4. Run significance test of regression coefficients using fitted regression line for
the data given:
Ŷi = 33.75 + 3.25Xi
       (8.3)    (0.9)
Decision rule (rule of thumb at the 5% level):
1. If tcal is greater than 2 or less than −2, we reject the null hypothesis.
2. If tcal lies between −2 and 2, we accept (fail to reject) the null hypothesis.
Here tcal = 3.25/0.9 ≈ 3.6 > 2, so the slope is statistically significant.
CONT’D…
5. To estimate the confidence interval we need the standard error, which is determined as follows:

σ̂u² = Σei²/(n − k) = 387/(12 − 2) = 387/10 = 38.7

Var(β̂1) = σ̂u²(1/Σxi²) = 38.7(1/48) = 0.80625, so se(β̂1) = √0.80625 ≈ 0.898

The 95% confidence interval for the slope is then 3.25 ± t0.025(10)·(0.898) = 3.25 ± (2.228)(0.898), i.e. approximately (1.25, 5.25).
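All of the figures in this worked example (slope, intercept, R², and the confidence interval) can be reproduced from the data table above:

```python
import math

# Data from the example above (quantity supplied Y, price X)
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
Sxx = sum((x - xbar) ** 2 for x in X)
Syy = sum((y - ybar) ** 2 for y in Y)

b1 = Sxy / Sxx                      # slope
b0 = ybar - b1 * xbar               # intercept
RSS = Syy - b1 * Sxy                # residual sum of squares
R2 = 1 - RSS / Syy
var_u = RSS / (n - 2)               # estimated error variance
se_b1 = math.sqrt(var_u / Sxx)

# 95% CI for the slope; 2.228 is t(0.025, 10) from the t-table
ci = (b1 - 2.228 * se_b1, b1 + 2.228 * se_b1)
print(b1, b0, RSS, round(R2, 2), [round(c, 2) for c in ci])
```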
The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov theorem: the least squares estimators are linear, unbiased, and have minimum variance among all linear unbiased estimators.
Linear: each estimator is a linear function of the random variable, namely the dependent variable Y.
According to the Gauss-Markov theorem, the OLS estimators therefore possess all the BLUE properties.
Any Question…
QUIZ
Yi Xi
10 30
20 50
30 60
QUIZ…
5. Fit the linear regression equation and determine the 95% confidence interval for the
slope.
UNIT 4: MULTIPLE REGRESSION
ANALYSIS
Yi = β0 + β1X1 + β2X2 + … + βkXk + ui

With two explanatory variables:

Yi = β0 + β1X1 + β2X2 + ui
NOTATIONS AND ASSUMPTIONS
E(ui) = 0
E(ui²) = σu² (constant)
ui ~ N(0, σu²)
ESTIMATION OF PARTIAL
REGRESSION COEFFICIENTS
The fitted model is:

Ŷi = β̂0 + β̂1X1 + β̂2X2, with residuals ei = Yi − Ŷi

OLS chooses β̂0, β̂1 and β̂2 in such a way that Σei² is a minimum:

Σei² = Σ(Yi − β̂0 − β̂1X1 − β̂2X2)²

Setting the partial derivatives with respect to each coefficient equal to zero:

∂Σei²/∂β̂0 = 0 ............ (3.5)
∂Σei²/∂β̂1 = 0 ............ (3.6)
∂Σei²/∂β̂2 = 0 ............ (3.7)
ESTIMATION OF PARTIAL
REGRESSION COEFFICIENTS
Solving equations (3.5), (3.6) and (3.7) simultaneously, we obtain the system of
normal equations given as follows:
ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i ............ (3.8)
ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i ............ (3.9)
ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ............ (3.10)
CONT’D…
Then, letting

x1i = X1i − X̄1 ............ (3.11)
x2i = X2i − X̄2 ............ (3.12)
yi = Yi − Ȳ ............ (3.13)

the estimators in deviation form are:

β̂1 = [Σx1y·Σx2² − Σx2y·Σx1x2] / [Σx1²·Σx2² − (Σx1x2)²] ............ (3.14)
β̂2 = [Σx2y·Σx1² − Σx1y·Σx1x2] / [Σx1²·Σx2² − (Σx1x2)²] ............ (3.15)
β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 ............ (3.16)
VARIANCE AND STANDARD ERRORS
OF OLS ESTIMATORS
Var(β̂0) = σ̂u²·[1/n + (X̄1²Σx2² + X̄2²Σx1² − 2X̄1X̄2Σx1x2) / (Σx1²Σx2² − (Σx1x2)²)]

SE(β̂0) = √Var(β̂0)

Var(β̂1) = σ̂u²·Σx2² / [Σx1²Σx2² − (Σx1x2)²],  SE(β̂1) = √Var(β̂1)

Var(β̂2) = σ̂u²·Σx1² / [Σx1²Σx2² − (Σx1x2)²],  SE(β̂2) = √Var(β̂2)

Where σ̂u² = Σei² / (n − 3)
COEFFICIENT OF MULTIPLE
DETERMINATION
Is the measure of the proportion of the variation in the dependent variable
that is explained jointly by the independent variables in the model.
R² = Σŷ² / Σy² ............ (3.24)

R² = (β̂1Σx1y + β̂2Σx2y) / Σy² ............ (3.25)
CONT’D…
Note that a high R² value may not by itself imply that the model is good; R² never decreases when more regressors are added. The adjusted R² corrects for this:

R̄² = 1 − (1 − R²)(n − 1)/(n − k) ............ (3.26)

where k is the number of parameters. In multiple linear regression, therefore, we better interpret the adjusted R² than the ordinary or unadjusted R².
CONFIDENCE INTERVAL
ESTIMATION
Interpretation of the confidence interval: Values of the parameter lying in the
interval are plausible with 100(1- )% confidence.
A. Standard Error Test: decision rule is based on the relationship between the
numerical value of the parameter and the standard error of the same parameter.
If SE(β̂i) < ½β̂i, we reject the null hypothesis, i.e. the estimate is statistically significant.
Generalisation: The smaller the standard error, the stronger is the evidence that
the estimates are statistically significant.
CONT’D…
B. t-test – the more appropriate and formal way to test the hypothesis.
compute the t-ratios and compare them with the tabulated t-values and make our
decision.
Accepting H0, on the other hand, means we don’t have sufficient evidence to
conclude that the coefficient is different from 0.
TESTING THE OVERALL SIGNIFICANCE OF REGRESSION MODEL
H0: β1 = β2 = 0
H1: βi ≠ 0, for at least one i.
CONT’D…
The F-statistic is computed as:

Fcal = MSR/MSE = [Σŷ²/(k − 1)] / [Σe²/(n − k)]

Since R² = Σŷ²/Σy² and 1 − R² = Σe²/Σy², the F-statistic can also be expressed in terms of R²:

Fcal = [R²Σy²/(k − 1)] / [(1 − R²)Σy²/(n − k)] = [(n − k)/(k − 1)] · [R²/(1 − R²)]
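The identity relating F to R² is easy to turn into a small helper. The values passed below are purely illustrative:

```python
# F-statistic from R^2: Fcal = [(n - k)/(k - 1)] * [R^2/(1 - R^2)]
def f_from_r2(r2, n, k):
    return (n - k) / (k - 1) * r2 / (1 - r2)

# Illustrative values: a model with k = 3 parameters fitted on n = 12 observations
print(round(f_from_r2(0.75, 12, 3), 2))   # high R^2 gives a large F (13.5)
print(round(f_from_r2(0.10, 12, 3), 2))   # low R^2 gives a small F (0.5)
```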
EXAMPLE
Year 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971
Y 57 43 73 37 64 48 56 50 39 43 69 60
X1 220 215 250 241 305 258 354 321 370 375 385 385
X2 125 147 118 160 128 149 145 150 140 115 155 152
CONT…
a. Estimate the coefficients of the economic relationship and fit the model.
Year Y X1 X2 x1 x2 y x1² x2² x1y x2y x1x2 y²
1960 57 220 125 -86.5833 -15.3333 3.75 7496.668 235.1101 -324.687 -57.4999 1327.608 14.0625
1961 43 215 147 -91.5833 6.6667 -10.25 8387.501 44.44489 938.7288 -68.3337 -610.558 105.0625
1962 73 250 118 -56.5833 -22.3333 19.75 3201.67 498.7763 -1117.52 -441.083 1263.692 390.0625
1963 37 241 160 -65.5833 19.6667 -16.25 4301.169 386.7791 1065.729 -319.584 -1289.81 264.0625
1964 64 305 128 -1.5833 -12.3333 10.75 2.506839 152.1103 -17.0205 -132.583 19.52731 115.5625
1965 48 258 149 -48.5833 8.6667 -5.25 2360.337 75.11169 255.0623 -45.5002 -421.057 27.5625
1966 56 354 145 47.4167 4.6667 2.75 2248.343 21.77809 130.3959 12.83343 221.2795 7.5625
1967 50 321 150 14.4167 9.6667 -3.25 207.8412 93.44509 -46.8543 -31.4168 139.3619 10.5625
1968 39 370 140 63.4167 -0.3333 -14.25 4021.678 0.111089 -903.688 4.749525 -21.1368 203.0625
1969 43 375 115 68.4167 -25.3333 -10.25 4680.845 641.7761 -701.271 259.6663 -1733.22 105.0625
1970 69 385 155 78.4167 14.6667 15.75 6149.179 215.1121 1235.063 231.0005 1150.114 248.0625
1971 60 385 152 78.4167 11.6667 6.75 6149.179 136.1119 529.3127 78.75022 914.8641 45.5625
Sum 639 3679 1684 0.0004 0.0004 0 49206.92 2500.667 1043.25 -509 960.6667 1536.25
Ȳ = ΣY/n = 639/12 = 53.25
X̄1 = ΣX1/n = 3679/12 = 306.5833
X̄2 = ΣX2/n = 1684/12 = 140.3333

The summary results in deviation form are then given by:

Σx1² = 49206.92,  Σx2² = 2500.667,  Σx1y = 1043.25,  Σx2y = −509,  Σx1x2 = 960.6667,  Σy² = 1536.25
CONT'D…
β̂1 = [Σx1y·Σx2² − Σx2y·Σx1x2] / [Σx1²·Σx2² − (Σx1x2)²] = [(1043.25)(2500.667) − (−509)(960.6667)] / [(49206.92)(2500.667) − (960.6667)²] ≈ 0.025365
β̂2 = [Σx2y·Σx1² − Σx1y·Σx1x2] / [Σx1²·Σx2² − (Σx1x2)²] = [(−509)(49206.92) − (1043.25)(960.6667)] / 122127241 ≈ −0.21329
β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 = 53.25 − (0.025365)(306.5833) − (−0.21329)(140.3333) ≈ 75.40512
The fitted model is then written as: Yˆi = 75.40512 + 0.025365X1 - 0.21329X2
a) Compute the variance and standard errors of the slopes.
First, you need to compute the estimate of the variance of the random term as follows
σ̂u² = Σei²/(n − 3) = 1401.223/(12 − 3) = 1401.223/9 = 155.69143
Variance of β̂1:
Var(β̂1) = σ̂u²·Σx2² / [Σx1²Σx2² − (Σx1x2)²] = 155.69143(2500.667/122127241) = 0.003188

Standard error of β̂1:
SE(β̂1) = √Var(β̂1) = √0.003188 = 0.056462

Variance of β̂2:
Var(β̂2) = σ̂u²·Σx1² / [Σx1²Σx2² − (Σx1x2)²] = 155.69143(49206.92/122127241) = 0.0627

Standard error of β̂2:
SE(β̂2) = √Var(β̂2) = √0.0627 = 0.25046
CONT’D…
Similarly, the standard error of the intercept is found to be 37.98177. The detail is left for you as
an exercise.
a) Calculate and interpret the coefficient of determination.
We can use the following summary results to obtain the R²:

Σŷ² = 135.0262, Σe² = 1401.223, and Σy² = 1536.25 (the sum of the above two). Then,

R² = (β̂1Σx1y + β̂2Σx2y)/Σy² = [(0.025365)(1043.25) + (−0.21329)(−509)] / 1536.25 = 0.087894

or R² = 1 − Σe²/Σy² = 1 − 1401.223/1536.25 = 0.087894

Only about 8.8% of the variation in Y is explained jointly by X1 and X2.
The calculated t-ratios are tcal(β̂1) = 0.025365/0.056462 ≈ 0.449 and tcal(β̂2) = −0.21329/0.25046 ≈ −0.852. The critical value (t0.05, 9) to be used here is 2.262. Like the standard error test, the t-test reveals that both X1 and X2 are insignificant in determining the change in Y, since both calculated t values are less than the critical value in absolute terms.
CONT’D…
a) Test the overall significance of the model. (Hint: use = 0.05)
This involves testing whether at least one of the two variables X 1 and X2 determine the changes
in Y. The hypothesis to be tested is given by:
H 0 : 1 2 0
H 1 : i 0, at least for one i.
The ANOVA table for the test is given as follows:

Source of variation   Sum of Squares            df            Mean Sum of Squares            Fcal
Regression            SSR = Σŷ² = 135.0262      k−1 = 2       MSR = 135.0262/2 = 67.51309    F = MSR/MSE = 0.4336
Residual              SSE = Σe² = 1401.223      n−k = 9       MSE = 1401.223/9 = 155.6914
Total                 SST = Σy² = 1536.25       n−1 = 11
Or…
Fcal = [(n − k)/(k − 1)] · [R²/(1 − R²)] = [(12 − 3)/(3 − 1)] · [0.087894/(1 − 0.087894)] ≈ 0.4336
The calculated F value (0.4336) is less than the tabulated value (3.98). Hence,
we accept the null hypothesis and conclude that there is no significant
contribution of the variables X1 and X2 to the changes in Y.
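The entire worked example (coefficients, R², and the F-statistic) can be reproduced from the raw 1960-1971 data with the deviation-form equations (3.14)-(3.16) and (3.25):

```python
# Data from the worked example (1960-1971)
Y  = [57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60]
X1 = [220, 215, 250, 241, 305, 258, 354, 321, 370, 375, 385, 385]
X2 = [125, 147, 118, 160, 128, 149, 145, 150, 140, 115, 155, 152]
n, k = len(Y), 3

m_y, m1, m2 = sum(Y) / n, sum(X1) / n, sum(X2) / n
S11 = sum((a - m1) ** 2 for a in X1)
S22 = sum((a - m2) ** 2 for a in X2)
S1y = sum((a - m1) * (b - m_y) for a, b in zip(X1, Y))
S2y = sum((a - m2) * (b - m_y) for a, b in zip(X2, Y))
S12 = sum((a - m1) * (b - m2) for a, b in zip(X1, X2))
Syy = sum((b - m_y) ** 2 for b in Y)

D = S11 * S22 - S12 ** 2
b1 = (S1y * S22 - S2y * S12) / D          # Equation 3.14
b2 = (S2y * S11 - S1y * S12) / D          # Equation 3.15
b0 = m_y - b1 * m1 - b2 * m2              # Equation 3.16

R2 = (b1 * S1y + b2 * S2y) / Syy          # Equation 3.25
F = (n - k) / (k - 1) * R2 / (1 - R2)
print(round(b0, 5), round(b1, 6), round(b2, 5), round(R2, 6), round(F, 4))
```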
Any Question…
EXTENSIONS OF REGRESSION MODELS
Economic theory frequently predicts only the sign of a relationship and not its functional form. A key concept in choosing the form is the elasticity of Y with respect to X:

ηY,X = (ΔY/Y) / (ΔX/X) = (ΔY/ΔX)·(X/Y)
CONT’D…
Log-linear, double Log or constant elasticity model
The most common functional form that is non-linear in the variable (but still
linear in the coefficients) is the log-linear form.
A log-linear form is often used because the elasticities, and not the slopes, are constant, i.e. η = constant.
CONT’D…
The log-linear model assumes a multiplicative relationship:

Yi = β0·Xi^β1·e^Ui

Taking natural logs gives a model that is linear in the parameters:

ln Yi = ln β0 + β1 ln Xi + Ui

(Typical examples: output as a function of inputs, or quantity demanded as a function of price.)
CONT’D…
The model is also called a constant elasticity model because the coefficient of elasticity between Y and X (β1) remains constant:

β1 = (dY/dX)·(X/Y) = d ln Y / d ln X
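The constant-elasticity property can be demonstrated numerically: generating data with a known elasticity and fitting the log-log regression recovers that elasticity as the slope. The values below are synthetic, chosen only for illustration:

```python
import numpy as np

# Constant-elasticity data with a known elasticity of 1.5 (no noise),
# i.e. Y = 2 * X**1.5, so the log-log regression should recover slope 1.5
X = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
Y = 2.0 * X ** 1.5

# Fit ln Y = ln(b0) + b1 * ln X by least squares
b1, ln_b0 = np.polyfit(np.log(X), np.log(Y), 1)
print(round(b1, 4), round(float(np.exp(ln_b0)), 4))
```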
CONT’D…
Semi-log Form
Lin-log form: Yi = β0 + β1 ln X1i + Ui
Log-lin form: ln Yi = β0 + β1X1i + Ui

(Figure: for the linear form Y = β0 + β1Xi, the line slopes upward when β1 > 0 and downward when β1 < 0.)
CONT’D…
Polynomial Form
Y = β0 + β1X1i + β2X1i² + β3X2i + Ui

The marginal effects are:

∂Y/∂X1 = β1 + 2β2X1
∂Y/∂X2 = β3

(Figures A and B: the impact of age on earnings; a typical cost curve.)
CONT…
Reciprocal Transformation (Inverse Functional Forms)
Yi = β0 + β1(1/X1i) + β2X2i + Ui

As X1i increases indefinitely, the term β1(1/X1i) approaches zero and Y approaches the asymptotic value β0. The curve slopes downward when β1 > 0 and upward when β1 < 0.
DUMMY VARIABLE REGRESSION
ANALYSIS
Dummy variables are discrete variables taking a value of ‘0’ or ‘1’. They are
often called ‘on’ ‘off’ variables, being ‘on’ when they are 1.
Dummy variables can be used either as explanatory variables or as the
dependent variable.
When they act as the dependent variable there are specific problems with how
the regression is interpreted, however when they act as explanatory variables
they can be interpreted in the same way as other variables.
CONT…
There are four broad types of measurement scales: nominal, ordinal, interval and ratio scale variables. Regression models do not deal only with ratio scale variables; they can also involve nominal and ordinal scale variables.
CONT…
We can model this in the following way:
yi = α + βDi + ui, where Di = 1 if the household head is male and 0 if female.

This produces an average maize farm productivity for female hhh of E(y|Di = 0) = α.

The average maize farm productivity of male hhh will be E(y|Di = 1) = α + β.
If sex has a significant effect on maize farm productivity, this suggests that male
hhh have higher maize farm productivity than females.
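A dummy regression with a single intercept dummy reproduces the two group means exactly, as a sketch shows. The productivity figures below are hypothetical:

```python
# Dummy-variable regression y = a + b*D + u, where D = 1 for male household
# heads and 0 for female (productivity figures below are hypothetical)
y = [12.0, 14.0, 11.0, 15.0, 16.0, 13.0]   # maize farm productivity
D = [0,    0,    0,    1,    1,    1]      # sex dummy

n = len(y)
mean_d = sum(D) / n
mean_y = sum(y) / n

# OLS slope and intercept in deviation form
b = sum((d - mean_d) * (v - mean_y) for d, v in zip(D, y)) / \
    sum((d - mean_d) ** 2 for d in D)
a = mean_y - b * mean_d

# With a single dummy, a equals the female mean and a + b the male mean
female_mean = sum(v for v, d in zip(y, D) if d == 0) / D.count(0)
male_mean = sum(v for v, d in zip(y, D) if d == 1) / D.count(1)
print(a, a + b, female_mean, male_mean)
```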
CONT…
So far we have been dealing with variables that we can measure in quantitative
terms.
However, there are cases where certain variables of great importance are
qualitative in nature.
There is however, a more efficient procedure involving the estimation of only one
equation if we are willing to make certain assumptions.
Suppose that we hypothesize that war time controls do not alter the marginal
propensity to consume out of disposable income, but instead simply reduce the
average propensity to consume.
CON’T…
By this we mean that the slope remains the same, whereas the constant term
becomes smaller for war- time case.
Ct = b0 + b1Ydt + b2Dt + ut,  t = 1, 2, …, n,
CON’T…
Where Dt = 0 during peace-time years and Dt = 1 for war years.
Using the data, we could estimate the values of the coefficient in equation with
our standard multiple regression equation.
CON’T…
Suppose that we in fact did this and obtained the equation
Ĉt = 40 + 0.9Ydt − 30Dt
Let us say that the t- ratio corresponding to the Dt was of sufficient size to
suggest that the parameter b2 is not zero.
We would then conclude that the war had a significant negative effect on consumption expenditures. The estimated consumption function would be Ĉt = 40 + 0.9Ydt during peace time and Ĉt = (40 − 30) + 0.9Ydt = 10 + 0.9Ydt during war time.
CON’T…
It allows us to expand the scope of our analysis to encompass variables that
we cannot measure in quantitative terms.
Here X1 is the intercept column and D1–D4 are dummies for the four categories; note that D1 + D2 + D3 + D4 = X1, an exact linear relationship (the dummy variable trap).

Observation  Category  X1  D1  D2  D3  D4
1            4         1   0   0   0   1
2            3         1   0   0   1   0
3            1         1   1   0   0   0
4            2         1   0   1   0   0
5            2         1   0   1   0   0
6            3         1   0   0   1   0
7            1         1   1   0   0   0
8            4         1   0   0   0   1
Alternatively, one may run the regression dropping one of the dummy variables in the linear relationship, effectively letting the intercept represent the omitted category by itself.
There is another way of avoiding the dummy variable trap. That is to drop
the intercept (and X1). There is no longer a problem because there is no
longer an exact linear relationship linking the variables.
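The trap can be seen directly in the design matrix built from the table above: with the intercept and all four dummies, the matrix is rank-deficient, so X'X is singular and OLS cannot be computed.

```python
import numpy as np

# Design matrix from the table above: intercept X1 plus all four category dummies
X = np.array([
    [1, 0, 0, 0, 1],   # category 4
    [1, 0, 0, 1, 0],   # category 3
    [1, 1, 0, 0, 0],   # category 1
    [1, 0, 1, 0, 0],   # category 2
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
])

# D1 + D2 + D3 + D4 equals the intercept column, so the 5-column matrix
# has rank 4, not 5: the dummy variable trap
print(np.linalg.matrix_rank(X))
# Dropping one dummy (keeping intercept, D1, D2, D3) restores full column rank
print(np.linalg.matrix_rank(X[:, :4]))
```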
UNIT 5: ECONOMETRIC
PROBLEMS
Assumptions Revisited
Non-normality
Heteroscedasticity
Autocorrelation
Multicollinearity
ASSUMPTIONS REVISITED
Two major problems arise in applying the classical linear regression model:
1. those due to assumptions about the specification of the model and about
the disturbances and
2. those due to assumptions about the data
ASSUMPTIONS
The disturbance terms have zero mean, i.e. E(ui) = 0
The variance of ui is constant i.e. homoscedastic
There is no autocorrelation in the disturbance terms
The explanatory variables are independently distributed with the ui.
CON’T…
The number of observations must be greater than the number of explanatory
variables.
There is no linear relationship (multicollinearity) among the explanatory
variables.
But, since the intercept term is not very important, we can leave this aside.
The OLS estimators are BLUE regardless of whether the ui are normally
distributed or not.
In addition, because of the central limit theorem, we can argue that the test
procedures:
The t-tests and F-tests - are still valid asymptotically, i.e. in large sample.
HETEROSCEDASTICITY: THE ERROR
VARIANCE IS NOT CONSTANT
If the error terms in the regression equation have a common variance, they are homoscedastic. If they do not have a common variance, we say they are heteroscedastic.
As income grows people have discretionary income and hence more scope
for choice about the disposition of their income. Hence, the variance of the
regression is more likely to increase with income.
Incorrect data transformation and incorrect functional form are also other
sources.
CONSEQUENCES OF
HETEROSCEDASTICITY
If the error terms of an equation are heteroscedastic:
estimators are still linear.
The least square estimators are still unbiased.
But there are three major consequences:
It does affect the minimum variance property.
The OLS estimators are inefficient.
Thus the test statistics – t-test and F-test – cannot be relied on in the face of
uncorrected heteroscedasticity.
DETECTION OF
HETEROSCEDASTICITY
There are no hard and fast rules (universally agreed upon methods) for
detecting the presence of heteroscedasticity.
Most of these methods are based on the examination of the OLS residuals,
There are informal and formal methods of detecting heteroscedasticity.
CON’T…
4. Spearman's rank correlation test

rS = 1 − 6Σdi² / [N(N² − 1)]

Compute the rank correlation between the absolute values of the OLS residuals and the explanatory variable; a significant rS suggests heteroscedasticity.

For example, for the model Yi = β0 + β1Xi + Ui, heteroscedasticity may take the form σi² = σ²Xi².
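A minimal sketch of the Spearman rank test for heteroscedasticity follows. The residuals here are hypothetical, constructed so that their spread grows with X:

```python
# Spearman rank test for heteroscedasticity: rank-correlate |residuals| with X
# (data here are hypothetical, constructed so the error spread grows with X)
X = [1, 2, 3, 4, 5, 6, 7, 8]
e = [0.1, -0.2, 0.4, -0.5, 0.9, -1.1, 1.6, -2.0]   # residuals from some fit

def ranks(v):
    """Rank the values of v from 1 (smallest) to n (largest)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

n = len(X)
d2 = [(a - b) ** 2 for a, b in zip(ranks(X), ranks([abs(x) for x in e]))]
rs = 1 - 6 * sum(d2) / (n * (n * n - 1))
print(rs)   # close to 1: |e| rises with X, suggesting heteroscedasticity
```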
REMEDIAL MEASURES
Cov(Ui, Vj) = 0, i ≠ j
Serial correlation implies that the error term from one time period depends
in some systematic way on error terms from other time periods.
Autocorrelation is more a problem of time series data than cross-sectional
data.
If by chance, such a correlation is observed in cross-sectional units, it is
called spatial autocorrelation.
CAUSES OF AUTOCORRELATION
Inertia – most economic time series move in cycles; in an upswing, the value of a series at one point in time is greater than its previous values, and these successive observations are likely to be interdependent.
Specification bias – exclusion of important variables or incorrect functional forms
Lags – in a time series regression
Manipulation of data – if the raw data is manipulated (extrapolated or interpolated)
CONSEQUENCES OF SERIAL CORRELATION
If the Ui’s are autocorrelated, then prediction based on the OLS estimates will
be inefficient.
DETECTING AUTOCORRELATION
The Durbin-Watson d statistic:

d = Σ(t=2..N) (et − et−1)² / Σ(t=1..N) et²
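The d statistic is straightforward to compute from a residual series; the residual sequences below are illustrative:

```python
# Durbin-Watson d statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den

# Alternating residuals: d well above 2, indicating negative autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Slowly drifting residuals: d near 0, indicating positive autocorrelation
print(durbin_watson([1.0, 1.1, 1.2, 1.1, 1.0]))
```

As a rule of thumb, d near 2 suggests no autocorrelation, d near 0 positive autocorrelation, and d near 4 negative autocorrelation.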
REMEDIAL MEASURES FOR AUTOCORRELATION
Any Question…
GOOD LUCK !!!