Econometrics ppt-1
Econometrics
Kindineh Sisay
(Lecturer of Agricultural and Applied Economics)
Email: [email protected]
January, 2022
Haramaya University
CHAPTER OUTLINES
Unit 1: Fundamental concepts of Econometrics
Unit 2: Correlation Theory
Unit 3: Simple Linear Regression Models
Unit 4: Multiple Regression Analysis
Unit 5: Econometric Problems
Unit 6: Non-linear regression and Time series Econometrics
CHAPTER ONE
INTRODUCTION TO ECONOMETRICS
Outlines:
Definition and scope of econometrics
Methodology of econometrics
Goals of econometrics
DEFINITION AND SCOPE
What is Econometrics?
A set of variables
A list of fundamental relationships and
A number of strategic coefficients
CONT’D…
Econometric models
They contain a random element which is
ignored by mathematical models and economic
theory which postulate exact relationships
between economic variables.
Example: Economic theory postulates that the
demand for a commodity depends on its price,
on the prices of other related commodities, on
consumers’ income and on tastes.
This is an exact relationship which can be
written mathematically as:
CONT’D…
The above demand equation is exact.
However, many more factors may affect demand.
In econometrics the influence of these ‘other’
factors is taken into account by introducing a random
variable into the relationship.
ii. Determine a priori theoretical expectations about the size and sign
of the parameters of the function, and
Criteria:
Economic a priori criteria: determined by economic theory
and refer to the size and sign of the parameters of economic
relationships.
CONT’D…
Statistical criteria (first-order tests):
These are determined by statistical theory and aim at the
evaluation of the statistical reliability of the estimates of the
parameters of the model.
properties:
Theoretical plausibility;
form, the better the model provided that the other desirable
properties are not affected by the simplifications of the model.
GOALS OF
ECONOMETRICS
Basically there are three main goals of
Econometrics.
Types of correlation
If the data points make a straight line going from the origin out
to high x- and y-values, then the variables are said to have a
positive correlation.
no correlation = 0.
estimate by rxy.
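\[ r_{xy} = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{n\sum X_i^2 - (\sum X_i)^2}\,\sqrt{n\sum Y_i^2 - (\sum Y_i)^2}} \]
\[ r_{xy} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2}\,\sqrt{\sum y_i^2}} \qquad (2.2) \]
where \(x_i = X_i - \bar{X}\) and \(y_i = Y_i - \bar{Y}\).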
Or using the deviation form (Equation 2.2), the correlation coefficient can be
computed as:
\[ r = \frac{1810}{\sqrt{330 \times 10490}} \approx 0.973 \]
CONT’D…
There is a strong positive correlation between the quantity
supplied and the price of the commodity under consideration.
If the values of two variables X and Y are ranked in ascending or
descending order, the rank correlation coefficient may be computed
by the formula
\[ r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} \]
CONT’D…
Where,
D = difference between ranks of corresponding pairs of
X and Y
n = number of observations.
Brands of soap   A   B   C   D   E   F   G   H   I   J   K   L   Total
Person I         9  10   4   1   8  11   3   2   5   7  12   6
Person II        7   8   3   1  10  12   2   6   5   4  11   9
Di               2   2   1   0  -2  -1   1  -4   0   3   1  -3
Di²              4   4   1   0   4   1   1  16   0   9   1   9   50
CONT’D…
The rank correlation coefficient
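As a quick check of the soap-ranking example, the rank correlation can be computed directly from the two rankings in the table. The following is a minimal Python sketch, assuming there are no tied ranks; the function name `spearman_rank` is an illustrative choice, not something defined in these slides.

```python
def spearman_rank(rank_x, rank_y):
    """Rank correlation: 1 - 6*sum(D^2) / (n*(n^2 - 1)).

    Assumes both lists hold ranks 1..n with no ties.
    """
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given to soap brands A..L by the two persons (from the table above)
person_1 = [9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6]
person_2 = [7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9]

sum_d2 = sum((a - b) ** 2 for a, b in zip(person_1, person_2))
r_rank = spearman_rank(person_1, person_2)
print(sum_d2, round(r_rank, 4))
```

With the sum of squared rank differences equal to 50 and n = 12, the coefficient is about 0.825, which indicates fairly close agreement between the two persons' rankings.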
Y 40 44 46 48 52 58 60 68 74 80
X1 6 10 12 14 16 18 22 24 26 32
X2 4 4 5 7 9 12 14 20 21 24
ANSWER
\( r_{yx_2} = 0.9917 \)
\( r_{x_1 x_2} = 0.9725 \)
Then
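The simple correlation coefficients for this data set can be verified with a short script. This is only a sketch using the deviation form of the correlation formula; the helper name `pearson_r` is hypothetical, and the value for the pair (Y, X1) is computed here rather than taken from the slides.

```python
from math import sqrt


def pearson_r(a, b):
    """Simple correlation coefficient computed from deviations about the means."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    s_ab = sum(p * q for p, q in zip(dev_a, dev_b))
    s_aa = sum(p * p for p in dev_a)
    s_bb = sum(q * q for q in dev_b)
    return s_ab / sqrt(s_aa * s_bb)


# Data from the example above
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X1 = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
X2 = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]

r_yx1 = pearson_r(Y, X1)
r_yx2 = pearson_r(Y, X2)
r_x1x2 = pearson_r(X1, X2)
print(round(r_yx1, 4), round(r_yx2, 4), round(r_x1x2, 4))
```

The second and third values reproduce the 0.9917 and 0.9725 reported in the answer; the first, for Y and X1, comes out at about 0.9854.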
Any Question …
UNIT 3: SIMPLE LINEAR
REGRESSION MODELS
The value the error term assumes in one period does not depend
on the value it assumes in any other period. This is the assumption
of non-autocorrelation (no serial correlation).
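\[ \hat{\beta}_1 = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \]
Or
\[ \hat{\beta}_1 = \frac{\sum XY - n\bar{Y}\bar{X}}{\sum X_i^2 - n\bar{X}^2} \]
and, from above, we have \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\).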
CONT’D…
Yi Xi
10 30
20 50
30 60
CONT’D…
\[ \hat{\beta}_1 = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} = \frac{3(3100) - (140)(60)}{3(7000) - (140)^2} = 0.64 \]
C. Y= -10+(0.64) (45)=18.8
CONT’D…
That means when X assumes a value of 45, the
value of Y on average is expected to be 18.8.
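Then, in deviation form,
\[ \hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{300}{466.67} = 0.64, \quad \text{and} \quad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} = 20 - (0.64)(46.67) \approx -10 \]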
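The three-observation example can be reproduced with a few lines of code. This is a minimal sketch of the raw-sums formula for the slope and the mean-based formula for the intercept; the function name `ols_simple` is an illustrative assumption.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Data from the example above
X = [30, 50, 60]
Y = [10, 20, 30]

b0, b1 = ols_simple(X, Y)
y_hat_45 = b0 + b1 * 45          # predicted Y when X = 45
print(round(b0, 2), round(b1, 4), round(y_hat_45, 2))
```

Note that the unrounded slope is 900/1400, about 0.6429, so the prediction at X = 45 is about 18.93; the 18.8 shown in the slides results from rounding the slope to 0.64 before predicting.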
2. The variance of
3. The mean of
4. The variance of
5. The estimated value of the variance of the error term
STANDARD ERROR OF THE
PARAMETERS
HYPOTHESIS TESTING
THE COEFFICIENT OF
DETERMINATION (R²)
The total variation of the dependent variable is given in the following form:
TSS = ESS + RSS
which means that the total sum of squares of the dependent variable is split into
the explained sum of squares and the residual sum of squares.
CONT’D…
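The coefficient of determination is given by the formula
\[ R^2 = \frac{\text{Explained variation in } Y}{\text{Total variation in } Y} = \frac{\sum(\hat{Y}_i - \bar{Y})^2}{\sum(Y_i - \bar{Y})^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} \qquad (2.16) \]
Since \(\sum \hat{y}_i^2 = \hat{\beta}_1 \sum x_i y_i\), the coefficient of determination can also be given as
\[ R^2 = \frac{\hat{\beta}_1 \sum x_i y_i}{\sum y_i^2} \]
Or
\[ R^2 = 1 - \frac{\text{Unexplained variation in } Y}{\text{Total variation in } Y} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{\sum e_i^2}{\sum y_i^2} \]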
CONT’D…
Variance of the slope is given by
\[ \operatorname{var}(\hat{\beta}_1) = \hat{\sigma}_u^2 \cdot \frac{1}{\sum x_i^2} \]
where
\[ \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n - k} \]
Y 69 76 52 56 57 77 58 55 67 53 72 64
X 9 12 6 10 9 10 7 8 12 6 11 8
SOLUTION
Time    Y    X    XY    X²     Y²     x    y    xy   x²   y²     Ŷ        ei       ei²
1      69    9   621    81   4761     0    6     0    0    36   63.00     6.00    36.00
2      76   12   912   144   5776     3   13    39    9   169   72.75     3.25    10.56
3      52    6   312    36   2704    -3  -11    33    9   121   53.25    -1.25     1.56
4      56   10   560   100   3136     1   -7    -7    1    49   66.25   -10.25   105.06
5      57    9   513    81   3249     0   -6     0    0    36   63.00    -6.00    36.00
6      77   10   770   100   5929     1   14    14    1   196   66.25    10.75   115.56
7      58    7   406    49   3364    -2   -5    10    4    25   56.50     1.50     2.25
8      55    8   440    64   3025    -1   -8     8    1    64   59.75    -4.75    22.56
9      67   12   804   144   4489     3    4    12    9    16   72.75    -5.75    33.06
10     53    6   318    36   2809    -3  -10    30    9   100   53.25    -0.25     0.06
11     72   11   792   121   5184     2    9    18    4    81   69.50     2.50     6.25
12     64    8   512    64   4096    -1    1    -1    1     1   59.75     4.25    18.06
Sum   756  108  6960  1020  48522     0    0   156   48   894  756.00     0.00   387.00
CONT’D…
1. Estimate the coefficient of determination (R²) and the coefficient
of non-determination (1 - R²), and interpret the result.
2. Find the variance and standard error of the intercept.
3. Find the variance and standard error of the slope.
4. Run a significance test of the regression coefficients using the
following test methods:
The standard error test
Student's t-test
\[ R^2 = 1 - \frac{\sum e_i^2}{\sum y_i^2} = 1 - \frac{387}{894} = 1 - 0.43 = 0.57 \]
\[ \operatorname{var}(\hat{\beta}_1) = \hat{\sigma}_u^2 \cdot \frac{1}{\sum x^2} = 38.7\left(\frac{1}{48}\right) = 0.80625 \]
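The calculations for this twelve-observation example can be checked end to end in code. The sketch below follows the deviation-form formulas used in this unit. The intercept variance uses the standard textbook expression, the estimated error variance times the sum of squared X values divided by n times the sum of squared deviations of X, which is an assumption added here because that formula did not survive in the slides.

```python
from math import sqrt

# Data from the example above (n = 12 observations)
Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]
n, k = len(Y), 2                     # k = number of estimated parameters

x_bar, y_bar = sum(X) / n, sum(Y) / n
x = [xi - x_bar for xi in X]         # deviations from the mean
y = [yi - y_bar for yi in Y]

sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

b1 = sum_xy / sum_x2                 # slope
b0 = y_bar - b1 * x_bar              # intercept

residuals = [Yi - (b0 + b1 * Xi) for Xi, Yi in zip(X, Y)]
sum_e2 = sum(e * e for e in residuals)

r2 = 1 - sum_e2 / sum_y2             # coefficient of determination
sigma2_hat = sum_e2 / (n - k)        # estimated error variance
var_b1 = sigma2_hat / sum_x2
# Assumed standard formula for the intercept variance (not shown in the slides)
var_b0 = sigma2_hat * sum(Xi ** 2 for Xi in X) / (n * sum_x2)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, R2 = {r2:.2f}")
print(f"var(b1) = {var_b1:.5f}, se(b1) = {sqrt(var_b1):.4f}")
print(f"var(b0) = {var_b0:.5f}, se(b0) = {sqrt(var_b0):.4f}")
print(f"t(b1) = {b1 / sqrt(var_b1):.3f}")
```

The script reproduces the figures used above: a residual sum of squares of 387, an estimated error variance of 38.7, R² of about 0.57 and a slope variance of 0.80625.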
estimators).
QUIZ
5. Fit the linear regression equation and determine the 95% confidence
interval for the slope.
UNIT 4: MULTIPLE REGRESSION
ANALYSIS
\[ Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + u_i \]
\[ Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u_i \]
NOTATIONS AND ASSUMPTIONS
\[ E(u_i^2) = \sigma_u^2 = \text{constant} \]
\[ u_i \sim N(0, \sigma_u^2) \]
ESTIMATION OF PARTIAL
REGRESSION COEFFICIENTS
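\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 \]
\[ e_i = Y_i - \hat{Y}_i \]
Choose \(\hat{\beta}_0\), \(\hat{\beta}_1\) and \(\hat{\beta}_2\) in such a way that \(\sum e_i^2\) is a minimum:
\[ \frac{\partial \sum e_i^2}{\partial \hat{\beta}_0} = \frac{\partial \sum_{i=1}^{n} (Y - \hat{\beta}_0 - \hat{\beta}_1 X_1 - \hat{\beta}_2 X_2)^2}{\partial \hat{\beta}_0} = 0 \qquad (3.5) \]
\[ \frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = \frac{\partial \sum_{i=1}^{n} (Y - \hat{\beta}_0 - \hat{\beta}_1 X_1 - \hat{\beta}_2 X_2)^2}{\partial \hat{\beta}_1} = 0 \qquad (3.6) \]
\[ \frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = \frac{\partial \sum_{i=1}^{n} (Y - \hat{\beta}_0 - \hat{\beta}_1 X_1 - \hat{\beta}_2 X_2)^2}{\partial \hat{\beta}_2} = 0 \qquad (3.7) \]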
ESTIMATION OF PARTIAL
REGRESSION COEFFICIENTS
Solving equations (3.5), (3.6) and (3.7) simultaneously, we
obtain the system of normal equations given as follows:
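\[ \sum Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum X_{1i} + \hat{\beta}_2 \sum X_{2i} \qquad (3.8) \]
\[ \sum X_{1i} Y_i = \hat{\beta}_0 \sum X_{1i} + \hat{\beta}_1 \sum X_{1i}^2 + \hat{\beta}_2 \sum X_{1i} X_{2i} \qquad (3.9) \]
\[ \sum X_{2i} Y_i = \hat{\beta}_0 \sum X_{2i} + \hat{\beta}_1 \sum X_{1i} X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 \qquad (3.10) \]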
CONT’D…
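Then, letting
\[ x_{1i} = X_{1i} - \bar{X}_1 \qquad (3.11) \]
\[ x_{2i} = X_{2i} - \bar{X}_2 \qquad (3.12) \]
\[ y_i = Y_i - \bar{Y} \qquad (3.13) \]
the OLS estimators are
\[ \hat{\beta}_1 = \frac{\sum x_1 y \sum x_2^2 - \sum x_2 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2} \qquad (3.14) \]
\[ \hat{\beta}_2 = \frac{\sum x_2 y \sum x_1^2 - \sum x_1 y \sum x_1 x_2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2} \qquad (3.15) \]
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 \qquad (3.16) \]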
VARIANCE AND STANDARD
ERRORS OF OLS ESTIMATORS
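\[ \operatorname{Var}(\hat{\beta}_1) = \hat{\sigma}_u^2 \cdot \frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2}, \qquad SE(\hat{\beta}_1) = \sqrt{\operatorname{Var}(\hat{\beta}_1)} \]
\[ \operatorname{Var}(\hat{\beta}_2) = \hat{\sigma}_u^2 \cdot \frac{\sum x_1^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2}, \qquad SE(\hat{\beta}_2) = \sqrt{\operatorname{Var}(\hat{\beta}_2)} \]
\[ \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n - 3} \]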
COEFFICIENT OF MULTIPLE
DETERMINATION
It measures the proportion of the variation in the
dependent variable that is explained jointly by the
independent variables in the model.
\[ R^2 = \frac{\sum \hat{y}^2}{\sum y^2} \qquad (3.24) \]
\[ R^2_{adj} = 1 - (1 - R^2)\frac{(n - 1)}{(n - k)} \qquad (3.26) \]
where k is the number of parameters.
In multiple linear regression, therefore, it is better to interpret
the adjusted \(R^2\) than the ordinary (unadjusted) \(R^2\).
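To make the adjustment concrete, the short sketch below applies the adjusted R² formula to the sums from the Unit 3 worked example (residual sum of squares 387, total sum of squares 894, n = 12, k = 2). The function name `adjusted_r2` is an illustrative assumption.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared, where k counts all estimated parameters (including the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Sums taken from the Unit 3 worked example
r2 = 1 - 387 / 894
r2_adj = adjusted_r2(r2, n=12, k=2)
print(round(r2, 4), round(r2_adj, 4))
```

The adjusted value (about 0.52) is below the unadjusted one (about 0.57) because the formula penalises each additional estimated parameter through the degrees of freedom.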
CONFIDENCE INTERVAL
ESTIMATION
Interpretation of the confidence interval: values of the
parameter lying in the interval are plausible with \(100(1 - \alpha)\%\)
confidence.
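\[ H_0: \beta_1 = 0 \qquad H_0: \beta_2 = 0 \qquad \dots \qquad H_0: \beta_K = 0 \]
\[ H_1: \beta_1 \neq 0 \qquad H_1: \beta_2 \neq 0 \qquad \dots \qquad H_1: \beta_K \neq 0 \]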
B. t-test: the more appropriate and formal way to test the hypothesis.
We compute the t-ratios, compare them with the tabulated t-values and
make our decision.
different from 0.
\[ H_0: \beta_1 = \beta_2 = 0 \]
\[ H_1: \beta_i \neq 0, \text{ at least for one } i. \]
CONT’D…
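\[ F_{cal} = \frac{\sum \hat{y}^2 / (k - 1)}{\sum e^2 / (n - k)} = \frac{MSR}{MSE} \]
The residual sum of squares has \(n - k\) degrees of freedom, and the total, \(SST = \sum y^2\), has \(n - 1\).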
CONT’D…
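\[ R^2 = \frac{\sum \hat{y}^2}{\sum y^2} \;\Rightarrow\; \sum \hat{y}^2 = R^2 \sum y^2 \]
\[ R^2 = 1 - \frac{\sum e^2}{\sum y^2} \;\Rightarrow\; \sum e^2 = (1 - R^2) \sum y^2 \]
\[ F_{cal} = \frac{\sum \hat{y}^2 / (k - 1)}{\sum e^2 / (n - k)} = \frac{R^2 \sum y^2 / (k - 1)}{(1 - R^2) \sum y^2 / (n - k)} = \frac{R^2 \sum y^2}{k - 1} \cdot \frac{(n - k)}{(1 - R^2) \sum y^2} \]
\[ F_{cal} = \frac{(n - k)}{k - 1} \cdot \frac{R^2}{(1 - R^2)} \]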
EXAMPLE
Year  1960  1961  1962  1963  1964  1965  1966  1967  1968  1969  1970  1971
Y       57    43    73    37    64    48    56    50    39    43    69    60
X1     220   215   250   241   305   258   354   321   370   375   385   385
X2     125   147   118   160   128   149   145   150   140   115   155   152
CONT…
a. Estimate the coefficients of the economic relationship and fit the model.
Year   Y      X1        X2        x1        x2        y       x1²       x2²       x1y       x2y       x1x2      y²
1960   57     220       125       -86.5833  -15.3333   3.75   7496.668  235.1101  -324.687  -57.4999  1327.608  14.0625
1961   43     215       147       -91.5833    6.6667 -10.25   8387.501  44.44489   938.7288 -68.3337  -610.558  105.0625
1962   73     250       118       -56.5833  -22.3333  19.75   3201.67   498.7763  -1117.52  -441.083  1263.692  390.0625
1963   37     241       160       -65.5833   19.6667 -16.25   4301.169  386.7791   1065.729 -319.584  -1289.81  264.0625
1964   64     305       128        -1.5833  -12.3333  10.75   2.506839  152.1103   -17.0205 -132.583  19.52731  115.5625
1965   48     258       149       -48.5833    8.6667  -5.25   2360.337  75.11169   255.0623 -45.5002  -421.057  27.5625
1966   56     354       145        47.4167    4.6667   2.75   2248.343  21.77809   130.3959  12.83343  221.2795  7.5625
1967   50     321       150        14.4167    9.6667  -3.25   207.8412  93.44509   -46.8543 -31.4168   139.3619  10.5625
1968   39     370       140        63.4167   -0.3333 -14.25   4021.678  0.111089  -903.688    4.749525 -21.1368  203.0625
1969   43     375       115        68.4167  -25.3333 -10.25   4680.845  641.7761  -701.271  259.6663  -1733.22  105.0625
1970   69     385       155        78.4167   14.6667  15.75   6149.179  215.1121   1235.063  231.0005  1150.114  248.0625
1971   60     385       152        78.4167   11.6667   6.75   6149.179  136.1119   529.3127  78.75022  914.8641  45.5625
Sum    639    3679      1684        0.0004    0.0004   0      49206.92  2500.667   1043.25  -509       960.6667  1536.25
Mean   53.25  306.5833  140.3333    0         0        0
CONT…
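\[ \bar{Y} = \frac{\sum Y}{n} = \frac{639}{12} = 53.25 \]
\[ \bar{X}_1 = \frac{\sum X_1}{n} = \frac{3679}{12} = 306.5833 \]
\[ \bar{X}_2 = \frac{\sum X_2}{n} = \frac{1684}{12} = 140.3333 \]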
The summary results in deviation forms are then given by:
\[ \sum x_1^2 = 49206.92 \qquad \sum x_2^2 = 2500.667 \]
\[ \sum x_1 y = 1043.25 \qquad \sum x_2 y = -509 \]
\[ \sum x_1 x_2 = 960.6667 \qquad \sum y^2 = 1536.25 \]
CONT’D…
The fitted model is then written as: \(\hat{Y}_i = 75.40512 + 0.025365X_1 - 0.21329X_2\)
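The fitted coefficients can be reproduced from the raw data with the deviation-form formulas (3.14) to (3.16). The following is a minimal sketch in plain Python; the helper names `deviations` and `sum_products` are illustrative assumptions.

```python
# Data from the example above (1960-1971)
Y = [57, 43, 73, 37, 64, 48, 56, 50, 39, 43, 69, 60]
X1 = [220, 215, 250, 241, 305, 258, 354, 321, 370, 375, 385, 385]
X2 = [125, 147, 118, 160, 128, 149, 145, 150, 140, 115, 155, 152]


def deviations(values):
    """Return (deviations from the mean, mean)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean


def sum_products(a, b):
    """Sum of element-wise products of two equally long lists."""
    return sum(p * q for p, q in zip(a, b))


x1, x1_bar = deviations(X1)
x2, x2_bar = deviations(X2)
y, y_bar = deviations(Y)

s11, s22, s12 = sum_products(x1, x1), sum_products(x2, x2), sum_products(x1, x2)
s1y, s2y = sum_products(x1, y), sum_products(x2, y)

denominator = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / denominator      # equation (3.14)
b2 = (s2y * s11 - s1y * s12) / denominator      # equation (3.15)
b0 = y_bar - b1 * x1_bar - b2 * x2_bar          # equation (3.16)

print(f"Y_hat = {b0:.4f} + {b1:.6f} X1 + ({b2:.5f}) X2")
```

Small differences in the last decimal places relative to the slides are expected, since the slides work with rounded intermediate sums.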
a) Compute the variance and standard errors of the slopes.
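First, you need to compute the estimate of the variance of the random term as follows:
\[ \hat{\sigma}_u^2 = \frac{\sum e_i^2}{n - 3} = \frac{1401.223}{12 - 3} = \frac{1401.223}{9} = 155.69143 \]
Variance of \(\hat{\beta}_1\):
\[ \operatorname{Var}(\hat{\beta}_1) = \hat{\sigma}_u^2 \left( \frac{\sum x_2^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2} \right) = 155.69143\left(\frac{2500.667}{122127241}\right) = 0.003188 \]
Standard error of \(\hat{\beta}_1\):
\[ SE(\hat{\beta}_1) = \sqrt{\operatorname{Var}(\hat{\beta}_1)} = \sqrt{0.003188} = 0.056462 \]
Variance of \(\hat{\beta}_2\):
\[ \operatorname{Var}(\hat{\beta}_2) = \hat{\sigma}_u^2 \left( \frac{\sum x_1^2}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2} \right) = 155.69143\left(\frac{49206.92}{122127241}\right) = 0.0627 \]
Standard error of \(\hat{\beta}_2\):
\[ SE(\hat{\beta}_2) = \sqrt{\operatorname{Var}(\hat{\beta}_2)} = \sqrt{0.0627} = 0.25046 \]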
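The variance and standard error calculations above can be checked from the summary sums alone. This is a small sketch that takes the rounded sums reported in this example as given; the variable names are illustrative.

```python
from math import sqrt

# Summary results reported in this example (rounded as in the slides)
sum_e2 = 1401.223                 # residual sum of squares
n, k = 12, 3                      # observations and estimated parameters
s11, s22, s12 = 49206.92, 2500.667, 960.6667   # sums of x1^2, x2^2 and x1*x2

sigma2_hat = sum_e2 / (n - k)     # estimated variance of the random term
denominator = s11 * s22 - s12 ** 2

var_b1 = sigma2_hat * s22 / denominator
var_b2 = sigma2_hat * s11 / denominator
se_b1, se_b2 = sqrt(var_b1), sqrt(var_b2)

print(f"sigma2_hat = {sigma2_hat:.5f}")
print(f"var(b1) = {var_b1:.6f}, se(b1) = {se_b1:.6f}")
print(f"var(b2) = {var_b2:.4f}, se(b2) = {se_b2:.5f}")
```

Both slopes share the same denominator, so it is computed once and reused.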
CONT’D…
Similarly, the standard error of the intercept is found to be 37.98177. The detail is left for you as
an exercise.
a) Calculate and interpret the coefficient of determination.
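We can use the following summary results to obtain the \(R^2\):
\(\sum \hat{y}^2 = 135.0262\), \(\sum e^2 = 1401.223\), \(\sum y^2 = 1536.25\) (the sum of the above two). Then,
\[ R^2 = \frac{\hat{\beta}_1 \sum x_1 y + \hat{\beta}_2 \sum x_2 y}{\sum y^2} = \frac{(0.025365)(1043.25) + (-0.21329)(-509)}{1536.25} = 0.087894 \]
or
\[ R^2 = 1 - \frac{\sum e^2}{\sum y^2} = 1 - \frac{1401.223}{1536.25} = 0.087894 \]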
The critical value to be used here (two-tailed test at α = 0.05 with 9 degrees of freedom) is 2.262. Like the standard error test, the t-test revealed
that both X1 and X2 are statistically insignificant in explaining the change in Y, since the absolute values of the calculated t-statistics
are both less than the critical value.
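The t-ratios behind this conclusion are not shown on the slide, so the sketch below recomputes them from the coefficients and standard errors reported earlier in the example. The critical value 2.262 is the one quoted above; the dictionary layout is just an illustrative choice.

```python
# Estimates and standard errors reported earlier in this example
estimates = {
    "X1": (0.025365, 0.056462),
    "X2": (-0.21329, 0.25046),
}
t_critical = 2.262   # two-tailed, alpha = 0.05, 9 degrees of freedom (from the slides)

t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > t_critical for name, t in t_ratios.items()}

for name in estimates:
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t_ratios[name]:.3f} -> {verdict}")
```

The two ratios are roughly 0.449 and -0.852, both well inside the interval from -2.262 to 2.262, which is why neither slope is judged significant.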
CONT’D…
a) Test the overall significance of the model. (Hint: use \(\alpha\) = 0.05)
This involves testing whether at least one of the two variables X1 and X2 determines the changes
in Y. The hypothesis to be tested is given by:
\[ H_0: \beta_1 = \beta_2 = 0 \]
\[ H_1: \beta_i \neq 0, \text{ at least for one } i. \]
The ANOVA table for the test is given as follows:
Source of variation   Sum of squares          Degrees of freedom    Mean sum of squares                    Fcal
Regression            SSR = Σŷ² = 135.0262    k - 1 = 3 - 1 = 2     MSR = Σŷ²/(k - 1) = 135.0262/2 = 67.51309    F = MSR/MSE = 0.433634
Residual              SSE = Σe² = 1401.223    n - k = 12 - 3 = 9    MSE = Σe²/(n - k) = 1401.223/9 = 155.6914
Total                 SST = Σy² = 1536.25     n - 1 = 12 - 1 = 11
Or…
\[ F_{cal} = \frac{(n - k)}{k - 1} \cdot \frac{R^2}{(1 - R^2)} = \frac{(12 - 3)}{3 - 1} \cdot \frac{0.087894}{1 - 0.087894} = 0.433632 \]
X2 to the changes in Y.
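Both routes to the F statistic can be compared numerically. The sketch below uses the sums of squares from the ANOVA table; the function names are illustrative, and the 5% critical value of about 4.26 for (2, 9) degrees of freedom is taken from standard F tables rather than from these slides.

```python
def f_from_sums(ssr, sse, n, k):
    """F statistic from the explained and residual sums of squares."""
    return (ssr / (k - 1)) / (sse / (n - k))


def f_from_r2(r2, n, k):
    """Equivalent F statistic expressed through R-squared."""
    return (n - k) / (k - 1) * r2 / (1 - r2)


ssr, sse = 135.0262, 1401.223     # from the ANOVA table above
n, k = 12, 3
r2 = ssr / (ssr + sse)

f_sums = f_from_sums(ssr, sse, n, k)
f_r2 = f_from_r2(r2, n, k)
f_critical = 4.26                 # assumed: F(0.05; 2, 9) from standard tables

print(round(f_sums, 4), round(f_r2, 4), f_sums > f_critical)
```

The two versions agree up to rounding (about 0.434), and the statistic is far below the critical value, so the null hypothesis that both slopes are zero is not rejected.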
Any Question …
EXTENSIONS OF REGRESSION MODELS
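\[ Y_i = \beta_0 X_i^{\beta_1} \]
\[ Y_i = \beta_0 X_i^{\beta_1} e^{U_i} \]
\[ \ln Y_i = \ln \beta_0 + \beta_1 \ln X_i + U_i \]
[Figure: two panels showing the relationship in levels and in logarithms; surviving axis labels: price, input]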
CONT’D…
The model is also called a constant elasticity model because
the coefficient of elasticity between Y and X (\(\beta_1\)) remains
constant.
\[ \frac{\partial Y}{\partial X} \cdot \frac{X}{Y} = \frac{d \ln Y}{d \ln X} = \beta_1 \]
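A small numerical illustration may help here. The sketch below generates hypothetical data from an exact power function (the values 3 and 0.5 are assumptions chosen purely for illustration), regresses ln Y on ln X, and recovers the elasticity as the slope.

```python
from math import exp, log


def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Hypothetical data: Y = 3 * X**0.5 exactly (no random term)
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [3.0 * xi ** 0.5 for xi in X]

ln_b0, elasticity = ols_simple([log(v) for v in X], [log(v) for v in Y])
b0 = exp(ln_b0)                   # back-transform the intercept

print(round(b0, 4), round(elasticity, 4))
```

Because the model is linear in logarithms, ordinary least squares on the transformed variables returns the elasticity directly, and the multiplicative constant is recovered by exponentiating the intercept.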
CONT’D…
Semi-log Form
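\[ Y_i = \beta_0 + \beta_1 \ln X_{1i} + U_i \qquad\qquad \ln Y_i = \beta_0 + \beta_1 X_{1i} + U_i \]
[Figure: curves for \(\beta_1 < 0\) and \(\beta_1 > 0\) compared with the linear case \(Y = \beta_0 + \beta_1 X_i\), plotted against x]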
CONT’D…
Polynomial Form
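\[ Y = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \beta_3 X_{2i} + U_i \]
\[ \frac{\partial Y}{\partial X_1} = \beta_1 + 2\beta_2 X_1 \]
\[ \frac{\partial Y}{\partial X_2} = \beta_3 \]
[Figure: A) Impact of age on earnings; B) a typical cost curve]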
CONT…
Reciprocal Transformation (Inverse Functional Forms)
\[ Y_i = \beta_0 + \beta_1\left(\frac{1}{X_{1i}}\right) + \beta_2 X_{2i} + U_i \]
[Figure: two panels of Y against \(X_{1i}\), for different signs of \(\beta_0\) and \(\beta_1\)]
DUMMY VARIABLE REGRESSION
ANALYSIS
Dummy variables are discrete variables taking a value of ‘0’ or
‘1’. They are often called ‘on/off’ variables, being ‘on’ when
they are 1.
Dummy variables can be used either as explanatory variables
or as the dependent variable.
When they act as the dependent variable, there are specific
problems with how the regression is interpreted; when
they act as explanatory variables, however, they can be interpreted in the
same way as other variables.
CONT…
These are: nominal, ordinal, interval and ratio scale variables.
Regression models do not deal only with ratio scale variables; they can also
involve nominal and ordinal scale variables.
\[ y_i = \alpha + \beta D_i + u_i \]
CONT…
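We can model this in the following way:
\[ y_i = \alpha + \beta D_i + u_i \]
This produces an average maize farm productivity for female household heads of \(E(y \mid D_i = 0) = \alpha\).
The average maize farm productivity of male household heads will be \(E(y \mid D_i = 1) = \alpha + \beta\).
If sex has a significant effect on maize farm productivity (that is, \(\beta\) is positive and statistically significant), this suggests that male
household heads have higher maize farm productivity than female household heads.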
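The interpretation of the intercept and the dummy coefficient as group means can be seen in a tiny example. The productivity figures below are hypothetical numbers invented for illustration, with D = 1 for male and D = 0 for female household heads.

```python
# Hypothetical maize productivity figures (illustrative only)
y = [10.0, 12.0, 14.0, 15.0, 17.0, 19.0]
D = [0, 0, 0, 1, 1, 1]            # 0 = female, 1 = male household head

n = len(y)
d_bar, y_bar = sum(D) / n, sum(y) / n

# Simple OLS of y on the dummy D
beta = (sum((d - d_bar) * (v - y_bar) for d, v in zip(D, y))
        / sum((d - d_bar) ** 2 for d in D))
alpha = y_bar - beta * d_bar

# Group means for comparison
mean_female = sum(v for d, v in zip(D, y) if d == 0) / D.count(0)
mean_male = sum(v for d, v in zip(D, y) if d == 1) / D.count(1)

print(alpha, beta, mean_female, mean_male)
```

The estimated intercept equals the mean of the group coded 0, and the dummy coefficient equals the difference between the two group means, which is what the conditional expectations above state.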
CONT…
So far we have been dealing with variables that we can measure in
quantitative terms.
However, there are cases where certain variables of great importance are
qualitative in nature.
There is however, a more efficient procedure involving the estimation of only one
equation if we are willing to make certain assumptions.
Suppose that we hypothesize that wartime controls do not alter the marginal
propensity to consume out of disposable income, but instead simply reduce the
average propensity to consume.
CON’T…
By this we mean that the slope remains the same, whereas the constant term
becomes smaller in the wartime case.
Using the data, we could estimate the coefficients of the equation with
our standard multiple regression procedure.
CON’T…
Suppose that we in fact did this and obtained the equation
\[ \hat{C}_t = 40 + 0.9Y_{dt} - 30D_t \]
Let us say that the t-ratio corresponding to \(D_t\) was of sufficient size to
suggest that the parameter \(b_2\) is not zero.
We would then conclude that the war had a significant negative effect on
consumption expenditures. The estimated consumption function would be
CON’T…
It allows us to expand the scope of our analysis to encompass variables that
we cannot measure in quantitative terms.
Observation  Category  X1  D1  D2  D3  D4
1            4         1   0   0   0   1
2            3         1   0   0   1   0
3            1         1   1   0   0   0
4            2         1   0   1   0   0
5            2         1   0   1   0   0
6            3         1   0   0   1   0
7            1         1   1   0   0   0
8            4         1   0   0   0   1
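The coding in the table can be generated mechanically from the category column. The sketch below builds the constant column X1 and the four dummies D1 to D4 for the eight observations shown; the variable names are illustrative.

```python
# Category of each observation, as listed in the table above
categories = [4, 3, 1, 2, 2, 3, 1, 4]
levels = [1, 2, 3, 4]

# Each row: constant column X1 followed by one dummy per category level
rows = [[1] + [1 if c == level else 0 for level in levels] for c in categories]

print("Obs Cat X1 D1 D2 D3 D4")
for obs, (c, row) in enumerate(zip(categories, rows), start=1):
    print(obs, c, *row)

# In every row the four dummies add up to the constant column
dummies_sum_to_constant = all(sum(row[1:]) == row[0] for row in rows)
print(dummies_sum_to_constant)
```

Since D1 + D2 + D3 + D4 equals the constant column in every row, including the intercept together with all four dummies produces perfect multicollinearity; the usual remedy is to drop one dummy (or the intercept) and treat the omitted category as the reference group.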
Assumptions Revisited
Non-normality
Heteroscedasticity
Autocorrelation
Multicollinearity
ASSUMPTIONS REVISITED
But, since the intercept term is not very important we can leave it.
4. Spearman’s Rank Correlation test
\[ r_S = 1 - 6\left[\frac{\sum d_i^2}{N(N^2 - 1)}\right] \]
\[ \sigma_i^2 = \sigma^2 X_i^2 \]
REMEDIAL MEASURES
Serial correlation implies that the error term from one time
period depends in some systematic way on error terms from
other time periods.
Autocorrelation is more a problem of time series data than
cross-sectional data.
If by chance, such a correlation is observed in cross-sectional
units, it is called spatial autocorrelation.
CAUSES OF
AUTOCORRELATION
Inertia or sluggishness in economic time series is a major cause of
autocorrelation.
For example, GNP, production, price index, employment, and
unemployment exhibit business cycles.
In an upswing, the value of a series at one point in time is greater than
its previous value. These successive periods (observations) are likely to
be interdependent.
Specification bias – exclusion of important variables or incorrect
functional forms
Lags – in a time series regression
Manipulation of data – if the raw data is manipulated (extrapolated or
interpolated)
CONSEQUENCES OF SERIAL CORRELATION
\[ d = \frac{\sum_{t=2}^{N} (e_t - e_{t-1})^2}{\sum_{t=1}^{N} e_t^2} \]
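The d statistic above is simple to compute once the residuals are available. The sketch below implements the formula directly; the residual series used to exercise it are hypothetical values chosen only to show the two extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series (illustrative only)
alternating = [1.0, -1.0, 2.0, -2.0]      # sign flips every period
persistent = [1.0, 1.0, 1.0, 1.0]         # no change from period to period

d_alternating = durbin_watson(alternating)
d_persistent = durbin_watson(persistent)
print(d_alternating, d_persistent)
```

The statistic always lies between 0 and 4: values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation, and values around 2 to no first-order autocorrelation.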
REMEDIAL MEASURES FOR AUTOCORRELATION
GOOD LUCK !!!