Lesson 3

ECONOMETRIC MODELLING

WISDOM R. MGOMEZULU
ECONOMETRIC PROCEDURE
• Statement of theory or hypothesis.
• Specification of the mathematical model of the theory.
• Specification of the statistical, or econometric, model.
• Obtaining the data.
• Estimation of the parameters of the econometric model.
• Hypothesis testing.
• Forecasting or prediction.
• Using the model for control or policy purposes.
Statement of Theory or Hypothesis

• This is where the researcher states what economic theory says about the interdependence of the economic variables of interest.
• Economists include variables in econometric models based on theory. For example, the theory of demand states that the quantity demanded of a product in a given time period is inversely related to its own price.
• Economic theory also indicates the direction of the relationship between variables. However, the theory does not quantify the dependency.
Specification of the Mathematical Model of Demand

• The theory of demand can be presented in mathematical form as shown below:
• 𝑌 = 𝛽0 + 𝛽1𝑋
• 𝑌 is the dependent variable, for example quantity demanded. 𝑋 is the independent variable, in this case price.
• 𝛽0 is the intercept parameter.
• 𝛽1 is the slope parameter.
• The mathematical model is deterministic or exact. It does not take into account the possibility of error.
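As a minimal sketch, such a deterministic model can be evaluated directly; the parameter values below are assumed purely for illustration and are not from the lecture.

```python
# Deterministic (exact) demand model Y = b0 + b1 * X.
# Parameter values are illustrative assumptions, not estimates from data.
b0, b1 = 100.0, -2.0  # b1 < 0: quantity demanded falls as price rises

def quantity_demanded(price):
    """Exact quantity demanded at a given price; no error term."""
    return b0 + b1 * price

print(quantity_demanded(10.0))  # 100 - 2 * 10 = 80.0
```

Because the model is exact, the same price always gives the same quantity; nothing in the equation allows for deviations from the line.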
Specification of the Econometric Model

• The equation on the previous slide assumes that there is an exact or deterministic relationship between the two variables. But relationships between economic variables are generally inexact.
• An econometric model is hence stochastic in nature: it adds a random error (disturbance) term to the mathematical model.
Data collection

• Data are collected on both the dependent and independent variables. The quality of the estimates is only as good as the quality of the data, so an adequate sample should be selected using an appropriate sampling technique. Data collection officers (enumerators) should be trustworthy, well trained and preferably experienced.
Estimation of parameters

• Now that we have the data, our next task is to estimate the parameters of the model. The numerical estimates of the parameters give empirical content to the consumption function, for example:
• Yi = 0.65 + 7.2Xi
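The least squares estimates can be computed with NumPy. The data below are made up for illustration (they are not the lecture's dataset), so the resulting coefficients differ from the 0.65 and 7.2 shown above.

```python
import numpy as np

# Hypothetical (x, y) observations, assumed purely for illustration
x = np.array([80.0, 100.0, 120.0, 140.0, 160.0, 180.0])
y = np.array([65.0, 77.0, 89.0, 101.0, 113.0, 125.0])

# Ordinary least squares:
#   slope     b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   intercept b0 = ybar - b1 * xbar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

print(b0, b1)  # intercept and slope of the fitted line
```

These two formulas are the closed-form least squares solution for the simple linear model; larger models use the same idea in matrix form.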
Hypothesis Testing
• Assuming that the fitted model is a reasonably good
approximation of reality, we have to develop suitable
criteria of finding out whether the estimates are in accord
with the expectations of the theory that is being tested.
• A theory or hypothesis that is not verifiable by empirical
evidence may not be acceptable as a part of scientific
enquiry.
Forecasting or Prediction
• If the chosen model does not refute the hypothesis or
theory under consideration, we may use it to predict the
future value(s) of the dependent, or forecast, variable Y on
the basis of the known or expected future value(s) of the
explanatory, or predictor, variable X.
Use of the Model for Control or Policy
Purposes

• Once the parameter estimates are known, we can advise policy makers and implementers on how best consumption should be changed to alter GDP levels.
REGRESSION
• Regression is a technique for determining the statistical
relationship between two or more variables where a
change in a dependent variable is associated with, and
depends on, a change in one or more independent
variables.
• A regression model can also be defined as a mathematical
equation that helps to predict or forecast the value of the
dependent variable based on the known values of
independent variables.
SIMPLE LINEAR REGRESSION
• A simple linear regression is a statistical
equation that characterizes the relationship
between a dependent variable and only one
independent variable.
III. Simple Linear Model: Estimation

1. An Econometric Model
2. Assumptions of the Simple Linear Regression Model

3.1 An Econometric Model

Two purposes in general:

1. Estimate a relationship among economic variables, such as y = f(x).
2. Forecast or predict the value of one variable, y, based on the value of another variable, x.
 The simple regression function
E(y | x) = μ_{y|x} = β1 + β2x

 Slope of the regression line
β2 = ΔE(y | x)/Δx = dE(y | x)/dx

"Δ" denotes "change in"


Y X 800 1000 1200 1400 1600 1800 2000 2200 2400 2600

Weekly family 550 650 790 800 1020 1100 1200 1350 1370 1500
consumption
expenditure Y, 600 700 840 930 1070 1150 1360 1370 1450 1520
MK
650 740 900 950 1100 1200 1400 1400 1550 1750

700 800 940 1030 1160 1300 1440 1520 1650 1780

750 850 880 1080 1180 1350 1450 1570 1750 1800

- 880 - 1130 1250 1400 - 1600 1890 1850

- - - 1150 - - - 1620 - 1910

Total 3250 4620 4450 7070 6780 7500 6850 10430 9660 12110

Conditional 650 770 890 1010 1130 1250 1370 1490 1610 1730
means of Y, E(Y|
X)
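The conditional means in the last row can be reproduced directly; the grouped data below transcribe the table's columns (the fifth observation at X = 1200 is taken as 980, consistent with the column total 4450).

```python
# Weekly consumption expenditure Y (MK) grouped by income level X (MK),
# transcribed from the table above.
data = {
    800:  [550, 600, 650, 700, 750],
    1000: [650, 700, 740, 800, 850, 880],
    1200: [790, 840, 900, 940, 980],
    1400: [800, 930, 950, 1030, 1080, 1130, 1150],
    1600: [1020, 1070, 1100, 1160, 1180, 1250],
    1800: [1100, 1150, 1200, 1300, 1350, 1400],
    2000: [1200, 1360, 1400, 1440, 1450],
    2200: [1350, 1370, 1400, 1520, 1570, 1600, 1620],
    2400: [1370, 1450, 1550, 1650, 1750, 1890],
    2600: [1500, 1520, 1750, 1780, 1800, 1850, 1910],
}

# Conditional mean E(Y | X = x) for each income level
cond_means = {x: sum(ys) / len(ys) for x, ys in data.items()}
print(cond_means)
```

Each conditional mean is just the column average; the regression function traces how this average moves as income changes.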
E(y | x) = μ_{y|x} = β0 + β1x   (2.1)

[Figure: the population regression line E(y | x) = β0 + β1x, plotting average expenditure E(y | x) against income x; the slope is ΔE(y | x)/Δx and the intercept is β0.]
[Figure: conditional distributions of weekly consumption expenditure Y ($) at weekly income levels X = $80, $140 and $220; the conditional means 65, 101 and 149 lie on the population regression line.]
3.2 Assumptions of the Simple Linear Regression Model-I

1. The average value of y, for each value of x, is given by the linear regression
E(y) = β1 + β2x

2. For each value of x, the values of y are distributed about their mean value, following probability distributions that all have the same variance,
var(y) = σ²
Data satisfying this condition are said to be homoskedastic. If this assumption is violated, so that the variance of y differs across values of income x, the data are said to be heteroskedastic.

3. The values of y are all uncorrelated and have zero covariance, implying that there is no linear association among them:
cov(yi, yj) = 0
This assumption can be made stronger by assuming that the values of y are all statistically independent. This is what we mean by saying the sampling is done at random.

4. The variable x is not random and must take at least two different values. The idea of regression analysis is to measure the effect of changes in one variable, x, on another, y.

5. (optional) The values of y are normally distributed about their mean for each value of x:
y ~ N(β1 + β2x, σ²)
3.2.1 Introducing the Error Term

The random error term is

e = y − E(y) = y − β1 − β2x

Rearranging gives

y = β1 + β2x + e

y is the dependent variable; x is the independent or explanatory variable.
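A quick simulation (with assumed parameter values, not from the lecture) illustrates the decomposition y = β1 + β2x + e, where the error e averages out to zero:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed illustrative parameters for the true regression line
beta1, beta2, sigma = 50.0, 0.6, 10.0

x = np.linspace(80.0, 260.0, 1000)       # x is treated as fixed, not random
e = rng.normal(0.0, sigma, size=x.size)  # random error term, E(e) = 0
y = beta1 + beta2 * x + e                # systematic part plus random part

# The sample mean of the errors is close to zero for a large sample
print(round(e.mean(), 2))
```

Each y value is the point on the line plus its own draw of e; the line itself is never observed, only the y values scattered around it.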
[Figure: the relationship among y, e and the true regression line E(y) = β1 + β2x; each observation yi deviates from the line by its error ei.]


The essence of regression analysis is that any observation
on the dependent variable y can be decomposed into two
parts: a systematic component and a random component.

The systematic component of y is its mean E(y), which itself is not random, since it is a mathematical expectation.

The random component of y is the difference between y and its mean value E(y). This is called a random error term, and is defined as

εi = Yi − E(Y | Xi)
or
Yi = E(Y | Xi) + εi
or
Yi = β0 + β1Xi + εi

where the deviation εi is an unobservable random variable taking positive or negative values. Technically, εi is known as the stochastic disturbance or stochastic error term. We are summing a systematic or deterministic component and a stochastic or nonsystematic component. The dependent variable y is explained by a component that varies systematically with the independent variable x and by the random error term ε. We assume that the stochastic component is a proxy for all the omitted or neglected variables that may affect Y but are not (or cannot be) included in the regression model.

Now if we take the expected value of the equation above on both sides, we obtain

E(Yi | Xi) = E[E(Y | Xi)] + E(εi | Xi)

E(Yi | Xi) = E(Y | Xi) + E(εi | Xi)

Since E(Yi | Xi) is the same thing as E(Y | Xi), this implies that

E(εi | Xi) = 0

Thus the assumption that the regression line passes through the conditional means of Y implies that the conditional mean values of εi (conditional upon the given X's) are zero.

The random variable y and the error term e differ only by a constant, E(y), and since y is random, so is the error term e. Hence the probability density functions for y and e are identical except for their location.
[Figure 4: probability density functions f(e) and f(y); f(e) is centred at 0 and f(y) at β0 + β1x, with identical shapes.]


Assumptions of the Simple Linear Regression Model-II

SR1. y = β1 + β2x + e

SR2. E(e) = 0 ⇔ E(y) = β1 + β2x

SR3. var(e) = σ² = var(y)

SR4. cov(ei, ej) = cov(yi, yj) = 0

SR5. {xt, t = 1, …, T} is a set of fixed values and must take at least two different values.

SR6. (optional) The values of e are normally distributed about their mean:
e ~ N(0, σ²)
The significance of the stochastic disturbance term is:

a. Vagueness of theory. The theory determining the behaviour of y may be, and often is, incomplete.

b. Unavailability of data. Even if we know what some of the excluded variables are and therefore consider a multiple regression rather than a simple regression, we may not have quantitative information about these variables.

c. Core variables versus peripheral variables. There could be many other independent variables that also affect our dependent variable. But it is quite possible that the joint influence of all or some of these variables is so small, and at best nonsystematic or random, that as a practical matter and for cost considerations it does not pay to introduce them into the model explicitly.

d. Intrinsic randomness in human behaviour. Even if we succeed in introducing all the relevant variables into the model, there is bound to be some unpredictable randomness in individual y's that cannot be explained no matter how hard we try.

e. Poor proxy variables. In practice data may be plagued by errors of measurement. Measurements on proxy variables may not accurately give the measurement of the true variable.

f. Principle of parsimony. We would like to keep our regression model as simple as possible. Of course, we should not exclude relevant and important variables just to keep the regression model simple.

g. Wrong functional form. Even if we have theoretically correct variables explaining a phenomenon, and even if we can obtain data on these variables, very often we do not know the form of the functional relationship between the regressand and the regressors.
3.3 Estimating Parameters

3.3.1 The Least Squares Estimation (LSE)

 The fitted regression line is
ŷt = b1 + b2xt

 The least squares residual is
êt = yt − ŷt = yt − b1 − b2xt

 Suppose any other fitted line
ŷt* = b1* + b2*xt

 The least squares line has the smaller sum of squared residuals:
Σ êt² = Σ (yt − ŷt)² ≤ Σ êt*² = Σ (yt − ŷt*)²
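A numerical check of this property, with made-up data: the least squares line's sum of squared residuals is never larger than that of any other line.

```python
import numpy as np

# Made-up observations, assumed purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least squares fit: slope b2 and intercept b1
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
ssr_ls = np.sum((y - (b1 + b2 * x)) ** 2)

# Any other line, e.g. the least squares line perturbed slightly
ssr_other = np.sum((y - ((b1 + 0.5) + (b2 - 0.1) * x)) ** 2)

print(ssr_ls <= ssr_other)  # True: the LS line minimizes the SSR
```

Perturbing the coefficients in any direction strictly increases the sum of squared residuals, because the SSR is a convex function of (b1, b2) minimized at the least squares solution.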
[Figure: the relationship among y, ê and the fitted regression line ŷ = b1 + b2x; each observation yi deviates from the fitted line by its residual êi.]
[Figure: the fitted least squares line ŷ = b1 + b2x compared with any other line ŷ* = b1* + b2*x and its residuals êi*; the sum of squared residuals from any other line will be larger.]
