Chapter One Review of Linear Regression Models: Definitions and Components of Econometrics

This document provides an overview of linear regression models and econometrics. It defines key terms:
- Econometrics aims to test economic theories against real-world data and provide quantitative relationships between economic variables.
- Regression analysis describes and evaluates the relationship between a dependent variable and one or more independent variables.
- Simple linear regression involves a dependent variable and one independent variable, assuming the dependent variable is influenced only by the independent variable and an error term.

Uploaded by

webeshet bekele

Chapter One

Review of Linear Regression Models


 Definitions and Components of econometrics

• The economic theories we learn in various economics courses suggest many relationships among economic variables. For instance:
• In microeconomics we learn demand and supply models in which the quantities demanded and supplied of a good depend on its price.
• In macroeconomics, we study the ‘investment function’, which explains the amount of aggregate investment in the economy as the rate of interest changes, and the ‘consumption function’, which relates aggregate consumption to the level of aggregate disposable income.
Definitions of Econometrics
• Economic theories that postulate the relationships between
economic variables have to be checked against data obtained
from the real world.
– If empirical data verify the relationship proposed by economic
theory, we accept the theory as valid.
– If the theory is incompatible with the observed behavior, we either
reject the theory or in the light of the empirical evidence of the data,
modify the theory.
– To provide a better understanding of economic relationships and better guidance for economic policy making, we also need to know the quantitative relationships between the different economic variables.
– The field of knowledge which helps us to carry out such an evaluation of economic theories in empirical terms is econometrics.
WHAT IS ECONOMETRICS?
• Literally, econometrics means “economic measurement”.
• “Econometrics is the science which integrates economic theory, economic statistics, and mathematical economics to investigate the empirical support of the general schematic law established by economic theory.”
• It is a special type of economic analysis and research in which general economic theories, formulated in mathematical terms, are combined with empirical measurements of economic phenomena.
• The “metric” part of the word econometrics signifies ‘measurement’, and hence econometrics is basically concerned with the measurement of economic relationships.
• In short, econometrics may be considered as the integration of
economics, mathematics, and statistics for the purpose of providing
numerical values for the parameters of economic relationships and
verifying economic theories.
Econometrics vs mathematical economics
• Mathematical economics states economic theory in terms of mathematical symbols. There is no essential difference between mathematical economics and economic theory: both state the same relationships, but while economic theory uses verbal exposition, mathematical economics uses symbols.
• Both express economic relationships in an exact or deterministic form.
Neither mathematical economics nor economic theory allows for random
elements which might affect the relationship and make it stochastic.
• Although econometrics presupposes that economic relationships be expressed in mathematical form, it does not assume exact or deterministic relationships.
• Econometric methods are designed to take into account random disturbances which create deviations from the exact behavioral patterns suggested by economic theory and mathematical economics.
• Furthermore, econometric methods provide numerical values for the coefficients of economic relationships.
Econometrics vs. statistics
• Econometrics differs from statistics. A statistician gathers empirical data, records them, tabulates or charts them, and attempts to describe the pattern in their development over time and perhaps detect some relationship between various economic magnitudes.
• Mathematical (or inferential) statistics deals with methods of measurement which are developed on the basis of controlled experiments.
• But statistical methods of measurement are not appropriate for a number of economic relationships because, for most economic relationships, controlled or carefully planned experiments cannot be designed, owing to the fact that the relationships among economic variables are stochastic or random.
• Yet the fundamental ideas of inferential statistics are applicable in econometrics, but they must be adapted to the problems of economic life.
Importance of Econometrics
• Each such specification involves a relationship among economic variables (the direction of a relationship).
• As economists, we may be interested in questions such
as:
– If one variable changes in a certain magnitude, by how much
will another variable change?
– Also, given that we know the value of one variable; can we
forecast or predict the corresponding value of another?
• The purpose of studying the relationships among economic variables, and of attempting to answer questions of the type raised here, is to help us understand the real economic world we live in.
Goals of Econometrics
• Three main goals of Econometrics are identified:
– Analysis i.e. testing economic theory
– Policy making i.e. Obtaining numerical estimates of the
coefficients of economic relationships for policy
simulations.
– Forecasting i.e. using the numerical estimates of the
coefficients in order to forecast the future values of
economic magnitudes.
Concept of correlation and regression function

• The correlation coefficient measures the degree to which two variables are related/associated.
• Simple correlation is denoted by r.
• For more than two variables we have multiple correlation.
• Two variables may have either positive
correlation, negative correlation or may not be
correlated.
• Furthermore, depending on the form of relationship
the correlation between two variables may be linear
or non-linear.
• When higher values of X are associated with higher values
of Y and lower values of X are associated with lower values
of Y, then the correlation is said to be positive or direct.
• Examples:
– Income and expenditure
– Number of hours spent in studying and the score obtained
– Height and weight
– Distance covered and fuel consumed by car.
• When higher values of X are associated with lower values
of Y and lower values of X are associated with higher values
of Y, then the correlation is said to be negative or inverse.
• Examples:
– Price of a good and quantity demanded
• The correlation between X and Y may be one of the following:
Perfect positive (r = 1)
Positive (0 < r < 1)
No correlation (r = 0)
Negative (−1 < r < 0)
Perfect negative (r = −1)
• The presence of correlation between two variables may be due to three reasons:
One variable being the cause of the other. The cause is called the “subject” or “independent” variable, while the effect is called the “dependent” variable.
Both variables being the result of a common cause. That is, the correlation that exists between the two variables is due to their both being related to some third force.
Chance. The correlation may be spurious, arising by coincidence with no real relationship between the variables.
Con’t
• Therefore, in this section, we shall be concerned with
quantifying the degree of association between two
variables with linear relationship.
• Contrary to regression analysis, explained in the previous section, the computation of the coefficient of correlation does not require one variable to be designated as dependent and the other as independent.
• The measure of the degree of relationship between any two variables, known as the Pearsonian coefficient of correlation and usually denoted by r, is defined by what is termed the product–moment formula.
It can be further simplified as:

r = (ΣXiYi − nX̄Ȳ) / √[(ΣXi² − nX̄²)(ΣYi² − nȲ²)]

NB. The building blocks of this formula are, therefore, ΣXiYi, ΣXi², ΣYi², X̄, Ȳ, and n (the sample size).
Obs   Yi   Xi   Xi²   Yi²   XiYi
1     4    2    4     16    8
2     7    3    9     49    21
3     3    1    1     9     3
4     9    5    25    81    45
5     17   9    81    289   153
∑     40   20   120   444   230

Substituting these totals (with n = 5, X̄ = 4, Ȳ = 8):

r = (230 − 5(4)(8)) / √[(120 − 5(4)²)(444 − 5(8)²)] = 70/√(40 × 124) ≈ 0.994

Interpretation: r ≈ 0.994 implies a strong positive linear relation between X & Y.
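The arithmetic above can be checked with a short Python sketch (not part of the original slides; the data come from the table above):

```python
import math

def pearson_r(x, y):
    """Product-moment correlation:
    r = (sum XY - n*Xbar*Ybar) / sqrt[(sum X^2 - n*Xbar^2)(sum Y^2 - n*Ybar^2)]"""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    sxx = sum(a * a for a in x) - n * xbar ** 2
    syy = sum(b * b for b in y) - n * ybar ** 2
    return sxy / math.sqrt(sxx * syy)

X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]
print(round(pearson_r(X, Y), 3))  # 0.994: strong positive correlation
```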
Concept of regression function

• Regression Analysis: is concerned with describing and evaluating the relationship between a dependent variable and one or more independent variables.
• Regression Analysis: is a statistical technique that can be
used to develop a mathematical equation showing how
variables are related.
• It is used to bring out the nature of the relationship and to use it to obtain the best approximate value of one variable from the other.
• Therefore, we will deal with the problem of estimating
and/or predicting the population mean/average values of
the dependent variable on the basis of known values of the
independent variable (s).
Types of variables:
• The variable whose value is to be
estimated/predicted is known as dependent
variable
• The variables which help us in determining the
value of the dependent variable are known as
independent variables.
Simple Linear Regression model

• A regression equation which involves only two variables, a dependent and an independent, is referred to as simple linear regression.
• This model assumes that the dependent variable is
influenced only by one systematic variable and the error
term.
• The relationship between any two variables may be linear
or non-linear.
• Linear implies a constant absolute change in the dependent variable in response to a unit change in the independent variable.
Simple Linear Regression model

• The specific functional forms may be linear, quadratic, logarithmic, exponential, hyperbolic, or any other form.
• In this part we shall consider a simple linear regression model,
i.e. a relationship between two variables related in a linear
form.
• A relationship between X and Y, characterized as Y = f(X) is said
to be deterministic or non-stochastic if for each value of the
independent variable (X) there is one and only one
corresponding value of dependent variable (Y).
• On the other hand, a relationship between X and Y is said to
be stochastic if for a particular value of X there is a whole
probabilistic distribution of values of Y.
Stochastic and Non-stochastic Relationships
• Assuming that the supply for a certain commodity depends on its
price (other determinants taken to be constant) and the function
being linear, the relationship can be put as:
Q  f (P)    P
• The above relationship between P and Q is such that for a particular value of P, there is only one corresponding value of Q. This is, therefore, a deterministic (non-stochastic) relationship, since for each price there is always only one corresponding quantity supplied. This implies that all the variation in Q is due solely to changes in P, and that there are no other factors affecting the dependent variable.
• If this were true all the points of price-quantity pairs, if plotted on a
two-dimensional plane, would fall on a straight line. However, if we
gather observations on the quantity actually supplied in the market
at various prices and we plot them on a diagram we see that they do
not fall on a straight line.
Stochastic and Non-stochastic Relationships

Fig. The scatter diagram

• The deviation of the observations from the line may be attributed to several factors:
– Omission of variables from the function
– Random behavior of human beings
– Imperfect specification of the mathematical form of the model
– Error of aggregation
– Error of measurement
Econometric functions
• In order to take into account the above sources of error, we introduce into econometric functions a random variable, usually denoted by the letter ‘u’ or ‘ε’, called the error term, random disturbance, or stochastic term of the function; so called because u is supposed to ‘disturb’ the exact linear relationship which is assumed to exist between X and Y. By introducing this random variable into the function, the model is rendered stochastic, of the form:

Yi = α + βXi + ui
• Thus a stochastic model is a model in which the dependent
variable is not only determined by the explanatory variable(s)
included in the model but also by others which are not
included in the model.
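The distinction can be illustrated with a minimal simulation sketch (the parameter values α = 2, β = 0.5, and σ = 1 are illustrative assumptions, not from the slides):

```python
import random

def simulate(alpha, beta, xs, sigma, seed=0):
    """Generate Yi = alpha + beta*Xi + ui, where ui is a random disturbance.
    With sigma = 0 the disturbance vanishes and the relationship is deterministic."""
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]
deterministic = simulate(2, 0.5, xs, sigma=0.0)  # exactly on the line: one Y per X
stochastic = simulate(2, 0.5, xs, sigma=1.0)     # scattered around the line
print(deterministic)  # [2.5, 3.0, 3.5, 4.0, 4.5]
print(stochastic)
```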
Methods of estimation
• Specifying the model and stating its underlying assumptions are the first stage of any econometric application. The next step is the estimation of the numerical values of the parameters of economic relationships. The parameters of the simple linear regression model can be estimated by various methods. Among the most commonly used methods are:
– Ordinary least squares method (OLS)
– Generalized least squares method (GLS)
– Instrumental variables method (IV)
– Two-stage least squares method (2SLS)
– Maximum likelihood method (MLM)
– Method of moments (MM)
• Here, however, we will deal with the OLS method of estimating the linear regression model.
The regression Equation
• Regression equation is a statement of equality that
defines the relationship between two variables.
• The equation of the line which is to be used in
predicting the value of the dependent variable
takes the form Ye= a + bx.
Yi = α + βxi + ui
(dependent variable)  (the regression line)  (random variable)

• The most universally used and statistically accepted method of fitting such an equation is the method of least squares.
The Method of Least Squares
• This method requires that the straight line be fitted so that the sum of the squared vertical deviations of the observed Y values from the straight line (the predicted Y values) is a minimum.

• If e1, e2, …, en are the vertical deviations of the observed Y values from the straight line (the predicted Y values, Ye), fitting a straight line in keeping with the above condition requires that (for sample size n)

Σei² = Σ(Yi − Ye)² = Σ(Yi − a − bXi)²  be a minimum.

• This can be done by partially differentiating Σei² with respect to “a” and “b” and equating the derivatives to zero.
ei is the error made when taking Ye instead of Y; therefore, ei = Yi − Ye.
• To find the value of b, partially differentiate with respect to b.
 Con’t
Alternative formulas

• Non-zero intercept:

β̂ = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)

β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

• With zero intercept (α = 0):

β̂ = ΣXiYi / ΣXi²
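The two non-zero-intercept formulas are algebraically identical; a quick numerical sketch (using the data of the worked example that follows) can confirm they give the same β̂:

```python
def beta_moments(x, y):
    """beta-hat = (sum XY - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2)"""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    den = sum(a * a for a in x) - n * xbar ** 2
    return num / den

def beta_deviations(x, y):
    """beta-hat = sum (X - Xbar)(Y - Ybar) / sum (X - Xbar)^2"""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = sum((a - xbar) ** 2 for a in x)
    return num / den

X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]
print(beta_moments(X, Y), beta_deviations(X, Y))  # both 1.75
```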
Example
• Suppose we want to study the relationship between input (number of workers) and output (thousands of Birr) of five factories, given in the table below.
• To fit the regression line of Yi (thousands of Birr) on Xi (number of workers), we can employ the method of least squares as follows:
 Arrange the data in tabular form
Industry   Output (Y) in thousands of Birr   Input (X) (no. of workers)   Paired data (X, Y)
1          4                                 2                            (2, 4)
2          7                                 3                            (3, 7)
3          3                                 1                            (1, 3)
4          9                                 5                            (5, 9)
5          17                                9                            (9, 17)

Output level (Yi) is believed to depend on the number of workers (Xi). Accordingly, Yi is the dependent variable and Xi is the independent variable.
In order to visualize the form of regression we plot these points
on a graph as shown in fig. 6.1. What we get is a scatter diagram.
• When carefully observed, the scatter diagram at
least shows the nature of relationship; whether
positive or negative and whether the curve is linear
or non-linear.
• When the general course of movement of the paired points is best described by a straight line, the next task is to fit a regression line which lies as close as possible to every point on the scatter diagram.
• This can be done by means of either free hand
drawing or the method of least squares.
• However, the latter is the most widely used
method.
Yi   Xi   Yi·Xi   Xi²
4    2    8       4
7    3    21      9
3    1    3       1
9    5    45      25
17   9    153     81
∑(total)   40   20   230   120
Mean       8    4
Solution
• Substituting these values in the above equations, we get:

b = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²) = (230 − 5(4)(8)) / (120 − 5(4)²) = 70/40 = 1.75
a = Ȳ − bX̄ = 8 − 1.75(4) = 1

Therefore the least squares regression equation is:

Ye = 1 + 1.75X

• Estimate the amount of Birr that a factory will have if it has 8 workers, i.e. Xi = 8:

Ye = 1 + 1.75(8) = 15

• Consequently, if a factory has 8 workers, its level of output will be 15 thousand ETB.
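The fitted line and the prediction can be reproduced with a short sketch (data taken from the table above):

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line Ye = a + b*X."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return a, b

X = [2, 3, 1, 5, 9]   # number of workers
Y = [4, 7, 3, 9, 17]  # output in thousands of Birr
a, b = ols_fit(X, Y)
print(a, b)       # 1.0 1.75
print(a + b * 8)  # 15.0 thousand Birr predicted for 8 workers
```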
Example 6.2. In what follows you are provided with
sample observations on price and quantity
supplied of a commodity X by a competitive firm.
a) Construct the scatter diagram.
b) What is the linear regression of Yi (quantity supplied) on Xi (price of the commodity X)?
c) Suppose price of the commodity X be 32, what
will be the quantity supplied by the firm?
• Tab. 6.3. Data on price and quantity supplied.
 

• If the price of X is 32, the estimated quantity supplied will be approximately equal to 51 units.
Regression of X on Y
• In the above sub-topic we have explored
regression of Y on X type.
• Sometimes, it is possible and of interest to fit the
regression of X on Y type, i.e., being Y as
independent and X dependent.
• In such cases, the general form of the equation is given by:

Xe = a0 + b0Y

• Where Xe = expected value of X
• a0 – X-intercept
• b0 – slope of the regression
• Applying the principle of least squares as before, the constants a0 & b0 are given as follows.

N.B. The regression line of Y on X type and that of X on Y type coincide (intersect) at (X̄, Ȳ).
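This coincidence at the mean point can be verified numerically; the sketch below (reusing the earlier example data as an assumption) fits both regressions and evaluates each at the other variable's mean:

```python
def fit(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / \
        sum((a - xbar) ** 2 for a in x)
    return ybar - b * xbar, b

X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]
a, b = fit(X, Y)      # regression of Y on X: Ye = a + b*X
a0, b0 = fit(Y, X)    # regression of X on Y: Xe = a0 + b0*Y
print(a + b * 4.0)    # at Xbar = 4 the first line passes through Ybar = 8
print(a0 + b0 * 8.0)  # at Ybar = 8 the second line passes through Xbar = 4
```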
Assumptions of the Classical Linear Regression Model….

7. The model is linear in parameters.


– The classicals assumed that the model should be linear in
the parameters regardless of whether the explanatory and
the dependent variables are linear or not.
• This is because if the parameters are non-linear it is difficult to estimate them, since their values are not known and you are only given data on the dependent and independent variables.
– Example 1: Y = α + βx + u is linear in both the parameters and the variables, so it satisfies the assumption.
– ln Y = α + β ln x + u is linear only in the parameters. Since the classicals worry only about the parameters, the model satisfies the assumption.
• Dear students! Check yourself whether the following models satisfy the above assumption:

ln Yi = α + β² ln Xi + Ui
Yi = α + βXi² + Ui
Assumptions of the Classical Linear Regression Model….
8. Ui is a random real variable
• This means that the value which u may assume in any one
period depends on chance; it may be positive, negative or
zero. Every value has a certain probability of being assumed
by u in any particular instance.
9. The mean value of the random variable (U) in any particular period is zero: E(Ui) = 0

• This means that for each value of x, the random variable (u) may assume various values, some greater than zero and some smaller than zero, but if we considered all the possible positive and negative values of u for any given value of X, they would have an average value equal to zero. In other words, the positive and negative values of u cancel each other.
10. The variance of the random variable(U) is constant in
each period (The assumption of homoscedasticity)

• For all values of X, the u’s will show the same dispersion around their mean. In Fig. 2.c this assumption is denoted by the fact that the values that u can assume lie within the same limits, irrespective of the value of X: for X1, u can assume any value within the range AB; for X2, u can assume any value within the range CD, which is equal to AB, and so on.
Graphically;

• Mathematically:

Var(Ui) = E[Ui − E(Ui)]² = E(Ui²) = σu²   (since E(Ui) = 0)

This constant variance assumption is called the homoscedasticity assumption, and the constant variance itself is called homoscedastic variance.
11. The random variable (U) has a normal distribution
• This means the values of u (for each x) have a bell-shaped symmetrical distribution about their zero mean and constant variance σ², i.e.

Ui ~ N(0, σ²)

• The random terms of different observations are independent. (The assumption of no autocorrelation)
• This means the value which the random term assumed in one period does not depend on the value which it assumed in any other period.
• Algebraically:

Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(ui uj) = 0   for i ≠ j
12. The Xi values are a set of fixed values in the hypothetical process of repeated sampling which underlies the linear regression model.
– This means that, in taking a large number of samples on Y and X, the Xi values are the same in all samples, but the ui values do differ from sample to sample, and so of course do the values of Yi.
13. The explanatory variables are measured without error
– U absorbs the influence of omitted variables and possibly errors of measurement in the Y’s; i.e., we will assume that the regressors are error-free, while the Y values may or may not include errors of measurement.
14. The random variable (U) is independent of
the explanatory variables.
• This means there is no correlation between the random variable and the explanatory variable. If two variables are unrelated, their covariance is zero. Hence Cov(Xi, Ui) = 0.
• Proof:

Cov(Xi, Ui) = E{[Xi − E(Xi)][Ui − E(Ui)]}
= E[(Xi − E(Xi))Ui]   given E(Ui) = 0
= E(XiUi) − E(Xi)E(Ui)
= E(XiUi) = XiE(Ui) = 0   (since the Xi are fixed)
15. The dependent variable is normally distributed, i.e.

Yi ~ N(α + βXi, σ²)

• Proof:
Mean: E(Yi) = E(α + βXi + ui) = α + βXi,   since E(ui) = 0
Variance: Var(Yi) = E[Yi − E(Yi)]² = E[α + βXi + ui − (α + βXi)]² = E(ui²) = σ²   (since E(ui²) = σ²)
∴ Var(Yi) = σ²
• The shape of the distribution of Yi is determined by the shape of the distribution of ui, which is normal by assumption 6. Since α and β, being constants, don’t affect the distribution of Yi, and the values of the explanatory variable xi are a set of fixed values by assumption 5 and therefore don’t affect the shape of the distribution of Yi,

Yi ~ N(α + βxi, σ²)
• Successive values of the dependent variable are independent, i.e. Cov(Yi, Yj) = 0 for i ≠ j.
• Proof:

Cov(Yi, Yj) = E{[Yi − E(Yi)][Yj − E(Yj)]}
= E{[α + βXi + Ui − E(α + βXi + Ui)][α + βXj + Uj − E(α + βXj + Uj)]}
   since Yi = α + βXi + Ui and Yj = α + βXj + Uj
= E[(α + βXi + Ui − α − βXi)(α + βXj + Uj − α − βXj)]   since E(ui) = 0
= E(UiUj) = 0

Therefore, Cov(Yi, Yj) = 0.
PROPERTIES OF OLS ESTIMATORS
• The ideal or optimum properties that the OLS
estimates possess may be summarized by well
known theorem known as the Gauss-Markov
Theorem.
• Statement of the theorem: “Given the assumptions of the classical linear regression model, the OLS estimators, in the class of linear and unbiased estimators, have the minimum variance, i.e. the OLS estimators are BLUE.”
The BLUE Theorem
• i.e. Best, Linear, Unbiased Estimator. An estimator is called BLUE if it is:
• Linear: a linear function of a random variable, such as the dependent variable Y.
• Unbiased: its average or expected value is equal to the true
population parameter.
• Minimum variance: It has a minimum variance in the class
of linear and unbiased estimators. An unbiased estimator
with the least variance is known as an efficient estimator.
• According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties. The detailed proofs of these properties are presented below.
Linearity: (for β̂ and α̂)

• β̂ = Σxiyi / Σxi² = Σxi(Y − Ȳ) / Σxi² = (ΣxiY − ȲΣxi) / Σxi², but

Σxi = Σ(X − X̄) = ΣX − nX̄ = nX̄ − nX̄ = 0

• ⇒ β̂ = ΣxiYi / Σxi²; now let ki = xi / Σxi²   (i = 1, 2, …, n)

⇒ β̂ = ΣkiYi

• ⇒ β̂ = k1Y1 + k2Y2 + k3Y3 + … + knYn
• ⇒ β̂ is linear in Y
• Check yourself question:
• Show that α̂ is linear in Y. Hint: α̂ = Σ(1/n − X̄ki)Yi. Derive this relationship between α̂ and Y.
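The weights ki and the linearity claim can be checked directly; a sketch (using the chapter's earlier example data as an assumption):

```python
def slope_as_weighted_sum(x, y):
    """beta-hat = sum(k_i * Y_i) with k_i = x_i / sum(x_i^2), x_i = X_i - Xbar.
    The k_i depend only on the (fixed) X values, so beta-hat is linear in Y."""
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dev)
    k = [d / sxx for d in dev]
    assert abs(sum(k)) < 1e-12                                    # sum k_i = 0
    assert abs(sum(ki * xi for ki, xi in zip(k, x)) - 1) < 1e-12  # sum k_i X_i = 1
    return sum(ki * yi for ki, yi in zip(k, y))

X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]
print(slope_as_weighted_sum(X, Y))  # 1.75, the same beta-hat as the usual formula
```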
Unbiasedness:
• In our case, α̂ and β̂ are estimators of the true parameters α and β. To show that they are the unbiased estimators of their respective parameters means to prove that E(β̂) = β and E(α̂) = α.

• Proof (1): Prove that β̂ is unbiased, i.e. E(β̂) = β.

• We know that β̂ = ΣkiYi = Σki(α + βXi + Ui) = αΣki + βΣkiXi + ΣkiUi,

but Σki = 0 and ΣkiXi = 1:

Σki = Σxi/Σxi² = Σ(X − X̄)/Σxi² = (ΣX − nX̄)/Σxi² = (nX̄ − nX̄)/Σxi² = 0   ⇒ Σki = 0

ΣkiXi = ΣxiXi/Σxi² = Σ(X − X̄)Xi/Σxi² = (ΣX² − X̄ΣX)/Σxi² = (ΣX² − nX̄²)/(ΣX² − nX̄²) = 1   ⇒ ΣkiXi = 1

• ⇒ β̂ = β + ΣkiUi ⇒ β̂ − β = ΣkiUi ⇒ E(β̂) = β + ΣkiE(Ui), since the ki are fixed,
but E(Ui) = 0, so E(β̂) = β.
Proof (2): Prove that α̂ is unbiased, i.e. E(α̂) = α.

• From the proof of the linearity property, we know that:

α̂ = Σ(1/n − X̄ki)Yi
= Σ(1/n − X̄ki)(α + βXi + Ui),   since Yi = α + βXi + Ui
= α + (β/n)ΣXi + (1/n)ΣUi − αX̄Σki − βX̄ΣkiXi − X̄ΣkiUi
= α + (1/n)ΣUi − X̄ΣkiUi   (using Σki = 0, ΣkiXi = 1, and (1/n)ΣXi = X̄)

⇒ α̂ − α = (1/n)ΣUi − X̄ΣkiUi = Σ(1/n − X̄ki)Ui

⇒ E(α̂) = α + Σ(1/n − X̄ki)E(Ui) = α

• ⇒ α̂ is an unbiased estimator of α.
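Unbiasedness can be illustrated by Monte Carlo under repeated sampling with fixed X values (the true values α = 1, β = 1.75, σ = 1 and the X's are illustrative assumptions):

```python
import random

def ols(x, y):
    """Least-squares (a, b) for Ye = a + b*X."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((p - xbar) * (q - ybar) for p, q in zip(x, y)) / \
        sum((p - xbar) ** 2 for p in x)
    return ybar - b * xbar, b

def mean_estimates(alpha, beta, xs, sigma, reps, seed=42):
    """Draw `reps` samples with the same fixed X's, re-estimate each time,
    and average the estimates; the averages approach alpha and beta."""
    rng = random.Random(seed)
    a_sum = b_sum = 0.0
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        a, b = ols(xs, ys)
        a_sum += a
        b_sum += b
    return a_sum / reps, b_sum / reps

a_bar, b_bar = mean_estimates(1.0, 1.75, [2, 3, 1, 5, 9], 1.0, reps=5000)
print(a_bar, b_bar)  # close to 1.0 and 1.75
```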
Minimum variance of α̂ and β̂

• a. Variance of β̂: from the unbiasedness proof, β̂ − β = ΣkiUi, so

Var(β̂) = E[β̂ − E(β̂)]² = E(β̂ − β)² = E(ΣkiUi)²

= E[k1²U1² + k2²U2² + … + kn²Un² + 2k1k2U1U2 + … + 2kn−1knUn−1Un]
= E(Σki²Ui²) + E(ΣkikjUiUj),   i ≠ j
= Σki²E(Ui²) + 2ΣkikjE(UiUj) = σ²Σki²,   since E(UiUj) = 0

• ki = xi/Σxi², and therefore Σki² = Σxi²/(Σxi²)² = 1/Σxi²

⇒ Var(β̂) = σ²Σki² = σ²/Σxi²
Variance of α̂

Var(α̂) = E[α̂ − E(α̂)]² = E(α̂ − α)²
= E[Σ(1/n − X̄ki)Ui]²
= σ²Σ(1/n − X̄ki)²
= σ²Σ(1/n² − (2/n)X̄ki + X̄²ki²)
= σ²(1/n − (2X̄/n)Σki + X̄²Σki²)
= σ²(1/n + X̄²Σki²),   since Σki = 0
= σ²(1/n + X̄²/Σxi²),   since Σki² = 1/Σxi²

Again,

1/n + X̄²/Σxi² = (Σxi² + nX̄²)/(nΣxi²) = ΣXi²/(nΣxi²)

⇒ Var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²·ΣXi²/(nΣxi²)
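The variance formula can be checked by simulation as well; this sketch compares the sampling variance of β̂ across repeated samples with σ²/Σxi² (all numeric values are illustrative assumptions):

```python
import random

def beta_variance_check(alpha, beta, xs, sigma, reps, seed=7):
    """Return (empirical variance of beta-hat over `reps` samples, sigma^2 / sum(x_i^2))."""
    rng = random.Random(seed)
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    bs = []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        ybar = sum(ys) / n
        bs.append(sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx)
    mean_b = sum(bs) / reps
    empirical = sum((b - mean_b) ** 2 for b in bs) / reps
    return empirical, sigma ** 2 / sxx

emp, theo = beta_variance_check(1.0, 1.75, [2, 3, 1, 5, 9], 1.0, reps=20000)
print(emp, theo)  # both approximately 0.025
```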
The variance of the random variable (Ui)
• You may observe that the variances of the OLS estimates involve σ², which is the population variance of the random disturbance term. But it is difficult to obtain the population data of the disturbance term because of technical and economic reasons. Hence it is difficult to compute σ²; this implies that the variances of the OLS estimates are also difficult to compute. But we can compute these variances if we take the unbiased estimate of σ², which is σ̂u², computed from the sample values of the disturbance term ei, from the expression:

σ̂u² = Σei² / (n − 2)
Show that OLS estimators have minimum
variance
• Minimum variance of Alpha
• Minimum variance of Beta
