CHAPTER 2
If Y = f(X), the terminology and notation for Y and X are as follows:

Y                       X
Explained variable      Explanatory variable
Regressand              Regressor
Response                Stimulus variable
2.1 The Concept of Regression Analysis under the Simple Linear Regression Model
The simplest economic relationship is represented
through a two-variable model (also called the simple
linear regression model) which is given by:
Y = a + bX
where a and b are unknown parameters (also called
regression coefficients) that we estimate using sample
data.
Here Y is the dependent variable and X is the
independent variable.
Example: Suppose the relationship between
expenditure (Y) and income (X) of households
is expressed as:
Y = 0.6X + 120
Here, on the basis of income, we can predict
expenditure. For instance, if the income of a
certain household is 1500 Birr, then the
estimated expenditure will be:
Expenditure = 0.6(1500) + 120 = 1020 Birr
Note that since expenditure is estimated on the
basis of income, expenditure is the dependent
variable and income is the independent
variable.
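To make the prediction step concrete, here is a minimal Python sketch of the deterministic model above (the function name is ours, for illustration):

def predicted_expenditure(income_birr):
    # Deterministic model from the example: expenditure = 0.6*income + 120
    return 0.6 * income_birr + 120.0

print(predicted_expenditure(1500))  # 1020.0 Birr, as computed above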
The purely mathematical model stated above is of limited interest to the econometrician, as it assumes an exact or deterministic relationship between Y and the explanatory variable X. But relationships between economic variables are generally inexact (stochastic). A set of such behavioral equations derived from an economic model is represented by an econometric model.
A simple linear regression model is a relationship between two variables related in a linear form.
A relationship between variables takes two important forms, stochastic and non-stochastic, of which we shall be using the former in econometric analysis.
A relationship between X and Y, characterized as Y = f(X), is said to be deterministic or non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y).
On the other hand, a relationship between X and
Y is said to be stochastic if for a particular
value of X there is a whole probabilistic
distribution of values of Y. In such a case, for
any given value of X, the dependent variable Y
assumes some specific value only with some
probability.
Let’s illustrate the distinction between stochastic and non-stochastic relationships with the help of a supply function.
Assuming that the supply for a certain commodity
depends on its price (other determinants taken to be
constant) and the function being linear, the
relationship can be put as:

Q = f(P) = α + βP    (2.1)
The above relationship between P and Q is such that for a
particular value of P, there is only one corresponding
value of Q. This is, therefore, a deterministic (non-
stochastic) relationship.
This implies that all the variation in Q is due solely to
changes in P, and that there are no other factors affecting
the dependent variable.
If this were true all the points of price-quantity
pairs, if plotted on a two-dimensional plane, would
fall on a straight line.
However, if we gather observations on the quantity
actually supplied in the market at various prices and
we plot them on a diagram we see that they do not
fall on a straight line.
The deviation of the observations from the line may be attributed to several factors.
1. Omission of variables from the function
In economic reality each variable is influenced by a very large number of factors, yet not every factor can be included in the function, because:
a) Some of the factors may not be known.
b) Even if we know them, the factors may not be statistically measurable; for example, psychological factors (tastes, preferences, expectations, etc.) are not measurable.
c) Some factors are random, appearing in an unpredictable way and time, e.g., epidemics, earthquakes, etc.
d) Some factors may be omitted due to their small influence on the dependent variable.
e) Even if all factors are known, the available data may not be adequate to measure all factors influencing the relationship.
2. Random behavior of human beings: human behavior is erratic and may deviate from the normal situation to a certain extent in an unpredictable way.
3. Imperfect specification of the mathematical form of the model
We may wrongly specify the relationship between variables, e.g., fit a linear function to variables that are in fact non-linearly related, and vice versa.
4. Error of aggregation
Data on many economic variables (e.g., consumption, income) are available only in aggregate form; aggregates add together magnitudes referring to individuals whose behavior is dissimilar.
5. Error of measurement
When collecting data we may commit errors of measurement.
6. Sampling error
Consider a model relating consumption (Y) to income (X) of households. The sample we randomly choose to examine the relationship may turn out to be predominantly poor households. In such a case, our estimates of α and β from this sample may not be as good as those from a balanced sample group.
In order to take into account the above sources of error, we introduce into econometric functions a random variable, usually denoted by the letter ‘u’ or ‘ε’, called the error term, random disturbance, or stochastic term of the function, because u is supposed to ‘disturb’ the exact linear relationship which is assumed to exist between X and Y. By introducing this random variable into the function, the model is rendered stochastic, of the form:

Y_i = α + βX_i + u_i    (2.2)
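To see what “stochastic” means in practice, here is a small simulation sketch of model (2.2); the parameter values are illustrative, not from the text:

import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 120.0, 0.6, 25.0    # illustrative parameter values
X = np.linspace(100, 2000, 50)           # fixed values of the regressor
u = rng.normal(0.0, sigma, size=X.size)  # random disturbance with E(u) = 0
Y = alpha + beta * X + u                 # stochastic model (2.2)

# Each observation deviates from the exact line by exactly its u_i:
print(np.allclose(Y - (alpha + beta * X), u))  # True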
This stochastic model is a model in which the
dependent variable is not only determined by the
explanatory variable(s) included in the model but
also by others which are not included in the model.
The true relationship which connects the variables
involved is split into two parts:
a part represented by a line &
a part represented by the random term ‘u’.
The scatter of observations represents the true
relationship between Y and X.
The line represents the exact part of the
relationship, and
the deviation of the observations from the line represents the random component of the relationship.
If there were no errors in the model, we would observe all the points Y′₁, Y′₂, …, Y′ₙ on the line, corresponding to X₁, X₂, …, Xₙ.
Assumptions of the Classical Linear Regression Model
1. The model is linear in parameters.
Example: Y_i = α + βX_i + u_i is linear in both parameters and variables.
2. u_i is a random real variable
This means that the value which ‘ui’ may assume in
any one period depends on chance; it may be
positive, negative or zero.
Every value has a certain probability of being
assumed by ‘ui’ in any particular instance.
3. The mean value of the random variable u in any particular period is zero.
This means that for each value of X, the random variable u may assume various values, some greater than zero and some smaller than zero; but if we consider all the positive and negative values of u for any given value of X, their average is zero. In other words, the positive and negative values of u cancel each other.
Mathematically: E(u_i) = 0    (2.3)
4. The variance of the random variable u is constant in each period (the assumption of homoscedasticity).
For all values of X, the u_i’s show the same dispersion around their mean: Var(u_i) = E(u_i²) = σ² for all i. In Fig. 2.c this assumption is denoted by the fact that the values that u_i can assume lie within the same limits, irrespective of the value of X: for X₁, u can assume any value within the range AB; for X₂, u_i can assume any value within the range CD, which is equal to AB; and so on.
5. The random variable u has a normal distribution, i.e.
u_i ~ N(0, σ²)
6. The assumption of no autocorrelation
The random terms of different observations (u_i, u_j) are independent. This means the value which the random term assumed in one period does not depend on the value which it assumed in any other period.
Algebraically:
Cov(u_i, u_j) = E{[u_i − E(u_i)][u_j − E(u_j)]} = E(u_i u_j) = 0, for i ≠ j
7. The X values are fixed in the hypothetical process of repeated sampling which underlies the linear regression model.
This means that, in taking a large number of samples on Y and X, the X values are the same in all samples, but the u_i values do differ from sample to sample, and so of course do the values of Y_i.
8. The random variable (U) is independent of the
explanatory variables.
This means there is no correlation between the random variable and the explanatory variable. If two variables are unrelated, their covariance is zero, i.e.
Cov(X_i, u_i) = 0
9. The explanatory variables are measured without
error
u_i absorbs the influence of omitted variables and possibly errors of measurement in the Y_i’s; i.e., we will assume that the regressors are error-free, while the Y values may or may not include errors of measurement.
We can now use the above assumptions to derive
the following basic concepts.
A. The dependent variable is normally distributed, i.e. Y_i ~ N(α + βX_i, σ²).
Proof:
Mean: E(Y_i) = E(α + βX_i + u_i) = α + βX_i, since E(u_i) = 0.
Variance: Var(Y_i) = E[Y_i − E(Y_i)]²
= E[α + βX_i + u_i − (α + βX_i)]²
= E(u_i²) = σ².
Hence Var(Y_i) = σ².
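A quick Monte Carlo sketch (again with illustrative values) confirms these two moments numerically:

import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma, x = 2.0, 0.5, 3.0, 10.0   # illustrative values

# Many draws of Y at the same fixed X: Y = alpha + beta*x + u, u ~ N(0, sigma^2)
Y = alpha + beta * x + rng.normal(0.0, sigma, size=1_000_000)

print(Y.mean())  # close to alpha + beta*x = 7.0
print(Y.var())   # close to sigma^2       = 9.0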
The shape of the distribution of Y_i is determined by the shape of the distribution of u_i, which is normal by assumption 5. Since α and β are constants, they do not affect the distribution of Y_i. Furthermore, the values of the explanatory variable X_i are a set of fixed values by assumption 7 and therefore do not affect the shape of the distribution of Y_i. Thus:
Y_i ~ N(α + βX_i, σ²)
B. Successive values of the dependent variable are uncorrelated, since E(u_i u_j) = 0.
Proof:
Cov(Y_i, Y_j) = E{[Y_i − E(Y_i)][Y_j − E(Y_j)]}
= E{[α + βX_i + u_i − (α + βX_i)][α + βX_j + u_j − (α + βX_j)]}
= E(u_i u_j) = 0
Therefore, Cov(Y_i, Y_j) = 0.
2.2 Methods of Estimation for the Simple Linear Regression Model
The objective of regression analysis is to study how the average value of the dependent variable (regressand) varies with the values of the explanatory variables (regressors):

E[Y | X] = f(X)

This function is called the conditional expectation function (CEF), or population regression function (PRF).
The stochastic PRF, written as an econometric model, is used for empirical analysis:

Y_i = E[Y | X_i] + ε_i

The stochastic disturbance term ε_i plays a critical role in estimating the PRF.
The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Hence, we use the stochastic sample regression function (SRF) to estimate the PRF; i.e., we use

Y_i = Ŷ_i + e_i, where Ŷ_i = f(X_i),

to estimate Y_i = E[Y | X_i] + ε_i.
where, using the theoretical relationship between X and Y, Y_i is decomposed into a non-stochastic/systematic component α + βX_i and a random component u_i:

Y_i = α + βX_i + u_i
(dependent variable = regression line + random variable)
[Figure: the population regression function E[Y|X_i] = α + βX_i and a sample regression function SRF: Ŷ = α̂ + β̂X plotted over X₁, …, X₄. For each observation, the disturbance ε_i (vertical deviation from the PRF) and the residual e_i (vertical deviation from the SRF) are not identical: in the diagram, ε₁ < e₁, ε₂ = e₂, ε₃ < e₃, ε₄ > e₄.]
Our sample is only one of a large number of possibilities.
Implication: the SRF line is just one of the possible SRFs, and each SRF line has its own α̂ and β̂ values. Then which of these lines should we choose?
Generally, we look for the SRF which is as close as possible to the (unknown) PRF. We therefore need a rule that makes the SRF as close as possible to the observed data points.
But how can we devise such a rule? Equivalently, how can we choose the best technique to estimate the parameters of interest (α and β)?
Generally, there are three methods of estimation:
method of least squares,
method of moments, and
maximum likelihood estimation.
2.3 The Method of Least Squares (OLS Method)
The most common method for fitting a regression line is the method of least squares, specifically Ordinary Least Squares (OLS).
Reasons to use OLS:
i. The computational procedure of OLS is fairly simple compared to other econometric methods.
ii. The parameters obtained by this method have some optimal properties, i.e., they are BLUE (Best Linear Unbiased Estimators).
What does OLS do?
A line fits a dataset well if the observations are close to it, i.e., if the predicted values obtained using the line are close to the values actually observed.
Meaning, the residuals should be small. Therefore, when assessing the fit of a line, the vertical distances of the points from the line are the only distances that matter.
The OLS method calculates the best-fitting line for a dataset by minimizing the sum of the squares of the vertical deviations from each data point to the line (the Residual Sum of Squares, RSS):

Minimize RSS = Σᵢ₌₁ⁿ e_i²

We could think of minimizing RSS by successively choosing pairs of values for α̂ and β̂ until RSS is made as small as possible.
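The following sketch shows what that trial-and-error idea amounts to, using the data from the worked example later in this section: compute RSS over a grid of candidate (α̂, β̂) pairs and keep the pair with the smallest RSS.

import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

def rss(a, b):
    # Residual sum of squares for the candidate line Y = a + b*X
    e = Y - (a + b * X)
    return np.sum(e ** 2)

# Crude grid search over candidate (a, b) pairs:
grid_a = np.linspace(0, 8, 161)    # step 0.05
grid_b = np.linspace(-1, 2, 121)   # step 0.025
best_rss, best_a, best_b = min((rss(a, b), a, b) for a in grid_a for b in grid_b)
print(round(best_rss, 2), round(best_a, 2), round(best_b, 2))  # 14.65 3.6 0.75

Of course, such a search is crude and slow; calculus delivers the minimizing values directly, as derived next.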
To find the values of α̂ and β̂ that minimize this sum, we differentiate with respect to α̂ and β̂ and set the partial derivatives equal to zero.
Why the sum of the squared residuals? Why not just minimize the sum of the residuals? To prevent negative residuals from cancelling positive ones. If we used Σe_i, all the residuals would receive equal importance no matter how closely or widely scattered the individual observations are around the SRF. If so, the algebraic sum of the e_i’s may be small (even zero) even though the e_i’s are widely scattered about the SRF.
OLS: minimize, with respect to α̂ and β̂,

Σᵢ₌₁ⁿ e_i² = Σ(Y_i − Ŷ_i)² = Σ(Y_i − α̂ − β̂X_i)²

F.O.C. (1): ∂(Σe_i²)/∂α̂ = ∂[Σ(Y_i − α̂ − β̂X_i)²]/∂α̂ = 0
−2Σ(Y_i − α̂ − β̂X_i) = 0  ⟹  Σ(Y_i − α̂ − β̂X_i) = 0
ΣY_i − nα̂ − β̂ΣX_i = 0  ⟹  Ȳ − α̂ − β̂X̄ = 0  ⟹  α̂ = Ȳ − β̂X̄
F.O.C. (2): ∂(Σe_i²)/∂β̂ = ∂[Σ(Y_i − α̂ − β̂X_i)²]/∂β̂ = 0
−2Σ(Y_i − α̂ − β̂X_i)X_i = 0
Σ(Y_i − α̂ − β̂X_i)X_i = 0
ΣY_iX_i − α̂ΣX_i − β̂ΣX_i² = 0
ΣY_iX_i = α̂ΣX_i + β̂ΣX_i²
Solve α̂ = Ȳ − β̂X̄ and ΣY_iX_i = α̂ΣX_i + β̂ΣX_i² (called the normal equations) simultaneously. Substituting the first into the second:
ΣY_iX_i = (Ȳ − β̂X̄)ΣX_i + β̂ΣX_i²
ΣY_iX_i = ȲΣX_i − β̂X̄ΣX_i + β̂ΣX_i²
ΣY_iX_i − ȲΣX_i = β̂(ΣX_i² − X̄ΣX_i)
ΣY_iX_i − nX̄Ȳ = β̂(ΣX_i² − nX̄²), because ΣX_i = nX̄.
Thus:
1. β̂ = (ΣX_iY_i − nX̄Ȳ) / (ΣX_i² − nX̄²)

Alternative expressions for β̂:
2. β̂ = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)² = Σx_iy_i / Σx_i², where x_i = X_i − X̄ and y_i = Y_i − Ȳ
3. β̂ = Cov(X, Y) / Var(X)
4. β̂ = [nΣX_iY_i − (ΣX_i)(ΣY_i)] / [nΣX_i² − (ΣX_i)²]
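These four expressions are algebraically identical. A quick numerical check on randomly generated data (our own sketch, not from the text):

import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(5.0, 2.0, size=200)
Y = 1.0 + 0.8 * X + rng.normal(0.0, 1.0, size=200)

n, Xbar, Ybar = len(X), X.mean(), Y.mean()
x, y = X - Xbar, Y - Ybar

b1 = (np.sum(X * Y) - n * Xbar * Ybar) / (np.sum(X ** 2) - n * Xbar ** 2)
b2 = np.sum(x * y) / np.sum(x ** 2)
b3 = np.cov(X, Y, bias=True)[0, 1] / np.var(X)  # population cov / population var
b4 = (n * np.sum(X * Y) - X.sum() * Y.sum()) / (n * np.sum(X ** 2) - X.sum() ** 2)

print(np.allclose([b1, b2, b3], b4))  # True: all four expressions coincide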
Previously, we came across two normal equations:
1. Σ(Y_i − α̂ − β̂X_i) = 0, which is equivalent to: Σe_i = 0
2. Σ(Y_i − α̂ − β̂X_i)X_i = 0, which is equivalent to: Σe_iX_i = 0
In addition, the mean of the fitted values equals the mean of the actual values, and Ȳ = α̂ + β̂X̄.
These facts imply that the sample regression line Ŷ = α̂ + β̂X passes through the sample mean point (X̄, Ȳ).
Example
1. Assume the following hypothetical weekly data on Y (demand for a normal good) and X (its price) are obtained from a certain market.

i     1   2   3   4   5   6   7   8   9   10
Y_i  11  10  12   6  10   7   9  10  11   10
X_i  10   7  10   5   8   8   6   7   9   10

With X̄ = 8 and Ȳ = 9.6, let x_i = X_i − X̄ and y_i = Y_i − Ȳ:

i      y_i    x_i    y_i²    x_i²   x_iy_i
1      1.4     2     1.96     4      2.8
2      0.4    −1     0.16     1     −0.4
3      2.4     2     5.76     4      4.8
4     −3.6    −3    12.96     9     10.8
5      0.4     0     0.16     0      0.0
6     −2.6     0     6.76     0      0.0
7     −0.6    −2     0.36     4      1.2
8      0.4    −1     0.16     1     −0.4
9      1.4     1     1.96     1      1.4
10     0.4     2     0.16     4      0.8
Σ      0       0    30.4     28     21.0

β̂ = Σx_iy_i / Σx_i² = 21/28 = 0.75
α̂ = Ȳ − β̂X̄ = 9.6 − 0.75(8) = 3.6
The fitted line is Ŷ_i = 3.6 + 0.75X_i, with residuals e_i = Y_i − Ŷ_i:

i      Ŷ_i      e_i      e_i²
1     11.10    −0.10    0.0100
2      8.85     1.15    1.3225
3     11.10     0.90    0.8100
4      7.35    −1.35    1.8225
5      9.60     0.40    0.1600
6      9.60    −2.60    6.7600
7      8.10     0.90    0.8100
8      8.85     1.15    1.3225
9     10.35     0.65    0.4225
10    11.10    −1.10    1.2100
Σ     96        0      14.65

Thus Σe_i = 0, Σe_i² = 14.65, Σŷ_i² = 15.75 and Σy_i² = 30.4.
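A short Python sketch reproducing these computations (the arrays restate the example data):

import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)

x, y = X - X.mean(), Y - Y.mean()
beta = np.sum(x * y) / np.sum(x ** 2)   # 21/28 = 0.75
alpha = Y.mean() - beta * X.mean()      # 9.6 - 0.75*8 = 3.6

Y_hat = alpha + beta * X                # fitted values
e = Y - Y_hat                           # residuals

print(alpha, beta)                      # 3.6 0.75
print(round(np.sum(e ** 2), 2))         # RSS = 14.65
print(round(e.sum(), 10))               # 0.0, the first normal equation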
Assumptions Underlying the Method of Least Squares
To obtain α̂ and β̂ in the model Y_i = α̂ + β̂X_i + e_i, the only assumption we need is that X must take at least two distinct values (and the number of observations must be at least the number of parameters). But the objective in regression analysis is not only to obtain α̂ and β̂ but also to draw inferences about the true parameters α and β. For example, we would like to know how close α̂ and β̂ are to α and β, or how close Ŷ_i is to E[Y | X_i]. To that end, we must also make certain assumptions about the manner in which the Y_i’s are generated.
2.4 Properties of OLS Estimators and the Gauss-Markov Theorem
Given the assumptions of the classical linear regression model, the least-squares estimators possess some ideal or optimum properties.
These statistical properties are extremely important
because they provide criteria for choosing among
alternative estimators.
These properties are contained in the well-known
Gauss–Markov Theorem.
Gauss-Markov Theorem:
“Given the assumptions of the classical linear regression model, the OLS estimators α̂ and β̂ have, in the class of linear and unbiased estimators, the minimum variance; i.e., the OLS estimators are BLUE.”
An estimator is called BLUE if it is:
Linear: a linear function of the random variable Y;
Unbiased: its expected value equals the true parameter value;
Best: it has the minimum variance among all linear unbiased estimators.
Variances of the OLS estimates:

var(β̂) = σ² / Σx_i²

var(α̂) = σ² ΣX_i² / (n Σx_i²)
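With the worked example’s magnitudes (n = 10, Σx_i² = 28, and ΣX_i² = 28 + 10·8² = 668), these formulas give var(β̂) = σ²/28 ≈ 0.0357σ² and var(α̂) = 668σ²/280 ≈ 2.3857σ², numbers that reappear below. A one-line check:

# Variance factors for the worked example (n = 10, sum of x^2 = 28, Xbar = 8)
n, sum_x2, Xbar = 10, 28.0, 8.0
sum_X2 = sum_x2 + n * Xbar ** 2       # 668, since sum X^2 = sum x^2 + n*Xbar^2
print(1 / sum_x2)                     # 0.0357... = var(beta_hat) / sigma^2
print(sum_X2 / (n * sum_x2))          # 2.3857... = var(alpha_hat) / sigma^2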
2.5 Residuals and Goodness of Fit
Decomposing the variation in Y:
One measure of the variation in Y is the sum of its squared deviations around its sample mean, often described as the Total Sum of Squares, TSS. TSS can be decomposed into two parts:
ESS, the ‘explained’ sum of squares, and
RSS, the residual (‘unexplained’) sum of squares:

Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σe_i²
   TSS      =     ESS      +  RSS
Y_i = Ŷ_i + e_i  ⟹  Y_i − Ȳ = (Ŷ_i − Ȳ) + e_i
Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ + e_i)²
Σy_i² = Σ(ŷ_i + e_i)² = Σŷ_i² + Σe_i² + 2Σŷ_ie_i
Since Σŷ_ie_i = 0 (by the normal equations):
Σy_i² = Σŷ_i² + Σe_i², i.e., TSS = ESS + RSS.
Moreover, since ŷ_i = β̂x_i, ESS = Σŷ_i² = Σ(β̂x_i)² = β̂²Σx_i², so that:

R² = ESS/TSS = β̂²Σx_i² / Σy_i²
Coefficient of Determination (R²):

R² = ESS/TSS

Equivalent expressions:
2. R² = β̂² Σx_i² / Σy_i²
3. R² = β̂ Σx_iy_i / Σy_i²   (since ESS = β̂²Σx² = β̂Σxy, using β̂ = Σxy/Σx²)
4. For our example: R² = ESS/TSS = β̂Σxy / Σy² = 15.75/30.4 = 0.5181
5. R² = (Σx_iy_i)² / (Σx_i² · Σy_i²)
6. R² = [cov(X, Y)]² / [var(X) · var(Y)]
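A numerical check, on the example data, that these expressions coincide (our own sketch):

import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
x, y = X - X.mean(), Y - Y.mean()
beta = np.sum(x * y) / np.sum(x ** 2)

r2_2 = beta ** 2 * np.sum(x ** 2) / np.sum(y ** 2)              # expression 2
r2_3 = beta * np.sum(x * y) / np.sum(y ** 2)                    # expression 3
r2_5 = np.sum(x * y) ** 2 / (np.sum(x ** 2) * np.sum(y ** 2))   # expression 5
print(round(r2_2, 4), np.allclose([r2_3, r2_5], r2_2))          # 0.5181 True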
A natural criterion of goodness of fit is the correlation between the actual and fitted values of Y. The least-squares principle also maximizes this. Note:
R² = (r_ŷ,y)² = (r_x,y)²
To sum up …

Σy_i² = Σŷ_i² + Σe_i²   (TSS = ESS + RSS)

R² = ESS/TSS = Σŷ²/Σy², with ŷ = β̂x

RSS = (1 − R²)Σy²

var(β̂) = σ²/Σx_i² = σ²/28 = 0.0357σ²

var(α̂) = σ²(1/n + X̄²/Σx_i²) = σ²(1/10 + 64/28) = 2.3857σ²

But what is σ²?
An unbiased estimator for σ²:

E(RSS) = E(Σe_i²) = (n − 2)σ²

Thus, if we define σ̂² = Σe_i² / (n − 2), then:

E(σ̂²) = (1/(n − 2)) E(Σe_i²) = (1/(n − 2)) (n − 2)σ² = σ²

so σ̂² = Σe_i² / (n − 2) is an unbiased estimator of σ².
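A Monte Carlo sketch of this unbiasedness (true parameter values are illustrative): across many simulated samples, the average of σ̂² approaches σ².

import numpy as np

rng = np.random.default_rng(3)
alpha, beta, sigma = 3.6, 0.75, 2.0                # illustrative true values
X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
n = len(X)

estimates = []
for _ in range(20_000):
    Y = alpha + beta * X + rng.normal(0.0, sigma, size=n)
    x, y = X - X.mean(), Y - Y.mean()
    b = np.sum(x * y) / np.sum(x ** 2)
    e = y - b * x                                  # residuals in deviation form
    estimates.append(np.sum(e ** 2) / (n - 2))     # sigma_hat^2 for this sample

print(np.mean(estimates))  # close to sigma^2 = 4.0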
2.6 Confidence Intervals and Hypothesis Testing in Regression Analysis
Why is the error normality assumption important?
The normality assumption permits us to derive the functional form of the sampling distributions of α̂, β̂ and σ̂². Knowing the form of the sampling distributions enables us to derive feasible test statistics for the OLS coefficient estimators. These feasible test statistics enable us to conduct statistical inference, i.e.,
1) to construct confidence intervals for α, β and σ², and
2) to test hypotheses about them.
u_i ~ N(0, σ²)  ⟹  Y_i ~ N(α + βX_i, σ²)

β̂ ~ N(β, σ²/Σx_i²)   and   α̂ ~ N(α, σ²ΣX_i²/(nΣx_i²))

(β̂ − β) / (σ/√Σx_i²) ~ N(0, 1)

(β̂ − β) / se(β̂) ~ t(n−2), where σ̂ = √(Σe_i²/(n − 2))

se(β̂) = σ̂ / √Σx_i²   and   se(α̂) = σ̂ · √(ΣX_i² / (nΣx_i²))
Let us continue with our earlier example. We have: n = 10, α̂ = 3.6, β̂ = 0.75, R² = 0.5181, and Σe_i² = 14.65.

σ² is estimated by: σ̂² = Σe_i²/(n − 2) = 14.65/8 = 1.83125, so σ̂ = √1.83125 = 1.3532

Thus:
vâr(α̂) = 2.3857(1.83125) = 4.3688, so se(α̂) = √4.3688 = 2.09
vâr(β̂) = 0.0357(1.83125) = 0.0654, so se(β̂) = √0.0654 = 0.256
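The same numbers in code (a sketch continuing the example):

import numpy as np

X = np.array([10, 7, 10, 5, 8, 8, 6, 7, 9, 10], dtype=float)
Y = np.array([11, 10, 12, 6, 10, 7, 9, 10, 11, 10], dtype=float)
n = len(X)

x = X - X.mean()
beta = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
alpha = Y.mean() - beta * X.mean()
e = Y - (alpha + beta * X)

sigma2_hat = np.sum(e ** 2) / (n - 2)                     # 1.83125
se_beta = np.sqrt(sigma2_hat / np.sum(x ** 2))            # 0.256
se_alpha = np.sqrt(sigma2_hat * np.sum(X ** 2) / (n * np.sum(x ** 2)))  # 2.09
print(round(sigma2_hat, 5), round(se_alpha, 2), round(se_beta, 3))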
Hypothesis Testing
Suppose that from a sample of 10 observations we estimate the following sales function:

Result: Ŷ_i = 3.6 + 0.75X_i, where X_i is advertising cost
        (2.09)  (0.256)
(standard errors in parentheses)

1. Test the claim that sales do not depend on advertising expense (at the 5% level of significance) and construct the confidence interval; check by both methods!
2. Test whether the intercept is greater than 3.5.
3. Can you reject the claim that a unit increase in advertising expense raises sales by one unit? If so, at what level of significance?
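A sketch of the computation for question 1, assuming scipy is available for the t distribution:

from scipy import stats

n, beta_hat, se_beta = 10, 0.75, 0.256
t_stat = (beta_hat - 0) / se_beta          # H0: beta = 0 (sales don't depend on X)
t_crit = stats.t.ppf(0.975, df=n - 2)      # two-tailed critical value, 5% level

print(round(t_stat, 2), round(t_crit, 3))  # 2.93 > 2.306, so reject H0
low = beta_hat - t_crit * se_beta
high = beta_hat + t_crit * se_beta
print(round(low, 3), round(high, 3))       # 95% CI (0.160, 1.340) excludes 0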
Exercise
The following intermediate results are obtained from data collected on the quantity supplied (Y) of commodity A and its price (X) for 10 years:

ΣY = 60    ΣY² = 413    Σy² = 53
ΣX = 50    ΣX² = 304    Σx² = 54    Σxy = 53

a. Assuming the relationship Y_i = α + βX_i + U_i, obtain the OLS estimators of α and β.
b. Compute the value of R² (coefficient of determination) and interpret the result.