
CHAPTER ONE: INTRODUCTION

1.1 What is econometrics?

Literally interpreted, econometrics means “economic measurement.” Although measurement is an important part of econometrics, the scope of econometrics is much broader. It is based upon the development of statistical methods for estimating economic relationships, testing economic theories, and evaluating and implementing government and business policy. The most common application of econometrics is the forecasting of such important macroeconomic variables as interest rates, inflation rates, and gross domestic product. While forecasts of economic indicators are highly visible and are often widely published, econometric methods can also be used in economic areas that have nothing to do with macroeconomic forecasting. For example, it is possible to study the effects of political campaign expenditures on voting outcomes.

Econometrics is a combination of economic theory, mathematical economics and statistics, but it is completely distinct from each one of these three branches of science. Economic theory makes statements or hypotheses that are mostly qualitative in nature. For instance, microeconomic theory states that, other things remaining the same, a reduction in the price of a commodity is expected to increase the quantity demanded of that commodity. But the theory itself does not provide any numerical measure of the relationship between the two; that is, it does not tell by how much the quantity will go up or down as a result of a certain change in the price of the commodity. It is the job of the econometrician to provide such numerical statements.

The main concern of mathematical economics is to express economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory. Both economic theory and mathematical economics state the same relationships: economic theory uses verbal exposition while mathematical economics employs mathematical symbolism. Neither of them allows for random elements which might affect the relationship and make it stochastic. Furthermore, they do not provide numerical values for the coefficients of the relationships. Although econometrics presupposes the expression of economic relationships in mathematical form, unlike mathematical economics it does not assume that economic relationships are exact (deterministic).

Economic statistics is mainly concerned with collecting, processing, and presenting economic
data in the form of charts and tables. It is mainly a descriptive aspect of economics. It does not
provide explanations of the development of the various variables and it does not provide
measurement of the parameters of economic relationships.

In a nutshell, econometrics is the application of statistical and mathematical techniques to the analysis of economic data with the purpose of verifying or refuting economic theories. In this respect, econometrics is distinguished from mathematical economics, which consists of the application of mathematics only, and whose derived theories need not necessarily have empirical content.

1.2 Models, Economic models and Econometric models

A model is a simplified representation of a real world process. For instance, saying the quantity
demanded of oranges depends on the prices of oranges is a simplified representation because
there are a lot of other variables that may determine the demand for oranges (including income of
consumers, change in the price of its substitutes/complements, etc.). In fact, there is no end to
this stream of other variables.

In practice, we include in our model all the variables we think are relevant for our purpose and
dump the rest of the variables in a basket called ‘disturbance.’ This brings us to the distinction
between an economic model and an econometric model.

An economic model is a set of assumptions that approximately describes the behavior of an economy (or a sector of an economy). An econometric model, on the other hand, consists of:

1. A set of behavioral equations derived from the economic model; these equations involve some observed variables and some “disturbances”.
2. A statement of whether there are errors of observation in the observed variables.
3. A specification of the probability distribution of the “disturbances” (and errors of measurement).

For instance, if we consider the simplest demand model where quantity demanded (q) depends
on price (p), then the econometric model usually contains:

i) Behavioral equation

  q = β0 + β1p + u, where u is a disturbance term.

ii) Specification of the probability distribution of u

E(u/p) = 0, and the values of u for the different observations are independently and normally distributed with mean zero and variance σ². Based on these specifications, one proceeds to test empirically the law of demand, that is, the hypothesis that β1 < 0.

1.3 Goals of Econometrics

The three main goals of econometrics include:

A. Analysis: Testing Economic Theory

Economists formulated the basic principles of the functioning of the economic system using verbal exposition and applying a deductive procedure. Economic theories thus developed at an abstract level were not tested against economic reality. Econometrics aims primarily at the verification of economic theories.

B. Policy-Making

In many cases, we apply the various econometric techniques in order to obtain reliable estimates
of the individual coefficients of the economic relationships from which we may evaluate
elasticities or other parameters of economic theory (multipliers, technical coefficients of
production, marginal costs, marginal revenues, etc.) The knowledge of the numerical value of
these coefficients is very important for the decisions of firms as well as for the formulation of the
economic policy of the government. It helps to compare the effects of alternative policy
decisions.

C. Forecasting
In formulating policy decisions, it is essential to be able to forecast the value of the economic
magnitudes. Such forecasts will enable the policy-maker to judge whether it is necessary to take
any measures in order to influence the relevant economic variables.

1.4 Methodology of Econometric Research

In any econometric research we may distinguish four basic stages:

A. Specification of the model



The first and most important step the econometrician has to take in studying any relationship between variables is to express this relationship in mathematical form, that is, to specify the model with which the economic phenomenon will be explored empirically. This is called the specification of the model, or the formulation of the maintained hypothesis. It involves the determination of:

i) The dependent and explanatory variables which will be included in the model. The
econometrician should be able to make a list of the variables that might influence the
dependent variable based on:

• General economic theories;
• Previous studies in the particular field; and
• Information about the individual conditions in the particular case; the actual behavior of the economic agents may indicate the general factors that affect the dependent variable.

ii) The a priori theoretical expectations about the sign and the size of the parameters of the function. These a priori definitions will be the theoretical criteria on the basis of which the results of the estimation of the model will be evaluated.

Example: Consider a simple Keynesian consumption function

C = β0 + β1Y + U, where C = consumption expenditure and Y = level of income.

In this function, the coefficient β1 is the marginal propensity to consume (MPC) and
should be positive with a value less than unity (0<β1<1). The constant intercept, β0 of
the function is expected to be positive.

iii) The mathematical form of the model (number of equations, linear or non-linear form
of these equations, etc).

In general, the specification of the econometric model will be based on economic theory and on
any available information relating to the phenomenon being studied.

B. Estimation of the Model

Having specified the econometric model, the next task of the econometrician is to obtain estimates (numerical values) of the parameters of the model from the available data. In the above Keynesian consumption function, if β̂1 = 0.8, this value provides a numerical estimate of the marginal propensity to consume (MPC). It also supports Keynes’ hypothesis that the MPC is less than 1.
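As a rough illustration of this estimation step, the sketch below fits a consumption function by OLS on made-up consumption-income data (the numbers, and the resulting estimate, are illustrative assumptions, not values from the text):

```python
import numpy as np

# Illustrative consumption (C) and income (Y) data for 10 periods
income = np.array([100, 120, 140, 160, 180, 200, 220, 240, 260, 280], dtype=float)
cons = np.array([90, 105, 118, 134, 150, 165, 178, 192, 208, 221], dtype=float)

# OLS in deviation form: beta1_hat = sum(x*y) / sum(x^2)
x = income - income.mean()
y = cons - cons.mean()
beta1 = (x * y).sum() / (x ** 2).sum()        # estimated MPC
beta0 = cons.mean() - beta1 * income.mean()

print(f"C_hat = {beta0:.2f} + {beta1:.3f} Y")  # the slope should lie in (0, 1)
```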

C. Evaluation of Estimates

After the estimation of the model, the econometrician must proceed with the evaluation of the results, that is, with the determination of the reliability of these results. The evaluation consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory. Various criteria may be used:

- Economic a priori criteria: These are determined by the principles of economic theory and refer to the sign and the size of the parameters of economic relationships. In econometric jargon, we say that economic theory imposes restrictions on the signs and values of the parameters of economic relationships.
- Statistical criteria: These are determined by statistical theory and aim at the evaluation of the statistical reliability of the estimates of the parameters of the model. The most widely used statistical criteria are the correlation coefficient and the standard deviation (or standard error) of the estimates. Note that the statistical criteria are secondary to the a priori theoretical criteria: the estimates of the parameters should in general be rejected if they have the wrong sign or size, even if they pass the statistical criteria.
- Econometric criteria: These are determined by econometric theory and aim at investigating whether the assumptions of the econometric method employed are satisfied in any particular case. When the assumptions of an econometric technique are not satisfied, it is customary to re-specify the model.

D. Evaluation of the forecasting power of the estimated model

The final stage of any econometric research is concerned with the evaluation of the forecasting
validity of the model. Estimates are useful because they help in decision-making. A model, after
the estimation of its parameters, can be used in forecasting the values of economic variables. The
econometrician must ascertain how good the forecasts are expected to be. In other words, he
must test the forecasting power of the model.

It is conceivably possible that the model is economically meaningful and statistically and
econometrically correct for the sample period for which the model has been estimated, yet it may
not be suitable for forecasting due to, for example, rapid change in the structural parameters of
the relationship in the real world.

Therefore, the final stage of any applied econometric research is the investigation of the stability
of the estimates, their sensitivity to changes in the size of the sample. One way of establishing
the forecasting power of a model is to use the estimates of the model for a period not included in
the sample. The estimated value (forecast value) is compared with the actual (realized)
magnitude of the relevant dependent variable. Usually, there will be a difference between the actual and the forecast value of the variable, which is tested with the aim of establishing whether
it is (statistically) significant. If after conducting the relevant test of significance, we find that the
difference between the realized value of the dependent variable and that estimated from the
model is statistically significant, we conclude that the forecasting power of the model, its extra -
sample performance, is poor.

Another way of establishing the stability of the estimates and the performance of the model
outside the sample of the data, from which it has been estimated, is to re-estimate the function
with an expanded sample, that is, a sample including additional observations. The original
estimates will normally differ from the new estimates. The difference is tested for statistical
significance with appropriate methods.

Reasons for a model’s poor forecasting performance

a) The values of the explanatory variables used in the forecast may not be accurate
b) The estimates of the coefficients (the β̂’s) may be poor, due to deficiencies of the sample data.
c) The estimates are ‘good’ for the period of the sample, but the structural background
conditions of the model may have changed from the period that was used as the basis for
the estimation of the model, and therefore the old estimates are not ‘good’ for forecasting.
The whole model needs re-estimation before it can be used for prediction.

Example: Suppose that we estimate the demand function for a given commodity with a single
equation model using time-series data for the period 1950 – 68 as follows

Q̂t = 100 + 5Yt – 30Pt

This equation is then used for ‘forecasting’ the demand of the commodity in the year 1970, a
period outside the sample data.

Given Y1970 = 1000 and P1970 = 5

Q̂t = 100 + 5(1000) – 30(5) = 4950 units.

If the actual demand for this commodity in 1970 is 4500, there is a difference of 450 between the value estimated from the model and the actual market demand for the product. The difference can be
tested for significance by various methods. If it is found significant, we try to find out what are
the sources of the error in the forecast, in order to improve the forecasting power of our model.
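The forecast arithmetic of this example can be reproduced directly; a minimal sketch using the equation and values given above:

```python
# Demand forecast from the estimated equation Q_hat = 100 + 5*Y - 30*P
Y_1970, P_1970 = 1000, 5
Q_hat = 100 + 5 * Y_1970 - 30 * P_1970   # = 4950 units

Q_actual = 4500                          # realized demand in 1970
forecast_error = Q_hat - Q_actual        # = 450, to be tested for significance
print(Q_hat, forecast_error)
```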

1.5 The structure of economic data


Economic data sets come in a variety of types. The most important data structures encountered in
applied work are the following:

• Cross-sectional data: consists of a sample of individuals, households, firms, cities, states, countries, or a variety of other units, taken at a given point in time. The population census conducted by the CSA is a good example.

• Time series data: consists of observations on a variable or several variables over time. Examples of time series data include stock prices, money supply, the consumer price index, gross domestic product, annual homicide rates, and automobile sales figures.

• Pooled cross section data: have both cross-sectional and time series features. A pooled cross section is analyzed much like a standard cross section, except that we often need to account for secular differences in the variables across time.

• Panel data (or longitudinal data): This is a special type of pooled data in which the same cross-sectional units (say, individuals, firms or countries) are surveyed over time. It consists of a time series for each cross-sectional member in the data set. Hence, the key feature of panel data that distinguishes it from a pooled cross section is the fact that the same cross-sectional units are followed over a given time period.

(Some subsections are omitted here.)


2.5 Properties of Least Squares Estimators: The Gauss-Markov Theorem

Given the assumptions of the classical linear regression model, the least squares estimates
possess some ideal or optimum properties. These properties are contained in the well-known
Gauss-Markov theorem.

Gauss–Markov Theorem: Given the assumptions of the classical linear regression model, the
least-squares estimators, in the class of unbiased linear estimators, have minimum variance, that
is, they are BLUE.
i) Linear Estimator
An estimator is linear if it is a linear function of a random variable, such as the dependent
variable Y in the regression model. In other words, an estimator is linear if it is a linear function
of the sample observations.

Recall ˆ1 =
x y
i i

x 2
i

Expanding this we obtain,

=  x (Y  Y )
i i

x 2
i

=  xY  Y  x
i i

x 2
i

=
xY i i
because xi = (Xi- X ) = 0
x 2
i

Thus,

xi
̂1   k i Yi where ki =
 xi2
Assuming the values of X are fixed constants from sample to sample, we can rewrite the above
as

ˆ1 =  k iYi = k1Y1  k2Y2  ...  knYn  f (Y )

8
Hence, the estimate ˆ1 is a linear function of the Y’s, a linear combination of the values of the
dependent variable.

Similarly, given β̂0 = Ȳ − β̂1X̄ and substituting β̂1 = ΣkiYi, we obtain

  β̂0 = Ȳ − X̄ΣkiYi
      = ΣYi/n − X̄ΣkiYi
      = Σ(1/n − X̄ki)Yi

With the assumption that X̄ and ki are fixed constants from sample to sample, it can be noticed that β̂0 depends only on the values of Y. It is a linear function of the sample values of Y.

ii) Unbiased estimator

An estimator is said to be unbiased if its average or expected value, E(β̂i), is equal to the true value, βi. In other words, an estimator is unbiased if its bias is zero. The bias of an estimator is defined as the difference between its expected value and the true parameter, that is,

  Bias = E(β̂i) − βi

If β̂i is an unbiased estimator of βi, then E(β̂i) − βi = 0, which in turn implies E(β̂i) = βi.

Recall: β̂1 = ΣkiYi, where ki = xi/Σxi².

Substituting the PRF Yi = β0 + β1Xi + Ui into this expression, we obtain

  β̂1 = Σki(β0 + β1Xi + Ui)
      = β0Σki + β1ΣkiXi + ΣkiUi
      = β1 + ΣkiUi

This is because Σki = 0 and ΣkiXi = 1 (see Koutsoyiannis for the proof).

Now taking the expectation (E) of the above result on both sides, we obtain

  E(β̂1) = β1 + ΣkiE(Ui) = β1,   since E(Ui) = 0

Therefore, β̂1 is an unbiased estimator of β1.

Similarly, it can be proved that E(β̂0) = β0.

Recall: β̂0 = Σ(1/n − X̄ki)Yi

Taking expected values,

  E(β̂0) = Σ(1/n − X̄ki)E(Yi)

Given that n, X̄ and ki are constant from sample to sample, and E(Yi) = β0 + β1Xi,

  E(β̂0) = Σ(1/n − X̄ki)(β0 + β1Xi)
        = β0 + β1ΣXi/n − β0X̄Σki − β1X̄ΣkiXi,   where Σki = 0 and ΣkiXi = 1
        = β0 + β1X̄ − β1X̄

Therefore, E(β̂0) = β0.

iii) Minimum variance estimator (or best estimator)

An estimator is best when it has the smallest variance as compared with any other estimator obtained from other econometric methods. Symbolically, β̂ is best if

  E[β̂ − E(β̂)]² < E[β̃ − E(β̃)]²

or, more formally,

  Var(β̂) < Var(β̃)

where β̃ is any other linear unbiased estimator. An unbiased estimator with the least variance is known as an efficient estimator. According to the Gauss-Markov theorem, the least squares estimates are best (have the smallest variance) as compared with any other linear unbiased estimator. This property is the main reason for the popularity of the OLS method. It should be stressed that the OLS estimates have the least variance within the class of linear unbiased estimators; it may well be that other nonlinear or biased estimators have a smaller variance.

(See Gujarati for the proof of the minimum variance property of the least-squares estimator.)
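A small Monte Carlo sketch of the unbiasedness part of this theorem: with the X’s held fixed, OLS slopes are computed over many simulated samples, and their average settles near the true β1 (all simulation settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0_true, beta1_true, sigma = 2.0, 0.8, 1.5
X = np.linspace(1, 20, 30)        # X values held fixed from sample to sample
x = X - X.mean()

slopes = []
for _ in range(5000):
    u = rng.normal(0.0, sigma, size=X.size)   # disturbances with E(u) = 0
    Y = beta0_true + beta1_true * X + u
    slopes.append((x * (Y - Y.mean())).sum() / (x ** 2).sum())

# The average of the OLS slopes is close to the true 0.8 (unbiasedness)
print(np.mean(slopes))
```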

2.6 Precision or Standard Errors of Least Squares Estimates
Recall: β̂1 = β1 + ΣkiUi, so that E(β̂1) = β1.

By definition,

  Var(β̂1) = E[β̂1 − E(β̂1)]² = E[β̂1 − β1]²
          = E[Σkiui]²
          = E(k1²u1² + k2²u2² + … + kn²un² + 2k1k2u1u2 + … + 2kn−1knun−1un)

Recall that Var(ui) = E[ui − E(ui)]² = σ², which is equal to E(ui²) because E(ui) = 0. Furthermore, E(uiuj) = 0 for i ≠ j. It then follows that

  Var(β̂1) = σ²Σki²,   where Σki² = 1/Σxi²

so that

  Var(β̂1) = σ²/Σxi²

Thus, the standard error (s.e.) of β̂1 is given by

  s.e.(β̂1) = σ/√(Σxi²)

Recall that β̂0 = Ȳ − β̂1X̄, that is,

  β̂0 = Σ(1/n − X̄ki)Yi

Then

  Var(β̂0) = Var[Σ(1/n − X̄ki)Yi]
          = Σ(1/n − X̄ki)² Var(Yi)
          = σ²Σ(1/n² − 2X̄ki/n + X̄²ki²),   because Var(Yi) = σ²

Since Σki = 0 and Σki² = 1/Σxi², we obtain

  Var(β̂0) = σ²(1/n + X̄²/Σxi²)

But recall that

  1/n + X̄²/Σxi² = (Σxi² + nX̄²)/(nΣxi²) = ΣXi²/(nΣxi²),   since Σxi² = ΣXi² − nX̄²

Hence

  Var(β̂0) = σ²ΣXi²/(nΣxi²)

and the standard error of β̂0 is given by

  s.e.(β̂0) = √[σ²ΣXi²/(nΣxi²)]

Note that the formulas for the variances of β̂0 and β̂1 involve the variance of the random term U, i.e. σ². However, the true variance of Ui cannot be computed since the values of Ui are not observable. But we may obtain an unbiased estimate of σ² from the expression

  σ̂² = ΣÛi²/(n − k)

where k (which is 2 in this case) stands for the number of parameters, and hence n − k represents the degrees of freedom.
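A sketch of these formulas in code, computing σ̂² with n − 2 degrees of freedom and then the two standard errors (the data are made up for illustration):

```python
import numpy as np

# Illustrative sample
X = np.array([2, 4, 6, 8, 10, 12, 14, 16], dtype=float)
Y = np.array([5, 8, 12, 15, 21, 24, 27, 32], dtype=float)
n = X.size

x = X - X.mean()
beta1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
beta0 = Y.mean() - beta1 * X.mean()

resid = Y - beta0 - beta1 * X
sigma2_hat = (resid ** 2).sum() / (n - 2)   # unbiased estimate of sigma^2

se_beta1 = np.sqrt(sigma2_hat / (x ** 2).sum())
se_beta0 = np.sqrt(sigma2_hat * (X ** 2).sum() / (n * (x ** 2).sum()))
print(se_beta0, se_beta1)
```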

2.7 The distribution of the dependent variable

Recall: Yi = β0 + β1Xi + ui

Then E(Yi) = E(β0 + β1Xi + ui) = E(β0 + β1Xi), since E(ui) = 0.

Given that β0 and β1 are parameters, and assuming the Xi’s are a set of fixed values in repeated samples, we have:

  E(Yi) = β0 + β1Xi

Moreover, by definition,

  Var(Yi) = E[Yi − E(Yi)]²
         = E(β0 + β1Xi + ui − β0 − β1Xi)²
         = E(ui²) = σ²

Therefore, the shape of the distribution of Y is determined by the shape of the distribution of ui, which is normal.

2.8 Confidence Interval

Note that the true population parameter is always unknown. In order to define how close to the estimate the true parameter lies, we must construct confidence intervals for the true parameter. In other words, we must establish limiting values around the estimate within which the true parameter is expected to lie with a certain ‘degree of confidence’. Usually the 95% confidence level is chosen. This means that in repeated sampling, the confidence limits, computed from the sample, would include the true population parameter in 95 percent of the cases. In the other 5 percent of the cases the population parameter will fall outside the confidence limits. The confidence interval can be constructed using the standard normal distribution or the t-distribution.

i) Confidence Interval from the Standard Normal Distribution (Z-distribution)

The Z-distribution may be employed either if we know the true standard deviation (of the
population) or when we have a large sample (n > 30). This is because, for large samples, the
sample standard deviation is a reasonably good estimate of the unknown population standard
deviation.

The Z-statistic for the regression parameters (i.e., the β̂i) is given by

  Z = (β̂i − βi) / s.e.(β̂i)

where s.e. = standard error.

Our first task is to choose a confidence coefficient, say 95 percent. We next look at the standard normal table and find that the probability of the value of Z lying between −1.96 and 1.96 is 0.95. This may be written as follows:

  P(−1.96 < Z < 1.96) = 0.95

  P(−1.96 < (β̂i − βi)/s.e.(β̂i) < 1.96) = 0.95

Rearranging this result we obtain

  P[β̂i − 1.96 s.e.(β̂i) < βi < β̂i + 1.96 s.e.(β̂i)] = 0.95

Thus, the 95 percent confidence interval for βi is

  β̂i − 1.96 s.e.(β̂i) < βi < β̂i + 1.96 s.e.(β̂i)

or βi = β̂i ± 1.96 s.e.(β̂i)

ii) Confidence interval from the student’s t-Distribution

The procedure for constructing a confidence interval with the t-distribution is similar to the one
outlined earlier with the main difference that in this case we must take into account the degrees
of freedom.

  t = (β̂i − βi) / s.e.(β̂i),   with (n − k) degrees of freedom

In this regard, if we choose the 95% confidence level, we can find from the t-table the value t0.025 with (n − k) degrees of freedom. This implies that the probability of t lying between −t0.025 and t0.025 is 0.95 (with n − k degrees of freedom). Consequently, we may write:

  P(−t0.025 < t < t0.025) = 0.95

Substituting t = (β̂i − βi)/s.e.(β̂i) into the above expression, we find

  P(−t0.025 < (β̂i − βi)/s.e.(β̂i) < t0.025) = 0.95

Rearranging this we obtain

  P[β̂i − t0.025 s.e.(β̂i) < βi < β̂i + t0.025 s.e.(β̂i)] = 0.95

Thus, the 95 percent confidence interval for βi, when we use a small sample for its estimation, is

  βi = β̂i ± t0.025 s.e.(β̂i),   with (n − k) degrees of freedom

Example: Given the following regression from a sample of 20 observations:

  Ŷi = 128.5 + 2.88Xi
       (38.2)  (0.85)

where the values in parentheses are standard errors, construct the 95% confidence intervals for the intercept and slope.
the intercept and slope.

Solution:
Note that n = 20 and k (the number of parameters) = 2.
From the t-table, the value of t0.025 for (n − k = 18) degrees of freedom is 2.10, and s.e.(β̂0) = 38.2, s.e.(β̂1) = 0.85.
Thus, the 95% confidence interval for the intercept is

  128.5 ± (2.10)(38.2)

  48.3 < β0 < 208.7

Interpretation: Given the confidence coefficient of 95%, in the long run, in 95 out of 100 cases intervals like (48.3, 208.7) will contain the true β0.

Similarly, the 95% confidence interval for the slope β1 is given by

  β̂1 ± t0.025 s.e.(β̂1)

  = 2.88 ± (2.10)(0.85)

  1.1 < β1 < 4.67

Interpretation: Given the confidence coefficient of 95%, in the long run, in 95 out of 100 cases intervals like (1.1, 4.67) will contain the true β1.
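The interval arithmetic of this example can be checked in code; the critical value t0.025 with 18 degrees of freedom is obtained from scipy rather than a printed table:

```python
from scipy import stats

n, k = 20, 2
b0, se_b0 = 128.5, 38.2
b1, se_b1 = 2.88, 0.85

t_crit = stats.t.ppf(0.975, df=n - k)   # about 2.10 for 18 df

ci_b0 = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)   # roughly (48.3, 208.7)
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # roughly (1.1, 4.67)
print(ci_b0, ci_b1)
```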

2.9 Testing the Significance of the Parameter Estimates

In addition to r², the reliability of the estimates (β̂0, β̂1) should be tested; that is, we must see whether the estimates are statistically reliable. Since β̂0 and β̂1 are sample estimates of the parameters β0 and β1, the significance of the parameter estimates should be examined. Note that, given the assumption of a normally distributed error term, the distribution of the estimates β̂0 and β̂1 is also normal. That is, β̂i ~ N[E(β̂i), Var(β̂i)].

More formally,

  β̂0 ~ N(β0, σ²ΣXi²/(nΣxi²))

and

  β̂1 ~ N(β1, σ²/Σxi²)

Frequently, we test the null hypothesis

  H0: βi = 0

against the alternative hypothesis

  H1: βi ≠ 0

This is a two-tailed (or two-sided) hypothesis. Very often such a two-sided alternative hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about the direction in which the alternative hypothesis should move from the null hypothesis.

Sometimes, we may have a strong a priori or theoretical expectation (or expectations based on some previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather than two-sided.

For instance, in a consumption-income function C = β0 + β1Y, one could postulate that:

  H0: β1 ≤ 0.3
  H1: β1 > 0.3

That is, perhaps economic theory or prior empirical work suggests that the marginal propensity to consume (β1) is greater than 0.3.

A) The Z-test of the least squares estimates

Recall that the Z-test is applicable only if

a) the population variance is known, or


b) the population variance is unknown, and provided that the sample size is sufficiently
large (n > 30).

In econometric applications, the population variance of Y is unknown. However, if we have a large sample (n > 30), we may still use the standard normal distribution and perform the Z-test:

  Z = (β̂i − βi) / s.e.(β̂i)

If these conditions cannot be fulfilled, we apply the student’s t-test.

B) The Student’s t-Test

  t = (β̂i − βi)/√Var(β̂i) = (β̂i − βi)/s.e.(β̂i)

follows the t-distribution with n − k degrees of freedom, where

  β̂i = least squares estimate of βi
  βi = hypothesized value of βi
  Var(β̂i) = estimated variance of β̂i (from the regression)
  n = sample size
  k = total number of estimated parameters

The customary form of the null hypothesis is

  H0: βi = 0
  H1: βi ≠ 0

In this case the t-statistic reduces to

  t* = (β̂i − 0)/s.e.(β̂i) = β̂i/s.e.(β̂i)

where t* refers to the calculated (estimated) t value. This value is compared with the theoretical (table) values of t that define the critical region in a two-tailed test (for the above case), with n − k degrees of freedom. Recall that the critical region depends on the chosen level of significance (i.e., the value of α).

Figure 2.1: Acceptance and rejection regions. The rejection regions lie in the two tails beyond −tα/2 and tα/2, with the acceptance region between them.

Note that the critical values ±tα/2 are replaced by tα or −tα if the test is a one-tailed test.

In the language of significance tests, a statistic is said to be statistically significant if the value of
the test statistic lies in the critical region. In this case the null hypothesis is rejected. By the same
token, a test is said to be statistically insignificant if the value of the test statistic lies in the
acceptance region. In this situation, the null hypothesis is not rejected.

Example: Suppose that from a sample of size n = 20, we estimate the following consumption function:

  Ĉ = 100 + 0.70Y
      (75.5)  (0.21)

where the figures in brackets are the standard errors of the coefficients. Are the estimates significant?

Solution:

  H0: β0 = 0        and        H0: β1 = 0
  H1: β0 ≠ 0                   H1: β1 ≠ 0

Since n < 30, we use the t-test.

For β0,

  t* = β̂0/s.e.(β̂0) = 100/75.5 = 1.32

and for β1,

  t* = β̂1/s.e.(β̂1) = 0.70/0.21 = 3.3

The critical value of t for (n − k =) 18 degrees of freedom is t0.025 = 2.10.

For β0, since the calculated value (1.32) is less than the table value (2.10), we cannot reject H0; the estimate β̂0 is statistically insignificant.

But for β1, since the calculated value (3.3) is greater than the table value (2.10), we reject H0, which implies that the estimate of β1 is significant: income has a statistically significant effect on consumption.
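The same two t-tests in code, using the reported coefficients and standard errors:

```python
from scipy import stats

n, k = 20, 2
t_crit = stats.t.ppf(0.975, df=n - k)   # 2.10 for 18 df

for name, coef, se in [("beta0", 100.0, 75.5), ("beta1", 0.70, 0.21)]:
    t_star = coef / se                   # t* under H0: beta = 0
    verdict = "reject H0" if abs(t_star) > t_crit else "fail to reject H0"
    print(name, round(t_star, 2), verdict)
```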

In conclusion, note that if a researcher obtains a high r² value and the estimates have low standard errors, then the result is good. In practice, however, such an ideal situation is rare. Rather, we may have low r² values and low standard errors, or high r² values but high standard errors. There is no agreement among econometricians on which case is preferable; the main issue is whether to aim for a high r² or for low standard errors of the parameter estimates.

In general, r2 is more important if the model is to be used for forecasting. Standard error
becomes more important when the purpose of the exercise is the explanation or analysis of
economic phenomena and the estimation of reliable values of the economic relationship.

2.10 Prediction in the least squares model

One of the major goals of econometric analysis is prediction. Consider the following
consumption function:

  Y = β0 + β1X + U

That is, consumption expenditure (Y) is a function of income (X). One of the uses of regression
result is “prediction” or “forecasting” the future value of Y corresponding to given X. Suppose
on the basis of a sample we obtained the following sample regression.

Yˆi = 24.45 + 0.509Xi

where Ŷi is the estimator of the true E(Yi) corresponding to a given X. Note that there are two kinds of predictions in this regard:

i) Prediction of the conditional mean value of Y (mean prediction) and


ii) Prediction of an individual Y value corresponding to X0 (individual prediction)

a) Mean prediction
Suppose that we are interested in the prediction of the conditional mean of Y corresponding to a
chosen X, say X0. Assume that X0 = 100 and we want to predict E(Y/X0 = 100).

  Ŷ0 = β̂0 + β̂1X0 = 24.45 + 0.509(100) = 75.36

where Ŷ0 = estimator of E(Y/X0).

Note that since Ŷ0 is an estimator, it is likely to differ from its true value. The difference between the two values gives some idea of the prediction or forecast error. To see this, we need the mean and variance of Ŷ0, which are given by:

  E(Ŷ0) = E(β̂0 + β̂1X0) = β0 + β1X0

  Var(Ŷ0) = σ²[1/n + (X0 − X̄)²/Σxi²]

Replacing the unknown σ² by its unbiased estimator σ̂², the above expression can be rewritten as

  Var(Ŷ0) = σ̂²[1/n + (X0 − X̄)²/Σxi²]

where σ̂² = RSS/(n − k) = ΣÛi²/(n − k).

Note that the standard error is given by s.e.(Ŷ0) = √Var(Ŷ0).

What we can infer from these results is that the variance (and the standard error) increases the further the value of X0 is from X̄. Therefore,

  t = [Ŷ0 − (β0 + β1X0)] / s.e.(Ŷ0)

follows the t-distribution with n − 2 degrees of freedom.

The t-distribution can therefore be used to derive confidence intervals for the true E(Y0/X0) and to test hypotheses about it in the usual manner. The confidence interval is

  P[β̂0 + β̂1X0 − tα/2 s.e.(Ŷ0) ≤ β0 + β1X0 ≤ β̂0 + β̂1X0 + tα/2 s.e.(Ŷ0)] = 1 − α

that is, Ŷ0 ± tα/2 s.e.(Ŷ0).

Now, suppose Var(Ŷ0) = 10.4759 and n = 10. We can construct the 95% confidence interval for the true E(Y/X0) = β0 + β1X0.

The table value t0.025 for 8 degrees of freedom is 2.306. Moreover, recall that we obtained Ŷ0 = 75.36. Thus, the 95% confidence interval is given by

  75.36 − 2.306√10.4759 ≤ E(Y/X0 = 100) ≤ 75.36 + 2.306√10.4759

  67.90 ≤ E(Y/X0 = 100) ≤ 82.84

Thus, given X0 = 100, in repeated sampling, 95 out of 100 intervals like (67.90, 82.84) will include the true mean value of Y.

b) Individual Prediction

If our interest lies in predicting an individual Y value, Y0, corresponding to a given X value, say X0, then the forecast is called an individual prediction.

Again consider the previous example, given as

  Ŷ0 = 24.45 + 0.509X0

As computed in (a) above, the point estimate of Ŷ0 for a given value of X0, say 100, is

  Ŷ0 = 24.45 + 0.509(100) = 75.36

In order to see the reliability of this result, we have to obtain the prediction error, which is given by the actual value less the predicted value:

  Y0 − Ŷ0 = (β0 + β1X0 + U0) − (β̂0 + β̂1X0)
          = (β0 − β̂0) + (β1 − β̂1)X0 + U0

Note that E(Y0 − Ŷ0) = E[(β0 − β̂0) + (β1 − β̂1)X0 + U0] = 0, because β̂0 and β̂1 are unbiased, X0 is a fixed number, and E(U0) is zero by assumption.

Thus, the variance of the prediction error is given by

  Var(Y0 − Ŷ0) = E[Y0 − Ŷ0]²
              = E[(β0 − β̂0) + (β1 − β̂1)X0 + U0]²
              = σ²[1 + 1/n + (X0 − X̄)²/Σ(Xi − X̄)²]

Replacing the unknown σ² by its unbiased estimator σ̂², we get:

  Var(Y0 − Ŷ0) = σ̂²[1 + 1/n + (X0 − X̄)²/Σ(Xi − X̄)²]

Note that this variance also increases the further the value of X0 is from X̄, and the standard error is given by s.e. = √Var(Y0 − Ŷ0).

It then follows that the variable

  t = (Y0 − Ŷ0)/s.e.(Y0 − Ŷ0)

follows a t-distribution with n − 2 degrees of freedom. Therefore, the t-distribution can be used to draw inferences about the true Y0. Continuing with the above example, the point prediction of Y0 is 75.36 and its variance is 52.63. Then the 95% confidence interval for Y0 corresponding to X0 = 100 is:

  75.36 − 2.306√52.63 ≤ Y0 (given X0 = 100) ≤ 75.36 + 2.306√52.63

  58.63 ≤ Y0 (given X0 = 100) ≤ 92.09
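A sketch reproducing both interval computations; the numbers (Ŷ0 = 75.36, the two variances, n = 10) are those of the examples above, so only the interval arithmetic is shown:

```python
import numpy as np
from scipy import stats

y0_hat, n = 75.36, 10
t_crit = stats.t.ppf(0.975, df=n - 2)   # 2.306 for 8 df

var_mean = 10.4759    # Var(Y0_hat): mean prediction
var_indiv = 52.63     # Var(Y0 - Y0_hat): individual prediction

for label, var in [("mean", var_mean), ("individual", var_indiv)]:
    half = t_crit * np.sqrt(var)
    print(label, round(y0_hat - half, 2), round(y0_hat + half, 2))
```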

MAXIMUM LIKELIHOOD ESTIMATION

A method of point estimation with some stronger theoretical properties than the method of OLS
is the method of maximum likelihood (ML). To use this method, however, one must make an
assumption about the probability distribution of the disturbance term ui. In the regression
context, the assumption most popularly made is that ui follows the normal distribution. Under the
normality assumption, the ML and OLS estimators of the intercept and slope parameters of the
regression model are identical. However, the OLS and ML estimators of the variance of ui are
different. In large samples, however, these two estimators converge. Thus, the ML method is
generally called a large-sample method. It is of broader application in that it can also be applied
to regression models that are nonlinear in the parameters. In this case, OLS is generally not used.

Assume a two-variable model:

  Yi = β0 + β1Xi + Ui

where the Yi are normally and independently distributed with mean β0 + β1Xi and variance σ².

As a result, the joint probability density function of Y1, Y2, …, Yn, given the preceding mean and variance, can be written as

  f(Y1, Y2, …, Yn | β0 + β1Xi, σ²)

But in view of the independence of the Y’s, this joint probability density function can be written as a product of n individual density functions:

  f(Y1, Y2, …, Yn | β0 + β1Xi, σ²)
    = f(Y1 | β0 + β1X1, σ²) f(Y2 | β0 + β1X2, σ²) … f(Yn | β0 + β1Xn, σ²) ……(1)

where

  f(Yi) = [1/(σ√(2π))] exp{−(1/2)(Yi − β0 − β1Xi)²/σ²} ……(2)

is the density function of a normally distributed variable with the given mean and variance.

Substituting (2) for each Yi into (1) gives

  f(Y1, Y2, …, Yn | β0 + β1Xi, σ²) = [1/(σⁿ(2π)^(n/2))] exp{−(1/2)Σ(Yi − β0 − β1Xi)²/σ²} ……(3)

If Y1, Y2, …, Yn are known or given, but β0, β1 and σ² are not known, the function in (3) is called a likelihood function, denoted by LF(β0, β1, σ²), and written as

  LF(β0, β1, σ²) = [1/(σⁿ(2π)^(n/2))] exp{−(1/2)Σ(Yi − β0 − β1Xi)²/σ²} ……(4)

The method of maximum likelihood, as the name indicates, consists in estimating the unknown parameters in such a manner that the probability of observing the given Y’s is as high (or maximum) as possible. Therefore, we have to find the maximum of the function (4). Taking logs,

  ln LF = −n ln σ − (n/2) ln(2π) − (1/2)Σ(Yi − β0 − β1Xi)²/σ² ……(5)
        = −(n/2) ln σ² − (n/2) ln(2π) − (1/2)Σ(Yi − β0 − β1Xi)²/σ² ……(6)

Differentiating partially with respect to β0, β1 and σ², and setting the results equal to zero, we obtain

  ∂lnLF/∂β0 = −(1/σ²)Σ(Yi − β0 − β1Xi)(−1) = 0 ……(7)
  ∂lnLF/∂β1 = −(1/σ²)Σ(Yi − β0 − β1Xi)(−Xi) = 0 ……(8)
  ∂lnLF/∂σ² = −n/(2σ²) + (1/(2σ⁴))Σ(Yi − β0 − β1Xi)² = 0 ……(9)

Letting β̃0, β̃1 and σ̃² denote the ML estimators, the above equations can be rewritten as

  (1/σ̃²)Σ(Yi − β̃0 − β̃1Xi) = 0 ……(10)
  (1/σ̃²)Σ(Yi − β̃0 − β̃1Xi)Xi = 0 ……(11)
  −n/(2σ̃²) + (1/(2σ̃⁴))Σ(Yi − β̃0 − β̃1Xi)² = 0 ……(12)

After simplifying,

  ΣYi = nβ̃0 + β̃1ΣXi ……(13)
  ΣYiXi = β̃0ΣXi + β̃1ΣXi² ……(14)

Equations (13) and (14) are precisely the normal equations of least squares theory. Therefore, the ML estimators are the same as the OLS estimators.

Moreover, substituting the ML (= OLS) estimators into (12) and simplifying, we obtain the ML estimator of σ²:

  σ̃² = (1/n)Σ(Yi − β̃0 − β̃1Xi)² = (1/n)ΣÛi²

It is obvious that the ML estimator σ̃² differs from the OLS estimator σ̂² = [1/(n − 2)]ΣÛi², which is an unbiased estimator of σ². Thus, the ML estimator of σ² is biased. The magnitude of this bias can easily be determined as follows:

  E(σ̃²) = (1/n)E(ΣÛi²) = [(n − 2)/n]σ² = σ² − (2/n)σ²

which shows that σ̃² is biased downward (i.e., it underestimates the true σ²) in small samples. But notice that as n, the sample size, increases indefinitely, the bias factor (2/n)σ² tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ̃² is unbiased too, because lim E(σ̃²) = σ² as n → ∞.
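A simulation sketch of this downward bias, comparing σ̃² = ΣÛi²/n with the unbiased σ̂² = ΣÛi²/(n − 2) over repeated samples (parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma2 = 1.0, 0.5, 4.0
X = np.linspace(0, 10, 12)   # small n so the bias is visible
x = X - X.mean()
n = X.size

ml_vars, ols_vars = [], []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0.0, np.sqrt(sigma2), n)
    b1 = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
    b0 = Y.mean() - b1 * X.mean()
    rss = ((Y - b0 - b1 * X) ** 2).sum()
    ml_vars.append(rss / n)          # ML estimator: biased downward
    ols_vars.append(rss / (n - 2))   # OLS estimator: unbiased

# Means should be near sigma2*(n-2)/n and sigma2, respectively
print(np.mean(ml_vars), np.mean(ols_vars), sigma2 * (n - 2) / n)
```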

CHAPTER THREE: MULTIPLE LINEAR REGRESSION

3.1 Introduction

In simple regression, the dependent variable is a function of only one explanatory variable. In economics, however, one rarely finds that a variable is affected by only one explanatory variable; hence a two-variable model is often inadequate in practical work. Multiple linear regression is concerned with the relationship between a dependent variable and two or more explanatory variables.

3.2 The three-variable model: specification and assumptions

Consider a regression model with two explanatory variables.

Y = f(X1, X2)

Example: Demand for a commodity may be influenced not only by the price of the commodity but also by consumers’ income. Since the theory does not specify the mathematical form of the demand function, we assume the relationship between Y and the regressors (X1 and X2) is linear. Hence we may write the three-variable Population Regression Function (PRF) as follows:

  Yi = β0 + β1X1i + β2X2i + Ui

where Y is the quantity demanded;

X1 and X2 are price and income, respectively;

β0 is the intercept term;

β1 is the coefficient of X1, and its expected sign is negative (due to the law of demand);

β2 is the coefficient of X2, and its expected sign is positive, assuming that the good is a normal good.

Interpretation of partial regression coefficients

Given Yi = β0 + β1X1i + β2X2i + Ui, the meaning of the partial regression coefficients is as follows: β1 measures the change in the mean value of Y, E(Y), per unit change in X1, holding the value of X2 constant. Likewise, β2 measures the change in the mean value of Y per unit change in X2, holding the value of X1 constant.

Consider also a model relating a person’s wage to observed educ (years of education), exper (years of labor market experience), tenure (years with the current employer) and other unobserved factors:

  wage = β0 + β1educ + β2exper + β3tenure + u

For instance, β1 measures the change in hourly wage given another year of education, holding experience and tenure fixed. Similarly, β2 measures the change in hourly wage given another year of experience, holding education and tenure fixed.

To complete the specification of our simple model we need some assumptions about the random
variable U. These assumptions are the same as those assumptions already explained in the simple
linear regression model.

Assumptions of the model

1. Zero mean value of Ui

The random variable U has a zero mean value for each Xi; that is, E(Ui/X1i, X2i) = 0 for each i.

2. Homoscedasticity: The variance of each Ui is the same for all the Xi values:

  Var(Ui) = E(Ui²) = σu²

3. Normality: The values of each Ui are normally distributed. That is, Ui ~ N(0, σu²)

4. No serial correlation: The values of Ui (corresponding to Xi) are independent of the values of any other Uj (corresponding to Xj):

  Cov(Ui, Uj) = 0 for i ≠ j

5. Independence of Ui and Xi: Every disturbance term Ui is independent of the explanatory variables; that is, there is zero covariance between Ui and each X variable:

  Cov(Ui, X1i) = Cov(Ui, X2i) = 0

Here the values of the X’s are a set of fixed numbers in all hypothetical samples.

6. No perfect multicollinearity (no collinearity between the X variables): The explanatory variables are not perfectly linearly correlated. That is, there is no exact linear relationship between X1 and X2.

7. Correct specification of the model: The model has no specification error in that all the important
explanatory variables appear explicitly in the function and the mathematical form is correctly
defined.

3.3 Estimation: the method of least squares

Suppose the sample regression function (SRF) is

  Yi = β̂0 + β̂1X1i + β̂2X2i + ûi

where β̂0, β̂1 and β̂2 are estimates of the true parameters β0, β1 and β2, and ûi is the residual term. Since Ui is unobservable, we work with the estimated regression line

  Ŷi = β̂0 + β̂1X1i + β̂2X2i

In least squares estimation, the estimates are obtained by choosing the values of the unknown parameters that minimize the sum of squared residuals (OLS requires Σûi² to be as small as possible). Symbolically,

  Min Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²

A necessary condition for a minimum is that the partial derivatives of the above expression with respect to the unknowns (i.e., β̂0, β̂1 and β̂2) be set to zero:

  ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂0 = 0
  ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂1 = 0
  ∂[Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂2 = 0

After differentiating, we get the following normal equations:

  ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i
  ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i
  ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i²

Solving the above normal equations, we can obtain the values of β̂0, β̂1 and β̂2:

  β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2

  β̂1 = [(Σx1iyi)(Σx2i²) − (Σx2iyi)(Σx1ix2i)] / [(Σx1i²)(Σx2i²) − (Σx1ix2i)²]

  β̂2 = [(Σx2iyi)(Σx1i²) − (Σx1iyi)(Σx1ix2i)] / [(Σx1i²)(Σx2i²) − (Σx1ix2i)²]

where the variables x and y are in deviation form.

Derivation

  ûi = yi − ŷi

  Σûi² = Σ(yi − β̂1x1i − β̂2x2i)²

  ∂Σûi²/∂β̂1 = −2Σ(yi − β̂1x1i − β̂2x2i)x1i = 0
  ∂Σûi²/∂β̂2 = −2Σ(yi − β̂1x1i − β̂2x2i)x2i = 0

After rearranging,

  Σyix1i = β̂1Σx1i² + β̂2Σx1ix2i
  Σyix2i = β̂1Σx1ix2i + β̂2Σx2i²

Solving these two equations simultaneously yields the values of β̂1 and β̂2 given above.
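A sketch of these deviation-form formulas on made-up data with two regressors (the numbers are illustrative, not from the text):

```python
import numpy as np

# Illustrative data: quantity (Y), price (X1) and income (X2)
Y = np.array([20, 18, 19, 15, 14, 13, 12, 10], dtype=float)
X1 = np.array([2, 3, 3, 5, 6, 7, 8, 9], dtype=float)
X2 = np.array([10, 10, 12, 12, 14, 14, 16, 16], dtype=float)

y, x1, x2 = Y - Y.mean(), X1 - X1.mean(), X2 - X2.mean()
s11, s22, s12 = (x1**2).sum(), (x2**2).sum(), (x1*x2).sum()
s1y, s2y = (x1*y).sum(), (x2*y).sum()

den = s11 * s22 - s12**2   # nonzero when there is no perfect collinearity
b1 = (s1y * s22 - s2y * s12) / den
b2 = (s2y * s11 - s1y * s12) / den
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()
print(b0, b1, b2)
```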

3.4 The Mean and Variance of the Parameter Estimates

The mean of the estimates of the parameters in the three-variable model is derived in the same way as
in the two-variable model.

The estimates are unbiased estimates of the true parameters of the relationship between Y, X1 and X2:

  E(β̂0) = β0,   E(β̂1) = β1,   E(β̂2) = β2

The variances of β̂0, β̂1 and β̂2 are given as follows:

  Var(β̂0) = σ̂²[1/n + (X̄1²Σx2² + X̄2²Σx1² − 2X̄1X̄2Σx1x2) / (Σx1²Σx2² − (Σx1x2)²)]

  Var(β̂1) = σ̂²Σx2² / [Σx1²Σx2² − (Σx1x2)²]

  Var(β̂2) = σ̂²Σx1² / [Σx1²Σx2² − (Σx1x2)²]

where σ̂² = ΣÛi²/(n − k) (in the three-variable model, k = 3).

Note that x1 and x2 are in deviation form.

3.5 The multiple coefficient of determination (R2) and the Adjusted R2

In a two variable regression model, the coefficient of determination (r2) measures the goodness of fit of
the regression equation. This notion of r2 can be easily extended to regression models containing more
than two variables.

In the three-variable model, we would like to know the proportion of the variation in Y explained by the variables X1 and X2 jointly. The quantity that gives this information is known as the multiple coefficient of determination. It is denoted by R², with the variables whose relationship is being studied shown as subscripts.

Example: R²y.X1X2 shows the percentage of the total variation in Y explained by the regression plane, that is, by changes in X1 and X2:

  R²y.X1X2 = Σŷi²/Σyi² = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)²

          = 1 − Σûi²/Σyi² = 1 − RSS/TSS

where RSS = residual sum of squares and TSS = total sum of squares.

Recall that

  ŷi = β̂1x1i + β̂2x2i   (the variables are in deviation form)

  yi = ŷi + ûi

  Σûi² = Σ(yi − ŷi)² = Σ(yi − β̂1x1i − β̂2x2i)²

or

  Σûi² = Σûiûi = Σûi(yi − β̂1x1i − β̂2x2i)
       = Σûiyi − β̂1Σûix1i − β̂2Σûix2i

but Σûix1i = Σûix2i = 0. Hence

  Σûi² = Σûiyi
       = Σ(yi − ŷi)yi,   since ûi = yi − ŷi
       = Σyi(yi − β̂1x1i − β̂2x2i)
       = Σyi² − β̂1Σx1iyi − β̂2Σx2iyi

Substituting this value of Σûi² into the formula for R², we get

  R²y.X1X2 = 1 − (Σyi² − β̂1Σx1iyi − β̂2Σx2iyi)/Σyi²

          = (β̂1Σx1iyi + β̂2Σx2iyi)/Σyi²

where x1i, x2i and yi are in their deviation forms.

The value of R² lies between 0 and 1. The higher R² is, the greater the percentage of the variation in Y explained by the regression plane, that is, the better the goodness of fit of the regression plane to the sample observations. The closer R² is to zero, the worse the fit.

An important property of R² is that it is a nondecreasing function of the number of explanatory variables or regressors present in the model: as the number of regressors increases, R² almost invariably increases and never decreases. Stated differently, an additional X variable will not decrease R².

To see this, recall the definition of R²:

  R² = 1 − Σûi²/Σyi²

It is clear that Σyi² is independent of the number of X variables in the model, because it is simply Σ(Yi − Ȳ)². The residual sum of squares (RSS), Σûi², however, depends on the number of explanatory variables present in the model. As the number of X variables increases, Σûi² is bound to decrease (at least it will not increase); hence R² will increase. Therefore, in comparing two regression models with the same dependent variable but different numbers of X variables, one should be very wary of choosing the model with the highest R². An explanatory variable that is not statistically significant may be retained in the model if one looks at R² only.

Therefore, to correct for this defect, we adjust R² by taking into account the degrees of freedom, which clearly decrease as new regressors are introduced into the function:

  R̄² = 1 − [Σûi²/(n − k)] / [Σyi²/(n − 1)]

or

  R̄² = 1 − (1 − R²)(n − 1)/(n − k)

where k = the number of parameters in the model (including the intercept term) and n = the number of sample observations.

It is immediately apparent from the above equation that, for k > 1, R̄² < R², which implies that as the number of explanatory variables increases, the adjusted R² becomes increasingly smaller than the unadjusted R². The adjusted R², i.e. R̄², can be negative, although R² is necessarily non-negative; in that case its value is taken as zero. If n is large, R̄² and R² will not differ much. But with small samples, if the number of regressors (X’s) is large in relation to the number of sample observations, R̄² will be much smaller than R².
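A minimal sketch of both goodness-of-fit measures (the actual and fitted values are illustrative assumptions):

```python
import numpy as np

def r2_and_adjusted(y, y_hat, k):
    """R-squared and adjusted R-squared; k counts all parameters incl. intercept."""
    n = y.size
    rss = ((y - y_hat) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - rss / tss
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return r2, adj

# Illustrative actual and fitted values from a three-variable model (k = 3)
y = np.array([4.0, 6.0, 7.0, 9.0, 11.0, 12.0])
y_hat = np.array([4.5, 5.8, 7.2, 9.1, 10.6, 11.8])
print(r2_and_adjusted(y, y_hat, k=3))
```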

3.6 Partial Correlation Coefficients

In the two-variable regression model, we used the simple correlation coefficient, r, as a measure of the degree of linear association between two variables. For the three-variable case, we can compute three correlation coefficients: ryx1 (correlation between Y and X1), ryx2, and rx1x2. These are called gross or simple correlation coefficients, or correlation coefficients of zero order. But ryx1, for example, is not likely to reflect the true degree of association between Y and X1 in the presence of X2. Therefore, what we need is a correlation coefficient that is independent of the influence, if any, of X2 on X1 and Y. Such a correlation coefficient is known as the partial correlation coefficient. Conceptually, it is similar to the partial regression coefficient.

ryx1.x2 = partial correlation coefficient between Y and X1, holding X2 constant

ryx2.x1 = partial correlation coefficient between Y and X2, holding X1 constant

rx1x2.y = partial correlation coefficient between X1 and X2, holding Y constant.

We can compute the partial correlations from the simple or zero-order correlation coefficients as follows:

  ryx1.x2 = (ryx1 − ryx2·rx1x2) / √[(1 − ryx2²)(1 − rx1x2²)]

  ryx2.x1 = (ryx2 − ryx1·rx1x2) / √[(1 − ryx1²)(1 − rx1x2²)]

  rx1x2.y = (rx1x2 − ryx1·ryx2) / √[(1 − ryx1²)(1 − ryx2²)]

Note: By order we mean the number of secondary subscripts. For example, ryx1.x2x3 is the
correlation coefficient of order two, whereas ryx1.x2x3x4 represents the correlation coefficient of
order three, and so on.
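A sketch of these first-order partial correlation formulas (the zero-order correlations are illustrative assumptions):

```python
import numpy as np

def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation between a and b, holding c constant."""
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

# Illustrative zero-order correlations
r_yx1, r_yx2, r_x1x2 = 0.70, 0.50, 0.60

print(partial_r(r_yx1, r_yx2, r_x1x2))   # r_yx1.x2
print(partial_r(r_yx2, r_yx1, r_x1x2))   # r_yx2.x1
print(partial_r(r_x1x2, r_yx1, r_yx2))   # r_x1x2.y
```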

3.7 The Confidence Interval for βi

The principle involved in constructing the confidence interval is identical with that of simple regression.

3.8 Tests of significance in multiple regression

3.8.1 Hypothesis Testing about Individual Partial Regression Coefficients

We can test whether a particular variable, X1 or X2, is significant, holding the other variable constant. The t-test is used to test a hypothesis about any individual partial regression coefficient; recall that the partial regression coefficient measures the change in the mean value of Y, E(Y/X1, X2), per unit change in one regressor, holding the other constant. The test statistic is

  t = (β̂i − βi) / s.e.(β̂i) ~ t(n − k),   (i = 0, 1, 2, …, k)

This is the observed (or sample) value of the t ratio, which we compare with the theoretical value of t
obtainable from the t-table with n – k degrees of freedom.

The theoretical values of t (at the chosen level of significance) are the critical values that define the
critical region in a two-tail test, with n – k degrees of freedom.

Let us postulate that

  H0: βi = 0
  H1: βi ≠ 0, or one-sided (βi > 0 or βi < 0)

The null hypothesis states that, holding X2 constant, X1 has no (linear) influence on Y.

If the computed t value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may accept it (β̂1 is not significant at the chosen level of significance, and hence the corresponding regressor does not appear to contribute to the explanation of the variation in Y).

Example: Suppose the estimated hourly wage equation is given as follows:

  log(ŵage) = .284 + .092 educ + .0041 exper + .022 tenure
  s.e.         (.104)  (.007)      (.0017)      (.003)

  n = 526, R² = .316

Test whether the return to exper, controlling for educ and tenure, is zero in the population, against the alternative that it is positive. That is, test H0: βexper = 0 against H1: βexper > 0.
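A sketch of this one-sided test; with n − k = 526 − 4 = 522 degrees of freedom, the 5% one-sided critical value is about 1.65:

```python
from scipy import stats

b_exper, se_exper = 0.0041, 0.0017
n, k = 526, 4                          # parameters: intercept, educ, exper, tenure

t_star = b_exper / se_exper            # about 2.41
t_crit = stats.t.ppf(0.95, df=n - k)   # one-sided 5% critical value, about 1.65

print(t_star > t_crit)   # True: reject H0, the return to exper is positive
```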

3.8.2 Testing the Overall Significance of a Regression

This test aims at finding out whether the explanatory variables (X1, X2, …Xk) do actually have any
significant influence on the dependent variable. The test of the overall significance of the regression
implies testing the null hypothesis

  H0: β1 = β2 = … = βk = 0

against the alternative hypothesis

  H1: not all βi’s are zero.

If the null hypothesis is true, then there is no linear relationship between y and the regressors.

The above joint hypothesis can be tested by the analysis of variance (ANOVA) technique. The following
table summarizes the idea.

Source of variation      | Sum of squares (SS) | Degrees of freedom (df) | Mean square (MSS)
Due to regression (ESS)  | Σŷi²                | k − 1                   | Σŷi²/(k − 1)
Due to residual (RSS)    | Σûi²                | n − k                   | Σûi²/(n − k)
Total (total variation)  | Σyi²                | n − 1                   |

Therefore, to undertake the test, first find the calculated value of F and compare it with the tabulated F. The calculated value of F is obtained from

  F = [Σŷi²/(k − 1)] / [Σûi²/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)]

which follows the F distribution with k − 1 and n − k degrees of freedom, where

  k − 1 = degrees of freedom of the numerator
  n − k = degrees of freedom of the denominator
  k = number of parameters estimated

Decision rule: If Fcalculated > Ftabulated (F(k − 1, n − k)), reject H0; otherwise you may accept it, where F(k − 1, n − k) is the critical F value at the α level of significance with (k − 1) numerator df and (n − k) denominator df.

Note that there is a relationship between the coefficient of determination R² and the F test used in the analysis of variance. From the above,

  F = [(n − k)/(k − 1)] · ESS/RSS

    = [(n − k)/(k − 1)] · (ESS/TSS)/(1 − ESS/TSS)

    = [(n − k)/(k − 1)] · R²/(1 − R²)

that is,

  F = [R²/(k − 1)] / [(1 − R²)/(n − k)]

When R² = 0, F is zero. The larger R² is, the greater the F value. In the limit, when R² = 1, F is infinite. Thus the F test, which is a measure of the overall significance of the estimated regression, is also a test of the significance of R²: testing the null hypothesis H0: β1 = β2 = … = βk = 0 is equivalent to testing the null hypothesis that the (population) R² is zero.
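A sketch of the overall F test computed from R², reusing the wage equation figures above for illustration:

```python
from scipy import stats

# Using the wage equation above: R^2 = .316, n = 526, k = 4
r2, n, k = 0.316, 526, 4
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))         # about 80.4

F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)  # about 2.62
print(F, F_crit, F > F_crit)   # True: regressors are jointly significant
```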
