
UNIT 1 INTRODUCTION TO ECONOMETRICS

Structure

1.0 Objectives
1.1 Introduction
1.2 The Nature of Econometrics
1.3 Probability Distributions
    1.3.1 Discrete Probability Distribution
    1.3.2 Continuous Probability Distribution
1.4 Sampling Distribution
1.5 Statistical Inference
    1.5.1 Estimation
    1.5.2 Hypothesis Testing
1.6 Software Packages for Econometric Analysis
1.7 Let Us Sum Up
1.8 Key Words
1.9 Some Useful Books/References

1.0 OBJECTIVES

After going through this unit you will be in a position to:

explain why we should study econometrics;
appreciate the scope of econometrics; and
learn certain basic statistical tools used in econometrics.

1.1 INTRODUCTION

Econometrics has emerged as a specialized branch of economics which is concerned mostly with empirical verification of economic theory. Econometrics has both a theoretical side and an applied side. In theoretical econometrics several estimation techniques and test statistics for estimates are developed. Theoretical developments in econometrics have been quite rapid in recent years. It is somewhat difficult to keep track of all these developments and cover the entire lot in a single course.

The present unit serves as a foundation for the course. It reviews some of the background material considered to be useful for studying econometrics. Thus it covers in very brief the concepts of probability distribution and statistical inference. The unit begins with a definition of econometrics and how econometric methods are applied to real-life situations.

Basic Econometric Theory

1.2 THE NATURE OF ECONOMETRICS

Econometrics is a branch of economics which deals with the 'quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference' [Samuelson, Koopmans and Stone, 1954]. We should notice three issues being emphasized in the above statement. First, econometrics deals with quantitative analysis of economic relationships. Second, it is based on economic theory and logic. Third, it requires appropriate methods to draw inferences.
Through logic and experience economists have attempted to establish relationships between several variables. Movements in certain economic variables are explained in terms of changes in some other variables. For example, the law of demand states that as price increases the quantity demanded of a commodity decreases. You will find numerous examples of such relationships. Some of these are apparent while others are quite complex. As we do not know what exactly is happening, we often make conjectures based on logic and our understanding of things. Due to differences in perception, different persons often explain the same phenomenon differently. Some economic relationships have been proved through empirical studies and validated under several situations. Some more are put forth as hypotheses and unequivocal conclusions are yet to be drawn. In both cases, however, logic needs to be supported by empirical observation. An advantage with the subject matter of economics, unlike many other disciplines in the social sciences, is that many economic variables can be quantified. This has helped in empirical measurement of economic variables and validation of economic theory.
We will show how econometrics is different from other branches of economics such as mathematical economics or economic statistics. Mathematical economics usually presents economic theory in mathematical form. It does not bother about empirical measurement of economic theory. Econometrics, on the other hand, is mainly concerned with empirical verification of economic theory. In doing so, econometrics makes use of the mathematical equations suggested in mathematical economics. Economic statistics deals with collection, processing and presentation of data. Statistics provides many theories on the basis of which inferences can be drawn. For example, the probability laws and sampling distributions are quite important in hypothesis testing. The application of these laws to economic theory through empirical research is a part of econometrics.

An important application of econometrics is prediction and forecasting on the basis of econometric models. By taking into account actual data, it helps us estimate the parameters of a model and, on the basis of those estimates, make predictions. Thus, econometrics is helpful in policy analysis, where we can simulate different values of exogenous variables and interpret their effects on endogenous variables.
Let us point out the important steps involved in undertaking an econometric study.

a) The first and foremost issue in econometrics is the statement of hypothesis. It is drawn from economic theory or some logic built by the researcher.

b) The second step is transformation of the above hypothesis into mathematical equation(s). This is the econometric model that we want to study.

c) The third step is the collection of relevant data required for the study. Data could be from primary or secondary sources.

d) The next step is estimation of the parameters of the econometric model. This requires selection of an appropriate estimation method. As you will learn in subsequent units, there are several estimation techniques or methods available to us.

e) Once estimates are obtained, the next task is testing of the hypothesis put forth in the first step above. Basically it amounts to checking the statistical significance of the estimates obtained by us.

f) Finally, we need to interpret the results. On the basis of inferences drawn through hypothesis testing, we find out the implications of the results for the econometric model.

We mentioned above that statistical theory offers important guidelines to us in formulation and testing of hypotheses. We present below, in very brief, some of the relevant statistical theory that we would be using frequently.

1.3 PROBABILITY DISTRIBUTIONS

Let us begin with the concept of a random variable. It is a variable that takes different values with some probabilities. Suppose the random variable X under consideration refers to the possible outcomes (called events) of tossing a coin. There are two events in this case (Head and Tail), which are exhaustive and mutually exclusive, and each has a probability of occurrence of 1/2. Similarly, for a fair dice there are six outcomes and each event has a probability of 1/6. We can generalize the concept as follows: a random variable X assumes values X_1, X_2, ..., X_n with corresponding probabilities p_1, p_2, ..., p_n. We express it as p(X = X_i) = p_i. When the random variable assumes certain isolated values we call it a discrete random variable. On the other hand, if it assumes any value in an interval it is termed a continuous random variable.

A probability distribution is a statement about the possible values of a random variable along with their respective probabilities. Thus it describes how the probabilities are distributed over the values of the random variable. For a discrete random variable the probability distribution function is called the probability mass function whereas for a continuous random variable it is called the probability density function.

1.3.1 Discrete Probability Distribution

For a discrete random variable, the probability mass function is given by p(X_i), where p_i is the probability and X_i is a value of the random variable. The mass function should satisfy the following two conditions:

i) Probability of an event cannot be negative, i.e., for any value X_i we have p(X_i) ≥ 0

ii) Probabilities of all possible outcomes sum to unity, i.e., Σ p(X_i) = 1, where the summation runs over all values of X.
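These two conditions can be checked numerically. The short Python sketch below uses the fair-die example from the text (probability 1/6 on each face):

```python
# Probability mass function of a fair six-sided die (example from the text).
def die_pmf(x):
    """p(X = x) for a fair die: 1/6 on each of the six faces, 0 elsewhere."""
    return 1 / 6 if x in (1, 2, 3, 4, 5, 6) else 0.0

# Condition (i): no probability is negative.
print(all(die_pmf(x) >= 0 for x in range(1, 7)))        # True

# Condition (ii): probabilities over all outcomes sum to unity.
print(round(sum(die_pmf(x) for x in range(1, 7)), 10))  # 1.0
```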


There are quite a few discrete probability distributions. We present below the probability mass function of an important probability distribution, that is, the binomial distribution.
Binomial Distribution
The binomial distribution is an example of a discrete probability distribution. It signifies two possible outcomes of an experiment: the occurrence of an event or the non-occurrence of the event. A probability experiment can be termed a Bernoulli experiment if it satisfies the following conditions.

1) The experiment consists of a sequence of n repeated trials.

2) Each trial results in an outcome that may be classified either as a success or a failure.

3) The probability of a success, denoted by p, is known and remains the same in each trial. Consequently, the probability of a failure, denoted by q = (1 − p), is also known and remains the same in each trial.

4) The trials are independent.

The probability of x successes in n trials is given by

p(x) = C(n, x) p^x q^(n−x),   x = 0, 1, ..., n
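A minimal Python sketch of this mass function (the 5-toss fair-coin figures are illustrative, not from the text):

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of exactly 3 successes (heads) in n = 5 trials with p = 0.5:
print(binomial_pmf(3, 5, 0.5))   # C(5,3)/2**5 = 10/32 = 0.3125

# As with any mass function, the probabilities over x = 0, ..., n sum to one:
print(round(sum(binomial_pmf(x, 10, 0.3) for x in range(11)), 10))  # 1.0
```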

1.3.2 Continuous Probability Distribution

A continuous random variable X has a zero probability of assuming exactly any of its values. Apparently, this seems to be a surprising statement. Let us try to explain this by considering a random variable, say, weight. Obviously weight is a continuous random variable since it can vary continuously. Suppose we do not know the weight of a person exactly but have a rough idea that her weight falls between 50 kg and 51 kg. Now, there are an infinite number of possible weights between these two limits. As a result, the probability of the person assuming a particular weight, say 50.3 kg, will be negligibly small; almost equal to zero. But we can definitely attach some probability to the person's weight being between 50 kg and 51 kg. Thus, for a continuous random variable X, we assign a probability to an interval and not to a particular value. Here, we look for a function p(X), called the probability density function, such that with the help of this function we can compute the probability P(a < X < b), where a and b are the limits of an interval (a, b) with a < b.

A probability density function is defined in such a manner that the area under its curve bounded by the x-axis is equal to one when computed over the domain of X for which p(X) is defined. The probability density function for a continuous random variable X defined over the entire set of real numbers R should satisfy the following conditions.

1) p(X) ≥ 0 for all X ∈ R

2) The total area under the curve equals unity, i.e., the integral of p(X) over R is 1.
Normal Distribution

The normal distribution is perhaps the most widely used distribution in statistics and related subjects. It has found applications in inquiries concerning heights and weights of people, IQ scores, errors in measurement, rainfall studies and so on. The probability density function p(x) of a continuous random variable that follows the normal distribution is given by

p(x) = [1 / (σ √(2π))] e^(−(x − μ)² / 2σ²)

where −∞ < x < ∞, and

π = 3.14159 (approximately)
e = 2.71828 (approximately).

The normal density function is completely determined by the parameters μ and σ. It means that given the values of μ and σ, we can trace out the normal curve by obtaining the values of p(x) for different values of x. In fact, it can be shown that μ and σ are respectively the mean and the standard deviation of the normal distribution. When a random variable X follows the normal distribution with mean μ and standard deviation σ we write it in symbols as X ~ N(μ, σ) and read it as 'X follows the normal distribution with mean μ and standard deviation σ.' The normal curve is a symmetrical bell-shaped curve as shown in Fig. 1.1.

Fig 1.1: Normal Curve

The normal curve stretches from −∞ to +∞. It is symmetric about its mean. The following area properties hold for a normal distribution. In Fig. 1.2 below we plot a normal curve with mean μ = 50 and standard deviation σ = 4.

a) 68.3% of the area under the normal curve lies between the ordinates at μ − σ and μ + σ. Thus in Fig. 1.2, 68.3% of the area is covered when x ranges between 46 and 54.

b) 95.5% of the area under the normal curve lies between the ordinates at μ − 2σ and μ + 2σ. In Fig. 1.2, 95.5% of the area is covered when 42 ≤ X ≤ 58.

c) 99.7% of the area (i.e., almost the whole of the distribution) under the normal curve lies between the ordinates at μ − 3σ and μ + 3σ. In Fig. 1.2 we find that 99.7% of the area is covered when 38 ≤ X ≤ 62.

Fig. 1.2: Area Under Normal Curve


A problem encountered here is that μ and σ can take any value and finding out the corresponding probability is time consuming. This problem is tackled by subtracting μ from the normal variable and dividing it by σ. This way we obtain the 'standard normal variate', z = (x − μ)/σ, which has mean = 0 and standard deviation = 1.

The probability density function of the standard normal variate, z, is given by

p(z) = [1 / √(2π)] e^(−z²/2)

Once we obtain a standard normal variate, our seemingly hopeless task of obtaining probability areas for different combinations of μ and σ becomes elegantly simple. We should note that a standard normal variate has a unique mean of 0 and a unique standard deviation of 1. It means, if we can construct a table for probability areas of such a unique standard normal variate, it can be used for obtaining the probability for any normal variable with any combination of mean and standard deviation. The only thing is that the given normal variable is to be transformed into the standard normal variate. In fact, such a table of areas (or probabilities) has been compiled for the standard normal variate (see Table A1 at the end of this Block: Areas under the Standard Normal Curve).
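The areas in the table can also be computed from the error function. A sketch in Python (the mean 50 and standard deviation 4 repeat the Fig. 1.2 example):

```python
import math

def phi(z):
    """P(Z <= z) for a standard normal variate, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma = 50, 4

def area_between(a, b):
    # Transform to standard normal variates: z = (x - mu) / sigma
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

print(round(area_between(46, 54), 4))   # mu +/- sigma:   0.6827
print(round(area_between(42, 58), 4))   # mu +/- 2*sigma: 0.9545
print(round(area_between(38, 62), 4))   # mu +/- 3*sigma: 0.9973
```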
The Student's-t Distribution

W.S. Gosset presented the t-distribution. The interesting story is that Gosset was employed in a brewery in Ireland. The rules of the company did not permit any employee to publish any research finding independently. So Gosset adopted the pen-name 'Student' and published his findings about this distribution anonymously. Since then, the distribution has come to be known as the Student's-t distribution or simply, the t distribution.

If z_1 is a standard normal variate, i.e., z_1 ~ N(0, 1), and z_2 is another independent variable that follows the chi-square distribution with k degrees of freedom, i.e., z_2 ~ χ²_k, then the variable

t = z_1 / √(z_2 / k)

is said to follow the Student's-t distribution with k degrees of freedom.

The probability curves for the Student's-t distribution for different degrees of freedom are presented in Fig. 1.3.

Fig 1.3: Student's-t Probability Curves

We note the important characteristics of this distribution.

1) As we can see in Fig. 1.3, like the normal distribution, the Student's-t distribution is also symmetric and its range of variation is also from −∞ to +∞; however, it is flatter than the normal distribution. We should note that as the degrees of freedom increase, the Student's-t distribution approaches the normal distribution.

2) The mean of the Student's-t distribution is zero, and its variance is k/(k − 2), where k is the degrees of freedom.

Like the normal distribution, the Student's-t distribution is often used in statistical inference, particularly when the sample size is small. The task involves the integration of its density function, which may prove to be tedious. As a result, in this case also, like the normal distribution, a table has been constructed for ready-reference purposes (see Table A2 at the end of the Block).
Chi-Square Distribution

If z is a standard normal variate, as defined above, then the variable z² is said to be distributed as a χ² variable with one degree of freedom. Since χ² is a squared term, when z ranges from −∞ to +∞, chi-square ranges from 0 to +∞. Moreover, since z has a mean 0, most of the values taken by z² will be close to 0. As a result, the probability density of the χ² distribution will be maximum near 0. We can generalize the above. If z_1, z_2, ..., z_k are k independent chi-square variables, each with one degree of freedom, then the variable z = Σ z_i follows the χ² distribution with k degrees of freedom. We denote this by χ²_k. The probability density functions of χ² for different degrees of freedom are given in Fig. 1.4. Observe that as the degrees of freedom increase, the χ² distribution approaches the normal distribution.

Fig 1.4: Chi-square Probability Curves

The critical values of the Chi-square distribution are given in Table A3 at the end of the Block.
F-Distribution

Another continuous probability distribution that finds use in econometrics is the F distribution. If z_1 and z_2 are two chi-square variables that are independently distributed with k_1 and k_2 degrees of freedom respectively, the variable

F = (z_1 / k_1) / (z_2 / k_2)

follows the F distribution with k_1 and k_2 degrees of freedom respectively. The variable is denoted by F_{k_1, k_2}, where the subscripts k_1 and k_2 are the degrees of freedom associated with the chi-square variables. We may note here that k_1 is called the numerator degrees of freedom and, in the same way, k_2 is called the denominator degrees of freedom.

Some important properties of the F distribution are mentioned below.

1) The F distribution, like the chi-square distribution, is also skewed to the right. But as k_1 and k_2 increase, the F distribution approaches the normal distribution.

2) An F distribution with 1 and k as the numerator and denominator degrees of freedom respectively is the square of a Student's-t distribution with k degrees of freedom. Symbolically, F_{1,k} = t²_k.
3) For fairly large denominator degrees of freedom k_2, the product of the numerator degrees of freedom k_1 and the F value is approximately equal to the chi-square value with k_1 degrees of freedom, i.e., k_1 F ≈ χ²_{k_1}.

The F distribution is extensively used in statistical inference. Again, such uses require obtaining areas under the F probability curve and consequently integrating the F density function. In this case also our task is facilitated by the provision of the F Table. The critical values for the F distribution are given in the F Table at the end of the Block.
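Property (3) can be illustrated by simulation, building F from two independent chi-square variables exactly as in the definition (the degrees of freedom, number of draws and seed are arbitrary):

```python
import random

random.seed(0)

def chi_sq(k):
    """Chi-square(k) draw as the sum of k squared standard normal variates."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

def f_draw(k1, k2):
    """F(k1, k2) draw: ratio of two independent chi-squares over their df."""
    return (chi_sq(k1) / k1) / (chi_sq(k2) / k2)

k1, k2 = 3, 200   # large denominator degrees of freedom
draws = [k1 * f_draw(k1, k2) for _ in range(5000)]
mean = sum(draws) / len(draws)

# k1 * F is approximately chi-square(k1), whose mean is k1.
print(abs(mean - k1) < 0.3)   # True
```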

1.4 SAMPLING DISTRIBUTION

Due to constraints such as costs, time and manpower it is often not possible to undertake a survey of the entire population (that is, a census) and we resort to sampling (that is, we study a part of the population). If the sample is drawn in a random manner with appropriate probability attached to each population unit and the sample size is not very small, the sample can be representative of the population. It is often argued that sampling results in fewer non-sampling errors compared to a census and thus it is advisable to go for sampling rather than a census. Theoretically it is possible to draw a number of samples from a given population and each sample provides us with a sample mean (or any other statistic). Thus the sample means can be arranged in the form of a frequency distribution, called the sampling distribution of the sample mean.

As the sample mean (x̄) assumes different values and for each value we can attach a certain probability, it is considered as a random variable. In real-life situations we have a finite population and the number of samples (and therefore the number of sample means) is finite. In this case x̄ is a discrete random variable, but when there is an infinite number of samples, x̄ could be a continuous random variable.

Now let us consider another important concept: the central limit theorem. It says that the sampling distribution of x̄ is normal if the parent population from which the sample is drawn is normal. However, the sampling distribution of x̄ is approximately normal if the sample size (n) is large, even if the parent population is not normal. If the parent population is approximately normal then the sampling distribution of sample means is approximately normal even when the sample size is small. This provides some rationale for assuming that the probability distribution of a statistic is normal.

We know that the dispersion of sample means is smaller in value than the dispersion of the parent population from which the samples are drawn. The standard deviation of the sampling distribution is called the standard error. Thus, if the population has a standard deviation σ, then the standard error of sample means is σ/√n.

Usually we consider a sample to be large in size if n > 30. For small samples (n ≤ 30), the sampling distribution of sample means is similar to the Student's-t distribution. Recall that in the case of the t distribution the shape of the probability curve changes according to its degrees of freedom. For n > 30 the sampling distribution of a statistic is considered to follow the normal distribution.
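The central limit theorem and the standard error formula can be illustrated with a small simulation; the uniform parent population below is an arbitrary non-normal choice:

```python
import random
import statistics

random.seed(1)

n = 36   # a 'large' sample (n > 30)
means = [statistics.mean(random.uniform(0, 1) for _ in range(n))
         for _ in range(10000)]

pop_mean = 0.5                 # mean of a uniform(0, 1) population
pop_sd = (1 / 12) ** 0.5       # its standard deviation

# Sample means centre on the population mean ...
print(abs(statistics.mean(means) - pop_mean) < 0.01)             # True
# ... with dispersion close to the standard error sigma / sqrt(n).
print(abs(statistics.stdev(means) - pop_sd / n ** 0.5) < 0.005)  # True
```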

1.5 STATISTICAL INFERENCE

Statistical inference deals with the methods of drawing conclusions about population characteristics on the basis of information contained in a sample drawn from the population. The population mean is not known to us, but we know the sample mean. In statistical inference we would be interested in answering two types of questions. First, what would be the value of the population mean? The answer lies in making an informed guess about the population mean. This aspect of statistical inference is called estimation. The second question pertains to certain assertions made about the population mean. Suppose a manufacturer of electric bulbs claims that the mean life of electric bulbs is equal to 2000 hours. On the basis of the sample information, can we say whether the assertion is correct or not? This aspect of statistical inference is called hypothesis testing. We discuss these two aspects below.
1.5.1 Estimation

Estimation could be of two types: point estimation and interval estimation. In point estimation we estimate the value of the population parameter as a single point. On the other hand, in the case of interval estimation we estimate lower and upper bounds around the sample mean within which the population mean is likely to remain.

a) Point Estimation

As mentioned earlier we do not know the parameter value and want to guess it by using sample information. Obviously the best guess will be the value of the sample statistic. Here we use a single value or point as an 'estimate' of the parameter.
Let us distinguish between the concepts estimate and estimator. The estimator is the formula and the estimate is the particular value obtained by using the formula. For example, if we use the sample mean for estimation of the population mean, then (1/n) Σ x_i is the estimator. Suppose I collect data on a sample, put the sampling units into this formula and obtain a particular value for the sample mean, say 120. Then 120 is an estimate of the population mean. It is possible that you draw another sample from the same population, use the formula for the sample mean, that is (1/n) Σ x_i, and obtain a different value, say 123. Here both 120 and 123 are estimates of the population mean. But in both cases the estimator is the same, which is (1/n) Σ x_i. Remember that the term statistic, which is used to mean a function of sample values, is a synonym for estimator.
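The distinction can be sketched in code: one estimator (the formula), several estimates (its values on different samples). The population parameters below are hypothetical, loosely echoing the '120' example above:

```python
import random

random.seed(7)

def sample_mean(xs):
    """The estimator: (1/n) * sum of the sample values."""
    return sum(xs) / len(xs)

# A hypothetical population with mean around 120.
population = [random.gauss(120, 15) for _ in range(100000)]

sample1 = random.sample(population, 50)
sample2 = random.sample(population, 50)

# The same estimator applied to two samples yields two different estimates.
print(sample_mean(sample1) != sample_mean(sample2))   # True
```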
There may be situations when you would find more than one potential estimator (alternative formulae) for a parameter. In order to choose the best among these estimators, we need to follow certain criteria. These are as follows:

i) Unbiasedness: If θ̂ is a statistic based on n observations then it is said to be unbiased if E(θ̂) = θ. It implies that an estimate may be higher or lower than the unknown value of the parameter but the expected value of the estimate should be equal to the parameter. The extent of bias in a statistic (or estimator) is given by E(θ̂) − θ. You can find that the sample mean is an unbiased estimator of the population mean.
ii) Consistency: Consistency and asymptotic bias are large sample properties of an estimator. A consistent estimator is defined as follows: an estimator θ̂ is said to be consistent if plim(θ̂) = θ. The notation plim stands for 'probability limit' and it implies that as the sample size n increases infinitely the statistic should equal the parameter value. Notice the difference between unbiasedness and consistency. In the case of unbiasedness the statistic on the average equals the parameter value. In the case of consistency, the statistic equals the parameter value when the sample size tends to infinity. Consistency is important because some estimators may not be consistent. Let us take an example from Patterson (2000).

Suppose the statistic (a random variable, as we mentioned earlier) assumes two discrete values with corresponding probabilities:

θ̂ = θ with probability (n − 1)/n
θ̂ = n with probability 1/n

In this case plim(θ̂) = θ, because as n → ∞ the probability 1/n → 0. Thus the statistic is consistent. However, it is not unbiased. For example, if the sample size is 10, then E(θ̂) = θ(9/10) + (1/10)(10) = 0.9θ + 1 ≠ θ.
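The example can be checked with a short calculation (θ = 8 is an arbitrary illustrative value, not taken from the text):

```python
theta = 8.0

def exact_expectation(n):
    # E(theta_hat) = theta * (n - 1)/n + n * (1/n) = theta - theta/n + 1
    return theta * (n - 1) / n + n * (1 / n)

# Biased in finite samples: for n = 10, E = 0.9 * theta + 1 = 8.2, not 8.
print(round(exact_expectation(10), 4))   # 8.2

# Yet consistent: the probability of the 'bad' value n is 1/n -> 0,
# so plim(theta_hat) = theta.
for n in (10, 100, 10000):
    print(1 / n)   # probability that theta_hat differs from theta
```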
iii) Efficiency: It refers to the variance of an estimator. An estimator with a smaller variance is said to be more efficient. Usually comparison of variance is done among estimators of the same class. For example, we can compare among linear estimators (that is, estimators which are linear combinations of sample observations) and say that the OLS estimator is more efficient than other linear estimators.

iv) Asymptotic bias: A statistic is said to be asymptotically unbiased if its 'asymptotic expectation' equals the parameter value. The asymptotic expectation of a statistic is given by

AE(θ̂) = lim E(θ̂) as n → ∞.
We observe that an unbiased estimator is asymptotically unbiased but the reverse is not necessarily true. For example, suppose we modify our previous example as

θ̂ = θ with probability 1 − 1/n²
θ̂ = n with probability 1/n²

In this case

bias = E(θ̂) − θ = [θ(1 − 1/n²) + n(1/n²)] − θ = 1/n − θ/n²

Thus the above statistic is not unbiased. In the limiting case, as n → ∞, however, lim E(θ̂) = θ. Thus it is asymptotically unbiased.

b) Interval Estimation

The point estimate may not be realistic in the sense that the parameter value may not exactly be equal to it. An alternative procedure is to give an interval, which would hold the parameter with a certain probability. Here we specify a lower limit and an upper limit within which the parameter value is likely to remain. Also we specify the probability of the parameter remaining in the interval. We call the interval the 'confidence interval' and the probability of the parameter remaining within this interval the 'confidence level' or 'confidence coefficient'.
How do we find out the confidence interval and the confidence coefficient? Let us begin with the confidence coefficient.

We know that the sampling distribution of x̄ for large samples is normally distributed with mean μ and standard error σ/√n, where n is the size of the sample. By transforming the sampling distribution (z = (x̄ − μ)/(σ/√n)) we obtain the standard normal variate, which has zero mean and unit variance. The standard normal curve is symmetrical and therefore the area under the curve for 0 ≤ z < ∞ is 0.5, as we can see from Table A1 given at the end of the Block. If we want our confidence coefficient to be 95 per cent (that is, 0.95), we find out a range for z which will cover 0.95 of the area of the standard normal curve. Since the distribution of z is symmetrical, 0.475 of the area should remain to the right and 0.475 of the area should remain to the left of z = 0. From Table A1 we find that 0.475 of the area is covered when z = 1.96. Thus the probability that z ranges between −1.96 and 1.96 is 0.95. From this information let us work backward and find the range within which μ will remain.

We find that

P(−1.96 ≤ z ≤ 1.96) = 0.95

By substituting for z and rearranging the terms we obtain

P(x̄ − 1.96 σ/√n ≤ μ ≤ x̄ + 1.96 σ/√n) = 0.95   ... (1.6)

As each sample would provide us with a different value of x̄, the confidence interval would be different. In each case the confidence interval may contain the unknown parameter or it may not. Equation (1.6) means that if a large number of random samples, each of size n, are drawn from the given population and for each the interval is determined, then in about 95% of the cases the interval will include the population mean μ.


The confidence coefficient is denoted by (1 − α), where α is the level of significance. The confidence coefficient could take any value. We can very well ask for a confidence level of, say, 81 per cent or 97 per cent depending upon how precise our conclusions should be. However, conventionally two confidence levels are frequently used, namely 95 per cent and 99 per cent. Let us find out the confidence interval when the confidence coefficient (1 − α) = 0.99. In this case 0.495 of the area should remain on either side of the standard normal curve. If we look into the normal area table (Table A1 at the end of the Block) we find that 0.495 of the area is covered when z = 2.58.

Thus

P(−2.58 ≤ z ≤ 2.58) = 0.99   ... (1.7)

By rearranging the terms in the above we find that

P(x̄ − 2.58 σ/√n ≤ μ ≤ x̄ + 2.58 σ/√n) = 0.99   ... (1.8)

Equation (1.8) implies that the 99 per cent confidence interval for μ is given by x̄ ± 2.58 σ/√n.

By looking into the normal area table you can work out the confidence interval for a confidence coefficient of 0.90 and find that

P(x̄ − 1.645 σ/√n ≤ μ ≤ x̄ + 1.645 σ/√n) = 0.90   ... (1.9)

We observe from (1.7), (1.8) and (1.9) that as the interval widens, the probability of the interval holding the population parameter (in this case μ) increases.
The two limits of the confidence interval are called confidence limits. For example, for the 95 per cent confidence level we have the lower confidence limit as x̄ − 1.96 σ/√n and the upper confidence limit as x̄ + 1.96 σ/√n. The confidence coefficient can be interpreted as the confidence or trust that we place in these limits for actually holding μ.
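A worked numerical sketch of these limits (the sample figures — mean 120, σ = 15, n = 36 — are hypothetical):

```python
import math

x_bar, sigma, n = 120.0, 15.0, 36
se = sigma / math.sqrt(n)        # standard error = sigma / sqrt(n) = 2.5

z = 1.96                         # from Table A1, for (1 - alpha) = 0.95
lower = x_bar - z * se
upper = x_bar + z * se

# 95 per cent confidence limits for the population mean:
print(round(lower, 2), round(upper, 2))   # 115.1 124.9
```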

1.5.2 Hypothesis Testing


A hypothesis is a tentative statement about a characteristic of a population. It
could be an assertion or a claim also. In hypothesis testing there are four
important components: i) null hypothesis, ii) alternative hypothesis, iii) test
statistic, and iv) interpretation of results.
Null and Alternative Hypothesis
Usually statistical hypotheses are denoted by the alphabet H. There are two
types of hypothesis: null hypothesis and alternative hypothesis. A null
hypothesis is the statement that we consider to be true about the population
and put to test by a test statistic. We denote the null hypothesis by H0.
Suppose on the basis of our logic we put forth the view that female literacy in
a village in Orissa is higher than that for Orissa. We begin with the
presumption that female literacy in the village is equal to that of Orissa
(which is, say, 51 per cent). Thus our null hypothesis is that 'sample mean is
equal to population mean':

H0 : x̄ = μ

where μ is the parameter, in this case female literacy in Orissa.

There is a possibility that the null hypothesis that we intend to test is not true
and female literacy in the village is not equal to 51 per cent. Thus there is a need
for an alternative hypothesis which holds true in case the null hypothesis is
not true. We denote the alternative hypothesis by the symbol H1 and
formulate it as follows:

H1 : x̄ ≠ μ

We have to keep in mind that the null hypothesis and alternative hypothesis are
mutually exclusive, that is, both cannot be true simultaneously. Secondly,
both H0 and H1 exhaust all possible options regarding the parameter, that is,
there cannot be a third possibility. For example, in the case of female literacy
in the village, there are two possibilities: the literacy rate is 51 per cent or it is
not 51 per cent; a third possibility is not there.
It is a rare coincidence that the sample mean (x̄) is equal to the population mean
(μ). In most cases we find a difference between x̄ and μ. Is the difference
because of sampling fluctuation or is there a genuine difference between the
sample and the population? In order to answer this question we need a test
statistic to test the difference between the two. The result that we obtain by
using the test statistic needs to be interpreted and a decision needs to be taken
regarding whether the null hypothesis be rejected or not.
Rejection Region
While discussing the confidence interval we mentioned that if the confidence
level is 95 per cent then 5 per cent of the area of the standard normal curve
remains under the rejection region. Let us look into the standard normal curve
presented in Fig. 1.5, where the x-axis represents the variable z and the y-axis
represents the probability of z, that is, p(z). If the estimate falls under the
rejection region then the null hypothesis is rejected. Otherwise, the hypothesis
is not rejected.

Fig. 1.5: Rejection Region in Standard Normal Curve


We should note the following points.

When the sample mean is equal to the population mean (that is, x̄ = μ) we find
that z = 0. When x̄ > μ we find that z is positive and when x̄ < μ we
find z to be negative.

Note that we are concerned with the difference between x̄ and μ.
Therefore, the negative or positive sign of z does not matter much.

The higher the difference between x̄ and μ, the higher is the absolute value of
z. Thus the z-value measures the discrepancy between x̄ and μ, and
therefore can be used as a test statistic.

We should find out a critical value of z beyond which the difference
between x̄ and μ is significant.

If the absolute value of z is less than the critical value we should not
reject the null hypothesis.
If the absolute value of z exceeds the critical value we should reject the
null hypothesis and accept the alternative hypothesis.
Thus in the case of large samples the absolute value of z can be considered as
the test statistic for hypothesis testing, where

z = (x̄ - μ) / (σ/√n)

When we have a significance level of 5 per cent, the area covered under the
standard normal curve is 95 per cent. Thus 95 per cent of the area under the
curve is bounded by -1.96 ≤ z ≤ 1.96. The remaining 5 per cent of the area is
covered by z ≤ -1.96 and z ≥ 1.96. Thus 2.5 per cent of the area on each side of
the standard normal curve constitutes the rejection region.
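The large-sample procedure can be sketched as below. The snippet is illustrative only (the village literacy figures are hypothetical, not from the unit); it computes z and compares its absolute value with the critical value for a two-tail 5 per cent test:

```python
from statistics import NormalDist

def two_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Large-sample two-tail z test of H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = 0.05
    return z, abs(z) > critical                     # True means: reject H0

# Hypothetical figures: village sample mean 56, H0 value 51,
# known sigma 15, sample size 64
z, reject = two_tail_z_test(56, 51, 15, 64)
```

Here |z| exceeds 1.96, so the null hypothesis would be rejected at the 5 per cent significance level.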
For small samples (n ≤ 30), if the population standard deviation is known we
apply the z-statistic for hypothesis testing. On the other hand, if the population
standard deviation is not known we apply the t-statistic. The same criteria apply
to hypothesis testing also.

In the case of small samples, if the population standard deviation is known the test
statistic is

z = (x̄ - μ) / (σ/√n)

On the other hand, if the population standard deviation is not known the test
statistic is

t = (x̄ - μ) / (s/√n)

where s is the sample standard deviation.
In the case of the t-distribution, however, the area under the curve (which implies
probability) changes according to the degrees of freedom. Thus while finding the
critical value of t we should take into account the degrees of freedom. When the
sample size is n, the degrees of freedom is n - 1. Thus we should remember two
things while finding the critical value of t. These are: i) significance level, and ii)
degrees of freedom.
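A small-sample t test can be sketched as follows. The snippet is an illustrative sketch (the sample values are invented, not from the unit), and the critical value 2.262 is taken from a standard t table for 9 degrees of freedom at the 5 per cent two-tail significance level:

```python
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """t statistic for H0: mu = mu0 when sigma is unknown; df = n - 1."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the divisor)
    return (mean(sample) - mu0) / (s / n ** 0.5), n - 1

# Hypothetical sample of 10 observations, testing H0: mu = 51
sample = [54, 49, 58, 52, 47, 55, 60, 50, 53, 56]
t, df = t_statistic(sample, 51)
# Two-tail 5 per cent critical value from a t table for df = 9 is 2.262
reject = abs(t) > 2.262
```

In this invented example |t| falls short of the critical value, so the null hypothesis would not be rejected even though the sample mean differs from 51.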
One-tail and Two-tail Tests
In Fig. 1.5 we have shown the rejection region on both sides of the standard
normal curve. However, in many cases we may place the rejection region on
one side (either left or right) of the standard normal curve.

Remember that if α is the level of significance, then for a two-tail test α/2
area is placed on each side of the standard normal curve. But if it is a one-tail
test then the entire α area is placed on one side of the standard normal curve.
Thus the critical values for one-tail and two-tail tests differ.

The selection of a one-tail or two-tail test depends upon the formulation of the
alternative hypothesis. When the alternative hypothesis is of the type
H1 : x̄ ≠ μ we have a two-tail test, because x̄ could be either greater than or
less than μ. On the other hand, if the alternative hypothesis is of the type
H1 : x̄ < μ, then the entire rejection region is on the left hand side of the standard
normal curve. Similarly, if the alternative hypothesis is of the type
H1 : x̄ > μ, then the entire rejection region is on the right hand side of the standard
normal curve.
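The difference between the two critical values can be verified numerically. This short sketch (illustrative only) recovers both from the standard normal distribution for a 5 per cent significance level:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tail test: alpha/2 area in each tail of the standard normal curve
z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
# One-tail test: the whole alpha area in a single tail
z_one_tail = NormalDist().inv_cdf(1 - alpha)      # about 1.645
```

The one-tail critical value is smaller, so a given discrepancy between x̄ and μ is easier to declare significant under a one-tail test, which is why the choice must be driven by the alternative hypothesis and not by the data.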

1.6 SOFTWARE PACKAGES FOR ECONOMETRIC ANALYSIS
As econometric analysis deals with empirical data it involves cumbersome
computations. We can think of estimating the parameters manually in some of
the simpler estimation methods, such as ordinary least squares. In other cases,
however, it is quite difficult and time consuming. Some of the econometric
methods are iterative in nature and thus require repeated calculations.

In recent years, easy access to computers and the availability of software
packages for econometric analysis have made the job simpler. We find that
the most difficult computations can be done by the computer in a few seconds.
There are several software packages available for econometric applications.
Most of these software packages include the important econometric
applications. Moreover, since they apply similar methods, they provide the
same results for a problem. We provide a list of selected software packages
below.
SPSS/PC+ by SPSS Inc., 444 N. Michigan Avenue, Chicago, IL 60611, USA.
STATA by Computing Resources Centre, 10801 National Blvd., 3rd Floor,
Los Angeles, CA 90064, USA.
SHAZAM by Kenneth J. White, Department of Economics, University of
British Columbia, Vancouver, BC V6T 1Y2, Canada.
LIMDEP by W. H. Greene, Stern Graduate School of Business, New York
University, 100 Trinity Place, New York, USA.
SAS/STAT by SAS Institute Inc., PO Box 1818, Evanston, IL, USA.
1.7 LET US SUM UP

In this unit we provided an outline of some essential statistical tools and
studied some continuous probability distributions. Among these distributions, the
normal distribution is considered to be the most important one.

Besides the normal distribution, we have considered three other continuous
probability distributions, viz., the chi-square distribution, the student's-t
distribution and the F distribution. We have already seen from the features of
the chi-square, student's-t and F distributions that for large degrees of
freedom, these distributions approach the normal distribution. This
relationship between the chi-square distribution, the student's-t distribution
and the F distribution on one hand and the normal distribution on the other
has tremendous practical implications. When the degrees of freedom happen
to be fairly large, instead of using the chi-square distribution or the student's-t
distribution or the F distribution separately as the situation may demand, we
can uniformly apply the normal distribution. As a result, our task gets
considerably simplified.
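The convergence of the t distribution to the normal distribution can be seen directly by comparing critical values. In the sketch below the t values are taken from a standard t table (two-tail, 5 per cent significance level); it is illustrative only:

```python
from statistics import NormalDist

# Two-tail 5 per cent critical values of t from a standard t table,
# compared with the corresponding standard normal value
t_critical = {10: 2.228, 30: 2.042, 120: 1.980}
z_critical = NormalDist().inv_cdf(0.975)  # about 1.96

# Gap between the t and normal critical values at each df
gaps = {df: t - z_critical for df, t in t_critical.items()}
```

The gap shrinks steadily as the degrees of freedom grow, so for fairly large samples the normal critical value 1.96 is an adequate stand-in for the t critical value.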

1.8 KEY WORDS


Chi-square Distribution : It is an asymmetric distribution where the range of
variation for the random variable is from zero to infinity. For fairly large
degrees of freedom, it approaches the normal distribution.

Continuous Probability Distribution : It is the probability distribution for a
continuous random variable.

Degrees of Freedom : It refers to the number of pieces of independent
information that are required to compute some characteristic of a given set of
observations.

Discrete Probability Distribution : It is the probability distribution for a
discrete random variable.

Estimation : It is the method of prediction about parameter values on the basis
of sample statistics.

Estimator : It is another name given to a statistic in the theory of estimation.

F Distribution : It is an asymmetric distribution that is skewed to the right. For
fairly large degrees of freedom, it approaches the normal distribution.

Normal Distribution : The best known of all the theoretical probability
distributions. It traces out a bell-shaped symmetric probability curve.

Normal Variable : A random variable that follows the normal distribution.

Parameter : It is a measure of some characteristic of the population.

Population : It is the collection of all units of a specified type in a given place
and at a particular point of time.

Probability Distribution : It is a statement about the possible values of a
random variable along with their probabilities.

Random Sampling : It is a procedure where every member of the population
has a definite chance or probability of being selected in the sample. It is also
called probability sampling.

Sample : It is a sub-set of the population. It can be drawn from the population
in a scientific manner by applying the rules of probability so that personal bias
is eliminated. Many samples can be drawn from a population and there are
many methods of drawing a sample.

Sampling Distribution : It is the relative frequency or probability distribution
of the values of a statistic when the number of samples tends to infinity.

Sampling Error : In the sampling method, we try to approximate some feature
of a given population from a sample drawn from it. Since all the members of
the population are not included in the sample, howsoever close the
approximation is, it is not identical to the required population feature and
some error is committed. This error is called the sampling error.

Significance Level : There may be certain samples where the population mean
would not remain within the confidence interval around the sample mean. The
percentage (probability) of such cases is called the significance level. It is
usually denoted by α. When α = 0.05 (that is, 5 per cent) we can say that in 5
per cent of cases we are likely to reach an incorrect decision.

Standard Error : A sample statistic varies across samples, which can be
presented in the form of a probability distribution. The standard error is the
standard deviation of the sampling distribution of a statistic.

Standard Normal Variate : A normal variable with mean 0 and standard
deviation equal to 1.

Statistic : It is a function of the values of the units that are included in the
sample. The basic purpose of a statistic is to estimate some population
parameter.

Statistical Inference : It is the process of concluding about an unknown
population from a known sample drawn from it.
1.9 SOME USEFUL BOOKS/REFERENCES

Gujarati, D., 1995, Basic Econometrics, McGraw-Hill, Singapore.

IGNOU, 2005, EEC-13: Elementary Statistical Methods and Survey
Techniques, Blocks 5, 6 and 7.

Nagar, A. L. and R. K. Das, 1989, Basic Statistics, Oxford University Press,
Delhi.

Patterson, K., 2000, An Introduction to Applied Econometrics, Palgrave, New
York.

Samuelson, P. A., T. C. Koopmans, and J. R. N. Stone, 'Report of the
Evaluative Committee for Econometrica', Econometrica, vol. 22, no. 2, pp.
141-46.
