Chapter 1 and 2 Econometrics

The document outlines the course structure for AgEC 3034 Econometrics, aimed at undergraduate students in their third year, covering topics such as regression analysis, econometric problems, and forecasting. Assessment is based on tests, quizzes, assignments, a term paper, and a final exam, with a total of 7 ECTS. Key references include works by Koutsoyiannis, Greene, and Gujarati, emphasizing the integration of economic theory, mathematics, and statistical techniques.

ECONOMETRICS

Course code: AgEC 3034

Course Title: ECONOMETRICS
Beneficiaries: Undergraduate students of AgEc Year III
Course ECTS: 7
Lecture per week: 4 hr (M3-4, T3-4, W5-7)
Software Practical: 3 Hr (W5-7)
Academic Year: 2025 (2nd Semester)
Pre-requisites: Introduction to Statistics (AgEc2131) and Statistics for Economists (AgEc2132)
Instructor: Yitna Tesfaye
E-mail: [email protected]
Office number: New Building, fourth floor, office No. 423-1, or fifth floor, room number 511
OUTLINE
Unit 1: What is Econometrics – What it is all about?
Unit 2: Correlation Theory
Unit 3: Simple Linear Regression Models
Unit 4: Multiple Regression Analysis
Unit 5: Dummy Variable Regression Analysis
Unit 6: Econometric Problems
Unit 7: Non-linear Regression and Time Series Econometrics
MODE OF ASSESSMENT/EVALUATION CRITERIA:
Students will be graded on the basis of their performance as follows:
Tests (2) = 20%
Quizzes (2) = 10%
Assignment (1) = 10%
Term Paper: generate and interpret regression output obtained from software = 20%
Final exam = 40%
REFERENCES
• Koutsoyiannis, 2001. Theory of Econometrics, 2nd edition.
• Greene, W.H., 2000. Econometric Analysis. MacMillan Publishing Company, New York.
• Gujarati, D., 2003. Basic Econometrics, 3rd edition.
• Gujarati, D.N., 2004. Basic Econometrics, 4th edition.
• Baltagi, B.H., 2002. Econometrics, 3rd edition.
• Maddala, G.S. Introduction to Econometrics, 2nd edition. Macmillan Publishing Company, New York.
• Wooldridge, 2005. Introductory Econometrics, 3rd edition.
Unit 1: What is Econometrics – What it is all about?

Economics: The demand for a good is a function of the good's price, and the price elasticity of demand is negative.
Econometrics: Interested in quantifying the price elasticity more accurately.
Objectives
• Measure and analyze the association between economic variables
• Formulate models to represent and study economic relationships
• Estimate economic parameters important for policy formation and evaluation
• Forecast economic relationships


1. Fundamental concepts of Econometrics

1. What is econometrics?
2. What is the importance of econometric models?
3. Mention desirable properties of econometric models.
4. Differentiate between economic and econometric models.
5. What are the goals of econometrics?
Definition and scope of econometrics

• What is Econometrics?
Econometrics means economic measurement. The “metrics” part of the word signifies measurement, and econometrics is concerned with the measuring of economic relationships.
It is a social science in which the tools of economic theory, mathematics and statistical inference are applied to the analysis of economic phenomena (Arthur Goldberger).
• Econometrics is the integration of economic theory, mathematics, and statistical techniques for the purpose of testing hypotheses about economic phenomena, estimating coefficients of economic relationships, and forecasting or predicting future values of economic variables or phenomena.
• Econometrics is subdivided into
1. Theoretical econometrics.
2. Applied econometrics.
1. Theoretical econometrics refers to the methods for measurement of economic relationships in general.
Cont.…

2. Applied econometrics examines the problems encountered and the findings in particular fields of economics, such as demand theory, production, investment, consumption, and other fields of applied economic research.

In any case, econometrics is partly an art and partly a science, because the intuition and good judgment of the econometrician often play a crucial role.
CONT…

• Econometrics is “the application of statistical and mathematical methods to the analysis of economic data, with a purpose of giving empirical content to economic theories and verifying them or refuting them.” (Maddala)
• It is a special type of economic analysis and research in which the general economic theory formulated in mathematical form (i.e. mathematical economics) is combined with empirical measurement (i.e. statistics) of economic phenomena.
• Econometrics has basically three closely interrelated functions:
1. To test economic theories or hypotheses.
• For example, is consumption directly related to income? Is the quantity demanded of a commodity inversely related to its price?
2. To provide numerical estimates of the coefficients of economic relationships. These are essential in decision making.
– For example, a government policymaker needs an accurate estimate of the coefficient of the relationship between consumption and income in order to determine the stimulating (i.e., the multiplier) effect of a proposed tax reduction.
– A manager needs to know whether a price reduction increases or reduces the total sales revenue of the firm and, if so, by how much.
3. To forecast events. This, too, is necessary so that policymakers can take appropriate corrective action if the rate of unemployment or inflation is predicted to rise in the future.
Why a Separate Discipline?
1. Economic theory makes statements or hypotheses that are mostly qualitative in nature.
Example: ceteris paribus, a reduction in the price of a commodity is expected to increase the quantity demanded.
• Thus economic theory postulates an inverse relationship between the price and the quantity demanded of a commodity.
• But the theory does not provide a numerical measure of the relationship between the two.
• Econometrics provides the numerical value by which the quantity will go up or down as a result of changes in the price of the commodity.
2. Economic statistics is concerned with collecting, processing and presenting economic data (descriptive statistics).
Example: collecting and refining data on national accounts, index numbers, employment, prices, etc.
3. Mathematical statistics and mathematical economics do provide much of the toolkit used in econometrics.
• But econometrics needs special methods to deal with economic data, which are rarely generated by controlled experiments.
Examples: errors of measurement, multicollinearity and serial correlation are specifically econometric problems and are not the concern of mathematical statistics.
• Econometrics utilizes these data to estimate quantitative economic relationships and to test hypotheses about them.
• The econometrician is called upon to develop special methods of analysis to deal with such econometric problems.
Economic models vs. Econometric models
• A model is any representation of an actual phenomenon, such as an actual system or process.
• The real-world system is represented by the model in order to explain it, to predict it, and to control it.
• Any model represents a compromise between reality and manageability.
• A given representation of a real-world system can be a model if it fulfills the following requirements:
1. It must be a “reasonable” representation of the real-world system and in that sense it should be realistic.
2. It must be “manageable” in that it yields certain insights or conclusions.
• A good model is both realistic and manageable.
• A highly realistic but too complicated model is a “bad” model in the sense that it is not manageable.
• A model that is highly manageable but so idealized that it is unrealistic, not accounting for important components of the real-world system, is a “bad” model too.
• In general, finding the proper balance between realism and manageability is the essence of good modeling.
• Thus a good model should, on the one hand, specify the interrelationships among the parts of a system in a way that is sufficiently detailed and explicit.
• On the other hand, it should be sufficiently simplified and manageable to ensure that the model can be readily analyzed and conclusions can be reached concerning the real world.
Economic models
• Any economic theory is an abstraction from the real world.
• One reason is that the immense complexity of the real-world economy makes it impossible for us to understand all interrelationships at once.
• Another reason is that not all interrelationships are equally important for understanding the economic phenomenon under study.
• The procedure is to pick out the important factors and relationships relevant to our problem and to focus our attention on these alone.
• Such a deliberately simplified analytical framework is called an economic model.
• It is an organized set of relationships that describes the functioning of an economic entity under a set of simplifying assumptions.
• All economic reasoning is ultimately based on models.
Economic models consist of the following three basic structural elements:
1. A set of variables
2. A list of fundamental relationships, and
3. A number of strategic coefficients
Econometric models
• The most important characteristic of economic relationships is that they contain a random element. Mathematical economic models ignore this element because they postulate exact relationships between economic variables.
Example: Economic theory postulates that the demand for a commodity depends on its price, on the prices of other related commodities, on consumers’ income and on tastes.
• Q = b0 + b1P + b2P0 + b3Y + b4t
(P: own price, P0: prices of other related commodities, Y: consumers’ income, t: tastes)
• The above demand equation is exact. However, many more factors may affect demand.
• In econometrics the influence of these ‘other’ factors is taken into account by introducing a random variable, which gives the stochastic form:
Q = b0 + b1P + b2P0 + b3Y + b4t + u
where u stands for the random factors that affect the quantity demanded.
• The random term (also called the error term or disturbance term) is a surrogate for important variables excluded from the model, for errors committed, and for measurement errors.
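A small simulation can make the role of u concrete. The sketch below uses hypothetical "true" values for b0 … b4, hypothetical ranges for P, P0, Y and t, and a normally distributed u. None of these numbers come from the text. Fitting the exact equation returns the coefficients exactly. The stochastic version only returns estimates close to them, and that is the situation econometrics deals with.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 500

# Hypothetical "true" parameters: intercept b0, own price b1, price of
# other goods b2, income b3, tastes b4 (illustrative values only)
true_b = np.array([50.0, -2.0, 0.8, 0.05, 3.0])

P = rng.uniform(5, 15, n)      # own price
P0 = rng.uniform(5, 15, n)     # price of a related commodity
Y = rng.uniform(100, 500, n)   # consumers' income
t = rng.uniform(0, 5, n)       # index of tastes
u = rng.normal(0, 2.0, n)      # random disturbance term

X = np.column_stack([np.ones(n), P, P0, Y, t])
Q_exact = X @ true_b           # exact (mathematical economics) model
Q = Q_exact + u                # stochastic (econometric) model

# Least-squares fits of both versions
b_exact, *_ = np.linalg.lstsq(X, Q_exact, rcond=None)
b_hat, *_ = np.linalg.lstsq(X, Q, rcond=None)

print("true:         ", true_b)
print("from exact Q: ", np.round(b_exact, 3))
print("from noisy Q: ", np.round(b_hat, 3))
```

The exact data reproduce the parameters to machine precision. The noisy data give estimates that scatter around them, and the amount of scatter depends on the variance of u and on the sample size.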
Previously in econometrics class
• Define econometrics
• Objectives of econometrics
• Importance of econometric models
• Desirable properties of econometric models
• Differentiate between economic and econometric models
Methodology of Econometrics
Econometric research is concerned with the measurement of the parameters of economic relationships and with the prediction of the values of economic variables.

The relationships of economic theory that can be measured with econometric techniques are relationships in which some variables are postulated as causes of the variation of other variables.
Stages of econometric research
1. Specification of the model
2. Estimation of the model
3. Evaluation of the estimates
4. Evaluation of the forecasting power of the estimated model
1. Specification of the model
• The relationships between economic variables are expressed in mathematical form.
• This involves three important issues:
1. Determining the dependent and independent (explanatory) variables to be included in the model,
2. Determining a priori theoretical expectations about the size and sign of the parameters of the function, and
3. Determining the mathematical form of the model (number of equations, specific form of the equations, etc.).
• Specification of the econometric model is based on economic theory.
• Specification of the econometric model presupposes knowledge of economic theory and familiarity with the particular phenomenon being studied.
Cont…Specification of the model
• Specification of the model is the most important and the most difficult stage of any econometric research.
• It is often the weakest point of most econometric applications.
• At this stage there is a high likelihood of committing errors or incorrectly specifying the model.
The most common errors of specification are:
1. Omission of some important variables from the function.
2. Omission of some equations (for example, in simultaneous equations models).
3. A mistaken mathematical form of the functions.
Cont…Specification of the model
Some of the common reasons for incorrect specification of econometric models are:
1. imperfections and looseness of statements in economic theories
2. limited knowledge of the factors that are operative in any particular case
3. difficult obstacles presented by data requirements in the estimation of large models
Example 1
• Assume that a doctor is noting the weight of the first patient for each of the ages between seven and seventeen years old.
• This data is shown in table 1.

Table 1: Age and weight
Age (years)   Weight (kg)
7             28.3
8             28.3
9             36.7
10            35.8
11            38.7
12            47.4
13            48.3
14            54.2
15            65.6
16            64.3
17            65.8

• Based on this data, one can assume that the older a person gets, the higher their weight.
• While this is true in this example, a better guess can be made using simple linear regression.
• As shown in figure 1, the mathematical model for this line is:

• Y = 4.22X - 4.01
– Where:
• Y represents the weight variable.
• X represents the age variable.
• This equation can be used to predict a person's weight from their age.

• For instance, what is the weight of a 15-year-old individual according to the linear regression equation?
• The answer is obtained by replacing X, the age variable, with fifteen.
• Therefore, the weight is Y = 4.22 × 15 - 4.01 = 59.29 kg.
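You can check the coefficients quoted above by fitting the least-squares line to the Table 1 data. The sketch below is plain Python. The helper name `ols_line` is hypothetical and chosen for illustration. It uses the usual deviation formulas slope = Σxy / Σx² and intercept = Ȳ - slope·X̄.

```python
# Table 1: age (years) and weight (kg) of the first patient at each age
ages = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
weights = [28.3, 28.3, 36.7, 35.8, 38.7, 47.4, 48.3, 54.2, 65.6, 64.3, 65.8]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

a, b = ols_line(ages, weights)
print(f"Y = {b:.2f}X {a:+.2f}")        # Y = 4.22X -4.01

# Prediction for a 15-year-old, using the rounded coefficients from the text
print(round(4.22 * 15 - 4.01, 2))       # 59.29
```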
Example 2
Demand is a multivariate function:

• Qd = f(P, Po, T, S, I, E, Z)
Where: P is the output price.
Po is the price of other goods and services.
T is the taste of consumers toward the good.
S is the size of the population in the market.
I is the income of consumers.
E is the expectation of consumers about future market conditions.
Z is other factors.
• All things that affect demand work through one of these factors.
• When studying demand, all factors that affect demand except one are kept constant (ceteris paribus), and we determine what happens to demand when the factor under consideration changes.
Tuesday, February 11, 2025 34


Example 3
• The supply of goods or services is affected by several factors. The
factors that influence supply include:
1. The price of the good (P).
2. The level of technology (T).
3. The price of factors of production (Pf).
4. The number of suppliers (S).
5. Expectations (E).
6. Others (Z).
• Everything that affects supply works through one of these
determinants. Supply function is, then, defined as
Qs = f (P, T, Pf, S, E, Z)



2. Estimation of the model
• This is a technical stage that includes the following activities:
i. Gathering data on the variables included in the model.
ii. Examining the identification conditions of the function (especially for simultaneous equations models).
iii. Examining the aggregation problems involved in the variables of the function.
iv. Examining the degree of correlation between the explanatory variables (i.e. examining the problem of multicollinearity).
v. Choosing an appropriate econometric technique for estimation, i.e. deciding which specific econometric method to apply, such as OLS (Ordinary Least Squares) or MLM (Maximum Likelihood Method; not to be confused with multilevel modeling, which is also abbreviated MLM).
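At its simplest, step iv can be checked by looking at pairwise correlations among the explanatory variables. The sketch below builds a hypothetical data set in which consumers' income is constructed to move almost one-for-one with the price of other goods, so the two regressors are nearly collinear. The variable names, the numbers and the 0.8 warning threshold are illustrative assumptions, not rules from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200

price = rng.uniform(5, 15, n)
price_other = rng.uniform(5, 15, n)
# Hypothetical: income built to track price_other closely -> near-collinearity
income = 20 * price_other + rng.normal(0, 2, n)

regressors = np.column_stack([price, price_other, income])
names = ["price", "price_other", "income"]

# Pairwise correlation matrix of the explanatory variables (columns = variables)
corr = np.corrcoef(regressors, rowvar=False)

threshold = 0.8   # illustrative rule of thumb, not a formal test
flagged = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if abs(corr[i, j]) > threshold:
            flagged.append((names[i], names[j], round(float(corr[i, j]), 3)))

print(flagged)
```

Pairwise correlations only detect collinearity between two variables at a time. Collinearity that involves three or more regressors jointly needs other diagnostics.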
3. Evaluation of the estimates
• This stage consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically significant.
• This stage enables us to determine the reliability of the results.
Criteria:
i. Economic a priori criteria: These criteria are determined by economic theory and refer to the size and sign of the parameters of economic relationships.
ii. Statistical criteria (first-order tests): These are determined by statistical theory and aim at evaluating the statistical reliability of the estimates of the parameters of the model. The correlation coefficient test, standard error test, t-test, F-test, and R² test are some of the most commonly used statistical tests.
iii. Econometric criteria (second-order tests): These are set by the theory of econometrics and aim at investigating whether the assumptions of the econometric method employed are satisfied in any particular case.
• The econometric criteria serve as a second-order test (a test of the statistical tests), i.e. they determine the reliability of the statistical criteria. They help us establish whether the estimates have the desirable properties of unbiasedness, consistency, etc.
• Econometric criteria aim at detecting whether the assumptions of the various econometric techniques are violated or hold.
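The statistical criteria in item ii can be illustrated on the age and weight data of Example 1. The sketch below computes three quantities with the standard OLS formulas for a simple regression: the standard error of the slope, its t-statistic for the null hypothesis of a zero slope, and R². It is an illustration, not the course's prescribed software output.

```python
import math

# Example 1 data: age (years) and weight (kg)
ages = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
weights = [28.3, 28.3, 36.7, 35.8, 38.7, 47.4, 48.3, 54.2, 65.6, 64.3, 65.8]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(weights) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
syy = sum((y - y_bar) ** 2 for y in weights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, weights))

b = sxy / sxx                  # OLS slope
a = y_bar - b * x_bar          # OLS intercept

residuals = [y - (a + b * x) for x, y in zip(ages, weights)]
rss = sum(e ** 2 for e in residuals)   # residual sum of squares
sigma2 = rss / (n - 2)                 # unbiased estimate of the error variance
se_b = math.sqrt(sigma2 / sxx)         # standard error of the slope
t_b = b / se_b                         # t-statistic for H0: slope = 0
r2 = 1 - rss / syy                     # coefficient of determination

print(f"b = {b:.3f}, se(b) = {se_b:.3f}, t = {t_b:.2f}, R^2 = {r2:.3f}")
```

With n - 2 = 9 degrees of freedom, the two-tailed 5% critical value of t is about 2.262. A t-statistic of roughly 14 therefore indicates a statistically significant slope, and an R² of about 0.96 indicates a close linear fit.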
4. Evaluation of the forecasting power of the model
• Forecasting is one of the aims of econometric research.
• Before an estimated model is used for forecasting, its predictive power and other requirements need to be checked.
• The model may be economically meaningful and statistically and econometrically correct for the sample period for which it has been estimated, yet still be unsuitable for forecasting.
• This stage involves investigating the stability of the estimates and their sensitivity to changes in the size of the sample.
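One simple way to probe stability is to re-estimate the model on part of the sample and forecast the observations that were left out. This approach is an illustrative assumption rather than a procedure taken from the text. Using the Example 1 data, the sketch estimates the line on ages 7 to 14 only and forecasts ages 15 to 17. The helper name `ols_line` is hypothetical.

```python
# Example 1 data: age (years) and weight (kg)
ages = [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]
weights = [28.3, 28.3, 36.7, 35.8, 38.7, 47.4, 48.3, 54.2, 65.6, 64.3, 65.8]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

a_full, b_full = ols_line(ages, weights)          # all 11 observations
a_sub, b_sub = ols_line(ages[:8], weights[:8])    # ages 7-14 only

# Forecast the three held-out ages with the sub-sample model
forecasts = [a_sub + b_sub * x for x in ages[8:]]
errors = [y - f for y, f in zip(weights[8:], forecasts)]

print(f"slope, full sample: {b_full:.3f}; slope, ages 7-14: {b_sub:.3f}")
for x, y, f in zip(ages[8:], weights[8:], forecasts):
    print(f"age {x}: actual {y:.1f}, forecast {f:.1f}")
```

Dropping the last three observations lowers the slope from about 4.22 to about 3.77, and all three hold-out forecasts fall below the actual weights. Both results suggest the estimates are sensitive to the sample used.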
Desirable Properties of an Econometric Model
• An econometric model is a model whose parameters have been estimated with some appropriate econometric technique.
• The ‘goodness’ of an econometric model is customarily judged on the basis of the following desirable properties.
1. Theoretical plausibility: The model should be
compatible with the postulates of economic theory and
adequately describe the economic phenomena to
which it relates.
2. Forecasting ability: The model should produce
satisfactory predictions of future values of the
dependent (endogenous) variables.
3. Explanatory ability: The model should be able to
explain the observations of the actual world. It must
be consistent with the observed behaviour of the
economic variables whose relationship it determines.
4. Accuracy of the estimates of the parameter: The
estimates of the coefficients should be accurate in
the sense that they should approximate as best as
possible the true parameters of the structural model.
The estimates should, if possible, possess the
desirable properties of unbiasedness, consistency
and efficiency.
5. Simplicity: The model should represent the
economic relationships with maximum simplicity.
The fewer the equations and the simpler their
mathematical form, the better the model
provided that the other desirable properties are
not affected by the simplifications of the model.
Goals of Econometrics
Basically there are three main goals of
Econometrics:
i. Analysis i.e. testing economic theory
ii. Policy making i.e. obtaining numerical estimates
of the coefficients of economic relationships for
policy simulations.
iii. Forecasting i.e. using the numerical estimates of
the coefficients in order to forecast the future
values of economic magnitudes.
Chapter Two

Correlation Theory
2. Correlation Theory
At the end of this chapter we will be able to answer the following questions:
1. What is correlation?
2. What are the methods of measuring correlation?
3. How do we calculate the correlation coefficient of two variables?
Correlation Analysis
• Economic variables have a strong tendency to move together: a change in one variable may, on average, be accompanied by a change in another variable.
• This situation is known as correlation.
• Correlation is defined as the degree of relationship existing between two or more variables.
• The degree of relationship existing between two variables is called simple correlation.
• The degree of relationship connecting three or more variables is called multiple correlation.
• Here we shall examine only simple correlation.
• Partial correlation studies the degree of relationship between two variables while all other variables connected with these two variables are kept constant.
• Linear correlation: all points (X, Y) on the scatter diagram seem to cluster near a straight line.
• A change in one variable brings a constant change in the other.
• Nonlinear correlation: all points seem to lie near a curve.
• A change in one variable brings a varying change in the other.
• Correlation may also be positive or negative.
• Positive correlation: an increase or a decrease in one variable is accompanied by an increase or a decrease in the other, i.e. both variables change in the same direction.
• For example, the correlation between the price of a commodity and its quantity supplied is positive: as price rises, quantity supplied increases, and vice versa.
• Negative correlation: an increase or a decrease in one variable is accompanied by a decrease or an increase in the other, i.e. the variables change in opposite directions.
• For example, the correlation between the price of a commodity and its quantity demanded is negative: as price rises, quantity demanded decreases, and vice versa.
Methods of Measuring Correlation
• Two important things need to be addressed:
1. The type of co-variation that exists between the variables, and
2. its strength, since the types of correlation mentioned before do not show us how strong the co-variation between variables is.
There are three methods of measuring correlation:
1. The Scatter Diagram or Graphic Method
2. The Simple Linear Correlation Coefficient
3. The Coefficient of Rank Correlation
The Scatter Diagram or Graphic Method
• The scatter diagram is a rectangular diagram that helps us visualize the relationship between two phenomena.
• It plots each pair of observations as a point in the X-Y plane.
• It is a non-mathematical method of measuring the degree of co-variation between two variables.
• Scatter plots usually consist of a large body of data.
• The closer the data points come to forming a straight line, the higher the correlation between the two variables, or the stronger the relationship.
• If the data points form a line rising from low x- and y-values to high x- and y-values, the variables have a positive correlation.
• If the line falls from a high value on the y-axis down to a high value on the x-axis, the variables have a negative correlation.
• A perfect positive correlation is given the
value of 1.
• A perfect negative correlation is given the
value of -1.
• If there is absolutely no correlation present
the value given is 0.
• The closer the number is to 1 or -1, the
stronger the correlation, or the stronger the
relationship between the variables.
• The closer the number is to 0, the weaker the
correlation.
• Two variables may have a positive correlation, negative
correlation, or they may be uncorrelated.
• This holds true both for linear and nonlinear
correlation.
• Two variables are said to be positively correlated if
they tend to change together in the same direction,
that is, if they tend to increase or decrease together.
• Such positive correlation is postulated by economic
theory for the quantity of a commodity supplied and
its price.
• When the price increases the quantity supplied
increases. Conversely, when price falls the quantity
supplied decreases.
• Negative correlation: Two variables are said to be negatively correlated if they tend to change in opposite directions:
• when X increases, Y decreases, and vice versa.
• For example, saving and household size are negatively correlated. Similarly, when price increases, demand for the commodity decreases, and when price falls, demand increases.
The Population Correlation Coefficient ‘ρ’ and its Sample Estimate ‘r’
• For a quantitative measurement of the degree of correlation between Y and X,
– a parameter called the correlation coefficient (ρ) is used.
– ρ refers to the correlation of all the values of the population of X and Y.
• Its estimate from any particular sample (the sample statistic for correlation) is denoted by r with the relevant subscripts.
• For example, if we measure the correlation between X and Y, the population correlation coefficient is represented by ρxy and its sample estimate by rxy.
• The simple correlation coefficient is used to measure relationships that are simple and linear only.
• It cannot help us in measuring non-linear or multiple correlations.
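Formula for the sample correlation coefficient

$$r_{xy} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$$

or

$$r = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{\sqrt{\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]\left[n\sum Y_i^2 - \left(\sum Y_i\right)^2\right]}}$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$.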
• Economic theory postulates that price (X) and quantity supplied (Y) are positively correlated.
Example: The quantity supplied of a commodity and the corresponding price values are given below. Determine the type of correlation that exists between these two variables.

Time period (days)   Quantity supplied Yi (tons)   Price Xi (shillings)
1                    10                            2
2                    20                            4
3                    50                            6
4                    40                            8
5                    50                            10
6                    60                            12
7                    80                            14
8                    90                            16
9                    90                            18
10                   120                           20
Table 2: Computations of inputs for correlation coefficients
(xi = Xi - X̄ and yi = Yi - Ȳ)

        Y     X     xi    yi    xi²   yi²     xiyi   XY     X²     Y²
        10    2     -9    -51   81    2601    459    20     4      100
        20    4     -7    -41   49    1681    287    80     16     400
        50    6     -5    -11   25    121     55     300    36     2500
        40    8     -3    -21   9     441     63     320    64     1600
        50    10    -1    -11   1     121     11     500    100    2500
        60    12    1     -1    1     1       -1     720    144    3600
        80    14    3     19    9     361     57     1120   196    6400
        90    16    5     29    25    841     145    1440   256    8100
        90    18    7     29    49    841     203    1620   324    8100
        120   20    9     59    81    3481    531    2400   400    14400
Sum     610   110   0     0     330   10490   1810   8520   1540   47700
Mean    61    11
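Using the raw-score formula:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} = \frac{10(8520) - (110)(610)}{\sqrt{\left[10(1540) - (110)^2\right]\left[10(47700) - (610)^2\right]}} = \frac{18100}{\sqrt{(3300)(104900)}} \approx 0.973$$

Or using the deviation form (Equation 2.2), the correlation coefficient can be computed as:

$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}} = \frac{1810}{\sqrt{(330)(10490)}} \approx 0.973$$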
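Both formulas can be checked numerically on the data of the example. The sketch below is plain Python with no external libraries. Both routes give the same value, r ≈ 0.973.

```python
import math

# Price X (shillings) and quantity supplied Y (tons) from the example
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Y = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]
n = len(X)

# Raw-score formula
num = n * sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y)
den = math.sqrt((n * sum(x * x for x in X) - sum(X) ** 2)
                * (n * sum(y * y for y in Y) - sum(Y) ** 2))
r_raw = num / den

# Deviation formula: x_i = X_i - mean(X), y_i = Y_i - mean(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
dx = [x - x_bar for x in X]
dy = [y - y_bar for y in Y]
r_dev = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy))

print(round(r_raw, 4), round(r_dev, 4))   # 0.9728 0.9728
```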
• This result shows that there is a strong positive correlation between the quantity supplied and the price of the commodity under consideration.
• The simple correlation coefficient always takes a value between -1 and +1.
• Its minimum value is -1 and its maximum value is +1.
• If r = -1, there is perfect negative correlation between the variables.
• If -1 < r < 0, there is negative correlation, which becomes stronger as r moves towards -1.
• If 0 < r < +1, there is positive correlation between the two variables, and moving from zero towards +1 increases the degree of positive correlation.
• If r = +1, there is perfect positive correlation between the two variables.
• If the correlation coefficient is zero, there is no linear relationship between the two variables.
• If the two variables are independent, the correlation coefficient is zero. However, a zero correlation coefficient does not show that the two variables are independent.
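The last point can be demonstrated with a deliberately nonlinear, hypothetical data set: Y = X², with X placed symmetrically around zero. Y is completely determined by X, yet the linear correlation coefficient is exactly zero. The helper name `pearson_r` is hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient in deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: Y depends perfectly on X, but not linearly
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x ** 2 for x in X]
print(pearson_r(X, Y))   # 0.0
```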
Question 1:
Suppose there are two test scores. Find the correlation coefficient.

Paper II   Paper I
110        29
107        32
100        27
96         29
89         25
78         25
67         21
66         26
49         22

Solution

rxy = 0.843
• Therefore, a high positive correlation was observed between the mark results of Paper I and Paper II.
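The solution can be verified with NumPy's `np.corrcoef`, which returns the matrix of pairwise correlation coefficients. Element [0, 1] of that matrix is r between the two inputs. Because r is symmetric, it does not matter which paper is treated as X.

```python
import numpy as np

# Scores from the table above
paper_ii = np.array([110, 107, 100, 96, 89, 78, 67, 66, 49])
paper_i = np.array([29, 32, 27, 29, 25, 25, 21, 26, 22])

r = np.corrcoef(paper_i, paper_ii)[0, 1]
print(round(float(r), 3))   # 0.843
```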
Properties of Simple Correlation Coefficient
• The simple correlation coefficient has the following important properties:
1. The value of the correlation coefficient always ranges between -1 and +1.
2. The correlation coefficient is symmetric, that is, rxy = ryx, where rxy is the correlation coefficient of X with Y and ryx is the correlation coefficient of Y with X.
3. The correlation coefficient is independent of a change of origin and a change of scale.
• By change of origin we mean subtracting or adding a constant from or to every value of a variable.
• By change of scale we mean multiplying or dividing every value of a variable by a positive constant. Multiplying by a negative constant leaves the magnitude of r unchanged but reverses its sign.
4. If X and Y are independent, the correlation coefficient is zero. But the converse is not true.
5. The correlation coefficient has the same sign as the regression coefficients.
6. The correlation coefficient is the geometric mean of the two regression coefficients:

$$r = \pm\sqrt{b_{yx}\, b_{xy}}$$

where the sign is that of the regression coefficients.
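Properties 2, 3 and 6 can be checked numerically on the price and quantity-supplied data used earlier. The transformations applied here are arbitrary choices for illustration: subtracting 11 from X and halving it, and multiplying Y by 10 and adding 5. The helper names `pearson_r` and `slope` are hypothetical.

```python
import math

# Price X (shillings) and quantity supplied Y (tons) from the earlier example
X = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
Y = [10, 20, 50, 40, 50, 60, 80, 90, 90, 120]

def pearson_r(x, y):
    """Sample correlation coefficient in deviation form."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def slope(dep, ind):
    """OLS slope from regressing dep on ind."""
    n = len(ind)
    mi, md = sum(ind) / n, sum(dep) / n
    return (sum((i - mi) * (d - md) for i, d in zip(ind, dep))
            / sum((i - mi) ** 2 for i in ind))

r_xy = pearson_r(X, Y)
r_yx = pearson_r(Y, X)                        # property 2: symmetry

X_new = [(x - 11) / 2 for x in X]             # change of origin and scale
Y_new = [10 * y + 5 for y in Y]
r_new = pearson_r(X_new, Y_new)               # property 3: same value
r_neg = pearson_r([-x for x in X], Y)         # negative factor: sign flips

b_yx = slope(Y, X)                            # regression of Y on X
b_xy = slope(X, Y)                            # regression of X on Y
r_geo = math.copysign(math.sqrt(b_yx * b_xy), b_yx)   # property 6

print(round(r_xy, 4), round(r_yx, 4), round(r_new, 4),
      round(r_neg, 4), round(r_geo, 4))
```

All five printed values have the same magnitude, about 0.973. Only the version with X multiplied by -1 changes sign.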
• Although the correlation coefficient is the most popular measure in applied statistics and econometrics, it has its own limitations.
The major limitations of the method are:
1. The correlation coefficient always assumes a linear relationship, regardless of whether that assumption is true.
2. It does not show cause and effect. For example, a high correlation between smoking and lung cancer does not by itself prove that smoking causes lung cancer.
3. The value of the coefficient is unduly affected by extreme values.
4. The coefficient requires quantitative measurement of both variables. If one of the two variables is not quantitatively measured, the coefficient cannot be computed.
The Rank Correlation Coefficient
• Linear correlation coefficient formulas are based on the assumption that the variables involved are quantitative and that we have accurate data for their measurement.
• In many cases the variables may be qualitative (or binary) and hence cannot be measured numerically.
• For example, profession, education and preferences for particular brands are categorical variables.
• In many cases precise values of the variables may not be available.
• Therefore the formulae developed for linear correlation may not work.
• For such cases another statistic, the rank correlation coefficient (or Spearman’s correlation coefficient), was developed.
• We rank the observations in a specific sequence, for example in order of size or importance, using the numbers 1, 2, 3, …, n.
– We assign ranks to the data and measure the relationship between their ranks instead of their actual numerical values.
• If two variables X and Y are ranked in ascending or descending order, the rank correlation coefficient is computed by the formula

$$r' = 1 - \frac{6\sum D^2}{n\left(n^2 - 1\right)}$$

• Where,
• D = difference between ranks of corresponding pairs of X and Y
• n = number of observations
• The values that r' assumes range from +1 to -1.
Two points should be noted when applying the rank correlation coefficient:
1. It does not matter whether we rank the observations in ascending or descending order, but we must use the same rule of ranking for both variables.
2. If two (or more) observations have the same value, we assign to each of them the mean of the ranks they would otherwise occupy.
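Assigning mean ranks to ties (point 2) is mechanical enough to code. The sketch below ranks values in ascending order, with 1 as the smallest, and gives tied values the average of the ranks they jointly occupy. The function name and the example values are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest).

    Tied values receive the mean of the ranks they jointly occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the block of values tied with order[pos]
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2   # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

print(average_ranks([50, 20, 40, 20, 90]))   # [4.0, 1.5, 3.0, 1.5, 5.0]
```

The two values of 20 would occupy ranks 1 and 2, so each receives 1.5.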
Example: A market researcher asks experts to express their preference for twelve different brands of maize varieties. Find the rank correlation coefficient for the two persons’ preferences.

Maize variety  A   B   C   D   E   F   G   H   I   J   K   L
Person I       9   10  4   1   8   11  3   2   5   7   12  6
Person II      7   8   3   1   10  12  2   6   5   4   11  9

The figures in this table are ranks, not quantities. We have to use the rank correlation coefficient to determine the type of association between the preferences of the two persons.
Computation for rank correlation coefficient

Maize variety  A   B   C   D   E   F   G   H   I   J   K   L   Total
Person I       9   10  4   1   8   11  3   2   5   7   12  6
Person II      7   8   3   1   10  12  2   6   5   4   11  9
Di             2   2   1   0   -2  -1  1   -4  0   3   1   -3
Di²            4   4   1   0   4   1   1   16  0   9   1   9   50

$$r' = 1 - \frac{6\sum D^2}{n\left(n^2 - 1\right)} = 1 - \frac{6(50)}{12\left(12^2 - 1\right)} = 1 - \frac{300}{1716} \approx 0.825$$

Interpretation: r' ≈ 0.825 indicates a strong positive association, i.e. a high degree of similarity between the preferences of the two persons.
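The computation can be reproduced in a few lines of Python. The ranks are copied from the table. The function name `spearman_rank_corr` is hypothetical, and it implements only the shortcut formula given above.

```python
# Ranks given by the two persons to maize varieties A to L (from the table)
person_1 = [9, 10, 4, 1, 8, 11, 3, 2, 5, 7, 12, 6]
person_2 = [7, 8, 3, 1, 10, 12, 2, 6, 5, 4, 11, 9]

def spearman_rank_corr(rank_x, rank_y):
    """Shortcut formula r' = 1 - 6*sum(D^2) / (n*(n^2 - 1)).

    Returns (sum of squared rank differences, r').
    """
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, r_s

sum_d2, r_s = spearman_rank_corr(person_1, person_2)
print(sum_d2, round(r_s, 3))   # 50 0.825
```

When tied mean ranks are present, this shortcut formula is only an approximation. In that case, computing the ordinary correlation coefficient on the ranks gives the exact Spearman coefficient.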
Exercise
• In the table below, you are given the ranks of 10 students in the midterm and final examinations in statistics. Compute Spearman’s coefficient of rank correlation and interpret it.
Partial Correlation Coefficients
• A partial correlation coefficient measures the relationship between any two variables when all other variables connected with those two are kept constant.
• For example, let us assume that we want to measure the correlation between the number of hot drinks (X1) consumed in a summer resort and the number of tourists (X2) coming to that resort.
• Both of these variables are strongly influenced by weather conditions, which we may designate by X3.
• We expect X1 and X2 to be positively correlated: when a large number of tourists arrive at the summer resort, one should expect a high consumption of hot drinks, and vice versa.
• The computation of the simple correlation coefficient between X1 and X2 may not reveal the true relationship connecting these two variables, because of the influence of the third variable, weather conditions (X3).
• The above positive relationship between the number of tourists and the number of hot drinks consumed is expected to hold only if weather conditions can be assumed constant.
• If weather conditions change, the relationship between X1 and X2 may change so much that it even appears negative.
• For instance, if the weather is hot, the number of tourists will be large, but because of the heat they will prefer to consume cold drinks and ice cream rather than hot drinks.
• If we overlook the weather and look only at X1 and X2, we will observe a negative correlation between these two variables, which is explained by the fact that both hot-drink consumption and the number of visitors are affected by heat.
• In order to measure the true correlation between X1
and X2, we must find some way of accounting for
changes in X3.
• This is achieved with the partial correlation coefficient
between X1 and X2, when X3 is kept constant.
• The partial correlation coefficient is determined in
terms of the simple correlation coefficients among the
various variables involved in a multiple relationship.
• In our example there are three simple correlation coefficients:
• r12 = correlation coefficient between X1 and X2
• r13 = correlation coefficient between X1 and X3
• r23 = correlation coefficient between X2 and X3
• The partial correlation coefficient between X1 and X2, keeping the effect of X3 constant, is given by:
$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$
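The sketch below applies this formula to simulated data that mimic the hot-drinks story. The data-generating numbers are purely hypothetical: heat (X3) brings more tourists (X2) but lowers hot-drink consumption (X1) for a given number of tourists. They are chosen only to show that the simple r12 can be negative while r12.3 is positive.

```python
import math
import random


def corr(a, b):
    """Simple (Pearson) correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(r12, r13, r23):
    """r12.3: correlation of X1 and X2 with the effect of X3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical simulation: x3 = heat, x2 = tourists, x1 = hot drinks.
random.seed(1)
x3 = [random.gauss(0, 1) for _ in range(2000)]
x2 = [2 * w + random.gauss(0, 1) for w in x3]                  # heat brings tourists
x1 = [t - 4 * w + random.gauss(0, 1) for t, w in zip(x2, x3)]  # heat cuts hot drinks

r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
print(f"simple  r12   = {r12:.2f}")                          # negative
print(f"partial r12.3 = {partial_corr(r12, r13, r23):.2f}")  # positive
```

With these assumed coefficients the population values are roughly r12 ≈ −0.55 and r12.3 ≈ 0.71, so the sign reversal is not a sampling accident.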
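• Similarly, the partial correlation between X1 and X3, keeping the effect of X2 constant, and the partial correlation between X2 and X3, keeping the effect of X1 constant, are given by:
$$r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}} \qquad \text{and} \qquad r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{(1 - r_{12}^2)(1 - r_{13}^2)}}$$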
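The three formulas share one pattern: the simple correlation of the pair, minus the product of each member's correlation with the held-constant variable, divided by the square-root term. A small sketch (function and variable names are our own) applies that pattern to a hypothetical 3×3 correlation matrix R, where R[i][j] holds the simple correlation between X(i+1) and X(j+1).

```python
import math


def partial_from_matrix(R, i, j, k):
    """Partial correlation of X_i and X_j holding X_k constant (0-based indices)."""
    rij, rik, rjk = R[i][j], R[i][k], R[j][k]
    return (rij - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))


# Hypothetical correlation matrix for X1, X2, X3 (symmetric, ones on the diagonal).
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

r12_3 = partial_from_matrix(R, 0, 1, 2)  # X1, X2 holding X3
r13_2 = partial_from_matrix(R, 0, 2, 1)  # X1, X3 holding X2
r23_1 = partial_from_matrix(R, 1, 2, 0)  # X2, X3 holding X1
print(round(r12_3, 4), round(r13_2, 4), round(r23_1, 4))
```

Only the choice of which index is held constant changes between the three calls, mirroring how the subscripts rotate in the formulas.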

1. The following table gives data on the yield of corn per acre (Y), the amount of fertilizer used (X1) and the amount of insecticide used (X2). Compute the partial correlation coefficient between the yield of corn and the fertilizer used, keeping the effect of insecticide constant.
Data on yield of corn, fertilizer and insecticides used
Year   1971  1972  1973  1974  1975  1976  1977  1978  1979  1980
Y        40    44    46    48    52    58    60    68    74    80
X1        6    10    12    14    16    18    22    24    26    32
X2        4     4     5     7     9    12    14    20    21    24
Computation for partial correlation coefficients (lower-case y, x1, x2 are deviations from the means)
Year    Y   X1   X2     y    x1    x2   x1y   x2y  x1x2   x1²   x2²    y²
1971   40    6    4   -17   -12    -8   204   136    96   144    64   289
1972   44   10    4   -13    -8    -8   104   104    64    64    64   169
1973   46   12    5   -11    -6    -7    66    77    42    36    49   121
1974   48   14    7    -9    -4    -5    36    45    20    16    25    81
1975   52   16    9    -5    -2    -3    10    15     6     4     9    25
1976   58   18   12     1     0     0     0     0     0     0     0     1
1977   60   22   14     3     4     2    12     6     8    16     4     9
1978   68   24   20    11     6     8    66    88    48    36    64   121
1979   74   26   21    17     8     9   136   153    72    64    81   289
1980   80   32   24    23    14    12   322   276   168   196   144   529
Sum   570  180  120     0     0     0   956   900   524   576   504  1634
Mean   57   18   12
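The three simple correlation coefficients used next follow directly from the column sums of this table, through r = Σxy / √(Σx² Σy²) for deviations from the means. A short Python check (our own recomputation; variable names are illustrative):

```python
import math

# Column sums of squared deviations and cross-products from the table above.
sum_y2, sum_x1_2, sum_x2_2 = 1634, 576, 504
sum_x1y, sum_x2y, sum_x1x2 = 956, 900, 524

r_yx1 = sum_x1y / math.sqrt(sum_x1_2 * sum_y2)
r_yx2 = sum_x2y / math.sqrt(sum_x2_2 * sum_y2)
r_x1x2 = sum_x1x2 / math.sqrt(sum_x1_2 * sum_x2_2)

print(round(r_yx1, 4), round(r_yx2, 4), round(r_x1x2, 4))  # 0.9854 0.9917 0.9725
```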
• ryx1 = 0.9854
• ryx2 = 0.9917
• rx1x2 = 0.9725
• Then
$$r_{yx_1.x_2} = \frac{r_{yx_1} - r_{yx_2}\,r_{x_1x_2}}{\sqrt{(1 - r_{yx_2}^2)(1 - r_{x_1x_2}^2)}} = \frac{0.9854 - (0.9917)(0.9725)}{\sqrt{(1 - 0.9917^2)(1 - 0.9725^2)}} = 0.7003$$
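The whole example can be recomputed from the raw data. This sketch (helper name `corr` is our own) computes the simple correlations at full precision and also plugs the four-decimal values quoted in the text into the same formula. Because ryx2 and rx1x2 are both close to 1, the result is sensitive to rounding of the inputs; both versions come out close to 0.70.

```python
import math

# Raw data from the corn table: yield Y, fertilizer X1, insecticide X2 (1971-1980).
Y = [40, 44, 46, 48, 52, 58, 60, 68, 74, 80]
X1 = [6, 10, 12, 14, 16, 18, 22, 24, 26, 32]
X2 = [4, 4, 5, 7, 9, 12, 14, 20, 21, 24]


def corr(a, b):
    """Simple correlation computed from deviations about the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    return sum(p * q for p, q in zip(da, db)) / math.sqrt(
        sum(p * p for p in da) * sum(q * q for q in db))


r_yx1, r_yx2, r_x1x2 = corr(Y, X1), corr(Y, X2), corr(X1, X2)

# Partial correlation of Y and X1, holding X2 constant (full precision).
r_yx1_x2 = (r_yx1 - r_yx2 * r_x1x2) / math.sqrt((1 - r_yx2 ** 2) * (1 - r_x1x2 ** 2))

# Same formula fed with the four-decimal values quoted in the text.
r_rounded = (0.9854 - 0.9917 * 0.9725) / math.sqrt((1 - 0.9917 ** 2) * (1 - 0.9725 ** 2))

print(round(r_yx1_x2, 3), round(r_rounded, 3))
```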
Practice Exercise
• Calculate the simple correlation coefficients ryx1, ryx2 and rx1x2 from the computation table above.
Limitations of the Theory of Linear Correlation
• Correlation analysis has serious limitations as a
technique for the study of economic relationships.
1. The formulae for r apply only when the relationship
between the variables is linear.
• Two variables may be strongly connected through a nonlinear relationship.
• Zero correlation and statistical independence of two variables (X and Y) are not the same thing.
• Zero correlation means zero covariance between X and Y, so that r = 0.
• Statistical independence of X and Y implies that the probability of xi and yi occurring simultaneously is the simple product of the individual probabilities:
• P(x and y) = p(x) p(y)
• Independent variables do have zero covariance and are uncorrelated:
• the linear correlation coefficient between two independent variables is equal to zero.
• However, zero linear correlation does not necessarily imply independence:
• uncorrelated variables may be statistically dependent.
• For example, if X and Y are related so that the observations fall on a circle or on a symmetrical parabola, the relationship is perfect but not linear, and r can be zero.
• The variables are nevertheless statistically dependent.
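The parabola case can be checked numerically. In this hypothetical sketch Y is completely determined by X (Y = X²), yet because the X values are symmetric about zero the linear correlation coefficient is exactly zero.

```python
import math

# Hypothetical data on a symmetric parabola: Y is an exact function of X.
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [x ** 2 for x in X]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n
var_x = sum((x - mx) ** 2 for x in X) / n
var_y = sum((y - my) ** 2 for y in Y) / n
r = cov / math.sqrt(var_x * var_y)

print(r)  # 0.0: no linear correlation, although Y = X**2 exactly
```

Knowing X tells us Y exactly, so the variables are clearly dependent; r simply cannot detect this nonlinear link.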
2. The second limitation of the theory is that, although the correlation coefficient is a measure of the co-variability of variables, it does not necessarily imply any functional relationship between the variables concerned.
• Correlation theory does not establish or prove any causal relationship between the variables.
• It only seeks to discover whether a co-variation exists,
• but it does not suggest that variations in, say, Y are caused by variations in X, or vice versa.
• Knowledge of the value of r alone will not enable us to predict the value of Y from X.
• A high correlation between variables Y and X may describe any one of the following situations:
1. Variation in X is the cause of variation in Y.
2. Variation in Y is the cause of variation in X.
3. Y and X are jointly dependent, or there is a two-way causation; that is to say, Y is determined by X, but X is also determined by Y. For example, in any market q = f(p) but also p = f(q); therefore there is a two-way causation between q and p, or in other words p and q are simultaneously determined.
4. There is another common factor (Z) that affects X and Y in such a way as to show a close relation between them. This often occurs in time series when two variables have strong time trends (i.e., both grow over time). In this case we find a high correlation between Y and X, even though they happen to be causally independent.
5. The correlation between X and Y may be due to chance.
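Case 4 is easy to reproduce. In the hypothetical simulation below, X and Y each follow their own time trend plus independent random noise, so neither causes the other. Their simple correlation is nevertheless very high, while the partial correlation holding time constant (the partial correlation formula given earlier in this chapter) is close to zero. The trend slopes and noise sizes are arbitrary assumptions.

```python
import math
import random


def corr(a, b):
    """Simple correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Hypothetical series: both grow with time t, but their random parts are independent.
random.seed(0)
t = list(range(100))
x = [0.5 * s + random.gauss(0, 3) for s in t]
y = [2.0 * s + random.gauss(0, 10) for s in t]

r_xy = corr(x, y)
r_xt, r_yt = corr(x, t), corr(y, t)
# Partial correlation of X and Y with the common factor (time) held constant.
r_xy_t = (r_xy - r_xt * r_yt) / math.sqrt((1 - r_xt ** 2) * (1 - r_yt ** 2))

print(f"simple r_xy = {r_xy:.2f}, partial r_xy.t = {r_xy_t:.2f}")
```

Here time plays the role of the common factor Z: once its effect is held constant, the apparent association between X and Y largely disappears.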
