Correlation and Regression-1
In this chapter, we discuss correlation analysis, a technique used to quantify the
interrelation between two or more continuous variables. For example, a correlation coefficient
could be computed in a study carried out to find out whether a person’s
height is related to his or her age.
LEARNING OBJECTIVES
CORRELATION
Correlation deals with finding the relationship between two quantitative variables without
being able to infer causal relationships. It is a statistical technique used to determine
the degree to which, and the direction in which, two variables are related. Correlation expresses the
relationship or interdependence of two sets of variables in such a way that
changes in the value of one variable are in sympathy with changes in the other.
Correlation Coefficient is the numerical measurement showing the degree of correlation
between two variables.
Correlation does not mean that a cause-and-effect relationship exists between the two
distributions. Thus, a correlation between two variables does not necessarily imply that one
causes the other. The “cause and effect” assumption is a fallacy known as cum hoc ergo
propter hoc, Latin for "with this, therefore because of this". For example, when we say that
there is a relationship between price and demand, it does not mean that price “causes” demand.
It means only that, as price increases, the quantity demanded tends to decrease, and vice versa.
It is generally assumed that when two variables are correlated, a certain relationship exists
between them. But it is possible for two variables to be correlated statistically
while in practice they are not related at all. For example, there cannot be a practical relationship
between rainfall and the percentage of passes in an examination, even though a correlation
may exist between them. Such a correlation is called spurious correlation, and it arises
by chance.
USEFULNESS OF CORRELATION
Correlation is useful in physical and social sciences. The following are the important uses.
1. Correlation is very useful to economists to study the relationship between variables like
price and quantity demanded. It helps businessmen to estimate costs, sales, price, and
other related variables.
2. Where variables show some kind of relationship, correlation analysis helps in measuring
the degree of that relationship, for example between supply and demand.
3. The relation between variables can be verified and tested for significance, with the help of
the correlation analysis.
4. The coefficient of Correlation is a relative measure, and we can compare the relationship
between variables which are expressed in different units.
5. Sampling error can also be calculated.
6. Correlation is the basis for the concept of regression and ratio of variation.
Types of Correlation
1. Positive and Negative
2. Simple and Multiple
3. Partial and Total
4. Linear and Non-Linear
Correlation is said to be positive when the values of two variables move in the same
direction, so that an increase in the values of one variable is associated with an increase in the
values of the other variable, and a decrease in the values of one variable is associated
with a decrease in the values of the other variable. Correlation is said to be negative when
the values of the two variables move in opposite directions.
Methods of Studying Correlation
The commonly used methods for studying the correlation between two variables are:
1. Graphical Method
a) Scatter diagram
b) Simple graph
2. Mathematical Method
Karl Pearson’s coefficient of correlation
1. a) Scatter Diagram
This is the simplest way of studying the correlation between two distributions: the
values are plotted on a chart known as a scatter diagram. In this method, the given data are plotted on
graph paper in the form of dots. The x values are plotted on the horizontal axis and the y values
on the vertical axis. The scatter of the resulting points
shows the type of correlation.
The following diagrams illustrate the degree and direction of relationship:
Diagram 1 indicates positive correlation as it shows that the values of the two variables
move in the same direction.
Diagram 2 indicates negative correlation as the values of the two variables move in
opposite directions.
Diagram 3 indicates no correlation.
Assumptions Testing
Correlation analysis has the following underlying assumptions:
• Related Pairs– the data should be collected from related pairs: i.e. if you obtain a score on
an X variable, there must be a score on the Y variable from the same subject.
• Scale of Measurement– data should be interval or ratio in nature.
• Normality– the scores for each variable should be normally distributed. For large samples, this
assumption may be relaxed.
• Linearity– the relationship between the two variables must be linear. You should first use a
scatter plot to establish whether the data indicate a linear relationship.
• Homogeneity of Variance– the variability in scores for one variable is roughly the same at
all values of the other variable; i.e. it is concerned with how uniformly the scores cluster
about the regression line. The variance (standard deviation) of X should be roughly the same
as that of Y.
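To determine the numerical value of the coefficient of correlation, the following formula is
used:

$$ r_{x,y} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

$$ r_{x,y} = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \sum_{i=1}^{n}(y_i-\bar{y})^2}} $$

Where;

$$ \mathrm{Cov}(x,y) = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{n-1} $$

and $\sigma_x$ and $\sigma_y$ are the standard deviations of the X and Y variables
respectively.

Note:

$$ \mathrm{Cov}(x,x) = \frac{\sum (X-\bar{X})(X-\bar{X})}{n-1} = \sigma_x^2 \quad\text{and}\quad \mathrm{Cov}(y,y) = \frac{\sum (Y-\bar{Y})(Y-\bar{Y})}{n-1} = \sigma_y^2 $$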
The sign of r denotes the nature of the association, while the value of r denotes the strength of
the association.
If the sign is positive, the relation is direct (an increase in one variable is
associated with an increase in the other variable, and a decrease in one variable is
associated with a decrease in the other variable).
If the sign is negative, the relationship is inverse or indirect (an increase in one
variable is associated with a decrease in the other).
Example 1
Yyome, an economic analyst wanted to find the relationship between inflation rate and prime
lending rate. He, therefore, collected data on inflation rate and lending rate over a seven-year
period. The data below represent the inflation rate (x) and prime lending rate (y) over the
seven-year period.
Compute the product moment correlation coefficient and comment on the results.
Solution 1
        X       Y       xy        x2        y2
        3.3     5.2     17.16     10.89     27.04
        6.2     8.0     49.60     38.44     64.00
        11.0    10.8    118.80    121.00    116.64
        9.1     7.9     71.89     82.81     62.41
        5.8     6.8     39.44     33.64     46.24
        6.5     6.9     44.85     42.25     47.61
        7.6     9.0     68.40     57.76     81.00
Sums    49.5    54.6    410.14    386.79    444.94

$$ r = \frac{7(410.14) - (49.5)(54.6)}{\sqrt{\left[7(386.79) - (49.5)^2\right]\left[7(444.94) - (54.6)^2\right]}} $$

$$ r = \frac{2870.98 - 2702.70}{\sqrt{(2707.53 - 2450.25)(3114.58 - 2981.16)}} $$

$$ r = \frac{168.28}{\sqrt{257.28 \times 133.42}} = \frac{168.28}{185.27} = 0.91 $$
A correlation coefficient of 0.91 shows a very strong positive correlation between inflation
rate (x) and prime lending rate (y).
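The arithmetic in Solution 1 can be checked with a few lines of Python. This is only a minimal sketch of the computational (sums) formula; the function name `pearson_r` is chosen here for illustration, and the data are the X and Y columns of the solution table.

```python
from math import sqrt

def pearson_r(x, y):
    """Product moment correlation using the computational (sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

inflation = [3.3, 6.2, 11.0, 9.1, 5.8, 6.5, 7.6]   # x
lending = [5.2, 8.0, 10.8, 7.9, 6.8, 6.9, 9.0]     # y

r = pearson_r(inflation, lending)
print(round(r, 2))  # 0.91
```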
Example 2
The managers of a company with ten operating plants of similar size producing small
components have observed the following pattern of expenditure on inspection and defective
parts delivered to the customer:
Observation No. 1 2 3 4 5 6 7 8 9 10
Inspection Expenditure/1000 Units 25 30 15 75 40 65 45 24 35 70
Defective parts/1000 Units delivered 50 35 60 15 46 20 28 45 42 22
They are wondering how strong the relationship is between inspection expenditure and the
number of faulty items delivered.
Calculate the product moment correlation coefficient and comment on your results.
Solution
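        X      Y      xy       x2       y2
        25     50     1250     625      2500
        30     35     1050     900      1225
        15     60     900      225      3600
        75     15     1125     5625     225
        40     46     1840     1600     2116
        65     20     1300     4225     400
        45     28     1260     2025     784
        24     45     1080     576      2025
        35     42     1470     1225     1764
        70     22     1540     4900     484
Sums    424    363    12815    21926    15123

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

$$ r = \frac{128150 - 153912}{\sqrt{(219260 - 179776)(151230 - 131769)}} $$

$$ r = \frac{-25762}{\sqrt{768398124}} = \frac{-25762}{27720} = -0.93 $$

A correlation coefficient of -0.93 shows a very strong negative correlation: plants that
spend more on inspection tend to deliver fewer defective parts.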
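The same coefficient can be obtained from the covariance form of the formula, r = Cov(x, y) / (σx σy). The sketch below applies it to the Example 2 data; the helper name `covariance` is an illustrative choice, and `statistics.stdev` is used because it divides by n - 1, matching the definition given earlier.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance with divisor n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

inspection = [25, 30, 15, 75, 40, 65, 45, 24, 35, 70]   # expenditure per 1000 units
defects = [50, 35, 60, 15, 46, 20, 28, 45, 42, 22]      # defective parts per 1000 units

r = covariance(inspection, defects) / (stdev(inspection) * stdev(defects))
print(round(r, 2))  # -0.93
```

The negative sign comes out of the calculation directly, which makes this form a useful check on hand-worked sums.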
APPLICATIONS IN BUSINESS
Review
Correlation is a statistical measure of the relationship between two series of numbers
representing data.
Positively Correlated items move in the same direction.
Negatively Correlated items move in opposite directions.
Correlation Coefficient is a measure of the degree of correlation between two series of
numbers representing data.
Keynotes
To reduce overall risk in a portfolio, it is best to combine assets that have a negative (or
low-positive) correlation.
Uncorrelated assets reduce risk somewhat, but not as effectively as combining negatively
correlated assets.
Investing in different investments with high positive correlation will not provide
sufficient diversification.
Consider a portfolio of three assets A, B and C with returns a, b and c respectively. Assuming
that they are equally weighted, the covariances of returns for all possible pairs of assets can
be presented in the covariance matrix
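$$ \begin{pmatrix} \mathrm{cov}(a,a) & \mathrm{cov}(a,b) & \mathrm{cov}(a,c) \\ \mathrm{cov}(b,a) & \mathrm{cov}(b,b) & \mathrm{cov}(b,c) \\ \mathrm{cov}(c,a) & \mathrm{cov}(c,b) & \mathrm{cov}(c,c) \end{pmatrix} $$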
Example
Miss Ewurama Gyamfuaa wants to invest part of her student grant and she is considering any
two of the following investment opportunities: A, B, C and D. The covariance matrix of the
historic returns of these investment opportunities is:
A B C D
Her interest is to diversify the investment to minimize the risk. Compute the correlation
matrix of these investments and advise Ewurama on the best combination.
Solution
$$ \mathrm{Corr}(A,B) = \frac{\mathrm{Cov}(A,B)}{SD_A \, SD_B} = \frac{\mathrm{Cov}(A,B)}{\sqrt{\mathrm{Cov}(A,A)\,\mathrm{Cov}(B,B)}} $$

     A         B         C         D
A    1         0.2330    0.9836    0.1895
B    0.2330    1         0.2672    0.1979
C    0.9836    0.2672    1         0.20470
D    0.1895    0.1979    0.20470   1
Miss Ewurama can reduce or minimize risk by investing in the pair of investments with the most
negative correlation. Hence, she should invest in .
This coefficient is also known as Spearman's rank correlation coefficient. It is an alternative
method of measuring correlation and is based on the ranks of the item values rather than on
the values themselves.
Example 3
A group of 8 business students were tested in Quantitative Methods and Cost Accounting.
Their rankings in the two tests were:
Student                          A  B  C  D  E  F  G  H
Quantitative Methods (Ranking)   2  7  6  1  4  3  5  8
Cost Accounting (Ranking)        3  6  4  2  5  1  8  7
Calculate Spearman’s rank correlation coefficient for the two sets of ranks and comment
on the results.
Solution
Student   QM Ranking (Rx)   C.A. Ranking (Ry)   d = Rx - Ry   d2
A         2                 3                   -1            1
B         7                 6                   1             1
C         6                 4                   2             4
D         1                 2                   -1            1
E         4                 5                   -1            1
F         3                 1                   2             4
G         5                 8                   -3            9
H         8                 7                   1             1
                                                Sum of d2 =   22

$$ r = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 22}{8(8^2-1)} = 0.74 $$
The rank correlation coefficient of 0.74 shows a strong positive relation between students’
performances in the two tests.
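As a quick check on Example 3, the rank formula can be written as a short function. This is a minimal sketch that assumes the ranks are already assigned and contain no ties; the name `spearman_from_ranks` is illustrative.

```python
def spearman_from_ranks(rx, ry):
    """Spearman's coefficient r = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied ranks."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

qm_rank = [2, 7, 6, 1, 4, 3, 5, 8]   # Quantitative Methods
ca_rank = [3, 6, 4, 2, 5, 1, 8, 7]   # Cost Accounting

r_s = spearman_from_ranks(qm_rank, ca_rank)
print(round(r_s, 2))  # 0.74
```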
Example 4
A national consumer protection society investigated seven brands of paint to determine their
quality relative to price. The society’s conclusions were ranked according to the following
table:
Brand A B C D E F G
Price/Litre (x) 192 158 135 160 205 139 177
Quality ranking (Ry) 2 6 7 4 3 5 1
Using Spearman’s rank correlation coefficient, determine whether the consumer generally
gets value for money.
Solution 4
Ranking for quality has already been done. Therefore price/litre must be ranked (highest price
first) so that we can use Spearman’s formula.
Brand   Rx   Ry   d = Rx - Ry   d2
A       2    2    0             0
B       5    6    -1            1
C       7    7    0             0
D       4    4    0             0
E       1    3    -2            4
F       6    5    1             1
G       3    1    2             4
                  Sum of d2 =   10

$$ r = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6 \times 10}{7(7^2-1)} = 0.821 $$
A coefficient of 0.82 shows a high degree of positive correlation which means that in general
the consumer gets value for money.
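The ranking step in Solution 4 can also be automated. The sketch below ranks the prices from highest to lowest (rank 1 for the highest price, as in the solution table) and then applies Spearman's formula. It assumes there are no tied prices; `rank_descending` is an illustrative helper name.

```python
def rank_descending(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

price = [192, 158, 135, 160, 205, 139, 177]   # brands A to G
quality_rank = [2, 6, 7, 4, 3, 5, 1]

price_rank = rank_descending(price)
n = len(price)
sum_d2 = sum((a - b) ** 2 for a, b in zip(price_rank, quality_rank))
r_s = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(price_rank)        # [2, 5, 7, 4, 1, 6, 3]
print(round(r_s, 3))     # 0.821
```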
TIED RANKINGS
If one or more groups of data items have the same value (known as tied values), the ranks that
would have been allocated separately must be averaged and this average rank given to each
item with this equal value. For example, the five numbers 8, 14, 14, 19, 21 would be
allocated ranks 1, 2.5, 2.5, 4, 5 respectively (since two items have value 14, each must be
allocated the average of ranks 2 and 3).
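The averaging rule can be sketched in code as follows. The function name `average_ranks` is illustrative, and ranks are assigned in ascending order to match the example above (the smallest value receives rank 1).

```python
def average_ranks(values):
    """Ascending ranks, with tied values sharing the mean of their positions."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # first rank position occupied by v
        count = ordered.count(v)          # number of items tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks

print(average_ranks([8, 14, 14, 19, 21]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```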
Example 5
The Department of Public Health under the auspices of the Ministry of Health investigated
the age, weight and diastolic blood pressure of nine women, with the following results:
Age (Years)                     69  33  27  45   58  24  51  35  21
Weight (Kg)                     64  70  60  102  75  67  76  55  67
Blood Pressure (mm of mercury)  85  85  70  85   75  85  80  60  55
Solutions
Rank all the variables: Rx for rank of Age; Ry for rank of weight; and Rz for rank of Blood
pressure
The correlation coefficient of 0.45 in (ii) shows a moderate positive relation between weight
and blood pressure.
It can be said from the results of (i) & (ii) that blood pressure rises as age and weight
increase.
CORRECTION OF TIED RANKS
From a practical viewpoint, it is often not worth correcting for ties. Use of the correction is advised
if
i) Three or more observations are tied at the same value
ii) The number of pairs of ties is more than ¼ of the number of observations.
$$ r = 1 - \frac{6\left[\sum d^2 + \sum_{i=1}^{j} \frac{m_i(m_i^2-1)}{12}\right]}{N(N^2-1)} $$

Where m is the number of equal observations with a common rank and j is the total number of ties.
Example: the table below presents the price (P) and quantity demanded (Q) of a commodity.
Calculate the coefficient of rank correlation between P and Q.
P 80 78 75 75 68 57 60 59
Q 110 111 114 114 114 116 115 117
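$$ r = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2-1)}{12} + \frac{m_2(m_2^2-1)}{12}\right]}{N(N^2-1)} $$

$$ r = 1 - \frac{6\left[159.50 + \frac{2(2^2-1)}{12} + \frac{3(3^2-1)}{12}\right]}{8(8^2-1)} $$

$$ r = 1 - \frac{6(159.50 + 0.5 + 2.0)}{8(64-1)} $$

$$ r = 1 - \frac{6(162)}{504} $$

$$ r = 1 - 1.929 = -0.929 $$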
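The tie-corrected calculation can be reproduced with the sketch below. It assumes ranks are assigned from the largest value downwards with tied values sharing the average rank, which gives the Σd² = 159.5 used in the working; the function names are illustrative.

```python
from collections import Counter

def average_ranks_desc(values):
    """Rank 1 for the largest value; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_tie_corrected(x, y):
    n = len(x)
    rx, ry = average_ranks_desc(x), average_ranks_desc(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # One correction term m(m^2 - 1)/12 for every group of m tied values.
    correction = sum(m * (m ** 2 - 1) / 12
                     for series in (x, y)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

P = [80, 78, 75, 75, 68, 57, 60, 59]
Q = [110, 111, 114, 114, 114, 116, 115, 117]

r_s = spearman_tie_corrected(P, Q)
print(round(r_s, 3))  # -0.929
```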
DEMERITS OF RANK CORRELATION METHOD
1. It is applied to ungrouped data only
2. The ranking procedure ignores the actual magnitude of the data and as such the results
obtained are only approximate
3. Computation becomes difficult as the number of paired observations increases.
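The coefficient of concurrent deviations is given by:

$$ r_c = +\sqrt{\frac{2C-n}{n}}, \quad \text{if } 2C-n \ge 0 $$

$$ r_c = -\sqrt{-\frac{2C-n}{n}}, \quad \text{if } 2C-n < 0 $$

Where:
C = the number of positive concurrent deviations
n = the number of pairs of deviations compared
N = the number of paired observations; n is one less than N (n = N - 1)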
STEPS
(1) Determine the deviations of the paired series (Dx and Dy) by comparing each value
with the one preceding it. If the value is greater than the preceding value, the
deviation is taken as positive (+); if it is smaller, negative (-); and if it is equal to
the preceding value, the deviation is zero (0).
(2) Multiply the corresponding deviations (Dx × Dy) to get the concurrent deviations.
(3) Count the number of positive concurrent deviations, C.
(4) Find n = N - 1.
(5) If 2C - n is positive, use $r_c = +\sqrt{(2C-n)/n}$; if 2C - n is negative, use
$r_c = -\sqrt{-(2C-n)/n}$.
Example
The data below relate to the prices and imports of Ayedwe Ltd. Determine the correlation
coefficient using the concurrent deviation method.
Price    368  384  385  361  347  384  395  403  400  385
Imports  22   21   24   20   22   26   24   29   28   27
Solution
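Price (X)   Deviation Dx   Imports (Y)   Deviation Dy   Concurrent Deviation (Dx × Dy)
368                        22
384         +              21            -              -
385         +              24            +              +
361         -              20            -              +
347         -              22            +              -
384         +              26            +              +
395         +              24            -              -
403         +              29            +              +
400         -              28            -              +
385         -              27            -              +
C = 6
n = N - 1 = 10 - 1 = 9
Since 2C - n = 12 - 9 = 3 is positive,

$$ r_c = +\sqrt{\frac{2C-n}{n}} = \sqrt{\frac{3}{9}} = 0.58 $$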
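The steps of the concurrent deviation method translate directly into code. The sketch below uses the price and import figures as they appear in the solution table; `sign` and `concurrent_deviation` are illustrative helper names.

```python
from math import sqrt

def sign(value):
    """Return +1, -1 or 0 according to the direction of change."""
    return (value > 0) - (value < 0)

def concurrent_deviation(x, y):
    dx = [sign(b - a) for a, b in zip(x, x[1:])]   # step 1 for series x
    dy = [sign(b - a) for a, b in zip(y, y[1:])]   # step 1 for series y
    n = len(dx)                                    # n = N - 1
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)  # positive concurrent deviations
    ratio = (2 * c - n) / n
    return sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)

price = [368, 384, 385, 361, 347, 384, 395, 403, 400, 385]
imports = [22, 21, 24, 20, 22, 26, 24, 29, 28, 27]

r_c = concurrent_deviation(price, imports)
print(round(r_c, 2))  # 0.58
```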
REGRESSION ANALYSIS
Regression analysis is of great practical use, even more so than correlation analysis.
Correlation and regression are directed towards a common purpose: establishing the degree and
the direction of the relationship between two or more variables. The methods of doing so are
different, however, and the choice of one or the other depends on the purpose. In spite of certain
similarities between the two, there are some basic differences in the two approaches,
which are summarized below:
CORRELATION
1. Correlation literally means related or sympathetic movements between the variables.
2. There is a sort of interdependence, which is mutual.
3. There is no cause and effect relationship. It only shows the existence of some
association in the movement of variables.
4. It may be spurious correlation if the sympathetic movement is on account of the
influence of an outside variable which has no relevance.
5. It is a relative measure showing association between variables.
6. It is used only for testing and verification of the relationship. It tenders only
limited information.
7. It is not very useful for further mathematical treatment.

REGRESSION
1. Regression literally means return to normal, which is true on account of the average
of relationship.
2. It establishes a functional relationship, which is mathematical, showing dependence of
one variable on the other.
3. It may have a cause and effect relationship.
4. It is a mathematical relationship, which should be interpreted suitably.
5. It is an absolute measure of relationship.
6. Besides verification it can also be used for estimation and prediction. It tenders more
comprehensive information.
7. It is very useful for further mathematical treatment.
The process of obtaining a linear regression equation for a given set of (bivariate) data is
often referred to as fitting a regression line.
There are 3 main methods commonly used to fit a regression line to a given set of bivariate
data. These are
(a) By inspection: This is the simplest method and consists of plotting a scatter diagram
for the relevant data and then drawing in the line that most suitably fits the data.
A scatter diagram is a chart that portrays the relationship between two variables. The
mean point of the data should be plotted and the regression line drawn so that it passes
through this point.
This method suffers from the defect that it does not give a unique line. Different people would
probably draw different lines using the same data.
(b) By semi-averages: This technique consists of splitting the data into two equal groups,
plotting the mean point for each group and joining these two points with a straight
line.
(c) By method of least squares: The most generally applied curve-fitting technique in
regression analysis is the method of least squares. This method imposes the
requirement that the sum of the squares of the deviations of the observed values of the
dependent variable from the corresponding computed values on the regression line
must be a minimum. Thus, if a straight line is fitted to a set of data by the method of
least squares, it is a “best fit” in the sense that the sum of the squared deviations
$\sum (y - \hat{y})^2$ is the least compared with any other possible straight line. Another useful
characteristic of the least squares straight line is that it passes through the point of
means $(\bar{x}, \bar{y})$ and therefore makes the total of the positive and negative deviations
equal to 0.
The elementary form of a straight line, $y = a + bx$, is used, where ‘a’ is a constant and
indicates the y-intercept; ‘b’ is also a constant and indicates the gradient of the line.
The values of a and b are obtained by solving these two simultaneous equations

$$ \sum y = an + b\sum x \qquad \text{(i)} $$

$$ \sum xy = a\sum x + b\sum x^2 \qquad \text{(ii)} $$

which are called the normal equations. Derived from (i) and (ii) are the following
computational formulae for finding a and b.

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \quad\text{and}\quad a = \frac{\sum y}{n} - b\frac{\sum x}{n} $$

Note:
The regression coefficient and the constant (a) can also be computed as follows when the
correlation coefficient (r) and the standard deviations of x ($\sigma_x$) and y ($\sigma_y$) are known:
Regression equation of y on x: $b_{yx} = r(\sigma_y / \sigma_x)$ and $a_{yx} = \bar{y} - b_{yx}\bar{x}$
Regression equation of x on y: $b_{xy} = r(\sigma_x / \sigma_y)$ and $a_{xy} = \bar{x} - b_{xy}\bar{y}$
Example:
A study was conducted to find the relationship between years of experience (x) and
monthly salary (y), in thousands of cedis, earned by technicians in a very large company. The
data below give the results for a sample of 12 technicians covered by the study:
X 12 16 6 23 27 8 5 19 23 13 16 8
Y 580 580 460 680 760 480 440 680 720 540 660 540
Determine the regression equation of Y on X by the method of least squares and use your
results to estimate the salary of technician with 15 years’ experience.
Solution
Calculations needed for determining the least squares regression equation
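        x      y       xy        x2      y2
        12     580     6960      144     336400
        16     580     9280      256     336400
        6      460     2760      36      211600
        23     680     15640     529     462400
        27     760     20520     729     577600
        8      480     3840      64      230400
        5      440     2200      25      193600
        19     680     12920     361     462400
        23     720     16560     529     518400
        13     540     7020      169     291600
        16     660     10560     256     435600
        8      540     4320      64      291600
Sums    176    7120    112580    3162    4348000

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{12(112580) - (176)(7120)}{12(3162) - (176)^2} = \frac{97840}{6968} = 14.04 $$

$$ a = \frac{\sum y}{n} - b\frac{\sum x}{n} = \frac{7120}{12} - 14.04\left(\frac{176}{12}\right) = 593.33 - 205.92 = 387.41 $$

The equation is $\hat{y} = 387.41 + 14.04x$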
Note: The b value of 14.04 means that for each additional year the salary of a technician
is expected to increase by about GH¢14040.
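The least squares calculation, and the estimate the example asks for, can be sketched in Python. The function name `least_squares` is illustrative. Because the code carries b at full precision instead of rounding it to 14.04, the intercept comes out as about 387.39 instead of 387.41.

```python
def least_squares(x, y):
    """Return intercept a and slope b of the least squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

years = [12, 16, 6, 23, 27, 8, 5, 19, 23, 13, 16, 8]
salary = [580, 580, 460, 680, 760, 480, 440, 680, 720, 540, 660, 540]  # thousands of cedis

a, b = least_squares(years, salary)
estimate_15 = a + b * 15   # estimated salary at 15 years' experience

# The residuals of a least squares line sum to (numerically) zero.
residual_sum = sum(y - (a + b * x) for x, y in zip(years, salary))

print(round(b, 2), round(a, 2), round(estimate_15, 1))  # 14.04 387.39 598.0
```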
Example 2
Apuskeleke, an economic analyst wanted to find the relationship between inflation rate and
prime lending rate. He, therefore, collected data on inflation rate and lending rate over a
seven-year period. The data below represent the inflation rate (x) and prime lending rate (y)
over the seven-year period.
x 3.3 6.2 11.0 9.1 5.8 6.5 7.6
y 5.2 8.0 10.8 7.9 6.8 6.9 9.0
Find the line of best fit for predicting the prime lending rate from the inflation rate. Use your
results to predict the prime lending rate when the inflation rate is 10.5.
Solution 2
Calculations needed to find the line of best fit
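        x       y       xy        x2
        3.3     5.2     17.16     10.89
        6.2     8.0     49.60     38.44
        11.0    10.8    118.80    121.00
        9.1     7.9     71.89     82.81
        5.8     6.8     39.44     33.64
        6.5     6.9     44.85     42.25
        7.6     9.0     68.40     57.76
Sums    49.5    54.6    410.14    386.79

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{2870.98 - 2702.70}{2707.53 - 2450.25} = \frac{168.28}{257.28} = 0.654 $$

$$ a = \frac{\sum y}{n} - b\frac{\sum x}{n} = \frac{54.6}{7} - 0.654\left(\frac{49.5}{7}\right) = 7.80 - 4.62 = 3.18 $$

$$ \hat{y} = 3.18 + 0.654x $$

When x = 10.5:

$$ \hat{y} = 3.18 + 0.654(10.5) = 3.18 + 6.867 = 10.05 $$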
The prime lending rate at inflation rate of 10.5 is 10.05.
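As an alternative check, the two normal equations can be solved directly as a pair of simultaneous equations instead of through the computational formulae. The sketch below does this with Cramer's rule, using the column sums from the table; the variable names are illustrative. Small differences from the hand-worked figures arise because the code does not round intermediate values.

```python
# Column sums from the table (n = 7 observations).
n = 7
sum_x, sum_y = 49.5, 54.6
sum_xy, sum_x2 = 410.14, 386.79

# Normal equations:
#   sum_y  = a*n     + b*sum_x
#   sum_xy = a*sum_x + b*sum_x2
determinant = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / determinant
b = (n * sum_xy - sum_x * sum_y) / determinant

prediction = a + b * 10.5   # prime lending rate at an inflation rate of 10.5

print(round(a, 2), round(b, 3), round(prediction, 2))  # 3.17 0.654 10.04
```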
COEFFICIENT OF DETERMINATION
Example: If the correlation coefficient between x and y is 0.8, the coefficient of determination,
$r^2$, will be 0.64. This implies that 64% of the variation in y is explained by the variation in x and
the remaining 36% is explained by other factors. The quantity $1 - r^2$ is referred to as the coefficient
of non-determination. The square root of the coefficient of non-determination is known as the
coefficient of alienation.
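The three quantities in this example follow from r in one line each, as the short sketch below shows for r = 0.8 (variable names are illustrative).

```python
from math import sqrt

r = 0.8
determination = r ** 2                  # share of variation in y explained by x
non_determination = 1 - determination   # share left unexplained
alienation = sqrt(non_determination)    # coefficient of alienation

print(round(determination, 2), round(non_determination, 2), round(alienation, 2))
# 0.64 0.36 0.6
```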
Implications
1. $r = \sqrt{b_{yx} \times b_{xy}}$
2. If $b_{yx}$ is positive then $b_{xy}$ should also be positive, and vice versa.
3. If one regression coefficient is greater than one, the other must be less than one.
4. The coefficient of correlation will have the same sign as that of the regression
coefficients.
5. The arithmetic mean of $b_{yx}$ and $b_{xy}$ is equal to or greater than the coefficient of correlation:

$$ \frac{b_{yx} + b_{xy}}{2} \ge r $$

6. Regression coefficients are independent of origin but not of scale.
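Several of these implications can be checked numerically. The sketch below uses the inflation rate and prime lending rate data from the earlier examples; the variable names are illustrative, and the check is a spot test on one data set, not a proof.

```python
from math import sqrt

x = [3.3, 6.2, 11.0, 9.1, 5.8, 6.5, 7.6]    # inflation rate
y = [5.2, 8.0, 10.8, 7.9, 6.8, 6.9, 9.0]    # prime lending rate
n = len(x)

s_xy = n * sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y)
s_xx = n * sum(p * p for p in x) - sum(x) ** 2
s_yy = n * sum(q * q for q in y) - sum(y) ** 2

b_yx = s_xy / s_xx            # regression coefficient of y on x
b_xy = s_xy / s_yy            # regression coefficient of x on y
r = s_xy / sqrt(s_xx * s_yy)  # correlation coefficient

print(abs(r - sqrt(b_yx * b_xy)) < 1e-9)   # implication 1
print(b_yx > 0 and b_xy > 0 and r > 0)     # implications 2 and 4
print(b_xy > 1 and b_yx < 1)               # implication 3
print((b_yx + b_xy) / 2 >= r)              # implication 5
```

Each line prints True for this data set.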
APPLICATIONS
Illustration 1
Prices of inputs for general goods and services are shifting upward as a result of the
general increase in the prices of petroleum products in 2015, and this has led to an increase in the unit
prices of goods and services.
Trustee, the operator of Koko Burger in Cape Coast Metro, has contracted AugBlay Advisory
Services to advise him on expected changes in demand and revenue. AugBlay Services has
ascertained a very strong negative correlation of -0.95 between the price of and demand for Koko Burger.
Weekly demand and associated prices were sampled for a given period, and descriptive statistics of
the sample showed standard deviations of demand and price of 165.23 and 8.32 respectively.
Also, demand and price averaged 403.22 and 15.63 respectively.
SOLUTION
(a) The regression equation relating quantity sold to price can be stated as

$$ Q = a + bP $$

where Q is the quantity sold, P is the unit price, b is the regression coefficient (slope) and
a is the regression intercept.
Recall that

$$ b_{QP} = r\frac{\sigma_Q}{\sigma_P} $$

where r is the correlation coefficient between Q and P, and $\sigma_Q$ and $\sigma_P$ are
the standard deviations of Q and P respectively.
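A minimal sketch of how the slope and intercept follow from the summary statistics quoted in the illustration. It assumes the intercept is obtained from the means as a = mean(Q) - b × mean(P), in line with the least squares formulae given earlier; the variable names and the demand figure at a price of 24.64 are illustrative and computed here, not taken from the original solution.

```python
r = -0.95                      # correlation between demand and price
sd_q, sd_p = 165.23, 8.32      # standard deviations of demand and price
mean_q, mean_p = 403.22, 15.63 # average demand and average price

b = r * sd_q / sd_p            # regression coefficient of Q on P
a = mean_q - b * mean_p        # intercept (assumed: line passes through the means)

def expected_demand(price):
    """Expected weekly demand at a given unit price, from Q = a + b*P."""
    return a + b * price

print(round(b, 2), round(a, 2))          # -18.87 698.1
print(round(expected_demand(24.64), 1))  # 233.2
```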
(b) At P = GHC22:
The expected price is $P_1 = 1.12P = 24.64$, because prices are expected to increase by 12%.
Illustration 2
Apraku, an importer of stationery materials, is considering cutting down the quantity of goods
he imports to reduce the associated cost of the quarterly imports. The table below shows the
quarterly imports and importation cost over the past five years.
c. Estimate the regression equation of import volumes on importation cost and interpret your
results.
d. Determine the coefficients of non-determination and alienation and interpret them.
e. Given that the relationship between import volume (V) and importation cost (C) is such
that $C = \alpha V^{\beta}$,
i. find $\alpha$ and $\beta$
ii. find the value of C when V = 1000
Solution
Apraku’s consideration of cutting down the quantity of good he imports to reduce the
associated cost of the quarterly imports is an indication of suspicion of some possible
association between import volumes and importation cost. This may be validated by
examining the correlation between import volumes and importation cost. We will therefore
calculate the correlation coefficient between import volumes and importation cost.
Let:
X = Import Volume
Y = Importation Cost
Year Quarter X Y XY X2 Y2
GHC GHC GHC GHC GHC
2009 1 500 1000 500000 250000 1000000
2 511 1020 521220 261121 1040400
3 658 1300 855400 432964 1690000
4 450 1200 540000 202500 1440000
2010 1 560 1560 873600 313600 2433600
2 750 1580 1185000 562500 2496400
3 800 1700 1360000 640000 2890000
4 510 1900 969000 260100 3610000
2011 1 580 2000 1160000 336400 4000000
2 590 2100 1239000 348100 4410000
3 600 2300 1380000 360000 5290000
4 450 1500 675000 202500 2250000
2012 1 600 1800 1080000 360000 3240000
2 680 1566 1064880 462400 2452356
3 780 1800 1404000 608400 3240000
4 590 1890 1115100 348100 3572100
2013 1 456 1500 684000 207936 2250000
2 560 2300 1288000 313600 5290000
3 800 2400 1920000 640000 5760000
4 600 2300 1380000 360000 5290000
TOTAL 12025 34716 21194200 7470221 63644856
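(a) Correlation coefficient

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

$$ = \frac{20(21194200) - (12025)(34716)}{\sqrt{\left[20(7470221) - 12025^2\right]\left[20(63644856) - 34716^2\right]}} $$

$$ = \frac{6424100}{\sqrt{4803795 \times 67696464}} \approx \frac{6424100}{18033301} = 0.356 $$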
A correlation coefficient of 0.36 shows a low positive correlation between import volume and
importation costs. This is enough evidence to support Apraku’s view.
(b) The regression equation of importation costs (Y) on import volumes (X) could thus be
stated as:
$Y = a + bX$, where a = regression intercept; and
b = regression coefficient/slope
We can recall that

$$ b_{YX} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} \quad\text{and}\quad a = \frac{\sum Y}{n} - b\frac{\sum X}{n} $$

$$ b = \frac{20(21194200) - (12025)(34716)}{20(7470221) - 12025^2} = \frac{6424100}{4803795} = 1.3373 $$

$$ a = \frac{34716}{20} - 1.3373\left(\frac{12025}{20}\right) = 931.75 $$

The resultant regression equation could therefore be defined as:

$$ Y = 931.75 + 1.3373X $$
(c) Similarly, the regression equation of import volumes (X) on importation costs (Y) could
be stated as: $X = a + bY$
Given that $r = \sqrt{b_{xy} \times b_{yx}}$,

$$ b_{xy} = \frac{r^2}{b_{yx}} = \frac{0.3562^2}{1.3373} = 0.0949 $$

Similarly,

$$ a = \frac{\sum X}{n} - b\frac{\sum Y}{n} = \frac{12025}{20} - 0.0949\left(\frac{34716}{20}\right) = 436.523 $$

The regression equation for import volume on importation cost could thus be defined as:

$$ X = 436.523 + 0.0949Y $$
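Both regression lines, and the link between them and r, can be reproduced from the column totals of the table. This is a sketch only; the variable names are illustrative and the totals are taken from the TOTAL row above.

```python
from math import sqrt

n = 20
sum_x, sum_y = 12025, 34716                       # import volume, importation cost
sum_xy, sum_x2, sum_y2 = 21194200, 7470221, 63644856

s_xy = n * sum_xy - sum_x * sum_y
s_xx = n * sum_x2 - sum_x ** 2
s_yy = n * sum_y2 - sum_y ** 2

# Regression of Y on X
b_yx = s_xy / s_xx
a_yx = sum_y / n - b_yx * sum_x / n

# Regression of X on Y
b_xy = s_xy / s_yy
a_xy = sum_x / n - b_xy * sum_y / n

r = sqrt(b_yx * b_xy)   # valid here because both coefficients are positive

print(round(b_yx, 4), round(a_yx, 2))   # 1.3373 931.75
print(round(b_xy, 4), round(a_xy, 1))   # 0.0949 436.5
print(round(r, 3))                      # 0.356
```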
(d) (i) Coefficient of non-determination = $1 - r^2$, where r = correlation coefficient = 0.36 [(a)
above]
Coefficient of non-determination = $1 - 0.36^2 = 0.8704$
This implies that 87.04% of any variation in importation cost is not explained by variation
in import volumes.
(e) $C = \alpha V^{\beta}$
Applying log10 to both sides, the above equation can be restated as:

$$ \log C = \log \alpha + \beta \log V $$
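Recall that,

$$ b_{CV} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} $$

where X = log V, Y = log C, and $\beta = b_{CV}$.
Also,

$$ a = \frac{\sum Y}{n} - b\frac{\sum X}{n} $$

where $a = \log \alpha$ and $b = b_{CV}$.

$$ a = \frac{64.52533}{20} - 0.569\left(\frac{55.44038}{20}\right) = 1.649 $$

Since $a = \log \alpha$, $\alpha = \text{antilog } a$

$$ \alpha = \text{antilog } 1.649 = 44.566 $$

i. Thus, $\alpha = 44.57$ and $\beta = 0.57$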
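The log transformation can be sketched in Python using the quarterly figures from the table. The model form C = αV^β and the symbols α and β are assumed from the working above, and the variable names are illustrative. Because the code carries full precision instead of rounding β to 0.569, its value of α differs slightly from the hand-worked 44.57.

```python
from math import log10

volume = [500, 511, 658, 450, 560, 750, 800, 510, 580, 590,
          600, 450, 600, 680, 780, 590, 456, 560, 800, 600]
cost = [1000, 1020, 1300, 1200, 1560, 1580, 1700, 1900, 2000, 2100,
        2300, 1500, 1800, 1566, 1800, 1890, 1500, 2300, 2400, 2300]

x = [log10(v) for v in volume]   # X = log V
y = [log10(c) for c in cost]     # Y = log C
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)

beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
log_alpha = sum_y / n - beta * sum_x / n
alpha = 10 ** log_alpha          # antilog of the intercept

def predicted_cost(v):
    """Importation cost implied by the fitted curve C = alpha * V**beta."""
    return alpha * v ** beta

print(round(sum_x, 5), round(sum_y, 5))
print(round(beta, 2), round(alpha, 1), round(predicted_cost(1000)))
```

The first line of output reproduces the log totals used in the working (about 55.44 and 64.53), which is a useful check that the data were entered correctly.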
EXCEL APPLICATION
COMMONLY USED BIVARIATE FUNCTIONS IN EXCEL
FUNCTION MEANING
LINEST(y values, x values, TRUE, FALSE)     Calculates the coefficient (slope) and intercept of the
                                            regression line of y on x
Illustration of Excel Applications
Exercises
1. a) Calculate the value of the coefficient of correlation between price and supply.
Price 8 10 15 17 20 22 24 25
Supply 25 30 32 35 37 40 42 45
b) Compute Karl Pearson’s coefficient of correlation between per capita National income and per
capita consumer expenditure from the data given below.
c) Calculate Karl Pearson’s coefficient of correlation between advertisement and sales as per the data given.
d) The following data relate to annual net income and annual food expenditures (GH¢’m)
for 8 selected families in Sikanti.
Annual Net Income GHS 9000 5000 11000 13000 12000 7000 15000 13000
Food Expenditure GHS 6000 8000 4000 4000 3000 6000 3000 5000
2. For five cities, data have been collected on number of civil disturbances (riots, strikes and
so on) over the past year and on unemployment rate.
City A B C D E
Unemployment Rate (x) 22 20 10 15 9
Civil disturbance (y) 25 13 10 5 0
Worker A B C D E F G H I J
Rating of Worker 80 95 83 86 82 75 92 74 75 90
Rating of Supervisor 87 93 87 92 95 78 97 81 76 92
Determine the Spearman rank correlation between the workers’ ratings of the
supervisors and the latter’s ratings of the workers and interpret your results.
4. To investigate the relationship between height and shoe size, the president of the Ladies’
Club at Ahayiaa Rubber Products Ltd collected the data below:
Lady 1 2 3 4 5 6 7 8 9 10
Height (cm) 164 168 167 165 171 171 168 171 169 165
Shoe size 38 39 40 38 39 40 40 40 39 39
Required
(a) Draw a scatter diagram for the data using the horizontal axis for the height and the
vertical, the shoe size.
(b) Determine the linear regression equation for estimating the shoe sizes from the given
heights (answer: $y = 6.64 + 0.19x$). Use the regression equation to estimate the shoe size of a
lady whose height is 166 cm (answer: 38 or 39).