Correlation and Regression-2023


QUANTITATIVE TECHNIQUES II

Correlation and Regression


PART A

Introduction to Correlation

In the chapter on measures of central tendency, we studied problems based on a single variable, which is
called univariate analysis. In the real world, however, many problems involve two or more variables. When
we study the relationship between two variables, the analysis is called bivariate analysis. The extent of
the relationship between the variables can be measured with the help of correlation, and the measure of
correlation is called the correlation coefficient. For example, there exists some relationship between the
height of a father and the height of his son, price and demand, wage and price index, yield and rainfall,
height and weight, and so on. Correlation is the statistical analysis which measures and analyses the
degree or extent to which two variables are associated, or the closeness of the relationship between them.

Definition:

“Correlation analysis attempts to determine the degree of relationship between variables, or the degree of
association between the variables.”

Thus, the association of any two variates is known as correlation. It depicts the relationship or
interdependence of two sets of variables in such a way that a change in one variable is accompanied by a
corresponding change in the other. The coefficient of correlation is the numerical value showing the degree
of correlation between the variables. One variable is called the independent (subject) variable and the
other the dependent (relative) variable. For example, consider rainfall and agricultural production.
Rainfall affects agricultural production, while agricultural production cannot cause rainfall; thus
rainfall is the independent variable and production is the dependent variable.

Uses of Correlation

Correlation is used in both physical and social sciences, also in the field of business and economics.

1. Correlation is very useful to economists to study the relationship between variables, like price and
quantity demanded. For a businessman, it helps to estimate costs, sales, prices and other related
characteristics.
2. Correlation analysis helps in measuring the degree of relationship between the variables like
demand and supply, price and supply, income and expenditure, etc.
3. The measure of correlation can be further tested for significance in the research work.
4. The effect of correlation is to reduce the uncertainty of our prediction.
5. Correlation is the basis for the concept of regression and ratio of variation.

Positive and Negative correlations :

The correlation is said to be positive or direct if the variables move in the same direction, i.e., when
an increase (decrease) in one variable is accompanied by an increase (decrease) in the value of the other
variable. For example, price and supply, height and weight, etc.
FOR PRIVATE CIRCULATION ONLY
If the two variables tend to move in opposite directions, so that an increase (decrease) in one variable
is accompanied by a decrease (increase) in the other variable, then the correlation is called negative,
inverse or indirect correlation. For example, price and demand, yield of crops and price, etc.

Methods of studying correlation

(a) Karl Pearson’s coefficient of correlation


(b) Spearman’s rank coefficient of correlation

Mathematical Methods - Coefficient of correlation:

Correlation is a statistical technique used for analyzing the behavior of two or more variables.

(a) Karl Pearson’s Method

Karl Pearson, a reputed statistician, constructed a formula based on mathematical treatment for
determining the coefficient of correlation.

Characteristics of Karl Pearson’s coefficient:

Following are the main characteristics of Karl Pearson’s coefficient of correlation.

1. It is based on the arithmetic mean and standard deviation.
2. It determines the direction of the relationship.
3. It establishes the size of the relationship. The size ranges between -1 and +1.
4. Karl Pearson’s method is considered an ideal method of calculating correlation, because it is
based on the covariance, which is a reliable standard statistical tool.

Calculation of Karl Pearson’s coefficient of correlation (r)

Steps for finding correlation coefficient by direct method

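1. Calculate the means of the two series X and Y, i.e., $\bar{x}$ and $\bar{y}$.
2. Calculate the deviations of X and Y from their respective means: $dx = x - \bar{x}$ and $dy = y - \bar{y}$.
3. Square these deviations in the X and Y series and sum the squares for both series to get
$\sum dx^2$ and $\sum dy^2$.
4. Multiply each deviation of X by the corresponding deviation of Y and sum the products to get
$\sum dx\,dy$.
5. Find the value of the coefficient of correlation by using the formula

$$r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y} = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$

where $\sigma_x = \sqrt{\sum dx^2 / N}$, $\sigma_y = \sqrt{\sum dy^2 / N}$ and N is the number of pairs of observations.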


Method of determining Correlation (Interpretation)

The degree of correlation between the variables can be determined from the quantitative value of the
coefficient of correlation. On the basis of the formula given by Karl Pearson, we can state approximately
the degree of correlation. The coefficient always lies between +1 and -1.

Degree of correlation      Positive range       Negative range
1. Perfect                 +1                   -1
2. Very high degree        +0.90 to +1.00       -0.90 to -1.00
3. High degree             +0.75 to +0.90       -0.75 to -0.90
4. Moderate degree         +0.60 to +0.75       -0.60 to -0.75
5. Low degree              +0.30 to +0.60       -0.30 to -0.60
6. Very low degree          0.00 to +0.30        0.00 to -0.30
7. No correlation           0                    0
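
The interpretation table can be sketched as a small lookup function. This is only an illustration: the function name `degree_of_correlation` is hypothetical, and because the table lists each boundary value (for example 0.90) in two adjacent bands, the sketch assumes a boundary value belongs to the higher band.

```python
def degree_of_correlation(r: float) -> str:
    """Return the verbal degree of correlation for a coefficient r.

    Assumption (not stated in the notes): a boundary value such as 0.90
    is placed in the higher of the two bands that list it.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    sign = "positive" if r > 0 else "negative"
    if size == 1:
        label = "perfect"
    elif size >= 0.90:
        label = "very high degree of"
    elif size >= 0.75:
        label = "high degree of"
    elif size >= 0.60:
        label = "moderate degree of"
    elif size >= 0.30:
        label = "low degree of"
    else:
        label = "very low degree of"
    return f"{label} {sign} correlation"


print(degree_of_correlation(0.25))
```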

Example 1. Calculate the coefficient of correlation from the following data:

X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
Solution : COMPUTATION OF COEFFICIENT OF CORRELATION

X y dx= x - 5 dy=y - 12 dx2 dy2 dxdy


1 9 -4 -3 16 9 12
2 8 -3 -4 9 16 12
3 10 -2 -2 4 4 4
4 12 -1 0 1 0 0
5 11 0 -1 0 1 0
6 13 1 1 1 1 1
7 14 2 2 4 4 4
8 16 3 4 9 16 12
9 15 4 3 16 9 12
∑x= 45 ∑y = 108 ∑ dx2 = 60 ∑ dy2 = 60 ∑ dxdy = 57

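Mean of X series: $\bar{x} = \sum x / N = 45/9 = 5$
Mean of Y series: $\bar{y} = \sum y / N = 108/9 = 12$

Coefficient of correlation:

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{57}{\sqrt{60 \times 60}} = \frac{57}{60} = 0.95$$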

Correlation coefficient is +0.95 and hence there is a very high degree of positive correlation.
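
As a check on the arithmetic, the direct method can be sketched in Python using the data of Example 1. The function name `pearson_r` is illustrative and not part of the original notes; the code follows the direct-method steps literally rather than calling a statistics library.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's coefficient by the direct method (deviations from means)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # deviations of X from its mean
    dy = [y - mean_y for y in ys]          # deviations of Y from its mean
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 10, 12, 11, 13, 14, 16, 15]
r = pearson_r(x, y)
print(round(r, 2))  # 0.95
```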

Example 2. The following table gives the marks obtained by A and B in ten tests during the year 2009-10.
Calculate the correlation coefficient.

Test No.              1   2   3   4   5   6   7   8   9  10
Marks in Statistics  70  54  27  52  21  35  90  25  56  60
Marks in Maths       35  58  60  40  50  40  35  56  34  42
Solution :
Let the marks in statistics be taken as X and that of marks in maths as Y.
Computation of coefficient of correlation
X      Y      dx = x - 49   dy = y - 45   dx²     dy²    dxdy
70     35      21           -10            441    100    -210
54     58       5            13             25    169      65
27     60     -22            15            484    225    -330
52     40       3            -5              9     25     -15
21     50     -28             5            784     25    -140
35     40     -14            -5            196     25      70
90     35      41           -10           1681    100    -410
25     56     -24            11            576    121    -264
56     34       7           -11             49    121     -77
60     42      11            -3            121      9     -33
∑x = 490   ∑y = 450   ∑dx² = 4366   ∑dy² = 920   ∑dxdy = -1344

Mean of X series: $\bar{x} = 490/10 = 49$
Mean of Y series: $\bar{y} = 450/10 = 45$

Coefficient of correlation:

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{-1344}{\sqrt{4366 \times 920}} = \frac{-1344}{2004.18} = -0.671$$

The coefficient of correlation is -0.671 and hence there is a moderate degree of negative correlation.

Example 4. From the following table, find correlation coefficient between age and the playing habits of
students
Age (years)       15   16   17   18   19   20   21
No. of students  250  200  150  120  100   80   90
Regular players  200  150   90   48   30   12   50
What conclusions do you draw from the result obtained?

Solution :
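First calculate the percentage of regular players in each age group and then calculate the correlation
coefficient. Let age be denoted by X and the percentage of regular players by Y.
Deviations are taken from the means: $\bar{x} = 126/7 = 18$ and $\bar{y} = 355.56/7 = 50.794$.

Age (X)  Students  Players  Y (%)    dx = x - 18   dx²   dy = y - 50.794   dy²         dxdy
15       250       200      80       -3            9      29.206            852.9904   -87.618
16       200       150      75       -2            4      24.206            585.9304   -48.412
17       150        90      60       -1            1       9.206             84.75044   -9.206
18       120        48      40        0            0     -10.794            116.5104     0
19       100        30      30        1            1     -20.794            432.3904   -20.794
20        80        12      15        2            4     -35.794           1281.21     -71.588
21        90        50      55.56     3            9       4.766             22.71476   14.298
∑dx = 0   ∑dx² = 28   ∑dy² = 3376.497   ∑dxdy = -223.32

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{-223.32}{\sqrt{28 \times 3376.497}} = \frac{-223.32}{307.48} = -0.726$$

There is a moderate degree of negative correlation, close to the high range: the percentage of regular
players tends to fall as age increases.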

Example 5 :
The following table gives the distribution of the population and those who are totally and partially blind
among them. Find out if there exists any relation between age and blindness.
Age                     0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of persons ('000)    100     80     50     40     35     20     16      8
No. of blind              60     50     49     38     28     39     20     14

Solution:
In order to make the data comparable, it is necessary to express the number of blind persons per fixed
number of people (a common unit). We find the number of blind persons per one lakh (100 thousand) of
population in each group.
The first figure: (60/100) x 100 = 60
The second figure: (50/80) x 100 = 62.5

Here X is the mid-value of the class interval (CI), Y is the number of blind per lakh,
dx = X - 40 and dy = Y - 111.312.

CI        Persons ('000)  Blind   X    Y       dx    dx²    dy        dy²         dxdy
0 - 10    100             60       5    60     -35   1225   -51.31    2633.024    1795.955
10 - 20    80             50      15    62.5   -25    625   -48.81    2382.709    1220.325
20 - 30    50             49      25    98     -15    225   -13.31     177.236     199.695
30 - 40    40             38      35    95      -5     25   -16.31     266.114      81.565
40 - 50    35             28      45    80       5     25   -31.31     980.504    -156.565
50 - 60    20             39      55   195      15    225    83.687   7003.514    1255.305
60 - 70    16             20      65   125      25    625    13.687    187.334     342.175
70 - 80     8             14      75   175      35   1225    63.687   4056.034    2229.045
∑dx² = 4200   ∑dy² = 17686.47   ∑dxdy = 6967.5

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{6967.5}{\sqrt{4200 \times 17686.47}} = \frac{6967.5}{8618.77} = 0.808$$

It is a high degree of positive correlation.


Merits and Demerits of Karl Pearson’s correlation coefficient
Merits
1. Karl Pearson’s correlation coefficient is the most popular mathematical method used for measuring the
degree of relationship.
2. The coefficient of correlation summarises the degree of correlation in one figure; it also helps to
estimate the value of the dependent variable from known values of the independent variable.
Demerits
1. It assumes a linear relationship between the variables, whether or not that assumption is correct.
2. The calculation of the correlation coefficient is time consuming.
3. It is affected by extreme values, as is the standard deviation.

Example 7:
Calculate the coefficient of correlation by Pearson’s method between the density of population and the
death rate. Find significance of coefficient of correlation; also find the limits of probable error.
Cities A B C D E F
Density 200 500 400 700 600 300
Death rate 10 16 14 20 17 13

Solution: let density be X and Y be the death rate


X      Y     dx = x - 450   dx²      dy = y - 15   dy²   dxdy
200    10    -250           62500    -5            25    1250
500    16      50            2500     1             1      50
400    14     -50            2500    -1             1      50
700    20     250           62500     5            25    1250
600    17     150           22500     2             4     300
300    13    -150           22500    -2             4     300
∑dx² = 175000   ∑dy² = 60   ∑dxdy = 3200

Mean of X = 2700/6 = 450, mean of Y = 90/6 = 15.

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{3200}{\sqrt{175000 \times 60}} = \frac{3200}{3240.37} = 0.9875$$

It is a very high degree of positive correlation.
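
The example also asks for the probable error, which does not appear in the solution as extracted. A minimal sketch follows, assuming the conventional textbook formula P.E. = 0.6745 (1 - r²)/√N and the usual rule of thumb that r is significant when it exceeds six times its probable error. Both are assumptions brought in from standard practice, and the function name `probable_error` is illustrative.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of r (assumed formula: 0.6745 * (1 - r^2) / sqrt(n))."""
    return 0.6745 * (1 - r * r) / sqrt(n)


# Sums taken from the computation table of Example 7.
sum_dxdy, sum_dx2, sum_dy2, n = 3200, 175000, 60, 6

r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)   # about 0.9875
pe = probable_error(r, n)                # about 0.0068
lower, upper = r - pe, r + pe            # limits of the probable error
significant = abs(r) > 6 * pe            # assumed rule of thumb

print(round(r, 4), round(pe, 4), round(lower, 4), round(upper, 4), significant)
```

Under these assumptions r is well above six times its probable error, so the correlation would be judged significant.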

Example 9:

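If the covariance between the variables X and Y is 15 and the variances of X and Y are 25 and 9
respectively, find the coefficient of correlation.

Solution:

Covariance of X and Y = 15. Variance of X = $\sigma_x^2 = 25$, so $\sigma_x = 5$. Variance of Y = $\sigma_y^2 = 9$, so $\sigma_y = 3$.

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_x\,\sigma_y} = \frac{15}{5 \times 3} = 1$$

There is perfect positive correlation between the variables.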

Example 10: With the following data of cities, calculate the coefficient of correlation between the
density of population and the death rate.

City   Area in sq. kms   Population in '000   Number of deaths
A      150               30                    300
B      180               90                   1440
C      100               40                    560
D       60               42                    840
E      120               72                   1224
F       80               24                    312
Solution:

We are asked to find the correlation between the density of population and the death rate.

Density of population = (population / area) x 100, with population in thousands

Death rate = deaths / population, with population in thousands

City  Area  Population  Deaths  X (density)  Y (death rate)  dx = x - 45  dx²   dy = y - 15  dy²  dxdy
A     150   30           300    20           10              -25          625   -5           25   125
B     180   90          1440    50           16                5           25    1            1     5
C     100   40           560    40           14               -5           25   -1            1     5
D      60   42           840    70           20               25          625    5           25   125
E     120   72          1224    60           17               15          225    2            4    30
F      80   24           312    30           13              -15          225   -2            4    30
∑dx² = 1750   ∑dy² = 60   ∑dxdy = 320

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{320}{\sqrt{1750 \times 60}} = \frac{320}{324.04} = 0.9875$$

There is a very high degree of positive correlation.

Example 11

What inference do you draw when the correlation coefficient between the two variables is.

(i) Equal to zero (ii) Equal to -1 (iii) equal to 0.25

Solution:

(i) No correlation.

(ii) Perfect negative correlation.

(iii) Very low degree of positive correlation.

Rank correlation
In 1904, Charles Edward Spearman, a British psychologist, devised a method of ascertaining the
coefficient of correlation by ranks. This method is used for dealing with qualitative characteristics
such as intelligence, beauty, morality, character, etc., which cannot be measured quantitatively as
required for Karl Pearson’s coefficient of correlation. Because it uses the ranks of the observations
instead of their values, the measure is also suitable for data containing erratic, irregular, extreme or
inaccurate values; rank correlation is not based on the assumption of normality of the data.
The rank correlation method gives only approximate results, as it uses ranks instead of the
original values. Rank correlation is applicable only to individual observations.
The formula for Spearman’s rank correlation, which is denoted by r, is:

$$r = 1 - \frac{6\sum d^2}{N^3 - N}$$

where r is Spearman’s rank correlation coefficient,
∑d² is the sum of the squares of the differences between the two ranks, and
N is the number of paired observations.
Spearman’s rank correlation coefficient (r) also lies between +1 and -1, the same as Karl Pearson’s
coefficient of correlation. If r is -1, there is complete disagreement in the order of ranks and they
are in opposite directions; when r is +1, there is complete agreement in the order of ranks.

Steps to find Rank coefficient of correlation


1. Ranks are assigned to the values of both series according to their size. For example, the
highest value in a series is given rank 1, the next highest value rank 2, and so on.
2. Compute the difference between the ranks of the two series as d.
3. Square d and find the summation.
4. Compute ‘r’ by substituting the value in the formula

Common ranks
Sometimes two or more values of a variable are the same, so they must share a rank. In such cases a
common rank is given to all the items having the same value, obtained by averaging the ranks which the
items would have received had they differed slightly from each other.
For example:
X       50  46  30  50  60  20  70  50
Ranks    4   6   7   4   2   8   1   4
The item 50 is repeated thrice. These items would occupy the three ranks 3, 4 and 5. If we take the
average of these ranks, we get the common rank 4 as under:

Common rank = (sum of normal ranks) / (number of ranks pooled) = (3 + 4 + 5) / 3 = 4

When there are common ranks in the series, the correlation coefficient formula is modified with an
adjustment to ∑d².
If ‘m’ items in a series share a common rank, $(m^3 - m)/12$ is added to ∑d², and the formula becomes:

$$r = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m^3 - m)\right]}{N^3 - N}$$

If there is more than one such group of items with common ranks, the adjustment is added once for each
group.
For example, suppose that in the X series two items have the same value and share the common rank 5.5
(i.e., the average of the 5th and 6th ranks), and in the Y series three items share the rank 4 (i.e., the
3rd, 4th and 5th) and four items share the rank 8.5 (i.e., the 7th, 8th, 9th and 10th). We then have to
add three adjustments to the value of ∑d², as under:

$$\frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(4^3 - 4) = 0.5 + 2 + 5 = 7.5$$

Thus for two items m = 2, for three items m = 3 and for four items m = 4.

Example 12: Ten students have obtained the following marks in Statistics and Economics. Calculate the
rank coefficient of correlation.
Statistics  28  30  45  60  90  88  65  55  76  50
Economics   50  60  80  90  20  40  45  66  44  70

Solution: Let X be the marks in Statistics and Y the marks in Economics.


Computation of rank coefficient of correlation
X Y Rx Ry d= Rx-Ry d2
28 50 10 6 4 16
30 60 9 5 4 16
45 80 8 2 6 36
60 90 5 1 4 16
90 20 1 10 -9 81
88 40 2 9 -7 49
65 45 4 7 -3 9
55 66 6 4 2 4
76 44 3 8 -5 25
50 70 7 3 4 16
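∑d² = 268

$$r = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6 \times 268}{10^3 - 10} = 1 - \frac{1608}{990} = -0.624$$

There is a moderate degree of negative correlation.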
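
The ranking steps can be sketched in Python for the case without tied values. The helper names `ranks_descending` and `spearman_r` are illustrative, and the Economics marks are entered as they appear in the computation table of Example 12.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value. Assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_r(rank_x, rank_y):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (N^3 - N)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


stats_marks = [28, 30, 45, 60, 90, 88, 65, 55, 76, 50]
econ_marks = [50, 60, 80, 90, 20, 40, 45, 66, 44, 70]  # as in the computation table

rx = ranks_descending(stats_marks)
ry = ranks_descending(econ_marks)
r = spearman_r(rx, ry)
print(rx, ry, round(r, 3))
```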


Example 13: Calculate the rank correlation between shelf life and actual usage for the following data.

Item                    A   B   C   D   E   F   G   H   I
Shelf life in months    10  24  15  19  20  15  14  22  20
Actual usage (months)    9  25  18  15  10  11  19  22  18

Solution:

Shelf life in months is taken as x and y as the actual usage.


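X     Y     Rx     Ry     d = Rx - Ry   d²
10     9    9      9       0             0
24    25    1      1       0             0
15    18    6.5    4.5     2             4
19    15    5      6      -1             1
20    10    3.5    8      -4.5          20.25
15    11    6.5    7      -0.5           0.25
14    19    8      3       5            25
22    22    2      2       0             0
20    18    3.5    4.5    -1             1
∑d² = 51.5

In the series X, there are two 20s and hence their ranks are shared as (3 + 4)/2 = 3.5.

There are two 15s and hence their ranks are shared as (6 + 7)/2 = 6.5.

In the series Y, there are two 18s and hence their ranks are shared as (4 + 5)/2 = 4.5.

Each of the three tied groups has m = 2, so the total adjustment to ∑d² is 3 x (2³ - 2)/12 = 1.5:

$$r = 1 - \frac{6\,(51.5 + 1.5)}{9^3 - 9} = 1 - \frac{318}{720} = 0.558$$

There is a low degree of positive correlation.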
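
The handling of common ranks can be sketched as follows. The helper names are illustrative; the last usage figure is entered as 18, the value used in the computation table. Tied items receive the average of the ranks they would otherwise occupy (so both 15s in X get rank 6.5), and (m³ - m)/12 is added to ∑d² once for every tied group.

```python
def average_ranks(values):
    """Rank 1 for the largest value; tied values share the average rank."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1       # first rank occupied by this value
        count = order.count(v)           # number of items sharing the value
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_adjustment(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    total = 0.0
    for v in set(values):
        m = values.count(v)
        if m > 1:
            total += (m ** 3 - m) / 12
    return total


def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_adjustment(xs) + tie_adjustment(ys)
    return sum_d2, 1 - 6 * adjusted / (n ** 3 - n)


shelf_life = [10, 24, 15, 19, 20, 15, 14, 22, 20]
usage = [9, 25, 18, 15, 10, 11, 19, 22, 18]  # last value as in the computation table

sum_d2, r = spearman_with_ties(shelf_life, usage)
print(sum_d2, round(r, 3))
```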

Example 14: Fifteen industries of the state have been ranked according to profit earned in 2007-2008 and
the working capital for that year. Calculate the rank correlation coefficient.
Industries A B C D E F G H I J K L M N O
Rank (profit) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Rank( working Capital) 14 5 15 13 12 10 11 9 7 8 1 6 4 3 2

Solution: The ranks are already assigned and hence just find the d and its square.
Computation of rank correlation coefficient
Rx    Ry    d      d²
1     14    -13    169
2      5     -3      9
3     15    -12    144
4     13     -9     81
5     12     -7     49
6     10     -4     16
7     11     -4     16
8      9     -1      1
9      7      2      4
10     8      2      4
11     1     10    100
12     6      6     36
13     4      9     81
14     3     11    121
15     2     13    169
∑d² = 1000

$$r = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6 \times 1000}{15^3 - 15} = 1 - \frac{6000}{3360} = -0.786$$

There is a high degree of negative correlation.

Example 15 : Ten competitors in a beauty contest are ranked by three judges in the following order:
First judge     1   6   5  10   3   2   4   9   7   8
Second judge    7   5   8   3  10   4   9   2   1   6
Third judge     5   6   7   3   2   6   1   8   9  10

Solution : The ranks are already assigned . The correlation between the judges in the combination of two
should be found to ascertain which two judges are close in their judgment.
The judge1 is taken as X , judge 2 as Y and judge 3 as Z.
Computation of Rank correlation Coefficient
Rx    Ry    Rz    d = Rx - Ry   d²    d = Ry - Rz   d²    d = Rz - Rx   d²
1      7     5    -6            36     2             4     4            16
6      5     6     1             1    -1             1     0             0
5      8     7    -3             9     1             1     2             4
10     3     3     7            49     0             0    -7            49
3     10     2    -7            49     8            64    -1             1
2      4     6    -2             4    -2             4     4            16
4      9     1    -5            25     8            64    -3             9
9      2     8     7            49    -6            36    -1             1
7      1     9     6            36    -8            64     2             4
8      6    10     2             4    -4            16     2             4
∑d² (X and Y) = 262   ∑d² (Y and Z) = 254   ∑d² (Z and X) = 104

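Correlation between judges X and Y

$$r_{xy} = 1 - \frac{6 \times 262}{10^3 - 10} = 1 - \frac{1572}{990} = -0.588$$

Correlation between judges Y and Z

$$r_{yz} = 1 - \frac{6 \times 254}{10^3 - 10} = 1 - \frac{1524}{990} = -0.539$$

Correlation between judges Z and X

$$r_{zx} = 1 - \frac{6 \times 104}{10^3 - 10} = 1 - \frac{624}{990} = 0.370$$

Judge 3 and judge 1 are the closest in their approach to judgment, as theirs is the only pair of rankings
that is positively correlated.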
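
The pairwise comparison can be sketched in Python. The names are illustrative, and the ranks are copied as printed in the example (the third judge's list contains 6 twice), so the plain formula is applied without any adjustment for ties.

```python
from itertools import combinations


def spearman_r(rank_a, rank_b):
    """Spearman's rank coefficient: 1 - 6 * sum(d^2) / (N^3 - N)."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


judges = {
    "Judge 1": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Judge 2": [7, 5, 8, 3, 10, 4, 9, 2, 1, 6],
    "Judge 3": [5, 6, 7, 3, 2, 6, 1, 8, 9, 10],  # as printed in the example
}

# Rank correlation for every pair of judges.
results = {
    (a, b): spearman_r(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
closest = max(results, key=results.get)

for pair, value in results.items():
    print(pair, round(value, 3))
print("Closest pair:", closest)
```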
Merits and demerits of Rank correlation
Merits
1. It is simple to understand and easy to calculate.
2. It is very useful for data of a qualitative nature, like intelligence, honesty,
beauty, efficiency, etc.
3. Only ranks are needed in this method, which makes the computation easy.
Demerits
1. It cannot be used for grouped frequency distributions.
2. If the number of items is greater than 20, the calculation becomes tedious and requires a lot of time.

Exercise
1. Calculate the Karl Pearson’s coefficient of correlation and interpret the result for the deviations
from their mean of the given two series X and Y.
X -4 -3 -2 -1 0 1 2 3 4
Y 3 -3 -4 0 4 1 2 -2 -1

2. The data relating to import price (Y) and import quantity (X) in respect of a given commodity are
as under:
Year               1997  1998  1999  2000  2001  2002  2003  2004  2005  2006
Import price          2     3     6     5     4     3     5     7     8     7
Quantity imported     6     5     4     5     7    10     9     7     8     9
Calculate Karl Pearson’s coefficient of correlation between X and Y and comment on it.

3. Calculate the Karl Pearson’s coefficient of correlation from the following data, using 20 as the working
mean for price, and 70 as the working mean for demand:
Price 14 16 17 18 19 20 21 22 23
Demand 84 78 70 75 66 67 62 58 60

4. Given is the data relating to the aptitude scores and productivity index.
Aptitude scores       9  18  18  20  20  23
Productivity index   33  23  33  42  29  32
Find the coefficient of correlation between aptitude scores and productivity index.

5. Given:
                     X series   Y series
Arithmetic mean      74.50      125.50
Assumed mean         69.00      112.00
Standard deviation   13.07       15.85
Summation of the products of corresponding deviations of X and Y series = 2176.
Calculate the coefficient of correlation between the series.

6. From the following table calculate the coefficient of correlation by Karl Pearson’s method :
X 6 2 10 4 8
Y 9 - 5 8 7
Arithmetic means of x and Y series are 6 and 8 respectively.
Also find the probable error.

7. From the following data, calculate the coefficient of correlation between age and playing habits. How
do you interpret the result?
Age                  20   21   22   23   24   25
Number of students  500  400  300  240  200  160
Regular players     400  300  180   96   60   24

8. Compute spearman’s rank correlation coefficient for the following observations:


Candidate 1 2 3 4 5 6 7 8
judge 1 20 22 28 23 30 30 23 24
Judge 2 28 24 24 25 26 27 32 20

9. The marking of trainees in two skills, programming and analysis are as follows. What is the
coefficient of rank correlation?
Programming 3 5 8 4 7 10 2 1 6 9
Analysis 6 4 9 8 1 2 3 10 5 7

10. Calculate the rank correlation coefficient for the following table of marks of students in two
subjects.
First subject    80  64  54  49  48  35  32  29  20  18  15  10
Second subject   36  38  39  41  27  43  45  52  51  42  40  52

11. Ten competitors in a voice contest are ranked by three judges in the following orders:
First judge     1   6   5  10   3   2   4   9   7   8
Second judge    3   5   8   4   7  10   2   1   6   9
Third judge     6   4   9   8   1   2   3  10   5   7

12. Calculate the coefficient of correlation between age of cars and the annual maintenance cost and
comment:
Age of cars 2 4 6 7 8 10 12
Annual maintenance cost 1,600 1,500 1,800 1,900 1,700 2,100 2,000
13. Quotations of the index numbers of security prices of a certain joint stock company, for preference
shares and debentures, are given below:
Preference share price  73.2  85.8  78.9  75.8  77.2  81.2  83.8
Debenture price         97.8  99.2  98.8  98.3  98.3  96.7  97.1
Calculate the rank coefficient of correlation between the preference share prices and debenture prices.

14. Following are the scores of ten students in a class and their IQ. Use the method of rank correlation
to determine the relationship between scores and IQ.
Students 1 2 3 4 5 6 7 8 9 10
Scores 35 40 25 55 85 90 65 55 45 50
IQ 100 100 110 140 150 130 100 120 140 110

15. The average daily wages for working class in Nagpur is Rs.12 and for that in Delhi Rs.18, their
respective standard deviations are Rs.2 and Rs.3 and the coefficient of correlation is 0.67. Find the
most likely wage in Delhi corresponding to the wage of Rs.20 in Nagpur.

16. Given the following values, find the expected value of X when Y is 12
Average of X series = 25 Average of Y series = 22

S.D of X series = 4 S.D of Y series = 5

18. The coefficient of correlation between marks obtained in Mathematics and marks obtained in English is
-0.4; the average marks are 80 and 50 respectively. The standard deviations of marks in Mathematics and
English are 15 and 10 respectively. Estimate the marks in Mathematics of a student who has secured 64
marks in English.

Answer
1. r = 0   3. r = 0.954   4. 0.034   5. 0.955   6. 11, 0.919   7. -0.071
8. r = -0.991   9. -0.297   10. -0.685   11. I and II = -0.212; II and III = -0.297; I and III = 0.636
12. 0.84   13. 0.125   14. 0.47   15. Y20 = 26.04   18. 28.6   19. Marks in Maths = 94


PART B

Introduction to Regression
Correlation measures the direction and the strength of the relationship between variables. Knowing the
degree of association between two variables, we can predict the value of one variable from a given
value of the other. For example, demand and supply are correlated, so we can find the expected demand
for a given supply.
Regression analysis is widely used for deriving an appropriate functional relationship between the
variables. It helps us to estimate one variable (the dependent variable) from the other (the
independent variable). The prediction is based on the average relationship arrived at statistically by
regression analysis.
The literal meaning of regression is ‘moving backward’, ‘going back’ or ‘return to the mean value’.
“Regression is a technique which estimates the value of the unknown from the known values. Regression
is also defined as predicting or estimating the dependent values with the help of independent values.”
In regression analysis there are two types of variables. The variable whose value is influenced or is to
be predicted is called the dependent variable, and the variable which influences the value or is used for
prediction is called the independent variable. In regression analysis the independent variable is also
known as the regressor, predictor or explanatory variable.
Uses of regression analysis

1. Regression analysis is used in almost every field where two or more related variables have the
tendency to go back to the average. It is very useful for prediction in statistics, economics,
the natural and physical sciences and many other applied fields. It is well suited to predicting
sales, production or demand in any business entity planning for a better profit.
2. Regression analysis predicts the unknown value of one variable from the known values of the other.
3. We can calculate the coefficient of correlation with the help of the regression coefficients.
4. Regression analysis is used in the statistical estimation of demand curves, supply curves, production
functions, cost functions, consumption functions, etc.


Correlation and Regression


The correlation establishes the relation or the degree of association between the two variables while
regression establishes a functional relationship between the dependent and the independent variable
so that the values of unknown variables can be estimated from the known values of independent
variables. Correlation precedes the regression analysis.

Regression Equations

Regression equations are the algebraic expressions of the regression lines. Since there are two regression
lines, there are two regression equations. The regression equation of X on Y is used to describe the
variation in the values of X for given changes in Y, and the regression equation of Y on X is used to
describe the variation in the values of Y for given changes in X.

Regression Equation of Y on X.

The regression equation of Y on X is expressed as follows:

Yc = a + bX

where Y is the dependent variable to be estimated and X is the independent variable.

In this equation ‘a’ and ‘b’ are two unknown constants (fixed numerical values) which determine the
position of the line completely. The constant ‘a’ determines the level of the fitted line (its
Y-intercept), and the constant ‘b’ determines its slope, i.e., the change in Y for a unit change in X.

If the values of the constants ‘a’ and ‘b’ are obtained, the line is completely determined. These values
are obtained by the method of least squares, which states that the line should be drawn through the
plotted points in such a manner that the sum of the squares of the vertical deviations of the actual Y
values from the estimated Y values is the least. In other words, in order to obtain the line which fits
the points best, ∑(Y - Yc)² should be a minimum. Such a line is known as the line of best fit.
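
A minimal sketch of the least-squares idea, using the X and Y values of Example 1 in Part A. It relies on the standard closed-form solution b = ∑dxdy / ∑dx² and a = ȳ - b·x̄, with deviations taken from the means; the function names are illustrative. The final lines compare ∑(Y - Yc)² for the fitted line with nearby lines.

```python
def fit_line(xs, ys):
    """Least-squares constants a and b of Yc = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    b = sum_dxdy / sum_dx2        # slope: change in Y per unit change in X
    a = mean_y - b * mean_x       # level (Y-intercept) of the line
    return a, b


def sum_squared_deviations(a, b, xs, ys):
    """Sum of (Y - Yc)^2 for the line Yc = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 10, 12, 11, 13, 14, 16, 15]

a, b = fit_line(x, y)
best = sum_squared_deviations(a, b, x, y)
print(round(a, 2), round(b, 2), round(best, 2))

# Any other line gives a larger sum of squared vertical deviations.
print(sum_squared_deviations(a + 0.5, b, x, y) > best)
print(sum_squared_deviations(a, b + 0.1, x, y) > best)
```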

Regression equation - Deviation taken from arithmetic mean


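Deviation taken from arithmetic mean of X on Y.
This method is much easier and simpler than the previous method, which is a tedious one. We find the
regression equations by taking deviations from the respective means.
1. Regression equation of X on Y

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) \quad\text{or}\quad X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

$r\,\sigma_x/\sigma_y$ is the regression coefficient of x on y, and it can be represented as bxy:

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum dx\,dy}{\sum dy^2}$$

Deviation taken from arithmetic mean of Y on X.
2. Regression equation of Y on X

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) \quad\text{or}\quad Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

$r\,\sigma_y/\sigma_x$ is the regression coefficient of y on x, and it can be represented as byx:

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum dx\,dy}{\sum dx^2}$$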

Properties of regression coefficient

The relation between the coefficient of correlation and the regression coefficients is given by

$$r = \pm\sqrt{b_{xy} \cdot b_{yx}}$$

where r takes the common sign of the two regression coefficients.

(a) If bxy is positive, byx is also positive.

(b) If bxy is negative, byx is also negative.

(c) If one regression coefficient is greater than unity, then the other regression coefficient must be
less than unity.

Example 1: Calculate the two regression equations of X on Y and Y on X from the data given below,

taking deviations from the actual means of X and y variables.

Demand 12 13 15 13 12 20 20

Supply 45 40 43 37 40 39 43

Solution : The demand is taken as X series and the supply as Y

Computation of regression equations


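X     Y     dx = x - 15   dx²   dy = y - 41   dy²   dxdy
12    45    -3             9     4            16    -12
13    40    -2             4    -1             1      2
15    43     0             0     2             4      0
13    37    -2             4    -4            16      8
12    40    -3             9    -1             1      3
20    39     5            25    -2             4    -10
20    43     5            25     2             4     10
∑x = 105   ∑y = 287   ∑dx = 0   ∑dx² = 76   ∑dy = 0   ∑dy² = 46   ∑dxdy = 1

Mean of X = 105/7 = 15, mean of Y = 287/7 = 41.

Regression equation of X on Y

$$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{1}{46} = 0.0217$$

X - 15 = 0.0217 (Y - 41)
X = 0.0217Y + 14.11

Regression equation of Y on X

$$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{1}{76} = 0.0132$$

Y - 41 = 0.0132 (X - 15)
Y = 0.0132X + 40.80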

EXAMPLE 2:

The following data relate to the ages of husbands and wives. Obtain the two regression equations and
determine the most likely age of the husband when the age of the wife is 25 years, and the most likely
age of the wife when the age of the husband is 30 years. Also determine the coefficient of correlation.

Age of husbands  27  25  29  28  30  33  37  35  40  42
Age of wives     24  20  27  25  24  28  34  28  44  38
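Solution: The age of the husband is taken as X and the age of the wife is taken as Y.

X     Y     dx = x - 32.6   dx²      dy = y - 29.2   dy²       dxdy
27    24    -5.6            31.36    -5.2             27.04     29.12
25    20    -7.6            57.76    -9.2             84.64     69.92
29    27    -3.6            12.96    -2.2              4.84      7.92
28    25    -4.6            21.16    -4.2             17.64     19.32
30    24    -2.6             6.76    -5.2             27.04     13.52
33    28     0.4             0.16    -1.2              1.44     -0.48
37    34     4.4            19.36     4.8             23.04     21.12
35    28     2.4             5.76    -1.2              1.44     -2.88
40    44     7.4            54.76    14.8            219.04    109.52
42    38     9.4            88.36     8.8             77.44     82.72
∑x = 326   ∑y = 292   ∑dx = 0   ∑dx² = 298.4   ∑dy = 0   ∑dy² = 483.6   ∑dxdy = 349.8

Mean of X = 326/10 = 32.6, mean of Y = 292/10 = 29.2.

Regression equation of X on Y

$$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{349.8}{483.6} = 0.7233$$

X - 32.6 = 0.7233 (Y - 29.2)
X = 0.7233Y + 11.48

Regression equation of Y on X

$$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{349.8}{298.4} = 1.1723$$

Y - 29.2 = 1.1723 (X - 32.6)
Y = 1.1723X - 9.02

The correlation coefficient is found through the regression coefficients

$$r = \sqrt{b_{xy} \cdot b_{yx}} = \sqrt{0.7233 \times 1.1723} = 0.9208$$

Thus the likely age of the husband when the age of the wife is 25 years is
X = 32.6 + 0.7233 (25 - 29.2) = 29.56 years.

The likely age of the wife when the age of the husband is 30 years is
Y = 29.2 + 1.1723 (30 - 32.6) = 26.15 years.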
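
The whole calculation can be sketched in Python from the raw ages. The function name `regression_from_means` is illustrative; the regression coefficients are computed as ∑dxdy/∑dy² and ∑dxdy/∑dx² with deviations taken from the actual means, and r is recovered as the square root of their product, carrying their common sign.

```python
from math import sqrt


def regression_from_means(xs, ys):
    """Return (mean_x, mean_y, b_xy, b_yx, r) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b_xy = sum_dxdy / sum_dy2     # regression coefficient of X on Y
    b_yx = sum_dxdy / sum_dx2     # regression coefficient of Y on X
    r = sqrt(b_xy * b_yx)
    if sum_dxdy < 0:              # r carries the sign of the coefficients
        r = -r
    return mean_x, mean_y, b_xy, b_yx, r


husbands = [27, 25, 29, 28, 30, 33, 37, 35, 40, 42]
wives = [24, 20, 27, 25, 24, 28, 34, 28, 44, 38]

mean_x, mean_y, b_xy, b_yx, r = regression_from_means(husbands, wives)

husband_for_wife_25 = mean_x + b_xy * (25 - mean_y)   # X on Y
wife_for_husband_30 = mean_y + b_yx * (30 - mean_x)   # Y on X

print(round(b_xy, 4), round(b_yx, 4), round(r, 4))
print(round(husband_for_wife_25, 2), round(wife_for_husband_30, 2))
```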

Example 3: The following data relate to advertising expenditure (in lakhs of rupees) and the
corresponding sales (in crores of rupees).

Advertising expenditure  15  16  19  20  21  23
Sales                     9  12  17  23  21  26

Estimate:
1. The sales corresponding to advertising expenditure of Rs. 35 lakhs.
2. The advertising expenditure for the sales target of Rs. 30 crores.

Solution : Let advertising expenditure be denoted as X and sales as Y .

Calculation of regression equations


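X     Y     dx = x - 19   dx²   dy = y - 18   dy²   dxdy
15     9    -4            16    -9            81    36
16    12    -3             9    -6            36    18
19    17     0             0    -1             1     0
20    23     1             1     5            25     5
21    21     2             4     3             9     6
23    26     4            16     8            64    32
∑x = 114   ∑y = 108   ∑dx = 0   ∑dx² = 46   ∑dy = 0   ∑dy² = 216   ∑dxdy = 97

Mean of X = 114/6 = 19, mean of Y = 108/6 = 18.

Regression equation of X on Y

$$b_{xy} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{97}{216} = 0.4491$$

X - 19 = 0.4491 (Y - 18)
X = 0.4491Y + 10.92

Regression equation of Y on X

$$b_{yx} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{97}{46} = 2.1087$$

Y - 18 = 2.1087 (X - 19)
Y = 2.1087X - 22.07

The correlation coefficient is found through the regression coefficients

$$r = \sqrt{b_{xy} \cdot b_{yx}} = \sqrt{0.4491 \times 2.1087} = 0.9731$$

1. Thus the likely sales corresponding to advertising expenditure of Rs. 35 lakhs is
Y = 18 + 2.1087 (35 - 19) = Rs. 51.74 crores.

2. Thus the advertising expenditure corresponding to sales of Rs. 30 crores is
X = 19 + 0.4491 (30 - 18) = Rs. 24.39 lakhs.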



Regression equation – short cut method or deviations taken from Assumed means
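When the actual means of the X and Y series are fractions, the calculation of the deviations becomes
tedious, and hence the deviations are taken from assumed means. The values of the regression
coefficients bxy and byx are calculated as follows:

Regression equation of x on y: $X - \bar{X} = b_{xy}\,(Y - \bar{Y})$, where

$$b_{xy} = \frac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dy^2 - (\sum dy)^2}$$

Regression equation of y on x: $Y - \bar{Y} = b_{yx}\,(X - \bar{X})$, where

$$b_{yx} = \frac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dx^2 - (\sum dx)^2}$$

Here dx = x - A and dy = y - A, where A is the assumed mean of the respective series.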

Example 4: A company wants to assess the impact of exports on its annual profit. The following table
presents the information for the last eight years.

Years                     2009  2010  2011  2012  2013  2014  2015  2016
Exports (Rs.'000)           10     8     5    10     9     5     8     7
Annual profit (Rs.'000)     40    50    43    59    60    45    40    40

Estimate the regression equation and predict the annual profit for 2017 for an allocated sum of Rs.12,000
exports.

Solution: Let exports be taken as x and annual profit as y. Deviations are taken from the assumed means 8 (for x) and 43 (for y).

Year    x    y    dx = x - 8   dx²   dy = y - 43   dy²   dxdy
2009   10   40        2         4        -3          9     -6
2010    8   50        0         0         7         49      0
2011    5   43       -3         9         0          0      0
2012   10   59        2         4        16        256     32
2013    9   60        1         1        17        289     17
2014    5   45       -3         9         2          4     -6
2015    8   40        0         0        -3          9      0
2016    7   40       -1         1        -3          9      3
∑x = 62   ∑y = 377   ∑dx = -2   ∑dx² = 28   ∑dy = 33   ∑dy² = 625   ∑dxdy = 40

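With N = 8, X̄ = 62/8 = 7.75 and Ȳ = 377/8 = 47.125:

bxy = (8 × 40 - (-2)(33)) / (8 × 625 - 33²) = 386/3911 = 0.099
byx = (8 × 40 - (-2)(33)) / (8 × 28 - (-2)²) = 386/220 = 1.755

Regression equation of x on y:
X - 7.75 = 0.099 (Y - 47.125)
X = 0.099Y + 3.085

Regression equation of y on x:
Y - 47.125 = 1.755 (X - 7.75)
Y = 1.755X + 33.524

The annual profit for exports of Rs. 12 thousand is Y = 1.755(12) + 33.524 = 54.584 (in thousands), i.e. about Rs. 54,584.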
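The figures in Example 4 can be reproduced with a short Python sketch. It assumes the deviations are taken from 8 (exports) and 43 (profit), which are the values the deviations listed in the table correspond to, and it rounds each regression coefficient to three decimals before forming the intercept, which appears to be the convention behind the printed equations.

```python
exports = [10, 8, 5, 10, 9, 5, 8, 7]        # x, in Rs. '000
profit = [40, 50, 43, 59, 60, 45, 40, 40]   # y, in Rs. '000
A_X, A_Y = 8, 43                            # assumed means (inferred from the table)

n = len(exports)
dx = [x - A_X for x in exports]
dy = [y - A_Y for y in profit]

s_dx, s_dy = sum(dx), sum(dy)
s_dx2 = sum(d * d for d in dx)
s_dy2 = sum(d * d for d in dy)
s_dxdy = sum(a * b for a, b in zip(dx, dy))

# Assumed-mean (short-cut) formulas, coefficients rounded to 3 decimals.
numerator = n * s_dxdy - s_dx * s_dy
b_xy = round(numerator / (n * s_dy2 - s_dy ** 2), 3)
b_yx = round(numerator / (n * s_dx2 - s_dx ** 2), 3)

# Intercepts come from the actual means, not the assumed ones.
mean_x = sum(exports) / n
mean_y = sum(profit) / n
a_xy = mean_x - b_xy * mean_y   # intercept of the line of x on y
a_yx = mean_y - b_yx * mean_x   # intercept of the line of y on x

profit_2017 = b_yx * 12 + a_yx  # exports of Rs. 12 thousand

print("Totals:", s_dx, s_dx2, s_dy, s_dy2, s_dxdy)
print("x on y: X =", b_xy, "* Y +", round(a_xy, 3))
print("y on x: Y =", b_yx, "* X +", round(a_yx, 3))
print("Predicted profit (Rs. '000):", profit_2017)
```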

Example 5: Calculate the coefficient of correlation and the regression equations between the X and Y series from the following data:

                                                      X series   Y series
Number of pairs of observations                            15
Arithmetic mean                                          25         18
Sum of squares of deviations from arithmetic mean       286        136

Sum of products of deviations of the X and Y series from their respective arithmetic means = 169
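Solution:
Expressing the given data in notation:
N = 15, X̄ = 25, Ȳ = 18, ∑dx² = 286, ∑dy² = 136, ∑dxdy = 169

bxy = ∑dxdy / ∑dy² = 169/136 = 1.243
byx = ∑dxdy / ∑dx² = 169/286 = 0.591

Regression equation of X on Y:
X - 25 = 1.243 (Y - 18)
X = 1.243Y + 2.626

Regression equation of Y on X:
Y - 18 = 0.591 (X - 25)
Y = 0.591X + 3.225

The coefficient of correlation:
r = ∑dxdy / √(∑dx² × ∑dy²) = 169 / √(286 × 136) = 0.857

There is a high degree of positive correlation.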
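When only summary totals about the actual means are given, as in Example 5, the coefficients follow directly from the three sums. The following is a minimal sketch using the example's figures; it also checks the identity r² = bxy × byx, which links the correlation coefficient to the two regression coefficients.

```python
import math

# Summary data from Example 5 (deviations are from the actual means).
n = 15
mean_x, mean_y = 25, 18
sum_dx2, sum_dy2 = 286, 136
sum_dxdy = 169

r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)  # Karl Pearson's coefficient
b_xy = sum_dxdy / sum_dy2                    # regression coefficient of X on Y
b_yx = sum_dxdy / sum_dx2                    # regression coefficient of Y on X

# Intercepts of the two regression lines.
a_xy = mean_x - b_xy * mean_y
a_yx = mean_y - b_yx * mean_x

print("r =", round(r, 3))
print("X on Y: slope", round(b_xy, 3), "intercept", round(a_xy, 3))
print("Y on X: slope", round(b_yx, 3), "intercept", round(a_yx, 3))
print("r squared equals b_xy * b_yx:", math.isclose(r * r, b_xy * b_yx))
```

The number of pairs n does not enter these formulas, because the sums of squares and products are already taken about the actual means. The intercepts printed here use unrounded slopes, so they can differ in the third decimal from a hand calculation that rounds the slopes first.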

Example 6: From the following data on rainfall and production of rice, find the most likely production corresponding to a rainfall of 35 inches.

          Rainfall (inches)   Production (tonnes)
Mean             25                   50
SD                6                    8
Coefficient of correlation = +0.85

Solution: Rainfall is taken as X and production as Y.

bxy = r σx/σy = 0.85 × 6/8 = 0.6375
byx = r σy/σx = 0.85 × 8/6 = 1.133

Regression equation of X on Y:
X - 25 = 0.6375 (Y - 50)
X = 0.6375Y - 6.875

Regression equation of Y on X:
Y - 50 = 1.133 (X - 25)
Y = 1.133X + 21.675

The most likely production of rice for a rainfall of 35 inches is
Y = 1.133(35) + 21.675 = 61.33 tonnes.

Example 7: The coefficient of correlation between the ages of boys and girls in a community was found to be +0.89. The average age of the boys was 13 years and that of the girls 10 years. Their standard deviations were 3 and 2 years respectively. Find with the help of regression equations:
(a) The expected age of a boy when the girl's age is 17.
(b) The expected age of a girl when the boy's age is 18.
Solution:
Let the boy's age be X and the girl's age be Y.

bxy = r σx/σy = 0.89 × 3/2 = 1.335
byx = r σy/σx = 0.89 × 2/3 = 0.593

Regression equation of X on Y:
X - 13 = 1.335 (Y - 10)
X = 1.335Y - 0.35

Regression equation of Y on X:
Y - 10 = 0.593 (X - 13)
Y = 0.593X + 2.291

(a) The expected age of a boy when the girl's age is 17 is
X = 1.335(17) - 0.35 = 22.345 years.

(b) The expected age of a girl when the boy's age is 18 is
Y = 0.593(18) + 2.291 = 12.97 years (approximately).
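Examples 6 and 7 both start from the means, standard deviations and coefficient of correlation rather than from raw data. In that situation each regression coefficient is r times the ratio of the standard deviations, with the dependent variable's SD in the numerator. The sketch below applies this to Example 7; the helper name `regression_line` is a hypothetical choice for illustration.

```python
def regression_line(r, mean_dep, sd_dep, mean_indep, sd_indep):
    """Return (slope, intercept) for regressing the dependent variable
    on the independent variable, given r, the means and the SDs."""
    slope = r * sd_dep / sd_indep
    intercept = mean_dep - slope * mean_indep
    return slope, intercept


# Example 7: X = boy's age, Y = girl's age.
r = 0.89
mean_boys, sd_boys = 13, 3
mean_girls, sd_girls = 10, 2

# X on Y: used to estimate a boy's age from a girl's age.
b_xy, a_xy = regression_line(r, mean_boys, sd_boys, mean_girls, sd_girls)
# Y on X: used to estimate a girl's age from a boy's age.
b_yx, a_yx = regression_line(r, mean_girls, sd_girls, mean_boys, sd_boys)

boy_age = b_xy * 17 + a_xy    # girl's age is 17
girl_age = b_yx * 18 + a_yx   # boy's age is 18

print("Expected age of boy:", round(boy_age, 3))
print("Expected age of girl:", round(girl_age, 3))
```

The same helper covers Example 6 by passing the rainfall and production figures in place of the ages.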

Example 8: The following calculations have been made for the closing prices of eight stocks (Y) on the National Stock Exchange on a certain day, along with the volume of sales in thousands of shares (X). From these calculations, find the regression equation of the volume of shares on the stock prices.

∑X = 56, ∑Y = 40, ∑XY = 364, ∑X² = 524, ∑Y² = 256

Solution:
N = 8, X̄ = 56/8 = 7, Ȳ = 40/8 = 5

bxy = (N∑XY - ∑X∑Y) / (N∑Y² - (∑Y)²) = (8 × 364 - 56 × 40) / (8 × 256 - 40²) = 672/448 = 1.5

Regression equation of X on Y:
X - 7 = 1.5 (Y - 5)
X = 1.5Y - 0.5
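Example 8 supplies raw totals rather than deviations, so the regression coefficient of X on Y can be computed directly from ∑X, ∑Y, ∑XY and ∑Y². A minimal sketch with the example's figures follows; N = 8 is taken from the statement that eight stocks were observed.

```python
# Raw totals from Example 8 (X = volume of shares, Y = closing price).
n = 8
sum_x, sum_y = 56, 40
sum_xy = 364
sum_x2, sum_y2 = 524, 256

# Regression coefficient of X on Y from raw totals.
b_xy = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)

mean_x = sum_x / n
mean_y = sum_y / n
intercept = mean_x - b_xy * mean_y  # constant term of the line of X on Y

print("X =", b_xy, "* Y +", intercept)
```

∑X² is not needed for the line of X on Y; it would enter only the coefficient of Y on X.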

Exercise

1. The average daily wage of the working class in Nagpur is Rs.12 and in Delhi Rs.18. The
respective standard deviations are Rs.2 and Rs.3, and the coefficient of correlation is 0.67. Find the
most likely wage in Delhi corresponding to a wage of Rs.20 in Nagpur.

2. Given the following values, find the expected value of X when Y is 12


Average of X series = 25 Average of Y series = 22

S.D of X series = 4 S.D of Y series = 5

3. The coefficient of correlation between marks obtained in Mathematics and marks obtained in English is -0.4, and
the average marks are 80 and 50 respectively. The standard deviations of marks in Mathematics and
English are 15 and 10 respectively. Estimate the marks in Mathematics of a student who has secured 64
marks in English.

4. Price indices of cotton and wool are given below for 6 months of a year. Obtain the equations of
regression between the indices.
Price index of cotton (X)   78   77   85   88   87   82
Price index of wool (Y)     84   82   82   85   89   90

5. The following table gives the relative values of two variables :


X 42 44 58 55 89 98 66

Y 56 49 53 58 65 76 51

Determine the regression equations which may be associated with these values and calculate Karl
Pearson’s coefficient of correlation

6. The following table gives the aptitude test scores and the productivity indices of 10 workers
selected at random.
Aptitude score (X)       60   62   65   70   72   48   53   73   65   82
Productivity index (Y)   68   60   62   80   85   40   52   62   60   81
Calculate the two regression equations and estimate the productivity index of a worker whose test score
is 92.
7. To study the relationship between expenditure on accommodation X and expenditure on food and
entertainment Y , an enquiry into 50 families gave the following results:
∑X = 8500, ∑Y = 9600, σx = 60, σy = 20 and r = 0.6
Estimate the expenditure on food and entertainment when expenditure on accommodation is Rs.200.

8. The following are the data on business turnover and the staff of a company for eight years from
2002 to 2009:
Year                              2002    2003    2004    2005    2006    2007    2008    2009
Business turnover (Rs. crores)      45      50      60      75      80     110     150     170
Staff                            2,600   3,000   3,100   3,530   3,850   4,300   5,870   7,150

Fit a proper regression equation to estimate manpower in terms of business turnover. Estimate the
staff requirement when the business turnover reaches Rs.200 crores.

9. Calculate the two regression equations of X on Y and Y on X from the data given below taking
deviations from actual means of X and Y:
Price (Rs) 10 12 13 12 16 15
Amount demanded 40 38 43 45 37 43
Estimate the likely demand when the price is Rs.20.

10. An industrial engineer collected the following data on the experience and performance rating of 8
operators:
Operator               1    2    3    4    5    6    7    8
Experience (years)    16   12   18    4    3   10    5   12
Performance rating    87   88   89   68   58   80   70   85

(a) Do the data give evidence that experience improves performance?
(b) Estimate the performance rating of an operator having (i) 9 years and (ii) 15 years of experience.

Answers
1. Y = 26.04 (when X = 20)   2. 28.6   3. Marks in Maths = 94   4. X = 0.478Y + 42.084, Y = 0.265X + 63.365

5. X = 2.19Y - 65.25, Y = 0.037X + 35.39, r = 0.901   6. X = 0.596Y + 26.26, Y = 1.168X - 10.92

7. Y = 158 + 0.2X, Y = 198   8. Y = 33.24X + 1100.3; 7748.3
9. X = 17.92 - 0.12Y, Y = 44.25 - 0.25X; when X is 20, Y = 39.25.

10. Y = 69.67 + 1.133X
