
Chapter 4

Correlation measures the strength of the linear relationship between two or more variables. The correlation coefficient (r) ranges from -1 to 1.
- r=1 indicates a perfect positive linear relationship
- r=-1 indicates a perfect negative linear relationship
- r=0 indicates no linear relationship between the variables
- 0<r<1 indicates a positive linear relationship
- -1<r<0 indicates a negative linear relationship
The scatter diagram method can be used to interpret the correlation coefficient by visually depicting the relationship between the variables.

Uploaded by

Shuvo

CHAPTER FOUR

CORRELATION

1. Define correlation and correlation coefficient.

Ans: Correlation: Correlation measures the strength of the linear relationship between two or
more variables. If a change in one variable is accompanied by a change in the other variable,
the two variables are said to be correlated.

For example, the production of paddy is dependent on the rainfall. Here production of
paddy is considered to be a dependent variable.

Correlation coefficient: The numerical value by which we measure the strength of the
linear relationship between two or more variables is called the correlation coefficient.

Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the correlation
coefficient between x and y is denoted by rxy and defined as

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$

Equation (1) is also called Karl Pearson's coefficient of correlation; the formula dates
from the 1890s.

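Algebraically, (1) reduces to

$$ r_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}\left\{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right\}}} $$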
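The definition in (1) and its reduced form can be checked against each other numerically. The following is a minimal Python sketch, not part of the original notes; the function names `pearson_r_deviation` and `pearson_r_sums` and the five sample pairs are illustrative assumptions.

```python
from math import sqrt, isclose

def pearson_r_deviation(x, y):
    """Equation (1): products of deviations from the means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

def pearson_r_sums(x, y):
    """Reduced form: needs only the raw sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Hypothetical data, chosen only so the arithmetic is easy to follow by hand.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

r_deviation = pearson_r_deviation(x, y)
r_sums = pearson_r_sums(x, y)
print(round(r_deviation, 4), round(r_sums, 4))
```

Both functions should agree up to floating-point rounding. The reduced form is the one that suits a hand-computed table of column totals.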
2. Types of Correlation.
There are three types of correlation. They are

(i) Simple correlation;
(ii) Partial correlation;
(iii) Multiple correlation.

(i) Simple correlation:


Correlation only between two variables is called simple correlation.

For example, correlation between income and expenditure.

Simple correlation is of three types:

(i) Positive correlation


(ii) Negative correlation
(iii) Zero correlation.

3. METHODS OF STUDYING SIMPLE CORRELATION

(i) Scatter Diagram method;

(ii) Karl Pearson’s Coefficient of correlation;


(iii) Spearman’s Rank Correlation and
(iv) Method of least squares.

SCATTER DIAGRAM METHOD

4. What is a scatter diagram? Interpret the different values of r with the help of
scatter diagrams.
Scatter diagram: The diagrammatic way of representing bivariate data is called a scatter
diagram.
Suppose (x1,y1), (x2,y2), ..., (xn,yn) are n pairs of observations. If the values of the
variables x and y are plotted along the x-axis and y-axis respectively in the xy-plane, the
diagram of dots so obtained is known as a scatter diagram.

Scatter diagrams for different values of r are as follows:

Fig 1.1 Scatter diagram showing r = +1
Fig 1.2 Scatter diagram showing r = -1
Fig 1.3 Scatter diagram showing 0 < r < 1
Fig 1.4 Scatter diagram showing -1 < r < 0
Fig 1.5 Scatter diagram showing r = 0
Fig 1.6 Scatter diagram showing r = 0

Interpretation of the values of r

1. r = +1 indicates a perfect positive linear relationship between x and y. The scatter
diagram will be as in Fig. 1.1.
2. r = -1 indicates a perfect negative linear relationship between x and y. The scatter
diagram will be as in Fig. 1.2.
3. r = 0 means there is no linear relationship between x and y. In this case the two
variables are linearly uncorrelated, although a non-linear relationship may still exist.
The scatter diagram will be as in Fig. 1.5 and 1.6.
4. 0 < r < 1 indicates a positive relationship between x and y. In this case the scatter
diagram will be as in Fig. 1.3.
5. -1 < r < 0 indicates a negative relationship between x and y. In this case the scatter
diagram will be as in Fig. 1.4.
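The extreme cases can be illustrated numerically. The sketch below is an assumption-laden illustration rather than part of the notes: the helper `pearson_r` and the three made-up patterns (a rising line, a falling line and a symmetric curve) are hypothetical and chosen to produce r = +1, r = -1 and r = 0.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

x = [-2, -1, 0, 1, 2]

# Hypothetical point patterns corresponding to three scatter diagram shapes.
patterns = {
    "rising line y = 2x + 1": [2 * v + 1 for v in x],
    "falling line y = -3x + 4": [-3 * v + 4 for v in x],
    "curve y = x^2": [v * v for v in x],
}

results = {label: pearson_r(x, y) for label, y in patterns.items()}
for label, r in results.items():
    print(f"{label}: r = {r:.2f}")
```

The third pattern is worth noting: y is completely determined by x, yet r is zero because the relationship is not linear.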

5. Write down the properties of the correlation coefficient.

1. The correlation coefficient is independent of change of origin and scale of measurement.

2. The correlation coefficient lies between -1 and +1, i.e., -1 ≤ rxy ≤ 1.

3. The correlation coefficient is symmetric, i.e., rxy = ryx.

4. The correlation coefficient is the geometric mean of the regression coefficients,
i.e., rxy = ±√(byx × bxy), where the sign is that of the regression coefficients.

5. For two independent variables the correlation coefficient is zero.

6. It is always unit free.
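Properties 1 and 3 can be checked numerically. This is a small sketch under stated assumptions: the data, the shift and scale constants and the helper `pearson_r` are all hypothetical, and both scale factors are taken as positive (a negative scale factor on one variable would flip the sign of r).

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical observations.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

# Change of origin and scale: u = (x - 6) / 2 and v = 10 * (y - 3).
u = [(xi - 6) / 2 for xi in x]
v = [10 * (yi - 3) for yi in y]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)   # symmetry: swap the roles of x and y
r_uv = pearson_r(u, v)   # invariance: transformed variables

print(round(r_xy, 4), round(r_yx, 4), round(r_uv, 4))
```

All three printed values should coincide, which also illustrates why r carries no units.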

6. Show that the correlation coefficient lies between -1 and +1, i.e., -1 ≤ rxy ≤ 1.

Proof: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the
correlation coefficient between x and y is denoted by rxy and defined as

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$

Suppose (xi − x̄) = X and (yi − ȳ) = Y. Therefore

$$ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} $$

Let us consider the following expression, which is a sum of squares and hence always non-negative:

$$ \sum \left( \frac{X}{\sqrt{\sum X^2}} \pm \frac{Y}{\sqrt{\sum Y^2}} \right)^2 \ge 0 $$

$$ \text{or, } \sum \left( \frac{X^2}{\sum X^2} \pm 2\,\frac{XY}{\sqrt{\sum X^2}\sqrt{\sum Y^2}} + \frac{Y^2}{\sum Y^2} \right) \ge 0 $$

$$ \text{or, } \frac{\sum X^2}{\sum X^2} \pm 2\,\frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} + \frac{\sum Y^2}{\sum Y^2} \ge 0 $$

or, 1 ± 2r + 1 ≥ 0

or, 2(1 ± r) ≥ 0

or, (1 ± r) ≥ 0 ......(i)

From (i), 1 + r ≥ 0 [considering the +ve sign]

or, r ≥ -1

or, -1 ≤ r ......(ii)

and 1 - r ≥ 0 [considering the -ve sign]

or, 1 ≥ r

or, r ≤ 1 ......(iii)

From (ii) and (iii) we get -1 ≤ r ≤ 1,

i.e., the correlation coefficient lies between -1 and +1.

7. Show that for two independent variables the correlation coefficient is zero.

Proof: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations, where the
arithmetic mean of xi is x̄ and that of yi is ȳ. Since x and y are independent, their covariance is zero:

$$ \mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 0 $$

$$ \text{or, } \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 0 $$

We know,

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{0}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = 0 \quad \text{(proved)} $$

8. Question: Uses of the correlation coefficient.

1. To find the relationship between two variables.
2. To find the relationship between a dependent variable and the combined influence of a
group of independent variables.
3. To solve many problems in biology.
4. In social studies, for example the relationship between crime and education, correlation
analysis has a definite role to play.
5. It is widely used in economics.

9. Comment on the following:

(i) r=0 (ii) r=-1 (iii) r=1 (iv) r ≥ 1 (v) r < -1

(i) r=0 indicates that there is no linear relationship between x and y.

(ii) r=-1 indicates that the correlation between x and y is perfect negative.

(iii) r=1 indicates that the correlation between x and y is perfect positive.

(iv) r ≥ 1 means r=1 or r>1. r=1 indicates perfect positive correlation, whereas
r>1 is not possible because the correlation coefficient lies between -1 and +1.

(v) r < -1 is not possible because the correlation coefficient lies between -1 and +1.

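Application Problem-1: If y = mx + c (with m ≠ 0), find the correlation coefficient between
x and y.

Solution: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the
correlation coefficient between x and y is denoted by rxy and defined as

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$

Now, y = mx + c ......(2), so that ȳ = m x̄ + c.

Therefore,

$$ \begin{aligned}
r_{xy} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(m x_i + c - m\bar{x} - c)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (m x_i + c - m\bar{x} - c)^2}} \\
&= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(m x_i - m\bar{x})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (m x_i - m\bar{x})^2}} \\
&= \frac{m \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{|m| \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}} \\
&= \frac{m}{|m|} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{m}{|m|}
\end{aligned} $$

Since the square root of m² is |m|, the sign of the slope is retained. Hence rxy = +1 when
m > 0 and rxy = -1 when m < 0.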

Application Problem-2: A research physician recorded the reduction in pulse rate and the
temperature of the water when the faces of ten small children were submerged in cold water
to control abnormally rapid heartbeats. The results are presented in the following table.
Calculate the correlation coefficient between the temperature of the water and the reduction
in pulse rate.

Temperature of water:     68  65  70  62  60  55  58  65  69  63
Reduction in pulse rate:   2   5   1  10   9  13  10   3   4   6

Solution: Table for calculating the correlation coefficient.

  xi    yi    xi²    yi²   xiyi
  68     2   4624      4    136
  65     5   4225     25    325
  70     1   4900      1     70
  62    10   3844    100    620
  60     9   3600     81    540
  55    13   3025    169    715
  58    10   3364    100    580
  65     3   4225      9    195
  69     4   4761     16    276
  63     6   3969     36    378
Totals: ∑xi = 635, ∑yi = 63, ∑xi² = 40537, ∑yi² = 541, ∑xiyi = 3835

We know,

$$ r_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\left\{\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right\}\left\{\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right\}}} = \frac{3835 - \frac{635 \times 63}{10}}{\sqrt{\left\{40537 - \frac{(635)^2}{10}\right\}\left\{541 - \frac{(63)^2}{10}\right\}}} = \frac{-165.5}{\sqrt{214.5 \times 144.1}} = -0.94 $$

The result, r = -0.94, indicates a strong negative correlation between the temperature of
the water and the reduction in pulse rate.
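The column totals and the final value can be re-computed mechanically. The following Python sketch reproduces the tabular method for the data of this problem; the variable names are illustrative choices, not part of the notes.

```python
from math import sqrt

# Data from the problem statement.
temperature = [68, 65, 70, 62, 60, 55, 58, 65, 69, 63]   # x
pulse_drop = [2, 5, 1, 10, 9, 13, 10, 3, 4, 6]            # y
n = len(temperature)

# Column totals of the computation table.
sum_x = sum(temperature)
sum_y = sum(pulse_drop)
sum_x2 = sum(x * x for x in temperature)
sum_y2 = sum(y * y for y in pulse_drop)
sum_xy = sum(x * y for x, y in zip(temperature, pulse_drop))

# Reduced (raw-sum) form of the correlation coefficient.
numerator = sum_xy - sum_x * sum_y / n
denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 2))
```

Working from the raw totals mirrors the hand calculation, so any slip in a column sum shows up immediately.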
Assignment problem-1: Compute r for the following paired sets of values:

i.(x, y): (1,2) , (2, 3), (3, 5), (4, 4), (5, 7)

ii. (x, y): (1,1) , (2, 3), (3, 5), (4, 7), (5, 9)

iii.(x, y): (1,10) , (2, 8), (3, 6), (4, 4), (5, 2)

iv.(x, y): (2,9) , (3, 5), (4, 6), (5, 2), (6, 1)

v.(x, y): (-2,4) , (-1, 1), (0, 0), (1, 1), (2, 4)

Solution 1: (x, y): (1,2), (2, 3), (3, 5), (4, 4), (5, 7)

The formula for finding the correlation coefficient is

$$ r_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\left\{\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right\}\left\{\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right\}}} $$

Let us make a table to calculate the correlation coefficient.

  xi    yi    xi²    yi²   xiyi
   1     2      1      4      2
   2     3      4      9      6
   3     5      9     25     15
   4     4     16     16     16
   5     7     25     49     35
Totals: ∑xi = 15, ∑yi = 21, ∑xi² = 55, ∑yi² = 103, ∑xiyi = 74

$$ r_{xy} = \frac{74 - \frac{15 \times 21}{5}}{\sqrt{\left\{55 - \frac{(15)^2}{5}\right\}\left\{103 - \frac{(21)^2}{5}\right\}}} = \frac{11}{\sqrt{10 \times 14.8}} = 0.90 $$

Comment: There exists a strong positive relationship between x and y.

Problems (ii) to (v) above are left as an assignment.
Assignment Problem-2: The following table gives the ages and blood pressures of 10
women:

Age in years (x):     56   42   36   47   49   42   72   63   55   60
Blood pressure (y):  147  125  118  128  125  140  155  160  149  150

(i) Draw a scatter diagram.
(ii) Find the correlation coefficient between x and y and comment.

Ans: Try yourself.

RANK CORRELATION
Rank correlation: In some situations it is difficult to measure the values of the variables
of a bivariate distribution numerically, but they can be ranked. The correlation coefficient
between the two sets of ranks is called the rank correlation coefficient, given by Spearman
(1904). It is denoted by R. It is particularly useful for measuring the relationship between
two qualitative characteristics such as beauty, honesty, intelligence, efficiency and so on.

When there are no ties, Spearman's rank correlation coefficient is computed by the formula

$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

Here, R = rank correlation coefficient, n = number of pairs of observations being ranked, and

d = difference between the rank of x and the rank of y.

Remarks:

(i) We always have ∑di = ∑(R1 − R2) = 0.

(ii) Like the simple correlation coefficient, the rank correlation coefficient lies between -1 and +1.
Note: For finding rank correlation coefficient, we may have two types of data:

(i) Actual observations are given


(ii) Actual ranks are given

Example-1: Obtain the rank correlation coefficient for the following data:
A: 80 75 90 70 65 60
B: 65 70 60 75 85 80

Solution: Here the ranks of the scores are not given. Let us rank both variables starting
from the highest value, as shown in the table below:

   A     B    Rank of A (x)   Rank of B (y)   d = x - y    d²
  80    65         2               5             -3         9
  75    70         3               4             -1         1
  90    60         1               6             -5        25
  70    75         4               3              1         1
  65    85         5               1              4        16
  60    80         6               2              4        16
Total: ∑di = 0, ∑di² = 68

$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 68}{6(6^2 - 1)} = 1 - \frac{408}{210} = -0.94 $$

Conclusion: There exists a strong negative relationship between A and B.
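The ranking and the formula for untied data can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `rank_descending` and `spearman_no_ties` are assumptions, and the ranking routine assumes that no value is repeated.

```python
def rank_descending(values):
    """Rank 1 goes to the largest value; assumes no repeated values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_no_ties(x, y):
    """Return (R, d, sum of d^2) using R = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rank_x = rank_descending(x)
    rank_y = rank_descending(y)
    d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
    sum_d2 = sum(di ** 2 for di in d)
    R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return R, d, sum_d2

# Data of Example-1.
A = [80, 75, 90, 70, 65, 60]
B = [65, 70, 60, 75, 85, 80]

R, d, sum_d2 = spearman_no_ties(A, B)
print(d, sum(d), sum_d2, round(R, 2))
```

Checking that the differences d sum to zero is a quick guard against a ranking mistake before the formula is applied.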

Example-2: Obtain the rank correlation coefficient for the following data:

Examiner    A   B   C   D   E
I           1   2   3   4   5
II          2   3   1   5   4

Solution: Here the ranks are given:

Ranking by examiner-I (R1)   Ranking by examiner-II (R2)   d = R1 - R2    d²
           1                             2                     -1          1
           2                             3                     -1          1
           3                             1                      2          4
           4                             5                     -1          1
           5                             4                      1          1
Total: ∑di = 0, ∑di² = 8

$$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 0.6 $$

Comment: There is a moderate positive rank correlation between the rankings of the two
examiners.

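Repeated ranks or tied observations:

When ranks are repeated, the following formula is used for finding the rank correlation
coefficient:

$$ R = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right\}}{n(n^2 - 1)} $$

Here m1, m2, ... are the numbers of times the repeated values occur.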

Problems of equal ranks or ties in ranks:

Example-3: The following data refer to the marks obtained by 8 students in mathematics
and statistics:

Marks in mathematics:  20  80  40  12  28  20  15  60
Marks in statistics:   30  60  20  30  50  30  40  10

Compute the rank correlation coefficient and comment.

Solution: Let the marks obtained in mathematics be x and the marks obtained in statistics
be y.

Table for computation of rank correlation.

   x     y    Rank of x (R1)   Rank of y (R2)   d = R1 - R2     d²
  20    30        3.5               4              -0.5        0.25
  80    60        8                 8               0          0
  40    20        6                 2               4         16
  12    30        1                 4              -3          9
  28    50        5                 7              -2          4
  20    30        3.5               4              -0.5        0.25
  15    40        2                 6              -4         16
  60    10        7                 1               6         36
Total: ∑di² = 81.5

Here, m1 = 2 (the value 20 occurs twice in x), m2 = 3 (the value 30 occurs three times in y)
and n = 8.

$$ R = 1 - \frac{6\left\{81.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right\}}{8(8^2 - 1)} = 1 - \frac{6 \times 84}{504} = 1 - 1 = 0 $$

Comment: There is no rank correlation between the marks in mathematics and the marks in
statistics.
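The tie-corrected formula, with correction terms (m³ − m)/12, can be sketched as follows. This is a hedged illustration: the helper names are assumptions, ranks are assigned in ascending order with tied values sharing the average rank, and the last statistics mark is taken as 10, as in the computation table.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # first position occupied by v
        count = ordered.count(v)          # how many times v occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(x, y):
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    correction = tie_correction(x) + tie_correction(y)
    R = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))
    return R, sum_d2, correction

# Data of Example-3 (last statistics mark assumed to be 10, per the table).
maths = [20, 80, 40, 12, 28, 20, 15, 60]
stats = [30, 60, 20, 30, 50, 30, 40, 10]

R, sum_d2, correction = spearman_with_ties(maths, stats)
print(average_ranks(maths), average_ranks(stats))
print(sum_d2, correction, R)
```

For these data the sum of squared differences is 81.5 and the two correction terms add 0.5 and 2, so the bracket equals 84 and R works out to zero.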

Assignment problem-3:

The following figures relate to advertisement expenditure and profit:

Profit (Tk. Crore): x        25  28  27  33  31  10  16  16  18  23
Adv. Exp. (Tk. Lakh): y      87  91  92  95  93  52  68  72  78  86

(i) Draw a scatter diagram and comment.

(ii) Calculate Karl Pearson's and Spearman's rank correlation coefficients and comment.

Assignment problem-4:

The following figures relate to advertisement expenditure and sales of a company:

Adv. Exp. (Tk. Lakh):   62  67  73  78  85  78  91  92  96  98
Sales (Tk. Crore):      11  13  17  18  21  24  21  27  26  21

Calculate Karl Pearson's correlation coefficient and Spearman's rank correlation
coefficient and comment.

Regression
1. What is regression?
Ans: The probable movement of one variable in terms of the other variables is called
regression.

In other words, regression is the statistical technique by which we can estimate the unknown
value of one variable (the dependent variable) from the known value of another variable
(the independent variable).

The term "regression" was first used by the famous biometrician Sir F. Galton (1822-1911) in
1877.

Example: The production of paddy, y, depends on the amount of rainfall, x.
Here x is the independent variable and y is the dependent variable.

2. Define regression analysis.


Ans: Regression analysis is a mathematical measure of the average relationship between
two or more variables in terms of the original units of data.

3. Define regression coefficient.


Ans: The mathematical measures of regression are called regression coefficients.

Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the regression
coefficient of y on x is denoted by byx and defined by

$$ b_{yx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

Again, the regression coefficient of x on y is denoted by bxy and defined by

$$ b_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

4. Regression lines:
If we consider two variables X and Y, we have two regression lines: the regression line
of Y on X and the regression line of X on Y. The regression line of Y on X gives the most
probable values of Y for given values of X, and the regression line of X on Y gives the
most probable values of X for given values of Y. However, when there is either perfect
positive or perfect negative correlation between the two variables, the two regression
lines coincide, i.e., we have only one line.

5. Regression equation:
The regression equation of y on x is expressed as follows:

y = a + bx, where y is the dependent variable to be estimated, x is the independent
variable, a is the intercept term and b is the slope of the line.

Here,

$$ a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$

and

$$ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} $$

Similarly, the regression equation of x on y is expressed as follows:

x = a + by, where x is the dependent variable to be estimated, y is the independent
variable, a is the intercept term and b is the slope of the line.

Here,

$$ a = \bar{x} - b\bar{y} $$

and

$$ b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}} $$
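Both regression equations use the same recipe with the roles of the variables exchanged, which the following Python sketch makes explicit. The function name `fit_line` is a hypothetical helper, and the five data pairs are the ones used in Solution 1 of the correlation assignment.

```python
from math import isclose

def fit_line(u, v):
    """Least-squares line v = a + b*u from raw sums; returns (a, b)."""
    n = len(u)
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(ui * vi for ui, vi in zip(u, v))
    sum_u2 = sum(ui ** 2 for ui in u)
    b = (sum_uv - sum_u * sum_v / n) / (sum_u2 - sum_u ** 2 / n)
    a = sum_v / n - b * sum_u / n
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 7]

a_yx, b_yx = fit_line(x, y)   # regression of y on x: y = a + b*x
a_xy, b_xy = fit_line(y, x)   # regression of x on y: x = a + b*y

print(f"y on x: y = {a_yx:.3f} + {b_yx:.3f} x")
print(f"x on y: x = {a_xy:.3f} + {b_xy:.3f} y")
```

Because a is defined through the means, each fitted line passes through the point (x̄, ȳ), which is a convenient check on hand calculations.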

6. Write down the properties of the regression coefficient.

Ans: 1. The regression coefficient is independent of change of origin but not of scale.

2. The regression coefficient lies between -∞ and +∞, i.e., -∞ < byx < ∞.

3. The regression coefficient is not symmetric, i.e., bxy ≠ byx in general.

4. The geometric mean of the regression coefficients is equal to the correlation coefficient,
i.e., rxy = ±√(byx × bxy), where the sign is that of the regression coefficients.

5. The arithmetic mean of the two regression coefficients is greater than or equal to the
correlation coefficient when the coefficients are positive, i.e., (byx + bxy)/2 ≥ rxy.

6. If one of the regression coefficients is greater than unity, the other must be less than
unity, e.g., if byx > 1 then bxy < 1.

7. The regression coefficient is not a pure number; it depends on the units of measurement.

Coefficient of Determination, r² or R²:

- The coefficient of determination, r², is useful because it gives the proportion of
the variance (fluctuation) of one variable that is predictable from the other variable.
- It is a measure that allows us to judge how much confidence can be placed in
predictions made from a given model or graph.
- The coefficient of determination is the ratio of the explained variation to the total
variation.
- The coefficient of determination satisfies 0 ≤ r² ≤ 1 and denotes the strength
of the linear association between x and y.
- The coefficient of determination represents the proportion of the variation in the data
that is accounted for by the line of best fit. For example, if r = 0.922, then r² = 0.850,
which means that 85% of the total variation in y can be explained by the linear relationship
between x and y (as described by the regression equation). The other 15% of the total
variation in y remains unexplained.
- The coefficient of determination is a measure of how well the regression line
represents the data. If the regression line passed exactly through every point on
the scatter plot, it would explain all of the variation. The further the line is
away from the points, the less it is able to explain.
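The statement that r² is the ratio of explained to total variation can be verified on a small example. The sketch below is illustrative only; the five data pairs are reused from the correlation assignment and the variable names are assumptions.

```python
from math import sqrt, isclose

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)

# Regression line of y on x and its fitted values.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

total = s_yy                                                   # total variation
explained = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the line
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual variation

r = s_xy / sqrt(s_xx * s_yy)
r_squared = explained / total

print(round(r, 3), round(r_squared, 3), round(unexplained / total, 3))
```

The explained and unexplained parts add up to the total variation, so the unexplained share is simply 1 − r².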

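7. Show that the correlation coefficient is the geometric mean of the
regression coefficients, i.e., rxy = ±√(byx × bxy).

Proof: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the
correlation coefficient between x and y is denoted by rxy and defined as

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (1) $$

Again, the regression coefficient of y on x is

$$ b_{yx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} $$

and the regression coefficient of x on y is

$$ b_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} $$

Multiplying,

$$ b_{yx} \times b_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \times \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = r_{xy}^2 $$

Taking the square root with the common sign of byx, bxy and rxy (all three share the sign
of ∑(xi − x̄)(yi − ȳ)),

$$ \pm\sqrt{b_{yx} \times b_{xy}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} = r_{xy} \quad \text{(proved)} $$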

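8. Show that the arithmetic mean of the two regression coefficients is greater than or
equal to the correlation coefficient, i.e., (byx + bxy)/2 ≥ rxy.

Proof: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the
regression coefficient of y on x is denoted by byx and the regression coefficient of x on y
is denoted by bxy.

The arithmetic mean of byx and bxy is

$$ \text{A.M} = \frac{b_{yx} + b_{xy}}{2} $$

and the geometric mean is

$$ \text{G.M} = \sqrt{b_{yx} \times b_{xy}} $$

We know the correlation coefficient is the geometric mean of the regression coefficients,
i.e., for positive coefficients,

$$ r_{xy} = \sqrt{b_{yx} \times b_{xy}} $$

Since A.M ≥ G.M for positive numbers,

$$ \frac{b_{yx} + b_{xy}}{2} \ge \sqrt{b_{yx} \times b_{xy}} $$

$$ \text{or, } \frac{b_{yx} + b_{xy}}{2} \ge r_{xy} \quad \text{(proved)} $$

When both regression coefficients are negative, the same argument applied to their absolute
values gives |byx + bxy|/2 ≥ |rxy|.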
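Both mean properties can be checked on the two data sets used in this chapter, one with positive and one with negative coefficients. This is a sketch under assumptions: the helper `coefficients` and the dictionary layout are illustrative, and the geometric mean is given the sign of byx so that it can be compared with r directly.

```python
from math import sqrt, copysign, isclose

def coefficients(x, y):
    """Return (b_yx, b_xy, r) computed from raw sums."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / s_xx, s_xy / s_yy, s_xy / sqrt(s_xx * s_yy)

datasets = {
    "ages of husbands and wives": (
        [39, 25, 29, 35, 32, 27, 37],
        [37, 18, 20, 25, 25, 20, 30],
    ),
    "water temperature and pulse reduction": (
        [68, 65, 70, 62, 60, 55, 58, 65, 69, 63],
        [2, 5, 1, 10, 9, 13, 10, 3, 4, 6],
    ),
}

checks = {}
for name, (x, y) in datasets.items():
    b_yx, b_xy, r = coefficients(x, y)
    signed_gm = copysign(sqrt(b_yx * b_xy), b_yx)   # geometric mean with sign
    am = (b_yx + b_xy) / 2                          # arithmetic mean
    checks[name] = (signed_gm, am, r)
    print(f"{name}: G.M = {signed_gm:.3f}, A.M = {am:.3f}, r = {r:.3f}")
```

In the second data set the arithmetic mean is more negative than r, which is why the inequality is stated in absolute value when the coefficients are negative.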

9. Write down the uses of regression.

Ans: (i) To determine whether a relationship exists between the variables.

(ii) To find the strength of the relationship.

(iii) To determine the mathematical equation relating the variables.

(iv) To predict the values of the dependent variable.

10. Distinguish between the correlation coefficient and the regression coefficient.

1. Correlation coefficient: The numerical value by which we measure the strength of the
linear relationship between two or more variables is called the correlation coefficient.
Regression coefficient: The mathematical measures of regression are called regression
coefficients.

2. Correlation coefficient: It is independent of change of origin and scale of measurement.
Regression coefficient: It is independent of change of origin but not of scale.

3. Correlation coefficient: It lies between -1 and +1, i.e., -1 ≤ rxy ≤ 1.
Regression coefficient: It lies between -∞ and +∞, i.e., -∞ < byx < ∞.

4. Correlation coefficient: It is symmetric, i.e., rxy = ryx.
Regression coefficient: It is not symmetric, i.e., bxy ≠ byx in general.

5. Correlation coefficient: It is always unit free.
Regression coefficient: It is not a pure number.

6. Correlation coefficient: When r = 0 the variables are uncorrelated.
Regression coefficient: When r = 0 the two lines of regression are perpendicular to each other.

Application problem-1: A researcher wants to find out if there is any relationship
between the ages of husbands and the ages of wives. In other words, do old husbands
have old wives and young husbands have young wives? He took a random sample of 7
couples whose respective ages are given below:

Age of husband (in years): x   39  25  29  35  32  27  37
Age of wife (in years): y      37  18  20  25  25  20  30

(a) Compute the regression line of y on x.
(b) Predict the age of a wife whose husband's age is 45 years.
(c) Find the regression line of x on y and estimate the age of the husband if the age of his
wife is 28 years.
(d) Compute the value of the correlation coefficient with the help of the regression
coefficients.

Solution: The equation of the best-fitted regression line of y on x is ŷ = a + bx,

where

$$ b = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \quad \text{and} \quad a = \bar{y} - b\bar{x} $$

Computation table

   x     y     x²     y²     xy
  39    37   1521   1369   1443
  25    18    625    324    450
  29    20    841    400    580
  35    25   1225    625    875
  32    25   1024    625    800
  27    20    729    400    540
  37    30   1369    900   1110
Totals: ∑x = 224, ∑y = 175, ∑x² = 7334, ∑y² = 4643, ∑xy = 5798

(a) Here,

$$ b = \frac{5798 - \frac{(224)(175)}{7}}{7334 - \frac{(224)^2}{7}} = \frac{198}{166} = 1.193 $$

and

$$ a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{175}{7} - (1.193)\frac{224}{7} = 25 - 38.176 = -13.176 $$

Hence the fitted regression line is ŷ = a + bx = -13.176 + 1.193x

(b) Hence, if the age of the husband is 45, the probable age of the wife would be
ŷ = -13.176 + 1.193x = -13.176 + 1.193 × 45 = 40.51 years.

(c) The equation of the best-fitted regression line of x on y is x̂ = a + by,

where

$$ b = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} = \frac{5798 - \frac{(224)(175)}{7}}{4643 - \frac{(175)^2}{7}} = \frac{198}{268} = 0.739 $$

and

$$ a = \bar{x} - b\bar{y} = \frac{\sum x}{n} - b\,\frac{\sum y}{n} = \frac{224}{7} - 0.739 \times \frac{175}{7} = 32 - 18.475 = 13.525 $$

Hence the fitted regression line is x̂ = a + by = 13.525 + 0.739y

Hence, if the age of the wife is 28 years, the estimated age of the husband is

x̂ = a + by

= 13.525 + (0.739)(28) = 34.22 years.

(d) Both regression coefficients are positive, so
rxy = √(byx × bxy) = √(1.193 × 0.739) = 0.94.
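The arithmetic of this solution can be re-checked with a short script. The sketch below works from the raw sums in the computation table; variable names are illustrative, and because it keeps full precision for the slopes, its intercepts can differ in the second or third decimal place from hand values obtained with a rounded slope.

```python
from math import sqrt

husband = [39, 25, 29, 35, 32, 27, 37]   # x
wife = [37, 18, 20, 25, 25, 20, 30]      # y
n = len(husband)

sum_x, sum_y = sum(husband), sum(wife)
sum_x2 = sum(x * x for x in husband)
sum_y2 = sum(y * y for y in wife)
sum_xy = sum(x * y for x, y in zip(husband, wife))

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

# (a) Regression line of y on x.
b_yx = s_xy / s_xx
a_yx = sum_y / n - b_yx * sum_x / n
# (b) Predicted age of the wife when the husband is 45.
wife_at_45 = a_yx + b_yx * 45

# (c) Regression line of x on y and predicted husband's age for a wife aged 28.
b_xy = s_xy / s_yy
a_xy = sum_x / n - b_xy * sum_y / n
husband_at_28 = a_xy + b_xy * 28

# (d) Correlation coefficient from the two (positive) regression coefficients.
r = sqrt(b_yx * b_xy)

print(round(b_yx, 3), round(wife_at_45, 2))
print(round(b_xy, 3), round(husband_at_28, 2))
print(round(r, 2))
```

The predictions agree with the hand calculation to two decimal places even though the intercepts are carried at full precision here.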

Application problem-2: A research physician recorded the reduction in pulse rate and the
temperature of the water when the faces of ten small children were submerged in cold water
to control abnormally rapid heartbeats. The results are presented in the following table.
Calculate the correlation coefficient and the regression coefficients between the temperature
of the water and the reduction in pulse rate. Also show that |byx + bxy|/2 ≥ |rxy|.

Temperature of water:     68  65  70  62  60  55  58  65  69  63
Reduction in pulse rate:   2   5   1  10   9  13  10   3   4   6

Solution: Table for calculating the correlation coefficient and the regression coefficients.

  xi    yi    xi²    yi²   xiyi
  68     2   4624      4    136
  65     5   4225     25    325
  70     1   4900      1     70
  62    10   3844    100    620
  60     9   3600     81    540
  55    13   3025    169    715
  58    10   3364    100    580
  65     3   4225      9    195
  69     4   4761     16    276
  63     6   3969     36    378
Totals: ∑xi = 635, ∑yi = 63, ∑xi² = 40537, ∑yi² = 541, ∑xiyi = 3835

We know,

$$ r_{xy} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\left\{\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right\}\left\{\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right\}}} = \frac{3835 - \frac{635 \times 63}{10}}{\sqrt{\left\{40537 - \frac{(635)^2}{10}\right\}\left\{541 - \frac{(63)^2}{10}\right\}}} = -0.94 $$

We know the regression coefficient of y on x is

$$ b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{3835 - \frac{635 \times 63}{10}}{40537 - \frac{(635)^2}{10}} = \frac{-165.5}{214.5} = -0.77 $$

Again, the regression coefficient of x on y is

$$ b_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} = \frac{3835 - \frac{635 \times 63}{10}}{541 - \frac{(63)^2}{10}} = \frac{-165.5}{144.1} = -1.15 $$

Here,

$$ \frac{b_{yx} + b_{xy}}{2} = \frac{(-0.77) + (-1.15)}{2} = -0.96 $$

Both regression coefficients are negative, so the comparison is made in absolute value:
|−0.96| = 0.96 ≥ 0.94 = |rxy|.

Assignment Problem-1: The following data give the test scores and sales made by nine
salesmen during the last year at a big departmental store:

Test scores: y              14  19  24  21  26  22  15  20  19
Sales (in lakh Taka): x     31  36  48  37  50  45  33  41  39

(a) Find the regression equation of test scores on sales.
Ans: ŷ = -2.4 + 0.56x
(b) Estimate the test score when the sale is Tk. 40 lakh.
Ans: 20
(c) Find the regression equation of sales on test scores.
Ans: x̂ = 7.8 + 1.61y
(d) Predict the value of sales if the test score is 30.
Ans: 56.1 lakh
(e) Compute the value of the correlation coefficient with the help of the regression
coefficients.

Assignment Problem-2: The following table gives the ages and blood pressures of 10
women:

Age in years (x):     56   42   36   47   49   42   72   63   55   60
Blood pressure (y):  147  125  118  128  125  140  155  160  149  150

(i) Obtain the regression line of y on x. Ans: ŷ = 83.76 + 1.11x
(ii) Estimate the blood pressure of a woman whose age is 50 years. Ans: 139.26
(iii) Obtain the regression line of x on y.
(iv) Find the correlation coefficient between x and y and comment.

Assignment Problem-3: Consider the following data set on two variables x and y:

x: 1 2 3 4 5 6

y: 6 4 3 5 4 2

(a) Find the equation of the regression line of y on x. Ans: ŷ = 5.799 - 0.514x
(b) Graph the line on a scatter diagram.
(c) Estimate the value of y when x = 4.5. Ans: ŷ = 3.486
(d) Predict the value of y when x = 8. Ans: ŷ = 1.687

Assignment Problem-4: Cost accountants often estimate overhead based on production.
At the Standard Knitting Company, they have collected information on overhead expenses
and units produced at different plants and want to estimate a regression equation to
predict future overhead.

Units:      56   40   48   30   41   42   55   35
Overhead:  282  173  233  116  191  171  274  152

(i) Draw a scatter diagram and comment.

(ii) Fit a regression equation.

(iii) Estimate the overhead when 65 units are produced.

Assignment Problem-5: The following data refer to the annual sales (Tk.'000) and years of
experience of 8 salesmen of a super store:

Salesmen:                 1    2    3    4    5    6    7    8
Annual sales (Tk.'000):  90   75   78   86   95  110  130  145
Years of experience:      7    4    5    6   11   12   13   17

(i) Fit the two regression lines.

(ii) Estimate the sales when the years of experience are 10.

(iii) Estimate the years of experience for sales of Tk. 100,000.
