Unit III Notes
Types of Correlation
(1) Positive and negative
(2) Simple and multiple
(3) Linear and non linear
Negative correlation:- If the variables constantly deviate in opposite directions, i.e., if an increase (or
decrease) in one results in a corresponding decrease (or increase) in the other, the correlation is said to be
diverse or negative. E.g.:- (i) the price and demand of a commodity, (ii) the volume and pressure of a perfect
gas.
Multiple Correlation:- If we are studying three or more variables, it is called multiple
correlation. E.g.:- the yield of rice per acre studied against both the amount of rainfall and the amount of
fertilizers used.
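Linear Correlation:- If the amount of change in one variable bears a constant ratio to the amount of
change in the other variable, the correlation is said to be linear. E.g.:-
X 10 20 30 40 50
Y 70 140 210 280 350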
Non Linear Correlation (Curvilinear):- If the amount of change in one variable does not bear a constant
ratio to the amount of change in the other variable, the correlation is said to be non-linear.
E.g.:-
X 10 20 30 40 50
Y 20 10 200 80 50
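The difference between the two tables of X and Y values above can be checked numerically. The following is only an illustrative sketch; the helper name `has_constant_ratio` is an assumption made for this example and does not come from the notes. It compares the ratio (change in Y) / (change in X) over successive pairs of observations.

```python
def has_constant_ratio(xs, ys, tol=1e-9):
    """Return True if change-in-Y / change-in-X is the same for every consecutive pair."""
    points = list(zip(xs, ys))
    ratios = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return all(abs(r - ratios[0]) <= tol for r in ratios)

x = [10, 20, 30, 40, 50]
y_first = [70, 140, 210, 280, 350]   # first table: Y rises by 70 for every rise of 10 in X
y_second = [20, 10, 200, 80, 50]     # second table: the ratio keeps changing

print(has_constant_ratio(x, y_first))    # True
print(has_constant_ratio(x, y_second))   # False
```

A constant ratio means the points lie on a straight line, which is what the term linear correlation refers to.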
Scatter Diagram.
It is the simplest diagrammatic representation of bivariate (i.e. two-variable) data. For
the bivariate distribution (xi, yi); i = 1, 2, ..., n, if the values of the variables X and Y are plotted
along the x-axis and y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a
scatter diagram. From the scatter diagram we can form a fairly good, though vague, idea of whether
the variables are correlated or not: if the points are very dense, i.e. very close to each other, we
should expect a fairly good amount of correlation between the variables, and if the points are widely
scattered, a poor correlation is expected. This method, however, is not suitable if the number of
observations is fairly large.
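$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$$
where Cov(X,Y) is the covariance of (X,Y), and $\sigma_x$ and $\sigma_y$ are the standard deviations (S.D.) of X and Y respectively. Here
$$\mathrm{Cov}(X,Y) = \frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y})$$
$$\sigma_x^2 = \frac{\sum (X-\bar{X})^2}{n}, \qquad \sigma_y^2 = \frac{\sum (Y-\bar{Y})^2}{n}$$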
Remarks:-
(1) $-1 \le r_{xy} \le 1$
(2) When r = 1 there is perfect positive correlation between the variables.
(3) When r = -1 there is perfect negative correlation between the variables.
(4) When r = 0 there is no linear relation between the variables.
(5) It may be noted that r(X, Y) provides a measure of the linear relationship between X and Y. For a
nonlinear relationship, however, it is not very suitable.
(6) Sometimes we write: $\mathrm{Cov}(X,Y) = \sigma_{XY}$
(7) Karl Pearson's correlation coefficient is also called the product-moment correlation coefficient,
since
$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = \mu_{11}$
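Remarks (2) and (5) can be illustrated with a small numerical sketch. The function name `pearson_r` and the two toy data sets are assumptions chosen for illustration; they are not part of the notes. The function uses the divisor n, as in the formulas above.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = Cov(X, Y) / (sd_x * sd_y), with divisor n throughout."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [-2, -1, 0, 1, 2]
r_line = pearson_r(xs, [2 * x + 1 for x in xs])   # points exactly on a rising straight line
r_curve = pearson_r(xs, [x * x for x in xs])      # points exactly on a parabola

print(round(r_line, 6), round(r_curve, 6))
```

For the straight line r comes out as 1 (up to rounding), while for the parabola r is 0 even though Y is completely determined by X. This is why r = 0 should be read as "no linear relation" rather than "no relation".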
Eg:-Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons
(Y) :
X 65 66 67 67 68 69 70 72
Y 67 68 65 68 72 72 69 71
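Sol:- Calculation of the correlation coefficient (n = 8)
$$\mathrm{Cov}(X,Y) = \frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y}) = \frac{24}{8} = 3$$
$$\sigma_x^2 = \frac{\sum (X-\bar{X})^2}{n} = \frac{36}{8} = 4.5$$
$$\sigma_y^2 = \frac{\sum (Y-\bar{Y})^2}{n} = \frac{44}{8} = 5.5$$
$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$$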
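The arithmetic of the fathers-and-sons example can be reproduced step by step. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
import math

fathers = [65, 66, 67, 67, 68, 69, 70, 72]   # X, heights in inches
sons = [67, 68, 65, 68, 72, 72, 69, 71]      # Y, heights in inches
n = len(fathers)

mean_x = sum(fathers) / n   # 68.0
mean_y = sum(sons) / n      # 69.0

# Sums of products and squares of deviations from the means
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(fathers, sons))
sum_xx = sum((x - mean_x) ** 2 for x in fathers)
sum_yy = sum((y - mean_y) ** 2 for y in sons)

cov = sum_xy / n      # covariance
var_x = sum_xx / n    # variance of X
var_y = sum_yy / n    # variance of Y
r = cov / math.sqrt(var_x * var_y)

print(cov, var_x, var_y, round(r, 3))   # 3.0 4.5 5.5 0.603
```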
For example:- If we consider the relation between intelligence and beauty, it is not necessary that a
beautiful individual is also intelligent. Let (xi, yi); i = 1, 2, ..., n be the ranks of the ith individual in two
characteristics A and B respectively. The Pearson coefficient of correlation between the ranks xi's and yi's is
called the rank correlation coefficient between A and B for that group of individuals:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$
where $d_i = x_i - y_i$ is the difference between the ranks of the ith individual in the two
characteristics and n is the number of paired observations.
This method is useful when the given data are of a qualitative nature (e.g. honesty, beauty).
Eg:- The ranks of the same 16 students in Mathematics and Physics are as follows. The two numbers within
brackets denote the ranks of a student in Mathematics and Physics respectively.
(1,1) (2,10) (3,3) (4,4) (5,5) (6,7) (7,2) (8,6) (9,8) (10,11) (11,15) (12,9) (13,14) (14,12) (15,16) (16,13)
Calculate the rank correlation coefficient for proficiencies of this group in Mathematics and Physics.
Sol:- Rank correlation

Rank in Maths (X)   Rank in Physics (Y)   d = X - Y   d^2
 1                   1                     0           0
 2                  10                    -8          64
 3                   3                     0           0
 4                   4                     0           0
 5                   5                     0           0
 6                   7                    -1           1
 7                   2                     5          25
 8                   6                     2           4
 9                   8                     1           1
10                  11                    -1           1
11                  15                    -4          16
12                   9                     3           9
13                  14                    -1           1
14                  12                     2           4
15                  16                    -1           1
16                  13                     3           9

$\sum d^2 = 136$
$$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(136)}{16(255)} = 0.8$$
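The table and the final value can be checked with a few lines of code. This is a sketch of Spearman's formula for untied ranks; the function name `spearman_rho` is an illustrative choice.

```python
# (rank in Mathematics, rank in Physics) for the 16 students
pairs = [(1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
         (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16), (16, 13)]

def spearman_rho(rank_pairs):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid when no ranks are tied."""
    n = len(rank_pairs)
    sum_d2 = sum((x - y) ** 2 for x, y in rank_pairs)
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

sum_d2 = sum((x - y) ** 2 for x, y in pairs)
rho = spearman_rho(pairs)
print(sum_d2, round(rho, 3))   # 136 0.8
```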
Repeated Ranks:- If any two or more individuals are bracketed equal in any classification with respect to
characteristics A and B, or if there is more than one item with the same value in the series, then
Spearman's formula for calculating the rank correlation coefficient breaks down, since in this case each
of the variables X and Y does not assume the values 1, 2, ..., n and consequently $\bar{x} \ne \bar{y}$.
In this case, common ranks are given to the repeated items. This common rank is the average of the
ranks which these items would have assumed if they were slightly different from each other, and the next
item gets the rank next to the ranks already assumed. As a result of this, the following adjustment or
correction is made in the rank correlation formula.
In the formula, we add the factor $\frac{m(m^2-1)}{12}$ to $\sum d^2$, where m is the number of times an item is repeated.
This correction factor is to be added for each repeated value in both the X-series and the Y-series.
Eg:- Obtain the rank correlation coefficient for the following data:
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
Solution:- Calculation of rank correlation
In the X-series the value 75 occurs twice. The common rank given to these values is 2.5,
which is the average of 2 and 3. The next value, 68, then gets the next rank, which is 4.
The value 64 occurs thrice. The common rank given to it is 6, which is the average of 5, 6
and 7.
Similarly, in the Y-series the value 68 occurs twice and its common rank is 3.5, which is the average of 3
and 4.
In the X-series the correction is to be applied twice, once for the value 75, which occurs twice (m = 2), and
then for the value 64, which occurs thrice (m = 3). The total correction for the X-series is
$$\frac{2(4-1)}{12} + \frac{3(9-1)}{12} = \frac{5}{2}$$
Similarly, the correction for the Y-series is $\frac{2(4-1)}{12} = \frac{1}{2}$, as the value 68 occurs twice.
With these ranks, $\sum d^2 = 72$ and n = 10, so
$$\rho = 1 - \frac{6\left(\sum d^2 + \frac{5}{2} + \frac{1}{2}\right)}{n(n^2-1)} = 1 - \frac{6(75)}{10(99)} = 0.545$$
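The ranking with ties and the correction factors can be reproduced as follows. This is a sketch under the convention used in the solution (rank 1 goes to the largest value); the helper names `average_ranks` and `tie_correction` are assumptions made for illustration.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order; tied values share the average of the ranks they occupy."""
    ordered = sorted(values, reverse=True)
    first_pos = {}
    for pos, v in enumerate(ordered, start=1):
        first_pos.setdefault(v, pos)          # first position at which each value appears
    counts = Counter(values)
    # a value first seen at position p and repeated m times occupies ranks p .. p+m-1
    return [first_pos[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over the values; values occurring once contribute 0."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values())

X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

rx, ry = average_ranks(X), average_ranks(Y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(X)
rho = 1 - 6 * (sum_d2 + tie_correction(X) + tie_correction(Y)) / (n * (n * n - 1))

print(sum_d2, tie_correction(X), tie_correction(Y), round(rho, 3))   # 72.0 2.5 0.5 0.545
```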
Regression
The term "regression" literally means "stepping back towards the average". It was first used by the British
biometrician Sir Francis Galton (1822-1911), in connection with the inheritance of stature.
Curve of Regression:- If two variables x and y are correlated, i.e., there exists an association or relationship
between them, the scatter diagram will be more or less concentrated round a curve. This curve is called the
curve of regression, and the relationship is said to be expressed by means of curvilinear regression.
Regression Equation:- The mathematical equation of the regression curve is called the regression equation.
Linear Regression:- If the curve is a straight line, it is called the line of regression and there is said to be
linear regression between the variables; otherwise the regression is said to be curvilinear.
Regression Coefficients:- 'b', the slope of the line of regression of Y on X, is also called the coefficient of
regression of Y on X. It represents the increment in the value of the dependent variable Y corresponding to a
unit change in the value of the independent variable X. More precisely, we write
$$b_{YX} = \text{Regression coefficient of Y on X} = r\frac{\sigma_y}{\sigma_x}$$
where r is the correlation coefficient.
Similarly, the coefficient of regression of X on Y indicates the change in the value of variable X
corresponding to a unit change in the value of variable Y and is given by
$$b_{XY} = \text{Regression coefficient of X on Y} = r\frac{\sigma_x}{\sigma_y}$$
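Eg:- Prove that the correlation coefficient is the geometric mean between the regression coefficients.
Proof:- As we know,
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad b_{xy} = r\frac{\sigma_x}{\sigma_y}$$
Multiplying,
$$b_{yx}\, b_{xy} = r\frac{\sigma_y}{\sigma_x}\cdot r\frac{\sigma_x}{\sigma_y} = r^2$$
$$r = \pm\sqrt{b_{yx}\, b_{xy}}$$
the sign of r being the same as that of the regression coefficients.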
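The result can be checked numerically on the fathers-and-sons heights used earlier in these notes. The sketch below computes r and both regression coefficients from their definitions and then compares r with the signed geometric mean; the variable names are illustrative.

```python
import math

xs = [65, 66, 67, 67, 68, 69, 70, 72]   # fathers' heights (X)
ys = [67, 68, 65, 68, 72, 72, 69, 71]   # sons' heights (Y)
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)

r = cov / (sd_x * sd_y)
b_yx = r * sd_y / sd_x   # regression coefficient of Y on X
b_xy = r * sd_x / sd_y   # regression coefficient of X on Y

# Geometric mean of the two coefficients, carrying their common sign
geometric_mean = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(round(r, 6), round(geometric_mean, 6))
```

Both printed values agree, as the proof predicts.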
(1) Correlation coefficient is the geometric mean between the regression coefficients.
(2) If one of the regression coefficients is greater than unity the other must be less than unity
(3) The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient r, provided r > 0.
(4) Regression coefficients are independent of the change of origin but not of scale.
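(5) Angle between the two lines of regression: the equations of the two lines are
$$Y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{x}) \quad\text{and}\quad X - \bar{x} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{y})$$
If θ is the angle between the two lines of regression, then
$$\theta = \tan^{-1}\left\{\frac{1-r^2}{r}\left(\frac{\sigma_x \sigma_y}{\sigma_x^2+\sigma_y^2}\right)\right\}$$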
Cases:-
(i) (r = 0). If r = 0, tan θ = ∞ ⟹ θ = π/2.
Thus if the two variables are uncorrelated, the lines of regression become
perpendicular to each other.
(ii) (r = ±1). If r = ±1, tan θ = 0 ⟹ θ = 0 or π.
In this case the two lines of regression either coincide or are parallel to each
other. But since both the lines of regression pass through the point $(\bar{x}, \bar{y})$, they cannot be parallel; hence they coincide.
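The angle formula and its two special cases can be explored with a short function. This is a sketch: the name `regression_angle` is illustrative, and it is an assumption of this example that |r| is used in the denominator so that the acute angle is returned whatever the sign of r.

```python
import math

def regression_angle(r, sd_x, sd_y):
    """Acute angle (in radians) between the two lines of regression."""
    if r == 0:
        return math.pi / 2               # tan(theta) is infinite: perpendicular lines
    tan_theta = (1 - r * r) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return math.atan(tan_theta)

print(regression_angle(0, 3, 4))      # pi/2: uncorrelated variables
print(regression_angle(1, 3, 4))      # 0.0: perfect positive correlation, lines coincide
print(regression_angle(-1, 3, 4))     # 0.0: perfect negative correlation, lines coincide
print(regression_angle(0.6, 3, 4))    # an intermediate angle
```

As |r| moves from 0 towards 1 the angle shrinks from π/2 to 0, so the two lines close up as the correlation gets stronger.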
Eg.:- In a partially destroyed laboratory record of an analysis of correlation data, the following results
only are legible:
Variance of X = 9.
Regression equations: 8X - 10Y + 66 = 0, 40X - 18Y = 214.
What were (i) the mean values of X and Y,
(ii) the correlation coefficient between X and Y, and
(iii) the standard deviation of Y?
Solution:- (i) Since both the lines of regression pass through the point $(\bar{X}, \bar{Y})$, we have
$8\bar{X} - 10\bar{Y} + 66 = 0$ and $40\bar{X} - 18\bar{Y} = 214$. Solving these, $\bar{X} = 13$ and $\bar{Y} = 17$.
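One way to work through all three parts of the laboratory-record example is sketched below. It rests on results stated earlier in these notes: both lines pass through the means, r² is the product of the regression coefficients, and b_YX = r σy/σx. Deciding which equation is the line of Y on X by requiring the product of the coefficients not to exceed 1 is an assumption of this sketch.

```python
import math

# Each line is stored as (a, b, c), meaning a*X + b*Y = c.
line1 = (8.0, -10.0, -66.0)    # 8X - 10Y + 66 = 0
line2 = (40.0, -18.0, 214.0)   # 40X - 18Y = 214
var_x = 9.0

# (i) Both lines pass through (mean_x, mean_y): solve the 2x2 system by Cramer's rule.
(a1, b1, c1), (a2, b2, c2) = line1, line2
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det
mean_y = (a1 * c2 - a2 * c1) / det

# (ii) Try line1 as the line of Y on X and line2 as the line of X on Y.
b_yx = -a1 / b1    # slope when line1 is solved for Y
b_xy = -b2 / a2    # slope when line2 is solved for X
if b_yx * b_xy > 1:
    # r^2 cannot exceed 1, so the roles of the two lines must be the other way round
    b_yx = -a2 / b2
    b_xy = -b1 / a1
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# (iii) b_yx = r * sd_y / sd_x, so sd_y = b_yx * sd_x / r
sd_y = b_yx * math.sqrt(var_x) / r

print(mean_x, mean_y, round(r, 3), round(sd_y, 3))   # 13.0 17.0 0.6 4.0
```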
X 6 2 10 4 8
Y 9 11 5 8 7
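Sol:- Regression equation

X      Y      x = X - 6   y = Y - 8   xy     x^2    y^2
6      9       0           1            0      0      1
2      11     -4           3          -12     16      9
10     5       4          -3          -12     16      9
4      8      -2           0            0      4      0
8      7       2          -1           -2      4      1
Total
30     40      0           0          -26     40     20

$\bar{X} = \frac{30}{5} = 6$, $\bar{Y} = \frac{40}{5} = 8$

$$r = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{\frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y})}{\left(\frac{\sum (X-\bar{X})^2}{n}\right)^{1/2}\left(\frac{\sum (Y-\bar{Y})^2}{n}\right)^{1/2}} = \frac{\sum xy}{\left(\sum x^2\right)^{1/2}\left(\sum y^2\right)^{1/2}} = \frac{-26}{(40)^{1/2}(20)^{1/2}} \approx -0.919$$
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3$$
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65$$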
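The sums in the table and the two regression coefficients can be reproduced directly from the data. The sketch below also writes each line of regression through the point of means as a small prediction function; the function names `predict_y` and `predict_x` are assumptions added for illustration and are not part of the worked solution.

```python
import math

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n       # 6.0 and 8.0

dx = [x - mean_x for x in X]                  # deviations x = X - 6
dy = [y - mean_y for y in Y]                  # deviations y = Y - 8
sum_xy = sum(a * b for a, b in zip(dx, dy))   # -26.0
sum_x2 = sum(a * a for a in dx)               # 40.0
sum_y2 = sum(b * b for b in dy)               # 20.0

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
b_xy = sum_xy / sum_y2    # regression coefficient of X on Y
b_yx = sum_xy / sum_x2    # regression coefficient of Y on X

def predict_y(x):
    """Line of regression of Y on X: Y - mean_y = b_yx (X - mean_x)."""
    return mean_y + b_yx * (x - mean_x)

def predict_x(y):
    """Line of regression of X on Y: X - mean_x = b_xy (Y - mean_y)."""
    return mean_x + b_xy * (y - mean_y)

print(round(r, 3), b_xy, b_yx)   # -0.919 -1.3 -0.65
```

Both lines pass through the point of means, so `predict_y(6)` returns 8.0 and `predict_x(8)` returns 6.0.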