
CORRELATION & REGRESSION

Total number of Questions in Correlation & Regression is :


(i) In chapter Examples .............................................................07
(ii) Solved Examples....................................................................... 09
Total no. of questions...................................................................16
1. DEFINITIONS

1.1 Univariate Distribution : These are distributions in which only one variable is involved.
For example
(i) The heights of students in a class.
(ii) The marks obtained by students in a class.

1.2 Bivariate Distribution : A distribution involving two discrete variables is called a bivariate distribution.
For example
(1) The heights and weights of the students of a class in a school.
(2) The marks obtained by students of a class in two subjects.

1.3 Bivariate frequency distribution : This is a distribution in which two variables are involved. Let x and y be two variables. Suppose x takes the values x1, x2, ..., xn and y takes the values y1, y2, ..., yn; then we record our observations in the form of ordered pairs (xi, yj), where 1 ≤ i ≤ n, 1 ≤ j ≤ n. If a certain pair occurs fij times, we say that its frequency is fij.
The function which assigns the frequencies fij to the pairs (xi, yj) is known as a bivariate frequency distribution.

Two-way frequency tables : In such tables, the top row consists of the values of the variable x and the left-hand column consists of the values of y. The frequency corresponding to a pair of values is written in the cell at the intersection of the relevant row and column.
The column totals provide the univariate frequency distribution of x and the row totals provide the univariate frequency distribution of y. These column totals and row totals are known as the marginal frequency distributions of x and y respectively.

Conditional frequency distribution : From the bivariate frequency distribution we can study the relationship between the two variables and its degree. Once we know the degree of relationship we can estimate the value of one variable when the value of the other is given. For some fixed value of x, the frequencies with which the various y values occur, when listed, give the conditional frequency distribution of y on x. Similarly, for some fixed value of y, the frequencies with which the various x values occur, when listed, give the conditional frequency distribution of x on y.

2. CO-VARIANCE

Before we study correlation, let us introduce the concept of covariance between two quantitative variables.
Definition : Covariance is the arithmetic mean of the products of the corresponding deviations of two series from their respective means.
Let two variables x and y take the values x1, x2, ..., xn and y1, y2, ..., yn respectively; then the covariance is defined as

        Cov(x, y) = Σ(x - x̄)(y - ȳ) / n

where x̄ and ȳ are the means of the x and y series respectively.
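The covariance formula above translates directly into a few lines of Python. A minimal sketch, assuming two plain lists of equal length; the function name and the sample series are illustrative, not from the module.

```python
def covariance(x, y):
    """Cov(x, y) = sum((x - x_bar) * (y - y_bar)) / n, dividing by n as in the definition above."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

# Illustrative series: the two move together on average, so the covariance is positive.
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]
print(covariance(x, y))   # 3.5
```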
3. CORRELATION

The relationship between two variables such that a change in one variable results in a positive or negative change in the other variable is known as correlation.

3.1 Types of correlation :

(i) Perfect correlation : If the two variables vary in such a manner that their ratio is always constant, then the correlation is said to be perfect.

(ii) Positive or direct correlation : If an increase or decrease in one variable corresponds to an increase or decrease in the other, the correlation is said to be positive.
For example, income and expenditure are positively (or directly) correlated, because expenditure increases as income increases and expenses are curtailed when income decreases.

(iii) Negative or indirect correlation : If an increase or decrease in one variable corresponds to a decrease or increase in the other, the correlation is said to be negative.
For example, pressure and volume of a gas are negatively (or inversely) correlated, because with an increase of pressure on a gas there is a decrease in its volume, and vice versa.

(iv) Zero correlation : If the variation in one variable has no relation with that in the other, then the variables have no correlation, i.e. there is zero correlation between them.

4. COEFFICIENT OF CORRELATION

Karl Pearson gave the following formula for the calculation of the correlation coefficient between two variables x and y :

        rxy = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]

where x̄, ȳ, σx and σy have their usual meanings,

or      rxy = Σ dx dy / √(Σdx² · Σdy²)

where dx = (x - x̄), Σdx² = Σ(x - x̄)², dy = (y - ȳ) and Σdy² = Σ(y - ȳ)².

Modified formula (useful when the deviations dx, dy are taken from assumed means rather than the actual means) :

        r = [Σ dx dy - (Σdx)(Σdy)/n] / √{[Σdx² - (Σdx)²/n] · [Σdy² - (Σdy)²/n]}

Also    rxy = Cov(x, y) / (σx σy) = Cov(x, y) / √[Var(x) · Var(y)]

Examples based on Coefficient of Correlation

Ex.1 If var(x) = 8.25, var(y) = 33.96 and cov(x, y) = 10.2, then the correlation coefficient is
     (A) 0.89        (B) - 0.64        (C) 0.61        (D) - 0.16

Sol. rxy = cov(x, y) / √[var(x) · var(y)] = 10.2 / √(8.25 × 33.96) = 0.61                     Ans.[C]

Ex.2 The coefficient of correlation between the heights (in inches) of fathers and sons from the following data

     Heights of fathers (x)   65   66   67   68   69   70   71
     Heights of sons (y)      67   68   66   69   72   72   69

     will be
     (A) 0.60        (B) - 0.60        (C) - 0.67        (D) 0.67

Sol. Here x̄ = 68, ȳ = 69

     x - x̄    y - ȳ    (x - x̄)(y - ȳ)    (x - x̄)²    (y - ȳ)²
      -3       -2             6              9           4
      -2       -1             2              4           1
      -1       -3             3              1           9
       0        0             0              0           0
       1        3             3              1           9
       2        3             6              4           9
       3        0             0              9           0

     Σ(x - x̄)(y - ȳ) = 20,  Σ(x - x̄)² = 28,  Σ(y - ȳ)² = 32

     Hence rxy = 20 / √[(28)(32)] = 0.67                                                      Ans.[D]
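The arithmetic of Ex.2 can be cross-checked with a short script. A minimal sketch of the deviation-product formula from section 4, assuming plain Python lists; the function name is illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """rxy = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), with dx = x - x_bar, dy = y - y_bar."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Data of Ex.2: heights of fathers and sons.
fathers = [65, 66, 67, 68, 69, 70, 71]
sons = [67, 68, 66, 69, 72, 72, 69]
print(round(pearson_r(fathers, sons), 2))   # 0.67, matching option (D)
```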
5. RANK CORRELATION

Rank correlation is the correlation between the ranks (or grades) of the two characteristics. It is given by

        ρ = 1 - 6Σd² / [n(n² - 1)]

Here Σd² = Σ {(xi - x̄) - (yi - ȳ)}², taken over the n pairs of ranks, i.e. Σd² is the sum of the squares of the differences of the two ranks, and n is the number of pairs of observations.

Examples based on Rank Correlation

Ex.3 From a random sample, 5 students have been selected. Their marks in Maths and Statistics are given below :

     Roll No.              1    2    3    4    5
     Marks in Maths       85   60   73   40   90
     Marks in Statistics  93   75   65   50   80

     The rank correlation coefficient is
     (A) 0.8        (B) 0.6        (C) 0.5        (D) - 0.8

Sol. Rank in Maths        2   4   3   5   1
     Rank in Statistics   1   3   4   5   2
     d²                   1   1   1   0   1

     ρ = 1 - 6Σd² / [n(n² - 1)] = 1 - (6 × 4) / (5 × 24) = 4/5 = 0.8                          Ans.[A]
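The rank correlation of Ex.3 can likewise be verified with a short script. A minimal sketch that ranks each series (assuming no ties, as in the example) and applies ρ = 1 - 6Σd² / [n(n² - 1)]; the helper names are illustrative.

```python
def ranks(values):
    """Rank 1 for the largest value, rank n for the smallest; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def rank_correlation(x, y):
    """rho = 1 - 6 * sum(d**2) / (n * (n**2 - 1)), where d is the difference of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

maths = [85, 60, 73, 40, 90]
statistics_marks = [93, 75, 65, 50, 80]
print(rank_correlation(maths, statistics_marks))   # 0.8, matching option (A)
```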
6. PROPERTIES OF CORRELATION COEFFICIENT (r)

(a) r lies between - 1 and + 1.
(b) The correlation is
    (i) perfect and positive if r = + 1
    (ii) perfect and negative if r = - 1
    (iii) absent (the variables are uncorrelated) if r = 0
    (iv) positive if r > 0
    (v) negative if r < 0
(c) It is independent of the change of origin and scale.
(d) It is a pure number and hence unitless.
(e) If x and y are independent, then r = 0.

7. REGRESSION ANALYSIS

In the previous article we have seen that correlation is merely a tool for ascertaining the degree of relationship between two variables. It does not tell anything about the functional relationship or nature of the relationship between the two variables. Regression analysis, on the other hand, attempts to study the functional relationship between the variables so that one can predict the value of one variable for a given value of the other. Regression analysis is thus a statistical device with the help of which we can estimate or predict the unknown values of one variable from the known values of the other variable.

8. LINE OF REGRESSION

The regression line is a graphical method which describes the average relationship between the two variables. If we take the case of two variables x and y, we shall have two lines of regression, because there are two variables :
(i) Y on X        (ii) X on Y

8.1 Line of regression of y on x :
The line of regression of y on x gives the most probable values of y for given values of x, and so it is used to estimate y for any given value of x. Its equation is

        y - ȳ = [Cov(x, y) / σx²] (x - x̄)

or      y - ȳ = r (σy / σx) (x - x̄)

8.2 Line of regression of x on y :
The line of regression of x on y gives the most probable values of x for given values of y, and so it is used to estimate x for any given value of y. Its equation is

        x - x̄ = [Cov(x, y) / σy²] (y - ȳ)

or      x - x̄ = r (σx / σy) (y - ȳ)
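Both equations need only the two means, the two standard deviations and r. Below is a minimal Python sketch of 8.1 and 8.2; it reuses the summary figures that appear in Ex.4 below (means 65 and 67, standard deviations 5.0 and 2.5, r = 0.8), and the function name is illustrative.

```python
def regression_slopes(sd_x, sd_y, r):
    """Return (b_yx, b_xy), the slopes of the two regression lines.

    y on x:  y - y_bar = b_yx * (x - x_bar),  b_yx = r * sd_y / sd_x
    x on y:  x - x_bar = b_xy * (y - y_bar),  b_xy = r * sd_x / sd_y
    """
    return r * sd_y / sd_x, r * sd_x / sd_y

# Summary figures as in Ex.4 below: x_bar = 65, y_bar = 67.
b_yx, b_xy = regression_slopes(sd_x=5.0, sd_y=2.5, r=0.8)
print(b_yx)   # 0.4  ->  y - 67 = (2/5)(x - 65)
print(b_xy)   # 1.6  ->  x - 65 = 1.6 (y - 67)
```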
Examples based on Line of Regression

Ex.4 For the following data

                                  x       y
     Mean                        65      67
     Standard deviation         5.0     2.5
     Correlation coefficient        0.8

     the equation of the line of regression of y on x is
     (A) y - 67 = (2/5)(x - 65)        (B) y - 67 = (1/5)(x - 65)
     (C) x - 65 = (2/5)(y - 67)        (D) x - 65 = (1/5)(y - 67)

Sol. The line of regression of y on x is y - ȳ = r (σy / σx)(x - x̄)
     ⇒ y - 67 = (0.8 × 2.5 / 5)(x - 65)
     ⇒ y - 67 = (2/5)(x - 65)                                                                 Ans.[A]

Ex.5 The two lines of regression are given by 3x + 2y = 26 and 6x + y = 31. The coefficient of correlation between x and y is
     (A) - 1/3        (B) 1/3        (C) - 1/2        (D) 1/2

Sol. 3x + 2y = 26 ⇒ y = - (3/2)x + 13, so byx = - 3/2
     6x + y = 31 ⇒ x = - (1/6)y + 31/6, so bxy = - 1/6
     r² = byx · bxy = (- 3/2)(- 1/6) = 1/4
     ⇒ r = - 1/2, the negative root being taken because both regression coefficients are negative.
                                                                                              Ans.[C]
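The method of Ex.5 — read the two regression coefficients off the given lines, multiply them, and attach the common sign to the square root — is easy to automate. A minimal sketch, assuming each line is supplied as a coefficient triple (a, b, c) of ax + by = c, the first line being treated as the line of y on x and the second as the line of x on y; the function name is illustrative.

```python
from math import sqrt, copysign

def r_from_regression_lines(line_y_on_x, line_x_on_y):
    """Each argument is (a, b, c) for the line a*x + b*y = c."""
    a1, b1, _ = line_y_on_x
    a2, b2, _ = line_x_on_y
    b_yx = -a1 / b1          # y on x solved for y:  y = (-a1/b1) x + c1/b1
    b_xy = -b2 / a2          # x on y solved for x:  x = (-b2/a2) y + c2/a2
    return copysign(sqrt(b_yx * b_xy), b_yx)   # r carries the common sign of b_yx and b_xy

# Ex.5: 3x + 2y = 26 (y on x) and 6x + y = 31 (x on y).
print(r_from_regression_lines((3, 2, 26), (6, 1, 31)))   # -0.5, matching option (C)
```

If the product byx · bxy came out larger than 1, the roles of the two lines would have to be interchanged, since byx · bxy = r² can never exceed 1 (see property (i) in section 10).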
9. REGRESSION COEFFICIENT

(I) The regression coefficient of y on x is denoted by byx and is given by

        byx = r (σy / σx) = Cov(x, y) / σx²

    This represents the change in the value of y corresponding to a unit change in x.

(II) The regression coefficient of x on y is denoted by bxy and is given by

        bxy = r (σx / σy) = Cov(x, y) / σy²

    This represents the change in the value of x corresponding to a unit change in y.

Examples based on Regression coefficient

Ex.6 If the regression coefficient of y on x is 0.40, then the regression coefficient of x on y will be
     (A) 1.6        (B) 6.4        (C) 5.1        (D) 3.2

Sol. We know that the product of the two regression coefficients must be ≤ 1. Of the given options only 1.6 satisfies this, since 1.6 × 0.40 = 0.64 < 1.                                                  Ans.[A]

Ex.7 If the two regression coefficients between x and y are 0.8 and 0.2, then the coefficient of correlation between them is
     (A) 0.4        (B) 0.6        (C) 0.3        (D) 0.5

Sol. If the two regression coefficients are 0.8 and 0.2, then the coefficient of correlation is
     r = + √(0.8 × 0.2) = 0.4                                                                 Ans.[A]

10. PROPERTIES OF REGRESSION COEFFICIENTS

(i) r = √(byx · bxy), i.e. the coefficient of correlation is the geometric mean of the two regression coefficients.
(ii) If byx > 1, then bxy < 1, i.e. if one of the regression coefficients is greater than unity, then the other must be less than unity.
(iii) If the correlation between the variables is not perfect, then the regression lines intersect at (x̄, ȳ).
(iv) byx is the slope of the regression line of y on x, and bxy is the slope of the regression line of x on y.
(v) byx + bxy ≥ 2√(byx · bxy), i.e. byx + bxy ≥ 2r : the arithmetic mean of the regression coefficients is at least the correlation coefficient.
(vi) Regression coefficients are independent of the change of origin but not of scale.
(vii) The product of the gradients of the two lines of regression is σy² / σx².
(viii) If the angle between the lines of regression is θ, then

        tan θ = [(1 - r²) / r] · [σx σy / (σx² + σy²)]

(ix) If both the lines of regression coincide, then the correlation is perfect linear.
(x) If both byx and bxy are positive, then r is positive; if both byx and bxy are negative, then r is negative.

11. IMPORTANT POINTS ON REGRESSION LINES

(i) If r = 0, then tan θ is not defined, i.e. θ = π/2. Thus if the two variables are not correlated, the lines of regression are perpendicular to each other.
(ii) If r = ± 1, then tan θ = 0, i.e. θ = 0. Thus the regression lines are coincident.
(iii) If the regression lines are y = ax + b and x = cy + d, then

        x̄ = (bc + d) / (1 - ac)    and    ȳ = (ad + b) / (1 - ac)
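Property 10(viii) and point 11(iii) can be checked numerically. A minimal sketch with illustrative values echoing Ex.4 (σx = 5.0, σy = 2.5, r = 0.8, lines passing through (65, 67)); the function names are not from the module.

```python
from math import atan, degrees

def angle_between_regression_lines(r, sd_x, sd_y):
    """tan(theta) = ((1 - r**2) / r) * (sd_x * sd_y / (sd_x**2 + sd_y**2))."""
    tan_theta = ((1 - r ** 2) / r) * (sd_x * sd_y / (sd_x ** 2 + sd_y ** 2))
    return degrees(atan(tan_theta))

def intersection_of_regression_lines(a, b, c, d):
    """For y = a*x + b and x = c*y + d, the lines meet at (x_bar, y_bar)."""
    x_bar = (b * c + d) / (1 - a * c)
    y_bar = (a * d + b) / (1 - a * c)
    return x_bar, y_bar

print(angle_between_regression_lines(r=0.8, sd_x=5.0, sd_y=2.5))          # about 10.2 degrees
# y = 0.4x + 41 and x = 1.6y - 42.2 are the Ex.4 lines written in slope form.
print(intersection_of_regression_lines(a=0.4, b=41.0, c=1.6, d=-42.2))    # approximately (65.0, 67.0)
```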
12. STANDARD ERROR OF PREDICTION

The deviation of the predicted value from the observed value is known as the standard error of prediction and is defined as

        Sy = √[ Σ(y - yp)² / n ]

where y is the actual value and yp is the predicted value.

In relation to the coefficient of correlation, it is given by
(i) the standard error of estimate of x :  Sx = σx √(1 - r²)
(ii) the standard error of estimate of y :  Sy = σy √(1 - r²)
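A minimal sketch of both forms of the standard error above — the direct definition from residuals and the shortcut σy √(1 - r²). The sample values fed to it are illustrative only; the function names are not from the module.

```python
from math import sqrt

def std_error_direct(actual, predicted):
    """Sy = sqrt(sum((y - y_pred)**2) / n)."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def std_error_from_r(sd, r):
    """Standard error of estimate: sd * sqrt(1 - r**2)."""
    return sd * sqrt(1 - r ** 2)

# sd_y = 2.5 and r = 0.8 echo Ex.4 of the module.
print(std_error_from_r(2.5, 0.8))                            # about 1.5
# Made-up actual vs. predicted values, just to exercise the direct definition.
print(std_error_direct([67, 68, 66], [66.6, 67.0, 66.2]))    # about 0.63
```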

Your stage : Now you may solve remaining Qs. of Ex.1


SOLVED EXAMPLES
Ex.1 If
         x :  3   4   8   6   2   1
         y :  5   3   9   6   9   2
     then the coefficient of correlation will be approximately
     (A) 0.49        (B) 0.40        (C) - 0.49        (D) - 0.40

Sol. Here x̄ = 4, ȳ = 34/6 = 5.66

     x - x̄    y - ȳ    (x - x̄)(y - ȳ)
      -1      -0.66         0.66
       0      -2.66         0
       4       3.34        13.36
       2       0.34         0.68
      -2       3.34        -6.68
      -3      -3.66       +10.98

     Σ(x - x̄)² = 34,  Σ(y - ȳ)² = 43.33,  Σ(x - x̄)(y - ȳ) = 19

     Therefore rxy = 19 / √[(34)(43.33)] ≈ 0.49                                               Ans.[A]

Ex.2 If Z = aX + bY and r is the correlation coefficient between X and Y, then σZ² is equal to
     (A) a²σX² + b²σY² + 2ab r σX σY
     (B) a²σX² + b²σY² - 2ab r σX σY
     (C) 2ab r σX σY
     (D) None of these

Sol. We have
     Z = aX + bY                                              ...(i)
     ⇒ Z̄ = aX̄ + bȲ                                            ...(ii)
     From (i) and (ii),
     Z - Z̄ = a(X - X̄) + b(Y - Ȳ)
     ⇒ (Z - Z̄)² = a²(X - X̄)² + b²(Y - Ȳ)² + 2ab(X - X̄)(Y - Ȳ)
     ⇒ (1/n) Σ(Z - Z̄)² = a² (1/n) Σ(X - X̄)² + b² (1/n) Σ(Y - Ȳ)² + 2ab (1/n) Σ(X - X̄)(Y - Ȳ)
     ⇒ σ²(aX + bY) = a²σX² + b²σY² + 2ab Cov(X, Y)
     ⇒ σZ² = a²σX² + b²σY² + 2ab r σX σY              [∵ Cov(X, Y) / (σX σY) = r]             Ans.[A]
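The identity derived in Ex.2, Var(aX + bY) = a²σX² + b²σY² + 2ab Cov(X, Y), holds for any data set and can be confirmed numerically. A minimal sketch reusing the x, y series of Ex.1 with illustrative coefficients a = 2, b = 3; the helper names are not from the module.

```python
def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

X = [3, 4, 8, 6, 2, 1]          # the x series of Ex.1
Y = [5, 3, 9, 6, 9, 2]          # the y series of Ex.1
a, b = 2, 3                     # illustrative coefficients
Z = [a * xi + b * yi for xi, yi in zip(X, Y)]

lhs = var(Z)
rhs = a ** 2 * var(X) + b ** 2 * var(Y) + 2 * a * b * cov(X, Y)
print(abs(lhs - rhs) < 1e-9)    # True: both sides equal about 125.67
```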
Ex.3 If X and Y are two independent variables with means 5 and 10 and variances 4 and 9 respectively, and if U = 3X + 4Y and V = 3X - Y, then r(U, V) is equal to
     (A) 0        (B) 1        (C) 2/3        (D) None of these

Sol. We have U = 3X + 4Y and V = 3X - Y
     ⇒ Ū = 3X̄ + 4Ȳ and V̄ = 3X̄ - Ȳ
     ⇒ U - Ū = 3(X - X̄) + 4(Y - Ȳ) and V - V̄ = 3(X - X̄) - (Y - Ȳ)
     ⇒ Cov(U, V) = (1/n) Σ(U - Ū)(V - V̄)
                 = (1/n) Σ{ 9(X - X̄)² - 4(Y - Ȳ)² + 9(X - X̄)(Y - Ȳ) }
                 = 9σX² - 4σY² + 9 Cov(X, Y)
                 = 9 × 4 - 4 × 9 + 9 × 0              [∵ X, Y are independent, so Cov(X, Y) = 0]
                 = 0
     Hence r(U, V) = Cov(U, V) / (σU σV) = 0                                                  Ans.[A]

Ex.4 If X and Y are two uncorrelated variables and U = X + Y, V = X - Y, then r(U, V) is equal to
     (A) (σX² + σY²) / (σX² - σY²)
     (B) (σX² - σY²) / (σX² + σY²)
     (C) (σX² - σY²) / (σX σY)
     (D) None of these

Sol. Putting a = b = 1, and then a = 1, b = - 1, in the result of Ex.2, we get
     σU² = σX² + σY² and σV² = σX² + σY²              [∵ r = 0 (given)]           ...(i)
     Also Cov(U, V) = σX² - σY²
     ⇒ r(U, V) = Cov(U, V) / (σU σV) = Cov(U, V) / σU²              [∵ σU = σV from (i)]
               = (σX² - σY²) / (σX² + σY²)                                                    Ans.[B]
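The conclusion of Ex.4 can be spot-checked by simulation: generate two independent (hence uncorrelated) series, form U = X + Y and V = X - Y, and compare r(U, V) with (σX² - σY²) / (σX² + σY²). A minimal sketch using only the standard library (statistics.correlation needs Python 3.10 or later); with a finite sample the two printed values agree only approximately.

```python
import random
from statistics import correlation, pvariance

random.seed(1)

n = 100_000
X = [random.gauss(0, 2) for _ in range(n)]   # sd 2, independent of Y
Y = [random.gauss(0, 3) for _ in range(n)]   # sd 3
U = [x + y for x, y in zip(X, Y)]
V = [x - y for x, y in zip(X, Y)]

print(round(correlation(U, V), 3))                                        # sample r(U, V)
print((pvariance(X) - pvariance(Y)) / (pvariance(X) + pvariance(Y)))      # close to (4 - 9)/(4 + 9) = -5/13
```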
Ex.5 If the coefficient of rank correlation between marks in Mathematics and marks in Physics obtained by a certain group of students is 0.8, and the sum of the squares of the differences in ranks is given to be 33, then the number of students in the group is
     (A) 11        (B) 10        (C) 30        (D) None of these

Sol. We have r = 1 - 6Σdi² / [n(n² - 1)]. It is given that r = 0.8 and Σdi² = 33. Therefore
     0.8 = 1 - 6(33) / [n(n² - 1)]
     ⇒ 6(33) / [n(n² - 1)] = 0.2
     ⇒ n(n² - 1) = 990
     ⇒ n(n² - 1) = 10(10² - 1) ⇒ n = 10                                                       Ans.[B]

Ex.6 If the lines of regression of Y on X and X on Y are respectively a1x + b1y + c1 = 0 and a2x + b2y + c2 = 0, then
     (A) a1a2 < b1b2        (B) a1b2 < a2b1        (C) a2b1 < a1b2        (D) None of these

Sol. The lines of regression of Y on X and X on Y are respectively a1x + b1y + c1 = 0 and a2x + b2y + c2 = 0. Therefore
     bYX = slope of the line of regression of Y on X = - a1/b1              ...(i)
     bXY = slope of the line of regression of X on Y = - b2/a2              ...(ii)
     Since bYX · bXY < 1,
     (- a1/b1)(- b2/a2) < 1 ⇒ a1b2 / (a2b1) < 1 ⇒ a1b2 < a2b1                                 Ans.[B]

Ex.7 Let X and Y be two variables with the same mean. If the lines of regression of Y on X and X on Y are respectively y = ax + b and x = αy + β, then the value of the common mean is
     (A) b / (1 - a)        (B) (1 - a) / b        (C) β / (1 - a)        (D) b / (1 - α)

Sol. We have X̄ = Ȳ (given). Since the two lines of regression pass through (X̄, Ȳ),
     Ȳ = aX̄ + b ⇒ X̄ = aX̄ + b              [∵ X̄ = Ȳ]
     ⇒ X̄ = Ȳ = b / (1 - a)                                                                    Ans.[A]
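The condition n(n² - 1) = 990 reached in Ex.5 can also be solved by a direct search over candidate group sizes. A minimal sketch; the function name and the search bound are illustrative.

```python
def group_size(rho, sum_d2, max_n=1000):
    """Smallest n (if any) with rho = 1 - 6*sum_d2 / (n*(n**2 - 1))."""
    for n in range(2, max_n + 1):
        if abs(1 - 6 * sum_d2 / (n * (n ** 2 - 1)) - rho) < 1e-9:
            return n
    return None

print(group_size(rho=0.8, sum_d2=33))   # 10, matching option (B)
```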
Ex.8 If the acute angle between the two regression lines is θ, then
     (A) sin θ > 1 - r²    (B) tan θ > 1 - r²    (C) sin θ < 1 - r²    (D) tan θ < 1 - r²

Sol. tan θ = [(1 - r²) / r] · [σX σY / (σX² + σY²)]
     Since the A.M. of two quantities is at least their G.M., (σX² + σY²)/2 ≥ σX σY, so
     σX σY / (σX² + σY²) ≤ 1/2
     ⇒ tan θ ≤ (1 - r²) / (2r) ⇒ tan²θ ≤ [(1 - r²) / (2r)]²
     Since sin²θ = tan²θ / (1 + tan²θ), which increases with tan²θ,
     sin²θ ≤ (1 - r²)² / [(2r)² + (1 - r²)²] = [(1 - r²) / (1 + r²)]²
     ⇒ sin θ ≤ (1 - r²) / (1 + r²) < 1 - r²              [∵ 1 + r² > 1]
     ⇒ sin θ < 1 - r²                                                                         Ans.[C]

Ex.9 If bYX and bXY are the regression coefficients of Y on X and X on Y respectively, then
     (A) bYX + bXY = 2r(X, Y)    (B) bYX + bXY < 2r(X, Y)    (C) bYX + bXY > 2r(X, Y)    (D) None of these

Sol. Since bYX = r σY/σX and bXY = r σX/σY, we have bYX · bXY = r², so r is the G.M. of bYX and bXY.
     But (bYX + bXY)/2 is the A.M. of bYX and bXY, and A.M. > G.M.
     Therefore (bYX + bXY)/2 > r ⇒ bYX + bXY > 2r                                             Ans.[C]
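Both closing inequalities — sin θ < 1 - r² from Ex.8 and bYX + bXY ≥ 2r from Ex.9 — can be exercised numerically over a small grid of illustrative r, σX, σY values. A minimal sketch; the grid itself is arbitrary, and equality in the second relation occurs only when σX = σY.

```python
from math import atan, sin

ok = True
for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    for sd_x, sd_y in ((1, 1), (2, 5), (10, 3)):
        tan_theta = ((1 - r ** 2) / r) * (sd_x * sd_y / (sd_x ** 2 + sd_y ** 2))
        theta = atan(tan_theta)                  # acute angle between the regression lines
        b_yx, b_xy = r * sd_y / sd_x, r * sd_x / sd_y
        ok &= sin(theta) < 1 - r ** 2            # Ex.8
        ok &= b_yx + b_xy >= 2 * r               # Ex.9 (equality only when sd_x == sd_y)
print(ok)   # True
```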
