8 Correlation, Regression, Rank Correlation
8.1 Bivariate Data
So far we have considered only the data arising out of the investigation of one character, and these are known as univariate data. But a situation may come when we are to study the data which arise out of the investigation of two characters, and then these data will be called bivariate data. As for example, we may consider jointly (i) Age ... of a number of pairs of values of x and y, i.e. (x, y), each pair being for one individual. Hence the observations in bivariate data are like ...
[Fig. 8.2(d), Fig. 8.2(e), Fig. 8.2(f)]
... in the value of the other variable, then these two variables are said to be correlated.
If the values of the variable y are not affected by changes in the values of the variable x, we say that the two variables are uncorrelated.
By the word regression we want to denote the estimation or the prediction of the approximate value of one variable for a specified value of the other variable. In other words, it stands to measure the average relationship between different variables.
Treating one variable as independent and the other as dependent, we may assume a functional relation between these two variables like $y = \phi(x)$ or $x = \psi(y)$, which are known as regression equations, and the mathematical curves denoted by them are called regression curves.
There are two types of regression curves: one is 'y on x' and the other is 'x on y'.
8.4 Correlation Coefficient
For simple regression, i.e. when the two regression curves are linear, the degree of collinearity is measured by a quantity known as the 'correlation coefficient', and it is generally denoted by $r_{xy}$ or $\rho_{xy}$.
To determine covariance, we use either formula (1) or formula (2).
The covariance of x and y measures how the variables x and y are connected. However, Cov (x, y) is not a dimensionless quantity. Hence, to find a dimensionless relation between the variables x and y, we need to determine the correlation coefficient (or coefficient of correlation).
The correlation coefficient between x and y is denoted by $r_{xy}$ or $\rho$ (which we denote by r(x, y) in chapter 5) and is defined by
$r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$ ... (3)
where $s_x$ and $s_y$ are the standard deviations of x and y respectively.
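If the given sample for the variables x and y is $(x_i, y_i)$, $i = 1, 2, \ldots, n$, then from (3),
$r_{xy} = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\frac{1}{n}\sum x_i^2 - \bar{x}^2}\,\sqrt{\frac{1}{n}\sum y_i^2 - \bar{y}^2}}$ ... (4)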
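As a rough illustration of how formula (4) can be evaluated, the following minimal Python sketch computes the correlation coefficient from paired observations. The function name `correlation_coefficient` and the sample values are hypothetical and chosen only for illustration; they do not come from the text.

```python
import math


def correlation_coefficient(xs, ys):
    """Correlation coefficient of paired values, using the moment form of (4)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cov(x, y) = (1/n) * sum(x_i * y_i) - mean_x * mean_y
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # Variances: (1/n) * sum(x_i^2) - mean_x^2, and likewise for y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    if not (var_x > 0 and var_y > 0):
        raise ValueError("both variables must have non-zero variance")
    return cov_xy / math.sqrt(var_x * var_y)


# Hypothetical data with an exact linear relation y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r = correlation_coefficient(xs, ys)
print(r)
```

For data lying exactly on a straight line with positive slope, as in this made-up sample, the value returned should be 1 up to floating-point error.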
... 364, n = 8, find (i) the correlation coefficient and (ii) the regression equation of x on y.
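Solution:
With usual notations,
$r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} = \frac{\frac{\sum xy}{n} - \bar{x}\,\bar{y}}{s_x s_y}$.
Now, $s_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{524}{8} - \left(\frac{56}{8}\right)^2} = 4.06$,
$s_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{256}{8} - \left(\frac{40}{8}\right)^2} = 2.64$.
Therefore, $r_{xy} = \frac{\frac{364}{8} - \frac{56}{8} \times \frac{40}{8}}{4.06 \times 2.64} = 0.98$.
Again, $b_{xy} = r_{xy}\,\frac{s_x}{s_y} = 0.98 \times \frac{4.06}{2.64} = 1.51$.
Now, $\bar{x} = \frac{\sum x}{n} = \frac{56}{8} = 7$ and $\bar{y} = \frac{\sum y}{n} = \frac{40}{8} = 5$.
The regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$
or, $x - 7 = 1.51(y - 5)$
or, $x - 1.51y + 0.55 = 0$.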
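The arithmetic of this solution can be re-run from the summary values it uses (n = 8, sums 56, 40, 524, 256 and 364). The sketch below is only a check, and its variable names are assumptions introduced here. Because it carries full precision instead of rounding the standard deviations to two places, it gives a regression coefficient of exactly 1.5, so the 1.51 in the hand calculation reflects rounding of intermediate values.

```python
import math

# Summary values as they appear in the worked solution.
n = 8
sum_x, sum_y = 56, 40
sum_x2, sum_y2 = 524, 256
sum_xy = 364

mean_x = sum_x / n
mean_y = sum_y / n
var_x = sum_x2 / n - mean_x ** 2
var_y = sum_y2 / n - mean_y ** 2
cov_xy = sum_xy / n - mean_x * mean_y

# Correlation coefficient, formula (3).
r = cov_xy / math.sqrt(var_x * var_y)

# Regression coefficient of x on y: Cov(x, y) / var_y, equal to r * s_x / s_y.
b_xy = cov_xy / var_y

# Regression line of x on y written as x = b_xy * y + intercept.
intercept = mean_x - b_xy * mean_y

print(f"r = {r:.4f}")
print(f"x = {b_xy:.4f} * y + ({intercept:.4f})")
```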
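Then, $\sum u = \sum (x - 10) = \sum x - 10n = 140 - 10 \times 10 = 40$,
$\sum v = \sum (y - 15) = \sum y - 15n = 150 - 15 \times 10 = 0$.
$s_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{180}{10} - \left(\frac{40}{10}\right)^2 = 2$,
$s_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{215}{10} - \left(\frac{0}{10}\right)^2 = 21.5$.
$\mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n} \times \frac{\sum v}{n} = \frac{60}{10} - \frac{40}{10} \times \frac{0}{10} = 6$.
Therefore, $r_{xy} = r_{uv} = \frac{\mathrm{Cov}(u, v)}{s_u s_v} = \frac{6}{\sqrt{2} \times \sqrt{21.5}} = \frac{6}{6.557} = 0.91$.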
x :  -3   -2   -1    0    1    2    3
y :   9    4    1    0    1    4    9

    x      y     x^2    y^2     xy
   -3      9      9     81     -27
   -2      4      4     16      -8
   -1      1      1      1      -1
    0      0      0      0       0
    1      1      1      1       1
    2      4      4     16       8
    3      9      9     81      27
Total 0   28     28    196       0
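Correlation coefficient
$r_{xy} = \frac{\frac{\sum xy}{n} - \bar{x}\,\bar{y}}{s_x s_y} = \frac{\frac{0}{7} - 0 \times \frac{28}{7}}{s_x s_y} = 0$.
Since $r_{xy} = 0$, the variables are uncorrelated.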
Note: This example shows that though the two variables x and y are connected by the relation $y = x^2$, i.e. are not independent, they are uncorrelated. So two variables may be uncorrelated even if they are not independent.
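The point of the note can be checked numerically with the values x = -3, ..., 3 and their squares, as tabulated in the example. The following short Python sketch (variable names are illustrative assumptions) recomputes the covariance and the correlation coefficient for this exact, non-linear relation.

```python
import math

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Covariance and standard deviations in the moment form used in the text.
cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)

r = cov_xy / (s_x * s_y)
print(cov_xy, r)
```

The covariance vanishes because the positive and negative values of x contribute products xy that cancel in pairs, even though y depends on x exactly.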
Example 4. Suppose the following series of values for 2 variables x and y are given:
x :  1  2  3  4  5  6  7  8  9
y :  9  8  7  6  5  4  3  2  1
What will be the correlation coefficient between x and y?
Solution:
To find the correlation coefficient, let us consider the following table:

    x     y    u = x - 5   v = y - 5    u^2    v^2     uv
    1     9       -4           4         16     16    -16
    2     8       -3           3          9      9     -9
    3     7       -2           2          4      4     -4
    4     6       -1           1          1      1     -1
    5     5        0           0          0      0      0
    6     4        1          -1          1      1     -1
    7     3        2          -2          4      4     -4
    8     2        3          -3          9      9     -9
    9     1        4          -4         16     16    -16
Total              0           0         60     60    -60
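Now, $r_{uv} = \frac{\frac{\sum uv}{n} - \bar{u}\,\bar{v}}{\sqrt{\frac{\sum u^2}{n} - \bar{u}^2}\,\sqrt{\frac{\sum v^2}{n} - \bar{v}^2}} = \frac{-\frac{60}{9} - 0 \times 0}{\sqrt{\frac{60}{9} - 0}\,\sqrt{\frac{60}{9} - 0}} = -1$.
So, $r_{xy} = r_{uv} = -1$; hence the correlation coefficient between x and y is -1 and the variables x and y are linearly related.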
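The step from the shifted variables back to x and y relies on the correlation coefficient being unchanged by a change of origin. As a hedged numerical check, the sketch below computes r both for the values 1, ..., 9 paired with 9, ..., 1 and for the shifted values u = x - 5, v = y - 5. The helper name `pearson_r` is an assumption made for this illustration.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient via (1/n)*sum(xy) - mean_x*mean_y over s_x*s_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov / math.sqrt(var_x * var_y)


xs = list(range(1, 10))        # 1, 2, ..., 9
ys = list(range(9, 0, -1))     # 9, 8, ..., 1

# Change of origin: u = x - 5, v = y - 5.
us = [x - 5 for x in xs]
vs = [y - 5 for y in ys]

sum_uv = sum(u * v for u, v in zip(us, vs))
sum_u2 = sum(u * u for u in us)
sum_v2 = sum(v * v for v in vs)

r_xy = pearson_r(xs, ys)
r_uv = pearson_r(us, vs)
print(sum_uv, sum_u2, sum_v2, r_xy, r_uv)
```

Both values should agree with -1 up to floating-point error, consistent with the points lying on the line x + y = 10.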
Example 5. If x and y are two correlated variables with the same variance and the correlation coefficient is r, find the regression coefficient of x on (x + y) and that of (x + y) on x. Hence, find the correlation coefficient between x and (x + y).
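Solution:
Let $u = x$ and $v = x + y$, and let $s_x^2 = s_y^2 = s^2$ be the common variance.
Now, $\mathrm{Cov}(x, y) = r s_x s_y = r s^2$.
Then,
$\mathrm{Cov}(u, v) = \frac{1}{n}\sum (u - \bar{u})(v - \bar{v})$
$= \frac{1}{n}\sum (x - \bar{x})\{x + y - (\bar{x} + \bar{y})\}$
$= \frac{1}{n}\sum (x - \bar{x})\{(x - \bar{x}) + (y - \bar{y})\}$
$= \frac{1}{n}\sum (x - \bar{x})^2 + \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$
$= s^2 + r s^2 = (1 + r)s^2$.
Also, $s_u^2 = s^2$ and $s_v^2 = s_x^2 + s_y^2 + 2\,\mathrm{Cov}(x, y) = 2(1 + r)s^2$.
So, $b_{uv} = \frac{\mathrm{Cov}(u, v)}{s_v^2} = \frac{1}{2}$ and $b_{vu} = \frac{\mathrm{Cov}(u, v)}{s_u^2} = 1 + r$.
Hence, $r_{uv} = \sqrt{b_{uv}\,b_{vu}} = \sqrt{\frac{1 + r}{2}}$.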
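For two variables with equal variance, direct algebra gives a regression coefficient of 1/2 for x on (x + y), of 1 + r for (x + y) on x, and a correlation coefficient of sqrt((1 + r)/2) between x and (x + y). The sketch below checks these three values numerically. The data are hypothetical, chosen only so that x and y have the same variance (y is a rearrangement of x), and the helper names are assumptions for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Covariance in the form (1/n) * sum(a_i * b_i) - mean(a) * mean(b)."""
    return sum(p * q for p, q in zip(a, b)) / len(a) - mean(a) * mean(b)


# Hypothetical data: y is a rearrangement of x, so both have the same variance.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
vs = [x + y for x, y in zip(xs, ys)]   # v = x + y

r = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

b_x_on_v = cov(xs, vs) / cov(vs, vs)   # regression coefficient of x on (x + y)
b_v_on_x = cov(xs, vs) / cov(xs, xs)   # regression coefficient of (x + y) on x
r_xv = cov(xs, vs) / math.sqrt(cov(xs, xs) * cov(vs, vs))

print(r, b_x_on_v, b_v_on_x, r_xv)
```

Any other equal-variance sample could be substituted for the made-up values; the first coefficient stays at 1/2 regardless of r.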
Example 6. A bivariate sample of size 11 gave the results $\bar{x} = 7$, $s_x = 2$, $\bar{y} = 9$, $s_y = 4$ and $r = 0.5$. It was later found that one pair (x = 7, y = 9) was inaccurate and was rejected. How would the original value of r be affected by this rejection?
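Solution:
When x = 7 and y = 9 are rejected, n = 10.
Now, $\bar{x} = \frac{7 \times 11 - 7}{10} = 7$ and $\bar{y} = \frac{9 \times 11 - 9}{10} = 9$.
$\sum x^2 = (4 + 49) \times 11 - 49 = 534$ [since $s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2$, so $\sum x^2 = (s_x^2 + \bar{x}^2)n$]
So, $s_x = \sqrt{\frac{534}{10} - 49} = 2.098$.
$\sum y^2 = (16 + 81) \times 11 - 81 = 986$, so $s_y = \sqrt{\frac{986}{10} - 81} = 4.195$.
$\sum xy = (0.5 \times 2 \times 4 + 7 \times 9) \times 11 - 7 \times 9 = 674$ [since $r = \frac{\frac{\sum xy}{n} - \bar{x}\,\bar{y}}{s_x s_y}$, so $\sum xy = (r s_x s_y + \bar{x}\,\bar{y})n$]
$r = \frac{\frac{674}{10} - 7 \times 9}{2.098 \times 4.195} = 0.499 \approx 0.5$.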
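The bookkeeping in this solution (recovering the raw sums from the summary values, removing the rejected pair, and recomputing) can be sketched in Python as follows. The variable names are assumptions introduced here. With unrounded arithmetic the recomputed coefficient comes out at 0.5, so the 0.499 in the hand calculation is a rounding effect.

```python
import math

# Summary values of the original sample, as stated in the example.
n = 11
mean_x, s_x = 7.0, 2.0
mean_y, s_y = 9.0, 4.0
r = 0.5
bad_x, bad_y = 7.0, 9.0   # the rejected pair

# Recover the raw sums from the summary values.
sum_x = n * mean_x
sum_y = n * mean_y
sum_x2 = (s_x ** 2 + mean_x ** 2) * n
sum_y2 = (s_y ** 2 + mean_y ** 2) * n
sum_xy = (r * s_x * s_y + mean_x * mean_y) * n

# Remove the rejected pair from every sum.
m = n - 1
new_sum_x = sum_x - bad_x
new_sum_y = sum_y - bad_y
new_sum_x2 = sum_x2 - bad_x ** 2
new_sum_y2 = sum_y2 - bad_y ** 2
new_sum_xy = sum_xy - bad_x * bad_y

# Recompute means, standard deviations and the correlation coefficient.
new_mean_x = new_sum_x / m
new_mean_y = new_sum_y / m
new_s_x = math.sqrt(new_sum_x2 / m - new_mean_x ** 2)
new_s_y = math.sqrt(new_sum_y2 / m - new_mean_y ** 2)
new_r = (new_sum_xy / m - new_mean_x * new_mean_y) / (new_s_x * new_s_y)

print(new_sum_x2, new_sum_y2, new_sum_xy)
print(round(new_s_x, 3), round(new_s_y, 3), new_r)
```

The rejected pair coincides with the means (7, 9), so removing it scales the variances and the covariance by the same factor 11/10 and leaves their ratio unchanged.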