
Correction

The document discusses bivariate data, which involves the investigation of two variables and their correlation through scatter diagrams. It explains the concepts of correlation and regression, detailing how to determine the relationship between two variables and the calculation of the correlation coefficient. Additionally, it provides examples and formulas for calculating correlation and regression coefficients, emphasizing the importance of these statistical measures in understanding the relationship between variables.

Uploaded by

Mainak Bose
Copyright
© All Rights Reserved

CHAPTER 8
Correlation and Regression, Rank Correlation

8.1 Bivariate Data

So far we have considered only the data arising out of investigation of one character, and these are known as univariate data. But a situation may come when we are to study the data which arise out of investigation of two characters, and then this data will be called bivariate data. For example, we may consider jointly (i) Age and Height of persons, (ii) Marks both in Mathematics and Physics of students, (iii) Age and Blood pressure of persons, etc. The collection of simultaneous measurements of two variables is known as bivariate data. In bivariate data, generally one variable is denoted by x and the other by y. Our raw data will consist of a number of pairs of values of x and y, i.e. (x, y), each pair being for one individual. Hence the observations in bivariate data are like $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.

8.2 Scatter Diagram or Dot Diagram


In the case of a bivariate distribution, having found n pairs of values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ on two variables (x, y), each pair corresponding to one individual, we may consider these pairs of values to be n points in the xy-plane by plotting the variables x and y along the x-axis and y-axis respectively in a rectangular Cartesian system. Plotting all these points for a given data set, we generally get a set of points in the xy-plane. This diagrammatic representation of bivariate data is known as a Scatter Diagram or Dot Diagram.
8.2.1 Characteristics of Scatter Diagram
A scatter diagram generally indicates the nature of how two variables are correlated with each other.

Fig. 8.1 Dot Diagram

The following conclusions can be drawn by studying the nature of the scatter diagram:

If the pattern of the dot diagram is such that the points follow approximately a linear path from the bottom left corner to the top right [Fig. 8.2(a)], then the correlation is said to be positive, and if this linear path is from the top left corner to the bottom right [Fig. 8.2(b)], then the correlation is said to be negative.

When the dot diagram does not show any tendency of a linear path [Fig. 8.2(c), 8.2(d)], the correlation is said to be zero.

The correlation is said to be perfectly linear, having coefficient +1 or -1, according as the dot diagram takes the shape of a straight line making an angle of 45° or -45° with the x-axis [Fig. 8.2(e), 8.2(f)]. More generally, points lying exactly on any straight line with positive slope give coefficient +1, and on any straight line with negative slope give coefficient -1.

Fig. 8.2(a) to Fig. 8.2(f): scatter diagrams, with x on the horizontal axis and y on the vertical axis.

Thus we may say that a scatter diagram gives an indication of the degree of linear correlation between two given variables.

8.3 Correlation and Regression


When two variables are considered in a bivariate data, we say these two variables to be correlated if the change in the value of one is related to the change in the value of the other. For example, we may say that the pressure and volume of a gas are correlated.

Correlation may be of two types:
(1) If the increase in the value of one variable brings on average the increase in the value of the other variable, then these two variables are said to be positively correlated.
(2) If the increase in the value of one variable brings on average the decrease in the value of the other variable, then these two variables are said to be negatively correlated.

If the values of the variable y are not affected by changes in the values of the variable x, we say that the two variables are uncorrelated.

By the word regression we want to denote the estimation or the prediction of the approximate value of one variable for a specified value of the other variable. In other words, it stands to measure the average relationship between different variables.

Regression Curve: In a scatter diagram, in most of the cases, it is noticed that the points plotted in the diagram are more or less concentrated in the neighbourhood of a curve which is called the regression curve. The mathematical equation of a regression curve, called the regression equation, gives how on average a dependent variable changes under the change of the independent variable. So, if in a bivariate distribution the two variables denoted by x and y are correlated, then treating one variable as independent and the other as dependent, we may assume a functional relation between these two variables like $y = \phi(x)$ or $x = \psi(y)$, which are known as regression equations, and the mathematical curves denoted by them are called regression curves.
There are two types of regression curves: one is 'y on x' and another is 'x on y'.

(a) Regression curve of y on x: If among the two variables x and y in a bivariate distribution, y is taken as dependent variable and x as independent variable, then the corresponding regression curve like $y = \phi(x)$ is known as the regression curve of y on x. From this curve we may get approximately the value of the variable y by knowing that of the variable x.

(b) Regression curve of x on y: If among the two variables x and y in a bivariate data, x is taken as dependent variable and y as independent variable, then the corresponding regression curve like $x = \psi(y)$ is called the regression curve of x on y. From this curve we may get approximately the value of the variable x by knowing that of the variable y.

In particular, if both the regression curves are linear (i.e. the regression curves are straight lines), then the corresponding regressions are said to be linear regressions and the corresponding straight lines are called regression lines.

8.4 Correlation Coefficient
For simple regression, i.e. when the two regression curves are linear, the degree of collinearity is measured by a quantity known as the 'correlation coefficient', and it is generally denoted by $r_{xy}$ or $\rho_{xy}$.
To determine covariance, we use either formula (1) or formula (2).
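The covariance of x and y measures how the variables x and y are connected. However, Cov(x, y) is not a dimensionless quantity. Hence, to find a dimensionless relation between the variables x and y, we need to determine the correlation coefficient (or coefficient of correlation).
The correlation coefficient between x and y is denoted by $r_{xy}$ or $\rho_{xy}$ (which we denote by r(x, y) in chapter 5) and is defined by
\[ r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y} \qquad \ldots (3) \]
where $s_x$ and $s_y$ are the standard deviations of x and y respectively.
If the given sample for the variables x and y is $(x_i, y_i)$, i = 1, 2, ..., n, then from (3),
\[ r_{xy} = \frac{\frac{1}{n}\sum x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\frac{1}{n}\sum x_i^2 - \bar{x}^2}\;\sqrt{\frac{1}{n}\sum y_i^2 - \bar{y}^2}} \qquad \ldots (4) \]
i.e.,
\[ r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \qquad \ldots (5) \]
where the summation is taken over all samples of x and y.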


Note: The above formula to determine the correlation coefficient is known as Karl Pearson's formula.
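The sums form of Karl Pearson's formula translates directly into code. The following is a minimal sketch, not a reference implementation: the function name `pearson_r` and the height and weight values are hypothetical, chosen only for illustration. It also checks that changing the unit of one variable leaves the coefficient unchanged, which is what "dimensionless" means here.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the sums n, Σx, Σy, Σx², Σy², Σxy."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator

# Hypothetical illustrative data (not taken from the text).
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 79]

r = pearson_r(heights_cm, weights_kg)

# Re-expressing x in metres rescales the covariance but not r.
heights_m = [h / 100 for h in heights_cm]
r_in_metres = pearson_r(heights_m, weights_kg)

print(r, r_in_metres)
```

The explicit check on the denominator is a design choice of this sketch: when either variable is constant its standard deviation is zero and the coefficient is not defined.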
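Example 1. Given $\sum x = 56$, $\sum y = 40$, $\sum x^2 = 524$, $\sum y^2 = 256$, $\sum xy = 364$, n = 8, find (i) the correlation coefficient and (ii) the regression equation of x on y.

Solution:
With usual notations,
\[ r_{xy} = \frac{\mathrm{Cov}(x, y)}{s_x s_y}. \]
Now,
\[ s_x = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{524}{8} - \left(\frac{56}{8}\right)^2} = 4.06, \]
\[ s_y = \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2} = \sqrt{\frac{256}{8} - \left(\frac{40}{8}\right)^2} = 2.64. \]
Therefore,
\[ r_{xy} = \frac{\frac{364}{8} - \frac{56}{8}\cdot\frac{40}{8}}{4.06 \times 2.64} = 0.98. \]
Again,
\[ b_{xy} = r_{xy}\,\frac{s_x}{s_y} = 0.98 \times \frac{4.06}{2.64} = 1.51. \]
Now,
\[ \bar{x} = \frac{\sum x}{n} = \frac{56}{8} = 7, \qquad \bar{y} = \frac{\sum y}{n} = \frac{40}{8} = 5. \]
Thus, the regression line of x on y is
\[ x - \bar{x} = b_{xy}\,(y - \bar{y}) \]
or, x - 7 = 1.51(y - 5)
or, x - 1.51y + 0.55 = 0.
(Computed without intermediate rounding, $b_{xy} = \mathrm{Cov}(x, y)/s_y^2 = 10.5/7 = 1.5$, which gives x - 1.5y + 0.5 = 0; the values 1.51 and 0.55 reflect the rounding of $r_{xy}$, $s_x$ and $s_y$.)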
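The arithmetic of Example 1 can be re-run from the given sums. The sketch below assumes the divisor-n definitions of covariance and standard deviation used in the solution; the variable names are illustrative. Because it avoids intermediate rounding, the slope comes out as 1.5 rather than the rounded 1.51.

```python
import math

# Sums given in Example 1.
n = 8
sum_x, sum_y = 56, 40
sum_x2, sum_y2, sum_xy = 524, 256, 364

mean_x = sum_x / n
mean_y = sum_y / n

# Divisor-n covariance and standard deviations, as in the solution.
cov_xy = sum_xy / n - mean_x * mean_y
s_x = math.sqrt(sum_x2 / n - mean_x ** 2)
s_y = math.sqrt(sum_y2 / n - mean_y ** 2)

r = cov_xy / (s_x * s_y)

# Regression coefficient of x on y: b_xy = r * s_x / s_y = Cov(x, y) / s_y^2.
b_xy = cov_xy / s_y ** 2
intercept = mean_x - b_xy * mean_y  # line: x = b_xy * y + intercept

print(round(r, 2), b_xy, intercept)
```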

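Example 2. Calculate the correlation coefficient from the following results:
n = 10, $\sum x = 140$, $\sum y = 150$, $\sum (x - 10)^2 = 180$, $\sum (y - 15)^2 = 215$, $\sum (x - 10)(y - 15) = 60$.

Solution:
Let u = x - 10 and v = y - 15.
Then,
\[ \sum u = \sum (x - 10) = \sum x - 10n = 140 - 10 \times 10 = 40, \]
\[ \sum v = \sum (y - 15) = \sum y - 15n = 150 - 15 \times 10 = 0. \]
Again,
\[ \sum u^2 = \sum (x - 10)^2 = 180, \qquad \sum v^2 = \sum (y - 15)^2 = 215 \]
and
\[ \sum uv = \sum (x - 10)(y - 15) = 60. \]
Now,
\[ s_u^2 = \frac{\sum u^2}{n} - \left(\frac{\sum u}{n}\right)^2 = \frac{180}{10} - \left(\frac{40}{10}\right)^2 = 2, \]
\[ s_v^2 = \frac{\sum v^2}{n} - \left(\frac{\sum v}{n}\right)^2 = \frac{215}{10} - \left(\frac{0}{10}\right)^2 = 21.5, \]
\[ \mathrm{Cov}(u, v) = \frac{\sum uv}{n} - \frac{\sum u}{n}\cdot\frac{\sum v}{n} = \frac{60}{10} - \frac{40}{10}\cdot\frac{0}{10} = 6. \]
Therefore,
\[ r_{xy} = r_{uv} = \frac{\mathrm{Cov}(u, v)}{s_u s_v} = \frac{6}{\sqrt{2} \times \sqrt{21.5}} = \frac{6}{6.557} = 0.91. \]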
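The solution of Example 2 relies on the fact that shifting the origin of each variable does not change the correlation coefficient. A short justification is sketched below, where a and b stand for arbitrary constants (here a = 10 and b = 15) and the divisor-n definitions are assumed:

```latex
\begin{align*}
u = x - a,\quad v = y - b
  &\;\Longrightarrow\; \bar{u} = \bar{x} - a,\quad \bar{v} = \bar{y} - b, \\
u - \bar{u} = x - \bar{x},
  &\qquad v - \bar{v} = y - \bar{y}, \\
\mathrm{Cov}(u, v) = \frac{1}{n}\sum (u - \bar{u})(v - \bar{v})
  &= \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \mathrm{Cov}(x, y), \\
s_u = s_x,\quad s_v = s_y
  &\;\Longrightarrow\; r_{uv} = \frac{\mathrm{Cov}(u, v)}{s_u s_v}
   = \frac{\mathrm{Cov}(x, y)}{s_x s_y} = r_{xy}.
\end{align*}
```

Since only deviations from the mean enter the formula, any convenient origin may be chosen to simplify the sums.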

Example 3. Calculate the coefficient of correlation for the following data:

x :  -3  -2  -1   0   1   2   3
y :   9   4   1   0   1   4   9

Interpret the result.

Solution:
To calculate the coefficient of correlation we consider the following table:

    x      y     x²     y²     xy
   -3      9      9     81    -27
   -2      4      4     16     -8
   -1      1      1      1     -1
    0      0      0      0      0
    1      1      1      1      1
    2      4      4     16      8
    3      9      9     81     27
Total 0   28     28    196      0

Correlation coefficient
\[ r_{xy} = \frac{\frac{0}{7} - 0 \times \frac{28}{7}}{\sqrt{\frac{28}{7} - 0^2}\;\sqrt{\frac{196}{7} - \left(\frac{28}{7}\right)^2}} = 0. \]
Since $r_{xy} = 0$, the variables are uncorrelated.
Note: This example shows that though the two variables x and y are connected by the relation $y = x^2$, i.e. they are not independent, they are uncorrelated. So two variables may be uncorrelated even if they are not independent.
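The point of Example 3 can be checked numerically. The sketch below rebuilds the data from the relation y = x² and applies the divisor-n formulas; the variable names are illustrative.

```python
import math

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]  # y is completely determined by x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
s_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
s_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)

r = cov_xy / (s_x * s_y)
print(r)
```

The covariance vanishes because the positive and negative values of x contribute products xy of equal size and opposite sign; the correlation coefficient only detects linear association, so a symmetric parabola goes unnoticed.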
Example 4. Suppose the following series of values for 2 variables x and y are given:

x :  1  2  3  4  5  6  7  8  9
y :  9  8  7  6  5  4  3  2  1

What will be the correlation coefficient between x and y?

Solution:
To find the correlation coefficient, let us consider the following table:

    x    y    u = x - 5    v = y - 5    u²    v²    uv
    1    9       -4            4        16    16   -16
    2    8       -3            3         9     9    -9
    3    7       -2            2         4     4    -4
    4    6       -1            1         1     1    -1
    5    5        0            0         0     0     0
    6    4        1           -1         1     1    -1
    7    3        2           -2         4     4    -4
    8    2        3           -3         9     9    -9
    9    1        4           -4        16    16   -16
Total             0            0        60    60   -60

Now,
\[ r_{uv} = \frac{\frac{-60}{9} - 0 \times 0}{\sqrt{\frac{60}{9} - 0^2}\;\sqrt{\frac{60}{9} - 0^2}} = -1. \]
So, $r_{xy} = r_{uv} = -1$, and hence the correlation coefficient between x and y is -1 and the variables x and y are linearly related.
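A quick numerical check of Example 4 follows. It is a sketch that uses the observation that every pair in the data satisfies y = 10 - x, and the same shifted variables u = x - 5 and v = y - 5 as the solution.

```python
import math

xs = list(range(1, 10))        # 1, 2, ..., 9
ys = [10 - x for x in xs]      # 9, 8, ..., 1 (exact linear relation)

# Shift both variables to the origin 5, as in the solution.
us = [x - 5 for x in xs]
vs = [y - 5 for y in ys]

n = len(us)
sum_u, sum_v = sum(us), sum(vs)
sum_u2 = sum(u * u for u in us)
sum_v2 = sum(v * v for v in vs)
sum_uv = sum(u * v for u, v in zip(us, vs))

cov_uv = sum_uv / n - (sum_u / n) * (sum_v / n)
s_u = math.sqrt(sum_u2 / n - (sum_u / n) ** 2)
s_v = math.sqrt(sum_v2 / n - (sum_v / n) ** 2)

r = cov_uv / (s_u * s_v)
print(r)  # equal to -1 up to floating-point rounding
```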
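Example 5. If x and y are two correlated variables with the same variance and the correlation coefficient is r, find the regression coefficient of x on (x + y) and that of (x + y) on x. Hence, find the correlation coefficient between x and (x + y).

Solution:
Let $\sigma_x = \sigma_y = \sigma$ and $r_{xy} = r$.
Now,
\[ r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \]
or, $\mathrm{Cov}(x, y) = r\sigma^2$.
Let u = x and v = x + y.
Then $\sigma_u^2 = \sigma^2$ and
\[ \sigma_v^2 = \mathrm{Var}(x + y) = \mathrm{Var}(x) + \mathrm{Var}(y) + 2\,\mathrm{Cov}(x, y) = \sigma^2 + \sigma^2 + 2r\sigma^2 = 2\sigma^2(1 + r). \]
Also,
\begin{align*}
\mathrm{Cov}(u, v) &= \frac{1}{n}\sum (u - \bar{u})(v - \bar{v}) \\
&= \frac{1}{n}\sum (x - \bar{x})\{(x + y) - (\bar{x} + \bar{y})\} \\
&= \frac{1}{n}\sum (x - \bar{x})\{(x - \bar{x}) + (y - \bar{y})\} \\
&= \frac{1}{n}\sum (x - \bar{x})^2 + \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) \\
&= \sigma^2 + \mathrm{Cov}(x, y) = \sigma^2 + r\sigma^2 = \sigma^2(1 + r).
\end{align*}
Thus,
\[ b_{uv} = \frac{\mathrm{Cov}(u, v)}{\sigma_v^2} = \frac{\sigma^2(1 + r)}{2\sigma^2(1 + r)} = \frac{1}{2}. \]
Similarly,
\[ b_{vu} = \frac{\mathrm{Cov}(v, u)}{\sigma_u^2} = \frac{\sigma^2(1 + r)}{\sigma^2} = 1 + r. \]
Hence,
\[ r_{uv} = \sqrt{b_{uv}\,b_{vu}} = \sqrt{\frac{1 + r}{2}}, \]
the positive root being taken because both regression coefficients are positive.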
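The results of Example 5, namely that the regression coefficient of x on (x + y) is 1/2, that of (x + y) on x is 1 + r, and the correlation between x and (x + y) is $\sqrt{(1 + r)/2}$, can be spot-checked on a small data set. This is only a sketch: the data below are hypothetical and chosen so that x and y have the same variance, and the helper names `mean` and `cov` are illustrative.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Divisor-n covariance; cov(a, a) is the variance of a."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)

# Hypothetical data with equal variances (y is a rearrangement of x).
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

variance = cov(xs, xs)
r = cov(xs, ys) / variance          # since sigma_x = sigma_y

u = xs
v = [x + y for x, y in zip(xs, ys)]

b_uv = cov(u, v) / cov(v, v)        # regression coefficient of x on (x + y)
b_vu = cov(u, v) / cov(u, u)        # regression coefficient of (x + y) on x
r_uv = cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))

print(r, b_uv, b_vu, r_uv)
```

A single data set does not prove the general result, but it is a cheap guard against algebra slips in the derivation.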
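Example 6. A bivariate sample of size 11 gave the results $\bar{x} = 7$, $s_x = 2$, $\bar{y} = 9$, $s_y = 4$ and r = 0.5. It was later found that one pair (x = 7, y = 9) was inaccurate and was rejected. How would the original value of r be affected by this rejection?

Solution:
When x = 7 and y = 9 are rejected, n = 10.
Now,
\[ \bar{x} = \frac{7 \times 11 - 7}{10} = 7, \qquad \bar{y} = \frac{9 \times 11 - 9}{10} = 9. \]
\[ \sum x^2 = (4 + 49) \times 11 - 49 = 534 \qquad \left[\text{since } s_x^2 = \frac{\sum x^2}{n} - \bar{x}^2, \text{ so } \sum x^2 = (s_x^2 + \bar{x}^2)\,n\right] \]
\[ \sum y^2 = (16 + 81) \times 11 - 81 = 986 \]
\[ \sum xy = [0.5 \times 2 \times 4 + (7 \times 9)] \times 11 - (7 \times 9) = 674 \qquad \left[\text{since } r = \frac{\frac{\sum xy}{n} - \bar{x}\,\bar{y}}{s_x s_y}, \text{ so } \sum xy = (r s_x s_y + \bar{x}\,\bar{y})\,n\right] \]
So, now
\[ s_x = \sqrt{\frac{534}{10} - 7^2} = 2.098, \qquad s_y = \sqrt{\frac{986}{10} - 9^2} = 4.195, \]
\[ r = \frac{\frac{\sum xy}{n} - \bar{x}\,\bar{y}}{s_x s_y} = \frac{\frac{674}{10} - 7 \times 9}{2.098 \times 4.195} \approx 0.5. \]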

So the new correlation is unaffected by this rejection.
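The bookkeeping in Example 6 (recover the sums from the summary statistics, subtract the rejected pair, recompute) can be sketched as follows. The variable names are illustrative, and the divisor-n definitions are assumed throughout, as in the solution.

```python
import math

# Summary statistics of the original sample.
n = 11
mean_x, s_x = 7, 2
mean_y, s_y = 9, 4
r = 0.5

# The rejected pair.
bad_x, bad_y = 7, 9

# Recover the sums of the original sample, then remove the rejected pair.
sum_x = n * mean_x - bad_x
sum_y = n * mean_y - bad_y
sum_x2 = n * (s_x ** 2 + mean_x ** 2) - bad_x ** 2
sum_y2 = n * (s_y ** 2 + mean_y ** 2) - bad_y ** 2
sum_xy = n * (r * s_x * s_y + mean_x * mean_y) - bad_x * bad_y

m = n - 1  # size of the corrected sample
new_mean_x = sum_x / m
new_mean_y = sum_y / m
new_s_x = math.sqrt(sum_x2 / m - new_mean_x ** 2)
new_s_y = math.sqrt(sum_y2 / m - new_mean_y ** 2)
new_r = (sum_xy / m - new_mean_x * new_mean_y) / (new_s_x * new_s_y)

print(round(new_s_x, 3), round(new_s_y, 3), round(new_r, 3))
```

The rejected pair coincides with the sample means, so it contributes nothing to the deviations from the mean; the standard deviations grow slightly because the divisor drops from 11 to 10, but the ratio that defines r is unchanged.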
