
Unit III Notes

The document discusses correlation and regression analysis, explaining the concept of correlation as the relationship between two variables and detailing types of correlation such as positive, negative, simple, multiple, linear, and non-linear. It introduces methods for studying correlation, including scatter diagrams and the Karl Pearson coefficient, and provides formulas for calculating correlation coefficients and rank correlation. Additionally, it covers regression analysis, defining independent and dependent variables, and explaining linear and curvilinear regression with associated equations.

Uploaded by

sumitrathore2001
Copyright
© © All Rights Reserved

Unit III

Correlation & Regression Analysis


Correlation:- If a change in one variable is associated with a change in the other variable, the variables are said to
be correlated.

For eg.:- The correlation between

(1) income and expenditure
(2) the height and weight of a person
(3) the volume and pressure of a gas
(4) the price and demand of a commodity

Types of Correlation
(1)​ Positive and negative
(2)​ Simple and multiple
(3)​ Linear and non linear

Positive and Negative Correlation:-

Positive correlation:- If both variables vary in the same direction, i.e., an increase (or decrease) in one is
accompanied by an increase (or decrease) in the other, the correlation is said to be positive or direct.
Eg.:- (i) the heights and weights of a group of persons, (ii) income and expenditure.

Negative correlation:- If the variables consistently deviate in opposite directions, i.e., an increase (or
decrease) in one results in a corresponding decrease (or increase) in the other, the correlation is said to be
negative (or inverse). Eg.:- (i) price and demand of a commodity, (ii) the volume and pressure of a perfect
gas.

Simple and Multiple Correlation:-

Simple Correlation:- If only two variables are studied, the correlation is called simple correlation.

Multiple Correlation:- If three or more variables are studied together, the correlation is called multiple
correlation. For eg:- the yield of rice per acre studied together with both the amount of rainfall and the
amount of fertilizer used.

Linear and Non Linear Correlation:-

Linear Correlation:- If the amount of change in one variable tends to bear a constant ratio to the amount
of change in the other variable, the correlation is said to be linear.
For eg:-

X   10   20   30   40   50
Y   70  140  210  280  350

Here every increase of 10 in X is matched by an increase of 70 in Y, so the ratio of the changes is constant.

Non Linear Correlation (Curvilinear):- If the amount of change in one variable does not bear a constant
ratio to the amount of change in the other variable, the correlation is said to be non-linear.

For eg:-

X   10   20   30   40   50
Y   20   10  200   80   50

Methods of studying correlation

(1) Scatter diagram
(2) Graphical method
(3) Karl Pearson coefficient of correlation
(4) Rank method
(5) Concurrent deviation method
(6) Method of least squares

Scatter Diagram
It is the simplest diagrammatic representation of bivariate (i.e. two-variable) data. For the bivariate
distribution $(x_i, y_i)$; i = 1, 2, ..., n, if the values of the variables X and Y are plotted along the
x-axis and y-axis respectively in the xy-plane, the diagram of dots so obtained is known as a
scatter diagram. From the scatter diagram we can form a fairly good, though rough, idea of whether
the variables are correlated or not. E.g., if the points are very dense, i.e. very close to each other, we
should expect a fairly good amount of correlation between the variables, and if the points are widely
scattered, a poor correlation is expected. This method, however, is not suitable if the number of
observations is fairly large.

Karl Pearson Coefficient of Correlation


Karl Pearson (1857-1936), a British biometrician, developed a formula to measure the linear relationship
between two variables, called the correlation coefficient.
The correlation coefficient between two random variables X and Y, usually denoted by r(X, Y) or simply
$r_{XY}$, is a numerical measure of the linear relationship between them and is defined as

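$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$$

where Cov(X,Y) is the covariance of (X,Y), and $\sigma_x$ and $\sigma_y$ are the standard deviations (S.D.) of X and Y respectively. Here

$$\mathrm{Cov}(X,Y) = \frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y})$$

$$\sigma_x^2 = \frac{\sum (X-\bar{X})^2}{n}, \qquad \sigma_y^2 = \frac{\sum (Y-\bar{Y})^2}{n}$$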

Remarks:-

(1) $-1 \le r_{XY} \le 1$
(2) When r = 1 there is perfect positive correlation between the variables.
(3) When r = -1 there is perfect negative correlation between the variables.
(4) When r = 0 there is no linear relationship between the variables (they are uncorrelated).
(5) It may be noted that r(X, Y) provides a measure of the linear relationship between X and Y. For a
nonlinear relationship, however, it is not very suitable.
(6) Sometimes we write: $\mathrm{Cov}(X,Y) = \sigma_{XY}$
(7) Karl Pearson's correlation coefficient is also called the product-moment correlation coefficient,
since
$\mathrm{Cov}(X,Y) = E[(X - E(X))(Y - E(Y))] = \mu_{11}$

Eg:- Calculate the correlation coefficient for the following heights (in inches) of fathers (X) and their sons
(Y):

X   65   66   67   67   68   69   70   72
Y   67   68   65   68   72   72   69   71

Solution:- Calculation for correlation coefficient

X      Y      X-68   Y-69   (X-68)^2   (Y-69)^2   (X-68)(Y-69)
65     67     -3     -2      9          4          6
66     68     -2     -1      4          1          2
67     65     -1     -4      1         16          4
67     68     -1     -1      1          1          1
68     72      0      3      0          9          0
69     72      1      3      1          9          3
70     69      2      0      4          0          0
72     71      4      2     16          4          8
Total
544    552     0      0     36         44         24

$$\bar{X} = \frac{\sum X}{n} = \frac{544}{8} = 68, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{552}{8} = 69$$

$$\mathrm{Cov}(X,Y) = \frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y}) = \frac{24}{8} = 3$$

$$\sigma_x^2 = \frac{\sum (X-\bar{X})^2}{n} = \frac{36}{8} = 4.5, \qquad \sigma_y^2 = \frac{\sum (Y-\bar{Y})^2}{n} = \frac{44}{8} = 5.5$$

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{3}{\sqrt{4.5 \times 5.5}} = 0.603$$
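
The arithmetic of this example can be checked with a short Python sketch. The function name `pearson_r` is an illustrative choice and not part of the notes; it assumes the divide-by-n (population) formulas for covariance and variance defined earlier.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson correlation coefficient using the divide-by-n formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances as defined in the notes (population form)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

# Heights (in inches) of fathers (X) and sons (Y) from the example
fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

r = pearson_r(fathers, sons)
print(round(r, 3))  # 0.603
```

Because the factor 1/n cancels between numerator and denominator, dividing by n - 1 instead would give the same value of r.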

Rank Correlation Coefficient or Spearman’s Rank Correlation


Let us suppose that a group of n individuals is arranged in order of merit or proficiency in possession of
two characteristics A and B. These ranks in the two characteristics will, in general, be different.

For example:- If we consider the relation between intelligence and beauty, it is not necessary that a
beautiful individual is intelligent also. Let $(x_i, y_i)$; i = 1, 2, ..., n be the ranks of the ith individual in the two
characteristics A and B respectively. The Pearson coefficient of correlation between the ranks $x_i$ and $y_i$ is
called the rank correlation coefficient between A and B for that group of individuals.

Spearman's formula for the rank correlation coefficient:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$$

where $d_i = x_i - y_i$ is the difference between the ranks of the ith individual in the two
characteristics and n is the number of paired observations.

This method is useful when the given data is of a qualitative nature (e.g. honesty, beauty).

Eg:- The ranks of the same 16 students in Mathematics and Physics are as follows. The two numbers within
brackets denote the ranks of a student in Mathematics and Physics respectively.

(1,1) (2,10) (3,3) (4,4) (5,5) (6,7) (7,2) (8,6) (9,8) (10,11) (11,15) (12,9) (13,14) (14,12) (15,16) (16,13)

Calculate the rank correlation coefficient for the proficiencies of this group in Mathematics and Physics.

Solution:- Calculation for rank correlation

Rank in Maths (X)   Rank in Physics (Y)   d = X-Y   d^2
 1                   1                     0         0
 2                  10                    -8        64
 3                   3                     0         0
 4                   4                     0         0
 5                   5                     0         0
 6                   7                    -1         1
 7                   2                     5        25
 8                   6                     2         4
 9                   8                     1         1
10                  11                    -1         1
11                  15                    -4        16
12                   9                     3         9
13                  14                    -1         1
14                  12                     2         4
15                  16                    -1         1
16                  13                     3         9
                                          $\sum d^2 = 136$

The rank correlation coefficient is given by

$$\rho = 1 - \frac{6\sum d^2}{n(n^2-1)} = 1 - \frac{6(136)}{16(255)} = 0.8$$
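
The same calculation can be sketched in Python. The helper `spearman_rho` is a hypothetical name introduced here; it assumes the inputs are already ranks with no ties, which is the case in this example.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# (Mathematics rank, Physics rank) for the 16 students in the example
pairs = [(1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
         (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16), (16, 13)]
maths = [p[0] for p in pairs]
physics = [p[1] for p in pairs]

rho = spearman_rho(maths, physics)
print(round(rho, 3))  # 0.8
```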

Repeated Ranks:- If any two or more individuals are bracketed equal in any classification with respect to
characteristics A and B, or if there is more than one item with the same value in the series, then
Spearman's formula for calculating the rank correlation coefficient breaks down, since in this case each
of the variables X and Y does not assume the values 1, 2, ..., n and consequently $\bar{x} \neq \bar{y}$.
In this case, common ranks are given to the repeated items. This common rank is the average of the
ranks which these items would have assumed if they were slightly different from each other, and the next
item gets the rank next to the ranks already assumed. As a result, the following adjustment or
correction is made in the rank correlation formula.

In the formula, we add the factor $\frac{m(m^2-1)}{12}$ to $\sum d^2$, where m is the number of times an item is repeated.

This correction factor is to be added for each repeated value in both the X-series and the Y-series.

Eg:- Obtain the rank correlation coefficient for the following data:

X   68   64   75   50   64   80   75   40   55   64
Y   62   58   68   45   81   60   68   48   50   70

Solution:- Calculation for rank correlation

X     Y     Rank X (x)   Rank Y (y)   d = x-y   d^2
68    62     4            5           -1         1
64    58     6            7           -1         1
75    68     2.5          3.5         -1         1
50    45     9           10           -1         1
64    81     6            1            5        25
80    60     1            6           -5        25
75    68     2.5          3.5         -1         1
40    48    10            9            1         1
55    50     8            8            0         0
64    70     6            2            4        16
                                      $\sum d^2 = 72$

In the X-series the value 75 occurs 2 times. The common rank given to these values is 2.5,
which is the average of 2 and 3. The next value, 68, then gets the next rank, which is 4.
Again, the value 64 occurs thrice. The common rank given to it is 6, which is the average of 5, 6
and 7.
Similarly, in the Y-series the value 68 occurs twice and its common rank is 3.5, which is the average of 3
and 4.
In the X-series the correction is to be applied twice, once for the value 75 which occurs twice (m = 2) and
then for the value 64 which occurs thrice (m = 3). The total correction for the X-series is

$$\frac{2(4-1)}{12} + \frac{3(9-1)}{12} = \frac{5}{2}$$

Similarly, the correction for the Y-series is $\frac{2(4-1)}{12} = \frac{1}{2}$, as the value 68 occurs twice.

$$\rho = 1 - \frac{6\left(\sum d^2 + \frac{5}{2} + \frac{1}{2}\right)}{n(n^2-1)} = 1 - \frac{6(75)}{10(99)} = 0.545$$
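
The tie handling in this example can be sketched in Python as follows. The helper names (`average_ranks`, `tie_correction`, `spearman_rho_ties`) are illustrative assumptions, and ranks are assigned in descending order (largest value gets rank 1), as in the table of this example.

```python
from collections import Counter

def average_ranks(values):
    """Rank values in descending order; tied values share the average of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that is repeated m times."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_ties(xs, ys):
    """Spearman's rho with the correction factor added to sum(d^2) for repeated values."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))

X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

print(average_ranks(X))                    # ranks of X, with 2.5 and 6.0 for the ties
print(tie_correction(X), tie_correction(Y))  # 2.5 0.5
print(round(spearman_rho_ties(X, Y), 3))   # 0.545
```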

Regression
The term "regression" literally means "stepping back towards the average". It was first used by the British
biometrician Sir Francis Galton (1822-1911) in connection with the inheritance of stature.

Definition:- Regression analysis is a mathematical measure of the average relationship between two or
more variables in terms of the original units of the data.
In regression analysis there are two types of variables.
1) Independent variable:- The variable which influences the values or is used for prediction is called the
independent variable. It is also known as the regressor, predictor or explanatory variable.
2) Dependent variable:- The variable whose value is influenced or is to be predicted is called the dependent
variable. It is also known as the regressed or explained variable.
Regression measures the nature and extent of correlation.

Curve of Regression:- If two variables x and y are correlated, i.e., there exists an association or relationship
between them, the points of the scatter diagram will be more or less concentrated round a curve. This curve is called
the curve of regression, and the relationship is said to be expressed by means of curvilinear regression.

Regression Equation:- The mathematical equation of the regression curve is called the regression equation.

Linear Regression:- If the curve is a straight line, it is called the line of regression and there is said to be
linear regression between the variables; otherwise the regression is said to be curvilinear.

Y = a + bX is called the regression line of Y on X.

X = a' + b'Y is called the regression line of X on Y (its constants generally differ from those of the line of Y on X).

Regression Coefficients:- 'b', the slope of the line of regression of Y on X, is also called the coefficient of
regression of Y on X. It represents the increment in the value of the dependent variable Y corresponding to a
unit change in the value of the independent variable X. More precisely, we write

$$b_{YX} = \text{Regression coefficient of Y on X} = r\,\frac{\sigma_y}{\sigma_x}$$

where r is the correlation coefficient.

Similarly, the coefficient of regression of X on Y indicates the change in the value of variable X
corresponding to a unit change in the value of variable Y and is given by

$$b_{XY} = \text{Regression coefficient of X on Y} = r\,\frac{\sigma_x}{\sigma_y}$$

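Eg:- Prove that the correlation coefficient is the geometric mean between the regression coefficients.

Proof:- As we know,

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$

Multiplying the two equations, we get

$$b_{yx} \cdot b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \cdot r\,\frac{\sigma_x}{\sigma_y} = r^2$$

$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$

The sign of r is the same as the common sign of the two regression coefficients.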

Properties of Regression Coefficients:-

(1) The correlation coefficient is the geometric mean between the regression coefficients.
(2) If one of the regression coefficients is greater than unity, the other must be less than unity.
(3) The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient r, provided r > 0.
(4) Regression coefficients are independent of the change of origin but not of scale.
(5) Angle between the two lines of regression: the equations of the two lines are

$$Y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}(X - \bar{x}) \quad \text{and} \quad X - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}(Y - \bar{y})$$

If θ is the angle between the two lines of regression, then

$$\theta = \tan^{-1}\left\{\frac{1-r^2}{r}\left(\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)\right\}$$
Cases:-
(i) (r = 0). If r = 0, $\tan\theta = \infty \implies \theta = \frac{\pi}{2}$.
Thus if the two variables are uncorrelated, the lines of regression become
perpendicular to each other.
(ii) (r = ±1). If r = ±1, $\tan\theta = 0 \implies \theta = 0$ or $\pi$.
In this case the two lines of regression either coincide or are parallel to each
other. But since both the lines of regression pass through the point $(\bar{x}, \bar{y})$, they cannot be parallel; hence they coincide.
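
The two limiting cases can be checked numerically with a minimal Python sketch. The function `regression_angle` is a hypothetical helper; as an assumption beyond the notes, it uses |r| in the denominator so that the acute angle is returned for negative r as well, and it treats r = 0 separately because tan θ is infinite there.

```python
from math import atan, pi

def regression_angle(r, sd_x, sd_y):
    """Acute angle (in radians) between the two lines of regression."""
    if r == 0:
        return pi / 2  # tan(theta) is infinite: the lines are perpendicular
    # Assumption: |r| is used so that the acute angle is returned when r < 0
    tan_theta = (1 - r ** 2) / abs(r) * (sd_x * sd_y) / (sd_x ** 2 + sd_y ** 2)
    return atan(tan_theta)

print(regression_angle(0, 3, 4) == pi / 2)  # True: uncorrelated variables
print(regression_angle(1, 3, 4))            # 0.0: the lines coincide
print(regression_angle(-1, 3, 4))           # 0.0: the lines coincide
```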
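Eg.:- In a partially destroyed laboratory record of an analysis of correlation data, only the following results
are legible:
Variance of X = 9.
Regression equations: 8X - 10Y + 66 = 0, 40X - 18Y = 214.
What were (i) the mean values of X and Y,
(ii) the correlation coefficient between X and Y, and
(iii) the standard deviation of Y?
Solution:- (i) Since both the lines of regression pass through the point $(\bar{X}, \bar{Y})$, we have

$$8\bar{X} - 10\bar{Y} + 66 = 0 \quad \text{and} \quad 40\bar{X} - 18\bar{Y} = 214$$

Solving by the elimination method, we get $\bar{X} = 13$, $\bar{Y} = 17$.

(ii) Let 8X - 10Y + 66 = 0 and 40X - 18Y = 214 be the lines of regression of Y on X and X on Y respectively.
These equations can be put in the form:

$$Y = \frac{8}{10}X + \frac{66}{10} \quad \text{and} \quad X = \frac{18}{40}Y + \frac{214}{40}$$

$$b_{yx} = \frac{8}{10} \quad \text{and} \quad b_{xy} = \frac{18}{40}$$

Hence $r^2 = b_{xy} \cdot b_{yx} = \frac{9}{25}$, so $r = \pm\frac{3}{5} = \pm 0.6$. Since both regression coefficients are positive, r must be positive, i.e. r = 0.6.
(The opposite assignment of the two lines would give $b_{yx} = \frac{40}{18}$ and $b_{xy} = \frac{10}{8}$, whose product exceeds 1, which is impossible since $r^2 \le 1$.)

(iii) We have

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \implies \frac{4}{5} = \frac{3}{5} \cdot \frac{\sigma_y}{3} \quad (\text{given } \sigma_x^2 = 9, \text{ so } \sigma_x = 3)$$

So $\sigma_y = 4$.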
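
The three answers can be reproduced with a short Python sketch. The variable names are illustrative, and the 2x2 system is solved here by Cramer's rule rather than the elimination method used in the solution; both give the same means.

```python
from math import sqrt

# Regression lines written as a*X + b*Y = c:
#   8X - 10Y = -66   (taken as the line of Y on X)
#   40X - 18Y = 214  (taken as the line of X on Y)
a1, b1, c1 = 8, -10, -66
a2, b2, c2 = 40, -18, 214

# (i) Both lines pass through (mean_x, mean_y); solve by Cramer's rule
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det
mean_y = (a1 * c2 - a2 * c1) / det

# (ii) Slopes of the two lines give the regression coefficients
b_yx = 8 / 10   # from Y = (8/10) X + 66/10
b_xy = 18 / 40  # from X = (18/40) Y + 214/40
r = sqrt(b_yx * b_xy)  # positive root, since both coefficients are positive

# (iii) b_yx = r * sd_y / sd_x, with variance of X = 9
sd_x = sqrt(9)
sd_y = b_yx * sd_x / r

print(mean_x, mean_y)   # 13.0 17.0
print(round(r, 3))      # 0.6
print(round(sd_y, 3))   # 4.0
```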

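Eg:- From the following data, obtain the two regression equations.

X    6    2   10    4    8
Y    9   11    5    8    7

Solution:- Calculation for regression equations

X     Y     x = X-6   y = Y-8   xy     x^2   y^2
 6     9     0         1          0     0     1
 2    11    -4         3        -12    16     9
10     5     4        -3        -12    16     9
 4     8    -2         0          0     4     0
 8     7     2        -1         -2     4     1
Total
30    40     0         0        -26    40    20

$$\bar{X} = \frac{\sum X}{n} = \frac{30}{5} = 6, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{40}{5} = 8$$

$$r = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y} = \frac{\frac{1}{n}\sum (X-\bar{X})(Y-\bar{Y})}{\left(\frac{\sum (X-\bar{X})^2}{n}\right)^{1/2}\left(\frac{\sum (Y-\bar{Y})^2}{n}\right)^{1/2}} = \frac{\sum xy}{\left(\sum x^2\right)^{1/2}\left(\sum y^2\right)^{1/2}} = \frac{-26}{(40)^{1/2}(20)^{1/2}} \approx -0.919$$

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3$$

$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65$$

Now the equations are

Y - 8 = -0.65(X - 6)    (regression equation of Y on X)
X - 6 = -1.3(Y - 8)     (regression equation of X on Y)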
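
The regression coefficients of this example can be recomputed with a minimal Python sketch. The function name `regression_coefficients` is an assumption made for illustration; it uses the deviation sums Σxy, Σx² and Σy² exactly as in the table.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sum_xy / sum_x2  # slope of the line of Y on X
    b_xy = sum_xy / sum_y2  # slope of the line of X on Y
    return mean_x, mean_y, b_yx, b_xy

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

mean_x, mean_y, b_yx, b_xy = regression_coefficients(X, Y)
print(mean_x, mean_y, b_yx, b_xy)  # 6.0 8.0 -0.65 -1.3

# Predicted Y at X = 10 from the line Y - mean_y = b_yx * (X - mean_x)
print(round(mean_y + b_yx * (10 - mean_x), 2))  # 5.4
```

The product of the two coefficients, 0.845, equals r² for this data, which agrees with the geometric-mean property proved earlier.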
