Correlation and Regression

Uploaded by

helpd5124
CORRELATION

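1. Correlation coefficient is independent of change of origin and change of scale

Proof: Let $u_i = \dfrac{x_i - a}{h}$, $v_i = \dfrac{y_i - b}{k}$, where $h > 0$ and $k > 0$.
$\Rightarrow x_i = a + h u_i, \quad y_i = b + k v_i$
$\dfrac{1}{N}\sum x_i = a + h \cdot \dfrac{1}{N}\sum u_i, \quad \dfrac{1}{N}\sum y_i = b + k \cdot \dfrac{1}{N}\sum v_i$
∴ $\bar{x} = a + h\bar{u}, \quad \bar{y} = b + k\bar{v}$, so that $x_i - \bar{x} = h(u_i - \bar{u})$ and $y_i - \bar{y} = k(v_i - \bar{v})$.
But
$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \cdot \sum (y_i - \bar{y})^2}}$$
$$= \frac{\sum h(u_i - \bar{u}) \cdot k(v_i - \bar{v})}{\sqrt{h^2 \sum (u_i - \bar{u})^2 \cdot k^2 \sum (v_i - \bar{v})^2}}$$
$$= \frac{hk \sum (u_i - \bar{u})(v_i - \bar{v})}{hk \sqrt{\sum (u_i - \bar{u})^2 \cdot \sum (v_i - \bar{v})^2}} = r_{uv}$$
∴ $r_{xy} = r_{uv}$
Thus, $r$ is independent of change of origin $(a, b)$ and change of scale $(h, k)$.
Remark: The above theorem can also be stated as: "If $x = au + b$, $y = cv + d$, where
$a, b, c, d$ are constants with $a$ and $c$ of the same sign, then $r_{xy} = r_{uv}$."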
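The invariance can be checked numerically. The following is a minimal sketch, assuming illustrative data and constants that are not taken from the text; `pearson_r` is a helper name introduced here only for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (an assumption, not from the text)
x = [100, 200, 300, 400, 500]
y = [32, 38, 51, 59, 70]

# Change of origin (a, b) and of scale (h, k), with h, k > 0
a, h = 300, 100
b, k = 50, 10
u = [(xi - a) / h for xi in x]
v = [(yi - b) / k for yi in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(r_xy, r_uv)  # the two values agree up to rounding error
```

If $h$ or $k$ were negative, the sign of $r$ would flip, which is why the sketch keeps both positive.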
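2. If $x$ and $y$ are two correlated variables with the same standard deviation and
having correlation coefficient $r$, show that the correlation coefficient between $x$
and $x + y$ is $\sqrt{\dfrac{1 + r}{2}}$.
Solution: By definition
$$r(x, x+y) = \frac{\mathrm{cov}(x, x+y)}{\sigma_x \, \sigma_{x+y}} \quad \cdots (1)$$
Now,
$$\mathrm{cov}(x, x+y) = \frac{1}{n}\sum (x_i - \bar{x})(x_i + y_i - \bar{x} - \bar{y})$$
$$= \frac{1}{n}\sum (x_i - \bar{x})\{(x_i - \bar{x}) + (y_i - \bar{y})\}$$
$$= \frac{1}{n}\sum (x_i - \bar{x})^2 + \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})$$
$$= \sigma_x^2 + \mathrm{cov}(x, y)$$
But $r = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$, ∴ $\mathrm{cov}(x, y) = r\sigma_x\sigma_y$
∴ $\mathrm{cov}(x, x+y) = \sigma_x^2 + r\sigma_x\sigma_y = \sigma^2 + r\sigma^2 = \sigma^2(1 + r)$ (∵ $\sigma_x = \sigma_y = \sigma$, say, by data) $\cdots (2)$
Now to find $\sigma_{x+y}$: let $z = x + y$
∴ $E(z) = E(x) + E(y)$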
2 | APPLIED MATHEMATICS - IV

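$$\sigma_z^2 = E[z - E(z)]^2 = E[x + y - E(x) - E(y)]^2$$
$$= E[\{x - E(x)\} + \{y - E(y)\}]^2$$
$$= E[x - E(x)]^2 + E[y - E(y)]^2 + 2E[\{x - E(x)\}\{y - E(y)\}]$$
∴ $\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2\,\mathrm{cov}(x, y)$
But $r = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$
∴ $\mathrm{cov}(x, y) = r\sigma_x\sigma_y$
∴ $\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y$
$= 2\sigma^2 + 2r\sigma^2 = 2\sigma^2(1 + r)$ (∵ $\sigma_x = \sigma_y = \sigma$) $\cdots (3)$
Putting the values from (2), (3) in (1),
$$r(x, x+y) = \frac{\sigma^2(1 + r)}{\sigma\sqrt{2\sigma^2(1 + r)}} = \frac{\sigma^2(1 + r)}{\sigma^2\sqrt{2(1 + r)}} = \sqrt{\frac{1 + r}{2}}$$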
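A quick numerical check of this result is sketched below. The data are illustrative assumptions, and $y$ is rescaled so that both variables have the same standard deviation, as the problem requires.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance (divisor n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    return cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))

# Illustrative data (an assumption, not from the text)
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y_raw = [1.0, 3.0, 2.0, 6.0, 5.0]

# Rescale y so that sigma_y equals sigma_x
scale = math.sqrt(cov(x, x) / cov(y_raw, y_raw))
y = [yi * scale for yi in y_raw]

r = corr(x, y)
x_plus_y = [xi + yi for xi, yi in zip(x, y)]
lhs = corr(x, x_plus_y)          # r(x, x + y) computed directly
rhs = math.sqrt((1 + r) / 2)     # value given by the result above
print(lhs, rhs)
```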

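3. If $z = ax + by$ and $r$ is the correlation coefficient between $x$ and $y$, prove that
$$\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2abr\,\sigma_x\sigma_y.$$
Solution: Since $z = ax + by$, $E(z) = aE(x) + bE(y)$
∴ $z - E(z) = a\{x - E(x)\} + b\{y - E(y)\}$
Squaring both sides,
$$\{z - E(z)\}^2 = a^2\{x - E(x)\}^2 + b^2\{y - E(y)\}^2 + 2ab\{x - E(x)\}\{y - E(y)\}$$
Taking expectations of both sides and using $E[\{x - E(x)\}\{y - E(y)\}] = \mathrm{cov}(x, y) = r\sigma_x\sigma_y$,
$$\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2abr\,\sigma_x\sigma_y \quad \cdots (1)$$
Cor. 1: Changing the sign of $b$, i.e. if $z = ax - by$,
$$\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 - 2abr\,\sigma_x\sigma_y \quad \cdots (2)$$
Cor. 2: Putting $a = 1$, $b = 1$ in (1),
$$\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y \quad \cdots (3)$$
$$\therefore r = \frac{\sigma_{x+y}^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y}$$
Cor. 3: Putting $a = 1$, $b = -1$ in (1),
$$\sigma_{x-y}^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y \quad \cdots (4)$$
$$\therefore r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x\sigma_y}$$
Cor. 4: Subtracting (4) from (3),
$$\sigma_{x+y}^2 - \sigma_{x-y}^2 = 4r\sigma_x\sigma_y$$
∴ $\sigma_{x+y}^2 - \sigma_{x-y}^2$ is $> 0$, $= 0$ or $< 0$ according as $r > 0$, $r = 0$ or $r < 0$.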
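Result (1) can be verified on a small data set. This is a sketch under assumed data and constants (none of them from the text); `cov(x, y)` stands in for $r\sigma_x\sigma_y$.

```python
def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance (divisor n); cov(x, x) is the variance of x."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Illustrative data and constants (assumptions, not from the text)
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0]
a, b = 3.0, -2.0

z = [a * xi + b * yi for xi, yi in zip(x, y)]

var_z_direct = cov(z, z)
# a^2 sigma_x^2 + b^2 sigma_y^2 + 2ab * r sigma_x sigma_y
var_z_formula = a**2 * cov(x, x) + b**2 * cov(y, y) + 2 * a * b * cov(x, y)
print(var_z_direct, var_z_formula)
```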

4. Two random variables $x$ and $y$ are jointly normally distributed, and $U$ and $V$ are
defined by $U = x\cos a + y\sin a$, $V = y\cos a - x\sin a$.
Show that $U$ and $V$ will be uncorrelated if
$$\tan 2a = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}$$
Solution: We have
$$\mathrm{cov}(U, V) = E[\{U - E(U)\}\{V - E(V)\}] \quad \cdots (1)$$
Now, $U = x\cos a + y\sin a$, so $E(U) = \cos a\,E(x) + \sin a\,E(y)$
$V = y\cos a - x\sin a$, so $E(V) = \cos a\,E(y) - \sin a\,E(x)$
Putting these values in (1),
$$\mathrm{cov}(U, V) = E\big[\{\cos a\,(x - E(x)) + \sin a\,(y - E(y))\} \times \{\cos a\,(y - E(y)) - \sin a\,(x - E(x))\}\big]$$
$$= E\big[\cos^2 a\,(x - E(x))(y - E(y)) - \cos a \sin a\,(x - E(x))^2 + \sin a \cos a\,(y - E(y))^2 - \sin^2 a\,(x - E(x))(y - E(y))\big]$$
$$= \cos^2 a\,E[(x - E(x))(y - E(y))] - \sin a \cos a\,E[x - E(x)]^2 + \sin a \cos a\,E[y - E(y)]^2 - \sin^2 a\,E[(x - E(x))(y - E(y))]$$
$$= (\cos^2 a - \sin^2 a)\,\mathrm{cov}(x, y) - \sin a \cos a\,[\sigma_x^2 - \sigma_y^2]$$
Now $U$, $V$ will be uncorrelated if $\mathrm{cov}(U, V) = 0$, i.e. if
$$\cos 2a \cdot \mathrm{cov}(x, y) = \frac{\sin 2a}{2}\,[\sigma_x^2 - \sigma_y^2]$$
But $r = \dfrac{\mathrm{cov}(x, y)}{\sigma_x\sigma_y}$
∴ $\mathrm{cov}(x, y) = r\sigma_x\sigma_y$
∴ $2\cos 2a \cdot r\sigma_x\sigma_y = \sin 2a\,[\sigma_x^2 - \sigma_y^2]$
$$\therefore \tan 2a = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}$$
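The condition can be tested numerically by choosing the angle from the formula and confirming that the covariance of $U$ and $V$ vanishes. A minimal sketch, assuming illustrative data not taken from the text:

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance (divisor n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Illustrative data (an assumption, not from the text)
x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 4.0]

var_x, var_y, cov_xy = cov(x, x), cov(y, y), cov(x, y)

# tan 2a = 2 cov(x, y) / (var_x - var_y), with cov(x, y) = r sigma_x sigma_y.
# atan2 is used so that the case var_x == var_y does not divide by zero.
a = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)

U = [xi * math.cos(a) + yi * math.sin(a) for xi, yi in zip(x, y)]
V = [yi * math.cos(a) - xi * math.sin(a) for xi, yi in zip(x, y)]

cov_uv = cov(U, V)
print(a, cov_uv)  # cov_uv is zero up to rounding error
```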
5. Calculate the correlation coefficient from the following data

X: 23 27 28 29 30 31 33 35 36 39
Y: 18 22 23 24 25 26 28 29 30 32
Solution:

  x     y     x²     y²     xy
 23    18    529    324    414
 27    22    729    484    594
 28    23    784    529    644
 29    24    841    576    696
 30    25    900    625    750
 31    26    961    676    806
 33    28   1089    784    924
 35    29   1225    841   1015
 36    30   1296    900   1080
 39    32   1521   1024   1248
Σx = 311   Σy = 257   Σx² = 9875   Σy² = 6763   Σxy = 8171
Now
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
$$r = \frac{10 \times 8171 - 311 \times 257}{\sqrt{10 \times 9875 - (311)^2}\,\sqrt{10 \times 6763 - (257)^2}}$$
$$r = \frac{81710 - 79927}{\sqrt{98750 - 96721}\,\sqrt{67630 - 66049}} = \frac{1783}{\sqrt{2029 \times 1581}}$$
$$r = \frac{1783}{1791.047} = 0.9955$$
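The column sums and the value of $r$ for this data set can be reproduced with a short script. This is only a sketch of the same sum-based formula; the variable names are chosen for the example.

```python
import math

x = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
y = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = [n Sxy - Sx Sy] / [sqrt(n Sx2 - Sx^2) * sqrt(n Sy2 - Sy^2)]
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(n * sum_x2 - sum_x**2) * math.sqrt(n * sum_y2 - sum_y**2)
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 4))
```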
6. Calculate the correlation coefficient from the following data:

X: 100 200 300 400 500
Y: 30 40 50 60 70
Solution: Karl Pearson's correlation coefficient, using $u = \dfrac{x - 300}{100}$ and $v = \dfrac{y - 50}{10}$:

   X     Y     u     v    u²    v²    uv
 100    30    -2    -2     4     4     4
 200    40    -1    -1     1     1     1
 300    50     0     0     0     0     0
 400    60     1     1     1     1     1
 500    70     2     2     4     4     4
Total          0     0    10    10    10

$n = 5$
$$r_{x,y} = r_{u,v} = \frac{n\sum uv - \sum u \sum v}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}$$
$$= \frac{5(10) - (0)(0)}{\sqrt{5(10) - (0)^2}\,\sqrt{5(10) - (0)^2}} = \frac{50}{50} = 1$$
Correlation coefficient = 1.

7. Compute Spearman's rank correlation coefficient from the following data.

X: 18 20 34 52 12
Y: 39 23 35 18 46
Solution: Let $R$ be Spearman's rank correlation coefficient of $x$ and $y$.
We have
$$R = 1 - \frac{6\sum d_1^2}{N^3 - N}$$
where $N$ = number of observations. Here $N = 5$.
Consider the following table (rank 1 is given to the largest value):

  X     Y    R_x   R_y   d₁ = R_x - R_y   d₁²
 18    39     4     2          2            4
 20    23     3     4         -1            1
 34    35     2     3         -1            1
 52    18     1     5         -4           16
 12    46     5     1          4           16
                                    Σd₁² = 38
We have
$$R = 1 - \frac{6(38)}{5^3 - 5} = 1 - \frac{6(38)}{125 - 5} = 1 - \frac{228}{120}$$
$R = -0.9$
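The ranking and the formula above can be sketched in code as follows. The helper names are hypothetical, and the ranking function assumes there are no tied values, as in this data set.

```python
def ranks_desc(values):
    """Rank 1 goes to the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_R(xs, ys):
    """Spearman's R = 1 - 6 * sum(d^2) / (N^3 - N), valid when there are no ties."""
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n**3 - n)

X = [18, 20, 34, 52, 12]
Y = [39, 23, 35, 18, 46]

R = spearman_R(X, Y)
print(ranks_desc(X), ranks_desc(Y), round(R, 4))
```

With tied values the plain formula no longer applies, and a correction term for the ties is needed.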
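8. Calculate $R$ and $r$ from the following data.
X: 12 17 22 27 32
Y: 113 119 117 115 121
Interpret your result.
Solution: The values of $R$ and $r$ come out to be equal. Ranking both series in the same order gives $\sum d^2 = 8$, so $R = 1 - \dfrac{6(8)}{120} = 0.6$. With $u = \dfrac{x - 22}{5}$ and $v = y - 117$ we get $\sum uv = 12$, $\sum u^2 = 10$, $\sum v^2 = 40$, so $r = \dfrac{12}{\sqrt{10 \times 40}} = 0.6$. Thus $R = r = 0.6$, which indicates a moderate positive correlation.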

REGRESSION
1. Definition: Regression is a method of estimating the value of one variable when that of the
other is known and the variables are correlated.
2. Types of line of regression: There are two types, as follows.
(a) Line of regression of $y$ on $x$, which is given as
$y = a + bx$
(b) Line of regression of $x$ on $y$, which is given as
$x = a + by$
3. Methods of obtaining the regression line:
(a) Method of scatter diagram: In this method we plot a graph in which one
variable is plotted on the X-axis and the other variable is plotted on the Y-axis.
If the variables are perfectly correlated, the plotted points lie on a straight line.
Example: Given the following pairs of values of the variables $x$ and $y$,
X: 1 2 3 4
Y: 2 2 3 4
plot the points on a graph and draw the line of regression.

(b) Method of least squares: This method is used in two forms, as follows.
(i) By using summation: Suppose we want to derive the regression of $y$ on $x$,
which is given as
$y = a + bx$
We then have the following two summation equations:

$\sum y = na + b\sum x$

$\sum xy = a\sum x + b\sum x^2$

Using the given data we find the required sums and substitute them; solving, we get the
constants $a$ and $b$. Similarly, the regression of $x$ on $y$ is given as
$x = a + by$
and the summation equations are

$\sum x = na + b\sum y$

$\sum xy = a\sum y + b\sum y^2$
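(ii) By using $b_{xy}$ and $b_{yx}$:

$b_{xy}$ = slope of the line $x = a + by$ (regression of $x$ on $y$)
$b_{yx}$ = slope of the line $y = a + bx$ (regression of $y$ on $x$)
$b_{yx}$ is given as
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$$
Also,
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum y^2}{n} - \left(\dfrac{\sum y}{n}\right)^2}$$
where $r$ = coefficient of correlation
$\sigma_x$ = S.D. of $x$
$\sigma_y$ = S.D. of $y$
The lines of regression in terms of $b_{xy}$ and $b_{yx}$ are given as
$x - \bar{x} = b_{xy}(y - \bar{y})$ (regression of $x$ on $y$)
$y - \bar{y} = b_{yx}(x - \bar{x})$ (regression of $y$ on $x$)
where $\bar{x}$ = mean of $x$; $\bar{y}$ = mean of $y$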
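4. Coefficient of correlation ($r$) in terms of the regression coefficients: It is given as
$$r = \pm\sqrt{b_{xy} \times b_{yx}}$$
Case 1) When $b_{xy}, b_{yx} > 0$, $r$ is positive and $0 < r \le 1$.
Case 2) When $b_{xy}, b_{yx} < 0$, $r$ is negative and $-1 \le r < 0$.
5. Acute angle between the lines of regression:
Let $\theta$ be the angle between the regression lines; then it is given as
$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$
(for the acute angle, $|r|$ is used in place of $r$ when $r$ is negative).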

6. Choice of regression line: To estimate the value of one variable when the other is
known, we have to select the appropriate line of regression. Suppose we have two lines of
regression as follows:
$y = a + bx$, and $x = a + by$
and we want to estimate $y$ for $x = 20$.
The following procedure is to be followed.
Step (i) Find the slope of the 1st equation and denote it as $b_{yx}$; similarly, for the 2nd
equation, denote it as $b_{xy}$.
Step (ii) Find $r^2 = b_{yx} \times b_{xy}$.
(a) If this value lies between 0 and 1, the assumed pairing of the lines of regression is
correct, and to estimate $y$ we should use equation 1.
(b) If this value does not lie between 0 and 1, the assumed pairing is not correct, and to
estimate $y$ we have to reverse the roles of the two equations.
SOLVED PROBLEM
1. The following data give the marks obtained by 5 students in two short examinations
in mathematics.
Marks in the 1st exam (x): 6 2 10 4 8
Marks in the 2nd exam (y): 9 11 5 8 7
Determine the line of regression of $y$ on $x$.
Solution: The line of regression of $y$ on $x$ is as follows:
$y = a + bx \quad \cdots (1)$
The summation equations are

$\sum y = na + b\sum x \quad \cdots (2)$

$\sum xy = a\sum x + b\sum x^2 \quad \cdots (3)$

From the above data, we draw up the table:

Sr. no.    x     y     x²    xy
   1       6     9     36    54
   2       2    11      4    22
   3      10     5    100    50
   4       4     8     16    32
   5       8     7     64    56

 n = 5   Σx = 30   Σy = 40   Σx² = 220   Σxy = 214

Therefore equations (2) and (3) become

$40 = 5a + 30b$
$214 = 30a + 220b$

Solving the above two equations, we get
$a = 11.9$
$b = -0.65$
∴ $y = 11.9 - 0.65x$
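The two summation equations can be solved mechanically. The sketch below uses Cramer's rule on the 2 × 2 system; `fit_y_on_x` is a hypothetical helper name introduced for this example.

```python
def fit_y_on_x(xs, ys):
    """Solve  Sy = n*a + b*Sx  and  Sxy = a*Sx + b*Sx2  for (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sx2 = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sx2 - sx * sx            # determinant of the 2x2 system
    a = (sy * sx2 - sx * sxy) / det    # intercept
    b = (n * sxy - sx * sy) / det      # slope
    return a, b

x = [6, 2, 10, 4, 8]   # marks in the 1st exam
y = [9, 11, 5, 8, 7]   # marks in the 2nd exam

a, b = fit_y_on_x(x, y)
print(round(a, 2), round(b, 2))
```

For the regression of $x$ on $y$, the same function can be called with the two lists swapped.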
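2. The following data regarding the heights and weights of 100 college students are
given. Calculate the regression coefficients, the coefficient of correlation and the line of regression of $y$ on $x$.
Solution: The given data are

$\sum x = 15000$; $\sum y = 6800$; $\sum x^2 = 2272500$; $\sum y^2 = 463025$; $\sum xy = 1022250$

Now
$$b_{xy} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum y^2}{n} - \left(\dfrac{\sum y}{n}\right)^2} = \frac{\dfrac{1022250}{100} - \dfrac{15000}{100}\cdot\dfrac{6800}{100}}{\dfrac{463025}{100} - \left(\dfrac{6800}{100}\right)^2} = \frac{10222.5 - 10200}{4630.25 - 4624}$$
∴ $b_{xy} = 3.6$
$$b_{yx} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2} = \frac{\dfrac{1022250}{100} - \dfrac{15000}{100}\cdot\dfrac{6800}{100}}{\dfrac{2272500}{100} - \left(\dfrac{15000}{100}\right)^2} = \frac{22.5}{225}$$
$b_{yx} = 0.1$

$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{3.6 \times 0.1} = \sqrt{0.36} = 0.6$

Now the equation of the line of regression of $y$ on $x$ is
$(y - \bar{y}) = b_{yx}(x - \bar{x})$
$\bar{y} = \dfrac{\sum y}{n} = \dfrac{6800}{100} = 68$; $\bar{x} = \dfrac{\sum x}{n} = \dfrac{15000}{100} = 150$
$(y - 68) = 0.1(x - 150)$
$y - 68 = 0.1x - 15$
$y = 0.1x + 53$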
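The arithmetic of this problem can be reproduced directly from the given sums. A minimal sketch (the variable names are chosen for the example):

```python
import math

n = 100
sum_x, sum_y = 15000, 6800
sum_x2, sum_y2, sum_xy = 2272500, 463025, 1022250

x_mean, y_mean = sum_x / n, sum_y / n
cov_xy = sum_xy / n - x_mean * y_mean   # 22.5
var_x = sum_x2 / n - x_mean**2          # 225
var_y = sum_y2 / n - y_mean**2          # 6.25

b_xy = cov_xy / var_y                   # regression coefficient of x on y
b_yx = cov_xy / var_x                   # regression coefficient of y on x
# r takes the common sign of the two regression coefficients
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

# Line of y on x:  y = intercept + b_yx * x
intercept = y_mean - b_yx * x_mean
print(b_xy, b_yx, r, intercept)
```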
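3. Find the angle between the lines of regression using the following data:
$N = 10$, $\sum x = 270$, $\sum y = 630$, $\sigma_x = 4$, $\sigma_y = 5$, $r_{xy} = 0.6$
Solution: Let $\theta$ be the angle between the lines of regression, so
$$\tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} = \frac{1 - (0.6)^2}{0.6}\cdot\frac{4 \times 5}{16 + 25}$$
$\tan\theta = 0.52$
$\theta = \tan^{-1}(0.52) \approx 27.5°$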
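The same substitution, written as a small script. This is a sketch; `abs(r)` is used so that the acute angle is returned even when $r$ is negative.

```python
import math

r, sigma_x, sigma_y = 0.6, 4.0, 5.0

# tan(theta) = ((1 - r^2) / |r|) * sigma_x * sigma_y / (sigma_x^2 + sigma_y^2)
tan_theta = ((1 - r**2) / abs(r)) * (sigma_x * sigma_y) / (sigma_x**2 + sigma_y**2)
theta_deg = math.degrees(math.atan(tan_theta))

print(round(tan_theta, 2), round(theta_deg, 1))
```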

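4. State true or false with justification: If the two lines of regression are $x + 3y - 5 = 0$
and $4x + 3y - 8 = 0$, then the correlation coefficient is $+0.5$.
Solution: $x + 3y - 5 = 0$
∴ $3y = 5 - x$
∴ $y = -\dfrac{1}{3}x + \dfrac{5}{3} \quad \cdots (1)$
And $4x + 3y - 8 = 0$
∴ $3y = -4x + 8$
∴ $y = -\dfrac{4}{3}x + \dfrac{8}{3} \quad \cdots (2)$
Let $b_1 = -\dfrac{1}{3}$ and $b_2 = -\dfrac{4}{3}$
Since $|b_1| < |b_2|$, $b_{yx} = b_1 = -\dfrac{1}{3}$
$b_{xy} = \dfrac{1}{b_2} = -\dfrac{3}{4}$
Hence equation (1) is the regression equation of $y$ on $x$ and equation (2) is the regression
equation of $x$ on $y$.
$$\therefore r = \pm\sqrt{b_{yx}\,b_{xy}} = \pm\sqrt{\left(-\frac{1}{3}\right)\times\left(-\frac{3}{4}\right)} = \pm\sqrt{\frac{1}{4}} = \pm\frac{1}{2} = \pm 0.5$$
Since both regression coefficients are negative, $r$ is negative, so $r = -0.5$. Hence the statement is false.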
5. Obtain the equation of the line of regression of cost on age from the following table
giving the age of cars of a certain make and the annual maintenance cost. Also
find the maintenance cost if the age of the car is 9 years.

Age of the car (in years), x:          2    4    6     8
Maintenance cost (in thousands), y:    5    7    8.5   11
Solution: Let $a = 5$, $b = 8$, $c = 1$
Here $n = 4$

  X     Y     u = x - 5   v = y - 8    u²     uv
  2     5        -3          -3         9      9
  4     7        -1          -1         1      1
  6     8.5       1           0.5       1      0.5
  8    11         3           3         9      9
  Σ               0          -0.5      20     19.5

$\bar{x} = a + c\bar{u} = a + c\,\dfrac{\sum u}{n} = 5 + 1 \times \dfrac{0}{4} = 5$
$\bar{y} = b + c\bar{v} = b + c\,\dfrac{\sum v}{n} = 8 + 1 \times \dfrac{-0.5}{4} = 7.875$
$$b_{yx} = b_{vu} = \frac{n\sum uv - \sum u\sum v}{n\sum u^2 - (\sum u)^2} = \frac{4(19.5) - 0(-0.5)}{4(20) - (0)^2} = 0.975$$
∴ Regression equation of $y$ on $x$ is

$y - \bar{y} = b_{yx}(x - \bar{x})$
∴ $y - 7.875 = 0.975(x - 5)$
$y = 0.975x + 3$
When $x$ is 9 years,
∴ $y = 0.975(9) + 3 = 11.775$
∴ Maintenance cost for a car 9 years old is $11.775 \times 1000 = 11775$ units.

6. It is given that the means of $x$ and $y$ are 5 and 10. If the line of regression of $y$ on $x$
is parallel to the line $20y = 9x + 40$, estimate the value of $y$ for $x = 30$.
Solution: Given means of $x$ and $y$ are 5 and 10.
∴ $\bar{x} = 5$; $\bar{y} = 10$
Given line is $20y = 9x + 40$
∴ $y = \dfrac{9}{20}x + \dfrac{40}{20}$
Slope of the above line $m_1 = \dfrac{9}{20}$
Slope of the line of regression of $y$ on $x$, $m_2 = b_{yx}$
Since the two lines are parallel, $m_1 = m_2$
∴ $b_{yx} = \dfrac{9}{20}$
∴ Regression equation of $y$ on $x$ is $y - \bar{y} = b_{yx}(x - \bar{x})$
∴ $y - 10 = \dfrac{9}{20}(x - 5)$
∴ $20y - 200 = 9x - 45$
∴ $20y = 9x + 155$
When $x = 30$,
∴ $20y = 9(30) + 155 = 425$
∴ $y = 21.25$

Estimated value of $y$ for $x = 30$ is 21.25.



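7. The regression lines of a sample are $x + 6y = 6$ and $3x + 2y = 10$. Find (i) the means
of $x$ and $y$ and (ii) the coefficient of correlation between $x$ and $y$.
Solution: (i) The given regression lines are
$x + 6y - 6 = 0$; $3x + 2y - 10 = 0$
Since they pass through the point $(\bar{x}, \bar{y})$, we get
$\bar{x} + 6\bar{y} = 6 \quad \cdots (1)$
$3\bar{x} + 2\bar{y} = 10 \quad \cdots (2)$
$3 \times$ Equation (1) $-$ Equation (2):
$3\bar{x} + 18\bar{y} = 18$
$3\bar{x} + 2\bar{y} = 10$
Subtracting,
$16\bar{y} = 8$
$\bar{y} = \dfrac{8}{16} = \dfrac{1}{2}$
Put $\bar{y} = \dfrac{1}{2}$ in equation (1):
$\bar{x} + 6 \times \dfrac{1}{2} = 6$
$\bar{x} + 3 = 6$
$\bar{x} = 3$
Means of $x$ and $y$ are $\bar{x} = 3$ and $\bar{y} = \dfrac{1}{2}$.

(ii) Suppose first that $x + 6y = 6$ is the line of $x$ on $y$: $x = -6y + 6$, ∴ $b_{xy} = -6$,
and that $3x + 2y = 10$ is the line of $y$ on $x$: $2y = 10 - 3x \Rightarrow y = 5 - \dfrac{3}{2}x$, ∴ $b_{yx} = -\dfrac{3}{2}$.
Then $b_{xy}\,b_{yx} = (-6) \times \left(-\dfrac{3}{2}\right) = 9 > 1$, which is impossible since $r^2 \le 1$. Hence the roles of the two lines must be reversed.
Taking $x + 6y = 6$ as the line of $y$ on $x$: $y = 1 - \dfrac{1}{6}x$, ∴ $b_{yx} = -\dfrac{1}{6}$
Taking $3x + 2y = 10$ as the line of $x$ on $y$: $x = \dfrac{10}{3} - \dfrac{2}{3}y$, ∴ $b_{xy} = -\dfrac{2}{3}$
Coefficient of correlation between $x$ and $y$ is
$$r = \pm\sqrt{b_{xy}\,b_{yx}} = \pm\sqrt{\left(-\frac{2}{3}\right)\times\left(-\frac{1}{6}\right)} = \pm\sqrt{\frac{1}{9}} = \pm\frac{1}{3}$$
Since both regression coefficients are negative, $r = -\dfrac{1}{3}$.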
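The procedure used here (intersect the lines to get the means, then choose the pairing for which $b_{yx} b_{xy} \le 1$) can be sketched as below. The function name and the `(p, q, c)` representation of a line $px + qy = c$ are assumptions made for the example; the sketch also assumes all coefficients are non-zero and that the two slopes have the same sign.

```python
import math

def analyse_regression_lines(line1, line2):
    """Each line is (p, q, c), meaning p*x + q*y = c.
    Returns (x_mean, y_mean, r)."""
    p1, q1, c1 = line1
    p2, q2, c2 = line2

    # The lines intersect at (x_mean, y_mean): solve by Cramer's rule
    det = p1 * q2 - p2 * q1
    x_mean = (c1 * q2 - c2 * q1) / det
    y_mean = (p1 * c2 - p2 * c1) / det

    # First try: line1 is y on x, line2 is x on y
    b_yx = -p1 / q1
    b_xy = -q2 / p2
    if b_yx * b_xy > 1:
        # r^2 > 1 is impossible, so reverse the roles of the two lines
        b_yx = -p2 / q2
        b_xy = -q1 / p1

    # r takes the common sign of the regression coefficients
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return x_mean, y_mean, r

# Lines x + 6y = 6 and 3x + 2y = 10
x_mean, y_mean, r = analyse_regression_lines((1, 6, 6), (3, 2, 10))
print(x_mean, y_mean, round(r, 4))
```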

PROBLEMS
1. You are given the following data:

                       X     Y
Arithmetic mean       36    85
Standard deviation    11     8

Correlation coefficient between X and Y = 0.66


(a) Find the two regression equations.
(b) Estimate the value of 'X' when 'Y' = 75.
2. Prove that : 𝜎3𝑥 2 − 6𝑦 = 𝜎 2 𝑦 − 2𝛾𝜎𝑥 𝜎𝑦
3. If the tangent of the angle made by the line of regression of y on x is 0.6 and
𝜎𝑦 = 2𝜎𝑥 . Find the correlation between x and y
4. Fit a second degree parabolic curve to the following data :

X: 1 2 3 4 5 6 7 8 9

Y: 2 6 7 8 10 11 11 10 9

5. Given $6Y = 5X + 90$, $15X = 8Y + 130$, $\sigma_x^2 = 16$. Find (i) $\bar{x}$ and $\bar{y}$ (ii) $r$ and $\sigma_y^2$.


6. If the tangent of the angle made by the line of regression of y on x is 0.6 and
𝜎𝑦 = 2𝜎𝑥 . Find the correlation between x and y .
7. Obtain the rank correlation coefficient from the following data:
X : 10, 12, 18, 18, 15, 40
Y : 12, 18, 25, 25, 50, 25
8. Out of consignment of 1, 00, 000 tennis balls, 400 were selected at random and
examined. It was found that 20 of these were defective. How many defective balls
you can reasonably expect to have in the whole consignment at 95% confidence
level?
9. From the following data calculate Spearman's rank correlation between x and y:
X : 36, 56, 20, 42, 33, 44, 50, 15, 60
Y : 50, 35, 70, 58, 75, 60, 45, 80, 38
10. Fit a straight line to the following data :

Year (x) 1951 1961 1971 1981 1991

Production (y) (000 tons) 10 12 8 10 13

Also estimate the production in 1987.



11. The following marks have been obtained by a class of students in Stats (out of 100):

Paper I 45 55 56 58 60 65 68 70 75 80 85

Paper II 56 50 48 60 62 64 65 70 74 82 90

Find the equations of lines of regression.


12. Fit a straight line to the following data :

x 1 2 3 4 5 6
y 49 54 60 73 80 86

13. The following data gave the growth of employment in lakhs in organized sector in
India between 1988 and 1995 :

Year 1988 1989 1990 1991 1992 1993 1994 1995


Public Sector    98  101  104  107  113  120  125  128
Private Sector   65   65   67   68   68   69   68   68

Find the correlation coefficient between the employment in public sectors and
private sectors and give your comments.
14. Two Judges in a beauty contest gave the following marks out of 50 to 9 contestants:

Judge A 20 25 22 27 23 26 34 24 32
Judge B 30 42 45 46 33 34 40 35 39

Do the two judges appear to agree in their standards? When will the agreement be
complete?
15. Show that the second degree curve fitting the following data is given by 𝑣 = 3 +
0.85𝑢 − 0.27𝑢2 where 𝑢 = 𝑥 − 5, 𝑣 = 𝑦 − 7. Also find 𝑦 𝑤ℎ𝑒𝑛 𝑥 = 10.

x 1 2 3 4 5 6 7 8 9

y 2 6 7 8 10 11 11 10 9

16. From the following data find the equation of line of regression of y on x and
estimate the most probable value of y when x = 80.

x 89 86 74 65 64 64 66 67 72 79
y 92 91 84 75 73 72 71 75 78 84

17. While calculating the correlation coefficient between x and y, the following constants were
obtained: $N = 25$, $\sum x = 125$, $\sum y = 100$, $\sum x^2 = 650$, $\sum y^2 = 460$, $\sum xy = 508$.
It was later discovered that two pairs had been recorded as x = 6, y = 14 and x = 8, y = 6,
while the correct values were x = 8, y = 12 and x = 6, y = 8. Calculate the correct
correlation coefficient.

18. The following table shows the heights of a sample of 12 fathers and their sons. Find
the rank correlation coefficient.
x 65 63 67 64 68 62 70 66 68 67 69 71
y 68 66 68 65 69 66 68 65 71 67 68 80

19. The following marks have been obtained by a class of students in stats (out of 100):
Paper I 45 55 56 58 60 65 68 70 75 80 85
Paper II 56 50 48 60 62 64 65 70 74 82 90

Compute the coefficient of correlation for the above data. Find also the equations of
lines of regression.
20. If $x = au + b$, $y = cv + d$, where $a, b, c, d$ are constants, then prove that $r_{xy} = r_{uv}$, where
$r_{xy}$ is the coefficient of correlation between x and y.
21. Fit a second degree curve for the following data:
x 1 2 3 4 5
y 1250 1400 1650 1950 2300

22. Find the coefficient of correlation for the following data:


x 2 4 5 6 8 11
Y 18 12 10 8 7 5

23. Show that 𝑅 = 𝑟 for the following data:


x 60 62 64 66 68 70 72 74
Y 92 83 101 110 128 119 137 146

24. Fit a second degree parabola to following data and estimate the value of y for x = 6
X: 1 2 3 4 5
Y: 25 28 33 39 46

25. The following data give the age of cars of a certain make and the annual maintenance
cost. Obtain the equation of the line of regression of cost on age.
Age (year): 2 4 6 8
Maintenance cost: 10 20 25 30
26. Find the equation of the line of regression for the following data
X: 5 6 7 8 9 10 11
Y: 11 14 14 15 12 17 16
Also find r.

27. Find the two equations of the lines of regression and estimate the value of y for x = 7 from
the following data
X: 0 1 2 3 4 5 6
Y: 5 9 8 10 11 9 11
28. Given the following result of weight and height of 10000 student
𝑥 = 150𝑦 = 68 inch𝜎𝑥 = 𝜎𝑥𝜎𝑦 = 𝜎𝑦 r = 0.6
29. The following results regarding 100 college students are given:

$\sum x = 15000$; $\sum y = 6800$; $\sum x^2 = 2272500$; $\sum xy = 1022250$; $\sum y^2 = 463025$

30. The two line of Regression are 6y = 5 x + 90 and 15x =8y +130 estimate y for x =
60
31. The regression lines of a sample are x + 6y = 6 and 3x + 2y = 10. Find y when x = 12.
32. Find the angle between the line of regression using the following data
N=10 𝑥 = 270 𝑦 = 630 ,𝜎𝑦 = 5, 𝑟𝑥𝑦 = 0.
33. If 𝜎𝑥 = 𝜎𝑦 = 𝜎 and the angle between the equation of regression line is ta𝑛−1 .Find
the coefficient of Regression
34. The equation of two line of regression are x = 19.13 -0.874 and y = 11.64 – 0.5x
(a) Find mean of 𝑥 and 𝑦
(b) The coefficient of correlation between 𝑥 and 𝑦
35. Calculate the rank of correlation coefficient from the following data.
Rank in English ∶ 1, 3, 7, 5, 4, 6, 2, 10, 9, 8
Rank in Statistics: 3, 1, 4, 5, 6, 9, 7, 8, 10, 2
Ans: $R = 1 - \dfrac{6\left[\sum d^2 + \dfrac{1}{12}(m_1^3 - m_1) + \dfrac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n^3 - n}$

36. Obtain the rank correlation coefficient from the following data.
𝑋 ∶ 10, 12, 18, 18, 15, 40.
𝑌 ∶ 12, 18, 25, 25, 50, 25.
Ans: 1 − 0.4571 = 0.5429
37. (a) Let 𝑟𝑥𝑦 = 0.4, 𝑐𝑜𝑣 𝑥, 𝑦 = 1.6, 𝜎𝑦 2 = 25. Find r.
(b) If $R_{xy} = 0.143$ and the sum of the squares of the differences between the ranks
is 48, find $N$.
Ans: 𝑁 = 7, Other roots of N are imaginary.
