Correlation and Regression
Instructor
Prof. Gopal Krishna Panda
Department of Mathematics
NIT Rourkela
Correlation
In the last chapter, we discussed the joint distribution of two random variables 𝑋 and 𝑌. Such random variables may be independent or dependent. If 𝑋 and 𝑌 are dependent, there exists some sort of relation between them, and correlation deals with the amount of dependence between 𝑋 and 𝑌. In our discussion, we consider only the extent of linearity between 𝑋 and 𝑌. It is measured by the correlation coefficient 𝑟 defined by
r = Cov(X, Y) / (σ_X · σ_Y)
where 𝜎𝑋 and 𝜎𝑌 are standard deviations of 𝑋 and 𝑌 and 𝐶𝑜𝑣 𝑋, 𝑌 is the covariance between 𝑋
and 𝑌. We will see that 𝑟 measures the amount of linear dependence between 𝑋 and 𝑌. Recall that
σ_X² = E[(X − μ_X)²],    σ_Y² = E[(Y − μ_Y)²]
and
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)].
Example 1: Let 𝑋 and 𝑌 be random variables with joint pmf 𝑓 such that 𝑓(0,0) = 0.1, 𝑓(0,1) = 0.2, 𝑓(1,0) = 0.3 and 𝑓(1,1) = 0.4. Find the correlation coefficient between 𝑋 and 𝑌.
Ans. To find the correlation coefficient 𝑟, we need to calculate the variances of 𝑋 and 𝑌 and the covariance between 𝑋 and 𝑌. In tabular form, the joint distribution of 𝑋 and 𝑌 and the marginal distributions of 𝑋 and 𝑌 are as follows:
x↓ y→     0      1      f₁(x)
0         0.1    0.2    0.3
1         0.3    0.4    0.7
f₂(y)     0.4    0.6

From the table, μ_X = 0.7, μ_Y = 0.6 and E(XY) = 0.4, so that

σ_X² = E(X²) − μ_X² = 0.7 − 0.49 = 0.21,    σ_Y² = E(Y²) − μ_Y² = 0.6 − 0.36 = 0.24,
Cov(X, Y) = E(XY) − μ_X μ_Y = 0.4 − 0.42 = −0.02.

Hence,

r = Cov(X, Y)/(σ_X · σ_Y) = −0.02/√(0.21 × 0.24) = −0.0891.
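These computations can be verified with a short Python sketch (the dictionary-based representation of the pmf and the variable names are illustrative choices, not part of the example):

```python
from math import sqrt

# Joint pmf of (X, Y) from Example 1.
f = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Moments computed directly from the pmf.
EX  = sum(x * p for (x, y), p in f.items())        # 0.7
EY  = sum(y * p for (x, y), p in f.items())        # 0.6
EXY = sum(x * y * p for (x, y), p in f.items())    # 0.4
EX2 = sum(x * x * p for (x, y), p in f.items())
EY2 = sum(y * y * p for (x, y), p in f.items())

cov   = EXY - EX * EY          # Cov(X, Y) = E(XY) - E(X)E(Y) = -0.02
var_x = EX2 - EX ** 2          # sigma_X^2 = 0.21
var_y = EY2 - EY ** 2          # sigma_Y^2 = 0.24
r = cov / sqrt(var_x * var_y)
print(round(r, 4))             # -0.0891
```

The same pattern works for any finite joint pmf: every moment is a single weighted sum over the support.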
Example 2: If (𝑋, 𝑌) has the pdf 𝑓(𝑥, 𝑦) = 1/32 if 𝑥 ≥ 0, 𝑦 ≥ 0, 𝑥 + 𝑦 ≤ 8 and 𝑓(𝑥, 𝑦) = 0 otherwise, find 𝑟.
Ans. It is easy to see that the marginal densities of 𝑋 and 𝑌 are
f₁(x) = (8 − x)/32 if 0 ≤ x ≤ 8, and f₁(x) = 0 otherwise,
and
f₂(y) = (8 − y)/32 if 0 ≤ y ≤ 8, and f₂(y) = 0 otherwise.
Hence,
μ_X = ∫₋∞^∞ x f₁(x) dx = ∫₀⁸ x · (8 − x)/32 dx = 8/3,    μ_Y = 8/3,

E(X²) = ∫₋∞^∞ x² f₁(x) dx = ∫₀⁸ x² · (8 − x)/32 dx = 32/3,    E(Y²) = 32/3,

E(XY) = ∫₋∞^∞ ∫₋∞^∞ xy f(x, y) dx dy = (1/32) ∫₀⁸ ∫₀^(8−y) xy dx dy
      = (1/32) ∫₀⁸ y [ ∫₀^(8−y) x dx ] dy = (1/64) ∫₀⁸ y (8 − y)² dy = 16/3.
Hence,
σ_X² = E(X²) − μ_X² = 32/3 − (8/3)² = 32/9,    σ_Y² = 32/9,

Cov(X, Y) = E(XY) − μ_X μ_Y = 16/3 − (8/3)(8/3) = −16/9.
Hence,
r = Cov(X, Y)/(σ_X · σ_Y) = (−16/9)/√((32/9) × (32/9)) = (−16/9)/(32/9) = −1/2.
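As a numerical cross-check of Example 2, the three one-dimensional integrals worked out above can be approximated with a simple midpoint rule (the helper `midpoint` and the grid size are illustrative choices):

```python
# Midpoint-rule check of the integrals in Example 2.
def midpoint(g, a, b, n=10000):
    """Approximate the integral of g over [a, b] by the composite midpoint rule."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

EX  = midpoint(lambda x: x * (8 - x) / 32, 0, 8)       # mu_X   = 8/3
EX2 = midpoint(lambda x: x * x * (8 - x) / 32, 0, 8)   # E(X^2) = 32/3
EXY = midpoint(lambda y: y * (8 - y) ** 2 / 64, 0, 8)  # E(XY)  = 16/3

var_x = EX2 - EX ** 2     # sigma_X^2 = 32/9 (= sigma_Y^2 by symmetry)
cov   = EXY - EX ** 2     # mu_Y = mu_X by symmetry
r     = cov / var_x       # sigma_X * sigma_Y = sigma_X^2 here
print(round(r, 4))        # -0.5
```

The integrands are the very expressions obtained after the inner integration above, so the check mirrors the worked derivation step by step.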
Limits of the correlation coefficient
Theorem 1: −1 ≤ 𝑟 ≤ 1.
Proof: If 𝑍 is any random variable, then E(Z²) ≥ 0. In particular,

E[( (X − μ_X)/σ_X ± (Y − μ_Y)/σ_Y )²] ≥ 0

⇒ E[((X − μ_X)/σ_X)²] + E[((Y − μ_Y)/σ_Y)²] ± 2 E[(X − μ_X)(Y − μ_Y)/(σ_X σ_Y)] ≥ 0

⇒ (1/σ_X²) E[(X − μ_X)²] + (1/σ_Y²) E[(Y − μ_Y)²] ± 2 × Cov(X, Y)/(σ_X · σ_Y) ≥ 0

⇒ (1/σ_X²) × σ_X² + (1/σ_Y²) × σ_Y² ± 2 × Cov(X, Y)/(σ_X · σ_Y) ≥ 0

⇒ 1 + 1 ± 2r ≥ 0 ⇒ 1 + r ≥ 0 and 1 − r ≥ 0.

Thus,

r ≥ −1 and r ≤ 1 ⇒ −1 ≤ r ≤ 1.
Observe that 𝒓 = 𝟏 or 𝒓 = −𝟏 according as

E[( (X − μ_X)/σ_X − (Y − μ_Y)/σ_Y )²] = 0  or  E[( (X − μ_X)/σ_X + (Y − μ_Y)/σ_Y )²] = 0.

In particular, suppose 𝑌 = 𝑚𝑋 + 𝑐 for some constants 𝑚 ≠ 0 and 𝑐, so that μ_Y = mμ_X + c. Then

σ_Y² = E[(Y − μ_Y)²] = E[(mX + c − (mμ_X + c))²] = E[(mX − mμ_X)²] = E[m²(X − μ_X)²]
     = m² E[(X − μ_X)²] = m² σ_X² ⇒ σ_Y = |m| σ_X,

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[(X − μ_X)(mX − mμ_X)] = E[m(X − μ_X)(X − μ_X)]
     = m E[(X − μ_X)²] = m σ_X².

Hence,

r = Cov(X, Y)/(σ_X · σ_Y) = m σ_X²/(σ_X · |m| σ_X) = m/|m|,

so that r = 1 if m > 0 and r = −1 if m < 0: a linear relation between 𝑋 and 𝑌 forces r = ±1.
Next, let 𝑋 = 𝑎 + ℎ𝑈 and 𝑌 = 𝑏 + 𝑘𝑉, where 𝑎, 𝑏, ℎ, 𝑘 are constants with ℎ > 0 and 𝑘 > 0 (a change of origin and scale). Then

μ_X = E(a + hU) = a + h E(U) = a + hμ_U,    μ_Y = E(b + kV) = b + k E(V) = b + kμ_V,

σ_X² = E[(X − μ_X)²] = E[(a + hU − (a + hμ_U))²] = E[h²(U − μ_U)²] = h² E[(U − μ_U)²] = h² σ_U² ⇒ σ_X = hσ_U.

Similarly,

σ_Y² = E[(Y − μ_Y)²] = E[(b + kV − (b + kμ_V))²] = E[k²(V − μ_V)²] = k² E[(V − μ_V)²] = k² σ_V² ⇒ σ_Y = kσ_V.

Now,

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[h(U − μ_U) · k(V − μ_V)] = hk E[(U − μ_U)(V − μ_V)] = hk Cov(U, V).

Hence,

r_{X,Y} = Cov(X, Y)/(σ_X · σ_Y) = hk Cov(U, V)/(hσ_U · kσ_V) = Cov(U, V)/(σ_U · σ_V) = r_{U,V},

that is, the correlation coefficient is invariant under a change of origin and scale.
For a sample of 𝑛 paired observations

X:  x₁  x₂  x₃  ⋯  xₙ
Y:  y₁  y₂  y₃  ⋯  yₙ

we define, in analogy with the population quantities,
σ_X² = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)² = (1/n) Σᵢ₌₁ⁿ xᵢ² − x̄²,    σ_Y² = (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)² = (1/n) Σᵢ₌₁ⁿ yᵢ² − ȳ²,

Cov(X, Y) = (1/n) Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = (1/n) Σᵢ₌₁ⁿ xᵢyᵢ − x̄ȳ
and
r = Cov(X, Y)/(σ_X · σ_Y).
Correlation coefficient for a sample
In some texts, the sample correlation coefficient is calculated slightly differently. It is defined as
r = S_{X,Y} / (S_X · S_Y)
where,
S_X² = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)² = (1/(n−1)) (Σᵢ₌₁ⁿ xᵢ² − n x̄²),

S_Y² = (1/(n−1)) Σᵢ₌₁ⁿ (yᵢ − ȳ)² = (1/(n−1)) (Σᵢ₌₁ⁿ yᵢ² − n ȳ²),

S_{X,Y} = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) = (1/(n−1)) (Σᵢ₌₁ⁿ xᵢyᵢ − n x̄ȳ).
Here, S_X², S_Y² and S_{X,Y} are respectively called the sample variance of 𝑋, the sample variance of 𝑌 and the sample covariance of 𝑋 and 𝑌. Actually, S_X², S_Y² and S_{X,Y} are the unbiased estimates of their corresponding population parameters. However, the sample correlation coefficient calculated by either formula gives the same value, since the factors of 1/(n−1) cancel. Hence, we follow the former formula.
Example 5: The following table gives the marks of 5 students in two tests out of 20 (X →
mark in first test, Y → mark in second test). Find the correlation coefficient.
𝑿 𝟕 𝟏𝟓 𝟏𝟐 𝟏𝟏 𝟗
𝒀 𝟏𝟎 𝟏𝟒 𝟖 𝟏𝟓 𝟏𝟑
𝑿𝟐 49 225 144 121 81
𝒀𝟐 100 196 64 225 169
𝑿𝒀 70 210 96 165 117
From the table,
Σ xᵢ = 54,    Σ yᵢ = 60,    Σ xᵢ² = 620,    Σ yᵢ² = 754,    Σ xᵢyᵢ = 658.
Hence,
x̄ = (1/5) Σ xᵢ = 10.8,    ȳ = (1/5) Σ yᵢ = 12,

σ_X² = (1/5) Σ xᵢ² − x̄² = (1/5) × 620 − 10.8² = 7.36,

σ_Y² = (1/5) Σ yᵢ² − ȳ² = (1/5) × 754 − 12² = 6.8,

Cov(X, Y) = (1/5) Σ xᵢyᵢ − x̄ȳ = (1/5) × 658 − 10.8 × 12 = 2.
Hence,
r = Cov(X, Y)/(σ_X · σ_Y) = 2/√(7.36 × 6.8) = 0.2827.
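The computation can be reproduced in a few lines of Python, and the sketch also checks the earlier remark that the population-style (divide by 𝑛) and sample (divide by 𝑛 − 1) formulas give the same correlation coefficient:

```python
from math import sqrt

x = [7, 15, 12, 11, 9]
y = [10, 14, 8, 15, 13]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Population-style formulas (divide by n), as in Example 5.
var_x = sum(a * a for a in x) / n - xbar ** 2                 # 7.36
var_y = sum(b * b for b in y) / n - ybar ** 2                 # 6.8
cov   = sum(a * b for a, b in zip(x, y)) / n - xbar * ybar    # 2.0
r     = cov / sqrt(var_x * var_y)

# Sample versions (divide by n - 1): the extra factors cancel in r.
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
sx2 = sum((a - xbar) ** 2 for a in x) / (n - 1)
sy2 = sum((b - ybar) ** 2 for b in y) / (n - 1)
r2  = sxy / sqrt(sx2 * sy2)

print(round(r, 4), round(r2, 4))   # 0.2827 0.2827
```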
Example 6: For each of the following two data sets, find 𝑟, and verify that in the former case 𝑟 is positive while in the latter case 𝑟 is negative.

X  1    2    3    4    5    6    7    8    9    10
Y  1    4    9    16   25   36   49   64   81   100

X  1    2    3    4    5    6    7    8    9    10
Y  100  81   64   49   36   25   16   9    4    1

Verify that for the first data set 𝑟 = 0.9746 and for the second 𝑟 = −0.9746.
In particular, suppose both 𝑋 and 𝑌 take the values 1, 2, …, 𝑛 (as happens when they are the ranks of 𝑛 individuals). Then

Σ xᵢ = Σ yᵢ = n(n + 1)/2 ⇒ x̄ = ȳ = (n + 1)/2,

Σ xᵢ² = Σ yᵢ² = n(n + 1)(2n + 1)/6 ⇒ σ_X² = σ_Y² = (n² − 1)/12.
Let 𝑑𝑖 = 𝑥𝑖 − 𝑦𝑖 , 𝑖 = 1,2, … , 𝑛. Then
(1/n) Σ dᵢ² = (1/n) Σ (xᵢ − yᵢ)² = (1/n) Σ [(xᵢ − x̄) − (yᵢ − ȳ)]²    (since x̄ = ȳ)

= (1/n) Σ (xᵢ − x̄)² + (1/n) Σ (yᵢ − ȳ)² − 2 × (1/n) Σ (xᵢ − x̄)(yᵢ − ȳ)

= σ_X² + σ_Y² − 2 Cov(X, Y) = σ_X² + σ_Y² − 2rσ_X σ_Y.

Since

σ_X² = σ_Y² = (n² − 1)/12,

it follows that

(1/n) Σ dᵢ² = σ_X² + σ_Y² − 2rσ_X σ_Y = 2 × (n² − 1)/12 − 2r × (n² − 1)/12 = (1 − r) × (n² − 1)/6

⇒ 1 − r = 6 Σ dᵢ² / (n(n² − 1)).

Hence,

r = 1 − 6 Σ dᵢ² / (n(n² − 1)),

which is known as Spearman's rank correlation coefficient.
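A small sketch confirms that, for untied ranks, this formula agrees with the direct (Pearson) computation of 𝑟 on the ranks, exactly as the derivation shows (the example rank vectors are arbitrary illustrations):

```python
from math import sqrt

def spearman(xr, yr):
    """r = 1 - 6*sum(d_i^2)/(n(n^2 - 1)) for untied ranks."""
    n = len(xr)
    d2 = sum((a - b) ** 2 for a, b in zip(xr, yr))
    return 1 - 6 * d2 / (n * (n * n - 1))

def pearson(x, y):
    """Direct correlation coefficient r = Cov(X, Y)/(sigma_X * sigma_Y)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
    sx  = sqrt(sum((a - xbar) ** 2 for a in x) / n)
    sy  = sqrt(sum((b - ybar) ** 2 for b in y) / n)
    return cov / (sx * sy)

# On untied ranks the two computations agree.
xr = [1, 2, 3, 4, 5]      # illustrative rank vectors
yr = [2, 1, 4, 5, 3]
print(round(spearman(xr, yr), 4), round(pearson(xr, yr), 4))   # 0.6 0.6
```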
Example 7: The following table gives the marks of 10 students in two tests out of 100 (X → mark
in first test, Y → mark in second test). Find the rank correlation coefficient.
Test I Marks:   45  87  55  67  97  25  75  48  52  17
Test II Marks:  65  70  78  82  81  32  67  55  60  37
Ranking the marks in each test (rank 1 for the highest mark) gives

Ranks (Test I):   8   2   5   4   1   9   3   7   6   10
Ranks (Test II):  6   4   3   1   2   10  5   8   7   9
dᵢ:               2  −2   2   3  −1  −1  −2  −1  −1   1

Thus, 𝑛 = 10 and Σ dᵢ² = 30. Hence, the rank correlation coefficient is given by

r = 1 − 6 Σ dᵢ² / (n(n² − 1)) = 1 − (6 × 30)/(10 × (100 − 1)) = 9/11.
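The ranking and the computation can be reproduced in Python (the helper `ranks` is illustrative; it assigns rank 1 to the highest mark and assumes no ties, as in this example):

```python
def ranks(marks):
    """Rank 1 for the highest mark (assumes all marks are distinct)."""
    order = sorted(marks, reverse=True)
    return [order.index(m) + 1 for m in marks]

test1 = [45, 87, 55, 67, 97, 25, 75, 48, 52, 17]
test2 = [65, 70, 78, 82, 81, 32, 67, 55, 60, 37]
rx, ry = ranks(test1), ranks(test2)

d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
n = len(test1)
r = 1 - 6 * d2 / (n * (n * n - 1))
print(d2, round(r, 4))    # 30 0.8182  (= 9/11)
```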
Repeated ranks
When two or more 𝑋 values (or 𝑌 values) are equal, each of the tied values is assigned the average of the ranks they would have received had they been different. For each such group of 𝑚 tied values, the correction factor

m(m² − 1)/12

is to be added to Σ dᵢ². The following example illustrates this correction.
Example 7: The following table gives the marks of 10 students in two tests out of 100
(X → mark in first test, Y → mark in second test). Find the rank correlation coefficient.
Test I Marks:   45  87  55  61  97  25  75  48  61  17
Test II Marks:  65  65  78  82  81  32  65  55  60  37
Thus, 𝑛 = 10 and Σ dᵢ² = 53.5. Among the 𝑥 values, the mark 61 is repeated twice, and among the 𝑦 values, the mark 65 is repeated three times. Hence, the factor to be added to Σ dᵢ² is

2(2² − 1)/12 + 3(3² − 1)/12 = 30/12 = 2.5.
Hence, the rank correlation coefficient is given by
r = 1 − 6[Σ dᵢ² + Σ m(m² − 1)/12] / (n(n² − 1)) = 1 − 6(53.5 + 2.5)/(10 × (100 − 1)) = 1 − 56/165 = 109/165.
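The tie-corrected computation can be sketched as follows (the helper `avg_ranks` is illustrative; it assigns each group of tied values the average of the ranks they would otherwise occupy):

```python
from collections import Counter

def avg_ranks(marks):
    """Rank 1 = highest mark; tied values share the average of their ranks."""
    order = sorted(marks, reverse=True)
    first = {}
    for pos, v in enumerate(order, start=1):
        first.setdefault(v, pos)               # best position of value v
    counts = Counter(marks)
    # A value occupying positions p..p+m-1 gets average rank p + (m-1)/2.
    return [first[v] + (counts[v] - 1) / 2 for v in marks]

test1 = [45, 87, 55, 61, 97, 25, 75, 48, 61, 17]
test2 = [65, 65, 78, 82, 81, 32, 65, 55, 60, 37]
rx, ry = avg_ranks(test1), avg_ranks(test2)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))          # 53.5

# Correction: m(m^2 - 1)/12 for every group of m tied values.
corr = sum(m * (m * m - 1) / 12
           for c in (Counter(test1), Counter(test2))
           for m in c.values() if m > 1)                # 2.5
n = len(test1)
r = 1 - 6 * (d2 + corr) / (n * (n * n - 1))
print(d2, corr, round(r, 4))   # 53.5 2.5 0.6606  (= 109/165)
```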
Regression
As we have discussed, the correlation coefficient between 𝑋 and 𝑌 expresses the amount of linearity between 𝑋 and 𝑌 numerically. The purpose of regression is to approximate the relationship between 𝑋 and 𝑌 by a linear one. This is achieved in two ways: estimating 𝑌 without disturbing 𝑋 so that the points (𝑋, est(𝑌)) lie on a straight line, and estimating 𝑋 without disturbing 𝑌 so that the points (est(𝑋), 𝑌) lie on a straight line.
[Figure: the vertical error 𝑦ᵢ − est(𝑦ᵢ) incurred in replacing the point (𝑥ᵢ, 𝑦ᵢ) by (𝑥ᵢ, 𝑎 + 𝑏𝑥ᵢ) is shown.] There are 𝑛 such errors which need to be minimized in some way for the best possible line.
It is clear from the figure that the error in replacing the point (𝑥ᵢ, 𝑦ᵢ) by (𝑥ᵢ, est(𝑦ᵢ)) = (𝑥ᵢ, 𝑎 + 𝑏𝑥ᵢ) is equal to 𝐸ᵢ = 𝑦ᵢ − 𝑎 − 𝑏𝑥ᵢ, 𝑖 = 1, 2, …, 𝑛. We need to minimize these errors in some way. We cannot make all the errors zero, since the points are not on a line. Since positive and negative errors of equal magnitude are equally important, we need to minimize some function of the absolute errors. Since it is difficult to minimize the sum of the absolute errors, we minimize the sum of squares of the errors,

S = Σᵢ₌₁ⁿ (yᵢ − a − bxᵢ)².
For the minimum, the partial derivatives of 𝑆 with respect to 𝑎 and 𝑏 must vanish, which gives

Σ (yᵢ − a − bxᵢ) = 0,    Σ xᵢ(yᵢ − a − bxᵢ) = 0.

These equations are known as the normal equations. On rearrangement, we get

Σᵢ₌₁ⁿ yᵢ = na + b Σᵢ₌₁ⁿ xᵢ,    Σᵢ₌₁ⁿ xᵢyᵢ = a Σᵢ₌₁ⁿ xᵢ + b Σᵢ₌₁ⁿ xᵢ².
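The normal equations can be solved directly for 𝑎 and 𝑏 by elimination; the sketch below does exactly that (the function name `fit_line` and the test data are illustrative):

```python
def fit_line(x, y):
    """Solve the normal equations for Y = a + bX:
       sum(y) = n*a + b*sum(x),  sum(xy) = a*sum(x) + b*sum(x^2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # Eliminate a: b = (n*sum(xy) - sum(x)sum(y)) / (n*sum(x^2) - sum(x)^2),
    # which is Cov(X, Y)/sigma_X^2 after dividing through by n^2.
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n          # back-substitute into the first equation
    return a, b

# Perfectly linear data is recovered exactly (illustrative check).
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # data on y = 1 + 2x
print(a, b)   # 1.0 2.0
```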
The line of regression of 𝑌 on 𝑋, that is, 𝑌 = 𝑎 + 𝑏𝑋, has been obtained by keeping 𝑋 fixed and estimating 𝑌 so that the points (𝑋, est(𝑌)) are on a straight line. We can similarly consider the problem of finding a line of the form 𝑋 = 𝑎 + 𝑏𝑌, where we keep 𝑌 fixed and estimate 𝑋 so that the points (est(𝑋), 𝑌) lie on a straight line. For this we would have to repeat the entire calculation done for fitting the line 𝑌 = 𝑎 + 𝑏𝑋. However, the result can also be obtained from the fitting we have already done, just by interchanging 𝑋 and 𝑌. In doing so, we obtain the line

X − x̄ = b_XY (Y − ȳ)
which is known as the line of regression of 𝑋 on 𝑌, or simply LR of 𝑋 on 𝑌.
The multiplier b_XY of (Y − ȳ) in

X − x̄ = b_XY (Y − ȳ)

is known as the regression coefficient of 𝑋 on 𝑌 and is equal to

b_XY = Cov(X, Y)/σ_Y² = r · σ_X/σ_Y.
• In LR of 𝑌 on 𝑋, 𝑋 is fixed and 𝑌 is estimated so that the points (𝑋, est(𝑌)) are on a straight line.
• In LR of 𝑋 on 𝑌, 𝑌 is fixed and 𝑋 is estimated so that the points (est(𝑋), 𝑌) are on a straight line.
• The LR of 𝑌 on 𝑋 and the LR of 𝑋 on 𝑌 can also be written as

Y − ȳ = b_YX (X − x̄),    Y − ȳ = (1/b_XY) (X − x̄).
• The slope of the LR of 𝑌 on 𝑋 is b_YX, the regression coefficient of 𝑌 on 𝑋, while the slope of the LR of 𝑋 on 𝑌 is 1/b_XY, the reciprocal of the regression coefficient of 𝑋 on 𝑌.
• Since the signs of b_YX, b_XY and 𝑟 are the same as the sign of Cov(X, Y), both the regression coefficients and the correlation coefficient are of the same sign. Hence, the slopes of the two regression lines are of the same sign.
• Consequently, the two regression lines are either both inclined towards the positive 𝑥-axis or both inclined towards the negative 𝑥-axis (if they are not perpendicular to each other).
• b_YX · b_XY = (r · σ_Y/σ_X) × (r · σ_X/σ_Y) = r². Hence, the correlation coefficient is numerically the geometric mean of the regression coefficients. However, mind the signs of b_YX, b_XY and 𝑟.
• The point (x̄, ȳ) is common to both the regression lines. Hence, the point of intersection of the two regression lines is (x̄, ȳ).
• The two regression lines Y − ȳ = b_YX (X − x̄) and Y − ȳ = (1/b_XY)(X − x̄) coincide if and only if b_YX = 1/b_XY, which is equivalent to r² = 1, that is, r = ±1. Thus, when the 𝑛 points are on a straight line, there is just one regression line.
• Angle of intersection of the two regression lines: let 𝜃₁ and 𝜃₂ be the angles which the LR of 𝑌 on 𝑋 and the LR of 𝑋 on 𝑌 make with the positive 𝑥-axis. Observe that 𝜃 = 𝜃₁ − 𝜃₂, 0 ≤ 𝜃 ≤ 𝜋/2, tan 𝜃₁ = b_YX = r · σ_Y/σ_X and tan 𝜃₂ = 1/b_XY = (1/r) · σ_Y/σ_X. Hence,

tan θ = tan(θ₁ − θ₂) = (tan θ₁ − tan θ₂)/(1 + tan θ₁ · tan θ₂)

= [ (r − 1/r) · σ_Y/σ_X ] / (1 + σ_Y²/σ_X²) = σ_X σ_Y/(σ_X² + σ_Y²) · (r − 1/r).

Hence, the angle of intersection of the two regression lines is

θ = tan⁻¹[ σ_X σ_Y/(σ_X² + σ_Y²) · (r − 1/r) ].
• Case I: 𝜃 = 0 iff tan 𝜃 = 0 (since 0 ≤ 𝜃 ≤ 𝜋/2), and hence

σ_X σ_Y/(σ_X² + σ_Y²) · (r − 1/r) = 0 ⇒ r − 1/r = 0 ⇒ r = ±1.

Thus, 𝜃 = 0 iff 𝑋 and 𝑌 are linearly related.
• Case II: 𝜃 = 𝜋/2 iff tan 𝜃 = ∞, which is possible iff 𝑟 = 0. Thus, 𝜃 = 𝜋/2 iff 𝑋 and 𝑌 are uncorrelated.
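The angle formula can be wrapped in a small function (an illustrative sketch: it returns the acute angle in degrees, and it takes |r − 1/r| so the result is nonnegative, a convention not spelled out in the formula above):

```python
from math import atan, degrees

def regression_angle(r, sx, sy):
    """Acute angle (degrees) between the two regression lines.
    Returns 90 when r = 0 (uncorrelated), 0 when r = +/-1 (linear)."""
    if r == 0:
        return 90.0
    t = (sx * sy / (sx * sx + sy * sy)) * abs(r - 1 / r)
    return degrees(atan(t))

print(round(regression_angle(0.5, 1, 1), 2))   # 36.87
print(regression_angle(1.0, 1, 1))             # 0.0  (perfect linearity)
print(regression_angle(0.0, 2, 3))             # 90.0 (uncorrelated)
```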
Example 8: The following table gives the marks of 5 students in two tests out of 20 (𝑋 → mark in first test, 𝑌 → mark in second test). Find the two lines of regression. Estimate the mark of a student in the second test if his mark in the first test is 10, and also estimate the mark of a student in the first test if his mark in the second test is 12.
𝑿 𝟕 𝟏𝟓 𝟏𝟐 𝟏𝟏 𝟗
𝒀 𝟏𝟎 𝟏𝟒 𝟖 𝟏𝟓 𝟏𝟑
Ans: Referring to Example 5, one can see that
x̄ = 10.8,    ȳ = 12,    σ_X² = 7.36,    σ_Y² = 6.8,    Cov(X, Y) = 2.
Hence,
b_YX = Cov(X, Y)/σ_X² = 2/7.36 = 0.2717,

b_XY = Cov(X, Y)/σ_Y² = 2/6.8 = 0.2941.
Now, the LR of 𝑌 on 𝑋 is given by
Y − ȳ = b_YX (X − x̄)
⇒ Y − 12 = 0.2717 (X − 10.8)
⇒ 0.2717X − Y = −9.0656
and the LR of 𝑋 on 𝑌 is given by
X − x̄ = b_XY (Y − ȳ)
⇒ X − 10.8 = 0.2941 (Y − 12)
⇒ X − 0.2941Y = 7.2708
The estimated mark in the second test when the mark in the first test is 10 is obtained from the LR of 𝑌 on 𝑋: when 𝑋 = 10, 𝑌 = 0.2717 × 10 + 9.0656 = 11.78. The estimated mark in the first test when the mark in the second test is 12 is obtained from the LR of 𝑋 on 𝑌: when 𝑌 = 12, 𝑋 = 0.2941 × 12 + 7.2708 = 10.8.
Example 9: The two lines of regression are given by 3.40𝑋 − 𝑌 = 24.72 and
𝑋 − 3.68𝑌 = −33.37. Find the means of 𝑋 and 𝑌, the two regression coefficients, the
correlation coefficient and the ratio of variances of 𝑋 and 𝑌.
Ans. The two regression lines intersect at (x̄, ȳ). Hence, 3.40x̄ − ȳ = 24.72 and x̄ − 3.68ȳ = −33.37. On solving, we get x̄ = 10.8 and ȳ = 12.

Writing each line in the form 𝑌 = 𝑚𝑋 + 𝑐, the slopes are 𝑚₁ = 3.40 and 𝑚₂ = 1/3.68 = 0.272. Since b_YX · b_XY = r² ≤ 1, the smaller slope must be b_YX and the larger must be 1/b_XY. Hence,

m₂ = 0.272 = b_YX,    m₁ = 3.40 = 1/b_XY ⇒ b_XY = 0.294.
r² = b_XY · b_YX = 0.08 ⇒ r = 0.283.
Observe that 𝑟 is positive since both b_XY and b_YX are positive. Since b_XY = r · σ_X/σ_Y, we get

σ_X/σ_Y = b_XY/r = 0.294/0.283 = 1.039.
Hence
σ_X²/σ_Y² = (1.039)² = 1.08.
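Example 9 can be checked numerically; the sketch below follows the same argument (in particular, the assignment of the slopes to b_YX and 1/b_XY uses b_YX · b_XY = r² ≤ 1):

```python
from math import sqrt

# Regression lines from Example 9: 3.40X - Y = 24.72 and X - 3.68Y = -33.37.
# They intersect at (xbar, ybar): substitute Y = 3.40X - 24.72 into the second.
xbar = (33.37 + 3.68 * 24.72) / (3.68 * 3.40 - 1)
ybar = 3.40 * xbar - 24.72

# Slopes in the form Y = mX + c are 3.40 and 1/3.68.
# Since b_YX * b_XY = r^2 <= 1, the smaller slope must be b_YX.
byx = 1 / 3.68                 # regression coefficient of Y on X
bxy = 1 / 3.40                 # regression coefficient of X on Y
r = sqrt(byx * bxy)            # positive, since both coefficients are positive
ratio = (bxy / r) ** 2         # sigma_X^2 / sigma_Y^2, since bxy = r*sx/sy

print(round(xbar, 2), round(ybar, 2), round(byx, 3),
      round(bxy, 3), round(r, 3), round(ratio, 2))
# 10.8 12.0 0.272 0.294 0.283 1.08
```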