Regression Analysis
Once two variables have been shown to be closely related, we may want
to estimate (predict) the value of one variable from a given value of
the other.
Regression line of $Y$ on $X$: $Y = a + bX$
where $a$ and $b$ are two unknown constants (fixed numeric values)
that completely determine the position of the line.
These constants are called the parameters of the line. To determine the
values of $a$ and $b$, the following normal equations are solved simultaneously:
$\sum_{i=1}^{n} Y_i = na + b\sum_{i=1}^{n} X_i$
$\sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} X_i + b\sum_{i=1}^{n} X_i^2$
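The two normal equations form a 2×2 linear system in a and b, so they can be solved in closed form. A minimal sketch, assuming the data arrive as plain Python lists; the function name `fit_line` is illustrative and does not come from the text:

```python
def fit_line(x, y):
    """Solve the normal equations for the line y = a + b*x.

    Returns the pair (a, b). The name fit_line is illustrative only.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Determinant of the system
    #   sum_y  = n*a     + b*sum_x
    #   sum_xy = a*sum_x + b*sum_xx
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")

    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b
```

Calling `fit_line(y, x)` with the arguments swapped solves the corresponding pair of equations for the regression of X on Y.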
Regression line of $X$ on $Y$: $X = a + bY$
$\sum_{i=1}^{n} X_i = na + b\sum_{i=1}^{n} Y_i$
$\sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} Y_i + b\sum_{i=1}^{n} Y_i^2$
Example 1:
From the following data, obtain the two regression lines:
X     6     2    10     4     8
Y     9    11     5     8     7
Solution:
  X     Y    XY   X^2   Y^2
  6     9    54    36    81
  2    11    22     4   121
 10     5    50   100    25
  4     8    32    16    64
  8     7    56    64    49
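As a check on the table, the column totals can be substituted into the two pairs of normal equations. The totals below are computed here from the five rows above, and primes are used (as an added notation, not from the text) to distinguish the constants of the X-on-Y line:

```latex
\begin{align*}
n = 5,\quad \sum X &= 30,\quad \sum Y = 40,\quad \sum XY = 214,\quad \sum X^2 = 220,\quad \sum Y^2 = 340 \\[4pt]
\text{Y on X:}\quad 40 &= 5a + 30b, \qquad 214 = 30a + 220b \\
b &= \frac{5(214) - 30(40)}{5(220) - 30^2} = \frac{-130}{200} = -0.65,
\qquad a = \frac{40 - 30(-0.65)}{5} = 11.9 \\[4pt]
\text{X on Y:}\quad 30 &= 5a' + 40b', \qquad 214 = 40a' + 340b' \\
b' &= \frac{5(214) - 40(30)}{5(340) - 40^2} = \frac{-130}{100} = -1.3,
\qquad a' = \frac{30 - 40(-1.3)}{5} = 16.4
\end{align*}
```

This gives $Y = 11.9 - 0.65X$ and $X = 16.4 - 1.3Y$.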
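$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
Regression line of $Y$ on $X$: $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$r\frac{\sigma_y}{\sigma_x} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
Regression line of $X$ on $Y$: $X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$r\frac{\sigma_x}{\sigma_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2}$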
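From Example 1:
$\bar{X} = 6$, $\bar{Y} = 8$
$r\frac{\sigma_y}{\sigma_x} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = -0.65$, $r\frac{\sigma_x}{\sigma_y} = -1.3$
Regression line of $Y$ on $X$: $Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$
$Y = 11.9 - 0.65X$
Regression line of $X$ on $Y$: $X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$X = 16.4 - 1.3Y$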
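The deviation form can be checked numerically on the Example 1 data. The following is a rough sketch rather than a reference implementation; the function name `regression_summary` and the dictionary keys are illustrative choices, not part of the text:

```python
from math import sqrt

def regression_summary(x, y):
    """Means, regression coefficients, r and intercepts from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of products and squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)

    b_yx = sxy / sxx  # equals r * sigma_y / sigma_x
    b_xy = sxy / syy  # equals r * sigma_x / sigma_y
    r = sxy / sqrt(sxx * syy)

    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "b_yx": b_yx,
        "b_xy": b_xy,
        "r": r,
        "a_yx": mean_y - b_yx * mean_x,  # intercept of the Y-on-X line
        "a_xy": mean_x - b_xy * mean_y,  # intercept of the X-on-Y line
    }

summary = regression_summary([6, 2, 10, 4, 8], [9, 11, 5, 8, 7])
for key, value in summary.items():
    print(key, round(value, 4))
```

For these data the slopes come out as -0.65 and -1.3 and the intercepts as 11.9 and 16.4 (up to floating-point rounding), in agreement with the two lines above.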
Correlation Coefficient from Regression Coefficients
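The square root of the product of the two regression coefficients (their
geometric mean) gives the value of the coefficient of correlation; the
product itself equals $r^2$.
$r = \pm\sqrt{b_{xy} \times b_{yx}}$
$b_{xy} = r\frac{\sigma_x}{\sigma_y}$, $b_{yx} = r\frac{\sigma_y}{\sigma_x}$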
Limitations:
1. The two regression coefficients cannot both be greater than one.
Ex: If $b_{xy} = 1.2$ and $b_{yx} = 1.4$,
then $r = \sqrt{1.2 \times 1.4} \approx 1.296$, which is not possible.
2. The coefficient of correlation ($r$) has the same sign as the
regression coefficients.
i.e., if $b_{xy} = -0.8$ and $b_{yx} = -0.12$, then
$r = -\sqrt{0.8 \times 0.12} \approx -0.310$
3. If the square root of the product of the two regression coefficients does not
exceed one and both regression coefficients have the same sign, the pair of
equations is acceptable; otherwise it is not.
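The three conditions above can be collected into a small check. This is a sketch under the stated rules only; the function name `correlation_from_coefficients` is hypothetical:

```python
from math import sqrt

def correlation_from_coefficients(b_xy, b_yx):
    """Return r from the two regression coefficients.

    Raises ValueError when the pair is not admissible, i.e. when the signs
    differ or when the product exceeds one.
    """
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product exceeds 1, so |r| would exceed 1")

    r = sqrt(product)
    # r carries the common sign of the two coefficients.
    return -r if b_xy < 0 else r
```

With the values from Example 1, `correlation_from_coefficients(-1.3, -0.65)` gives roughly -0.919, while the pair (1.2, 1.4) from the first limitation is rejected.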
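Example 2:
The lines of regression of a bivariate population are
$8X - 10Y + 66 = 0$
$40X - 18Y = 214$
Both regression lines pass through the point of means, so solving the two
equations simultaneously gives
$\bar{X} = 13$, $\bar{Y} = 17$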
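(ii) Correlation Coefficient:
Assume first that $8X - 10Y + 66 = 0$ is the regression line of $X$ on $Y$:
$8X = -66 + 10Y$
$\Rightarrow X = -\frac{66}{8} + \frac{10}{8}Y$
$\therefore b_{xy} = \frac{10}{8} = 1.25$
Under this assumption, $40X - 18Y = 214$ would be the regression line of $Y$ on $X$
with $b_{yx} = \frac{40}{18} \approx 2.22$, so that $b_{xy} \times b_{yx} > 1$, which is
not possible. The assumption is therefore wrong: the first equation is the regression
of $Y$ on $X$ and the second is the regression of $X$ on $Y$, giving
$b_{yx} = \frac{8}{10} = 0.8$, $b_{xy} = \frac{18}{40} = 0.45$, $r = \sqrt{0.8 \times 0.45} = 0.6$
Standard deviation of $Y$, with $\sigma_x = 3$ as substituted below:
$b_{xy} = r\frac{\sigma_x}{\sigma_y}$
$0.45 = 0.6 \times \frac{3}{\sigma_y}$
$\therefore \sigma_y = 4$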
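The reasoning in Example 2 can be reproduced mechanically: intersect the two lines to get the means, then keep the assignment of the lines whose regression coefficients have a product between 0 and 1. The sketch below assumes each line is written as pX + qY + c = 0 with non-zero p and q; the function name `analyse_regression_pair` is hypothetical, and the value 3 for the standard deviation of X is taken from the substitution in the example.

```python
from fractions import Fraction
from math import sqrt

def analyse_regression_pair(line1, line2):
    """Each line is a tuple (p, q, c) meaning p*X + q*Y + c = 0.

    Returns (mean_x, mean_y, b_yx, b_xy, r) for the admissible assignment.
    """
    p1, q1, c1 = (Fraction(v) for v in line1)
    p2, q2, c2 = (Fraction(v) for v in line2)

    # The lines intersect at (mean of X, mean of Y); solve by Cramer's rule.
    det = p1 * q2 - p2 * q1
    mean_x = (q1 * c2 - q2 * c1) / det
    mean_y = (p2 * c1 - p1 * c2) / det

    candidates = [
        # line1 read as Y on X (slope -p/q), line2 read as X on Y (slope -q/p)
        (-p1 / q1, -q2 / p2),
        # line2 read as Y on X, line1 read as X on Y
        (-p2 / q2, -q1 / p1),
    ]
    for b_yx, b_xy in candidates:
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = sqrt(product)
            if b_yx < 0:
                r = -r
            return mean_x, mean_y, b_yx, b_xy, r
    raise ValueError("no admissible assignment of the two lines")

mean_x, mean_y, b_yx, b_xy, r = analyse_regression_pair((8, -10, 66), (40, -18, -214))
sigma_x = 3  # assumption: the value substituted in the example
sigma_y = r * sigma_x / float(b_xy)
print(mean_x, mean_y, b_yx, b_xy, round(r, 4), round(sigma_y, 4))
```

Exact fractions are used for the means and slopes so that the admissibility test is not affected by rounding; only the square root is taken in floating point.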
Example 6
X 1 5 3 2 1 1 7 3
Y 6 1 0 0 1 2 1 5
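$\sum_{i=1}^{n} X_{1i}X_{2i} = a_{1.23}\sum_{i=1}^{n} X_{2i} + b_{12.3}\sum_{i=1}^{n} X_{2i}^2 + b_{13.2}\sum_{i=1}^{n} X_{2i}X_{3i}$
$\sum_{i=1}^{n} X_{1i}X_{3i} = a_{1.23}\sum_{i=1}^{n} X_{3i} + b_{12.3}\sum_{i=1}^{n} X_{2i}X_{3i} + b_{13.2}\sum_{i=1}^{n} X_{3i}^2$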
$e_{1.23} = b_{12.3}X_2 + b_{13.2}X_3$ is called the estimate of $X_1$ given
by the plane of regression.
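The normal equations for the plane of regression are linear in the three unknowns and can be solved directly. The sketch below uses Cramer's rule in plain Python. It assumes the standard first normal equation, the sum of X1 equal to n times the constant plus the two slope terms, which does not appear in the equations shown above; the names `det3` and `fit_plane` are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_plane(x1, x2, x3):
    """Least-squares plane x1 = a + b12_3*x2 + b13_2*x3 via the normal equations.

    Returns the tuple (a, b12_3, b13_2).
    """
    n = len(x1)
    s1, s2, s3 = sum(x1), sum(x2), sum(x3)
    s22 = sum(v * v for v in x2)
    s33 = sum(v * v for v in x3)
    s23 = sum(u * v for u, v in zip(x2, x3))
    s12 = sum(u * v for u, v in zip(x1, x2))
    s13 = sum(u * v for u, v in zip(x1, x3))

    # Coefficient matrix and right-hand side of the three normal equations.
    # The first row is the assumed equation: s1 = n*a + b12_3*s2 + b13_2*s3.
    A = [[n, s2, s3],
         [s2, s22, s23],
         [s3, s23, s33]]
    rhs = [s1, s12, s13]

    d = det3(A)
    if d == 0:
        raise ValueError("normal equations are singular")

    def replaced(col):
        # Copy of A with column `col` replaced by the right-hand side.
        return [[rhs[r] if c == col else A[r][c] for c in range(3)]
                for r in range(3)]

    return tuple(det3(replaced(k)) / d for k in range(3))
```

For data lying exactly on a plane, for example x1 = 1 + 2*x2 + 3*x3, the function recovers the constants 1, 2 and 3.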
Assume that the variables $X_1$, $X_2$ and $X_3$ have been measured from their
respective means, so that
$E(X_1) = E(X_2) = E(X_3) = 0$
Hence, taking expectations on both sides of the equation of the plane, we get $a = 0$.
Thus the plane of regression of 𝑿𝟏 on 𝑿𝟐 and 𝑿𝟑 becomes