
Regression Analysis

Regression analysis allows us to estimate or predict the value of one variable given known values of another variable. The statistical tool used for this is called regression. Regression lines can be calculated for both Y on X and X on Y. These lines use the method of least squares to calculate the constants a and b that determine the line. The normal equations are then solved to determine the parameter values. The correlation coefficient r can be calculated from the regression coefficients and provides a measure of the strength and direction of the linear relationship between the two variables. Multiple regression extends this to predict a dependent variable from two or more independent variables.

Uploaded by

Nanthitha B
Copyright
© All Rights Reserved

Regression Analysis

Once we have established that two variables are closely related, we may be interested in estimating (predicting) the value of one variable given the value of the other.

The statistical tool that lets us estimate (or predict) the unknown values of one variable from known values of another variable is called regression.
Regression Lines:
For two variables X and Y there are two regression lines: the regression of Y on X and the regression of X on Y.

Regression line of Y on X:

$$ Y = a + bX $$

where a and b are two unknown constants (fixed numeric values) that determine the position of the line completely. These constants are called the parameters of the line. To determine the values of a and b, the following normal equations are solved simultaneously (N is the number of paired observations):

$$ \sum_{i=1}^{N} Y_i = Na + b \sum_{i=1}^{N} X_i $$

$$ \sum_{i=1}^{N} X_i Y_i = a \sum_{i=1}^{N} X_i + b \sum_{i=1}^{N} X_i^2 $$
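Eliminating a between the two normal equations gives a closed form for the parameters. This is the standard least-squares result rather than something stated in the slides; $\bar{X}$ and $\bar{Y}$ are assumed to denote the sample means.

```latex
\begin{aligned}
b &= \frac{N\sum X_i Y_i - \sum X_i \sum Y_i}{N\sum X_i^2 - \left(\sum X_i\right)^2}, \\[4pt]
a &= \frac{\sum Y_i - b\sum X_i}{N} = \bar{Y} - b\,\bar{X}.
\end{aligned}
```

The same expressions apply to the regression of X on Y once the roles of X and Y are exchanged.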
Regression line of X on Y:

$$ X = a + bY $$

The normal equations to solve for the parameters are:

$$ \sum_{i=1}^{N} X_i = Na + b \sum_{i=1}^{N} Y_i $$

$$ \sum_{i=1}^{N} X_i Y_i = a \sum_{i=1}^{N} Y_i + b \sum_{i=1}^{N} Y_i^2 $$
Example 1:
From the following data obtain the two regression lines:

X    6    2   10    4    8
Y    9   11    5    8    7

Solution:

   X     Y    XY    X²    Y²
   6     9    54    36    81
   2    11    22     4   121
  10     5    50   100    25
   4     8    32    16    64
   8     7    56    64    49

$$ \sum X = 30, \quad \sum Y = 40, \quad \sum XY = 214, \quad \sum X^2 = 220, \quad \sum Y^2 = 340 $$

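(i) Regression line of Y on X: $Y = a + bX$

The normal equations are

$$ \sum Y = Na + b \sum X, \qquad \sum XY = a \sum X + b \sum X^2 $$

Substituting the totals with N = 5:

$$ 40 = 5a + 30b, \qquad 214 = 30a + 220b $$

Solving simultaneously gives $a = 11.9$ and $b = -0.65$.

∴ Regression line of Y on X is $Y = 11.9 - 0.65X$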


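(ii) Regression line of X on Y: $X = a + bY$

The normal equations are

$$ \sum X = Na + b \sum Y, \qquad \sum XY = a \sum Y + b \sum Y^2 $$

Substituting the totals with N = 5:

$$ 30 = 5a + 40b, \qquad 214 = 40a + 340b $$

Solving simultaneously gives $a = 16.4$ and $b = -1.3$.

∴ Regression line of X on Y is $X = 16.4 - 1.3Y$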
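The two fits in Example 1 can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the helper name `fit_line` is hypothetical, and it solves the two normal equations by eliminating a, using exact fractions so that no rounding creeps in.

```python
from fractions import Fraction

def fit_line(x, y):
    """Solve the two normal equations for (a, b) in  y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Eliminate a between the normal equations to get b, then back-substitute.
    b = Fraction(n * sxy - sx * sy, n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return float(a), float(b)

# Data of Example 1
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

a_yx, b_yx = fit_line(X, Y)   # regression of Y on X
a_xy, b_xy = fit_line(Y, X)   # regression of X on Y (roles swapped)

print(f"Y = {a_yx} + ({b_yx}) X")
print(f"X = {a_xy} + ({b_xy}) Y")
```

Swapping the argument order is all that is needed for the second line, because the X on Y normal equations are the Y on X ones with the two variables exchanged.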


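Deviations taken from the Arithmetic Mean of X and Y

Regression line of Y on X:

$$ Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X}) $$

where

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$

$$ r \frac{\sigma_y}{\sigma_x} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} $$

Regression line of X on Y:

$$ X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}) $$

$$ r \frac{\sigma_x}{\sigma_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} $$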
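From Example 1, with $\bar{X} = 30/5 = 6$ and $\bar{Y} = 40/5 = 8$:

    X    X−6   (X−6)²     Y    Y−8   (Y−8)²   (X−6)(Y−8)
    6     0      0        9     1      1          0
    2    −4     16       11     3      9        −12
   10     4     16        5    −3      9        −12
    4    −2      4        8     0      0          0
    8     2      4        7    −1      1         −2
   30     0     40       40     0     20        −26   (totals)

$$ r \frac{\sigma_y}{\sigma_x} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{-26}{40} = -0.65, \qquad r \frac{\sigma_x}{\sigma_y} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \frac{-26}{20} = -1.3 $$

Regression line of Y on X:

$$ Y - 8 = -0.65\,(X - 6) \quad\Rightarrow\quad Y = 11.9 - 0.65X $$

Regression line of X on Y:

$$ X - 6 = -1.3\,(Y - 8) \quad\Rightarrow\quad X = 16.4 - 1.3Y $$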
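The deviation method can be sketched in Python as follows. The variable names (`s_xy`, `s_xx`, `s_yy`) are illustrative choices, not notation from the slides; they hold the three column totals of the deviation table.

```python
from math import sqrt

# Data of Example 1
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the arithmetic means
dx = [x - x_bar for x in X]
dy = [y - y_bar for y in Y]

s_xy = sum(u * v for u, v in zip(dx, dy))   # sum of (X - mean X)(Y - mean Y)
s_xx = sum(u * u for u in dx)               # sum of (X - mean X)^2
s_yy = sum(v * v for v in dy)               # sum of (Y - mean Y)^2

b_yx = s_xy / s_xx            # r * sigma_y / sigma_x
b_xy = s_xy / s_yy            # r * sigma_x / sigma_y
r = s_xy / sqrt(s_xx * s_yy)  # Karl Pearson's coefficient

# Expand  Y - y_bar = b_yx (X - x_bar)  into  Y = a + b X, and likewise for X on Y
a_yx = y_bar - b_yx * x_bar
a_xy = x_bar - b_xy * y_bar

print(f"Y on X: Y = {a_yx:.2f} + ({b_yx:.2f}) X")
print(f"X on Y: X = {a_xy:.2f} + ({b_xy:.2f}) Y")
print(f"r = {r:.4f}")
```

Working with deviations avoids solving simultaneous equations: each slope is a single ratio of column totals, and the intercept follows from the fact that the line passes through the point of means.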
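Correlation Coefficient from Regression Coefficients

The coefficient of correlation is the geometric mean of the two regression coefficients, i.e. the square root of their product, taken with the sign that the two coefficients share:

$$ r = \pm\sqrt{b_{xy} \times b_{yx}} $$

where

$$ b_{xy} = r \frac{\sigma_x}{\sigma_y}, \qquad b_{yx} = r \frac{\sigma_y}{\sigma_x} $$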
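The relation follows directly from the two definitions, because the standard deviations cancel in the product. The short derivation below is a standard step that the slides leave implicit; the numerical check reuses the coefficients found in Example 1.

```latex
b_{xy}\, b_{yx}
  = \left(r\,\frac{\sigma_x}{\sigma_y}\right)\left(r\,\frac{\sigma_y}{\sigma_x}\right)
  = r^2
\quad\Longrightarrow\quad
r = \pm\sqrt{b_{xy}\, b_{yx}}

% Example 1: b_{xy} = -1.3,\; b_{yx} = -0.65
r = -\sqrt{(-1.3)(-0.65)} = -\sqrt{0.845} \approx -0.919
```

The negative root is taken in the example because both regression coefficients are negative.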
Limitations:
1. Both regression coefficients cannot be greater than one at the same time, since their product $r^2$ cannot exceed one.
Ex: If $b_{xy} = 1.2$ and $b_{yx} = 1.4$,
then $r = \sqrt{1.2 \times 1.4} = \sqrt{1.68} \approx 1.30$, which is not possible because $|r| \le 1$.

2. The coefficient of correlation (r) has the same sign as the regression coefficients.
i.e., if $b_{xy} = -0.8$ and $b_{yx} = -0.12$,
then $r = -\sqrt{0.8 \times 0.12} = -\sqrt{0.096} \approx -0.31$.
3. A pair of regression equations is acceptable only if both regression coefficients have the same sign and the square root of their product does not exceed one; otherwise the equations are inconsistent.
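The three conditions above can be folded into one small helper. This is a sketch under the stated rules only; the function name `correlation_from_coefficients` is hypothetical.

```python
from math import sqrt

def correlation_from_coefficients(b_xy, b_yx):
    """Return r = +/- sqrt(b_xy * b_yx), rejecting inconsistent pairs."""
    product = b_xy * b_yx
    if product < 0:
        # Rule 2/3: the two coefficients must share a sign
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        # Rule 1/3: r^2 cannot exceed one
        raise ValueError("product of regression coefficients cannot exceed 1")
    # r carries the common sign of the two coefficients
    sign = -1.0 if (b_xy < 0 or b_yx < 0) else 1.0
    return sign * sqrt(product)

# Coefficients from Example 1
r = correlation_from_coefficients(-1.3, -0.65)
print(round(r, 3))
```

Calling the helper with the pair 1.2 and 1.4 from the first rule raises `ValueError`, which mirrors the "not possible" verdict in the text.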
Example 2:
The lines of regression of a bivariate population are

$$ 8X - 10Y + 66 = 0 $$

$$ 40X - 18Y = 214 $$

The variance of X is 9.
(i) Find the mean values of X and Y.
(ii) Also obtain the standard deviation of Y and the coefficient of correlation between X and Y.
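Solution:
(i) Mean values of X and Y
Both regression lines pass through the point $(\bar{X}, \bar{Y})$, so the means satisfy both equations:

$$ 8\bar{X} - 10\bar{Y} = -66 \qquad \text{(i)} $$

$$ 40\bar{X} - 18\bar{Y} = 214 \qquad \text{(ii)} $$

Multiplying (i) by 5 and subtracting it from (ii) gives $32\bar{Y} = 544$, so $\bar{Y} = 17$; substituting back, $8\bar{X} = 104$, so $\bar{X} = 13$.

Hence $\bar{X} = 13$, $\bar{Y} = 17$.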
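(ii) Correlation coefficient:
Assuming that equation (i) is the regression of X on Y,

$$ 8X = -66 + 10Y \quad\Rightarrow\quad X = -\frac{66}{8} + \frac{10}{8}\,Y \qquad \therefore\; b_{xy} = \frac{10}{8} = 1.25 $$

Equation (ii) is then the regression of Y on X:

$$ 18Y = 40X - 214 \quad\Rightarrow\quad Y = \frac{40}{18}\,X - \frac{214}{18} \qquad \therefore\; b_{yx} = \frac{40}{18} \approx 2.22 $$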
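Since both regression coefficients exceed 1 (which would give $r^2 > 1$), our assumption is wrong.
Hence, taking equation (i) as the regression of Y on X,

$$ 10Y = 8X + 66 \quad\Rightarrow\quad Y = \frac{8}{10}\,X + \frac{66}{10} \qquad \therefore\; b_{yx} = \frac{8}{10} = 0.8 $$

and equation (ii) as the regression of X on Y,

$$ 40X = 18Y + 214 \quad\Rightarrow\quad X = \frac{18}{40}\,Y + \frac{214}{40} \qquad \therefore\; b_{xy} = \frac{18}{40} = 0.45 $$

Both coefficients are positive, so r is positive:

$$ r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.45 \times 0.8} = \sqrt{0.36} = 0.6 $$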


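(iii) Since the variance of X is $\sigma_x^2 = 9$, we have $\sigma_x = 3$. Using

$$ b_{xy} = r \frac{\sigma_x}{\sigma_y} \quad\Rightarrow\quad 0.45 = 0.6 \times \frac{3}{\sigma_y} $$

$$ \therefore\; \sigma_y = 4 $$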
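The reasoning of Example 2 can be sketched in Python. Everything here is illustrative: the `(p, q, c)` tuple layout and the helper names are assumptions made for this sketch, and `choose_roles` simply returns the first role assignment that satisfies the consistency rules.

```python
from fractions import Fraction
from math import sqrt

# Each line is stored as (p, q, c), meaning p*X + q*Y = c.
LINE_1 = (8, -10, -66)    # 8X - 10Y + 66 = 0
LINE_2 = (40, -18, 214)   # 40X - 18Y = 214
VAR_X = 9

def intersection(l1, l2):
    """Both regression lines pass through (mean of X, mean of Y): Cramer's rule."""
    (p1, q1, c1), (p2, q2, c2) = l1, l2
    det = p1 * q2 - p2 * q1
    return Fraction(c1 * q2 - c2 * q1, det), Fraction(p1 * c2 - p2 * c1, det)

def coefficients(y_on_x, x_on_y):
    """Slopes b_yx and b_xy when the two lines are read in the given roles."""
    b_yx = Fraction(-y_on_x[0], y_on_x[1])   # Y = (c - p*X) / q
    b_xy = Fraction(-x_on_y[1], x_on_y[0])   # X = (c - q*Y) / p
    return b_yx, b_xy

def choose_roles(l1, l2):
    """Try both assignments; keep the first with 0 <= b_yx * b_xy <= 1."""
    for y_on_x, x_on_y in ((l1, l2), (l2, l1)):
        b_yx, b_xy = coefficients(y_on_x, x_on_y)
        if 0 <= b_yx * b_xy <= 1:
            return b_yx, b_xy
    raise ValueError("neither assignment gives a valid correlation")

mean_x, mean_y = intersection(LINE_1, LINE_2)
b_yx, b_xy = choose_roles(LINE_1, LINE_2)
sign = -1 if b_yx < 0 else 1
r = sign * sqrt(float(b_yx * b_xy))
sigma_x = sqrt(VAR_X)
sigma_y = r * sigma_x / float(b_xy)   # rearranged from b_xy = r * sigma_x / sigma_y

print(mean_x, mean_y, float(b_yx), float(b_xy), r, sigma_y)
```

Testing the product of the two slopes, rather than each slope separately, matches the third rule under "Limitations": one coefficient may exceed one as long as the product does not.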
Example 6
X    1    5    3    2    1    1    7    3
Y    6    1    0    0    1    2    1    5

a) Fit a regression line of Y on X and hence predict the value of Y if X = 5.
b) Fit the regression line of X on Y and hence predict the value of X if Y = 2.5.
c) Calculate Karl Pearson's coefficient of correlation.
Multiple Regression
A multiple regression equation estimates a dependent variable, say $X_1$, from the independent variables $X_2, X_3, \dots$ and is called a regression equation of $X_1$ on $X_2, X_3, \dots$

In the case of three variables, the regression equation of $X_1$ on $X_2$ and $X_3$ has the form

$$ X_{1.23} = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3 $$
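Normal Equations for the Least Squares Regression Plane

$$ \sum_{i=1}^{N} X_{1i} = N a_{1.23} + b_{12.3} \sum_{i=1}^{N} X_{2i} + b_{13.2} \sum_{i=1}^{N} X_{3i} $$

$$ \sum_{i=1}^{N} X_{1i} X_{2i} = a_{1.23} \sum_{i=1}^{N} X_{2i} + b_{12.3} \sum_{i=1}^{N} X_{2i}^2 + b_{13.2} \sum_{i=1}^{N} X_{2i} X_{3i} $$

$$ \sum_{i=1}^{N} X_{1i} X_{3i} = a_{1.23} \sum_{i=1}^{N} X_{3i} + b_{12.3} \sum_{i=1}^{N} X_{2i} X_{3i} + b_{13.2} \sum_{i=1}^{N} X_{3i}^2 $$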
$e_{1.23} = b_{12.3} X_2 + b_{13.2} X_3$ is called the estimate of $X_1$ given by the plane of regression (written here without a constant term, i.e. for variables measured from their means).

$X_{1.23} = X_1 - e_{1.23}$ is called the error of estimate or residual.

Starting again from the regression equation

$$ X_{1.23} = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3 $$

assume that the variables $X_1$, $X_2$ and $X_3$ have been measured from their respective means, so that

$$ E(X_1) = E(X_2) = E(X_3) = 0 $$

Hence, taking the expectation on both sides, we get $a_{1.23} = 0$.
Thus the plane of regression of $X_1$ on $X_2$ and $X_3$ becomes

$$ X_{1.23} = b_{12.3} X_2 + b_{13.2} X_3 $$

The coefficients $b_{12.3}$ and $b_{13.2}$ are known as partial regression coefficients.
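With the constant term gone, the three normal equations collapse to two. The block below writes them out and solves them by Cramer's rule; the lowercase $x_j$ (deviations from the mean) and the shorthand $S_{jk} = \sum x_j x_k$ are notation introduced for this sketch, not taken from the slides.

```latex
% Normal equations for variables measured from their means (a_{1.23} = 0)
\begin{aligned}
S_{12} &= b_{12.3}\, S_{22} + b_{13.2}\, S_{23}, \\
S_{13} &= b_{12.3}\, S_{23} + b_{13.2}\, S_{33},
\end{aligned}
\qquad S_{jk} = \sum_{i=1}^{N} x_{ji}\, x_{ki}

% Solution by Cramer's rule
b_{12.3} = \frac{S_{12} S_{33} - S_{13} S_{23}}{S_{22} S_{33} - S_{23}^2},
\qquad
b_{13.2} = \frac{S_{13} S_{22} - S_{12} S_{23}}{S_{22} S_{33} - S_{23}^2}

% Constant term when returning to the original (uncentred) variables
a_{1.23} = \bar{X}_1 - b_{12.3}\, \bar{X}_2 - b_{13.2}\, \bar{X}_3
```

The last line is the first of the original normal equations divided by N, so the constant can always be recovered after the two partial regression coefficients are known.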
Example 7
X1    4    6    7    9   13   15
X2   15   12    8    6    4    3
X3   30   24   20   14   10    4

Find the multiple regression equation of $X_1$ on $X_2$ and $X_3$ from the above data.
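One way to work this example is to build the three normal equations from the data and solve them as a 3×3 linear system. The sketch below does this with exact fractions; `dot` and `solve3` are hypothetical helper names, and Gauss-Jordan elimination is just one convenient way to solve the system.

```python
from fractions import Fraction

# Data of Example 7
X1 = [4, 6, 7, 9, 13, 15]
X2 = [15, 12, 8, 6, 4, 3]
X3 = [30, 24, 20, 14, 10, 4]

def dot(u, v):
    """Sum of element-wise products, e.g. dot(X1, X2) is the sum of X1*X2."""
    return sum(a * b for a, b in zip(u, v))

def solve3(A, rhs):
    """Solve a 3x3 linear system exactly by Gauss-Jordan elimination."""
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(3):
        # Bring a row with a non-zero entry in this column into pivot position
        pivot_row = next(r for r in range(col, 3) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        # Clear this column in every other row
        for r in range(3):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][3] for i in range(3)]

N = len(X1)
# Coefficient matrix and right-hand side of the three normal equations
A = [
    [N,       sum(X2),     sum(X3)],
    [sum(X2), dot(X2, X2), dot(X2, X3)],
    [sum(X3), dot(X2, X3), dot(X3, X3)],
]
rhs = [sum(X1), dot(X1, X2), dot(X1, X3)]

a, b12_3, b13_2 = solve3(A, rhs)
print(f"X1 = {float(a):.3f} + ({float(b12_3):.3f}) X2 + ({float(b13_2):.3f}) X3")
```

To three decimals the fitted plane is $X_1 \approx 16.478 + 0.390\,X_2 - 0.623\,X_3$.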
