Regression
Regression means the estimation or prediction of the unknown value of one variable from the known value of
another variable. It is a statistical device used to study the relationship between two or more related
variables.
Simple regression
In regression analysis, if only two variables are studied, it is known as simple regression.
Multiple regression
In multiple regression analysis there are more than two variables, and we try to find the effect of
two or more independent variables on one dependent variable.
Regression curve
If the given bivariate data are plotted on a graph, the points on the scatter diagram will be more or
less concentrated around a curve. This curve is called the regression curve.
Linear regression
If the regression curve is a straight line, we say that there is linear regression between the variables
under study.
If the regression curve is not a straight line, the regression is termed non-linear or curvilinear
regression.
When the points of the scatter diagram concentrate around a straight line, that line is called the line of
best fit. This line is also known as the regression line.
Take the independent variable on the X-axis and the dependent variable on the Y-axis. Draw a smooth
freehand line in such a way that it clearly indicates the tendency of the original data. The line is fitted
by inspection: it is drawn so that the areas of the plotted curve above and below the line are
approximately equal. Such a line describes the general tendency of the original data. This is the
regression line.
2). Method of least squares (curve fitting)
The principle of least squares states that the line of best fit should be drawn in such a manner that the
sum of the squares of the differences between the known values of the dependent variable and the
corresponding values obtained from the line of best fit is the least.
i.e., $\sum (y_0 - y_e)^2$ should be the least, where $y_0$ stands for the known values of the dependent
variable and $y_e$ stands for the corresponding values obtained from the line of best fit.
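As a rough illustration of this principle, the sketch below computes the sum of squared differences for a few candidate lines through a small data set. The data values and the function name `sum_of_squares` are made up for illustration and are not part of these notes; y = 0.6x + 2.2 happens to be the least-squares line for these five points.

```python
# Hypothetical data, chosen only to illustrate the principle.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_of_squares(slope, intercept):
    """Sum of (y0 - ye)^2 for the candidate line ye = slope * x + intercept."""
    return sum((y0 - (slope * x + intercept)) ** 2 for x, y0 in zip(xs, ys))

best = sum_of_squares(0.6, 2.2)      # least-squares line for this data
steeper = sum_of_squares(0.8, 1.6)   # a steeper line through the point (3, 4)
shifted = sum_of_squares(0.6, 2.5)   # same slope, shifted upwards

print(best, steeper, shifted)
```

Any other line tried in place of the first one should give a larger sum, which is what "least" means in the statement of the principle.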
Regression lines
1) Regression line of X on Y.
2) Regression line of Y on X.
Regression Equations
$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$
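The two formulas translate directly into code. The following is a minimal sketch, assuming the data are given as two equal-length lists; the function name `regression_coefficients` and the sample data are hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (byx, bxy) computed from n, sum x, sum y, sum xy, sum x^2 and sum y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y      # shared by both coefficients
    byx = numerator / (n * sum_x2 - sum_x ** 2)
    bxy = numerator / (n * sum_y2 - sum_y ** 2)
    return byx, bxy

# Hypothetical data for illustration only.
byx, bxy = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(byx, bxy)
```

The numerator is the same in both formulas; only the denominator changes, depending on which variable is treated as independent.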
PROBLEMS
1. From the following data on the ages of husbands and wives, form the two regression
equations and calculate the husband's age when the wife's age is 16, and the wife's age when the
husband's age is 40.
Husband's age (X):  36  23  27  28  28  29  30  31  33  35
Wife's age (Y):     29  18  20  22  27  21  29  27  29  28
           X      Y      XY      X²      Y²
          36     29    1044    1296     841
          23     18     414     529     324
          27     20     540     729     400
          28     22     616     784     484
          28     27     756     784     729
          29     21     609     841     441
          30     29     870     900     841
          31     27     837     961     729
          33     29     957    1089     841
          35     28     980    1225     784
Total    300    250    7623    9138    6414
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{300}{10} = 30$
$\bar{y} = \dfrac{\sum y}{n} = \dfrac{250}{10} = 25$
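$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{10(7623) - (300)(250)}{10(9138) - (300)^2} = \dfrac{1230}{1380} = 0.89$
$b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \dfrac{10(7623) - (300)(250)}{10(6414) - (250)^2} = \dfrac{1230}{1640} = 0.75$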
Equation of Y on X is $y - \bar{y} = b_{yx}(x - \bar{x})$
$y - 25 = 0.89(x - 30)$
$y = 0.89x - 1.7$
Equation of X on Y is $x - \bar{x} = b_{xy}(y - \bar{y})$
$x - 30 = 0.75(y - 25)$
$x = 0.75y + 11.25$
i) When the wife's age (y) is 16, the husband's age (x) is obtained by putting y = 16 in the equation of
x on y:
$x = 0.75(16) + 11.25 = 12 + 11.25$
Husband's age = 23.25
ii) When the husband's age (x) is 40, the wife's age (y) is obtained by putting x = 40 in the equation of
y on x:
$y = 0.89(40) - 1.7 = 35.6 - 1.7$
Wife's age = 33.9
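The arithmetic in Problem 1 can be checked with a short script. This is only a verification sketch using the ages given in the problem; the variable names are illustrative. Because it keeps the unrounded coefficients, intermediate figures can differ slightly from the hand calculation, which rounds b_yx to 0.89.

```python
husband = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]   # X
wife = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]      # Y

n = len(husband)
sum_x, sum_y = sum(husband), sum(wife)
sum_xy = sum(x * y for x, y in zip(husband, wife))
sum_x2 = sum(x * x for x in husband)
sum_y2 = sum(y * y for y in wife)

mean_x, mean_y = sum_x / n, sum_y / n
byx = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
bxy = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)

# Equation of x on y gives the husband's age; equation of y on x gives the wife's age.
husband_at_16 = mean_x + bxy * (16 - mean_y)
wife_at_40 = mean_y + byx * (40 - mean_x)

print(round(byx, 2), round(bxy, 2), husband_at_16, round(wife_at_40, 1))
```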
                      x     y
Arithmetic mean      36    85
Standard deviation   11     8
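$b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{0.66 \times 8}{11} = 0.48$
$b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{0.66 \times 11}{8} = 0.91$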
Equation of Y on X is $y - \bar{y} = b_{yx}(x - \bar{x})$
$y - 85 = 0.48(x - 36)$
$y - 85 = 0.48x - 17.28$
i.e., $y = 0.48x + 67.72$
Equation of X on Y is $x - \bar{x} = b_{xy}(y - \bar{y})$
$x - 36 = 0.91(y - 85)$
$x - 36 = 0.91y - 77.35$
i.e., $x = 0.91y - 41.35$
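When only the means, standard deviations and r are known, the two regression equations follow from these summary figures alone. The sketch below repeats the calculation with the values used in this problem (means 36 and 85, standard deviations 11 and 8, r = 0.66); the variable names are illustrative. It keeps b_xy unrounded (0.9075), so the constant in the x on y equation comes out slightly different from the hand calculation, which rounds b_xy to 0.91.

```python
mean_x, mean_y = 36, 85
sd_x, sd_y = 11, 8
r = 0.66

byx = r * sd_y / sd_x   # regression coefficient of y on x
bxy = r * sd_x / sd_y   # regression coefficient of x on y

# Constants of y = byx * x + a_yx and x = bxy * y + a_xy.
a_yx = mean_y - byx * mean_x
a_xy = mean_x - bxy * mean_y

print(f"y = {byx:.2f}x + {a_yx:.2f}")
print(f"x = {bxy:.4f}y + {a_xy:.4f}")
```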
Variance of x = 9
Find
c. The S.D of y.
Solution:
Therefore, y = 17
Thus $\bar{x} = 13$, $\bar{y} = 17$
b. Coefficient of correlation
Taking the first equation to be the equation of x on y,
$8x = 10y - 66$
Therefore, $x = \dfrac{10}{8}y - \dfrac{66}{8}$
Taking the second equation as y on x, $-18y = -40x + 214$
$y = \dfrac{40}{18}x - \dfrac{214}{18}$
The product of the coefficients is $\dfrac{10}{8} \times \dfrac{40}{18} = \dfrac{400}{144} > 1$
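Therefore, our assumption is wrong. Hence the first equation is y on x and the second is x on y.
Write the first equation as
$-10y = -8x - 66$
$y = \dfrac{8}{10}x + \dfrac{66}{10}$, so $b_{yx} = 0.8$
Write the second equation as
$40x = 18y + 214$
$x = \dfrac{18}{40}y + \dfrac{214}{40}$, so $b_{xy} = 0.45$
Therefore, $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.45 \times 0.8} = \sqrt{0.36} = 0.6$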
c. S.D. of y
Given variance of x = 9
Therefore $\sigma_x = 3$
$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$, i.e., $0.8 = 0.6 \times \dfrac{\sigma_y}{3}$
$\therefore \sigma_y = \dfrac{0.8 \times 3}{0.6} = 4$
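The reasoning in this problem (solve the two equations for the means, test which equation is y on x by checking that the product of the coefficients does not exceed 1, then find r and the S.D. of y) can be sketched in code. The two equations and the variance of x are the ones used in the solution; the helper names and the (a, b, c) representation are assumptions made for illustration.

```python
import math

# Each line is stored as (a, b, c), meaning a*x + b*y = c.
line1 = (8, -10, -66)    # 8x = 10y - 66
line2 = (40, -18, 214)   # -18y = -40x + 214
var_x = 9

def intersection(l1, l2):
    """Solve the two equations simultaneously (Cramer's rule); gives the means."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

def slope_y_on_x(line):
    a, b, _ = line
    return -a / b   # coefficient of x when the line is solved for y

def slope_x_on_y(line):
    a, b, _ = line
    return -b / a   # coefficient of y when the line is solved for x

mean_x, mean_y = intersection(line1, line2)

# First assume line1 is x on y and line2 is y on x; swap if the product exceeds 1.
bxy, byx = slope_x_on_y(line1), slope_y_on_x(line2)
if bxy * byx > 1:
    byx, bxy = slope_y_on_x(line1), slope_x_on_y(line2)

r = math.copysign(math.sqrt(byx * bxy), byx)   # r takes the sign of the coefficients
sd_y = byx * math.sqrt(var_x) / r

print(mean_x, mean_y, byx, bxy, round(r, 2), round(sd_y, 2))
```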
2. In correlation analysis, the choice of dependent and independent variables depends purely
on personal choice and is of no practical significance.
In regression analysis, one has to decide which variable shall be taken as dependent and which as
independent.
3. Correlation analysis is not meant for prediction, whereas regression analysis is
basically used for prediction.
$b_{yx}$ is the regression coefficient of y on x and $b_{xy}$ is the regression coefficient of x on y.
1). Both regression coefficients have the same sign. That is, both are positive or both are
negative.
2). The product of the two regression coefficients is the square of the correlation coefficient,
i.e., $b_{xy} \times b_{yx} = r^2$, so $r = \pm\sqrt{b_{xy} \times b_{yx}}$.
3). $b_{yx}$ and $b_{xy}$ have the same sign as r.
4). When there is perfect correlation, $b_{yx}$ and $b_{xy}$ are reciprocals of each other.
5). $\dfrac{b_{yx}}{r} = \dfrac{\sigma_y}{\sigma_x}$ and $\dfrac{b_{xy}}{r} = \dfrac{\sigma_x}{\sigma_y}$
6). Both regression coefficients cannot be numerically greater than 1 at the same time. That is, if one
of them is greater than 1, the other must be less than 1; both can be less than 1.
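These properties can be checked numerically. The sketch below does so on the husband and wife ages from Problem 1, using population (divide by n) variances and covariance; the variable names are illustrative.

```python
import math

x = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
y = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)

r = cov / (sd_x * sd_y)
byx = cov / sd_x ** 2
bxy = cov / sd_y ** 2

same_sign = (byx > 0) == (bxy > 0) == (r > 0)             # properties 1 and 3
product_is_r_squared = math.isclose(byx * bxy, r ** 2)    # property 2
ratio_y = math.isclose(byx / r, sd_y / sd_x)              # property 5
ratio_x = math.isclose(bxy / r, sd_x / sd_y)              # property 5
not_both_above_one = not (abs(byx) > 1 and abs(bxy) > 1)  # property 6

print(same_sign, product_is_r_squared, ratio_y, ratio_x, not_both_above_one)
```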
The difference between the actual value and the predicted value is the error in prediction. The standard
error of estimate is the square root of the mean of the squares of these errors. It measures how accurate
the predicted values are: the smaller the standard error, the closer the predicted values are to the
actual values.
Standard error of estimate = $\sqrt{\dfrac{\sum (y_0 - y_e)^2}{n}}$
where $y_0$ stands for the actual values and $y_e$ for the predicted values.
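A minimal sketch of this calculation follows, assuming the actual and predicted values are supplied as two equal-length lists. The function name and the sample figures are hypothetical; the predicted values come from the line y = 0.6x + 2.2 evaluated at x = 1, ..., 5.

```python
import math

def standard_error_of_estimate(actual, predicted):
    """Square root of the mean of the squared errors (y0 - ye)."""
    n = len(actual)
    return math.sqrt(sum((y0 - ye) ** 2 for y0, ye in zip(actual, predicted)) / n)

# Hypothetical figures for illustration only.
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

se = standard_error_of_estimate(actual, predicted)
print(round(se, 4))
```

Note that this follows the formula above and divides by n; some textbooks divide by n - 2 instead, which gives a slightly larger figure.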