Simple Linear Regression and Its Properties
INTRODUCTION
Regression describes the functional relationship between two variables, one of which may
represent a cause and the other an effect. The variable representing the cause is known as
the independent variable and is denoted by X; it is also called the predictor variable or
regressor. The variable representing the effect is known as the dependent variable and is
denoted by Y; it is also called the predicted variable.
The regression equation of y on x is given by
y = a + bx
where, y is the dependent variable (y depends on x)
x is the independent variable (regressor)
b is the regression coefficient (or slope) of x,
a is the intercept,
a and b are called parameters or constants of the equation.
BASIC TYPES
1. Simple Linear regression
If there is only one independent variable in a regression equation, then it is
called simple linear regression.
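y on x (y depends on x):
The equation is given by
$$y = a_{yx} + b_{yx}\,x$$
where
$$b_{yx} = \frac{\text{covariance}(x, y)}{\text{variance}(x)} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\,/\,(n-1)}{\sum_{i=1}^{n}(x_i - \bar{x})^2\,/\,(n-1)} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
and intercept $a_{yx} = \bar{y} - b_{yx}\,\bar{x}$

x on y (x depends on y):
The equation is given by
$$x = a_{xy} + b_{xy}\,y$$
where
$$b_{xy} = \frac{\text{covariance}(x, y)}{\text{variance}(y)} = \frac{\operatorname{cov}(x, y)}{\sigma_y^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
and intercept $a_{xy} = \bar{x} - b_{xy}\,\bar{y}$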
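The formulas for the two regression coefficients and intercepts can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `regression_coefficients` and the three-point data set are hypothetical and do not come from the notes.

```python
def regression_coefficients(x, y):
    """Return (b_yx, a_yx, b_xy, a_xy) for paired observations x and y.

    Assumes x and y have the same length and that neither variable is
    constant (otherwise a sum of squares is zero and division fails).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sum of products and sums of squares of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)

    b_yx = s_xy / s_xx            # slope of y on x
    b_xy = s_xy / s_yy            # slope of x on y
    a_yx = y_bar - b_yx * x_bar   # intercept of y on x
    a_xy = x_bar - b_xy * y_bar   # intercept of x on y
    return b_yx, a_yx, b_xy, a_xy


# Hypothetical data, chosen only to exercise the function.
x = [1, 2, 3]
y = [2, 3, 7]
b_yx, a_yx, b_xy, a_xy = regression_coefficients(x, y)
print(f"y = {a_yx:.4f} + {b_yx:.4f} x")
print(f"x = {a_xy:.4f} + {b_xy:.4f} y")
```

The common factor (n − 1) cancels between covariance and variance, so the sketch works directly with the sums of deviations.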
22AG2056 Statistical Methods (1+1) | 76
Note: Both regression lines pass through $(\bar{x}, \bar{y})$. So, $(\bar{x}, \bar{y})$ is the point of
intersection of the regression lines.
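The note follows from the intercept formulas $a_{yx} = \bar{y} - b_{yx}\bar{x}$ and $a_{xy} = \bar{x} - b_{xy}\bar{y}$. A short derivation, written here as a supplementary sketch rather than as part of the original notes:

```latex
\begin{align*}
y &= a_{yx} + b_{yx}\,x = (\bar{y} - b_{yx}\,\bar{x}) + b_{yx}\,x
  &&\Longrightarrow\quad y - \bar{y} = b_{yx}\,(x - \bar{x}) \\
x &= a_{xy} + b_{xy}\,y = (\bar{x} - b_{xy}\,\bar{y}) + b_{xy}\,y
  &&\Longrightarrow\quad x - \bar{x} = b_{xy}\,(y - \bar{y})
\end{align*}
```

Setting $x = \bar{x}$ in the first line gives $y = \bar{y}$, and setting $y = \bar{y}$ in the second gives $x = \bar{x}$, so both lines contain the point $(\bar{x}, \bar{y})$.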
Example 9.1: For the given data, calculate the two regression equations and find the
correlation coefficient using the regression coefficients.
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
Solution:
x       y       $x-\bar{x}$   $y-\bar{y}$   $(x-\bar{x})^2$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
2       18      -4            8             16                64                -32
4       12      -2            2             4                 4                 -4
5       10      -1            0             1                 0                 0
6       8       0             -2            0                 4                 0
8       7       2             -3            4                 9                 -6
11      5       5             -5            25                25                -25
Total:
36      60      0             0             50                106               -67
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{36}{6} = 6, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{60}{6} = 10$$
y on x:
Regression coefficient ($b_{yx}$)
$$b_{yx} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{-67}{50} = -1.34$$

x on y:
Regression coefficient ($b_{xy}$)
$$b_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = \frac{-67}{106} = -0.63$$
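Intercept ($a_{yx}$)
$$a_{yx} = \bar{y} - b_{yx}\,\bar{x} = 10 - (-1.34 \times 6) = 10 + 8.04 = 18.04$$
Intercept ($a_{xy}$)
$$a_{xy} = \bar{x} - b_{xy}\,\bar{y} = 6 - (-0.63 \times 10) = 6 + 6.3 = 12.3$$
Regression equation of y on x:
$$y = a_{yx} + b_{yx}\,x = 18.04 - 1.34\,x$$
Regression equation of x on y:
$$x = a_{xy} + b_{xy}\,y = 12.3 - 0.63\,y$$
Correlation coefficient from the regression coefficients:
$$r = \pm\sqrt{b_{yx} \times b_{xy}} = -\sqrt{1.34 \times 0.63} \approx -0.92$$
Since both the regression coefficients are negative, r must be negative.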
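The arithmetic of Example 9.1 can be cross-checked numerically. The sketch below uses only the data given in the example; the variable names are illustrative. Because it carries $b_{xy}$ at full precision instead of rounding it to −0.63 first, the x-on-y intercept comes out near 12.32 rather than 12.3.

```python
import math

# Data from Example 9.1.
x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]

n = len(x)
x_bar = sum(x) / n   # mean of x
y_bar = sum(y) / n   # mean of y

# Sums of deviations (the column totals of the solution table).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b_yx = s_xy / s_xx            # regression coefficient of y on x
b_xy = s_xy / s_yy            # regression coefficient of x on y
a_yx = y_bar - b_yx * x_bar   # intercept of y on x
a_xy = x_bar - b_xy * y_bar   # intercept of x on y (unrounded b_xy)

# r is the geometric mean of the two coefficients and carries their sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"y = {a_yx:.2f} + ({b_yx:.2f}) x")
print(f"x = {a_xy:.2f} + ({b_xy:.2f}) y")
print(f"r = {r:.2f}")
```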
Example 9.2: The two lines of regression are, 8x – 10y + 66 = 0 and 40x – 18y – 214 = 0.
Find (a) the mean values of x and y, and (b) Correlation co-efficient between x and y.
Solution:
(a) The mean values of x and y
$(\bar{x}, \bar{y})$ is the point of intersection of the regression lines.
Then, 8x – 10y + 66 = 0 ... (1)
40x – 18y – 214 = 0 ... (2)
(1) × 5:     40x – 50y + 330 = 0
(2) × (−1): −40x + 18y + 214 = 0
Adding:      −32y + 544 = 0
y = −544 / −32
y = 17
Therefore, the mean of y is $\bar{y} = 17$.
Substituting y = 17 in (1),
8x – 10(17) = −66
8x – 170 = −66
8x = 104
x = 13
Therefore, the mean of x is $\bar{x} = 13$.
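(b) Correlation coefficient between x and y
Taking (1) as the regression line of y on x: 10y = 8x + 66, so $b_{yx} = \frac{8}{10}$.
Taking (2) as the regression line of x on y: 40x = 18y + 214, so $b_{xy} = \frac{18}{40}$.
(The opposite assignment would give $b_{yx} \times b_{xy} = \frac{40}{18} \times \frac{10}{8} > 1$, which is impossible
because $r^2 \le 1$.)
$$r = \pm\sqrt{b_{xy} \times b_{yx}} = \sqrt{\frac{18}{40} \times \frac{8}{10}} = \sqrt{0.36}$$
r = 0.6
Since both the regression coefficients are positive, r must be positive, so r = 0.6. The
variables x and y are positively correlated.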
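Both parts of Example 9.2 can be reproduced with a short Python sketch. The helper names `intersection` and `correlation_from_lines` and the tuple layout `(p, q, c)` for a line p·x + q·y + c = 0 are assumptions made for illustration. The second helper tries both ways of assigning the lines and keeps the one whose product of coefficients lies between 0 and 1.

```python
import math

# Each regression line is stored as (p, q, c), meaning p*x + q*y + c = 0.
line1 = (8, -10, 66)      # 8x - 10y + 66 = 0
line2 = (40, -18, -214)   # 40x - 18y - 214 = 0


def intersection(l1, l2):
    """Solve the two lines simultaneously (Cramer's rule)."""
    p1, q1, c1 = l1
    p2, q2, c2 = l2
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("lines are parallel or identical")
    x = (q1 * c2 - q2 * c1) / det
    y = (p2 * c1 - p1 * c2) / det
    return x, y


def correlation_from_lines(l1, l2):
    """Return r, choosing the assignment with 0 <= b_yx * b_xy <= 1."""
    for y_on_x, x_on_y in ((l1, l2), (l2, l1)):
        b_yx = -y_on_x[0] / y_on_x[1]   # slope after solving for y
        b_xy = -x_on_y[1] / x_on_y[0]   # slope after solving for x
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients.
            return math.copysign(math.sqrt(product), b_yx)
    raise ValueError("no valid assignment of regression lines")


x_bar, y_bar = intersection(line1, line2)
r = correlation_from_lines(line1, line2)
print(f"mean of x = {x_bar}, mean of y = {y_bar}, r = {r:.2f}")
```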
STATISTICAL CONCEPT
In statistics, the regression equation of y on x is given by
$y_i = a + b\,x_i + e_i$ ... (1)
where, $y_i$ is the ith observed value of the y variable,
$x_i$ is the ith observed value of the x variable,
$e_i$ is the residual (error) of the ith value.
$e_i = y_i - \hat{y}_i$
Therefore, the residual is the difference between the observed value and the predicted value.
Example 9.3: The height (cm) and yield (g) of 5 plants are given below.
Plants Height (cm) Yield (g)
Plant 1 1 3
Plant 2 2 4
Plant 3 3 7
Plant 4 4 8
Plant 5 5 13
1. Fit a regression equation.
2. Calculate the predicted values and residuals, and interpret them.
3. Also estimate y when the height of the plant is 2.5 cm.
Solution:
Plants    Height (cm) (x)   Yield (g) (y)   $x_i-\bar{x}$   $y_i-\bar{y}$   $(x_i-\bar{x})^2$   $(y_i-\bar{y})^2$   $(x_i-\bar{x})(y_i-\bar{y})$
Plant 1   1                 3               -2              -4              4                   16                  8
Plant 2   2                 4               -1              -3              1                   9                   3
Plant 3   3                 7               0               0               0                   0                   0
Plant 4   4                 8               1               1               1                   1                   1
Plant 5   5                 13              2               6               4                   36                  12
Total     15                35              0               0               10                  62                  24
Mean      3                 7               0               0               2                   12.4                4.8
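$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{15}{5} = 3 \text{ cm}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{35}{5} = 7 \text{ g}$$
Sum of products of deviations:
$$\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = 24$$
Sums of squares of deviations:
$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = 10, \qquad \sum_{i=1}^{n}(y_i - \bar{y})^2 = 62$$
Regression coefficient:
$$\hat{b}_{yx} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{24}{10} = 2.4$$
Intercept:
$$\hat{a}_{yx} = \bar{y} - \hat{b}_{yx}\,\bar{x} = 7 - (2.4 \times 3) = 7 - 7.2 = -0.2$$
The estimated regression equation is given by
$$\hat{y}_i = \hat{a} + \hat{b}\,x_i$$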
Interpretation:
The regression equation is given by $\hat{y}_i = -0.2 + 2.4\,x_i$. The regression coefficient is
2.4, i.e., a one-cm increase in plant height is associated with an estimated increase in yield of 2.4 g.
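The remaining parts of Example 9.3 (predicted values, residuals, and the estimate at a height of 2.5 cm) can be worked through numerically. The following is a minimal Python sketch using the data of the example; the variable names are illustrative, and the residual is taken as observed minus predicted, as in $e_i = y_i - \hat{y}_i$.

```python
# Data from Example 9.3.
heights = [1, 2, 3, 4, 5]     # x, plant height in cm
yields = [3, 4, 7, 8, 13]     # y, yield in g

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(yields) / n

# Slope and intercept of the regression of y on x.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(heights, yields))
s_xx = sum((xi - x_bar) ** 2 for xi in heights)
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar

# Predicted value and residual (observed minus predicted) for each plant.
predicted = [a_hat + b_hat * xi for xi in heights]
residuals = [yi - yh for yi, yh in zip(yields, predicted)]

for i, (yi, yh, ei) in enumerate(zip(yields, predicted, residuals), start=1):
    print(f"Plant {i}: observed = {yi}, predicted = {yh:.1f}, residual = {ei:.1f}")

# Estimated yield for a plant of height 2.5 cm.
estimate = a_hat + b_hat * 2.5
print(f"Estimated yield at 2.5 cm: {estimate:.1f} g")
```

Worked by hand, the predicted values are 2.2, 4.6, 7.0, 9.4 and 11.8 g, and the residuals are 0.8, −0.6, 0.0, −1.4 and 1.2 g, which sum to zero. A positive residual means the observed yield lies above the fitted line, and a negative residual means it lies below. The estimated yield at 2.5 cm is −0.2 + 2.4 × 2.5 = 5.8 g.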
***
EXERCISE: 9
1. The regression equation of y on x is .................
2. Regression coefficient lies between.................
3. The regression coefficient is a unitless measure. (True/False)
4. The regression coefficient will take the unit of the .....................
5. The regression coefficient is unaffected by change of .....................
6. The geometric mean of two regression coefficients is…………
7. Write the regression equation and expand its components.
8. Distinguish between regression and correlation.
9. Write the regression coefficient formula.
10. Define regression and its basic types.
11. What is error?
12. From the following data, find (i) the two regression lines, (ii) the coefficient of
correlation between the marks in Economics and Statistics.
Marks in Economics (x) 25 28 35 32 31 36 29 38 34 32
Marks in Statistics (y) 43 46 49 41 36 32 31 30 33 39
13. The regression equations are 3x + 2y = 26 and 6x + y = 31. Find the correlation
coefficient between X and Y.
14. The height (cm) and yield (g) of 5 plants were given below.
***