Simple Linear Regression and Its Properties 82

This document provides an introduction to simple linear regression and its properties. It defines regression as the relationship between two variables, where one is the independent variable (cause) and the other is the dependent variable (effect). The regression equation takes the form of y = a + bx, where y is the dependent variable, x is the independent variable, a is the intercept, and b is the slope or regression coefficient. Simple linear regression involves one independent variable, while multiple linear regression involves more than one. The properties of regression coefficients are discussed. Two examples are provided to demonstrate calculating regression equations and correlation coefficients from data.

Uploaded by

ttvignesuwar

LECTURE 9

SIMPLE LINEAR REGRESSION AND ITS PROPERTIES

INTRODUCTION
Regression is the functional relationship between two variables, one of which may
represent a cause and the other an effect. The variable representing the cause is known as
the independent variable and is denoted by X; it is also called the predictor variable or
regressor. The variable representing the effect is known as the dependent variable and is
denoted by Y; it is also called the predicted variable.
The regression equation of y on x is given by
y = a + bx
where, y is the dependent variable (y depends on x),
x is the independent variable (regressor),
b is the regression coefficient (or slope) of y on x,
a is the intercept,
a and b are called parameters or constants of the equation.

BASIC TYPES
1. Simple Linear regression
If there is only one independent variable in a regression equation, then it is
called simple linear regression.

2. Multiple linear regression
If there is more than one independent variable in a regression equation,
then it is called multiple linear regression.

SIMPLE LINEAR REGRESSION

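Regression of y on x (y depends on x)
The equation is given by
\[ y = a_{yx} + b_{yx}\,x \]
where
\[ b_{yx} = \frac{\mathrm{cov}(x, y)}{\mathrm{variance}(x)} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2}
= \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}}{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}
= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
and the intercept is
\[ a_{yx} = \bar{y} - b_{yx}\,\bar{x} \]

Regression of x on y (x depends on y)
The equation is given by
\[ x = a_{xy} + b_{xy}\,y \]
where
\[ b_{xy} = \frac{\mathrm{cov}(x, y)}{\mathrm{variance}(y)} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2}
= \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}}{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}
= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]
and the intercept is
\[ a_{xy} = \bar{x} - b_{xy}\,\bar{y} \]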
22AG2056 Statistical Methods (1+1) | 76

Note: Both regression lines pass through $(\bar{x}, \bar{y})$, so $(\bar{x}, \bar{y})$ is the point of
intersection of the regression lines.
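
The formulas above translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original lecture: the function name `regression_lines` and the four data points are illustrative assumptions. It computes both slopes and intercepts from the sums of squares and cross-products, then checks that each fitted line passes through the means.

```python
def regression_lines(xs, ys):
    """Return (b_yx, a_yx, b_xy, a_xy) for paired observations xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squared deviations.
    # The 1/(n-1) factors in cov and variance cancel in the ratio.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b_yx = s_xy / s_xx            # slope of y on x
    b_xy = s_xy / s_yy            # slope of x on y
    a_yx = y_bar - b_yx * x_bar   # intercept of y on x
    a_xy = x_bar - b_xy * y_bar   # intercept of x on y
    return b_yx, a_yx, b_xy, a_xy


# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
b_yx, a_yx, b_xy, a_xy = regression_lines(xs, ys)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
# Both lines should pass through (x_bar, y_bar).
on_line_yx = abs((a_yx + b_yx * x_bar) - y_bar) < 1e-9
on_line_xy = abs((a_xy + b_xy * y_bar) - x_bar) < 1e-9
print(on_line_yx, on_line_xy)
```

Working with the sums of deviations rather than the covariance and variance avoids having to choose between the n and n - 1 divisors, since the divisor cancels either way.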

Properties of regression coefficient


1. The correlation coefficient of x and y is the geometric mean of the two regression
coefficients, i.e.,
\[ r_{xy} = r_{yx} = \pm\sqrt{b_{xy} \times b_{yx}} \]
Note: The sign of r is the common sign of the two regression coefficients. If both coefficients
are negative, r takes the '-' sign; if both are positive, r takes the '+' sign.
2. Regression coefficients are independent of a change of origin but not of a change of scale.
3. A regression coefficient estimates the change in one variable for a unit change in
the other related variable.
4. A regression coefficient is expressed in the units of the dependent variable.
5. A regression coefficient lies between -∞ and +∞.
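
Property 2 can be checked numerically. The sketch below is a Python illustration rather than part of the lecture: the helper `slope_y_on_x`, the data, and the shift and scale constants are all assumptions. It recomputes b_yx after shifting both variables (change of origin) and after rescaling x (change of scale).

```python
def slope_y_on_x(xs, ys):
    """Regression coefficient b_yx from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


# Hypothetical data, for illustration only.
xs = [2, 4, 6, 8]
ys = [1, 3, 4, 8]

base = slope_y_on_x(xs, ys)

# Change of origin: add a constant to x and subtract one from y.
shifted = slope_y_on_x([x + 10 for x in xs], [y - 5 for y in ys])

# Change of scale: express x in units 100 times smaller (e.g. m to cm).
scaled = slope_y_on_x([x * 100 for x in xs], ys)

print(base, shifted, scaled)
```

The shifted slope equals the original one, while rescaling x by a factor of 100 divides the slope by 100. More generally, if u = (x - A)/h and v = (y - B)/k, then b_vu = (h/k) b_yx.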

Example 9.1: For the given data, calculate the two regression equations and find the
correlation coefficient using the regression coefficients.
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
Solution:
        x     y    x − x̄   y − ȳ   (x − x̄)²   (y − ȳ)²   (x − x̄)(y − ȳ)
        2    18     -4       8        16         64          -32
        4    12     -2       2         4          4           -4
        5    10     -1       0         1          0            0
        6     8      0      -2         0          4            0
        8     7      2      -3         4          9           -6
       11     5      5      -5        25         25          -25
Total  36    60      0       0        50        106          -67

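\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{36}{6} = 6, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{60}{6} = 10 \]

Regression coefficient of y on x ($b_{yx}$):
\[ b_{yx} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{-67}{50} = -1.34 \]

Regression coefficient of x on y ($b_{xy}$):
\[ b_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{-67}{106} \approx -0.63 \]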

Intercept of y on x ($a_{yx}$):
\[ a_{yx} = \bar{y} - b_{yx}\,\bar{x} = 10 - (-1.34 \times 6) = 10 + 8.04 = 18.04 \]

Intercept of x on y ($a_{xy}$):
\[ a_{xy} = \bar{x} - b_{xy}\,\bar{y} = 6 - (-0.63 \times 10) = 6 + 6.3 = 12.3 \]

Regression equation of y on x:
\[ y = a_{yx} + b_{yx}\,x \quad\Rightarrow\quad y = 18.04 - 1.34x \]

Regression equation of x on y:
\[ x = a_{xy} + b_{xy}\,y \quad\Rightarrow\quad x = 12.3 - 0.63y \]

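The correlation coefficient ($r_{xy}$)

\[ r_{xy} = r_{yx} = \pm\sqrt{b_{xy} \times b_{yx}} \]
\[ r = \pm\sqrt{(-1.34) \times (-0.63)} = \pm\sqrt{0.8442} \]
Since both regression coefficients are negative, r takes the negative sign:
\[ r = -\sqrt{0.8442} \approx -0.919 \]
The correlation coefficient (r) of x and y is about -0.919, so x and y are strongly negatively correlated.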
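
The arithmetic of Example 9.1 can be cross-checked with a short script. This is a sketch in Python, added for verification only; the variable names are assumptions, while the data are taken from the example. It rebuilds the three column totals of the table and applies the geometric-mean property, taking the sign of r from the regression coefficients.

```python
import math

# Data from Example 9.1.
x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Column totals of the deviation table.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b_yx = s_xy / s_xx   # regression coefficient of y on x
b_xy = s_xy / s_yy   # regression coefficient of x on y

# r is the geometric mean of the two coefficients, with their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(s_xy, s_xx, s_yy)
print(round(b_yx, 2), round(b_xy, 2), round(r, 3))
```

Because the script keeps b_xy unrounded, it gives r close to -0.920, slightly different from a hand calculation that uses b_xy rounded to -0.63.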

Example 9.2: The two lines of regression are, 8x – 10y + 66 = 0 and 40x – 18y – 214 = 0.
Find (a) the mean values of x and y, and (b) Correlation co-efficient between x and y.
Solution:
(a) The mean values of x and y
$(\bar{x}, \bar{y})$ is the point of intersection of the regression lines, so it satisfies both equations:
8x – 10y + 66 = 0 ... (1)
40x – 18y – 214 = 0 ... (2)
Multiplying (1) by 5 and (2) by -1 and adding:
(1) × 5:     40x – 50y + 330 = 0
(2) × (-1): -40x + 18y + 214 = 0
Sum:              -32y + 544 = 0
y = -544 / -32
y = 17
Therefore, the mean of y is $\bar{y} = 17$.
Substituting y = 17 in (1),
8x – 10(17) = -66
8x – 170 = -66
8x = 104
x = 13
Therefore, the mean of x is $\bar{x} = 13$.

(b) Correlation co-efficient between x and y

From (1):
10y = 8x + 66
\[ y = \frac{8}{10}x + \frac{66}{10} \]
which is in y on x form. Hence, $b_{yx} = \frac{8}{10}$.

From (2):
40x = 18y + 214
\[ x = \frac{18}{40}y + \frac{214}{40} \]
which is in x on y form. Hence, $b_{xy} = \frac{18}{40}$.

Then, the correlation coefficient (r) is
\[ r_{xy} = r_{yx} = \pm\sqrt{b_{xy} \times b_{yx}} \]
\[ r = \pm\sqrt{\frac{18}{40} \times \frac{8}{10}} = \pm\sqrt{0.36} = \pm 0.6 \]
Since both regression coefficients are positive, r must be positive, so r = 0.6. The
variables x and y are positively correlated.
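
Example 9.2 can also be checked mechanically. The sketch below is in Python and is not part of the original solution: the tuple representation of a line and the helper names are assumptions. It finds the intersection by Cramer's rule and reads the two regression coefficients off the line equations. It also shows why line (1) is taken as y on x: with the opposite assignment the product of the coefficients would exceed 1, which is impossible because that product equals r squared.

```python
from fractions import Fraction
import math

# Each line is stored as (p, q, c), meaning p*x + q*y + c = 0.
line1 = (8, -10, 66)
line2 = (40, -18, -214)


def intersection(l1, l2):
    """Solve the two linear equations by Cramer's rule; returns (x, y)."""
    p1, q1, c1 = l1
    p2, q2, c2 = l2
    det = p1 * q2 - p2 * q1
    x = Fraction(q1 * c2 - q2 * c1, det)
    y = Fraction(c1 * p2 - c2 * p1, det)
    return x, y


def slope_y_on_x(line):
    """Read the line as y = -(p/q) x - c/q and return the coefficient of x."""
    p, q, _ = line
    return Fraction(-p, q)


def slope_x_on_y(line):
    """Read the line as x = -(q/p) y - c/p and return the coefficient of y."""
    p, q, _ = line
    return Fraction(-q, p)


x_bar, y_bar = intersection(line1, line2)

b_yx = slope_y_on_x(line1)
b_xy = slope_x_on_y(line2)
product = b_yx * b_xy

# The opposite assignment gives a product above 1, so it is rejected.
swapped_product = slope_x_on_y(line1) * slope_y_on_x(line2)

r = math.copysign(math.sqrt(float(product)), float(b_yx))
print(x_bar, y_bar, b_yx, b_xy, product, swapped_product, r)
```

Exact fractions are used so that the means and the product of the coefficients come out without rounding error.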

STATISTICAL CONCEPT
In statistics, the regression equation of y on x is given by
\[ y_i = a + b x_i + e_i \qquad \ldots (1) \]
where, $y_i$ is the ith observed value of the y variable,
$x_i$ is the ith observed value of the x variable,
$e_i$ is the residual (error) of the ith observation.

Then, the estimated regression equation is given by
\[ \hat{y}_i = \hat{a} + \hat{b} x_i \qquad \ldots (2) \]
where $\hat{y}_i$ is the ith predicted value of the y variable.

From equations (1) and (2), the residual of the regression equation is given by
\[ e_i = y_i - (\hat{a} + \hat{b} x_i) \]
\[ e_i = y_i - \hat{y}_i \]
Therefore, the residual is the difference between the observed value and the predicted value (observed minus predicted).
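
One consequence of this definition is worth writing out. It is given here as a supplementary derivation, not as part of the original lecture. When the intercept is estimated as $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$, as in the intercept formula given earlier, the residuals of the fitted line sum to zero:

```latex
\begin{align*}
\sum_{i=1}^{n} e_i
  &= \sum_{i=1}^{n} \left( y_i - \hat{a} - \hat{b} x_i \right) \\
  &= n\bar{y} - n\hat{a} - \hat{b}\, n\bar{x} \\
  &= n\left( \bar{y} - \hat{a} - \hat{b}\,\bar{x} \right) \\
  &= 0 \qquad \text{since } \hat{a} = \bar{y} - \hat{b}\,\bar{x}.
\end{align*}
```

This gives a quick check on hand calculations: a residual column whose total is not (approximately) zero signals an arithmetic slip.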

Example 9.3: The height (cm) and yield (g) of 5 plants are given below.
Plants    Height (cm)   Yield (g)
Plant 1        1            3
Plant 2        2            4
Plant 3        3            7
Plant 4        4            8
Plant 5        5           13
1. Fit a regression equation.
2. Calculate the predicted values and residuals, and interpret the result.
3. Also estimate y when the height of the plant is 2.5 cm.

Solution:
Plants    Height (cm) (x)   Yield (g) (y)   xi − x̄   yi − ȳ   (xi − x̄)²   (yi − ȳ)²   (xi − x̄)(yi − ȳ)
Plant 1         1                 3           -2       -4         4          16            8
Plant 2         2                 4           -1       -3         1           9            3
Plant 3         3                 7            0        0         0           0            0
Plant 4         4                 8            1        1         1           1            1
Plant 5         5                13            2        6         4          36           12
Total          15                35            0        0        10          62           24
Mean            3                 7            0        0         2          12.4          4.8

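\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{15}{5} = 3 \text{ cm}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{35}{5} = 7 \text{ g} \]
The sum of cross-products (the numerator of the covariance) is
\[ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 24 \]
and the sums of squared deviations are
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 = 10, \qquad \sum_{i=1}^{n} (y_i - \bar{y})^2 = 62 \]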

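1. Fitting a regression equation.

The estimated regression equation is given by
\[ \hat{y}_i = \hat{a} + \hat{b} x_i \]
The regression coefficient is
\[ \hat{b}_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{24}{10} = 2.4 \]
and the intercept is
\[ \hat{a}_{yx} = \bar{y} - \hat{b}_{yx}\,\bar{x} = 7 - (2.4 \times 3) = 7 - 7.2 = -0.2 \]
Hence the estimated regression equation is
\[ \hat{y}_i = -0.2 + 2.4 x_i \]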

2. Calculate the predicted values and residuals.

The predicted values are obtained from the estimated regression equation of y on x,
\[ \hat{y}_i = \hat{a} + \hat{b} x_i = -0.2 + 2.4 x_i \]
and the residuals are calculated as
\[ e_i = y_i - \hat{y}_i \]

Plants    Height (cm) (x)   Yield (g) (y)   ŷi = −0.2 + 2.4 xi   ei = yi − ŷi
Plant 1         1                 3                2.2                0.8
Plant 2         2                 4                4.6               -0.6
Plant 3         3                 7                7.0                0.0
Plant 4         4                 8                9.4               -1.4
Plant 5         5                13               11.8                1.2
Total          15                35                                   0.0

Interpretation:
The estimated regression equation is $\hat{y}_i = -0.2 + 2.4 x_i$. The regression coefficient is
2.4, i.e., a one cm increase in plant height may increase the yield by 2.4 g.

3. Estimate y when the height of the plant is 2.5 cm.

The estimated regression equation is given by
\[ \hat{y}_i = \hat{a} + \hat{b} x_i = -0.2 + 2.4 x_i \]
If x = 2.5 cm, the estimated value is
\[ \hat{y} = -0.2 + (2.4 \times 2.5) = -0.2 + 6 = 5.8 \text{ g} \]
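
The three parts of Example 9.3 can be reproduced with a short script. This is a minimal Python sketch added for checking the hand calculation; the variable names and the `predict` helper are assumptions, and the data come from the example.

```python
# Data from Example 9.3.
heights = [1, 2, 3, 4, 5]    # x, in cm
yields = [3, 4, 7, 8, 13]    # y, in g

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(yields) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, yields))
s_xx = sum((x - x_bar) ** 2 for x in heights)

b_hat = s_xy / s_xx              # estimated slope
a_hat = y_bar - b_hat * x_bar    # estimated intercept


def predict(x):
    """Predicted yield (g) for a plant of height x (cm)."""
    return a_hat + b_hat * x


predicted = [predict(x) for x in heights]
residuals = [y - y_hat for y, y_hat in zip(yields, predicted)]

for x, y, y_hat, e in zip(heights, yields, predicted, residuals):
    print(x, y, round(y_hat, 1), round(e, 1))

print("sum of residuals:", round(sum(residuals), 6))
print("estimate at 2.5 cm:", round(predict(2.5), 1))
```

The rounded predicted values and residuals should match the table in part 2, and the residual total should be zero up to floating-point rounding. Note that 2.5 cm lies inside the observed height range of 1 to 5 cm. Estimates far outside that range would rest on the untested assumption that the same straight line continues to hold.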

***

EXERCISE: 9
1. The regression equation of y on x is .................
2. Regression coefficient lies between.................
3. The regression coefficient is a unitless measure. (True/False)
4. The regression coefficient will take the unit of the .....................
5. The regression coefficient is unaffected by change of .....................
6. The geometric mean of two regression coefficients is…………
7. Write the regression equation and expand its components.
8. Distinguish between regression and correlation.
9. Write the regression coefficient formula.
10. Define regression and its basic types.
11. What is error?
12. From the following data, find (i) the two regression lines, (ii) the coefficient of
correlation between the marks in Economics and Statistics.
Marks in Economics (x) 25 28 35 32 31 36 29 38 34 32
Marks in Statistics (y) 43 46 49 41 36 32 31 30 33 39

13. The regression equations are 3x + 2y = 26 and 6x + y = 31. Find the correlation
coefficient between X and Y.

14. The height (cm) and yield (g) of 5 plants are given below.

Plants    Height (cm)   Yield (g)
Plant 1       1.5          2
Plant 2       2            3.5
Plant 3       2.5          4
Plant 4       3            6.5
1. Fit a regression equation.
2. Calculate the predicted values and residuals, and interpret the result.
3. Also estimate y when the height of the plant is 2.75 cm.

***
