
Unit 2-Part 3-Linear Regression

- In Example 2, the coefficient of correlation (r) between the x and y values is about 0.806 - The equation of the regression line of y on x is: Y = 1.1 + 1.3X - The equation of the regression line of x on y is: X = 0.5Y + 0.5

Uploaded by

sasuke Uchiha
Copyright
© © All Rights Reserved

Regression

Example
- A researcher believes that there is a linear relationship between the BMI (kg/m²) of pregnant mothers and the birth weight (BW, in kg) of their newborns.

- A researcher also says that there is a linear relationship between study hours and the student's result.

Study Hours  Regents Score
3 80
5 90
2 75
6 80
7 90
1 50
2 65
7 85
1 40
7 100
Scatter Diagram
- A scatter diagram is a graphical method for displaying the relationship between two variables.
- A scatter diagram plots pairs of bivariate observations (x, y) on the X-Y plane.
- Y is called the dependent variable.
- X is called the independent variable.

Is there a linear relationship between study hours and result?
- Scatter diagrams are important for initial exploration of the relationship between two quantitative variables.
- In the above example, we may wish to summarize this relationship by a straight line drawn through the scatter of points.
Regression
- Regression is the estimation or prediction of unknown values of one variable from known values of another variable.
- The variable whose value is to be predicted is called the dependent variable, and the variable used for prediction is called the independent variable.
- If the scatter diagram indicates some relationship between two variables x and y, then the dots of the scatter diagram will be concentrated around a curve.
- This curve is called the curve of regression, and the relationship is said to be expressed by means of curvilinear regression.
- In the particular case when the curve is a straight line, it is called a line of regression and the regression is said to be linear.
- The equation of the line of regression of y on x is
  $y = a + bx$ ---eq 1
  where y is the dependent variable and x is the independent variable.
- The line of regression always passes through the point $(\bar{x}, \bar{y})$, so
  $\bar{y} = a + b\bar{x}$ ---eq 2
- $b = b_{yx}$ is the slope of the line (the coefficient of regression of y on x):
  $b_{yx} = r \dfrac{\sigma_y}{\sigma_x}$
  where r is Karl Pearson's coefficient of correlation, $r = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$
(Figure: line of regression of y on x)
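- The equation of the line of regression of x on y is
  $x = a + by$ ---eq 1
  where x is the dependent variable and y is the independent variable.
- The line of regression always passes through the point $(\bar{x}, \bar{y})$, so
  $\bar{x} = a + b\bar{y}$ ---eq 2
- $b = b_{xy}$ is the slope of the line (the coefficient of regression of x on y):
  $b_{xy} = r \dfrac{\sigma_x}{\sigma_y}$
(Figure: line of regression of x on y)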
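The definitions of r and the two slopes translate directly into a few lines of code. The following is a minimal sketch, assuming population (divide-by-n) covariance and standard deviations; the function name `regression_coefficients` is illustrative, and the data are the study-hours table from the opening example.

```python
from math import sqrt


def regression_coefficients(xs, ys):
    """Return (r, b_yx, b_xy) computed from cov(x, y), sigma_x and sigma_y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Population (divide-by-n) moments: an assumption, since the slides do not
    # state the divisor. Using n - 1 throughout would give the same r and slopes.
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sigma_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    r = cov / (sigma_x * sigma_y)
    b_yx = r * sigma_y / sigma_x  # slope of the line of y on x
    b_xy = r * sigma_x / sigma_y  # slope of the line of x on y
    return r, b_yx, b_xy


hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]
r, b_yx, b_xy = regression_coefficients(hours, scores)
print(f"r = {r:.3f}, b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")
```

Note that the two slopes are generally different: the line of y on x and the line of x on y coincide only when r = ±1.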
Example

The model has a deterministic component and a probabilistic component.

(Figure: house cost plotted against house size. Most lots sell for $25,000. Building a house costs about $75 per square foot. House cost = 25000 + 75(Size))
Example

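However, house costs vary even among houses of the same size! Since costs behave unpredictably, we add a random component.

(Figure: house cost plotted against house size, with the points scattered around the line. Most lots sell for $25,000. House cost = 25000 + 75(Size) + ε, where ε is the random component)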
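The split between the deterministic and the random component can be sketched in code. This is only an illustration: the lot price and cost per square foot come from the slide, while the normal distribution and the `noise_sd` value of 10,000 are assumptions made for the sake of the example.

```python
import random

LOT_PRICE = 25_000    # "Most lots sell for $25,000"
COST_PER_SQ_FT = 75   # "about $75 per square foot"


def deterministic_cost(size_sq_ft):
    """Deterministic component: House cost = 25000 + 75(Size)."""
    return LOT_PRICE + COST_PER_SQ_FT * size_sq_ft


def simulated_cost(size_sq_ft, noise_sd=10_000, rng=random):
    """Deterministic component plus a random component.

    The normal noise and the default noise_sd are hypothetical choices.
    """
    return deterministic_cost(size_sq_ft) + rng.gauss(0, noise_sd)


rng = random.Random(0)  # fixed seed so the run is repeatable
print(deterministic_cost(2000))
for _ in range(3):
    # Same house size, different simulated costs.
    print(round(simulated_cost(2000, rng=rng)))
```

Every simulated house has the same size, yet the costs differ; the regression line describes only the deterministic part.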
Estimating the Coefficients

- The estimates are determined by
  - drawing a sample from the population of interest,
  - calculating sample statistics,
  - producing a straight line that cuts into the data.

Question: What should be considered a good line?
(Figure: scatter of sample points in the x-y plane)
The Least Squares (Regression) Line

A good line is one that minimizes the sum of squared differences between the points and the line.
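To make the criterion concrete, the sketch below computes the sum of squared differences for the least squares line and for two hand-picked alternative lines, using the study-hours data from the opening example. The alternative lines are hypothetical, and the slope is computed as Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², which is algebraically the same as r σy / σx.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical differences between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Intercept a and slope b of the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


hours = [3, 5, 2, 6, 7, 1, 2, 7, 1, 7]
scores = [80, 90, 75, 80, 90, 50, 65, 85, 40, 100]

a, b = least_squares_line(hours, scores)
best = sum_squared_errors(hours, scores, a, b)

# Two hand-picked alternative lines (hypothetical, for comparison only).
for alt_a, alt_b in [(50.0, 5.0), (40.0, 8.0)]:
    alt = sum_squared_errors(hours, scores, alt_a, alt_b)
    print(f"y = {alt_a} + {alt_b}x: SSE = {alt:.1f} (least squares: {best:.1f})")
```

No other straight line gives a smaller sum of squares on this data than the one returned by `least_squares_line`.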
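y on x
$y = a + bx$; substitute the values of a and b:

$a = \bar{y} - b\bar{x}$, $b = r \dfrac{\sigma_y}{\sigma_x}$

$y - \bar{y} = r \dfrac{\sigma_y}{\sigma_x} (x - \bar{x})$

x on y
$x = a + by$; substitute the values of a and b:

$a = \bar{x} - b\bar{y}$, $b = r \dfrac{\sigma_x}{\sigma_y}$

$x - \bar{x} = r \dfrac{\sigma_x}{\sigma_y} (y - \bar{y})$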
Regression
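LINE OF REGRESSION:
- The equation of the line of regression of y on x is $y = a + bx$, with $\bar{y} = a + b\bar{x}$ (as the line of regression passes through $(\bar{x}, \bar{y})$):
  $y - \bar{y} = r \dfrac{\sigma_y}{\sigma_x} (x - \bar{x})$
- The equation of the line of regression of x on y is
  $x - \bar{x} = r \dfrac{\sigma_x}{\sigma_y} (y - \bar{y})$

where $r = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$

Both lines pass through $(\bar{x}, \bar{y})$.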
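Regression Coefficients

- The slope of the line of regression of Y on X is also called the coefficient of regression of Y on X.
- Regression coefficient of Y on X: $b_{YX} = r \dfrac{\sigma_Y}{\sigma_X}$
- Regression coefficient of X on Y: $b_{XY} = r \dfrac{\sigma_X}{\sigma_Y}$

Properties of Regression Coefficients
1. The correlation coefficient (r) is the geometric mean of the regression coefficients:
$b_{YX} = r \dfrac{\sigma_Y}{\sigma_X}$ ---eq 1 and $b_{XY} = r \dfrac{\sigma_X}{\sigma_Y}$ ---eq 2

Multiplying eq 1 and eq 2: $b_{YX} \times b_{XY} = r^2$, so $r = \pm\sqrt{b_{YX}\, b_{XY}}$, where r takes the common sign of the two regression coefficients.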
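The geometric-mean property can be turned into a small helper. This is a sketch; the function name `correlation_from_coefficients` is illustrative. Because both regression coefficients carry the sign of cov(x, y), the square root is given their common sign, and a product outside [0, 1] is rejected as inconsistent.

```python
from math import copysign, sqrt


def correlation_from_coefficients(b_yx, b_xy):
    """Recover r from the two regression coefficients via b_yx * b_xy = r**2."""
    product = b_yx * b_xy
    if product < 0 or product > 1:
        # Opposite signs, or r**2 > 1: not a valid pair of regression coefficients.
        raise ValueError("b_yx * b_xy must lie between 0 and 1")
    # r has the same sign as the regression coefficients.
    return copysign(sqrt(product), b_yx)


print(round(correlation_from_coefficients(0.5, 0.375), 3))   # 0.433
print(round(correlation_from_coefficients(-0.6, -0.5), 4))   # -0.5477
```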
Example 1
Obtain the following for given data:
1) the least square regression line of y on x
2) line of regression of x on y
3) Also obtain an estimate of y for x = 8
4) Estimate value of X for Y = 2
5) Calculate the coefficient of correlation (r)

x 3 4 5 6 4 5 6 7
y 3 5 3 2 3 4 6 6
Xmean = ΣX/n = 40/8 = 5
Ymean = ΣY/n = 32/8 = 4

X    Y    X-Xmean  Y-Ymean  (X-Xmean)^2  (Y-Ymean)^2  (X-Xmean)(Y-Ymean)
3    3    -2       -1       4            1            2
4    5    -1       1        1            1            -1
5    3    0        -1       0            1            0
6    2    1        -2       1            4            -2
4    3    -1       -1       1            1            1
5    4    0        0        0            0            0
6    6    1        2        1            4            2
7    6    2        2        4            4            4
40   32   0        0        12           16           6       (column sums)

Coefficient of regression of y on x is:
$b_{YX} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{6}{12} = 0.5$

Line of regression of y on x is:
Y - 4 = 0.5 (X - 5)
Y - 4 = 0.5 X - 2.5
Y = 0.5 X - 2.5 + 4
Y = 1.5 + 0.5 X
Also, the coefficient of regression of x on y is:
$b_{XY} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \dfrac{6}{16} = 0.375$

Line of regression of x on y is:

X - 5 = 0.375 (Y - 4)
X = 5 - 1.5 + 0.375 Y
X = 3.5 + 0.375 Y
Find Y when X = 8:

Putting X = 8 in the line of regression of Y on X:
Y = 0.5(8) + 1.5 = 5.5

Find X when Y = 2:

Putting Y = 2 in the line of regression of X on Y:
X = 3.5 + 0.375 × 2
X = 4.25

Coefficient of correlation (r):

as $b_{YX} \times b_{XY} = r^2$,
$r = (0.5 \times 0.375)^{1/2} \approx 0.43$
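The whole of Example 1 can be checked numerically. The sketch below recomputes the column sums, both regression coefficients, the two predictions and r; the variable names are illustrative.

```python
from math import sqrt

xs = [3, 4, 5, 6, 4, 5, 6, 7]
ys = [3, 5, 3, 2, 3, 4, 6, 6]
n = len(xs)

x_mean = sum(xs) / n  # 40/8 = 5
y_mean = sum(ys) / n  # 32/8 = 4

# Column sums of the working table.
s_xx = sum((x - x_mean) ** 2 for x in xs)                        # 12
s_yy = sum((y - y_mean) ** 2 for y in ys)                        # 16
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))  # 6

b_yx = s_xy / s_xx  # 0.5
b_xy = s_xy / s_yy  # 0.375

y_at_8 = y_mean + b_yx * (8 - x_mean)  # from the line of y on x
x_at_2 = x_mean + b_xy * (2 - y_mean)  # from the line of x on y
r = sqrt(b_yx * b_xy)                  # positive, since both slopes are positive

print(b_yx, b_xy, y_at_8, x_at_2, round(r, 2))  # 0.5 0.375 5.5 4.25 0.43
```

Each prediction uses the line whose dependent variable is the one being estimated: Y is predicted from the line of y on x, and X from the line of x on y.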
Example 2

For the following values of x and y:

x 1 2 3 4 5
y 2 5 3 8 7

- Calculate Karl Pearson's coefficient of correlation (r)
- Obtain the equation of the regression line of x on y
- Obtain the equation of the regression line of y on x
- Find r using either of the following:
  $r = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$
  or
- Find bYX and bXY, then find r [as $b_{YX} \times b_{XY} = r^2$]
Xmean = ΣX/n = 15/5 = 3
Ymean = ΣY/n = 25/5 = 5

Xi   Yi   Xi-Xmean  Yi-Ymean  (Xi-Xmean)^2  (Yi-Ymean)^2  (Xi-Xmean)(Yi-Ymean)
1    2    -2        -3        4             9             6
2    5    -1        0         1             0             0
3    3    0         -2        0             4             0
4    8    1         3         1             9             3
5    7    2         2         4             4             4
15   25   0         0         10            26            13      (column sums)

For regression line calculation:
$b_{YX} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{13}{10} = 1.3$

$b_{XY} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (Y_i - \bar{Y})^2} = \dfrac{13}{26} = 0.5$

Line of regression of y on x is

Y - 5 = 1.3 (X - 3)
Y = 1.3 X - 3.9 + 5
Y = 1.1 + 1.3 X
Line of regression of x on y is

X - 3 = 0.5 (Y - 5)
X = 0.5 Y + 0.5

Calculate the value of r:
as $b_{YX} \times b_{XY} = r^2$,
$r = (1.3 \times 0.5)^{1/2} = +0.806$
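Example 2 offers two routes to r. The sketch below follows both on the same data and confirms that they agree. The direct route is written with sums of deviations, which equals cov(x, y) / (σx σy) because the factors of n cancel; the variable names are illustrative.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
n = len(xs)
x_mean = sum(xs) / n  # 3
y_mean = sum(ys) / n  # 5

s_xx = sum((x - x_mean) ** 2 for x in xs)                        # 10
s_yy = sum((y - y_mean) ** 2 for y in ys)                        # 26
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))  # 13

# Route 1: r = cov(x, y) / (sigma_x * sigma_y), with the 1/n factors cancelled.
r_direct = s_xy / sqrt(s_xx * s_yy)

# Route 2: geometric mean of the regression coefficients (both positive here).
b_yx = s_xy / s_xx  # 1.3
b_xy = s_xy / s_yy  # 0.5
r_geometric = sqrt(b_yx * b_xy)

print(round(r_direct, 3), round(r_geometric, 3))  # 0.806 0.806
```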
Example 3

The regression line of y on x for a certain bivariate data set is 5y + 3x = 52, and the line of regression of x on y is 2x + y = 30. Find:

i) The arithmetic means of x and y
ii) The coefficient of correlation between x and y
iii) The most probable value of y when x = 10
iv) The most probable value of x when y = 5
Example 3
i) Both lines pass through (Xmean, Ymean), so solve the given equations:

3x + 5y = 52 → 3x + 5y = 52
2x + y = 30 → 10x + 5y = 150 (multiply by 5)
Subtract the first equation from the second: 7x = 98 → x = 14

Calculate y: 2 × 14 + y = 30 (put x = 14 in the second equation) → y = 2

AM of x = 14 and AM of y = 2

ii) Y on X: 5y + 3x = 52 → 5y = 52 - 3x → y = (52/5) + (-3/5) x, so bYX = -3/5
X on Y: 2x + y = 30 → 2x = 30 - y → x = 15 + (-1/2) y, so bXY = -1/2

as $b_{YX} \times b_{XY} = r^2$, $r^2 = 0.3$. Both regression coefficients are negative, so r = -0.5477

iii) The most probable value of y when x = 10 (from the line of y on x):
y = (-3/5) x + 52/5 = (-3/5) × 10 + 52/5 = 4.4

iv) The most probable value of x when y = 5 (from the line of x on y):
2x + y = 30 → x = 15 - 5/2 = 12.5
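The steps of Example 3 can be reproduced with exact fractions. This is a sketch under the representation assumed here: each line is stored as a tuple (p, q, c) meaning p·x + q·y = c, and the helper `intersection` is an illustrative name that applies Cramer's rule.

```python
from fractions import Fraction
from math import sqrt


def intersection(line_1, line_2):
    """Solve two lines given as (p, q, c), meaning p*x + q*y = c (Cramer's rule)."""
    (p1, q1, c1), (p2, q2, c2) = line_1, line_2
    det = p1 * q2 - p2 * q1
    return Fraction(c1 * q2 - c2 * q1, det), Fraction(p1 * c2 - p2 * c1, det)


y_on_x = (3, 5, 52)  # 3x + 5y = 52
x_on_y = (2, 1, 30)  # 2x + y = 30

# Both regression lines pass through (x_mean, y_mean).
x_mean, y_mean = intersection(y_on_x, x_on_y)

b_yx = Fraction(-y_on_x[0], y_on_x[1])  # y = c/q - (p/q) x, slope -3/5
b_xy = Fraction(-x_on_y[1], x_on_y[0])  # x = c/p - (q/p) y, slope -1/2

r = sqrt(float(b_yx * b_xy))
if b_yx < 0:
    r = -r  # r carries the sign of the regression coefficients

y_at_10 = Fraction(y_on_x[2] - y_on_x[0] * 10, y_on_x[1])  # use y on x
x_at_5 = Fraction(x_on_y[2] - x_on_y[1] * 5, x_on_y[0])    # use x on y

print(x_mean, y_mean)                 # 14 2
print(round(r, 4))                    # -0.5477
print(float(y_at_10), float(x_at_5))  # 4.4 12.5
```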
Example 4
- In a partially destroyed lab record of an analysis of correlation data, only the following results are readable:
  - Variance of x = 9
  - Regression equations: 8x - 10y + 66 = 0, 40x - 18y = 214
Find the value of
a) the correlation coefficient between x and y,
b) the mean values of x and y
c) standard deviation of y

Answers: r = 0.6, mean of x = 13, mean of y = 17, SD of y = 4 (variance = SD², so σX = √9 = 3)

a) Let us take
Line of y on x: 8x - 10y + 66 = 0 → y = 6.6 + 0.8 x → bYX = 0.8
Line of x on y: 40x - 18y = 214 → x = (214/40) + (18/40) y → bXY = 18/40
(The opposite assignment would give bYX × bXY = (40/18) × (10/8) > 1, which is impossible because r² ≤ 1.)
as $b_{YX} \times b_{XY} = r^2$,
r² = 0.8 × 18/40 = 0.4 × 0.9 = 0.36 → r = +0.6 (positive, since both coefficients are positive)

b) Since both lines pass through the point (Xmean, Ymean), solving the two equations 8x - 10y + 66 = 0 and 40x - 18y = 214 gives
Xmean = 13 and Ymean = 17

c) bYX = r σY / σX
0.8 = 0.6 × σY / 3
σY = 0.8 × 3 / 0.6 = 4
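Example 4 does not say which equation is the line of y on x. One way to settle it, sketched below, is to try both role assignments and keep the one for which the product of the slopes lies between 0 and 1. The tuple representation (p, q, c) for p·x + q·y = c and the helper name `slopes` are assumptions made for this illustration.

```python
from fractions import Fraction
from math import sqrt

line_a = (8, -10, -66)   # 8x - 10y + 66 = 0, written as 8x - 10y = -66
line_b = (40, -18, 214)  # 40x - 18y = 214


def slopes(y_on_x, x_on_y):
    """Regression coefficients if the two lines play the given roles."""
    b_yx = Fraction(-y_on_x[0], y_on_x[1])  # solve the first line for y
    b_xy = Fraction(-x_on_y[1], x_on_y[0])  # solve the second line for x
    return b_yx, b_xy


# Try both assignments; only one can satisfy 0 <= r^2 <= 1 here.
valid = []
for y_on_x, x_on_y in ((line_a, line_b), (line_b, line_a)):
    b_yx, b_xy = slopes(y_on_x, x_on_y)
    if 0 <= b_yx * b_xy <= 1:
        valid.append((b_yx, b_xy))

b_yx, b_xy = valid[0]
r = sqrt(float(b_yx * b_xy))         # positive root: both slopes are positive
sigma_x = sqrt(9)                    # variance of x is 9
sigma_y = float(b_yx) * sigma_x / r  # from b_yx = r * sigma_y / sigma_x

print(b_yx, b_xy, round(r, 3), round(sigma_y, 3))
```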
Mean x= 67.657
Mean y = 68.70
r=0.513
Practice Questions
- The two regression lines obtained from certain data were y = x + 5 and 16x = 9y - 94. Find the variance of X given that the variance of Y is 16. Also find the covariance between X and Y.
(Ans: variance of X = 9, cov = 9)
(Hint: cov(x, y) / (σx σy) = r)
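Given: y = x + 5 → bYX = 1
16x = 9y - 94 → x = (9/16) y - 94/16 → bXY = 9/16
as $b_{YX} \times b_{XY} = r^2$,
$r = (1 \times 9/16)^{1/2} = 3/4$ ---eq 1

Variance(y) = 16, so $\sigma_Y = (16)^{1/2} = 4$ ---eq 2

bYX = r σY / σX → 1 = (3/4) × 4 / σX → σX = 3, so variance of X = 9
cov(x, y) = r σX σY = (3/4) × 3 × 4 = 9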
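The stated answers (variance of X = 9, cov = 9) can be confirmed with a few lines of arithmetic. This is a sketch with illustrative variable names; it takes y = x + 5 as the line of y on x, as in the working.

```python
from math import sqrt

b_yx = 1.0     # from y = x + 5
b_xy = 9 / 16  # from 16x = 9y - 94, i.e. x = (9/16) y - 94/16
var_y = 16

r = sqrt(b_yx * b_xy)       # both slopes positive, so r = +3/4
sigma_y = sqrt(var_y)       # 4
sigma_x = r * sigma_y / b_yx  # rearranged from b_yx = r * sigma_y / sigma_x
var_x = sigma_x ** 2
cov_xy = r * sigma_x * sigma_y  # rearranged from the hint cov / (sx * sy) = r

print(r, var_x, cov_xy)  # 0.75 9.0 9.0
```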
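Practice Questions

- Find the regression line of y on x for the following data:

x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9

Ans: y = 0.636 x + 0.545 (the intercept comes out as 0.548 if the slope is rounded to 0.636 before the intercept is computed)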
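The answer to this practice question can be checked as follows. The sketch computes the slope and intercept at full precision, and also shows the intercept obtained when the slope is rounded to three decimals first; the variable names are illustrative.

```python
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(xs)

x_mean = sum(xs) / n  # 56/8 = 7
y_mean = sum(ys) / n  # 40/8 = 5

s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))  # 84
s_xx = sum((x - x_mean) ** 2 for x in xs)                        # 132

b = s_xy / s_xx         # slope b_yx = 84/132
a = y_mean - b * x_mean  # intercept, since the line passes through the means

# Intercept if the slope is rounded to 0.636 before use.
a_from_rounded_slope = y_mean - round(b, 3) * x_mean

print(f"y = {b:.3f} x + {a:.3f}")      # y = 0.636 x + 0.545
print(f"{a_from_rounded_slope:.3f}")   # 0.548
```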
