Linear Regression
WHAT IS REGRESSION?
LINEAR: the relationship between the variables is represented by a straight line.
REGRESSION: to estimate or predict.
Example:
Production in summer
Production in winter
REGRESSION:
It is used to predict or estimate the value of one variable (the dependent variable) using a second variable (the independent variable).
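REMARKS:
1) Point of intersection of the two lines of regression
Both regression equations pass through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ is the mean of the x series and $\bar{y}$ is the mean of the y series.
Line of y on x: y = mx + c, that is, $y - \bar{y} = b_{yx}(x - \bar{x})$
Line of x on y: x = my + c, that is, $x - \bar{x} = b_{xy}(y - \bar{y})$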
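REGRESSION COEFFICIENTS

When the original values are used:

Regression coefficient of y on x:
$$b_{yx} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$

Regression coefficient of x on y:
$$b_{xy} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$

When deviations are taken from the means ($d_x = x - \bar{x}$, $d_y = y - \bar{y}$):
$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}, \qquad b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2}$$

Regression coefficients and the coefficient of correlation:
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
Multiplying the two gives $b_{yx} \times b_{xy} = r^2$.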
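The original-value form and the deviation form of the coefficients should give the same numbers. The Python sketch below checks this on a small made-up data set. The function names and the sample values are assumptions chosen for illustration and are not taken from the slides.

```python
def coefficients_from_raw_sums(xs, ys):
    """b_yx and b_xy using the original values (raw sums)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    b_yx = numerator / (sum_x2 - sum_x ** 2 / n)
    b_xy = numerator / (sum_y2 - sum_y ** 2 / n)
    return b_yx, b_xy


def coefficients_from_deviations(xs, ys):
    """b_yx and b_xy using deviations taken from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    b_yx = sum_dxdy / sum(a * a for a in dx)
    b_xy = sum_dxdy / sum(b * b for b in dy)
    return b_yx, b_xy


# Hypothetical sample data, used only to compare the two forms.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

raw = coefficients_from_raw_sums(xs, ys)
dev = coefficients_from_deviations(xs, ys)
print(raw, dev)
```

Both calls return the same pair of coefficients, and their product is r squared, so it can never exceed 1.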
Solution:
$b_{xy} = 3.2$, $b_{yx} = 0.8$
Now $r^2 = b_{xy} \times b_{yx} = 3.2 \times 0.8 = 2.56 > 1$.
Since $r^2$ can never exceed 1, the statement is false.
Q2) Find the line of best fit to the following data using:
(I) x as independent variable
(II) x as dependent variable
x    1   3   4   6   8   9   11   14
y    1   2   4   4   5   7    8    9
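Solution:
Here $\bar{x} = 56/8 = 7$ and $\bar{y} = 40/8 = 5$.

  x     y    dx = x-7   dy = y-5   dx.dy   dx²   dy²
  1     1      -6         -4        24     36    16
  3     2      -4         -3        12     16     9
  4     4      -3         -1         3      9     1
  6     4      -1         -1         1      1     1
  8     5       1          0         0      1     0
  9     7       2          2         4      4     4
 11     8       4          3        12     16     9
 14     9       7          4        28     49    16
 Total: Σx = 56, Σy = 40, Σdxdy = 84, Σdx² = 132, Σdy² = 56

(I) Regression equation of y on x (when x is the independent variable):
$y - \bar{y} = b_{yx}(x - \bar{x})$, i.e. $y - 5 = \frac{84}{132}(x - 7) = \frac{7}{11}(x - 7)$, or 7x - 11y + 6 = 0
(II) Regression equation of x on y (when x is the dependent variable):
$x - \bar{x} = b_{xy}(y - \bar{y})$, i.e. $x - 7 = \frac{84}{56}(y - 5) = \frac{3}{2}(y - 5)$, or 2x - 3y + 1 = 0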
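The deviation table for Q2 can be reproduced with exact fractions. The following Python sketch recomputes the sums and both regression coefficients from the eight data pairs given in the question. The variable names are illustrative assumptions.

```python
from fractions import Fraction

x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(x)

mean_x = Fraction(sum(x), n)   # 56/8 = 7
mean_y = Fraction(sum(y), n)   # 40/8 = 5

# Deviations from the means
dx = [xi - mean_x for xi in x]
dy = [yi - mean_y for yi in y]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

b_yx = sum_dxdy / sum_dx2   # slope of the line of y on x
b_xy = sum_dxdy / sum_dy2   # slope of the line of x on y

print(b_yx, b_xy)
```

Using `Fraction` keeps the slopes as exact ratios, which makes it easy to compare them with the hand-computed values.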
Q3) (i) Calculate Karl Pearson's coefficient of correlation between the marks in English and Mathematics obtained by 10 students.
Marks in English (x)   20  13  18  21  11  12  17  14  19  15
Marks in Maths (y)     17  12  23  25  14   8  19  21  22  19
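Solution:
(i) Taking deviations from the means $\bar{x} = 16$ and $\bar{y} = 18$:
$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \, \sum d_y^2}} = \frac{125}{\sqrt{110 \times 254}} = \frac{125}{\sqrt{27940}} = \frac{125}{167.15} \approx 0.75$$
(ii) The line of best fit is
$$y - \bar{y} = \frac{\sum d_x d_y}{\sum d_x^2}(x - \bar{x}) \Rightarrow y - 18 = \frac{125}{110}(x - 16)$$
$\Rightarrow 25x - 22y - 4 = 0$, which is the line of best fit, that is, the line of regression of y on x.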
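The three sums used in Q3 (125, 110 and 254) can be checked directly from the marks. This is a minimal Python sketch under the assumption that deviations are taken from the actual means, as in the solution. The variable names are illustrative.

```python
import math

english = [20, 13, 18, 21, 11, 12, 17, 14, 19, 15]   # x
maths = [17, 12, 23, 25, 14, 8, 19, 21, 22, 19]      # y
n = len(english)

mean_x = sum(english) / n   # 16.0
mean_y = sum(maths) / n     # 18.0

dx = [v - mean_x for v in english]
dy = [v - mean_y for v in maths]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

# Karl Pearson's coefficient of correlation
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)
# Slope of the regression line of y on x
b_yx = sum_dxdy / sum_dx2

print(round(r, 2), round(b_yx, 4))
```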
x      5    7    8    4    6    Σx = 30
y      2    4    3    2    4    Σy = 15
xy    10   28   24    8   24    Σxy = 94
x²    25   49   64   16   36    Σx² = 190
Solution:
$$b_{yx} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{94 - \frac{30 \times 15}{5}}{190 - \frac{(30)^2}{5}} = \frac{470 - 450}{950 - 900} = \frac{20}{50} = 0.4$$
Q5) FIND THE REGRESSION COEFFICIENTS BXY AND BYX FOR THE FOLLOWING
DATA :
Σx = 55, Σy = 88, Σx² = 385, Σy² = 1114, Σxy = 586, n = 10
Solution:
$$b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{586 - \frac{55 \times 88}{10}}{385 - \frac{(55)^2}{10}} = \frac{586 - 484}{385 - 302.5} = \frac{102}{82.5} \approx 1.24$$
$$b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{586 - \frac{55 \times 88}{10}}{1114 - \frac{(88)^2}{10}} = \frac{586 - 484}{1114 - 774.4} = \frac{102}{339.6} \approx 0.3$$
Q6) FROM THE FOLLOWING DATA, CALCULATE
(I) CORRELATION COEFFICIENT (II) STANDARD DEVIATION OF y ($\sigma_y$)
x = 0.85y, y = 0.89x, $\sigma_x = 3$
Solution:
(I) Correlation coefficient $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.85 \times 0.89} = \sqrt{0.7565} \approx 0.8698$
(II) $b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \Rightarrow 0.85 = \frac{0.8698 \times 3}{\sigma_y} = \frac{2.609}{\sigma_y} \Rightarrow \sigma_y = \frac{2.609}{0.85} \approx 3.07$
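The arithmetic in Q6 is sensitive to rounding, so it is worth carrying full precision. The Python sketch below recomputes r and the standard deviation of y from the two given coefficients. The variable names are assumptions made for illustration.

```python
import math

b_xy = 0.85      # from x = 0.85y
b_yx = 0.89      # from y = 0.89x
sigma_x = 3.0

# Both coefficients are positive, so r takes the positive root.
r = math.sqrt(b_xy * b_yx)

# b_xy = r * sigma_x / sigma_y, rearranged for sigma_y
sigma_y = r * sigma_x / b_xy

print(round(r, 4), round(sigma_y, 2))
```

With unrounded intermediate values, r comes out close to 0.87 and the standard deviation of y close to 3.07.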
Q7) THE LINES OF REGRESSION OF A SET OF DATA ARE:
8x-10y+66=0, 40x-18y=214
The variance of x is 9. Find
(I) The mean value of x and y (II) coefficient b xy and b yx (III)
standard deviation of y (IV) the value of y for x=2 (V) the value of x
for y=3
Solution:
(I) Since the two lines of regression intersect at the point $(\bar{x}, \bar{y})$, the means are obtained by solving the given equations simultaneously.
The given equations of the lines of regression are
8x - 10y = -66 … (1)    40x - 18y = 214 … (2)
Solving them, we obtain x = 13, y = 17. Therefore the mean value of x is 13 and the mean value of y is 17.
(II) Rewriting equations (1) and (2), we have
From (1), $y = \frac{8}{10}x + \frac{66}{10}$, or y = 0.8x + 6.6 (regression line of y on x).
Therefore $b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8$.
From (2), $x = \frac{18}{40}y + \frac{214}{40}$, or x = 0.45y + 5.35 (regression line of x on y).
Therefore $b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.45$, and $r^2 = b_{xy} \times b_{yx} = 0.45 \times 0.8 = 0.36 \Rightarrow r = \pm 0.6$.
Since $b_{xy}$ and $b_{yx}$ are both positive, r is also positive, so r = +0.6.
(III) Variance of x, i.e. $\sigma_x^2 = 9$, therefore $\sigma_x = 3$.
From $r\,\frac{\sigma_y}{\sigma_x} = 0.8$, we have $0.6 \times \frac{\sigma_y}{3} = 0.8 \Rightarrow \sigma_y = 4$.
(IV) Since the line of regression of y on x gives the best estimate of y for a given x, putting x = 2 in 8x - 10y + 66 = 0, we get 16 - 10y + 66 = 0 => y = 8.2
(V) The line of regression of x on y gives the best estimate of x for a given y. Putting y = 3 in the equation 40x - 18y = 214, we get 40x - 54 = 214 => x = 6.7
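All five parts of Q7 follow mechanically from the two line equations and the variance of x. The Python sketch below walks through them with exact fractions. It assumes, as the solution does, that the first line is the regression of y on x. The coefficient layout `a*x + b*y = c` and the variable names are illustrative choices.

```python
import math
from fractions import Fraction as F

# Lines written as a*x + b*y = c
a1, b1, c1 = F(8), F(-10), F(-66)    # (1) 8x - 10y = -66, taken as y on x
a2, b2, c2 = F(40), F(-18), F(214)   # (2) 40x - 18y = 214, taken as x on y

# (I) Means: intersection of the two lines, by Cramer's rule
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det
mean_y = (a1 * c2 - a2 * c1) / det

# (II) Regression coefficients read off as slopes
b_yx = -a1 / b1    # from y = -(a1/b1) x + c1/b1
b_xy = -b2 / a2    # from x = -(b2/a2) y + c2/a2
r = math.sqrt(b_yx * b_xy)   # both slopes positive, so r is positive

# (III) Standard deviation of y from b_yx = r * sigma_y / sigma_x
sigma_x = math.sqrt(9)
sigma_y = float(b_yx) * sigma_x / r

# (IV) y for x = 2 uses the line of y on x; (V) x for y = 3 uses x on y
y_at_2 = (c1 - a1 * 2) / b1
x_at_3 = (c2 - b2 * 3) / a2

print(mean_x, mean_y, b_yx, b_xy, r, sigma_y, float(y_at_2), float(x_at_3))
```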
Q8) THE PERSONNEL MANAGER OF A FACTORY WANTS TO FIND A MEASURE
WHICH HE CAN USE TO FIX THE MONTHLY INCOME OF A PERSON APPLYING FOR
A JOB IN THE PRODUCTION DEPARTMENT. AS AN EXPERIMENTAL PROJECT, HE
COLLECTED DATA ON 7 PERSONS FROM THAT DEPARTMENT REFERRING TO
YEARS OF SERVICE AND THEIR MONTHLY INCOME.
Years of service (x)    11   7   9   5   8   6   10
Income in ₹'00 (y)      10   8   6   5   9   7   11
Solution:
  x     y    dx = x-8   dy = y-8   dx²   dx.dy
 11    10       3          2        9      6
  7     8      -1          0        1      0
  9     6       1         -2        1     -2
  5     5      -3         -3        9      9
  8     9       0          1        0      0
  6     7      -2         -1        4      2
 10    11       2          3        4      6
 Σx = 56, Σy = 56, Σdx² = 28, Σdxdy = 21

$\bar{x} = \frac{\sum x}{n} = \frac{56}{7} = 8$ ; $\bar{y} = \frac{\sum y}{n} = \frac{56}{7} = 8$
(I) The regression equation of y on x is
$$y - \bar{y} = \frac{\sum d_x d_y}{\sum d_x^2}(x - \bar{x})$$
$y - 8 = \frac{21}{28}(x - 8) \Rightarrow y - 8 = \frac{3}{4}(x - 8)$
4y - 32 = 3x - 24 => 4y = 3x + 8
(II) When x = 13, 4y = 39 + 8 => $y = \frac{47}{4} = 11.75$
Hence, the initial start should be ₹11.75 x 100, i.e. ₹1175.
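Once the line of y on x is fitted, the estimate for a new applicant is a single evaluation. The Python sketch below refits the line from the seven (years, income) pairs in the solution table and evaluates it at 13 years of service. The helper name `predict_income` is a hypothetical name used only for this illustration.

```python
years = [11, 7, 9, 5, 8, 6, 10]     # x, years of service
income = [10, 8, 6, 5, 9, 7, 11]    # y, monthly income in hundreds of rupees
n = len(years)

mean_x = sum(years) / n    # 8.0
mean_y = sum(income) / n   # 8.0

sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, income))
sum_dx2 = sum((x - mean_x) ** 2 for x in years)
b_yx = sum_dxdy / sum_dx2


def predict_income(service_years):
    """Estimate income (in hundreds) from the line y - mean_y = b_yx (x - mean_x)."""
    return mean_y + b_yx * (service_years - mean_x)


estimate = predict_income(13)
print(b_yx, estimate, estimate * 100)
```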
Q9) AN ANALYST FOR A CERTAIN COMPANY WAS STUDYING THE
RELATIONSHIP BETWEEN TRAVEL EXPENSES (y) FOR 102 SALES
TRIPS AND THE DURATION IN DAYS (x) OF THESE TRIPS. HE FOUND
THAT RELATIONSHIP BETWEEN y AND x IS LINEAR. A SUMMARY OF
THE DATA IS GIVEN BELOW :
Σx = 510, Σy = 7140, Σx² = 4150, Σxy = 54900, Σy² = 740200
(I)Estimate the two regression equations from the above data
(II)A given trip has to take seven days. How much money should a
salesman be allowed so that he will not run short of money?
Solution:
$\bar{x} = \frac{\sum x}{n} = \frac{510}{102} = 5$ ; $\bar{y} = \frac{\sum y}{n} = \frac{7140}{102} = 70$
(I)
$$b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{54900 - \frac{510 \times 7140}{102}}{4150 - \frac{(510)^2}{102}} = \frac{1958400}{163200} = 12$$
The regression equation of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$ => y - 70 = 12(x - 5)
y = 12x + 10
$$b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{54900 - \frac{510 \times 7140}{102}}{740200 - \frac{(7140)^2}{102}} = \frac{1958400}{24520800} = \frac{48}{601} \approx 0.08$$
The regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$ => x - 5 = 0.08(y - 70) = 0.08y - 5.6
x = 0.08y - 0.6
(II) When x = 7, y = 12 x 7 + 10 = 94
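When only summary totals are available, as in Q9, the same coefficients come straight from the raw-sum formulas. The Python sketch below uses the totals given in the question, reading the last total as the sum of y squared. The variable names are illustrative assumptions.

```python
n = 102
sum_x, sum_y = 510, 7140
sum_x2, sum_y2 = 4150, 740200
sum_xy = 54900

mean_x = sum_x / n   # 5.0
mean_y = sum_y / n   # 70.0

# Shared numerator of both coefficients
s_xy = sum_xy - sum_x * sum_y / n

b_yx = s_xy / (sum_x2 - sum_x ** 2 / n)   # slope of y on x
b_xy = s_xy / (sum_y2 - sum_y ** 2 / n)   # slope of x on y, exactly 48/601

# Allowance for a seven-day trip, from the line of y on x
y_for_7_days = mean_y + b_yx * (7 - mean_x)

print(b_yx, round(b_xy, 4), y_for_7_days)
```

Note that the unrounded slope of x on y is slightly below 0.08, so an intercept computed from it differs a little from the -0.6 obtained with the rounded value.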
Q10) EQUATION OF TWO REGRESSION LINES ARE 4x +3y+7=0 AND
3x+4y+8=0. FIND (I) MEAN OF x , MEAN OF y (II) REGRESSION
COEFFICIENTS b yx AND b xy AND (III) CORRELATION COEFFICIENT
BETWEEN x AND y
Solution:
The two regression lines are
4x + 3y + 7 = 0 … (1)
3x + 4y + 8 = 0 … (2)
(I) Solving (1) and (2), we get $x = -\frac{4}{7}$, $y = -\frac{11}{7}$.
Since the regression lines intersect at $(\bar{x}, \bar{y})$, we have mean of $x = -\frac{4}{7}$ and mean of $y = -\frac{11}{7}$.
(II) Let equation (1) be the regression line of y on x, then (2) is the regression line of x on y. We shall check if our assumption is correct or not.
Writing (1) as $y = -\frac{4}{3}x - \frac{7}{3}$, we get $b_{yx} = -\frac{4}{3}$; writing (2) as $x = -\frac{4}{3}y - \frac{8}{3}$, we get $b_{xy} = -\frac{4}{3}$.
Therefore, $b_{yx} \cdot b_{xy} = \left(-\frac{4}{3}\right)\left(-\frac{4}{3}\right) = \frac{16}{9} > 1$, which is not possible.
So the assumption is wrong.
Hence, (1) is the regression line of x on y and (2) is the regression line of y on x. Writing (1) as $x = -\frac{3}{4}y - \frac{7}{4}$, we get $b_{xy} = -\frac{3}{4}$; writing (2) as $y = -\frac{3}{4}x - \frac{8}{4}$, we get $b_{yx} = -\frac{3}{4}$.
(III) Since $b_{yx}$ and $b_{xy}$ are both negative and r should be of the same sign as $b_{yx}$ and $b_{xy}$, therefore
$$r = -\sqrt{b_{yx} \cdot b_{xy}} = -\sqrt{\left(-\frac{3}{4}\right)\left(-\frac{3}{4}\right)} = -\frac{3}{4}$$
i.e., the correlation coefficient between x and y is $-\frac{3}{4}$.
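The trial-and-check step in Q10 (assume one labelling, reject it if the product of the coefficients exceeds 1) can be written as a short routine. The Python sketch below does this for the two given lines. The function name and the `(a, b, c)` tuple layout for a line `a*x + b*y + c = 0` are assumptions made for this illustration.

```python
import math
from fractions import Fraction as F


def slopes(y_on_x_line, x_on_y_line):
    """Return (b_yx, b_xy) for lines given as (a, b, c) with a*x + b*y + c = 0."""
    a1, b1, _ = y_on_x_line
    a2, b2, _ = x_on_y_line
    b_yx = F(-a1, b1)   # from y = -(a1/b1) x - c1/b1
    b_xy = F(-b2, a2)   # from x = -(b2/a2) y - c2/a2
    return b_yx, b_xy


line1 = (4, 3, 7)   # 4x + 3y + 7 = 0
line2 = (3, 4, 8)   # 3x + 4y + 8 = 0

# First assumption: line1 is y on x, line2 is x on y.
b_yx, b_xy = slopes(line1, line2)
first_product = b_yx * b_xy

# r squared cannot exceed 1, so swap the roles if the product is too large.
if first_product > 1:
    b_yx, b_xy = slopes(line2, line1)

# r carries the common sign of the two regression coefficients.
sign = -1 if b_yx < 0 else 1
r = sign * math.sqrt(b_yx * b_xy)

print(first_product, b_yx, b_xy, r)
```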
Q23) FIND (I) COEFFICIENTS OF REGRESSION AND (II) REGRESSION EQUATION,
FOR THE FOLLOWING DATA:
Price x 78 89 97 69 59 79 68 61
Demand y 125 137 156 112 107 136 123 108
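Solution:
Taking deviations u = x - 78 and v = y - 125 from assumed means, the data give Σu = -24, Σv = 4, Σuv = 1500, Σu² = 1314, Σv² = 2012, with n = 8, $\bar{x} = 75$ and $\bar{y} = 125.5$.
$$b_{yx} = b_{vu} = \frac{\sum uv - \frac{\sum u \sum v}{n}}{\sum u^2 - \frac{(\sum u)^2}{n}} = \frac{1500 - \frac{(-24)(4)}{8}}{1314 - \frac{(-24)^2}{8}} = \frac{1500 + 12}{1314 - 72} = \frac{1512}{1242} = \frac{28}{23}$$
Therefore, the line of regression of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$ => $y - 125.5 = \frac{28}{23}(x - 75)$ => 28x - 23y + 786.5 = 0
$$b_{xy} = b_{uv} = \frac{\sum uv - \frac{\sum u \sum v}{n}}{\sum v^2 - \frac{(\sum v)^2}{n}} = \frac{1500 - \frac{(-24)(4)}{8}}{2012 - \frac{(4)^2}{8}} = \frac{1512}{2010} = \frac{756}{1005} \approx 0.75$$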
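The solution to this problem works with deviations u and v rather than the raw prices and demands. The Python sketch below assumes those deviations are taken from assumed means of 78 for price and 125 for demand, which is the choice that reproduces the sums -24 and 4 used above. The variable names are illustrative.

```python
from fractions import Fraction

price = [78, 89, 97, 69, 59, 79, 68, 61]           # x
demand = [125, 137, 156, 112, 107, 136, 123, 108]  # y
n = len(price)

# Assumed means (an inference from the sums in the solution, not stated on the slide)
A, B = 78, 125
u = [x - A for x in price]
v = [y - B for y in demand]

sum_u, sum_v = sum(u), sum(v)
sum_uv = sum(a * b for a, b in zip(u, v))
sum_u2 = sum(a * a for a in u)
sum_v2 = sum(b * b for b in v)

# Regression coefficients are unchanged by a shift of origin,
# so b_yx = b_vu and b_xy = b_uv.
numerator = sum_uv - Fraction(sum_u * sum_v, n)
b_yx = numerator / (sum_u2 - Fraction(sum_u ** 2, n))
b_xy = numerator / (sum_v2 - Fraction(sum_v ** 2, n))

print(sum_u, sum_v, sum_uv, sum_u2, sum_v2, b_yx, b_xy)
```

`Fraction` reduces the second coefficient to lowest terms, so it prints as 252/335, which equals 756/1005.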
By: Niranjana and Shreeya