Linear Regression

Linear regression is a statistical method for predicting or estimating the value of one variable from another, related variable. It finds the line of best fit, that is, the line that minimizes the sum of the squared deviations of the observed data points from the line. This line is called the regression line and can be used to predict unknown values. The regression coefficients byx and bxy are the slopes of the two regression lines (y on x and x on y); their product gives r², where the coefficient of correlation r measures the strength and direction of the linear relationship between the two variables.
Copyright
© © All Rights Reserved

LINEAR REGRESSION?

LINEAR: "Linear" refers to a straight line.

REGRESSION: "Regression" means to estimate or predict.
Example: production in summer and production in winter.

REGRESSION: It is used to predict or estimate the value of one variable from the value of a second variable.

The statistical method that helps us estimate or predict the unknown value of one variable from the known value of a related variable is called regression.

Equation of a line: $y = mx + c$, where y is the dependent variable and x is the independent variable.
REMARKS:
1) Point of intersection of the two lines of regression
Both regression lines pass through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ is the mean of the x series and $\bar{y}$ is the mean of the y series.

2) Regression lines and the correlation
a) If the two lines of regression coincide when plotted on graph paper, there is perfect correlation between the two series.
b) The greater the angle between the lines, the weaker the correlation between the two series.
c) If they cut each other at right angles, there is no correlation between the series.
Since both lines pass through $(\bar{x}, \bar{y})$, two distinct regression lines cannot be parallel.
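REGRESSION LINES AND COEFFICIENT OF CORRELATION

GENERAL EQUATIONS OF REGRESSION LINES
When y depends on x (y on x): general form $y = mx + c$, regression line $y - \bar{y} = b_{yx}(x - \bar{x})$
When x depends on y (x on y): general form $x = my + c$, regression line $x - \bar{x} = b_{xy}(y - \bar{y})$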
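FORMULA SHEET: REGRESSION COEFFICIENTS

In each case the first formula is the regression coefficient of y on x ($b_{yx}$) and the second is that of x on y ($b_{xy}$).

When the original values are used:
$$b_{yx} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}, \qquad b_{xy} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$

When deviations are taken from the means ($d_x = x - \bar{x}$, $d_y = y - \bar{y}$):
$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2}, \qquad b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2}$$

When deviations are taken from assumed means ($u = x - a$, $v = y - b$):
$$b_{yx} = \frac{\sum uv - \frac{(\sum u)(\sum v)}{n}}{\sum u^2 - \frac{(\sum u)^2}{n}}, \qquad b_{xy} = \frac{\sum uv - \frac{(\sum u)(\sum v)}{n}}{\sum v^2 - \frac{(\sum v)^2}{n}}$$

In terms of r and the standard deviations:
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$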
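Coefficient of correlation

The two lines of regression (y on x and x on y) intersect at the point $(\bar{x}, \bar{y})$.
Coefficient of correlation: $r = \rho(x, y) = \pm\sqrt{b_{yx} \cdot b_{xy}}$, with $-1 \le r \le 1$.
$r$, $b_{yx}$ and $b_{xy}$ all have the same sign, so the sign of the square root is that of the regression coefficients.
$\mathrm{Cov}(x, y) = \frac{\sum d_x d_y}{n}$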
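The formulas for deviations taken from the means can be sketched in a few lines of Python. This is only an illustration: the function name `regression_summary` and the three-point data set in the usage line are made up for the example, and the sketch assumes neither series is constant (otherwise a denominator would be zero).

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return (x_mean, y_mean, b_yx, b_xy, r) for paired data.

    Hypothetical helper: uses the deviation-from-mean formulas
    b_yx = sum(dx*dy)/sum(dx^2) and b_xy = sum(dx*dy)/sum(dy^2).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    s_dxdy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_dx2 = sum((x - x_mean) ** 2 for x in xs)
    s_dy2 = sum((y - y_mean) ** 2 for y in ys)
    b_yx = s_dxdy / s_dx2
    b_xy = s_dxdy / s_dy2
    # r is the square root of the product, with the sign of the coefficients
    r = sqrt(b_yx * b_xy)
    if s_dxdy < 0:
        r = -r
    return x_mean, y_mean, b_yx, b_xy, r

# Made-up, perfectly linear data (y = 2x), so r should come out as 1
print(regression_summary([1, 2, 3], [2, 4, 6]))
```

For perfectly linear data the two coefficients are reciprocals of each other, which is the case where the two regression lines coincide.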
Q1. IS THE FOLLOWING STATEMENT CORRECT? GIVE
REASONS. THE REGRESSION COEFFICIENT OF X ON Y IS 3.2
AND THAT OF Y ON X IS 0.8

Solution:
$b_{xy} = 3.2$, $b_{yx} = 0.8$
Now $r^2 = b_{xy} \times b_{yx} = 3.2 \times 0.8 = 2.56 > 1$.
Since $r^2$ can never exceed 1, the statement is false.
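The same check can be written as a small Python function. This is a sketch only; the name `coefficients_consistent` is hypothetical. It encodes the two facts used above: the coefficients share a sign, and their product is $r^2$, which cannot exceed 1.

```python
def coefficients_consistent(b_yx, b_xy):
    """Return True if b_yx and b_xy could be a valid pair of regression coefficients."""
    # If r = 0 both coefficients are zero, so exactly one being zero is impossible
    if (b_yx == 0) != (b_xy == 0):
        return False
    product = b_yx * b_xy  # equals r^2
    # Same sign means a non-negative product; r^2 cannot exceed 1
    return 0 <= product <= 1

print(coefficients_consistent(0.8, 3.2))  # False
```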
Q2) Find the line of best fit to the following data using:
(I) x as independent variable
(II) x as dependent variable

x    1    3    4    6    8    9    11   14
y    1    2    4    4    5    7    8    9
Solution:
x     y     dx = x - 7    dy = y - 5    dx·dy    dx²    dy²
1     1     -6            -4            24       36     16
3     2     -4            -3            12       16     9
4     4     -3            -1            3        9      1
6     4     -1            -1            1        1      1
8     5     1             0             0        1      0
9     7     2             2             4        4      4
11    8     4             3             12       16     9
14    9     7             4             28       49     16

Totals: Σx = 56, Σy = 40, Σdx·dy = 84, Σdx² = 132, Σdy² = 56
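$$\bar{x} = \frac{\sum x}{n} = \frac{56}{8} = 7, \qquad \bar{y} = \frac{\sum y}{n} = \frac{40}{8} = 5$$

Regression coefficients:
$$b_{yx} = \frac{\sum d_x d_y}{\sum d_x^2} = \frac{84}{132} = \frac{7}{11}, \qquad b_{xy} = \frac{\sum d_x d_y}{\sum d_y^2} = \frac{84}{56} = \frac{3}{2}$$

(I) Regression equation of y on x (when x is the independent variable):
$$y - \bar{y} = b_{yx}(x - \bar{x}), \quad \text{i.e. } y - 5 = \frac{7}{11}(x - 7) \quad \text{or} \quad 7x - 11y + 6 = 0$$

(II) Regression equation of x on y (when x is the dependent variable):
$$x - \bar{x} = b_{xy}(y - \bar{y}), \quad \text{i.e. } x - 7 = \frac{3}{2}(y - 5) \quad \text{or} \quad 2x - 3y + 1 = 0$$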
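The arithmetic of Q2 can be reproduced exactly with Python's `fractions` module. The sketch below is illustrative; the helper `general_form` is a hypothetical name. It clears the fraction in $y - \bar{y} = b_{yx}(x - \bar{x})$ to get the form $ax + by + c = 0$.

```python
from fractions import Fraction

x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(x)

x_mean = Fraction(sum(x), n)
y_mean = Fraction(sum(y), n)
dx = [xi - x_mean for xi in x]
dy = [yi - y_mean for yi in y]

s_dxdy = sum(a * b for a, b in zip(dx, dy))
b_yx = s_dxdy / sum(a * a for a in dx)
b_xy = s_dxdy / sum(b * b for b in dy)

def general_form(slope, x_mean, y_mean):
    """Rewrite y - y_mean = (p/q)(x - x_mean) as p*x - q*y + (q*y_mean - p*x_mean) = 0."""
    p, q = slope.numerator, slope.denominator
    return p, -q, q * y_mean - p * x_mean

print(b_yx, b_xy)                            # 7/11 3/2
print(*general_form(b_yx, x_mean, y_mean))   # 7 -11 6
```

The last line corresponds to $7x - 11y + 6 = 0$, the regression line of y on x.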
Q3) (i) Calculate Karl Pearson's coefficient of correlation between the marks in English and mathematics obtained by 10 students.

Marks in English    20   13   18   21   11   12   17   14   19   15
Marks in Math       17   12   23   25   14   8    19   21   22   19

(ii) Also find a line of best fit.

(iii) A candidate X scored 25 marks in English but was absent from the mathematics test. Estimate his probable score for the latter test.
Solution: (i) Let x denote the marks in English and y the marks in mathematics.

x     y     dx = x - 16    dy = y - 18    dx·dy    dx²    dy²
20    17    4              -1             -4       16     1
13    12    -3             -6             18       9      36
18    23    2              5              10       4      25
21    25    5              7              35       25     49
11    14    -5             -4             20       25     16
12    8     -4             -10            40       16     100
17    19    1              1              1        1      1
14    21    -2             3              -6       4      9
19    22    3              4              12       9      16
15    19    -1             1              -1       1      1

Totals: Σx = 160, Σy = 180, Σdx·dy = 125, Σdx² = 110, Σdy² = 254
$$\bar{x} = \frac{160}{10} = 16, \qquad \bar{y} = \frac{180}{10} = 18$$

$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{125}{\sqrt{110 \times 254}} = \frac{125}{\sqrt{27940}} = \frac{125}{167.15} \approx 0.75$$

(ii) A line of best fit is
$$y - \bar{y} = \frac{\sum d_x d_y}{\sum d_x^2}(x - \bar{x}) \Rightarrow y - 18 = \frac{125}{110}(x - 16) \Rightarrow 25x - 22y - 4 = 0,$$
which is the line of regression of y on x.

(iii) When x = 25,
$$y - 18 = \frac{125}{110}(25 - 16) \Rightarrow y - 18 = \frac{25 \times 9}{22} \Rightarrow 22y - 396 = 225 \Rightarrow 22y = 621 \Rightarrow y = \frac{621}{22} \approx 28.2,$$
or 28 marks to the nearest integer.
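As a cross-check on Q3, the following Python sketch recomputes Pearson's r and the estimate for x = 25 from the raw marks. The variable names are illustrative only.

```python
from math import sqrt

english = [20, 13, 18, 21, 11, 12, 17, 14, 19, 15]  # x
maths = [17, 12, 23, 25, 14, 8, 19, 21, 22, 19]     # y
n = len(english)

x_mean = sum(english) / n
y_mean = sum(maths) / n

# Sums of products and squares of deviations from the means
s_dxdy = sum((x - x_mean) * (y - y_mean) for x, y in zip(english, maths))
s_dx2 = sum((x - x_mean) ** 2 for x in english)
s_dy2 = sum((y - y_mean) ** 2 for y in maths)

r = s_dxdy / sqrt(s_dx2 * s_dy2)   # Karl Pearson's coefficient
b_yx = s_dxdy / s_dx2              # slope of the line of y on x

# Estimate the mathematics mark for an English mark of 25
estimate = y_mean + b_yx * (25 - x_mean)

print(round(r, 2), round(estimate, 1))  # 0.75 28.2
```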
Q4) COMPUTE $b_{yx}$ FOR THE FOLLOWING DATA:
{(x, y) : (5, 2), (7, 4), (8, 3), (4, 2), (6, 4)}

x      5     7     8     4     6     Σx = 30
y      2     4     3     2     4     Σy = 15
xy     10    28    24    8     24    Σxy = 94
x²     25    49    64    16    36    Σx² = 190

Solution:
The denominator uses Σx², so the coefficient obtained is that of y on x:
$$b_{yx} = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{94 - \frac{30 \times 15}{5}}{190 - \frac{(30)^2}{5}} = \frac{470 - 450}{950 - 900} = \frac{20}{50} = 0.4$$
Q5) FIND THE REGRESSION COEFFICIENTS BXY AND BYX FOR THE FOLLOWING
DATA :
Σx = 55, Σy = 88, Σx2 = 385, Σy2 = 1114, Σxy = 586 , n= 10
Solution:
$$b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{586 - \frac{55 \times 88}{10}}{385 - \frac{(55)^2}{10}} = \frac{586 - 484}{385 - 302.5} = \frac{102}{82.5} \approx 1.24$$

$$b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{586 - \frac{55 \times 88}{10}}{1114 - \frac{(88)^2}{10}} = \frac{586 - 484}{1114 - 774.4} = \frac{102}{339.6} \approx 0.3$$
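When only the totals are given, as in Q5, the "original values" formulas apply directly. A minimal Python sketch follows; the function name `coefficients_from_totals` and its parameter names are hypothetical.

```python
def coefficients_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (b_yx, b_xy) from the totals of x, y, x^2, y^2 and xy."""
    # Shared numerator: sum(xy) - sum(x)*sum(y)/n
    numerator = sum_xy - sum_x * sum_y / n
    b_yx = numerator / (sum_x2 - sum_x ** 2 / n)
    b_xy = numerator / (sum_y2 - sum_y ** 2 / n)
    return b_yx, b_xy

b_yx, b_xy = coefficients_from_totals(10, 55, 88, 385, 1114, 586)
print(round(b_yx, 2), round(b_xy, 2))  # 1.24 0.3
```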
Q6) FROM THE FOLLOWING DATA, CALCULATE
(I) CORRELATION COEFFICIENT (II) STANDARD DEVIATION OF Y ($\sigma_y$)
$x = 0.85y$, $y = 0.89x$, $\sigma_x = 3$
Solution:
(I) Here $b_{xy} = 0.85$ and $b_{yx} = 0.89$. Both are positive, so
$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.85 \times 0.89} = \sqrt{0.7565} \approx 0.8698$$
(II) $$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} \Rightarrow 0.85 = \frac{0.8698 \times 3}{\sigma_y} = \frac{2.609}{\sigma_y} \Rightarrow \sigma_y = \frac{2.609}{0.85} \approx 3.07$$
Q7) THE LINES OF REGRESSION OF A SET OF DATA ARE:
8x-10y+66=0, 40x-18y=214
The variance of x is 9. Find
(I) The mean value of x and y (II) coefficient b xy and b yx (III)
standard deviation of y (IV) the value of y for x=2 (V) the value of x
for y=3
Solution:
(I) Since the two lines of regression intersect at the point $(\bar{x}, \bar{y})$, the means are obtained by solving the given equations simultaneously.
The given equations of the lines of regression are
$8x - 10y = -66$ ... (1)
$40x - 18y = 214$ ... (2)
Solving them, we obtain x = 13, y = 17. Therefore the mean value of x is 13 and the mean value of y is 17.

(II) Rewriting equations (1) and (2), we have
From (1), $y = \frac{8x}{10} + \frac{66}{10}$, or $y = 0.8x + 6.6$ (regression line of y on x), so
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.8$$
From (2), $x = \frac{18y}{40} + \frac{214}{40}$, or $x = 0.45y + 5.35$ (regression line of x on y), so
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.45$$
Therefore $r^2 = b_{xy} \times b_{yx} = 0.45 \times 0.8 = 0.36$, so $r = \pm 0.6$. Since the product does not exceed 1, this choice of lines is consistent.
Since $b_{xy}$ and $b_{yx}$ are both positive, r is also positive: $r = +0.6$.

(III) Variance of x, i.e. $\sigma_x^2 = 9$, therefore $\sigma_x = 3$.
From $r\,\frac{\sigma_y}{\sigma_x} = 0.8$, we have $0.6 \times \frac{\sigma_y}{3} = 0.8 \Rightarrow \sigma_y = 4$.

(IV) The line of regression of y on x gives the best estimate of y for a given x. Putting x = 2 in $8x - 10y + 66 = 0$, we get $16 - 10y + 66 = 0 \Rightarrow y = 8.2$.

(V) The line of regression of x on y gives the best estimate of x for a given y. Putting y = 3 in $40x - 18y = 214$, we get $40x - 54 = 214 \Rightarrow x = 6.7$.
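The first and third parts of Q7 can be checked with a short Python sketch. It solves the two line equations by Cramer's rule to get the means, then recovers $\sigma_y$ from $b_{yx} = r\sigma_y/\sigma_x$. The function name `intersection` is hypothetical, and the sketch takes for granted that the first line is y on x, as in the solution above.

```python
from fractions import Fraction
from math import sqrt

def intersection(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# 8x - 10y = -66 and 40x - 18y = 214
x_mean, y_mean = intersection(8, -10, -66, 40, -18, 214)
print(x_mean, y_mean)  # 13 17

b_yx = Fraction(8, 10)    # from y = 0.8x + 6.6
b_xy = Fraction(18, 40)   # from x = 0.45y + 5.35
r = sqrt(b_yx * b_xy)     # both coefficients are positive, so r > 0
sigma_x = 3               # square root of the given variance 9
sigma_y = float(b_yx) * sigma_x / r
print(round(sigma_y, 2))  # 4.0
```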
Q8) THE PERSONNEL MANAGER OF A FACTORY WANTS TO FIND A MEASURE WHICH HE CAN USE TO FIX THE MONTHLY INCOME OF A PERSON APPLYING FOR A JOB IN THE PRODUCTION DEPARTMENT. AS AN EXPERIMENTAL PROJECT, HE COLLECTED DATA ON 7 PERSONS FROM THAT DEPARTMENT REFERRING TO YEARS OF SERVICE AND THEIR MONTHLY INCOME.

(I) FIND THE REGRESSION EQUATION OF INCOME (y) ON YEARS OF SERVICE (x)

(II) USING IT, WHAT INITIAL START WOULD YOU RECOMMEND FOR A PERSON APPLYING FOR A JOB AFTER HAVING SERVED IN A SIMILAR CAPACITY IN ANOTHER FACTORY FOR 13 YEARS?

Years of service (x)        11   7    9    5    8    6    10
Income in ₹ hundreds (y)    10   8    6    5    9    7    11
Solution:
x     y     dx = x - 8    dy = y - 8    dx²    dx·dy
11    10    3             2             9      6
7     8     -1            0             1      0
9     6     1             -2            1      -2
5     5     -3            -3            9      9
8     9     0             1             0      0
6     7     -2            -1            4      2
10    11    2             3             4      6

Totals: Σx = 56, Σy = 56, Σdx² = 28, Σdx·dy = 21
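$$\bar{x} = \frac{\sum x}{n} = \frac{56}{7} = 8, \qquad \bar{y} = \frac{\sum y}{n} = \frac{56}{7} = 8$$

(I) The regression equation of y on x is
$$y - \bar{y} = \frac{\sum d_x d_y}{\sum d_x^2}(x - \bar{x})$$
$$y - 8 = \frac{21}{28}(x - 8) \Rightarrow y - 8 = \frac{3}{4}(x - 8)$$
$$4y - 32 = 3x - 24 \Rightarrow 4y = 3x + 8$$

(II) When x = 13, $4y = 39 + 8 \Rightarrow y = \frac{47}{4} = 11.75$.
Hence, the initial start should be ₹11.75 × 100, i.e. ₹1175.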
Q9) AN ANALYST FOR A CERTAIN COMPANY WAS STUDYING THE
RELATIONSHIP BETWEEN TRAVEL EXPENSES (y) FOR 102 SALES
TRIPS AND THE DURATION IN DAYS (x) OF THESE TRIPS. HE FOUND
THAT RELATIONSHIP BETWEEN y AND x IS LINEAR. A SUMMARY OF
THE DATA IS GIVEN BELOW :
Σx = 510, Σy = 7140, Σx² = 4150, Σxy = 54900, Σy² = 740200
(I)Estimate the two regression equations from the above data
(II)A given trip has to take seven days. How much money should a
salesman be allowed so that he will not run short of money?
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{510}{102} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{7140}{102} = 70$$

(I) Regression coefficient of y on x:
$$b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{54900 - \frac{510 \times 7140}{102}}{4150 - \frac{(510)^2}{102}} = \frac{1958400}{163200} = 12$$
The regression equation of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 70 = 12(x - 5) \Rightarrow y = 12x + 10$$

Regression coefficient of x on y:
$$b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{54900 - \frac{510 \times 7140}{102}}{740200 - \frac{(7140)^2}{102}} = \frac{1958400}{24520800} = \frac{48}{601} \approx 0.08$$
The regression equation of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 5 = 0.08(y - 70) = 0.08y - 5.6 \Rightarrow x = 0.08y - 0.6$$

(II) When x = 7, $y = 12 \times 7 + 10 = 94$, so the estimated travel expense for a seven-day trip is 94.
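The line of y on x in Q9 can also be built straight from the totals in slope-intercept form, which makes the seven-day estimate a one-line evaluation. This is a sketch; the function name `line_y_on_x` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def line_y_on_x(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (slope, intercept) of the regression line of y on x from totals."""
    slope = (sum_xy - Fraction(sum_x * sum_y, n)) / (sum_x2 - Fraction(sum_x ** 2, n))
    # The line passes through the point of means (x_mean, y_mean)
    intercept = Fraction(sum_y, n) - slope * Fraction(sum_x, n)
    return slope, intercept

slope, intercept = line_y_on_x(102, 510, 7140, 4150, 54900)
print(slope, intercept)        # 12 10
print(slope * 7 + intercept)   # 94
```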
Q10) EQUATION OF TWO REGRESSION LINES ARE 4x +3y+7=0 AND
3x+4y+8=0. FIND (I) MEAN OF x , MEAN OF y (II) REGRESSION
COEFFICIENTS b yx AND b xy AND (III) CORRELATION COEFFICIENT
BETWEEN x AND y

Solution:
The two regression lines are
$4x + 3y + 7 = 0$ ... (1)
$3x + 4y + 8 = 0$ ... (2)

(I) Solving (1) and (2), we get $x = -\frac{4}{7}$, $y = -\frac{11}{7}$.
Since the regression lines intersect at $(\bar{x}, \bar{y})$, we have mean of $x = -\frac{4}{7}$ and mean of $y = -\frac{11}{7}$.

(II) Let equation (1) be the regression line of y on x; then (2) is the regression line of x on y. We shall check whether this assumption is correct.
Writing (1) as $y = -\frac{4}{3}x - \frac{7}{3}$, we get $b_{yx} = -\frac{4}{3}$; writing (2) as $x = -\frac{4}{3}y - \frac{8}{3}$, we get $b_{xy} = -\frac{4}{3}$.
Therefore $b_{yx} \cdot b_{xy} = \left(-\frac{4}{3}\right)\left(-\frac{4}{3}\right) = \frac{16}{9} > 1$, which is not possible.
So the assumption is wrong.
Hence, (1) is the regression line of x on y and (2) is the regression line of y on x. Writing (1) as $x = -\frac{3}{4}y - \frac{7}{4}$, we get $b_{xy} = -\frac{3}{4}$; writing (2) as $y = -\frac{3}{4}x - \frac{8}{4}$, we get $b_{yx} = -\frac{3}{4}$.

(III) Since $b_{yx}$ and $b_{xy}$ are both negative and r must have the same sign as $b_{yx}$ and $b_{xy}$,
$$r = -\sqrt{b_{yx} \cdot b_{xy}} = -\sqrt{\left(-\frac{3}{4}\right)\left(-\frac{3}{4}\right)} = -\frac{3}{4},$$
i.e. the correlation coefficient between x and y is $-\frac{3}{4}$.
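The trial-and-check step in Q10 (assume one line is y on x, then reject the assumption if the product of the coefficients exceeds 1) can be sketched in Python. The function name `identify_lines` is hypothetical, each line is passed as a tuple (a, b, c) for $ax + by + c = 0$, and the sketch assumes all a and b are non-zero.

```python
from fractions import Fraction
from math import sqrt

def identify_lines(line1, line2):
    """Return (b_yx, b_xy, y_on_x_line) for two regression lines a*x + b*y + c = 0."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = Fraction(-y_on_x[0], y_on_x[1])  # y = -(a/b)x - c/b
        b_xy = Fraction(-x_on_y[1], x_on_y[0])  # x = -(b/a)y - c/a
        # The product is r^2, so it must lie between 0 and 1
        if 0 <= b_yx * b_xy <= 1:
            return b_yx, b_xy, y_on_x
    raise ValueError("no valid assignment of the two lines")

b_yx, b_xy, y_on_x = identify_lines((4, 3, 7), (3, 4, 8))
r = sqrt(b_yx * b_xy)
if b_yx < 0:
    r = -r  # r takes the sign of the regression coefficients

print(b_yx, b_xy, y_on_x, r)  # -3/4 -3/4 (3, 4, 8) -0.75
```

The two possible assignments give products that are reciprocals of each other, so at most one of them can be at most 1 unless the lines coincide.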
Q23) FIND (I) COEFFICIENTS OF REGRESSION AND (II) REGRESSION EQUATION, FOR THE FOLLOWING DATA:

Price x     78    89    97    69    59    79    68    61
Demand y    125   137   156   112   107   136   123   108

Estimate the price when it is 100.


Solution: Let the assumed means be a = 78 and b = 125 for the x and y series respectively.
Let the deviations from the assumed means be u = x - 78 and v = y - 125. We prepare the following table:

x     y      u = x - 78    v = y - 125    uv     u²     v²
78    125    0             0              0      0      0
89    137    11            12             132    121    144
97    156    19            31             589    361    961
69    112    -9            -13            117    81     169
59    107    -19           -18            342    361    324
79    136    1             11             11     1      121
68    123    -10           -2             20     100    4
61    108    -17           -17            289    289    289

Totals: Σu = -24, Σv = 4, Σuv = 1500, Σu² = 1314, Σv² = 2012
$$\bar{x} = a + \frac{\sum u}{n} = 78 + \frac{-24}{8} = 78 - 3 = 75$$
$$\bar{y} = b + \frac{\sum v}{n} = 125 + \frac{4}{8} = 125 + 0.5 = 125.5$$

$$b_{yx} = b_{vu} = \frac{\sum uv - \frac{\sum u \sum v}{n}}{\sum u^2 - \frac{(\sum u)^2}{n}} = \frac{1500 - \frac{(-24)(4)}{8}}{1314 - \frac{(-24)^2}{8}} = \frac{1500 + 12}{1314 - 72} = \frac{1512}{1242} = \frac{28}{23}$$

Therefore, the line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 125.5 = \frac{28}{23}(x - 75) \Rightarrow 28x - 23y + 786.5 = 0$$

$$b_{xy} = b_{uv} = \frac{\sum uv - \frac{\sum u \sum v}{n}}{\sum v^2 - \frac{(\sum v)^2}{n}} = \frac{1500 - \frac{(-24)(4)}{8}}{2012 - \frac{(4)^2}{8}} = \frac{1512}{2010} = \frac{756}{1005}$$
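The assumed-mean calculation can be verified with exact fractions in Python. The sketch below uses the same assumed means, a = 78 and b = 125; the variable names are illustrative. Python reduces fractions to lowest terms, so 756/1005 is printed as 252/335.

```python
from fractions import Fraction

price = [78, 89, 97, 69, 59, 79, 68, 61]            # x
demand = [125, 137, 156, 112, 107, 136, 123, 108]   # y
a, b = 78, 125                                      # assumed means
n = len(price)

u = [x - a for x in price]
v = [y - b for y in demand]

# Shared numerator: sum(uv) - sum(u)*sum(v)/n
numerator = sum(ui * vi for ui, vi in zip(u, v)) - Fraction(sum(u) * sum(v), n)
b_yx = numerator / (sum(ui * ui for ui in u) - Fraction(sum(u) ** 2, n))
b_xy = numerator / (sum(vi * vi for vi in v) - Fraction(sum(v) ** 2, n))

x_mean = a + Fraction(sum(u), n)
y_mean = b + Fraction(sum(v), n)

print(x_mean, y_mean)  # 75 251/2
print(b_yx, b_xy)      # 28/23 252/335
```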
By: Niranjana and Shreeya
