Regression Analysis-1 JJ
7.1 INTRODUCTION
After having established the fact that two variables are closely related, we may be interested in estimating (predicting) the value of one variable given the value of another. For example, if we know that advertising and sales are correlated, we may find out the expected amount of sales for a given advertising expenditure, or the amount of expenditure required for attaining a given amount of sales. Similarly, if we know that the yield of a crop and rainfall are closely related, we may find out the amount of rain required to achieve a certain production figure. Regression analysis reveals the average relationship between two variables, and this makes possible estimation or prediction.
The dictionary meaning of the term 'regression' is the act of returning or going back. The term 'regression' was first used by Sir Francis Galton in 1877 while studying the relationship between the height of fathers and sons. This term was introduced by him in the paper "Regression towards Mediocrity in Hereditary Stature." His study of the heights of about one thousand fathers and sons revealed a very interesting relationship, i.e., tall fathers tend to have tall sons and short fathers short sons, but the average height of the sons of a group of tall fathers is less than that of the fathers, and the average height of the sons of a group of short fathers is greater than that of the fathers. The line describing the tendency to regress or go back was called by Galton a 'Regression Line'. The term is still used to describe the line drawn for a group of points to represent the trend present, but it no longer necessarily carries the original implication of "stepping back" that Galton intended.
These days there is a growing tendency among modern writers to use the term estimating line instead of regression line, because the expression estimating line is more clarificatory in character.
Let us examine a few definitions of the term regression.
1. "Regression is the measure of the average relationship between two or more variables in terms of the original units of the data." - Blair
2. "The term 'regression analysis' refers to the methods by which estimates are made of the values of a variable from a knowledge of the values of one or more other variables and to the measurement of the errors involved in this estimation process." - Morris Hamburg
3. "One of the most frequently used techniques in economics and business research, to find a relation between two or more variables that are related causally, is regression analysis." - Taro Yamane
4. "Regression analysis attempts to establish the 'nature of the relationship' between variables, that is, to study the functional relationship between the variables and thereby provide a mechanism for prediction, or forecasting." - Ya-Lun Chou
It is clear from the above definitions that regression analysis is a statistical device with the help of which we are in a position to estimate (or predict) the unknown values of one variable from known values of another variable. The variable which is used to predict the variable of interest is called the independent variable or explanatory variable, and the variable we are trying to predict is called the dependent variable or "explained" variable. The independent variable is denoted by X and the dependent variable by Y. The analysis used is called simple linear regression analysis: simple because there is only one predictor or independent variable, and linear because of the assumed linear relationship between the dependent and the independent variables. The term "linear" means that an equation of a straight line of the form $Y = a + bX$, where a and b are constants, is used to describe the average relationship that exists between the two variables. Two variables are said to have a linear relationship when a change in the independent variable (say X) by one unit leads to a constant absolute change in the dependent variable (say Y).
…
By assuming any values of Y, we can find out the corresponding values of X from Eq. (i). For example, if Y = 65, X would be
$$X = -3.38 + 1.036(65) = 63.96$$
For Y = 70, X would be
$$X = -3.38 + 1.036(70) = 69.14$$
We can plot these points on the graph and obtain the regression line of X on Y. Similarly, by assigning any values to X in Eq. (ii), we can obtain the corresponding values of Y. Thus if X = 63, Y would be
$$Y = 35.82 + 0.476(63) = 65.808 \text{ or } 65.81$$
and for X = 70, Y would be
$$Y = 35.82 + 0.476(70) = 69.14$$
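The substitutions above are easy to reproduce. The short Python sketch below is ours, not the author's: `x_on_y` and `y_on_x` are hypothetical names for Eq. (i) and Eq. (ii), and the results are rounded to two decimals as in the text.

```python
def x_on_y(y: float) -> float:
    """Eq. (i): estimated height of father (X) for a given height of son (Y)."""
    return -3.38 + 1.036 * y


def y_on_x(x: float) -> float:
    """Eq. (ii): estimated height of son (Y) for a given height of father (X)."""
    return 35.82 + 0.476 * x


# Two points per line are enough to draw each regression line.
points_x_on_y = [(round(x_on_y(y), 2), y) for y in (65, 70)]
points_y_on_x = [(x, round(y_on_x(x), 2)) for x in (63, 70)]

print("X on Y:", points_x_on_y)  # X on Y: [(63.96, 65), (69.14, 70)]
print("Y on X:", points_y_on_x)  # Y on X: [(63, 65.81), (70, 69.14)]
```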
If these lines are plotted on the graph of the original data, they would appear as follows:
[Figure: Graph of original data. X axis: height of fathers (inches); Y axis: height of sons (inches). Shows the regression line of Y on X and the regression line of X on Y.]
[Figure: Two further panels on the same axes illustrating the fitted lines; one line is labelled X = a + bY.]
These equations are usually called the normal equations. In the equations, $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$ indicate totals which are computed from the observed pairs of values of the two variables X and Y to which the least squares estimating line is to be fitted, and N is the number of observed pairs of values.
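As a sketch of how such totals are used, the Python function below solves the normal equations of the line Y = a + bX, assuming they take the standard least squares form $\sum Y = Na + b\sum X$ and $\sum XY = a\sum X + b\sum X^2$. The helper name `fit_y_on_x` and the use of Cramer's rule are our own choices for illustration.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX.

    Solves the two normal equations
        sum(Y)  = N*a + b*sum(X)
        sum(XY) = a*sum(X) + b*sum(X^2)
    by Cramer's rule.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2  # zero only when every X is the same
    if det == 0:
        raise ValueError("X does not vary, so no line of Y on X exists")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Points lying exactly on Y = 2 + 3X should give a = 2, b = 3.
print(fit_y_on_x([1, 2, 3, 4], [5, 8, 11, 14]))  # (2.0, 3.0)
```

Any method of solving two simultaneous linear equations would do; Cramer's rule is used only because it keeps the sketch short.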
Regression Equation of X on Y
The regression equation of X on Y is expressed as follows:
$$X = a + bY$$
To determine the values of a and b, the following two normal equations are to be solved simultaneously:
$$\sum X = Na + b\sum Y$$
$$\sum XY = a\sum Y + b\sum Y^2$$
Illustration 1. From the following data obtain the two regression equations:
X:   6    2    10    4    8
Y:   9    11   5     8    7
Solution.
OBTAINING REGRESSION EQUATIONS
X     Y     XY     X²     Y²
6     9     54     36     81
2     11    22     4      121
10    5     50     100    25
4     8     32     16     64
8     7     56     64     49
ΣX = 30   ΣY = 40   ΣXY = 214   ΣX² = 220   ΣY² = 340
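The totals of the table, and the two regression lines they lead to, can be checked with the sketch below. It is illustrative only: the hypothetical helper `fit_line` solves the normal equations for a line of `resp` on `pred`, so calling it with the arguments swapped gives the regression of X on Y.

```python
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]


def fit_line(pred, resp):
    """Return (a, b) for resp = a + b*pred by solving the normal equations
        sum(resp)      = N*a + b*sum(pred)
        sum(pred*resp) = a*sum(pred) + b*sum(pred^2)
    """
    n = len(pred)
    s_p, s_r = sum(pred), sum(resp)
    s_pr = sum(p * r for p, r in zip(pred, resp))
    s_pp = sum(p * p for p in pred)
    det = n * s_pp - s_p ** 2
    a = (s_r * s_pp - s_p * s_pr) / det
    b = (n * s_pr - s_p * s_r) / det
    return a, b


totals = {
    "sum_X": sum(X),
    "sum_Y": sum(Y),
    "sum_XY": sum(x * y for x, y in zip(X, Y)),
    "sum_X2": sum(x * x for x in X),
    "sum_Y2": sum(y * y for y in Y),
}
print(totals)
# {'sum_X': 30, 'sum_Y': 40, 'sum_XY': 214, 'sum_X2': 220, 'sum_Y2': 340}

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y
print(f"Y = {a_yx:.2f} + ({b_yx:.2f})X")  # Y = 11.90 + (-0.65)X
print(f"X = {a_xy:.2f} + ({b_xy:.2f})Y")  # X = 16.40 + (-1.30)Y
```

On these data the sketch gives Y = 11.9 - 0.65X and X = 16.4 - 1.3Y.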
$$2\sum (Y - a - bX)(-1) = 0$$
$$2\sum (Y - a - bX)(-X) = 0$$
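Dividing each condition by -2 and expanding the sums term by term (the constant a summed over N pairs gives Na) leads to the two normal equations for the line of Y on X. A short worked step in LaTeX:

```latex
\begin{aligned}
2\sum (Y - a - bX)(-1) = 0 &\;\Rightarrow\; \sum Y = Na + b\sum X \\
2\sum (Y - a - bX)(-X) = 0 &\;\Rightarrow\; \sum XY = a\sum X + b\sum X^2
\end{aligned}
```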
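$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} \qquad \text{and} \qquad b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2}$$
where $x = X - \bar{X}$ and $y = Y - \bar{Y}$. Instead of finding out the values of the correlation coefficient, $\sigma_x$ and $\sigma_y$, we can find the value of a regression coefficient by calculating $\sum xy$ and $\sum x^2$ (or $\sum y^2$) and dividing the former by the latter. This follows because
$$r = \frac{\sum xy}{N\sigma_x\sigma_y}, \qquad \text{so} \qquad r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{N\sigma_x^2} = \frac{\sum xy}{\sum x^2}$$
It should be noted that the square root of the product of the two regression coefficients gives us the value of the correlation coefficient. Symbolically:
$$r = \sqrt{b_{xy} \times b_{yx}}$$
(r takes the same sign as the two regression coefficients, which always share a sign.)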
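$$b_{xy} = r\frac{\sigma_x}{\sigma_y} \qquad \text{and} \qquad b_{yx} = r\frac{\sigma_y}{\sigma_x}$$
Since each regression coefficient involves r and the two standard deviations, we can find out any one of the four values given the other three. For example, if we know that r = 0.6, $\sigma_x$ = 4 and $b_{xy}$ = 0.8, we can find $\sigma_y$:
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} \;\Rightarrow\; 0.8 = \frac{0.6 \times 4}{\sigma_y} \;\Rightarrow\; \sigma_y = \frac{2.4}{0.8} = 3$$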
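The relations between r, the two standard deviations and the regression coefficients can be sketched in Python as below. The function names are hypothetical; the sign rule in `r_from_regression_coefficients` reflects the fact that r, $b_{xy}$ and $b_{yx}$ always share the same sign. The second check uses $b_{yx}$ = 0.8 and $b_{xy}$ = 0.45, the values found in Illustration 8.

```python
import math


def r_from_regression_coefficients(b_xy: float, b_yx: float) -> float:
    """r = sqrt(b_xy * b_yx), carrying the common sign of the coefficients."""
    product = b_xy * b_yx  # this product equals r squared
    if product < 0:
        raise ValueError("b_xy and b_yx must have the same sign")
    if product > 1:
        raise ValueError("b_xy * b_yx cannot exceed 1")
    return math.copysign(math.sqrt(product), b_xy)


def sigma_y_from_b_xy(b_xy: float, r: float, sigma_x: float) -> float:
    """Rearrange b_xy = r * sigma_x / sigma_y to recover sigma_y."""
    return r * sigma_x / b_xy


# Example from the text: r = 0.6, sigma_x = 4, b_xy = 0.8
print(round(sigma_y_from_b_xy(0.8, 0.6, 4), 4))  # 3.0

# b_xy = 0.45 and b_yx = 0.8, as found in Illustration 8
print(round(r_from_regression_coefficients(0.45, 0.8), 4))  # 0.6
```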
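Regression Equation of X on Y
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
Regression Equation of Y on X
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
When deviations are taken from assumed means, $d_x = (X - A)$ and $d_y = (Y - A)$, where A denotes the assumed mean of the respective variable, the regression coefficients are obtained as:
$$b_{xy} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_y^2 - (\sum d_y)^2}$$
$$b_{yx} = \frac{N\sum d_x d_y - \sum d_x \sum d_y}{N\sum d_x^2 - (\sum d_x)^2}$$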
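It should be noted that in both cases the numerator is the same; the only difference is in the denominator. When the regression coefficients are calculated from a correlation table, their values are obtained as follows:
$$b_{xy} = \frac{\sum f d_x d_y - \frac{\sum f d_x \times \sum f d_y}{N}}{\sum f d_y^2 - \frac{(\sum f d_y)^2}{N}} \times \frac{i_x}{i_y}$$
Similarly,
$$b_{yx} = \frac{\sum f d_x d_y - \frac{\sum f d_x \times \sum f d_y}{N}}{\sum f d_x^2 - \frac{(\sum f d_x)^2}{N}} \times \frac{i_y}{i_x}$$
where $i_x$ = class interval of the X variable and $i_y$ = class interval of the Y variable.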
As is clear from the above, the formulae for calculating regression coefficients in a correlation table are the same; the only difference is that in a correlation table we are given frequencies, and hence we have multiplied every value by f.
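A minimal Python sketch of the correlation-table formulas follows. The function `table_regression_coefficients` is hypothetical, and the check uses the data of Illustration 1 coded as step deviations; the choice of assumed mean 6 and class interval 2 for X, and assumed mean 7 and class interval 1 for Y, is ours, made only so that the class-interval factors are exercised.

```python
def table_regression_coefficients(cells, i_x, i_y):
    """Regression coefficients from a correlation table.

    cells: iterable of (d_x, d_y, f), where d_x and d_y are step deviations
    (in class-interval units) and f is the cell frequency.
    """
    cells = list(cells)
    n = sum(f for _, _, f in cells)
    s_fdx = sum(f * dx for dx, _, f in cells)
    s_fdy = sum(f * dy for _, dy, f in cells)
    s_fdxdy = sum(f * dx * dy for dx, dy, f in cells)
    s_fdx2 = sum(f * dx * dx for dx, _, f in cells)
    s_fdy2 = sum(f * dy * dy for _, dy, f in cells)
    numerator = s_fdxdy - s_fdx * s_fdy / n  # common to both coefficients
    b_xy = numerator / (s_fdy2 - s_fdy ** 2 / n) * (i_x / i_y)
    b_yx = numerator / (s_fdx2 - s_fdx ** 2 / n) * (i_y / i_x)
    return b_xy, b_yx


# Data of Illustration 1, each pair entered as a cell with f = 1.
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
A_x, i_x, A_y, i_y = 6, 2, 7, 1  # our coding choice (see lead-in)
# Every difference here is an exact multiple of the class interval.
cells = [((x - A_x) // i_x, (y - A_y) // i_y, 1) for x, y in zip(X, Y)]

b_xy, b_yx = table_regression_coefficients(cells, i_x, i_y)
print(round(b_xy, 4), round(b_yx, 4))  # -1.3 -0.65
```

The result agrees with $b_{xy}$ = -1.3 obtained from the same data in Illustration 3, which is a quick way to confirm that the factors $i_x/i_y$ and $i_y/i_x$ are the right way round.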
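Illustration 3. From the data of Illustration 1, obtain the regression equations taking deviations from 5 in case of X and 7 in case of Y.
Solution.
CALCULATION FOR REGRESSION EQUATIONS
X     dx = (X - 5)   dx²    Y     dy = (Y - 7)   dy²    dxdy
6     +1             1      9     +2             4      +2
2     -3             9      11    +4             16     -12
10    +5             25     5     -2             4      -10
4     -1             1      8     +1             1      -1
8     +3             9      7     0              0      0
ΣX = 30   Σdx = +5   Σdx² = 45   ΣY = 40   Σdy = +5   Σdy² = 25   Σdxdy = -21
Regression Equation of X on Y: $X - \bar{X} = b_{xy}(Y - \bar{Y})$, where $\bar{X} = 30/5 = 6$ and $\bar{Y} = 40/5 = 8$.
$$b_{xy} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_y^2 - \frac{(\sum d_y)^2}{N}} = \frac{-21 - \frac{(5)(5)}{5}}{25 - \frac{(5)^2}{5}} = \frac{-26}{20} = -1.3$$
$$X - 6 = -1.3(Y - 8), \quad \text{i.e.,} \quad X = 16.4 - 1.3Y$$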
[Figure: Regression line of Y on X and regression line of X on Y.]
(ii) Let Y = 10:
$$X = 16.4 - 1.3(10) = 16.4 - 13 = 3.4$$
Let Y = 6:
$$X = 16.4 - 1.3(6) = 16.4 - 7.8 = 8.6$$
These points and the regression line through them are shown in the graph.
Thus the value of the regression coefficient comes out to be the same.
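The claim that the regression coefficient does not depend on the assumed means can be checked numerically. The Python sketch below (the helper name is hypothetical) repeats the calculation of Illustration 3 with deviations from 5 and 7, and again with deviations from 0, i.e. the raw values; it also prints the regression of Y on X, which the working above does not show.

```python
X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]


def coefficients_from_deviations(xs, ys, A_x, A_y):
    """b_xy and b_yx from deviations d_x = X - A_x and d_y = Y - A_y."""
    n = len(xs)
    dx = [x - A_x for x in xs]
    dy = [y - A_y for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    s_dx2 = sum(a * a for a in dx)
    s_dy2 = sum(b * b for b in dy)
    numerator = s_dxdy - s_dx * s_dy / n
    b_xy = numerator / (s_dy2 - s_dy ** 2 / n)
    b_yx = numerator / (s_dx2 - s_dx ** 2 / n)
    return b_xy, b_yx


x_bar, y_bar = sum(X) / len(X), sum(Y) / len(Y)  # 6.0 and 8.0

# Deviations from 5 and 7 (as in Illustration 3), then from 0 (raw values)
for A_x, A_y in [(5, 7), (0, 0)]:
    b_xy, b_yx = coefficients_from_deviations(X, Y, A_x, A_y)
    print(A_x, A_y, round(b_xy, 4), round(b_yx, 4))
# 5 7 -1.3 -0.65
# 0 0 -1.3 -0.65

# Both regression lines pass through the means
print(f"X = {x_bar - b_xy * y_bar:.1f} + ({b_xy:.1f})Y")   # X = 16.4 + (-1.3)Y
print(f"Y = {y_bar - b_yx * x_bar:.1f} + ({b_yx:.2f})X")  # Y = 11.9 + (-0.65)X
```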
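Illustration 5. Given the bivariate data: …
Solution.
$$\bar{Y} = \frac{\sum Y}{N} = \frac{249}{10} = 24.9; \qquad \bar{X} = \frac{\sum X}{N} = \frac{305}{10} = 30.5$$
Regression equation of Y on X:
$$b_{yx} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_x^2 - \frac{(\sum d_x)^2}{N}} = \frac{127 - \frac{5(-21)}{10}}{193 - \frac{(5)^2}{10}} = \frac{127 + 10.5}{193 - 2.5} = \frac{137.5}{190.5} = 0.722$$
$$Y - 24.9 = 0.722(X - 30.5)$$
$$Y = 0.722X - 22.02 + 24.9 \quad \text{or} \quad Y = 0.722X + 2.88$$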
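Regression equation of X on Y:
$$X - \bar{X} = b_{xy}(Y - \bar{Y})$$
$$b_{xy} = \frac{\sum d_x d_y - \frac{\sum d_x \times \sum d_y}{N}}{\sum d_y^2 - \frac{(\sum d_y)^2}{N}} = \frac{127 - \frac{5(-21)}{10}}{201 - \frac{(-21)^2}{10}} = \frac{127 + 10.5}{201 - 44.1} = \frac{137.5}{156.9} = 0.876$$
$$X - 30.5 = 0.876(Y - 24.9)$$
$$X - 30.5 = 0.876Y - 21.81$$
$$X = 0.876Y - 21.81 + 30.5 = 0.876Y + 8.69$$
When Y = 16, X will be
$$X = 0.876(16) + 8.69 = 14.016 + 8.69 = 22.706 \text{ or } 23$$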
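The arithmetic of this solution can be re-run from its totals. Since the raw data of this illustration is not legible, the sketch below starts from the figures that appear in the working (N = 10, ΣX = 305, ΣY = 249, Σdx = 5, Σdy = -21, Σdxdy = 127, Σdx² = 193, Σdy² = 201); the rounding steps copy those in the text.

```python
N = 10
sum_X, sum_Y = 305, 249
sum_dx, sum_dy = 5, -21
sum_dxdy, sum_dx2, sum_dy2 = 127, 193, 201

x_bar, y_bar = sum_X / N, sum_Y / N         # 30.5 and 24.9
numerator = sum_dxdy - sum_dx * sum_dy / N  # 127 + 10.5 = 137.5

# Coefficients rounded to three decimals, as in the worked solution
b_yx = round(numerator / (sum_dx2 - sum_dx ** 2 / N), 3)  # 137.5 / 190.5
b_xy = round(numerator / (sum_dy2 - sum_dy ** 2 / N), 3)  # 137.5 / 156.9

# Intercepts rounded to two decimals, again following the text
a_yx = round(y_bar - b_yx * x_bar, 2)
a_xy = round(x_bar - b_xy * y_bar, 2)
print(f"Y = {b_yx}X + {a_yx}")  # Y = 0.722X + 2.88
print(f"X = {b_xy}Y + {a_xy}")  # X = 0.876Y + 8.69

x_when_y_16 = b_xy * 16 + a_xy
print(round(x_when_y_16, 3))    # 22.706
```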
Illustration 7. You are given the following information:
                          Price (Rs.)    Amount demanded ('000 units)
Arithmetic mean           10             35
Standard deviation        2              5
Coefficient of correlation (r) = 0.8
Obtain the regression equation of amount demanded on price and estimate the likely demand when the price is Rs. 12.5. [B.A. (H) Econ., Delhi Univ., 199…]
Solution. Let the price be denoted by X and the amount demanded by Y. The regression equation of amount demanded (Y) on price (X) will be:
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
$$\bar{Y} = 35, \quad \sigma_y = 5, \quad \bar{X} = 10, \quad \sigma_x = 2, \quad r = 0.8$$
$$Y - 35 = 0.8 \times \frac{5}{2}(X - 10)$$
$$Y - 35 = 2(X - 10) = 2X - 20$$
$$Y = 2X + 15$$
The estimated value of Y when X = 12.5:
$$Y = 2(12.5) + 15 = 25 + 15 = 40$$
Thus $Y_{est}$ = 40 ('000 units).
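A minimal Python sketch of the same calculation, using the symbols of the solution (the function name `estimated_demand` is ours):

```python
x_bar, sigma_x = 10, 2  # price (Rs.)
y_bar, sigma_y = 35, 5  # amount demanded ('000 units)
r = 0.8

b_yx = r * sigma_y / sigma_x  # slope of demand on price
a_yx = y_bar - b_yx * x_bar   # intercept, since the line passes through the means


def estimated_demand(price: float) -> float:
    """Estimated amount demanded ('000 units) at a given price."""
    return a_yx + b_yx * price


print(f"Y = {b_yx}X + {a_yx}")  # Y = 2.0X + 15.0
print(estimated_demand(12.5))   # 40.0
```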
Illustration 8. In a partially destroyed laboratory record of an analysis of correlation data, the following results only are legible:
Variance of X = 9
Regression equations: 8X - 10Y + 66 = 0; 40X - 18Y = 214
Find, on the basis of the above information, (i) the mean values of X and Y, (ii) the coefficient of correlation between X and Y, and (iii) the standard deviation of Y. [B.A. (H) Econ., Delhi Univ., 1994; MCA, Madras Univ., 2002]
Also calculate the standard error of estimate of the regression of Y on X and of X on Y.
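Solution. (i) Mean values of X and Y
Since both regression lines pass through the point $(\bar{X}, \bar{Y})$, the means are obtained by solving the two equations simultaneously:
$$8X - 10Y = -66 \quad \text{(i)}$$
$$40X - 18Y = 214 \quad \text{(ii)}$$
Multiplying eq. (i) by 5 and subtracting it from eq. (ii) gives 32Y = 544, so Y = 17. Substituting in eq. (i):
$$8X = -66 + 170 = 104, \quad X = 13$$
Hence $\bar{X}$ = 13 and $\bar{Y}$ = 17.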
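(ii) For finding out the correlation coefficient, we have to find out the two regression coefficients. But we don't know which of the two equations is the regression equation of Y on X and which is of X on Y, so we make an assumption and check it.
Let us take eq. (i) as the regression equation of X on Y:
$$8X = -66 + 10Y, \quad X = -\frac{66}{8} + \frac{10}{8}Y, \quad b_{xy} = 1.25$$
Eq. (ii) would then be the regression of Y on X: $18Y = 40X - 214$, giving $b_{yx} = \frac{40}{18} = 2.22$. Since $b_{xy} \times b_{yx} = 1.25 \times 2.22 = 2.78$ exceeds 1, this would make $r^2 > 1$; the assumption is therefore wrong.
Hence eq. (i) is the regression equation of Y on X:
$$10Y = 8X + 66, \quad Y = \frac{8}{10}X + 6.6, \quad b_{yx} = 0.8$$
and eq. (ii) is the regression equation of X on Y:
$$40X = 214 + 18Y, \quad X = 5.35 + \frac{18}{40}Y, \quad b_{xy} = 0.45$$
$$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{\frac{8}{10} \times \frac{18}{40}} = \sqrt{0.36} = 0.6$$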
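(iii) $\sigma_x = \sqrt{9} = 3$
$$b_{xy} = r\frac{\sigma_x}{\sigma_y}: \quad 0.45 = 0.6 \times \frac{3}{\sigma_y}$$
$$0.45\sigma_y = 1.8 \quad \text{or} \quad \sigma_y = \frac{1.8}{0.45} = 4$$
Hence the standard deviation of Y is 4.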
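(iv) $S_{yx} = \sigma_y\sqrt{1 - r^2}$, with $\sigma_y$ = 4 and r = 0.6:
$$S_{yx} = 4\sqrt{1 - (0.6)^2} = 4 \times 0.8 = 3.2$$
$S_{xy} = \sigma_x\sqrt{1 - r^2}$, with $\sigma_x$ = 3 and r = 0.6:
$$S_{xy} = 3\sqrt{1 - (0.6)^2} = 3 \times 0.8 = 2.4$$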
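The whole of Illustration 8 can be sketched in a few lines of Python. The helper names are hypothetical; each line is stored as the coefficients (p, q, c) of pX + qY = c, and the loop tries both assignments of the lines, keeping the one for which $b_{yx} \times b_{xy}$ lies between 0 and 1, as the solution does by reasoning.

```python
import math

# Each regression line stored as (p, q, c) for p*X + q*Y = c
line_1 = (8, -10, -66)   # 8X - 10Y + 66 = 0
line_2 = (40, -18, 214)  # 40X - 18Y = 214
var_x = 9


def intersection(l1, l2):
    """Solve the two lines simultaneously (Cramer's rule); gives the means."""
    (p1, q1, c1), (p2, q2, c2) = l1, l2
    det = p1 * q2 - q1 * p2
    return (c1 * q2 - q1 * c2) / det, (p1 * c2 - c1 * p2) / det


def slope_y_on_x(line):
    p, q, _ = line
    return -p / q  # from Y = (c - pX) / q


def slope_x_on_y(line):
    p, q, _ = line
    return -q / p  # from X = (c - qY) / p


x_bar, y_bar = intersection(line_1, line_2)

# Try both assignments; the valid one keeps b_yx * b_xy = r^2 within [0, 1].
for yx_line, xy_line in ((line_1, line_2), (line_2, line_1)):
    b_yx, b_xy = slope_y_on_x(yx_line), slope_x_on_y(xy_line)
    if 0 <= b_yx * b_xy <= 1:
        break
else:
    raise ValueError("neither assignment gives valid regression lines")

r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
sigma_x = math.sqrt(var_x)
sigma_y = r * sigma_x / b_xy            # from b_xy = r * sigma_x / sigma_y
s_yx = sigma_y * math.sqrt(1 - r ** 2)  # standard error of estimate, Y on X
s_xy = sigma_x * math.sqrt(1 - r ** 2)  # standard error of estimate, X on Y

print(x_bar, y_bar)                                       # 13.0 17.0
print(round(b_yx, 2), round(b_xy, 2), round(r, 2))        # 0.8 0.45 0.6
print(round(sigma_y, 2), round(s_yx, 2), round(s_xy, 2))  # 4.0 3.2 2.4
```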
… when $\bar{X}$ = 44, $\bar{Y}$ will be
$$\bar{Y} = 0.6(44) + 36 = 26.4 + 36 = 62.4$$
The mean of the marks in statistics is 62.4.
…
The coefficient of correlation between the marks in the two subjects is 0.5.
Illustration 10. You are given the following data:
                          X      Y
Arithmetic Mean           36     85
Standard Deviation        11     8
Correlation coefficient between X and Y = 0.66
Find the two regression equations and estimate the value of X when Y = 75.
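Solution.
Regression Equation of X on Y
$$X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y})$$
$\bar{X}$ = 36, r = 0.66, $\sigma_x$ = 11, $\sigma_y$ = 8, $\bar{Y}$ = 85
$$X - 36 = 0.66 \times \frac{11}{8}(Y - 85)$$
$$X - 36 = 0.9075(Y - 85)$$
$$X = 0.9075Y - 77.1375 + 36 = 0.9075Y - 41.1375$$
Regression Equation of Y on X
$$Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X})$$
$$Y - 85 = 0.66 \times \frac{8}{11}(X - 36)$$
$$Y - 85 = 0.48(X - 36) = 0.48X - 17.28$$
$$Y = 0.48X + 67.72$$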
From the regression equation of X on Y, we can find out the estimated value of X when Y = 75:
$$X = 0.9075(75) - 41.1375 = 68.0625 - 41.1375 = 26.925$$
Thus X = 26.925.
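The two equations and the estimate can be checked with the short Python sketch below (variable names are ours); note that the intercept of the line of Y on X comes out positive, +67.72.

```python
x_bar, y_bar = 36, 85
sigma_x, sigma_y = 11, 8
r = 0.66

b_xy = r * sigma_x / sigma_y  # regression coefficient of X on Y
b_yx = r * sigma_y / sigma_x  # regression coefficient of Y on X
a_xy = x_bar - b_xy * y_bar   # each line passes through (x_bar, y_bar)
a_yx = y_bar - b_yx * x_bar

print(f"X = {b_xy:.4f}Y - {-a_xy:.4f}")  # X = 0.9075Y - 41.1375
print(f"Y = {b_yx:.2f}X + {a_yx:.2f}")   # Y = 0.48X + 67.72
print(round(a_xy + b_xy * 75, 3))        # 26.925
```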
Illustration 11. For a bivariate distribution, the lines of regression are
3X + 12Y = 19; 3Y + 9X = 46
Find the means and the correlation coefficient. [B.Sc. (H) Chemistry, Delhi Univ.]
Solution. Mean values of X and Y
$$3X + 12Y = 19 \quad \text{(i)}$$
$$3Y + 9X = 46 \quad \text{(ii)}$$
Multiplying equation (i) by 3:
$$9X + 36Y = 57$$
$$9X + 3Y = 46$$
Subtracting, 33Y = 11, so Y = 1/3 = 0.333. Substituting in (i):
$$3X + 12\left(\tfrac{1}{3}\right) = 19 \quad \text{or} \quad 3X + 4 = 19, \quad X = 5$$
Hence the means of X and Y are 5 and 0.333.
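Correlation Coefficient:
Let eq. (i) be the regression equation of Y on X:
$$3X + 12Y = 19 \;\Rightarrow\; 12Y = 19 - 3X \;\Rightarrow\; Y = \frac{19}{12} - \frac{3}{12}X, \quad b_{yx} = -\frac{1}{4}$$
From eq. (ii), the regression equation of X on Y:
$$3Y + 9X = 46 \;\Rightarrow\; 9X = 46 - 3Y \;\Rightarrow\; X = \frac{46}{9} - \frac{3}{9}Y, \quad b_{xy} = -\frac{1}{3}$$
The product $b_{xy} \times b_{yx} = 1/12$ is less than 1, so this assumption is admissible. Since both regression coefficients are negative, r is negative:
$$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{\frac{1}{3} \times \frac{1}{4}} = -0.289$$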
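A short Python sketch of this solution follows, with each line stored as the coefficients (p, q, c) of pX + qY = c (our own representation). It solves for the means, assumes the first line is the regression of Y on X, and checks that assumption by requiring $0 \le b_{yx} \times b_{xy} \le 1$.

```python
import math

# Each line stored as (p, q, c) for p*X + q*Y = c
line_1 = (3, 12, 19)  # 3X + 12Y = 19
line_2 = (9, 3, 46)   # 3Y + 9X = 46

# Means: both lines pass through (x_bar, y_bar); solve by Cramer's rule
(p1, q1, c1), (p2, q2, c2) = line_1, line_2
det = p1 * q2 - q1 * p2
x_bar = (c1 * q2 - q1 * c2) / det
y_bar = (p1 * c2 - c1 * p2) / det

# Assume line_1 is Y on X and line_2 is X on Y, then check the assumption
b_yx = -p1 / q1  # from Y = (19 - 3X) / 12
b_xy = -q2 / p2  # from X = (46 - 3Y) / 9
if not 0 <= b_yx * b_xy <= 1:
    raise ValueError("assumption fails; swap the roles of the two lines")

r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(round(x_bar, 3), round(y_bar, 3))   # 5.0 0.333
print(b_yx, round(b_xy, 4), round(r, 3))  # -0.25 -0.3333 -0.289
```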