STAT1
Books:
1. Theory and Problems of Statistics, by M. R. Spiegel, second edition
2. Probability and Statistics, by Walpole, Myers, and Keying Ye
1. Scatter diagram
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Fig1.
2. Regression analysis
Fig2.
AP=Y1
AQ= a0+ a1X1
PQ=AQ-AP
D1= a0+ a1X1- Y1
D2= Y2 -( a0+ a1X2)= -( a0+ a1 X2- Y2)
…
Dn= a0+ a1Xn- Yn
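The line is chosen so that the sum of the squared deviations
$$S = D_1^2 + D_2^2 + \cdots + D_n^2$$
is a minimum. Setting the partial derivatives of S with respect to $a_0$ and $a_1$ to zero gives
$$\frac{\partial S}{\partial a_0} = 2(a_0 + a_1X_1 - Y_1) + 2(a_0 + a_1X_2 - Y_2) + \cdots + 2(a_0 + a_1X_n - Y_n) = 0$$
$$\frac{\partial S}{\partial a_1} = 2X_1(a_0 + a_1X_1 - Y_1) + 2X_2(a_0 + a_1X_2 - Y_2) + \cdots + 2X_n(a_0 + a_1X_n - Y_n) = 0$$
These simplify to the normal equations (N is the number of data points):
$$\sum Y = N a_0 + a_1 \sum X$$
$$\sum XY = a_0 \sum X + a_1 \sum X^2$$
Solving for $a_0$ and $a_1$:
$$a_0 = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N \sum X^2 - (\sum X)^2}, \qquad a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$$
The regression line of Y on X is
$$Y = a_0 + a_1 X$$
Dividing the first normal equation by N shows that the line passes through the point of means:
$$\bar{Y} = a_0 + a_1 \bar{X}$$
Subtracting,
$$Y - \bar{Y} = a_1 (X - \bar{X}), \quad \text{that is,} \quad y = a_1 x, \quad \text{where } x = X - \bar{X}, \; y = Y - \bar{Y}$$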
a1 is called the regression coefficient of Y on X
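Proof
Substitute $X = x + \bar{X}$ and $Y = y + \bar{Y}$ into
$$a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$$
$$a_1 = \frac{N \sum (x + \bar{X})(y + \bar{Y}) - \sum (x + \bar{X}) \sum (y + \bar{Y})}{N \sum (x + \bar{X})^2 - \left(\sum (x + \bar{X})\right)^2}$$
$$a_1 = \frac{N \sum (xy + x\bar{Y} + y\bar{X} + \bar{X}\bar{Y}) - (\sum x + N\bar{X})(\sum y + N\bar{Y})}{N \sum (x + \bar{X})^2 - \left(\sum (x + \bar{X})\right)^2}$$
Since $\sum x = \sum y = 0$, all terms containing $\bar{X}$ or $\bar{Y}$ cancel:
$$a_1 = \frac{N \sum xy}{N \sum x^2} = \frac{\sum xy}{\sum x^2}$$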
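The equality of the two expressions for a1 can be checked numerically. The following is a minimal Python sketch using the eight data pairs from these notes; the function names `slope_raw` and `slope_dev` are illustrative and do not come from the notes. Exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]

def slope_raw(X, Y):
    """a1 from raw sums: (N*SXY - SX*SY) / (N*SXX - SX^2)."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    return Fraction(n * sxy - sx * sy, n * sxx - sx * sx)

def slope_dev(X, Y):
    """a1 from deviations: sum(x*y) / sum(x^2), with x = X - mean(X), y = Y - mean(Y)."""
    n = len(X)
    xbar, ybar = Fraction(sum(X), n), Fraction(sum(Y), n)
    num = sum((a - xbar) * (b - ybar) for a, b in zip(X, Y))
    den = sum((a - xbar) ** 2 for a in X)
    return num / den

print(slope_raw(X, Y), slope_dev(X, Y))  # 7/11 7/11
```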
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Solution:
          X     Y    XY   X^2
          1     1     1     1
          3     2     6     9
          4     4    16    16
          6     4    24    36
          8     5    40    64
          9     7    63    81
         11     8    88   121
         14     9   126   196
Total    56    40   364   524
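The normal equations are
$$\sum Y = N a_0 + a_1 \sum X$$
$$\sum XY = a_0 \sum X + a_1 \sum X^2$$
Substituting the totals:
$$40 = 8a_0 + 56a_1$$
$$364 = 56a_0 + 524a_1$$
$$a_0 = \frac{6}{11}, \qquad a_1 = \frac{7}{11}$$
$$Y = \frac{6}{11} + \frac{7}{11} X$$
When X = 10, Y = 76/11. As a check, the line passes through the point of means: when X = 7, Y = 5.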
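The solution of the two normal equations can be reproduced with a short Python sketch. The helper name `fit_line` is an assumption made for illustration; it solves the 2x2 system by Cramer's rule, which gives the same closed-form expressions for a0 and a1 stated earlier in these notes.

```python
from fractions import Fraction

def fit_line(X, Y):
    """Solve the normal equations for (a0, a1) of the line Y = a0 + a1*X."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    det = n * sxx - sx * sx                  # determinant of the 2x2 system
    a0 = Fraction(sy * sxx - sx * sxy, det)
    a1 = Fraction(n * sxy - sx * sy, det)
    return a0, a1

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
a0, a1 = fit_line(X, Y)
print(a0, a1)        # 6/11 7/11
print(a0 + a1 * 10)  # 76/11, the estimate of Y at X = 10
```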
OR,
Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$, with $\bar{X} = 7$ and $\bar{Y} = 5$.
          X     Y     x     y   x^2    xy   y^2
          1     1    -6    -4    36    24    16
          3     2    -4    -3    16    12     9
          4     4    -3    -1     9     3     1
          6     4    -1    -1     1     1     1
          8     5     1     0     1     0     0
          9     7     2     2     4     4     4
         11     8     4     3    16    12     9
         14     9     7     4    49    28    16
Total    56    40     0     0   132    84    56
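$$a_1 = \frac{\sum xy}{\sum x^2} = \frac{84}{132} = \frac{7}{11}$$
$$y = \frac{\sum xy}{\sum x^2}\, x$$
$$Y - \bar{Y} = \frac{\sum xy}{\sum x^2} (X - \bar{X})$$
$$Y - 5 = \frac{7}{11} (X - 7)$$
$$Y = \frac{6}{11} + \frac{7}{11} X$$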
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Solution:
          X     Y    XY   X^2
          1     1     1     1
          3     2     6     9
          4     4    16    16
          6     4    24    36
          8     5    40    64
          9     7    63    81
         11     8    88   121
         14     9   126   196
Total    56    40   364   524
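$$a_0 = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N \sum X^2 - (\sum X)^2} = \frac{6}{11}$$
$$a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2} = \frac{7}{11}$$
$$Y = \frac{6}{11} + \frac{7}{11} X$$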
Fig3
Daily rainfall in cm (X)             4.3   4.5   5.9   5.6   6.1   5.2   3.8   2.1
Pollution removed in mg/m^3 (Y)     12.6  12.1  11.6  11.8  11.4  11.8  13.2  14.1
Y=15.49-0.675X
Ex.4 The following data regarding the heights (Y) and weights (X)
of 100 students are given.
10Y=X+530
Fig4.
AP=X1
AQ= b0+b1Y1
PQ=AQ-AP
D1= b0+ b1Y1- X1
D2= X2 -( b0+ b1Y2)= -( b0+ b1Y2- X2)
………..
…………..
Dn= b0+ b1Yn-Xn
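With $S = D_1^2 + D_2^2 + \cdots + D_n^2$, setting the partial derivatives with respect to $b_0$ and $b_1$ to zero gives
$$\frac{\partial S}{\partial b_0} = 2(b_0 + b_1Y_1 - X_1) + 2(b_0 + b_1Y_2 - X_2) + \cdots + 2(b_0 + b_1Y_n - X_n) = 0$$
$$\frac{\partial S}{\partial b_1} = 2Y_1(b_0 + b_1Y_1 - X_1) + 2Y_2(b_0 + b_1Y_2 - X_2) + \cdots + 2Y_n(b_0 + b_1Y_n - X_n) = 0$$
These simplify to the normal equations
$$\sum X = N b_0 + b_1 \sum Y$$
$$\sum XY = b_0 \sum Y + b_1 \sum Y^2$$
Solving for $b_0$ and $b_1$:
$$b_0 = \frac{\sum X \sum Y^2 - \sum Y \sum XY}{N \sum Y^2 - (\sum Y)^2}, \qquad b_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - (\sum Y)^2}$$
The regression line of X on Y is
$$X = b_0 + b_1 Y, \qquad \bar{X} = b_0 + b_1 \bar{Y}$$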
b1 is called the regression coefficient of X on Y
Ex.5 Find the regression line of X on Y for the following data.
Estimate the value of X when Y=6
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Solution:
          X     Y    XY   Y^2
          1     1     1     1
          3     2     6     4
          4     4    16    16
          6     4    24    16
          8     5    40    25
          9     7    63    49
         11     8    88    64
         14     9   126    81
Total    56    40   364   256
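$$b_0 = \frac{\sum X \sum Y^2 - \sum Y \sum XY}{N \sum Y^2 - (\sum Y)^2} = -\frac{1}{2}$$
$$b_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - (\sum Y)^2} = \frac{3}{2}$$
$$X = -\frac{1}{2} + \frac{3}{2} Y$$
When Y = 6, X = 17/2 = 8.5.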
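The regression of X on Y in Ex.5 can be reproduced the same way. This is a minimal Python sketch; the helper name `fit_x_on_y` is illustrative and is not taken from the notes. It applies the formulas for b0 and b1 and then evaluates the line at Y = 6.

```python
from fractions import Fraction

def fit_x_on_y(X, Y):
    """Return (b0, b1) of the least-squares line X = b0 + b1*Y."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    syy = sum(b * b for b in Y)
    det = n * syy - sy * sy
    b0 = Fraction(sx * syy - sy * sxy, det)
    b1 = Fraction(n * sxy - sx * sy, det)
    return b0, b1

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
b0, b1 = fit_x_on_y(X, Y)
print(b0, b1)       # -1/2 3/2
print(b0 + b1 * 6)  # 17/2, the estimate of X at Y = 6
```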
Fig5
Ex.6 The following data regarding the heights (Y) and weights (X)
of 100 students are given.
3. Correlation
When two variables X and Y are related so that an increase in one
is accompanied by an increase or a decrease in the other, the
variables are said to be correlated. For example, the yield of a crop
varies with the amount of rainfall.
Linear correlation:
If all points seem to lie near a straight line, the correlation is said to
be linear.
Fig6.
Nonlinear correlation:
If all points seem to lie near a curve, the correlation is said to be
nonlinear.
Fig7.
Fig8.
Fig9.
Fig10.
Fig11.
$\sum (Y_{est} - \bar{Y})^2$ is called the explained variation because
$Y_{est} - \bar{Y}$ has a definite pattern.
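Since $y_{est} = a_1 x$,
$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{\sum (Y_{est} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\sum y_{est}^2}{\sum y^2} = \frac{a_1^2 \sum x^2}{\sum y^2}$$
$$r^2 = \left(\frac{\sum xy}{\sum x^2}\right)^2 \frac{\sum x^2}{\sum y^2} = \frac{(\sum xy)^2}{\sum x^2 \sum y^2}$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad \text{where } x = X - \bar{X}, \; y = Y - \bar{Y}$$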
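The chain of equalities for r^2 can be checked numerically. The following Python sketch, using the data pairs from these notes, computes r^2 once as explained variation divided by total variation and once from the closed form; the variable names are illustrative only.

```python
import math

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [a - xbar for a in X]          # deviations x = X - mean(X)
y = [b - ybar for b in Y]          # deviations y = Y - mean(Y)

sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

a1 = sxy / sxx                      # regression coefficient of Y on X
y_est = [a1 * a for a in x]         # fitted deviations y_est = a1 * x

r2_variation = sum(v * v for v in y_est) / syy   # explained / total variation
r2_formula = sxy ** 2 / (sxx * syy)              # (sum xy)^2 / (sum x^2 * sum y^2)

print(round(r2_variation, 4), round(r2_formula, 4))  # 0.9545 0.9545
```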
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Soln:
Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
          X     Y     x     y   x^2    xy   y^2
          1     1    -6    -4    36    24    16
          3     2    -4    -3    16    12     9
          4     4    -3    -1     9     3     1
          6     4    -1    -1     1     1     1
          8     5     1     0     1     0     0
          9     7     2     2     4     4     4
         11     8     4     3    16    12     9
         14     9     7     4    49    28    16
Total    56    40     0     0   132    84    56
$$\bar{X} = 7, \qquad \bar{Y} = 5$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{84}{\sqrt{132 \times 56}} = \frac{84}{85.98} = 0.977$$
X 1 3 4 6 8 9 11 14
Y 9 8 7 5 4 4 2 1
Soln:
Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
          X     Y     x     y   x^2    xy   y^2
          1     9    -6     4    36   -24    16
          3     8    -4     3    16   -12     9
          4     7    -3     2     9    -6     4
          6     5    -1     0     1     0     0
          8     4     1    -1     1    -1     1
          9     4     2    -1     4    -2     1
         11     2     4    -3    16   -12     9
         14     1     7    -4    49   -28    16
Total    56    40     0     0   132   -85    56
$$\bar{X} = 7, \qquad \bar{Y} = 5$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{-85}{\sqrt{132 \times 56}} = \frac{-85}{85.98} = -0.989$$
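Both worked examples above can be reproduced with one small function. This is a minimal Python sketch of the deviation form of r; the name `corr_dev` and the list names `Y_up` and `Y_down` are assumptions made for illustration.

```python
import math

def corr_dev(X, Y):
    """r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)), with x, y deviations from the means."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(X, Y))
    sxx = sum((a - xbar) ** 2 for a in X)
    syy = sum((b - ybar) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y_up = [1, 2, 4, 4, 5, 7, 8, 9]     # Y increases with X
Y_down = [9, 8, 7, 5, 4, 4, 2, 1]   # Y decreases with X

print(round(corr_dev(X, Y_up), 3))    # 0.977
print(round(corr_dev(X, Y_down), 3))  # -0.989
```

The sign of r follows the sign of the sum of the products xy, so the second data set gives a negative coefficient.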
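$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$
$$\begin{aligned}
\sum (X - \bar{X})(Y - \bar{Y}) &= \sum (XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y}) \\
&= \sum XY - \bar{X} \sum Y - \bar{Y} \sum X + N\bar{X}\bar{Y} \\
&= \sum XY - \bar{X} N \bar{Y} - \bar{Y} N \bar{X} + N\bar{X}\bar{Y} \\
&= \sum XY - N\bar{X}\bar{Y} \\
&= \sum XY - \frac{\sum X \sum Y}{N}
\end{aligned}$$
$$\begin{aligned}
\sum (X - \bar{X})^2 &= \sum (X^2 - 2X\bar{X} + \bar{X}^2) \\
&= \sum X^2 - 2\bar{X} \sum X + N\bar{X}^2 \\
&= \sum X^2 - 2\bar{X} N \bar{X} + N\bar{X}^2 \\
&= \sum X^2 - N\bar{X}^2 \\
&= \sum X^2 - \frac{(\sum X)^2}{N}
\end{aligned}$$
Similarly
$$\sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{N}$$
$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left(N \sum X^2 - (\sum X)^2\right)\left(N \sum Y^2 - (\sum Y)^2\right)}}$$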
Example
Find the coefficient of correlation between the variables X and Y
presented in the following table.
X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9
Soln:
          X     Y    XY   X^2   Y^2
          1     1     1     1     1
          3     2     6     9     4
          4     4    16    16    16
          6     4    24    36    16
          8     5    40    64    25
          9     7    63    81    49
         11     8    88   121    64
         14     9   126   196    81
Total    56    40   364   524   256
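$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left(N \sum X^2 - (\sum X)^2\right)\left(N \sum Y^2 - (\sum Y)^2\right)}} = \frac{8 \times 364 - 56 \times 40}{\sqrt{(8 \times 524 - 56^2)(8 \times 256 - 40^2)}} = \frac{672}{\sqrt{1056 \times 448}} = 0.977$$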
Comment: high correlation
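The raw-sum form of r used in this example needs only the five column totals. The following Python sketch (the function name `corr_raw` is illustrative, not from the notes) computes r directly from the sums of X, Y, XY, X^2 and Y^2.

```python
import math

def corr_raw(X, Y):
    """r from raw sums: (N*SXY - SX*SY) / sqrt((N*SXX - SX^2) * (N*SYY - SY^2))."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(a * b for a, b in zip(X, Y))
    sxx = sum(a * a for a in X)
    syy = sum(b * b for b in Y)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    return num / den

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
print(round(corr_raw(X, Y), 3))  # 0.977
```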
Laboratory 8 3 9 2 7 10 4 6 1 5
Lecture 9 5 10 1 8 7 3 4 2 6
OR,
The following are the ranks given by two judges to 10 debaters in
a competition. Find the coefficient of rank correlation.
Judge I 8 3 9 2 7 10 4 6 1 5
Judge II 9 5 10 1 8 7 3 4 2 6
Soln:
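Judge I     8   3   9   2   7  10   4   6   1   5
Judge II    9   5  10   1   8   7   3   4   2   6
D          -1  -2  -1   1  -1   3   1   2  -1  -1
D^2         1   4   1   1   1   9   1   4   1   1
$$\sum D^2 = 24, \qquad n = 10$$
$$r_{\text{rank}} = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 24}{10 \times 99} = 0.8545$$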
Comment: high correlation
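The rank-correlation calculation for the two judges can be sketched in Python as follows. The function name `rank_corr` is an assumption for illustration, and the formula is applied as written in these notes, which presumes the ranks contain no ties.

```python
from fractions import Fraction

def rank_corr(ranks1, ranks2):
    """Rank correlation 1 - 6*sum(D^2) / (n*(n^2 - 1)), assuming no tied ranks."""
    n = len(ranks1)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - Fraction(6 * d2, n * (n * n - 1))

judge1 = [8, 3, 9, 2, 7, 10, 4, 6, 1, 5]
judge2 = [9, 5, 10, 1, 8, 7, 3, 4, 2, 6]
rho = rank_corr(judge1, judge2)
print(rho, round(float(rho), 4))  # 47/55 0.8545
```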
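1. $s_X^2 = \dfrac{\sum (X - \bar{X})^2}{N} = \dfrac{\sum x^2}{N}$, so $N s_X^2 = \sum x^2$
2. $s_Y^2 = \dfrac{\sum (Y - \bar{Y})^2}{N} = \dfrac{\sum y^2}{N}$, so $N s_Y^2 = \sum y^2$
3. $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, so $N r s_X s_Y = \sum xy$
We know that
$$\left(\frac{x}{s_X} \pm \frac{y}{s_Y}\right)^2 \ge 0$$
$$\frac{x^2}{s_X^2} \pm \frac{2xy}{s_X s_Y} + \frac{y^2}{s_Y^2} \ge 0$$
Taking summation
$$\frac{\sum x^2}{s_X^2} \pm \frac{2 \sum xy}{s_X s_Y} + \frac{\sum y^2}{s_Y^2} \ge 0$$
$$N \pm 2rN + N \ge 0$$
$$1 \pm r \ge 0 \;\Rightarrow\; 1 + r \ge 0 \text{ and } 1 - r \ge 0 \;\Rightarrow\; -1 \le r \le 1$$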
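The key step of this argument, that the summed squares equal N ± 2rN + N, can be checked numerically. The following is a minimal Python sketch on the data pairs from these notes; the variable names `plus` and `minus` are illustrative. It is a numerical check for one data set, not a substitute for the proof.

```python
import math

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [a - xbar for a in X]
y = [b - ybar for b in Y]

s_x = math.sqrt(sum(a * a for a in x) / n)   # standard deviation of X (divisor N)
s_y = math.sqrt(sum(b * b for b in y) / n)   # standard deviation of Y (divisor N)
r = sum(a * b for a, b in zip(x, y)) / (n * s_x * s_y)

# Sums of the two non-negative squares used in the proof
plus = sum((a / s_x + b / s_y) ** 2 for a, b in zip(x, y))
minus = sum((a / s_x - b / s_y) ** 2 for a, b in zip(x, y))

print(math.isclose(plus, 2 * n * (1 + r), abs_tol=1e-9))   # N + 2rN + N
print(math.isclose(minus, 2 * n * (1 - r), abs_tol=1e-9))  # N - 2rN + N
print(-1 <= r <= 1)
```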