
STAT1

The document covers fundamental concepts in statistics, including correlation, regression analysis, and elementary probability theory. It explains methods for analyzing relationships between variables using scatter diagrams, regression lines, and correlation coefficients. Additionally, it provides examples and formulas for calculating regression lines and correlation coefficients based on given data.

Uploaded by

mhmnrahman

Statistics and elementary quality control

Correlation, regression and elementary probability theory; binomial, Poisson and normal distributions; tests of hypothesis.
Application of elementary quality control to practical problems.

Books:
1. Theory and Problems of Statistics, by M. R. Spiegel, second edition
2. Probability and Statistics, by Walpole, Myers and Keying Ye

Relationship between variables


Examples of related variables are height and weight, hours studied and marks obtained, and yield of crops and fertilizer used.
These relationships can be studied by three methods:
1. Scatter diagram
2. Regression analysis
3. Correlation

1. Scatter diagram

If the pairs of values of two variables are plotted as points in the XY plane, the resulting plot is called a scatter diagram.

Ex.1 Draw a scatter diagram for the following variables.

X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

Fig1.

2. Regression analysis

Regression analysis is the method used for estimating the unknown value of one variable corresponding to a known value of another variable.

(a) Regression line of Y on X (curve fitting by the method of least squares)

Let $Y = a_0 + a_1 X$ be the equation of the line of regression of Y on X.

Fig2.

$AP = Y_1$
$AQ = a_0 + a_1 X_1$
$PQ = AQ - AP$
$D_1 = a_0 + a_1 X_1 - Y_1$
$D_2 = Y_2 - (a_0 + a_1 X_2) = -(a_0 + a_1 X_2 - Y_2)$
...
$D_n = a_0 + a_1 X_n - Y_n$

The curve having the property that $D_1^2 + D_2^2 + D_3^2 + \dots + D_n^2$ is a minimum is called the best-fitting curve.
Let $S$ be the sum of the squares of these distances:
$$S = (a_0 + a_1 X_1 - Y_1)^2 + (a_0 + a_1 X_2 - Y_2)^2 + \dots + (a_0 + a_1 X_n - Y_n)^2$$
According to the principle of least squares, we have to choose $a_0$ and $a_1$ such that $S$ is a minimum.

$$\frac{\partial S}{\partial a_0} = 2(a_0 + a_1 X_1 - Y_1) + 2(a_0 + a_1 X_2 - Y_2) + \dots + 2(a_0 + a_1 X_n - Y_n) = 0$$

$$\frac{\partial S}{\partial a_1} = 2X_1(a_0 + a_1 X_1 - Y_1) + 2X_2(a_0 + a_1 X_2 - Y_2) + \dots + 2X_n(a_0 + a_1 X_n - Y_n) = 0$$

$$\sum Y = N a_0 + a_1 \sum X$$

$$\sum XY = a_0 \sum X + a_1 \sum X^2$$

These are called the normal equations of the regression line of Y on X.

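Solving the normal equations gives

$$a_0 = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N \sum X^2 - (\sum X)^2}, \qquad a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$$

$$Y = a_0 + a_1 X$$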
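
The closed-form expressions for the two coefficients can be turned into a few lines of code. The following is a minimal sketch, not part of the original notes: the function name `fit_y_on_x` is hypothetical, and exact fractions are used on the assumption that the data are integers, as in the examples of these notes.

```python
from fractions import Fraction

def fit_y_on_x(xs, ys):
    """Return (a0, a1) for the least-squares line Y = a0 + a1*X.

    Assumes integer data so that Fraction(int, int) gives exact results.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Common denominator N*sum(X^2) - (sum X)^2; it is zero if all X are equal.
    denom = n * sum_xx - sum_x * sum_x
    a0 = Fraction(sum_y * sum_xx - sum_x * sum_xy, denom)
    a1 = Fraction(n * sum_xy - sum_x * sum_y, denom)
    return a0, a1

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
a0, a1 = fit_y_on_x(X, Y)
print(a0, a1)        # 6/11 7/11
print(a0 + a1 * 10)  # estimate of Y at X = 10: 76/11
```

Working in exact fractions keeps the output directly comparable with hand calculation; for measured (non-integer) data, ordinary floating-point arithmetic would be the more natural choice.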

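Note that $D_1 + D_2 + D_3 + \dots + D_n = 0$ for this line, because

$$N a_0 + a_1 (X_1 + X_2 + \dots + X_n) - (Y_1 + Y_2 + \dots + Y_n) = N a_0 + a_1 N \bar{X} - N \bar{Y} = N(a_0 + a_1 \bar{X} - \bar{Y}) = 0$$

by the first normal equation. Since this sum vanishes for any line through $(\bar{X}, \bar{Y})$, the plain sum $\sum (a_0 + a_1 X_i - Y_i)$ gives no formula for the coefficients; this is why the squares of the distances are used.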
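
The zero-sum property of the distances can be checked numerically. The sketch below is illustrative only: it uses the data and the fitted line $Y = 6/11 + (7/11)X$ from the worked example in these notes, and compares it with an arbitrarily chosen second line, $Y = X - 2$, that also passes through $(\bar{X}, \bar{Y}) = (7, 5)$.

```python
from fractions import Fraction

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]

# Least-squares line from the worked example: Y = 6/11 + (7/11) X
a0, a1 = Fraction(6, 11), Fraction(7, 11)
d_ls = [a0 + a1 * x - y for x, y in zip(X, Y)]

# A different line through the mean point (7, 5), chosen only for comparison: Y = X - 2
d_other = [(x - 2) - y for x, y in zip(X, Y)]

sum_ls = sum(d_ls)        # signed distances cancel
sum_other = sum(d_other)  # they cancel for this line too
ss_ls = sum(d * d for d in d_ls)
ss_other = sum(d * d for d in d_other)

print(sum_ls, sum_other)  # 0 0
print(ss_ls, ss_other)    # 28/11 20
```

Both lines have a zero sum of distances, but only the least-squares line has the smaller sum of squared distances.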

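Since $Y = a_0 + a_1 X$ and $\bar{Y} = a_0 + a_1 \bar{X}$, subtracting gives

$$Y - \bar{Y} = a_1 (X - \bar{X}), \quad \text{i.e.} \quad y = a_1 x, \qquad x = X - \bar{X}, \; y = Y - \bar{Y}$$

$a_1$ is called the regression coefficient of Y on X.

If $a_1$ is the regression coefficient of Y on X, show that it can be written as

$$a_1 = \frac{\sum xy}{\sum x^2}, \quad \text{where } x = X - \bar{X}, \; y = Y - \bar{Y}$$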

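Proof

Start from

$$a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$$

and substitute $X = x + \bar{X}$, $Y = y + \bar{Y}$:

$$a_1 = \frac{N \sum (x + \bar{X})(y + \bar{Y}) - \sum (x + \bar{X}) \sum (y + \bar{Y})}{N \sum (x + \bar{X})^2 - \left(\sum (x + \bar{X})\right)^2}$$

$$a_1 = \frac{N \sum (xy + x\bar{Y} + y\bar{X} + \bar{X}\bar{Y}) - (\sum x + N\bar{X})(\sum y + N\bar{Y})}{N \sum (x + \bar{X})^2 - \left(\sum (x + \bar{X})\right)^2}$$

Since $\sum x = \sum y = 0$, the numerator reduces to $N \sum xy + N^2 \bar{X}\bar{Y} - N^2 \bar{X}\bar{Y} = N \sum xy$ and the denominator to $N \sum x^2 + N^2 \bar{X}^2 - N^2 \bar{X}^2 = N \sum x^2$, so

$$a_1 = \frac{N \sum xy}{N \sum x^2} = \frac{\sum xy}{\sum x^2}$$

and the regression line is

$$y = \frac{\sum xy}{\sum x^2}\, x$$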
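
The equivalence of the two expressions for $a_1$ can be confirmed on concrete numbers. This is a small sketch under the assumption of integer data; the function names are hypothetical and the second data set is made up purely to exercise a non-integer mean.

```python
from fractions import Fraction

def slope_from_raw_sums(xs, ys):
    """a1 = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - (sum X)^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x * sum_x)

def slope_from_deviations(xs, ys):
    """a1 = sum(xy) / sum(x^2), with x, y the deviations from the means."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    return sum(u * v for u, v in zip(dx, dy)) / sum(u * u for u in dx)

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
print(slope_from_raw_sums(X, Y), slope_from_deviations(X, Y))  # 7/11 7/11

# Made-up data whose means are not integers
X2, Y2 = [2, 3, 7], [1, 5, 4]
print(slope_from_raw_sums(X2, Y2) == slope_from_deviations(X2, Y2))  # True
```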

Ex.1. Find the regression line of Y on X for the following data. Estimate the value of Y when X = 10.

X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

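Solution:

   X    Y    XY   X^2
   1    1     1     1
   3    2     6     9
   4    4    16    16
   6    4    24    36
   8    5    40    64
   9    7    63    81
  11    8    88   121
  14    9   126   196
  56   40   364   524   (totals)

The normal equations

$$\sum Y = N a_0 + a_1 \sum X, \qquad \sum XY = a_0 \sum X + a_1 \sum X^2$$

become

$$40 = 8a_0 + 56a_1$$
$$364 = 56a_0 + 524a_1$$

so that

$$a_0 = \frac{6}{11}, \qquad a_1 = \frac{7}{11}$$

$$Y = \frac{6}{11} + \frac{7}{11} X$$

When $X = 10$, $Y = \frac{76}{11}$. As a check, the line passes through $(\bar{X}, \bar{Y})$: when $X = 7$, $Y = 5$.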

OR,

   X    Y   x = X - X̄   y = Y - Ȳ   x^2   xy   y^2
   1    1      -6          -4        36    24    16
   3    2      -4          -3        16    12     9
   4    4      -3          -1         9     3     1
   6    4      -1          -1         1     1     1
   8    5       1           0         1     0     0
   9    7       2           2         4     4     4
  11    8       4           3        16    12     9
  14    9       7           4        49    28    16
  56   40       0           0       132    84    56   (totals)

$$a_1 = \frac{\sum xy}{\sum x^2} = \frac{84}{132} = \frac{7}{11}$$

$$y = \frac{\sum xy}{\sum x^2}\, x$$

$$Y - \bar{Y} = \frac{\sum xy}{\sum x^2} (X - \bar{X})$$

$$Y - 5 = \frac{7}{11}(X - 7)$$

$$Y = \frac{6}{11} + \frac{7}{11} X$$

Ex.2 Find the regression line of Y on X for the following data. Estimate the value of Y when X = 10.

X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

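Solution:

   X    Y    XY   X^2
   1    1     1     1
   3    2     6     9
   4    4    16    16
   6    4    24    36
   8    5    40    64
   9    7    63    81
  11    8    88   121
  14    9   126   196
  56   40   364   524   (totals)

Substituting the totals directly into the formulas for $a_0$ and $a_1$:

$$a_0 = \frac{\sum Y \sum X^2 - \sum X \sum XY}{N \sum X^2 - (\sum X)^2} = \frac{6}{11}$$

$$a_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2} = \frac{7}{11}$$

$$Y = \frac{6}{11} + \frac{7}{11} X$$

When $X = 10$, $Y = \frac{76}{11}$.

Fig3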

Ex.3 In a study of the relationship between the amount of rainfall and the quantity of air pollution removed, the following data were collected. Find the regression line of Y on X.

Daily rainfall in cm (X):          4.3   4.5   5.9   5.6   6.1   5.2   3.8   2.1
Pollution removed in mg/m3 (Y):   12.6  12.1  11.6  11.8  11.4  11.8  13.2  14.1

Answer: Y = 15.53 - 0.684X (rounded; from N = 8, ΣX = 37.5, ΣY = 98.6, ΣXY = 453.82, ΣX^2 = 188.01)

Ex.4 The following data regarding the heights (Y) and weights (X) of 100 students are given.

$$\sum X = 15000, \quad \sum Y = 6800, \quad \sum X^2 = 2272500, \quad \sum XY = 1022250, \quad \sum Y^2 = 463025$$

Find the regression line of height on weight.

Answer: 10Y = X + 530
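
When only the totals are given, as in Ex.4, the same formulas apply without the raw data. The sketch below is one possible way to organize the arithmetic; the function name `line_from_sums` is hypothetical.

```python
from fractions import Fraction

def line_from_sums(n, sum_x, sum_y, sum_xy, sum_xx):
    """Return (a0, a1) of Y = a0 + a1*X from summary totals (integers assumed)."""
    a1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_xx - sum_x ** 2)
    # The line passes through the mean point, so a0 = Y_bar - a1 * X_bar.
    a0 = Fraction(sum_y, n) - a1 * Fraction(sum_x, n)
    return a0, a1

# Totals of Ex.4: 100 students, X = weight, Y = height
a0, a1 = line_from_sums(100, 15000, 6800, 1022250, 2272500)
print(a0, a1)  # 53 1/10
```

The result $Y = 53 + X/10$ is the stated answer $10Y = X + 530$ after multiplying through by 10.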

(b) Regression line of X on Y

Let X=b0+b1Y be the equation of the line of regression of X on Y

Fig4.

$AP = X_1$
$AQ = b_0 + b_1 Y_1$
$PQ = AQ - AP$
$D_1 = b_0 + b_1 Y_1 - X_1$
$D_2 = X_2 - (b_0 + b_1 Y_2) = -(b_0 + b_1 Y_2 - X_2)$
...
$D_n = b_0 + b_1 Y_n - X_n$

Let $S$ be the sum of the squares of these distances:
$$S = (b_0 + b_1 Y_1 - X_1)^2 + (b_0 + b_1 Y_2 - X_2)^2 + \dots + (b_0 + b_1 Y_n - X_n)^2$$
According to the principle of least squares, we have to choose $b_0$ and $b_1$ such that $S$ is a minimum.

$$\frac{\partial S}{\partial b_0} = 2(b_0 + b_1 Y_1 - X_1) + 2(b_0 + b_1 Y_2 - X_2) + \dots + 2(b_0 + b_1 Y_n - X_n) = 0$$

$$\frac{\partial S}{\partial b_1} = 2Y_1(b_0 + b_1 Y_1 - X_1) + 2Y_2(b_0 + b_1 Y_2 - X_2) + \dots + 2Y_n(b_0 + b_1 Y_n - X_n) = 0$$

$$\sum X = N b_0 + b_1 \sum Y$$

$$\sum XY = b_0 \sum Y + b_1 \sum Y^2$$

These are called the normal equations of the regression line of X on Y. Solving them gives

$$b_0 = \frac{\sum X \sum Y^2 - \sum Y \sum XY}{N \sum Y^2 - (\sum Y)^2}, \qquad b_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - (\sum Y)^2}$$

$$X = b_0 + b_1 Y, \qquad \bar{X} = b_0 + b_1 \bar{Y}$$

$b_1$ is called the regression coefficient of X on Y.
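
The X-on-Y coefficients follow the same pattern as the Y-on-X ones, with the roles of the two variables exchanged. A minimal sketch, assuming integer data and using the hypothetical name `fit_x_on_y`:

```python
from fractions import Fraction

def fit_x_on_y(xs, ys):
    """Return (b0, b1) for the least-squares line X = b0 + b1*Y."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_yy = sum(y * y for y in ys)
    denom = n * sum_yy - sum_y * sum_y  # zero if all Y are equal
    b0 = Fraction(sum_x * sum_yy - sum_y * sum_xy, denom)
    b1 = Fraction(n * sum_xy - sum_x * sum_y, denom)
    return b0, b1

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
b0, b1 = fit_x_on_y(X, Y)
print(b0, b1)       # -1/2 3/2
print(b0 + b1 * 6)  # estimate of X at Y = 6: 17/2
```

Note that this line is generally not the Y-on-X line solved for X: for these data the Y-on-X line is $Y = 6/11 + (7/11)X$, whose inverse has slope $11/7$, not $3/2$.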

Ex.5 Find the regression line of X on Y for the following data. Estimate the value of X when Y = 6.

X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

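Solution:

   X    Y    XY   Y^2
   1    1     1     1
   3    2     6     4
   4    4    16    16
   6    4    24    16
   8    5    40    25
   9    7    63    49
  11    8    88    64
  14    9   126    81
  56   40   364   256   (totals)

$$b_0 = \frac{\sum X \sum Y^2 - \sum Y \sum XY}{N \sum Y^2 - (\sum Y)^2} = -\frac{1}{2}$$

$$b_1 = \frac{N \sum XY - \sum X \sum Y}{N \sum Y^2 - (\sum Y)^2} = \frac{3}{2}$$

$$X = -\frac{1}{2} + \frac{3}{2} Y$$

When $Y = 6$, $X = \frac{17}{2} = 8.5$.

Fig5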

Ex.6 The following data regarding the heights (Y) and weights (X) of 100 students are given.

$$\sum X = 15000, \quad \sum Y = 6800, \quad \sum X^2 = 2272500, \quad \sum XY = 1022250, \quad \sum Y^2 = 463025$$

Find the regression line of weight on height.
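
The notes give no answer for Ex.6. One way to work it out is to apply the X-on-Y formulas to the totals, as in the sketch below; the function name `x_on_y_from_sums` is hypothetical and the printed values are simply what these formulas yield for the given totals.

```python
from fractions import Fraction

def x_on_y_from_sums(n, sum_x, sum_y, sum_xy, sum_yy):
    """Return (b0, b1) of X = b0 + b1*Y from summary totals (integers assumed)."""
    b1 = Fraction(n * sum_xy - sum_x * sum_y, n * sum_yy - sum_y ** 2)
    # The line passes through the mean point, so b0 = X_bar - b1 * Y_bar.
    b0 = Fraction(sum_x, n) - b1 * Fraction(sum_y, n)
    return b0, b1

# Totals of Ex.6: 100 students, X = weight, Y = height
b0, b1 = x_on_y_from_sums(100, 15000, 6800, 1022250, 463025)
print(b0, b1)                # -474/5 18/5
print(float(b0), float(b1))  # -94.8 3.6
```

Under these formulas the regression line of weight on height is $X = -94.8 + 3.6Y$.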

3. Correlation

When two variables X and Y are so related that an increase in one is accompanied by an increase or decrease in the other, the variables are said to be correlated. For example, the yield of a crop varies with an increase in rainfall.

Two types of correlation: linear and nonlinear.

Linear correlation:
If all points seem to lie near a straight line, the correlation is said to be linear.

Fig6.

Nonlinear correlation:
If all points seem to lie near a curve, the correlation is said to be nonlinear.

Fig7.

If y tends to increase as x increases, the correlation is said to be positive.

Fig8.

If y tends to decrease as x increases, the correlation is said to be negative.

Fig9.

If there is no relation indicated between the variables, we say there is no correlation between them (i.e. they are uncorrelated).

Fig10.

If all points lie on a straight line, the correlation is said to be perfect.

Fig11.

Simple correlation: when only two variables are involved.
Multiple correlation: when more than two variables are involved.

Formula: Karl Pearson's coefficient of correlation (correlation coefficient)

$$r = \pm\sqrt{\frac{\text{explained variation}}{\text{total variation}}} = \pm\sqrt{\frac{\sum (Y_{\text{est}} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}}$$

$\sum (Y_{\text{est}} - \bar{Y})^2$ is called the explained variation because $Y_{\text{est}} - \bar{Y}$ has a definite pattern.

Prove that the coefficient of correlation between the variables X and Y can be written as

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad \text{where } x = X - \bar{X}, \; y = Y - \bar{Y}$$

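The regression line of Y on X can be written as

$$Y = a_0 + a_1 X, \qquad \bar{Y} = a_0 + a_1 \bar{X}$$

$$Y_{\text{est}} - \bar{Y} = a_1 (X - \bar{X})$$

Writing $y_{\text{est}} = Y_{\text{est}} - \bar{Y}$, this is $y_{\text{est}} = a_1 x$. Then

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{\sum (Y_{\text{est}} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\sum y_{\text{est}}^2}{\sum y^2} = \frac{a_1^2 \sum x^2}{\sum y^2}$$

$$= \left(\frac{\sum xy}{\sum x^2}\right)^2 \frac{\sum x^2}{\sum y^2} = \frac{(\sum xy)^2}{\sum x^2 \sum y^2}$$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \quad \text{where } x = X - \bar{X}, \; y = Y - \bar{Y}$$

This is also known as the product-moment formula for the linear correlation.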
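
The chain of equalities in this proof can be checked on the data used throughout these notes. The sketch below is illustrative only; it computes the explained and total variation in exact fractions and compares their ratio with $(\sum xy)^2 / (\sum x^2 \sum y^2)$.

```python
from fractions import Fraction

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(X)
x_bar = Fraction(sum(X), n)
y_bar = Fraction(sum(Y), n)
x = [v - x_bar for v in X]  # deviations of X from its mean
y = [v - y_bar for v in Y]  # deviations of Y from its mean

sxy = sum(u * v for u, v in zip(x, y))
sxx = sum(u * u for u in x)
syy = sum(v * v for v in y)

a1 = sxy / sxx                      # regression coefficient of Y on X
y_est = [a1 * u for u in x]         # y_est = a1 * x
explained = sum(v * v for v in y_est)
total = syy

ratio = explained / total
r_squared = sxy ** 2 / (sxx * syy)
print(ratio, r_squared)  # 21/22 21/22
```

Here $r^2 = 21/22 \approx 0.9545$, whose square root is about 0.977.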

Ex. Find the coefficient of correlation between the variables X and Y presented in the table.

X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

Soln:

   X    Y   x = X - X̄   y = Y - Ȳ   x^2   xy   y^2
   1    1      -6          -4        36    24    16
   3    2      -4          -3        16    12     9
   4    4      -3          -1         9     3     1
   6    4      -1          -1         1     1     1
   8    5       1           0         1     0     0
   9    7       2           2         4     4     4
  11    8       4           3        16    12     9
  14    9       7           4        49    28    16
  56   40                           132    84    56   (totals)

$\bar{X} = 7$, $\bar{Y} = 5$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{84}{\sqrt{132 \times 56}} = \frac{84}{85.98} = 0.977$$

Ex. Find the coefficient of correlation between the variables X and Y presented in the table.

X 1 3 4 6 8 9 11 14
Y 9 8 7 5 4 4 2 1

Soln:

   X    Y   x = X - X̄   y = Y - Ȳ   x^2   xy   y^2
   1    9      -6           4        36   -24    16
   3    8      -4           3        16   -12     9
   4    7      -3           2         9    -6     4
   6    5      -1           0         1     0     0
   8    4       1          -1         1    -1     1
   9    4       2          -1         4    -2     1
  11    2       4          -3        16   -12     9
  14    1       7          -4        49   -28    16
  56   40                           132   -85    56   (totals)

$\bar{X} = 7$, $\bar{Y} = 5$

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{-85}{\sqrt{132 \times 56}} = \frac{-85}{85.98} = -0.989$$
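
The product-moment formula used in the two examples above can be sketched as a short function. The name `pearson_r` is hypothetical, and the function assumes that neither variable is constant (otherwise the denominator is zero).

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [v - x_bar for v in xs]  # deviations from the mean of X
    dy = [v - y_bar for v in ys]  # deviations from the mean of Y
    sxy = sum(u * v for u, v in zip(dx, dy))
    sxx = sum(u * u for u in dx)
    syy = sum(v * v for v in dy)
    return sxy / math.sqrt(sxx * syy)

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y_up = [1, 2, 4, 4, 5, 7, 8, 9]    # Y increases with X
Y_down = [9, 8, 7, 5, 4, 4, 2, 1]  # Y decreases with X

r_up = pearson_r(X, Y_up)
r_down = pearson_r(X, Y_down)
print(round(r_up, 3), round(r_down, 3))  # 0.977 -0.989
```

The sign of $r$ comes from $\sum xy$: positive for the first table and negative for the second.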

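Show that the linear correlation coefficient between X and Y is given by

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{[N \sum X^2 - (\sum X)^2][N \sum Y^2 - (\sum Y)^2]}}$$

We know that

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}, \qquad x = X - \bar{X}, \; y = Y - \bar{Y}$$

For the numerator,

$$\sum (X - \bar{X})(Y - \bar{Y}) = \sum (XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y})$$
$$= \sum XY - \bar{X} \sum Y - \bar{Y} \sum X + N\bar{X}\bar{Y}$$
$$= \sum XY - \bar{X} N \bar{Y} - \bar{Y} N \bar{X} + N\bar{X}\bar{Y}$$
$$= \sum XY - N\bar{X}\bar{Y}$$
$$= \sum XY - \frac{\sum X \sum Y}{N}$$

For the denominator,

$$\sum (X - \bar{X})^2 = \sum (X^2 - 2X\bar{X} + \bar{X}^2)$$
$$= \sum X^2 - 2\bar{X} \sum X + N\bar{X}^2$$
$$= \sum X^2 - 2\bar{X} N \bar{X} + N\bar{X}^2$$
$$= \sum X^2 - N\bar{X}^2$$
$$= \sum X^2 - \frac{(\sum X)^2}{N}$$

Similarly,

$$\sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{N}$$

Therefore

$$r = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left(N \sum X^2 - (\sum X)^2\right)\left(N \sum Y^2 - (\sum Y)^2\right)}}$$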

Example
Find the coefficient of correlation between the variables X and Y
presented in the following table.

X 1 3 4 6 8 9 11 14
Y 1 2 4 4 5 7 8 9

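Soln:

   X    Y    XY   X^2   Y^2
   1    1     1     1     1
   3    2     6     9     4
   4    4    16    16    16
   6    4    24    36    16
   8    5    40    64    25
   9    7    63    81    49
  11    8    88   121    64
  14    9   126   196    81
  56   40   364   524   256   (totals)

$$r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left(N \sum X^2 - (\sum X)^2\right)\left(N \sum Y^2 - (\sum Y)^2\right)}} = \frac{8(364) - (56)(40)}{\sqrt{(8(524) - 56^2)(8(256) - 40^2)}} = \frac{672}{\sqrt{1056 \times 448}} = 0.977$$

Comment: high correlation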
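
The raw-sums form of $r$ avoids computing deviations from the means. The sketch below, with hypothetical function names, evaluates it for the table above and checks that it agrees with the deviation (product-moment) form.

```python
import math

def r_from_raw_sums(xs, ys):
    """r = (N*sum(XY) - sum(X)*sum(Y)) / sqrt((N*sum(X^2) - (sum X)^2) * (N*sum(Y^2) - (sum Y)^2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return num / den

def r_from_deviations(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)) with deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [v - x_bar for v in xs]
    dy = [v - y_bar for v in ys]
    sxy = sum(u * v for u, v in zip(dx, dy))
    return sxy / math.sqrt(sum(u * u for u in dx) * sum(v * v for v in dy))

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
r_raw = r_from_raw_sums(X, Y)
r_dev = r_from_deviations(X, Y)
print(round(r_raw, 3), round(r_dev, 3))  # 0.977 0.977
```

In floating-point work the deviation form is usually the numerically safer of the two when the values are large relative to their spread, since the raw-sums form subtracts two large, nearly equal quantities.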

Spearman’s Rank correlation coefficient


Ex.8 The following table shows how 10 students, arranged in alphabetical order, were ranked according to their achievements in both the laboratory and lecture sections of a physics course. Find the coefficient of rank correlation.

Laboratory 8 3 9 2 7 10 4 6 1 5
Lecture 9 5 10 1 8 7 3 4 2 6

OR,
The following are the ranks given by two judges to 10 debaters in a competition. Find the coefficient of rank correlation.

Judge I 8 3 9 2 7 10 4 6 1 5
Judge II 9 5 10 1 8 7 3 4 2 6

Soln:

Judge I     8   3   9   2   7  10   4   6   1   5
Judge II    9   5  10   1   8   7   3   4   2   6
D          -1  -2  -1   1  -1   3   1   2  -1  -1
D^2         1   4   1   1   1   9   1   4   1   1

Here $\sum D^2 = 24$ and $n = 10$, so

$$\rho = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6(24)}{10(10^2 - 1)} = 0.8545$$

Comment: high correlation
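
The rank-correlation calculation above can be sketched in code as follows. The function name `spearman_rho` is hypothetical, and the formula is applied on the assumption that both inputs are already ranks with no ties, as in this example.

```python
from fractions import Fraction

def spearman_rho(rank_a, rank_b):
    """Rank correlation 1 - 6*sum(D^2) / (n*(n^2 - 1)); assumes untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))

judge_1 = [8, 3, 9, 2, 7, 10, 4, 6, 1, 5]
judge_2 = [9, 5, 10, 1, 8, 7, 3, 4, 2, 6]

rho = spearman_rho(judge_1, judge_2)
print(rho, round(float(rho), 4))  # 47/55 0.8545
```

Identical rankings give a value of 1 and exactly reversed rankings give -1, which is a quick sanity check on the formula.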

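Prove that $-1 \le r \le 1$

1. $s_X^2 = \dfrac{\sum (X - \bar{X})^2}{N} = \dfrac{\sum x^2}{N}$, so $N s_X^2 = \sum x^2$

2. $s_Y^2 = \dfrac{\sum (Y - \bar{Y})^2}{N} = \dfrac{\sum y^2}{N}$, so $N s_Y^2 = \sum y^2$

3. $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$, so $N r s_X s_Y = \sum xy$

We know that

$$\left(\frac{x}{s_X} \pm \frac{y}{s_Y}\right)^2 \ge 0$$

$$\frac{x^2}{s_X^2} \pm \frac{2xy}{s_X s_Y} + \frac{y^2}{s_Y^2} \ge 0$$

Taking summation,

$$\frac{\sum x^2}{s_X^2} \pm \frac{2 \sum xy}{s_X s_Y} + \frac{\sum y^2}{s_Y^2} \ge 0$$

$$N \pm 2rN + N \ge 0$$

$$1 \pm r \ge 0$$

From $1 + r \ge 0$ we get $r \ge -1$, i.e. $-1 \le r$. Also, from $1 - r \ge 0$ we get $-r \ge -1$, i.e. $r \le 1$. Therefore $-1 \le r \le 1$.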
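
The key step of this proof is that the two sums of squares equal $2N(1 + r)$ and $2N(1 - r)$, both of which must be non-negative. The sketch below checks that identity numerically on the data used in these notes; it is an illustration in floating-point arithmetic, not a substitute for the proof.

```python
import math

X = [1, 3, 4, 6, 8, 9, 11, 14]
Y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
x = [v - x_bar for v in X]  # deviations of X
y = [v - y_bar for v in Y]  # deviations of Y

s_x = math.sqrt(sum(u * u for u in x) / n)  # standard deviation of X (divisor N)
s_y = math.sqrt(sum(v * v for v in y) / n)  # standard deviation of Y (divisor N)
r = sum(u * v for u, v in zip(x, y)) / (n * s_x * s_y)

# Sums of the squared standardized sums and differences
plus = sum((u / s_x + v / s_y) ** 2 for u, v in zip(x, y))
minus = sum((u / s_x - v / s_y) ** 2 for u, v in zip(x, y))

print(math.isclose(plus, 2 * n * (1 + r)))                   # True
print(math.isclose(minus, 2 * n * (1 - r), abs_tol=1e-9))    # True
print(-1 <= r <= 1)                                          # True
```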
