Unit 1 - Curve Fitting & Statistical Methods
Unit – I: Curve Fitting: Curve fitting by the method of least squares and fitting of curves of
the form $y = ax + b$, $y = ax^2 + bx + c$, $y = ae^{bx}$ and $y = ax^b$.
Statistical Methods: Measures of central tendency and dispersion. Correlation-Karl Pearson’s
coefficient of correlation-problems. Regression analysis- lines of regression, problems. Rank
correlation.
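Normal equations for fitting a straight line $y = a + bx$:
$$na + b\sum x = \sum y$$
$$a\sum x + b\sum x^2 = \sum xy$$
Normal equations for fitting a parabola $y = a + bx + cx^2$:
$$na + b\sum x + c\sum x^2 = \sum y$$
$$a\sum x + b\sum x^2 + c\sum x^3 = \sum xy$$
$$a\sum x^2 + b\sum x^3 + c\sum x^4 = \sum x^2 y$$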
Note: The normal equations for fitting a straight line or a parabola can be written down directly
from the desired equation of the curve, as follows.
We first apply the summation ($\sum$) to the desired equation, keeping the constants a, b and c
outside the summation; the summations of the pure constant terms a, b, c are written as
na, nb, nc respectively.
We then multiply the given equation by the independent variable x and apply the summation again.
This is sufficient for fitting a straight line. In the case of a parabola we must also
multiply by $x^2$ and apply the summation once more.
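The rule in the note above can be sketched in code. The following is a minimal illustration, not part of the original notes: the function names `power_sum`, `normal_equations` and `solve` are hypothetical, and the model is assumed to be written as $y = c_0 + c_1 x + \dots$ so that degree 1 gives the straight line and degree 2 the parabola.

```python
def power_sum(xs, k):
    """Sum of x**k over the data (k = 0 gives n)."""
    return sum(x ** k for x in xs)

def normal_equations(xs, ys, degree):
    """Normal equations for y = c0 + c1*x + ... + c_degree * x**degree.

    Row i comes from multiplying the model by x**i and summing, so
    entry (i, j) is sum(x**(i + j)) and the right-hand side is sum(x**i * y).
    """
    size = degree + 1
    matrix = [[power_sum(xs, i + j) for j in range(size)] for i in range(size)]
    rhs = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(size)]
    return matrix, rhs

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - tail) / aug[i][i]
    return coeffs

# Illustrative data lying exactly on y = 1 + 2x, so the fit should recover a = 1, b = 2.
matrix, rhs = normal_equations([0, 1, 2, 3], [1, 3, 5, 7], degree=1)
print(matrix, rhs, solve(matrix, rhs))
```

Calling `normal_equations(xs, ys, degree=2)` produces the three equations of the parabola case in the same way.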
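Fitting of a curve of the form $y = ab^x$
Consider $y = ab^x$ (1)
Taking logarithms to base e on both sides, $\log_e y = \log_e a + x \log_e b$,
or $Y = A + BX$ (2)
where $Y = \log_e y$, $A = \log_e a$, $B = \log_e b$ and $X = x$.
This has the same form as $y = a + bx$, so the normal equations associated with equation (2) are
$$nA + B\sum X = \sum Y \quad (3)$$
$$A\sum X + B\sum X^2 = \sum XY \quad (4)$$
Solving (3) and (4) gives A and B. Then $\log_e a = A \Rightarrow a = e^A$ and $\log_e b = B \Rightarrow b = e^B$.
Substituting these values of a and b in (1) gives the best fitting curve $y = ab^x$ in the least
squares sense.
Note: Curves of the form $y = ae^{bx}$ (exponential curve) and $y = ax^b$ (geometric curve) can be
fitted in a similar way.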
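The log-linearisation described above can be illustrated with a short sketch. The function name `fit_exponential_base` and the sample data are assumptions made for illustration only. Note that this procedure minimises the squared error in $\log_e y$, not in y itself, and it requires every y to be positive.

```python
import math

def fit_exponential_base(xs, ys):
    """Fit y = a * b**x by least squares on Y = log(y) = A + B*x.

    Assumes every y is positive so that log(y) is defined.
    """
    n = len(xs)
    Ys = [math.log(y) for y in ys]
    sum_x = sum(xs)
    sum_Y = sum(Ys)
    sum_xx = sum(x * x for x in xs)
    sum_xY = sum(x * Y for x, Y in zip(xs, Ys))
    # Solve n*A + B*sum_x = sum_Y and A*sum_x + B*sum_xx = sum_xY.
    denom = n * sum_xx - sum_x ** 2
    B = (n * sum_xY - sum_x * sum_Y) / denom
    A = (sum_Y - B * sum_x) / n
    return math.exp(A), math.exp(B)  # a = e^A, b = e^B

# Illustrative data lying exactly on y = 2 * 3**x.
a, b = fit_exponential_base([0, 1, 2, 3], [2, 6, 18, 54])
print(a, b)  # close to 2 and 3
```

For $y = ae^{bx}$ the same steps apply with $b = B$ instead of $b = e^B$; for $y = ax^b$ one would also replace x by $\log_e x$.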
Working procedure for problems:
Method I: (Direct Method)
Step 1: We first write the normal equations appropriate to the curve of fit.
Step 2: We prepare the relevant table and then find the values of the summations present in the
normal equations. We substitute these values to arrive at a system of equations in the unknown
parameters.
Step 3: We find the parameters by solving this system and substitute them in the given equation.
Example: Fit a straight line $y = a + bx$ in the least squares sense for the data
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
The normal equations are
$$na + b\sum x = \sum y$$
$$a\sum x + b\sum x^2 = \sum xy$$
Here n = 8.
$$na + b\sum x = \sum y$$
$$a\sum x + b\sum x^2 = \sum xy$$
Here n = 6.
x      y     xy       $x^2$
100    45    4500     10000
120    55    6600     14400
140    60    8400     19600
160    70    11200    25600
180    80    14400    32400
200    85    17000    40000
$\sum x = 900$, $\sum y = 395$, $\sum xy = 62100$, $\sum x^2 = 142000$
Example: Fit a straight line for the data given below using the method of least squares
x 1 2 3 4 6 8
y 2.4 3 3.6 4 5 6
$$na + b\sum x = \sum y \quad (1)$$
$$a\sum x + b\sum x^2 = \sum xy \quad (2)$$
Here n = 6.
Solving (1) and (2) we get a = 1.9764, b = 0.5058.
The equation $y = a + bx$ becomes $y = 1.9764 + 0.5058x$.
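As a check on the arithmetic of this example, the sums and the two parameters can be recomputed directly. This is only a verification sketch; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 6, 8]
ys = [2.4, 3, 3.6, 4, 5, 6]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# Eliminate a between n*a + b*sum_x = sum_y and a*sum_x + b*sum_xx = sum_xy.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(round(a, 3), round(b, 3))  # 1.976 0.506
```

The values agree with the quoted a = 1.9764 and b = 0.5058 to about three decimal places; the small difference in the fourth place comes from rounding versus truncation.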
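$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
Here n = 5.
Alternative Method
The normal equations for $y = a + bX$ are given by
$$\sum y = na + b\sum X$$
$$\sum Xy = a\sum X + b\sum X^2$$
Here n = 5 and X = x - 1931.
The relevant table is as follows
x       X = x - 1931    y     Xy      $X^2$
1911    -20             8     -160    400
1921    -10             10    -100    100
1931    0               12    0       0
1941    10              10    100     100
1951    20              6     120     400
$\sum X = 0$, $\sum y = 46$, $\sum Xy = -40$, $\sum X^2 = 1000$
Substituting in the normal equations gives 46 = 5a and -40 = 1000b, so a = 9.2 and b = -0.04, that is, $y = 9.2 - 0.04X$.
Put X = x - 1931:
$$y = 9.2 - 0.04(x - 1931)$$
$$y = 86.44 - 0.04x$$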
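The shift of origin used in the alternative method can be sketched as follows. Because the shifted values sum to zero, the two normal equations decouple. The variable names are illustrative.

```python
years = [1911, 1921, 1931, 1941, 1951]
ys = [8, 10, 12, 10, 6]

origin = 1931  # middle year, chosen so that the shifted values sum to zero
Xs = [x - origin for x in years]
n = len(Xs)

assert sum(Xs) == 0  # the simplification below relies on this

a = sum(ys) / n  # from sum(y) = n*a
b = sum(X * y for X, y in zip(Xs, ys)) / sum(X * X for X in Xs)  # from sum(Xy) = b*sum(X^2)

intercept = a - b * origin  # constant term after substituting X = x - 1931
print(a, b, round(intercept, 2))  # 9.2 -0.04 86.44
```

The same trick works whenever the x values are equally spaced and the origin is taken at their mean.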
Examples:
1. Find the equation of the best fitting straight line for the following data
x 1 2 3 4 5
y 14 13 9 5 2
2. Fit a straight line for the data given below using the method of least squares
x 0 1 2 3 4 5
y 9 8 24 28 26 20
3. Fit a straight line for the data given below using the method of least squares
x 62 64 65 69 70 71 72
y 65.7 66.8 67.2 69.3 69.8 70.5 70.9
4. Find a law of the form $y = a + bx$ for the following data
x 50 70 100 120
y 12 15 21 25
5. A simply supported beam carries a concentrated load P at its midpoint. Corresponding to
various values of P, the maximum deflection Y is measured and is given below
P 100 120 140 160 180 200
Y 0.45 0.55 0.60 0.70 0.80 0.85
Find a law of the form $Y = a + bP$ and hence estimate Y when P = 150.
x 0 1 2 3 4
y 1 1.8 1.3 2.5 2.3
$$na + b\sum x + c\sum x^2 = \sum y$$
$$a\sum x + b\sum x^2 + c\sum x^3 = \sum xy$$
$$a\sum x^2 + b\sum x^3 + c\sum x^4 = \sum x^2 y$$
Here n = 5.
Example: Fit a parabola $y = a + bx + cx^2$ by the method of least squares for the data
x 2 4 6 8 10
y 3.07 12.85 31.47 57.38 91.29
$$na + b\sum x + c\sum x^2 = \sum y$$
$$a\sum x + b\sum x^2 + c\sum x^3 = \sum xy$$
$$a\sum x^2 + b\sum x^3 + c\sum x^4 = \sum x^2 y$$
Here n = 5.
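The three normal equations for this data set can be solved numerically. The sketch below uses Cramer's rule, which is one of several possible ways to solve a 3×3 system and is not necessarily the method intended by the notes. The helper names `det3` and `fit_parabola` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_parabola(xs, ys):
    """Return (a, b, c) for y = a + b*x + c*x**2 from the three normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # s[k] = sum of x^k, s[0] = n
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sum of x^k * y
    matrix = [[s[i + j] for j in range(3)] for i in range(3)]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = t[i]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)

xs = [2, 4, 6, 8, 10]
ys = [3.07, 12.85, 31.47, 57.38, 91.29]
a, b, c = fit_parabola(xs, ys)
residuals = [y - (a + b * x + c * x * x) for x, y in zip(xs, ys)]
print(a, b, c)
```

A useful sanity check is that the residuals satisfy the normal equations, that is, their sums weighted by 1, x and $x^2$ are all zero up to rounding.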
Examples:
2. Fit a parabola of second degree $y = a + bx + cx^2$ in the least squares sense for the data
x 0 1 2 3 4 5 6
y 14 18 27 29 36 40 46
3. Fit a second degree parabola $y = a + bx + cx^2$ in the least squares sense for the following data
and hence estimate y at x = 6
x 1 2 3 4 5
y 10 12 13 16 19
4. Fit a parabola of second degree $y = a + bx + cx^2$ in the least squares sense for the data
x 1 2 3 4 5
y 25 28 33 39 46
5. Fit a parabola of second degree $y = a + bx + cx^2$ in the least squares sense for the data
x 10 20 30 40 50 60
y 157 179 210 252 302 361
6. Fit a parabola of second degree $y = a + bx + cx^2$ in the least squares sense for the data
x 0 1 2 3 4
y 1 5 10 22 38
7. Fit a curve of the form $y = a_0 + a_1 x + a_2 x^2$ to the data
x 0 1 2 3 4
y 1 1.8 1.3 2.5 6.3
by the method of least squares
x 1 2 3 4
y 1.7 1.8 2.3 3.2
by the method of least squares
x -2 -1 0 1 2
y -3.150 -1.390 0.620 2.886 5.378
10. Fit a parabola $y = a + bx + cx^2$ by the method of least squares to the following data
x -3 -2 -1 0 1 2 3
y 4.63 2.11 0.67 0.09 0.63 2.15 4.58
11. Fit a second degree polynomial of the form $y = a + bx + cx^2$ for the data
x 0 1 2 3 4 5
y 1 3 7 13 21 31
Example: Fit a curve of the form $y = ab^x$ in the least squares sense for the following data
x 0 2 4 5 7 10
y 100 120 256 390 710 1600
$$nA + B\sum X = \sum Y$$
$$A\sum X + B\sum X^2 = \sum XY$$
Here n = 6.
Example: Fit a curve of the form $y = ab^x$ in the least squares sense for the following data
x 1 2 3 4 5 6 7 8
y 1.0 1.2 1.8 2.5 3.6 4.7 6.6 9.1
Example: Fit a curve of the form $y = ab^x$ in the least squares sense for the following data and
hence estimate y when x = 8.
x 0 1 2 3 4 5 6
y 32 47 65 92 132 190 275
Example: Fit a curve of the form $y = ab^x$ in the least squares sense for the following data
x 1 2 3 4 5 6 7
y 87 97 113 129 202 195 193
Example: At constant temperature, the pressure P and the volume V of a gas are connected by
the relation PV = constant. Find the best fitting equation of this form to the following data and
estimate V when P = 4
P (kg/sq. cm) 0.5 1.0 1.5 2.0 2.5 3.0
V (c.c.) 1620 1000 750 620 520 460
Example: Fit a curve of the form $y = ae^{bx}$ for the data
x 0 2 4
y 8.12 10 31.82
Example: Fit a curve of the form $y = ae^{bx}$ for the data
x 5 6 7 8 9 10
y 133 55 23 7 2 2
Example: Fit a curve of the form $y = ax^b$ for the data
x 1 2 3 4 5 6
y 2.98 4.26 5.21 6.1 6.8 7.5
Correlation:
Suppose two variables x and y are related in such a way that an increase in one is accompanied
by an increase or decrease in the other. Such a relationship is called correlation (or covariation).
If x and y increase or decrease together, then we say that x and y are positively (directly)
correlated. On the other hand, if y decreases as x increases or vice-versa then we say that x and y
are negatively (inversely) correlated.
For example, demand and price of a commodity are positively correlated, whereas supply and
price are negatively correlated.
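The numerical measure of correlation between two variables x and y is known as the co-efficient
of correlation and it is defined as
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \sigma_x \sigma_y}$$
where n is the number of observations, $\bar{x} = \frac{\sum x}{n}$ is the mean of x, $\bar{y} = \frac{\sum y}{n}$ is the mean of y,
$$\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - (\bar{x})^2$$
so that $\sigma_x$ is the standard deviation of x, and
$$\sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{\sum y^2}{n} - (\bar{y})^2$$
so that $\sigma_y$ is the standard deviation of y.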
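Alternate form (1):
If $X = x - \bar{x}$ and $Y = y - \bar{y}$, then
$$\sigma_x^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum X^2}{n}, \qquad \sigma_y^2 = \frac{\sum (y - \bar{y})^2}{n} = \frac{\sum Y^2}{n}$$
$$\sigma_x \sigma_y = \sqrt{\frac{\sum X^2}{n} \cdot \frac{\sum Y^2}{n}} \quad \text{or} \quad n \sigma_x \sigma_y = \sqrt{\sum X^2 \sum Y^2}$$
Therefore
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$$
Alternate form (2):
$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2 \sigma_x \sigma_y}$$
where $\sigma_{x-y}^2$ is the variance of x - y.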
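The definition of r and the variance-based alternate form can be compared numerically. The sketch below is illustrative: the function names are hypothetical, the data are sample values, and the standard deviations divide by n as in the definitions above.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    """Population standard deviation (divide by n, as in the text)."""
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / len(v))

def r_definition(xs, ys):
    """r = sum((x - mean_x)(y - mean_y)) / (n * sigma_x * sigma_y)."""
    mx, my = mean(xs), mean(ys)
    cov_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov_sum / (len(xs) * std(xs) * std(ys))

def r_from_variances(xs, ys):
    """r = (sigma_x^2 + sigma_y^2 - sigma_(x-y)^2) / (2 * sigma_x * sigma_y)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    sx, sy = std(xs), std(ys)
    return (sx ** 2 + sy ** 2 - std(diffs) ** 2) / (2 * sx * sy)

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]  # illustrative values
print(r_definition(xs, ys), r_from_variances(xs, ys))  # the two values agree up to rounding
```

The variance form is convenient when the variance of the differences x - y is already known, since it avoids computing the products XY.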
Property:
The co-efficient of correlation numerically does not exceed unity.
Proof:
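We have to show that $-1 \le r \le 1$.
Let
$$S = \frac{1}{2n} \sum \left( \frac{X}{\sigma_x} + \frac{Y}{\sigma_y} \right)^2 \quad \text{and} \quad S' = \frac{1}{2n} \sum \left( \frac{X}{\sigma_x} - \frac{Y}{\sigma_y} \right)^2$$
where $X = x - \bar{x}$ and $Y = y - \bar{y}$. Being sums of squares, S and S' are both non-negative.
Now
$$S = \frac{1}{2n} \sum \left( \frac{X^2}{\sigma_x^2} + \frac{Y^2}{\sigma_y^2} + \frac{2XY}{\sigma_x \sigma_y} \right) \ge 0$$
$$S = \frac{1}{2} \left[ \frac{1}{\sigma_x^2} \cdot \frac{\sum X^2}{n} + \frac{1}{\sigma_y^2} \cdot \frac{\sum Y^2}{n} + \frac{2 \sum XY}{n \sigma_x \sigma_y} \right] \ge 0$$
$$S = \frac{1}{2} \left[ \frac{1}{\sigma_x^2} \cdot \sigma_x^2 + \frac{1}{\sigma_y^2} \cdot \sigma_y^2 + 2r \right] \ge 0$$
$$S = \frac{1}{2} (1 + 1 + 2r) \ge 0$$
$$S = \frac{1}{2} (2 + 2r) \ge 0$$
$$1 + r \ge 0$$
$$-1 \le r \quad (1)$$
Similarly we can obtain
$$S' = \frac{1}{2} (2 - 2r) \ge 0$$
$$1 - r \ge 0$$
$$r \le 1 \quad (2)$$
From (1) and (2), $-1 \le r \le 1$.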
Regression
Regression is an estimation of one variable in terms of the other. If x and y are
correlated, the best fitting straight line in the least squares sense gives a reasonably good relation
between x and y.
The best fitting straight line of the form y = ax + b (x being the independent variable) is called
the regression line of y on x and x = ay +b (y being the independent variable) is called the
regression line of x on y.
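The regression line of y on x is
$$y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$$
(or) $y - \bar{y} = b_{yx} (x - \bar{x})$
where $b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{\sum XY}{\sum X^2}$, $X = x - \bar{x}$ and $Y = y - \bar{y}$.
The regression line of x on y is
$$x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$$
(or) $x - \bar{x} = b_{xy} (y - \bar{y})$
where $b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{\sum XY}{\sum Y^2}$.
Note:
The values $b_{yx} = r \frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r \frac{\sigma_x}{\sigma_y}$ are known as the regression co-efficients. Their product is
equal to $r^2$.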
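Example:
Show that if θ is the angle between the lines of regression, then
$$\tan\theta = \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \cdot \frac{1 - r^2}{r}$$
Solution:
We know that if θ is the acute angle between the lines $y = m_1 x + c_1$ and $y = m_2 x + c_2$, then
$$\tan\theta = \frac{m_2 - m_1}{1 + m_1 m_2}$$
The lines of regression are
$$y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x}) \quad (1)$$
and $x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$; we write this equation as
$$y - \bar{y} = \frac{\sigma_y}{r \sigma_x} (x - \bar{x}) \quad (2)$$
The slopes of (1) and (2) are $m_1 = \frac{r \sigma_y}{\sigma_x}$ and $m_2 = \frac{\sigma_y}{r \sigma_x}$. Hence
$$\tan\theta = \frac{\dfrac{\sigma_y}{r \sigma_x} - \dfrac{r \sigma_y}{\sigma_x}}{1 + \dfrac{\sigma_y^2}{\sigma_x^2}} = \frac{\dfrac{\sigma_y}{\sigma_x} \cdot \dfrac{1 - r^2}{r}}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}}$$
$$\tan\theta = \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \cdot \frac{1 - r^2}{r}$$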
Example:
Calculate the co-efficient of correlation and obtain the lines of regression for the following data
x 1 2 3 4 5 6 7 8 9
y 9 8 10 12 11 13 14 16 15
Obtain an estimate for y which corresponds to x = 6.2.
Solution:
Here n = 9
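$$\bar{x} = \frac{\sum x}{n} = \frac{45}{9} = 5 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{108}{9} = 12$$
We prepare the following table
x    $X = x - \bar{x}$    $X^2$    y     $Y = y - \bar{y}$    $Y^2$    XY
1    -4                   16       9     -3                   9        12
2    -3                   9        8     -4                   16       12
3    -2                   4        10    -2                   4        4
4    -1                   1        12    0                    0        0
5    0                    0        11    -1                   1        0
6    1                    1        13    1                    1        1
7    2                    4        14    2                    4        4
8    3                    9        16    4                    16       12
9    4                    16       15    3                    9        12
$\sum X^2 = 60$, $\sum Y^2 = 60$, $\sum XY = 57$
Now
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{57}{\sqrt{60 \times 60}} = \frac{57}{60} = 0.95$$
$$\sigma_x^2 = \frac{\sum X^2}{n} = \frac{60}{9} = 6.6667, \qquad \sigma_y^2 = \frac{\sum Y^2}{n} = \frac{60}{9} = 6.6667$$
$$\sigma_x = 2.582, \qquad \sigma_y = 2.582$$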
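Therefore the line of regression of y on x is
$$y - \bar{y} = b_{yx} (x - \bar{x})$$
where $b_{yx} = r \frac{\sigma_y}{\sigma_x} = 0.95$, that is,
$$y - 12 = 0.95 (x - 5)$$
Similarly, the line of regression of x on y is
$$x - 5 = 0.95 (y - 12)$$
When x = 6.2, the line of regression of y on x gives y = 12 + 0.95(1.2) = 13.14.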
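The quantities in this example can be recomputed with a few lines of code. This is a verification sketch with illustrative variable names; the helper `estimate_y` is hypothetical.

```python
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [9, 8, 10, 12, 11, 13, 14, 16, 15]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
X = [x - x_bar for x in xs]  # deviations from the mean of x
Y = [y - y_bar for y in ys]  # deviations from the mean of y

sum_XY = sum(p * q for p, q in zip(X, Y))
sum_XX = sum(p * p for p in X)
sum_YY = sum(q * q for q in Y)

b_yx = sum_XY / sum_XX  # slope of the regression line of y on x
b_xy = sum_XY / sum_YY  # slope of the regression line of x on y
r = sum_XY / (sum_XX * sum_YY) ** 0.5

def estimate_y(x):
    """Estimate y from the regression line of y on x."""
    return y_bar + b_yx * (x - x_bar)

print(b_yx, b_xy, r, estimate_y(6.2))
```

The printed values should be approximately 0.95, 0.95, 0.95 and 13.14, and the product of the two regression co-efficients equals $r^2$.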
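Solution:
Here n = 10
$$\bar{x} = \frac{\sum x}{n} = \frac{70}{10} = 7 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{150}{10} = 15$$
x     X = x - 7    $X^2$    y     Y = y - 15    $Y^2$    XY
2     -5           25       8     -7            49       35
5     -2           4        12    -3            9        6
8     1            1        16    1             1        1
9     2            4        16    1             1        2
10    3            9        10    -5            25       -15
13    6            36       32    17            289      102
15    8            64       32    17            289      136
$\sum X^2 = 204$, $\sum Y^2 = 818$, $\sum XY = 360$
$$b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{360}{204} = 1.7647, \qquad b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{360}{818} = 0.44$$
The lines of regression are
$$y - 15 = 1.7647 (x - 7)$$
$$x - 7 = 0.44 (y - 15)$$
Co-efficient of correlation is
$$r = \sqrt{b_{yx} b_{xy}} = \sqrt{1.7647 \times 0.44} \approx 0.88$$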
Example: A person, while calculating the co-efficient of correlation between two variables x and y
from a set of 25 observations, obtained the following results: $\sum x = 125$, $\sum y = 100$, $\sum xy = 508$,
$\sum x^2 = 650$, $\sum y^2 = 460$. But it was later found that the pairs of values (x, y) = (8, 12), (6, 8) were wrongly
copied as (6, 14), (8, 6). Obtain the correct value of the co-efficient of correlation.
It may be observed that the summations $\sum x$, $\sum y$, $\sum x^2$ are unchanged even after the correction.
However we have
correct $\sum xy$ = 508 – (6 × 14 + 8 × 6) + (8 × 12 + 6 × 8) = 508 – 132 + 144 = 520
correct $\sum y^2$ = 460 – (14² + 6²) + (12² + 8²) = 460 – 232 + 208 = 436
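Therefore the correct values of the means and standard deviations of x and y are as follows
$$\bar{x} = \frac{\sum x}{n} = \frac{125}{25} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{100}{25} = 4$$
$$\sigma_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{650}{25} - 5^2} = \sqrt{1} = 1$$
$$\sigma_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{436}{25} - 4^2} = \sqrt{1.44} = 1.2$$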
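We have
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \sigma_x \sigma_y}$$
$$= \frac{\sum (xy - \bar{y} x - \bar{x} y + \bar{x} \bar{y})}{n \sigma_x \sigma_y}$$
$$= \frac{1}{\sigma_x \sigma_y} \left[ \frac{\sum xy}{n} - \bar{y} \frac{\sum x}{n} - \bar{x} \frac{\sum y}{n} + \frac{n \bar{x} \bar{y}}{n} \right]$$
$$= \frac{1}{\sigma_x \sigma_y} \left[ \frac{\sum xy}{n} - \bar{x} \bar{y} - \bar{x} \bar{y} + \bar{x} \bar{y} \right]$$
$$= \frac{1}{\sigma_x \sigma_y} \left[ \frac{\sum xy}{n} - \bar{x} \bar{y} \right]$$
$$= \frac{1}{(1)(1.2)} \left[ \frac{520}{25} - (5 \times 4) \right] = 0.6666 \approx 0.67$$
Hence the correct value of r = 0.67.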
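The corrected value can be reproduced from the summations alone, using the form $r = \frac{1}{\sigma_x \sigma_y}\left[\frac{\sum xy}{n} - \bar{x}\bar{y}\right]$ derived above. The variable names in this sketch are illustrative.

```python
import math

n = 25
sum_x, sum_y = 125, 100
sum_xx = 650
sum_xy = 508 - 132 + 144  # corrected as in the text
sum_yy = 460 - 232 + 208  # corrected as in the text

x_bar = sum_x / n
y_bar = sum_y / n
sigma_x = math.sqrt(sum_xx / n - x_bar ** 2)
sigma_y = math.sqrt(sum_yy / n - y_bar ** 2)

# r = (sum_xy / n - x_bar * y_bar) / (sigma_x * sigma_y)
r = (sum_xy / n - x_bar * y_bar) / (sigma_x * sigma_y)
print(round(r, 2))  # 0.67
```

Working from the sums avoids needing the 25 individual observations, which is the point of this type of problem.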
Examples:
1. Find the correlation co-efficient and the regression lines for the following data
x 1 2 3 4 5
y 2 5 3 8 7
Find the best estimate for y when x = 3.5 and the best estimate for x when y = 3.5.
2. Calculate the co-efficient of correlation and obtain the lines of regression for the following
data
x 3 5 6 9 10 12 15 20 22 28
y 10 12 15 18 20 22 27 30 32 34
6.Find the correlation coefficient and the regression lines y on x and x on y for the following data
x 2 4 6 8 10
y 5 7 9 8 11
7. Find the correlation coefficient between x and y from the given data
x 78 89 97 69 59 79 68 57
y 125 137 156 112 107 138 123 108
11. The regression equations are 8x – 10y + 66 = 0 and 20x – 9y – 107 = 0. Find $\bar{x}$, $\bar{y}$, $\sigma_y$ and the
correlation co-efficient.
Rank Correlation
A group of n individuals may be arranged in order of merit with respect to some characteristic.
The same group would give different orders for different characteristics. Considering the orders
corresponding to two characteristics A and B, the correlation between these n pairs of ranks is
called rank correlation in the characteristics A and B for that group of individuals.
Let $x_i$, $y_i$ be the ranks of the i-th individual in A and B respectively. Assuming that no two
individuals are bracketed equal in either case, each of the variables takes the values 1, 2, 3, ..., n,
and the rank correlation between A and B is given by
$$\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n}$$
where $d_i = x_i - y_i$ is the difference between the ranks of the i-th individual in A and B.
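$$\sum d_i^2 = 25 + 4 + 16 + 4 + 4 + 0 + 1 + 1 + 4 + 1 = 60$$
Hence
$$\rho = 1 - \frac{6 \sum d_i^2}{n^3 - n} = 1 - \frac{6 \times 60}{10^3 - 10} = 1 - \frac{360}{990} = 0.6 \text{ nearly.}$$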
Example: Three judges, A, B, C, give the following ranks. Find which pair of judges has
common approach
A 1 6 5 10 3 2 4 9 7 8
B 3 5 8 4 7 10 2 1 6 9
C 6 4 9 8 1 2 3 10 5 7
With x, y, z denoting the ranks given by A, B, C respectively,
$\rho(x, y) = -0.2$, $\rho(y, z) = -0.3$, $\rho(z, x) = 0.6$
Since $\rho(z, x) = 0.6$ is the maximum, the pair of judges A and C have the nearest common approach.
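The comparison of the three judges can be reproduced with a short sketch of the rank correlation formula $1 - \frac{6\sum d^2}{n^3 - n}$. The function name `spearman` and the dictionary labels are illustrative choices.

```python
def spearman(rank_a, rank_b):
    """Rank correlation 1 - 6*sum(d^2)/(n^3 - n) for untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

A = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
B = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
C = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

pairs = {"A-B": spearman(A, B), "B-C": spearman(B, C), "C-A": spearman(C, A)}
closest = max(pairs, key=pairs.get)  # pair with the largest rank correlation
print({k: round(v, 2) for k, v in pairs.items()}, closest)
```

The three values come out at about -0.21, -0.30 and 0.64, so the pair C-A has the largest rank correlation.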
Example: Calculate the rank correlation coefficient from the following data showing ranks of 10
students in two subjects
Maths 3 8 9 2 7 10 4 6 1 5
Physics 4 9 10 1 8 7 3 4 2 6
The Spearman rank correlation for repeated ranks is given by
$$\rho = 1 - \frac{6 \left[ \sum d^2 + \dfrac{m_1 (m_1^2 - 1)}{12} + \dfrac{m_2 (m_2^2 - 1)}{12} + \cdots \right]}{n^3 - n}$$
where $m_1, m_2, \ldots$ are the numbers of items whose ranks are common.
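The correction for repeated ranks can be sketched as follows. The sketch assumes that tied scores receive the average of the positions they occupy and that the highest score gets rank 1. The function names are hypothetical.

```python
from collections import Counter

def average_ranks(scores):
    """Rank scores in descending order, giving tied scores the mean of their positions."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for pos, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[s]) / len(positions[s]) for s in scores]

def tie_correction(scores):
    """Sum of m*(m^2 - 1)/12 over each group of m equal scores."""
    return sum(m * (m * m - 1) / 12 for m in Counter(scores).values() if m > 1)

def spearman_with_ties(scores_a, scores_b):
    """Rank correlation with the repeated-rank correction added to sum(d^2)."""
    n = len(scores_a)
    ra, rb = average_ranks(scores_a), average_ranks(scores_b)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(ra, rb))
    correction = tie_correction(scores_a) + tie_correction(scores_b)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Scores of ten students in two tests (three students tie at 60 in the first list).
test1 = [70, 68, 67, 55, 60, 60, 75, 63, 60, 72]
test2 = [65, 65, 80, 60, 68, 58, 75, 63, 60, 70]
print(average_ranks(test1))
print(spearman_with_ties(test1, test2))
```

Each group of m tied items contributes one correction term, and the terms from both series are added together before applying the formula.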
Example: Find rank correlation for the following data showing rank of 10 students in two tests
Student A B C D E F G H I J
Test 1 70 68 67 55 60 60 75 63 60 72
Test 2 65 65 80 60 68 58 75 63 60 70