Module-IV Curve Fitting & Statistical Methods: RV Institute of Technology & Management
Module-IV
• Expand their knowledge and skills of statistical concepts, and gain personal development experience towards the needs of statistical data analysis.
• Understand the Least Squares Method.
• Fit data using several types of curves.
• Evaluate correlation and regression coefficients.
• Investigate the strength and direction of a relationship between two variables by collecting measurements and using appropriate statistical analysis.
Introduction:
In many fields of applied mathematics and engineering, we encounter problems and experiments involving two variables.
In this chapter, we consider the mathematical theory of statistics by presenting an elementary treatment of curve fitting, correlation and regression.
Suppose we are given n values $x_1, x_2, x_3, \dots, x_n$ of an independent variable x and the corresponding values $y_1, y_2, y_3, \dots, y_n$ of a variable y depending on x. Then the pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ give us n points in the xy-plane. Generally, it is not possible to find the actual curve y = f(x) that passes through these points. Hence, we try to find a curve that serves as the best approximation to the curve y = f(x). Such a curve is referred to as the curve of best fit. The process of determining a curve of best fit is called curve fitting. A method to find the curve of best fit is the method of least squares.
Let y = f(x) be an approximate relation that fits the data $(x_i, y_i)$. Then the $y_i$ are called the observed values and $Y_i = f(x_i)$ are called the expected values. The differences $E_i = y_i - Y_i$ are called the estimated errors or residuals.
The method of least squares provides a relationship y = f(x) such that the sum of the squares of the residuals is least. Such a curve is known as the least squares curve.
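We will discuss the fitting of the following types of curves: the straight line $y = a + bx$, the parabola $y = a + bx + cx^2$ and the power curve $y = ax^b$.
Fitting of a straight line $y = a + bx$:
The sum of the squares of the residuals is $E = \sum_{1}^{n} (y - a - bx)^2$. For E to be least, we need $\partial E/\partial a = 0$ and $\partial E/\partial b = 0$.
\[ \frac{\partial E}{\partial a} = 0 \Rightarrow \sum_{1}^{n} 2(y - a - bx)(-1) = 0 \Rightarrow \sum (y - a - bx) = 0 \Rightarrow \sum y = na + b\sum x \]
\[ \frac{\partial E}{\partial b} = 0 \Rightarrow \sum_{1}^{n} 2(y - a - bx)(-x) = 0 \Rightarrow \sum xy = a\sum x + b\sum x^2 \]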
The normal equations for estimating the values of a and b are
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]
Solving the above normal equations, we estimate the values of a & b. With these values of a and b
y = a + bx is the line of best fit.
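The two normal equations can be solved in closed form. The following is a minimal sketch in Python, assuming the data are given as two equal-length lists; the function name `fit_line` is chosen here for illustration and does not come from the notes.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x obtained from the two normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Points lying exactly on y = 1 + 2x are recovered exactly.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Eliminating a between the two equations gives the slope first; the intercept then follows from the first normal equation.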
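Fitting of a parabola $y = a + bx + cx^2$:
Here $E = \sum_{1}^{n} (y - a - bx - cx^2)^2$, and for E to be least we need $\partial E/\partial a = 0$, $\partial E/\partial b = 0$ and $\partial E/\partial c = 0$.
\[ \frac{\partial E}{\partial a} = 0 \Rightarrow \sum_{1}^{n} 2(y - a - bx - cx^2)(-1) = 0 \Rightarrow \sum y = na + b\sum x + c\sum x^2 \]
\[ \frac{\partial E}{\partial b} = 0 \Rightarrow \sum_{1}^{n} 2(y - a - bx - cx^2)(-x) = 0 \Rightarrow \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \frac{\partial E}{\partial c} = 0 \Rightarrow \sum_{1}^{n} 2(y - a - bx - cx^2)(-x^2) = 0 \Rightarrow \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]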
The normal equations for estimating the values of a, b and c are
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]
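Solving the above equations, we estimate the values of a, b and c. With these values of a, b and c, $y = a + bx + cx^2$ is the parabola of best fit.
Fitting of a power curve $y = ax^b$:
Taking logarithms on both sides gives $\log y = \log a + b\log x$, i.e. $Y = A + bX$, where $Y = \log y$, $A = \log a$ and $X = \log x$. The normal equations for estimating A and b are
\[ \sum Y = nA + b\sum X \]
\[ \sum XY = A\sum X + b\sum X^2 \]
Solving the above equations, we estimate the values of A and b, and hence a as the antilogarithm of A. With these values of a and b, $y = ax^b$ is the curve of best fit.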
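Problems:
1. Fit a straight line of the form y = a + bx to the following data by the method of least squares.
x   1   2   3   4   5   6
y   6   4   3   5   4   2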
Solution: The normal equations for estimating a and b in y = a + bx are
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]
x      y      x²     xy
1      6      1      6
2      4      4      8
3      3      9      9
4      5      16     20
5      4      25     20
6      2      36     12
∑x = 21   ∑y = 24   ∑x² = 91   ∑xy = 75
Here n = 6, ∑x = 21, ∑y = 24, ∑xy = 75, ∑x² = 91.
Therefore, we get
24 = 6a + 21b and 75 = 21a + 91b
Solving, we get a = 5.8, b = -0.514
Therefore, the equation of best fit is y = 5.8 - 0.514x
2. Fit a straight line of the form y= ax +b for the following data by the method of
least squares.
x 5 10 15 20 25
y 16 19 23 26 30
Solution: Let y= ax +b be the given straight line.
The normal equations are
∑y = a∑x + nb
∑xy = a∑x² + b∑x
x      y      x²     xy
5      16     25     80
10     19     100    190
15     23     225    345
20     26     400    520
25     30     625    750
∑x = 75   ∑y = 114   ∑x² = 1375   ∑xy = 1885
Therefore, n = 5, ∑y = 114, ∑x = 75, ∑xy = 1885, ∑x² = 1375.
Substituting in the above equations we get 114 = 75a + 5b and 1885 = 1375a + 75b, which give a = 0.7, b = 12.3
The best fit is y = 0.7x + 12.3
3. Fit a power function (geometric curve) of the form $y = ax^b$ to the data given below.
x:  20   16   10    11   14
y:  22   41   120   89   56
Solution: Given $y = ax^b$.
Taking logarithms on both sides, we get
$\log y = \log a + b\log x$, i.e. $Y = A + bX$,
where $Y = \log y$, $A = \log a$ and $X = \log x$.
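The log transformation reduces the power curve to a straight-line fit in (X, Y). The sketch below shows one way to carry the computation through in Python; the function name `fit_power` and the use of base-10 logarithms are assumptions made for illustration (any base gives the same a and b).

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on Y = A + b*X with X = log10 x, Y = log10 y."""
    X = [math.log10(x) for x in xs]
    Y = [math.log10(y) for y in ys]
    n = len(X)
    sX, sY = sum(X), sum(Y)
    sXX = sum(u * u for u in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    # Normal equations:  sY = n*A + b*sX   and   sXY = A*sX + b*sXX
    b = (n * sXY - sX * sY) / (n * sXX - sX * sX)
    A = (sY - b * sX) / n
    return 10 ** A, b  # a is the antilogarithm of A


# Data of the problem above; y falls as x grows, so b should come out negative.
a, b = fit_power([20, 16, 10, 11, 14], [22, 41, 120, 89, 56])
print(round(a, 1), round(b, 3))
```

Note that this minimises the squared residuals of log y rather than of y itself, which is what the method described above does.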
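4. Fit a curve of the form $y = a + bx + cx^2$ to the data by the method of least squares.
x   1     2     3     4
y   1.7   1.8   2.3   3.2
Solution:
x      y      xy     x²     x³     x⁴     x²y
1      1.7    1.7    1      1      1      1.7
2      1.8    3.6    4      8      16     7.2
3      2.3    6.9    9      27     81     20.7
4      3.2    12.8   16     64     256    51.2
∑x = 10   ∑y = 9.0   ∑xy = 25.0   ∑x² = 30   ∑x³ = 100   ∑x⁴ = 354   ∑x²y = 80.8
The normal equations are
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]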
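The three normal equations form a 3×3 linear system in a, b and c. As a hedged illustration, the sketch below builds that system from the data and solves it by Cramer's rule; `det3` and `fit_parabola` are hypothetical helper names, and Cramer's rule is simply a convenient choice for so small a system.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x**2 from the three normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]
    d = det3(matrix)
    if d == 0:
        raise ValueError("normal equations are singular; need at least 3 distinct x values")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


a, b, c = fit_parabola([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(round(a, 4), round(b, 4), round(c, 4))
```

For the data of this problem the sketch should return values close to a = 2, b = -0.5, c = 0.2, i.e. y = 2 - 0.5x + 0.2x², which reproduces all four tabulated y values.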
x : 1 2 3 4
y : 4 6 3 2
IV-Semester, Complex Analysis, Probability and Statistical Methods (18MAT41)
2. The following table gives the production (in thousands of units) of a certain commodity in
different years:
Year x 1958 1968 1978 1988 1998
Production y 8 10 12 10 16
Fit a straight line to the data and estimate the production in the year 2005.
3. A simply supported beam carries a concentrated load P at its mid-point. Corresponding to
various values of P, the maximum deflection D is measured and the values are as given below:
P 100 120 140 160 180 200
D 0.45 0.55 0.6 0.7 0.8 0.85
Find a linear law of the form D=a+bP.
4. In some determination of the volume V of carbon dioxide dissolved in a given volume of water
at different temperatures T, the following pairs of values were obtained.
T 0 5 10 15
V 1.80 1.45 1.18 1
Obtain by the method of least squares a relation of the form V = a+bT which
best fits to these observations.
5. The following table gives the results of measurements of train resistance; V is the velocity
in miles per hour and R is the resistance in pounds per ton.
V 20 40 60 80 100 120
R 5.5 9.1 14.9 22.8 33.3 46
decrease in the other. The other technique that is often used in these circumstances is regression, which
involves estimating the best straight line to summarize the association.
Correlation
Correlation means simply a relation between two or more variables.
Two variables are said to be correlated if the change in one variable results in a corresponding change
in the other.
Eg: 1. x: supply y: price
2. x: demand y: Price
Types of correlation
Positive correlation:
If an increase or decrease in one variable corresponds to an increase or decrease in the other then the
correlation is said to be positive correlation or direct correlation.
Eg: 1. Demand and price of commodity.
2. Income and expenditure.
Negative correlation:
If an increase or decrease in one variable corresponds to a decrease or increase in the other then the
correlation is said to be negative correlation or inverse correlation.
Eg: 1. Supply and Price of a commodity.
2.Correlation between Volume and pressure of a perfect gas.
No correlation:
If there exists no relationship between two variables then they are said to be non-correlated.
Scatter diagram:
To obtain a measure of relationship between two variables x and y we plot their corresponding values
in the xy-plane. The resulting diagram (Fig. 4.1), showing the collection of dots, is called the dot
diagram or scatter diagram.
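Coefficient of correlation:
The coefficient of correlation r between x and y is given by
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y} \]
where $\bar{x} = \frac{\sum x}{n}$ is the mean of the x series, $\bar{y} = \frac{\sum y}{n}$ is the mean of the y series, and $\sigma_x$, $\sigma_y$ are the standard deviations of the two series.
For computation purposes we can use the formula
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \]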
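The computational formula needs only five running totals. A minimal Python sketch, assuming two equal-length lists of observations (the name `pearson_r` is illustrative only):

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if denominator == 0:
        raise ValueError("r is undefined when one of the series is constant")
    return numerator / denominator


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0  (perfect direct relation)
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0 (perfect inverse relation)
```

Both square roots are combined into one here; this is algebraically the same as the product of the two roots in the formula.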
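Limits of the correlation coefficient:
Writing $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$, we have
\[ r = \frac{\frac{1}{n}\sum a_i b_i}{\sqrt{\frac{1}{n}\sum a_i^2}\,\sqrt{\frac{1}{n}\sum b_i^2}} \]
\[ r^2 = \frac{\left(\sum a_i b_i\right)^2}{\sum a_i^2 \sum b_i^2} \qquad (1) \]
By the Schwarz inequality, which states that if $a_i, b_i$, $i = 1, 2, \dots, n$ are real quantities then
\[ \left(\sum a_i b_i\right)^2 \le \sum a_i^2 \sum b_i^2, \]
the sign of equality holding if and only if
\[ \frac{a_1}{b_1} = \frac{a_2}{b_2} = \frac{a_3}{b_3} = \dots = \frac{a_n}{b_n}. \]
Using this, equation (1) becomes
\[ r^2 \le 1 \Rightarrow |r| \le 1 \Rightarrow -1 \le r \le 1 \]
Hence the correlation coefficient cannot exceed unity numerically.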
Note:
Problems:
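1. While calculating the correlation coefficient between x and y from 25 pairs of observations a
person obtained the following values:
$\sum x_i = 125$, $\sum x_i^2 = 650$, $\sum y_i = 100$, $\sum y_i^2 = 460$, $\sum x_i y_i = 508$.
It was later discovered that he had copied down the pairs (8,12) and (6,8) as (6,12) and (8,6)
respectively. Obtain the correct value of the correlation coefficient.
Solution: Removing the wrong pairs from each total and adding the correct ones gives
corrected $\sum x = 125 - (6 + 8) + (8 + 6) = 125$
corrected $\sum x^2 = 650 - (36 + 64) + (64 + 36) = 650$
corrected $\sum y = 100 - (12 + 6) + (12 + 8) = 102$
corrected $\sum y^2 = 460 - (144 + 36) + (144 + 64) = 488$
corrected $\sum xy = 508 - (72 + 48) + (96 + 48) = 532$
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{25(532) - (125)(102)}{\sqrt{25(650) - 125^2}\,\sqrt{25(488) - 102^2}} = 0.51912 \]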
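Correcting miscopied pairs amounts to subtracting their contribution from each total and adding the contribution of the right pairs. The following sketch, with a hypothetical function name `corrected_r`, applies that idea to the totals quoted in the problem.

```python
import math


def corrected_r(n, sx, sxx, sy, syy, sxy, wrong_pairs, right_pairs):
    """Correlation coefficient after replacing miscopied (x, y) pairs in the totals."""
    for x, y in wrong_pairs:      # take out what was wrongly included
        sx -= x
        sxx -= x * x
        sy -= y
        syy -= y * y
        sxy -= x * y
    for x, y in right_pairs:      # put in what should have been there
        sx += x
        sxx += x * x
        sy += y
        syy += y * y
        sxy += x * y
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


r = corrected_r(25, 125, 650, 100, 460, 508,
                wrong_pairs=[(6, 12), (8, 6)],
                right_pairs=[(8, 12), (6, 8)])
print(round(r, 5))  # 0.51912
```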
2. The following table gives the age (in years) of 10 married couples. Calculate the coefficient of
correlation between their ages.
Age of husband (x)   23  27  28  29  30  31  33  35  36  39
Age of wife (y)      18  22  23  24  25  26  28  29  30  32
Solution: With $X = x - \bar{x}$ and $Y = y - \bar{y}$,
\[ r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = 0.9955 \]
i.e., the ages of husbands and wives are almost perfectly correlated.
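Solution: First we prepare a table including the data of the x and y series, and calculate the necessary
totals required to compute r.
\[ \bar{x} = \frac{990}{10} = 99 \quad \text{and} \quad \bar{y} = \frac{980}{10} = 98 \]
\[ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{92}{\sqrt{170 \times 140}} = 0.5963 \]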
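5. Compute Pearson's coefficient of correlation between x and y from the following data:
x:  1  2  3  4
y:  1  4  9  16
Solution: Here n = 4, ∑x = 10, ∑y = 30, ∑xy = 100, ∑x² = 30, ∑y² = 354.
\[ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} \]
\[ = \frac{4 \times 100 - 10 \times 30}{\sqrt{(4 \times 30 - 10^2)(4 \times 354 - 30^2)}} \]
\[ = \frac{400 - 300}{\sqrt{20 \times 516}} = 0.9843 \]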
Regression:
Correlation describes the strength of an association between two variables, and is completely
symmetrical: the correlation between A and B is the same as the correlation between B and A.
However, if the two variables are related it means that when one changes by a certain amount the other
changes on an average by a certain amount. The relationship can be represented by a simple equation
called the regression equation. In this context "regression" (the term is a historical anomaly) simply
means that the average value of y is a "function" of x, that is, it changes with x.
Regression analysis is a mathematical measure of the average relationship between two or more
variables in terms of the original units of data.
Line of regression:
Line of regression is the line which gives the best estimate to the value of one variable for any specific
value of the other variable. So, the line of regression is the line of best fit.
Regression line of y on x:
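Let the regression line of y on x be $y = a + bx$.
The normal equations by the method of least squares are
\[ \sum y = na + b\sum x \]
\[ \sum xy = a\sum x + b\sum x^2 \]
Dividing the first equation by n,
\[ \frac{1}{n}\sum y = a + \frac{b}{n}\sum x, \quad \text{i.e.} \quad \bar{y} = a + b\bar{x} \]
Thus $y = a + bx$ is the regression line passing through $(\bar{x}, \bar{y})$. Writing $X = x - \bar{x}$ and $Y = y - \bar{y}$, its slope is
\[ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{\sum XY}{n\sigma_x^2} = r\frac{\sigma_y}{\sigma_x} \]
Hence
\[ y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \]
is the regression line of y on x.
Similarly,
\[ x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \]
is the regression line of x on y.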
Note:
1. Regression coefficient of y on x
\[ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = r\frac{\sigma_y}{\sigma_x} \]
2. Regression coefficient of x on y
\[ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = r\frac{\sigma_x}{\sigma_y} \]
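Both regression coefficients share the numerator ∑(x − x̄)(y − ȳ) and differ only in the denominator. The sketch below computes them from raw data in Python; `regression_coefficients` is a name introduced here for illustration, and the sample data are those of Problem 2 in the problems that follow.

```python
def regression_coefficients(xs, ys):
    """Return (x_mean, y_mean, b_yx, b_xy) using deviations from the means."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    s_yy = sum((y - y_mean) ** 2 for y in ys)
    b_yx = s_xy / s_xx   # slope of the regression line of y on x
    b_xy = s_xy / s_yy   # slope of the regression line of x on y
    return x_mean, y_mean, b_yx, b_xy


xs = [1, 3, 4, 2, 5, 8, 9, 10, 13, 15]
ys = [8, 6, 10, 8, 12, 16, 16, 10, 32, 32]
x_mean, y_mean, b_yx, b_xy = regression_coefficients(xs, ys)
print(x_mean, y_mean, round(b_yx, 2), round(b_xy, 2))
# The product b_yx * b_xy equals r squared.
print(round(b_yx * b_xy, 4))
```

The two lines are then y − ȳ = b_yx (x − x̄) and x − x̄ = b_xy (y − ȳ).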
Problems:
1. If the two regression equations of the variables x and y are x = 19.13 - 0.87y and y = 11.6 - 0.5x, find
(a) mean of x
(b) mean of y
(c)The correlation coefficient between x and y.
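Since both regression lines pass through the point of means, the means are the solution of the two equations, and r² is the product of the two regression coefficients with r taking their common sign. A hedged Python sketch of that reasoning follows; the function name and argument layout are assumptions made for illustration.

```python
import math


def means_and_r(c_x, b_xy, c_y, b_yx):
    """For x = c_x + b_xy*y and y = c_y + b_yx*x, return (x_mean, y_mean, r)."""
    product = b_xy * b_yx
    if not 0 <= product < 1:
        raise ValueError("coefficients must share a sign and have product below 1")
    # Substitute the second equation into the first and solve for x.
    x_mean = (c_x + b_xy * c_y) / (1 - product)
    y_mean = c_y + b_yx * x_mean
    # r has the same sign as the regression coefficients.
    r = math.copysign(math.sqrt(product), b_yx)
    return x_mean, y_mean, r


x_mean, y_mean, r = means_and_r(19.13, -0.87, 11.6, -0.5)
print(round(x_mean, 2), round(y_mean, 2), round(r, 4))
```

For the equations of this problem the sketch gives means of roughly 16 and 3.6 and r ≈ -0.66; r is negative because both regression coefficients are negative.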
2. For the data given below, obtain the two regression lines and hence obtain correlation
coefficient.
x 1 3 4 2 5 8 9 10 13 15
y 8 6 10 8 12 16 16 10 32 32
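Solution:
\[ \bar{x} = \frac{\sum x}{n} = \frac{70}{10} = 7 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{150}{10} = 15 \]
Let $X = x - \bar{x} = x - 7$ and $Y = y - \bar{y} = y - 15$. From the data, ∑XY = 360, ∑X² = 204 and ∑Y² = 818.
The line of regression of y on x is
\[ (y - \bar{y}) = b_{yx}(x - \bar{x}), \]
where
\[ b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{360}{204} = 1.76 \]
y - 15 = 1.76(x - 7)
y = 1.76x - 1.76 × 7 + 15
y = 1.76x + 2.68
The line of regression of x on y is
\[ (x - \bar{x}) = b_{xy}(y - \bar{y}), \]
where
\[ b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{360}{818} = 0.44 \]
x - 7 = 0.44(y - 15)
x = 0.44y - 0.44 × 15 + 7
x = 0.44y + 0.4
The correlation coefficient is
\[ r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{1.76 \times 0.44} = 0.88 \]
The sign of r is positive since both the regression coefficients are positive.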
3. In the following table are recorded data showing the test scores made by salesmen on an
intelligence test and their weekly sales:
Salesman           1     2     3     4     5     6     7     8     9     10    Total
Test scores        40    70    50    60    80    50    90    40    60    60    600
Sales (000)        2.5   6.0   4.5   5.0   4.5   2.0   5.5   3.0   4.5   3.0   40.5
Calculate the regression line of sales (y) on test scores (x) and estimate the most probable weekly
sales volume if a salesman makes a score of 70.
Solution:
\[ \bar{x} = \frac{\sum x}{n} = \frac{600}{10} = 60 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{40.5}{10} = 4.05 \]
The regression equation of y on x is
\[ (y - \bar{y}) = b_{yx}(x - \bar{x}), \]
where
\[ b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \]
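The remaining arithmetic can be carried out mechanically. The sketch below evaluates the formula for b_yx on the tabulated scores and sales and then uses the regression line to estimate sales at a score of 70; the helper names are hypothetical.

```python
scores = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]
sales = [2.5, 6.0, 4.5, 5.0, 4.5, 2.0, 5.5, 3.0, 4.5, 3.0]


def regression_y_on_x(xs, ys):
    """Return (x_mean, y_mean, b_yx) for the regression line of y on x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    return sx / n, sy / n, b_yx


def estimate_y(x, x_mean, y_mean, b_yx):
    """Value of y on the regression line y - y_mean = b_yx * (x - x_mean)."""
    return y_mean + b_yx * (x - x_mean)


x_mean, y_mean, b_yx = regression_y_on_x(scores, sales)
print(round(b_yx, 4))
print(round(estimate_y(70, x_mean, y_mean, b_yx), 2))
```

With ∑xy = 2570 and ∑x² = 38400 from the table, this gives b_yx = 1400/24000 ≈ 0.0583 and an estimated weekly sales volume of about 4.63 (thousand) for a score of 70.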
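The slopes of the regression lines of y on x and of x on y are $m_1 = \frac{r\sigma_y}{\sigma_x}$ and $m_2 = \frac{\sigma_y}{r\sigma_x}$ respectively. If θ is the angle between the two lines,
\[ \therefore \tan\theta = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{r\sigma_y}{\sigma_x} \times \dfrac{\sigma_y}{r\sigma_x}} = \frac{\dfrac{\sigma_y - r^2\sigma_y}{r\sigma_x}}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}. \]
When r = 0, $\tan\theta \to \infty$ or $\theta = \frac{\pi}{2}$, i.e., when the variables are uncorrelated or independent,
the two lines of regression are perpendicular to each other. When r = ±1, $\tan\theta = 0$, i.e. θ = 0 or π.
Thus, the lines of regression coincide, i.e., there is perfect correlation between the two
variables.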
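The formula for tan θ is easy to check numerically. The following sketch returns the acute angle between the two regression lines; taking |r| so that the angle is always acute is an assumption of this sketch, and the function name is illustrative.

```python
import math


def acute_angle_between_regression_lines(r, sigma_x, sigma_y):
    """Acute angle (radians) from tan(theta) = (1 - r^2)/|r| * sx*sy/(sx^2 + sy^2)."""
    if r == 0:
        return math.pi / 2   # uncorrelated: the lines are perpendicular
    tan_theta = (1 - r * r) / abs(r) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)
    return math.atan(tan_theta)


# r = 0.5 with sigma_y = 2 * sigma_x gives tan(theta) = 3/5.
theta = acute_angle_between_regression_lines(0.5, 1.0, 2.0)
print(round(math.tan(theta), 4))                         # 0.6
print(acute_angle_between_regression_lines(1.0, 1, 2))   # 0.0 (the lines coincide)
```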
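5. If the coefficient of correlation between two variables x and y is 0.5 and the acute angle
between their lines of regression is $\tan^{-1}\left(\frac{3}{5}\right)$, show that $\sigma_y = 2\sigma_x$ or $\sigma_x = 2\sigma_y$.
Solution: By data r = 0.5 and $\theta = \tan^{-1}\left(\frac{3}{5}\right)$, or $\tan\theta = \frac{3}{5}$.
The angle between the lines of regression is given by
\[ \tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \frac{3}{5} = \frac{1 - \left(\frac{1}{2}\right)^2}{\frac{1}{2}} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \text{i.e.} \quad \frac{3}{5} = \frac{3}{4} \cdot \frac{2}{1} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \text{i.e.} \quad \frac{1}{5} = \frac{1}{2} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ 2\sigma_x^2 + 2\sigma_y^2 = 5\sigma_x \sigma_y \]
\[ 2\sigma_x^2 - 5\sigma_x \sigma_y + 2\sigma_y^2 = 0 \]
\[ 2\sigma_x^2 - 4\sigma_x \sigma_y - \sigma_x \sigma_y + 2\sigma_y^2 = 0 \]
\[ 2\sigma_x(\sigma_x - 2\sigma_y) - \sigma_y(\sigma_x - 2\sigma_y) = 0 \]
\[ (\sigma_x - 2\sigma_y)(2\sigma_x - \sigma_y) = 0 \]
\[ \sigma_x = 2\sigma_y \quad \text{or} \quad 2\sigma_x = \sigma_y. \]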
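6. In a partially destroyed laboratory record of an analysis of correlation data, only the following
results are legible:
Variance of x = 9; regression equations 8x - 10y + 66 = 0 and 40x - 18y = 214. What are
(i) the mean values of x and y,
(ii) the correlation coefficient between x and y,
(iii) the standard deviation of y?
Solution:
(i) Since both the lines of regression pass through the point $(\bar{x}, \bar{y})$,
\[ 8\bar{x} - 10\bar{y} + 66 = 0 \]
\[ 40\bar{x} - 18\bar{y} - 214 = 0 \]
Solving these two equations gives $\bar{x} = 13$ and $\bar{y} = 17$.
(ii) Taking the first equation as the regression line of y on x and the second as that of x on y, we can write them as y = 0.8x + 6.6 and x = 0.45y + 5.35, so that
\[ b_{yx} = \frac{8}{10} = \frac{4}{5}, \qquad b_{xy} = \frac{18}{40} = \frac{9}{20} \]
hence
\[ r^2 = b_{yx}\, b_{xy} = \frac{9}{25}, \qquad r = \pm\frac{3}{5} = \pm 0.6 \]
Since both the regression coefficients are positive we take r = 0.6.
(iii) Variance of x = 9 gives $\sigma_x = 3$. From $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ we get $\frac{4}{5} = 0.6 \times \frac{\sigma_y}{3}$, so the standard deviation of y is $\sigma_y = 4$.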
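Rank correlation:
The rank correlation coefficient is given by
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad d = x - y, \]
where x and y represent rankings of two variables from 1 to n.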
Note:
(1) If the rankings of x and y are entirely in the same order, for example x : 1,2,3,4,5 ; y : 1,2,3,4,5,
then $\sum d^2 = \sum (x - y)^2 = 0$. This gives ρ = 1 and is called perfect direct correlation.
(2) If the rankings of x and y are entirely in the opposite order, for example x : 1,2,3,4,5 ;
y : 5,4,3,2,1, then $\sum d^2 = 40$. This gives ρ = -1 and is called perfect inverse
correlation.
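The two limiting cases in the note can be reproduced directly from the formula. A minimal sketch, assuming the ranks are given as two equal-length lists without ties (`spearman_rho` is an illustrative name):

```python
def spearman_rho(rank_x, rank_y):
    """Rank correlation rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = x - y."""
    n = len(rank_x)
    if n < 2:
        raise ValueError("at least two ranked items are needed")
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0  (same order)
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0 (opposite order)
```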
Problems:
1. Ten competitors in a beauty contest are ranked by two judges in the following order.
Compute the coefficient of rank correlation.
I    1   6   5   3   10   2   4   9    7   8
II   6   4   9   8   1    2   3   10   5   7
Solution: We have
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
\[ \sum d^2 = (1 - 6)^2 + (6 - 4)^2 + (5 - 9)^2 + (3 - 8)^2 + (10 - 1)^2 + (2 - 2)^2 + (4 - 3)^2 + (9 - 10)^2 + (7 - 5)^2 + (8 - 7)^2 \]
\[ = 25 + 4 + 16 + 25 + 81 + 0 + 1 + 1 + 4 + 1 = 158 \]
\[ \text{Hence} \quad \rho = 1 - \frac{6(158)}{10(10^2 - 1)} = 0.042 \]
2. Ten students got the following percentage of marks in two subjects x and y. Compute
their rank correlation coefficient.
Marks in x   78  36  98  25  75  82  90  62  65  39
Marks in y   84  51  91  60  68  62  86  58  53  47
Solution: We prepare the table consisting of the given data along with the ranks assigned according
to their order of magnitude. In the subject x, 98 will be awarded rank 1, 90 rank 2 and so on.
Marks in x   Rank (x)   Marks in y   Rank (y)   d = x - y   d²
78           4          84           3          1           1
36           9          51           9          0           0
98           1          91           1          0           0
25           10         60           6          4           16
75           5          68           4          1           1
82           3          62           5          -2          4
90           2          86           2          0           0
62           7          58           7          0           0
65           6          53           8          -2          4
39           8          47           10         -2          4
∑d² = 30
We have $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ and n = 10 for the given data.
\[ \rho = 1 - \frac{6(30)}{10(10^2 - 1)} = 0.82 \]
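Assigning ranks by hand is the error-prone step in problems of this kind. The sketch below ranks the marks in descending order and then applies the formula; it assumes there are no tied marks (true for this data), and the helper names are hypothetical.

```python
def ranks_descending(values):
    """Rank 1 for the largest value, 2 for the next, and so on (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_correlation(xs, ys):
    """Rank correlation of two lists of raw scores without ties."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


marks_x = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
marks_y = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
print(ranks_descending(marks_x))   # [4, 9, 1, 10, 5, 3, 2, 7, 6, 8]
print(round(rank_correlation(marks_x, marks_y), 2))  # 0.82
```

Tied marks would need averaged ranks and a correction to the formula, which this sketch does not attempt.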
3. Ten competitors in music contest are ranked by 3 judges A, B, C in the following order. Use
the rank correlation coefficient to decide which pair of judges have the nearest approach to
common taste of music
A 1 6 5 10 3 2 4 9 7 8
B 3 5 8 4 7 10 2 1 6 9
C 6 4 9 8 1 2 3 10 5 7
Solution: We shall compute 𝜌𝐴𝐵, 𝜌𝐵𝐶, 𝜌𝐶𝐴 with the help of the following table where d is the
difference in ranks.
A     B     C     d²(AB)   d²(BC)   d²(CA)
1     3     6     4        9        25
6     5     4     1        1        4
5     8     9     9        1        16
10    4     8     36       16       4
3     7     1     16       36       4
2     10    2     64       64       0
4     2     3     4        1        1
9     1     10    64       81       1
7     6     5     1        1        4
8     9     7     1        4        1
∑d²(AB) = 200   ∑d²(BC) = 214   ∑d²(CA) = 60
We have $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ and n = 10 for the given data.
\[ \rho_{AB} = 1 - \frac{6(200)}{10(10^2 - 1)} = -0.21 \]
\[ \rho_{BC} = 1 - \frac{6(214)}{10(10^2 - 1)} = -0.297 \]
\[ \rho_{CA} = 1 - \frac{6(60)}{10(10^2 - 1)} = +0.636 \]
It may be observed that $\rho_{AB}$ and $\rho_{BC}$ are negative, which means the tastes of these pairs (A and B; B
and C) are opposite. But $\rho_{CA}$ is positive and is nearer to 1 (perfect correlation), so the judges C and A have the nearest approach to common taste of music.
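The pairwise comparison of judges can be automated by computing ρ for every pair and picking the largest value. A small sketch under that assumption, using the standard-library `itertools.combinations`; the function names are illustrative.

```python
from itertools import combinations

ranks = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}


def spearman_rho(rank_x, rank_y):
    """Rank correlation rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def closest_pair(rank_table):
    """Return the pair of judges with the largest rho, plus all pairwise values."""
    rho = {
        (p, q): spearman_rho(rank_table[p], rank_table[q])
        for p, q in combinations(sorted(rank_table), 2)
    }
    best = max(rho, key=rho.get)
    return best, rho


best, rho = closest_pair(ranks)
for pair, value in rho.items():
    print(pair, round(value, 3))
print("nearest common taste:", best)   # ('A', 'C')
```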
Exercise:
1. The equations of regression lines of two variables x and y are x =19.3 – 0.87y and y =
11.64-0.5x, Find the correlation coefficient and the means of x and y.
2. If the tangent of the angle between the lines of regression of y on x and x on y is 0.6 and
the standard deviation of y is twice the standard deviation of x, find the coefficient of
correlation between x and y.
3. The following information is available in respect of the prices of a certain consumer item
in two cities: A, B. Average price in city A is Rs.65; average price in city B is Rs.67;
standard deviation in city A is 2.5; standard deviation in city B is 3.5. The coefficient of
correlation between the prices in the two cities is 0.8. Find the most likely price in city B
corresponding to the price of Rs.70 in city A.
Rank in Economics    1  2  3  4  5  6  7  8  9   10
Rank in Statistics   4  8  2  3  5  7  6  9  10  1
X 17 13 15 16 6 11 14 9 7 12
Y 36 46 35 24 12 18 27 22 2 8
Video links:
https://youtu.be/i6ZmA9EEzrI
https://youtu.be/rWsiBmA_8Q4
https://youtu.be/Kskex59qnN4
Disclaimer: The content provided is prepared by department of Mathematics for the specified syllabus
by using reference books mentioned in the syllabus. This material is specifically for the use of RVITM
students and for education purpose only.