Module-IV Curve Fitting & Statistical Methods: RV Institute of Technology & Management


RV Institute of Technology & Management ®

Module-IV

CURVE FITTING & STATISTICAL METHODS

Topic Learning Objectives:

Upon Completion of this unit, students will be able to:

• Expand their knowledge of and skills in statistical concepts, directed towards the needs of statistical data analysis.
• Understand the Least Squares Method.
• Fit data using several types of curves.
• Evaluate correlation and regression coefficients.
• Investigate the strength and direction of a relationship between two variables by collecting measurements and using appropriate statistical analysis.

Introduction:

In many fields of Applied Mathematics and Engineering we encounter problems and experiments
involving two variables.
In this chapter, we consider the mathematical theory of statistics by presenting an elementary
treatment of curve fitting, correlation and regression.
Suppose we are given n values x1, x2, x3, ..., xn of an independent variable x and the
corresponding values y1, y2, y3, ..., yn of a variable y depending on x. Then the pairs (x1, y1),
(x2, y2), ..., (xn, yn) give us n points in the xy-plane. Generally, it is not possible to find the actual
curve y = f(x) that passes through these points. Hence, we try to find a curve that serves as the best
approximation to the curve y = f(x). Such a curve is referred to as the curve of best fit. The process of
determining a curve of best fit is called curve fitting. One method for finding the curve of best fit is the
method of least squares.

Method of Least squares:

The method of least squares requires that the curve pass as close as possible to all the points.

IV-Semester, Complex Analysis, Probability and Statistical Methods (18MAT41)


Let y = f(x) be an approximate relation that fits the data $(x_i, y_i)$. The $y_i$ are called the observed values
and $Y_i = f(x_i)$ the expected values. The differences $E_i = y_i - Y_i$ are called the estimated errors
or residuals.
The method of least squares provides a relationship y = f(x) such that the sum of the squares of the residuals
is least. Such a curve is known as the least squares curve.
We will discuss the fitting of the following types of the curves.

Fitting of a straight line: y = a + bx

Let y = a + bx be the equation of the straight line.

The error at a data point (x, y) is y - (a + bx) = y - a - bx.
By the principle of least squares we have to determine the constants a and b such that
\[ E = \sum_{1}^{n} (y - a - bx)^2 \]
is minimum.
For E to be minimum the two necessary conditions are
\[ \frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0. \]
The first condition gives
\[ \sum_{1}^{n} 2(y - a - bx)(-1) = 0 \;\Rightarrow\; \sum y - na - b\sum x = 0 \;\Rightarrow\; \sum y = na + b\sum x. \]
The second condition gives
\[ \sum_{1}^{n} 2(y - a - bx)(-x) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2. \]
The normal equations for estimating the values of a and b are
\[ \sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2. \]


Solving the above normal equations, we estimate the values of a & b. With these values of a and b
y = a + bx is the line of best fit.

Fitting of a second-degree equation (quadratic): y = a + bx + cx^2

Let y = a + bx + cx^2 be the equation of the parabola.

The error at a data point (x, y) is y - a - bx - cx^2.

By the principle of least squares we have to determine the constants a, b and c such that
\[ E = \sum_{1}^{n} (y - a - bx - cx^2)^2 \]
is minimum.
For E to be minimum,
\[ \frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0, \qquad \frac{\partial E}{\partial c} = 0. \]
\[ \frac{\partial E}{\partial a} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx - cx^2)(-1) = 0 \;\Rightarrow\; \sum y = na + b\sum x + c\sum x^2 \]
\[ \frac{\partial E}{\partial b} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx - cx^2)(-x) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \frac{\partial E}{\partial c} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx - cx^2)(-x^2) = 0 \;\Rightarrow\; \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]
The normal equations for estimating the values of a, b and c are
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]


Solving the above equations, we estimate the values of a, b & c. With these values of a, b & c,

y = a + bx + cx^2 is the parabola of best fit.

Fitting of a curve of the form: y = ax^b

Let y = ax^b.
Taking logarithms on both sides,
log y = log a + b log x,
i.e. Y = A + bX, where Y = log y, A = log a, X = log x.
The normal equations are
\[ \sum Y = nA + b\sum X, \qquad \sum XY = A\sum X + b\sum X^2. \]
Solving the above equations, we estimate the values of A and b, and recover a from A = log a. With these
values of a and b, y = ax^b is the curve of best fit.

Problems:

1. Fit a straight line for the following data

x 1 2 3 4 5 6
y 6 4 3 5 4 2

Solution: The normal equations for estimating the values of a and b in y = a + bx are
\[ \sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2. \]


x      y      x^2     xy
1      6      1       6
2      4      4       8
3      3      9       9
4      5      16      20
5      4      25      20
6      2      36      12
∑x = 21   ∑y = 24   ∑x^2 = 91   ∑xy = 75

Given n = 6, ∑x = 21, ∑y = 24, ∑xy = 75, ∑x^2 = 91.

Therefore, we get
24 = 6a + 21b and 75 = 21a + 91b
Solving, we get a = 5.8, b = -0.514
Therefore, the equation of best fit is y = 5.8 - 0.514x
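
The arithmetic of Problem 1 can be checked with a short script. The following is a minimal sketch, not part of the original notes: it solves the two normal equations by Cramer's rule, and the function name `fit_line` is a hypothetical helper introduced only for illustration.

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x and return (a, b)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of the 2x2 system; it is zero only if all x values coincide.
    det = n * sum_xx - sum_x * sum_x
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Data of Problem 1.
a, b = fit_line([1, 2, 3, 4, 5, 6], [6, 4, 3, 5, 4, 2])
print(round(a, 3), round(b, 3))
```

The same helper applies to Problem 2 if the returned pair is read as (intercept, slope), since that problem writes the line as y = ax + b.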

2. Fit a straight line of the form y= ax +b for the following data by the method of
least squares.
x 5 10 15 20 25
y 16 19 23 26 30
Solution: Let y= ax +b be the given straight line.
The normal equations are
∑y = a∑x + nb
∑xy = a∑x2 + b∑x


x      y      x^2     xy
5      16     25      80
10     19     100     190
15     23     225     345
20     26     400     520
25     30     625     750
∑x = 75   ∑y = 114   ∑x^2 = 1375   ∑xy = 1885

Therefore, with n = 5, ∑y = 114, ∑x = 75, ∑xy = 1885, ∑x^2 = 1375.
Substituting in the above equations we get a = 0.7, b = 12.3
The best fit is y = 0.7x + 12.3

3. Fit a power function (geometric curve) of the form y = ax^b to the data given below.

x: 20 16 10 11 14
y: 22 41 120 89 56
Solution: Given y = ax^b.
Taking logarithms on both sides, we get
log y = log a + b log x, i.e. Y = A + bX,
where Y = log y, A = log a & X = log x.


Using natural logarithms, n = 5, ∑X = 13.1079, ∑Y = 20.1061, ∑X^2 = 34.6782 and ∑XY = 51.9663.
Thus, the normal equations are:

5A + 13.1079b = 20.1061, 13.1079A + 34.6782b = 51.9663
On solving these two equations, we get A = 10.2127 & b = -2.3617.
Therefore, a = e^{10.2127} = 27247.
Thus, the least squares geometric curve is y = 27247 x^{-2.3617}.
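
The log-transform approach of Problem 3 can be sketched in code as follows. This is an illustrative sketch under the assumption that natural logarithms are used (as a = e^A suggests); `fit_power` is a hypothetical name. Because the script keeps full floating-point precision, its results may differ from the hand-computed values in the third or fourth significant figure, since the normal equations here are sensitive to rounding of the sums.

```python
import math


def fit_power(xs, ys):
    """Fit y = a * x**b by a straight-line fit of ln(y) against ln(x)."""
    X = [math.log(x) for x in xs]
    Y = [math.log(y) for y in ys]
    n = len(X)
    sum_X, sum_Y = sum(X), sum(Y)
    sum_XX = sum(v * v for v in X)
    sum_XY = sum(u * v for u, v in zip(X, Y))
    det = n * sum_XX - sum_X * sum_X
    A = (sum_Y * sum_XX - sum_X * sum_XY) / det   # A = ln(a)
    b = (n * sum_XY - sum_X * sum_Y) / det
    return math.exp(A), b


# Data of Problem 3.
a, b = fit_power([20, 16, 10, 11, 14], [22, 41, 120, 89, 56])
print(a, b)
```

Note that this minimises the squared error in log y, not in y itself, which is the usual consequence of linearising a power law.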

4. Fit a curve of the form y = a + bx + cx^2 to the data by the method of least squares.
x 1 2 3 4
y 1.7 1.8 2.3 3.2
Solution:
x     y      xy      x^2    x^3    x^4    x^2 y
1     1.7    1.7     1      1      1      1.7
2     1.8    3.6     4      8      16     7.2
3     2.3    6.9     9      27     81     20.7
4     3.2    12.8    16     64     256    51.2
∑x = 10   ∑y = 9   ∑xy = 25   ∑x^2 = 30   ∑x^3 = 100   ∑x^4 = 354   ∑x^2 y = 80.8

Substitute these values, with n = 4, in the normal equations
\[ \sum y = na + b\sum x + c\sum x^2 \]
\[ \sum xy = a\sum x + b\sum x^2 + c\sum x^3 \]
\[ \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \]
This gives
4a + 10b + 30c = 9
10a + 30b + 100c = 25
30a + 100b + 354c = 80.8

We get a = 2, b = -0.5, c = 0.2.

Then the curve of best fit is y = 2 - 0.5x + 0.2x^2
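
The three normal equations for the parabola can also be assembled and solved programmatically. The sketch below is an illustration only: `solve_linear` is a hypothetical small Gauss-Jordan solver written for this example (any linear solver would do), and `fit_parabola` builds the system from the power sums.

```python
def solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_parabola(xs, ys):
    """Return (a, b, c) for the least squares parabola y = a + b*x + c*x**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]                   # s[k] = sum of x^k, s[0] = n
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]  # sums of y, xy, x^2 y
    system = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    return solve_linear(system, t)


# Data of Problem 4.
a, b, c = fit_parabola([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(round(a, 4), round(b, 4), round(c, 4))
```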

5. Fit a parabola of the form y = a + bx + cx^2 to the following data

x : 1 2 3 4
y : 4 6 3 2

Solution: For this the normal equations are:

na + b∑x + c∑x^2 = ∑y
a∑x + b∑x^2 + c∑x^3 = ∑xy
a∑x^2 + b∑x^3 + c∑x^4 = ∑x^2 y.
The relevant table is as follows:

x     y     x^2    x^3    x^4    xy    x^2 y
1     4     1      1      1      4     4
2     6     4      8      16     12    24
3     3     9      27     81     9     27
4     2     16     64     256    8     32
∑x = 10   ∑y = 15   ∑x^2 = 30   ∑x^3 = 100   ∑x^4 = 354   ∑xy = 33   ∑x^2 y = 87

Thus, with n = 4, the normal equations take the form:

4a + 10b + 30c = 15
10a + 30b + 100c = 33
30a + 100b + 354c = 87.
Solving these equations, we obtain
a = 2.250, b = 2.850 and c = -0.75. Hence,
y = 2.250 + 2.850x - 0.75x^2 is the required parabola of fit.
Exercise:
1. An experiment gave the following values
v(ft/min) 350 400 500 600
t(min) 61 26 7 26
It is known that v and t are connected by the relation v = at^b. Find the best
values of a and b.


2. The following table gives the production (in thousands of units) of a certain commodity in
different years:
Year x 1958 1968 1978 1988 1998
Production y 8 10 12 10 16
Fit a straight line to the data and estimate the production in the year 2005.
3. A simply supported beam carries a concentrated load P at its mid-point. Corresponding to
various values of P, the maximum deflection D is measured and the values are as given below:
P 100 120 140 160 180 200
D 0.45 0.55 0.6 0.7 0.8 0.85
Find a linear law of the form D=a+bP.
4. In some determination of the volume V of carbon dioxide dissolved in a given volume of water
at different temperatures T, the following pairs of values were obtained.

T 0 5 10 15
V 1.80 1.45 1.18 1

Obtain by the method of least squares a relation of the form V = a+bT which
best fits to these observations.
5. The following table gives the results of the measurements of train resistances; V is the velocity
in mile per hour. R is the resistance in pound per ton.

V 20 40 60 80 100 120
R 5.5 9.1 14.9 22.8 33.3 46

If R is related to V by the relation R = a + bV + cV^2, find a, b and c.

Correlation and Regression:

The word correlation is used in everyday life to denote some form of association. In statistical terms
we use correlation to denote association between two quantitative variables. We also assume that the
association is linear, that is, one variable increases or decreases by a fixed amount for a unit increase or
decrease in the other. The other technique that is often used in these circumstances is regression, which
involves estimating the best straight line to summarize the association.
Correlation
Correlation means simply a relation between two or more variables.
Two variables are said to be correlated if the change in one variable results in a corresponding change
in the other.
Eg: 1. x: supply y: price
2. x: demand y: Price
Types of correlation
Positive correlation:
If an increase (or decrease) in one variable corresponds to an increase (or decrease) in the other, then the
correlation is said to be positive or direct.
Eg: 1. Demand and price of a commodity.
2. Income and expenditure.
Negative correlation:
If an increase (or decrease) in one variable corresponds to a decrease (or increase) in the other, then the
correlation is said to be negative or inverse.
Eg: 1. Supply and price of a commodity.
2. Volume and pressure of a perfect gas.
No correlation:
If there exists no relationship between two variables then they are said to be non-correlated.
Scatter diagram:
To obtain a measure of relationship between two variables x and y we plot their corresponding values
in the xy-plane. The resulting diagram (Fig. 4.1), showing the collection of dots, is called the dot
diagram or scatter diagram.


Fig. 4.1 Scatter diagram

Correlation Coefficient (Karl Pearson correlation coefficient)

The degree of association is measured by a correlation coefficient, denoted by r. It is sometimes
called Karl Pearson's correlation coefficient and is a measure of linear association. If a curved line is
needed to express the relationship, other and more complicated measures of the correlation must be
used.
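Let x1, x2, x3, ..., xn be n values of x and y1, y2, y3, ..., yn be the corresponding n values
of y. Then the coefficient of correlation between x and y is
\[ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y}, \]
where $\sigma_x^2$ is the variance of the x series, $\sigma_y^2$ is the variance of the y series,
\[ \bar{x} = \frac{\sum x}{n} \text{ is the mean of the x series and } \bar{y} = \frac{\sum y}{n} \text{ is the mean of the y series.} \]
For computation purposes we can use the formula
\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \]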
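
As an illustration of the computational formula, here is a minimal sketch in Python. The function name `pearson_r` is an assumption made for this example; the data are the husband and wife ages used in Problem 2 of the correlation problems in this module.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]
r = pearson_r(husband, wife)
print(round(r, 4))
```

The denominator vanishes when either series is constant, in which case r is undefined and the function raises `ZeroDivisionError`.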


Limits for correlation coefficient

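The coefficient of correlation numerically does not exceed unity ($-1 \le r \le 1$).
Proof: We have
\[ r = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}}, \quad i = 1, 2, \ldots, n. \]
Writing $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$,
\[ r = \frac{\frac{1}{n}\sum a_i b_i}{\sqrt{\frac{1}{n}\sum a_i^2}\,\sqrt{\frac{1}{n}\sum b_i^2}} \]
\[ r^2 = \frac{\left(\sum a_i b_i\right)^2}{\sum a_i^2 \sum b_i^2} \qquad (1) \]
The Schwarz inequality states that if $a_i, b_i$, i = 1, 2, ..., n are real quantities then
\[ \left(\sum a_i b_i\right)^2 \le \sum a_i^2 \sum b_i^2, \]
the sign of equality holding if and only if
\[ \frac{a_1}{b_1} = \frac{a_2}{b_2} = \frac{a_3}{b_3} = \cdots = \frac{a_n}{b_n}. \]
Using this, equation (1) becomes
\[ r^2 \le 1 \;\Rightarrow\; |r| \le 1 \;\Rightarrow\; -1 \le r \le 1. \]
Hence the correlation coefficient cannot exceed unity numerically.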


Note:

Fig. 4.2 Correlation illustrated.

1. If r = -1 there is a perfect negative correlation.

2. If r = 1 there is a perfect positive correlation.
3. If r = 0 then the variables are non-correlated.

4. When r = 0, θ = π/2, where θ is the angle between the two lines of regression, i.e. when the variables
are uncorrelated the two lines of regression are perpendicular to each other.
5. When r = ±1, θ = 0 or π, i.e. the lines of regression coincide.

Problems:
1. While calculating the correlation coefficient between x and y from 25 pairs of observations a
person obtained the following values:
∑x_i = 125, ∑x_i^2 = 650, ∑y_i = 100, ∑y_i^2 = 460, ∑x_i y_i = 508.
It was later discovered that he had copied down the
pairs (8,12) and (6,8) as (6,12) and (8,6) respectively. Obtain the correct value of the correlation
coefficient.

Solution: Removing the wrong pairs and adding the correct ones gives the corrected values
∑x_i = 125, ∑x_i^2 = 650, ∑y_i = 102, ∑y_i^2 = 488, ∑x_i y_i = 532, n = 25.
\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = 0.51912 \]
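
The correction step in this problem (subtract the contribution of each wrongly copied pair, add the contribution of each correct pair) can be sketched as follows. The dictionary keys and the helper `adjust` are names chosen for this illustration only.

```python
import math

n = 25
# Sums as originally (incorrectly) recorded.
totals = {"x": 125, "xx": 650, "y": 100, "yy": 460, "xy": 508}


def adjust(totals, pair, sign):
    """Add (sign=+1) or remove (sign=-1) one observation's contribution to every sum."""
    x, y = pair
    totals["x"] += sign * x
    totals["xx"] += sign * x * x
    totals["y"] += sign * y
    totals["yy"] += sign * y * y
    totals["xy"] += sign * x * y


for wrong_pair in [(6, 12), (8, 6)]:
    adjust(totals, wrong_pair, -1)
for correct_pair in [(8, 12), (6, 8)]:
    adjust(totals, correct_pair, +1)

numerator = n * totals["xy"] - totals["x"] * totals["y"]
denominator = math.sqrt((n * totals["xx"] - totals["x"] ** 2)
                        * (n * totals["yy"] - totals["y"] ** 2))
r = numerator / denominator
print(totals, round(r, 5))
```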

2. The following table gives the ages (in years) of 10 married couples. Calculate the coefficient of
correlation between these ages.

Age of husband (x)   23 27 28 29 30 31 33 35 36 39
Age of wife (y)      18 22 23 24 25 26 28 29 30 32

Solution: Here n = 10. We find
\[ \bar{x} = \frac{1}{n}\sum x_i = \frac{311}{10} = 31.1, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{257}{10} = 25.7 \]
With $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$:

x_i    X_i     X_i^2    Y_i     Y_i^2    X_i Y_i
23     -8.1    65.61    -7.7    59.29    62.37
27     -4.1    16.81    -3.7    13.69    15.17
28     -3.1    9.61     -2.7    7.29     8.37
29     -2.1    4.41     -1.7    2.89     3.57
30     -1.1    1.21     -0.7    0.49     0.77
31     -0.1    0.01     0.3     0.09     -0.03
33     1.9     3.61     2.3     5.29     4.37
35     3.9     15.21    3.3     10.89    12.87
36     4.9     24.01    4.3     18.49    21.07
39     7.9     62.41    6.3     39.69    49.77
∑X_i^2 = 202.9   ∑Y_i^2 = 158.10   ∑X_i Y_i = 178.3

\[ r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = 0.9955 \]

Here the correlation is almost perfect, i.e. the ages of husbands and wives are almost perfectly correlated.

3. Psychological tests of intelligence and of engineering ability were applied to 10 students.
Calculate the coefficient of correlation between intelligence ratio (I.R.) and engineering ratio
(E.R.).

x: 105 104 102 101 100 99 98 96 93 92

y: 101 103 100 98 95 96 104 92 97 94

Solution: First we prepare a table including the data of the x and y series, and calculate the necessary
totals required to compute r.

\[ \bar{x} = \frac{990}{10} = 99 \quad \& \quad \bar{y} = \frac{980}{10} = 98 \]
With $X = x - \bar{x}$ and $Y = y - \bar{y}$,
\[ r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{92}{\sqrt{170 \times 140}} = 0.5963. \]


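5. Compute Pearson's coefficient of correlation between x & y from the following data:
x: 1 2 3 4
y: 1 4 9 16
Solution: Here n = 4, ∑x = 10, ∑y = 30, ∑xy = 100, ∑x^2 = 30 and ∑y^2 = 354.
\[ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{4 \times 100 - 10 \times 30}{\sqrt{\left(4 \times 30 - 10^2\right)\left(4 \times 354 - 30^2\right)}} = \frac{400 - 300}{\sqrt{20 \times 516}} = 0.9844 \]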

Regression:
Correlation describes the strength of an association between two variables, and is completely
symmetrical: the correlation between A and B is the same as the correlation between B and A.
However, if the two variables are related it means that when one changes by a certain amount the other
changes on average by a certain amount. The relationship can be represented by a simple equation
called the regression equation. In this context "regression" (the term is a historical anomaly) simply
means that the average value of y is a "function" of x, that is, it changes with x.


Regression analysis is a mathematical measure of the average relationship between two or more
variables in terms of the original units of data.

Line of regression:
Line of regression is the line which gives the best estimate to the value of one variable for any specific
value of the other variable. So, the line of regression is the line of best fit.
Regression line of y on x:
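Let the regression line of y on x be y = a + bx.
The normal equations by the method of least squares are
\[ \sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2. \]
Dividing the first equation by n,
\[ \frac{1}{n}\sum y = a + \frac{b}{n}\sum x, \quad \text{i.e.} \quad \bar{y} = a + b\bar{x}, \]
so the regression line passes through $(\bar{x}, \bar{y})$.
Writing $X = x - \bar{x}$ and $Y = y - \bar{y}$, the slope is
\[ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{\sum XY}{n\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}. \]
Hence
\[ y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) \]
is the regression line of y on x.

Similarly,
\[ x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}) \]
is the regression line of x on y.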

Note:
1. Regression coefficient of y on x:
\[ b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = r\,\frac{\sigma_y}{\sigma_x} \]
2. Regression coefficient of x on y:
\[ b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} = r\,\frac{\sigma_x}{\sigma_y} \]

Problems:
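1. If the two regression equations of the variables x and y are x = 19.13 - 0.87y and y = 11.64 - 0.5x, find
(a) mean of x
(b) mean of y
(c) the correlation coefficient between x and y.

Solution: Since the point $(\bar{x}, \bar{y})$ lies on both regression lines,
\[ \bar{x} = 19.13 - 0.87\bar{y}, \qquad \bar{y} = 11.64 - 0.5\bar{x}. \]
Solving, $\bar{x} = 15.93$, $\bar{y} = 3.67$.
\[ b_{yx} = -0.5, \quad b_{xy} = -0.87, \quad r = -\sqrt{0.5 \times 0.87} = -0.66 \]
The sign of r is negative since both the regression coefficients are negative.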

2. For the data given below, obtain the two regression lines and hence obtain correlation
coefficient.
x 1 3 4 2 5 8 9 10 13 15

y 8 6 10 8 12 16 16 10 32 32

Solution: First we consider a table containing all the required results. Here
\[ \bar{x} = \frac{\sum x}{n} = \frac{70}{10} = 7 \quad \& \quad \bar{y} = \frac{\sum y}{n} = \frac{150}{10} = 15 \]
Let $X = x - \bar{x} = x - 7$ & $Y = y - \bar{y} = y - 15$.

x     y     X     Y     X^2    Y^2    XY
1     8     -6    -7    36     49     42
3     6     -4    -9    16     81     36
4     10    -3    -5    9      25     15
2     8     -5    -7    25     49     35
5     12    -2    -3    4      9      6
8     16    1     1     1      1      1
9     16    2     1     4      1      2
10    10    3     -5    9      25     -15
13    32    6     17    36     289    102
15    32    8     17    64     289    136
∑x = 70   ∑y = 150   ∑X^2 = 204   ∑Y^2 = 818   ∑XY = 360

The line of regression of y on x is
\[ (y - \bar{y}) = b_{yx}(x - \bar{x}), \]
where
\[ b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{360}{204} = 1.76 \]
\[ \therefore\; y - 15 = 1.76(x - 7) \;\Rightarrow\; y = 1.76x - 1.76 \times 7 + 15 \;\Rightarrow\; y = 1.76x + 2.68 \]
The line of regression of x on y is
\[ (x - \bar{x}) = b_{xy}(y - \bar{y}), \]
where
\[ b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{360}{818} = 0.44 \]
\[ \therefore\; x - 7 = 0.44(y - 15) \;\Rightarrow\; x = 0.44y - 0.44 \times 15 + 7 \;\Rightarrow\; x = 0.44y + 0.4 \]
\[ \therefore\; r = \pm\sqrt{b_{xy}\, b_{yx}} = +\sqrt{1.76 \times 0.44} = +0.88 \]
The sign of r is positive since both the regression coefficients are positive.
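
The deviation-based computation used in this problem can be sketched in a few lines. This is an illustrative sketch only: `regression_summary` is a hypothetical helper, and r is recovered as the geometric mean of the two regression coefficients with the sign of ∑XY, as in the solution above.

```python
import math


def regression_summary(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy, r) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_XY = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_XX = sum((x - mean_x) ** 2 for x in xs)
    sum_YY = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sum_XY / sum_XX          # slope of the regression line of y on x
    b_xy = sum_XY / sum_YY          # slope of the regression line of x on y
    # r has the common sign of the two regression coefficients.
    r = math.copysign(math.sqrt(b_yx * b_xy), sum_XY)
    return mean_x, mean_y, b_yx, b_xy, r


x = [1, 3, 4, 2, 5, 8, 9, 10, 13, 15]
y = [8, 6, 10, 8, 12, 16, 16, 10, 32, 32]
mean_x, mean_y, b_yx, b_xy, r = regression_summary(x, y)
print(mean_x, mean_y, round(b_yx, 4), round(b_xy, 4), round(r, 4))
```

Since the script does not round the coefficients to two decimals before taking the square root, its value of r can differ from the hand-computed one in the third decimal place.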

3. The following table records the test scores made by salesmen on an
intelligence test and their weekly sales:

Salesman       1     2     3     4     5     6     7     8     9     10    Total
Test scores    40    70    50    60    80    50    90    40    60    60    600
Sales (000)    2.5   6.0   4.5   5.0   4.5   2.0   5.5   3.0   4.5   3.0   40.5

Calculate the regression line of sales (y) on test scores (x) and estimate the most probable weekly
sales volume if a salesman makes a score of 70.
Solution:
\[ \bar{x} = \frac{\sum x}{n} = \frac{600}{10} = 60 \quad \& \quad \bar{y} = \frac{\sum y}{n} = \frac{40.5}{10} = 4.05 \]
The regression equation of y on x is
\[ (y - \bar{y}) = b_{yx}(x - \bar{x}), \]
where
\[ b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10 \times 2570 - 600 \times 40.5}{10 \times 38400 - 600^2} = \frac{25700 - 24300}{384000 - 360000} = 0.058 \approx 0.06 \]
Hence the regression equation is
y - 4.05 = 0.06(x - 60)
⇒ y = 0.06x - 3.6 + 4.05 ⇒ y = 0.06x + 0.45
To find the probable weekly sales (y) when the score (x) = 70:
y = 0.06 × 70 + 0.45 = 4.20 + 0.45 = 4.65.
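
A short script can reproduce this estimate from the raw data. This is a sketch for checking the arithmetic, with `predict_sales` as a hypothetical helper name. It keeps the unrounded slope 1400/24000, so its estimate comes out slightly below the 4.65 obtained above after rounding the slope to 0.06.

```python
scores = [40, 70, 50, 60, 80, 50, 90, 40, 60, 60]
sales = [2.5, 6.0, 4.5, 5.0, 4.5, 2.0, 5.5, 3.0, 4.5, 3.0]

n = len(scores)
sum_x, sum_y = sum(scores), sum(sales)
sum_xy = sum(x * y for x, y in zip(scores, sales))
sum_xx = sum(x * x for x in scores)

# Regression coefficient of y (sales) on x (score) from raw sums.
b_yx = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
mean_x, mean_y = sum_x / n, sum_y / n


def predict_sales(score):
    """Estimate weekly sales (in thousands) from the regression line of y on x."""
    return mean_y + b_yx * (score - mean_x)


estimate = predict_sales(70)
print(round(b_yx, 4), round(estimate, 3))
```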

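4. If θ is the angle between the two regression lines, show that
\[ \tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}. \]
Explain the significance when r = 0 and r = ±1.
Solution: We know that if θ is acute, the angle between the lines y = m1 x + c1 and y = m2 x + c2
is given by
\[ \tan\theta = \left| \frac{m_2 - m_1}{1 + m_1 m_2} \right|. \]
We know that
\[ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \& \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}. \]
Therefore, the slopes of the regression lines are given by
\[ m_1 = b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \& \quad m_2 = \frac{1}{b_{xy}} = \frac{\sigma_y}{r\,\sigma_x}. \]
\[ \therefore\; \tan\theta = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{r\sigma_y}{\sigma_x} \times \dfrac{\sigma_y}{r\sigma_x}} = \frac{\dfrac{\sigma_y - r^2\sigma_y}{r\sigma_x}}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}. \]
When r = 0, tan θ → ∞ or θ = π/2, i.e., when the variables are uncorrelated or independent,
the two lines of regression are perpendicular to each other. When r = ±1, tan θ = 0, i.e. θ = 0 or π.
Thus, the lines of regression coincide, i.e., there is perfect correlation between the two
variables.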
5. If the coefficient of correlation between two variables x and y is 0.5 & the acute angle
between their lines of regression is $\tan^{-1}(3/5)$, show that $\sigma_y = 2\sigma_x$ or $\sigma_x = 2\sigma_y$.
Solution: By data r = 0.5 and $\theta = \tan^{-1}(3/5)$, i.e. $\tan\theta = 3/5$.
The angle between the lines of regression is given by
\[ \tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \therefore\; \frac{3}{5} = \frac{1 - \left(\frac{1}{2}\right)^2}{\frac{1}{2}} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \text{i.e.} \quad \frac{3}{5} = \frac{3}{4} \cdot \frac{2}{1} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ \text{i.e.} \quad \frac{1}{5} = \frac{1}{2} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \]
\[ 2\sigma_x^2 + 2\sigma_y^2 = 5\sigma_x \sigma_y \]
\[ 2\sigma_x^2 - 5\sigma_x \sigma_y + 2\sigma_y^2 = 0 \]
\[ 2\sigma_x^2 - 4\sigma_x \sigma_y - \sigma_x \sigma_y + 2\sigma_y^2 = 0 \]
\[ 2\sigma_x(\sigma_x - 2\sigma_y) - \sigma_y(\sigma_x - 2\sigma_y) = 0 \]
\[ (\sigma_x - 2\sigma_y)(2\sigma_x - \sigma_y) = 0 \]
\[ \Rightarrow\; \sigma_x = 2\sigma_y \quad \text{or} \quad 2\sigma_x = \sigma_y. \]
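
The angle formula lends itself to a quick numerical check. The sketch below is illustrative: `tan_theta` is a hypothetical helper, the absolute value is taken so that the acute angle is returned, and the standard deviations 2 and 1 are arbitrary values chosen only because they satisfy the ratio σx = 2σy derived above.

```python
def tan_theta(r, sigma_x, sigma_y):
    """Tangent of the acute angle between the two regression lines (r must be non-zero)."""
    return abs((1 - r * r) / r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)


# With r = 0.5, either ratio sigma_x = 2*sigma_y or sigma_y = 2*sigma_x should give tan(theta) = 3/5.
print(tan_theta(0.5, 2.0, 1.0), tan_theta(0.5, 1.0, 2.0))

# Perfect correlation: the two lines coincide, so the angle is zero.
print(tan_theta(1.0, 3.0, 4.0))
```

For r = 0 the function raises `ZeroDivisionError`, which corresponds to the limiting case tan θ → ∞ (perpendicular lines).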
6. In a partially destroyed laboratory record of an analysis of correlation data, only the following
results are legible:
Variance of x = 9; regression equations 8x - 10y + 66 = 0 and 40x - 18y = 214. What are
(i) the mean values of x and y,
(ii) the correlation coefficient between x and y,
(iii) the standard deviation of y?
Solution:
(i) Since both the lines of regression pass through the point $(\bar{x}, \bar{y})$,
\[ 8\bar{x} - 10\bar{y} + 66 = 0 \]
\[ 40\bar{x} - 18\bar{y} - 214 = 0 \]
Solving these, $\bar{x} = 13$, $\bar{y} = 17$.

(ii) $\sigma_x^2 = 9$, so $\sigma_x = 3$.
Let 8x - 10y + 66 = 0 and 40x - 18y = 214 be the lines of regression of y on x and
x on y respectively. Then
\[ b_{yx} = \frac{8}{10} = \frac{4}{5}, \qquad b_{xy} = \frac{18}{40} = \frac{9}{20}, \]
hence
\[ r^2 = b_{yx}\, b_{xy} = \frac{9}{25}, \qquad r = \pm\frac{3}{5} = \pm 0.6. \]
Since both the regression coefficients are positive we take r = 0.6.

(iii) From $b_{yx} = r\,\sigma_y / \sigma_x$ we get $\sigma_y = b_{yx}\,\sigma_x / r = (0.8 \times 3)/0.6 = 4$.
Standard deviation of y = 4.
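
The three parts of this problem can be traced numerically. This is a sketch under the same assumption as the solution, namely that 8x - 10y + 66 = 0 is the regression line of y on x and 40x - 18y = 214 is the line of x on y; the variable names are chosen for this illustration.

```python
import math

# Regression lines written as a*x + b*y = c.
a1, b1, c1 = 8, -10, -66    # 8x - 10y = -66  (assumed line of y on x)
a2, b2, c2 = 40, -18, 214   # 40x - 18y = 214 (assumed line of x on y)

# (i) The means are the intersection of the two lines (Cramer's rule).
det = a1 * b2 - a2 * b1
mean_x = (c1 * b2 - c2 * b1) / det
mean_y = (a1 * c2 - a2 * c1) / det

# (ii) Regression coefficients read off each line, then r from their product.
b_yx = -a1 / b1             # y = (8/10)x + 6.6
b_xy = -b2 / a2             # x = (18/40)y + 5.35
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# (iii) sigma_y from b_yx = r * sigma_y / sigma_x, with variance of x = 9.
sigma_x = math.sqrt(9)
sigma_y = b_yx * sigma_x / r

print(mean_x, mean_y, round(r, 4), round(sigma_y, 4))
```

One sanity check on the assumed assignment of the lines: the product of the two regression coefficients must not exceed 1, which holds here (0.8 × 0.45 = 0.36).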

Rank Correlation and an expression for the rank correlation coefficient.

The coefficient of correlation in respect of the ranks of some two characteristics of an
individual or an observation is called the Rank Correlation Coefficient, usually denoted by ρ.

The expression for ρ is given by
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad d = x - y, \]
where x and y represent rankings of two variables from 1 to n.

Note:

(1) If the rankings of x and y are entirely in the same order, for example x: 1,2,3,4,5; y: 1,2,3,4,5,
then ∑d^2 = ∑(x - y)^2 = 0. This gives ρ = +1 and is called perfect direct correlation.
(2) If the rankings of x and y are entirely in the opposite order, for example x: 1,2,3,4,5;
y: 5,4,3,2,1, then ∑d^2 = 40. This gives ρ = -1 and is called perfect inverse
correlation.


Problems:

1. Ten competitors in a beauty contest are ranked by two judges in the following order.
Compute the coefficient of correlation.

I    1  6  5  3  10  2  4  9   7  8
II   6  4  9  8  1   2  3  10  5  7

Solution: We have
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
For the given data, n = 10 and

∑d^2 = (1 - 6)^2 + (6 - 4)^2 + (5 - 9)^2 + (3 - 8)^2 + (10 - 1)^2 + (2 - 2)^2 + (4 - 3)^2 + (9 - 10)^2
+ (7 - 5)^2 + (8 - 7)^2
= 25 + 4 + 16 + 25 + 81 + 0 + 1 + 1 + 4 + 1 = 158

Hence
\[ \rho = 1 - \frac{6(158)}{10(10^2 - 1)} = 0.042 \]
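
The rank correlation formula translates directly into code. The following is a minimal sketch with a hypothetical function name `spearman_rho`; it assumes both lists already contain ranks 1 to n with no ties.

```python
def spearman_rho(rank_x, rank_y):
    """Rank correlation coefficient 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


judge_1 = [1, 6, 5, 3, 10, 2, 4, 9, 7, 8]
judge_2 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
rho = spearman_rho(judge_1, judge_2)
print(round(rho, 3))

# The two limiting cases from the Note above.
print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```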

2. Ten students got the following percentage of marks in two subjects x and y. Compute
their rank correlation coefficient.

Marks in x   78 36 98 25 75 82 90 62 65 39
Marks in y   84 51 91 60 68 62 86 58 53 47

Solution: We prepare the table consisting of the given data along with the ranks assigned according
to their order of magnitude. In the subject x, 98 will be awarded rank 1, 90 rank 2 and so on.

Marks in x   Rank(x)   Marks in y   Rank(y)   d = (x-y)   d^2 = (x-y)^2
78           4         84           3         1           1
36           9         51           9         0           0
98           1         91           1         0           0
25           10        60           6         4           16
75           5         68           4         1           1
82           3         62           5         -2          4
90           2         86           2         0           0
62           7         58           7         0           0
65           6         53           8         -2          4
39           8         47           10        -2          4
∑d^2 = 30

We have
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
and n = 10 for the given data, so
\[ \rho = 1 - \frac{6(30)}{10(10^2 - 1)} = 0.82 \]
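
When the data are marks and not ranks, the ranking step can be automated before applying the formula. The sketch below assumes all marks within a subject are distinct (tied ranks need a different treatment that these notes do not cover); `ranks_descending` and `spearman_rho` are hypothetical helper names.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(rank_x, rank_y):
    """Rank correlation coefficient for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


marks_x = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
marks_y = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]

rank_x = ranks_descending(marks_x)
rank_y = ranks_descending(marks_y)
rho = spearman_rho(rank_x, rank_y)
print(rank_x, rank_y, round(rho, 2))
```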

3. Ten competitors in music contest are ranked by 3 judges A, B, C in the following order. Use
the rank correlation coefficient to decide which pair of judges have the nearest approach to
common taste of music

A 1 6 5 10 3 2 4 9 7 8
B 3 5 8 4 7 10 2 1 6 9
C 6 4 9 8 1 2 3 10 5 7

Solution: We shall compute 𝜌𝐴𝐵, 𝜌𝐵𝐶, 𝜌𝐶𝐴 with the help of the following table where d is the

difference in ranks.


A     B     C     d^2_AB   d^2_BC   d^2_CA
1     3     6     4        9        25
6     5     4     1        1        4
5     8     9     9        1        16
10    4     8     36       16       4
3     7     1     16       36       4
2     10    2     64       64       0
4     2     3     4        1        1
9     1     10    64       81       1
7     6     5     1        1        4
8     9     7     1        4        1
∑d^2_AB = 200   ∑d^2_BC = 214   ∑d^2_CA = 60

We have
\[ \rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} \]
and n = 10 for the given data.
\[ \rho_{AB} = 1 - \frac{6(200)}{10(10^2 - 1)} = -0.21 \]
\[ \rho_{BC} = 1 - \frac{6(214)}{10(10^2 - 1)} = -0.297 \]
\[ \rho_{CA} = 1 - \frac{6(60)}{10(10^2 - 1)} = +0.636 \]

It may be observed that ρ_AB and ρ_BC are negative, which means the tastes of these pairs of judges
(A & B; B & C) are opposite. But ρ_CA is positive and is nearer to 1 (perfect correlation). Hence
judges C and A have the nearest approach to common taste in music.
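
The pairwise comparison of judges can be sketched as a loop over all pairs. This is an illustration only; the dictionary layout and the names `spearman_rho`, `rho` and `closest_pair` are assumptions made for this example, and ranks are assumed to be untied.

```python
from itertools import combinations

ranks = {
    "A": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "B": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "C": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}


def spearman_rho(rank_x, rank_y):
    """Rank correlation coefficient for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# One coefficient per unordered pair of judges.
rho = {(j, k): spearman_rho(ranks[j], ranks[k]) for j, k in combinations(ranks, 2)}

# The pair with the largest coefficient has the nearest approach to common taste.
closest_pair = max(rho, key=rho.get)
print(rho, closest_pair)
```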


Exercise:

1. The equations of regression lines of two variables x and y are x =19.3 – 0.87y and y =
11.64-0.5x, Find the correlation coefficient and the means of x and y.

2. If the tangent of the angle between the lines of regression of y on x and x on y is 0.6 and
the standard deviation of y is twice the standard deviation of x. Find the coefficient of
correlation between x and y.

3. The following information is available in respect of the prices of a certain consumer item
in two cities: A, B. Average price in city A is Rs.65; average price in city B is Rs.67;
standard deviation in city A is 2.5; standard deviation in city B is 3.5. The coefficient of
correlation between the prices in the two cities is 0.8. Find the most likely price in city B
corresponding to the price of Rs.70 in city A.

4. From the following data, calculate the coefficient of rank correlation

Rank in Economics    1 2 3 4 5 6 7 8 9 10
Rank in Statistics   4 8 2 3 5 7 6 9 10 1

5. Find the rank correlation coefficient from the following data:

X 17 13 15 16 6 11 14 9 7 12
Y 36 46 35 24 12 18 27 22 2 8

Video links:
https://youtu.be/i6ZmA9EEzrI
https://youtu.be/rWsiBmA_8Q4
https://youtu.be/Kskex59qnN4

Disclaimer: The content provided is prepared by department of Mathematics for the specified syllabus
by using reference books mentioned in the syllabus. This material is specifically for the use of RVITM
students and for education purpose only.


100% (5)
Instant Access to Statistics Explained An Introductory Guide for Life Scientists Steve Mckillup ebook Full Chapters
81 pages
IP 200 L Biostatistics
No ratings yet
IP 200 L Biostatistics
86 pages
Hypothesis Testing
No ratings yet
Hypothesis Testing
23 pages
Bias in Data Collection
100% (1)
Bias in Data Collection
14 pages
GL2201 Geostat T2 EDA Ali 12020064
No ratings yet
GL2201 Geostat T2 EDA Ali 12020064
7 pages
2.9 Kiwi Birds
No ratings yet
2.9 Kiwi Birds
3 pages
Ch. 10 Practice Test Solutions
No ratings yet
Ch. 10 Practice Test Solutions
4 pages
Confidence Intervals and Tests: List of Tables
No ratings yet
Confidence Intervals and Tests: List of Tables
4 pages
Summer 578 Assignment 2 Solutions
100% (1)
Summer 578 Assignment 2 Solutions
13 pages
Berrar_EBCB_2nd_edition_Cross-validation_preprint
No ratings yet
Berrar_EBCB_2nd_edition_Cross-validation_preprint
13 pages
Siti Noor Hazirah Sta715 Cdcs702 Cdcs
No ratings yet
Siti Noor Hazirah Sta715 Cdcs702 Cdcs
25 pages
Assignment 4
No ratings yet
Assignment 4
2 pages
Computation of Test Statistic On Population Mean
No ratings yet
Computation of Test Statistic On Population Mean
27 pages
Third Form Test 3 Statistics
No ratings yet
Third Form Test 3 Statistics
3 pages
Asghar Ghasemi, 2012
No ratings yet
Asghar Ghasemi, 2012
4 pages
Control Charts For PDF
No ratings yet
Control Charts For PDF
19 pages
STAT7055 Spring Session 2017 Topic 1 Tutorial Questions
No ratings yet
STAT7055 Spring Session 2017 Topic 1 Tutorial Questions
4 pages
03 Logistic Regression
No ratings yet
03 Logistic Regression
23 pages
Statistics
No ratings yet
Statistics
5 pages
Schafer SMMR 1999 MI Primer
No ratings yet
Schafer SMMR 1999 MI Primer
14 pages
MSC Econometrics (Ec402) : 2021-2022 Answers To Problem Set #6
No ratings yet
MSC Econometrics (Ec402) : 2021-2022 Answers To Problem Set #6
3 pages

Method of Least squares:

IV-Semester, Complex Analysis, Probability and Statistical Methods (18MAT41)

Fitting of a straight line: y = a + bx

Let y = a + bx be the equation of the straight line.
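As a minimal sketch of the least-squares procedure for the line y = a + bx, the snippet below solves the two standard normal equations, Σy = na + bΣx and Σxy = aΣx + bΣx². The function name `fit_line` and the sample data are illustrative assumptions and do not come from the notes.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x using the two normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:
    #   sy  = n*a  + b*sx
    #   sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Illustrative data lying exactly on y = 1 + 2x (hypothetical, not from the notes)
line_a, line_b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(line_a, line_b)
```

Solving the 2×2 system in closed form keeps the sketch free of external libraries; for larger problems a linear-algebra routine would be the usual choice.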

Fitting of a second-degree equation (quadratic): $y = a + bx + cx^2$

Let $y = a + bx + cx^2$ be the equation of the parabola (second-degree curve).

The error estimate is given by $E = y - a - bx - cx^2$

$y = a + bx + cx^2$ is the parabola of best fit.
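A hedged sketch of the second-degree fit: minimising the sum of squared errors leads to three normal equations, Σy = na + bΣx + cΣx², Σxy = aΣx + bΣx² + cΣx³ and Σx²y = aΣx² + bΣx³ + cΣx⁴, which are solved here by Cramer's rule. The helper names `det3` and `fit_parabola` and the sample data are assumptions made for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2 via the three normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    coeff_matrix = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]
    d = det3(coeff_matrix)
    if d == 0:
        raise ValueError("normal equations are singular; need 3 distinct x values")
    solution = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side
        m = [row[:] for row in coeff_matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return tuple(solution)


# Illustrative data lying exactly on y = 1 + 2x + 3x^2 (hypothetical)
par_a, par_b, par_c = fit_parabola([0, 1, 2, 3, 4], [1, 6, 17, 34, 57])
print(par_a, par_b, par_c)
```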

Fitting of a curve of the form: $y = ax^b$
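One common way to fit a curve of this form is to take logarithms, so that ln y = ln a + b ln x becomes a straight line in (ln x, ln y), and then apply the straight-line normal equations. The sketch below assumes natural logarithms and strictly positive data; the name `fit_power_curve` and the sample values are illustrative only.

```python
import math


def fit_power_curve(xs, ys):
    """Fit y = a * x**b by a least-squares line through (ln x, ln y)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("logarithms need strictly positive x and y")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    sx = sum(log_x)
    sy = sum(log_y)
    sxx = sum(u * u for u in log_x)
    sxy = sum(u * v for u, v in zip(log_x, log_y))
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the curve is not determined")
    b = (n * sxy - sx * sy) / det      # slope of the log-log line
    log_a = (sy - b * sx) / n          # intercept of the log-log line
    return math.exp(log_a), b


# Illustrative data lying exactly on y = 3 * x**2 (hypothetical)
pow_a, pow_b = fit_power_curve([1, 2, 4, 8], [3, 12, 48, 192])
print(pow_a, pow_b)
```

Note that this minimises squared error in ln y rather than in y itself, so the result can differ slightly from a direct nonlinear fit on noisy data.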

1. Fit a straight line for the following data

Thus, normal equations are:

$\sum x = 10$, $\sum y = 9$, $\sum xy$, $\sum x^2 = 30$, $\sum x^3 = 100$, $\sum x^4$, $\sum x^2 y$

We get a = 2, b = −0.5, c = 0.2.

5. Fit a parabola of the form $y = a + bx + cx^2$ to the following data

Solution: For this the normal equations are:

Thus, normal equations take the form:

If R is related to V by the relation $R = a + bV + cV^2$, find a, b and c.

Correlation and Regression:

Fig. 4.1 Scatter diagram

Correlation Coefficient (Karl Pearson correlation coefficient)

of y, then the coefficient of correlation between x and y

Limits for correlation coefficient

Fig. 4.2 Correlation illustrated.

1. If r = -1, there is a perfect negative correlation.
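To illustrate the limiting case r = -1, here is a small sketch that computes the Karl Pearson coefficient from raw sums using the standard form r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. The notes may state the formula in terms of deviations from the means instead; the function name `pearson_r` and the data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if den == 0:
        raise ValueError("r is undefined when x or y has zero variance")
    return num / den


# Hypothetical data on a falling straight line, y = 10 - 2x
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_neg)  # perfect negative correlation
```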

$\sum x_i = 125$, $\sum x_i^2 = 650$, $\sum y_i = 102$, $\sum y_i^2 = 488$, $\sum x_i y_i = 532$, $n = 25$

correlation between these ages.

$\sum X_i^2 = 202.9$, $\sum Y_i^2 = 158.10$, $\sum X_i Y_i = 178.3$

here the correlation is almost perfect.

3. Psychological tests of intelligence and of engineering ability were applied to 10 students.

x: 105 104 102 101 100 99 98 96 93 92

Solution: Since x and y lie on two regression lines

x = 19.13 − 0.87y, y = 11.64 − 0.5x, solving x = 15.79,

Solution: First we consider a table containing all the required results;

$\therefore\ r = \pm\sqrt{b_{xy} \cdot b_{yx}} = +\sqrt{1.76 \times 0.44} = +0.88$
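The step above uses the fact that the correlation coefficient is the geometric mean of the two regression coefficients and carries their common sign. A minimal sketch of that calculation, using the values 1.76 and 0.44 from the line above (the function name `r_from_regression_coeffs` is an assumption):

```python
import math


def r_from_regression_coeffs(b_xy, b_yx):
    """Correlation coefficient as the signed geometric mean of b_xy and b_yx."""
    product = b_xy * b_yx
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(product)
    # r takes the common sign of the two regression coefficients
    return -r if b_xy < 0 else r


r_value = r_from_regression_coeffs(1.76, 0.44)
print(round(r_value, 2))
```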

$10 \times 2570 - 600 \times 40.5$

= 4.20 + 0.45 = 4.65.

Solving this x =13, y =17
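Because both regression lines pass through the point of means, the means can be found by solving the two line equations simultaneously. As a sketch, the snippet below applies Cramer's rule to the pair 8x − 10y + 66 = 0 and 40x − 18y = 214 quoted in this section; the function name `solve_two_lines` is an assumption.

```python
def solve_two_lines(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y


# 8x - 10y + 66 = 0 rewritten as 8x - 10y = -66
x_mean, y_mean = solve_two_lines(8, -10, -66, 40, -18, 214)
print(x_mean, y_mean)
```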

Let 8x-10y+66=0 and 40x-18y=214 be the lines of regression of y on x and

Rank Correlation and an expression for the rank correlation coefficient.

The coefficient of correlation in respect of the ranks of some two characteristics of an

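The expression for ρ is given by $\rho = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, where $d$ is the difference between the two ranks of an individual and $n$ is the number of individuals.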
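A short sketch of the rank correlation calculation, using the standard Spearman form ρ = 1 − 6Σd² / (n(n² − 1)) with d the difference between paired ranks. It assumes the ranks are already assigned and contain no ties; the function name `rank_correlation` and the sample ranks are illustrative.

```python
def rank_correlation(rank_x, rank_y):
    """Spearman rank correlation for two lists of untied ranks."""
    if len(rank_x) != len(rank_y):
        raise ValueError("both rank lists must have the same length")
    n = len(rank_x)
    if n < 2:
        raise ValueError("need at least two ranked individuals")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Hypothetical ranks: identical orderings and fully reversed orderings
rho_same = rank_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
rho_reversed = rank_correlation([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(rho_same, rho_reversed)
```

With tied ranks a correction term is normally added to Σd², which this sketch does not attempt.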

For the given data, n = 10 and

$+ (7 - 5)^2 + (8 - 7)^2$

Marks in x | Rank(x) | Marks in y | Rank(y) | $d = (x - y)$ | $d^2 = (x - y)^2$

4. From the following data, calculate the coefficient of rank correlation

5. Find the rank correlation coefficient from the following data:
