
Module-IV Curve Fitting & Statistical Methods: RV Institute of Technology & Management

Uploaded by

Laugh Long
Copyright
© © All Rights Reserved

RV Institute of Technology & Management ®

Module-IV

CURVE FITTING & STATISTICAL METHODS

Topic Learning Objectives:

Upon Completion of this unit, students will be able to:

• Expand their knowledge and skills in statistical concepts, and gain personal development experience in statistical data analysis.
• Understand the Least Squares Method.
• Fit data using several types of curves.
• Evaluate correlation and regression coefficients.
• Investigate the strength and direction of a relationship between two variables by collecting measurements and using appropriate statistical analysis.

Introduction:

In many fields of applied mathematics and engineering, we encounter problems and experiments that involve two variables.
In this chapter, we consider the mathematical theory of statistics by presenting an elementary treatment of curve fitting, correlation and regression.
Suppose we are given n values $x_1, x_2, x_3, \dots, x_n$ of an independent variable x and the corresponding values $y_1, y_2, y_3, \dots, y_n$ of a variable y depending on x. Then the pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ give us n points in the xy-plane. Generally, it is not possible to find the actual curve y = f(x) that passes through all these points. Hence, we try to find a curve that serves as the best approximation to the curve y = f(x). Such a curve is referred to as the curve of best fit. The process of determining a curve of best fit is called curve fitting. One method of finding the curve of best fit is the method of least squares.

Method of Least squares:


The method of least squares requires the curve to pass as close as possible to all the data points.

IV-Semester, Complex Analysis, Probability and Statistical Methods (18MAT41)



Let y = f(x) be an approximate relation that fits the data $(x_i, y_i)$. Then the $y_i$ are called the observed values and $Y_i = f(x_i)$ are called the expected values. The differences $E_i = y_i - Y_i$ are called the estimated errors or residuals.
The method of least squares provides a relationship y = f(x) such that the sum of the squares of the residuals is least. Such a curve is known as the least squares curve.
We will discuss the fitting of the following types of curves.

Fitting of a straight line: y = a + bx

Let y = a + bx be the equation of the straight line.

The error at a data point (x, y) is y - (a + bx) = y - a - bx.
By the principle of least squares, we have to determine the constants a and b such that
$$E = \sum_{1}^{n} (y - a - bx)^2$$
is minimum.
For E to be minimum, the two necessary conditions are
$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0.$$
The first condition gives
$$\frac{\partial E}{\partial a} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx)(-1) = 0 \;\Rightarrow\; \sum y - \sum a - b\sum x = 0 \;\Rightarrow\; \sum y = na + b\sum x.$$
The second condition gives
$$\frac{\partial E}{\partial b} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx)(-x) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2.$$
The normal equations for estimating the values of a and b are
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$


Solving the above normal equations, we estimate the values of a & b. With these values of a and b
y = a + bx is the line of best fit.
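
The two normal equations for the straight line can be solved in closed form. The following Python sketch is illustrative only: the function name `fit_line` is an assumption introduced here and is not part of the original notes. It is applied to the data of Problem 1 from the Problems section.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x.

    Solves the normal equations
        sum(y)  = n*a      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only when all x values coincide
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Data of Problem 1 (x = 1..6)
a, b = fit_line([1, 2, 3, 4, 5, 6], [6, 4, 3, 5, 4, 2])
print(round(a, 3), round(b, 3))
```

For this data the sketch should reproduce the hand computation, a close to 5.8 and b close to -0.514.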

Fitting of a second-degree equation (quadratic): $y = a + bx + cx^2$

Let $y = a + bx + cx^2$ be the equation of the parabola.

The error at a data point (x, y) is $y - a - bx - cx^2$.

By the principle of least squares, we have to determine the constants a, b and c such that
$$E = \sum_{1}^{n} (y - a - bx - cx^2)^2$$
is minimum.
For E to be minimum,
$$\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0, \qquad \frac{\partial E}{\partial c} = 0.$$
$$\frac{\partial E}{\partial a} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx - cx^2)(-1) = 0 \;\Rightarrow\; \sum y = na + b\sum x + c\sum x^2$$
$$\frac{\partial E}{\partial b} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx - cx^2)(-x) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\frac{\partial E}{\partial c} = 0 \;\Rightarrow\; \sum_{1}^{n} 2(y - a - bx - cx^2)(-x^2) = 0 \;\Rightarrow\; \sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$
The normal equations for estimating the values of a, b and c are
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$


Solving the above equations, we estimate the values of a, b and c. With these values of a, b and c,

$y = a + bx + cx^2$ is the curve (parabola) of best fit.
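
The three normal equations form a 3 x 3 linear system. The following Python sketch builds that system from the data and solves it by Gaussian elimination. The helper names `solve_linear` and `fit_parabola` are assumptions made for illustration; any linear solver could be substituted. The data used are those of Problem 4 in the Problems section.

```python
def solve_linear(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination
    with partial pivoting. A is a list of rows."""
    n = len(rhs)
    # Augmented matrix [A | rhs] as floats
    M = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def fit_parabola(xs, ys):
    """Return [a, b, c] for the least-squares parabola y = a + b*x + c*x^2."""
    n = len(xs)

    def power_sum(p):
        return sum(x ** p for x in xs)

    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    A = [
        [n, power_sum(1), power_sum(2)],
        [power_sum(1), power_sum(2), power_sum(3)],
        [power_sum(2), power_sum(3), power_sum(4)],
    ]
    return solve_linear(A, [sy, sxy, sx2y])


# Data of Problem 4
a, b, c = fit_parabola([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(round(a, 3), round(b, 3), round(c, 3))
```

At least three distinct x values are needed; otherwise the system is singular and the sketch raises an error.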

Fitting of a curve of the form: $y = ax^b$


Let $y = ax^b$.
Taking logarithms on both sides,
$$\log y = \log a + b \log x$$
i.e. Y = A + bX, where Y = log y, A = log a, X = log x.
The normal equations are
$$\sum Y = nA + b\sum X$$
$$\sum XY = A\sum X + b\sum X^2$$
Solving the above equations, we estimate the values of A and b, and then obtain a as the antilogarithm of A. With these values of a and b, $y = ax^b$ is the curve of best fit.
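
The log transformation reduces the power curve to a straight-line fit in (X, Y) = (log x, log y). The sketch below is a minimal illustration under two assumptions that are not stated in this subsection: natural logarithms are used, and all x and y values are positive. The function name `fit_power` is hypothetical. The data are those of Problem 3 in the Problems section.

```python
import math


def fit_power(xs, ys):
    """Return (a, b) for y = a * x**b, fitted by least squares
    on the transformed variables X = ln x, Y = ln y."""
    X = [math.log(x) for x in xs]  # requires x > 0
    Y = [math.log(y) for y in ys]  # requires y > 0
    n = len(X)
    sX, sY = sum(X), sum(Y)
    sXX = sum(u * u for u in X)
    sXY = sum(u * v for u, v in zip(X, Y))
    b = (n * sXY - sX * sY) / (n * sXX - sX * sX)
    A = (sY - b * sX) / n  # A = ln a
    return math.exp(A), b


# Data of Problem 3
a, b = fit_power([20, 16, 10, 11, 14], [22, 41, 120, 89, 56])
print(a, b)
```

Because the fit minimises squared errors in log y rather than in y itself, the result is the least-squares line in the transformed variables, which is the approach taken in these notes.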

Problems:

1. Fit a straight line for the following data

x    1    2    3    4    5    6
y    6    4    3    5    4    2

Solution: The normal equations for estimating the values of a and b in y = a + bx are
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$


x        y        x²       xy
1        6        1        6
2        4        4        8
3        3        9        9
4        5        16       20
5        4        25       20
6        2        36       12
∑x = 21  ∑y = 24  ∑x² = 91 ∑xy = 75

Given n = 6, $\sum x = 21$, $\sum y = 24$, $\sum xy = 75$, $\sum x^2 = 91$.

Therefore, we get
24 = 6a + 21b and 75 = 21a + 91b
Solving, we get a = 5.8, b = -0.514
Therefore, the equation of best fit is y = 5.8 - 0.514x

2. Fit a straight line of the form y= ax +b for the following data by the method of
least squares.
x    5    10   15   20   25
y    16   19   23   26   30
Solution: Let y = ax + b be the given straight line.
The normal equations are
$$\sum y = a\sum x + nb$$
$$\sum xy = a\sum x^2 + b\sum x$$


x        y         x²         xy
5        16        25         80
10       19        100        190
15       23        225        345
20       26        400        520
25       30        625        750
∑x = 75  ∑y = 114  ∑x² = 1375 ∑xy = 1885

Therefore, n = 5, $\sum y = 114$, $\sum x = 75$, $\sum xy = 1885$, $\sum x^2 = 1375$.
Substituting in the above equations gives 114 = 75a + 5b and 1885 = 1375a + 75b, so a = 0.7, b = 12.3.
The best fit is y = 0.7x + 12.3

3. Fit a power function (geometric curve) of the form $y = ax^b$ to the data given below.

x:   20   16   10   11   14
y:   22   41   120  89   56
Solution: Given $y = ax^b$.
Taking logarithms on both sides, we get
log y = log a + b log x, i.e. Y = A + bX,
where Y = log y, A = log a and X = log x. Natural logarithms are used here, since a is recovered below as $e^A$.


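Thus, the normal equations are:
$$\sum Y = nA + b\sum X, \qquad \sum XY = A\sum X + b\sum X^2$$
With n = 5, $\sum X = 13.1079$, $\sum Y = 20.1061$, $\sum X^2 = 34.6782$ and $\sum XY = 51.9663$, these become
5A + 13.1079b = 20.1061, 13.1079A + 34.6782b = 51.9663
On solving these two equations, we get A = 10.2127 and b = -2.3617.
Therefore, $a = e^{10.2127} \approx 27247$.
Thus, the least squares geometric curve is $y = 27247\,x^{-2.3617}$.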

4. Fit a curve of the form $y = a + bx + cx^2$ to the data by the method of least squares.
x    1    2    3    4
y    1.7  1.8  2.3  3.2
Solution:
x        y       xy        x²        x³         x⁴         x²y
1        1.7     1.7       1         1          1          1.7
2        1.8     3.6       4         8          16         7.2
3        2.3     6.9       9         27         81         20.7
4        3.2     12.8      16        64         256        51.2
∑x = 10  ∑y = 9  ∑xy = 25  ∑x² = 30  ∑x³ = 100  ∑x⁴ = 354  ∑x²y = 80.8

Substitute these values, with n = 4, in the normal equations
$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$
that is,
4a + 10b + 30c = 9
10a + 30b + 100c = 25
30a + 100b + 354c = 80.8

We get a = 2, b = -0.5, c = 0.2.

Then the curve of best fit is $y = 2 - 0.5x + 0.2x^2$

5. Fit a parabola of the form $y = a + bx + cx^2$ to the following data

x :  1    2    3    4
y :  4    6    3    2

Solution: For this the normal equations are:
$$na + b\sum x + c\sum x^2 = \sum y$$
$$a\sum x + b\sum x^2 + c\sum x^3 = \sum xy$$
$$a\sum x^2 + b\sum x^3 + c\sum x^4 = \sum x^2 y$$
The relevant table is as follows:

x        y        x²        x³         x⁴         xy        x²y
1        4        1         1          1          4         4
2        6        4         8          16         12        24
3        3        9         27         81         9         27
4        2        16        64         256        8         32
∑x = 10  ∑y = 15  ∑x² = 30  ∑x³ = 100  ∑x⁴ = 354  ∑xy = 33  ∑x²y = 87

Thus, with n = 4, the normal equations take the form:

4a + 10b + 30c = 15
10a + 30b + 100c = 33
30a + 100b + 354c = 87
Solving these equations, we obtain
a = 2.25, b = 2.85 and c = -0.75. Hence,
$y = 2.25 + 2.85x - 0.75x^2$ is the required parabola of best fit.
Exercise:
1. An experiment gave the following values:
v (ft/min)   350   400   500   600
t (min)      61    26    7     26
It is known that v and t are connected by the relation $v = at^b$. Find the best values of a and b.


2. The following table gives the production (in thousands of units) of a certain commodity in different years:
Year x         1958   1968   1978   1988   1998
Production y   8      10     12     10     16
Fit a straight line to the data and estimate the production in the year 2005.
3. A simply supported beam carries a concentrated load P at its mid-point. Corresponding to various values of P, the maximum deflection D is measured and the values are as given below:
P   100    120    140    160    180    200
D   0.45   0.55   0.6    0.7    0.8    0.85
Find a linear law of the form D = a + bP.
4. In a determination of the volume V of carbon dioxide dissolved in a given volume of water at different temperatures T, the following pairs of values were obtained.

T   0      5      10     15
V   1.80   1.45   1.18   1

Obtain by the method of least squares a relation of the form V = a + bT which best fits these observations.
5. The following table gives the results of measurements of train resistance; V is the velocity in miles per hour and R is the resistance in pounds per ton.

V   20    40    60     80     100    120
R   5.5   9.1   14.9   22.8   33.3   46

If R is related to V by the relation $R = a + bV + cV^2$, find a, b and c.

Correlation and Regression:


The word correlation is used in everyday life to denote some form of association. In statistical terms, we use correlation to denote association between two quantitative variables. We also assume that the association is linear, that is, one variable increases or decreases by a fixed amount for a unit increase or decrease in the other. The other technique that is often used in these circumstances is regression, which involves estimating the best straight line to summarize the association.
Correlation
Correlation means simply a relation between two or more variables.
Two variables are said to be correlated if a change in one variable results in a corresponding change in the other.
E.g.: 1. x: supply, y: price
2. x: demand, y: price
Types of correlation
Positive correlation:
If an increase (or decrease) in one variable corresponds to an increase (or decrease) in the other, then the correlation is said to be positive or direct.
E.g.: 1. Demand and price of a commodity.
2. Income and expenditure.
Negative correlation:
If an increase (or decrease) in one variable corresponds to a decrease (or increase) in the other, then the correlation is said to be negative or inverse.
E.g.: 1. Supply and price of a commodity.
2. Volume and pressure of a perfect gas.
No correlation:
If there exists no relationship between two variables, then they are said to be non-correlated.
Scatter diagram:
To obtain a measure of the relationship between two variables x and y, we plot their corresponding values in the xy-plane. The resulting diagram (Fig. 4.1), showing the collection of dots, is called the dot diagram or scatter diagram.


Fig. 4.1 Scatter diagram

Correlation Coefficient (Karl Pearson correlation coefficient)


The degree of association is measured by a correlation coefficient, denoted by r. It is sometimes called Karl Pearson's correlation coefficient and is a measure of linear association. If a curved line is needed to express the relationship, other and more complicated measures of the correlation must be used.
Let $x_1, x_2, x_3, \dots, x_n$ be n values of x and $y_1, y_2, y_3, \dots, y_n$ be the corresponding n values of y. Then the coefficient of correlation between x and y is
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y},$$
where $\sigma_x^2$ is the variance of the x series, $\sigma_y^2$ is the variance of the y series,
$$\bar{x} = \frac{\sum x}{n} \text{ is the mean of the x series and } \bar{y} = \frac{\sum y}{n} \text{ is the mean of the y series.}$$
For computation purposes we can use the formula
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
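
The computational formula translates directly into code. The following Python sketch is illustrative only; the function name `pearson_r` is an assumption introduced here. It is applied to the small data set x = 1, 2, 3, 4 and y = 1, 4, 9, 16 that appears in the Problems section.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient via the computational formula
    r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denominator = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if denominator == 0:
        raise ValueError("one of the series is constant; r is undefined")
    return (n * sxy - sx * sy) / denominator


r = pearson_r([1, 2, 3, 4], [1, 4, 9, 16])
print(round(r, 4))
```

The relationship y = x² is not linear, yet r is close to 1 on this short range, which illustrates that r measures only the linear part of an association.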


Limits for correlation coefficient


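The coefficient of correlation numerically does not exceed unity ($-1 \le r \le 1$).
Proof: We have
$$r = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}}, \qquad i = 1, 2, \dots, n.$$
Writing $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$,
$$r = \frac{\frac{1}{n}\sum a_i b_i}{\sqrt{\frac{1}{n}\sum a_i^2}\,\sqrt{\frac{1}{n}\sum b_i^2}}$$
$$r^2 = \frac{\left(\sum a_i b_i\right)^2}{\sum a_i^2 \sum b_i^2} \qquad (1)$$
The Cauchy-Schwarz inequality states that if $a_i, b_i$ (i = 1, 2, ..., n) are real quantities, then
$$\left(\sum a_i b_i\right)^2 \le \sum a_i^2 \sum b_i^2,$$
the sign of equality holding if and only if
$$\frac{a_1}{b_1} = \frac{a_2}{b_2} = \frac{a_3}{b_3} = \dots = \frac{a_n}{b_n}.$$
Using this, equation (1) becomes
$$r^2 \le 1 \;\Rightarrow\; |r| \le 1 \;\Rightarrow\; -1 \le r \le 1.$$
Hence the correlation coefficient cannot exceed unity numerically.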


Note:

Fig. 4.2 Correlation illustrated.

1. If r = -1, there is a perfect negative correlation.
2. If r = 1, there is a perfect positive correlation.
3. If r = 0, then the variables are non-correlated.
4. Let θ be the angle between the two lines of regression. When r = 0, $\theta = \frac{\pi}{2}$, i.e. when the variables are uncorrelated, the two lines of regression are perpendicular to each other.
5. When r = ±1, θ = 0 or π, i.e. the lines of regression coincide.

Problems:
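1. While calculating the correlation coefficient between x and y from 25 pairs of observations, a person obtained the following values:
$$\sum x_i = 125, \quad \sum x_i^2 = 650, \quad \sum y_i = 100, \quad \sum y_i^2 = 460, \quad \sum x_i y_i = 508.$$
It was later discovered that he had copied down the pairs (8,12) and (6,8) as (6,12) and (8,6) respectively. Obtain the correct value of the correlation coefficient.

Solution: Each total is corrected by subtracting the contribution of the wrong pairs and adding that of the correct pairs:
$$\sum x_i = 125 - (6 + 8) + (8 + 6) = 125$$
$$\sum x_i^2 = 650 - (36 + 64) + (64 + 36) = 650$$
$$\sum y_i = 100 - (12 + 6) + (12 + 8) = 102$$
$$\sum y_i^2 = 460 - (144 + 36) + (144 + 64) = 488$$
$$\sum x_i y_i = 508 - (72 + 48) + (96 + 48) = 532$$
With n = 25,
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{13300 - 12750}{\sqrt{625 \times 1796}} = 0.51912$$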
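
The correction of the totals can be checked mechanically. The sketch below is a minimal illustration; the variable names are assumptions chosen here, and the numbers are those stated in the problem.

```python
import math

n = 25
# Totals as originally (wrongly) recorded
sx, sxx, sy, syy, sxy = 125, 650, 100, 460, 508

wrong_pairs = [(6, 12), (8, 6)]   # pairs as copied down
right_pairs = [(8, 12), (6, 8)]   # pairs as they should have been

# Remove the contribution of each wrong pair
for x, y in wrong_pairs:
    sx, sxx, sy, syy, sxy = sx - x, sxx - x * x, sy - y, syy - y * y, sxy - x * y
# Add the contribution of each correct pair
for x, y in right_pairs:
    sx, sxx, sy, syy, sxy = sx + x, sxx + x * x, sy + y, syy + y * y, sxy + x * y

r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(sx, sxx, sy, syy, sxy, round(r, 5))
```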

2. The following table gives the ages (in years) of 10 married couples. Calculate the coefficient of correlation between these ages.

Age of husband (x)   23   27   28   29   30   31   33   35   36   39
Age of wife (y)      18   22   23   24   25   26   28   29   30   32

Solution: Here n = 10.
We find
$$\bar{x} = \frac{1}{n}\sum x_i = \frac{311}{10} = 31.1, \qquad \bar{y} = \frac{1}{n}\sum y_i = \frac{257}{10} = 25.7$$
With $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$:

x_i    X_i     X_i²     Y_i     Y_i²     X_i Y_i
23     -8.1    65.61    -7.7    59.29    62.37
27     -4.1    16.81    -3.7    13.69    15.17
28     -3.1    9.61     -2.7    7.29     8.37
29     -2.1    4.41     -1.7    2.89     3.57
30     -1.1    1.21     -0.7    0.49     0.77
31     -0.1    0.01     0.3     0.09     -0.03
33     1.9     3.61     2.3     5.29     4.37
35     3.9     15.21    3.3     10.89    12.87
36     4.9     24.01    4.3     18.49    21.07
39     7.9     62.41    6.3     39.69    49.77

$$\sum X_i^2 = 202.9, \qquad \sum Y_i^2 = 158.10, \qquad \sum X_i Y_i = 178.3$$
$$r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = 0.9955$$

Here the correlation is almost perfect,



i.e, the ages of husbands and wives are almost perfectly correlated.

3. Psychological tests of intelligence and of engineering ability were applied to 10 students. Calculate the coefficient of correlation between intelligence ratio (I.R.) and engineering ratio (E.R.).

x:   105   104   102   101   100   99   98    96   93   92
y:   101   103   100   98    95    96   104   92   97   94

Solution: First we compute the means of the x and y series and the totals required to compute r.
$$\bar{x} = \frac{990}{10} = 99 \quad \text{and} \quad \bar{y} = \frac{980}{10} = 98$$
With X = x - 99 and Y = y - 98, we get $\sum XY = 92$, $\sum X^2 = 170$ and $\sum Y^2 = 140$. Hence
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{92}{\sqrt{170 \times 140}} = 0.5963.$$


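5. Compute Pearson's coefficient of correlation between x and y from the following data:
x:   1   2   3   4
y:   1   4   9   16
Solution: Here n = 4, $\sum x = 10$, $\sum y = 30$, $\sum xy = 100$, $\sum x^2 = 30$ and $\sum y^2 = 354$.
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$= \frac{4 \times 100 - 10 \times 30}{\sqrt{\left(4 \times 30 - 10^2\right)\left(4 \times 354 - 30^2\right)}}$$
$$= \frac{400 - 300}{\sqrt{20 \times 516}} = 0.9843$$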

Regression:
Correlation describes the strength of an association between two variables and is completely symmetrical: the correlation between A and B is the same as the correlation between B and A. However, if the two variables are related, it means that when one changes by a certain amount, the other changes on average by a certain amount. The relationship can be represented by a simple equation called the regression equation. In this context "regression" (the term is a historical anomaly) simply means that the average value of y is a "function" of x, that is, it changes with x.


Regression analysis is a mathematical measure of the average relationship between two or more
variables in terms of the original units of data.

Line of regression:
Line of regression is the line which gives the best estimate to the value of one variable for any specific
value of the other variable. So, the line of regression is the line of best fit.
Regression line of y on x:
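Let the regression line of y on x be y = a + bx.
The normal equations by the method of least squares are
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
Dividing the first equation by n,
$$\frac{1}{n}\sum y = a + \frac{b}{n}\sum x, \quad \text{i.e.} \quad \bar{y} = a + b\bar{x},$$
so the regression line y = a + bx passes through $(\bar{x}, \bar{y})$.
Writing $X = x - \bar{x}$ and $Y = y - \bar{y}$, the slope is
$$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{\sum XY}{n\sigma_x^2} = r\frac{\sigma_y}{\sigma_x}.$$
Hence
$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
is the regression line of y on x.

Similarly,
$$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
is the regression line of x on y.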

Note:
1. Regression coefficient of y on x
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = r\frac{\sigma_y}{\sigma_x}$$
2. Regression coefficient of x on y
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} = r\frac{\sigma_x}{\sigma_y}$$
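
Both regression coefficients, and from them r, can be computed from the deviations about the means. The Python sketch below is illustrative; the function name `regression_coefficients` is an assumption introduced here. It uses the relation $r^2 = b_{yx} b_{xy}$, with r taking the common sign of the two coefficients. The data are those of Problem 2 in the Problems section that follows.

```python
import math


def regression_coefficients(xs, ys):
    """Return (xbar, ybar, b_yx, b_xy, r) for paired data."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    b_yx = sxy / sxx  # slope of the regression line of y on x
    b_xy = sxy / syy  # slope of the regression line of x on y
    # r^2 = b_yx * b_xy; r carries the sign of the coefficients
    r = math.copysign(math.sqrt(b_yx * b_xy), sxy)
    return xbar, ybar, b_yx, b_xy, r


xs = [1, 3, 4, 2, 5, 8, 9, 10, 13, 15]
ys = [8, 6, 10, 8, 12, 16, 16, 10, 32, 32]
xbar, ybar, b_yx, b_xy, r = regression_coefficients(xs, ys)
print(xbar, ybar, round(b_yx, 2), round(b_xy, 2), round(r, 2))
```

The regression line of y on x is then y = ybar + b_yx (x - xbar), and that of x on y is x = xbar + b_xy (y - ybar).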

Problems:
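1. If two regression equations of the variables x and y are x = 19.13 - 0.87y and y = 11.64 - 0.5x, find
(a) mean of x
(b) mean of y
(c) the correlation coefficient between x and y.

Solution: Since the point $(\bar{x}, \bar{y})$ lies on both regression lines, it satisfies
$$\bar{x} = 19.13 - 0.87\bar{y}, \qquad \bar{y} = 11.64 - 0.5\bar{x}.$$
Substituting the second equation into the first gives $0.565\,\bar{x} = 9.0032$, so
$$\bar{x} \approx 15.93, \qquad \bar{y} \approx 3.67.$$
The regression coefficients are $b_{yx} = -0.5$ and $b_{xy} = -0.87$. Since both are negative, r is negative:
$$r = -\sqrt{0.5 \times 0.87} = -0.66$$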

2. For the data given below, obtain the two regression lines and hence obtain correlation
coefficient.
x   1   3   4    2   5    8    9    10   13   15
y   8   6   10   8   12   16   16   10   32   32

Solution: First we compute the means and the sums of squares and products of the deviations.
$$\bar{x} = \frac{\sum x}{n} = \frac{70}{10} = 7 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{150}{10} = 15$$
Let $X = x - \bar{x} = x - 7$ and $Y = y - \bar{y} = y - 15$. Then $\sum X^2 = 204$, $\sum Y^2 = 818$ and $\sum XY = 360$.
The line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}), \quad \text{where} \quad b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{360}{204} = 1.76$$
$$\Rightarrow y - 15 = 1.76(x - 7)$$
$$\Rightarrow y = 1.76x - 1.76 \times 7 + 15$$
$$\Rightarrow y = 1.76x + 2.68$$
The line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y}), \quad \text{where} \quad b_{xy} = \frac{\sum XY}{\sum Y^2} = \frac{360}{818} = 0.44$$
$$\Rightarrow x - 7 = 0.44(y - 15)$$
$$\Rightarrow x = 0.44y - 0.44 \times 15 + 7$$
$$\Rightarrow x = 0.44y + 0.4$$
$$r = \pm\sqrt{b_{xy}\, b_{yx}} = +\sqrt{1.76 \times 0.44} = +0.88$$

The sign of r is positive since both the regression coefficients are positive.

3. In the following table are recorded data showing the test scores made by salesmen on an
intelligence test and their weekly sales:
Salesman        1     2     3     4     5     6     7     8     9     10    Total
Test score      40    70    50    60    80    50    90    40    60    60    600
Sales ('000)    2.5   6.0   4.5   5.0   4.5   2.0   5.5   3.0   4.5   3.0   40.5

Calculate the regression line of sales (y) on test scores (x) and estimate the most probable weekly sales volume if a salesman makes a score of 70.
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{600}{10} = 60 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{40.5}{10} = 4.05$$
The regression equation of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x}),$$
where
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}.$$
With $\sum xy = 2570$ and $\sum x^2 = 38400$,
$$b_{yx} = \frac{10 \times 2570 - 600 \times 40.5}{10 \times 38400 - 600^2} = \frac{25700 - 24300}{384000 - 360000} = 0.058 \approx 0.06$$
Hence the regression equation is
$$y - 4.05 = 0.06(x - 60)$$
$$\Rightarrow y = 0.06x - 3.6 + 4.05 \Rightarrow y = 0.06x + 0.45$$
To find the probable weekly sales (y) when the score (x) is 70:
$$y = 0.06 \times 70 + 0.45 = 4.20 + 0.45 = 4.65.$$


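4. If θ is the angle between the two regression lines, show that
$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}.$$
Explain the significance when r = 0 and r = ±1.
Solution: We know that if θ is the acute angle between the lines $y = m_1 x + c_1$ and $y = m_2 x + c_2$, then
$$\tan\theta = \left|\frac{m_2 - m_1}{1 + m_1 m_2}\right|.$$
We know that $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\frac{\sigma_x}{\sigma_y}$.
Therefore, the slopes of the regression lines are given by
$$m_1 = b_{yx} = r\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad m_2 = \frac{1}{b_{xy}} = \frac{\sigma_y}{r\sigma_x}.$$
$$\therefore \tan\theta = \frac{\dfrac{\sigma_y}{r\sigma_x} - \dfrac{r\sigma_y}{\sigma_x}}{1 + \dfrac{r\sigma_y}{\sigma_x} \times \dfrac{\sigma_y}{r\sigma_x}} = \frac{\dfrac{\sigma_y - r^2\sigma_y}{r\sigma_x}}{\dfrac{\sigma_x^2 + \sigma_y^2}{\sigma_x^2}} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}.$$
When r = 0, $\tan\theta \to \infty$ or $\theta = \frac{\pi}{2}$, i.e. when the variables are uncorrelated, the two lines of regression are perpendicular to each other. When r = ±1, tan θ = 0, i.e. θ = 0 or π.
Thus, the lines of regression coincide, i.e. there is perfect correlation between the two variables.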
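
The angle formula can be evaluated numerically. The sketch below is illustrative only; the function name `regression_angle` is hypothetical, and using |r| so that the acute angle is returned for negative r is an assumption made here rather than something stated in the notes.

```python
import math


def regression_angle(r, sigma_x, sigma_y):
    """Acute angle (in radians) between the two regression lines,
    from tan(theta) = ((1 - r^2) / |r|) * sx*sy / (sx^2 + sy^2)."""
    if r == 0:
        return math.pi / 2  # uncorrelated: the lines are perpendicular
    tangent = (1 - r * r) / abs(r) * (sigma_x * sigma_y) / (sigma_x ** 2 + sigma_y ** 2)
    return math.atan(tangent)


# r = 0.5 with sigma_y = 2 * sigma_x
theta = regression_angle(0.5, 1.0, 2.0)
print(math.tan(theta))
```

With r = 0.5 and σy = 2σx the tangent works out to (0.75 / 0.5) × (2 / 5) = 0.6, and with r = ±1 the angle is zero, so the two lines coincide.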
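5. If the coefficient of correlation between two variables x and y is 0.5 and the acute angle between their lines of regression is $\tan^{-1}\left(\frac{3}{5}\right)$, show that $\sigma_y = 2\sigma_x$ or $\sigma_x = 2\sigma_y$.
Solution: By data r = 0.5 and $\theta = \tan^{-1}\left(\frac{3}{5}\right)$, or $\tan\theta = \frac{3}{5}$.
The angle between the lines of regression is given by
$$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\frac{3}{5} = \frac{1 - \left(\frac{1}{2}\right)^2}{\frac{1}{2}} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\text{i.e.} \quad \frac{3}{5} = \frac{3}{4} \times \frac{2}{1} \times \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$\text{i.e.} \quad \frac{1}{5} = \frac{1}{2} \times \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$$
$$2\sigma_x^2 + 2\sigma_y^2 = 5\sigma_x \sigma_y$$
$$2\sigma_x^2 - 5\sigma_x \sigma_y + 2\sigma_y^2 = 0$$
$$2\sigma_x^2 - 4\sigma_x \sigma_y - \sigma_x \sigma_y + 2\sigma_y^2 = 0$$
$$2\sigma_x(\sigma_x - 2\sigma_y) - \sigma_y(\sigma_x - 2\sigma_y) = 0$$
$$(\sigma_x - 2\sigma_y)(2\sigma_x - \sigma_y) = 0$$
$$\Rightarrow \sigma_x = 2\sigma_y \quad \text{or} \quad 2\sigma_x = \sigma_y.$$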
6. In a partially destroyed laboratory record of an analysis of correlation data, only the following results are legible:
Variance of x = 9; regression equations 8x - 10y + 66 = 0 and 40x - 18y = 214. What are
(i) the mean values of x and y
(ii) the correlation coefficient between x and y
(iii) the standard deviation of y?
Solution:
(i) Since both the lines of regression pass through the point $(\bar{x}, \bar{y})$,
$$8\bar{x} - 10\bar{y} + 66 = 0$$
$$40\bar{x} - 18\bar{y} - 214 = 0$$
Solving these, $\bar{x} = 13$, $\bar{y} = 17$.

(ii) $\sigma_x^2 = 9$, so $\sigma_x = 3$.

Let 8x - 10y + 66 = 0 and 40x - 18y = 214 be the lines of regression of y on x and x on y respectively. Writing them as y = 0.8x + 6.6 and x = 0.45y + 5.35, we get
$$b_{yx} = \frac{8}{10} = \frac{4}{5}, \qquad b_{xy} = \frac{18}{40} = \frac{9}{20}$$
hence
$$r^2 = b_{yx}\, b_{xy} = \frac{9}{25}, \qquad r = \pm\frac{3}{5} = \pm 0.6$$
Since both the regression coefficients are positive, we take r = 0.6.
(iii) From $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ we get $\sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{0.8 \times 3}{0.6} = 4$.
Standard deviation of y = 4.

Rank Correlation and an expression for the rank correlation coefficient.

The coefficient of correlation in respect of the ranks of two characteristics of an individual or an observation is called the rank correlation coefficient, usually denoted by ρ.

The expression for ρ is given by
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \qquad d = x - y,$$
where x and y represent the rankings of the two variables from 1 to n.

Note:

(1) If the rankings of x and y are entirely in the same order, for example x: 1,2,3,4,5; y: 1,2,3,4,5, then $\sum d^2 = \sum (x - y)^2 = 0$. This gives ρ = +1 and is called perfect direct correlation.
(2) If the rankings of x and y are entirely in the opposite order, for example x: 1,2,3,4,5; y: 5,4,3,2,1, then $\sum d^2 = 40$. This gives ρ = -1 and is called perfect inverse correlation.
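
The rank correlation formula is a one-line computation once the ranks are known. The Python sketch below is illustrative; the function name `spearman_rho` is an assumption introduced here, and the formula assumes that there are no tied ranks.

```python
def spearman_rho(rank_x, rank_y):
    """Rank correlation rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)),
    where d is the difference between paired ranks (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


same_order = spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
opposite_order = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(same_order, opposite_order)
```

The two calls correspond to the perfect direct and perfect inverse cases described in the note.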


Problems:

1. Ten competitors in a beauty contest are ranked by two judges in the following order. Compute the coefficient of rank correlation.

I    1   6   5   3   10   2   4   9    7   8
II   6   4   9   8   1    2   3   10   5   7

Solution: We have
$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
For the given data, n = 10 and
$$\sum d^2 = (1-6)^2 + (6-4)^2 + (5-9)^2 + (3-8)^2 + (10-1)^2 + (2-2)^2 + (4-3)^2 + (9-10)^2 + (7-5)^2 + (8-7)^2$$
$$= 25 + 4 + 16 + 25 + 81 + 0 + 1 + 1 + 4 + 1 = 158$$
Hence
$$\rho = 1 - \frac{6(158)}{10(10^2 - 1)} = 0.042$$

2. Ten students got the following percentage of marks in two subjects x and y. Compute
their rank correlation coefficient.

Marks in x   78   36   98   25   75   82   90   62   65   39
Marks in y   84   51   91   60   68   62   86   58   53   47

Solution: We prepare a table consisting of the given data along with the ranks assigned according to their order of magnitude. In subject x, 98 is awarded rank 1, 90 rank 2, and so on.

Marks in x   Rank (x)   Marks in y   Rank (y)   d = x - y   d² = (x - y)²
78           4          84           3          1           1
36           9          51           9          0           0
98           1          91           1          0           0
25           10         60           6          4           16
75           5          68           4          1           1
82           3          62           5          -2          4
90           2          86           2          0           0
62           7          58           7          0           0
65           6          53           8          -2          4
39           8          47           10         -2          4
                                                            ∑d² = 30

We have $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ and n = 10 for the given data. Hence
$$\rho = 1 - \frac{6(30)}{10(10^2 - 1)}$$

= 0.82
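
When raw marks are given instead of ranks, the ranking step can also be automated. The sketch below is a minimal illustration that assumes no two students have the same mark in a subject (tied ranks need a separate treatment that is not covered here). The helper names `ranks_descending` and `rank_correlation` are assumptions introduced for this example.

```python
def ranks_descending(values):
    """Rank 1 for the largest value, 2 for the next, and so on.
    Assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_correlation(xs, ys):
    """Rank correlation coefficient computed from raw scores."""
    rx = ranks_descending(xs)
    ry = ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


marks_x = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
marks_y = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
rho = rank_correlation(marks_x, marks_y)
print(ranks_descending(marks_x), ranks_descending(marks_y), round(rho, 2))
```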

3. Ten competitors in music contest are ranked by 3 judges A, B, C in the following order. Use
the rank correlation coefficient to decide which pair of judges have the nearest approach to
common taste of music

A   1   6   5   10   3   2    4   9    7   8
B   3   5   8   4    7   10   2   1    6   9
C   6   4   9   8    1   2    3   10   5   7

Solution: We shall compute $\rho_{AB}$, $\rho_{BC}$, $\rho_{CA}$ with the help of the following table, where d is the difference in ranks.

A     B     C     d²(AB)   d²(BC)   d²(CA)
1     3     6     4        9        25
6     5     4     1        1        4
5     8     9     9        1        16
10    4     8     36       16       4
3     7     1     16       36       4
2     10    2     64       64       0
4     2     3     4        1        1
9     1     10    64       81       1
7     6     5     1        1        4
8     9     7     1        4        1
                  ∑ = 200  ∑ = 214  ∑ = 60

We have $\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$ and n = 10 for the given data.

$$\rho_{AB} = 1 - \frac{6(200)}{10(10^2 - 1)} = -0.21$$
$$\rho_{BC} = 1 - \frac{6(214)}{10(10^2 - 1)} = -0.297$$
$$\rho_{CA} = 1 - \frac{6(60)}{10(10^2 - 1)} = +0.636$$

It may be observed that $\rho_{AB}$ and $\rho_{BC}$ are negative, which means that the tastes of these pairs of judges (A and B; B and C) are opposite. But $\rho_{CA}$ is positive and is nearer to 1 (perfect correlation), so judges A and C have the nearest approach to common taste in music.


Exercise:

1. The equations of regression lines of two variables x and y are x =19.3 – 0.87y and y =
11.64-0.5x, Find the correlation coefficient and the means of x and y.

2. If the tangent of the angle between the lines of regression of y on x and x on y is 0.6 and
the standard deviation of y is twice the standard deviation of x. Find the coefficient of
correlation between x and y.

3. The following information is available in respect of the prices of a certain consumer item
in two cities: A, B. Average price in city A is Rs.65; average price in city B is Rs.67;
standard deviation in city A is 2.5; standard deviation in city B is 3.5. The coefficient of
correlation between the prices in the two cities is 0.8. Find the most likely price in city B
corresponding to the price of Rs.70 in city A.

4. From the following data, calculate the coefficient of rank correlation

Rank in Economics    1   2   3   4   5   6   7   8   9    10
Rank in Statistics   4   8   2   3   5   7   6   9   10   1

5. Find the rank correlation coefficient from the following data:

X   17   13   15   16   6    11   14   9    7   12
Y   36   46   35   24   12   18   27   22   2   8

Video links:
https://youtu.be/i6ZmA9EEzrI
https://youtu.be/rWsiBmA_8Q4
https://youtu.be/Kskex59qnN4

Disclaimer: The content provided is prepared by department of Mathematics for the specified syllabus
by using reference books mentioned in the syllabus. This material is specifically for the use of RVITM
students and for education purpose only.
