Ch.-1 Correlation, Regression and Curve Fitting
CORRELATION
Correlation is the relationship that exists between two or more variables. Two variables are
said to be correlated if a change in one variable is accompanied by a change in the other variable.
EXAMPLE:
1. Relationship between heights and weights.
2. Relationship between price and demand of commodity.
3. Relationship between rainfall and yield of crops.
Types Of Correlations
1. Positive and Negative correlations.
2. Simple and multiple correlations.
3. Partial and Total correlations.
4. Linear and Non-linear correlations.
2. Multiple Correlation: -
When more than two variables are studied, the relationship is described as multiple
correlation, e.g., relationship of price, demand, and supply of a commodity.
Nonlinear Correlation: - If the ratio of change between two variables is not constant, the
correlation is said to be nonlinear. The graph of a nonlinear or curvilinear relationship is
a curve.
X 15 22 25 30 35 40
Y 4 5 8 9 10 12
Method of correlation:
There are two different methods.
1. Graphic methods.
2. Mathematical methods.
Scatter diagram:
This is a very simple method of studying the relationship between two variables. In this
method one variable is taken on the X-axis and the other on the Y-axis, and a point is
plotted for each pair of values.
$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$$
Example: 1
Calculate the correlation coefficient between the following data
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
solution
n=6
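x     y     x²    y²    xy
2     18    4     324   36
4     12    16    144   48
5     10    25    100   50
6     8     36    64    48
8     7     64    49    56
11    5     121   25    55
$\sum x = 36$, $\sum y = 60$, $\sum x^2 = 266$, $\sum y^2 = 706$, $\sum xy = 293$
$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}}$$
$$r = \frac{293 - \frac{(36)(60)}{6}}{\sqrt{266 - \frac{(36)^2}{6}}\,\sqrt{706 - \frac{(60)^2}{6}}}$$
r = −0.9203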
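The raw-sum formula used in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `pearson_r` is illustrative and not part of the notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from raw sums (sum x, sum y, sum x^2, sum y^2, sum xy)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = sxy - sx * sy / n
    denominator = sqrt(sxx - sx ** 2 / n) * sqrt(syy - sy ** 2 / n)
    return numerator / denominator

# Data of Example 1
x = [2, 4, 5, 6, 8, 11]
y = [18, 12, 10, 8, 7, 5]
r = pearson_r(x, y)
print(round(r, 4))  # about -0.9203, matching the worked result
```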
Example : 2
Calculate the correlation coefficient between the following data
x 5 9 13 17 21
y 12 20 25 33 35
Solution
n=5
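$$\bar{x} = \frac{\sum x_i}{n} = \frac{65}{5} = 13$$
$$\bar{y} = \frac{\sum y_i}{n} = \frac{125}{5} = 25$$
Taking deviations from the means gives $\sum (x - \bar{x})(y - \bar{y}) = 236$, $\sum (x - \bar{x})^2 = 160$ and $\sum (y - \bar{y})^2 = 358$, so
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{236}{\sqrt{160}\,\sqrt{358}}$$
r = 0.986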
Example 3
Calculate the correlation coefficient for the following values of demand and
the corresponding price of a commodity:
Demand in quintals 65 66 67 67 68 69 70 72
Solution
Let the demand in quintal be denoted by x and the price in rupees per kg be denoted by
y.
n=8
$$\bar{x} = \frac{\sum x_i}{n} = \frac{544}{8} = 68$$
$$\bar{y} = \frac{\sum y_i}{n} = \frac{552}{8} = 69$$
Example 4
Given n = 10, 𝜎𝑥 = 5.4, 𝜎𝑦 = 6.2, and sum of the product of deviations
from the mean of 𝑥 and 𝑦 is 66. Find the correlation coefficient.
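$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \qquad 5.4 = \sqrt{\frac{\sum (x - \bar{x})^2}{10}}$$
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}, \qquad 6.2 = \sqrt{\frac{\sum (y - \bar{y})^2}{10}}$$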
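The surviving solution stops at the definitions of the standard deviations. A possible final step is sketched below, assuming the standard relation r = cov(X, Y) / (σx σy) together with the covariance formula given earlier in this chapter. The variable names are illustrative.

```python
# Given in the problem statement of Example 4
n = 10
sigma_x, sigma_y = 5.4, 6.2
sum_dev_products = 66  # sum of (x - mean_x) * (y - mean_y)

# cov(X, Y) = (1/n) * sum of deviation products
cov_xy = sum_dev_products / n

# Assumed relation: r = cov(X, Y) / (sigma_x * sigma_y)
r = cov_xy / (sigma_x * sigma_y)
print(round(r, 4))  # about 0.1971
```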
Examples
1.Find the Pearson’s Correlation Coefficient of the following data:
𝑥 100 101 102 102 100 99 97 98 96 95
𝑦 98 99 99 97 95 92 95 94 90 91
2.Calculate Karl Pearson’s coefficient of correlation for the data given below:
𝑥 10 14 18 22 26 30 10
𝑦 18 12 24 6 30 36 18
Spearman's rank correlation coefficient, often denoted by the symbol 𝝆 (rho), is a non-
parametric measure of statistical dependence between two variables.
Rank the data: For each variable, rank the data from lowest to highest, assigning a rank to
each value. If there are ties, assign each tied value the average of the ranks it would have
received if there were no tie.
Calculate the differences between ranks: For each pair of data points, find the
difference between their ranks.
Spearman’s Rank correlation coefficient:
Calculated by the following formula:
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
When ranks are tied, $\frac{m(m^2 - 1)}{12}$ is added to $\sum d^2$ for each tied group, where $m$ is the number of times an item is repeated.
The value of 𝝆 lies between -1 and 1. A positive value indicates a positive monotonic
relationship, while a negative value indicates a negative monotonic relationship. A value of
0 indicates no monotonic relationship.
Example:1
Two judges have given ranks to 10 students for their honesty. Find the rank correlation
coefficient of the following data:
1ST Judge 3 5 8 4 7 10 2 1 6 9
2nd Judge 6 4 9 8 1 2 3 10 5 7
Solution
Rank by 1st judge   Rank by 2nd judge   d     d²
3                   6                   −3    9
5                   4                   1     1
8                   9                   −1    1
4                   8                   −4    16
7                   1                   6     36
10                  2                   8     64
2                   3                   −1    1
1                   10                  −9    81
6                   5                   1     1
9                   7                   2     4
$\sum d^2 = 214$
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(214)}{10(100 - 1)} = 1 - \frac{1284}{990} = 1 - 1.30 = -0.3$$
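When the data are already ranks and there are no ties, the formula reduces to a short computation. The sketch below uses an illustrative function name, `spearman_from_ranks`, and reproduces the judges' example.

```python
def spearman_from_ranks(rank_a, rank_b):
    """Spearman's coefficient 1 - 6*sum(d^2) / (n*(n^2 - 1)) for untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge1 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
judge2 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
rho = spearman_from_ranks(judge1, judge2)
print(round(rho, 3))  # about -0.297, which the worked solution rounds to -0.3
```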
Example:2
Ten students got the following percentage of marks in mathematics and physics.
(x)maths 8 36 98 25 75 82 92 62 65 35
(y)physics 84 51 91 60 68 62 86 58 35 49
Find the rank correlation coefficient.
Solution
𝑛 = 10
x    y    Rank in maths (x)   Rank in physics (y)   d = rank(x) − rank(y)   d²
8    84   10                  3                     7                       49
36   51   7                   8                     −1                      1
98   91   1                   1                     0                       0
25   60   9                   6                     3                       9
75   68   4                   4                     0                       0
82   62   3                   5                     −2                      4
92   86   2                   2                     0                       0
62   58   6                   7                     −1                      1
65   35   5                   10                    −5                      25
35   49   8                   9                     −1                      1
$\sum d = 0$, $\sum d^2 = 90$
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(90)}{10(100 - 1)} = 0.455$$
Example:3
Find the Coefficient of rank correlation of the following data:
𝑥 35 40 42 43 40 53 54 49 41 55
𝑦 102 101 97 98 38 101 97 92 95 95
Solution:
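x    y     Rank in (x)   Rank in (y)   d = rank(x) − rank(y)   d²
35   102   10            1             9                       81
40   101   8.5           2.5           6                       36
42   97    6             5.5           0.5                     0.25
43   98    5             4             1                       1
40   38    8.5           10            −1.5                    2.25
53   101   3             2.5           0.5                     0.25
54   97    2             5.5           −3.5                    12.25
49   92    4             9             −5                      25
41   95    7             7.5           −0.5                    0.25
55   95    1             7.5           −6.5                    42.25
$\sum d^2 = 200.50$
In x the value 40 occurs twice, and in y the values 101, 97 and 95 each occur twice. Each of these four ties has $m = 2$ and contributes $\frac{m(m^2 - 1)}{12} = \frac{2(4 - 1)}{12} = 0.5$.
$$r = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12}\right\}}{n(n^2 - 1)}$$
$$r = 1 - \frac{6\{200.50 + 0.5 + 0.5 + 0.5 + 0.5\}}{990}$$
r = −0.227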
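The tied-rank procedure (average ranks plus the m(m² − 1)/12 correction) can be sketched as follows. The helper names are illustrative, the y values are read off the solution table, and ranks are assigned from highest to lowest as in the worked tables. Ranking both variables from lowest to highest instead gives the same coefficient.

```python
from collections import Counter

def average_ranks(values):
    """Rank from highest (rank 1) to lowest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n * (n * n - 1))

x = [35, 40, 42, 43, 40, 53, 54, 49, 41, 55]
y = [102, 101, 97, 98, 38, 101, 97, 92, 95, 95]
r = spearman_with_ties(x, y)
print(round(r, 3))  # about -0.227
```

Summing the squared rank differences here gives 200.5, and the four tied pairs add 0.5 each.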
EXAMPLES
1.Compute Spearman’s rank correlation coefficient from the following data:
𝑥 18 20 34 52 12
𝑦 39 23 35 52 12
2.The following table gives the scores obtained by 11 students in English and Tamil translation.
Find the rank correlation coefficient.
𝑥 40 46 54 60 70 80 82 85 85 90 95
𝑦 45 45 50 43 40 75 55 72 65 42 70
3.Following are the scores of ten students in a class and their IQ:
Score 35 40 25 55 85 90 65 55 45 50
IQ 100 100 110 140 150 130 100 120 140 110
REGRESSION: By studying correlation, we can establish the existence, degree and
direction of the relationship between two variables, but we cannot answer questions of the
type: if there is a certain amount of change in one variable, what will be the corresponding
change in the other variable? Such questions can be answered if we can establish
a quantitative relationship between the two related variables. The statistical tool by which it is
possible to predict or estimate the unknown values of one variable from known values of
another variable is called regression. A line of regression is a straight line.
The regression line of 𝑌 on 𝑋 is
$$(y - \bar{y}) = b_{yx}(x - \bar{x})$$
where $b_{yx}$ is called the regression coefficient of 𝑌 on 𝑋 and is computed as
$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
This equation can be used to compute the value of y for a given value of x.
Similarly, the regression line of 𝑋 on 𝑌 is
$$(x - \bar{x}) = b_{xy}(y - \bar{y})$$
where $b_{xy}$ is called the regression coefficient of 𝑋 on 𝑌 and is computed as
$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$
This equation can be used to compute the value of x for a given value of y.
NOTE:
(1) $b_{xy}$ and $b_{yx}$ can also be computed using the following formulas:
$$b_{xy} = \frac{r\sigma_x}{\sigma_y} \quad\text{and}\quad b_{yx} = \frac{r\sigma_y}{\sigma_x}$$
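The angle $\theta$ between the two regression lines is given by:
$$\tan\theta = \left| \left(\frac{r^2 - 1}{r}\right) \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2} \right|$$
When $r = 0$, $\tan\theta$ is infinite and $\theta = \frac{\pi}{2}$; in this case the two regression lines are
perpendicular to each other. If $r = \pm 1$, then $\theta = 0$; in this case the two regression lines
coincide, because the point $(\bar{x}, \bar{y})$ is common to both.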
Example: The following data regarding the heights (y) and weights (x) of 100 college
students are given: $\sum x = 15000$, $\sum x^2 = 2272500$, $\sum xy = 1022250$, $\sum y = 6800$,
$\sum y^2 = 463025$. Find the coefficient of correlation between height and weight and also the
equation of regression of height on weight.
Solution:
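$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = 0.1$$
$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} = 3.6$$
$$\bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150$$
$$\bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68$$
The equation of the line of regression of 𝑦 on 𝑥 is:
$$(y - \bar{y}) = b_{yx}(x - \bar{x})$$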
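The remaining substitution can be checked numerically. The sketch below works only from the sums given in the problem. It assumes the relation r = ±√(b_yx · b_xy), with r taking the sign of the regression coefficients, and the variable names are illustrative.

```python
from math import sqrt, copysign

# Sums given in the problem (x = weight, y = height)
n = 100
sum_x, sum_y = 15000, 6800
sum_x2, sum_y2, sum_xy = 2272500, 463025, 1022250

s_xy = n * sum_xy - sum_x * sum_y
b_yx = s_xy / (n * sum_x2 - sum_x ** 2)   # regression coefficient of y on x
b_xy = s_xy / (n * sum_y2 - sum_y ** 2)   # regression coefficient of x on y

# Assumed relation: r has the sign of the regression coefficients
r = copysign(sqrt(b_yx * b_xy), b_yx)

x_bar, y_bar = sum_x / n, sum_y / n
intercept = y_bar - b_yx * x_bar          # line of y on x: y = b_yx * x + intercept

print(round(b_yx, 4), round(b_xy, 4), round(r, 4), round(intercept, 4))
```

Under these assumptions r comes out at about 0.6, and the line of y on x is approximately y = 0.1x + 53, that is, y − 68 = 0.1(x − 150).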
X 4 2 3 4 2
Y 2 3 2 4 4
Solution
𝑛=5
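x    y    x²    y²    xy
4    2    16    4     8
2    3    4     9     6
3    2    9     4     6
4    4    16    16    16
2    4    4     16    8
$\sum x = 15$, $\sum y = 15$, $\sum x^2 = 49$, $\sum y^2 = 49$, $\sum xy = 44$
$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = -0.25$$
$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} = -0.25$$
Since $r$ takes the same sign as the regression coefficients, and both are negative here,
$$r = -\sqrt{b_{xy} \cdot b_{yx}} = -0.25$$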
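Because √(b_xy · b_yx) only gives the magnitude of r, the sign has to be taken from the regression coefficients. A small sketch of this on the data above, cross-checked against the direct Pearson formula (the function name is illustrative):

```python
from math import sqrt, copysign

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    b_yx = s_xy / (n * sum(x * x for x in xs) - sx ** 2)
    b_xy = s_xy / (n * sum(y * y for y in ys) - sy ** 2)
    return b_yx, b_xy

xs = [4, 2, 3, 4, 2]
ys = [2, 3, 2, 4, 4]
b_yx, b_xy = regression_coefficients(xs, ys)

# Magnitude from the product of the coefficients, sign from the coefficients
r = copysign(sqrt(b_yx * b_xy), b_yx)

# Cross-check with the direct formula r = S_xy / sqrt(S_xx * S_yy)
n = len(xs)
s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
s_xx = n * sum(x * x for x in xs) - sum(xs) ** 2
s_yy = n * sum(y * y for y in ys) - sum(ys) ** 2
r_direct = s_xy / sqrt(s_xx * s_yy)
print(r, r_direct)
```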
Examples
1.Find the regression coefficient of y on x for the following data:
𝒙 1 2 3 4 5
𝒚 160 180 140 180 200
2. Find the equation of regression lines from the following data and also estimate y for x=1
and x for y=4.
𝒙 3 2 -1 6 4 -2 5 7
𝒚 5 13 12 -1 2 20 0 -3
3.Find the equation of regression lines and the correlation coefficient from the following data:
𝒙 28 41 40 38 35 33 46 32 36 33
𝒚 30 34 31 34 30 26 28 31 26 31
4. The following information is obtained for two variables x and y. Find the regression equation of 𝑦
on 𝑥. $n = 10$; $\sum x = 130$; $\sum x^2 = 2288$; $\sum xy = 3467$.
Curve Fitting
Curve fitting is a statistical technique used to find the best-fitting curve or function that
describes a set of data points. It is commonly used in physics, engineering, biology,
finance, and other fields. Curve fitting involves developing a mathematical model that
describes the relationship between independent and dependent variables in a dataset. The
term "curve" is used broadly and covers functional forms such as linear, polynomial,
exponential, and logarithmic equations, as well as more complex ones. The "fitting" aspect
refers to choosing the model parameters so as to minimize the discrepancy between the
observed data points and the values predicted by the model. In essence, curve fitting aims
to uncover the underlying structure of the data and express it in a concise mathematical form.
METHODS:
1.Linear Regression: One of the simplest forms of curve fitting, linear regression assumes a
linear relationship between the independent and dependent variables. The goal is to find the
best-fitting line (or hyperplane in higher dimensions) that minimizes the sum of squared
differences between observed and predicted values.(𝑌 = 𝐴𝑋 + 𝐵 𝑜𝑟 𝑋 = 𝐴𝑌 + 𝐵).
$$f(x) = ax + b$$
$$\text{err} = \sum_{i=1}^{n} d_i^2 = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + \dots + (y_n - f(x_n))^2$$
$$= \sum_{i=1}^{n} (y_i - (a x_i + b))^2$$
Setting the partial derivatives of err with respect to $a$ and $b$ equal to zero gives the normal equations:
$$\sum_{i=1}^{n} y_i = a\sum_{i=1}^{n} x_i + nb \quad (1)$$
$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i \quad (2)$$
EXAMPLE:1
Solution
Let the straight line to be fitted to the data be
$$y = a + bx$$
The normal equations are
$$\sum y = na + b\sum x \quad (1)$$
$$\sum xy = a\sum x + b\sum x^2 \quad (2)$$
n = 6
x    y     x²    xy
1    2.4   1     2.4
2    3     4     6
3    3.6   9     10.8
4    4     16    16
6    5     36    30
8    6     64    48
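The column sums and the solution of the two normal equations are not shown in the surviving text. A minimal sketch of that step, for the line y = a + bx and the table above (the function name `fit_line` is illustrative):

```python
def fit_line(xs, ys):
    """Solve the normal equations for y = a + b*x.

    sum(y)  = n*a      + b*sum(x)
    sum(xy) = a*sum(x) + b*sum(x^2)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

x = [1, 2, 3, 4, 6, 8]
y = [2.4, 3, 3.6, 4, 5, 6]
a, b = fit_line(x, y)
print(round(a, 4), round(b, 4))
```

For this table the sums are Σx = 24, Σy = 24, Σx² = 130 and Σxy = 113.2, which leads to roughly y = 1.976 + 0.506x.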
Example:2
Fit a straight line to the following data. Also, estimate the value of y at 𝑥 = 2.5
𝑥 0 1 2 3 4
Example:3
Fit a straight line using least square method
𝑥 0 0.5 1 1.5 2 2.5
Polynomial Regression: We started the linear curve fit by choosing a generic form of the
straight line, 𝑓(𝑥) = 𝑎𝑥 + 𝑏.
This is just one kind of function. There are an infinite number of generic forms we could
choose from, for almost any shape we want. Let’s start with a simple extension of the linear
regression concept, applied to the same kind of sampled data: a second-order polynomial, $f(x) = a + bx + cx^2$.
$$\text{err} = \sum_{i=1}^{n} d_i^2 = (y_1 - (a + bx_1 + cx_1^2))^2 + (y_2 - (a + bx_2 + cx_2^2))^2 + \dots + (y_n - (a + bx_n + cx_n^2))^2$$
$$= \sum_{i=1}^{n} (y_i - (a + bx_i + cx_i^2))^2$$
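To minimize the error, set the partial derivatives with respect to 𝑎, 𝑏 and 𝑐 equal to 0:
$$\frac{\partial(\text{err})}{\partial a} = -2\sum_{i=1}^{n} (y_i - (a + bx_i + cx_i^2)) = 0$$
$$\frac{\partial(\text{err})}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - (a + bx_i + cx_i^2)) = 0$$
$$\frac{\partial(\text{err})}{\partial c} = -2\sum_{i=1}^{n} x_i^2 (y_i - (a + bx_i + cx_i^2)) = 0$$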
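Simplifying these equations, we get
$$\sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^2$$
$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i^3$$
$$\sum_{i=1}^{n} x_i^2 y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^4$$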
Example:1
Fit a second order polynomial equation to following data:
xi 0 0.5 1.0 1.5 2.0 2.5
Solution:
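x_i    y_i    x_i²   x_i³     x_i⁴      x_i y_i   x_i² y_i
0      0      0      0        0         0         0
0.5    0.25   0.25   0.125    0.0625    0.125     0.0625
1      1      1      1        1         1         1
1.5    2.25   2.25   3.375    5.0625    3.375     5.0625
2      4      4      8        16        8         16
2.5    6.25   6.25   15.625   39.0625   15.625    39.0625
Column sums: $\sum x_i = 7.5$, $\sum y_i = 13.75$, $\sum x_i^2 = 13.75$, $\sum x_i^3 = 28.125$, $\sum x_i^4 = 61.1875$, $\sum x_i y_i = 28.125$, $\sum x_i^2 y_i = 61.1875$
These sums are substituted into the normal equations:
$$\sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^2$$
$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i^3$$
$$\sum_{i=1}^{n} x_i^2 y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^4$$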
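The three normal equations form a 3×3 linear system in a, b and c. One way to solve it is sketched below with a small Gaussian elimination routine written for this illustration. The function names are not from the notes, and the y values are taken from the second column of the solution table.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - known) / aug[i][i]
    return sol

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    n = len(xs)
    s = lambda p: sum(x ** p for x in xs)                       # sum of x^p
    t = lambda p: sum((x ** p) * y for x, y in zip(xs, ys))     # sum of x^p * y
    matrix = [[n,    s(1), s(2)],
              [s(1), s(2), s(3)],
              [s(2), s(3), s(4)]]
    rhs = [t(0), t(1), t(2)]
    return solve_linear_system(matrix, rhs)

x = [0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0.25, 1, 2.25, 4, 6.25]
a, b, c = fit_quadratic(x, y)
print(round(a, 6), round(b, 6), round(c, 6))
```

Since the tabulated y values equal x² exactly, the fit should return a and b close to 0 and c close to 1, up to floating-point rounding.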
Curve fitting - Other nonlinear fits (exponential)
Q: Will a polynomial of any order necessarily fit any set of data?
A: Nope, lots of phenomena don’t follow a polynomial form. They may be, for example,
exponential
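(1) General exponential equation: $f(x) = C e^{Ax}$
Taking natural logarithms gives $\ln y = \ln C + Ax$. Writing $Y = \ln y$, $X = x$, $a = A$ and $b = \ln C$, this is the straight line $Y = aX + b$, whose normal equations are
$$\sum_{i=1}^{n} Y_i = a\sum_{i=1}^{n} X_i + nb \quad (1)$$
$$\sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} X_i^2 + b\sum_{i=1}^{n} X_i \quad (2)$$
After getting the values of $a$ and $b$: $A = a$ and $C = \text{antilog}\, b = e^{b}$.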
Example:1
An experiment gave the following values:
X 1 5 7 9 12
Y 10 15 12 15 21
Solution:
X_i = x_i   y_i   Y_i = ln y_i   X_i²   X_i Y_i
1           10    2.302585       1      2.302585
5           15    2.70805        25     13.54025
7           12    2.484906       49     17.39435
9           15    2.70805        81     24.37245
12          21    3.044522       144    36.53427
$\sum_{i=1}^{5} X_i = 34$, $\sum_{i=1}^{5} Y_i = 13.24811$, $\sum_{i=1}^{5} X_i^2 = 300$, $\sum_{i=1}^{5} X_i Y_i = 94.1439$
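Substituting into the normal equations:
13.24811 = 34a + 5b
94.1439 = 300a + 34b
Solving, a = 0.058964 and b = 2.248664.
Since $Y = \ln y = \ln C + Ax$, the slope is the exponent coefficient itself, A = a = 0.058964 (no antilog is taken for A), and C = antilog(2.248664) = $e^{2.248664}$ = 9.475068. The fitted curve is $y = 9.475\, e^{0.0590x}$.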
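The log-linearization can be checked numerically. The sketch below fits a straight line to (x, ln y) and converts the intercept back with the exponential. The function name `fit_exponential` is illustrative, and natural logarithms are assumed, as in the table.

```python
from math import log, exp

def fit_exponential(xs, ys):
    """Fit y = C * exp(A * x) by least squares on the points (x, ln y)."""
    n = len(xs)
    Y = [log(v) for v in ys]                 # Y = ln y
    sx, sY = sum(xs), sum(Y)
    sxx = sum(x * x for x in xs)
    sxY = sum(x * v for x, v in zip(xs, Y))
    A = (n * sxY - sx * sY) / (n * sxx - sx ** 2)   # slope is A itself
    ln_C = (sY - A * sx) / n                        # intercept is ln C
    return A, exp(ln_C)

x = [1, 5, 7, 9, 12]
y = [10, 15, 12, 15, 21]
A, C = fit_exponential(x, y)
print(round(A, 5), round(C, 3))

# Fitted values next to the observed ones, as a rough sanity check
for xi, yi in zip(x, y):
    print(xi, yi, round(C * exp(A * xi), 2))
```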
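(2) Power equation: $y = bx^{a}$
Taking logarithms to base 10 gives $\log y = \log b + a \log x$. Writing $Y = \log y$, $X = \log x$, $A = a$ and $B = \log b$, the normal equations are
$$\sum_{i=1}^{n} Y_i = nB + A\sum_{i=1}^{n} X_i \quad (1)$$
$$\sum_{i=1}^{n} X_i Y_i = B\sum_{i=1}^{n} X_i + A\sum_{i=1}^{n} X_i^2 \quad (2)$$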
It is known that v and t are connected by the relation $v = bt^{a}$. Find the best possible
values of a and b.
v     t     Y = log v   X = log t   X²            XY
350   61    2.544068    1.78533     3.18740262    4.542001
400   26    2.60206     1.414973    2.002149575   3.681846
500   7     2.69897     0.845098    0.714190697   2.280894
600   2.6   2.778151    0.414973    0.17220288    1.152859
$\sum_{i=1}^{4} Y_i = 10.62325$, $\sum_{i=1}^{4} X_i = 4.460375$, $\sum_{i=1}^{4} X_i^2 = 6.075945772$, $\sum_{i=1}^{4} X_i Y_i = 11.6576$
Substituting into the normal equations,
$$\sum_{i=1}^{n} Y_i = nB + A\sum_{i=1}^{n} X_i \quad (1)$$
$$\sum_{i=1}^{n} X_i Y_i = B\sum_{i=1}^{n} X_i + A\sum_{i=1}^{n} X_i^2 \quad (2)$$
10.62325 = 4B + 4.460375A
11.6576 = 4.460375B + 6.075945772A
On solving these equations, B = 2.845 and A = a = −0.17.
b = antilog(2.845) = 699.842
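The same fit can be reproduced without intermediate rounding. The sketch below assumes base-10 logarithms, as in the table, and the function name `fit_power_law` is illustrative.

```python
from math import log10

def fit_power_law(ts, vs):
    """Fit v = b * t**a by least squares on the points (log10 t, log10 v)."""
    n = len(ts)
    X = [log10(t) for t in ts]
    Y = [log10(v) for v in vs]
    sX, sY = sum(X), sum(Y)
    sXX = sum(x * x for x in X)
    sXY = sum(x * y for x, y in zip(X, Y))
    A = (n * sXY - sX * sY) / (n * sXX - sX ** 2)   # exponent a
    B = (sY - A * sX) / n                           # log10 of b
    return A, 10 ** B

t = [61, 26, 7, 2.6]
v = [350, 400, 500, 600]
a, b = fit_power_law(t, v)
print(round(a, 4), round(b, 1))
```

Carrying full precision gives a close to −0.171 and b close to 702. The value 699.842 above comes from rounding B to 2.845 before taking the antilog, which shows how sensitive the antilog step is to rounding.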