Unit 1 Correlation, Regression and Curve Fitting 2024-25

Uploaded by

Rajat Mishra

PARUL UNIVERSITY

FACULTY OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF APPLIED SCIENCE AND
HUMANITIES
4th SEMESTER B. TECH PROGRAMME
PROBABILITY, STATISTICS AND NUMERICAL
METHODS (303191251)
ACADEMIC YEAR 2024-2025
UNIT: 1 CORRELATION, REGRESSION AND CURVE
FITTING

CORRELATION
Correlation is the relationship that exists between two or more variables. Two variables are
said to be correlated if a change in one variable affects a change in the other variable.
EXAMPLE:
1. Relationship between heights and weights.
2. Relationship between price and demand of commodity.
3. Relationship between rainfall and yield of crops.
Types Of Correlations
1. Positive and Negative correlations.
2. Simple and Multiple correlations.
3. Partial and Total correlations.
4. Linear and Non-linear correlations.

POSITIVE AND NEGATIVE CORRELATIONS

POSITIVE CORRELATIONS (Same direction)
If both the variables vary in the same direction, the correlation is said to be positive.
In other words, if the value of one variable increases, the value of the other variable also
increases, and if the value of one decreases, the value of the other also decreases.
Height (cm) 120 130 135 140 145
Weight(kg) 50 55 60 65 70

NEGATIVE CORRELATIONS (Opposite direction)

If both the variables vary in the opposite direction, the correlation is said to be
negative. In other words, if the value of one variable increases, the value of other variable
decreases.
Height (cm) 120 130 135 140 145
Weight(kg) 70 65 60 55 50
SIMPLE AND MULTIPLE CORRELATIONS
1. Simple Correlation: -
When only two variables are studied, the relationship is described as simple correlation, e.g.,
the quantity of money and price level, demand and price, etc.

2. Multiple Correlation: -
When more than two variables are studied, the relationship is described as multiple
correlation, e.g., relationship of price, demand, and supply of a commodity.

PARTIAL AND TOTAL CORRELATIONS

1. Partial Correlation
When more than two variables are involved but the relationship between only two of them is
studied, with the effect of the other variables excluded (held constant), the relationship
is termed partial correlation.
2. Total Correlation
When more than two variables are studied without excluding any variables, the relationship
is termed total correlation.

Linear and Nonlinear Correlations

1. Linear Correlation:
If the ratio of change between two variables is constant, the correlation is said to be linear.
If such variables are plotted on a graph paper, a straight line is obtained.
X 5 10 15 20 25 30
Y 2 4 6 8 10 12

2. Nonlinear Correlation:
If the ratio of change between two variables is not constant, the correlation is said to be
nonlinear. The graph of a nonlinear or curvilinear relationship will be a curve.
X 15 22 25 30 35 40
Y 4 5 8 9 10 12
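
The "ratio of change" test can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the helper name `change_ratios` is an assumption chosen for illustration. It computes (change in Y) / (change in X) between consecutive points of the two tables above.

```python
def change_ratios(xs, ys):
    """Ratio (change in y) / (change in x) between consecutive points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])]

# Data from the linear and nonlinear tables above.
linear = change_ratios([5, 10, 15, 20, 25, 30], [2, 4, 6, 8, 10, 12])
nonlinear = change_ratios([15, 22, 25, 30, 35, 40], [4, 5, 8, 9, 10, 12])

print(linear)     # all ratios equal: the points lie on a straight line
print(nonlinear)  # ratios differ: the relationship is not linear
```

A constant ratio corresponds to a constant slope, which is why the first table plots as a straight line.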

Method of correlation:
There are two different methods.
1. Graphic methods.
2. Mathematical methods.

Graphic methods:
1. Scatter diagram.
2. Simple graph.
Mathematical methods:
1. Karl Pearson’s coefficient of correlation.
2. Spearman’s rank coefficient of correlation.

Scatter diagram:
This is a very simple method of studying the relationship between two variables. In this method
one variable is taken on the X-axis and the other variable is taken on the Y-axis, and a point
is plotted for each pair of values, as in the following examples:

Example 1:
After standard deduction from total income, 20% income tax is imposed on the
remaining income. The information regarding the taxable income and the tax to be paid
is given below for five persons.

Person 1 2 3 4 5
Taxable Income (thousand ₹ ) 𝒙 50 30 80 20 100
Income Tax (thousand ₹ ) 𝒚 10 6 16 4 20

Draw a scatter diagram from this information and discuss about the correlation.
Solution:
The following scatter diagram is obtained by plotting the points corresponding to the
ordered pairs (50,10), (30,6), (80,16), (20,4) and (100,20) of 𝑥 and 𝑦.
We can see that all the points lie on the same line in the scatter diagram. We can
also see that as the values of variable 𝑋 change, the values of variable 𝑌 also change in
the same direction with a constant proportion. Hence, we can see that there is a perfect
positive correlation between two variables 𝑋 and 𝑌.

Example: 2
To know the relation between monthly expenditure and monthly savings for middle class
families, the information regarding expenditure and savings for 5 families is given
below. (The monthly income of each family is ₹ 20,000)

Monthly Expenditure (thousand ₹) 𝒙 15 18 8 10 12

Monthly Savings (thousand ₹) 𝒚 5 2 12 10 8

Draw a scatter diagram indicating the relation between monthly expenditure and
monthly savings from this information and discuss about their correlation.
Solution:
The following scatter diagram is obtained by plotting the points of ordered pairs (15,5),
(18,2), (8,12), (10,10), (12,8) of 𝑋 and 𝑌 on the graph paper.
We can see that all the points lie on the same line in the scatter diagram. We can also see
that as the values of variable 𝑋 change, the values of variable 𝑌 also change in the
opposite direction with a constant proportion. Hence, we can see that there is a perfect
negative correlation between 𝑋 and 𝑌.

Exercise
1. A ball pen making company wants to know the relation between the price (in ₹ ) and
supply (in thousand units) of its most selling Gel Pen. The following information is
collected for it: Draw a scatter diagram and interpret it.
Price (in ₹ ) 14 16 12 11 15 13 17
Monthly Supply 32 50 20 12 45 30 53

2. The following information is collected to study the relationship between the
minimum day temperature and the sale of woollen cloths during a particular day of
winter for six different cities.
Minimum day temperature (Celsius)        12 20 8 5 15 24
Sale of woollen cloths (thousand units)  35 10 45 70 20 8
Draw a scatter diagram from this information and interpret it.
Karl Pearson’s Coefficient of Correlation
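The coefficient of correlation is the measure of correlation between two random variables
X and Y, and is denoted by r.
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$$
where cov(X, Y) is the covariance of variables X and Y,
$\sigma_x$ is the standard deviation of variable X,
and $\sigma_y$ is the standard deviation of variable Y.
This expression is known as Karl Pearson's coefficient of correlation or Karl Pearson's
product-moment coefficient of correlation.

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$$

$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \qquad \sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$$

Substituting these into the definition, the factors of n cancel:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$$

The above expression can be further modified.

$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}}$$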
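
The two forms of the formula above should give the same value. The following is a minimal Python sketch under that reading; the function names `pearson_r_sums` and `pearson_r_deviations` are illustrative assumptions, not part of the notes. The data are those of Example 1 below.

```python
from math import sqrt

def pearson_r_sums(xs, ys):
    """r from raw sums: (Sxy - Sx*Sy/n) / sqrt(...) sqrt(...)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = sxy - sx * sy / n
    denominator = sqrt(sxx - sx ** 2 / n) * sqrt(syy - sy ** 2 / n)
    return numerator / denominator

def pearson_r_deviations(xs, ys):
    """r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ssx = sum((x - x_bar) ** 2 for x in xs)
    ssy = sum((y - y_bar) ** 2 for y in ys)
    return sp / (sqrt(ssx) * sqrt(ssy))

xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]
print(round(pearson_r_sums(xs, ys), 4))
print(round(pearson_r_deviations(xs, ys), 4))
```

The raw-sums form avoids computing the means first, which is convenient for hand calculation; the deviation form is easier when the means are whole numbers.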

Properties of Coefficient of Correlation

1. The coefficient of correlation lies between -1 and 1, i.e., −1 ≤ 𝑟 ≤ 1.
2. Correlation coefficient is independent of change of origin and change of scale.
3. Two independent variables are uncorrelated.
Example : 1
Calculate the correlation coefficient between 𝑥 and 𝑦 using the following data:
𝑥 2 4 5 6 8 11
𝑦 18 12 10 8 7 5

Solution:
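n = 6
$x$    $y$    $x^2$   $y^2$   $xy$
2      18     4       324     36
4      12     16      144     48
5      10     25      100     50
6      8      36      64      48
8      7      64      49      56
11     5      121     25      55
$\sum x = 36$   $\sum y = 60$   $\sum x^2 = 266$   $\sum y^2 = 706$   $\sum xy = 293$

$$r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}}\,\sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}} = \frac{293 - \frac{(36)(60)}{6}}{\sqrt{266 - \frac{(36)^2}{6}}\,\sqrt{706 - \frac{(60)^2}{6}}} = -0.9203$$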

Example : 2
Calculate the correlation coefficient between the following data:
𝑥 5 9 13 17 21
𝑦 12 20 25 33 35
(Summer 2023)
Solution:
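n = 5
$$\bar{x} = \frac{\sum x}{n} = \frac{65}{5} = 13, \qquad \bar{y} = \frac{\sum y}{n} = \frac{125}{5} = 25$$
$x$    $y$    $(x - \bar{x})$   $(y - \bar{y})$   $(x - \bar{x})^2$   $(y - \bar{y})^2$   $(x - \bar{x})(y - \bar{y})$
5      12     -8      -13     64      169     104
9      20     -4      -5      16      25      20
13     25     0       0       0       0       0
17     33     4       8       16      64      32
21     35     8       10      64      100     80
$\sum x = 65$   $\sum y = 125$   $\sum (x - \bar{x}) = 0$   $\sum (y - \bar{y}) = 0$   $\sum (x - \bar{x})^2 = 160$   $\sum (y - \bar{y})^2 = 358$   $\sum (x - \bar{x})(y - \bar{y}) = 236$

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{236}{\sqrt{160}\,\sqrt{358}} = 0.986$$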
Example : 3
Calculate the correlation coefficient for the following values of demand and the
corresponding price of a commodity:

Demand in quintals 65 66 67 67 68 69 70 72
Price in rupees per kg 67 68 65 68 72 72 69 71

Solution
Let the demand in quintal be denoted by x and the price in rupees per kg be denoted by y.
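n = 8
$$\bar{x} = \frac{\sum x}{n} = \frac{544}{8} = 68$$
$$\bar{y} = \frac{\sum y}{n} = \frac{552}{8} = 69$$
$x$    $y$    $(x - \bar{x})$   $(y - \bar{y})$   $(x - \bar{x})^2$   $(y - \bar{y})^2$   $(x - \bar{x})(y - \bar{y})$
65     67     -3      -2      9       4       6
66     68     -2      -1      4       1       2
67     65     -1      -4      1       16      4
67     68     -1      -1      1       1       1
68     72     0       3       0       9       0
69     72     1       3       1       9       3
70     69     2       0       4       0       0
72     71     4       2       16      4       8
$\sum x = 544$   $\sum y = 552$   $\sum (x - \bar{x}) = 0$   $\sum (y - \bar{y}) = 0$   $\sum (x - \bar{x})^2 = 36$   $\sum (y - \bar{y})^2 = 44$   $\sum (x - \bar{x})(y - \bar{y}) = 24$

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \frac{24}{\sqrt{36}\,\sqrt{44}} = 0.603$$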

Example : 4
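Given n = 10, $\sigma_x = 5.4$, $\sigma_y = 6.2$, and the sum of the products of deviations from the
means of x and y is 66. Find the correlation coefficient.

Solution
n = 10, $\sigma_x = 5.4$, $\sigma_y = 6.2$

$$\sum (x - \bar{x})(y - \bar{y}) = 66$$
$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{66}{10} = 6.6$$
$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} = \frac{6.6}{5.4 \times 6.2} = 0.197$$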
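
When only summary figures are given, as in Example 4, the definition can be applied directly. A minimal Python sketch follows; the function name `r_from_summary` and its parameter names are assumptions for illustration.

```python
def r_from_summary(n, sigma_x, sigma_y, sum_dev_products):
    """r = cov(X, Y) / (sigma_x * sigma_y), with cov = sum of products / n."""
    cov = sum_dev_products / n
    return cov / (sigma_x * sigma_y)

# Figures from Example 4.
print(round(r_from_summary(10, 5.4, 6.2, 66), 3))
```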

Exercise:
1. Find the Pearson’s Correlation Coefficient of the following data:
𝑥 100 101 102 102 100 99 97 98 96 95
𝑦 98 99 99 97 95 92 95 94 90 91

2. Calculate Karl Pearson’s coefficient of correlation for the data given below:
𝑥 10 14 18 22 26 30 10
𝑦 18 12 24 6 30 36 18

3. Find the Pearson’s Correlation Coefficient of the following data:

𝑥 9 8 7 6 5 4 3 2 1
𝑦 15 16 14 1 11 12 10 8 9
(Winter 2022-23)
4. Find Coefficient of Correlation between the following data:
𝑥 1 2 3 4 5 6 7 8 9
𝑦 9 8 10 12 11 13 14 16 15
(Winter 2023-24)
5. Calculate Karl Pearson’s coefficient of correlation for the data given below:

𝑥 17 19 21 26 20
𝑦 23 27 25 26 27

6. Given n = 10, $\sigma_x = 10.8$, $\sigma_y = 12.4$, and the sum of the products of deviations
from the means of x and y is 132. Find the correlation coefficient.

Spearman’s Rank correlation coefficient:

Spearman's rank correlation coefficient, often denoted by the symbol 𝝆 (rho), is a non-
parametric measure of statistical dependence between two variables.

Here is a brief explanation of how Spearman's rank correlation coefficient is calculated.

Rank the data: For each variable, rank the values in a consistent order (either lowest to
highest or highest to lowest), assigning a rank to each value. If there are ties, assign each
tied value the average of the ranks it would have received if there were no tie.
Calculate the differences between ranks: For each pair of data points, find the difference d
between their ranks.
Apply the formula:
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where n = number of pairs of observations.

When some values are repeated (tied ranks), the above formula is modified to:
$$r = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \cdots\right]}{n(n^2 - 1)}$$

That is, $\frac{m(m^2 - 1)}{12}$ is added to $\sum d^2$ for each repeated item, where m is the number
of times that item is repeated.

The value of 𝝆 lies between -1 and 1. A positive value indicates a positive monotonic
relationship, while a negative value indicates a negative monotonic relationship. A value of
0 indicates no monotonic relationship.
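
The calculation for untied data can be sketched in a few lines of Python. This is an illustrative sketch only: the names `ranks_desc` and `spearman_no_ties` are assumptions, and rank 1 is given to the largest value, as the worked examples below do.

```python
def ranks_desc(values):
    """Rank 1 = largest value. Assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(rank_x, rank_y):
    """r = 1 - 6 * sum(d^2) / (n (n^2 - 1)) for two lists of ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Example 1 below: the judges' ranks are given directly.
judge1 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
judge2 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
print(round(spearman_no_ties(judge1, judge2), 3))

# Example 2 below: marks must be ranked first.
maths = [8, 36, 98, 25, 75, 82, 92, 62, 65, 35]
physics = [84, 51, 91, 60, 68, 62, 86, 58, 35, 49]
print(round(spearman_no_ties(ranks_desc(maths), ranks_desc(physics)), 3))
```

Ranking both variables from lowest to highest instead would give the same coefficient, because every rank difference d is only negated and is then squared.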

Example: 1
Two judges have given ranks to 10 students for their honesty. Find the rank correlation
coefficient of the following data:
1ST Judge 3 5 8 4 7 10 2 1 6 9
2nd Judge 6 4 9 8 1 2 3 10 5 7
Solution
Rank given by 1st judge   Rank given by 2nd judge   Difference in ranks $d$   $d^2$
3      6      -3     9
5      4      1      1
8      9      -1     1
4      8      -4     16
7      1      6      36
10     2      8      64
2      3      -1     1
1      10     -9     81
6      5      1      1
9      7      2      4
                            $\sum d^2 = 214$

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(214)}{10(100 - 1)} = 1 - \frac{1284}{990} = 1 - 1.297 = -0.297$$
Example: 2
Ten students got the following percentage of marks in mathematics and physics.
(x)maths 8 36 98 25 75 82 92 62 65 35
(y)physics 84 51 91 60 68 62 86 58 35 49

Find the rank correlation coefficient.

Solution
n = 10
$x$    $y$    Rank in maths ($R_x$)   Rank in physics ($R_y$)   $d = R_x - R_y$   $d^2$
8      84     10     3      7      49
36     51     7      8      -1     1
98     91     1      1      0      0
25     60     9      6      3      9
75     68     4      4      0      0
82     62     3      5      -2     4
92     86     2      2      0      0
62     58     6      7      -1     1
65     35     5      10     -5     25
35     49     8      9      -1     1
                            $\sum d = 0$   $\sum d^2 = 90$

$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6(90)}{10(100 - 1)} = 0.455$$

Example: 3
Find the Coefficient of rank correlation of the following data: (Summer 2022-23)

𝑥 35 40 42 43 40 53 54 49 41 55

𝑦 102 101 97 98 38 101 97 92 95 95

Solution
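n = 10
$x$    $y$    Rank in x ($R_x$)   Rank in y ($R_y$)   $d = R_x - R_y$   $d^2$
35     102    10     1      9      81
40     101    8.5    2.5    6      36
42     97     6      5.5    0.5    0.25
43     98     5      4      1      1
40     38     8.5    10     -1.5   2.25
53     101    3      2.5    0.5    0.25
54     97     2      5.5    -3.5   12.25
49     92     4      9      -5     25
41     95     7      7.5    -0.5   0.25
55     95     1      7.5    -6.5   42.25
                            $\sum d^2 = 200.50$

There are four repeated items (40 in x; 101, 97 and 95 in y), each occurring m = 2 times, so
each contributes $\frac{m(m^2 - 1)}{12} = \frac{2(4 - 1)}{12} = 0.5$.

$$r = 1 - \frac{6\left[\sum d^2 + \frac{m_1(m_1^2 - 1)}{12} + \frac{m_2(m_2^2 - 1)}{12} + \frac{m_3(m_3^2 - 1)}{12} + \frac{m_4(m_4^2 - 1)}{12}\right]}{n(n^2 - 1)}$$
$$= 1 - \frac{6\{200.50 + 0.5 + 0.5 + 0.5 + 0.5\}}{990}$$
$$= -0.227$$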
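
The tied-rank procedure of Example 3 can be sketched in Python as follows. This is a sketch under stated assumptions rather than a reference implementation: the function names are illustrative, rank 1 goes to the largest value, and the correction term m(m² − 1)/12 is added once for every repeated value in either series.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest; tied values share the average of their positions."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1   # first position occupied by v
        count = order.count(v)       # number of times v occurs
        ranks.append(first + (count - 1) / 2)
    return ranks

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12
               for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

# Data of Example 3.
x = [35, 40, 42, 43, 40, 53, 54, 49, 41, 55]
y = [102, 101, 97, 98, 38, 101, 97, 92, 95, 95]
print(average_ranks(x))
print(round(spearman_with_ties(x, y), 3))
```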

Exercise:
1.Compute Spearman’s rank correlation coefficient from the following data:
𝑥 18 20 34 52 12
𝑦 39 23 35 52 12
2.Obtain the rank correlation coefficient from the following data.
𝑥 10 12 18 18 15 40
𝑦 12 18 25 25 50 25

(Summer 2023-24)
REGRESSION:
By studying correlation, we can know the existence, degree and direction of the relationship
between two variables, but we cannot answer questions of the type: if there is a certain
amount of change in one variable, what will be the corresponding change in the other
variable? Such questions can be answered if we can establish a quantitative
relationship between the two related variables. The statistical tool by which it is possible to
predict or estimate the unknown values of one variable from known values of another
variable is called regression. A line of regression is a straight line.

LINES OF REGRESSION
If the variables, which are highly correlated, are plotted on a graph then the points lie in a
narrow strip. If all the points in the scatter diagram cluster around a straight line, the line is
called the line of regression. The line of regression is the line of best fit and is obtained by
the principle of least squares.
Line of Regression of 𝒚 on 𝒙:
It is the line which gives the best estimate for the values of y for any given values of x.
The regression equation of y on x is given by
$$(y - \bar{y}) = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$

It is also written as
$$y = a + bx$$
Line of regression of x on y:
It is the line which gives the best estimate for the values of x for any given values of y. The
regression equation of x on y is given by
$$(x - \bar{x}) = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$

It is also written as
$$x = a + by$$
where $\bar{x}$ and $\bar{y}$ are the means of the x series and y series respectively, $\sigma_x$ and $\sigma_y$ are the
standard deviations of the x series and y series respectively, and r is the correlation
coefficient between x and y.

REGRESSION COEFFICIENTS
The slope $b_{yx}$ of the line of regression of y on x is also called the coefficient of regression of
y on x. It represents the increment in the value of y corresponding to a unit change in the
value of x.

$$b_{yx} = \text{Regression coefficient of } y \text{ on } x = r\frac{\sigma_y}{\sigma_x}$$
Similarly, the slope $b_{xy}$ of the line of regression of x on y is called the coefficient of regression
of x on y. It represents the increment in the value of x corresponding to a unit change in the
value of y.

$$b_{xy} = \text{Regression coefficient of } x \text{ on } y = r\frac{\sigma_x}{\sigma_y}$$

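Expressions for Regression Coefficients:

(a)
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

and
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$

(b)
$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$$
and
$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$$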
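
Form (b) translates directly into code. The following minimal Python sketch is illustrative only; the function name `regression_coefficients` is an assumption, and the data are those of Example 2 further below.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) from raw sums, following form (b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = sxy - sx * sy / n          # shared by both coefficients
    b_yx = numerator / (sxx - sx ** 2 / n)
    b_xy = numerator / (syy - sy ** 2 / n)
    return b_yx, b_xy

b_yx, b_xy = regression_coefficients([4, 2, 3, 4, 2], [2, 3, 2, 4, 4])
print(b_yx, b_xy)
```

Both coefficients share the same numerator, so they always have the same sign as each other and as the covariance.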

Properties of Regression Coefficient:

(1) The coefficient of correlation is the geometric mean of the coefficients of regression,
i.e., $r = \pm\sqrt{b_{yx}\, b_{xy}}$, where r takes the common sign of the two regression coefficients.
(2) If one of the regression coefficients is greater than one in magnitude, the other must be
less than one in magnitude.
(3) The arithmetic mean of the regression coefficients is greater than or equal to the coefficient
of correlation.
(4) Regression coefficients are independent of the change of origin but not of scale.
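
Properties (1) to (3) can be checked numerically. The sketch below is an illustration, not a proof; the helper name `regression_coefficients` is an assumption, and the data are borrowed from Example 1 of the Karl Pearson section, where r = -0.9203.

```python
from math import sqrt, copysign

def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    numerator = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    b_yx = numerator / (sum(x * x for x in xs) - sx ** 2 / n)
    b_xy = numerator / (sum(y * y for y in ys) - sy ** 2 / n)
    return b_yx, b_xy

xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]
b_yx, b_xy = regression_coefficients(xs, ys)

# Property (1): geometric mean, carrying the sign of the coefficients.
r = copysign(sqrt(b_yx * b_xy), b_yx)
print(round(r, 4))

# Property (2): one magnitude above 1 forces the other below 1.
print(abs(b_yx) > 1, abs(b_xy) < 1)

# Property (3): arithmetic mean of magnitudes is at least |r|.
print((abs(b_yx) + abs(b_xy)) / 2 >= abs(r))
```

Taking only the positive square root would report a positive r here even though both regression lines slope downwards, which is why the sign must be carried over.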
Example: 1
The following data regarding the heights (y) and weights (x) of 100 college students are
given:
$\sum x = 15000$, $\sum x^2 = 2272500$, $\sum xy = 1022250$,
$\sum y = 6800$, $\sum y^2 = 463025$.
Find the coefficient of correlation between height and weight and also the equations of the
lines of regression of height on weight and of weight on height.
Solution:
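n = 100
$$b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{1022250 - \frac{(15000)(6800)}{100}}{2272500 - \frac{(15000)^2}{100}} = 0.1$$

$$b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{1022250 - \frac{(15000)(6800)}{100}}{463025 - \frac{(6800)^2}{100}} = 3.6$$

$$r = \sqrt{b_{yx}\, b_{xy}} = \sqrt{(0.1)(3.6)} = 0.6$$

$$\bar{x} = \frac{\sum x}{n} = \frac{15000}{100} = 150$$
$$\bar{y} = \frac{\sum y}{n} = \frac{6800}{100} = 68$$
The equation of the line of regression of y on x is
$$(y - \bar{y}) = b_{yx}(x - \bar{x})$$
$$(y - 68) = 0.1(x - 150)$$
$$y = 0.1x + 53$$
The equation of the line of regression of x on y is
$$(x - \bar{x}) = b_{xy}(y - \bar{y})$$
$$x - 150 = 3.6(y - 68)$$
$$x = 3.6y - 94.8$$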
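
When only the sums are given, as in Example 1, both regression lines follow from the same few quantities. A minimal Python sketch is given below; the function name `regression_from_sums` and its return layout are assumptions made for illustration.

```python
def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (b_yx, c_yx, b_xy, c_xy) so that
    y = b_yx * x + c_yx  and  x = b_xy * y + c_xy."""
    numerator = sxy - sx * sy / n
    b_yx = numerator / (sxx - sx ** 2 / n)
    b_xy = numerator / (syy - sy ** 2 / n)
    x_bar, y_bar = sx / n, sy / n
    c_yx = y_bar - b_yx * x_bar   # intercept of the line of y on x
    c_xy = x_bar - b_xy * y_bar   # intercept of the line of x on y
    return b_yx, c_yx, b_xy, c_xy

# Sums from Example 1.
result = regression_from_sums(100, 15000, 6800, 2272500, 463025, 1022250)
print([round(v, 4) for v in result])
```

Both lines pass through the point of means (150, 68), which is a quick check on the hand calculation.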
Example: 2
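Find the regression coefficients $b_{yx}$ and $b_{xy}$ and hence find the correlation coefficient
between x and y for the following data:
x 4 2 3 4 2
y 2 3 2 4 4

Solution
n = 5
$x$    $y$    $x^2$   $y^2$   $xy$
4      2      16      4       8
2      3      4       9       6
3      2      9       4       6
4      4      16      16      16
2      4      4       16      8
$\sum x = 15$   $\sum y = 15$   $\sum x^2 = 49$   $\sum y^2 = 49$   $\sum xy = 44$

$$b_{yx} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{44 - \frac{(15)(15)}{5}}{49 - \frac{(15)^2}{5}} = -0.25$$
$$b_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}} = \frac{44 - \frac{(15)(15)}{5}}{49 - \frac{(15)^2}{5}} = -0.25$$

Since both regression coefficients are negative, r is also negative:
$$r = -\sqrt{b_{yx}\, b_{xy}} = -\sqrt{(-0.25)(-0.25)} = -0.25$$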

Exercise
1.Find the regression coefficient of y on x for the following data:
𝒙 1 2 3 4 5
𝒚 160 180 140 180 200

2. Find the equation of regression lines from the following data and also estimate 𝑦
for 𝑥 = 1 and 𝑥 for 𝑦 = 4.

𝒙 3 2 -1 6 4 -2 5 7

𝒚 5 13 12 -1 2 20 0 -3
3.Find the equation of regression lines and the correlation coefficient from the following data:
𝒙 28 41 40 38 35 33 46 32 36 33
𝒚 30 34 31 34 30 26 28 31 26 31

4. The following information is obtained for two variables x and y. Find the regression equation
of y on x. n = 10; $\sum x = 130$; $\sum x^2 = 2288$; $\sum xy = 3467$.

CURVE FITTING
Curve fitting is the process of finding the ‘best-fit’ curve for a given set of data. It is the
representation of the relationship between two variables by means of an algebraic equation.
On the basis of this mathematical equation, predictions can be made in many statistical problems.
Suppose a set of n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ of the two variables x and y is
given. These values are plotted on a rectangular coordinate system, i.e., the xy-plane. The
resulting set of points is known as a scatter diagram (Fig. 5.1). The scatter diagram exhibits the
trend and it is possible to visualize a smooth curve approximating the data. Such a curve is known
as an approximating curve.

METHODS:
1. Linear Regression: One of the simplest forms of curve fitting, linear regression assumes a
linear relationship between the independent and dependent variables. The goal is to find the
best-fitting line (or hyperplane in higher dimensions) that minimizes the sum of squared differences
between observed and predicted values ($Y = AX + B$ or $X = AY + B$).

2. Polynomial Regression: Polynomial regression extends linear regression by allowing the
model to include higher-degree polynomials. This flexibility enables a better fit for nonlinear
relationships in the data ($Y = AX^2 + BX + C$ or $X = AY^2 + BY + C$).
3. Exponential and Logarithmic: Exponential and logarithmic curve fitting is suitable for
datasets exhibiting exponential growth or decay. These models are often used in fields like
biology, physics, and finance (e.g., $Y = Ce^{AX}$).
Linear Regression:
Given the general form of a straight line
$$f(x) = ax + b$$
how can we pick the coefficients a and b that best fit the line to the data?
First question: What makes a particular straight line a 'good' fit?
Why does the blue line appear to us to fit the trend better?
• Consider the distance between the data and points on the line
• Add up the length of all the red and blue vertical lines
• This is an expression of the 'error' between data and fitted line
• The one line that provides a minimum error is then the 'best' straight line
Quantifying errors in a curve fit
(1) Positive and negative errors should count the same (a data point may lie above or below the line).
(2) Greater errors should be weighted more heavily.
We can do both of these things by squaring the distance.
Denote the data values as (x, y) and the points on the fitted line as (x, f(x)), and sum the
squared error over the n data points:

$$err = \sum_{i=1}^{n} d_i^2 = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + \cdots + (y_n - f(x_n))^2$$

$$= (y_1 - (ax_1 + b))^2 + (y_2 - (ax_2 + b))^2 + \cdots + (y_n - (ax_n + b))^2$$

$$= \sum_{i=1}^{n} (y_i - (ax_i + b))^2$$

The error is minimum when the first-order partial derivatives with respect to a and b are zero:

$$\frac{\partial\, err}{\partial a} = -2\sum_{i=1}^{n} x_i (y_i - ax_i - b) = 0, \qquad \frac{\partial\, err}{\partial b} = -2\sum_{i=1}^{n} (y_i - ax_i - b) = 0$$

$$\sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i^2 - b\sum_{i=1}^{n} x_i = 0, \qquad \sum_{i=1}^{n} y_i - a\sum_{i=1}^{n} x_i - b\sum_{i=1}^{n} 1 = 0$$

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i \quad \text{and} \quad \sum_{i=1}^{n} y_i = a\sum_{i=1}^{n} x_i + nb$$

Solve the equations

$$\sum y = a\sum x + nb \qquad (1)$$

$$\sum xy = a\sum x^2 + b\sum x \qquad (2)$$
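
Equations (1) and (2) are two linear equations in a and b and can be solved in closed form. The following Python sketch is one way to do so; the function name `fit_line` is an assumption, and the convention f(x) = ax + b (a is the slope, b the intercept) follows the derivation above. The data are those of Example 1 below, where the roles of the letters a and b are swapped.

```python
def fit_line(xs, ys):
    """Least-squares line f(x) = a*x + b from the normal equations.

    Returns (a, b) with a the slope and b the intercept.
    Assumes the x values are not all identical.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # eliminate b from (1), (2)
    b = (sy - a * sx) / n                          # back-substitute into (1)
    return a, b

slope, intercept = fit_line([1, 2, 3, 4, 6, 8], [2.4, 3, 3.6, 4, 5, 6])
print(round(slope, 4), round(intercept, 4))
```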

Example: 1 Fit a straight line to the following data:

𝑥 1 2 3 4 6 8
𝑦 2.4 3 3.6 4 5 6
Solution
Let the straight line to be fitted to the data be
$$y = a + bx$$
The normal equations are
$$\sum y = na + b\sum x \qquad (1)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad (2)$$
n = 6

$x$    $y$    $x^2$   $xy$
1      2.4    1       2.4
2      3      4       6
3      3.6    9       10.8
4      4      16      16
6      5      36      30
8      6      64      48
$\sum x = 24$   $\sum y = 24$   $\sum x^2 = 130$   $\sum xy = 113.2$

Substituting these values in Eqs (1) and (2),
24 = 6a + 24b (3)
113.2 = 24a + 130b (4)
Solving Eqs (3) and (4), we get
a = 1.9765
b = 0.5059
Hence, the required equation of the straight line is y = 1.9765 + 0.5059x

Example: 2 Fit a straight line to the following data. Also, estimate the value of y at 𝑥 = 2.5.
𝑥 0 1 2 3 4

𝑦 1 1.8 3.3 4.5 6.3

(Winter 2022-23)
Example: 3 Fit a straight line using least square method.
𝑥 0 0.5 1 1.5 2 2.5
𝑦 0 1.5 3 4.5 6 7.5
(Winter 2023-24)

Example: 4 Fit a straight line to the following data and hence find 𝑦 when 𝑥 = 70
𝑥 71 68 73 69 67 65 66 67
𝑦 69 72 70 70 68 67 68 64
(Summer 2023-24)
Polynomial Regression: We started the linear curve fit by choosing a generic form of the
straight line, $f(x) = ax + b$.
This is just one kind of function. There are an infinite number of generic forms we could
choose from, for almost any shape we want. Let's start with a simple extension of the linear
regression concept: a second-degree polynomial $f(x) = a + bx + cx^2$. Recall the examples of
sampled data.

Error - Least squares approach

$$err = \sum_{i=1}^{n} d_i^2 = (y_1 - f(x_1))^2 + (y_2 - f(x_2))^2 + \cdots + (y_n - f(x_n))^2$$

$$= (y_1 - (a + bx_1 + cx_1^2))^2 + (y_2 - (a + bx_2 + cx_2^2))^2 + \cdots + (y_n - (a + bx_n + cx_n^2))^2$$

$$= \sum_{i=1}^{n} (y_i - (a + bx_i + cx_i^2))^2$$

To minimize the error, set the partial derivatives with respect to a, b and c equal to 0.

$$\frac{\partial\, err}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2) = 0$$

$$\frac{\partial\, err}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - bx_i - cx_i^2) = 0$$

$$\frac{\partial\, err}{\partial c} = -2\sum_{i=1}^{n} x_i^2 (y_i - a - bx_i - cx_i^2) = 0$$

Simplifying these equations, we get

$$\sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^2$$

$$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i^3$$

$$\sum_{i=1}^{n} x_i^2 y_i = a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^4$$
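
The three normal equations form a 3 × 3 linear system in a, b and c. The sketch below, in plain Python, builds that system and solves it by Gaussian elimination; the names `solve_linear_system` and `fit_quadratic` are assumptions for illustration, and the data are those of Example 1 below.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):                       # back substitution
        known = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - known) / aug[i][i]
    return sol

def fit_quadratic(xs, ys):
    """Least-squares y = a + b*x + c*x^2; returns [a, b, c]."""
    s = [sum(x ** k for x in xs) for k in range(5)]      # s[k] = sum x^k, s[0] = n
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    matrix = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    return solve_linear_system(matrix, t)

a, b, c = fit_quadratic([1, 2, 3, 4], [1.7, 1.8, 2.3, 3.2])
print(round(a, 4), round(b, 4), round(c, 4))
print(round(a + b * 2.4 + c * 2.4 ** 2, 3))  # estimate of y(2.4)
```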
Example: 1
Fit a least squares quadratic curve to the following data:
𝑥 1 2 3 4
𝑦 1.7 1.8 2.3 3.2
Estimate 𝑦(2.4).
Solution:
Let the equation of the least squares quadratic curve (parabola) be $y = a + bx + cx^2$.
The normal equations are

$$\sum y = na + b\sum x + c\sum x^2 \qquad (1)$$

$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3 \qquad (2)$$

$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4 \qquad (3)$$

Here, n = 4
$x$    $y$    $x^2$   $x^3$   $x^4$   $xy$    $x^2 y$
1      1.7    1       1       1       1.7     1.7
2      1.8    4       8       16      3.6     7.2
3      2.3    9       27      81      6.9     20.7
4      3.2    16      64      256     12.8    51.2
$\sum x = 10$   $\sum y = 9$   $\sum x^2 = 30$   $\sum x^3 = 100$   $\sum x^4 = 354$   $\sum xy = 25$   $\sum x^2 y = 80.8$

Substituting these values in equations (1), (2) and (3),

9 = 4a + 10b + 30c
25 = 10a + 30b + 100c
80.8 = 30a + 100b + 354c

Solving the above equations, we get

a = 2, b = -0.5, c = 0.2
Hence, the required equation of the quadratic curve is
$$y = 2 - 0.5x + 0.2x^2$$
$$y(2.4) = 2 - (0.5)(2.4) + (0.2)(2.4)^2 = 1.952$$

Example: 2
Fit a second-degree polynomial using least square method to the following data:
𝑥 0 1 2 3 4
𝑦 1 1.8 1.3 2.5 6.3
Example: 3
Fit a second order polynomial $y = a + bx + cx^2$ to the following data, using the least squares method.
(Summer 2022-23)
𝑥 0 5 10 15 20
𝑦 7 11 16 20 26

Curve fitting - Other nonlinear fits (exponential)

Q: Will a polynomial of any order necessarily fit any set of data?
A: No. Many phenomena do not follow a polynomial form. They may be, for example,
exponential.
(1) General exponential equation $f(x) = Ce^{Ax}$

Taking the natural logarithm of both sides, we get

$$\ln y = \ln C + Ax$$
$$Y = b + aX, \quad \text{where } Y = \ln y,\ X = x,\ b = \ln C \text{ and } a = A$$
This is the equation of a line: the original data in the xy-plane are mapped into the XY-plane.
This is called linearization. The data (x, y) are transformed to (x, ln y).

To find the values of a and b we use the equations

$$\sum_{i=1}^{n} Y_i = a\sum_{i=1}^{n} X_i + nb \qquad (1)$$

$$\sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} X_i^2 + b\sum_{i=1}^{n} X_i \qquad (2)$$
After getting the values of a and b, $A = a$ and $C = e^{b}$ (the antilog of b to base e).
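
The linearization can be sketched in Python as follows. This is an illustrative sketch, not the notes' own procedure in code: the function name `fit_exponential` is an assumption, all y values are assumed positive so that ln y exists, and the data are those of the example below.

```python
from math import log, exp

def fit_exponential(xs, ys):
    """Fit y = C * exp(A * x) by a least-squares line through (x, ln y).

    Returns (C, A). Assumes every y is positive.
    """
    big_y = [log(y) for y in ys]                 # Y = ln y
    n = len(xs)
    sx, s_y = sum(xs), sum(big_y)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * yy for x, yy in zip(xs, big_y))
    a = (n * sxy - sx * s_y) / (n * sxx - sx * sx)   # slope, a = A
    b = (s_y - a * sx) / n                           # intercept, b = ln C
    return exp(b), a

C, A = fit_exponential([1, 5, 7, 9, 12], [10, 15, 12, 15, 21])
print(round(C, 4), round(A, 5))
```

Note that this minimizes the squared error in ln y rather than in y itself, which is the usual consequence of fitting after a logarithmic transformation.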

Example: An experiment gave the following values:

x 1 5 7 9 12
y 10 15 12 15 21

Fit an exponential curve $y = Ce^{Ax}$.

Solution:
$X_i = x_i$   $y_i$   $Y_i = \ln y_i$   $X_i^2$   $X_i Y_i$
1      10     2.302585    1      2.302585
5      15     2.70805     25     13.54025
7      12     2.484906    49     17.39435
9      15     2.70805     81     24.37245
12     21     3.044522    144    36.53427
$\sum X_i = 34$   $\sum Y_i = 13.24811$   $\sum X_i^2 = 300$   $\sum X_i Y_i = 94.1439$

Writing $B = \ln C$, the normal equations become
13.24811 = 34A + 5B
94.1439 = 300A + 34B

Solving, A = 0.05896 and B = 2.2487, so
C = antilog(B) = $e^{2.2487}$ = 9.4751

Hence, the best fit curve is $y = 9.4751\, e^{0.05896x}$

(2) $y = bx^a$

Taking $\log_{10}$ of both sides,

$$\log_{10} y = \log_{10} b + a\log_{10} x$$

$$Y = B + AX, \quad \text{where } Y = \log_{10} y,\ X = \log_{10} x,\ A = a \text{ and } B = \log_{10} b$$

$$\sum_{i=1}^{n} Y_i = nB + A\sum_{i=1}^{n} X_i \qquad (1)$$

$$\sum_{i=1}^{n} X_i Y_i = B\sum_{i=1}^{n} X_i + A\sum_{i=1}^{n} X_i^2 \qquad (2)$$

Example: An experiment gave the following values:

v (ft/min) 350 400 500 600
t (min) 61 26 7 2.6

It is known that v and t are connected by the relation $v = bt^a$. Find the best possible values
of a and b.
$v$    $t$    $Y = \log_{10} v$   $X = \log_{10} t$   $X^2$   $XY$
350    61     2.544068    1.78533     3.18740262     4.542001
400    26     2.60206     1.414973    2.002149575    3.681846
500    7      2.69897     0.845098    0.714190697    2.280894
600    2.6    2.778151    0.414973    0.17220288     1.152859
$\sum Y_i = 10.62325$   $\sum X_i = 4.460375$   $\sum X_i^2 = 6.075945772$   $\sum X_i Y_i = 11.6576$
Substituting in the normal equations

$$\sum_{i=1}^{n} Y_i = nB + A\sum_{i=1}^{n} X_i \qquad (1)$$

$$\sum_{i=1}^{n} X_i Y_i = B\sum_{i=1}^{n} X_i + A\sum_{i=1}^{n} X_i^2 \qquad (2)$$
10.62325 = 4B + 4.460375A
11.6576 = 4.460375B + 6.075945772A
On solving these equations, B = 2.845 and A = a = -0.17.
b = antilog(2.845) = 699.842
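
The same log-log linearization can be sketched in Python. The function name `fit_power` is an assumption for illustration, and all data values are assumed positive. Carried out at full floating-point precision, the fit gives roughly a ≈ -0.171 and b ≈ 702; the hand values above differ slightly because a was rounded to -0.17 before B was computed.

```python
from math import log10

def fit_power(xs, ys):
    """Fit y = b * x**a by a least-squares line through (log10 x, log10 y).

    Returns (b, a). Assumes every x and y is positive.
    """
    big_x = [log10(x) for x in xs]
    big_y = [log10(y) for y in ys]
    n = len(xs)
    sx, sy = sum(big_x), sum(big_y)
    sxx = sum(x * x for x in big_x)
    sxy = sum(x * y for x, y in zip(big_x, big_y))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # A = a
    intercept = (sy - slope * sx) / n                  # B = log10 b
    return 10 ** intercept, slope

t = [61, 26, 7, 2.6]
v = [350, 400, 500, 600]
b, a = fit_power(t, v)   # v = b * t**a, so t plays the role of x
print(round(b, 1), round(a, 3))
```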

3) The following values of T and l follow the law $T = al^n$. Test if this is so and find the best
values of a and n.
T 1.0 1.5 2.0 2.5
l 25 56.2 100 156
