
Module 4

Curve Fitting, Regression and Correlation

Engr. Gerard Ang
School of EECE
Curve Fitting and Regression

• Curve Fitting – the process of finding equations that approximate the straight lines and curves best fitting a given set of data.
• Regression – the process of predicting the dependent variable from data on the independent variable. The fitted relationship can be linear (a straight line) or curved (quadratic, cubic, etc.).
Approximating Curves

Equation                                 Description of the curve
y = ax + b                               Straight line
y = ax^2 + bx + c                        Parabola or quadratic curve
y = ax^3 + bx^2 + cx + d                 Cubic curve
y = ax^4 + bx^3 + cx^2 + dx + e          Quartic curve
y = an x^n + … + a2 x^2 + a1 x + a0      nth-degree curve
y = ab^x                                 Exponential curve
y = ax^b                                 Geometric curve (power function)
y = ab^x + c                             Modified exponential curve
y = 1/(ab^x + q)                         Logistic curve
y = be^(mx)                              Exponential function
y = 1/(mx + b)                           Reciprocal function
Linear Least Squares Approximation or Regression

Linear regression fits a straight line through a set of n points in such a way that the sum of the squared residuals (or offsets) of the model, that is, the vertical distances between the points of the data set and the fitted line, is as small as possible.
Linear Least Squares Approximation or Regression

Let Yi represent an exact value, and let yi represent the approximate value given by the equation

   yi = a1 xi + a0

where xi is a particular value of the variable assumed to be free of error. Let ei = Yi − yi. The least-squares criterion requires that

   S = (e1)^2 + (e2)^2 + … + (eN)^2 = Σ (ei)^2

   S = Σ (Yi − yi)^2 = Σ (Yi − a1 xi − a0)^2

be a minimum, where N is the number of (x, Y) pairs. At the minimum of S, the partial derivatives ∂S/∂a1 and ∂S/∂a0 are both zero, which yields the so-called normal equations:

   a1 Σ xi^2 + a0 Σ xi = Σ xi Yi
   a1 Σ xi + a0 N = Σ Yi

Note: All the summations run from i = 1 to N. These are the normal equations, and they can be solved simultaneously for a1 and a0.
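
Although the worked examples below use Excel and polyfit, the normal equations can also be assembled and solved directly. The following is a minimal MATLAB sketch, using the temperature and resistance data of sample problem 1 below:

% Solve the linear least-squares normal equations directly.
x = [20.5 32.7 51.0 73.2 95.7];   % T, deg C (assumed free of error)
Y = [765 826 873 942 1032];       % R, ohms (observed values)
N = numel(x);

% Normal equations:
%   a1*sum(x.^2) + a0*sum(x) = sum(x.*Y)
%   a1*sum(x)    + a0*N      = sum(Y)
A = [sum(x.^2) sum(x); sum(x) N];
b = [sum(x.*Y); sum(Y)];
c = A \ b;                         % c(1) = a1, c(2) = a0
fprintf('a1 = %.4f, a0 = %.4f\n', c(1), c(2))
% Prints: a1 = 3.3949, a0 = 702.1721 (matching the Excel output below)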
Sample Problems on Linear Least Squares Regression

1. In a physics laboratory, a group of students conducted an experiment to determine the effects of temperature on resistance. They have recorded the temperature and resistance measurements as shown below. Determine the least-squares fit for this experiment.

   T, °C     20.5   32.7   51.0   73.2   95.7
   R, ohms   765    826    873    942    1032

2. Find the least-squares line that fits the following data, assuming that the x-values are free of error.

   x   1      2      3       4       5       6
   y   5.04   8.12   10.64   13.18   16.20   20.04
Excel Output for Linear Least Squares Regression

1. In a physics laboratory, a group of students conducted an experiment to determine the effects of temperature on resistance. They have recorded the temperature and resistance measurements as shown below. Determine the least-squares fit for this experiment.

   T, °C     20.5   32.7   51.0   73.2   95.7
   R, ohms   765    826    873    942    1032

   n     Ti      Ri     Ti^2       Ri*Ti
   1     20.5    765    420.25     15682.50
   2     32.7    826    1069.29    27010.20
   3     51.0    873    2601.00    44523.00
   4     73.2    942    5358.24    68954.40
   5     95.7    1032   9158.49    98762.40
   sum   273.1   4438   18607.27   254932.50

Therefore, the normal equations are
   18607.27a1 + 273.1a0 = 254932.5   (eq. 1)
   273.1a1 + 5a0 = 4438              (eq. 2)

Solving for a1 and a0 gives a1 = 3.3949 and a0 = 702.1721, so

   R = 3.3949T + 702.1721
Excel Output for Linear Least Squares Regression

2. Find the least-squares line that fits the following data, assuming that the x-values are free of error.

   x   1      2      3       4       5       6
   y   5.04   8.12   10.64   13.18   16.20   20.04

   n     xi   yi      xi^2    xi*yi
   1     1    5.04    1.00    5.04
   2     2    8.12    4.00    16.24
   3     3    10.64   9.00    31.92
   4     4    13.18   16.00   52.72
   5     5    16.20   25.00   81.00
   6     6    20.04   36.00   120.24
   sum   21   73.22   91.00   307.16

Therefore, the normal equations are
   91a1 + 21a0 = 307.16   (eq. 1)
   21a1 + 6a0 = 73.22     (eq. 2)

Solving for a1 and a0 gives a1 = 2.9080 and a0 = 2.0253, so

   y = 2.9080x + 2.0253
MATLAB Implementation
Polyfit and Polyval Commands/Functions

MATLAB has a built-in function polyfit(x, y, n) that fits a least-squares nth-order polynomial to data, where x and y are the vectors of the independent and the dependent variables, respectively, and n is the order of the polynomial.

The polyval(p, x) function computes the value of a polynomial at x using the coefficients obtained from polyfit, where p is the vector of polynomial coefficients.
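
As a minimal sketch of the typical workflow (here assuming the data of sample problem 2), the fit from polyfit can be evaluated with polyval over a dense grid and plotted against the data:

% Fit a first-order polynomial and plot it against the data.
x = [1 2 3 4 5 6];
y = [5.04 8.12 10.64 13.18 16.20 20.04];

p  = polyfit(x, y, 1);            % p(1) = slope a1, p(2) = intercept a0
xf = linspace(min(x), max(x));    % dense grid for a smooth line
yf = polyval(p, xf);              % evaluate the fitted polynomial

plot(x, y, 'o', xf, yf, '-')      % data points and fitted line
xlabel('x'), ylabel('y')
legend('data', 'least-squares fit', 'Location', 'northwest')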
MATLAB Implementation

1. In a physics laboratory, a group of students conducted an experiment to determine the effects of temperature on resistance. They have recorded the temperature and resistance measurements as shown below. Determine the least-squares fit for this experiment.

   T, °C     20.5   32.7   51.0   73.2   95.7
   R, ohms   765    826    873    942    1032

>> T = [20.5 32.7 51 73.2 95.7];
>> R = [765 826 873 942 1032];
>> a = polyfit(T,R,1)

a =
    3.3949  702.1721

>> R10 = polyval(a,10)

R10 =
  736.1208

>>
MATLAB Implementation

2. Find the least-squares line that fits the following data, assuming that the x-values are free of error.

   x   1      2      3       4       5       6
   y   5.04   8.12   10.64   13.18   16.20   20.04

>> x = [1 2 3 4 5 6];
>> y = [5.04 8.12 10.64 13.18 16.20 20.04];
>> a = polyfit(x,y,1)

a =
    2.9080    2.0253

>> y10 = polyval(a,10)

y10 =
   31.1053

>>
Linear Regression Analysis

• A regression model is a mathematical equation that describes the relationship between two or more variables.
• A simple regression model includes only two variables: one independent and one dependent. The relationship between the two variables in a regression analysis is expressed by a mathematical equation called a regression equation or model. A regression equation that gives a straight-line relationship between two variables is called a linear regression model; otherwise, it is called a nonlinear regression model.
Quantification of Error of Linear Regression

The standard deviation of the errors measures the spread of the errors around the regression line. The standard deviation of errors (Se) is calculated using

   Se = √( SSE / (n − 2) )

Where: Se = standard error of the estimate
       SSE = sum of the squares of the errors (or residuals)

   SSE = Σ (yi − a1 xi − a0)^2

Coefficient of determination, r^2 – it provides a measure of how well future outcomes are likely to be predicted by the model. It always lies between 0 and 1. A value of r^2 near 0 suggests that the regression equation is not very useful for making predictions, whereas a value of r^2 near 1 suggests that the regression equation is quite useful for making predictions.

   r^2 = SSR/SST = (SST − SSE)/SST

Where: SST = total sum of the squares
       SSR = regression sum of squares
       SST = Σ (yi − ȳ)^2, with ȳ = (Σ yi)/n
Quantification of Error of Linear Regression

Linear correlation coefficient – a measure of the relationship between two variables. It measures how closely the points in a scatter diagram are spread around the regression line. The linear correlation coefficient is sometimes referred to as the Pearson product moment correlation coefficient in honor of Karl Pearson.

   a. Perfect positive linear correlation: r = 1
   b. Perfect negative linear correlation: r = −1
   c. No linear correlation: r = 0

   r = [ n Σ xy − (Σ x)(Σ y) ] / √( [ n Σ x^2 − (Σ x)^2 ] [ n Σ y^2 − (Σ y)^2 ] )

or

   r = √(SSR/SST) = √( (SST − SSE)/SST )

with the sign of r taken to match the sign of the slope a1.
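
A minimal MATLAB sketch of these error metrics, using the temperature and resistance data of the sample problem below (it reproduces the Excel output that follows):

% Quantify the error of a linear least-squares fit.
x = [20.5 32.7 51.0 73.2 95.7];    % T, deg C
y = [765 826 873 942 1032];        % R, ohms
n = numel(x);

p  = polyfit(x, y, 1);             % p = [a1 a0]
yf = polyval(p, x);                % fitted values

SSE = sum((y - yf).^2);            % error (residual) sum of squares
SST = sum((y - mean(y)).^2);       % total sum of squares
SSR = SST - SSE;                   % regression sum of squares
Se  = sqrt(SSE / (n - 2));         % standard error of the estimate
r2  = SSR / SST;                   % coefficient of determination
r   = sqrt(r2);                    % Pearson r (slope here is positive)

fprintf('Se = %.4f, SSE = %.4f, r^2 = %.6f, r = %.6f\n', Se, SSE, r2, r)
% Prints: Se = 10.2477, SSE = 315.0459, r^2 = 0.992648, r = 0.996317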
Sample Problem on Quantification of Error of Linear Regression

1. In a physics laboratory, a group of students conducted an experiment to determine the effects of temperature on resistance. They have recorded the temperature and resistance measurements as shown below.

   T, °C     20.5   32.7   51.0   73.2   95.7
   R, ohms   765    826    873    942    1032

Determine:
   a. standard deviation of errors, Se
   b. error sum of squares, SSE
   c. total sum of squares, SST
   d. regression sum of squares, SSR
   e. the coefficient of determination, r^2
   f. Pearson's correlation coefficient, r
Excel Output on Quantification of Error of Linear Regression

1. In a physics laboratory, a group of students conducted an experiment to determine the effects of temperature on resistance. They have recorded the temperature and resistance measurements as shown below.

   T, °C     20.5   32.7   51.0   73.2   95.7
   R, ohms   765    826    873    942    1032

   n     T (x)   R (y)   a1       a0         ei^2       ybar    (yi − ybar)^2
   1     20.5    765     3.3949   702.1721   45.79973   887.6   15030.76
   2     32.7    826     3.3949   702.1721   164.2158   887.6   3794.56
   3     51.0    873     3.3949   702.1721   5.345344   887.6   213.16
   4     73.2    942     3.3949   702.1721   75.32122   887.6   2959.36
   5     95.7    1032    3.3949   702.1721   24.3638    887.6   20851.36
   sum           4438                        315.0459           42849.2

   a. Se    10.2477
   b. SSE   315.0459
   c. SST   42849.2
   d. SSR   42534.15
   e. r^2   0.992648
   f. r     0.996317
Sample Problem on Quantification of Error of Linear Regression

2. Given the following data:

   x   1      2      3       4       5       6
   y   5.04   8.12   10.64   13.18   16.20   20.04

Determine:
   a. standard deviation of errors, Se
   b. error sum of squares, SSE
   c. total sum of squares, SST
   d. regression sum of squares, SSR
   e. the coefficient of determination, r^2
   f. Pearson's correlation coefficient, r
Excel Output on Quantification of Error of Linear Regression

2. Given the following data:

   x   1      2      3       4       5       6
   y   5.04   8.12   10.64   13.18   16.20   20.04

   n     xi   yi      a1      a0       ei^2       ybar       (yi − ybar)^2
   1     1    5.04    2.908   2.0253   0.011385   12.20333   51.31334
   2     2    8.12    2.908   2.0253   0.077674   12.20333   16.67361
   3     3    10.64   2.908   2.0253   0.011946   12.20333   2.444011
   4     4    13.18   2.908   2.0253   0.227815   12.20333   0.953878
   5     5    16.20   2.908   2.0253   0.133444   12.20333   15.97334
   6     6    20.04   2.908   2.0253   0.321149   12.20333   61.41334
   sum        73.22                    0.783413              148.7715

   a. Se    0.442553
   b. SSE   0.783413
   c. SST   148.7715
   d. SSR   147.9881
   e. r^2   0.994734
   f. r     0.997364
Polynomial Regression

The least-squares procedure can be readily extended to fit the data to a higher-order polynomial. For example, suppose that we fit a second-order polynomial, or quadratic:

   y = a2 x^2 + a1 x + a0 + e

For this case the sum of the squares of the residuals is

   S = Σ (yi − a2 xi^2 − a1 xi − a0)^2
Polynomial Regression

The least-squares criterion requires that

   S = Σ (yi − a2 xi^2 − a1 xi − a0)^2

be a minimum, where N is the number of (x, y) pairs. At the minimum of S, the partial derivatives ∂S/∂a0, ∂S/∂a1 and ∂S/∂a2 are all zero, which yields the following normal equations:

   a0 N + a1 Σ xi + a2 Σ xi^2 = Σ yi
   a0 Σ xi + a1 Σ xi^2 + a2 Σ xi^3 = Σ xi yi
   a0 Σ xi^2 + a1 Σ xi^3 + a2 Σ xi^4 = Σ xi^2 yi

Note: All the summations run from i = 1 to N.
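
These normal equations can be assembled and solved directly; the following is a minimal MATLAB sketch, using the data of the sample problem below:

% Solve the quadratic least-squares normal equations directly.
x = [0 1 2 3 4 5];
y = [2.1 7.7 13.6 27.2 40.9 61.1];
N = numel(x);

% Coefficient matrix and right-hand side of the normal equations.
A = [N          sum(x)     sum(x.^2);
     sum(x)     sum(x.^2)  sum(x.^3);
     sum(x.^2)  sum(x.^3)  sum(x.^4)];
b = [sum(y); sum(x.*y); sum(x.^2 .* y)];

c = A \ b;                     % c = [a0; a1; a2]
fprintf('a0 = %.4f, a1 = %.4f, a2 = %.4f\n', c(1), c(2), c(3))
% Prints: a0 = 2.4786, a1 = 2.3593, a2 = 1.8607 (matching the Excel output below)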


Sample Problem on Polynomial Regression

Fit a second-order polynomial to the data in the first two columns of the given table.

   x   0     1     2      3      4      5
   y   2.1   7.7   13.6   27.2   40.9   61.1
Excel Output for Polynomial Regression

Fit a second-order polynomial to the data in the first two columns of the given table.

   x   0     1     2      3      4      5
   y   2.1   7.7   13.6   27.2   40.9   61.1

   n     xi   yi      xi^2    xi^3     xi^4     xi*yi    xi^2*yi
   1     0    2.1     0.00    0.00     0.00     0.00     0.00
   2     1    7.7     1.00    1.00     1.00     7.70     7.70
   3     2    13.6    4.00    8.00     16.00    27.20    54.40
   4     3    27.2    9.00    27.00    81.00    81.60    244.80
   5     4    40.9    16.00   64.00    256.00   163.60   654.40
   6     5    61.1    25.00   125.00   625.00   305.50   1527.50
   sum   15   152.6   55.00   225.00   979.00   585.60   2488.80

Therefore, the normal equations are
   6a0 + 15a1 + 55a2 = 152.6       (eq. 1)
   15a0 + 55a1 + 225a2 = 585.6     (eq. 2)
   55a0 + 225a1 + 979a2 = 2488.8   (eq. 3)

Solving gives a0 = 2.4786, a1 = 2.3593, a2 = 1.8607, so

   y = 1.8607x^2 + 2.3593x + 2.4786


MATLAB Implementation

Fit a second-order polynomial to the data in the first two columns of the given table.

   x   0     1     2      3      4      5
   y   2.1   7.7   13.6   27.2   40.9   61.1

>> x = [0 1 2 3 4 5];
>> y = [2.1 7.7 13.6 27.2 40.9 61.1];
>> a = polyfit(x,y,2)

a =
    1.8607    2.3593    2.4786

>> y10 = polyval(a,10)

y10 =
  212.1429

>>
Multiple Linear Regression

A useful extension of linear regression is the case where y is a linear function of two or more independent variables. Consider a function y which is a linear function of x1 and x2, as in

   y = a0 + a1 x1 + a2 x2 + e

Such an equation is quite useful in fitting experimental data, where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression "line" becomes a plane. The best values of the coefficients are obtained by formulating the sum of the squares of the residuals:

   S = Σ (yi − a0 − a1 x1,i − a2 x2,i)^2    (summed from i = 1 to n)

At the minimum of S, the partial derivatives ∂S/∂a0, ∂S/∂a1 and ∂S/∂a2 are all equal to zero. Expressing the result in matrix form:

   [ n          Σ x1,i         Σ x2,i        ] [a0]   [ Σ yi       ]
   [ Σ x1,i     Σ (x1,i)^2     Σ x1,i x2,i   ] [a1] = [ Σ x1,i yi  ]
   [ Σ x2,i     Σ x1,i x2,i    Σ (x2,i)^2    ] [a2]   [ Σ x2,i yi  ]
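
A minimal MATLAB sketch that builds and solves this system, using the data of sample problem 1 below:

% Multiple linear regression: y = a0 + a1*x1 + a2*x2.
x1 = [0 1 1 2 2 3 3 4 4];
x2 = [0 1 2 1 2 1 2 1 2];
y  = [15 18 12.8 25.7 20.4 35 30 45.3 40.1];
n  = numel(y);

% Assemble the normal equations in matrix form.
A = [n        sum(x1)      sum(x2);
     sum(x1)  sum(x1.^2)   sum(x1.*x2);
     sum(x2)  sum(x1.*x2)  sum(x2.^2)];
b = [sum(y); sum(x1.*y); sum(x2.*y)];

c = A \ b;                     % c = [a0; a1; a2]
fprintf('a0 = %.4f, a1 = %.4f, a2 = %.4f\n', c(1), c(2), c(3))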
Sample Problems on Multiple Linear Regression

1. Use multiple linear regression to fit the following data:

   x1   0    1    1      2      2      3    3    4      4
   x2   0    1    2      1      2      1    2    1      2
   y    15   18   12.8   25.7   20.4   35   30   45.3   40.1

2. Use multiple linear regression to fit the following data:

   x1   0   0   1   2    1    1.5    3     3     -1
   x2   0   1   0   1    2    1      2     3     -1
   y    1   6   4   -4   -2   -1.5   -12   -15   17
Sample Problems on Multiple Linear Regression

1. Use multiple linear regression to fit the following data:

   x1   0    1    1      2      2      3    3    4      4
   x2   0    1    2      1      2      1    2    1      2
   y    15   18   12.8   25.7   20.4   35   30   45.3   40.1

   n     x1,i   x2,i   yi      x1,i^2   x1,i*x2,i   x2,i^2   x1,i*yi   x2,i*yi
   1     0      0      15      0        0           0        0         0
   2     1      1      18      1        1           1        18        18
   3     1      2      12.8    1        2           4        12.8      25.6
   4     2      1      25.7    4        2           1        51.4      25.7
   5     2      2      20.4    4        4           4        40.8      40.8
   6     3      1      35      9        3           1        105       35
   7     3      2      30      9        6           4        90        60
   8     4      1      45.3    16       4           1        181.2     45.3
   9     4      2      40.1    16       8           4        160.4     80.2
   sum   20     12     242.3   60       30          20       659.6     330.6
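
Substituting these sums into the matrix form above gives the normal equations (the solution below is computed from the tabulated sums):

   9a0 + 20a1 + 12a2 = 242.3
   20a0 + 60a1 + 30a2 = 659.6
   12a0 + 30a1 + 20a2 = 330.6

Solving: a0 = 14.4217, a1 = 8.9904, a2 = -5.6087, so the fitted plane is

   y = 14.4217 + 8.9904x1 - 5.6087x2

which the MATLAB sketch above reproduces.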
