5 Curve Fitting and Interpolation
Introduction to Curve Fitting
Experimental data is often accompanied by “noise” (error). Even though all control parameters (independent
variables) remain constant, the resultant outcomes (dependent variables) vary.
Because any individual data point may be incorrect, we make no effort to intersect every point. Rather, the curve
is designed to follow the pattern of the points taken as a group. One approach of this nature is called least-
squares regression.
Several least-squares regression methods are available, depending on how the data are scattered:
• Linear regression
• Polynomial regression
• Multiple linear regression
• General linear least squares
• Nonlinear regression
Linear Regression
As seen in the previous figure, a line can represent the trend of the data. However, such a line is subjective; different analysts would draw different lines because of the scatter in the data (the points do not lie on a perfect straight line).
Therefore, some criterion is needed to establish a basis for the fit. One way to do this is to derive a curve that minimizes the discrepancy between the data points and the curve. A technique for accomplishing this objective is called least-squares regression.
The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The mathematical expression for the straight line is

$$y = a_0 + a_1 x + e \qquad (1)$$

where $a_0$ and $a_1$ are coefficients representing the intercept and the slope, respectively, and $e$ is the error, or residual, between the model and the observations, which can be represented by rearranging (1) as

$$e = y - a_0 - a_1 x$$

Thus, the error is the discrepancy between the true value of $y$ and the approximate value, $a_0 + a_1 x$, predicted by the linear equation.
Linear Regression
One strategy for fitting a "best" line through the data is to minimize the sum of the squares of the residual errors for all the available data, as in

$$S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2 \qquad (2)$$

where $n$ is the total number of data points.

To determine values for $a_0$ and $a_1$, Eq. (2) is differentiated with respect to each coefficient:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i \right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[ \left( y_i - a_0 - a_1 x_i \right) x_i \right]$$

Setting these derivatives equal to zero will result in a minimum $S_r$. If this is done, the equations can be expressed as

$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$
Linear Regression
Now, realizing that $\sum a_0 = n a_0$, we can express the equations as a set of two simultaneous linear equations with two unknowns ($a_0$ and $a_1$):

$$n a_0 + \left( \sum x_i \right) a_1 = \sum y_i \qquad (3)$$
$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i \qquad (4)$$

Eqs. (3) and (4) are called the normal equations. They can be solved simultaneously for the slope:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} \qquad (5)$$

This result can then be used in conjunction with Eq. (3) to solve for the intercept:

$$a_0 = \bar{y} - a_1 \bar{x} \qquad (6)$$

where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$, respectively.
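As a concrete illustration, here is a minimal sketch (assuming NumPy is available; the function name and the sample data are purely illustrative) that evaluates Eqs. (5) and (6) directly:

```python
import numpy as np

def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x using Eqs. (5) and (6)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    # Eq. (5): slope from the solved normal equations
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    # Eq. (6): intercept from the means of y and x
    a0 = np.mean(y) - a1 * np.mean(x)
    return a0, a1

# Hypothetical scattered data
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]
print(fit_line(x, y))   # (intercept a0, slope a1)
```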
Linear Regression
We still need to check the goodness of fit to ensure that the regression model fits the data well.

A "standard deviation" for the regression line can be used to quantify the spread of the data around the regression line. It is called the standard error of the estimate and is expressed as

$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}} \qquad (7)$$

The subscript notation "$y/x$" designates that the error is for a predicted value of $y$ corresponding to a particular value of $x$. Also, notice that we now divide by $n - 2$ because two data-derived estimates ($a_0$ and $a_1$) were used to compute $S_r$; thus, we have lost two degrees of freedom.

As with our discussion of the standard deviation, another justification for dividing by $n - 2$ is that there is no such thing as the "spread of data" around a straight line connecting two points. Thus, for the case where $n = 2$, Eq. (7) yields a meaningless result of infinity.
Linear Regression
Another criterion used to check the goodness of fit is the coefficient of determination, $r^2$. It is expressed as

$$r^2 = \frac{S_t - S_r}{S_t} \qquad (8)$$

where $S_t$ is the total sum of the squares of the residuals between the data points and the mean, expressed as

$$S_t = \sum \left( y_i - \bar{y} \right)^2 \qquad (9)$$

The $r^2$ provides a measure of how well observed outcomes are replicated by the model, as it quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value.
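A short sketch of how these goodness-of-fit measures might be computed for a fitted line (again assuming NumPy; the function name is illustrative):

```python
import numpy as np

def goodness_of_fit(x, y, a0, a1):
    """Standard error of the estimate, Eq. (7), and r^2, Eqs. (8)-(9), for y = a0 + a1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    Sr = np.sum((y - a0 - a1 * x) ** 2)   # sum of squared residuals about the fitted line
    St = np.sum((y - np.mean(y)) ** 2)    # Eq. (9): sum of squared residuals about the mean
    s_yx = np.sqrt(Sr / (n - 2))          # Eq. (7): standard error of the estimate
    r2 = (St - Sr) / St                   # Eq. (8): coefficient of determination
    return s_yx, r2
```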
For example, the figure below shows some data that are obviously curvilinear. Transformations can be used to express the data in a form that is compatible with linear regression.
[Figure: two side-by-side plots of y versus x (x from 0 to 6, y from 0 to 10) showing curvilinear data.]
Linear Regression for Curvilinear Data
1. One example of a nonlinear model is the simple power equation: $y = \alpha_1 x^{\beta_1}$.
2. A second is the exponential equation: $y = \alpha_2 e^{\beta_2 x}$, where $\alpha_2$ and $\beta_2$ are constants. This model characterizes quantities that increase (positive $\beta_2$) or decrease (negative $\beta_2$) at a rate that is directly proportional to their own magnitude.
3. A third is the saturation-growth-rate equation: $y = \dfrac{\alpha_3 x}{\beta_3 + x}$, where $\alpha_3$ and $\beta_3$ are constant coefficients. This model is particularly well-suited for characterizing population growth under limiting conditions.
A simple technique to solve these equations is to transform them into a linear form using mathematical manipulation. Then, linear regression can be employed to fit the equations to the data:

- Power equation: $\log_{10} y = \beta_1 \log_{10} x + \log_{10} \alpha_1$
- Exponential equation: $\ln y = \ln \alpha_2 + \beta_2 x \ln e = \ln \alpha_2 + \beta_2 x$
- Saturation-growth-rate equation: $\dfrac{1}{y} = \dfrac{\beta_3 + x}{\alpha_3 x} = \dfrac{1}{\alpha_3} + \dfrac{\beta_3}{\alpha_3} \dfrac{1}{x}$
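As a sketch of how such transformations might be applied in practice (assuming NumPy, with `np.polyfit` used for the degree-1 fit; the function names are illustrative):

```python
import numpy as np

def fit_power(x, y):
    """Fit y = alpha1 * x**beta1 by linear regression on log10(x) vs. log10(y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta1, log_alpha1 = np.polyfit(np.log10(x), np.log10(y), 1)  # slope, intercept
    return 10.0 ** log_alpha1, beta1

def fit_exponential(x, y):
    """Fit y = alpha2 * exp(beta2 * x) by linear regression on x vs. ln(y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta2, ln_alpha2 = np.polyfit(x, np.log(y), 1)
    return np.exp(ln_alpha2), beta2

def fit_saturation(x, y):
    """Fit y = alpha3*x/(beta3 + x) by linear regression on 1/x vs. 1/y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(1.0 / x, 1.0 / y, 1)  # slope = beta3/alpha3, intercept = 1/alpha3
    alpha3 = 1.0 / intercept
    return alpha3, slope * alpha3
```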
Polynomial Regression
Linear regression provides a powerful technique for fitting a best line to data. However, it is predicated on the
fact that the relationship between the dependent and independent variables is linear.
Some nonlinear data can be handled with linear regression if they are transformed into a linear form. However, many nonlinear data cannot be transformed in this way. Therefore, an alternative method is required, whereby polynomial regression is used to fit polynomials to the data.
The least-squares procedure can be readily extended to fit the data to a higher-order polynomial. For example, suppose that we fit a second-order polynomial, or quadratic:

$$y = a_0 + a_1 x + a_2 x^2 + e \qquad (10)$$

In this case, the sum of the squares of the residuals is

$$S_r = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)^2 \qquad (11)$$
Polynomial Regression
Following the procedure of the previous section, we take the derivative of Eq. (11) with respect to each of the unknown coefficients of the polynomial, as in

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left( y_i - a_0 - a_1 x_i - a_2 x_i^2 \right)$$

These equations can be set equal to zero and rearranged to develop the following set of normal equations:

$$n a_0 + \left( \sum x_i \right) a_1 + \left( \sum x_i^2 \right) a_2 = \sum y_i$$
$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 + \left( \sum x_i^3 \right) a_2 = \sum x_i y_i$$
$$\left( \sum x_i^2 \right) a_0 + \left( \sum x_i^3 \right) a_1 + \left( \sum x_i^4 \right) a_2 = \sum x_i^2 y_i$$
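One possible way to assemble and solve these normal equations numerically (a minimal sketch assuming NumPy; the function name is illustrative):

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares quadratic y = a0 + a1*x + a2*x^2 via the three normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    # Coefficient matrix and right-hand side of the normal equations
    A = np.array([[n,             x.sum(),        (x**2).sum()],
                  [x.sum(),       (x**2).sum(),   (x**3).sum()],
                  [(x**2).sum(),  (x**3).sum(),   (x**4).sum()]])
    b = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
    return np.linalg.solve(A, b)  # [a0, a1, a2]
```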
Polynomial Regression
For this case, we see that the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations.

The foregoing analysis can be easily extended to the more general case of an $m$th-order polynomial. Thus, we can recognize that determining the coefficients of an $m$th-order polynomial is equivalent to solving a system of $m + 1$ simultaneous linear equations. For this case, the standard error is formulated as

$$s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}} \qquad (12)$$
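A sketch of the general $m$th-order case, fitted here by solving the least-squares problem on a Vandermonde matrix rather than by forming the normal equations explicitly (assuming NumPy; the names are illustrative), with the standard error from Eq. (12):

```python
import numpy as np

def fit_poly(x, y, m):
    """Least-squares m-th order polynomial; returns coefficients [a0..am] and s_y/x from Eq. (12)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    V = np.vander(x, m + 1, increasing=True)     # columns: 1, x, x^2, ..., x^m
    a, *_ = np.linalg.lstsq(V, y, rcond=None)    # least-squares solution of V a = y
    Sr = np.sum((y - V @ a) ** 2)                # residual sum of squares
    s_yx = np.sqrt(Sr / (len(x) - (m + 1)))      # Eq. (12): standard error of the estimate
    return a, s_yx
```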
Interpolation
Recall that the general formula for an $n$th-order polynomial is

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

For $n + 1$ data points, there is one and only one polynomial of order $n$ that passes through all the points. However, there are a variety of mathematical formats in which this polynomial can be expressed. In this chapter, we will describe two alternatives that are well-suited for computer implementation: the Newton and the Lagrange polynomials.
Newton’s Divided-Difference Interpolating Polynomial
Polynomial interpolation consists of determining the unique $n$th-order polynomial that fits $n + 1$ data points. Suppose that the function $f(x)$ is tabulated at the points $x_0, x_1, \ldots, x_n$, where the points are not necessarily equidistant. The Newton interpolating polynomial of degree $n$ can be written as

$$f_n(x) = f(x_0) + f[x_1, x_0](x - x_0) + f[x_2, x_1, x_0](x - x_0)(x - x_1) + \cdots + f[x_n, x_{n-1}, \ldots, x_0](x - x_0) \cdots (x - x_{n-1})$$

where $f[x_0], f[x_1, x_0], f[x_2, x_1, x_0], \ldots, f[x_n, x_{n-1}, \ldots, x_0]$ are Newton's divided differences, which can be calculated from a divided-difference table; for example, the first divided difference is $f[x_1, x_0] = \dfrac{f(x_1) - f(x_0)}{x_1 - x_0}$, and each higher-order difference is built from differences of the order below.

The special feature of this form of the polynomial is that the coefficients $f[x_0]$ through $f[x_n, x_{n-1}, \ldots, x_0]$ can be determined using a simple mathematical procedure. Once the coefficients are known, the polynomial can be used for calculating an interpolated value at any $x$.

Newton's interpolating polynomials have additional desirable features that make them a popular choice. For example, the data points do not have to be in descending or ascending order, or in any particular order.
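A compact sketch of how the divided-difference coefficients could be built and the polynomial evaluated (assuming NumPy; the function name and the example points are illustrative):

```python
import numpy as np

def newton_interp(xdata, ydata, x):
    """Evaluate the Newton divided-difference polynomial through (xdata, ydata) at x."""
    xdata = np.asarray(xdata, float)
    coef = np.array(ydata, float)   # will be overwritten column by column
    n = len(xdata)
    # Build the divided-difference coefficients f[x0], f[x1,x0], ..., f[xn,...,x0] in place
    for j in range(1, n):
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (xdata[j:] - xdata[:-j])
    # Nested (Horner-like) evaluation of the Newton form
    result = coef[-1]
    for k in range(n - 2, -1, -1):
        result = result * (x - xdata[k]) + coef[k]
    return result

# Example with hypothetical data points (need not be equally spaced or ordered)
print(newton_interp([1.0, 4.0, 6.0, 5.0], [0.0, 1.3863, 1.7918, 1.6094], 2.0))
```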
Lagrange Interpolating Polynomial
The Lagrange interpolating polynomial is a reformulation of the Newton polynomial that avoids the computation of divided differences. The function $f(x)$ is approximated by using

$$f_n(x) = \sum_{i=0}^{n} L_i(x)\, f(x_i)$$

where

$$L_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}$$

is the Lagrange coefficient for $n$th-order interpolation.
Second order:

$$f_2(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)} f(x_0) + \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)} f(x_1) + \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)} f(x_2)$$
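A straightforward sketch of evaluating the Lagrange form (the function name and sample points are illustrative; the example performs a second-order interpolation through three hypothetical points):

```python
import numpy as np

def lagrange_interp(xdata, ydata, x):
    """Evaluate the Lagrange interpolating polynomial through (xdata, ydata) at x."""
    xdata = np.asarray(xdata, float)
    total = 0.0
    for i in range(len(xdata)):
        # L_i(x): product of (x - x_j)/(x_i - x_j) over all j != i
        L = 1.0
        for j in range(len(xdata)):
            if j != i:
                L *= (x - xdata[j]) / (xdata[i] - xdata[j])
        total += L * ydata[i]
    return total

# Second-order example through three hypothetical points
print(lagrange_interp([1.0, 4.0, 6.0], [0.0, 1.3863, 1.7918], 2.0))
```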