1.1 3 - Interpolation - and - Curve - Fitting PDF
Linear Interpolation
Linear interpolation assumes a straight-line equation between every two adjacent points in the
data set and uses it to estimate the values that lie between them.
Example 1: Suppose you have a table of temperature readings for a chemical reaction, taken at even
time intervals, as follows:
From this set of data, if the temperature of the reaction at time 50 sec is required, how can it be
found? The simplest method, frequently used in school to find intermediate values from tables,
is linear interpolation. In this method, the section of the curve connecting the two known
points (x1, y1) and (x2, y2) is assumed to be a straight line. In this case, a straight line is imagined
between (40, 61.6) and (60, 71.2).
Since the slope is the same everywhere along the line (or, equivalently, by similarity of the two
triangles shown), the following equation can be written:
$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$
So, y at a given x will be
$$y = y_1 + \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$$
This simple formula can be saved in a programmable calculator and used to find interpolated
values from tables in various fields of science and engineering.
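The formula above can be sketched as a small Python function. This is only an illustration: the function name linear_interpolate is an assumption, and the only data used are the two points (40, 61.6) and (60, 71.2) quoted in Example 1.

```python
def linear_interpolate(x1, y1, x2, y2, x):
    """Return y at x on the straight line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)   # slope of the connecting line
    return y1 + slope * (x - x1)

# The two tabulated points that bracket t = 50 sec in Example 1
y50 = linear_interpolate(40, 61.6, 60, 71.2, 50)
print('For x = %.1f, y = %.1f' % (50, y50))   # prints: For x = 50.0, y = 66.4
```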
The disadvantage of the linear interpolation method is that all points other than the two
adjacent to the required one are ignored. The trend of the curve through the whole
data set is therefore not taken into account and, consequently, the interpolated point does not,
in general, lie exactly on that curve.
In this section, two well-known methods of interpolation will be introduced and coded,
Lagrange’s and Newton’s interpolation methods.
Lagrange’s Method
This method is based on creating a polynomial of degree n that passes through n+1 data points, so
the degree is fixed by the number of points considered in the data set. For example, a
third-degree (cubic) polynomial, n = 3, requires four data points and is written as follows:
$$y(x) = y_1 \ell_1(x) + y_2 \ell_2(x) + y_3 \ell_3(x) + y_4 \ell_4(x)$$
or
$$y(x) = \sum_{i=1}^{n+1} y_i\,\ell_i(x)$$
where
$$\ell_1(x) = \frac{(x - x_2)(x - x_3)(x - x_4)}{(x_1 - x_2)(x_1 - x_3)(x_1 - x_4)}$$
$$\ell_2(x) = \frac{(x - x_1)(x - x_3)(x - x_4)}{(x_2 - x_1)(x_2 - x_3)(x_2 - x_4)}$$
$$\ell_3(x) = \frac{(x - x_1)(x - x_2)(x - x_4)}{(x_3 - x_1)(x_3 - x_2)(x_3 - x_4)}$$
$$\ell_4(x) = \frac{(x - x_1)(x - x_2)(x - x_3)}{(x_4 - x_1)(x_4 - x_2)(x_4 - x_3)}$$
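or, in general,
$$\ell_i(x) = \prod_{\substack{j=1 \\ j \ne i}}^{n+1} \frac{x - x_j}{x_i - x_j}$$
so that
$$y(x) = \sum_{i=1}^{n+1} y_i \left( \prod_{\substack{j=1 \\ j \ne i}}^{n+1} \frac{x - x_j}{x_i - x_j} \right)$$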
This form can be used as the algorithm for the Lagrange interpolation program. To simplify coding,
let's construct the program step by step, using the time-temperature table given in
Example 1.
Step 1: Define the data set as two lists and the number of points m, where m = n+1.
Step 3: Construct the product loop inside the summation loop for j = 1 to m, skipping the case
j = i. The product variable should be initialized to one before the loop.
Step 4: Put the code parts together, adding an input statement at the beginning to enter the
x value (xp) and a statement at the end of the program to display the result.
# x, y (data lists), n (polynomial degree) and xp (x value entered) come from the earlier steps
yp = 0
for i in range(n+1):
    L = 1
    for j in range(n+1):
        if j != i:
            L *= (xp - x[j])/(x[i] - x[j])
    yp += y[i]*L
print('For x = %.1f, y = %.1f' % (xp, yp))
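The same loops can be wrapped in a reusable function, as in the sketch below. The function name lagrange_interpolate and the sample data are assumptions made for illustration only; the data are samples of y = x**2 at unevenly spaced x values, not the table of Example 1. Because a cubic through four points of a parabola reproduces the parabola, the result can be checked by hand.

```python
def lagrange_interpolate(x, y, xp):
    """Evaluate the Lagrange polynomial through the points (x[i], y[i]) at xp."""
    m = len(x)                      # number of points, m = n + 1
    yp = 0.0                        # summation variable, initialized to zero
    for i in range(m):
        L = 1.0                     # product variable l_i(xp), initialized to one
        for j in range(m):
            if j != i:              # skip the j = i factor
                L *= (xp - x[j]) / (x[i] - x[j])
        yp += y[i] * L
    return yp

# Illustrative data (NOT the table of Example 1): y = x**2 at unevenly spaced x
xs = [0.0, 1.0, 2.5, 4.0]
ys = [v**2 for v in xs]
print('For x = %.1f, y = %.1f' % (3.0, lagrange_interpolate(xs, ys, 3.0)))
```

The printed value should be very close to 9, the exact value of x**2 at x = 3.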
The following graph shows the difference between the 5th-degree polynomial curve produced
by Lagrange's method and the linear connection between data points. The interpolation
results of the two methods are closer to each other between 20 and 60 seconds. (Check
the difference between the results of the two methods at time = 10 sec.)
Finally, the points in the example are equally spaced in time, but Lagrange's method can be applied
to unequally spaced data points as well.
Exercise: Using the Lagrange interpolation method, find the value of the expansion ratio
corresponding to a weight of 5.5 lb from the tensile test readings tabulated as follows:
Newton’s Method
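Newton's method is applied to given data points in order to obtain a polynomial in the form
$$y(x) = a_0 + a_1 (x - x_1) + a_2 (x - x_1)(x - x_2) + \cdots + a_n (x - x_1)(x - x_2) \cdots (x - x_n)$$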
Divided differences are used to create a table consisting of the given data plus n columns of
differences, where n is the degree of the polynomial for n+1 data points. For example, the table
shown below represents the divided-difference levels for 4 data points.
Column (2) contains the differences of the second column with respect to the corresponding x values,
and its values are calculated as
$$y_i^{(2)} = \frac{y_i^{(1)} - y_1^{(1)}}{x_i - x_1}, \qquad i = 2, 3, 4$$
Similarly, column (3) contains the differences of the third column. So, its values will be
$$y_i^{(3)} = \frac{y_i^{(2)} - y_2^{(2)}}{x_i - x_2}, \qquad i = 3, 4$$
Finally, the last column contains a single value
$$y_4^{(4)} = \frac{y_4^{(3)} - y_3^{(3)}}{x_4 - x_3}$$
So, the general formula for the divided differences is
$$y_i^{(j+1)} = \frac{y_i^{(j)} - y_j^{(j)}}{x_i - x_j}, \qquad j = 1, \ldots, n \ \text{ and } \ i = j + 1, \ldots, n + 1$$
where $y_1^{(1)} = y_1$, $y_2^{(1)} = y_2$, and so on.
Example 2: Construct the divided differences table for the following data points:
Solution: The manual calculation of the given values results in the following table
Step 1: Define the data set as two lists and number of points m, where m = n+1.
import numpy as np
Step 3: Make two nested loops: the outer j-loop controls the table columns, and the inner i-loop
computes the differences according to the general formula of the divided differences.
for j in range(n):
    for i in range(j+1, n+1):
        Dy[i,j+1] = (Dy[i,j]-Dy[j,j])/(x[i]-x[j])
print(Dy)
Notice that the positions of i and j in the program mirror those in the general formula; only
their values are shifted by one, because Python indices start from 0. The output of this portion
of the code is
[[ 0. 0. 0. 0. 0. 0. ]
[ 0.9 0.6 0. 0. 0. 0. ]
[ 2.5 0.89285714 0.22527473 0. 0. 0. ]
[ 6.6 1.5 0.31034483 0.05316881 0. 0. ]
[ 7.7 1.26229508 0.14397719 -0.02463562 -0.04576731 0. ]
[ 8. 1. 0.06153846 -0.03148774 -0.02351571 0.01171137]]
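For reference, the table-building steps can be collected into one runnable sketch. The function name divided_differences is an assumption. The data arrays are also an assumption in the sense that they are recovered from the printed output above rather than copied from a listing: the first column gives the y values, and the x values follow from the first-difference column (for example 0.9/0.6 = 1.5 and 2.5/0.89285714 = 2.8).

```python
import numpy as np

# Data recovered from the printed table (assumption, see the note above)
x = np.array([0.0, 1.5, 2.8, 4.4, 6.1, 8.0])
y = np.array([0.0, 0.9, 2.5, 6.6, 7.7, 8.0])

def divided_differences(x, y):
    """Return the (n+1) x (n+1) divided differences table for the points (x, y)."""
    m = len(x)                  # number of points, m = n + 1
    n = m - 1                   # degree of the polynomial
    Dy = np.zeros((m, m))
    Dy[:, 0] = y                # first column holds the given y values
    for j in range(n):                  # outer loop: table columns
        for i in range(j + 1, n + 1):   # inner loop: rows below the diagonal
            Dy[i, j + 1] = (Dy[i, j] - Dy[j, j]) / (x[i] - x[j])
    return Dy

Dy = divided_differences(x, y)
print(Dy)
```

With these arrays the printed matrix should agree with the output shown above to the displayed precision.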
Returning to Newton's method, the coefficients of the polynomial are the values on the main
diagonal of the divided differences table. So,
$$a_0 = y_1^{(1)}, \quad a_1 = y_2^{(2)}, \quad a_2 = y_3^{(3)}, \quad \ldots, \quad a_n = y_{n+1}^{(n+1)}$$
This can be coded in two simple ways: (1) by constructing a one-dimensional array, a, and
copying the main diagonal elements into it, or (2) by using the main diagonal elements directly in
computing the polynomial terms. The second approach is better since it does not require additional
memory.
The second part of Newton's method is the calculation of the polynomial for a given x value. The
polynomial can be rewritten in the following general form
$$y(x) = a_0 + \sum_{i=1}^{n} \left[ \prod_{j=1}^{i} (x - x_j) \right] a_i$$
or
$$y(x) = y_1^{(1)} + \sum_{i=1}^{n} \left[ \prod_{j=1}^{i} (x - x_j) \right] y_{i+1}^{(i+1)}$$
Finally, the program can be put all together after adding input and display statements:
import numpy as np
# x, Dy and n come from the previous steps; xp is the x value entered by the user
yp = Dy[0,0]
for i in range(n):
    xprod = 1
    for j in range(i+1):
        xprod *= xp - x[j]
    yp += xprod*Dy[i+1,i+1]
print('For x = %.1f, y = %.1f' % (xp, yp))
As a first test of the program, let’s input values of x from the given table.
Enter x: 4.4
For x = 4.4, y = 6.6
Enter x: 8
For x = 8.0, y = 8.0
Enter x: 2.8
For x = 2.8, y = 2.5
Thus, the polynomial passes through the given points as shown in the following graph.
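As a complement, the sketch below illustrates approach (1) mentioned earlier: the diagonal is copied into a one-dimensional array a, and the polynomial is then evaluated in nested form, a0 + (x - x1)(a1 + (x - x2)(a2 + ...)). The nested evaluation is a swapped-in technique, algebraically equivalent to the summation form in the text but not the program given there. The function names are assumptions, and the data arrays are recovered from the printed divided differences table rather than taken from a listing.

```python
import numpy as np

def newton_coefficients(x, y):
    """Return a[0..n], the main diagonal of the divided differences table."""
    m = len(x)
    Dy = np.zeros((m, m))
    Dy[:, 0] = y
    for j in range(m - 1):
        for i in range(j + 1, m):
            Dy[i, j + 1] = (Dy[i, j] - Dy[j, j]) / (x[i] - x[j])
    return np.diag(Dy).copy()

def newton_eval(a, x, xp):
    """Evaluate Newton's polynomial at xp using nested multiplication."""
    yp = a[-1]
    for k in range(len(a) - 2, -1, -1):   # work from the innermost bracket outward
        yp = a[k] + (xp - x[k]) * yp
    return yp

# Data recovered from the printed table (assumption)
x = [0.0, 1.5, 2.8, 4.4, 6.1, 8.0]
y = [0.0, 0.9, 2.5, 6.6, 7.7, 8.0]
a = newton_coefficients(x, y)
for xp in (4.4, 8.0, 2.8):
    print('For x = %.1f, y = %.1f' % (xp, newton_eval(a, x, xp)))
```

The three printed lines should match the test run shown above, since the polynomial passes through the tabulated points.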
Curve Fitting
Curve fitting finds the equation of a curve that follows the given data points with the
least deviation from them. Thus, the main difference between interpolation and curve fitting
is that the fitted curve does not have to pass through all the given data points. The technique used
to find the curve equation is known as the least-squares method, in which the sum of the squares of
the differences between the given points and the fitting-function values is minimized.
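For a straight-line fit, the fitting function is
$$f(x) = a + bx$$
The coefficients a and b can be found by the equations
$$a = \frac{\bar{y} \sum x_i^2 - \bar{x} \sum x_i y_i}{\sum x_i^2 - n\bar{x}^2}$$
$$b = \frac{\sum x_i y_i - \bar{x} \sum y_i}{\sum x_i^2 - n\bar{x}^2}$$
where n is the number of data points, and $\bar{x}$ and $\bar{y}$ are the means of the x and y values.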
Example 3: Find the equation of the straight line that fits the data:
x 3 4 5 6 7 8
y 0 7 17 26 35 45
Solution: This problem can be solved in two simple ways:
1. Using a for-loop for all required summations and then calculation of a and b.
x = [3, 4, 5, 6, 7, 8]
y = [0, 7, 17, 26, 35, 45]
n = len(x) # number of data points
sumx = sumx2 = sumxy = sumy = 0
for i in range(n):
    sumx += x[i]
    sumx2 += x[i]**2
    sumxy += x[i]*y[i]
    sumy += y[i]
xm = sumx / n # mean of x values
ym = sumy / n # mean of y values
a = (ym*sumx2 - xm*sumxy)/(sumx2 - n*xm**2)
b = (sumxy - xm*sumy)/(sumx2 - n*xm**2)
print('The straight line equation:')
print('f(x) = (%.3f) + (%.3f)x'%(a,b))
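The same summations can also be written without an explicit loop by using NumPy arrays. The sketch below is one possible vectorized variant of the program above, offered as an illustration rather than as the second method the text refers to; the variable name denom is an assumption.

```python
import numpy as np

x = np.array([3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([0, 7, 17, 26, 35, 45], dtype=float)
n = len(x)                       # number of data points
xm, ym = x.mean(), y.mean()      # means of the x and y values

denom = np.sum(x**2) - n * xm**2                       # common denominator
a = (ym * np.sum(x**2) - xm * np.sum(x * y)) / denom   # intercept
b = (np.sum(x * y) - xm * np.sum(y)) / denom           # slope

print('The straight line equation:')
print('f(x) = (%.3f) + (%.3f)x' % (a, b))   # prints: f(x) = (-28.305) + (9.086)x
```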
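For a polynomial fit, the fitting function is
$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$
If a set of data containing m points is to be fitted by a polynomial curve of degree n, a system of
linear equations is formulated to compute the values of the coefficients
$$[A]\{a\} = \{B\}$$
where
$$[A] = \begin{bmatrix} m & \sum x_i & \sum x_i^2 & \cdots & \sum x_i^n \\ \sum x_i & \sum x_i^2 & \sum x_i^3 & \cdots & \sum x_i^{n+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sum x_i^n & \sum x_i^{n+1} & \sum x_i^{n+2} & \cdots & \sum x_i^{2n} \end{bmatrix}, \quad \{a\} = \begin{Bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{Bmatrix}, \quad \{B\} = \begin{Bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^n y_i \end{Bmatrix}$$
and
$$\sum = \sum_{i=1}^{m}$$
In other words, all summation signs shown above run from i = 1 to m.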
Once all the elements of the system are calculated, it can be solved by using a linear-system
technique such as Gauss elimination. In this section, the function solve() from the module
numpy.linalg will be used. This module contains linear algebra functions.
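For a quadratic polynomial (n = 2), the system takes the form
$$\begin{bmatrix} m & \sum x_i & \sum x_i^2 \\ \sum x_i & \sum x_i^2 & \sum x_i^3 \\ \sum x_i^2 & \sum x_i^3 & \sum x_i^4 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{Bmatrix}$$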
x 0 1 2 3 4 5
y 2 8 14 28 39 62
Step 1: Define the given data in x and y arrays, and construct the square matrix and the vectors of
the system with dimension n+1.
Step 2: Using two nested loops over rows and columns, define the elements of the matrix A. Looking
at the relation between the position of each element and the power of its x_i, it can be
seen that the power = row number + column number, given that the row and
column indices in Python start from 0. Likewise, the power of x_i in each row of B is equal to
the row number.
Step 3: At the end of the loops, the system can be solved for the vector of the polynomial
coefficients by using a numerical method.
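import numpy as np
x = np.arange(6)
y = np.array([2, 8, 14, 28, 39, 62])
m = len(x) # number of data points
n = 2 # degree of the polynomial
A = np.zeros((n+1, n+1))
B = np.zeros(n+1)
a = np.zeros(n+1)
# Loops of system formation
for row in range(n+1):
    for col in range(n+1):
        if row == 0 and col == 0:
            A[row,col] = m
            continue
        A[row,col] = np.sum(x**(row+col))
    B[row] = np.sum(x**(row) * y)
# Solve the system with solve() from numpy.linalg, as stated in the text
a = np.linalg.solve(A, B)
# Display statements (reconstructed here from the format of the output below)
print('The polynomial:')
print('f(x) = %f' % a[0])
for i in range(1, n+1):
    print('%+f x^%d' % (a[i], i))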
The output:
The polynomial:
f(x) = 2.678571
+2.253571 x^1
+1.875000 x^2
To fit the data given in the example with a cubic polynomial, all that has to be done is to
set n = 3, and the output will be
The polynomial:
f(x) = 1.928571
+5.678571 x^1
-0.000000 x^2
+0.250000 x^3
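To switch between degrees without editing the program, the system formation can be wrapped in a function that takes n as an argument. The sketch below is one way to do that; the name poly_fit is an assumption. As an optional cross-check it compares the result with np.polyfit, which returns the coefficients with the highest power first.

```python
import numpy as np

def poly_fit(x, y, n):
    """Least-squares coefficients a[0..n] of a degree-n polynomial fit to (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    powers = range(n + 1)
    # A[row, col] = sum(x**(row + col)),  B[row] = sum(x**row * y)
    A = np.array([[np.sum(x**(r + c)) for c in powers] for r in powers])
    B = np.array([np.sum(x**r * y) for r in powers])
    return np.linalg.solve(A, B)

x = np.arange(6)
y = np.array([2, 8, 14, 28, 39, 62])
for n in (2, 3):
    a = poly_fit(x, y, n)
    check = np.polyfit(x, y, n)[::-1]   # reversed so that the lowest power comes first
    print('n = %d:' % n, a, 'agrees with np.polyfit:', np.allclose(a, check))
```

The coefficients for n = 2 and n = 3 should reproduce the two outputs listed above.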
The graph shown below compares how the quadratic and cubic polynomial curves
fit the given data points. In practice, the best fitting curve is selected
not only according to the curve behavior, but also according to the required polynomial degree
and other conditions related to the physical problem.
Interpolation in SciPy
The module scipy.interpolate contains many one-dimensional and multidimensional interpolation
functions. In this section, the interp1d() and lagrange() interpolation functions
are applied to the data given in Example 1.
The value of y corresponding to x = 50 is equal to that obtained by using the linear interpolation
because the default interpolation kind is ‘linear’. Other kinds can be used for better evaluation.
>>> f = interp1d(x,y,'quadratic')
>>> print(f(50))
66.95208333333332
>>> f = interp1d(x,y,'cubic')
>>> print(f(50))
66.945
The cubic interpolation has resulted in a value of y equal to that obtained by the 5th-degree
Lagrange polynomial.
https://docs.scipy.org/doc/scipy/reference/interpolate.html
>>> L = linregress(x,y)
>>> L.slope
9.0857142857142854
>>> L.intercept
-28.3047619047619
For more information about linregress():
https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html
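For completeness, a self-contained version of the linregress() session is sketched below. It assumes that the x and y lists are the data of Example 3; this is an inference from the fact that the slope and intercept printed above agree with the straight-line formulas applied to that data.

```python
from scipy.stats import linregress

# Data of Example 3 (assumed to be the data behind the session above)
x = [3, 4, 5, 6, 7, 8]
y = [0, 7, 17, 26, 35, 45]

L = linregress(x, y)    # result object with slope, intercept and other statistics
print('f(x) = (%.3f) + (%.3f)x' % (L.intercept, L.slope))
```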
The polynomial curve fit is performed by the function curve_fit() from the module
scipy.optimize. It uses non-linear least squares to fit a function, f, to data. So, in addition
to the data points, a model function must be given, according to which the function that fits the
data is created. Example 4 will be solved by curve_fit() to compare the results.
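A minimal sketch of such a call is given below, assuming a quadratic model and the data used in the polynomial fit program above (x = 0..5). The model function name and its parameter names a0, a1, a2 are assumptions; curve_fit() determines the number of parameters from the signature of the model function.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a0, a1, a2):
    """Quadratic model function; curve_fit adjusts a0, a1 and a2."""
    return a0 + a1*x + a2*x**2

x = np.arange(6, dtype=float)
y = np.array([2, 8, 14, 28, 39, 62], dtype=float)

# popt holds the fitted parameters, pcov their estimated covariance matrix
popt, pcov = curve_fit(model, x, y)

print('The polynomial:')
print('f(x) = %f' % popt[0])
for i in range(1, len(popt)):
    print('%+f x^%d' % (popt[i], i))
```

The fitted parameters should agree closely with the quadratic coefficients obtained from the linear system, since both approaches minimize the same sum of squares.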