Numerical Methods
Lecture 25 April 8, 2002
Curve Fitting or Least Squares Approximation
Linear Regression
Linear least squares approximation is sometimes called linear regression. Suppose we have n data points (x_k, y_k), k = 1, 2, \ldots, n, which are subject to experimental error and which look like they almost fit on a straight line, as in the figure.
[Figure: data points (x_k, y_k) scattered about a line y = ax + b; D_k(a, b) is the vertical deviation of the k-th point from the line.]
Instead of trying to find a smooth function which passes through all of the points, since the data may contain
experimental errors, we try to find the straight line, or regression line, that gives the best fit in the least
squares sense.
To make this more precise, suppose that the regression line or approximating line has equation
y = ax + b,
we want to measure how good a fit this line is to the data by minimizing the sum of the squares of the
deviations
D_k = y_k - (a x_k + b)
for k = 1, 2, . . . , n, that is, we want
E(a, b) = \sum_{k=1}^{n} D_k(a, b)^2 = \sum_{k=1}^{n} [y_k - (a x_k + b)]^2
to be a minimum.
As a function of the two variables a and b, the minimum of E(a, b) will occur either at a critical point where
\frac{\partial E(a, b)}{\partial a} = 0 \quad \text{and} \quad \frac{\partial E(a, b)}{\partial b} = 0,
or at a point where E(a, b) is not differentiable. Clearly, there are no points where E(a, b) is not differentiable,
so we should look for the minimum value of E(a, b) at a critical point. Therefore, we set \partial E(a, b)/\partial a = 0 and \partial E(a, b)/\partial b = 0 and solve for the values of a and b at the critical point. Note that we really should test the
critical point to see that we do, indeed, have a global minimum for the function E(a, b).
Differentiating E(a, b) = \sum_{k=1}^{n} [y_k - (a x_k + b)]^2, we have
0 = \frac{\partial E(a, b)}{\partial a} = \frac{\partial}{\partial a} \sum_{k=1}^{n} [y_k - (a x_k + b)]^2 = 2 \sum_{k=1}^{n} [y_k - (a x_k + b)](-x_k)
and
0 = \frac{\partial E(a, b)}{\partial b} = \frac{\partial}{\partial b} \sum_{k=1}^{n} [y_k - (a x_k + b)]^2 = 2 \sum_{k=1}^{n} [y_k - (a x_k + b)](-1).
Setting these derivatives equal to zero and rearranging, we obtain the normal equations

a \sum_{k=1}^{n} x_k^2 + b \sum_{k=1}^{n} x_k = \sum_{k=1}^{n} x_k y_k

a \sum_{k=1}^{n} x_k + b \, n = \sum_{k=1}^{n} y_k,

a system of 2 linear equations in the 2 unknowns a and b, whose solution is

a = \frac{n \sum x_k y_k - \sum x_k \sum y_k}{n \sum x_k^2 - \left( \sum x_k \right)^2}, \qquad b = \frac{\sum x_k^2 \sum y_k - \sum x_k \sum x_k y_k}{n \sum x_k^2 - \left( \sum x_k \right)^2}.

Now that we have the values of a and b, the line that best approximates the data points in the least squares sense is

y = ax + b,

and is called the least squares line or the regression line.
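For reference, here is a minimal sketch in Python (not part of the original notes) of how these closed-form expressions for a and b can be evaluated; the function name fit_line and the variable names are my own choices.

```python
def fit_line(x, y):
    """Least squares line y = a*x + b through the points (x[k], y[k]).

    Evaluates the closed-form solution of the 2x2 normal equations:
        a = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
        b = (Sxx*Sy - Sx*Sxy) / (n*Sxx - Sx^2)
    """
    n = len(x)
    Sx = sum(x)                                   # sum of x_k
    Sy = sum(y)                                   # sum of y_k
    Sxx = sum(xk * xk for xk in x)                # sum of x_k^2
    Sxy = sum(xk * yk for xk, yk in zip(x, y))    # sum of x_k * y_k
    d = n * Sxx - Sx * Sx                         # common denominator
    a = (n * Sxy - Sx * Sy) / d
    b = (Sxx * Sy - Sx * Sxy) / d
    return a, b
```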
Example. Suppose we are given the experimental data

(x_1, y_1) = (1, 1.3), (x_2, y_2) = (2, 3.5), (x_3, y_3) = (3, 4.2), (x_4, y_4) = (4, 5.0), (x_5, y_5) = (5, 7.0),
(x_6, y_6) = (6, 8.8), (x_7, y_7) = (7, 10.1), (x_8, y_8) = (8, 12.5), (x_9, y_9) = (9, 13.0), (x_{10}, y_{10}) = (10, 15.6).
Find the least squares line approximating this experimental data. Also, find the total least squares error
E(a, b) = \sum_{k=1}^{10} [y_k - (a x_k + b)]^2.
Solution. The data are tabulated below; note that the last column was completed after the coefficients a and b were found.

    x_k    y_k    x_k^2   x_k y_k   a x_k + b
    1      1.3    1       1.3       1.178
    2      3.5    4       7.0       2.716
    3      4.2    9       12.6      4.254
    4      5.0    16      20.0      5.792
    5      7.0    25      35.0      7.330
    6      8.8    36      52.8      8.868
    7      10.1   49      70.7      10.406
    8      12.5   64      100.0     11.944
    9      13.0   81      117.0     13.482
    10     15.6   100     156.0     15.020
    -----------------------------------------
    55     81.0   385     572.4

From the column sums we have n = 10, \sum x_k = 55, \sum y_k = 81, \sum x_k^2 = 385, and \sum x_k y_k = 572.4, so
a = \frac{10(572.4) - 55(81)}{10(385) - (55)^2} = 1.538
and
b = \frac{385(81) - 55(572.4)}{10(385) - (55)^2} = -0.360.
The equation of the least squares line is therefore
y = 1.538x - 0.360.
The approximate values given by the least squares technique at the data points are given in the last column
of the table.
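As a quick sanity check (not part of the original notes), the same coefficients can be reproduced with NumPy's polyfit routine applied to the data of this example:

```python
import numpy as np

x = np.arange(1, 11)
y = np.array([1.3, 3.5, 4.2, 5.0, 7.0, 8.8, 10.1, 12.5, 13.0, 15.6])

# polyfit with degree 1 returns [slope, intercept] of the least squares line.
a, b = np.polyfit(x, y, 1)
print(a, b)   # approximately 1.538 and -0.360
```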
The graph of the least squares line and the data points were plotted using gnuplot and are shown in the
figure below.
[Figure: gnuplot plot of the least squares line 1.538*t - 0.360 together with the data points from example1.dat.]
Polynomial Least Squares

Suppose now that we want to fit a polynomial of degree n,

p_n(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,

to the data points {(x_i, y_i) : i = 1, 2, \ldots, m}. If the polynomial were to pass through every data point, we would need

a_0 + a_1 x_1 + a_2 x_1^2 + \cdots + a_n x_1^n = y_1
a_0 + a_1 x_2 + a_2 x_2^2 + \cdots + a_n x_2^n = y_2
\vdots
a_0 + a_1 x_m + a_2 x_m^2 + \cdots + a_n x_m^n = y_m,

and we have more equations than unknowns, since m > n + 1, so the system is overdetermined; usually an overdetermined system has no solution.
In this situation, when there is no exact fit, we have to find values of the coefficients which give a best fit in
the least squares sense. Again, we choose the coefficients a0 , a1 , . . . , an to minimize the least squares error,
that is, the sum of the squares of the deviations,
E(a_0, a_1, \ldots, a_n) = \sum_{i=1}^{m} [y_i - p_n(x_i)]^2 = \sum_{i=1}^{m} y_i^2 - 2 \sum_{i=1}^{m} y_i p_n(x_i) + \sum_{i=1}^{m} p_n(x_i)^2.
After simplifying, we have

E(a_0, a_1, \ldots, a_n) = \sum_{i=1}^{m} y_i^2 - 2 \sum_{j=0}^{n} a_j \left( \sum_{i=1}^{m} x_i^j y_i \right) + \sum_{j=0}^{n} \sum_{k=0}^{n} a_j a_k \left( \sum_{i=1}^{m} x_i^{j+k} \right).

Setting the partial derivative of E with respect to each coefficient equal to zero gives

\frac{\partial E}{\partial a_j} = -2 \sum_{i=1}^{m} x_i^j y_i + 2 \sum_{k=0}^{n} a_k \left( \sum_{i=1}^{m} x_i^{j+k} \right) = 0

for j = 0, 1, 2, \ldots, n.
This is a system of n + 1 linear equations in n + 1 unknowns:
a_0 \sum_{i=1}^{m} x_i^0 + a_1 \sum_{i=1}^{m} x_i^1 + a_2 \sum_{i=1}^{m} x_i^2 + \cdots + a_n \sum_{i=1}^{m} x_i^n = \sum_{i=1}^{m} y_i x_i^0

a_0 \sum_{i=1}^{m} x_i^1 + a_1 \sum_{i=1}^{m} x_i^2 + a_2 \sum_{i=1}^{m} x_i^3 + \cdots + a_n \sum_{i=1}^{m} x_i^{n+1} = \sum_{i=1}^{m} y_i x_i^1

\vdots

a_0 \sum_{i=1}^{m} x_i^n + a_1 \sum_{i=1}^{m} x_i^{n+1} + a_2 \sum_{i=1}^{m} x_i^{n+2} + \cdots + a_n \sum_{i=1}^{m} x_i^{2n} = \sum_{i=1}^{m} y_i x_i^n.
It can be shown that this system of equations has a unique solution for a0 , a1 , . . . , an provided that the xi ,
i = 1, 2, . . . , m, are all distinct.
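A minimal sketch (mine, under the assumption that NumPy is available) of forming and solving this (n + 1) x (n + 1) system is given below; it builds the sums \sum_i x_i^{j+k} and \sum_i y_i x_i^j directly and hands the system to a standard linear solver.

```python
import numpy as np

def fit_poly(x, y, n):
    """Least squares polynomial p_n(x) = a_0 + a_1*x + ... + a_n*x^n.

    Builds the (n+1) x (n+1) normal equations
        sum_k a_k * (sum_i x_i^(j+k)) = sum_i y_i * x_i^j,   j = 0, ..., n,
    and solves them.  Returns the coefficients [a_0, ..., a_n].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.empty((n + 1, n + 1))
    rhs = np.empty(n + 1)
    for j in range(n + 1):
        rhs[j] = np.sum(y * x**j)
        for k in range(n + 1):
            A[j, k] = np.sum(x**(j + k))
    return np.linalg.solve(A, rhs)
```

Forming the normal equations explicitly like this can become badly conditioned as n grows, which is one more reason to keep the degree small; library routines such as numpy.polyfit avoid the normal equations and use a more stable least squares solver instead.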
For example, suppose we want to fit a quadratic

p_2(x) = a x^2 + b x + c

to the data. The least squares error is E(a, b, c) = \sum_{i=1}^{m} [y_i - (a x_i^2 + b x_i + c)]^2, and setting its partial derivatives equal to zero gives

\frac{\partial E}{\partial a} = \sum_{i=1}^{m} 2 [y_i - (a x_i^2 + b x_i + c)](-x_i^2) = 0

\frac{\partial E}{\partial b} = \sum_{i=1}^{m} 2 [y_i - (a x_i^2 + b x_i + c)](-x_i) = 0

\frac{\partial E}{\partial c} = \sum_{i=1}^{m} 2 [y_i - (a x_i^2 + b x_i + c)](-1) = 0.
The normal equations are
a \sum_{i=1}^{m} x_i^4 + b \sum_{i=1}^{m} x_i^3 + c \sum_{i=1}^{m} x_i^2 = \sum_{i=1}^{m} x_i^2 y_i

a \sum_{i=1}^{m} x_i^3 + b \sum_{i=1}^{m} x_i^2 + c \sum_{i=1}^{m} x_i = \sum_{i=1}^{m} x_i y_i

a \sum_{i=1}^{m} x_i^2 + b \sum_{i=1}^{m} x_i + c \sum_{i=1}^{m} 1 = \sum_{i=1}^{m} y_i,
which is a system of 3 linear equations in the 3 unknowns a, b, and c; it has a unique solution provided that x_1, x_2, \ldots, x_m are all distinct.
Note: In the general case, we can also determine the value of the least squares error
E(a_0, a_1, \ldots, a_n) = \sum_{i=1}^{m} [y_i - p_n(x_i)]^2
and this gives a numerical value which indicates how well the curve fits the data.
Also, different polynomials, or other functions, can be tried and the one that gives the smallest value of E
can be used.
High-order polynomials can give a perfect fit in principle, but in practice the results can be useless. The degree of the polynomial should be kept to a minimum, say no more than 2 or 3.
Example. Suppose we are given the experimental data

(x_1, y_1) = (1.0, 1.7), (x_2, y_2) = (2.0, 1.8), (x_3, y_3) = (3.0, 2.3), (x_4, y_4) = (4.0, 3.1).

A plot of the data seems to indicate that we should use a quadratic polynomial for the least squares approximation.
We choose a quadratic
p2 (x) = ax2 + bx + c
and use a table to calculate the summations used in the normal equations.
    x_i    y_i    x_i y_i   x_i^2 y_i   x_i^2   x_i^3   x_i^4   p_2(x_i)   [y_i - p_2(x_i)]^2
    1.0    1.7    1.7       1.7         1.0     1.0     1.0     1.695      2.5 x 10^{-5}
    2.0    1.8    3.6       7.2         4.0     8.0     16.0    1.815      2.25 x 10^{-4}
    3.0    2.3    6.9       20.7        9.0     27.0    81.0    2.285      2.25 x 10^{-4}
    4.0    3.1    12.4      49.6        16.0    64.0    256.0   3.105      2.5 x 10^{-5}
    -------------------------------------------------------------------------------------
    10.0   8.9    24.6      79.2        30.0    100.0   354.0              5.0 x 10^{-4}
Again, note that the last two columns were calculated after solving the normal equations for a, b, and c.
The normal equations are

354 a + 100 b + 30 c = 79.2
100 a + 30 b + 10 c = 24.6
30 a + 10 b + 4 c = 8.9.

Solving these equations, we have a = 0.175, b = -0.405, c = 1.925, and the quadratic which gives a best fit to the data set in the least squares sense is

p_2(x) = 0.175 x^2 - 0.405 x + 1.925,

and the value of the error is E(a, b, c) = 5.0 \times 10^{-4}. Using gnuplot we plotted the least squares quadratic and the data points on the same graph. The result is shown below.
[Figure: gnuplot plot of the least squares quadratic 0.175*t**2 - 0.405*t + 1.925 together with the data points from quadratic.dat.]
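The arithmetic above can be verified with a short script (my own, not from the notes); it assembles the 3 x 3 normal equations for this data set, solves them, and evaluates the least squares error.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.7, 1.8, 2.3, 3.1])

# Normal equations for p_2(x) = a*x^2 + b*x + c, unknowns ordered (a, b, c).
A = np.array([[np.sum(x**4), np.sum(x**3), np.sum(x**2)],
              [np.sum(x**3), np.sum(x**2), np.sum(x)],
              [np.sum(x**2), np.sum(x),    float(len(x))]])
rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])

a, b, c = np.linalg.solve(A, rhs)
E = np.sum((y - (a * x**2 + b * x + c))**2)
print(a, b, c, E)   # approximately 0.175, -0.405, 1.925, 5.0e-4
```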
If instead of a quadratic we had chosen a linear polynomial to get the best fit to the data, say

p_1(x) = ax + b,

then setting the partial derivatives of E(a, b) = \sum_{i=1}^{m} [y_i - (a x_i + b)]^2 equal to zero gives

\frac{\partial E(a, b)}{\partial a} = 2 \sum_{i=1}^{m} [y_i - (a x_i + b)](-x_i) = 0

\frac{\partial E(a, b)}{\partial b} = 2 \sum_{i=1}^{m} [y_i - (a x_i + b)](-1) = 0.
Now we use a table to compute the summations needed in the normal equations. Again, note that the last two columns were completed after the normal equations were solved for a and b.

    x_i    y_i    x_i y_i   x_i^2   p_1(x_i)   [y_i - p_1(x_i)]^2
    1.0    1.7    1.7       1.0     1.52       0.0324
    2.0    1.8    3.6       4.0     1.99       0.0361
    3.0    2.3    6.9       9.0     2.46       0.0256
    4.0    3.1    12.4      16.0    2.93       0.0289
    --------------------------------------------------
    10.0   8.9    24.6      30.0               0.123

The normal equations are

30 a + 10 b = 24.6
10 a + 4 b = 8.9,

and solving them gives a = 0.47 and b = 1.05.
Therefore, the linear polynomial that gives a best fit to the given data in the least squares sense is
y = 0.47x + 1.05.
Note that the least squares error in the linear case is E(a, b) = 0.123, while the least squares error in the quadratic case was E(a, b, c) = 5.0 \times 10^{-4}, and we conclude that the quadratic gives a better least squares fit to the data.
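A sketch of this comparison (mine, not from the notes): evaluate both fitted models at the data points and sum the squared deviations.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.7, 1.8, 2.3, 3.1])

quadratic = 0.175 * x**2 - 0.405 * x + 1.925   # least squares quadratic
line = 0.47 * x + 1.05                          # least squares line

print(np.sum((y - quadratic)**2))   # about 5.0e-4
print(np.sum((y - line)**2))        # about 0.123
```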
We plotted the least squares quadratic, the least squares line, and the original data on the same graph using
gnuplot, and the result is shown below.
[Figure: gnuplot plot of the least squares quadratic 0.175*t**2 - 0.405*t + 1.925, the least squares line 0.47*t + 1.05, and the data points from quadratic.dat.]
Note: There is no reason to use only polynomials in the least squares approximation process, we can use
other functions as well. Using polynomials, our basis functions were the powers:
{1, x, x^2, x^3, \ldots, x^n, \ldots}.

Instead of the powers, we could use some other family of functions determined by a sequence {a_1, a_2, \ldots, a_n, \ldots} of distinct positive real numbers. Or, we could use the trigonometric
functions to get a trigonometric polynomial, for example,
{1, cos x, sin x, cos 2x, sin 2x, . . . , sin nx, cos nx, . . . }.
Or, we could use any family of continuous functions which is linearly independent on the real line.
9
In most cases, we can reduce the problem to a linear regression problem, or to a quadratic least squares approximation, by making the appropriate transformations.
For example, if our original function is a hyperbola, say

p(x) = \frac{1}{a_0 + a_1 x},

then the transformed variable

z = \frac{1}{p(x)} = a_0 + a_1 x

is linear in x, so linear regression can be applied to the transformed data points (x_i, 1/y_i).
Similarly, if

p(x) = a b^x,

then \ln p(x) = \ln a + x \ln b, which is linear in x; if

p(x) = a x^b,

then \ln p(x) = \ln a + b \ln x, which is linear in \ln x; and if

p(x) = a_0 + a_1 \cos x,

then p(x) is already linear in the variable \cos x. In each case the problem reduces to linear regression on suitably transformed data, as in the sketch below.
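To illustrate one such transformation, here is a sketch (with made-up data, not from the notes) of fitting the exponential model p(x) = a b^x by applying linear regression to the points (x_i, ln y_i):

```python
import numpy as np

# Hypothetical data that roughly follow y = 2 * 1.5^x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.4, 6.9, 10.0, 15.4])

# ln y = ln a + x ln b is linear in x, so fit a straight line to (x, ln y).
slope, intercept = np.polyfit(x, np.log(y), 1)
a = np.exp(intercept)   # estimate of a
b = np.exp(slope)       # estimate of b
print(a, b)             # roughly 2 and 1.5
```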