Numerical Integration
Figure 12.1. The area under the graph of a function.
exception that round-off errors are not of much interest for the integration methods.
This integral gives the area under the graph of f , with the area under the positive
part counting as positive area, and the area under the negative part of f counting
as negative area, see figure 12.1.
Before we continue, we need to define a term which we will use repeatedly
in our description of integration.
Definition 12.1. Let $a$ and $b$ be two real numbers with $a < b$. A partition of $[a, b]$ is a finite sequence $\{x_i\}_{i=0}^{n}$ of increasing numbers in $[a, b]$ with $x_0 = a$ and $x_n = b$,
$$a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b.$$
The partition is said to be uniform if there is a fixed number $h$, called the step length, such that $x_i - x_{i-1} = h = (b - a)/n$ for $i = 1, \ldots, n$.
Figure 12.2. The definition of the integral via inscribed and circumscribed step functions.
Given a partition of $[a, b]$, let $m_i$ and $M_i$ denote the minimum and maximum of $f$ on the subinterval $[x_{i-1}, x_i]$, and form the two sums
$$\underline{I} = \sum_{i=1}^{n} m_i (x_i - x_{i-1}), \qquad \overline{I} = \sum_{i=1}^{n} M_i (x_i - x_{i-1}).$$
To define the integral, we consider finer and finer partitions and consider the limits of $\underline{I}$ and $\overline{I}$ as the distance between neighbouring $x_i$s goes to zero. If those limits are the same, we say that $f$ is integrable, and the integral is given by this limit. More precisely,
$$I = \int_a^b f(x)\,dx = \sup \underline{I} = \inf \overline{I},$$
where the sup and inf are taken over all partitions of the interval [a, b]. This
process is illustrated in figure 12.2 where we see how the piecewise constant
approximations become better when the rectangles become narrower.
The above definition can be used as a numerical method for computing ap-
proximations to the integral. We choose to work with either maxima or minima,
select a partition of [a, b] as in figure 12.2, and add together the areas of the rect-
angles. The problem with this technique is that it can be both difficult and time
consuming to determine the maxima or minima, even on a computer.
However, it can be shown that the integral has a very convenient property: If we choose a point $t_i$ in each interval $[x_{i-1}, x_i]$, then the sum
$$\tilde{I} = \sum_{i=1}^{n} f(t_i)(x_i - x_{i-1})$$
will also converge to the integral when the distance between neighbouring $x_i$s goes to zero. If we choose $t_i$ equal to $x_{i-1}$ or $x_i$, we have a simple numerical method for computing the integral. An even better choice is the more symmetric $t_i = (x_i + x_{i-1})/2$, which leads to the approximation
$$I \approx \sum_{i=1}^{n} f\bigl((x_i + x_{i-1})/2\bigr)(x_i - x_{i-1}). \tag{12.1}$$
This is the so-called midpoint method which we will study in the next section.
In general, we can derive numerical integration methods by splitting the interval $[a, b]$ into small subintervals, approximating $f$ by a polynomial on each subinterval, integrating this polynomial rather than $f$, and then adding together the contributions from each subinterval. This is the strategy we will follow, and it works as long as $f$ can be approximated well by polynomials on each subinterval.
Algorithm 12.2. Let $f$ be a function which is integrable on the interval $[a, b]$, and let $\{x_i\}_{i=0}^{n}$ be a uniform partition of $[a, b]$. In the midpoint rule, the integral of $f$ is approximated by
$$\int_a^b f(x)\,dx \approx I_{\mathrm{mid}}(h) = h \sum_{i=1}^{n} f(x_{i-1/2}), \tag{12.2}$$
where
$$x_{i-1/2} = (x_i + x_{i-1})/2 = a + (i - 1/2)h.$$

Figure 12.3. The midpoint rule with one subinterval (a) and five subintervals (b).
This may seem like a strangely formulated algorithm, but all there is to it is to
compute the sum on the right in (12.2). The method is illustrated in figure 12.3
in the cases where we have one and five subintervals.
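To make the formula concrete, here is a minimal Python sketch of the midpoint rule (12.2); the function name and signature are our own choices, not from the text.

    def midpoint_rule(f, a, b, n):
        """Approximate the integral of f over [a, b] by the midpoint
        rule (12.2) with n subintervals of equal width."""
        h = (b - a) / n
        # Sum f at the midpoints x_{i-1/2} = a + (i - 1/2)h for i = 1, ..., n.
        return h * sum(f(a + (i - 0.5) * h) for i in range(1, n + 1))

For example, after import math, the call midpoint_rule(math.cos, 0, 1, 10) should return a value close to sin 1 ≈ 0.8415.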
Example 12.3. Let us test the midpoint rule on the integral
$$\int_0^1 \cos x \, dx = \sin 1 \approx 0.8414709848,$$
where we halve the step length each time. The result is
h          I_mid(h)      Error
0.500000   0.85030065   −8.8 × 10^{−3}
0.250000   0.84366632   −2.2 × 10^{−3}
0.125000   0.84201907   −5.5 × 10^{−4}
0.062500   0.84160796   −1.4 × 10^{−4}
0.031250   0.84150523   −3.4 × 10^{−5}
0.015625   0.84147954   −8.6 × 10^{−6}
0.007813   0.84147312   −2.1 × 10^{−6}
0.003906   0.84147152   −5.3 × 10^{−7}
0.001953   0.84147112   −1.3 × 10^{−7}
0.000977   0.84147102   −3.3 × 10^{−8}
Note that each time the step length is halved, the error seems to be reduced by a
factor of 4.
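The numbers in this table are easy to reproduce; the following lines, a sketch that assumes the midpoint_rule function from the earlier code, print the approximation and error for each halved step length.

    import math

    exact = math.sin(1)
    n = 2                          # corresponds to h = 0.5 on [0, 1]
    for _ in range(10):
        h = 1 / n
        approx = midpoint_rule(math.cos, 0, 1, n)
        print(f"{h:.6f}  {approx:.8f}  {approx - exact:.1e}")
        n *= 2                     # halve the step length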
Once again, Taylor polynomials with remainders help us out. We expand both $f(x)$ and $f(a_{1/2})$ about the left endpoint,
$$f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2} f''(\xi_1),$$
$$f(a_{1/2}) = f(a) + (a_{1/2} - a) f'(a) + \frac{(a_{1/2} - a)^2}{2} f''(\xi_2).$$
Since $a_{1/2} - a = (b - a)/2$, multiplying the second expansion by $(b - a)$ yields
$$f(a_{1/2})(b - a) = f(a)(b - a) + \frac{(b - a)^2}{2} f'(a) + \frac{(b - a)^3}{8} f''(\xi_2). \tag{12.4}$$
Next, we integrate the Taylor expansion and obtain
$$\begin{aligned}
\int_a^b f(x)\,dx &= \int_a^b \Bigl( f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2} f''(\xi_1) \Bigr)\,dx \\
&= f(a)(b - a) + \frac{1}{2}\bigl[(x - a)^2\bigr]_a^b\, f'(a) + \frac{1}{2}\int_a^b (x - a)^2 f''(\xi_1)\,dx \\
&= f(a)(b - a) + \frac{(b - a)^2}{2} f'(a) + \frac{1}{2}\int_a^b (x - a)^2 f''(\xi_1)\,dx.
\end{aligned} \tag{12.5}$$
We then see that the error can be written
$$\begin{aligned}
\Bigl| \int_a^b f(x)\,dx - f(a_{1/2})(b - a) \Bigr| &= \Bigl| \frac{1}{2}\int_a^b (x - a)^2 f''(\xi_1)\,dx - \frac{(b - a)^3}{8} f''(\xi_2) \Bigr| \\
&\le \frac{1}{2}\Bigl| \int_a^b (x - a)^2 f''(\xi_1)\,dx \Bigr| + \frac{(b - a)^3}{8} \bigl| f''(\xi_2) \bigr|.
\end{aligned} \tag{12.6}$$
For the last term, we use our standard trick,
$$\bigl| f''(\xi_2) \bigr| \le M = \max_{x \in [a,b]} \bigl| f''(x) \bigr|. \tag{12.7}$$
Note that since $\xi_2 \in (a, a_{1/2})$, we could just have taken the maximum over the interval $[a, a_{1/2}]$, but we will see later that it is more convenient to maximise over the whole interval $[a, b]$.
The first term in (12.6) needs some massaging. Let us do the work first, and explain afterwards,
$$\begin{aligned}
\frac{1}{2}\Bigl| \int_a^b (x - a)^2 f''(\xi_1)\,dx \Bigr| &\le \frac{1}{2}\int_a^b \bigl| (x - a)^2 f''(\xi_1) \bigr|\,dx \\
&= \frac{1}{2}\int_a^b (x - a)^2 \bigl| f''(\xi_1) \bigr|\,dx \\
&\le \frac{M}{2}\int_a^b (x - a)^2\,dx \\
&= \frac{M}{2}\,\frac{1}{3}\bigl[(x - a)^3\bigr]_a^b \\
&= \frac{M}{6}(b - a)^3.
\end{aligned} \tag{12.8}$$
The first inequality is valid because when we move the absolute value sign in-
side the integral sign, the function that we integrate becomes nonnegative ev-
erywhere. This means that in the areas where the integrand in the original ex-
pression is negative, everything is now positive, and hence the second integral
is larger than the first.
Next there is an equality which is valid because $(x - a)^2$ is never negative. The next inequality follows because we replace $\bigl| f''(\xi_1) \bigr|$ with its maximum on the interval $[a, b]$. The last step is just the evaluation of the integral of $(x - a)^2$.
We have now simplified both terms on the right in (12.6), so we have
$$\Bigl| \int_a^b f(x)\,dx - f(a_{1/2})(b - a) \Bigr| \le \frac{M}{6}(b - a)^3 + \frac{M}{8}(b - a)^3.$$
Since $1/6 + 1/8 = 7/24$, this proves the following lemma.
Lemma 12.4. Let $f$ be a continuous function whose first two derivatives are continuous on the interval $[a, b]$. The error in the midpoint method, with only one interval, is bounded by
$$\Bigl| \int_a^b f(x)\,dx - f(a_{1/2})(b - a) \Bigr| \le \frac{7M}{24}(b - a)^3,$$
where $M = \max_{x \in [a,b]} \bigl| f''(x) \bigr|$ and $a_{1/2} = (a + b)/2$.
The importance of this lemma lies in the factor (b−a)3 . This means that if we
reduce the size of the interval to half its width, the error in the midpoint method
will be reduced by a factor of 8.
Perhaps you feel completely lost in the work that led up to lemma 12.4. The wise way to read something like this is to first focus on the general idea that was used: Consider the error (12.3) and replace both $f(x)$ and $f(a_{1/2})$ by their Taylor polynomials with remainders. If we do this, a number of terms cancel out and we are left with (12.6). At this point we use some standard techniques that give us the final inequality.
Once you have an overview of the derivation, you should check that the de-
tails are correct and make sure you understand each step.
The total error is then
$$I - I_{\mathrm{mid}} = \sum_{i=1}^{n} \Bigl( \int_{x_{i-1}}^{x_i} f(x)\,dx - f(x_{i-1/2})h \Bigr).$$
But the expression inside the parenthesis is just the local error on the interval $[x_{i-1}, x_i]$. We therefore have
$$\begin{aligned}
|I - I_{\mathrm{mid}}| &= \Bigl| \sum_{i=1}^{n} \Bigl( \int_{x_{i-1}}^{x_i} f(x)\,dx - f(x_{i-1/2})h \Bigr) \Bigr| \\
&\le \sum_{i=1}^{n} \Bigl| \int_{x_{i-1}}^{x_i} f(x)\,dx - f(x_{i-1/2})h \Bigr| \\
&\le \sum_{i=1}^{n} \frac{7h^3}{24} M_i,
\end{aligned} \tag{12.9}$$
where $M_i$ is the maximum of $\bigl| f''(x) \bigr|$ on the interval $[x_{i-1}, x_i]$. To simplify the expression (12.9), we extend the maximum on $[x_{i-1}, x_i]$ to all of $[a, b]$. This will usually make the maximum larger, so for all $i$ we have
$$M_i = \max_{x \in [x_{i-1}, x_i]} \bigl| f''(x) \bigr| \le \max_{x \in [a,b]} \bigl| f''(x) \bigr| = M,$$
and therefore
$$|I - I_{\mathrm{mid}}| \le \frac{7h^3}{24}\, n M. \tag{12.10}$$
Here, we need one final little observation. Recall that $h = (b - a)/n$, so $hn = b - a$. If we insert this in (12.10), we obtain our main result.
Theorem 12.5. Suppose that $f$ and its first two derivatives are continuous on the interval $[a, b]$, and that the integral of $f$ on $[a, b]$ is approximated by the midpoint rule with $n$ subintervals of equal width,
$$I = \int_a^b f(x)\,dx \approx I_{\mathrm{mid}} = h \sum_{i=1}^{n} f(x_{i-1/2}).$$
Then the error is bounded by
$$|I - I_{\mathrm{mid}}| \le (b - a)\,\frac{7h^2}{24} \max_{x \in [a,b]} \bigl| f''(x) \bigr|. \tag{12.11}$$
This confirms the error behaviour that we saw in example 12.3: If $h$ is reduced by a factor of 2, the error is reduced by a factor of $2^2 = 4$.
One notable omission in our discussion of the midpoint method is round-off
error, which was a major concern in our study of numerical differentiation. The
good news is that round-off error is not usually a problem in numerical integra-
tion. The only situation where round-off may cause problems is when the value
of the integral is 0. In such a situation we may potentially add many numbers
that sum to 0, and this may lead to cancellation effects. However, this is so rare
that we will not discuss it here.
You should be aware of the fact that the error estimate (12.11) is not the best
possible in that the constant 7/24 can be reduced to 1/24, but then the deriva-
tion becomes much more complicated.
If we want the error to be smaller than some tolerance $\epsilon$, we can require
$$\frac{7h^2}{24}(b - a) \max_{x \in [a,b]} \bigl| f''(x) \bigr| \le \epsilon,$$
which we can easily solve for $h$,
$$h \le \sqrt{\frac{24\epsilon}{7(b - a)M}}, \qquad M = \max_{x \in [a,b]} \bigl| f''(x) \bigr|.$$
This is not quite as simple as it may look since we will have to estimate M , the
maximum value of the second derivative. This can be difficult, but in some cases
it is certainly possible, see exercise 1.
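As an illustration (our own example, not from the text), for f(x) = cos x on [0, 1] we have |f''(x)| = |cos x| ≤ 1, so we can take M = 1 and compute a step length from the bound directly.

    import math

    def steps_for_tolerance(a, b, M, eps):
        """Smallest n such that h = (b - a)/n satisfies the midpoint
        error bound 7h^2 (b - a) M / 24 <= eps."""
        h = math.sqrt(24 * eps / (7 * (b - a) * M))
        return math.ceil((b - a) / h)

    # f(x) = cos x on [0, 1], M = 1, tolerance 1e-10.
    print(steps_for_tolerance(0, 1, 1, 1e-10))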
In practice it is more common to compute a sequence of approximations
$$I_{\mathrm{mid}}(h_0),\; I_{\mathrm{mid}}(h_1),\; \ldots,\; I_{\mathrm{mid}}(h_k),\; \ldots,$$
where $h_k = h_0/2^k$. Suppose $I_{\mathrm{mid}}(h_k)$ is our latest approximation. Then we estimate the relative error by the number
$$\frac{\bigl| I_{\mathrm{mid}}(h_k) - I_{\mathrm{mid}}(h_{k-1}) \bigr|}{\bigl| I_{\mathrm{mid}}(h_k) \bigr|},$$
and stop the computations if this is smaller than $\epsilon$. To avoid potential division by zero, we use the test
$$\bigl| I_{\mathrm{mid}}(h_k) - I_{\mathrm{mid}}(h_{k-1}) \bigr| \le \epsilon\, \bigl| I_{\mathrm{mid}}(h_k) \bigr|.$$
As always, we should also limit the number of approximations that are com-
puted.
Algorithm 12.6. Suppose the function $f$, the interval $[a, b]$, the length $n_0$ of the initial partition, a positive tolerance $\epsilon < 1$, and the maximum number of iterations $M$ are given. The following algorithm will compute a sequence of approximations to $\int_a^b f(x)\,dx$ by the midpoint rule, until the estimated relative error is smaller than $\epsilon$, or the maximum number of computed approximations reaches $M$. The final approximation is stored in $I$.
n := n0; h := (b − a)/n;
I := 0; x := a + h/2;
for k := 1, 2, . . . , n
    I := I + f(x);
    x := x + h;
j := 1;
I := h ∗ I;
abserr := |I|;
while j < M and abserr > ε ∗ |I|
    j := j + 1;
    Ip := I;
    n := 2n; h := (b − a)/n;
    I := 0; x := a + h/2;
    for k := 1, 2, . . . , n
        I := I + f(x);
        x := x + h;
    I := h ∗ I;
    abserr := |I − Ip|;
Note that we compute the first approximation outside the main loop. This is necessary in order to have meaningful estimates of the relative error (the first time we reach the top of the while loop we will always get past the condition). We store the previous approximation in Ip so that we can estimate the error.

Figure 12.4. The trapezoid rule with one subinterval (a) and five subintervals (b).
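A direct transcription of algorithm 12.6 into Python might look like the sketch below; the variable names follow the pseudocode, but the function itself is our own rendering rather than code from the text.

    def midpoint_with_error_estimate(f, a, b, n0, eps, max_iter):
        """Algorithm 12.6: halve the step length until the estimated
        relative error is below eps, or max_iter approximations are made."""
        n = n0
        h = (b - a) / n
        I = h * sum(f(a + (i + 0.5) * h) for i in range(n))
        j = 1
        abserr = abs(I)
        while j < max_iter and abserr > eps * abs(I):
            j += 1
            Ip = I                 # previous approximation
            n *= 2
            h = (b - a) / n
            I = h * sum(f(a + (i + 0.5) * h) for i in range(n))
            abserr = abs(I - Ip)
        return I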
In the coming sections we will describe two other methods for numerical
integration. These can be implemented in algorithms similar to Algorithm 12.6.
In fact, the only difference will be how the actual approximation to the integral
is computed.
To get good accuracy, we will have to split $[a, b]$ into subintervals with a partition and use this approximation on each subinterval, see figure 12.4b. If we have a uniform partition $\{x_i\}_{i=0}^{n}$ with step length $h$, we get the approximation
$$\int_a^b f(x)\,dx = \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} f(x)\,dx \approx \sum_{i=1}^{n} \frac{f(x_{i-1}) + f(x_i)}{2}\, h. \tag{12.13}$$
On the interval $[x_{i-1}, x_i]$ we use the function values $f(x_{i-1})$ and $f(x_i)$, and on the next interval we use the values $f(x_i)$ and $f(x_{i+1})$. All function values, except the first and last, therefore occur twice in the sum on the right in (12.13). This means that if we implement this formula directly we do a lot of unnecessary work, and the sum is usually rewritten so that each function value occurs only once, as in the sketch below.
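A minimal Python sketch of (12.13) in this rewritten form, where each interior point is evaluated exactly once (the function name is our own):

    def trapezoid_rule(f, a, b, n):
        """Composite trapezoid rule (12.13) with n subintervals;
        each interior point x_i is evaluated exactly once."""
        h = (b - a) / n
        interior = sum(f(a + i * h) for i in range(1, n))
        return h * ((f(a) + f(b)) / 2 + interior)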
The error analysis follows the same recipe as for the midpoint rule, and the first step is to expand the function values $f(x)$ and $f(b)$ in Taylor series about $a$,
$$f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2} f''(\xi_1),$$
$$f(b) = f(a) + (b - a) f'(a) + \frac{(b - a)^2}{2} f''(\xi_2),$$
where $\xi_1 \in (a, x)$ and $\xi_2 \in (a, b)$. We integrated the Taylor series for $f(x)$ in (12.5), so we just quote the result here,
$$\int_a^b f(x)\,dx = f(a)(b - a) + \frac{(b - a)^2}{2} f'(a) + \frac{1}{2}\int_a^b (x - a)^2 f''(\xi_1)\,dx.$$
If we insert the Taylor series for $f(b)$ we obtain
$$\int_a^b f(x)\,dx - \frac{f(a) + f(b)}{2}(b - a) = \frac{1}{2}\int_a^b (x - a)^2 f''(\xi_1)\,dx - \frac{(b - a)^3}{4} f''(\xi_2).$$
These expressions can be simplified just like in (12.7) and (12.8), and this yields
$$\Bigl| \int_a^b f(x)\,dx - \frac{f(a) + f(b)}{2}(b - a) \Bigr| \le \frac{M}{6}(b - a)^3 + \frac{M}{4}(b - a)^3.$$
Lemma 12.8. Let $f$ be a continuous function whose first two derivatives are continuous on the interval $[a, b]$. The error in the trapezoid rule, with only one line segment on $[a, b]$, is bounded by
$$\Bigl| \int_a^b f(x)\,dx - \frac{f(a) + f(b)}{2}(b - a) \Bigr| \le \frac{5M}{12}(b - a)^3,$$
where $M = \max_{x \in [a,b]} \bigl| f''(x) \bigr|$.
This lemma is completely analogous to lemma 12.4 which describes the lo-
cal error in the midpoint method. We particularly notice that even though the
trapezoid rule uses two values of f , the error estimate is slightly larger than the
estimate for the midpoint method. The most important feature is the exponent
on (b − a), which tells us how quickly the error goes to 0 when the interval width
is reduced, and from this point of view the two methods are the same. In other
words, we have gained nothing by approximating f by a linear function instead
of a constant. This does not mean that the trapezoid rule is bad, it rather means
that the midpoint rule is unusually good.
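This can be observed numerically; a quick check (our own example, using the midpoint and trapezoid sketches above) shows that the trapezoid error is roughly twice the midpoint error:

    import math

    exact = math.sin(1)
    for n in (10, 100):
        em = abs(midpoint_rule(math.cos, 0, 1, n) - exact)
        et = abs(trapezoid_rule(math.cos, 0, 1, n) - exact)
        print(f"n = {n:4d}  midpoint {em:.2e}  trapezoid {et:.2e}")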
Theorem 12.9. Suppose that $f$ and its first two derivatives are continuous on the interval $[a, b]$, and that the integral of $f$ on $[a, b]$ is approximated by the trapezoid rule with $n$ subintervals of equal width $h$,
$$I = \int_a^b f(x)\,dx \approx I_{\mathrm{trap}} = h\Bigl( \frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(x_i) \Bigr).$$
Then the error is bounded by
$$\bigl| I - I_{\mathrm{trap}} \bigr| \le (b - a)\,\frac{5h^2}{12} \max_{x \in [a,b]} \bigl| f''(x) \bigr|. \tag{12.16}$$
The Lagrange form of the polynomial that interpolates $f$ at $-1$, $0$, $1$ is given by
$$p_2(x) = f(-1)\frac{x(x - 1)}{2} - f(0)(x + 1)(x - 1) + f(1)\frac{(x + 1)x}{2},$$
and it is easy to check that the interpolation conditions hold. To integrate $p_2$, we must integrate each of the three polynomials in this expression. For the first one we have
$$\frac{1}{2}\int_{-1}^{1} x(x - 1)\,dx = \frac{1}{2}\int_{-1}^{1} (x^2 - x)\,dx = \frac{1}{2}\Bigl[ \frac{1}{3}x^3 - \frac{1}{2}x^2 \Bigr]_{-1}^{1} = \frac{1}{3}.$$
Similarly, we find
$$-\int_{-1}^{1} (x + 1)(x - 1)\,dx = \frac{4}{3}, \qquad \frac{1}{2}\int_{-1}^{1} (x + 1)x\,dx = \frac{1}{3}.$$
On the interval $[-1, 1]$, Simpson's rule therefore gives the approximation
$$\int_{-1}^{1} f(x)\,dx \approx \frac{1}{3}\bigl( f(-1) + 4 f(0) + f(1) \bigr). \tag{12.17}$$
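As a quick sanity check (our own example, not from the text), the rule reproduces the integral of $f(x) = x^2$ exactly:
$$\int_{-1}^{1} x^2\,dx = \frac{2}{3}, \qquad \frac{1}{3}\bigl( (-1)^2 + 4 \cdot 0^2 + 1^2 \bigr) = \frac{2}{3}.$$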
To obtain an approximation on the interval [a, b], we use a standard tech-
nique. Suppose that x and y are related by
$$x = \frac{y + 1}{2}(b - a) + a. \tag{12.18}$$
We see that if $y$ varies in the interval $[-1, 1]$, then $x$ will vary in the interval $[a, b]$. We are going to use the relation (12.18) as a substitution in an integral, so we note that $dx = (b - a)\,dy/2$. We therefore have
$$\int_a^b f(x)\,dx = \frac{b - a}{2}\int_{-1}^{1} f\Bigl( \frac{b - a}{2}(y + 1) + a \Bigr)\,dy = \frac{b - a}{2}\int_{-1}^{1} \tilde{f}(y)\,dy, \tag{12.19}$$
where
$$\tilde{f}(y) = f\Bigl( \frac{b - a}{2}(y + 1) + a \Bigr).$$
To determine an approximation to the integral of $\tilde{f}$ on the interval $[-1, 1]$, we can use Simpson's rule (12.17). The result is
$$\int_{-1}^{1} \tilde{f}(y)\,dy \approx \frac{1}{3}\bigl( \tilde{f}(-1) + 4\tilde{f}(0) + \tilde{f}(1) \bigr) = \frac{1}{3}\Bigl( f(a) + 4 f\Bigl( \frac{a + b}{2} \Bigr) + f(b) \Bigr),$$
since the relation in (12.18) maps $-1$ to $a$, the midpoint $0$ to $(a + b)/2$, and the right endpoint $1$ to $b$. If we insert this in (12.19), we obtain Simpson's rule for the general interval $[a, b]$,
$$\int_a^b f(x)\,dx \approx \frac{b - a}{6}\Bigl( f(a) + 4 f\Bigl( \frac{a + b}{2} \Bigr) + f(b) \Bigr), \tag{12.20}$$
see figure 12.5a. In practice, we will usually divide the interval $[a, b]$ into smaller intervals and use Simpson's rule on each subinterval, see figure 12.5b.
Figure 12.5. Simpson's rule with one subinterval (a) and three subintervals (b).
We could just as well have derived this formula by doing the interpolation
directly on the interval [a, b], but then the algebra becomes quite messy.
The next step is to analyse the error. We follow the usual recipe and perform Taylor expansions of $f(x)$, $f\bigl((a + b)/2\bigr)$ and $f(b)$ around the left endpoint $a$. However, those Taylor expansions become long and tedious, so we are going to see how we can predict what happens. For this, we define the error function,
$$E(f) = \int_a^b f(x)\,dx - \frac{b - a}{6}\Bigl( f(a) + 4 f\Bigl( \frac{a + b}{2} \Bigr) + f(b) \Bigr). \tag{12.21}$$
Note that if f (x) is a polynomial of degree 2, then the interpolant p 2 will be ex-
actly the same as f . Since the last term in (12.21) is the integral of p 2 , we see
that the error E ( f ) will be 0 for any quadratic polynomial. We can check this by
calculating the error for the three functions $1$, $x$, and $x^2$,
$$E(1) = (b - a) - \frac{b - a}{6}(1 + 4 + 1) = 0,$$
$$E(x) = \frac{1}{2}\bigl[ x^2 \bigr]_a^b - \frac{b - a}{6}\Bigl( a + 4\,\frac{a + b}{2} + b \Bigr) = \frac{1}{2}(b^2 - a^2) - \frac{(b - a)(b + a)}{2} = 0,$$
$$\begin{aligned}
E(x^2) &= \frac{1}{3}\bigl[ x^3 \bigr]_a^b - \frac{b - a}{6}\Bigl( a^2 + 4\,\frac{(a + b)^2}{4} + b^2 \Bigr) \\
&= \frac{1}{3}(b^3 - a^3) - \frac{b - a}{3}(a^2 + ab + b^2) \\
&= \frac{1}{3}\bigl( b^3 - a^3 - (a^2 b + a b^2 + b^3 - a^3 - a^2 b - a b^2) \bigr) \\
&= 0.
\end{aligned}$$
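A similar computation shows that the error is zero also for $f(x) = x^3$:
$$\begin{aligned}
E(x^3) &= \frac{1}{4}\bigl[ x^4 \bigr]_a^b - \frac{b - a}{6}\Bigl( a^3 + 4\Bigl( \frac{a + b}{2} \Bigr)^3 + b^3 \Bigr) \\
&= \frac{1}{4}(b^4 - a^4) - \frac{b - a}{4}\bigl( a^3 + a^2 b + a b^2 + b^3 \bigr) \\
&= \frac{1}{4}\bigl( b^4 - a^4 - (b^4 - a^4) \bigr) = 0.
\end{aligned}$$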
The fact that the error is zero when $f(x) = x^3$ comes as a pleasant surprise; by the construction of the method we can only expect the error to be 0 for quadratic polynomials.
The above computations mean that
$$E\bigl( c_0 + c_1 x + c_2 x^2 + c_3 x^3 \bigr) = c_0 E(1) + c_1 E(x) + c_2 E(x^2) + c_3 E(x^3) = 0$$
for any real numbers $\{c_i\}_{i=0}^{3}$, i.e., the error is 0 whenever $f$ is a cubic polynomial.
If we expand $f$ in a cubic Taylor polynomial with remainder about $a$,
$$f(x) = T_3(f; x) + R_3(f; x),$$
we therefore find
$$E(f) = E\bigl( T_3(f;\cdot) + R_3(f;\cdot) \bigr) = E\bigl( T_3(f;\cdot) \bigr) + E\bigl( R_3(f;\cdot) \bigr) = E\bigl( R_3(f;\cdot) \bigr).$$
The second equality follows from simple properties of the integral and function
evaluations, while the last equality follows because the error in Simpson’s rule is
0 for cubic polynomials.
The Lagrange form of the error term is given by
$$R_3(f; x) = \frac{(x - a)^4}{24} f^{(iv)}(\xi_x),$$
If we insert this in the error function and bound $\bigl| f^{(iv)} \bigr|$ by its maximum $M$ on $[a, b]$, we end up with
$$|E(f)| \le \frac{(b - a)^5}{5 \times 24} M + \frac{(b - a)^5}{576}(M + 4M),$$
which simplifies to
$$|E(f)| \le \frac{49}{2880}(b - a)^5 \max_{x \in [a,b]} \bigl| f^{(iv)}(x) \bigr|.$$
We note that the error in Simpson's rule depends on $(b - a)^5$, while the errors in the midpoint rule and trapezoid rule depend on $(b - a)^3$. This means that the error in Simpson's rule goes to zero much more quickly than for the other two methods when the width of the interval $[a, b]$ is reduced. More precisely, a reduction of $h$ by a factor of 2 will reduce the error by a factor of 32.
As for the other two methods, the constant 49/2880 is not the best possible; it can be reduced to 1/2880 by using other techniques.
In this sum we observe that the right endpoint of one subinterval becomes the left endpoint of the next. Therefore, if this is implemented directly, the function values at the points with an even subscript will be evaluated twice, except for the extreme endpoints a and b which only occur once in the sum. We can therefore rewrite the sum in a way that avoids these redundant evaluations.
Observation 12.13. Suppose $f$ is a function defined on the interval $[a, b]$, and let $\{x_i\}_{i=0}^{2n}$ be a uniform partition of $[a, b]$ with step length $h$. The composite Simpson's rule approximates the integral of $f$ by
$$\int_a^b f(x)\,dx \approx \frac{h}{3}\Bigl( f(a) + f(b) + 2\sum_{i=1}^{n-1} f(x_{2i}) + 4\sum_{i=1}^{n} f(x_{2i-1}) \Bigr).$$
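A Python sketch of the composite Simpson's rule in observation 12.13, in the same style as the earlier rules (the function name is our own):

    def simpson_rule(f, a, b, n):
        """Composite Simpson's rule on a uniform partition of [a, b]
        with 2n subintervals of width h = (b - a)/(2n)."""
        h = (b - a) / (2 * n)
        even = sum(f(a + 2 * i * h) for i in range(1, n))           # x_2, ..., x_{2n-2}
        odd = sum(f(a + (2 * i - 1) * h) for i in range(1, n + 1))  # x_1, ..., x_{2n-1}
        return h / 3 * (f(a) + f(b) + 2 * even + 4 * odd)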
Theorem 12.14. Suppose that $f$ and its first four derivatives are continuous on the interval $[a, b]$, and that the integral of $f$ on $[a, b]$ is approximated by Simpson's rule with $2n$ subintervals of equal width $h$. Then the error is bounded by
$$\bigl| E(f) \bigr| \le (b - a)\,\frac{49 h^4}{2880} \max_{x \in [a,b]} \bigl| f^{(iv)}(x) \bigr|. \tag{12.22}$$
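The $h^4$ behaviour is easy to check numerically: halving $h$ should reduce the error by a factor of about $2^4 = 16$. A sketch (our own example, using the simpson_rule function above):

    import math

    exact = math.sin(1)
    prev = None
    for n in (1, 2, 4, 8, 16):
        err = abs(simpson_rule(math.cos, 0, 1, n) - exact)
        if prev is not None:
            print(f"2n = {2 * n:3d}  error = {err:.2e}  ratio = {prev / err:.1f}")
        else:
            print(f"2n = {2 * n:3d}  error = {err:.2e}")
        prev = err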
12.5 Summary
In this chapter we have derived three methods for numerical integration. All
these methods and their error analyses may seem rather overwhelming, but they
all follow a common thread:
1. Split the interval [a, b] into subintervals, and on each subinterval approximate f by a polynomial p that interpolates f at suitable points.
2. Approximate the integral of f by the integral of p. This makes it possible
to express the approximation to the integral in terms of function values
of f .
3. Derive an estimate for the error by expanding the function values (other
than the one at a) in Taylor series with remainders.
4. For numerical integration, the global error can easily be derived from
the local error using the technique leading up to theorem 12.5.
Exercises
12.1 a) Write a program that implements the midpoint method as in algorithm 12.6 and
test it on the integral
$$\int_0^1 e^x\,dx = e - 1.$$
b) Determine a value of h that guarantees that the absolute error is smaller than $10^{-10}$. Run your program and check what the actual error is for this value of h. (You may have to adjust algorithm 12.6 slightly and print the absolute error.)
12.2 Repeat exercise 1, but use Simpson’s rule instead of the midpoint method.
12.3 When h is halved in the trapezoid method, some of the function values used with step
length h/2 are the same as those used for step length h. Derive a formula for the trapezoid
method with step length h/2 that makes use of the function values that were computed
for step length h.