ASNA Notes
From symbolic to
numerical computation,
and applications
Master d'informatique fondamentale
École normale supérieure de Lyon
Fall-winter 2013
Nicolas Brisebarre Bruno Salvy
http://www.ens-lyon.fr/LIP/AriC/M2R/ASNA
Chapter 1
Introduction
The classical presentation of mathematical methods usually leaves out the problems of actually
getting numerical values. In practice, a compromise between accuracy and efficiency is desirable
and this turns out to require the development of specific (and often very nice) algorithms. Our
aim in this course is to exhibit the interplay between symbolic and numerical computation in order
to achieve computations that are as precise (or guaranteed, or proved) as possible, and fast! (At least in a
number of problems.)
The problem is that the numerical error introduced when replacing φ₂ by a 10-digit approximation
amounts to having a very small, but nonzero, value for a. At first, this goes unnoticed, but
eventually, since φⁿ tends to infinity, it overtakes the part in φ₂ⁿ.
A natural solution is to work exactly, starting with a symbolic value for φ₂ and reproducing the
same steps using symbolic computation:
Maple 8] phi2:=phi[2];
                         1/2 − (1/2) √5
Maple 9] u[0]:=1:u[1]:=phi2: for i from 0 to N do u[i+2]:=u[i]+u[i+1]
od:L:=[seq(u[i],i=0..N)];
[1, 1/2 − (1/2)√5, 3/2 − (1/2)√5, 2 − √5, 7/2 − (3/2)√5, 11/2 − (5/2)√5, 9 − 4√5,
 29/2 − (13/2)√5, 47/2 − (21/2)√5, 38 − 17√5, 123/2 − (55/2)√5, 199/2 − (89/2)√5,
 161 − 72√5, 521/2 − (233/2)√5, 843/2 − (377/2)√5, 682 − 305√5, 2207/2 − (987/2)√5,
 3571/2 − (1597/2)√5, 2889 − 1292√5, 9349/2 − (4181/2)√5, ...,
 28143753123/2 − (12586269025/2)√5]
Again, the values explode eventually, although we have exact values all along. The reason for this
lies in the numerical evaluation of the large coefficients involved in the exact expression. This can
be seen by evaluating both terms in each value separately:
Maple 11] u[50];
                 28143753123/2 − (12586269025/2) √5
Maple 12] A:=[op(%)];
                 [28143753123/2, −(12586269025/2) √5]
Maple 13] evalf(A);
                 [14071876560.0, −14071876560.0]
Thus, in this case, increasing the precision of the numerical evaluation is sufficient:
Maple 14] evalf(u[50],20);
                 0.0
Maple 15] evalf(u[50],30);
                 3.55318637 × 10⁻¹¹
Note that since both summands in the expression grow like φⁿ and we are computing a value that
decreases like |φ₂|ⁿ, the number of required digits grows linearly with n, making such a computation
costly.
The behaviour of this sequence is by no means an isolated accident. Every time a sequence has
an asymptotic behaviour which is not the dominating one, its direct evaluation presents this kind
of difficulty. A simple and efficient way to compute such sequences will be presented in Chapter 8.
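For illustration, the floating-point phenomenon described above can be reproduced in a few lines of Maple (a minimal sketch; the number of steps is an arbitrary choice):

> Digits := 10:
> u[0] := 1.0: u[1] := evalf((1-sqrt(5))/2):   # 10-digit approximation of phi2
> for i from 0 to 98 do u[i+2] := u[i] + u[i+1] od:
> u[20], u[50], u[100];
# u[20] is still close to phi2^20; u[50] is of the order of 1 instead of
# roughly 1e-10, and u[100] has exploded: the rounding error behaves like phi^n.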
d²y/dx²(x) − x y(x) = 0,    y(0) = 1/(3^(2/3) Γ(2/3)),    D(y)(0) = −1/(3^(1/3) Γ(1/3))
Note that apart from this region where x is large, the numerical solver behaves very well, as we
can see by superposing both curves:
Maple 20] plots[display](%%,%):
Here, this behaviour is very unfortunate: this function has been isolated and given a name by
mathematical physicists precisely because it has a mild behaviour at ∞. It is therefore necessary
to find other ways for its evaluation. An efficient approach to the guaranteed computation of such
functions with high precision will be presented in Chapter 3.
1.2 Approximations
We now list a few of the questions dealt with in this course, whose natural habitat lies sometimes
within approximation theory and sometimes within symbolic computation.
Compute the first 1000 digits of π, ln 2, √7, exp(−10), ... (see Chapter 3);
Compute the floating point number in the IEEE standard that is closest to these numbers;
Compute the first 1000 Taylor coefficients of
    1/(1 − x − x²),   arcsin(x),   sin(tan(x)) − tan(sin(x)),
or of the solutions of
    y(x) = 1 + x y(x)⁵,   y(x) = x + x log(1/(1 − y(x))),   x² y″(x) + x y′(x) + (x² − 1) y(x) = 0
(Efficient algorithms exist in computer algebra. A general purpose approach is described in
Chapter 3 and a faster one for all but the second and fourth examples is in Chapter 3).
Compute a polynomial P of minimal degree such that
    |f(x) − P(x)| < 10⁻¹⁵ for all x ∈ [0, 1/4],
for each of the functions above (see Chapter 2);
Conversely, given a function f and a polynomial P, compute a bound on ‖f − P‖∞ on such
an interval (Chapter 2);
Polynomials are not very good at approximating functions that have poles at or near the
endpoints of the interval. It is therefore natural to ask the same questions with rational
functions instead of polynomials, minimizing the sum of the degrees of the numerator and
denominator (Chapters 4 and 5);
Same questions when minimizing
    ∫₀^{1/4} (f(t) − P(t))² dt
instead;
8 Introduction
Given (x₁, y₁), ..., (xₙ, yₙ), compute a polynomial of minimal degree (or a rational function
with minimal sum of degrees) such that P(x_i) = y_i, i = 1, ..., n;
Same question with a fixed degree, minimizing
    Σ_{i=1}^n |P(x_i) − y_i|   or   Σ_{i=1}^n |P(x_i) − y_i|²;
Same question with a given f, with y_i = f(x_i), and now the aim is to minimize |f(x) − P(x)|
for x ∈ [a, b] or
    ∫_a^b |f(t) − P(t)|² dt;
Theorem 1.3. [Richardson–Matiyasevich] There cannot exist an algorithm taking as input any
function of one variable f(x) built by repeated composition from the constant 1, π, addition, subtrac-
tion, multiplication, the functions sine and absolute value, and that decides whether f = 0 or not.
This result restricts the class of functions that can be handled in symbolic computation systems.
It also implies that simplification is at best a collection of useful heuristics. The way out of this
undecidability result is to stay within algebraic constructions that preserve decidability.
Similarly, and closer to the aim of this course, the solutions of linear differential equations with
polynomial coefficients over an effective field enjoy a large number of closure properties made
effective by simple algorithms that will be presented in Chapter 3. Not only can one prove auto-
matically identities like sin² + cos² = 1, but this also gives effective access to various
operations with special functions and orthogonal polynomials, thereby providing us with a large
set of effective functions that require approximation.
1.3.2 Efficiency
One minute
The basic algorithms of symbolic computation are extremely efficient. Thus in one minute of
cpu time of a typical modern computer, it is possible to compute:
the product of integers with 500,000,000 digits;
the factorial of 20,000,000 (the result has roughly 140 × 10⁶ digits);
(by contrast, only the factorisation of a 45-digit number can be done within this time);
the product of polynomials in K[x] of degree 14 × 10⁶ (where K = Z/pZ with p = 67,108,879
is a 26-bit long prime number);
the gcd of polynomials of K[x] of degree 600,000;
the determinant of matrices of size 4,500 × 4,500 over K;
the determinant of matrices of size 700 × 700 if their entries are 32-bit integers.
Complexity models
A simple means of assessing the efficiency of algorithms is by analyzing their complexity. This
entails selecting a complexity model that defines precisely what operations are counted in the
analysis. One such model that is commonly used is the RAM model (for Random Access Machine).
In this model, the machine has one tape where it reads its input (one integer per cell in the tape);
another tape where it writes its output; a tape for its computations. It has an arbitrary number
of registers and the operations that are counted at unit cost are: reading or writing a cell; adding,
multiplying, subtracting, dividing integers; jumps that can be either unconditional or depending
on tests of the type =0 or >0 on the value of a register.
The complexity measured in this model is called the arithmetic complexity. While it is a good
measure of time when working in settings where the sizes of the integers are all similar, for instance
for polynomials or matrices over a finite field, this model does not predict time correctly when
large integers come into play. A variant of this model is then used, where the cells can only
contain one bit (0 or 1) and the operations only act on one bit as well. The complexity measured
in this model is called bit complexity.
Asymptotic estimates
On modern computers, a computation that takes more than 10 seconds is usually already
spending its time in the part that dominates the other ones asymptotically, and thus fair predictions
of execution time can be obtained by a first-order asymptotic estimate of the complexity. In each
case it is of course necessary to specify the complexity model (arithmetic or bit complexity in this
course).
For instance, the computation of n! requires
O(n² log² n) bit operations by the naive algorithm, and only O(n) arithmetic operations;
O(√n log n) arithmetic operations with the currently known best algorithm in terms of
arithmetic operations;
O(n log³ n loglog n) bit operations, with an algorithm presented in Chapter 3, which explains
the speed displayed above.
Integers and polynomials
10 Introduction
The starting point for fast algorithms in symbolic computation is fast multiplication. We use
M(N) to denote the arithmetic complexity of the product of two polynomials of degree bounded
by N in one variable. Then,
M(N) = O(N²) by the naive algorithm;
M(N) = O(N^{log₂ 3}) by Karatsuba's algorithm;
M(N) = O(N log N loglog N) by fast Fourier transform (FFT).
These will be the building blocks for fast algorithms in the following chapters. In particular, it is
important to keep in mind the following complexity estimates:
For power series, products, inverses, square roots and more generally solutions of polynomial
equations can be computed in O(M(N)) arithmetic operations;
For polynomials, gcd, multipoint evaluation (evaluation of a polynomial of degree N at N
points) and interpolation can be computed in O(M(N) log N) arithmetic operations.
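As an illustration of the Karatsuba bound, here is a minimal Maple sketch (the threshold 4 for switching to the naive product is an arbitrary choice):

karatsuba := proc(A, B, x)
  local n, m, A0, A1, B0, B1, P0, P1, Pm;
  n := max(degree(A, x), degree(B, x)) + 1;
  if n <= 4 then return expand(A*B) end if;    # naive product on small inputs
  m := iquo(n, 2);
  A0 := rem(A, x^m, x); A1 := quo(A, x^m, x);  # A = A1*x^m + A0
  B0 := rem(B, x^m, x); B1 := quo(B, x^m, x);
  P0 := karatsuba(A0, B0, x);
  P1 := karatsuba(A1, B1, x);
  Pm := karatsuba(A0 + A1, B0 + B1, x);        # 3 recursive products instead of 4
  expand(P0 + (Pm - P0 - P1)*x^m + P1*x^(2*m))
end proc: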
Chapter 2
Polynomial approximations
In this chapter, we present various theoretical and algorithmic results regarding polynomial approx-
imations of functions. We will mainly deal with real-valued continuous functions over a compact
interval [a, b], a, b ∈ R, a ≤ b. We will denote C([a, b]) the real vector space of continuous functions
over [a, b]. In the framework of function evaluation, one usually works with the following two norms
over this vector space:
the least-squares norm L²: given a nonnegative weight function w ∈ C([a, b]), if dx denotes
the Lebesgue measure, we write
    g ∈ L²([a, b], w, dx)
if
    ∫_a^b w(x) |g(x)|² dx < ∞,
and then we define
    ‖g‖_{2,w} = ( ∫_a^b w(x) |g(x)|² dx )^{1/2} ;
the supremum norm (aka Chebyshev norm, infinity norm, L∞ norm): if g is bounded
on [a, b], we set
    ‖g‖∞ = sup_{x∈[a,b]} |g(x)|.
For both norms, one of the main questions we are interested in here is the following.
Question. Given f ∈ C([a, b]) and n ∈ N, minimize ‖f − p‖ where p describes the space R_n[x] of
polynomials with real coefficients and degree at most n.
In the L² case, the answer to this question is easy. The space C([a, b]) is a subset of L²([a, b],
w, dx), which is a Hilbert space, i.e. a vector space equipped with an inner product
    ⟨f, g⟩ = ∫_a^b f(x) g(x) w(x) dx,
and ‖·‖₂ is the associated norm, for which L²([a, b], w, dx) is complete. The best polynomial
approximation of degree at most n is the projection p = pr(f) of f onto R_n[x]. We will give more
details on the L² case in Chapter 7. The situation in the L∞ case is more intricate and we will focus
on it in the sequel of this chapter.
We first show that E_n(f) → 0 as n → ∞, where E_n(f) denotes the distance of f to R_n[x] in the
supremum norm, a result due to Weierstrass (Weierstrass' theorem, 1885). Various proofs
of this result have been published, in particular those by Runge (1885), Picard (1891), Lerch
(1892 and 1903), Volterra (1897), Lebesgue (1898), Mittag-Leffler (1900), Fejér (1900 and 1916),
Landau (1908), de la Vallée Poussin (1908), Jackson (1911), Sierpinski (1911), Bernstein (1912),
Montel (1918). The text [16] is an interesting account of Weierstrass' contribution to Approxima-
tion Theory and, in particular, of his fundamental result on the density of polynomials in C([a, b]).
We now give one proof inspired by Bernstein's.
Theorem 2.1. [Weierstrass, 1885] For all f ∈ C([a, b]) and for all ε > 0, there exist n ∈ N and p ∈ R_n[x]
such that ‖p − f‖∞ < ε.
Proof. Up to a change of variable, we can assume [a, b] = [0, 1]. Define the Bernstein polynomials as
    B_n(g, x) = Σ_{k=0}^n g(k/n) C(n,k) x^k (1 − x)^{n−k}   for g ∈ C([0, 1]),
where C(n,k) denotes the binomial coefficient.
We have
    B_n(1, x) = Σ_{k=0}^n C(n,k) x^k (1−x)^{n−k} = 1,
    B_n(x, x) = Σ_{k=0}^n (k/n) C(n,k) x^k (1−x)^{n−k} = x Σ_{k=1}^n C(n−1, k−1) x^{k−1} (1−x)^{n−k}
              = x Σ_{k=0}^{n−1} C(n−1, k) x^k (1−x)^{n−1−k} = x,
    B_n(x², x) = Σ_{k=0}^n (k/n)² C(n,k) x^k (1−x)^{n−k} = x Σ_{k=1}^n (k/n) C(n−1, k−1) x^{k−1} (1−x)^{n−k}
               = x Σ_{k=1}^n ((k−1)/n) C(n−1, k−1) x^{k−1} (1−x)^{n−k} + x/n
               = ((n−1)/n) x² Σ_{k=2}^n C(n−2, k−2) x^{k−2} (1−x)^{n−k} + x/n = x/n + ((n−1)/n) x².
Fix ε > 0. The function f is continuous and hence uniformly continuous over [0, 1], hence there
exists δ > 0 such that
    ∀x₁, x₂ ∈ [0, 1],  |x₂ − x₁| < δ ⟹ |f(x₂) − f(x₁)| < ε.
Let M = max_{x∈[0,1]} |f(x)| and write b_{n,k}(x) = C(n,k) x^k (1−x)^{n−k}. Since b_{n,k}(x) ≥ 0 for all x ∈ [0, 1], we can write
    |f(x) − B_n(f, x)| ≤ |Σ_{k: |x−k/n|<δ} (f(x) − f(k/n)) b_{n,k}(x)| + |Σ_{k: |x−k/n|≥δ} (f(x) − f(k/n)) b_{n,k}(x)|
    ≤ ε Σ_{k=0}^n b_{n,k}(x) + 2M Σ_{k: |x−k/n|≥δ} b_{n,k}(x) = ε + 2M Σ_{k: |x−k/n|≥δ} b_{n,k}(x).
By the computations above, Σ_{k=0}^n (x − k/n)² b_{n,k}(x) = B_n(x², x) − 2x B_n(x, x) + x² B_n(1, x) = x(1−x)/n ≤ 1/(4n),
so the last sum is at most 1/(4nδ²). Therefore, we obtain |f(x) − B_n(f, x)| ≤ ε + M/(2nδ²). The upper bound does not depend on x and
can be made as small as desired. □
Remark 2.2. One of the very nice features of this proof is that it provides an explicit sequence of
polynomials which converges to the function f. It is worth mentioning that Bernstein polynomials
prove useful in various other domains (computer graphics, global optimization, ...). See [7] for
instance.
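The construction is easy to experiment with; here is a minimal Maple sketch (the test function sin(2πx) and the degree 20 are arbitrary choices):

B := proc(g, n, x)
  # Bernstein polynomial B_n(g, x) on [0, 1]
  add(g(k/n)*binomial(n, k)*x^k*(1-x)^(n-k), k = 0..n)
end proc:
> plot([sin(2*Pi*x), B(t -> sin(2*Pi*t), 20, x)], x = 0..1);
# convergence is slow, in accordance with Proposition 2.5 below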
Note that, in the proof, we only used the values of B_n(g, x) for the three functions g = 1, x, x². In fact, we have
the following result.
Theorem 2.3. (Bohman and Korovkin) Let (L_n) be a sequence of monotone linear operators on
C([a, b]), that is to say, for all f, g ∈ C([a, b]):
    L_n(αf + βg) = αL_n(f) + βL_n(g) for all α, β ∈ R,
    if f(x) ≥ g(x) for all x ∈ [a, b] then (L_nf)(x) ≥ (L_ng)(x) for all x ∈ [a, b].
Then the following conditions are equivalent:
i. L_nf → f uniformly for all f ∈ C([a, b]);
ii. L_nf → f uniformly for the three functions x ↦ 1, x, x²;
iii. L_n1 → 1 and (L_nφ_t)(t) → 0 uniformly in t ∈ [a, b], where φ_t: x ∈ [a, b] ↦ (t − x)².
A refinement of Weierstrass' theorem that gives the speed of convergence is obtained in terms
of the modulus of continuity.
Proposition 2.5. If f is a continuous function over [0, 1] and ω is its modulus of continuity, then
    ‖f − B_n(f, ·)‖∞ ≤ (9/4) ω(n^{−1/2}).
Proof. Let δ > 0 and x ∈ [0, 1]. Let k ∈ {0, ..., n} be such that |x − k/n| ≤ δ; then |f(x) − f(k/n)| ≤ ω(δ).
Since b_{n,k}(y) ≥ 0 for all y ∈ [0, 1], we have
    |Σ_{k: |x−k/n|<δ} (f(x) − f(k/n)) b_{n,k}(x)| ≤ ω(δ) Σ_{k=0}^n b_{n,k}(x) = ω(δ).
Now, let k ∈ {0, ..., n} be such that |x − k/n| > δ. Let M = ⌊|x − k/n|/δ⌋ and let
y_j = x + (j/(M+1))(k/n − x) for j = 0, ..., M+1. Note that, for all j = 0, ..., M, we have
|y_{j+1} − y_j| < δ, from which follows
    |f(x) − f(k/n)| ≤ Σ_{j=0}^M |f(y_{j+1}) − f(y_j)| ≤ (M + 1) ω(δ)
    ≤ ω(δ) (1 + (1/δ)|x − k/n|) ≤ ω(δ) (1 + (1/δ²)(x − k/n)²).
Summing against the b_{n,k}(x) and using Σ_k (x − k/n)² b_{n,k}(x) = x(1−x)/n ≤ 1/(4n) as in the
previous proof, we get ‖f − B_n(f, ·)‖∞ ≤ ω(δ)(2 + 1/(4nδ²)), and the choice δ = n^{−1/2} yields
the announced bound. □
Remark 2.6. This result is not optimal. For improvements and refinements, see Section 4.6 of
[4] or Chapter 16 of [17] for a presentation of Jackson's theorems.
Corollary 2.7. When f is Lipschitz continuous, E_n(f) = O(n^{−1/2}).
Proposition 2.8. Let (E, ‖·‖) be a normed R-vector space and let F be a finite-dimensional subspace
of (E, ‖·‖). For all f ∈ E, there exists p ∈ F such that ‖p − f‖ = min_{q∈F} ‖q − f‖. Moreover, the
set of best approximations to a given f ∈ E is convex.
Proof. Let f ∈ E. Consider F₀ = {p ∈ F : ‖p‖ ≤ 2‖f‖}. Then F₀ is nonempty (it contains 0), closed,
bounded, and we assumed dim F < ∞. Hence F₀ is compact. Let φ(p) = ‖f − p‖. The function φ is
1-Lipschitz and hence continuous. It follows that φ(F₀) is compact: there exists p* ∈ F₀ s.t. φ(p*) =
min_{p∈F₀} ‖f − p‖. Moreover, if p ∈ F \ F₀, then ‖f − p‖ ≥ ‖p‖ − ‖f‖ > ‖f‖ = φ(0) ≥ φ(p*) since 0 ∈ F₀. Thus,
‖f − p*‖ = min_{p∈F} ‖f − p‖.
Now, let p and q ∈ F be two best approximations to f. For all λ ∈ [0, 1], the vector λp + (1−λ)q
is an element of the vector space F and we have, from the triangle inequality, ‖λp + (1−λ)q − f‖ ≤
λ‖p − f‖ + (1−λ)‖q − f‖ = min_{q∈F} ‖q − f‖: the vector λp + (1−λ)q is also a best approximation
to f. □
The best L² approximation is unique, which is not always the case in the L∞ setting.
Exercise 2.1. Consider the following simple situation: the interval is [−1, 1], f is the constant function 1 and
F = R·g where g: x ↦ x². Determine the set of best L∞ approximations to f.
Definition 2.9. Consider n+1 functions φ₀, ..., φ_n defined over [a, b]. We say that φ₀, ..., φ_n
satisfy the Haar condition iff
a) the φ_i are continuous;
b) and the following equivalent statements hold:
for all x₀, x₁, ..., x_n ∈ [a, b],
    | φ₀(x₀) ⋯ φ_n(x₀) |
    |    ⋮         ⋮    | = 0  ⟺  ∃ i ≠ j, x_i = x_j;
    | φ₀(x_n) ⋯ φ_n(x_n) |
given pairwise distinct x₀, ..., x_n ∈ [a, b] and values y₀, ..., y_n, there exists a unique
interpolant
    p = Σ_{k=0}^n λ_k φ_k, with λ_k ∈ R, k = 0, ..., n,
such that p(x_i) = y_i;
any p = Σ_{k=0}^n λ_k φ_k ≠ 0 has at most n distinct zeros.
A set of functions that satisfies the Haar condition is called a Chebyshev system. The prototype
example is φ_i(x) = x^i, for which we have
    | φ₀(x₀) ⋯ φ_n(x₀) |   | 1 x₀ ⋯ x₀ⁿ |
    |    ⋮         ⋮    | = |  ⋮       ⋮  | = V_n = ∏_{0≤i<j≤n} (x_j − x_i).   (2.1)
    | φ₀(x_n) ⋯ φ_n(x_n) |   | 1 x_n ⋯ x_nⁿ |
(Proof: considering x_n = z as an indeterminate and looking at the roots of the polynomial V_n, we
see that V_n = V_{n−1} (z − x₀) ⋯ (z − x_{n−1}).)
Exercise 2.3. Show that the following families of functions are Chebyshev systems as well:
{e^{λ₀x}, ..., e^{λₙx}} for λ₀ < λ₁ < ⋯ < λₙ;
{1, cos x, sin x, ..., cos(nx), sin(nx)} over [a, b] where 0 ≤ a < b < 2π;
{x^{λ₀}, ..., x^{λₙ}}, λ₀ < ⋯ < λₙ, over [a, b] with a > 0.
Let E be a real vector space and e₁, e₂, ..., e_m ∈ E; we will denote Span_R{e₁, ..., e_m} the set
{Σ_{k=1}^m λ_k e_k ; λ₁, ..., λ_m ∈ R}. If {φ₀, ..., φ_n} is a Chebyshev system over [a, b], any element of
Span_R{φ₀, ..., φ_n} will be called a generalized polynomial.
Theorem 2.10. [Alternation Theorem. Chebyshev? Borel (1905)? Kirchberger (1902)?] Let
{φ₀, ..., φ_n} be a Chebyshev system over [a, b]. Let f ∈ C([a, b]). A generalized polynomial
p = Σ_{k=0}^n λ_k φ_k is the best approximation (or minimax approximation) to f iff there exist n +
2 points x₀, ..., x_{n+1}, a ≤ x₀ < x₁ < ⋯ < x_{n+1} ≤ b, such that, for all k,
    f(x_k) − p(x_k) = ε (−1)^k ‖f − p‖∞,   where ε = ±1 is fixed.
Example 2.12. Let f: x ∈ [−0.9, 0.9] ↦ arctan(x) and p = Σ_{k=0}^{15} c_k x^k its minimax approximation.
The graph of the error function ε = f − p is: [figure: graph of the error f − p]
Example 2.13. The best approximation to cos over [0, 10π] on the Chebyshev system {1, x, x²}
is the constant function 0! Moreover, the same is true for {1, x, ..., x^h} up to and including h = 10.
    K₂ = [y₁, y₂] ∪ [y₃, y₄] ∪ ⋯ = ⋃_{k=0}^{(ℓ−1)/2} [y_{2k+1}, y_{2k+2}].
The sets K₁ and K₂ are finite unions of compact sets, and hence compact.
    ‖p + λQ − f‖∞ < ‖p − f‖∞,
Finally, let us prove the uniqueness. Let p, q be two best approximations, and let
    ρ = ‖f − p‖∞ = ‖f − q‖∞.
It follows from Proposition 2.8 that (p + q)/2 is a best approximation too. Thus there exist
t₀ < t₁ < ⋯ < t_{n+1} such that
    ((p + q)/2)(t_i) − f(t_i) = (−1)^i ε ρ.
Since |p(t_i) − f(t_i)| ≤ ρ and |q(t_i) − f(t_i)| ≤ ρ, this forces p(t_i) − f(t_i) = q(t_i) − f(t_i) (= (−1)^i ε ρ) for all
i = 0, ..., n+1, and hence p = q by the Haar condition (p − q vanishes at n + 2 distinct points). □
Theorem 2.14. (La Vallée Poussin) Let f ∈ C([a, b]). Let {φ₀, ..., φ_n} be a Chebyshev system
over [a, b], and let p ∈ Span_R{φ₀, ..., φ_n}. If there exist x₀ < x₁ < ⋯ < x_{n+1} such that p − f alternates
at the x_i, then
    min_i |f(x_i) − p(x_i)| ≤ E_n(f) ≤ ‖f − p‖∞.
Proof. The second inequality is obvious. If the first one does not hold, assume wlog that f(x₀) >
p(x₀). Then, if p* is the best approximation of f, we have, for all k = 0, ..., n+1, (−1)^k (f(x_k) − p(x_k)) >
(−1)^k (f(x_k) − p*(x_k)): the generalized polynomial p* − p changes sign n + 1 times over [a, b], which is
not possible. □
Remark 2.15. The statements of Theorems 2.10 and 2.14 remain valid if [a, b] is replaced with
any compact subset of R containing at least n + 2 points.
Theorem 2.16. [Haar's Unicity Theorem] Let φ₀, ..., φ_n be continuous functions over [a, b]. The
minimax approximation to a continuous function f by a generalized polynomial p = Σ_{k=0}^n λ_k φ_k is
unique for all choices of f iff {φ₀, ..., φ_n} satisfies the Haar condition.
Remez (1934) proposed the following algorithm to approximate the minimax polynomial.
Algorithm 2.1
Remez' first algorithm
Input. A segment [a, b], a function f ∈ C([a, b]), a Chebyshev system {φ_k}_{0≤k≤n}, a toler-
ance η.
Output. An approximation of the best approximation of f on the system {φ_k}.
1. Choose n + 2 points x₀ < x₁ < ⋯ < x_{n+1} in [a, b]; δ ← 1, λ ← 0.
2. While δ > η|λ| do
a. Solve for a₀, ..., a_n and λ the linear system
    Σ_{k=0}^n a_k φ_k(x_j) − f(x_j) = (−1)^j λ,   j = 0, ..., n+1,
and set p = Σ_{k=0}^n a_k φ_k.
b. Choose x_new ∈ [a, b] where |p − f| attains its maximum.
Replace one of the x_i with x_new, in such a way that p − f alternates at the points of the
resulting sequence x_{0,new}, ..., x_{n+1,new}. Set δ = |p(x_new) − f(x_new)| − |λ|.
3. Return p.
Proof. The la Vallée Poussin theorem (Theorem 2.14) tells us that after each iteration, we
have |λ| ≤ E_n(f) ≤ |λ| + δ.
We will not give more details concerning this algorithm. See [4] or [17].
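For concreteness, here is a minimal numeric Maple sketch of step 2.a for the monomial basis {1, x, ..., xⁿ} (the point-exchange step 2.b is left out; the name and the sign convention for λ are ours):

remez_system := proc(f, n, X::list)
  local m, V, b;
  m := n + 2;
  # row j encodes: sum(a_k*X[j]^(k-1), k=1..n+1) + (-1)^j*lambda = f(X[j])
  V := Matrix(m, m, (j, k) -> `if`(k <= n + 1, X[j]^(k - 1), (-1)^j)):
  b := Vector(m, j -> evalf(f(X[j]))):
  LinearAlgebra:-LinearSolve(V, b)   # entries 1..n+1: a_0..a_n; entry m: lambda
end proc:
> remez_system(exp, 2, [0., 0.35, 0.7, 1.]);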
Theorem 2.17. Let p_k denote the value of p after k(n+2) loop turns, and let p* be such that
E_n(f) = ‖f − p*‖∞. There exists θ ∈ (0, 1) such that ‖p_k − p*‖∞ = O(θ^k).
Under mild regularity assumptions, the bound O(θ^k) can in fact be improved to O(θ^{2^k}) [25].
Let A be a commutative ring (with unity). Given pairwise distinct x₀, ..., x_n ∈ A and corres-
ponding y₀, ..., y_n ∈ A, the interpolation problem is to find p ∈ A_n[x] such that p(x_i) = y_i for all i.
Write p = Σ_k a_k x^k. The problem can be restated as
    V a = y   (2.2)
where V is a Vandermonde matrix. If det V is invertible, there is a unique solution.
From now on we assume A = R. The expression (2.1) of the Vandermonde determinant shows
that as soon as the x_i are pairwise distinct, there is a unique solution. We now discuss several ways
to compute the interpolation polynomial.
Linear algebra. We could invert the system (2.2) using standard linear algebra algorithms. This
takes O(n³) operations using Gaussian elimination. In theory, the best known complexity bound
is currently O(n^ω) where ω ≈ 2.3727 (Williams). In practice, Strassen's algorithm yields a cost of
O(n^{log₂ 7}). There are issues with this approach, though:
the problem is ill-conditioned: a small perturbation on the y_i leads to a significant perturb-
ation of the solution;
we can do better from the complexity point of view: O(n²) or even O(n log^{O(1)} n) in general,
O(n log n) if the x_i are so-called Chebyshev nodes.
Newton's approach computes p_{n+1} incrementally from p_n, writing p_{n+1} = p_n + a_{n+1}(x − x₀)⋯(x − x_n), so that
    p_{n+1}(x_j) = y_j, 0 ≤ j ≤ n,
    p_{n+1}(x_{n+1}) = p_n(x_{n+1}) + a_{n+1} (x_{n+1} − x₀) ⋯ (x_{n+1} − x_n).
Given y₀, ..., y_k, we denote by [y₀, ..., y_k] the corresponding a_k (the divided difference). Then, we can compute a_k using
the relation
    [y₀, ..., y_{k+1}] = ([y₁, ..., y_{k+1}] − [y₀, ..., y_k]) / (x_{k+1} − x₀).
This leads to a tree of divided differences, with the y_i at the leaves and [y₀, ..., y_n] at the root.
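A minimal Maple sketch of the resulting in-place computation of the Newton coefficients (traversing the tree row by row):

newton_coeffs := proc(X::list, Y::list)
  local n, a, i, k;
  n := nops(X);
  a := Array(1 .. n, Y);
  for k to n - 1 do
    for i from n by -1 to k + 1 do
      a[i] := (a[i] - a[i-1]) / (X[i] - X[i-k]);   # divided difference of order k
    od;
  od;
  convert(a, list)
end proc:
> newton_coeffs([0, 1, 2], [1, 2, 5]);   # [1, 1, 1]: p = 1 + x + x*(x-1) = x^2 + 1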
Lagrange's formula. Define L_j(x) = ∏_{i≠j} (x − x_i)/(x_j − x_i). Then we have deg L_j = n and L_j(x_i) = δ_{i,j} for all 0 ≤ i, j ≤ n. The polynomials L_j, 0 ≤ j ≤ n, form
a basis of R_n[x], and the interpolation polynomial p can be written
    p(x) = Σ_{i=0}^n y_i L_i(x).
In short, we should never use equidistant points when approximating a function by interpola-
tion. Are there better choices? We discuss this below, starting with the following analogue of the Taylor–Lagrange
formula.
Theorem 2.19. Let a < x₀ < ⋯ < x_n < b, and let f ∈ C^{n+1}([a, b]). Let p ∈ R_n[x] be such that
f(x_i) = p(x_i) for all i. Then, for all x ∈ [a, b], there exists ξ_x ∈ (a, b) such that
    f(x) − p(x) = (f^{(n+1)}(ξ_x)/(n+1)!) W(x),   W(x) = ∏_{i=0}^n (x − x_i).
Proof. This is obvious when x ∈ {x_i}. Assuming x ∉ {x_i}, let φ = f − p − λW where λ is chosen
so that φ(x) = 0. Then, we have φ(x_i) = 0 for all i, and by Rolle's theorem there exist n+1 points
y₁ < ⋯ < y_{n+1} with φ′(y_i) = 0. Iterating the argument, there exists ξ ∈ [a, b] such that φ^{(n+1)}(ξ) = 0.
Now recall that the polynomial W is monic and has degree n + 1, while the polynomial p has degree at
most n: this implies W^{(n+1)}(ξ) = (n+1)! and p^{(n+1)}(ξ) = 0, which yields the result. □
This result encourages us to search for families of x_i which make ‖W‖∞ as small as possible.
It's time for us to introduce Chebyshev polynomials.
Assume [a, b] = [−1, 1]. The n-th Chebyshev polynomial of the first kind is defined by
    T_n(cos t) = cos(n t),   t ∈ [0, 2π].
The T_n can also be defined by
    T₀(x) = 1, T₁(x) = x, T_{n+2}(x) = 2x T_{n+1}(x) − T_n(x), n ∈ N.
Among their numerous nice features, there is the following result, which suggests considering a
certain family of interpolation nodes.
For all n > 0, we have (d/dx) T_n = n U_{n−1}, where U_{n−1} is a Chebyshev polynomial of the second kind.
So the extrema of T_n are −1, 1 and the zeros of U_{n−1}, that is,
    η_k = cos(kπ/n), k = 0, ..., n,
called the Chebyshev nodes of the second kind. With W(x) = 2^{−n+1}(x² − 1) U_{n−1}(x), we have
    ‖W‖∞ = 2^{−n+1}.
It is obvious that deg T_n = deg U_n = n for all n ∈ N. Therefore, in particular, the family (T_k)_{0≤k≤n}
is a basis of R_n[x]. In the sequel of the chapter, we give results that allow for the (fast) computation
of the coefficients of interpolation polynomials, at the Chebyshev nodes, expressed in the basis
(T_k)_{0≤k≤n}.
Proposition 2.22.
i. If p_{1,n} = Σ_{0≤i≤n} c_{1,i} T_i ∈ R_n[x] interpolates f on the set {μ_k : 0 ≤ k ≤ n} of zeros of T_{n+1}, then
    c_{1,i} = (2/(n+1)) Σ_{k=0}^n f(μ_k) T_i(μ_k).
ii. Likewise, if p_{2,n} = Σ_{0≤i≤n} c_{2,i} T_i interpolates f at {η_k : 0 ≤ k ≤ n}, then
    c_{2,i} = (2/n) Σ_{k=0}^n f(η_k) T_i(η_k).
Proof. Exercise.
Recall that the polynomials T_k satisfy T_{k+2}(x) = 2x T_{k+1}(x) − T_k(x). A first idea would be to
use this relation to compute the T_k(t) that appear in the sum. Unfortunately, this method is
numerically unstable. This is related to the fact that the U_k(x) satisfy the same recurrence but
grow faster: we have
    ‖T_k‖∞ = 1,   ‖U_k‖∞ = k + 1.
Clenshaw's algorithm below does better.
Algorithm 2.2
Input. Chebyshev coefficients c₀, ..., c_N, a point t.
Output. Σ_{k=0}^N c_k T_k(t).
1. b_{N+1} ← 0, b_N ← c_N
2. for k = N−1, N−2, ..., 1
a. b_k ← 2 t b_{k+1} − b_{k+2} + c_k
3. return c₀ + t b₁ − b₂
22 Polynomial approximations
The sum simplifies to c₀ + b₁ T₁(t) + b₂ (T₂(t) − 2t T₁(t)) = c₀ + t b₁ − b₂ using the recurrence relation and the values of b_N,
b_{N+1}.
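In Maple, Algorithm 2.2 reads as follows (a direct transcription; the coefficient list is 1-indexed):

clenshaw := proc(c::list, t)
  local N, b1, b2, b, k;
  N := nops(c) - 1;                          # c = [c_0, ..., c_N]
  b2 := 0; b1 := `if`(N >= 1, c[N+1], 0);    # b_{N+1} and b_N
  for k from N - 1 by -1 to 1 do
    b := 2*t*b1 - b2 + c[k+1];
    b2 := b1; b1 := b;
  od;
  `if`(N = 0, c[1], c[1] + t*b1 - b2)
end proc:
> clenshaw([0, 0, 0, 1], 0.3);   # T_3(0.3) = 4*0.3^3 - 3*0.3 = -0.792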
So the c_j are (up to scaling) the real part of the discrete Fourier transform of the y_k.
The DFT is the map Y ↦ V Y, where ω = e^{2iπ/M} and V = Vandermonde(1, ω, ..., ω^{M−1}). We
have V̄ V = M Id, hence the DFT is almost its own inverse. The DFT sends the coefficient
vector of a polynomial P = Σ_{n=0}^{M−1} y_n x^n to its values P(1), P(ω), ..., P(ω^{M−1}).
Assume that M = 2m is even; then ω^m = −1. Rewrite P as
    P(x) = Q₀(x)(x^m − 1) + R₀(x) = Q₁(x)(x^m + 1) + R₁(x)
with deg R₀, deg R₁ < m. Then
    P(ω^ℓ) = R₀(ω^ℓ) for ℓ even,   P(ω^ℓ) = R₁(ω^ℓ) for ℓ odd.
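Iterating this splitting gives the FFT; here is a minimal recursive Maple sketch (values are returned with the even powers of ω first, matching the splitting; call it with w = exp(2*Pi*I/M)):

fft_eval := proc(P, x, M, w)
  local m, R0, R1;
  if M = 1 then return [eval(P, x = 1)] end if;
  m := M/2;
  R0 := rem(P, x^m - 1, x);    # P(w^l) = R0(w^l) for even l
  R1 := rem(P, x^m + 1, x);    # P(w^l) = R1(w^l) for odd l
  # w^2 is a primitive m-th root of unity; for odd l write w^l = w*(w^2)^j
  [op(fft_eval(R0, x, m, w^2)),
   op(fft_eval(expand(eval(R1, x = w*x)), x, m, w^2))]
end proc:
> fft_eval(1 + x + x^2 + x^3, x, 4, I);   # [4, 0, 0, 0]: values at 1, -1, I, -I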
3.1.1 Definition
Notation 3.1. K denotes a field, K[[x]] the ring of formal power series with coefficients in K,
and K((x)) the field of fractions of K[[x]], that is, the field of formal Laurent series. Observe that
K((x)) is an algebra over K(x).
Definition 3.2. A formal power series A ∈ K[[x]] is called differentially finite (abbreviated D-
finite) when its derivatives A, A′, A″, ... span a finite-dimensional vector subspace of K((x))
regarded as a vector space over K(x).
In other words, there exist polynomials p₀(x), ..., p_r(x) in K[x] such that A satisfies a linear
differential equation of the form
    p₀(x) A^{(r)}(x) + ⋯ + p_{r−1}(x) A′(x) + p_r(x) A(x) = 0.
Example 3.3. Rational functions are D-finite, and so are the classical exp, ln, sin, cos, sinh, cosh,
arcsin(h), arccos(h), arctan(h), as well as many special functions like Bessel's J, I, K, Y, Airy's Ai
and Bi, the integral sine (Si), cosine (Ci) and exponential (Ei), and many more.
Our point of view on these objects is that differential equations will serve as a data structure
to work with D-finite series.
3.1.2 Translation
A sequence (a_n) is called P-recursive when it satisfies a linear recurrence relation with polynomial
coefficients, of the form
    p₀(n) a_{n+r} + ⋯ + p_r(n) a_n = 0.
Theorem 3.5. A formal power series is D-finite if and only if its sequence of coefficients is
P-recursive.
Proof. We have the following dictionary (actually a ring morphism in a suitable setting):
    f(x)    ↔ f_n
    f′(x)   ↔ (n+1) f_{n+1}
    x f(x)  ↔ f_{n−1}
    x f′(x) ↔ n f_n.
By combining these rules, we can translate any monomial x^i f^{(j)}(x) (resp. n^i f_{n+j}), and hence any
linear differential equation/recurrence. □
Example 3.6. The differential equation y′ = y that defines exp translates into (n+1)y_{n+1} = y_n,
which defines 1/n!.
Example 3.7. The orders of the linear recurrence and of the differential equation do not necessarily
match. For instance, the first-order equation y′ − x^{k−1} y = 0, which defines exp(x^k/k), translates into the
linear recurrence (n+1) y_{n+1} − y_{n−k+1} = 0 of order k. This recurrence has a vector space of
solutions of dimension k, but only a subspace of dimension 1 corresponds to the solutions of the
linear differential equation. This subspace can be isolated by paying attention to the initial values
during the translation. Here, the identities y₁ = ⋯ = y_{k−1} = 0 also come out of the translation.
Example 3.8. Assume we want to compute the coefficient of x^{1000} in
    p(x) = (1 + x)^{1000} (1 + x + x²)^{500}.
A naive way of doing it would be to expand the polynomial. However, observing that
    p′(x)/p(x) = 1000/(1 + x) + 500 (2x + 1)/(1 + x + x²),
yields a linear differential equation (LDE) of order 1 for p, with coefficients of degree 3:
Maple 7] p:=(1+x)^1000*(1+x+x^2)^500:
deq:=numer(diff(y(x),x)/y(x)-diff(p,x)/p);
    (1 + x)^999 (x² + x + 1)^499 ((x³ + 2x² + 2x + 1) (d/dx)y(x) − (2000x² + 2500x + 1500) y(x))
This equation then translates into a linear recurrence equation (LRE) of order 3 with linear
coefficients:
Maple 8] gfun:-diffeqtorec({%,y(0)=1},y(x),u(n));
    {(n − 2000) u(n) + (2n − 2498) u(n + 1) + (2n − 1496) u(n + 2) + (n + 3) u(n + 3),
    u(0) = 1, u(1) = 1500, u(2) = 1124750}
Then it suffices to unroll this recurrence. A fast way of doing so is presented in §3.4.
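Unrolling can be done with a simple loop (a sketch; gfun's rectoproc automates this kind of transcription):

> u := Array(0..1000):
> u[0] := 1: u[1] := 1500: u[2] := 1124750:
> for n from 0 to 997 do
>   u[n+3] := -((n-2000)*u[n] + (2*n-2498)*u[n+1] + (2*n-1496)*u[n+2])/(n+3)
> od:
> u[1000];   # the coefficient of x^1000 in p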
Example 3.11. A computation-free proof that sin² + cos² = 1. Both sin and cos are defined by
y″ + y = 0. If f is a solution of this equation, then as in the previous example, f² is a solution of a
linear differential equation of order at most 3, all derivatives being generated by (f², f f′, f′²). Thus
both sin² and cos² satisfy the same linear differential equation of order at most 3, and therefore so
does their sum. Next, 1 is a solution of a linear differential equation of order 1, namely y′ = 0, and
thus the sum sin² + cos² − 1 and its derivatives live in a vector space of dimension at most 4; hence
it is a solution of a linear differential equation of order at most 4. Checking that the desired solution
of this equation is exactly 0 reduces to checking 4 initial conditions, i.e., the proof is summarized by
    sin²x + cos²x − 1 = O(x⁴) ⟹ sin² + cos² = 1.
Note that the actual value of the differential equation is not important here; only its order has
been used.
Note also that one can simplify the argument further (and reduce the order accordingly) if one
takes into account the fact that sin′ = cos. Then we consider h = f² + f′² − 1, whose derivative is
h′ = 2ff′ + 2f′f″ = 2f′(f + f″) = 0, and thus checking the value at 0 is sufficient.
Proof. This is a combination of the previous results: since f and g are D-finite, their sequences
of coefficients (f_n) and (g_n) are P-recursive; then their product (f_n g_n) is P-recursive too by the
previous theorem, and finally the generating series of this product is D-finite. □
Example 3.13. Mehler's identity for Hermite polynomials. Hermite polynomials are defined by
    Σ_{n≥0} H_n(x) zⁿ/n! = exp(z(2x − z)).
Mehler's identity asserts that
    Σ_n H_n(x) H_n(y) zⁿ/n! = exp( 4z(xy − z(x² + y²)) / (1 − 4z²) ) / √(1 − 4z²).
This can be proved easily by noticing that the left-hand side is nothing but
    Σ_{n=0}^∞ aₙ bₙ n! zⁿ,
where aₙ and bₙ are the coefficients of exp(z(2x − z)) and exp(z(2y − z)) respectively,
As a simple consequence, the series expansions of algebraic functions can be computed in only
linear (i.e., optimal) arithmetic complexity.
Proof. Without loss of generality, we can assume that P and P_y = ∂P/∂y are coprime. From the
equation P(x, A(x)) = 0, we get by differentiation
    P_x(x, A(x)) + P_y(x, A(x)) A′(x) = 0.
We then invert P_y mod P. Let U, V ∈ K(x)[y] be the cofactors in Bézout's identity, that satisfy
U P_y + V P = 1. Then multiplying the previous equation by U(x, A(x)) leads to
    U(x, A(x)) P_x(x, A(x)) + (1 − V(x, A(x)) P(x, A(x))) A′(x) = 0,
where P(x, A(x)) = 0. The factor of A′ is exactly 1 (this was the aim of the inversion modulo P) and the first term is a
polynomial evaluated at A(x), which is therefore equal to the evaluation of its remainder in the
Euclidean division by P. Denoting by δ the degree of P with respect to y, we obtain
    A′(x) = R₁(x, A(x)), with deg_y R₁ < δ.
Differentiating once more leads to
    A″(x) = R₁,x(x, A(x)) + R₁,y(x, A(x)) R₁(x, A(x)) = R₂(x, A(x)), with deg_y R₂ < δ.
Hence the bit size of n! is Θ(n log n) (and thus we cannot hope to compute it in fewer than Θ(n) bit
operations). Similarly, 1!, 2!, ..., n! taken together have size Θ(n² log n).
In the naive algorithm (compute n! by successive multiplications by k, k = 2, ..., n), the
multiplication k × (k−1)! can be done by decomposing (k−1)! into k chunks of roughly log k bits.
The cost of the multiplication is then O(k log k loglog k logloglog k), and the total complexity
    O(n² log n loglog n logloglog n),
which is not too bad if we need all the values 1!, 2!, ..., n!.
If however we want to compute n! alone, we can do much better. Define
    P(a, b) = (a + 1)(a + 2) ⋯ b.
Compute n! as
    n! = P(0, n) = (1 · 2 ⋯ n/2) × ((n/2 + 1) ⋯ n) = P(0, n/2) P(n/2, n)
and recurse. The key observation is that P(0, n/2) has size half of that of n! (by Stirling's formula)
and therefore so does the second factor. Assuming for simplicity that n is a power of 2, the binary
complexity can be bounded as follows:
    C(0, n) = C(0, n/2) + C(n/2, n) + M_Z((n/2) log n)
            ≤ M_Z((n/2) log n) + 2 C(n/2, n)
            ≤ M_Z((n/2) log n) + 2 M_Z((n/4) log n) + 4 C(3n/4, n),
              where M_Z((n/2) log n) + 2 M_Z((n/4) log n) ≤ 2 M_Z((n/2) log n),
            ≤ ⋯ ≤ 2 M_Z((n/2) log n) log n = O(n log³ n loglog n).
In the second line, we use the fact that the factors increase; in the third line, we iterate the
inequality once and use the convexity of the multiplication function M_Z; in the last one, the bound
log n on the number of recursive steps.
Finally, we have obtained n! in quasi-optimal bit complexity.
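A minimal Maple sketch of this product tree (the base-case threshold 8 is an arbitrary choice):

Pfact := proc(a, b)
  # P(a, b) = (a+1)*(a+2)*...*b, computed by splitting the range in two halves
  local m;
  if b - a <= 8 then return mul(k, k = a+1 .. b) end if;
  m := iquo(a + b, 2);
  Pfact(a, m) * Pfact(m, b)
end proc:
> evalb(Pfact(0, 100) = 100!);   # true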
Theorem 3.17. Assume that the recurrence (3.1) is nonsingular (0 ∉ p₀(N)), that all p_i have
degree bounded by d and integer coefficients bounded by 2^λ, and that u₀, ..., u_{r−1} ∈ Q have numer-
ators and denominators bounded by 2^λ. Then one can compute u_N in O(M_Z(N(λ + d log N)) log N)
bit operations.
Additionally, we have 0 < e − e_n < 1/(n·n!) for all n, hence e − e_n < 10^{−N} for n = O(N/log N). Now
we obtain a linear recurrence of order 2 satisfied by (e_n):
    e_{n+1} − e_n = 1/(n+1)! = (1/(n+1)) (e_n − e_{n−1}).
By Theorem 3.17, it follows that e can be computed within precision 10^{−N} in O(N log² N loglog N)
bit operations. (This gives us a huge rational number close to e. To get a binary/decimal expansion,
there remains to do a division. The inverse of the denominator can be computed efficiently by
Newton's method in O(M_Z(N)) bit operations.)
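Concretely, the partial sums e_n = Σ_{k=0}^n 1/k! can be evaluated exactly by the same splitting idea; a minimal sketch (our own formulation of the split):

esplit := proc(a, b)
  # returns [T, Q] with T/Q = sum( 1/((a+1)*(a+2)*...*k), k = a+1 .. b )
  local m, l, r;
  if b - a = 1 then return [1, b] end if;
  m := iquo(a + b, 2);
  l := esplit(a, m); r := esplit(m, b);
  [l[1]*r[2] + r[1], l[2]*r[2]]          # combine the two halves
end proc:
> v := esplit(0, 30):                    # 1 + v[1]/v[2] = e_30 as an exact rational
> evalf[40](1 + v[1]/v[2] - exp(1));     # about -1.3e-34, as predicted by 1/(n*n!)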
Example 3.18. All recent records in the computation of π were obtained using basically the same
technique, starting from the following series due to the Chudnovskys:
    1/π = (12/C^{3/2}) Σ_{n=0}^∞ (−1)ⁿ (6n)! (A + Bn) / ((3n)! n!³ C^{3n}),
with A = 13,591,409, B = 545,140,134, C = 640,320. This series yields about 14 digits per term.
That alone is not sufficient to reach a good efficiency, which is achieved by observing that the
summand satisfies a linear recurrence of first order and applying binary splitting.
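A quick Maple check of the convergence rate (three terms already give more than 40 digits; the lowercase c avoids clashing with other names):

> A := 13591409: B := 545140134: c := 640320:
> S := add((-1)^n*(6*n)!*(A+B*n)/((3*n)!*n!^3*c^(3*n)), n = 0..2):
> evalf[45](c^(3/2)/(12*S) - Pi);   # error below 10^(-40)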
Theorem 3.19. (Cauchy's theorem) If z₀ ∈ C is such that 0 ∉ a₀(D(z₀, R)), then, for any y₀, ...,
y_{r−1}, there exists a solution of Eq. (3.2) such that y^{(i)}(z₀) = y_i for 0 ≤ i < r and that is analytic
in D(z₀, R).
Since the coefficients of Y are dominated by those of u, which is analytic, the series Y is convergent
for |z − z₀| < ρ. And since we can do this for any ρ < R, the function Y is analytic in D(z₀, R).
The proof also yields bounds on the tails of the series expansion of Y(z). Indeed, we have
    |Σ_{n≥N} Y_n (z − z₀)ⁿ| ≤ Σ_{n≥N} ŷ_n |z − z₀|ⁿ
where
    ŷ_n = ŷ₀ ((α+1)_n / n!) ρ^{−n},
so that
    Σ_{n≥N} ŷ_n |z − z₀|ⁿ ≤ ŷ_N |z − z₀|^N (1 + ((1 + α + N)/(1 + N)) (|z − z₀|/ρ) + ⋯).
The series on the right-hand side is convergent, say its sum is bounded by M. Now, in order to
ensure
    |Σ_{n≥N} Y_n (z − z₀)ⁿ| ≤ 10^{−k},
it is sufficient to take
    k log 10 ≤ N log(ρ/|z − z₀|) + log(...) + cst,
i.e.
    N ≥ k (log 10)/log(ρ/|z − z₀|) + cst.
Combining this bound with the binary splitting method, we obtain the following theorem.
Corollary 3.20. When y_i ∈ Q, a_i ∈ Q[z], ζ ∈ Q ∩ D(z₀, R), all with numerators and denominators
bounded by 2^λ, then y(ζ) can be computed at precision 10^{−N} in
    O(N log² N loglog N)
bit operations, the constant factor depending on r, λ and log(R/|ζ − z₀|).
Proof. Consider the solutions Y^{[i]} with initial conditions all 0 except in the i-th coordinate, which is 1,
for 1 ≤ i ≤ r, and form the matrix W whose i-th column is (Y^{[i]}, (Y^{[i]})′, ..., (Y^{[i]})^{(r−1)}).
Clearly, such a matrix satisfies W′ = A W and any solution can be written W C with C a
constant vector.
Definition 3.23. The transition matrix between z₀ and z₁ ∈ D(z₀, R) is the matrix M(z₀ → z₁) such that
    W(z₁) = M(z₀ → z₁) W(z₀).
This matrix is well defined since W(z₀) is invertible. The fundamental matrix itself has a radius
of convergence and, by Cauchy's theorem, the solution can be extended to an analytic continuation
inside this new disk. Proceeding in this manner, one constructs a path (z₀, z₁, ..., z_k) and transition
matrices M(z₀ → z₁), M(z₁ → z₂), ..., M(z_{k−1} → z_k) whose product (in the right order) constructs
the analytic continuation of the fundamental matrix along that path. Each of these matrices can
be computed efficiently by binary splitting if one takes for z_i points with rational coordinates
bounded by 2^λ in the notation of the previous section. With a bit more effort, the cumulated error
is bounded by the product of the norms ‖M(z_i → z_{i+1})‖, which can be controlled.
Proposition 3.24. For any ζ ∈ D(z₀, R), the value y(ζ) can be computed at precision 2^{−N} in
O(N log³ N loglog N) binary operations.
Using Horner's scheme, evaluation of a polynomial of degree 8 uses 8 multiplications and 8 addi-
tions. Evaluation of a deg 4 / deg 4 rational fraction uses 8 multiplications, 8 additions, plus one
(expensive!) division. Rational approximation makes little sense in this case.
4.2.1 Existence
Proposition 4.1. To each function f ∈ C([a, b]), there corresponds at least one best rational
approximation in R_{m,n}.
Proof. By analogy with the polynomial case, we might be tempted to consider the set
    {R ∈ R_{m,n} : ‖R − f‖∞ ≤ 2 ‖f‖∞}.
This is a nonempty, closed and bounded set. It is not compact, though (remember that R_{m,n} is
not a finite-dimensional vector space over R, unlike R_n[X]). To illustrate this, simply consider the
sequence of continuous functions R_k(x) = 1/(kx + 1), x ∈ [0, 1], k ∈ N. For any k ∈ N, ‖R_k‖∞ ≤ 1, but the
function R defined as lim_{k→+∞} R_k is not continuous since R(0) = 1 and R(x) = 0 otherwise.
Theorem 4.3. (Achieser, 1930) Let f ∈ C([a, b]). A rational function R ∈ R_{m,n} is a best
approximation to f if and only if R − f equioscillates between at least m + n + 2 − d(R) extreme
points. There is a unique best approximation.
Remark 4.4. There is again a Remez algorithm for computing best rational approximations, with
the same rate of convergence as in the polynomial case.
4.3.1 Introduction
Let K be a field, and let f ∈ K[[x]]. For m ∈ N, the degree-m Taylor approximant of f is the unique
p_m ∈ K_m[x] such that
    f(x) − p_m(x) = 0 mod x^{m+1}.
If now f is C^{m+1} in the neighborhood of 0 (instead of f being a formal series), the analogous
condition is f(x) − p_m(x) = O(x^{m+1}).
To extend this to rational functions, given f ∈ K[[x]], we would like to determine R = P/Q
(P ∈ K_m[x], Q ∈ K_n[x]) such that
    f(x) − R(x) = 0 mod x^{m+n+1}.   (4.1)
Here again, we may also consider f ∈ C^{m+n+1}, in which case we ask that
    f(x) − R(x) = O(x^{m+n+1}).
In contrast with the case of Taylor approximation, it is not always possible to satisfy (4.1).
If we consider instead the problem of finding P ∈ K_m[x] and Q ∈ K_n[x] such that
    Q(x) f(x) − P(x) = 0 mod x^{m+n+1},
it always has a nontrivial solution: think of it as a linear algebra problem; the homogeneous
linear system has (n+1) + (m+1) = n + m + 2 unknowns, which are the coefficients of P and Q, and
n + m + 1 equations. Actually this linear system is given by a so-called Toeplitz matrix of dimension
(m + n + 1) × (n + 1). It is a structured matrix for which fast inversion algorithms exist, with the
same costs as the ones given in Remark 4.13. Nevertheless, we favour another presentation of the
problem.
Remark 4.7. Since T is invertible, we feel like considering the problem of finding (R, T) ∈ K[x]²
such that
P2. deg R < k, deg T ≤ N − k, and R = P T mod M.
This condition is strictly weaker than the condition in P1.
Algorithm 4.1
Euclidean division.
Input. Two nonzero polynomials A = Σ_{i=0}^n a_i x^i and B = Σ_{i=0}^m b_i x^i ∈ K[x].
Output. The couple (Q, R) ∈ K[x]² such that A = B Q + R and deg R < m.
1. If n < m, return Q = 0 and R = A.
2. R := A, u := b_m^{−1}.
3. for i = n−m, n−m−1, ..., 0, do
4.   if deg R = m + i then q_i := lc(R) u, R := R − q_i X^i B
     else q_i := 0
5. Return Q = Σ_{0≤i≤n−m} q_i X^i and R.
Proposition 4.8. (Euclidean division) Let n, m ∈ N, A and B ∈ K[x] with deg A = n and deg B = m.
1. The couple (Q, R) ∈ K[x]² such that A = B Q + R and deg R < m is unique.
2. Algorithm 4.1 returns the correct output.
Proof. 1. Let (Q₁, R₁) and (Q₂, R₂) ∈ K[x]² be such that A = B Q₁ + R₁ and deg R₁ < m,
A = B Q₂ + R₂ and deg R₂ < m. Therefore, we have B (Q₁ − Q₂) = R₂ − R₁. The polynomial
B divides R₂ − R₁ and deg B > deg(R₂ − R₁): necessarily, Q₁ − Q₂ = 0 and consequently
R₁ − R₂ = 0.
2. By construction, the final R satisfies R = A − Q B. Moreover, for all i = n−m, n−m−1, ...,
0, if deg R = m + i, then deg(R − q_i X^i B) ≤ m + i − 1 since lc(R) = lc(q_i X^i B) by
construction. Therefore, we have deg R ≤ m + i − 1 after step 4. We get by induction that
the degree of the final R is less than m. We proved the uniqueness in 1.
3. Step 2 requires an inversion. Step 4 requires m + 1 multiplications (q_i b_j for j = 0, ..., m − 1)
and m subtractions (for j = 0, ..., m − 1, if r_{i+j} denotes the coefficient of X^{i+j} in R, we
perform the subtractions r_{i+j} − q_i b_j for j = 0, ..., m − 1). This last step is executed n − m + 1
times, which yields the result. □
Algorithm 4.2
Extended Euclidean algorithm.
Input. Two nonzero polynomials A = Σ_{i=0}^m a_i x^i and B = Σ_{i=0}^n b_i x^i.
Output. ℓ ∈ N, four tuples of polynomials (Q_i)_{1≤i≤ℓ}, (R_i)_{0≤i≤ℓ+1}, (S_i)_{0≤i≤ℓ+1}, (T_i)_{0≤i≤ℓ+1}.
1. Set
R₀ := A, S₀ := 1, T₀ := 0,
R₁ := B, S₁ := 0, T₁ := 1,
i := 1.
2. While R_i ≠ 0 do
a. Let Q_i, R_{i+1} be the quotient and the remainder in the Euclidean division of R_{i−1} by R_i.
b. Set S_{i+1} := S_{i−1} − Q_i S_i.
c. Set T_{i+1} := T_{i−1} − Q_i T_i.
d. i := i + 1.
3. Return ℓ = i − 1, (Q_i)_{1≤i≤ℓ}, (R_i)_{0≤i≤ℓ+1}, (S_i)_{0≤i≤ℓ+1}, (T_i)_{0≤i≤ℓ+1}.
Proof.
1. By induction on i. For i = 0, it's the first step of the algorithm. We now assume i ≥ 1; writing
W_i = ( 0 1 ; 1 −Q_i ) and U_i = W_i ⋯ W₁ U₀ with U₀ the identity matrix, we compute
    U_i (A, B)ᵀ = W_i U_{i−1} (A, B)ᵀ = ( 0 1 ; 1 −Q_i ) (R_{i−1}, R_i)ᵀ = (R_i, R_{i−1} − Q_i R_i)ᵀ = (R_i, R_{i+1})ᵀ.
2. A straightforward induction yields the first equality U_i = (S_i T_i ; S_{i+1} T_{i+1}) which, combined with 1, implies S_i A +
T_i B = R_i for 0 ≤ i ≤ ℓ + 1.
3. We have, by definition, U_i = W_i ⋯ W₁ U₀. It follows det(U_i) = det(W_i) ⋯ det(W₁) det(U₀).
Since det U_i = S_i T_{i+1} − S_{i+1} T_i, det(W_j) = −1 for all j, and det U₀ = 1, we obtain
    S_i T_{i+1} − S_{i+1} T_i = (−1)^i.
4. Let i ∈ {0, ..., ℓ}. We deduce from 1 that
    (R_ℓ, 0)ᵀ = W_ℓ ⋯ W_{i+1} (R_i, R_{i+1})ᵀ.
It follows that R_ℓ is a linear combination over K[x] of R_i and R_{i+1}. Therefore gcd(R_i, R_{i+1})
divides R_ℓ. Moreover, det W_i = −1: the matrix W_i is invertible, of inverse
    W_i^{−1} = ( Q_i 1 ; 1 0 ).
Hence
    (R_i, R_{i+1})ᵀ = W_{i+1}^{−1} ⋯ W_ℓ^{−1} (R_ℓ, 0)ᵀ,
which implies that R_ℓ divides R_i and R_{i+1}. This shows that R_ℓ is a greatest common divisor
of R_i and R_{i+1} and gcd(R_i, R_{i+1}) = R_ℓ/lc(R_ℓ). This is true in particular for i = 0. □
Only the second statement will be useful to solve P2. We use the first one to prove Proposition
4.11.
Proof. We first observe that, from the initial assumption, deg R₀ > deg R₁ and, by construction,
deg R_i > deg R_{i+1} for 1 ≤ i ≤ ℓ. It follows that deg Q₁ > 0 and, for 2 ≤ i ≤ ℓ, deg Q_i > 0, since Q_j
is the quotient of the division of R_{j−1} by R_j for j = 1, ..., ℓ. Therefore we have, for 1 ≤ i ≤ ℓ,
deg(Q_i R_i) = deg Q_i + deg R_i ≥ deg R_i > deg R_{i+1}, hence deg R_{i−1} = deg(Q_i R_i + R_{i+1}) = deg(Q_i R_i),
i.e. deg Q_i = deg R_{i−1} − deg R_i for 1 ≤ i ≤ ℓ. We obtain Σ_{2≤j<i} deg Q_j = deg R₁ − deg R_{i−1} for all
2 ≤ i ≤ ℓ+1 and Σ_{1≤j<i} deg Q_j = deg R₀ − deg R_{i−1} for all 1 ≤ i ≤ ℓ+1.
We have S₂ = S₀ − Q₁S₁ = 1 and deg S₁ = −∞ < 0 = deg S₂. Let's assume that we proved
    deg S_{j−1} < deg S_j for all 2 ≤ j ≤ i   and   deg S_i = Σ_{2≤j<i} deg Q_j.
Then S_{i+1} = S_{i−1} − Q_i S_i and deg(Q_i S_i) = deg Q_i + deg S_i > deg S_i > deg S_{i−1}, hence
deg S_{i+1} = deg Q_i + deg S_i = Σ_{2≤j<i+1} deg Q_j,
if we apply the induction hypothesis again. This proves that the induction hypothesis also holds
for i + 1.
We have deg T₁ = 0 and T₂ = 0 − Q₁ = −Q₁, hence deg T₂ = deg Q₁.
The rest of the proof is identical to that of the first statement, since we can show
    deg T_i > deg T_{i−1}   and   deg T_i = Σ_{1≤j<i} deg Q_j.
The computation of the S_{i+1} = S_{i−1} − Q_i S_i is therefore bounded by
Σ_{2≤i≤ℓ} (2 (deg R_{i−1} − deg R_i) m + 2 m)
because deg R_{i−1} − deg R_i ≥ 1 for 2 ≤ i ≤ ℓ. This upper bound is 2m Σ_{2≤i≤ℓ} (deg R_{i−1} − deg R_i) +
2m(ℓ − 1) = 2m(deg R₁ − deg R_ℓ) + 2m(ℓ − 1) ≤ 4m² = O(nm), for we assumed n ≥ m.
Likewise, the computation of T_{i+1} = T_{i−1} − Q_i T_i requires at most 2 deg Q_i deg T_i + deg Q_i +
deg T_i + 1 arithmetic operations for the product Q_i T_i and deg T_{i+1} + 1 arithmetic operations for
the subtraction.
From Proposition 4.10, we deduce that the cost is no larger than the sum of n − m + 1 (for
i = 1) and
    Σ_{2≤i≤ℓ} (2 (deg R_{i−1} − deg R_i)(deg R₀ − deg R_{i−1}) + 2 (deg R₀ − deg R_i + 1)) ≤ Σ_{2≤i≤ℓ} (2 (deg R_{i−1} − deg R_i) n + 2 n),
since deg R_{i−1} − deg R_i ≥ 1 for 2 ≤ i ≤ ℓ. The cost is upper bounded by n − m + 1 +
2n Σ_{2≤i≤ℓ} (deg R_{i−1} − deg R_i) + 2n(ℓ − 1) = n − m + 1 + 2n(deg R₁ − deg R_ℓ) + 2n(ℓ − 1) ≤
n − m + 1 + 4nm ≤ nm + 4nm = 5nm as soon as m ≥ 1.
The total cost is in O(nm).
4.3.2.2 Solving the approximation problems P1 and P2
Let (R_i)_{0≤i≤ℓ+1}, (S_i)_{0≤i≤ℓ+1}, (T_i)_{0≤i≤ℓ+1} be the sequences of remainders and Bézout coefficients
constructed by the extended Euclidean algorithm applied to the couple (M, P). We have, see
Proposition 4.9,
    R_i = S_i M + T_i P = T_i P mod M
for all i ∈ {0, ..., ℓ+1}. We search for an i such that
    deg R_i < k and deg T_i ≤ N − k.
Proposition 4.10 states that deg T_i = N − deg R_{i−1}. Hence we want deg R_i < k ≤ deg R_{i−1}. The
sequence (deg R_i)_{0≤i≤ℓ+1} is strictly decreasing: we have deg R₀ = deg M = N > deg P = deg R₁ and
(deg R_i)_{1≤i≤ℓ+1} is always strictly decreasing. Since deg R₀ = N and deg R_{ℓ+1} = −∞, the integer i
exists: it is the smallest index j such that deg R_j < k. The couple (R_j, T_j) is a solution to P2.
We now focus on P1. We've just found R_j and T_j such that deg R_j < k, deg T_j ≤ N − k and
R_j = T_j P mod M. If gcd(R_j, T_j) = 1, then gcd(M, T_j) = 1 as well (a common divisor of M and T_j
would divide R_j), and the couple (R_j, T_j) is a solution to P1!
Conversely, assume that the couple (R, T) is a solution to P1, with gcd(R, T) = 1. Let S ∈ K[x]
be such that R = S M + T P. We also have S_j M + T_j P = R_j. If we suppose that S_j T ≠ S T_j, we get
M = (R_j T − R T_j)/(S_j T − S T_j). Therefore, N = deg M = deg(R_j T − R T_j) − deg(S_j T − S T_j).
Moreover,
    deg(R_j T − R T_j) ≤ max(deg(R_j T), deg(R T_j)) = max(deg R_j + deg T, deg R + deg T_j)
    ≤ max(k − 1 + N − k, k − 1 + N − k)
    < N,
from the definition of j and condition P1. Therefore, N = deg M ≤ deg(R_j T − R T_j) < N:
contradiction.
Hence, S_j T = S T_j, and we deduce T_j | S_j T. We know from Proposition 4.9 that S_j and T_j are coprime. This
implies that there exists λ ∈ K[x] such that T = λ T_j, from which follows S T_j = S_j T = λ S_j T_j. As
T_j is nonzero, we also obtain S = λ S_j. Finally, R = S M + T P = λ(S_j M + T_j P) = λ R_j. We've
just proved the following results.
Theorem 4.12.
1. There exists a solution (R, T) ≠ (0, 0) to P2, namely (R, T) = (R_j, T_j). If, moreover,
gcd(R_j, T_j) = 1, then (R_j, T_j) is also a solution to P1.
2. If P1 has a solution R/T with gcd(R, T) = 1, then there exists λ ∈ K \ {0} such that
R = λ R_j and T = λ T_j.
The problem P1 has a solution if and only if gcd(R_j, T_j) = 1, in which case (R_j, T_j) is a solution to P1.
Exercise 4.2. How should the extended Euclidean algorithm be modified to answer P1 and, in particular, the Padé
approximation problem?
Remark 4.13. The cost of the computation of a solution to P1 or P2 is essentially the cost of the
extended Euclidean algorithm. Proposition 4.11 tells us that this cost is at most O(N²) arith-
metic operations. Using a fast Euclidean algorithm [27], one can reduce this cost to O(M(N) log N)
arithmetic operations.
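For illustration, here is a minimal Maple sketch of the Padé computation via the halted extended Euclidean algorithm (our own transcription of the discussion above; Maple's numapprox package also provides a built-in pade command):

pade_ee := proc(f, x, m, n)
  local N, P, r0, r1, t0, t1, q;
  N := m + n + 1;
  P := convert(series(f, x, N), polynom);   # f mod x^N
  r0 := x^N; t0 := 0;                       # R_0 = M, T_0 = 0
  r1 := P;   t1 := 1;                       # R_1 = P, T_1 = 1
  while degree(r1, x) > m do                # stop at the first deg R_j <= m
    q := quo(r0, r1, x);
    r0, r1 := r1, expand(r0 - q*r1);
    t0, t1 := t1, expand(t0 - q*t1);
  od;
  normal(r1/t1)
end proc:
> pade_ee(exp(x), x, 2, 2);   # equals (12 + 6x + x^2)/(12 - 6x + x^2)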
Definition 4.14. A number α ∈ C is algebraic when there exists a nonzero polynomial P ∈ Q[x]
such that P(α) = 0. It is said to be transcendental otherwise.
Nothing was known about transcendence until Liouville proved in 1844 that, for a ∈ N, a ≥ 2,
    Σ_{n=0}^∞ a^{−n!} is transcendental.
The transcendence of this number is related to the very fast convergence of the series. This is also
illustrated in the proof of the irrationality of e that we now present.
Exercise 4.3. Try (and fail!) to extend this proof to the case of e^a, a ∈ N, a > 1.
Lemma 4.16. Let x ∈ R. If there exist two sequences (p_n) and (q_n) ∈ Z^N satisfying the conditions:
1. there exists n₀ ∈ N such that q_n x − p_n ≠ 0 for all n ≥ n₀,
2. lim_{n→+∞} (q_n x − p_n) = 0,
then x ∉ Q.
Proof. Exercise.
We want to determine two sequences (P_{m,n}) and (Q_{m,n}) ∈ Q[X], such that
    deg P_{m,n} ≤ m and deg Q_{m,n} ≤ n,   Q_{m,n}(z) e^z − P_{m,n}(z) = 0 mod z^{m+n+1}.   (4.4)
We first determine Q_{m,n}. If D denotes the operator d/dz, we have D(Q(z)e^z) = Q′(z)e^z + Q(z)e^z =
e^z (D + I) Q(z). Therefore, if we apply the operator (D + I)^{m+1} to the last equality of (4.4), it yields
an equation which is equivalent to (D + I)^{m+1} Q_{m,n}(z) = 0 mod zⁿ. Since deg Q_{m,n}(z) ≤ n, there exists
q_{m,n} ∈ Q such that (D + I)^{m+1} Q_{m,n}(z) = q_{m,n} zⁿ. Now recall that
    1/(1 + X)^{m+1} = Σ_{k=0}^{+∞} (−1)^k C(m+k, m) X^k.
ln(1 + x)
> plot([seq(app[i],i=1..K),f],x=-2..5,view=-1..3,scaling=constrained);
On this picture, we observe that the approximants do not converge for x ≤ −1 (where there is
nothing to converge to), but do seem to converge to f for the other real values of x. Note that
this is achieved by starting from a series that converges only for |x| < 1. Another phenomenon
that appears is that the approximants are alternately above and below the graph of the function.
This appears more clearly on a specific value, as does the speed of convergence:
> evalf(ln(3));
1.098612289
This gives the interval (1.098609, 1.098626), of width less than 2 × 10⁻⁵, containing ln(3). Observe
again that we have computed these good approximations starting from a power series that con-
verges only in the unit disk!
The case of tan is similar:
> S := series(tan(x), x, K+1);
    x + (1/3) x³ + (2/15) x⁵ + (17/315) x⁷ + (62/2835) x⁹ + O(x¹¹)
> eval([seq(app(i), i=1..10)], x=evalf(Pi));
Since the function tan is odd, Padé approximants are equal in pairs. Convergence still
holds, even beyond the disk of convergence of the power series they are constructed from. Here
is a picture illustrating this phenomenon with approximants constructed from the expansion at
order 20 of the series:
We now try something more unsettling: using Padé approximants to evaluate a divergent series!
> S := add((-1)^n*n!*x^n, n=0..K);
> evalf(subs(x=2,Int(exp(-t)/(1+t*x),t=0..infinity)));
0.4614553164
Again, the approximations seem to converge, and it turns out that the limiting value has a
meaning. In that case, the power series is the (divergent) asymptotic expansion of the following
function:
    ∫₀^∞ e^{−t}/(1 + x t) dt,
All these examples motivate a better study of Padé approximants and their convergence prop-
erties from the numerical point of view.
Starting from there and using the value π/3, Archimedes was able to compute the values up to k = 5,
corresponding to polygons with 96 edges! Let us first look at the values he must have obtained.
3.141592654
Increasing the precision shows that an accuracy of about 10 digits has been obtained, starting
from the same first 5 values!
We now review a few classical methods in acceleration of convergence, before relating them to
Padé approximants.
Euler's method. A first idea is the following. Assume that
    S_n = S + rⁿ ε(n)
for some known r < 1 and ε(n+1)/ε(n) → 1. Writing
    S_{n+1} = S + r^{n+1} ε(n+1),
we can see that
    T_n = (S_{n+1} − r S_n)/(1 − r) = S + (r^{n+1}/(1 − r)) ε(n+1) (1 − ε(n)/ε(n+1))
should converge faster, since the last factor tends to 0. This is the idea that was used by Huygens
in 1654, who used r = 1/2 to obtain the first estimate of π with 15 digits of accuracy.
Shanks' method. Shanks' formula is a generalization of Aitken's formula. Observe that Aitken's
formula rewrites as
    T_n^{(1)} = det( S_n S_{n+1} ; S_{n+1} S_{n+2} ) / Δ²S_n.
In a way, Aitken's method eliminates one unknown r. Shanks' method eliminates k of them
simultaneously. It is based on the more general
    T_n^{(k)} = det( S_n ⋯ S_{n+k} ; ⋮ ⋮ ; S_{n+k} ⋯ S_{n+2k} )
              / det( Δ²S_n ⋯ Δ²S_{n+k−1} ; ⋮ ⋮ ; Δ²S_{n+k−1} ⋯ Δ²S_{n+2k−2} ).
integration by the trapezoidal rule (in this context, Euler's acceleration scheme is called
Romberg's method).
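Aitken's Δ² formula is immediate to program; a minimal Maple sketch (partial sums of ln 2 as an arbitrary test sequence):

aitken := S -> [seq((S[n]*S[n+2] - S[n+1]^2)/(S[n+2] - 2*S[n+1] + S[n]),
                    n = 1 .. nops(S)-2)]:
> S := [seq(evalf(add((-1)^(k+1)/k, k=1..n)), n=1..12)]:
> S[12], aitken(S)[10], aitken(aitken(S))[8];   # each pass gains accuracy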
Wynn's ε-algorithm. Shanks' method seems to require the computation of large determinants,
which is computationally expensive and difficult to organize in a way so as to get successive
approximants efficiently. A simple method due to Wynn in 1956 proceeds as follows:
    ε_{−1}^{(n)} = 0,   ε_0^{(n)} = S_n,   ε_{k+1}^{(n)} = ε_{k−1}^{(n+1)} + 1/(ε_k^{(n+1)} − ε_k^{(n)}).
We admit the following result.
Proposition 5.2. Wynn's ε_{2k}^{(n)} is Shanks' T_n^{(k)}.
In other words, this method gives a way to compute (values of) Padé approximants without
any heavy linear algebra.
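A minimal Maple sketch of the ε-algorithm (only the even columns ε_{2k} approximate the limit; the odd ones are auxiliary):

wynn := proc(S::list)
  local e0, e1, e2, N, k, n;
  N := nops(S);
  e0 := [0 $ (N+1)];   # column eps_{-1}
  e1 := S;             # column eps_0
  for k to N - 1 do
    e2 := [seq(e0[n+1] + 1/(e1[n+1] - e1[n]), n = 1 .. N - k)];
    e0 := e1; e1 := e2;
  od;
  e1[1]   # eps_{N-1}^{(0)}; take N odd so that N-1 is even
end proc:
> wynn([seq(evalf(add((-1)^(j+1)/j, j=1..n)), n=1..9)]) - evalf(ln(2));  # tiny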
The general framework is that we are given a sequence (φ_k) of mappings from C̄ = C ∪ {∞} to
itself; we define
    Φ_n = φ₁ ∘ φ₂ ∘ ⋯ ∘ φ_n,
and study the limit of the sequence (Φ_n(c)) for an appropriate value of c.
In the case φ_n: t ↦ t + a_n, we recover Φ_n(0) = Σ_{i=1}^n a_i.
Similarly, for φ_n: t ↦ a_n t, we have Φ_n(1) = ∏_{i=1}^n a_i.
The notation defined above corresponds to φ_n: t ↦ a_n/(b_n + t), and then we have
    K_{i=1}^n (a_i/b_i) = Φ_n(0).
Definition 5.3. The n-th convergent of the continued fraction C = K_{n=1}^∞ (a_n/b_n) is by definition
Φ_n(0) = A_n/B_n. The elements a_n and b_n are called the n-th partial numerator and partial denom-
inator of C.
Consider the unit sphere S in R³ and let N be its north pole. Consider the stereographic projection that maps a point z ∈ C
to the intersection of S \ {N} and the line that goes through z and N, and maps ∞ to N.
The coordinates are given by
    x₁ = 2ℜz/(|z|² + 1),   x₂ = 2ℑz/(|z|² + 1),   x₃ = (|z|² − 1)/(|z|² + 1).
Lines in C correspond to circles (passing through the north pole) on the sphere. This can
be seen by observing that this circle is the intersection of the sphere with the plane passing
through the line and containing N.
Circles on the sphere correspond to either lines or circles in the plane. To see this, consider
the circle defined as the intersection of the sphere with the plane
    a₁ x₁ + a₂ x₂ + a₃ x₃ = b,
where without loss of generality we may assume a₁² + a₂² + a₃² = 1 and 0 ≤ b < 1. (The scalar
product of both vectors is smaller than the product of their norms; equality to 1 would
correspond to a circle reduced to one point, which we exclude.) Now, if the corresponding
point is z = x + i y, we have
    2 a₁ x + 2 a₂ y + a₃ (x² + y² − 1) = b (x² + y² + 1).
If b = a₃, this is the equation of a line (and the north pole (0, 0, 1) is on the circle). Otherwise,
we have a circle with a positive radius: the equation rewrites
    (x + a₁/(a₃ − b))² + (y + a₂/(a₃ − b))² = (a₃ + b)/(a₃ − b) + (a₁² + a₂²)/(a₃ − b)² = (1 − b²)/(a₃ − b)².
Convergents are compositions of Möbius transformations, and hence are themselves Möbius
transformations.
Theorem 5.4. For all n ≥ 1 and u ∈ C̄, we have
    Φ_n(u) = (A_n + A_{n−1} u)/(B_n + B_{n−1} u),
where A_n = b_n A_{n−1} + a_n A_{n−2} and B_n = b_n B_{n−1} + a_n B_{n−2}, with A_{−1} = 1, B_{−1} = 0, A₀ = 0, B₀ = 1.
Proof. It is a simple induction. For n = 1, we have Φ₁(u) = a₁/(b₁ + u). Thus in that case, the
property holds. If it holds up to n − 1, then a straightforward computation yields
    Φ_n(u) = Φ_{n−1}(φ_n(u)) = (A_{n−1} + A_{n−2} a_n/(b_n + u)) / (B_{n−1} + B_{n−2} a_n/(b_n + u))
           = (a_n A_{n−2} + b_n A_{n−1} + A_{n−1} u) / (a_n B_{n−2} + b_n B_{n−1} + B_{n−1} u). □
In other words, diagonal Pad approximants are obtained as convergents of continued fractions.
Proof. Using Theorem 5.4 with φ₁(w) = 1/(1/c₀^{[0]} + w) and φ_n(w) = z/(1/c₀^{[n−1]} + w) for n > 1
gives Φ₁(0) = c₀^{[0]} and the recurrences
    A_{n+2} = (1/c₀^{[n+1]}) A_{n+1} + z A_n,   B_{n+2} = (1/c₀^{[n+1]}) B_{n+1} + z B_n,   n ≥ 0.
Thus by induction starting from A₀ = 0, A₁ = c₀^{[0]}, B₀ = 1, B₁ = 1, we get
    deg A_{2n} ≤ n − 1, deg A_{2n+1} ≤ n, deg B_{2n} ≤ n, deg B_{2n+1} ≤ n,
so that these polynomials have the required degrees. Also by induction from the recurrence we
obtain that B_n(0) ≠ 0. We now show that they provide Padé approximants to the expected
precision. We have Φ₁(0) = f(0), showing that A₁ − B₁ f(z) = O(z). Next, inverting
    Φ_n(u) = (A_n + A_{n−1} u)/(B_n + B_{n−1} u)
yields
    u = −(A_n − B_n Φ_n(u))/(A_{n−1} − B_{n−1} Φ_n(u)).
In view of f(z) = Φ_n(z f^{[n]}(z)), we deduce
    z f^{[n]}(z) = −(A_n − B_n f(z))/(A_{n−1} − B_{n−1} f(z)) = O(z)
and by induction from there A_n − B_n f(z) = O(zⁿ). The conclusion follows since B_n(0) ≠ 0. □
Lemma 5.6. For an arbitrary sequence of nonzero c₀ = 1, c₁, c₂, ..., we have
    K_{i=1}^∞ (a_i/b_i) = K_{i=1}^∞ (c_{i−1} c_i a_i / c_i b_i).
Proof. The first convergents are both equal to a₁/b₁. Next, by induction from Theorem 5.4, we obtain
that the convergents A_n/B_n and A_n′/B_n′ of the left-hand and right-hand sides are related by
    A_n′ = c₁ ⋯ c_n A_n,   B_n′ = c₁ ⋯ c_n B_n. □
Appropriate choices of the sequence cn give the following useful special cases.
Corollary 5.7. The following one-parameter families are equivalent to Ki=1(ai/bi):
1 1
bi1 bi ai 1
K
i=1 1
and K
i=1 bici
,
with
a1a3a2i1 a2a4a2i
c2i = , c2i+1 = .
a2a4a2i a1a3a2i+1
Thus from now on, we can use partial numerators or denominators equal to 1 depending on
our needs.
Sucient conditions for this process to provide an explicit continued fraction expansion for f0(z)/
f1(z) (or for further fk(z)/fk+1(z)) are that bk(0) =
/ 0 while ak+1(0) = 0. In that case unrolling this
recurrence relates successively the pairs of indices (0, 1), (1, 2), (2, 3), ... and leads to
f0(z) a1(z)
= b0(z) + . (5.3)
f1(z) a2(z)
b1(z) +
a (z)
b2(z) + 3
In some cases, rewritings using Corollary 5.7 may lead to nicer formul.
Hypergeometric series
A large source of such sequences of power series is provided by hypergeometric series, that have
as special cases many classical elementary and special functions. The general framework is the fol-
lowing. Start with a parameterized power series f (; z) that satises a 2nd order linear dierential
equation with coecients in Q(, z). Assume moreover that the shifted power series f ( + 1; z)
satises an identity of the form
f ( + 1; z) = u(; z)f (; z) + v(; z)f (; z),
where the derivative is with respect to z. Then eliminating f (; z) between f (; z), f ( + 1; z),
f ( + 2; z) provides a recurrence like (5.2).
1 z 1 z2 1 z3
0F1(; z) = 1 + + + + .
1! ( + 1) 2! ( + 1)( + 2) 3!
Since its sequence of coecients satises n( + n 1)un+1 = un, the series satises a 2nd order
linear dierential equation. Also, by simple inspection,
1
0F1 (; z) = 0F1( + 1; z).
Thus by the reasoning above, there exists a linear dependency between 0F1(; z), 0F1( + 1;
z), 0F1( + 2; z). Linear algebra (or a direct examination) yields
z
0F1(; z) = 0 F1( + 1; z) + 0 F1( + 2; z).
( + 1)
The valuation conditions are satised and thus we get a continued fraction expansion for 0F1(;
z)/0F1( + 1; z).
/ 0, 1, 2, ... by
With one more parameter, we consider the hypergeometric series dened for =
z ( + 1) z 2 ( + 1)( + 2) z 3
1F1(; ; z) = 1 + + + + .
1! ( + 1) 2! ( + 1)( + 2) 3!
By the same reasoning this series satises a linear dierential equation of order 2 and each of
1F1( + 1; ; z) and 1F1(; + 1; z) can be rewritten as linear combination of 1F1(; ; z) and its
derivative. It follows that any three among the shifts 1F1( + p; + q; z) with integers p, q are
linearly dependent. A direct use of the same method as before leads to the slightly unpleasant
z ( + 1) z
1F1(; ; z) = 1 1F1( + 1; + 1; z) + 1 F1( + 2; + 2; z).
( + 1)
Although the valuation conditions are satised and we get a continued fraction expansion for 1F1(;
; z)/1F1( + 1; + 1; z), this is not really a nice one because of the presence of the variable z both
in the partial numerators and denominators.
Instead, using the following two identities
1F1(; ; z) = 1F1( + 1; + 1; z) +z 1 F1( + 1; + 2; z), 1F1(; ; z) = 1F1(; + 1;
( + 1)
z) +z 1 F1( + 1; + 2; z).
( + 1)
Example 5.9. (exp) Specializing at = 0 gives a simplication of the rst partial numerator
that makes the expression hold even for = 0. Then, the (obvious) special values
taking the inverse of which yields a continued fraction for exp (z).
As before, this series satises a second order linear dierential equation and all the shifts of its
parameters belong to the same vector space of dimension 2, so that any three of them are linearly
dependent. We use the following linear dependency
( )
2F1(, ; ; z) = 2F1(, + 1; + 1; z) +z 2F1( + 1, + 1; + 2; z),
( + 1)
and that obtained by exchanging the roles of and . Using them in alternance relates successively
the indices (, , ), (, + 1, + 1), ( + 1, + 1, + 2), ( + 1, + 2, + 3), ( + 2, + 2, + 4), ...
and produces the nice continued fraction
( )
2F1(,; ; z) z ( + 1)
=1+ ( + 1)( 1)
2F1(, + 1; + 1; z) z ( + 1)( + 2)
1+ ( + 1)( 1)
z ( + 2)( + 3)
1+ ( + 2)( 2)
z ( + 3)( + 4)
1+
discovered by Gauss himself. The special value
2F1(, 0; ; z) = 1
gives a continued fraction for the whole 2-parameter family
( + 1) 2 ( + 1)( + 2) 3
2F1(, 1; ; z) = 1 + z+ z + z +
( + 1) ( + 1)( + 2)
under the form
1
2F1(, 1; ; z) = .
z ( + 1)
1+ 1(1 + )
z ( + 1)( + 2)
1+ ( + 1)( + 1)
z ( + 2)( + 3)
1+ 2(2 + )
z ( + 3)( + 4)
1+
Several classical functions are obtained as special cases, such as arctan (z), (1 + z) and log (1 + z).
Example 5.10. (log(1+z)) The special case log (1 + z) = z 2 F1(1, 1; 2; z) leads to the explicit
form
z
log (1 + z) = 12 . (5.6)
z 23
1+ 12
z 34
1+ 23
z 45
1+ 23
z
1 + 56
5.3 Convergence
Up to now, we have observed many properties of continued fractions or other Pad approximants
and witnessed their great numerical properties, but we are still missing an explanation for this
observed convergence. We now give such results for several classes of continued fractions.
Proposition 5.11. The nth convergent An/Bn of the fraction Ki=1(ai/bi) is
a1 aa (1)n+1a1an
n(0) = 1 2 + + (5.7)
B1 B1B2 Bn1Bn
provided all the Bis are nonzero.
Corollary 5.12. When ai and bi are positive for all i, then the sequence Cn = n(0) of convergents
to Ki=1 (ai/bi) satisfies
C2 < C4 < C6 < < C2n < C2n+1 < C2n1 < < C1.
As a consequence, both sequences (C2n) and (C2n+1) converge. When they have the same limit, it
belongs to the intervals (C2n , C2n+1).
Proof. First, the positivity of the partial numerators and denominators induces the positivity of
the numerators and denominators of the convergents, by Proposition 5.4. Next, the sum of two
consecutive terms in the series (5.7) is
(1)n+1a1an (1)na1an+1 (1)n+1a1an 1 an+1
Cn+1 Cn1 = + = .
Bn1Bn BnBn+1 Bn Bn1 Bn+1
Positivity of bn+1 and Bn and the recurrence of Proposition 5.4 imply that Bn+1 = an+1Bn1 +
bn+1Bn > an+1Bn1 so that the last factor in the equality above is positive. It follows that the sign
of Cn+1 Cn1 is that of (1)n+1, which concludes the rst part of the proof. Then (C2n+1) is a
bounded increasing sequence, so that it converges, and similarly for (C2n). The last statement is
clear.
We now (at last!) give our rst criterion for the convergence of continued fractions (and therefore
also for Pad approximants).
Proposition 5.13. When Pbi > 0 for all i, the continued fraction Ki=1(1/bi) converges if and only
if the numerical series i=1 bi is divergent.
Proof. The general term of the alternating series (5.7) is 1/BnBn+1 which is decreasing when
n increases. Thus the convergence of the continued fraction is equivalent to BnBn+1 . The
recurrence gives
BnBn+1 = bnBn2 + Bn1Bn = = bnBn2 + bn1Bn1
2
+ + b1B12 > (b1 + + bn) min (Bi2).
P
This proves that divergence of the sum i=1 bi implies convergence of the continued fraction.
Conversely,
Bn+1 + Bn = (1 + bn)Bn + Bn1 6 (1 + bn)(Bn + Bn1) 6 6 (1 + bn)(1 + b1)b1 6 b1eb1 ++bn.
56 Numerical approximation using Pad approximants
Since BnBn+1 6 (Bn+1 + Bn)2/4, if the sum of the bis is convergent with limit , then the last
term above is bounded by b1e and therefore the sequence BnBn+1 is bounded, which prevents the
series (5.7) from converging.
Example 5.14. (log(1+z ) for z >0). The continued fraction for ln (1 + z) given in (5.6) has
positive coecients. A change of representation following Corollary 5.7 shows that this fraction
is equivalent to Ki=1 (1/bi) with b2i
= 2/i and b2i+1 = (2i + 1)/z. Thus, the sum diverges for any
z > 0 and the continued fraction converges over all of R+. This proves the behaviour observed at
the begining of this chapter, together with the fact that successive convergents provide smaller and
smaller intervals containing the limiting value.
Theorem 5.15. [Worpitzky (1865)] If |ak | 6 1/4, for k = 1, 2, ... then the continued fraction
C = Kk=1 (ak/1) converges. Its limit C satisfies |C | 6 1/2 and the nth convergent obeys |n(0)
1
C | 6 2n + 1 .
Proof. Recall the notation n(u) = 1 2 n(u), where here k(u) = ak/(1 + u). The proof is
based on the fact that the image of a disk by the ks is a disk. Moreover, if D(, r) denotes the disk
of center and radius r, the hypothesis on ak will be shown to imply that k(D(0, 1/2)) D(0, 1/2).
Then the sequence
D0 = D(0, 1/2), D1 = n(D0), ..., Dn = 1(Dn1)
will be proved to have radii decreasing to 0 when n increases. Now, n+k(0) n(D0) = Dn implies
that the sequence (n(0)) is a Cauchy sequence, whence its convergence.
We now prove the required bounds. First, the image of D(, r) by : u 7 a/(1 + u) is computed
by a sequence of translation, inversion, scaling, given by
r
c + D(, r) = D(c + , r), D(, r) = D(, ||r), 1/D(, r) = D , .
||2 r 2 ||2 r 2
This gives
(D(, r)) = a/(1 + D(, r)) = a/D(1
+ , r)
1 + r a(1 + ) |a|r
=a D , =D , .
|1 + |2 r 2 |1 + |2 r 2 |1 + |2 r 2 |1 + |2 r 2
4a 2|a|
In particular, when = 0 and r = 1/2 we obtain (D0) = D 3 , 3 which is indeed a subset
of D0 when |a| 6 1/4. For r < |1 + |, the following inequalities are consequences of the decrease
of x/(x2 r 2) for x > r:
a(1 + ) 1 || |a|r r
|1 + |2 r 2 6 |a| (1 ||)2 r 2 , 6 |a| . (5.8)
2
|1 + | r 2 (1 ||)2 r2
The right-hand sides of these inequalities are increasing functions of || and r as long as r < 1 ||
and thus provide the basis for an induction: if Dk = D(k , rk), then upper bounds on |k | and rk
produce upper bounds on k+1 and rk+1. In particular, the inequalities
k 1
|k | 6 , rk 6
2k + 1 2(2k + 1)
are valid for k = 0 and follow by induction using (5.8). This concludes the proof.
Example 5.16. (log(1+z ) for complex z). In (5.6), the partial denominators are 1, while the
partial numerators are given by
k (k + 1) k(k + 1)
a2k = z, a2k+1 = z.
(2k)(2k + 1) (2k + 1)(2k + 2)
5.3 Convergence 57
Example 5.17. (exp(z )). The continued fraction (5.5) is in the desired form with a1 = z and
1 1
a2k = z, a2k+1 = z, k > 0.
2(2k 1) 2(2k + 1)
In this example, the partial numerators are all of modulus smaller than 1/4 if and only if the rst
one is, which implies |z | 6 1/4. However, since the sequence of partial numerators decreases to 0,
as soon as k > |z |, the continued fraction k k+1 is convergent with a limit of modulus at
most 1/2. From there follows that the whole fraction converges for arbitrary z C provided the k
poles of the kth convergent k(w) are avoided, for k = |z |.
Example 5.18. (tan(z )) A continued fraction for tan (z) is given in (5.4). It has |ak+1(z)| =
1 3
|z |2/ 4 k + 2 k + 2 so that the continued fraction converges starting at the index k = |z |. As
above, from there follows that the whole fraction converges for arbitrary z C provided the k poles
of the kth convergent k(w) are avoided. This explains the behaviour observed on this example
in 5.1.1.
Proposition 5.19. [Jones & Thron 1990] If, for m > n > 1, (an) satisfies
1
|am | < n <
4
then for any nonnegative n and k, the convergents of the continued fraction Kk=1 (ak/1) obey
Qn
1 1 4n j =1 j
|n+k(0) n(0)| 6 Q .
1 4n + 1 4n n1 j =1 (1 3 j )
Example 5.20. (exp(z ) again). We restrict ourselves to |z | < 1/4, but by the reasoning in
Example 5.17, this can be extended to larger values of |z |. For those values of |z |, the sequence
dened by 1 = |z |, and n = 1/(8n 8) for n > 1 satises the constraints an gives the bound
q
1/2
1 1 n1 8 nc
q Q n1 n
8 n!
1 n 1 + 1 n 1 (n 1) j =1 (8j 11)
1/2 1/2
for a certain constant c. Thus the convergence is extremely fast in this case.
A rst (and clear) observation is that such a series is the asymptotic expansion of the function
Z
(t)
f (z) = dt
0 1 +tz
as z 0 with |arg z | < . This can be seen by integrating by parts. We admit the following powerful
result that summarizes the properties of these series.
Proposition 5.22. Let U (z) be a Stieltjes and Kk=1 (ak z/1) its corresponding continued fraction.
Then,
the coefficients ak are nonnegative;
P 1/2n
if an diverges then the sequence (n(0)) of convergents is convergent;
if (n(0)) converges for a value of z0 C \ R , then it is convergent for all z C \ R.
Example 5.23. With (t) = et, we have un = n! and thus the divergent series of our introduction
is the asymptotic expansion of the function
Z
et
F (z) = dt.
0 1+tz
This integral is convergent for all z C \ R. It has a continued fraction expansion with
coecients an = n/2. Thus the series in the second condition of the proposition converges and
the continued fraction is convergent in the whole slit complex plane.
Chapter 6
Orthogonal polynomials - Chebyshev series
Observe that R[x] E(w). The space E(w) is equipped with an inner product
Z b
hf , g i = f (x) g(x) w(x) dx;
a
Definition 6.1. A family of orthogonal polynomials associated with w is a sequence (pn) R[x]N
where deg pk = k for all k, and
i=
/j hpi , p j i = 0.
Theorem 6.2. For any weight w, there exists a family of orthogonal polynomials associated with w.
If additionally we request that the pk are all monic, this family is unique.
59
60 Orthogonal polynomials - Chebyshev series
The following statement gives us a way to recursively compute a sequence of orthogonal polyno-
mials. Note also that if you adapt Clenshaws method (cf. 2.5) to this recurrence, it also yields an
evaluation scheme in linear time for polynomials expressed in the corresponding basis of orthogonal
polynomials.
Proof. Let n > 2. The polynomial x pn1 is monic and has degree n, hence
n1
X
x pn1 = pn + a k pk .
k=0
hx pn 1, pk i
The orthogonality of the pn gives ak = kpk k22
for k = 0, ..., n 1. If we notice that hx pn1,
pk i = hpn1, xpk i for all k = 0, ..., n 1, we obtain ak = 0 if k 6 n 3 since xpk Rn2[x] and
pn1 (Rn2[x]). Hence, there are at most two nonzero coecients:
hpn1, x pn1i
an1 = = n ,
kpn1k22
hp , x pn2i hpn1, pn1 + q i
an2 = n1 = with q Rn2[x]
kpn2k22 kpn2k22
hp ,p i
= n1 n1 2 = n.
kpn2k2
Example 6.4.
(1, 1) w(x) = (1 x2)1/2 Chebyshev polynomials of the rst kind (up to normalization)
(1, 1) w(x) = 1 Legendre polynomials
(0, +) w(x) = ex Laguerre polynomials
2
(, ) w(x) = ex Hermite polynomials
Exercise 6.1. Prove that the rst statement of Example 6.4 is correct.
Theorem 6.5. For any weight w and for all n, the polynomial pn has n distinct zeros in (a, b).
Proof. Fix n. Let x1, ..., xk be the distinct zeros of pn in (a, b), with respective multiplicities
m1, ..., mk. We introduce the polynomial
k
Y
q(x) = (x x j )mj mod 2.
j=1
but the integrand is strictly positive over (a, b)\{x1, ..., xk }: contradiction.
6.1 Orthogonal polynomials 61
Theorem 6.6. Let f E(w), n N. There exists a unique best L2(w) polynomial approximation
to f in Rn[x], denoted p2,n:
kf p2,n k2 = min kf pk2.
pRn[x]
It is characterized by
p Rn[x], hf p2,n , pi = 0.
Proof. First assume that f C([a, b]). Let pn be the minimax degree-n approximation to f : then
Z b !1/2 Z b !1/2
kf p2,n k2 6 kf pn k2 = (f pn)2 w(x) dx 6 En(f ) w(x) dx
a a
: [a, b] [0, 1] = ,
We have f C([a, b]) if we assume f (a) = f (b) = 0. For almost all x [a, b], we have
|f (x) (f)(x)| |f (x)| 1[a,a+][b,b](x),
where 1[a,a+][b,b] denotes the indicator function of the set [a, a + ] [b , b]. Hence, for
almost all x [a, b]
lim0 |f (x) (f)(x)| = 0,
|f (x) (f)(x)| |f (x)| , with f L2([a, b], w).
It follows from Lebesgues dominated convergence theorem that
Z b
|f (x) (f)(x)|2 dx
0.
a 0
()
Denoting by p2,n the best L2(w) degree-n approximation to f , we have
()
()
kf p2,n k2 6
f p2,n
6 kf f k2 + kf p2,n k2
2
for all n and . Let > 0, there exists > 0 such that kf f k2 < . For this , there exists
n0 N such that kf p()
2,n k2 < for all n N, n > n0.
Remark 6.9. The previous statement can be wrong if one does not assume that (a, b) is bounded.
Can you give a counter-example?
62 Orthogonal polynomials - Chebyshev series
Note that, from Remark 6.7, the computation of the coecients of the best approximations in
the basis of orthogonal polynomials seems to require the evaluation of several integrals. Hence,
this kind of polynomials approximation is often signicantly more expensive than the approach via
interpolation polynomials.
interpolates f at the points x0, ..., xn, then our approximation for the integral is equal to
R b Pn
a
p(x)w(x) dx = k=0 wk f (xk) with
Z b
wk = k(x)w(x) dx for k =0, ..., n.
a
Thus we obtain an approximation of the integral that is exact at least for polynomials of degree
up to n. It is possible to obtain a much better result if one is allowed to choose the points x0, ..., xn:
Theorem 6.10. There exists a unique choice of the points xk and the weights wk such that,
whenever f R2n+1[x], the formula ( 6.1) is exact in the sense that
Z b X n
f (x) w(x) dx = wk f (xk).
a k=0
These points xk belong to (a, b) and are the roots of the (n + 1)-th orthogonal polynomial associated
to w.
Proof. We start with the uniqueness. Assume that x j , w j are such that the method is exact for
any f Rm[x], m 6 2 n + 1. Set
Yn
n+1(x) = (x x j ).
j =0
The polynomial n+1 is monic and belongs to (Rn[x]) : it is the (n + 1)-th orthogonal polynomial
Pn R b
associated to w. The xk are its roots and, as noted above, wk = k=0 wkk(xk)= a k(x)w(x)dx.
As for the existence, let x0, ..., xn be the distinct
R b roots in (a, b) of the (n + 1)-th orthogonal
polynomial (cf. Proposition 6.5), and let wk = a k(x)w(x)dx where k is the corresponding k-th
Lagrange polynomial. Clearly the method is exact if f Rn[x]. If now f R2n+1[x], write
f = q n+1 + r, deg r 6 n.
6.3 Lebesgue constants 63
R b
As n+1 Rn[x] et deg q 6 n, we have a q(x)n+1(x)w(x)dx = 0. It follows that
Z b Z b n
X n
X
f (x) w(x)dx = r(x) w(x)dx = wk r(xk) = wk f (xk).
a a k=0 k=0
See Chapter 19 of [22] for an interesting and up-to-date account on Gauss methods. Note that
a recent work [10] showed that the weights and the nodes for Gauss-Legendre or Gauss-Chebyshev
quadrature, for instance, can be computed in O(n) operations.
Remark 6.11. When w = 1, an alternative to Gauss quadrature with Legendre points is the so-
called Clenshaw-Curtis quadrature, which uses Chebyshev points as interpolation nodes. The
Chebyshev polynomials of the rst kind satisfy
Z 1 (
2
, k 2 N,
Tk(x)dx = 1 k2
1 0, k / 2 N.
Pn
Hence, if p = k=0 ck Tk is the interpolation polynomial of f , we deduce that the integral with
weight w = 1 of f is approximated by
Z 1 X 2 ck
p(x) dx = .
1 1 k2
06k6n
k2N
Since the coecients ck can be computed in O(n log n) arithmetic operations using the FFT, this
yields a complexity in O(n log n) for the computation of the quadrature approximant.
Definition 6.12. We say that a linear mapping L: C([1, 1]) Rn[x] is a projection onto Rn[x]
if Lp = p for all p Rn[x]. The operator norm
kLf k
= sup
f C([1,1]) kf k
Proposition 6.13. Let be the Lebesgue constant for the linear projection L of C([1, 1]) onto
Rn[x]. Let f C([1, 1]) and let p = Lf. Let p denote the minimax approximation to f. Then, we
have
kf pk 6 (1 + ) kf pk.
Clearly, Ln is a linear projection of C([1, 1]) onto Rn[x]. On the one hand, we have
n
X
|Ln f (x)| 6 kf k |k(x)|, for all x [1, 1],
k=0
64 Orthogonal polynomials - Chebyshev series
and hence kLn g k > A kg k. Weve just proved the following statement.
Theorem 6.14. The Lebesgue constant of degree-n Lagrange interpolation at x0, ..., xn is equal to
n
X
max |k(x)|.
x[1,1]
k=0
Remark 6.16. We deduce from this theorem that Chebyshev interpolants (i.e. interpolation
polynomials at Chebyshev nodes) are "near-best" approximations:
15 = 2.76...: one loses at most 2 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial;
30 = 3.18...: one loses at most 2 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial;
100 = 3.93...: one loses at most 2 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial;
100000 = 8.32...: one loses at most 4 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial.
Remark 6.17. We deduce from this theorem that truncated Chebyshev series are "near-best"
approximations:
15 = 4.12...: one loses at most 3 bits if one uses the truncated Chebyshev series instead of
the minimax polynomial;
30 = 4.39...: one loses at most 3 bits if one uses the truncated Chebyshev series instead of
the minimax polynomial;
100 = 4.87...: one loses at most 3 bits if one uses the truncated Chebyshev series instead of
the minimax polynomial;
100000 = 7.66...: one loses at most 3 bits if one uses the truncated Chebyshev series instead
of the minimax polynomial.
Corollary 6.20. If f is Lipschitz continuous over [1, 1], then the truncated Chebyshev expansion
of f converges uniformly to f.
6.4.1 Convergence
Here is a summary of convergence results that we are going to rely on (see [22] Thm.3.1, 7.1, 7.2,
8.1, 8.2 for versions with weaker hypotheses).
66 Orthogonal polynomials - Chebyshev series
Theorem 6.21. Let f be continous on [1, 1]. Denote by (ak) its sequence of Chebyshev coefficients
and by (fn) its sequence of truncated Chebyshev expansions. Then
1. The coefficients ak tend to 0 when k .
2. If f is Lipschitz continuous on [1, 1], then (fn) converges absolutely and uniformly to f.
3. If f is C m and f (m) is Lipschitz continous, then ak = O(1/k m+1) and kf fn k = O(nm).
4. If f is analytic inside the ellipse z + z 2 1 6 r with r > 1, then ak = O(r k) and
kf fn k = O(r n).
Proposition
P 6.22. Assume that f is Lipschitz continuous on [1, 1]. Let its Chebyshev expansion
be ak Tk, and its n-th Chebyshev interpolant be
n
X (n)
pn(x) = ck Tk(x).
k=0
Then, we have
(n)
X
ck = aj.
j mod 2n=k
or 2nj mod 2n=k
The sums converge thanks to the absolute convergence of the Chebyshev expansion.
Proof. Consider the polynomial q whose coecients are those sums. Then, we have
(n) (n) (n)
q( j ) = f ( j ) = pn( j )
for all 0 6 j 6 n. Since both pn and q have degree at most n, they are therefore equal.
Hence, when convergence is fast, the interpolants are very close to the Chebyshev expansion.
This is exploited for instance in Trefethens Matlab package chebfun, which uses the pn as a data
structure to represent mathematical functions. Since computing the pn by FFT requires to evaluate
the function f , which may be expensive, we now turn to the direct computation of the Chebyshev
coecients when f is solution of a linear dierential equation.
Our aim is to show that for all functions that cancel such an operator, the rst n terms of the
Chebyshev expansion can be computed in a linear number O(n) of operations.
be two formal Laurent series, and assume L f = g. By the above formulas, we have:
X X X
x f = fk x Mk = fk Mk+1 = fk1 Mk
X X X
f= fk Mk = fk k Mk1 = (k + 1) fk+1 Mk.
Dene the shift operator
S: (fk) 7 (fk+1)
that sends a sequence to the sequence with the same values but where the indices are shifted by 1.
Also dene
X = S 1, D = (k + 1) S = S k.
Then, an iterative application of the previous rules, plus linearity, leads to
(gk) = L(X , D) (fk).
Example 6.23. Consider y y = 0, that is, f (x) = ex, g(x) = 0, and L(x, ) = 1. Then we have
L(X , D) = D 1 = (k + 1) S 1,
hence ((k + 1) S 1) (yk) = 0, and we have obtained the expected recurrence (k + 1) yk+1 = yk
satised by the coecients yk = 1/k!.
then
(k + 1) S (k 1) S 1 S S 1
(1 X 2) (wk) = (fk) = k (fk).
2 2
Let us be bold and use a formal inverse of (1 X 2). Proceeding purely formally produces
S S 1 S 2 + S 2 + 2 1 S S 1
(1 X 2)1 k= 1 k
2 4 2
S S 1 2 S S 1
= k = 2(S 1 S)1k.
2 2
It is therefore natural to set
D = 2(S 1 S)1k.
Then, if g = L(x, ) f , we should have (gk) = L(X , D) (fk). There remains to make sense of this.
The sequences of Chebyshev coecients of any function in C([1, 1]) belong to C: the symmetry
is clear from the denition of the coecients and the limit has been given in Theorem 6.21.
In this space C, the meaning of the operator D and its inverse can be made clear as follows.
Proposition 6.25. Let f and g belong to C([1, 1]) and let (fk) and (gk) be their Chebyshev
sequences. Then (gk) is the unique solution in C of the equation gk = Dfk (i.e., gk 1 gk+1 =
2 k fk). Conversely, given the Chebyshev sequence (gk) of a continuous function g, the primitive f
of g has for Chebyshev coefficients (fk) the unique solution of fk = D 1 gk in C such that f0 is given
by ( 6.18).
Proof. From
ck = ck+2 = = ck+K
0,
K
6.5 Chebyshev expansions for D-finite functions 69
The proof of the second statement of the Proposition is obtained by observing that conversely,
since f C 1([1, 1]), its sequence of Chebyshev coecients belongs to C and satises (6.2) where
now (fk) is the unknown sequence. For k = / 0, dividing by k shows uniqueness of the solution. The
case k = 0 depends on the constant of integration.
Theorem 6.27. If f C m([1, 1]) and if g = L(x, ) f, with L of order m, then g is continuous
and their Chebyshev sequences are related by
L(X , D) (fk) = (gk).
In particular, if L(x, ) f = 0, then L(X , D) (fk) = 0.
Proof.
If L = 1 (the identity), the result is clear: f = g implies fk = gk.
Assume L = x, that is, g = x f . Then
1
gk = hx f , Tk i = hf , x Tk i = hf , 2 (Tk+1 Tk1)i = (X f )k.
6.5.4 Examples
Applying the operator D amounts to solving a second order recurrence. However, this is generally
unncessary.
Example 6.28. The function y(x) = exp x satises y y = 0. We get (D 1) (fk) = 0. Expressed
in terms of k and S, this is
D 1 = (2 (S S 1)1 k 1).
Now, we factor out S S 1 and obtain
D 1 = (S S 1)1 (2 k S 1 + S).
Multiplying by (S S 1) on the left shows that (2 k S 1 + S) (fk) = 0, or in other words
2 k fk fk1 + fk+1 = 0.
Example 6.29. The function y(x) = erf(x) satises y + 2x y = 0. We get (D2 + 2 X D)(f )k = 0,
which rewrites as
D2 + 2X D = 4(S 1 S)1k(S 1 S)1k + 2(S 1 + S)(S 1 S)1k.
In order obtain a recurrence equation out of this, we note that multiplying the commutative product
(S 1 + S)(S 1 S) = (S 1 S)(S 1 + S)
on the right and on the left by (S 1 S)1 lets us factor out S S 1 on the left as in the previous
example, which leads to
D2 + 2X D = (S 1 S)12(2k(S 1 S)1k + (S 1 + S)k).
Again, the equation L(X , D)(fk) = 0 is of the form (S 1 S)1(uk) = 0, which implies (uk) = 0.
Next, (uk) = k(S 1 S)1(vk) = 0 implies (S 1 S)1(vk) = 0 for all k =
/ 0, and this identity also
holds for k = 0 (y being analytic all the functions we consider are continous). Thus we nally get
the following recurrence for the Chebyshev coecients of erf:
k2 k k k+2
2 fk2 + 2k + fk fk+2 = 0,
k1 k 1 k+1 k+1
which can be further simplied to
(k 2)(k + 1)fk 2 + 2k 2 fk (k + 2)(k 1)fk+2 = 0.
6.6.1 Definition
Let : Q(x) Q(x) be an injective ring morphism, and let be a -derivation, that is, a linear
map such that
(f g) = (f ) (g) + (f ) g.
Then, the ring of polynomials in a variable with coecients in Q(x) subject to the commutation
rule
f = (f ) + (f ), f Q(x)
is called the ring of Ore polynomials over Q(x), and denoted Q(x)h; , i.
Example 6.30.
Commutative polynomials Q(x)[]: take = id, = 0.
Recurrence operators: = S, = 0.
d
Dierential operators: = id, = dx .
6.6.2 Properties
Degree Using the commutation rule, it is always possible to write an Ore polynomial P in a
unique way in the form
P = ak k + + a0,
with the coecients ai on the left. The largest exponent of in this expression is called the degree
of P .
Proof. By induction, the leading monomial of iP with P as above is i(ak) k+i, since by
injectivity of , the coecient is not zero. If Q has degree and coecients bi, then by linearity,
it follows that the leading monomial of Q P is b (ak) k+, which has degree k + and by the
same reasoning, this is also the degree of the leading monomial ak k(b) k+ of P Q.
6.6 Ore polynomials 71
Euclidean division on the right Based on the previous lemma, it is easy to obtain that for
any A, B, there exists a unique pair (Q, R) such that A = Q B + R, with deg R < deg B and Q on
the left of B. (Proof and algorithm are as usual, except that one needs to take care of the order
of the factors).
Euclidean algorithm for right gcds (gcrds) Again, the same algorithm as in the commutative
case works, using the Euclidean division on the right at each stage. The same proof shows that
the last nonzero remainder is a greatest common right divisor (gcrd).
Extended Euclidean algorithm This is the variant where the cofactors are computed at each
stage, leading in particular to Um , Vm such that
Um A + Vm B = G = gcrd(A, B).
Least common left multiples (lclms) In the last relation Um+1 A + Vm+1 B = 0 found by the
Extended Euclidean algorithm (on the right), one can prove that Um+1 A = Vm+1 B are lclms
(note that this is not the standard proof in the commutative case).
Example 6.32. The Fibonacci sequence (Fn) dened by Fn+2 = Fn+1 + Fn, F0 = 0, F1 = 1 satises
Fn+15 = 610Fn+1 + 376Fn. This is clear by Euclidean division: S 15 = A (S 2 S 1) + 610S + 376.
Example 6.33. In Chapter 3, it was proved that the sum of the solutions f and g of linear
dierential equations L(x, )f = 0 and M g = 0 satises a linear dierential equation N (f + g) = 0.
The operator N in this equation is nothing but lclm(L, M ).
6.6.3 Fractions
Two pairs (P1, Q1) and (P2, Q2) of Ore polynomials will be called equivalent when
We view equivalence classes as fractions and write (P , Q) = Q1 P . This makes sense, since for
two equivalent pairs we have
Q11P1 = Q1 Q1 1 Q1 P1 = Q2 Q2 1 Q2 P2 = Q21P2.
Addition and multiplication of the equivalence classes are obtained in a way that is completely
similar to the construction of the fraction eld of a commutative Euclidean ring:
Addition:
1 consider A1 B + C 1 D and let L = lclm(A, C) = A A = C C. Then A1B =
A A AB = L AB and similarly for C 1D, so that
1
A1 B + C 1 D = lclm(A, C)1 A B + C D .
Theorem 6.34. (Ore, 1931). The set of fractions of Ore polynomials forms a non-commutative
field.
Theorem 6.35. Let f C m([1, 1]) be such that L(x, ) f = 0 with L of order m and let
L(X , D) = Q(k, S)1P (k, S). Then the sequence (fk) of coefficients of the Chebyshev expansion
of f satisfies P (k, S)(fk) = 0.
Proof. Recall that L is an operator of order m. By integrating the equation L(x, )f = 0 exactly
m times, we obtain an equation of the form M (x, I)f = 0, where I denotes the integration operator.
By Proposition 6.25, this operator acts on Chebyshev sequences by D 1 = k 1(S 1 S)/2. It
follows that I mL(x, ) acts on Chebyshev sequences as D mL(X , D) = M (X , D 1) which is a
polynomial , i.e., does not have a denominator. From there follows that L(X , D) = DmM (X , D 1);
in other words the denominator of this fraction equal to L is exactly a power of D. Thus Q1P
and D mM are equal fractions. This implies that QP = DmM , where lclm(Q, D m) = QQ = D mDm.
From there follows that whatever the representation of L(X , D) as a fraction Q1P , the operator D
is a right factor of Q. Simplifying these fractions by D and repeating this reasoning shows that Q
is necessarily a power of D provided P and Q are relatively prime, so that L(X , D) = D iP
for some i 6 m. Thus, we have obtained that the sequence (fk) C satises D iP (fk) = 0. By
Proposition 6.25, this implies that P (fk) = 0 as was to be proved.
Chapter 7
Interval Arithmetic, Interval Analysis
Interval Arithmetic is an arithmetic for inequalities. Assume for instance that we know that
5 6 a 6 6 and 10 6 b 6 11: then of course 50 6 a b 6 66. We will dene a product of real intervals
such that
[5, 6] [10, 11] = [50, 66]
that allows for such reasoning. Another need for interval arithmetic comes from the roundo errors
that occur when working with nite precision numbers.
Notable applications of interval arithmetic to bring rigor to numerical computations
performed on a computer include T. Hales proof of Keplers conjecture [11][12]
(see http://code.google.com/p/yspeck/), and W. Tuckers solution of Smales 14th
problem [20][21] (see http://www2.math.uu.se/~warwick/main/thesis.html and also
http://paulbourke.net/fractals/lorenz/).
The interested reader will nd numerous additional interesting information on the website
http://www.cs.utep.edu/interval-comp/.
In this course, we are interested in the use of interval arithmetic in the evaluation of mathem-
atical functions. Given > 0 and f : [a, b] R, we would like to make sure that the evaluation f (x)
of f at any value x [a, b] is such that
|f (x) f (x)| 6 .
f (x)
Note that, in practice, one commonly uses on relative error 1 rather than on absolute
f (x)
error |f (x) f (x)|. We focus on the absolute error case for the sake of clarity. To perform the
evaluation, we replace f by a polynomial p. Then we evaluate p, and f (x) = (p(x)), where is
the active rounding mode. There are two sources of error:
approximation error : let 1 be an upper bound for kf pk,
rounding error : let 2 be an upper bound for the error |p(x) (p(x))|,
we have to guarantee that 1 + 2 . In this chapter and in Chapter 9, we will develop tools
that help to establish rigorous approximation error. Regarding rounding errors, G.Melquiond has
developed a formal proof tools which address this issue[14][5][6](see http://gappa.gforge.inria.fr/).
Definition 7.2. Let x IR. The width of x is denoted w(x) = x x. We also define the center
x + x
mid(x) = ,
2
73
74 Interval Arithmetic, Interval Analysis
1
and the radius rad(x) = 2 w(x).
Remark 7.3. It is common in the litterature to encounter the notation (mid(x), rad(x)) =
{x R: |x mid(x)| 6 rad(x)}.
Definition 7.4. A point (or degenerate, or thin) interval is one of the form [x, x], also denoted [x].
XY = {xy : x X , y Y }
where, if = /, we assume that 0
/ Y.
[x, x ] + [ y, y ] =[x + y, x + y ],
[x, x ] [ y, y ] =[x y , x y],
[x, x ] [ y, y ] =[min (x y, x y , x y , x y ), max (x y, x y , x y , x y )],
1 1
[x, x ]/[ y, y ] =[x, x ] , if 0/Y ,
y y
which depend only on the endpoints.
Proof. Exercise.
Remark 7.7. Note that, in IR, the operations + and are associative and commutative.
Remark 7.8. In practice, multiplication (hence division) can be made more ecient. From the
formula in the previous proposition, it seems to require four real multiplications and several com-
parisons. And yet, if one checks the sign of the endpoints of the two intervals before starting the
computation, one can reduce this amount. Note that there are nine possible cases: one of them
indeed leads to four multiplications while the other eight only need two multiplications. Likewise,
there are six possible cases for the division.
Remark 7.9. It can be convenient to dene a result for the division even when 0 Y . One can
nd an interesting discussion regarding this issue in Section 2.3 of W. Tuckers book [23].To do
that, we work over R=R {+, }, with two signed zeros +0 and 0 (more precisely, over
over R\{0} {+, , +0, 0}, with two signed zeros +0 and 0). Hence, we can take advantage
of the following relations
1/(+ ) = +0, 1/(+0) = +, 1/() = 0, 1/(0) = .
Assume y < 0 < y . Then we dene
1 1 1 1
= , , = , +
[ y, 0] y [0, y ] y
and in general
1 1 1
= , , + .
[ y, y ] y y
We will thus dene the notion of extend interval by removing the condition x 6 x and set
{x R : x 6 x 6 x } if x 6 x ,
X = [x, x ] =
[, x ] [x, +]otherwise.
7.1 Interval arithmetic 75
We introduce the notation IR = {[x, x ]: x, x R}. We then dene division over IR as follows
X [1/y , 1/ y] if 0 /Y ,
[, +] if 0 X and 0Y ,
[x / y, +] if x < 0 and y < y = 0,
[x / y, x /y ] if x < 0 and y < 0 < y ,
X/Y = [, x /y ] if x < 0 and 0 = y < y , (7.1)
[, x/ y] if 0 < x and y <
y = 0,
[x/y , x/ y] if 0 < x and y < 0<y ,
[x/y , +] if 0 < x and 0 = y <y ,
if 0/ X and Y = [0, 0].
Proposition 7.10.
1. Interval subtraction is not the inverse of addition.
2. Interval division is not the inverse of multiplication.
3. Interval multiplication of an interval with itself is not equivalent to squaring the interval,
i.e., in general,
/ [min (x2, x 2), max (x2, x 2)].
[x, x ] [x, x ] =
Proof. Exercise.
Remark 7.12. Standard machine oating-point numbers are not always sucient, e.g., to work
with very small intervals. We may also use multiple-precision oating-point numbers as bounds for
our intervals. An example of a library which oers support for multiple precision interval arithmetic
is MPFR7.1.
Remark 7.14. Finding the exact image of a (usually multivariate) function, and, in particular,
a value where f attains its minimum is a whole subdomain of Math and CS called Optimization
Theory.
Let X = [x, x ] IR. By monotonicity, interval functions dened as follows give the exact range
of the corresponding real functions:
eX = [exp x, exp x ],
X = [ x , x ], x > 0,
log X = [log x, log x ], x > 0,
arctan X = [arctan x, arctan x ],
For some other functions like xn, trigonometric functions..., writing down R(f , D) is also possible,
as long as we know their extrema. For instance, let n Z, X IR,
if n2N + 1, [xn , x n]
if nN \ {0}, n even, [min (xn , x n), max (xn , x n)] if 0
/ X,
n
X = pow(X , n) = [0, max (xn , x n)] otherwise,
[1, 1] if n =n
0,
[1/x , 1/x] if n N and 0 / X.
Exercise 7.1. Write the analogous formulas for sin, cos, tan. For sin and tan, consider
n o n o
S1+ = 2k + , k Z , S1 = 2k , k Z .
2 2
For cos, consider
S2+ = {2k, k Z}, S2 = {2k + , k Z}.
f(Y ) R(f , Y ).
Several interval extensions are possible for the same function over the same X. Interval exten-
sions of exp over [1, 1] include
the constant function X 7 [e1, e],
7.1. http://www.mpfr.org
7.2 Interval functions 77
Let us try to propose a systematic process for computing interval extensions. If f (x) is a
rational expression, one means to get an interval extension of the function it denotes is to replace
each occurrence of the variable x by the interval X, and overload all arithmetic operations with
interval operations. The resulting extension is called the natural interval extension.
Theorem 7.16. Given a rational expression denoting a real-valued function f, and its natural
interval extension F, which we assume to be well-defined over some interval X IR, then
1. Z Z X implies F (Z) F (Z ) (inclusion isotonicity);
2. R(f , X) F (X) (range enclosure).
Proof. To prove assertion 1, it suces to repeatedly use Lemma 7.11. Regarding assertion 2,
assume that there exists y X such that f (y) R(f , X) but f (y)
/ F (X). This implies that
F ([y, y]) = [f (y), f (y)]F (X)which contradicts assertion 1.
We now would like to extend this notion of natural interval extension to a larger class of
functions.
Definition 7.18. We call elementary function a symbolic expression built from constants and basic
functions using arithmetic operations and composition. The class of elementary functions will be
denoted E. A function f E is given by an expression tree (or dag, for directed acyclic graph).
Theorem 7.20. Given an elementary function f and an interval X over which the natural interval
extension F of f is well-defined:
1. F is inclusion isotonic over X;
2. R(f , X) F (X).
Proof. The statement holds for rational functions (cf. Lemma 7.11) and, by denition, for standard
functions. Let g and h be two elementary functions for which the Theorem holds. We have to
prove that it holds as well for the function gh, where {+, , , /, }. Lets address the case of
(the other cases are analogous). We assumed that F (X) is well dened: this implies that neither
f nor any of its sub-expressions have singularities in their domains, induced by the interval X.
Then, is Zand Z denote two sub-intervals on the domain of h such that Z Z , the function h
is continuous over the compacts Zand Z . If G and H denote the natural interval extensions of
g and h, the sets H(Z) and H(Z ) are compact intervals. Using the inclusion isotonicity property
satised by G and H, we obtain H(Z) H(Z )and G H(Z) = G(H(Z)) G(H(Z )) = G H(Z ).
The proof of the second assertion is analogous to the proof of assertion 2 of Theorem 7.16.
78 Interval Arithmetic, Interval Analysis
Exercise 7.2. Show that f (x) = x sin x + 2/5 has no zero over [0, /4]. Hint: evaluating the natural interval
extension is not enough. Split the domain.
Theorem 7.22. Let X IR. Let f be an elementary function such that any subexpression of f
is Lipschitz continuous. Let F be an inclusion isotonic interval extension such
S that F (X) is well-
defined. Then, there exists > 0, depending on F and X, such that, if X = ki=1 Xi, with Xi IR
for all i, then
k
[
R(f , X) F (Xi) F (X)
i=1
and !
k
[
rad F (Xi) 6 rad(R(f , X)) + max rad Xi.
i=1,...,k
i=1
Proof. The rst inclusion follows from the inclusion isotonicity and the range enclosure properties:
k
! k k k
!
[ [ [ [
R(f , X) = R f , F (Xi) = R(f , Xi) F (Xi) = F Xi = F (X).
i=1 i=1 i=1 i=1
Now, we are going to prove the following fact: if Z X and y0 F (Z), then for all y R(f , X), we
have |y y0| 6 rad(Z). This implies the inequality in the theorem.
This statement is true for constants. It is also true for standard functions which are bounded
(they have a sharp interval enclosure). In the same way as in the proof of Theorem 7.20, we
consider two connecting branches g1 and g2 of the expression tree dening f and we prove that
the statement is also valid for g1g2, where {+, , , /, }. We focus on , the other cases being
analogous. The functions g1 and g2 are elementary (as sub-expressions of the elementary function
f ) and Lipschitz continuous. From Theorem 7.20, their natural interval extensions G1 and G2 are
inclusion isotonic. We also know that, since F (X) is well-dened, these extensions ae also well-
dened on their respective domains ZG1 and ZG2 induced by X. We assumed that the statement
is true for g1 and g2: for i = 1, 2,
if V ZGi , y0 Gi(V ) and y R(gi , V ), then |y y0| 6 irad(V ).
The range enclosure property of Theorem 7.20 gives: for all V ZH , we have
R(g1 g2, V ) = R(g1, R(g2, V )) R(g1,G2(V )).
Let z R(g1 g2, V ), there exists u R(g2, V ) s.t. z = g1(u). The real number u also belongs to
G2(V ). Therefore, if z0 G1 G2(V )=G1(G2(V )), the inductive assumptions on gi and Gi yield
|z z0| 6 1rad(G2(V ))
and
rad(G2(V )) 6 rad(R(g2, V )) + 2rad(V ) 6 (K2 + 2)rad(V ),
where K2 is a Lipschitz constant for g2. If we combine these two inequalities, we obtain
|z z0| 6 1(2 + K2)rad(V ).
Now, we use the fact that the expression tree dening f is nite by denition, which implies that
the constant of the statement exists: it is the result of a nite accumulation of constants yielded
as above.
7.2 Interval functions 79
Example 7.23. Let f (x) = e1/cos x, and let p be a degree-5 minimax approximation of f over [0, 1].
Let
(x) = f (x) p(x).
Using the natural interval extension of , we get kk < 0.4. But one can show that obtaining the
true value kk 1.12 106 by subdivision would require about 107 subintervals.
Chapter 8
Linear recurrences and Chebyshev series
Recall that we showed earlier that (under some regularity conditions) solutions of linear ODEs with
polynomial coecients have Chebyshev series that satisfy linear recurrences. These recurrences
provide an ecient way to compute the coecients. However, we also saw that using recurrences
to compute some sequences could be numerically unstable. It turns out that the recurrences we
computed for Chebyshev series are aected by this problem. In this chapter, we show how this
diculty can be overcome and values be computed in a numerically stable way. Our starting point
is the recurrence
un+k + a1(n) un+k 1 + + ak(n) un = 0, n N. (8.1)
or its matrix version using the companion matrix: Un+1 = A(n) Un with
1
un
Un = and A(n) = .
1
un+k1
ak(n) ak1(n) ... a1(n)
We rst consider this general setting and then specialize it to the specic form that recurrences
for Chebyshev coecients take.
81
82 Linear recurrences and Chebyshev series
Lemma 8.1. If c1 = / 0 (ie, U0 does not belong to the subspace Vect(v2, ..., vk)) and |1| > |2|, then
the iteration Un+1 = AUn is such that, as n , Un/kUn k tends to Vect(v1) and hUn |Un+1i/
kUn k2 1
Proof. By hypothesis, neither c1 nor 1 is 0, thus (8.3) can be divided by c1n1 , which yields
n n
Un c2 2 ck k
v1 = v2 + + vk.
c1n1 c1 1 c1 1
Since the eigenvalues are sorted by decreasing modulus, the norm of the right-hand side tends
to 0, which proves at the same time that the vector Un/(c1n1 ) tends to v1. Its norm tends to 1,
which shows kUn k |c1n1 | and proves the rst statement. It follows that hUn| AUn i/kUn k2 hv1|
Av1i = 1 by continuity of multiplication by A and scalar product, which proves the second one.
In general, a random U0 will do, since the probability that it belongs to Vect(v2, ..., vn) is
negligible. Note that in practical use of this method, in order to avoid an overow, it is preferable
to normalize Un at each step.
Example 8.2. The Fibonnacci recurrence Fn+2 = Fn+1 + Fn has the sequence of Fibonnaci
numbers as solution with initial conditions F0 = 0, F1 = 1. Another solution is ((1 )n)n,
where is the golden ratio, solution together with 1 0.618033988 of the characteristic
polynomial X 2 X 1. Numerical use of this recurrence in the forward direction with initial
conditions (u0, u1) = (1, 0.618034) yields the successive values
(u10, u11) (0.008130, 0.005026), (u20, u21) (0.000010, 0.000164), (u30, u31) (0.09360,
0.015146), (u40, u41) (1.151270, 1.862794), (u50, u51) (141.596850, 229.108516).
Lemma 8.3. Let (v1, ..., vk) be the eigenvectors of A as above and (w1, ..., wm) be m vectors
of Ck such that B = (w1, ..., wm , vm+1, ...,vk) is a basis of Ck. If moreover, |m | > |m+1|, then
the sequence of m-tuples of vectors Wn = w1(n), ..., wm(n)
= An(w1, ..., wm) is such that Wn tends
* +
m+1 n
(n)
wi
to Vect(v1, ..., vm) in the sense that for any > m, v|
(n)
= O
.
wi
m
F 0
Proof. First, the matrix of change of basis betwen B and (v1, ..., vk) has a block structure G Id
with F a m m matrix of rank m. We then consider the new vectors 0 = W F 1. Then we have
(0)
i = vi + cm+1vm+1 + + ckvk and the same reasoning as in the proof of Lemma 8.1 applies
(n) (0)
to i = Ani , showing that An0 tends to Vect(v1, ..., vm) in the sense given in the Lemma.
The result on Wn is obtained by multiplying An0 on the right by F .
Theorem 8.4. (Poincar, 1885.) Assume that, for all i {1, ..., k }, the limit
lim ai(n) = i R
n
exists, and let (un) be a solution of ( 8.1). If the roots 1, ..., k of the characteristic polynomial
P (X) = X k + k1 X k1 + + a0
satisfy |1| > |2| > > |k |, then either un = 0 for all large enough n, or un =
/ 0 for all large
enough n and moreover un+1/un for some .
An elegant and more recent proof due to Mt and Nevai is based on the following generaliz-
ation.
Proposition 8.5. (Mt & Nevai, 1990.) Let A be a k k matrix with eigenvalues |1| > > |k |
and vi corresponding eigenvectors, with kvi k = 1. Let (Bn) be a sequence of k k matrices with
limn kBn k = 0, and let un satisfy Un+1 = (A + Bn) Un. Then either Un = 0 for large enough n, or
Un/kUn k
v
n
84 Linear recurrences and Chebyshev series
for some .
Before turning to the proof of the previous proposition, we show how it implies Poincars
theorem.
The eigenvalues of the matrix A are the is from the statement of Poincars theorem and
satisfy |1| > > |k |. By Proposition 8.5, either Un = 0 (and thus un = 0 too) for n suciently
large, or Un/kUn k v for some eigenvector v of A. In that case,
Un+1 Un U
= (A + Bn Id) n 0.
kUn k kUn k
If v1 denotes the rst coordinate of v, we have un/kUn k v1 and this is not 0, since, by the special
form of the companion matrix A, Av = v implies vi+1 = vi for i = 1, ..., k 1 so that v1 = 0
would imply v = 0. For n suciently large, multiplying (8.5) by kUn k/un is thus possible and then
extracting the rst coecient concludes.
Proof of Proposition 8.5. If one P Un is 0, then all the following ones are 0 too. We assume that
it is not the case and write Un = i cin vi. Then we have
X
Un+1 = i cin vi + Bn Un
i
whence
|cin+1 i cin | 6 kBn k kUn k 6 kBn kk max |cin |. (8.4)
i
|cin+1| 6 |i | |cin | + k |cnn | kBn k 6 (|i | + k kBn k) |cnn | 6 (|n | k kBn k) |cnn |
for n large enough to ensure kBn k < (|n | |i |)/(2k), and thus |cin+1| 6 |cn+1
n
|.
Being bounded, this sequence has a limit = lim n for which cn is nonzero asymptotically.
For this coordinate, (8.4) implies cn+1/cn .
For any > 0, the other coordinates satisfy
i
cn+1 cin
c i c 6 2
n n
By the previous point and since 0 6 |cin |/|cn | 6 1 for n suciently large we have
i i
cn i
c c
n+1
|i | 6 | | 6 |i | n + .
cn cn+1 cn
Theorem 8.6. (Perron, 1919.) With the same hypotheses as in Poincars theorem and assuming
/ 0, there exists a basis (1n , ..., kn) of solutions such that
k =
in+1
i , i = 1, ..., k.
in
and then fn1 + c2 fn2 = 0 i fn1 = fn2 = 0. Then there exists c3 {(fn1 + c2 fn2)/fn3: n N,
fn3 =
/ 0}, and so on. In the end fn0 := fn1 + c2 fn2 + + ck fnk does not have any zero, otherwise
all the fni would have a common zero. This would force a linear dependency on the initial
conditions and the dimension of the space of solutions could not be k.
0
Poincars theorem then implies that there exists such that fn+1/fn0 as n .
Set un = vnfn0/n , which is possible since =
/ 0. Then vn satises the recurrence
vn+k + a1 (n) vn+k1 + + ak (n) vn = 0 (8.5)
with
0
fn+ki
ai(n) = i 0 ai(n)
i.
fn+k n
There remains to prove that for each wni, there exists a solution vni of the recurrence equation
i
vn+1 vni = wni
i
such that vn+1/vni i. If |i | > | |, we take
vni = wn1
i i
+ wn2 + 2 wn3
i
+ ,
otherwise
vni = 1 wni + 2 wn+1
i
+ .
(The convergence of these sums is justied by Lemma 8.7 below.)
i
In the rst case, the quotient vn+1/vni is
n 1 wi 2 n2 wi
i
vn+1 wni 1 + wni + wni +
=
vni i
wn1 wi wi
1 + wni 2 + 2 wn3 +
i
n 1 n1
from which the limit follows with a bit of analysis (bounding the tails of the series and then
controling the leading part by a geometric series). The second case is completely analogous.
86 Linear recurrences and Chebyshev series
Theorem 8.8. If the coefficients a1(n), ..., ak(n) of ( 8.1) belong to C(n), with ak = / 0, then there
exists a basis (n1 , ..., nk) of solutions with linearly independent asymptotic behaviours of the form
n!nexp P n1/ n lnJn,
with Q, N\{0}, P C[n] of degree at most 1, J N, , in C.
Finding the corresponding elements , , P , , J, in this order, from (8.1) is not too dicult by
a sort of undeterminate coecient method, but will not be detailed here.
Observe that the previous results were mainly discussing the case when = 0. When = / 0,
they can be used after proper scaling.
2
Example 8.9. The sequence (2n/n!) satises un+1 n + 1 un=0. The characteristic polynomial is X
whose only root is 0 and the only information we obtain from Poincars theorem is that un+1/un
tends to 0. However, one can scale the sequence so as to obtain a more interesting characteristic
polynomial by setting un = vn n!. Clearly the appropriate is 1 and then the new recurrence
has characteristic polynomial X 2 and a 2nenn behaviour with n 0 is recovered.
Lemma 8.10. If the coefficients a1(n), ..., ak(n) of ( 8.1) belong to C(n), with ak =
/ 0, if un does
not belong to the subspace Vect(n2 , ..., nk) and n2 = o(n1 ), then as n , un/n1 tends to a
nonzero constant.
8.4 Computation of Chebyshev series 87
The statement of the analogue of Lemma 8.3 for the block power method is clear and left as
an exercice.
In this setting, the inverse power method is known as Millers algorithm. Consider a recurrence
yn = A(n) yn1
where A(n) is a nonsingular matrix in Ck k and A(n) = A + Bn with kBn k 0, but now used in
the backward direction: starting from
1
0
yN = e =
,
0
we compute yN 1, ..., y0 and normalize by setting
yn
wn: = .
ky0k
The natural question is: Does (wn) converge when N ?
Let Un be a basis of solutions dened by U0 = Id and Un = A(n) Un1. Then we can write
1
yn = Un T for some xed vector T . In particular we have e = UN T so that T = UN e and
1
1 Un UN e
yn = Un UN e, wn = 1 .
kUN ek
The convergence of wn is thus equivalent to that of
1
UN e
w0 = 1 .
kUN ek
From there, we obtain Un1 e as the transpose of the rst row of this solution. This is not directly
a solution of an easily written recurrence, but a more careful analysis shows that the required
convergence can be deduced as in Mt and Nevais result [26] and similarly (but more technically)
for the block extension of this method [2].
and uses it to show that the Chebyshev recurrence P (k, S) is given by:
Proof. For any polynomial P (x), the polynomial P ((S + S 1)/2) is symmetric and satises the
property. The proof of Theorem 6.35 shows that the denominators of fractions M (X , D) with M a
polynomial are all powers of D. It follows that it is sucient to prove that the property is preserved
by addition and product by D 1. Addition is clear. Expanding the product of such operator by D 1
gives
2 2b (k 1) s1 2bs+1(k 1) s
(S S 1)(bs(k)S s + + bs(k)S s) = s S S +
k k k
2(bs(k + 1) bs+2(k 1)) s+1 2(bs2(k + 1) bs(k 1)) s1 2b (k + 1) s
S + + S + s1 S +
k k k
2bs(k + 1) s+1
S ,
k
from where the property is visible.
8.4.2 Problems
The recurrence operator (8.7) on the Chebyshev coecients suers from several drawbacks:
Its order is too big: in general s = r + maxdeg ai. This recurrence, besides the coecients of
Chebyshev series expansions of its solutions, possesses extra solutions that:
do not tend to zero,
and/or are not symmetric.
Also, the leading coecient bs(n) has zeros, which complicates the use of the recurrence
both in the forward and backward directions.
8.4.3 Convergence
We want to isolate the solutions of the recurrence tending to 0, since only them can correspond to
Chebyshev series of solutions.
Proof. The highest degree of the monomials images of the morphism (8.6) is reached when the
degree of is maximal.
Proof. The rst property is clear: the constant term of P is the leading term of ar divided by a
power of 2. The constraint on the modulus comes from the fact that the image of the unit circle
under the mapping X 7 (X + X 1)/2 is the interval [1, 1]. The last one comes from the quasi-
invariance of P under X 7 1/X.
8.4 Computation of Chebyshev series 89
Note that the hypothesis on ar is quite natural from the analytic point of view: it implies that
all solutions of the linear dierential equation are analytic in a neighborhood of [1, 1] (by Cauchys
theorem), hence have a (quickly) convergent Chebyshev series. A consequence of this corollary is
that the recurrence possesses many solutions that tend to innity (those with || > 1) and thus do
not correspond to Chebyshev series. The symmetry property of Proposition 8.13 can be used to
prove that there is a one-to-one correspondence between divergent and convergent solutions of the
recurrence, encompassing also the solutions growing like powers of n!.
8.4.4 Symmetry
The considerations of the previous sections suggest the use of Millers method for a block of
dimension s, which is half the order of the recurrence, but still larger than the order of the
dierential equation. Further constraints are obtained by considering symmetric solutions only. Let
S = {n > s : bs(n) = 0} be the set of indices where the recurrence can be used backwards. Let then
E = {(u|n|)nZ : n N\S , P (u|n|) = 0}
denote the space of symmetric sequences that are solutions of the recurrence, except possibly
at n S.
Proposition 8.16. The symmetric, convergent solutions of P (un) = 0 are exactly the sequences
of E that satisfy the extra conditions
(P u)n = 0, n S {r, r + 1, ..., s}.
8.4.5 Algorithm
We conclude this chapter by stating without proof the (simple) algorithm due to Benoit, Jolde
and Mezzarobba and their (not so simple) main theorem.
Let p(x) be the degree-5 minimax approximation to e1/cos x over [1, 1]. We observed earlier
that obtaining a good enclosure of e1/cos x p(x) using the natural interval enclosure of this
expression on subintervals would require about 107 subintervals. In this chapter, we present some
tools that make it possible to get a certied enclosure, as sharp as wished, of this error function
e1/cos x p(x) in an ecient way.
Definition 9.1. Let f C([a, b]). A rigorous polynomial approximation to f is a pair (p, ) where
p R[x] and f (x) p(x) for all x [a, b].
Lemma 9.2. (Taylor- formula.) Let n N, and let f C n+1([a, b]). Let x0 [a, b]. For all x [a, b],
we have
n Z x (n+1)
X f (i)(x0) f (t)
f (x) = (x x0)i + (x t)n dt.
i! x n!
i=0 ||||||||||||0||||||||||||||||||||||||||||{z}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
Rn(x)
Using the mean value theorem, there exists x between x0 and x such that
f (n+1)(x)
Rn(x) = (x x0)n+1.
(n + 1)!
To obtain rigorous polynomial approximations based on the previous lemma, there are two
kinds of computations to perform:
that of the Taylor coecients;
1
that of (n + 1)!
f (n+1)([a, b]).
When it comes to compute values of derivatives of a function f , there are several options.
Divided dierences: this gives rise to gross errors.
Symbolic computation: time (and memory?)-consuming... Also may give rise to huge over-
estimations for composite functions.
Automatic dierentiation. (Cf. Griewanks texts [9][8].) This is the idea we develop next.
We want to compute values of derivatives of f up to order k. We associate to each function u present in the expression tree/dag of f an array of size k + 1, [u_0, u_1, ..., u_k]. Depending on the context, u_i may be either of u^{(i)}(x_0), u^{(i)}([x_0]) or u^{(i)}([a, b]). Then we define operations on these arrays that overload the usual ones. For instance, given u = [u_0, u_1, ..., u_k] and v = [v_0, v_1, ..., v_k], we define

u + v = [u_0 + v_0, u_1 + v_1, ...],
u · v = [u_0 v_0, u_0 v_1 + u_1 v_0, ...].
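A minimal Maple sketch of these overloaded operations (our own illustration; we store the normalized coefficients u_i = u^{(i)}(x_0)/i!, so that the product is a plain Cauchy convolution):

    # Truncated Taylor arrays [u_0, ..., u_k] with u_i = u^(i)(x_0)/i!
    # (Maple lists are 1-based: u[i+1] holds u_i).
    ad_add := proc(u, v)
      local i;
      [seq(u[i] + v[i], i = 1 .. nops(u))]    # componentwise sum
    end proc:
    ad_mul := proc(u, v)
      local i, j;
      # Cauchy product: w_i = sum(u_j * v_{i-j}, j = 0 .. i)
      [seq(add(u[j]*v[i - j + 1], j = 1 .. i), i = 1 .. nops(u))]
    end proc:

For example, starting from the coefficients [1, 1, 1/2] of exp at 0, ad_mul([1, 1, 1/2], [1, 1, 1/2]) returns [1, 2, 2], the first coefficients of exp(2x). Replacing the exact entries by intervals turns these values into enclosures.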
Definition 9.3. We call Taylor model a rigorous polynomial approximation computed using this scheme.

Basic functions. We use Taylor formulae. As most basic functions we deal with are D-finite, we can use linear recurrence relations to compute enclosures of their Taylor coefficients and (to some extent) remainders.
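For instance (a sketch of our own, under the convention that exact rationals would be replaced by intervals in a certified implementation), f = exp satisfies f′ = f, whence the recurrence (n + 1) c_{n+1} = c_n on its Taylor coefficients, which costs one operation per coefficient:

    # Taylor coefficients of exp at 0 from the recurrence (n+1) c_{n+1} = c_n;
    # evaluating each step in interval arithmetic yields certified enclosures.
    N := 10:
    c := Array(0 .. N):
    c[0] := 1:
    for n from 0 to N - 1 do c[n + 1] := c[n]/(n + 1) od:

Here seq(c[n], n = 0 .. N) returns 1, 1, 1/2, 1/6, ..., as expected.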
Second, why would this process yield tight enclosures? Our basic functions are analytic, and
hence have (fast) converging Taylor series.
If [a, b] = [−1, 1] and the x_i are the Chebyshev nodes of the first kind, we have

W_n(x) = T_{n+1}(x)/2^n.

In this case,

|f(x) − p(x)| ≤ |f^{(n+1)}([−1, 1])| / ((n + 1)! · 2^n).
We may define Chebyshev models similar to the Taylor models. We express the polynomials in the Chebyshev basis (T_k). For basic functions, we use an interpolant at the Chebyshev nodes of the first kind. The remainder formula is computed using the recurrence satisfied by the corresponding Taylor expansion. For composite functions, we resort to the same two-step process as in the Taylor case, making use of the formula T_i T_j = (T_{i+j} + T_{|i−j|})/2. For bounding the ranges of polynomials, we replace the Horner scheme with the Clenshaw scheme.
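A Maple sketch of the Clenshaw scheme (our own illustration); run with interval coefficients and an interval x, it returns an enclosure of the range of the polynomial:

    # Clenshaw evaluation of p(x) = sum(c_k T_k(x), k = 0 .. n),
    # where c is the list [c_0, c_1, ..., c_n] (Maple lists are 1-based).
    clenshaw := proc(c, x)
      local n, b1, b2, t, k;
      n := nops(c) - 1;
      b1 := 0;  b2 := 0;
      for k from n by -1 to 1 do
        t := c[k + 1] + 2*x*b1 - b2;   # b_k = c_k + 2x b_{k+1} - b_{k+2}
        b2 := b1;  b1 := t;
      end do;
      c[1] + x*b1 - b2                 # p(x) = c_0 + x b_1 - b_2
    end proc:

As a sanity check, clenshaw([0, 0, 1], x) simplifies to 2x^2 − 1 = T_2(x).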
9.3 A little, little, little bit of fixed-point theory

Consider the operator

Φ : C([−1, 1]) → C([−1, 1]),   y ↦ ( x ↦ y_0 + ∫_0^x a(t) y(t) dt ).
A function f is a solution to (9.1) if and only if Φ(f) = f, a fixed-point equation in the Banach space C([−1, 1]). For all u, v ∈ C([−1, 1]) and x ∈ [−1, 1], we have

|Φ(u)(x) − Φ(v)(x)| = | ∫_0^x a(t) (u(t) − v(t)) dt | ≤ ‖a‖_∞ |x| ‖u − v‖_∞ ≤ ‖a‖_∞ ‖u − v‖_∞,

and iterating this pointwise bound gives ‖Φ^i(u) − Φ^i(v)‖_∞ ≤ (‖a‖_∞^i / i!) ‖u − v‖_∞. Hence there exists i_0 such that ‖a‖_∞^{i_0}/i_0! < 1, and then Φ^{i_0} is a contraction over C([−1, 1]). So there exists a unique f ∈ C([−1, 1]) such that Φ(f) = f.
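To illustrate the iteration behind this fixed-point argument, here is a symbolic Picard iteration in Maple (our own sketch, for a polynomial coefficient a where the integrals are exact):

    # Picard iteration for y' = a(x) y, y(0) = y0, i.e. y = Phi(y):
    # each step applies y <- y0 + int(a(t) y(t), t = 0 .. x).
    picard := proc(a, y0, N)
      local y, i, t;
      y := y0;
      for i from 1 to N do
        y := y0 + int(subs(x = t, a)*subs(x = t, y), t = 0 .. x)
      end do;
      y
    end proc:

For instance, picard(1, 1, 4) returns 1 + x + x^2/2 + x^3/6 + x^4/24, the beginning of exp(x), and picard(2*x, 1, 3) produces the first partial sums of exp(x^2), in line with the contraction estimate above.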
Bibliography
[1] A. I. Aptekarev. Sharp constants for rational approximations of analytic functions. Russian Acad. Sci. Sb. Math., 193:1–72, 2002.
[2] Alexandre Benoit, Mioara Joldeș and Marc Mezzarobba. Rigorous uniform approximation of D-finite functions. 2012. In preparation.
[3] W. J. Cody, G. Meinardus and R. S. Varga. Chebyshev rational approximations to e^{−x} in [0, +∞) and applications to heat-conduction problems. J. Approximation Theory, 2:50–65, 1969.
[4] E. W. Cheney. Introduction to Approximation Theory, 2nd edition. AMS Chelsea Publishing, Providence, Rhode Island, 1982.
[5] Marc Daumas and Guillaume Melquiond. Certification of bounds on expressions involving rounded operators. ACM Transactions on Mathematical Software, 37(1):1–20, 2010.
[6] Florent de Dinechin, Christoph Lauter and Guillaume Melquiond. Certifying the floating-point implementation of an elementary function using Gappa. IEEE Transactions on Computers, 60(2):242–253, 2011.
[7] Rida T. Farouki. The Bernstein polynomial basis: a centennial retrospective. Comput. Aided Geom. Design, 29(6):379–419, 2012. Available from http://mae.engr.ucdavis.edu/~farouki/bernstein.pdf.
[8] Andreas Griewank. A mathematical view of automatic differentiation. Acta Numer., 12:321–398, 2003.
[9] Andreas Griewank and Andrea Walther. Evaluating Derivatives. Principles and Techniques of Algorithmic Differentiation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 2008.
[10] Nicholas Hale and Alex Townsend. Fast and accurate computation of Gauss–Legendre and Gauss–Jacobi quadrature nodes and weights. SIAM Journal on Scientific Computing, 35(2):A652–A674, 2013.
[11] Thomas C. Hales. A proof of the Kepler conjecture. Ann. of Math. (2), 162(3):1065–1185, 2005.
[12] Thomas C. Hales, John Harrison, Sean McLaughlin, Tobias Nipkow, Steven Obua and Roland Zumkeller. A revision of the proof of the Kepler conjecture. Discrete Comput. Geom., 44(1):1–34, 2010.
[13] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, fifth edition, 1979.
[14] Guillaume Melquiond. De l'arithmétique d'intervalles à la certification de programmes. PhD thesis, École Normale Supérieure de Lyon, Lyon, France, 2006.
[15] Oystein Ore. Linear equations in non-commutative fields. Annals of Mathematics, 32:463–477, 1931.
[16] Allan Pinkus. Weierstrass and approximation theory. J. Approx. Theory, 107(1):1–66, 2000. Available from http://www.math.technion.ac.il/hat/fpapers/wap.pdf.
[17] M. J. D. Powell. Approximation Theory and Methods. Cambridge University Press, Cambridge, 1981.
[18] A. Schönhage. Zur rationalen Approximierbarkeit von e^{−x} über [0, ∞). J. Approximation Theory, 7:395–398, 1973.
[19] H. Stahl. Best uniform rational approximation of |x| on [−1, 1]. Russian Acad. Sci. Sb. Math., 76:461–487, 1993.
[20] Warwick Tucker. The Lorenz attractor exists. C. R. Acad. Sci. Paris Sér. I Math., 328(12):1197–1202, 1999.
[21] Warwick Tucker. A rigorous ODE solver and Smale's 14th problem. Found. Comput. Math., 2(1):53–117, 2002.
[22] Lloyd N. Trefethen. Approximation Theory and Approximation Practice. SIAM, 2013. See http://www2.maths.ox.ac.uk/chebfun/ATAP/.
[23] Warwick Tucker. Validated Numerics: A Short Introduction to Rigorous Computations. Princeton University Press, Princeton, NJ, 2011.
[24] R. S. Varga and A. J. Carpenter. On the Bernstein conjecture in approximation theory. Constr. Approx., 1(4):333–348, 1985.
[25] L. Veidinger. On the numerical determination of the best approximations in the Chebyshev sense. Numer. Math., 2:99–105, 1960.
[26] Ray V. M. Zahar. A mathematical analysis of Miller's algorithm. Numerische Mathematik, 27(4):427–447, 1976.
[27] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, 2nd edition, 2003.