Remezr2 Arxiv
RATIONAL MINIMAX APPROXIMATION VIA ADAPTIVE BARYCENTRIC REPRESENTATIONS
SILVIU-IOAN FILIP∗ , YUJI NAKATSUKASA† , LLOYD N. TREFETHEN‡ , AND
BERNHARD BECKERMANN§
Abstract. Computing rational minimax approximations can be very challenging when there are
singularities on or near the interval of approximation — precisely the case where rational functions
outperform polynomials by a landslide. We show that far more robust algorithms than previously
available can be developed by making use of rational barycentric representations whose support
points are chosen in an adaptive fashion as the approximant is computed. Three variants of this
barycentric strategy are all shown to be powerful: (1) a classical Remez algorithm, (2) a “AAA-
Lawson” method of iteratively reweighted least-squares, and (3) a differential correction algorithm.
Our preferred combination, implemented in the Chebfun MINIMAX code, is to use (2) in an initial
phase and then switch to (1) for generically quadratic convergence. By such methods we can calculate
approximations up to type (80, 80) of |x| on [−1, 1] in standard 16-digit floating point arithmetic, a
problem for which Varga, Ruttan, and Carpenter required 200-digit extended precision.
Key words. barycentric formula, rational minimax approximation, Remez algorithm, differen-
tial correction algorithm, AAA algorithm, Lawson algorithm
where $\|\cdot\|_\infty$ denotes the infinity norm over $[a, b]$, i.e., $\|f - r\|_\infty = \max_{x\in[a,b]} |f(x) - r(x)|$. The minimizer of (1.2) is known to exist and to be unique [58, Ch. 24].
Let the minimax (or best) approximation be written $r^* = p/q \in \mathcal{R}_{m,n}$, where $p$
and $q$ have no common factors. The number $d = \min\{m - \deg p,\ n - \deg q\}$ is called
the defect of $r^*$. It is known that there exists a so-called alternant (or reference) set
consisting of ordered nodes $a \le x_0 < x_1 < \cdots < x_{m+n+1-d} \le b$, where $f - r^*$ takes
its global extremum over $[a, b]$ with alternating signs. In other words, we have the
beautiful equioscillation property [58, Theorem 24.1]
$$f(x_\ell) - r^*(x_\ell) = (-1)^{\ell+1}\lambda, \quad \ell = 0, \ldots, m+n+1-d, \qquad |\lambda| = \|f - r^*\|_\infty.$$
1 Chebfun’s previous remez command (until version 5.6.0 in December 2016) could only go up to
type (m, n) and include the initialization strategies that are crucial for making the
entire procedure into a fully practical algorithm.
This work is motivated by the recent AAA algorithm [43] for rational approxima-
tion, which uses adaptive barycentric representations with great success. A large part
of the text is focused on introducing a robust version of the rational Remez algorithm,
followed by a discussion of two other methods for discrete $\ell_\infty$ rational approximation:
the AAA-Lawson algorithm (efficient at least in the early stages, but non-robust) and
the DC algorithm (robust, but not very efficient). We shall see how all three algo-
rithms benefit from an adaptive barycentric basis. In practice, we advocate using the
Remez algorithm, mainly for its convergence properties (usually quadratic [21], unlike
AAA-Lawson, which converges linearly at best), practical speed (an eigenvalue-based
Remez implementation is usually much faster than a linear programming-based DC
method), and its ability to work with the interval [a, b] directly rather than requiring
a discretization (unlike both AAA-Lawson and DC). AAA-Lawson is used mainly as
an efficient approach to initialize the Remez algorithm.
The paper is organized as follows. In Section 2 we review the barycentric rep-
resentation for rational functions. Sections 3 to 6 are the core of the paper; here
we develop the barycentric rational Remez algorithm with adaptive basis functions.
Numerical experiments are presented in Section 7. We describe the AAA-Lawson
algorithm in Section 8, and in Section 9 we briefly present the barycentric version of
the differential correction algorithm. Section 10 presents a flow chart of minimax and
an example of how to compute a best approximation in Chebfun.
2. Barycentric rational functions. All of our methods are made possible by
a barycentric representation of r, in which both the numerator and denominator are
given as partial fraction expansions. Specifically, we consider
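$$r(z) = \frac{N(z)}{D(z)} = \sum_{k=0}^{n}\frac{\alpha_k}{z - t_k}\Bigg/\sum_{k=0}^{n}\frac{\beta_k}{z - t_k}, \qquad (2.1)$$
where the support points $t_0, \ldots, t_n$ are distinct. If $\omega_t(z) = \prod_{k=0}^{n}(z - t_k)$ denotes the node polynomial,
then $p(z) = \omega_t(z)N(z)$ and $q(z) = \omega_t(z)D(z)$ are both polynomials in $\mathbb{R}_n[x]$. We thus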
get r(z) = p(z)/q(z), meaning that r is a type (n, n) rational function. (This is not
necessarily sharp; r may also be of type (µ, ν) with µ < n and/or ν < n.) At each
point $t_k$ with nonzero $\alpha_k$ or $\beta_k$, formula (2.1) is undefined, but this is a removable
singularity with $\lim_{z\to t_k} r(z) = \alpha_k/\beta_k$ (or a simple pole in the case $\alpha_k \ne 0$, $\beta_k = 0$),
meaning $r$ is a rational interpolant to the values $\{\alpha_k/\beta_k\}$ at the support points $\{t_k\}$.
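A minimal Python sketch of evaluating (2.1), assuming real support points and coefficients. The name `bary_eval` is hypothetical and this is not Chebfun's implementation. When z coincides with a support point it returns the limit α_k/β_k, matching the removable-singularity remark above.

```python
import math

def bary_eval(z, t, alpha, beta):
    """Evaluate r(z) = N(z)/D(z) of (2.1) for real support points t."""
    # Terms with alpha_k = beta_k = 0 contribute nothing and are skipped.
    terms = [(tk, ak, bk) for tk, ak, bk in zip(t, alpha, beta) if ak != 0 or bk != 0]
    for tk, ak, bk in terms:
        if z == tk:
            # Removable singularity: the limit is alpha_k / beta_k
            # (a ZeroDivisionError here signals a genuine pole, beta_k = 0).
            return ak / bk
    num = sum(ak / (z - tk) for tk, ak, _ in terms)
    den = sum(bk / (z - tk) for tk, _, bk in terms)
    return num / den

# Interpolatory choice (2.2): alpha_k = f(t_k) beta_k gives r(t_k) = f(t_k).
t = [0.0, 0.5, 1.0]
beta = [1.0, -2.0, 1.0]
alpha = [math.exp(tk) * bk for tk, bk in zip(t, beta)]
print(bary_eval(0.5, t, alpha, beta), math.exp(0.5))
```

Choosing alpha equal to beta gives N = D, so the sketch returns 1 everywhere away from the support points.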
Much of the literature on barycentric representations exploits this interpolatory
property [7,8,10,12,27,55] by taking αk = f (tk )βk , so that r is an interpolant to some
given function values f (t0 ), . . . , f (tn ) at the support points. In this case
$$r(z) = \sum_{k=0}^{n}\frac{f(t_k)\beta_k}{z - t_k}\Bigg/\sum_{k=0}^{n}\frac{\beta_k}{z - t_k}, \qquad (2.2)$$
4 FILIP, NAKATSUKASA, TREFETHEN AND BECKERMANN
with the coefficients $\{\beta_k\}$ commonly known as barycentric weights; we have $r(t_k) = f(t_k)$ as long as $\beta_k \ne 0$. While such a property is useful and convenient when we
want to compute good approximations to f (see in particular the AAA algorithm),
for a best rational approximation r∗ we do not know a priori where r∗ will intersect f ,
so enforcing interpolation is not always an option. (We use interpolation for Remez
but not for AAA-Lawson or DC.) Formula (2.1), on the other hand, has 2n + 1
degrees of freedom and can be used to represent any rational function of type (n, n)
by appropriately choosing {αk } and {βk } [43, Theorem 2.1]. We remark that variants
of (2.1) also form the basis for the popular vector fitting [30, 31] method used to
match frequency response measurements of dynamical systems. A crucial difference
is that the support points {tk } in vector fitting are selected to approximate poles of
f , whereas, as we shall describe in detail, we choose them so that our representation
uses a numerically stable basis.
1. Let $Q = [1, \ldots, 1]^T$ when $m > n$, and $Q = [f(t_0), \ldots, f(t_n)]^T$ when $m < n$,
and normalize to have Euclidean norm 1.
2. Let $q$ be the last column of $Q$. Take the projection of $\mathrm{diag}(t_0, \ldots, t_{\max(m,n)})\,q$
onto the orthogonal complement of $Q$, normalize, and append it to the right
of $Q$. Repeat this until $Q$ has $|m - n|$ columns, so that $Q \in \mathbb{C}^{(\max(m,n)+1)\times|m-n|}$. In
MATLAB, one such step is q = Q(:,end); q = diag(t)*q; for i = 1:size(Q,2),
q = q-Q(:,i)*(Q(:,i)'*q); end, q = q/norm(q); Q = [Q,q];.
3. Take the orthogonal complement Q⊥ of Q via computing the QR factorization
of Q. Q⊥ is the desired matrix, Pm or Pn .
Note that the matrix $Q$ in the final step is well conditioned ($\kappa_2(Q) = 1$ in exact
arithmetic), so the final QR factorization is a stable computation.
2.2. Why does the barycentric representation help? The choice of the
support points {tk } is very important numerically, and indeed it is the flexibility
of where to place these points that is the source of the power of barycentric repre-
sentations. If the points are well chosen, the basis functions 1/(x − tk ) lead to a
representation of r that is much better conditioned (often exponentially better) than
the conventional representation as a ratio of polynomials. We motivate and explain
our adaptive choice of {tk } for the Remez algorithm in Sections 4.3 and 4.5. The
analogous choices for AAA-Lawson and DC are discussed in Sections 8.5 and 9.2.
To understand why a barycentric representation is preferable for rational approx-
imation, we first consider the standard quotient representation p/q. It is well known
that a polynomial will vary in size by exponentially large factors over an interval un-
less its roots are suitably distributed (approximating a minimal-energy configuration).
If p/q is a rational approximation, however, the zeros of p and q will be positioned by
approximation considerations, and if f has singularities or near-singularities they will
be clustered near those points. In the clustering region, p and q will be exponentially
smaller than in other parts of the interval and will lose much or all of their relative
accuracy. Since the quotient p/q depends on that relative accuracy, its accuracy too
will be lost.
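A quick numerical check of this claim, in a hypothetical setting not taken from the paper: the 20 roots of q cluster geometrically at 0, and 20 support points t_k are interlaced with them. We compare how much |q| and |q/ω_t| (the quantity D represents, since q = ω_t D) vary between x = 1 and a point next to the cluster. For simplicity deg q = deg ω_t here, unlike (2.1).

```python
import math

# Hypothetical data: roots of q clustering geometrically at 0, and support
# points interlaced with them (2^-j < 1.5 * 2^-j < 2^-(j-1)).
roots = [2.0 ** -j for j in range(1, 21)]
support = [1.5 * 2.0 ** -j for j in range(1, 21)]

def q(x):
    """Denominator in the quotient representation p/q."""
    return math.prod(x - z for z in roots)

def q_over_omega(x):
    """q(x) / omega_t(x), the kind of quantity D(x) in (2.1) represents."""
    return math.prod((x - z) / (x - s) for z, s in zip(roots, support))

far, near = 1.0, 1.25 * 2.0 ** -20      # away from and next to the cluster
ratio_q = abs(q(far)) / abs(q(near))
ratio_D = abs(q_over_omega(far)) / abs(q_over_omega(near))
print(f"|q| varies by a factor {ratio_q:.1e}; |q/omega_t| by {ratio_D:.1e}")
```

In this example |q| changes by more than 50 orders of magnitude between the two points, while |q/ω_t| changes by fewer than six.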
Fig. 2.1: Linear (top) and semilogy (bottom) plots of $q$ and $|D|$ in $r^* = p/q = N/D$, the
best rational approximation to $|x|$ of type $(m, n) = (20, 20)$. Here $p$, $q$ are the polynomials
in the classical quotient representation (1.1), and $D$ is the denominator in the barycentric
representation (2.1). The dots are the equioscillation points $\{x_\ell\}$, while the set of support
points $\{t_k\}$ consists of every other point in $\{x_\ell\}$.
where the perturbations of the $\alpha_k$ and $\beta_k$ are of relative size $O(u)$, or more precisely, bounded in terms of
$(1 + u)^{3n+4}$. In other words, $\hat r(x)$ is an exact evaluation of (2.1) for slightly perturbed
$\{\alpha_k\}$, $\{\beta_k\}$. Note that when $r$ represents a polynomial (as assumed in [34]), (2.6)
does not imply backward stability. However, as a rational function for which we allow
for backward errors in the denominator, (2.6) does imply backward stability.
For the forward error, we can adapt the analysis of [14, Proposition 2.4.3]. Assume
that the computed coefficients $\hat\alpha$, $\hat\beta$ are obtained through a backward stable process,
where $\kappa_\alpha$ and $\kappa_\beta$ are condition numbers associated with the matrices used to determine
$\hat\alpha$ and $\hat\beta$. Then, if $x$ (the evaluation point) and $\{t_k\}$ are considered to be floating point
numbers, we have
RATIONAL MINIMAX APPROXIMATION 7
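Lemma 2.1. The relative forward error for the computed value $\hat r(x)$ of (2.1)
satisfies
$$\frac{|r(x) - \hat r(x)|}{|r(x)|} \le u\big(n+3+O(\kappa_\alpha)\big)\frac{\sum_{k=0}^{n}\left|\frac{\alpha_k}{x-t_k}\right|}{\left|\sum_{k=0}^{n}\frac{\alpha_k}{x-t_k}\right|} + u\big(n+2+O(\kappa_\beta)\big)\frac{\sum_{k=0}^{n}\left|\frac{\beta_k}{x-t_k}\right|}{\left|\sum_{k=0}^{n}\frac{\beta_k}{x-t_k}\right|} + O(u^2). \qquad (2.7)$$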
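The two fractions in (2.7) measure cancellation in the sums defining N(x) and D(x): they equal 1 when all terms have the same sign and grow when terms cancel. A small Python helper follows; `first_order_bound` replaces the unspecified O(κ) terms by κ itself (an assumption for illustration only), and u = 2^-53 is the unit roundoff of IEEE double precision.

```python
def cancellation_factor(x, t, coeffs):
    """sum_k |c_k/(x - t_k)| / |sum_k c_k/(x - t_k)|, as in the fractions of (2.7)."""
    terms = [c / (x - tk) for c, tk in zip(coeffs, t)]
    return sum(abs(v) for v in terms) / abs(sum(terms))

def first_order_bound(x, t, alpha, beta, kappa_alpha=1.0, kappa_beta=1.0, u=2.0 ** -53):
    """Right-hand side of (2.7) without O(u^2), taking O(kappa) = kappa (an assumption)."""
    n = len(t) - 1
    return (u * (n + 3 + kappa_alpha) * cancellation_factor(x, t, alpha)
            + u * (n + 2 + kappa_beta) * cancellation_factor(x, t, beta))

# No cancellation: all terms have the same sign, so the factor is 1.
print(cancellation_factor(3.0, [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]))
# Cancellation: 1/10 - 1/9 is much smaller in size than 1/10 + 1/9.
print(cancellation_factor(10.0, [0.0, 1.0], [1.0, -1.0]))
```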
with $s \in \{\pm1\}$ and such that for at least one $\ell \in \{0, \ldots, m+n+1\}$, the left-hand side of (3.2) equals $\|f - r_k\|_\infty$. If $r_k$ has converged to within a given
threshold $\varepsilon_t > 0$ (i.e., $(\|f - r_k\|_\infty - |\lambda_k|)/\|f - r_k\|_\infty \le \varepsilon_t$ [50, eq. (10.8)]),
return $r_k$; else go to Step 2 with $k \leftarrow k + 1$.
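A minimal Python sketch of the two checks in Step 3, assuming the error values f(x_ℓ) − r_k(x_ℓ) at the candidate points are already available. The function names are hypothetical, and the additional requirement that one point attain ‖f − r_k‖∞ is not checked here. Passing `weights` gives the weighted form of (3.2).

```python
def satisfies_alternation(errors, lam, weights=None, tol=0.0):
    """Check (3.2), or its weighted form, at candidate reference points.

    errors[l] = f(x_l) - r_k(x_l) for the candidates x_0 < ... < x_{m+n+1}.
    True if some s in {+1, -1} gives s (-1)^l w(x_l) errors[l] >= |lam| for all l.
    """
    if weights is None:
        weights = [1.0] * len(errors)
    werr = [w * e for w, e in zip(weights, errors)]
    return any(all(s * (-1) ** l * e >= abs(lam) - tol for l, e in enumerate(werr))
               for s in (1, -1))

def has_converged(norm_err, lam, eps_t):
    """Stopping test (||f - r_k|| - |lambda_k|) / ||f - r_k|| <= eps_t."""
    return (norm_err - abs(lam)) / norm_err <= eps_t

errs = [-0.11, 0.105, -0.12, 0.1]          # hypothetical error values
print(satisfies_alternation(errs, 0.1))    # s = -1 works here
print(has_converged(max(map(abs, errs)), 0.1, eps_t=0.2))
```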
If Step 2 is always successful, then convergence to the best approximation is
assured [63, Theorem 9.14]. It might happen that Step 2 fails, namely when all rational
solutions satisfying the equations (3.1) have poles in [a, b]. If the best approximation
is non-degenerate and the initial reference set is already sufficiently close to optimal,
then the algorithm will converge [11, §V.6.B]. To our knowledge, there is no effective
way in general to determine when degeneracy is the cause of failure.
We note that the rational Remez algorithm can also be adapted to work in the
case of weighted best rational approximation. An early account of this is given in [22].
Given a positive weight function $w \in C([a, b])$, the goal is to find $r^* \in \mathcal{R}_{m,n}$ such
that the weighted error $\|f - r^*\|_{w,\infty} = \max_{x\in[a,b]} |w(x)(f(x) - r^*(x))|$ is minimal.
Equations (3.1) and (3.2) get modified to
$$w(x_\ell^{(k)})\big(f(x_\ell^{(k)}) - r_k(x_\ell^{(k)})\big) = (-1)^{\ell+1}\lambda_k, \quad \ell = 0, \ldots, m+n+1$$
and
$$s(-1)^\ell\, w(x_\ell^{(k+1)})\big(f(x_\ell^{(k+1)}) - r_k(x_\ell^{(k+1)})\big) \ge |\lambda_k|, \quad \ell = 0, \ldots, m+n+1,$$
while the norm computations in Step 3 are taken with respect to w. Notice that the
ability to work with the weighted error immediately allows us to compute the best
approximation in the relative sense, by taking w(x) = 1/|f (x)|, assuming that f is
nonzero over [a, b].
We discuss each step of the rational Remez algorithm in the following sections.
We first address Step 2, as this is the core part where the barycentric representation
is used. We then discuss initialization (Step 1) in Section 5, and finding the next
reference set (Step 3) in Section 6. Our focus is on the unweighted setting, but we
comment on how our ideas can be extended to the weighted case as well.
4. Computing the trial approximation. For notational simplicity, in this
section we drop the index k referring to the iteration number, the analysis being valid
for any iteration of the rational Remez algorithm. We begin with the case m = n.
4.1. Linear algebra in a polynomial basis. We first derive the Remez algorithm in an (arbitrary) polynomial basis. At each iteration, we search for $r = p/q \in \mathcal{R}_{n,n}$, $p, q \in \mathbb{R}_n[x]$ such that
As described in Powell [50, Ch. 10.2], solving (4.3) is usually done by eliminating
$c_p$. His presentation considers the monomial basis, but the approach is valid for any
basis of $\mathbb{R}_n[x]$. By taking the full QR decomposition of $D\Phi_x$, we get
$$D\Phi_x = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R.$$
(The top-left (n+1)×(n+1) block has all eigenvalues at infinity, and is thus irrelevant.)
In terms of polynomials, $(Q_1)_{\ell,k} = d_\ell\,\psi_k(x_\ell)$, $0 \le k \le n$, $0 \le \ell \le 2n+1$, where
$(\psi_k)_{0\le k\le n}$ is a family of orthonormal polynomials with respect to the discrete inner
product $\langle f, g\rangle_x = \sum_{k=0}^{2n+1} d_k^2 f(x_k) g(x_k)$. Moreover, if $(\varphi_k)_{0\le k\le n}$ is a degree-graded
basis with $\deg\varphi_k = k$, then we have $\deg\psi_k = k$, $0 \le k \le n$.
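Let $\omega_x$ be the node polynomial associated with the reference nodes $x_0, \ldots, x_{2n+1}$,
and $\Omega_x = \mathrm{diag}\big(1/\omega_x'(x_0), \ldots, 1/\omega_x'(x_{2n+1})\big)$. We have [50, p. 114]
$$V_x^T\Omega_x V_x = 0, \qquad (4.5)$$
since the $(i, j)$ entry of $V_x^T\Omega_x V_x$, namely $\sum_{\ell=0}^{2n+1} x_\ell^{i+j}/\omega_x'(x_\ell)$, is the divided difference of order $2n+1$ of the function $x^{i+j}$ at the $\{x_\ell\}$ nodes, hence
0 if $i + j \le 2n$.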
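The identity behind (4.5) is easy to check numerically: for distinct nodes, the divided difference of order N of g equals Σ_ℓ g(x_ℓ)/ω_x'(x_ℓ), which vanishes for polynomials of degree below N and equals 1 for x^N. A minimal Python sketch with hypothetical node values:

```python
import math

def omega_prime(x, l):
    """omega_x'(x_l) = prod_{j != l} (x_l - x_j) for distinct nodes x."""
    return math.prod(x[l] - xj for j, xj in enumerate(x) if j != l)

def top_divided_difference(x, g):
    """[x_0, ..., x_N] g = sum_l g(x_l) / omega_x'(x_l)."""
    return sum(g(xl) / omega_prime(x, l) for l, xl in enumerate(x))

# 2n + 2 = 6 nodes (n = 2): entries of V_x^T Omega_x V_x involve x^(i+j), i + j <= 2n.
x = [-1.0, -0.6, -0.1, 0.3, 0.7, 1.0]
n = 2
for p in range(2 * n + 2):
    print(p, top_divided_difference(x, lambda z, p=p: z ** p))
```

Up to rounding, the printed values are 0 for p ≤ 2n = 4 and 1 for p = 2n + 1 = 5.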
By using the appropriate change of basis matrix in (4.5), we have
$$\Phi_x^T\Omega_x\Phi_x = 0. \qquad (4.6)$$
Now, by multiplying (4.3) on the left by $\Phi_x^T\Omega_x D^{-1}$ and using (4.6), we can eliminate
the $c_p$ term to obtain
Equation (4.7) is the extension of [50, Eq. (10.13)] from the monomial basis to
ϕ0 , . . . , ϕn . Moreover, we have:
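Lemma 4.1. The matrix $\Phi_x^T\Omega_x S\Phi_x$ is symmetric positive definite.
Proof. Since $\Omega_x S = |\Omega_x|$, the diagonal matrix $\Omega_x S$ is symmetric positive definite; as $\Phi_x$ has full column rank ($n+1$ basis functions evaluated at $2n+2$ distinct nodes),
the conclusion follows. See also [50, Theorem 10.2].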
Since $\Phi_x^T\Omega_x F\Phi_x$ is also symmetric, it follows that all eigenvalues of (4.7) are real
and at most one eigenvector $c_q$ corresponds to a pole-free solution $r$ (i.e., $q$ has no
root in $[a, b]$). To see this, suppose to the contrary that there exists another pole-free
solution $r'$. Then, from (4.1), it follows that either the values $r(x_k) - r'(x_k)$ are all zero or they
alternate in sign at least $2n + 1$ times. In both cases, $r - r' \in \mathcal{R}_{2n,2n}$ has at least
$2n + 1$ zeros inside $[a, b]$, leading to $r = r'$.
We can in fact transform (4.4) into a symmetric eigenvalue problem (an observation which seems to date to [49]) by considering the choice $D = |\Omega_x|^{1/2}$, which leads
to $Q_2 = SQ_1$ in view of (4.6). The system becomes $Q_1^T SFQ_1Rc_q = \lambda Q_1^T S^2 Q_1Rc_q$,
which, by the change of variables $y = Rc_q$ and since $S^2 = I$ and $Q_1^TQ_1 = I$, gives the symmetric eigenvalue problem $Q_1^T(SF)Q_1 y = \lambda y$.
The vectors Rcp and Rcq can be seen as vectors of coefficients of the numerator
and denominator of r in the orthogonal basis ψ0 , . . . , ψn . The (scaled) values of
the denominator at each xk corresponding to an eigenvector y can be recovered by
computing
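$$|\Omega_x|^{1/2}\Phi_x c_q = Q_1 y. \qquad (4.9)$$
From this we can confirm the uniqueness of the pole-free solution: since the eigenvectors $y$ are orthonormal, the corresponding vectors $Q_1y$ are orthogonal, and two vectors whose entries are all nonzero and each of a single sign cannot be orthogonal. Hence at most one eigenvector generates a vector of denominator values of
the same sign, making it the only pole-free solution candidate.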
4.2. Linear algebra in a barycentric basis. An equivalent analysis is valid
if we take r in the barycentric form (2.1). Namely, (4.1) becomes
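$$C\alpha = \left(\begin{bmatrix} f(x_0) & & & \\ & f(x_1) & & \\ & & \ddots & \\ & & & f(x_{2n+1}) \end{bmatrix} - \lambda\begin{bmatrix} -1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\right) C\beta, \qquad (4.10)$$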
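Based on Lemma 4.3, we can again take $Q_2 = SQ_1$. From (4.11) we get
$$\begin{bmatrix} |\Delta|^{1/2}C & -F|\Delta|^{1/2}C \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \lambda\begin{bmatrix} 0 & -S|\Delta|^{1/2}C \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$$
Multiplying this expression on the left by $\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T$ gives a block triangular matrix
pencil, whose $(n + 1) \times (n + 1)$ lower-right corner is the barycentric analogue of (4.4):
$Q_2^T FQ_1R\beta = \lambda Q_2^T SQ_1R\beta$. After substituting $Q_2^T = Q_1^T S$ and using $S^2 = I$, $Q_1^TQ_1 = I$, we get the standard symmetric eigenvalue problem $Q_1^T(SF)Q_1 y = \lambda y$ with $y = R\beta$.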
As in the polynomial case, there is at most one solution such that $q(x) = D(x)\omega_t(x)$
has no root in $[a, b]$; indeed, (4.9) and (4.14) represent the values of $q(x_\ell)$ for $r = p/q$
and $x_\ell$ satisfying equation (4.1). We use this sign test involving (4.14) to determine
the levelled error $\lambda$ that gives a pole-free $r$ in Step 2 of our rational Remez algorithm.
The appropriate β is then taken by solving Rβ = y. From (4.10), we have
$$|\Delta|^{1/2}C\alpha = (F - \lambda S)|\Delta|^{1/2}C\beta,$$
and
$$Q_1^T(SF)Q_1 y = \lambda Q_1^T W^{-1}Q_1 y,$$
where W = diag (w(x0 ), . . . , w(x2n+1 )) and all the other quantities are the same as
before. While not leading to a symmetric eigenvalue problem, the symmetric and
symmetric positive definite matrices appearing in the second pencil seem to suggest
that the eigenproblem computations will again correspond to well-conditioned opera-
tions. Our experiments support this statement and we leave it as future work to make
this rigorous. To recover $\alpha$, (4.15) becomes $R\alpha = Q_1^T(F - \lambda SW^{-1})Q_1 y$.
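To tie Sections 4.2 and 4.3 together, here is a minimal, self-contained Python sketch of Step 2 for m = n in the unweighted case. Several ingredients are assumptions rather than the paper's code: we take |Δ|^{1/2} = diag(|ω_t(x_ℓ)|/√|ω_x'(x_ℓ)|), which agrees with the entries in Corollary 4.5; S = diag(−1, 1, ..., −1, 1) as in (4.10); modified Gram-Schmidt and a cyclic Jacobi iteration stand in for Householder QR and a library symmetric eigensolver; and the support points in the usage example are simply midpoints, not the adaptive choice of Section 4.3. For each eigenpair it recovers β from Rβ = y and α from Rα = Q1^T(F − λS)Q1 y, and flags the pole-free candidate by the sign test on q(x_ℓ) = ω_t(x_ℓ)D(x_ℓ).

```python
import math

def thin_qr(A):
    """Thin QR of a tall matrix A (list of rows) by modified Gram-Schmidt."""
    m, k = len(A), len(A[0])
    Q = [row[:] for row in A]
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        for i in range(j):
            R[i][j] = sum(Q[r][i] * Q[r][j] for r in range(m))
            for r in range(m):
                Q[r][j] -= R[i][j] * Q[r][i]
        R[j][j] = math.sqrt(sum(Q[r][j] ** 2 for r in range(m)))
        for r in range(m):
            Q[r][j] /= R[j][j]
    return Q, R

def back_solve(R, b):
    """Solve the upper-triangular system R z = b."""
    z = [0.0] * len(b)
    for i in reversed(range(len(b))):
        z[i] = (b[i] - sum(R[i][j] * z[j] for j in range(i + 1, len(b)))) / R[i][i]
    return z

def jacobi_eigh(M, sweeps=100):
    """Eigenvalues and eigenvectors (columns of V) of a small symmetric matrix."""
    n = len(M)
    A = [row[:] for row in M]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):  # A <- J^T A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):  # V <- V J
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V

def levelled_step(f, x, t):
    """Step 2 (m = n): every eigen-solution as (lam, alpha, beta, pole_free)."""
    N, n1 = len(x), len(t)
    wt = [math.prod(xl - tk for tk in t) for xl in x]                 # omega_t(x_l)
    wxp = [math.prod(xl - xj for j, xj in enumerate(x) if j != l)     # omega_x'(x_l)
           for l, xl in enumerate(x)]
    d = [abs(wt[l]) / math.sqrt(abs(wxp[l])) for l in range(N)]       # |Delta|^{1/2} (assumed)
    Q1, R = thin_qr([[d[l] / (x[l] - tk) for tk in t] for l in range(N)])
    S = [(-1.0) ** (l + 1) for l in range(N)]                         # diag(-1, 1, ..., 1)
    F = [f(xl) for xl in x]
    M = [[sum(Q1[l][i] * S[l] * F[l] * Q1[l][j] for l in range(N)) for j in range(n1)]
         for i in range(n1)]                                          # Q1^T (SF) Q1
    lams, V = jacobi_eigh(M)
    out = []
    for idx, lam in enumerate(lams):
        y = [V[i][idx] for i in range(n1)]
        Q1y = [sum(Q1[l][i] * y[i] for i in range(n1)) for l in range(N)]
        qvals = [wt[l] * Q1y[l] / d[l] for l in range(N)]             # q(x_l) = omega_t D
        pole_free = all(v > 0 for v in qvals) or all(v < 0 for v in qvals)
        rhs = [sum(Q1[l][i] * (F[l] - lam * S[l]) * Q1y[l] for l in range(N))
               for i in range(n1)]                                    # Q1^T (F - lam S) Q1 y
        out.append((lam, back_solve(R, rhs), back_solve(R, y), pole_free))
    return out

x = [-1.0, -0.5, 0.5, 1.0]                   # 2n + 2 reference points, n = 1
t = [(x[0] + x[1]) / 2, (x[2] + x[3]) / 2]   # hypothetical support points
for lam, alpha, beta, pole_free in levelled_step(math.exp, x, t):
    print(lam, pole_free)
```

By the argument above, at most one returned solution can have `pole_free` set, while every eigenpair satisfies the levelled equations (4.10) at the reference points.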
4.3. Conditioning of the QR factorization. Since the above discussion makes
heavy use of the matrix $Q_1$, it is desirable that computing the (thin) QR factorization
$|\Delta|^{1/2}C = Q_1R$ is a well-conditioned operation.
Here we examine the conditioning of $Q_1$, the orthogonal factor in the QR factorization of $|\Delta|^{1/2}C$, as this is the key matrix for constructing (4.12). We use the
fact that the standard Householder QR algorithm is invariant under column scaling, that is, it computes the same $Q_1$ for both $|\Delta|^{1/2}C$ and $|\Delta|^{1/2}C\Gamma$ for diagonal
$\Gamma$ [33, Ch. 19]. We thus consider
$$\min_{\Gamma\in\mathcal{D}_{n+1}} \kappa_2\big(|\Delta|^{1/2}C\Gamma\big), \qquad (4.16)$$
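Proof. Let $\{y_j\}$ be a $(2n+2)$-element set such that $y_j \in (x_j, x_{j+1})$, $j = 0, \ldots, 2n$,
$y_{2n+1} > x_{2n+1}$, and let $C_{x,y} \in \mathbb{R}^{(2n+2)\times(2n+2)}$ be the Cauchy matrix with elements
$(C_{x,y})_{j,k} = 1/(x_j - y_k)$. If we consider $D_1 = \mathrm{diag}\big(\sqrt{|\omega_y(x_j)/\omega_x'(x_j)|}\big)$ and $D_2 = \mathrm{diag}\big(\sqrt{\omega_x(y_j)/\omega_y'(y_j)}\big)$, then the matrix $D_1C_{x,y}D_2$ is orthogonal. This follows, for
instance, if we examine the elements of its associated Gram matrix $G$ and use divided
differences. Indeed, for an arbitrary element $(G)_{j,k}$ with $j \ne k$, we have
$$-(G)_{j,k} = \sqrt{\frac{\omega_x(y_j)\,\omega_x(y_k)}{\omega_y'(y_j)\,\omega_y'(y_k)}}\ \sum_{\ell=0}^{2n+1}\frac{\omega_y(x_\ell)}{(x_\ell - y_j)(x_\ell - y_k)\,\omega_x'(x_\ell)} = \sqrt{\frac{\omega_x(y_j)\,\omega_x(y_k)}{\omega_y'(y_j)\,\omega_y'(y_k)}}\ [x_0, \ldots, x_{2n+1}]\frac{\omega_y(x)}{(x - y_j)(x - y_k)} = 0.$$
Similarly, since $\prod_{j\ne k}(x - y_j) = q(x)(x - y_k) + \omega_y'(y_k)$, with $q \in \mathbb{R}_{2n}[x]$, we have
$$\begin{aligned}
-(G)_{k,k} &= \frac{\omega_x(y_k)}{\omega_y'(y_k)}\sum_{\ell=0}^{2n+1}\frac{\omega_y(x_\ell)}{(x_\ell - y_k)^2\,\omega_x'(x_\ell)} = \frac{\omega_x(y_k)}{\omega_y'(y_k)}\,[x_0, \ldots, x_{2n+1}]\frac{\prod_{j\ne k}(x - y_j)}{x - y_k} \\
&= \frac{\omega_x(y_k)}{\omega_y'(y_k)}\Big([x_0, \ldots, x_{2n+1}]\,q(x) + \omega_y'(y_k)\,[x_0, \ldots, x_{2n+1}]\frac{1}{x - y_k}\Big) \\
&= \omega_x(y_k)\,[x_0, \ldots, x_{2n+1}]\frac{1}{x - y_k} = \omega_x(y_k)\,\frac{-1}{\omega_x(y_k)} = -1.
\end{aligned}$$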
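The orthogonality claim in this proof can be checked numerically. A minimal Python sketch that builds D1 C_{x,y} D2 for hypothetical interlaced nodes and forms its Gram matrix, which should equal the identity up to rounding:

```python
import math

def orthogonal_cauchy(x, y):
    """D1 C_{x,y} D2 from the proof of Theorem 4.4, for interlaced x and y."""
    N = len(x)
    wy_x = [math.prod(xj - yk for yk in y) for xj in x]                  # omega_y(x_j)
    wxp_x = [math.prod(xj - xi for i, xi in enumerate(x) if i != j)      # omega_x'(x_j)
             for j, xj in enumerate(x)]
    wx_y = [math.prod(yj - xi for xi in x) for yj in y]                  # omega_x(y_j)
    wyp_y = [math.prod(yj - yi for i, yi in enumerate(y) if i != j)      # omega_y'(y_j)
             for j, yj in enumerate(y)]
    d1 = [math.sqrt(abs(wy_x[j] / wxp_x[j])) for j in range(N)]
    d2 = [math.sqrt(wx_y[j] / wyp_y[j]) for j in range(N)]  # positive thanks to interlacing
    return [[d1[j] * d2[k] / (x[j] - y[k]) for k in range(N)] for j in range(N)]

x = [-1.0, -0.7, -0.2, 0.1, 0.6, 1.0]      # 2n + 2 nodes, n = 2
y = [-0.9, -0.4, 0.0, 0.3, 0.8, 1.5]       # y_j in (x_j, x_{j+1}), y_5 > x_5
A = orthogonal_cauchy(x, y)
G = [[sum(A[r][i] * A[r][j] for r in range(6)) for j in range(6)] for i in range(6)]
```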
Let $\Gamma = I_t^T D_2 I_t$ be as in the proof of Theorem 4.4. It turns out that for the choice
$t_k = x_{2k+1} - \varepsilon$, $s_k = x_{2k+1} + \varepsilon$, for $k = 0, \ldots, n$, as $\varepsilon \to 0$, the matrix $|\Delta|^{1/2}C$ has
a finite limit $\widetilde C$ of full column rank, and similarly $\Gamma$ tends to some diagonal matrix
$\widetilde\Gamma$ with positive diagonal entries. From Theorem 4.4 and its proof we know that $\widetilde C\widetilde\Gamma$
has condition number 1, and, more precisely, orthonormal columns. We thus obtain
an explicit thin QR decomposition of $\widetilde C$ (by direct calculation):
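Corollary 4.5. In the limit $t_k \nearrow x_{2k+1}$, for $k = 0, \ldots, n$, the matrix $|\Delta|^{1/2}C$
converges to $\widetilde C$, with entries
$$(\widetilde C)_{j,k} = \begin{cases} \dfrac{|\omega_t'(t_k)|}{\sqrt{|\omega_x'(t_k)|}} & \text{if } j = 2k+1, \\[1ex] 0 & \text{if } j = 2\ell+1,\ \ell \ne k, \\[1ex] \dfrac{|\omega_t(x_j)|}{\sqrt{|\omega_x'(x_j)|}}\,\dfrac{1}{x_j - t_k} & \text{if } j = 2\ell, \end{cases}$$
and the triangular factor of its thin QR decomposition is
$$R = \sqrt{2}\,\mathrm{diag}\left(\frac{|\omega_t'(t_0)|}{\sqrt{|\omega_x'(t_0)|}}, \ldots, \frac{|\omega_t'(t_n)|}{\sqrt{|\omega_x'(t_n)|}}\right).$$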
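A numerical check of Corollary 4.5 with hypothetical nodes: build C̃ from the entries above (taking t_k = x_{2k+1}) and compare C̃^T C̃ with the square of the stated R factor. The helper name is ours.

```python
import math

def limit_matrix(x):
    """C~ of Corollary 4.5 (t_k = x_{2k+1}) and the diagonal of its R factor."""
    N, n1 = len(x), len(x) // 2
    t = [x[2 * k + 1] for k in range(n1)]
    wxp = [math.prod(xj - xi for i, xi in enumerate(x) if i != j) for j, xj in enumerate(x)]
    wt = [math.prod(xj - tk for tk in t) for xj in x]                      # omega_t(x_j)
    wtp = [math.prod(t[k] - ti for i, ti in enumerate(t) if i != k) for k in range(n1)]
    Ct = [[0.0] * n1 for _ in range(N)]
    for j in range(N):
        for k in range(n1):
            if j % 2 == 0:                      # j = 2l: scaled Cauchy entry
                Ct[j][k] = abs(wt[j]) / math.sqrt(abs(wxp[j])) / (x[j] - t[k])
            elif j == 2 * k + 1:                # limit of the entry where t_k meets x_j
                Ct[j][k] = abs(wtp[k]) / math.sqrt(abs(wxp[j]))
    r = [math.sqrt(2.0) * abs(wtp[k]) / math.sqrt(abs(wxp[2 * k + 1])) for k in range(n1)]
    return Ct, r

x = [-1.0, -0.6, -0.1, 0.3, 0.7, 1.0]
Ct, r = limit_matrix(x)
G = [[sum(Ct[l][i] * Ct[l][j] for l in range(len(x))) for j in range(3)] for i in range(3)]
```

The columns come out mutually orthogonal with norms r_k, so C̃ R^{-1} has orthonormal columns, which is the explicit thin QR factorization the corollary describes.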
Corollary 4.5 suggests the choice
4.4. The nondiagonal case $m \ne n$. As pointed out in Section 2.1, when searching for a best approximant with $m \ne n$, we need to force the coefficient vector $\alpha$ or $\beta$
to lie in a certain subspace. This results in modified versions of (4.11). Namely,
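$$\begin{bmatrix} C & -FCP_n \end{bmatrix}\begin{bmatrix} \alpha \\ \hat\beta \end{bmatrix} = \lambda\begin{bmatrix} 0 & -SCP_n \end{bmatrix}\begin{bmatrix} \alpha \\ \hat\beta \end{bmatrix}, \quad \text{when } m > n, \qquad (4.19)$$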
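Consider the (thin) QR decomposition of $|\Delta|^{1/2}C\begin{bmatrix} P_n & P_n^\perp \end{bmatrix} = (S\Delta)^{1/2}C\begin{bmatrix} P_n & P_n^\perp \end{bmatrix}$:
$$|\Delta|^{1/2}C\begin{bmatrix} P_n & P_n^\perp \end{bmatrix} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}R = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\begin{bmatrix} R_1 & R_{12} \\ 0 & R_2 \end{bmatrix}.$$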
Then we have the identity $\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T(SQ_1) = 0$, as can be verified analogously to (4.5)
using divided differences. This implies $(SQ_1)^T|\Delta|^{1/2}C = 0$, so by left-multiplying
(4.21) by $\begin{bmatrix} (SQ_1)^\perp & SQ_1 \end{bmatrix}^T$ we obtain a block upper-triangular eigenvalue problem
with lower-right $(n + 1) \times (n + 1)$ block
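Therefore
$$\alpha = \begin{bmatrix} P_n & P_n^\perp \end{bmatrix}R^{-1}\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T FQ_1 y,$$
which is obtained by computing the vector $\hat y = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T FQ_1 y$, then solving $R\tilde y = \hat y$
for $\tilde y$, then $\alpha = \begin{bmatrix} P_n & P_n^\perp \end{bmatrix}\tilde y$.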
Case m < n. This case is analogous to the previous one; we highlight the differ-
ences. C is a (m + n + 2) × (n + 1) matrix. Equation (4.20) is equivalent to
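$$\begin{bmatrix} |\Delta|^{1/2}CP_m & -F|\Delta|^{1/2}C \end{bmatrix}\begin{bmatrix} \hat\alpha \\ \beta \end{bmatrix} = \lambda\begin{bmatrix} 0 & -S|\Delta|^{1/2}C \end{bmatrix}\begin{bmatrix} \hat\alpha \\ \beta \end{bmatrix}. \qquad (4.23)$$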
which also reduces to the standard symmetric eigenvalue problem (setting y = Rβ)
$$\hat R\hat\alpha = \hat Q_1^T F|\Delta|^{1/2}C\beta = \hat Q_1^T FQ_1R\beta = \hat Q_1^T FQ_1 y.$$
Therefore
$$\hat\alpha = \hat R^{-1}\hat Q_1^T FQ_1 y,$$
and
$$\hat\alpha = \hat R^{-1}\hat Q_1^T(F - \lambda SW^{-1})Q_1 y, \quad m < n.$$
Stability and conditioning. We have just shown that the matrices arising in our
rational Remez algorithm have explicit expressions, and the eigenvalue problem re-
duces to a standard symmetric problem. Indeed, our experiments corroborate that
we have greatly improved the stability and conditioning of the rational Remez al-
gorithm using the barycentric representation. However, the algorithm is still not
[Figure 5.1 panels: left, degree paths in the (m, n) plane from (0, 0) through (m1, n1), (m2, n2), (m3, n3); right, approximation domain versus normalized indices, with the new initialization points {x″_ℓ}.]
Fig. 5.1: Initialization with lower degree approximations. The left plot shows the three
possible paths for updating the degrees (assuming the increment is $j = 1$): $m < n$ (red),
$m = n$ (black) and $m > n$ (blue). The right plot shows how initialization is done at an
intermediate step. The function is $f_1$ from Table 7.1, with a singularity at $x = 1/\sqrt{2}$. The
y components of the red crosses correspond to the final references $\{x'_\ell\}$ for the $(m', n') = (10, 10)$ best approximation, while the y components of the black circles are the initial guess
$\{x''_\ell\}$ for the $(m'', n'') = (11, 11)$ problem, taken based on the piecewise linear fit at $\{x'_\ell\}$.
Note how the y components of both sets of points cluster near the singularity.
[Figure 6.1 legend: Current Ref., Next Ref., Error; vertical axis scaled by $10^{-4}$.]
Fig. 6.1: Illustration of how a new set of reference points (black stars) is found from the
current error function e = f − r (blue curve). Shown here is the error curve after three
Remez iterations in finding the best type (10, 10) approximation to f (x) = |x| on [−1, 1].
We split this interval into subintervals separated by the previous reference points (red circles),
and approximate e on each subinterval by a low-degree polynomial. We then find the roots
of its derivative.
and select those with the largest values that satisfy (3.2).
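A simplified Python sketch of Step 3. It swaps the paper's low-degree polynomial fits and derivative roots for dense sampling on each subinterval, and uses a common selection heuristic (keep the largest point of each same-sign run, then trim the smaller end point) that we assume here rather than take from the paper. The test function T5, the Chebyshev polynomial of degree 5, is hypothetical data that equioscillates at six points.

```python
def next_reference(e, refs, a, b, count, samples=200):
    """Hypothetical sketch of Step 3 by dense sampling (not the paper's polynomial fits).

    e: error function f - r; refs: current reference points; count: m + n + 2.
    """
    breaks = sorted(set([a, b] + list(refs)))
    cands = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):           # one subinterval at a time
        grid = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
        cands.append(max(grid, key=lambda z: abs(e(z))))  # largest |e| on the grid
    cands = sorted(set(cands))
    alt = []                                              # keep the largest of each same-sign run
    for z in cands:
        if alt and (e(z) > 0) == (e(alt[-1]) > 0):
            if abs(e(z)) > abs(e(alt[-1])):
                alt[-1] = z
        else:
            alt.append(z)
    while len(alt) > count:                               # trim the smaller end point
        alt.pop(0 if abs(e(alt[0])) < abs(e(alt[-1])) else -1)
    return alt

def T5(z):
    """Chebyshev polynomial of degree 5; it equioscillates at 6 points of [-1, 1]."""
    return 16 * z ** 5 - 20 * z ** 3 + 5 * z

new_refs = next_reference(T5, [-0.9, -0.5, 0.0, 0.5, 0.9], -1.0, 1.0, count=6)
```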
7. Numerical results. All computations in this section were done using Cheb-
fun’s new minimax command in standard IEEE double precision arithmetic.
Let us start with our core example of approximating |x| on [−1, 1], a problem
discussed in detail in [58, Ch. 25]. For more than a century, this problem has at-
tracted interest. The work of Bernstein and others in the 1910s led to the theorem
that degree $n \ge 0$ polynomial approximations of this function can achieve at most
$O(n^{-1})$ accuracy, whereas Newman in 1964 showed that rational approximations can
achieve root-exponential accuracy [45]. The convergence rate for best type $(n, n)$
approximations was later shown by Stahl [56] to be $E_{n,n}(|x|, [-1, 1]) \sim 8e^{-\pi\sqrt{n}}$.
This result had in fact been conjectured by Varga, Ruttan and Carpenter [62]
based on a specialized multiple precision (200 decimal digits) implementation of the
Remez algorithm. Their computations were performed on the square root function,
using the fact that $E_{2n,2n}(|x|, [-1, 1]) = E_{n,n}(\sqrt{x}, [0, 1])$, as follows from symmetry.
They went up to n = 40. In both settings, the equioscillation points cluster expo-
nentially around x = 0 (see second plot of Figure 7.1), making it extremely difficult
to compute best approximations. Our barycentric Remez algorithm in double preci-
sion arithmetic is able to match their performance, in the sense that we obtain the
type (80, 80) best approximation to |x| in less than 15 seconds on a desktop machine.
The results are showcased in Figure 7.1, where our levelled error computation for the
type (80, 80) approximation (value $4.39\ldots \times 10^{-12}$) matches the corresponding error
of [62, Table 1] to two significant digits, even though the floating point precision is no
better than $10^{-16}$.
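The asymptotic formula can be compared with the levelled error quoted above. Type (80, 80) for |x| corresponds, via E_{2n,2n}(|x|, [−1, 1]) = E_{n,n}(√x, [0, 1]), to n = 40 for √x, the largest case in [62]. A short Python check:

```python
import math

def stahl_asymptotic(n):
    """Stahl's asymptotic formula 8 exp(-pi sqrt(n)) for E_{n,n}(|x|, [-1, 1])."""
    return 8.0 * math.exp(-math.pi * math.sqrt(n))

computed = 4.39e-12                  # levelled error for type (80, 80) quoted above
estimate = stahl_asymptotic(80)
print(f"asymptotic {estimate:.2e}, computed {computed:.2e}, ratio {estimate / computed:.2f}")
```

The formula gives about 5.0 × 10^-12, within roughly 15% of the computed value; since the formula is only asymptotic, exact agreement at n = 80 is not expected.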
Running the other non-barycentric codes (Maple’s numapprox[minimax], Math-
ematica’s MiniMaxApproximation (which requires f to be analytic on [a, b]), and
Chebfun’s previous remez) on the same example resulted in failures at very small
values of n (all for n ≤ 8).
The robustness of our algorithm is also illustrated by the examples of Table 7.1
and Figure 7.2, which is a highlight of the paper. Computing these five approximations
takes in total less than 50 seconds with minimax. Example f4 is taken from [60, §5],
while f5 is inspired by [51]. The difficulty of approximating f5 is even more pronounced
than for |x|, since best type (n, n) approximations to f5 offer at most $O(n^{-1})$ accuracy (a stark contrast to the root-exponential behavior of $E_{n,n}(|x|, [-1, 1])$) and the
reference points cluster even more strongly, quickly falling below machine precision.
Fig. 7.1: In the first plot, the upper dots show the best approximation errors for the degree
2n best polynomial approximations of |x| on [−1, 1], while the lower ones correspond to the
best type (n, n) rational approximations, superimposed on the asymptotic formula from [56].
The bottom plot shows the minimax error curve for the type (80, 80) best approximation
to |x|. Note that the horizontal axis has a log scale: the alternant ranges over 11 orders of
magnitude. The positive part of the domain [−1, 1] is shown (by symmetry the other half is
essentially the same).
Table 7.1
i | $f_i$ | $[a, b]$ | $(m, n)$ | $\|f - r^*\|_\infty$
1 | $x^2$ for $x < 1/\sqrt{2}$; $-x^2 + 2\sqrt{2}\,x - 1$ for $1/\sqrt{2} \le x$ | [0, 1] | (22, 22) | $2.439 \times 10^{-9}$
2 | $\sqrt{|x|}\,|x|$ | [−0.7, 2] | (17, 71) | $4.371 \times 10^{-8}$
3 | $x^3 + \frac{\sqrt[3]{x}\,e^{-x^2}}{8}$ | [−0.2, 0.5] | (45, 23) | $2.505 \times 10^{-5}$
4 | $\frac{100\pi(x^2 - 0.36)}{\sinh(100\pi(x^2 - 0.36))}$ | [−1, 1] | (38, 38) | $1.780 \times 10^{-12}$
5 | $-\frac{1}{\log|x|}$ | [−0.1, 0.1] | (8, 8) | $1.52 \times 10^{-2}$
In Figures 7.3 and 7.4, we further illustrate minimax and its weighted variant,
[Figure 7.2 panels: for each of f1, ..., f5, the function (left) and the error curve of its best approximation (right).]
Fig. 7.2: Error curves for the best rational approximations of Table 7.1.
Fig. 7.3: Result of the weighted version of our barycentric Remez algorithm for the function
$f(x) = \sqrt{x}$, $x \in [10^{-8}, 1]$, with $w(x) = 1/\sqrt{x}$ and a type (17, 17) rational approximation. We
plot the absolute error curve on the left, while the relative error (right), matching our choice
of $w$, gives an expected equioscillating curve. This is Zolotarev’s third problem.
Fig. 7.4: The error in type (35, 34) best approximation to the sign function on $[-10^4, -1] \cup [1, 10^4]$, computed via $x\,r(1/x^2)$, where $r(x) \approx \sqrt{x}$ as obtained in Figure 7.3. This is
Zolotarev’s fourth problem.
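The type (35, 34) in Fig. 7.4 and the role of the weight in Fig. 7.3 follow from a short calculation, sketched here under the sole assumption that r = p/q with deg p, deg q ≤ 17 (the type (17, 17) approximation of Fig. 7.3). The second identity shows that the absolute error for the sign function equals the relative error of r, which is what the weight w(x) = 1/√x controls.

```latex
% Type count: r = p/q with \deg p, \deg q \le 17
x\,r(1/x^2) = \frac{x\,p(1/x^2)}{q(1/x^2)}
            = \frac{x \cdot x^{34}\,p(1/x^2)}{x^{34}\,q(1/x^2)},
\qquad \deg\big(x^{34}p(1/x^2)\big),\ \deg\big(x^{34}q(1/x^2)\big) \le 34 .

% Error transfer: y = 1/x^2 \in [10^{-8}, 1] when 1 \le |x| \le 10^4,
% and \operatorname{sign}(x) = x\sqrt{y}
\big|x\,r(y) - \operatorname{sign}(x)\big| = |x|\,\big|r(y) - \sqrt{y}\big|
                                          = \frac{|r(y) - \sqrt{y}|}{\sqrt{y}} .
```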
where $Z = \{z_1, \ldots, z_M\}$ is a set of distinct points (sample points) in $[a, b]$. The
number $M$ is usually large, e.g. $10^5$, and in particular much bigger than $m$ and $n$.
The idea is that the solution for the discrete problem (8.1) should converge to the
continuous one (1.2) if we discretize the interval densely enough.
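$$\min_{\|\alpha\|_2^2 + \|\beta\|_2^2 = 1}\ \big\|f(\tilde Z)D(\tilde Z) - N(\tilde Z)\big\|_w \qquad (8.4)$$
via the SVD of the matrix $\mathrm{diag}(\sqrt{w})\begin{bmatrix} C & -FC \end{bmatrix}$ (recall (8.2)). If the resulting
$\|f(Z) - N(Z)/D(Z)\|_\infty$ is not smaller than before, then set $\gamma := \gamma/2$.
2. Update $w$ by
$$w_j \leftarrow w_j\left|f(Z_j) - \frac{N(Z_j)}{D(Z_j)}\right|^{\gamma} \quad \forall j, \qquad \text{then} \quad w_j := \frac{w_j}{\sum_i w_i}. \qquad (8.5)$$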
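A minimal Python sketch of the reweighting step (8.5); the least-squares solve (8.4) and the halving of γ are left out, and the function name is hypothetical. In this multiplicative update, a sample point whose residual is exactly zero keeps weight zero from then on.

```python
def lawson_update(w, residuals, gamma=1.0):
    """Reweighting step (8.5): w_j <- w_j |e_j|^gamma, then normalize so sum(w) = 1.

    residuals[j] = f(Z_j) - N(Z_j)/D(Z_j); gamma is the step exponent that the
    iteration halves whenever the max error fails to decrease.
    """
    new = [wj * abs(ej) ** gamma for wj, ej in zip(w, residuals)]
    total = sum(new)
    return [v / total for v in new]

w = [0.25, 0.25, 0.25, 0.25]
res = [0.1, -0.2, 0.05, 0.0]     # hypothetical residuals at four sample points
w = lawson_update(w, res)
```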
Fig. 8.1: Error of rational approximants to f (x) = |x| by the AAA and AAA-Lawson algo-
rithms. The black dots are the support points. They are also interpolation points for AAA,
but not for AAA-Lawson.
[Figure 8.2: error (log scale) versus iteration for AAA-Lawson and AAA-Lawson+Remez.]
Fig. 8.2: Convergence of AAA-Lawson alone and AAA-Lawson followed by Remez, for $f(x) = |x|$, $m = n = 10$. The error is measured by $\|r^* - r_k\|_\infty$, where $r_k$ is the $k$th iterate. AAA-Lawson converges linearly, whereas Remez converges quadratically.
often achieving an error of the same order of magnitude as the best approximation.
Our experiments suggest that AAA-Lawson is at least as efficient and robust as these
alternatives.
8.5. Adaptive choice of support points. At an early stage of the AAA-
Lawson iteration, we usually do not have the correct number (m + n + 2) of reference
(oscillation) points in the error curve. Therefore, choosing the support points {tk }
as in (4.18) is not an option. Instead, we use the same support points chosen by the
AAA algorithm, which is typically a good set. Once convergence sets in and the error
curve of the AAA-Lawson iterates has at least m + n + 2 alternation points, we can
switch to the adaptive choice (4.18) as in Remez. We note, however, that adaptively
changing the support points may further complicate the convergence, since it changes
the linear least-squares problem (8.4).
8.6. Adaptive choice of the sample points. For solving the continuous prob-
lem (1.2), we take the sample point set Z to be M points uniformly distributed on
[a, b] ($M \lesssim 10^5$, chosen to keep the run time under control). Generally, it is necessary
to sample more densely near a singularity if there is one; this is important e.g. for
$f(x) = |x|$. We incorporate this need as follows: use AAA to find the support points
$\{t_k\}$ (assume they are sorted), and take $M/n$ points in each interval $[t_k, t_{k+1}]$.
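A minimal Python sketch of this sampling rule, with hypothetical support points clustered at 0. We place the points at cell midpoints so that none coincides with a support point. Like the description above, the sketch only covers [t_0, t_n]; if a or b is not a support point, one would presumably add them as end points (an assumption).

```python
def sample_points(t, M):
    """Place M // n sample points in each [t_k, t_{k+1}], n = len(t) - 1.

    Points sit at the midpoints of equal cells, so none coincides with a support point.
    """
    ts = sorted(t)
    n = len(ts) - 1
    per = M // n
    Z = []
    for lo, hi in zip(ts[:-1], ts[1:]):
        h = (hi - lo) / per
        Z.extend(lo + (j + 0.5) * h for j in range(per))
    return Z

t = [-1.0, -0.01, 0.0, 0.01, 1.0]   # hypothetical support points clustered at 0
Z = sample_points(t, 400)
```

Half of the 400 points land in [−0.01, 0.01], which is the extra density near the singularity that this strategy is meant to provide.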
9. A barycentric version of the differential correction algorithm. The
DC algorithm, due to Cheney and Loeb [16], has the great advantage of guaranteed
global convergence in theory [3,25], which applies whether the approximation domain
X is an interval [a, b] or a finite set. It can also be extended to multivariate ap-
proximation problems [32]. In practice, however, it may suffer greatly from rounding
errors, and its speed is often disappointing on larger problems. As we shall now de-
scribe, we have found that the first of these difficulties can be largely eliminated by
the use of barycentric representations with adaptively chosen support points. The
second problem of speed, however, remains, which is why ultimately we prefer the
Remez algorithm for most problems.
9.1. The barycentric formulation. For an effective implementation, X needs
to be a finite set (e.g. obtained by discretizing [a, b]) to reduce each iteration to a linear
programming (LP) problem. Considering the diagonal case m = n, a barycentric
version of the DC algorithm can be defined recursively as follows. (We assume the
support points are fixed to the values t0 , . . . , tn , which do not belong to X.) Given
rk = Nk /Dk ∈ Rn,n (X), choose the partial fraction decompositions N and D of (2.1)
that minimize the expression
$$\max_{x\in X}\frac{|f(x)D(x) - N(x)| - \delta_k|D(x)|}{|D_k(x)|}, \qquad (9.1)$$
subject to
and
where δk = maxx∈X |f (x) − rk (x)|. If r = N/D is not good enough, continue with
rk+1 = r. By imposing (9.3), we can establish convergence using an argument anal-
ogous to [3, Theorem 2]. In the polynomial basis setting, we know that the rate of
convergence will ultimately be at least quadratic if the best approximation is non-
degenerate [3, Theorem 3]. Non-diagonal approximations can be computed by adding
the appropriate null space constraints as described in Section 4.4.
9.2. Choice of support points. Compared to the case of the barycentric Re-
mez algorithm, changing the support points at each iteration of the DC algorithm
makes it hard to impose a normalization condition similar to (9.3) or do a conver-
gence analysis of the method. We therefore fix {tk } throughout the execution. The
strategy we have adopted is based on Section 5.3: recursively construct type (`, `)
approximations with ` ≤ n. We take the set of support points of the (`, `) problem
based on a piecewise linear fit of the final reference points of the (` − 1, ` − 1) problem
(similar to what is shown in Figure 5.1).
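A minimal Python sketch of a piecewise linear fit of the previous reference points against their normalized index, resampled at a new number of points, as in the right panel of Fig. 5.1. The function name and the exact interpolation rule are our assumptions, not the paper's code.

```python
def init_from_previous(prev_refs, count):
    """Piecewise-linear fit of sorted prev_refs against normalized index, resampled at count points."""
    N = len(prev_refs)
    out = []
    for i in range(count):
        s = i * (N - 1) / (count - 1)       # position on the old index scale
        j = min(int(s), N - 2)
        frac = s - j
        out.append((1 - frac) * prev_refs[j] + frac * prev_refs[j + 1])
    return out

prev = [0.0, 0.5, 0.9, 0.99, 1.0]   # hypothetical references clustering near 1
new = init_from_previous(prev, 7)
```

Because the fit follows the old points, the new set inherits their clustering near 1.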
9.3. Experiments. We have implemented² the barycentric DC algorithm in
MATLAB using CVX [29] to specify the LP problems corresponding to (9.1)–(9.3),
² The prototype code used is available at https://github.com/sfilip/barycentricDC.
Table 9.1: Best type (16, 16) approximations to four functions using the barycentric DC
algorithm. X consists of 20000 equispaced points inside [−1, 1].
i | $f_i$ | $\|f_i - r^*\|_{X,\infty}$
1 | $\sum_{k=0}^{\infty} 2^{-k}\cos(3^k x)$ | 0.1377
2 | $\min\{\operatorname{sech}(3\sin(10x)),\ \sin(9x)\}$ | 0.0610
3 | $\sqrt{|x^3|} + |x + 0.5|$ | $1.2057 \cdot 10^{-4}$
4 | $\frac{1}{2}\operatorname{erf}\big(x/\sqrt{0.0002}\big) + \frac{3}{2}e^{-x}$ | $6.2045 \cdot 10^{-6}$
[Figure 9.1 panels: for each of f1, ..., f4, the function (left) and the error curve (right) on [−1, 1].]
Fig. 9.1: The functions of Table 9.1 with error curves for best rational approximations
computed by the barycentric DC algorithm.
which are then solved using MOSEK’s [41] state-of-the-art LP optimizers. The four
examples in Table 9.1 and Figure 9.1, for instance, demonstrate the effectiveness of the
algorithm. For comparison, the sensitivity to the initial reference set prevented the
convergence of our barycentric Remez implementation on all four of these examples.
Function f1 is particularly interesting since it is a version of Weierstrass’s classic
example of a continuous but nowhere differentiable function.
Using a monomial or Chebyshev basis representation for the LP formulations
quickly failed due to numerical errors, illustrating that the barycentric representation
is crucial for the DC algorithm just as for the Remez algorithm.
We nevertheless echo the statement at the beginning of this section about the
downsides of the DC approach:
• Its overall cost. Producing the approximations in Figure 9.1 took several
minutes in MATLAB on a desktop machine for each example.
• Numerical optimization tools for solving the corresponding LP problems break
down at lower values of m and n than the ones we achieved with the barycen-
tric Remez algorithm. We were usually able to go up to about type (20, 20).
[Figure 10.1 flowchart boxes: Input: f, (m, n), [a, b]; AAA-Lawson approximation; CF approximation; Lower degree approximation; Find λ_k and r_k using symmetric eigenvalue problem; Find new reference set; Output: r∗; Failed. Decision labels: valid, converged (Yes / No).]
Fig. 10.1: Flowchart summarizing the minimax implementation of the rational Remez algo-
rithm in the unweighted case. It follows the steps outlined at the start of Section 3. Step 1
consists of picking the initial reference set. This is done by applying in succession (if needed)
the strategies discussed in Sections 5.1, 5.2 and 5.3. Next up in Step 2 is computing the cur-
rent approximant rk and alternation error λk . We do this by solving a symmetric eigenvalue
problem (4.13), (4.22) or (4.24), depending on m = n, m > n or m < n. We then pick, if
possible, the eigenpair leading to a rational approximant with no poles in [a, b] (see discus-
sion around equation (4.14)). The next reference set is determined in Step 3 as explained in
Section 6. If convergence is successful, the routine outputs a numerical approximant of r∗ .
Acknowledgement. We thank the reviewers for their useful comments and sug-
gestions, which helped improve the quality of the paper.
REFERENCES