
RATIONAL MINIMAX APPROXIMATION VIA ADAPTIVE BARYCENTRIC REPRESENTATIONS
SILVIU-IOAN FILIP∗ , YUJI NAKATSUKASA† , LLOYD N. TREFETHEN‡ , AND
BERNHARD BECKERMANN§

Abstract. Computing rational minimax approximations can be very challenging when there are
singularities on or near the interval of approximation — precisely the case where rational functions
outperform polynomials by a landslide. We show that far more robust algorithms than previously
available can be developed by making use of rational barycentric representations whose support
points are chosen in an adaptive fashion as the approximant is computed. Three variants of this
barycentric strategy are all shown to be powerful: (1) a classical Remez algorithm, (2) a “AAA-
Lawson” method of iteratively reweighted least-squares, and (3) a differential correction algorithm.
Our preferred combination, implemented in the Chebfun MINIMAX code, is to use (2) in an initial
phase and then switch to (1) for generically quadratic convergence. By such methods we can calculate
approximations up to type (80, 80) of |x| on [−1, 1] in standard 16-digit floating point arithmetic, a
problem for which Varga, Ruttan, and Carpenter required 200-digit extended precision.

Key words. barycentric formula, rational minimax approximation, Remez algorithm, differen-
tial correction algorithm, AAA algorithm, Lawson algorithm

AMS subject classifications. 41A20, 65D15

1. Introduction. The problem we are interested in is that of approximating
functions $f \in C([a, b])$ using type $(m, n)$ rational approximations with real coefficients,
in the $L^\infty$ setting. The set of feasible approximations is

\[ \mathcal{R}_{m,n} = \left\{ \frac{p}{q} : p \in \mathbb{R}_m[x],\ q \in \mathbb{R}_n[x] \right\}. \tag{1.1} \]

Given $f$ and prescribed nonnegative integers $m, n$, the goal is to compute

\[ \min_{r \in \mathcal{R}_{m,n}} \|f - r\|_\infty, \tag{1.2} \]

where $\|\cdot\|_\infty$ denotes the infinity norm over $[a, b]$, i.e., $\|f - r\|_\infty = \max_{x \in [a,b]} |f(x) - r(x)|$.
The minimizer of (1.2) is known to exist and to be unique [58, Ch. 24].
Let the minimax (or best) approximation be written $r^* = p/q \in \mathcal{R}_{m,n}$, where $p$
and $q$ have no common factors. The number $d = \min\{m - \deg p,\ n - \deg q\}$ is called
the defect of $r^*$. It is known that there exists a so-called alternant (or reference) set
consisting of ordered nodes $a \le x_0 < x_1 < \cdots < x_{m+n+1-d} \le b$, where $f - r^*$ takes

∗ Univ Rennes, Inria, CNRS, IRISA, F-35000 Rennes, France ([email protected]).
† National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
([email protected]).
‡ Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK ([email protected]).
SF and LNT were supported by the European Research Council
under the European Union's Seventh Framework Programme (FP7/2007–2013)/ERC grant agreement
291068. The views expressed in this article are not those of the ERC or the European
Commission, and the European Union is not liable for any use that may be made of the information
contained here. YN was supported by Japan Society for the Promotion of Science as an Overseas
Research Fellow.
§ Laboratoire Paul Painlevé UMR 8524, Dept. Mathématiques, Univ. Lille, F-59655 Villeneuve
d'Ascq CEDEX, France ([email protected]). Supported in part by the Labex
CEMPI (ANR-11-LABX-0007-01).

its global extremum over [a, b] with alternating signs. In other words, we have the
beautiful equioscillation property [58, Theorem 24.1]

\[ f(x_\ell) - r^*(x_\ell) = (-1)^{\ell+1} \lambda, \qquad \ell = 0, \ldots, m + n + 1 - d, \tag{1.3} \]

where $|\lambda| = \|f - r^*\|_\infty$. Minimax approximations with $d > 0$ are called degenerate,
and they can cause problems for computation. Accordingly, unless otherwise stated,
we make the assumption that d = 0 for (1.2). In practice, degeneracy most often
arises due to symmetries in approximating even or odd functions, and we check for
these cases explicitly to make sure they are treated properly. Other degeneracies can
usually be detected by examining in succession the set of best approximations of types
(m − k, n − k), (m − k + 1, n − k + 1), . . . , (m, n) with k = min {m, n} [11, p. 161].
In the approximation theory literature [11, 15, 40, 50, 63], two algorithms are
usually considered for the numerical solution of (1.2), the rational Remez and dif-
ferential correction (DC) algorithms. The various challenges that are inherent in
rational approximations can, more often than not, make the use of such methods
difficult. Finding the best polynomial approximation, by contrast, can usually be
done robustly by a standard implementation of the linear version of the Remez al-
gorithm [47]. This might explain why the current software landscape for minimax
rational approximations is rather barren. Nevertheless, implementations of the ra-
tional Remez algorithm are available in some mathematical software packages: the
Mathematica MiniMaxApproximation function, the Maple numapprox[minimax] rou-
tine and the MATLAB Chebfun [24] remez code. The Boost C++ libraries [1] also
contain an implementation.
Over the years, the applications that have benefited most from minimax rational
approximations come from recursive filter design in signal processing [13, 23] and the
representation of special functions [18, 19]. Apart from such practical motivations,
we believe it worthwhile to pursue robust numerical methods for computing these
approximations because of their fundamental importance to approximation theory.
A new development of this kind has already resulted from the algorithms described
here: the discovery that type $(k, k)$ rational approximations to $x^n$, for $n \gg k$, converge
geometrically at the rate $O(9.28903\cdots^{-k})$ [44].
In this paper we present elements that greatly improve the numerical robust-
ness of algorithms for computing best rational approximations. The key idea is the
use of barycentric representations with adaptively chosen basis functions, which can
overcome the numerical difficulties frequently encountered when f has nonsmooth
points. For instance, when trying to approximate f (x) = |x| on [−1, 1] using stan-
dard IEEE double precision arithmetic in MATLAB, our barycentric Remez algorithm
can compute rational approximants of type up to (82, 82), higher than that obtained
by Varga, Ruttan and Carpenter in [62] using 200-digit arithmetic.¹
A similar Remez iteration using the barycentric representation was described by
Ioniţă [35, Sec. 3.2.3] in his PhD thesis. We adopt the same set of support points
(see Section 4.3), and our analysis justifies its choice: we prove its optimality in
a certain sense. A difference from Ioniţă's treatment is that we reduce the core
computational task to a symmetric eigenvalue problem, rather than a generalized
eigenproblem as in [35]. The bigger difference is that Ioniţă treated just the core
iteration for approximations of type (n, n), whereas we generalize the approach to

¹ Chebfun's previous remez command (until version 5.6.0 in December 2016) could only go up to
type (8, 8).



type (m, n) and include the initialization strategies that are crucial for making the
entire procedure into a fully practical algorithm.
This work is motivated by the recent AAA algorithm [43] for rational approxima-
tion, which uses adaptive barycentric representations with great success. A large part
of the text is focused on introducing a robust version of the rational Remez algorithm,
followed by a discussion of two other methods for discrete $\ell_\infty$ rational approximation:
the AAA-Lawson algorithm (efficient at least in the early stages, but non-robust) and
the DC algorithm (robust, but not very efficient). We shall see how all three algo-
rithms benefit from an adaptive barycentric basis. In practice, we advocate using the
Remez algorithm, mainly for its convergence properties (usually quadratic [21], unlike
AAA-Lawson, which converges linearly at best), practical speed (an eigenvalue-based
Remez implementation is usually much faster than a linear programming-based DC
method), and its ability to work with the interval [a, b] directly rather than requiring
a discretization (unlike both AAA-Lawson and DC). AAA-Lawson is used mainly as
an efficient approach to initialize the Remez algorithm.
The paper is organized as follows. In Section 2 we review the barycentric rep-
resentation for rational functions. Sections 3 to 6 are the core of the paper; here
we develop the barycentric rational Remez algorithm with adaptive basis functions.
Numerical experiments are presented in Section 7. We describe the AAA-Lawson
algorithm in Section 8, and in Section 9 we briefly present the barycentric version of
the differential correction algorithm. Section 10 presents a flow chart of minimax and
an example of how to compute a best approximation in Chebfun.
2. Barycentric rational functions. All of our methods are made possible by
a barycentric representation of r, in which both the numerator and denominator are
given as partial fraction expansions. Specifically, we consider

\[ r(z) = \frac{N(z)}{D(z)} = \sum_{k=0}^{n} \frac{\alpha_k}{z - t_k} \Bigg/ \sum_{k=0}^{n} \frac{\beta_k}{z - t_k}, \tag{2.1} \]

where $n \in \mathbb{N}$, $\alpha_0, \ldots, \alpha_n$ and $\beta_0, \ldots, \beta_n$ are sets of real coefficients and $t_0, \ldots, t_n$ is a
set of distinct real support points. The names $N$ and $D$ stand for "numerator" and
"denominator".
If we denote by $\omega_t$ the node polynomial associated with $t_0, \ldots, t_n$,

\[ \omega_t(z) = \prod_{k=0}^{n} (z - t_k), \]

then $p(z) = \omega_t(z)N(z)$ and $q(z) = \omega_t(z)D(z)$ are both polynomials in $\mathbb{R}_n[x]$. We thus
get $r(z) = p(z)/q(z)$, meaning that $r$ is a type $(n, n)$ rational function. (This is not
necessarily sharp; $r$ may also be of type $(\mu, \nu)$ with $\mu < n$ and/or $\nu < n$.) At each
point $t_k$ with nonzero $\alpha_k$ or $\beta_k$, formula (2.1) is undefined, but this is a removable
singularity with $\lim_{z \to t_k} r(z) = \alpha_k/\beta_k$ (or a simple pole in the case $\alpha_k \ne 0$, $\beta_k = 0$),
meaning $r$ is a rational interpolant to the values $\{\alpha_k/\beta_k\}$ at the support points $\{t_k\}$.
Much of the literature on barycentric representations exploits this interpolatory
property [7,8,10,12,27,55] by taking αk = f (tk )βk , so that r is an interpolant to some
given function values $f(t_0), \ldots, f(t_n)$ at the support points. In this case

\[ r(z) = \sum_{k=0}^{n} \frac{f(t_k)\beta_k}{z - t_k} \Bigg/ \sum_{k=0}^{n} \frac{\beta_k}{z - t_k}, \tag{2.2} \]

with the coefficients $\{\beta_k\}$ commonly known as barycentric weights; we have $r(t_k) = f(t_k)$
as long as $\beta_k \ne 0$. While such a property is useful and convenient when we
want to compute good approximations to f (see in particular the AAA algorithm),
for a best rational approximation r∗ we do not know a priori where r∗ will intersect f ,
so enforcing interpolation is not always an option. (We use interpolation for Remez
but not for AAA-Lawson or DC.) Formula (2.1), on the other hand, has 2n + 1
degrees of freedom and can be used to represent any rational function of type (n, n)
by appropriately choosing {αk } and {βk } [43, Theorem 2.1]. We remark that variants
of (2.1) also form the basis for the popular vector fitting [30, 31] method used to
match frequency response measurements of dynamical systems. A crucial difference
is that the support points {tk } in vector fitting are selected to approximate poles of
f , whereas, as we shall describe in detail, we choose them so that our representation
uses a numerically stable basis.
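As an illustration of (2.1) and (2.2), the following minimal Python sketch evaluates a barycentric rational function and compares it with the equivalent quotient $p/q = (\omega_t N)/(\omega_t D)$. The function names `bary_eval` and `quotient_eval`, the support points and the weights are hypothetical choices made for this example only; they are not taken from the Chebfun code.

```python
import math

def bary_eval(z, t, alpha, beta):
    """Evaluate r(z) = N(z)/D(z) in the barycentric form (2.1)."""
    for tk, ak, bk in zip(t, alpha, beta):
        if z == tk:
            # Removable singularity: the limit is alpha_k / beta_k (beta_k != 0 assumed).
            return ak / bk
    num = sum(ak / (z - tk) for tk, ak in zip(t, alpha))
    den = sum(bk / (z - tk) for tk, bk in zip(t, beta))
    return num / den

def quotient_eval(z, t, alpha, beta):
    """Same function via p/q, with p = omega_t * N and q = omega_t * D."""
    def times_node_poly(coeffs):
        return sum(c * math.prod(z - tj for j, tj in enumerate(t) if j != k)
                   for k, c in enumerate(coeffs))
    return times_node_poly(alpha) / times_node_poly(beta)

# Interpolatory choice alpha_k = f(t_k) * beta_k from (2.2), with f(x) = |x|.
f = abs
t = [-1.0, -0.5, 0.25, 1.0]        # illustrative support points
beta = [1.0, -2.0, 2.0, -1.0]      # illustrative nonzero weights
alpha = [f(tk) * bk for tk, bk in zip(t, beta)]
values_at_support = [bary_eval(tk, t, alpha, beta) for tk in t]
```

With the interpolatory choice of $\alpha_k$, the values at the support points reproduce $f(t_k)$, whatever nonzero weights are used.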

2.1. Representing rational functions of nondiagonal type. Functions $r$
expressed in the barycentric form (2.1) range precisely over the set of all rational
functions of (not necessarily exact) type $(n, n)$. When one requires rational functions
of type $(m, n)$ with $m \ne n$, additional steps are needed to enforce the type.
The approach we have followed, which we shall now describe, is a linear algebraic
one based on previous work by Berrut and Mittelmann [9], where we make use of
Vandermonde matrices to impose certain conditions that limit the numerator or de-
nominator degree. An alternative might be to avoid such matrices and constrain the
barycentric representation more directly to have a certain number of poles or zeros at
z = ∞. This is a matter for future research.
To examine the situation, we first suppose $m < n$ and convert $r$ into the conventional
polynomial quotient representation

\[ r(z) = \frac{\omega_t(z)N(z)}{\omega_t(z)D(z)} = \frac{\prod_{k=0}^{n}(z - t_k) \sum_{k=0}^{n} \frac{\alpha_k}{z - t_k}}{\prod_{k=0}^{n}(z - t_k) \sum_{k=0}^{n} \frac{\beta_k}{z - t_k}} =: \frac{p(z)}{q(z)}. \tag{2.3} \]

The numerator $p$ is a polynomial of degree at most $n$. Further, it can be seen (either
via direct computation or from [9, eq. (1)]) that $p$ is of degree at most $m$ ($< n$) if and only
if the vector $\alpha = [\alpha_0, \ldots, \alpha_n]^T$ lies in the null space of the
(transposed) Vandermonde matrix

\[ V_m = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ t_0 & t_1 & \cdots & t_n \\ \vdots & \vdots & & \vdots \\ t_0^{n-1-m} & t_1^{n-1-m} & \cdots & t_n^{n-1-m} \end{bmatrix}. \tag{2.4} \]

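That is, to enforce $r \in \mathcal{R}_{m,n}$ with $m < n$, we require $\alpha \in \operatorname{span}(P_m)$, where
$P_m \in \mathbb{R}^{(n+1)\times(m+1)}$ has orthonormal columns, obtained by taking the full QR factorization

\[ V_m^T = \begin{bmatrix} P_m^\perp & P_m \end{bmatrix} \begin{bmatrix} R_m \\ 0 \end{bmatrix}, \]

where $P_m^\perp \in \mathbb{R}^{(n+1)\times(n-m)}$, $R_m \in \mathbb{R}^{(n-m)\times(n-m)}$. Note that
$R_m$ is nonsingular if the support points $\{t_k\}$ are distinct.
Similarly, for $m > n$, we need to take $m + 1$ terms in (2.1), that is,
$r(z) = \sum_{k=0}^{m} \alpha_k (z - t_k)^{-1} \big/ \sum_{k=0}^{m} \beta_k (z - t_k)^{-1}$, and force
$\beta \in \operatorname{span}(P_n)$, where $\operatorname{span}(P_n)$ is the null space of the matrix

\[ V_n = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ t_0 & t_1 & \cdots & t_m \\ \vdots & \vdots & & \vdots \\ t_0^{m-1-n} & t_1^{m-1-n} & \cdots & t_m^{m-1-n} \end{bmatrix}, \tag{2.5} \]

obtained by the QR factorization

\[ V_n^T = \begin{bmatrix} P_n^\perp & P_n \end{bmatrix} \begin{bmatrix} R_n \\ 0 \end{bmatrix}, \]

where $P_n^\perp \in \mathbb{R}^{(m+1)\times(m-n)}$, $R_n \in \mathbb{R}^{(m-n)\times(m-n)}$.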
In Section 4.4 we describe how to use the matrices Pm , Pn in specific situations.
Since these matrices are obtained via Vm , Vn in (2.4)–(2.5) and real-valued Vander-
monde matrices are usually highly ill-conditioned [4, 5, 48], care is needed when com-
puting their null spaces, as extracting the orthogonal factors in QR (or SVD) is
susceptible to numerical errors. Berrut and Mittelmann [9] suggest a careful elimi-
nation process to remedy this (for a slightly different problem). Here, in view of the
Krylov-type structure of the matrices $V_m^T$ and $V_n^T$, we propose the following simpler
approach, based on an Arnoldi-style orthogonalization:

1. Let $Q = [1, \ldots, 1]^T$ when $m > n$, and $Q = [f(t_0), \ldots, f(t_n)]^T$ when $m < n$,
and normalize to have Euclidean norm 1.
2. Let $q$ be the last column of $Q$. Take the projection of $\operatorname{diag}(t_0, \ldots, t_{\max(m,n)})\,q$
onto the orthogonal complement of $Q$, normalize, and append it to the right
of $Q$. Repeat this $|m - n|$ times to obtain $Q \in \mathbb{C}^{(\max(m,n)+1)\times(|m-n|)}$. In
MATLAB, this is q = Q(:,end); q = diag(t)*q; for i = 1:size(Q,2),
q = q-Q(:,i)*(Q(:,i)'*q); end, q = q/norm(q); Q = [Q,q];.
3. Take the orthogonal complement $Q^\perp$ of $Q$ via computing the QR factorization
of $Q$. $Q^\perp$ is the desired matrix, $P_m$ or $P_n$.

Note that the matrix Q in the final step is well conditioned (κ2 (Q) = 1 in exact
arithmetic), so the final QR factorization is a stable computation.
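The three steps above can be sketched in plain Python as follows, for the case $m > n$ with a starting vector of ones. This is an illustrative sketch under stated assumptions, not the Chebfun implementation: the helper names are hypothetical, the number of Krylov columns is taken to be $|m - n|$ (the stated size of $Q$), and the final QR factorization of Step 3 is replaced by Gram-Schmidt applied to the unit vectors, which yields an orthonormal basis of the same orthogonal complement.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_out(v, basis):
    """Remove from v its components along the orthonormal vectors in basis (two passes)."""
    for _ in range(2):
        for b in basis:
            c = dot(b, v)
            v = [vi - c * bi for vi, bi in zip(v, b)]
    return v

def normalize(v):
    nrm = math.sqrt(dot(v, v))
    return [vi / nrm for vi in v]

def krylov_columns(t, start, ncols):
    """Steps 1-2: orthonormal basis of span{start, T start, ..., T^(ncols-1) start}, T = diag(t)."""
    Q = [normalize(start)]
    while len(Q) < ncols:
        v = [tk * qk for tk, qk in zip(t, Q[-1])]
        Q.append(normalize(project_out(v, Q)))
    return Q

def orthogonal_complement(Q, dim, tol=1e-8):
    """Step 3 stand-in: orthonormal basis of the complement of span(Q) in R^dim."""
    P = []
    for i in range(dim):
        e = [1.0 if j == i else 0.0 for j in range(dim)]
        v = project_out(e, Q + P)
        if math.sqrt(dot(v, v)) > tol:
            P.append(normalize(v))
        if len(P) == dim - len(Q):
            break
    return P

m, n = 5, 2                                  # illustrative type with m > n
t = [-1.0, -0.6, -0.1, 0.2, 0.7, 1.0]        # m + 1 illustrative support points
Q = krylov_columns(t, [1.0] * len(t), m - n)
P_n = orthogonal_complement(Q, len(t))       # columns span the null space of V_n in (2.5)
```

Each column of `P_n` is orthogonal to the vectors $[t_0^i, \ldots, t_m^i]$ for $i = 0, \ldots, m-1-n$, which is the defining property of the null space of $V_n$.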

2.2. Why does the barycentric representation help? The choice of the
support points {tk } is very important numerically, and indeed it is the flexibility
of where to place these points that is the source of the power of barycentric repre-
sentations. If the points are well chosen, the basis functions 1/(x − tk ) lead to a
representation of r that is much better conditioned (often exponentially better) than
the conventional representation as a ratio of polynomials. We motivate and explain
our adaptive choice of {tk } for the Remez algorithm in Sections 4.3 and 4.5. The
analogous choices for AAA-Lawson and DC are discussed in Sections 8.5 and 9.2.
To understand why a barycentric representation is preferable for rational approx-
imation, we first consider the standard quotient representation p/q. It is well known
that a polynomial will vary in size by exponentially large factors over an interval un-
less its roots are suitably distributed (approximating a minimal-energy configuration).
If p/q is a rational approximation, however, the zeros of p and q will be positioned by
approximation considerations, and if f has singularities or near-singularities they will
be clustered near those points. In the clustering region, p and q will be exponentially
smaller than in other parts of the interval and will lose much or all of their relative
accuracy. Since the quotient p/q depends on that relative accuracy, its accuracy too
will be lost.
Fig. 2.1: Linear (top) and semilogy (bottom) plots of $q$ and $|D|$ over $[-1, 1]$ in $r^* = p/q = N/D$, the
best rational approximation for $|x|$ of type $(m, n) = (20, 20)$. Here $p$, $q$ are the polynomials
in the classical quotient representation (1.1), and $D$ is the denominator in the barycentric
representation (2.1). The dots are the equioscillation points $\{x_\ell\}$, while the set of support
points $\{t_k\}$ consists of every other point in $\{x_\ell\}$.

A barycentric quotient $N/D$, by contrast, is composed of terms that vary in
size just algebraically across the interval, not exponentially, so this effect does not
arise. If the support points are suitably clustered, $N$ and $D$ may have approximately
uniform size across the interval (away from their poles, which cancel in the quotient),
as illustrated in Figure 2.1.

2.3. Numerical stability of evaluation. Regarding the evaluation of $r$ in the
barycentric representation, Higham's analysis in [34, p. 551] (presented for barycentric
polynomial interpolation, but equally valid for (2.1)) shows that evaluating $r(x)$ is
backward stable in the sense that the computed value $\hat{r}(x)$ satisfies

\[ \hat{r}(x) = \sum_{k=0}^{n} \frac{\alpha_k (1 + \epsilon_{\alpha_k})}{x - t_k} \Bigg/ \sum_{k=0}^{n} \frac{\beta_k (1 + \epsilon_{\beta_k})}{x - t_k}, \tag{2.6} \]

where $\epsilon_{\alpha_k}, \epsilon_{\beta_k}$ denote quantities of size $O(u)$, or more precisely, bounded by
$(1 + u)^{3n+4}$. In other words, $\hat{r}(x)$ is an exact evaluation of (2.1) for slightly perturbed
$\{\alpha_k\}$, $\{\beta_k\}$. Note that when $r$ represents a polynomial (as assumed in [34]), (2.6)
does not imply backward stability. However, as a rational function for which we allow
for backward errors in the denominator, (2.6) does imply backward stability.
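For the forward error, we can adapt the analysis of [14, Proposition 2.4.3]. Assume
that the computed coefficients $\hat{\alpha}$, $\hat{\beta}$ are obtained through a backward stable process,

\[ \hat{\alpha}_k = \alpha_k (1 + \delta_{\alpha_k}),\ \ \delta_{\alpha_k} = O(\kappa_\alpha u), \qquad \hat{\beta}_k = \beta_k (1 + \delta_{\beta_k}),\ \ \delta_{\beta_k} = O(\kappa_\beta u), \qquad k = 0, \ldots, n, \]

where $\kappa_\alpha$ and $\kappa_\beta$ are condition numbers associated with the matrices used to determine
$\hat{\alpha}$ and $\hat{\beta}$. Then, if $x$ (the evaluation point) and $\{t_k\}$ are considered to be floating point
numbers, we have

Lemma 2.1. The relative forward error for the computed value $\hat{r}(x)$ of (2.1)
satisfies

\[ \left| \frac{r(x) - \hat{r}(x)}{r(x)} \right| \le u\,(n + 3 + O(\kappa_\alpha)) \frac{\sum_{k=0}^{n} \left| \frac{\alpha_k}{x - t_k} \right|}{\left| \sum_{k=0}^{n} \frac{\alpha_k}{x - t_k} \right|} + u\,(n + 2 + O(\kappa_\beta)) \frac{\sum_{k=0}^{n} \left| \frac{\beta_k}{x - t_k} \right|}{\left| \sum_{k=0}^{n} \frac{\beta_k}{x - t_k} \right|} + O(u^2). \tag{2.7} \]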

Proof. This follows from [14, Prop. 2.4.3].


If the functions $|D(x)|$ and $|N(x)|$ appearing in the denominators of the right-hand
side of (2.7) do not become too small over $[a, b]$, then we can expect the evaluation of
$\hat{r}$ to be accurate. Note that $|D(x)|$ is precisely the quantity examined in Section 2.2,
where we argued that it takes values $O(1)$ or larger across the interval. Further, since
$r(x) \approx f(x)$ implies $|N(x)| \approx |D(x) f(x)|$, we see that $|N(x)|$ is not too small unless
$|f(x)|$ is small. Put together, we expect the barycentric evaluation phase to be stable
unless $|f(x)|$ (and hence $|r(x)|$) is small. Note that since (2.7) measures the relative
error, we usually cannot expect it to be $O(u)$ when $|r(x)| \approx |f(x)| \ll 1$.
3. The rational Remez algorithm. Initially developed by Werner [64, 65]
and Maehly [38], the rational Remez algorithm extends the ideas of computing best
polynomial approximations due to Remez [53, 54]. It can be summarized as follows:
Step 1. Set $k = 1$ and choose $m + n + 2$ distinct reference points

\[ a \le x_0^{(k)} < \cdots < x_{m+n+1}^{(k)} \le b. \]

Step 2. Determine the levelled error $\lambda_k \in \mathbb{R}$ (positive or negative) and $r_k \in \mathcal{R}_{m,n}$
such that $r_k$ has no pole on $[a, b]$ and

\[ f(x_\ell^{(k)}) - r_k(x_\ell^{(k)}) = (-1)^{\ell+1} \lambda_k, \qquad \ell = 0, \ldots, m + n + 1. \tag{3.1} \]

Step 3. Choose as the next reference $m + n + 2$ local maxima $\{x_\ell^{(k+1)}\}$ of $|f - r_k|$ such
that

\[ s(-1)^{\ell} \left( f(x_\ell^{(k+1)}) - r_k(x_\ell^{(k+1)}) \right) \ge |\lambda_k|, \qquad \ell = 0, \ldots, m + n + 1, \tag{3.2} \]

with $s \in \{\pm 1\}$ and such that for at least one $\ell \in \{0, \ldots, m + n + 1\}$, the left-hand
side of (3.2) equals $\|f - r_k\|_\infty$. If $r_k$ has converged to within a given
threshold $\varepsilon_t > 0$ (i.e., $(\|f - r_k\|_\infty - \lambda_k)/\|f - r_k\|_\infty \le \varepsilon_t$ [50, eq. (10.8)])
return $r_k$, else go to Step 2 with $k \leftarrow k + 1$.
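The two tests in Step 3 can be written down directly. The sketch below is only an illustration with hypothetical function names: it checks condition (3.2) for a list of error values $f(x_\ell) - r_k(x_\ell)$ at a candidate reference, and evaluates the stopping criterion. It uses $|\lambda_k|$ in the stopping criterion, which is an assumption made here because $\lambda_k$ may be negative.

```python
def exchange_condition_holds(errors, lam):
    """Check (3.2): s * (-1)**l * errors[l] >= |lam| for all l, for s = +1 or s = -1."""
    for s in (1, -1):
        if all(s * (-1) ** l * e >= abs(lam) for l, e in enumerate(errors)):
            return True
    return False

def has_converged(err_norm, lam, tol):
    """Relative gap between the error norm and the levelled error (|lam| is an assumption)."""
    return (err_norm - abs(lam)) / err_norm <= tol

# Illustrative values: alternating errors that all reach the level 0.1.
alternating = exchange_condition_holds([-0.11, 0.10, -0.12, 0.10], 0.1)
not_alternating = exchange_condition_holds([0.11, 0.10, -0.12, 0.10], 0.1)
```

In a full implementation the error values would come from locating the local maxima of $|f - r_k|$ on $[a, b]$, which is the subject of Section 6 and is not attempted here.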
If Step 2 is always successful, then convergence to the best approximation is
assured [63, Theorem 9.14]. It might happen that Step 2 fails, namely when all rational
solutions satisfying the equations (3.1) have poles in [a, b]. If the best approximation
is non-degenerate and the initial reference set is already sufficiently close to optimal,
then the algorithm will converge [11, §V.6.B]. To our knowledge, there is no effective
way in general to determine when degeneracy is the cause of failure.
We note that the rational Remez algorithm can also be adapted to work in the
case of weighted best rational approximation. An early account of this is given in [22].
Given a positive weight function $w \in C([a, b])$, the goal is to find $r^* \in \mathcal{R}_{m,n}$ such
that the weighted error $\|f - r^*\|_{w,\infty} = \max_{x \in [a,b]} |w(x)(f(x) - r^*(x))|$ is minimal.
Equations (3.1) and (3.2) get modified to

\[ w(x_\ell^{(k)}) \left( f(x_\ell^{(k)}) - r_k(x_\ell^{(k)}) \right) = (-1)^{\ell+1} \lambda_k, \qquad \ell = 0, \ldots, m + n + 1 \]

and

\[ s(-1)^{\ell} w(x_\ell^{(k+1)}) \left( f(x_\ell^{(k+1)}) - r_k(x_\ell^{(k+1)}) \right) \ge |\lambda_k|, \qquad \ell = 0, \ldots, m + n + 1, \]

while the norm computations in Step 3 are taken with respect to w. Notice that the
ability to work with the weighted error immediately allows us to compute the best
approximation in the relative sense, by taking w(x) = 1/|f (x)|, assuming that f is
nonzero over [a, b].
We discuss each step of the rational Remez algorithm in the following sections.
We first address Step 2, as this is the core part where the barycentric representation
is used. We then discuss initialization (Step 1) in Section 5, and finding the next
reference set (Step 3) in Section 6. Our focus is on the unweighted setting, but we
comment on how our ideas can be extended to the weighted case as well.
4. Computing the trial approximation. For notational simplicity, in this
section we drop the index k referring to the iteration number, the analysis being valid
for any iteration of the rational Remez algorithm. We begin with the case m = n.
4.1. Linear algebra in a polynomial basis. We first derive the Remez algorithm
in an (arbitrary) polynomial basis. At each iteration, we search for $r = p/q \in \mathcal{R}_{n,n}$,
$p, q \in \mathbb{R}_n[x]$ such that

\[ f(x_\ell) - r(x_\ell) = (-1)^{\ell+1} \lambda, \qquad \ell = 0, \ldots, 2n + 1 \tag{4.1} \]

and assume that we represent $p$ and $q$ using a basis of polynomials $\varphi_0, \ldots, \varphi_n$ such
that $\operatorname{span}_{\mathbb{R}}(\varphi_i)_{0 \le i \le n} = \mathbb{R}_n[x]$:

\[ p(x) = \sum_{k=0}^{n} c_{p,k}\, \varphi_k(x), \qquad q(x) = \sum_{k=0}^{n} c_{q,k}\, \varphi_k(x). \]

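The linearized version of (4.1) is then given by

\[ p(x_\ell) = q(x_\ell) \left( f(x_\ell) - (-1)^{\ell+1} \lambda \right), \]

which, in matrix form, becomes

\[ \Phi_x c_p = \left( \begin{bmatrix} f(x_0) & & & \\ & f(x_1) & & \\ & & \ddots & \\ & & & f(x_{2n+1}) \end{bmatrix} - \lambda \begin{bmatrix} -1 & & & \\ & 1 & & \\ & & -1 & \\ & & & \ddots \end{bmatrix} \right) \Phi_x c_q, \tag{4.2} \]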

where $\Phi_x \in \mathbb{R}^{(2n+2)\times(n+1)}$ is the basis matrix $(\Phi_x)_{\ell,k} = \varphi_k(x_\ell)$, $0 \le \ell \le 2n + 1$,
$0 \le k \le n$, and $c_p = [c_{p,0}, c_{p,1}, \ldots, c_{p,n}]^T$ and $c_q = [c_{q,0}, c_{q,1}, \ldots, c_{q,n}]^T$ are the coefficient
vectors of $p$ and $q$. Note that in this paper, vector and matrix indices always start
at zero. Up to multiplying both sides on the left by a nonsingular diagonal matrix
$D = \operatorname{diag}(d_0, \ldots, d_{2n+1})$, (4.2) can also be written as a generalized eigenvalue problem

\[ \begin{bmatrix} D\Phi_x & -F D\Phi_x \end{bmatrix} \begin{bmatrix} c_p \\ c_q \end{bmatrix} = \lambda \begin{bmatrix} 0 & -S D\Phi_x \end{bmatrix} \begin{bmatrix} c_p \\ c_q \end{bmatrix}, \tag{4.3} \]

with $F = \operatorname{diag}(f(x_0), \ldots, f(x_{2n+1}))$ and $S = \operatorname{diag}((-1)^{k+1})$.

As described in Powell [50, Ch. 10.2], solving (4.3) is usually done by eliminating
$c_p$. His presentation considers the monomial basis, but the approach is valid for any
basis of $\mathbb{R}_n[x]$. By taking the full QR decomposition of $D\Phi_x$, we get

\[ D\Phi_x = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R. \]

Since $D\Phi_x$ is of full rank, we have $Q_1, Q_2 \in \mathbb{R}^{(2n+2)\times(n+1)}$ and $Q_2^T Q_1 = 0$. By multiplying
(4.3) on the left by $Q^T = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T$, we obtain a block triangular eigenvalue
problem with lower-right $(n + 1) \times (n + 1)$ block

\[ Q_2^T F Q_1 R c_q = \lambda\, Q_2^T S Q_1 R c_q. \tag{4.4} \]

(The top-left $(n+1)\times(n+1)$ block has all eigenvalues at infinity, and is thus irrelevant.)
In terms of polynomials, $(Q_1)_{\ell,k} = d_\ell\, \psi_k(x_\ell)$, $0 \le k \le n$, $0 \le \ell \le 2n + 1$, where
$(\psi_k)_{0 \le k \le n}$ is a family of orthonormal polynomials with respect to the discrete inner
product $\langle f, g \rangle_x = \sum_{k=0}^{2n+1} d_k^2 f(x_k) g(x_k)$. Moreover, if $(\varphi_k)_{0 \le k \le n}$ is a degree-graded
basis with $\deg \varphi_k = k$, then we have $\deg \psi_k = k$, $0 \le k \le n$.
Let $\omega_x$ be the node polynomial associated with the reference nodes $x_0, \ldots, x_{2n+1}$,
and $\Omega_x = \operatorname{diag}(1/\omega_x'(x_0), \ldots, 1/\omega_x'(x_{2n+1}))$. We have [50, p. 114]

\[ V_x^T \Omega_x V_x = 0, \tag{4.5} \]

where $V_x \in \mathbb{R}^{(2n+2)\times(n+1)}$ is the Vandermonde matrix associated with $x_0, \ldots, x_{2n+1}$,
that is, $(V_x)_{i,j} = x_i^j$. Indeed,

\[ \left( V_x^T \Omega_x V_x \right)_{i,j} = \sum_{\ell=0}^{2n+1} \frac{x_\ell^{i+j}}{\omega_x'(x_\ell)} = (x^{i+j})[x_0, \ldots, x_{2n+1}] = 0, \qquad i, j \in \{0, \ldots, n\}, \]

the divided differences of order $2n + 1$ of the function $x^{i+j}$ at the $\{x_\ell\}$ nodes, hence
0 if $i + j \le 2n$.
By using the appropriate change of basis matrix in (4.5), we have

\[ \Phi_x^T \Omega_x \Phi_x = 0. \tag{4.6} \]
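Identity (4.5) can be checked in exact rational arithmetic. The following Python sketch is an illustration with hypothetical helper names and an arbitrary choice of nodes: it forms the entries $\sum_\ell x_\ell^{i+j}/\omega_x'(x_\ell)$ for $n = 2$ and confirms that they vanish, while the same sum with exponent $2n + 1$ equals 1 (the divided difference of order $2n+1$ of $x^{2n+1}$).

```python
from fractions import Fraction

def node_poly_derivative(x, l):
    """omega_x'(x_l) = prod over j != l of (x_l - x_j)."""
    out = Fraction(1)
    for j, xj in enumerate(x):
        if j != l:
            out *= x[l] - xj
    return out

def weighted_power_sum(x, power):
    """sum_l x_l**power / omega_x'(x_l): the divided difference of x**power on the nodes."""
    return sum(x[l] ** power / node_poly_derivative(x, l) for l in range(len(x)))

n = 2
x = [Fraction(k, 7) for k in (-7, -5, -2, 1, 3, 7)]   # 2n + 2 distinct illustrative nodes
gram = [[weighted_power_sum(x, i + j) for j in range(n + 1)] for i in range(n + 1)]
```

Because `Fraction` arithmetic is exact, every entry of `gram` is exactly zero rather than zero up to rounding.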

Now, by multiplying (4.3) on the left by $\Phi_x^T \Omega_x D^{-1}$ and using (4.6), we can eliminate
the $c_p$ term to obtain

\[ \Phi_x^T \Omega_x F \Phi_x c_q = \lambda\, \Phi_x^T \Omega_x S \Phi_x c_q. \tag{4.7} \]

Equation (4.7) is the extension of [50, Eq. (10.13)] from the monomial basis to
$\varphi_0, \ldots, \varphi_n$. Moreover, we have:
Lemma 4.1. The matrix $\Phi_x^T \Omega_x S \Phi_x$ is symmetric positive definite.
Proof. Since $\Omega_x S = |\Omega_x|$, it means that $\Omega_x S$ is symmetric positive definite, and
the conclusion follows. See also [50, Theorem 10.2].
Since $\Phi_x^T \Omega_x F \Phi_x$ is also symmetric, it follows that all eigenvalues of (4.7) are real
and at most one eigenvector $c_q$ corresponds to a pole-free solution $r$ (i.e., $q$ has no
root on $[a, b]$). To see this, suppose to the contrary that there exists another pole-free
solution $r'$. Then, from (4.1), it follows that either $r(x_k) - r'(x_k)$ are all zero or they
alternate in sign at least $2n + 1$ times. In both cases, $r - r' \in \mathcal{R}_{2n,2n}$ has at least
$2n + 1$ zeros inside $[a, b]$, leading to $r = r'$.

We can in fact transform (4.4) into a symmetric eigenvalue problem (an observation
which seems to date to [49]) by considering the choice $D = |\Omega_x|^{1/2}$, which leads
to $Q_2 = S Q_1$ in view of (4.6). The system becomes $Q_1^T S F Q_1 R c_q = \lambda\, Q_1^T S^2 Q_1 R c_q$,
which, by the change of variables $y = R c_q$, gives

\[ Q_1^T S F Q_1 y = \lambda y. \tag{4.8} \]

To get $c_p$, from (4.2), we have $|\Omega_x|^{1/2} \Phi_x c_p = (F - \lambda S) |\Omega_x|^{1/2} \Phi_x c_q$, or equivalently
(by multiplication on the left by $Q_1^T$),

\[ R c_p = Q_1^T (F - \lambda S) |\Omega_x|^{1/2} \Phi_x c_q = Q_1^T F Q_1 y. \]

The vectors $R c_p$ and $R c_q$ can be seen as vectors of coefficients of the numerator
and denominator of $r$ in the orthogonal basis $\psi_0, \ldots, \psi_n$. The (scaled) values of
the denominator at each $x_k$ corresponding to an eigenvector $y$ can be recovered by
computing

\[ |\Omega_x|^{1/2} \Phi_x c_q = Q_1 y. \tag{4.9} \]

From this we can confirm the uniqueness of the pole-free solution: since the eigenvectors
are orthogonal, there is at most one generating a vector of denominator values of
the same sign, making it the only pole-free solution candidate.
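4.2. Linear algebra in a barycentric basis. An equivalent analysis is valid
if we take $r$ in the barycentric form (2.1). Namely, (4.1) becomes

\[ C\alpha = \left( \begin{bmatrix} f(x_0) & & & \\ & f(x_1) & & \\ & & \ddots & \\ & & & f(x_{2n+1}) \end{bmatrix} - \lambda \begin{bmatrix} -1 & & & \\ & 1 & & \\ & & -1 & \\ & & & \ddots \end{bmatrix} \right) C\beta, \tag{4.10} \]

where $C$ is now a $(2n + 2) \times (n + 1)$ Cauchy matrix with entries $C_{\ell,k} = 1/(x_\ell - t_k)$
(we assume for the moment $\{x_\ell\} \cap \{t_k\} = \emptyset$) and $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_n]^T$ and
$\beta = [\beta_0, \beta_1, \ldots, \beta_n]^T$ are the column vectors of coefficients $\{\alpha_k\}$ and $\{\beta_k\}$. Again, this can
be transformed into a generalized eigenvalue problem

\[ \begin{bmatrix} C & -FC \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \lambda \begin{bmatrix} 0 & -SC \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}. \tag{4.11} \]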

To reduce (4.11) to a symmetric eigenvalue problem as in (4.8), we form a link between
the monomial and barycentric representations in terms of the basis matrices $V_x$ and
$C$. We have:
Lemma 4.2. Let $V_x$, $\omega_t$ be as defined above, and $V_t \in \mathbb{R}^{(n+1)\times(n+1)}$ be the Vandermonde
matrix corresponding to the support points, i.e., $(V_t)_{i,j} = t_i^j$. Then

\[ \operatorname{diag}\left( \frac{1}{\omega_t(x_0)}, \ldots, \frac{1}{\omega_t(x_{2n+1})} \right) V_x = C \operatorname{diag}\left( \frac{1}{\omega_t'(t_0)}, \ldots, \frac{1}{\omega_t'(t_n)} \right) V_t. \]

Proof. If we look at an arbitrary element of the right-hand side matrix, we have

\[ \left( C \operatorname{diag}\left( \frac{1}{\omega_t'(t_0)}, \ldots, \frac{1}{\omega_t'(t_n)} \right) V_t \right)_{j,\ell} = \sum_{k=0}^{n} \frac{1}{(x_j - t_k)\, \omega_t'(t_k)}\, t_k^{\ell} = \frac{x_j^{\ell}}{\omega_t(x_j)}, \]

where the second equality is a consequence of the Lagrange interpolation formula.
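Lemma 4.2, and the identity $C^T \Delta C = 0$ with $\Delta = \operatorname{diag}(\omega_t(x_\ell)^2)\,\Omega_x$ that is derived from it, can both be verified in exact arithmetic for a small case. The sketch below uses $n = 1$ with reference and support points chosen arbitrarily for illustration; the helper names are hypothetical.

```python
from fractions import Fraction

def node_poly(nodes, z):
    """omega(z) = prod_k (z - nodes[k])."""
    out = Fraction(1)
    for c in nodes:
        out *= z - c
    return out

def node_poly_derivative(nodes, k):
    """omega'(nodes[k]) = prod over j != k of (nodes[k] - nodes[j])."""
    out = Fraction(1)
    for j, c in enumerate(nodes):
        if j != k:
            out *= nodes[k] - c
    return out

n = 1
x = [Fraction(-1), Fraction(-1, 3), Fraction(1, 2), Fraction(1)]   # 2n + 2 reference points
t = [Fraction(-2, 3), Fraction(3, 4)]                              # n + 1 support points

# Lemma 4.2, entry (j, l): x_j^l / omega_t(x_j) on the left,
# sum_k t_k^l / ((x_j - t_k) * omega_t'(t_k)) on the right.
lhs = [[xj ** l / node_poly(t, xj) for l in range(n + 1)] for xj in x]
rhs = [[sum(t[k] ** l / ((xj - t[k]) * node_poly_derivative(t, k)) for k in range(n + 1))
        for l in range(n + 1)] for xj in x]

# Entry (a, b) of C^T Delta C, with Delta = diag(omega_t(x_l)^2 / omega_x'(x_l)).
ct_delta_c = [[sum(node_poly(t, x[l]) ** 2 / node_poly_derivative(x, l)
                   / ((x[l] - t[a]) * (x[l] - t[b])) for l in range(len(x)))
               for b in range(n + 1)] for a in range(n + 1)]
```

Both comparisons hold exactly, since each summand in `ct_delta_c` is a polynomial of degree at most $2n$ in $x_\ell$ divided by $\omega_x'(x_\ell)$, so the sum is a divided difference of order $2n + 1$ of a lower-degree polynomial.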


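In place of $\Omega_x$ we will use the following matrix $\Delta$:
Lemma 4.3. If $\Delta = \operatorname{diag}\left( \omega_t(x_0)^2, \ldots, \omega_t(x_{2n+1})^2 \right) \Omega_x$, then $C^T \Delta C = 0$.
Proof. We apply Lemma 4.2 and use the fact that $V_x^T \Omega_x V_x = 0$. Namely,

\[ C^T \Delta C = \operatorname{diag}(\omega_t'(t_0), \ldots, \omega_t'(t_n))\, V_t^{-T} V_x^T \Omega_x V_x V_t^{-1} \operatorname{diag}(\omega_t'(t_0), \ldots, \omega_t'(t_n)) = 0. \]

We now take the full QR decomposition of $|\Delta|^{1/2} C = (S\Delta)^{1/2} C$. We have

\[ |\Delta|^{1/2} C = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R. \]

Based on Lemma 4.3, we can again take $Q_2 = S Q_1$. From (4.11) we get

\[ \begin{bmatrix} |\Delta|^{1/2} C & -F |\Delta|^{1/2} C \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \lambda \begin{bmatrix} 0 & -S |\Delta|^{1/2} C \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix}. \]

Multiplying this expression on the left by $\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T$ gives a block triangular matrix
pencil, whose $(n + 1) \times (n + 1)$ lower-right corner is the barycentric analogue of (4.4):
$Q_2^T F Q_1 R\beta = \lambda\, Q_2^T S Q_1 R\beta$. After substituting $Q_2^T = Q_1^T S$, we get

\[ Q_1^T (SF) Q_1 R\beta = \lambda\, Q_1^T S^2 Q_1 R\beta, \tag{4.12} \]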

which, by the change of variable $y = R\beta$, becomes a standard symmetric eigenvalue
problem in $\lambda$ with eigenvector $y$ (recall that $S$, $F$ are diagonal):

\[ Q_1^T (SF) Q_1 y = \lambda y. \tag{4.13} \]

Hence, computing its eigenvalues is a well-conditioned operation. The values of the
denominator of the rational interpolant corresponding to each eigenvector $y$ can be
recovered by computing

\[ \operatorname{diag}(\omega_t(x_0), \ldots, \omega_t(x_{2n+1}))\, C\beta = \operatorname{diag}(\omega_t(x_0), \ldots, \omega_t(x_{2n+1}))\, |\Delta|^{-1/2} Q_1 y. \tag{4.14} \]

As in the polynomial case, there is at most one solution such that $q(x) = D(x)\,\omega_t(x)$
has no root in $[a, b]$; indeed, (4.9) and (4.14) represent the values of $q(x_\ell)$ for $r = p/q$
and $x_\ell$ satisfying equation (4.1). We use this sign test involving (4.14) to determine
the levelled error $\lambda$ that gives a pole-free $r$ in Step 2 of our rational Remez algorithm.
The appropriate $\beta$ is then taken by solving $R\beta = y$. From (4.10), we have

\[ |\Delta|^{1/2} C\alpha = (F - \lambda S)\, |\Delta|^{1/2} C\beta, \]

or equivalently (by multiplication on the left by $Q_1^T$)

\[ R\alpha = Q_1^T (F - \lambda S)\, |\Delta|^{1/2} C\beta = Q_1^T (F - \lambda S) Q_1 y = Q_1^T F Q_1 y, \tag{4.15} \]

which allows us to recover $\alpha$ (and thus $r$).
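To make the chain (4.10)-(4.15) concrete, here is a small sketch for type $(1, 1)$, where $Q_1^T (SF) Q_1$ is a $2 \times 2$ symmetric matrix whose eigenpairs have closed forms. It is an illustration only, not the Chebfun implementation: the function name, the reference and support points, and the use of Gram-Schmidt in place of a Householder QR are assumptions made to keep the example dependency-free, and the degenerate case of a repeated eigenvalue is not handled.

```python
import math

def levelled_solve_type11(f, x, t):
    """Candidates (lam, alpha, beta, same_sign) for type (1,1), following (4.10)-(4.15)."""
    N = len(x)                                            # 2n + 2 = 4 reference points
    s = [(-1.0) ** (l + 1) for l in range(N)]             # diagonal of S
    fx = [f(xl) for xl in x]                              # diagonal of F
    om_t = [(xl - t[0]) * (xl - t[1]) for xl in x]        # omega_t(x_l)
    om_x_prime = [math.prod(xl - xj for j, xj in enumerate(x) if j != l)
                  for l, xl in enumerate(x)]
    d = [abs(om_t[l]) / math.sqrt(abs(om_x_prime[l])) for l in range(N)]  # |Delta|^(1/2)
    cols = [[d[l] / (x[l] - t[k]) for l in range(N)] for k in range(2)]   # |Delta|^(1/2) C

    # Thin QR by Gram-Schmidt: cols = Q1 R with R = [[r00, r01], [0, r11]].
    r00 = math.sqrt(sum(c * c for c in cols[0]))
    q0 = [c / r00 for c in cols[0]]
    r01 = sum(a * b for a, b in zip(q0, cols[1]))
    resid = [b - r01 * a for a, b in zip(q0, cols[1])]
    r11 = math.sqrt(sum(c * c for c in resid))
    q1 = [c / r11 for c in resid]

    def form(u, w, diag):                                 # u^T diag(diag) w
        return sum(ui * di * wi for ui, di, wi in zip(u, diag, w))

    sf = [sl * fl for sl, fl in zip(s, fx)]
    a, b, c = form(q0, q0, sf), form(q0, q1, sf), form(q1, q1, sf)   # Q1^T (S F) Q1
    mean, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)

    candidates = []
    for lam in (mean - rad, mean + rad):                  # eigenvalues of (4.13)
        # Eigenvector of [[a, b], [b, c]]: take the better-scaled of two valid formulas.
        y = max([(b, lam - a), (lam - c, b)], key=lambda u: math.hypot(*u))
        qy = [q0[l] * y[0] + q1[l] * y[1] for l in range(N)]          # Q1 y
        g = (form(q0, qy, fx), form(q1, qy, fx))                      # Q1^T F Q1 y
        beta1 = y[1] / r11                                            # R beta = y
        beta0 = (y[0] - r01 * beta1) / r00
        alpha1 = g[1] / r11                                           # R alpha = g, (4.15)
        alpha0 = (g[0] - r01 * alpha1) / r00
        alpha, beta = [alpha0, alpha1], [beta0, beta1]
        # Sign test (4.14): values of q = omega_t * D at the reference points.
        q_vals = [om_t[l] * sum(bk / (x[l] - tk) for bk, tk in zip(beta, t))
                  for l in range(N)]
        same_sign = all(qv > 0 for qv in q_vals) or all(qv < 0 for qv in q_vals)
        candidates.append((lam, alpha, beta, same_sign))
    return candidates

x = [-1.0, -0.4, 0.3, 1.0]          # illustrative reference points
t = [-0.7, 0.65]                    # illustrative support points, t_k in (x_2k, x_2k+1)
candidates = levelled_solve_type11(math.exp, x, t)
```

Every candidate satisfies the linearized equations $N(x_\ell) = (f(x_\ell) - (-1)^{\ell+1}\lambda) D(x_\ell)$; the one flagged `same_sign`, if any, is the candidate that the sign test would keep as the pole-free solution.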


Most of the derivations in this section can be carried over to the weighted approximation
setting as well. In particular, the reader can check that the weighted versions
of Equations (4.11) and (4.13) correspond to

\[ \begin{bmatrix} C & -FC \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \lambda \begin{bmatrix} 0 & -S W^{-1} C \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \]

and

\[ Q_1^T (SF) Q_1 y = \lambda\, Q_1^T W^{-1} Q_1 y, \]

where $W = \operatorname{diag}(w(x_0), \ldots, w(x_{2n+1}))$ and all the other quantities are the same as
before. While not leading to a symmetric eigenvalue problem, the symmetric and
symmetric positive definite matrices appearing in the second pencil seem to suggest
that the eigenproblem computations will again correspond to well-conditioned operations.
Our experiments support this statement and we leave it as future work to make
this rigorous. To recover $\alpha$, (4.15) becomes $R\alpha = Q_1^T (F - \lambda S W^{-1}) Q_1 y$.
4.3. Conditioning of the QR factorization. Since the above discussion makes
heavy use of the matrix $Q_1$, it is desirable that computing the (thin) QR factorization
$|\Delta|^{1/2} C = Q_1 R$ is a well-conditioned operation.
Here we examine the conditioning of $Q_1$, the orthogonal factor in the QR factorization
of $|\Delta|^{1/2} C$, as this is the key matrix for constructing (4.12). We use the
fact that the standard Householder QR algorithm is invariant under column scaling,
that is, it computes the same $Q_1$ for both $|\Delta|^{1/2} C$ and $|\Delta|^{1/2} C\Gamma$ for diagonal
$\Gamma$ [33, Ch. 19]. We thus consider

\[ \min_{\Gamma \in \mathcal{D}_{n+1}} \kappa_2(|\Delta|^{1/2} C\Gamma), \tag{4.16} \]

where $\mathcal{D}_{n+1}$ is the set of $(n + 1) \times (n + 1)$ diagonal matrices. We have


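Theorem 4.4. Let $t_k \in (x_{2k}, x_{2k+1})$ for $k = 0, \ldots, n$ and $s_k \in (x_{2k+1}, x_{2k+2})$ for
$k = 0, \ldots, n-1$, $s_n \in (x_{2n+1}, \infty)$, and define $\omega_s(x) = \prod_{k=0}^{n} (x - s_k)$. Then

$$\min_{\Gamma \in \mathcal{D}_{n+1}} \kappa_2\big(|\Delta|^{1/2} C\Gamma\big) \le \max_{\ell} \sqrt{\left|\frac{\omega_s(x_\ell)}{\omega_t(x_\ell)}\right|} \cdot \max_{k} \sqrt{\left|\frac{\omega_t(x_k)}{\omega_s(x_k)}\right|}. \tag{4.17}$$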

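Proof. Let $\{y_j\}$ be a $(2n+2)$-element set such that $y_j \in (x_j, x_{j+1})$, $j = 0, \ldots, 2n$,
$y_{2n+1} > x_{2n+1}$ and let $C_{x,y} \in \mathbb{R}^{(2n+2)\times(2n+2)}$ be the Cauchy matrix with elements
$(C_{x,y})_{j,k} = 1/(x_j - y_k)$. If we consider $D_1 = \mathrm{diag}\big(\sqrt{|\omega_y(x_j)/\omega_x'(x_j)|}\big)$ and
$D_2 = \mathrm{diag}\big(\sqrt{\omega_x(y_j)/\omega_y'(y_j)}\big)$, then the matrix $D_1 C_{x,y} D_2$ is orthogonal. This follows, for
instance, if we examine the elements of its associated Gram matrix $G$ and use divided
differences. Indeed, for an arbitrary element $(G)_{j,k}$ with $j \ne k$, we have

$$-(G)_{j,k} = \sqrt{\frac{\omega_x(y_j)\,\omega_x(y_k)}{\omega_y'(y_j)\,\omega_y'(y_k)}} \sum_{\ell=0}^{2n+1} \frac{\omega_y(x_\ell)}{(x_\ell - y_j)(x_\ell - y_k)\,\omega_x'(x_\ell)}
= \sqrt{\frac{\omega_x(y_j)\,\omega_x(y_k)}{\omega_y'(y_j)\,\omega_y'(y_k)}}\; [x_0, \ldots, x_{2n+1}] \left( \frac{\omega_y(x)}{(x - y_j)(x - y_k)} \right) = 0.$$

Similarly, since $\prod_{j \ne k} (x - y_j) = q(x)(x - y_k) + \omega_y'(y_k)$, with $q \in \mathbb{R}_{2n}[x]$, we have

$$\begin{aligned}
-(G)_{k,k} &= \frac{\omega_x(y_k)}{\omega_y'(y_k)} \sum_{\ell=0}^{2n+1} \frac{\omega_y(x_\ell)}{(x_\ell - y_k)^2\,\omega_x'(x_\ell)}
= \frac{\omega_x(y_k)}{\omega_y'(y_k)}\; [x_0, \ldots, x_{2n+1}] \left( \frac{\prod_{j \ne k} (x - y_j)}{x - y_k} \right) \\
&= \frac{\omega_x(y_k)}{\omega_y'(y_k)}\; [x_0, \ldots, x_{2n+1}] \left( q(x) + \frac{\omega_y'(y_k)}{x - y_k} \right) \\
&= \omega_x(y_k)\; [x_0, \ldots, x_{2n+1}] \left( \frac{1}{x - y_k} \right) = \omega_x(y_k)\, \frac{-1}{\omega_x(y_k)} = -1.
\end{aligned}$$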

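Now, if we take $t_k = y_{2k}$, $s_k = y_{2k+1}$, for $k = 0, \ldots, n$, there exist $D \in \mathcal{D}_{2n+2}$ and
$\Gamma \in \mathcal{D}_{n+1}$ such that $|\Delta|^{1/2} C\Gamma = D D_1 C_{x,y} D_2 I_t$, where $D = \mathrm{diag}\big(\sqrt{|\omega_t(x_j)/\omega_s(x_j)|}\big)$
and $I_t$ is obtained by removing every second column from $I_{2n+2}$. In particular, $\Gamma = I_t^T D_2 I_t$. It follows that

$$\kappa_2\big(|\Delta|^{1/2} C\Gamma\big) \le \kappa_2(D) = \max_{\ell} \sqrt{\left|\frac{\omega_s(x_\ell)}{\omega_t(x_\ell)}\right|} \cdot \max_{k} \sqrt{\left|\frac{\omega_t(x_k)}{\omega_s(x_k)}\right|}.$$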

Let $\Gamma = I_t^T D_2 I_t$ be as in the proof of Theorem 4.4. It turns out that for the choice
$t_k = x_{2k+1} - \varepsilon$, $s_k = x_{2k+1} + \varepsilon$, for $k = 0, \ldots, n$, as $\varepsilon \to 0$, the matrix $|\Delta|^{1/2} C$ has
a finite limit $\tilde{C}$ of full column rank, and similarly $\Gamma$ tends to some diagonal matrix
$\tilde{\Gamma}$ with positive diagonal entries. From Theorem 4.4 and its proof we know that $\tilde{C}\tilde{\Gamma}$
has condition number 1, and, more precisely, orthonormal columns. We thus obtain
an explicit thin QR decomposition of $\tilde{C}$ (by direct calculation):
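Corollary 4.5. In the limit $t_k \nearrow x_{2k+1}$, for $k = 0, \ldots, n$, the matrix $|\Delta|^{1/2} C$
converges to $\tilde{C}$, with entries

$$(\tilde{C})_{j,k} = \begin{cases}
\dfrac{|\omega_t'(t_k)|}{\sqrt{|\omega_x'(t_k)|}} & \text{if } j = 2k+1, \\[2ex]
0 & \text{if } j = 2\ell+1,\ \ell \ne k, \\[1ex]
\dfrac{|\omega_t(x_j)|}{\sqrt{|\omega_x'(x_j)|}} \Big/ (x_j - t_k) & \text{if } j = 2\ell,
\end{cases}$$

and explicit thin QR decomposition $\tilde{C} = Q_1 R$, where

$$(Q_1)_{j,k} = \begin{cases}
1/\sqrt{2} & \text{if } j = 2k+1, \\[1ex]
0 & \text{if } j = 2\ell+1,\ \ell \ne k, \\[1ex]
\sqrt{\left|\dfrac{\omega_x'(t_k)}{2\,\omega_x'(x_j)}\right|}\; \left|\dfrac{\omega_t(x_j)}{\omega_t'(t_k)}\right| \Big/ (x_j - t_k) & \text{if } j = 2\ell,
\end{cases}$$

and

$$R = \sqrt{2}\; \mathrm{diag}\left( \frac{|\omega_t'(t_0)|}{\sqrt{|\omega_x'(t_0)|}}, \ldots, \frac{|\omega_t'(t_n)|}{\sqrt{|\omega_x'(t_n)|}} \right).$$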
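The explicit factorization in Corollary 4.5 can be checked numerically. The following is a minimal sketch in Python with NumPy, not the Chebfun implementation; the function names `omega_prime` and `limit_factors` and the choice of test points are assumptions made for illustration. It builds the limit matrix from sorted reference points with t_k = x_{2k+1}, divides each column by the corresponding diagonal entry of R, and forms the Gram matrix of the result, which should be close to the identity.

```python
import numpy as np

def omega_prime(nodes):
    """Return prod_{i != j} (nodes_j - nodes_i) for every j (derivative of the node polynomial)."""
    d = nodes[:, None] - nodes[None, :]
    np.fill_diagonal(d, 1.0)
    return d.prod(axis=1)

def limit_factors(x):
    """Limit matrix C~ and diagonal of R for sorted reference points x_0 < ... < x_{2n+1}."""
    x = np.asarray(x, dtype=float)
    t = x[1::2]                                  # support points t_k = x_{2k+1}
    wxp = np.abs(omega_prime(x))                 # |w_x'(x_j)|
    wtp = np.abs(omega_prime(t))                 # |w_t'(t_k)|
    odd_diag = wtp / np.sqrt(wxp[1::2])          # entries in rows j = 2k+1
    Ct = np.zeros((x.size, t.size))
    Ct[1::2, :] = np.diag(odd_diag)              # odd rows: one nonzero per row
    diff = x[0::2][:, None] - t[None, :]         # x_j - t_k for even j
    wt_even = np.abs(diff.prod(axis=1))          # |w_t(x_j)| for even j
    Ct[0::2, :] = (wt_even / np.sqrt(wxp[0::2]))[:, None] / diff
    return Ct, np.sqrt(2.0) * odd_diag

# Hypothetical test data: 8 Chebyshev-type points, i.e. n = 3.
x = np.cos(np.linspace(np.pi, 0.0, 8))
Ct, r_diag = limit_factors(x)
Q1 = Ct / r_diag                                 # column scaling by R^{-1}
gram = Q1.T @ Q1                                 # should be close to the identity
```

Because R is diagonal, recovering β from y in this setting amounts to an elementwise division rather than a triangular solve.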
Corollary 4.5 suggests the choice

tk = x2k+1 for k = 0, . . . , n. (4.18)

This takes us back to the interpolatory mode of barycentric representations (2.2), in


which we take αk = βk (f (tk ) − λ) for all k, instead of solving the system (4.15). This
interpolatory mode formulation is used in [35, Sec. 3.2.3]. Our derivation provides a
theoretical justification by showing that it is optimal with respect to the conditioning
of $|\Delta|^{1/2} C\Gamma$. Moreover, since $\min_{\Gamma \in \mathcal{D}_{n+1}} \kappa_2(\tilde{C}\Gamma) = 1$ in (4.16), forming the QR
factorization of $|\Delta|^{1/2} C$ via a standard algorithm (e.g. Householder QR) to obtain
Q1 is actually unnecessary, as the explicit form of Q1 is given in Corollary 4.5. In
addition, we reduce the problem to a symmetric eigenvalue problem (4.13), resulting
in well-conditioned eigenvalues, with β being obtained by solving the diagonal system
Rβ = y with y as in (4.13). Compared to (4.1), where we want q to have the same
sign over {x` }, we similarly require that β and thus y have components alternating
in sign, which uniquely fixes the norm 1 eigenvector y in (4.13). Our approach also
allows for nondiagonal types, as we describe next.

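4.4. The nondiagonal case m ≠ n. As pointed out in Section 2.1, when searching
for a best approximant with m ≠ n, we need to force the coefficient vector α or β
to lie in a certain subspace. This results in modified versions of (4.11). Namely,

$$\begin{bmatrix} C & -FCP_n \end{bmatrix} \begin{bmatrix} \alpha \\ \hat{\beta} \end{bmatrix} = \lambda \begin{bmatrix} 0 & -SCP_n \end{bmatrix} \begin{bmatrix} \alpha \\ \hat{\beta} \end{bmatrix}, \quad \text{when } m > n, \tag{4.19}$$

for $\hat{\beta} \in \mathbb{C}^{n+1}$, and we take $\beta = P_n \hat{\beta}$. Similarly,

$$\begin{bmatrix} CP_m & -FC \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \beta \end{bmatrix} = \lambda \begin{bmatrix} 0 & -SC \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \beta \end{bmatrix}, \quad \text{when } m < n, \tag{4.20}$$

for $\hat{\alpha} \in \mathbb{C}^{m+1}$, and we take $\alpha = P_m \hat{\alpha}$.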
Below we describe the reduction of the generalized eigenvalue problems (4.19)
and (4.20) to standard symmetric eigenvalue problems.
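Case m > n. In this case, $C \in \mathbb{R}^{(m+n+2)\times(m+1)}$. Since $\det |\Delta|^{1/2} \ne 0$, (4.19) is
equivalent to the generalized eigenvalue problem

$$\begin{bmatrix} |\Delta|^{1/2} C & -F|\Delta|^{1/2} CP_n \end{bmatrix} \begin{bmatrix} \alpha \\ \hat{\beta} \end{bmatrix} = \lambda \begin{bmatrix} 0 & -S|\Delta|^{1/2} CP_n \end{bmatrix} \begin{bmatrix} \alpha \\ \hat{\beta} \end{bmatrix}. \tag{4.21}$$

Consider the (thin) QR decomposition of $|\Delta|^{1/2} C \begin{bmatrix} P_n & P_n^\perp \end{bmatrix} = (S\Delta)^{1/2} C \begin{bmatrix} P_n & P_n^\perp \end{bmatrix}$:

$$|\Delta|^{1/2} C \begin{bmatrix} P_n & P_n^\perp \end{bmatrix} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} R = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 & R_{12} \\ 0 & R_2 \end{bmatrix}.$$

Then we have the identity $\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T (SQ_1) = 0$, as can be verified analogously to (4.5)
using divided differences. This implies $(SQ_1)^T |\Delta|^{1/2} C = 0$, so by left-multiplying
(4.21) by $\begin{bmatrix} (SQ_1)^\perp & SQ_1 \end{bmatrix}^T$ we obtain a block upper-triangular eigenvalue problem
with lower-right $(n+1) \times (n+1)$ block

$$(SQ_1)^T F Q_1 R_1 \hat{\beta} = \lambda\, (SQ_1)^T S Q_1 R_1 \hat{\beta},$$

which again reduces to the standard symmetric eigenvalue problem (setting $y = R_1 \hat{\beta}$)

$$Q_1^T (SF) Q_1 y = \lambda y. \tag{4.22}$$

From (4.21), we have $|\Delta|^{1/2} C\alpha = (F - \lambda S)\,|\Delta|^{1/2} CP_n \hat{\beta}$. Left-multiplying by
$\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T$ and using $\begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T S\,|\Delta|^{1/2} CP_n = 0$, we obtain

$$R \begin{bmatrix} P_n & P_n^\perp \end{bmatrix}^T \alpha = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T F\,|\Delta|^{1/2} CP_n \hat{\beta} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T F Q_1 R_1 \hat{\beta} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T F Q_1 y.$$

Therefore

$$\alpha = \begin{bmatrix} P_n & P_n^\perp \end{bmatrix} R^{-1} \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T F Q_1 y,$$

which is obtained by computing the vector $\hat{y} = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T F Q_1 y$, then solving $R\tilde{y} = \hat{y}$
for $\tilde{y}$, then $\alpha = \begin{bmatrix} P_n & P_n^\perp \end{bmatrix} \tilde{y}$.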


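Case m < n. This case is analogous to the previous one; we highlight the differences.
C is an $(m+n+2) \times (n+1)$ matrix. Equation (4.20) is equivalent to

$$\begin{bmatrix} |\Delta|^{1/2} CP_m & -F|\Delta|^{1/2} C \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \beta \end{bmatrix} = \lambda \begin{bmatrix} 0 & -S|\Delta|^{1/2} C \end{bmatrix} \begin{bmatrix} \hat{\alpha} \\ \beta \end{bmatrix}. \tag{4.23}$$

Consider the (thin) QR decompositions

$$|\Delta|^{1/2} C = (S\Delta)^{1/2} C = Q_1 R, \qquad |\Delta|^{1/2} CP_m = \hat{Q}_1 \hat{R}.$$

Here $Q_1 \in \mathbb{R}^{(m+n+2)\times(n+1)}$, $\hat{Q}_1 \in \mathbb{R}^{(m+n+2)\times(m+1)}$. We have $\hat{Q}_1^T (SQ_1) = 0$, which
again can be established using divided differences. This implies $(SQ_1)^T |\Delta|^{1/2} CP_m = 0$,
so left-multiplying equation (4.23) by $\begin{bmatrix} (SQ_1)^\perp & SQ_1 \end{bmatrix}^T$ results in a block upper-triangular
eigenvalue problem with lower-right block

$$(SQ_1)^T F Q_1 R\beta = \lambda\, (SQ_1)^T S Q_1 R\beta,$$

which also reduces to the standard symmetric eigenvalue problem (setting $y = R\beta$)

$$Q_1^T (SF) Q_1 y = \lambda y. \tag{4.24}$$

From (4.23), we have $|\Delta|^{1/2} CP_m \hat{\alpha} = (F - \lambda S)\,|\Delta|^{1/2} C\beta$. Left-multiplying by
$\hat{Q}_1^T$ and using $\hat{Q}_1^T S\,|\Delta|^{1/2} C = 0$, we obtain

$$\hat{R}\hat{\alpha} = \hat{Q}_1^T F\,|\Delta|^{1/2} C\beta = \hat{Q}_1^T F Q_1 R\beta = \hat{Q}_1^T F Q_1 y.$$

Therefore

$$\hat{\alpha} = \hat{R}^{-1} \hat{Q}_1^T F Q_1 y,$$

obtained via $\hat{y} = \hat{Q}_1^T F Q_1 y$, then solving the linear system $\hat{R}\hat{\alpha} = \hat{y}$.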
Analogously to our comments at the end of Section 4.2, the analysis for nondiag-
onal approximation presented here carries over to the weighted setting. In both the
m > n and m < n scenarios, the standard symmetric eigenproblems (4.22) and (4.24)
become

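$$Q_1^T (SF) Q_1 y = \lambda\, Q_1^T W^{-1} Q_1 y,$$

where $y = R_1 \hat{\beta}$ when m > n and $y = R\beta$ when m < n. Recovering the set of
barycentric coefficients in the numerator corresponds to solving the systems

$$\alpha = \begin{bmatrix} P_n & P_n^\perp \end{bmatrix} R^{-1} \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}^T (F - \lambda S W^{-1}) Q_1 y, \qquad m > n,$$

and

$$\hat{\alpha} = \hat{R}^{-1} \hat{Q}_1^T (F - \lambda S W^{-1}) Q_1 y, \qquad m < n.$$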

Stability and conditioning. We have just shown that the matrices arising in our
rational Remez algorithm have explicit expressions, and the eigenvalue problem re-
duces to a standard symmetric problem. Indeed, our experiments corroborate that
we have greatly improved the stability and conditioning of the rational Remez al-
gorithm using the barycentric representation. However, the algorithm is still not
guaranteed to compute r∗ to machine precision. Let us summarize the situation
for the unweighted case. As shown in Corollary 4.5, the computation of Q1 can
be done explicitly, and the linear system y = Rβ is diagonal, hence can be solved
with high relative accuracy. The main source of numerical errors is therefore in the
symmetric eigenvalue problem (4.13), (4.22) or (4.24). As is well known, by Weyl’s
bound [57, Cor. IV.4.9], eigenvalues of symmetric matrices are well conditioned with
condition number 1; thus λ is computed with O(u) accuracy, assuming for simplicity
that kf k∞ = 1 (without loss of generality). The eigenvector, on the other hand, has
conditioning O(1/gap) [57, Ch. V], where gap is the distance between the desired λ
and the rest of the eigenvalues. These eigenvalues coincide with the nonzero
eigenvalues of the generalized eigenproblem (4.3), and are inherent in the Remez algorithm,
i.e., they cannot be changed e.g. by a change of bases. For a fixed f , gap
tends to decrease as m, n increase, and we typically have gap = O(|λ|). Hence the
computed eigenvector tends to have accuracy O(u/|λ|), and if the eigenvector y has
small elements, the componentwise relative accuracy may be worse. The computation
therefore breaks down (perhaps as expected) when |λ| = O(u), that is, when the error
curve has amplitude of size machine precision.
4.5. Adaptive choice of the support points. Theorem 4.4 gives an optimal
choice of support points $t_k = x_{2k+1}$ in terms of optimizing $\min_{\Gamma \in \mathcal{D}_{n+1}} \kappa_2(|\Delta|^{1/2} C\Gamma)$.
In Section 2.2 we discussed another desideratum for the support points $\{t_k\}$: the
resulting $|D(x_\ell)| = \big|q(x_\ell) / \prod_{k=0}^{n} (x_\ell - t_k)\big|$ should take uniformly large values for all
$\ell$. Fortunately, this requirement is also met with this choice, as was illustrated in
Figure 2.1.
When m ≠ n, (4.18) does not determine enough support points. We take the
remaining |m − n| support points from the rest of the reference points in Leja style,
i.e., to maximize the product of the differences (see for instance [52, p. 334]). This
is a heuristic strategy, and the optimal choice is a subject of future work: indeed, in
this case $\min_{\Gamma \in \mathcal{D}_{n+1}} \kappa_2(|\Delta|^{1/2} CP_{m,n}\Gamma) > 1$.
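A Leja-style selection of the extra support points can be sketched as follows. This is a minimal illustration in plain Python, assuming that "product of the differences" means the product of distances to the support points already chosen; the function name `leja_select` and its arguments are hypothetical and not taken from the Chebfun code.

```python
def leja_select(candidates, chosen, count):
    """Greedily pick `count` points from `candidates`, each maximizing the
    product of distances to all points selected so far (Leja-style ordering)."""
    chosen = list(chosen)          # support points already fixed, e.g. by (4.18)
    remaining = list(candidates)   # reference points not yet used as support points
    picked = []
    for _ in range(count):
        def score(c):
            p = 1.0
            for s in chosen:
                p *= abs(c - s)
            return p
        best = max(remaining, key=score)   # ties resolve to the earliest candidate
        remaining.remove(best)
        chosen.append(best)
        picked.append(best)
    return picked

# Hypothetical data: one fixed support point and four candidate reference points.
extra = leja_select([-1.0, 0.5, 1.0, 0.1], [0.0], 2)
```

For large point sets the product can underflow or overflow; summing logarithms of the distances is a common alternative, at the cost of handling zero distances separately.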
5. Initialization. An indispensable component of a successful Remez algorithm
implementation is a method for finding a good set of initial reference points {x` }. A
key element of our approach is the AAA-Lawson algorithm, which can efficiently find
an approximate solution to the minimax problem (1.2) (to low accuracy).
5.1. Carathéodory-Fejér (CF) approximation. We first attempt to com-
pute the CF approximant [59, 61] to f , and use it to find the initial reference points
(as explained in Section 6). The dominant computation is an SVD of a Hankel matrix
of Chebyshev coefficients, which usually does not cause a computational bottleneck.
This method was also used in the previous Chebfun remez code. When f is smooth,
the result produced by CF approximation is often indistinguishable from the best
approximation, but nonsmooth cases may be very different.
5.2. AAA-Lawson approximation. This approach is based on the AAA algo-
rithm [43] followed by an adaptation of the Lawson algorithm. The resulting algorithm
is also based crucially on the barycentric representation. To keep the focus on Remez,
we defer the details to Section 8.
The output of the AAA-Lawson iteration typically has a nearly equioscillatory
error curve e = f − r, from which we find the initial set of reference points as the
extrema of e. For the prototypical example f = |x|, AAA-Lawson initialization lets
our barycentric minimax code converge for type up to (40, 40). The entire process
relies on a moderate number of SVDs (say max(m, n) + 10).
Fig. 5.1: Initialization with lower degree approximations. The left plot (axes m and n) shows the three
possible paths for updating the degrees (assuming the increment is j = 1): m < n (red),
m = n (black) and m > n (blue). The right plot (horizontal axis: normalized indices; vertical axis:
approximation domain) shows how initialization is done at an
intermediate step. The function is f1 from Table 7.1, with a singularity at x = 1/√2. The
y components of the red crosses correspond to the final references {x′ℓ} for the (m′, n′) =
(10, 10) best approximation, while the y components of the black circles are the initial guess
{x″ℓ} for the (m″, n″) = (11, 11) problem, taken based on the piecewise linear fit at {x′ℓ}.
Note how the y components of both sets of points cluster near the singularity.

5.3. Using lower degree approximations. We resort to this strategy if CF
and AAA-Lawson fail to produce a sufficiently good initial guess. For functions f with
singularities in [a, b], the reference sets {xℓ} corresponding to best approximations
in (1.3) tend to cluster near these singularities as m and n increase.
It is sensible to expect that first computing a type (m′, n′) best approximation to
f with m′ ≪ m and n′ ≪ n is easier (with convergence achieved if necessary with the
help of CF or AAA-Lawson). We then proceed by progressively increasing the values
of m′ and n′ by small increments j, typically j ∈ {1, 2, 4}. The steps taken follow
a diagonal path, as explained in Figure 5.1. Note that in addition to improving the
robustness of the Remez algorithm, this strategy can help detect degeneracy; recall
the discussion after (1.3). It proves useful for many examples, including some of those
shown in Section 7: type (n, n) approximations to f (x) = |x|, x ∈ [−1, 1] for n > 40
and the f1 , f2 and f4 specifications in Table 7.1.
6. Searching for the new reference. We now turn to the updating strategy
for the reference points $x_0, \ldots, x_{m+n+1}$ during the Remez iterations. These are a
subset of the local extrema of the error function e(x) = f (x) − r(x). To find them,
we decompose the domain [a, b] into subintervals of the form $[\tilde{x}_\ell, \tilde{x}_{\ell+1}]$ (and $[a, \tilde{x}_0]$
and $[\tilde{x}_{m+n+1}, b]$, if non-degenerate; here $\{\tilde{x}_\ell\}$ are the old reference points) and then
compute Chebyshev interpolants $p_e(x)$ of e(x) on each subinterval. In addition, if
f has singularities (identified by Chebfun’s splitting on functionality [46]), then
we further divide the subintervals at those points. Since e(x) is then smooth and
each subinterval is small, typically a low degree suffices for $p_e(x) = \sum_{i=0}^{k} c_i T_i(x)$: we
start with $2^3 + 1$ points (degree k = 8), and resample if necessary (determined by
examining the decay of the Chebyshev coefficients). We then find the roots of
$p_e'(x) = \sum_{i=1}^{k} i c_i U_{i-1}(x)$ (using the formula $T_n'(x) = n U_{n-1}(x)$) via the eigenvalues of the
colleague matrix for Chebyshev polynomials of the second kind [28]. Typically, one
local extremum per subinterval is found, resulting in m + n + 2 points, including the
endpoints. If more extrema are found, we evaluate the values of |e(x)| at those points
and select those with the largest values that satisfy (3.2).

Fig. 6.1: Illustration of how a new set of reference points (black stars) is found from the
current error function e = f − r (blue curve). Shown here is the error curve after three
Remez iterations in finding the best type (10, 10) approximation to f (x) = |x| on [−1, 1].
We split this interval into subintervals separated by the previous reference points (red circles),
and approximate e on each subinterval by a low-degree polynomial. We then find the roots
of its derivative.
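The extremum search on a single subinterval can be sketched with NumPy's Chebyshev utilities. This is a hedged illustration, not the Chebfun code: the function name `local_extrema` is hypothetical, the degree is fixed at 8 without the resampling step, and `chebroots` works with a companion matrix in the first-kind Chebyshev basis rather than the second-kind colleague matrix described above.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def local_extrema(e, lo, hi, deg=8):
    """Interior critical points of e on [lo, hi], found from a degree-`deg` Chebyshev interpolant."""
    half, mid = 0.5 * (hi - lo), 0.5 * (hi + lo)
    # Interpolate s -> e(mid + half*s) on [-1, 1] at deg + 1 Chebyshev points.
    c = cheb.chebinterpolate(lambda s: e(mid + half * s), deg)
    roots = cheb.chebroots(cheb.chebder(c))          # roots of the interpolant's derivative
    real = roots[np.abs(roots.imag) < 1e-10].real    # discard complex roots
    s = np.sort(real[(real > -1.0) & (real < 1.0)])  # keep roots inside the subinterval
    return mid + half * s

# Hypothetical smooth error piece with a single extremum at x = pi/6 in [0, 1].
ext = local_extrema(lambda x: np.sin(3.0 * x), 0.0, 1.0)
```

In a full reference update this routine would be called once per subinterval, with the endpoints a and b added as candidates and the final selection made by comparing |e| at the candidates.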

7. Numerical results. All computations in this section were done using Cheb-
fun’s new minimax command in standard IEEE double precision arithmetic.
Let us start with our core example of approximating |x| on [−1, 1], a problem
discussed in detail in [58, Ch. 25]. For more than a century, this problem has at-
tracted interest. The work of Bernstein and others in the 1910s led to the theorem
that degree n ≥ 0 polynomial approximations of this function can achieve at most
$O(n^{-1})$ accuracy, whereas Newman in 1964 showed that rational approximations can
achieve root-exponential accuracy [45]. The convergence rate for best type (n, n)
approximations was later shown by Stahl [56] to be $E_{n,n}(|x|, [-1, 1]) \sim 8e^{-\pi\sqrt{n}}$.
This result had in fact been conjectured by Varga, Ruttan and Carpenter [62]
based on a specialized multiple precision (200 decimal digits) implementation of the
Remez algorithm. Their computations were performed on the square root function,
using the fact that $E_{2n,2n}(|x|, [-1, 1]) = E_{n,n}(\sqrt{x}, [0, 1])$, as follows from symmetry.
They went up to n = 40. In both settings, the equioscillation points cluster exponentially
around x = 0 (see second plot of Figure 7.1), making it extremely difficult
to compute best approximations. Our barycentric Remez algorithm in double precision
arithmetic is able to match their performance, in the sense that we obtain the
type (80, 80) best approximation to |x| in less than 15 seconds on a desktop machine.
The results are showcased in Figure 7.1, where our levelled error computation for the
type (80, 80) approximation (value $4.39\ldots \times 10^{-12}$) matches the corresponding error
of [62, Table 1] to two significant digits, even though the floating point precision is no
better than $10^{-16}$.
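As a rough sanity check on these magnitudes, Stahl's asymptotic formula can be evaluated directly. The sketch below is plain Python; the helper names `stahl_estimate` and `degree_for_accuracy` are hypothetical, and the second one simply inverts the asymptotic formula, so it is only an estimate of the degree needed and not a statement about the exact minimax errors.

```python
import math

def stahl_estimate(n):
    """Asymptotic estimate 8 * exp(-pi * sqrt(n)) of E_{n,n}(|x|, [-1, 1])."""
    return 8.0 * math.exp(-math.pi * math.sqrt(n))

def degree_for_accuracy(eps):
    """Smallest n for which the asymptotic estimate is at most eps."""
    return math.ceil((math.log(8.0 / eps) / math.pi) ** 2)

estimate_80 = stahl_estimate(80)          # asymptotic estimate for type (80, 80)
n_needed = degree_for_accuracy(1e-6)      # degree suggested by the formula for 1e-6
```

For n = 80 the estimate is about 5.0e-12, the same order of magnitude as the computed levelled error quoted above.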
Running the other non-barycentric codes (Maple’s numapprox[minimax], Math-
ematica’s MiniMaxApproximation (which requires f to be analytic on [a, b]), and
Chebfun’s previous remez) on the same example resulted in failures at very small
values of n (all for n ≤ 8).
The robustness of our algorithm is also illustrated by the examples of Table 7.1
and Figure 7.2, which is a highlight of the paper. Computing these five approximations
takes in total less than 50 seconds with minimax. Example f4 is taken from [60, §5],
while f5 is inspired by [51]. The difficulty of approximating f5 is even more pronounced
than for |x|, since best type (n, n) approximations to f5 offer at most $O(n^{-1})$ accuracy
(a stark contrast to the root-exponential behavior of $E_{n,n}(|x|, [-1, 1])$) and the
reference points cluster even more strongly, quickly falling below machine precision.

Fig. 7.1: In the first plot, the upper dots show the best approximation errors for the degree
2n best polynomial approximations of |x| on [−1, 1], while the lower ones correspond to the
best type (n, n) rational approximations, superimposed on the asymptotic formula from [56].
The bottom plot shows the minimax error curve for the type (80, 80) best approximation
to |x|. Note that the horizontal axis has a log scale: the alternant ranges over 11 orders of
magnitude. The positive part of the domain [−1, 1] is shown (by symmetry the other half is
essentially the same).

Table 7.1: Best approximation to five difficult functions by the barycentric rational Remez
algorithm. $f_1''$ is discontinuous at $x = 1/\sqrt{2}$, $f_2'$ is discontinuous at x = 0, $f_3'$ is unbounded
as x → 0, f4 has two sharp peaks at x = ±0.6, and f5 has a logarithmic singularity at x = 0.

i    $f_i$                                                                              [a, b]          (m, n)      $\|f - r^*\|_\infty$
1    $x^2$ for $x < 1/\sqrt{2}$;  $-x^2 + 2\sqrt{2}\,x - 1$ for $1/\sqrt{2} \le x$      [0, 1]          (22, 22)    $2.439 \times 10^{-9}$
2    $|x|\sqrt{|x|}$                                                                    [−0.7, 2]       (17, 71)    $4.371 \times 10^{-8}$
3    $x^3 + \sqrt[3]{x}\,e^{-x^2}/8$                                                    [−0.2, 0.5]     (45, 23)    $2.505 \times 10^{-5}$
4    $100\pi(x^2 - 0.36)/\sinh(100\pi(x^2 - 0.36))$                                     [−1, 1]         (38, 38)    $1.780 \times 10^{-12}$
5    $-1/\log|x|$                                                                       [−0.1, 0.1]     (8, 8)      $1.52 \times 10^{-2}$
Fig. 7.2: Error curves for the best rational approximations of Table 7.1.

In Figures 7.3 and 7.4, we further illustrate minimax and its weighted variant,
by revisiting some classical problems in rational approximation: the Zolotarev problems
[2, Ch. 9]. Among other questions, Zolotarev asked what are the best rational
approximants to the sign function (on the union of intervals [−b, −a] ∪ [a, b] for scalars
0 < a < b) and the $\sqrt{x}$ function (in the relative sense, i.e., minimizing $\|1 - r/\sqrt{x}\|_\infty$) on
$[1/b^2, 1/a^2]$. Zolotarev proved these problems are mathematically equivalent through
the identity $\mathrm{sign}(x) = x\sqrt{1/x^2}$: if r is the type (m, m) best approximant to $\sqrt{x}$ on
$[1/b^2, 1/a^2]$, then $\mathrm{sign}(x) - x\,r(1/x^2)$ is found to equioscillate at 4m + 4 points on
[−b, −a] ∪ [a, b], so $x\,r(1/x^2)$ is the best approximant to sign(x) of type (2m + 1, 2m)
on [−b, −a] ∪ [a, b]. Furthermore, Zolotarev gave explicit solutions involving Jacobi’s
elliptic functions. These rational functions have the remarkable property of preserving
optimality under appropriate composition [42]. In Figure 7.3 we compute the
best relative error approximant of type (m, m) to $\sqrt{x}$ using the weighted variant of
our rational Remez algorithm. We then compute $x\,r(1/x^2)$, the type (2m + 1, 2m)
best approximant to the sign function. The error function is shown in Figure 7.4,
confirming Zolotarev’s results.
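The equivalence rests on the observation that, with y = 1/x², the absolute error of x r(1/x²) as an approximation to sign(x) equals the relative error of r as an approximation to √y. The plain Python sketch below checks this numerically for a crude rational function; the function `r` (two Newton steps for the square root starting from 1) is only a stand-in chosen for illustration and is not a best approximant.

```python
import math

def r(y):
    """A crude type (2, 1) rational approximation to sqrt(y): two Newton steps from 1."""
    r1 = 0.5 * (y + 1.0)
    return 0.5 * (r1 + y / r1)

def sign_error(x):
    """|sign(x) - x r(1/x^2)|, the absolute error of the induced sign approximant."""
    return abs(math.copysign(1.0, x) - x * r(1.0 / x**2))

def relative_sqrt_error(y):
    """|1 - r(y)/sqrt(y)|, the relative error of r as an approximation to sqrt(y)."""
    return abs(1.0 - r(y) / math.sqrt(y))

# Sample points in [-10, -1] and [1, 10], i.e. a = 1 and b = 10 (hypothetical values).
xs = [s * (1.0 + 9.0 * k / 50) for k in range(51) for s in (-1.0, 1.0)]
max_gap = max(abs(sign_error(x) - relative_sqrt_error(1.0 / x**2)) for x in xs)
```

The two error measures agree up to rounding at every sample point, which is why a relative-error minimax approximant to √x yields a minimax approximant to the sign function.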


We emphasize that the examples presented in this section are extraordinarily
challenging, far beyond the capabilities of most codes for minimax approximation.
Chebfun minimax not only solves them but does so quickly. For smoother functions
such as analytic functions (with singularities, if any, lying far from the interval), we
find that minimax usually easily computes $r^*$ so long as $\|f - r^*\|_\infty$ is a digit or two
larger than $u\|f\|_\infty$.

Fig. 7.3: Result of the weighted version of our barycentric Remez algorithm for the function
$f(x) = \sqrt{x}$, $x \in [10^{-8}, 1]$ with $w(x) = 1/\sqrt{x}$ and a type (17, 17) rational approximation. We
plot the absolute error curve on the left, while the relative error (right), matching our choice
of w, gives an expected equioscillating curve. This is Zolotarev’s third problem.

Fig. 7.4: The error in type (35, 34) best approximation to the sign function on $[-10^4, -1] \cup [1, 10^4]$,
computed via $x\,r(1/x^2)$, where $r(x) \approx \sqrt{x}$ as obtained in Figure 7.3. This is
Zolotarev’s fourth problem.

8. AAA-Lawson algorithm. Here we describe a new algorithm for rational
approximation that we call the AAA-Lawson algorithm; in practice we recommend
this for computing an initial guess for the Remez iteration. It applies on a finite,
discrete set rather than the continuous interval [a, b] as in (1.2). Specifically, we
consider the problem

$$\underset{r \in \mathcal{R}_{m,n}}{\text{minimize}}\ \|f(Z) - r(Z)\|_\infty, \tag{8.1}$$

where $Z = \{z_1, \ldots, z_M\}$ is a set of distinct points (sample points) in [a, b]. The
number M is usually large, e.g. $10^5$, and in particular much bigger than m and n.
The idea is that the solution for the discrete problem (8.1) should converge to the
continuous one (1.2) if we discretize the interval densely enough.

AAA-Lawson proceeds as follows:


1. Use the AAA algorithm to find an approximant (2.2), in particular the sup-
port points {tk } for a rational approximation r to f . This step is not tied to
a particular norm.
2. Use a variant of Lawson’s algorithm to obtain a refined (near-best) rational
approximant in the ℓ∞ norm.
Below we first review the AAA algorithm, introduced in [43], then the Lawson
algorithm, and then we present the AAA-Lawson combination.
8.1. The AAA algorithm. Given a function f and sample points $Z \in \mathbb{C}^M$, the
AAA algorithm finds a rational approximant of type (n, n) represented as in (2.2) by

$$r(z) = \tilde{N}(z)/\tilde{D}(z) := \sum_{k=0}^{n} f(t_k)\beta_k (z - t_k)^{-1} \Big/ \sum_{k=0}^{n} \beta_k (z - t_k)^{-1}.$$

Here, the support points $\{t_k\}$ are a subset of Z chosen in an adaptive, greedy manner so as to improve the
approximation as we increase n, exploiting the interpolatory property $\tilde{N}(t_k)/\tilde{D}(t_k) = f(t_k)$
for all k (unless $\beta_k = 0$). AAA takes only $\beta_k$ as the unknowns, which are found
by solving a linearized least-squares problem of the form $\text{minimize}_{\|\beta\|_2 = 1} \|f\tilde{D} - \tilde{N}\|_{\tilde{Z}}$,
where the subscript $\tilde{Z}$ denotes the discrete 2-norm at points $\tilde{Z} := Z \setminus \{t_0, \ldots, t_n\}$.
For details, see [43].
Noninterpolatory AAA. As we discussed in Section 2, the representation $\tilde{N}(z)/\tilde{D}(z)$
is unsuitable when the goal is to represent $r^*$: it is necessary to use the representation

$$r(z) = N(z)/D(z) = \sum_{k=0}^{n} \alpha_k (z - t_k)^{-1} \Big/ \sum_{k=0}^{n} \beta_k (z - t_k)^{-1}$$

as in (2.1). This leads
to a noninterpolatory variant of AAA, discussed briefly in [43, Section 10]. The resulting
least-squares problem $\text{minimize}_{\|\alpha\|_2^2 + \|\beta\|_2^2 = 1} \|fD - N\|_{\tilde{Z}}$ has unknowns α and
β. Written in matrix form, it takes the form

$$\underset{\|\alpha\|_2^2 + \|\beta\|_2^2 = 1}{\text{minimize}}\ \left\| \begin{bmatrix} C & -FC \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \right\|_2, \tag{8.2}$$

where $F = \mathrm{diag}(f(\tilde{Z}))$, and $C_{\ell,k} = 1/(z_\ell - t_k)$ is the Cauchy (basis) matrix as in (4.11),
but with rows corresponding to $z_\ell \in \{t_0, \ldots, t_n\}$ removed. We take the same support
points $\{t_k\}$ as in AAA. We solve (8.2) by computing the SVD of the matrix $\begin{bmatrix} C & -FC \end{bmatrix}$
and finding the right singular vector $v = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \in \mathbb{R}^{2n+2}$ corresponding to the smallest
singular value. As in Section 4.4, the case m ≠ n also uses the projection matrices
$P_m$, $P_n$.
8.2. Lawson’s algorithm. Lawson’s algorithm [37] computes the best polyno-
mial (linear) approximation based on an iteratively reweighted least-squares process.
During the iteration, a set of weights is updated according to the residual of the
previous solution.
Specifically, suppose that f is to be approximated on $Z = \{z_1, \ldots, z_M\}$ in a linear
subspace $\mathrm{span}(g_i)_{i=0}^{n}$. With an initial set of weights $\{w_j\}_{j=1}^{M}$ such that $w_j \ge 0$ and
$\sum_{j=1}^{M} w_j = 1$, one solves (using a standard solver) the weighted least-squares problem

$$\underset{c_0, \ldots, c_n}{\text{minimize}}\ \Big\| f - \sum_{i=0}^{n} c_i g_i \Big\|_w = \sqrt{\sum_{j=1}^{M} w_j \Big( f(Z_j) - \sum_{i=0}^{n} c_i g_i(Z_j) \Big)^2}, \tag{8.3}$$

and computes the residual $r_j = f(Z_j) - \sum_{i=0}^{n} c_i g_i(Z_j)$. The weights are then updated
by $w_j := w_j |r_j|$, followed by the re-normalization $w_j := w_j / \sum_{i=1}^{M} w_i$. Iterating
this process is known to converge linearly to the best polynomial approximant (with
nontrivial convergence analysis [17]), and an acceleration technique is presented in [26].
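The iteration (8.3) with its weight update can be sketched in a few lines of Python with NumPy. This is an illustrative sketch rather than a reference implementation: the function name `lawson`, the fixed iteration count, and the test problem (a degree 1 fit to exp on 201 equispaced points, in a Chebyshev basis) are assumptions made for the example, and no acceleration is used.

```python
import numpy as np

def lawson(fz, G, iters=100):
    """Discrete Lawson iteration. fz[j] = f(Z_j), G[j, i] = g_i(Z_j)."""
    M = fz.size
    w = np.full(M, 1.0 / M)                    # w_j >= 0 and sum(w) = 1
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least squares (8.3): scale rows by sqrt(w_j).
        c, *_ = np.linalg.lstsq(sw[:, None] * G, sw * fz, rcond=None)
        r = fz - G @ c                         # residual at the sample points
        w = w * np.abs(r)                      # Lawson update w_j := w_j |r_j|
        w /= w.sum()                           # re-normalization
    return c, w

# Hypothetical test problem: degree 1 approximation of exp on [-1, 1].
z = np.linspace(-1.0, 1.0, 201)
G = np.polynomial.chebyshev.chebvander(z, 1)   # columns T_0(z), T_1(z)
fz = np.exp(z)
c_ls = np.linalg.lstsq(G, fz, rcond=None)[0]   # plain least-squares fit
c_law, w = lawson(fz, G)
err_ls = np.max(np.abs(fz - G @ c_ls))
err_law = np.max(np.abs(fz - G @ c_law))
```

In this example the maximum error of the Lawson iterate is smaller than that of the unweighted least-squares fit, and the final weights concentrate near the points where the error is largest.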

8.3. AAA-Lawson. We now propose a rational variant of Lawson’s algorithm.
(A similar attempt was made in [20, § 6.5], though the formulation there is not the
same: most notably, adjusting the exponent γ as done below appears to improve
robustness significantly.) The idea is to incorporate Lawson’s approach into noninterpolatory
AAA, replacing (8.3) with a weighted version of (8.2), and updating the
weights as in Lawson.
Specifically, given an initial set of weights $w \in \mathbb{R}^{M - (\max(m,n)+1)}$, usually all ones,
and initializing the Lawson exponent γ = 1, we proceed as follows:
1. Solve the weighted linear least-squares problem

$$\underset{\|\alpha\|_2^2 + \|\beta\|_2^2 = 1}{\text{minimize}}\ \|f(\tilde{Z})D(\tilde{Z}) - N(\tilde{Z})\|_w, \tag{8.4}$$

via the SVD of the matrix $\mathrm{diag}(\sqrt{w}) \begin{bmatrix} C & -FC \end{bmatrix}$ (recall (8.2)). If the resulting
$\|f(Z) - N(Z)/D(Z)\|_\infty$ is not smaller than before, then set γ := γ/2.
2. Update w by

$$w_j \leftarrow w_j \left| f(Z_j) - \frac{N(Z_j)}{D(Z_j)} \right|^{\gamma}, \ \forall j, \quad \text{then } w_j := \frac{w_j}{\sum_i w_i} \tag{8.5}$$

and return to step 1.

Note the exponent γ in (8.5). In the linear case, this is γ = 1. In the rational
(nonlinear) case, for which experiments suggest convergence is a delicate issue, we
have found that taking γ to be smaller makes the algorithm much more robust. We
repeat the steps until w undergoes small changes, e.g. $10^{-3}$, or a maximum number
of iterations (e.g. 30) is reached.
We refer to this algorithm as AAA-Lawson. Each iteration is computed by an
SVD of an $(M - \max(m, n) - 1) \times (m + n + 2)$ matrix, so the cost for k iterations
is $O(kM(m + n)^2)$. Convergence analysis appears to be highly nontrivial and is out
of our scope. We simply note here that if equioscillation of f − N/D is achieved at
m + n + 2 points in $Z^* \subset Z$, then by defining $w^*$ as $w^*_j = 1/\sqrt{|D(Z_j)|}$ for $j \in Z^*$ and
0 otherwise, we see that $w^* / \sum w^*$ (together with $N^*/D^* = r^*$, the solution of (1.2))
is a fixed point of the iteration.
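Steps 1 and 2 can be sketched as follows for the diagonal case m = n, in Python with NumPy. This is a hedged illustration and not the Chebfun implementation: the support points are assumed to be given (in the text they come from AAA), the function name `aaa_lawson`, the early-exit tolerance `tol`, and the policy of returning the best iterate seen are all assumptions, and "not smaller than before" is interpreted here as a comparison with the previous iterate.

```python
import numpy as np

def aaa_lawson(z, fz, t, iters=10, tol=0.0):
    """Iteratively reweighted SVD for r = N/D in barycentric form (diagonal case).
    z: sample points (support points excluded), fz = f(z), t: support points."""
    C = 1.0 / (z[:, None] - t[None, :])          # Cauchy matrix, C[l, k] = 1/(z_l - t_k)
    A = np.hstack([C, -fz[:, None] * C])         # [C, -F C] as in (8.2)
    w = np.full(z.size, 1.0 / z.size)            # uniform initial weights
    gamma, prev_err = 1.0, np.inf
    best = (np.inf, None, None)
    for _ in range(iters):
        # Step 1: smallest right singular vector of diag(sqrt(w)) [C, -F C].
        _, _, Vh = np.linalg.svd(np.sqrt(w)[:, None] * A, full_matrices=False)
        alpha, beta = Vh[-1, :t.size], Vh[-1, t.size:]
        err = np.abs(fz - (C @ alpha) / (C @ beta))
        e = err.max()
        if e < best[0]:
            best = (e, alpha, beta)              # keep the best iterate (an assumption)
        if e <= tol:
            break
        if e >= prev_err:
            gamma /= 2.0                         # damp the Lawson exponent
        prev_err = e
        # Step 2: weight update (8.5) followed by normalization.
        w = w * err**gamma
        w /= w.sum()
    return best

# Hypothetical check: f is itself of type (2, 2), so the first SVD recovers it.
z = np.linspace(-1.0, 1.0, 400)
t = np.array([-0.5, 0.0, 0.5])
fz = 1.0 / (z**2 + 0.1)
max_err, alpha, beta = aaa_lawson(z, fz, t, tol=1e-6)
```

The check only exercises the linear algebra of step 1, since a function that lies in the approximation space is reproduced up to rounding; for targets such as |x| the behaviour depends strongly on the support points and on the damping of γ.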
8.4. Experiments with AAA-Lawson. Figure 8.1 compares AAA and AAA-
Lawson (run for ten Lawson steps) for type (10,10) and (20,20) approximation of
f (x) = |x|. The sample points are $10^4$ equispaced points on [−1, 1]. Observe that
the Lawson update significantly reduces the error and brings the error curve close to
equioscillation.
AAA-Lawson is a new algorithm for rational minimax approximation. However,
we do not recommend it as a practical means to obtain r∗ over the classical Remez
or differential correction algorithms. The reason is that its convergence is far from
understood, and even when it does converge, the rate is slow (linear at best). We
illustrate this in Figure 8.2. In our Remez algorithm context, we take a small number
(say 10) of AAA-Lawson steps to obtain a set of initial reference points, thereby taking
advantage of the initial stage of the AAA-Lawson convergence.
We note that other approaches for rational approximation are available, which
can be used for initializing Remez. These include the Loewner approach presented
in [39] and RKFIT [6]. In particular, the Loewner approach is well suited when
approximating smooth functions (and sometimes non-smooth functions like f4 [36]),
often achieving an error of the same order of magnitude as the best approximation.
Our experiments suggest that AAA-Lawson is at least as efficient and robust as these
alternatives.

Fig. 8.1: Error of rational approximants to f (x) = |x| by the AAA and AAA-Lawson algorithms.
The black dots are the support points. They are also interpolation points for AAA,
but not for AAA-Lawson.

Fig. 8.2: Convergence of AAA-Lawson alone and AAA-Lawson followed by Remez, for f (x) =
|x|, m = n = 10 (horizontal axis: iteration; vertical axis: error). The error is measured by
$\|r^* - r_k\|_\infty$, where $r_k$ is the kth iterate. AAA-Lawson converges linearly, whereas Remez
converges quadratically.
8.5. Adaptive choice of support points. At an early stage of the AAA-
Lawson iteration, we usually do not have the correct number (m + n + 2) of reference
(oscillation) points in the error curve. Therefore, choosing the support points {tk }
as in (4.18) is not an option. Instead, we use the same support points chosen by the
AAA algorithm, which is typically a good set. Once convergence sets in and the error
curve of the AAA-Lawson iterates has at least m + n + 2 alternation points, we can
switch to the adaptive choice (4.18) as in Remez. We note, however, that adaptively
changing the support points may further complicate the convergence, since it changes
the linear least-squares problem (8.4).
8.6. Adaptive choice of the sample points. For solving the continuous problem
(1.2), we take the sample point set Z to be M points uniformly distributed on
[a, b] ($M \lesssim 10^5$, chosen to keep the run time under control). Generally, it is necessary
to sample more densely near a singularity if there is one; this is important e.g. for
f (x) = |x|. We incorporate this need as follows: use AAA to find the support points
{tk } (assume they are sorted), and take M/n points in each interval [tk , tk+1 ].
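The refinement of the sample set can be sketched in plain Python as follows. The function name `refined_samples` is hypothetical, and two details are assumptions not specified in the text: the points in each gap are taken equispaced, and the support points themselves are excluded (as required for the Cauchy matrix in (8.2)).

```python
def refined_samples(t, M):
    """Place about M / n equispaced points strictly inside each gap [t_k, t_{k+1}]
    between the n + 1 sorted support points t_0 < ... < t_n."""
    t = sorted(float(v) for v in t)
    gaps = len(t) - 1                    # n gaps between n + 1 support points
    per_gap = max(M // gaps, 1)
    pts = []
    for lo, hi in zip(t[:-1], t[1:]):
        step = (hi - lo) / (per_gap + 1)
        pts.extend(lo + step * (i + 1) for i in range(per_gap))
    return pts

# Hypothetical support points clustering near a singularity at x = 0.
support = [-1.0, -0.1, -0.001, 0.001, 0.1, 1.0]
samples = refined_samples(support, 500)
```

Because the support points found by AAA cluster near a singularity, giving every gap the same number of samples automatically concentrates the sample set there: in this example 300 of the 500 samples lie in (−0.1, 0.1).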
9. A barycentric version of the differential correction algorithm. The
DC algorithm, due to Cheney and Loeb [16], has the great advantage of guaranteed
global convergence in theory [3,25], which applies whether the approximation domain
X is an interval [a, b] or a finite set. It can also be extended to multivariate ap-
proximation problems [32]. In practice, however, it may suffer greatly from rounding
errors, and its speed is often disappointing on larger problems. As we shall now de-
scribe, we have found that the first of these difficulties can be largely eliminated by
the use of barycentric representations with adaptively chosen support points. The
second problem of speed, however, remains, which is why ultimately we prefer the
Remez algorithm for most problems.
9.1. The barycentric formulation. For an effective implementation, X needs
to be a finite set (e.g. obtained by discretizing [a, b]) to reduce each iteration to a linear
programming (LP) problem. Considering the diagonal case m = n, a barycentric
version of the DC algorithm can be defined recursively as follows. (We assume the
support points are fixed to the values $t_0, \ldots, t_n$, which do not belong to X.) Given
$r_k = N_k/D_k \in \mathcal{R}_{n,n}(X)$, choose the partial fraction decompositions N and D of (2.1)
that minimize the expression

$$\max_{x \in X} \left\{ \frac{|f(x)D(x) - N(x)| - \delta_k |D(x)|}{|D_k(x)|} \right\}, \tag{9.1}$$

subject to

$$\operatorname{sign}(\omega_t(x)D(x)) = \operatorname{sign}(\omega_t(y)D(y)), \quad \forall x, y \in X,\ x \neq y, \tag{9.2}$$

and

$$\max_{0 \le j \le n} |\beta_j| \le 1, \tag{9.3}$$

where $\delta_k = \max_{x \in X} |f(x) - r_k(x)|$. If r = N/D is not good enough, continue with
$r_{k+1} = r$. By imposing (9.3), we can establish convergence using an argument analogous
to [3, Theorem 2]. In the polynomial basis setting, we know that the rate of
convergence will ultimately be at least quadratic if the best approximation is non-
degenerate [3, Theorem 3]. Non-diagonal approximations can be computed by adding
the appropriate null space constraints as described in Section 4.4.
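
To make the objective (9.1) concrete, the following sketch evaluates it for given barycentric coefficients. It assumes that N and D in (2.1) are the partial fraction sums $\sum_j \alpha_j/(x - t_j)$ and $\sum_j \beta_j/(x - t_j)$; all function names are hypothetical, and the LP solve over $(\alpha, \beta)$ that an actual DC iteration requires is not included. The example checks a simple consequence of the definitions: at the candidate $(N, D) = (N_k, D_k)$ the objective equals $\max_x (|f - r_k| - \delta_k) = 0$, so any minimizer with a negative value has error strictly below $\delta_k$ on X.

```python
import math


def bary_sum(coeffs, support, x):
    """Evaluate sum_j coeffs[j] / (x - support[j]); x must not be a support point."""
    return sum(c / (x - t) for c, t in zip(coeffs, support))


def max_error(f, alpha, beta, support, X):
    """delta = max over X of |f(x) - N(x)/D(x)|."""
    return max(
        abs(f(x) - bary_sum(alpha, support, x) / bary_sum(beta, support, x))
        for x in X
    )


def dc_objective(f, alpha, beta, alpha_k, beta_k, support, X):
    """Value of (9.1) for a candidate (alpha, beta), given the iterate (alpha_k, beta_k)."""
    delta_k = max_error(f, alpha_k, beta_k, support, X)
    worst = -math.inf
    for x in X:
        N = bary_sum(alpha, support, x)
        D = bary_sum(beta, support, x)
        D_k = bary_sum(beta_k, support, x)
        worst = max(worst, (abs(f(x) * D - N) - delta_k * abs(D)) / abs(D_k))
    return worst


# Illustrative data: f(x) = |x|, four fixed support points, X disjoint from them.
support = [-0.9, -0.3, 0.3, 0.9]
X = [(2 * i + 1) / 200 - 1 for i in range(200)]
beta_k = [1.0, -1.0, 1.0, -1.0]                        # satisfies (9.3)
alpha_k = [b * abs(t) for b, t in zip(beta_k, support)]  # r_k interpolates f at t_j
value_at_iterate = dc_objective(abs, alpha_k, beta_k, alpha_k, beta_k, support, X)
```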
9.2. Choice of support points. Compared to the case of the barycentric Re-
mez algorithm, changing the support points at each iteration of the DC algorithm
makes it hard to impose a normalization condition similar to (9.3) or do a convergence
analysis of the method. We therefore fix $\{t_k\}$ throughout the execution. The
strategy we have adopted is based on Section 5.3: recursively construct type $(\ell, \ell)$
approximations with $\ell \le n$. We take the set of support points of the $(\ell, \ell)$ problem
based on a piecewise linear fit of the final reference points of the $(\ell - 1, \ell - 1)$ problem
(similar to what is shown in Figure 5.1).
9.3. Experiments. We have implemented2 the barycentric DC algorithm in
MATLAB using CVX [29] to specify the LP problems corresponding to (9.1)–(9.3),
2 The prototype code used is available at https://github.com/sfilip/barycentricDC.

Table 9.1: Best type (16, 16) approximations to four functions using the barycentric DC
algorithm. X consists of 20000 equispaced points inside [−1, 1].

  i   $f_i$                                                                                        $\|f_i - r^*\|_{X,\infty}$
  1   $\sum_{k=0}^{\infty} 2^{-k} \cos(3^k x)$                                                     0.1377
  2   $\min\{\operatorname{sech}(3\sin(10x)),\ \sin(9x)\}$                                         0.0610
  3   $\sqrt{|x^3|} + |x + 0.5|$                                                                   $1.2057 \cdot 10^{-4}$
  4   $\frac{1}{2}\operatorname{erf}\left(\frac{x}{\sqrt{0.0002}}\right) + \frac{3}{2} e^{-x}$     $6.2045 \cdot 10^{-6}$

[Figure 9.1 panels: for each of f1, f2, f3, f4, a plot of the function on [−1, 1] and a plot of the corresponding error curve on [−1, 1].]

Fig. 9.1: The functions of Table 9.1 with error curves for best rational approximations
computed by the barycentric DC algorithm.

which are then solved using MOSEK’s [41] state-of-the-art LP optimizers. The four
examples in Table 9.1 and Figure 9.1, for instance, demonstrate the effectiveness of the
algorithm. For comparison, the sensitivity to the initial reference set prevented the
convergence of our barycentric Remez implementation on all four of these examples.
Function f1 is particularly interesting since it is a version of Weierstrass’s classic
example of a continuous but nowhere differentiable function.
Using a monomial or Chebyshev basis representation for the LP formulations
quickly failed due to numerical errors, illustrating that the barycentric representation
is crucial for the DC algorithm just as for the Remez algorithm.
We nevertheless reiterate the downsides of the DC approach noted at the beginning
of this section:
• Its overall cost. Producing the approximations in Figure 9.1 took several
minutes in MATLAB on a desktop machine for each example.
• Numerical optimization tools for solving the corresponding LP problems break
down at lower values of m and n than the ones we achieved with the barycen-
tric Remez algorithm. We were usually able to go up to about type (20, 20).

[Flowchart. Input: f, [a, b], (m, n). Step 1: AAA-Lawson approximation, with CF approximation and lower degree approximation as fallbacks when the result is not valid (otherwise: Failed). Step 2: find $\lambda_k$ and $r_k$ using a symmetric eigenvalue problem (valid?). Step 3: find new reference set (converged?). Output: $r^*$.]

Fig. 10.1: Flowchart summarizing the minimax implementation of the rational Remez algorithm
in the unweighted case. It follows the steps outlined at the start of Section 3. Step 1
consists of picking the initial reference set. This is done by applying in succession (if needed)
the strategies discussed in Sections 5.1, 5.2 and 5.3. Step 2 computes the current
approximant $r_k$ and alternation error $\lambda_k$. We do this by solving a symmetric eigenvalue
problem (4.13), (4.22) or (4.24), depending on whether m = n, m > n or m < n. We then pick, if
possible, the eigenpair leading to a rational approximant with no poles in [a, b] (see the discussion
around equation (4.14)). The next reference set is determined in Step 3 as explained in
Section 6. If convergence is successful, the routine outputs a numerical approximant of $r^*$.

10. Minimax approximation in Chebfun. We have presented many algorithmic
details that have enabled the design of a fast and robust Remez implementation.
In closing we remind readers that all this is available in Chebfun and readily explored
in a few lines of code. Download Chebfun version 5.7.0 or later from GitHub or
www.chebfun.org, put it in your MATLAB path, and then try for example

    [p,q,r] = minimax(@(x) abs(x),60,60);
    fplot(@(x) abs(x)-r(x),[-1 1])

In a few seconds a beautiful curve with 123 exponentially clustered equioscillation
points will appear. Figure 10.1 summarizes our algorithm in a flowchart.

Acknowledgement. We thank the reviewers for their useful comments and sug-
gestions, which helped improve the quality of the paper.

REFERENCES

[1] Boost C++ Libraries. http://www.boost.org.
[2] N. I. Akhiezer. Elements of the Theory of Elliptic Functions, volume 79 of Translations of
Mathematical Monographs. American Mathematical Society, 1990.
[3] I. Barrodale, M. J. D. Powell, and F. K. Roberts. The differential correction algorithm for
rational $\ell_\infty$-approximation. SIAM J. Numer. Anal., 9(3):493–504, 1972.
[4] B. Beckermann. The condition number of real Vandermonde, Krylov and positive definite
Hankel matrices. Numer. Math., 85(4):553–577, 2000.
[5] B. Beckermann and A. Townsend. On the singular values of matrices with displacement struc-
ture. SIAM J. Matrix Anal. Appl., 38(4):1227–1248, 2017.
[6] M. Berljafa and S. Güttel. The RKFIT algorithm for nonlinear rational approximation. SIAM
J. Sci. Comp., 39(5):A2049–A2071, 2017.
[7] J.-P. Berrut. Rational functions for guaranteed and experimentally well-conditioned global
interpolation. Comput. Math. Appl., 15(1):1–16, 1988.
[8] J.-P. Berrut, R. Baltensperger, and H. D. Mittelmann. Recent developments in barycentric
rational interpolation. In Trends and Applications in Constructive Approximation, pages
27–51. Springer, 2005.
[9] J.-P. Berrut and H. D. Mittelmann. Matrices for the direct determination of the barycentric
weights of rational interpolation. J. Comput. Appl. Math., 78(2):355–370, 1997.
[10] J.-P. Berrut and L. N. Trefethen. Barycentric Lagrange interpolation. SIAM Rev., 46(3):501–
517, 2004.
[11] D. Braess. Nonlinear Approximation Theory. Springer, 1986.
[12] C. Brezinski and M. Redivo-Zaglia. Padé–type rational and barycentric interpolation. Numer.
Math., 125(1):89–113, 2013.
[13] F. Brophy and A. Salazar. Synthesis of spectrum shaping digital filters of recursive design.
IEEE T. Circuits Syst., 22(3):197–204, 1975.
[14] O. S. Celis. Practical Rational Interpolation of Exact and Inexact Data. PhD thesis, Universiteit
Antwerpen, 2008.
[15] E. Cheney. Introduction to Approximation Theory. AMS Chelsea Pub., 1982.
[16] E. W. Cheney and H. L. Loeb. Two new algorithms for rational approximation. Numer. Math.,
3(1):72–75, 1961.
[17] A. K. Cline. Rate of convergence of Lawson’s algorithm. Math. Comp., 26(117):167–176, 1972.
[18] W. J. Cody. The FUNPACK package of special function subroutines. ACM Trans. Math.
Softw., 1(1):13–25, 1975.
[19] W. J. Cody. Algorithm 715: SPECFUN—a Portable FORTRAN Package of Special Function
Routines and Test Drivers. ACM Trans. Math. Softw., 19(1):22–30, 1993.
[20] P. Cooper. Rational Approximation of Discrete Data with Asymptotic Behaviour. PhD
thesis, University of Huddersfield, 2007.
[21] A. Curtis and M. R. Osborne. The construction of minimax rational approximations to func-
tions. Comput. J., 9(3):286, 1966.
[22] A. R. Curtis. Theory and calculation of best rational approximations. In Methods of Numerical
Approximation, pages 139–148. Elsevier, 1966.
[23] A. Deczky. Equiripple and minimax (Chebyshev) approximations for recursive digital filters.
IEEE T. Acoust., Speech, Signal Process., 22(2):98–111, 1974.
[24] T. A. Driscoll, N. Hale, and L. N. Trefethen. Chebfun Guide. Pafnuty Publications, Oxford,
2014.
[25] S. N. Dua and H. L. Loeb. Further remarks on the differential correction algorithm. SIAM J.
Numer. Anal., 10(1):123–126, 1973.
[26] S. Ellacott and J. Williams. Linear Chebyshev approximation in the complex plane using
Lawson’s algorithm. Math. Comp., 30(133):35–44, 1976.
[27] M. S. Floater and K. Hormann. Barycentric rational interpolation with no poles and high rates
of approximation. Numer. Math., 107(2):315–331, 2007.
[28] I. J. Good. The colleague matrix, a Chebyshev analogue of the companion matrix. Q. J. Math.,
12(1):61–68, 1961.
[29] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version
2.1. http://cvxr.com/cvx, Mar. 2014.
[30] B. Gustavsen. Improving the pole relocating properties of vector fitting. IEEE Trans. Power
Del., 21(3):1587–1592, 2006.
[31] B. Gustavsen and A. Semlyen. Rational approximation of frequency domain responses by vector
fitting. IEEE Trans. Power Del., 14(3):1052–1061, 1999.
[32] R. Hettich and P. Zencke. An algorithm for general restricted rational Chebyshev approxima-
tion. SIAM J. Numer. Anal., 27(4):1024–1033, 1990.
[33] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, PA, USA,
second edition, 2002.
[34] N. J. Higham. The numerical stability of barycentric Lagrange interpolation. IMA J. Numer.
Anal., 24(4):547–556, 2004.
[35] A. C. Ioniţă. Lagrange Rational Interpolation and its Applications to Approximation of Large-
Scale Dynamical Systems. PhD thesis, Rice University, 2013.
[36] D. S. Karachalios. Hyperbolic function and the Loewner framework. private communication.
[37] C. L. Lawson. Contributions to the Theory of Linear Least Maximum Approximations. PhD
thesis, University of California, Los Angeles, 1961.
[38] H. J. Maehly. Methods for fitting rational approximations, Parts II and III. J. ACM, 10(3):257–
277, 1963.
[39] A. J. Mayo and A. C. Antoulas. A framework for the solution of the generalized realization
problem. Linear Algebra Appl., 425(2-3):634–662, 2007.
[40] G. Meinardus. Approximation of Functions: Theory and Numerical Methods. Springer, 1967.
[41] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision
28), 2015.
[42] Y. Nakatsukasa and R. W. Freund. Computing fundamental matrix decompositions accurately
via the matrix sign function in two iterations: The power of Zolotarev’s functions. SIAM
Rev., 58(3):461–493, 2016.
[43] Y. Nakatsukasa, O. Sète, and L. N. Trefethen. The AAA algorithm for rational approximation.
Technical report, 2016. To appear in SIAM J. Sci. Comp.
[44] Y. Nakatsukasa and L. N. Trefethen. Rational approximation of $x^n$. Technical report, 2018.
To appear in Proc. AMS.
[45] D. J. Newman. Rational approximation to |x|. Michigan Math. J., 11(1):11–14, 03 1964.
[46] R. Pachón, R. B. Platte, and L. N. Trefethen. Piecewise-smooth chebfuns. IMA J. Numer.
Anal., 30(4):898–916, 2010.
[47] R. Pachón and L. N. Trefethen. Barycentric-Remez algorithms for best polynomial approxima-
tion in the Chebfun system. BIT Numer. Math., 49(4):721–741, 2009.
[48] V. Y. Pan. How bad are Vandermonde matrices? SIAM J. Matrix Anal. Appl., 37(2):676–694,
2016.
[49] A. Pelios. Rational function approximation as a well-conditioned matrix eigenvalue problem.
SIAM J. Numer. Anal., 4(4):542–547, 1967.
[50] M. J. D. Powell. Approximation Theory and Methods. Cambridge University Press, 1981.
[51] A. Pushnitski and D. Yafaev. Best rational approximation of functions with logarithmic sin-
gularities. Constr. Approx., 46(2):243–269, 2017.
[52] L. Reichel. Newton interpolation at Leja points. BIT Numer. Math., 30(2):332–346, 1990.
[53] E. Remes. Sur le calcul effectif des polynômes d’approximation de Tchebichef. C. r. hebd.
séances Acad. Sci., 199:337–340, 1934.
[54] E. Remes. Sur un procédé convergent d’approximations successives pour déterminer les
polynômes d’approximation. C. r. hebd. séances Acad. Sci., 198:2063–2065, 1934.
[55] C. Schneider and W. Werner. Some new aspects of rational interpolation. Math. Comp.,
47(175):285–299, 1986.
[56] G. Stahl. Best uniform approximation of |x| on [−1, 1]. Russian Acad. Sci. Sb. Math.,
76(2):461–487, 1993.
[57] G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory (Computer Science and Scientific
Computing). Academic Press, 1990.
[58] L. N. Trefethen. Approximation Theory and Approximation Practice. SIAM, 2013.
[59] L. N. Trefethen and M. H. Gutknecht. The Carathéodory-Fejér method for real rational ap-
proximation. SIAM J. Numer. Anal., 20(2):420–436, 1983.
[60] J. Van Deun. Computing near-best fixed pole rational interpolants. J. Comput. Appl. Math.,
235(4):1077–1084, 2010.
[61] J. Van Deun and L. N. Trefethen. A robust implementation of the Carathéodory–Fejér method
for rational approximation. BIT Numer. Math., 51(4):1039–1050, 2011.
[62] R. S. Varga, A. Ruttan, and A. D. Carpenter. Numerical results on best uniform rational
approximation of |x| on [−1, +1]. Mathematics of the USSR-Sbornik, 74(2):271, 1993.
[63] G. A. Watson. Approximation Theory and Numerical Methods. Wiley, 1980.
[64] H. Werner. Die konstruktive Ermittlung der Tschebyscheff-Approximierenden im Bereich der
rationalen Funktionen. Arch. Ration. Mech. An., 11(1):368–384, 1962.
[65] H. Werner. Tschebyscheff-Approximation im Bereich der rationalen Funktionen bei Vorliegen
einer guten Ausgangsnäherung. Arch. Ration. Mech. An., 10(1):205–219, 1962.
