Pivoting and Backward Stability of Fast Algorithms For Solving Cauchy Linear Equations
Abstract
Three fast O(n²) algorithms for solving Cauchy linear systems of equations are proposed.
A rounding error analysis indicates that the backward stability of these new Cauchy solvers is
similar to that of Gaussian elimination, suggesting that various pivoting techniques be employed
to achieve favorable backward stability. It is shown that the Cauchy structure allows one to
compute, in O(n²) operations, the partial pivoting ordering of the rows and several other judicious
orderings in advance, without actually performing the elimination. The analysis also shows
that for the important class of totally positive Cauchy matrices it is advantageous to avoid
pivoting, which yields a remarkable backward stability of the suggested algorithms. It is shown
that Vandermonde and Chebyshev–Vandermonde matrices can be efficiently transformed into
Cauchy matrices, using Discrete Fourier, Cosine or Sine transforms. This allows us to use the
proposed algorithms for Cauchy matrices for rapid and accurate solution of Vandermonde and
Chebyshev–Vandermonde linear systems. The analytical results are illustrated by computed
examples. © 2002 Elsevier Science Inc. All rights reserved.
AMS classification: 65F05; 65L20; 15A09; 15A23
Keywords: Displacement structure; Cauchy matrix; Vandermonde matrix; Fast algorithms; Pivoting;
Rounding error analysis; Backward stability; Total positivity
This work was supported in part by NSF contracts CCR-962811, 9732355 and 0098222.
∗ Corresponding author.
E-mail address: [email protected] (V. Olshevsky).
URL: www.cs.gsu.edu/~matvro (V. Olshevsky).
0024-3795/02/$ - see front matter © 2002 Elsevier Science Inc. All rights reserved.
PII: S0024-3795(01)00519-5
1. Introduction
Cauchy matrices
$$
C(x_{1:n}, y_{1:n}) = \begin{bmatrix} \frac{1}{x_1 - y_1} & \cdots & \frac{1}{x_1 - y_n} \\ \vdots & \ddots & \vdots \\ \frac{1}{x_n - y_1} & \cdots & \frac{1}{x_n - y_n} \end{bmatrix} \tag{1.1}
$$
and Vandermonde matrices
$$
V(x_{1:n}) = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}
$$
are classical. They are encountered in many applied problems related to polynomial and rational function computations. Vandermonde and Cauchy matrices have many similar properties; among them one could mention the existence of explicit formulas for their determinants and inverses, see, e.g., [2] and the references therein.
Along with many interesting algebraic properties, these matrices have several remarkable numerical properties, often allowing much more accurate computations than those based on the use of general (structure-ignoring) algorithms, say Gaussian elimination with pivoting. At the same time, such favorable numerical properties are
much better understood for Vandermonde and related matrices (see, for example,
[3,5–7,19–21,32,36–38]), as compared to the analysis of numerical issues related to
Cauchy matrices (see [10,11]).
This description allows one to solve the associated linear systems in only O(n²) operations, which is an order of magnitude less than the O(n³) complexity of general (structure-ignoring) methods. Moreover, the algorithm requires only O(n) locations of memory.
O(n²) operations and O(n) locations of memory. It was further shown in [2] that the
following configuration of the nodes
$$
y_n < \cdots < y_1 < x_1 < \cdots < x_n \tag{1.8}
$$
is an appropriate analog of (1.4) for Cauchy matrices, allowing us to prove that the error bounds associated with the BP-type algorithm are entirely similar to (1.5) and (1.6), viz.,
$$
|a - \hat{a}| \le 5(2n+1)\,u\,|C(x_{1:n}, y_{1:n})^{-1}|\,|f| + O(u^2). \tag{1.9}
$$
¹ Totally positive matrices are those for which the determinant of every submatrix is positive; see the monographs [9,25].
It is now well known that for structured matrices the Gaussian elimination pro-
cedure can be speeded up, leading to the generalized Schur algorithms, see, e.g.,
[29], and the references therein. In the following section we exploit the displace-
ment structure of Cauchy matrices to specify two such algorithms. Then in Section
3 we exploit the quasi-Cauchy structure of the Schur complements of C(x1:n , y1:n )
to derive one more algorithm for solving Cauchy linear equations. Then in Section 4
we perform a rounding error analysis for these algorithms, obtaining backward and
residual bounds similar to those for Gaussian elimination. This analogy allows us in
Section 5 to carry over the stabilizing techniques known for Gaussian elimination to the new Cauchy solvers. The numerical properties of the new algorithms are illustrated in Section 6 by a variety of examples. Then in Section 7 we show how Vandermonde and Chebyshev–Vandermonde matrices can be efficiently transformed into Cauchy matrices by using Discrete Fourier, Cosine or Sine transforms, thus allowing us to use the proposed Cauchy solvers for the rapid solution of Vandermonde and Chebyshev–Vandermonde linear systems.
for various choices of (sparse) matrices {F, A}. Let $\alpha := \operatorname{rank} \nabla_{\{F,A\}}(R)$. Then one can factor
$$
\nabla_{\{F,A\}}(R) = F \cdot R - R \cdot A = G \cdot B^T, \tag{2.2}
$$
where both matrices on the right-hand side of (2.2) have only α columns each: $G, B \in \mathbb{C}^{n\times\alpha}$. The number $\alpha = \operatorname{rank} \nabla_{\{F,A\}}(R)$ is called the {F, A}-displacement rank of R, and the pair {G, B} is called an {F, A}-generator of R. The displacement rank measures the complexity of R, because all its n² entries are described by the smaller number 2αn of entries of its generator {G, B}. We refer to the recent survey [29] for more complete information on displacement structure and for further references; here we restrict ourselves to Cauchy matrices. The following lemma specifies their displacement structure.
Lemma 2.1. Let $\nabla_{\{F,A\}}(\cdot) : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ be defined by (2.1) with
$$
F = D_x = \operatorname{diag}(x_1, \ldots, x_n), \qquad A = D_y = \operatorname{diag}(y_1, \ldots, y_n). \tag{2.4}
$$
Then the Cauchy matrix satisfies
$$
D_x \cdot C(x_{1:n}, y_{1:n}) - C(x_{1:n}, y_{1:n}) \cdot D_y = \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}^T \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix},
$$
i.e., its {D_x, D_y}-displacement rank is equal to 1.

Now partition
$$
R_1 = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22}^{(1)} \end{bmatrix},
$$
and denote by $R_2 = R_{22}^{(1)} - \frac{1}{d_1} l_1 u_1$ its Schur complement. Then if the matrices $F_1$ and $A_1$ in (2.2) are lower and upper triangular, respectively, say
$$
F_1 = \begin{bmatrix} f_1 & 0 \\ * & F_2 \end{bmatrix}, \qquad A_1 = \begin{bmatrix} a_1 & * \\ 0 & A_2 \end{bmatrix}, \tag{2.5}
$$
then the $\{F_2, A_2\}$-displacement rank of the Schur complement is less than or equal to α, so that we can write
$$
F_2 \cdot R_2 - R_2 \cdot A_2 = G_2 \cdot B_2^T \quad \text{with some } G_2,\, B_2 \in \mathbb{C}^{(n-1)\times\alpha}; \tag{2.6}
$$
see, e.g., [29] and the references therein, which also show how to run the generator recursion {G₁, B₁} → {G₂, B₂}. The following lemma (cf. with [12,13,28]) provides one particular form for the generator recursion.

Lemma 2.2. Let
$$
R_1 = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22}^{(1)} \end{bmatrix}
$$
satisfy the displacement equation (2.2) with triangular $F_1, A_1$ partitioned as in (2.5). If the (1,1) entry $d_1$ of $R_1$ is nonzero, then the Schur complement $R_2 = R_{22}^{(1)} - \frac{1}{d_1} l_1 u_1$ satisfies the displacement equation (2.6) with
$$
\begin{bmatrix} 0 \\ G_2 \end{bmatrix} = G_1 - \begin{bmatrix} 1 \\ \frac{1}{d_1} l_1 \end{bmatrix} \cdot g_1, \qquad \begin{bmatrix} 0 \\ B_2 \end{bmatrix} = B_1 - \begin{bmatrix} 1 \\ \frac{1}{d_1} u_1^T \end{bmatrix} \cdot b_1, \tag{2.7}
$$
where $g_1$ and $b_1$ denote the top rows of $G_1$ and $B_1$, respectively.
One step of Gaussian elimination provides the factorization
$$
R_1 = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22}^{(1)} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \frac{1}{d_1} l_1 & I \end{bmatrix} \cdot \begin{bmatrix} d_1 & u_1 \\ 0 & R_2 \end{bmatrix}, \tag{2.8}
$$
where $R_2 = R_{22}^{(1)} - \frac{1}{d_1} l_1 u_1$ is the Schur complement of the (1,1) entry $d_1$ in the matrix $R_1$. This step provides the first column
$$
\begin{bmatrix} 1 \\ \frac{1}{d_1} l_1 \end{bmatrix}
$$
of L and the first row $\begin{bmatrix} d_1 & u_1 \end{bmatrix}$ of U in the LU factorization of $R_1$. Proceeding with $R_2$ similarly, after n − 1 steps one obtains the whole LU factorization.
Algorithms that exploit the displacement structure of a matrix to speed up the
Gaussian elimination procedure are called generalized Schur algorithms, because
the classical Schur algorithm [33] was shown (see, e.g., [29,30]) to belong to the
class. Instead of computing the (n − k + 1)² entries of the Schur complement $R_k$, one has to compute only α(n − k + 1) entries of its $\{F_k, A_k\}$-generator $\{G_k, B_k\}$, which requires far less computation. To run the generator recursion (2.7) as well
as to write down the corresponding entries of the L and U factors, one needs only to
specify how to recover the first row and column of Rk from its generator {Gk , Bk }.
For a Schur complement $R_k = [r_{ij}^{(k)}]_{k \le i,j \le n}$ of a Cauchy matrix this is easy to do:
$$
r_{i,j}^{(k)} = \frac{g_i^{(k)} \cdot b_j^{(k)}}{x_i - y_j}, \tag{2.9}
$$
where $\{[g_k^{(k)} \cdots g_n^{(k)}]^T,\, [b_k^{(k)} \cdots b_n^{(k)}]\}$ designates the corresponding generator.
In the rest of this section we formulate two variants of a generalized Schur al-
gorithm for C(x1:n , y1:n ). The first version is an immediate implementation of (2.7)
and (2.9), and its MATLAB code is given next.
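A minimal MATLAB sketch of such an implementation (ours; the function name gs_cauchy and the [L, D, U] interface are illustrative rather than the authors' original listing) recovers the pivot and the first row and column of each Schur complement from the generators via (2.9), and updates the generators via (2.7):

function [L, D, U] = gs_cauchy(x, y)
% Sketch of Algorithm 2.3 (name and interface ours). x, y are column
% vectors of pairwise distinct nodes with x(i) ~= y(j) for all i, j.
n = length(x);
g = ones(n,1); b = ones(n,1);   % {D_x,D_y}-generator of C(x,y), cf. Lemma 2.1
L = eye(n); U = eye(n); d = zeros(n,1);
for k = 1:n
    % pivot, first column and first row of the Schur complement R_k, via (2.9)
    d(k) = g(k)*b(k) / (x(k) - y(k));
    L(k+1:n,k) = (g(k+1:n)*b(k) ./ (x(k+1:n) - y(k))) / d(k);
    U(k,k+1:n) = (g(k)*b(k+1:n) ./ (x(k) - y(k+1:n))).' / d(k);
    % generator recursion (2.7), specialized to displacement rank 1
    g(k+1:n) = g(k+1:n) - L(k+1:n,k) * g(k);
    b(k+1:n) = b(k+1:n) - U(k,k+1:n).' * b(k);
end
D = diag(d);

A linear system C(x_{1:n}, y_{1:n})a = f can then be solved by two triangular substitutions, a = U \ (D \ (L \ f)).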
In fact Algorithm 2.3 is valid for the more general class of Cauchy-like matrices; see, e.g., [12,28] for details and applications. However, for the special case of ordinary Cauchy matrices we can exploit the fact that the corresponding displacement rank is equal to 1 to formulate a more specific version of the generalized Schur algorithm, based on the following lemma.
Lemma 2.4. Let $R_k = [r_{ij}^{(k)}]_{k \le i,j \le n}$, $k = 1, \ldots, n$, be the successive Schur complements of the Cauchy matrix $C(x_{1:n}, y_{1:n})$. Then, starting from $g_j^{(1)} = b_j^{(1)} = 1$ (cf. Lemma 2.1), the generator recursion (2.7) can be specialized to
$$
\begin{bmatrix} g_{k+1}^{(k+1)} \\ \vdots \\ g_n^{(k+1)} \end{bmatrix} = \begin{bmatrix} \frac{x_{k+1} - x_k}{x_{k+1} - y_k}\, g_{k+1}^{(k)} \\ \vdots \\ \frac{x_n - x_k}{x_n - y_k}\, g_n^{(k)} \end{bmatrix}, \qquad
\begin{bmatrix} b_{k+1}^{(k+1)} & \cdots & b_n^{(k+1)} \end{bmatrix} = \begin{bmatrix} \frac{y_{k+1} - y_k}{y_{k+1} - x_k}\, b_{k+1}^{(k)} & \cdots & \frac{y_n - y_k}{y_n - x_k}\, b_n^{(k)} \end{bmatrix}. \tag{2.10}
$$
The nonzero entries of the factors in $C(x_{1:n}, y_{1:n}) = LDU$ are given by $d_k = x_k - y_k$ and
$$
\begin{bmatrix} l_{k,k} \\ \vdots \\ l_{n,k} \end{bmatrix} = \begin{bmatrix} \frac{1}{x_k - y_k}\, g_k^{(k)} \\ \vdots \\ \frac{1}{x_n - y_k}\, g_n^{(k)} \end{bmatrix}, \qquad
\begin{bmatrix} u_{k,k} & \cdots & u_{k,n} \end{bmatrix} = \begin{bmatrix} \frac{1}{x_k - y_k}\, b_k^{(k)} & \cdots & \frac{1}{x_k - y_n}\, b_n^{(k)} \end{bmatrix}. \tag{2.11}
$$
The following MATLAB code implements the algorithm based on Lemma 2.4.
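A minimal MATLAB sketch of such an implementation (again ours; name and interface are illustrative) writes down the factors directly from (2.10) and (2.11):

function [L, D, U] = gs_direct_cauchy(x, y)
% Sketch of Algorithm 2.5 (name and interface ours). x, y are column
% vectors of pairwise distinct nodes with x(i) ~= y(j) for all i, j.
n = length(x);
g = ones(n,1); b = ones(n,1);               % g_j^{(1)} = b_j^{(1)} = 1
L = zeros(n); U = zeros(n);
for k = 1:n
    L(k:n,k) = g(k:n) ./ (x(k:n) - y(k));       % k-th column of L, by (2.11)
    U(k,k:n) = (b(k:n) ./ (x(k) - y(k:n))).';   % k-th row of U, by (2.11)
    % generator recursion (2.10)
    g(k+1:n) = (x(k+1:n) - x(k)) ./ (x(k+1:n) - y(k)) .* g(k+1:n);
    b(k+1:n) = (y(k+1:n) - y(k)) ./ (y(k+1:n) - x(k)) .* b(k+1:n);
end
D = diag(x - y);                                % d_k = x_k - y_k

Note that here L and U are not unit triangular; the normalization is the one fixed by (2.11), and a system is again solved by a = U \ (D \ (L \ f)).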
The fast Algorithms 2.3 and 2.5 require O(n²) locations of memory to store the triangular factors. An algorithm with O(n) storage is described next.
The concept of displacement structure was initiated by the paper [26], where it was first applied to study Toeplitz matrices, using a displacement operator of the form $\nabla_{\{Z,Z^T\}}(R) = R - Z \cdot R \cdot Z^T$, where Z is the lower shift matrix. In this section we make a connection with [30], where the fact that Toeplitz matrices belong to the more general class of matrices with $\nabla_{\{Z,Z^T\}}$-displacement rank 2 was used to introduce the name quasi-Toeplitz for such matrices. It is shown there that any quasi-Toeplitz matrix R can be represented as a product of three Toeplitz matrices:
$$
R = L\, T\, U, \tag{3.1}
$$
where L is a lower triangular and U an upper triangular Toeplitz matrix. Patterning ourselves upon the above definition, and taking Lemma 2.1 as a starting point, we shall refer to matrices with {D_x, D_y}-displacement rank 1 as quasi-Cauchy matrices. The next simple lemma is an analog of (3.1).
Lemma 3.1. Let $D_x$ and $D_y$ be defined by (2.4). Then the unique solution of the equation
$$
D_x \cdot R - R \cdot D_y = \begin{bmatrix} g_1 & \cdots & g_n \end{bmatrix}^T \cdot \begin{bmatrix} b_1 & \cdots & b_n \end{bmatrix} \tag{3.2}
$$
is given by
$$
R = \operatorname{diag}(g_1, g_2, \ldots, g_n) \cdot C(x_{1:n}, y_{1:n}) \cdot \operatorname{diag}(b_1, b_2, \ldots, b_n). \tag{3.3}
$$
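The identity is easy to check numerically; the following small script (ours, for illustration) builds both sides of (3.2) from the candidate solution (3.3) for random nodes and generators:

% numerical check of Lemma 3.1 (script ours)
n = 6;
x = sort(rand(n,1)) + 2;  y = sort(rand(n,1));   % ensures x(i) ~= y(j)
g = randn(n,1); b = randn(n,1);
C = 1 ./ (x - y.');                              % Cauchy matrix C(x,y)
R = diag(g) * C * diag(b);                       % candidate solution (3.3)
norm(diag(x)*R - R*diag(y) - g*b.')              % ~ machine precision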
Lemma 3.1 allows us to obtain below an explicit factorization formula for Cauchy matrices. Indeed, by Lemma 2.2 the Schur complement $R_2 = R_{22} - \frac{1}{d_1} l_1 u_1$ in
$$
C(x_{1:n}, y_{1:n}) = \begin{bmatrix} d_1 & u_1 \\ l_1 & R_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \frac{1}{d_1} l_1 & I \end{bmatrix} \begin{bmatrix} d_1 & 0 \\ 0 & R_2 \end{bmatrix} \begin{bmatrix} 1 & \frac{1}{d_1} u_1 \\ 0 & I \end{bmatrix}
$$
also has displacement rank 1, so by Lemma 3.1 its Cauchy structure can be recovered by dropping diagonal factors as shown next.
Lemma 3.2. The Cauchy matrix and its inverse can be factored as
$$
C(x_{1:n}, y_{1:n}) = L_1 \cdots L_{n-1}\, D\, U_{n-1} \cdots U_1, \tag{3.4}
$$
where $D = \operatorname{diag}\big((x_1 - y_1), \ldots, (x_{n-1} - y_{n-1}), \tfrac{1}{x_n - y_n}\big)$, and
$$
L_k = \begin{bmatrix} I_{k-1} & 0 \\ 0 & \begin{bmatrix} \frac{1}{x_k - y_k} & & & \\ \frac{1}{x_{k+1} - y_k} & \frac{1}{x_{k+1} - y_k} & & \\ \vdots & & \ddots & \\ \frac{1}{x_n - y_k} & & & \frac{1}{x_n - y_k} \end{bmatrix} \end{bmatrix} \cdot \begin{bmatrix} I_{k-1} & 0 \\ 0 & \operatorname{diag}\big(1,\, x_{k+1} - x_k,\, \ldots,\, x_n - x_k\big) \end{bmatrix},
$$
$$
U_k = \begin{bmatrix} I_{k-1} & 0 \\ 0 & \begin{bmatrix} 1 & 1 & \cdots & 1 \\ & y_k - y_{k+1} & & \\ & & \ddots & \\ & & & y_k - y_n \end{bmatrix} \end{bmatrix} \cdot \begin{bmatrix} I_{k-1} & 0 \\ 0 & \operatorname{diag}\!\left(\frac{1}{x_k - y_k},\, \frac{1}{x_k - y_{k+1}},\, \ldots,\, \frac{1}{x_k - y_n}\right) \end{bmatrix}.
$$
The representation for the inverse matrix $C(x_{1:n}, y_{1:n})^{-1}$ obtained from the above leads to the following algorithm for solving $C(x_{1:n}, y_{1:n})\,a = f$.
² Algorithm 3.3 has lower complexity and better error bounds than its earlier variant, called Cauchy-2 in [1].
Algorithm 3.3 (quasi-Cauchy algorithm).

function a = quasi_cauchy(x, y, f)   % function name ours
n = max(size(x)); a = f;
% forward pass: apply the inverses of L_1, ..., L_{n-1} from (3.4)
for k = 1:n-1
    for j = k:n
        a(j) = a(j) * (x(j) - y(k));
    end
    for j = k+1:n
        a(j) = a(j) - a(k);
    end
    for j = k+1:n
        a(j) = a(j) / (x(j) - x(k));
    end
end
% apply the inverse of the diagonal factor D
for k = 1:n-1
    a(k) = a(k) / (x(k) - y(k));
end
a(n) = a(n) * (x(n) - y(n));
% backward pass: apply the inverses of U_{n-1}, ..., U_1 from (3.4)
for k = n-1:-1:1
    for j = k+1:n
        a(j) = a(j) / (y(k) - y(j));
    end
    tmp = 0;
    for j = n:-1:k+1   % accumulate from the last to the first entry
        tmp = tmp + a(j);
    end
    a(k) = a(k) - tmp;
    for j = k:n
        a(j) = a(j) * (x(k) - y(j));
    end
end
return
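As a quick illustration (ours), the solver can be exercised on a totally positive configuration of the nodes, cf. (5.1) below, where the analysis of Section 5 predicts favorable backward errors without any pivoting:

% example driver (ours)
n = 50;
y = -sort(rand(n,1));          % y_n < ... < y_1 < 0
x = sort(rand(n,1)) + 1;       % 1 < x_1 < ... < x_n, so (5.1) holds
f = ones(n,1);
C = 1 ./ (x - y.');
a = quasi_cauchy(x, y, f);
norm(C*a - f) / norm(f)        % residual of the order of the unit roundoff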
The reader should note that the multiplication of the central factor of $U_k^{-1}$ by a vector is performed by accumulating the inner product from the last to the first entry; this order is dictated by the error analysis in the following section.
Algorithm 2.3 is a special case of the more general GKO algorithm [12], which is
applicable to the wider class of Cauchy-like matrices. A normwise rounding
error analysis for the GKO algorithm appeared in [35]. Along with the usual factor |L||U| (cf. with (1.15)), the backward error bound of [35] also involves a so-called generator growth factor of the form
$$
\left\| \operatorname{diag}\!\left( \frac{|g_k^{(k)}|\,|b_k^{(k)}|}{g_k^{(k)}\, b_k^{(k)}} \right)_{k=1}^{n} \right\|. \tag{4.1}
$$
In the context of [12,35] the quantities $g_k^{(k)}$ and $b_k^{(k)}$ were vectors of size equal to the displacement rank of R; so if the quantity in (4.1) is large, then the backward stability of the GKO algorithm could be less favorable than that of Gaussian elimination. However, the ordinary Cauchy matrices considered in the present paper all have displacement rank 1, so that the quantity in (4.1) is unity, suggesting that the backward stability of Algorithm 2.3 is related to that of Gaussian elimination without pivoting.
Theorem 4.2. Assume that Algorithm 3.3 (quasi-Cauchy algorithm) is carried out in floating point arithmetic with unit roundoff u, and that no overflows were encountered during the computation. Then the computed solution â solves a nearby system
$$
(C(x_{1:n}, y_{1:n}) + \Delta C)\,\hat{a} = (L + \Delta L)(D + \Delta D)(U + \Delta U)\,\hat{a} = f
$$
with
$$
|C(x_{1:n}, y_{1:n})\,\hat{a} - f| \le \big((n^2 + 11n - 10)u + O(u^2)\big)\,|L|\,|DU|\,|\hat{a}|, \tag{4.6}
$$
Proof. Let us recall that Algorithm 3.3 solves a Cauchy linear system by computing
$$
a = C(x_{1:n}, y_{1:n})^{-1} f = U_1^{-1} \cdots U_{n-1}^{-1} \cdot D^{-1} \cdot L_{n-1}^{-1} \cdots L_1^{-1} f, \tag{4.9}
$$
where the $\{L_i, U_i\}$ are given by (3.4). The proof for (4.7) will be obtained in the following steps:
(i) First we apply the standard error analysis for each elementary matrix–vector multiplication in (4.9) to show that the computed solution â satisfies
$$
\hat{a} = (U_1^{-1} * \delta U_1) \cdots (U_{n-1}^{-1} * \delta U_{n-1}) \cdot (D^{-1} * \delta D) \cdot (L_{n-1}^{-1} * \delta L_{n-1}) \cdots (L_1^{-1} * \delta L_1) f, \tag{4.10}
$$
where the asterisk ∗ denotes the Hadamard (or componentwise) product.
(ii) Next, the obtained bounds for $\delta L_k$, $\delta D$, $\delta U_k$ will be used to deduce further bounds for $\Delta L_k$, $\Delta D$, $\Delta U_k$, defined by
$$
(L_k^{-1} * \delta L_k)^{-1} = L_k + \Delta L_k, \qquad (D^{-1} * \delta D)^{-1} = D + \Delta D, \qquad (U_k^{-1} * \delta U_k)^{-1} = U_k + \Delta U_k.
$$
5. Pivoting
In the previous section we established that the backward stability of all the fast
Cauchy solvers suggested in the present paper is related to that of Gaussian elimina-
tion. This analogy will allow us to carry over the stabilizing techniques of Gaussian
elimination to the new Cauchy solvers. First, however, we identify the case when no
pivoting is necessary.
Assume that
$$
y_n < \cdots < y_1 < x_1 < \cdots < x_n, \tag{5.1}
$$
i.e., that the matrix $C(x_{1:n}, y_{1:n})$ is totally positive; then all the entries of the exact factors L and U are positive [9]. In this case Theorems 4.1 and 4.2 imply that Algorithms 2.5 and 3.3 produce a favorably small backward error.
Corollary 5.1. Assume that condition (5.1) holds, i.e., that $C(x_{1:n}, y_{1:n})$ is totally positive, and assume that Algorithm 2.5 (GS-direct algorithm) and Algorithm 3.3 (quasi-Cauchy algorithm) are performed in floating point arithmetic with unit roundoff u, and that no overflows were encountered during the computation.
If the triangular factorization of the GS-direct algorithm is used to solve the associated linear system, then the computed solution â solves a nearby system
$$
(C(x_{1:n}, y_{1:n}) + \Delta C)\,\hat{a} = f,
$$
with
$$
|\Delta C| \le \big((10n - 3)u + O(u^2)\big)\, C(x_{1:n}, y_{1:n}). \tag{5.2}
$$
The analogous backward bound for the quasi-Cauchy algorithm is
$$
|\Delta C| \le \big((n^2 + 11n - 10)u + O(u^2)\big)\, C(x_{1:n}, y_{1:n}). \tag{5.3}
$$
The above results show that the backward stability of the fast Algorithms 2.5 and 3.3 for totally positive Cauchy matrices is even more favorable than that of the slow Gaussian elimination procedure, see (1.16). Indeed, the difference is that the bound
(1.16) is valid only for the case when the entries of the computed factors L̂ and Û
remain positive (which is usually not the case with ill-conditioned matrices), whereas
the favorable bounds in the two above corollaries hold while there are no overflows.
For example, for the Hilbert matrix
$$
H = \left[ \frac{1}{i + j - 1} \right]_{1 \le i,j \le n}
$$
the condition number κ₂(H) grows exponentially with the size, so already for small n we have κ₂(H) > q(n)(1/u), where q(n) is a polynomial of small degree in n. Then, in accordance with [39], the matrix H will likely lose during the elimination not only its total positivity, but also the weaker property of being positive definite. Correspondingly, the single precision LAPACK routine SPOSV for Cholesky factorization, when applied to the Hilbert matrix, exits with an error flag already for n = 9, warning that the entries of L̂, Û became negative, so the pleasing backward bound (1.16) is no longer valid for Gaussian elimination. In contrast, the favorable bounds (5.2), (5.3) remain valid for larger sizes, as long as there are no overflows.
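In this connection note that the Hilbert matrix is itself a totally positive Cauchy matrix: with x_i = i and y_j = 1 − j one has 1/(x_i − y_j) = 1/(i + j − 1), and these nodes satisfy (5.1). A small script (ours) exercising the quasi-Cauchy sketch from Section 3 on it:

% Hilbert matrix as a Cauchy matrix (script ours)
n = 12;
x = (1:n).';  y = (1 - (1:n)).';   % H(i,j) = 1/(x(i) - y(j)) = 1/(i+j-1)
f = ones(n,1);
a = quasi_cauchy(x, y, f);
norm(hilb(n)*a - f) / norm(f)      % small residual despite huge kappa_2(H)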
Here we assume that the two sets of nodes {x_k} and {y_k} are not separated from each other. The similarity of the backward bounds (1.15) for Gaussian elimination and of (4.2), (4.7) for the new Cauchy solvers suggests using the same pivoting techniques for preventing instability. More precisely, any row or column reordering that reduces the size of |L||U| appearing in the bounds (4.2), (4.7) will stabilize the numerical performance of Algorithms 2.5 and 3.3. Moreover, the normwise error analysis of [35] for Algorithm 2.3, reviewed at the beginning of Section 4, also indicates that pivoting will enhance the accuracy of Algorithm 2.3.
Here we should note that the partial pivoting technique can be directly incorporated into the generalized Schur Algorithms 2.3 and 2.5, see, e.g., [12]. However, the corresponding ordering of {x_k} can also be computed in advance in O(n²) flops.
Indeed, the partial pivoting technique determines a permutation matrix P such that at each elimination step the pivot elements $d_k$ in
$$
PR = L \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} U
$$
are as large as possible. Clearly, the determinant of the k × k leading submatrix of R is equal to $d_1 \cdots d_k$, so the objective of partial pivoting is the successive maximization of the determinants of leading submatrices. This observation, and the well-known formula [4] for the determinant of a Cauchy matrix, imply that partial pivoting on $C(x_{1:n}, y_{1:n})$ is equivalent to the successive maximization of the quantities
$$
|d_i| = \left| \frac{\prod_{j=1}^{i-1} (x_i - x_j)\; \prod_{j=1}^{i-1} (y_i - y_j)}{(x_i - y_i)\; \prod_{j=1}^{i-1} (x_i - y_j)\; \prod_{j=1}^{i-1} (x_j - y_i)} \right|, \qquad i = 1, \ldots, n. \tag{5.4}
$$
We shall call this procedure predictive partial pivoting (PPP), because it can be rapidly computed in advance by the following algorithm.
Algorithm 5.2 (predictive partial pivoting).

function x = partial(x,y)
% reorders x into the PPP order, cf. (5.4), in O(n^2) operations,
% without performing the elimination
n = max(size(x));
dist = 0; m = 1; aux = zeros(1,n);
for i = 1:n
    aux(i) = abs(1 / (x(i) - y(1)));
    if dist < aux(i), m = i; dist = aux(i); end
end
x = swap(x,1,m); aux(m) = aux(1);
if n <= 2, return; end
for i = 2:(n-1)
    dist = 0; m = i;
    for j = i:n
        aux(j) = aux(j) * abs((x(j) - x(i-1)) / (x(j) - y(i)));
        if dist < aux(j), m = j; dist = aux(j); end
    end
    x = swap(x,i,m); aux(m) = aux(i);
end
return
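To actually solve a reordered system one also needs the permutation itself, so that the right-hand side can be reordered consistently. The following variant (ours, hypothetical) of Algorithm 5.2 returns the PPP permutation rather than only swapping x in place:

function p = ppp_perm(x, y)
% returns p such that x(p) is the predictive partial pivoting order
x = x(:); y = y(:);
n = length(x); p = (1:n).';
aux = abs(1 ./ (x - y(1)));
for i = 1:n-1
    [~, m] = max(aux(i:n)); m = m + i - 1;
    p([i m]) = p([m i]); x([i m]) = x([m i]); aux([i m]) = aux([m i]);
    if i < n-1   % update the quantities (5.4) for the next step
        aux(i+1:n) = aux(i+1:n) .* abs((x(i+1:n) - x(i)) ./ (x(i+1:n) - y(i+1)));
    end
end

One would then call p = ppp_perm(x, y) and solve on the reordered data, e.g., a = quasi_cauchy(x(p), y, f(p)).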
A similar row reordering technique for Vandermonde matrices (and a fast O(n²) algorithm for achieving it) was proposed in [21], and in [31] it was called Leja ordering. Therefore, PPP may also be called rational Leja ordering, by analogy with the (polynomial) Leja ordering of [21,31].
³ The subroutine swap(x,i,m) in Algorithm 5.2 swaps the ith and mth entries of the vector x.
In a recent paper [18] a variation of complete pivoting was suggested for the more general Cauchy-like matrices. In the context of [18] the corresponding displacement rank is 2 or higher. For the ordinary Cauchy matrices $C(x_{1:n}, y_{1:n})$ (displacement rank 1), Gu's pivoting can be described as follows. At each elimination step one chooses the column of $R_k$ with the maximal magnitude entry $b_m^{(k)}$ in its generator $B_k$ (here we use the notations of Lemma 2.4). Then one interchanges this column with the first one, and performs the partial pivoting step. The explicit expression (2.10) for the entries of the successive generators $B_k$ readily suggests a modification of Algorithm 5.2 that performs Gu's variant of pivoting in advance, leading to what can be called predictive Gu pivoting; a sketch is given below.
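The following sketch (ours) of such a predictive procedure tracks the row and column permutations simultaneously, choosing at each step first the column with the maximal-magnitude generator entry and then the partial pivoting row, using only the recursions (2.10):

function [x, y, px, py] = gu_perm(x, y)
% predictive Gu pivoting (sketch, ours); px, py are the row/column orders
x = x(:); y = y(:);
n = length(x); px = (1:n).'; py = (1:n).';
g = ones(n,1); b = ones(n,1);
for k = 1:n-1
    [~, m] = max(abs(b(k:n))); m = m + k - 1;                    % column choice
    y([k m]) = y([m k]); b([k m]) = b([m k]); py([k m]) = py([m k]);
    [~, m] = max(abs(g(k:n) ./ (x(k:n) - y(k)))); m = m + k - 1; % row choice
    x([k m]) = x([m k]); g([k m]) = g([m k]); px([k m]) = px([m k]);
    % generator recursion (2.10)
    g(k+1:n) = (x(k+1:n) - x(k)) ./ (x(k+1:n) - y(k)) .* g(k+1:n);
    b(k+1:n) = (y(k+1:n) - y(k)) ./ (y(k+1:n) - x(k)) .* b(k+1:n);
end

Since the column interchanges permute the unknowns, a solution ã computed from the reordered system must be scattered back via a(py) = ã.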
6. Numerical illustrations
We performed numerous numerical tests for the three algorithms suggested and analyzed in this paper. The results confirm the theoretical predictions (as perhaps should be expected). In this section we illustrate with just a few examples the influence of different orderings on the numerical performance of the following algorithms:
(a) GS-Cauchy. (Fast O(n²) Algorithm 2.3, requiring O(n²) storage.)
(b) GS-direct-Cauchy. (Fast O(n²) Algorithm 2.5, requiring O(n²) storage.)
(c) quasi-Cauchy. (Fast O(n²) Algorithm 3.3, requiring O(n) storage.)
(d) BKO. (Fast O(n²) algorithm of [2], requiring O(n) storage.)
(e) INV. The use of the explicit inversion formula
$$
C(x_{1:n}, y_{1:n})^{-1} = \left[ \frac{\prod_{k=1}^{n} (x_j - y_k)\; \prod_{k=1}^{n} (x_k - y_i)}{(x_j - y_i)\; \prod_{k \ne j} (x_j - x_k)\; \prod_{k \ne i} (y_k - y_i)} \right]_{1 \le i,j \le n}. \tag{6.1}
$$
(Fast O(n²) algorithm requiring O(n) storage.)
(f) GEPP. Gaussian elimination with partial pivoting. (Slow O(n³) algorithm, requiring O(n²) storage.)
We refer to [2] for the discussion and computed examples related to the important
case of totally positive Cauchy matrices, and restrict ourselves here to the generic
case in which the two sets {xk } and {yk } cannot be separated from each other, so that
they cannot be reordered to achieve (1.8). We solved various Cauchy linear systems
$$
Ca = f \tag{6.2}
$$
(with interlaced ($x_1 < y_1 < x_2 < y_2 < \cdots < x_n < y_n$), equidistant, clustered, or randomly distributed nodes, and with many other configurations) with different right-hand sides (RHS) f. We also solved the so-called Cauchy–Toeplitz linear systems with coefficient matrices of the form
$$
C = \left[ \frac{1}{a + (i-j)b} \right]_{1 \le i,j \le n} \tag{6.3}
$$
with different choices for the parameters a and b. All the experiments were performed on a DEC 5000/133 RISC workstation in single precision (unit roundoff ≈ 1.19 × 10⁻⁷). For GEPP we used the LAPACK routine SGESV, and all the other algorithms were implemented in C. In order to check the accuracy we implemented all the above algorithms in double precision (unit roundoff ≈ 2.22 × 10⁻¹⁶), and in each example we determined the two particular algorithms providing solutions closest to each other. In all cases these two solutions agreed in more than the eight significant digits needed to check the accuracy of a solution obtained in single precision, so we regarded one of these double precision solutions â_d as exact, and used it to compute the 2-norm relative error
$$
e_i = \frac{\|\hat{a}_i - \hat{a}_d\|_2}{\|\hat{a}_d\|_2}
$$
for the solutions â_i computed by each of the above algorithms. In addition we computed the residual errors
$$
r_i = \frac{\|f - C\,\hat{a}_i\|_2}{\|f\|_2}
$$
and the backward errors
$$
b_i = \min\left\{ \frac{\|\Delta A\|_2}{\|A\|_2} \,:\, (A + \Delta A)\,\hat{a}_i = f \right\}
$$
using the formula
$$
b_i = \frac{\|f - C\,\hat{a}_i\|_2}{\|C\|_2 \cdot \|\hat{a}_i\|_2},
$$
a result probably first shown by Wilkinson, see, e.g., [22]. Tables 1–6 also display the condition number κ₂(C) of the coefficient matrix, the norms of the solution ‖â_d‖₂ and of the right-hand side ‖f‖₂, as well as some other useful information.
Table 1
Forward error: partial pivoting ordering

Table 2
Forward error: monotonic ordering
Table 3
Backward error: partial pivoting ordering

n    max_ij(|L||DU|)  max_ij(C)  INV (b̂1)  BKO (b̂2)  quasi-Cauchy (b̂3)  GS-Cauchy (b̂4)  GS-direct-Cauchy (b̂5)  GEPP (b̂6)
10   2e+00            1.0e+00    6e−08      1e−07      6e−08               6e−08            4e−08                   5e−08
50   2e+00            1.0e+00    1e−07      2e−01      8e−08               1e−07            2e−07                   1e−07
100  2e+00            1.0e+00    3e−07      1e+00      1e−07               1e−07            2e−07                   1e−07
Table 4
Backward error: monotonic ordering

n    max_ij(|L||DU|)  max_ij(C)  INV (b̂1)  BKO (b̂2)  quasi-Cauchy (b̂3)  GS-Cauchy (b̂4)  GS-direct-Cauchy (b̂5)
5    6e+02            1.0e+00    6e−08      2e−07      3e−06               1e−06            2e−06
10   3e+06            1.0e+00    7e−08      2e−05      1e−03               7e−03            6e−03
20   1e+14            1.0e+00    1e−07      6e−02      1e+00               1e+00            1e+00
30   4e+21            1.0e+00    1e−07      1e+00      1e+00               1e+00            1e+00
50   6e+36            1.0e+00    2e−07      1e+00      NaN                 NaN              NaN
60   2e+44            1.0e+00    NaN        1e+00      NaN                 NaN              NaN
Table 5
Residual error: partial pivoting ordering

n    max_i(|L||DU||â_d|)  max_ij(C)  INV (r1)  BKO (r2)  quasi-Cauchy (r3)  GS-Cauchy (r4)  GS-direct-Cauchy (r5)  GEPP (r6)
10   9e+00                1e+00      1e−07     3e−07     1e−07              1e−07           9e−08                  1e−07
50   2e+01                1e+00      4e−07     5e−01     2e−07              3e−07           4e−07                  3e−07
100  3e+01                1e+00      8e−07     3e+10     3e−07              3e−07           5e−07                  3e−07
Table 6
Residual error: monotonic ordering

n    max_i(|L||DU||â_d|)  max_ij(C)  INV (r1)  BKO (r2)  quasi-Cauchy (r3)  GS-Cauchy (r4)  GS-direct-Cauchy (r5)
5    9e+02                1e+00      1e−07     3e−07     6e−06              3e−06           4e−06
10   1e+07                1e+00      2e−07     4e−05     3e−03              2e−02           1e−02
20   7e+14                1e+00      3e−07     1e−01     2e+05              8e+04           1e+05
30   3e+22                1e+00      3e−07     2e+04     7e+16              1e+18           4e+16
50   7e+37                1e+00      4e−07     3e+20     NaN                NaN             NaN
60   3e+45                1e+00      NaN       9e+26     NaN                NaN             NaN
Table 7
Forward error: partial pivoting ordering
Table 8
Forward error: Gu's pivoting

60   2e−03   7e+01   8e−04   1e−04   3e−04
80   6e−03   4e+05   4e−03   3e−04   2e−03
100  1e−02   2e+09   5e−03   5e−04   4e−03
Table 9
Forward error: no pivoting

60   5e−03   1e+02   9e−04   2e−04   7e−05
80   3e−02   2e+05   1e−03   5e−04   1e−04
100  2e−02   1e+09   7e−04   6e−04   2e−04
Table 10
Backward error: partial pivoting ordering

n    max_ij(|L||DU|)  max_ij(C)  INV     BKO    quasi-Cauchy  GS-Cauchy  GS-direct-Cauchy  GEPP
60   1e+03            1.0e+01    4e−04   1e+00  2e−07         3e−07      3e−07             4e−07
80   1e+03            1.0e+01    1e−03   NaN    4e−07         4e−07      4e−07             7e−07
100  2e+03            1.0e+01    6e−03   NaN    6e−07         6e−07      7e−07             1e−06
Table 11
Backward error: Gu's pivoting

60   1e+04   2e−04   1e+00   9e−05   1e−04   5e−05
80   3e+04   2e−03   1e+00   2e−04   3e−04   1e−04
100  4e+04   2e−03   1e+00   5e−04   5e−04   2e−04
of the structure often allows us not only to speed up the computations, but also to achieve more accuracy, as compared to general structure-ignoring methods.
It may seem quite unexpected that for Cauchy–Toeplitz matrices Gu's pivoting technique (combining row and column permutations) can lead to less accurate solutions than the PPP technique (based on row permutations only). To understand this occurrence it is useful to observe that the entries of the diagonals of Cauchy–Toeplitz matrices $\left[\frac{1}{a + b(i-j)}\right]$ depend hyperbolically on the difference (i − j), thus giving a peak for the diagonal with (i − j) ≈ −a/b. We next display the MATLAB graphs for several permuted versions of the matrix in Example 2 for n = 10 (cf. Fig. 1).
One sees that in the original matrix $C(x_{1:10}, y_{1:10})$ the maximal magnitude entries (all equal to 10) occupy the fourth subdiagonal (i.e., they lie in the lower triangular part of the matrix). Applying the partial pivoting technique means moving each of the rows 4–10 three positions up, so that the maximal magnitude entries are now all located on the main diagonal. In Table 13 we list the condition numbers for the k × k leading submatrices corresponding to the three pivoting techniques.
We note, however, that the motivation for introducing Gu's pivoting technique was given in [18], where an application of [12] with displacement rank 2 or higher was discussed.
The second example involves the Cauchy–Toeplitz matrix (6.3) with a = 1 and b = 0.3. For such a matrix the maximal magnitude entries are now located above the main diagonal. Therefore it is reasonable to apply a partial column pivoting technique. As in the above example, we next display the permuted versions of the matrix, corresponding to the different pivoting techniques (cf. Fig. 2).
In Table 14 we list the corresponding condition numbers for all successive leading submatrices.
Table 12
Backward error: no pivoting

60   1e+04   1e−03   1e+00   1e−04   2e−04   7e−05
80   3e+04   4e−03   1e+00   4e−04   5e−04   1e−04
100  5e+04   2e−03   1e+00   4e−04   6e−04   2e−04
Table 13
Conditioning of leading submatrices
We now turn to the numerical results comparing the performance of the algorithms designed in the present paper. We again used the vector $f = [\,1\;\; 1\;\; \cdots\;\; 1\,]^T$ for the right-hand side.
Forward error. See Tables 15–18.
In this example the forward accuracy of the GS-Cauchy algorithm is better than
that of the GS-direct-Cauchy and quasi-Cauchy algorithms. Note that there are many
other examples, however, where these algorithms have roughly the same accuracy.
Backward error. It turns out that for many different orderings all the algorithms designed in this paper exhibit favorable backward stability. Moreover, for n varying from 5 to 100, for partial row pivoting, partial column pivoting, Gu's pivoting, and for no pivoting, the algorithms GS-Cauchy, GS-direct-Cauchy and quasi-Cauchy produced backward errors of the order of 10⁻⁸, which is comparable to that of GEPP. We, however, found that monotonic ordering, defined in Section 6.1, and randomized ordering produce poor results.
This indicates that the analytical error bounds obtained for the fast algorithms of this paper may in fact lead to a wide variety of different pivoting techniques, each aimed at the reduction of the quantity |L| · |U|.
Table 14
Conditioning of leading submatrices
7. Transformation of Vandermonde and Chebyshev–Vandermonde matrices into Cauchy matrices

In this section we shall show that all the fast Cauchy solvers suggested in the present paper can be used to solve linear systems with polynomial Vandermonde matrices
$$
V_P(x_{1:n}) \stackrel{\mathrm{def}}{=} \begin{bmatrix} P_0(x_1) & P_1(x_1) & \cdots & P_{n-1}(x_1) \\ P_0(x_2) & P_1(x_2) & \cdots & P_{n-1}(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ P_0(x_n) & P_1(x_n) & \cdots & P_{n-1}(x_n) \end{bmatrix}, \tag{7.1}
$$
where $P = \{P_0(x), \ldots, P_{n-1}(x)\}$ denotes a basis in the linear space $\mathbb{C}_{n-1}[x]$ of all complex polynomials whose degree does not exceed n − 1. When P is the power basis, $V_P(x_{1:n})$ is the ordinary Vandermonde matrix. If P stands for the basis of Chebyshev polynomials (of the first or of the second kind), then $V_P(x_{1:n})$ is called a Chebyshev–Vandermonde matrix. Fast O(n²) algorithms for solving Chebyshev–Vandermonde systems were suggested in [7,16,20,23]. Here we suggest an alternative, based on the following result.
Table 16
Forward error: partial column pivoting

Table 17
Forward error: Gu's pivoting

Table 18
Forward error: no pivoting
Observe that formula (7.2) relates VP (x1:n ) and VP (y1:n ), or, in other words, it
allows us to change the nodes from {xk } to {yk }, while keeping the polynomial ba-
sis P = {P0 , . . . , Pn−1 }. Suitable choices of the new points {y1:n } can ensure that
VP (y1:n ) has low complexity. In such cases Proposition 7.1 allows us to reduce the
problem of solving a linear system with VP (x1:n ) to the analogous problem of solving
a linear system with the Cauchy matrix C(x1:n , y1:n ).
In the following proposition we specify several sets of points {y1:n } for which
ordinary Vandermonde matrices and Chebyshev–Vandermonde matrices have low
complexities.
Proposition 7.2.
1. Let $y_j = \cos\!\big(\frac{j\pi}{n}\big)$, $j \in \{0, \ldots, n\}$, be the extrema of $T_n(x)$. Then
$$
V_T(y_{0:n}) = \left[ \cos\!\left( \frac{jk\pi}{n} \right) \right]_{j,k=0}^{n} \tag{7.3}
$$
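This is immediate from $T_k(\cos\theta) = \cos k\theta$. A quick numerical check (ours), generating the columns of $V_T(y_{0:n})$ by the three-term recurrence $T_{k+1}(x) = 2x\,T_k(x) - T_{k-1}(x)$:

% check of (7.3), script ours
n = 8;
j = (0:n).'; y = cos(j*pi/n);             % extrema of T_n
T = zeros(n+1); T(:,1) = 1; T(:,2) = y;   % columns: T_0, T_1, ...
for k = 2:n
    T(:,k+1) = 2*y.*T(:,k) - T(:,k-1);
end
norm(T - cos(j*(0:n)*pi/n))               % ~ machine precision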
The latter proposition is easily deduced from the definitions of Chebyshev poly-
nomials. Before proving Proposition 7.1, let us introduce the necessary notations.
Let n denote the maximal degree of the two polynomials u(x) and v(x). The bivariate function
$$
B_{u,v}(x, y) = \frac{u(x) \cdot v(y) - u(y) \cdot v(x)}{x - y}
$$
is called the Bezoutian of u(x) and v(x). Now, let $Q = \{Q_0(x), \ldots, Q_{n-1}(x)\}$ be another basis in the linear space $\mathbb{C}_{n-1}[x]$. The matrix $B_{\{P,Q,u,v\}} = [b_{ij}]_{i,j=0}^{n-1}$, whose entries are determined by
$$
B_{u,v}(x, y) = \sum_{i,j=0}^{n-1} b_{ij} \cdot P_i(x) \cdot Q_j(y) = \begin{bmatrix} P_0(x) & P_1(x) & \cdots & P_{n-1}(x) \end{bmatrix} B_{\{P,Q,u,v\}} \begin{bmatrix} Q_0(y) \\ Q_1(y) \\ \vdots \\ Q_{n-1}(y) \end{bmatrix}, \tag{7.8}
$$
is called the Bezout matrix of u(x) and v(x) with respect to the two sets of polynomials P and Q.
Proof of Proposition 7.1. The proof is based on the following useful property of the Bezout matrix:
$$
V_P(x_{1:n})\, B_{\{P,Q,u,v\}}\, V_Q(y_{1:n})^T = \left[ B_{u,v}(x_i, y_j) \right]_{i,j=1}^{n}, \tag{7.9}
$$
which follows immediately from (7.8). It is easy to see that the matrix on the right-hand side of Eq. (7.9) is a quasi-Cauchy matrix:
$$
V_P(x_{1:n})\, B_{\{P,Q,u,v\}}\, V_Q(y_{1:n})^T = \left[ \frac{u(x_i)\, v(y_j)}{x_i - y_j} \right]_{i,j=1}^{n} = \operatorname{diag}\{u(x_1), \ldots, u(x_n)\}\; C(x_{1:n}, y_{1:n})\; \operatorname{diag}\{v(y_1), \ldots, v(y_n)\}. \tag{7.10}
$$
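The middle equality in (7.10) rests on the products $u(y_j)\,v(x_i)$ vanishing; a minimal numerical check (ours), under the assumption that u is chosen to vanish at the nodes {y_j} (one sufficient choice):

% script ours: Bezoutian values form a quasi-Cauchy matrix when u(y_j) = 0
n = 5;
x = randn(n,1); y = randn(n,1);
u = poly(y);                  % u(t) = prod(t - y_k), so u(y_j) = 0
v = poly(randn(n,1));         % an arbitrary monic polynomial of degree n
B = (polyval(u,x).*polyval(v,y.') - polyval(u,y.').*polyval(v,x)) ./ (x - y.');
Q = polyval(u,x) .* polyval(v,y.') ./ (x - y.');   % right-hand side of (7.10)
norm(B - Q)                   % ~ machine precision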
8. Conclusion
In the generic case the numerical performance of the BP-type algorithm can be less favorable. Since the use of the explicit inversion formula for Cauchy matrices can also produce a large backward error, no fast and accurate methods were available for solving Cauchy linear equations. In this paper we designed several alternative fast O(n²) Cauchy solvers, and the rounding error analysis suggests that their backward stability is similar to that of Gaussian elimination (GE), so that various pivoting techniques (so successful for GE) will stabilize the numerical behavior of these new algorithms as well. It is further shown that the row ordering of partial pivoting and of Gu's pivoting [18] can be achieved in advance, without actually performing the elimination, and fast O(n²) algorithms for these purposes are suggested. We also identified a class of totally positive Cauchy matrices for which it is advantageous not to pivot when using the new algorithms, which yields a remarkable backward stability. This matches the conclusion of de Boor and Pinkus [8], who suggested avoiding pivoting when performing standard Gaussian elimination on totally positive matrices. Analytical error bounds and results of numerical experiments indicate that the methods suggested in the present paper enjoy favorable backward stability.
Most of the results of this paper and of [2] have been available since 1994 as an ISL report at Stanford University [1], and they were reported at several conferences. There seems to be further interest in connections between accuracy and total positivity [40,41].
References
[1] T. Boros, T. Kailath, V. Olshevsky, Fast algorithms for solving Cauchy linear systems, Stanford
Information Systems Laboratory Report, 1994.
[2] T. Boros, T. Kailath, V. Olshevsky, Fast Björck–Pereyra-type algorithm for parallel solution of Cau-
chy linear equations, Linear Algebra Appl. 302–303 (1999) 265–293.
[3] A. Björck, V. Pereyra, Solution of Vandermonde systems of equations, Math. Comp. 24 (1970)
893–903.
[4] A.L. Cauchy, Mémoires sur les fonctions alternées et sur les sommes alternées, Exercices d’analyse
et de Phys. Math. ii (1841) 151–159.
[5] T. Chan, D. Foulser, Effectively well-conditioned linear systems, SIAM J. Sci. Statist. Comput. 9
(1988) 963–969.
[6] D. Calvetti, L. Reichel, A Chebyshev–Vandermonde solver, Linear Algebra Appl. 172 (1992) 219–
229.
[7] D. Calvetti, L. Reichel, Fast inversion of Vandermonde-like matrices involving orthogonal polyno-
mials, BIT (1993).
[8] C. de Boor, A. Pinkus, Backward error analysis for totally positive linear systems, Numer. Math. 27
(1977) 485–490.
[9] F.R. Gantmacher, M.G. Krein, Oscillatory Matrices and Kernels, and Small Vibrations of Mechani-
cal Systems, second ed., GITTL, Moscow, 1950 (in Russian); German translation: Oszillationsmatr-
izen, Oszillationskerne und kleine Schwingungen mechanischer Systeme, Akademie Verlag, Berlin,
1960.
[10] I. Gohberg, I. Koltracht, On the inversion of Cauchy matrices, in: M.A. Kaashoek, J.H. van Schup-
pen, A.C.M. Ran (Eds.), Signal processing, Scattering and Operator Theory, and Numerical Meth-
ods, Proceedings of the MTNS-89, Birkhäuser, Boston, MA, 1990, pp. 381–392.
[11] I. Gohberg, I. Koltracht, Mixed, componentwise and structured condition numbers, SIAM J. Matrix
Anal. 14 (1993) 688–704.
[12] I. Gohberg, T. Kailath, V. Olshevsky, Fast Gaussian elimination with partial pivoting for matrices
with displacement structure, Math. Comp. 64 (1995) 1557–1576.
[13] I. Gohberg, V. Olshevsky, Fast state space algorithms for matrix Nehari and Nehari–Takagi interpo-
lation problems, Integral Equations Operator Theory 20 (1) (1994) 44–83.
[14] I. Gohberg, V. Olshevsky, Fast algorithms with preprocessing for matrix–vector multiplication prob-
lems, J. Complexity 10 (1994) 411–427.
[15] I. Gohberg, V. Olshevsky, Complexity of multiplication with vectors for structured matrices, Linear
Algebra Appl. 202 (1994) 163–192.
[16] I. Gohberg, V. Olshevsky, Fast inversion of Chebyshev–Vandermonde matrices, Numer. Math. 67
(1) (1994) 71–92.
[17] G. Golub, C. Van Loan, Matrix Computations, second ed., Johns Hopkins University Press, Baltimore, MD, 1989.
[18] M. Gu, Stable and efficient algorithms for structured linear systems, 1995, preprint.
[19] N.J. Higham, Error analysis of the Björck–Pereyra algorithms for solving Vandermonde systems,
Numer. Math. 50 (1987) 613–632.
[20] N.J. Higham, Fast solution of Vandermonde-like systems, involving orthogonal polynomials, IMA
J. Numer. Anal. 8 (1988) 473–486.
[21] N.J. Higham, Stability analysis of algorithms for solving confluent Vandermonde-like systems,
SIAM J. Matrix Anal. 11 (1) (1990) 23–41.
[22] N.J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, PA, 1996.
[23] G. Heinig, W. Hoppe, K. Rost, Structured matrices in interpolation and approximation problems,
Wiss. Z. Tech. Univ. Karl-Marx-Stadt. 31 (2) (1989) 196–202.
[24] G. Heinig, K. Rost, Algebraic Methods for Toeplitz-like Matrices and Operators, Operator Theory, Birkhäuser, Basel, 1984.
[25] S. Karlin, Total Positivity, Stanford University Press, Stanford, 1972.
[26] T. Kailath, S. Kung, M. Morf, Displacement ranks of matrices and linear equations, J. Math. Anal.
Appl. 68 (1979) 395–407.
[27] T. Kailath, V. Olshevsky, Displacement structure approach to Chebyshev–Vandermonde and related
matrices, Integral Equations Operator Theory 22 (1995) 65–92.
[28] T. Kailath, V. Olshevsky, Diagonal pivoting for partially reconstructable Cauchy-like matrices, with
applications to Toeplitz-like linear equations and to boundary rational matrix interpolation problems,
Linear Algebra Appl. 254 (1997) 251–302.
[29] T. Kailath, A.H. Sayed, Displacement structure: theory and applications, SIAM Rev. 37 (3) (1995)
297–386.
[30] H. Lev-Ari, T. Kailath, Triangular factorization of structured Hermitian matrices, in: I. Gohberg
(Ed.), Operator Theory: Advances and Applications, vol. 18, Birkhäuser, Boston, MA, 1986, pp.
301–324.
[31] L. Reichel, Newton interpolation at Leja points, BIT 30 (1990) 23–41.
[32] L. Reichel, G. Opfer, Chebyshev–Vandermonde systems, Math. Comp. 57 (1991) 703–721.
[33] I. Schur, Über Potenzreihen die im Inneren des Einheitskreises beschränkt sind, J. Reine Angew.
Math. 147 (1917) 205–232 [English translation: in: I. Gohberg (ed.), Operator Theory: Advances
and Applications, vol. 18, Birkhäuser, Boston, MA, 1986, pp. 31–88].
[34] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis, Springer, New York, 1980.
[35] D. Sweet, R. Brent, Error analysis of a partial pivoting method for structured matrices, in: Advanced Signal Processing Algorithms, Proceedings of the SPIE, vol. 2563, 1995, pp. 266–280.
[36] W. Tang, G. Golub, The block decomposition of a Vandermonde matrix and its applications, BIT 21 (1981) 505–517.
[37] E. Tyrtyshnikov, How bad are Hankel matrices, Numer. Math. 67 (2) (1994) 261–269.
[38] J.M. Varah, Errors and perturbations in Vandermonde systems, IMA J. Numer. Anal. 13 (1993)
1–12.
[39] J. Wilkinson, A priori error analysis of algebraic processes, in: Proceedings of the International Congress of Mathematicians (Moscow, 1966), Mir, Moscow, 1968, pp. 629–639.
[40] D. Calvetti, L. Reichel, Factorization of Cauchy matrices, J. Comput. Appl. Math. 86 (1) (1997)
103–123.
[41] J.J. Martinez, J.M. Pena, Factorizations of Cauchy–Vandermonde matrices, Linear Algebra Appl.
284 (1998) 229–237.