Fast Direct Methods Toeplitz
ABSTRACT: Least squares estimations have been used extensively in many applications, e.g. system identification and signal prediction. In these applications, the least squares estimators can usually be found by solving Toeplitz least squares problems. In this paper, we present fast algorithms for solving the Toeplitz least squares problems. The algorithm is derived by using the displacement representation of the normal equations matrix. Numerical experiments show that these algorithms are efficient.

0-7803-3679-8/96/$5.00 © 1996 IEEE

1. INTRODUCTION

In signal processing, system identification and image processing applications, one encounters various forms of structured matrices. An m-by-n matrix $T_{m,n}$ is said to be Toeplitz if

$$
T_{m,n} = \begin{bmatrix}
t_0 & t_{-1} & \cdots & t_{2-n} & t_{1-n} \\
t_1 & t_0 & \cdots & t_{3-n} & t_{2-n} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
t_{m-2} & t_{m-3} & \cdots & t_{m-n} & t_{m-n-1} \\
t_{m-1} & t_{m-2} & \cdots & t_{m-n+1} & t_{m-n}
\end{bmatrix}, \qquad (1)
$$

i.e., $T_{m,n}$ is constant along its diagonals. In this paper, we are interested in solving Toeplitz least squares problems.

Toeplitz least squares problems arise in a variety of applications in signal processing. In these applications, one usually uses filters to estimate the transmitted signal from a sequence of received signal samples or to model an unknown system. It is shown in [9] that, given $m + n - 1$ data samples and a desired response vector $d$ of length $m$ ($m \ge n$), the filter $w$ with $n$ filter coefficients can be found by solving the Toeplitz least squares problem:

$$
\min_{w} \| d - T_{m,n} w \|_2. \qquad (2)
$$

Here $\|\cdot\|_2$ denotes the usual Euclidean norm. Fast algorithms for solving Toeplitz least squares problems have been developed by Bojanczyk et al. [2], Chun et al. [5] and Sweet [13] that solve (2) in $O(mn)$ operations as opposed to the $O(mn^2)$ operations required for general dense least squares problems. The main aim of this paper is to present a new fast algorithm for solving the Toeplitz least squares problems where the rectangular Toeplitz matrix has full column rank. Our procedure is first to obtain the displacement representation of the n-by-n normal equations matrix of the Toeplitz least squares problems. Then we transform the normal equations matrix into a special structured matrix (Cauchy-like matrix) via fast Fourier transforms (FFTs) and solve the resulting n-by-n Cauchy-like linear system using Gaussian Elimination with a pivoting technique. The Cauchy-like linear system can be solved in $O(n^2)$ operations via forward and back-substitution. Hence the Toeplitz least squares problems can be solved in $O(n^2 + m \log n)$ operations. The paper is organized as follows. In §2, we review some definitions and results on displacement representations of Toeplitz matrices. In §3, we consider the fast Gaussian Elimination solver for Cauchy-like linear systems and then present our fast algorithm. Finally, some numerical results are given in §4.

2. DISPLACEMENT STRUCTURE

In this section we briefly review relevant definitions and results on the displacement structure representation of a matrix. The Stein type displacement equation for a matrix $A_n \in \mathbb{C}^{n \times n}$ is

$$
A_n - \Omega_n A_n \Lambda_n = B_{n,\alpha} C_{\alpha,n}, \qquad (3)
$$

where $\Omega_n, \Lambda_n \in \mathbb{C}^{n \times n}$, $B_{n,\alpha} \in \mathbb{C}^{n \times \alpha}$ and $C_{\alpha,n} \in \mathbb{C}^{\alpha \times n}$. The pair of matrices $B_{n,\alpha}$ and $C_{\alpha,n}$ is called the generator of $A_n$ with respect to $\Omega_n$ and
$\Lambda_n$. The matrix $A_n$ is considered to possess a displacement structure with respect to $\Omega_n$ and $\Lambda_n$ if $n \gg \alpha$. The scalar $\alpha$ is called the displacement rank of $A_n$ with respect to $\Omega_n$ and $\Lambda_n$. The advantage of using the displacement representation is that all the information about the $n^2$ entries of the matrix $A_n$ is efficiently stored in the $2\alpha n$ entries of $B_{n,\alpha}$ and $C_{\alpha,n}$. The concept of displacement structure was first introduced in Kailath et al. [10].

Let us introduce the n-by-n lower shift circulant matrix $Z_n$:

$$
Z_n = \begin{bmatrix}
0 & 0 & \cdots & 0 & 1 \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \ddots & & \vdots \\
\vdots & & \ddots & 0 & 0 \\
0 & \cdots & 0 & 1 & 0
\end{bmatrix}.
$$

[...]

$$
T_n - Z_n T_n Z_n^* = [\text{right-hand side not recoverable}] \qquad (4)
$$

[...] where $C_n$ is the minimizer of $\|T_n - Q_n\|_F$ over all n-by-n circulant matrices $Q_n$ and the entries of the matrix $C_n$ can be generated from the generator of $T_n$ on the right-hand side of (4). Since any circulant matrix can be characterized by its first column, we only need to construct the first column of $C_n$. It has been shown in [4] that the first column of $C_n$ can be constructed in $O(n)$ operations.

In this paper, we are interested in solving the Toeplitz least squares problems stated in (2). We note that $T_{m,n}^* T_{m,n}$ is in general not a Toeplitz matrix. However, using the displacement structure of $T_{m,n}$, we can write down the displacement equation of $T_{m,n}^* T_{m,n}$ with respect to the displacement operator $Z_n$:

$$
T_{m,n}^* T_{m,n} - Z_n T_{m,n}^* T_{m,n} Z_n^* = E_{n,6} J_6 E_{n,6}^*, \qquad (5)
$$

where the matrix $E_{n,6}$ is given by

[definition of $E_{n,6}$, equations (6), (7) and (8), not recoverable]

and the matrix $J_6$ is given by

$$
J_6 = \operatorname{diag}(1, -1, 1, -1, 1, -1).
$$

We note from (6), (7) and (8) that the numbers defined there are just convolutions of $\{t_i\}$ and therefore they can be computed efficiently using FFTs.

As in the case of the Toeplitz matrix, we cannot recover the entries of $T_{m,n}^* T_{m,n}$ using only its generator on the right-hand side of (5). In this case, the normal equations matrix $T_{m,n}^* T_{m,n}$ can be decomposed into two parts, i.e.,
[...] (9)

where $X_n$ is the minimizer of $\|T_{m,n}^* T_{m,n} - Q_n\|_F$ over all n-by-n circulant matrices $Q_n$ and the entries of the other part can be generated from the generator of $T_{m,n}^* T_{m,n}$. In [3], Chan et al. proved that the circulant matrix $X_n$ can be generated in $O(n \log n)$ operations. In the next section, we make use of (5) and (9) to derive a fast algorithm to solve the Toeplitz least squares problems.

3. FAST ALGORITHMS

Our method is based on a fast algorithm for solving a special structured matrix system proposed by Gohberg et al. [6] and Kailath et al. [11]. We first consider a special class of structured Hermitian matrices $R_n$. We choose the displacement operators

$$
\Omega_n = \Lambda_n^* = D_n = \operatorname{diag}(d_1, d_2, \ldots, d_n)
$$

in (3). Since $R_n$ is Hermitian, we have

$$
R_n - D_n R_n D_n^* = B_{n,\alpha} J_\alpha B_{n,\alpha}^*, \qquad (10)
$$

where $B_{n,\alpha}$ is an n-by-$\alpha$ matrix and $J_\alpha$ is an $\alpha$-by-$\alpha$ matrix. A matrix with low displacement rank ($n \gg \alpha$) is called a Cauchy-like matrix. It has been shown in [11] that the diagonal entries of the Cauchy-like matrix cannot be recovered from its generator $B_{n,\alpha}$. More precisely, a Cauchy-like matrix $R_n$ that satisfies (10) is decomposed as

$$
R_n = S_n + s_n,
$$

[definitions of the two parts and equation (11) not recoverable]

The displacement structure of the Cauchy-like matrix that satisfies (3) with displacement operators $D_n$ is inherited by its Schur complement. This fact allows one to avoid the expensive computation of the $(n-1)^2$ entries of the Schur complement in (11), and to compute only its generator instead. In particular, Kailath and Olshevsky [11] proved the following result.

Theorem 1. Let $R_n$ satisfy (10). Let the matrices $D_n$ and $B_{n,\alpha}$ in (10) be partitioned as

$$
D_n = \begin{bmatrix} D_{11} & 0 \\ 0 & D_{22} \end{bmatrix}, \qquad
B_{n,\alpha} = \begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix}.
$$

If the upper left block $R_{11}$ is invertible, then the Schur complement $R^{(2)} = R_{22} - R_{21} R_{11}^{-1} R_{21}^*$ satisfies

$$
R^{(2)} - D_{22} R^{(2)} D_{22}^* = B^{(2)} J_\alpha (B^{(2)})^*,
$$

with

$$
B^{(2)} = B_{21} - (\tau I - D_{22}) R_{21} R_{11}^{-1} (\tau I - D_{11})^{-1} B_{11}, \qquad (12)
$$

where $\tau$ is any number on the unit circle which is not an eigenvalue of $D_{11}$.

We avoid operations on the $n^2$ entries of a Cauchy-like matrix and only manipulate the entries of its generator. Since the matrix $R_n$ is just Hermitian and is not positive definite, the numerical stability of the Gaussian Elimination may not be achieved using only scalar elimination steps. Sometimes, one has to perform the step with a 2-by-2 block $R_{11}$. The details can be found in [7]. To enhance the accuracy of the factorization, Kailath and Olshevsky [11] also proposed to use diagonal pivoting in the block Gaussian Elimination. By multiplying the displacement equation with a permutation $P_n$, one immediately sees that the permuted matrix $P_n R_n P_n^*$ satisfies a displacement equation of the same form as (10), with $D_n$ replaced by $P_n D_n P_n^*$ and $B_{n,\alpha}$ replaced by $P_n B_{n,\alpha}$.

Cholesky Factorization Algorithm of Cauchy-like Matrix

Step 1: If n = 0, then stop.

Step 2: The size $\nu$ of the pivot $R_{11}$ is chosen to be 1 or 2 to enhance the stability of the algorithm. Perform diagonal pivoting by choosing a suitable permutation matrix $P_n$.

Step 3: The nondiagonal entries of the first column of $R_n$ are given by the formula

$$
r_{ij} = \frac{b_i J_\alpha b_j^*}{1 - d_i d_j^*},
$$

where $b_i$ is the ith row of $B_{n,\alpha}$. The diagonal entries of $R_n$ are stored in the diagonal of $S_n$.
Step 4: The matrix $B^{(2)}$ (the generator of the Schur complement $R^{(2)}$) is computed by the formula (12).

[...]

where $F_n$ is the n-by-n discrete Fourier transform matrix,

$$
D_n = F_n Z_n F_n^* = \operatorname{diag}\left(1, e^{2\pi i/n}, \ldots, e^{2\pi i (n-1)/n}\right),
$$

and

$$
B_{n,6} = F_n E_{n,6}. \qquad (14)
$$

Moreover, the diagonal entries of $S_n$ are given by

$$
[S_n]_{jj} = [\sqrt{n}\, F_n x]_j, \qquad (15)
$$

where $x$ is the first column of the circulant matrix $X_n$ as in (9).

Since the rectangular Toeplitz matrix has full rank, the corresponding normal equations matrix $T_{m,n}^* T_{m,n}$ is Hermitian positive definite and therefore the corresponding Cauchy-like matrix $F_n T_{m,n}^* T_{m,n} F_n^*$ is also Hermitian positive definite. This leads us to use pivots of size 1 in the Cholesky factorization of the Cauchy-like matrix.

Fast Algorithm of Solving Toeplitz Least Squares Problems $\min \|d - T_{m,n} w\|_2$

[steps of the algorithm not recoverable]

Table 1 below lists the computation cost in each step of the above algorithm. The basic tool of our algorithm is the FFT. Since the FFT is highly parallelizable and has been implemented efficiently on multiprocessors [1], our algorithm can be expected to perform efficiently in a parallel environment for large-scale applications. In Table 1, we assume that we have n processors for doing FFTs on an n-vector in $O(\log n)$ operations.

Step | Sequential | Parallel
1 | $O(kn \log n)$ | $O(k \log n)$
[remaining rows and caption of Table 1 not recoverable]

We remark that Gu [8] recently proposed using a fast Gaussian Elimination algorithm for Cauchy-like linear systems to solve the Toeplitz least squares problems with full column rank. However, our approach is different from that proposed by Gu. Gu first transforms the rectangular Toeplitz matrix to a rectangular Cauchy-like matrix. Then the Cauchy-like least squares problem is reduced to two Cauchy-like linear systems, and these Cauchy-like linear systems are solved by the fast Gaussian Elimination algorithm. In our case, we consider the displacement structure of the normal equations matrix $T_{m,n}^* T_{m,n}$ and transform it to a Cauchy-like matrix. The motivation behind our procedure is that we only solve one Cauchy-like linear system and the transformed Cauchy-like matrix is Hermitian positive definite as the rectangular Toeplitz matrix $T_{m,n}$ has full rank.
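The generator update (12) and the entry formula of Step 3 can be checked numerically on a small example. The following is a minimal NumPy sketch, not the authors' implementation. It assumes nodes $d_i$ strictly inside the unit disk, so that every entry of the Cauchy-like matrix (diagonal included) follows from the generator; in the setting of this paper the nodes lie on the unit circle and the diagonal has to be carried separately in $S_n$. The helper name `cauchy_like`, the sizes, and the random generator are illustrative choices only.

```python
import numpy as np

def cauchy_like(B, d, J):
    """Dense matrix with entries r_ij = b_i J b_j^* / (1 - d_i conj(d_j))."""
    return (B @ J @ B.conj().T) / (1.0 - np.outer(d, d.conj()))

rng = np.random.default_rng(1)
n, alpha = 6, 4
J = np.diag([1.0, -1.0, 1.0, -1.0])   # signature matrix J_alpha

# Assumption: |d_i| < 1, so 1 - d_i conj(d_j) never vanishes.
d = 0.9 * rng.uniform(0.2, 1.0, n) * np.exp(2j * np.pi * rng.uniform(size=n))
B = rng.standard_normal((n, alpha)) + 1j * rng.standard_normal((n, alpha))

# Diagonal pivoting: move the largest diagonal entry to position (1, 1).
# Permuting the rows of B and the nodes d keeps the displacement structure.
p = int(np.argmax(np.abs(np.diag(cauchy_like(B, d, J)))))
perm = np.r_[p, np.delete(np.arange(n), p)]
B, d = B[perm], d[perm]
R = cauchy_like(B, d, J)

# One scalar elimination step (pivot size 1), working on the generator only.
tau = 1.0                              # on the unit circle, different from d_1
col = R[1:, 0] / R[0, 0]               # R21 * R11^{-1}
scale = (tau - d[1:]) * col / (tau - d[0])
B2 = B[1:] - scale[:, None] * B[0][None, :]   # formula (12)

# Compare with the Schur complement formed from the dense matrix.
schur_dense = R[1:, 1:] - np.outer(R[1:, 0], R[0, 1:]) / R[0, 0]
schur_from_generator = cauchy_like(B2, d[1:], J)
err = np.max(np.abs(schur_dense - schur_from_generator))
print(f"max deviation between the two Schur complements: {err:.1e}")
```

The update touches only the $(n-1)\alpha$ entries of the generator plus one column of $R_n$, which is what keeps each elimination step at $O(\alpha n)$ work.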
4. NUMERICAL RESULTS

We performed a computer experiment with the algorithm designed in the present paper to investigate its performance. We illustrate the performance of the method by using finite impulse response (FIR) system identification as an example. FIR system identification has wide applications in engineering [12]. Figure 1 is a block diagram of an FIR system identification model. The input signal $x_k$ drives the unknown system to produce the output sequence $y_k$. We model the unknown system as an FIR filter. If the unknown system is actually an FIR system, then the model is exact. In the tests, we formulate a well-defined least squares prediction problem by estimating the autocovariances from the data samples. By solving the normal equations, the FIR system coefficients can be found.

[Figure 1 placeholder: block diagram with the blocks "Unknown System" and "FIR System $\{w_k\}$"; signal labels $t_k$ and $e_k$]

Figure 1: FIR System Identification Model.

The rectangular Toeplitz matrices $T_{m,n}$ in (1) we used are

1. $t_k$ is randomly chosen from the normal distribution with zero mean and variance 1;

2. $t_k$ is generated from the second order autoregressive process given by

$$
x(t) - 1.4x(t-1) + 0.5x(t-2) = v(t);
$$

3. $t_k$ is generated from the autoregressive moving average process given by

$$
x(t) - 1.8x(t-1) + 0.9x(t-2) = v(t) + 0.3v(t-1) - 0.5v(t-2);
$$

4. $t_k$ is generated from the mixed process given by

[definition of the mixed process not recoverable]

[...] generate the Toeplitz data matrices. In the test, we choose the solution $w$ in (2) to be a random vector, and the right-hand side vector $d$ is computed as $T_{m,n} w$ correspondingly. The computations were done by Matlab on a Sparc workstation in double precision. We compare our method and the QR method that solves a general dense linear least squares problem. The cost of our method is $O(m \log n + n^2)$ flops, and the cost for QR is $O(mn^2)$ flops.

As for the comparison of times, Tables 2 and 3 show the number of mega-flops (counted by Matlab) used by our method and the QR method respectively for the above examples. We see from the tables that the number of mega-flops used by our method is significantly less than that of QR. For the above examples, we observe that the error of the computed solution by our method is at least $O(10^{-13})$. The error is computed as

[error formula not recoverable]

[entries of Table 2 not recoverable]

Table 2: Number of mega-flops used by our method with m = kn.

[column headings and leading row labels not recoverable; entries of the leading rows in extraction order: 1.08, 4.12, 8.41, 32.55, 65.03, 259.45, 519.41]
32 | 2.17 | 17.00 | 130.56 | 1039.12
64 | 4.37 | 34.17 | 261.89 | -

Table 3: Number of mega-flops used by QR method with m = kn.
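The test setup described in this section can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the Matlab code used for the tables: it builds the data matrix from the second order autoregressive process of example 2, picks a random $w$, forms $d = T_{m,n} w$, and compares a dense normal equations solve with a dense QR solve. The fast Cauchy-like solver itself is not reproduced here. The relative error measure, the burn-in length, the sizes $n = 16$, $k = 8$ and the helper names `ar2_sequence` and `toeplitz_matrix` are assumptions made for the example.

```python
import numpy as np

def ar2_sequence(length, rng, burn_in=200):
    """Samples of x(t) - 1.4 x(t-1) + 0.5 x(t-2) = v(t), with v(t) ~ N(0, 1)."""
    v = rng.standard_normal(length + burn_in)
    x = np.zeros(length + burn_in)
    for s in range(2, length + burn_in):
        x[s] = 1.4 * x[s - 1] - 0.5 * x[s - 2] + v[s]
    return x[burn_in:]              # drop the start-up transient

def toeplitz_matrix(t, m, n):
    """m-by-n matrix with entry (i, j) = t_{i-j}; t stores t_{1-n}, ..., t_{m-1}."""
    i = np.arange(m)[:, None]
    j = np.arange(n)[None, :]
    return t[i - j + (n - 1)]

rng = np.random.default_rng(0)
n, k = 16, 8
m = k * n                           # m = kn, as in Tables 2 and 3
t = ar2_sequence(m + n - 1, rng)    # m + n - 1 data samples
T = toeplitz_matrix(t, m, n)

w = rng.standard_normal(n)          # chosen solution
d = T @ w                           # right-hand side

# Dense normal equations T^* T w = T^* d, solved by Cholesky factorization.
G = T.T @ T
L = np.linalg.cholesky(G)
w_ne = np.linalg.solve(L.T, np.linalg.solve(L, T.T @ d))

# Dense QR solve for comparison.
Q, Rq = np.linalg.qr(T)
w_qr = np.linalg.solve(Rq, Q.T @ d)

# Assumed error measure: relative 2-norm error against the chosen w.
err_ne = np.linalg.norm(w_ne - w) / np.linalg.norm(w)
err_qr = np.linalg.norm(w_qr - w) / np.linalg.norm(w)
print(f"relative error: normal equations {err_ne:.1e}, QR {err_qr:.1e}")
```

Forming $T^* T$ densely as above costs $O(mn^2)$ operations; the point of the method in this paper is to avoid that step by working with the generator of the normal equations matrix.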
5. ACKNOWLEDGMENTS: Research by M. Ng was supported by the Cooperative Research Centre for Advanced Computational Systems.

5. REFERENCES

1. S. Akl, The Design and Analysis of Parallel Algorithms, Prentice-Hall, Englewood Cliffs, NJ, 1989.