Parallel computation of determinants of matrices with polynomial entries
A. Marco, J.-J. Martínez
Abstract
An algorithm for computing the determinant of a matrix whose entries are multivariate
polynomials is presented. It is based on classical multivariate Lagrange polynomial interpolation, and
it exploits the Kronecker product structure of the coefficient matrix of the linear system associated
with the interpolation problem. From this approach, the parallelization of the algorithm arises
naturally. The reduction of intermediate expression swell is another remarkable feature of the algorithm.
1. Introduction
In this paper we will show how the computation of the determinant of a matrix whose
entries are multivariate polynomials can be done in parallel. Our approach is based on
classical multivariate Lagrange polynomial interpolation, and our route to parallelization relies mainly on exploiting the structure of the coefficient matrix of an appropriate interpolation problem.
An example of this situation is the Sylvester resultant of two polynomials, which is
the determinant of the Sylvester matrix. A parallel resultant computation is presented in
Blochinger et al. (1999), but it must be observed that, as in the case of Hong et al. (2000),
the approach used in Blochinger et al. (1999) consists of parallelizing standard modular
techniques.
Another example of parallelization in the field of resultant computation is the parallel
computation of the entries of the Dixon determinant which appears in Chionh (1997).
Let us observe that the computation of determinants of matrices whose entries are
multivariate polynomials is very common in different problems of computer algebra
such as solving systems of polynomial equations, computing multipolynomial resultants,
implicitizing curves and surfaces, and so on. In this sense our work can serve in many applications as a component of more complex algorithms, a role pointed out in Hong (2000) as an important aspect of research.
For example, in the field of implicitization the need for improved algorithms, which could be met by means of parallelization, is indicated as one of the conclusions in Aries and Senoussi (2001), where it is remarked that the last step of the implicitization algorithm, the computation of the determinant, consumes much time. Our work addresses precisely that step, for which those authors used the standard Maple function ffgausselim.
Our method will also be useful as a complement to the techniques of implicitization
by the methods of moving curves or moving surfaces. For example, the method of
moving curves described in Sederberg et al. (1997) generates the implicit equation as
the determinant of a matrix whose entries are quadratic polynomials in x and y. The
order of that matrix is half the order of the corresponding Bézout matrix and, according
to the authors, these compressed expressions may lead to faster evaluation algorithms.
But, as can be read in Manocha and Canny (1993), the bottleneck in these methods is the
symbolic expansion of determinants, a problem which is not considered in Sederberg et al.
(1997).
A brief survey on the solution of linear systems in a computer algebra context, including
the computation of determinants, has been recently given in Kaltofen and Saunders (2003).
In that work, some important issues relevant to our work such as Gaussian elimination,
minor expansion, structured matrices and parallel algorithms are considered.
Let M be a square matrix of order k whose entries are polynomials in $R[x_1, \ldots, x_r]$ where $r \ge 2$. The coefficients will usually be integer, rational or algebraic numbers. We will compute the determinant of M, which we denote by $F(x_1, \ldots, x_r) \in R[x_1, \ldots, x_r]$, by means of classical multivariate Lagrange polynomial interpolation, so we need bounds on the degree of $F(x_1, \ldots, x_r)$ in each of the variables $x_i$ $(i = 1, \ldots, r)$, i.e. $n_i \in \mathbb{N}$ with $\deg_{x_i}(F(x_1, \ldots, x_r)) \le n_i$, $i = 1, \ldots, r$. In other words, we need to determine an appropriate interpolation space to which $F(x_1, \ldots, x_r)$ belongs: a vector space, denoted by $\Pi_{n_1,\ldots,n_r}(x_1, \ldots, x_r)$, of polynomials in $x_1, \ldots, x_r$ with $n_i$ the maximum exponent in the variable $x_i$ $(i = 1, \ldots, r)$.
In some problems the $n_i$ are known from previous theoretical results (for example in the case of curve implicitization considered in Marco and Martínez (2001)), while in other cases they can be calculated from the matrix. Let us observe that the total degree of F is always a bound for the degree in each variable, although not necessarily the best one. From now on, we suppose that the $n_i$ $(i = 1, \ldots, r)$ are known.
In Section 2 a complete description of the algorithm is given for the case r = 2 and the
case r > 2 is sketched. In Section 3 a detailed example is given, leaving to Section 4 some
comparisons of our algorithm with the Maple command gausselim. Finally Section 5
summarizes some important properties of our approach.
2. The algorithm

We first describe the case r = 2, writing $n = n_1$ and $m = n_2$ for the degree bounds in x and y. The monomial basis of the interpolation space $\Pi_{n,m}(x, y)$ is taken as
$$\{x^i y^j \mid i = 0, \ldots, n;\ j = 0, \ldots, m\} = \{1, y, \ldots, y^m,\ x, xy, \ldots, xy^m,\ \ldots,\ x^n, x^n y, \ldots, x^n y^m\},$$
with that precise ordering, and the interpolation nodes are taken in the corresponding order
$$\{(x_i, y_j) \mid i = 0, \ldots, n;\ j = 0, \ldots, m\} = \{(x_0, y_0), (x_0, y_1), \ldots, (x_0, y_m), (x_1, y_0), (x_1, y_1), \ldots, (x_1, y_m), \ldots, (x_n, y_0), \ldots, (x_n, y_m)\},$$
so that the coefficient matrix of the interpolation linear system $Ac = f$ is the Kronecker product $A = V_x \otimes V_y$ of the Vandermonde matrices generated by the nodes $x_i$ and $y_j$. Each interpolation datum $f_{ij} = F(x_i, y_j)$ is obtained by evaluating the matrix M at the node $(x_i, y_j)$ and computing the determinant of the resulting constant matrix, for instance by Gaussian elimination.
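As a small illustration (a sketch; the integer nodes $x_i = i$, $y_j = j$ used here match the example of Section 3), the ordered basis and node list can be generated in Maple as follows:

    # Monomial basis of Pi_{n,m}(x, y) and interpolation nodes, both in the
    # lexicographic order described above (shown for n = 2, m = 1).
    n := 2: m := 1:
    basis := [seq(seq(x^i*y^j, j = 0..m), i = 0..n)];
             # [1, y, x, x*y, x^2, x^2*y]
    nodes := [seq(seq([i, j], j = 0..m), i = 0..n)];
             # [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]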
Let us observe that this general way of computing the interpolation data could be replaced with one designed for each particular case, in order to reduce the computational cost of the process.
For example, in the case of implicitizing polynomial curves, the computation of each determinant can be replaced by the computation of a resultant of two polynomials in $R[t]$, and so the computational complexity (in terms of arithmetic operations) will be $O(k^2)$ instead of $O(k^3)$ (see Marco and Martínez, 2001).
For the computation of the solution vector c we use an algorithm that reduces the problem of solving a linear system with coefficient matrix $V_x \otimes V_y$ to solving n + 1 systems with the same matrix $V_y$ and m + 1 systems with the same matrix $V_x$. The algorithm for solving linear systems with a Kronecker product coefficient matrix is given (for a generalized Kronecker product) in Martínez (1999), and a general algorithm for the Kronecker product of several matrices (not necessarily Vandermonde matrices) is given in Buis and Dyksen (1996). Since every linear system to be solved here is a Vandermonde system, it is convenient to use the Björck–Pereyra algorithm (Björck and Pereyra, 1970; Golub and Van Loan, 1996), which takes advantage of the special structure of the coefficient matrices $V_x$ and $V_y$. A serial implementation of the algorithm in Maple proceeds in two stages:
Stage I. Solution of n + 1 linear systems with coefficient matrix $V_y$.
Stage II. Solution of m + 1 linear systems with coefficient matrix $V_x$.
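The following Maple sketch illustrates the two stages; it is our reconstruction, not the paper's original listing. It assumes node lists xs = [x_0, ..., x_n] and ys = [y_0, ..., y_m] and the data $f_{ij}$ stored in an Array F indexed from 0; BjorckPereyra solves the (interpolation) Vandermonde system by the method of Björck and Pereyra (1970).

    # Björck–Pereyra solver for the Vandermonde system V(xs) c = f,
    # i.e. polynomial interpolation at the nodes xs.
    BjorckPereyra := proc(xs::list, f::list)
        local n, c, k, j;
        n := nops(xs) - 1;
        c := Array(1..n+1, f);
        for k to n do                      # divided differences
            for j from n+1 by -1 to k+1 do
                c[j] := (c[j] - c[j-1])/(xs[j] - xs[j-k]);
            end do;
        end do;
        for k from n by -1 to 1 do         # Horner-type back substitution
            for j from k to n do
                c[j] := c[j] - xs[k]*c[j+1];
            end do;
        end do;
        convert(c, list);
    end proc:

    # Stage I: n+1 Vandermonde solves with V_y (one per row of F);
    # Stage II: m+1 Vandermonde solves with V_x (one per column).
    # F is overwritten with the coefficients c_ij, as described below.
    KroneckerSolve := proc(xs::list, ys::list, F::Array)
        local n, m, i, j, row, col;
        n := nops(xs) - 1; m := nops(ys) - 1;
        for i from 0 to n do               # Stage I
            row := BjorckPereyra(ys, [seq(F[i, j], j = 0..m)]);
            for j from 0 to m do F[i, j] := row[j+1] end do;
        end do;
        for j from 0 to m do               # Stage II
            col := BjorckPereyra(xs, [seq(F[i, j], i = 0..n)]);
            for i from 0 to n do F[i, j] := col[i+1] end do;
        end do;
        F;
    end proc:

For the example of Section 3, with its integer nodes, the call would be KroneckerSolve([seq(i, i = 0..6)], [seq(j, j = 0..8)], F).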
Since $c_{ij}$ is the coefficient of $x^i y^j$, the ordering of the components of the solution vector
$$c = (c_{00}, \ldots, c_{0m}, c_{10}, \ldots, c_{1m}, \ldots, c_{n0}, \ldots, c_{nm})$$
corresponds to the lexicographic order of the monomials, $c_{nm}$ being the leading coefficient.
Let us observe that the algorithm overwrites $f_{ij}$ with the solution vector component $c_{ij}$, and it constructs neither the matrix A nor the Vandermonde matrices $V_x$ and $V_y$, which implies an additional saving both in memory cost and in computational cost.
The parallelization of the algorithm just described is straightforward because it performs the same computations on different sets of data, with no need for communication between the processors. In the computation of the interpolation data, each datum can clearly be computed independently. As for the solution of the linear system $Ac = f$ (where $A = V_x \otimes V_y$), the n + 1 Vandermonde linear systems with the same matrix $V_y$ of stage I can be solved independently, and once the solutions of all of these systems are available, the same holds for the m + 1 Vandermonde linear systems with matrix $V_x$ of stage II. Taking this into account and considering n = m, we see that the time required by the algorithm for computing the determinant can be divided by n + 1 when n + 1 processors are available. This is an important consequence of the fact that A has a Kronecker product structure (see Buis and Dyksen, 1996; Martínez, 1999).
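For instance (a sketch only: it reuses the names of the listing above and assumes a Maple version with the Threads package, which postdates the Maple 8 used in Section 4), stage I could be run concurrently as follows:

    # The n+1 stage-I solves are independent, so they can be mapped
    # over the rows of data in parallel.
    rows   := [seq([seq(F[i, j], j = 0..m)], i = 0..n)]:
    solved := Threads:-Map(r -> BjorckPereyra(ys, r), rows):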
As for the generalization of the algorithm to the case r > 2, the situation is completely analogous to the bivariate case. Now let M be a matrix whose entries are polynomials in $R[x_1, \ldots, x_r]$ with r > 2, and let $n_i$ be the bound on the degree of $F(x_1, \ldots, x_r)$ in $x_i$ $(i = 1, \ldots, r)$. To compute $F(x_1, \ldots, x_r)$ by means of classical multivariate Lagrange polynomial interpolation we have to solve a linear system $Ac = f$ where $A = V_{x_1} \otimes V_{x_2} \otimes \cdots \otimes V_{x_r}$, with $V_{x_i}$ the Vandermonde matrix generated by the coordinates $x_i$ of the interpolation nodes for $i = 1, \ldots, r$. In the same way as in the bivariate case, we can reduce the solution of the system $Ac = f$ to the solution of the corresponding linear systems with coefficient matrices $V_{x_i}$ for $i = 1, \ldots, r$; since these are Vandermonde matrices, we again use the Björck–Pereyra algorithm, as sketched below.
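As an illustration for r = 3 (our sketch; the general case repeats the same sweep once per variable), assume the data are stored in an Array f indexed from 0, with node lists xs1, xs2, xs3 and degree bounds n1, n2, n3:

    # Sweep along the first variable: solve the (n2+1)(n3+1) Vandermonde
    # systems with matrix V_{x_1}, one for each fiber of f.
    for j from 0 to n2 do
        for k from 0 to n3 do
            s := BjorckPereyra(xs1, [seq(f[i, j, k], i = 0..n1)]);
            for i from 0 to n1 do f[i, j, k] := s[i+1] end do;
        end do;
    end do;
    # Analogous sweeps with xs2 (second index) and xs3 (third index)
    # complete the solution of (V_{x1} ⊗ V_{x2} ⊗ V_{x3}) c = f.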
Now we give a theorem about the computational complexity (in terms of arithmetic operations) of the whole algorithm. For the sake of clarity we assume $n_1 = \cdots = n_r = n$; the result is easily extended to the general situation. As is usual in applications, we also assume that the cost of evaluating the matrix at an interpolation node is small compared with that of computing the determinant of the evaluated matrix, and so it is not considered here.
Theorem 2.1. Let M be a square matrix of order k whose entries are polynomials in $R[x_1, \ldots, x_r]$ with $r \ge 2$, and let $\deg_{x_i}(\det(M)) \le n$ for all $i$. The computational cost of computing $\det(M)$ using our algorithm is (in terms of arithmetic operations):
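A rough count consistent with the construction above (an order-of-magnitude sketch, not the exact statement): the $(n+1)^r$ evaluations of the determinant of a $k \times k$ constant matrix cost $O(k^3)$ arithmetic operations each (Gaussian elimination), while the interpolation stage solves $r(n+1)^{r-1}$ Vandermonde systems of order $n+1$ at $O(n^2)$ operations each (Björck–Pereyra), giving
$$O\big((n+1)^r k^3\big) + O\big(r(n+1)^{r+1}\big)$$
arithmetic operations in total.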
3. A detailed example

Our aim is to compute $\mathrm{Res}_t(g, h)$, the resultant with respect to t of $g(x, y, t)$ and $h(x, y, t)$, i.e. the determinant of the Sylvester matrix of $g(x, y, t)$ and $h(x, y, t)$ considered as polynomials in t. A formula for bounding the degrees of $\mathrm{Res}_t(g, h)$ in x and y is well known (see, for example, Winkler, 1996); such bounds are n = 6 and m = 8 in our example.
We will consider that the coefficients of $g(x, y, t)$ and $h(x, y, t)$ (polynomials in $R[x, y]$) are expressed in Horner form, to reduce as much as possible the computational cost of the evaluation, as illustrated below.
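As an illustration (with a small polynomial of our own, not one from this example), Maple's convert/horner produces such nested forms:

    # Illustrative: nested Horner form with respect to x, then y.
    p := 9 + 18*y + 27*x*y - 54*x^2 + 27*x^2*y^2:
    convert(p, horner, [x, y]);
             # e.g. 9 + 18*y + (27*y + (27*y^2 - 54)*x)*x
             # (the printed form may vary between Maple versions)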
We now show the interpolation data, the results of stage I and the results of stage II in compact form, using matrices to avoid unnecessary notation. The rows of the matrix below correspond to the interpolation data, the interpolation datum $f_{ij}$ being the (i + 1, j + 1) entry:
         9      81      441     1521      3969      8649     16641      29241      47961
       -18     810     6570    25470     69822    156042    304650     540270     891630
        63    6399    46503   175239    474903   1055223   2053359    3633903    5988879
       900   26568   179208   661428   1777140   3931560   7631208   13483908   22198788
      3789   76869   495405  1804149   4820229  10633149  20604789   36369405   59833629
     10674  178686  1117566  4033026  10733634  23630814  45738846   80674866  132658866
     24147  359235  2199915  7887195  20932587  46018107  88996275  156890115  257893155
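These data can be generated with a few lines of Maple (a sketch under the assumption, consistent with the matrices of this section, that the nodes are $x_i = i$ and $y_j = j$; S denotes the Sylvester matrix of g and h with respect to t):

    # f_ij = Res_t(g, h) evaluated at x = i, y = j, computed as the
    # determinant of the evaluated Sylvester matrix S.
    S := LinearAlgebra:-SylvesterMatrix(g, h, t):
    F := Array(0..6, 0..8):
    for i from 0 to 6 do
        for j from 0 to 8 do
            F[i, j] := LinearAlgebra:-Determinant(
                           map(p -> eval(p, {x = i, y = j}), S));
        end do;
    end do;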
Now we compute the coefficients of $\mathrm{Res}_t(g, h)$ making use of the algorithm described in Section 2. We detail the solution of all the Vandermonde linear systems involved in the process.
The solutions of the seven linear systems with the same Vandermonde coefficient matrix $V_y$ (i.e. the output of stage I) are the rows of the matrix below:
         9     18      27     18      9  0  0  0  0
       -18     72     243    342    171  0  0  0  0
        63    882    2025   2286   1143  0  0  0  0
       900   4392    8613   8442   4221  0  0  0  0
      3789  13842   25191  22698  11349  0  0  0  0
     10674  33768   58887  50238  25119  0  0  0  0
     24147  70002  118773  97542  48771  0  0  0  0
The results of stage II, i.e. the solutions of the nine linear systems with the same Vandermonde coefficient matrix $V_x$ and with data vectors the columns of the matrix above, are the columns of the following matrix, where $c_{ij}$, the coefficient of $x^i y^j$, is the (i + 1, j + 1) entry:
        9  18  27   18   9  0  0  0  0
      -27   0  27   54  27  0  0  0  0
       27   0  54  108  54  0  0  0  0
      -54   0  54  108  54  0  0  0  0
       27  54  81   54  27  0  0  0  0
        0   0   0    0   0  0  0  0  0
        0   0   0    0   0  0  0  0  0
Let us observe that we do not need to solve all nine Vandermonde linear systems with matrix $V_x$: only the first five, because the solution of the other four is obviously the zero vector. This situation appears in every example where the bound on the degree of the determinant in y is not its exact degree in y (here the bound was m = 8, while the exact degree is 4).
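For completeness, the resultant can be reassembled from the stage II output (a one-line sketch; c denotes the Array of coefficients $c_{ij}$ computed above):

    # Res_t(g, h) = sum over i, j of c_ij * x^i * y^j.
    Res := add(add(c[i, j]*x^i*y^j, j = 0..8), i = 0..6);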
4. Some comparisons
Of course, for problems of small size like the example of Section 3, any method will usually give the answer in little time; but when the size of the problem is not small, the differences appear clearly. In this section we include five examples showing the performance of the method we have presented. All the results included here were obtained using Maple 8 on a personal computer.
In Examples 1–4, M is the Sylvester matrix, constructed with Maple, of the polynomials
$$f = (t + 1)^n - x(t + 2)^n, \qquad g = (t + 3)^n - y(t + 4)^n, \qquad n = 13, 15, 17, 19.$$
In Example 5, M is again a Sylvester matrix, but now
$$f = (t + 1)^8 + \sqrt{2} - x(t + 2)^8, \qquad g = (t + 3)^8 + \sqrt{3} - y(t + 4)^8.$$
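For instance, Example 1 can be set up along the following lines (a sketch; linalg[sylvester] is one way to build a Sylvester matrix in Maple, and we do not claim it is the exact command behind the timings reported below):

    # Example 1 (n = 13): a 26 x 26 Sylvester matrix with entries in R[x, y].
    f := (t + 1)^13 - x*(t + 2)^13:
    g := (t + 3)^13 - y*(t + 4)^13:
    M := linalg[sylvester](f, g, t):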
We have chosen the computation of the determinant of a Sylvester matrix for three important reasons: the relevance of such matrices in applications (computation of resultants, implicitization, ...), the availability of the degree bounds $n_i$ needed to apply our method, and the possibility of an easy reproduction of our experiments by any reader.
Since all the examples correspond to the implicitization of rational curves, by a result quoted in Marco and Martínez (2001) we have $n_1 = n_2 = n$ in Examples 1–4 (n = 13, 15, 17, 19, respectively), and $n_1 = n_2 = 8$ in Example 5.
Now we briefly indicate the results obtained when using a sequential implementation of
our algorithm and the command gausselim of the Maple package linalg. Let us observe
that we have also compared our algorithm with the commands det and ffgausselim, both
in the Maple package linalg, and with the command Determinant of the Maple package
LinearAlgebra, considering the options fracfree, minor, multivar and algnum. As
the results obtained when using these Maple commands are not better than the results
obtained with gausselim, we have only included the results for this latter Maple
command.
Example 1 (n = 13). The time and the space required by our algorithm for computing the determinant of M are 18.719 s and 6 224 780 bytes. The time and the space needed by gausselim in computing the same determinant are 47.069 s and 7 535 260 bytes.
Example 2 (n = 15). The time required by our algorithm for computing the determinant of M is 42.669 s, whereas the time spent by gausselim is 214.521 s (i.e. our algorithm is approximately 5 times faster). As for the space, our algorithm needed 6 224 780 bytes (more or less the same as in Example 1) while gausselim required 10 483 840 bytes.
Example 3 (n = 17). The time spent by our algorithm in computing the determinant of M is 91.625 s; it is more than 6 times faster than gausselim, which required 647.954 s. As for the space, our algorithm required 6 421 352 bytes, half the space required by gausselim, which needed 13 301 372 bytes.
Example 4 (n = 19). The time and the space required by our algorithm for computing the determinant of M are 192.194 s and 6 355 828 bytes. The time and the space needed by gausselim in computing the same determinant are 2176.711 s and 19 264 056 bytes.
Example 5. In this example the polynomials f and g have lower degree in t, but some of their coefficients are algebraic numbers, which makes the computation slower.
The time spent by our algorithm in computing the determinant of M is 10.978 s; it is 3 times faster than gausselim, which required 33.773 s. As for the space, our algorithm required 6 224 780 bytes, while gausselim needed 16 970 716 bytes.
Remark 1. Let us point out that in spite of the length of the numbers involved, the
interpolation stage (i.e. the solution of the linear system) has taken less than 1 s in all
the examples.
Remark 2. If our algorithm had been fully parallelized using a large enough number of
processors for each case, the determinants of all the examples would have been computed
in a couple of seconds.
5. Conclusions

In the previous sections we have shown how to construct a parallelized algorithm for
computing the determinant of a matrix whose entries are multivariate polynomials. Our
approach is based on classical multivariate Lagrange polynomial interpolation and takes
advantage of the matrix structure of the linear system associated with the interpolation
problem.
Let us observe that we do not consider the case of sparse interpolation, since the dense case arises naturally in many applications (Aries and Senoussi, 2001). For the sparse multivariate polynomial interpolation problem, a practical algorithm is presented in Murao and Fujise (1996), in which modular techniques are used and the parallelization of some steps of the algorithm is considered. The approach of Murao and Fujise (1996) also needs bounds on the degree in each variable of the interpolating polynomial; in our case these bounds determine the interpolation space without any additional work. Moreover, we do not need to construct explicitly the coefficient matrix of the corresponding linear system, which
means an additional saving in computing time. And finally the possibility of parallelization
is given by the Kronecker product structure of that coefficient matrix.
In the following remarks we state in detail some of the features of the algorithm:
1. In contrast with the approach of Manocha and Canny (1993), where a different interpolation technique is considered, we do not use modular techniques. Hence we do not need to look for big primes, and we get the same results even when the coefficients are not rational numbers. Of course, as Example 5 shows, if we work with algebraic numbers the cost of the arithmetic operations will increase.
2. The algorithm only makes use of arithmetic operations with the coefficients of
the polynomials in the entries of the matrix. In this way we reduce enormously
the problem of “intermediate expression swell” that appears when computing the
determinant of matrices with polynomial entries (see Section 2.8 of Davenport et al.,
1988). It is clear from the examples of Section 4 that the interpolation data are
very big numbers, but that cannot be avoided if one wishes to use interpolation.
Nevertheless, it should also be observed that our method only works with numbers
and not with more complicated expressions: they are big, but they are numbers, and
in that case the arithmetic of Maple (at least for the rational case) performs very
efficiently.
3. The algorithm has a high degree of intrinsic parallelism, both in the computation of the interpolation data and in solving all the linear systems with the same Vandermonde matrix $V_{x_i}$, $i = 1, \ldots, r$. This is why the parallelized algorithm greatly reduces the time required for computing the determinant.
4. The algorithm is completely deterministic in nature, i.e. it does not contain any
probabilistic step.
5. The loop structure of the algorithm allows us to estimate in advance the time the algorithm will spend on each specific problem. To be precise, if we must compute N determinants (the interpolation data), the total time will be less than N times the time required to compute the datum $f_{n_1,\ldots,n_r}$. Analogously, the time needed to solve all the Vandermonde systems can be estimated by multiplying the number of systems by the time required for solving one linear system.
Acknowledgements
The authors wish to thank two anonymous referees whose suggestions led to several
improvements of the paper, including the addition of Section 4.
This research was partially supported by Research Grant BFM 2003–03510 from the
Spanish Ministerio de Ciencia y Tecnología.
References
Aries, F., Senoussi, R., 2001. An implicitization algorithm for rational surfaces with no base points.
J. Symbolic Computation 31, 357–365.
Bini, D., Pan, V.Y., 1994. Polynomial and Matrix Computations, vol. 1: Fundamental Algorithms. Birkhäuser, Boston.
Björck, A., Pereyra, V., 1970. Solution of Vandermonde systems of equations. Math. Comp. 24,
893–903.
Blochinger, W., Küchlin, W., Ludwig, C., Weber, A., 1999. An object-oriented platform for
distributed high-performance symbolic computation. Math. Comput. Simulation 49, 161–178.
Buis, P.E., Dyksen, W.R., 1996. Efficient vector and parallel manipulation of tensor products. ACM
Trans. Math. Software 22 (1), 18–23.
Chionh, E.W., 1997. Concise parallel Dixon determinant. Comput. Aided Geom. Design 14,
561–570.
Davenport, J.H., Siret, Y., Tournier, E., 1988. Computer Algebra: Systems and Algorithms for
Algebraic Computation. Academic Press, New York.
Davis, P.J., 1975. Interpolation and Approximation. Dover Publications Inc., New York.
Golub, G.H., Van Loan, C.F., 1996. Matrix Computations, third ed. Johns Hopkins University Press, Baltimore, MD.
Hong, H., 2000. J. Symbolic Computation 29, 3–4 (Editorial).
Hong, H., Liska, R., Robidoux, N., Steinberg, S., 2000. Elimination of variables in parallel. SIAM
News 33 (8), 1, 12–13.
Horn, R.A., Johnson, C.R., 1991. Topics in Matrix Analysis. Cambridge University Press,
Cambridge.
Kaltofen, E., Saunders, B.D., 2003. Linear systems. In: Grabmeier, J., Kaltofen, E., Weispfenning, V. (Eds.), Computer Algebra Handbook. Springer-Verlag, Berlin, Section 2.3.1.
Khanin, R., Cartmell, M., 2001. Parallelization of perturbation analysis: application to large-scale
engineering problems. J. Symbolic Computation 31, 461–473.
Lu, H., 1994. Fast solution of confluent Vandermonde linear systems. SIAM J. Matrix Anal. Appl.
15 (4), 1277–1289.
Manocha, D., Canny, J.F., 1993. Multipolynomial resultant algorithms. J. Symbolic Computation
15, 99–122.
Marco, A., Martínez, J.J., 2001. Using polynomial interpolation for implicitizing algebraic curves. Comput. Aided Geom. Design 18, 309–319.
Martínez, J.J., 1999. A generalized Kronecker product and linear systems. Internat. J. Math. Ed. Sci. Tech. 30 (1), 137–141.
Murao, H., Fujise, T., 1996. Modular algorithm for sparse multivariate polynomial interpolation and
its parallel implementation. J. Symbolic Computation 21, 377–396.
Sederberg, T., Goldman, R., Du, H., 1997. Implicitizing rational curves by the method of moving
algebraic curves. J. Symbolic Computation 23, 153–175.
Winkler, F., 1996. Polynomial Algorithms in Computer Algebra. Springer-Verlag, Wien, New York.