1 Polynomials in One Variable: 1.1 The Fundamental Theorem of Algebra
If the degree d is four or less, then the roots are functions of the coefficients
which can be expressed in terms of radicals. The command solve in maple
will produce these familiar expressions for us:
1/2 (-a1 + (a1^2 - 4 a2 a0)^(1/2)) / a2,   1/2 (-a1 - (a1^2 - 4 a2 a0)^(1/2)) / a2
1/6/a3*(36*a1*a2*a3-108*a0*a3^2-8*a2^3+12*3^(1/2)*(4*a1^3*a3
-a1^2*a2^2-18*a1*a2*a3*a0+27*a0^2*a3^2+4*a0*a2^3)^(1/2)*a3)
^(1/3)+2/3*(-3*a1*a3+a2^2)/a3/(36*a1*a2*a3-108*a0*a3^2-8*a2^3
+12*3^(1/2)*(4*a1^3*a3-a1^2*a2^2-18*a1*a2*a3*a0+27*a0^2*a3^2
+4*a0*a2^3)^(1/2)*a3)^(1/3)-1/3*a2/a3
The polynomial p(x) has d distinct roots if and only if its discriminant is
nonzero. Can you spot the discriminant of the cubic equation in the previous
maple output? In general, the discriminant is computed from the resultant
of p(x) and its first derivative p'(x) as follows:
discr_x(p(x)) = (1/a_d) · res_x(p(x), p'(x)).
This is an irreducible polynomial in the coefficients a0, a1, …, ad. It follows
from Sylvester's matrix for the resultant that the discriminant is a homogeneous
polynomial of degree 2d − 2. Here is the discriminant of a quartic:
-192*a4^2*a0^2*a3*a1-6*a4*a0*a3^2*a1^2+144*a4*a0^2*a2*a3^2
+144*a4^2*a0*a2*a1^2+18*a4*a3*a1^3*a2+a2^2*a3^2*a1^2
-4*a2^3*a3^2*a0+256*a4^3*a0^3-27*a4^2*a1^4-128*a4^2*a0^2*a2^2
-4*a3^3*a1^3+16*a4*a2^4*a0-4*a4*a2^3*a1^2-27*a3^4*a0^2
-80*a4*a3*a1*a2^2*a0+18*a3^3*a1*a2*a0
> with(linalg):
> sylvester(f,diff(f,x),x);
[ a4 a3 a2 a1 a0 0 0 ]
[ ]
[ 0 a4 a3 a2 a1 a0 0 ]
[ ]
[ 0 0 a4 a3 a2 a1 a0]
[ ]
[4 a4 3 a3 2 a2 a1 0 0 0 ]
[ ]
[ 0 4 a4 3 a3 2 a2 a1 0 0 ]
[ ]
[ 0 0 4 a4 3 a3 2 a2 a1 0 ]
[ ]
[ 0 0 0 4 a4 3 a3 2 a2 a1]
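Readers who prefer to check this outside of maple can reproduce the computation with exact rational arithmetic. The following Python sketch (the function names are ours, not from the text) builds the Sylvester matrix in exactly the layout displayed above and evaluates discr_x(p(x)) = res_x(p(x), p'(x))/a_d. It follows the sign convention of the formula above, i.e. without the extra factor (−1)^{d(d−1)/2} that some references include.

```python
from fractions import Fraction

def sylvester(p, q):
    # Sylvester matrix of p (degree m) and q (degree n); coefficient lists,
    # highest degree first, arranged as in the maple output above
    m, n = len(p) - 1, len(q) - 1
    S = [[Fraction(0)] * (m + n) for _ in range(m + n)]
    for i in range(n):                       # n shifted rows of p
        for j, c in enumerate(p):
            S[i][i + j] = Fraction(c)
    for i in range(m):                       # m shifted rows of q
        for j, c in enumerate(q):
            S[n + i][i + j] = Fraction(c)
    return S

def det(M):
    # exact determinant via Gaussian elimination over Q
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            d = -d
        d *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return d

def discriminant(p):
    dp = [c * (len(p) - 1 - i) for i, c in enumerate(p[:-1])]   # p'
    return det(sylvester(p, dp)) / Fraction(p[0])

assert discriminant([1, -7, 17, -17, 6]) == 0      # (x-1)^2 (x-2)(x-3): repeated root
assert discriminant([1, -10, 35, -50, 24]) == 144  # (x-1)(x-2)(x-3)(x-4): distinct roots
```

For x^4 − 1 this routine returns −256, which agrees with evaluating the displayed degree-6 quartic discriminant at a4 = 1, a0 = −1 (the term 256 a4^3 a0^3).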
Galois theory tells us that there is no general formula which expresses
the roots of p(x) in radicals if d ≥ 5. For specific instances with d not too
big, say d ≤ 10, it is possible to compute the Galois group of p(x) over Q .
Occasionally, one is lucky and the Galois group is solvable, in which case
maple has a chance of finding the solution of p(x) = 0 in terms of radicals.
> solve(f,x)[1];
1/12 (-6 (108 + 12 69^(1/2))^(1/3) + 72) / (108 + 12 69^(1/2))^(1/3)
The number 48 is the order of the Galois group and its name is "6T11". Of
course, the user now has to consult help(galois) in order to learn more.
> Digits := 6:
> f := x^200 - x^157 + 8 * x^101 - 23 * x^61 + 1:
> fsolve(f,x);
.950624, 1.01796
This polynomial has only two real roots. To list the complex roots, we say:
> fsolve(f,x,complex);
We note that convenient facilities are available for calling matlab inside of
maple and for calling maple inside of matlab. We wish to encourage our
readers to experiment with the passage of data between these two programs.
Some numerical methods for solving a univariate polynomial equation
p(x) = 0 work by reducing this problem to computing the eigenvalues of the
companion matrix of p(x), which is defined as follows. Consider the quotient
ring V = Q[x]/⟨p(x)⟩ modulo the ideal generated by the polynomial p(x).
The ring V is a d-dimensional Q-vector space. Multiplication by the variable
x defines a linear map from this vector space to itself.
Proposition 2. The zeros of p(x) are the eigenvalues of the matrix Times_x.
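Proposition 2 is easy to test directly. In the sketch below (helper names are ours), for each root λ of a monic polynomial, the vector (1, λ, …, λ^{d−1}) is an eigenvector of the companion matrix with eigenvalue λ:

```python
from fractions import Fraction

def companion(coeffs):
    # coeffs = [a0, a1, ..., a_{d-1}] of a monic p(x) = x^d + a_{d-1} x^{d-1} + ... + a0
    d = len(coeffs)
    C = [[Fraction(0)] * d for _ in range(d)]
    for i in range(d - 1):
        C[i][i + 1] = Fraction(1)        # superdiagonal: row i picks out coordinate i+1
    for j, a in enumerate(coeffs):
        C[d - 1][j] = Fraction(-a)       # last row encodes x^d = -(a0 + a1 x + ...)
    return C

def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

# p(x) = x^2 - 7x + 10 = (x - 2)(x - 5)
C = companion([10, -7])
for lam in (2, 5):
    v = [lam ** i for i in range(2)]     # (1, lam)
    assert matvec(C, v) == [lam * c for c in v]
```

The same check works for any degree; numerically, this is why eigenvalue solvers applied to the companion matrix recover the roots of p(x).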
We note that the set of multiple roots of p(x) can be computed symbolically
by forming the greatest common divisor of p(x) and its derivative:
p_0(x) := p(x),  p_1(x) := p'(x),  p_i(x) := −rem(p_{i−2}(x), p_{i−1}(x)) for i ≥ 2.
Thus p_i(x) is the negative of the remainder on division of p_{i−2}(x) by p_{i−1}(x).
Let p_m(x) be the last non-zero polynomial in this sequence.
Theorem 4. (Sturm's Theorem) If a < b in R and neither is a zero of
p(x), then the number of real zeros of p(x) in the interval [a, b] is the number of
sign changes in the sequence p_0(a), p_1(a), p_2(a), …, p_m(a) minus the number
of sign changes in the sequence p_0(b), p_1(b), p_2(b), …, p_m(b).
We note that any zeros are ignored when counting the number of sign
changes in a sequence of real numbers. For instance, a sequence of twelve
numbers with signs +, +, 0, +, −, −, 0, +, −, 0, −, 0 has three sign changes.
If we wish to count all real roots of a polynomial p(x) then we can apply
Sturm’s Theorem to a = −∞ and b = ∞, which amounts to looking at the
signs of the leading coefficients of the polynomials pi in the Sturm sequence.
Using bisection, one gets an efficient method for isolating the real roots by
rational intervals. This method is conveniently implemented in maple:
> p := x^11-20*x^10+99*x^9-247*x^8+210*x^7-99*x^2+247*x-210:
> sturm(p,x,-INFINITY, INFINITY);
3
> sturm(p,x,0,10);
2
> sturm(p,x,5,10);
0
> realroot(p,1/1000);
1101 551 1465 733 14509 7255
[[----, ---], [----, ---], [-----, ----]]
1024 512 1024 512 1024 512
> fsolve(p);
1.075787072, 1.431630905, 14.16961992
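For readers without maple, the Sturm counts above are easy to replicate. The sketch below (helper names are ours) builds the sequence p_0, p_1, p_2, … with exact rational arithmetic and reproduces the counts from the session:

```python
from fractions import Fraction

def poly_rem(f, g):
    # remainder on division of f by g (coefficient lists, highest degree first)
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while len(f) >= len(g):
        if f[0] == 0:
            f.pop(0)
            continue
        q = f[0] / g[0]
        for i in range(len(g)):
            f[i] -= q * g[i]
        f.pop(0)
    while f and f[0] == 0:
        f.pop(0)
    return f

def sturm_sequence(p):
    dp = [c * (len(p) - 1 - i) for i, c in enumerate(p[:-1])]   # p'
    seq = [[Fraction(c) for c in p], [Fraction(c) for c in dp]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])       # p_i = -rem(p_{i-2}, p_{i-1})

def eval_poly(p, x):
    acc = Fraction(0)
    for c in p:
        acc = acc * x + c
    return acc

def sign_changes(vals):
    signs = [v > 0 for v in vals if v != 0]    # zeros are ignored
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def count_real_roots(p, a, b):
    # number of real zeros of p in [a, b]; a and b must not be zeros of p
    seq = sturm_sequence(p)
    va = [eval_poly(q, Fraction(a)) for q in seq]
    vb = [eval_poly(q, Fraction(b)) for q in seq]
    return sign_changes(va) - sign_changes(vb)

p = [1, -20, 99, -247, 210, 0, 0, 0, 0, -99, 247, -210]
assert count_real_roots(p, 0, 10) == 2     # agrees with sturm(p,x,0,10)
assert count_real_roots(p, 5, 10) == 0     # agrees with sturm(p,x,5,10)
```

Bisection with `count_real_roots` then isolates each real root in a rational interval, exactly as `realroot` does above.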
All 2m − 1 zeros of this polynomial are real, and its expansion has m terms.
The role of the ambient algebraically closed field containing K is now played
by the field C ((t)) of Puiseux series. These are formal power series in t with
coefficients in C and having rational exponents, subject to the condition
that the set of appearing exponents is bounded below and has a common
denominator. The field C ((t)) is known to be algebraically closed.
The proof of Puiseux’s theorem is algorithmic, and, lucky for us, there is
an implementation of this algorithm in maple. Here is how it works:
We note that this program generally does not compute all Puiseux series
solutions but only enough to generate the splitting field of p(t; x) over K.
We shall explain how to compute the first term (lowest order in t) in each of
the d Puiseux series solutions x(t) to our equation p(t; x) = 0. Suppose that
the i-th coefficient in (6) has the Laurent series expansion:
Each Puiseux series looks like
The crucial condition (8) will reappear in various guises later in these lectures.
As an illustration consider the example p(t; x) = x^2 + x − t^3, where (8) reads
2τ = τ ≤ 3   or   2τ = 3 ≤ τ   or   3 = τ ≤ 2τ.
The first and third conditions are satisfiable and yield
τ = 0 or τ = 3,
which gives us the lowest terms in the two Puiseux series produced by maple.
It is customary to phrase the procedure described above in terms of the
Newton polygon of p(t; x). This polygon is the convex hull in R2 of the points
(i, Ai ) for i = 0, 1, . . . , d. The condition (8) is equivalent to saying that −τ
equals the slope of an edge on the lower boundary of the Newton polygon.
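This procedure can be mechanized: for each pair of terms, solve A_i + iτ = A_j + jτ and keep τ when the minimum is attained by that pair. A small sketch (the function name is ours):

```python
from fractions import Fraction
from itertools import combinations

def lowest_exponents(A):
    # A maps an exponent i of x to A_i, the order in t of its coefficient
    taus = set()
    for i, j in combinations(A, 2):
        tau = Fraction(A[j] - A[i], i - j)          # solves A_i + i*tau = A_j + j*tau
        vals = {k: A[k] + k * tau for k in A}
        m = min(vals.values())
        if vals[i] == m and vals[j] == m:           # minimum attained (at least) twice
            taus.add(tau)
    return sorted(taus)

# p(t; x) = x^2 + x - t^3:  A_2 = 0, A_1 = 0, A_0 = 3
assert lowest_exponents({2: 0, 1: 0, 0: 3}) == [0, 3]
```

Each returned τ is the negative of the slope of a lower edge of the Newton polygon: here the horizontal edge from (1, 0) to (2, 0) gives τ = 0, and the edge from (0, 3) to (1, 0) gives τ = 3.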
Theorem 8. The roots of the general equation of degree d are a basis for the
solution space of the following system of linear partial differential equations:
∂^2 X/∂a_i ∂a_j = ∂^2 X/∂a_k ∂a_l   whenever i + j = k + l,   (9)

Σ_{i=0}^{d} i a_i ∂X/∂a_i = −X   and   Σ_{i=0}^{d} a_i ∂X/∂a_i = 0.   (10)
The meaning of the statement “are a basis for the solution space of” will
be explained at the end of this section. Let us first replace this statement by
“are solutions of” and prove the resulting weaker version of the theorem.
Proof. The two Euler equations (10) express the scaling invariance of the
roots. They are gotten by applying the operator d/dt to the identities
X^j + f'(X) · ∂X/∂a_j = 0.   (11)
Applying ∂/∂a_i to (11) gives
∂^2 X/∂a_i ∂a_j = −∂/∂a_i ( X^j / f'(X) )
   = X^j f'(X)^{−2} · ∂f'(X)/∂a_i − j X^{j−1} f'(X)^{−1} · ∂X/∂a_i.   (12)
Using (11) and the resulting identity ∂f'(X)/∂a_i = −(f''(X)/f'(X)) · X^i + i X^{i−1},
we can rewrite (12) as follows:
∂^2 X/∂a_i ∂a_j = −f''(X) X^{i+j} f'(X)^{−3} + (i + j) X^{i+j−1} f'(X)^{−2}.
This expression depends only on the sum of indices i+j. This proves (9).
We check the validity of our differential system for the case d = 2 and we
note that it characterizes the series expansions of the quadratic formula.
> X := solve(a0 + a1 * x + a2 * x^2, x)[1];
X := 1/2 (-a1 + (a1^2 - 4 a2 a0)^(1/2)) / a2
a5 x^5 + a4 x^4 + a3 x^3 + a2 x^2 + a1 x + a0 = 0.   (13)
Mayr's Theorem can be used to write down all possible Puiseux series solutions
to the general quintic (13). There are 16 = 2^{5−1} distinct expansions.
For instance, here is one of the 16 expansions of the five roots:
X1 = −[a0/a1],   X2 = −[a1/a2] + [a0/a1],   X3 = −[a2/a3] + [a1/a2],
X4 = −[a3/a4] + [a2/a3],   X5 = −[a4/a5] + [a3/a4].
Each bracket is a series having the monomial in the bracket as its first term:
[a0/a1] = a0/a1 + a0^2 a2/a1^3 − a0^3 a3/a1^4 + 2 a0^3 a2^2/a1^5 + a0^4 a4/a1^5
          − 5 a0^4 a2 a3/a1^6 − a0^5 a5/a1^6 + ···

[a1/a2] = a1/a2 + a1^2 a3/a2^3 − a1^3 a4/a2^4 − 3 a0 a1^2 a5/a2^4 + 2 a1^3 a3^2/a2^5
          + a1^4 a5/a2^5 − 5 a1^4 a3 a4/a2^6 + ···

[a2/a3] = a2/a3 − a0 a5/a3^2 − a1 a4/a3^2 + 2 a1 a2 a5/a3^3 + a2^2 a4/a3^3
          − a2^3 a5/a3^4 + 2 a2^3 a4^2/a3^5 + ···

[a3/a4] = a3/a4 − a2 a5/a4^2 + a3^2 a5/a4^3 + a1 a5^2/a4^3 − 3 a2 a3 a5^2/a4^4
          − a0 a5^3/a4^4 + 4 a1 a3 a5^3/a4^5 + ···

[a4/a5] = a4/a5
The last bracket is just a single Laurent monomial. The other four brackets
[a_{i−1}/a_i] can easily be written as an explicit sum over N^4. For instance,
where ξ runs over the five complex roots of the equation ξ^5 = −1, and

[a0^{1/5}/a5^{1/5}] = a0^{1/5}/a5^{1/5} − (1/25) a1 a4/(a0^{4/5} a5^{6/5}) − (1/25) a2 a3/(a0^{4/5} a5^{6/5})
          + (2/125) a1^2 a3/(a0^{9/5} a5^{6/5}) + (3/125) a2 a4^2/(a0^{4/5} a5^{11/5}) + ···

[a1/(a0^{3/5} a5^{2/5})] = a1/(a0^{3/5} a5^{2/5}) − (1/5) a3^2/(a0^{3/5} a5^{7/5}) − (2/5) a2 a4/(a0^{3/5} a5^{7/5})
          + (7/25) a3 a4^2/(a0^{3/5} a5^{12/5}) + (6/25) a1 a2 a3/(a0^{8/5} a5^{7/5}) + ···

[a2/(a0^{2/5} a5^{3/5})] = a2/(a0^{2/5} a5^{3/5}) − (1/5) a1^2/(a0^{7/5} a5^{3/5}) − (3/5) a3 a4/(a0^{2/5} a5^{8/5})
          + (6/25) a1 a2 a4/(a0^{7/5} a5^{8/5}) + (3/25) a1 a3^2/(a0^{7/5} a5^{8/5}) + ···

[a3/(a0^{1/5} a5^{4/5})] = a3/(a0^{1/5} a5^{4/5}) − (1/5) a1 a2/(a0^{6/5} a5^{4/5}) − (2/5) a4^2/(a0^{1/5} a5^{9/5})
          + (1/25) a1^3/(a0^{11/5} a5^{4/5}) + (4/25) a1 a3 a4/(a0^{6/5} a5^{9/5}) + ···
Each of these four series can be expressed as an explicit sum over the lattice
points in a 4-dimensional polyhedron. The general formula can be found
in Theorem 3.2 of Sturmfels (2000). That reference gives all 2^{d−1} distinct
Puiseux series expansions of the solutions of the general equation of degree d.
The system (9)-(10) is a special case of the hypergeometric differential
equations discussed in (Saito, Sturmfels and Takayama, 1999). More precisely,
it is the Gel'fand-Kapranov-Zelevinsky system with parameters (−1, 0)
associated with the integer matrix
A = ( 0 1 2 3 ··· n−1 n )
    ( 1 1 1 1 ···  1  1 ).
1.6 Exercises
(1) Describe the Jordan canonical form of the companion matrix Times_x.
What are the generalized eigenvectors of the endomorphism (2)?
(2) We define a unique cubic polynomial p(x) by four interpolation condi-
tions p(xi ) = yi for i = 0, 1, 2, 3. The discriminant of p(x) is a rational
function in x0 , x1 , x2 , x3 , y0 , y1 , y2 , y3. What is the denominator of this
rational function, and how many terms does the numerator have ?
(3) Create a symmetric 50 × 50-matrix whose entries are random integers
between −10 and 10 and compute the eigenvalues of your matrix.
(4) For which complex parameters α is the following system solvable ?
xd − α = x3 + x + 1 = 0.
(5) Consider the set of all 65, 536 polynomials of degree 15 whose coeffi-
cients are +1 or −1. Answer the following questions about this set:
a4 x^4 + a3 x^3 + a2 x^2 + a1 x + a0 = 0
(8) Compute all five Puiseux series solutions x(t) of the quintic equation
x^5 + t·x^4 + t^3·x^3 + t^6·x^2 + t^10·x + t^15 = 0
(9) Fix two real symmetric n × n-matrices A and B. Consider the set of
points (x, y) in the plane R2 such that all eigenvalues of the matrix
xA + yB are non-negative. Show that this set is closed and convex.
Does every closed convex semi-algebraic subset of R2 arise in this way ?
(10) Let α and β be integers and consider the following system of linear
differential equations for an unknown function X(a0 , a1 , a2 ):
2 Gröbner Bases of Zero-Dimensional Ideals
Suppose we are given polynomials f1, …, fm in Q[x1, …, xn] which are known
to have only finitely many common zeros in C^n. Then I = ⟨f1, …, fm⟩, the
ideal generated by these polynomials, is zero-dimensional. In this section we
demonstrate how Gröbner bases can be used to compute the zeros of I.
i3 : dim I, degree I
o3 = (0, 14)
i4 : gb I
o4 = | y2z-1/2xz2-yz2+1/2z3+13/60x2-1/12y2+7/60z2
x2z-xz2-1/2yz2+1/2z3+1/12x2-13/60y2-7/60z2
y3-3y2z+3yz2-z3-x2
xy2-2x2z-3y2z+3xz2+4yz2-3z3-7/6x2+5/6y2-1/6z2
x2y-xy2-x2z+y2z+xz2-yz2+1/3x2+1/3y2+1/3z2
x3-3x2y+3xy2-3y2z+3yz2-z3-x2-z2
z4+1/5xz2-1/5yz2+2/25z2
yz3-z4-13/20xz2-3/20yz2+3/10z3+2/75x2-4/75y2-7/300z2
xz3-2yz3+z4+29/20xz2+19/20yz2-9/10z3-8/75x2+2/15y2+7/300z2
xyz2-3/2y2z2+xz3+yz3-3/2z4+y2z-1/2xz2
-7/10yz2+1/5z3+13/60x2-1/12y2-1/12z2|
i5 : toString (x^10 % I)
o5 = -4/15625*x*z^2+4/15625*z^3-559/1171875*x^2
-94/1171875*y^2+26/1171875*z^2
i6 : R = S/I; basis R
The common generator of the elimination ideals is a polynomial of degree 8:
p(x) = x^8 + (6/25) x^6 + (17/625) x^4 + (8/15625) x^2.
This polynomial is not squarefree. Its squarefree part equals
p_red(x) = x^7 + (6/25) x^5 + (17/625) x^3 + (8/15625) x.
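The passage from p to its squarefree part can be checked without a computer algebra system; a small exact-arithmetic gcd sketch (helper names are ours) divides p by gcd(p, p'):

```python
from fractions import Fraction

def poly_rem(f, g):
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while len(f) >= len(g):
        if f[0] == 0:
            f.pop(0)
            continue
        q = f[0] / g[0]
        for i in range(len(g)):
            f[i] -= q * g[i]
        f.pop(0)
    while f and f[0] == 0:
        f.pop(0)
    return f

def poly_gcd(f, g):
    f, g = [Fraction(c) for c in f], [Fraction(c) for c in g]
    while g:
        f, g = g, poly_rem(f, g)
    return [c / f[0] for c in f]          # normalize to a monic gcd

def poly_div(f, g):
    # exact quotient of f by g
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    q = []
    while len(f) >= len(g):
        c = f[0] / g[0]
        q.append(c)
        for i in range(len(g)):
            f[i] -= c * g[i]
        f.pop(0)
    return q

def squarefree_part(p):
    dp = [c * (len(p) - 1 - i) for i, c in enumerate(p[:-1])]   # p'
    return poly_div(p, poly_gcd(p, dp))

F = Fraction
p = [F(1), 0, F(6, 25), 0, F(17, 625), 0, F(8, 15625), 0, 0]
assert squarefree_part(p) == [F(1), 0, F(6, 25), 0, F(17, 625), 0, F(8, 15625), 0]
```

The degree drops from 8 to 7, confirming that x = 0 is the only multiple root of p.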
Hence our ideal I is not radical. Using Theorem 10, we compute its radical:
The three given generators form a lexicographic Gröbner basis. We see that
V(I) has cardinality seven. The only real root is the origin. The other six
zeros of I in C 3 are not real. They are gotten by cyclically shifting
(x, y, z) = (−0.14233 − 0.35878i, 0.14233 − 0.35878i, 0.15188i)
and (x, y, z) = (−0.14233 + 0.35878i, 0.14233 + 0.35878i, −0.15188i).
M = ⟨x1 − p1, x2 − p2, …, xn − pn⟩ ⊂ S.
I ⊆ (I : M ) ⊆ (I : M 2 ) ⊆ (I : M 3 ) ⊆ · · ·
This sequence stabilizes with an ideal called the saturation
(I : M^∞) = { f ∈ S : ∃ m ∈ N : f · M^m ⊆ I }.
Proposition 11. The variety of (I : M^∞) equals V(I)\{p}.
Here is how we compute the ideal quotient and the saturation in Macaulay
2. We demonstrate this for the ideal in the previous section and p = (0, 0, 0):
i1 : R = QQ[x,y,z];
i2 : I = ideal( (x-y)^3-z^2, (z-x)^3-y^2, (y-z)^3-x^2 );
i3 : M = ideal( x , y, z );
i4 : gb (I : M)
o4 = | y2z-1/2xz2-yz2+1/2z3+13/60x2-1/12y2+7/60z2
xyz+3/4xz2+3/4yz2+1/20x2-1/20y2 x2z-xz2-1/2yz2+ ....
i5 : gb saturate(I,M)
o5 = | z2+1/5x-1/5y+2/25 y2-1/5x+1/5z+2/25
xy+xz+yz+1/25 x2+1/5y-1/5z+2/25 |
o7 = (6, 6, 6)
Proposition 12. The ring S/J is isomorphic to the local ring S_p/I_p under
the natural map x_i ↦ x_i. In particular, the multiplicity of p as a zero of I
equals the number of standard monomials for any Gröbner basis of J.
In our example, the local ideal J is particularly simple and the multiplicity
eight is obvious. Here is how the Macaulay 2 session continues:
i8 : J = ( I : saturate(I,M) )
2 2 2
o8 = ideal (z , y , x )
i9 : degree J
o9 = 8
I = J ∩ (I : M^∞).   (16)
Here J is the iterated ideal quotient in (15). This ideal is primary to the
maximal ideal M , that is, Rad(J) = M . We can now iterate by applying
this process to the ideal (I : M ∞ ), and this will eventually lead to the
primary decomposition of I. We shall return to this topic in later lectures.
For the ideal in our example, the decomposition (16) is already the pri-
mary decomposition when working over the field of rational numbers. It
equals
Note that the second ideal is maximal and hence prime in Q [x, y, z]. The
given generators are a Gröbner basis with leading terms underlined.
Gröbner basis of I is known. Let B denote the associated monomial basis for
S/I. Multiplication by any of the variables xi defines an endomorphism
S/I → S/I,   f ↦ x_i · f.   (17)
We write Ti for the d × d-matrix over Q which represents the linear map (17)
with respect to the basis B. The rows and columns of Ti are indexed by the
monomials in B. If xu , xv ∈ B then the entry of Ti in row xu and column
xv is the coefficient of xu in the normal form of xi · xv . We call Ti the i-th
companion matrix of the ideal I. It follows directly from the definition that
the companion matrices commute pairwise:
Ti · Tj = Tj · Ti for 1 ≤ i < j ≤ n.
The matrices Ti generate a commutative subalgebra of the non-commutative
ring of d × d-matrices, and this subalgebra is isomorphic to our ring
Q[T1, …, Tn] ≅ S/I,   Ti ↦ xi.
Theorem 13. The complex zeros of the ideal I are the vectors of joint eigenvalues
of the companion matrices T1, …, Tn, that is,
V(I) = { (λ1, …, λn) ∈ C^n : ∃ v ∈ C^d \{0} ∀ i : Ti · v = λi · v }.
Proof. Suppose that v is a non-zero complex vector such that Ti · v = λi · v
for all i. Then, for any polynomial p ∈ S,
p(T1 , . . . , Tn ) · v = p(λ1 , . . . , λn ) · v.
If p is in the ideal I then p(T1 , . . . , Tn ) is the zero matrix and we conclude
that p(λ1 , . . . , λn ) = 0. Hence V(I) contains the set on the right hand side.
We prove the converse under the hypothesis that I is a radical ideal. (The
general case is left to the reader). Let λ = (λ1, …, λn) be any zero of I.
There exists a polynomial q ∈ S ⊗ C such that q(λ) = 1 and q vanishes at
all points in V(I)\{λ}. Then xi · q = λi · q holds on V(I), hence (xi − λi) · q
lies in the radical ideal I. Let v be the non-zero vector representing the
element q of S/I ⊗ C. Then v is a joint eigenvector with joint eigenvalue
λ.
If I is a radical ideal then we can form a square invertible matrix V
whose columns are the eigenvectors v described above. Then V −1 · Ti · V is
a diagonal matrix whose entries are the i-th coordinates of all the zeros of I.
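These facts are easy to experiment with on a small example. The sketch below uses the toy radical ideal ⟨x^2 − 1, y^2 − 4⟩ (our own example, not the ideal from the text), whose quotient ring has the monomial basis B = {1, x, y, xy}:

```python
from fractions import Fraction

# Companion matrices for I = <x^2 - 1, y^2 - 4> with basis B = [1, x, y, xy];
# T[u][v] = coefficient of B[u] in the normal form of (variable * B[v]).
Tx = [[0, 1, 0, 0],   # x*1 = x, x*x = 1, x*y = xy, x*xy = y
      [1, 0, 0, 0],
      [0, 0, 0, 1],
      [0, 0, 1, 0]]
Ty = [[0, 0, 4, 0],   # y*1 = y, y*x = xy, y*y = 4, y*xy = 4x
      [0, 0, 0, 4],
      [1, 0, 0, 0],
      [0, 1, 0, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in A]

# the companion matrices commute pairwise
assert matmul(Tx, Ty) == matmul(Ty, Tx)

# one joint eigenvector for each of the four zeros (s, t) of I
for s in (1, -1):
    for t in (2, -2):
        v = [1, s, Fraction(1, t), Fraction(s, t)]
        assert matvec(Tx, v) == [s * c for c in v]
        assert matvec(Ty, v) == [t * c for c in v]
```

Since I is radical, the four joint eigenvectors form an invertible matrix V, and V^{-1} Tx V and V^{-1} Ty V are diagonal, exactly as described above.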
Corollary 14. The companion matrices T1 , . . . , Tn can be simultaneously
diagonalized if I is a radical ideal.
As an example consider the Gröbner basis given at the end of the last
section. The given ideal is a prime ideal in Q [x, y, z] having degree d = 6.
We determine the three companion matrices Tx, Ty and Tz using maple:
> with(Groebner):
> print(cat(T,v),T);
> od:
[ -2 -1 -2 ]
[0 -- -- 0 --- 0 ]
[ 25 25 125 ]
[ ]
[ -1 ]
[1 0 0 0 -- 1/25]
[ 25 ]
[ ]
Tx, [0 -1/5 0 0 1/25 1/25]
[ ]
[ -2 ]
[0 1/5 0 0 -- 1/25]
[ 25 ]
[ ]
[0 0 -1 1 0 0 ]
[ ]
[0 0 -1 0 -1/5 0 ]
[ -1 -2 ]
[0 -- -- 0 0 2/125]
[ 25 25 ]
[ ]
[0 0 1/5 0 1/25 1/25 ]
[ ]
[ -1 ]
[1 0 0 0 1/25 -- ]
Ty, [ 25 ]
[ ]
[ -2 ]
[0 0 -1/5 0 1/25 -- ]
[ 25 ]
[ ]
[0 -1 0 0 0 1/5 ]
[0 -1 0 1 0 0 ]
[ -2 -1 ]
[0 0 0 -- 1/125 --- ]
[ 25 125 ]
[ ]
[ -2 ]
[0 0 0 -1/5 -- 1/25]
[ 25 ]
[ ]
[ -2 ]
Tz, [0 0 0 1/5 1/25 -- ]
[ 25 ]
[ ]
[ -1 -1 ]
[1 0 0 0 -- -- ]
[ 25 25 ]
[ ]
[0 1 0 0 -1/5 1/5 ]
[0 0 1 0 -1/5 1/5 ]
The matrices Tx, Ty and Tz commute pairwise and they can be simultaneously
diagonalized. The entries on the diagonal are the six complex zeros. We
invite the reader to compute the common basis of eigenvectors using matlab.
Theorem 15. The signature of the trace form Bh equals the number of real
roots p of I with h(p) > 0 minus the number of real roots p of I with h(p) < 0.
Corollary 16. The number of real roots of I equals the signature of B1 .
> B1 := array([],1..6,1..6):
> for j from 1 to 6 do
> for i from 1 to 6 do
> B1[i,j] := 0:
> for k from 1 to 6 do
> B1[i,j] := B1[i,j] + coeff(coeff(coeff(
> normalf(B[i]*B[j]*B[k], GB, tdeg(x,y,z)),x,
> degree(B[k],x)), y, degree(B[k],y)),z, degree(B[k],z)):
> od:
> od:
> od:
> print(B1);
[ -2 -2 ]
[6 0 0 0 -- -- ]
[ 25 25 ]
[ ]
[ -12 -2 -2 -2 ]
[0 --- -- -- -- 0 ]
[ 25 25 25 25 ]
[ ]
[ -2 -12 -2 ]
[0 -- --- -- 0 2/25]
[ 25 25 25 ]
[ ]
[ -2 -2 -12 -2 ]
[0 -- -- --- 2/25 -- ]
[ 25 25 25 25 ]
[ ]
[-2 -2 34 -16 ]
[-- -- 0 2/25 --- --- ]
[25 25 625 625 ]
[ ]
[-2 -2 -16 34 ]
[-- 0 2/25 -- --- --- ]
[25 25 625 625 ]
> charpoly(B1,z);
4380672 32768
+ -------- z - ------
48828125 9765625
> fsolve(%);
Here the matrix B1 has three positive eigenvalues and three negative eigen-
values, so the trace form has signature zero. This confirms our earlier finding
that these equations have no real zeros. We note that we can read off the
signature of B1 directly from the characteristic polynomial. Namely, the
characteristic polynomial has three sign changes in its coefficient sequence.
Using the following result, which appears in Exercise 5 on page 67 of (Cox,
Little & O’Shea, 1998), we infer that there are three positive real eigenvalues
and this implies that the signature of B1 is zero.
It is instructive to examine the trace form for the case of one polynomial
in one variable. Consider the principal ideal
I = ⟨ a_d x^d + a_{d−1} x^{d−1} + ··· + a_2 x^2 + a_1 x + a_0 ⟩ ⊂ S = Q[x].
We consider the traces of successive powers of the companion matrix:
b_i := trace(Times_x^i) = Σ_{u ∈ V(I)} u^i.
By considering sign alternations among these expressions in b0 , b1 , . . . , b6 , we
get explicit conditions for the general quartic to have zero, one, two, three,
or four real roots respectively. These are semialgebraic conditions. This
means the conditions are Boolean combinations of polynomial inequalities
in the five indeterminates a0, a1, a2, a3, a4. In particular, all four zeros of
the general quartic are real if and only if the trace form is positive definite.
Recall that a symmetric matrix is positive definite if and only if its leading
principal minors are positive. Hence the quartic has four real roots if and only if
b0 > 0 and b0 b2 − b1^2 > 0 and b0 b2 b4 − b0 b3^2 − b1^2 b4 + 2 b1 b2 b3 − b2^3 > 0 and
2 b0 b5 b3 b4 − b0 b5^2 b2 + b0 b2 b4 b6 − b0 b3^2 b6 − b0 b4^3 + b5^2 b1^2 − 2 b5 b1 b2 b4 − 2 b5 b1 b3^2
+ 2 b5 b2^2 b3 − b1^2 b4 b6 + 2 b1 b2 b3 b6 + 2 b1 b3 b4^2 − b2^3 b6 + b2^2 b4^2 − 3 b2 b3^2 b4 + b3^4 > 0.
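This criterion can be verified computationally. The following sketch (helper names are ours) obtains the b_i as traces of powers of the companion matrix and tests the leading principal minors of the trace form for two monic quartics:

```python
from fractions import Fraction

def companion(coeffs):
    # coeffs = [c0, ..., c_{d-1}] of a monic p(x) = x^d + c_{d-1} x^{d-1} + ... + c0
    d = len(coeffs)
    C = [[Fraction(0)] * d for _ in range(d)]
    for i in range(d - 1):
        C[i][i + 1] = Fraction(1)
    for j, c in enumerate(coeffs):
        C[d - 1][j] = Fraction(-c)
    return C

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det(M):
    # exact determinant via Gaussian elimination over Q
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            d = -d
        d *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return d

def power_sums(coeffs, upto):
    # b_i = trace(Times_x^i) = sum of the i-th powers of the roots
    C = companion(coeffs)
    d = len(coeffs)
    P = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
    b = [Fraction(d)]
    for _ in range(upto):
        P = matmul(P, C)
        b.append(sum(P[i][i] for i in range(d)))
    return b

def all_quartic_roots_real(coeffs):
    b = power_sums(coeffs, 6)
    H = [[b[i + j] for j in range(4)] for i in range(4)]   # the trace form
    return all(det([row[:k] for row in H[:k]]) > 0 for k in range(1, 5))

assert all_quartic_roots_real([24, -50, 35, -10])   # (x-1)(x-2)(x-3)(x-4)
assert not all_quartic_roots_real([-1, 0, 0, 0])    # x^4 - 1 has only two real roots
```

For (x−1)(x−2)(x−3)(x−4) the four minors are 4, 20, 80 and 144, all positive, as the criterion predicts.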
2.5 Exercises
(1) Let A = (aij ) be a non-singular n × n-matrix whose entries are positive
integers. How many complex solutions do the following equations have:
∏_{j=1}^{n} x_j^{a_{1j}} = ∏_{j=1}^{n} x_j^{a_{2j}} = ··· = ∏_{j=1}^{n} x_j^{a_{nj}} = 1.
(4) For any two positive integers m, n, find an explicit radical ideal I in
Q[x1, …, xn] and a term order ≺ such that in_≺(I) = ⟨x1, x2, …, xn⟩^m.
(5) Fix the monomial ideal M = ⟨x, y⟩^3 = ⟨x^3, x^2 y, x y^2, y^3⟩ and compute
its companion matrices Tx, Ty. Describe all polynomial ideals in Q[x, y]
which are within distance ε = 0.0001 from M, in the sense that the
companion matrices are ε-close to Tx, Ty in your favorite matrix norm.
(6) Does every zero-dimensional ideal in Q[x, y] have a radical ideal in all of
its ε-neighborhoods? How about zero-dimensional ideals in Q[x, y, z]?
(7) How many distinct real vectors (x, y, z) ∈ R3 satisfy the equations
x^3 + z = 2y^2,   y^3 + x = 2z^2,   z^3 + y = 2x^2 ?
(8) Pick eight random points in the real projective plane. Compute the
12 nodal cubic curves passing through your points. Can you find eight
points such that all 12 cubic polynomials have real coefficients ?
Determine the irreducible factors of f in R[x, y], and also in C[x, y].
(10) Consider a polynomial system which has infinitely many complex zeros
but only finitely many of them have all their coordinates distinct. How
would you compute those zeros with distinct coordinates ?
(11) Does there exist a Laurent polynomial in C[t, t^{−1}] of the form
3 Bernstein’s Theorem and Fewnomials
The Gröbner basis methods described in the previous lecture apply to arbitrary
systems of polynomial equations. They are so general that they are
frequently not the best choice when dealing with specific classes of polynomial
systems. A situation encountered in many applications is a system of n
sparse polynomial equations in n variables which has finitely many roots.
Algebraically, this situation is special because we are dealing with a complete
intersection, and sparsity allows us to use polyhedral techniques for counting
and computing the zeros. This lecture gives a gentle introduction to sparse
polynomial systems by explaining some basic techniques for n = 2.
where the exponents ui and vi are non-negative integers and the coefficients ai
are non-zero rationals. Its total degree deg(f ) is the maximum of the numbers
u1 + v1 , . . . , um + vm . The following theorem gives an upper bound on the
number of common complex zeros of two polynomials in two unknowns.
Theorem 18. (Bézout’s Theorem) Consider two polynomial equations in
two unknowns: g(x, y) = h(x, y) = 0. If this system has only finitely many
zeros (x, y) ∈ C 2 , then the number of zeros is at most deg(g) · deg(h).
Bézout's Theorem is best possible in the sense that almost all polynomial
systems have deg(g) · deg(h) distinct solutions. An explicit example is gotten
by taking g and h as products of linear polynomials u1 x + u2 y + u3. More
precisely, there exists a polynomial in the coefficients of g and h such that
whenever this polynomial is non-zero then g and h have the expected number
of zeros. The first exercise below concerns finding such a polynomial.
A drawback of Bézout’s Theorem is that it yields little information for
polynomials that are sparse. For example, consider the two polynomials
These two polynomials have precisely four distinct zeros (x, y) ∈ C 2 for
generic choices of coefficients ai and bj . Here “generic” means that a certain
polynomial in the coefficients ai , bj , called the discriminant, should be non-
zero. The discriminant of the system (20) is the following expression
4a71 a3 b32 b33 + a61 a22 b22 b43 − 2a61 a2 a4 b32 b33 + a61 a24 b42 b23 + 22a51 a2 a23 b1 b22 b33
+22a51 a23 a4 b1 b32 b23 + 22a41 a32 a3 b1 b2 b43 + 18a1 a2 a3 a54 b21 b42 − 30a41 a2 a3 a24 b1 b32 b23
+a41 a43 b21 b22 b23 + 22a41 a3 a34 b1 b42 b3 + 4a31 a52 b1 b53 − 14a31 a42 a4 b1 b2 b43
+10a31 a32 a24 b1 b22 b33 + 22a31 a22 a33 b21 b2 b33 + 10a31 a22 a34 b1 b32 b23 + 116a31 a2 a33 a4 b21 b22 b23
−14a31 a2 a44 b1 b42 b3 + 22a31 a33 a24 b21 b32 b3 + 4a31 a54 b1 b52 + a21 a42 a23 b21 b43
+94a21 a32 a23 a4 b21 b2 b33 −318a21 a22 a23 a24 b21 b22 b23 + 396a1 a32 a3 a34 b21 b22 b23 + a21 a23 a44 b21 b42
+94a21 a2 a23 a34 b21 b32 b3 + 4a21 a2 a53 b31 b2 b23 + 4a21 a53 a4 b31 b22 b3 + 18a1 a52 a3 a4 b21 b43
−216a1 a42 a3 a24 b21 b2 b33 + 96a1 a22 a43 a4 b31 b2 b23 − 216a1 a22 a3 a44 b21 b32 b3 −27a62 a24 b21 b43
−30a41 a22 a3 a4 b1 b22 b33 + 96a1 a2 a43 a24 b31 b22 b3 + 108a52 a34 b21 b2 b33
+4a42 a33 a4 b31 b33 − 162a42 a44 b21 b22 b23 − 132a32 a33 a24 b31 b2 b23 + 108a32 a54 b21 b32 b3
−132a22 a33 a34 b31 b22 b3 − 27a22 a64 b21 b42 + 16a2 a63 a4 b41 b2 b3 + 4a2 a33 a44 b31 b32
If this polynomial of degree 14 is non-zero, then the system (20) has four
distinct complex zeros. This discriminant is computed in maple as follows.
g := a1 + a2 * x + a3 * x*y + a4 * y;
h := b1 + b2 * x^2 * y + b3 * x * y^2;
R := resultant(g,h,x):
S := factor( resultant(R,diff(R,y),y) ):
discriminant := op( nops(S), S);
are again polytopes of various dimensions between 0 and d − 1. The 0-
dimensional faces are called vertices, the 1-dimensional faces are called edges,
and the (d − 1)-dimensional faces are called facets. For instance, the cube
has 8 vertices, 12 edges and 6 facets. If d = 2 then the edges coincide with
the facets. A 2-dimensional polytope is called a polygon.
Consider the polynomial f (x, y) in (19). Each term xui y vi appearing in
f (x, y) can be regarded as a lattice point (ui , vi ) in the plane R 2 . The convex
hull of all these points is called the Newton polygon of f (x, y). In symbols,
New(f) := conv{(u1, v1), (u2, v2), …, (um, vm)}.
If P and Q are any two polygons then we define their mixed area as
M(P, Q) := area(P + Q) − area(P) − area(Q).
For instance, the mixed area of the two Newton polygons in (20) equals
M(P, Q) = M(New(g), New(h)) = 13/2 − 1 − 3/2 = 4.
The correctness of this computation can be seen in the following diagram:
This figure shows a subdivision of P + Q into five pieces: a translate of P ,
a translate of Q and three parallelograms. The mixed area is the sum of the
areas of the three parallelograms, which is four. This number coincides with
the number of common zeros of g and h. This is not an accident, but it is an
instance of a general theorem due to David Bernstein (1975). We abbreviate
C∗ := C\{0}. The set (C∗)^2 of pairs (x, y) with x ≠ 0 and y ≠ 0 is a group
under multiplication, called the two-dimensional algebraic torus.
Actually, this assertion is valid for Laurent polynomials, which means that
the exponents in our polynomials (19) can be any integers, possibly negative.
Bernstein’s Theorem implies the following combinatorial fact about lattice
polygons. If P and Q are lattice polygons (i.e., the vertices of P and Q have
integer coordinates), then M(P, Q) is a non-negative integer.
We remark that Bézout's Theorem follows as a special case from Bernstein's
Theorem. Namely, if g and h are general polynomials of degree d and e
respectively, then their Newton polygons are the triangles
The areas of these triangles are d^2/2, e^2/2, (d + e)^2/2, and hence
M(P, Q) = (d + e)^2/2 − d^2/2 − e^2/2 = d · e.
Hence two general plane curves of degree d and e meet in d · e points.
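Mixed areas are straightforward to compute exactly. The sketch below (helper names are ours) evaluates area(P + Q) − area(P) − area(Q) with a convex hull and the shoelace formula, reproducing the value 4 from the example above:

```python
from fractions import Fraction

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull(points):
    # Andrew's monotone chain convex hull
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def chain(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out
    lower, upper = chain(pts), chain(pts[::-1])
    return lower[:-1] + upper[:-1]

def area(points):
    h = hull(points)
    s = sum(h[i][0] * h[(i + 1) % len(h)][1] - h[(i + 1) % len(h)][0] * h[i][1]
            for i in range(len(h)))
    return Fraction(abs(s), 2)

def mixed_area(P, Q):
    # Minkowski sum followed by the inclusion-exclusion of areas
    PQ = [(p[0] + q[0], p[1] + q[1]) for p in P for q in Q]
    return area(PQ) - area(P) - area(Q)

# Newton polygons of g = a1 + a2 x + a3 xy + a4 y and h = b1 + b2 x^2 y + b3 x y^2
P = [(0, 0), (1, 0), (1, 1), (0, 1)]
Q = [(0, 0), (2, 1), (1, 2)]
assert mixed_area(P, Q) == 4
```

Running the same routine on the two Bézout triangles with d = 2, e = 3 returns 6 = d·e, in agreement with the computation above.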
We shall present a proof of Bernstein's Theorem. This proof is algorithmic
in the sense that it tells us how to approximate all the zeros numerically.
The steps in this proof form the foundation for the method of polyhedral
homotopies for solving polynomial systems. This is an active area of research,
with lots of exciting progress by work of T.Y. Li, Jan Verschelde and others.
We proceed in three steps. The first deals with an easy special case.
3.2 Zero-dimensional binomial systems
A binomial is a polynomial with two terms. We first prove Theorem 1.1 in
the case when g and h are binomials. After multiplying or dividing both
binomials by suitable scalars and powers of the variables, we may assume
that our given equations are
This is accomplished using the Hermite normal form algorithm of integer lin-
ear algebra. The invertible matrix U triangularizes our system of equations:
g = h = 0
⇐⇒ x^{a1} y^{b1} = c1 and x^{a2} y^{b2} = c2
⇐⇒ (x^{a1} y^{b1})^{u11} (x^{a2} y^{b2})^{u12} = c1^{u11} c2^{u12} and (x^{a1} y^{b1})^{u21} (x^{a2} y^{b2})^{u22} = c1^{u21} c2^{u22}
⇐⇒ x^{r1} y^{r3} = c1^{u11} c2^{u12} and y^{r2} = c1^{u21} c2^{u22}.
This equals the mixed area M(New(g), New(h)), since the two Newton
polygons are just segments, so that area(New(g)) = area(New(h)) = 0.
This proves Bernstein’s Theorem for binomials. Moreover, it gives a simple
algorithm for finding all zeros in this case.
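For two binomial equations x^{a1} y^{b1} = c1 and x^{a2} y^{b2} = c2 the reduction above needs only a 2 × 2 Hermite normal form. The sketch below (helper names are ours, and it assumes the gcd of the first column is positive) triangularizes the exponent matrix with a unimodular U, so the number of torus solutions is |r1 · r2| = |det A|:

```python
def ext_gcd(a, b):
    # returns (g, s, t) with s*a + t*b = g = gcd(a, b)
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def hermite_2x2(A):
    # unimodular U with R = U*A upper triangular (Hermite normal form step)
    (a, b), (c, d) = A
    g, s, t = ext_gcd(a, c)                    # s*a + t*c = g
    U = [[s, t], [-c // g, a // g]]            # det U = (s*a + t*c)/g = 1
    R = [[g, s * b + t * d],
         [0, (a * d - b * c) // g]]            # equals U * A
    return U, R

# x^3 y = c1,  x y^2 = c2  ->  exponent matrix A with det A = 5
A = [[3, 1], [1, 2]]
U, R = hermite_2x2(A)
assert R[1][0] == 0 and R[0][0] * R[1][1] == 5     # five torus solutions
assert U[0][0] * U[1][1] - U[0][1] * U[1][0] == 1  # U is invertible over Z
```

The triangularized system reads x^{r1} y^{r3} = c1^{u11} c2^{u12} and y^{r2} = c1^{u21} c2^{u22}, as in the display above; here r1·r2 = 5 matches the mixed area of the two Newton segments.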
The method described here clearly works also for n binomial equations in
n variables, in which case we are to compute the Hermite normal form of an
integer n × n-matrix. We note that the Hermite normal form computation
is similar but not identical to the computation of a lexicographic Gröbner
basis. We illustrate this in maple for a system with n = 3 having 20 zeros:
[-c2^13 c1^3 + c3^8 z^10, c2^15 c1^2 y^2 - c3^9 z^8, c2^6 c1^3 x - c3^4 z^7 y]
In a neighborhood of the origin in the complex plane, each branch of our
algebraic function can be written as follows:
x(t) = x0 · tu + higher order terms in t,
y(t) = y0 · tv + higher order terms in t,
where x0 , y0 are non-zero complex numbers and u, v are rational numbers.
To determine the exponents u and v we substitute x = x(t) and y = y(t)
into the equations gt (x, y) = ht (x, y) = 0. In our example this gives
g_t(x(t), y(t)) = a1 t^{ν1} + a2 x0 t^{u+ν2} + a3 x0 y0 t^{u+v+ν3} + a4 y0 t^{v+ν4} + ··· ,
h_t(x(t), y(t)) = b1 t^{ω1} + b2 x0^2 y0 t^{2u+v+ω2} + b3 x0 y0^2 t^{u+2v+ω3} + ··· .
In order for (1.6) to be a root the term of lowest order must vanish. Since
x0 and y0 are chosen to be non-zero, this is possible only if the lowest order
in t is attained by at least two different terms. This implies the following
two piecewise-linear equations for the indeterminate vector (u, v) ∈ Q 2 :
min{ ν1, u + ν2, u + v + ν3, v + ν4 }  is attained twice,
min{ ω1, 2u + v + ω2, u + 2v + ω3 }  is attained twice.
As in Lecture 1, each of these translates into a disjunction of linear equations
and inequalities. For instance, the second "min-equation" translates into
ω1 = 2u + v + ω2 ≤ u + 2v + ω3
or ω1 = u + 2v + ω3 ≤ 2u + v + ω2
or 2u + v + ω2 = u + 2v + ω3 ≤ ω1.
It is now easy to state what we mean by the νi and ωj being sufficiently
generic. It means that “Min” is attained twice but not thrice. More precisely,
at every solution (u, v) of the two piecewise-linear equations, precisely two
of the linear forms attain the minimum value in each of the two equations.
One issue in the algorithm for Bernstein's Theorem is to choose powers of
t that are small but yet generic. In our example, the choice ν1 = ν2 = ν3 =
ν4 = ω3 = 0, ω1 = ω2 = 1 is generic. Here the two polynomial equations are
g_t(x, y) = a1 + a2 x + a3 xy + a4 y,   h_t(x, y) = b1 t + b2 x^2 y t + b3 x y^2,
and the corresponding two piecewise-linear equations are
min( 0, u, u + v, v ) and min( 1, 2u + v + 1, u + 2v ) are attained twice.
This system has precisely three solutions:
(u, v) ∈ { (1, 0), (0, 1/2), (−1, 0) }.
For each of these pairs (u, v), we now obtain a binomial system g′(x0, y0) =
h′(x0, y0) = 0 which expresses the fact that the lowest terms in gt(x(t), y(t))
and ht(x(t), y(t)) do indeed vanish. The three binomial systems are
• g′(x0, y0) = a1 + a4 y0 and h′(x0, y0) = b1 + b3 x0 y0^2 for (u, v) = (1, 0).
• g′(x0, y0) = a1 + a2 x0 and h′(x0, y0) = b1 + b3 x0 y0^2 for (u, v) = (0, 1/2).
• g′(x0, y0) = a2 x0 + a3 x0 y0 and h′(x0, y0) = b2 x0^2 y0 + b3 x0 y0^2 for
(u, v) = (−1, 0).
These binomial systems have one, two and one root respectively. For in-
stance, the unique Puiseux series solution for (u, v) = (1, 0) has

    x0 = −a4^2 b1 / (a1^2 b3)   and   y0 = −a1 / a4 .
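The cancellation of the lowest-order terms is easy to verify numerically. The sketch below uses a hypothetical coefficient choice (a1, a2, a3, a4) = (1, 2, 1, 1) and (b1, b2, b3) = (1, 1, 1), substitutes the leading terms of the branch for (u, v) = (1, 0), and confirms that gt is of order t and ht of even higher order, rather than of orders 1 and t:

```python
# Branch test for (u, v) = (1, 0): x(t) = x0*t, y(t) = y0, where
# y0 solves g' = a1 + a4*y0 = 0 and x0 solves h' = b1 + b3*x0*y0^2 = 0.
# Hypothetical coefficients (not from the text):
a1, a2, a3, a4 = 1.0, 2.0, 1.0, 1.0
b1, b2, b3 = 1.0, 1.0, 1.0

y0 = -a1 / a4
x0 = -a4**2 * b1 / (a1**2 * b3)

def g_t(x, y):
    return a1 + a2*x + a3*x*y + a4*y

def h_t(x, y, t):
    return b1*t + b2*x*x*y*t + b3*x*y*y

t = 1e-3
x, y = x0 * t, y0          # leading terms of the Puiseux branch
print(g_t(x, y))           # order t: the constant term a1 + a4*y0 cancels
print(h_t(x, y, t))        # order t^3: the t-coefficient b1 + b3*x0*y0^2 cancels
```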
Hence our algebraic function has a total of four branches. If one
wishes more information about the four branches, one can now compute
further terms in the Puiseux expansions of these branches. For instance,
               a4^2 b1          2 a4^3 b1^2 (a1 a3 − a2 a4)
  x(t)  =  − ---------- · t  +  ---------------------------- · t^2
               a1^2 b3                   a1^5 b3^2

               a4^4 b1^2 (a1^3 a4 b2 − 5 a1^2 a3^2 b1 + 12 a1 a2 a3 a4 b1 − 7 a2^2 a4^2 b1)
            + ------------------------------------------------------------------------------ · t^3 + . . .
                                              a1^8 b3^3

               a1     b1 (a1 a3 − a2 a4)        a4 b1^2 (a1 a3 − a2 a4)(a1 a3 − 2 a2 a4)
  y(t)  =  − ---- + --------------------- · t + ----------------------------------------- · t^2 + . . . .
               a4          a1^2 b3                              a1^5 b3^2
For details on computing multivariate Puiseux series see (McDonald 1995).
The Minkowski sum P + Q is a polytope in R^3. By a facet of P + Q we mean
a two-dimensional face. A facet F of P + Q is a lower facet if there is a vector
(u, v) ∈ R^2 such that (u, v, 1) is an inward pointing normal vector to P + Q
at F. Our genericity condition for the integers νi and ωj is equivalent to the following:
(1) The Minkowski sum P + Q is a 3-dimensional polytope.
(2) Every lower facet of P + Q has the form F1 + F2 where either
(a) F1 is a vertex of P and F2 is a facet of Q, or
(b) F1 is an edge of P and F2 is an edge of Q, or
(c) F1 is a facet of P and F2 is a vertex of Q.
As an example consider our lifting from before, ν1 = ν2 = ν3 = ν4 = ω3 = 0
and ω1 = ω2 = 1. It meets the requirements (1) and (2). The polytope P
is a quadrangle and Q is a triangle. But they lie in non-parallel planes in R^3.
Their Minkowski sum P + Q is a 3-dimensional polytope with 10 vertices.
The union of all lower facets of P + Q is called the lower hull of the
polytope P + Q. Algebraically speaking, the lower hull is the subset of all
points in P + Q at which some linear functional of the form (x1, x2, x3) ↦
u x1 + v x2 + x3 attains its minimum. Geometrically speaking, the lower hull
is that part of the boundary of P + Q which is visible from below. Let
π : R^3 → R^2 denote the projection onto the first two coordinates. Then
the map π restricts to a bijection from the lower hull onto New(g) + New(h).
The set of polygons ∆ := {π(F) : F lower facet of P + Q} defines a sub-
division of New(g) + New(h). A subdivision ∆ constructed by this process,
for some choice of νi and ωj, is called a mixed subdivision of the given Newton
polygons. The polygons π(F) are the cells of the mixed subdivision ∆.
Every cell of a mixed subdivision ∆ has the form F1 + F2 where either
(a) F1 = {(ui, vi)} where x^ui y^vi appears in g and F2 is the projection of
a facet of Q, or
(b) F1 is the projection of an edge of P and F2 is the projection of an
edge of Q, or
(c) F1 is the projection of a facet of P and F2 = {(ũj, ṽj)} where x^ũj y^ṽj
appears in h.
The cells of type (b) are called the mixed cells of ∆.
Lemma 20. Let ∆ be any mixed subdivision for g and h. Then the sum of
the areas of the mixed cells in ∆ equals the mixed area M(New(g), New(h)).
Proof. Let γ and δ be arbitrary positive reals and consider the polytope
γP + δQ in R^3. Its projection into the plane R^2 equals the polygon
γ·New(g) + δ·New(h).
Let A(γ, δ) denote the area of this polygon. This polygon can be subdivided
into cells γF1 + δF2 where F1 + F2 runs over all cells of ∆. Note that
area(γF1 + δF2) equals δ^2 · area(F1 + F2) if F1 + F2 is a cell of type (a),
γδ · area(F1 + F2) if it is a mixed cell, and γ^2 · area(F1 + F2) if it has type (c).
The sum of these areas equals A(γ, δ). Therefore A(γ, δ) = A(a) · δ^2 + A(b) ·
γδ + A(c) · γ^2, where A(b) is the sum of the areas of the mixed cells in ∆. We
conclude A(b) = A(1, 1) − A(1, 0) − A(0, 1) = M(New(g), New(h)).
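The identity A(b) = A(1, 1) − A(1, 0) − A(0, 1) also gives a direct way to compute the mixed area by machine. A minimal sketch (standard convex-hull and shoelace routines; the supports of our example g = a1 + a2 x + a3 xy + a4 y and h = b1 + b2 x^2 y + b3 x y^2 are hard-coded):

```python
from fractions import Fraction

def cross(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def hull(points):
    """Convex hull in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def area(points):
    """Area of the convex hull of the given lattice points (shoelace)."""
    h = hull(points)
    s = sum(h[i][0]*h[(i+1) % len(h)][1] - h[(i+1) % len(h)][0]*h[i][1]
            for i in range(len(h)))
    return Fraction(abs(s), 2)

def mixed_area(P, Q):
    """M(P, Q) = area(P + Q) - area(P) - area(Q)."""
    PQ = [(p[0]+q[0], p[1]+q[1]) for p in P for q in Q]
    return area(PQ) - area(P) - area(Q)

New_g = [(0, 0), (1, 0), (1, 1), (0, 1)]   # support of g
New_h = [(0, 0), (2, 1), (1, 2)]           # support of h
print(mixed_area(New_g, New_h))            # 4, matching the four branches
```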
The following lemma makes the connection with the previous section.
This implies that the valid choices of (u, v) are in bijection with the mixed
cells in the mixed subdivision ∆. Each mixed cell of ∆ is expressed uniquely
as the Minkowski sum of a Newton segment New(g′) and a Newton segment
New(h′), where g′ is a binomial consisting of two terms of g, and h′ is a
binomial consisting of two terms of h. Thus each mixed cell in ∆ can be
identified with a system of two binomial equations g′(x, y) = h′(x, y) = 0.
In this situation we can rewrite our system as follows:
where a and b are suitable rational numbers. This implies the following lemma.
Lemma 22. Let (u, v) be as in Lemma 21. The corresponding choices of
(x0, y0) ∈ (C^*)^2 are the solutions of the binomial system g′(x0, y0) =
h′(x0, y0) = 0.
Question 23. What is the maximum number of isolated real roots of any
system of two polynomial equations in two variables each having four terms ?
where ai , bj are arbitrary real numbers and ui , vj , ũi , ṽj are arbitrary integers.
To stay consistent with our earlier discussion, we shall count only solutions
(x, y) in (R^*)^2, that is, we require that both x and y are non-zero reals.
There is an obvious lower bound for the number in Question 23: thirty-six.
It is easy to write down a system of the above form that has 36 real roots:
Each of the polynomials f and g depends on one variable only, and each has 6
non-zero real roots in that variable. Therefore the system f(x) = g(y) = 0
has 36 distinct isolated roots in (R^*)^2. Note also that the expansions of f
and g have exactly four terms each, as required.
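One concrete choice (a hypothetical one, not necessarily the system the text has in mind) is f(x) = x^6 − 6x^4 + 11x^2 − 6 = (x^2 − 1)(x^2 − 2)(x^2 − 3), which has four terms and the six non-zero real roots ±1, ±√2, ±√3; taking the same polynomial in y gives 36 roots. A quick check:

```python
import math

def mul(p, q):
    """Multiply polynomials given as ascending coefficient lists."""
    r = [0]*(len(p)+len(q)-1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i+j] += a*b
    return r

# f = (x^2 - 1)(x^2 - 2)(x^2 - 3) = x^6 - 6x^4 + 11x^2 - 6
f = mul(mul([-1, 0, 1], [-2, 0, 1]), [-3, 0, 1])
print(f)                              # [-6, 0, 11, 0, -6, 0, 1]
print(sum(1 for c in f if c != 0))    # 4 terms, as required

roots = [s*math.sqrt(k) for k in (1, 2, 3) for s in (1, -1)]
assert all(abs(sum(c*r**i for i, c in enumerate(f))) < 1e-9 for r in roots)
# The system f(x) = f(y) = 0 therefore has 6 * 6 = 36 roots in (R*)^2.
```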
A priori it is not clear whether Question 23 even makes sense: why should
such a maximum exist ? It certainly does not exist if we consider complex
zeros, because one can get arbitrarily many complex zeros by increasing the
degrees of the equations. The point is that such an unbounded increase of
roots is impossible over the real numbers. This was proved by Khovanskii
(1980). He found a bound on the number of real roots which does not depend
on the degrees of the given equations. We state the version for positive roots.
We claim that the number of real zeros in the positive orthant is at most

    2^(m choose 2) · ( 1 + Σ_{i=1}^{n} deg(Fi) )^m · Π_{i=1}^{d} deg(Fi).
Now, the punch line is that each of the n + 1 equations in (2.4) – in-
cluding the Jacobian – can be expressed in terms of only m monomials
x^a1 · t, x^a2, . . . , x^am. Therefore we can bound the number of bifurcation
points by the induction hypothesis, and we are done.
This was only to give the flavor of how Theorem 2.3 is proved. There are
combinatorial and topological fine points which need most careful attention.
The reader will find the complete proof in (Khovanskii 1980), in (Khovanskii
1991) or in (Benedetti & Risler 1990).
Khovanskii's Theorem implies an upper bound for the root count sug-
gested in Question 23. After multiplying one of the given equations by a
suitable monomial, we may assume that our system has seven distinct mono-
mials. Substituting n = 2 and m = 7 into Khovanskii's formula, we see that
there are at most 2^(7 choose 2) · (2 + 1)^7 = 4,586,471,424 roots in the positive quad-
rant. By summing over all four quadrants, we conclude that the maximum
in Question 23 lies between 36 and 18,345,885,696 = 2^2 · 2^(7 choose 2) · (2 + 1)^7.
The gap between 36 and 18, 345, 885, 696 is frustratingly large. Experts agree
that the truth should be closer to the lower bound than to the upper bound,
but at the moment nobody knows the exact value. Could it be 36 ?
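The arithmetic behind these two bounds is quickly verified:

```python
from math import comb

# Khovanskii's bound with n = 2 equations and m = 7 distinct monomials:
positive_quadrant = 2**comb(7, 2) * (2 + 1)**7
print(positive_quadrant)      # 4586471424

# Summing over all four quadrants contributes the factor 2^2:
all_quadrants = 2**2 * positive_quadrant
print(all_quadrants)          # 18345885696
```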
The original motivation for Khovanskii's work was the following conjec-
ture from the 1970's due to Kouchnirenko. Consider any system of n poly-
nomial equations in n unknowns, where the i-th equation has at most mi
terms. The number of isolated real roots in (R+)^n of such a system is at most
(m1 − 1)(m2 − 1) · · · (mn − 1). This number is attained by equations in distinct
variables, as was demonstrated by our example with n = 2, m1 = m2 = 4,
which has (m1 − 1)(m2 − 1) = 9 real zeros in (R+)^2.
Remarkably, Kouchnirenko's conjecture remained open for many years
after Khovanskii had developed his theory of fewnomials, which includes the
above theorem. Only two years ago, Bertrand Haas (2000) found the follow-
ing counterexample to Kouchnirenko's conjecture in the case n = 2, m1 =
m2 = 4. Proving the following proposition from scratch is a nice challenge.
It was proved by Li, Rojas and Wang (2001) that the lower bound pro-
vided by Haas’ example coincides with the upper bound for two trinomials.
Theorem 26. (Li, Rojas and Wang) A system of two trinomials
with ai , bj ∈ R and ui , vj , ũi, ṽj ∈ Z has at most five positive real zeros.
3.6 Exercises
(1) Consider the intersection of a general conic and a general cubic curve
a1 x^2 + a2 xy + a3 y^2 + a4 x + a5 y + a6 = 0
b1 x^3 + b2 x^2 y + b3 xy^2 + b4 y^3 + b5 x^2 + b6 xy + b7 y^2 + b8 x + b9 y + b10 = 0
f (x1 , x2 , x3 , x4 ) = (x1 −x2 )(x1 −x3 )(x1 −x4 )(x2 −x3 )(x2 −x4 )(x3 −x4 ).
α1 x^3 y + α2 xy^3 = α3 x + α4 y and β1 x^2 y^2 + β2 xy = β3 x^2 + β4 y^2 ?
Can your bound be attained with all real vectors (x, y) ∈ (R∗ )2 ?
(4) Find the first three terms in each of the four Puiseux series solutions
(x(t), y(t)) of the two equations
(7) Show that Kouchnirenko’s Conjecture is true for d = 2 and m1 = 2.
(8) Prove Proposition 25. Please use any computer program of your choice.
(9) Can Haas’ example be modified to show that the answer to Question
23 is strictly larger than 36 ?
4 Resultants
Elimination theory deals with the problem of eliminating one or more vari-
ables from a system of polynomial equations, thus reducing the given problem
to a smaller problem in fewer variables. For instance, if we wish to solve
a0 + a1 x + a2 x^2 = b0 + b1 x + b2 x^2 = 0,
f = a0 + a1 x + a2 x^2 + · · · + a_{d−1} x^{d−1} + a_d x^d,
g = b0 + b1 x + b2 x^2 + · · · + b_{e−1} x^{e−1} + b_e x^e.
Theorem 27. There exists a unique (up to sign) irreducible polynomial
Res in Z[a0, a1, . . . , ad, b0, b1, . . . , be] which vanishes whenever the polynomi-
als f(x) and g(x) have a common zero.
Here and throughout this section “common zeros” may lie in any alge-
braically closed field (say, C ) which contains the field to which we specialize
the coefficients ai and bj of the given polynomials (say, Q ). Note that a poly-
nomial with integer coefficients being “irreducible” implies that the coeffi-
cients are relatively prime. The resultant Res = Resx (f, g) can be expressed
as the determinant of the Sylvester matrix
                      [ a0           b0           ]
                      [ a1   a0      b1   b0      ]
                      [ a2   a1  .   b2   b1  .   ]
                      [ .    a2   .  .    b2   .  ]
    Resx (f, g) = det [ ad   .   a0  be   .   b0  ]        (24)
                      [      ad  a1       be  b1  ]
                      [          .            .   ]
                      [          ad           be  ]

with e columns of coefficients of f followed by d columns of coefficients of g,
where the blank spaces are filled with zeroes. See the left formula in (22).
There are many other useful formulas for the resultant. For instance,
suppose that the roots of f are ξ1, . . . , ξd and the roots of g are η1, . . . , ηe.
Then we have the following product formulas:

    Resx (f, g) = ad^e be^d Π_{i=1}^{d} Π_{j=1}^{e} (ξi − ηj) = ad^e Π_{i=1}^{d} g(ξi) = (−1)^{de} be^d Π_{j=1}^{e} f(ηj).
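The product formulas and the Sylvester determinant are easy to play off against each other. In the sketch below (hypothetical data: f = (x − 1)(x − 2) and g = (x − 4)(x − 5)(x − 6), so that the roots are known), all three expressions agree:

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            M[i], M[p] = M[p], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i+1, n):
            fac = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= fac * M[i][c]
    return d

def sylvester(f, g):
    """Sylvester matrix; f, g are coefficient lists in descending order."""
    d, e = len(f) - 1, len(g) - 1
    rows = [[0]*i + f + [0]*(e-1-i) for i in range(e)]
    rows += [[0]*i + g + [0]*(d-1-i) for i in range(d)]
    return rows

f = [1, -3, 2]            # (x-1)(x-2),      roots xi  = 1, 2
g = [1, -15, 74, -120]    # (x-4)(x-5)(x-6), roots eta = 4, 5, 6

res = det(sylvester(f, g))
fval = lambda x: x**2 - 3*x + 2
gval = lambda x: x**3 - 15*x**2 + 74*x - 120
# a_d = b_e = 1 here and de = 6 is even, so both product formulas agree:
print(res)                          # 1440
print(gval(1) * gval(2))            # 1440
print(fval(4) * fval(5) * fval(6))  # 1440
```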
following polynomial in two variables, which is called the Bézoutian:

    B(x, y) = ( f(x) g(y) − f(y) g(x) ) / (x − y) = Σ_{i,j=0}^{d−1} cij x^i y^j.
Form the symmetric d × d-matrix C = (cij ). Its entries cij are sums of
brackets [kl] = ak bl − al bk . The case d = 2 appears in (22) on the right.
Theorem 29. (Bézout resultant) The determinant of C equals ±Resx (f, g).
Proof. The resultant Resx (f, g) is an irreducible polynomial of degree 2d in
a0 , . . . , ad , b0 , . . . , bd . The determinant of C is also a polynomial of degree 2d.
We will show that the zero set of Resx (f, g) is contained in the zero set of
det(C). This implies that the two polynomials are equal up to a constant.
Looking at leading terms one finds the constant to be either 1 or −1.
If (a0, . . . , ad, b0, . . . , bd) is in the zero set of Resx (f, g) then the sys-
tem f = g = 0 has a complex solution x0. Then B(x0, y) is identically
zero as a polynomial in y. This implies that the non-zero complex vector
(1, x0, x0^2, . . . , x0^{d−1}) lies in the kernel of C, and therefore det(C) = 0.
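Theorem 29 can be tested by machine. The sketch below assembles the Bézout matrix from the brackets [kl] = ak bl − al bk (the entry c_pq is −Σ [i, p+q+1−i] over the admissible i, which follows from expanding B(x, y) directly) and compares its determinant with the Sylvester determinant for a pair of hypothetical cubics:

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            M[i], M[p] = M[p], M[i]
            d = -d
        d *= M[i][i]
        for r in range(i+1, n):
            fac = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= fac * M[i][c]
    return d

def bezout_matrix(a, b):
    """Bezout matrix C = (c_pq) of f = sum a_i x^i, g = sum b_i x^i."""
    d = len(a) - 1
    def bracket(k, l):
        return a[k]*b[l] - a[l]*b[k]
    return [[-sum(bracket(i, p+q+1-i)
                  for i in range(min(p, q)+1) if p+q+1-i <= d)
             for q in range(d)] for p in range(d)]

def sylvester_matrix(a, b):
    """Sylvester matrix for two polynomials of equal degree d."""
    d = len(a) - 1
    fd, gd = a[::-1], b[::-1]          # descending coefficients
    rows = [[0]*i + fd + [0]*(d-1-i) for i in range(d)]
    rows += [[0]*i + gd + [0]*(d-1-i) for i in range(d)]
    return rows

a = [5, 2, 0, 1]   # f = x^3 + 2x + 5   (ascending coefficients)
b = [7, 0, 1, 1]   # g = x^3 + x^2 + 7

dB = det(bezout_matrix(a, b))
dS = det(sylvester_matrix(a, b))
assert abs(dB) == abs(dS) != 0     # det C = +/- Res_x(f, g)
```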
The 3 × 3-determinants in the middle of (22) show that one can also use
mixtures of Bézout matrices and Sylvester matrices. Such hybrid formulas for
resultants are very important in higher-dimensional problems as we shall see
below. Let us first show three simple applications of the univariate resultant.
Example. (Intersecting two algebraic curves in the real plane)
Consider two polynomials in two variables, say,
f = x^4 + y^4 − 1 and g = x^5 y^2 − 4 x^3 y^3 + x^2 y^5 − 1.
Hence the two curves have four intersection points, with these y-coordinates.
By the symmetry in f and g, the same values are also the possible x-
coordinates. By trying out (numerically) all 16 conceivable x-y-combinations,
we find that the following four pairs are the real solutions to our equations:
[the four real solution pairs and a subsequent maple output are garbled in this copy and omitted]
minimal polynomials f, g ∈ Q[x]. These are the unique (up to scaling)
irreducible polynomials satisfying f(α) = 0 and g(β) = 0. Our problem
is to find the minimal polynomials p and q for their sum α + β and their
product α · β respectively. The answer is given by the following two formulas:

    p(z) = Resx ( f(x), g(z − x) )   and   q(z) = Resx ( f(x), g(z/x) · x^{deg(g)} ).
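For instance, for α = √2 and β = √3 with f(x) = x^2 − 2 and g(x) = x^2 − 3, the first resultant can be evaluated through the product formula Resx = Π g(z − ξi) over the roots ξi = ±√2 of the monic polynomial f. A numerical sketch:

```python
import math

def mul(p, q):
    """Multiply polynomials given as ascending coefficient lists."""
    r = [0.0]*(len(p)+len(q)-1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i+j] += a*b
    return r

s = math.sqrt(2)
# g(z - s) = (z - s)^2 - 3 = z^2 - 2sz + (s^2 - 3), as ascending lists:
g_minus = [s*s - 3, -2*s, 1]
g_plus  = [s*s - 3,  2*s, 1]

p = [round(c) for c in mul(g_minus, g_plus)]
print(p)   # [1, 0, -10, 0, 1], i.e. p(z) = z^4 - 10 z^2 + 1

z = math.sqrt(2) + math.sqrt(3)
assert abs(sum(c * z**i for i, c in enumerate(p))) < 1e-9
```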
We assume that the i-th equation is homogeneous of degree di > 0, that is,

    fi = Σ_{j1+···+jn = di} c^(i)_{j1,...,jn} · x1^j1 · · · xn^jn,

where the sum is over all (n+di−1 choose di) monomials of degree di in x1, . . . , xn. Note
that the zero vector (0, 0, . . . , 0) is always a solution of (25). Our question is
to determine under which condition there is a non-zero solution. As a simple
example we consider the case of linear equations (n = 3, d1 = d2 = d3 = 1):
    f1 = c^(1)_{100} x1 + c^(1)_{010} x2 + c^(1)_{001} x3 = 0
    f2 = c^(2)_{100} x1 + c^(2)_{010} x2 + c^(2)_{001} x3 = 0
    f3 = c^(3)_{100} x1 + c^(3)_{010} x2 + c^(3)_{001} x3 = 0.
This system has a non-zero solution if and only if the determinant is zero:

          ( c^(1)_{100}  c^(1)_{010}  c^(1)_{001} )
    det   ( c^(2)_{100}  c^(2)_{010}  c^(2)_{001} )  =  0.
          ( c^(3)_{100}  c^(3)_{010}  c^(3)_{001} )
Returning to the general case, we regard each coefficient c^(i)_{j1,...,jn} of each
polynomial fi as an unknown, and we write Z[c] for the ring of polynomials
with integer coefficients in these variables. The total number of variables in
Z[c] equals N = Σ_{i=1}^{n} (n+di−1 choose di). For instance, the 3 × 3-determinant in the
example above may be regarded as a cubic polynomial in Z[c]. The following
theorem characterizes the classical multivariate resultant Res = Res_{d1···dn}.
Consider the projection ψ : C^N × P^{n−1} → C^N, (u, x) ↦ u. It follows
from the Main Theorem of Elimination Theory, (Eisenbud 1994, Theorem
14.1), that ψ(I) is an irreducible subvariety of C^N which is defined over Q
as well. Every point c in C^N can be identified with a particular polynomial
system f1 = · · · = fn = 0. That system has a nonzero root if and only if c
lies in the subvariety ψ(I). For every such c we have
The two inequalities follow respectively from parts (2) and (1) of Theorem
7 in Section I.6.3 of (Shafarevich 1977). We now choose c = (f1, . . . , fn) as
follows. Let f1, . . . , fn−1 be any equations as in (25) which have only finitely
many zeros in P^{n−1}. Then choose fn which vanishes at exactly one of these
zeros, say y ∈ P^{n−1}. Hence ψ^{−1}(c) = {(c, y)}, a zero-dimensional variety.
For this particular choice of c both inequalities hold with equality. This
implies dim(ψ(I)) = N − 1.
We have shown that the image of I under ψ is an irreducible hypersurface
in C N , which is defined over Z. Hence there exists an irreducible polynomial
Res ∈ Z[c], unique up to sign, whose zero set equals ψ(I). By construction,
this polynomial Res(u) satisfies properties (a) and (b) of Theorem 30.
Part (c) of the theorem is derived from Bézout’s Theorem.
Various determinantal formulas are known for the multivariate resultant.
The most useful formulas are mixtures of Bézout matrices and Sylvester ma-
trices like the expression in the middle of (23). Exact division-free formulas
of this kind are available for n ≤ 4. We discuss such formulas for n = 3.
The first non-trivial case is d1 = d2 = d3 = 2. Here the problem is to
eliminate two variables x and y from a system of three quadratic forms
F = a0 x^2 + a1 xy + a2 y^2 + a3 xz + a4 yz + a5 z^2,
G = b0 x^2 + b1 xy + b2 y^2 + b3 xz + b4 yz + b5 z^2,
H = c0 x^2 + c1 xy + c2 y^2 + c3 xz + c4 yz + c5 z^2.
We next compute the partial derivatives of J. They are quadratic as well:
∂J/∂x = u0 x^2 + u1 xy + u2 y^2 + u3 xz + u4 yz + u5 z^2,
∂J/∂y = v0 x^2 + v1 xy + v2 y^2 + v3 xz + v4 yz + v5 z^2,
∂J/∂z = w0 x^2 + w1 xy + w2 y^2 + w3 xz + w4 yz + w5 z^2.
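This construction is easy to automate. The sketch below computes J as the determinant of the Jacobian matrix of (F, G, H), takes its three partial derivatives, and forms the 6 × 6 coefficient matrix of F, G, H, ∂J/∂x, ∂J/∂y, ∂J/∂z in the basis x^2, xy, y^2, xz, yz, z^2. Assuming, as the discussion indicates, that this determinant is a non-zero constant multiple of the resultant, it must vanish exactly when the three quadrics have a common zero in P^2; both test cases were chosen so the expected value can be checked by hand:

```python
from fractions import Fraction

MONS = [(2,0,0), (1,1,0), (0,2,0), (1,0,1), (0,1,1), (0,0,2)]

def padd(p, q):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + c
    return {m: c for m, c in r.items() if c != 0}

def pmul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = (m1[0]+m2[0], m1[1]+m2[1], m1[2]+m2[2])
            r[m] = r.get(m, 0) + c1*c2
    return {m: c for m, c in r.items() if c != 0}

def pdiff(p, v):
    r = {}
    for m, c in p.items():
        if m[v] > 0:
            mm = list(m); mm[v] -= 1
            r[tuple(mm)] = c * m[v]
    return r

def det3(M):
    """Determinant of a 3x3 matrix with polynomial entries (Leibniz)."""
    r = {}
    for perm, sgn in [((0,1,2),1), ((1,2,0),1), ((2,0,1),1),
                      ((0,2,1),-1), ((2,1,0),-1), ((1,0,2),-1)]:
        t = {(0,0,0): sgn}
        for row in range(3):
            t = pmul(t, M[row][perm[row]])
        r = padd(r, t)
    return r

def det(M):
    M = [[Fraction(x) for x in row] for row in M]
    n, d = len(M), Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            M[i], M[p] = M[p], M[i]; d = -d
        d *= M[i][i]
        for r in range(i+1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return d

def quadric(coeffs):
    """a0 x^2 + a1 xy + a2 y^2 + a3 xz + a4 yz + a5 z^2 as a dict."""
    return {m: c for m, c in zip(MONS, coeffs) if c != 0}

def res222_det(F, G, H):
    J = det3([[pdiff(P, v) for v in range(3)] for P in (F, G, H)])
    rows = [F, G, H, pdiff(J, 0), pdiff(J, 1), pdiff(J, 2)]
    return det([[P.get(m, 0) for m in MONS] for P in rows])

# F = x^2, G = y^2, H = z^2 have no common zero in P^2: determinant -512.
print(res222_det(quadric([1,0,0,0,0,0]), quadric([0,0,1,0,0,0]),
                 quadric([0,0,0,0,0,1])))        # -512

# x^2 - z^2, xy - 2z^2, y^2 - 4z^2 share the zero (1 : 2 : 1): determinant 0.
print(res222_det(quadric([1,0,0,0,0,-1]), quadric([0,1,0,0,0,-2]),
                 quadric([0,0,1,0,0,-4])))       # 0
```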
For any such monomial, we choose arbitrary representations
(d − i − 1) + (d − j − 1) + (d − k − 1) = 3d − 3 − (i + j + k) = 2d − 2.
f = a0 x + a1 y + a2 xy,   g = b0 + b1 xy + b2 y^2,   h = c0 + c1 xy + c2 y^2.
always have the common root (1 : 0 : 0), regardless of what the coefficients
ai , bj , ck are. In other words, the three given quadrics always intersect in the
projective plane. But they generally do not intersect in the affine plane C 2 .
In order for this to happen, the following polynomial in the coefficients must
vanish:
a21 b2 b21 c20 c1 − 2a21 b2 b1 b0 c0 c21 + a21 b2 b20 c31 − a21 b31 c20 c2 + 2a21 b21 b0 c0 c1 c2
−a21 b1 b20 c21 c2 − 2a1 a0 b22 b1 c20 c1 + 2a1 a0 b22 b0 c0 c21 + 2a1 a0 b2 b21 c20 c2
−2a1 a0 b2 b20 c21 c2 − 2a1 a0 b21 b0 c0 c22 + 2a1 a0 b1 b20 c1 c22 + a20 b32 c20 c1 − a20 b22 b1 c20 c2
−2a20 b22 b0 c0 c1 c2 + 2a20 b2 b1 b0 c0 c22 + a20 b2 b20 c1 c22 − a20 b1 b20 c32 − a22 b22 b1 c30
+a22 b22 b0 c20 c1 + 2a22 b2 b1 b0 c20 c2 − 2a22 b2 b20 c0 c1 c2 − a22 b1 b20 c0 c22 + a22 b30 c1 c22 .
We may assume that L_{0,1,...,n} = Z^n. Let rank(J) denote the rank of the
lattice L_J. A subcollection of supports {Ai}_{i∈I} is said to be essential if
Lemma 31. The projective variety Z̄ is irreducible and defined over Q .
It is possible that Z̄ is not a hypersurface but has codimension ≥ 2. This
is where the condition of the supports being essential comes in. It is known
that the codimension of Z̄ in P^{m0−1} × · · · × P^{mn−1} equals the maximum of
the numbers #(I) − rank(I), where I runs over all subsets of {0, 1, . . . , n}.
We now define the sparse resultant Res. If codim(Z̄) = 1 then Res is the
unique (up to sign) irreducible polynomial in Z[. . . , cia , . . .] which vanishes on
the hypersurface Z̄. If codim(Z̄) ≥ 2 then Res is defined to be the constant
1. We have the following result. Theorem 32 is a generalization of Theorem
30, in the same way that Bernstein’s Theorem generalizes Bézout’s Theorem.
Theorem 32. Suppose that {A0, A1, . . . , An} is essential, and let Qi denote
the convex hull of Ai. For all i ∈ {0, . . . , n} the degree of Res in the i-th group
of variables {cia : a ∈ Ai} is a positive integer, equal to the mixed volume

    M(Q0, . . . , Qi−1, Qi+1, . . . , Qn) = Σ_{J ⊆ {0,...,i−1,i+1,...,n}} (−1)^{#(J)} · vol( Σ_{j∈J} Qj ).
In our example the triangles Q2 and Q3 coincide, and we have M(Q2, Q3) = 2
and M(Q1, Q2) = M(Q1, Q3) = 3.
This explains why the sparse resultant (4.2) is quadratic in (a0, a1, a2) and
homogeneous of degree 3 in (b0, b1, b2) and in (c0, c1, c2) respectively.
One of the central problems in elimination theory is to find “nice” deter-
minantal formulas for resultants. The best one can hope for is a Sylvester-type
formula, that is, a square matrix whose non-zero entries are the coefficients
of the given equations and whose determinant equals precisely the resultant.
The archetypical example of such a formula is (24). Sylvester-type formulas
do not exist in general, even for the classical multivariate resultant.
If a Sylvester-type formula is not available or too hard to find, the next
best thing is to construct a “reasonably small” square matrix whose deter-
minant is a non-zero multiple of the resultant under consideration. For the
sparse resultant such a construction was given in (Canny and Emiris 1995)
and (Sturmfels 1994). A Canny-Emiris matrix for our example is
             y^2   y^3   xy^3  y^4   xy^4  xy^2  x^2y^2 x^2y^3  y     xy
   y f     [ a1    0     0     0     0     a2    0      0       0     a0 ]
   y^2 f   [ 0     a1    a2    0     0     a0    0      0       0     0  ]
   xy^2 f  [ 0     0     a1    0     0     0     a0     a2      0     0  ]
   y^2 g   [ b0    0     b1    b2    0     0     0      0       0     0  ]
   xy^2 g  [ 0     0     0     0     b2    b0    0      b1      0     0  ]
   y g     [ 0     b2    0     0     0     b1    0      0       b0    0  ]
   xy g    [ 0     0     b2    0     0     0     b1     0       0     b0 ]
   xy^2 h  [ 0     0     0     0     c2    c0    0      c1      0     0  ]
   y h     [ 0     c2    0     0     0     c1    0      0       c0    0  ]
   xy h    [ 0     0     c2    0     0     0     c1     0       0     c0 ]
The determinant of this matrix equals a1 b2 times the sparse resultant (4.2).
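The vanishing property of this matrix can be checked directly: each row is the coefficient vector of a monomial multiple of f, g or h, so if the coefficients are specialized so that all three polynomials vanish at (x, y) = (1, 1) (a hypothetical choice below: each coefficient triple sums to zero), then every row sum is zero, the all-ones vector lies in the kernel, and the determinant vanishes:

```python
from fractions import Fraction

M = [  # rows y*f, y^2*f, xy^2*f, y^2*g, xy^2*g, y*g, xy*g, xy^2*h, y*h, xy*h
    ["a1", 0,    0,    0,    0,    "a2", 0,    0,    0,    "a0"],
    [0,    "a1", "a2", 0,    0,    "a0", 0,    0,    0,    0   ],
    [0,    0,    "a1", 0,    0,    0,    "a0", "a2", 0,    0   ],
    ["b0", 0,    "b1", "b2", 0,    0,    0,    0,    0,    0   ],
    [0,    0,    0,    0,    "b2", "b0", 0,    "b1", 0,    0   ],
    [0,    "b2", 0,    0,    0,    "b1", 0,    0,    "b0", 0   ],
    [0,    0,    "b2", 0,    0,    0,    "b1", 0,    0,    "b0"],
    [0,    0,    0,    0,    "c2", "c0", 0,    "c1", 0,    0   ],
    [0,    "c2", 0,    0,    0,    "c1", 0,    0,    "c0", 0   ],
    [0,    0,    "c2", 0,    0,    0,    "c1", 0,    0,    "c0"],
]

def det(N):
    N = [[Fraction(x) for x in row] for row in N]
    n, d = len(N), Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if N[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            N[i], N[p] = N[p], N[i]; d = -d
        d *= N[i][i]
        for r in range(i+1, n):
            f = N[r][i] / N[i][i]
            for c in range(i, n):
                N[r][c] -= f * N[i][c]
    return d

# f(1,1) = a0 + a1 + a2, g(1,1) = b0 + b1 + b2, h(1,1) = c0 + c1 + c2:
# hypothetical values making (1, 1) a common root of f, g, h.
vals = {"a0": 1, "a1": 1, "a2": -2,
        "b0": 1, "b1": 1, "b2": -2,
        "c0": 1, "c1": 2, "c2": -3}
N = [[vals.get(e, 0) for e in row] for row in M]

assert all(sum(row) == 0 for row in N)  # every row polynomial vanishes at (1,1)
print(det(N))   # 0, as a1*b2*Res must vanish when a common root exists
```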
The structure of this 10 × 10-matrix can be understood as follows. Form
the product f gh and expand it into monomials in x and y. A certain com-
binatorial rule selects 10 out of the 15 monomials appearing in f gh. The
columns are indexed by these 10 monomials. Suppose the i-th column is
indexed by the monomial x^j y^k. Next there is a second combinatorial rule
which selects a monomial multiple of one of the input equations f, g or h
such that this multiple contains x^j y^k in its expansion. The i-th row is in-
dexed by that polynomial. Finally the (i, j)-entry contains the coefficient
of the j-th column monomial in the i-th row polynomial. This construction
implies that the matrix has non-zero entries along the main diagonal. The
two combinatorial rules mentioned in the previous paragraph are based on
the geometric construction of a mixed subdivision of the Newton polytopes.
The main difficulty overcome by the Canny-Emiris formula is this: If
one sets up a matrix like the one above just by “playing around” then most
likely its determinant will vanish (try it), unless there is a good reason why it
shouldn’t vanish. Now the key idea is this: a big unknown polynomial (such
as Res) will be non-zero if one can ensure that its initial monomial (with
respect to some term order) is non-zero.
Consider the lexicographic term order induced by the variable ordering
a1 > a0 > a2 > b2 > b1 > b0 > c0 > c1 > c2. The 24 monomials of Res
are listed in this order above. Each of the 10! permutations contributes a
(possibly) non-zero term to the expansion of the determinant of the Canny-
Emiris matrix. There will undoubtedly be some cancellation. However, the
unique largest monomial (in the above term order) appears only once, namely,
on the main diagonal. This guarantees that the determinant is a non-zero
polynomial. Note that the product of the diagonal elements equals a1 b2
times the leading monomial.
An explicit combinatorial construction for all possible initial monomials
(with respect to any term order) of the sparse resultant is given in (Sturmfels
1993). It is shown there that for any such initial monomial there exists a
Canny-Emiris matrix which has that monomial on its main diagonal.
A := A0 = A1 = · · · = An ⊂ Z^n.
In this situation, the sparse resultant Res is the Chow form of the projec-
tive toric variety X_A which is parametrically given by the vector of monomials
( x^a : a ∈ A ). Chow forms play a central role in elimination theory, and it
is of great importance to find determinantal formulas for Chow forms of fre-
quently appearing projective varieties. Significant progress in this direction
has been made in the recent work of Eisenbud, Floystad and Schreyer on exterior
syzygies and the Bernstein-Gelfand-Gelfand correspondence. Khetan
(2002) has applied these techniques to give an explicit determinantal for-
mula of mixed Bézout-Sylvester type for the Chow form of any toric surface
or toric threefold. This provides a very practical technique for eliminating
two variables from three equations or three variables from four equations.
We describe Khetan’s formula for an example. Consider the following
unmixed system of three equations in two unknowns:
f = a1 + a2 x + a3 y + a4 xy + a5 x^2 y + a6 xy^2,
g = b1 + b2 x + b3 y + b4 xy + b5 x^2 y + b6 xy^2,
h = c1 + c2 x + c3 y + c4 xy + c5 x^2 y + c6 xy^2.
4.5 Exercises