Solving Polynomial Systems

SOLVING SYSTEMS OF POLYNOMIAL EQUATIONS
Bernd Sturmfels
Department of Mathematics
University of California at Berkeley
Berkeley CA 94720, USA
[email protected]
May 17, 2002
Howdy readers,
These are the lecture notes for ten lectures to be given at the CBMS Conference at Texas A & M University, College Station, during the week of May 20-24, 2002. Details about this conference are posted at the web site http://www.math.tamu.edu/conferences/cbms/
These notes are still unpolished and surely full of little bugs and omissions. Hopefully there are no major errors. I would greatly appreciate your comments on this material, including typos, missing commas and all that. Please e-mail your comments before June 9, 2002, to the e-mail address given above. Many thanks in advance.
Bernd
Polynomials in One Variable
The study of systems of polynomial equations in many variables requires a good understanding of what can be said about one polynomial equation in one variable. The purpose of this lecture is to provide some basic tools on this matter. We shall consider the problem of how to compute and how to represent the zeros of a general polynomial of degree d in one variable x:
$$p(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_2 x^2 + a_1 x + a_0. \qquad (1)$$
1.1
The Fundamental Theorem of Algebra
We begin by assuming that the coefficients $a_i$ lie in the field $\mathbb{Q}$ of rational numbers, with $a_d \neq 0$, where the variable x ranges over the field $\mathbb{C}$ of complex numbers. Our starting point is the fact that $\mathbb{C}$ is algebraically closed.
Theorem 1. (Fundamental Theorem of Algebra) The polynomial p(x) has d roots, counting multiplicities, in the field $\mathbb{C}$ of complex numbers.
If the degree d is four or less, then the roots are functions of the coefficients which can be expressed in terms of radicals. The command solve in maple will produce these familiar expressions for us:
> solve( a2 * x^2 + a1 * x + a0, x );
1/2*(-a1+(a1^2-4*a2*a0)^(1/2))/a2, 1/2*(-a1-(a1^2-4*a2*a0)^(1/2))/a2
> lprint( solve( a3 * x^3 + a2 * x^2 + a1 * x + a0, x )[1] );
1/6/a3*(36*a1*a2*a3-108*a0*a3^2-8*a2^3+12*3^(1/2)*(4*a1^3*a3 -a1^2*a2^2-18*a1*a2*a3*a0+27*a0^2*a3^2+4*a0*a2^3)^(1/2)*a3) ^(1/3)+2/3*(-3*a1*a3+a2^2)/a3/(36*a1*a2*a3-108*a0*a3^2-8*a2^3 +12*3^(1/2)*(4*a1^3*a3-a1^2*a2^2-18*a1*a2*a3*a0+27*a0^2*a3^2 +4*a0*a2^3)^(1/2)*a3)^(1/3)-1/3*a2/a3
The polynomial p(x) has d distinct roots if and only if its discriminant is nonzero. Can you spot the discriminant of the cubic equation in the previous maple output? In general, the discriminant is computed from the resultant of p(x) and its first derivative p'(x) as follows:
$$\mathrm{discr}_x(p(x)) = \frac{1}{a_d} \cdot \mathrm{res}_x\bigl(p(x), p'(x)\bigr).$$
This is an irreducible polynomial in the coefficients $a_0, a_1, \ldots, a_d$. It follows from Sylvester's matrix for the resultant that the discriminant is a homogeneous polynomial of degree $2d-2$. Here is the discriminant of a quartic:
> f := a4 * x^4 + a3 * x^3 + a2 * x^2 + a1 * x + a0 :
> lprint(resultant(f,diff(f,x),x)/a4);
-192*a4^2*a0^2*a3*a1-6*a4*a0*a3^2*a1^2+144*a4*a0^2*a2*a3^2 +144*a4^2*a0*a2*a1^2+18*a4*a3*a1^3*a2+a2^2*a3^2*a1^2 -4*a2^3*a3^2*a0+256*a4^3*a0^3-27*a4^2*a1^4-128*a4^2*a0^2*a2^2 -4*a3^3*a1^3+16*a4*a2^4*a0-4*a4*a2^3*a1^2-27*a3^4*a0^2 -80*a4*a3*a1*a2^2*a0+18*a3^3*a1*a2*a0
This sextic is the determinant of the following 7×7-matrix divided by a4:
> with(linalg):
> sylvester(f,diff(f,x),x);
[  a4     a3     a2     a1     a0     0      0  ]
[  0      a4     a3     a2     a1     a0     0  ]
[  0      0      a4     a3     a2     a1     a0 ]
[ 4 a4   3 a3   2 a2    a1     0      0      0  ]
[  0     4 a4   3 a3   2 a2    a1     0      0  ]
[  0      0     4 a4   3 a3   2 a2    a1     0  ]
[  0      0      0     4 a4   3 a3   2 a2    a1 ]
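For readers without maple, here is a minimal Python sketch of the same construction for polynomials with numeric rational coefficients: it builds Sylvester's matrix of p(x) and p'(x) and takes an exact determinant over the rationals. The helper names (`derivative`, `sylvester`, `det`, `discriminant`) are our own, coefficient lists run from $a_d$ down to $a_0$, and the symbolic computation above is not reproduced. With the normalization used in the text the quadratic case gives $-(a_1^2 - 4a_2a_0)$, so signs may differ from other conventions.

```python
from fractions import Fraction


def derivative(p):
    """Derivative of p; coefficient lists run from the highest degree down."""
    d = len(p) - 1
    return [c * (d - i) for i, c in enumerate(p[:-1])]


def sylvester(f, g):
    """Sylvester matrix: deg(g) shifted rows of f, then deg(f) shifted rows of g."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = [[0] * i + list(f) + [0] * (size - m - 1 - i) for i in range(n)]
    rows += [[0] * i + list(g) + [0] * (size - n - 1 - i) for i in range(m)]
    return rows


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result


def discriminant(p):
    """res_x(p, p') / a_d, the normalization used in the text."""
    return det(sylvester(p, derivative(p))) / p[0]


print(discriminant([1, -3, 2]))     # x^2 - 3x + 2, roots 1 and 2: prints -1
print(discriminant([1, 0, -3, 2]))  # (x - 1)^2 (x + 2), a double root: prints 0
```

For a quartic, `sylvester(f, derivative(f))` is the 7×7 matrix shown above.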
Galois theory tells us that there is no general formula which expresses the roots of p(x) in radicals if $d \geq 5$. For specific instances with d not too big, say $d \leq 10$, it is possible to compute the Galois group of p(x) over $\mathbb{Q}$. Occasionally, one is lucky and the Galois group is solvable, in which case maple has a chance of finding the solution of p(x) = 0 in terms of radicals.
> f := x^6 + 3*x^5 + 6*x^4 + 7*x^3 + 5*x^2 + 2*x + 1:
> galois(f);
"6T11", {"[2^3]S(3)", "2 wr S(3)", "2S_4(6)"}, "-", 48, {"(2 4 6)(1 3 5)", "(1 5)(2 4)", "(3 6)"}
> solve(f,x)[1];
1/12*(-6*(108+12*69^(1/2))^(1/3)+6*I*(3*(108+12*69^(1/2))^(2/3)+8*69^(1/2)+8*(108+12*69^(1/2))^(1/3)+72)^(1/2))/(108+12*69^(1/2))^(1/3)
The number 48 is the order of the Galois group and its name is "6T11". Of course, the user now has to consult help(galois) in order to learn more.
1.2
Numerical Root Finding
In symbolic computation, we frequently consider a polynomial problem as solved if it has been reduced to finding the roots of one polynomial in one variable. Naturally, the latter problem can still be a very interesting and challenging one from the perspective of numerical analysis, especially if d gets very large or if the $a_i$ are given by floating point approximations. In the problems studied in this course, however, the $a_i$ are usually exact rational numbers and the degree d rarely exceeds 200. For numerical solving in this range, maple does reasonably well and matlab has no difficulty whatsoever.
> Digits := 6:
> f := x^200 - x^157 + 8 * x^101 - 23 * x^61 + 1:
> fsolve(f,x);
.950624, 1.01796
This polynomial has only two real roots. To list the complex roots, we say:
> fsolve(f,x,complex);
-1.02820-.0686972 I, -1.02820+.0686972 I, -1.01767-.0190398 I, -1.01767+.0190398 I, -1.01745-.118366 I, -1.01745+.118366 I, -1.00698-.204423 I, -1.00698+.204423 I, -1.00028-.160348 I, -1.00028+.160348 I, -.996734-.252681 I, -.996734+.252681 I, -.970912-.299748 I, -.970912+.299748 I, -.964269-.336097 I, ETC...ETC..
Our polynomial p(x) is represented in matlab as the row vector of its coefficients $[a_d\ a_{d-1}\ \ldots\ a_2\ a_1\ a_0]$. For instance, the following two commands compute the three roots of the dense cubic $p(x) = 31x^3 + 23x^2 + 19x + 11$.
>> p = [31 23 19 11];
>> roots(p)
ans =
  -0.0486 + 0.7402i
  -0.0486 - 0.7402i
  -0.6448
Representing the sparse polynomial $p(x) = x^{200} - x^{157} + 8x^{101} - 23x^{61} + 1$ considered above requires introducing lots of zero coefficients:
>> p=[1 zeros(1,42) -1 zeros(1,55) 8 zeros(1,39) -23 zeros(1,60) 1]
>> roots(p)
ans =
  -1.0282 + 0.0687i
  -1.0282 - 0.0687i
  -1.0177 + 0.0190i
  -1.0177 - 0.0190i
  -1.0174 + 0.1184i
  -1.0174 - 0.1184i
ETC...ETC..
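As a rough cross-check of the matlab output (matlab's roots is documented to compute eigenvalues of a companion matrix, the approach discussed next), here is a sketch of a different method, the Durand-Kerner (Weierstrass) iteration, which refines approximations to all d roots simultaneously. The starting points, tolerance and iteration cap are our own choices, and we make no claim that maple or matlab work this way internally.

```python
def horner(coeffs, z):
    """Evaluate a polynomial whose coefficients run from the highest degree down."""
    value = 0
    for c in coeffs:
        value = value * z + c
    return value


def durand_kerner(coeffs, max_iter=500, tol=1e-14):
    """Approximate all complex roots simultaneously (Weierstrass / Durand-Kerner)."""
    monic = [c / coeffs[0] for c in coeffs]
    d = len(monic) - 1
    roots = [(0.4 + 0.9j) ** k for k in range(d)]  # distinct, non-real starting points
    for _ in range(max_iter):
        new_roots = []
        for i, z in enumerate(roots):
            denom = 1
            for j, w in enumerate(roots):
                if i != j:
                    denom *= z - w
            new_roots.append(z - horner(monic, z) / denom)
        shift = max(abs(a - b) for a, b in zip(new_roots, roots))
        roots = new_roots
        if shift < tol:
            break
    return roots


p = [31, 23, 19, 11]  # 31x^3 + 23x^2 + 19x + 11, as in the matlab example
for r in durand_kerner(p):
    print(f"{r.real:.4f} {r.imag:+.4f}i")
```

The printed values should agree with matlab's answer to four digits, possibly in a different order.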
We note that convenient facilities are available for calling matlab inside of maple and for calling maple inside of matlab. We wish to encourage our readers to experiment with the passage of data between these two programs.
Some numerical methods for solving a univariate polynomial equation p(x) = 0 work by reducing this problem to computing the eigenvalues of the companion matrix of p(x), which is defined as follows. Let V denote the quotient of the polynomial ring modulo the ideal $\langle p(x) \rangle$ generated by the polynomial p(x). The resulting quotient ring $V = \mathbb{Q}[x]/\langle p(x) \rangle$ is a d-dimensional $\mathbb{Q}$-vector space. Multiplication by the variable x defines a linear map from this vector space to itself:
$$\mathrm{Times}_x : V \to V, \quad f(x) \mapsto x \cdot f(x). \qquad (2)$$
The companion matrix is the d×d-matrix which represents the endomorphism $\mathrm{Times}_x$ with respect to the distinguished monomial basis $\{1, x, x^2, \ldots, x^{d-1}\}$ of V. Explicitly, the companion matrix of p(x) looks like this:
$$\mathrm{Times}_x = \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0/a_d \\ 1 & 0 & \cdots & 0 & -a_1/a_d \\ 0 & 1 & \cdots & 0 & -a_2/a_d \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -a_{d-1}/a_d \end{pmatrix} \qquad (3)$$
Proposition 2. The zeros of p(x) are the eigenvalues of the matrix $\mathrm{Times}_x$.
Proof. Suppose that f(x) is a polynomial in $\mathbb{C}[x]$ whose image in $V \otimes \mathbb{C} = \mathbb{C}[x]/\langle p(x) \rangle$ is an eigenvector of (2) with eigenvalue λ. Then $x \cdot f(x) = \lambda \cdot f(x)$ in the quotient ring, which means that $(x - \lambda) \cdot f(x)$ is a multiple of p(x). Since f(x) is not a multiple of p(x), we conclude that λ is a root of p(x) as desired. Conversely, if λ is any root of p(x) then the polynomial $f(x) = p(x)/(x - \lambda)$ represents an eigenvector of (2) with eigenvalue λ.
Corollary 3. The following statements about $p(x) \in \mathbb{Q}[x]$ are equivalent:
• The polynomial p(x) is square-free, i.e., it has no multiple roots in $\mathbb{C}$.
• The companion matrix $\mathrm{Times}_x$ is diagonalizable.
• The ideal $\langle p(x) \rangle$ is a radical ideal in $\mathbb{Q}[x]$.
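Proposition 2 can be checked exactly on a small example. The sketch below (helper names are ours) builds the matrix (3) with rational entries for $p(x) = (x-1)(x-2)(x-3)$, with coefficients listed in ascending order to match the basis $\{1, x, x^2\}$, and verifies that the coefficient vector of $p(x)/(x-\lambda)$ is an eigenvector with eigenvalue λ for each root, exactly as in the converse direction of the proof.

```python
from fractions import Fraction


def companion(coeffs):
    """Matrix of Times_x in the basis 1, x, ..., x^(d-1); coeffs = [a_0, ..., a_d]."""
    d = len(coeffs) - 1
    a_d = Fraction(coeffs[-1])
    T = [[Fraction(0)] * d for _ in range(d)]
    for j in range(d - 1):
        T[j + 1][j] = Fraction(1)                 # x * x^j = x^(j+1)
    for i in range(d):
        T[i][d - 1] = -Fraction(coeffs[i]) / a_d  # x * x^(d-1) = x^d reduced mod p
    return T


def quotient_by_root(coeffs, lam):
    """Ascending coefficients of p(x)/(x - lam), by synthetic division."""
    d = len(coeffs) - 1
    q = [Fraction(0)] * d
    carry = Fraction(0)
    for k in range(d, 0, -1):
        carry = carry * lam + coeffs[k]
        q[k - 1] = carry
    return q


def mat_vec(T, v):
    return [sum(t * x for t, x in zip(row, v)) for row in T]


p = [-6, 11, -6, 1]  # (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6, ascending order
T = companion(p)
for lam in (1, 2, 3):
    v = quotient_by_root(p, lam)
    print(lam, mat_vec(T, v) == [lam * c for c in v])  # prints True for each root
```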
We note that the set of multiple roots of p(x) can be computed symbolically by forming the greatest common divisor of p(x) and its derivative:
$$q(x) = \gcd\bigl(p(x), p'(x)\bigr) \qquad (4)$$
Thus the three conditions in the Corollary are equivalent to q(x) = 1. Every ideal in the univariate polynomial ring $\mathbb{Q}[x]$ is principal. Writing p(x) for the ideal generator and computing q(x) from p(x) as in (4), we get the following general formula for computing the radical of any ideal in $\mathbb{Q}[x]$:
$$\mathrm{Rad}\,\langle p(x) \rangle = \langle p(x)/q(x) \rangle \qquad (5)$$
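Formulas (4) and (5) translate directly into a Euclidean-algorithm sketch over $\mathbb{Q}$. The helper names are ours, and polynomials are coefficient lists from the highest degree down. As checks we use $(x-1)^2(x+2)$ and the degree-8 polynomial p(x) from the example of Section 2.1, whose squarefree part is given there.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_divmod(a, b):
    """Quotient and remainder of a by b over Q."""
    a = trim([Fraction(c) for c in a])
    b = trim([Fraction(c) for c in b])
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        factor = a[0] / b[0]
        q[len(q) - 1 - shift] = factor
        for i, c in enumerate(b):
            a[i] -= factor * c
        a = trim(a)
    return q, a


def derivative(p):
    d = len(p) - 1
    return [c * (d - i) for i, c in enumerate(p[:-1])]


def monic(p):
    return [c / p[0] for c in p]


def poly_gcd(a, b):
    """Monic gcd by the Euclidean algorithm, as in (4)."""
    a, b = trim([Fraction(c) for c in a]), trim([Fraction(c) for c in b])
    while b:
        a, b = b, poly_divmod(a, b)[1]
    return monic(a)


def squarefree_part(p):
    """Monic p / gcd(p, p'): the generator of Rad<p> in (5)."""
    q = poly_gcd(p, derivative(p))
    return monic(poly_divmod(p, q)[0])


print(squarefree_part([1, 0, -3, 2]))  # (x-1)^2 (x+2): coefficients of x^2 + x - 2
```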
1.3
Real Roots
In this subsection we describe symbolic methods for computing information about the real roots of a univariate polynomial p(x). In what follows, we assume that p(x) is a squarefree polynomial. It is easy to achieve this by removing all multiplicities as in (4) and (5). The Sturm sequence of p(x) is the following sequence of polynomials of decreasing degree:
$$p_0(x) := p(x), \quad p_1(x) := p'(x), \quad p_i(x) := -\mathrm{rem}\bigl(p_{i-2}(x), p_{i-1}(x)\bigr) \ \text{ for } i \geq 2.$$
Thus $p_i(x)$ is the negative of the remainder on division of $p_{i-2}(x)$ by $p_{i-1}(x)$. Let $p_m(x)$ be the last non-zero polynomial in this sequence.
Theorem 4. (Sturm's Theorem) If a < b in $\mathbb{R}$ and neither is a zero of p(x), then the number of real zeros of p(x) in the interval [a, b] is the number of sign changes in the sequence $p_0(a), p_1(a), p_2(a), \ldots, p_m(a)$ minus the number of sign changes in the sequence $p_0(b), p_1(b), p_2(b), \ldots, p_m(b)$.
We note that any zeros are ignored when counting the number of sign changes in a sequence of real numbers. For instance, a sequence of twelve numbers with signs +, +, 0, +, −, −, 0, +, −, 0, −, 0 has three sign changes. If we wish to count all real roots of a polynomial p(x) then we can apply Sturm's Theorem to a = −∞ and b = ∞, which amounts to looking at the signs of the leading coefficients of the polynomials $p_i$ in the Sturm sequence (and, at −∞, at the parities of their degrees). Using bisection, one gets a procedure for isolating the real roots by rational intervals. This method is conveniently implemented in maple:
> p := x^11-20*x^10+99*x^9-247*x^8+210*x^7-99*x^2+247*x-210:
> sturm(p,x,-INFINITY, INFINITY);
3
> sturm(p,x,0,10);
2
> sturm(p,x,5,10);
0
> realroot(p,1/1000);
[[1101/1024, 551/512], [1465/1024, 733/512], [14509/1024, 7255/512]]
> fsolve(p);
1.075787072, 1.431630905, 14.16961992
Another important classical result on real roots is the following:
Theorem 5. (Descartes' Rule of Signs) The number of positive real roots of a polynomial is at most the number of sign changes in its coefficient sequence.
For instance, the polynomial $p(x) = x^{200} - x^{157} + 8x^{101} - 23x^{61} + 1$, which was featured in Section 1.2, has four sign changes in its coefficient sequence. Hence it has at most four positive real roots. The true number is two.
Corollary 6. A polynomial with m terms can have at most $2m-1$ real zeros.
The bound in this corollary is optimal as the following example shows:
$$x \cdot \prod_{j=1}^{m-1} (x^2 - j)$$
All $2m-1$ zeros of this polynomial are real, and its expansion has m terms.
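Sturm's Theorem and Descartes' rule are easy to code with exact rational arithmetic. The following sketch is our own (not maple's sturm or realroot); it assumes rational coefficients listed from the highest degree down and endpoints a, b that are not roots, and it handles ±∞ through the leading coefficients and degrees. It reproduces the three counts from the maple session above, the four sign changes of the sparse polynomial from Section 1.2, and the extremal example for Corollary 6 with m = 4.

```python
import math
from fractions import Fraction


def trim(p):
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_rem(a, b):
    """Remainder of a on division by b (coefficients from the highest degree down)."""
    a = trim(list(a))
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i, c in enumerate(b):
            a[i] -= factor * c
        a = trim(a)
    return a


def sturm_sequence(p):
    """p_0 = p, p_1 = p', p_i = -rem(p_{i-2}, p_{i-1}) until the remainder vanishes."""
    p = trim([Fraction(c) for c in p])
    d = len(p) - 1
    seq = [p, [c * (d - i) for i, c in enumerate(p[:-1])]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])


def value_at(q, x):
    """q(x); at x = +/-inf only the sign matters, so return lc * (+/-1)^deg."""
    if math.isinf(x):
        return q[0] * (1 if x > 0 else -1) ** (len(q) - 1)
    value = Fraction(0)
    for c in q:
        value = value * x + c
    return value


def sign_changes(values):
    """Number of sign changes, ignoring zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def count_real_roots(p, a, b):
    """Sturm count of distinct real roots in [a, b]; a and b must not be roots."""
    seq = sturm_sequence(p)
    return (sign_changes([value_at(q, a) for q in seq])
            - sign_changes([value_at(q, b) for q in seq]))


p = [1, -20, 99, -247, 210, 0, 0, 0, 0, -99, 247, -210]
print(count_real_roots(p, -math.inf, math.inf),  # 3
      count_real_roots(p, 0, 10),                # 2
      count_real_roots(p, 5, 10))                # 0

# Descartes: nonzero coefficients of x^200 - x^157 + 8x^101 - 23x^61 + 1
print(sign_changes([1, -1, 8, -23, 1]))          # 4

# Corollary 6 with m = 4: x(x^2 - 1)(x^2 - 2)(x^2 - 3) = x^7 - 6x^5 + 11x^3 - 6x
print(count_real_roots([1, 0, -6, 0, 11, 0, -6, 0], -math.inf, math.inf))  # 7
```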
1.4
Puiseux Series
Suppose now that the coefficients $a_i$ of our given polynomial are not rational numbers but they are rational functions $a_i(t)$ in another parameter t. Hence we wish to determine the zeros of a polynomial in K[x] where $K = \mathbb{Q}(t)$:
$$p(t; x) = a_d(t) x^d + a_{d-1}(t) x^{d-1} + \cdots + a_2(t) x^2 + a_1(t) x + a_0(t). \qquad (6)$$
The role of the ambient algebraically closed field containing K is now played by the field $\mathbb{C}\{\{t\}\}$ of Puiseux series. The elements of $\mathbb{C}\{\{t\}\}$ are formal power series in t with coefficients in $\mathbb{C}$ and having rational exponents, subject to the condition that the set of appearing exponents is bounded below and has a common denominator. Equivalently,
$$\mathbb{C}\{\{t\}\} = \bigcup_{N=1}^{\infty} \mathbb{C}((t^{1/N})),$$
where $\mathbb{C}((y))$ abbreviates the field of Laurent series in y with coefficients in $\mathbb{C}$. A classical theorem in algebraic geometry states that $\mathbb{C}\{\{t\}\}$ is algebraically closed. For a modern treatment see (Eisenbud 1994, Corollary 13.15).
Theorem 7. (Puiseux's Theorem) The polynomial p(t; x) has d roots, counting multiplicities, in the field of Puiseux series $\mathbb{C}\{\{t\}\}$.
The proof of Puiseux's theorem is algorithmic, and, lucky for us, there is an implementation of this algorithm in maple. Here is how it works:
> with(algcurves): p := x^2 + x - t^3;
p := x^2 + x - t^3
> puiseux(p,t=0,x,20);
{-42*t^18+14*t^15-5*t^12+2*t^9-t^6+t^3, 42*t^18-14*t^15+5*t^12-2*t^9+t^6-t^3-1}
We note that this program generally does not compute all Puiseux series solutions but only enough to generate the splitting field of p(t; x) over K.
> with(algcurves): q := x^2 + t^4 * x - t:
> puiseux(q,t=0,x,20);
{-1/128*t^(29/2)+1/8*t^(15/2)-1/2*t^4+t^(1/2)}
> S := solve(q,x):
> series(S[1],t,20);
t^(1/2)-1/2*t^4+1/8*t^(15/2)-1/128*t^(29/2)+O(t^(43/2))
> series(S[2],t,20);
-t^(1/2)-1/2*t^4-1/8*t^(15/2)+1/128*t^(29/2)+O(t^(43/2))
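The first maple answer can be reproduced without any Newton polygon machinery: the root of $x^2 + x - t^3$ that vanishes at t = 0 satisfies $x = t^3 - x^2$, and squaring raises the order in t, so iterating this map fixes more coefficients at every pass. The sketch below iterates on power series truncated at $t^{18}$ (our choice, to match the output above); it is a plain fixed-point iteration, not the algorithm behind maple's puiseux. The other root is then $-1 - x(t)$, since the two roots sum to −1.

```python
N = 18  # truncation order in t


def mul_trunc(a, b):
    """Product of two power series (coefficients of t^0..t^N), truncated at t^N."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(N + 1 - i):
                c[i + j] += ai * b[j]
    return c


t_cubed = [0] * (N + 1)
t_cubed[3] = 1
x = [0] * (N + 1)
for _ in range(N):  # each pass fixes at least three more coefficients
    x = [a - b for a, b in zip(t_cubed, mul_trunc(x, x))]

print({k: c for k, c in enumerate(x) if c})
# {3: 1, 6: -1, 9: 2, 12: -5, 15: 14, 18: -42}
```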
We shall explain how to compute the first term (lowest order in t) in each of the d Puiseux series solutions x(t) to our equation p(t; x) = 0. Suppose that the i-th coefficient in (6) has the Laurent series expansion:
$$a_i(t) = c_i \cdot t^{A_i} + \text{higher terms in } t.$$
Each Puiseux series looks like
$$x(t) = \gamma \cdot t^{\tau} + \text{higher terms in } t. \qquad (7)$$
We wish to characterize the possible pairs of numbers $(\tau, \gamma)$ in $\mathbb{Q} \times \mathbb{C}$ which allow the identity $p(t; x(t)) = 0$ to hold. This is done by first finding the possible values of τ. We ignore all higher terms and consider an equation
$$c_d \gamma^d t^{A_d + d\tau} + c_{d-1} \gamma^{d-1} t^{A_{d-1} + (d-1)\tau} + \cdots + c_1 \gamma t^{A_1 + \tau} + c_0 t^{A_0} = 0.$$
This equation imposes the following piecewise-linear condition on τ:
$$\min\{A_d + d\tau,\ A_{d-1} + (d-1)\tau,\ \ldots,\ A_2 + 2\tau,\ A_1 + \tau,\ A_0\} \text{ is attained twice.} \qquad (8)$$
The crucial condition (8) will reappear in Lectures 3 and 9. Throughout this book, the phrase "is attained twice" will always mean "is attained at least twice".
As an illustration consider the example $p(t; x) = x^2 + x - t^3$. For this polynomial, the condition (8) reads
$$\min\{0 + 2\tau,\ 0 + \tau,\ 3\} \text{ is attained twice.}$$
That sentence means the following disjunction of linear inequality systems:
$$2\tau = \tau \leq 3 \quad \text{or} \quad 2\tau = 3 \leq \tau \quad \text{or} \quad 3 = \tau \leq 2\tau.$$
This disjunction is equivalent to τ = 0 or τ = 3, which gives us the lowest terms in the two Puiseux series produced by maple.
It is customary to phrase the procedure described above in terms of the Newton polygon of p(t; x). This polygon is the convex hull in $\mathbb{R}^2$ of the points $(i, A_i)$ for $i = 0, 1, \ldots, d$. The condition (8) is equivalent to saying that −τ equals the slope of an edge on the lower boundary of the Newton polygon. Here is a picture of the Newton polygon of the equation $p(t; x) = x^2 + x - t^3$:
[Figure: The lower boundary of the Newton polygon]
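The lower boundary of the Newton polygon can be computed with a standard lower-convex-hull scan (Andrew's monotone chain). In the sketch below (helper names are ours) the input lists the orders $A_i$ for $i = 0, \ldots, d$, with None for a zero coefficient; each lower edge contributes the value τ = −slope, and its horizontal length is the number of roots, counted with multiplicity, whose Puiseux series start with $t^\tau$.

```python
from fractions import Fraction


def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull, left to right."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def leading_exponents(orders):
    """orders[i] = A_i, the order in t of a_i(t), or None if a_i = 0.

    Returns (tau, count) for each lower edge: tau = -slope, count = edge width.
    """
    points = [(i, Fraction(A)) for i, A in enumerate(orders) if A is not None]
    hull = lower_hull(points)
    return [(-(q[1] - p[1]) / (q[0] - p[0]), q[0] - p[0])
            for p, q in zip(hull, hull[1:])]


print(leading_exponents([3, 0, 0]))  # x^2 + x - t^3: tau = 3 and tau = 0, one root each
print(leading_exponents([1, 4, 0]))  # x^2 + t^4 x - t: tau = 1/2 for both roots
```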
1.5
Hypergeometric Series
The method of Puiseux series can be extended to the case when the coefficients $a_i$ are rational functions in several variables $t_1, \ldots, t_m$. The case m = 1 was discussed in the last section. We now examine the generic case when all d + 1 coefficients $a_0, \ldots, a_d$ in (1) are indeterminates. Each zero X of the polynomial in (1) is an algebraic function of d + 1 variables, written $X = X(a_0, \ldots, a_d)$. The following theorem due to Karl Mayr (1937) characterizes these functions by the differential equations which they satisfy.
Theorem 8. The roots of the general equation of degree d are a basis for the solution space of the following system of linear partial differential equations:
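$$\frac{\partial^2 X}{\partial a_i \partial a_j} = \frac{\partial^2 X}{\partial a_k \partial a_l} \quad \text{whenever } i + j = k + l, \quad \text{and} \qquad (9)$$
$$\sum_{i=0}^{d} i\, a_i \frac{\partial X}{\partial a_i} = -X, \qquad \sum_{i=0}^{d} a_i \frac{\partial X}{\partial a_i} = 0. \qquad (10)$$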
The meaning of the statement "are a basis for the solution space of" will be explained at the end of this section. Let us first replace this statement by "are solutions of" and prove the resulting weaker version of the theorem.
Proof. The two Euler equations (10) express the scaling invariance of the roots. They are obtained by applying the operator d/dt to the identities
$$X(a_0, ta_1, t^2a_2, \ldots, t^{d-1}a_{d-1}, t^d a_d) = \frac{1}{t} \cdot X(a_0, a_1, a_2, \ldots, a_{d-1}, a_d),$$
$$X(ta_0, ta_1, ta_2, \ldots, ta_{d-1}, ta_d) = X(a_0, a_1, a_2, \ldots, a_{d-1}, a_d),$$
and then setting t = 1.
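To derive (9), we consider the first derivative $f'(x) = \sum_{i=1}^{d} i a_i x^{i-1}$ and the second derivative $f''(x) = \sum_{i=2}^{d} i(i-1) a_i x^{i-2}$. Note that $f'(X) \neq 0$, since $a_0, \ldots, a_d$ are indeterminates. Differentiating the defining identity $\sum_{i=0}^{d} a_i X(a_0, a_1, \ldots, a_d)^i = 0$ with respect to $a_j$, we get
$$X^j + f'(X) \cdot \frac{\partial X}{\partial a_j} = 0. \qquad (11)$$
From this we derive
$$\frac{\partial f'(X)}{\partial a_i} = -\frac{f''(X)}{f'(X)} \cdot X^i + i X^{i-1}. \qquad (12)$$
We next differentiate $\partial X / \partial a_j$ with respect to the indeterminate $a_i$:
$$\frac{\partial^2 X}{\partial a_i \partial a_j} = \frac{\partial}{\partial a_i}\left(-\frac{X^j}{f'(X)}\right) = \frac{\partial f'(X)}{\partial a_i} X^j f'(X)^{-2} - j X^{j-1} \frac{\partial X}{\partial a_i} f'(X)^{-1}. \qquad (13)$$
Using (11) and (12), we can rewrite (13) as follows:
$$\frac{\partial^2 X}{\partial a_i \partial a_j} = -f''(X) X^{i+j} f'(X)^{-3} + (i+j) X^{i+j-1} f'(X)^{-2}.$$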
This expression depends only on the sum of indices i + j. This proves (9).
We check the validity of our differential system for the case d = 2 and we note that it characterizes the series expansions of the quadratic formula.
> X := solve(a0 + a1 * x + a2 * x^2, x)[1];
X := 1/2*(-a1+(a1^2-4*a2*a0)^(1/2))/a2
> simplify(diff(diff(X,a0),a2) - diff(diff(X,a1),a1));
0
> simplify( a1*diff(X,a1) + 2*a2*diff(X,a2) + X );
0
> simplify(a0*diff(X,a0)+a1*diff(X,a1)+a2*diff(X,a2));
0
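The same three identities can also be spot-checked numerically, which is a useful sanity test when no computer algebra system is at hand. The sketch below uses central finite differences at one point of our choosing, $(a_0, a_1, a_2) = (1, 5, 2)$; the step sizes are ad hoc, so the residuals should only be expected to vanish up to rounding and discretization error.

```python
import math


def X(a0, a1, a2):
    """The root of a0 + a1*x + a2*x^2 that maple's solve returns first above."""
    return (-a1 + math.sqrt(a1 * a1 - 4 * a2 * a0)) / (2 * a2)


def partial(f, point, i, h=1e-5):
    """Central-difference approximation of the partial derivative of f in slot i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


def second_partial(f, point, i, j, h=1e-3):
    """Nested central differences for the second derivative in slots i and j."""
    return partial(lambda *a: partial(f, a, j, h), point, i, h)


point = (1.0, 5.0, 2.0)  # a1^2 - 4*a2*a0 = 17 > 0, so X is real here
grad = [partial(X, point, i) for i in range(3)]
r9 = second_partial(X, point, 0, 2) - second_partial(X, point, 1, 1)
r10a = point[1] * grad[1] + 2 * point[2] * grad[2] + X(*point)
r10b = sum(a * g for a, g in zip(point, grad))
print(r9, r10a, r10b)  # each should be zero up to discretization error
```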
> series(X,a1,4);
(-a2*a0)^(1/2)/a2-1/2/a2*a1-1/8*(-a2*a0)^(1/2)/a2^2/a0*a1^2+O(a1^4)
What do you get when you now say series(X,a0,4) or series(X,a2,4)?
Writing series expansions for the solutions to the general equation of degree d has a long tradition in mathematics. In 1757 Johann Lambert expressed the roots of the trinomial equation $x^p + x + r$ as a Gauss hypergeometric function in the parameter r. Series expansions of more general algebraic functions were subsequently given by Euler, Chebyshev and Eisenstein, among others. The widely known poster "Solving the Quintic with Mathematica" published by Wolfram Research in 1994 gives a nice historical introduction to series solutions of the general equation of degree five:
$$a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0. \qquad (14)$$
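Mayr's Theorem can be used to write down all possible Puiseux series solutions to the general quintic (14). There are $16 = 2^{5-1}$ distinct expansions. For instance, here is one of the 16 expansions of the five roots:
$$X_1 = -\left[\frac{a_0}{a_1}\right], \qquad X_2 = -\left[\frac{a_1}{a_2}\right] + \left[\frac{a_0}{a_1}\right], \qquad X_3 = -\left[\frac{a_2}{a_3}\right] + \left[\frac{a_1}{a_2}\right],$$
$$X_4 = -\left[\frac{a_3}{a_4}\right] + \left[\frac{a_2}{a_3}\right], \qquad X_5 = -\left[\frac{a_4}{a_5}\right] + \left[\frac{a_3}{a_4}\right].$$
Each bracket is a series having the monomial in the bracket as its first term:
$$\left[\frac{a_0}{a_1}\right] = \frac{a_0}{a_1} + \frac{a_0^2 a_2}{a_1^3} - \frac{a_0^3 a_3}{a_1^4} + 2\frac{a_0^3 a_2^2}{a_1^5} + \frac{a_0^4 a_4}{a_1^5} - 5\frac{a_0^4 a_2 a_3}{a_1^6} - \frac{a_0^5 a_5}{a_1^6} + \cdots$$
$$\left[\frac{a_1}{a_2}\right] = \frac{a_1}{a_2} + \frac{a_1^2 a_3}{a_2^3} - \frac{a_1^3 a_4}{a_2^4} + \frac{a_1^4 a_5}{a_2^5} + \cdots$$
$$\left[\frac{a_2}{a_3}\right] = \frac{a_2}{a_3} + \frac{a_2^2 a_4}{a_3^3} - \frac{a_2^3 a_5}{a_3^4} + \cdots$$
$$\left[\frac{a_3}{a_4}\right] = \frac{a_3}{a_4} + \frac{a_3^2 a_5}{a_4^3} + \cdots$$
$$\left[\frac{a_4}{a_5}\right] = \frac{a_4}{a_5}$$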
The last bracket is just a single Laurent monomial. The other four brackets $[a_{i-1}/a_i]$ can easily be written as an explicit sum over $\mathbb{N}^4$. For instance,
$$\left[\frac{a_0}{a_1}\right] = \sum_{i,j,k,l \geq 0} \frac{(-1)^{2i+3j+4k+5l}\,(2i+3j+4k+5l)!}{i!\,j!\,k!\,l!\,(i+2j+3k+4l+1)!} \cdot \frac{a_0^{\,i+2j+3k+4l+1}\, a_2^{\,i}\, a_3^{\,j}\, a_4^{\,k}\, a_5^{\,l}}{a_1^{\,2i+3j+4k+5l+1}}$$
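The coefficients in this sum are easy to tabulate. The sketch below (helper names are ours) evaluates the coefficient for all (i, j, k, l) with $i + j + k + l \leq 4$, reports whether each one is an integer, and prints the first few terms of the series ordered by the exponent of $a_1$. It is a spot check over a finite range, not a proof.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def coefficient(i, j, k, l):
    """Coefficient attached to the index (i, j, k, l) in the sum for [a0/a1]."""
    n = 2 * i + 3 * j + 4 * k + 5 * l
    m = i + 2 * j + 3 * k + 4 * l + 1
    return Fraction((-1) ** n * factorial(n),
                    factorial(i) * factorial(j) * factorial(k) * factorial(l) * factorial(m))


def monomial(i, j, k, l):
    """Readable form of a0^m a2^i a3^j a4^k a5^l / a1^(n+1)."""
    n = 2 * i + 3 * j + 4 * k + 5 * l
    m = i + 2 * j + 3 * k + 4 * l + 1
    factors = [f"a0^{m}"] + [f"a{idx}^{e}" for idx, e in zip((2, 3, 4, 5), (i, j, k, l)) if e]
    return "*".join(factors) + f"/a1^{n + 1}"


terms = {idx: coefficient(*idx)
         for idx in product(range(5), repeat=4) if sum(idx) <= 4}
print(all(c.denominator == 1 for c in terms.values()))  # integrality on this range

by_degree = sorted(terms, key=lambda t: (2 * t[0] + 3 * t[1] + 4 * t[2] + 5 * t[3], t))
for idx in by_degree[:7]:
    print(terms[idx], monomial(*idx))
```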
Each coefficient appearing in one of these series is integral. Therefore these five formulas for the roots work in any characteristic. The situation is different for the other 15 series expansions of the roots of the quintic (14). For instance, consider the expansions into positive powers in $a_1, a_2, a_3, a_4$. They are
$$X = \xi \cdot \left[\frac{a_0^{1/5}}{a_5^{1/5}}\right] + \frac{1}{5}\left(\xi^2 \left[\frac{a_1}{a_0^{3/5} a_5^{2/5}}\right] + \xi^3 \left[\frac{a_2}{a_0^{2/5} a_5^{3/5}}\right] + \xi^4 \left[\frac{a_3}{a_0^{1/5} a_5^{4/5}}\right] - \frac{a_4}{a_5}\right)$$
where ξ runs over the five complex roots of the equation $\xi^5 = -1$, and
$$\left[\frac{a_0^{1/5}}{a_5^{1/5}}\right] = \frac{a_0^{1/5}}{a_5^{1/5}} - \frac{1}{25}\frac{a_1 a_4}{a_0^{4/5} a_5^{6/5}} - \frac{1}{25}\frac{a_2 a_3}{a_0^{4/5} a_5^{6/5}} + \cdots$$
$$\left[\frac{a_1}{a_0^{3/5} a_5^{2/5}}\right] = \frac{a_1}{a_0^{3/5} a_5^{2/5}} - \frac{1}{5}\frac{a_3^2}{a_0^{3/5} a_5^{7/5}} - \frac{2}{5}\frac{a_2 a_4}{a_0^{3/5} a_5^{7/5}} + \cdots$$
$$\left[\frac{a_2}{a_0^{2/5} a_5^{3/5}}\right] = \frac{a_2}{a_0^{2/5} a_5^{3/5}} - \frac{1}{5}\frac{a_1^2}{a_0^{7/5} a_5^{3/5}} - \frac{3}{5}\frac{a_3 a_4}{a_0^{2/5} a_5^{8/5}} + \cdots$$
$$\left[\frac{a_3}{a_0^{1/5} a_5^{4/5}}\right] = \frac{a_3}{a_0^{1/5} a_5^{4/5}} - \frac{1}{5}\frac{a_1 a_2}{a_0^{6/5} a_5^{4/5}} - \frac{2}{5}\frac{a_4^2}{a_0^{1/5} a_5^{9/5}} + \cdots$$
Each of these four series can be expressed as an explicit sum over the lattice points in a 4-dimensional polyhedron. The general formula can be found in Theorem 3.2 of Sturmfels (2000). That reference gives all $2^{d-1}$ distinct Puiseux series expansions of the solution of the general equation of degree d.
The system (9)-(10) is a special case of the hypergeometric differential equations discussed in (Saito, Sturmfels and Takayama, 1999). More precisely, it is the Gelfand-Kapranov-Zelevinsky system with parameters $\binom{-1}{0}$ associated with the integer matrix
$$A = \begin{pmatrix} 0 & 1 & 2 & 3 & \cdots & d-1 & d \\ 1 & 1 & 1 & 1 & \cdots & 1 & 1 \end{pmatrix}.$$
We abbreviate the derivation $\partial/\partial a_i$ by the symbol $\partial_i$ and we consider the ideal generated by the operators (9) in the commutative polynomial ring $\mathbb{Q}[\partial_0, \partial_1, \ldots, \partial_d]$. This is the ideal of the 2×2-minors of the matrix
$$\begin{pmatrix} \partial_0 & \partial_1 & \partial_2 & \cdots & \partial_{d-1} \\ \partial_1 & \partial_2 & \partial_3 & \cdots & \partial_d \end{pmatrix}.$$
This ideal defines a projective curve of degree d, namely, the rational normal curve, and from this it follows that our system (9)-(10) is holonomic of rank d. This means the following: Let $(a_0, \ldots, a_d)$ be any point in $\mathbb{C}^{d+1}$ such that the discriminant of p(x) is non-zero, and let U be a small open ball around that point. Then the set of holomorphic functions on U which are solutions to (9)-(10) is a complex vector space of dimension d. Theorem 8 states that the d roots of p(x) = 0 form a distinguished basis for that vector space.
1.6
Exercises
(1) Describe the Jordan canonical form of the companion matrix $\mathrm{Times}_x$. What are the generalized eigenvectors of the endomorphism (2)?
(2) We define a unique cubic polynomial p(x) by four interpolation conditions $p(x_i) = y_i$ for i = 0, 1, 2, 3. The discriminant of p(x) is a rational function in $x_0, x_1, x_2, x_3, y_0, y_1, y_2, y_3$. What is the denominator of this rational function, and how many terms does the numerator have?
(3) Create a symmetric 50×50-matrix whose entries are random integers between −10 and 10 and compute the eigenvalues of your matrix.
(4) For which complex parameters α is the following system solvable?
$$x^d - \alpha = x^3 - x + 1 = 0.$$
(5) Consider the set of all 65,536 polynomials of degree 15 whose coefficients are +1 or −1. Answer the following questions about this set:
(a) Which polynomial has largest discriminant?
(b) Which polynomial has the smallest number of complex roots?
(c) Which polynomial has the complex root of largest absolute value?
(d) Which polynomial has the most real roots?
(6) Give a necessary and sufficient condition for the quartic equation
$$a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 = 0$$
to have exactly two real roots. We expect a condition which is a Boolean combination of polynomial inequalities involving $a_0, a_1, a_2, a_3, a_4$.
(7) Describe an algebraic algorithm for deciding whether a polynomial p(x) has a complex root of absolute value one.
(8) Compute all five Puiseux series solutions x(t) of the quintic equation
$$x^5 + t \cdot x^4 + t^3 \cdot x^3 + t^6 \cdot x^2 + t^{10} \cdot x + t^{15} = 0.$$
What is the coefficient of $t^n$ in each of the five series?
(9) Fix two real symmetric n×n-matrices A and B. Consider the set of points (x, y) in the plane $\mathbb{R}^2$ such that all eigenvalues of the matrix xA + yB are non-negative. Show that this set is closed and convex. Does every closed convex semi-algebraic subset of $\mathbb{R}^2$ arise in this way?
(10) Let α and β be integers and consider the following system of linear differential equations for an unknown function $X(a_0, a_1, a_2)$:
$$\frac{\partial^2 X}{\partial a_0 \partial a_2} = \frac{\partial^2 X}{\partial a_1^2}, \qquad a_1 \frac{\partial X}{\partial a_1} + 2a_2 \frac{\partial X}{\partial a_2} = \alpha \cdot X, \qquad a_0 \frac{\partial X}{\partial a_0} + a_1 \frac{\partial X}{\partial a_1} + a_2 \frac{\partial X}{\partial a_2} = \beta \cdot X.$$
For which values of α and β do (non-zero) polynomial solutions exist? Same question for rational solutions and algebraic solutions.
Gröbner Bases of Zero-Dimensional Ideals
Suppose we are given polynomials $f_1, \ldots, f_m$ in $\mathbb{Q}[x_1, \ldots, x_n]$ which are known to have only finitely many common zeros in $\mathbb{C}^n$. Then $I = \langle f_1, \ldots, f_m \rangle$, the ideal generated by these polynomials, is zero-dimensional. In this section we demonstrate how Gröbner bases can be used to compute the zeros of I.
2.1
Computing Standard Monomials and the Radical
Let ≺ be a term order on the polynomial ring $S = \mathbb{Q}[x_1, \ldots, x_n]$. Every ideal I in S has a unique reduced Gröbner basis G with respect to ≺. The leading terms of the polynomials in G generate the initial monomial ideal $\mathrm{in}_\prec(I)$. Let $B = B_\prec(I)$ denote the set of all monomials $x^u = x_1^{u_1} x_2^{u_2} \cdots x_n^{u_n}$ which do not lie in $\mathrm{in}_\prec(I)$. These are the standard monomials of I with respect to ≺. Every polynomial f in S can be written uniquely as a $\mathbb{Q}$-linear combination of B modulo I, using the division algorithm with respect to the Gröbner basis G. We write $\mathcal{V}(I) \subset \mathbb{C}^n$ for the complex variety defined by the ideal I.
Proposition 9. The variety $\mathcal{V}(I)$ is finite if and only if the set B is finite, and the cardinality of B equals the cardinality of $\mathcal{V}(I)$, counting multiplicities.
Consider an example with three variables denoted $S = \mathbb{Q}[x, y, z]$:
$$I = \bigl\langle (x-y)^3 - z^2,\ (z-x)^3 - y^2,\ (y-z)^3 - x^2 \bigr\rangle. \qquad (15)$$
The following Macaulay2 computation verifies that I is zero-dimensional:
i1 : S = QQ[x,y,z];
i2 : I = ideal( (x-y)^3-z^2, (z-x)^3-y^2, (y-z)^3-x^2 );
o2 : Ideal of S
i3 : dim I, degree I
o3 = (0, 14)
i4 : gb I
o4 = | y2z-1/2xz2-yz2+1/2z3+13/60x2-1/12y2+7/60z2
       x2z-xz2-1/2yz2+1/2z3+1/12x2-13/60y2-7/60z2
       y3-3y2z+3yz2-z3-x2
       xy2-2x2z-3y2z+3xz2+4yz2-3z3-7/6x2+5/6y2-1/6z2
       x2y-xy2-x2z+y2z+xz2-yz2+1/3x2+1/3y2+1/3z2
       x3-3x2y+3xy2-3y2z+3yz2-z3-x2-z2
       z4+1/5xz2-1/5yz2+2/25z2
       yz3-z4-13/20xz2-3/20yz2+3/10z3+2/75x2-4/75y2-7/300z2
       xz3-2yz3+z4+29/20xz2+19/20yz2-9/10z3-8/75x2+2/15y2+7/300z2
       xyz2-3/2y2z2+xz3+yz3-3/2z4+y2z-1/2xz2-7/10yz2+1/5z3+13/60x2-1/12y2-1/12z2 |
i5 : toString (x^10 % I)
o5 = -4/15625*x*z^2+4/15625*z^3-559/1171875*x^2-94/1171875*y^2+26/1171875*z^2
i6 : R = S/I; basis R
o7 = | 1 x x2 xy xyz xz xz2 y y2 yz yz2 z z2 z3 |
o7 : Matrix R^1 <--- R^14
The output o4 gives the reduced Gröbner basis for I with respect to the reverse lexicographic term order with x > y > z. We see in o7 that there are 14 standard monomials. In o5 we compute the expansion of $x^{10}$ in this basis of S/I. We conclude that the number of complex zeros of I is at most 14.
If I is a zero-dimensional ideal in $S = \mathbb{Q}[x_1, \ldots, x_n]$ then the elimination ideal $I \cap \mathbb{Q}[x_i]$ is non-zero for all $i = 1, 2, \ldots, n$. Let $p_i(x_i)$ denote the generator of $I \cap \mathbb{Q}[x_i]$. The univariate polynomial $p_i$ can be gotten from a Gröbner basis for I with respect to an elimination term order. Another method is to use an arbitrary Gröbner basis to compute the normal form of successive powers of $x_i$ until they first become linearly dependent. We denote the square-free part of the polynomial $p_i(x_i)$ by
$$p_{i,\mathrm{red}}(x_i) = p_i(x_i) / \gcd\bigl(p_i(x_i), p_i'(x_i)\bigr).$$
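The count of 14 standard monomials in o7 can be reproduced from the leading terms alone. The sketch below enumerates all monomials outside a monomial ideal by growing outward from 1; it terminates only when there are finitely many, i.e. in the zero-dimensional case of Proposition 9. The list of leading terms is our own reading of o4 (reverse lexicographic order with x > y > z), and the helper names are hypothetical. As a second usage example it counts the standard monomials of $\langle x^2, y^2, z^2 \rangle$, the ideal J that appears in Section 2.2.

```python
def standard_monomials(generators, nvars):
    """Exponent vectors of all monomials outside the monomial ideal <generators>.

    Grows outward from 1; it only terminates when there are finitely many
    standard monomials, i.e. when the ideal is zero-dimensional.
    """
    def in_ideal(u):
        return any(all(ui >= gi for ui, gi in zip(u, g)) for g in generators)

    one = (0,) * nvars
    if in_ideal(one):
        return []
    found, frontier = {one}, [one]
    while frontier:
        new = []
        for u in frontier:
            for i in range(nvars):
                v = u[:i] + (u[i] + 1,) + u[i + 1:]
                if v not in found and not in_ideal(v):
                    found.add(v)
                    new.append(v)
        frontier = new
    return sorted(found, key=lambda u: (sum(u), u))


def show(u, names="xyz"):
    """Macaulay2-style name of a monomial, e.g. (1, 0, 2) -> 'xz2'."""
    return "".join(n + (str(e) if e > 1 else "") for n, e in zip(names, u) if e) or "1"


# Leading terms of the Groebner basis o4 (our reading):
# y2z, x2z, y3, xy2, x2y, x3, z4, yz3, xz3, xyz2
lead = [(0, 2, 1), (2, 0, 1), (0, 3, 0), (1, 2, 0), (2, 1, 0),
        (3, 0, 0), (0, 0, 4), (0, 1, 3), (1, 0, 3), (1, 1, 2)]
B = standard_monomials(lead, 3)
print(len(B), " ".join(show(u) for u in B))  # 14 standard monomials

print(len(standard_monomials([(2, 0, 0), (0, 2, 0), (0, 0, 2)], 3)))  # 8
```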
Theorem 10. A zero-dimensional ideal I is radical if and only if the n elimination ideals $I \cap \mathbb{Q}[x_i]$ are radical. Moreover, the radical of I equals
$$\mathrm{Rad}(I) = I + \langle p_{1,\mathrm{red}}, p_{2,\mathrm{red}}, \ldots, p_{n,\mathrm{red}} \rangle.$$
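Our example in (15) is invariant under cyclic permutations of the variables, so that
$$I \cap \mathbb{Q}[x] = \langle p(x) \rangle, \quad I \cap \mathbb{Q}[y] = \langle p(y) \rangle, \quad I \cap \mathbb{Q}[z] = \langle p(z) \rangle.$$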
The common generator of the elimination ideals is a polynomial of degree 8:
$$p(x) = x^8 + \frac{6}{25}x^6 + \frac{17}{625}x^4 + \frac{8}{15625}x^2.$$
This polynomial is not squarefree. Its squarefree part equals
$$p_{\mathrm{red}}(x) = x^7 + \frac{6}{25}x^5 + \frac{17}{625}x^3 + \frac{8}{15625}x.$$
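Hence our ideal I is not radical. Using Theorem 10, we compute its radical:
$$\begin{aligned} \mathrm{Rad}(I) &= I + \langle p_{\mathrm{red}}(x), p_{\mathrm{red}}(y), p_{\mathrm{red}}(z) \rangle \\ &= \Bigl\langle x - \tfrac{5}{2}y^2 - \tfrac{1}{2}y + \tfrac{5}{2}z^2 - \tfrac{1}{2}z,\ \ y + \tfrac{3125}{8}z^6 + \tfrac{625}{4}z^5 + \tfrac{375}{4}z^4 + \tfrac{125}{4}z^3 + \tfrac{65}{8}z^2 + 3z,\ \ z^7 + \tfrac{6}{25}z^5 + \tfrac{17}{625}z^3 + \tfrac{8}{15625}z \Bigr\rangle. \end{aligned}$$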
The three given generators form a lexicographic Gröbner basis. We see that $\mathcal{V}(I)$ has cardinality seven. The only real root is the origin. The other six zeros of I in $\mathbb{C}^3$ are not real. They are gotten by cyclically shifting
$$(x, y, z) = (-0.14233 - 0.35878i,\ 0.14233 - 0.35878i,\ 0.15188i)$$
and
$$(x, y, z) = (-0.14233 + 0.35878i,\ 0.14233 + 0.35878i,\ -0.15188i).$$
Note that the coordinates of these vectors also can be written in terms of radicals since $p_{\mathrm{red}}(x)/x$ is a cubic polynomial in $x^2$.
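A quick numerical sanity check of the six non-real zeros: substituting them into the three generators in (15) should give residuals on the order of the five-digit rounding of the coordinates. The sketch below assumes only the coordinates printed above, their complex conjugates, and the cyclic symmetry of (15).

```python
def generators(x, y, z):
    """The three generators of the ideal I in (15)."""
    return ((x - y) ** 3 - z ** 2, (z - x) ** 3 - y ** 2, (y - z) ** 3 - x ** 2)


P = (-0.14233 - 0.35878j, 0.14233 - 0.35878j, 0.15188j)
Q = tuple(c.conjugate() for c in P)  # the complex conjugate point

zeros = []
for a, b, c in (P, Q):
    zeros += [(a, b, c), (b, c, a), (c, a, b)]  # cyclic shifts stay in V(I)

residual = max(abs(g) for pt in zeros for g in generators(*pt))
print(len(set(zeros)), residual)  # six distinct points, residual near rounding level
```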
2.2
Localizing and Removing Known Zeros
In the example above, the origin is a zero of multiplicity 8, and it would have made sense to remove this distinguished zero right from the beginning. In this section we explain how to do this and how the number 8 could have been derived a priori. Let I be a zero-dimensional ideal in $S = \mathbb{Q}[x_1, \ldots, x_n]$ and $p = (p_1, \ldots, p_n)$ any point with coordinates in $\mathbb{Q}$. We consider the associated maximal ideal
$$M = \langle x_1 - p_1, x_2 - p_2, \ldots, x_n - p_n \rangle \subset S.$$
The ideal quotient of I by M is defined as
$$(I : M) = \{ f \in S : f \cdot M \subseteq I \}.$$
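We can iterate this process to get the increasing sequence of ideals
$$I \subseteq (I : M) \subseteq (I : M^2) \subseteq (I : M^3) \subseteq \cdots$$
This sequence stabilizes with an ideal called the saturation
$$(I : M^\infty) = \{ f \in S : \exists\, m \in \mathbb{N} : f \cdot M^m \subseteq I \}.$$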
Proposition 11. The variety of $(I : M^\infty)$ equals $\mathcal{V}(I) \setminus \{p\}$.
Here is how we compute the ideal quotient and the saturation in Macaulay 2. We demonstrate this for the ideal in the previous section and p = (0, 0, 0):
i1 : R = QQ[x,y,z];
i2 : I = ideal( (x-y)^3-z^2, (z-x)^3-y^2, (y-z)^3-x^2 );
i3 : M = ideal( x , y, z );
i4 : gb (I : M)
o4 = | y2z-1/2xz2-yz2+1/2z3+13/60x2-1/12y2+7/60z2
       xyz+3/4xz2+3/4yz2+1/20x2-1/20y2
       x2z-xz2-1/2yz2+ ....
i5 : gb saturate(I,M)
o5 = | z2+1/5x-1/5y+2/25 y2-1/5x+1/5z+2/25 xy+xz+yz+1/25 x2+1/5y-1/5z+2/25 |
i6 : degree I, degree (I:M), degree (I:M^2), degree(I:M^3)
o6 = (14, 13, 10, 7)
i7 : degree (I : M^4), degree (I : M^5), degree (I : M^6)
o7 = (6, 6, 6)
In this example, the fourth ideal quotient $(I : M^4)$ equals the saturation $(I : M^\infty)$ = saturate(I,M). Since p = (0, 0, 0) is a zero of high multiplicity, namely eight, it would be interesting to further explore the local ring $S_p/I_p$. This is an 8-dimensional $\mathbb{Q}$-vector space which tells the scheme structure at p, meaning the manner in which those eight points pile on top of one another. The reader need not be alarmed if he has not yet fully digested the notion of schemes in algebraic geometry (Eisenbud and Harris 2000). An elementary but useful perspective on schemes will be provided in Lecture 10 where we discuss linear partial differential equations with constant coefficients.
The following general method can be used to compute the local ring at an isolated zero of any polynomial system. Form the ideal quotient
$$J = \bigl(I : (I : M^\infty)\bigr). \qquad (16)$$
Proposition 12. The ring S/J is isomorphic to the local ring $S_p/I_p$ under the natural map $x_i \mapsto x_i$. In particular, the multiplicity of p as a zero of I equals the number of standard monomials for any Gröbner basis of J.
In our example, the local ideal J is particularly simple and the multiplicity eight is obvious. Here is how the Macaulay 2 session continues:
i8 : J = ( I : saturate(I,M) )
o8 = ideal (z^2, y^2, x^2)
i9 : degree J
o9 = 8
We note that Singular is fine-tuned for efficient computations in local rings via the techniques in Chapter 4 of (Cox, Little & O'Shea 1998).
Propositions 11 and 12 provide a decomposition of the given ideal:
$$I = J \cap (I : M^\infty). \qquad (17)$$
Here J is the iterated ideal quotient in (16). This ideal is primary to the maximal ideal M, that is, Rad(J) = M. We can now iterate by applying this process to the ideal $(I : M^\infty)$, and this will eventually lead to the primary decomposition of I. We shall return to this topic in later lectures.
For the ideal in our example, the decomposition (17) is already the primary decomposition when working over the field of rational numbers. It equals
$$\bigl\langle (x-y)^3 - z^2, (z-x)^3 - y^2, (y-z)^3 - x^2 \bigr\rangle = \langle x^2, y^2, z^2 \rangle \cap \Bigl\langle z^2 + \tfrac{1}{5}x - \tfrac{1}{5}y + \tfrac{2}{25},\ y^2 - \tfrac{1}{5}x + \tfrac{1}{5}z + \tfrac{2}{25},\ x^2 + \tfrac{1}{5}y - \tfrac{1}{5}z + \tfrac{2}{25},\ xy + xz + yz + \tfrac{1}{25} \Bigr\rangle.$$
Note that the second ideal is maximal and hence prime in $\mathbb{Q}[x, y, z]$. The given generators are a Gröbner basis, with leading terms $z^2$, $y^2$, $x^2$ and $xy$.
2.3
Companion Matrices
Let I be a zero-dimensional ideal in $S = \mathbb{Q}[x_1, \ldots, x_n]$, and suppose that the $\mathbb{Q}$-vector space S/I has dimension d. In this section we assume that some Gröbner basis of I is known. Let B denote the associated monomial basis for S/I. Multiplication by any of the variables $x_i$ defines an endomorphism
$$S/I \to S/I, \quad f \mapsto x_i \cdot f. \qquad (18)$$
We write $T_i$ for the d×d-matrix over $\mathbb{Q}$ which represents the linear map (18) with respect to the basis B. The rows and columns of $T_i$ are indexed by the monomials in B. If $x^u, x^v \in B$ then the entry of $T_i$ in row $x^u$ and column $x^v$ is the coefficient of $x^u$ in the normal form of $x_i \cdot x^v$. We call $T_i$ the i-th companion matrix of the ideal I. It follows directly from the definition that the companion matrices commute pairwise:
$$T_i \cdot T_j = T_j \cdot T_i \quad \text{for } 1 \leq i < j \leq n.$$
The matrices $T_i$ generate a commutative subalgebra of the non-commutative ring of $d \times d$-matrices, and this subalgebra is isomorphic to our ring:
$$ \mathbb{Q}[T_1, \ldots, T_n] \;\simeq\; S/I, \qquad T_i \mapsto x_i. $$
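For $n = 1$ and $I = \langle p(x) \rangle$, the basis $\mathcal{B} = \{1, x, \ldots, x^{d-1}\}$ turns this definition into the classical companion matrix of $p$. The minimal Python sketch below uses exact arithmetic from the standard `fractions` module; all function names are hypothetical. It builds $T_x$ column by column from the normal forms of $x \cdot x^j$. It then checks that substituting $T_x$ into $p$ gives the zero matrix, as the isomorphism $\mathbb{Q}[T_x] \simeq S/I$ predicts.

```python
from fractions import Fraction


def companion_matrix(coeffs):
    """T_x for I = <a_0 + a_1 x + ... + a_d x^d> in the basis 1, x, ..., x^(d-1).

    Column j holds the coordinates of the normal form of x * x^j.
    """
    d = len(coeffs) - 1
    lead = Fraction(coeffs[-1])
    T = [[Fraction(0)] * d for _ in range(d)]
    for j in range(d - 1):
        T[j + 1][j] = Fraction(1)                  # x * x^j = x^(j+1) is standard
    for i in range(d):
        T[i][d - 1] = -Fraction(coeffs[i]) / lead  # x^d reduced modulo p
    return T


def matmul(A, B):
    n = len(A)
    return [[sum((A[i][k] * B[k][j] for k in range(n)), Fraction(0)) for j in range(n)]
            for i in range(n)]


def evaluate_at_matrix(coeffs, T):
    """Horner evaluation of p(T) = a_d T^d + ... + a_1 T + a_0 I."""
    n = len(T)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    result = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, T)
        result = [[result[i][j] + a * identity[i][j] for j in range(n)] for i in range(n)]
    return result


p = [2, -3, 1]                 # p(x) = x^2 - 3x + 2 = (x - 1)(x - 2)
T = companion_matrix(p)        # equals [[0, -2], [1, 3]]
print(evaluate_at_matrix(p, T))
```

The trace of $T$ is $3 = 1 + 2$, the sum of the roots, which anticipates the eigenvalue description in Theorem 13.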
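Theorem 13. The complex zeros of the ideal $I$ are the vectors of joint eigenvalues of the companion matrices $T_1, \ldots, T_n$, that is,
$$ V(I) \;=\; \big\{ (\lambda_1, \ldots, \lambda_n) \in \mathbb{C}^n \;:\; \exists\, v \in \mathbb{C}^d \setminus \{0\}\ \ \forall i :\ T_i \cdot v = \lambda_i \cdot v \big\}. \qquad (19) $$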
Proof. Suppose that $v$ is a non-zero complex vector such that $T_i \cdot v = \lambda_i \cdot v$ for all $i$. Then, for any polynomial $p \in S$,
$$ p(T_1, \ldots, T_n) \cdot v \;=\; p(\lambda_1, \ldots, \lambda_n) \cdot v. $$
If $p$ is in the ideal $I$, then $p(T_1, \ldots, T_n)$ is the zero matrix and we conclude that $p(\lambda_1, \ldots, \lambda_n) = 0$. Hence the left hand side of (19) contains the right hand side of (19).

We prove the converse under the hypothesis that $I$ is a radical ideal. (The general case is left to the reader.) Let $\lambda = (\lambda_1, \ldots, \lambda_n) \in \mathbb{C}^n$ be any zero of $I$. There exists a polynomial $q \in S \otimes \mathbb{C}$ such that $q(\lambda) = 1$ and $q$ vanishes at all points in $V(I) \setminus \{\lambda\}$. Then $x_i \cdot q = \lambda_i \cdot q$ holds on $V(I)$, hence $(x_i - \lambda_i) \cdot q$ lies in the radical ideal generated by $I$ in $S \otimes \mathbb{C}$. Let $v$ be the non-zero vector representing the element $q$ of $S/I \otimes \mathbb{C}$. Then $v$ is a joint eigenvector with joint eigenvalue $\lambda$.

Suppose that $I$ is a zero-dimensional radical ideal. We can form a square invertible matrix $V$ whose columns are the eigenvectors $v$ described above. Then $V^{-1} \cdot T_i \cdot V$ is a diagonal matrix whose entries are the $i$-th coordinates of all the zeros of $I$. This proves the if-direction in the following corollary. The only-if-direction is also true, but we omit its proof.

Corollary 14. The companion matrices $T_1, \ldots, T_n$ can be simultaneously diagonalized if and only if $I$ is a radical ideal.

As an example, consider the Gröbner basis given at the end of the last section. The given ideal is a prime ideal in $\mathbb{Q}[x, y, z]$ having degree $d = 6$. We determine the three companion matrices Tx, Ty and Tz using maple:

> with(Groebner):
> GB := [z^2+1/5*x-1/5*y+2/25, y^2-1/5*x+1/5*z+2/25,
>        x*y+x*z+y*z+1/25, x^2+1/5*y-1/5*z+2/25]:
> B := [1, x, y, z, x*z, y*z]:
> for v in [x,y,z] do
>   T := array([],1..6,1..6):
>   for j from 1 to 6 do
>     p := normalf( v*B[j], GB, tdeg(x,y,z)):
>     for i from 1 to 6 do
>       T[i,j] := coeff(coeff(coeff(p,x,degree(B[i],x)),y,
>                 degree(B[i],y)),z,degree(B[i],z)):
>     od:
>   od:
>   print(cat(T,v),T);
> od:

maple prints the following three matrices (rows and columns are indexed by the basis $1, x, y, z, xz, yz$):
$$ T_x = \begin{pmatrix}
0 & -\frac{2}{25} & -\frac{1}{25} & 0 & -\frac{2}{125} & 0 \\
1 & 0 & 0 & 0 & -\frac{1}{25} & \frac{1}{25} \\
0 & -\frac{1}{5} & 0 & 0 & \frac{1}{25} & \frac{1}{25} \\
0 & \frac{1}{5} & 0 & 0 & -\frac{2}{25} & \frac{1}{25} \\
0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & -1 & 0 & -\frac{1}{5} & 0
\end{pmatrix}, \qquad
T_y = \begin{pmatrix}
0 & -\frac{1}{25} & -\frac{2}{25} & 0 & 0 & \frac{2}{125} \\
0 & 0 & \frac{1}{5} & 0 & \frac{1}{25} & \frac{1}{25} \\
1 & 0 & 0 & 0 & \frac{1}{25} & -\frac{1}{25} \\
0 & 0 & -\frac{1}{5} & 0 & \frac{1}{25} & -\frac{2}{25} \\
0 & -1 & 0 & 0 & 0 & \frac{1}{5} \\
0 & -1 & 0 & 1 & 0 & 0
\end{pmatrix}, $$
$$ T_z = \begin{pmatrix}
0 & 0 & 0 & -\frac{2}{25} & \frac{1}{125} & -\frac{1}{125} \\
0 & 0 & 0 & -\frac{1}{5} & -\frac{2}{25} & \frac{1}{25} \\
0 & 0 & 0 & \frac{1}{5} & \frac{1}{25} & -\frac{2}{25} \\
1 & 0 & 0 & 0 & -\frac{1}{25} & -\frac{1}{25} \\
0 & 1 & 0 & 0 & -\frac{1}{5} & \frac{1}{5} \\
0 & 0 & 1 & 0 & -\frac{1}{5} & \frac{1}{5}
\end{pmatrix}. $$
The matrices Tx, Ty and Tz commute pairwise and they can be simultaneously diagonalized. The entries on the diagonal are the six complex zeros. We invite the reader to compute the common basis of eigenvectors using matlab.
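The maple computation can be mirrored in Python with exact rationals. The sketch below is our own. It hard-codes the Gröbner basis and the monomial basis $1, x, y, z, xz, yz$ from the text, written as the hypothetical rewrite rules `RULES`. Its naive normal-form reduction relies on the stated fact that these four polynomials form a Gröbner basis with leading terms $z^2, y^2, xy, x^2$. The sketch rebuilds the three companion matrices and checks two facts used in this section:
- the matrices commute pairwise;
- each Gröbner basis element $p$ satisfies $p(T_x, T_y, T_z) = 0$, which is the first step in the proof of Theorem 13.

```python
from fractions import Fraction as F

# GB from the text as rewrite rules "leading monomial -> tail"; exponents ordered (x, y, z).
RULES = [
    ((0, 0, 2), {(1, 0, 0): F(-1, 5), (0, 1, 0): F(1, 5), (0, 0, 0): F(-2, 25)}),  # z^2
    ((0, 2, 0), {(1, 0, 0): F(1, 5), (0, 0, 1): F(-1, 5), (0, 0, 0): F(-2, 25)}),  # y^2
    ((1, 1, 0), {(1, 0, 1): F(-1), (0, 1, 1): F(-1), (0, 0, 0): F(-1, 25)}),       # xy
    ((2, 0, 0), {(0, 1, 0): F(-1, 5), (0, 0, 1): F(1, 5), (0, 0, 0): F(-2, 25)}),  # x^2
]
BASIS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1)]  # 1, x, y, z, xz, yz


def normal_form(poly):
    """Reduce {exponent: coefficient} until no term is divisible by a leading monomial."""
    poly = {m: c for m, c in poly.items() if c != 0}
    while True:
        hit = next(((m, lead, tail) for m in poly for lead, tail in RULES
                    if all(a >= b for a, b in zip(m, lead))), None)
        if hit is None:
            return poly
        m, lead, tail = hit
        coeff = poly.pop(m)
        shift = tuple(a - b for a, b in zip(m, lead))
        for t, c in tail.items():
            key = tuple(a + b for a, b in zip(t, shift))
            poly[key] = poly.get(key, F(0)) + coeff * c
        poly = {k: v for k, v in poly.items() if v != 0}


def companion(var):
    """Column j holds the coordinates of NF(x_var * BASIS[j]) in BASIS."""
    n = len(BASIS)
    T = [[F(0)] * n for _ in range(n)]
    for j, b in enumerate(BASIS):
        shifted = tuple(e + (k == var) for k, e in enumerate(b))
        nf = normal_form({shifted: F(1)})
        assert set(nf) <= set(BASIS), "normal form left the span of BASIS"
        for i, mono in enumerate(BASIS):
            T[i][j] = nf.get(mono, F(0))
    return T


def matmul(A, B):
    n = len(A)
    return [[sum((A[i][k] * B[k][j] for k in range(n)), F(0)) for j in range(n)]
            for i in range(n)]


def eval_at(poly, mats):
    """p(T_x, T_y, T_z) for p given as {exponent: coefficient}."""
    n = len(BASIS)
    total = [[F(0)] * n for _ in range(n)]
    for expo, c in poly.items():
        M = [[F(int(i == j)) for j in range(n)] for i in range(n)]
        for T, e in zip(mats, expo):
            for _ in range(e):
                M = matmul(M, T)
        total = [[total[i][j] + c * M[i][j] for j in range(n)] for i in range(n)]
    return total


Tx, Ty, Tz = (companion(v) for v in range(3))
# Each GB element is "leading monomial minus tail".
generators = [{lead: F(1), **{m: -c for m, c in tail.items()}} for lead, tail in RULES]
print(matmul(Tx, Ty) == matmul(Ty, Tx))
print(all(v == 0 for g in generators for row in eval_at(g, (Tx, Ty, Tz)) for v in row))
```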
2.4
The Trace Form
In this section we explain how to compute the number of real roots of a zero-dimensional ideal which is presented to us by a Gröbner basis as before. Fix any other polynomial $h \in S$ and consider the following bilinear form on our vector space $S/I \simeq \mathbb{Q}^d$. This is called the trace form for $h$:
$$ B_h : S/I \times S/I \to \mathbb{Q}, \qquad (f, g) \mapsto \mathrm{trace}\big( (f \cdot g \cdot h)(T_1, T_2, \ldots, T_n) \big). $$
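For the running example, the entries of $B_1$ can be computed directly from normal forms. The minimal Python sketch below is our own. It hard-codes the Gröbner basis and the monomial basis $1, x, y, z, xz, yz$ of the example, and its naive reduction assumes these form a Gröbner basis. The entry in row $x^u$ and column $x^v$ is taken as the sum over $x^w$ in the basis of the coefficient of $x^w$ in the normal form of $x^{u+v+w}$, that is, the trace of multiplication by $x^{u+v}$. It should reproduce the matrix $B_1$ that maple prints below.

```python
from fractions import Fraction as F

# GB of the running example as rewrite rules "leading monomial -> tail"; exponents (x, y, z).
RULES = [
    ((0, 0, 2), {(1, 0, 0): F(-1, 5), (0, 1, 0): F(1, 5), (0, 0, 0): F(-2, 25)}),  # z^2
    ((0, 2, 0), {(1, 0, 0): F(1, 5), (0, 0, 1): F(-1, 5), (0, 0, 0): F(-2, 25)}),  # y^2
    ((1, 1, 0), {(1, 0, 1): F(-1), (0, 1, 1): F(-1), (0, 0, 0): F(-1, 25)}),       # xy
    ((2, 0, 0), {(0, 1, 0): F(-1, 5), (0, 0, 1): F(1, 5), (0, 0, 0): F(-2, 25)}),  # x^2
]
BASIS = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1)]  # 1, x, y, z, xz, yz


def normal_form(poly):
    """Reduce {exponent: coefficient} until no term is divisible by a leading monomial."""
    poly = {m: c for m, c in poly.items() if c != 0}
    while True:
        hit = next(((m, lead, tail) for m in poly for lead, tail in RULES
                    if all(a >= b for a, b in zip(m, lead))), None)
        if hit is None:
            return poly
        m, lead, tail = hit
        coeff = poly.pop(m)
        shift = tuple(a - b for a, b in zip(m, lead))
        for t, c in tail.items():
            key = tuple(a + b for a, b in zip(t, shift))
            poly[key] = poly.get(key, F(0)) + coeff * c
        poly = {k: v for k, v in poly.items() if v != 0}


def trace_form():
    """B_1[i][j] = trace of multiplication by BASIS[i] * BASIS[j] on S/I."""
    n = len(BASIS)
    B1 = [[F(0)] * n for _ in range(n)]
    for i, bi in enumerate(BASIS):
        for j, bj in enumerate(BASIS):
            for bk in BASIS:
                mono = tuple(a + b + c for a, b, c in zip(bi, bj, bk))
                B1[i][j] += normal_form({mono: F(1)}).get(bk, F(0))
    return B1


B1 = trace_form()
print(B1[0])                           # first row: 6, 0, 0, 0, -2/25, -2/25
print(sum(B1[i][i] for i in range(6)))  # trace of B_1
```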
We represent the quadratic form $B_h$ by a symmetric $d \times d$-matrix over $\mathbb{Q}$ with respect to the basis $\mathcal{B}$. If $x^u, x^v \in \mathcal{B}$, then the entry of $B_h$ in row $x^u$ and column $x^v$ is the sum of the diagonal entries in the $d \times d$-matrix gotten by substituting the companion matrices $T_i$ for the variables $x_i$ in the polynomial $x^{u+v} \cdot h$. This rational number can be computed by summing, over all $x^w \in \mathcal{B}$, the coefficient of $x^w$ in the normal form of $x^{u+v+w} \cdot h$ modulo $I$.

Since the matrix $B_h$ is symmetric, all of its eigenvalues are real numbers. The signature of $B_h$ is the number of positive eigenvalues of $B_h$ minus the number of negative eigenvalues of $B_h$. For $h = 1$ this number is always non-negative. For general $h$ it can be negative, as the following theorem shows. In the following theorem, real zeros of $I$ with multiplicities are counted only once.

Theorem 15. The signature of the trace form $B_h$ equals the number of real roots $p$ of $I$ with $h(p) > 0$ minus the number of real roots $p$ of $I$ with $h(p) < 0$.

The special case when $h = 1$ is used to count all real roots:

Corollary 16. The number of real roots of $I$ equals the signature of $B_1$.

We compute the symmetric $6 \times 6$-matrix $B_1$ for the case of the polynomial system whose companion matrices were determined in the previous section.

> with(linalg): with(Groebner):
> GB := [z^2+1/5*x-1/5*y+2/25, y^2-1/5*x+1/5*z+2/25,
>        x*y+x*z+y*z+1/25, x^2+1/5*y-1/5*z+2/25]:
> B := [1, x, y, z, x*z, y*z]:
> B1 := array([],1..6,1..6):
> for j from 1 to 6 do
>   for i from 1 to 6 do
>     B1[i,j] := 0:
>     for k from 1 to 6 do
>       B1[i,j] := B1[i,j] + coeff(coeff(coeff(
>         normalf(B[i]*B[j]*B[k], GB, tdeg(x,y,z)),x,
>         degree(B[k],x)), y, degree(B[k],y)),z, degree(B[k],z)):
>     od:
>   od:
> od:
> print(B1);
$$ B_1 = \begin{pmatrix}
6 & 0 & 0 & 0 & -\frac{2}{25} & -\frac{2}{25} \\
0 & -\frac{12}{25} & -\frac{2}{25} & -\frac{2}{25} & -\frac{2}{25} & 0 \\
0 & -\frac{2}{25} & -\frac{12}{25} & -\frac{2}{25} & 0 & \frac{2}{25} \\
0 & -\frac{2}{25} & -\frac{2}{25} & -\frac{12}{25} & \frac{2}{25} & -\frac{2}{25} \\
-\frac{2}{25} & -\frac{2}{25} & 0 & \frac{2}{25} & \frac{34}{625} & -\frac{16}{625} \\
-\frac{2}{25} & 0 & \frac{2}{25} & -\frac{2}{25} & -\frac{16}{625} & \frac{34}{625}
\end{pmatrix} $$
> charpoly(B1,z);
$$ z^6 - \frac{2918}{625} z^5 - \frac{117312}{15625} z^4 - \frac{1157248}{390625} z^3 - \frac{625664}{9765625} z^2 + \frac{4380672}{48828125} z - \frac{32768}{9765625} $$
> fsolve(%);
   -.6400000, -.4371281, -.4145023, .04115916, .1171281, 6.002143

Here the matrix $B_1$ has three positive eigenvalues and three negative eigenvalues, so the trace form has signature zero. This confirms our earlier finding that these equations have no real zeros.

We note that we can read off the signature of $B_1$ directly from the characteristic polynomial. Namely, the characteristic polynomial has three sign changes in its coefficient sequence. Using the following result, which appears in Exercise 5 on page 67 of (Cox, Little & O'Shea, 1998), we infer that there are three positive real eigenvalues. Since the constant term is non-zero, $B_1$ has no zero eigenvalue, so the other three eigenvalues are negative and the signature of $B_1$ is zero.

Lemma 17. The number of positive eigenvalues of a real symmetric matrix equals the number of sign changes in the coefficient sequence of its characteristic polynomial.

It is instructive to examine the trace form for the case of one polynomial in one variable. Consider the principal ideal
$$ I = \langle a_d x^d + a_{d-1} x^{d-1} + \cdots + a_2 x^2 + a_1 x + a_0 \rangle \subset S = \mathbb{Q}[x]. $$
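We consider the traces of successive powers of the companion matrix:
$$ b_i \;:=\; \mathrm{trace}\big(T_x^{\,i}\big) \;=\; \sum_{u \in V(I)} u^i, $$
where the zeros $u$ are counted with multiplicity. Thus $b_i$ is a Laurent polynomial of degree zero in $a_0, \ldots, a_d$, which is essentially the familiar Newton relation between elementary symmetric functions and power sum symmetric functions. The trace form is given by the matrix
$$ B_1 = \begin{pmatrix}
b_0 & b_1 & b_2 & \cdots & b_{d-1} \\
b_1 & b_2 & b_3 & \cdots & b_d \\
b_2 & b_3 & b_4 & \cdots & b_{d+1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
b_{d-1} & b_d & b_{d+1} & \cdots & b_{2d-2}
\end{pmatrix} \qquad (20) $$
Thus the number of real zeros of $I$ is the signature of this Hankel matrix.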
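The univariate recipe can be turned into a small exact algorithm. The Python sketch below is our own illustration and proceeds in four steps:
1. compute $b_0, \ldots, b_{2d-2}$ from the coefficients by Newton's identities;
2. form the Hankel matrix (20);
3. obtain its characteristic polynomial with the Faddeev-LeVerrier recurrence (a standard method we chose; the text does not prescribe one);
4. read off the signature with Lemma 17, counting negative eigenvalues as the sign changes of $\chi(-z)$.

As a further check, it applies the same sign count to the characteristic polynomial of the $6 \times 6$-matrix $B_1$ from the maple session above.

```python
from fractions import Fraction


def power_sums(coeffs, count):
    """b_0, ..., b_{count-1} for p = a_0 + a_1 x + ... + a_d x^d (Newton's identities)."""
    d = len(coeffs) - 1
    c = [Fraction(a) / Fraction(coeffs[-1]) for a in coeffs]  # monic: c[d] == 1
    b = [Fraction(d)]
    for k in range(1, count):
        acc = k * c[d - k] if k <= d else Fraction(0)
        for i in range(1, min(k - 1, d) + 1):
            acc += c[d - i] * b[k - i]
        b.append(-acc)
    return b


def matmul(A, B):
    n = len(A)
    return [[sum((A[i][k] * B[k][j] for k in range(n)), Fraction(0)) for j in range(n)]
            for i in range(n)]


def charpoly(A):
    """Coefficients of det(zI - A), leading coefficient first (Faddeev-LeVerrier)."""
    n = len(A)
    chi = [Fraction(1)]
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (chi[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        chi.append(-sum(AM[i][i] for i in range(n)) / k)
    return chi


def sign_changes(seq):
    signs = [x > 0 for x in seq if x != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def signature_from_charpoly(chi):
    """Lemma 17: (#positive - #negative) eigenvalues of a real symmetric matrix."""
    n = len(chi) - 1
    positive = sign_changes(chi)
    negative = sign_changes([c * (-1) ** (n - i) for i, c in enumerate(chi)])
    return positive - negative


def count_real_roots(coeffs):
    """Number of distinct real roots of p = signature of the Hankel matrix (20)."""
    d = len(coeffs) - 1
    b = power_sums(coeffs, 2 * d - 1)
    hankel = [[b[i + j] for j in range(d)] for i in range(d)]
    return signature_from_charpoly(charpoly(hankel))


# (x - 1)(x - 2)(x^2 + 1) = x^4 - 3x^3 + 3x^2 - 3x + 2 has two real roots.
print(count_real_roots([2, -3, 3, -3, 1]))

# Characteristic polynomial of the 6x6 matrix B_1 from the maple session above.
chi_B1 = [Fraction(1), Fraction(-2918, 625), Fraction(-117312, 15625),
          Fraction(-1157248, 390625), Fraction(-625664, 9765625),
          Fraction(4380672, 48828125), Fraction(-32768, 9765625)]
print(sign_changes(chi_B1), signature_from_charpoly(chi_B1))
```

Zero coefficients are skipped in the sign count, which is why a repeated root such as in $(x-1)^2$ is correctly counted once.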
For instance, for $d = 4$ the entries in the $4 \times 4$-Hankel matrix $B_1$ are
$$ \begin{aligned}
b_0 &= 4, \\
b_1 &= -\frac{a_3}{a_4}, \\
b_2 &= \frac{-2a_4a_2 + a_3^2}{a_4^2}, \\
b_3 &= \frac{-3a_4^2a_1 + 3a_4a_3a_2 - a_3^3}{a_4^3}, \\
b_4 &= \frac{-4a_4^3a_0 + 4a_4^2a_3a_1 + 2a_4^2a_2^2 - 4a_4a_3^2a_2 + a_3^4}{a_4^4}, \\
b_5 &= \frac{5a_4^3a_3a_0 + 5a_4^3a_2a_1 - 5a_4^2a_3^2a_1 - 5a_4^2a_3a_2^2 + 5a_4a_3^3a_2 - a_3^5}{a_4^5}, \\
b_6 &= \frac{6a_4^4a_2a_0 + 3a_4^4a_1^2 - 6a_4^3a_3^2a_0 - 12a_4^3a_3a_2a_1 - 2a_4^3a_2^3 + 6a_4^2a_3^3a_1 + 9a_4^2a_3^2a_2^2 - 6a_4a_3^4a_2 + a_3^6}{a_4^6},
\end{aligned} $$
and the characteristic polynomial of the $4 \times 4$-matrix $B_1$ equals
$$ \begin{aligned}
& x^4 + (-b_0 - b_2 - b_4 - b_6)\, x^3 \\
& + (b_0b_2 + b_0b_4 + b_0b_6 + b_2b_4 + b_2b_6 + b_4b_6 - b_1^2 - b_2^2 - 2b_3^2 - b_4^2 - b_5^2)\, x^2 \\
& + (b_0b_3^2 + b_0b_4^2 + b_0b_5^2 - b_0b_2b_4 - b_0b_2b_6 - b_0b_4b_6 + b_1^2b_4 + b_1^2b_6 - 2b_1b_2b_3 - 2b_1b_3b_4 \\
& \qquad + b_2^3 + b_2^2b_6 + b_2b_3^2 + b_2b_5^2 - b_2b_4b_6 - 2b_2b_3b_5 - 2b_3b_4b_5 + b_3^2b_4 + b_3^2b_6 + b_4^3)\, x \\
& + \Delta,
\end{aligned} $$
where
$$ \begin{aligned}
\Delta \;=\; & b_0b_2b_4b_6 - b_0b_2b_5^2 - b_0b_3^2b_6 + 2b_0b_3b_4b_5 - b_0b_4^3 - b_1^2b_4b_6 + b_1^2b_5^2 + 2b_1b_2b_3b_6 \\
& - 2b_1b_2b_4b_5 - 2b_1b_3^2b_5 + 2b_1b_3b_4^2 - b_2^3b_6 + 2b_2^2b_3b_5 + b_2^2b_4^2 - 3b_2b_3^2b_4 + b_3^4.
\end{aligned} $$
By considering sign alternations among these expressions in $b_0, b_1, \ldots, b_6$, we get explicit conditions for the general quartic to have zero, one, two, three, or four real roots respectively. These are semialgebraic conditions. This means the conditions are Boolean combinations of polynomial inequalities in the five indeterminates $a_0, a_1, a_2, a_3, a_4$. In particular, all four zeros of the general quartic are real if and only if the trace form is positive definite. Recall that a symmetric matrix is positive definite if and only if its leading principal minors are positive. Hence the quartic has four real roots if and only if
$$ b_0 > 0, \quad b_0b_2 - b_1^2 > 0, \quad b_0b_2b_4 - b_0b_3^2 - b_1^2b_4 + 2b_1b_2b_3 - b_2^3 > 0 \quad \text{and} \quad \Delta > 0. $$
The last polynomial $\Delta$ is the determinant of $B_1$. It equals the discriminant of the quartic (displayed in maple at the beginning of Lecture 1) divided by $a_4^6$.
2.5
Exercises
(1) Let $A = (a_{ij})$ be a non-singular $n \times n$-matrix whose entries are positive integers. How many complex solutions do the following equations have:
$$ \prod_{j=1}^n x_j^{a_{1j}} \;=\; \prod_{j=1}^n x_j^{a_{2j}} \;=\; \cdots \;=\; \prod_{j=1}^n x_j^{a_{nj}} \;=\; 1. $$
(2) Pick a random homogeneous cubic polynomial in four variables. Compute the 27 lines on the cubic surface defined by your polynomial.
(3) Given $d$ arbitrary rational numbers $a_0, a_1, \ldots, a_{d-1}$, consider the system of $d$ polynomial equations in $d$ unknowns $z_1, z_2, \ldots, z_d$ given by setting
$$ x^d + a_{d-1}x^{d-1} + \cdots + a_1 x + a_0 = (x - z_1)(x - z_2) \cdots (x - z_d). $$
Describe the primary decomposition of this ideal in $\mathbb{Q}[z_1, z_2, \ldots, z_d]$. How can you use this to find the Galois group of the given polynomial?
(4) For any two positive integers $m, n$, find an explicit radical ideal $I$ in $\mathbb{Q}[x_1, \ldots, x_n]$ and a term order $\prec$ such that $\mathrm{in}_\prec(I) = \langle x_1, x_2, \ldots, x_n \rangle^m$.
(5) Fix the monomial ideal $M = \langle x, y \rangle^3 = \langle x^3, x^2y, xy^2, y^3 \rangle$ and compute its companion matrices $T_x, T_y$. Describe all polynomial ideals in $\mathbb{C}[x, y]$ which are within distance $\epsilon = 0.0001$ from $M$, in the sense that the companion matrices are $\epsilon$-close to $T_x, T_y$ in your favorite matrix norm.
(6) Does every zero-dimensional ideal in $\mathbb{Q}[x, y]$ have a radical ideal in all of its $\epsilon$-neighborhoods? How about zero-dimensional ideals in $\mathbb{Q}[x, y, z]$?
(7) How many distinct real vectors $(x, y, z) \in \mathbb{R}^3$ satisfy the equations
$$ x^3 + z = 2y^2, \qquad y^3 + x = 2z^2, \qquad z^3 + y = 2x^2 ? $$
(8) Pick eight random points in the real projective plane. Compute the 12 nodal cubic curves passing through your points. Can you find eight points such that all 12 cubic polynomials have real coefficients?
(9) Consider a quintic polynomial in two variables, for instance,
$$ f = 5y^5 + 19y^4x + 36y^3x^2 + 34y^2x^3 + 16yx^4 + 3x^5 + 6y^4 + 4y^3x + 6y^2x^2 + 4yx^3 + x^4 + 10y^3 + 10y^2 + 5y + 1. $$
Determine the irreducible factors of $f$ in $\mathbb{R}[x, y]$, and also in $\mathbb{C}[x, y]$.
(10) Consider a polynomial system which has infinitely many complex zeros but only finitely many of them have all their coordinates distinct. How would you compute those zeros with distinct coordinates?
(11) Does there exist a Laurent polynomial in $\mathbb{C}[t, t^{-1}]$ of the form
$$ f = t^{-4} + x_3 t^{-3} + x_2 t^{-2} + x_1 t^{-1} + y_1 t + y_2 t^2 + y_3 t^3 + t^4 $$
such that the powers $f^2, f^3, f^4, f^5, f^6$ and $f^7$ all have zero constant term? Can you find such a Laurent polynomial with real coefficients? What if we also require that the constant term of $f^8$ is zero?
(12) A well-studied problem in number theory is to find rational points on elliptic curves. Given an ideal $I \subset \mathbb{Q}[x_1, \ldots, x_n]$, how can you decide whether $V(I)$ is an elliptic curve, and, in the affirmative case, which computer program would you use to look for points in $V(I) \cap \mathbb{Q}^n$?
Bernstein's Theorem and Fewnomials
The Gröbner basis methods described in the previous lecture apply to arbitrary systems of polynomial equations. They are so general that they are frequently not the best choice when dealing with specific classes of polynomial systems. A situation encountered in many applications is a system of $n$ sparse polynomial equations in $n$ variables which has finitely many roots. Algebraically, this situation is special because we are dealing with a complete intersection, and sparsity allows us to use polyhedral techniques for counting and computing the zeros. This lecture gives a gentle introduction to sparse polynomial systems by explaining some basic techniques for $n = 2$.
3.1
From Bézout's Theorem to Bernstein's Theorem

A polynomial in two unknowns looks like
$$ f(x, y) = a_1 x^{u_1} y^{v_1} + a_2 x^{u_2} y^{v_2} + \cdots + a_m x^{u_m} y^{v_m}, \qquad (21) $$
where the exponents $u_i$ and $v_i$ are non-negative integers and the coefficients $a_i$ are non-zero rationals. Its total degree $\deg(f)$ is the maximum of the numbers $u_1 + v_1, \ldots, u_m + v_m$. The following theorem gives an upper bound on the number of common complex zeros of two polynomials in two unknowns.
Theorem 18. (Bézout's Theorem) Consider two polynomial equations in two unknowns: $g(x, y) = h(x, y) = 0$. If this system has only finitely many zeros $(x, y) \in \mathbb{C}^2$, then the number of zeros is at most $\deg(g) \cdot \deg(h)$.

Bézout's Theorem is the best possible in the sense that almost all polynomial systems have $\deg(g) \cdot \deg(h)$ distinct solutions. An explicit example is gotten by taking $g$ and $h$ as products of linear polynomials $u_1 x + u_2 y + u_3$. More precisely, there exists a polynomial in the coefficients of $g$ and $h$ such that whenever this polynomial is non-zero, $g$ and $h$ have the expected number of zeros. The first exercise below concerns finding such a polynomial.

A drawback of Bézout's Theorem is that it yields little information for polynomials that are sparse. For example, consider the two polynomials
$$ g(x, y) = a_1 + a_2 x + a_3 xy + a_4 y, \qquad h(x, y) = b_1 + b_2 x^2 y + b_3 x y^2. \qquad (22) $$
These two polynomials have precisely four distinct zeros $(x, y) \in \mathbb{C}^2$ for generic choices of coefficients $a_i$ and $b_j$. Here generic means that a certain polynomial in the coefficients $a_i, b_j$, called the discriminant, should be non-zero. The discriminant of the system (22) is the following expression:

4a7 a3 b3 b3 + a6 a2 b2 b4 2a6 a2 a4 b3 b3 + a6 a2 b4 b2 + 22a5 a2 a2 b1 b2 b3 1 2 3 1 2 2 3 1 2 3 1 4 2 3 1 3 2 3 5 2 3 2 4 3 4 5 2 4 4 +22a1 a3 a4 b1 b2 b3 + 22a1 a2 a3 b1 b2 b3 + 18a1 a2 a3 a4 b1 b2 30a1 a2 a3 a2 b1 b3 b2 4 2 3 4 4 2 2 2 4 3 4 3 5 5 3 4 4 +a1 a3 b1 b2 b3 + 22a1 a3 a4 b1 b2 b3 + 4a1 a2 b1 b3 14a1 a2 a4 b1 b2 b3 3 3 2 +10a1 a2 a4 b1 b2 b3 + 22a3 a2 a3 b2 b2 b3 + 10a3 a2 a3 b1 b3 b2 + 116a3 a2 a3 a4 b2 b2 b2 2 3 1 2 3 1 3 1 2 4 2 3 1 3 1 2 3 3 4 4 3 3 2 2 3 3 5 5 2 4 2 2 4 14a1 a2 a4 b1 b2 b3 + 22a1 a3 a4 b1 b2 b3 + 4a1 a4 b1 b2 + a1 a2 a3 b1 b3 2 3 2 +94a1 a2 a3 a4 b2 b2 b3 318a2 a2 a2 a2 b2 b2 b2 + 396a1 a3 a3 a3 b2 b2 b2 + a2 a2 a4 b2 b4 1 3 1 2 3 4 1 2 3 2 4 1 2 3 1 3 4 1 2 +94a2 a2 a2 a3 b2 b3 b3 + 4a2 a2 a5 b3 b2 b2 + 4a2 a5 a4 b3 b2 b3 + 18a1 a5 a3 a4 b2 b4 1 3 4 1 2 1 3 1 3 1 3 1 2 2 1 3 4 2 2 3 2 4 3 2 2 4 2 3 6 2 2 4 216a1 a2 a3 a4 b1 b2 b3 + 96a1 a2 a3 a4 b1 b2 b3 216a1 a2 a3 a4 b1 b2 b3 27a2 a4 b1 b3 30a4 a2 a3 a4 b1 b2 b3 + 96a1 a2 a4 a2 b3 b2 b3 + 108a5 a3 b2 b2 b3 1 2 2 3 3 4 1 2 2 4 1 3 +4a4 a3 a4 b3 b3 162a4 a4 b2 b2 b2 132a3 a3 a2 b3 b2 b2 + 108a3 a5 b2 b3 b3 2 3 1 3 2 4 1 2 3 2 3 4 1 3 2 4 1 2 2 3 3 3 2 2 6 2 4 6 4 132a2 a3 a4 b1 b2 b3 27a2 a4 b1 b2 + 16a2 a3 a4 b1 b2 b3 + 4a2 a3 a4 b3 b3 3 4 1 2

If this polynomial of degree 14 is non-zero, then the system (22) has four distinct complex zeros. This discriminant is computed in maple as follows.

g := a1 + a2 * x + a3 * x*y + a4 * y;
h := b1 + b2 * x^2 * y + b3 * x * y^2;
R := resultant(g,h,x):
S := factor( resultant(R,diff(R,y),y) ):
discriminant := op( nops(S), S);

The last command extracts the last (and most important) factor of the expression S.

Bézout's Theorem would predict $\deg(g) \cdot \deg(h) = 6$ common complex zeros for the equations in (22). Indeed, in projective geometry we would expect the quadratic curve $\{g = 0\}$ and the cubic curve $\{h = 0\}$ to intersect in six points. But these particular curves never intersect in more than four points in $\mathbb{C}^2$. How come? To understand why the number is four and not six, we need to associate convex polygons with our given polynomials.

Convex polytopes have been studied since the earliest days of mathematics. We shall see that they are very useful for analyzing and solving polynomial equations. A polytope is a subset of $\mathbb{R}^n$ which is the convex hull of a finite set of points. A familiar example is the convex hull of $\{(0,0,0), (0,1,0), (0,0,1), (0,1,1), (1,0,0), (1,1,0), (1,0,1), (1,1,1)\}$ in $\mathbb{R}^3$; this is the regular 3-cube. A $d$-dimensional polytope has many faces, which are again polytopes of various dimensions between $0$ and $d - 1$. The $0$-dimensional faces are called vertices, the $1$-dimensional faces are called edges, and the $(d-1)$-dimensional faces are called facets. For instance, the cube has 8 vertices, 12 edges and 6 facets. If $d = 2$, then the edges coincide with the facets. A 2-dimensional polytope is called a polygon.

Consider the polynomial $f(x, y)$ in (21). Each term $x^{u_i} y^{v_i}$ appearing in $f(x, y)$ can be regarded as a lattice point $(u_i, v_i)$ in the plane $\mathbb{R}^2$. The convex hull of all these points is called the Newton polygon of $f(x, y)$. In symbols,
$$ \mathrm{New}(f) := \mathrm{conv}\{ (u_1, v_1), (u_2, v_2), \ldots, (u_m, v_m) \}. $$
This is a polygon in $\mathbb{R}^2$ having at most $m$ vertices. More generally, every polynomial in $n$ unknowns gives rise to a Newton polytope in $\mathbb{R}^n$.

Our running example in this lecture is the pair of polynomials in (22). The Newton polygon of the polynomial $g(x, y)$ is a quadrangle, and the Newton polygon of $h(x, y)$ is a triangle. If $P$ and $Q$ are any two polygons in the plane, then their Minkowski sum is the polygon
$$ P + Q := \{ p + q : p \in P,\ q \in Q \}. $$
Note that each edge of $P + Q$ is parallel to an edge of $P$ or an edge of $Q$.
The geometric operation of taking the Minkowski sum of polytopes mirrors the algebraic operation of multiplying polynomials. More precisely, the Newton polytope of a product of two polynomials equals the Minkowski sum of the two given Newton polytopes:
$$ \mathrm{New}(g \cdot h) = \mathrm{New}(g) + \mathrm{New}(h). $$
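If $P$ and $Q$ are any two polygons, then we define their mixed area as
$$ \mathcal{M}(P, Q) := \mathrm{area}(P + Q) - \mathrm{area}(P) - \mathrm{area}(Q). $$
For instance, the mixed area of the two Newton polygons in (22) equals
$$ \mathcal{M}(P, Q) = \mathcal{M}(\mathrm{New}(g), \mathrm{New}(h)) = \frac{13}{2} - \frac{3}{2} - 1 = 4. $$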
The correctness of this computation can be seen in the following diagram:
Figure: Mixed subdivision
This figure shows a subdivision of $P + Q$ into five pieces: a translate of $P$, a translate of $Q$ and three parallelograms. The mixed area is the sum of the areas of the three parallelograms, which is four. This number coincides with the number of common zeros of $g$ and $h$. This is not an accident, but is an instance of a general theorem due to David Bernstein (1975). We abbreviate $\mathbb{C}^* := \mathbb{C} \setminus \{0\}$. The set $(\mathbb{C}^*)^2$ of pairs $(x, y)$ with $x \neq 0$ and $y \neq 0$ is a group under multiplication, called the two-dimensional algebraic torus.
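The mixed-area computation for (22) is easy to automate. The following minimal Python sketch is our own and is not a method taken from the lecture. It forms the Minkowski sum as the convex hull of all pairwise sums of lattice points, using Andrew's monotone-chain hull and the shoelace formula with exact rationals. It then applies the definition of $\mathcal{M}(P, Q)$ directly.

```python
from fractions import Fraction


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area(points):
    """Area of the convex hull of the points (0 for points or segments)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return Fraction(0)
    k = len(hull)
    s = sum(hull[i][0] * hull[(i + 1) % k][1] - hull[(i + 1) % k][0] * hull[i][1]
            for i in range(k))
    return Fraction(abs(s), 2)


def minkowski_sum(P, Q):
    return [(p[0] + q[0], p[1] + q[1]) for p in P for q in Q]


def mixed_area(P, Q):
    return area(minkowski_sum(P, Q)) - area(P) - area(Q)


new_g = [(0, 0), (1, 0), (1, 1), (0, 1)]  # exponents of a1 + a2 x + a3 xy + a4 y
new_h = [(0, 0), (2, 1), (1, 2)]          # exponents of b1 + b2 x^2 y + b3 x y^2
print(mixed_area(new_g, new_h))           # 4
```

Replacing the two polygons by the triangles $\mathrm{conv}\{(0,0),(0,d),(d,0)\}$ and $\mathrm{conv}\{(0,0),(0,e),(e,0)\}$ should give $d \cdot e$, which is the Bézout special case discussed next.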
Theorem 19. (Bernstein's Theorem) If $g$ and $h$ are two generic bivariate polynomials, then the number of solutions of $g(x, y) = h(x, y) = 0$ in $(\mathbb{C}^*)^2$ equals the mixed area $\mathcal{M}(\mathrm{New}(g), \mathrm{New}(h))$.

Actually, this assertion is valid for Laurent polynomials, which means that the exponents in our polynomials (21) can be any integers, possibly negative. Bernstein's Theorem implies the following combinatorial fact about lattice polygons. If $P$ and $Q$ are lattice polygons (i.e., the vertices of $P$ and $Q$ have integer coordinates), then $\mathcal{M}(P, Q)$ is a non-negative integer.

We remark that Bézout's Theorem follows as a special case from Bernstein's Theorem. Namely, if $g$ and $h$ are general polynomials of degree $d$ and $e$ respectively, then their Newton polygons are the triangles
$$ \begin{aligned}
P &:= \mathrm{New}(g) = \mathrm{conv}\{(0, 0), (0, d), (d, 0)\}, \\
Q &:= \mathrm{New}(h) = \mathrm{conv}\{(0, 0), (0, e), (e, 0)\}, \\
P + Q &:= \mathrm{New}(g \cdot h) = \mathrm{conv}\{(0, 0), (0, d + e), (d + e, 0)\}.
\end{aligned} $$
The areas of these triangles are $d^2/2$, $e^2/2$ and $(d + e)^2/2$, and hence
$$ \mathcal{M}(P, Q) = \frac{(d + e)^2}{2} - \frac{d^2}{2} - \frac{e^2}{2} = d \cdot e. $$
Hence two general plane curves of degree $d$ and $e$ meet in $d \cdot e$ points.

We shall present a proof of Bernstein's Theorem. This proof is algorithmic in the sense that it tells us how to approximate all the zeros numerically. The steps in this proof form the foundation for the method of polyhedral homotopies for solving polynomial systems. This is an active area of research, with lots of exciting progress by work of T.Y. Li, Jan Verschelde and others. We proceed in three steps. The first deals with an easy special case.
3.2
Zero-dimensional Binomial Systems
A binomial is a polynomial with two terms. We first prove Theorem 19 in the case when $g$ and $h$ are binomials. After multiplying or dividing both binomials by suitable scalars and powers of the variables, we may assume that our given equations are
$$ g = x^{a_1} y^{b_1} - c_1 \quad \text{and} \quad h = x^{a_2} y^{b_2} - c_2, \qquad (23) $$
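where $a_1, a_2, b_1, b_2$ are integers (possibly negative) and $c_1, c_2$ are non-zero complex numbers. Note that multiplying the given equations by a (Laurent) monomial changes neither the number of zeros in $(\mathbb{C}^*)^2$ nor the mixed area of their Newton polygons.

To solve the equations $g = h = 0$, we compute an invertible integer $2 \times 2$-matrix $U = (u_{ij}) \in SL_2(\mathbb{Z})$ such that
$$ \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix} \cdot \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} = \begin{pmatrix} r_1 & r_3 \\ 0 & r_2 \end{pmatrix}. $$
This is accomplished using the Hermite normal form algorithm of integer linear algebra. The invertible matrix $U$ triangularizes our system of equations:
$$ \begin{aligned}
g = h = 0 \;&\iff\; x^{a_1} y^{b_1} = c_1 \ \text{ and } \ x^{a_2} y^{b_2} = c_2 \\
&\iff\; (x^{a_1} y^{b_1})^{u_{11}} (x^{a_2} y^{b_2})^{u_{12}} = c_1^{u_{11}} c_2^{u_{12}} \ \text{ and } \ (x^{a_1} y^{b_1})^{u_{21}} (x^{a_2} y^{b_2})^{u_{22}} = c_1^{u_{21}} c_2^{u_{22}} \\
&\iff\; x^{r_1} y^{r_3} = c_1^{u_{11}} c_2^{u_{12}} \ \text{ and } \ y^{r_2} = c_1^{u_{21}} c_2^{u_{22}}.
\end{aligned} $$
This triangularized system has precisely $|r_1 r_2|$ distinct non-zero complex solutions. These can be expressed in terms of radicals in the coefficients $c_1$ and $c_2$. The number of solutions equals
$$ |r_1 r_2| = \left| \det \begin{pmatrix} r_1 & r_3 \\ 0 & r_2 \end{pmatrix} \right| = \left| \det \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \end{pmatrix} \right| = \mathrm{area}\big(\mathrm{New}(g) + \mathrm{New}(h)\big). $$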
This equals the mixed area $\mathcal{M}(\mathrm{New}(g), \mathrm{New}(h))$, since the two Newton polygons are just segments, so that $\mathrm{area}(\mathrm{New}(g)) = \mathrm{area}(\mathrm{New}(h)) = 0$. This proves Bernstein's Theorem for binomials. Moreover, it gives a simple algorithm for finding all zeros in this case.

The method described here clearly works also for $n$ binomial equations in $n$ variables, in which case we are to compute the Hermite normal form of an integer $n \times n$-matrix. We note that the Hermite normal form computation is similar but not identical to the computation of a lexicographic Gröbner basis. We illustrate this in maple for a system with $n = 3$ having 20 zeros:

> with(Groebner): with(linalg):
> gbasis([
>     x^3  * y^5  * z^7  - c1,
>     x^11 * y^13 * z^17 - c2,
>     x^19 * y^23 * z^29 - c3], plex(x,y,z));

   [-c2^13*c1^3 + c3^8*z^10, c2^15*c1^2*y^2 - c3^9*z^8, c2^6*c1^3*x - c3^4*z^7*y]

> ihermite( array([ [ 3,  5,  7 ],
>                   [ 11, 13, 17 ],
>                   [ 19, 23, 29 ] ]));

   [1  1   5]
   [0  2   2]
   [0  0  10]
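The Hermite normal form used above can be sketched in a few lines of Python. This is our own illustrative implementation, not Maple's algorithm. It uses naive Euclid-style row reduction, which is fine for small matrices but ignores the coefficient growth that production algorithms control. It assumes a square non-singular integer matrix and returns the row-style normal form, which should agree with the `ihermite` output above. The product of the diagonal entries is $|\det|$, the number of solutions of the binomial system.

```python
def hermite_normal_form(A):
    """Row-style HNF of a square non-singular integer matrix via unimodular row operations.

    Result: upper triangular, positive diagonal, entries above each pivot in [0, pivot).
    """
    H = [list(row) for row in A]
    n = len(H)
    for c in range(n):
        # Euclid on column c (rows c..n-1) until only row c is non-zero there.
        while any(H[i][c] != 0 for i in range(c + 1, n)):
            p = min((i for i in range(c, n) if H[i][c] != 0), key=lambda i: abs(H[i][c]))
            H[c], H[p] = H[p], H[c]
            for i in range(c + 1, n):
                q = H[i][c] // H[c][c]
                H[i] = [a - q * b for a, b in zip(H[i], H[c])]
        assert H[c][c] != 0, "matrix must be non-singular"
        if H[c][c] < 0:
            H[c] = [-a for a in H[c]]
        # Reduce the entries above the pivot into [0, pivot).
        for i in range(c):
            q = H[i][c] // H[c][c]
            H[i] = [a - q * b for a, b in zip(H[i], H[c])]
    return H


H = hermite_normal_form([[3, 5, 7], [11, 13, 17], [19, 23, 29]])
print(H)                               # [[1, 1, 5], [0, 2, 2], [0, 0, 10]]
print(H[0][0] * H[1][1] * H[2][2])     # 20 zeros
```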
3.3
Introducing a Toric Deformation
We introduce a new indeterminate $t$, and we multiply each monomial of $g$ and each monomial of $h$ by a power of $t$. What we want is the solutions to this system for $t = 1$, but what we will do instead is to analyze it for $t$ in a neighborhood of $0$. For instance, our system (22) gets replaced by
$$ \begin{aligned}
g_t(x, y) &= a_1 t^{\nu_1} + a_2 x t^{\nu_2} + a_3 xy t^{\nu_3} + a_4 y t^{\nu_4}, \\
h_t(x, y) &= b_1 t^{\omega_1} + b_2 x^2 y t^{\omega_2} + b_3 x y^2 t^{\omega_3}.
\end{aligned} $$
We require that the integers $\nu_i$ and $\omega_j$ be sufficiently generic in a sense to be made precise below. The system $g_t = h_t = 0$ can be interpreted as a bivariate system which depends on a parameter $t$. Its zeros $(x(t), y(t))$ depend on that parameter. They define the branches of an algebraic function $t \mapsto (x(t), y(t))$. Our goal is to identify the branches.

In a neighborhood of the origin in the complex plane, each branch of our algebraic function can be written as follows:
$$ \begin{aligned}
x(t) &= x_0 \cdot t^u + \text{higher order terms in } t, \\
y(t) &= y_0 \cdot t^v + \text{higher order terms in } t,
\end{aligned} $$
where $x_0, y_0$ are non-zero complex numbers and $u, v$ are rational numbers. To determine the exponents $u$ and $v$, we substitute $x = x(t)$ and $y = y(t)$ into the equations $g_t(x, y) = h_t(x, y) = 0$. In our example this gives
$$ \begin{aligned}
g_t\big(x(t), y(t)\big) &= a_1 t^{\nu_1} + a_2 x_0 t^{u + \nu_2} + a_3 x_0 y_0 t^{u + v + \nu_3} + a_4 y_0 t^{v + \nu_4} + \cdots, \\
h_t\big(x(t), y(t)\big) &= b_1 t^{\omega_1} + b_2 x_0^2 y_0 t^{2u + v + \omega_2} + b_3 x_0 y_0^2 t^{u + 2v + \omega_3} + \cdots.
\end{aligned} $$
In order for $(x(t), y(t))$ to be a root, the term of lowest order must vanish in each of these two equations. Since $x_0$ and $y_0$ are chosen to be non-zero, this is possible only if the lowest order in $t$ is attained by at least two different terms. This implies the following two piecewise-linear equations for the indeterminate vector $(u, v) \in \mathbb{Q}^2$:
$$ \begin{aligned}
&\min\{ \nu_1,\ u + \nu_2,\ u + v + \nu_3,\ v + \nu_4 \} \ \text{ is attained twice}, \\
&\min\{ \omega_1,\ 2u + v + \omega_2,\ u + 2v + \omega_3 \} \ \text{ is attained twice}.
\end{aligned} $$
As in Lecture 1, each of these translates into a disjunction of linear equations and inequalities. For instance, the second min-equation translates into
$$ \omega_1 = 2u + v + \omega_2 \le u + 2v + \omega_3 \quad \text{or} \quad \omega_1 = u + 2v + \omega_3 \le 2u + v + \omega_2 \quad \text{or} \quad 2u + v + \omega_2 = u + 2v + \omega_3 \le \omega_1. $$
It is now easy to state what we mean by the $\nu_i$ and $\omega_j$ being sufficiently generic. It means that the minimum is attained twice but not thrice. More precisely, at every solution $(u, v)$ of the two piecewise-linear equations, precisely two of the linear forms attain the minimum value in each of the two equations.

One issue in the algorithm for Bernstein's Theorem is to choose powers of $t$ that are small but yet generic. In our example, the choice $\nu_1 = \nu_2 = \nu_3 = \nu_4 = \omega_3 = 0$, $\omega_1 = \omega_2 = 1$ is generic. Here the two polynomials are
$$ g_t(x, y) = a_1 + a_2 x + a_3 xy + a_4 y, \qquad h_t(x, y) = b_1 t + b_2 x^2 y t + b_3 x y^2, $$
and the corresponding two piecewise-linear equations are
$$ \min\{ 0,\ u,\ u + v,\ v \} \quad \text{and} \quad \min\{ 1,\ 2u + v + 1,\ u + 2v \} \quad \text{are attained twice}. $$
This system has precisely three solutions:
$$ (u, v) \in \big\{ (1, 0),\ (0, 1/2),\ (-1, 0) \big\}. $$
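For small examples the two piecewise-linear equations can be solved by brute force. Pick one pair of terms from each min, solve the resulting $2 \times 2$ linear system for $(u, v)$, and keep the solution if those pairs really are the unique minimizers. The Python sketch below is our own. The term lists `G_LIFT` and `H_LIFT` encode the lifting $\nu_1 = \cdots = \nu_4 = \omega_3 = 0$, $\omega_1 = \omega_2 = 1$ from the text. For each solution, the sketch also reports the absolute determinant of the two edge vectors. This is the number of roots of the corresponding binomial system, and also the area of the corresponding mixed cell.

```python
from fractions import Fraction
from itertools import combinations

# Lifted terms (exponent of x, exponent of y, power of t).
G_LIFT = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # a1, a2 x, a3 xy, a4 y
H_LIFT = [(0, 0, 1), (2, 1, 1), (1, 2, 0)]             # b1 t, b2 x^2 y t, b3 x y^2


def value(term, u, v):
    return term[0] * u + term[1] * v + term[2]


def tropical_solutions(f_lift, g_lift):
    """All (u, v) where each min is attained exactly twice, with the binomial root count."""
    sols = []
    for p1, p2 in combinations(f_lift, 2):
        for q1, q2 in combinations(g_lift, 2):
            # Solve (p1 - p2) . (u, v, 1) = 0 and (q1 - q2) . (u, v, 1) = 0 by Cramer's rule.
            a, b, c = (p1[k] - p2[k] for k in range(3))
            d, e, f = (q1[k] - q2[k] for k in range(3))
            det = a * e - b * d
            if det == 0:
                continue
            u = Fraction(-c * e + b * f, det)
            v = Fraction(-a * f + c * d, det)
            fv = [value(t, u, v) for t in f_lift]
            gv = [value(t, u, v) for t in g_lift]
            if (value(p1, u, v) == min(fv) and fv.count(min(fv)) == 2
                    and value(q1, u, v) == min(gv) and gv.count(min(gv)) == 2):
                sols.append(((u, v), abs(det)))
    return sols


for (u, v), roots in tropical_solutions(G_LIFT, H_LIFT):
    print(f"(u, v) = ({u}, {v}): {roots} root(s)")
```

Summing the root counts over the three solutions gives $1 + 2 + 1 = 4$, the mixed area.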
For each of these pairs $(u, v)$, we now obtain a binomial system $\tilde g(x_0, y_0) = \tilde h(x_0, y_0) = 0$ which expresses the fact that the lowest terms in $g_t(x(t), y(t))$ and $h_t(x(t), y(t))$ do indeed vanish. The three binomial systems are
$$ \begin{aligned}
\tilde g(x_0, y_0) &= a_1 + a_4 y_0 & &\text{and} & \tilde h(x_0, y_0) &= b_1 + b_3 x_0 y_0^2 & &\text{for } (u, v) = (1, 0), \\
\tilde g(x_0, y_0) &= a_1 + a_2 x_0 & &\text{and} & \tilde h(x_0, y_0) &= b_1 + b_3 x_0 y_0^2 & &\text{for } (u, v) = (0, 1/2), \\
\tilde g(x_0, y_0) &= a_2 x_0 + a_3 x_0 y_0 & &\text{and} & \tilde h(x_0, y_0) &= b_2 x_0^2 y_0 + b_3 x_0 y_0^2 & &\text{for } (u, v) = (-1, 0).
\end{aligned} $$
These binomial systems have one, two and one root respectively. For instance, the unique Puiseux series solution for $(u, v) = (1, 0)$ has $x_0 = -a_4^2 b_1 / (a_1^2 b_3)$ and $y_0 = -a_1 / a_4$.
Hence our algebraic function has a total number of four branches. If one wishes more information about the four branches, one can now compute further terms in the Puiseux expansions of these branches. For instance,
$$ \begin{aligned}
x(t) &= -\frac{a_4^2 b_1}{a_1^2 b_3}\, t + \frac{2\, a_4^3 b_1^2 (a_1 a_3 - a_2 a_4)}{a_1^5 b_3^2}\, t^2 + \frac{a_4^4 b_1^2 \big( a_1^3 a_4 b_2 - 5 a_1^2 a_3^2 b_1 + 12 a_1 a_2 a_3 a_4 b_1 - 7 a_2^2 a_4^2 b_1 \big)}{a_1^8 b_3^3}\, t^3 + \cdots, \\
y(t) &= -\frac{a_1}{a_4} - \frac{b_1 (a_1 a_3 - a_2 a_4)}{a_1^2 b_3}\, t + \frac{a_4 b_1^2 (a_1 a_3 - a_2 a_4)(a_1 a_3 - 2 a_2 a_4)}{a_1^5 b_3^2}\, t^2 + \cdots.
\end{aligned} $$
For details on computing multivariate Puiseux series see (McDonald 1995).
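The Puiseux coefficients for the branch $(u, v) = (1, 0)$ can be checked by substitution. The Python sketch below is our own; the coefficient values used in the check are arbitrary rationals. It plugs the truncated series into $g_t$ and $h_t / t$ and confirms with exact arithmetic that the coefficients of $t^0$, $t^1$ and $t^2$ all vanish. It works with $X(t) = x(t)/t$, so that every series involved is an ordinary power series.

```python
from fractions import Fraction as F


def series_mul(p, q, n):
    """Product of truncated power series (coefficient lists), kept to order < n."""
    out = [F(0)] * n
    for i, pi in enumerate(p[:n]):
        for j, qj in enumerate(q[:n - i]):
            out[i + j] += pi * qj
    return out


def puiseux_branch(a1, a2, a3, a4, b1, b2, b3):
    """Closed forms for the branch (u, v) = (1, 0): X = x/t = [x1, x2, x3], y = [y0, y1, y2]."""
    D = a1 * a3 - a2 * a4
    X = [-a4**2 * b1 / (a1**2 * b3),
         2 * a4**3 * b1**2 * D / (a1**5 * b3**2),
         a4**4 * b1**2 * (a1**3 * a4 * b2 - 5 * a1**2 * a3**2 * b1
                          + 12 * a1 * a2 * a3 * a4 * b1 - 7 * a2**2 * a4**2 * b1)
         / (a1**8 * b3**3)]
    Y = [-a1 / a4,
         -b1 * D / (a1**2 * b3),
         a4 * b1**2 * D * (a1 * a3 - 2 * a2 * a4) / (a1**5 * b3**2)]
    return X, Y


def residuals(coeffs, n=3):
    """Coefficients of t^0..t^(n-1) in g_t(x(t), y(t)) and h_t(x(t), y(t)) / t."""
    a1, a2, a3, a4, b1, b2, b3 = coeffs
    X, Y = puiseux_branch(*coeffs)
    x = [F(0)] + X                                  # x(t) = t * X(t)
    xy = series_mul(x, Y, n)
    g = [a2 * x[k] + a3 * xy[k] + a4 * Y[k] for k in range(n)]
    g[0] += a1
    t2 = [F(0), F(0), F(1)]                         # the series t^2
    h = series_mul(t2, series_mul(series_mul(X, X, n), Y, n), n)
    XY2 = series_mul(X, series_mul(Y, Y, n), n)
    h = [b2 * h[k] + b3 * XY2[k] for k in range(n)]
    h[0] += b1
    return g, h


coeffs = tuple(F(c) for c in (2, 3, 5, 7, 11, 13, 17))
print(residuals(coeffs))   # both coefficient lists should consist of zeros
```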
3.4
Mixed Subdivisions of Newton Polytopes
We fix a generic toric deformation $g_t = h_t = 0$ of our equations. In this section we introduce a polyhedral technique for solving the associated piecewise-linear equations and, in order to prove Bernstein's Theorem, we show that the total number of branches equals the mixed area of the Newton polygons.

Let us now think of $g_t$ and $h_t$ as Laurent polynomials in three variables $(x, y, t)$ whose zero set is a curve in $(\mathbb{C}^*)^3$. The Newton polytopes of these trivariate polynomials are the following two polytopes in $\mathbb{R}^3$:
$$ \begin{aligned}
P &:= \mathrm{conv}\{ (0, 0, \nu_1), (1, 0, \nu_2), (1, 1, \nu_3), (0, 1, \nu_4) \} \quad \text{and} \\
Q &:= \mathrm{conv}\{ (0, 0, \omega_1), (2, 1, \omega_2), (1, 2, \omega_3) \}.
\end{aligned} $$
The Minkowski sum $P + Q$ is a polytope in $\mathbb{R}^3$. By a facet of $P + Q$ we mean a two-dimensional face. A facet $F$ of $P + Q$ is a lower facet if there is a vector $(u, v) \in \mathbb{R}^2$ such that $(u, v, 1)$ is an inward pointing normal vector to $P + Q$ at $F$. Our genericity conditions for the integers $\nu_i$ and $\omega_j$ are equivalent to:
(1) The Minkowski sum $P + Q$ is a 3-dimensional polytope.
(2) Every lower facet of $P + Q$ has the form $F_1 + F_2$ where either (a) $F_1$ is a vertex of $P$ and $F_2$ is a facet of $Q$, or (b) $F_1$ is an edge of $P$ and $F_2$ is an edge of $Q$, or (c) $F_1$ is a facet of $P$ and $F_2$ is a vertex of $Q$.

As an example, consider our lifting from before, $\nu_1 = \nu_2 = \nu_3 = \nu_4 = \omega_3 = 0$ and $\omega_1 = \omega_2 = 1$. It meets the requirements (1) and (2). The polytope $P$ is a quadrangle and $Q$ is a triangle, but they lie in non-parallel planes in $\mathbb{R}^3$. Their Minkowski sum $P + Q$ is a 3-dimensional polytope with 10 vertices:
Figure: The 3-dimensional polytope P + Q
The union of all lower facets of $P + Q$ is called the lower hull of the polytope $P + Q$. Algebraically speaking, the lower hull is the subset of all points in $P + Q$ at which some linear functional of the form $(x_1, x_2, x_3) \mapsto u x_1 + v x_2 + x_3$ attains its minimum. Geometrically speaking, the lower hull is that part of the boundary of $P + Q$ which is visible from below. Let $\pi : \mathbb{R}^3 \to \mathbb{R}^2$ denote the projection onto the first two coordinates. Then
$$ \pi(P) = \mathrm{New}(g), \qquad \pi(Q) = \mathrm{New}(h), \qquad \text{and} \qquad \pi(P + Q) = \mathrm{New}(g) + \mathrm{New}(h). $$
The map $\pi$ restricts to a bijection from the lower hull onto $\mathrm{New}(g) + \mathrm{New}(h)$. The set of polygons $\Delta := \{ \pi(F) : F \text{ lower facet of } P + Q \}$ defines a subdivision of $\mathrm{New}(g) + \mathrm{New}(h)$. A subdivision $\Delta$ constructed by this process, for some choice of $\nu_i$ and $\omega_j$, is called a mixed subdivision of the given Newton polygons. The polygons $\pi(F)$ are the cells of the mixed subdivision $\Delta$.

Every cell of a mixed subdivision $\Delta$ has the form $F_1 + F_2$ where either
(a) $F_1 = \{(u_i, v_i)\}$ where $x^{u_i} y^{v_i}$ appears in $g$ and $F_2$ is the projection of a facet of $Q$, or
(b) $F_1$ is the projection of an edge of $P$ and $F_2$ is the projection of an edge of $Q$, or
(c) $F_1$ is the projection of a facet of $P$ and $F_2 = \{(u_i, v_i)\}$ where $x^{u_i} y^{v_i}$ appears in $h$.
The cells of type (b) are called the mixed cells of $\Delta$.

Lemma 20. Let $\Delta$ be any mixed subdivision for $g$ and $h$. Then the sum of the areas of the mixed cells in $\Delta$ equals the mixed area $\mathcal{M}(\mathrm{New}(g), \mathrm{New}(h))$.

Proof. Let $\lambda$ and $\mu$ be arbitrary positive reals and consider the polytope $\lambda P + \mu Q$ in $\mathbb{R}^3$. Its projection into the plane $\mathbb{R}^2$ equals
$$ \pi(\lambda P + \mu Q) = \lambda \pi(P) + \mu \pi(Q) = \lambda \cdot \mathrm{New}(g) + \mu \cdot \mathrm{New}(h). $$
Let $A(\lambda, \mu)$ denote the area of this polygon. This polygon can be subdivided into cells $\lambda F_1 + \mu F_2$ where $F_1 + F_2$ runs over all cells of $\Delta$. Note that $\mathrm{area}(\lambda F_1 + \mu F_2)$ equals $\mu^2 \cdot \mathrm{area}(F_1 + F_2)$ if $F_1 + F_2$ is a cell of type (a), $\lambda \mu \cdot \mathrm{area}(F_1 + F_2)$ if it is a mixed cell, and $\lambda^2 \cdot \mathrm{area}(F_1 + F_2)$ if it has type (c). The sum of these areas equals $A(\lambda, \mu)$. Therefore
$$ A(\lambda, \mu) = A_{(c)} \cdot \lambda^2 + A_{(b)} \cdot \lambda \mu + A_{(a)} \cdot \mu^2, $$
where $A_{(a)}$, $A_{(b)}$ and $A_{(c)}$ denote the total areas of the cells of type (a), (b) and (c) in $\Delta$. In particular, $A_{(b)}$ is the sum of the areas of the mixed cells. We conclude
$$ A_{(b)} = A(1, 1) - A(1, 0) - A(0, 1) = \mathcal{M}(\mathrm{New}(g), \mathrm{New}(h)). $$

The following lemma makes the connection with the previous section.

Lemma 21. A pair $(u, v) \in \mathbb{Q}^2$ solves the piecewise-linear min-equations if and only if $(u, v, 1)$ is the normal vector to a mixed lower facet of $P + Q$.

This implies that the valid choices of $(u, v)$ are in bijection with the mixed cells in the mixed subdivision $\Delta$. Each mixed cell of $\Delta$ is expressed uniquely as the Minkowski sum of a Newton segment $\mathrm{New}(\tilde g)$ and a Newton segment $\mathrm{New}(\tilde h)$, where $\tilde g$ is a binomial consisting of two terms of $g$, and $\tilde h$ is a binomial consisting of two terms of $h$. Thus each mixed cell in $\Delta$ can be identified with a system of two binomial equations $\tilde g(x, y) = \tilde h(x, y) = 0$. In this situation we can rewrite our system as follows:
$$ \begin{aligned}
g_t\big(x(t), y(t)\big) &= \tilde g(x_0, y_0) \cdot t^a + \text{higher order terms in } t, \\
h_t\big(x(t), y(t)\big) &= \tilde h(x_0, y_0) \cdot t^b + \text{higher order terms in } t,
\end{aligned} $$
where $a$ and $b$ are suitable rational numbers. This implies the following lemma.

Lemma 22. Let $(u, v)$ be as in Lemma 21. The corresponding choices of $(x_0, y_0) \in (\mathbb{C}^*)^2$ are the solutions of the binomial system $\tilde g(x_0, y_0) = \tilde h(x_0, y_0) = 0$.

We are now prepared to complete the proof of Bernstein's Theorem. This is done by showing that the equations $g_t(x, y) = h_t(x, y) = 0$ have $\mathcal{M}(\mathrm{New}(g), \mathrm{New}(h))$ many distinct isolated solutions in $(K^*)^2$, where $K = \mathbb{C}\{\{t\}\}$ is the algebraically closed field of Puiseux series.

By Section 3.2, the number of roots $(x_0, y_0) \in (\mathbb{C}^*)^2$ of the binomial system in Lemma 22 coincides with the area of the mixed cell $\mathrm{New}(\tilde g) + \mathrm{New}(\tilde h)$. Each of these roots provides the leading coefficients in a Puiseux series solution $(x(t), y(t))$ to our equations. Conversely, by Lemma 21 every series solution arises from some mixed cell of $\Delta$. We conclude that the number of series solutions equals the sum of these areas over all mixed cells in $\Delta$. By Lemma 20, this quantity coincides with the mixed area $\mathcal{M}(\mathrm{New}(g), \mathrm{New}(h))$. General facts from algebraic geometry guarantee that the same number of roots is attained for almost all choices of coefficients, and that we can descend from the field $K$ to the complex numbers under the substitution $t = 1$.

Our proof of Bernstein's Theorem gives rise to a numerical algorithm for finding all roots of a sparse system of polynomial equations. This algorithm belongs to the general class of numerical continuation methods, which are sometimes also called homotopy methods. Standard references include (Allgower & Georg, 1990) and (Li 1997). For some fascinating recent progress see (Sommese, Verschelde and Wampler 2001). The idea of our homotopy is to trace each of the branches of the algebraic curve $(x(t), y(t))$ between $t = 0$ and $t = 1$. We have shown that the number of branches equals the mixed area. Our constructions give sufficient information about the Puiseux series so that we can approximate $(x(t), y(t))$ for any $t$ in a small neighborhood of zero. Using numerical continuation, it is now possible to approximate $(x(1), y(1))$.
3.5
Khovanskiis Theorem on Fewnomials
Polynomial equations arise in many mathematical models in science and engineering. In such applications one is typically interested in solutions over the real numbers instead of the complex numbers . This study of real roots of polynomial systems is considerably more dicult than the study of complex roots. Even the most basic questions remain unanswered to-date. Let us start out with a very concrete such question: Question 23. What is the maximum number of isolated real roots of any system of two polynomial equations in two variables each having four terms? The polynomial equations considered here look like f (x, y) g(x, y) = a1 xu1 y v1 + a2 xu2 y v2 + a3 xu3 y v3 + a4 xu4 y v4 , = b1 xu1 y v1 + b2 xu2 y v2 + b3 xu3 y v3 + b4 xu4 y v4 .
where ai , bj are arbitrary real numbers and ui , vj , ui , vj are arbitrary integers. To stay consistent with our earlier discussion, we shall count only solutions (x, y) in ( )2 , that is, we require that both x and y are non-zero reals. There is an obvious lower bound for the number Question 23: thirty-six. It is easy to write down a system of the above form that has 36 real roots: f (x) = (x2 1)(x2 2)(x2 3) and g(y) = (y 2 1)(y 2 2)(y 2 3). Each of the polynomials f and g depends on one variable only, and it has 6 non-zero real roots in that variable. Therefore the system f (x) = g(y) = 0 has 36 distinct isolated roots in ( )2 . Note also that the expansions of f and g have exactly four terms each, as required. A priori it is not clear whether Question 23 even makes sense: why should such a maximum exist ? It certainly does not exist if we consider complex zeros, because one can get arbitrarily many complex zeros by increasing the degrees of the equations. The point is that such an unbounded increase of roots is impossible over the real numbers. This was proved by Khovanskii (1980). He found a bound on the number of real roots which does not depend on the degrees of the given equations. We state the version for positive roots. Theorem 24. (Khovanskiis Theorem) Consider n polynomials in n variables involving m distinct monomials in total. The number of isolated roots m in the positive orthant ( + )n of any such system is at most 2( 2 ) (n + 1)m . 43
The basic idea behind the proof of Khovanskiis Theorem is to establish the following more general result. We consider systems of n equations which can be expressed as polynomial functions in at most m monomials in x = (x1 , . . . , xn ). If we abbreviate the i-th such monomial by xai := xai1 xai2 xain , then we can write our n polynomials as 1 2 n Fi xa1 , xa2 , . . . , xam = 0 (i = 1, 2, . . . , n)
We claim that the number of real zeros in the positive orthant is at most
n d
2( 2 ) 1 +
i=1
deg(Fi)
i=1
deg(Fi ).
Theorem 2.3 concerns the case where deg(Fi ) = 1 for all i. We proceed by induction on m n. If m = n then (2.3) is expressed in n monomials in n unknowns. By a multiplicative change of variables xi
u u u z1 i1 z2 i2 znin
we can transform our d monomials into the n coordinate functions z1 , . . . , zn . (Here the uij can be rational numbers, since all roots under consideration are positive reals.) Our assertion follows from Bzouts Theorem, which states e that the number of isolated complex roots is at most the product of the degrees of the equations. Now suppose m > n. We introduce a new variable t, and we multiply one of the given monomials by t. For instance, we may do this to the rst monomial and set Gi (t, x1 , . . . , xn ) := Fi xa1 t , xa2 , . . . , xam (i = 1, 2, . . . , n)
This is a system of equations in x depending on the parameter t. We study the behavior of its positive real roots as t moves from 0 to 1. At t = 0 we have a system involving one monomial less, so the induction hypothesis provides a bound on the number of roots. Along our trail from 0 to 1 we encounter some bifurcation points at which two new roots are born. Hence the number of roots at t = 1 is at most twice the number of bifurcation points plus the number of roots of t = 0. Each bifurcation point corresponds to a root (x, t) of the augmented system (2.4) J(t, x) = G1 (t, x) = = Gn (t, x) = 0, 44
where J(t, x) denotes the toric Jacobian: J(t, x1 , . . . , xm ) = det xi Gj (t, x) xj .

1i,jm
Now, the punch line is that each of the n + 1 equations in (2.4) including the Jacobian can be expressed in terms of only m monomials xa1 t, xa2 , , xam . Therefore we can bound the number of bifurcation points by the induction hypothesis, and we are done. This was only to give the avor of how Theorem 2.3 is proved. There are combinatorial and topological ne points which need most careful attention. The reader will nd the complete proof in (Khovanskii 1980), in (Khovanskii 1991) or in (Benedetti & Risler 1990). Khovanskiis Theorem implies an upper bound for the root count suggested in Question 23. After multiplying one of the given equations by a suitable monomial, we may assume that our system has seven distinct monomials. Substituting n = 2 and m = 7 into Khovanskiis formula, we see that 7 there are at most 2(2) (2 + 1)7 = 4, 586, 471, 424 roots in the positive quadrant. By summing over all four quadrants, we conclude that the maximum 7 in Question 23 lies between 36 and 18, 345, 885, 696 = 22 2(2) (2 + 1)7 . The gap between 36 and 18, 345, 885, 696 is frustratingly large. Experts agree that the truth should be closer to the lower bound than to the upper bound, but at the moment nobody knows the exact value. Could it be 36 ? The original motivation for Khovanskiis work was the following conjecture from the 1970s due to Kouchnirenko. Consider any system of n polynomial equations in n unknown, where the i-th equation has at most mi terms. The number of isolated real roots in (+ )n of such a system is at most (m1 1)(m2 1) (md 1). This number is attained by equations in distinct variables, as was demonstrated by our example with d = 2, m1 = m2 = 4 which has (m1 1)(m2 1) = 16 real zeros. Remarkably, Kouchnirenkos conjecture remained open for many years after Khovanskii had developed his theory of fewnomials which includes the above theorem. 
Only recently, Bertrand Haas (2002) found the following counterexample to Kouchnirenkos conjecture in the case d = 2, m1 = m2 = 4. Proving the following proposition from scratch is a nice challenge. Proposition 25. (Haas) The two equations x108 + 1.1y 54 1.1y = y 108 + 1.1x54 1.1x 45 = 0
have ve distinct strictly positive solutions (x, y) ( + )2 . It was proved by Li, Rojas and Wang (2001) that the lower bound provided by Haas example coincides with the upper bound for two trinomials. Theorem 26. (Li, Rojas and Wang) A system of two trinomials f (x, y) g(x, y) = = a1 xu1 y v1 + a2 xu2 y v2 + a3 xu3 y v3 , b1 xu1 y v1 + b2 xu2 y v2 + b3 xu3 y v3 ,
with ai , bj and ui , vj , ui, vj has at most ve positive real zeros. The exponents in this theorem are allowed to be real numbers not just integers. Li, Rohas and Wang (2001) proved a more general result for a two equations in x and y where the rst equation and the second equation has m terms. The number of positive real roots of such a system is at most 2m 2. Let us end this section with a light-hearted reference to (Lagarias & Richardson 1997). That paper analyzes a particular sparse system in two variables, and the author of these lecture notes lost $ 500 along the way.
3.6
Exercises
a1 x2 + a2 xy + a3 y 2 + a4 x + a5 y + a6 = 0 b1 x3 +b2 x2 y+b3 xy 2 +b4 y 3 +b5 x2 +b6 xy+b7 y 2 +b8 x+b9 y+b10 = 0 Compute an explicit polynomial in the unknowns ai , bj such that equations have six distinct solutions whenever your polynomial is non-zero.
(1) Consider the intersection of a general conic and a general cubic curve
(2) Draw the Newton polytope of the following polynomial f (x1 , x2 , x3 , x4 ) = (x1 x2 )(x1 x3 )(x1 x4 )(x2 x3 )(x2 x4 )(x3 x4 ). (3) For general i , j , how many vectors (x, y) ( 1 x3 y + 2 xy 3 = 3 x + 4 y
2
) satisfy
and 1 x2 y 2 + 2 xy = 3 x2 + 4 y 2 ?
Can your bound be attained with all real vectors (x, y) ( )2 ?
46
(4) Find the rst three terms in each of the four Puiseux series solutions (x(t), y(t)) of the two equations t2 x2 + t5 xy + t11 y 2 + t17 x + t23 y + t31 t3 x2 + t7 xy + t13 y 2 + t19 x + t29 y + t37 = = 0 0
(5) State and prove Bernsteins Theorem for n equations in n variables. (6) Bernsteins Theorem can be used in reverse, namely, we can calculate the mixed volume of n polytopes by counting the number of zeros in ( )n of a sparse system of polynomial equations. Pick your favorite three distinct three-dimensional lattice polytopes in 3 and compute their mixed volume with this method using Macaulay 2. (7) Show that Kouchnirenkos Conjecture is true for d = 2 and m1 = 2. (8) Prove Proposition 25. Please use any computer program of your choice. (9) Can Haas example be modied to show that the answer to Question 23 is strictly larger than 36?
Resultants
Elimination theory deals with the problem of eliminating one or more variables from a system of polynomial equations, thus reducing the given problem to a smaller problem in fewer variables. For instance, if we wish to solve a0 + a1 x + a2 x2 = b0 + b1 x + b2 x2 = 0,
with a2 = 0 and b2 = 0 then we can eliminate the variable x to get a2 b2 a0 a1 b1 b2 2a0 a2 b0 b2 + a0 a2 b2 + a2 b0 b2 a1 a2 b0 b1 + a2 b2 = 0. (24) 0 2 1 1 2 0 This polynomial of degree 4 is the resultant. It vanishes if and only if the given quadratic polynomials have a common complex root x. The resultant
47
(24) has the following three determinantal representations: a0 0 b0 0 a1 a0 b1 b0 a2 0 a1 a2 b2 0 b1 b2 a0 a1 a2 b1 b2 b0 [01] [02] 0 [01] [02] [02] [12]
(25)
where [ij] = ai bj aj bi . Our aim in this section is to discuss such formulas. The computation of resultants is an important tool for solving polynomial systems. It is particularly well suited for eliminating all but one variable from a system of n polynomials in n unknowns which has nitely many solutions.
4.1
The Univariate Resultant

f g = a0 + a1 x + a2 x2 + + ad1 xd1 + ad xd , = b0 + b1 x + b2 x2 + + be1 xe1 + be xe .
Consider two general polynomials of degrees d and e in one variable:
Theorem 27. There exists a unique (up to sign) irreducible polynomial Res in [a0, a1 , . . . , ad , b0 , b1 , . . . , bd ] which vanishes whenever the polynomials f (x) and g(x) have a common zero. Here and throughout this section common zeros may lie in any algebraically closed eld (say, ) which contains the eld to which we specialize the coecients ai and bj of the given polynomials (say, ). Note that a polynomial with integer coecients being irreducible implies that the coecients are relatively prime. The resultant Res = Resx (f, g) can be expressed as the determinant of the Sylvester matrix a0 a1 a0 . . . Resx (f, g) = ad ad .. . ad 48 a1 . . . .. .. . . a0 a1 . . . be be .. . be . . . b0 b1 b0 b1 . . . .. .. . . b0 b1 . . . (26)
where the blank spaces are lled with zeroes. See the left formula in (24). There are many other useful formulas for the resultant. For instance, suppose that the roots of f are 1 , . . . , d and the roots of g are 1 , . . . , e . Then we have the following product formulas:
d e d e
Resx (f, g) =
ae bd d e
i=1 j=1
(i j ) =
ae d
i=1
g(i) =
(1)de bd e
j=1
f (j ).
From this we conclude the following proposition. Proposition 28. If Cf and Cg are the companion matrices of f and g then Resx (f, g) = ae det g(Cf ) 0 = (1)de bd det f (Cg ) . 0
If f and g are polynomials of the same degree d = e, then the following method for computing the resultant is often used in practice. Compute the following polynomial in two variables, which is called the Bzoutian: e B(x, y) = f (x)g(y) f (y)g(x) xy
d1
=
i,j=0
cij xi y j .
Form the symmetric d d-matrix C = (cij ). Its entries cij are sums of brackets [kl] = ak bl al bk . The case d = 2 appears in (24) on the right. Theorem 29. (Bzout resultant) The determinant of C equals Resx (f, g). e Proof. The resultant Resx (f, g) is an irreducible polynomial of degree 2d in a0 , . . . , ad , b0 , . . . , bd . The determinant of C is also a polynomial of degree 2d. We will show that the zero set of Resx (f, g) is contained in the zero set of det(C). This implies that the two polynomials are equal up to a constant. Looking at leading terms one nds the constant to be either 1 or 1. If (a0 , . . . , ad , b0 , . . . , bd ) is in the zero set of Resx (f, g) then the system f = g = 0 has a complex solution x0 . Then B(x0 , y) is identically zero as a polynomial in y. This implies that the non-zero complex vector (1, x0 , x2 , . . . , xm1 ) lies in the kernel of C, and therefore det(C) = 0. 0 0 The 3 3-determinants in the middle of (24) shows that one can also use mixtures of Bzout matrices and Sylvester matrices. Such hybrid formulas for e resultants are very important in higher-dimensional problems as we shall see below. Let us rst show three simple applications of the univariate resultant. 49
Example. (Intersecting two algebraic curves in the real plane) Consider two polynomials in two variables, say, f = x4 + y 4 1 and g = x5 y 2 4x3 y 3 + x2 y 5 1. We wish to compute the intersection of the curves {f = 0} and {g = 0} in the real plane 2 , that is, all points (x, y) 2 with f (x, y) = g(x, y) = 0. To this end we evaluate the resultant with respect to one of the variables, Resx (f, g) = 2y 28 16y 27 + 32y 26 + 249y 24 + 48y 23 128y 22 + 4y 21 757y 20 112y 19 + 192y 18 12y 17 + 758y 16 + 144y 15 126y 14 +28y 13 251y 12 64y 11 + 30y 10 36y 9 y 8 + 16y 5 + 1.
This is an irreducible polynomial in [y]. It has precisely four real roots y = 0.9242097, y = 0.5974290, y = 0.7211134, y = 0.9665063.
Hence the two curves have four intersection points, with these y-coordinates. By the symmetry in f and g, the same values are also the possible xcoordinates. By trying out (numerically) all 16 conceivable x-y-combinations, we nd that the following four pairs are the real solutions to our equations: (x, y) = (0.9242, 0.7211), (x, y) = (0.7211, 0.9242), (x, y) = (0.5974, 0.9665), (x, y) = (0.9665, 0.5974). Example. (Implicitization of a rational curve in the plane) Consider a plane curve which is given to us parametrically: C = a(t) c(t) , b(t) d(t) 2 : t ,
where a(t), b(t), c(t), d(t) are polynomials in [t]. The goal is to nd the unique irreducible polynomial f [x, y] which vanishes on C. We may nd f by the general Grbner basis approach explained in (Cox, Little & o OShea). It is more ecient, however, to use the following formula: f (x, y) = Rest b(t) x a(t), d(t) y c(t) .
Here is an explicit example in maple of a rational curve of degree six: 50
> a := t^3 - 1: b := t^2 - 5: > c := t^4 - 3: d := t^3 - 7: > f := resultant(b*x-a,d*y-c,t); 2 2 2 f := 26 - 16 x - 162 y + 18 x y + 36 x - 704 x y + 324 y 2 + 378 x y 2 2 3 + 870 x y - 226 x y
3 4 3 2 4 3 + 440 x - 484 x + 758 x y - 308 x y - 540 x y 2 3 3 3 4 2 3 - 450 x y - 76 x y + 76 x y - 216 y Example. (Computation with algebraic numbers) Let and be algebraic numbers over . They are represented by their minimal polynomials f, g [x]. These are the unique (up to scaling) irreducible polynomials satisfying f () = 0 and g() = 0. Our problem is to nd the minimal polynomials p and q for their sum + and their product respectively. The answer is given by the following two formulas p(z) = Resx f (x), g(z x) and q(z) = Resx f (x), g(z/x) xdeg(g) .
It is easy to check the identities p(+) = 0 and q() = 0. It can happen, for special f and g, that the output polynomials p or q are not irreducible. In that event an appropriate factor of p or q will do the trick. As an example consider two algebraic numbers given in terms of radicals: = 5 2, =
3
7/2 1/18 3981 +
7/2 + 1/18 3981.
Their minimal polynomials are 5 2 and 3 + +7 respectively. Using the above formulas, we nd that the minimal polynomial for their sum + is p(z) = z 15 + 5 z 13 + 35 z 12 + 10 z 11 + 134 z 10 + 500 z 9 + 240 z 8 + 2735 z 7 +3530z 6 + 1273z 5 6355z 4 + 12695z 3 + 1320z 2 + 22405z + 16167, and the minimal polynomial for their product equals q(z) = z 15 70 z 10 + 984 z 5 + 134456. 51
4.2
The Classical Multivariate Resultant
Consider a system of n homogeneous polynomials in n indeterminates f1 (x1 , . . . , xn ) = = fn (x1 , . . . , xn ) = 0. (27)
We assume that the i-th equation is homogeneous of degree di > 0, that is, fi = cj1 ,...,jn xj1 xjn , 1 n
j1 ++jn =di (i)
where the sum is over all n+dii 1 monomials of degree di in x1 , . . . , xn . Note d that the zero vector (0, 0, . . . , 0) is always a solution of (27). Our question is to determine under which condition there is a non-zero solution. As a simple example we consider the case of linear equations (n = 3, d1 = d2 = d3 = 1): f1 f2 f3 = = = c1 x1 + c1 x2 + c1 x3 100 010 001 2 2 c100 x1 + c010 x2 + c2 x3 001 3 3 3 c100 x1 + c010 x2 + c001 x3 = = = 0 0 0.
This system has a non-zero solution if and only if the determinant is zero: 1 1 c100 c1 010 c001 2 2 = 0. det c2 100 c010 c001 3 3 3 c100 c010 c001 Returning to the general case, we regard each coecient cj1 ,...,jn of each polynomial fi as an unknown, and we write [c] for the ring of polynomials with integer coecients in these variables. The total number of variables in [c] equals N = n n+dii 1 . For instance, the 3 3-determinant in the i=1 d example above may be regarded as a cubic polynomial in [c]. The following theorem characterizes the classical multivariate resultant Res = Resd1 dn . Theorem 30. Fix positive degrees d1 , . . . , dn . There exists a unique (up to sign) irreducible polynomial Res [c] which has the following properties: (a) Res vanishes under specializing the c j1 ...,jn to rational numbers if and only if the corresponding equations (27) have a non-zero solution in n . (b) Res is irreducible, even when regarded as a polynomial in 52 [c].
(i) (i)
(c) Res is homogeneous of degree d 1 di1 di+1 dn in the coecients (i) (ca : |a| = di ) of the polynomial fi , for each xed i {1, . . . , n}. We sketch a proof of Theorem 30. It uses results from algebraic geometry. Proof. The elements of [u] are polynomial functions on the ane space N . We regard x = (x1 , . . . , xn ) as homogeneous coordinates for the complex projective space P n1 . Thus (u, x) are the coordinates on the product variety N P n1 . Let I denote the subvariety of N P n1 dened by the equations cj1 ,...,jn xj1 xjn 1 n
j1 ++jn =di (i)
for i = 1, 2, . . . , n.
Note that I is dened over . Consider the projection : N P n1 P n1 , (u, x) x. Then (I) = P n1 . The preimage 1 (x) of any point x P n1 can be identied with the set { u N : (u, x) I }. This is a linear subspace of codimension n in N . To this situation we apply (Shafarevich 1994, I.6.3, Theorem 8) to conclude that the variety I is closed and irreducible of codimension n in N P n1 . Hence dim(I) = N 1. Consider the projection : N P n1 N , (u, x) u. It follows from the Main Theorem of Elimination Theory, (Eisenbud 1994, Theorem 14.1) that (I) is an irreducible subvariety of N which is dened over as well. Every point c in N can be identied with a particular polynomial system f1 = = fn = 0. That system has a nonzero root if and only if c lies in the subvariety (I). For every such c we have dim((I)) dim(I) = N 1 dim(1 (c)) + dim((I))
The two inequalities follow respectively from parts (2) and (1) of Theorem 7 in Section I.6.3 of (Shafarevich 1977). We now choose c = (f1 , . . . , fn ) as follows. Let f1 , . . . , fn1 be any equations as in (27) which have only nitely many zeros in P n1 . Then choose fn which vanishes at exactly one of these zeros, say y P n1 . Hence 1 (c) = {(c, y)}, a zero-dimensional variety. For this particular choice of c both inequalities hold with equality. This implies dim((I)) = N 1. We have shown that the image of I under is an irreducible hypersurface in N , which is dened over . Hence there exists an irreducible polynomial Res [c], unique up to sign, whose zero set equals (I). By construction, this polynomial Res(u) satises properties (a) and (b) of Theorem 30. Part (c) of the theorem is derived from Bzouts Theorem. e 53
Various determinantal formulas are known for the multivariate resultant. The most useful formulas are mixtures of Bzout matrices and Sylvester mae trices like the expression in the middle of (25). Exact division-free formulas of this kind are available for n 4. We discuss such formulas for n = 3. The rst non-trivial case is d1 = d2 = d3 = 2. Here the problem is to eliminate two variables x and y from a system of three quadratic forms F G H = a0 x2 + a1 xy + a2 y 2 + a3 xz + a4 yz + a5 z 2 , = b0 x2 + b1 xy + b2 y 2 + b3 xz + b4 yz + b5 z 2 , = c0 x2 + c1 xy + c2 y 2 + c3 xz + c4 yz + c5 z 2 .
To do this, we rst compute their Jacobian determinant F/x F/y F/z J := det G/x G/y G/z . H/x H/y H/z We next compute the partial derivatives of J. They are quadratic as well: J/x J/y J/z = u0 x2 + u1 xy + u2 y 2 + u3xz + u4 yz + u5 z 2 , = v0 x2 + v1 xy + v2 y 2 + v3 xz + v4 yz + v5 z 2 , = w0 x2 + w1 xy + w2 y 2 + w3 xz + w4 yz + w5 z 2 .
Each coecient ui , vj or wk is a polynomial of degree 3 in the original coecients ai , bj , ck . The resultant of F, G and H coincides with the following 6 6-determinant: a0 b0 c0 u0 v0 w0 a1 b1 c1 u1 v1 w1 a2 b2 c2 u2 v2 w2 (28) Res2,2,2 = det a3 b3 c3 u3 v3 w3 a4 b4 c4 u4 v4 w4 a5 b5 c5 u5 v5 w5 This is a homogeneous polynomial of degree 12 in the 18 unknowns a0 , a1 , . . . , a5 , b0 , b1 , . . . , b5 , c0 , c1 , . . . , c5 . The full expansion of Res has 21, 894 terms. In a typical application of Res2,2,2 , the coecients ai , bj , ck will themselves be polynomials in another variable t. Then the resultant is a polynomial in t which represents the projection of the desired solutions onto the t-axis. 54
Consider now the more general case of three ternary forms f, g, h of the same degree d = d1 = d2 = d3 . The following determinantal formula for their resultant was known to Sylvester. We know from part (c) of Theorem 30 that Resd,d,d is a homogeneous polynomial of degree 3d2 in 3 d+2 unknowns. We 2 shall express Resd,d,d as the determinant of a square matrix of size 2d 2 = d 2 + d 2 + d 2 + d+1 . 2
We write Se = [x, y, z]e for the e+2 -dimensional vector space of ternary 2 forms of degree e. Our matrix represents a linear map of the following form Sd2 Sd2 Sd2 Sd1 S2d2 ( a, b, c, u ) a f + b g + c h + (u), where is a linear map from Sd1 to S2d2 to be described next. We shall dene by specifying its image on any monomial xi y j z k with i+j+k = d1. For any such monomial, we choose arbitrary representations f g h = xi+1 Px + y j+1 Py + z k+1 Pz = xi+1 Qx + y j+1 Qy + z k+1 Qz = xi+1 Rx + y j+1 Ry + z k+1 Rz ,
where Px , Qx , Rx are homogeneous of degree d i 1, Py , Qy , Ry are homogeneous of degree d j 1, and Pz , Qz , Rz are homogeneous of degree d k 1. Then we dene P x Py Pz = det Qx Qy Qz . xi y j z k Rx Ry Rz Note that this determinant is indeed a ternary form of degree (d i 1) + (d j 1) + (d k 1) = 3d 3 (i + j + k) = 2d 2.
4.3
The Sparse Resultant
Most systems of polynomial equations encountered in real world applications are sparse in the sense that only few monomials appear with non-zero coecient. The classical multivariate resultant is not well suited to this situation. As an example consider the following system of three quadratic equations: f = a0 x + a1 y + a2 xy, g = b0 + b1 xy + b2 y 2 , 55 h = c0 + c1 xy + c2 x2 .
If we substitute the coecients of f, g and h into the resultant Res2,2,2 in (28) then the resulting expression vanishes identically. This is consistent with Theorem 30 because the corresponding homogeneous equations F = a0 xz +a1 yz + a2 xy, G = b0 z 2 +b1 xy +b2 y 2 , H = c0 z 2 +c1 xy + c2 y 2
always have the common root (1 : 0 : 0), regardless of what the coecients ai , bj , ck are. In other words, the three given quadrics always intersect in the projective plane. But they generally do not intersect in the ane plane 2 . In order for this to happen, the following polynomial in the coecients must vanish: a2 b2 b2 c2 c1 2a2 b2 b1 b0 c0 c2 + a2 b2 b2 c3 a2 b3 c2 c2 + 2a2 b2 b0 c0 c1 c2 1 1 0 1 1 1 0 1 1 1 0 1 1 a2 b1 b2 c2 c2 2a1 a0 b2 b1 c2 c1 + 2a1 a0 b2 b0 c0 c2 + 2a1 a0 b2 b2 c2 c2 1 0 1 2 0 2 1 1 0 2 2 2 2 2 2 2 3 2 2a1 a0 b2 b0 c1 c2 2a1 a0 b1 b0 c0 c2 + 2a1 a0 b1 b0 c1 c2 + a0 b2 c0 c1 a2 b2 b1 c2 c2 0 2 0 2 2 2 2 2 2 2 2 2 3 2 2 3 2a0 b2 b0 c0 c1 c2 + 2a0 b2 b1 b0 c0 c2 + a0 b2 b0 c1 c2 a0 b1 b0 c2 a2 b2 b1 c0 +a2 b2 b0 c2 c1 + 2a2 b2 b1 b0 c2 c2 2a2 b2 b2 c0 c1 c2 a2 b1 b2 c0 c2 + a2 b3 c1 c2 . 2 2 0 2 0 2 0 2 0 2 2 0 2 The expression is the sparse resultant of f, g and h. This resultant is customtailored to the specic monomials appearing in the given input equations. In this section we introduce the set-up of sparse elimination theory. In particular, we present the precise denition of the sparse resultant. Let A0 , A1, . . . , An be nite subsets of n. Set mi := #(Ai ). Consider a system of n + 1 Laurent polynomials in n variables x = (x1 , . . . , xn ) of the form fi (x) =
aAi
cia xa
(i = 0, 1, . . . , n).
Here xa = xa1 xan for a = (a1 , . . . , an ) n. We say that Ai is the 1 n support of the polynomial fi (x). In the example above, n = 2, m1 = m2 = m3 = 3, A0 = { (1, 0), (0, 1), (1, 1) } and A1 = A2 = { (0, 0), (1, 1), (0, 2)}. For any subset J {0, . . . , n} consider the ane lattice spanned by jJ Aj , LJ :=
jJ
j a(j) | a(j) Aj , j
for all j J and

jJ
j = 1 .
We may assume that L{0,1,...,n} = n. Let rank(J) denote the rank of the lattice LJ . A subcollection of supports {Ai }iI is said to be essential if rank(I) = #(I) 1 and rank(J) #(J) for each proper subset J of I. 56
The vector of all coecients cia appearing in f0 , f1 , . . . , fn represents a point in the product of complex projective spaces P m0 1 P mn 1 . Let Z denote the subset of those systems (4.3) which have a solution x in ( )n , where := \{0}. Let Z be the closure of Z in P m0 1 P mn 1 . Lemma 31. The projective variety Z is irreducible and dened over . It is possible that Z is not a hypersurface but has codimension 2. This is where the condition that the supports be essential comes in. It is known that the codimension of Z in P m0 1 P mn 1 equals the maximum of the numbers #(I) rank(I), where I runs over all subsets of {0, 1, . . . , n}. We now dene the sparse resultant Res. If codim(Z) = 1 then Res is the unique (up to sign) irreducible polynomial in [. . . , cia , . . .] which vanishes on the hypersurface Z. If codim(Z) 2 then Res is dened to be the constant 1. We have the following result, Theorem 32, which is a generalization of Theorem 30 in the same way that Bernsteins Theorem generalizes Bzouts e Theorem. Theorem 32. Suppose that {A0 , A1, . . . , An } is essential, and let Qi denote the convex hull of Ai . For all i {0, . . . , n} the degree of Res in the ith group of variables {cia , a Ai } is a positive integer, equal to the mixed volume M(Q0 , . . . , Qi1 , Qi+1 . . . , Qn ) = (1)#(J) vol
jJ
Qj .
J{0,...,i1,i+1...,n}
We refer to (Gelfand, Kapranov & Zelevinsky 1994) and (Pedersen & Sturmfels 1993) for proofs and details. The latter paper contains the following combinatorial criterion for the existence of a non-trivial sparse resultant. Note that, if each Ai is n-dimensional, then I = {0, 1, . . . , n} is essential. Corollary 33. The variety Z has codimension 1 if and only if there exists a unique subset {Ai}iI which is essential. In this case the sparse resultant Res coincides with the sparse resultant of the equations {fi : i I}. Here is a small example. For the linear system c00 x + c01 y = c10 x + c11 y = c20 x + c21 y + c22 = 0.
the variety Z has codimension 1 in the coecient space P 1 P 1 P 2 . The unique essential subset consists of the rst two equations. Hence the sparse 57
resultant of this system is not the 3 3-determinant (which would be reducible). The sparse resultant is the 2 2-determinant Res = c00 c11 c10 c01 . We illustrate Theorem 32 for our little system {f, g, h}. Clearly, the triple of support sets {A1 , A2, A3 } is essential, since all three Newton polygons Qi = conv(Ai ) are triangles. The mixed volume of two polygons equals M(Qi , Qj ) = area(Qi + Qj ) area(Qi ) area(Qj ).
In our example the triangles Q2 and Q3 coincide, and we have area(Q1 ) = 1/2, area(Q2 ) = 1, area(Q1 + Q2 ) = 9/2, area(Q2 + Q3 ) = 4. This implies M(Q1 , Q2 ) = M(Q1 , Q3 ) = 3 and M(Q2 , Q3 ) = 2. This explains why the sparse resultant above is quadratic in (a0 , a1 , a2 ) and homogeneous of degree 3 in (b0 , b1 , b2 ) and in (c0 , c1 , c2 ) respectively. One of the central problems in elimination theory is to nd nice determinantal formulas for resultants. The best one can hope for is a Sylvester-type formula, that is, a square matrix whose non-zero entries are the coecients of the given equation and whose determinant equals precisely the resultant. The archetypical example of such a formula is (26). Sylvester-type formulas do not exist in general, even for the classical multivariate resultant. If a Sylvester-type formula is not available or too hard to nd, the next best thing is to construct a reasonably small square matrix whose determinant is a non-zero multiple of the resultant under consideration. For the sparse resultant such a construction was given in (Canny and Emiris 1995)
58
and (Sturmfels 1994). A Canny-Emiris matrix for our example is y2 yf a1 y2f 0 xy 2 f 0 y 2 g b0 xy 2 g 0 yg 0 xyg 0 xy 2 h 0 yh 0 xyh 0 y3 0 a1 0 0 0 b2 0 0 c2 0 xy 3 0 a2 a1 b1 0 0 b2 0 0 c2 y4 0 0 0 b2 0 0 0 0 0 0 xy 4 0 0 0 0 b2 0 0 c2 0 0 xy 2 a2 a0 0 0 b0 b1 0 c0 c1 0 x2 y 2 0 0 a0 0 0 0 b1 0 0 c1 x2 y 3 0 0 a2 0 b1 0 0 c1 0 0 y 0 0 0 0 0 b0 0 0 c0 0 xy a0 0 0 0 0 0 b0 0 0 c0
The determinant of this matrix equals a1 b2 times the sparse resultant. The structure of this 10 10-matrix can be understood as follows. Form the product f gh and expand it into monomials in x and y. A certain combinatorial rule selects 10 out of the 15 monomials appearing in f gh. The columns are indexed by these 10 monomials. Say the ith column is indexed by the monomial xj y k . Next there is a second combinatorial rule which selects a monomial multiple of one of the input equations f , g or h such that this multiple contains xi y j in its expansion. The ith row is indexed by that polynomial. Finally the (i, j)-entry contains the coecient of the jth column monomial in the ith row polynomial. This construction implies that the matrix has non-zero entries along the main diagonal. The two combinatorial rules mentioned in the previous paragraph are based on the geometric construction of a mixed subdivision of the Newton polytopes. The main diculty overcome by the Canny-Emiris formula is this: If one sets up a matrix like the one above just by playing around then most likely its determinant will vanish (try it), unless there is a good reason why it shouldnt vanish. Now the key idea is this: a big unknown polynomial (such as Res) will be non-zero if one can ensure that its initial monomial (with respect to some term order) is non-zero. Consider the lexicographic term order induced by the variable ordering a1 > a0 > a2 > b2 > b1 > b0 > c0 > c1 > c2 . The 24 monomials of Res are listed in this order above. All 10 ! permutations contribute a (possible) nonzero term to the expansion of the determinant of the Canny-Emiris matrix. There will undoubtedly be some cancellation. However, the unique largest 59
monomial (in the above term order) appears only once, namely, on the main diagonal. This guarantees that the determinant is a non-zero polynomial. Note that the product of the diagonal elements in the 10 10-matrix equals a1 b2 times the underlined leading monomial. An explicit combinatorial construction for all possible initial monomials (with respect to any term order) of the sparse resultant is given in (Sturmfels 1993). It is shown there that for any such initial monomial there exists a Canny-Emiris matrix which has that monomial on its main diagonal.
4.4
The Unmixed Sparse Resultant
In this section we consider the important special case when the given Laurent polynomials $f_0, f_1, \ldots, f_n$ all have the same support:
$$\mathcal{A} := \mathcal{A}_0 = \mathcal{A}_1 = \cdots = \mathcal{A}_n \subset \mathbb{Z}^n.$$
In this situation, the sparse resultant Res is the Chow form of the projective toric variety $X_\mathcal{A}$ which is given parametrically by the vector of monomials $(x^a : a \in \mathcal{A})$. Chow forms play a central role in elimination theory, and it is of great importance to find determinantal formulas for Chow forms of frequently appearing projective varieties. Significant progress in this direction has been made in the recent work of Eisenbud, Fløystad and Schreyer on exterior syzygies and the Bernstein-Gel'fand-Gel'fand correspondence. Khetan (2002) has applied these techniques to give an explicit determinantal formula of mixed Bézout-Sylvester type for the Chow form of any toric surface or toric threefold. This provides a very practical technique for eliminating two variables from three equations or three variables from four equations.

We describe Khetan's formula for an example. Consider the following unmixed system of three equations in two unknowns:
$$f = a_1 + a_2 x + a_3 y + a_4 xy + a_5 x^2 y + a_6 x y^2,$$
$$g = b_1 + b_2 x + b_3 y + b_4 xy + b_5 x^2 y + b_6 x y^2,$$
$$h = c_1 + c_2 x + c_3 y + c_4 xy + c_5 x^2 y + c_6 x y^2.$$
The common Newton polygon of $f$, $g$ and $h$ is a pentagon of normalized area 5. It defines a toric surface of degree 5 in projective 5-space. The sparse unmixed resultant $\mathrm{Res} = \mathrm{Res}(f, g, h)$ is the Chow form of this surface. It can be written as a homogeneous polynomial of degree 5 in the brackets
$$[ijk] = \det\begin{pmatrix} a_i & a_j & a_k \\ b_i & b_j & b_k \\ c_i & c_j & c_k \end{pmatrix}.$$
Hence Res is a polynomial of degree 15 in the 18 unknowns $a_1, a_2, \ldots, c_6$. It equals the determinant of the following $9 \times 9$ matrix:
$$\begin{pmatrix}
0 & [124] & 0 & [234] & [235] & [236] & a_1 & b_1 & c_1 \\
0 & [125] & 0 & 0 & 0 & 0 & a_2 & b_2 & c_2 \\
0 & [126] & 0 & [146] & [156]-[345] & [346] & a_3 & b_3 & c_3 \\
0 & 0 & 0 & [345]-[156]-[246] & [256] & [356] & a_4 & b_4 & c_4 \\
0 & 0 & 0 & [256] & 0 & 0 & a_5 & b_5 & c_5 \\
0 & 0 & 0 & [356] & [456] & 0 & a_6 & b_6 & c_6 \\
a_1 & b_1 & c_1 & d_1 & e_1 & f_1 & 0 & 0 & 0 \\
a_2 & b_2 & c_2 & d_2 & e_2 & f_2 & 0 & 0 & 0 \\
a_3 & b_3 & c_3 & d_3 & e_3 & f_3 & 0 & 0 & 0
\end{pmatrix}$$
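The claim that the common Newton polygon of $f$, $g$ and $h$ is a pentagon of normalized area 5 is easy to confirm by machine. The Python sketch below is our own illustration and is not part of Khetan's construction. It computes the convex hull of the six exponent vectors with Andrew's monotone chain algorithm, then doubles the shoelace area. For a lattice polygon this gives the normalized area, which the text identifies with the degree 5 of the toric surface.

```python
def convex_hull(points):
    """Vertices of the convex hull in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, p, q):
        return (p[0] - o[0]) * (q[1] - o[1]) - (p[1] - o[1]) * (q[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def normalized_area(polygon):
    """Twice the Euclidean area (shoelace formula), i.e. the normalized area of a lattice polygon."""
    n = len(polygon)
    twice_area = sum(polygon[k][0] * polygon[(k + 1) % n][1] - polygon[(k + 1) % n][0] * polygon[k][1]
                     for k in range(n))
    return abs(twice_area)

# Exponent vectors of 1, x, y, xy, x^2 y, x y^2: the common support of f, g and h.
support = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
hull = convex_hull(support)
print(hull)                   # [(0, 0), (1, 0), (2, 1), (1, 2), (0, 1)]
print(normalized_area(hull))  # 5
```

The interior point $(1, 1)$, which comes from the monomial $xy$, does not appear among the hull vertices.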
4.5
The Resultant of Four Trilinear Equations
Polynomial equations arising in many applications are multihomogeneous. Sometimes we are even luckier and the equations are multilinear, that is, multihomogeneous of degree $(1, 1, \ldots, 1)$. This will happen in Lecture 6. The resultant of a multihomogeneous system is the instance of the sparse resultant where the Newton polytopes are products of simplices. There are lots of nice formulas available for such resultants. For a systematic account see (Sturmfels and Zelevinsky 1994) and (Dickenstein and Emiris 2002). In this section we discuss one particular example, namely, the resultant of four trilinear polynomials in three unknowns. This material was prepared by Amit Khetan.

The given equations are
$$f_i = C_{i7} x_1 x_2 x_3 + C_{i6} x_1 x_2 + C_{i5} x_1 x_3 + C_{i4} x_1 + C_{i3} x_2 x_3 + C_{i2} x_2 + C_{i1} x_3 + C_{i0},$$
where $i = 0, 1, 2, 3$. The four polynomials $f_0, f_1, f_2, f_3$ in the unknowns $x_1, x_2, x_3$ share the same Newton polytope, the standard 3-dimensional cube. Hence our system is the unmixed polynomial system supported on the 3-cube. The resultant $\mathrm{Res}(f_0, f_1, f_2, f_3)$ is the unique (up to sign) irreducible polynomial in the 32 indeterminates $C_{ij}$ which vanishes if $f_0 = f_1 = f_2 = f_3 = 0$ has a common solution $(x_1, x_2, x_3)$ in $\mathbb{C}^3$. If we replace the affine space $\mathbb{C}^3$ by the product of projective lines $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$, then the "if" in the previous sentence can be replaced by "if and only if". The resultant is a homogeneous polynomial of degree 24; in fact, it is homogeneous of degree 6 in the coefficients of $f_i$ for each $i$. In algebraic geometry, we interpret this resultant as the Chow form of the Segre variety $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1 \subset \mathbb{P}^7$.

We first present a Sylvester matrix for Res. Let $S(a, b, c)$ denote the vector space of all polynomials in $\mathbb{Q}[x_1, x_2, x_3]$ of degree less than or equal to $a$ in $x_1$, less than or equal to $b$ in $x_2$, and less than or equal to $c$ in $x_3$. The dimension of $S(a, b, c)$ is $(a+1)(b+1)(c+1)$. Consider the $\mathbb{Q}$-linear map
$$\phi : S(0, 1, 2)^4 \to S(1, 2, 3), \quad (g_0, g_1, g_2, g_3) \mapsto g_0 f_0 + g_1 f_1 + g_2 f_2 + g_3 f_3.$$
Both the domain and the codomain of the linear map $\phi$ are vector spaces of dimension 24. We fix the standard monomial bases for both of these vector spaces. Then the linear map $\phi$ is given by a $24 \times 24$ matrix. Each non-zero entry in this matrix is one of the coefficients $C_{ij}$. In particular, the determinant of $\phi$ is a polynomial of degree 24 in the 32 unknowns $C_{ij}$.

Proposition 34. The determinant of the matrix $\phi$ equals $\mathrm{Res}(f_0, f_1, f_2, f_3)$.

This formula is a Sylvester formula for the resultant of four trilinear polynomials. The Sylvester formula is easy to generate, but it is not the most efficient representation when it comes to actually evaluating our resultant. A better representation is the following Bézout formula. For $i, j, k, l \in \{0, 1, \ldots, 7\}$ we define the bracket variables
$$[ijkl] = \det\begin{pmatrix} C_{0i} & C_{0j} & C_{0k} & C_{0l} \\ C_{1i} & C_{1j} & C_{1k} & C_{1l} \\ C_{2i} & C_{2j} & C_{2k} & C_{2l} \\ C_{3i} & C_{3j} & C_{3k} & C_{3l} \end{pmatrix}.$$
We shall present a $6 \times 6$ matrix $B$ whose entries are linear forms in the bracket variables, such that $\det B = \mathrm{Res}(f_0, f_1, f_2, f_3)$. This construction is described, for arbitrary products of projective spaces, in a recent paper by Dickenstein and Emiris (2002).
First construct the $4 \times 4$ matrix $M$ such that
$$M_{0j} = f_j(x_1, x_2, x_3) \quad \text{for } j = 0, 1, 2, 3,$$
$$M_{ij} = \frac{f_j(y_1, \ldots, y_i, x_{i+1}, \ldots, x_3) - f_j(y_1, \ldots, y_{i-1}, x_i, \ldots, x_3)}{y_i - x_i} \quad \text{for } i = 1, 2, 3 \text{ and } j = 0, 1, 2, 3.$$
The first row of the matrix $M$ consists of the given polynomials $f_j$, while each successive row of $M$ is an incremental quotient with each $x_i$ successively replaced by a corresponding $y_i$. After a bit of simplification, such as subtracting $x_1$ times the second row from the first, the matrix $M$ gets replaced by a $4 \times 4$ matrix of the form
$$M' = \begin{pmatrix} C_{03} x_2 x_3 + C_{02} x_2 + C_{01} x_3 + C_{00} & \cdots \\ C_{07} x_2 x_3 + C_{06} x_2 + C_{05} x_3 + C_{04} & \cdots \\ C_{07} y_1 x_3 + C_{06} y_1 + C_{03} x_3 + C_{02} & \cdots \\ C_{07} y_1 y_2 + C_{05} y_1 + C_{03} y_2 + C_{01} & \cdots \end{pmatrix}.$$
Let $B(x, y)$ denote the determinant of this matrix. This is a polynomial in two sets of variables. It is called the (affine) Bezoutian of the given trilinear forms $f_0, f_1, f_2, f_3$. It appears from the entries of $M'$ that $B(x, y)$ has total degree 8, but this is not the case. In fact, the total degree of this polynomial is only 6. The monomials $x^\alpha y^\beta = x_1^{\alpha_1} x_2^{\alpha_2} x_3^{\alpha_3} y_1^{\beta_1} y_2^{\beta_2} y_3^{\beta_3}$ appearing in $B(x, y)$ satisfy $\alpha_i < i$ and $\beta_i \le 3 - i$. This is the content of the lemma below. The coefficient $b_{\alpha\beta}$ of $x^\alpha y^\beta$ in $B(x, y)$ is a linear form in the bracket variables.

Lemma 35. $B(x, y) \in S(0, 1, 2) \otimes S(2, 1, 0)$.

We can interpret the polynomial $B(x, y)$ as a linear map, also denoted $B$, from the dual vector space $S(2, 1, 0)^*$ to $S(0, 1, 2)$. Each of these two vector spaces is 6-dimensional and has a canonical monomial basis. The following $6 \times 6$ matrix represents the linear map $B$ in the monomial basis:
[0124] [0234] [1234] + [0235] [1235] [0236] [1236] + [0237] [1237] [0146] [0245] [0147] + [0156] [0345] [1245] [0157] [1345] [1246] + [0256] [1247] [1346] [0257] + [0356] [1347] + [0357] [0346] [0247] [1247] + [0356] [0257] + [1346] [1257] + [1356] [2346] [0267] [0367] [1267] [2356] + [2347] [1367] + [2357] [0456] [1456] [0457] [1457] [2456] [3456] [2457] [3457] [0467]
[0125] [0134] [0135] [0126] [0136] [0127] [0137]
[1467] + [0567] [1567] [2467] [2567] + [3467] [3567]
Proposition 36. $\mathrm{Res}(f_0, f_1, f_2, f_3)$ is the determinant of the above matrix.

This type of formula is called a Bézout formula, or sometimes a pure Bézout formula, in the resultant literature. Expanding the determinant gives a polynomial of degree 6 in the brackets with 11,280 terms. It remains a formidable challenge to further expand this expression into an honest polynomial of degree 24 in the 32 coefficients $C_{ij}$.
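Proposition 34 is easy to experiment with. The Python sketch below assembles the $24 \times 24$ matrix of $\phi$ from the coefficient vectors $(C_{i0}, \ldots, C_{i7})$, indexing coefficients as in the definition of $f_i$, and computes its determinant exactly over the rationals. The random coefficients and the seed are arbitrary choices made for illustration. For generic coefficients the determinant should be non-zero. If the four trilinear polynomials share the root $(1, 1, 1)$, it must vanish, because every polynomial in the image of $\phi$ then vanishes at that point.

```python
import random
from fractions import Fraction
from itertools import product

def box(a, b, c):
    """Exponent vectors of the monomial basis of S(a, b, c)."""
    return list(product(range(a + 1), range(b + 1), range(c + 1)))

def sylvester_matrix(C):
    """Matrix of phi: S(0,1,2)^4 -> S(1,2,3).

    C[i][4*e1 + 2*e2 + e3] is the coefficient of x1^e1 x2^e2 x3^e3 in f_i, so C[i][7]
    multiplies x1*x2*x3 and C[i][0] is the constant term, matching C_i7, ..., C_i0.
    Row (i, m) lists the coefficients of m * f_i in the monomial basis of S(1,2,3).
    """
    columns = box(1, 2, 3)
    position = {e: n for n, e in enumerate(columns)}
    rows = []
    for i in range(4):
        for m in box(0, 1, 2):
            row = [Fraction(0)] * len(columns)
            for e in box(1, 1, 1):
                target = (m[0] + e[0], m[1] + e[1], m[2] + e[2])
                row[position[target]] += C[i][4 * e[0] + 2 * e[1] + e[2]]
            rows.append(row)
    return rows

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            if factor != 0:
                for k in range(col, n):
                    a[r][k] -= factor * a[col][k]
    return det

rng = random.Random(2002)
generic = [[Fraction(rng.randint(-50, 50)) for _ in range(8)] for _ in range(4)]
special = [row[:] for row in generic]
for row in special:
    row[0] = -sum(row[1:])   # now f_i(1, 1, 1) = 0 for every i

M = sylvester_matrix(generic)
print(len(M), len(M[0]))                       # 24 24
print(determinant(M))                          # a non-zero rational for generic coefficients
print(determinant(sylvester_matrix(special)))  # 0
```

Exact rational arithmetic is used so that the vanishing in the second case is an exact zero rather than a small floating-point number.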
Primary Decomposition
In this lecture we consider arbitrary systems of polynomial equations in several unknowns. The solution set of these equations may have many different components of different dimensions, and our task is to identify all of these irreducible components. The algebraic technique for doing this is primary decomposition. After reviewing the relevant basic results from commutative algebra, we demonstrate how to do such computations in Singular and Macaulay2. We then present some particularly interesting examples.
5.1
Prime Ideals, Radical Ideals and Primary Ideals
Let $I$ be an ideal in the polynomial ring $\mathbb{Q}[x] = \mathbb{Q}[x_1, \ldots, x_n]$. Solving the polynomial system $I$ means at least finding the irreducible decomposition
$$V(I) = V(P_1) \cup V(P_2) \cup \cdots \cup V(P_r) \subset \mathbb{C}^n$$
of the complex variety defined by $I$. Here each $V(P_i)$ is an irreducible variety over the field of rational numbers $\mathbb{Q}$. Naturally, if we extend scalars and pass to the complex numbers $\mathbb{C}$, then $V(P_i)$ may further decompose into more components, but describing those components typically involves numerical computations. The special case where $I$ is zero-dimensional was discussed in Lecture 2. In this lecture we mostly stick to doing arithmetic in $\mathbb{Q}[x]$ only. Recall that an ideal $P$ in $\mathbb{Q}[x]$ is a prime ideal if
$$(P : f) = P \quad \text{for all } f \in \mathbb{Q}[x] \setminus P. \qquad (29)$$
A variety is irreducible if it can be defined by a prime ideal. Deciding whether a given ideal is prime is not an easy task. See Corollary 40 below for a method that works quite well (say, in Macaulay2) on small enough examples. Fix an ideal $I$ in $\mathbb{Q}[x]$. A prime ideal $P$ is said to be associated to $I$ if there exists $f \in \mathbb{Q}[x]$ such that
$$(I : f) = P. \qquad (30)$$
A polynomial $f$ which satisfies $(I : f) = P$ is called a witness for $P$ in $I$. We write $\mathrm{Ass}(I)$ for the set of all prime ideals which are associated to $I$.
Proposition 37. For any ideal $I \subset \mathbb{Q}[x]$, $\mathrm{Ass}(I)$ is non-empty and finite.

Here are some simple examples of ideals $I$, primes $P$ and witnesses $f$.

Example 38. In each of the following six cases, $P$ is a prime ideal in the polynomial ring in the given unknowns, and the identity $(I : f) = P$ holds.
(a) $I = \langle x_1^4 - x_1^2 \rangle$, $f = x_1^3 - x_1$, $P = \langle x_1 \rangle$.
(a′) $I = \langle x_1^4 - x_1^2 \rangle$, $f = x_1^{17} - x_1^{16}$, $P = \langle x_1 + 1 \rangle$.
(b) $I = \langle x_1 x_4 + x_2 x_3, x_1 x_3, x_2 x_4 \rangle$, $f = x_4^2$, $P = \langle x_1, x_2 \rangle$.
(b′) $I = \langle x_1 x_4 + x_2 x_3, x_1 x_3, x_2 x_4 \rangle$, $f = x_1 x_4$, $P = \langle x_1, x_2, x_3, x_4 \rangle$.
(c) $I = \langle x_1 x_2 + x_3 x_4, x_1 x_3 + x_2 x_4, x_1 x_4 + x_2 x_3 \rangle$, $f = (x_3^2 - x_4^2) x_4$, $P = \langle x_1, x_2, x_3 \rangle$.
(c′) $I = \langle x_1 x_2 + x_3 x_4, x_1 x_3 + x_2 x_4, x_1 x_4 + x_2 x_3 \rangle$, $f = x_1 x_4^2 + x_2 x_4^2 - x_3 x_4^2 + x_3^2 x_4$, $P = \langle x_1 - x_4, x_2 - x_4, x_3 + x_4 \rangle$.

The radical of an ideal $I$ equals the intersection of all its associated primes:
$$\mathrm{Rad}(I) = \bigcap \{ P : P \in \mathrm{Ass}(I) \}. \qquad (31)$$
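In one variable the ideal quotient can be computed directly, because $\mathbb{Q}[x_1]$ is a principal ideal domain: $\langle p \rangle : f = \langle p / \gcd(p, f) \rangle$. The Python sketch below uses this fact to check parts (a) and (a′) of Example 38. It uses only the standard library. Polynomials are stored as coefficient lists with the lowest degree first, a representation chosen here for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; p[k] is the coefficient of x^k."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(p, d):
    """Long division of p by a non-zero d over Q; returns (quotient, remainder)."""
    p, d = trim(Fraction(c) for c in p), trim(Fraction(c) for c in d)
    q = [Fraction(0)] * max(len(p) - len(d) + 1, 1)
    while p and len(p) >= len(d):
        shift = len(p) - len(d)
        c = p[-1] / d[-1]
        q[shift] = c
        for k, dk in enumerate(d):
            p[shift + k] -= c * dk
        p = trim(p)
    return trim(q), p

def gcd_poly(p, q):
    """Monic greatest common divisor by the Euclidean algorithm."""
    p, q = trim(Fraction(c) for c in p), trim(Fraction(c) for c in q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return [c / p[-1] for c in p]

def colon_principal(p, f):
    """Monic generator of the ideal quotient <p> : f = <p / gcd(p, f)> in Q[x]."""
    quotient, remainder = divmod_poly(p, gcd_poly(p, f))
    assert not remainder  # gcd(p, f) divides p exactly
    return [c / quotient[-1] for c in quotient]

I = [0, 0, -1, 0, 1]                                     # x^4 - x^2
print(colon_principal(I, [0, -1, 0, 1]) == [0, 1])       # (a):  f = x^3 - x,     P = <x>      -> True
print(colon_principal(I, [0] * 16 + [-1, 1]) == [1, 1])  # (a'): f = x^17 - x^16, P = <x + 1>  -> True
```

In several variables there is no such shortcut, and one relies on Gröbner-basis-based commands such as those in the Macaulay2 session below.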
The radical and the set of associated primes can be computed with built-in commands in Macaulay2. The following session checks whether the ideals in (b) and (c) of Example 38 are radical, and it illustrates the identity (31).

i1 : R = QQ[x1,x2,x3,x4];
i2 : I = ideal( x1*x4+x2*x3, x1*x3, x2*x4 );
i3 : ass(I)
o3 = {ideal (x4, x3), ideal (x2, x1), ideal (x4, x3, x2, x1)}
i4 : radical(I) == I
o4 = false
i5 : radical(I)
o5 = ideal (x2*x4, x1*x4, x2*x3, x1*x3)
i6 : intersect(ass(I))
o6 = ideal (x2*x4, x1*x4, x2*x3, x1*x3)
i7 : ass(radical(I))
o7 = {ideal (x4, x3), ideal (x2, x1)}
i8 : J = ideal( x1*x2+x3*x4, x1*x3+x2*x4, x1*x4+x2*x3 );
i9 : ass(J)
o9 = {ideal (x3 + x4, x2 + x4, x1 + x4), ideal (x3 + x4, x2 - x4, x1 - x4),
      ideal (x3 - x4, x2 + x4, x1 - x4), ideal (x3 - x4, x2 - x4, x1 + x4),
      ideal (x4, x2, x1), ideal (x4, x3, x1), ideal (x4, x3, x2), ideal (x3, x2, x1)}
i10 : radical(J) == J
o10 = true

The following result is a useful trick for showing that an ideal is radical.

Proposition 39. Let $I$ be an ideal in $\mathbb{Q}[x]$ and $\prec$ any term order. If the initial monomial ideal $\mathrm{in}_\prec(I)$ is square-free then $I$ is a radical ideal.

An ideal $I$ in $\mathbb{Q}[x]$ is called primary if the set $\mathrm{Ass}(I)$ is a singleton. In that case, its radical $\mathrm{Rad}(I)$ is a prime ideal and $\mathrm{Ass}(I) = \{\mathrm{Rad}(I)\}$.

Corollary 40. The following three conditions are equivalent for an ideal $I$: (1) $I$ is a prime ideal; (2) $I$ is radical and primary; (3) $\mathrm{Ass}(I) = \{I\}$.
We can use condition (3) to test whether a given ideal is prime. Here is an interesting example. Let $X = (x_{ij})$ and $Y = (y_{ij})$ be two $n \times n$ matrices, both having indeterminate entries. Each entry in their commutator $XY - YX$ is a quadratic polynomial in the polynomial ring $\mathbb{Q}[X, Y]$ generated by the $2n^2$ unknowns $x_{ij}, y_{ij}$. We let $I$ denote the ideal generated by these $n^2$ quadratic polynomials. It is known that the commuting variety $V(I)$ is an irreducible variety in $\mathbb{C}^{n \times n} \times \mathbb{C}^{n \times n}$, but it is unknown whether $I$ is always a prime ideal. The following Macaulay2 session proves that $I$ is a prime ideal for $n = 2$.
i1 : R = QQ[ x11,x12,x21,x22, y11,y12,y21,y22 ];
i2 : X = matrix({ {x11,x12} , {x21,x22} });
i3 : Y = matrix({ {y11,y12} , {y21,y22} });
i4 : I = ideal flatten ( X*Y - Y*X )
o4 = ideal (- x21*y12 + x12*y21, x21*y12 - x12*y21,
            x21*y11 - x11*y21 + x22*y21 - x21*y22,
            - x12*y11 + x11*y12 - x22*y12 + x12*y22)
i5 : ass(I) == {I}
o5 = true
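The output line o4 shows that the two diagonal entries of $XY - YX$ are negatives of each other, so for $n = 2$ the ideal $I$ is generated by three quadrics. The Python sketch below reproduces this. It relies on a small dictionary-based polynomial arithmetic written for this illustration, which is not a Macaulay2 interface: a monomial is a sorted tuple of variable names.

```python
from collections import defaultdict

def var(name):
    """The polynomial consisting of a single variable."""
    return {(name,): 1}

def add(p, q, sign=1):
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += sign * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(sorted(m1 + m2))] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def matmul(X, Y):
    n = len(X)
    Z = [[{} for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Z[i][j] = add(Z[i][j], mul(X[i][k], Y[k][j]))
    return Z

n = 2
X = [[var(f"x{i+1}{j+1}") for j in range(n)] for i in range(n)]
Y = [[var(f"y{i+1}{j+1}") for j in range(n)] for i in range(n)]
XY, YX = matmul(X, Y), matmul(Y, X)
commutator = [[add(XY[i][j], YX[i][j], sign=-1) for j in range(n)] for i in range(n)]

print(sorted(commutator[0][0].items()))  # [(('x12', 'y21'), 1), (('x21', 'y12'), -1)]
print(add(commutator[0][0], commutator[1][1]) == {})  # True: the trace of XY - YX is zero
```

The off-diagonal entries each have four terms, matching the last two generators in o4.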
5.2
How to Compute a Primary Decomposition
The following is the main result about primary decompositions in $\mathbb{Q}[x]$.

Theorem 41. Every ideal $I$ in $\mathbb{Q}[x]$ is an intersection of primary ideals,
$$I = Q_1 \cap Q_2 \cap \cdots \cap Q_r, \qquad (32)$$
where the primes $P_i = \mathrm{Rad}(Q_i)$ are distinct and associated to $I$.

It is an immediate consequence of (31) that the following inclusion holds:
$$\mathrm{Ass}(\mathrm{Rad}(I)) \subseteq \mathrm{Ass}(I).$$
In the situation of Theorem 41, the associated prime $P_i$ is a minimal prime of $I$ if it also lies in $\mathrm{Ass}(\mathrm{Rad}(I))$. In that case, the corresponding primary component $Q_i$ of $I$ is unique, and it can be recovered computationally via Q_i = I : (I : P_i ) . (33) On the other hand, if $P_i$ lies in $\mathrm{Ass}(I) \setminus \mathrm{Ass}(\mathrm{Rad}(I))$ then $P_i$ is an embedded prime of $I$ and the primary component $Q_i$ in Theorem 41 is not unique. A full implementation of a primary decomposition algorithm is available in Singular. We use the following example to demonstrate how it works:
$$I = \langle xy, x^3 - x^2, x^2 y - xy \rangle = \langle x \rangle \cap \langle x - 1, y \rangle \cap \langle x^2, y \rangle.$$
The first two components are minimal primes while the third component is an embedded primary component. Geometrically, $V(I)$ consists of the $y$-axis, a point on the $x$-axis, and an embedded point at the origin. Here is Singular:
> ring R = 0, (x,y), dp;
> ideal I = x*y, x^3 - x^2, x^2*y - x*y;
> LIB "primdec.lib";
> primdecGTZ(I);
[1]:
   [1]:
      _[1]=x
   [2]:
      _[1]=x
[2]:
   [1]:
      _[1]=y
      _[2]=x-1
   [2]:
      _[1]=y
      _[2]=x-1
[3]:
   [1]:
      _[1]=y
      _[2]=x2
   [2]:
      _[1]=x
      _[2]=y
> exit;
Auf Wiedersehen.

The output consists of three pairs denoted [1], [2], [3]. Each pair consists of a primary ideal $Q_i$ in [1] and the prime ideal $P_i = \mathrm{Rad}(Q_i)$ in [2].

We state two more results about primary decomposition which are quite useful in practice. Recall that a binomial is a polynomial of the form
$$\alpha \cdot x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n} - \beta \cdot x_1^{j_1} x_2^{j_2} \cdots x_n^{j_n},$$
where $\alpha$ and $\beta$ are scalars, possibly zero. An ideal $I$ is a binomial ideal if it is generated by a set of binomials. All the ideals in Example 38 and in the Singular example above are binomial ideals. Note that every monomial ideal is a binomial ideal.

The following theorem, due to Eisenbud and Sturmfels (1996), states that primary decomposition is a binomial-friendly operation. Here we must pass to an algebraically closed field such as $\mathbb{C}$. Otherwise the statement is not true, as the following primary decomposition in one variable over $\mathbb{Q}$ shows:
$$\langle x^{11} - 1 \rangle = \langle x - 1 \rangle \cap \langle x^{10} + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + x + 1 \rangle.$$
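The failure over $\mathbb{Q}$ can be seen concretely. The Python sketch below first checks the factorization $x^{11} - 1 = (x - 1)(x^{10} + \cdots + x + 1)$ with exact integer arithmetic. It then confirms numerically that the ten non-trivial 11th roots of unity $\zeta^k$ are roots of the second factor. Over $\mathbb{C}$ that factor therefore splits into the binomials $x - \zeta^k$, in line with Theorem 42.

```python
import cmath

def poly_mul(p, q):
    """Multiply coefficient lists (lowest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def evaluate(p, z):
    return sum(c * z**k for k, c in enumerate(p))

second_factor = [1] * 11                                           # x^10 + x^9 + ... + x + 1
print(poly_mul([-1, 1], second_factor) == [-1] + [0] * 10 + [1])  # True: the product is x^11 - 1

zetas = [cmath.exp(2j * cmath.pi * k / 11) for k in range(1, 11)]
print(all(abs(evaluate(second_factor, z)) < 1e-9 for z in zetas))  # True
```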
Theorem 42. If $I$ is a binomial ideal in $\mathbb{C}[x]$ then the radical of $I$ is binomial, every associated prime of $I$ is binomial, and $I$ has a primary decomposition where each primary component is a binomial ideal.

Of course, these statements are well-known (and easy to prove) when "binomial" is replaced by "monomial". For details on monomial primary decomposition see the chapter by Hoşten and Smith in the Macaulay2 book.

Another class of ideals which behave nicely with regard to primary decomposition are the Cohen-Macaulay ideals. The archetype of a Cohen-Macaulay ideal is a complete intersection, that is, an ideal $I$ of codimension $c$ which is generated by $c$ polynomials. The case $c = n$ of zero-dimensional complete intersections was discussed at length in earlier lectures, but higher-dimensional complete intersections also come up frequently in practice.

Theorem 43. (Macaulay's Unmixedness Theorem) If $I$ is a complete intersection of codimension $c$ in $\mathbb{Q}[x]$ then $I$ has no embedded primes and every minimal prime of $I$ has codimension $c$ as well.

When computing a non-trivial primary decomposition, it is advisable to keep track of the degrees of the pieces. The degree of an ideal $I$ is additive in the sense that $\mathrm{degree}(I)$ is the sum of $\mathrm{degree}(Q_i)$ where $Q_i$ runs over all primary components of maximal dimension in (32). Theorem 43 therefore implies the following.

Corollary 44. If $I$ is a homogeneous complete intersection, then
$$\mathrm{degree}(I) = \sum_{i=1}^{r} \mathrm{degree}(Q_i).$$
In the following sections we shall illustrate these results for some interesting systems of polynomial equations derived from matrices.
5.3
Adjacent Minors
The following problem is open and appears to be difficult: What does it mean for an $m \times n$ matrix to have all adjacent $k \times k$ subdeterminants vanish?
To make this question more precise, fix an $m \times n$ matrix of indeterminates $X = (x_{i,j})$ and let $\mathbb{Q}[X]$ denote the polynomial ring in these $mn$ unknowns. For any two integers $i \in \{1, \ldots, m-k+1\}$ and $j \in \{1, \ldots, n-k+1\}$ we consider the following $k \times k$ minor:
$$\det\begin{pmatrix} x_{i,j} & x_{i,j+1} & \cdots & x_{i,j+k-1} \\ x_{i+1,j} & x_{i+1,j+1} & \cdots & x_{i+1,j+k-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{i+k-1,j} & x_{i+k-1,j+1} & \cdots & x_{i+k-1,j+k-1} \end{pmatrix} \qquad (34)$$
Let $A_{k,m,n}$ denote the ideal in $\mathbb{Q}[X]$ generated by these adjacent minors. Thus $A_{k,m,n}$ is an ideal generated by $(n-k+1)(m-k+1)$ homogeneous polynomials of degree $k$ in $mn$ unknowns. The variety $V(A_{k,m,n})$ consists of all complex $m \times n$ matrices whose adjacent $k \times k$ minors vanish. Our problem is to describe all the irreducible components of this variety. Ideally, we would like to know an explicit primary decomposition of the ideal $A_{k,m,n}$.

In the special case $k = m = 2$, our problem has the following beautiful solution. Let us rename the unknowns and consider the $2 \times n$ matrix
$$X = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{pmatrix}.$$
Our ideal $A_{2,2,n}$ is generated by the $n - 1$ binomials
$$x_{i-1} y_i - x_i y_{i-1} \qquad (i = 2, 3, \ldots, n).$$
These binomials form a Gröbner basis because the underlined leading monomials are relatively prime. This shows that $A_{2,2,n}$ is a complete intersection of codimension $n - 1$. Hence Theorem 43 applies here. Moreover, since the leading monomials are square-free, Proposition 39 tells us that $A_{2,2,n}$ is a radical ideal. Hence we know already, without having done any computations, that $A_{2,2,n}$ is an intersection of prime ideals each having codimension $n - 1$. The first case which exhibits the full structure is $n = 5$, here in Macaulay2:

i1: R = QQ[x1,x2,x3,x4,x5,y1,y2,y3,y4,y5];
i2: A225 = ideal( x1*y2 - x2*y1, x2*y3 - x3*y2,
                  x3*y4 - x4*y3, x4*y5 - x5*y4);
i3: ass(A225)
o3 = { ideal(y4, y2, x4, x2),
       ideal(y3, x3, x5*y4 - x4*y5, x2*y1 - x1*y2),
       ideal(y4, x4, x3*y2 - x2*y3, x3*y1 - x1*y3, x2*y1 - x1*y2),
       ideal(y2, x2, x5*y4 - x4*y5, x5*y3 - x3*y5, x4*y3 - x3*y4),
       ideal (x5*y4 - x4*y5, x5*y3 - x3*y5, x4*y3 - x3*y4, x5*y2 - x2*y5,
              x4*y2 - x2*y4, x3*y2 - x2*y3, x5*y1-x1*y5, x4*y1-x1*y4,
              x3*y1-x1*y3, x2*y1-x1*y2)}
i4: A225 == intersect(ass(A225))
o4 = true

After a few more experiments one conjectures the following general result:

Theorem 45. The number of associated primes of $A_{2,2,n}$ is the Fibonacci number $f(n)$, defined by $f(n) = f(n-1) + f(n-2)$ and $f(1) = f(2) = 1$.

Proof. Let $F(n)$ denote the set of all subsets of $\{2, 3, \ldots, n-1\}$ which do not contain two consecutive integers. The cardinality of $F(n)$ equals the Fibonacci number $f(n)$. For instance, $F(5) = \{\emptyset, \{2\}, \{3\}, \{4\}, \{2, 4\}\}$. For each element $S$ of $F(n)$ we define a binomial ideal $P_S$ in $\mathbb{Q}[X]$. The generators of $P_S$ are the variables $x_i$ and $y_i$ for all $i \in S$, and the binomials $x_j y_k - x_k y_j$ for all $j, k \notin S$ such that no element of $S$ lies between $j$ and $k$. It is easy to see that $P_S$ is a prime ideal of codimension $n - 1$. Moreover, $P_S$ contains $A_{2,2,n}$, and therefore $P_S$ is a minimal prime of $A_{2,2,n}$. We claim that
$$A_{2,2,n} = \bigcap_{S \in F(n)} P_S.$$
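In view of Theorem 43 and Corollary 44, it suffices to prove the identity
$$\sum_{S \in F(n)} \mathrm{degree}(P_S) = 2^{n-1},$$
since $2^{n-1}$ is the degree of $A_{2,2,n}$, a complete intersection of $n - 1$ quadrics.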
First note that $P_\emptyset$ is the determinantal ideal $\langle x_i y_j - x_j y_i : 1 \le i < j \le n \rangle$. The degree of $P_\emptyset$ equals $n$. Using the same fact for matrices of smaller size, we find that, for $S$ non-empty, the degree of the prime $P_S$ equals the product
$$(i_1 - 1)(i_2 - i_1 - 1)(i_3 - i_2 - 1) \cdots (i_r - i_{r-1} - 1)(n - i_r),$$
where $S = \{i_1 < i_2 < \cdots < i_r\}$. Consider the surjection $\phi : 2^{\{2, \ldots, n\}} \to F(n)$ defined by
$$\phi(\{j_1 < j_2 < \cdots < j_r\}) = \{j_{r-1}, j_{r-3}, j_{r-5}, \ldots\}.$$
The product displayed above is the cardinality of the inverse image $\phi^{-1}(S)$. This proves $\sum_{S \in F(n)} \#(\phi^{-1}(S)) = 2^{n-1}$, which implies our assertion.

Our result can be phrased in plain English as follows: if all adjacent $2 \times 2$ minors of a $2 \times n$ matrix vanish then the matrix is a concatenation of $2 \times n_i$ matrices of rank 1 separated by zero columns.

Unfortunately, things are less nice for larger matrices. First of all, the ideal $A_{k,m,n}$ is neither radical nor a complete intersection. For instance, $A_{2,3,3}$ has four associated primes, one of which is embedded. Here is the Singular code for the ideal $A_{2,3,3}$:

ring R = 0,(x11,x12,x13,x21,x22,x23,x31,x32,x33),dp;
ideal A233 = x11*x22-x12*x21, x12*x23-x13*x22,
             x21*x32-x22*x31, x22*x33-x23*x32;
LIB "primdec.lib";
primdecGTZ(A233);

The three minimal primes of $A_{2,3,3}$ translate into English as follows: if all adjacent $2 \times 2$ minors of a $3 \times 3$ matrix vanish then either the middle column vanishes, or the middle row vanishes, or the matrix has rank at most 1.

The binomial ideals $A_{2,m,n}$ were studied by (Diaconis, Eisenbud and Sturmfels 1998). The motivation was an application to statistics to be described in Lecture 8. The three authors found a primary decomposition for the case $m = n = 4$. The ideal of adjacent $2 \times 2$ minors of a $4 \times 4$ matrix is
$$A_{244} = \langle x_{12}x_{21} - x_{11}x_{22},\ x_{13}x_{22} - x_{12}x_{23},\ x_{14}x_{23} - x_{13}x_{24},\ x_{22}x_{31} - x_{21}x_{32},\ x_{23}x_{32} - x_{22}x_{33},\ x_{24}x_{33} - x_{23}x_{34},\ x_{32}x_{41} - x_{31}x_{42},\ x_{33}x_{42} - x_{32}x_{43},\ x_{34}x_{43} - x_{33}x_{44} \rangle.$$
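Let $P$ denote the prime ideal generated by all thirty-six $2 \times 2$ minors of our $4 \times 4$ matrix $(x_{ij})$ of indeterminates. We also introduce the prime ideals
$$C_1 := \langle x_{12}, x_{22}, x_{23}, x_{24}, x_{31}, x_{32}, x_{33}, x_{43} \rangle,$$
$$C_2 := \langle x_{13}, x_{21}, x_{22}, x_{23}, x_{32}, x_{33}, x_{34}, x_{42} \rangle,$$
and the prime ideals
$$A := \langle x_{12}x_{21} - x_{11}x_{22},\ x_{13}, x_{23}, x_{31}, x_{32}, x_{33}, x_{43} \rangle,$$
$$B := \langle x_{11}x_{22} - x_{12}x_{21},\ x_{11}x_{23} - x_{13}x_{21},\ x_{11}x_{24} - x_{14}x_{21},\ x_{31}, x_{32},\ x_{12}x_{23} - x_{13}x_{22},\ x_{12}x_{24} - x_{14}x_{22},\ x_{13}x_{24} - x_{14}x_{23},\ x_{33}, x_{34} \rangle.$$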
Rotating and reflecting the matrix $(x_{ij})$, we find eight ideals $A_1, A_2, \ldots, A_8$ equivalent to $A$ and four ideals $B_1, B_2, B_3, B_4$ equivalent to $B$. Note that $A_i$ has codimension 7 and degree 2, $B_j$ has codimension 7 and degree 4, and $C_k$ has codimension 8 and degree 1, while $P$ has codimension 9 and degree 20. The following lemma describes the variety $V(A_{244}) \subset \mathbb{C}^{4 \times 4}$ set-theoretically.

Lemma 46. The minimal primes of $A_{244}$ are the 15 primes $A_i$, $B_j$, $C_k$ and $P$. Each of these is equal to its primary component in $A_{244}$. From
$$\mathrm{Rad}(A_{244}) = A_1 \cap A_2 \cap \cdots \cap A_8 \cap B_1 \cap B_2 \cap B_3 \cap B_4 \cap C_1 \cap C_2 \cap P$$
we find that both $A_{244}$ and $\mathrm{Rad}(A_{244})$ have codimension 7 and degree $32 = 8 \cdot 2 + 4 \cdot 4$. We next present the list of all the embedded components of $A_{244}$. Each of the following five ideals $D$, $E$, $F$, $F'$ and $G$ was shown to be primary by using Algorithm 9.4 in (Eisenbud & Sturmfels 1996). Our first primary ideal is
$$D := \langle x_{13}, x_{23}, x_{33}, x_{43} \rangle^2 + \langle x_{31}, x_{32}, x_{33}, x_{34} \rangle^2 + \langle x_{ik}x_{jl} - x_{il}x_{jk} : \min\{j, l\} \le 2 \text{ or } (3,3) \in \{(i,k), (j,l), (i,l), (j,k)\} \rangle.$$
The radical of $D$ is a prime of codimension 10 and degree 5. (Commutative algebra experts will notice that $\mathrm{Rad}(D)$ is a ladder determinantal ideal.) Up to symmetry, there are four such ideals $D_1, D_2, D_3, D_4$. Our second type of embedded primary ideal is
$$E := \left(I + \langle x_{12}^2, x_{21}^2, x_{22}^2, x_{23}^2, x_{24}^2, x_{32}^2, x_{33}^2, x_{34}^2, x_{42}^2, x_{43}^2 \rangle\right) : (x_{11}x_{13}x_{14}x_{31}x_{41}x_{44})^2.$$
Its radical $\mathrm{Rad}(E)$ is a monomial prime of codimension 10. Up to symmetry, there are four such primary ideals $E_1, E_2, E_3, E_4$. Our third type of primary ideal has codimension 10 as well. It equals
$$F := \left(I + \langle x_{12}^3, x_{13}^3, x_{22}^3, x_{23}^3, x_{31}^3, x_{32}^3, x_{33}^3, x_{34}^3, x_{42}^3, x_{43}^3 \rangle\right) : (x_{11}x_{14}x_{21}x_{24}x_{41}x_{44})^2 (x_{11}x_{24} - x_{21}x_{14}).$$
Its radical $\mathrm{Rad}(F)$ is a monomial prime. Up to symmetry, there are four such primary ideals $F_1, F_2, F_3, F_4$. Note how $\mathrm{Rad}(F)$ differs from $\mathrm{Rad}(E)$. Our fourth type of primary ideal is the following ideal of codimension 11:
$$F' := \left(I + \langle x_{12}^3, x_{13}^3, x_{22}^3, x_{23}^3, x_{31}^3, x_{32}^3, x_{33}^3, x_{34}^3, x_{42}^3, x_{43}^3 \rangle\right) : (x_{11}x_{14}x_{21}x_{24}x_{41}x_{44})(x_{21}x_{44} - x_{41}x_{24}).$$
Up to symmetry, there are four such primary ideals $F'_1, F'_2, F'_3, F'_4$. Note that $\mathrm{Rad}(F') = \mathrm{Rad}(F) + \langle x_{14}x_{21} - x_{11}x_{24} \rangle$. In particular, the ideals $F$ and $F'$ lie in the same cellular component of $I$; see (Eisenbud & Sturmfels 1996, Section 6). Our last primary ideal has codimension 12. It is unique up to symmetry:
$$G := \left(I + \langle x_{12}^5, x_{13}^5, x_{21}^5, x_{22}^5, x_{23}^5, x_{24}^5, x_{31}^5, x_{32}^5, x_{33}^5, x_{34}^5, x_{42}^5, x_{43}^5 \rangle\right) : (x_{11}x_{14}x_{41}x_{44})^5 (x_{11}x_{44} - x_{14}x_{41}).$$
In summary, we have the following theorem.

Theorem 47. The ideal of adjacent $2 \times 2$ minors of a generic $4 \times 4$ matrix has 32 associated primes, 15 minimal and 17 embedded. Using the prime decomposition in Lemma 46, we get the minimal primary decomposition
$$A_{244} = \mathrm{Rad}(I) \cap D_1 \cap \cdots \cap D_4 \cap E_1 \cap \cdots \cap E_4 \cap F_1 \cap \cdots \cap F_4 \cap F'_1 \cap \cdots \cap F'_4 \cap G.$$

The correctness of the above intersection can be checked by Singular or Macaulay2. It remains an open problem to find a primary decomposition for the ideal of adjacent $2 \times 2$ minors for larger sizes. We do not even have a reasonable conjecture. Things seem even more difficult for adjacent $k \times k$ minors. Do you have a suggestion as to how Lemma 46 might generalize?
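Returning to the $2 \times n$ case, the counting argument in the proof of Theorem 45 can be checked by brute force for small $n$. The Python sketch below enumerates $F(n)$, evaluates the degree formula for $P_S$ from that proof, and counts the fibres of the map $\phi$. The function names are our own. For $n = 5$ the degrees come out as 5, 3, 4, 3, 1, which sum to $16 = 2^4$.

```python
from itertools import combinations

def fibonacci(n):
    """f(1) = f(2) = 1 and f(n) = f(n-1) + f(n-2)."""
    a, b = 1, 1
    for _ in range(n - 2):
        a, b = b, a + b
    return b

def F(n):
    """Subsets of {2, ..., n-1} without two consecutive integers, as sorted tuples."""
    ground = range(2, n)
    return [S for r in range(len(ground) + 1) for S in combinations(ground, r)
            if all(t - s >= 2 for s, t in zip(S, S[1:]))]

def degree_P(S, n):
    """Degree of P_S: (i1 - 1)(i2 - i1 - 1)...(ir - i_{r-1} - 1)(n - ir), and n for S empty."""
    if not S:
        return n
    d = (S[0] - 1) * (n - S[-1])
    for s, t in zip(S, S[1:]):
        d *= t - s - 1
    return d

def phi(J):
    """The map {j1 < ... < jr} -> {j_{r-1}, j_{r-3}, ...} from the proof."""
    J = sorted(J)
    return tuple(sorted(J[-2::-2])) if len(J) >= 2 else ()

def check(n):
    """(|F(n)| = f(n), fibre sizes equal degrees, degrees sum to 2^(n-1))."""
    family = F(n)
    fibres = {S: 0 for S in family}
    ground = range(2, n + 1)
    for r in range(len(ground) + 1):
        for J in combinations(ground, r):
            fibres[phi(J)] += 1
    return (len(family) == fibonacci(n),
            all(fibres[S] == degree_P(S, n) for S in family),
            sum(degree_P(S, n) for S in family) == 2 ** (n - 1))

print([degree_P(S, 5) for S in F(5)])            # [5, 3, 4, 3, 1]
print(all(all(check(n)) for n in range(2, 12)))  # True
```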
5.4
Permanental Ideals
The permanent of an $n \times n$ matrix is the sum over all its $n!$ diagonal products. The permanent looks just like the determinant, except that every minus sign in the expansion is replaced by a plus sign. For instance, the permanent of a $3 \times 3$ matrix equals
$$\mathrm{per}\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = aei + afh + bfg + bdi + cdh + ceg. \qquad (35)$$
In this section we discuss the following problem: What does it mean for an $m \times n$ matrix to have all its $k \times k$ subpermanents vanish? As before, we fix an $m \times n$ matrix of indeterminates $X = (x_{i,j})$ and let $\mathbb{Q}[X]$ denote the polynomial ring in these $mn$ unknowns. Let $\mathrm{Per}_{k,m,n}$ denote the ideal in $\mathbb{Q}[X]$ generated by all $k \times k$ subpermanents of $X$. Thus $\mathrm{Per}_{k,m,n}$ represents a system of $\binom{m}{k}\binom{n}{k}$ polynomial equations of degree $k$ in $mn$ unknowns.
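Formula (35) and the count of generators can be checked numerically. The sketch below computes a permanent straight from its definition as a sum over all $n!$ permutations, without signs. It also lists all $k \times k$ subpermanents of an $m \times n$ matrix. Substituting the numbers $1, \ldots, 9$ for $a, \ldots, i$ is an arbitrary choice for illustration. The code assumes Python 3.8 or later, which provides `math.prod` and `math.comb`.

```python
from itertools import combinations, permutations
from math import comb, prod

def permanent(M):
    """Sum of all n! diagonal products: the determinant expansion without signs."""
    n = len(M)
    return sum(prod(M[r][s[r]] for r in range(n)) for s in permutations(range(n)))

def subpermanents(M, k):
    """All k x k subpermanents of M, one for each choice of k rows and k columns."""
    m, n = len(M), len(M[0])
    return [permanent([[M[r][c] for c in cols] for r in rows])
            for rows in combinations(range(m), k)
            for cols in combinations(range(n), k)]

a, b, c, d, e, f, g, h, i = range(1, 10)
M = [[a, b, c], [d, e, f], [g, h, i]]
assert permanent(M) == a*e*i + a*f*h + b*f*g + b*d*i + c*d*h + c*e*g   # formula (35)
print(permanent(M))                                        # 450
print(len(subpermanents(M, 2)), comb(3, 2) * comb(3, 2))   # 9 9
```

The nine $2 \times 2$ subpermanents counted here are the nine quadrics that generate $\mathrm{Per}_{2,3,3}$ in the Macaulay2 session below.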
As our first example consider the three $2 \times 2$ permanents in a $2 \times 3$ matrix:
$$\mathrm{Per}_{2,2,3} = \langle x_{11}x_{22} + x_{12}x_{21},\ x_{11}x_{23} + x_{13}x_{21},\ x_{12}x_{23} + x_{13}x_{22} \rangle.$$
The generators are not a Gröbner basis for any term order. If we pick a term order which selects the underlined leading monomials then the Gröbner basis consists of the three generators together with two square-free monomials:
$$x_{13}x_{21}x_{22} \quad \text{and} \quad x_{12}x_{13}x_{21}.$$
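Proposition 39 tells us that $\mathrm{Per}_{2,2,3}$ is radical. It is also a complete intersection and hence the intersection of prime ideals of codimension three. We find
$$\mathrm{Per}_{2,2,3} = \langle x_{11}, x_{12}, x_{13} \rangle \cap \langle x_{21}, x_{22}, x_{23} \rangle \cap \langle x_{11}x_{22} + x_{12}x_{21}, x_{13}, x_{23} \rangle \cap \langle x_{11}x_{23} + x_{13}x_{21}, x_{12}, x_{22} \rangle \cap \langle x_{12}x_{23} + x_{13}x_{22}, x_{11}, x_{21} \rangle.$$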
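The two square-free monomials in the Gröbner basis of $\mathrm{Per}_{2,2,3}$ do lie in the ideal. Write $p_1, p_2, p_3$ for the three generators in the order displayed above. Then
$$2\,x_{13}x_{21}x_{22} = x_{21}p_3 - x_{23}p_1 + x_{22}p_2, \qquad 2\,x_{12}x_{13}x_{21} = x_{13}p_1 + x_{12}p_2 - x_{11}p_3.$$
The Python sketch below verifies these two identities with exact rational arithmetic. The dictionary representation of polynomials is our own choice for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def poly(*terms):
    """Polynomial {sorted tuple of variable names: coefficient} from (coeff, var, var, ...) terms."""
    p = defaultdict(Fraction)
    for coeff, *names in terms:
        p[tuple(sorted(names))] += Fraction(coeff)
    return {m: c for m, c in p.items() if c != 0}

def add(*polys):
    r = defaultdict(Fraction)
    for p in polys:
        for m, c in p.items():
            r[m] += c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(Fraction)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(sorted(m1 + m2))] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}

def scale(c, p):
    return {m: Fraction(c) * v for m, v in p.items()}

def x(name):
    return poly((1, name))

p1 = poly((1, "x11", "x22"), (1, "x12", "x21"))
p2 = poly((1, "x11", "x23"), (1, "x13", "x21"))
p3 = poly((1, "x12", "x23"), (1, "x13", "x22"))

half = Fraction(1, 2)
rhs1 = scale(half, add(mul(x("x21"), p3), scale(-1, mul(x("x23"), p1)), mul(x("x22"), p2)))
rhs2 = scale(half, add(mul(x("x13"), p1), mul(x("x12"), p2), scale(-1, mul(x("x11"), p3))))
print(rhs1 == poly((1, "x13", "x21", "x22")))   # True
print(rhs2 == poly((1, "x12", "x13", "x21")))   # True
```

This only certifies ideal membership. Whether these monomials appear in a reduced Gröbner basis depends on the chosen term order.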
However, if $m, n \ge 3$ then $\mathrm{Per}_{2,m,n}$ is not a radical ideal. Let us examine the $3 \times 3$ case in Macaulay2 with variable names as in the $3 \times 3$ matrix (35).

i1 : R = QQ[a,b,c,d,e,f,g,h,i];
i2 : Per233 = ideal( a*e+b*d, a*f+c*d, b*f+c*e,
                     a*h+b*g, a*i+c*g, b*i+c*h,
                     d*h+e*g, d*i+f*g, e*i+f*h);
i3 : gb Per233
o3 = | fh+ei ch+bi fg+di eg+dh cg+ai bg+ah ce+bf cd+af bd+ae dhi ahi bfi bei dei afi aeh adi adh abi aef abf aei2 ae2i a2ei |

This Gröbner basis shows us that $\mathrm{Per}_{2,3,3}$ is not a radical ideal: it contains $aei^2$, so $(aei)^2$ lies in the ideal, while $aei$ itself does not. We compute the radical using the built-in command:

i4 : time radical Per233
     -- used 53.18 seconds
o4 = ideal (f*h + e*i, c*h + b*i, f*g + d*i, e*g + d*h, c*g + a*i,
            b*g + a*h, c*e + b*f, c*d + a*f, b*d + a*e, a*e*i)

The radical has a minimal generator of degree three, while the original ideal was generated by quadrics. We next compute the associated primes. There are 16 such primes; the first 15 are minimal and the last one is embedded:
i5 : time ass Per233
     -- used 11.65 seconds
o5 = { ideal (g, f, e, d, a, c*h + b*i),
       ideal (i, h, g, d, a, c*e + b*f),
       ideal (i, h, g, e, b, c*d + a*f),
       ideal (h, f, e, d, b, c*g + a*i),
       ideal (i, f, e, d, c, b*g + a*h),
       ideal (i, h, g, f, c, b*d + a*e),
       ideal (i, f, c, b, a, e*g + d*h),
       ideal (h, e, c, b, a, f*g + d*i),
       ideal (g, d, c, b, a, f*h + e*i),
       ideal (h, g, e, d, b, a), ideal (i, h, g, f, e, d),
       ideal (i, g, f, d, c, a), ideal (f, e, d, c, b, a),
       ideal (i, h, g, c, b, a), ideal (i, h, f, e, c, b),
       ideal (i, h, g, f, e, d, c, b, a) }
i6 : time intersect ass Per233
     -- used 0.24 seconds
o6 = ideal (f*h + e*i, c*h + b*i, f*g + d*i, e*g + d*h, c*g + a*i,
            b*g + a*h, c*e + b*f, c*d + a*f, b*d + a*e, a*e*i)

Note that the lines o4 and o6 have the same output by equation (31). However, for this example the obvious command radical is slower than the non-obvious command intersect ass. The lesson to be learned is that many roads lead to Rome, and one should always be prepared to apply one's full range of mathematical know-how when trying to crack a polynomial system.

The ideals of $2 \times 2$ subpermanents of matrices of any size were studied in full detail by Laubenbacher and Swanson (2000), who gave explicit descriptions of Gröbner bases, associated primes, and a primary decomposition of $\mathrm{Per}_{2,m,n}$. The previous Macaulay2 session offers a glimpse of their results. It would be very interesting to try to extend this work to $3 \times 3$ subpermanents and beyond. How many associated primes does the ideal $\mathrm{Per}_{k,m,n}$ have?

We present one more open problem about permanental ideals. Consider the $n \times 2n$ matrix $[X\ X]$ which is obtained by concatenating our matrix of unknowns with itself. We write $\mathrm{Per}_n[X\ X]$ for the ideal of $n \times n$ subpermanents of this $n \times 2n$ matrix. A conjecture on graph polynomials due to Tarsi suggests that every matrix in the variety of $\mathrm{Per}_n[X\ X]$ should be singular. We offer the following refinement of Tarsi's conjecture.

Conjecture 48. The $n$th power of the determinant of $X$ lies in $\mathrm{Per}_n[X\ X]$.

For $n = 2$ this conjecture is easy to check. Indeed, the ideal
$$\mathrm{Per}_2\begin{pmatrix} x_{11} & x_{12} & x_{11} & x_{12} \\ x_{21} & x_{22} & x_{21} & x_{22} \end{pmatrix} = \langle x_{11}x_{22} + x_{12}x_{21},\ x_{11}x_{21},\ x_{12}x_{22} \rangle$$
contains $(x_{11}x_{22} - x_{12}x_{21})^2$ but not $x_{11}x_{22} - x_{12}x_{21}$. But already the next two cases $n = 3$ and $n = 4$ are quite interesting to work on.
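The case $n = 2$ of Conjecture 48 can be confirmed mechanically. The sketch below uses plain Python with a dictionary representation of polynomials chosen for illustration. It forms all six $2 \times 2$ subpermanents of $[X\ X]$ and checks the identity $\det(X)^2 = p^2 - (2x_{11}x_{21})(2x_{12}x_{22})$ with $p = x_{11}x_{22} + x_{12}x_{21}$. It also checks that every generator has equal coefficients on $x_{11}x_{22}$ and $x_{12}x_{21}$. Since the degree-2 part of the ideal is spanned by the generators, this rules out $\det(X)$ itself.

```python
from collections import defaultdict
from itertools import combinations

def var(name):
    return {(name,): 1}

def add(p, q, sign=1):
    r = defaultdict(int, p)
    for m, c in q.items():
        r[m] += sign * c
    return {m: c for m, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(sorted(m1 + m2))] += c1 * c2
    return {m: c for m, c in r.items() if c != 0}

X = [[var("x11"), var("x12")], [var("x21"), var("x22")]]
XX = [row + row for row in X]   # the 2 x 4 matrix [X X]

# All 2 x 2 subpermanents of [X X], indexed by pairs of columns.
generators = {
    (c1, c2): add(mul(XX[0][c1], XX[1][c2]), mul(XX[0][c2], XX[1][c1]))
    for c1, c2 in combinations(range(4), 2)
}
p, two_q1, two_q2 = generators[(0, 1)], generators[(0, 2)], generators[(1, 3)]
det = add(mul(X[0][0], X[1][1]), mul(X[0][1], X[1][0]), sign=-1)

print(mul(det, det) == add(mul(p, p), mul(two_q1, two_q2), sign=-1))   # True

u, v = ("x11", "x22"), ("x12", "x21")
print(all(g.get(u, 0) == g.get(v, 0) for g in generators.values()), det[u], det[v])  # True 1 -1
```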
5.5
Exercises
1. If $P$ is an associated prime of $I$, how can one find a witness $f$ for $P$ in $I$?
2. Let $P$ be a prime ideal and $m$ a positive integer. Show that $P$ is a minimal prime of $P^m$. Give an example where $P^m$ is not primary.
3. For an ideal $I$ of codimension $c$ we define $\mathrm{top}(I)$ as the intersection of all primary components $Q_i$ of codimension $c$. Explain how one computes $\mathrm{top}(I)$ from $I$ in Macaulay2 or Singular. Compute $\mathrm{top}(I)$ for
(a) $I = \langle x_1x_2x_3,\ x_4x_5x_6,\ x_1^2x_2^3,\ x_3^5x_4^7,\ x_5^{11}x_6^{13} \rangle$,
(b) $I = \langle x_1x_2 + x_3x_4 + x_5x_6,\ x_1x_3 + x_4x_5 + x_6x_2,\ x_1x_4 + x_5x_6 + x_2x_3,\ x_1x_5 + x_6x_2 + x_3x_4,\ x_1x_6 + x_2x_3 + x_4x_5 \rangle$,
(c) $I = \langle x_1^2 + x_2x_3 - 1,\ x_2^2 + x_3x_4 - 1,\ x_3^2 + x_4x_5 - 1,\ x_4^2 + x_5x_6 - 1,\ x_5^2 + x_6x_1 - 1,\ x_6^2 + x_1x_2 - 1 \rangle$.
4. What happens if you apply the formula (33) to an embedded prime $P_i$?
5. Prove that $P$ is associated to $I$ if and only if $I : (I : P) = P$.
6. Decompose the two adjacent-minor ideals $A_{2,3,4}$ and $A_{3,3,5}$.
7. Decompose the permanental ideals $\mathrm{Per}_{2,4,4}$, $\mathrm{Per}_{3,3,4}$ and $\mathrm{Per}_{3,3,5}$.
8. Compute the primary decomposition of $\mathrm{Per}_3[X\ X]$ in Singular.
9. Prove Conjecture 48 for $n = 4$.
Polynomial Systems in Economics
The computation of equilibria in economics leads to systems of polynomial equations. In this lecture we discuss the equations satisfied by the Nash equilibria of an $n$-person game. For $n = 2$ these equations are linear but for $n > 2$ they are multilinear. We derive these multilinear equations, we present algebraic techniques for solving them, and we give a sharp bound for the number of totally mixed Nash equilibria. This bound is due to McKelvey & McLennan (1997) who derived it from Bernstein's Theorem. In Section 6.2 we offer a detailed analysis of the Three Man Poker Game which appeared in the original paper of Nash (1951) and leads to solving a quadratic equation.
6.1
Three-Person Games with Two Pure Strategies
We present the scenario of a non-cooperative game by means of a small example. Our notation is consistent with that used by Nash (1951). There are three players whose names are Adam, Bob and Carl. Each player can choose from two pure strategies, say "buy stock # 1" or "buy stock # 2". He can mix them by allocating a probability to each pure strategy. We write $a_1$ for the probability which Adam allocates to strategy 1, $a_2$ for the probability which Adam allocates to strategy 2, $b_1$ for the probability which Bob allocates to strategy 1, etc. The six probabilities $a_1, a_2, b_1, b_2, c_1, c_2$ are our decision variables. The vector $(a_1, a_2)$ is Adam's strategy, $(b_1, b_2)$ is Bob's strategy, and $(c_1, c_2)$ is Carl's strategy. We use the term "strategy" for what is called "mixed strategy" in the literature. The strategies of our three players satisfy
$$a_1, a_2, b_1, b_2, c_1, c_2 \ge 0 \quad \text{and} \quad a_1 + a_2 = b_1 + b_2 = c_1 + c_2 = 1. \qquad (36)$$
The data representing a particular game are three payoff matrices $A = (A_{ijk})$, $B = (B_{ijk})$, and $C = (C_{ijk})$. Here $i, j, k$ run over $\{1, 2\}$ so that each of $A$, $B$, and $C$ is a three-dimensional matrix of format $2 \times 2 \times 2$. Thus our game is given by $24 = 3 \cdot 2 \cdot 2 \cdot 2$ rational numbers $A_{ijk}, B_{ijk}, C_{ijk}$. All of these numbers are known to all three players. The game is for Adam, Bob and Carl to select their strategies. They will then receive the following payoffs:
$$\begin{aligned}
\text{Adam's payoff} &= \sum_{i,j,k=1}^{2} A_{ijk}\, a_i b_j c_k,\\
\text{Bob's payoff} &= \sum_{i,j,k=1}^{2} B_{ijk}\, a_i b_j c_k,\\
\text{Carl's payoff} &= \sum_{i,j,k=1}^{2} C_{ijk}\, a_i b_j c_k.
\end{aligned}$$
A vector $(a_1, a_2, b_1, b_2, c_1, c_2)$ satisfying (36) is called a Nash equilibrium if no player can increase his payoff by changing his strategy while the other two players keep their strategies fixed. In other words, the following condition holds: For all pairs $(u_1, u_2)$ with $u_1, u_2 \ge 0$ and $u_1 + u_2 = 1$ we have
$$\begin{aligned}
\sum_{i,j,k=1}^{2} A_{ijk}\, a_i b_j c_k &\ge \sum_{i,j,k=1}^{2} A_{ijk}\, u_i b_j c_k,\\
\sum_{i,j,k=1}^{2} B_{ijk}\, a_i b_j c_k &\ge \sum_{i,j,k=1}^{2} B_{ijk}\, a_i u_j c_k,\\
\sum_{i,j,k=1}^{2} C_{ijk}\, a_i b_j c_k &\ge \sum_{i,j,k=1}^{2} C_{ijk}\, a_i b_j u_k.
\end{aligned}$$
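Given fixed strategies chosen by Adam, Bob and Carl, each of the expressions on the right-hand side is a linear function in $(u_1, u_2)$. Therefore the universal quantifier above can be replaced by "For $(u_1, u_2) \in \{(1, 0), (0, 1)\}$ we have". Introducing three new variables $\alpha, \beta, \gamma$ for Adam's, Bob's and Carl's payoff, the conditions for a Nash equilibrium can therefore be written as follows:
$$\begin{aligned}
\alpha &= a_1 \sum_{j,k=1}^{2} A_{1jk}\, b_j c_k + a_2 \sum_{j,k=1}^{2} A_{2jk}\, b_j c_k,\\
\alpha &\ge \sum_{j,k=1}^{2} A_{1jk}\, b_j c_k \quad\text{and}\quad \alpha \ge \sum_{j,k=1}^{2} A_{2jk}\, b_j c_k,\\
\beta &= b_1 \sum_{i,k=1}^{2} B_{i1k}\, a_i c_k + b_2 \sum_{i,k=1}^{2} B_{i2k}\, a_i c_k,\\
\beta &\ge \sum_{i,k=1}^{2} B_{i1k}\, a_i c_k \quad\text{and}\quad \beta \ge \sum_{i,k=1}^{2} B_{i2k}\, a_i c_k,\\
\gamma &= c_1 \sum_{i,j=1}^{2} C_{ij1}\, a_i b_j + c_2 \sum_{i,j=1}^{2} C_{ij2}\, a_i b_j,\\
\gamma &\ge \sum_{i,j=1}^{2} C_{ij1}\, a_i b_j \quad\text{and}\quad \gamma \ge \sum_{i,j=1}^{2} C_{ij2}\, a_i b_j.
\end{aligned}$$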
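Since $a_1 + a_2 = 1$ and $a_1 \ge 0$ and $a_2 \ge 0$, the first row expresses $\alpha$ as a convex combination of two numbers which, by the second row, are both at most $\alpha$; hence each of them that carries positive weight must equal $\alpha$. The first two rows therefore imply:
$$ a_1 \cdot \Bigl(\alpha - \sum_{j,k=1}^{2} A_{1jk}\, b_j c_k\Bigr) \;=\; a_2 \cdot \Bigl(\alpha - \sum_{j,k=1}^{2} A_{2jk}\, b_j c_k\Bigr) \;=\; 0. \tag{37} $$
Similarly, we derive the following equations:
$$ b_1 \cdot \Bigl(\beta - \sum_{i,k=1}^{2} B_{i1k}\, a_i c_k\Bigr) \;=\; b_2 \cdot \Bigl(\beta - \sum_{i,k=1}^{2} B_{i2k}\, a_i c_k\Bigr) \;=\; 0, \tag{38} $$
$$ c_1 \cdot \Bigl(\gamma - \sum_{i,j=1}^{2} C_{ij1}\, a_i b_j\Bigr) \;=\; c_2 \cdot \Bigl(\gamma - \sum_{i,j=1}^{2} C_{ij2}\, a_i b_j\Bigr) \;=\; 0. \tag{39} $$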
We regard (37), (38) and (39) as a system of polynomial equations in the nine unknowns $a_1, a_2, b_1, b_2, c_1, c_2, \alpha, \beta, \gamma$. Our discussion shows the following:
Proposition 49. The set of Nash equilibria of the game given by the payoff matrices $A, B, C$ is the set of solutions $(a_1, \ldots, c_2, \alpha, \beta, \gamma)$ to (36), (37), (38) and (39) which make the six expressions in the large parentheses nonnegative.

For practical computations it is convenient to change variables as follows:
$$ a_1 = a,\quad a_2 = 1 - a,\quad b_1 = b,\quad b_2 = 1 - b,\quad c_1 = c,\quad c_2 = 1 - c. $$

Corollary 50. The set of Nash equilibria of the game given by the payoff matrices $A, B, C$ consists of the common zeros of the following six polynomials, subject to $a, 1-a, b, 1-b, c, 1-c$ and all parenthesized expressions being nonnegative:
$$\begin{aligned}
& a \cdot \bigl(\alpha - A_{111}\,bc - A_{112}\,b(1-c) - A_{121}(1-b)c - A_{122}(1-b)(1-c)\bigr),\\
& (1-a) \cdot \bigl(\alpha - A_{211}\,bc - A_{212}\,b(1-c) - A_{221}(1-b)c - A_{222}(1-b)(1-c)\bigr),\\
& b \cdot \bigl(\beta - B_{111}\,ac - B_{112}\,a(1-c) - B_{211}(1-a)c - B_{212}(1-a)(1-c)\bigr),\\
& (1-b) \cdot \bigl(\beta - B_{121}\,ac - B_{122}\,a(1-c) - B_{221}(1-a)c - B_{222}(1-a)(1-c)\bigr),\\
& c \cdot \bigl(\gamma - C_{111}\,ab - C_{121}\,a(1-b) - C_{211}(1-a)b - C_{221}(1-a)(1-b)\bigr),\\
& (1-c) \cdot \bigl(\gamma - C_{112}\,ab - C_{122}\,a(1-b) - C_{212}(1-a)b - C_{222}(1-a)(1-b)\bigr).
\end{aligned}$$
A Nash equilibrium is said to be totally mixed if all six probabilities $a, 1-a, b, 1-b, c, 1-c$ are strictly positive. If we are only interested in totally mixed equilibria then we can erase the left factors in the six polynomials and eliminate $\alpha, \beta, \gamma$ by subtracting the second polynomial from the first, the fourth polynomial from the third, and the last polynomial from the fifth.

Corollary 51. The set of fully mixed Nash equilibria of the game $(A, B, C)$ consists of the common zeros $(a, b, c) \in (0, 1)^3$ of three bilinear polynomials:
$$\begin{aligned}
&(A_{111} - A_{112} - A_{121} + A_{122} - A_{211} + A_{212} + A_{221} - A_{222})\, bc + (A_{122} - A_{222})\\
&\qquad + (A_{112} - A_{122} - A_{212} + A_{222})\, b + (A_{121} - A_{122} - A_{221} + A_{222})\, c,\\
&(B_{111} - B_{112} + B_{122} - B_{121} - B_{211} + B_{212} - B_{222} + B_{221})\, ac + (B_{212} - B_{222})\\
&\qquad + (B_{211} - B_{212} - B_{221} + B_{222})\, c + (B_{112} - B_{122} - B_{212} + B_{222})\, a,\\
&(C_{111} - C_{112} + C_{122} - C_{121} - C_{211} + C_{212} - C_{222} + C_{221})\, ab + (C_{221} - C_{222})\\
&\qquad + (C_{121} - C_{221} - C_{122} + C_{222})\, a + (C_{222} - C_{221} - C_{212} + C_{211})\, b.
\end{aligned}$$
These three equations have two complex solutions, for general payoff matrices $A, B, C$. Indeed, the mixed volume of the three Newton squares equals 2. In the next section we give an example where both roots are real and lie in the open cube $(0, 1)^3$, meaning there are two fully mixed Nash equilibria.
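The following Python sketch (our own illustration, not code from these notes; all names are hypothetical) shows how the three bilinear polynomials of Corollary 51 collapse to one quadratic: the first polynomial is solved for $b$ and the third for $a$, each as a ratio of linear polynomials in $c$, and both are substituted into the second before clearing denominators. It assumes generic payoffs, so that no denominator vanishes identically. As a usage example it enters the game (40) of the next section in the indexing of Corollary 50, reading the payoffs off the six equations displayed after (40).

```python
import math


def corollary51_coefficients(A, B, C):
    """Coefficients of the three bilinear polynomials of Corollary 51.

    A, B, C map index triples (i, j, k) in {1, 2}^3 to payoffs. Returns (p, q, r):
    p holds the coefficients of (bc, 1, b, c) in the first polynomial,
    q those of (ac, 1, c, a) in the second, r those of (ab, 1, a, b) in the third.
    """
    p = (A[1, 1, 1] - A[1, 1, 2] - A[1, 2, 1] + A[1, 2, 2]
         - A[2, 1, 1] + A[2, 1, 2] + A[2, 2, 1] - A[2, 2, 2],
         A[1, 2, 2] - A[2, 2, 2],
         A[1, 1, 2] - A[1, 2, 2] - A[2, 1, 2] + A[2, 2, 2],
         A[1, 2, 1] - A[1, 2, 2] - A[2, 2, 1] + A[2, 2, 2])
    q = (B[1, 1, 1] - B[1, 1, 2] + B[1, 2, 2] - B[1, 2, 1]
         - B[2, 1, 1] + B[2, 1, 2] - B[2, 2, 2] + B[2, 2, 1],
         B[2, 1, 2] - B[2, 2, 2],
         B[2, 1, 1] - B[2, 1, 2] - B[2, 2, 1] + B[2, 2, 2],
         B[1, 1, 2] - B[1, 2, 2] - B[2, 1, 2] + B[2, 2, 2])
    r = (C[1, 1, 1] - C[1, 1, 2] + C[1, 2, 2] - C[1, 2, 1]
         - C[2, 1, 1] + C[2, 1, 2] - C[2, 2, 2] + C[2, 2, 1],
         C[2, 2, 1] - C[2, 2, 2],
         C[1, 2, 1] - C[2, 2, 1] - C[1, 2, 2] + C[2, 2, 2],
         C[2, 2, 2] - C[2, 2, 1] - C[2, 1, 2] + C[2, 1, 1])
    return p, q, r


def mixed_equilibria(A, B, C):
    """Return (quadratic in c, list of real solutions (a, b, c))."""
    p, q, r = corollary51_coefficients(A, B, C)
    # Linear polynomials in c are stored as (constant term, coefficient of c).
    nb, db = (-p[1], -p[3]), (p[2], p[0])                 # b = nb / db
    na = (-(r[1] * db[0] + r[3] * nb[0]), -(r[1] * db[1] + r[3] * nb[1]))
    da = (r[0] * nb[0] + r[2] * db[0], r[0] * nb[1] + r[2] * db[1])  # a = na / da
    # Second polynomial multiplied by da: q0*c*na + q1*da + q2*c*da + q3*na = 0.
    quad = (q[0] * na[1] + q[2] * da[1],
            q[0] * na[0] + q[1] * da[1] + q[2] * da[0] + q[3] * na[1],
            q[1] * da[0] + q[3] * na[0])
    k2, k1, k0 = quad
    disc = k1 * k1 - 4 * k2 * k0
    sols = []
    if k2 != 0 and disc >= 0:                             # keep only real roots
        for s in (-1, 1):
            c = (-k1 + s * math.sqrt(disc)) / (2 * k2)
            b = (nb[0] + nb[1] * c) / (db[0] + db[1] * c)
            a = (na[0] + na[1] * c) / (da[0] + da[1] * c)
            sols.append((a, b, c))
    return quad, sols


# Game (40), indexed as in Corollary 50.
A40 = {(1, 1, 1): 0, (1, 1, 2): 6, (1, 2, 1): 11, (1, 2, 2): 1,
       (2, 1, 1): 6, (2, 1, 2): 4, (2, 2, 1): 6, (2, 2, 2): 8}
B40 = {(1, 1, 1): 12, (1, 1, 2): 7, (2, 1, 1): 6, (2, 1, 2): 8,
       (1, 2, 1): 10, (1, 2, 2): 12, (2, 2, 1): 8, (2, 2, 2): 1}
C40 = {(1, 1, 1): 11, (1, 2, 1): 11, (2, 1, 1): 3, (2, 2, 1): 3,
       (1, 1, 2): 0, (1, 2, 2): 14, (2, 1, 2): 2, (2, 2, 2): 7}

quad, sols = mixed_equilibria(A40, B40, C40)
print(quad)        # (832, -1288, 492)
```

For this game the quadratic is $4 \cdot (208c^2 - 322c + 123)$, and its two roots give two points in the open cube, as claimed.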
6.2
Two Numerical Examples Involving Square Roots
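Consider the game described in the previous section with the payoff matrices
$$
\begin{array}{r|cccccccc}
 & 111 & 112 & 121 & 122 & 211 & 212 & 221 & 222 \\ \hline
A = & 0 & 6 & 11 & 1 & 6 & 4 & 6 & 8 \\
B = & 12 & 7 & 6 & 8 & 10 & 12 & 8 & 1 \\
C = & 11 & 11 & 3 & 3 & 0 & 14 & 2 & 7
\end{array} \tag{40}
$$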
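For instance, $B_{112} = 12$. The equations in Corollary 50 are
$$\begin{aligned}
a \cdot \bigl(\alpha - 6b(1-c) - 11(1-b)c - (1-b)(1-c)\bigr) &= 0,\\
(1-a) \cdot \bigl(\alpha - 6bc - 4b(1-c) - 6(1-b)c - 8(1-b)(1-c)\bigr) &= 0,\\
b \cdot \bigl(\beta - 12ac - 7a(1-c) - 6(1-a)c - 8(1-a)(1-c)\bigr) &= 0,\\
(1-b) \cdot \bigl(\beta - 10ac - 12a(1-c) - 8(1-a)c - (1-a)(1-c)\bigr) &= 0,\\
c \cdot \bigl(\gamma - 11ab - 11a(1-b) - 3(1-a)b - 3(1-a)(1-b)\bigr) &= 0,\\
(1-c) \cdot \bigl(\gamma - 14a(1-b) - 2(1-a)b - 7(1-a)(1-b)\bigr) &= 0.
\end{aligned}$$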
These equations are radical and they have 16 solutions, all of which are real. Namely, a vector $(a, b, c, \alpha, \beta, \gamma)$ is a solution if and only if it lies in the set
$$\begin{array}{l}
(7/12,\ 7/9,\ 0,\ 44/9,\ 89/12,\ 28/9),\\
(1/2,\ 5/11,\ 1,\ 6,\ 9,\ 7)^{*},\\
(4,\ 0,\ 7/12,\ 41/6,\ 337/12,\ 35),\\
(-1/10,\ 1,\ 1/4,\ 9/2,\ 297/40,\ 11/5),\\
(0,\ 4/5,\ 7/9,\ 86/15,\ 58/9,\ 3)^{*},\\
(1,\ 3/14,\ 5/7,\ 663/98,\ 74/7,\ 11)^{*},\\
(0,\ 0,\ 0,\ 8,\ 1,\ 7),\\
(0,\ 0,\ 1,\ 6,\ 8,\ 3),\\
(0,\ 1,\ 0,\ 4,\ 8,\ 2),\\
(0,\ 1,\ 1,\ 6,\ 6,\ 3),\\
(1,\ 0,\ 0,\ 1,\ 12,\ 14),\\
(1,\ 0,\ 1,\ 11,\ 10,\ 11),\\
(1,\ 1,\ 0,\ 6,\ 7,\ 0),\\
(1,\ 1,\ 1,\ 0,\ 12,\ 11),\\
(0.8058,\ 0.2607,\ 0.6858,\ 6.3008,\ 9.6909,\ 9.4465)^{*},\\
(0.4234,\ 0.4059,\ 0.8623,\ 6.0518,\ 8.4075,\ 6.3869)^{*}.
\end{array}$$
However, some of these solution vectors are not Nash equilibria. For instance, the third vector has $a = 4$, which violates the non-negativity of $(1 - a)$. The first vector $(a, b, c, \alpha, \beta, \gamma) = (7/12, 7/9, 0, 44/9, 89/12, 28/9)$ violates the non-negativity of $\bigl(\gamma - 11ab - 11a(1-b) - 3(1-a)b - 3(1-a)(1-b)\bigr)$, and so on. This process eliminates 11 of the 16 candidate vectors. The remaining five are marked with a star. We conclude: The game (40) has five isolated Nash equilibria. Of these five, the last two are fully mixed Nash equilibria.
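The elimination of 11 candidates can be checked mechanically. The Python sketch below (ours, with hypothetical names) lists the 16 solution vectors, using exact fractions for the 14 rational ones and computing the two irrational ones from the quadratic $208c^2 - 322c + 123$ and the linear Gröbner basis elements given in the next paragraph (whose $X, Y, Z$ are $\alpha, \beta, \gamma$). It then keeps the vectors whose six left factors and six parenthesized expressions are all nonnegative, as Proposition 49 requires. The tolerance `1e-9` is an arbitrary choice for the floating-point entries.

```python
import math
from fractions import Fraction as F


def parenthesized(v):
    """The six parenthesized expressions of Corollary 50 for the game (40)."""
    a, b, c, al, be, ga = v
    return (al - 6*b*(1 - c) - 11*(1 - b)*c - (1 - b)*(1 - c),
            al - 6*b*c - 4*b*(1 - c) - 6*(1 - b)*c - 8*(1 - b)*(1 - c),
            be - 12*a*c - 7*a*(1 - c) - 6*(1 - a)*c - 8*(1 - a)*(1 - c),
            be - 10*a*c - 12*a*(1 - c) - 8*(1 - a)*c - (1 - a)*(1 - c),
            ga - 11*a*b - 11*a*(1 - b) - 3*(1 - a)*b - 3*(1 - a)*(1 - b),
            ga - 14*a*(1 - b) - 2*(1 - a)*b - 7*(1 - a)*(1 - b))


def left_factors(v):
    a, b, c = v[:3]
    return (a, 1 - a, b, 1 - b, c, 1 - c)


def is_solution(v, tol=1e-9):
    """All six products (left factor) * (parenthesized expression) vanish."""
    return all(abs(f * g) <= tol for f, g in zip(left_factors(v), parenthesized(v)))


def is_nash(v, tol=1e-9):
    """Proposition 49: a solution whose twelve factors are all nonnegative."""
    return is_solution(v, tol) and all(x >= -tol for x in left_factors(v) + parenthesized(v))


def candidates():
    rational = [
        (F(7, 12), F(7, 9), 0, F(44, 9), F(89, 12), F(28, 9)),
        (F(1, 2), F(5, 11), 1, 6, 9, 7),
        (4, 0, F(7, 12), F(41, 6), F(337, 12), 35),
        (F(-1, 10), 1, F(1, 4), F(9, 2), F(297, 40), F(11, 5)),
        (0, F(4, 5), F(7, 9), F(86, 15), F(58, 9), 3),
        (1, F(3, 14), F(5, 7), F(663, 98), F(74, 7), 11),
        (0, 0, 0, 8, 1, 7), (0, 0, 1, 6, 8, 3), (0, 1, 0, 4, 8, 2), (0, 1, 1, 6, 6, 3),
        (1, 0, 0, 1, 12, 14), (1, 0, 1, 11, 10, 11), (1, 1, 0, 6, 7, 0), (1, 1, 1, 0, 12, 11),
    ]
    irrational = []
    for s in (-1, 1):   # roots of 208c^2 - 322c + 123, then the linear basis elements
        c = (322 + s * math.sqrt(322**2 - 4 * 208 * 123)) / (2 * 208)
        irrational.append(((55 - 52*c) / 24, (832*c - 307) / 1011, c,
                           (7348 - 1426*c) / 1011, (1409 - 698*c) / 96, (64 - 52*c) / 3))
    return rational + irrational


for n, v in enumerate(candidates(), start=1):
    if is_nash(v):
        print(n)          # prints 2, 5, 6, 15, 16 (one per line)
```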
The two fully mixed Nash equilibria can be represented algebraically by extracting a square root. Namely, we first erase the left factors $a, \ldots, (1 - c)$ from the six equations, and thereafter we compute the Gröbner basis:
$$ \bigl\{\, \underline{1011X} + 1426c - 7348,\ \ \underline{96Y} + 698c - 1409,\ \ \underline{3Z} + 52c - 64,\ \ \underline{24a} + 52c - 55,\ \ \underline{1011b} - 832c + 307,\ \ \underline{208c^2} - 322c + 123 \,\bigr\}. $$
Here $X, Y, Z$ stand for $\alpha, \beta, \gamma$. As with all our Gröbner bases, leading terms are underlined. These six equations are easy to solve. The solutions are the last two vectors above.

Our second example is the Three-Man Poker Game discussed in Nash's 1951 paper. This game leads to algebraic equations which can be solved by extracting the square root of 321. The following material was prepared by Ruchira Datta. The game was originally solved by John Nash in collaboration with Lloyd Shapley (1950). This is a greatly simplified version of poker. The cards are of only two kinds, high and low. The three players A, B, and C ante up two chips each to start. Then each player is dealt one card. Starting with player A, each player is given a chance to open, i.e., to place the first bet (two chips are always used to bet). If no one does so, the players retrieve their antes from the pot. Once a player has opened, the other two players are again given a chance to bet, i.e., they may call. Finally, the cards are revealed and those players with the highest cards among those who placed bets share the pot equally.

Once the game is open, one should call if one has a high card and pass if one has a low card. The former is obvious; the latter follows because it might be the strategy of the player who opened the game to open only on a high card. In this case one would definitely lose one's bet as well as the ante. So the only question is whether to open the game. Player C should obviously open if he has a high card. It turns out that player A should never open if he has a low card (this requires proof). Thus player A has two pure strategies: when he has a high card, to open or not to open.
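We denote his probability of opening in this case by $a$. (His subsequent moves, and his moves in case he has a low card, are determined.) Player C also has two pure strategies: when he has a low card, to open or not to open. We denote his probability of opening in this case by $c$. Player B has four pure strategies: for each of his possible cards, to open or not to open. We denote his probability of opening when he has a high card by $d$, and his probability of opening when he has a low card by $e$. It turns out that the equilibrium strategy is totally mixed in these four parameters (this also requires proof, but does not require actually computing the strategy).

Assuming each of the eight possible hands is equally likely, the payoff matrix (where by payoff we mean the expected value of the payoff) contains $48 = 3 \cdot 2 \cdot 4 \cdot 2$ rational entries. As in the examples above, this can be written as a $3 \times 16$ matrix. Here is the left block (column labels beginning with 0):
$$
\begin{array}{r|cccccccc}
 & 0000 & 0001 & 0010 & 0011 & 0100 & 0101 & 0110 & 0111 \\ \hline
A = & -\tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & 0 & -\tfrac{1}{4} & 0 & -\tfrac{1}{4} & \tfrac{1}{4} \\
B = & \tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{4} & 0 & \tfrac{1}{2} & -\tfrac{1}{4} & 0 & -\tfrac{1}{2} \\
C = & 0 & 0 & \tfrac{1}{2} & 0 & -\tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4}
\end{array} \tag{41}
$$
and here is the right block (column labels beginning with 1):
$$
\begin{array}{r|cccccccc}
 & 1000 & 1001 & 1010 & 1011 & 1100 & 1101 & 1110 & 1111 \\ \hline
A = & \tfrac{1}{8} & \tfrac{1}{8} & 0 & -\tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{8} & -\tfrac{3}{8} \\
B = & -\tfrac{1}{4} & -\tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{8} & -\tfrac{7}{8} & \tfrac{1}{8} & -\tfrac{3}{8} \\
C = & \tfrac{1}{8} & \tfrac{1}{8} & \tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{3}{8} & \tfrac{5}{8} & -\tfrac{1}{4} & \tfrac{3}{4}
\end{array} \tag{42}
$$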
(We split the matrix into blocks to fit the page.) Here the indices across the top indicate the pure strategies chosen by the players. If we write $a_0 = a$, $a_1 = 1 - a$, $d_0 = d$, $d_1 = 1 - d$, $e_0 = e$, $e_1 = 1 - e$, $c_0 = c$, and $c_1 = 1 - c$, then for instance $B_{1010}$ is B's payoff when player A does not open on a high card (so $a_1 = 1$), player B does open on a high card (so $d_0 = 1$) and does not open on a low card (so $e_1 = 1$), and player C does open on a low card (so $c_0 = 1$). In general, $X_{ijkl}$ is player X's payoff when $a_i = 1$, $d_j = 1$, $e_k = 1$, and $c_l = 1$. The equation for the expected payoff $\beta$ of player B is
$$ \beta = de \sum_{i,k=0}^{1} B_{i00k}\, a_i c_k + d(1-e) \sum_{i,k=0}^{1} B_{i01k}\, a_i c_k + (1-d)e \sum_{i,k=0}^{1} B_{i10k}\, a_i c_k + (1-d)(1-e) \sum_{i,k=0}^{1} B_{i11k}\, a_i c_k. $$
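The entries of (41) and (42) can be reproduced from the rules stated above. The Python sketch below is our own reading of those rules, not code from the notes: strategy index 0 means "opens"; A never opens on a low card; C always opens on a high card if nobody opened before him; after an opening every other player calls exactly when he holds a high card; each of the eight hands has probability 1/8; and a payoff is the net number of chips won (antes and bets are two chips each). Under these assumptions it returns the columns of the matrix, one label `'ijkl'` at a time.

```python
from fractions import Fraction
from itertools import product


def opener(cards, i, j, k, l):
    """Index of the player who opens, or None. Strategy bit 0 means 'opens'."""
    if cards[0] == 'H' and i == 0:            # A opens only on a high card, and only if i == 0
        return 0
    if (cards[1] == 'H' and j == 0) or (cards[1] == 'L' and k == 0):
        return 1
    if cards[2] == 'H' or l == 0:             # C always opens on a high card
        return 2
    return None


def net_gains(cards, first):
    """Net chips won by A, B, C once `first` has opened: holders of a high card
    call, the others pass, and the best cards among the bettors split the pot."""
    bettors = [p for p in range(3) if p == first or cards[p] == 'H']
    best = 'H' if any(cards[p] == 'H' for p in bettors) else 'L'
    winners = [p for p in bettors if cards[p] == best]
    pot = 3 * 2 + 2 * len(bettors)            # three antes plus the bets
    gains = [Fraction(-2 - (2 if p in bettors else 0)) for p in range(3)]
    for p in winners:
        gains[p] += Fraction(pot, len(winners))
    return gains


def payoff_column(label):
    """Expected payoffs (A, B, C) for the column 'ijkl' of (41) and (42)."""
    i, j, k, l = map(int, label)
    total = [Fraction(0)] * 3
    for cards in product('HL', repeat=3):     # eight equally likely hands
        first = opener(cards, i, j, k, l)
        if first is not None:                 # otherwise the antes are returned
            total = [t + g for t, g in zip(total, net_gains(cards, first))]
    return [t / 8 for t in total]


for n in range(16):
    label = format(n, '04b')
    print(label, [str(x) for x in payoff_column(label)])
```

Every column sums to zero, as it must for a game in which chips only change hands.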
We have a modified version of Corollary 50 with eight polynomials instead of six. The first polynomial becomes:
$$\begin{aligned}
a \cdot \bigl(\alpha &- A_{0000}\,dec - A_{0001}\,de(1-c) - A_{0010}\,d(1-e)c - A_{0011}\,d(1-e)(1-c)\\
&- A_{0100}(1-d)ec - A_{0101}(1-d)e(1-c) - A_{0110}(1-d)(1-e)c - A_{0111}(1-d)(1-e)(1-c)\bigr)
\end{aligned}$$
The second, fifth, and sixth polynomials are modified analogously. The third and fourth polynomials are replaced by four polynomials, the first of which is
$$ d\,e \cdot \bigl(\beta - B_{0000}\,ac - B_{0001}\,a(1-c) - B_{1000}(1-a)c - B_{1001}(1-a)(1-c)\bigr). $$
Again, we can cancel the left factors of all the polynomials since the equilibrium is totally mixed. Eliminating $\alpha$ and $\gamma$ as before gives us the following two trilinear polynomials:
$$\begin{aligned}
&(A_{0000} - A_{0001} - A_{0010} + A_{0011} - A_{0100} + A_{0101} + A_{0110} - A_{0111}\\
&\quad - A_{1000} + A_{1001} + A_{1010} - A_{1011} + A_{1100} - A_{1101} - A_{1110} + A_{1111})\, cde\\
&+ (A_{0010} - A_{0011} - A_{0110} + A_{0111} - A_{1010} + A_{1011} + A_{1110} - A_{1111})\, cd\\
&+ (A_{0100} - A_{0101} - A_{0110} + A_{0111} - A_{1100} + A_{1101} + A_{1110} - A_{1111})\, ce\\
&+ (A_{0001} - A_{0011} - A_{0101} + A_{0111} - A_{1001} + A_{1011} + A_{1101} - A_{1111})\, de\\
&+ (A_{0110} - A_{0111} - A_{1110} + A_{1111})\, c + (A_{0011} - A_{0111} - A_{1011} + A_{1111})\, d\\
&+ (A_{0101} - A_{0111} - A_{1101} + A_{1111})\, e + (A_{0111} - A_{1111})
\end{aligned}$$
and
$$\begin{aligned}
&(C_{0000} - C_{0001} - C_{0010} + C_{0011} - C_{0100} + C_{0101} + C_{0110} - C_{0111}\\
&\quad - C_{1000} + C_{1001} + C_{1010} - C_{1011} + C_{1100} - C_{1101} - C_{1110} + C_{1111})\, ade\\
&+ (C_{0010} - C_{0011} - C_{0110} + C_{0111} - C_{1010} + C_{1011} + C_{1110} - C_{1111})\, ad\\
&+ (C_{0100} - C_{0101} - C_{0110} + C_{0111} - C_{1100} + C_{1101} + C_{1110} - C_{1111})\, ae\\
&+ (C_{1000} - C_{1001} - C_{1010} + C_{1011} - C_{1100} + C_{1101} + C_{1110} - C_{1111})\, de\\
&+ (C_{0110} - C_{0111} - C_{1110} + C_{1111})\, a + (C_{1010} - C_{1011} - C_{1110} + C_{1111})\, d\\
&+ (C_{1100} - C_{1101} - C_{1110} + C_{1111})\, e + (C_{1110} - C_{1111}).
\end{aligned}$$
(For each term, take the bitstring that indexes its coefficient and mask off the bits corresponding to variables that don't occur in its monomial, which will always be one; then the parity of the resulting bitstring gives the sign of the term.) There are four polynomials in $\beta$; subtracting each of the others from the first gives the following three bilinear polynomials:
$$\begin{aligned}
&(B_{0000} - B_{0001} - B_{0010} + B_{0011} - B_{1000} + B_{1001} + B_{1010} - B_{1011})\, ac + (B_{1001} - B_{1011})\\
&\qquad + (B_{0001} - B_{0011} - B_{1001} + B_{1011})\, a + (B_{1000} - B_{1001} - B_{1010} + B_{1011})\, c,\\
&(B_{0000} - B_{0001} - B_{0100} + B_{0101} - B_{1000} + B_{1001} + B_{1100} - B_{1101})\, ac + (B_{1001} - B_{1101})\\
&\qquad + (B_{0001} - B_{0101} - B_{1001} + B_{1101})\, a + (B_{1000} - B_{1001} - B_{1100} + B_{1101})\, c,\\
&(B_{0000} - B_{0001} - B_{0110} + B_{0111} - B_{1000} + B_{1001} + B_{1110} - B_{1111})\, ac + (B_{1001} - B_{1111})\\
&\qquad + (B_{0001} - B_{0111} - B_{1001} + B_{1111})\, a + (B_{1000} - B_{1001} - B_{1110} + B_{1111})\, c.
\end{aligned}$$
So the set of totally mixed Nash equilibria consists of the common zeros $(a, d, e, c) \in (0, 1)^4$ of these five polynomials. Substituting our payoff matrix into the last polynomial gives
$$ \frac{1}{8} + \frac{5}{8}\,a - \frac{1}{2}\,c = 0. $$
Solving for $c$ gives
$$ c = \frac{5a + 1}{4}, $$
and substituting into the previous two polynomials yields
$$ -\frac{3}{8} + \frac{21}{16}\,a - \frac{5}{16}\,a^2 = 0 \qquad\text{and}\qquad \frac{3}{8} - \frac{21}{16}\,a + \frac{5}{16}\,a^2 = 0, $$
both of which are multiples of $5a^2 - 21a + 6 = 0$. Solving for $a$ in the range $0 < a < 1$ gives
$$ a = \frac{21 - \sqrt{321}}{10}. $$
Substituting into the two trilinear polynomials yields two linear equations for $d$ and $e$; solving these yields
$$ d = \frac{5 - 2a}{5 + a}, \qquad e = \frac{4a - 1}{a + 5}, $$
which agrees with the result in Nash's paper.
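The sign rule in the parenthetical remark is what one gets by expanding products of the factors $x_0 = x$ and $x_1 = 1 - x$. The Python sketch below (ours; all names are hypothetical) implements that expansion for the three players, hard-codes the payoff matrix (41) and (42) in units of 1/8, and then repeats the steps of the text: solve the last bilinear polynomial for $c$, substitute into the first one, take the root of the resulting quadratic that lies in $(0, 1)$, and solve the two trilinear polynomials for $d$ and $e$. It only handles the case met here, where the trilinear polynomials become linear in $d$ and $e$ once $a$ and $c$ are known.

```python
import math
from fractions import Fraction as F
from itertools import product

# Payoff matrix (41)-(42) in units of 1/8; entry n belongs to column format(n, '04b') = 'ijkl'.
A8 = [-2, -2, -2, 0, -2, 0, -2, 2, 1, 1, 0, -4, 2, 2, 1, -3]
B8 = [2, 2, -2, 0, 4, -2, 0, -4, -2, -2, -2, 2, 1, -7, 1, -3]
C8 = [0, 0, 4, 0, -2, 2, 2, 2, 1, 1, 2, 2, -3, 5, -2, 6]
PAY = {name: {format(n, '04b'): F(v, 8) for n, v in enumerate(row)}
       for name, row in (('A', A8), ('B', B8), ('C', C8))}


def bits(b):
    return ''.join(map(str, b))


def expand(f, n):
    """Expand sum over b in {0,1}^n of f(b) * prod_v x_v^(b_v), where x^(0) = x and
    x^(1) = 1 - x, into {tuple of variable positions: coefficient}. Bits outside a
    monomial are fixed to 1 and the sign is the parity of the bits inside it."""
    poly = {}
    for m in product((0, 1), repeat=n):
        total = F(0)
        for b in product((0, 1), repeat=n):
            if all(b[v] == 1 for v in range(n) if not m[v]):
                total += (-1) ** sum(b[v] for v in range(n) if m[v]) * f(b)
        poly[tuple(v for v in range(n) if m[v])] = total
    return poly


# Trilinear polynomials: A's in the variables (d, e, c), C's in (a, d, e).
POLY_A = expand(lambda b: PAY['A']['0' + bits(b)] - PAY['A']['1' + bits(b)], 3)
POLY_C = expand(lambda b: PAY['C'][bits(b) + '0'] - PAY['C'][bits(b) + '1'], 3)
# Bilinear polynomials in (a, c): B's strategy '00' minus each of his other strategies.
POLY_B = {jk: expand(lambda b, jk=jk: PAY['B'][f'{b[0]}00{b[1]}']
                     - PAY['B'][f'{b[0]}{jk}{b[1]}'], 2)
          for jk in ('01', '10', '11')}


def solve():
    t, s = POLY_B['11'], POLY_B['01']         # position 0 is a, position 1 is c
    # Last bilinear polynomial: c = (N0 + N1 a) / (D0 + D1 a).
    N, D = (-t[()], -t[(0,)]), (t[(1,)], t[(0, 1)])
    # First bilinear polynomial times (D0 + D1 a) is quadratic in a.
    q2 = s[(0, 1)] * N[1] + s[(0,)] * D[1]
    q1 = s[(0, 1)] * N[0] + s[()] * D[1] + s[(0,)] * D[0] + s[(1,)] * N[1]
    q0 = s[()] * D[0] + s[(1,)] * N[0]
    disc = q1 * q1 - 4 * q2 * q0
    roots = [(-q1 + sg * math.sqrt(disc)) / (2 * q2) for sg in (-1, 1)]
    a = next(x for x in roots if 0 < x < 1)
    c = (N[0] + N[1] * a) / (D[0] + D[1] * a)
    P, Q = POLY_A, POLY_C
    if P[(0, 1)] + P[(0, 1, 2)] * c != 0 or Q[(1, 2)] + Q[(0, 1, 2)] * a != 0:
        raise ValueError("de terms do not vanish; this sketch only handles the linear case")
    # Coefficients of 1, d, e after substituting the known values of a and c.
    f1, fd, fe = P[()] + P[(2,)] * c, P[(0,)] + P[(0, 2)] * c, P[(1,)] + P[(1, 2)] * c
    g1, gd, ge = Q[()] + Q[(0,)] * a, Q[(1,)] + Q[(0, 1)] * a, Q[(2,)] + Q[(0, 2)] * a
    det = fd * ge - fe * gd                   # Cramer's rule for the 2 x 2 system
    d = (-f1 * ge + fe * g1) / det
    e = (-fd * g1 + f1 * gd) / det
    return a, c, d, e


print(solve())
```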
6.3
Equations Defining Nash Equilibria
We consider a finite $n$-person game in normal form. The players are labeled $1, 2, \ldots, n$. The $i$th player can select from $d_i$ pure strategies which we call $1, 2, \ldots, d_i$. The game is defined by $n$ payoff matrices $X^{(1)}, X^{(2)}, \ldots, X^{(n)}$, one for each player. Each matrix $X^{(i)}$ is an $n$-dimensional matrix of format $d_1 \times d_2 \times \cdots \times d_n$ whose entries are rational numbers. The entry $X^{(i)}_{j_1 j_2 \cdots j_n}$ represents the payoff for player $i$ if player 1 selects the pure strategy $j_1$, player 2 selects the pure strategy $j_2$, etc. Each player is to select a (mixed) strategy, which is a probability distribution on his set of pure strategies. We write $p^{(i)}_j$ for the probability which player $i$ allocates to the strategy $j$. The vector $p^{(i)} = \bigl(p^{(i)}_1, p^{(i)}_2, \ldots, p^{(i)}_{d_i}\bigr)$ is called the strategy of player $i$. The payoff $\pi_i$ for player $i$ is the value of the multilinear form given by his matrix $X^{(i)}$:
$$ \pi_i = \sum_{j_1=1}^{d_1} \sum_{j_2=1}^{d_2} \cdots \sum_{j_n=1}^{d_n} X^{(i)}_{j_1 j_2 \ldots j_n}\, p^{(1)}_{j_1} p^{(2)}_{j_2} \cdots p^{(n)}_{j_n}. $$
Summarizing, the data for our problem are the payoff matrices $X^{(i)}$, so the problem is specified by $n d_1 d_2 \cdots d_n$ rational numbers. We must solve for the $d_1 + d_2 + \cdots + d_n$ unknowns $p^{(i)}_j$. Since the unknowns are probabilities,
$$ \forall\, i, j:\ p^{(i)}_j \ge 0 \qquad\text{and}\qquad \forall\, i:\ p^{(i)}_1 + p^{(i)}_2 + \cdots + p^{(i)}_{d_i} = 1. \tag{43} $$
These conditions specify that $p = (p^{(i)}_j)$ is a point in the product of simplices
$$ \Delta = \Delta_{d_1 - 1} \times \Delta_{d_2 - 1} \times \cdots \times \Delta_{d_n - 1}. \tag{44} $$
A point $p \in \Delta$ is a Nash equilibrium if none of the $n$ players can increase his payoff by changing his strategy while the other $n - 1$ players keep their strategies fixed. We shall write this as a system of polynomial constraints in the unknown vectors $p \in \Delta$ and $\pi = (\pi_1, \ldots, \pi_n) \in \mathbb{R}^n$. For each of the unknown probabilities $p^{(i)}_k$ we consider the following multilinear polynomial:
$$ p^{(i)}_k \cdot \Bigl(\pi_i - \sum_{j_1=1}^{d_1} \cdots \sum_{j_{i-1}=1}^{d_{i-1}} \sum_{j_{i+1}=1}^{d_{i+1}} \cdots \sum_{j_n=1}^{d_n} X^{(i)}_{j_1 \ldots j_{i-1} k j_{i+1} \ldots j_n}\, p^{(1)}_{j_1} \cdots p^{(i-1)}_{j_{i-1}} p^{(i+1)}_{j_{i+1}} \cdots p^{(n)}_{j_n}\Bigr) \tag{45} $$
Hence (45) together with (43) represents a system of $n + d_1 + \cdots + d_n$ polynomial equations in $n + d_1 + \cdots + d_n$ unknowns, where each polynomial is the product of a linear polynomial and a multilinear polynomial of degree $n - 1$.
Theorem 52. A vector $(p, \pi) \in \Delta \times \mathbb{R}^n$ represents a Nash equilibrium for the game with payoff matrices $X^{(1)}, \ldots, X^{(n)}$ if and only if $(p, \pi)$ is a zero of the polynomials (45) and each parenthesized expression in (45) is nonnegative.

Nash (1951) proved that every game has at least one equilibrium point $(p, \pi)$. His proof and many subsequent refinements made use of fixed point theorems from topology. Numerical algorithms based on combinatorial refinements of these fixed point theorems have been developed, notably in the work of Scarf (1967). The algorithms converge to one Nash equilibrium but they do not give any additional information about the number of Nash equilibria or, if that number is infinite, about the dimension and component structure of the semi-algebraic set of Nash equilibria. For that purpose one needs the more refined algebraic techniques discussed in these lectures.

There is an obvious combinatorial subproblem arising from the equations, namely, in order for the product (45) to be zero, one of the two factors must be zero and the other factor must be non-negative. Thus our problem is a nonlinear complementarity problem. The case $n = 2$ is the linear complementarity problem. In this case we must solve a disjunction of systems of linear equations, which implies that each Nash equilibrium has rational coordinates and can be computed using exact arithmetic. A classical simplex-like algorithm due to Lemke and Howson (1964) finds one Nash equilibrium in this manner. It is a challenging computational task to enumerate all Nash equilibria for a given 2-person game as $d_1$ and $d_2$ get large. The problem is similar to (but more difficult than) enumerating all vertices of a convex polyhedron given by linear inequalities. In the latter case, the Upper Bound Theorem gives a sharp estimate for the maximal number of vertices, but the analogous problem for counting Nash equilibria of bimatrix games is open in general. For the state of the art see (McLennan & Park 1998).
We illustrate the issue of combinatorial complexity with an example from that paper.

Example 53. (A two-person game with exponentially many Nash equilibria) Take $n = 2$, $d_1 = d_2 =: d$ and both $X^{(1)}$ and $X^{(2)}$ to be the $d \times d$ unit matrix. In this game, the two players both have payoff 1 if their choices agree and otherwise they have payoff 0. Here the equilibrium equations (45) are
$$ p^{(1)}_k \cdot \bigl(\pi_1 - p^{(2)}_k\bigr) \;=\; p^{(2)}_k \cdot \bigl(\pi_2 - p^{(1)}_k\bigr) \;=\; 0 \qquad\text{for } k = 1, 2, \ldots, d. \tag{46} $$
The Nash equilibria are solutions of (46) such that all $p^{(i)}_k$ are between $0$ and $\pi_i$ and $p^{(1)}_1 + \cdots + p^{(1)}_d = p^{(2)}_1 + \cdots + p^{(2)}_d = 1$. Their number equals $2^d - 1$.
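The conditions of Theorem 52 are easy to test for a given strategy profile. In the Python sketch below (ours; encoding each payoff matrix as a dictionary over 0-based index tuples is an assumption of the sketch), $\pi_i$ is set to player $i$'s actual payoff, which any zero of (45) forces because the $p^{(i)}_k$ sum to 1; the sign and complementarity conditions are then checked in exact arithmetic. Applied to Example 53 it confirms that the $2^d - 1$ profiles in which both players mix uniformly over the same nonempty set of strategies are equilibria. It does not by itself show that there are no others.

```python
from fractions import Fraction
from itertools import combinations, product


def pure_payoffs(X, p, i):
    """Player i's expected payoff for each of his pure strategies while the others
    play p. X[i] maps 0-based index tuples (j_1, ..., j_n) to player i's payoff."""
    vals = [Fraction(0)] * len(p[i])
    for idx in product(*(range(len(pm)) for pm in p)):
        weight = Fraction(1)
        for m, pm in enumerate(p):
            if m != i:
                weight *= pm[idx[m]]
        vals[idx[i]] += weight * X[i][idx]
    return vals


def is_nash(X, p):
    """Theorem 52 with pi_i equal to player i's payoff."""
    for i, strategy in enumerate(p):
        vals = pure_payoffs(X, p, i)
        pi = sum(pk * v for pk, v in zip(strategy, vals))
        for pk, v in zip(strategy, vals):
            if pi - v < 0 or pk * (pi - v) != 0:   # sign and complementarity
                return False
    return True


def coordination_game(d):
    """Example 53: both payoff matrices are the d x d unit matrix."""
    unit = {(j1, j2): int(j1 == j2) for j1 in range(d) for j2 in range(d)}
    return [unit, unit]


def uniform_equilibria(d):
    """Supports S such that both players mixing uniformly over S is an equilibrium."""
    X = coordination_game(d)
    found = []
    for size in range(1, d + 1):
        for S in combinations(range(d), size):
            q = [Fraction(1, size) if k in S else Fraction(0) for k in range(d)]
            if is_nash(X, [q, q]):
                found.append(S)
    return found


print(uniform_equilibria(2))    # [(0,), (1,), (0, 1)]
```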
For instance, for $d = 2$ the equilibrium equations (46) have five solutions (here $p = p^{(1)}_1$ and $q = p^{(2)}_1$):

i1 : R = QQ[p,q,Pi1,Pi2];
i2 : I = ideal( p * (Pi1 - q), (1 - p) * (Pi1 - 1 + q),
                q * (Pi2 - p), (1 - q) * (Pi2 - 1 + p) );
i3 : decompose(I)
o3 = { ideal (Pi2 - 1, Pi1 - 1, p, q),
       ideal (Pi2 - 1, Pi1 - 1, p - 1, q - 1),
       ideal (2Pi2 - 1, 2Pi1 - 1, 2p - 1, 2q - 1),
       ideal (Pi2, Pi1, p, q - 1),
       ideal (Pi2, Pi1, p - 1, q) }

Only the first three of these five components correspond to Nash equilibria. For $d = 2$, the $2^d - 1 = 3$ Nash equilibria are $(p, q) = (0, 0), (\tfrac{1}{2}, \tfrac{1}{2}), (1, 1)$.

In what follows we shall disregard the issues of combinatorial complexity discussed above. Instead we focus on the algebraic complexity of our problem. To this end, we consider only fully mixed Nash equilibria, that is, we add the requirement that all probabilities $p^{(i)}_j$ be strictly positive. In our algebraic view, this is no restriction in generality because the vanishing of some of our unknowns yields a smaller system of polynomial equations with fewer unknowns but of the same multilinear structure. From now on, the $p^{(i)}_j$ will stand for real variables whose values are strictly between 0 and 1. This allows us to remove the left factors $p^{(i)}_k$ in (45) and work with the parenthesized $(n-1)$-linear polynomials instead. Eliminating the unknowns $\pi_i$, we get the following polynomials for $i = 1, \ldots, n$, and $k = 2, 3, \ldots, d_i$:
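$$ \sum_{j_1=1}^{d_1} \cdots \sum_{j_{i-1}=1}^{d_{i-1}} \sum_{j_{i+1}=1}^{d_{i+1}} \cdots \sum_{j_n=1}^{d_n} \Bigl(X^{(i)}_{j_1 \ldots j_{i-1} k j_{i+1} \ldots j_n} - X^{(i)}_{j_1 \ldots j_{i-1} 1 j_{i+1} \ldots j_n}\Bigr)\, p^{(1)}_{j_1} \cdots p^{(i-1)}_{j_{i-1}} p^{(i+1)}_{j_{i+1}} \cdots p^{(n)}_{j_n} $$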
This is a system of $d_1 + \cdots + d_n - n$ equations in $d_1 + \cdots + d_n$ unknowns, which satisfy the $n$ linear equations in (43). Corollary 51 generalizes as follows.

Theorem 54. The fully mixed Nash equilibria of the $n$-person game with payoff matrices $X^{(1)}, \ldots, X^{(n)}$ are the common zeros in the interior of the polytope $\Delta$ of the $d_1 + \cdots + d_n - n$ multilinear polynomials above.
In what follows, we always eliminate $n$ of the variables by setting
$$ p^{(i)}_{d_i} = 1 - \sum_{j=1}^{d_i - 1} p^{(i)}_j \qquad\text{for } i = 1, 2, \ldots, n. $$
What remains is a system of $\delta$ multilinear polynomials in $\delta$ unknowns, where $\delta := d_1 + \cdots + d_n - n$. We shall study these equations in the next section.
6.4
The Mixed Volume of a Product of Simplices
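Consider the $d_i - 1$ polynomials which appear in Theorem 54 for a fixed upper index $i$. They share the same Newton polytope, namely, the product of simplices
$$ \Delta^{(i)} = \Delta_{d_1 - 1} \times \cdots \times \Delta_{d_{i-1} - 1} \times \{0\} \times \Delta_{d_{i+1} - 1} \times \cdots \times \Delta_{d_n - 1}. \tag{47} $$
Here $\Delta_{d_i - 1}$ is the convex hull of the unit vectors and the origin in $\mathbb{R}^{d_i - 1}$. Hence the Newton polytope $\Delta^{(i)}$ is a polytope of dimension $\delta - d_i + 1$ in $\mathbb{R}^{\delta}$. Combining all Newton polytopes, we get the following $\delta$-tuple of polytopes
$$ \Delta[d_1, \ldots, d_n] := \bigl(\Delta^{(1)}, \ldots, \Delta^{(1)}, \Delta^{(2)}, \ldots, \Delta^{(2)}, \ldots, \Delta^{(n)}, \ldots, \Delta^{(n)}\bigr), $$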
where $\Delta^{(i)}$ appears $d_i - 1$ times.

Corollary 55. The fully mixed Nash equilibria of an $n$-person game where player $i$ has $d_i$ pure strategies are the zeros of a sparse polynomial system with support $\Delta[d_1, \ldots, d_n]$, and every such system arises from some game.

We are now in the situation of Bernstein's Theorem, which tells us that the expected number of complex zeros in $(\mathbb{C}^*)^{\delta}$ of a sparse system of $\delta$ polynomials in $\delta$ unknowns equals the mixed volume of the Newton polytopes. The following result of McKelvey & McLennan (1997) gives a combinatorial description for the mixed volume of the polytope-tuple $\Delta[d_1, \ldots, d_n]$.

Theorem 56. The maximum number of isolated fully mixed Nash equilibria for any $n$-person game where the $i$th player has $d_i$ pure strategies equals the mixed volume of $\Delta[d_1, \ldots, d_n]$. This mixed volume coincides with the number of partitions of the $\delta$-element set of unknowns $\{\, p^{(i)}_k : i = 1, \ldots, n,\ k = 2, \ldots, d_i \,\}$ into $n$ disjoint subsets $B_1, B_2, \ldots, B_n$ such that the cardinality of the $i$th block $B_i$ is equal to $d_i - 1$, and the $i$th block $B_i$ is disjoint from $\{\, p^{(i)}_1, p^{(i)}_2, \ldots, p^{(i)}_{d_i} \,\}$, i.e., no variable with upper index $i$ is allowed to be in $B_i$.
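For small games the count in Theorem 56 can be evaluated by brute force. The Python sketch below (ours; the function name is hypothetical) assigns each of the $\delta$ unknowns to a block other than the block carrying its own upper index, and keeps the assignments in which block $B_i$ receives exactly $d_i - 1$ unknowns. Since it enumerates every assignment, it is only meant for tiny examples such as the ones discussed below.

```python
from collections import Counter
from itertools import product


def theorem56_count(d):
    """Brute-force count of the set partitions in Theorem 56.

    d[i] is the number of pure strategies of player i (0-based). Player i owns
    d[i] - 1 unknowns; an unknown may not go into the block of its own player,
    and block i must receive exactly d[i] - 1 unknowns.
    """
    n = len(d)
    owners = [i for i in range(n) for _ in range(d[i] - 1)]
    choices = [[j for j in range(n) if j != i] for i in owners]
    count = 0
    for blocks in product(*choices):
        sizes = Counter(blocks)
        if all(sizes[j] == d[j] - 1 for j in range(n)):
            count += 1
    return count


print(theorem56_count((3, 3, 3)))                        # 10
print([theorem56_count((2,) * n) for n in range(2, 7)])   # [1, 2, 9, 44, 265]
```

The first value is the ten partitions listed below for Adam, Bob and Carl; the second list reproduces the derangement numbers quoted after Corollary 57.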
This theorem says, in particular, that the maximum number of complex zeros of a sparse system with Newton polytopes $\Delta[d_1, \ldots, d_n]$ can be attained by counting real zeros only. Moreover, it can be attained by counting only real zeros which have all their coordinates strictly between 0 and 1. The key idea in proving Theorem 56 is to replace each of the given multilinear equations by a product of linear forms. In terms of Newton polytopes, this means that $\Delta^{(i)}$ is expressed as the Minkowski sum of the $n - 1$ simplices
$$ \{0\} \times \cdots \times \{0\} \times \Delta_{d_j - 1} \times \{0\} \times \cdots \times \{0\}, \qquad j \ne i. \tag{48} $$
We shall illustrate Theorem 56 and this factoring construction for the case $n = 3$, $d_1 = d_2 = d_3 = 3$. Our familiar players Adam, Bob and Carl re-enter the scene in this case. A new stock #3 has come on the market, and our friends can now each choose from three pure strategies. The probabilities which Adam allocates to stocks #1, #2 and #3 are $a_1$, $a_2$, and $1 - a_1 - a_2$. There are now six equilibrium equations in the six unknowns $a_1, a_2, b_1, b_2, c_1, c_2$. The number of set partitions of $\{a_1, a_2, b_1, b_2, c_1, c_2\}$ described in Theorem 56 is ten. The ten allowed partitions are
$$\begin{array}{ccc}
B_1 & B_2 & B_3 \\ \hline
\{b_1, b_2\} & \{c_1, c_2\} & \{a_1, a_2\} \\
\{b_1, c_1\} & \{a_1, c_2\} & \{a_2, b_2\} \\
\{b_1, c_2\} & \{a_1, c_1\} & \{a_2, b_2\} \\
\{b_2, c_1\} & \{a_1, c_2\} & \{a_2, b_1\} \\
\{b_2, c_2\} & \{a_1, c_1\} & \{a_2, b_1\} \\
\{c_1, c_2\} & \{a_1, a_2\} & \{b_1, b_2\} \\
\{b_1, c_1\} & \{a_2, c_2\} & \{a_1, b_2\} \\
\{b_1, c_2\} & \{a_2, c_1\} & \{a_1, b_2\} \\
\{b_2, c_1\} & \{a_2, c_2\} & \{a_1, b_1\} \\
\{b_2, c_2\} & \{a_2, c_1\} & \{a_1, b_1\}
\end{array}$$
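This number ten is the mixed volume of six 4-dimensional polytopes, each a product of two triangles, regarded as a face of the product of three triangles:
$$ \Delta[2,2,2] = \bigl(\{0\}\times\Delta_2\times\Delta_2,\ \{0\}\times\Delta_2\times\Delta_2,\ \Delta_2\times\{0\}\times\Delta_2,\ \Delta_2\times\{0\}\times\Delta_2,\ \Delta_2\times\Delta_2\times\{0\},\ \Delta_2\times\Delta_2\times\{0\}\bigr). $$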
Theorem 56 tells us that Adam, Bob and Carl can be made happy in ten possible ways, i.e., their game can have as many as ten fully mixed Nash equilibria. We shall construct payoff matrices which attain this number.
Consider the following six bilinear equations in factored form:
$$\begin{aligned}
(200b_1 + 100b_2 - 100)(200c_1 + 100c_2 - 100) &= 0,\\
(190b_1 + 110b_2 - 101)(190c_1 + 110c_2 - 101) &= 0,\\
(200a_1 + 100a_2 - 100)(180c_1 + 120c_2 - 103) &= 0,\\
(190a_1 + 110a_2 - 101)(170c_1 + 130c_2 - 106) &= 0,\\
(180a_1 + 120a_2 - 103)(180b_1 + 120b_2 - 103) &= 0,\\
(170a_1 + 130a_2 - 106)(170b_1 + 130b_2 - 106) &= 0.
\end{aligned}$$
These equations have the Newton polytopes $\Delta[2,2,2]$, and the coefficients are chosen so that all ten solutions have their coordinates between 0 and 1. We now need to find $3 \times 3 \times 3$ payoff matrices $(A_{ijk})$, $(B_{ijk})$, and $(C_{ijk})$ which give rise to these equations. Clearly, the payoff matrices are not unique. To make them unique we require the normalizing condition that each player's payoff is zero when he picks stock #1. In symbols, $A_{1jk} = B_{i1k} = C_{ij1} = 0$ for all $i, j, k \in \{1, 2, 3\}$. The remaining 54 parameters are now uniquely determined. To find them, we expand our six polynomials in a different basis, like the one used in Corollary 50. The rewritten equations are
$$\begin{aligned}
&10000\,b_1c_1 - 10000\,b_1(1-c_1-c_2) - 10000(1-b_1-b_2)c_1 + 10000(1-b_1-b_2)(1-c_1-c_2) = 0,\\
&7921\,b_1c_1 + 801\,b_1c_2 - 8989\,b_1(1-c_1-c_2) + 801\,b_2c_1 + 81\,b_2c_2 - 909\,b_2(1-c_1-c_2)\\
&\quad - 8989(1-b_1-b_2)c_1 - 909(1-b_1-b_2)c_2 + 10201(1-b_1-b_2)(1-c_1-c_2) = 0,\\
&7700\,a_1c_1 + 1700\,a_1c_2 - 10300\,a_1(1-c_1-c_2) - 7700(1-a_1-a_2)c_1 - 1700(1-a_1-a_2)c_2\\
&\quad + 10300(1-a_1-a_2)(1-c_1-c_2) = 0,\\
&5696\,a_1c_1 + 2136\,a_1c_2 - 9434\,a_1(1-c_1-c_2) + 576\,a_2c_1 + 216\,a_2c_2 - 954\,a_2(1-c_1-c_2)\\
&\quad - 6464(1-a_1-a_2)c_1 - 2424(1-a_1-a_2)c_2 + 10706(1-a_1-a_2)(1-c_1-c_2) = 0,\\
&5929\,a_1b_1 + 1309\,a_1b_2 - 7931\,a_1(1-b_1-b_2) + 1309\,a_2b_1 + 289\,a_2b_2 - 1751\,a_2(1-b_1-b_2)\\
&\quad - 7931(1-a_1-a_2)b_1 - 1751(1-a_1-a_2)b_2 + 10609(1-a_1-a_2)(1-b_1-b_2) = 0,\\
&4096\,a_1b_1 + 1536\,a_1b_2 - 6784\,a_1(1-b_1-b_2) + 1536\,a_2b_1 + 576\,a_2b_2 - 2544\,a_2(1-b_1-b_2)\\
&\quad - 6784(1-a_1-a_2)b_1 - 2544(1-a_1-a_2)b_2 + 11236(1-a_1-a_2)(1-b_1-b_2) = 0.
\end{aligned}$$
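Both claims about this construction can be checked with a short Python sketch (ours; all names are hypothetical). A linear form $u x_1 + v x_2 + w$ takes the values $u + w$, $v + w$, $w$ at the vertices $(1,0)$, $(0,1)$, $(0,0)$, and these are its coefficients in the basis $x_1, x_2, 1 - x_1 - x_2$, so each rewritten equation is an outer product of two such coefficient vectors. The ten solutions arise by choosing which factor of each equation vanishes, so that each of $a$, $b$, $c$ is cut out by exactly two linear forms; the sketch solves those $2 \times 2$ systems exactly and checks that each point lies strictly inside its triangle.

```python
from fractions import Fraction as F
from itertools import product

# The four linear forms u*x1 + v*x2 + w appearing in the six factored equations.
FORMS = [(200, 100, -100), (190, 110, -101), (180, 120, -103), (170, 130, -106)]
# Each equation is a product of two forms: (variable group, index into FORMS).
EQUATIONS = [(('b', 0), ('c', 0)), (('b', 1), ('c', 1)),
             (('a', 0), ('c', 2)), (('a', 1), ('c', 3)),
             (('a', 2), ('b', 2)), (('a', 3), ('b', 3))]


def value(form, x1, x2):
    u, v, w = form
    return u * x1 + v * x2 + w


def vertex_values(form):
    """Coefficients of the form in the basis x1, x2, 1 - x1 - x2."""
    u, v, w = form
    return (u + w, v + w, w)


def basis_coefficients(eq):
    """Coefficient matrix of an equation in the products of the two bases."""
    (_, f), (_, g) = eq
    return [[s * t for t in vertex_values(FORMS[g])] for s in vertex_values(FORMS[f])]


def intersect(f, g):
    """The point where forms f and g both vanish (Cramer's rule)."""
    (u1, v1, w1), (u2, v2, w2) = FORMS[f], FORMS[g]
    det = u1 * v2 - u2 * v1
    return (F(-w1 * v2 + w2 * v1, det), F(-u1 * w2 + u2 * w1, det))


def ten_solutions():
    sols = []
    for pick in product((0, 1), repeat=len(EQUATIONS)):   # which factor vanishes
        chosen = {'a': [], 'b': [], 'c': []}
        for eq, side in zip(EQUATIONS, pick):
            group, form = eq[side]
            chosen[group].append(form)
        if all(len(forms) == 2 for forms in chosen.values()):
            sols.append({g: intersect(*forms) for g, forms in chosen.items()})
    return sols


print(len(ten_solutions()))                  # 10
print(basis_coefficients(EQUATIONS[1]))      # [[7921, 801, -8989], [801, 81, -909], [-8989, -909, 10201]]
```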
The 18 coefficients appearing in the first two equations are the entries in Adam's payoff matrix:
$$ A_{211} = 10000,\ A_{212} = 0,\ \ldots,\ A_{233} = 10000; \qquad A_{311} = 7921,\ \ldots,\ A_{333} = 10201. $$
Similarly, we get Bob's payoff matrix from the middle two equations, and we get Carl's payoff matrix from the last two equations. In this manner, we have constructed an explicit three-person game with three pure strategies per player which has ten fully mixed Nash equilibria.

Multilinear equations are particularly well-suited for the use of numerical homotopy methods. For the starting system of such a homotopy one can take products of linear forms as outlined above. Jan Verschelde has reported encouraging results obtained by his software PHC for the computation of Nash equilibria. We believe that considerable progress can still be made in the numerical computation of Nash equilibria, and we hope to pursue this further.

One special case of Theorem 56 deserves special attention: $d_1 = d_2 = \cdots = d_n = 2$. This concerns an $n$-person game where each player has two pure strategies. The corresponding polytope tuple $\Delta[1, 1, \ldots, 1]$ consists of the $n$ distinct facets of the $n$-dimensional cube. Officially, the $n$-cube has $2n$ facets, each of which is an $(n-1)$-cube, but the facets come in natural pairs, and we pick only one representative from each pair. In this special case, the partitions described in Theorem 56 correspond to the derangements of the set $\{1, 2, \ldots, n\}$, that is, permutations of $\{1, 2, \ldots, n\}$ without fixed points.

Corollary 57. The following three numbers coincide, for every $n \in \mathbb{N}$: the maximum number of isolated fully mixed Nash equilibria for an $n$-person game where each player has two pure strategies; the mixed volume of the $n$ facets of the $n$-cube; the number of derangements of an $n$-element set.

Counting derangements is a classical problem in combinatorics. Their number grows as follows (for $n = 2, 3, 4, \ldots$): 1, 2, 9, 44, 265, 1854, 14833, 133496, .... For instance, the number of derangements of $\{1, 2, 3, 4, 5\}$ is 44. A 5-person game with two pure strategies per player can have as many as 44 fully mixed Nash equilibria.
6.5
Exercises
1. Consider three equations in unknowns $a, b, c$ as in Corollary 51:
$$ \lambda_0 bc + \lambda_1 b + \lambda_2 c + \lambda_3 \;=\; \mu_0 ac + \mu_1 a + \mu_2 c + \mu_3 \;=\; \nu_0 ab + \nu_1 a + \nu_2 b + \nu_3 \;=\; 0. $$
Find necessary and sufficient conditions, in terms of the parameters $\lambda_i, \mu_j, \nu_k$, for this system to have two real roots $(a, b, c)$ both of which satisfy $0 < a, b, c < 1$. In other words, characterize those 3-person games with 2 pure strategies which have 2 totally mixed Nash equilibria.
2. Find all irreducible components of the variety defined by the equations (46). How many components do not correspond to Nash equilibria?
3. Determine the exact maximum number of isolated fully mixed Nash equilibria of any 5-person game where each player has 5 pure strategies.
4. Pick your favorite integer $N$ between 0 and 44. Construct an explicit five-person game with two pure strategies per player which has exactly $N$ fully mixed Nash equilibria.
Sums of Squares
This lecture concerns polynomial problems over the real numbers $\mathbb{R}$. This means that the input consists of polynomials in $\mathbb{R}[x_1, \ldots, x_n]$ where each coefficient is given either as a rational number or a floating point number. A trivial but crucial observation about real numbers is that sums of squares are non-negative. Sums of squares lead us to Semidefinite Programming, an exciting subject of current interest in numerical optimization. We will give an introduction to semidefinite programming with a view towards solving polynomial equations and inequalities over $\mathbb{R}$. A crucial role is played by the Real Nullstellensatz, which tells us that either a polynomial problem has a solution or there exists a certificate that no solution exists. Semidefinite programming provides a numerical method for computing such certificates.
7.1
Positive Semidefinite Matrices
We begin by reviewing some basic material from linear algebra. Let $V \simeq \mathbb{R}^m$ be an $m$-dimensional real vector space which has a known basis. Every quadratic form on $V$ is represented uniquely by a symmetric $m \times m$-matrix $A$. Namely, the quadratic form associated with a real symmetric matrix $A$ is
$$ \phi : V \to \mathbb{R}, \quad u \mapsto u^T \cdot A \cdot u. \tag{49} $$
The matrix $A$ has only real eigenvalues. It can be diagonalized over the real numbers by an orthogonal matrix $\Lambda$, whose columns are eigenvectors of $A$:
$$ \Lambda^T \cdot A \cdot \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m). \tag{50} $$
Computing this identity is a task in numerical linear algebra, a task that matlab performs well. Given (50), our quadratic form can be written as
$$ \phi(u) = \sum_{j=1}^{m} \lambda_j \cdot \Bigl(\sum_{i=1}^{m} \Lambda_{ij}\, u_i\Bigr)^2. \tag{51} $$
This expression is an alternating sum of squares of linear forms on $V$: the coefficients $\lambda_j$ may have either sign.

Proposition 58. For a symmetric $m \times m$-matrix $A$ with entries in $\mathbb{R}$, the following five conditions are equivalent:
(a) $u^T \cdot A \cdot u \ge 0$ for all $u \in \mathbb{R}^m$,
(b) all eigenvalues of $A$ are nonnegative real numbers,
(c) all diagonal subdeterminants of $A$ are nonnegative,
(d) there exists a real $m \times m$-matrix $B$ such that $A = B \cdot B^T$,
(e) the quadratic form $u^T \cdot A \cdot u$ is a sum of squares of linear forms on $\mathbb{R}^m$.

By a diagonal subdeterminant of $A$ we mean an $i \times i$-subdeterminant with the same row and column indices, for any $i \in \{1, 2, \ldots, m\}$. Thus condition (c) amounts to checking $2^m - 1$ polynomial inequalities in the entries of $A$. If we wish to check whether $A$ is positive definite, the situation when all eigenvalues are strictly positive, then it suffices to check that the $m$ leading principal minors, which are obtained by taking the first $i$ rows and first $i$ columns only, are strictly positive.
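Condition (c), and the contrast with the leading-minor test for definiteness, can be tried out in exact arithmetic. The Python sketch below (ours; function names are hypothetical) enumerates all $2^m - 1$ diagonal subdeterminants with fractions. The $2 \times 2$ matrix $\mathrm{diag}(0, -1)$ illustrates why the $m$ leading principal minors do not suffice for semidefiniteness: both of them vanish, yet the matrix is not positive semidefinite.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for k in range(n):
        pivot = next((r for r in range(k, n) if M[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            result = -result                     # a row swap flips the sign
        result *= M[k][k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for col in range(k, n):
                M[r][col] -= factor * M[k][col]
    return result


def principal_minors(A):
    """All 2^m - 1 diagonal subdeterminants of A (same row and column indices)."""
    m = len(A)
    for size in range(1, m + 1):
        for S in combinations(range(m), size):
            yield det([[A[i][j] for j in S] for i in S])


def is_psd(A):
    """Condition (c) of Proposition 58."""
    return all(x >= 0 for x in principal_minors(A))


def leading_minors_nonnegative(A):
    return all(det([row[:i] for row in A[:i]]) >= 0 for i in range(1, len(A) + 1))


D = [[0, 0], [0, -1]]
print(leading_minors_nonnegative(D), is_psd(D))   # True False
```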
We call the identity $A = B \cdot B^T$ in (d) a Cholesky decomposition of $A$. In numerical analysis texts this term is often reserved for such a decomposition where $B$ is lower triangular. We allow $B$ to be any real matrix. Note that the factor matrix $B$ is easily expressed in terms of the (floating point) data computed in (50) and vice versa. Namely, we take
$$ B = \Lambda \cdot \mathrm{diag}\bigl(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_m}\bigr). $$
In view of (51), this proves the equivalence of (d) and (e): knowledge of the matrix $B$ is equivalent to writing the quadratic form $\phi$ as a sum of squares. A matrix $A$ which satisfies the conditions (a) through (e) is called positive semidefinite. Let $\mathrm{Sym}^2(V)$ denote the real vector space consisting of all symmetric $m \times m$-matrices. The positive semidefinite cone or PSD cone is
$$ PSD(V) = \{\, A \in \mathrm{Sym}^2(V) : A \text{ is positive semidefinite} \,\}. $$
This is a full-dimensional closed semi-algebraic convex cone in the vector space $\mathrm{Sym}^2(V) \simeq \mathbb{R}^{\binom{m+1}{2}}$. The set $PSD(V)$ is closed and convex because it is the solution set of an infinite system of linear inequalities in (a), one for each $u \in \mathbb{R}^m$. It is semi-algebraic because it can be defined by $m$ polynomial inequalities as in (c). It is full-dimensional because every matrix $A$ with strictly positive eigenvalues $\lambda_i$ has an open neighborhood in $PSD(V)$. The extreme rays of the cone $PSD(V)$ are the squares of linear forms, as in (e).

In what follows we use the symbol $\ell$ to denote a linear function (plus a constant) on the vector space $\mathrm{Sym}^2(V)$. Explicitly, for an indeterminate symmetric matrix $A = (a_{ij})$, a linear function $\ell$ can be written as follows:
$$ \ell(A) = u_{00} + \sum_{1 \le j \le k \le m} u_{jk} \cdot a_{jk} $$
where the $u_{jk}$ are constants. An affine subspace is the solution set to a system of linear equations $\ell_1(A) = \cdots = \ell_r(A) = 0$. Semidefinite programming concerns the intersection of an affine subspace with the positive semidefinite cone. There are highly efficient algorithms for solving the following problems.

Semidefinite Programming: Decision Problem. Given linear functions $\ell_1, \ldots, \ell_r$, does there exist a positive semidefinite matrix $A \in PSD(V)$ which satisfies the equations $\ell_1(A) = \cdots = \ell_r(A) = 0$?
Semidefinite Programming: Optimization Problem. Given linear functions $\ell_0, \ell_1, \ldots, \ell_r$, minimize $\ell_0(A)$ subject to $A \in \mathrm{PSD}(V)$ and $\ell_1(A) = \cdots = \ell_r(A) = 0$.
It is instructive to examine these two problems for the special case when $A$ is assumed to be a diagonal matrix, say, $A = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$. Then $A \in \mathrm{PSD}(V)$ is equivalent to $\lambda_1, \ldots, \lambda_m \geq 0$, and our first problem is to solve a linear system of equations in the non-negative reals. This is the Decision Problem of Linear Programming. The second problem amounts to minimizing a linear function over a convex polyhedron, which is the Optimization Problem of Linear Programming. Thus Linear Programming is the restriction of Semidefinite Programming to diagonal matrices.
Consider the following simple semidefinite programming decision problem for $m = 3$. Suppose we wish to find a positive semidefinite matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \in \mathrm{PSD}(\mathbb{R}^3)$$
which satisfies
$$a_{11} = 1, \quad a_{12} = 0, \quad a_{23} = -1, \quad a_{33} = 2 \quad \text{and} \quad 2a_{13} + a_{22} = -1. \qquad (52)$$
It turns out that this particular problem has a unique solution:
$$A = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ -1 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{pmatrix}^T \qquad (53)$$
We will use this example to sketch the connection to sums of squares. Consider the following fourth degree polynomial in one unknown: $f(x) = x^4 - x^2 - 2x + 2$.
We wish to know whether $f(x)$ is non-negative on $\mathbb{R}$, or equivalently, whether $f(x)$ can be written as a sum of squares of quadratic polynomials. Consider the possible representations of our polynomial as a matrix product:
$$f(x) = \begin{pmatrix} x^2 & x & 1 \end{pmatrix} \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{12} & a_{22} & a_{23} \\ a_{13} & a_{23} & a_{33} \end{pmatrix} \begin{pmatrix} x^2 \\ x \\ 1 \end{pmatrix} \qquad (54)$$
This identity holds if and only if the linear equations (52) are satisfied. By condition (e) in Proposition 58, the polynomial in (54) is a sum of squares if and only if the matrix $A = (a_{ij})$ is positive semidefinite. Thus the semidefinite programming decision problem specified by (52) is exactly equivalent to the question whether $f(x)$ is a sum of squares. The answer is affirmative and given in (53). From the Cholesky decomposition of $A = (a_{ij})$ in (53) we get
$$f(x) = \begin{pmatrix} x^2 - 1 & x - 1 \end{pmatrix} \begin{pmatrix} x^2 - 1 \\ x - 1 \end{pmatrix} = (x^2 - 1)^2 + (x - 1)^2.$$
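The computation in (52) to (54) can be double-checked with exact arithmetic. The following is a minimal Python sketch, and the helper names are ours rather than from SOStools or maple. It confirms three things. First, the matrix $A$ of (53) equals $B B^T$. Second, $X^T A X$ with $X = (x^2, x, 1)^T$ reproduces $f(x)$. Third, along the one-parameter family allowed by (52), with $t = a_{13}$ and $a_{22} = -1 - 2t$, the determinant factors as $(t+1)^2(2t-3)$. Since $a_{22} \geq 0$ forces $t \leq -1/2$, the value $t = -1$ is the only positive semidefinite choice.

```python
from fractions import Fraction

# Gram matrix A from (53) and a factor B with A = B B^T (third column of B is zero).
A = [[1, 0, -1], [0, 1, -1], [-1, -1, 2]]
B = [[1, 0, 0], [0, 1, 0], [-1, -1, 0]]

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(row) for row in zip(*P)]

def quadratic_form_coeffs(M):
    """Coefficients [x^4, x^3, x^2, x, 1] of X^T M X for X = (x^2, x, 1)."""
    degs = [2, 1, 0]
    c = [0] * 5
    for i in range(3):
        for j in range(3):
            c[4 - (degs[i] + degs[j])] += M[i][j]
    return c

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def A_of(t):
    """The general solution of the linear equations (52), with t = a13."""
    return [[1, 0, t], [0, -1 - 2 * t, -1], [t, -1, 2]]

print(mat_mul(B, transpose(B)) == A)   # True
print(quadratic_form_coeffs(A))        # [1, 0, -1, -2, 2], i.e. x^4 - x^2 - 2x + 2

# det A(t) = (t + 1)^2 (2t - 3): nonnegative for t <= -1/2 only at t = -1.
for t in [Fraction(k, 4) for k in range(-12, 13)]:
    assert det3(A_of(t)) == (t + 1) ** 2 * (2 * t - 3)
```

The determinant test is a spot check on a grid of rational values. The factorization itself is the reason $(53)$ is the unique positive semidefinite solution.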
7.2
Zero-dimensional Ideals and SOStools
Let $I$ be a zero-dimensional ideal in $S = \mathbb{R}[x_1, \ldots, x_n]$ which is given to us by an explicit Gröbner basis $G$ with respect to some term order $\prec$. Thus we are in the situation of Lecture 2. The set $\mathcal{B} = \mathcal{B}_\prec(I)$ of standard monomials is an effective basis for the $\mathbb{R}$-vector space $V = S/I$. Suppose that $\#(\mathcal{B}) = m$, so that $S/I \simeq \mathbb{R}^m$. Every quadratic form on $V$ is represented by an $m \times m$-matrix $A$ whose rows and columns are indexed by $\mathcal{B}$. Let $X$ denote the column vector of length $m$ whose entries are the monomials in $\mathcal{B}$. Then $X^T \cdot A \cdot X$ is a polynomial in $S = \mathbb{R}[x_1, \ldots, x_n]$. It can be regarded as an element of $S/I = \mathbb{R}^{\mathcal{B}}$ by taking its normal form modulo the Gröbner basis $G$. In this section we apply semidefinite programming to the quadratic forms $X^T \cdot A \cdot X$ on $V$. The point of departure is the following theorem.
Theorem 59. The following three statements are equivalent:
(a) The ideal $I$ has no real zeros.
(b) The constant $-1$ is a sum of squares in $V = S/I$.
(c) There exists a positive semidefinite $m \times m$-matrix $A$ such that
$$X^T \cdot A \cdot X + 1 \in I. \qquad (55)$$
The equivalence of (b) and (c) follows from Proposition 58. The implication from (b) to (a) is obvious. The implication from (a) to (b) is proved by reduction to the case $n = 1$. For one variable, it follows from the familiar fact that a polynomial in $\mathbb{R}[x]$ with no real roots can be factored into a product of irreducible quadratic polynomials. The condition (55) can be written as
$$X^T \cdot A \cdot X + 1 \text{ reduces to zero modulo the Gröbner basis } G. \qquad (56)$$
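Theorem 59 and condition (56) can be seen in the smallest possible case using a hypothetical one-variable ideal $I = \langle x^2 + 2x + 2 \rangle$. This is our own toy example, not taken from the text, and it has no real zeros. Here $G = \{x^2 + 2x + 2\}$ and the standard monomials are $1, x$. The positive semidefinite matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ gives $X^T A X = (x+1)^2$. The sketch below reduces $X^T A X + 1$ modulo $G$ by polynomial long division and finds remainder zero. Hence $-1 \equiv (x+1)^2$ is a sum of squares in $S/I$.

```python
from fractions import Fraction

def poly_rem(num, den):
    """Remainder of num modulo den (coefficient lists, highest degree first)."""
    num = [Fraction(c) for c in num]
    while len(num) >= len(den):
        q = num[0] / den[0]
        for k in range(len(den)):
            num[k] -= q * den[k]
        num.pop(0)  # the leading coefficient has been cancelled
    return num

def gram_to_poly(M):
    """X^T M X for X = (1, x)^T, as coefficients [x^2, x, 1]."""
    return [M[1][1], M[0][1] + M[1][0], M[0][0]]

g = [1, 2, 2]            # x^2 + 2x + 2 has no real roots; {g} is a Groebner basis
A = [[1, 1], [1, 1]]     # PSD: A = v v^T with v = (1, 1)^T
XtAX = gram_to_poly(A)   # x^2 + 2x + 1 = (x + 1)^2
XtAX_plus_1 = XtAX[:-1] + [XtAX[-1] + 1]
remainder = poly_rem(XtAX_plus_1, g)
print(all(c == 0 for c in remainder))  # True: condition (56) holds
```

For $n > 1$ the long division is replaced by normal form reduction modulo the Gröbner basis $G$. The unknown entries of $A$ then enter (56) linearly, which is what makes it an SDP constraint.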
This is a linear system of equations in the unknown entries of the symmetric matrix $A$. We wish to decide whether $A$ lies in the cone $\mathrm{PSD}(V)$. Thus the question whether the given ideal $I$ has a real zero or not has been reformulated as a decision problem of semidefinite programming. A positive solution $A$ to the semidefinite programming problem provides a certificate for the non-existence of real roots.
The following ideal (for $n = 3$) appeared as an example in Lecture 2:
$$I = \Big\langle z^2 + \tfrac{1}{5}x - \tfrac{1}{5}y + \tfrac{2}{25},\; y^2 - \tfrac{1}{5}x + \tfrac{1}{5}z + \tfrac{2}{25},\; x^2 + \tfrac{1}{5}y - \tfrac{1}{5}z + \tfrac{2}{25},\; xy + xz + yz + \tfrac{1}{25} \Big\rangle.$$
The four given generators are a Gröbner basis. We have $\mathbb{R}[x, y, z]/I \simeq \mathbb{R}^6$. The column vector of standard monomials is $X = (1, x, y, z, xz, yz)^T$. We wish to show that $I$ has no real zeros, by finding a representation (55). We use the software SOStools which was developed by Pablo Parrilo and his collaborators. It is available at http://www.cds.caltech.edu/sostools/. The following SOStools sessions were prepared by Ruchira Datta. Many thanks and compliments to Ruchira. We write g1, g2, g3, g4 for the given generators of the ideal $I$. Our decision variables are p1, a sum of squares, and p2, p3, p4, p5, arbitrary polynomials. They are supposed to satisfy
$$p_1 + 1 + p_2 \cdot g_1 + p_3 \cdot g_2 + p_4 \cdot g_3 + p_5 \cdot g_4 = 0.$$
Here is how to say this in SOStools:
>> clear; maple clear; echo on
>> syms x y z;
>> vartable = [x; y; z];
>> prog = sosprogram(vartable);
>> Z = [ 1; x; y; z; x*z; y*z ];
>> [prog,p{1}] = sossosvar(prog,Z);
>> for i = 1:4
     [prog,p{1+i}] = sospolyvar(prog,Z);
   end;
>> g{1} = z^2 + x/5 - y/5 + 2/25;
>> g{2} = y^2 - x/5 + z/5 + 2/25;
>> g{3} = x^2 + y/5 - z/5 + 2/25;
>> g{4} = x*y + x*z + y*z + 1/25;
>> expr = p{1} + 1;
>> for i = 1:4
     expr = expr + p{1+i}*g{i};
   end;
>> prog = soseq(prog,expr);
>> prog = sossolve(prog);
The program prepares the semidefinite programming problem (SDP) and then it calls on another program SeDuMi for solving the SDP by interior point methods. The numerical output produced by SeDuMi looks like this:
SeDuMi 1.05 by Jos F. Sturm, 1998, 2001.
Alg = 2: xz-corrector, Step-Differentiation, theta = 0.250, beta = 0.500
eqs m = 35, order n = 87, dim = 117, blocks = 2
nnz(A) = 341 + 0, nnz(ADA) = 563, nnz(L) = 336
 it :     b*y       gap    delta  rate   t/tP*  t/tD*   feas cg cg
  0 :            2.82E-01 0.000
  1 :  3.23E+00 6.35E-03 0.000 0.0225 0.9905 0.9900  -0.07  1  1
  2 :  2.14E-04 3.33E-06 0.000 0.0005 0.9999 0.9999   0.97  1  1
  3 :  2.15E-11 3.34E-13 0.000 0.0000 1.0000 1.0000   1.00  1  1
iter seconds digits       c*x               b*y
  3      0.8   Inf  0.0000000000e+00  2.1543738837e-11
|Ax-b| =  2.1e-12, [Ay-c]_+ =  6.2E-12,|x|= 7.5e+01,|y|= 2.3e-11
Max-norms: ||b||=1, ||c|| = 0,
Cholesky |add|=0, |skip| = 0, ||L.L|| = 2.79883.
Residual norm: 2.1405e-12
cpusec: 0.8200
iter: 3
feasratio: 1.0000
pinf: 0
dinf: 0
numerr: 0
The entries pinf: 0 and dinf: 0 indicate that the SDP was feasible and a solution p1, ..., p5 has been found. At this point we may already conclude that $I$ has no real zeros. We can now ask SOStools to display the sum of squares p1 it has found. This is done by typing
>> SOLp1 = sosgetsol(prog,p{1})
Rather than looking at the messy output, let us now return to our general discussion. Suppose that $I$ is a zero-dimensional ideal which has real roots, perhaps many of them. Then we might be interested in selecting the best real root, in the sense that it minimizes some polynomial function.
Real Root Optimization Problem. Given a polynomial $f \in S$, minimize $f(u)$ subject to $u \in V(I) \cap \mathbb{R}^n$.
This problem is equivalent to finding the largest real number $\lambda$ such that $f(x) - \lambda$ is non-negative on $V(I) \cap \mathbb{R}^n$. In the context of semidefinite programming, it makes sense to consider the following optimization problem:
Sum of Squares in an Artinian Ring. Given a polynomial $f \in S$, maximize $\lambda$ subject to $X^T \cdot A \cdot X - f(x) + \lambda \in I$ and $A$ positive semidefinite.
The latter problem can be easily solved using semidefinite programming, and it always leads to a lower bound $\lambda$ for the true minimum. But the two optimal values need not be equal. The following simple example in one variable illustrates the issue. Consider the following two problems on the real line $\mathbb{R}$:
(a) Minimize $x$ subject to $x^2 - 5x + 6 = 0$.
(b) Minimize $x$ subject to $x^4 - 10x^3 + 37x^2 - 60x + 36 = 0$.
The quartic in (b) is the square of the quadratic in (a), so the solution to both problems is $x = 2$. Consider now the Sum of Squares problems:
(a) Maximize $\lambda$ such that $x - \lambda$ is a sum of squares modulo $\langle x^2 - 5x + 6 \rangle$.
(b) Maximize $\lambda$ such that $x - \lambda$ is a sum of squares modulo $\langle x^4 - 10x^3 + 37x^2 - 60x + 36 \rangle$.
The solution to the semidefinite program (a) is $\lambda = 2$ as desired, since $(x - 2) = (x - 2)^2 - (x^2 - 5x + 6)$.
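The two polynomial identities used in this comparison are easy to confirm by multiplying out coefficient lists. The following small Python sketch uses helper names of our own. It checks that the quartic in (b) is the square of the quadratic in (a), so the ideal in (b) is not radical. It also checks that $(x-2)^2 - (x^2 - 5x + 6) = x - 2$, which is the certificate giving $\lambda = 2$ in the Sum of Squares problem (a).

```python
def poly_mul(p, q):
    """Product of coefficient lists (highest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_sub(p, q):
    """p - q for coefficient lists (highest degree first), padding the shorter one."""
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + list(p)
    q = [0] * (n - len(q)) + list(q)
    return [a - b for a, b in zip(p, q)]

quad = [1, -5, 6]                    # x^2 - 5x + 6 = (x - 2)(x - 3)
quartic = [1, -10, 37, -60, 36]      # the constraint in problem (b)
print(poly_mul(quad, quad) == quartic)   # True: (b) uses the ideal <quad^2>

# Certificate for (a): x - 2 = (x - 2)^2 - (x^2 - 5x + 6).
lhs = poly_sub(poly_mul([1, -2], [1, -2]), quad)
print(lhs)  # [0, 1, -2], i.e. x - 2
```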
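On the other hand, by allowing polynomials of higher and higher degrees in our sum of squares representations, we can get a solution to problem (b) arbitrarily close to $\lambda = 2$, but can never reach it. Indeed, suppose that $x - 2 = s + p \cdot g$ with $s$ a sum of squares and $g = (x^2 - 5x + 6)^2$. Then $s(2) = 0$, so the nonnegative polynomial $s$ attains its minimum at $x = 2$ and $s'(2) = 0$. Since $x = 2$ is a double root of $g$, differentiating the identity at $x = 2$ gives $1 = 0$, a contradiction. However, for some finite degrees the solution we find numerically will be equal to 2 to within numerical error. The following SOStools session produces (numerically) polynomials p1 of degree six and p2 of degree two such that $x - \lambda = p_1 + p_2 \cdot g$: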
>> clear; maple clear; echo on
>> syms x lambda
>> prog=sosprogram([x],[lambda]);
>> Z = monomials([x],0:3);
>> [prog,p1] = sossosvar(prog,Z);
>> Z = monomials([x],0:2);
>> [prog,p2] = sospolyvar(prog,Z);
>> g = x^4 - 10*x^3 + 37*x^2 - 60*x + 36;
>> prog=soseq(prog,x-lambda-p1-p2*g);
>> prog=sossetobj(prog,-lambda);
>> prog = sossolve(prog);
Size: 20 7
SeDuMi 1.05 by Jos F. Sturm, 1998, 2001.
Alg = 2: xz-corrector, Step-Differentiation, theta = 0.250
eqs m = 7, order n = 13, dim = 25, blocks = 2
...
iter seconds digits       c*x               b*y
 24      1.7   Inf -1.9999595418e+00 -1.9999500121e+00
...
>> SOLlambda = sosgetsol(prog,lambda)
SOLlambda =
2
>> SOLp1 = sosgetsol(prog,p1)
SOLp1 =
23216 + 6420.6*x - 21450.1*x^2 - 9880.2*x^3 + 18823*x^4 - 7046.8*x^5 + 830.01*x^6
>> SOLp2 = sosgetsol(prog,p2)
SOLp2 =
-644.95 - 1253.2*x - 830.01*x^2
From the numerical output we see that $\lambda$ is between 1.99995 and 1.99996, although this is displayed as 2. The discrepancy between (a) and (b) is explained by the fact that the second ideal is not radical. The following result, which is due to Parrilo (2002), shows that the SOStools computation just shown will always work well for a radical ideal $I$.
Theorem 60. Let $I$ be a zero-dimensional radical ideal in $S = \mathbb{R}[x_1, \ldots, x_n]$, and let $g \in S$ be a polynomial which is nonnegative on $V(I) \cap \mathbb{R}^n$. Then $g$ is a sum of squares in $S/I$.
Proof. For each real root $u$ of $I$, pick a polynomial $p_u(x)$ which vanishes on $V(I) \setminus \{u\}$ but $p_u(u) = 1$. For each pair of imaginary roots $U = \{u, \bar{u}\}$, we pick a polynomial $q_U(x)$ with real coefficients which vanishes on $V(I) \setminus U$ but $q_U(u) = q_U(\bar{u}) = 1$, and we construct a sum of squares $s_U(x)$ in $S = \mathbb{R}[x_1, \ldots, x_n]$ such that $g$ is congruent to $s_U$ modulo $\langle (x - u)(x - \bar{u}) \rangle$. The following polynomial has real coefficients and is obviously a sum of squares:
$$G(x) = \sum_{u \in V(I) \cap \mathbb{R}^n} g(u) \cdot p_u(x)^2 + \sum_{U \subset V(I) \setminus \mathbb{R}^n} s_U(x) \cdot q_U(x)^2.$$
By construction, the difference $g(x) - G(x)$ vanishes on the complex variety of $I$. Since $I$ is a radical ideal, the Nullstellensatz implies that $g(x) - G(x)$ lies in $I$. This proves that the image of $g(x)$ in $S/I$ is a sum of squares.
Corollary 61. If $I$ is radical then the Real Root Optimization Problem is solved exactly by its relaxation Sum of Squares in an Artinian Ring.
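The construction in the proof of Theorem 60 is completely explicit. The following Python sketch carries it out, using exact rational arithmetic and function names of our own. It only covers the special case $n = 1$ with all roots real, so the terms $s_U q_U^2$ for imaginary pairs do not occur. For the radical ideal $\langle (x-2)(x-3) \rangle$ and $g = x - 2$ it recovers $G = (x-2)^2$, the same certificate as in problem (a). In each example $g - G$ leaves remainder zero modulo the generator.

```python
from fractions import Fraction

def poly_mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_add(p, q, sign=1):
    """p + sign * q for coefficient lists (highest degree first)."""
    n = max(len(p), len(q))
    p = [0] * (n - len(p)) + list(p)
    q = [0] * (n - len(q)) + list(q)
    return [a + sign * b for a, b in zip(p, q)]

def poly_eval(p, x):
    v = Fraction(0)
    for c in p:          # Horner's rule
        v = v * x + c
    return v

def poly_rem(num, den):
    num = [Fraction(c) for c in num]
    while len(num) >= len(den):
        q = num[0] / den[0]
        for k in range(len(den)):
            num[k] -= q * den[k]
        num.pop(0)
    return num

def lagrange_basis(roots, k):
    """p_k(x) with p_k(roots[k]) = 1 and p_k(roots[j]) = 0 for j != k."""
    p = [Fraction(1)]
    for j, r in enumerate(roots):
        if j != k:
            denom = Fraction(roots[k] - r)
            p = poly_mul(p, [1 / denom, -r / denom])
    return p

def real_root_sos(g, roots):
    """G(x) = sum over the real roots u of g(u) * p_u(x)^2, as in the proof of Theorem 60."""
    G = [Fraction(0)]
    for k, u in enumerate(roots):
        w = poly_eval(g, u)
        assert w >= 0, "g must be nonnegative on the real roots"
        p = lagrange_basis(roots, k)
        G = poly_add(G, [w * c for c in poly_mul(p, p)])
    return G

gen = [1, -5, 6]                 # (x - 2)(x - 3): distinct roots, so the ideal is radical
g = [1, -2]                      # g = x - 2 is nonnegative on {2, 3}
G = real_root_sos(g, [2, 3])
print(G == [1, -4, 4])                                         # True: G = (x - 2)^2
print(all(c == 0 for c in poly_rem(poly_add(g, G, -1), gen)))  # True: g = G mod I
```

The key property is that $G$ interpolates $g$ on $V(I)$, so $g - G$ vanishes at every root. Radicality is what lets us conclude that $g - G$ lies in $I$.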
7.3
Global Optimization
In this section we discuss the problem of finding the global minimum of a polynomial function on $\mathbb{R}^n$, along the lines presented in more detail in (Parrilo & Sturmfels 2001). Let $f$ be a polynomial in $\mathbb{R}[x_1, \ldots, x_n]$ which attains a minimum value $f^* = f(u)$ as $u$ ranges over all points in $\mathbb{R}^n$. Our goal is to find the real number $f^*$. Naturally, we also wish to find a point $u$ at which this value is attained, but let us concentrate on finding $f^*$ first. For example, the following class of polynomials is obviously bounded below and provides a natural test family:
$$f(x_1, \ldots, x_n) = x_1^{2d} + x_2^{2d} + \cdots + x_n^{2d} + g(x_1, \ldots, x_n) \qquad (57)$$
where $g$ is an arbitrary polynomial of degree at most $2d - 1$. In fact, it is possible to deform any instance of our problem to one that lies in this family, but we shall not dwell on this point right now. An optimal point $u \in \mathbb{R}^n$ of our minimization problem is a zero of the critical ideal
$$I = \Big\langle \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \Big\rangle \subseteq S.$$
Hence one possible approach would be to locate the real roots of $I$ and then to minimize $f$ over that set. For instance, in the situation of (57), the $n$ partial derivatives of $f$ are already a Gröbner basis of $I$ with respect to the total degree term order, so it should be quite easy to apply any of the methods we already discussed for finding real roots. The trouble is that the Bézout number of the critical ideal $I$ equals $(2d - 1)^n$. This number grows exponentially in $n$ for fixed $d$. A typical case we might wish to solve in practice is minimizing a quartic in eleven variables. For $2d = 4$ and $n = 11$ we get $(2d - 1)^n = 3^{11} = 177{,}147$. What we are faced with is doing linear algebra with square matrices of size 177,147, an impossible task.
Consider instead the following relaxation of our problem due to N. Shor.
Global Minimization: SOS Relaxation. Find the largest $\lambda$ such that $f(x_1, \ldots, x_n) - \lambda$ is a sum of squares.
The optimal value $\lambda^*$ for this problem clearly satisfies $\lambda^* \leq f^*$. Using the well-known examples of positive polynomials which are not sums of squares, one can construct polynomials $f$ such that $\lambda^* < f^*$. For instance, consider Motzkin's polynomial
$$f(x, y) = x^4 y^2 + x^2 y^4 - 3x^2 y^2 + 1. \qquad (58)$$
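No shift $f - \lambda$ of Motzkin's polynomial $x^4y^2 + x^2y^4 - 3x^2y^2 + 1$ is a sum of squares. The reason is the standard Newton polytope argument: the monomials in any sum of squares representation of a polynomial $h$ have exponents in $\frac{1}{2}\mathrm{Newt}(h)$. Here $\mathrm{Newt}(f - \lambda)$ is the triangle with vertices $(0,0), (4,2), (2,4)$. For $\lambda = 1$ the constant term vanishes and the polytope only shrinks, which does not affect the argument. The Python sketch below is our own illustration. It lists the lattice points of half that triangle and all pairs of them whose exponents add up to $(2,2)$. Only $xy \cdot xy$ occurs, so the coefficient of $x^2y^2$ in any Gram matrix representation is a diagonal entry and must be nonnegative. In contrast, $f - \lambda$ has coefficient $-3$ there.

```python
from fractions import Fraction
from itertools import product

def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, tri):
    """True if p lies in the closed triangle tri (no two cross products of opposite sign)."""
    a, b, c = tri
    signs = [cross(a, b, p), cross(b, c, p), cross(c, a, p)]
    return not (any(s < 0 for s in signs) and any(s > 0 for s in signs))

newton = [(0, 0), (4, 2), (2, 4)]                        # Newt(f - lambda), lambda != 1
half = [(Fraction(u, 2), Fraction(v, 2)) for u, v in newton]
candidates = [m for m in product(range(3), repeat=2) if in_triangle(m, half)]
pairs = [(m1, m2) for m1 in candidates for m2 in candidates
         if (m1[0] + m2[0], m1[1] + m2[1]) == (2, 2)]
print(candidates)  # [(0, 0), (1, 1), (1, 2), (2, 1)]
print(pairs)       # [((1, 1), (1, 1))]
```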
For this polynomial we even have $\lambda^* = -\infty$ and $f^* = 0$. However, the experiments in (Parrilo & Sturmfels 2001) suggest that the equality $f^* = \lambda^*$ almost always holds in random instances. Moreover, the semidefinite algorithm for computing $\lambda^*$ allows us to certify $f^* = \lambda^*$ and to find a matching $u \in \mathbb{R}^n$ in these cases.
The SOS Relaxation can be translated into a semidefinite programming problem where the underlying vector space is the space of polynomials of degree at most $d$,
$$V = \mathbb{R}[x_1, \ldots, x_n]_{\leq d} \simeq \mathbb{R}^{\binom{n+d}{d}}.$$
Note that the dimension $\binom{n+d}{d}$ of this space grows polynomially in $n$ when $d$ is fixed. For a concrete example consider again the problem of minimizing a quartic in eleven variables. Here $d = 2$ and $n = 11$, so we are dealing with symmetric matrices of order $\binom{n+d}{d} = \binom{13}{2} = 78$. This number is considerably smaller than 177,147. Linear algebra for square matrices of order 78 is quite tractable, and a standard semidefinite programming implementation finds the exact minimum of a random instance of (57) in about ten minutes. Here is an explicit example in SOStools, with its SeDuMi output suppressed:
>> clear; maple clear; echo on
>> syms x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 lambda;
>> vartable = [x1; x2; x3; x4; x5; x6; x7; x8; x9; x10; x11];
>> prog=sosprogram(vartable,[lambda]);
>> f = x1^4 + x2^4 + x3^4 + x4^4 + x5^4 + x6^4 + x7^4 + x8^4 + x9^4 + x10^4 + x11^4 - 59*x9 + 45*x2*x4 - 8*x3*x11 - 93*x1^2*x3 + 92*x1*x2*x7 + 43*x1*x4*x7 - 62*x2*x4*x11 + 77*x4*x5*x8 + 66*x4*x5*x10 + 54*x4*x10^2 - 5*x7*x9*x11;
>> prog=sosineq(prog,f+lambda);
>> prog=sossetobj(prog,lambda);
>> prog=sossolve(prog);
>> SOLlambda=sosgetsol(prog,lambda)
SOLlambda =
.12832e8
This session minimizes lambda subject to $f + \lambda$ being a sum of squares, so the optimal value of the SOS Relaxation is the negative of SOLlambda. With a few more lines of SOStools code, we can now verify that $\lambda^* = -0.12832e8 = f^*$ holds and we can find a point $u \in \mathbb{R}^{11}$ such that $f(u) = f^*$.
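This section compares two matrix sizes: the Bézout number $(2d-1)^n$ of the critical ideal, and the order $\binom{n+d}{d}$ of the Gram matrix in the SOS Relaxation. Both can be tabulated with a few lines of Python. The function names below are ours.

```python
from math import comb

def bezout_number(n, d):
    """(2d - 1)^n: Bezout number of the critical ideal of (57), degree 2d, n variables."""
    return (2 * d - 1) ** n

def gram_order(n, d):
    """binom(n + d, d): number of monomials of degree <= d in n variables."""
    return comb(n + d, d)

for n in (3, 5, 8, 11):
    print(n, bezout_number(n, 2), gram_order(n, 2))
# the last line printed is "11 177147 78", the two numbers compared in the text
```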
7.4
The Real Nullstellensatz
In this section we consider an arbitrary system of polynomial equations and inequalities in $n$ real variables $x = (x_1, \ldots, x_n)$. The Real Nullstellensatz states that such a system either has a solution $u \in \mathbb{R}^n$ or there exists a certain certificate that no solution exists. This result can be regarded as a common generalization of Hilbert's Nullstellensatz (for polynomial equations over $\mathbb{C}$) and of Linear Programming duality (for linear inequalities over $\mathbb{R}$). The former states that a set of polynomials $f_1, \ldots, f_r$ either has a common complex zero or there exists a certificate of non-solvability of the form $\sum_{i=1}^r p_i f_i = 1$, where the $p_i$ are polynomial multipliers. One of the many equivalent formulations of Linear Programming duality states the following: A system of strict linear inequalities $h_1(x) > 0, \ldots, h_t(x) > 0$, where the $h_i$ are linear forms, either has a solution, or there exist nonnegative real numbers $\alpha_1, \ldots, \alpha_t$, not all zero, such that
$$\sum_{i=1}^t \alpha_i \cdot h_i(x) \equiv 0.$$
Such an identity is an obvious certificate of non-solvability. The Real Nullstellensatz states the existence of certificates for all polynomial systems. The following version of this result is due to Stengle (1974).
Theorem 62. The system of polynomial equations and inequalities
$$f_1(x) = 0, \ldots, f_r(x) = 0, \quad g_1(x) \geq 0, \ldots, g_s(x) \geq 0, \quad h_1(x) > 0, \ldots, h_t(x) > 0$$
either has a solution in $\mathbb{R}^n$, or there exists a polynomial identity
$$\sum_{i=1}^r a_i f_i + \sum_{\nu \in \{0,1\}^s} \sum_{\mu \in \{0,1\}^t} \Big( \sum_j b_{j\nu\mu}^2 \Big) g_1^{\nu_1} \cdots g_s^{\nu_s} \, h_1^{\mu_1} \cdots h_t^{\mu_t} + \prod_{l=1}^t h_l^{u_l} \equiv 0,$$
where the $u_l$ are nonnegative integers and the $a_i$ and $b_{j\nu\mu}$ are polynomials.
It is instructive to consider some special cases of this theorem. For instance, consider the case $r = s = 0$ and $t = 1$. In that case we must decide the solvability of a single strict inequality $h(x) > 0$. This inequality has no solution, i.e., $-h(x)$ is a nonnegative polynomial on $\mathbb{R}^n$, if and only if there exists an identity of the following form, where we write $c_j$ and $d_k$ for the multipliers $b_{j\nu\mu}$ with $\mu = 1$ and $\mu = 0$:
$$\Big( \sum_j c_j^2 \Big) \cdot h + \sum_k d_k^2 + h^u \equiv 0.$$
Here $u$ is a nonnegative integer. Absorbing $h^u$ into the first sum (if $u$ is odd) or into the second sum (if $u$ is even), we can in either case solve for $h$ and conclude that $-h$ is a ratio of two sums of squares of polynomials. This expression can obviously be rewritten as a sum of squares of rational functions. This proves:
Corollary 63. (Artin 1925) Every polynomial which is nonnegative on $\mathbb{R}^n$ is a sum of squares of rational functions.
Another case deserves special attention, namely, the case $s = t = 0$. There are no inequalities, but we are to solve $r$ polynomial equations
$$f_1(x) = f_2(x) = \cdots = f_r(x) = 0. \qquad (59)$$
For this polynomial system, the expression $\prod_{l=1}^t h_l^{u_l}$ in the Real Nullstellensatz certificate is the empty product, which evaluates to 1. Hence if (59) has no real solutions, then there exists an identity
$$\sum_{i=1}^r a_i f_i + \sum_k d_k^2 + 1 \equiv 0.$$
This implies that Theorem 59 holds not just in the zero-dimensional case.
Corollary 64. Let $I$ be any ideal in $S = \mathbb{R}[x_1, \ldots, x_n]$ whose real variety $V(I) \cap \mathbb{R}^n$ is empty. Then $-1$ is a sum of squares of polynomials modulo $I$.
Here is our punchline, first stated in the dissertation of Pablo Parrilo (2000): A Real Nullstellensatz certificate of bounded degree can be computed by semidefinite programming. Here we can also optimize parameters which appear linearly in the coefficients. This suggests the following algorithm for deciding a system of polynomial equations and inequalities: decide whether there exists a witness for infeasibility of degree $\leq D$, for some $D \gg 0$. If our system is feasible, then we might like to minimize a polynomial $f(x)$ over the solution set. The $D$th SDP relaxation would be to ask for the largest real number $\lambda$ such that the given system together with the inequality $f(x) - \lambda < 0$ has an infeasibility witness of degree $D$. This generalizes what was proposed in the previous section.
It is possible, at least in principle, to use an a priori bound for the degree $D$ in the Real Nullstellensatz. However, the currently known bounds are still very large. Lombardi and Roy recently announced a bound which is triply-exponential in the number $n$ of variables. We hope that such bounds can be further improved, at least for some natural families of polynomial problems arising in optimization.
Here is a very simple example in the plane to illustrate the method:
$$f := x - y^2 + 3 \geq 0, \qquad g := y + x^2 + 2 = 0. \qquad (60)$$
By the Real Nullstellensatz, the system $\{f \geq 0, g = 0\}$ has no solution $(x, y)$ in the real plane $\mathbb{R}^2$ if and only if there exist polynomials $s_1, s_2, s_3 \in \mathbb{R}[x, y]$ that satisfy the following:
$$s_1 + s_2 \cdot f + 1 + s_3 \cdot g \equiv 0, \quad \text{where } s_1 \text{ and } s_2 \text{ are sums of squares.} \qquad (61)$$
The $D$th SDP relaxation of the polynomial problem $\{f \geq 0, g = 0\}$ asks whether there exists a solution $(s_1, s_2, s_3)$ to (61) where the polynomial $s_1$ has degree $\leq D$ and the polynomials $s_2, s_3$ have degree $\leq D - 2$. For each fixed integer $D > 0$ this can be tested by semidefinite programming. Specifically, we can use the program SOStools. For $D = 2$ we find the solution
$$s_1 = \frac{1}{3} + 2\Big(y + \frac{3}{2}\Big)^2 + 6\Big(x - \frac{1}{6}\Big)^2, \qquad s_2 = 2, \qquad s_3 = -6.$$
The resulting identity (61) proves that the polynomial system $\{f \geq 0, g = 0\}$ is inconsistent.
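The certificate found for $D = 2$ can be verified by exact expansion. The sketch below is plain Python with rational coefficients. Polynomials in $x, y$ are stored as dictionaries keyed by exponent pairs, which is a representation of our own choosing. The code expands $s_1 + s_2 f + 1 + s_3 g$ and obtains the zero polynomial, which is the identity (61).

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial in x, y is a dict {(i, j): coefficient of x^i y^j}.
def padd(*polys):
    out = defaultdict(Fraction)
    for p in polys:
        for mono, coef in p.items():
            out[mono] += coef
    return {m: c for m, c in out.items() if c != 0}

def pmul(p, q):
    out = defaultdict(Fraction)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def const(c):
    return {(0, 0): Fraction(c)}

f = {(1, 0): Fraction(1), (0, 2): Fraction(-1), (0, 0): Fraction(3)}   # x - y^2 + 3
g = {(0, 1): Fraction(1), (2, 0): Fraction(1), (0, 0): Fraction(2)}    # y + x^2 + 2
y_term = {(0, 1): Fraction(1), (0, 0): Fraction(3, 2)}                 # y + 3/2
x_term = {(1, 0): Fraction(1), (0, 0): Fraction(-1, 6)}                # x - 1/6

s1 = padd(const(Fraction(1, 3)),
          pmul(const(2), pmul(y_term, y_term)),
          pmul(const(6), pmul(x_term, x_term)))
s2, s3 = const(2), const(-6)
total = padd(s1, pmul(s2, f), const(1), pmul(s3, g))
print(total)  # {}  (the zero polynomial, so (61) holds)
```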
7.5
Symmetric Matrices with Double Eigenvalues
The material in this section is independent from the previous sections. It is inspired by a lecture of Peter Lax in the Berkeley Mathematics Colloquium in February 2001 and by discussions with Beresford Parlett and David Eisenbud. Given three real symmetric $n \times n$-matrices $A_0$, $A_1$ and $A_2$, how many matrices of the form $A_0 + xA_1 + yA_2$ have a double eigenvalue? Peter Lax (1998) proved that there is always at least one such matrix if $n \equiv 2 \pmod 4$. We shall extend the result of Lax as follows:
Theorem 65. Given three general symmetric $n \times n$-matrices $A_0, A_1, A_2$, there are exactly $\binom{n+1}{3}$ pairs of complex numbers $(x, y)$ for which $A_0 + xA_1 + yA_2$ has a critical double eigenvalue.
A critical double eigenvalue is one at which the complex discriminantal hypersurface $\Delta = 0$ (described below) is singular. This theorem implies the result of Lax because all real double eigenvalues are critical, and
$$\binom{n+1}{3} = \frac{1}{6} \cdot (n - 1) \cdot n \cdot (n + 1)$$
is odd if and only if $n \equiv 2 \pmod 4$.
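The parity claim at the end of this paragraph is elementary. The product $(n-1)n(n+1)$ contains exactly one factor 2 precisely when $n \equiv 2 \pmod 4$, and dividing by 6 removes that single factor. A quick Python check over a range of $n$ confirms it:

```python
from math import comb

for n in range(1, 201):
    assert comb(n + 1, 3) == (n - 1) * n * (n + 1) // 6
    # binom(n+1, 3) is odd exactly when n = 2 mod 4
    assert (comb(n + 1, 3) % 2 == 1) == (n % 4 == 2)

print([comb(n + 1, 3) for n in range(2, 7)])  # [1, 4, 10, 20, 35]
```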
In the language of algebraic geometry, Theorem 65 states that the complexification of the set of all real $n \times n$-symmetric matrices which have a double eigenvalue is a projective variety of degree $\binom{n+1}{3}$. Surprisingly, this variety is not a hypersurface but has codimension 2. We also propose the following refinement of Theorem 65 in terms of real algebraic geometry:
Conjecture 66. There exist three real symmetric $n \times n$-matrices $A_0$, $A_1$ and $A_2$ such that all $\binom{n+1}{3}$ complex solutions $(x, y)$ to the problem in Theorem 65 have real coordinates.
Consider the case $n = 3$. The discriminant $\Delta$ of the symmetric matrix
$$X = \begin{pmatrix} a & b & c \\ b & d & e \\ c & e & f \end{pmatrix} \qquad (62)$$
is the discriminant of its characteristic polynomial. This is an irreducible homogeneous polynomial with 123 terms of degree 6 in the indeterminates $a, b, c, d, e, f$. It can be written as a sum of squares of ten cubic polynomials:
$$\begin{aligned}
\Delta = {} & 2(-acd + acf + b^2c - bde + bef - c^3 + cd^2 - cdf)^2 \\
& + 2(-abd + abf + b^3 - bc^2 + bdf - bf^2 - cde + cef)^2 \\
& + 2(abd - abf + ace - b^3 - bdf + be^2 + bf^2 - cef)^2 \\
& + 2(abe - acd + acf - bde - c^3 + cd^2 - cdf + ce^2)^2 \\
& + 2(-a^2e + abc + ade + aef - bcd - c^2e - def + e^3)^2 \\
& + 2(-a^2e + abc + ade + aef - b^2e - bcf - def + e^3)^2 \\
& + 14(b^2e - bcd + bcf - c^2e)^2 + 14(ace - bc^2 + be^2 - cde)^2 + 14(abe - b^2c - bef + ce^2)^2 \\
& + (a^2d - a^2f - ab^2 + ac^2 - ad^2 + af^2 + b^2d - c^2f + d^2f - de^2 - df^2 + e^2f)^2.
\end{aligned}$$
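The sum of squares formula for $\Delta$ can be tested numerically. The Python sketch below is our own check. At random integer symmetric matrices it compares the formula with the discriminant of the characteristic polynomial $\lambda^3 - t\lambda^2 + m\lambda - \det X$. Here $t$ is the trace and $m$ is the sum of the principal $2 \times 2$-minors. The discriminant is computed from the classical formula $p^2q^2 - 4q^3 - 4p^3r - 27r^2 + 18pqr$ for a monic cubic $\lambda^3 + p\lambda^2 + q\lambda + r$.

```python
import random

def disc_sym3(a, b, c, d, e, f):
    """Discriminant of the characteristic polynomial of [[a,b,c],[b,d,e],[c,e,f]]."""
    t = a + d + f                                            # trace
    m = (a * d - b * b) + (a * f - c * c) + (d * f - e * e)  # principal 2x2 minors
    det = a * (d * f - e * e) - b * (b * f - c * e) + c * (b * e - c * d)
    p, q, r = -t, m, -det        # characteristic polynomial x^3 + p x^2 + q x + r
    return p*p*q*q - 4*q**3 - 4*p**3*r - 27*r*r + 18*p*q*r

def ten_cubics_sos(a, b, c, d, e, f):
    """The weighted sum of squares of the ten cubics displayed above."""
    weight_two = [
        -a*c*d + a*c*f + b*b*c - b*d*e + b*e*f - c**3 + c*d*d - c*d*f,
        -a*b*d + a*b*f + b**3 - b*c*c + b*d*f - b*f*f - c*d*e + c*e*f,
        a*b*d - a*b*f + a*c*e - b**3 - b*d*f + b*e*e + b*f*f - c*e*f,
        a*b*e - a*c*d + a*c*f - b*d*e - c**3 + c*d*d - c*d*f + c*e*e,
        -a*a*e + a*b*c + a*d*e + a*e*f - b*c*d - c*c*e - d*e*f + e**3,
        -a*a*e + a*b*c + a*d*e + a*e*f - b*b*e - b*c*f - d*e*f + e**3,
    ]
    weight_fourteen = [
        b*b*e - b*c*d + b*c*f - c*c*e,
        a*c*e - b*c*c + b*e*e - c*d*e,
        a*b*e - b*b*c - b*e*f + c*e*e,
    ]
    last = (a*a*d - a*a*f - a*b*b + a*c*c - a*d*d + a*f*f
            + b*b*d - c*c*f + d*d*f - d*e*e - d*f*f + e*e*f)
    return (2 * sum(s * s for s in weight_two)
            + 14 * sum(s * s for s in weight_fourteen) + last * last)

rng = random.Random(0)
for _ in range(500):
    entries = [rng.randint(-9, 9) for _ in range(6)]
    assert disc_sym3(*entries) == ten_cubics_sos(*entries)
print(disc_sym3(1, 1, 0, 0, 1, 0), ten_cubics_sos(1, 1, 0, 0, 1, 0))  # 49 49
```

Random evaluation is a consistency check, not a proof. The identity itself follows from the determinantal formula for $\Delta$ given below.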
This polynomial defines a hypersurface in complex projective 5-space $\mathbb{P}^5$. What we are interested in is the complexification of the set of real points of this hypersurface. This is the subvariety of $\mathbb{P}^5$ defined by the ten cubic polynomials appearing in the above representation of $\Delta$. These cubics arise from the following determinantal presentation of our variety due to Ilyushechkin (1992). Consider the following two $3 \times 6$-matrices of polynomials (the entries of $F^T$ are linear forms, while the rows of $G$ have degrees 0, 1 and 2):
$$F^T = \begin{pmatrix} -b & b & 0 & a-d & -e & c \\ -c & 0 & c & -e & a-f & b \\ 0 & -e & e & -c & b & d-f \end{pmatrix},$$
$$G = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ a & d & f & b & c & e \\ a^2+b^2+c^2 & b^2+d^2+e^2 & c^2+e^2+f^2 & ab+bd+ce & ac+be+cf & bc+de+ef \end{pmatrix}.$$
The kernel of either matrix equals the row span of the other matrix,
$$G \cdot F = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
and this holds even when we take the kernel or row span as modules over the polynomial ring $S = \mathbb{R}[a, b, c, d, e, f]$. In other words, we have an exact sequence of free $S$-modules:
$$0 \longrightarrow S^3 \xrightarrow{\;F\;} S^6 \xrightarrow{\;G\;} S^3.$$
The set of ten cubics defining our variety coincides (up to sign) with the set of non-zero maximal minors of $F$ and also with the set of non-zero maximal minors of $G$. For instance, the 12-term cubic in the last summand of our formula for $\Delta$ equals, up to sign, the determinant of the last three columns of $F^T$ or of the first three columns of $G$. In fact, we have the following identity:
$$\Delta = \det\big(F^T \cdot \mathrm{diag}(2, 2, 2, 1, 1, 1) \cdot F\big) = \det\big(G \cdot \mathrm{diag}(1, 1, 1, 2, 2, 2) \cdot G^T\big).$$
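The matrices $F^T$ and $G$ and the two determinantal formulas for $\Delta$ can be checked the same way. The Python sketch below is ours. It builds both matrices at random integer points and verifies $G \cdot F = 0$. It then compares $\det(F^T \cdot \mathrm{diag}(2,2,2,1,1,1) \cdot F)$ and $\det(G \cdot \mathrm{diag}(1,1,1,2,2,2) \cdot G^T)$ with the discriminant of the characteristic polynomial. The entries of $G \cdot \mathrm{diag}(1,1,1,2,2,2) \cdot G^T$ are the power sums $\mathrm{trace}(X^{i+j})$, so the second formula is the classical Hankel determinant of power sums.

```python
import random

def disc_sym3(a, b, c, d, e, f):
    """Discriminant of the characteristic polynomial of [[a,b,c],[b,d,e],[c,e,f]]."""
    t = a + d + f
    m = (a * d - b * b) + (a * f - c * c) + (d * f - e * e)
    det = a * (d * f - e * e) - b * (b * f - c * e) + c * (b * e - c * d)
    p, q, r = -t, m, -det
    return p*p*q*q - 4*q**3 - 4*p**3*r - 27*r*r + 18*p*q*r

def Ft(a, b, c, d, e, f):
    return [[-b, b, 0, a - d, -e, c],
            [-c, 0, c, -e, a - f, b],
            [0, -e, e, -c, b, d - f]]

def G(a, b, c, d, e, f):
    return [[1, 1, 1, 0, 0, 0],
            [a, d, f, b, c, e],
            [a*a + b*b + c*c, b*b + d*d + e*e, c*c + e*e + f*f,
             a*b + b*d + c*e, a*c + b*e + c*f, b*c + d*e + e*f]]

def weighted_gram(R, w):
    """R * diag(w) * R^T for a 3 x 6 matrix R."""
    return [[sum(R[i][k] * w[k] * R[j][k] for k in range(6)) for j in range(3)]
            for i in range(3)]

def det3(M):
    (p, q, r), (s, t, u), (v, w, x) = M
    return p * (t * x - u * w) - q * (s * x - u * v) + r * (s * w - t * v)

rng = random.Random(1)
for _ in range(200):
    pt = [rng.randint(-9, 9) for _ in range(6)]
    R, S = Ft(*pt), G(*pt)
    # G . F = 0: every row of G is orthogonal to every row of F^T.
    assert all(sum(S[i][k] * R[j][k] for k in range(6)) == 0
               for i in range(3) for j in range(3))
    delta = disc_sym3(*pt)
    assert det3(weighted_gram(R, [2, 2, 2, 1, 1, 1])) == delta
    assert det3(weighted_gram(S, [1, 1, 1, 2, 2, 2])) == delta
print("all checks passed")
```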
The following two facts are easily checked with maple:
1. The subvariety of projective 5-space $\mathbb{P}^5$ defined by the $3 \times 3$-minors of either $F$ or $G$ is irreducible of codimension 2 and degree 4.
2. There exists a real 2-plane in $\mathbb{P}^5$ whose intersection with that subvariety consists of four distinct points whose coordinates are real.
These two facts are exactly what is claimed for $n = 3$ in Theorem 65 and in our conjecture. The exact sequence and the above formula for $\Delta$ exist for all values of $n$. This beautiful construction is due to Ilyushechkin (1992). We shall describe it in commutative algebra language. We write $\mathrm{Sym}_2(\mathbb{R}^n)$ for the space of symmetric $n \times n$-matrices, and we write $\wedge_2(\mathbb{R}^n)$ for the space of antisymmetric $n \times n$-matrices. These are real vector spaces of dimension $\binom{n+1}{2}$ and $\binom{n}{2}$ respectively. Let $X = (x_{ij})$ be a symmetric $n \times n$-matrix with indeterminate entries. Let $S = \mathbb{R}[X]$ denote the polynomial ring over the real numbers generated by the $\binom{n+1}{2}$ variables $x_{ij}$ and consider the free $S$-modules
$$\wedge_2(S^n) = \wedge_2(\mathbb{R}^n) \otimes S \quad \text{and} \quad \mathrm{Sym}_2(S^n) = \mathrm{Sym}_2(\mathbb{R}^n) \otimes S.$$
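Lemma 67. The following is an exact sequence of free $S$-modules:
$$0 \longrightarrow \wedge_2(S^n) \xrightarrow{\;F\;} \mathrm{Sym}_2(S^n) \xrightarrow{\;G\;} S^n, \qquad (63)$$
where the maps are defined as
$$F(A) = AX - XA \quad \text{and} \quad G(B) = \big( \mathrm{trace}(BX^i) \big)_{i = 0, \ldots, n-1}.$$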
Proof. It is easily seen that the sequence is a complex and is generically exact. The fact that it is exact follows from the Buchsbaum-Eisenbud criterion (Eisenbud 1995, Theorem 20.9), or, more specifically, by applying (Eisenbud 1995, Exercise 20.4) to the localizations of $S$ at maximal minors of $F$.
The following sum of squares representation is due to Ilyushechkin (1992).
Theorem 68. The discriminant of a symmetric $n \times n$-matrix $X$ equals
$$\Delta = \det(F^T \cdot F) = \det(G \cdot G^T), \qquad (64)$$
where $F$ and $G$ are matrices representing the maps $F$ and $G$ in suitable bases.
We now come to the proof of Theorem 65.
Proof. The dual sequence to (63) is also exact and it provides a minimal free resolution of the module $\mathrm{coker}(F^T)$. This module is Cohen-Macaulay of codimension 2 and the resolution can be written with degree shifts as follows:
$$0 \longrightarrow \bigoplus_{i=1}^n S(-i) \xrightarrow{\;G^T\;} S(-1)^{\binom{n+1}{2}} \xrightarrow{\;F^T\;} S^{\binom{n}{2}}.$$
The Hilbert series of the shifted polynomial ring $S(-i)$ is $x^i \cdot (1 - x)^{-\binom{n+1}{2}}$. The Hilbert series of the module $S(-1)^{\binom{n+1}{2}}$ is $\binom{n+1}{2} \cdot x \cdot (1 - x)^{-\binom{n+1}{2}}$. The Hilbert series of the module $\mathrm{coker}(F^T)$ is the alternating sum of the Hilbert series of the modules in this resolution, and it equals
$$\Big( \binom{n}{2} - \binom{n+1}{2} \cdot x + \sum_{i=1}^n x^i \Big) \cdot (1 - x)^{-\binom{n+1}{2}}.$$
Removing a factor of $(1 - x)^2$ from the parenthesized sum, we can rewrite this expression for the Hilbert series of $\mathrm{coker}(F^T)$ as follows:
$$\Big( \sum_{i=2}^n \binom{i}{2} x^{n-i} \Big) \cdot (1 - x)^{-\binom{n+1}{2} + 2}.$$
We know already that $\mathrm{coker}(F^T)$ is a Cohen-Macaulay module of codimension 2. Therefore we can conclude the following formula for its degree:
$$\mathrm{degree}\big( \mathrm{coker}(F^T) \big) = \sum_{i=2}^n \binom{i}{2} = \binom{n+1}{3}. \qquad (65)$$
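The division by $(1-x)^2$ and the degree formula (65) can be confirmed for many $n$ at once. The Python sketch below is ours and stores polynomials as coefficient lists indexed by exponent. It multiplies $(1-x)^2$ by $\sum_{i=2}^n \binom{i}{2} x^{n-i}$ and compares the result with the numerator $\binom{n}{2} - \binom{n+1}{2}x + \sum_{i=1}^n x^i$. It then evaluates the quotient at $x = 1$ to get the degree.

```python
from math import comb

def poly_mul(p, q):
    """Product of coefficient lists indexed by exponent (p[k] is the coefficient of x^k)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def numerator(n):
    """binom(n,2) - binom(n+1,2) x + sum_{i=1}^n x^i."""
    num = [0] * (n + 1)
    num[0] += comb(n, 2)
    num[1] -= comb(n + 1, 2)
    for i in range(1, n + 1):
        num[i] += 1
    return num

def quotient(n):
    """sum_{i=2}^n binom(i,2) x^(n-i): the numerator divided by (1 - x)^2."""
    q = [0] * (n - 1)
    for i in range(2, n + 1):
        q[n - i] += comb(i, 2)
    return q

for n in range(2, 30):
    assert poly_mul([1, -2, 1], quotient(n)) == numerator(n)   # (1 - x)^2 * quotient
    assert sum(quotient(n)) == comb(n + 1, 3)                  # degree formula (65)
print(quotient(4))  # [6, 3, 1], i.e. 6 + 3x + x^2, of degree 10 = binom(5, 3)
```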
Finally, let $\mathcal{X}$ be the support of the module $\mathrm{coker}(F^T)$. Thus $\mathcal{X}$ is precisely our codimension 2 variety which is cut out by the vanishing of the maximal minors of the matrix $F$. The generic fiber of the vector bundle on $\mathcal{X}$ represented by $\mathrm{coker}(F^T)$ is a one-dimensional space, since the rank drop of the matrix $F$ is only one if the underlying symmetric matrix has only one double eigenvalue and $n - 2$ distinct eigenvalues. We conclude that the degree of $\mathcal{X}$ equals the degree of the module $\mathrm{coker}(F^T)$. The identity in (65) now completes the proof of Theorem 65.
7.6
Exercises
(1) Solve the following one-variable problem, a slight modification of (b), using SOStools: Minimize $x$ subject to $x^4 - 10x^3 + 37x^2 - 61x + 36 = 0$.
(2) Take $g(x_1, x_2, \ldots, x_{10})$ to be your favorite inhomogeneous polynomial of degree three in ten variables. Make sure it looks random enough. Use SOStools to find the global minimum in $\mathbb{R}^{10}$ of the quartic polynomial $x_1^4 + x_2^4 + \cdots + x_{10}^4 + g(x_1, x_2, \ldots, x_{10})$.
(3) Nina and Pascal stand in the playground 10 meters apart and they each hold a ball of radius 10 cm. Suddenly they throw their balls at each other in a straight line at the same constant speed, say, 1 meter per second. At what time (measured in seconds) will their balls first hit? Formulate this using polynomial equations (and inequalities?) and explain how semidefinite programming can be used to solve it. Nina next suggests to Pascal that they replace their balls by more interesting semialgebraic objects, for instance, those defined by $x^{a_1} + y^{a_2} + z^{a_3} \leq 1$ for arbitrary integers $a_1, a_2, a_3$. Update your model and your SDP.
(4) Find the smallest positive real number $a$ such that the following three equations have a common solution in $\mathbb{R}^3$:
$$x^6 + 1 + ay^2 + az = y^6 + 1 + az^2 + ax = z^6 + 1 + ax^2 + ay = 0.$$
(5) What does the Duality Theorem of Semidefinite Programming say? What is the dual solution to the SDP problem which asks for a sum of squares representation of $f(x) - \lambda$? Can you explain the cryptic sentence "With a few more lines..." at the end of the third section?
(6) Write the discriminant of the symmetric $3 \times 3$-matrix (62) as a sum of squares, where the number of squares is as small as possible.
Polynomial Systems in Statistics
In this lecture we encounter three classes of polynomial systems arising in statistics and probability. The first one concerns the algebraic conditions characterizing conditional independence statements for discrete random variables. Computational algebra provides useful tools for analyzing such statements and for making inferences about conditional independence. The second class consists of binomial equations which represent certain moves for Markov chains. We discuss work of (Diaconis, Eisenbud & Sturmfels 1998) on the use of primary decomposition for quantifying the connectivity of Markov chains. The third class consists of the maximum likelihood equations of a log-linear model. We discuss several reformulations of these equations, in terms of posynomials and in terms of entropy maximization, and we present a classical numerical algorithm, called iterative proportional scaling, for solving the maximum likelihood equations. For additional background regarding the use of Gröbner bases in statistics we refer to the book Algebraic Statistics by Pistone, Riccomagno and Wynn (2001).
8.1
Conditional Independence
The set of probability distributions that satisfy a conditional independence statement is the zero set of certain polynomials and can hence be studied using methods from algebraic geometry. We call such a set an independence variety. In what follows we describe the polynomials defining independence varieties and we present some fundamental algebraic problems about them.
Let $X_1, \ldots, X_n$ denote discrete random variables, where $X_i$ takes values in the set $[d_i] = \{1, 2, \ldots, d_i\}$. We write $D = [d_1] \times [d_2] \times \cdots \times [d_n]$ so that $\mathbb{R}^D$ denotes the real vector space of $n$-dimensional tables of format $d_1 \times d_2 \times \cdots \times d_n$. We introduce an indeterminate $p_{u_1 u_2 \ldots u_n}$ which represents the probability of the event $X_1 = u_1, X_2 = u_2, \ldots, X_n = u_n$. These indeterminates generate the ring $\mathbb{R}[D]$ of polynomial functions on the space of tables $\mathbb{R}^D$. A conditional independence statement about $X_1, X_2, \ldots, X_n$ has the form
$$A \text{ is independent of } B \text{ given } C \qquad (\text{in symbols: } A \perp B \mid C) \qquad (66)$$
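where $A$, $B$ and $C$ are pairwise disjoint subsets of $\{X_1, \ldots, X_n\}$. If $C$ is the empty set then (66) just reads "$A$ is independent of $B$".
Proposition 69. The independence statement (66) translates into a set of quadratic polynomials in $\mathbb{R}[D]$ indexed by
$$\binom{\prod_{X_i \in A} [d_i]}{2} \times \binom{\prod_{X_j \in B} [d_j]}{2} \times \prod_{X_k \in C} [d_k], \qquad (67)$$
where $\binom{T}{2}$ denotes the set of 2-element subsets of a set $T$.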
Proof. Picking any element of the set (67) means choosing two distinct elements $a$ and $a'$ in $\prod_{X_i \in A} [d_i]$, two distinct elements $b$ and $b'$ in $\prod_{X_j \in B} [d_j]$, and an element $c$ in $\prod_{X_k \in C} [d_k]$, and this determines an expression involving probabilities:
$$\mathrm{Prob}(A = a, B = b, C = c) \cdot \mathrm{Prob}(A = a', B = b', C = c) - \mathrm{Prob}(A = a', B = b, C = c) \cdot \mathrm{Prob}(A = a, B = b', C = c).$$
To get our quadrics indexed by (67), we translate each of the probabilities $\mathrm{Prob}(\cdots)$ into a linear polynomial in $\mathbb{R}[D]$. Namely, $\mathrm{Prob}(A = a, B = b, C = c)$ equals the sum of all indeterminates $p_{u_1 u_2 \cdots u_n}$ which satisfy: for all $X_i \in A$, the $X_i$-coordinate of $a$ equals $u_i$; for all $X_j \in B$, the $X_j$-coordinate of $b$ equals $u_j$; and for all $X_k \in C$, the $X_k$-coordinate of $c$ equals $u_k$.
We define $I_{A \perp B \mid C}$ to be the ideal in the polynomial ring $\mathbb{R}[D]$ which is generated by the quadratic polynomials indexed by (67) and described above.
We illustrate the definition of the ideal $I_{A \perp B \mid C}$ with some simple examples. Take $n = 3$ and $d_1 = d_2 = d_3 = 2$, so that $\mathbb{R}^D$ is the 8-dimensional space of $2 \times 2 \times 2$-tables, and
$$\mathbb{R}[D] = \mathbb{R}[p_{111}, p_{112}, p_{121}, p_{122}, p_{211}, p_{212}, p_{221}, p_{222}].$$
The statement "$\{X_2\}$ is independent of $\{X_3\}$ given $\{X_1\}$" describes the ideal
$$I_{X_2 \perp X_3 \mid X_1} = \langle p_{111} p_{122} - p_{112} p_{121},\; p_{211} p_{222} - p_{212} p_{221} \rangle. \qquad (68)$$
The statement "$\{X_2\}$ is independent of $\{X_3\}$" determines the principal ideal
$$I_{X_2 \perp X_3} = \langle (p_{111} + p_{211})(p_{122} + p_{222}) - (p_{112} + p_{212})(p_{121} + p_{221}) \rangle. \qquad (69)$$
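The ideal $I_{X_1 \perp \{X_2, X_3\}}$ representing the statement "$\{X_1\}$ is independent of $\{X_2, X_3\}$" is generated by the six $2 \times 2$-subdeterminants of the $2 \times 4$-matrix
$$\begin{pmatrix} p_{111} & p_{112} & p_{121} & p_{122} \\ p_{211} & p_{212} & p_{221} & p_{222} \end{pmatrix}. \qquad (70)$$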
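The recipe in the proof of Proposition 69 is easy to implement. The following Python sketch uses function names of our own and indexes the random variables from 0. A table entry $p_{u_1 \cdots u_n}$ is encoded by the tuple $(u_1, \ldots, u_n)$. Each generator of $I_{A \perp B \mid C}$ is returned as a dictionary mapping a pair of table entries to an integer coefficient. For $2 \times 2 \times 2$-tables the code reproduces three examples: the two generators in (68), the single 8-term quadric in (69), and the six $2 \times 2$-minors in (70).

```python
from collections import defaultdict
from itertools import combinations, product

def cells(d, fixed):
    """Table entries u (values 1..d_i) with u[i] == v for all (i, v) in fixed.
    Their sum is the linear form Prob(...) from the proof of Proposition 69."""
    return [u for u in product(*(range(1, di + 1) for di in d))
            if all(u[i] == v for i, v in fixed.items())]

def ci_quadrics(d, A, B, C):
    """Generators of I_{A indep B | C}; A, B, C are lists of 0-based variable indices."""
    def states(S):
        return list(product(*(range(1, d[i] + 1) for i in S)))

    def prob(x, y, z):
        fixed = dict(zip(A, x))
        fixed.update(zip(B, y))
        fixed.update(zip(C, z))
        return cells(d, fixed)

    gens = []
    for (a, a2), (b, b2), c in product(combinations(states(A), 2),
                                       combinations(states(B), 2), states(C)):
        q = defaultdict(int)
        # Prob(a,b,c) Prob(a',b',c) - Prob(a',b,c) Prob(a,b',c), expanded into monomials
        for left, right, sign in ((prob(a, b, c), prob(a2, b2, c), 1),
                                  (prob(a2, b, c), prob(a, b2, c), -1)):
            for u in left:
                for v in right:
                    q[tuple(sorted((u, v)))] += sign
        gens.append({m: k for m, k in q.items() if k != 0})
    return gens

d = [2, 2, 2]
print(ci_quadrics(d, [1], [2], [0]))        # (68): p111 p122 - p112 p121, p211 p222 - p212 p221
print(len(ci_quadrics(d, [1], [2], [])[0]))  # 8 monomials in the quadric (69)
print(len(ci_quadrics(d, [0], [1, 2], [])))  # 6 generators, as in (70)
```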
The variety $V_{A \perp B | C}$ is defined as the set of common complex zeros of the polynomials in $I_{A \perp B | C}$. Thus $V_{A \perp B | C}$ is a set of complex $d_1 \times \cdots \times d_n$-tables, but in statistics applications we only care about the subset $V^{\geq 0}_{A \perp B | C}$ of tables whose entries are non-negative reals. These correspond to probability distributions that satisfy the independence fact $A \perp B | C$. We also consider the subsets $V^{\mathbb{R}}_{A \perp B | C}$ of real tables and $V^{> 0}_{A \perp B | C}$ of strictly positive tables. The variety $V_{A \perp B | C}$ is irreducible because the ideal $I_{A \perp B | C}$ is a prime ideal.
Many statistical models for categorical data can be described by a finite set of independence statements (66). An independence model is such a set:
$$ M = \{ A^{(1)} \perp B^{(1)} | C^{(1)}, \; A^{(2)} \perp B^{(2)} | C^{(2)}, \; \ldots, \; A^{(m)} \perp B^{(m)} | C^{(m)} \}. $$
This class of models includes all directed and undirected graphical models, to be discussed below. The ideal of the model $M$ is defined as the sum
$$ I_M = I_{A^{(1)} \perp B^{(1)} | C^{(1)}} + I_{A^{(2)} \perp B^{(2)} | C^{(2)}} + \cdots + I_{A^{(m)} \perp B^{(m)} | C^{(m)}}. $$
The independence variety is the set of tables which satisfy these polynomials:
$$ V_M = V_{A^{(1)} \perp B^{(1)} | C^{(1)}} \cap V_{A^{(2)} \perp B^{(2)} | C^{(2)}} \cap \cdots \cap V_{A^{(m)} \perp B^{(m)} | C^{(m)}}. $$
Problem 70. For which models $M$ is the independence ideal $I_M$ a prime ideal, and for which models $M$ is the independence variety $V_M$ irreducible?
As an example consider the following model for binary random variables:
$$ \text{MyModel} = \{ X_2 \perp X_3, \; X_1 \perp \{X_2, X_3\} \}. $$
The ideal of this model is neither prime nor radical. It decomposes as
$$ I_{\text{MyModel}} = I_{X_2 \perp X_3} + I_{X_1 \perp \{X_2, X_3\}} = I_{\text{Segre}} \cap \bigl( P^2 + I_{X_1 \perp \{X_2, X_3\}} \bigr) \tag{71} $$
where the first component is the independence ideal for the model
$$ \text{Segre} = \{ X_1 \perp \{X_2, X_3\}, \; X_2 \perp \{X_1, X_3\}, \; X_3 \perp \{X_1, X_2\} \}. $$
Thus $I_{\text{Segre}}$ is the prime ideal of the Segre embedding of $\mathbb{P}^1 \times \mathbb{P}^1 \times \mathbb{P}^1$ into $\mathbb{P}^7$. The second component in (71) is a primary ideal with radical
$$ P = \langle p_{111} + p_{211}, \; p_{112} + p_{212}, \; p_{121} + p_{221}, \; p_{122} + p_{222} \rangle. $$
Since $P$ has no non-trivial zeros in the non-negative orthant, we conclude that MyModel is equivalent to the complete independence model Segre:
$$ V^{\geq 0}_{\text{MyModel}} = V^{\geq 0}_{\text{Segre}}. $$
Thus equation (71) proves the following rule for binary random variables:
$$ X_2 \perp X_3 \text{ and } X_1 \perp \{X_2, X_3\} \quad \text{implies} \quad X_2 \perp \{X_1, X_3\}. \tag{72} $$
It would be a very nice project to determine the primary decompositions for all models on few random variables, say $n \leq 5$. A catalogue of all resulting rules is likely to be useful for applications in artificial intelligence. Clearly, some of the rules will be subject to the hypothesis that all probabilities involved be strictly positive. A good example is Proposition 3.1 in (Lauritzen 1996, page 29), which states that, for strictly positive densities,
$$ X_1 \perp X_2 \mid X_3 \text{ and } X_1 \perp X_3 \mid X_2 \quad \text{implies} \quad X_1 \perp \{X_2, X_3\}. $$
It corresponds to the primary decomposition
$$ I_{X_1 \perp X_2 | X_3} + I_{X_1 \perp X_3 | X_2} = I_{X_1 \perp \{X_2, X_3\}} \cap \langle p_{111}, p_{122}, p_{211}, p_{222} \rangle \cap \langle p_{112}, p_{121}, p_{212}, p_{221} \rangle. $$
The conditional independence statement (66) is called saturated if
$$ A \cup B \cup C = \{X_1, X_2, \ldots, X_n\}. $$
In that case $I_{A \perp B | C}$ is generated by differences of monomials. Such an ideal is called a binomial ideal. Recall from Lecture 5 that every binomial ideal has a primary decomposition into binomial ideals.
Proposition 71. The ideal $I_M$ is a binomial ideal if and only if the model $M$ consists of saturated independence statements.
8.2
Graphical Models
The property that the ideal $I_M$ is binomial holds for the important class of undirected graphical models. Let $G$ be an undirected graph with vertices $X_1, X_2, \ldots, X_n$. From the graph $G$ one derives three natural sets of saturated independence conditions:
$$ \text{pairwise}(G) \subseteq \text{local}(G) \subseteq \text{global}(G). \tag{73} $$
See (Lauritzen 1996, page 32) for details and definitions. For instance, $\text{pairwise}(G)$ consists of all independence statements
$$ X_i \perp X_j \mid \{X_1, \ldots, X_n\} \setminus \{X_i, X_j\} $$
where $X_i$ and $X_j$ are not connected by an edge in $G$. It is known that the ideal $I_{\text{global}(G)}$ is prime if and only if $G$ is a decomposable graph. This situation was studied by Takken (1999), Dobra and Sullivant (2002) and Geiger, Meek and Sturmfels (2002). These authors showed that the quadratic generators of $I_{\text{global}(G)}$ form a Gröbner basis.
Problem 72. For decomposable graphical models $G$, including chains, study the primary decomposition of the binomial ideals $I_{\text{pairwise}(G)}$ and $I_{\text{local}(G)}$.
For a general undirected graph $G$, the following problem makes sense:
Problem 73. Study the primary decomposition of the ideal $I_{\text{global}(G)}$.
The most important component in this decomposition is the prime ideal
$$ T_G := (I_{\text{pairwise}(G)} : p^\infty) = (I_{\text{global}(G)} : p^\infty). \tag{74} $$
This equation follows from the Hammersley-Clifford Theorem. Here $p$ denotes the product of all the indeterminates $p_{u_1 u_2 \ldots u_n}$. The ideal $T_G$ is called the toric ideal of the graphical model $G$. The most basic invariants of any projective variety are its dimension and its degree. There is an easy formula for the dimension of the variety of $T_G$, but its degree remains mysterious:
Problem 74. What is the degree of the toric ideal $T_G$ of a graphical model?
Example 75. We illustrate these definitions and problems for the graph $G$ which is the 4-chain $X_1 - X_2 - X_3 - X_4$. Here each $X_i$ is a binary random variable. The ideal coding the pairwise Markov property equals
$$ \begin{aligned} I_{\text{pairwise}(G)} = \langle\, & p_{1121} p_{2111} - p_{1111} p_{2121}, \; p_{1112} p_{2111} - p_{1111} p_{2112}, \; p_{1112} p_{1211} - p_{1111} p_{1212}, \\ & p_{1122} p_{2112} - p_{1112} p_{2122}, \; p_{1122} p_{2121} - p_{1121} p_{2122}, \; p_{1122} p_{1221} - p_{1121} p_{1222}, \\ & p_{1221} p_{2211} - p_{1211} p_{2221}, \; p_{1212} p_{2211} - p_{1211} p_{2212}, \; p_{2112} p_{2211} - p_{2111} p_{2212}, \\ & p_{1222} p_{2212} - p_{1212} p_{2222}, \; p_{1222} p_{2221} - p_{1221} p_{2222}, \; p_{2122} p_{2221} - p_{2121} p_{2222} \,\rangle. \end{aligned} $$
Solving these twelve binomial equations is not so easy. First, $I_{\text{pairwise}(G)}$ is not a radical ideal, which means that there exists a polynomial $f$ with $f^2 \in I_{\text{pairwise}(G)}$ but $f \notin I_{\text{pairwise}(G)}$. Using the division algorithm modulo $I_{\text{pairwise}(G)}$, one checks that the following binomial enjoys this property:
$$ f = p_{1111} p_{1212} p_{1222} p_{2121} - p_{1111} p_{1212} p_{1221} p_{2122}. $$
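As a sanity check on the twelve binomials, here is a sketch that generates one binomial for each non-edge of the chain and each state of the two remaining variables. It then evaluates them at a table of the form $p_{u_1u_2u_3u_4} = f(u_1,u_2)\,g(u_2,u_3)\,h(u_3,u_4)$, using hypothetical factor values `f`, `g`, `h`. Such a table factors along the chain, so it must satisfy the pairwise Markov property.

```python
from itertools import product

EDGES = {(0, 1), (1, 2), (2, 3)}          # the chain X1 - X2 - X3 - X4, 0-based


def cell(i, j, xi, xj, rest, c):
    """The 4-tuple with X_i = xi, X_j = xj and the two other variables set to c."""
    u = [0] * 4
    u[i], u[j] = xi, xj
    u[rest[0]], u[rest[1]] = c
    return tuple(u)


def pairwise_binomials():
    """p[a,b,c] p[a',b',c] - p[a',b,c] p[a,b',c] for X_i _|_ X_j | rest,
    one binomial per non-edge {i, j} and per state c of the other two variables."""
    out = []
    for i in range(4):
        for j in range(i + 1, 4):
            if (i, j) in EDGES:
                continue
            rest = [k for k in range(4) if k not in (i, j)]
            for c in product((1, 2), repeat=2):
                out.append(((cell(i, j, 1, 1, rest, c), cell(i, j, 2, 2, rest, c)),
                            (cell(i, j, 2, 1, rest, c), cell(i, j, 1, 2, rest, c))))
    return out


def evaluate(binomial, p):
    (u1, u2), (v1, v2) = binomial
    return p[u1] * p[u2] - p[v1] * p[v2]


# A hypothetical (unnormalized) table that factors along the chain.
f = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 5}
g = {(1, 1): 7, (1, 2): 11, (2, 1): 13, (2, 2): 17}
h = {(1, 1): 19, (1, 2): 23, (2, 1): 29, (2, 2): 31}
p = {u: f[u[0], u[1]] * g[u[1], u[2]] * h[u[2], u[3]]
     for u in product((1, 2), repeat=4)}

binomials = pairwise_binomials()
print(len(binomials))                                 # twelve binomials
print(all(evaluate(b, p) == 0 for b in binomials))    # all vanish on p
```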
An ideal basis of the radical of $I_{\text{pairwise}(G)}$ consists of the 12 quadrics and eight quartics such as $f$. The variety defined by $I_{\text{pairwise}(G)}$ has 33 irreducible components. One of these components is defined by the toric ideal
$$ \begin{aligned} T_G = I_{\text{pairwise}(G)} + \langle\, & p_{1122} p_{2221} - p_{1121} p_{2222}, \; p_{1221} p_{2212} - p_{1212} p_{2221}, \; p_{1222} p_{2211} - p_{1211} p_{2222}, \; p_{1112} p_{2211} - p_{1111} p_{2212}, \\ & p_{1222} p_{2121} - p_{1221} p_{2122}, \; p_{1121} p_{2112} - p_{1112} p_{2121}, \; p_{1212} p_{2111} - p_{1211} p_{2112}, \; p_{1122} p_{2111} - p_{1111} p_{2122} \,\rangle. \end{aligned} $$
The twenty binomial generators of the toric ideal $T_G$ form a Gröbner basis. The corresponding toric variety in $\mathbb{P}^{15}$ has dimension 8 and degree 34. Each of the other 32 minimal primes of $I_{\text{pairwise}(G)}$ is generated by a subset of the indeterminates. More precisely, among the components of our model there are four linear subspaces of dimension eight, such as the variety of
$$ \langle p_{0000}, p_{0011}, p_{0100}, p_{0111}, p_{1000}, p_{1011}, p_{1100}, p_{1111} \rangle, $$
there are 16 linear subspaces of dimension six, such as the variety of
$$ \langle p_{0000}, p_{0001}, p_{0010}, p_{0011}, p_{0100}, p_{0111}, p_{1011}, p_{1100}, p_{1101}, p_{1111} \rangle, $$
and there are 12 linear subspaces of dimension four, such as the variety of
$$ \langle p_{0000}, p_{0001}, p_{0010}, p_{0011}, p_{1000}, p_{1001}, p_{1010}, p_{1011}, p_{1100}, p_{1101}, p_{1110}, p_{1111} \rangle. \tag{75} $$
Each of these irreducible components gives a simplex of probability distributions which satisfies the pairwise Markov property but does not factor in the four-chain model. For instance, the ideal in (75) represents the tetrahedron consisting of all probability distributions with $X_1 = 0$ and $X_2 = 1$.
In this example, the solution to Problem 74 is 34. The degree of any projective toric variety equals the normalized volume of the associated convex polytope. In the setting of (Sturmfels 1995), this polytope is given by an integer matrix $A$. The integer matrix $A$ which encodes our toric ideal $T_G$ equals
1111 1 0 0 0 1 0 0 0 1 0 0 0 1112 1121 1122 1211 1212 1221 1222 2111 2112 2121 2122 2211 2212 2221 2222
1 0 0 0 1 0 0 0 0 1 0 0
1 0 0 0 0 1 0 0 0 0 1 0
1 0 0 0 0 1 0 0 0 0 0 1
0 1 0 0 0 0 1 0 1 0 0 0
0 1 0 0 0 0 1 0 0 1 0 0
0 1 0 0 0 0 0 1 0 0 1 0
0 1 0 0 0 0 0 1 0 0 0 1
0 0 1 0 1 0 0 0 1 0 0 0
0 0 1 0 1 0 0 0 0 1 0 0
0 0 1 0 0 1 0 0 0 0 1 0
0 0 1 0 0 1 0 0 0 0 0 1
0 0 0 1 0 0 1 0 1 0 0 0
0 0 0 1 0 0 1 0 0 1 0 0
0 0 0 1 0 0 0 1 0 0 1 0
0 0 0 1 0 0 0 1 0 0 0 1
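The following sketch rebuilds this $12 \times 16$ matrix from the chain and computes its rank with exact rational arithmetic. It assumes the row order shown above: the four states of $(X_1,X_2)$, then of $(X_2,X_3)$, then of $(X_3,X_4)$. Every column has coordinate sum 3, so all columns lie on an affine hyperplane that misses the origin. The dimension of their convex hull is therefore one less than the rank.

```python
from fractions import Fraction
from itertools import product

CELLS = list(product((1, 2), repeat=4))          # columns 1111, 1112, ..., 2222
CLIQUES = [(0, 1), (1, 2), (2, 3)]               # edges of the 4-chain, 0-based
PAIRS = list(product((1, 2), repeat=2))          # 11, 12, 21, 22


def chain_matrix():
    """Row (clique, s) has a 1 in column u exactly when u restricted to the clique is s."""
    return [[1 if (u[i], u[j]) == s else 0 for u in CELLS]
            for (i, j) in CLIQUES for s in PAIRS]


def rank(matrix):
    """Rank over the rationals by Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][col] != 0:
                factor = rows[k][col] / rows[r][col]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


A = chain_matrix()
print(rank(A))                          # 8
print({sum(col) for col in zip(*A)})    # {3}: every column sums to 3
```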
The convex hull of the 16 columns of this matrix is a 7-dimensional polytope in $\mathbb{R}^{12}$. The matrix has rank 8, and all of its columns lie on the affine hyperplane where the coordinates sum to 3. The normalized volume of this polytope equals 34.
We can generalize the definition of the toric ideal $T_G$ from graphical models to arbitrary independence models $M$. For any subset $A$ of $\{X_1, \ldots, X_n\}$ and any element $a$ of $\prod_{X_i \in A}[d_i]$, we consider the linear form $\mathrm{Prob}(A = a)$. This is the sum of all indeterminates $p_{u_1 u_2 \cdots u_n}$ such that the $X_i$-coordinate of $a$ equals $u_i$ for all $X_i \in A$. Let $p$ denote the product of all such linear forms $\mathrm{Prob}(A = a)$. We define the following ideal by saturation:
$$ T_M = (I_M : p^\infty). $$
Problem 76. Is $T_M$ the vanishing ideal of the set of those probability distributions which are limits of strictly positive distributions which satisfy $M$?
An affirmative answer to this question would imply that $T_M$ is always a radical ideal. Perhaps it is even always prime? A nice example is the model
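$$ M = \{ X_1 \perp X_2, \; X_1 \perp X_3, \; X_2 \perp X_3 \} $$
for three binary random variables. Its ideal $I_M$ is the intersection of four prime ideals, the last one of which is $T_M$:
$$ \begin{aligned} I_M = \; & \langle \mathrm{Prob}(X_1{=}1), \mathrm{Prob}(X_1{=}2), \mathrm{Prob}(X_2{=}1), \mathrm{Prob}(X_2{=}2) \rangle \\ \cap \; & \langle \mathrm{Prob}(X_1{=}1), \mathrm{Prob}(X_1{=}2), \mathrm{Prob}(X_3{=}1), \mathrm{Prob}(X_3{=}2) \rangle \\ \cap \; & \langle \mathrm{Prob}(X_2{=}1), \mathrm{Prob}(X_2{=}2), \mathrm{Prob}(X_3{=}1), \mathrm{Prob}(X_3{=}2) \rangle \\ \cap \; & \langle p_{112} p_{221} + p_{112} p_{222} - p_{121} p_{212} - p_{121} p_{222} - p_{122} p_{212} + p_{122} p_{221}, \\ & \;\; p_{121} p_{212} - p_{111} p_{221} - p_{111} p_{222} + p_{121} p_{211} - p_{211} p_{222} + p_{212} p_{221}, \\ & \;\; p_{111} p_{212} + p_{111} p_{222} - p_{112} p_{211} - p_{112} p_{221} + p_{211} p_{222} - p_{212} p_{221}, \\ & \;\; p_{111} p_{221} + p_{111} p_{222} - p_{121} p_{211} + p_{121} p_{222} - p_{122} p_{211} - p_{122} p_{221}, \\ & \;\; p_{111} p_{122} + p_{111} p_{222} - p_{112} p_{121} - p_{112} p_{221} + p_{121} p_{222} - p_{122} p_{221} \rangle. \end{aligned} $$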
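Here is a quick exact check of the five quadrics of $T_M$. It is a sketch, and the test tables are our own choice. The quadrics should vanish at completely independent tables $p_{ijk} = a_i b_j c_k$. They should also vanish at strictly positive tables that satisfy $M$ without being completely independent, since all the linear forms $\mathrm{Prob}(\cdots)$ are non-zero there. An example of such a table is a mixture of the uniform distribution with the distribution where $X_3 = 1$ exactly when $X_1 = X_2$.

```python
from fractions import Fraction
from itertools import product


def tm_generators(p):
    """Evaluate the five quadrics of T_M at a table p indexed by '111', ..., '222'."""
    def q(s, t):
        return p[s] * p[t]
    return [
        q('112', '221') + q('112', '222') - q('121', '212') - q('121', '222')
        - q('122', '212') + q('122', '221'),
        q('121', '212') - q('111', '221') - q('111', '222') + q('121', '211')
        - q('211', '222') + q('212', '221'),
        q('111', '212') + q('111', '222') - q('112', '211') - q('112', '221')
        + q('211', '222') - q('212', '221'),
        q('111', '221') + q('111', '222') - q('121', '211') + q('121', '222')
        - q('122', '211') - q('122', '221'),
        q('111', '122') + q('111', '222') - q('112', '121') - q('112', '221')
        + q('121', '222') - q('122', '221'),
    ]


KEYS = [''.join(map(str, u)) for u in product((1, 2), repeat=3)]

# A point of the Segre variety: p_ijk = a_i b_j c_k (complete independence).
a, b, c = (2, 3), (5, 7), (11, 13)
segre = {k: a[int(k[0]) - 1] * b[int(k[1]) - 1] * c[int(k[2]) - 1] for k in KEYS}

# Pairwise but not completely independent: mix "X3 = 1 iff X1 = X2" with uniform.
t = Fraction(1, 3)
parity = {k: (t / 4 if (k[2] == '1') == (k[0] == k[1]) else 0) + (1 - t) / 8
          for k in KEYS}

print(tm_generators(segre))     # five zeros
print(tm_generators(parity))    # five zeros
```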
The five generators for $T_M$ form a Gröbner basis.
An important class of non-saturated independence models arises from directed graphs as in (Lauritzen 1996, Section 3.2.2). Let $G$ be an acyclic directed graph with vertices $X_1, X_2, \ldots, X_n$. For any vertex $X_i$, let $\mathrm{pa}(X_i)$ denote the set of parents of $X_i$ in $G$ and let $\mathrm{nd}(X_i)$ denote the set of nondescendants of $X_i$ in $G$. The directed graphical model of $G$ is described by the following set of independence statements:
$$ \text{local}(G) = \{ X_i \perp \mathrm{nd}(X_i) \mid \mathrm{pa}(X_i) \; : \; i = 1, 2, \ldots, n \}. $$
Theorem 3.27 in (Lauritzen 1996) tells us that this model is well-behaved.
Problem 77. Is the ideal $I_{\text{local}(G)}$ prime, and hence equal to $T_{\text{local}(G)}$?
Assuming that the answer is yes, we simply write $I_G = I_{\text{local}(G)} = T_{\text{local}(G)}$ for the prime ideal of the directed graphical model $G$. It is known that decomposable models can be regarded as directed ones. This suggests:
Problem 78. Does the prime ideal $I_G$ of a directed graphical model $G$ have a quadratic Gröbner basis, generalizing the known Gröbner basis for decomposable (undirected graphical) models?
As an example consider the directed graph $G$ on four binary random variables with four edges $X_1 \to X_2$, $X_1 \to X_3$, $X_2 \to X_4$ and $X_3 \to X_4$. Here
$$ \text{local}(G) = \{ X_2 \perp X_3 \mid X_1, \; X_4 \perp X_1 \mid \{X_2, X_3\} \} $$
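and the prime ideal associated with this directed graphical model equals
$$ \begin{aligned} I_G = \langle\, & (p_{1111} + p_{1112})(p_{1221} + p_{1222}) - (p_{1121} + p_{1122})(p_{1211} + p_{1212}), \\ & (p_{2111} + p_{2112})(p_{2221} + p_{2222}) - (p_{2121} + p_{2122})(p_{2211} + p_{2212}), \\ & p_{1111} p_{2112} - p_{1112} p_{2111}, \; p_{1121} p_{2122} - p_{1122} p_{2121}, \\ & p_{1211} p_{2212} - p_{1212} p_{2211}, \; p_{1221} p_{2222} - p_{1222} p_{2221} \,\rangle. \end{aligned} $$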
This ideal is a complete intersection, i.e., its variety has codimension six. The six quadrics form a Gröbner basis with respect to a suitable monomial order.
In summary, statistical models described by conditional independence statements furnish a wealth of interesting algebraic varieties which are cut out by quadratic equations. Gaining a better understanding of independence varieties and their equations is likely to have a significant impact for the study of multidimensional tables and its applications to problems in statistics.
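To see the six quadrics of $I_G$ for the four-vertex directed graph in action, the following sketch builds a joint distribution that factors along the graph, $p = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_2, x_3)$. The conditional probability tables are hypothetical, and the generators are evaluated exactly. The first two quadrics involve the marginal obtained by summing out $X_4$, which is why the conditional tables must be normalized.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical conditional probability tables for X1 -> X2, X1 -> X3, {X2, X3} -> X4.
P1 = {1: F(1, 3), 2: F(2, 3)}
P2 = {1: {1: F(2, 5), 2: F(3, 5)}, 2: {1: F(5, 6), 2: F(1, 6)}}    # P2[x1][x2]
P3 = {1: {1: F(3, 8), 2: F(5, 8)}, 2: {1: F(1, 9), 2: F(8, 9)}}    # P3[x1][x3]
P4 = {(1, 1): {1: F(1, 3), 2: F(2, 3)}, (1, 2): {1: F(1, 4), 2: F(3, 4)},
      (2, 1): {1: F(1, 5), 2: F(4, 5)}, (2, 2): {1: F(1, 7), 2: F(6, 7)}}


def p(s):
    """Joint probability of the cell s = 'x1x2x3x4', factored along the DAG."""
    x1, x2, x3, x4 = map(int, s)
    return P1[x1] * P2[x1][x2] * P3[x1][x3] * P4[(x2, x3)][x4]


def m(s):
    """Marginal Prob(X1 X2 X3 = s), obtained by summing out X4."""
    return p(s + "1") + p(s + "2")


def generators_of_I_G():
    return [
        m("111") * m("122") - m("112") * m("121"),
        m("211") * m("222") - m("212") * m("221"),
        p("1111") * p("2112") - p("1112") * p("2111"),
        p("1121") * p("2122") - p("1122") * p("2121"),
        p("1211") * p("2212") - p("1212") * p("2211"),
        p("1221") * p("2222") - p("1222") * p("2221"),
    ]


print(generators_of_I_G())                                             # six zeros
print(sum(p("".join(map(str, u))) for u in product((1, 2), repeat=4)))  # 1
```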
8.3
Random Walks on the Integer Lattice
Let $B$ be a (typically finite) subset of the integer lattice $\mathbb{Z}^n$. The elements of $B$ are regarded as the moves or steps in a random walk on the lattice points in the non-negative orthant. More precisely, let $G_B$ be the graph with vertices the set $\mathbb{N}^n$ of non-negative integer vectors, where a pair of vectors $u, v$ is connected by an edge if and only if either $u - v$ or $v - u$ lies in $B$. The problem to be addressed in this section is to characterize the connected components of the graph $G_B$. Having a good understanding of the connected components and their higher connectivity properties is a necessary precondition for any study of specific Markov chains and their mixing time.
Example 79. Let $n = 5$ and consider the set of moves
$$ B = \{ (1, -1, -1, 1, 0), \; (1, -1, 0, -1, 1), \; (0, 1, -1, -1, 1) \}. $$
These three vectors span the kernel of the matrix
$$ A = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}. $$
The two rows of the matrix $A$ represent the sufficient statistics of the walk given by $B$. Two vectors $u, v \in \mathbb{N}^5$ lie in the same component of $G_B$ only if they have the same sufficient statistics. The converse is not quite true: we need additional inequalities. Two distinct non-negative integer vectors $u$ and $v$ lie in the same connected component of $G_B$ if and only if $A \cdot u = A \cdot v$ and
$$ u_1 + u_2 + u_3 \geq 1, \quad u_1 + u_2 + u_4 \geq 1, \quad u_2 + u_4 + u_5 \geq 1, \quad u_3 + u_4 + u_5 \geq 1 $$
and
$$ v_1 + v_2 + v_3 \geq 1, \quad v_1 + v_2 + v_4 \geq 1, \quad v_2 + v_4 + v_5 \geq 1, \quad v_3 + v_4 + v_5 \geq 1. $$
Returning to the general case, let $L$ denote the sublattice of $\mathbb{Z}^n$ generated by $B$. Computing the sufficient statistics amounts to computing the image under the canonical map $\mathbb{Z}^n \to \mathbb{Z}^n / L$. If $\mathbb{Z}^n / L$ is torsion-free then this map can be represented by an integer matrix $A$. A necessary condition for $u$ and $v$ to lie in the same component of $G_B$ is that they have the same image under the linear map $A$. Thus we are looking for conditions (e.g. linear inequalities) which, in conjunction with the obvious condition $u - v \in L$, will ensure that $v$ can be reached from $u$ in a random walk on $\mathbb{N}^n$ using steps from $B$ only.
We encode every vector $u$ in $B$ by a difference of two monomials, namely,
$$ x^{u^+} - x^{u^-} = \prod_{i : u_i > 0} x_i^{u_i} - \prod_{j : u_j < 0} x_j^{-u_j}. $$
Let $I_B$ denote the ideal in $S = \mathbb{Q}[x_1, \ldots, x_n]$ generated by the binomials $x^{u^+} - x^{u^-}$ where $u$ runs over $B$. Thus every binomial ideal encountered in these lectures can be interpreted as a graph on non-negative lattice vectors.
Theorem 80. Two vectors $u, v \in \mathbb{N}^n$ lie in the same connected component of $G_B$ if and only if the binomial $x^u - x^v$ lies in the binomial ideal $I_B$.
Our algebraic approach to studying the connectivity properties of the graph $G_B$ is to compute a suitable ideal decomposition:
$$ I_B = I_L \cap J_1 \cap J_2 \cap \cdots \cap J_r. $$
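Theorem 80 can be tested by brute force on Example 79. The following sketch uses our own helper names. It explores each component of $G_B$ by breadth-first search and compares the result with the prediction from the inequalities stated for Example 79, for all vectors in $\mathbb{N}^5$ of coordinate sum at most 4.

```python
from collections import deque
from itertools import product

MOVES = [(1, -1, -1, 1, 0), (1, -1, 0, -1, 1), (0, 1, -1, -1, 1)]   # the set B


def stats(u):
    """Sufficient statistics A*u for A = [[1,1,1,1,1],[1,2,3,4,5]]."""
    return (sum(u), sum((i + 1) * x for i, x in enumerate(u)))


def component(u):
    """Component of u in G_B by breadth-first search. Moves preserve the
    coordinate sum, so every component is finite."""
    seen, queue = {u}, deque([u])
    while queue:
        w = queue.popleft()
        for move in MOVES:
            for sign in (1, -1):
                nxt = tuple(x + sign * y for x, y in zip(w, move))
                if min(nxt) >= 0 and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def inequalities_hold(u):
    """The four inequalities of Example 79, one per monomial associated prime."""
    u1, u2, u3, u4, u5 = u
    return (u1 + u2 + u3 >= 1 and u1 + u2 + u4 >= 1
            and u2 + u4 + u5 >= 1 and u3 + u4 + u5 >= 1)


def mismatches(max_total):
    """Pairs u != v with equal statistics where the search and the prediction disagree."""
    vectors = [u for u in product(range(max_total + 1), repeat=5)
               if sum(u) <= max_total]
    bad = []
    for u in vectors:
        comp = component(u)
        for v in vectors:
            if v != u and stats(v) == stats(u):
                predicted = inequalities_hold(u) and inequalities_hold(v)
                if (v in comp) != predicted:
                    bad.append((u, v))
    return bad


print(mismatches(4))    # []
```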
This decomposition could be a binomial primary decomposition, or it could be some coarser decomposition where each $J_i$ still has many associated primes. The key requirement is that membership in each component $J_i$ should be describable by some easy combinatorial condition. Sometimes we can only give sufficient conditions for membership of $x^u - x^v$ in each $J_i$, and this will lead to sufficient conditions for $u$ and $v$ being connectable in $G_B$.
The lattice ideal $I_L$ encodes the congruence relation modulo $L = \mathbb{Z}B$. Two vectors $u$ and $v$ in $\mathbb{N}^n$ have the same sufficient statistics if and only if $x^u - x^v$ lies in $I_L$. Note that the lattice ideal $I_L$ is prime if and only if $\mathbb{Z}^n / L$ is torsion-free. This ideal always appears in the primary decomposition of $I_B$ because
$$ \bigl( I_B : (x_1 x_2 \cdots x_n)^\infty \bigr) = I_L. $$
This identity of ideals has the following interpretation for our application: two vectors $u, v \in \mathbb{N}^n$ lie in the same component of $G_B$ if they have the same sufficient statistics and their coordinates are positive enough.
Our discussion implies that Gröbner basis software can be used to determine the components of the graph $G_B$. For instance, the system of inequalities in Example 79 is the output o3 of the following Macaulay 2 session:
i1 : R = QQ[x1,x2,x3,x4,x5];
i2 : IB = ideal(x1*x4-x2*x3,x1*x5-x2*x4,x2*x5-x3*x4);
i3 : toString ass(IB)
o3 = { ideal(x1,x2,x3), ideal(x1,x2,x4), ideal(x2,x4,x5), ideal(x3,x4,x5),
       ideal(x4^2-x3*x5, x3*x4-x2*x5, x2*x4-x1*x5, x3^2-x1*x5, x2*x3-x1*x4, x2^2-x1*x3) }
i4 : IB == intersect ass(IB)
o4 = true
Two-dimensional contingency tables are ubiquitous in statistics, and it is a basic problem to study random walks on the set of all contingency tables with fixed margins. For instance, consider the set $\mathbb{N}^{4 \times 4}$ of non-negative integer $4 \times 4$-matrices. The ambient lattice $\mathbb{Z}^{4 \times 4}$ is isomorphic to $\mathbb{Z}^{16}$. The sufficient statistics are given by the row sums and column sums of the matrices. Equivalently, the sublattice $L$ consists of all matrices in $\mathbb{Z}^{4 \times 4}$ whose row sums and column sums are zero. The lattice ideal $I_L$ is the prime ideal generated by the thirty-six $2 \times 2$-minors of a $4 \times 4$-matrix $(x_{ij})$ of indeterminates. A natural question is to study the connectivity of the graph $G_B$ defined by some basis $B$ for the lattice $L$. For instance, take $B$ to be the set of nine adjacent $2 \times 2$-moves. The corresponding binomial ideal equals
$$ \begin{aligned} I_B = \langle\, & x_{12} x_{21} - x_{11} x_{22}, \; x_{13} x_{22} - x_{12} x_{23}, \; x_{14} x_{23} - x_{13} x_{24}, \\ & x_{22} x_{31} - x_{21} x_{32}, \; x_{23} x_{32} - x_{22} x_{33}, \; x_{24} x_{33} - x_{23} x_{34}, \\ & x_{32} x_{41} - x_{31} x_{42}, \; x_{33} x_{42} - x_{32} x_{43}, \; x_{34} x_{43} - x_{33} x_{44} \,\rangle. \end{aligned} $$
Theorem 80 tells us that two non-negative integer $4 \times 4$-matrices $(a_{ij})$ and $(b_{ij})$ with the same row and column sums can be connected by a sequence of adjacent $2 \times 2$-moves if and only if the binomial
$$ \prod_{1 \leq i, j \leq 4} x_{ij}^{a_{ij}} \; - \; \prod_{1 \leq i, j \leq 4} x_{ij}^{b_{ij}} $$
lies in the ideal $I_B$.
The primary decomposition of $I_B$ was computed in Lecture 5. This primary decomposition implies the following combinatorial result:
Proposition 81. Two non-negative integer $4 \times 4$-matrices with the same row and column sums can be connected by a sequence of adjacent $2 \times 2$-moves if both of them satisfy the following six inequalities:
(i) $a_{21} + a_{22} + a_{23} + a_{24} \geq 2$;
(ii) $a_{31} + a_{32} + a_{33} + a_{34} \geq 2$;
(iii) $a_{12} + a_{22} + a_{32} + a_{42} \geq 2$;
(iv) $a_{13} + a_{23} + a_{33} + a_{43} \geq 2$;
(v) $a_{12} + a_{22} + a_{23} + a_{24} + a_{31} + a_{32} + a_{33} + a_{43} \geq 1$;
(vi) $a_{13} + a_{21} + a_{22} + a_{23} + a_{32} + a_{33} + a_{34} + a_{42} \geq 1$.
We remark that these sufficient conditions remain valid if (at most) one of the four inequalities "$\geq 2$" is replaced by "$\geq 1$". No further relaxation of the conditions (i)-(vi) is possible, as is shown by the following two pairs of matrices, which cannot be connected by an adjacent $2 \times 2$-walk:
$$ \begin{pmatrix} 0&0&0&0 \\ 0&1&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0&0&0&0 \\ 0&0&1&1 \\ 0&1&0&0 \\ 0&1&0&0 \end{pmatrix}, \qquad \begin{pmatrix} 0&0&1&0 \\ 1&1&0&0 \\ 0&0&0&2 \\ 0&0&0&0 \end{pmatrix} \leftrightarrow \begin{pmatrix} 0&0&0&1 \\ 0&0&1&1 \\ 1&1&0&0 \\ 0&0&0&0 \end{pmatrix}. $$
The necessity of conditions (v) and (vi) is seen from the disconnected pairs [...] for any integer $n \geq 0$.
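The two pairs of matrices above can be checked by brute force. This sketch enumerates the connected component of a table under the nine adjacent $2 \times 2$-moves by breadth-first search. Moves preserve row and column sums, so each component is finite.

```python
from collections import deque


def adjacent_moves(n=4):
    """The (n-1)^2 adjacent 2x2 moves, flattened row by row."""
    moves = []
    for i in range(n - 1):
        for j in range(n - 1):
            m = [[0] * n for _ in range(n)]
            m[i][j] = m[i + 1][j + 1] = 1
            m[i][j + 1] = m[i + 1][j] = -1
            moves.append(tuple(x for row in m for x in row))
    return moves


def flat(table):
    return tuple(x for row in table for x in row)


def component(table, moves):
    """All non-negative tables reachable from `table` (breadth-first search)."""
    start = flat(table)
    seen, queue = {start}, deque([start])
    while queue:
        t = queue.popleft()
        for move in moves:
            for sign in (1, -1):
                nxt = tuple(x + sign * y for x, y in zip(t, move))
                if min(nxt) >= 0 and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def margins(table):
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


PAIRS = [
    ([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
     [[0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 0]]),
    ([[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, 2], [0, 0, 0, 0]],
     [[0, 0, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]),
]

for left, right in PAIRS:
    print(margins(left) == margins(right),
          flat(right) in component(left, adjacent_moves()))   # True False
```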
Such minimally disconnected pairs of matrices are derived by computing witnesses for the relevant associated primes of $I_B$.
Random walks arising from graphical models play a significant role in the statistical study of multi-dimensional contingency tables. A noteworthy real-world application of these techniques is the work on the U.S. census data by Stephen Fienberg and his collaborators at the National Institute of Statistical Sciences (http://www.niss.org/). Studying the connectivity problems of these random graphs is precisely the issue of Problems 72 and 73. Namely, given a graph $G$, each of the three sets of independence facts in (73) translates into a set of quadratic binomials and hence into a random walk on all tables with margins in the graphical model $G$. The primary decompositions of the binomial ideals $I_{\text{pairwise}(G)}$, $I_{\text{local}(G)}$ and $I_{\text{global}(G)}$ will furnish us with conditions under which two multi-dimensional tables are connected under the random walk. Example 75 is a good place to start; see Exercise (3) below.
We conclude with the family of circuit walks, which is very natural from a mathematical perspective. Let $A$ be a $d \times n$-integer matrix and $L = \ker_{\mathbb{Z}}(A) \subset \mathbb{Z}^n$ as before. The ideal $I_L$ is prime; it is the toric ideal associated with $A$. A non-zero vector $u = (u_1, \ldots, u_n)$ in $L$ is called a circuit if its coordinates $u_i$ are relatively prime and its support $\mathrm{supp}(u) = \{ i : u_i \neq 0 \}$ is minimal with respect to inclusion. We shall consider the walk defined by the set $C$ of all circuits in $L$. This makes sense for two reasons: the lattice $L$ is generated by the circuits, i.e., $\mathbb{Z}C = L$, and the circuits can be computed easily from the matrix $A$. Here is a simple algorithm for computing $C$. Initialize $C := \emptyset$. For any $(d+1)$-subset $\tau = \{\tau_1, \ldots, \tau_{d+1}\}$ of $\{1, \ldots, n\}$ form the vector
$$ C_\tau = \sum_{i=1}^{d+1} (-1)^i \det(A_{\tau \setminus \{\tau_i\}}) \, e_{\tau_i}, $$
where $e_j$ is the $j$th unit vector and $A_\sigma$ is the submatrix of $A$ with column indices $\sigma$. If $C_\tau$ is non-zero then remove common factors from its coordinates. The resulting vector is a circuit, and all circuits are obtained in this manner.
Example 82. Let $d = 2$, $n = 4$ and $A = \begin{pmatrix} 0 & 2 & 5 & 7 \\ 7 & 5 & 2 & 0 \end{pmatrix}$. Then
$$ C = \{ (3, -5, 2, 0), \; (5, -7, 0, 2), \; (2, 0, -7, 5), \; (0, 2, -5, 3) \}. $$
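The circuit algorithm above is short enough to transcribe directly. This sketch uses Laplace expansion for the $d \times d$ determinants, which is fine for small $d$ but not meant for large matrices. It normalizes each circuit so that its first non-zero coordinate is positive, since circuits come in pairs $\pm u$.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def det(rows):
    """Integer determinant by Laplace expansion along the first row."""
    if len(rows) == 1:
        return rows[0][0]
    return sum((-1) ** j * rows[0][j] * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))


def circuits(A):
    """Circuits of ker(A) from the vectors C_tau, one per (d+1)-subset tau."""
    d, n = len(A), len(A[0])
    found = set()
    for tau in combinations(range(n), d + 1):
        vec = [0] * n
        for i, t in enumerate(tau, start=1):
            cols = [s for s in tau if s != t]
            vec[t] = (-1) ** i * det([[row[s] for s in cols] for row in A])
        g = reduce(gcd, vec, 0)
        if g == 0:
            continue                          # C_tau = 0 gives no circuit
        vec = [x // g for x in vec]           # remove common factors
        if next(x for x in vec if x != 0) < 0:
            vec = [-x for x in vec]           # pick one sign of +-u
        found.add(tuple(vec))
    return found


A = [[0, 2, 5, 7], [7, 5, 2, 0]]
print(sorted(circuits(A)))
```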
It is instructive for Exercise (4) to check that the $\mathbb{Z}$-span of $C$ equals $L = \ker_{\mathbb{Z}}(A)$. (For instance, try to write $(1, -1, -1, 1) \in L$ as a $\mathbb{Z}$-linear combination of $C$.)
We shall derive the following result: two $L$-equivalent non-negative integer vectors $(A, B, C, D)$ and $(A', B', C', D')$ can be connected by the circuits if both of them satisfy the following inequality:
$$ \min\Bigl\{ \max\{A, B, C, D\}, \; \max\{B, \tfrac{9}{4} C, \tfrac{9}{4} D\}, \; \max\{\tfrac{9}{4} A, \tfrac{9}{4} B, C\} \Bigr\} \geq 9. $$
The following two $L$-equivalent pairs are not connected in the circuit walk:
$$ (4, 9, 0, 2) \leftrightarrow (5, 8, 1, 1) \quad \text{and} \quad (1, 6, 6, 1) \leftrightarrow (3, 4, 4, 3). \tag{76} $$
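For the second pair in (76), the component of $(1,6,6,1)$ in the circuit walk can be enumerated directly. The sketch below reuses the four circuits of Example 82. It encodes the inequality stated above with both sides multiplied by 4, so that everything stays in integer arithmetic.

```python
from collections import deque

A = [[0, 2, 5, 7], [7, 5, 2, 0]]
CIRCUITS = [(3, -5, 2, 0), (5, -7, 0, 2), (2, 0, -7, 5), (0, 2, -5, 3)]


def stats(u):
    return tuple(sum(a * x for a, x in zip(row, u)) for row in A)


def circuit_component(u):
    """Vectors reachable from u in the circuit walk (breadth-first search).
    Every circuit has coordinate sum 0, so the walk stays in a finite set."""
    seen, queue = {u}, deque([u])
    while queue:
        w = queue.popleft()
        for c in CIRCUITS:
            for sign in (1, -1):
                nxt = tuple(x + sign * y for x, y in zip(w, c))
                if min(nxt) >= 0 and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


def inequality_holds(v):
    """min{max{A,B,C,D}, max{B, 9C/4, 9D/4}, max{9A/4, 9B/4, C}} >= 9, times 4."""
    a, b, c, d = v
    return min(4 * max(a, b, c, d), max(4 * b, 9 * c, 9 * d),
               max(9 * a, 9 * b, 4 * c)) >= 36


u, v = (1, 6, 6, 1), (3, 4, 4, 3)
print(stats(u) == stats(v), v in circuit_component(u))    # True False
print(inequality_holds(u))                                 # False
```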
To analyze circuit walks in general, we consider the circuit ideal $I_C$ generated by the binomials $x^{u^+} - x^{u^-}$ where $u = u^+ - u^-$ runs over all circuits in $L$. The primary decomposition of circuit ideals was studied in Section 8 of (Eisenbud and Sturmfels 1996). We summarize the relevant results. Let $\mathrm{pos}(A)$ denote the $d$-dimensional convex polyhedral cone in $\mathbb{R}^d$ spanned by the column vectors of $A$. Each face of $\mathrm{pos}(A)$ is identified with the subset $\sigma \subseteq \{1, \ldots, n\}$ consisting of all indices $i$ such that the $i$th column of $A$ lies on that face. If $\sigma$ is a face of $\mathrm{pos}(A)$ then the ideal $I_\sigma := \langle x_i : i \notin \sigma \rangle + I_L$ is prime. Note that $I_{\{1, \ldots, n\}} = I_L$ and $I_{\{\}} = \langle x_1, x_2, \ldots, x_n \rangle$.
Theorem 83. (Eisenbud and Sturmfels 1996; Section 8) $\mathrm{Rad}(I_C) = I_L$ and
$$ \mathrm{Ass}(I_C) \subseteq \{ I_\sigma : \sigma \text{ is a face of } \mathrm{pos}(A) \}. $$
Applying the techniques of binomial primary decomposition to the circuit ideal $I_C$ gives connectivity properties of the circuit walk in terms of the faces of the polyhedral cone $\mathrm{pos}(A)$. Let us see how this works for Example 82. We choose variables $a, b, c, d$ for the four columns of $A$. The cone $\mathrm{pos}(A) = \mathrm{pos}\{(7,0), (5,2), (2,5), (0,7)\}$ equals the non-negative quadrant in $\mathbb{R}^2$. It has one 2-dimensional face, labeled $\{a, b, c, d\}$, two 1-dimensional faces, labeled $\{a\}$ and $\{d\}$, and one 0-dimensional face, labeled $\{\}$. The toric ideal is
$$ I_L = \langle ad - bc, \; ac^4 - b^3 d^2, \; a^3 c^2 - b^5, \; b^2 d^3 - c^5, \; a^2 c^3 - b^4 d \rangle. \tag{77} $$
The circuit ideal equals
$$ I_C = \langle a^3 c^2 - b^5, \; a^5 d^2 - b^7, \; a^2 d^5 - c^7, \; b^2 d^3 - c^5 \rangle. $$
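It has the minimal primary decomposition
$$ \begin{aligned} I_C = I_L \; & \cap \; \langle b^9, c^4, d^4, b^2 d^2, c^2 d^2, b^2 c^2 - a^2 d^2, b^5 - a^3 c^2 \rangle \\ & \cap \; \langle a^4, b^4, c^9, a^2 b^2, a^2 c^2, b^2 c^2 - a^2 d^2, c^5 - b^2 d^3 \rangle \\ & \cap \; \bigl( \langle a^9, b^9, c^9, d^9 \rangle + I_C \bigr). \end{aligned} $$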
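The second and third ideals are primary to $I_{\{a\}} = \langle b, c, d \rangle$ and to $I_{\{d\}} = \langle a, b, c \rangle$. This primary decomposition implies the inequality stated above for Example 82 because
$$ \langle a^9, b^9, c^9, d^9 \rangle \cap \langle b^9, c^4, d^4 \rangle \cap \langle a^4, b^4, c^9 \rangle \cap I_L \subseteq I_C. $$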
Returning to our general discussion, Theorem 83 implies that for each proper face $\sigma$ of the polyhedral cone $\mathrm{pos}(A)$ there exists a non-negative integer $M_\sigma$ such that
$$ I_L \cap \bigcap_{\sigma \text{ proper face}} \langle x_i : i \notin \sigma \rangle^{M_\sigma} \subseteq I_C. $$
Corollary 84. For each proper face $\sigma$ of $\mathrm{pos}(A)$ there is an integer $M_\sigma$ such that any two $L$-equivalent vectors $(a_1, \ldots, a_n)$ and $(b_1, \ldots, b_n)$ in $\mathbb{N}^n$ with
$$ \sum_{i \notin \sigma} a_i \geq M_\sigma \quad \text{and} \quad \sum_{i \notin \sigma} b_i \geq M_\sigma \qquad \text{for all proper faces } \sigma \text{ of } \mathrm{pos}(A) $$
can be connected in the circuit walk.
This suggests the following research problem.
Problem 85. Find bounds for the integers $M_\sigma$ in terms of the matrix $A$.
The optimal value of $M_\sigma$ seems to be related to the singularity of the toric variety defined by $I_L$ along the torus orbit labeled $\sigma$: the worse the singularity is, the higher the value of $M_\sigma$. It would be very interesting to understand these geometric aspects. In Example 82 the optimal values are $M_{\{\}} = 15$, $M_{\{a\}} = 11$ and $M_{\{d\}} = 11$. Optimality is seen from the pairs of disconnected vectors in (76).
8.4
Maximum Likelihood Equations
We fix a $d \times n$-integer matrix $A = (a_{ij})$ with the property that all column sums of $A$ are equal. As before we consider the polyhedral cone $\mathrm{pos}(A)$ and the sublattice $L = \ker_{\mathbb{Z}}(A)$ of $\mathbb{Z}^n$. The toric ideal $I_L$ is the prime ideal in $\mathbb{R}[x_1, \ldots, x_n]$ generated by all binomials $x^{u^+} - x^{u^-}$ where $u$ runs over $L$. We write $V_L^+$ for the set of zeros of $I_L$ in the non-negative orthant $\mathbb{R}^n_{\geq 0}$. This set is the log-linear model associated with $A$. Log-linear models include undirected graphical models and other statistical models defined by saturated independence facts. For instance, the graphical model for a four-chain of binary random variables corresponds to the $12 \times 16$-matrix $A$ in Example 75.
If an element $p$ of $\mathbb{R}^n_{\geq 0}$ has coordinate sum 1 then we regard $p$ as a probability distribution. The vector $A \cdot p$ in $\mathbb{R}^d$ is the sufficient statistic of $p$, and $p$ is independent in the log-linear model $A$ if and only if $p \in V_L^+$. The following result is fundamental both for statistics and for toric geometry.
Theorem 86. For any vector $p \in \mathbb{R}^n_{\geq 0}$ there exists a unique independent vector $\hat{p} \in V_L^+$ with the same sufficient statistics as $p$, i.e., $A \cdot \hat{p} = A \cdot p$.
The vector $\hat{p}$ is called the maximum likelihood estimate for $p$ in the model $A$. Computing the maximum likelihood estimate amounts to solving a system of polynomial equations. We write $\langle A \cdot x - A \cdot p \rangle$ for the ideal generated by the $d$ linear polynomials $\sum_{j=1}^n a_{ij}(x_j - p_j)$ for $i = 1, 2, \ldots, d$. The maximum likelihood ideal for the non-negative vector $p$ in the log-linear model $A$ is
$$ I_L + \langle A \cdot x - A \cdot p \rangle \subseteq \mathbb{R}[x_1, \ldots, x_n]. \tag{78} $$
We wish to find the zero $x = \hat{p}$. Theorem 86 can be reworded as follows.
Corollary 87. Each maximum likelihood ideal (78) has precisely one non-negative real root.
Proofs of Theorem 86 and Corollary 87 are based on convexity considerations. One such proof can be found in Chapter 4 of Fulton (1993). In toric geometry, the matrix $A$ represents the moment map from $V_L^+$, the non-negative part of the toric variety, onto the polyhedral cone $\mathrm{pos}(A)$. The version of Theorem 86 appearing in (Fulton 1993) states that the moment map defines a homeomorphism from $V_L^+$ onto $\mathrm{pos}(A)$.
As an example consider the log-linear model discussed in Example 82. Let us compute the maximum likelihood estimate for the probability distribution $p = (4/7, 0, 0, 3/7)$, whose sufficient statistic is $A \cdot p = (3, 4)$. The maximum likelihood ideal is given by the
two coordinates of $A \cdot x = A \cdot p$ and the five binomial generators of (77). More precisely, the maximum likelihood ideal (78) for this example equals
$$ \begin{aligned} \langle\, & x_1 x_4 - x_2 x_3, \; x_1 x_3^4 - x_2^3 x_4^2, \; x_1^3 x_3^2 - x_2^5, \; x_2^2 x_4^3 - x_3^5, \; x_1^2 x_3^3 - x_2^4 x_4, \\ & 0 x_1 + 2 x_2 + 5 x_3 + 7 x_4 - b_1, \; 7 x_1 + 5 x_2 + 2 x_3 + 0 x_4 - b_2 \,\rangle \end{aligned} $$
with $b_1 = 3$ and $b_2 = 4$. This ideal has exactly one real zero $x = \hat{p}$, which is necessarily non-negative by Corollary 87. We find numerically
$$ \hat{p} = (0.3134107644, \; 0.2726959080, \; 0.2213225526, \; 0.1925707745). $$
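Because $A$ has only two rows and equal column sums, the maximum likelihood estimate in this example can also be found without solving the full system (78). The sketch below uses our own parametrization, not the method of the text. Positive points of $V_L$ have the form $x_j = s\,r^{k_j}$ with $k = (0, 2, 5, 7)$, the first row of $A$. The two linear equations then reduce to a single monotone equation in $\log r$, which is solved by bisection. With $b = (3, 4)$ it should reproduce the numerical value of $\hat{p}$ above.

```python
import math


def mle_example_82(b1, b2, iterations=200):
    """Maximum likelihood estimate for A = [[0,2,5,7],[7,5,2,0]] with A*p = (b1, b2).

    Positive points of V_L are x_j = s * r**k_j with k = (0, 2, 5, 7) (the second
    row of A is 7 - k). A*x = (b1, b2) forces sum(x) = (b1 + b2) / 7 and a
    k-weighted mean of b1 / sum(x). That mean is increasing in log r, so
    bisection on log r finds the unique solution (assumes b1, b2 > 0)."""
    k = [0, 2, 5, 7]
    total = (b1 + b2) / 7
    target = b1 / total

    def mean_k(log_r):
        w = [math.exp(kj * log_r) for kj in k]
        return sum(kj * wj for kj, wj in zip(k, w)) / sum(w)

    lo, hi = -20.0, 20.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mean_k(mid) < target:
            lo = mid
        else:
            hi = mid
    w = [math.exp(kj * (lo + hi) / 2) for kj in k]
    return [total * wj / sum(w) for wj in w]


p_hat = mle_example_82(3, 4)
print([round(x, 10) for x in p_hat])
```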
There are other parameter values, for instance $b_1 = 1$, $b_2 = 50$, for which the above ideal has three real zeros. But always only one of them is non-negative.
The maximum likelihood ideal deserves further study from an algebraic point of view. First, for special points $p$ in $\mathbb{R}^n_{\geq 0}$, it can happen that the ideal (78) is not zero-dimensional. It would be interesting to characterize those special values of $p$. For generic values of $p$, the ideal (78) is always zero-dimensional and radical, and it is natural to ask how many complex zeros it has. This number is bounded above by the degree of the toric ideal $I_L$, and for many matrices $A$ these two numbers are equal. For instance, in the above example, the degree of $I_L$ is seven and the maximum likelihood equations have seven complex zeros.
Interestingly, these two numbers are not equal for most of the toric ideals which actually arise in statistics applications. For instance, for the four-chain model in Example 75, the degree of $I_L$ is 34 but the degree of the ideal (78) is 1; see Exercise (7) below. An explanation is offered by Proposition 4.18 in (Lauritzen 1996), which gives a rational formula for maximum likelihood estimation in a decomposable graphical model. This raises the following question for nondecomposable graphical models.
Problem 88. What is the number of complex zeros of the maximum likelihood equations for a nondecomposable graphical model $G$?
Geiger, Meek and Sturmfels (2002) proved that this number is always greater than one. It would be nice to identify log-linear models other than decomposable graphical models whose maximum likelihood estimator is rational. Equivalently, which toric varieties have a birational moment map?
Problem 89. Characterize the integer matrices $A$ whose maximum likelihood ideal (78) has exactly one complex solution, for each generic $p$.
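For the decomposable four-chain, the rational formula for maximum likelihood estimation takes the form $\hat{p}(u) = p_{12}(u_1,u_2)\,p_{23}(u_2,u_3)\,p_{34}(u_3,u_4) / \bigl(p_2(u_2)\,p_3(u_3)\bigr)$, built from marginals of $p$. The sketch below uses hypothetical data. It checks that this $\hat{p}$ has the same clique margins as $p$, which are the sufficient statistics for the $12 \times 16$ matrix of Example 75, and that it satisfies binomials of $T_G$. By the uniqueness in Theorem 86 it is then the maximum likelihood estimate.

```python
from fractions import Fraction
from itertools import product

CELLS = list(product((1, 2), repeat=4))


def marginal(p, idx):
    """Marginal table of p on the (0-based) variables in idx."""
    out = {}
    for u, value in p.items():
        key = tuple(u[i] for i in idx)
        out[key] = out.get(key, 0) + value
    return out


def chain_mle(p):
    """Closed-form estimate p12 * p23 * p34 / (p2 * p3) for the 4-chain,
    assuming the separator marginals p2 and p3 are positive."""
    p12, p23, p34 = marginal(p, (0, 1)), marginal(p, (1, 2)), marginal(p, (2, 3))
    p2, p3 = marginal(p, (1,)), marginal(p, (2,))
    return {u: p12[u[0], u[1]] * p23[u[1], u[2]] * p34[u[2], u[3]]
               / (p2[(u[1],)] * p3[(u[2],)])
            for u in CELLS}


# A hypothetical positive distribution: cell number k gets probability (k + 1) / 136.
p = {u: Fraction(k + 1, 136) for k, u in enumerate(CELLS)}
p_hat = chain_mle(p)
print(all(marginal(p_hat, c) == marginal(p, c) for c in [(0, 1), (1, 2), (2, 3)]))
```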
In the final version of these lecture notes, what will follow is the connection between maximum likelihood estimation, entropy minimization and optimization problems involving posynomials. Moreover, we shall present the method of iterative proportional scaling, which is widely used among statisticians for computing $\hat{p}$ from $p$. I hope to have this material included soon.
8.5
Exercises
(1) Let $X_1, X_2, X_3, X_4$ be binary random variables and consider the model
$$ M = \{ X_1 \perp X_2 \mid X_3, \; X_2 \perp X_3 \mid X_4, \; X_3 \perp X_4 \mid X_1, \; X_4 \perp X_1 \mid X_2 \}. $$
Compute the ideal $I_M$ and find the irreducible decomposition of the variety $V_M$. Does every component meet the probability simplex?
(2) Let $G$ be the cycle on five binary random variables. List the generators of the binomial ideal $I_{\text{pairwise}(G)}$ and compute the toric ideal $T_G$.
(3) Give a necessary and sufficient condition for two $2 \times 2 \times 2 \times 2$-contingency tables with the same margins in the four-chain model to be connected by pairwise Markov moves. In other words, use the primary decomposition of Example 75 to analyze the associated random walk.
(4) Prove that each sublattice $L$ of $\mathbb{Z}^n$ is spanned by its subset $C$ of circuits.
(5) Determine and interpret the three numbers $M_{\{\}}$, $M_{\{a\}}$ and $M_{\{d\}}$ for the circuit walk defined by the matrix $A = \begin{pmatrix} 0 & 3 & 7 & 10 \\ 10 & 7 & 3 & 0 \end{pmatrix}$.
(6) Compute the maximum likelihood estimate $\hat{p}$ for the probability distribution $p = (1/11, 2/11, 3/11, 6/11)$ in the log-linear model specified by the $2 \times 4$-matrix $A$ in the previous exercise.
(7) Write the maximum likelihood equations for the four-chain model in Example 75 and show that they have only one complex solution $x = \hat{p}$.
Tropical Algebraic Geometry
The tropical semiring is the extended real line $\mathbb{R} \cup \{-\infty\}$ with two arithmetic operations called tropical addition and tropical multiplication. The tropical sum of two numbers is their maximum and the tropical product of two numbers is their sum. We use the familiar symbols $+$ and $\times$ to denote these operations as well. The tropical semiring $(\mathbb{R} \cup \{-\infty\}, +, \times)$ satisfies many of the usual axioms of arithmetic such as $(a + b) \times c = (a \times c) + (b \times c)$. The additive unit is $-\infty$, the multiplicative unit is the real number $0$, and $x^2$ denotes $x \times x$. Tropical polynomials make perfect sense. Consider the cubic
$$ f(x) = 5 + (1) \times x + (0) \times x^2 + (-4) \times x^3. $$
Then, tropically, $f(3) = 6$. In this lecture we study the problem of solving systems of polynomial equations in the tropical semiring. The relationship to classical polynomial equations is given by valuation theory, specifically by considering Puiseux series solutions.
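The two tropical operations translate directly into code. This sketch evaluates the cubic above, $f(3) = \max(5, 1 + 3, 0 + 6, -4 + 9) = 6$, and spot-checks the distributive law on a few integers.

```python
import math

NEG_INF = -math.inf        # the tropical additive unit


def t_add(a, b):
    """Tropical sum: the maximum."""
    return max(a, b)


def t_mul(a, b):
    """Tropical product: the ordinary sum."""
    return a + b


def t_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * x^i tropically, i.e. max_i (coeffs[i] + i*x)."""
    result = NEG_INF
    for i, c in enumerate(coeffs):
        result = t_add(result, t_mul(c, i * x))
    return result


f = [5, 1, 0, -4]          # f(x) = 5 + (1)x + (0)x^2 + (-4)x^3
print(t_eval(f, 3))        # 6
```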
9.1
Tropical Geometry in the Plane
A tropical polynomial $f(x)$ in $n$ unknowns $x = (x_1, \ldots, x_n)$ is the maximum of a finite set of linear functions with $\mathbb{N}$-coefficients. Hence the graph of $f(x)$ is piecewise linear and convex. We define the variety of $f(x)$ as the set of points $x \in \mathbb{R}^n$ at which $f(x)$ is not differentiable. This is consistent with the intuitive idea that we are trying to solve $f(x) = -\infty$, given that $-\infty$ is the additive unit. Equivalently, the variety of $f(x)$ is the set of all points $x$ at which the maximum of the linear functions in $f(x)$ is attained at least twice.
Let us begin by deriving the solution to the general quadratic equation
$$ a \times x^2 + b \times x + c = 0. \tag{79} $$
Here $a, b, c$ are arbitrary real numbers. We wish to compute the tropical variety of (79). In ordinary arithmetic, this amounts to solving the equation
$$ \max\{a + 2x, \; b + x, \; c\} \text{ is attained twice.} \tag{80} $$
This is equivalent to
$$ a + 2x = b + x \geq c \quad \text{or} \quad a + 2x = c \geq b + x \quad \text{or} \quad b + x = c \geq a + 2x. $$
From this we conclude: the tropical solution set to the quadratic equation (79) equals $\{b - a, \, c - b\}$ if $a + c \leq 2b$, and it equals $\{(c - a)/2\}$ if $a + c \geq 2b$.
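The case analysis above can be checked numerically. The sketch below is restricted to integer coefficients so that it can use exact fractions. It returns the tropical roots and compares them with a brute-force search over a grid of half-integers.

```python
from fractions import Fraction


def tropical_quadratic_roots(a, b, c):
    """Tropical roots of a*x^2 + b*x + c for integer a, b, c (exact arithmetic)."""
    if a + c <= 2 * b:
        return sorted({b - a, c - b})
    return [Fraction(c - a, 2)]


def max_attained_twice(a, b, c, x):
    values = [a + 2 * x, b + x, c]
    return values.count(max(values)) >= 2


print(tropical_quadratic_roots(0, 5, 0))   # [-5, 5]
print(tropical_quadratic_roots(0, 0, 4))   # [Fraction(2, 1)]
```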
Our next step is the study of tropical lines in the plane. A tropical line is the tropical variety defined by a polynomial
$$ f(x, y) = a \times x + b \times y + c, $$
where $a, b, c$ are fixed real numbers. The tropical line is a star with three rays emanating in the directions West, South and Northeast. The midpoint of the star is the point $(x, y) = (c - a, c - b)$. This is the unique solution of $a + x = b + y = c$, meaning that the maximum involved in $f(x, y)$ is attained not just twice but three times. The following result is easily seen:
Proposition 90. Two general tropical lines always intersect in a unique point. Two general points always lie on a unique tropical line.
Figure: Tropical lines
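A point lies on the tropical line exactly when the maximum in $f(x, y)$ is attained at least twice. The sketch below uses hypothetical function names. It computes the midpoint $(c - a, c - b)$ and probes the three rays for the sample coefficients $a = 1$, $b = 2$, $c = 5$, which are chosen only for illustration.

```python
def on_tropical_line(a, b, c, x, y):
    """True if max{a + x, b + y, c} is attained at least twice."""
    values = [a + x, b + y, c]
    return values.count(max(values)) >= 2

def midpoint(a, b, c):
    """The unique point where a + x = b + y = c."""
    return (c - a, c - b)

a, b, c = 1, 2, 5                   # illustrative coefficients
x0, y0 = midpoint(a, b, c)          # (4, 3)
s = 2                               # distance travelled along each ray
print(on_tropical_line(a, b, c, x0 - s, y0))      # West: True
print(on_tropical_line(a, b, c, x0, y0 - s))      # South: True
print(on_tropical_line(a, b, c, x0 + s, y0 + s))  # Northeast: True
print(on_tropical_line(a, b, c, x0 + s, y0))      # East: False
```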
Consider now an arbitrary tropical polynomial in two variables
$$f(x, y) = \sum_{(i,j) \in A} \omega_{ij} \odot x^i y^j .$$
Here $A$ is a finite subset of $\mathbb{Z}^2$. Note that it is important to specify the support set $A$ because the term $\omega_{ij} \odot x^i y^j$ is present even if $\omega_{ij} = 0$. For any two points $(i', j')$, $(i'', j'')$ in $A$, we consider the system of linear inequalities
$$\omega_{i'j'} + i'x + j'y \;=\; \omega_{i''j''} + i''x + j''y \;\geq\; \omega_{ij} + ix + jy \quad \text{for } (i, j) \in A. \qquad (81)$$
The solution set of (81) is either empty, or a point, or a line segment, or a ray in $\mathbb{R}^2$. The union of these solution sets, as $(i', j')$, $(i'', j'')$ range over pairs of distinct points in $A$, is the tropical curve defined by $f(x, y)$. We use the following method to compute and draw this curve. For each point $(i, j)$ in $A$, plot the point $(i, j, \omega_{ij})$ in 3-space. The convex hull of these points is a 3-dimensional polytope. Consider the set of upper faces of this polytope. These are the faces which have an upward pointing outer normal. The collection of these faces maps bijectively onto the convex hull of $A$ under deleting the third coordinates. It defines a regular subdivision $\Delta$ of $A$. Proposition 91. The solution set to (81) is a segment if and only if $(i', j')$ and $(i'', j'')$ are connected by an interior edge in the regular subdivision $\Delta$, and it is a ray if and only if they are connected by a boundary edge of $\Delta$. The tropical curve of $f(x, y)$ is the union of these segments and rays. An analogous statement holds in higher dimensions: The tropical hypersurface of a multivariate polynomial $f(x_1, \ldots, x_n)$ is an unbounded polyhedral complex geometrically dual to the regular subdivision $\Delta$ of the support of $f$. If the coefficients of the tropical polynomial $f$ are sufficiently generic, then $\Delta$ is a regular triangulation and the hypersurface is said to be smooth. Returning to the case $n = 2$, here are a few examples of smooth curves. Example 92. (Two Quadratic Curves) A smooth quadratic curve in the plane is a trivalent graph with four vertices, connected by three bounded edges and six unbounded edges. These six rays come in three pairs which go off in directions West, South and Northeast. The primitive vectors on the three edges emanating from any vertex always sum to zero. Our first example is
$$f_1(x, y) = 0x^2 + 1xy + 0y^2 + 1x + 1y + 0.$$
The curve of $f_1(x, y)$ has the four vertices $(0, 0)$, $(1, 0)$, $(0, 1)$ and $(-1, -1)$:
Figure: A quadratic curve
We now gradually increase the coefficient of $x$ from 1 to 3 and we observe what happens to our curve during this homotopy. The final curve is
$$f_3(x, y) = 0x^2 + 1xy + 0y^2 + 3x + 1y + 0.$$
This curve has the four vertices $(-3, -1)$, $(-1, 1)$, $(1, 2)$ and $(3, 2)$:
Figure: Another quadratic curve
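The vertices listed for $f_1$ and $f_3$ can be recomputed by brute force. A vertex of a tropical plane curve is a point where at least three terms with non-collinear exponents attain the maximum. The sketch below solves the $2 \times 2$ linear system for every triple of terms and keeps the solutions at which the common value is maximal. It is an illustration only: it loops over all triples rather than building the upper hull described in the text, and `curve_vertices` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations

def curve_vertices(terms):
    """terms maps exponents (i, j) to coefficients w_ij of w_ij (.) x^i y^j.
    Returns the points where at least three terms with non-collinear
    exponents attain max over (i, j) of w_ij + i*x + j*y."""
    items = [((i, j), Fraction(w)) for (i, j), w in terms.items()]
    found = set()
    for ((i1, j1), w1), ((i2, j2), w2), ((i3, j3), w3) in combinations(items, 3):
        # Solve w1 + i1 x + j1 y = w2 + i2 x + j2 y = w3 + i3 x + j3 y.
        a11, a12, r1 = i1 - i2, j1 - j2, w2 - w1
        a21, a22, r2 = i1 - i3, j1 - j3, w3 - w1
        det = a11 * a22 - a12 * a21
        if det == 0:                    # collinear exponents: no isolated solution
            continue
        x = (r1 * a22 - a12 * r2) / det
        y = (a11 * r2 - r1 * a21) / det
        top = w1 + i1 * x + j1 * y
        if all(w + i * x + j * y <= top for (i, j), w in items):
            found.add((x, y))
    return found

f1 = {(2, 0): 0, (1, 1): 1, (0, 2): 0, (1, 0): 1, (0, 1): 1, (0, 0): 0}
f3 = dict(f1)
f3[(1, 0)] = 3                          # raise the coefficient of x from 1 to 3
print(sorted((float(x), float(y)) for x, y in curve_vertices(f1)))
# [(-1.0, -1.0), (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(sorted((float(x), float(y)) for x, y in curve_vertices(f3)))
# [(-3.0, -1.0), (-1.0, 1.0), (1.0, 2.0), (3.0, 2.0)]
```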
Example 93. (Two Elliptic Curves) The genus of a smooth tropical curve is the number of bounded regions in its complement. The two quadratic curves divide the plane into six regions, all of them unbounded, so their genus is zero. A tropical elliptic curve has precisely one bounded region in its complement. A smooth cubic curve in the projective plane has this property:
Figure: A cubic curve
Of course, we can also pick a different support set whose convex hull has exactly one interior lattice point. An example is the square of side length 2. It corresponds to a curve of bidegree $(2, 2)$ in the product of two projective lines $\mathbb{P}^1 \times \mathbb{P}^1$. Such curves are elliptic, as the following picture shows:
Figure: A biquadratic curve
The result of Proposition 90 can be extended from tropical lines to tropical curves of any degree, and, in fact, to tropical hypersurfaces in any dimension. Theorem 94. (Tropical Bézout-Bernstein) Two general tropical curves of degrees $d$ and $e$ intersect in $d \cdot e$ points, counting multiplicities as explained below. More generally, the number of intersection points of two tropical curves with prescribed Newton polygons equals the mixed area of these polygons. We need to explain the multiplicities arising when intersecting two tropical curves. Consider two lines with rational slopes in the plane, where the primitive lattice vectors along the lines are $(u_1, v_1)$ and $(u_2, v_2)$. The two lines meet in exactly one point if and only if the determinant $u_1 v_2 - u_2 v_1$ is nonzero. The multiplicity of this intersection point is defined as $|u_1 v_2 - u_2 v_1|$. This definition of multiplicity ensures that the total count of the intersection points is invariant under parallel displacement of the tropical curves. For instance, in the case of two curves in the tropical projective plane, we can displace the curves of degree $d$ and $e$ in such a way that all intersection points are gotten by intersecting the Southern rays of the first curve with the Western rays of the second curve. Clearly, there are precisely $d \cdot e$ such intersection points, and their local multiplicities are all one. To prove the tropical Bernstein theorem, we use exactly the same method as in Lecture 3. Namely, we observe that the union of the two curves is the geometric dual of a mixed subdivision of the Minkowski sum of the two Newton polygons. The mixed cells in this mixed subdivision correspond to the intersection points of the two curves. The local intersection multiplicity at such a point, $|u_1 v_2 - u_2 v_1|$, is the area of the corresponding mixed cell. Hence the mixed area, which is the total area of all mixed cells, coincides with the number of intersection points, counting multiplicity. The following picture demonstrates this reasoning for the intersection of two quadratic curves.
Figure: The tropical Bézout theorem
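Both ingredients of Theorem 94 are easy to compute for small examples. The sketch below uses hypothetical helper names. It computes the local multiplicity $|u_1 v_2 - u_2 v_1|$, and it computes the mixed area as $\mathrm{area}(P + Q) - \mathrm{area}(P) - \mathrm{area}(Q)$ using a monotone-chain convex hull. This is the normalization under which two triangles of degrees $d$ and $e$ give $d \cdot e$.

```python
from fractions import Fraction

def multiplicity(p, q):
    """|u1 v2 - u2 v1| for primitive direction vectors p = (u1, v1), q = (u2, v2)."""
    return abs(p[0] * q[1] - q[0] * p[1])

def convex_hull(points):
    """Hull vertices in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for pt in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], pt) <= 0:
            lower.pop()
        lower.append(pt)
    for pt in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], pt) <= 0:
            upper.pop()
        upper.append(pt)
    return lower[:-1] + upper[:-1]

def area(points):
    """Area of the convex hull of finitely many lattice points (shoelace formula)."""
    hull = convex_hull(points)
    m = len(hull)
    twice = sum(hull[k][0] * hull[(k + 1) % m][1] - hull[(k + 1) % m][0] * hull[k][1]
                for k in range(m))
    return Fraction(abs(twice), 2)

def mixed_area(P, Q):
    """area(P + Q) - area(P) - area(Q), where P + Q is the Minkowski sum."""
    minkowski = [(p[0] + q[0], p[1] + q[1]) for p in P for q in Q]
    return area(minkowski) - area(P) - area(Q)

def triangle(d):
    """Newton polygon of a general plane curve of degree d."""
    return [(0, 0), (d, 0), (0, d)]

square = [(0, 0), (2, 0), (0, 2), (2, 2)]      # Newton polygon of bidegree (2, 2)
print(mixed_area(triangle(2), triangle(3)))    # 6 = 2 * 3
print(mixed_area(triangle(2), triangle(2)))    # 4: two quadratic curves
print(mixed_area(square, triangle(1)))         # 4
print(multiplicity((0, -1), (-1, 0)))          # 1: a Southern ray meets a Western ray
```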
9.2
Amoebas and their Tentacles
Let $X$ be any subvariety of the $n$-dimensional algebraic torus $(\mathbb{C}^*)^n$. The amoeba of $X$ is defined to be the image $\log(X)$ of $X$ under the coordinatewise logarithm map from $(\mathbb{C}^*)^n$ into $\mathbb{R}^n$:
$$\log : (\mathbb{C}^*)^n \to \mathbb{R}^n, \quad (z_1, \ldots, z_n) \mapsto \bigl(\log|z_1|, \log|z_2|, \ldots, \log|z_n|\bigr). \qquad (82)$$
The computational study of amoebas is an important new direction in the general field of Solving Polynomial Equations. Even testing membership in the amoeba is a non-trivial problem. Consider the question whether or not the origin $(0, 0, \ldots, 0)$ lies in $\log(X)$, where $X$ is given by its vanishing ideal of Laurent polynomials. This problem is equivalent to the following: Given a system of polynomial equations over the complex numbers, does there exist a solution all of whose coordinates are complex numbers of unit length? We shall not pursue this question any further here. Instead, we shall take a closer look at the tentacles of the amoeba. The term amoeba was coined by Gelfand, Kapranov and Zelevinsky (1994). In the case when $X$ is a hypersurface, the complement of $\log(X)$ in $\mathbb{R}^n$ is a union of finitely many open convex regions, at most one for each lattice point in the Newton polytope of the defining polynomial of $X$. For $n = 2$, the amoeba does look like one of these biological organisms, with unbounded tentacles going off to infinity. These tentacle directions are normal to the edges of the Newton polygon, just like the tentacles of a tropical curve. We shall see that this is no coincidence.
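For a line, the membership question has a closed-form answer, which makes a convenient test case. Assume all three coefficients of $c_0 + c_1 z_1 + c_2 z_2$ are nonzero. Then a point $(u_1, u_2)$ lies in the amoeba exactly when the moduli $|c_0|$, $|c_1| e^{u_1}$, $|c_2| e^{u_2}$ satisfy the triangle inequalities, because three complex numbers with given moduli can be rotated to sum to zero precisely in that case. The sketch below (hypothetical names) uses this fact to answer the unit-circle question at the origin for $1 + z_1 + z_2 = 0$. It confirms the answer with the cube roots of unity, then probes one point on a tentacle and one point in a complement region.

```python
import cmath
import math

def log_map(z):
    """Coordinatewise logarithm (82): (z_1, ..., z_n) -> (log|z_1|, ..., log|z_n|)."""
    return tuple(math.log(abs(zk)) for zk in z)

def in_line_amoeba(c, u):
    """Is u = (u1, u2) in the amoeba of {c0 + c1*z1 + c2*z2 = 0}? Assumes all c_k != 0.
    Three complex numbers with moduli r0, r1, r2 can sum to zero
    iff each r_k is at most the sum of the other two."""
    r = [abs(c[0]), abs(c[1]) * math.exp(u[0]), abs(c[2]) * math.exp(u[1])]
    return all(2 * rk <= sum(r) for rk in r)

# The origin lies in the amoeba of 1 + z1 + z2 = 0: take cube roots of unity.
omega = cmath.exp(2j * math.pi / 3)
z = (omega, omega ** 2)
print(abs(1 + z[0] + z[1]))                  # about 1e-16
print(log_map(z))                            # about (0.0, 0.0)
print(in_line_amoeba((1, 1, 1), (0, 0)))     # True
print(in_line_amoeba((1, 1, 1), (-5, 0)))    # True: far out on the western tentacle
print(in_line_amoeba((1, 1, 1), (-5, -5)))   # False: inside a complement region
```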
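Given any variety $X$ in $(\mathbb{C}^*)^n$ we define a subset $B(X)$ of the unit $(n-1)$-sphere $S^{n-1}$ in $\mathbb{R}^n$ as follows. A point $p \in S^{n-1}$ lies in $B(X)$ if and only if there exists a sequence of vectors $p^{(1)}, p^{(2)}, p^{(3)}, \ldots$ in $\mathbb{R}^n$ such that
$$p^{(r)} \in \log(X) \cap r \cdot S^{n-1} \ \text{ for all } r \geq 1 \quad \text{and} \quad \lim_{r \to \infty} \tfrac{1}{r}\, p^{(r)} = p.$$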
The set $B(X)$ was first introduced by George Bergman (1971) who called it the logarithmic limit set of the variety $X$. We write $\mathcal{B}(X)$ for the subset of all vectors $p$ in $\mathbb{R}^n$ such that either $p = 0$ or $\frac{1}{\|p\|}\, p$ lies in $B(X)$. We refer to $B(X)$ as the Bergman complex of $X$ and to $\mathcal{B}(X)$ as the Bergman fan of $X$. These objects are polyhedral by the following result: Theorem 95. The Bergman fan $\mathcal{B}(X)$ of a $d$-dimensional irreducible subvariety $X$ of $(\mathbb{C}^*)^n$ is a finite union of rational $d$-dimensional convex polyhedral cones with apex at the origin. The intersection of any two cones is a common face of each. Hence $B(X)$ is a pure $(d-1)$-dimensional polyhedral complex. Before discussing the proof of this theorem, let us consider some special cases of low dimension or low codimension. Clearly, if $X = X_1 \cup X_2 \cup \cdots \cup X_r$ is a reducible variety then its Bergman complex equals $B(X) = B(X_1) \cup B(X_2) \cup \cdots \cup B(X_r)$. We start out with the case when each $X_i$ is a point.
$d = 0$: If $X$ is a finite subset of $(\mathbb{C}^*)^n$ then $B(X)$ is the empty set.
$d = 1$: If $X$ is a curve then $B(X)$ is a finite subset of the unit sphere. The directions in $B(X)$ are called critical tropisms in singularity theory.
$d = 2$: If $X$ is a surface then $B(X)$ is a graph embedded in the unit sphere $S^{n-1}$. This geometric graph retains all the symmetries of $X$.
$d = n - 1$: If $X$ is a hypersurface whose defining polynomial has the Newton polytope $P$ then $B(X)$ is the intersection of $S^{n-1}$ with the collection of proper faces in the normal fan of $P$. Thus $B(X)$ is a radial projection of the $(n-2)$-skeleton of the dual polytope $P^*$.
Bergman (1971) showed that $B(X)$ is a discrete union of spherical polytopes, and he conjectured that this union is finite and equidimensional. This conjecture was proved using valuation theory by Bieri and Groves (1984). In what follows we shall outline a simpler proof using Gröbner bases.
Let $I$ be any ideal in the Laurent polynomial ring $R = \mathbb{C}[x_1^{\pm 1}, \ldots, x_n^{\pm 1}]$. For instance, $I$ could be the prime ideal defining our irreducible variety $X$. For a fixed weight vector $\omega \in \mathbb{R}^n$, we use the following notation. For any Laurent polynomial $f = \sum c_\alpha x^\alpha$, the initial form $\mathrm{in}_\omega(f)$ is the sum of all terms $c_\alpha x^\alpha$ such that the inner product $\omega \cdot \alpha$ is maximal. The initial ideal $\mathrm{in}_\omega(I)$ is the ideal generated by the initial forms $\mathrm{in}_\omega(f)$ where $f$ runs over $I$. Note that $\mathrm{in}_\omega(I)$ will be the unit ideal in $R$ if $\omega$ is chosen sufficiently generic. We are interested in the set of exceptional $\omega$ for which $\mathrm{in}_\omega(I)$ does not contain any monomials (i.e. units). This is precisely the Bergman fan. Lemma 96. Let $X$ be any variety in $(\mathbb{C}^*)^n$ and $I$ its vanishing ideal. Then
$$\mathcal{B}(X) = \{\, \omega \in \mathbb{R}^n : \mathrm{in}_\omega(I) \text{ does not contain a monomial} \,\}.$$
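For a hypersurface $X = V(f)$ the criterion of Lemma 96 can be applied to the single generator. The initial form of a product is the product of the initial forms, so $\mathrm{in}_\omega(\langle f \rangle) = \langle \mathrm{in}_\omega(f) \rangle$, and this principal ideal of the Laurent ring contains a monomial exactly when $\mathrm{in}_\omega(f)$ is a single term. The sketch below uses hypothetical names and stores polynomials as dictionaries from exponent vectors to coefficients. It tests the line $x + y + 1 = 0$, and it computes the initial form of the first Plücker quadric in (84) for $\omega = -e_{01}$.

```python
def initial_form(f, w):
    """Terms c_alpha x^alpha of f whose weight w . alpha is maximal.
    f is a dict mapping exponent tuples alpha to nonzero coefficients."""
    weight = {alpha: sum(wi * ai for wi, ai in zip(w, alpha)) for alpha in f}
    top = max(weight.values())
    return {alpha: c for alpha, c in f.items() if weight[alpha] == top}

def in_bergman_fan_of_hypersurface(f, w):
    """w lies in the Bergman fan of V(f) iff in_w(f) has at least two terms."""
    return len(initial_form(f, w)) >= 2

line = {(1, 0): 1, (0, 1): 1, (0, 0): 1}           # x + y + 1
for w in [(1, 1), (-1, 0), (0, -1), (1, 0), (-1, -1)]:
    print(w, initial_form(line, w), in_bergman_fan_of_hypersurface(line, w))

# Variables ordered p01, p02, p03, p04, p12, p13, p14, p23, p24, p34.
def e(*idx):
    v = [0] * 10
    for k in idx:
        v[k] = 1
    return tuple(v)

quadric = {e(2, 4): 1, e(1, 5): -1, e(0, 7): 1}    # p03 p12 - p02 p13 + p01 p23
minus_e01 = (-1,) + (0,) * 9
print(initial_form(quadric, minus_e01))            # the binomial p03 p12 - p02 p13
```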
We sometimes use the notation $\mathcal{B}(I)$ for the Bergman fan of an ideal $I$, defined by the above formula, and similarly $B(I)$ for the Bergman complex. Consider the closure of $X$ in $n$-dimensional complex projective space $\mathbb{P}^n$ and let $J$ denote the homogeneous ideal in $S = \mathbb{C}[x_0, x_1, \ldots, x_n]$ which defines this closure. The ideal $J$ is computed from $I$ by homogenizing the given generators and saturating with respect to the ideal $\langle x_0 \rangle$. For any $\omega \in \mathbb{R}^n$, the initial ideal $\mathrm{in}_\omega(I)$ is computed as follows: form the vector $(0, \omega)$ in $\mathbb{R}^{n+1}$, compute the initial ideal $\mathrm{in}_{(0,\omega)}(J)$ and then replace $x_0$ by 1. Corollary 97. $\mathcal{B}(X) = \{\, \omega \in \mathbb{R}^n : \mathrm{in}_{(0,\omega)}(J) \text{ contains no monomial in } S \,\}$.
Proof of Theorem 95: Two vectors $\omega$ and $\omega'$ in $\mathbb{R}^n$ are considered equivalent for $J$ if $\mathrm{in}_{(0,\omega)}(J) = \mathrm{in}_{(0,\omega')}(J)$. The equivalence classes are the relatively open cones in a complete fan in $\mathbb{R}^n$ called the Gröbner fan of $J$. This fan is the outer normal fan of the state polytope of $J$. See Chapter 2 in (Sturmfels 1995) for details. If $C$ is any cone in the Gröbner fan then we write $\mathrm{in}_C(J)$ for $\mathrm{in}_\omega(J)$ where $\omega$ is any vector in the relative interior of $C$. The finiteness and completeness of the Gröbner fan together with Corollary 97 imply that $\mathcal{B}(X)$ is a finite union of rational polyhedral cones in $\mathbb{R}^n$. Indeed, $\mathcal{B}(X)$ is the support of the subfan of the Gröbner fan of $J$ consisting of all Gröbner cones $C$ such that $\mathrm{in}_C(J)$ contains no monomial. Note that if $C$ is any such cone then the Bergman fan of the zero set $X_C$ of the initial ideal $\mathrm{in}_C(J)$ in $(\mathbb{C}^*)^n$ equals
$$\mathcal{B}(X_C) = \mathcal{B}(X) + \mathbb{R}C, \qquad (83)$$
where $\mathbb{R}C$ denotes the linear span of $C$.
What remains to be proved is that the maximal Gröbner cones $C$ which lie in $\mathcal{B}(X)$ all have the same dimension $d$. For that we need the following lemma. Lemma 98. Let $K$ be a homogeneous ideal in the polynomial ring $S$, containing no monomials, and $X(K)$ its zero set in the algebraic torus $(\mathbb{C}^*)^n$. Then the following are equivalent:
(1) Every proper initial ideal of $K$ contains a monomial.
(2) There exists a subtorus $T$ of $(\mathbb{C}^*)^n$ such that $X(K)$ consists of finitely many $T$-orbits.
(3) $\mathcal{B}(X(K))$ is a linear subspace of $\mathbb{R}^n$.
Proof of Theorem 95 (continued): Let $C$ be a cone in the Gröbner fan of $J$ which is maximal with respect to containment in $\mathcal{B}(X)$. The ideal $K = \mathrm{in}_C(J)$ satisfies the three equivalent properties in Lemma 98. The projective variety defined by $K$ is equidimensional of the same dimension as the irreducible projective variety defined by $J$. Equidimensionality follows, for instance, from (Kalkbrener & Sturmfels 1995). We conclude that $\dim(X(K)) = \dim(X) = d$. Hence the subtorus $T$ in property (2) and the subspace in property (3) of Lemma 98 both have dimension $d$. It follows from (83) that $\mathcal{B}(X(K)) = \mathcal{B}(X_C) = \mathbb{R}C$, and we conclude that the Gröbner cone $C$ has dimension $d$, as desired.
Proof of Lemma 98: Let $L$ denote the linear subspace of $\mathbb{R}^n$ consisting of all vectors $\omega$ such that $\mathrm{in}_\omega(K) = K$. In other words, $L$ is the common lineality space of all cones in the Gröbner fan of $K$. A non-zero vector $(\omega_1, \ldots, \omega_n)$ lies in $L$ if and only if the one-parameter subgroup $\{(t^{\omega_1}, \ldots, t^{\omega_n}) : t \in \mathbb{C}^*\}$ fixes $K$. The subtorus $T$ generated by these one-parameter subgroups of $(\mathbb{C}^*)^n$ has the same dimension as $L$, and it fixes the variety $X(K)$. We now replace $(\mathbb{C}^*)^n$ by its quotient $(\mathbb{C}^*)^n / T$, and we replace $\mathbb{R}^n$ by its quotient $\mathbb{R}^n / L$. This reduces our lemma to the following easier assertion: For a homogeneous ideal $K$ which contains no monomial the following are equivalent:
(1) For any non-zero vector $\omega$, the initial ideal $\mathrm{in}_\omega(K)$ contains a monomial.
(2) $X(K)$ is finite.
(3) $\mathcal{B}(X(K)) = \{0\}$.
The equivalence of (1) and (3) is immediate from Corollary 97, and the equivalence of (2) and (3) follows from Theorem 3 in (Bergman 1971). It can also be derived from the well-known fact that a subvariety of $(\mathbb{C}^*)^n$ is compact if and only if it is finite. Our proof suggests the following algorithm for computing the Bergman complex of an algebraic variety. First compute the Gröbner fan, or the state polytope, of the homogenization of its defining ideal. See Chapter 3 of (Sturmfels 1995) for details. For certain nice varieties we might know a universal Gröbner basis and from this one can read off the Gröbner fan more easily. We then check all $d$-dimensional cones $C$ in the Gröbner fan, or equivalently, all $(n - d)$-dimensional faces of the state polytope, and for each of them we determine whether or not $\mathrm{in}_C(I)$ contains a monomial. This happens if and only if the reduced Gröbner basis of $\mathrm{in}_C(I)$ in any term order contains a monomial. Here is a nice example to demonstrate these methods. Example 99. The Bergman complex of the Grassmannian $G_{2,5}$ of lines in $\mathbb{P}^4$ is the Petersen graph. The Grassmannian $G_{2,5}$ is the subvariety of $\mathbb{P}^9$ whose prime ideal is generated by the following five quadratic polynomials:
$$\begin{aligned} & p_{03}p_{12} - p_{02}p_{13} + p_{01}p_{23}, \quad p_{04}p_{12} - p_{02}p_{14} + p_{01}p_{24}, \quad p_{04}p_{13} - p_{03}p_{14} + p_{01}p_{34}, \\ & p_{04}p_{23} - p_{03}p_{24} + p_{02}p_{34}, \quad p_{14}p_{23} - p_{13}p_{24} + p_{12}p_{34}. \end{aligned} \qquad (84)$$
A universal Gröbner basis consists of these five quadrics together with fifteen cubics such as $p_{01}p_{02}p_{34} - p_{02}p_{03}p_{14} + p_{03}p_{04}p_{12} + p_{04}p_{01}p_{23}$. The ideal of $G_{2,5}$ has 132 initial monomial ideals. They come in three symmetry classes:
$\langle p_{02}p_{13},\ p_{02}p_{14},\ p_{04}p_{13},\ p_{04}p_{23},\ p_{14}p_{23} \rangle$ (12 ideals),
$\langle p_{02}p_{14},\ p_{04}p_{13},\ p_{04}p_{23},\ p_{14}p_{23},\ p_{01}p_{23} \rangle$ (60 ideals),
$\langle p_{01}p_{14}p_{23},\ p_{01}p_{24},\ p_{03}p_{12},\ p_{03}p_{14},\ p_{03}p_{24},\ p_{13}p_{24} \rangle$ (60 ideals).
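The five quadrics (84) and the cubic displayed above must vanish at the $2 \times 2$-minors of every $2 \times 5$-matrix. (The cubic equals $p_{02}$ times the third quadric plus $p_{04}$ times the first.) The sketch below runs a quick sanity check with random integer matrices. Integer arithmetic is exact, so each result is exactly zero. The helper names are hypothetical, and the check is evidence, not a proof of ideal membership.

```python
import itertools
import random

def minors(top, bottom):
    """2x2 minors p_ij (i < j) of the 2 x 5 matrix with rows top and bottom."""
    return {(i, j): top[i] * bottom[j] - top[j] * bottom[i]
            for i, j in itertools.combinations(range(5), 2)}

def relations(p):
    """The five quadrics (84) followed by the cubic quoted in the text."""
    def P(i, j):
        return p[(i, j)]
    return [
        P(0, 3) * P(1, 2) - P(0, 2) * P(1, 3) + P(0, 1) * P(2, 3),
        P(0, 4) * P(1, 2) - P(0, 2) * P(1, 4) + P(0, 1) * P(2, 4),
        P(0, 4) * P(1, 3) - P(0, 3) * P(1, 4) + P(0, 1) * P(3, 4),
        P(0, 4) * P(2, 3) - P(0, 3) * P(2, 4) + P(0, 2) * P(3, 4),
        P(1, 4) * P(2, 3) - P(1, 3) * P(2, 4) + P(1, 2) * P(3, 4),
        P(0, 1) * P(0, 2) * P(3, 4) - P(0, 2) * P(0, 3) * P(1, 4)
        + P(0, 3) * P(0, 4) * P(1, 2) + P(0, 4) * P(0, 1) * P(2, 3),
    ]

rng = random.Random(0)
checks = []
for _ in range(100):
    top = [rng.randint(-9, 9) for _ in range(5)]
    bottom = [rng.randint(-9, 9) for _ in range(5)]
    checks.append(all(v == 0 for v in relations(minors(top, bottom))))
print(all(checks))  # True
```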
We regard $G_{2,5}$ as the 7-dimensional variety in $(\mathbb{C}^*)^{10}$ consisting of all nonzero vectors $(p_{01}, \ldots, p_{34})$ formed by the $2 \times 2$-minors of any complex $2 \times 5$-matrix. Hence $n = 10$ and $d = 7$. The common lineality space $L$ of all Gröbner cones has dimension 5; hence the state polytope of $G_{2,5}$ is 5-dimensional as well. Working modulo $L$ as in the proof of Lemma 98, we conclude that $\mathcal{B}(G_{2,5})$ is a finite union of 2-dimensional cones in a 5-dimensional space. Equivalently, it is a finite union of spherical line segments on the 4-dimensional sphere. We consider $B(G_{2,5})$ in this embedding as a graph in the 4-sphere. By doing a local computation for the Gröbner cones of the three distinct reduced Gröbner bases (modulo symmetry), we found that this graph has 10 vertices and 15 edges. The vertices are the rays spanned by the vectors $-e_{ij}$, the images modulo $L$ of the negated unit vectors in $\mathbb{R}^{10}$. The corresponding initial ideal is gotten by erasing those monomials which contain the variable $p_{ij}$. It is generated by three quadratic binomials and two quadratic trinomials. Two vertices are connected by an edge if and only if the index sets of the two unit vectors are disjoint. Hence the graph $B(G_{2,5})$ is isomorphic to the graph whose vertices are the 2-subsets of $\{0, 1, 2, 3, 4\}$ and whose edges are disjoint pairs. This is the Petersen graph. The edges correspond to the fifteen deformations of $G_{2,5}$ to a toric variety. See Example 11.9 in (Sturmfels 1995). For instance, the initial ideal corresponding to the disjoint pair $(\{0, 1\}, \{3, 4\})$ is gotten by setting the two variables $p_{01}$ and $p_{34}$ to zero in (84).
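The combinatorial description at the end of Example 99 is easy to confirm. The sketch below builds the graph on the 2-subsets of $\{0, 1, 2, 3, 4\}$, with edges joining disjoint pairs. It checks that the graph has 10 vertices and 15 edges and is 3-regular, and that adjacent vertices have no common neighbour while non-adjacent ones have exactly one. These parameters characterize the Petersen graph; the code only verifies them for this particular graph.

```python
from itertools import combinations

vertices = [frozenset(s) for s in combinations(range(5), 2)]

def adjacent(a, b):
    """Edge iff the two 2-subsets are disjoint."""
    return not (a & b)

edges = [(a, b) for a, b in combinations(vertices, 2) if adjacent(a, b)]

def neighbours(v):
    return {w for w in vertices if w != v and adjacent(v, w)}

degrees = {len(neighbours(v)) for v in vertices}
common = {(adjacent(a, b), len(neighbours(a) & neighbours(b)))
          for a, b in combinations(vertices, 2)}
print(len(vertices), len(edges), degrees)   # 10 15 {3}
print(sorted(common))                       # [(False, 1), (True, 0)]
```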
9.3
The Bergman Complex of a Linear Space
We next compute the Bergman complex of an arbitrary linear subspace in terms of matroid theory. Let $I$ be an ideal in $\mathbb{C}[x_1, \ldots, x_n]$ generated by (homogeneous) linear forms. Let $d$ be the dimension of the space of linear forms in $I$. A $d$-subset $\{i_1, \ldots, i_d\}$ of $\{1, \ldots, n\}$ is a basis if there does not exist a non-zero linear form in $I$ depending only on $\{x_1, \ldots, x_n\} \setminus \{x_{i_1}, \ldots, x_{i_d}\}$. The collection of bases is denoted $M$ and called the matroid of $I$. In the following, we investigate the Bergman complex of an arbitrary matroid $M$ of rank $d$ on the ground set $\{1, 2, \ldots, n\}$. We do not even require the matroid $M$ to be representable over any field. One of many axiomatizations of abstract matroids goes like this: take any collection $M$ of $(n - d)$-subsets $\sigma$ of $\{1, 2, \ldots, n\}$ and form the convex hull of the points $\sum_{i \in \sigma} e_i$ in $\mathbb{R}^n$. Then $M$ is a matroid if and only if every edge of this convex hull is a parallel translate of the difference $e_i - e_j$ of two unit vectors. In this case, we call the above convex hull the matroid polytope of $M$. Fix any vector $\omega \in \mathbb{R}^n$. We are interested in all the bases of $M$ having minimum $\omega$-cost, where the $\omega$-cost of a basis $\sigma$ is $\sum_{i \in \sigma} \omega_i$. The set of these optimal bases is itself the set of bases of a matroid $M_\omega$ of rank $d$ on $\{1, \ldots, n\}$. The matroid polytope of $M_\omega$ is the face of the matroid polytope of $M$ at which the linear functional $\omega$ is minimized. An element of the matroid is a loop if it does not occur in any basis.
In the amoeba framework the correspondence between the tentacle characterization and the matroid characterization can be stated as follows. Lemma 100. Let $I$ be an ideal generated by linear forms, $M$ the associated matroid and $\omega \in \mathbb{R}^n$. Then $\mathrm{in}_\omega(I)$ does not contain a single variable if and only if $M_\omega$ does not contain a loop. We may assume without loss of generality that $\omega$ is a vector of unit length having coordinate sum zero. The set of these vectors is
$$S^{n-2} = \{\, \omega \in \mathbb{R}^n : \omega_1 + \omega_2 + \cdots + \omega_n = 0 \ \text{and} \ \omega_1^2 + \omega_2^2 + \cdots + \omega_n^2 = 1 \,\}.$$
The Bergman complex of an arbitrary matroid $M$ is defined as the set
$$B(M) := \{\, \omega \in S^{n-2} : M_\omega \text{ has no loops} \,\}.$$
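The definition of $B(M)$ only needs the bases of $M$, so membership can be tested by enumerating optimal bases. The sketch below uses hypothetical names and relabels the ground set as $\{0, \ldots, n-1\}$. It computes the bases of $M_\omega$, that is, the bases of minimum $\omega$-cost, and tests for loops. All bases have the same size $d$, so adding a multiple of $(1, \ldots, 1)$ to $\omega$ or rescaling $\omega$ by a positive factor does not change $M_\omega$, and the sketch skips the normalization to $S^{n-2}$. For the uniform matroid it compares the loop test with the description given in Example 102 below.

```python
import random
from itertools import combinations

def optimal_bases(bases, w):
    """Bases of minimum w-cost; these are the bases of the matroid M_w."""
    cost = {B: sum(w[i] for i in B) for B in bases}
    best = min(cost.values())
    return [B for B in bases if cost[B] == best]

def is_loopless(bases, n):
    """True if every element of {0, ..., n-1} lies in some basis."""
    return set().union(*bases) == set(range(n))

def in_bergman_complex(bases, n, w):
    return is_loopless(optimal_bases(bases, w), n)

n, d = 5, 3
uniform = [frozenset(B) for B in combinations(range(n), d)]
print(in_bergman_complex(uniform, n, (1, 1, 1, 1, -4)))   # True
print(in_bergman_complex(uniform, n, (0, 0, 0, 1, 2)))    # False

# Example 102: loopless iff the largest n - d + 1 coordinates are all equal.
rng = random.Random(1)
agree = True
for _ in range(500):
    w = [rng.randint(-2, 2) for _ in range(n)]
    s = sorted(w)
    agree &= in_bergman_complex(uniform, n, w) == (s[d - 1] == s[-1])
print(agree)  # True
```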
Theorem 101. The Bergman complex $B(M)$ of a rank $d$ matroid is a pure $(d-2)$-dimensional polyhedral complex embedded in the $(n-2)$-sphere. Clearly, $B(M)$ is a subcomplex in the spherical polar to the matroid polytope of $M$. The content of this theorem is that each face of the matroid polytope of $M$ whose matroid $M_\omega$ has no loops, and is minimal with this property, has codimension $n - d + 1$. If $M$ is represented by a linear ideal $I$ then $B(M)$ coincides with $B(X)$ where $X$ is the variety of $I$ in $(\mathbb{C}^*)^n$. In this case, Theorem 101 is simply a special case of Theorem 95. However, when $M$ is not representable, then we need to give a new proof of Theorem 101. This can be done using an inductive argument involving the matroidal operations of contraction and deletion. We wish to propose the combinatorial problem of studying the complex $B(M)$ for various classes of matroids $M$. For instance, for $\mathrm{rank}(M) = 3$ we always get a subgraph of the ridge graph of the matroid polytope, and for $\mathrm{rank}(M) = 4$ we get a two-dimensional complex. What kind of extremal behavior, in terms of face numbers, homology, etc., can we expect? What is the most practical algorithm for computing $B(M)$ from $M$? Example 102. Let $M$ be the uniform matroid of rank $d$ on $\{1, 2, \ldots, n\}$. Then $B(M)$ is the set of all vectors in $S^{n-2}$ whose largest $n - d + 1$ coordinates are all equal. This set can be identified with the $(d-2)$-skeleton of the $(n-1)$-simplex. For instance, let $M$ be the uniform rank 3 matroid on
$\{1, 2, 3, 4, 5\}$. Then $B(M)$ is the complete graph $K_5$, which has ten edges, embedded in the 3-sphere $S^3$ with vertices
$$\Bigl(\tfrac{1}{2\sqrt{5}}, \tfrac{1}{2\sqrt{5}}, \tfrac{1}{2\sqrt{5}}, \tfrac{1}{2\sqrt{5}}, -\tfrac{2}{\sqrt{5}}\Bigr), \quad \Bigl(\tfrac{1}{2\sqrt{5}}, \tfrac{1}{2\sqrt{5}}, \tfrac{1}{2\sqrt{5}}, -\tfrac{2}{\sqrt{5}}, \tfrac{1}{2\sqrt{5}}\Bigr), \quad \ldots$$
These five vectors are normal to five of the ten facets of the second hypersimplex in $\mathbb{R}^5$, which is the polytope $\mathrm{conv}\{\, e_i + e_j : 1 \leq i < j \leq 5 \,\}$. Example 103. Let $M$ be the rank 3 matroid on $\{1, 2, 3, 4, 5\}$ which has eight bases and two non-bases $\{1, 2, 3\}$ and $\{1, 4, 5\}$. Then $B(M)$ is the complete bipartite graph $K_{3,3}$, given with a canonical embedding in the 3-sphere $S^3$. Example 104. Consider the codimension two subvariety $X$ of $(\mathbb{C}^*)^6$ defined by the following two linear equations:
$$x_1 + x_2 - x_4 - x_5 \;=\; x_2 + x_3 - x_5 - x_6 \;=\; 0.$$
We wish to describe its Bergman complex $B(X)$, or, equivalently, by Theorem 105 below, we wish to solve these two linear equations tropically. This amounts to finding all initial ideals of the ideal of these two linear forms which contain no variable, or equivalently, we are interested in all faces of the polar of the matroid polytope which correspond to loopless matroids. We can think of $x_1, x_2, \ldots, x_6$ as the vertices of a regular octahedron, where the affine dependencies are precisely given by our equations. The Bergman complex $B(X)$ has 9 vertices, 24 edges, 20 triangles and 3 quadrangles. The 9 vertices come in two symmetry classes. There are six vertices which we identify with the vertices $x_i$ of the octahedron. The other three vertices are drawn in the inside of the octahedron: they correspond to the three symmetry planes. We then take the boundary complex of the octahedron plus certain natural connections to the three inside points.
9.4
The Tropical Variety of an Ideal
We now connect tropical geometry with algebraic geometry in the usual sense. The basic idea is to introduce an auxiliary variable $t$ and to take exponents of $t$ as the coefficients in a tropical polynomial. More precisely, let $f$ be any polynomial in $\mathbb{Q}[t, x_1, x_2, \ldots, x_n]$, written as a polynomial in $x_1, \ldots, x_n$,
$$f = \sum_{a \in A} p_a(t)\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}.$$
We define the tropicalization of $f$ to be the polynomial
$$\mathrm{trop}(f) = \sum_{a \in A} \bigl(-\mathrm{lowdeg}(p_a)\bigr) \odot x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \;\in\; \mathbb{R}[x_1, \ldots, x_n],$$
where $\mathrm{lowdeg}(p_a)$ is the largest integer $u$ such that $t^u$ divides $p_a(t)$. For instance, for any non-zero rational numbers $a$, $b$ and $c$, the polynomial
$$f = a\, t^3 x_1^5 + b\, t^7 x_1^5 + c\, t^2 x_1 x_2^4$$
has the tropicalization $\mathrm{trop}(f) = (-3) \odot x_1^5 + (-2) \odot x_1 x_2^4$.
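The passage from $f$ to $\mathrm{trop}(f)$ is purely mechanical. In the sketch below (hypothetical names), $f$ is stored as a dictionary that sends each exponent vector to the coefficients of $p_a(t)$, keyed by powers of $t$. The tropical coefficient is $-\mathrm{lowdeg}(p_a)$, and terms whose $p_a$ is zero are dropped. The rational values standing in for the coefficients $a, b, c$ of the example are arbitrary illustrations.

```python
from fractions import Fraction

def lowdeg(p):
    """Largest u such that t^u divides p(t); p maps powers of t to coefficients."""
    return min(u for u, coeff in p.items() if coeff != 0)

def tropicalize(f):
    """Map each exponent vector to the tropical coefficient -lowdeg(p_a)."""
    return {expo: -lowdeg(p) for expo, p in f.items()
            if any(coeff != 0 for coeff in p.values())}

def trop_eval(tf, x):
    """Evaluate the tropical polynomial: max over terms of coefficient + expo . x."""
    return max(w + sum(ai * xi for ai, xi in zip(expo, x)) for expo, w in tf.items())

ca, cb, cc = Fraction(2, 3), Fraction(-5), Fraction(7)   # arbitrary nonzero rationals
f = {(5, 0): {3: ca, 7: cb},   # (ca t^3 + cb t^7) x1^5
     (1, 4): {2: cc}}          # cc t^2 x1 x2^4
print(tropicalize(f))          # {(5, 0): -3, (1, 4): -2}
print(trop_eval(tropicalize(f), (1, 1)))   # max(-3 + 5, -2 + 5) = 3
```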
The negation in the definition of $\mathrm{trop}(f)$ is necessary because we are taking the maximum of linear forms when we evaluate a tropical polynomial. On the other hand, when working with Puiseux series, as in the definition of $\mathrm{val}(X)$ below, we always take the minimum of the occurring exponents. Given any ideal $I$ in $\mathbb{Q}[t, x_1, \ldots, x_n]$, we define its tropical variety to be the tropical variety in $\mathbb{R}^n$ defined by the tropical polynomials $\mathrm{trop}(f)$ as $f$ runs over all polynomials in $I$. If the auxiliary variable $t$ does not appear in any of the generators of $I$ then $I$ can be regarded as an ideal in $\mathbb{Q}[x_1, \ldots, x_n]$. In this case we recover the Bergman fan. Theorem 105. Let $I$ be an ideal in $\mathbb{Q}[x_1, \ldots, x_n]$ and $X$ the variety it defines in $(\mathbb{C}^*)^n$. Then the tropical variety $\mathrm{trop}(I)$ equals the Bergman fan $\mathcal{B}(X)$. In the more general case when $t$ does appear in $I$, the tropical variety $\mathrm{trop}(I)$ is not a fan but it is a polyhedral complex with possibly many bounded faces. We have seen many examples of tropical curves at the beginning of this lecture. In those cases, $I$ is a principal ideal, generated by a single polynomial in $x$ and $y$. Consider the algebraically closed field $K = \mathbb{C}\{\{t\}\}$ of Puiseux series. Every Puiseux series $x(t)$ has a unique lowest term $a\, t^u$ where $a \in \mathbb{C}^*$ and $u \in \mathbb{Q}$. Setting $\mathrm{val}(x) = u$, this defines the canonical valuation map
$$\mathrm{val} : (K^*)^n \to \mathbb{Q}^n, \quad (x_1, x_2, \ldots, x_n) \mapsto \bigl(\mathrm{val}(x_1), \mathrm{val}(x_2), \ldots, \mathrm{val}(x_n)\bigr).$$
If $X$ is any subvariety of $(K^*)^n$ then we can consider its image $\mathrm{val}(X)$ in $\mathbb{Q}^n$. The closure of $\mathrm{val}(X)$ in $\mathbb{R}^n$ is called the amoeba of $X$. Theorem 106. Let $I$ be any ideal in $\mathbb{Q}[t, x_1, \ldots, x_n]$ and $X$ its variety in $(K^*)^n$. Then the following three subsets of $\mathbb{R}^n$ coincide:
• the negative $-\mathrm{val}(X)$ of the amoeba of the variety $X \subset (K^*)^n$,
• the tropical variety $\mathrm{trop}(I)$ of $I$,
• the intersection of the Bergman complex $B(I)$ in $S^n$ with the Southern hemisphere $\{t < 0\}$, identified with $\mathbb{R}^n$ via stereographic projection.
Let us illustrate Theorem 106 for our most basic example, the solution to the quadratic equation. Suppose $n = 1$ and consider an ideal of the form
$$I = \bigl\langle\, \alpha\, t^a x^2 + \beta\, t^b x + \gamma\, t^c \,\bigr\rangle,$$
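where $\alpha, \beta, \gamma$ are non-zero rationals and $a, b, c$ are integers with $a + c \geq 2b$. Then $\mathrm{trop}(I)$ is the variety of the tropicalization $(-a) \odot x^2 + (-b) \odot x + (-c)$ of the ideal generator. Since $(-a) + (-c) \leq 2(-b)$, we have $\mathrm{trop}(I) = \{a - b,\ b - c\}$. The variety $X$ in the affine line over $K = \mathbb{C}\{\{t\}\}$ equals
$$X = \Bigl\{\, -\tfrac{\beta}{\alpha}\, t^{b-a} + \cdots,\ \ -\tfrac{\gamma}{\beta}\, t^{c-b} + \cdots \Bigr\}.$$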
Hence $\mathrm{val}(X) = \{b - a,\ c - b\} = -\mathrm{trop}(I)$. The Bergman fan $\mathcal{B}(I)$ of the bivariate ideal $I$ is a one-dimensional fan in the $(t, x)$-plane $\mathbb{R}^2$, consisting of three rays. These rays are generated by $(-1, a - b)$, $(-1, b - c)$ and $(2, c - a)$, and hence the intersection of $\mathcal{B}(I)$ with the line $t = -1$ is precisely $\mathrm{trop}(I)$.
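Theorem 106 can be observed numerically on this example. Substitute a small positive number for $t$ and compute both roots; the ratios $\log|x| / \log t$ should then approach the valuations $b - a$ and $c - b$. The sketch below assumes the illustrative values $\alpha = \beta = \gamma = 1$, $a = 0$, $b = 1$, $c = 3$ (so $a + c \geq 2b$) and uses a cancellation-free form of the quadratic formula. It is a floating-point illustration, not a Puiseux series computation.

```python
import math

def quadratic_roots(q2, q1, q0):
    """Both roots of q2 x^2 + q1 x + q0, assuming a positive discriminant.
    This form avoids subtracting nearly equal numbers."""
    disc = q1 * q1 - 4 * q2 * q0
    q = -0.5 * (q1 + math.copysign(math.sqrt(disc), q1))
    return q / q2, q0 / q

alpha, beta, gamma = 1.0, 1.0, 1.0     # illustrative nonzero coefficients
a, b, c = 0, 1, 3                      # integers with a + c >= 2b
t = 1e-4
roots = quadratic_roots(alpha * t**a, beta * t**b, gamma * t**c)
estimates = sorted(math.log(abs(x)) / math.log(t) for x in roots)
print(estimates)                       # close to [1, 2] = [b - a, c - b]
print(sorted({a - b, b - c}))          # trop(I) = [-2, -1]
```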
9.5
Exercises
(1) Draw the graph and the variety of the tropical polynomial $f(x) = 10 + 9x + 7x^2 + 4x^3 + 0x^4$.
(2) Draw the graph and the variety of the tropical polynomial $f(x, y) = 1x^2 + 2xy + 1y^2 + 3x + 3y + 1$.
(3) Let $I$ be the ideal of $3 \times 3$-minors of a $3 \times 4$-matrix of indeterminates. Compute the Bergman complex $B(I)$ of this ideal.
(4) The Bergman complex $B(M)$ of a rank 4 matroid $M$ on $\{1, 2, 3, 4, 5, 6\}$ is a polyhedral surface embedded in the 4-sphere. What is the maximum number of vertices of $B(M)$, as $M$ ranges over all such matroids?
(5) Let $I$ be a complete intersection ideal in $\mathbb{Q}[t, x_1, x_2, x_3]$ generated by two random polynomials of degree three. Describe $\mathrm{trop}(I) \subset \mathbb{R}^3$.
10
The Ehrenpreis-Palamodov Theorem
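Every system of polynomials translates naturally into a system of linear partial differential equations with constant coefficients. The equation
$$\sum c_{i_1 i_2 \ldots i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n} = 0 \qquad (85)$$
corresponds to the following partial differential equation
$$\sum c_{i_1 i_2 \ldots i_n}\, \frac{\partial^{\,i_1 + i_2 + \cdots + i_n} f}{\partial x_1^{i_1}\, \partial x_2^{i_2} \cdots \partial x_n^{i_n}} = 0 \qquad (86)$$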
for an unknown function $f = f(x_1, \ldots, x_n)$. In this lecture we argue that it is advantageous to regard polynomials as linear PDE, especially when the given polynomials have zeros with multiplicities or embedded components. In the 1960s Ehrenpreis and Palamodov proved their famous Fundamental Principle which states that all solutions to a system of linear PDE with constant coefficients have a certain integral representation over the underlying complex variety. What follows is an algebraic introduction to this subject.
10.1
Why Differential Equations?
There are very good reasons for passing from polynomials to differential equations. Let us illustrate this for one simple quadratic equation in one variable:
$$x^2 = \alpha^2 \qquad (87)$$
where $\alpha$ is a real parameter. This equation has two distinct solutions, namely $x = \alpha$ and $x = -\alpha$, provided the parameter $\alpha$ is non-zero. For $\alpha = 0$, there is only one solution, namely $x = 0$, and conventional algebraic wisdom tells us that this solution is to be regarded as having multiplicity 2. In the design of homotopy methods for solving algebraic equations, such multiple points create considerable difficulties, both conceptually and numerically.
Consider the translation of (87) into an ordinary differential equation:
$$f''(x) = \alpha^2 f(x). \qquad (88)$$
The solution space $V_\alpha$ to (88) is always a two-dimensional complex vector space, for any value of $\alpha$. For $\alpha \neq 0$, this space has a basis of exponentials,
$$V_\alpha = \bigl\langle \exp(\alpha x),\ \exp(-\alpha x) \bigr\rangle,$$
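but for $\alpha = 0$ these two basis vectors become linearly dependent. However, there exists a better choice of basis which works for all values of $\alpha$, namely,
$$V_\alpha = \Bigl\langle \exp(\alpha x),\ \frac{1}{2\alpha}\bigl(\exp(\alpha x) - \exp(-\alpha x)\bigr) \Bigr\rangle. \qquad (89)$$
This new basis behaves gracefully when we take the limit $\alpha \to 0$:
$$V_0 = \langle 1, x \rangle.$$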
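The second basis vector in (89) equals $\sinh(\alpha x)/\alpha$, which tends to $x$ as $\alpha \to 0$. The numerical sketch below uses hypothetical helper names and approximates the second derivative by a central difference, so the agreement holds only up to discretization error. It checks that this basis vector solves (88) for a nonzero $\alpha$ and approaches $x$ for tiny $\alpha$.

```python
import math

def second_basis_vector(alpha, x):
    """(exp(alpha x) - exp(-alpha x)) / (2 alpha) = sinh(alpha x) / alpha; x at alpha = 0."""
    return x if alpha == 0 else math.sinh(alpha * x) / alpha

def second_derivative(f, x, h=1e-3):
    """Central-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

alpha, x = 0.5, 1.3

def g(s):
    return second_basis_vector(alpha, s)

print(second_derivative(g, x), alpha**2 * g(x))   # nearly equal: f'' = alpha^2 f
print(second_basis_vector(1e-8, 2.0))             # about 2.0, the limit value x
```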
The representation (89) displays $V_\alpha$ as a rank 2 vector bundle on the affine $\alpha$-line. There was really nothing special about the point $\alpha = 0$ after all. Perhaps this vector bundle point of view might be useful in developing new reliable homotopy algorithms for numerically computing the complicated scheme structure which is frequently hidden in a given non-radical ideal. Our second example is the following system of three polynomial equations
$$x^3 = yz, \qquad y^3 = xz, \qquad z^3 = xy. \qquad (90)$$
These equations translate into the three differential equations
$$\frac{\partial^3 f}{\partial x^3} = \frac{\partial^2 f}{\partial y\, \partial z}, \qquad \frac{\partial^3 f}{\partial y^3} = \frac{\partial^2 f}{\partial x\, \partial z} \qquad \text{and} \qquad \frac{\partial^3 f}{\partial z^3} = \frac{\partial^2 f}{\partial x\, \partial y}. \qquad (91)$$
The set of entire functions $f(x, y, z)$ which satisfy these differential equations (91) is a complex vector space. This vector space has dimension 27, the Bézout number of (90). A solution basis for (91) is given by
$$\begin{gathered} \exp(x + y + z),\ \exp(x - y - z),\ \exp(y - x - z),\ \exp(z - x - y), \\ \exp(x + iy - iz),\ \exp(x - iy + iz),\ \exp(y + ix - iz),\ \exp(y - ix + iz),\ \exp(z + ix - iy),\ \exp(z - ix + iy), \\ \exp(iy + iz - x),\ \exp(-iy - iz - x),\ \exp(ix + iz - y),\ \exp(-ix - iz - y),\ \exp(ix + iy - z),\ \exp(-ix - iy - z), \\ 1,\ x,\ y,\ z,\ z^2,\ y^2,\ x^2,\ x^3 + 6yz,\ y^3 + 6xz,\ z^3 + 6xy,\ x^4 + y^4 + z^4 + 24xyz. \end{gathered}$$
Here $i = \sqrt{-1}$. Using the results to be stated in the next sections, we can read off the following facts about our equations from the solution basis above:
(a) The system (90) has 17 distinct complex zeros, of which 5 are real.
(b) A point $(a, b, c)$ is a zero of (90) if and only if $\exp(ax + by + cz)$ is a solution to (91). All zeros other than the origin have multiplicity one.
(c) The multiplicity of the origin $(0, 0, 0)$ as a zero of (90) is eleven. This number is the dimension of the space of polynomial solutions to (91).
(d) Every polynomial solution to (91) is gotten from one specific solution, namely, from $x^4 + y^4 + z^4 + 24xyz$, by taking successive derivatives.
(e) The local ring of (90) at the origin is Gorenstein.
We conclude that our solution basis to (91) contains all the information one might ask about the solutions to the polynomial system (90). The aim of this lecture is to extend this kind of reasoning to arbitrary polynomial systems, that is, to arbitrary systems of linear PDE with constant coefficients. Our third and final example is to reinforce the view that, in a sense, the PDE formulation reveals a lot more information than the polynomial formulation. Consider the problem of solving the following polynomial equations:
$$x_1^i + x_2^i + x_3^i + x_4^i = 0 \quad \text{for all integers } i > 0. \qquad (92)$$
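The only solution is the origin $(0, 0, 0, 0)$, and this zero has multiplicity 24. In the corresponding PDE formulation one seeks to identify the vector space of all functions $f(x_1, x_2, x_3, x_4)$, on a suitable subset of $\mathbb{R}^4$ or $\mathbb{C}^4$, such that
$$\frac{\partial^i f}{\partial x_1^i} + \frac{\partial^i f}{\partial x_2^i} + \frac{\partial^i f}{\partial x_3^i} + \frac{\partial^i f}{\partial x_4^i} = 0 \quad \text{for all integers } i > 0. \qquad (93)$$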
Such functions are called harmonic. The space of harmonic functions has dimension 24. It consists of all successive derivatives of the discriminant
$$\Delta(x_1, x_2, x_3, x_4) = (x_1 - x_2)(x_1 - x_3)(x_1 - x_4)(x_2 - x_3)(x_2 - x_4)(x_3 - x_4).$$
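Thus the solution space to (93) is the cyclic $\mathbb{C}\bigl[\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \frac{\partial}{\partial x_3}, \frac{\partial}{\partial x_4}\bigr]$-module generated by $\Delta(x_1, x_2, x_3, x_4)$. This is what solving (92) should really mean.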
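Several of the claims about (90), (91) and (93) can be checked by exact linear algebra on polynomials. The sketch below uses hypothetical helper names and stores polynomials as dictionaries from exponent vectors to `Fraction` coefficients. It verifies four things. First, $x^4 + y^4 + z^4 + 24xyz$ solves (91), and its derivatives span an 11-dimensional space. Second, the nonzero zeros of (90) are the 16 triples of fourth roots of unity with $xyz = 1$, of which 4 are real: multiplying the three equations gives $(xyz)^3 = (xyz)^2$, hence $xyz = 1$ and then $x^4 = x \cdot yz = 1$. Third, $\Delta$ satisfies (93). Fourth, the derivatives of $\Delta$ span a 24-dimensional space.

```python
from fractions import Fraction
from itertools import product

def clean(p):
    return {e: c for e, c in p.items() if c != 0}

def add(p, q, s=1):
    """Return p + s*q."""
    r = dict(p)
    for e, c in q.items():
        r[e] = r.get(e, 0) + s * c
    return clean(r)

def mul(p, q):
    r = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(i + j for i, j in zip(e1, e2))
            r[e] = r.get(e, 0) + c1 * c2
    return clean(r)

def d(p, alpha):
    """Apply the differential operator with exponent vector alpha."""
    r = {}
    for e, c in p.items():
        if all(ei >= ai for ei, ai in zip(e, alpha)):
            for ei, ai in zip(e, alpha):
                for m in range(ai):
                    c *= ei - m
            new = tuple(ei - ai for ei, ai in zip(e, alpha))
            r[new] = r.get(new, 0) + c
    return clean(r)

def rank(polys):
    """Dimension of the span, by elimination on leading exponents."""
    pivots = {}
    for p in polys:
        while p:
            lead = max(p)
            if lead not in pivots:
                pivots[lead] = p
                break
            piv = pivots[lead]
            p = add(p, piv, -p[lead] / piv[lead])
    return len(pivots)

def derivatives(p, nvars, top):
    return [d(p, a) for a in product(range(top + 1), repeat=nvars) if sum(a) <= top]

# System (91): d^3/dx^3 = d^2/dydz and its cyclic analogues.
g = {(4, 0, 0): Fraction(1), (0, 4, 0): Fraction(1),
     (0, 0, 4): Fraction(1), (1, 1, 1): Fraction(24)}
ops = [((3, 0, 0), (0, 1, 1)), ((0, 3, 0), (1, 0, 1)), ((0, 0, 3), (1, 1, 0))]
solves_91 = all(add(d(g, lhs), d(g, rhs), -1) == {} for lhs, rhs in ops)
rank_91 = rank(derivatives(g, 3, 4))

# Nonzero zeros of (90): candidates are triples of fourth roots of unity.
units = [1, -1, 1j, -1j]
zeros = [(x, y, z) for x, y, z in product(units, repeat=3)
         if x**3 == y * z and y**3 == x * z and z**3 == x * y]
real_zeros = [v for v in zeros if all(complex(c).imag == 0 for c in v)]

# System (93) for Delta in four variables.
def var(k):
    return {tuple(int(i == k) for i in range(4)): Fraction(1)}

delta = {(0, 0, 0, 0): Fraction(1)}
for i in range(4):
    for j in range(i + 1, 4):
        delta = mul(delta, add(var(i), var(j), -1))

def power_sum_operator(p, k):
    """Sum over i of the k-th partial derivative with respect to x_i."""
    total = {}
    for i in range(4):
        total = add(total, d(p, tuple(k if m == i else 0 for m in range(4))))
    return total

harmonic = all(power_sum_operator(delta, k) == {} for k in range(1, 7))
rank_93 = rank(derivatives(delta, 4, 6))
print(solves_91, rank_91, len(zeros), len(real_zeros), harmonic, rank_93)
# True 11 16 4 True 24
```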
10.2
Zero-dimensional Ideals
We fix the polynomial ring $\mathbb{C}[\partial] = \mathbb{C}[\partial_1, \ldots, \partial_n]$. The variables have funny names but they are commuting variables just like $x_1, \ldots, x_n$ in the previous lectures. We shall be interested in finding the solutions of an ideal $I$ in $\mathbb{C}[\partial]$. Let $F$ be a class of $C^\infty$-functions on $\mathbb{R}^n$ or on $\mathbb{C}^n$ or on some subset thereof. For instance $F$ might be the class of entire functions on $\mathbb{C}^n$. Then $F$ is a module for the ring $\mathbb{C}[\partial]$: polynomials in $\mathbb{C}[\partial]$ act on $F$ by differentiation. More precisely, if $p(\partial_1, \partial_2, \ldots, \partial_n)$ is a polynomial of degree $d$ then it acts on $F$ by sending a function $f = f(x_1, \ldots, x_n)$ in the class $F$ to the result of applying the differential operator $p(\partial/\partial x_1, \partial/\partial x_2, \ldots, \partial/\partial x_n)$ to $f$. The class of functions $F$ in which we are solving should always be chosen large enough in the following sense. If $I$ is any ideal in $\mathbb{C}[\partial]$ and $\mathrm{Sol}(I)$ is its solution set in $F$ then the set of all polynomials which annihilate all functions in $\mathrm{Sol}(I)$ should be precisely equal to $I$. What this means algebraically is that $F$ is supposed to be an injective cogenerator for $\mathbb{C}[\partial]$. In what follows we will consider functions which are gotten by integration from products of exponentials and polynomials. The resulting class $F$ is large enough. We start out by reviewing the case of one variable, abbreviated $\partial = \partial_1$, over the field of complex numbers. Here $I = \langle p \rangle$ is a principal ideal in $\mathbb{C}[\partial]$, generated by one polynomial which factors completely:
$$p(\partial) = a_0 + a_1 \partial + a_2 \partial^2 + a_3 \partial^3 + \cdots + a_d \partial^d = a_d\, (\partial - u_1)^{e_1} (\partial - u_2)^{e_2} \cdots (\partial - u_r)^{e_r}.$$
Here we can take $F$ to be the set of entire functions on the complex plane $\mathbb{C}$. The ideal $I$ represents the ordinary differential equation
$$a_d f^{(d)}(x) + \cdots + a_2 f''(x) + a_1 f'(x) + a_0 f(x) = 0. \qquad (94)$$
The solution space $\mathrm{Sol}(I)$ consists of all entire functions $f(x)$ which satisfy the equation (94). This is a complex vector space of dimension $d = e_1 + e_2 + \cdots + e_r$. A canonical basis for this space is given as follows:
$$\mathrm{Sol}(I) = \mathrm{span}_{\mathbb{C}} \bigl\{ x^j \exp(u_i x) \;\big|\; i = 1, 2, \ldots, r,\ j = 0, 1, \ldots, e_i - 1 \bigr\}. \qquad (95)$$
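The basis (95) can be checked mechanically. The Python sketch below is our own illustration, not code from these notes: it stores a function $q(x)\exp(ux)$ through the coefficient list of $q$ and uses the rule $\frac{d}{dx}\bigl(q\,e^{ux}\bigr) = (q' + uq)\,e^{ux}$, so that applying $p(\partial)$ amounts to applying $p(\partial + u)$ to $q$. The operator $p(\partial) = (\partial - 2)^2(\partial + 1) = \partial^3 - 3\partial^2 + 4$ is a hypothetical example.

```python
# Sketch: verify the basis (95) for a sample ODE with constant coefficients.
# A function q(x) * exp(u*x) is stored as the coefficient list of q
# (index k holds the coefficient of x**k); the exponential is implicit.

def derivative(q):
    """Coefficient list of dq/dx."""
    return [k * q[k] for k in range(1, len(q))] or [0]

def add(a, b):
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def scale(c, a):
    return [c * x for x in a]

def apply_ode(p, u, q):
    """Polynomial part of p(d/dx) applied to q(x)*exp(u*x).

    Since d/dx (q e^{ux}) = (q' + u q) e^{ux}, this equals p(d/dx + u) q.
    p lists the coefficients a_0, ..., a_d; Horner's rule is used.
    """
    result = [0]
    for c in reversed(p):
        result = add(add(derivative(result), scale(u, result)), scale(c, q))
    return result

def is_zero(q):
    return all(c == 0 for c in q)

# Hypothetical example: p = (d - 2)^2 (d + 1) = d^3 - 3 d^2 + 4.
p = [4, 0, -3, 1]
roots = {2: 2, -1: 1}  # root u_i -> multiplicity e_i

# Basis (95): x^j exp(u_i x) for 0 <= j < e_i.
basis = [(j, u) for u, e in roots.items() for j in range(e)]
for j, u in basis:
    assert is_zero(apply_ode(p, u, [0] * j + [1]))

# x^2 exp(2x) is not a solution: the root 2 has multiplicity only 2.
print(apply_ode(p, 2, [0, 0, 1]))  # [6, 0, 0]
print(len(basis))                  # 3 = degree of p
```

Working with the polynomial factor alone keeps the arithmetic exact; the same shift trick drives the proof of Theorem 108.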
We see that $\mathrm{Sol}(I)$ encodes all the zeros together with their multiplicities. We now generalize the formula (95) to PDEs in $n$ unknowns which have a finite-dimensional solution space. Let $I$ be any zero-dimensional ideal in $\mathbb{C}[\partial] = \mathbb{C}[\partial_1, \ldots, \partial_n]$. We work over the complex numbers instead of the rational numbers to keep things simpler. The variety of $I$ is a finite set
$$V(I) = \{ u^{(1)}, u^{(2)}, \ldots, u^{(r)} \} \subset \mathbb{C}^n,$$
and the ideal has a unique primary decomposition
$$I = Q_1 \cap Q_2 \cap \cdots \cap Q_r,$$
where $Q_i$ is primary to the maximal ideal of the point $u^{(i)}$:
$$\mathrm{Rad}(Q_i) = \langle \partial_1 - u^{(i)}_1,\ \partial_2 - u^{(i)}_2,\ \ldots,\ \partial_n - u^{(i)}_n \rangle.$$
Given any operator $p$ in $\mathbb{C}[\partial]$, we write $p(\partial + u^{(i)})$ for the operator gotten from $p(\partial)$ by replacing the variable $\partial_j$ with $\partial_j + u^{(i)}_j$ for all $j \in \{1, 2, \ldots, n\}$. The following shifted ideal is primary to the maximal ideal $\langle \partial_1, \ldots, \partial_n \rangle$:
$$\mathrm{shift}(Q_i) = \langle\, p(\partial + u^{(i)}) : p \in Q_i \,\rangle.$$
Let $\mathrm{shift}(Q_i)^\perp$ denote the complex vector space of all polynomials $f \in \mathbb{C}[x_1, \ldots, x_n]$ which are annihilated by all the operators in $\mathrm{shift}(Q_i)$.

Lemma 107. The vector spaces $\mathrm{shift}(Q_i)^\perp$ and $\mathbb{C}[\partial]/Q_i$ are isomorphic.
Proof. Writing J = shift(Qi ), we need to show the following. If J is a 1 , . . . , n -primary ideal, then []/J is isomorphic to the space J of polynomial solutions of J. By our hypothesis, there exists a positive integer m such that 1 , . . . , n m lies in J. Hence J consists of polynomials all of whose terms have degree less than m. Dierentiating polynomials denes a nondegenerate pairing between the nite-dimensional vector spaces []/ 1 , . . . , n m and [x]<m = { polynomials of degree less than m}. This implies that J equals the annihilator of J in []/ 1 , . . . , n m , and hence []/J and J are complex vector spaces of the same dimension. In the next section we will show how to compute all polynomial solutions of an ideal in []. Here we patch solutions from the points of V(I) together. Theorem 108. The solution space Sol(I) of the zero-dimensional ideal I [] is a nite-dimensional complex vector space isomorphic to []/I. It is spanned by the functions q(x) exp(u(i) x) = q(x1 , x2 , . . . , xn ) exp(u1 x1 + u2 x2 + + u(i) xn ), n
(i) (i)
where i = 1, 2, . . . , r and q(x) shift(Qi ) . 150
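Proof. Since $\partial_j\bigl(q(x)\exp(u^{(i)} \cdot x)\bigr) = \bigl((\partial_j + u^{(i)}_j)\, q\bigr)(x) \cdot \exp(u^{(i)} \cdot x)$, an operator $p(\partial)$ annihilates the function $q(x)\exp(u^{(i)} \cdot x)$ if and only if the shifted operator $p(\partial + u^{(i)})$ annihilates the polynomial $q(x)$. Every $p \in I$ lies in $Q_i$, so $p(\partial + u^{(i)}) \in \mathrm{shift}(Q_i)$; hence the given functions do lie in $\mathrm{Sol}(I)$. Moreover, if for each $i$ we let $q(x)$ range over a basis of $\mathrm{shift}(Q_i)^\perp$, then the resulting functions are $\mathbb{C}$-linearly independent. By Lemma 107 and the Chinese Remainder Theorem, their number is $\sum_i \dim \mathbb{C}[\partial]/Q_i = \dim \mathbb{C}[\partial]/I$. We conclude that the dimension of $\mathrm{Sol}(I)$ is at least the dimension of $\mathbb{C}[\partial]/I$. For the reverse direction, we assume that every function $f$ in $F$ is characterized by its Taylor expansion at the origin. Any set of such functions whose cardinality exceeds the number of standard monomials of $I$, in any term order, is easily seen to be linearly dependent over the ground field $\mathbb{C}$.

We have demonstrated that solving a zero-dimensional ideal in $\mathbb{C}[\partial]$ can be reduced, by means of primary decomposition, to finding all polynomial solutions of a system of linear PDE with constant coefficients. In the next section we describe how to compute the polynomial solutions.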
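The identity behind this proof, that $p(\partial)$ kills $q(x)\exp(u \cdot x)$ exactly when $p(\partial + u)$ kills $q(x)$, is easy to test. In the following Python sketch (our own, with a hypothetical ideal) we take $I = \langle \partial_1 - \partial_2, (\partial_2 - 1)^2 \rangle \subset \mathbb{C}[\partial_1, \partial_2]$, whose only zero $u = (1, 1)$ has multiplicity 2. Here $\mathrm{shift}(I) = \langle \partial_1 - \partial_2, \partial_2^2 \rangle$ has the polynomial solutions $1$ and $x_1 + x_2$, so Theorem 108 predicts the basis $\exp(x_1 + x_2)$, $(x_1 + x_2)\exp(x_1 + x_2)$ of $\mathrm{Sol}(I)$.

```python
# Polynomials q(x1, x2) and operators p(d1, d2) are dicts
# {exponent tuple: coefficient}; the factor exp(u.x) is implicit.

def shifted_partial(q, j, u_j):
    """Polynomial part of d/dx_j applied to q(x)*exp(u.x), i.e. (d_j + u_j) q."""
    out = {}
    for e, c in q.items():
        if e[j] > 0:
            e2 = e[:j] + (e[j] - 1,) + e[j + 1:]
            out[e2] = out.get(e2, 0) + c * e[j]
        out[e] = out.get(e, 0) + c * u_j
    return {e: c for e, c in out.items() if c != 0}

def apply_to_exp_poly(op, q, u):
    """Polynomial part of op(d) applied to q(x)*exp(u.x), i.e. op(d + u) q."""
    total = {}
    for a, coef in op.items():
        term = dict(q)
        for j, power in enumerate(a):
            for _ in range(power):
                term = shifted_partial(term, j, u[j])
        for e, c in term.items():
            total[e] = total.get(e, 0) + coef * c
    return {e: c for e, c in total.items() if c != 0}

u = (1, 1)                                  # the single zero of I
I_gens = [
    {(1, 0): 1, (0, 1): -1},               # d1 - d2
    {(0, 2): 1, (0, 1): -2, (0, 0): 1},    # (d2 - 1)^2
]
basis = [{(0, 0): 1}, {(1, 0): 1, (0, 1): 1}]   # q = 1 and q = x1 + x2
for q in basis:
    print([apply_to_exp_poly(g, q, u) for g in I_gens])   # [{}, {}]

# x1*exp(x1 + x2) is not a solution: d1 - d2 maps it to exp(x1 + x2).
print(apply_to_exp_poly(I_gens[0], {(1, 0): 1}, u))      # {(0, 0): 1}
```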
10.3
Computing Polynomial Solutions
In this section we switch back to our favorite ground field, the rational numbers $\mathbb{Q}$, and we address the following problem. Let $J$ be any ideal in $\mathbb{Q}[\partial] = \mathbb{Q}[\partial_1, \ldots, \partial_n]$. We do not assume that $J$ is zero-dimensional. We are interested in the space $\mathrm{Polysol}(J)$ of polynomial solutions to $J$. Thus $\mathrm{Polysol}(J)$ consists of all polynomials in $\mathbb{Q}[x] = \mathbb{Q}[x_1, \ldots, x_n]$ which are annihilated by all operators in $J$. Our problem is to decide whether $\mathrm{Polysol}(J)$ is finite-dimensional and, in the affirmative case, to give a vector space basis. The first step in our computation is to find the iterated ideal quotient
$$I = J : \bigl(J : \langle \partial_1, \partial_2, \ldots, \partial_n \rangle^\infty\bigr). \qquad (96)$$
Passing from $J$ to $I$ removes all primary components of $J$ which are not contained in the maximal ideal $\langle \partial_1, \partial_2, \ldots, \partial_n \rangle$. Such a primary component cannot have any polynomial solutions: it contains an operator $f(\partial)$ with nonzero constant term, and such an operator cannot annihilate a nonzero polynomial $p(x)$, because $f(\partial) p(x)$ has the same top-degree part as $f(0)\, p(x)$. This observation implies
$$\mathrm{Polysol}(J) = \mathrm{Polysol}(I). \qquad (97)$$
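Proposition 109. The following three conditions are equivalent:
1. The vector space $\mathrm{Polysol}(J)$ is finite-dimensional.
2. The ideal $I$ is zero-dimensional.
3. The ideal $I$ is $\langle \partial_1, \ldots, \partial_n \rangle$-primary.

It is easy to test the second condition. We do so by computing the reduced Gröbner basis $G$ of $I$ with respect to any term order $\prec$ on $\mathbb{Q}[\partial]$. The conditions in Proposition 109 are met if and only if every variable $\partial_i$ appears to some power in the initial ideal $\mathrm{in}_\prec(I) = \langle \mathrm{in}_\prec(g) : g \in I \rangle$. Let $B$ be the (finite) set of monomials in $\mathbb{Q}[x_1, \ldots, x_n]$ which are annihilated by $\mathrm{in}_\prec(I)$. These are precisely the $\prec$-standard monomials of $I$, but written in the $x$-variables instead of the $\partial$-variables. Clearly, the set $B$ is a $\mathbb{Q}$-basis of $\mathrm{Polysol}(\mathrm{in}_\prec(I))$. Let $N$ denote the set of monomials in $\mathbb{Q}[x_1, \ldots, x_n] \setminus B$. For every non-standard monomial $x^\beta \in N$ there is a unique polynomial
$$\partial^\beta - \sum_{x^\alpha \in B} c_{\alpha,\beta}\, \partial^\alpha \quad \text{in the ideal } I,$$
which is gotten by taking the normal form of $\partial^\beta$ modulo $G$. Here $c_{\alpha,\beta} \in \mathbb{Q}$. Abbreviate $\alpha! := \alpha_1!\, \alpha_2! \cdots \alpha_n!$. For a standard monomial $x^\alpha \in B$, define
$$f_\alpha(x) = x^\alpha + \sum_{x^\beta \in N} c_{\alpha,\beta}\, \frac{\alpha!}{\beta!}\, x^\beta. \qquad (98)$$
This sum is finite because $I$ is $\langle \partial_1, \ldots, \partial_n \rangle$-primary, i.e., if $|\beta| \gg 0$ then $\partial^\beta \in I$ and hence $c_{\alpha,\beta} = 0$. Setting $c_{\alpha,\alpha} = 1$ and $c_{\alpha,\beta} = 0$ for all other standard monomials $x^\beta \in B$, we can also write it as a sum over all $\beta \in \mathbb{N}^n$:
$$f_\alpha(x) = \sum_{\beta \in \mathbb{N}^n} c_{\alpha,\beta}\, \frac{\alpha!}{\beta!}\, x^\beta.$$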
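Testing the conditions of Proposition 109 and listing $B$ needs only the generators of $\mathrm{in}_\prec(I)$. The Python sketch below assumes these leading monomials have already been obtained from a Gröbner basis computation (not implemented here) and that $I$ is a proper ideal; monomials are encoded by exponent vectors, and the function names are our own. The last call uses the initial ideal from Example 112.

```python
from itertools import product

def divides(a, b):
    """True if the monomial with exponent vector a divides the one with b."""
    return all(x <= y for x, y in zip(a, b))

def standard_monomials(lead_terms, n):
    """Exponent vectors of the standard monomials of <lead_terms>.

    Returns None if some variable has no pure power among the generators:
    then I is not zero-dimensional and Polysol(J) is infinite-dimensional.
    """
    bounds = []
    for i in range(n):
        pure = [a[i] for a in lead_terms
                if a[i] > 0 and all(a[k] == 0 for k in range(n) if k != i)]
        if not pure:
            return None
        bounds.append(min(pure))
    # Every standard monomial lies in the box 0 <= b_i < bounds[i].
    return [b for b in product(*(range(d) for d in bounds))
            if not any(divides(a, b) for a in lead_terms)]

print(standard_monomials([(2, 0), (1, 1), (0, 3)], 2))
# [(0, 0), (0, 1), (0, 2), (1, 0)], i.e. 1, x2, x2^2, x1
print(standard_monomials([(2, 0), (1, 1)], 2))  # None: no power of d2

# Initial ideal of Example 112: d1, d2, d3^2, d3*d5, d4^2, d3*d4, d5^2.
lead_112 = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 2, 0, 0), (0, 0, 1, 0, 1),
            (0, 0, 0, 2, 0), (0, 0, 1, 1, 0), (0, 0, 0, 0, 2)]
print(len(standard_monomials(lead_112, 5)))  # 5
```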
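Theorem 110. The polynomials $f_\alpha$, where $x^\alpha$ runs over the set $B$ of standard monomials, form a $\mathbb{Q}$-basis for the space $I^\perp = \mathrm{Sol}(I) = \mathrm{Polysol}(I)$.

Proof. The polynomials $f_\alpha$ are $\mathbb{Q}$-linearly independent, since $x^\alpha$ is the only standard monomial appearing in $f_\alpha$. Their number $|B| = \dim_{\mathbb{Q}} \mathbb{Q}[\partial]/I$ equals $\dim_{\mathbb{Q}} \mathrm{Polysol}(I)$ by the argument of Lemma 107. Therefore, it suffices to show $g(\partial) f_\alpha(x) = 0$ for every $g(\partial) = \sum_u C_u \partial^u \in I$. Using $\partial^u x^\beta = \frac{\beta!}{(\beta - u)!}\, x^{\beta - u}$ (which is zero unless $u \le \beta$), we find
$$g(\partial) f_\alpha(x) = \sum_{\beta} \sum_{u} c_{\alpha,\beta}\, C_u\, \frac{\alpha!}{\beta!}\, \partial^u(x^\beta) = \sum_{\beta} \sum_{u} c_{\alpha,\beta}\, C_u\, \frac{\alpha!}{(\beta - u)!}\, x^{\beta - u} = \alpha! \sum_{v} \frac{1}{v!} \Bigl( \sum_{u} c_{\alpha,u+v}\, C_u \Bigr) x^v,$$
where $v = \beta - u$.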
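By linearity of the normal form, the expression $\sum_u c_{\alpha,u+v} C_u$ is the coefficient of $\partial^\alpha$ in the $\prec$-normal form of $\partial^v g(\partial)$. It is zero since $\partial^v g(\partial) \in I$.

If $I$ is homogeneous, then we can write
$$f_\alpha = x^\alpha + \sum_{x^\beta \in N_d} c_{\alpha,\beta}\, \frac{\alpha!}{\beta!}\, x^\beta, \qquad (99)$$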
where the degree of $x^\alpha$ is $d$ and $N_d$ denotes the degree-$d$ elements in the set $N$ of non-standard monomials.

We summarize our algorithm for finding all polynomial solutions to a system of linear partial differential equations with constant coefficients.

Input: An ideal $J \subseteq \mathbb{Q}[\partial]$.
Output: A basis for the space of polynomial solutions of $J$.
1. Compute the colon ideal $I$ in formula (96).
2. Compute the reduced Gröbner basis $G$ of $I$ for a term order $\prec$. If some variable $\partial_i$ has no power in $\mathrm{in}_\prec(I)$, then $\mathrm{Polysol}(J)$ is infinite-dimensional by Proposition 109, and we stop.
3. Let $B$ be the set of standard monomials for $I$.
4. Output $f_\alpha(x_1, \ldots, x_n)$ as in (98), for all $x^\alpha \in B$.

The following special case deserves particular attention. A homogeneous zero-dimensional ideal $I$ is called Gorenstein if there is a homogeneous polynomial $V(x)$ such that $I = \{ p \in \mathbb{Q}[\partial] : p(\partial) V(x) = 0 \}$. In this case
$I^\perp$ consists precisely of all polynomials which are gotten by taking successive partial derivatives of $V(x)$. For example, the ideal $I$ generated by the elementary symmetric polynomials is Gorenstein. Here $V(x) = \prod_{1 \le i < j \le n} (x_i - x_j)$, the discriminant, and $I^\perp$ is the space of harmonic polynomials.

Suppose we wish to decide whether or not an ideal $I$ is Gorenstein. We first compute a Gröbner basis $G$ of $I$ with respect to some term order $\prec$. A necessary condition is that there exists a unique standard monomial $\partial^\alpha$ of maximum degree, say $t$. For every monomial $\partial^\beta$ of degree $t$ there exists a unique constant $c_\beta$ such that $\partial^\beta - c_\beta\, \partial^\alpha \in I$. We can find the $c_\beta$'s by normal form reduction modulo $G$. Define $V := \sum_{|\beta| = t} (c_\beta / \beta!)\, x^\beta$, and let $\mathbb{Q}[\partial]V$ be the $\mathbb{Q}$-vector space spanned by the polynomials
$$\partial^u V = \sum_{|\beta| = t - |u|} \frac{c_{\beta + u}}{\beta!}\, x^\beta, \qquad (100)$$
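where $u$ runs over all monomials of degree at most $t$.

Proposition 111. The ideal $I$ is Gorenstein if and only if $\mathbb{Q}[\partial]V = I^\perp$ if and only if $\dim_{\mathbb{Q}}(\mathbb{Q}[\partial]V)$ equals the number of standard monomials.

The previous two propositions provide a practical method for solving linear systems with constant coefficients. We illustrate this in a small example.

Example 112. For $n = 5$ consider the homogeneous ideal
$$I = \langle \partial_1\partial_3,\ \partial_1\partial_4,\ \partial_2\partial_4,\ \partial_2\partial_5,\ \partial_3\partial_5,\ \partial_1 + \partial_2 - \partial_4,\ \partial_2 + \partial_3 - \partial_5 \rangle.$$
Let $\prec$ be any term order with $\partial_5 \prec \partial_4 \prec \partial_3 \prec \partial_2 \prec \partial_1$. The reduced Gröbner basis of $I$ with respect to $\prec$ equals
$$G = \{ \partial_1 - \partial_3 - \partial_4 + \partial_5,\ \partial_2 + \partial_3 - \partial_5,\ \partial_3^2 + \partial_4\partial_5,\ \partial_3\partial_5,\ \partial_4^2,\ \partial_3\partial_4 - \partial_4\partial_5,\ \partial_5^2 \}.$$
The leading monomials $\partial_1, \partial_2, \partial_3^2, \partial_3\partial_5, \partial_4^2, \partial_3\partial_4, \partial_5^2$ generate the initial ideal $\mathrm{in}_\prec(I)$. The space of polynomials annihilated by $\mathrm{in}_\prec(I)$ is spanned by the standard monomials
$$B = \{ 1, x_3, x_4, x_5, x_4 x_5 \}.$$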
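Steps 2 to 4 of the algorithm can be replayed on this example. The Python sketch below is our own: it reduces monomials modulo the Gröbner basis $G$ above by plain multivariate division, reads off the constants $c_{\alpha,\beta}$, builds the polynomials $f_\alpha$ of (98), and checks that every generator of $I$ annihilates them. Comparing exponent tuples lexicographically gives a term order with $\partial_5 \prec \partial_4 \prec \cdots \prec \partial_1$, which is all the example assumes about $\prec$; the helper names (`mono`, `normal_form`, `f_alpha`) are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

N = 5

def mono(*indices):
    """Exponent vector of d_{i1} d_{i2} ... (1-based indices, as in the text)."""
    e = [0] * N
    for i in indices:
        e[i - 1] += 1
    return tuple(e)

# Reduced Groebner basis G of Example 112 (polynomials as {exponent: coeff}).
G = [
    {mono(1): 1, mono(3): -1, mono(4): -1, mono(5): 1},
    {mono(2): 1, mono(3): 1, mono(5): -1},
    {mono(3, 3): 1, mono(4, 5): 1},
    {mono(3, 5): 1},
    {mono(4, 4): 1},
    {mono(3, 4): 1, mono(4, 5): -1},
    {mono(5, 5): 1},
]
I_gens = [{mono(1, 3): 1}, {mono(1, 4): 1}, {mono(2, 4): 1}, {mono(2, 5): 1},
          {mono(3, 5): 1}, {mono(1): 1, mono(2): 1, mono(4): -1},
          {mono(2): 1, mono(3): 1, mono(5): -1}]
B = [mono(), mono(3), mono(4), mono(5), mono(4, 5)]

def normal_form(f):
    """Remainder of f on division by G; tuples compare lexicographically."""
    f, rem = dict(f), {}
    while f:
        lm = max(f)
        c = f.pop(lm)
        for g in G:
            lg = max(g)
            if all(x >= y for x, y in zip(lm, lg)):
                shift = tuple(x - y for x, y in zip(lm, lg))
                factor = Fraction(c) / g[lg]
                for m, coef in g.items():
                    if m != lg:
                        mm = tuple(x + y for x, y in zip(m, shift))
                        f[mm] = f.get(mm, 0) - factor * coef
                        if f[mm] == 0:
                            del f[mm]
                break
        else:
            rem[lm] = c
    return rem

def fact(e):
    return prod(factorial(k) for k in e)

def f_alpha(alpha, max_deg=2):
    """Formula (98); in this example every monomial of degree > 2 lies in I."""
    out = {}
    for beta in product(range(max_deg + 1), repeat=N):
        if sum(beta) <= max_deg:
            c = normal_form({beta: 1}).get(alpha, 0)
            if c != 0:
                out[beta] = Fraction(c) * fact(alpha) / fact(beta)
    return out

def apply_op(op, poly):
    """Apply the constant-coefficient operator op(d) to a polynomial."""
    out = {}
    for a, ca in op.items():
        for b, cb in poly.items():
            if all(x >= y for x, y in zip(b, a)):
                e = tuple(x - y for x, y in zip(b, a))
                c = prod(factorial(x) // factorial(x - y) for x, y in zip(b, a))
                out[e] = out.get(e, 0) + ca * cb * c
    return {e: c for e, c in out.items() if c != 0}

solutions = {alpha: f_alpha(alpha) for alpha in B}
assert all(apply_op(g, f) == {} for g in I_gens for f in solutions.values())
print(solutions[mono(4)] == {mono(4): 1, mono(1): 1})  # True: f = x4 + x1
```

For $x^\alpha = x_4 x_5$ the same code returns the quadratic form $V$ discussed next, as (99) predicts for a homogeneous ideal.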
There exists a unique standard monomial of maximum degree $t = 2$, so it makes sense to check whether $I$ is Gorenstein. For any quadratic monomial $x_i x_j$, the normal form of $\partial_i \partial_j$ with respect to $G$ equals $c_{ij}\, \partial_4 \partial_5$ for some constant $c_{ij}$. We collect these constants in the quadratic form
$$V = \frac{1}{2} \sum_{i=1}^{5} c_{ii}\, x_i^2 + \sum_{1 \le i < j \le 5} c_{ij}\, x_i x_j = x_4 x_5 + x_1 x_5 + x_3 x_4 + x_2 x_3 + x_1 x_2 - \frac{1}{2} x_3^2 - \frac{1}{2} x_2^2 - \frac{1}{2} x_1^2.$$
This polynomial is annihilated by $I$, and $x_4 x_5$ is the only one of its monomials that is annihilated by $\mathrm{in}_\prec(I)$. We next compute the $\mathbb{Q}$-vector space $\mathbb{Q}[\partial]V$ of all partial derivatives of $V$. It turns out that this space is five-dimensional: it is spanned by $V$, three linearly independent first partial derivatives, and the constants. Using Proposition 111 we conclude that $I$ is Gorenstein and its solution space $I^\perp$ equals $\mathbb{Q}[\partial]V$.
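The dimension count behind this conclusion can be reproduced with a few lines of exact linear algebra. The Python sketch below (our own) enters $V$ as displayed above, checks that the generators of $I$ annihilate it, and computes the rank over $\mathbb{Q}$ of the derivatives $\partial^u V$ for $|u| \le 2$. By Proposition 111, rank 5 (the number of standard monomials) confirms that $I$ is Gorenstein.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

N = 5

def mono(*indices):
    """Exponent vector of x_{i1} x_{i2} ... (1-based indices)."""
    e = [0] * N
    for i in indices:
        e[i - 1] += 1
    return tuple(e)

half = Fraction(1, 2)
V = {mono(4, 5): 1, mono(1, 5): 1, mono(3, 4): 1, mono(2, 3): 1, mono(1, 2): 1,
     mono(3, 3): -half, mono(2, 2): -half, mono(1, 1): -half}
I_gens = [{mono(1, 3): 1}, {mono(1, 4): 1}, {mono(2, 4): 1}, {mono(2, 5): 1},
          {mono(3, 5): 1}, {mono(1): 1, mono(2): 1, mono(4): -1},
          {mono(2): 1, mono(3): 1, mono(5): -1}]

def apply_op(op, poly):
    """Apply the constant-coefficient operator op(d) to a polynomial."""
    out = {}
    for a, ca in op.items():
        for b, cb in poly.items():
            if all(x >= y for x, y in zip(b, a)):
                e = tuple(x - y for x, y in zip(b, a))
                c = prod(factorial(x) // factorial(x - y) for x, y in zip(b, a))
                out[e] = out.get(e, 0) + ca * cb * c
    return {e: c for e, c in out.items() if c != 0}

def rank(vectors):
    """Rank over Q of sparse vectors {key: value}, by Gaussian elimination."""
    keys = sorted(set().union(*vectors))
    rows = [[Fraction(v.get(k, 0)) for k in keys] for v in vectors]
    r = 0
    for col in range(len(keys)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

assert all(apply_op(g, V) == {} for g in I_gens)   # V lies in I-perp
derivatives = [apply_op({u: 1}, V)
               for u in product(range(3), repeat=N) if sum(u) <= 2]
print(rank(derivatives))  # 5
```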
10.4
How to Solve Monomial Equations

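We consider an arbitrary monomial ideal $M = \langle \partial^{a^{(1)}}, \partial^{a^{(2)}}, \ldots, \partial^{a^{(r)}} \rangle$ in $\mathbb{Q}[\partial]$. The solution space $\mathrm{Sol}(M)$ consists of all functions $f(x_1, \ldots, x_n)$ for which a specified set of partial derivatives vanishes:
$$\frac{\partial^{|a^{(i)}|} f}{\partial x_1^{a^{(i)}_1} \cdots \partial x_n^{a^{(i)}_n}} = 0 \quad \text{for } i = 1, 2, \ldots, r.$$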
If $M$ is zero-dimensional, then $\mathrm{Sol}(M)$ is finite-dimensional, with basis the standard monomials $B$ as in the previous section. Otherwise, $\mathrm{Sol}(M)$ is an infinite-dimensional space. In what follows we offer a finite description. We are interested in pairs $(u, \sigma)$ consisting of a monomial $x^u$, with $u \in \mathbb{N}^n$, and a subset $\sigma$ of $\{x_1, x_2, \ldots, x_n\}$ with the following three properties:
1. $u_i = 0$ for all $x_i \in \sigma$.
2. Every monomial of the form $x^u \cdot \prod_{x_i \in \sigma} x_i^{v_i}$ lies in $\mathrm{Sol}(M)$.
3. For each $x_j \notin \sigma$ there exists a monomial $\partial^u \cdot \partial_j^{w_j} \cdot \prod_{x_i \in \sigma} \partial_i^{v_i}$ which lies in $M$.
The pairs $(u, \sigma)$ with these three properties are called the standard pairs of the monomial ideal $M$. Computing the standard pairs of a monomial ideal is a standard task in combinatorial commutative algebra. See (Hosten and Smith 2001) for an implementation in Macaulay2. This is important for us because the standard pairs are exactly what we want when solving a monomial ideal.
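Theorem 113. A function $f(x)$ is a solution to the ideal $M$ of monomial differential operators if and only if it can be written in the form
$$f(x_1, \ldots, x_n) = \sum_{(u, \sigma)} x_1^{u_1} \cdots x_n^{u_n} \cdot g_{(u,\sigma)}(x_i : x_i \in \sigma),$$
where the sum is over all standard pairs $(u, \sigma)$ of $M$ and each $g_{(u,\sigma)}$ is an arbitrary function of the variables in $\sigma$.

Example 114. Let $n = 3$ and consider the monomial ideal
$$M = \langle \partial_1^2\partial_2^3\partial_3^4,\ \partial_1^2\partial_2^4\partial_3^3,\ \partial_1^3\partial_2^2\partial_3^4,\ \partial_1^3\partial_2^4\partial_3^2,\ \partial_1^4\partial_2^2\partial_3^3,\ \partial_1^4\partial_2^3\partial_3^2 \rangle.$$
Thus $\mathrm{Sol}(M)$ consists of all functions $f(x_1, x_2, x_3)$ with the property
$$\frac{\partial^9 f}{\partial x_i^2\, \partial x_j^3\, \partial x_k^4} = 0 \quad \text{for all permutations } (i, j, k) \text{ of } \{1, 2, 3\}.$$
The ideal $M$ has precisely 13 standard pairs:
$$\begin{array}{lll}
(x_3, \{x_1, x_2\}), & (1, \{x_1, x_2\}), & (x_2, \{x_1, x_3\}), \\
(1, \{x_1, x_3\}), & (x_1, \{x_2, x_3\}), & (1, \{x_2, x_3\}), \\
(x_2^2 x_3^2, \{x_1\}), & (x_1^2 x_3^2, \{x_2\}), & (x_1^2 x_2^2, \{x_3\}), \\
(x_1^3 x_2^3 x_3^3, \{\}), & (x_1^2 x_2^3 x_3^3, \{\}), & (x_1^3 x_2^2 x_3^3, \{\}), \\
(x_1^3 x_2^3 x_3^2, \{\}). & &
\end{array}$$
We conclude that the solutions to $M$ are the functions of the following form:
$$\begin{aligned}
& x_3 f_1(x_1, x_2) + f_2(x_1, x_2) + x_2 g_1(x_1, x_3) + g_2(x_1, x_3) + x_1 h_1(x_2, x_3) + h_2(x_2, x_3) \\
& + x_2^2 x_3^2\, p(x_1) + x_1^2 x_3^2\, q(x_2) + x_1^2 x_2^2\, r(x_3) \\
& + a_1 x_1^3 x_2^3 x_3^3 + a_2 x_1^2 x_2^3 x_3^3 + a_3 x_1^3 x_2^2 x_3^3 + a_4 x_1^3 x_2^3 x_3^2.
\end{aligned}$$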
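Standard pairs are normally computed with Macaulay2, as cited above. For an example of this size one can also find them by brute force. The Python sketch below is our own: it tests properties 1 to 3 for every candidate pair in a finite box (large enough, for the reason given in a comment) and reproduces the 13 standard pairs of Example 114. Indices are 0-based, so $\sigma = (0, 1)$ stands for $\{x_1, x_2\}$.

```python
from itertools import combinations, permutations, product

# Example 114: M is generated by d^a for all permutations a of (2, 3, 4).
GENS = sorted(set(permutations((2, 3, 4))))

def covered(u, a, coords):
    """True if a_i <= u_i for every index i in coords."""
    return all(a[i] <= u[i] for i in coords)

def is_standard_pair(u, sigma, gens):
    n = len(u)
    if any(u[i] != 0 for i in sigma):                    # property 1
        return False
    free = [i for i in range(n) if i not in sigma]
    # Property 2: x^u times any monomial in the sigma variables is never
    # divisible by a generator, i.e. no generator is bounded by u off sigma.
    if any(covered(u, a, free) for a in gens):
        return False
    # Property 3: for each j outside sigma, raising x_j and the sigma
    # variables far enough lands in M.
    return all(any(covered(u, a, [i for i in free if i != j]) for a in gens)
               for j in free)

def standard_pairs(gens):
    n = len(gens[0])
    # Properties 2 and 3 force u_i < max_a a_i for every i outside sigma.
    top = [max(a[i] for a in gens) for i in range(n)]
    pairs = []
    for k in range(n + 1):
        for sigma in combinations(range(n), k):
            free = [i for i in range(n) if i not in sigma]
            for vals in product(*(range(top[i]) for i in free)):
                u = [0] * n
                for i, v in zip(free, vals):
                    u[i] = v
                if is_standard_pair(tuple(u), sigma, gens):
                    pairs.append((tuple(u), sigma))
    return pairs

pairs = standard_pairs(GENS)
print(len(pairs))  # 13
```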
10.5
The Ehrenpreis-Palamodov Theorem
We are seeking a finite representation of all the solutions to an arbitrary ideal $I$ in $\mathbb{Q}[\partial] = \mathbb{Q}[\partial_1, \ldots, \partial_n]$. This representation should generalize both the case of zero-dimensional ideals and the case of monomial ideals, and it should reveal all polynomial solutions. Let us present two simple examples, both for $n = 3$, which do not fall in the categories discussed so far.

Example 115. Consider the principal prime ideal $I = \langle \partial_1 \partial_3 - \partial_2 \rangle$. The variety of $I$ is a surface in $\mathbb{C}^3$ parametrically given as $(s, st, t)$, where $s, t$ run over all complex numbers. The PDE solutions to $I$ are the functions $f(x_1, x_2, x_3)$ which satisfy the equation
$$\frac{\partial^2 f}{\partial x_1\, \partial x_3} = \frac{\partial f}{\partial x_2}.$$
In the setting of Ehrenpreis and Palamodov, every solution to this differential equation can be expressed as a double integral of the form
$$f(x_1, x_2, x_3) = \int \exp(s x_1 + st x_2 + t x_3)\, d\mu(s, t), \qquad (101)$$
where the integral is taken with respect to any measure $\mu$ on the complex $(s, t)$-plane $\mathbb{C}^2$. For instance, we might integrate with respect to the measure supported at the two points $(i, i)$ and $(0, 17)$ and get a solution like
$$g(x_1, x_2, x_3) = \exp(i x_1 - x_2 + i x_3) + \exp(17 x_3).$$
Example 116. Let us consider the previous example but now add the requirement that the second partials with respect to $x_2$ and $x_3$ should vanish as well. That is, we now consider the larger ideal $J = \langle \partial_1 \partial_3 - \partial_2,\ \partial_2^2,\ \partial_3^2 \rangle$. The ideal $J$ is primary to $\langle \partial_2, \partial_3 \rangle$. It turns out that there are two kinds of solutions. The first class of solutions are functions in the first variable only:
$$f(x_1, x_2, x_3) = g(x_1).$$
The second class of solutions takes the following form:
$$f(x_1, x_2, x_3) = g(x_1) \cdot x_3 + g'(x_1) \cdot x_2.$$
In both cases, $g$ is any differentiable function in one variable. It is instructive to derive the second class as a special case from the integral formula (101).

We are now prepared to state the Ehrenpreis-Palamodov Theorem, in a form that emphasizes the algebraic aspects over the analytic aspects. For more analytic information and a proof of Theorem 117 see (Björk 1979).

Theorem 117. Given any ideal $I$ in $\mathbb{Q}[\partial_1, \ldots, \partial_n]$, there exist finitely many pairs $(A_j, V_j)$, where $A_j(x_1, \ldots, x_n, \xi_1, \ldots, \xi_n)$ is a polynomial in $2n$ unknowns and $V_j \subseteq \mathbb{C}^n$ is the irreducible variety of an associated prime of $I$, such that the following holds. If $K$ is any compact and convex subset of $\mathbb{R}^n$ and $f \in C^\infty(K)$ is any solution to $I$, then there exist measures $\mu_j$ on $V_j$ such that
$$f(i x_1, \ldots, i x_n) = \sum_j \int_{V_j} A_j(x, \xi)\, \exp(i\, x \cdot \xi)\, d\mu_j(\xi). \qquad (102)$$
Here $i^2 = -1$. Theorem 117 gives a precise characterization of the scheme structure defined by $I$. Indeed, if $I$ is a radical ideal, then all $A_j$ can be taken as the constant 1, and the pairs $(1, V_j)$ simply run over the irreducible components of $I$. The main point is that the polynomials $A_j(x, \xi)$ are independent of the space $F = C^\infty(K)$ in which the solutions lie. In the opinion of the author, the true meaning of solving a polynomial system $I$ is to exhibit the associated primes of $I$ together with their multiplier polynomials $A_j(x, \xi)$.

Our earlier results on zero-dimensional ideals and monomial ideals can be interpreted as special cases of the Ehrenpreis-Palamodov Theorem. In both cases, the polynomials $A_j(x, \xi)$ only depend on $x$ and not on the auxiliary variables $\xi$. In the zero-dimensional case, each $V_j$ is a single point, say $V_j = \{u^{(j)}\}$. Specifying a measure $\mu_j$ on $V_j$ means picking a constant multiplier for the function $\exp(x \cdot u^{(j)})$. Hence we recover Theorem 108. If $I$ is a monomial ideal, then each $V_j$ is a coordinate subspace, indexed by a subset $\sigma$ of the variables, and we can take monomials $x_1^{u_1} \cdots x_n^{u_n}$ for the $A_j$. Thus, in the monomial case, the pairs $(A_j, V_j)$ are the standard pairs of Theorem 113. For general ideals which are neither zero-dimensional nor monomial, the extra variables $\xi = (\xi_1, \ldots, \xi_n)$ may need to appear in the polynomials $A_j(x, \xi)$. A small ideal where this is necessary appears in Example 116.

Suppose we are given an ideal $I$ in $\mathbb{Q}[\partial]$ and we wish to compute the list of pairs $(A_j, V_j)$ described in the Ehrenpreis-Palamodov Theorem. It is conceptually easier to first compute a primary decomposition of $I$, and then compute multipliers $A_j$ for each primary component separately. This leads to the idea of Noetherian operators associated to a primary ideal. In the literature, it is customary to Fourier-dualize the situation and to think of the $A_j(x, \xi)$ as differential operators. We shall sketch this in the next section.
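For Example 116 one can at least confirm both classes of solutions for polynomial choices of $g$; by linearity it suffices to take $g(x_1) = x_1^k$. The Python sketch below is our own and uses exact integer arithmetic. It also shows that the term $g'(x_1) \cdot x_2$ of the second class cannot be dropped.

```python
from math import factorial, prod

def apply_op(op, poly):
    """Apply the constant-coefficient operator op(d) to a polynomial.

    Both are dicts {exponent tuple: coefficient} in three variables.
    """
    out = {}
    for a, ca in op.items():
        for b, cb in poly.items():
            if all(x >= y for x, y in zip(b, a)):
                e = tuple(x - y for x, y in zip(b, a))
                c = prod(factorial(x) // factorial(x - y) for x, y in zip(b, a))
                out[e] = out.get(e, 0) + ca * cb * c
    return {e: c for e, c in out.items() if c != 0}

J = [
    {(1, 0, 1): 1, (0, 1, 0): -1},   # d1 d3 - d2
    {(0, 2, 0): 1},                  # d2^2
    {(0, 0, 2): 1},                  # d3^2
]

def second_class(k):
    """f = g(x1) x3 + g'(x1) x2 for the choice g(x1) = x1^k."""
    f = {(k, 0, 1): 1}
    if k > 0:
        f[(k - 1, 1, 0)] = k
    return f

for k in range(10):
    assert all(apply_op(p, {(k, 0, 0): 1}) == {} for p in J)   # first class
    assert all(apply_op(p, second_class(k)) == {} for p in J)  # second class

# Without the g'(x1) x2 term, d1 d3 - d2 maps x1*x3 to the constant 1.
print(apply_op(J[0], {(1, 0, 1): 1}))  # {(0, 0, 0): 1}
```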
10.6
Noetherian Operators
In this section we consider ideals in the polynomial ring $\mathbb{Q}[x] = \mathbb{Q}[x_1, \ldots, x_n]$. Let $Q$ be a primary ideal in $\mathbb{Q}[x]$ and $V$ its irreducible variety in $\mathbb{C}^n$.

Theorem 118. There exist differential operators with polynomial coefficients,
$$A_i(x, \partial) = \sum_{j} p_{ij}(x_1, \ldots, x_n)\, \partial_1^{j_1} \partial_2^{j_2} \cdots \partial_n^{j_n}, \qquad i = 1, 2, \ldots, t,$$
with the following property. A polynomial $f \in \mathbb{Q}[x]$ lies in the ideal $Q$ if and only if the result of applying $A_i(x, \partial)$ to $f(x)$ vanishes on $V$ for $i = 1, 2, \ldots, t$.
The operators $A_1(x, \partial), \ldots, A_t(x, \partial)$ are said to be Noetherian operators for the primary ideal $Q$. Our computational task is to go back and forth between the two presentations of a primary ideal $Q$. The first presentation is by means of ideal generators; the second presentation is by means of Noetherian operators. Solving the equations $Q$ means going from the first presentation to the second. The reverse process can be thought of as implicitization and is equally important. The final version of these notes will contain some interesting examples to demonstrate the usefulness of Noetherian operators.
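Here is a small instance of Theorem 118 (our own example, not from the notes). Take the ideal of Example 116 with each $\partial_i$ renamed $x_i$, namely $Q = \langle x_1 x_3 - x_2, x_2^2, x_3^2 \rangle$, which is primary with variety $V = \{x_2 = x_3 = 0\}$. Substituting $x_2 = x_1 x_3$ identifies $\mathbb{Q}[x]/Q$ with $\mathbb{Q}[x_1, x_3]/\langle x_3^2 \rangle$, and $h(x_1, x_3)$ lies in $\langle x_3^2 \rangle$ exactly when $h$ and $\partial h/\partial x_3$ vanish at $x_3 = 0$. By the chain rule, this says that $A_1 = 1$ and $A_2 = \partial/\partial x_3 + x_1\, \partial/\partial x_2$ are Noetherian operators for $Q$. The Python sketch compares the two membership tests on a few polynomials.

```python
# Polynomials in x1, x2, x3 are dicts {exponent tuple: coefficient}.

def diff(poly, i):
    """Partial derivative with respect to x_{i+1} (0-based index i)."""
    out = {}
    for e, c in poly.items():
        if e[i] > 0:
            e2 = e[:i] + (e[i] - 1,) + e[i + 1:]
            out[e2] = out.get(e2, 0) + c * e[i]
    return out

def times_x1(poly):
    return {(e[0] + 1, e[1], e[2]): c for e, c in poly.items()}

def add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def vanishes_on_V(poly):
    """Restrict to V = {x2 = x3 = 0} and test whether the result is zero."""
    return all(c == 0 for e, c in poly.items() if e[1] == 0 and e[2] == 0)

def in_Q_noetherian(f):
    A1 = f                                       # A1 = 1
    A2 = add(diff(f, 2), times_x1(diff(f, 1)))   # A2 = d/dx3 + x1 d/dx2
    return vanishes_on_V(A1) and vanishes_on_V(A2)

def in_Q_substitution(f):
    """Reference test: substitute x2 = x1*x3, then reduce modulo x3^2."""
    h = {}
    for (a, b, c), coef in f.items():
        e = (a + b, b + c)         # exponents of x1 and x3 after substitution
        if e[1] < 2:
            h[e] = h.get(e, 0) + coef
    return all(v == 0 for v in h.values())

tests = {
    "x1*x3 - x2": {(1, 0, 1): 1, (0, 1, 0): -1},
    "x2^2": {(0, 2, 0): 1},
    "x1^2*(x1*x3 - x2) + x3^2": {(3, 0, 1): 1, (2, 1, 0): -1, (0, 0, 2): 1},
    "x2": {(0, 1, 0): 1},
    "x3": {(0, 0, 1): 1},
    "x1*x3": {(1, 0, 1): 1},
}
for name, f in tests.items():
    print(name, in_Q_noetherian(f), in_Q_substitution(f))
# The first three lines end in "True True", the last three in "False False".
```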
10.7
Exercises
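(1) Let $a, b, c$ be arbitrary positive integers. How many linearly independent (polynomial) functions $f(x, y, z)$ satisfy the differential equations
$$\frac{\partial^a f}{\partial x^a} = \frac{\partial^{b+c} f}{\partial y^b\, \partial z^c}, \qquad \frac{\partial^a f}{\partial y^a} = \frac{\partial^{b+c} f}{\partial x^b\, \partial z^c} \qquad \text{and} \qquad \frac{\partial^a f}{\partial z^a} = \frac{\partial^{b+c} f}{\partial x^b\, \partial y^c}\ ?$$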
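(2) Let $\lambda_1, \lambda_2, \lambda_3$ be parameters and consider the differential equations
$$\partial_1 + \partial_2 + \partial_3 - \lambda_1, \qquad \partial_1\partial_2 + \partial_1\partial_3 + \partial_2\partial_3 - \lambda_2, \qquad \partial_1\partial_2\partial_3 - \lambda_3.$$
Find a solution basis which works for all values of the parameters $\lambda_1, \lambda_2, \lambda_3$. One of your basis elements should have the form $(x_1 - x_2)(x_1 - x_3)(x_2 - x_3) + O(\lambda_1, \lambda_2, \lambda_3)$.

(3) Describe all solutions to the differential equations
$$\partial_1 \partial_3 - \partial_2^2, \qquad \partial_2^3, \qquad \partial_3^3.$$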
(4) The $m$th symbolic power $P^{(m)}$ of a prime ideal $P$ in a polynomial ring $\mathbb{Q}[x_1, \ldots, x_n]$ is the $P$-primary component in the ordinary power $P^m$. What are the Noetherian operators for $P^{(m)}$?
11
References
1. E. Allgower and K. Georg, Numerical Continuation Methods, Springer Verlag, 1990.
2. R. Benedetti and J.-J. Risler, Real algebraic and semi-algebraic sets, Actualités Mathématiques, Hermann, Paris, 1990.
3. G. Bergman, The logarithmic limit-set of an algebraic variety, Trans. Amer. Math. Soc. 157 (1971) 459-469.
4. D. Bernstein, The number of roots of a system of equations, Functional Analysis and its Applications 9 (1975) 1-4.
5. R. Bieri and J. Groves, The geometry of the set of characters induced by valuations, J. Reine Angew. Math. 347 (1984) 168-195.
6. J.-E. Björk, Rings of differential operators, North-Holland Mathematical Library, Vol. 21, Amsterdam-New York, 1979.
7. J. Canny and I. Emiris, A subdivision-based algorithm for the sparse resultant, Journal of the ACM 47 (2000) 417-451.
8. D. Cox, J. Little and D. O'Shea: Ideals, Varieties, and Algorithms. An Introduction to Computational Algebraic Geometry and Commutative Algebra, Second edition. Undergraduate Texts in Mathematics. Springer-Verlag, New York, 1997.
9. D. Cox, J. Little and D. O'Shea: Using Algebraic Geometry, Graduate Texts in Mathematics, Vol 185. Springer-Verlag, New York, 1998.
10. P. Diaconis and B. Sturmfels, Algebraic algorithms for sampling from conditional distributions, Annals of Statistics 26 (1998) 363-397.
11. P. Diaconis, D. Eisenbud and B. Sturmfels, Lattice walks and primary decomposition, Mathematical Essays in Honor of Gian-Carlo Rota, eds. B. Sagan and R. Stanley, Progress in Mathematics, Vol 161, Birkhäuser, Boston, 1998, pp. 173-193.
12. A. Dickenstein and I. Emiris, Multihomogeneous resultant matrices, ISSAC 2002.
13. A. Dobra and S. Sullivant, A divide-and-conquer algorithm for generating Markov bases of multi-way tables, Manuscript, 2002, posted at http://www.niss.org/adobra.html/.
14. D. Eisenbud, Commutative algebra. With a view toward algebraic geometry, Graduate Texts in Mathematics, Vol 150, Springer-Verlag, New York, 1995.
15. D. Eisenbud, D. Grayson, M. Stillman and B. Sturmfels, Mathematical Computations with Macaulay 2, Algorithms and Computation in Mathematics, Vol. 8, Springer Verlag, Heidelberg, 2001, see also http://www.math.uiuc.edu/Macaulay2/.
16. D. Eisenbud and J. Harris, The Geometry of Schemes, Graduate Texts in Mathematics, Vol 197, Springer-Verlag, New York, 2000.
17. D. Eisenbud and B. Sturmfels, Binomial ideals, Duke Math. Journal 84 (1996) 1-45.
18. W. Fulton, Introduction to toric varieties, Annals of Mathematics Studies, 131, Princeton University Press, Princeton, NJ, 1993.
19. D. Geiger, C. Meek and B. Sturmfels, On the toric algebra of graphical models, submitted for publication, 2002, posted at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-47
20. I. Gelfand, M. Kapranov, and A. Zelevinsky, Discriminants, resultants, and multidimensional determinants, Birkhäuser Boston, Boston, 1994.
21. G. Greuel and G. Pfister, A SINGULAR Introduction to Commutative Algebra, Springer Verlag, 2002, http://www.singular.uni-kl.de/
22. B. Haas, A simple counterexample to Kouchnirenko's conjecture, Beiträge zur Algebra und Geometrie (2002), to appear.
23. S. Hosten and G. Smith, Monomial ideals, in Mathematical Computations with Macaulay 2, eds. D. Eisenbud, D. Grayson, M. Stillman and B. Sturmfels, Algorithms and Computation in Mathematics, Vol. 8, Springer Verlag, Heidelberg, 2001, pp. 73-100.
24. N.V. Ilyushechkin, The discriminant of the characteristic polynomial of a normal matrix, Mat. Zametki 51 (1992), no. 3, 16-23; translation in Math. Notes 51 (1992), no. 3-4, 230-235.
25. M. Kalkbrener and B. Sturmfels, Initial complexes of prime ideals, Advances in Mathematics 116 (1995) 365-376.
26. A. Khetan, Determinantal formula for the Chow form of a toric surface, ISSAC 2002, posted at http://math.berkeley.edu/~akhetan/
27. A. Khovanskii, A class of systems of transcendental equations, Dokl. Akad. Nauk SSSR 255 (1980), no. 4, 804-807.
28. A. Khovanskii, Fewnomials, Translations of Mathematical Monographs, Vol 88. American Mathematical Society, Providence, RI, 1991.
29. M. Kreuzer and L. Robbiano, Computational Commutative Algebra. 1, Springer-Verlag, Berlin, 2000, see also http://cocoa.dima.unige.it/
30. J. Lagarias and T. Richardson, Multivariate Descartes rule of signs and Sturmfels's challenge problem, Math. Intelligencer 19 (1997) 9-15.
31. R. Laubenbacher and I. Swanson, Permanental ideals, J. Symbolic Comput. 30 (2000) 195-205.
32. S. Lauritzen, Graphical Models. Oxford Statistical Science Series, 17, Oxford University Press, New York, 1996.
33. P. Lax, On the discriminant of real symmetric matrices, Comm. Pure Appl. Math. 51 (1998) 1387-1396.
34. C. Lemke and J. Howson, Jr., Equilibrium points of bimatrix games, J. Soc. Indust. Appl. Math. 12 (1964) 413-423.
35. T.Y. Li, Numerical solution of multivariate polynomial systems by homotopy continuation methods, Acta Numerica, 1997, 399-436, Acta Numer., Vol 6, Cambridge Univ. Press, Cambridge, 1997.
36. T.Y. Li, M. Rojas and X. Wang, Counting isolated roots of trinomial systems in the plane and beyond, Manuscript, 2000, math.CO/0008069.
37. K. Mayer, Über die Lösung algebraischer Gleichungssysteme durch hypergeometrische Funktionen, Monatshefte Math. Phys. 45 (1937).
38. J. McDonald, Fiber polytopes and fractional power series, J. Pure Appl. Algebra 104 (1995) 213-233.
39. J. McDonald, Fractional power series solutions for systems of equations, Discrete and Computational Geometry 27 (2002) 501-529.
40. R.D. McKelvey and A. McLennan, The maximal number of regular totally mixed Nash equilibria, J. Economic Theory, 72 (1997) 411-425.
41. A. McLennan and I. Park: Generic 4 × 4 two person games have at most 15 Nash equilibria, Games Econom. Behav. 26 (1999) 111-130.
42. J. Nash, Non-cooperative games, Annals of Math. 54 (1951) 286-295.
43. J. Nash and L. Shapley, A simple three-person poker game, Contributions to the Theory of Games, pp. 105-116. Annals of Mathematics Studies, no. 24. Princeton University Press, Princeton, NJ, 1950.
44. P. Parrilo, Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization, Ph.D. thesis, Caltech, 2000, http://www.aut.ee.ethz.ch/~parrilo/pubs/index.html.
45. P. Parrilo and B. Sturmfels, Minimizing polynomial functions, to appear in the Proceedings of the DIMACS Workshop on Algorithmic and Quantitative Aspects of Real Algebraic Geometry in Mathematics and Computer Science (March 2001), (eds. S. Basu and L. Gonzalez-Vega), American Mathematical Society, posted at math.OC/0103170.
46. P. Pedersen and B. Sturmfels, Product formulas for resultants and Chow forms, Mathematische Zeitschrift 214 (1993) 377-396.
47. G. Pistone, E. Riccomagno and H.P. Wynn, Algebraic Statistics: Computational Commutative Algebra in Statistics, Chapman and Hall, Boca Raton, Florida, 2001.
48. M. Saito, B. Sturmfels and N. Takayama, Gröbner Deformations of Hypergeometric Differential Equations, Algorithms and Computation in Mathematics 6, Springer Verlag, Heidelberg, 1999.
49. H. Scarf, The computation of economic equilibria, Cowles Foundation Monograph, No. 24, Yale University Press, New Haven, Conn.-London, 1973.
50. I. Shafarevich, Basic algebraic geometry. 1. Varieties in projective space, Second edition. Translated from the 1988 Russian edition and with notes by Miles Reid. Springer-Verlag, Berlin, 1994.
51. A. Sommese, J. Verschelde and C. Wampler, Numerical decomposition of the solution sets of polynomial systems into irreducible components, SIAM J. Numer. Anal. 38 (2001) 2022-2046.
52. G. Stengle, A nullstellensatz and a positivstellensatz in semialgebraic geometry, Math. Ann. 207 (1974) 87-97.
53. B. Sturmfels, On the Newton polytope of the resultant, Journal of Algebraic Combinatorics 3 (1994) 207-236.
54. B. Sturmfels, Gröbner Bases and Convex Polytopes, American Mathematical Society, University Lectures Series, No. 8, Providence, Rhode Island, 1996.
55. B. Sturmfels, Solving algebraic equations in terms of A-hypergeometric series, Discrete Mathematics 210 (2000) 171-181.
56. B. Sturmfels and A. Zelevinsky, Multigraded resultants of Sylvester type, Journal of Algebra 163 (1994) 115-127.
57. A. Takken, Monte Carlo Goodness-of-Fit Tests for Discrete Data, PhD Dissertation, Department of Statistics, Stanford University, 1999.