
THE F-ADJOINED GAUSS MAP AND GAUSSIAN LIKELIHOOD GEOMETRY

LUKAS GUSTAFSSON
arXiv:2311.03503v1 [math.AG] 6 Nov 2023

Abstract. We introduce the F-adjoined Gauss map. We use it to
express the Gaussian maximum likelihood degree as a product of two
invariants. As an application of our product formula, we classify all
projective curves of Gaussian maximum likelihood degree 1. We also
provide a formula for the generic Gaussian maximum likelihood degree
of a projective variety X in terms of its polar classes. The renowned
polar class formula for generic Euclidean distance degree is a special
case of our formula.

1. Introduction
In this paper we study algebraic statistical models of centered multivariate
Gaussian distributions. These are subsets of the cone of m × m symmetric
positive definite matrices given by polynomial equalities and inequalities.
Each matrix K in the model corresponds to the probability density function
\[ f_K(x) = \sqrt{\frac{\det(K)}{(2\pi)^m}} \; e^{-\frac{x^\top K x}{2}}. \]
The Gaussian maximum likelihood degree of a statistical model M counts
the number of complex critical points of the log-likelihood function
\[ \ell_S(K) = \frac{n}{2}\left(\log\det(K) - \operatorname{trace}(SK) - m\log(2\pi)\right) \tag{1.1} \]
when constrained to the complex Zariski closure M. Here $S = \frac{1}{n}\sum_{i=1}^{n} Y_i Y_i^\top$
is the sample covariance matrix obtained from n independent samples Yi of
the underlying distribution and we assume throughout this article that the
vanishing ideal of M is homogeneous. The concept of Gaussian maximum
likelihood degree was introduced in [SU10] for Gaussian statistical models.
In [AGK+ 23] a coordinate-free formulation of the problem is introduced to
emphasize the underlying geometry which we now describe briefly. Given an
arbitrary complex vector space L, a homogeneous polynomial F , a general
linear polynomial u ∈ L∗ and a variety X ⊂ PL one can ask for the number
of critical points of
ℓF,u (x) = log F (x) − u(x)
when restricted to the smooth locus of the affine cone CX over X away
from the vanishing locus of F . The number of complex critical points is
the maximum likelihood degree of X with respect to F which we refer to as
MLDF (X), in accordance with [DRGS22]. We recover the classical Gaussian
maximum likelihood degree by letting L be the set of m × m symmetric
matrices Sm , F = det and u = trace(S•). In this paper we further extend
the definition of maximum likelihood degree by allowing for a homogeneous
rational function F = f /g, where deg(f ) ̸= deg(g). We introduce more
notation and definitions in Section 2.
In Section 3, Definition 3.1 introduces the F -adjoined Gauss map of a
projective variety X,
γX,F : X ⇢ Gr(dim(X) − 1, PL).
The main goal of Section 3 is to use γX,F and the order of a subvariety of the
Grassmannian, introduced in Definition 3.14, to prove the following product
formula for MLDF (X).
Theorem 1.1. Let X be a projective variety and F a degree d ≠ 0 homogeneous
rational function, then
\[ \mathrm{MLD}_F(X) = \deg(\gamma^\vee_{X,F}) \cdot \operatorname{order}(\operatorname{Im}(\gamma^\vee_{X,F})) \]

if γX,F is generically finite, otherwise it is zero.
It is an important problem to classify the projective varieties X and ho-
mogeneous polynomials F admitting MLDF (X) = 1 because of their con-
nection to the maximum likelihood estimator (MLE) of the underlying sta-
tistical model M. An MLE of a statistical model M and data u is a point
x ∈ argmaxx∈M ℓF,u . This concept has been extensively studied in the liter-
ature, cf. [AKRS21]. Of particular interest are models whose MLE is unique
and a rational function in the data, MLE : L∗ ⇢ M. If M has homogeneous
vanishing ideal, corresponding to a projective variety X, the MLE
is a rational function in the data if and only if MLDF (X) = 1 [AGK+ 23,
Theorem 3.1]. The varieties X admitting MLDF (X) = 1 are in bijection
with the solutions to the homaloidal PDE [AGK+ 23, Theorem 3.5]
Φ = F ◦ (−∇ log Φ), Φ : L∗ ⇢ C rational and homogeneous.
In Section 4 we classify the projective curves and homogeneous polynomials
F such that MLDF (X) = 1 using Theorem 1.1. These curves are also
referred to as curves with rational MLE.
Theorem 1.2. Let X ⊂ PL be a curve of degree > 1 with vanishing ideal
I(X) and let X ∨ denote its dual variety. For any homogeneous polynomial
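F of positive degree we have
\[
\mathrm{MLD}_F(X) = 1 \iff
\begin{cases}
\text{the linear span of } X \text{ is two-dimensional and } \exists\, \alpha \in L^* \text{ such that:} \\
\operatorname{mult}_{[\alpha]}(X^\vee) = \deg(X^\vee) - 1, \\
F - \alpha^{\deg(F)} \in I(X).
\end{cases}
\]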

Moreover, MLDF (X) = 1 implies that X ∨ is a hypersurface X ∨ = V (g) ⊂
PL∗ and the solution to the homaloidal PDE with respect to F is determined
by α and g:
\[ \Phi_{X,L,F} = \deg(F)^{\deg(F)} \cdot \left(\alpha(\nabla \log g)\right)^{\deg(F)}. \]

A direct consequence of this theorem is that curves X ⊂ PL ≅ P2 with
MLDF (X) = 1 have bounded degree, deg(X) ≤ deg(F ), as long as the
radical of F is not linear. This provides a classification of all varieties X ⊂
PS2 with MLDdet (X) = 1 by combining known results for linear subvarieties
[AGK+ 23, Corollary 4.12] and Corollary 4.6. We also extend the study of
curves with rational MLE to surfaces in Section 5 and identify a class of
surfaces in P3 with maximum likelihood degree 1.
Note that the Gaussian maximum likelihood degree of X for the Fermat
quadric $F = \sum_i x_i^2$ is the same as the Euclidean distance degree of the affine
cone over X [AGK+ 23, Proposition 2.7]. In Section 6 this connection and the
ideas from earlier sections culminate in a new formula for the generic value
of MLDF (X), extending the renowned formula for the generic Euclidean distance
degree as the sum of the polar classes, $\mathrm{EDD}(X) = \sum_i \delta_i(X)$ [DHO+ 16,
Theorem 5.4]. The formula also shares many similarities with known results
for other optimization problems, cf. [SEDS16, Theorem 1] and [KKS21,
Theorem 5.4].
Theorem 1.3. Let F be a homogeneous rational function on L of nonzero
degree. Let X ⊂ PL, then
\[ \mathrm{MLD}_F(X) \le \sum_{i=0}^{\dim \mathbb{P}L - 1} \delta_i(X)\, \mu_i(F) \]
where δi (X) is the i’th multidegree of the conormal variety of X and µi (F ) is
the maximum likelihood degree with respect to F of a general i-dimensional
projective linear subspace. Equality holds if X is sufficiently general.
The exact definition for genericity in Theorem 1.3 is given in Definition 6.1
and motivated by Proposition 6.6. Theorem 1.3 is intended to be used in the
context of algebraic statistics and Gaussian maximum likelihood estimation,
where L = Sm and F = det. The maximum likelihood degree of a general
i-dimensional projective linear subspace of Sm , µi (det), has been studied in
the literature and is referred to as ϕ(m, i) [MMM+ 20].
Theorem 1.3 brings up a future problem of better understanding the re-
striction of the determinant to the linear space L ⊂ Sm associated with a
graphical model which cannot be considered general. Understanding the
restriction of the determinant to such a special linear space would yield a
better understanding of the maximum likelihood degrees of the submodels of
the graphical model. Moreover, the linear space of diagonal matrices (corre-
sponding to the graphical model with no edges) and the maximum likelihood
degrees of its linear subspaces have connections to matroid theory via the
permutohedral variety [DMS21, HK12] which would be interesting to explore
in more generality.

Acknowledgments. The author was supported by the VR grant [NT:2018-03688].
The author would like to thank (in no particular order) Kathlén
Kohn, Sandra Di Rocco, Luca Sodomaco, Luca Schaffler and Orlando Marigliano
for helpful discussions and feedback.

2. Notation
Throughout this article, L and W denote finite-dimensional C-vector
spaces. Let F be a homogeneous rational function of nonzero degree, i.e.,
a quotient of homogeneous polynomials of different degrees. The letter X
refers to an arbitrary closed irreducible projective variety X ⊂ P(L) which is
not contained in the divisor associated with F in PL. Let U (F ) ⊂ X denote
the open set that is the complement of the associated divisor of F . The
projective general linear group of L is PGL(L). We denote the associated
affine cone over X ⊂ PL by CX ⊂ L. With this notation we may define
\[ C_X^\circ := (C_X)_{\mathrm{reg}} \setminus \operatorname{div}(F) \]
where reg denotes the regular/smooth locus of a variety and div(F ) the
divisor associated with a rational function F .
The reader is assumed to be familiar with the notion of dual variety X ∨
and bi-duality for complex projective varieties covered in [GKZ94, Chap-
ter 1]. We will denote the conormal variety of X by

W (X) := {(x, H) | x ∈ Xreg and H is tangent to X at x } ⊂ PL × PL∗


where L∗ denotes the dual vector space whose elements are linear forms and
the rational equivalence class of W (X) is
\[ [W(X)] = \sum_{i=0}^{\dim \mathbb{P}L - 1} \delta_i(X)\, H^{\dim \mathbb{P}L - i} (H^\vee)^{i+1} \in A^*(\mathbb{P}L \times \mathbb{P}L^*). \]

We denote the span of two linear spaces A, B ⊂ PL by A + B and for a
vector v ∈ L we use [v] ∈ PL to denote the projective equivalence class. We
refer to the gradient of a homogeneous rational function F as ∇F : L ⇢ L∗
and do not distinguish it from the map
∇F : PL ⇢ PL∗
[v] ↦ [∇v F ].
For p ∈ PL the hyperplane that corresponds to ∇p F is denoted by (∇p F )∨ .
For a rational map φ : X ⇢ Y we denote its graph by
Γφ = {(x, y) : φ(x) = y} ⊂ X × Y

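and its closure by $\overline{\Gamma_\varphi}$. We denote the multidegrees of the gradient map of
F by µi (F ), i.e.,
\[ [\overline{\Gamma_{\nabla F}}] = \sum_{j=0}^{\dim \mathbb{P}L} \mu_j(F)\, H^{j} (H^\vee)^{\dim \mathbb{P}L - j} \in A^*(\mathbb{P}L \times \mathbb{P}L^*). \]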

We denote the Grassmannian of k-dimensional subspaces of PL by Gr(k, PL)
and Tx X denotes the embedded projective tangent space of X at x. The
Gauss map is defined as
γX : X ⇢ Gr(dim(X), PL)
x ↦ Tx X.
We identify the space Sm of symmetric m × m matrices with its dual space
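Sm∗ via the trace pairing,
\[ S \in S_m \;\sim\; \big(K \mapsto \operatorname{trace}(SK)\big) \in S_m^*. \]
We refer to elements of Sm∗ by S and they correspond to matrices with
entries {sij }1≤i≤j≤m . Elements in Sm are referred to as K with entries
{κij }1≤i≤j≤m . With this identification the gradient of a function F : Sm ⇢
C is given by
\[ (\nabla_K F)_{ij} = \frac{1}{2 - \delta_{ij}} \frac{\partial F}{\partial \kappa_{ij}} \]
where δij is the Kronecker delta. For example on S2 we may consider
\[ F(K) = F\begin{pmatrix} \kappa_{11} & \kappa_{12} \\ \kappa_{12} & \kappa_{22} \end{pmatrix} = \kappa_{11}\kappa_{22} - \kappa_{12}^2 - \kappa_{11}^2 \]
and its gradient is
\[ \nabla_K F = \begin{pmatrix} \kappa_{22} - 2\kappa_{11} & -\kappa_{12} \\ -\kappa_{12} & \kappa_{11} \end{pmatrix}. \]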
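The factor 1/(2 − δij) in the gradient convention can be sanity-checked numerically. The following is a minimal sketch, assuming symmetric 2 × 2 matrices are stored as triples (m11, m12, m22); the helper names are illustrative and not taken from the paper. It compares the directional derivative of the example F with the trace pairing of the stated gradient against the direction.

```python
from fractions import Fraction

def F(k11, k12, k22):
    # Example function on S2: F(K) = k11*k22 - k12^2 - k11^2
    return k11 * k22 - k12**2 - k11**2

def grad_F(k11, k12, k22):
    # Entries (g11, g12, g22) from (grad F)_ij = 1/(2 - delta_ij) * dF/dk_ij
    return (k22 - 2 * k11, -k12, k11)

def trace_pairing(a, b):
    # trace(AB) for symmetric 2x2 matrices stored as (m11, m12, m22)
    return a[0] * b[0] + 2 * a[1] * b[1] + a[2] * b[2]

def directional_derivative(K, H):
    # F is quadratic, so the central difference with step 1 is exact
    plus = F(*(k + h for k, h in zip(K, H)))
    minus = F(*(k - h for k, h in zip(K, H)))
    return Fraction(plus - minus, 2)

K = (3, -1, 5)
H = (1, 2, 3)
print(directional_derivative(K, H) == trace_pairing(grad_F(*K), H))
```

Without the factor 1/2 on the off-diagonal entry, the trace pairing would count the κ12 contribution twice and the two sides would disagree.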

3. A product formula for Gaussian maximum likelihood degree


The aim of this section is to introduce the F -adjoined Gauss map and
give a self-contained proof of Theorem 1.1, which is needed in Section 4
to classify the curves with rational MLE. The theory of this section is built
from ideas presented in [DRGS22, Section 2].
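Definition 3.1. The F-adjoined Gauss map associated with a variety X ⊂
PL and homogeneous rational function F : L ⇢ C of degree d ≠ 0 is the
map
\[ \gamma_{X,F} : X \dashrightarrow \operatorname{Gr}(\dim(X) - 1, \mathbb{P}L), \qquad p \mapsto (T_p X) \cap (\nabla_p F)^\vee. \]
Dually we can define the map
\[ \gamma^\vee_{X,F} : X \dashrightarrow \operatorname{Gr}(\operatorname{codim}_{\mathbb{P}L}(X), \mathbb{P}L^*), \qquad p \mapsto (T_p X)^\vee + \nabla_p F. \]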

We also define the projective maximum likelihood correspondence as
\[ X_F = \{(x, u) : \exists L \text{ s.t. } (x, L) \in \Gamma_{\gamma^\vee_{X,F}} \text{ and } [u] \in L\} \subset X \times L^*. \]

(a) Illustration of γX for a curve. (b) Illustration of γX,F for a curve.

Figure 1. Figure (A) illustrates a collection of tangent lines
to a curve, i.e., some selected values of γX . Figure (B) illustrates
a collection of points, each lying on a unique tangent
line, that correspond to some selected values of γX,F . Images
created with the Desmos Graphing Calculator, used with permission
from Desmos Studio PBC [DS23].

Example 3.2. Consider the curve X = V (g) ⊂ P(C3 ) for $g = x_0 x_1 - x_2^2 - x_0^2$
and let $F = x_0 x_1 - x_2^2$. Identify C3 with its dual vector space via the bilinear
form ⟨x, y⟩ = x0 y0 + x1 y1 + x2 y2 . Then
\[ \nabla_x F = (x_1, x_0, -2x_2), \qquad \nabla_x g = (x_1 - 2x_0, x_0, -2x_2) \]
and
\[ T_x X = \gamma_X(x) = (\nabla_x g)^\vee = \{a \in \mathbb{P}^2 \mid (x_1 - 2x_0)a_0 + x_0 a_1 - 2x_2 a_2 = 0\}. \]
This means that the image of x under the F -adjoined Gauss map is the
linear space of dimension 0 = dim(X) − 1 obtained by intersecting Tx X
with (∇x F )∨ :
\[
\begin{aligned}
\gamma_{X,F}(x) &= T_x X \cap (\nabla_x F)^\vee \\
&= \{a \in \mathbb{P}^2 \mid (x_1 - 2x_0)a_0 + x_0 a_1 - 2x_2 a_2 = x_1 a_0 + x_0 a_1 - 2x_2 a_2 = 0\} \\
&= \{(0 : 2x_2 : x_0)\}.
\end{aligned}
\]
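The intersection point in Example 3.2 can be recomputed as the cross product of the two gradient vectors, since a point of P2 lying on two distinct lines is the cross product of their coefficient vectors. The sketch below is an illustrative check with hypothetical helper names; it uses exact rational arithmetic and points of X obtained by solving g = 0 for x1.

```python
from fractions import Fraction

def cross(a, b):
    # Cross product in C^3: the common zero of the two linear forms a and b
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_on_X(x0, x2):
    # Solve g = x0*x1 - x2^2 - x0^2 = 0 for x1 (requires x0 != 0)
    x0, x2 = Fraction(x0), Fraction(x2)
    return (x0, (x2**2 + x0**2) / x0, x2)

def adjoined_gauss(x):
    # Intersection of the tangent line (grad g)^vee with the line (grad F)^vee
    x0, x1, x2 = x
    grad_g = (x1 - 2 * x0, x0, -2 * x2)
    grad_F = (x1, x0, -2 * x2)
    return cross(grad_g, grad_F)

def proportional(a, b):
    # Two nonzero vectors represent the same projective point iff a x b = 0
    return cross(a, b) == (0, 0, 0)

x = point_on_X(2, 3)
print(proportional(adjoined_gauss(x), (0, 2 * x[2], x[0])))
```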
Definition 3.3. Fix a homogeneous rational function F of degree d ≠ 0 on
L. For u ∈ L∗ we define the log-likelihood with respect to F
ℓF,u (p) := log(F (p)) − u(p).
Remark 3.4. Technically the codomain of ℓF,u is C/(2πiZ) but for the
purpose of taking derivatives this is not a problem as we can identify its
differential with a map
\[ d(\ell_{F,u}) : L \dashrightarrow L^*, \qquad p \mapsto \frac{d_p F}{F(p)} - u. \]
Definition 3.5. Let X ⊆ P(L) be a projective variety. We define the affine
maximum likelihood correspondence with respect to F as
ξF = {(x, u) : x is a smooth critical point of ℓF,u |CX◦ } ⊂ L × L∗ .
The affine maximum likelihood correspondence comes equipped with two
projections denoted by πL and πL∗ respectively.
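Remark 3.6. Fix a basis for L (and its dual basis on L∗ ), generators hi
for I(X) and let F be the quotient of two relatively prime homogeneous
polynomials F = f /g. We may use the coordinate functions xi on L and ui
on L∗ to describe the ideals I(XF ), I(ξF ) ⊂ C[x, u] respectively
\[
I(X_F) = \left( I(X) + \left\langle (c+2)\text{-minors of } \begin{pmatrix} u \\ g\nabla_x f - f\nabla_x g \\ J_X(x) \end{pmatrix} \right\rangle \right) : \left( \sqrt{I(\gamma_{X,F})} \right)^{\infty}
\]
\[
I(\xi_F) = \left( I(X) + \left\langle (c+1)\text{-minors of } \begin{pmatrix} fg\left( \frac{\nabla_x F}{F(x)} - u \right) \\ J_X(x) \end{pmatrix} \right\rangle \right) : \left( \sqrt{I(X_{\mathrm{sing}}) \cdot I(fg)} \right)^{\infty}
\]
where c := codimPL (X), the rows of JX are the gradients of hi and
\[
I(\gamma_{X,F}) = \left\langle (c+1)\text{-minors of } \begin{pmatrix} g\nabla_x f - f\nabla_x g \\ J_X(x) \end{pmatrix} \right\rangle, \qquad
I(X_{\mathrm{sing}}) = \langle c\text{-minors of } J_X(x) \rangle + I(X).
\]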
Lemma 3.7. The projection πL : ξF → CX◦ is a vector bundle of rank
codimPL (X). In particular, if X is irreducible then ξF is irreducible and
of dimension dim(L), and thus so is its closure $\overline{\xi_F}$.
Proof. The fiber over a point x is explicitly given by
\[ \left( x, \frac{\nabla_x F}{F(x)} \right) + \{0\} \times N_x^* C_X, \]
where Nx∗ CX is the conormal space of CX at x, i.e., the row space of the
Jacobian matrix JX (x). The vector bundle trivializes over each open set
determined by a c × c minor of JX (x) being non-zero. The curious reader
can verify that the vector space structure on each fiber is given by
(x, u) + (x, v) = (x, u + v − ∇x F/F (x)), t · (x, u) = (x, tu + (1 − t)∇x F/F (x)).
Lastly, a vector bundle over an irreducible variety is irreducible and the
dimension of a vector bundle is given by the sum of the dimensions of the
base and the fibers. □
Lemma 3.8. If X is (uni-)rational then so is ξF .
Proof. A vector bundle of a (uni-)rational variety is (uni-)rational and we
have shown the vector bundle structure of ξF in Lemma 3.7. Assume without
loss of generality that X = V (g1 , . . . , gk ) is of codimension c and that the
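defining equations gi are ordered such that $N_x^* C_X = \operatorname{Span}(\{\nabla_x g_i\}_{i=1}^{c})$ on an
open subset U ⊂ X. This is an explicit trivialization of the conormal bundle
on U . Suppose for some positive integer n that we are given a dominant
map Ψ : Cn ⇢ CX◦ , then
\[
\tilde{\Psi} : \mathbb{C}^n \times \mathbb{C}^c \dashrightarrow L \times L^*, \qquad
(t, \lambda) \mapsto \left( \Psi(t),\; \frac{\nabla_{\Psi(t)} F}{F(\Psi(t))} + \sum_{i=1}^{c} \lambda_i \nabla_{\Psi(t)} g_i \right)
\]
is a dominant rational map onto ξF .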

If the map πL∗ : ξF → L∗ is dominant it is generically finite because the
domain and codomain have the same dimension by Lemma 3.7.
Definition 3.9. The Gaussian ML degree of X ⊂ PL with respect to F ,
denoted by MLDF (X), is the degree of πL∗ : ξF → L∗ if it is dominant and
0 otherwise.
We interpret MLDF (X) as the number of critical points of ℓF,u |CX◦ for
general u.
Remark 3.10. The maximum likelihood degree is a topological invariant
of CX◦ . It is shown in [DRGS22, Theorem 1.3] that for two homogeneous
polynomials F, G of nonzero degree
U (F ) = U (G) =⇒ MLDF (X) = MLDG (X).
Moreover, the proof of this fact doesn’t rely on the fact that F, G are polyno-
mials, only that they are homogeneous rational functions of nonzero degrees.
Lemma 3.11. For generic u ∈ L∗ , the critical points contributing to MLDF (X)
are reduced.
Proof. If πL∗ is not dominant then MLDF (X) = 0. Otherwise, by inter-
secting the dense open set where πL∗ has finite fibers (see [Mil09, Theo-
rem 10.12]) and the dense open set where it is a smooth morphism (see
[Har77, Corollary 10.7]) we have a dense open set where we have finitely
many smooth points of ξF in the fiber. □
There is a closed subset of L∗ where the fibers of πL∗ are not zero-
dimensional and reduced. One might refer to this set as the ‘Gaussian
ML-discriminant’ in spirit of the Euclidean distance discriminant [DHO+ 16].
Lemma 3.12. The critical points of ℓF,u |CX◦ satisfy
u(x) = deg(F ).
Proof. Observe that
\[ (x, u) \in \xi_F \iff \frac{\nabla_x F}{F(x)} - u \in N_x^* C_X. \]
The vector space Nx∗ CX is characterized as the linear forms that vanish
on the embedded tangent space Tx CX . Since CX is a cone we have that
x ∈ Tx CX and thus
\[ (x, u) \in \xi_F \implies \frac{\nabla_x F}{F(x)}(x) - u(x) = 0 \implies \deg(F) - u(x) = 0 \]
where the last implication is due to Euler’s homogeneous function theorem.
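As a small illustration of Lemma 3.12, consider the simplest case in which X is all of PS2 and F = det, an assumption made here only for the sake of the example. The conormal space is then zero, so the critical equation reads ∇K F/F (K) = S, that is, K = S⁻¹, and the lemma predicts trace(SK) = deg(det) = 2. The sketch below checks this with exact arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def inverse_2x2(s11, s12, s22):
    # Inverse of a symmetric 2x2 matrix stored as (m11, m12, m22)
    det = Fraction(s11 * s22 - s12**2)
    return (s22 / det, -s12 / det, s11 / det)

def trace_pairing(a, b):
    # trace(AB) for symmetric 2x2 matrices stored as (m11, m12, m22)
    return a[0] * b[0] + 2 * a[1] * b[1] + a[2] * b[2]

S = (2, 1, 3)
K = inverse_2x2(*S)  # unconstrained critical point of log det(K) - trace(SK)
print(trace_pairing(S, K) == 2)
```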

Proposition 3.13. The maximum likelihood degree of X can be computed
as the degree of the projection to L∗ from the projective maximum likelihood
correspondence.
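\[ \mathrm{MLD}_F(X) = \deg(\pi_{L^*} : X_F \dashrightarrow L^*) \]
Proof. We show that there is a commutative diagram of projections onto L∗
\[
\begin{array}{ccc}
\xi_F & \xrightarrow{\;\alpha\;} & X_F \\
 & {\scriptstyle \pi_{L^*}} \searrow \quad \swarrow {\scriptstyle \tilde{\pi}_{L^*}} & \\
 & L^* &
\end{array}
\]
where α is birational and thus by definition
\[ \mathrm{MLD}_F(X) = \deg(\pi_{L^*}) = \deg(\tilde{\pi}_{L^*} \circ \alpha) = \deg(\tilde{\pi}_{L^*}) \cdot \deg(\alpha) = \deg(\tilde{\pi}_{L^*}). \]
To prove this we define
\[ \alpha(x, u) = ([x], u), \qquad \beta([x], u) = \left( \frac{\deg(F)\, x}{u(x)},\; u \right). \]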
The image of α is contained in XF because for x ∈ CX◦ we have that
$L = \gamma^\vee_{X,F}([x])$ ensures that $(x, L) \in \Gamma_{\gamma^\vee_{X,F}}$ and [u] ∈ L. The domain and
codomain of α are equidimensional irreducible varieties. To complete the
proof it is therefore sufficient to show that β ◦ α = IdξF . This is equivalent
to the fact that all critical points of ℓF,u |CX◦ satisfy u(x) = deg(F ). This is
shown in Lemma 3.12. This concludes the proof. □
Definition 3.14. Let Y ⊂ Gr(k, Pn ) be a closed subvariety and p ∈ Pn a
generic point. We define the order of Y to be the cardinality
order(Y ) = #(Y ∩ Σk (p)) = #{L ∈ Y | p ∈ L}.
Here Σk (p) is a Schubert variety. See [EH16] for more details.
Example 3.15. An example of a variety Y ⊂ Gr(k, PL) of order 1 can be
constructed as the set of k planes that contain a fixed k − 1 plane PW ⊂ PL.
We denote this variety by P(L/W). Indeed, for any point p ̸∈ PW there is
a unique L = p + PW ∈ Y that is uniquely determined by the element of
the projective space P(L/W) that p represents.
Proof of Theorem 1.1. We compute the maximum likelihood degree via
Proposition 3.13 as the degree of πL∗ : XF ⇢ L∗ . By definition there are
$\operatorname{order}(\operatorname{Im}(\gamma^\vee_{X,F}))$ many choices of L that contain a generic point [u]. Moreover,
again by definition, for each such L there exist either 0 or $\deg(\gamma^\vee_{X,F})$
many choices of x such that $(x, L) \in \Gamma_{\gamma^\vee_{X,F}}$, depending on whether $\gamma^\vee_{X,F}$ is
generically finite or not. This concludes the proof. □
Lemma 3.16. Let F = α ∈ L∗ be a linear polynomial and assume that the
dual variety X ∨ is a hypersurface, then
MLDF (X) ≤ deg(X ∨ ).
If α is generic this is an equality.
Proof. This lemma is a direct consequence of Theorem 1.3 but we give a
direct proof. For a generic u ∈ L∗ , Proposition 3.13 and the fact ∇F = α ∉
X ∨ enable us to compute MLDF (X) as the number of points in
{x ∈ Xreg : (Tx X)∨ ∩ (α + u) ̸= ∅}
where α + u denotes the projective line spanned by the two points. Now we
use bi-duality [GKZ94, Theorem 1.1] to see that a generic point H ∈ X ∨
is only tangent to a unique point x ∈ Xreg and the linear spaces (Tx X)∨
rule a dense subset of X ∨ . If α is generic then the line α + u is generic and
we can avoid the boundary points. So to compute the maximum likelihood
degree we can equivalently count (or bound from above if α is not generic)
the MLDF (X) through the number of points in
X ∨ ∩ (α + u),
which is the desired quantity by definition. □
Remark 3.17. To finish this section we note that the maximum likelihood
degree under appropriate assumptions is the top Segre class of the projec-
tive maximum likelihood correspondence XF , seen as a cone (e.g. a vector
bundle) over X, in the sense of [Ful98, Chapter 4]. This is the idea that
motivated Theorem 1.3, although it is not part of the proof. Under some
genericity assumptions XF fits in the middle of an exact sequence because
P(XF ) both contains and is ‘spanned by’ the conormal variety of X and the
graph of ∇F |X : X ⇢ PL∗ .

4. Curves of Gaussian Maximum likelihood degree 1


This section is dedicated to classifying complex projective curves and
homogeneous polynomials F admitting MLDF (X) = 1. The main idea
of the classification is that for a curve with rational MLE Theorem 1.1
guarantees that the image of γX,F is a line.
Lemma 4.1. Let Y ⊂ PL be the variety that is ruled by Im(γX,F ) and SX ,
SY be the linear span of X and Y respectively, then
dim SX ≤ dim SY + 1.
Proof. If X ⊂ SY the statement is immediate. Assume this is not the
case and thus a generic point of X is outside of SY . We start by studying
Z := join(X, SY ), this is the closure of the union of all lines that meet X
and SY in distinct points [Har92, Lecture 8]. For a generic p ∈ X
Tp X = p + γX,F (p)
where + denotes the linear span. By construction γX,F (p) ⊂ SY . We use
the Terracini Lemma [Ådl87, Lemma 1.11] to conclude that for a generic
point p ∈ X
dim Z = dim(Tp X + SY ) = dim(γX,F (p) + p + SY )
= dim(p + SY ) = dim(SY ) + 1.
Moreover, taking the secant of Z and applying the Terracini Lemma we have
that the secant variety, denoted by Z 2 , has dimension equal to
dim Z 2 = dim(Tp X + Tq X + SY ) = dim(γX,F (p) + p + γX,F (q) + q + SY )
= dim(p + q + SY ) ≤ dim SY + 2 = dim(Z) + 1.
and thus by [Ådl87, Proposition 1.4] we have that Z is a linear space that
contains X, thus the linear span of X must have smaller dimension than
Z. □
Corollary 4.2. Let X ⊊ PL with linear span SX = PL and γX,F (x) ⊂ PW
for some linear subspace W ⊊ L. Then PW is a hyperplane.
Proof. Let Y denote the variety ruled by Im(γX,F ). By assumption we have
that the linear span SY of Y satisfies SY ⊂ PW. By Lemma 4.1 we have
that
dim PL = dim SX ≤ dim(SY ) + 1 ≤ dim(PW) + 1 ≤ dim PL
and thus dim(PW) = dim PL − 1. □
The next proposition reveals the relation between polynomials F and
hypersurfaces X with rational MLE under the assumption that $\operatorname{Im}(\gamma^\vee_{X,F})$ is
the collection of lines passing through some fixed point [α] ∈ PL∗ . This will
include all plane curves but also varieties of higher dimension.
Proposition 4.3. Suppose that X ⊂ PL is a non-linear hypersurface,
MLDF (X) = 1 and that $\operatorname{Im}(\gamma^\vee_{X,F})$ is the family of lines passing through
a point [α], for some α ∈ L∗ . Then MLDα (X) = 1 and
\[ \left. \frac{F}{\alpha^{\deg(F)}} \right|_X = \frac{\Phi_{X,L,F}}{(\deg(F)\, \Phi_{X,L,\alpha})^{\deg(F)}} = \text{constant}. \]
Here Φ denotes the corresponding solution to the homaloidal PDE and α can
be chosen such that the constant is 1, in which case F − αdeg(F ) ∈ I(X).
Proof. Note that $\gamma^\vee_{X,F}(x) \ni [\alpha] \iff \gamma_{X,F}(x) \subset [\alpha]^\vee := H$. Since X is a
hypersurface and α = ∇α, because it is linear, the α-adjoined Gauss map is
given by
γX,α = Tp X ∩ H.
By construction γX,F ⊂ γX,α and they are equidimensional. Hence
γX,F = γX,α .
This means that 1 = MLDF (X) = MLDα (X) by Theorem 1.1. We now use
[AGK+ 23, Theorem 3.5]. When MLDF (X) = 1 there exists an inverse to
the F -adjoined Gauss map and it determines the MLE up to scaling. This
means that there exists a rational function λ(u) such that
MLEX,L,F (u) = λ(u) · MLEX,L,α (u) ⇐⇒
−∇u log ΦX,L,F = −λ(u)∇u log ΦX,L,α
We may pair both sides of this equation with u and use Euler's homogeneous
function theorem u(∇u log Φ) = deg(Φ) (see [AGK+ 23, Theorem 3.1] for details)
to deduce that
deg(F ) = λ(u) · 1
because the solutions to the homaloidal PDE with respect to a polynomial
F have degree equal to − deg(F ). From this, we deduce that
\[ -\nabla_u \log \frac{\Phi_{X,L,F}}{\Phi_{X,L,\alpha}^{\deg(F)}} = 0 \]
which implies that $\Phi_{X,L,F} / (\Phi_{X,L,\alpha})^{\deg(F)}$ is a constant function. Lastly, considering
the rational function $F/\alpha^{\deg(F)}$ we can evaluate this at a generic point p
in the affine cone over X, written in the form
p = −∇u log ΦX,L,F = − deg(F )∇u log ΦX,L,α .
By applying the homaloidal PDE to the numerator and denominator we get
\[ \frac{F(p)}{\alpha(p)^{\deg(F)}} = \frac{\Phi_{X,L,F}}{(\deg(F)\, \Phi_{X,L,\alpha})^{\deg(F)}} \]
which is constant. □
The following theorem is essential for the classification of curves with
rational MLE.
Theorem 4.4. [AGK+ 23, Theorem 6.2] Let L be a finite-dimensional C-
vector space and α ∈ L∗ . The solutions to the homaloidal PDE with respect
to F = α
Φ = α(−∇ log Φ)
are in bijection with the varieties X ⊂ PL such that the dual variety is
a hypersurface X ∨ = V (g) ⊂ PL∗ and α represents a point [α] ∈ X ∨ of
multiplicity deg(g) − 1. The solution corresponding to X is given by
ΦX,L,α = α(∇ log g).
Remark 4.5. A convenient way of determining if ℓ is a point of multiplicity
deg(g) − 1 on V (g) is to verify that
\[ \frac{d^2}{dt^2}\, g(u + t\ell) = 0. \]
This is an important part of the proof of the above theorem. For example
the cuspidal cubic $g(u_0, u_1, u_2) = 4u_0^3 - 54u_0 u_1^2 + 27u_1^2 u_2$ is linear in u2 ,
meaning that (0 : 0 : 1) is a point of multiplicity 3 − 1 = 2. In general we can
always perform a linear change of coordinates such that ℓ = (0 : . . . : 0 : 1)
and then we need only check that g is linear in the last variable.
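The criterion of Remark 4.5 can be spot-checked without symbolic algebra. For a polynomial of degree at most 3 in t, the second finite differences at t = 0 and t = 1 both vanish exactly when the polynomial is linear in t. The sketch below applies this to the cuspidal cubic at a sample point u; it is a numerical check at chosen points under that assumption, not a proof, and the function names are hypothetical.

```python
def g(u0, u1, u2):
    # Cuspidal cubic from Remark 4.5
    return 4 * u0**3 - 54 * u0 * u1**2 + 27 * u1**2 * u2

def second_differences(u, ell):
    # Second finite differences of t -> g(u + t*ell) at t = 0 and t = 1.
    # For a cubic in t, both vanish iff the t^2 and t^3 coefficients are zero.
    vals = [g(*(a + t * b for a, b in zip(u, ell))) for t in range(4)]
    return [vals[t + 2] - 2 * vals[t + 1] + vals[t] for t in range(2)]

u = (1, 2, 3)
print(second_differences(u, (0, 0, 1)))  # direction of the cusp (0 : 0 : 1)
print(second_differences(u, (1, 0, 0)))  # a direction in which g is not linear
```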
Proof of Theorem 1.2. The implication right-to-left is an immediate conse-
quence of Remark 3.10 because U (F ) = U (α) combined with Theorem 4.4
implies MLDF (X) = MLDα (X) = 1. For the other direction we note that
\[ [u] \in \gamma^\vee_{X,F}(x) \iff \gamma_{X,F}(x) \in [u]^\vee, \]
thus the order of $\operatorname{Im}(\gamma^\vee_{X,F})$ is the degree of the curve Im(γX,F ) ⊂ Gr(0, PL) =
PL which is 1, i.e., it is a line. If X ⊂ PL = P2 then Im(γX,F ) = [α]∨ for
some α ∈ L∗ and the statement follows from applying Proposition 4.3 and
Theorem 4.4. If the ambient projective space has higher dimension we know
from Lemma 4.1 that X has a 2-dimensional projective linear span PW such
that X ⊂ PW ⊊ PL. We may apply Proposition 4.3 and Theorem 4.4 to
X ⊂ PW and F |W because MLDF (X) = MLDF |W (X) by Remark 3.10.
This provides us with α̃ ∈ W ∗ and a solution to the homaloidal PDE on W ∗ ,
\[ \Phi_{X,W,F|_W} = (\deg(F) \cdot \Phi_{X,W,\tilde{\alpha}})^{\deg(F)}. \]
We now extend the solution ΦX,W,F |W to a solution ΦX,L,F . Consider the
two different embeddings X ⊂ PW ⊂ PL and their corresponding dual
varieties: $X_W^\vee = V(\tilde{g}) \subset \mathbb{P}W^*$ and $X_L^\vee = V(g) \subset \mathbb{P}L^*$. Let π : L∗ → W ∗
denote the dual map of the embedding ι : W → L. Notice that we
may choose g = g̃ ◦ π and α such that α̃ = α ◦ ι = α|W . We see that
$(F - \alpha^{\deg(F)})|_X = (F|_W - \tilde{\alpha}^{\deg(F)})|_X = 0$. By an appropriate choice of
basis of L and applying Remark 4.5 we deduce that [α] is a point on $X_L^\vee$ of
multiplicity $\deg(X_L^\vee) - 1$ because [α̃] is such a point on $X_W^\vee$. This concludes the proof.

The curious reader can verify that ΦX,L,F has the desired form by [AGK+ 23,
Proposition 2.8],
ΦX,L,F = ΦX,W,F |W ◦ π.
By the chain rule and Theorem 4.4
α(∇u log g) =α(∇u log(g̃ ◦ π)) = α((∇π(u) log g̃) ◦ π) = α(ι(∇π(u) log g̃))
=α̃(∇π(u) log g̃) = (ΦX,W,F |W ◦ π)(u).
Thus ΦX,L,F (u) is obtained by taking a power of α(∇u log g). □


Corollary 4.6. Let X ⊂ PS2 be a curve of degree > 1. Then
\[ \mathrm{MLD}_{\det}(X) = 1 \iff X = V(\det(K) - \operatorname{trace}(AK)^2) \]
where $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{pmatrix}$ is any symmetric matrix of rank 1. The corresponding
solution to the homaloidal PDE is given by
\[ \Phi_{X,S_2,\det}(S) = 4 \left( \frac{a_{22}s_{11} + a_{11}s_{22} - 2a_{12}s_{12}}{(s_{11}s_{22} - s_{12}^2) + (a_{22}s_{11} + a_{11}s_{22} - 2a_{12}s_{12})^2} \right)^{2} \]
Proof. We first argue that a nonlinear curve with rational MLE is of the
proposed form. Theorem 1.2 ensures the existence of a linear form α(K) =
trace(AK) such that Q = V (det(K) − trace(AK)2 ) contains X as a compo-
nent. If Q is not irreducible the components are linear but the curve X is
not, therefore Q must be irreducible and X = Q. Notice that
X ∩ V (trace(AK)) = V (det(K), trace(AK)).
By Theorem 1.2 mult[trace(AK)] X ∨ = 1, which means that it corresponds to
a tangent line of X, therefore it is also tangent to V (det), i.e., rank(A) = 1.
All curves of this form have rational MLE since they satisfy the conditions of
Theorem 1.2. We also provide the corresponding solution to the homaloidal
PDE by computing the dual quadric of X. The dual quadric is in the
denominator of ΦX,S2 ,det ,
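\[ Q^\vee(S) = \det(S) + \operatorname{trace}(\operatorname{adj}(A)S)^2, \]
where $\operatorname{adj}(A) = \begin{pmatrix} a_{22} & -a_{12} \\ -a_{12} & a_{11} \end{pmatrix}$. To see this it is enough to show that the
gradients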
∇K Q = adj(K) − 2trace(AK)A
∇S Q∨ = adj(S) + 2trace(adj(A)S)adj(A)
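are mutually inverse (up to scaling). The map A ↦ adj(A) is linear and
satisfies
\[
\begin{aligned}
\operatorname{trace}(\operatorname{adj}(A)S) &= \operatorname{trace}(A \operatorname{adj}(S)), \\
\operatorname{adj}(\operatorname{adj}(A)) &= A, \\
\operatorname{rank}(A) = 1 &\implies \operatorname{trace}(\operatorname{adj}(A)A) = 0.
\end{aligned}
\]
We then compute
\[
\begin{aligned}
(\nabla Q^\vee \circ \nabla Q)(K) &= \operatorname{adj}\big(\operatorname{adj}(K) - 2\operatorname{trace}(AK)A\big) \\
&\quad + 2\operatorname{trace}\big(\operatorname{adj}(A)[\operatorname{adj}(K) - 2\operatorname{trace}(AK)A]\big)\operatorname{adj}(A) \\
&= K - 2\operatorname{trace}(AK)\operatorname{adj}(A) + 2\operatorname{trace}(\operatorname{adj}(A)\operatorname{adj}(K))\operatorname{adj}(A) \\
&= K.
\end{aligned}
\]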

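Theorem 4.4 provides us with the solution
\[
\begin{aligned}
\Phi_{X,S_2,\det}(S) &= 4 \left( \operatorname{trace}\big(A \nabla_S \log Q^\vee\big) \right)^2 \\
&= 4 \left( \frac{\operatorname{trace}\big(A[\operatorname{adj}(S) + 2\operatorname{trace}(\operatorname{adj}(A)S)\operatorname{adj}(A)]\big)}{\det(S) + \operatorname{trace}(\operatorname{adj}(A)S)^2} \right)^{2} \\
&= 4 \left( \frac{\operatorname{trace}(\operatorname{adj}(A)S)}{\det(S) + \operatorname{trace}(\operatorname{adj}(A)S)^2} \right)^{2}
\end{aligned}
\]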
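The claim that the two gradient maps in the proof are mutually inverse can be tested on concrete matrices. The sketch below stores symmetric 2 × 2 matrices as triples (m11, m12, m22) and implements ∇Q and ∇Q∨ exactly as written in the proof; the names are illustrative. It also shows that the identity fails for a sample A of rank 2, which is where the hypothesis trace(adj(A)A) = 0 enters.

```python
def adj(m):
    # Adjugate of a symmetric 2x2 matrix stored as (m11, m12, m22)
    m11, m12, m22 = m
    return (m22, -m12, m11)

def trace_pairing(a, b):
    # trace(AB) for symmetric 2x2 matrices
    return a[0] * b[0] + 2 * a[1] * b[1] + a[2] * b[2]

def grad_Q(K, A):
    # grad_K Q = adj(K) - 2*trace(AK)*A for Q = det(K) - trace(AK)^2
    t = trace_pairing(A, K)
    return tuple(x - 2 * t * a for x, a in zip(adj(K), A))

def grad_Q_dual(S, A):
    # grad_S Q^vee = adj(S) + 2*trace(adj(A)S)*adj(A)
    aA = adj(A)
    t = trace_pairing(aA, S)
    return tuple(x + 2 * t * a for x, a in zip(adj(S), aA))

A_rank1 = (1, 2, 4)   # det = 1*4 - 2^2 = 0
A_rank2 = (1, 0, 1)   # identity matrix, used only as a counterexample
K = (3, 1, 5)
print(grad_Q_dual(grad_Q(K, A_rank1), A_rank1) == K)
print(grad_Q_dual(grad_Q(K, A_rank2), A_rank2) == K)
```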

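Example 4.7. Consider the curve $X = V(\kappa_{11}\kappa_{22} - \kappa_{12}^2 - \kappa_{11}^2) \subset \mathbb{P}S_2$. We
may apply Theorem 1.2 or Corollary 4.6 using the linear form
\[ \alpha(K) = \operatorname{trace}\left( \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \kappa_{11} & \kappa_{12} \\ \kappa_{12} & \kappa_{22} \end{pmatrix} \right) = \kappa_{11} \]
which corresponds to a tangent line of X, more specifically it is a smooth
(multiplicity 1) point of the dual variety:
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \in X^\vee = V(s_{11}s_{22} - s_{12}^2 + s_{22}^2). \]
Theorem 1.2 and Corollary 4.6 provide the corresponding solution to the
homaloidal PDE
\[ \Phi_{X,S_2,\det}\begin{pmatrix} s_{11} & s_{12} \\ s_{12} & s_{22} \end{pmatrix} = 4 \left( \frac{s_{22}}{s_{11}s_{22} - s_{12}^2 + s_{22}^2} \right)^{2}. \]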
We now verify that this is a solution to the homaloidal PDE with respect to
the determinant and that X is its associated variety. Recall that the pairing
trace(AB) lets us identify the vector space of symmetric matrices with its
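dual vector space as long as we scale the off-diagonal entries by 1/2. We may
compute $(\nabla \Phi_{X,S_2,\det})_{ij} = \frac{1}{2 - \delta_{ij}} \frac{\partial \Phi_{X,S_2,\det}}{\partial s_{ij}}$ where δij is the Kronecker delta,
\[
\begin{aligned}
-\nabla_S \log(\Phi_{X,S_2,\det}) &= \frac{2}{s_{11}s_{22} - s_{12}^2 + s_{22}^2} \begin{pmatrix} s_{22} & -s_{12} \\ -s_{12} & s_{11} + 2s_{22} \end{pmatrix} - \frac{2}{s_{22}} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \\
&= \frac{2}{s_{22}(s_{11}s_{22} - s_{12}^2 + s_{22}^2)} \begin{pmatrix} s_{22}^2 & -s_{12}s_{22} \\ -s_{12}s_{22} & s_{12}^2 + s_{22}^2 \end{pmatrix} = \mathrm{MLE}(S).
\end{aligned}
\]
We verify that
\[ \det(-\nabla \log \Phi_{X,S_2,\det}) = 4\, \frac{s_{22}^2(s_{12}^2 + s_{22}^2) - s_{12}^2 s_{22}^2}{s_{22}^2 (s_{11}s_{22} - s_{12}^2 + s_{22}^2)^2} = 4 \left( \frac{s_{22}}{s_{11}s_{22} - s_{12}^2 + s_{22}^2} \right)^{2}. \]
The associated variety of this solution is the image of the MLE : S2 ⇢ S2 ,
\[ \mathrm{MLE} = -\nabla \log(\Phi_{X,S_2,\det}) \propto \begin{pmatrix} s_{22}^2 & -s_{12}s_{22} \\ -s_{12}s_{22} & s_{12}^2 + s_{22}^2 \end{pmatrix}. \]
From this we can deduce that the entries of the MLE map satisfy the equation
\[ \kappa_{11}\kappa_{22} - \kappa_{12}^2 - \kappa_{11}^2 = 0 \]
which defines the variety X.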
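The three properties used in Example 4.7 can be confirmed at sample data points with exact arithmetic: the determinant of MLE(S) equals Φ(S), the point MLE(S) lies on X, and trace(S · MLE(S)) = deg(det) = 2 as predicted by Lemma 3.12. The sketch below is an illustrative check with hypothetical function names, storing symmetric matrices as triples (m11, m12, m22).

```python
from fractions import Fraction

def denominator(s11, s12, s22):
    # Defining polynomial of the dual variety: s11*s22 - s12^2 + s22^2
    return s11 * s22 - s12**2 + s22**2

def phi(s11, s12, s22):
    # Solution of the homaloidal PDE from Example 4.7
    return 4 * (Fraction(s22) / denominator(s11, s12, s22)) ** 2

def mle(s11, s12, s22):
    # MLE(S) = -grad log Phi, returned as (k11, k12, k22)
    scale = Fraction(2) / (s22 * denominator(s11, s12, s22))
    return (scale * s22**2, -scale * s12 * s22, scale * (s12**2 + s22**2))

S = (2, 1, 3)
k11, k12, k22 = mle(*S)
print(k11 * k22 - k12**2 == phi(*S))                    # det(MLE(S)) = Phi(S)
print(k11 * k22 - k12**2 - k11**2 == 0)                 # MLE(S) lies on X
print(S[0] * k11 + 2 * S[1] * k12 + S[2] * k22 == 2)    # trace(S * MLE(S)) = 2
```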

Example 4.8. Let L = C3 and consider $X = V(y^2 z - (x + 2z)^3)$. Recall
that its dual variety
\[ X^\vee = V(4u^3 - 54uv^2 + 27v^2 w) \]
is a cuspidal curve. The polynomial $g = 4u^3 - 54uv^2 + 27v^2 w$ is linear in
w. By Remark 4.5 the point of multiplicity 2 (the cusp) is V (u, v) ∈ X ∨ .
This point V (u, v) corresponds to the line z = 0 that is tangent to X at the
inflection point V (x, z) = (0 : 1 : 0) ∈ X. By Theorem 4.4 we may choose
the linear form α(x, y, z) = 8z as a representative of this tangent line to
obtain a solution
\[ \Phi_{X,\mathbb{C}^3,8z} = 8 \frac{\partial}{\partial w} \log(4u^3 - 54uv^2 + 27v^2 w) = 8\, \frac{27v^2}{4u^3 - 54uv^2 + 27v^2 w} \]
to the homaloidal PDE with respect to F = 8z, i.e.,
\[ \Phi = -8 \frac{\partial}{\partial w} \log \Phi. \]
Now consider
\[ F = y^2 z - x^3 - 6x^2 z - 12xz^2 + 504z^3 \]
and note F − (8z)3 ∈ I(X). Theorem 1.2 then implies that MLDF (X) = 1
and
\[ \Phi_{X,\mathbb{C}^3,F} = 3^3\, \Phi_{X,L,8z}^3. \]
We also verify this, using F |X = (8z)3 in the second line,
\[
\begin{aligned}
F(-\nabla \log \Phi_{X,\mathbb{C}^3,F}) &= F(-\nabla \log 3^3 \Phi_{X,L,8z}^3) = 3^3 F(-\nabla \log \Phi_{X,L,8z}) \\
&= 3^3 \left( -8 \frac{\partial}{\partial w} \log \Phi_{X,L,8z} \right)^{3} = 3^3 \Phi_{X,L,8z}^3 = \Phi_{X,\mathbb{C}^3,F}.
\end{aligned}
\]
Example 4.9. Using Example 4.8 we construct a curve inside the space of
3 × 3 matrices L = S3 that has maximum likelihood degree 1. Let W be the
sub-vector space spanned by the matrices
\[ A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad C = \begin{pmatrix} -1 & 0 & 6 \\ 0 & -6 & 0 \\ 6 & 0 & 48 \end{pmatrix}. \]
Let F be the determinant and X be a cuspidal curve inside of W defined as
the matrices xA + yB + zC such that $y^2 z - (x + 2z)^3 = 0$, or equivalently
\[ X = V(4\kappa_{22} - 4\kappa_{13} + \kappa_{33},\; \kappa_{12},\; 48\kappa_{11} + \kappa_{33},\; 1728\kappa_{13}^3 - 432\kappa_{13}^2\kappa_{33} - 36\kappa_{23}^2\kappa_{33} + 36\kappa_{13}\kappa_{33}^2 - \kappa_{33}^3). \]
Notice that
\[ \det(xA + yB + zC) = F(x, y, z) = y^2 z - x^3 - 6x^2 z - 12xz^2 + 504z^3. \]
We observe that this is Example 4.8 in disguise. We use the explicit bijection
W ≅ C3 given by the matrices A, B, C. Applying [AGK+ 23, Proposition 2.8]
proves that X has maximum likelihood degree 1 and we obtain the
following factorization
\[ \Phi_{X,S_3,\det}(S) = \Phi_{X,\mathbb{C}^3,F} \circ \pi_{\mathbb{C}^3} \]

where π_{C³} is defined by restricting dual vectors to W. More explicitly:
\[
\begin{aligned}
\pi_{\mathbb{C}^3}(S) &= \operatorname{trace}(S(xA + yB + zC))\\
&= x\cdot\operatorname{trace}(SA) + y\cdot\operatorname{trace}(SB) + z\cdot\operatorname{trace}(SC)\\
&= (2s_{13} + s_{22})\,x + 2s_{23}\,y + (-s_{11} - 6s_{22} + 48s_{33} + 12s_{13})\,z.
\end{aligned}
\]
The linear forms x, y, z form the basis of W∗ that is dual to the basis A, B, C of W. So we now substitute tr(SA), tr(SB), tr(SC) for u, v, w in the known solution Φ_{X,C³,F} from Example 4.8 to obtain the solution
\[
\Phi_{X,\mathbb{S}^3,\det} = 2^9 3^3 \left(\frac{27(2s_{23})^2}{4(2s_{13}+s_{22})^3 - 54(2s_{13}+s_{22})(2s_{23})^2 + 27(2s_{23})^2(-s_{11} - 6s_{22} + 48s_{33} + 12s_{13})}\right)^{3}.
\]

This solution can be verified using the code provided in [AGK+23] at
https://mathrepo.mis.mpg.de/GaussianMLDeg1.
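Independently of the referenced repository, the two elementary identities used in Example 4.9 can be checked with integer arithmetic. This is a minimal sketch with hypothetical helper names; it only tests the determinant identity (with the cubic term 504z³) and the trace formula for π on sample inputs, not the PDE itself.

```python
# Basis matrices of W from Example 4.9.
A = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
B = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
C = [[-1, 0, 6], [0, -6, 0], [6, 0, 48]]

def combo(x, y, z):
    # The matrix xA + yB + zC.
    return [[x*A[i][j] + y*B[i][j] + z*C[i][j] for j in range(3)]
            for i in range(3)]

def det3(m):
    # Cofactor expansion of a 3x3 determinant along the first row.
    return (m[0][0]*(m[1][1]*m[2][2] - m[1][2]*m[2][1])
            - m[0][1]*(m[1][0]*m[2][2] - m[1][2]*m[2][0])
            + m[0][2]*(m[1][0]*m[2][1] - m[1][1]*m[2][0]))

def F(x, y, z):
    # Cubic from Example 4.8, assumed homogeneous with the term 504 z^3.
    return y**2*z - x**3 - 6*x**2*z - 12*x*z**2 + 504*z**3

def trace_pairing(S, M):
    # trace(S*M) for 3x3 matrices.
    return sum(S[i][j]*M[j][i] for i in range(3) for j in range(3))

def pi_by_traces(S):
    return (trace_pairing(S, A), trace_pairing(S, B), trace_pairing(S, C))

def pi_by_formula(S):
    # Coefficients of x, y, z in the displayed expression for pi(S).
    s11, s13 = S[0][0], S[0][2]
    s22, s23 = S[1][1], S[1][2]
    s33 = S[2][2]
    return (2*s13 + s22, 2*s23, -s11 - 6*s22 + 48*s33 + 12*s13)

def det_matches_F(bound=3):
    # Compare det(xA + yB + zC) with F on a small integer grid.
    r = range(-bound, bound + 1)
    return all(det3(combo(x, y, z)) == F(x, y, z)
               for x in r for y in r for z in r)
```

Since both sides of the determinant identity are cubic polynomials in x, y, z, agreement on a 7 × 7 × 7 integer grid already forces them to coincide.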

5. A family of maximum likelihood degree 1 surfaces


Theorem 1.3 suggests a suitable class of varieties that are not curves and
that might have rational MLE. These are the dual varieties of curves, as
they have a small number of nonzero polar degrees. We explore this idea for
surfaces in P3 in this section. The easiest cases are surfaces that are dual to
a plane curve. These are addressed in the following lemma.
Lemma 5.1. Suppose X ⊂ PL, Y ⊂ PW and MLD_F(X) = MLD_G(Y) = 1.
Consider the join Z_{X,Y} ⊂ P(L ⊕ W), cut out by the equations of both
X and Y. Then MLD_{F·G}(Z_{X,Y}) = 1 and
\[
\mathrm{MLE}_{Z_{X,Y},\,L\oplus W,\,F\cdot G} = \mathrm{MLE}_{X,L,F} \oplus \mathrm{MLE}_{Y,W,G}
\]
or equivalently
\[
\Phi_{Z_{X,Y},\,L\oplus W,\,F\cdot G}(u_L, u_W) = \Phi_{X,L,F}(u_L)\cdot\Phi_{Y,W,G}(u_W).
\]
Proof. This follows immediately by plugging the suggested solution
into the homaloidal PDE: the equation splits because F does not depend on the entries
in W and G does not depend on the entries in L. □
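The splitting used in the proof can be written out in one line. The computation below is a sketch that assumes the homaloidal PDE in the form F(−∇ log Φ) = Φ, as it is used in Example 4.8, and writes ∇_{u_L}, ∇_{u_W} for the gradients in the two blocks of variables (notation introduced here).

```latex
\begin{aligned}
-\nabla \log\bigl(\Phi_{X,L,F}(u_L)\,\Phi_{Y,W,G}(u_W)\bigr)
  &= \bigl(-\nabla_{u_L}\log\Phi_{X,L,F}(u_L),\; -\nabla_{u_W}\log\Phi_{Y,W,G}(u_W)\bigr),\\
(F\cdot G)\bigl(-\nabla \log(\Phi_{X,L,F}\,\Phi_{Y,W,G})\bigr)
  &= F\bigl(-\nabla_{u_L}\log\Phi_{X,L,F}(u_L)\bigr)\cdot G\bigl(-\nabla_{u_W}\log\Phi_{Y,W,G}(u_W)\bigr)\\
  &= \Phi_{X,L,F}(u_L)\cdot\Phi_{Y,W,G}(u_W).
\end{aligned}
```

The first line holds because the logarithm turns the product into a sum of two functions in disjoint sets of variables; the second uses that F only sees the L-component and G only the W-component.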
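Example 5.2. Consider the singular quadratic surface Q = V(κ11κ22 − κ12² − κ11², κ13, κ23) ⊂ PS3 sitting inside the linear space PV of matrices of the form
\[
\begin{pmatrix} \kappa_{11} & \kappa_{12} & 0 \\ \kappa_{12} & \kappa_{22} & 0 \\ 0 & 0 & \kappa_{33} \end{pmatrix}.
\]
The surface Q is the join of the curve X = V(κ11κ22 − κ12² − κ11²) ⊂ PS2 from Example 4.7 and the point
\[
Y = V(0) \subseteq \mathbb{P}W = \Bigl\langle \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \Bigr\rangle .
\]
The restriction of the determinant to V factors, det|_V = (κ11κ22 − κ12²)κ33, where
\[
\mathrm{MLD}_{(\kappa_{11}\kappa_{22} - \kappa_{12}^2)}(X) = 1, \qquad \mathrm{MLD}_{\kappa_{33}}(Y) = 1.
\]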

Now Lemma 5.1 can be applied to the linear space V and then extended
to all of S3 by projecting to V (as described in [AGK+23, Proposition 2.8])
which gives MLD_det(Q) = 1 with
\[
\Phi_{Q,\mathbb{S}^3,\det} = \Phi_{X,\mathbb{S}^2,\det}\cdot\Phi_{Y,CY,\kappa_{33}} = 4\left(\frac{s_{22}}{s_{11}s_{22} - s_{12}^2 + s_{22}^2}\right)^{2}\cdot\frac{1}{s_{33}}.
\]
We now turn our attention to a larger class of varieties, those whose dual
variety is a nonplanar curve with a special secant line.
Theorem 5.3. Let X ⊂ PL ≅ P3 be the dual variety of a curve X∨ that is
not a plane curve. Suppose that there are two smooth points [α], [β] ∈ X∨
such that the secant line/pencil they span meets X∨ in deg(X∨) − 1
points. Let F = α · β. Then MLD_F(X) = 1; moreover, X is ruled by a family
of curves C such that MLD_F(C) = 1.
Proof. We prove this by showing that both factors of Theorem 1.1 are 1. By
construction, the map γ_{X,F} maps every point of X to a pencil that meets the
curve X∨ and the secant line/pencil spanned by [α], [β], denoted by (PW)∨.
The family of lines that meet a curve and a special secant line is known
to be of order 1. To see this, note that a generic u spans a plane u + (PW)∨ which by
Bézout's theorem meets X∨ in a unique point outside (PW)∨, and this point determines
a unique line in the family. We now turn our attention to the second factor
of the formula in Theorem 1.1. We must show that γ_{X,F} is birational onto its
image to conclude the proof. The closures of the fibers of the Gauss map γ_X
are called contact loci. Bi-duality [GKZ94, Theorem 1.1] and the fact that
W(X) = W(X∨) is a projective bundle over the smooth locus of X∨ imply
that the generic contact locus C is a line. To ensure that γ_{X,F} is birational we
must prove that the restriction ∇F|_C = [β(x)α + α(x)β] : C → (PW)∨ is
birational for a generic C. This is equivalent to F|_C having two distinct zeros,
and by the Euler characteristic formula [DRGS22, Theorem 1.3]
this is equivalent to MLD_F(C) = 1. We argue by contradiction and
assume that F|_C has a double root. This in turn is equivalent to the generic
contact locus C meeting PW = V(α) ∩ V(β), which is equivalent to the
generic tangent line of X∨ meeting (PW)∨. This is a contradiction because
then X∨ would be a plane curve by Terracini's lemma [Ådl87, Lemma 1.11],
contrary to our assumption. This concludes the proof. □
Example 5.4. Let us consider the projective space P(C⁴) and identify it
with its dual vector space via the bilinear form ⟨x, y⟩ = Σ_{i=0}^{3} x_i y_i. Let
X ⊂ Pⁿ be the dual variety of the twisted cubic X∨ ⊂ Pⁿ,
\[
\begin{aligned}
X &= V(x_1^2x_2^2 - 4x_0x_2^3 - 4x_1^3x_3 + 18x_0x_1x_2x_3 - 27x_0^2x_3^2)\\
X^\vee &= V(u_2^2 - u_1u_3,\ u_1u_2 - u_0u_3,\ u_1^2 - u_0u_2).
\end{aligned}
\]
The points (1 : 0 : 0 : 0) and (0 : 0 : 0 : 1) lie on X∨ and we identify them
with the linear forms x0, x3 ∈ X∨. The smooth curve X∨ is of degree 3 and

the secant line spanned by x0, x3 satisfies the assumptions of Theorem 5.3.
The corresponding solution to the homaloidal PDE is
\[
\Phi_{X,L,(x_0x_3)}(u) = \frac{u_1u_2}{(u_1u_3 - u_2^2)(u_0u_2 - u_1^2)}
\]
which is verified using the following code in Macaulay2 [GS].

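-- Coordinate ring R of PL and coordinate ring S of the dual space
R = QQ[x_0..x_3]; S = QQ[u_0..u_3];
-- Discriminant hypersurface X, dual to the twisted cubic
X = x_1^2*x_2^2-4*x_0*x_2^3-4*x_1^3*x_3+18*x_0*x_1*x_2*x_3-27*x_0^2*x_3^2
-- Candidate solution Phi of the homaloidal PDE for F = x_0*x_3
Phi = (u_1*u_2)/(u_1^2*u_2^2-u_0*u_2^3-u_1^3*u_3+u_0*u_1*u_2*u_3)
-- Gradient of Phi = p/q via the quotient rule
p = numerator Phi; gradp = diff(vars S, p)
q = denominator Phi; gradq = diff(vars S, q)
nablaPhi = flatten entries (
    sub((1/q^2), frac S)*sub((q*gradp - p*gradq), frac S))
-- MLE = -grad(log Phi) = -grad(Phi)/Phi
MLE = apply(nablaPhi, f -> -f/Phi)
-- Homaloidal PDE: F(MLE) = MLE_0*MLE_3 should equal Phi
MLE_0*MLE_3 == Phi
-- q^2*grad(Phi) is a polynomial representative of the MLE up to scaling;
-- the kernel of the induced ring map should be the ideal of X
ker (map(S,R, apply(q^2*nablaPhi, f -> sub(f,S)))) == ideal(X)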

Notice that this solution is the product of the two functions
\[
\frac{u_2}{u_0u_2 - u_1^2}, \qquad \frac{u_1}{u_1u_3 - u_2^2}
\]
that are solutions to the homaloidal PDE with respect to x0 and x3, respectively, even
though the assumptions of Lemma 5.1 are not satisfied.
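For readers without Macaulay2 at hand, the two conditions of Example 5.4 can also be spot-checked at rational points. This is only a sketch: the function names are hypothetical, the gradient of log Φ is differentiated by hand, and evaluating at sample points is weaker than the symbolic check in the listing above.

```python
from fractions import Fraction as Fr

def phi(u):
    # Candidate solution for F = x0*x3 from Example 5.4.
    u0, u1, u2, u3 = u
    return u1*u2 / ((u1*u3 - u2**2) * (u0*u2 - u1**2))

def mle(u):
    # -grad log(phi), using log(phi) = log u1 + log u2 - log a - log b.
    u0, u1, u2, u3 = u
    a = u1*u3 - u2**2
    b = u0*u2 - u1**2
    return (u2/b,
            -1/u1 + u3/a - 2*u1/b,
            -1/u2 - 2*u2/a + u0/b,
            u1/a)

def disc(x):
    # Equation of X, the dual of the twisted cubic.
    x0, x1, x2, x3 = x
    return (x1**2*x2**2 - 4*x0*x2**3 - 4*x1**3*x3
            + 18*x0*x1*x2*x3 - 27*x0**2*x3**2)

def check(point):
    u = tuple(Fr(c) for c in point)
    k = mle(u)
    solves_pde = k[0]*k[3] == phi(u)   # F(MLE) = MLE_0 * MLE_3 = Phi
    on_x = disc(k) == 0                # MLE lands on X
    return solves_pde, on_x
```

For instance, `check((1, 2, 3, 5))` evaluates both conditions at u = (1, 2, 3, 5), where the MLE is (−3, 17/2, −22/3, 2).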

6. The Generic Maximum Likelihood Degree


In this section we provide the proof of Theorem 1.3 and introduce the
notion of a variety X ⊂ PL being F -general. We then motivate the name F -
general by proving that a generic perturbation gX of X, where g ∈ PGL(L),
is F -general.
Definition 6.1. Let F : L ⇢ C be homogeneous of nonzero degree. A
variety X ⊂ PL is said to be F-general if W(X) ∩ Γ∇F = ∅ and the variety
{(x, y, z) : (x, y) ∈ W(X) and (x, z) ∈ Γ∇F} ⊂ PL × PL∗ × PL∗
is irreducible.
The intuitive idea behind X being F-general is that X should intersect
a stratification of the divisor associated with F transversely for the formula
to hold. The varieties of maximum likelihood degree 1 fail at this.
Lemma 6.2. Suppose F is a non-constant homogeneous polynomial such
that V (F ) ⊂ PL is smooth, then X is F -general if and only if
W (X) ∩ Γ∇F = ∅.
Moreover, if X is also smooth this is equivalent to X and V (F ) intersecting
transversely.

Proof. By the assumption that V(F) is smooth, ∇F is a morphism. Thus
the closed graph of ∇F coincides with Γ∇F and the projection
{(x, y, z) : (x, y) ∈ W(X) and (x, z) ∈ Γ∇F} → PL
(x, y, z) ↦ x
is a projective bundle over X with fibers {x} × Wx(X) × {∇xF}, hence
irreducible. This proves the first part of the claim. For the second part,
assuming that V(F), X are smooth and transverse means that
∅ = W(X) ∩ W(V(F)) = W(X) ∩ Γ∇F.

Example 6.3. Suppose that F = Q is the Fermat/isotropic quadric Q = Σ_i x_i² on Cⁿ. By Lemma 6.2 a smooth variety X being Q-general is equivalent to X intersecting the quadric V(Q) transversely.
Example 6.4. Consider the smooth curve X = V(κ11κ22 − κ12² − κ11²) ⊂ PS2
from Example 4.7. Letting F = det = κ11κ22 − κ12² we prove that X is
not F-general. Observe that X is tangent to V(F) at the point $\left(\begin{smallmatrix} 0 & 0 \\ 0 & 1 \end{smallmatrix}\right)$ with the common
tangent line V(κ11). The curve X is not F-general by Lemma 6.2. We may
also verify that
\[
\left( \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \right) \in W(X) \cap \Gamma_{\nabla F} \neq \emptyset.
\]

Lemma 6.5. Let φ : Pⁿ ⇢ Pⁿ be a rational map and X ⊂ Pⁿ. A generic
perturbation gX by g ∈ PGL(L) will respect the graph of φ in the sense
((gX) × Pⁿ) ∩ Γφ = Γ(φ|gX) ⊂ Pⁿ × Pⁿ.
Proof. The inclusion Γ(φ|gX ) ⊆ Γφ ∩ (gX × Pn ) is immediate. Using Kleiman
transversality for the product of projective general linear groups on Pn × Pn
we can say that a general g makes gX × Pn intersect any closed subvariety
generically transverse. This means that for any subvariety Y we have
dim(Y ∩ (gX × Pn )) = dim(Y ) − codimPn (X)
and in particular Γφ ∩ (gX × Pn ) is of pure dimension dim(X). The proof of
the lemma will be complete if we prove that the intersection Γφ ∩ (gX × Pn )
is irreducible.
By construction any additional component of Γφ ∩ (gX × Pn ) except for
Γ(φ|gX ) must be contained inside I × Pn where I is the indeterminacy locus
of φ. It suffices to prove that dim Γφ ∩ (((gX) ∩ I) × Pn ) < dim(X). To see
this we observe that
Γφ ⊋ Γφ ∩ (I × Pn ),
and since Γφ is irreducible this means that dim(Γφ ∩ (I × Pn )) < n. This in
turn implies that
dim Γφ ∩ ((X ∩ I) × Pn ) = dim((Γφ ∩ (I × Pn )) ∩ (X × Pn ))
= dim Γφ ∩ (I × Pn ) − codimPn (X) < n − codimPn (X) = dim(X).


Proposition 6.6. Let X ⊂ PL and F : L ⇢ C be homogeneous of non-zero
degree. For a generic g ∈ PGL(L), the perturbed variety gX is F-general.
Proof. Let n = dim PL. The first part is to prove that Γ∇F ∩ W (gX) = ∅.
Consider the variety
V (y(x)) := {([x], [y]) : y(x) = 0} ⊂ PL × PL∗ .
This is a smooth 2n − 1 dimensional irreducible variety, as the projection
onto either factor yields a projective bundle of hyperplanes. Notice that
for any projective variety X we have a containment of the conormal variety
W (X) ⊂ V (y(x)). Thus
W (X) ∩ Γ∇F ⊂ V (y(x)).
Consider y(x) as a divisor on Γ∇F , for any x in the graph and outside the
indeterminacy locus of ∇F we have
u(x) = ∇x F (x) = deg(F )F (x)
by Euler’s homogeneous function theorem. This means that
Y = Γ∇F ∩ V (y(x)) ⊊ Γ∇F
so the proof is reduced to proving that
∅ = W (g · X) ∩ Y ⊂ V (y(x)).
Notice that PGL(L) acts on V(y(x)) by
g · (x, y) = (g · x, y ◦ g⁻¹)
and g · W(X) = W(gX). This action is transitive on V(y(x)) because a
linear map can send any hyperplane with a marked point to any other
hyperplane with a marked point. Now we observe that
dim V(y(x)) = 2n − 1
dim W(X) = n − 1
dim Y < dim Γ∇F = n.
By Kleiman transversality W(gX) is generically transverse to Y inside of
V(y(x)). The expected dimension of the intersection is
dim W(gX) + dim Y − dim V(y(x)) ≤ (n − 1) + (n − 1) − (2n − 1) = −1,
so the intersection is empty.
For the second part of the proof, let
W̃ (X) = {(x, y, z) : (x, y) ∈ W (X)} ⊂ V (y(x)) × PL∗
Γ̃∇F = {(x, y, z) : (x, z) ∈ Γ∇F } ⊂ V (y(x)) × PL∗ .
Our goal is to show that
ΘgX,F := W̃ (gX) ∩ Γ̃∇F

is irreducible. Observe that PGL(L) × PGL(L) acts transitively on V(y(x)) × PL∗ by
(g, h) · (x, y, z) = (g(x), y ◦ g⁻¹, z ◦ h⁻¹)
and that (g, h) · W̃ (X) = W̃ (gX). We conclude that ΘgX,F is a generically
transverse intersection by Kleiman transversality. By Lemma 6.5 we have a
map
π_{x,z} : ΘgX,F → Γ(∇F)|gX
(x, y, z) ↦ (x, z)
where the codomain is irreducible. Let
U := {(x, z) : x ∈ (gX)reg} ⊂ Γ(∇F)|gX
and notice that π_{x,z} is a projective bundle over U because W(gX) is a projective
bundle over the smooth locus of gX. We conclude that any component
which is not the closure of π_{x,z}⁻¹(U) must lie over the singular locus of gX. Let
B̃(X) := {(x, y, z) : x ∈ Xsing} ⊊ W̃(X)
and note that (g, h) · B̃(X) = B̃(gX). Now
B̃(gX) ∩ Γ̃∇F ⊊ W̃(gX) ∩ Γ̃∇F
are two generically transverse intersections. We conclude that they are of
different pure dimensions. Any extraneous component of ΘgX,F = W̃(gX) ∩ Γ̃∇F
is contained in the smaller variety B̃(gX) ∩ Γ̃∇F. This cannot happen because
the two varieties are equidimensional of different dimensions.
Therefore ΘgX,F is irreducible. □
Lemma 6.7. Let P be an irreducible smooth projective variety and consider
two irreducible varieties of the form A ⊂ Pn × P, B ⊂ P × Pm such that
V = (A × Pm ) ∩ (Pn × B) ⊂ Pn × P × Pm
is irreducible and both intersecting varieties are smooth at a generic point of
V . Let πA : A → P denote the projection and assume it induces a surjective
map of tangent spaces at a generic point of V . Then the intersection V is
generically transverse.
Proof. By assumption we have that at a generic point (x, p, y) ∈ V the
involved varieties are smooth, so we have that
\[
\begin{aligned}
T_{x,p,y}(A \times \mathbb{P}^m) &= T_{x,p}A + T_y\mathbb{P}^m\\
T_{x,p,y}(\mathbb{P}^n \times B) &= T_x\mathbb{P}^n + T_{p,y}B
\end{aligned}
\]
where + denotes the linear span of all elements in two subsets of the vector
space T_{x,p,y}(Pⁿ × P × Pᵐ) ≅ T_x(Pⁿ) ⊕ T_p(P) ⊕ T_y(Pᵐ). We can now deduce
that
\[
T_{x,p,y}(A \times \mathbb{P}^m) + T_{x,p,y}(\mathbb{P}^n \times B) = T_x\mathbb{P}^n + \bigl(\pi_{A,*}(T_{x,p}A) + \pi_{B,*}(T_{p,y}B)\bigr) + T_y\mathbb{P}^m
\]

where πA,∗ (Tx,p A) denotes the projection of Tx,p A to Tp P via the differential
of πA , and similarly for B. By assumption we have that πA,∗ (Tx,p A) = Tp P
which completes the proof. □
Theorem 6.8. Let F : L ⇢ C be homogeneous of nonzero degree and let X ⊂ PL ≅ Pⁿ be F-general. Then
\[
\mathrm{MLD}_F(X) = \sum_{i=0}^{n-1} \delta_i(X)\,\mu_i(F)
\]
where we use the multidegrees
\[
[W(X)] = \sum_{i=0}^{n-1} \delta_i(X)\,H^{n-i}(H^\vee)^{i+1}, \qquad
[\Gamma_{\nabla F}] = \sum_{j=0}^{n} \mu_j(F)\,H^{j}(H^\vee)^{n-j}.
\]
If X is not F-general this is an upper bound.


Proof. In this proof we identify L with Cn+1 to simplify notation but keep
the distinction between dual vector spaces. Consider the incidence variety
NX,F = {(x, y, z, u) : (x, y) ∈ W (X), (x, z) ∈ Γ∇F , z ∧ y ∧ u = 0}
as a subvariety of Pn × (Pn∗ )3 .
Here W(X) is the conormal variety of X,
Γ∇F is the closed graph of ∇F : Pⁿ ⇢ Pⁿ∗ and recall that z ∧ y ∧ u = 0
means that z, y, u are collinear. We begin by showing that this variety is
irreducible with the given assumptions. Because X is F -general the pro-
jection of NX,F away from u is a P1 bundle over the irreducible variety
{(x, y, z) : (x, y) ∈ W (X) and (x, z) ∈ Γ∇F } and thus NX,F is irreducible.
If X is not F -general then NX,F may contain extraneous components that
can create excess intersections in the computations below, hence the upper
bound.
The projection (x, y, z, u) ↦ u factors through the projection πL∗ from
Definition 3.9 via the map
β : ÑX,F ⇢ XF
(x, y, z, u) ↦ (x, u).
We prove that β is birational. The map β is dominant because for a generic
(x, u) we have that {x} × {Wx(X) ∩ (∇xF + u)} × {∇xF} × {u} = β⁻¹(x, u).
For a generic (x, u) ∈ XX,F the line ∇xF + u meets the projective conormal
space Wx(X) in a unique point. This implies β is birational.
It follows with the given assumptions that (with some abuse of notation)
\[
\mathrm{MLD}_F(X) = [N_{X,F}] \cdot H_u^n .
\]
If the assumptions are not satisfied we instead need to replace NX,F with one
of its irreducible components, and carry out the same argument to illustrate
why the formula is an upper bound. Consider the equality
\[
N_{X,F} = (W(X) \times (\mathbb{P}^{n*})^2) \cap \pi_{x,z}^{-1}(\Gamma_{\nabla F}) \cap (\mathbb{P}^n \times Z),
\]

where Z is the variety of collinear triples (y, z, u) and where we've let
π_{x,z}⁻¹(Γ∇F) = {(x, y, z, u) : (x, z) ∈ Γ∇F}. We proceed by showing that
these varieties are generically transverse. First consider the intersection
(W(X) × (Pⁿ∗)²) ∩ π_{x,z}⁻¹(Γ∇F), denoted by T × Pⁿ∗ as u can vary freely in
both factors. The intersecting varieties are both smooth as long as x is a
smooth point of X and ∇xF is defined, which is true for a generic point in T.
Lemma 6.7 ensures that T is generically transverse because the projection
of the graph Γ∇F to the leftmost factor is surjective on the tangent spaces
of Pⁿ.
For a generic point in NX,F = (T × Pⁿ∗) ∩ (Pⁿ × Z) the two intersecting varieties
are smooth because Z is smooth when y, z, u are pairwise distinct.
Lemma 6.7 ensures NX,F is generically transverse because the projection of
Z away from u is surjective on the tangent spaces of (Pⁿ∗)² whenever y, z, u
are pairwise distinct. We conclude that

\[
\mathrm{MLD}_F(X) = [W(X)] \cdot [\Gamma_{\nabla F}] \cdot [Z] \cdot H_u^n
\]
where [W(X)], [Γ∇F], [Z] denote the pull-backs of these varieties by projection
in the Chow ring Z[Hx, Hy, Hz, Hu]. Now we list the classes of these
varieties, where the multidegrees of [Z] · H_u^n are 1 (see [DHO+16, Theorem 5.4]):
\[
\begin{aligned}
[W(X)] &= \sum_{i=0}^{n-1} \delta_i\, H_x^{n-i} H_y^{i+1}\\
[\Gamma_{\nabla F}] &= \sum_{j=0}^{n} \mu_j\, H_x^{j} H_z^{n-j}\\
[Z] \cdot H_u^n &= H_u^n \cdot \sum_{k=0}^{n-1} H_y^{n-k-1} H_z^{k}.
\end{aligned}
\]

We conclude the proof by computing
\[
\begin{aligned}
\mathrm{MLD}_F(X) &= [W(X)] \cdot [\Gamma_{\nabla F}] \cdot [Z] \cdot H_u^n\\
&= H_u^n \cdot \sum_{k=0}^{n-1}\sum_{j=0}^{n}\sum_{i=0}^{n-1} \delta_i\mu_j\, H_x^{n-i} H_y^{i+1} H_x^{j} H_z^{n-j} H_y^{n-k-1} H_z^{k}\\
&= H_u^n \cdot \sum_{k=0}^{n-1}\sum_{j=0}^{n}\sum_{i=0}^{n-1} \delta_i\mu_j\, H_x^{n-i+j} H_y^{n-k+i} H_z^{n-j+k}.
\end{aligned}
\]
By recalling that H^{n+1} = 0 in the Chow ring of Pⁿ we observe that we may
enforce the conditions
\[
j \le i, \quad i \le k, \quad k \le j \implies i = k = j
\]
meaning that
\[
\mathrm{MLD}_F(X) = \Bigl(\sum_{i=0}^{n-1} \delta_i\mu_i\Bigr) H_u^n H_x^n H_y^n H_z^n .
\]

Proof of Theorem 1.3. Combine Theorem 6.8 and Proposition 6.6. □
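The bookkeeping in the last step of the proof of Theorem 6.8 can be mimicked numerically. The sketch below uses hypothetical function names and assumes that `delta[i]` and `mu[j]` are indexed exactly as in the displayed classes; it expands the triple sum, discards every monomial in which some hyperplane class has exponent larger than n, and compares the result with the closed form. The sample data are the multidegrees quoted in Examples 6.9 and 6.11.

```python
from itertools import product

def mld_from_classes(delta, mu):
    """Expand [W(X)] * [Gamma] * [Z] and read off the top coefficient.

    delta[i], i = 0..n-1: coefficient of H_x^(n-i) H_y^(i+1) in [W(X)].
    mu[j],    j = 0..n:   coefficient of H_x^j H_z^(n-j) in [Gamma].
    """
    n = len(mu) - 1
    assert len(delta) == n
    total = 0
    for i, j, k in product(range(n), range(n + 1), range(n)):
        ex = n - i + j   # exponent of H_x
        ey = n - k + i   # exponent of H_y
        ez = n - j + k   # exponent of H_z
        # H^(n+1) = 0 on P^n; the exponents sum to 3n, so a surviving
        # monomial is necessarily H_x^n H_y^n H_z^n.
        if max(ex, ey, ez) <= n:
            total += delta[i] * mu[j]
    return total

def mld_closed_form(delta, mu):
    # Sum of delta_i * mu_i for i = 0..n-1.
    return sum(d * m for d, m in zip(delta, mu))

# Multidegrees of the closed graph of grad(det) on 3x3 symmetric matrices.
mu_det3 = [1, 2, 4, 4, 2, 1]
delta_quadric = [2, 2, 2, 2, 2]   # generic quadric hypersurface
delta_curve = [8, 4, 0, 0, 0]     # curve cut out by two quadrics, two linear forms
```

With these inputs both functions return 26 for the quadric and 16 for the curve, the values predicted in Example 6.11; for a plane cubic with `delta = [6, 3]` and `mu = [1, 1, 1]` they return 9 as in Example 6.9.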


Example 6.9. Let F = Q be a non-degenerate quadratic polynomial on
Pⁿ. The gradient map of Q is a projective linear map and thus all the
multidegrees µi(Q) = 1. We obtain the formula
\[
\mathrm{MLD}_Q(X) = \sum_{i=0}^{n-1} \delta_i(X),
\]
for the generic maximum likelihood degree with respect to Q. We may apply
this formula to the space of all 2 × 2 symmetric matrices L = S2. The 2 × 2
determinant (κ11κ22 − κ12²) is non-degenerate. If we use the trace pairing
trace(AB) to identify S2 ≅ (S2)∗ we may write
\[
\Gamma_{(\kappa_{11}\kappa_{22} - \kappa_{12}^2)} = \left\{ \left( \begin{pmatrix} \kappa_{11} & \kappa_{12} \\ \kappa_{12} & \kappa_{22} \end{pmatrix}, \begin{pmatrix} \kappa_{22} & -\kappa_{12} \\ -\kappa_{12} & \kappa_{11} \end{pmatrix} \right) \right\} \subset (\mathbb{P}\mathbb{S}^2) \times (\mathbb{P}\mathbb{S}^2).
\]
Lemma 6.2 guarantees that a smooth curve X that is transversal to the vanishing
locus of the determinant is det-general. The polar degrees of a non-linear plane
curve C are deg(C) and deg(C∨). The generic maximum likelihood degree of
a nonlinear curve in PS2 can then be computed as
\[
\mathrm{MLD}_{(\kappa_{11}\kappa_{22} - \kappa_{12}^2)}(C) = \deg(C) + \deg(C^\vee).
\]
Let us verify this formula for a general smooth cubic/elliptic curve C by comparing
it to the Euler characteristic formula from [DRGS22, Example 4.4].
The genus of C is 1 so the Euler characteristic formula yields
\[
\mathrm{MLD}_{(\kappa_{11}\kappa_{22} - \kappa_{12}^2)}(C) = -\chi(C) + \deg(C) + \#(Q \cap C) = (-2 + 2\cdot 1) + 3 + 6 = 9.
\]
The Plücker formula yields deg(C∨) = 3(3 − 1) = 6. Theorem 6.8 agrees
with the previous Euler characteristic formula:
\[
\mathrm{MLD}_{(\kappa_{11}\kappa_{22} - \kappa_{12}^2)}(C) = \deg(C) + \deg(C^\vee) = 3 + 6 = 9.
\]
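The agreement is not special to cubics. As a sketch, assume that C is a smooth plane curve of degree d ≥ 2 meeting the conic V(det) transversally in 2d points, and that the Euler characteristic formula applies in the same form as above; these hypotheses are made here for illustration and are not stated in the example. With genus g = (d − 1)(d − 2)/2 and the Plücker formula deg(C∨) = d(d − 1) both expressions equal d²:

```latex
\begin{aligned}
-\chi(C) + \deg(C) + \#(Q \cap C) &= \bigl(-2 + (d-1)(d-2)\bigr) + d + 2d = d^2,\\
\deg(C) + \deg(C^\vee) &= d + d(d-1) = d^2 .
\end{aligned}
```

For d = 3 this recovers the value 9 computed above.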

Example 6.10. Theorem 6.8 suggests that the maximum likelihood degree
of a linear space L ⊂ PL is exactly a multidegree of the closed graph Γ∇F
(and Theorem 1.3 confirms this). These have been studied in the litera-
ture [DMS21] and computed in the case where F is the determinant of a
symmetric m × m matrix [MMW21].
Example 6.11. Let us apply Theorem 1.3 to S3 and F = det. Consider a
generic quadric hypersurface Q ⊂ PS3 and a quartic curve C ⊂ PS3 cut out
by two generic quadrics and two linear forms. The conormal varieties have
the multidegrees:
\[
\begin{aligned}
[W(Q)] &= 2\,(H_K^5 H_S + H_K^4 H_S^2 + H_K^3 H_S^3 + H_K^2 H_S^4 + H_K H_S^5)\\
[W(C)] &= 8 H_K^5 H_S + 4 H_K^4 H_S^2
\end{aligned}
\]
where HK, HS generate A∗(PS3 × P(S3∗)). To see this, the polar degrees
of a quadric hypersurface are all δi = 2 because the Gauss map is just the
restriction of a linear map to Q. The degree of the dual hypersurface of C
is 8; this can be computed as if C ⊂ P3 was cut out by two quadrics Q1, Q2.
The degree of the dual variety is then the number of points on C satisfying
\[
\det\begin{pmatrix} \nabla Q_1 & \nabla Q_2 & u & v \end{pmatrix} = 0,
\]
which is another quadric and is expected to
intersect C in 8 points by Bézout's theorem. The multidegrees µi(det) of the
closed graph in S3 are [AGK+21, Section 5]
\[
[\Gamma_{\det(K)}] = 1 H_S^5 + 2 H_K H_S^4 + 4 H_K^2 H_S^3 + 4 H_K^3 H_S^2 + 2 H_K^4 H_S + 1 H_K^5 .
\]
Theorem 6.8 predicts the maximum likelihood degree to be
\[
\begin{aligned}
\mathrm{MLD}_{\det}(Q) &= 2(1 + 2 + 4 + 4 + 2) = 26\\
\mathrm{MLD}_{\det}(C) &= 1 \cdot 8 + 2 \cdot 4 = 16.
\end{aligned}
\]
Knowing that C is an elliptic curve one can also compute MLD_det(C) =
−2 + 2 + 4 + 12 = 16 via the formula [DRGS22, Example 4.4]. We verify
the value of MLD_det using the following code in Macaulay2 [GS].
-- Index pairs (j,i) with j <= i for the entries of a symmetric m x m matrix
m=3;inds = flatten apply(m,i->apply(i+1,j->(j+1,i+1)));n = #inds -1;
-- Bigraded ring in the entries k_I of K and s_I of the sample covariance S
R = QQ[join(apply(inds,I->k_I),apply(inds,I->s_I)),
Degrees=>join(apply(n+1,i->{1,0}),apply(n+1,i->{0,1}))]
K = matrix{apply(inds,I->k_I)}
Kmat = matrix apply(m,i->apply(m,j-> k_(min(i,j)+1,max(i,j)+1)))
S = matrix{join(apply(inds,I->s_I), apply(inds,I->0))}
F = ideal det(Kmat)
jac = (J) -> ( transpose jacobian J)
-- Random quadric and random linear form in the k_I
genQ = () -> sum flatten apply(inds,I->apply(inds,J->
(-1)^(random(ZZ))*random(QQ)*k_I*k_J))
genL = () -> sum apply(inds,I->(-1)^(random(ZZ))*random(QQ)*k_I)
--Choose X
-- Curve C: two quadrics and two linear forms (overwritten by the next line)
X = ideal(genQ(),genQ(),genL(),genL())
-- Quadric hypersurface Q; comment this line out to compute with C instead
X = ideal(genQ())
c=codim X
--Compute critical points for general data
E = X+minors(c+2,S||jac(F)||jac(X))+ideal(sum apply(inds,I->k_I*s_I)-m);
Kring = QQ[apply(inds,I->k_I)]
-- Specialize S to random data and saturate by F to remove points on V(det)
genericcrits = saturate(sub(sub(E,
apply(inds,I->s_I=>random(QQ))),Kring), sub(F,Kring));
MLD = degree genericcrits

References
[Ådl87] Bjørn Ådlandsvik. Joins and higher secant varieties. Math. Scand., 61(2):213–
222, 1987.
[AGK+ 21] C. Améndola, L. Gustafsson, K. Kohn, O. Marigliano, and A. Seigal. The
maximum likelihood degree of linear spaces of symmetric matrices. Le Matem-
atiche (Catania), 76(2):535–557, 2021.
[AGK+ 23] Carlos Améndola, Lukas Gustafsson, Kathlén Kohn, Orlando Marigliano, and
Anna Seigal. Differential equations for gaussian statistical models with ratio-
nal maximum likelihood estimator, 2023. arXiv:2304.12054.
[AKRS21] Carlos Améndola, Kathlén Kohn, Philipp Reichenbach, and Anna Seigal.
Invariant theory and scaling algorithms for maximum likelihood estimation.
SIAM J. Appl. Algebra Geom., 5(2):304–337, 2021.
[DHO+ 16] Jan Draisma, Emil Horobeţ, Giorgio Ottaviani, Bernd Sturmfels, and
Rekha R. Thomas. The Euclidean distance degree of an algebraic variety.
Found. Comput. Math., 16(1):99–149, 2016.
[DMS21] Rodica Andreea Dinu, Mateusz Michalek, and Tim Seynnaeve. Applications
of intersection theory: from maximum likelihood to chromatic polynomials,
2021. arXiv:2111.02057.
[DRGS22] Sandra Di Rocco, Lukas Gustafsson, and Luca Schaffler. Gaussian likelihood
geometry of projective varieties, 2022. arXiv:2208.12560.
[DS23] PBC Desmos Studio. Desmos graphing calculator, 2023.
[EH16] David Eisenbud and Joe Harris. 3264 and all that—a second course in alge-
braic geometry. Cambridge University Press, Cambridge, 2016.
[Ful98] William Fulton. Intersection theory, volume 2 of Ergebnisse der Mathematik
und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics
[Results in Mathematics and Related Areas. 3rd Series. A Series of Modern
Surveys in Mathematics]. Springer-Verlag, Berlin, second edition, 1998.
[GKZ94] I. M. Gelfand, M. M. Kapranov, and A. V. Zelevinsky. Discriminants, resul-
tants, and multidimensional determinants. Mathematics: Theory & Applica-
tions. Birkhäuser Boston, Inc., Boston, MA, 1994.
[GS] Daniel R. Grayson and Michael E. Stillman. Available at
http://www.math.uiuc.edu/Macaulay2.
[Har77] Robin Hartshorne. Algebraic geometry. Graduate Texts in Mathematics, No.
52. Springer-Verlag, New York-Heidelberg, 1977.
[Har92] Joe Harris. Algebraic geometry, volume 133 of Graduate Texts in Mathematics.
Springer-Verlag, New York, 1992. A first course.
[HK12] June Huh and Eric Katz. Log-concavity of characteristic polynomials and the
Bergman fan of matroids. Math. Ann., 354(3):1103–1116, 2012.
[KKS21] Kaie Kubjas, Olga Kuznetsova, and Luca Sodomaco. Algebraic degree of
optimization over a variety with an application to p-norm distance degree,
2021. arXiv:2105.07785.
[Mil09] James S. Milne. Algebraic geometry (v5.20), 2009. Available at
www.jmilne.org/math/.
[MMM+ 20] Laurent Manivel, Mateusz Michalek, Leonid Monin, Tim Seynnaeve, and
Martin Vodička. Complete quadrics: Schubert calculus for gaussian models
and semidefinite programming, 2020.
[MMW21] Mateusz Michalek, Leonid Monin, and Jaroslaw A. Wiśniewski. Maximum
likelihood degree, complete quadrics, and C∗ -action. SIAM J. Appl. Algebra
Geom., 5(1):60–85, 2021.
[SEDS16] Mohab Safey El Din and Pierre-Jean Spaenlehauer. Critical point computations
on smooth varieties: degree and complexity bounds. In Proceedings of
the 2016 ACM International Symposium on Symbolic and Algebraic Computation,
pages 183–190. ACM, New York, 2016.
[SU10] Bernd Sturmfels and Caroline Uhler. Multivariate Gaussian, semidefinite ma-
trix completion, and convex algebraic geometry. Ann. Inst. Statist. Math.,
62(4):603–638, 2010.
