
Combinatorics and Complexity of Partition Functions (2016)

The document is a comprehensive academic work by Alexander Barvinok focusing on the combinatorics and complexity of partition functions. It includes various mathematical topics such as convexity, polynomial approximations, permanents, and partition functions of integer flows, structured into chapters with detailed discussions and references. The book aims to provide a combinatorial perspective on partition functions, addressing different interpretations from both physicists and mathematicians.


Algorithms and Combinatorics 30

Alexander Barvinok

Combinatorics
and Complexity
of Partition
Functions
Algorithms and Combinatorics

Volume 30

Editorial Board
William J. Cook
Ronald Graham
Bernhard Korte
László Lovász
Avi Wigderson
Günter M. Ziegler
More information about this series at http://www.springer.com/series/13
Alexander Barvinok

Combinatorics
and Complexity
of Partition
Functions

Alexander Barvinok
Department of Mathematics
University of Michigan
Ann Arbor, MI
USA

ISSN 0937-5511 ISSN 2197-6783 (electronic)


Algorithms and Combinatorics
ISBN 978-3-319-51828-2 ISBN 978-3-319-51829-9 (eBook)
DOI 10.1007/978-3-319-51829-9

Library of Congress Control Number: 2016963427

Mathematics Subject Classification (2010): 05A05, 05A16, 05C31, 05C50, 05C65, 05C70, 15A15,
15A51, 15A57, 15A69, 30A06, 30A08, 30A82, 30C15, 30E10, 37A60, 37E05, 41A05, 41A10, 60C05,
68A10, 68A20, 68E10, 68R05, 68W25, 82A25, 82A67, 82A68, 90C25, 90C27

© Springer International Publishing AG 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Contents

1 Introduction . . . . 1
2 Preliminaries . . . . 9
2.1 Convexity . . . . 9
2.2 Polynomial Approximations . . . . 23
2.3 Polynomials with Real Roots . . . . 28
2.4 H-Stable Polynomials . . . . 31
2.5 D-Stable Polynomials . . . . 38
3 Permanents . . . . 47
3.1 Permanents . . . . 47
3.2 Permanents of Non-negative Matrices and H-Stable Polynomials . . . . 52
3.3 The van der Waerden Inequality and Its Extensions . . . . 55
3.4 The Bregman–Minc Inequality and Its Corollaries . . . . 58
3.5 Matrix Scaling . . . . 64
3.6 Permanents of Complex Matrices . . . . 72
3.7 Approximating Permanents of Positive Matrices . . . . 78
3.8 Permanents of α-Conditioned Matrices and Permutations with Few Cycles . . . . 85
3.9 Concluding Remarks . . . . 88
4 Hafnians and Multidimensional Permanents . . . . 93
4.1 Hafnians . . . . 93
4.2 Concentration of Hafnians of α-Conditioned Doubly Stochastic Matrices . . . . 101
4.3 Hafnians and Pfaffians . . . . 107
4.4 Multidimensional Permanents . . . . 116
4.5 Mixed Discriminants . . . . 129
4.6 A Version of Bregman–Minc Inequalities for Mixed Discriminants . . . . 135
5 The Matching Polynomial . . . . 145
5.1 Matching Polynomial . . . . 145
5.2 Correlation Decay for the Matching Polynomial . . . . 150
5.3 Matching Polynomials of Bipartite Graphs . . . . 158
5.4 The Bethe-Entropy Lower Bound . . . . 166
5.5 Hypergraph Matching Polynomial . . . . 174
6 The Independence Polynomial . . . . 181
6.1 The Independence Polynomial of a Graph . . . . 181
6.2 The Independence Polynomial of Regular Graphs . . . . 189
6.3 Correlation Decay for Regular Trees . . . . 196
6.4 Correlation Decay for General Graphs . . . . 205
6.5 The Roots on and Near the Real Axis . . . . 216
6.6 On the Local Nature of Independent Sets . . . . 224
7 The Graph Homomorphism Partition Function . . . . 229
7.1 The Graph Homomorphism Partition Function . . . . 229
7.2 Sharpening in the Case of a Positive Real Matrix . . . . 238
7.3 Graph Homomorphisms with Multiplicities . . . . 244
7.4 The Lee–Yang Circle Theorem and the Ising Model . . . . 258
8 Partition Functions of Integer Flows . . . . 269
8.1 The Partition Function of 0-1 Flows . . . . 269
8.2 The Partition Function of Integer Flows . . . . 273
8.3 Approximate Log-Concavity . . . . 278
8.4 Bounds for the Partition Function . . . . 284
8.5 Concluding Remarks: Partition Functions for Integer Points in Polyhedra . . . . 286
References . . . . 293
Index . . . . 301
Chapter 1
Introduction

What this book is about. What is a partition function?


The answer depends on who you ask. You get one (multi)set of answers if you ask
physicists, and another (multi)set if you ask mathematicians (we allow multisets, in
case we want to account for the popularity of each answer). In this book, we adopt
a combinatorial view of partition functions. Given a family F of subsets of the set
{1, . . . , n}, we define the partition function of F as a polynomial in n real or complex
variables x1, . . . , xn,

$$p_{\mathcal F}(x_1,\dots,x_n) \;=\; \sum_{S\in\mathcal F}\ \prod_{i\in S} x_i. \tag{1.1}$$

Under typical circumstances, it is unrealistic to try to write pF as a sum of monomials explicitly, for at least one of the following two reasons:
(1) the family F is very large
or
(2) we are not really sure how large F is and it will take us a while to go over all
subsets S of {1, . . . , n} and check whether S ∈ F.
Typically, however, we will have no trouble checking if any particular subset S
belongs to F. A good example is provided by the family H of all Hamiltonian cycles
in a given graph G (undirected, without loops or multiple edges) with n edges: we
say that a collection S of edges forms a Hamiltonian cycle in G if the set of edges
in S is connected and every vertex of G belongs to exactly two edges from S, see
Fig. 1.1.
A graph with m vertices may contain as many as (m − 1)!/2 different Hamiltonian cycles, and it is believed (and known, if P ≠ NP) that it is computationally hard to find at least one for a graph G supplied by a clever adversary.


Fig. 1.1 A graph with 7 vertices, 12 edges and a Hamiltonian cycle (thick lines)
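The two conditions in the definition above (every vertex of G belongs to exactly two edges of S, and the edges of S form a connected set) are straightforward to test. A minimal Python sketch, with made-up example graphs:

```python
from itertools import combinations

def is_hamiltonian_cycle(m, edges):
    """Test the definition from the text: a set S of edges of a graph with
    vertices 0, ..., m-1 is a Hamiltonian cycle iff every vertex belongs to
    exactly two edges of S and the edges of S form a connected subgraph."""
    deg = {v: 0 for v in range(m)}
    adj = {v: [] for v in range(m)}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        adj[u].append(v)
        adj[v].append(u)
    if any(d != 2 for d in deg.values()):
        return False
    seen, stack = {0}, [0]          # depth-first search from vertex 0
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == m           # connected iff the search reaches everything

# a 4-cycle is a Hamiltonian cycle; two disjoint triangles are not (disconnected)
print(is_hamiltonian_cycle(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))                  # True
print(is_hamiltonian_cycle(6, [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]))  # False
```

Brute-forcing all 5-edge subsets of the edges of K5 through this test recovers exactly (5 − 1)!/2 = 12 Hamiltonian cycles, matching the count above.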

Sometimes we allow F to be a family of multisets, in which case we replace

$$\prod_{i\in S} x_i \;\longrightarrow\; \prod_{i\in S} x_i^{\mu_i}$$

in formula (1.1), where μi is the multiplicity of i in S.


Sometimes we know pF perfectly well even if we are unable to write it explicitly
as a sum of monomials due to the lack of time. For example, if F = 2{1,...,n} is the
set of all subsets, we have

$$p_{2^{\{1,\dots,n\}}}(x_1,\dots,x_n) \;=\; \sum_{S\subset\{1,\dots,n\}}\ \prod_{i\in S} x_i \;=\; \prod_{i=1}^{n}\,(1+x_i) \tag{1.2}$$

and it is hard to argue that we can know p2{1,...,n} any better than by the succinct product
in (1.2). Our experience teaches us, however, that the cases like (1.2) are quite rare.
For some mysterious reasons they all seem to reduce eventually to some determinant
enumerating perfect matchings in a planar graph, see [Ba82], [Va08] and Chap. 10
of [Ai07] for examples and recall that a perfect matching in a graph is a collection
of edges that contains every vertex of the graph exactly once (see Fig. 4.1) and that
the graph is planar if it can be drawn in the plane so that no two edges intersect at a point other than their common vertex (see Fig. 4.8).
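Identity (1.2) is easy to verify numerically: the 2^n-term sum defining the partition function of all subsets collapses to an n-term product. A small sanity check in Python (the numbers are arbitrary):

```python
from itertools import chain, combinations
from math import prod, isclose

def partition_function(family, x):
    """p_F(x) = sum over S in family of prod_{i in S} x_i, as in (1.1);
    the product over the empty set is 1, as stipulated in the text."""
    return sum(prod(x[i] for i in S) for S in family)

n = 4
x = [0.5, 2.0, 3.0, 0.25]
all_subsets = list(chain.from_iterable(combinations(range(n), k) for k in range(n + 1)))
# (1.2): for F = 2^{1,...,n} the 2^n-term sum equals the n-term product
assert isclose(partition_function(all_subsets, x), prod(1 + xi for xi in x))
print(partition_function(all_subsets, x))
```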
Although in Sect. 4.3 of the book we describe Kasteleyn’s classical construction expressing the partition function of perfect matchings in a planar graph as a
determinant (more precisely, as a Pfaffian), the focus of the book is different. Since
the efficient exact computation of pF in most interesting situations is believed to be
impossible (unless the computational complexity hierarchy collapses, that is, unless
P = #P), we are interested in situations when pF can be efficiently approximated.
By efficiently approximated we understand not only that we can compute pF approximately for all x = (x1, . . . , xn) in some sufficiently interesting domain, but also that we approximate pF by some “nice function”, whose behavior we understand reasonably
well. We concentrate mostly on the following three approaches.

Scaling. It may happen that there is a sufficiently rich group of transformations, for example of the type xi −→ λi xi for some λi, which change the value of the
polynomial pF (x1 , . . . , xn ) in some obvious way and such that after factoring that
group out, we are left with a function that varies little. This is the case for the
permanent (Sect. 3.5), hafnian (Sect. 4.2) and their higher-dimensional extensions
(Sects. 4.4 and 4.5). A closely related approach expresses pF as the coefficient of
a monomial y_1^{α_1} · · · y_N^{α_N} in some explicit polynomial P(y_1, . . . , y_N) and obtains an estimate of pF via the solution of a convex optimization problem: minimizing y_1^{−α_1} · · · y_N^{−α_N} P(y_1, . . . , y_N) over y_1, . . . , y_N > 0. We apply this approach to estimate
partition functions of flows (Chap. 8).
Correlation decay. We choose a variable (or a small set of variables), say xn ,
and define pFn as the sum of the monomials of pF containing xn . It may happen
that there is some metric on the set {x1 , . . . , xn } of variables such that the ratio
pFn (x1 , . . . , xn ) / pF (x1 , . . . , xn ) does not depend much on the variables xi that are
sufficiently far away from xn in that metric. This allows us to fix values of those
remote variables to our convenience and quickly approximate the ratio. We then
recover pF by iterating this procedure and telescoping. As a result, we approximate
ln pF (x1 , . . . , xn ) by a sum of functions, each of which depends on a small number
of coordinates. We apply this method to the matching polynomial (Sect. 5.2) and to
the independence polynomial of a graph (Sects. 6.3 and 6.4).
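The phenomenon can be observed by brute force on a toy case: take F to be the independent sets of a path (the independence polynomial of Chap. 6), let the numerator collect the monomials containing an endpoint, and perturb a variable at the far end of the path. The following sketch is ours, with arbitrary numerical values:

```python
from itertools import combinations
from math import prod

def independent_sets(n, edges):
    """All subsets of {0,...,n-1} spanning no edge (brute force, small n only)."""
    es = set(map(frozenset, edges))
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if all(frozenset(p) not in es for p in combinations(S, 2)):
                yield S

def ratio_at(v, x, n, edges):
    """The ratio p_F^v / p_F: the share of the total weight carried by
    independent sets that contain the vertex v."""
    total = with_v = 0.0
    for S in independent_sets(n, edges):
        w = prod(x[i] for i in S)
        total += w
        if v in S:
            with_v += w
    return with_v / total

n = 12
path = [(i, i + 1) for i in range(n - 1)]
x = [0.3] * n
r1 = ratio_at(0, x, n, path)
x_far = x[:]
x_far[-1] = 3.0                     # change the variable at the far end tenfold
r2 = ratio_at(0, x_far, n, path)
print(abs(r1 - r2))                 # tiny: the remote variable hardly matters
```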
Interpolation. Suppose that the polynomial pF has no zeros in a domain
 ⊂ Cn . It turns out that ln pF is well approximated in a slightly smaller domain
 ⊂  by a low degree Taylor polynomial, sometimes after a change of coordinates
(Sect. 2.2). We demonstrate this approach for the permanent (Sects. 3.6 and 3.7) and
hafnian (Sect. 4.1), their higher-dimensional extensions (Sect. 4.4), for the matching
polynomial (Sect. 5.1) and the independence polynomial of a graph (Sect. 6.1), and
for the graph homomorphism partition function (Chap. 7). In our opinion, this is the
most general approach.
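A univariate toy illustration of the interpolation principle: if p has no zeros in a disc, then ln p(z) is the sum of ln(1 − z/ζ) over the roots ζ, and a low degree Taylor polynomial of ln p at 0 already approximates it well inside a smaller disc. The polynomial and its roots below are made up for illustration:

```python
import cmath

zeros = [3 + 0j, -4 + 1j, 5j]          # hypothetical roots, all of modulus >= 3

def p(z):
    r = 1.0
    for zeta in zeros:
        r *= 1 - z / zeta
    return r

def log_p_taylor(z, d):
    """Degree-d Taylor polynomial of ln p at 0: each ln(1 - z/zeta) expanded
    as -sum_k (z/zeta)^k / k, summed over the roots zeta."""
    return -sum(sum((z / zeta) ** k for zeta in zeros) / k for k in range(1, d + 1))

z = 1.2 + 0.5j                          # well inside the zero-free disc
exact = cmath.log(p(z))
approx = log_p_taylor(z, 20)
print(abs(exact - approx))              # error decays geometrically in the degree
```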
The correlation decay approach appears to be closely related to a probabilis-
tic approach, known as the Markov Chain Monte Carlo method. Assuming that
x1 > 0, . . . , xn > 0, we consider the family F as a finite probability space, with

$$\Pr(S) \;=\; \Big(\prod_{i\in S} x_i\Big)\Big/\, p_{\mathcal F}(x_1,\dots,x_n) \quad\text{for } S\in\mathcal F. \tag{1.3}$$

Suppose that we can sample a random set S ∈ F in accordance with the probability
distribution (1.3). Then we can measure the frequency of how often a random S
contains a particular element of the ground set, say n, and hence we can estimate
the ratio pFn (x1 , . . . , xn ) / pF (x1 , . . . , xn ), which is also the goal of the correlation
decay method. To sample a random S ∈ F, we perform a random walk on F by starting with some particular S and, at each step, trying to modify S −→ S̃ by a random move of the type S̃ := (S \ I) ∪ J for some small sets I, J ⊂ {1, . . . , n}, performed with probability proportional to

$$\frac{\Pr(\tilde S)}{\Pr(S)} \;=\; \Big(\prod_{j\in J} x_j\Big)\Big(\prod_{i\in I} x_i\Big)^{-1}.$$

It stands to reason that if the ratios of the type pFn (x1 , . . . , xn ) / pF (x1 , . . . , xn )
depend effectively only on a small set of variables, then we can expect the resulting
walk to mix rapidly, that is, we should hit more or less random S after performing a
moderate number of moves.
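A toy implementation of this walk for the family F of matchings of a 4-cycle (moves flip a single edge, so |I|, |J| ≤ 1, and the acceptance probability follows the ratio above). This is only an illustrative sketch with weights and step count of our choosing, not the algorithm of [J+04]:

```python
import random

random.seed(42)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # C_4; F = matchings of this graph
x = {e: 1.0 for e in edges}                # weights x_e; all 1 here

def is_matching(S):
    used = [v for e in S for v in e]
    return len(used) == len(set(used))     # no vertex covered twice

S, hits, steps = set(), 0, 200_000
for _ in range(steps):
    e = random.choice(edges)               # propose flipping a single edge
    T = S ^ {e}
    if is_matching(T):
        ratio = x[e] if e in T else 1.0 / x[e]   # Pr(S~) / Pr(S), as in the text
        if random.random() < min(1.0, ratio):
            S = T
    hits += (0, 1) in S

freq = hits / steps
print(freq)    # near 2/7: two of the seven matchings of C_4 contain edge (0, 1)
```

Since the proposal is symmetric, this Metropolis-type acceptance rule makes the distribution (1.3) stationary, so the empirical frequency estimates the marginal of the edge (0, 1).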
The Markov Chain Monte Carlo method resulted in a number of remarkable
successes, most notably in a randomized polynomial time approximation algorithm
for the permanent of a non-negative matrix [J+04]. However, we do not discuss it
in this book. First, there are excellent books such as [Je03] describing the method
in detail and second, we are interested in analytic properties of partition functions
that make them amenable to computation (approximation). Granted, the fact that
randomized algorithms are often very efficient must be telling us something important
about analytic properties of the functions they approximate, but at the moment we
hesitate to say what exactly.
Why this is interesting. Why do we care to approximate pF in (1.1)?
For one thing, it gives us some information about complicated combinatorial
families. As an example, let us consider the family H of all Hamiltonian cycles
in a complete graph K m (undirected, without loops or multiple edges) with m ver-
tices 1, . . . , m. Hence to every edge (i, j) of K m we assign a variable xi j , to every
Hamiltonian cycle in K m we assign a monomial that is the product of the variables xi j
on the edges of the cycle, and we define pH by summing up all monomials attached
to the Hamiltonian cycles in K m . If we let xi j = 1 for all edges (i, j) then the value
of pH is just the number of Hamiltonian cycles in K m , which is (m − 1)!/2. If we
assign xi j = 1 for some edges of K m and xi j = 0 for all other edges of K m , then the
value of pH is the number of Hamiltonian cycles in the graph G consisting of the
edges selected by the condition xi j = 1 (generally, it is computationally hard even
to tell pH from 0).
Looking at the problem of counting Hamiltonian cycles through the prism of
the partition function pH allows us to interpolate between a trivial problem (count-
ing Hamiltonian cycles in the complete graph) and an impossible one (counting
Hamiltonian cycles in an arbitrary graph) and find some middle ground. Given a
graph G with vertices 1, . . . , m, let us fix a small ε > 0 (think ε = 10⁻¹⁰) and let us define

$$x_{ij} \;=\; \begin{cases} 1 & \text{if } (i,j) \text{ is an edge of } G,\\ \varepsilon & \text{otherwise.} \end{cases}$$

In this case, pH still enumerates Hamiltonian cycles in the complete graph Km, but it does so deliberately: it counts every Hamiltonian cycle in G with weight 1, while every Hamiltonian cycle in Km that contains r non-edges of G is counted with weight ε^r. In Sect. 3.8, we show that it is quite easy to approximate pH within a factor of m^{O(ln m)}, where the implicit constant in the “O” notation depends on ε. This gives us
some idea about Hamiltonian cycles in G: for example, we can separate graphs G
with many Hamiltonian cycles (the value of pH is large) from graphs G that do not
acquire a single Hamiltonian cycle unless sufficiently many new edges are added to
G (the value of pH is small).
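For tiny m the weighted count described above can be carried out by direct enumeration, which makes the ε-weighting concrete. A brute-force Python sketch (the graph G below, a single m-cycle, is our own toy example; the enumeration is hopeless beyond very small m):

```python
from itertools import permutations
from math import factorial, isclose

def hamiltonian_partition_function(m, x):
    """p_H: the sum over Hamiltonian cycles of K_m of the products of the
    edge weights x[i][j].  Fixing vertex 0 leaves each undirected cycle
    counted twice (once per direction), hence the division by 2."""
    total = 0.0
    for perm in permutations(range(1, m)):
        cyc = (0,) + perm
        w = 1.0
        for i in range(m):
            a, b = cyc[i], cyc[(i + 1) % m]
            w *= x[a][b]
        total += w
    return total / 2.0

m = 6
ones = [[1.0] * m for _ in range(m)]
# with all weights 1 we recover the count (m - 1)!/2 of Hamiltonian cycles
assert isclose(hamiltonian_partition_function(m, ones), factorial(m - 1) / 2)

eps = 1e-3
cycle_edges = {(i, (i + 1) % m) for i in range(m)}   # G = one Hamiltonian cycle
x = [[eps] * m for _ in range(m)]
for i, j in cycle_edges:
    x[i][j] = x[j][i] = 1.0
pH = hamiltonian_partition_function(m, x)
print(pH)   # close to 1: G has exactly one Hamiltonian cycle, weighted 1
```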
Two particular topics discussed in this book are
(1) connections between the computational complexity of partition functions and
their complex zeros
and
(2) connections between computational complexity and “phase transition” in
physics.
In statistical physics, one deals with the probability space F defined by (1.3)
(sets S ∈ F are called “configurations”), where xi = eβi /t for some constants βi > 0
and a real parameter t, interpreted as temperature. As the ground set {1, . . . , n}
and the set F of configurations grow in some regular way, one can consider two
related, though not identical notions of phase transition. The first notion has to do
with a complex zero of pF , as a function of t, approaching the positive real axis
at some “critical temperature” tc > 0. This implies the loss of smoothness or even
continuity for various physically meaningful quantities, expressed in terms of ln pF
and its derivatives [YL52]. The second notion of phase transition has to do with
the appearance or disappearance of “long-range correlations”. Typically, at a high
temperature t (that is, when xi are close to 1), there is no long-range correlation:
the probability that S contains a given element i of the ground set is not affected by
whether S contains another element j, far away from i in some natural metric. As the
temperature t falls (and hence xi grow), such a dependence may appear. These two
notions of phase transition are related though apparently not identical, see [DS87] and [Ci87]; we discuss this when we talk about the Ising model in Sect. 7.4.
The correlation decay approach emphasizing (2) was introduced by Bandyopad-
hyay and Gamarnik [BG08] and independently by Weitz [We06] and is generally
well-known in the computational community, while (1) is relatively less articulated
but appears to be no less interesting. Curiously, while the first type of phase tran-
sition is associated with complex zeros of the partition function approaching the
positive real axis, as far as our ability to approximate is concerned, a priori this
does not represent an insurmountable obstacle. What hinders our ability to compute
are the complex zeros “blocking” the reference point in the vicinity of which pF
looks easy, such as the point xi j = 1 for the partition function pH of Hamiltonian
cycles, see also our discussion in Sect. 2.2. The ways of statistical physics and those
of computational complexity diverge at this point, which is probably explained by
the fact that the temperature in the physical world is necessarily a real number, while
for computational purposes we can manipulate with a complex temperature just as
easily.
We stick to the language of combinatorics but the objects and phenomena dis-
cussed in this book have also their names in physics. Thus the “matching polynomial”
of Chap. 5 corresponds to the “monomer-dimer model”, the “graph homomorphism
partition function” in Chap. 7 corresponds to a “spin system”, while the cut partition
function of Sect. 7.4 corresponds to a “ferromagnetic spin system”. Some of our
results, such as in Sects. 3.6, 3.7, 3.8, 4.2, 4.4, 7.1 and 7.2 correspond to the “mean
field theory” approach, while some others, such as in Chaps. 5 and 6 correspond
to the “hard core” model. For still others, such as in Sects. 3.4, 3.5 and Chap. 8, we were unable to think of an appropriate physics name (though “renormalization” may work for those in Sects. 3.4 and 3.5). We talk about physical implications of results in Sect. 7.4 while discussing the Ising model, which connects several directions explored in this book: zeros of partition functions, phase transition, correlation decay, graph homomorphisms and enumeration of perfect matchings.
Finally, this book may be interesting because it contains an exposition of quite
recent breakthroughs (available before, to the best of our knowledge, only as
preprints, journal or conference proceedings papers). These include Gurvits’s approach connecting certain combinatorial quantities with stable polynomials (Sects. 3.3 and 8.1), the approach of Csikvári and Lelarge to the Bethe approximation of the permanent (Sects. 5.3 and 5.4) and Weitz’s correlation decay method for the independence polynomial (Sect. 6.4).
Prerequisites, contents, notation, and assorted remarks. We use some concepts of
combinatorics, but only very basic, such as graphs and hypergraphs. All other terms,
also very basic, such as matchings, perfect matchings and colorings are explained in
the text. We also employ some computational complexity concepts. As we are inter-
ested in establishing that some functions can be efficiently computed (approximated),
and not in proving that some functions are hard to approximate, we use only some
very basic complexity concepts, such as polynomial time algorithm, etc. The book
[PS98] will supply more than enough prerequisites in combinatorics and computa-
tional complexity (but see also more recent and comprehensive [AB09] and [Go08]).
We also require modest amounts of linear algebra, real and complex analysis. This
book should be accessible to an advanced undergraduate.
In Chap. 2, we develop our toolbox. First, we discuss various topics in convexity:
convex and concave functions, entropy and Bethe-entropy, Gauss-Lucas theorem on
the zeros of the derivative of a complex polynomial, the capacity of real polynomials
and the Prékopa-Leindler inequality. Then we present one of our main tools, inter-
polation, which allows us to approximate the logarithm of a multivariate polynomial
p by a low degree polynomial in a domain, given that there are no complex zeros
of p in a slightly larger domain. We discuss interlacing polynomials, H-stable poly-
nomials (polynomials with no roots in the open upper half-plane of C) and D-stable
polynomials (polynomials with no roots in the closed unit disc in C).
Then we begin our study of partition functions in earnest.
In Chap. 3, we start slowly with the permanent, as it is very easy to define and it has
a surprisingly rich structure. All this makes the permanent a very natural candidate
to try our toolbox on.
In Chap. 4, we consider extensions of the permanent to non-bipartite graphs (hafni-
ans) and hypergraphs (multi-dimensional permanents). We also consider the mixed
discriminant, which is a generalization of the permanent and of the determinant
simultaneously. We observe that some properties of the permanent can be extended to those more general objects, while some others cannot.
In Chap. 5, we consider the matching polynomial of a graph, a relative of the
permanent and hafnian. Here we introduce the correlation decay method, which, as
Bayati, Gamarnik, Katz, Nair and Tetali showed [B+07], looks particularly elegant
and simple in the case of the matching polynomial. It turns out to be very useful too
and provides some additional insight into the permanent.
In Chap. 6, we discuss the independence polynomial of a graph. We prove
Dobrushin’s bound on the complex roots and also present the correlation decay
approach at its most technical. We discuss an open question due to Sokal [S01b],
which, if answered affirmatively, would allow us to bridge the gap between different
degrees of approximability afforded by the interpolation and by correlation decay
approaches.
In Chap. 7, we present combinatorial partition functions at their most general.
Here we rely entirely on our interpolation technique, although some of the results
can be obtained by the correlation decay approach [LY13]. We also prove the Circle
Theorem of Lee and Yang and discuss the Ising model in some detail.
In Chap. 8, we consider partition functions associated with multisets. We study the
partition functions of 0-1 and non-negative integer flows, which present yet another
extension of permanents. Permanents also supply our main technical tool.
Sections, theorems, lemmas, and formulas are numbered separately inside each
chapter. Figures are numbered consecutively in each chapter. For example, Fig. 4.3
is the third figure in Chap. 4.
We use ℜ to denote the real part of a complex number and ℑ to denote the imaginary part, so that ℜz = a and ℑz = b for z = a + ib. We denote by |X| the cardinality of a finite set X.
Finally, the product of complex numbers from an empty set is always 1.

Acknowledgements I am grateful to David Gamarnik, Leonid Gurvits, Gil Kalai, Guus Regts, Alex
Samorodnitsky, Prasad Tetali, Jan Vondrak and Shira Zerbib for helpful conversations, advice and
encouragement. I benefited a lot from the program “Counting Complexity and Phase Transition”
and I thank the Simons Institute for the Theory of Computing for hospitality. I am grateful to
many anonymous referees of the first draft of the book for their remarks. I thank Han Wu and
Max Kontorovich for their work on the REU project on the complex roots and approximation of
permanents in Summer 2016, where they implemented and experimented with the Taylor polynomial
interpolation method for permanents [KW16] and also pointed us to [Wa03] and to connections with the Szegő curve (see Sect. 5.5).
This work is partially supported by NSF Grant DMS 1361541.
Chapter 2
Preliminaries

We assemble our toolbox from real and complex analysis. The main topics are
inequalities inspired by convexity, polynomials with no roots in a particular domain
and relations between convexity and restrictions on the location of the roots. We
discuss the entropy of partitions, the Bethe-entropy, the Prékopa–Leindler inequality
for integrals and the capacity of polynomials with non-negative real coefficients as a
way to estimate a particular coefficient of a multivariate polynomial by solving a con-
vex optimization problem. We discuss polynomials with real roots, polynomials with
no roots in the open upper half-plane (H-stable polynomials) and polynomials with
no roots in the closed unit disc (D-stable polynomials). We prove the Gauss–Lucas
Theorem for the location of the roots of the derivative of a polynomial, the Gurvits
Theorem on the capacity of H-stable polynomials and establish log-concavity of the
coefficients of real-rooted polynomials. We introduce the Taylor polynomial inter-
polation method, which allows us to obtain computationally efficient low-degree
approximations of a polynomial in a complex domain, provided the polynomial has
no zeros in a slightly larger domain.

2.1 Convexity

2.1.1 Convex functions. In what follows, some convex/concave functions will play an important role. A set A ⊂ Rᵈ is called convex provided

αx + (1 − α)y ∈ A for all x, y ∈ A and all 0 ≤ α ≤ 1.

It follows then that

$$\sum_{i=1}^{n} \alpha_i x_i \in A \quad\text{provided } x_i \in A,\ \alpha_i \ge 0 \ \text{for } i=1,\dots,n \ \text{ and } \sum_{i=1}^{n} \alpha_i = 1.$$


Fig. 2.1 The graph of a convex function

Let A ⊂ Rᵈ be a convex set. A function f : A −→ R is called convex provided

$$f\big(\alpha x + (1-\alpha)y\big) \;\le\; \alpha f(x) + (1-\alpha) f(y) \quad\text{for all } x, y \in A \text{ and all } 0 \le \alpha \le 1,$$

see Fig. 2.1. The function f is called strictly convex if the above inequality is strict whenever x ≠ y and 0 < α < 1. It is easy to show that if f is convex then

$$f\Big(\sum_{i=1}^{n} \alpha_i x_i\Big) \;\le\; \sum_{i=1}^{n} \alpha_i f(x_i) \quad\text{provided } x_i \in A,\ \alpha_i \ge 0 \ \text{for } i=1,\dots,n \ \text{ and } \sum_{i=1}^{n} \alpha_i = 1.$$

A function f : A −→ R is called concave provided

$$f\big(\alpha x + (1-\alpha)y\big) \;\ge\; \alpha f(x) + (1-\alpha) f(y) \quad\text{for all } x, y \in A \text{ and all } 0 \le \alpha \le 1,$$

see Fig. 2.2. The function f is called strictly concave if the above inequality is strict whenever x ≠ y and 0 < α < 1. It is easy to show that if f is concave then

$$f\Big(\sum_{i=1}^{n} \alpha_i x_i\Big) \;\ge\; \sum_{i=1}^{n} \alpha_i f(x_i) \quad\text{provided } x_i \in A,\ \alpha_i \ge 0 \ \text{for } i=1,\dots,n \ \text{ and } \sum_{i=1}^{n} \alpha_i = 1.$$

Here are some functions whose convexity/concavity we will repeatedly use.



Fig. 2.2 The graph of a concave function

Fig. 2.3 The graph of ln x

2.1.1.1 Logarithm. As is well known, the function

f (x) = ln x for x > 0

is strictly concave, see Fig. 2.3.


In particular,

$$\ln\Big(\sum_{i=1}^{n} \alpha_i x_i\Big) \;\ge\; \sum_{i=1}^{n} \alpha_i \ln x_i \quad\text{provided } x_i, \alpha_i > 0 \ \text{for } i=1,\dots,n \ \text{ and } \sum_{i=1}^{n} \alpha_i = 1.$$

Exponentiating, we obtain the arithmetic-geometric mean inequality:

$$\sum_{i=1}^{n} \alpha_i x_i \;\ge\; \prod_{i=1}^{n} x_i^{\alpha_i} \quad\text{provided } x_i, \alpha_i > 0 \ \text{for } i=1,\dots,n \ \text{ and } \sum_{i=1}^{n} \alpha_i = 1.$$

Fig. 2.4 The graph of x ln x
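The arithmetic-geometric mean inequality lends itself to a quick numerical stress test; the following Python snippet (with arbitrary random instances) checks it, including the equality case x₁ = . . . = xₙ:

```python
import random
from math import prod, isclose

random.seed(1)
for _ in range(1000):
    n = random.randint(2, 6)
    x = [random.uniform(0.01, 10.0) for _ in range(n)]
    a = [random.random() for _ in range(n)]
    s = sum(a)
    a = [ai / s for ai in a]                      # normalize: the alpha_i sum to 1
    lhs = sum(ai * xi for ai, xi in zip(a, x))    # weighted arithmetic mean
    rhs = prod(xi ** ai for xi, ai in zip(x, a))  # weighted geometric mean
    assert lhs >= rhs - 1e-12

# equality when all x_i coincide
assert isclose(sum(0.25 * 3.0 for _ in range(4)), prod(3.0 ** 0.25 for _ in range(4)))
print("arithmetic-geometric mean inequality verified on 1000 random instances")
```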

2.1.1.2 The function f(x) = x ln x. It is easy to check that the function

f(x) = x ln x for x > 0

is strictly convex, see Fig. 2.4, and, consequently, the function

h(x) = x ln(1/x) for x > 0

is strictly concave.
2.1.1.3 Exponential substitution. Let p(x1, . . . , xn) be a polynomial with non-negative real coefficients. Then the function f : Rⁿ −→ R defined by

$$f(t_1,\dots,t_n) \;=\; \ln p\left(e^{t_1},\dots,e^{t_n}\right)$$

is convex. Indeed, it suffices to check that the restriction h of f onto every line in Rⁿ is convex. Such a restriction h looks as

$$h(t) \;=\; \ln\Big(\sum_{i=1}^{m} \alpha_i e^{\lambda_i t}\Big),$$

where λ1, . . . , λm are real and α1, . . . , αm are positive real. It suffices then to check that h″(t) ≥ 0 for all t ∈ R. Denoting
2.1 Convexity 13


m
g(t) = αi eλi t ,
i=1

we obtain
g  (t) g  (t)g(t) − g  (t)g  (t)
h  (t) = and h  (t) =
g(t) g 2 (t)

where


m 
m
  
g (t)g(t) − g (t)g (t) = λi2 αi α j e(λi +λ j )t − λi λ j αi α j e(λi +λ j )t
i, j=1 i, j=1
 
= λi2 + λ2j − 2λi λ j αi α j e(λi +λ j )t
{i, j}
i= j
 2
= λi − λ j αi α j e(λi +λ j )t ≥ 0.
{i, j}
i= j
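This convexity is easy to probe numerically through the midpoint inequality h((s+t)/2) ≤ (h(s)+h(t))/2; the particular α_i, λ_i below are arbitrary choices of ours:

```python
import math
import random

def h(t, alphas, lambdas):
    """h(t) = ln(sum_i alpha_i * exp(lambda_i * t)) with alpha_i > 0."""
    return math.log(sum(a * math.exp(l * t) for a, l in zip(alphas, lambdas)))

random.seed(1)
alphas = [0.3, 1.2, 0.5]
lambdas = [-1.0, 0.4, 2.0]
for _ in range(1000):
    s = random.uniform(-5, 5)
    t = random.uniform(-5, 5)
    # midpoint convexity: h((s + t)/2) <= (h(s) + h(t))/2
    mid = h((s + t) / 2, alphas, lambdas)
    assert mid <= (h(s, alphas, lambdas) + h(t, alphas, lambdas)) / 2 + 1e-12
```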

2.1.2 Entropy. Let us consider the simplex Δ_n ⊂ R^n consisting of all vectors x = (ξ_1, ..., ξ_n) such that ξ_i ≥ 0 for i = 1, ..., n and ξ_1 + ... + ξ_n = 1. For x ∈ Δ_n we define the entropy H by

$$H(x) = \sum_{i=1}^{n} \xi_i \ln \frac{1}{\xi_i} \quad \text{where } x = (\xi_1,\dots,\xi_n)$$

and the corresponding term is 0 if ξ_i = 0. It follows from Sect. 2.1.1.2 that H is strictly concave. Therefore H attains its minimum value on Δ_n at an extreme point of Δ_n, that is, where ξ_i = 1 for some i and ξ_j = 0 for all j ≠ i. In particular,

$$H(x) \ge 0 \quad \text{for all } x \in \Delta_n.$$

Clearly, H is a symmetric function of ξ_1, ..., ξ_n, so the value of H depends on the multiset {ξ_1, ..., ξ_n} but not on the order of the ξ_i's. By the concavity and symmetry of H, the largest value of H on Δ_n is attained when ξ_1 = ... = ξ_n = 1/n, so

$$H(x) \le \ln n \quad \text{for all } x \in \Delta_n. \tag{2.1.2.1}$$

A multiset of non-negative numbers summing up to 1 is naturally interpreted as a probability distribution. Let Ω be a probability space and let F = {F_1, ..., F_n} be its partition into finitely many pairwise disjoint events F_1, ..., F_n, so that

$$\Omega = \bigcup_{F_i \in F} F_i \quad \text{and} \quad F_i \cap F_j = \emptyset \quad \text{for } i \ne j.$$

We define the entropy of the partition F by

$$H(F) = H\bigl(\{\Pr F_i\}\bigr) = \sum_{i=1}^{n} p_i \ln \frac{1}{p_i} \quad \text{where } p_i = \Pr(F_i).$$

In particular, by (2.1.2.1),

$$H(F) \le \ln n \quad \text{provided } F \text{ consists of not more than } n \text{ events.} \tag{2.1.2.2}$$

We say that a finite partition G refines a partition F if every event in the partition G lies in some event in the partition F, see Fig. 2.5, in which case we write F ≼ G. We often call events of a coarser partition blocks.
Given a pair of partitions F ≼ G, we define the conditional entropy of G with respect to F as follows:

$$H(G \mid F) = \sum_{F \in F} \Pr(F) \left( \sum_{\substack{G \in G \\ G \subset F}} \frac{\Pr(G)}{\Pr(F)} \ln \frac{\Pr(F)}{\Pr(G)} \right)$$

(if Pr(F) = 0 for some F, the corresponding term in the sum is 0). In words: each event F of the partition F such that Pr(F) > 0 we consider as a probability space endowed with the conditional probability measure, compute the entropy of the partition of F by events of G, and average that entropy over all events F ∈ F.
For ω ∈ Ω, let us denote by F(ω) the event of F containing ω, considered as a probability space as before, and let F(ω) be the partition of F(ω) induced by G. Assuming that Ω is finite, we can write

$$H(G \mid F) = \sum_{\omega \in \Omega} \Pr(\omega)\, H\bigl(F(\omega)\bigr). \tag{2.1.2.3}$$

It is not hard to check that

$$H(G) = H(F) + H(G \mid F).$$

Moreover, if F_1 ≼ F_2 ≼ ... ≼ F_m, iterating the above identity, we get

$$H(F_m) = H(F_1) + \sum_{i=1}^{m-1} H(F_{i+1} \mid F_i), \tag{2.1.2.4}$$

see, for example, [Kh57].

Fig. 2.5 A partition and its refinement
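On a finite probability space the identity H(G) = H(F) + H(G|F) can be checked directly; the following is a sketch with helper names of our own:

```python
import math

def entropy(dist):
    """H of a list of probabilities, with the 0 * ln(1/0) terms set to 0."""
    return sum(p * math.log(1.0 / p) for p in dist if p > 0)

def chain_rule_check(prob, F, G):
    """prob: dict point -> probability; F, G: partitions as lists of sets,
    with G refining F.  Returns (H(G), H(F) + H(G|F))."""
    pr = lambda block: sum(prob[w] for w in block)
    H_F = entropy([pr(B) for B in F])
    H_G = entropy([pr(B) for B in G])
    H_G_given_F = 0.0
    for B in F:
        pB = pr(B)
        if pB > 0:
            # entropy of the partition of B induced by G, under Pr(.|B)
            H_G_given_F += pB * entropy([pr(C) / pB for C in G if C <= B])
    return H_G, H_F + H_G_given_F

prob = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.25, 4: 0.05, 5: 0.3}
F = [{0, 1, 2}, {3, 4, 5}]
G = [{0}, {1, 2}, {3}, {4, 5}]
lhs, rhs = chain_rule_check(prob, F, G)
assert abs(lhs - rhs) < 1e-12
```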

2.1.3 Bethe-entropy. Let Δ_n ⊂ R^n be, as above, the simplex of all n-vectors x = (ξ_1, ..., ξ_n) such that ξ_i ≥ 0 for i = 1, ..., n and ξ_1 + ... + ξ_n = 1. We assume that n ≥ 2 and for x ∈ Δ_n, we define

$$g(x) = \sum_{i=1}^{n} \left( \xi_i \ln \frac{1}{\xi_i} + (1 - \xi_i) \ln (1 - \xi_i) \right) \quad \text{for } x = (\xi_1,\dots,\xi_n).$$

We call this function the Bethe-entropy. We claim that g(x) is a non-negative concave function on Δ_n. We follow [Gu11], see also [Vo13].
Let

$$\phi(\xi) = \xi \ln \frac{1}{\xi} + (1 - \xi) \ln (1 - \xi),$$

see Fig. 2.6. We have

$$\phi''(\xi) = \frac{2\xi - 1}{\xi(1 - \xi)},$$

from which φ is concave for 0 ≤ ξ ≤ 1/2. Since φ(0) = φ(1/2) = 0, it follows that φ(ξ) ≥ 0 for 0 ≤ ξ ≤ 1/2.

Fig. 2.6 The graph of φ(x)



Hence g(x) ≥ 0 if ξ_i ≤ 1/2 for i = 1, ..., n. Otherwise, there is at most one value of ξ_i, say ξ_n, such that ξ_n > 1/2. Therefore, the minimum of the concave function $\sum_{i=1}^{n-1} \phi(\xi_i)$ on the simplex defined by the equation ξ_1 + ... + ξ_{n−1} = 1 − ξ_n and the inequalities ξ_i ≥ 0 for i = 1, ..., n − 1 is attained at an extreme point, where all but one of the ξ_i, say ξ_1, are equal to 0 and ξ_1 = 1 − ξ_n. Therefore,

$$g(x) \ge \phi(\xi_n) + \phi(1 - \xi_n) = 0,$$

and hence g(x) is indeed non-negative on Δ_n.


To prove that g(x) is concave, it suffices to prove that the restriction of g onto the relative interior of Δ_n is concave, so we assume that ξ_1, ..., ξ_n > 0. Computing the Hessian of g at x = (ξ_1, ..., ξ_n), we obtain the n × n diagonal matrix

$$D = \operatorname{diag}\left( \frac{2\xi_1 - 1}{\xi_1(1 - \xi_1)},\ \dots,\ \frac{2\xi_n - 1}{\xi_n(1 - \xi_n)} \right).$$

Our goal is to prove that the restriction of the quadratic form with matrix D onto the tangent space at x ∈ Δ_n is negative semi-definite, that is,

$$\sum_{i=1}^{n} \frac{2\xi_i - 1}{\xi_i(1 - \xi_i)}\, \alpha_i^2 \le 0 \quad \text{provided} \quad \sum_{i=1}^{n} \alpha_i = 0. \tag{2.1.3.1}$$

If ξ_i ≤ 1/2 for all i = 1, ..., n then (2.1.3.1) obviously holds. Otherwise, there is at most one coordinate ξ_i, say ξ_n, such that ξ_n > 1/2. If α_n = 0 then (2.1.3.1) holds, so we can assume that α_n ≠ 0. Scaling, if necessary, we can further assume that α_n = −1.
Let us denote

$$\beta_i = \frac{2\xi_i - 1}{\xi_i(1 - \xi_i)} \quad \text{for } i = 1, \dots, n-1.$$

The maximum value of the negative definite quadratic form

$$(\alpha_1, \dots, \alpha_{n-1}) \longmapsto \sum_{i=1}^{n-1} \beta_i \alpha_i^2$$

on the affine subspace defined by the equation

$$\alpha_1 + \dots + \alpha_{n-1} = 1$$

is attained at

$$\alpha_i = \frac{\lambda}{\beta_i} \quad \text{for } i = 1, \dots, n-1, \quad \text{where} \quad \lambda = \left( \sum_{i=1}^{n-1} \frac{1}{\beta_i} \right)^{-1},$$

and hence is equal to

$$\lambda = \left( \sum_{i=1}^{n-1} \frac{1}{\beta_i} \right)^{-1} = \left( \sum_{i=1}^{n-1} \frac{\xi_i(1 - \xi_i)}{2\xi_i - 1} \right)^{-1}.$$

Consequently,

$$\sum_{i=1}^{n} \frac{2\xi_i - 1}{\xi_i(1 - \xi_i)}\, \alpha_i^2 \le \frac{2\xi_n - 1}{\xi_n(1 - \xi_n)} + \left( \sum_{i=1}^{n-1} \frac{\xi_i(1 - \xi_i)}{2\xi_i - 1} \right)^{-1} \quad \text{provided} \quad \sum_{i=1}^{n-1} \alpha_i = 1 \ \text{and} \ \alpha_n = -1. \tag{2.1.3.2}$$

On the other hand, the function

$$\xi \longmapsto \frac{\xi(1 - \xi)}{2\xi - 1} \quad \text{for } 0 \le \xi < \frac{1}{2}$$

is concave, as we have

$$\frac{d^2}{d\xi^2}\, \frac{\xi(1 - \xi)}{2\xi - 1} = \frac{2}{(2\xi - 1)^3} < 0 \quad \text{provided } 0 \le \xi < \frac{1}{2}.$$

Consequently, the minimum value of the concave function

$$(\xi_1, \dots, \xi_{n-1}) \longmapsto \sum_{i=1}^{n-1} \frac{\xi_i(1 - \xi_i)}{2\xi_i - 1}$$

on the simplex

$$\xi_1 + \dots + \xi_{n-1} = 1 - \xi_n \quad \text{and} \quad \xi_1, \dots, \xi_{n-1} \ge 0$$

is attained at an extreme point, where all but one of the ξ_i are equal to 0 and the remaining value of ξ_i is 1 − ξ_n. Then from (2.1.3.2) we conclude that

$$\sum_{i=1}^{n} \frac{2\xi_i - 1}{\xi_i(1 - \xi_i)}\, \alpha_i^2 \le \frac{2\xi_n - 1}{\xi_n(1 - \xi_n)} + \frac{1 - 2\xi_n}{\xi_n(1 - \xi_n)} = 0 \quad \text{provided} \quad \sum_{i=1}^{n-1} \alpha_i = 1 \ \text{and} \ \alpha_n = -1,$$

and (2.1.3.1) follows, establishing the concavity of g.
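Both claims — non-negativity and concavity of the Bethe-entropy — can be spot-checked numerically on random points of the simplex; the sketch below (helper names ours) uses midpoint concavity:

```python
import math
import random

def phi(x):
    """phi(x) = x ln(1/x) + (1 - x) ln(1 - x), with 0 ln 0 terms set to 0."""
    s = -x * math.log(x) if x > 0 else 0.0
    if x < 1:
        s += (1 - x) * math.log(1 - x)
    return s

def bethe(xi):
    return sum(phi(x) for x in xi)

def random_simplex_point(n):
    w = [random.expovariate(1.0) for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

random.seed(2)
n = 5
for _ in range(500):
    x = random_simplex_point(n)
    y = random_simplex_point(n)
    assert bethe(x) >= -1e-12                        # non-negativity on the simplex
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    # midpoint concavity: g((x + y)/2) >= (g(x) + g(y))/2
    assert bethe(mid) >= (bethe(x) + bethe(y)) / 2 - 1e-12
```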



Fig. 2.7 The roots of f (black dots) and the roots of f′ (white dots)

2.1.4 Gauss–Lucas Theorem. Let f : C −→ C be a non-constant polynomial. The Gauss–Lucas Theorem states that the roots of f′(z) lie in the convex hull of the roots of f, see Fig. 2.7.
Indeed, without loss of generality, we assume that f is monic. Let γ_1, ..., γ_n be the roots of f, counted with multiplicities, so

$$f(z) = \prod_{k=1}^{n} (z - \gamma_k).$$

Let β be a root of f′, so

$$0 = f'(\beta) = \sum_{k=1}^{n} \prod_{m \ne k} (\beta - \gamma_m).$$

If β = γ_j for some j, the result follows instantly. Otherwise, multiplying the last equation by $\prod_{m=1}^{n} \overline{(\beta - \gamma_m)}$, we obtain

$$\sum_{k=1}^{n} \overline{(\beta - \gamma_k)} \prod_{m \ne k} |\beta - \gamma_m|^2 = 0.$$

Denoting

$$\alpha_k = \frac{\prod_{m \ne k} |\beta - \gamma_m|^2}{\sum_{j=1}^{n} \prod_{m \ne j} |\beta - \gamma_m|^2}$$

and taking the complex conjugate of the last equation, we write β as a convex combination of γ_1, ..., γ_n:

$$\beta = \sum_{k=1}^{n} \alpha_k \gamma_k \quad \text{where} \quad \sum_{k=1}^{n} \alpha_k = 1 \ \text{and} \ \alpha_k \ge 0 \ \text{for } k = 1, \dots, n.$$
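A small worked instance (the cubic is our own choice): for f(z) = (z − 1)(z + 1)(z − i), the two critical points are found by the quadratic formula and verified to lie in the triangle spanned by the roots.

```python
import cmath

# f(z) = (z - a)(z - b)(z - c) with roots at the vertices of a triangle
a, b, c = 1 + 0j, -1 + 0j, 1j
# f'(z) = 3z^2 - 2(a + b + c)z + (ab + ac + bc); solve by the quadratic formula
e1, e2 = a + b + c, a * b + a * c + b * c
disc = cmath.sqrt(4 * e1 * e1 - 12 * e2)
crit = [(2 * e1 + disc) / 6, (2 * e1 - disc) / 6]

def in_triangle(p, a, b, c, tol=1e-9):
    """Barycentric test: is p in the convex hull of {a, b, c}?"""
    det = (b.real - a.real) * (c.imag - a.imag) - (c.real - a.real) * (b.imag - a.imag)
    l2 = ((p.real - a.real) * (c.imag - a.imag) - (c.real - a.real) * (p.imag - a.imag)) / det
    l3 = ((b.real - a.real) * (p.imag - a.imag) - (p.real - a.real) * (b.imag - a.imag)) / det
    l1 = 1 - l2 - l3
    return min(l1, l2, l3) >= -tol

for z in crit:
    assert abs(3 * z * z - 2 * e1 * z + e2) < 1e-9   # indeed a root of f'
    assert in_triangle(z, a, b, c)                    # and it lies in conv{a, b, c}
```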

2.1.5 Capacity. Let

$$p(x_1, \dots, x_n) = \sum_{m \in M} a_m x^m \quad \text{where} \quad x^m = x_1^{\mu_1} \cdots x_n^{\mu_n} \ \text{for } m = (\mu_1, \dots, \mu_n) \tag{2.1.5.1}$$

be a polynomial with non-negative real coefficients a_m ≥ 0 for m ∈ M. Following Gurvits [Gu08, Gu15], given a non-negative integer vector r = (ρ_1, ..., ρ_n), we define the capacity of p by

$$\operatorname{cap}_r(p) = \inf_{x_1, \dots, x_n > 0} \frac{p(x_1, \dots, x_n)}{x_1^{\rho_1} \cdots x_n^{\rho_n}}.$$

As follows from Sect. 2.1.1.3, the substitution x_i = e^{t_i} for i = 1, ..., n expresses the capacity in terms of the infimum of a convex function on R^n:

$$\ln \operatorname{cap}_r(p) = \inf_{t_1, \dots, t_n} \Bigl( \ln p\bigl(e^{t_1}, \dots, e^{t_n}\bigr) - \rho_1 t_1 - \dots - \rho_n t_n \Bigr). \tag{2.1.5.2}$$

This makes the capacity efficiently computable, see, for example, [Ne04], provided the value of the polynomial p is efficiently computable for any given x_1, ..., x_n.
It follows from (2.1.5.2) that the function r ⟼ ln cap_r(p), being the point-wise minimum of a family of affine functions, is concave, meaning that if m_1, ..., m_k are non-negative integer vectors and

$$r = \sum_{i=1}^{k} \alpha_i m_i \quad \text{where} \quad \sum_{i=1}^{k} \alpha_i = 1 \ \text{and} \ \alpha_i \ge 0 \ \text{for } i = 1, \dots, k$$

is also a non-negative integer vector, then

$$\ln \operatorname{cap}_r p \ge \sum_{i=1}^{k} \alpha_i \ln \operatorname{cap}_{m_i} p.$$

We get an immediate upper bound on the coefficients of p in terms of the capacity:

$$a_m \le \operatorname{cap}_m(p) \quad \text{for all } m \in M.$$
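For a univariate polynomial the capacity can be estimated directly from (2.1.5.2) by minimizing the convex function over a grid of t; the sketch below (a crude grid search of ours, not the optimization method of [Ne04]) illustrates the bound a_m ≤ cap_m(p) on p(x) = (1 + x)²:

```python
import math

def cap_univariate(coeffs, rho, t_lo=-20.0, t_hi=20.0, steps=20001):
    """Approximate cap_rho(p) = inf_{x>0} p(x)/x^rho for a univariate p
    (coefficient list, lowest degree first) via the convex form (2.1.5.2):
    minimize ln p(e^t) - rho*t over a grid of t."""
    best = float("inf")
    for i in range(steps):
        t = t_lo + (t_hi - t_lo) * i / (steps - 1)
        x = math.exp(t)
        val = math.log(sum(c * x**k for k, c in enumerate(coeffs))) - rho * t
        best = min(best, val)
    return math.exp(best)

# p(x) = (1 + x)^2 = 1 + 2x + x^2:  cap_1(p) = inf (1 + x)^2/x = 4 at x = 1
cap1 = cap_univariate([1.0, 2.0, 1.0], 1)
assert abs(cap1 - 4.0) < 1e-3
# the coefficient bound a_m <= cap_m(p): here a_1 = 2 <= 4
assert 2.0 <= cap1 + 1e-9
```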

We obtain a complementary lower bound if we assume that the function m ⟼ ln a_m is (approximately) concave. More precisely, we prove the following statement:
Let 0 < β ≤ 1 be a real number and let r ∈ M be an index in (2.1.5.1) such that whenever

$$r = \sum_{i=1}^{k} \alpha_i m_i \quad \text{where} \quad m_i \in M, \ \alpha_i > 0 \ \text{for } i = 1, \dots, k \quad \text{and} \quad \sum_{i=1}^{k} \alpha_i = 1,$$

we have

$$a_r \ge \beta \prod_{i=1}^{k} a_{m_i}^{\alpha_i}.$$

Then

$$a_r \ge \frac{\beta}{|M|} \operatorname{cap}_r(p), \tag{2.1.5.3}$$

where |M| is the number of monomials in the expansion (2.1.5.1).
Without loss of generality, we assume that a_m > 0 for all m ∈ M. Let us consider a lifting

$$M \longrightarrow \mathbb{R}^{n+1}, \quad m \longmapsto (m, \ln a_m).$$

Let us choose an arbitrary γ > ln a_r − ln β and let us consider a closed ray

$$R = \{(r, \alpha) : \alpha \ge \gamma\}.$$

Then R does not intersect the convex hull of the points (m, ln a_m) for m ∈ M \ {r}. Therefore, there is a linear function separating R from the set of points (m, ln a_m) for m ∈ M \ {r}. Hence writing r = (ρ_1, ..., ρ_n), we conclude that there are real t_1, ..., t_n; t_{n+1} such that

$$t_{n+1} \alpha + \sum_{i=1}^{n} \rho_i t_i > t_{n+1} \ln a_m + \sum_{i=1}^{n} \mu_i t_i \quad \text{for all } m \in M \setminus \{r\}, \ m = (\mu_1, \dots, \mu_n), \ \text{and all } \alpha \ge \gamma.$$

Moreover, we can choose t_1, ..., t_n; t_{n+1} sufficiently generic, so that t_{n+1} ≠ 0, in which case we must necessarily have t_{n+1} > 0, and we can further scale to t_{n+1} = 1, see Fig. 2.8.
Hence we conclude that

$$\gamma + \sum_{i=1}^{n} \rho_i t_i > \ln a_m + \sum_{i=1}^{n} \mu_i t_i \quad \text{for all } m \in M \setminus \{r\}, \ m = (\mu_1, \dots, \mu_n).$$

Since γ > ln a_r − ln β was chosen arbitrarily, we conclude further that

$$\ln a_r + \sum_{i=1}^{n} \rho_i t_i \ge \ln a_m + \ln \beta + \sum_{i=1}^{n} \mu_i t_i \quad \text{for all } m \in M \setminus \{r\}, \ m = (\mu_1, \dots, \mu_n).$$

Letting x_i = e^{t_i} for i = 1, ..., n, we get

$$a_r x_1^{\rho_1} \cdots x_n^{\rho_n} \ge \beta\, a_m x_1^{\mu_1} \cdots x_n^{\mu_n} \quad \text{for all } m \in M \setminus \{r\}, \ m = (\mu_1, \dots, \mu_n),$$

from which

$$\beta\, p(x_1, \dots, x_n) \le |M|\, a_r x_1^{\rho_1} \cdots x_n^{\rho_n}$$

and (2.1.5.3) follows.



Fig. 2.8 An index m, its lifting, a ray R and an affine hyperplane separating R from the liftings of indices m ≠ r

2.1.6 Prékopa–Leindler inequality. We will need the following useful inequality. Let f, f_1, ..., f_k : R^n −→ R_+ be non-negative integrable functions and let α_1, ..., α_k ≥ 0 be reals such that α_1 + ... + α_k = 1. Suppose further that

$$f\left( \sum_{i=1}^{k} \alpha_i x_i \right) \ge \prod_{i=1}^{k} f_i^{\alpha_i}(x_i) \quad \text{for all } x_1, \dots, x_k \in \mathbb{R}^n.$$

Then

$$\int_{\mathbb{R}^n} f(x)\,dx \ge \prod_{i=1}^{k} \left( \int_{\mathbb{R}^n} f_i(x)\,dx \right)^{\alpha_i}.$$

We adapt the proof of Sect. 2.2 of [Le01].
We proceed by induction on the dimension n of the ambient space. The main work is done in dimension 1. For n = 1, by continuity we may assume that f_1, ..., f_k are strictly positive and continuous. Scaling, if necessary, we may assume further that

$$\int_{-\infty}^{+\infty} f_i(x)\,dx = 1 \quad \text{for } i = 1, \dots, k.$$

Let us define

$$F_i(t) = \int_{-\infty}^{t} f_i(x)\,dx \quad \text{for } i = 1, \dots, k.$$

Hence F_i(t) is an increasing function F_i : R −→ (0, 1), and we let u_i : (0, 1) −→ R be its inverse. Thus u_i(t) is also strictly increasing and F_i(u_i(t)) = t for i = 1, ..., k. We note that F_i and hence u_i are differentiable and that

$$f_i\bigl(u_i(t)\bigr)\, u_i'(t) = 1 \quad \text{for } i = 1, \dots, k. \tag{2.1.6.1}$$

Let us define

$$u(t) = \sum_{i=1}^{k} \alpha_i u_i(t) \quad \text{for } t \in (0, 1).$$

Making the substitution x = u(t), we get

$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{0}^{1} f\bigl(u(t)\bigr)\, u'(t)\,dt = \int_{0}^{1} f\left( \sum_{i=1}^{k} \alpha_i u_i(t) \right) \left( \sum_{i=1}^{k} \alpha_i u_i'(t) \right) dt.$$

By the condition of the theorem,

$$f\left( \sum_{i=1}^{k} \alpha_i u_i(t) \right) \ge \prod_{i=1}^{k} f_i^{\alpha_i}\bigl(u_i(t)\bigr),$$

while by the arithmetic-geometric mean inequality,

$$\sum_{i=1}^{k} \alpha_i u_i'(t) \ge \prod_{i=1}^{k} \bigl( u_i'(t) \bigr)^{\alpha_i}.$$

Summarizing,

$$\int_{-\infty}^{+\infty} f(x)\,dx \ge \int_{0}^{1} \prod_{i=1}^{k} \bigl( f_i(u_i(t))\, u_i'(t) \bigr)^{\alpha_i}\,dt = 1$$

by (2.1.6.1), and the proof for n = 1 follows.


Suppose that n > 1. We represent R^n = R^{n−1} ⊕ R, x = (y, t), and define

$$g(t) = \int_{\mathbb{R}^{n-1}} f(y, t)\,dy \quad \text{and} \quad g_i(t) = \int_{\mathbb{R}^{n-1}} f_i(y, t)\,dy \quad \text{for } i = 1, \dots, k.$$

Let us choose arbitrary real t_1, ..., t_k and let t = α_1 t_1 + ... + α_k t_k. We define functions h, h_1, ..., h_k : R^{n−1} −→ R by

$$h(y) = f(y, t) \quad \text{and} \quad h_i(y) = f_i(y, t_i) \quad \text{for } i = 1, \dots, k.$$

Then

$$h\left( \sum_{i=1}^{k} \alpha_i y_i \right) = f\left( \sum_{i=1}^{k} \alpha_i y_i,\ \sum_{i=1}^{k} \alpha_i t_i \right) \ge \prod_{i=1}^{k} f_i^{\alpha_i}(y_i, t_i) = \prod_{i=1}^{k} h_i^{\alpha_i}(y_i)$$

and hence by the induction hypothesis

$$g(t) = \int_{\mathbb{R}^{n-1}} h(y)\,dy \ge \prod_{i=1}^{k} \left( \int_{\mathbb{R}^{n-1}} h_i(y)\,dy \right)^{\alpha_i} = \prod_{i=1}^{k} g_i^{\alpha_i}(t_i).$$

Applying Fubini's Theorem and the inequality in the 1-dimensional case, we get

$$\int_{\mathbb{R}^n} f(x)\,dx = \int_{-\infty}^{+\infty} g(t)\,dt \ge \prod_{i=1}^{k} \left( \int_{-\infty}^{+\infty} g_i(t)\,dt \right)^{\alpha_i} = \prod_{i=1}^{k} \left( \int_{\mathbb{R}^n} f_i(x)\,dx \right)^{\alpha_i},$$

which completes the induction.

2.2 Polynomial Approximations

We start with a simple lemma.


2.2.1 Lemma. Let g(z) be a complex polynomial of degree d and let us suppose that

$$g(z) \ne 0 \quad \text{for all } |z| \le \beta,$$

where β > 1 is a real number. Let us choose a branch of

$$f(z) = \ln g(z) \quad \text{for } |z| \le 1$$

and consider its Taylor polynomial

$$p_n(z) = f(0) + \sum_{k=1}^{n} \left( \frac{d^k}{dz^k} f(z) \Big|_{z=0} \right) \frac{z^k}{k!}.$$

Then

$$|f(z) - p_n(z)| \le \frac{d}{(n+1)\beta^n(\beta - 1)} \quad \text{for all } |z| \le 1.$$

In particular, assuming that β > 1 is fixed in advance, to achieve

$$|f(1) - p_n(1)| < \epsilon$$

for some ε > 0, it suffices to choose

$$n = O\left( \ln \frac{d}{\epsilon} \right),$$

where the implicit constant in the "O" notation depends only on β.



Proof of Lemma 2.2.1. Let α_1, ..., α_d be the roots of g(z), so we can write

$$g(z) = g(0) \prod_{i=1}^{d} \left( 1 - \frac{z}{\alpha_i} \right) \quad \text{where } g(0) \ne 0 \ \text{and} \ |\alpha_i| > \beta \ \text{for } i = 1, \dots, d.$$

Hence

$$f(z) = \ln g(z) = f(0) + \sum_{i=1}^{d} \ln \left( 1 - \frac{z}{\alpha_i} \right) \quad \text{for } |z| \le 1,$$

and expanding the logarithm, we obtain

$$\ln \left( 1 - \frac{z}{\alpha_i} \right) = -\sum_{k=1}^{n} \frac{z^k}{k \alpha_i^k} + \xi_{i,n} \quad \text{for } |z| \le 1,$$

where

$$\left| \xi_{i,n} \right| = \left| \sum_{k=n+1}^{\infty} \frac{z^k}{k \alpha_i^k} \right| \le \frac{1}{(n+1)\beta^n(\beta - 1)} \quad \text{for all } |z| \le 1.$$

Therefore,

$$f(z) = f(0) - \sum_{i=1}^{d} \sum_{k=1}^{n} \frac{z^k}{k \alpha_i^k} + \eta_n \quad \text{for } |z| \le 1,$$

where

$$|\eta_n| \le \frac{d}{(n+1)\beta^n(\beta - 1)}.$$

To complete the proof, it suffices to notice that

$$\frac{1}{k!}\, \frac{d^k}{dz^k} f(z) \Big|_{z=0} = -\sum_{i=1}^{d} \frac{1}{k \alpha_i^k}.$$
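A small numerical illustration of the lemma (the roots below are our own choices): the Taylor coefficients of ln g at 0 are computed from the roots, and the approximation error at z = 1 stays within the stated bound.

```python
import cmath

# g(z) = prod (1 - z/alpha_i) with all |alpha_i| > beta > 1 and g(0) = 1
alphas = [3 + 0j, -2.5 + 1j, 2 + 2j, -4j]
beta = 2.0
assert all(abs(a) > beta for a in alphas)
d = len(alphas)

def taylor_ln_g(n):
    """p_n(1) for f(z) = ln g(z), using f^{(k)}(0)/k! = -sum_i 1/(k alpha_i^k)."""
    return sum(-sum(1 / (k * a**k) for a in alphas) for k in range(1, n + 1))

f1 = sum(cmath.log(1 - 1 / a) for a in alphas)      # f(1) = ln g(1), f(0) = 0
for n in range(1, 30):
    err = abs(f1 - taylor_ln_g(n))
    bound = d / ((n + 1) * beta**n * (beta - 1))    # Lemma 2.2.1
    assert err <= bound + 1e-12
```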


2.2.2 Computing derivatives of f(z) = ln g(z). Let f(z) = ln g(z) as in Lemma 2.2.1, where we assume that g(0) ≠ 0 and hence a branch of f(z) can be chosen in a sufficiently small neighborhood of z = 0. Then

$$f'(z) = \frac{g'(z)}{g(z)} \quad \text{and hence} \quad g'(z) = f'(z)\, g(z).$$

Differentiating the product k − 1 times, we obtain

$$\frac{d^k}{dz^k} g(z) \Big|_{z=0} = \sum_{j=0}^{k-1} \binom{k-1}{j} \left( \frac{d^{k-j}}{dz^{k-j}} f(z) \Big|_{z=0} \right) \left( \frac{d^j}{dz^j} g(z) \Big|_{z=0} \right). \tag{2.2.2.1}$$

Combining the equations (2.2.2.1) for k = 1, ..., n, we obtain a triangular system of linear equations in f^{(k)}(0):

$$\begin{aligned} g'(0) &= g(0) f'(0) \\ g''(0) &= g'(0) f'(0) + g(0) f''(0) \\ g^{(3)}(0) &= g''(0) f'(0) + 2 g'(0) f''(0) + g(0) f^{(3)}(0) \\ &\ \ \vdots \\ g^{(n)}(0) &= g^{(n-1)}(0) f'(0) + (n-1) g^{(n-2)}(0) f''(0) + \dots + g(0) f^{(n)}(0) \end{aligned}$$

with coefficients g(0) ≠ 0 on the diagonal, from which we can compute the derivatives f^{(k)}(0) for k = 1, ..., n from g(0) and g^{(k)}(0) for k = 1, ..., n in O(n²) time.
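The triangular solve can be sketched as follows (the function name is ours); it recovers the derivatives of ln g from those of g in O(n²) operations and is checked against g(z) = 1 + z:

```python
import math

def ln_derivatives(G, n):
    """Given G[k] = g^{(k)}(0) for k = 0..n with G[0] != 0, return F with
    F[k] = f^{(k)}(0) for f = ln g, k = 1..n, via the triangular
    system (2.2.2.1); O(n^2) arithmetic operations."""
    F = [0.0] * (n + 1)
    for k in range(1, n + 1):
        s = G[k]
        for j in range(1, k):
            s -= math.comb(k - 1, j) * F[k - j] * G[j]
        F[k] = s / G[0]
    return F

# check against g(z) = 1 + z, where ln g(z) = z - z^2/2 + z^3/3 - ...,
# so f^{(k)}(0) = (-1)^{k+1} (k - 1)!
n = 10
G = [1.0, 1.0] + [0.0] * (n - 1)            # derivatives of 1 + z at 0
F = ln_derivatives(G, n)
for k in range(1, n + 1):
    assert abs(F[k] - (-1) ** (k + 1) * math.factorial(k - 1)) < 1e-9
```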
Lemma 2.2.1 allows us to approximate ln g(1) by a small (logarithmic) degree Taylor polynomial of ln g(z) computed at z = 0, provided there are no roots of g(z) in the disc D_β = {z : |z| ≤ β} in the complex plane for some radius β > 1. We will need to construct a similar approximation under the weaker assumption that g(z) ≠ 0 in a thin strip aligned with the positive real axis, that is, g(z) ≠ 0 provided

$$-\delta \le \Re\, z \le 1 + \delta \quad \text{and} \quad |\Im\, z| \le \delta$$

for some δ > 0. We achieve this by constructing a polynomial φ = φ_δ : C −→ C such that

$$\phi(0) = 0, \quad \phi(1) = 1 \quad \text{and} \quad -\delta \le \Re\, \phi(z) \le 1 + \delta, \quad |\Im\, \phi(z)| \le \delta \quad \text{provided } |z| \le \beta$$

for some β = β(δ) > 1, see Fig. 2.9.
We then consider the composition h(z) = g(φ(z)). Hence h(z) is a polynomial of degree deg h = (deg g)(deg φ) that does not have zeros in the disc D_β and such that h(0) = g(0) and h(1) = g(1). Using Lemma 2.2.1, we approximate ln g(1) by the Taylor polynomial of ln h(z) of degree n = O(ln deg g + ln deg φ) computed at z = 0. As follows from Sect. 2.2.2, to compute the Taylor polynomial of degree n of ln h(z) at z = 0, it suffices to compute the Taylor polynomial of h(z) of degree n at z = 0. On the other hand, to compute the Taylor polynomial of h(z) of degree n at z = 0 it suffices to compute the Taylor polynomial p_n of degree n of g(z) at 0, compute the truncation φ_n of φ by discarding all monomials of degree higher than n, and then compute the composition p_n(φ_n(z)) and discard all monomials of degree higher than n (recall that φ(0) = 0, so that the smallest degree of a monomial in φ(z) is 1).

Fig. 2.9 A polynomial φ mapping the disc of radius β > 1 into a neighborhood of [0, 1] ⊂ C while mapping 0 to 0 and 1 to 1
The following lemma provides an explicit construction of φ.

2.2.3 Lemma. For 0 < ρ < 1, let us define

$$\alpha = \alpha(\rho) = 1 - e^{-\frac{1}{\rho}}, \qquad \beta = \beta(\rho) = \frac{1 - e^{-1 - \frac{1}{\rho}}}{1 - e^{-\frac{1}{\rho}}} > 1,$$

$$N = N(\rho) = \left\lfloor \left( 1 + \frac{1}{\rho} \right) e^{1 + \frac{1}{\rho}} \right\rfloor \ge 14, \qquad \sigma = \sigma(\rho) = \sum_{m=1}^{N} \frac{\alpha^m}{m} \qquad \text{and}$$

$$\phi(z) = \phi_\rho(z) = \frac{1}{\sigma} \sum_{m=1}^{N} \frac{(\alpha z)^m}{m}.$$

Then φ(z) is a polynomial of degree N such that φ(0) = 0, φ(1) = 1,

$$-\rho \le \Re\, \phi(z) \le 1 + 2\rho \quad \text{and} \quad |\Im\, \phi(z)| \le 2\rho \quad \text{provided } |z| \le \beta.$$

Proof. Clearly, φ(z) is a polynomial of degree N such that φ(0) = 0 and φ(1) = 1. It remains to prove that φ maps the disc |z| ≤ β into the strip −ρ ≤ ℜz ≤ 1 + 2ρ, |ℑz| ≤ 2ρ.
We consider the function

$$F_\rho(z) = \rho \ln \frac{1}{1 - z} \quad \text{for } |z| < 1.$$

Since

$$\Re\, \frac{1}{1 - z} > 0 \quad \text{if } |z| < 1,$$

the function F_ρ(z) is well-defined by the choice of a branch of the logarithm, which we choose so that

$$F_\rho(0) = \rho \ln 1 = 0.$$

Then for |z| < 1 we have

$$\left| \Im\, F_\rho(z) \right| \le \frac{\pi \rho}{2} \quad \text{and} \quad \Re\, F_\rho(z) \ge -\rho \ln 2. \tag{2.2.3.1}$$

In addition,

$$F_\rho(\alpha) = 1 \quad \text{and} \quad \Re\, F_\rho(z) \le 1 + \rho \quad \text{provided } |z| \le 1 - e^{-1 - \frac{1}{\rho}}. \tag{2.2.3.2}$$

Let

$$P_n(z) = \sum_{m=1}^{n} \frac{z^m}{m}.$$

Then

$$\left| \ln \frac{1}{1 - z} - P_n(z) \right| = \left| \sum_{m=n+1}^{\infty} \frac{z^m}{m} \right| \le \frac{|z|^{n+1}}{(n+1)(1 - |z|)} \quad \text{provided } |z| < 1.$$

Therefore, for |z| ≤ β, we have

$$\left| F_\rho(\alpha z) - \rho P_N(\alpha z) \right| \le \rho\, \frac{(\alpha \beta)^{N+1}}{(N+1)(1 - \alpha \beta)} \le \frac{\rho}{N+1}\, e^{1 + \frac{1}{\rho}} \left( 1 - e^{-1 - \frac{1}{\rho}} \right)^{N+1} \le \frac{\rho}{N+1} \le \frac{\rho}{15}. \tag{2.2.3.3}$$

Combining (2.2.3.1)–(2.2.3.3), we conclude that for |z| ≤ β we have

$$\left| \Im\, \rho P_N(\alpha z) \right| \le 1.64 \rho \quad \text{and} \quad -0.76 \rho \le \Re\, \rho P_N(\alpha z) \le 1 + 1.07 \rho. \tag{2.2.3.4}$$

Substituting z = 1 in (2.2.3.3) and using (2.2.3.2), we conclude that

$$\left| 1 - \rho P_N(\alpha) \right| \le \frac{\rho}{15}. \tag{2.2.3.5}$$

Since

$$\phi(z) = \frac{P_N(\alpha z)}{P_N(\alpha)} = \frac{\rho P_N(\alpha z)}{\rho P_N(\alpha)},$$

combining (2.2.3.4) and (2.2.3.5) and noting that ρP_N(α) is real, we obtain

$$|\Im\, \phi(z)| \le 2\rho \quad \text{and} \quad -\rho \le \Re\, \phi(z) \le 1 + 2\rho \quad \text{provided } |z| \le \beta.$$


The construction of Lemma 2.2.3 suggests a general principle:
Suppose we have a polynomial g(z) of degree n such that the k-th derivative g^{(k)}(0) can be computed in n^{O(k)} time. We want to approximate g(1). If we can find a sufficiently wide "sleeve" containing 0 and 1 and avoiding the roots of g(z), such as the one on Fig. 2.10a, we can approximate g(1) within relative error 0 < ε < 1 in n^{O(ln n − ln ε)} time. For that, we construct a polynomial φ(z) such that φ(0) = 0, φ(1) = 1 and φ maps the disc {z : |z| ≤ β} for some sufficiently large β > 1 into the sleeve where g(z) ≠ 0. We then apply Lemma 2.2.1 to g(φ(z)). If the zeros of g surround 0 as on Fig. 2.10b, the sleeve connecting 0 and 1 and avoiding the roots of g(z) will have to be too thin, making the radius β of the disc too close to 1 and hence making any computational gain impossible.

Fig. 2.10 a There is a sufficiently wide sleeve connecting 0 and 1 and avoiding the zeros of g; b the zeros of g surround 0, precluding the existence of a wide sleeve connecting 0 and 1

2.3 Polynomials with Real Roots

We start with a definition.


2.3.1 Definition. Let f be a real polynomial of degree n with n distinct real roots α_1 < ... < α_n. We say that a real polynomial g of degree n − 1 interlaces f if g has n − 1 real roots β_1 < ... < β_{n−1} such that

$$\alpha_1 < \beta_1 < \alpha_2 < \beta_2 < \alpha_3 < \dots < \alpha_{n-1} < \beta_{n-1} < \alpha_n,$$

see Fig. 2.11.
For example, if the roots of f are all real and distinct then f′ interlaces f.

2.3.2 Theorem.
(1) Let f and g_1, ..., g_m be real polynomials such that g_k interlaces f for k = 1, ..., m. Suppose further that the highest degree terms of g_1, ..., g_m have the same sign. Let λ_1, ..., λ_m be non-negative reals, not all 0, and let

$$g = \sum_{k=1}^{m} \lambda_k g_k.$$

Then the polynomial g interlaces f;
(2) Let f and g be real polynomials such that g interlaces f and suppose that the highest terms of f and g have the same sign. Then for any λ ∈ R the polynomial f interlaces the polynomial h(x) = (x − λ) f(x) − g(x).

Fig. 2.11 A polynomial g interlacing a polynomial f

Proof. Let α_1 < ... < α_n be the roots of f, so deg f = n.
To prove Part (1), we note that since each g_k interlaces f, it changes its sign exactly once inside every interval [α_i, α_{i+1}] for i = 1, ..., n − 1, see Fig. 2.11. Since the coefficients of degree n − 1 of all the polynomials g_k have the same sign, inside each interval [α_i, α_{i+1}] all the polynomials g_k change sign in the same way (that is, all are positive at α_i and negative at α_{i+1}, or all are negative at α_i and positive at α_{i+1}). It follows that g changes its sign inside each interval [α_i, α_{i+1}] and hence interlaces f.
To prove Part (2), without loss of generality we assume that the highest terms of f and g are positive. Since g interlaces f, the polynomial g changes its sign inside each interval [α_i, α_{i+1}] for i = 1, ..., n − 1, and since h(α_i) = −g(α_i) for all i, the polynomial h also changes its sign inside each interval. Thus each interval (α_i, α_{i+1}) contains at least one root of h, which accounts for the total of n − 1 roots.
Let β_{n−1} ∈ (α_{n−1}, α_n) be the largest root of g. Since g(x) does not change its sign for all x > β_{n−1}, we must have g(α_n) > 0 and hence h(α_n) < 0. On the other hand, since the highest term of h(x) is positive, we must have h(x) > 0 for all sufficiently large x and hence there is a root, say γ_{n+1}, of h(x) satisfying γ_{n+1} > α_n.
Similarly, let β_1 ∈ (α_1, α_2) be the smallest root of g. Since g(x) does not change its sign for all x < β_1, we must have g(α_1) > 0 if n is odd (and hence deg g = n − 1 is even) and g(α_1) < 0 if n is even (and hence deg g = n − 1 is odd). Therefore, h(α_1) < 0 if n is odd and h(α_1) > 0 if n is even. On the other hand, since the highest term of h(x) is positive, for all sufficiently small x we must have h(x) > 0 if n is odd (and hence deg h = n + 1 is even) and h(x) < 0 if n is even (and hence deg h = n + 1 is odd). This proves that there is a root, say γ_1, of h(x) satisfying γ_1 < α_1. Since the total number of roots of h cannot exceed n + 1, we conclude that every interval (α_i, α_{i+1}) for i = 1, ..., n − 1 contains exactly one root, say γ_{i+1}, of h, and hence f interlaces h. □
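A concrete instance of the definition and of Part (2) (our own example, with λ = 0 and g = f′; the helper h is ours):

```python
import math

# f(x) = (x - 1)(x - 2)(x - 4) = x^3 - 7x^2 + 14x - 8 has distinct real roots,
# so f' interlaces f
roots_f = [1.0, 2.0, 4.0]
# f'(x) = 3x^2 - 14x + 14, roots by the quadratic formula
disc = math.sqrt(14 * 14 - 4 * 3 * 14)
roots_fp = sorted([(14 - disc) / 6, (14 + disc) / 6])
# interlacing pattern: alpha_1 < beta_1 < alpha_2 < beta_2 < alpha_3
assert roots_f[0] < roots_fp[0] < roots_f[1] < roots_fp[1] < roots_f[2]

def h(x, lam=0.0):
    """h(x) = (x - lam) f(x) - f'(x), as in Theorem 2.3.2(2)."""
    f = (x - 1) * (x - 2) * (x - 4)
    fp = 3 * x * x - 14 * x + 14
    return (x - lam) * f - fp

# f interlaces h: h changes sign on (-inf, a1), (a1, a2), (a2, a3), (a3, +inf)
samples = [-100.0] + roots_f + [100.0]
signs = [math.copysign(1.0, h(x)) for x in samples]
assert all(signs[i] != signs[i + 1] for i in range(len(signs) - 1))
```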

The coefficients of a polynomial with real roots satisfy some interesting inequalities.

2.3.3 Theorem. Suppose that the roots of a real polynomial

$$p(x) = \sum_{j=0}^{n} a_j x^j$$

are real. Then

$$a_j^2 \ge a_{j-1} a_{j+1} \left( 1 + \frac{1}{j} \right) \left( 1 + \frac{1}{n - j} \right) \quad \text{for } j = 1, \dots, n-1.$$

Equivalently, for

$$b_j = \frac{a_j}{\binom{n}{j}},$$

we have

$$b_j^2 \ge b_{j-1} b_{j+1} \quad \text{for } j = 1, \dots, n-1.$$

Proof. Repeatedly applying Rolle's Theorem, we conclude that the roots of the polynomial

$$q(x) = \frac{d^{j-1}}{dx^{j-1}}\, p(x) = \sum_{k=j-1}^{n} \frac{k!}{(k-j+1)!}\, a_k x^{k-j+1}$$

are also real. Hence the roots of the polynomial

$$r(x) = x^{n-j+1}\, q\left( \frac{1}{x} \right) = \sum_{k=j-1}^{n} \frac{k!}{(k-j+1)!}\, a_k x^{n-k}$$

are also real. Applying Rolle's Theorem again, we conclude that the roots of the quadratic polynomial

$$s(x) = \frac{d^{n-j-1}}{dx^{n-j-1}}\, r(x) = \frac{(n-j+1)!\,(j-1)!\, a_{j-1}}{2}\, x^2 + j!\,(n-j)!\, a_j\, x + \frac{(j+1)!\,(n-j-1)!\, a_{j+1}}{2}$$

are real. Therefore,

$$\bigl( j!\,(n-j)!\, a_j \bigr)^2 \ge (n-j+1)!\,(n-j-1)!\,(j-1)!\,(j+1)!\, a_{j-1} a_{j+1}$$

and the proof follows. □

When the coefficients a_j are non-negative, we conclude that

$$a_j^2 \ge a_{j-1} a_{j+1} \quad \text{for } j = 1, \dots, n-1,$$

which means that the sequence a_0, a_1, ..., a_n is log-concave (that is, the sequence c_j = ln a_j is concave), see [St89].
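The normalized form b_j² ≥ b_{j−1}b_{j+1} can be spot-checked numerically on random real-rooted polynomials (helper name ours):

```python
import math
import random

def coeffs_from_roots(roots):
    """Coefficients a_0, ..., a_n of prod_i (x - r_i), lowest degree first."""
    a = [1.0]
    for r in roots:
        # multiply the current polynomial by (x - r)
        a = [-r * a[0]] + [a[i - 1] - r * a[i] for i in range(1, len(a))] + [a[-1]]
    return a

random.seed(3)
for _ in range(200):
    n = random.randint(3, 8)
    a = coeffs_from_roots([random.uniform(-5, 5) for _ in range(n)])
    b = [a[j] / math.comb(n, j) for j in range(n + 1)]
    for j in range(1, n):
        # b_j^2 >= b_{j-1} b_{j+1}, up to floating-point slack
        slack = 1e-9 * (1 + b[j] ** 2 + abs(b[j - 1] * b[j + 1]))
        assert b[j] ** 2 >= b[j - 1] * b[j + 1] - slack
```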

2.3.4 Estimating the largest absolute value of a root of a polynomial. Let f(t) be a monic polynomial with real roots a_1, ..., a_n, so

$$f(t) = \sum_{i=0}^{n} b_i t^{n-i} = \prod_{i=1}^{n} (t - a_i),$$

where b_0 = 1. Let

$$p_k = \sum_{i=1}^{n} a_i^k \quad \text{for } k = 1, 2, \dots$$

be the power sums of the roots. Knowing the k + 1 highest coefficients b_1, ..., b_{k+1} of f allows us to compute p_1, ..., p_k using Newton's identities:

$$p_1 = -b_1, \quad p_2 = -b_1 p_1 - 2 b_2, \quad p_3 = -b_1 p_2 - b_2 p_1 - 3 b_3$$

and, more generally,

$$p_k = -k b_k - \sum_{i=1}^{k-1} b_i p_{k-i}.$$

On the other hand, since the a_i are real, we have

$$\frac{1}{n}\, p_{2k} \le \max_{i=1,\dots,n} a_i^{2k} \le p_{2k}.$$

In particular, by choosing k = O(ln(n/ε)), we can approximate $\max_{i=1,\dots,n} |a_i|$ within a relative error ε by $(p_{2k})^{1/2k}$.
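The recursion and the resulting estimate can be sketched as follows (for the cubic f(t) = (t − 1)(t − 2)(t + 3), a choice of ours; function name ours too):

```python
def power_sums(b, k_max):
    """Newton's identities: from monic coefficients b_0 = 1, b_1, ...
    (padded with zeros beyond the degree), compute power sums p_1..p_k_max."""
    p = [0.0] * (k_max + 1)
    for k in range(1, k_max + 1):
        p[k] = -k * b[k] - sum(b[i] * p[k - i] for i in range(1, k))
    return p

# f(t) = (t - 1)(t - 2)(t + 3) = t^3 - 7t + 6
b = [1.0, 0.0, -7.0, 6.0, 0.0, 0.0, 0.0, 0.0, 0.0]
roots = [1.0, 2.0, -3.0]
k = 4                                          # use p_{2k} with 2k = 8
p = power_sums(b, 2 * k)
estimate = p[2 * k] ** (1.0 / (2 * k))
true_max = max(abs(r) for r in roots)
# the sandwich (p_{2k}/n)^{1/2k} <= max|a_i| <= p_{2k}^{1/2k} gives
assert true_max <= estimate <= 3 ** (1.0 / (2 * k)) * true_max
```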

2.4 H-Stable Polynomials

2.4.1 Definition. Let f(z_1, ..., z_n) be a complex polynomial. Given a set U ⊂ C, we say that f is U-stable provided

$$f(z_1, \dots, z_n) \ne 0 \quad \text{whenever} \quad z_1, \dots, z_n \in U.$$

If

$$U = \{ z : \Im\, z > 0 \}$$

is the open upper half-plane, we call f H-stable. In other words, f is H-stable if

$$f(z_1, \dots, z_n) \ne 0 \quad \text{whenever} \quad \Im\, z_1 > 0, \dots, \Im\, z_n > 0.$$

We note that if f(z) is an H-stable univariate polynomial with real coefficients then all roots of f are necessarily real, since the complex roots of f come in pairs of complex conjugates.

The following lemma summarizes some properties of H-stable polynomials that are of critical importance for us. We follow [Wa11].

2.4.2 Lemma.
(1) Let f_m : m = 1, 2, ... be a sequence of polynomials in n complex variables and let f be a polynomial such that

$$f_m \longrightarrow f$$

uniformly on compact subsets of C^n. If all f_m are H-stable then either f is H-stable or f is identically 0;
(2) Let f(z_1, ..., z_n) be an H-stable polynomial where n > 1. Then the polynomial

$$g(z_1, \dots, z_{n-1}) = f(z_1, \dots, z_{n-1}, 0)$$

is either H-stable or identically 0;
(3) Let f(z_1, ..., z_n) be an H-stable polynomial and let us define

$$g(z_1, \dots, z_n) = \frac{\partial}{\partial z_n}\, f(z_1, \dots, z_n).$$

Then either g is H-stable or g is identically 0.

Proof. Part (1) follows by the (multivariate) Hurwitz Theorem, which asserts that if Ω ⊂ C^n is a connected open set, the functions f_m are analytic on Ω and have no zeros in Ω, and f_m −→ f uniformly on compact subsets of Ω, then f either has no zeros in Ω or is identically zero in Ω (the multivariate Hurwitz Theorem immediately follows from the more standard univariate version by restricting the functions f_m and f onto a complex line in C^n identified with C), see, for example, [Kr01].
To prove Part (2), we define a sequence of polynomials

$$g_m(z_1, \dots, z_{n-1}) = f\bigl( z_1, \dots, z_{n-1}, i m^{-1} \bigr).$$

Then g_m are H-stable for all positive integer m and g_m −→ g uniformly on compact subsets of C^{n−1}. The proof now follows by Part (1).
To prove Part (3), without loss of generality we may assume that the degree of f in z_n is d ≥ 1, so we can write

$$f(z_1, \dots, z_n) = \sum_{k=0}^{d} z_n^k\, h_k(z_1, \dots, z_{n-1}), \tag{2.4.2.1}$$

where h_k(z_1, ..., z_{n−1}) are polynomials for k = 0, 1, ..., d and h_d ≢ 0. Let us consider a sequence of polynomials

$$f_m(z_1, \dots, z_n) = m^{-d} f(z_1, \dots, z_{n-1}, m z_n) \quad \text{for } m = 1, 2, \dots.$$

Then the polynomials f_m are H-stable and f_m −→ z_n^d h_d(z_1, ..., z_{n−1}) uniformly on compact subsets of C^n. By Part (1), the polynomial z_n^d h_d(z_1, ..., z_{n−1}) is H-stable and hence the polynomial h_d(z_1, ..., z_{n−1}) is H-stable. Hence

$$h_d(z_1, \dots, z_{n-1}) \ne 0 \quad \text{provided} \quad \Im\, z_1 > 0, \dots, \Im\, z_{n-1} > 0. \tag{2.4.2.2}$$

Let us fix some z_1, ..., z_{n−1} such that ℑz_1 > 0, ..., ℑz_{n−1} > 0 and consider the univariate polynomial

$$p(z) = f(z_1, \dots, z_{n-1}, z) \quad \text{for } z \in \mathbb{C}.$$

By (2.4.2.1) and (2.4.2.2), we have deg p = d. Since f is H-stable, all the d roots (counting multiplicity) z of p satisfy ℑz ≤ 0. By the Gauss–Lucas Theorem, see Sect. 2.1.4, the roots of p′ lie in the convex hull of the set of roots of p. In particular, p′(z) ≠ 0 if ℑz > 0, that is,

$$g(z_1, \dots, z_{n-1}, z) \ne 0 \quad \text{provided} \quad \Im\, z_1 > 0, \dots, \Im\, z_{n-1} > 0, \ \Im\, z > 0,$$

and g is H-stable. □

Our goal is to prove the following result of Gurvits [Gu08], which bounds the coefficients of an H-stable polynomial p with non-negative real coefficients in terms of its capacity, see Sect. 2.1.5.

2.4.3 Theorem. Let p(x_1, ..., x_n) be an H-stable polynomial with non-negative real coefficients and such that deg p ≤ n. Let us define polynomials p_n, p_{n−1}, ..., p_0 by

$$p_n = p \quad \text{and} \quad p_k = \frac{\partial}{\partial x_{k+1}}\, p_{k+1} \Big|_{x_{k+1} = 0} \quad \text{for } k = n-1, \dots, 0,$$

so that p_k is a polynomial in x_1, ..., x_k and deg p_k ≤ k.
Suppose further that the degree of x_k in p_k does not exceed an integer d_k for k = n, ..., 1.
Then

$$p_0 = \frac{\partial^n}{\partial x_1 \cdots \partial x_n}\, p \ \ge\ \left( \prod_{k=1}^{n} \left( \frac{d_k - 1}{d_k} \right)^{d_k - 1} \right) \inf_{x_1, \dots, x_n > 0} \frac{p(x_1, \dots, x_n)}{x_1 \cdots x_n},$$

where we agree that

$$\left( \frac{d_k - 1}{d_k} \right)^{d_k - 1} = 1 \quad \text{if } d_k = 0 \text{ or } d_k = 1.$$

The proof of Theorem 2.4.3 proceeds by induction on the number n of variables, with the following lemma playing the crucial role.

2.4.4 Lemma. Let R(t) be a univariate polynomial with non-negative real coefficients and real roots. Suppose that deg R ≤ d for some non-negative integer d. Then

$$R'(0) \ge \left( \frac{d-1}{d} \right)^{d-1} \inf_{t > 0} \frac{R(t)}{t} \quad \text{if } d > 1$$

and

$$R'(0) = \inf_{t > 0} \frac{R(t)}{t} \quad \text{if } d = 1.$$

Proof. We note that

$$h(x) = \left( \frac{x-1}{x} \right)^{x-1}$$

is a decreasing function of x > 1. Indeed, for

$$f(x) = \ln h(x) = (x-1)\ln(x-1) - (x-1)\ln x$$

we have

$$f'(x) = \ln \frac{x-1}{x} + \frac{1}{x} < 0 \quad \text{for } x > 1.$$

Therefore, without loss of generality, we may assume that deg R = d.
If d ≤ 1 then R(t) = r_0 + r_1 t for some r_0, r_1 ≥ 0, so that

$$\inf_{t > 0} \frac{R(t)}{t} = r_1 = R'(0), \tag{2.4.4.1}$$

where the infimum is attained as t −→ +∞.
Suppose that d ≥ 2. If R(0) = 0 then R(t) = r_1 t + ... + r_d t^d for some r_1, ..., r_d ≥ 0 and we still have (2.4.4.1), where the infimum is attained as t −→ 0+.
Hence we may assume that R(0) > 0, in which case, scaling R if necessary, we may additionally assume that R(0) = 1. Then we can write

$$R(t) = \prod_{i=1}^{d} \left( 1 - \frac{t}{\alpha_i} \right),$$

where α_1, ..., α_d are the roots of R. Since the coefficients of R are non-negative and the roots α_1, ..., α_d are real, we necessarily have α_1 < 0, ..., α_d < 0. Denoting a_i = −α_i^{−1}, we obtain

$$R(t) = \prod_{i=1}^{d} (1 + a_i t) \quad \text{where } a_1, \dots, a_d > 0.$$

Then

$$R'(0) = a_1 + \dots + a_d > 0$$

and applying the arithmetic-geometric mean inequality, see Sect. 2.1.1.1, we get

$$R(t) \le \left( \frac{1}{d} \sum_{i=1}^{d} (1 + a_i t) \right)^{d} = \left( 1 + \frac{R'(0)}{d}\, t \right)^{d} \quad \text{for } t \ge 0,$$

so that

$$\inf_{t > 0} \frac{R(t)}{t} \le \inf_{t > 0} g(t) \quad \text{where} \quad g(t) = t^{-1} \left( 1 + \frac{R'(0)}{d}\, t \right)^{d}.$$

Since d ≥ 2 we have g(t) −→ +∞ as t −→ +∞ and hence the infimum of g(t) is attained at a critical point t. Solving the equation g'(t) = 0, we get

$$t = \frac{d}{(d-1)\, R'(0)}$$

and

$$\inf_{t > 0} \frac{R(t)}{t} \le R'(0) \left( \frac{d}{d-1} \right)^{d-1}$$

as desired. □
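Lemma 2.4.4 can be spot-checked numerically on random polynomials R(t) = ∏(1 + a_i t) with a crude grid search for the infimum (helper names ours; the inequality is tight when all a_i coincide, hence the small relative slack):

```python
import random

def R_over_t_inf(a, lo=1e-3, ratio=1.01, steps=2000):
    """Crude numerical inf of R(t)/t over t > 0 for R(t) = prod(1 + a_i t)."""
    best = float("inf")
    t = lo
    for _ in range(steps):
        v = 1.0
        for ai in a:
            v *= 1 + ai * t
        best = min(best, v / t)
        t *= ratio
    return best

random.seed(4)
for _ in range(100):
    d = random.randint(2, 6)
    a = [random.uniform(0.1, 3.0) for _ in range(d)]
    Rp0 = sum(a)                                  # R'(0) for R(t) = prod(1 + a_i t)
    bound = ((d - 1) / d) ** (d - 1) * R_over_t_inf(a)
    # Lemma 2.4.4: R'(0) >= ((d-1)/d)^(d-1) * inf_{t>0} R(t)/t
    assert Rp0 >= bound * (1 - 1e-3)
```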

2.4.5 Proof of Theorem 2.4.3. From Parts (3) and (2) of Lemma 2.4.2, each polynomial p_k is either H-stable or identically 0. We claim that

$$p_{k-1}(x_1, \dots, x_{k-1}) \ge \left( \frac{d_k - 1}{d_k} \right)^{d_k - 1} \inf_{x_k > 0} \frac{p_k(x_1, \dots, x_k)}{x_k} \quad \text{for all } x_1, \dots, x_{k-1} > 0 \tag{2.4.5.1}$$

and k = n, n−1, ..., 1, with the standard agreement that

$$\left( \frac{d_k - 1}{d_k} \right)^{d_k - 1} = 1 \quad \text{if } d_k = 1 \text{ or } d_k = 0.$$

If p_k is identically 0 then p_{k−1} is identically 0 and (2.4.5.1) holds. Hence we assume that p_k is H-stable.
If k = 1 then p_1(x_1) = a x_1 + b for some a, b ≥ 0, so that

$$p_0 = a = \inf_{x_1 > 0} \frac{p_1(x_1)}{x_1}.$$

If k ≥ 2, for any fixed x_1 > 0, ..., x_{k−1} > 0, we define a univariate polynomial

$$R(t) = R_{x_1, \dots, x_{k-1}}(t) = p_k(x_1, \dots, x_{k-1}, t).$$

We claim that all the roots of R are necessarily real. Indeed, R has real coefficients, and if it had a pair of complex conjugate roots α ± βi for some β > 0 then for all sufficiently small ε > 0 the univariate polynomial

$$\tilde{R}(t) = p_k(x_1 + i\varepsilon, \dots, x_{k-1} + i\varepsilon, t)$$

would have had a root α̃ + iβ̃ for some β̃ > 0, which would have contradicted the H-stability of p_k. Applying Lemma 2.4.4, we obtain

$$R'(0) \ge \left( \frac{d_k - 1}{d_k} \right)^{d_k - 1} \inf_{t > 0} \frac{R(t)}{t},$$

which proves (2.4.5.1) and hence completes the proof of the theorem. □

2.4.6 Corollary. Let p be a polynomial as in Theorem 2.4.3. Then
\[
\frac{\partial^n p}{\partial x_1 \cdots \partial x_n} \ \ge\ \frac{n!}{n^n} \inf_{x_1, \ldots, x_n > 0} \frac{p\left(x_1, \ldots, x_n\right)}{x_1 \cdots x_n}.
\]

Proof. In Theorem 2.4.3, we can choose d_k = k for k = n, n−1, …, 1. Then
\[
\prod_{k=2}^{n} \left(\frac{k-1}{k}\right)^{k-1} = \frac{(n-1)(n-2) \cdots 1}{n^{n-1}} = \frac{n!}{n^n}. \qquad \square
\]
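The product identity above is easy to verify exactly; the following snippet (ours, purely illustrative) checks it in rational arithmetic for small n.

```python
# Exact check: prod_{k=2}^{n} ((k-1)/k)**(k-1) == n! / n**n.
from fractions import Fraction
from math import factorial

def lhs(n):
    p = Fraction(1)
    for k in range(2, n + 1):
        p *= Fraction(k - 1, k) ** (k - 1)
    return p

for n in range(2, 9):
    assert lhs(n) == Fraction(factorial(n), n ** n)
```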

In [Gu15], Gurvits noticed that Theorem 2.4.3 leads to a bound on an arbitrary
coefficient of a homogeneous H-stable polynomial in terms of the capacity, see
Sect. 2.1.5.

2.4.7 Theorem. Let
\[
p\left(x_1, \ldots, x_n\right) = \sum_{m \in M} a_m x^m
\]
be an H-stable homogeneous polynomial with non-negative coefficients. Suppose
further that the degree of p in x_k does not exceed d_k for k = 1, …, n.
Then for a non-negative integer vector r = (ρ_1, …, ρ_n) such that ρ_1 + … + ρ_n =
deg p and ρ_k ≤ d_k for k = 1, …, n, we have
\[
a_r \ \ge\ \left( \prod_{k=1}^{n} \frac{\rho_k^{\rho_k}}{\rho_k!} \cdot \frac{\left(d_k - \rho_k\right)^{d_k - \rho_k}\, d_k!}{\left(d_k - \rho_k\right)!\, d_k^{d_k}} \right) \operatorname{cap}_r p.
\]

Proof. Without loss of generality we assume that ρ_1, …, ρ_n > 0, since otherwise we
consider the polynomial p̂ obtained from p by fixing x_i = 0 whenever ρ_i = 0. By
Part (2) of Lemma 2.4.2, the polynomial p̂ is either H-stable or identically zero (in
which case the statement of the theorem trivially holds true). We define a polynomial
q in d = deg p variables y_{11}, …, y_{1ρ_1}, …, y_{n1}, …, y_{nρ_n} by
\[
q\left(\ldots, y_{k1}, \ldots, y_{k\rho_k}, \ldots\right) = p\left(\ldots, \frac{y_{k1} + \ldots + y_{k\rho_k}}{\rho_k}, \ldots\right).
\]
It is easy to see that q is an H-stable polynomial of degree d and that
\[
a_r = \frac{\partial^d q}{\partial y_{11} \cdots \partial y_{1\rho_1} \cdots \partial y_{n\rho_n}} \prod_{k=1}^{n} \frac{\rho_k^{\rho_k}}{\rho_k!}.
\tag{2.4.7.1}
\]

The degree of q in every variable y_{kj} does not exceed d_k, while the degree of the
polynomial
\[
\left. \frac{\partial^j q}{\partial y_{k1} \cdots \partial y_{kj}} \right|_{y_{k1} = \ldots = y_{kj} = 0}
\]
in y_{k(j+1)} does not exceed d_k − j for j = 1, …, ρ_k. Therefore, by Theorem 2.4.3,
\[
\frac{\partial^d q}{\partial y_{11} \cdots \partial y_{1\rho_1} \cdots \partial y_{n\rho_n}}
\ \ge\ \prod_{k=1}^{n} \prod_{j=1}^{\rho_k} \left(\frac{d_k - j}{d_k - j + 1}\right)^{d_k - j}
\times \inf_{\substack{y_{11}, \ldots, y_{1\rho_1} > 0 \\ \cdots \\ y_{n1}, \ldots, y_{n\rho_n} > 0}}
\frac{q\left(y_{11}, \ldots, y_{1\rho_1}, \ldots, y_{n1}, \ldots, y_{n\rho_n}\right)}{y_{11} \cdots y_{1\rho_1} \cdots y_{n1} \cdots y_{n\rho_n}}.
\tag{2.4.7.2}
\]
We further simplify
\[
\prod_{k=1}^{n} \prod_{j=1}^{\rho_k} \left(\frac{d_k - j}{d_k - j + 1}\right)^{d_k - j}
= \prod_{k=1}^{n} \frac{d_k!\, \left(d_k - \rho_k\right)^{d_k - \rho_k}}{\left(d_k - \rho_k\right)!\, d_k^{d_k}}.
\tag{2.4.7.3}
\]

Finally, we claim that
\[
\inf_{\substack{y_{11}, \ldots, y_{1\rho_1} > 0 \\ \cdots \\ y_{n1}, \ldots, y_{n\rho_n} > 0}}
\frac{q\left(y_{11}, \ldots, y_{1\rho_1}, \ldots, y_{n1}, \ldots, y_{n\rho_n}\right)}{y_{11} \cdots y_{1\rho_1} \cdots y_{n1} \cdots y_{n\rho_n}}
\ \ge\ \operatorname{cap}_r p \ =\ \inf_{x_1, \ldots, x_n > 0} \frac{p\left(x_1, \ldots, x_n\right)}{x_1^{\rho_1} \cdots x_n^{\rho_n}}.
\tag{2.4.7.4}
\]
Indeed, given y_{11}, …, y_{1ρ_1}, …, y_{n1}, …, y_{nρ_n} > 0, let us define
\[
x_k = \frac{1}{\rho_k} \sum_{i=1}^{\rho_k} y_{ki} \quad \text{for} \quad k = 1, \ldots, n.
\]
By the arithmetic–geometric mean inequality, we have
\[
x_k^{\rho_k} \ \ge\ \prod_{i=1}^{\rho_k} y_{ki},
\]
and since q(y_{11}, …, y_{nρ_n}) = p(x_1, …, x_n) by the definition of q, the ratio in the left hand side of (2.4.7.4) is at least p(x_1, …, x_n)/(x_1^{ρ_1} ⋯ x_n^{ρ_n}), and hence we obtain (2.4.7.4).

Combining (2.4.7.1)–(2.4.7.4), we get the desired result. □
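As an illustration (ours, not the book's), the bound of Theorem 2.4.7 can be tested on the simplest H-stable example p(x_1, x_2) = (x_1 + x_2)², where d_1 = d_2 = 2 and, for r = (1, 1), the coefficient is a_r = 2 and cap_r p = 4; the crude grid estimate of the capacity is an arbitrary implementation choice.

```python
# Spot check of Theorem 2.4.7 on p(x1, x2) = (x1 + x2)**2 with r = (1, 1).
from math import factorial

def cap_r(p, rho, grid=2000):
    # crude capacity estimate; since p is homogeneous of degree rho_1 + rho_2,
    # the ratio depends only on x1/x2, so we may fix x2 = 1
    best = float("inf")
    for k in range(1, grid):
        x1 = 10 ** (-2 + 4 * k / grid)
        best = min(best, p(x1, 1.0) / (x1 ** rho[0] * 1.0 ** rho[1]))
    return best

p = lambda x1, x2: (x1 + x2) ** 2
rho, d = (1, 1), (2, 2)
coef = 1.0
for rk, dk in zip(rho, d):
    coef *= (rk ** rk / factorial(rk)) * \
            ((dk - rk) ** (dk - rk) * factorial(dk) /
             (factorial(dk - rk) * dk ** dk))
a_r = 2.0                      # coefficient of x1*x2 in (x1 + x2)**2
assert a_r >= coef * cap_r(p, rho)
```

Here the theorem guarantees a_r ≥ (1/4) · 4 = 1, while the true coefficient is 2.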

2.5 D-Stable Polynomials

Let
\[
D = \left\{ z \in \mathbb{C} : \ |z| \le 1 \right\}
\]
be the closed unit disc. We are interested in D-stable polynomials, that is, polynomials
p(z_1, …, z_n) such that p(z_1, …, z_n) ≠ 0 provided |z_i| ≤ 1 for i = 1, …, n.

We start with multi-affine polynomials, that is, sums of square-free monomials.
For a set S ⊂ {1, …, n}, let
\[
z^S = \prod_{i \in S} z_i
\]
denote the corresponding monomial in the complex variables z_1, …, z_n (we agree that z^∅ = 1).
Our first result is as follows.

2.5.1 Theorem. Let
\[
f\left(z_1, \ldots, z_n\right) = \sum_{S \subset \{1, \ldots, n\}} a_S z^S \quad \text{and} \quad g\left(z_1, \ldots, z_n\right) = \sum_{S \subset \{1, \ldots, n\}} b_S z^S
\]

be two D-stable polynomials. Then the polynomial h = f ∗ g defined by
\[
h\left(z_1, \ldots, z_n\right) = \sum_{S \subset \{1, \ldots, n\}} c_S z^S \quad \text{where} \quad c_S = a_S b_S
\]
is also D-stable.

The polynomial h = f ∗ g is sometimes called the Schur product and sometimes
the Hadamard product of f and g. We follow [Hi97], see also [Ru71]. The proof is
based on the Asano contractions [As70].

2.5.2 Lemma. Suppose that the bivariate polynomial
\[
p\left(z_1, z_2\right) = a + b z_1 + c z_2 + d z_1 z_2
\]
is D-stable. Then the univariate polynomial
\[
q(z) = a + d z
\]
is also D-stable.

Proof. Since p is D-stable, we have a ≠ 0. Seeking a contradiction, suppose that q(z)
is not D-stable. Then d ≠ 0 and for the unique root w of q we have |w| = |a|/|d| ≤ 1,
so that |d| ≥ |a|.

Without loss of generality, we assume that |b| ≥ |c|. Let us fix z_2 such that
|z_2| = 1 and
\[
\left| b + d z_2 \right| = |b| + |d| \ \ge\ |a| + |c|.
\]
Then the set
\[
K = \left\{ \left(b + d z_2\right) z_1 : \ |z_1| \le 1 \right\}
\]
is a disc centered at 0 of radius |b| + |d| ≥ |a| + |c|. Since
\[
\left| a + c z_2 \right| \ \le\ |a| + |c|,
\]
the translation K + (a + c z_2) of the disc K by a vector a + c z_2 whose length does
not exceed the radius of K must contain 0, see Fig. 2.12.

Therefore, for some z_1 such that |z_1| ≤ 1, we have a + c z_2 + b z_1 + d z_2 z_1 = 0,
which is a contradiction. Hence |d| < |a| and q is D-stable. □

2.5.3 Proof of Theorem 2.5.1. We proceed by induction on the number n of variables. If n = 1, then f(z) = a + bz, g(z) = c + dz and h(z) = ac + bdz. Since
f is D-stable, we have a ≠ 0 and |b| < |a|. Since g is D-stable, we have c ≠ 0
and |d| < |c|. Therefore, ac ≠ 0 and |bd| < |ac|, from which it follows that h is
D-stable.
Fig. 2.12 The disc K and its translation

Suppose that n ≥ 2. We can write
\[
f\left(z_1, \ldots, z_n\right) = \sum_{S \subset \{1, \ldots, n\}} a_S z^S = \sum_{S \subset \{1, \ldots, n-1\}} \left( a_S + z_n a_{S \cup \{n\}} \right) z^S
\]
and
\[
g\left(z_1, \ldots, z_n\right) = \sum_{S \subset \{1, \ldots, n\}} b_S z^S = \sum_{S \subset \{1, \ldots, n-1\}} \left( b_S + z_n b_{S \cup \{n\}} \right) z^S.
\]
Let us fix any two z, w ∈ D. Then the (n − 1)-variate polynomials
\[
\sum_{S \subset \{1, \ldots, n-1\}} \left( a_S + z\, a_{S \cup \{n\}} \right) z^S \quad \text{and} \quad \sum_{S \subset \{1, \ldots, n-1\}} \left( b_S + w\, b_{S \cup \{n\}} \right) z^S
\]
are D-stable and by the induction hypothesis the polynomial
\[
\sum_{S \subset \{1, \ldots, n-1\}} \left( a_S + z\, a_{S \cup \{n\}} \right) \left( b_S + w\, b_{S \cup \{n\}} \right) z^S
\]
in the n − 1 variables z_1, …, z_{n−1} is also D-stable. This means that for any fixed
z_1, …, z_{n−1} ∈ D the bivariate polynomial
\[
\begin{aligned}
p(z, w) = &\sum_{S \subset \{1, \ldots, n-1\}} a_S b_S z^S + z \sum_{S \subset \{1, \ldots, n-1\}} a_{S \cup \{n\}} b_S z^S \\
&+ w \sum_{S \subset \{1, \ldots, n-1\}} a_S b_{S \cup \{n\}} z^S + z w \sum_{S \subset \{1, \ldots, n-1\}} a_{S \cup \{n\}} b_{S \cup \{n\}} z^S
\end{aligned}
\]
is D-stable. Lemma 2.5.2 then implies that for any fixed z_1, …, z_{n−1} ∈ D the univariate polynomial
\[
q\left(z_n\right) = \sum_{S \subset \{1, \ldots, n-1\}} a_S b_S z^S + z_n \sum_{S \subset \{1, \ldots, n-1\}} a_{S \cup \{n\}} b_{S \cup \{n\}} z^S
\]
is D-stable. Therefore, for any z_1, …, z_n ∈ D, we have
\[
h\left(z_1, \ldots, z_n\right) = \sum_{S \subset \{1, \ldots, n-1\}} a_S b_S z^S + z_n \sum_{S \subset \{1, \ldots, n-1\}} a_{S \cup \{n\}} b_{S \cup \{n\}} z^S \ \ne\ 0,
\]
as required. □
Ruelle [Ru71] generalized Lemma 2.5.2 as follows: let A, B ⊂ ℂ be closed sets
such that 0 ∉ A and 0 ∉ B, and let p(z_1, z_2) = a + b z_1 + c z_2 + d z_1 z_2 be a bivariate
polynomial such that
\[
p\left(z_1, z_2\right) = 0 \ \Longrightarrow\ z_1 \in A \ \text{or} \ z_2 \in B.
\]
Then for the univariate polynomial q(z) = a + dz we have
\[
q(z) = 0 \ \Longrightarrow\ z = -z_1 z_2 \ \text{for some} \ z_1 \in A \ \text{and} \ z_2 \in B.
\]
The corresponding generalizations of Theorem 2.5.1 can be found in [Ru71] and
[Hi97].

Our next goal is to prove the following theorem of Szegő for univariate D-stable
polynomials.
2.5.4 Theorem. Let
\[
f(z) = \sum_{k=0}^{n} \binom{n}{k} a_k z^k \quad \text{and} \quad g(z) = \sum_{k=0}^{n} \binom{n}{k} b_k z^k
\]
be D-stable polynomials. Then the polynomial h = f ∗ g defined by
\[
h(z) = \sum_{k=0}^{n} \binom{n}{k} c_k z^k \quad \text{where} \quad c_k = a_k b_k \ \text{for} \ k = 0, 1, \ldots, n
\]
is also D-stable.

The polynomial h = f ∗ g is called the Schur product of f and g.
For k = 0, …, n, let
\[
e_k\left(z_1, \ldots, z_n\right) = \sum_{1 \le i_1 < \ldots < i_k \le n} z_{i_1} \cdots z_{i_k}
\]
be the k-th elementary symmetric polynomial in z_1, …, z_n, where we agree that
e_0(z_1, …, z_n) = 1. We deduce Theorem 2.5.4 from Theorem 2.5.1 and the following
result of Szegő connecting multivariate and univariate D-stable polynomials.

2.5.5 Theorem. Let
\[
f(z) = \sum_{k=0}^{n} \binom{n}{k} a_k z^k
\]

be a univariate D-stable polynomial. Then the n-variate polynomial
\[
F\left(z_1, \ldots, z_n\right) = \sum_{k=0}^{n} a_k e_k\left(z_1, \ldots, z_n\right)
\]
is also D-stable.

We follow Chapter IV of [Ma66] with some modifications. We start with a lemma
known as Laguerre's Theorem.

2.5.6 Lemma. Let p(z) be a polynomial and let n be a positive integer. For β ∈ ℂ,
we define the polynomial
\[
q(z) = n\, p(z) + (\beta - z)\, p'(z).
\]
(1) If deg p ≤ n then deg q ≤ n − 1;
(2) Suppose that deg p ≤ n, that p is D-stable and that |β| ≤ 1. Then q is also
D-stable.

Proof. If deg p ≤ n − 1 then deg q ≤ deg p ≤ n − 1. If deg p = n with the highest
term a_n z^n then deg q ≤ n − 1 since the coefficient of z^n in q(z) is n a_n − n a_n = 0,
which completes the proof of Part (1).
By continuity, it suffices to prove Part (2) assuming that deg p = n. Furthermore,
without loss of generality, we assume that p is monic. Let α_1, …, α_n be the (not
necessarily distinct) roots of p, each listed with its multiplicity, so that
\[
p(z) = \left(z - \alpha_1\right) \cdots \left(z - \alpha_n\right) \quad \text{and} \quad \left|\alpha_j\right| > 1 \ \text{for} \ j = 1, \ldots, n.
\]
Suppose that ζ is a root of q(z). Without loss of generality, we assume that ζ ≠ α_j
for j = 1, …, n. Then
\[
n\, p(\zeta) + (\beta - \zeta)\, p'(\zeta) = 0
\]
and since p(ζ) ≠ 0, we have ζ ≠ β and
\[
\frac{1}{\zeta - \beta} = \frac{1}{n} \cdot \frac{p'(\zeta)}{p(\zeta)} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{\zeta - \alpha_i}.
\tag{2.5.6.1}
\]

Suppose that |ζ| ≤ 1. The transformation
\[
z \ \longmapsto\ \frac{1}{\zeta - z} \quad \text{for} \quad z \in \mathbb{C} \setminus D
\]
is a bijection between ℂ \ D and a set S ⊂ ℂ that is either an open disc (if |ζ| < 1)
or an open halfplane (if |ζ| = 1). In either case, S is convex. Moreover,
\[
\frac{1}{\zeta - \alpha_i} \in S \quad \text{for} \quad i = 1, \ldots, n.
\]
Since S is convex, by (2.5.6.1), we have
\[
\frac{1}{\zeta - \beta} \in S,
\]
which implies that β ∈ ℂ \ D, a contradiction. □
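A minimal numeric illustration of the lemma (ours, not the book's; the specific p, n and β are arbitrary): for p(z) = (z − 2)(z − 3), which is D-stable, n = 2 and β = 0.5, the z² terms of q cancel (Part 1) and the unique root of q lies outside the unit disc (Part 2).

```python
# For a monic quadratic p(z) = (z - r1)(z - r2) = z^2 - s*z + m and n = 2,
# q(z) = 2*p(z) + (beta - z)*p'(z) simplifies to the linear polynomial
# q(z) = (2*beta - s)*z + (2*m - beta*s).
r1, r2, beta = 2.0, 3.0, 0.5
s, m = r1 + r2, r1 * r2
c1 = 2 * beta - s          # coefficient of z in q
c0 = 2 * m - beta * s      # constant term of q
root = -c0 / c1            # the single root of q
assert abs(root) > 1.0     # q is D-stable, as Lemma 2.5.6(2) predicts
```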

The following lemma gives a closed form description of the polynomials obtained
by a repeated application of the transformation of Lemma 2.5.6.

2.5.7 Lemma. Let
\[
f_0(z) = f(z) = \sum_{k=0}^{n} \binom{n}{k} a_k z^k
\]
be a polynomial and let β_1, …, β_n be a sequence of complex numbers. We define
polynomials f_1, …, f_n by
\[
f_j(z) = (n - j + 1)\, f_{j-1}(z) + \left( \beta_j - z \right) f_{j-1}'(z) \quad \text{for} \quad j = 1, \ldots, n.
\]
Then
\[
f_n = n! \sum_{k=0}^{n} a_k\, e_k\left(\beta_1, \ldots, \beta_n\right).
\]

Proof. By the repeated application of Part (1) of Lemma 2.5.6, we have deg f_j ≤
n − j for j = 0, …, n, so that f_n is a constant. We prove by induction on j that
\[
f_j(z) = \sum_{k=0}^{j} \frac{(n-k)!}{(n-j)!}\, e_k\left( \beta_1 - z, \ldots, \beta_j - z \right) \frac{d^k}{dz^k} f(z).
\tag{2.5.7.1}
\]

Clearly, (2.5.7.1) holds for j = 0. Assuming that (2.5.7.1) holds for j, we obtain
\[
\begin{aligned}
f_{j+1}(z) = \ &(n-j)\, f_j(z) + \left( \beta_{j+1} - z \right) f_j'(z) \\
= \ &(n-j) \sum_{k=0}^{j} \frac{(n-k)!}{(n-j)!}\, e_k\left( \beta_1 - z, \ldots, \beta_j - z \right) \frac{d^k}{dz^k} f(z) \\
&+ \left( \beta_{j+1} - z \right) \frac{d}{dz} \sum_{k=0}^{j} \frac{(n-k)!}{(n-j)!}\, e_k\left( \beta_1 - z, \ldots, \beta_j - z \right) \frac{d^k}{dz^k} f(z).
\end{aligned}
\]
Using that for k > 0 we have
\[
\frac{d}{dz}\, e_k\left( \beta_1 - z, \ldots, \beta_j - z \right) = -\sum_{i=1}^{j} e_{k-1}\left( \ldots, \widehat{\beta_i - z}, \ldots \right)
= -(j - k + 1)\, e_{k-1}\left( \beta_1 - z, \ldots, \beta_j - z \right)
\]
(where the hat means that the i-th argument is omitted), we obtain that the coefficient of
\[
\frac{d^k}{dz^k} f(z)
\]
in f_{j+1}(z) is
\[
\begin{aligned}
&\frac{(n-k)!}{(n-j-1)!}\, e_k\left( \beta_1 - z, \ldots, \beta_j - z \right) \\
&\quad - (j-k+1)\, \frac{(n-k)!}{(n-j)!} \left( \beta_{j+1} - z \right) e_{k-1}\left( \beta_1 - z, \ldots, \beta_j - z \right) \\
&\quad + \frac{(n-k+1)!}{(n-j)!} \left( \beta_{j+1} - z \right) e_{k-1}\left( \beta_1 - z, \ldots, \beta_j - z \right) \\
&= \frac{(n-k)!}{(n-j-1)!}\, e_k\left( \beta_1 - z, \ldots, \beta_j - z \right)
+ \frac{(n-k)!}{(n-j-1)!} \left( \beta_{j+1} - z \right) e_{k-1}\left( \beta_1 - z, \ldots, \beta_j - z \right) \\
&= \frac{(n-k)!}{(n-j-1)!}\, e_k\left( \beta_1 - z, \ldots, \beta_{j+1} - z \right)
\end{aligned}
\]
and the proof of (2.5.7.1) follows.


Since deg f_n = 0, so that f_n(z) does not depend on z, from (2.5.7.1) we obtain
\[
f_n = f_n(0) = \sum_{k=0}^{n} (n-k)!\, e_k\left(\beta_1, \ldots, \beta_n\right) \binom{n}{k} a_k\, k! = n! \sum_{k=0}^{n} a_k\, e_k\left(\beta_1, \ldots, \beta_n\right)
\]
as required. □
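Lemma 2.5.7 can also be verified mechanically by iterating the recursion on coefficient lists in exact rational arithmetic; in the sketch below (ours), the sample coefficients a_k and points β_j are arbitrary choices.

```python
# Iterate f_j = (n-j+1) f_{j-1} + (beta_j - z) f'_{j-1} on coefficient lists
# and compare the constant f_n with n! * sum_k a_k e_k(beta_1, ..., beta_n).
from fractions import Fraction
from math import comb, factorial
from itertools import combinations

def derivative(c):                       # c[k] = coefficient of z^k
    return [k * c[k] for k in range(1, len(c))]

def step(c, factor, beta):
    d = derivative(c)
    out = [factor * ck for ck in c] + [Fraction(0)]
    for k, dk in enumerate(d):           # add (beta - z) * f'
        out[k] += beta * dk
        out[k + 1] -= dk
    while len(out) > 1 and out[-1] == 0:
        out.pop()                        # leading terms cancel exactly
    return out

def e(k, betas):                         # elementary symmetric polynomial
    total = Fraction(1) if k == 0 else Fraction(0)
    for subset in combinations(betas, k):
        term = Fraction(1)
        for b in subset:
            term *= b
        total += term if k > 0 else 0
    return total

n = 3
a = [Fraction(1), Fraction(2), Fraction(-1), Fraction(3)]
betas = [Fraction(1, 2), Fraction(-1, 3), Fraction(2)]
f = [comb(n, k) * a[k] for k in range(n + 1)]       # f_0
for j in range(1, n + 1):
    f = step(f, n - j + 1, betas[j - 1])
expected = factorial(n) * sum(a[k] * e(k, betas) for k in range(n + 1))
assert len(f) == 1 and f[0] == expected
```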

2.5.8 Proof of Theorem 2.5.5. Let us choose arbitrary β_1, …, β_n ∈ D and let us
construct the polynomials f_j for j = 1, …, n as in Lemma 2.5.7. By the repeated
application of Lemma 2.5.6, we conclude that f_n is D-stable, that is, f_n ≠ 0. However,
by Lemma 2.5.7 we have
\[
f_n = n!\, F\left(\beta_1, \ldots, \beta_n\right)
\]
and hence F(β_1, …, β_n) ≠ 0, which completes the proof. □


2.5.9 Proof of Theorem 2.5.4. From Theorem 2.5.5 we conclude that the polynomials
\[
F\left(z_1, \ldots, z_n\right) = \sum_{k=0}^{n} a_k\, e_k\left(z_1, \ldots, z_n\right) \quad \text{and} \quad
G\left(z_1, \ldots, z_n\right) = \sum_{k=0}^{n} b_k\, e_k\left(z_1, \ldots, z_n\right)
\]
are D-stable. Since the polynomials F and G are multi-affine, by Theorem 2.5.1 the
polynomial
\[
H\left(z_1, \ldots, z_n\right) = \sum_{k=0}^{n} c_k\, e_k\left(z_1, \ldots, z_n\right)
\]
is also D-stable. Then for any z ∈ D we have
\[
h(z) = H(z, \ldots, z) \ \ne\ 0
\]
and hence h is D-stable. □

We will use the following simple corollary of Theorem 2.5.4.

2.5.10 Corollary. Let
\[
f(z) = \sum_{k=0}^{n} a_k z^k \quad \text{and} \quad g(z) = \sum_{k=0}^{n} b_k z^k
\]
be two polynomials such that f(z) ≠ 0 whenever |z| ≤ λ and g(z) ≠ 0 whenever
|z| ≤ μ for some λ, μ > 0. Let h = f ∗ g,
\[
h(z) = \sum_{k=0}^{n} c_k z^k \quad \text{where} \quad c_k = \frac{a_k b_k}{\binom{n}{k}} \ \text{for} \ k = 0, \ldots, n.
\]
Then h(z) ≠ 0 whenever |z| ≤ λμ.

Proof. The polynomials f̂(z) = f(λz) and ĝ(z) = g(μz) are D-stable. Therefore,
by Theorem 2.5.4, their Schur product, which is the polynomial ĥ(z) = h(λμz), is
D-stable. The proof now follows. □
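A small numeric illustration of Corollary 2.5.10 (ours; the polynomials are arbitrary): with n = 2, f has roots −2.5 and 3, hence f(z) ≠ 0 for |z| ≤ 2 (take λ = 2), and g has roots 2.2 and −4 (take μ = 2); the resulting h must then be non-zero on |z| ≤ λμ = 4.

```python
# f(z) = (z + 2.5)(z - 3) = z^2 - 0.5 z - 7.5  (no zeros in |z| <= 2)
# g(z) = (z - 2.2)(z + 4) = z^2 + 1.8 z - 8.8  (no zeros in |z| <= 2)
from math import comb, sqrt

a = [-7.5, -0.5, 1.0]
b = [-8.8, 1.8, 1.0]
c = [a[k] * b[k] / comb(2, k) for k in range(3)]   # c_k = a_k b_k / C(n,k)
# h(z) = z^2 - 0.45 z + 66 has a conjugate pair of roots; since the product
# of the roots is c[0]/c[2], their common modulus is sqrt(c[0]/c[2])
disc = c[1] ** 2 - 4 * c[2] * c[0]
assert disc < 0
root_modulus = sqrt(c[0] / c[2])
assert root_modulus > 2 * 2       # all roots of h lie beyond lambda*mu = 4
```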

For extensions and generalizations of Theorems 2.5.1 and 2.5.4, see [BB09].
Chapter 3
Permanents

Introduced in 1812 by Binet and Cauchy, permanents are of interest to combina-


torics, as they enumerate perfect matchings in bipartite graphs, to physics as they
compute certain integrals and to computer science as they occupy a special place in
the computational complexity hierarchy. This is our first example of a partition func-
tion and we demonstrate in detail how various approaches work. Connections with
H-stable polynomials lead, in particular, to an elegant proof of the van der Waerden
lower bound for the permanent of a doubly stochastic matrix. Combining it with the
Bregman - Minc upper bound, we show that permanents of doubly stochastic matrices
are strongly concentrated. Via matrix scaling, this leads to an efficient approximation
of the permanent of non-negative matrices by a function with many convenient prop-
erties: it is easily computable, log-concave and generally amenable to analysis. As
an application of the interpolation method, we show how to approximate permanents
of a reasonably wide class of complex matrices and also obtain approximations of
logarithms of permanents of positive matrices by low degree polynomials.

3.1 Permanents

3.1.1 Permanent. Let A = (a_{ij}) be an n × n real or complex matrix. The permanent
of A is defined as
\[
\operatorname{per} A = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i\sigma(i)},
\tag{3.1.1.1}
\]
where S_n is the symmetric group of all n! permutations of the set {1, …, n}.

One can see that the permanent does not change when the rows or columns of
the matrix are permuted and that per A is linear in each row and each column of A.
Moreover, if n > 1, then denoting by A_j the (n − 1) × (n − 1) matrix obtained from
A by crossing out the first row and the j-th column, we obtain the row expansion
\[
\operatorname{per} A = \sum_{j=1}^{n} a_{1j} \operatorname{per} A_j.
\tag{3.1.1.2}
\]

© Springer International Publishing AG 2016
A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9_3
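Definition (3.1.1.1) and the row expansion (3.1.1.2) translate directly into code; the following sketch (ours, purely illustrative) implements both and checks them against each other on small matrices.

```python
# Permanent via the definition (sum over all permutations) and via the
# recursive row expansion per A = sum_j a_{1j} per A_j.
from itertools import permutations

def per_bruteforce(A):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][sigma[i]]
        total += prod
    return total

def per_expansion(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # cross out the first row and the j-th column
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += A[0][j] * per_expansion(minor)
    return total

J3 = [[1, 1, 1]] * 3
assert per_bruteforce(J3) == per_expansion(J3) == 6   # all 3! permutations
```

Both routines take n! time; they are meant only for the small examples of this chapter, not as serious algorithms.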

3.1.2 Permanents and perfect matchings. If A is a real matrix and a_{ij} ∈ {0, 1}
for all i, j then per A has a combinatorial interpretation as the number of perfect
matchings in a bipartite graph G with biadjacency matrix A. Namely, the vertices of
G are 1L, 2L, …, nL and 1R, 2R, …, nR ("L" is for "left" and "R" is for "right"),
whereas the edges of G are all unordered pairs {iL, jR} for which a_{ij} = 1. A perfect
matching in a graph G is a collection of edges which contain every vertex of G
exactly once, see Fig. 3.1.

In this case, per A is the number of perfect matchings in G, since every perfect
matching in G corresponds to a unique permutation σ such that a_{iσ(i)} = 1 for all
i = 1, …, n. For example, Fig. 3.1 pictures a graph encoded by the matrix
\[
A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}
\tag{3.1.2.1}
\]
and a perfect matching corresponding to the permutation
\[
\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 4 & 3 & 2 \end{pmatrix}.
\tag{3.1.2.2}
\]

Fig. 3.1 A bipartite graph and a perfect matching (thick edges)

Fig. 3.2 A graph and a cycle cover (thick edges)
3.1.3 Permanents and cycle covers. A different interpretation of the permanent of
a 0–1 matrix A arises if we interpret A as the adjacency matrix of a directed graph
G. In this case, the vertices of G are 1, …, n whereas the edges of G are all ordered
pairs (i, j) such that a_{ij} = 1 (in particular, we allow loops). A cycle cover of G is a
collection of edges which contain every vertex of G exactly once as the beginning
point of an edge and exactly once as an endpoint of an edge, see Fig. 3.2.

In this case, per A is the number of cycle covers of G, since every cycle cover of
G corresponds to a unique permutation σ such that a_{iσ(i)} = 1 for all i = 1, …, n.
For example, Fig. 3.2 pictures a graph encoded by the matrix (3.1.2.1) and a cycle
cover corresponding to the permutation (3.1.2.2).

The interpretations of Sects. 3.1.2 and 3.1.3 explain why permanents are of interest to
combinatorics, see [LP09] for more.
3.1.4 Permanents as integrals. Let μ_n be the Gaussian probability measure on the
complex vector space ℂ^n with density
\[
\frac{1}{\pi^n}\, e^{-\|z\|^2} \quad \text{where} \quad \|z\|^2 = \left|z_1\right|^2 + \ldots + \left|z_n\right|^2 \ \text{for} \ z = \left(z_1, \ldots, z_n\right).
\]
The measure μ_n is normalized in such a way that
\[
\mathbf{E}\, \left|z_i\right|^2 = 1 \ \text{for} \ i = 1, \ldots, n \quad \text{and} \quad \mathbf{E}\, z_i \bar{z}_j = 0 \ \text{for} \ i \ne j.
\]
Let f_1, …, f_n; g_1, …, g_n : ℂ^n → ℂ be linear forms and let us define an n × n
matrix A = (a_{ij}) by
\[
a_{ij} = \mathbf{E}\, f_i \bar{g}_j = \int_{\mathbb{C}^n} f_i(z)\, \overline{g_j(z)}\, d\mu_n \quad \text{for all} \quad i, j.
\]
Then
\[
\mathbf{E}\left( f_1 \cdots f_n\, \bar{g}_1 \cdots \bar{g}_n \right) = \operatorname{per} A.
\tag{3.1.4.1}
\]

Formula (3.1.4.1) is known as (a version of) Wick's formula, see, for example, [Zv97]
and [Gu04]. To prove it, we note that both sides of (3.1.4.1) are linear in each
f_i and antilinear in each g_j. Namely, denoting the left hand side of (3.1.4.1) by
L(f_1, …, f_n; g_1, …, g_n) and the right hand side by R(f_1, …, f_n; g_1, …, g_n), we
observe that
\[
\begin{aligned}
L\left( f_1, \ldots, f_{i-1},\, \alpha_1 f_i' + \alpha_2 f_i'',\, f_{i+1}, \ldots, f_n;\ g_1, \ldots, g_n \right)
= \ &\alpha_1\, L\left( f_1, \ldots, f_{i-1}, f_i', f_{i+1}, \ldots, f_n;\ g_1, \ldots, g_n \right) \\
&+ \alpha_2\, L\left( f_1, \ldots, f_{i-1}, f_i'', f_{i+1}, \ldots, f_n;\ g_1, \ldots, g_n \right)
\end{aligned}
\]
and similarly for R, as well as
\[
\begin{aligned}
L\left( f_1, \ldots, f_n;\ g_1, \ldots, g_{i-1},\, \alpha_1 g_i' + \alpha_2 g_i'',\, g_{i+1}, \ldots, g_n \right)
= \ &\bar{\alpha}_1\, L\left( f_1, \ldots, f_n;\ g_1, \ldots, g_{i-1}, g_i', g_{i+1}, \ldots, g_n \right) \\
&+ \bar{\alpha}_2\, L\left( f_1, \ldots, f_n;\ g_1, \ldots, g_{i-1}, g_i'', g_{i+1}, \ldots, g_n \right)
\end{aligned}
\]
and similarly for R.

Hence it suffices to check (3.1.4.1) when each f_i and g_j is a coordinate function.


Suppose therefore that
\[
\left(f_1, \ldots, f_n\right) = \bigl( \underbrace{z_1, \ldots, z_1}_{m_1 \text{ times}}, \ldots, \underbrace{z_n, \ldots, z_n}_{m_n \text{ times}} \bigr) \quad \text{and} \quad
\left(g_1, \ldots, g_n\right) = \bigl( \underbrace{z_1, \ldots, z_1}_{k_1 \text{ times}}, \ldots, \underbrace{z_n, \ldots, z_n}_{k_n \text{ times}} \bigr),
\]
where m_1, …, m_n and k_1, …, k_n are non-negative integers such that
\[
m_1 + \ldots + m_n = k_1 + \ldots + k_n = n.
\]
If m_i ≠ k_i for some i then the left hand side of (3.1.4.1) is 0 since
\[
\mathbf{E}\, z_i^{m_i} \bar{z}_i^{k_i} = 0 \quad \text{provided} \quad m_i \ne k_i.
\]
On the other hand, the right hand side of (3.1.4.1) is also 0. Indeed, without loss of
generality, we may assume that m_i > k_i. The matrix A contains an m_i × (n − k_i)
block of 0s and if m_i > k_i each of the n! terms of (3.1.1.1) contains at least one
entry from that block and hence is 0. Thus it remains to prove (3.1.4.1) in the case
when m_i = k_i for all i = 1, …, n. Since
\[
\mathbf{E}\, z_i^{m_i} \bar{z}_i^{m_i} = m_i!,
\]
we conclude that the left hand side of (3.1.4.1) is m_1! ⋯ m_n!. The matrix A in this
case consists of the diagonal blocks filled by 1s of sizes m_1, …, m_n, see Fig. 3.3,
and hence the right hand side of (3.1.4.1) is also m_1! ⋯ m_n!. □

Fig. 3.3 The structure of the matrix A
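The final step of the proof can be checked directly: for the 0–1 matrix consisting of all-ones diagonal blocks of sizes m_1, …, m_n, the permanent is m_1! ⋯ m_n!. A brute-force check (our illustration, with arbitrary block sizes):

```python
# per of a block diagonal matrix with all-ones blocks equals the product
# of the factorials of the block sizes.
from itertools import permutations
from math import factorial, prod

def per(A):
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def block_ones(sizes):
    n = sum(sizes)
    A = [[0] * n for _ in range(n)]
    start = 0
    for m in sizes:
        for i in range(start, start + m):
            for j in range(start, start + m):
                A[i][j] = 1
        start += m
    return A

for sizes in [(2, 1), (2, 2), (3, 1, 1)]:
    assert per(block_ones(sizes)) == prod(factorial(m) for m in sizes)
```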
One immediate corollary of (3.1.4.1) is that
\[
\operatorname{per} A \ge 0 \quad \text{provided} \quad A \ \text{is Hermitian positive semidefinite}.
\tag{3.1.4.2}
\]
Indeed, any such A = (a_{ij}) can be written as
\[
a_{ij} = \mathbf{E}\, f_i \bar{f}_j \quad \text{for all} \quad i, j
\]
and some linear forms f_1, …, f_n, in which case by (3.1.4.1) we have
\[
\operatorname{per} A = \mathbf{E}\left( f_1 \cdots f_n\, \bar{f}_1 \cdots \bar{f}_n \right) = \mathbf{E}\left( \left|f_1\right|^2 \cdots \left|f_n\right|^2 \right) \ \ge\ 0.
\]
The identity of Sect. 3.1.4 has some relevance to the statistics of bosons in quantum
physics, see, for example, [AA13] and [Ka16].
3.1.5 Permanents in computational complexity. Permanents occupy a special
place in the theory of computational complexity. Valiant [Va79] proved that computing permanents of 0–1 matrices exactly (that is, counting perfect matchings in
bipartite graphs exactly) is an example of a #P-complete problem: counting
perfect matchings in bipartite graphs exactly in polynomial time would lead to a
polynomial time algorithm counting the accepting computations of a general
non-deterministic polynomial time Turing machine, see also [AB09] and [Go08].
This is especially striking since deciding whether there exists a perfect matching in
a given bipartite graph is a famous problem solvable in polynomial time, see, for
example, [LP09]. Exact computation of permanents of 0–1 matrices leads by interpolation to exact computation of permanents of matrices with 0 and ±1 entries, and
those turn out to be sufficient to encode rather involved computations. In algebraic complexity theory, permanents stand out as universal polynomials, see Part 5
of [B+97].

Permanents also stand out as an example of a problem where randomized algorithms so far substantially outperform deterministic algorithms. The Monte Carlo
Markov Chain algorithm of Jerrum, Sinclair and Vigoda [J+04] approximates permanents of non-negative matrices in polynomial time, and none of the deterministic
algorithms have achieved that so far, see also Sects. 3.7 and 3.9 below.
3.2 Permanents of Non-negative Matrices and H-Stable Polynomials

3.2.1 Permanents and products of linear forms. Let A = (a_{ij}) be an n × n matrix
and let z_1, …, z_n be complex variables. The following simple formula has many
important consequences:
\[
\operatorname{per} A = \frac{\partial^n}{\partial z_1 \cdots \partial z_n} \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} z_j.
\tag{3.2.1.1}
\]

In other words, per A is the coefficient of z_1 ⋯ z_n in the product (3.2.1.1) of linear
forms.

We note that if A = (a_{ij}) is a non-negative real matrix with non-zero rows, then
the polynomial
\[
f\left(z_1, \ldots, z_n\right) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} z_j
\]
is H-stable, see Sect. 2.4, since
\[
\Im \sum_{j=1}^{n} a_{ij} z_j > 0 \quad \text{provided} \quad \Im z_j > 0 \ \text{for} \ j = 1, \ldots, n.
\]
More generally, let a_1, …, a_n be the columns of A, so that A = [a_1, …, a_n].
Given a non-negative integer vector m = (m_1, …, m_n) such that m_1 + … + m_n = n,
let
\[
A_m = \bigl[ \underbrace{a_1, \ldots, a_1}_{m_1 \text{ times}}, \ldots, \underbrace{a_k, \ldots, a_k}_{m_k \text{ times}}, \ldots, \underbrace{a_n, \ldots, a_n}_{m_n \text{ times}} \bigr]
\]
be the n × n matrix whose columns consist of m_k copies of a_k for k = 1, …, n.
Then
\[
\frac{\partial^n}{\partial z_1^{m_1} \cdots \partial z_n^{m_n}} \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} z_j = \operatorname{per} A_m
\tag{3.2.1.2}
\]
(if m_k = 0 for some k then the corresponding partial derivative is missing and so are
the copies of a_k in A_m). Indeed, the left hand side of (3.2.1.2) is the coefficient of
z_1^{m_1} ⋯ z_n^{m_n} in the product of the linear forms
\[
f_i\left(z_1, \ldots, z_n\right) = \sum_{j=1}^{n} a_{ij} z_j,
\]
multiplied by m_1! ⋯ m_n!. Hence the left hand side of (3.2.1.2) can be written as
\[
\int_{\mathbb{C}^n} f_1 \cdots f_n\, \bar{z}_1^{m_1} \cdots \bar{z}_n^{m_n}\, d\mu_n,
\]
for the Gaussian measure μ_n of Sect. 3.1.4, and (3.2.1.2) follows by (3.1.4.1).
3.2.2 Alexandrov–Fenchel inequalities. One immediate application of (3.2.1.1)
and (3.2.1.2) is an inequality for permanents of non-negative matrices, which is a
particular case of the Alexandrov–Fenchel inequality for mixed volumes of convex
bodies, see, for example, [Sa93].

Let [a_1, …, a_n] denote the n × n matrix with non-negative real columns
a_1, …, a_n. Then
\[
\operatorname{per}^2 \left[a_1, \ldots, a_n\right] \ \ge\ \operatorname{per}\left[a_1, a_1, a_3, \ldots, a_n\right] \operatorname{per}\left[a_2, a_2, a_3, \ldots, a_n\right].
\tag{3.2.2.1}
\]
By continuity, it suffices to prove (3.2.2.1) assuming that the coordinates of a_1, …, a_n
are strictly positive. Let a_{ij} > 0 be the i-th coordinate of a_j. Then, from Sect. 3.2.1,
the polynomial
\[
f\left(z_1, \ldots, z_n\right) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} z_j
\]
is H-stable. Let
\[
g\left(z_1, z_2\right) = \frac{\partial^{n-2}}{\partial z_3 \cdots \partial z_n}\, f = u z_1^2 + 2 v z_1 z_2 + w z_2^2.
\]
Using (3.2.1.2) we observe that
\[
u = \frac{1}{2} \operatorname{per}\left[a_1, a_1, a_3, \ldots, a_n\right], \quad v = \frac{1}{2} \operatorname{per}\left[a_1, \ldots, a_n\right] \quad \text{and} \quad
w = \frac{1}{2} \operatorname{per}\left[a_2, a_2, a_3, \ldots, a_n\right].
\]
By the repeated application of Part (3) of Lemma 2.4.2, the quadratic polynomial g is
H-stable, which implies that v² ≥ uw, and we get (3.2.2.1). Indeed, if v² < uw then
the univariate polynomial t ⟼ u + 2vt + wt² has a pair of complex conjugate roots
α ± βi for some β > 0. Then, for any ε > 0, the point z_1 = 1 + iε, z_2 = (α + βi)(1 + iε)
is a root of g(z_1, z_2) and if ε > 0 is sufficiently small, we have ℑ z_2 = αε + β > 0,
which contradicts the H-stability of g.
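Inequality (3.2.2.1) is easy to spot-check; in the sketch below (ours) the columns a_1, a_2, a_3 are arbitrary positive vectors.

```python
# Check per^2[a1, a2, a3] >= per[a1, a1, a3] * per[a2, a2, a3].
from itertools import permutations
from math import prod

def per(A):
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def with_columns(cols):
    # build the matrix [a_1, ..., a_n] from a list of column vectors
    n = len(cols[0])
    return [[cols[j][i] for j in range(len(cols))] for i in range(n)]

a1, a2, a3 = [1, 2, 3], [2, 1, 1], [1, 1, 4]
lhs = per(with_columns([a1, a2, a3])) ** 2
rhs = per(with_columns([a1, a1, a3])) * per(with_columns([a2, a2, a3]))
assert lhs >= rhs
```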
The connection of (3.2.2.1) to the Alexandrov–Fenchel inequality for mixed
volumes is as follows. Let K_1, …, K_n ⊂ ℝ^n be convex bodies and let λ_1, …, λ_n be
positive real numbers. We consider a combination λ_1 K_1 + … + λ_n K_n, where
\[
\lambda K = \{ \lambda x : \ x \in K \}
\]
is the dilation/contraction by a factor of λ and "+" stands for the Minkowski sum of
convex bodies:
\[
A + B = \{ x + y : \ x \in A, \ y \in B \}.
\]
As is known, the volume vol(λ_1 K_1 + … + λ_n K_n) is a homogeneous polynomial in
λ_1, …, λ_n and its coefficient
\[
V\left(K_1, \ldots, K_n\right) = \frac{\partial^n}{\partial \lambda_1 \cdots \partial \lambda_n} \operatorname{vol} \left( \lambda_1 K_1 + \ldots + \lambda_n K_n \right)
\]

is called the mixed volume of K_1, …, K_n. The Alexandrov–Fenchel inequality
asserts that
\[
V^2\left(K_1, \ldots, K_n\right) \ \ge\ V\left(K_1, K_1, K_3, \ldots, K_n\right)\, V\left(K_2, K_2, K_3, \ldots, K_n\right).
\tag{3.2.2.2}
\]
We obtain (3.2.2.1) if we choose K_j to be the parallelepiped, that is, the direct product
of axis-parallel intervals:
\[
K_j = \left[0,\, a_{1j}\right] \times \ldots \times \left[0,\, a_{nj}\right].
\]
In this case λ_1 K_1 + … + λ_n K_n is the parallelepiped
\[
\left[ 0,\ \sum_{j=1}^{n} a_{1j} \lambda_j \right] \times \ldots \times \left[ 0,\ \sum_{j=1}^{n} a_{nj} \lambda_j \right],
\]
cf. Fig. 3.4, so that
\[
\operatorname{vol} \left( \lambda_1 K_1 + \ldots + \lambda_n K_n \right) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} \lambda_j
\]
and
\[
V\left(K_1, \ldots, K_n\right) = \operatorname{per} A \quad \text{where} \quad A = \left( a_{ij} \right).
\]
We note that for general convex bodies K_1, …, K_n, the polynomial vol(λ_1 K_1 +
… + λ_n K_n) does not have to be H-stable, cf. [Kh84].

Fig. 3.4 Parallelepipeds K_1, K_2 and their Minkowski sum K_1 + K_2

3.3 The van der Waerden Inequality and Its Extensions

3.3.1 Doubly stochastic matrices. A real n × n matrix A = (a_{ij}) is called doubly
stochastic if
\[
\sum_{j=1}^{n} a_{ij} = 1 \ \text{for} \ i = 1, \ldots, n, \qquad \sum_{i=1}^{n} a_{ij} = 1 \ \text{for} \ j = 1, \ldots, n
\]
and
\[
a_{ij} \ \ge\ 0 \quad \text{for all} \quad i, j.
\]
In words: a matrix is doubly stochastic if it is non-negative real with all row and
column sums equal to 1.

Clearly, permutation matrices (matrices containing in each row and column
exactly one non-zero entry, equal to 1) are doubly stochastic, as is the matrix
\[
\frac{1}{n} J_n,
\]
where J_n is the n × n matrix of all 1s.

The main goal of this section is to prove the following result, known as the
van der Waerden conjecture.

3.3.2 Theorem. Let A be an n × n doubly stochastic matrix. Then
\[
\operatorname{per} A \ \ge\ \frac{n!}{n^n}.
\]
Moreover, the equality is attained if and only if A = (1/n) J_n.

Theorem 3.3.2 was first proved by Falikman [Fa81] and Egorychev [Eg81] (earlier
Friedland [Fr79] proved a slightly weaker bound per A ≥ e^{−n}). Our exposition
follows Gurvits' paper [Gu08] with some simplifications introduced in [Wa11] and
[LS10]. We use the notion of capacity, see Sect. 2.1.5, Theorem 2.4.3 and Corollary
2.4.6.
 
3.3.3 Lemma. Let A = (a_{ij}) be an n × n doubly stochastic matrix and let
\[
p\left(x_1, \ldots, x_n\right) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j.
\]
Then
\[
\inf_{x_1, \ldots, x_n > 0} \frac{p\left(x_1, \ldots, x_n\right)}{x_1 \cdots x_n} = 1.
\]

Proof. Clearly, p(1, …, 1) = 1 and hence the infimum does not exceed 1. On the
other hand, using the arithmetic–geometric mean inequality, see Sect. 2.1.1.1, we
conclude that for x_1, …, x_n > 0 we get
\[
\prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j \ \ge\ \prod_{i=1}^{n} \prod_{j=1}^{n} x_j^{a_{ij}} = \prod_{j=1}^{n} x_j^{\sum_{i=1}^{n} a_{ij}} = \prod_{j=1}^{n} x_j
\]
and hence the infimum is at least 1. □


To prove the van der Waerden inequality, we use H-stability, see Sect. 3.2.

3.3.4 Proof of Theorem 3.3.2. As in Sect. 3.2.1, we define a polynomial p = p_A
in n variables x_1, …, x_n:
\[
p\left(x_1, \ldots, x_n\right) = \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j.
\]
As we discussed in Sect. 3.2.1, the polynomial p is H-stable and hence by Corollary
2.4.6, we have
\[
\frac{\partial^n p}{\partial x_1 \cdots \partial x_n} \ \ge\ \frac{n!}{n^n} \inf_{x_1, \ldots, x_n > 0} \frac{p\left(x_1, \ldots, x_n\right)}{x_1 \cdots x_n}.
\tag{3.3.4.1}
\]
By (3.2.1.1), the left hand side of (3.3.4.1) is per A, while by Lemma 3.3.3, the
infimum in the right hand side of (3.3.4.1) is 1.

In the uniqueness proof, we follow [LS10]. Suppose now that A is a doubly
stochastic matrix such that per A = n!/n^n. Then inequality (3.3.4.1) is, in fact, an
equation. Analyzing the proof of Theorem 2.4.3 in Sect. 2.4.5, we conclude that for
\[
q\left(x_1, \ldots, x_{n-1}\right) = \left. \frac{\partial}{\partial x_n} \left( \prod_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_j \right) \right|_{x_n = 0}
= \sum_{k=1}^{n} a_{kn} \prod_{i :\, i \ne k} \sum_{j=1}^{n-1} a_{ij} x_j,
\]
we must have
\[
\inf_{x_1, \ldots, x_{n-1} > 0} \frac{q\left(x_1, \ldots, x_{n-1}\right)}{x_1 \cdots x_{n-1}} = \left( \frac{n-1}{n} \right)^{n-1}.
\tag{3.3.4.2}
\]

Applying the arithmetic–geometric mean inequality, see Sect. 2.1.1.1, we conclude
that for all x_1 > 0, …, x_{n−1} > 0, we get
\[
q\left(x_1, \ldots, x_{n-1}\right) \ \ge\ \prod_{k=1}^{n} \prod_{i :\, i \ne k} \left( \sum_{j=1}^{n-1} a_{ij} x_j \right)^{a_{kn}}
= \prod_{i=1}^{n} \prod_{k :\, k \ne i} \left( \sum_{j=1}^{n-1} a_{ij} x_j \right)^{a_{kn}}
= \prod_{i=1}^{n} \left( \sum_{j=1}^{n-1} a_{ij} x_j \right)^{1 - a_{in}}.
\]
Using the arithmetic–geometric mean inequality again, we conclude that for all x_1 >
0, …, x_{n−1} > 0, we have
\[
\begin{aligned}
q\left(x_1, \ldots, x_{n-1}\right) \ &\ge\ \prod_{i=1}^{n} \left( \left(1 - a_{in}\right) \sum_{j=1}^{n-1} \frac{a_{ij}}{1 - a_{in}}\, x_j \right)^{1 - a_{in}} \\
&\ge\ \prod_{i=1}^{n} \left( \left(1 - a_{in}\right)^{1 - a_{in}} \prod_{j=1}^{n-1} x_j^{a_{ij}} \right)
= \left( \prod_{i=1}^{n} \left(1 - a_{in}\right)^{1 - a_{in}} \right) \left( \prod_{j=1}^{n-1} x_j \right).
\end{aligned}
\]
Therefore,
\[
\inf_{x_1, \ldots, x_{n-1} > 0} \frac{q\left(x_1, \ldots, x_{n-1}\right)}{x_1 \cdots x_{n-1}} \ \ge\ \prod_{i=1}^{n} \left(1 - a_{in}\right)^{1 - a_{in}}.
\]

By (3.3.4.2), we must have
\[
\prod_{i=1}^{n} \left(1 - a_{in}\right)^{1 - a_{in}} \ \le\ \left( \frac{n-1}{n} \right)^{n-1}.
\tag{3.3.4.3}
\]
Now, since the function t ⟼ t ln t is strictly convex for t > 0, see Sect. 2.1.1.2, we
conclude that
\[
\frac{1}{n} \sum_{i=1}^{n} t_i \ln t_i \ \ge\ \frac{t_1 + \ldots + t_n}{n} \ln \frac{t_1 + \ldots + t_n}{n}
\]
for all t_1, …, t_n > 0 with equality if and only if t_1 = … = t_n. Applying it with t_i =
1 − a_{in}, we get
\[
\frac{1}{n} \sum_{i=1}^{n} \left(1 - a_{in}\right) \ln \left(1 - a_{in}\right) \ \ge\ \frac{n-1}{n} \ln \frac{n-1}{n}
\]
with equality if and only if a_{in} = 1/n for i = 1, …, n. In other words,
\[
\prod_{i=1}^{n} \left(1 - a_{in}\right)^{1 - a_{in}} \ \ge\ \left( \frac{n-1}{n} \right)^{n-1}
\]
with equality if and only if a_{in} = 1/n for i = 1, …, n. Comparing this with (3.3.4.3),
we conclude that if per A = n!/n^n, we must have a_{in} = 1/n for i = 1, …, n. Since
the matrix obtained from a doubly stochastic matrix by a permutation of columns
remains doubly stochastic with the same permanent, we conclude that a_{ij} = 1/n for
all i and j, as desired. □
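Theorem 3.3.2 can be spot-checked in exact arithmetic: per((1/n)J_n) = n!/n^n, and mixing (1/n)J_n with a permutation matrix, which keeps the matrix doubly stochastic, strictly increases the permanent. (The specific n and mixing weight below are our arbitrary choices.)

```python
# Exact check of the van der Waerden bound and strictness off the minimizer.
from fractions import Fraction
from itertools import permutations
from math import factorial

def per(A):
    n = len(A)
    total = Fraction(0)
    for s in permutations(range(n)):
        p = Fraction(1)
        for i in range(n):
            p *= A[i][s[i]]
        total += p
    return total

n = 4
Jn = [[Fraction(1, n)] * n for _ in range(n)]
assert per(Jn) == Fraction(factorial(n), n ** n)

t = Fraction(1, 3)   # mix (1/n)J_n with the identity permutation matrix
B = [[(1 - t) * Jn[i][j] + t * (1 if i == j else 0)
      for j in range(n)] for i in range(n)]
assert all(sum(row) == 1 for row in B)              # still doubly stochastic
assert per(B) > Fraction(factorial(n), n ** n)      # strict, since B != Jn/n
```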
3.3.5 Sharpening. Suppose that A is a doubly stochastic matrix and that, additionally, the j-th column of A contains not more than k_j non-zero entries for some
1 ≤ k_j ≤ n and j = 1, …, n. Using Theorem 2.4.3, we obtain
\[
\operatorname{per} A = \frac{\partial^n}{\partial x_1 \cdots \partial x_n}\, p \ \ge\ \prod_{j=1}^{n} \left( \frac{k_j - 1}{k_j} \right)^{k_j - 1}
\tag{3.3.5.1}
\]
or, even sharper,
\[
\operatorname{per} A \ \ge\ \prod_{j=1}^{n} \left( \frac{\min\{j, k_j\} - 1}{\min\{j, k_j\}} \right)^{\min\{j, k_j\} - 1},
\tag{3.3.5.2}
\]
where the corresponding factor is interpreted as 1 if min{j, k_j} = 1. Inequalities (3.3.5.1) and
(3.3.5.2) are also due to Gurvits [Gu08]. In the case when k_j = 3 for all j, the
inequality (3.3.5.2) was obtained by Voorhoeve [Vo79] and in the case when all k_j
are equal, the inequality (3.3.5.1) was obtained by Schrijver [Sc98]. In the case of
all k_j equal, we will give a different proof of (3.3.5.1) in the particular case when
the non-zero entries of A are 1/k in Theorem 5.3.6, where we also show, following
Csikvári [Cs14], that asymptotically, as n grows, the bound is logarithmically exact.
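As a tiny illustration of (3.3.5.1) (ours): for the 3 × 3 doubly stochastic matrix with zero diagonal and off-diagonal entries 1/2, every column has k_j = 2 non-zero entries, only the two 3-cycles contribute to the permanent, and the bound ∏_j ((k_j − 1)/k_j)^{k_j − 1} = 1/8 holds with room to spare.

```python
# per A = 2 * (1/2)^3 = 1/4 >= (1/2)^3 = 1/8 for this sparse matrix.
from fractions import Fraction
from itertools import permutations

def per(A):
    n = len(A)
    total = Fraction(0)
    for s in permutations(range(n)):
        p = Fraction(1)
        for i in range(n):
            p *= A[i][s[i]]
        total += p
    return total

h = Fraction(1, 2)
A = [[0, h, h],
     [h, 0, h],
     [h, h, 0]]                 # doubly stochastic, k_j = 2 for every column
bound = Fraction(1, 2) ** 3     # prod_j ((k_j - 1)/k_j)**(k_j - 1)
assert per(A) >= bound
```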

3.4 The Bregman–Minc Inequality and Its Corollaries

The following inequality was conjectured by Minc, cf. [Mi78], and proved by
Bregman [Br73]. We follow the approach of Radhakrishnan [Ra97], only using the
language of partitions instead that of random variables.
3.4.1 Theorem. Let $A = \left(a_{ij}\right)$ be an $n \times n$ matrix such that $a_{ij} \in \{0, 1\}$ for all $i, j$. Let
$$r_i = \sum_{j=1}^n a_{ij}$$
be the number of 1s in the $i$-th row of $A$. Then
$$\operatorname{per} A \;\le\; \prod_{i=1}^n \left(r_i!\right)^{1/r_i}.$$
Let us define
$$\Sigma = \left\{\sigma \in S_n:\ a_{i\sigma(i)} = 1 \text{ for } i = 1, \ldots, n\right\}.$$
Hence
$$\operatorname{per} A = |\Sigma|.$$
Without loss of generality, we assume that $\Sigma \ne \emptyset$, in which case we consider $\Sigma$ as a probability space with the uniform measure.

We start with a probabilistic argument.
3.4.2 Lemma. Let us fix a permutation $\sigma \in \Sigma$ and an index $1 \le i \le n$. Let us choose a permutation $\tau \in S_n$ uniformly at random, find $k$ such that $\tau(k) = i$ and cross out from $A$ the columns indexed by $\sigma(\tau(1)), \ldots, \sigma(\tau(k-1))$. Let $x$ be the number of 1s remaining in the $i$-th row of $A$ after the columns are crossed out. Then
$$\Pr\left(x = a\right) = \frac{1}{r_i} \quad \text{for } a = 1, \ldots, r_i.$$
Proof. Let $J$ be the set of indices of columns where the $i$-th row of $A$ contains 1 and let $I = \sigma^{-1}(J)$. Then $i \in I$ and $x$ is the number of indices in $\tau^{-1}(I)$ that are greater than or equal to $k = \tau^{-1}(i)$. Since $\tau \in S_n$ is chosen uniformly at random, $\tau^{-1}(i)$ is equally likely to be the largest, second largest, etc. element of $\tau^{-1}(I)$. $\square$
3.4.3 Proof of Theorem 3.4.1

For a permutation $\tau \in S_n$ we construct a family of successively finer partitions
$$F_{\tau,0},\ F_{\tau,1},\ \ldots,\ F_{\tau,n}$$
of $\Sigma$ as follows. We let $F_{\tau,0} = \{\Sigma\}$. The partition $F_{\tau,1}$ consists of the events
$$F_i = \left\{\sigma \in \Sigma:\ \sigma(\tau(1)) = i\right\} \quad \text{for } i = 1, \ldots, n$$
(note that not more than $r_{\tau(1)}$ of the events $F_i$ are non-empty). Generally, the partition $F_{\tau,k}$ consists of the events
$$F_{i_1,\ldots,i_k} = \left\{\sigma \in \Sigma:\ \sigma(\tau(1)) = i_1, \ldots, \sigma(\tau(k)) = i_k\right\} \quad \text{for distinct } 1 \le i_1, \ldots, i_k \le n$$
(again, some of the events can be empty). In particular, the non-empty events in $F_{\tau,n}$ are singletons. From (2.1.2.4), using that $H(\{\Sigma\}) = 0$ and $H(F_{\tau,n}) = \ln|\Sigma|$, we obtain
$$\ln|\Sigma| = \sum_{k=1}^n H\left(F_{\tau,k} \mid F_{\tau,k-1}\right).$$
Averaging over all $\tau \in S_n$, we obtain
$$\ln|\Sigma| = \frac{1}{n!}\sum_{\tau \in S_n}\sum_{k=1}^n H\left(F_{\tau,k} \mid F_{\tau,k-1}\right). \tag{3.4.3.1}$$
For a permutation $\sigma \in \Sigma$, let $F_{\tau,k-1}(\sigma)$ be the block of $F_{\tau,k-1}$ that contains $\sigma$. We consider $F_{\tau,k-1}(\sigma)$ as a probability space with the conditional probability measure and, abusing notation slightly, let $F_{\tau,k-1}(\sigma)$ also denote the partition of that space by the events of $F_{\tau,k}$. Then
$$H\left(F_{\tau,k} \mid F_{\tau,k-1}\right) = \sum_{\sigma \in \Sigma}\Pr(\sigma)\, H\left(F_{\tau,k-1}(\sigma)\right),$$
cf. (2.1.2.3), and by (3.4.3.1) we have
$$\ln|\Sigma| = \frac{1}{n!}\sum_{\tau \in S_n}\sum_{k=1}^n\sum_{\sigma \in \Sigma}\Pr(\sigma)\, H\left(F_{\tau,k-1}(\sigma)\right) = \sum_{\sigma \in \Sigma}\Pr(\sigma)\left(\frac{1}{n!}\sum_{\tau \in S_n}\sum_{k=1}^n H\left(F_{\tau,k-1}(\sigma)\right)\right). \tag{3.4.3.2}$$
We fix an arbitrary $\sigma \in \Sigma$ and consider the sum
$$\frac{1}{n!}\sum_{\tau \in S_n}\sum_{k=1}^n H\left(F_{\tau,k-1}(\sigma)\right). \tag{3.4.3.3}$$
Recall that $F_{\tau,k-1}(\sigma)$ is the partition of the probability space consisting of all permutations $\pi \in \Sigma$ such that $\pi(\tau(1)) = \sigma(\tau(1)), \ldots, \pi(\tau(k-1)) = \sigma(\tau(k-1))$ into the events defined by the choice of $\pi(\tau(k))$. We rearrange (3.4.3.3) in accordance with the value of $i = \tau(k)$:
$$\frac{1}{n!}\sum_{\tau \in S_n}\sum_{k=1}^n H\left(F_{\tau,k-1}(\sigma)\right) = \sum_{i=1}^n\frac{1}{n!}\sum_{\tau \in S_n} H\left(F_{\tau,\tau^{-1}(i)-1}(\sigma)\right) \tag{3.4.3.4}$$
and consider each term
$$\frac{1}{n!}\sum_{\tau \in S_n} H\left(F_{\tau,\tau^{-1}(i)-1}(\sigma)\right) \tag{3.4.3.5}$$
separately.

Now, the partition $F_{\tau,\tau^{-1}(i)-1}(\sigma)$ looks as follows. We fixed $\sigma \in \Sigma$ and $1 \le i \le n$. For the permutation $\tau$, we find $k$ such that $\tau(k) = i$, consider the probability space of all permutations $\pi \in \Sigma$ such that $\pi(\tau(1)) = \sigma(\tau(1)), \ldots, \pi(\tau(k-1)) = \sigma(\tau(k-1))$, endowed with the uniform probability measure, and partition it according to the value of $\pi(i)$. By (2.1.2.2),
$$H\left(F_{\tau,\tau^{-1}(i)-1}(\sigma)\right) \le \ln a \quad \text{provided } F_{\tau,\tau^{-1}(i)-1}(\sigma) \text{ contains } a \text{ events}.$$
By Lemma 3.4.2, the value of (3.4.3.5) does not exceed
$$\frac{1}{r_i}\sum_{a=1}^{r_i}\ln a = \frac{1}{r_i}\ln\left(r_i!\right).$$
Then by (3.4.3.4), the value of (3.4.3.3) does not exceed
$$\sum_{i=1}^n\frac{1}{r_i}\ln\left(r_i!\right).$$
By (3.4.3.2), we get
$$\ln|\Sigma| \le \sum_{i=1}^n\frac{1}{r_i}\ln\left(r_i!\right),$$
and the proof follows. $\square$
3.4.4 Remark. Let $J_r$ be the $r \times r$ matrix filled with 1s. If $A$ is a block-diagonal matrix with blocks $J_{r_1}, \ldots, J_{r_m}$, then
$$\operatorname{per} A = \prod_{i=1}^m r_i!,$$
from which it follows that the bound of Theorem 3.4.1 is sharp.
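The bound and its sharpness are easy to check numerically for small matrices; the following sketch (the helper names are ours, not from the text) computes the permanent directly from the defining sum over permutations and compares it with the Bregman–Minc bound on a block-diagonal matrix with blocks $J_2$ and $J_3$.

```python
from itertools import permutations
from math import factorial, prod

def permanent(A):
    # per A = sum over permutations sigma of prod_i A[i][sigma(i)]
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def bregman_minc_bound(A):
    # prod_i (r_i!)^(1/r_i), where r_i is the number of 1s in row i
    return prod(factorial(sum(row)) ** (1.0 / sum(row)) for row in A)

# 0/1 block-diagonal matrix with blocks J_2 and J_3
A = [[1, 1, 0, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1],
     [0, 0, 1, 1, 1]]
print(permanent(A))             # 2! * 3! = 12
print(bregman_minc_bound(A))    # (2!)^(2/2) * (3!)^(3/3) = 12: the bound is attained
```

As the remark predicts, the bound is attained exactly on such block-diagonal matrices.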

Theorem 3.4.1 allows us to bound permanents of stochastic matrices.

3.4.5 Corollary. Suppose that $A = \left(a_{ij}\right)$ is an $n \times n$ stochastic matrix, that is, $a_{ij} \ge 0$ for all $i, j$ and
$$\sum_{j=1}^n a_{ij} = 1 \quad \text{for all } i = 1, \ldots, n. \tag{3.4.5.1}$$
Suppose that
$$a_{ij} \le \frac{1}{b_i} \quad \text{for all } i, j \tag{3.4.5.2}$$
and some positive integers $b_1, \ldots, b_n$. Then
$$\operatorname{per} A \;\le\; \prod_{i=1}^n\frac{\left(b_i!\right)^{1/b_i}}{b_i}.$$
Proof. Let us fix all but the $i$-th row of an $n \times n$ matrix $A$ and allow the $i$-th row to vary. Then $\operatorname{per} A$ is a linear function of the $i$-th row $a_i = \left(a_{i1}, \ldots, a_{in}\right)$. Let us consider the polytope $P_i$ of all $n$-vectors $a_i = \left(a_{i1}, \ldots, a_{in}\right)$ such that all entries $a_{ij}$ are non-negative and the conditions (3.4.5.1) and (3.4.5.2) are met. By linearity, the maximum value of $\operatorname{per} A$ on $P_i$ is attained at a vertex of $P_i$, in which case we necessarily have $a_{ij} \in \{0, 1/b_i\}$ for $j = 1, \ldots, n$. Indeed, if $0 < a_{ij_1} < 1/b_i$ for some $j_1$ then there is another $j_2 \ne j_1$ such that $0 < a_{ij_2} < 1/b_i$ (recall that $b_i$ is an integer). In that case, we can write $a_i = \left(a_i^1 + a_i^2\right)/2$, where $a_i^1$ is obtained from $a_i$ by the perturbation $a_{ij_1} := a_{ij_1} + \epsilon$, $a_{ij_2} := a_{ij_2} - \epsilon$, and $a_i^2$ is obtained from $a_i$ by the perturbation $a_{ij_1} := a_{ij_1} - \epsilon$, $a_{ij_2} := a_{ij_2} + \epsilon$, for a sufficiently small $\epsilon > 0$, which implies that $a_i$ is not a vertex of $P_i$.

Hence we conclude that the maximum of $\operatorname{per} A$ on the set of $n \times n$ non-negative matrices $A = \left(a_{ij}\right)$ satisfying (3.4.5.1) and (3.4.5.2) is attained when $a_{ij} \in \{0, 1/b_i\}$ for all $i, j$. Let $B$ be the matrix obtained from such a matrix $A$ by multiplying the $i$-th row by $b_i$ for $i = 1, \ldots, n$. Then
$$\operatorname{per} B = \left(\prod_{i=1}^n b_i\right)\operatorname{per} A \quad \text{and} \quad \operatorname{per} B \le \prod_{i=1}^n\left(b_i!\right)^{1/b_i}$$
by Theorem 3.4.1. $\square$

The author learned Corollary 3.4.5 and its proof from A. Samorodnitsky [Sa01]; see also [So03] for a somewhat more general statement with $b_1, \ldots, b_n$ not required to be integers.
3.4.6 Concentration of the permanent of doubly stochastic matrices. The van der Waerden bound (Theorem 3.3.2) together with the Bregman–Minc bound (Corollary 3.4.5) implies that $\operatorname{per} A$ does not vary much if $A$ is a doubly stochastic matrix with small entries. Indeed, suppose that $A$ is an $n \times n$ doubly stochastic matrix. Then, by Theorem 3.3.2,
$$\operatorname{per} A \ge \frac{n!}{n^n} \ge e^{-n}.$$
Let us fix an $\alpha \ge 1$ and suppose that, additionally,
$$a_{ij} \le \frac{\alpha}{n} \quad \text{for all } i, j.$$
Let
$$b = \left\lfloor\frac{n}{\alpha}\right\rfloor,$$
so that
$$a_{ij} \le \frac{1}{b} \quad \text{for all } i, j,$$
and by Corollary 3.4.5,
$$\operatorname{per} A \le \left(\frac{(b!)^{1/b}}{b}\right)^n = e^{-n} n^{O(\alpha)}.$$
Hence if the entries of an $n \times n$ doubly stochastic matrix are within a constant factor of each other, the permanent of the matrix varies within a polynomial in $n$ factor.
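Both sides of this two-sided control can be checked on a concrete example; the sketch below (the helper name is ours) uses a $5 \times 5$ circulant doubly stochastic matrix with entries at most $\alpha/n$ for $\alpha = 2$.

```python
from itertools import permutations
from math import factorial, prod, floor

def permanent(A):
    # per A as the sum over all permutations
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

n, alpha = 5, 2.0
row = [0.4, 0.3, 0.2, 0.1, 0.0]              # all entries at most alpha/n = 0.4
A = [row[-i:] + row[:-i] for i in range(n)]  # circulant, hence doubly stochastic

lower = factorial(n) / n**n                  # van der Waerden bound
b = floor(n / alpha)                         # here b = 2 and a_ij <= 1/b
upper = (factorial(b) ** (1.0 / b) / b) ** n # Corollary 3.4.5 with b_i = b
p = permanent(A)
print(lower <= p <= upper)                   # True
```

Here the window $[n!/n^n,\ ((b!)^{1/b}/b)^n] \approx [0.038,\ 0.177]$ already pins the permanent within a modest factor, matching the $e^{-n} n^{O(\alpha)}$ picture above.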

In fact,
$$\prod_{i,j=1}^n\left(1 - a_{ij}\right)^{1 - a_{ij}} \;\le\; \operatorname{per} A \;\le\; 2^n\prod_{i,j=1}^n\left(1 - a_{ij}\right)^{1 - a_{ij}} \tag{3.4.6.1}$$
for any $n \times n$ doubly stochastic matrix $A$ (if $a_{ij} = 1$ the corresponding factor is 1), where the lower bound is due to Schrijver [Sc98] and the upper bound was recently established by Gurvits and Samorodnitsky [GS14], who also conjectured that the upper bound holds with $2^n$ replaced by $2^{n/2}$.

The following useful inequality was conjectured by Vontobel [Vo13] and deduced by Gurvits [Gu11] from the lower bound in (3.4.6.1): Let $A = \left(a_{ij}\right)$ be an $n \times n$ positive real matrix and let $B = \left(b_{ij}\right)$ be an $n \times n$ doubly stochastic matrix. Then
$$\ln\operatorname{per} A \;\ge\; \sum_{i,j=1}^n b_{ij}\ln\frac{a_{ij}}{b_{ij}} + \sum_{i,j=1}^n\left(1 - b_{ij}\right)\ln\left(1 - b_{ij}\right).$$
We prove the inequality in Theorem 5.4.2 following the approach of Lelarge [Le15]. Note that if $A$ is doubly stochastic, then choosing $B = A$ recovers the lower bound in (3.4.6.1).
3.5 Matrix Scaling

Results of Sects. 3.3 and 3.4 provide us with some rather useful estimates of permanents of doubly stochastic matrices. It turns out that computing the permanent of any positive real matrix can be easily reduced to computing the permanent of a doubly stochastic matrix.

3.5.1 Matrix scaling. Let $A = \left(a_{ij}\right)$ be an $n \times n$ matrix. We say that $A$ is obtained by scaling from an $n \times n$ matrix $B = \left(b_{ij}\right)$ if
$$a_{ij} = \lambda_i\mu_j b_{ij} \quad \text{for all } i, j$$
and some numbers $\lambda_1, \ldots, \lambda_n, \mu_1, \ldots, \mu_n$. We note that in this case
$$\operatorname{per} A = \left(\prod_{i=1}^n\lambda_i\right)\left(\prod_{j=1}^n\mu_j\right)\operatorname{per} B. \tag{3.5.1.1}$$
 
3.5.2 Theorem. For any n × n matrix A = ai j such that

ai j > 0 for all i, j,


 
there exists a unique n×n doubly stochastic matrix B = bi j and positive λ1 , . . . , λn
and μ1 , . . . , μn such that

ai j = λi μ j bi j for all i, j. (3.5.2.1)

The numbers λi and μ j are unique up to a rescaling

λi −→ λi τ , μ j −→ μ j τ −1

for some τ > 0.

Proof. Without loss of generality, we may assume that $n \ge 2$. Let $\Delta_n$ be the polytope of all $n \times n$ doubly stochastic matrices $X = \left(x_{ij}\right)$ and let us consider the function $f: \Delta_n \longrightarrow \mathbb{R}$ defined by
$$f(X) = \sum_{i,j=1}^n x_{ij}\ln\frac{x_{ij}}{a_{ij}}.$$
Then $f$ is a strictly convex function, cf. Sect. 2.1.1.2, and hence it attains its unique minimum, say $B = \left(b_{ij}\right)$, on $\Delta_n$.

First, we establish that $b_{ij} > 0$ for all $i, j$. Indeed,
$$\frac{\partial}{\partial x_{ij}}f(X) = \ln\frac{x_{ij}}{a_{ij}} + 1. \tag{3.5.2.2}$$
If $x_{ij} = 0$ we consider the right derivative and conclude that it is equal to $-\infty$, while for any $x_{ij} > 0$ the derivative is finite. Let $\frac{1}{n}J_n \in \Delta_n$ be the matrix with all entries equal to $1/n$ and let $B(t) = (1-t)B + \frac{t}{n}J_n$, so that $B(0) = B$ and $B(1) = \frac{1}{n}J_n$. If $b_{ij} = 0$ for some $i, j$ then for all sufficiently small $t > 0$ we have
$$f(B(t)) < f(B),$$
which contradicts the definition of $B$ as the minimum point of $f$.

Thus $B$ is a positive matrix and therefore lies in the relative interior of $\Delta_n$. It follows from (3.5.2.2) by the Lagrange multiplier conditions that there are numbers $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ such that
$$\ln\frac{b_{ij}}{a_{ij}} = \alpha_i + \beta_j \quad \text{for all } i, j.$$
Letting
$$\lambda_i = e^{-\alpha_i} \quad \text{and} \quad \mu_j = e^{-\beta_j},$$
we obtain (3.5.2.1).

On the other hand, if a doubly stochastic matrix $B = \left(b_{ij}\right)$ satisfies (3.5.2.1) then necessarily $b_{ij} > 0$ for all $i, j$ and $B$ is a critical point of $f$ on $\Delta_n$. Since $f$ is strictly convex, $B$ must be the unique minimum point of $f$ on $\Delta_n$, which proves the uniqueness of $B$. From (3.5.2.1) and the uniqueness of $B$, we obtain the uniqueness of the $\lambda_i$ and $\mu_j$ up to a rescaling. $\square$
Scaling can be obtained by solving a different optimization problem.

3.5.3 Lemma. Let $A = \left(a_{ij}\right)$ be an $n \times n$ positive matrix. Let us define a function $g_A: \mathbb{R}^n \oplus \mathbb{R}^n \longrightarrow \mathbb{R}$ by
$$g_A(x, y) = \sum_{i,j=1}^n a_{ij}e^{x_i + y_j} \quad \text{where } x = \left(x_1, \ldots, x_n\right) \text{ and } y = \left(y_1, \ldots, y_n\right),$$
and let $L \subset \mathbb{R}^n \oplus \mathbb{R}^n$ be the subspace defined by the equations
$$\sum_{i=1}^n x_i = \sum_{j=1}^n y_j = 0.$$
Then $g_A$ attains its minimum on $L$ at some point $\left(x^*, y^*\right)$, where $x^* = \left(\xi_1, \ldots, \xi_n\right)$ and $y^* = \left(\eta_1, \ldots, \eta_n\right)$. Let
$$\lambda_i = e^{-\xi_i}\sqrt{\frac{g_A\left(x^*, y^*\right)}{n}} \quad \text{and} \quad \mu_j = e^{-\eta_j}\sqrt{\frac{g_A\left(x^*, y^*\right)}{n}}$$
for all $i, j$ and let us define an $n \times n$ matrix $B = \left(b_{ij}\right)$ by
$$b_{ij} = \lambda_i^{-1}\mu_j^{-1}a_{ij} \quad \text{for all } i, j.$$
Then $B$ is a doubly stochastic matrix.
Proof. First, we claim that the minimum of $g_A$ on $L$ is indeed attained at some point. Let
$$\delta = \min_{i,j} a_{ij} > 0.$$
Since for all $(x, y) \in L$ we have $x_i \ge 0$ and $y_j \ge 0$ for some $i$ and $j$, we have
$$g_A(x, y) > g_A(0, 0) \quad \text{if} \quad x_i > \ln\frac{g_A(0, 0)}{\delta} \quad \text{or} \quad y_j > \ln\frac{g_A(0, 0)}{\delta}$$
for some $i, j$. On the other hand, if for some $(x, y) \in L$ we have $x_i < -t$ for some $t > 0$ then $x_j > t/n$ for some $j$ and, similarly, if $y_i < -t$ for some $t > 0$ then $y_j > t/n$ for some $j$. Therefore, the minimum of $g_A$ on $L$ is attained on the compact subset
$$|x_i|,\ \left|y_j\right| \le n\ln\frac{g_A(0, 0)}{\delta} \quad \text{for all } i, j.$$
At the minimum point, the gradient of $g_A(x, y)$ is orthogonal to $L$, so for some $\alpha$ and $\beta$ we have
$$\sum_{j=1}^n a_{ij}e^{\xi_i + \eta_j} = \alpha \quad \text{for } i = 1, \ldots, n$$
and
$$\sum_{i=1}^n a_{ij}e^{\xi_i + \eta_j} = \beta \quad \text{for } j = 1, \ldots, n. \tag{3.5.3.1}$$
Summing the first set of equations over $i = 1, \ldots, n$ and the second set of equations over $j = 1, \ldots, n$, we conclude that
$$\sum_{i,j=1}^n a_{ij}e^{\xi_i + \eta_j} = n\alpha = n\beta,$$
so
$$\alpha = \beta = \frac{1}{n}g_A\left(x^*, y^*\right)$$
and the proof follows from (3.5.3.1). $\square$
3.5.4 Remark. Theorem 3.5.2 was proved by Sinkhorn [Si64], who used a different
approach. He showed that, given a positive matrix A, the repeated row and column
scaling (first, scale all rows to row sum 1, then scale all columns to column sum 1,
then again rows, then again columns, etc.) converges to the desired doubly stochastic
matrix B. An approach to scaling via a solution of an appropriate optimization
problem (similar to our Lemma 3.5.3) was used in [MO68] and several other papers
since then.
Clearly, not every non-negative matrix can be scaled to a doubly stochastic one (for example, the matrix of all zeros cannot). Some non-negative matrices can be scaled arbitrarily close to doubly stochastic, but cannot be scaled exactly, for example the matrix
$$A = \begin{pmatrix} 1 & 0\\ 1 & 1\end{pmatrix}.$$
Indeed, multiplying the first column by $\epsilon > 0$ and the first row by $\epsilon^{-1}$, we obtain the matrix
$$B = \begin{pmatrix} 1 & 0\\ \epsilon & 1\end{pmatrix}$$
with row and column sums arbitrarily close to 1, but never exactly 1. It is shown in [L+00] that a non-negative matrix $A$ can be scaled arbitrarily close to a doubly stochastic matrix if and only if $\operatorname{per} A > 0$, and that it can be scaled exactly to a doubly stochastic matrix if, in addition, whenever for a set $I \subset \{1, \ldots, n\}$ of rows and a set $J \subset \{1, \ldots, n\}$ of columns with $|I| + |J| = n$ we have $a_{ij} = 0$ for all $i \in I$ and $j \in J$, we must also have $a_{ij} = 0$ for all $i \notin I$ and $j \notin J$. The conditions for approximate and exact scaling can be efficiently (in polynomial time) verified. Also, [L+00] contains the fastest known algorithm for matrix scaling.
As is observed in [L+00], formula (3.5.1.1) together with the inequality
$$\frac{n!}{n^n} \le \operatorname{per} B \le 1$$
for the permanent of a doubly stochastic matrix $B$ allows one to estimate the permanent of any $n \times n$ non-negative matrix $A$ within a multiplicative factor of roughly $e^n$, and the inequality (3.4.6.1) improves the factor further to $2^n$ (and, conjecturally, to $2^{n/2}$). Computationally, matrix scaling is very efficient, and in view of Sect. 3.4.6 it is natural to ask for which matrices $A$ their doubly stochastic scaling $B$ will not have large entries, so that a better upper bound on $\operatorname{per} B$ can be used.
 
3.5.5 Definition. Let A = ai j be an n × n positive matrix. For α ≥ 1 we say that
A is α-conditioned if

ai j1 ≤ αai j2 for any 1 ≤ i, j1 , j2 ≤ n

and
ai1 j ≤ αai2 j for any 1 ≤ i 1 , i 2 , j ≤ n.

In words: an n × n positive matrix is α-conditioned if the ratio of any two entries of


A in the same row and the ratio of any two entries of A in the same column do not
exceed α.

3.5.6 Lemma. Let $A$ be an $n \times n$ matrix which is $\alpha$-conditioned for some $\alpha \ge 1$. Let $B = \left(b_{ij}\right)$ be the doubly stochastic matrix obtained from $A$ by scaling. Then $B$ is $\alpha^2$-conditioned. In particular,
$$b_{ij} \le \frac{\alpha^2}{n} \quad \text{for all } i, j.$$
Proof. Let $A = \left(a_{ij}\right)$ and let $\lambda_1, \ldots, \lambda_n$ and $\mu_1, \ldots, \mu_n$ be positive reals such that
$$b_{ij} = \lambda_i\mu_j a_{ij} \quad \text{for all } i, j.$$
Then
$$\frac{b_{ij_1}}{b_{ij_2}} = \frac{\mu_{j_1}a_{ij_1}}{\mu_{j_2}a_{ij_2}} \le \alpha\frac{\mu_{j_1}}{\mu_{j_2}} \quad \text{for all } 1 \le j_1, j_2 \le n. \tag{3.5.6.1}$$
Since
$$\sum_{i=1}^n b_{ij_1} = \sum_{i=1}^n b_{ij_2} = 1,$$
we conclude that
$$\frac{\mu_{j_1}}{\mu_{j_2}} \ge \frac{1}{\alpha} \quad \text{for all } j_1, j_2.$$
On the other hand, since
$$\frac{a_{ij_1}}{a_{ij_2}} \ge \frac{1}{\alpha} \quad \text{for all } j_1, j_2,$$
from (3.5.6.1) we conclude that
$$\frac{b_{ij_1}}{b_{ij_2}} \ge \frac{1}{\alpha^2} \quad \text{for all } j_1, j_2. \tag{3.5.6.2}$$
Similarly, we prove that
$$\frac{b_{i_1j}}{b_{i_2j}} \ge \frac{1}{\alpha^2} \quad \text{for all } i_1, i_2,$$
and hence $B$ is $\alpha^2$-conditioned. Since
$$\sum_{j=1}^n b_{ij} = 1 \quad \text{for all } i = 1, \ldots, n,$$
we have
$$b_{ij} \le \frac{1}{n} \quad \text{for every } i \text{ and some } j,$$
and the proof follows by (3.5.6.2). $\square$
Lemma 3.5.6, together with the observation of Sect. 3.4.6 and formula (3.5.1.1), allows us, given an $n \times n$ positive matrix $A$ whose entries are within a constant factor of each other, to compute $\operatorname{per} A$ by scaling within a polynomial in $n$ factor.
Although the scaling factors λ1 , . . . , λn and μ1 , . . . , μn are not uniquely defined
by the matrix, Theorem 3.5.2 implies that their product λ1 · · · λn μ1 · · · μn is a function
of the matrix. It has some interesting convex properties.
 
3.5.7 Lemma. For ann ×n positive matrix A = ai j , let us define a number f (A)
as follows: Let B = bi j be a doubly stochastic matrix and let λ1 , . . . , λn and
μ1 , . . . , μn be positive numbers such that

ai j = λi μ j bi j for all i, j.

Let  n ⎛ n ⎞
 
f (A) = λi ⎝ μ j ⎠ .
i=1 j=1

Then $f$ is well-defined and satisfies the following properties:

(1) Function $f$ is homogeneous of degree $n$:
$$f(\alpha A) = \alpha^n f(A) \quad \text{for all } \alpha > 0$$
and all positive $n \times n$ matrices $A$;

(2) Function $f$ is monotone:
$$f(C) \le f(A)$$
for any positive $n \times n$ matrices $A = \left(a_{ij}\right)$ and $C = \left(c_{ij}\right)$ such that
$$c_{ij} \le a_{ij} \quad \text{for all } i, j;$$

(3) Function $f^{1/n}$ is concave:
$$f^{1/n}\left(\alpha_1A_1 + \alpha_2A_2\right) \ge \alpha_1f^{1/n}\left(A_1\right) + \alpha_2f^{1/n}\left(A_2\right)$$
for any positive $n \times n$ matrices $A_1$ and $A_2$ and any $\alpha_1, \alpha_2 \ge 0$ such that $\alpha_1 + \alpha_2 = 1$.
Proof. Theorem 3.5.2 implies that $f$ is well-defined, and Part (1) is straightforward. As in Lemma 3.5.3, let us define
$$g_A(x, y) = \sum_{i,j=1}^n a_{ij}e^{x_i + y_j}$$
and let $L \subset \mathbb{R}^n \oplus \mathbb{R}^n$ be the subspace defined by the equations $x_1 + \ldots + x_n = 0$ and $y_1 + \ldots + y_n = 0$. Then, by Lemma 3.5.3,
$$f(A) = \frac{1}{n^n}\min_{(x,y)\in L}g_A^n(x, y).$$
Since $g_C(x, y) \le g_A(x, y)$ for all $(x, y) \in L$ provided $c_{ij} \le a_{ij}$ for all $i, j$, the proof of Part (2) follows.

We have
$$f^{1/n}(A) = \frac{1}{n}\min_{(x,y)\in L}g_A(x, y)$$
and hence for $A = \alpha_1A_1 + \alpha_2A_2$ we have
$$\begin{aligned} f^{1/n}(A) &= \frac{1}{n}\min_{(x,y)\in L}g_A(x, y) = \frac{1}{n}\min_{(x,y)\in L}\left(\alpha_1g_{A_1}(x, y) + \alpha_2g_{A_2}(x, y)\right)\\ &\ge \frac{\alpha_1}{n}\min_{(x,y)\in L}g_{A_1}(x, y) + \frac{\alpha_2}{n}\min_{(x,y)\in L}g_{A_2}(x, y) = \alpha_1f^{1/n}\left(A_1\right) + \alpha_2f^{1/n}\left(A_2\right), \end{aligned}$$
which completes the proof of Part (3). $\square$

It is not hard to see that the function f of Lemma 3.5.7 is the capacity

p (x1 , . . . , xn )
inf
x1 ,...,xn >0 x1 · · · xn

of the polynomial ⎛ ⎞

n n
p(x1 , . . . , xn ) = ⎝ ai j x j ⎠ ,
i=1 j=1

cf. Sect. 2.1.5 and Lemma 3.3.3.


We state the scaling theorem in the most general form (we will use it later in
Chap. 8).
3.5 Matrix Scaling 71

3.5.8 Theorem. Let $r = \left(r_1, \ldots, r_m\right)$ and $c = \left(c_1, \ldots, c_n\right)$ be positive integer vectors such that
$$\sum_{i=1}^m r_i = \sum_{j=1}^n c_j = N.$$
Then for any positive $m \times n$ matrix $A = \left(a_{ij}\right)$ there exist an $m \times n$ positive matrix $B = \left(b_{ij}\right)$ with row sums $r_1, \ldots, r_m$ and column sums $c_1, \ldots, c_n$ and positive reals $\lambda_1, \ldots, \lambda_m$ and $\mu_1, \ldots, \mu_n$ such that
$$a_{ij} = \lambda_i\mu_j b_{ij} \quad \text{for all } i, j.$$
Moreover, given $r$, $c$ and $A$, the matrix $B$ is unique and can be found as the minimum point of the function
$$f = \sum_{\substack{1\le i\le m\\ 1\le j\le n}} x_{ij}\ln\frac{x_{ij}}{a_{ij}}$$
on the polytope $\Delta_{r,c}$ of non-negative $m \times n$ matrices with row sums $r$ and column sums $c$. The numbers $\lambda_i$ and $\mu_j$ are unique up to a rescaling
$$\lambda_i \longrightarrow \lambda_i\tau, \quad \mu_j \longrightarrow \mu_j\tau^{-1}$$
for some $\tau > 0$ and can be found as follows. Let us define $g_A: \mathbb{R}^m \oplus \mathbb{R}^n \longrightarrow \mathbb{R}$ by
$$g_A(x, y) = \sum_{\substack{1\le i\le m\\ 1\le j\le n}} a_{ij}e^{x_i + y_j} \quad \text{for } x = \left(x_1, \ldots, x_m\right) \text{ and } y = \left(y_1, \ldots, y_n\right)$$
and let $L_{r,c} \subset \mathbb{R}^m \oplus \mathbb{R}^n$ be the subspace defined by the equations
$$\sum_{i=1}^m r_ix_i = 0 \quad \text{and} \quad \sum_{j=1}^n c_jy_j = 0.$$
Then the minimum of $g_A$ on $L_{r,c}$ is attained at some point $x^* = \left(\xi_1, \ldots, \xi_m\right)$ and $y^* = \left(\eta_1, \ldots, \eta_n\right)$, and we may let
$$\lambda_i = e^{-\xi_i}\sqrt{\frac{g_A\left(x^*, y^*\right)}{N}} \quad \text{and} \quad \mu_j = e^{-\eta_j}\sqrt{\frac{g_A\left(x^*, y^*\right)}{N}}$$
for all $i, j$. $\square$

The proof is very similar to those of Theorem 3.5.2 and Lemma 3.5.3 and is therefore omitted.
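The scaling of Theorem 3.5.8 can be illustrated by an alternating normalization to the prescribed margins, in the spirit of Sinkhorn's iteration (the function name is ours, and convergence of this particular iteration for positive matrices is assumed here, cf. [Si64, L+00]):

```python
def scale_to_margins(A, r, c, iterations=300):
    # Alternately scale row i to sum r_i and column j to sum c_j.
    # For a positive matrix A with sum(r) == sum(c) = N this converges
    # to the matrix B of Theorem 3.5.8.
    B = [row[:] for row in A]
    m, n = len(r), len(c)
    for _ in range(iterations):
        for i in range(m):
            s = sum(B[i])
            B[i] = [x * r[i] / s for x in B[i]]
        for j in range(n):
            s = sum(B[i][j] for i in range(m))
            for i in range(m):
                B[i][j] *= c[j] / s
    return B

A = [[1.0, 2.0], [3.0, 1.0], [1.0, 1.0]]
r, c = [2, 1, 1], [3, 1]          # margins with N = 4
B = scale_to_margins(A, r, c)
print([round(sum(row), 6) for row in B])   # [2.0, 1.0, 1.0]
```

The doubly stochastic case of Theorem 3.5.2 is recovered with $m = n$ and $r = c = (1, \ldots, 1)$.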
3.6 Permanents of Complex Matrices

In this section, we take a look at the permanents of matrices with complex entries. Such permanents are of interest in physics, see, for example, [AA13] and [Ka16]. First, we prove that the permanent of any matrix sufficiently close to the $n \times n$ matrix $J_n$ of all 1s is not 0.

3.6.1 Theorem. There exists an absolute constant $\delta_0 > 0$ (one can choose $\delta_0 = 0.5$) such that for any $n \times n$ matrix $A = \left(a_{ij}\right)$ with complex entries satisfying
$$\left|1 - a_{ij}\right| \le \delta_0 \quad \text{for all } i, j$$
we have
$$\operatorname{per} A \ne 0.$$

Geometrically, the $\ell^\infty$ distance from the matrix $J_n$ to the hypersurface $\operatorname{per} Z = 0$ in the space $\mathbb{C}^{n\times n}$ of $n \times n$ complex matrices is bounded below by a positive constant, independent of $n$. Later, in Theorem 5.5.3, we prove that $\operatorname{per} A \ne 0$ if the $\ell^1$ distance of every row and column of an $n \times n$ complex matrix $A$ to the vector of all 1s does not exceed $\gamma n$ for some absolute constant $\gamma > 0$ (one can choose $\gamma = 0.0696$). In view of Theorem 3.6.1, we can choose a branch of $\ln\operatorname{per} A$ for all matrices $A = \left(a_{ij}\right)$ satisfying $\left|1 - a_{ij}\right| \le \delta_0$ such that $\ln\operatorname{per} J_n$ is a real number, where $J_n$ is the $n \times n$ matrix of all 1s.
3.6.2 Theorem. Let us fix some $0 < \delta < \delta_0$, where $\delta_0$ is the constant of Theorem 3.6.1. Then there exists $\gamma = \gamma(\delta) > 0$ such that for any $\epsilon > 0$ and positive integer $n$ there exists a polynomial $p = p_{n,\delta,\epsilon}$ in the entries of an $n \times n$ complex matrix $A = \left(a_{ij}\right)$ satisfying
$$\deg p \le \gamma\left(\ln n - \ln\epsilon\right)$$
and
$$\left|\ln\operatorname{per} A - p(A)\right| \le \epsilon$$
provided
$$\left|1 - a_{ij}\right| \le \delta \quad \text{for all } i, j.$$

As we will see, the polynomial $p(A)$ can be efficiently computed. The gist of Theorem 3.6.2 is that $\ln\operatorname{per} A$ can be efficiently approximated by a low-degree polynomial in the vicinity of the matrix $J_n$ of all 1s and, in particular, $\operatorname{per} A$ can be approximated there within a relative error of $\epsilon$ in quasi-polynomial $n^{O(\ln n - \ln\epsilon)}$ time.

Theorems 3.6.1 and 3.6.2 were first proved in [B16b] with a worse constant $\delta_0 = 0.195$. Following [B16+], we give a much simplified proof achieving a better constant.

First we prove Theorem 3.6.1 and then deduce Theorem 3.6.2 from it. We identify $\mathbb{C} = \mathbb{R}^2$ and measure angles between complex numbers as vectors in the plane.
3.6.3 Lemma. Let $u_1, \ldots, u_n \in \mathbb{R}^2$ be non-zero vectors and suppose that the angle between any two vectors $u_i$ and $u_j$ does not exceed $\alpha$ for some $0 \le \alpha < 2\pi/3$. Let $u = u_1 + \ldots + u_n$. Then
$$|u| \ge \left(\cos\frac{\alpha}{2}\right)\sum_{i=1}^n\left|u_i\right|.$$
Proof. First, we note that 0 cannot lie in the convex hull of the vectors $u_1, \ldots, u_n$, since otherwise by the Carathéodory Theorem it would have lain in the convex hull of some three vectors $u_i, u_j, u_k$, and then the angle between some two of these three vectors would have been at least $2\pi/3$, see Fig. 3.5.

Hence the vectors $u_1, \ldots, u_n$ lie in an angle measuring at most $\alpha$. Let us consider the orthogonal projections of $u_1, \ldots, u_n$ onto the bisector of that angle, see Fig. 3.6. Then the length of the projection of $u_i$ is at least $\left|u_i\right|\cos(\alpha/2)$ and the length of the projection of $u$ is at least $\left(\left|u_1\right| + \ldots + \left|u_n\right|\right)\cos(\alpha/2)$. Since the length of $u$ is at least as large as the length of its orthogonal projection, the result follows. $\square$

In [B16b] a weaker bound with $\cos\alpha$ instead of $\cos(\alpha/2)$ is used (assuming that $\alpha < \pi/2$). The current enhancement is due to Bukh [Bu15].
Fig. 3.5 If the origin lies in the convex hull of the vectors then the angle between some two vectors is at least 2π/3

Fig. 3.6 Projecting vectors onto the bisector of the angle

3.6.4 Lemma. Let $u_1, \ldots, u_n \in \mathbb{C}$ be non-zero complex numbers such that the angle between any two vectors $u_i$ and $u_j$ does not exceed $\alpha$ for some $0 \le \alpha < 2\pi/3$, and let $0 \le \delta < \cos(\alpha/2)$ be a real number. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be complex numbers such that
$$\left|1 - a_j\right| \le \delta \quad \text{and} \quad \left|1 - b_j\right| \le \delta \quad \text{for } j = 1, \ldots, n.$$
Let
$$v = \sum_{j=1}^n a_ju_j \quad \text{and} \quad w = \sum_{j=1}^n b_ju_j.$$
Then $v \ne 0$, $w \ne 0$ and the angle between $v$ and $w$ does not exceed
$$2\arcsin\frac{\delta}{\cos(\alpha/2)}.$$
Proof. Let $u = u_1 + \ldots + u_n$. Then, by Lemma 3.6.3, $u \ne 0$ and
$$|u| \ge \left(\cos\frac{\alpha}{2}\right)\sum_{j=1}^n\left|u_j\right|.$$
By the triangle inequality, we have
$$|v - u| \le \sum_{j=1}^n\left|1 - a_j\right|\left|u_j\right| \le \delta\sum_{j=1}^n\left|u_j\right|.$$
Therefore, the angle between $v = (v - u) + u$ and $u$ does not exceed
$$\theta = \arcsin\frac{|v - u|}{|u|} \le \arcsin\frac{\delta}{\cos(\alpha/2)},$$
see Fig. 3.7. Similarly, the angle between $w$ and $u$ does not exceed $\theta$, and hence the angle between $v$ and $w$ does not exceed $2\theta$. $\square$
3.6.5 Proof of Theorem 3.6.1. Let us choose
$$\delta_0 = 0.5 \quad \text{and} \quad \alpha = \frac{\pi}{2}.$$

Fig. 3.7 The angle between a and a + b does not exceed arcsin(|b|/|a|) provided |b| < |a|

We denote by $U_n \subset \mathbb{C}^{n\times n}$ the closed polydisc consisting of the $n \times n$ complex matrices $A = \left(a_{ij}\right)$ such that
$$\left|1 - a_{ij}\right| \le \delta_0 \quad \text{for all } i, j.$$
We prove by induction on $n$ the following statement:

For every matrix $Z \in U_n$ we have $\operatorname{per} Z \ne 0$ and, moreover, if $A, B \in U_n$ are two matrices that differ in one row (one column) only, then the angle between the non-zero complex numbers $\operatorname{per} A$ and $\operatorname{per} B$ does not exceed $\alpha$.

If $n = 1$ then any $a \in U_1$ is necessarily non-zero, since $\delta_0 < 1$. Moreover, the angle between any two $a, b \in U_1$ does not exceed $2\arcsin\delta_0 = \pi/3 < \alpha$, cf. Fig. 3.7.

Suppose that $n \ge 2$ and assume that the above statement holds for matrices from $U_{n-1}$. Let $A, B \in U_n$ be two matrices that differ in one row or in one column only. Without loss of generality, we assume that the matrix $B$ is obtained from $A$ by replacing the entries $a_{1j}$ in the first row by some complex numbers $b_{1j}$, where $j = 1, \ldots, n$. Using the row expansion (3.1.1.2), we obtain
$$\operatorname{per} A = \sum_{j=1}^n a_{1j}\operatorname{per} A_j \quad \text{and} \quad \operatorname{per} B = \sum_{j=1}^n b_{1j}\operatorname{per} A_j,$$
where $A_j$ is the $(n-1) \times (n-1)$ matrix obtained from $A$ by crossing out the first row and the $j$-th column. We have $A_j \in U_{n-1}$ and, moreover, up to a permutation of columns, any two matrices $A_{j_1}$ and $A_{j_2}$ differ in at most one column. Therefore, by the induction hypothesis, $\operatorname{per} A_j \ne 0$ for $j = 1, \ldots, n$ and the angle between any two non-zero complex numbers $\operatorname{per} A_{j_1}$ and $\operatorname{per} A_{j_2}$ does not exceed $\alpha$.

We apply Lemma 3.6.4 with $u_j = \operatorname{per} A_j$, $a_j = a_{1j}$ and $b_j = b_{1j}$ for $j = 1, \ldots, n$. Since $\delta_0 < \cos(\alpha/2)$, by Lemma 3.6.4 we have $\operatorname{per} A \ne 0$ and $\operatorname{per} B \ne 0$, and the angle between $\operatorname{per} A$ and $\operatorname{per} B$ does not exceed
$$2\arcsin\frac{\delta_0}{\cos(\alpha/2)} = 2\arcsin\frac{0.5}{\cos(\pi/4)} = 2\arcsin\frac{1}{\sqrt{2}} = \frac{\pi}{2} = \alpha,$$
which completes the proof. $\square$

The value $\delta_0 = 0.5$ is the largest value of $\delta$ for which the equation
$$\alpha = 2\arcsin\frac{\delta}{\cos(\alpha/2)}$$
has a solution $\alpha$. Indeed, the above equation can be written as
$$\sin\frac{\alpha}{2}\cos\frac{\alpha}{2} = \delta, \quad \text{that is,} \quad \sin\alpha = 2\delta.$$
3.6.6 The optimal value of $\delta_0$. What is the optimal value of $\delta_0$ in Theorem 3.6.1? To be more precise, since it is not even clear whether the optimal value of $\delta_0$ exists, what is the supremum of all possible values of $\delta_0$ in Theorem 3.6.1? Since
$$\operatorname{per}\begin{pmatrix}\dfrac{1+i}{2} & \dfrac{1-i}{2}\\[6pt] \dfrac{1-i}{2} & \dfrac{1+i}{2}\end{pmatrix} = 0,$$
we must have
$$\delta_0 < \frac{\sqrt{2}}{2} \approx 0.7071067810.$$
Moreover, Bukh [Bu15] showed that for
$$a = \frac{1+i}{2} \quad \text{and} \quad b = \frac{1-i}{2}$$
we have
$$\operatorname{per}\begin{pmatrix}a & b & a & b & \cdots & a & b\\ b & a & b & a & \cdots & b & a\\ \vdots & & & & & & \vdots\\ a & b & a & b & \cdots & a & b\\ b & a & b & a & \cdots & b & a\end{pmatrix} = 0 \quad \text{for } n \equiv 2 \mod 4,$$
and hence there is no hope that the value of $\delta_0$ might improve as $n$ grows.
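Both vanishing permanents are easy to confirm numerically by brute force over permutations (the helper name is ours):

```python
from itertools import permutations
from math import prod

def permanent(A):
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

a, b = (1 + 1j) / 2, (1 - 1j) / 2    # both at distance sqrt(2)/2 from 1

# the 2 x 2 example from the text: per = a^2 + b^2 = i/2 - i/2 = 0
print(abs(permanent([[a, b], [b, a]])))           # 0.0

# Bukh's alternating matrix for n = 6 (n = 2 mod 4)
n = 6
A = [[a if (i + j) % 2 == 0 else b for j in range(n)] for i in range(n)]
print(abs(permanent(A)) < 1e-12)                  # True
```

All intermediate products here are exact dyadic complex numbers, so the cancellation is exact even in floating point.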
Now we deduce Theorem 3.6.2 from Theorem 3.6.1.

3.6.7 Proof of Theorem 3.6.2. Let $A = \left(a_{ij}\right)$ be an $n \times n$ complex matrix satisfying $\left|a_{ij} - 1\right| \le \delta$ for all $i, j$ and let $J = J_n$ be the $n \times n$ matrix of all 1s. We define a univariate polynomial
$$g(z) = \operatorname{per}\left(J + z(A - J)\right)$$
with $\deg g \le n$. Let
$$\beta = \frac{\delta_0}{\delta} > 1.$$
By Theorem 3.6.1,
$$g(z) \ne 0 \quad \text{provided} \quad |z| \le \beta.$$
Let
$$f(z) = \ln g(z) \quad \text{for } |z| \le 1,$$
where we choose the branch of the logarithm that is real for $z = 0$. We note that by Theorem 3.6.1 the function $f$ is well defined and we have
$$f(0) = \ln n! \quad \text{and} \quad f(1) = \ln\operatorname{per} A.$$
We consider the Taylor polynomial of $f$ at $z = 0$:
$$p_m(z) = f(0) + \sum_{k=1}^m\frac{z^k}{k!}\left.\frac{d^k}{dz^k}f(z)\right|_{z=0}. \tag{3.6.7.1}$$
By Lemma 2.2.1, we have
$$\left|p_m(1) - \ln\operatorname{per} A\right| = \left|p_m(1) - f(1)\right| \le \frac{n}{(m+1)\beta^m(\beta - 1)}.$$
In particular, to approximate $\ln\operatorname{per} A$ within an additive error of $\epsilon > 0$, we can choose $m \le \gamma\left(\ln n - \ln\epsilon\right)$ in (3.6.7.1) for some $\gamma = \gamma(\delta) > 0$.

It remains to show that $p_m(1)$ is a polynomial of degree at most $m$ in the matrix entries $a_{ij}$ of $A$. Our first observation is that the $k$-th derivative $g^{(k)}(0)$ is a polynomial of degree $k$ in the entries of the matrix $A$, which can be computed in $n^{O(k)}$ time. Indeed,
$$\left.\frac{d^k}{dz^k}g(z)\right|_{z=0} = \left.\frac{d^k}{dz^k}\sum_{\sigma\in S_n}\prod_{i=1}^n\left(1 + z\left(a_{i\sigma(i)} - 1\right)\right)\right|_{z=0} = \sum_{\sigma\in S_n}\sum_{(i_1,\ldots,i_k)}\left(a_{i_1\sigma(i_1)} - 1\right)\cdots\left(a_{i_k\sigma(i_k)} - 1\right),$$
where the inner sum is taken over all ordered $k$-subsets $\left(i_1, \ldots, i_k\right)$ of distinct indices $1 \le i_j \le n$. Since there are $(n-k)!$ permutations $\sigma \in S_n$ that map a given ordered $k$-subset $\left(i_1, \ldots, i_k\right)$ into a given ordered $k$-subset $\left(j_1, \ldots, j_k\right)$, we can write
$$g^{(k)}(0) = (n-k)!\sum_{\substack{(i_1,\ldots,i_k)\\(j_1,\ldots,j_k)}}\left(a_{i_1j_1} - 1\right)\cdots\left(a_{i_kj_k} - 1\right), \tag{3.6.7.2}$$
where the sum is taken over all pairs of ordered $k$-subsets $\left(i_1, \ldots, i_k\right)$ and $\left(j_1, \ldots, j_k\right)$ of indices between 1 and $n$. As follows from Sect. 2.2.2, the derivatives $f^{(k)}(0)$ for $k = 1, \ldots, m$ can be found in $O\left(m^2\right)$ time as linear combinations of the derivatives $g^{(k)}(0)$ for $k = 1, \ldots, m$ with coefficients depending on $k$ only, which completes the proof. $\square$
Kontorovich and Wu [KW16] implemented the algorithm of Sect. 3.6.7 for computing the polynomial $p(A)$ and performed numerical experiments. Computing $g^{(k)}(0)$ reduces to computing the sum of permanents of $k \times k$ submatrices of $A - J_n$, and Kontorovich and Wu used for that purpose an efficient algorithm of [FG06]. It turned out that for $n \times n$ matrices $A = \left(a_{ij}\right)$ satisfying $\left|1 - a_{ij}\right| \le 0.5$ and $n \le 20$ (so that the exact value of $\operatorname{per} A$ can be computed for comparison), polynomials $p$ of degree 3 already provide reasonable approximations (they approximate $\ln\operatorname{per} A$ to within about a 1% error). On the other hand, polynomials $p$ of degree 3 can be easily computed for $100 \times 100$ matrices.
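For small $n$ the whole pipeline of Sect. 3.6.7 — the submatrix-permanent formula (3.6.7.2) for $g^{(k)}(0)$, the standard recursion recovering the Taylor coefficients of $\ln g$ from those of $g$, and the evaluation of $p_m$ at $z = 1$ — fits in a few lines. The sketch below (helper names ours) takes $\delta = 0.2$, so $\beta = 2.5$ and the error bound of Lemma 2.2.1 already guarantees an error below 0.01 for $m = 5$.

```python
from itertools import permutations, combinations
from math import factorial, prod, log

def permanent(A):
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def taylor_log_per(A, m):
    # p_m(1) for f(z) = ln per(J + z(A - J)); requires m <= n.
    n = len(A)
    D = [[A[i][j] - 1.0 for j in range(n)] for i in range(n)]
    # g[k] = g^(k)(0)/k!, via permanents of k x k submatrices of A - J,
    # which is (3.6.7.2) with the orderings summed out
    g = [float(factorial(n))]
    for k in range(1, m + 1):
        s = sum(permanent([[D[i][j] for j in J] for i in I])
                for I in combinations(range(n), k)
                for J in combinations(range(n), k))
        g.append(factorial(n - k) * s)
    # Taylor coefficients of f = ln g from those of g: k g_k = sum_l l f_l g_{k-l}
    f = [log(g[0])]
    for k in range(1, m + 1):
        f.append((k * g[k] - sum(l * f[l] * g[k - l] for l in range(1, k))) / (k * g[0]))
    return sum(f)

A = [[1.0, 1.2, 0.9, 1.1, 0.8],
     [0.8, 1.0, 1.1, 0.9, 1.2],
     [1.1, 0.9, 1.0, 1.2, 0.8],
     [0.9, 1.1, 0.8, 1.0, 1.2],
     [1.2, 0.8, 1.1, 0.9, 1.0]]          # entries within 0.2 of 1
print(abs(taylor_log_per(A, 5) - log(permanent(A))) < 0.01)   # True
```

With $n = 5$, $m = 5$ and $\beta = 2.5$, the bound $n/\big((m+1)\beta^m(\beta-1)\big) \approx 0.006$ certifies the printed comparison in advance.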
Let $A$ be an $n \times n$ complex matrix such that $\operatorname{per} A \ne 0$ and suppose that the $\ell^\infty$-distance from $A$ to the complex hypersurface $\operatorname{per} Z = 0$ is at least $\delta_0$ for some $\delta_0 > 0$. It follows from the proof of Sect. 3.6.7 that for any $0 < \delta < \delta_0$ there is a constant $\gamma = \gamma(\delta) > 0$ and for any $0 < \epsilon < 1$ there is a polynomial $p = p_{A,\delta,\epsilon}$ in the entries of an $n \times n$ matrix $B = \left(b_{ij}\right)$ such that $\deg p \le \gamma\left(\ln n - \ln\epsilon\right)$ and
$$\left|\ln\operatorname{per} B - p_{A,\delta,\epsilon}(B)\right| \le \epsilon \quad \text{provided} \quad \left|a_{ij} - b_{ij}\right| \le \delta \quad \text{for all } i, j.$$
Of course, depending on $A$, the polynomial $p$ might be hard to compute (it is easy when $A = J_n$, the matrix of all 1s).
3.6.8 Remark. If the entries of an $n \times n$ real matrix $A = \left(a_{ij}\right)$ are (weakly) decreasing down each column, that is, if $a_{ij} \ge a_{(i+1)j}$ for all $i, j$, then the roots of the polynomial $p(z) = \operatorname{per}\left(J_n + zA\right)$ are real. Moreover, the $n$-variate polynomial
$$p\left(z_1, \ldots, z_n\right) = \operatorname{per}\left(J_nD\left(z_1, \ldots, z_n\right) + A\right),$$
where $D\left(z_1, \ldots, z_n\right)$ is the diagonal matrix having $z_1, \ldots, z_n$ on the diagonal, is H-stable [B+11].

A different approach to the approximation of permanents by Taylor polynomial expansions around $J_n$ is described in [Mc14].
3.7 Approximating Permanents of Positive Matrices

As follows from Sect. 3.5, for any $\alpha \ge 1$, fixed in advance, the permanent of an $\alpha$-conditioned $n \times n$ positive matrix $A$ can be approximated in polynomial time within an $n^{O(\alpha^2)}$ factor. Understanding permanents of complex matrices allows us to approximate permanents of such matrices better: we show that we can approximate the permanent within an arbitrarily small relative error in quasi-polynomial time. More precisely, we prove the following result.

3.7.1 Theorem. For any $0 \le \delta < 1$, there exists $\gamma = \gamma(\delta) > 0$ such that for any positive integer $n$ and any real $0 < \epsilon \le 1$ there exists a polynomial $p = p_{n,\delta,\epsilon}$ with $\deg p \le \gamma\left(\ln n - \ln\epsilon\right)$ in the entries $a_{ij}$ of an $n \times n$ real matrix $A = \left(a_{ij}\right)$ such that
$$\left|\ln\operatorname{per} A - p(A)\right| \le \epsilon$$
provided
$$\left|1 - a_{ij}\right| \le \delta \quad \text{for all } i, j.$$

We show that the polynomial $p_{n,\delta,\epsilon}$ can be computed in $n^{O(\ln n - \ln\epsilon)}$ time, where the implicit constant in the "O" notation depends on $\delta$ alone.

We deduce Theorem 3.7.1 from the following result.
3.7.2 Theorem. Let us fix a real $0 \le \delta < 1$ and let
$$\tau = (1 - \delta)\sin\left(\frac{\pi}{4} - \arctan\delta\right) > 0.$$
Let $Z = \left(z_{ij}\right)$ be an $n \times n$ complex matrix such that
$$\left|1 - \Re\, z_{ij}\right| \le \delta \quad \text{and} \quad \left|\Im\, z_{ij}\right| \le \tau \quad \text{for all } 1 \le i, j \le n.$$
Then
$$\operatorname{per} Z \ne 0.$$

We note that
$$(1 - \delta)\sin\left(\frac{\pi}{4} - \arctan\delta\right) \ge \frac{(1 - \delta)^2}{2} \quad \text{for all } 0 \le \delta \le 1,$$
and so
$$\tau = \frac{(1 - \delta)^2}{2}$$
satisfies the condition of Theorem 3.7.2.
We prove Theorem 3.7.2 first and then deduce Theorem 3.7.1 from it.

As in Sect. 3.6, we identify $\mathbb{C} = \mathbb{R}^2$ and measure angles between non-zero complex numbers as between non-zero vectors in the plane. We start with a simple geometric lemma.

3.7.3 Lemma. Let $u_1, \ldots, u_n \in \mathbb{C}$ be non-zero complex numbers such that the angle between any two $u_i$, $u_j$ does not exceed $\pi/2$.

(1) Let
$$v = \sum_{j=1}^n\alpha_ju_j \quad \text{and} \quad w = \sum_{j=1}^n\beta_ju_j,$$
where $\alpha_1, \ldots, \alpha_n$ are non-negative reals and $\beta_1, \ldots, \beta_n$ are reals such that
$$\left|\beta_j\right| \le \alpha_j \quad \text{for } j = 1, \ldots, n.$$
Then
$$|w| \le |v|;$$

(2) Let
$$v = \sum_{j=1}^n\alpha_ju_j \quad \text{and} \quad w = \sum_{j=1}^n\beta_ju_j,$$
where $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ are reals such that
$$\left|1 - \alpha_j\right| \le \delta \quad \text{and} \quad \left|1 - \beta_j\right| \le \delta \quad \text{for } j = 1, \ldots, n$$
and some $0 \le \delta < 1$. Then $v \ne 0$, $w \ne 0$ and the angle between $v$ and $w$ does not exceed
$$2\arctan\delta;$$
(3) Let

n 
n
v= α j u j and w = βju j
j=1 j=1

where    
1 −  α j  ≤ δ, 1 −  β j  ≤ δ and
   
 α j  ≤ τ ,  β j  ≤ τ for j = 1, . . . , n

and some 0 ≤ δ < 1 and 0 ≤ τ < 1 − δ. Then v = 0, w = 0 and the angle


between v and w does not exceed
τ
2 arctan δ + 2 arcsin .
1−δ

Proof. We consider the standard inner product in R^2 = C, so

⟨a, b⟩ = ℜ(a b̄).

Hence

⟨u_i, u_j⟩ ≥ 0 for all i, j.

We have

|w|^2 = Σ_{1≤i,j≤n} β_i β_j ⟨u_i, u_j⟩ ≤ Σ_{1≤i,j≤n} α_i α_j ⟨u_i, u_j⟩ = |v|^2

and the proof of Part (1) follows.


To prove Part (2), let

u = Σ_{j=1}^n ((α_j + β_j)/2) u_j and x = Σ_{j=1}^n ((α_j − β_j)/2) u_j,

so that v = u + x and w = u − x, see Fig. 3.8. Clearly, |u| > 0.


Now, if |1 − α| ≤ δ and |1 − β| ≤ δ for some 0 ≤ δ < 1 and α ≥ β we have

α/β ≤ (1 + δ)/(1 − δ) and hence α(1 − δ) ≤ β(1 + δ)

Fig. 3.8 Given |u| and |x|, the angle between v = u + x and w = u − x is the largest
when u is orthogonal to x

and

(α − β)/(α + β) − δ = (α − β − δ(α + β))/(α + β) = (α(1 − δ) − β(1 + δ))/(α + β) ≤ 0.

Therefore for all α and β such that |1 − α| ≤ δ and |1 − β| ≤ δ for some 0 ≤ δ < 1
we have

|α − β|/(α + β) ≤ δ.

Therefore, by Part (1),

|x| ≤ δ|u|.

The angle between v and w is

arccos(⟨v, w⟩/(|v| |w|)),

where

⟨v, w⟩ = |u|^2 − |x|^2.

We have

|v|^2 + |w|^2 = 2|u|^2 + 2|x|^2

and hence

|v| |w| ≤ |u|^2 + |x|^2

with equality attained when |v|^2 = |w|^2 = |u|^2 + |x|^2, that is, when x is orthogonal
to u. Therefore, the angle between v and w does not exceed

arccos((|u|^2 − |x|^2)/(|u|^2 + |x|^2))

with equality attained when x is orthogonal to u, in which case the angle is

2 arctan(|x|/|u|) ≤ 2 arctan δ,

see Fig. 3.8. The proof of Part (2) now follows.



In Part (3), let

v′ = Σ_{j=1}^n (ℜ α_j) u_j,  v″ = Σ_{j=1}^n (ℑ α_j) u_j,  w′ = Σ_{j=1}^n (ℜ β_j) u_j

and w″ = Σ_{j=1}^n (ℑ β_j) u_j.

By Part (2), the angle between the non-zero vectors v′ and w′ does not exceed
2 arctan δ. By Part (1), we have

|v″| ≤ (τ/(1 − δ)) |v′| and |w″| ≤ (τ/(1 − δ)) |w′|.

Hence v = v′ + iv″ ≠ 0 and w = w′ + iw″ ≠ 0 and the angle between v and v′ and
the angle between w and w′ do not exceed

arcsin(τ/(1 − δ)),

see Fig. 3.7. The proof of Part (3) now follows. □

Now we are ready to prove Theorem 3.7.2.

3.7.4 Proof of Theorem 3.7.2. For a positive integer n, let U_n = U_n(δ, τ) be the
set of n × n complex matrices Z = (z_ij) such that

|1 − ℜ z_ij| ≤ δ and |ℑ z_ij| ≤ τ for all i, j.

We prove by induction on n a stronger statement:
For any Z ∈ U_n we have per Z ≠ 0 and, moreover, if A, B ∈ U_n are two matrices
that differ in one row (or in one column) only, then the angle between the non-zero
complex numbers per A and per B does not exceed π/2.
Since τ < 1 − δ, the statement holds for n = 1. Assuming that the statement
holds for matrices in U_{n−1}, let us consider two matrices A, B ∈ U_n that differ in one
row or in one column only. Without loss of generality, we assume that B is obtained
from A by replacing the entries a_1j in the first row with complex numbers b_1j for
j = 1, …, n. Let A_j be the (n − 1) × (n − 1) matrix obtained from A by crossing
out the first row and the j-th column. Applying the row expansion (3.1.1.2), we get

per A = Σ_{j=1}^n a_1j per A_j and per B = Σ_{j=1}^n b_1j per A_j.

We have A_j ∈ U_{n−1} for all j = 1, …, n, and, moreover, any two matrices A_{j_1} and
A_{j_2} differ, up to a permutation of columns, in one column only. Therefore, by the
induction hypothesis, we have per A_j ≠ 0 for j = 1, …, n and the angle between any
two of the non-zero complex numbers per A_{j_1} and per A_{j_2} does not exceed π/2.
Applying Part (3) of Lemma 3.7.3 with

u_j = per A_j, α_j = a_1j and β_j = b_1j for j = 1, …, n,

we conclude that per A ≠ 0, per B ≠ 0 and the angle between per A and per B does
not exceed

2 arctan δ + 2 arcsin(τ/(1 − δ)) = π/2. □


 
3.7.5 Proof of Theorem 3.7.1. Let A = ai j be an n × n real matrix such that
 
1 − ai j  ≤ δ for all i, j,

let Jn = J be the n × n matrix filled with 1 s and let us define a univariate polynomial
 
r (z) = per J + z(A − J ) for z ∈ C.

Hence
r (0) = per J = n!, r (1) = per A and deg r ≤ n.

First, we observe that as long as −α ≤ ℜ z ≤ 1 + α for some α > 0, the real part
of each entry of the matrix J + z(A − J) lies in the interval

[1 − δ(1 + α), 1 + δ(1 + α)].

Similarly, as long as |ℑ z| ≤ ρ for some ρ > 0, the imaginary part of each entry of
the matrix J + z(A − J) does not exceed ρδ in absolute value. Let us choose an
α = α(δ) > 0 such that δ′ = δ(1 + α) < 1 and choose

ρ = ρ(δ) = ((1 − δ′)/δ) sin(π/4 − arctan δ′) > 0.

It follows from Theorem 3.7.2 that

r(z) ≠ 0 provided −α ≤ ℜ z ≤ 1 + α and |ℑ z| ≤ ρ. (3.7.5.1)

Let φ(z) = φ_δ(z) be the univariate polynomial constructed in Lemma 2.2.3, such
that

φ(0) = 0, φ(1) = 1

and

−α ≤ ℜ φ(z) ≤ 1 + α and |ℑ φ(z)| ≤ ρ

provided

|z| ≤ β for some β = β(δ) > 1.

The degree of φ(z) is bounded by a constant depending on δ alone.


Let us define

g(z) = r(φ(z)).

Then g(z) is a univariate polynomial and deg g = (deg r)(deg φ) = O(n), where the
implicit constant in the "O" notation depends only on δ. We have

g(0) = r(0) = n!, g(1) = r(1) = per A

and from (3.7.5.1) it follows that

g(z) ≠ 0 provided |z| ≤ β.

Let us choose a branch of f(z) = ln g(z) in the disc |z| ≤ 1 so that

f(0) = ln n! and f(1) = ln per A

and let p_m be the Taylor polynomial of degree m of f(z) computed at z = 0, so

p_m(z) = f(0) + Σ_{k=1}^m (d^k f/dz^k)(0) z^k/k!.

By Lemma 2.2.1, we have

|f(1) − p_m(1)| ≤ deg g/((m + 1) β^m (β − 1)).

Hence one can choose m ≤ γ(ln n − ln ε) for some constant γ = γ(δ) > 0 such
that

|ln per A − p_m(1)| ≤ ε.

It remains to show that

p_m(1) = f(0) + Σ_{k=1}^m f^{(k)}(0)/k!

is a polynomial of degree at most m in the entries a_ij of the matrix A that can be
computed in n^{O(m)} time.
As follows from Sect. 2.2.2, the derivatives f^{(k)}(0) for k = 1, …, m can be found
in O(m^2) time as linear combinations of the derivatives g^{(k)}(0) for k = 1, …, m
with coefficients depending on k only.

For a univariate polynomial q(z) and a positive integer m, let q_{[m]}(z) be the
truncated polynomial obtained from q by erasing all monomials of degree higher
than m.
Since φ(0) = 0, the constant term of φ(z) is 0 and to compute g_{[m]}(z), we
compute the truncated polynomials φ_{[m]}(z), r_{[m]}(z) and then truncate the composition
r_{[m]}(φ_{[m]}(z)) by discarding all terms of degree higher than m. As in Sect. 3.6.7, we
observe that the k-th derivative r^{(k)}(0) is a polynomial of degree k in the entries of
the matrix A, which can be computed in n^{O(k)} time. Hence g^{(k)}(0) and thus f^{(k)}(0)
are polynomials of degree at most k in the entries a_ij of the matrix A = (a_ij). The
proof now follows. □
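The passage from the derivatives of g to those of f = ln g in Sect. 2.2.2 amounts
to the standard triangular recurrence obtained by comparing Taylor coefficients in
g f′ = g′. A minimal sketch on scalar Taylor coefficients (the function name
log_series is ours):

```python
import math

def log_series(g, m):
    """Taylor coefficients f_0, ..., f_m of f = ln g from the coefficients g_0, ..., g_m.

    Comparing coefficients in g f' = g' gives
        k g_0 f_k = k g_k - sum_{j=1}^{k-1} j f_j g_{k-j},
    a triangular system solvable in O(m^2) operations; requires g_0 > 0.
    """
    f = [math.log(g[0])]
    for k in range(1, m + 1):
        s = k * g[k] - sum(j * f[j] * g[k - j] for j in range(1, k))
        f.append(s / (k * g[0]))
    return f

# For g(z) = (1 + z)^3, f(z) = 3 ln(1 + z) = 3z - (3/2) z^2 + z^3 - ...
assert log_series([1, 3, 3, 1], 3) == [0.0, 3.0, -1.5, 1.0]
```

In the setting of the proof one would feed in g^{(k)}(0)/k! for the truncated
composition g = r ∘ φ; the recurrence itself is independent of where the g's come from.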

3.8 Permanents of α-Conditioned Matrices and Permutations with Few Cycles
 
Let A = ai j be an n × n positive matrix which is α-conditioned for some α ≥ 1,
cf. Definition 3.5.5. Let us fix α and let n grow. It turns out that the bulk of the
permanent of A is carried by permutations with a small (logarithmic) number of
cycles. We interpret permanents as sums over cycle covers, see Sect. 3.1.3.
The following result was proved in [Ba15].
3.8.1 Theorem. Let c(σ) denote the number of cycles of a permutation σ ∈ S_n. For
an α-conditioned n × n matrix A = (a_ij), we have

Σ_{σ ∈ S_n: c(σ) < 3α^2 ln n + 6} Π_{i=1}^n a_{iσ(i)} ≥ (1/2) per A.

 
Given a positive matrix A = ai j , we consider the symmetric group Sn as a
probability space, where
 n 

Pr (σ) = (per A)−1 aiσ(i) for σ ∈ Sn .
i=1

3.8.2 Lemma. Let us define random variables

l_i : S_n −→ R for i = 1, …, n,

where l_i(σ) is the length of the cycle of the permutation σ that contains i. Assuming
that A is α-conditioned, we have

Pr{σ ∈ S_n : l_i(σ) = m} ≤ α^2/(n − m) for i = 1, …, n and m = 1, …, n − 1.

Fig. 3.9 Merging two cycles

Proof. Without loss of generality, we assume that i = 1. Let X ⊂ Sn be the set of


permutations σ ∈ Sn such that l1 (σ) = m. We construct a set Y ⊂ Sn as follows.
Each permutation σ ∈ X contributes n − m permutations into Y : we write the cycle
of σ containing 1 as
1 = j1 → j2 → . . . → jm → 1, (3.8.2.1)

pick an element r of the n − m elements not in the cycle, write the cycle of σ
containing r as
r = jm+1 → jm+2 → . . . → jm+k → r (3.8.2.2)

and produce a permutation τ ∈ Y by merging the two cycles together:

1 = j1 → j2 → . . . → jm → r = jm+1 → jm+2 → . . . → jm+k → 1, (3.8.2.3)

see Fig. 3.9.


Since A is α-conditioned, we have

Pr(σ) ≤ α^2 Pr(τ). (3.8.2.4)

Next, we observe that each permutation τ ∈ Y is obtained from a unique permutation
σ ∈ X. To reconstruct σ from τ, we find the cycle of τ containing 1, write it as in
(3.8.2.3) and cut it into the cycles (3.8.2.1) and (3.8.2.2), see Fig. 3.10.
Using (3.8.2.4), we conclude that

Pr(X) ≤ (α^2/(n − m)) Pr(Y) ≤ α^2/(n − m). □

3.8.3 Proof of Theorem 3.8.1. Let li be the random variables of Lemma 3.8.2.
Using Lemma 3.8.2, we estimate

Fig. 3.10 Cutting a cycle into two

E l_i^{−1} = Σ_{m=1}^n (1/m) Pr{σ : l_i(σ) = m}
 = Σ_{1 ≤ m ≤ n/3} (1/m) Pr{σ : l_i(σ) = m} + Σ_{n/3 < m ≤ n} (1/m) Pr{σ : l_i(σ) = m}
 ≤ (3α^2/(2n)) Σ_{1 ≤ m ≤ n/3} 1/m + (3/n) Σ_{n/3 < m ≤ n} Pr{σ : l_i(σ) = m}
 ≤ (3α^2 ln n)/(2n) + 3/n.

Next, we note that

c(σ) = Σ_{i=1}^n l_i^{−1}(σ),

since the sum of l_i^{−1}(σ) over all i in a cycle of σ is 1. Therefore,

E c(σ) = Σ_{i=1}^n E l_i^{−1}(σ) ≤ (3α^2 ln n)/2 + 3.

Applying the Markov inequality, we conclude that

Pr{σ : c(σ) ≥ 3α^2 ln n + 6} ≤ 1/2,

and the proof follows. □
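For the uniform measure on S_n (the case A = J, so α = 1), the identity
c(σ) = Σ_i l_i^{−1}(σ) yields E c(σ) = 1 + 1/2 + … + 1/n, the n-th harmonic
number; a brute-force check for n = 6 (names ours):

```python
import math
from itertools import permutations

def cycle_count(sigma):
    """Number of cycles of a permutation given in one-line notation."""
    seen, count = set(), 0
    for i in range(len(sigma)):
        if i not in seen:
            count += 1
            while i not in seen:
                seen.add(i)
                i = sigma[i]
    return count

n = 6
average = sum(cycle_count(s) for s in permutations(range(n))) / math.factorial(n)
harmonic = sum(1 / k for k in range(1, n + 1))
assert abs(average - harmonic) < 1e-12   # E c(sigma) = H_6 = 2.45
```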

As is shown in [Ba15], one immediate corollary of Theorem 3.8.1 is that on
α-conditioned matrices, the permanent of A and the Hamiltonian permanent of A,

ham A = Σ_{σ ∈ S_n: c(σ) = 1} Π_{i=1}^n a_{iσ(i)},

differ by a factor of n^{O(α^2 ln n)} (permutations consisting of a single cycle are called
Hamiltonian cycles). Similarly to the proof of Lemma 3.8.2, the result is obtained by
patching a permutation with O(α^2 ln n) cycles into a single cycle. Consequently, for
α fixed in advance, using the scaling algorithm of Sect. 3.5, we obtain a polynomial
time algorithm for computing ham A within a factor of n^{O(α^2 ln n)}. As is discussed
in [Ba15], this allows one to distinguish in polynomial time directed graphs on n
vertices that contain many Hamiltonian cycles (at least ε^n (n − 1)! for some fixed
ε > 0) from graphs that are sufficiently far from having a Hamiltonian cycle (need at
least εn new edges added to acquire one). The algorithm is obtained by approximating
per A and hence ham A for a "soft" version A = (a_ij) of the adjacency matrix of the
graph,

a_ij = 1 if i → j is an edge and a_ij = δ otherwise,

for a sufficiently small δ = δ(ε) > 0.


Vishnoi [Vi12] used the van der Waerden bound for the permanent (see Sect. 3.3)
to prove the existence of long cycles (and of an efficient algorithm to find such cycles)
in regular graphs.

3.9 Concluding Remarks

3.9.1 Permanents and determinants. It is tempting to compare the permanent

per A = Σ_{σ ∈ S_n} Π_{i=1}^n a_{iσ(i)}

with the syntactically similar determinant

det A = Σ_{σ ∈ S_n} (sgn σ) Π_{i=1}^n a_{iσ(i)}

and try to exploit the similarity. Godsil and Gutman [GG78] suggested the following
construction.
 
Suppose that A = ai j is an n ×n non-negative real matrix. Let ξi j be real-valued
independent random variables such that

E ξi j = 0 and var ξi j = 1 for all i, j = 1, . . . , n


3.9 Concluding Remarks 89
 
and let us define a random n × n matrix B = bi j by

bi j = ξi j ai j for all i, j = 1, . . . , n.

It is not hard to show that

E (det B)^2 = per A

and one can ask how well det^2 B is likely to approximate per A, see also Chap. 8 of
[LP09]. Since det^2 B is non-negative, the Markov inequality implies that det^2 B is
unlikely to overestimate per A by a lot (for example, the probability that det^2 B >
10 per A does not exceed 1/10). However, it may happen that det^2 B grossly
underestimates per A. For example, if n = 2m and A is a block-diagonal matrix
consisting of m blocks J_2 = (1 1; 1 1), then per A = 2^m. If we choose ξ_ij to be
random signs, so that

Pr(ξ_ij = 1) = 1/2 and Pr(ξ_ij = −1) = 1/2,

then det B = 0 with probability 1 − 2^{−m}. This effect can be mitigated if ξ_ij are
continuous random variables. In [Ba99] it is shown that if ξ_ij are standard Gaussian
with density

(1/√(2π)) e^{−x^2/2}

then with probability approaching 1 as n grows, we have

(det B)^2 ≥ (0.28)^n per A (3.9.1.1)

(the worst-case scenario is when A = I_n, the n × n identity matrix). It is also shown
that if ξ_ij are complex Gaussian with density

(1/π) e^{−|z|^2} for z ∈ C,

in which case E |det B|^2 = per A, then with probability approaching 1 as n grows,
we have

|det B|^2 ≥ (0.56)^n per A (3.9.1.2)

(again, the worst-case scenario is when A = I_n).
Finally, let us choose ξ_ij to be quaternionic Gaussian with density

(4/π^2) e^{−2|h|^2} for h ∈ H

(so that E |h|^2 = 1; here H denotes the skew field of quaternions and not the upper
half-plane of C as elsewhere in the book). Then B is an n × n quaternionic matrix

which we write as

B = R + iS + jT + kU,

where R, S, T and U are n × n real matrices. Let B_C denote the 2n × 2n complex
matrix

B_C = ( R + iS   T + iU ; −T + iU   R − iS ).

It is shown in [Ba99] that det B_C is a non-negative real number such that
E det B_C = per A and that

det B_C ≥ (0.76)^n per A (3.9.1.3)

with probability approaching 1 as n grows (again, the worst-case scenario is when
A = I_n).
A = In ).
The idea behind the inequalities (3.9.1.1)–(3.9.1.3) is roughly as follows. We
note that det B is linear in every row of B. We consider det B as a function of n
independent Gaussian n-vectors x_i = (ξ_i1, …, ξ_in). In the real case, (det B)^2 is a
quadratic form in each x_i, once the values of the remaining vectors x_1, …, x_{i−1},
x_{i+1}, …, x_n are fixed. In the complex case, |det B|^2 is a Hermitian form in each x_i,
once the remaining vectors are fixed, and in the quaternionic case, det B_C is a
quaternionic Hermitian form in each x_i, once the remaining vectors are fixed.
We deduce (3.9.1.1) from the following: if q : R^n −→ R is a positive semidefinite
quadratic form on the space R^n equipped with the standard Gaussian measure and
such that E q = 1 then

E ln q ≥ −ln 2 − γ, (3.9.1.4)

where γ ≈ 0.5772156649 is the Euler constant and the bound (3.9.1.4) is attained if
q is a form of rank 1, for example,

q(x_1, …, x_n) = x_1^2 where (x_1, …, x_n) ∈ R^n.

Since every positive semidefinite quadratic form is a convex combination of positive
semidefinite forms of rank 1, by Jensen's inequality the minimum in (3.9.1.4) is
indeed attained on forms of rank 1. The constant in (3.9.1.1) is e^{−ln 2 − γ} ≈ 0.28.
We deduce (3.9.1.2) from the following: if q : C^n −→ R is a positive semidefinite
Hermitian form on the space C^n equipped with the standard Gaussian measure and
such that E q = 1 then

E ln q ≥ −γ, (3.9.1.5)

and the bound in (3.9.1.5) is attained if q is a form of rank 1, for example,

q(z_1, …, z_n) = |z_1|^2 where (z_1, …, z_n) ∈ C^n.

Similarly to the real case, since every positive semidefinite Hermitian form is a
convex combination of positive semidefinite Hermitian forms of rank 1, by Jensen's
inequality the minimum in (3.9.1.5) is indeed attained on forms of rank 1. We get a
better bound than in the real case, because a complex Hermitian form of rank 1 can
be viewed as a real quadratic form of rank 2. The constant in (3.9.1.2) is e^{−γ} ≈ 0.56.
We deduce (3.9.1.3) from the following: if q : H^n −→ R is a positive semidefinite
Hermitian form on the space H^n equipped with the standard Gaussian measure and
such that E q = 1 then

E ln q ≥ 1 − γ − ln 2 (3.9.1.6)

and the bound in (3.9.1.6) is attained if q is a form of rank 1, for example,

q(h_1, …, h_n) = |h_1|^2 where (h_1, …, h_n) ∈ H^n.

The constant in (3.9.1.3) is e^{1 − γ − ln 2} ≈ 0.76.


For various special classes of matrices, a subexponential approximation factor is
achieved by (real) Gaussian [F+04], [RZ16] and some non-Gaussian [CV09] random
variables ξi j .
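For small n the Godsil–Gutman identity E (det B)^2 = per A can be checked
exhaustively by averaging over all 2^{n^2} sign patterns; a brute-force sketch
(all names ours):

```python
import math
from itertools import permutations, product

def permanent(A):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def det3(M):
    # explicit cofactor expansion of a 3 x 3 determinant
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
total = 0.0
for signs in product([-1, 1], repeat=9):     # all 512 sign matrices xi
    B = [[signs[3 * i + j] * math.sqrt(A[i][j]) for j in range(3)] for i in range(3)]
    total += det3(B) ** 2
assert math.isclose(total / 2 ** 9, permanent(A))   # both equal 463
```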

3.9.2 Algorithms for computing permanents. For a general n × n real or complex
matrix A, the most efficient known method of computing per A exactly is, apparently,
Ryser's method and its modifications, see Chap. 7 of [Mi78], which achieves O(n^2 2^n)
complexity. Essentially, it uses the formula

per A = ∂^n/(∂x_1 ⋯ ∂x_n) p(x_1, …, x_n) where p(x_1, …, x_n) = Π_{i=1}^n Σ_{j=1}^n a_ij x_j,

and computes the derivative as

∂^n/(∂x_1 ⋯ ∂x_n) p = Σ_{I ⊂ {1,…,n}} (−1)^{|I|} p(x_I), (3.9.2.1)

where x_I is the 0–1 vector with 0s in positions I and 1s elsewhere (as is easy
to see, formula (3.9.2.1) holds for any homogeneous polynomial p of degree n in
x_1, …, x_n). The exact computation of the permanent is a #P-hard problem already
for 0–1 matrices [Va79], which makes a polynomial time algorithm rather unlikely.
Efficient (polynomial time) algorithms for computing permanents exactly are known
for some rather restricted classes of matrices, for example, for matrices of a small
(fixed in advance) rank [Ba96] and for 0–1 matrices with small (fixed in advance)
permanents [GK87].
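Formula (3.9.2.1) translates directly into an exact O(2^n n^2)-operation evaluation
of the permanent (Ryser's method proper saves a further factor of n by visiting the
subsets I in Gray-code order); a minimal sketch:

```python
def ryser_permanent(A):
    """per A via (3.9.2.1): alternating sum over subsets I of zeroed-out columns."""
    n = len(A)
    total = 0
    for mask in range(1 << n):               # mask encodes the subset I
        sign = -1 if bin(mask).count("1") % 2 else 1
        term = 1
        for i in range(n):
            term *= sum(A[i][j] for j in range(n) if not (mask >> j) & 1)
        total += sign * term
    return total

assert ryser_permanent([[1, 2], [3, 4]]) == 10            # 1*4 + 2*3
assert ryser_permanent([[1] * 4 for _ in range(4)]) == 24  # per J_4 = 4!
```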
 
Given an n × n matrix A = (a_ij), let G(A) be the bipartite graph with 2n vertices
1_L, …, n_L and 1_R, …, n_R, where vertices i_L and j_R are connected by an edge if
and only if a_ij ≠ 0, see Sect. 3.1.2. Cifuentes and Parrilo found a polynomial time
algorithm to compute per A exactly provided the treewidth of G(A) is bounded by
a constant, fixed in advance [CP16]. The algorithm is applicable to matrices over
any commutative ring. One can obtain graphs G(A) of a small treewidth provided A
is sufficiently sparse, that is, contains relatively few non-zero entries. This is the
case, for example, if A has a band structure, that is, a_ij = 0 provided |i − j| ≥ ω for
some ω, fixed in advance.
The greatest success in approximation algorithms is achieved by Jerrum,
Sinclair and Vigoda [J+04], who constructed a Markov Chain Monte Carlo based
fully polynomial time randomized approximation scheme for computing permanents
of non-negative matrices. A scaling based deterministic polynomial time algorithm
approximating permanents of n × n non-negative matrices within a factor of e^n is
constructed in [L+00], see also Remark 3.5.4. The approximation factor was improved
to 2^n [GS14] and it is conjectured that the same algorithm actually achieves a 2^{n/2}
approximation factor, cf. (3.4.6.1). Using the "correlation decay" idea from statistical
physics, Gamarnik and Katz obtained a (1 + ε)^n approximation factor for any ε > 0,
fixed in advance, when A is a 0–1 matrix of a constant degree expander graph
[GK10].
Less is known about approximation algorithms for not necessarily non-negative
matrices (but see Sects. 3.6, 5.5 and also [Mc14]). Gurvits [Gu05] presented a
randomized algorithm which, given an n × n complex matrix A, approximates per A
in O(n^2/ε^2) time within an additive error of ε‖A‖^n, where ‖A‖ is the operator norm
of A, see also [AA13] for an exposition. The idea of the algorithm is to use the formula

per A = E x_1 ⋯ x_n Π_{i=1}^n Σ_{j=1}^n a_ij x_j,

where x_i = ±1 are independent Bernoulli random variables, and to replace the
expectation by the sample average.
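Since the estimator above is unbiased, for small n its expectation can be evaluated
exactly by summing over all 2^n sign vectors; a sketch verifying unbiasedness
(names ours):

```python
from itertools import permutations, product

def permanent(A):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def gurvits_expectation(A):
    """Exact E[x_1 ... x_n * prod_i sum_j a_ij x_j] over independent signs x_i = +-1."""
    n = len(A)
    total = 0
    for x in product([-1, 1], repeat=n):
        term = 1
        for xi in x:
            term *= xi
        for i in range(n):
            term *= sum(A[i][j] * x[j] for j in range(n))
        total += term
    return total // 2 ** n      # the sum is exactly 2^n * per A for integer A

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
assert gurvits_expectation(A) == permanent(A) == 463
```

Only the sign patterns in which every variable x_j is picked an odd number of times
survive the averaging, which forces a permutation and recovers per A.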
Chapter 4
Hafnians and Multidimensional Permanents

We explore certain extensions of the permanent: hafnians enumerate perfect
matchings in general graphs and multidimensional permanents enumerate perfect
matchings in hypergraphs. With the notable exception of the mixed discriminant,
which can be thought of as a "permanent-determinant" of a 3-dimensional array, these
extensions no longer have connections to H-stable polynomials, which is a major
disadvantage. However, other methods we tried on permanents generally continue to
work. Using scaling, we establish a decomposition of hafnians and multidimensional
permanents into the product of an easy to handle "scaling part" and a hard to handle
"d-stochastic part". We prove that the d-stochastic part is still concentrated, though
more weakly than in the case of the permanent. Taylor polynomial interpolation works
for hafnians just as well as for permanents, while for multidimensional permanents it
produces efficient approximations in non-trivial real and complex domains. The van
der Waerden lower bound for mixed discriminants works just as well as for
permanents, while for the Bregman–Minc bound, we only manage to obtain a somewhat
weaker version.

4.1 Hafnians
 
4.1.1 Definition. Let n = 2m be a positive even integer and let A = ai j be an
n × n symmetric real or complex matrix. The hafnian of A is defined as

haf A = ai1 i2 · · · ai2m−1 i2m , (4.1.1.1)
{i 1 ,i 2 },...,{i 2m−1 ,i 2m }

where the sum is taken over all (2m)!/2m m! unordered partitions of the set
{1, . . . , n} into unordered pairs (the name was introduced by physicist Eduardo R.
Caianiello to mark his fruitful research stay in Copenhagen, or “Hafnia” in Latin).

© Springer International Publishing AG 2016 93


A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9_4

Note that the diagonal entries of A are not involved at all. Equivalently,

haf A = (1/(m! 2^m)) Σ_{σ ∈ S_n} Π_{i=1}^m a_{σ(2i−1) σ(2i)}, (4.1.1.2)

where S_n is the symmetric group of all n! permutations of the set {1, …, n}. Although
one can define the hafnian of any (not necessarily symmetric) matrix by (4.1.1.2),
this does not lead to any more generality, since for a skew-symmetric matrix A the
expression (4.1.1.2) is identically 0 and, moreover, for a general A the value of
(4.1.1.2) is equal to its value on the symmetric part (A + A^T)/2 of A.
The permanent of any m × m matrix is expressed as the hafnian of a (2m) × (2m)
symmetric matrix:

per B = haf A where A = ( 0  B ; B^T  0 ).

Indeed, any permutation σ ∈ S_m corresponds to the partition τ of {1, …, 2m} into
pairs {i, σ(i) + m} for i = 1, …, m, and the contributions of σ to per B via (3.1.1.1)
and of τ to haf A via (4.1.1.1) coincide. Moreover, any partition τ with a non-zero
contribution to haf A corresponds to a unique permutation σ ∈ S_m.
We note a recursive formula

haf A = Σ_{j=2}^n a_{1j} haf A_j, (4.1.1.3)

where A_j is the (n − 2) × (n − 2) symmetric matrix obtained from A by crossing
out the first and the j-th rows and the first and the j-th columns.
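The recursion (4.1.1.3) yields a simple (exponential-time) evaluator for small
hafnians; a minimal sketch (names ours), also checking the per B = haf A block
identity above:

```python
def hafnian(A):
    """haf A via the recursion haf A = sum_{j>=2} a_{1j} haf A_j (even order, 0-indexed)."""
    n = len(A)
    if n == 0:
        return 1
    total = 0
    for j in range(1, n):
        rest = [r for r in range(1, n) if r != j]
        minor = [[A[r][c] for c in rest] for r in rest]
        total += A[0][j] * hafnian(minor)
    return total

# haf J_4 = 4!/(2^2 2!) = 3, one term per pairing of {1, 2, 3, 4}
assert hafnian([[1] * 4 for _ in range(4)]) == 3

# per B = haf [[0, B], [B^T, 0]]: here per [[1, 2], [3, 4]] = 10
A = [[0, 0, 1, 2],
     [0, 0, 3, 4],
     [1, 3, 0, 0],
     [2, 4, 0, 0]]
assert hafnian(A) == 10
```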
 
4.1.2 Hafnians and perfect matchings. If A = (a_ij) is a real symmetric matrix
and a_ij ∈ {0, 1} for all i, j, then haf A has a combinatorial interpretation as the
number of perfect matchings in the graph G with adjacency matrix A, cf. Sect. 3.1.2.
That is, if G = (V, E) is an (undirected, without loops or multiple edges) graph
with set V = {1, …, n} of vertices and set E of edges, each edge being an unordered
pair of distinct vertices, the adjacency matrix A = (a_ij) is defined by

a_ij = 1 if {i, j} ∈ E and a_ij = 0 otherwise.

Assuming that n = 2m is even, we conclude that haf A is the number of perfect
matchings of G.

Fig. 4.1 A graph and a perfect matching (thick edges)

For example, Fig. 4.1 pictures a graph with adjacency matrix

    0 1 1 0 0 0
    1 0 1 1 1 0
A = 1 1 0 1 0 1
    0 1 1 0 1 1
    0 1 0 1 0 1
    0 0 1 1 1 0

and a perfect matching of G.

4.1.3 Hafnians as integrals. Let γ_d be the standard Gaussian probability measure
on R^d with density

(2π)^{−d/2} e^{−‖x‖^2/2} where ‖x‖^2 = x_1^2 + … + x_d^2 for x = (x_1, …, x_d).

In particular,

E x_i^2 = 1 and E x_i x_j = 0 provided i ≠ j.

Let f_1, …, f_n : R^d −→ R be linear forms. Clearly,

E f_1 ⋯ f_n = 0 if n is odd.

If n = 2m is even, the expectation of the product is expressed as a hafnian. Namely,
let A = (a_ij) be the (necessarily symmetric) n × n matrix defined by

a_ij = E f_i f_j = ∫_{R^d} f_i(x) f_j(x) dγ_d(x).

Then

E f_1 ⋯ f_n = haf A. (4.1.3.1)

Formula (4.1.3.1) is known as Wick's formula, see [Zv97]. It can be proved as follows.
Let us denote the left hand side of (4.1.3.1) by L(f_1, …, f_n) and the right hand side
of (4.1.3.1) by R(f_1, …, f_n). For real parameters t = (t_1, …, t_n), let us define

f_t = t_1 f_1 + … + t_n f_n.

Since L(f_1, …, f_n) and R(f_1, …, f_n) are degree n symmetric multilinear functions
of f_1, …, f_n, we have

L(f_1, …, f_n) = (1/n!) ∂^n/(∂t_1 ⋯ ∂t_n) L(f_t, …, f_t) and
R(f_1, …, f_n) = (1/n!) ∂^n/(∂t_1 ⋯ ∂t_n) R(f_t, …, f_t).

Therefore, it suffices to prove (4.1.3.1) assuming that f_1 = … = f_n. By the rotational
invariance of the measure γ_d, it further suffices to prove (4.1.3.1) when f_1 = … = f_n
is the coordinate function, say, x_1. In that case, the matrix A is filled with 1s and
hence the right hand side is equal to

(2m)!/(2^m m!).

The left hand side is

∫_{R^d} x_1^{2m} dγ_d(x) = (1/√(2π)) ∫_{−∞}^{+∞} x^{2m} e^{−x^2/2} dx
 = (2/√(2π)) ∫_0^{+∞} (2t)^m e^{−t} dt/√(2t)
 = (2^m/√π) ∫_0^{+∞} t^{m − 1/2} e^{−t} dt = (2^m/√π) Γ(m + 1/2)
 = (2^m/√π) (m − 1/2)(m − 3/2) ⋯ (1/2) Γ(1/2)
 = (2m − 1)(2m − 3) ⋯ 1 = (2m − 1)!/((2m − 2)(2m − 4) ⋯ 2)
 = (2m − 1)!/(2^{m−1} (m − 1)!) = (2m)!/(2^m m!),

which completes the proof of (4.1.3.1).


One corollary of (4.1.3.1) is that if A is an n × n symmetric positive semidefinite
matrix then  
A A
haf B ≥ 0 for B = .
A A
 
Indeed, A = ai j can be written as

ai j = E f i f j for all i, j
4.1 Hafnians 97

and some linear forms f 1 , . . . , f n : Rn −→ R, in which case

haf B = E f 12 · · · f n2 ≥ 0.

The following useful inequality relates hafnians and permanents of non-negative


matrices.

4.1.4 Theorem. Let A be an n × n non-negative symmetric matrix, where n is even.
Then

haf A ≤ √(per A).

Proof. We follow [AF08]. Let n = 2m and let us consider (haf A)^2 as a polynomial
in the entries a_ij of the matrix A.
From the definition (4.1.1.1), we can write

(haf A)^2 = Σ_{I,J} a_{i_1 i_2} ⋯ a_{i_{2m−1} i_{2m}} a_{j_1 j_2} ⋯ a_{j_{2m−1} j_{2m}}, (4.1.4.1)

where the sum is taken over all ordered pairs (I, J) of unordered partitions of
the set {1, …, 2m} into unordered pairs, I = {{i_1, i_2}, …, {i_{2m−1}, i_{2m}}} and
J = {{j_1, j_2}, …, {j_{2m−1}, j_{2m}}} (we allow I = J and count such pairs once).
For given I and J, the union of all pairs in I and J can be viewed as a graph with set
{1, …, n} of vertices and possibly multiple edges such that each vertex belongs to
exactly two edges, counting multiplicities, see Fig. 4.2.

Fig. 4.2 Two matchings and their union

Such a graph is a union of disjoint cycles, each cycle consisting of an even number
of edges (counting multiplicities). On the other hand, let Γ be a graph which is a
union of disjoint cycles, each consisting of an even number of edges, possibly
including cycles with two edges, and containing all n vertices. Let c_{>2}(Γ) be the
number of cycles of Γ with more than 2 edges. Then Γ

can be represented as a union of two perfect matchings in exactly 2^{c_{>2}(Γ)} ways
and hence (4.1.4.1) can be written as

(haf A)^2 = Σ_{Γ: each cycle has even length} 2^{c_{>2}(Γ)} Π_{{i,j} ∈ Γ} a_ij. (4.1.4.2)

To obtain the monomial expansion of per A, we interpret A as the adjacency matrix
of a complete directed graph on n vertices, which includes loops i → i and edges in
both directions i → j and j → i for i ≠ j, see Sect. 3.1.3. Then

per A = Σ_Γ Π_{(i,j) ∈ Γ} a_ij, (4.1.4.3)

where the sum is taken over all directed cycle covers Γ of the complete graph. Since
A is symmetric, the contributions of any two Γ_1 and Γ_2 that differ just by the
orientations of their cycles are the same and therefore (4.1.4.3) can be written as

per A = Σ_Γ 2^{c_{>2}(Γ)} Π_{{i,j} ∈ Γ} a_ij, (4.1.4.4)

where the sum is taken over all graphs Γ that are disjoint unions of undirected cycles
and contain all vertices {1, …, n} and where c_{>2}(Γ) is the number of cycles in Γ
consisting of more than 2 edges. Comparing (4.1.4.2) and (4.1.4.4), we conclude that

per A ≥ (haf A)^2. □
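Theorem 4.1.4 can be spot-checked by brute force on a small symmetric matrix;
for the matrix below the two sides of per A ≥ (haf A)^2 actually coincide, since with
a zero diagonal every contributing cycle cover of a 4 × 4 matrix consists of even
cycles only (a sketch, names ours):

```python
from itertools import permutations

def permanent(A):
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= A[i][sigma[i]]
        total += term
    return total

def hafnian4(A):
    # the three pairings of {1, 2, 3, 4} in (4.1.1.1)
    return A[0][1] * A[2][3] + A[0][2] * A[1][3] + A[0][3] * A[1][2]

A = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]
assert hafnian4(A) == 28                      # 1*6 + 2*5 + 3*4
assert hafnian4(A) ** 2 <= permanent(A)       # 784 <= 784 here
```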


The results of Sects. 3.6 and 3.7 transfer almost verbatim from permanents to
hafnians.

4.1.5 Theorem. There exists an absolute constant δ_0 > 0 (one can choose δ_0 = 0.5)
such that for any even integer n and for any n × n symmetric matrix A = (a_ij) with
complex entries satisfying

|1 − a_ij| ≤ δ_0 for all i ≠ j

we have

haf A ≠ 0.

For any 0 < δ < δ_0 there exists γ = γ(δ) > 0 such that for any 0 < ε < 1 and positive
even integer n there exists a polynomial p = p_{n,δ,ε} in the entries of an n × n complex
symmetric matrix A = (a_ij) satisfying

deg p ≤ γ(ln n − ln ε)

and such that


|ln haf A − p(A)| ≤ 

provided
 
1 − ai j  ≤ δ for all i = j.

Proof. The proof closely follows those of Theorems 3.6.1 and 3.6.2. First, we show
by induction on m that if A = (a_ij) and B = (b_ij) are two symmetric (2m) × (2m)
complex matrices satisfying

|1 − a_ij| ≤ 0.5 and |1 − b_ij| ≤ 0.5 for all i ≠ j

and such that the entries of A and B coincide except possibly in the i-th row and
i-th column for some unique i, then haf A ≠ 0, haf B ≠ 0 and the angle between the
non-zero complex numbers haf A and haf B does not exceed π/2.
This clearly holds for m = 1. Assuming that m > 1, without loss of generality we
assume that B is obtained from A by replacing the entries a_1j = a_j1 by b_1j = b_j1
for j = 2, …, 2m. Using (4.1.1.3), we write

haf A = Σ_{j=2}^{2m} a_1j haf A_j and haf B = Σ_{j=2}^{2m} b_1j haf A_j,

where A_j is the (2m − 2) × (2m − 2) matrix obtained from A by crossing out the first
and the j-th rows and the first and the j-th columns. We note that, up to a simultaneous
permutation of rows and columns, any two matrices A_{j_1} and A_{j_2} differ in at most
the i-th row and i-th column for some unique i, so by the induction hypothesis
haf A_j ≠ 0 for all j = 2, …, 2m and the angle between any two of the non-zero
complex numbers haf A_{j_1} and haf A_{j_2} does not exceed π/2. Applying Lemma 3.6.4
with u_j = haf A_j, a_j = a_1j and b_j = b_1j, as in Sect. 3.6.5, we conclude that
haf A ≠ 0, haf B ≠ 0 and the angle between haf A and haf B does not exceed π/2.
Next, we construct the polynomial p. Let J = J_n be the n × n matrix filled with
1s and let n = 2m. We define the polynomial

g(z) = haf(J + z(A − J))

of degree at most m, so that

g(0) = haf J = (2m)!/(2^m m!) and g(1) = haf A.

Moreover, for β = δ_0/δ > 1, we have g(z) ≠ 0 whenever |z| ≤ β. We choose a
branch of f(z) = ln g(z) for |z| ≤ 1 such that f(0) is real and use Lemma 2.2.1 to
claim that for some k ≤ γ(ln n − ln ε) the Taylor polynomial

p_k(z) = f(0) + Σ_{s=1}^k (f^{(s)}(0)/s!) z^s

approximates f(z) for |z| ≤ 1 within an additive error ε. We need to show that
p_k(1) is a polynomial of degree at most k in the entries of A. To finish the proof as
in Sect. 3.6.7, it suffices to show that g^{(s)}(0) is a polynomial in the entries of A of
degree at most s.
Indeed,

(d^s/dz^s) g(z)|_{z=0}
 = (d^s/dz^s) Σ_{{i_1,j_1},…,{i_m,j_m}} (1 + z(a_{i_1 j_1} − 1)) ⋯ (1 + z(a_{i_m j_m} − 1))|_{z=0},

where the sum is taken over all unordered partitions of the set {1, …, n} into m
unordered pairs {i_1, j_1}, …, {i_m, j_m}. Therefore,

g^{(s)}(0) = ((2m − 2s)! s!/((m − s)! 2^{m−s})) Σ_{{i_1,j_1},…,{i_s,j_s}} (a_{i_1 j_1} − 1) ⋯ (a_{i_s j_s} − 1),

where the sum is taken over all collections of s pairwise disjoint unordered pairs
{i_1, j_1}, …, {i_s, j_s}. □

We observe that for a fixed δ < δ_0, the polynomial p(A) in Theorem 4.1.5 can be
computed in n^{O(ln n − ln ε)} time.

4.1.6 Theorem. Let us fix a real 0 ≤ δ < 1 and let

τ = (1 − δ) sin(π/4 − arctan δ) > 0.

For an even n, let Z = (z_ij) be an n × n symmetric complex matrix such that

|1 − ℜ z_ij| ≤ δ and |ℑ z_ij| ≤ τ for all i, j.

Then haf Z ≠ 0.

As in Sect. 3.7, we deduce from Theorem 4.1.6 the following result.

4.1.7 Theorem. For any 0 ≤ δ < 1 there exists γ = γ(δ) > 0 such that for any
positive even integer n and any real 0 < ε ≤ 1 there exists a polynomial p = p_{n,δ,ε}
in the entries of an n × n symmetric matrix A such that deg p ≤ γ(ln n − ln ε) and

|ln haf A − p(A)| ≤ ε

provided A = (a_ij) is a real symmetric matrix satisfying

|1 − a_ij| ≤ δ for all i, j.

Fig. 4.3 A connected 3-regular graph with no perfect matchings

Similarly, for any δ > 0, fixed in advance, the polynomial p can be computed in
n^{O(ln n − ln ε)} time.
The proofs of Theorems 4.1.6 and 4.1.7 closely follow the proofs of Sect. 3.7 with
necessary adjustments as in the proof of Theorem 4.1.5, see also [B16+].
The main difficulty of dealing with hafnians compared to dealing with permanents
is that there appears to be no parallel theory relating hafnians to stable polynomials,
cf. Sects. 3.2–3.3, but see also Sect. 6 of [FG06] for an attempt to extend the theory
to hafnians. Consequently, there is no analogue of the van der Waerden inequality
(Theorem 3.3.2) for hafnians. As the following simple example shows, the hafnian
of a symmetric doubly stochastic matrix can be equal to 0. Indeed, if G is a graph
that is a disjoint union of an even number of triangles, and A is the adjacency matrix
of G then B = (1/2)A is a symmetric doubly stochastic matrix and haf B = 0.
Figure 4.3 demonstrates a more complicated example of a 3-regular graph without
perfect matchings.
If A is the adjacency matrix of the graph on Fig. 4.3, then B = (1/3)A is a
symmetric doubly stochastic matrix and haf B = 0. On the other hand, the number
of perfect matchings in a bridgeless 3-regular graph is exponentially large in the
number of vertices [E+11].
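The triangle example is easy to verify numerically. The sketch below (our own illustration, not from the text) computes hafnians by brute force via the row expansion (4.1.1.3); it runs in exponential time and is meant only for tiny matrices.

```python
from fractions import Fraction

def haf(A, idx=None):
    # Row expansion: haf A = sum over j of a_{ij} * haf(A with rows and
    # columns i and j crossed out), where i is the first remaining index.
    if idx is None:
        idx = list(range(len(A)))
    if not idx:
        return Fraction(1)
    i = idx[0]
    return sum(A[i][j] * haf(A, [k for k in idx if k not in (i, j)])
               for j in idx[1:])

# Two disjoint triangles on vertices {0,1,2} and {3,4,5}.
A = [[0] * 6 for _ in range(6)]
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u][v] = A[v][u] = 1
B = [[Fraction(x, 2) for x in row] for row in A]  # symmetric doubly stochastic
print(haf(B))  # 0: three vertices of a triangle cannot be perfectly matched
```

As a sanity check, the same routine gives haf = 3 for the all-ones 4 × 4 matrix, the number of ways to partition four indices into two pairs.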

4.2 Concentration of Hafnians of α-Conditioned Doubly Stochastic Matrices

Although there is no hafnian analogue of the van der Waerden inequality, some of
the corollaries of that inequality can be extended to hafnians, in particular, concen-
tration of hafnians of doubly stochastic matrices with relatively uniform entries, see
Sect. 3.4.6. We start with a definition.
 
4.2.1 Definition. Let A = (a_{ij}) be a symmetric matrix with zero diagonal and
positive off-diagonal entries. For α ≥ 1, we say that A is α-conditioned if

a_{i j_1} ≤ α a_{i j_2} for all i ≠ j_1, j_2.

The goal of this section is to prove the following result.

4.2.2 Theorem. For any α ≥ 1, there is a γ = γ(α) > 0 such that if A is a 2m × 2m
symmetric doubly stochastic α-conditioned matrix with zero diagonal, we have

m^{−γ} e^{−m} ≤ haf A ≤ m^{γ} e^{−m}.

We follow [BS11]. First, we need to adapt the technique of matrix scaling, see
Sect. 3.5, to hafnians.
 
4.2.3 Scaling. Let A = (a_{ij}) be an n × n symmetric matrix with zero diagonal. We
say that A is obtained by scaling from an n × n symmetric matrix B = (b_{ij}) if

a_{ij} = λ_i λ_j b_{ij} for all i, j

and some λ_1, ..., λ_n. If n is even, then the hafnians of A and B are defined and

haf A = (∏_{i=1}^{n} λ_i) haf B.

Note that compared to scaling of general matrices, we get just n scaling factors λ_i,
instead of 2n factors λ_i and μ_j in the case of the permanent.
The following result is a more or less straightforward extension of Theorem 3.5.2
and Lemma 3.5.3.
 
4.2.4 Theorem. Let A = (a_{ij}) be an n × n symmetric matrix with zero diagonal
and positive off-diagonal entries. Then there exists a unique n × n symmetric doubly
stochastic matrix B = (b_{ij}) and unique positive λ_1, ..., λ_n such that

b_{ij} = λ_i λ_j a_{ij} for all i, j.

The matrix B can be found as the minimum point of the convex function

f(X) = Σ_{1≤i≠j≤n} x_{ij} ln(x_{ij}/a_{ij})

on the polyhedron of n × n symmetric doubly stochastic matrices X with zero diagonal,
in which case

f(B) = 2 Σ_{i=1}^{n} ln λ_i.

Let C = C(A) ⊂ ℝⁿ be the convex set defined by

C = { x = (x_1, ..., x_n) : Σ_{1≤i≠j≤n} a_{ij} e^{x_i + x_j} ≤ n }

and let x_0 = (ξ_1, ..., ξ_n) where ξ_i = ln λ_i. Then x_0 is the unique maximum point of
the linear function ℓ(x) = x_1 + ... + x_n on C.

Proof. The proof of the first part is very similar to the proof of Theorem 3.5.2 and is
therefore omitted. To prove the second part, we observe that the point x_0 lies on the
boundary ∂C, which is a smooth strictly convex hypersurface defined by the equation

Σ_{1≤i≠j≤n} a_{ij} e^{x_i + x_j} = n,

cf. Sect. 2.1.1.3. Moreover, the gradient of g(x) = Σ_{i≠j} a_{ij} e^{x_i + x_j} at x_0 is (2, ..., 2),
from which it follows that the affine hyperplane H defined by the equation ℓ(x) =
ℓ(x_0) is tangent to ∂C at x_0. Since C is convex, H is the supporting affine hyperplane
at x_0 and hence x_0 is an extremal point of ℓ. Then x_0 has to be the maximum point,
because ℓ is unbounded from below on C. □
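Theorem 4.2.4 also suggests a practical way to compute the scaling numerically. The sketch below (our own illustration; the function name is ours, not from the text) repeatedly solves for each λ_i so that the i-th row sum of B = (λ_i λ_j a_{ij}) becomes exactly 1; in log-coordinates this is exact coordinate descent for an equivalent convex minimization, and it converges for symmetric matrices with zero diagonal and positive off-diagonal entries.

```python
def symmetric_scaling(A, sweeps=200):
    # Find positive lambda_i such that B = (lambda_i * lambda_j * a_ij)
    # is doubly stochastic; Theorem 4.2.4 guarantees B is unique.
    n = len(A)
    lam = [1.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Choose lam[i] so that row i of B sums to exactly 1.
            lam[i] = 1.0 / sum(lam[j] * A[i][j] for j in range(n) if j != i)
    B = [[lam[i] * lam[j] * A[i][j] for j in range(n)] for i in range(n)]
    return B, lam

A = [[0, 1, 2, 1],
     [1, 0, 1, 3],
     [2, 1, 0, 1],
     [1, 3, 1, 0]]
B, lam = symmetric_scaling(A)
print([round(sum(row), 9) for row in B])  # all row sums converge to 1.0
```

Note that, in contrast with Sinkhorn scaling of general matrices, a single vector of factors λ_i is updated, matching Sect. 4.2.3.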

Our next result is a version of Lemma 3.5.6 for hafnians.

4.2.5 Lemma. Let A be an α-conditioned n×n symmetric matrix with zero diagonal.
Suppose that A is obtained by scaling from a doubly stochastic symmetric n×n matrix
B. Then B is α2 -conditioned.

Proof. Let λ_1, ..., λ_n be the scaling factors, so that

b_{ij} = λ_i λ_j a_{ij} for all i, j.

Let us choose two indices 1 ≤ i ≠ j ≤ n. Then

Σ_{k≠i,j} a_{ik} λ_i λ_k = Σ_{k≠i,j} b_{ik} = 1 − b_{ij}    (4.2.5.1)

and

Σ_{k≠i,j} a_{jk} λ_j λ_k = Σ_{k≠i,j} b_{jk} = 1 − b_{ij}.    (4.2.5.2)

Comparing (4.2.5.1) and (4.2.5.2) and using that A is α-conditioned, we conclude
that

λ_i ≤ α λ_j for all i, j,

from which it follows that B is α²-conditioned. □



We prove Theorem 4.2.2 by induction on m, for which we need yet another lemma
which bounds the product of the scaling factors of an α-conditioned matrix.
 
4.2.6 Lemma. For n > 2, let A = (a_{ij}) be an α-conditioned n × n symmetric
matrix with zero diagonal. Suppose that

Σ_{1≤i≠j≤n} a_{ij} = n

and that

|1 − Σ_{j: j≠i} a_{ij}| ≤ β/n for i = 1, ..., n

and some

0 ≤ β ≤ (n − 2)/(2α).

Suppose that A is obtained from a symmetric doubly stochastic matrix B = (b_{ij}) by
scaling, so that

b_{ij} = λ_i λ_j a_{ij} for all i, j

and some positive λ_1, ..., λ_n. Then

0 ≤ Σ_{i=1}^{n} ln λ_i ≤ 8β²α/n.

Proof. Let us define

δ_i = 1 − Σ_{j: j≠i} a_{ij} for i = 1, ..., n,

so that

Σ_{i=1}^{n} δ_i = 0    (4.2.6.1)

and

|δ_i| ≤ β/n for i = 1, ..., n.    (4.2.6.2)

Let us define an n × n matrix X = (x_{ij}) by

x_{ij} = a_{ij} + w_{ij} where w_{ij} = (δ_i + δ_j)/(n − 2) for i ≠ j

and x_{ii} = 0 for i = 1, ..., n. We observe that X is a symmetric n × n matrix with
row and column sums 1 and zero diagonal. Moreover, since A is α-conditioned, we
have

a_{ij} ≥ 1/((n − 1)α) for all i ≠ j    (4.2.6.3)

while by (4.2.6.2), we have

|w_{ij}| ≤ 2β/(n(n − 2)) ≤ 1/(nα)    (4.2.6.4)

and hence x_{ij} ≥ 0, so X is doubly stochastic.


By Theorem 4.2.4,

Σ_{i=1}^{n} ln λ_i ≤ (1/2) Σ_{1≤i≠j≤n} x_{ij} ln(x_{ij}/a_{ij}) = (1/2) Σ_{1≤i≠j≤n} (a_{ij} + w_{ij}) ln((a_{ij} + w_{ij})/a_{ij})
    ≤ (1/2) Σ_{1≤i≠j≤n} (a_{ij} + w_{ij})(w_{ij}/a_{ij}) = (1/2) Σ_{1≤i≠j≤n} (w_{ij} + w_{ij}²/a_{ij}).

Now, by (4.2.6.1),

Σ_{1≤i≠j≤n} w_{ij} = (1/(n − 2)) Σ_{1≤i≠j≤n} (δ_i + δ_j) = 0

and by (4.2.6.2)–(4.2.6.4),

Σ_{1≤i≠j≤n} w_{ij}²/a_{ij} ≤ n(n − 1) · (4β²/(n²(n − 2)²)) · α(n − 1) ≤ 16β²α/n,

which proves the upper bound for Σ_{i=1}^{n} ln λ_i. To prove the lower bound, we note
that x = (0, ..., 0) is a feasible point of the set C(A) of Theorem 4.2.4 and hence

Σ_{i=1}^{n} ln λ_i ≥ 0. □

4.2.7 Proof of Theorem 4.2.2. All implicit constants in the “O” notation below
depend only on α.
For a set I ⊂ {1, ..., 2m}, let A(I) denote the submatrix of A consisting of
the entries a_{ij} with i, j ∈ I. Hence A(I) is a symmetric α-conditioned matrix with zero
diagonal. Let B(I) be the doubly stochastic matrix obtained from A(I) by scaling.
We prove by induction on k = 1, ..., m that

haf B(I) = exp{ −k + O(Σ_{j=1}^{k} 1/j) } where |I| = 2k.    (4.2.7.1)

Let I ⊂ {1, . . . , 2m} be a subset such that |I | = 2k > 2. Let us pick an i ∈ I . To


simplify the notation somewhat, we denote B(I ) just by B and assume without loss
of generality that i = 1. We use the row expansion (4.1.1.3):

haf B = Σ_{j∈I\{1}} b_{1j} haf B_j,    (4.2.7.2)

where B j is the matrix obtained from B by crossing out the 1st and the jth row
and the 1st and the jth column. Note that (4.2.7.2) represents haf B as a convex
combination of haf B j .
By Lemma 4.2.5, the matrix B is α²-conditioned. Since B is doubly stochastic,
it follows that the entries of B do not exceed α²/(2k − 1). Let σ_j be the sum of the
matrix entries of B_j. Hence

σ_j = 2k − 4 + O(1/k).    (4.2.7.3)

Let us scale B_j to the total sum of entries 2k − 2, so we define

B̃_j = ((2k − 2)/σ_j) B_j for j ∈ I \ {1}.

Then

haf B_j = (σ_j/(2k − 2))^{k−1} haf B̃_j for j ∈ I \ {1}

and by (4.2.7.3) we conclude that

haf B_j = exp{ −1 + O(1/k) } haf B̃_j.    (4.2.7.4)

To estimate haf B̃_j, we apply Lemma 4.2.6. Let us scale B̃_j to a doubly stochastic
matrix. The doubly stochastic matrix we get is the same matrix we obtain from
A(I \ {1, j}) by scaling, that is, the matrix B(I \ {1, j}).
Since B_j is obtained by crossing out two rows and two columns of a doubly
stochastic matrix B, the row and column sums of B_j do not exceed 1, but since the
entries of B do not exceed α²/(2k − 1), the row and column sums of B_j are at least

1 − 2α²/(2k − 1).

By (4.2.7.3), the absolute value of the difference between any row or column sum of

B j and 1 is O(1/k).
Applying Lemma 4.2.6, we conclude that for all k ≥ γ1 (α), we have
 !
1
haf 
B j = exp O haf B (I \ {1, j}) , (4.2.7.5)
k

where γ1 (α) is some positive constant. We use a trivial estimate

haf B = e O(1) provided k < γ1 (α). (4.2.7.6)

Combining (4.2.7.6), (4.2.7.5), (4.2.7.2) and the induction hypothesis, we complete


the proof of (4.2.7.1). 

4.2.8 Remark. The gist of Theorem 4.2.2 is the lower bound for haf A. As one can
see from the proof, we get a much better upper bound combining the inequalities of
Theorem 4.1.4 and Corollary 3.4.5.
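For a concrete feel for Theorem 4.2.2, consider the most uniform example: the 2m × 2m doubly stochastic matrix with zero diagonal and all off-diagonal entries equal to 1/(2m − 1), which is 1-conditioned. Its hafnian is (2m − 1)!!/(2m − 1)^m, since K_{2m} has (2m − 1)!! perfect matchings. A short check (our own illustration) confirms that this is e^{−m} up to a bounded factor:

```python
import math

def haf_uniform(m):
    # Hafnian of the 2m x 2m doubly stochastic matrix with zero diagonal and
    # off-diagonal entries 1/(2m-1): (2m-1)!! matchings, each of weight (2m-1)^(-m).
    double_factorial = math.prod(range(1, 2 * m, 2))  # (2m-1)!!
    return double_factorial / (2 * m - 1) ** m

for m in (3, 5, 10, 20):
    print(m, haf_uniform(m) * math.exp(m))  # ratio to e^{-m} stays near 2.3-2.4
```

By Stirling's formula the ratio tends to √(2e) ≈ 2.33, well within the m^{±γ} window of Theorem 4.2.2.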

4.3 Hafnians and Pfaffians


 
4.3.1 Pfaffian. Let n be a positive even integer, n = 2m, and let A = (a_{ij}) be an
n × n skew-symmetric matrix, so that a_{ij} = −a_{ji} for all 1 ≤ i, j ≤ n. The Pfaffian
of A is defined as

Pf A = (1/(m! 2^m)) Σ_{σ∈S_n} (sgn σ) ∏_{i=1}^{m} a_{σ(2i−1) σ(2i)},    (4.3.1.1)

see, for example, Sect. 1 of Chap. VI of [We97] or Chap. 29 of [Pr94]. Note that
while different permutations σ may contribute the same product in (4.3.1.1), all those
products are counted with the same sign: if σ1 = σ2 τ , where τ is the transposition,
τ = (2i − 1, 2i), say, then sgn σ1 = − sgn σ2 but since A is skew-symmetric, the
signs of monomials in (4.3.1.1) corresponding to σ1 and σ2 coincide. Similarly, if
σ1 = σ2 τ where τ is the product of two transpositions, τ = (2i 1 −1, 2i 2 −1)(2i 1 , 2i 2 ),
then sgn σ1 = sgn σ2 and the signs of monomials corresponding to σ1 and σ2
coincide.
One can of course define Pf A for an arbitrary matrix A by (4.3.1.1), but then the
Pfaffian of an arbitrary matrix will coincide with the Pfaffian of its skew-symmetric
part:

Pf A = Pf((A − Aᵀ)/2).

Assuming that A is a skew-symmetric complex matrix, we may identify A with the
exterior 2-form ω_A ∈ Λ²ℂⁿ,

ω_A = Σ_{1≤i<j≤n} a_{ij} e_i ∧ e_j,

where e_1, ..., e_n is the standard basis of ℂⁿ. In this case,

ω_A ∧ ⋯ ∧ ω_A (m times) = (m! Pf A) e_1 ∧ ⋯ ∧ e_n.    (4.3.1.2)

Let G be an n × n complex matrix. Then the matrix B = G A Gᵀ is skew-symmetric
and

ω_B = Σ_{1≤i<j≤n} a_{ij} (Ge_i) ∧ (Ge_j).

Since

(Ge_1) ∧ ⋯ ∧ (Ge_n) = (det G)(e_1 ∧ ⋯ ∧ e_n),

it follows from (4.3.1.2) that

Pf(G A Gᵀ) = (det G) Pf A.    (4.3.1.3)

Equation (4.3.1.3) allows us to compute Pf A efficiently: indeed, for every 2m × 2m
skew-symmetric matrix A, one can easily compute a matrix G such that A = Gᵀ K G,
where K is the 2m × 2m block-diagonal matrix with m diagonal blocks

( 0  1 )
( −1 0 ),    (4.3.1.4)

see, for example, Sect. 21 of [Pr94]. Then

Pf A = Pf(Gᵀ K G) = (det G) Pf K = det G.
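For small matrices the Pfaffian can also be evaluated by the standard expansion along the first row, which makes the identity (Pf A)² = det A of Theorem 4.3.4 below easy to test numerically (our own illustration; the function names are ours):

```python
from itertools import permutations

def pf(A, idx=None):
    # Expansion along the first remaining index i:
    # Pf A = sum_j (+/-) a_{ij} * Pf(A with rows/columns i and j removed).
    if idx is None:
        idx = list(range(len(A)))
    if not idx:
        return 1
    i = idx[0]
    return sum((-1) ** t * A[i][j] * pf(A, [k for k in idx if k not in (i, j)])
               for t, j in enumerate(idx[1:]))

def det(A):
    # Leibniz formula, with the sign obtained by counting inversions.
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        sgn = 1
        for x in range(n):
            for y in range(x + 1, n):
                if p[x] > p[y]:
                    sgn = -sgn
        term = sgn
        for x in range(n):
            term *= A[x][p[x]]
        total += term
    return total

A = [[0, 2, -3, 1],
     [-2, 0, 5, 4],
     [3, -5, 0, -1],
     [-1, -4, 1, 0]]
print(pf(A), det(A))  # 15 and 225 = 15**2
```

For a 4 × 4 skew-symmetric matrix the expansion reduces to the familiar formula Pf A = a₁₂a₃₄ − a₁₃a₂₄ + a₁₄a₂₃.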

4.3.2 Perfect matchings in directed graphs. Let H be a directed graph with set
{1, ..., n} of vertices, no loops and at most one edge i→j or j→i connecting any two
vertices i and j. We assume that n is an even integer, n = 2m. A collection I =
{i_1→i_2, ..., i_{n−1}→i_n} of pairwise disjoint edges of H is called a perfect matching of H.

We define sgn I = sgn σ where σ ∈ S_n is a permutation such that σ(k) = i_k for
k = 1, ..., n (as before, the sign of I does not depend on the order in which we list
the edges of I).
Let I and J be two perfect matchings in H. Then the union I ∪ J is a cycle cover
Γ of H by even cycles (that is, cycles having an even length), cf. Fig. 4.4.

We call a cycle of Γ evenly oriented if, when we choose an orientation of the
cycle, the number of edges co-oriented with the cycle is even. Otherwise, we call
the cycle oddly oriented. Since the cycle is even, the definition does not depend on
the choice of an orientation of the cycle. For example, on Fig. 4.4, the 6-cycle is
evenly oriented while the 4- and 2-cycles are oddly oriented.

4.3.3 Lemma. For any two perfect matchings I and J of H , we have

(sgn I )(sgn J ) = (−1)k ,

where k is the number of evenly oriented cycles in I ∪ J .

Proof. First, we observe that if the conclusion of the lemma holds for H , it also
holds for the graph obtained from H by reversing the direction of one edge. Indeed,
if that edge belongs neither to I nor to J then reversing its direction does not change
sgn I , sgn J or k. If the edge belongs to I and to J both, then reversing its direction
changes both sgn I and sgn J . However, since that edge forms a 2-cycle in I ∪ J ,
which is always oddly oriented, cf. Fig. 4.4, changing the direction of the edge does
not change the number of evenly oriented cycles. Finally, if the edge belongs to I
and not to J then changing the direction of the edge reverses sgn I , leaves sgn J
intact and changes k by 1.
Therefore, without loss of generality, we assume that all cycles of length greater
than 2 in I ∪ J are oriented, cf. Fig. 4.5.
In this case, k is the number of cycles of Γ of length greater than 2. We define two
permutations σ, τ ∈ S_n as follows: we number the cycles of Γ, listing the cycles of
length greater than 2 first, list the vertices of the first cycle in the order of the cycle,
then the vertices of the second cycle in the order of the cycle, etc., just obtaining a
permutation i_1 ... i_n. We define σ(l) = i_l for l = 1, ..., n. To define τ(l), we first
determine the cycle in which the l-th vertex i_l lies. If i_l lies in a cycle of length greater
than 2, we let τ(l) be the next vertex of the same cycle in the order of the cycle.
If i_l lies in a cycle of length 2, we let τ(l) = i_l.

Fig. 4.4 A cycle cover Γ of H

Fig. 4.5 Oriented cycles i 12


and a 2-cycle i6 i1 i7 i8

i5 i2
i10 i9
i 11
i4 i3

For example, for the cycle cover on Fig. 4.5, we define σ(l) = i_l for l = 1, ..., 12
and τ(1) = i_2, τ(2) = i_3, τ(3) = i_4, τ(4) = i_5, τ(5) = i_6, τ(6) = i_1, τ(7) = i_8,
τ(8) = i_9, τ(9) = i_10, τ(10) = i_7, τ(11) = i_11 and τ(12) = i_12, in which case σ
corresponds to the perfect matching i_1→i_2, i_3→i_4, i_5→i_6, i_7→i_8, i_9→i_10, i_11→i_12 and τ corresponds
to the perfect matching i_2→i_3, i_4→i_5, i_6→i_1, i_8→i_9, i_10→i_7, i_11→i_12.
We have

(sgn I)(sgn J) = (sgn σ)(sgn τ) = sgn(τσ⁻¹).

However, τσ⁻¹ is a permutation that is the product of k even cycles, so

sgn(τσ⁻¹) = (−1)^k.

For the example on Fig. 4.5, we have τσ⁻¹ = (i_1 i_2 i_3 i_4 i_5 i_6)(i_7 i_8 i_9 i_10). □

4.3.4 Theorem. Let A be a skew-symmetric n × n matrix, where n = 2m is an even
integer. Then

(Pf A)² = det A.

Proof. The result immediately follows from (4.3.1.3) and the fact that det K = 1 for
the matrix K defined by (4.3.1.4). It is instructive, however, to give a combinatorial
proof along the lines of the proof of Theorem 4.1.4.
Let G be a complete directed graph with set {1, ..., n} of vertices and edges i→j for
all pairs i, j, including i = j. We introduce weights a_{ij} on the edges i→j (in particular,
loops i→i have weight 0).
We write

det A = Σ_Γ (sgn Γ) ∏_{i→j∈Γ} a_{ij},    (4.3.4.1)

where the sum is taken over all directed cycle covers Γ of G and sgn Γ is defined as
the sign of the corresponding permutation, cf. Sect. 3.1.3. Note that sgn Γ depends
only on the cycle structure of Γ, that is, on the number of cycles of each length.

Fig. 4.6 Reversing the orientation of an odd cycle

Suppose that Γ contains a cycle of an odd length (an odd cycle). Since A is
skew-symmetric, reversing the orientation of an odd cycle changes the sign of the
corresponding term in (4.3.4.1), cf. Fig. 4.6.
Consequently, cycle covers Γ in (4.3.4.1) containing an odd cycle cancel each
other out, and so we can write

det A = Σ_{Γ has no odd cycles} (−1)^{c(Γ)} ∏_{i→j∈Γ} a_{ij},    (4.3.4.2)

where c(Γ) is the number of cycles in Γ.


Next, let G be the complete undirected graph with set {1, . . . , n} of vertices and no
 be a directed graph obtained by orienting the edges of G arbitrarily,
loops and let G
 Then we can
so that for every pair i = j exactly one edge i j or ji is included in G.
write (4.3.1.1) as

Pf A = (sgn I )ai1 i2 · · · ain−1 in ,
!
−→ −→
I= i 1 i 2 ,...,i n−1 i n

 cf. Sect. 4.3.2. Consequently,


where the sum is taken over all perfect matchings I of G,

(Pf A)2 = (sgn I )(sgn J )ai1 i2 · · · ain−1 in a j1 j2 · · · a jn−1 jn ,
!
−→ −→
I= i 1 i 2 ,...,i n−1 i n
!
−→ −→
J = j1 j2 ,..., jn−1 jn

where the sum is taken over all ordered pairs (I, J ) (we allow I = J and count
−→ −→ −→ −→
such pairs once). The union of edges i 1 i 2 , . . ., i n−1 i n , j1 j2 , . . ., jn−1 jn is a cycle
 where each cycle has an even length, cf. Fig. 4.4. Let c>2 () be
cover  of G,
the number of cycles of  of length greater than 2 (hence c>2 () = 2 for the

cycle cover on Fig. 4.4). Then Γ can be represented as an ordered union I ∪ J of
vertex-disjoint perfect matchings I and J in 2^{c_{>2}(Γ)} ways. By Lemma 4.3.3, the
product (sgn I)(sgn J) is independent of the representation, which allows us to
define

ε(Γ) = (sgn I)(sgn J)

for perfect matchings I and J whose union is the cycle cover Γ. Hence we can write

(Pf A)² = Σ_{Γ has no odd cycles} ε(Γ) 2^{c_{>2}(Γ)} ∏_{i→j∈Γ} a_{ij}.    (4.3.4.3)

Furthermore, since A is skew-symmetric, the cycle cover obtained by reversing
the orientation of a single edge in Ĝ contributes the same monomial to (4.3.4.3).
Moreover, since a cycle cover Γ of the undirected graph G can be oriented in
2^{c_{>2}(Γ)} ways, we can rewrite (4.3.4.3) as

(Pf A)² = Σ_{Γ has no odd cycles} ε(Γ)(−1)^{c_2(Γ)} ∏_{i→j∈Γ} a_{ij},    (4.3.4.4)

where the sum is taken over all oriented cycle covers Γ of the complete directed
graph G by even cycles and c_2(Γ) is the number of 2-cycles in Γ. By Lemma
4.3.3, ε(Γ) = (−1)^{c_{>2}(Γ)} and comparing (4.3.4.2) and (4.3.4.4), we complete the
proof. □
proof. 

4.3.5 Pfaffian orientation. In view of Theorem 4.3.4, formula (4.3.1.3) and the fact
that the Pfaffian can be efficiently computed, the following question is of interest:
given a 2m × 2m symmetric matrix A = (a_{ij}) with zero diagonal, is it possible to
reverse the signs of some of the entries of A (that is, replace some a_{ij} by −a_{ij}) so
that the resulting matrix B is skew-symmetric and haf A = Pf B?
Given such a matrix A, let us consider an undirected graph G_A with set {1, ..., n}
of vertices and edges {i, j} provided a_{ij} ≠ 0. We obtain a skew-symmetric matrix
B if for every unordered pair {i, j} we reverse the sign of exactly one entry among
a_{ij} and a_{ji}. This procedure is encoded by making the graph G_A directed: for every
edge {i, j} of G_A we introduce the directed edge i→j if the sign of a_{ij} is not reversed.
We denote the resulting directed graph by G_B. If haf A = Pf B, we say that G_B is
the Pfaffian orientation of G_A.
Our next goal is to sketch a proof of the famous result of Kasteleyn [Ka63], see also
[TF61], that if G_A is a planar graph then it has a Pfaffian orientation, which can be
constructed efficiently. We follow [LP09].
We call an even cycle C in G A relevant if the graph obtained by deleting from
G A the vertices of C and all adjacent edges contains a perfect matching.

4.3.6 Lemma. Let G B be an orientation of G A . Suppose that every relevant cycle


C is oddly oriented. Then sgn I = sgn J for any two perfect matchings in G B .

Fig. 4.7 A drawing of a planar graph (edges numbered 1–9, bounded faces I, II, III)

Proof. As follows from Lemma 4.3.3, (sgn I )(sgn J ) = 1 for any two perfect
matchings I and J . 

Let us consider a drawing of a directed planar graph G in the plane. Connected


components of ℝ² \ G are called faces of G. There is one unbounded face, and there
can be none, one or several bounded faces. By choosing the orientation of the plane, we
can talk about the edges of any bounded face oriented clockwise or counterclockwise,
see Fig. 4.7.
For example, for the graph on Fig. 4.7, we have:
For the face I, the edges 3, 6 and 5 are oriented clockwise while the edge 8 is
oriented counterclockwise;
For the face II, the edges 4 and 9 are oriented clockwise while the edges 6 and
7 are oriented counterclockwise;
For the face III, the edge 1 is oriented clockwise while the edges 3, 4 and 2 are
oriented counterclockwise.
The edges 1, 2, 7, 9, 8, 5 form a cycle C. With respect to that cycle, the
edges 1, 9 and 5 are oriented clockwise while the edges 2, 7 and 8 are oriented
counterclockwise.
Note that if the same edge belongs to two bounded faces then in one of the faces
it is oriented clockwise and in the other counterclockwise.
Similarly, we define the orientation of the edges of any directed cycle drawn on
the plane.
We will use the Euler formula relating the vertices, edges and faces of a planar
graph G. To apply Euler’s formula, we need the graph G to be 2-connected, meaning
that every two vertices of G can be connected by at least 2 vertex-disjoint (with
the exception of the endpoints) paths in G, so that G has no “loose ends” and the
embedding of G looks like the one on Fig. 4.8.

4.3.7 Lemma. Let G be a drawing of a 2-connected directed graph, without loops
or multiple edges, in the plane. Suppose that every bounded face has an odd number
of edges oriented clockwise. Then every relevant cycle C in G is oddly oriented.

Proof. Since C is relevant, the graph obtained from G by deleting the vertices of C
and all adjacent edges contains a perfect matching and, therefore, the number v of
vertices of G lying inside the region bounded by C is even, so

Fig. 4.8 A drawing of a 2-connected planar graph

v ≡ 0 mod 2.

Let w be the number of vertices of C and hence also the number of edges of C. Let
f be the number of faces lying inside C, let c_i be the number of clockwise oriented
edges in the i-th face and let c be the number of clockwise edges in C. Since each c_i
is odd, we have

Σ_{i=1}^{f} c_i ≡ f mod 2.

Let e be the number of edges inside C. Then by Euler's formula,

(v + w) − (e + w) + f = 1

and hence

e = v + f − 1.

Since every interior edge is counted as clockwise for exactly one face, we have

Σ_{i=1}^{f} c_i = e + c

and hence

f ≡ e + c = c + v + f − 1 mod 2.

It follows then that


c ≡ 1 mod 2,

as required. 
Now it is clear how to construct a Pfaffian orientation of a planar graph: we build
the graph edge by edge so that at most one new bounded face appears at each step.
We orient the edge in such a way that the new face has an odd number of clockwise
oriented edges, see Fig. 4.9.

Fig. 4.9 Constructing a Pfaffian orientation of a graph

As follows from Lemmas 4.3.6 and 4.3.7, for any two perfect matchings I and J
in the graph, we have sgn I = sgn J and hence haf A = |Pf B| for the skew-
symmetric matrix B constructed from a given symmetric matrix A. If it so happens
that haf A = −Pf B, we reverse the sign of the first row and column of B.
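For the smallest interesting case, a 4-cycle, the whole construction can be traced by hand (our own illustration). Orienting three edges along the cycle and one against it gives the single bounded face an odd number of clockwise edges, and then |Pf B| equals the number of perfect matchings, haf A = 2:

```python
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]  # a 4-cycle on vertices 0..3
A = [[0] * 4 for _ in range(4)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# haf of a 4x4 symmetric matrix: the three pairings of {0,1,2,3}.
haf = A[0][1] * A[2][3] + A[0][2] * A[1][3] + A[0][3] * A[1][2]

# Orient each edge as listed (i -> j): the bounded face then has an odd
# number of clockwise edges, so by Lemma 4.3.7 the orientation is Pfaffian.
B = [[0] * 4 for _ in range(4)]
for i, j in edges:
    B[i][j], B[j][i] = A[i][j], -A[i][j]

pf = B[0][1] * B[2][3] - B[0][2] * B[1][3] + B[0][3] * B[1][2]
print(haf, pf)  # 2 2
```

The two perfect matchings of the 4-cycle contribute with the same sign, exactly as Lemma 4.3.6 predicts.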
Galluccio and Loebl proved that if the genus of the graph G_A is g, then haf A can
be written as a sum of 4^g Pfaffians [GL99]. While no efficient algorithm for checking
whether a given graph has a Pfaffian orientation appears to be known, in the case of a
bipartite graph there is a polynomial time algorithm [R+99], see [Th06] for a survey.
 
4.3.8 Hafnians as expectations of random Pfaffians. Let A = (a_{ij}) be a non-
negative real symmetric n × n matrix, where n = 2m is even. For 1 ≤ i < j ≤ n, let
ξ_{ij} be real valued independent random variables such that

E ξ_{ij} = 0 and var ξ_{ij} = 1 for all 1 ≤ i < j ≤ n.

Let us define a skew-symmetric random matrix B = (b_{ij}) by

b_{ij} = ξ_{ij} √(a_{ij}) if i < j,   b_{ij} = −ξ_{ji} √(a_{ij}) if i > j,   and b_{ii} = 0.

It is not hard to see that

haf A = E (Pf B)².

As in Sect. 3.9.1, the Markov inequality implies that the probability that (Pf B)2
overestimates haf A by a factor of λ > 1 does not exceed 1/λ. In [Ba99] it is shown
that if ξi j are independent standard Gaussian then with probability approaching 1 as
n grows, we have
(Pf B)2 > cn haf A

for some absolute constant c > 0 (one can choose c ≈ 0.28). As in Sect. 3.9, we can
get a better constant c ≈ 0.56 by switching to complex Gaussian ξi j and replacing
(Pf B)2 by |Pf B|2 , but unlike in the case of the permanent there does not seem to
exist a viable quaternionic version of the estimator.
In [R+16], the authors identified a class of matrices A for which the approximation
factor is subexponential in n.
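The identity haf A = E (Pf B)² is simple enough to verify exactly for n = 4 with Rademacher variables ξ_{ij} = ±1 (which have mean 0 and variance 1), averaging over all 2⁶ sign patterns; the sketch below is our own check on an arbitrary test matrix.

```python
import math
from itertools import product

A = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]  # nonnegative symmetric, zero diagonal

# haf A for n = 4: the three pairings of {0,1,2,3}.
haf = A[0][1] * A[2][3] + A[0][2] * A[1][3] + A[0][3] * A[1][2]  # = 28

pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
total = 0.0
for signs in product((-1, 1), repeat=len(pairs)):
    b = {p: s * math.sqrt(A[p[0]][p[1]]) for p, s in zip(pairs, signs)}
    # Pf B for a 4x4 skew-symmetric matrix.
    pf = b[(0, 1)] * b[(2, 3)] - b[(0, 2)] * b[(1, 3)] + b[(0, 3)] * b[(1, 2)]
    total += pf * pf
print(haf, total / 2 ** len(pairs))  # both equal 28, up to rounding
```

The cross terms in (Pf B)² average to zero over the sign patterns, leaving exactly the three matching monomials of haf A.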

4.4 Multidimensional Permanents


 
4.4.1 Permanents of tensors. Let A = (a_{i_1...i_d}) be a d-dimensional cubical n ×
... × n array (tensor) of complex numbers. We define the d-dimensional permanent
of A by

PER A = Σ_{σ_2,...,σ_d∈S_n} ∏_{i=1}^{n} a_{i σ_2(i)...σ_d(i)},

where the sum is taken over all (d − 1)-tuples (σ_2, ..., σ_d) of permutations
from the symmetric group S_n. In particular, if d = 2 then A is an n × n
matrix and PER A = per A, cf. Sect. 3.1.1.
If ai1 ...id ∈ {0, 1} for all 1 ≤ i 1 , . . . , i d ≤ n, then PER A is naturally interpreted
as the number of perfect matchings in the d-partite hypergraph H encoded by A: the
vertices of H are split among d classes, where each class contains exactly n vertices,
numbered 1, . . . , n and the edges of H consist of the d-tuples (i 1 , . . . , i d ) where
ai1 ...id = 1 and i j denotes the i j -th vertex from the j-th class. A perfect matching in
H is a collection of edges containing each vertex exactly once.
For example, the perfect matching in a 3-partite hypergraph pictured on Fig. 4.10
corresponds to the pair of permutations (σ_2, σ_3), where

σ_2: 1 ↦ 2, 2 ↦ 3, 3 ↦ 1, 4 ↦ 4 and σ_3: 1 ↦ 3, 2 ↦ 2, 3 ↦ 1, 4 ↦ 4.

Hence for d ≥ 3 it is an NP-hard problem to decide whether PER A > 0 for a given
d-dimensional array A with 0-1 entries.
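For small n and d, the definition can still be evaluated directly; the brute-force sketch below (our own illustration, exponential in n, shown for d = 3) also confirms the normalization PER J = (n!)^{d−1} for the all-ones tensor J.

```python
from itertools import permutations
from math import factorial, prod

def per3(A):
    # d = 3: PER A = sum over pairs (s2, s3) of permutations of
    # prod_i A[i][s2(i)][s3(i)].
    n = len(A)
    return sum(prod(A[i][s2[i]][s3[i]] for i in range(n))
               for s2 in permutations(range(n))
               for s3 in permutations(range(n)))

n = 3
J = [[[1] * n for _ in range(n)] for _ in range(n)]  # all-ones 3-dimensional tensor
print(per3(J), factorial(n) ** 2)  # PER J = (n!)^{d-1} = 36 for n = 3, d = 3
```

For a 0-1 tensor the same routine counts the perfect matchings of the corresponding 3-partite hypergraph, which is exactly the NP-hard quantity mentioned above.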
Given a d-dimensional array A, we call the (k, j)-th slice of A the set of all entries
a_{i_1...i_d} where i_k = j. Hence if d = 2 a slice is a row (k = 1) or a column (k = 2) of
the matrix A, and for a general d, each entry of A is contained in exactly d slices and
each slice consists of n^{d−1} entries of A.

Fig. 4.10 A perfect matching in a 3-partite hypergraph

Similarly to (3.1.1.2), we obtain the “(1, 1)-slice expansion” of the permanent of
a tensor:

PER A = Σ_{1≤i_2,...,i_d≤n} a_{1 i_2...i_d} PER A_{i_2...i_d},    (4.4.1.1)

where A_{i_2...i_d} is the d-dimensional (n − 1) × ... × (n − 1) array obtained from A by
crossing out all slices containing a_{1 i_2...i_d}.
Some (but far from all) of the results and methods developed in Chap. 3 extend to
multi-dimensional permanents. In particular, the permanents of tensors whose entries
are close to 1 can be efficiently approximated, cf. Theorems 3.6.1 and 3.6.2.
4.4.2 Theorem. For an integer d ≥ 2 let us choose

δ_d = sin(α/2) cos((d − 1)α/2)

for some α = α_d > 0 such that

(d − 1)α < 2π/3.

Hence 0 < δ_d < 1 and we can choose δ_2 = 0.5, δ_3 = √6/9 ≈ 0.272, δ_4 ≈ 0.1845
and δ_d = Ω(1/d).
 
(1) For any d-dimensional cubical array Z = (z_{i_1...i_d}) of complex numbers satisfying

|1 − z_{i_1...i_d}| ≤ δ_d for all 1 ≤ i_1, ..., i_d ≤ n

we have

PER Z ≠ 0.

(2) For any 0 < δ < δ_d there is γ = γ(δ_d/δ) > 0, and for any ε > 0 and integer
n ≥ 1 there is a polynomial p = p_{d,n,ε,δ} in the entries of a d-dimensional
n × ... × n array A such that

deg p ≤ γ(ln n − ln ε)

and

|ln PER A − p(A)| ≤ ε

provided A = (a_{i_1...i_d}) is a cubical d-dimensional n × ... × n array of complex
numbers satisfying

|1 − a_{i_1...i_d}| < δ for all 1 ≤ i_1, ..., i_d ≤ n.

Proof. The proof is similar to those of Sect. 3.6. In Part (1), let α = α_d be a real
number such that 0 < (d − 1)α < 2π/3 and

δ_d = sin(α/2) cos((d − 1)α/2).

We prove by induction on n that if A = (a_{i_1...i_d}) and B = (b_{i_1...i_d}) are two n × ... × n
arrays of complex numbers satisfying

|1 − a_{i_1...i_d}| ≤ δ_d and |1 − b_{i_1...i_d}| ≤ δ_d

for all 1 ≤ i_1, ..., i_d ≤ n and such that A and B differ in at most one slice, then
PER A ≠ 0, PER B ≠ 0 and the angle between the two non-zero complex numbers
PER A and PER B does not exceed α.
If n = 1, then clearly PER A ≠ 0 and PER B ≠ 0 and the angle between the
two numbers does not exceed 2 arcsin δ_d < α. Assuming that n > 1, without loss of
generality, we assume that B is obtained from A by replacing a_{1 i_2...i_d} by b_{1 i_2...i_d} for
all 1 ≤ i_2, ..., i_d ≤ n. By (4.4.1.1), we have
all 1 ≤ i 2 , . . . , i d ≤ n. By (4.4.1.1), we have

PER A = a1i2 ...id PER Ai2 ...id and
1≤i 2 ,...,i d ≤n

PER B = b1i2 ...id PER Ai2 ...id ,
1≤i 2 ,...,i d ≤n

where A_{i_2...i_d} is the (n − 1) × ... × (n − 1) array obtained from A by crossing out all d
slices containing a_{1 i_2...i_d}. Next, we observe that any two arrays A_{i_2...i_d} and A_{j_2...j_d}, up to
a permutation of slices, differ in at most (d − 1) slices. By the induction hypothesis
we have PER A_{i_2...i_d} ≠ 0 for all 1 ≤ i_2, ..., i_d ≤ n and the angle between any two
non-zero complex numbers PER A_{i_2...i_d} and PER A_{j_2...j_d} does not exceed (d − 1)α.
Applying Lemma 3.6.4, we conclude that PER A ≠ 0, PER B ≠ 0 and the angle
between PER A and PER B does not exceed

2 arcsin( δ_d / cos((d − 1)α/2) ) = α,

which completes the proof of Part (1).


To prove Part (2), let J = J_{d,n} be the n × ... × n tensor filled with 1s. We define
a polynomial

g(z) = PER (J + z(A − J))

of degree at most n, so that

g(0) = PER J = (n!)^{d−1} and g(1) = PER A.
Moreover, for β = δ_d/δ > 1 we have g(z) ≠ 0 whenever |z| ≤ β. We choose a
branch of f(z) = ln g(z) for |z| ≤ 1 such that f(0) is real and use Lemma 2.2.1 to
claim that for some k ≤ γ(ln n − ln ε) the Taylor polynomial

p_k(z) = f(0) + Σ_{m=1}^{k} (f^{(m)}(0)/m!) z^m

approximates f(z) for |z| ≤ 1 within an additive error ε. We need to show that
p_k(1) is a polynomial of degree at most k in the entries of A. To finish the proof as
in Sect. 3.6.7, it suffices to show that g^{(m)}(0) is a polynomial in the entries of A of
degree at most m. Indeed,

g^{(m)}(0) = d^m/dz^m [ Σ_{σ_2,...,σ_d∈S_n} ∏_{i=1}^{n} (1 + z(a_{i σ_2(i)...σ_d(i)} − 1)) ]_{z=0}
    = Σ_{σ_2,...,σ_d∈S_n} Σ_{(i_1,...,i_m)} (a_{i_1 σ_2(i_1)...σ_d(i_1)} − 1) ⋯ (a_{i_m σ_2(i_m)...σ_d(i_m)} − 1),

where the last sum is taken over all ordered m-tuples of distinct indices (i_1, ..., i_m).
Therefore,

g^{(m)}(0) = ((n − m)!)^{d−1} Σ (a_{i_{11} i_{21}...i_{d1}} − 1)(a_{i_{12} i_{22}...i_{d2}} − 1) ⋯ (a_{i_{1m} i_{2m}...i_{dm}} − 1),

where the sum is taken over all ordered d-tuples of ordered m-tuples (i_{j1}, ..., i_{jm}),
1 ≤ j ≤ d, of distinct indices i_{jk}. □

For fixed d and δ, the polynomial p can be computed in n^{O(ln n − ln ε)} time. Later,
in Theorem 5.5.3, we prove that PER A ≠ 0 if the ℓ_1 distance of each slice of a
d-dimensional n × ... × n complex cubical array A to the array of 1s does not exceed
γ_d n^{d−1}, where γ_d = (α(d − 1))^{d−1}/d^d and α ≈ 0.278 is an absolute constant.
If the entries of the tensor are real positive, we obtain better bounds, although
for d > 2 the improvement is not as substantial as in the case of permanents, see
Sect. 3.7.
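The defining expansion $\operatorname{PER} A = \sum_{\sigma_2, \ldots, \sigma_d \in S_n} \prod_{i=1}^{n} a_{i \sigma_2(i) \ldots \sigma_d(i)}$ used in the proof above is easy to check directly on small tensors. The following Python sketch (the helper name `per` is ours, not from the text; brute force, so only feasible for tiny $n$ and $d$) computes the $d$-dimensional permanent and confirms that $\operatorname{PER} J = (n!)^{d-1}$ for the all-ones tensor.

```python
from itertools import permutations, product
from math import factorial, isclose

import numpy as np

def per(A):
    """Brute-force d-dimensional permanent of an n x ... x n tensor:
    PER A = sum over permutations sigma_2,...,sigma_d of
            prod_i A[i, sigma_2(i), ..., sigma_d(i)]."""
    n, d = A.shape[0], A.ndim
    total = 0.0
    # iterate over all (d-1)-tuples of permutations of {0,...,n-1}
    for sigmas in product(permutations(range(n)), repeat=d - 1):
        p = 1.0
        for i in range(n):
            p *= A[(i,) + tuple(s[i] for s in sigmas)]
        total += p
    return total

n, d = 3, 3
J = np.ones((n,) * d)
print(per(J))  # (n!)^(d-1) = 36 for n = d = 3
assert isclose(per(J), factorial(n) ** (d - 1))
```

The same routine evaluates $g(z) = \operatorname{PER}(J + z(A - J))$ at any point, which is all the interpolation method below needs.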

4.4.3 Theorem. For an integer $d \ge 2$, let
$$\delta_d = \tan \frac{\pi}{4(d-1)},$$
so that $\delta_2 = 1$, $\delta_3 = \sqrt{2} - 1 \approx 0.41$, $\delta_4 = 2 - \sqrt{3} \approx 0.27$, etc.

(1) Let us fix real $\delta$ and $\tau$ where
$$0 \le \delta < \delta_d \quad \text{and} \quad \tau = (1 - \delta) \sin \left( \frac{\pi}{4(d-1)} - \arctan \delta \right) > 0.$$
Let $Z = \left( z_{i_1 \ldots i_d} \right)$ be a $d$-dimensional $n \times \cdots \times n$ array of complex numbers such that for all $1 \le i_1, \ldots, i_d \le n$ we have
$$\left| 1 - \Re\, z_{i_1 \ldots i_d} \right| \le \delta \quad \text{and} \quad \left| \Im\, z_{i_1 \ldots i_d} \right| \le \tau.$$
Then
$$\operatorname{PER} Z \ne 0.$$

(2) For any integer $d \ge 2$ and any $0 \le \delta < \delta_d$ there is a constant $\gamma = \gamma(\delta_d / \delta) > 0$, and for any positive integer $n$ and real $0 < \epsilon < 1$ there is a polynomial $p = p_{n,d,\delta,\epsilon}$ with $\deg p \le \gamma (\ln n - \ln \epsilon)$ in the entries of a $d$-dimensional $n \times \cdots \times n$ array such that
$$\left| \ln \operatorname{PER} A - p(A) \right| \le \epsilon$$
for any $d$-dimensional $n \times \cdots \times n$ array $A = \left( a_{i_1 \ldots i_d} \right)$ of real numbers satisfying
$$\left| 1 - a_{i_1 \ldots i_d} \right| \le \delta \quad \text{for all } 1 \le i_1, \ldots, i_d \le n.$$

Proof. The proof is similar to those of Sect. 3.7. Let $U_n = U_n(d, \delta, \tau)$ be the set of all $d$-dimensional $n \times \cdots \times n$ complex tensors $Z = \left( z_{i_1 \ldots i_d} \right)$ that satisfy the conditions of Part (1). We prove by induction on $n$ that for any two tensors $A, B \in U_n$ that differ in at most one slice, we have $\operatorname{PER} A \ne 0$, $\operatorname{PER} B \ne 0$ and the angle between the complex numbers does not exceed $\dfrac{\pi}{2(d-1)}$.

If $n = 1$ then clearly $\operatorname{PER} A \ne 0$, $\operatorname{PER} B \ne 0$ and the angle between $\operatorname{PER} A$ and $\operatorname{PER} B$ does not exceed
$$2 \arctan \frac{\tau}{1 - \delta} \le \frac{\pi}{2(d-1)} - 2 \arctan \delta < \frac{\pi}{2(d-1)}.$$
Suppose that $n > 1$. Without loss of generality, we assume that $B$ is obtained from $A$ by replacing $a_{1 i_2 \ldots i_d}$ by $b_{1 i_2 \ldots i_d}$ for all $1 \le i_2, \ldots, i_d \le n$. We have
$$\operatorname{PER} A = \sum_{1 \le i_2, \ldots, i_d \le n} a_{1 i_2 \ldots i_d} \operatorname{PER} A_{i_2 \ldots i_d} \quad \text{and} \quad \operatorname{PER} B = \sum_{1 \le i_2, \ldots, i_d \le n} b_{1 i_2 \ldots i_d} \operatorname{PER} A_{i_2 \ldots i_d},$$
where $A_{i_2 \ldots i_d}$ is the $(n-1) \times \cdots \times (n-1)$ array obtained from $A$ by crossing out all $d$ slices containing $a_{1 i_2 \ldots i_d}$. Next, we observe that any two arrays $A_{i_2 \ldots i_d}$ and $A_{j_2 \ldots j_d}$, up to a permutation of slices, differ in at most $d - 1$ slices. By the induction hypothesis, we have $\operatorname{PER} A_{i_2 \ldots i_d} \ne 0$ for all $1 \le i_2, \ldots, i_d \le n$ and that the angle between any two non-zero complex numbers $\operatorname{PER} A_{i_2 \ldots i_d}$ and $\operatorname{PER} A_{j_2 \ldots j_d}$ does not exceed $\pi/2$. Applying Part (3) of Lemma 3.7.3, we conclude that $\operatorname{PER} A \ne 0$, $\operatorname{PER} B \ne 0$ and that the angle between $\operatorname{PER} A$ and $\operatorname{PER} B$ does not exceed
$$2 \arctan \delta + 2 \arcsin \frac{\tau}{1 - \delta} = \frac{\pi}{2(d-1)}.$$

To prove Part (2), let $J = J_{n,d}$ be the $n \times \cdots \times n$ tensor filled with 1s and let us define a univariate polynomial
$$r(z) = \operatorname{PER} \left( J + z(A - J) \right).$$
Suppose that
$$-\alpha \le \Re\, z \le 1 + \alpha \quad \text{and} \quad \left| \Im\, z \right| \le \rho \tag{4.4.3.1}$$
for some $\alpha > 0$ and $\rho > 0$. Then
$$1 - (1 + \alpha)\delta \le \Re \left( 1 + z \left( a_{i_1 \ldots i_d} - 1 \right) \right) \le 1 + (1 + \alpha)\delta \quad \text{and} \quad \left| \Im \left( 1 + z \left( a_{i_1 \ldots i_d} - 1 \right) \right) \right| \le \rho \delta.$$
Let us choose a sufficiently small $\alpha = \alpha(\delta) > 0$ so that $\delta' = (1 + \alpha)\delta < \delta_d$ and let
$$\rho = \frac{1 - \delta'}{\delta} \sin \left( \frac{\pi}{4(d-1)} - \arctan \delta' \right).$$
Then by Part (1) we have $r(z) \ne 0$ for all $z$ satisfying (4.4.3.1). Let $\phi(z) = \phi_{\delta_d / \delta}(z)$ be the univariate polynomial constructed in Lemma 2.2.3, such that
$$\phi(0) = 0, \quad \phi(1) = 1$$
and
$$-\alpha \le \Re\, \phi(z) \le 1 + \alpha \quad \text{and} \quad \left| \Im\, \phi(z) \right| \le \rho$$
provided
$$|z| \le \beta \quad \text{for some } \beta = \beta(\delta_d / \delta) > 1.$$
We define the composition
$$g(z) = r(\phi(z)).$$
Then $g(z)$ is a univariate polynomial and $\deg g = (\deg r)(\deg \phi) = O(n)$, where the implicit constant in the "$O$" notation depends only on $\delta_d / \delta$. In addition,
$$g(0) = r(0) = \operatorname{PER} J = (n!)^{d-1} \quad \text{and} \quad g(1) = r(1) = \operatorname{PER} A$$
and
$$g(z) \ne 0 \quad \text{provided} \quad |z| \le \beta.$$
The proof is finished as in Sect. 3.7.5. We choose a branch of $f(z) = \ln g(z)$ in the disc $|z| \le 1$ so that
$$f(0) = (d-1) \ln n! \quad \text{and} \quad f(1) = \ln \operatorname{PER} A.$$
Let $p_m(z)$ be the Taylor polynomial of $f(z)$ of degree $m$ computed at $z = 0$. By Lemma 2.2.1, we can choose $m = O(\ln n - \ln \epsilon)$, where the implicit constant in the "$O$" notation depends on $\delta_d / \delta$, so that $p_m(1)$ approximates $f(1)$ within an additive error of $\epsilon$. It remains to show that the $k$-th derivative $f^{(k)}(0)$ is a polynomial of degree $k$ in the entries of the tensor $A$. By Sect. 2.2.2, it suffices to show that $g^{(k)}(0)$ is a polynomial of degree $k$ in the entries of the tensor $A$. We showed in the proof of Theorem 4.4.2 that $r^{(k)}(0)$ is a polynomial in the entries of $A$ of degree $k$, and we compute the expansion of the composition $g(z) = r(\phi(z))$ as in Sect. 3.7.5. $\square$

For fixed $d$ and $\delta$, the polynomial $p$ in Part (2) of Theorem 4.4.3 can be computed in $n^{O(\ln n - \ln \epsilon)}$ time.
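The mechanics of the approximation can be simulated numerically for a small tensor: evaluate $g(z) = \operatorname{PER}(J + z(A - J))$ at a few points, recover its coefficients by interpolation, form the Taylor polynomial of $f(z) = \ln g(z)$ at $z = 0$, and evaluate it at $z = 1$. The Python sketch below is a toy illustration only (brute-force permanent, our own helper names, no composition with the polynomial $\phi$ from the actual proof) for entries close to 1.

```python
from itertools import permutations, product
from math import factorial, log, isclose

import numpy as np

def per(A):
    """Brute-force d-dimensional permanent."""
    n, d = A.shape[0], A.ndim
    return sum(
        np.prod([A[(i,) + tuple(s[i] for s in sigmas)] for i in range(n)])
        for sigmas in product(permutations(range(n)), repeat=d - 1)
    )

rng = np.random.default_rng(0)
n, d = 3, 3
A = 1.0 + 0.1 * rng.uniform(-1, 1, size=(n,) * d)   # entries close to 1
J = np.ones_like(A)

# g(z) = PER(J + z(A - J)) is a polynomial of degree <= n in z;
# recover its coefficients g[0], ..., g[n] by interpolation.
zs = np.arange(n + 1, dtype=float)
vals = [per(J + z * (A - J)) for z in zs]
g = np.polynomial.polynomial.polyfit(zs, vals, n)   # g[k] = coeff of z^k

# Taylor coefficients c_k of f = ln g at 0, via k*g_k = sum_{j<=k} j*c_j*g_{k-j}
m = n
c = [log(g[0])]
for k in range(1, m + 1):
    c.append((k * g[k] - sum(j * c[j] * g[k - j] for j in range(1, k))) / (k * g[0]))

approx = sum(c)           # Taylor polynomial of ln g evaluated at z = 1
exact = log(per(A))
print(approx, exact)      # close, since the roots of g lie far from the unit disc
```

The error of the degree-$m$ Taylor polynomial decays geometrically in $m$ because $g$ has no roots in a disc of radius $\beta > 1$, which is exactly what Part (1) of the theorems above guarantees.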
Figure 4.11 pictures the regions for the entries of a tensor allowed by Theorem 4.4.2 (a disc) and by Theorem 4.4.3 (a rectangle).

Fig. 4.11 A neighborhood for the entries of $A$ (disc) where $\operatorname{PER} A$ is approximated by Theorem 4.4.2 and a neighborhood for the entries of $A$ (rectangle) where $\operatorname{PER} A$ is approximated by Theorem 4.4.3

4.4.4 Scaling. Similarly to the scaling of matrices (see Sect. 3.5), one can define the scaling of tensors. We say that the $d$-dimensional $n \times \cdots \times n$ tensor $A = \left( a_{i_1 \ldots i_d} \right)$ is obtained from the $d$-dimensional $n \times \cdots \times n$ tensor $B = \left( b_{i_1 \ldots i_d} \right)$ by scaling if there exist $\lambda_{kj} > 0$ for $k = 1, \ldots, d$ and $j = 1, \ldots, n$ such that
$$a_{i_1 \ldots i_d} = \lambda_{1 i_1} \cdots \lambda_{d i_d} b_{i_1 \ldots i_d} \quad \text{for all } 1 \le i_1, \ldots, i_d \le n.$$
Clearly, in this case,
$$\operatorname{PER} A = \left( \prod_{\substack{1 \le k \le d \\ 1 \le j \le n}} \lambda_{kj} \right) \operatorname{PER} B. \tag{4.4.4.1}$$
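Identity (4.4.4.1) is easy to confirm numerically: scale a random positive tensor by random factors $\lambda_{kj}$ and compare the two permanents. A minimal sketch (the brute-force `per` helper is ours, not the book's):

```python
from itertools import permutations, product
from math import isclose

import numpy as np

def per(A):
    """Brute-force d-dimensional permanent."""
    n, d = A.shape[0], A.ndim
    return sum(
        np.prod([A[(i,) + tuple(s[i] for s in sigmas)] for i in range(n)])
        for sigmas in product(permutations(range(n)), repeat=d - 1)
    )

rng = np.random.default_rng(1)
n, d = 3, 3
B = rng.uniform(0.5, 1.5, size=(n,) * d)
lam = rng.uniform(0.5, 2.0, size=(d, n))      # scaling factors lambda_{kj} > 0

# a_{i1...id} = lam[0, i1] * ... * lam[d-1, id] * b_{i1...id}
A = B.copy()
for k in range(d):
    shape = [1] * d
    shape[k] = n
    A = A * lam[k].reshape(shape)

lhs = per(A)
rhs = np.prod(lam) * per(B)                   # identity (4.4.4.1)
assert isclose(lhs, rhs, rel_tol=1e-9)
```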

We say that $B$ is $d$-stochastic if the entries of $B$ are non-negative:
$$b_{i_1 \ldots i_d} \ge 0 \quad \text{for all } 1 \le i_1, \ldots, i_d \le n,$$
and the sum of the entries in every slice is 1:
$$\sum_{1 \le i_1, \ldots, i_{k-1}, i_{k+1}, \ldots, i_d \le n} b_{i_1 \ldots i_{k-1}\, i\, i_{k+1} \ldots i_d} = 1 \quad \text{for all } k = 1, \ldots, d \text{ and all } i = 1, \ldots, n. \tag{4.4.4.2}$$


 
4.4.5 Theorem. Any $d$-dimensional cubical tensor $A = \left( a_{i_1 \ldots i_d} \right)$ with real positive entries
$$a_{i_1 \ldots i_d} > 0 \quad \text{for all } 1 \le i_1, \ldots, i_d \le n$$
can be obtained by scaling from a unique $d$-stochastic tensor $B$. The tensor $B$ can be found as the necessarily unique minimum of the convex function
$$f(X) = \sum_{1 \le i_1, \ldots, i_d \le n} x_{i_1 \ldots i_d} \ln \frac{x_{i_1 \ldots i_d}}{a_{i_1 \ldots i_d}}$$
on the convex polytope of $d$-stochastic $n \times \cdots \times n$ tensors $X$. Thus we have
$$b_{i_1 \ldots i_d} = \lambda_{1 i_1} \cdots \lambda_{d i_d} a_{i_1 \ldots i_d} \quad \text{for all } 1 \le i_1, \ldots, i_d \le n$$
and some $\lambda_{kj} > 0$ for $k = 1, \ldots, d$ and $j = 1, \ldots, n$. The numbers $\lambda_{kj}$ are unique up to a rescaling
$$\lambda_{kj} \longrightarrow \tau_k \lambda_{kj} \quad \text{for all } k, j,$$
for some $\tau_1, \ldots, \tau_d > 0$ such that $\tau_1 \cdots \tau_d = 1$.

The proof is very similar to the proof of Theorem 3.5.2, see [BS11] and also [Fr11] for extensions to the case of non-negative tensors $A$. We note that
$$f(B) = \sum_{1 \le i_1, \ldots, i_d \le n} b_{i_1 \ldots i_d} \ln \frac{b_{i_1 \ldots i_d}}{a_{i_1 \ldots i_d}} = \sum_{1 \le i_1, \ldots, i_d \le n} b_{i_1 \ldots i_d} \sum_{k=1}^{d} \ln \lambda_{k i_k} = \sum_{k=1}^{d} \sum_{j=1}^{n} \ln \lambda_{kj} \left( \sum_{\substack{1 \le i_1, \ldots, i_d \le n \\ i_k = j}} b_{i_1 \ldots i_d} \right) = \sum_{k=1}^{d} \sum_{j=1}^{n} \ln \lambda_{kj}.$$

For $d \ge 3$, it is relatively easy to construct an example of a $d$-stochastic tensor $B$ such that $\operatorname{PER} B = 0$, see [BS11]. The situation with the $d$-dimensional permanent is somewhat similar to that with the hafnian, cf. Sect. 4.2: while there is no van der Waerden-type lower bound, there is concentration of the permanents of well-conditioned $d$-stochastic tensors.
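In practice, the scaling tensor $B$ of Theorem 4.4.5 can be computed by a Sinkhorn-type iteration that repeatedly normalizes the slice sums in each of the $d$ directions. The sketch below is our own implementation of this standard idea (names, iteration count and convergence tolerance are our choices; it assumes plain iteration converges for the given positive input, which it does for strictly positive tensors):

```python
import numpy as np

def scale_to_d_stochastic(A, iters=500):
    """Iteratively rescale a positive tensor so that every slice sums to 1.
    Returns the scaled tensor B and the factors lam with
    b_{i1...id} = lam[0, i1] * ... * lam[d-1, id] * a_{i1...id}."""
    d, n = A.ndim, A.shape[0]
    B = A.astype(float).copy()
    lam = np.ones((d, n))
    for _ in range(iters):
        for k in range(d):
            axes = tuple(ax for ax in range(d) if ax != k)
            s = B.sum(axis=axes)          # current slice sums in direction k
            lam[k] /= s
            shape = [1] * d
            shape[k] = n
            B = B / s.reshape(shape)
    return B, lam

rng = np.random.default_rng(2)
n, d = 4, 3
A = rng.uniform(0.5, 1.5, size=(n,) * d)
B, lam = scale_to_d_stochastic(A)
for k in range(d):
    axes = tuple(ax for ax in range(d) if ax != k)
    assert np.allclose(B.sum(axis=axes), 1.0, atol=1e-6)
```

By (4.4.4.1), $\operatorname{PER} A = \operatorname{PER} B / \prod_{k,j} \lambda_{kj}$, and Theorem 4.4.8 below pins $\operatorname{PER} B$ within a polynomial factor of $e^{-n(d-1)}$ for well-conditioned tensors, which is the basis of the algorithmic applications in Sect. 4.4.12.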
 
4.4.6 Definition. Let $A = \left( a_{i_1 \ldots i_d} \right)$ be a $d$-dimensional tensor with positive entries. For $\alpha \ge 1$, we say that $A$ is $\alpha$-conditioned if
$$a_{i_1 \ldots i_k' \ldots i_d} \le \alpha\, a_{i_1 \ldots i_k \ldots i_d} \quad \text{for all } 1 \le i_1, \ldots, i_k, i_k', \ldots, i_d \le n \text{ and all } k = 1, \ldots, d. \tag{4.4.6.1}$$
In words: a tensor with positive entries is $\alpha$-conditioned if the ratio of any two entries which differ in one index does not exceed $\alpha$.

4.4.7 Lemma. Let $A$ be an $\alpha$-conditioned $d$-dimensional cubical tensor and let $B$ be a $d$-stochastic tensor obtained from $A$ by scaling. Then $B$ is $\alpha^2$-conditioned.

The proof is very similar to that of Lemma 3.5.6, see also [BS11] for details. Our next goal is to prove the concentration of $d$-dimensional permanents of well-conditioned $d$-stochastic tensors.

4.4.8 Theorem. For any real $\alpha \ge 1$ and any integer $d > 1$ there exists $\gamma = \gamma(d, \alpha) > 0$ such that if $A$ is an $\alpha$-conditioned $d$-stochastic $n \times \cdots \times n$ array then
$$n^{-\gamma} e^{-n(d-1)} \le \operatorname{PER} A \le n^{\gamma} e^{-n(d-1)}.$$
The proof follows the same scheme as the proof of Theorem 4.2.2 for hafnians, see also [BS11]. First, we need the dual description of the scaling factors $\lambda_{kj}$ in Theorem 4.4.5, cf. Theorem 4.2.4.
 
4.4.9 Lemma. Let $A = \left( a_{i_1 \ldots i_d} \right)$ be a $d$-dimensional $n \times \cdots \times n$ tensor with positive entries and let $\lambda_{kj} > 0$ for $1 \le k \le d$, $1 \le j \le n$ be real numbers such that the tensor $B = \left( b_{i_1 \ldots i_d} \right)$, where
$$b_{i_1 \ldots i_d} = \lambda_{1 i_1} \cdots \lambda_{d i_d} a_{i_1 \ldots i_d} \quad \text{for all } 1 \le i_1, \ldots, i_d \le n,$$
is $d$-stochastic. Then the point
$$\xi_{kj} = \ln \lambda_{kj}$$
is a maximum point of the linear function $\ell: \mathbb{R}^{d \times n} \longrightarrow \mathbb{R}$,
$$\ell(x) = \sum_{k=1}^{d} \sum_{j=1}^{n} x_{kj} \quad \text{for } x = \left( x_{kj} \right),$$
on the convex set $C = C(A) \subset \mathbb{R}^{d \times n}$ defined by the inequality
$$C = \left\{ x = \left( x_{kj} \right) : \ \sum_{1 \le i_1, \ldots, i_d \le n} a_{i_1 \ldots i_d} \exp \left\{ \sum_{k=1}^{d} x_{k i_k} \right\} \le n \right\}.$$
The proof is similar to that of Theorem 4.2.4.


Next, we show that if a $d$-dimensional tensor which is close to $d$-stochastic is scaled to $d$-stochastic, then the product of the scaling factors is close to 1.

4.4.10 Lemma. Let $A = \left( a_{i_1 \ldots i_d} \right)$ be an $\alpha$-conditioned $d$-dimensional $n \times \cdots \times n$ tensor such that the sum of the entries of $A$ in the $(k, j)$-th slice is $1 - \delta_{kj}$, where
$$\left| \delta_{kj} \right| \le \frac{\beta}{n} \quad \text{for } k = 1, \ldots, d \text{ and } j = 1, \ldots, n$$
and some
$$0 \le \beta \le \frac{n}{\alpha^{d-1} d}.$$
Suppose further that the sum of the entries of $A$ is $n$. Let $B = \left( b_{i_1 \ldots i_d} \right)$ be a $d$-stochastic tensor obtained from $A$ by scaling, so that
$$b_{i_1 \ldots i_d} = \lambda_{1 i_1} \cdots \lambda_{d i_d} a_{i_1 \ldots i_d} \quad \text{for all } 1 \le i_1, \ldots, i_d \le n$$
and some $\lambda_{kj} > 0$. Then
$$0 \le \sum_{k=1}^{d} \sum_{j=1}^{n} \ln \lambda_{kj} \le \frac{\alpha^{d-1} \beta^2 d^2}{n}.$$

Proof. Since the point $x_{kj} = 0$ belongs to the convex set $C$ of Lemma 4.4.9, we conclude that
$$\sum_{k=1}^{d} \sum_{j=1}^{n} \ln \lambda_{kj} \ge 0.$$
Since the sum of the entries of $A$ is $n$, we have
$$\sum_{j=1}^{n} \delta_{kj} = 0 \quad \text{for } k = 1, \ldots, d. \tag{4.4.10.1}$$
Let us define a tensor $X = \left( x_{i_1 \ldots i_d} \right)$ by
$$x_{i_1 \ldots i_d} = a_{i_1 \ldots i_d} + w_{i_1 \ldots i_d} \quad \text{where} \quad w_{i_1 \ldots i_d} = \frac{1}{n^{d-1}} \sum_{k=1}^{d} \delta_{k i_k}.$$
It follows by (4.4.10.1) that the sum of the entries of $X$ in every slice is 1. Since $A$ is $\alpha$-conditioned, we have
$$a_{i_1 \ldots i_d} \ge \frac{1}{(\alpha n)^{d-1}} \quad \text{for all } i_1, \ldots, i_d \tag{4.4.10.2}$$
and hence $X$ is $d$-stochastic. From Theorem 4.4.5,
$$\begin{aligned} \sum_{k=1}^{d} \sum_{j=1}^{n} \ln \lambda_{kj} &\le \sum_{1 \le i_1, \ldots, i_d \le n} x_{i_1 \ldots i_d} \ln \frac{x_{i_1 \ldots i_d}}{a_{i_1 \ldots i_d}} = \sum_{1 \le i_1, \ldots, i_d \le n} \left( a_{i_1 \ldots i_d} + w_{i_1 \ldots i_d} \right) \ln \left( 1 + \frac{w_{i_1 \ldots i_d}}{a_{i_1 \ldots i_d}} \right) \\ &\le \sum_{1 \le i_1, \ldots, i_d \le n} \left( a_{i_1 \ldots i_d} + w_{i_1 \ldots i_d} \right) \frac{w_{i_1 \ldots i_d}}{a_{i_1 \ldots i_d}} = \sum_{1 \le i_1, \ldots, i_d \le n} \frac{w_{i_1 \ldots i_d}^2}{a_{i_1 \ldots i_d}}, \end{aligned}$$
where in the last equality we use that $\sum w_{i_1 \ldots i_d} = 0$ by (4.4.10.1). Since
$$\left| w_{i_1 \ldots i_d} \right| \le \frac{\beta d}{n^d},$$
by (4.4.10.2) we conclude that
$$\sum_{k=1}^{d} \sum_{j=1}^{n} \ln \lambda_{kj} \le \sum_{1 \le i_1, \ldots, i_d \le n} \frac{w_{i_1 \ldots i_d}^2}{a_{i_1 \ldots i_d}} \le n^d \cdot \frac{\beta^2 d^2}{n^{2d}} \cdot (\alpha n)^{d-1} = \frac{\alpha^{d-1} \beta^2 d^2}{n}. \qquad \square$$


Now we are ready to prove Theorem 4.4.8.

4.4.11 Proof of Theorem 4.4.8. All implied constants in the "$O$" notation below depend on $\alpha$ and $d$ only.

For subsets $I_1, I_2, \ldots, I_d \subset \{1, \ldots, n\}$ such that $|I_1| = \ldots = |I_d|$, we denote by $A(I_1, \ldots, I_d)$ the $d$-dimensional tensor consisting of the entries $a_{i_1 \ldots i_d}$ where $i_k \in I_k$ for $k = 1, \ldots, d$. Let $B(I_1, \ldots, I_d)$ be the $d$-stochastic tensor obtained from $A(I_1, \ldots, I_d)$ by scaling. We prove by induction on $m = |I_1| = \ldots = |I_d|$ that
$$\operatorname{PER} B(I_1, \ldots, I_d) = \exp \left\{ -m(d-1) + O \left( \sum_{j=1}^{m} \frac{1}{j} \right) \right\}. \tag{4.4.11.1}$$
Substituting $m = n$, we get the desired result.

Let $I_1, \ldots, I_d \subset \{1, \ldots, n\}$ be subsets such that $|I_1| = \ldots = |I_d| = m$ and let us choose $i_1 \in I_1$. To simplify the notation, we denote $B(I_1, \ldots, I_d)$ just by $B$ and also assume that $i_1 = 1$. We use the $(1,1)$-slice expansion (4.4.1.1):
$$\operatorname{PER} B = \sum_{i_2 \in I_2, \ldots, i_d \in I_d} b_{1 i_2 \ldots i_d} \operatorname{PER} B_{i_2 \ldots i_d}, \tag{4.4.11.2}$$
where $B_{i_2 \ldots i_d}$ is the tensor obtained from $B$ by crossing out all slices containing $b_{1 i_2 \ldots i_d}$. Note that (4.4.11.2) represents $\operatorname{PER} B$ as a convex combination of the $\operatorname{PER} B_{i_2 \ldots i_d}$.

By Lemma 4.4.7, the tensor $B$ is $\alpha^2$-conditioned. Since $B$ is $d$-stochastic, the entries of $B$ do not exceed $\alpha^{2(d-1)} / m^{d-1}$. Let $\sigma_{i_2 \ldots i_d}$ be the sum of the entries of $B_{i_2 \ldots i_d}$. Hence
$$\sigma_{i_2 \ldots i_d} = m - d + O \left( \frac{1}{m} \right) \tag{4.4.11.3}$$
(we obtain a lower bound when we subtract from the total sum of the entries of $B$ the sums over $d$ slices, and we obtain an upper bound if we add back the sums over all pairwise intersections of slices).

We scale $B_{i_2 \ldots i_d}$ to the total sum of entries $m - 1$, so we define
$$\widetilde{B}_{i_2 \ldots i_d} = \frac{m-1}{\sigma_{i_2 \ldots i_d}} B_{i_2 \ldots i_d}.$$
Then
$$\operatorname{PER} B_{i_2 \ldots i_d} = \left( \frac{\sigma_{i_2 \ldots i_d}}{m-1} \right)^{m-1} \operatorname{PER} \widetilde{B}_{i_2 \ldots i_d}$$
and by (4.4.11.3), we conclude that
$$\operatorname{PER} B_{i_2 \ldots i_d} = \exp \left\{ -(d-1) + O \left( \frac{1}{m} \right) \right\} \operatorname{PER} \widetilde{B}_{i_2 \ldots i_d}. \tag{4.4.11.4}$$
To estimate $\operatorname{PER} \widetilde{B}_{i_2 \ldots i_d}$ we use Lemma 4.4.10. Let us scale $\widetilde{B}_{i_2 \ldots i_d}$ to a $d$-stochastic tensor. The resulting $d$-stochastic tensor is the same tensor we obtain from $A(I_1 \setminus \{i_1\}, \ldots, I_d \setminus \{i_d\})$ by scaling, that is, the tensor $B(I_1 \setminus \{i_1\}, \ldots, I_d \setminus \{i_d\})$. Since the tensor $B$ is $d$-stochastic and the entries of $B$ do not exceed $\alpha^{2(d-1)} / m^{d-1}$, we conclude that the sum of the entries in every slice of $B_{i_2 \ldots i_d}$ is at most 1 and at least $1 - \alpha^{2(d-1)} / m$. Consequently, the absolute value of the difference between the sum of the entries in every slice of $\widetilde{B}_{i_2 \ldots i_d}$ and 1 is $O(1/m)$. Applying Lemma 4.4.10, we conclude that as long as $m > \gamma_1(\alpha, d)$ for some constant $\gamma_1$ depending on $\alpha$ and $d$ only, we have
$$\operatorname{PER} \widetilde{B}_{i_2 \ldots i_d} = \exp \left\{ O \left( \frac{1}{m} \right) \right\} \operatorname{PER} B(I_1 \setminus \{i_1\}, \ldots, I_d \setminus \{i_d\}). \tag{4.4.11.5}$$
We use a trivial estimate
$$\operatorname{PER} B = e^{O(1)} \quad \text{provided} \quad m \le \gamma_1(\alpha, d). \tag{4.4.11.6}$$
Applying the induction hypothesis to $\operatorname{PER} B(I_1 \setminus \{i_1\}, \ldots, I_d \setminus \{i_d\})$ and combining (4.4.11.6), (4.4.11.5) and (4.4.11.2), we complete the proof of (4.4.11.1). $\square$

4.4.12 Algorithmic applications. It follows from Theorem 4.4.5, Lemma 4.4.7 and Theorem 4.4.8 that for any $\alpha \ge 1$, fixed in advance, the permanent of a $d$-dimensional $\alpha$-conditioned $n \times \cdots \times n$ tensor can be efficiently (in polynomial time) approximated within a factor of $n^{\gamma}$ for some $\gamma = \gamma(\alpha, d)$. As is argued in [BS11], this allows us to distinguish $d$-partite hypergraphs that are far from having a perfect matching from $d$-partite hypergraphs that have sufficiently many perfect matchings, even when "sufficiently many" means that the probability to hit a perfect matching at random is exponentially small.

Let $V = V_1 \cup \ldots \cup V_d$ be the set of vertices of a $d$-partite hypergraph $H$, where $|V_1| = \ldots = |V_d| = n$ and for every edge $S$ of $H$ we have $|S \cap V_1| = \ldots = |S \cap V_d| = 1$. We identify each "part" $V_i$ with a copy of the set $\{1, \ldots, n\}$, fix an $0 < \epsilon < 1$ and construct a $d$-dimensional $n \times \cdots \times n$ tensor $A = \left( a_{i_1 \ldots i_d} \right)$ by
$$a_{i_1 \ldots i_d} = \begin{cases} 1 & \text{if } (i_1, \ldots, i_d) \text{ is an edge of } H, \\ \epsilon & \text{otherwise.} \end{cases}$$
Then $A$ is $1/\epsilon$-conditioned, and applying Theorem 4.4.5, Lemma 4.4.7 and Theorem 4.4.8 we can estimate $\operatorname{PER} A$ in polynomial time within a multiplicative factor of $n^{\gamma(\epsilon, d)}$. Now, if every matching in $H$ consists of at most $(1 - \delta)n$ edges for some $\delta > 0$, we have $\operatorname{PER} A \le \epsilon^{\delta n} (n!)^{d-1}$. On the other hand, if $H$ has at least $\beta^n (n!)^{d-1}$ perfect matchings for some $0 < \beta \le 1$, we have $\operatorname{PER} A \ge \beta^n (n!)^{d-1}$. As long as $\epsilon^{\delta} < \beta$, by computing $\operatorname{PER} A$ within a factor of $n^{\gamma(\epsilon, d)}$, we can distinguish these two cases.
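A toy illustration of the two regimes (hedged: tiny sizes, brute-force permanent, our own helper names). Take $d = 3$, $n = 2$, $\epsilon = 0.1$: a hypergraph whose edges form a perfect matching versus one whose largest matching has $(1 - \delta)n$ edges with $\delta = 1/2$:

```python
from itertools import permutations, product
from math import factorial

import numpy as np

def per(A):
    """Brute-force d-dimensional permanent."""
    n, d = A.shape[0], A.ndim
    return sum(
        np.prod([A[(i,) + tuple(s[i] for s in sigmas)] for i in range(n)])
        for sigmas in product(permutations(range(n)), repeat=d - 1)
    )

def edge_tensor(edges, n, d, eps):
    """Entry 1 on edges of the hypergraph, eps elsewhere."""
    A = np.full((n,) * d, eps)
    for e in edges:
        A[e] = 1.0
    return A

n, d, eps = 2, 3, 0.1
# H1 has the perfect matching {(0,0,0), (1,1,1)}
A1 = edge_tensor([(0, 0, 0), (1, 1, 1)], n, d, eps)
# H2 has a single edge, so every matching has at most (1 - 1/2) n = 1 edge
A2 = edge_tensor([(0, 0, 0)], n, d, eps)

delta = 0.5
print(per(A1), per(A2))
assert per(A1) >= 1.0                                           # >= one perfect matching
assert per(A2) <= eps ** (delta * n) * factorial(n) ** (d - 1)  # <= 0.1 * 4 = 0.4
```

For such small sizes the gap is modest, but the bounds above show it widens exponentially in $n$, while the scaling-based estimator loses only a polynomial factor.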
We note that similar results can be obtained for the d-dimensional version of a
hafnian, cf. [BS11]. We also note that there is a Bregman-Minc type upper bound,
cf. Sect. 3.4, for d-dimensional permanents of 0–1 tensors [DG87].
The entropy-based method of proof of the Bregman-Minc inequality found in
[Ra97] (see Sect. 3.4) was further applied to obtain non-trivial upper bounds for the
number of independent sets in graphs [Ka01], the number of Hamiltonian cycles
in graphs [CK09] and hypergraphs of particular types [LL13], [LL14]. In contrast,
lower bounds are usually much harder to come by. A recent breakthrough by Keevash
[Ke14], [Ke15] establishes the existence and the asymptotic of the number of designs,
which can be interpreted as a result on a lower bound for multidimensional perma-
nents for some special (very symmetric) arrays with 0–1 entries, see also [Po15] for
lower bounds complementing [LL13] and [LL14].
Efficient algorithms for computing the d-dimensional permanent exactly in special
cases are discussed in [CP16].

4.5 Mixed Discriminants

4.5.1 Definition. Let $Q_1, \ldots, Q_n$ be $n \times n$ real symmetric matrices. Then
$$p(x_1, \ldots, x_n) = \det \left( x_1 Q_1 + \ldots + x_n Q_n \right)$$
is a homogeneous polynomial of degree $n$ and its mixed term
$$\frac{\partial^n}{\partial x_1 \cdots \partial x_n} p(x_1, \ldots, x_n) = D(Q_1, \ldots, Q_n)$$
is called the mixed discriminant of $Q_1, \ldots, Q_n$. Mixed discriminants were introduced by A.D. Alexandrov in his work on mixed volumes [Al38], see also [Le93].

We can express the mixed discriminant as a polynomial in the entries of the matrices $Q_1, \ldots, Q_n$ as follows: suppose that $Q_k = \left( q_{ij}^k \right)$ for $1 \le i, j \le n$ and $k = 1, \ldots, n$. Then
$$x_1 Q_1 + \ldots + x_n Q_n = \left( x_1 q_{ij}^1 + \ldots + x_n q_{ij}^n \right) \quad \text{for } 1 \le i, j \le n$$
and hence
$$\det \left( x_1 Q_1 + \ldots + x_n Q_n \right) = \sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) \prod_{i=1}^{n} \left( x_1 q_{i\sigma(i)}^1 + \ldots + x_n q_{i\sigma(i)}^n \right)$$
and, consequently,
$$D(Q_1, \ldots, Q_n) = \sum_{\sigma, \tau \in S_n} (\operatorname{sgn} \sigma) \prod_{i=1}^{n} q_{i\sigma(i)}^{\tau(i)}. \tag{4.5.1.1}$$
Thus the mixed discriminant $D(Q_1, \ldots, Q_n)$ can be interpreted as a version of the determinant of an $n \times n \times n$ array whose 2-dimensional slices are identified with the matrices $Q_1, \ldots, Q_n$, cf. Sect. 4.4.
As follows from (4.5.1.1), the mixed discriminant is linear in each argument. It is immediate from the definition that if $T$ is an $n \times n$ matrix then
$$D \left( T^* Q_1 T, \ldots, T^* Q_n T \right) = (\det T)^2 D(Q_1, \ldots, Q_n), \tag{4.5.1.2}$$
where $T^*$ is the transpose of $T$.

In general, we obtain the monomial expansion
$$\det \left( x_1 Q_1 + \ldots + x_n Q_n \right) = \sum_{\substack{m_1, \ldots, m_n \ge 0 \\ m_1 + \ldots + m_n = n}} \frac{x_1^{m_1} \cdots x_n^{m_n}}{m_1! \cdots m_n!} D \big( \underbrace{Q_1, \ldots, Q_1}_{m_1 \text{ times}}, \ldots, \underbrace{Q_n, \ldots, Q_n}_{m_n \text{ times}} \big). \tag{4.5.1.3}$$
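As a quick sanity check of (4.5.1.3), consider $n = 2$. Using $D(Q, Q) = 2 \det Q$ (the $n = 2$ case of $D(Q, \ldots, Q) = n! \det Q$) we get:

```latex
\det(x_1 Q_1 + x_2 Q_2)
  = \frac{x_1^2}{2!}\, D(Q_1, Q_1)
  + \frac{x_1 x_2}{1!\,1!}\, D(Q_1, Q_2)
  + \frac{x_2^2}{2!}\, D(Q_2, Q_2)
  = x_1^2 \det Q_1 + x_1 x_2\, D(Q_1, Q_2) + x_2^2 \det Q_2 .
```

Setting $x_1 = x_2 = 1$ yields the handy evaluation $D(Q_1, Q_2) = \det(Q_1 + Q_2) - \det Q_1 - \det Q_2$.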

Indeed, it follows from the definition that $D(Q, \ldots, Q) = n! \det Q$ for every $n \times n$ symmetric matrix $Q$. For $x = (x_1, \ldots, x_n)$, let $Q_x = x_1 Q_1 + \ldots + x_n Q_n$. Then
$$\det Q_x = \frac{1}{n!} D(Q_x, \ldots, Q_x)$$
and we obtain (4.5.1.3) since the mixed discriminant is linear in each argument and symmetric, that is, does not depend on the order of the matrices.

Mixed discriminants generalize permanents: given an $n \times n$ matrix $A = \left( a_{ij} \right)$, let us define $n \times n$ symmetric matrices $Q_1, \ldots, Q_n$ by $Q_i = \operatorname{diag}(a_{i1}, \ldots, a_{in})$, that is, $Q_i$ is the diagonal matrix having the $i$-th row of $A$ on the diagonal. Then
$$\det \left( x_1 Q_1 + \ldots + x_n Q_n \right) = \prod_{j=1}^{n} \left( \sum_{i=1}^{n} x_i a_{ij} \right)$$
and hence $D(Q_1, \ldots, Q_n) = \operatorname{per} A$, cf. Sect. 3.2.1.

Just as the permanent of a non-negative matrix is non-negative, the mixed discriminant of positive semidefinite matrices is non-negative.
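Since $\det(x_1 Q_1 + \ldots + x_n Q_n)$ is a polynomial of degree $n$, the mixed partial derivative defining $D$ can be evaluated exactly by inclusion–exclusion over subsets: $D(Q_1, \ldots, Q_n) = \sum_{S \subseteq \{1, \ldots, n\}} (-1)^{n - |S|} \det\bigl( \sum_{i \in S} Q_i \bigr)$. A Python sketch (our helper names; exponential time, so for small $n$ only) that also confirms $D(Q_1, \ldots, Q_n) = \operatorname{per} A$ for diagonal $Q_i$:

```python
from itertools import combinations, permutations
from math import isclose

import numpy as np

def mixed_discriminant(Qs):
    """D(Q_1,...,Q_n) by inclusion-exclusion:
    sum over nonempty S of (-1)^(n-|S|) det(sum of Q_i for i in S)."""
    n = len(Qs)
    total = 0.0
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            total += (-1) ** (n - r) * np.linalg.det(sum(Qs[i] for i in S))
    return total

def permanent(A):
    n = A.shape[0]
    return sum(np.prod([A[i, s[i]] for i in range(n)]) for s in permutations(range(n)))

rng = np.random.default_rng(3)
A = rng.uniform(0.0, 1.0, size=(3, 3))
Qs = [np.diag(A[i]) for i in range(3)]   # Q_i = diagonal of the i-th row of A
assert isclose(mixed_discriminant(Qs), permanent(A), rel_tol=1e-9)
```

The same routine also reproduces $D(Q, \ldots, Q) = n! \det Q$, since $\sum_r \binom{n}{r} (-1)^{n-r} r^n = n!$.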
4.5.2 Lemma. Suppose that $Q_1, \ldots, Q_n$ are positive semidefinite $n \times n$ matrices. Then
$$D(Q_1, \ldots, Q_n) \ge 0.$$
Moreover, if $Q_1, \ldots, Q_n$ are positive definite then $D(Q_1, \ldots, Q_n) > 0$.

Proof. Since $D(Q_1, \ldots, Q_n)$ is a continuous function of $Q_1, \ldots, Q_n$, without loss of generality we may assume that $Q_1, \ldots, Q_n$ are positive definite, in which case we prove that $D(Q_1, \ldots, Q_n) > 0$. We proceed by induction on $n$. The case of $n = 1$ is clear. Suppose that $n > 1$. Since $Q_1$ is positive definite, we can write $Q_1 = T^* T$ for some invertible $n \times n$ matrix $T$ and then by (4.5.1.2)
$$D(Q_1, \ldots, Q_n) = D \left( T^* T, Q_2, \ldots, Q_n \right) = (\det T)^2 D \left( I, \left( T^{-1} \right)^* Q_2 T^{-1}, \ldots, \left( T^{-1} \right)^* Q_n T^{-1} \right), \tag{4.5.2.1}$$
where $I$ is the $n \times n$ identity matrix. For $i = 1, \ldots, n$, let $u_i$ be the matrix having 1 in the $i$-th diagonal position and zeros elsewhere, so that $I = u_1 + \ldots + u_n$. Denoting
$$\widehat{Q}_k = \left( T^{-1} \right)^* Q_k T^{-1} \quad \text{for } k = 2, \ldots, n, \tag{4.5.2.2}$$
we conclude that $\widehat{Q}_2, \ldots, \widehat{Q}_n$ are positive definite matrices and by linearity we have
$$D \left( I, \widehat{Q}_2, \ldots, \widehat{Q}_n \right) = \sum_{i=1}^{n} D \left( u_i, \widehat{Q}_2, \ldots, \widehat{Q}_n \right). \tag{4.5.2.3}$$
On the other hand, as follows from the definition or from (4.5.1.1), we have
$$D \left( u_i, \widehat{Q}_2, \ldots, \widehat{Q}_n \right) = D \left( \widehat{Q}_{2i}, \ldots, \widehat{Q}_{ni} \right), \tag{4.5.2.4}$$
where $\widehat{Q}_{ki}$ is the $(n-1) \times (n-1)$ symmetric matrix obtained from $\widehat{Q}_k$ by crossing out the $i$-th row and the $i$-th column. Since the matrices $\widehat{Q}_{2i}, \ldots, \widehat{Q}_{ni}$ are positive definite, by the induction hypothesis we conclude that $D \left( \widehat{Q}_{2i}, \ldots, \widehat{Q}_{ni} \right) > 0$ and combining (4.5.2.1)–(4.5.2.4), we complete the proof. $\square$
4.5.3 Combinatorial applications of mixed discriminants. For a vector $u = (u_1, \ldots, u_n)$, we denote by $u \otimes u$ the $n \times n$ matrix whose $(i, j)$-th entry is $u_i u_j$. Clearly, $u \otimes u$ is positive semidefinite. Various combinatorial applications of mixed discriminants are based on the following formula:
$$D \left( u_1 \otimes u_1, \ldots, u_n \otimes u_n \right) = \left( \det \left[ u_1, \ldots, u_n \right] \right)^2, \tag{4.5.3.1}$$
where $u_1, \ldots, u_n \in \mathbb{R}^n$ are vectors and $[u_1, \ldots, u_n]$ is the $n \times n$ matrix with columns $u_1, \ldots, u_n$. By continuity, it suffices to check (4.5.3.1) when $u_1, \ldots, u_n$ is a basis of $\mathbb{R}^n$, and then it follows by (4.5.1.2) from the obvious special case when $u_1, \ldots, u_n$ is the standard orthonormal basis of $\mathbb{R}^n$.
The following application is from Chap. V of [BR97]. Let $G$ be a connected (undirected, with no loops or multiple edges) graph with $n$ vertices and $m$ edges, and suppose that the edges are colored with $n - 1$ different colors. Let us direct the edges arbitrarily and consider the $n \times m$ incidence matrix $A = \left( a_{ij} \right)$ of $G$, where
$$a_{ij} = \begin{cases} 1 & \text{if vertex } i \text{ is the beginning of edge } j, \\ -1 & \text{if vertex } i \text{ is the end of edge } j, \\ 0 & \text{otherwise.} \end{cases}$$
Let us remove an arbitrary row of $A$ and let $u_1, \ldots, u_m$ be the columns of the resulting matrix, interpreted as vectors from $\mathbb{R}^{n-1}$. For $k = 1, \ldots, n-1$, let $J_k \subset \{1, \ldots, m\}$ be the set of indices of the edges colored with the $k$-th color and let
$$Q_k = \sum_{j \in J_k} u_j \otimes u_j \quad \text{for } k = 1, \ldots, n-1.$$
Then $Q_1, \ldots, Q_{n-1}$ are positive semidefinite matrices and $D(Q_1, \ldots, Q_{n-1})$ is the number of spanning trees in $G$ having exactly one edge of each color. Indeed, by the linearity of the mixed discriminant and (4.5.3.1), we have
$$D(Q_1, \ldots, Q_{n-1}) = \sum_{j_1 \in J_1, \ldots, j_{n-1} \in J_{n-1}} \left( \det \left[ u_{j_1}, \ldots, u_{j_{n-1}} \right] \right)^2.$$
As is well known (see, for example, Chap. 4 of [E+84]), we have
$$\det \left[ u_{j_1}, \ldots, u_{j_{n-1}} \right] = \begin{cases} \pm 1 & \text{if the edges } j_1, \ldots, j_{n-1} \text{ form a spanning tree in } G, \\ 0 & \text{otherwise.} \end{cases}$$
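For a concrete check of this tree-counting formula (a toy example; the vertex and edge numbering is ours), take the triangle on vertices $\{1, 2, 3\}$ with edges $(1,2)$ and $(1,3)$ colored 1 and edge $(2,3)$ colored 2. Here $n - 1 = 2$, so $D(Q_1, Q_2) = \det(Q_1 + Q_2) - \det Q_1 - \det Q_2$ by (4.5.1.3), and exactly the two spanning trees $\{(1,2), (2,3)\}$ and $\{(1,3), (2,3)\}$ use one edge of each color:

```python
from math import isclose

import numpy as np

# Directed incidence vectors of the triangle after deleting the row of vertex 3:
# edge (1,2) -> (1, -1), edge (1,3) -> (1, 0), edge (2,3) -> (0, 1)
u = [np.array([1.0, -1.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

Q1 = np.outer(u[0], u[0]) + np.outer(u[1], u[1])   # color 1: edges (1,2), (1,3)
Q2 = np.outer(u[2], u[2])                          # color 2: edge (2,3)

# mixed discriminant for n - 1 = 2: D(Q1, Q2) = det(Q1 + Q2) - det Q1 - det Q2
D = np.linalg.det(Q1 + Q2) - np.linalg.det(Q1) - np.linalg.det(Q2)
print(D)  # 2 spanning trees with one edge of each color
assert isclose(D, 2.0, rel_tol=1e-9)
```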

4.5.4 Doubly stochastic n-tuples. Pursuing an analogy with the permanent, we say that the $n$-tuple $(Q_1, \ldots, Q_n)$ of $n \times n$ positive semidefinite matrices is doubly stochastic if
$$\operatorname{tr} Q_1 = \ldots = \operatorname{tr} Q_n = 1 \quad \text{and} \quad Q_1 + \ldots + Q_n = I,$$
the identity matrix. Indeed, if $Q_1, \ldots, Q_n$ are diagonal matrices then $(Q_1, \ldots, Q_n)$ is doubly stochastic if and only if the $n \times n$ matrix $A$ whose $i$-th row is the diagonal of $Q_i$ is doubly stochastic.

The following result was conjectured by Bapat [Ba89] and proved by Gurvits [Gu06], [Gu08].

4.5.5 Theorem. Let $(Q_1, \ldots, Q_n)$ be a doubly stochastic $n$-tuple. Then
$$D(Q_1, \ldots, Q_n) \ge \frac{n!}{n^n}.$$
The proof follows the approach of Sect. 3.3, which in turn follows [Gu08].

4.5.6 Lemma. Let $Q_1, \ldots, Q_n$ be $n \times n$ positive definite matrices. Then the polynomial
$$p(x_1, \ldots, x_n) = \det \left( x_1 Q_1 + \ldots + x_n Q_n \right)$$
is H-stable and the coefficient of every monomial of degree $n$ is positive.

Proof. Let us choose $z_1, \ldots, z_n \in \mathbb{C}$ such that $\Im\, z_k > 0$ for $k = 1, \ldots, n$ and suppose that $p(z_1, \ldots, z_n) = 0$. Then the matrix
$$Q = \sum_{k=1}^{n} z_k Q_k$$
is not invertible and hence there is a vector $y \in \mathbb{C}^n \setminus \{0\}$ such that $Qy = 0$. We consider the standard inner product
$$\langle x, y \rangle = \sum_{k=1}^{n} x_k \overline{y_k} \quad \text{for } x = (x_1, \ldots, x_n) \text{ and } y = (y_1, \ldots, y_n)$$
in $\mathbb{C}^n$. Thus we have
$$0 = \langle Qy, y \rangle = \sum_{k=1}^{n} z_k \langle Q_k y, y \rangle. \tag{4.5.6.1}$$
However, since $Q_1, \ldots, Q_n$ are positive definite matrices, the numbers $\langle Q_k y, y \rangle$ are positive real, which contradicts (4.5.6.1) since the imaginary part of each $z_k$ is positive.

Finally, by (4.5.1.3) and Lemma 4.5.2, the coefficient of $x_1^{m_1} \cdots x_n^{m_n}$ in $p$, where $m_1 + \ldots + m_n = n$, is positive. $\square$
Next, we discuss the capacity of $p$, see also Lemma 3.3.3.

4.5.7 Lemma. Let $(Q_1, \ldots, Q_n)$ be a doubly stochastic $n$-tuple and let
$$p(x_1, \ldots, x_n) = \det \left( x_1 Q_1 + \ldots + x_n Q_n \right).$$
Then
$$\inf_{x_1, \ldots, x_n > 0} \frac{p(x_1, \ldots, x_n)}{x_1 \cdots x_n} = 1.$$

Proof. Let us define a function $f: \mathbb{R}^n \longrightarrow \mathbb{R}$ by
$$f(t_1, \ldots, t_n) = \ln \det \left( e^{t_1} Q_1 + \ldots + e^{t_n} Q_n \right)$$
and let $H \subset \mathbb{R}^n$ be the hyperplane $t_1 + \ldots + t_n = 0$. It suffices to show that the minimum of $f$ on $H$ is attained at $t_1 = \ldots = t_n = 0$. By Lemma 4.5.6 and Sect. 2.1.1.3, the function $f$ is convex, so it suffices to verify that the gradient of $f$ at $t = 0$ is proportional to the vector $(1, \ldots, 1)$. Since
$$\nabla (\ln \det X) = \left( X^* \right)^{-1},$$
denoting
$$S(t) = \sum_{k=1}^{n} e^{t_k} Q_k,$$
we obtain
$$\frac{\partial f}{\partial t_k} \Big|_{t_1 = \ldots = t_n = 0} = e^{t_k} \operatorname{tr} \left( Q_k S^{-1}(t) \right) \Big|_{t_1 = \ldots = t_n = 0} = \operatorname{tr} Q_k = 1$$
and the proof follows. $\square$

4.5.8 Proof of Theorem 4.5.5. The proof follows by Lemmas 4.5.6, 4.5.7 and Corollary 2.4.6. $\square$

There is a notion of scaling for $n$-tuples of positive semidefinite matrices. Just as an $n \times n$ matrix can be scaled to a doubly stochastic matrix, see Theorem 3.5.2, an $n$-tuple of positive definite matrices can be scaled to a doubly stochastic $n$-tuple. The following result was obtained by Gurvits and Samorodnitsky [GS02].

4.5.9 Theorem. Let $Q_1, \ldots, Q_n$ be $n \times n$ positive definite matrices. Then there is a doubly stochastic $n$-tuple $(B_1, \ldots, B_n)$, an invertible $n \times n$ matrix $T$ and positive reals $\tau_1, \ldots, \tau_n$ such that
$$Q_k = \tau_k T^* B_k T \quad \text{for } k = 1, \ldots, n.$$

Proof. As in the proof of Lemma 4.5.7, we consider the function $f: \mathbb{R}^n \longrightarrow \mathbb{R}$ defined by
$$f(t_1, \ldots, t_n) = \ln \det \left( e^{t_1} Q_1 + \ldots + e^{t_n} Q_n \right)$$
and the hyperplane $H$ defined by the equation $t_1 + \ldots + t_n = 0$. It is not hard to see that $f$ attains its minimum on $H$ at some point $(x_1, \ldots, x_n)$ where the gradient of $f$ is proportional to $(1, \ldots, 1)$. As in the proof of Lemma 4.5.7, we obtain that for some real $\alpha$ and
$$S = e^{x_1} Q_1 + \ldots + e^{x_n} Q_n,$$
we have
$$\frac{\partial f}{\partial t_k} \Big|_{t_1 = x_1, \ldots, t_n = x_n} = e^{x_k} \operatorname{tr} \left( Q_k S^{-1} \right) = \alpha \quad \text{for } k = 1, \ldots, n. \tag{4.5.9.1}$$
Since
$$n \alpha = \sum_{k=1}^{n} e^{x_k} \operatorname{tr} \left( Q_k S^{-1} \right) = \operatorname{tr} \left( \left( \sum_{k=1}^{n} e^{x_k} Q_k \right) S^{-1} \right) = n,$$
we conclude that
$$\alpha = 1. \tag{4.5.9.2}$$
Since $S$ is positive definite, we can write it as $S = T^* T$ for an invertible $n \times n$ matrix $T$. We define
$$B_k = e^{x_k} \left( T^{-1} \right)^* Q_k T^{-1} \quad \text{and} \quad \tau_k = e^{-x_k} \quad \text{for } k = 1, \ldots, n.$$
Clearly, $B_1, \ldots, B_n$ are positive definite matrices,
$$Q_k = \tau_k T^* B_k T \quad \text{for } k = 1, \ldots, n$$
and
$$\sum_{k=1}^{n} B_k = \left( T^{-1} \right)^* \left( \sum_{k=1}^{n} e^{x_k} Q_k \right) T^{-1} = \left( T^{-1} \right)^* S T^{-1} = I.$$
By (4.5.9.1) and (4.5.9.2) we get
$$\operatorname{tr} B_k = e^{x_k} \operatorname{tr} \left( \left( T^{-1} \right)^* Q_k T^{-1} \right) = e^{x_k} \operatorname{tr} \left( Q_k T^{-1} \left( T^{-1} \right)^* \right) = e^{x_k} \operatorname{tr} \left( Q_k S^{-1} \right) = 1,$$
which completes the proof. $\square$
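The proof above is effectively an algorithm: minimize the convex function $f$ on the hyperplane $H$, then read off $T$ and the matrices $B_k$. The following numerical sketch uses projected gradient descent (the step size, iteration count and random test matrices are our choices, not from [GS02]; a minimal illustration rather than a robust solver). Note that $\sum_k B_k = I$ holds by construction, while $\operatorname{tr} B_k \to 1$ only as the minimization converges.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
# random positive definite, reasonably well-conditioned test matrices
Qs = []
for _ in range(n):
    M = rng.normal(size=(n, n))
    Qs.append(M @ M.T + np.eye(n))

# minimize f(t) = ln det(sum_k e^{t_k} Q_k) over the hyperplane sum t_k = 0
t = np.zeros(n)
for _ in range(20000):
    S = sum(np.exp(t[k]) * Qs[k] for k in range(n))
    Sinv = np.linalg.inv(S)
    grad = np.array([np.exp(t[k]) * np.trace(Qs[k] @ Sinv) for k in range(n)])
    grad -= grad.mean()          # project the gradient onto the hyperplane
    t -= 0.1 * grad

S = sum(np.exp(t[k]) * Qs[k] for k in range(n))
T = np.linalg.cholesky(S).T      # S = T^* T with T upper triangular
Tinv = np.linalg.inv(T)
Bs = [np.exp(t[k]) * Tinv.T @ Qs[k] @ Tinv for k in range(n)]

print([np.trace(B) for B in Bs])         # close to (1, ..., 1)
print(np.round(sum(Bs), 6))              # the identity matrix
```

Since $\sum_k e^{t_k} \operatorname{tr}(Q_k S^{-1}) = \operatorname{tr}(S S^{-1}) = n$ identically, the mean of the gradient components is always 1; the iteration only needs to equalize them.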

In [GS02], Gurvits and Samorodnitsky also discuss the scaling of $n$-tuples of positive semidefinite matrices.

4.6 A Version of Bregman–Minc Inequalities for Mixed Discriminants

Theorem 4.5.5 is an extension of the van der Waerden inequality from permanents of doubly stochastic matrices, see Sect. 3.3, to mixed discriminants of doubly stochastic $n$-tuples of matrices. One can ask if there is a version of the Bregman–Minc inequality for mixed discriminants, see Sect. 3.4. A weak version of such an inequality is suggested in [B16a]. For what follows, it is convenient to associate with an $n \times n$ matrix $Q$ the quadratic form $q: \mathbb{R}^n \longrightarrow \mathbb{R}$,
$$q(x) = \langle Qx, x \rangle \quad \text{for } x \in \mathbb{R}^n,$$
where $\langle \cdot, \cdot \rangle$ is the standard inner product in $\mathbb{R}^n$. We define the eigenvalues, trace and determinant of $q$ as those of $Q$. Similarly, we define the mixed discriminant $D(q_1, \ldots, q_n)$ of quadratic forms $q_1, \ldots, q_n: \mathbb{R}^n \longrightarrow \mathbb{R}$ as $D(Q_1, \ldots, Q_n)$, where $Q_i$ is the matrix of $q_i$. We observe that if we choose a different orthonormal basis in $\mathbb{R}^n$, the matrices $Q_i$ change to $Q_i := U^* Q_i U$ for some orthogonal $n \times n$ matrix $U$, so that the eigenvalues of $Q_i$ and the mixed discriminant $D(Q_1, \ldots, Q_n)$ do not change.

In particular, $q_1, \ldots, q_n: \mathbb{R}^n \longrightarrow \mathbb{R}$ is a doubly stochastic $n$-tuple of quadratic forms if the forms $q_1, \ldots, q_n$ are positive semidefinite, $\operatorname{tr} q_i = 1$ for $i = 1, \ldots, n$ and
$$\sum_{i=1}^{n} q_i(x) = \|x\|^2,$$
where $\| \cdot \|$ is the standard Euclidean norm in $\mathbb{R}^n$.

4.6.1 Definition. Given a real $\alpha \ge 1$, we say that an $n \times n$ positive definite matrix $Q$ is $\alpha$-conditioned if
$$\lambda_{\max}(Q) \le \alpha\, \lambda_{\min}(Q),$$
where $\lambda_{\max}$ and $\lambda_{\min}$ are, respectively, the largest and the smallest eigenvalues of $Q$. Equivalently, $Q$ is $\alpha$-conditioned if for the corresponding quadratic form we have
$$q(x) \le \alpha\, q(y) \quad \text{for all } x, y \in \mathbb{R}^n \text{ such that } \|x\| = \|y\| = 1. \tag{4.6.1.1}$$
An $n$-tuple $(Q_1, \ldots, Q_n)$ of $n \times n$ positive definite matrices is $\alpha$-conditioned if each matrix $Q_k$ is $\alpha$-conditioned for $k = 1, \ldots, n$ and
$$q_i(x) \le \alpha\, q_j(x) \quad \text{for all } 1 \le i, j \le n \text{ and all } x \in \mathbb{R}^n,$$
where $q_1, \ldots, q_n$ are the corresponding quadratic forms.

Definition 4.6.1 extends Definition 3.5.5 from $\alpha$-conditioned positive matrices to $n$-tuples of $n \times n$ positive definite matrices. The following result is obtained in [B16a].

4.6.2 Theorem. Let $(Q_1, \ldots, Q_n)$ be an $\alpha$-conditioned doubly stochastic $n$-tuple of positive definite $n \times n$ matrices. Then
$$D(Q_1, \ldots, Q_n) \le n^{\alpha^2} e^{-(n-1)}.$$
Combining Theorems 4.5.5 and 4.6.2, we conclude that for a fixed $\alpha \ge 1$, the mixed discriminant of an $\alpha$-conditioned doubly stochastic $n$-tuple of matrices varies within a polynomial in $n$ factor of $e^{-n}$, just as in the case of permanents of doubly stochastic matrices, cf. Sect. 3.4.6, hafnians of doubly stochastic symmetric matrices (Theorem 4.2.2), and multidimensional permanents of $d$-stochastic tensors (Theorem 4.4.8). It would be interesting to find out whether in Theorem 4.6.2 one can just require that the eigenvalues of the matrices $Q_1, \ldots, Q_n$ do not exceed $\alpha/n$ (that would have been a true extension of the Bregman–Minc inequality to the mixed discriminant). By and large, the proof follows the same scheme as the proofs of Theorems 4.2.2 and 4.4.8. It proceeds by combining induction and scaling.
To proceed with the induction, we need a way to pass from an $n$-tuple of $n \times n$ matrices to an $(n-1)$-tuple of $(n-1) \times (n-1)$ matrices. We do so by considering the restriction of the quadratic forms onto a subspace. Let $q_1, \ldots, q_n: \mathbb{R}^n \longrightarrow \mathbb{R}$ be quadratic forms and let $L \subset \mathbb{R}^n$ be a subspace, $\dim L = m$. Then the restrictions $q_i|_L: L \longrightarrow \mathbb{R}$ are quadratic forms on $L$. Since the subspace $L$ inherits the Euclidean structure from $\mathbb{R}^n$, we can define the mixed discriminant $D \left( q_1|_L, \ldots, q_m|_L \right)$.

First, we obtain a version of the recursive formulas (3.1.1.2), (4.1.1.3) and (4.4.1.1).

4.6.3 Lemma. Let $q_1, \ldots, q_n: \mathbb{R}^n \longrightarrow \mathbb{R}$ be quadratic forms and let
$$q_n(x) = \sum_{i=1}^{n} \lambda_i \langle u_i, x \rangle^2,$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues and $u_1, \ldots, u_n$ are the corresponding unit eigenvectors of $q_n$. Then
$$D(q_1, \ldots, q_n) = \sum_{i=1}^{n} \lambda_i D \left( q_1|_{u_i^\perp}, \ldots, q_{n-1}|_{u_i^\perp} \right),$$
where $u_i^\perp \subset \mathbb{R}^n$ is the orthogonal complement of $u_i$.

Proof. Since the mixed discriminant is linear in each argument, it suffices to prove that
$$D \left( q_1, \ldots, q_{n-1}, \langle u, x \rangle^2 \right) = D \left( q_1|_{u^\perp}, \ldots, q_{n-1}|_{u^\perp} \right) \tag{4.6.3.1}$$
for any unit vector $u \in \mathbb{R}^n$. Let us choose an orthonormal basis in $\mathbb{R}^n$ containing $u$ as the last vector, and let $Q_1, \ldots, Q_{n-1}$ be the matrices of $q_1, \ldots, q_{n-1}$ in this basis. Then the matrices $\widehat{Q}_1, \ldots, \widehat{Q}_{n-1}$ of the restrictions $q_1|_{u^\perp}, \ldots, q_{n-1}|_{u^\perp}$ are the $(n-1) \times (n-1)$ upper left submatrices of $Q_1, \ldots, Q_{n-1}$, while the matrix $E_n$ of the form $\langle u, x \rangle^2$ is the matrix whose $(n,n)$-th entry is 1 and all other entries are 0. It then follows that
$$\frac{\partial}{\partial t_n} \det \left( t_1 Q_1 + \ldots + t_{n-1} Q_{n-1} + t_n E_n \right) = \det \left( t_1 \widehat{Q}_1 + \ldots + t_{n-1} \widehat{Q}_{n-1} \right)$$
and (4.6.3.1) follows by Definition 4.5.1. $\square$

Next, we show that if we scale an α-conditioned n-tuple of positive definite matrices to a doubly stochastic n-tuple, we get an α²-conditioned n-tuple of matrices (cf. Lemmas 3.5.6, 4.2.5 and 4.4.7). As we will have to deal with restrictions of quadratic forms, we prove the statement in more generality.
138 4 Hafnians and Multidimensional Permanents

4.6.4 Lemma. Let q_1, ..., q_n : R^n −→ R be an α-conditioned n-tuple of positive definite quadratic forms, let L ⊂ R^n be an m-dimensional subspace, let T : L −→ R^n be a linear transformation such that ker T = {0}, let τ_1, ..., τ_m > 0 be reals, and let us define quadratic forms p_1, ..., p_m : L −→ R by

p_i(x) = τ_i q_i(T x) for x ∈ L and i = 1, ..., m.

Suppose that the m-tuple (p_1, ..., p_m) is doubly stochastic. Then the m-tuple (p_1, ..., p_m) is α²-conditioned.

Proof. Let us define q : R^n −→ R by

q(x) = Σ_{i=1}^m τ_i q_i(x) for x ∈ R^n.

Then by (4.6.1.1) the form q is α-conditioned, so

λ_max(q) ≤ α λ_min(q),

where λ_max(q) and λ_min(q) are, respectively, the largest and the smallest eigenvalues of q. For all x, y ∈ L such that ‖x‖ = ‖y‖ = 1, we have

1 = q(T x) ≥ λ_min(q) ‖T x‖² and 1 = q(T y) ≤ λ_max(q) ‖T y‖²,

from which it follows that

‖T x‖² ≤ α ‖T y‖² for all x, y ∈ L such that ‖x‖ = ‖y‖ = 1.   (4.6.4.1)

Using that each quadratic form q_i is α-conditioned, we deduce from (4.6.4.1) that for all x, y ∈ L such that ‖x‖ = ‖y‖ = 1, we have

p_i(x) = τ_i q_i(T x) ≤ τ_i λ_max(q_i) ‖T x‖² ≤ α τ_i λ_max(q_i) ‖T y‖² ≤ α² τ_i λ_min(q_i) ‖T y‖² ≤ α² τ_i q_i(T y) = α² p_i(y)

and hence each quadratic form p_i is α²-conditioned.


Let us now define quadratic forms r_i : L −→ R by r_i(x) = q_i(T x) for x ∈ L and i = 1, ..., m. Since the n-tuple (q_1, ..., q_n) is α-conditioned, we have

r_i(x) ≤ α r_j(x) for all x ∈ L and all i, j.

Therefore,

tr r_i ≤ α tr r_j for all i, j.
4.6 A Version of Bregman–Minc Inequalities for Mixed Discriminants 139

Since

1 = tr p_i = τ_i tr r_i for i = 1, ..., m,

we conclude that

τ_i ≤ α τ_j for all 1 ≤ i, j ≤ m.   (4.6.4.2)

Since the n-tuple (q_1, ..., q_n) is α-conditioned, we deduce from (4.6.4.2) that for all x ∈ L we have

p_i(x) = τ_i q_i(T x) ≤ α τ_j q_i(T x) ≤ α² τ_j q_j(T x) = α² p_j(x),

and the m-tuple (p_1, ..., p_m) is α²-conditioned. □

The last ingredient we need to prove Theorem 4.6.2 is a one-sided version of the inequalities of Lemmas 4.2.6 and 4.4.10.

4.6.5 Lemma. Let Q_1, ..., Q_n be n × n positive definite matrices such that

Σ_{i=1}^n tr Q_i = n.

Let (B_1, ..., B_n) be the doubly stochastic n-tuple constructed in Theorem 4.5.9, so that

Q_k = τ_k T* B_k T for k = 1, ..., n.

Then

D(B_1, ..., B_n) ≥ D(Q_1, ..., Q_n).

Proof. Let Q = Q_1 + ... + Q_n and let λ_1, ..., λ_n be the eigenvalues of Q. Then

det Q = Π_{i=1}^n λ_i ≤ ( (1/n) Σ_{i=1}^n λ_i )^n = ( (1/n) tr Q )^n = 1.

We have

D(Q_1, ..., Q_n) = ( Π_{k=1}^n τ_k ) (det T)² D(B_1, ..., B_n).

In the notation of Theorem 4.5.9,

Π_{k=1}^n τ_k = exp{ − Σ_{k=1}^n x_k } = 1,

where x = (x_1, ..., x_n) is the minimum point of the function

f(t_1, ..., t_n) = ln det( e^{t_1} Q_1 + ... + e^{t_n} Q_n )

on the hyperplane t_1 + ... + t_n = 0. In addition,

(det T)² = det( e^{x_1} Q_1 + ... + e^{x_n} Q_n ) = exp{ f(x_1, ..., x_n) } ≤ exp{ f(0, ..., 0) } = det(Q_1 + ... + Q_n) = det Q ≤ 1,

and the proof follows. □

Now we are ready to prove Theorem 4.6.2.

4.6.6 Proof of Theorem 4.6.2. We prove a more general statement:

Let q_1, ..., q_n : R^n −→ R be an α-conditioned n-tuple of positive definite quadratic forms, let L ⊂ R^n be an m-dimensional subspace, let T : L −→ R^n be a linear transformation such that ker T = {0} and let τ_1, ..., τ_m > 0 be reals. Let us define quadratic forms p_i : L −→ R by

p_i(x) = τ_i q_i(T x) for x ∈ L and i = 1, ..., m.

Suppose that (p_1, ..., p_m) is a doubly stochastic m-tuple. Then

D(p_1, ..., p_m) ≤ exp{ −(m−1) + α² Σ_{k=2}^m 1/k }.   (4.6.6.1)

We obtain Theorem 4.6.2 if m = n, T = I is the identity map and τ_i = 1 for i = 1, ..., n.
We proceed to prove the above statement by induction on m.
If m = 1 then D(p_1) = det p_1 = 1 and the statement clearly holds.
Suppose that m ≥ 2. Let

p_m(x) = Σ_{j=1}^m λ_j ⟨u_j, x⟩²

be the spectral decomposition of p_m, where the λ_j are the eigenvalues and the u_j are the corresponding unit eigenvectors of p_m. Since tr p_m = 1, we have

Σ_{j=1}^m λ_j = 1.   (4.6.6.2)

Let L_j = u_j^⊥ for j = 1, ..., m, so L_j ⊂ L and dim L_j = m − 1. Let p̃_{ij} = p_i|_{L_j} be the restriction of p_i onto L_j, so the p̃_{ij} : L_j −→ R are positive definite quadratic forms. By Lemma 4.6.3,

D(p_1, ..., p_m) = Σ_{j=1}^m λ_j D( p̃_{1j}, ..., p̃_{(m−1)j} ).   (4.6.6.3)

We note that

tr p̃_{mj} = tr p_m − λ_j = 1 − λ_j ≥ 1 − α²/m,

since by Lemma 4.6.4 the quadratic form p_m is α²-conditioned, so that λ_j ≤ α²/m for every j.
Using that p̃_{1j}(x) + ... + p̃_{mj}(x) = ‖x‖² for all x ∈ L_j, we get

σ_j = Σ_{i=1}^{m−1} tr p̃_{ij} = (m − 1) − tr p̃_{mj} ≤ (m − 2) + α²/m for j = 1, ..., m.   (4.6.6.4)

We define quadratic forms r_{ij} : L_j −→ R by

r_{ij} = ((m − 1)/σ_j) p̃_{ij} for i = 1, ..., m − 1 and j = 1, ..., m.

In particular,

Σ_{i=1}^{m−1} tr r_{ij} = m − 1 for j = 1, ..., m.   (4.6.6.5)

From (4.6.6.4), we get

D( p̃_{1j}, ..., p̃_{(m−1)j} ) = ( σ_j/(m−1) )^{m−1} D( r_{1j}, ..., r_{(m−1)j} )
  ≤ ( 1 − 1/(m−1) + α²/(m(m−1)) )^{m−1} D( r_{1j}, ..., r_{(m−1)j} )   (4.6.6.6)
  ≤ exp{ −1 + α²/m } D( r_{1j}, ..., r_{(m−1)j} ) for j = 1, ..., m.

Let ( w_{1j}, ..., w_{(m−1)j} ) be the doubly stochastic (m−1)-tuple of quadratic forms, w_{ij} : L_j −→ R, obtained from r_{1j}, ..., r_{(m−1)j} by scaling as in Theorem 4.5.9. From (4.6.6.5) and Lemma 4.6.5, we have

D( w_{1j}, ..., w_{(m−1)j} ) ≥ D( r_{1j}, ..., r_{(m−1)j} )

and hence from (4.6.6.6), we get

D( p̃_{1j}, ..., p̃_{(m−1)j} ) ≤ exp{ −1 + α²/m } D( w_{1j}, ..., w_{(m−1)j} ).   (4.6.6.7)

Now we would like to apply the induction hypothesis to the quadratic forms

w_{1j}, ..., w_{(m−1)j} : L_j −→ R.

Since the (m−1)-tuple ( w_{1j}, ..., w_{(m−1)j} ) is obtained by scaling from the (m−1)-tuple r_{1j}, ..., r_{(m−1)j}, there is an invertible linear transformation S_j : L_j −→ L_j and positive numbers μ_{1j}, ..., μ_{(m−1)j} such that

w_{ij}(x) = μ_{ij} r_{ij}(S_j x) = ((m−1)μ_{ij}/σ_j) p̃_{ij}(S_j x) = ((m−1)μ_{ij}/σ_j) p_i(S_j x) = ((m−1)μ_{ij}τ_i/σ_j) q_i(T S_j x) for all x ∈ L_j

and i = 1, ..., m − 1. For each j = 1, ..., m, we have a linear transformation T S_j : L_j −→ R^n with ker T S_j = {0} and hence by the induction hypothesis

D( w_{1j}, ..., w_{(m−1)j} ) ≤ exp{ −(m−2) + α² Σ_{k=2}^{m−1} 1/k }   (4.6.6.8)

for j = 1, ..., m.
Combining (4.6.6.2), (4.6.6.3), (4.6.6.7) and (4.6.6.8), we obtain (4.6.6.1), which completes the proof. □

4.6.7 Computing mixed discriminants. If the n-tuple (Q_1, ..., Q_n) is doubly stochastic then by Lemma 4.5.6 we have D(Q_1, ..., Q_n) ≤ det(Q_1 + ... + Q_n) = 1. This, together with Theorem 4.5.5, the scaling algorithm of Theorem 4.5.9 and the formula

D( λ_1 T* B_1 T, ..., λ_n T* B_n T ) = ( Π_{k=1}^n λ_k ) (det T)² D(B_1, ..., B_n)   (4.6.7.1)

results in a deterministic polynomial time algorithm to approximate the mixed discriminant D(Q_1, ..., Q_n) of positive semidefinite matrices within a multiplicative factor of n!/n^n ≈ e^{−n} [GS02]. A better approximation factor can be achieved by a randomized polynomial time algorithm [Ba99], extending the permanent approximation algorithm of Sect. 3.9.1. Namely, given n × n positive semidefinite matrices Q_1, ..., Q_n, we compute n × n matrices T_1, ..., T_n such that Q_k = T_k* T_k for k = 1, ..., n. Let u_1, ..., u_n be vectors sampled independently at random from the standard Gaussian distribution in R^n and let [T_1 u_1, ..., T_n u_n] be the n × n matrix with columns T_1 u_1, ..., T_n u_n. Using formula (4.5.3.1) it is not hard to show that

D(Q_1, ..., Q_n) = E ( det [T_1 u_1, ..., T_n u_n] )²,


and that with probability approaching 1 as n grows we have

( det [T_1 u_1, ..., T_n u_n] )² ≥ (0.28)^n D(Q_1, ..., Q_n).

If the vectors u_1, ..., u_n are sampled independently at random from the standard Gaussian distribution in C^n then

D(Q_1, ..., Q_n) = E | det [T_1 u_1, ..., T_n u_n] |²,

and with probability approaching 1 as n grows we have

| det [T_1 u_1, ..., T_n u_n] |² ≥ (0.56)^n D(Q_1, ..., Q_n).

Finally, assume that u_1, ..., u_n are sampled from the standard Gaussian distribution in the quaternionic space H^n and let [T_1 u_1, ..., T_n u_n]_C be the 2n × 2n complex matrix constructed from the n × n quaternionic matrix [T_1 u_1, ..., T_n u_n] as in Sect. 3.9.1. Then det [T_1 u_1, ..., T_n u_n]_C is a non-negative real number,

E det [T_1 u_1, ..., T_n u_n]_C = D(Q_1, ..., Q_n)

and with probability approaching 1 as n grows, we have

det [T_1 u_1, ..., T_n u_n]_C ≥ (0.76)^n D(Q_1, ..., Q_n).
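The real Gaussian estimator above is easy to try out numerically. The following sketch is our own illustration (not the algorithm of [Ba99] itself, and the helper names are ours): it computes D(Q_1, ..., Q_n) for small matrices exactly via the polarization identity D = Σ_{S⊆{1,...,n}} (−1)^{n−|S|} det(Σ_{i∈S} Q_i), which extracts the coefficient of t_1⋯t_n from det(t_1 Q_1 + ... + t_n Q_n), and compares it with the sample mean of (det[T_1 u_1, ..., T_n u_n])².

```python
import itertools
import numpy as np

def mixed_discriminant(Qs):
    # Exact mixed discriminant by polarization (inclusion-exclusion over
    # subsets): D(Q_1,...,Q_n) is the coefficient of t_1...t_n in
    # det(t_1 Q_1 + ... + t_n Q_n).  Exponential in n; for checking only.
    n = len(Qs)
    total = 0.0
    for r in range(1, n + 1):
        for S in itertools.combinations(range(n), r):
            total += (-1) ** (n - r) * np.linalg.det(sum(Qs[i] for i in S))
    return total

def mc_mixed_discriminant(Qs, samples, rng):
    # Randomized estimator: D = E (det [T_1 u_1, ..., T_n u_n])^2 with
    # Q_k = T_k^* T_k and u_k independent standard Gaussians in R^n.
    n = len(Qs)
    Ts = [np.linalg.cholesky(Q).T for Q in Qs]  # Q = T^T T
    vals = []
    for _ in range(samples):
        M = np.column_stack([Ts[k] @ rng.standard_normal(n) for k in range(n)])
        vals.append(np.linalg.det(M) ** 2)
    return float(np.mean(vals))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3
    Qs = []
    for _ in range(n):
        B = rng.standard_normal((n, n))
        Qs.append(B @ B.T + np.eye(n))  # random positive definite matrices
    print(mixed_discriminant(Qs), mc_mixed_discriminant(Qs, 50000, rng))
```

Each sample of the estimator costs a single determinant, which is the whole point; the exact routine is exponential in n and serves only as a cross-check.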

Assume now that the n-tuple (Q_1, ..., Q_n) is α-conditioned. As follows from Theorem 4.6.2 and Lemma 4.6.4, the scaling algorithm of Theorem 4.5.9, together with formula (4.6.7.1) and Theorem 4.5.5, approximates the mixed discriminant D(Q_1, ..., Q_n) within a factor of n^{O(α²)}, which is polynomial in n provided α is fixed in advance, cf. also Sects. 3.5, 4.2 and 4.4.
In their proof of the Kadison–Singer Conjecture [M+15], Marcus, Spielman and Srivastava introduce and study the mixed characteristic polynomial

p_{Q_1,...,Q_n}(x) = ( Π_{i=1}^n ( 1 − ∂/∂z_i ) ) det( x I + Σ_{i=1}^n z_i Q_i ) |_{z_1 = ... = z_n = 0},

where Q_1, ..., Q_n are real symmetric or complex Hermitian m × m matrices. If Q_1, ..., Q_n are positive semidefinite then the roots of the mixed characteristic polynomial are real and necessarily non-negative. If m = n then the constant term of p_{Q_1,...,Q_n}, up to a sign, is equal to the mixed discriminant D(Q_1, ..., Q_n). The relation of the mixed characteristic polynomial to the mixed discriminant is similar to the relation of the matching polynomial of Chap. 5 to the permanent and hafnian.
Chapter 5
The Matching Polynomial

Known in statistical physics as the partition function of the monomer-dimer model, the matching polynomial of a graph is an extension of the hafnian, as it enumerates all, not necessarily perfect, matchings in the graph. The Heilmann–Lieb Theorem asserts that the roots of the matching polynomial (with non-negative real weights on the edges) are negative real, which allows us to efficiently approximate the polynomial through interpolation anywhere away from the negative real axis. We demonstrate the "correlation decay" phenomenon: the probability for a random matching to contain a given vertex is asymptotically independent of whether the matching contains some other remote vertex. Through the Csikvári–Lelarge "lifting" argument, it allows us to lower bound the matching polynomial of a bipartite graph by the matching polynomial of a covering tree, which produces a useful Bethe-entropy estimate. Finally, we prove a general bound on the complex roots of the hypergraph matching polynomial, which allows us to obtain new interpolation results for (multidimensional) permanents of matrices and tensors that are not very far from the matrices (tensors) of all 1s in the ℓ₁ distance on the slices.

5.1 Matching Polynomial


 
5.1.1 Definition. Let A = (a_{ij}) be an n × n symmetric matrix. For a positive integer m such that 2m ≤ n, we define

haf_m(A) = Σ_{{i_1,j_1},...,{i_m,j_m}} a_{i_1 j_1} ⋯ a_{i_m j_m},   (5.1.1.1)

where the sum is taken over all unordered collections of m pairwise disjoint unordered pairs {i_1, j_1}, ..., {i_m, j_m} with 1 ≤ i_1, j_1, ..., i_m, j_m ≤ n. In particular, if n is even
and n = 2m then haf_m(A) = haf A. We also agree that haf_0(A) = 1. Thus haf_m(A) enumerates all matchings consisting of m edges in a complete weighted graph with n vertices.

© Springer International Publishing AG 2016
A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9_5
We define the univariate matching polynomial by

p_A(t) = Σ_{m=0}^{⌊n/2⌋} haf_m(A) t^m.

In statistical physics, p_A(t) is known as the partition function of the "monomer-dimer model", where the edges of the matching correspond to "dimers" while the vertices of the graph not covered by the matching correspond to single "atoms".
The following remarkable result was obtained by Heilmann and Lieb [HL72].

5.1.2 Theorem. Let A be an n × n symmetric matrix with non-negative entries and let

β = β_A = max_{i=1,...,n} Σ_{j: j≠i} a_{ij}.

Then the roots of the matching polynomial p_A(t) are negative real and satisfy the inequality

t ≤ − 1/(4β).

The bound on the roots obtained in [HL72] is, in fact, slightly better, cf. Remark 5.1.4 below.
We follow [HL72] and deduce Theorem 5.1.2 from the following result.

5.1.3 Theorem. For a symmetric n × n matrix A let us define a univariate polynomial

q_A(t) = Σ_{m=0}^{⌊n/2⌋} (−1)^m haf_m(A) t^{n−2m}.

(1) Suppose that A is a real symmetric matrix with positive off-diagonal entries and let A_i be the (n−1) × (n−1) matrix obtained from A by crossing out the i-th row and the i-th column of A for some i = 1, ..., n. Then the roots of q_A(t) and q_{A_i}(t) are real and q_{A_i}(t) interlaces q_A(t) provided n ≥ 2.
(2) Suppose that A is a non-negative real matrix. Then the roots of q_A(t) are real.
(3) Let A be an n × n symmetric non-negative real matrix and let

β = β_A = max_{i=1,...,n} Σ_{j: j≠i} a_{ij}.

If q_A(t) = 0 then |t| ≤ 2√β.

Proof. To prove Part (1), we proceed by induction on n = deg q_A. If n = 2 and i ∈ {1, 2} we have

q_A(t) = t² − a_{12} and q_{A_i}(t) = t,

and hence q_{A_i}(t) indeed interlaces q_A(t).
Suppose that n > 2. We split all matchings in the complete graph with vertices {1, ..., n} contributing to (5.1.1.1) into two classes: those that contain i and those that do not. Then we obtain the recurrence relation:

q_A(t) = t q_{A_i}(t) − Σ_{j: j≠i} a_{ij} q_{A_{ij}}(t),   (5.1.3.1)

where A_{ij} is the (n−2) × (n−2) symmetric matrix obtained from A by crossing out the i-th and j-th rows and the i-th and j-th columns. The polynomial q_{A_i}(t) in (5.1.3.1) accounts for the matchings not containing i while the sum in (5.1.3.1) accounts for the matchings containing i. We note that the highest terms of q_A(t), q_{A_i}(t) and q_{A_{ij}}(t) are positive (with coefficients equal to 1).
By the induction hypothesis, each q_{A_{ij}}(t) interlaces q_{A_i}(t) and hence by Part (1) of Theorem 2.3.2, the polynomial

p(t) = Σ_{j: j≠i} a_{ij} q_{A_{ij}}(t)

interlaces q_{A_i}(t). Then by Part (2) of Theorem 2.3.2, the polynomial q_{A_i}(t) interlaces q_A(t) = t q_{A_i}(t) − p(t).
As follows by Part (1), the roots of q_A(t) are real if A is a symmetric real matrix with positive off-diagonal entries. It then follows by continuity that the roots of q_A(t) are real if A is a non-negative real matrix, which proves Part (2).
To prove Part (3), we may assume that β > 0 since the case of β = 0 is trivial. For a subset I ⊂ {1, ..., n} we denote by A_I the submatrix of A obtained from A by crossing out the rows and columns indexed by I. We denote q_{A_I}(t) just by q_I(t) and prove by descending induction on |I| = n − 2, n − 3, ..., 0 that

q_I(t) ≠ 0 and q_I(t)/q_{I∪{i}}(t) ≥ √β provided i ∉ I and t ≥ 2√β.

Indeed, if I = {1, ..., n} \ {i, j}, we have

q_I(t) = t² − a_{ij}, q_{I∪{i}}(t) = t and q_I(t)/q_{I∪{i}}(t) = t − a_{ij}/t ≥ 2√β − (1/2)√β ≥ √β

provided t ≥ 2√β.
If |I| < n − 2, using (5.1.3.1), for all i ∉ I we can write

q_I(t) = t q_{I∪{i}}(t) − Σ_{j: j∉I, j≠i} a_{ij} q_{I∪{i,j}}(t)   (5.1.3.2)

and hence

q_I(t)/q_{I∪{i}}(t) = t − Σ_{j: j∉I, j≠i} a_{ij} q_{I∪{i,j}}(t)/q_{I∪{i}}(t).

By the induction hypothesis, for t ≥ 2√β we have

q_{I∪{i,j}}(t)/q_{I∪{i}}(t) ≤ 1/√β

and hence

q_I(t)/q_{I∪{i}}(t) ≥ 2√β − (1/√β) Σ_{j: j∉I, j≠i} a_{ij} ≥ 2√β − √β = √β,

which completes the induction. Hence we have proved that q_A(t) ≠ 0 provided t > 2√β. Since the polynomial q_A(t) is even when n is even and odd when n is odd, the proof follows. □

5.1.4 Remark. In [HL72] a slightly stronger bound is proven (by a more careful induction): let us define

w_i = ( Σ_{j: j≠i} a_{ij} ) − min_{j: j≠i, a_{ij}>0} a_{ij},
β_1 = max_{i=1,...,n} w_i,
β_2 = (1/4) max_{i,j} a_{ij},
β = max{ β_1, β_2 }.

Then q_A(t) ≠ 0 for |t| ≥ 2√β. In particular, if A is the adjacency matrix of a graph G with maximum degree Δ(G) > 1 of a vertex, we have q_A(t) ≠ 0 for |t| ≥ 2√(Δ(G) − 1).

5.1.5 Proof of Theorem 5.1.2. Let q_A(t) be the polynomial of Theorem 5.1.3. Then

q_A(t) = t^n p_A( −1/t² ).

By Part (2) of Theorem 5.1.3 it follows that the roots of p_A(t) are the (necessarily real negative) numbers −1/t², where t ranges over the non-zero roots of q_A(t). Since by Part (3) of Theorem 5.1.3 every real root t of q_A(t) satisfies |t| ≤ 2√β, we conclude that all roots of p_A(t) satisfy t ≤ −1/(4β), as desired. □
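Theorem 5.1.2 is easy to check numerically on small matrices. The sketch below (our own illustration; the helper name is not from the text) builds p_A(t) by brute-force enumeration of matchings and verifies that the roots are real, negative, and at most −1/(4β).

```python
import itertools
import numpy as np

def matching_polynomial_coeffs(A):
    # Coefficients haf_0(A), haf_1(A), ..., haf_{n//2}(A) of p_A(t), by
    # brute-force enumeration of all matchings on n labeled vertices.
    n = A.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    coeffs = [0.0] * (n // 2 + 1)
    for m in range(n // 2 + 1):
        for comb in itertools.combinations(edges, m):
            used = [v for e in comb for v in e]
            if len(set(used)) == 2 * m:  # edges pairwise disjoint
                w = 1.0
                for i, j in comb:
                    w *= A[i, j]
                coeffs[m] += w
    return coeffs  # coeffs[m] = haf_m(A)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 6
    B = rng.random((n, n))
    A = (B + B.T) / 2
    np.fill_diagonal(A, 0.0)
    coeffs = matching_polynomial_coeffs(A)
    beta = max(A[i].sum() for i in range(n))
    roots = np.roots(coeffs[::-1])  # numpy wants the leading coefficient first
    assert np.all(np.abs(roots.imag) < 1e-6)                 # roots are real
    assert np.all(roots.real <= -1.0 / (4 * beta) + 1e-6)    # Heilmann-Lieb bound
    print(sorted(roots.real))
```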

One immediate corollary is that the numbers haf_m(A) form a log-concave sequence.

5.1.6 Corollary. Let A = (a_{ij}) be a non-negative symmetric matrix. Then

( haf_m(A) )² ≥ haf_{m−1}(A) haf_{m+1}(A) for m = 1, ..., ⌊n/2⌋ − 1.

Proof. Follows by Theorem 5.1.2 and Theorem 2.3.3. □

5.1.7 Computing the matching polynomial. Let A be an n × n non-negative real symmetric matrix, let p_A(t) be the corresponding matching polynomial and let β = β_A ≥ 0 be as defined in Theorem 5.1.2. Let us fix some 0 < δ < 1. One can deduce from Theorem 5.1.2 that for any given 0 < ε < 1 and complex t the value of p_A(t) can be approximated within a relative error of ε in quasi-polynomial n^{O(ln n − ln ε)} time as long as |t| ≤ δ/(4β), and, moreover, ln p_A(t) can be approximated within an additive error ε > 0 by a polynomial of degree O(ln n − ln ε) in t and the entries of A. Given such a t, we define a univariate polynomial

g_A(z) = p_A(t z).

From Theorem 5.1.2, we deduce that g_A(z) ≠ 0 as long as |z| ≤ 1/δ. We define

f_A(z) = ln g_A(z)

and use Lemma 2.2.1 to approximate f_A(1) = ln p_A(t) by the Taylor polynomial of f_A(z) at z = 0 of some degree d = O(ln n − ln ε). Since the values of haf_m(A) can be computed exactly in n^{O(m)} time, we can compute the m-th derivative of p_A(t) at t = 0 in n^{O(m)} time and hence the m-th derivative of f_A(z) at z = 0 in n^{O(m)} time, cf. Sect. 2.2.2, see also [Re15].
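The interpolation step fits in a few lines of code. In the sketch below (our own illustration, assuming t is well inside the root-free disc), the Taylor coefficients of f_A(z) = ln g_A(z) are obtained from the coefficients of g_A(z) = p_A(tz) by the standard power-series recurrence k g_k = Σ_{j=1}^{k} j f_j g_{k−j}; the brute-force enumeration of haf_m(A) stands in for the n^{O(m)} exact computation.

```python
import itertools
import math
import numpy as np

def haf_coeffs(A):
    # haf_m(A) for m = 0,...,n//2, by brute-force enumeration of matchings.
    n = A.shape[0]
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c = [0.0] * (n // 2 + 1)
    for m in range(n // 2 + 1):
        for comb in itertools.combinations(edges, m):
            if len({v for e in comb for v in e}) == 2 * m:
                c[m] += math.prod(A[i, j] for i, j in comb)
    return c

def log_taylor(g, d):
    # Taylor coefficients f_0,...,f_d of ln g(z) for a power series g with
    # g(0) = 1, via the recurrence k*g_k = sum_{j=1}^{k} j*f_j*g_{k-j}.
    g = list(g) + [0.0] * d
    f = [0.0] * (d + 1)
    for k in range(1, d + 1):
        s = sum(j * f[j] * g[k - j] for j in range(1, k))
        f[k] = (k * g[k] - s) / k
    return f

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n = 6
    B = rng.random((n, n))
    A = (B + B.T) / 2
    np.fill_diagonal(A, 0.0)
    beta = max(A[i].sum() for i in range(n))
    t = 0.1 / (4 * beta)                       # well inside the root-free disc
    g = [h * t**m for m, h in enumerate(haf_coeffs(A))]  # g_A(z) = p_A(tz)
    approx = sum(log_taylor(g, 20))            # Taylor estimate of ln p_A(t)
    exact = math.log(sum(g))                   # ln g_A(1) = ln p_A(t)
    print(approx, exact)
```

Since all roots of g_A lie outside the disc of radius 1/δ = 10 here, the Taylor coefficients decay geometrically and 20 terms already give many digits.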
In fact, for any δ > 1, fixed in advance, the value of p_A(t) at a complex t can be approximated within a relative error 0 < ε < 1 in n^{O(ln n − ln ε)} time as long as

|t| ≤ δ/(4β) and |π − arg t| ≥ 1/δ.

Fig. 5.1 The region where p_A(t) can be efficiently approximated; z_0 is an upper bound on the roots of p_A(t)

Moreover, in that region ln p_A(t) can be approximated within an error ε by a polynomial of degree O(ln n − ln ε) in t and the entries of A. Figure 5.1 shows a domain of t ∈ C for which p_A(t) can be approximated in quasi-polynomial time. It consists of an outer disc of radius δ|z_0| for some fixed δ > 1, where z_0 is an upper bound on the roots of p_A, with a sector of a fixed angle removed, and an inner disc of radius γ|z_0| for some fixed 0 < γ < 1.
To approximate p_A(t), using Lemma 2.2.3 we first construct a disc D = {z ∈ C : |z| ≤ β} of some radius β = β(δ) > 1 and a polynomial ψ = ψ_δ : C −→ C such that ψ(0) = 0, ψ(1) = 1 and the image ψ(D) lies in a sufficiently thin strip aligned with the positive real axis, so that the set tψ(D) does not contain the roots of p_A. We then consider the composition

g_A(z) = p_A(tψ(z))

and use the Taylor polynomial of f_A(z) = ln g_A(z) at z = 0 to approximate g_A(1) = p_A(t), cf. Sect. 3.7 and see [PR16] for details.
Patel and Regts further showed [PR16] that if A is the adjacency matrix of a graph G with the largest degree Δ(G) of a vertex bounded above in advance, then the above algorithm for approximating p_A(t) can be made polynomial and not just quasi-polynomial. They show that in that case the values of haf_m(A) for m = O(ln n − ln ε) can be computed in time polynomial in n and 1/ε, see also Sect. 6.6.
Let z_0 be the largest root of the matching polynomial p_A(t) (since z_0 < 0, it is also the root of p_A(t) nearest to the origin). Then ±1/√(−z_0) are the roots of the polynomial q_A(t) of Theorem 5.1.3 of the largest absolute value. We note that q_A(t) is a monic polynomial and that the coefficient haf_m(A) of t^{n−2m} can be computed in n^{O(2m)} time simply by enumerating all matchings of size m in the complete graph. Arguing as in Sect. 2.3.4, we can estimate the largest absolute value of the root of q_A(t) and hence the value of z_0 within relative error ε in n^{O(ln n − ln ε)} time.
There is a Markov Chain based randomized polynomial time algorithm approximating p_A(t) for real t ≥ 0, see Chap. V of [Je03]. If A is the adjacency matrix of a graph, the complexity of the algorithm is polynomial in t.
For zeros of partition functions of subgraphs with various degree constraints, see [Ru99, Wa99].

5.2 Correlation Decay for the Matching Polynomial

5.2.1 Graphs and probabilities. In what follows, it is convenient to switch from the language of symmetric matrices to the language of weighted graphs. We consider a graph G = (V, E; A), undirected, without loops or multiple edges, with set V of vertices, set E of edges and non-negative weights a_e : e ∈ E on the edges. We define the matching polynomial

P_G(t) = Σ_{k=0}^{⌊|V|/2⌋} h_k t^k, where h_k = Σ_{e_1,...,e_k ∈ E pairwise disjoint} a_{e_1} ⋯ a_{e_k}.

We call the product a_{e_1} ⋯ a_{e_k} the weight of a matching e_1, ..., e_k.
When G is the complete graph with set V = {1, ..., n} of vertices and weights a_e = a_{ij} for e = {i, j}, where A = (a_{ij}) is a symmetric non-negative matrix, we obtain the matching polynomial p_A(t) of Sect. 5.1.1.
We assume that the parameter t is non-negative real. Let us consider the set of all matchings in G as a finite probability space, where the probability of a matching consisting of the edges e_1, ..., e_k is proportional to t^k a_{e_1} ⋯ a_{e_k} (if k = 0 we assume that the product is equal to 1). Then the probability that a random matching contains k edges is t^k h_k / P_G(t) and

t P′_G(t)/P_G(t) = t (d/dt) ln P_G(t) = P_G(t)^{−1} Σ_{k=0}^{⌊|V|/2⌋} k h_k t^k

is the expected number of edges in a random matching.


Let G = (V, E; A) be a weighted graph as above, and let S ⊂ V be a set of its vertices. We denote by G − S the weighted graph obtained from G by deleting all vertices of S together with the incident edges. We start with a recurrence relation similar to (5.1.3.1):

P_G(t) = P_{G−v}(t) + t Σ_{w∈V: {v,w}∈E} a_{{v,w}} P_{G−v−w}(t).   (5.2.1.1)

Here v is a vertex of V; the term P_{G−v}(t) enumerates all matchings in G not containing v whereas the sum accounts for all matchings in G containing v (we use G − v as a shorthand for G − {v} and G − v − w as a shorthand for G − {v, w}). We rewrite (5.2.1.1) as

P_{G−v}(t)/P_G(t) = ( 1 + t Σ_{w∈V: {v,w}∈E} a_{{v,w}} P_{G−v−w}(t)/P_{G−v}(t) )^{−1}.   (5.2.1.2)

We note that P_{G−v}(t)/P_G(t) is the probability that a random matching does not contain the vertex v whereas P_{G−v−w}(t)/P_{G−v}(t) is the conditional probability that a random matching does not contain the vertex w given that it does not contain the vertex v.
We note that the sum

(1/2) Σ_{v∈V} ( 1 − P_{G−v}(t)/P_G(t) )

represents the expected number of edges (half of the expected number of vertices) in a random matching, and hence we get

t (d/dt) ln P_G(t) = (1/2) Σ_{v∈V} ( 1 − P_{G−v}(t)/P_G(t) ).   (5.2.1.3)

Formula (5.2.1.2) can be naturally generalized as follows: for a set S ⊂ V of vertices and a vertex v ∈ V \ S, we have

P_{G−S−v}(t)/P_{G−S}(t) = ( 1 + t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} P_{G−S−v−w}(t)/P_{G−S−v}(t) )^{−1}.   (5.2.1.4)

We interpret P_{G−S−v}(t)/P_{G−S}(t) as the conditional probability that a random matching in G does not contain the vertex v, given that it does not contain any vertex of S.
We discuss a dynamic programming type algorithm for computing the probabilities P_{G−S−v}(t)/P_{G−S}(t) and, as a corollary, the matching polynomial P_G(t), which exhibits an interesting phenomenon, called the "correlation decay". We follow [B+07] with some modifications.
 
5.2.2 Lemma. Let us consider the set X of all non-negative vectors x = (x_{S,v}) with coordinates parameterized by pairs consisting of a set S ⊂ V of vertices and a vertex v ∈ V \ S, and let us define a transformation T : X −→ X by

T(x) = y where y_{S,v} = ( 1 + t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} x_{S∪{v},w} )^{−1}.

Let

β = max_{v∈V} Σ_{w∈V: {v,w}∈E} a_{{v,w}}

and suppose that

t = λ/β for some λ > 0.

(1) Suppose that

1/(1+λ) ≤ x_{S,v} ≤ 1 for all S ⊂ V and v ∈ V \ S.

Then for y = T(x) we have

1/(1+λ) ≤ y_{S,v} ≤ 1 for all S ⊂ V and v ∈ V \ S.

(2) For any x′, x″ ∈ X and y′ = T(x′), y″ = T(x″), we have

max_{S⊂V, v∈V\S} | ln y′_{S,v} − ln y″_{S,v} | ≤ ( λ/(λ+1) ) max_{S⊂V, v∈V\S} | ln x′_{S,v} − ln x″_{S,v} |.

Proof. Since x_{S,v} ≥ 0 for all S and v, for y = T(x) we have y_{S,v} ≤ 1 for all S and v. If, in addition, x_{S,v} ≤ 1 for all S and v then

t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} x_{S∪{v},w} = (λ/β) Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} x_{S∪{v},w} ≤ (λ/β) Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} ≤ λ

and y_{S,v} ≥ (1 + λ)^{−1} for all S and v, which proves Part (1).
To prove Part (2), we introduce the substitution

ξ_{S,v} = − ln x_{S,v} and η_{S,v} = − ln y_{S,v}.

Then the transformation T is written as

η_{S,v} = ln( 1 + t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} e^{−ξ_{S∪{v},w}} ).

Then

Σ_{w∈V\S} | ∂η_{S,v}/∂ξ_{S∪{v},w} | = ( t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} e^{−ξ_{S∪{v},w}} ) / ( 1 + t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} e^{−ξ_{S∪{v},w}} )
  = 1 − 1/( 1 + t Σ_{w∈V\S: {v,w}∈E} a_{{v,w}} e^{−ξ_{S∪{v},w}} ) ≤ 1 − 1/(1+λ) = λ/(1+λ)

and the proof of Part (2) follows. □

5.2.3 Correlation decay. The transformation T of Lemma 5.2.2 is a contraction. If we start with the vector x_{S,v} = 1 (or any other vector with (1+λ)^{−1} ≤ x_{S,v} ≤ 1) and iterate T, then the vector T^m(x) necessarily converges to the unique fixed point x* of T, which by (5.2.1.4) necessarily satisfies

x*_{S,v} = P_{G−S−v}(t)/P_{G−S}(t).

As follows from Lemma 5.2.2, to approximate x* by T^m(x) coordinate-wise within a relative error 0 < ε < 1, we can choose

m = O( ln(1/ε) / ( ln(λ+1) − ln λ ) )   (5.2.3.1)

iterations.
Let us introduce a metric on the set V of vertices, where dist(u, v) is the smallest possible number of edges of G in a path connecting u and v (we let dist(u, v) = +∞ if the vertices u and v lie in different connected components). We note that to compute the (S, v)-coordinate of T^m(x), we only need to access the (S′, w′)-coordinates where dist(u, v) ≤ m for all u ∈ (S′ \ S) ∪ {w′}. As follows from (5.2.3.1), if λ is fixed in advance, we obtain a quasi-polynomial algorithm of |V|^{O(ln |V| − ln ε)} complexity to approximate x*_{S,v} within relative error 0 < ε < 1.
In particular, if we fix some λ_0 > 0 and ε > 0 then for any λ ≤ λ_0, up to an additive error ε, the value of P_{G−v}(t)/P_G(t) depends only on the structure of G in the m-neighborhood of v for some m = m(ε, λ_0). In other words, for two weighted graphs G_1 = (V_1, E_1; A_1) and G_2 = (V_2, E_2; A_2) and for two vertices v_1 ∈ V_1 and v_2 ∈ V_2 we have

| P_{G_1−v_1}(t)/P_{G_1}(t) − P_{G_2−v_2}(t)/P_{G_2}(t) | ≤ ε

provided the m-neighborhoods of v_1 in G_1 and of v_2 in G_2 are isomorphic.


One particularly interesting case is when a_e = 1 for all e ∈ E, when

P_G(t) = Σ_{k=0}^{⌊|V|/2⌋} (the number of k-matchings in G) t^k.

Then β = Δ(G), the largest degree of a vertex of G. If λ and Δ(G) are fixed in advance, we obtain a polynomial time algorithm for approximating x*_{S,v}, because the number of different coordinates (S′, w′) we need to access while computing the (S, v)-coordinate of the iteration T^m(x) grows roughly as Δ(G)^m, which, by (5.2.3.1), is bounded by a polynomial in ε^{−1}. By looking at two consecutive iterations of T, a better rate of convergence is established in [B+07]: it is shown that T² is, in fact, a contraction with a factor of

1 − 1/√( t Δ(G) ).

This phenomenon of fast convergence is called correlation decay because to approximate x*_{S,v} we do not need to care at all about coordinates (S′, w′) with (S′, w′) very different from (S, v).

We note that once we approximate the x*_{S,v}, we can approximate the value of P_G(t) by telescoping. Namely, we number the vertices v_1, ..., v_n of G and let S_0 = ∅, S_1 = {v_1}, S_2 = {v_1, v_2}, ..., S_{n−1} = {v_1, ..., v_{n−1}}. Then

P_G(t) = ( P_G(t)/P_{G−v_1}(t) ) ( P_{G−v_1}(t)/P_{G−v_1−v_2}(t) ) ⋯ ( P_{G−v_1−...−v_{n−1}}(t)/P_{G−v_1−...−v_n}(t) )
  = ( x*_{S_0,v_1} x*_{S_1,v_2} ⋯ x*_{S_{n−1},v_n} )^{−1},

since P_{G−v_1−...−v_n}(t) = 1.
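The whole scheme fits in a short script. The sketch below is our own minimal implementation (not the algorithm of [B+07]; helper names are ours): it evaluates the truncated iterates of (5.2.1.4) by a depth-limited recursion started from the constant value 1, telescopes them into an estimate of P_G(t) for a 6-cycle, and compares with brute-force enumeration. On a finite graph the recursion bottoms out once S blocks all neighbors of v, so a sufficiently deep truncation recovers x*_{S,v} exactly, while a shallow truncation gives the correlation-decay approximation with error of order (λ/(1+λ))^depth.

```python
import itertools

def cycle_edges(n):
    return [(i, (i + 1) % n) for i in range(n)]

def brute_force_P(n, edges, t):
    # P_G(t) with unit weights: enumerate all edge subsets, keep matchings.
    total = 0.0
    for m in range(len(edges) + 1):
        for comb in itertools.combinations(edges, m):
            if len({v for e in comb for v in e}) == 2 * m:
                total += t ** m
    return total

def x_hat(S, v, depth, adj, t):
    # Truncated iterate of (5.2.1.4): approximates P_{G-S-v}(t)/P_{G-S}(t),
    # starting from the constant value 1 at depth 0.
    if depth == 0:
        return 1.0
    s = sum(x_hat(S | {v}, w, depth - 1, adj, t) for w in adj[v] if w not in S)
    return 1.0 / (1.0 + t * s)

if __name__ == "__main__":
    n, t, depth = 6, 0.3, 25
    edges = cycle_edges(n)
    adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
    # Telescoping: P_G(t) = (x*_{S_0,v_1} ... x*_{S_{n-1},v_n})^{-1}
    prod, S = 1.0, frozenset()
    for v in range(n):
        prod *= x_hat(S, v, depth, adj, t)
        S = S | {v}
    print(1.0 / prod, brute_force_P(n, edges, t))
```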

5.2.4 Definition. For positive integers m and k ≥ 2, we define T_k^m as the tree with vertices at the levels 0, 1, ..., m, with one vertex, called the root, at the 0th level, connected to k − 1 vertices at the level 1, and with every vertex at the i-th level connected to one vertex at the (i−1)-st level and to k − 1 vertices at the (i+1)-st level, for i = 1, ..., m − 1. Each vertex at the m-th level is connected to one vertex at the (m−1)-st level, see Fig. 5.2.
We set the weight on every edge of T_k^m equal to 1.

Fig. 5.2 The tree T_3^3

5.2.5 Lemma. Let us fix k and let v^m be the root of T_k^m. For any t > 0, we have

lim_{m→∞} P_{T_k^m − v^m}(t)/P_{T_k^m}(t) = ( √(1 + 4t(k−1)) − 1 ) / ( 2t(k−1) ).

Moreover, for any t_0 > 0, the convergence is uniform on the interval 0 < t ≤ t_0.

Proof. As follows from Sect. 5.2.3, for any ε > 0 there is m_0 = m_0(ε, k, t_0) such that for t ≤ t_0 the value of P_{T_k^m − v^m}(t)/P_{T_k^m}(t), up to an error ε, depends only on the m_0-neighborhood of v^m. However, for all m ≥ m_0 the m_0-neighborhood of v^m ∈ T_k^m remains the same, from which it follows that the limit in question, call it x, indeed exists.
If we remove the root v^m of T_k^m and all incident edges, we get a vertex-disjoint union of k − 1 trees T_k^{m−1}, see Fig. 5.2. Hence by (5.2.1.2) the limit x satisfies the equation

x = 1/( 1 + t(k−1)x ),

from which

x = ( √(1 + 4t(k−1)) − 1 ) / ( 2t(k−1) ). □

We interpret the limit in Lemma 5.2.5 as the limit probability that a random matching in T_k^m does not contain the root, cf. Sect. 5.2.1.
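The fixed-point equation is easy to verify numerically: iterating the contraction x ← 1/(1 + t(k−1)x) from x = 1 converges to the closed-form limit. A minimal check (our own code):

```python
import math

def tree_root_ratio(k, t, iterations=500):
    # Iterate x <- 1/(1 + t(k-1)x), the recursion satisfied by the ratio
    # P_{T_k^m - v^m}(t) / P_{T_k^m}(t) as m grows.
    x = 1.0
    for _ in range(iterations):
        x = 1.0 / (1.0 + t * (k - 1) * x)
    return x

def closed_form(k, t):
    # The limit from Lemma 5.2.5.
    return (math.sqrt(1.0 + 4.0 * t * (k - 1)) - 1.0) / (2.0 * t * (k - 1))

if __name__ == "__main__":
    for k, t in [(3, 0.7), (10, 1.0), (4, 0.1)]:
        print(k, t, tree_root_ratio(k, t), closed_form(k, t))
```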

5.2.6 Regular graphs of large girth. A graph G is called k-regular if every vertex
of G is incident to precisely k edges. The girth of an undirected graph G = (V, E)
without loops or multiple edges, denoted gr G, is the smallest number g of ver-
tices of a cycle v1 − v2 − . . . − vg − v1 , where v1 , . . . , vg ∈ V are distinct and
{v1 , v2 }, {v2 , v3 }, . . . , {vg−1 , vg }, {vg , v1 } ∈ E. If G has no cycles, that is, if G is
a forest, we say that gr G = +∞. Locally (that is, in the vicinity of each vertex),
a graph of a large girth looks like a tree, which often allows us to understand the
behavior of its matching polynomial.

5.2.7 Lemma. Let us fix an integer k > 1 and let G_n = (V_n, E_n; 1), n ∈ N, be a sequence of k-regular graphs such that gr G_n −→ +∞ as n −→ ∞, with uniform weights equal to 1 on every edge of G_n. Let v_n ∈ V_n be a sequence of vertices. Then, for the matching polynomials P_{G_n}(t) and P_{G_n−v_n}(t) we have

lim_{n→∞} P_{G_n−v_n}(t)/P_{G_n}(t) = (2k − 2) / ( k√(1 + 4t(k−1)) + k − 2 ) for all t > 0

and the convergence is uniform over v_n ∈ V_n. Moreover, for any fixed t_0 > 0, the convergence is also uniform over all 0 ≤ t ≤ t_0.

Fig. 5.3 A 3-regular tree with root at 0 and 3 levels

Proof. Let us fix t_0 > 0 and an ε > 0. As is discussed in Sect. 5.2.3, there exists m_0 = m_0(ε, t_0, k) such that, up to an error ε, the ratio P_{G_n−v_n}(t)/P_{G_n}(t) depends only on the m_0-neighborhood of v_n in G_n. If gr G_n > m, the neighborhood of v_n looks like the k-regular tree with root at v_n, uniform weight 1 on every edge and m levels, see Fig. 5.3. Hence it follows that the limit in question, say y, indeed exists.
If we remove the vertex v_n with its incident edges, then the m-neighborhood of v_n in the resulting graph will be a vertex-disjoint union of k trees T_k^{m−1}. From (5.2.1.2) it follows that

y = 1/( 1 + tkx ),

where x is the limit in Lemma 5.2.5. □

Again, we interpret the limit in Lemma 5.2.7 as the limit probability that a random
matching in G n does not contain a particular vertex vn .
Finally, we compute the logarithmic asymptotic of the partition function PG n (t)
for k-regular graphs of growing girth.

5.2.8 Theorem. Let us fix an integer k > 1 and let G n = (Vn , E n ; 1), n ∈ N, be
a sequence of k-regular graphs such that gr G n −→ +∞ as n −→ ∞ and with
uniform weights equal 1 on every edge of G n . Then, for any t > 0 we have
$$\lim_{n\to\infty} \frac{\ln P_{G_n}(t)}{|V_n|} = \frac{k-1}{2}\ln\left(\frac{1+\sqrt{1+4tk-4t}}{2}\right) - \frac{k-2}{2}\ln\left(\frac{k\sqrt{1+4tk-4t}+k-2}{2k-2}\right) + \frac{1}{2}\ln\left(\frac{2kt-2t}{\sqrt{1+4tk-4t}-1}\right).$$

Proof. By Lemma 5.2.7 and (5.2.1.3), we have


158 5 The Matching Polynomial

Fig. 5.4 The graph of the limit in Theorem 5.2.8 for 10-regular graphs G_n of growing girth, as a function of t

$$\lim_{n\to\infty} \frac{1}{|V_n|}\, t\,\frac{d}{dt}\ln P_{G_n}(t) = \frac{1}{2} - \frac{k-1}{k\sqrt{1+4t(k-1)}+k-2},$$

where for any t_0 > 0, the convergence is uniform over 0 ≤ t ≤ t_0. Let us fix an 0 < ε < t. Then

$$\lim_{n\to\infty}\left(\frac{1}{|V_n|}\ln P_{G_n}(t) - \frac{1}{|V_n|}\ln P_{G_n}(\epsilon)\right) = \int_{\epsilon}^{t}\left(\frac{1}{2\tau} - \frac{k-1}{k\tau\sqrt{1+4\tau(k-1)}+\tau(k-2)}\right)d\tau. \tag{5.2.8.1}$$

Since G_n is k-regular, the number of edges of G_n is k|V_n|/2 and we can bound

$$\frac{1}{|V_n|}\ln P_{G_n}(\epsilon) \le \frac{1}{|V_n|}\ln\left((1+\epsilon)^{k|V_n|/2}\right) = \frac{k}{2}\ln(1+\epsilon).$$

One can show that the integrand in (5.2.8.1) is regular at τ = 0, and in fact,

$$\frac{1}{2\tau} - \frac{k-1}{k\tau\sqrt{1+4\tau(k-1)}+\tau(k-2)} = \frac{k}{2} + O(\tau) \quad \text{as } \tau \to 0+.$$

Hence we can take the limit in (5.2.8.1) as ε → 0+. Computing the integral, we complete the proof. □

The graph of the limit for 10-regular graphs as a function of t is pictured on Fig. 5.4.

5.3 Matching Polynomials of Bipartite Graphs

5.3.1 Definition. We consider the special case of P_G(t) for a bipartite graph G. Alternatively, for a given n × n non-negative matrix A = (a_ij) and integer 1 ≤ k ≤ n, we define
$$\operatorname{per}_k(A) = \sum_{\substack{1 \le i_1 < \ldots < i_k \le n\\ j_1,\ldots,j_k \text{ pairwise distinct}}} a_{i_1 j_1} \cdots a_{i_k j_k},$$

the sum of permanents of all k × k submatrices of A and let


$$r_A(t) = \sum_{k=0}^{n} \operatorname{per}_k(A)\, t^k, \tag{5.3.1.1}$$

where we agree that per 0 (A) = 1.

Our exposition loosely follows Csikvári [Cs14] and Lelarge [Le15].
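For small matrices, per_k(A) and r_A(t) are easy to compute by direct enumeration. A brute-force sketch of ours (exponential time, for illustration only):

```python
from itertools import combinations, permutations

def per_k(A, k):
    # Sum of a_{i1 j1} ... a_{ik jk} over row subsets i1 < ... < ik and
    # pairwise distinct columns j1, ..., jk: the sum of permanents of
    # all k x k submatrices of A; per_0(A) = 1.
    n = len(A)
    total = 0
    for rows in combinations(range(n), k):
        for cols in permutations(range(n), k):
            p = 1
            for i, j in zip(rows, cols):
                p *= A[i][j]
            total += p
    return total

def r_poly(A, t):
    # r_A(t) = sum_{k=0}^n per_k(A) t^k of (5.3.1.1).
    return sum(per_k(A, k) * t**k for k in range(len(A) + 1))

A = [[1, 2], [3, 4]]
assert per_k(A, 1) == 10   # sum of all entries
assert per_k(A, 2) == 10   # per A
assert r_poly(A, 1) == 21  # 1 + 10 + 10
```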

5.3.2 A 2-lift of a graph and a 2-lift of a matrix. Let G = (V, E; A) be an


undirected weighted graph without loops or multiple edges. We construct its 2-lift
H as follows. For each vertex v of G, we introduce two vertices, say v1 and v2 of H .
For each edge {u, v} of G we introduce two edges: either a pair {v1, u1} and {v2, u2}
of edges or a pair {v1 , u 2 } and {v2 , u 1 } of edges (we have a choice here), see Fig. 5.5.
We make H a weighted graph by copying the weight of edge e on the lifts of e.

For example, if G is a cycle with n vertices then a 2-lift H can be either a pair of vertex-disjoint n-cycles or a 2n-cycle, see Fig. 5.6.
One can similarly define n-lifts. Random lifts of graphs were studied in connection
with expander constructions [AL06], but also in connection with perfect matchings
[LR05].
Following [Le15], we define a 2-lift of an n × n matrix A = (a_ij) as a 2n × 2n matrix B = (b_ij), where for all 1 ≤ i, j ≤ n we have
$$\text{either}\quad b_{ij} = b_{(i+n)(j+n)} = a_{ij} \quad\text{and}\quad b_{(i+n)j} = b_{i(j+n)} = 0$$
$$\text{or}\quad b_{i(j+n)} = b_{(i+n)j} = a_{ij} \quad\text{and}\quad b_{ij} = b_{(i+n)(j+n)} = 0.$$

Fig. 5.5 Two ways to lift an edge

Fig. 5.6 2-lifts of a triangle

For example, if
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \tag{5.3.2.1}$$
then
$$B = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 3 & 4 \end{pmatrix},\quad C = \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 4 & 3 & 0 \\ 0 & 2 & 1 & 0 \\ 3 & 0 & 0 & 4 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 & 2 \\ 3 & 4 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 3 & 4 \end{pmatrix} \tag{5.3.2.2}$$

are 2-lifts of A. It is clear that our definitions of a 2-lift of a matrix and a 2-lift of a
weighted bipartite graph agree.
The following result was proved by Csikvári [Cs14] in the case of uniformly
weighted graphs and then extended by Lelarge [Le15] to arbitrary positive weights.

5.3.3 Theorem. Let G be a weighted bipartite graph with positive weights on the
edges and let H be a 2-lift of G. Then

$$P_H(t) \le P_G^2(t) \quad \text{for all } t \ge 0.$$

Equivalently, if A is an n × n non-negative matrix and B is a 2-lift of A then for the polynomials r_A(t) and r_B(t) defined by (5.3.1.1), we have

$$r_B(t) \le r_A^2(t) \quad \text{for all } t \ge 0.$$

Proof. Let G̃ be a trivial 2-lift of G consisting of two vertex-disjoint copies of G, say, G_0 and G_1. Since every matching in G̃ can be written uniquely as a disjoint union of a matching in G_0 and a matching in G_1, we deduce that

$$P_{\tilde{G}}(t) = P_G^2(t)$$

(here we don't use that G is bipartite).


Next, we are going to prove that
$$P_H(t) \le P_{\tilde{G}}(t) \quad \text{for all } t \ge 0. \tag{5.3.3.1}$$

Let e1 , . . . , ek be a matching in H and let us consider the edges f 1 , . . . , f k of G that


are the images of e1 , . . . , ek under the natural projection H −→ G. Since e1 , . . . , ek
is a matching, each vertex in G belongs to at most two of the edges f 1 , . . . , f k .
Consequently, the multiset F = { f 1 , . . . , f k } is a vertex-disjoint union of edges
of multiplicity 2, paths, and cycles. Since G is a bipartite graph, the cycles have
necessarily an even number of vertices.
Let us fix a multiset F = {f_1, ..., f_k} of edges as above, obtained by a projection of a k-matching in H, and let us compare the total weights W_H(F) and W_{G̃}(F) of matchings in H and G̃ respectively, projected onto F. If F can be represented as a vertex-disjoint union F = F_1 ∪ F_2 then clearly

$$W_H(F_1 \cup F_2) = W_H(F_1)W_H(F_2) \quad\text{and}\quad W_{\tilde{G}}(F_1 \cup F_2) = W_{\tilde{G}}(F_1)W_{\tilde{G}}(F_2),$$

so it suffices to compare W_H(F) and W_{G̃}(F) when F is connected.


If F consists of a single edge of multiplicity 2, then W_H(F) = W_{G̃}(F), as there are exactly two edges of equal weight projected onto the edge in F, see Fig. 5.5.
If F is a path then W_H(F) = W_{G̃}(F), since if F is a projection of a matching in H, there are exactly two matchings in H of the same weight projecting onto F, whose union consists of two vertex-disjoint paths projected onto F. Similarly, there are exactly two matchings in G̃ projected onto F whose union consists of two vertex-disjoint paths projected onto F, see Fig. 5.7.
In particular, every path in G can be lifted to two matchings in G̃.
Finally, if F is an even cycle then W_H(F) = W_{G̃}(F). If a matching in H is projected onto an even cycle in G, then there are exactly two such matchings in H. Similarly, there are two vertex-disjoint cycles in G̃ projected onto F containing two matchings in G̃ projected onto F, see Fig. 5.8.
This concludes the proof of (5.3.3.1) and hence the proof of the theorem. 

Fig. 5.7 If there is a matching in H projecting onto a path in G then there are exactly two such matchings in H (one of thick lines and the other of thin lines). Similarly, there are exactly two matchings in G̃ projecting onto the same path in G

Fig. 5.8 If there is a matching in H projecting onto an even cycle, then there are exactly two such matchings (one of thick lines and the other of thin lines). Similarly, there are exactly two matchings in G̃ projected onto the cycle

Fig. 5.9 If a 4-cycle in G is lifted to two vertex-disjoint 4-cycles in H then there is a 4-matching in H projected onto the cycle. If a 4-cycle in G is lifted to an 8-cycle in H then there is no 4-matching in H projected onto the cycle

Some remarks are in order. We may have P_H(t) < P_G^2(t) = P_{G̃}(t), since there can be a 2k-cycle in G which is the projection of a 4k-cycle in H. In that case, there is a 2k-matching in G̃ projecting onto the 2k-cycle but there is no 2k-matching in H projecting onto that 2k-cycle, see Fig. 5.9.
For example, for matrix A of (5.3.2.1), we have per B = per C = 100 = 10² for 2-lifts B and C of (5.3.2.2), while per D = 52 < 10² for the 2-lift D of (5.3.2.2).
We note that if G is a triangle and H is a 2-lift that is a 6-cycle such as on Fig. 5.6,
then PG (t) is a polynomial of degree 1 (since the maximum matching in G consists
of one edge) while PH (t) is a polynomial of degree 3 (since the maximum matching
in H consists of 3 edges). Therefore, Theorem 5.3.3 does not hold if G is not required
to be bipartite. For non-bipartite graphs, the proof breaks down at the last step: if F
is an odd cycle which is a projection of a matching in H , then F is a projection of
exactly two such matchings whose union is an even cycle of twice the length of F
and not two vertex-disjoint copies of F, see Fig. 5.10.
Applying 2-lifts repeatedly, one can obtain from a general graph a graph with a
larger girth that locally looks more and more like a tree.
The following result and its proof is attributed to Linial in [Cs14, Le15].
5.3.4 Lemma. Let G = (V, E) be an undirected graph without loops or multiple
edges. Then there is a graph H obtained by repeated applications of 2-lifts to G such
that gr H > gr G.

Fig. 5.10 If there is a matching projecting onto a cycle of length 3, then there are exactly two such matchings (one of thick lines and the other of thin lines) whose union is an even cycle of length 6

Proof. Suppose that gr G = g and let k be the number of cycles of length g. Let G̃ be a random 2-lift of G, where independently for each edge {u, v} of G, we choose the lift {u_1, v_1} and {u_2, v_2} or the lift {u_1, v_2} and {u_2, v_1}, with probability 1/2 each, see Sect. 5.3.2. Then a g-cycle in G is lifted to a pair of g-cycles in G̃ with probability 1/2 and to a 2g-cycle in G̃ with probability 1/2. Indeed, a path v_1 − v_2 − ... − v_g of length g is lifted to a pair of paths of length g each in G̃, and then the closing edge v_1 − v_g is either lifted to a pair of edges closing each path to a cycle of length g or to a pair of edges patching the paths into a cycle of length 2g, see Fig. 5.9. Consequently, for every 2-lift G̃ of G we have gr G̃ ≥ g and the expected number of g-cycles in G̃ is k. Since with positive probability G̃ consists of two vertex-disjoint copies of G, in which case the number of g-cycles in G̃ is 2k > k, there is a lift G̃ which has fewer than k cycles of length g. Iterating, we conclude that there is a sequence of 2-lifts of G which produces a graph H with no g-cycles, in which case gr H > gr G. □

As a corollary of Lemma 5.3.4, Theorems 5.3.3 and 5.2.8, we obtain the following
lower bound.

5.3.5 Theorem. Let G = (V, E; 1) be a k-regular bipartite graph with uniform


weights 1 on all edges. Then, for k ≥ 2,
$$\frac{\ln P_G(t)}{|V|} \ge \frac{k-1}{2}\ln\left(\frac{1+\sqrt{1+4tk-4t}}{2}\right) - \frac{k-2}{2}\ln\left(\frac{k\sqrt{1+4tk-4t}+k-2}{2k-2}\right) + \frac{1}{2}\ln\left(\frac{2kt-2t}{\sqrt{1+4tk-4t}-1}\right)$$

for all t > 0.



Proof. Using Lemma 5.3.4, we construct an infinite sequence G_n = (V_n, E_n; 1), n = 1, 2, ..., of graphs where G_1 = G, the graph G_{n+1} is a 2-lift of G_n for all n, and gr G_n → +∞ as n → ∞. Since |V_{n+1}| = 2|V_n|, from Theorem 5.3.3 we conclude that
$$\frac{\ln P_{G_n}(t)}{|V_n|}, \quad n = 1, 2, \ldots,$$
is a non-increasing sequence. The proof now follows from Theorem 5.2.8. □

Taking the limit as t −→ +∞, we obtain a lower bound for the number of perfect
matchings in a k-regular bipartite graph.
5.3.6 Theorem. Let A be an n × n matrix with 0-1 entries such that every row and
every column of A contains exactly k 1s. Then

$$\frac{\ln \operatorname{per} A}{n} \ge (k-1)\ln(k-1) - (k-2)\ln k.$$
Moreover, there is a sequence {A_n} of n × n matrices with 0-1 entries, with exactly k 1s in every row and every column, such that
$$\lim_{n\to\infty} \frac{\ln \operatorname{per} A_n}{n} = (k-1)\ln(k-1) - (k-2)\ln k.$$

Proof. Let r A (t) be the matching polynomial of A, see Definition 5.3.1. Then r A (t)
is a polynomial of degree n with the coefficient of t n equal to per A > 0, see, for
example, Theorem 3.3.2.
Therefore,
$$\lim_{t\to+\infty}\left(\frac{\ln r_A(t)}{n} - \ln t\right) = \frac{\ln \operatorname{per} A}{n}.$$
On the other hand, by Theorem 5.3.5,
$$\frac{\ln r_A(t)}{n} \ge (k-1)\ln\left(\frac{1+\sqrt{1+4tk-4t}}{2}\right) - (k-2)\ln\left(\frac{k\sqrt{1+4tk-4t}+k-2}{2k-2}\right) + \ln\left(\frac{2kt-2t}{\sqrt{1+4tk-4t}-1}\right)$$

and hence
$$\frac{\ln r_A(t)}{n} - \ln t \ge (k-1)\ln\left(\frac{1+\sqrt{1+4tk-4t}}{2\sqrt{t}}\right) - (k-2)\ln\left(\frac{k\sqrt{1+4tk-4t}+k-2}{(2k-2)\sqrt{t}}\right) + \ln\left(\frac{2kt-2t}{\left(\sqrt{1+4tk-4t}-1\right)\sqrt{t}}\right).$$
Taking the limit as t → +∞, we obtain
$$\frac{\ln \operatorname{per} A}{n} \ge \frac{k-1}{2}\ln(k-1) - \frac{k-2}{2}\ln\frac{k^2}{k-1} + \frac{1}{2}\ln(k-1) = (k-1)\ln(k-1) - (k-2)\ln k,$$

as required.
As in the proof of Theorem 5.3.5, matrices {An } are obtained as adjacency matrices
of graphs G n that are repeated 2-lifts of a given bipartite k-regular graph and such
that gr G n −→ +∞ as n −→ ∞. 
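For small k-regular 0-1 matrices the bound of Theorem 5.3.6 can be checked directly. The following sketch of ours uses a circulant matrix as a convenient test case (brute-force permanent, illustration only):

```python
import math
from itertools import permutations

def per(M):
    # Permanent by direct enumeration over all permutations.
    n = len(M)
    return sum(
        math.prod(M[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

n, k = 6, 3
# Circulant 0-1 matrix with k consecutive 1s in each row:
# every row and every column contains exactly k ones.
A = [[1 if (j - i) % n < k else 0 for j in range(n)] for i in range(n)]

# Theorem 5.3.6: per A >= exp(n ((k-1) ln(k-1) - (k-2) ln k)).
bound = math.exp(n * ((k - 1) * math.log(k - 1) - (k - 2) * math.log(k)))
assert per(A) >= bound
```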

We can rewrite the bound of Theorem 5.3.6 as
$$\operatorname{per} A \ge \left(\frac{k-1}{k}\right)^{(k-1)n} k^n,$$
in which case it becomes the familiar bound (3.3.5.1).

5.3.7 Upper bounds for the matching polynomial of a k-regular graph. In [D+15], Davies, Jenssen, Perkins and Roberts prove that if G = (V, E) is a k-regular graph then, for any t > 0, the quantity
$$\frac{t}{|E|}\,\frac{d}{dt}\ln P_G(t) = \frac{t}{|E|}\,\frac{P_G'(t)}{P_G(t)}$$
attains its maximum when G is the k-regular complete bipartite graph, see Fig. 5.11. This quantity is naturally interpreted as the expected proportion of the edges of G covered by a random matching in G, where the probability that a random matching contains exactly s edges is proportional to t^s.

As is remarked in [D+15], this implies that for any t > 0, the maximum of
$$P_G^{1/|V|}(t)$$
over k-regular graphs G = (V, E) is attained when G is the complete bipartite graph.

Fig. 5.11 The complete bipartite 3-regular graph

5.4 The Bethe-Entropy Lower Bound

The goal of this section is to prove the following result due to Lelarge [Le15].
 
5.4.1 Theorem. For a positive n × n matrix A = (a_ij) and a real t > 0, let
$$r_A(t) = \sum_{k=0}^{n} \operatorname{per}_k(A)\, t^k$$
be the matching polynomial, where per_k(A) is the sum of the permanents of all k × k submatrices of A and per_0(A) = 1.
On the set M_n of n × n non-negative real matrices X = (x_ij) such that
$$\sum_{j=1}^{n} x_{ij} \le 1 \ \text{ for } i = 1, \ldots, n \quad\text{and}\quad \sum_{i=1}^{n} x_{ij} \le 1 \ \text{ for } j = 1, \ldots, n$$

let us define a function
$$\begin{aligned} f_{A,t}(X) = {}& \sum_{i,j=1}^{n} x_{ij}\ln\left(t a_{ij}\right) - \sum_{i,j=1}^{n} x_{ij}\ln x_{ij} + \sum_{i,j=1}^{n} \left(1 - x_{ij}\right)\ln\left(1 - x_{ij}\right) \\ & - \sum_{i=1}^{n}\left(1 - \sum_{j=1}^{n} x_{ij}\right)\ln\left(1 - \sum_{j=1}^{n} x_{ij}\right) \\ & - \sum_{j=1}^{n}\left(1 - \sum_{i=1}^{n} x_{ij}\right)\ln\left(1 - \sum_{i=1}^{n} x_{ij}\right). \end{aligned}$$

Then f_{A,t} is strictly concave on M_n, attains its maximum on M_n at a unique point, and
$$\ln r_A(t) \ge \max_{X \in \mathcal{M}_n} f_{A,t}(X).$$

Taking the limit as t −→ +∞, we obtain a lower bound for the permanent.
5.4.2 Theorem. Let B_n be the polytope of n × n doubly stochastic matrices, that is, non-negative matrices X = (x_ij) such that
$$\sum_{j=1}^{n} x_{ij} = 1 \ \text{ for } i = 1, \ldots, n \quad\text{and}\quad \sum_{i=1}^{n} x_{ij} = 1 \ \text{ for } j = 1, \ldots, n.$$

 
For a positive n × n matrix A = (a_ij) and X ∈ B_n, let
$$g_A(X) = \sum_{i,j=1}^{n} x_{ij}\ln\frac{a_{ij}}{x_{ij}} + \sum_{i,j=1}^{n} \left(1 - x_{ij}\right)\ln\left(1 - x_{ij}\right).$$
Then g_A is a concave function and
$$\ln \operatorname{per} A \ge \max_{X \in \mathcal{B}_n} g_A(X).$$

The inequality of Theorem 5.4.2 was conjectured by Vontobel [Vo13] and deduced by Gurvits [Gu11] from Schrijver's inequality [Sc98]. We take a different route here, due to Lelarge [Le15], first proving Theorem 5.4.1 and then obtaining Theorem 5.4.2 as a limit case. If A = (a_ij) is a doubly stochastic matrix, from Theorem 5.4.2 we get
$$\ln \operatorname{per} A \ge g_A(A) = \sum_{i,j=1}^{n} \left(1 - a_{ij}\right)\ln\left(1 - a_{ij}\right),$$
which is Schrijver's inequality.
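As a quick numerical illustration of Theorem 5.4.2 and Schrijver's inequality (a sketch of ours; the uniform doubly stochastic matrix is just a convenient test case):

```python
import math
from itertools import permutations

def per(M):
    # Permanent by direct enumeration over all permutations.
    n = len(M)
    return sum(
        math.prod(M[i][sigma[i]] for i in range(n))
        for sigma in permutations(range(n))
    )

n = 4
A = [[1.0 / n] * n for _ in range(n)]   # doubly stochastic

# g_A(A) = sum (1 - a_ij) ln(1 - a_ij), Schrijver's lower bound.
g = n * n * (1.0 - 1.0 / n) * math.log(1.0 - 1.0 / n)

assert math.log(per(A)) >= g                              # ln per A >= g_A(A)
assert abs(per(A) - math.factorial(n) / n**n) < 1e-12     # per of uniform matrix
```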


The lower bounds for ln r A (t) of Theorem 5.4.1 and for ln per A of Theorem 5.4.2
are known as the Bethe-entropy lower bounds. Their advantage is that they supply
an easily computable lower bound as a solution to a convex optimization problem.
We prove Theorem 5.4.1 by taking a closer look at the lift of an arbitrary positive
matrix. We follow  Lelarge [Le15] with some modifications.
Let A = (a_ij) be an n × n positive matrix, which we interpret as the matrix of weights on the complete bipartite graph K_{n,n} with vertices 1L, ..., nL and 1R, ..., nR, so that the weight on the edge joining iL and jR is a_ij, cf. Sect. 3.1.2. As we iterate 2-lifts m times, as described in Sect. 5.3.2, we obtain a graph G with N = 2^{m+1} n vertices, where each vertex has type 1L, ..., nL or 1R, ..., nR, depending on where it projects under the natural projection G → K_{n,n}. Each vertex of
type i L is connected by an edge to a vertex of type j R with weight ai j on the edge
for j = 1, . . . , n and each vertex of type j R is connected by an edge to a vertex of
type i L with weight ai j on the edge for i = 1, . . . , n, see Fig. 5.12.
In particular, G = (V, E) is an n-regular graph. Our goal is to compute the
asymptotic of ln P_G(t)/|V| as the girth of G grows. First, we prove a refinement of Lemma 5.2.5, for which we introduce trees L_m^{ij} and R_m^{ij} that are refinements of the trees T_k^m from Sect. 5.2.4.

5.4.3 Definition. The tree L_m^{ij} is a tree with m levels that has the root of type iL at the level 0, connected to n − 1 vertices at the level 1 of type kR for all k ≠ j. Every vertex at the level 1 is connected to n vertices of the type kL for k = 1, ..., n, one

Fig. 5.12 A part of the lift of a 3 × 3 matrix

Fig. 5.13 The tree L_3^{11} for n = 3

of which is the root while the other n − 1 are at the level 2. Every vertex at the level
2 is connected to n vertices of the type k R for k = 1, . . . , n, one of which is at the
level 1 and the other n − 1 are at the level 3, etc.
The tree R_m^{ij} is a tree with m levels that has the root of type jR at the level 0, connected to n − 1 vertices at the level 1 of type kL for all k ≠ i. Every vertex at the
level 1 is connected to n vertices of the type k R for k = 1, . . . , n, one of which is the
root while the other n − 1 are at the level 2. Every vertex at the level 2 is connected
to n vertices of the type k L for k = 1, . . . , n, one of which is at the level 1 and the
other n − 1 are at the level 3, etc.
If we remove an edge connecting vertices of the type i L and j R in a lift of a
matrix, in the neighborhood of the removed edge, the lift will look like the union of
two trees of the types Li j and Ri j , see Figs. 5.12 and 5.13.
The weights on the edges are replicated in the usual way: an edge connecting
vertices of types i L and j R has weight ai j , see Fig. 5.13.

 
5.4.4 Lemma. Let us fix a positive n × n matrix A = (a_ij). Let v_m denote the root of L_m^{ij}, respectively R_m^{ij}. For every t > 0 the limits
$$\lim_{m\to\infty} \frac{P_{L_m^{ij}-v_m}(t)}{P_{L_m^{ij}}(t)} = l_{ij} = l_{ij}(A,t) \quad\text{and}\quad \lim_{m\to\infty} \frac{P_{R_m^{ij}-v_m}(t)}{P_{R_m^{ij}}(t)} = r_{ij} = r_{ij}(A,t)$$
exist and satisfy the system of equations
$$l_{ij} = \left(1 + t\sum_{k:\,k\ne j} a_{ik} r_{ik}\right)^{-1} \quad\text{and}\quad r_{ij} = \left(1 + t\sum_{k:\,k\ne i} a_{kj} l_{kj}\right)^{-1}$$

for all 1 ≤ i, j ≤ n. Moreover, for any t0 > 0 the convergence is uniform over all
0 < t ≤ t0 .

Proof. The proof is a refinement of that of Lemma 5.2.5. As follows from Sect. 5.2.3, for any ε > 0 there is m_0 = m_0(ε, A, t_0) such that for t < t_0, the value of P_{L_m^{ij}−v_m}(t)/P_{L_m^{ij}}(t), up to an error ε, depends only on the m_0-neighborhood of v_m in L_m^{ij}. However, for m > m_0 the m_0-neighborhood of v_m in L_m^{ij} remains the same, from which it follows that the limit l_ij indeed exists. The existence of the limit r_ij is proved similarly.
If we remove the root v_m of L_m^{ij} with all incident edges, we get a vertex-disjoint union of n − 1 trees R_{m−1}^{ik} for k ≠ j, and if we remove the root v_m of R_m^{ij} with all incident edges, we get a vertex-disjoint union of n − 1 trees L_{m−1}^{kj} for k ≠ i. The equations for l_ij and r_ij then follow from (5.2.1.2). □
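The system of Lemma 5.4.4 is easy to solve numerically by straightforward fixed-point iteration. In the following sketch of ours we simply assume the iteration converges for the chosen small matrix, which the final check confirms a posteriori:

```python
def solve_lr(A, t, iters=2000):
    # Fixed-point iteration for the system of Lemma 5.4.4:
    #   l_ij = (1 + t * sum_{k != j} a_ik r_ik)^{-1}
    #   r_ij = (1 + t * sum_{k != i} a_kj l_kj)^{-1}
    n = len(A)
    l = [[1.0] * n for _ in range(n)]
    r = [[1.0] * n for _ in range(n)]
    for _ in range(iters):
        l = [[1.0 / (1.0 + t * sum(A[i][k] * r[i][k] for k in range(n) if k != j))
              for j in range(n)] for i in range(n)]
        r = [[1.0 / (1.0 + t * sum(A[k][j] * l[k][j] for k in range(n) if k != i))
              for j in range(n)] for i in range(n)]
    return l, r

A = [[1.0, 2.0], [3.0, 4.0]]
t = 0.5
l, r = solve_lr(A, t)
# Verify both fixed-point equations to high accuracy.
n = len(A)
for i in range(n):
    for j in range(n):
        assert abs(l[i][j] - 1.0 / (1.0 + t * sum(A[i][k] * r[i][k]
                                                  for k in range(n) if k != j))) < 1e-9
        assert abs(r[i][j] - 1.0 / (1.0 + t * sum(A[k][j] * l[k][j]
                                                  for k in range(n) if k != i))) < 1e-9
```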

A crucial observation of Lelarge [Le15] relates the probabilities li j and ri j to the


solution of a convex optimization problem.
 
5.4.5 Lemma. Let us fix an n × n positive matrix A = (a_ij) and t > 0. Let the set M_n of matrices X = (x_ij) and the function f = f_{A,t} be defined as in Theorem 5.4.1. Then f is a strictly concave function.
Let l_ij = l_ij(A, t) and r_ij = r_ij(A, t) be the probabilities from Lemma 5.4.4. Then the matrix X* = X*(A, t) = (x*_ij) defined by
$$x_{ij}^* = \frac{t a_{ij} l_{ij} r_{ij}}{1 + t a_{ij} l_{ij} r_{ij}}$$

is the maximum point of f on Mn .

Proof. For i = 1, . . . , n, let us define
$$g_i(X) = -\sum_{j=1}^{n} x_{ij}\ln x_{ij} + \sum_{j=1}^{n}\left(1 - x_{ij}\right)\ln\left(1 - x_{ij}\right) - \left(1 - \sum_{j=1}^{n} x_{ij}\right)\ln\left(1 - \sum_{j=1}^{n} x_{ij}\right) + \left(\sum_{j=1}^{n} x_{ij}\right)\ln\left(\sum_{j=1}^{n} x_{ij}\right)$$
and
$$u_i(X) = \left(\sum_{j=1}^{n} x_{ij}\right)\ln\left(\sum_{j=1}^{n} x_{ij}\right).$$

For j = 1, . . . , n, let us similarly define
$$h_j(X) = -\sum_{i=1}^{n} x_{ij}\ln x_{ij} + \sum_{i=1}^{n}\left(1 - x_{ij}\right)\ln\left(1 - x_{ij}\right) - \left(1 - \sum_{i=1}^{n} x_{ij}\right)\ln\left(1 - \sum_{i=1}^{n} x_{ij}\right) + \left(\sum_{i=1}^{n} x_{ij}\right)\ln\left(\sum_{i=1}^{n} x_{ij}\right)$$
and
$$v_j(X) = \left(\sum_{i=1}^{n} x_{ij}\right)\ln\left(\sum_{i=1}^{n} x_{ij}\right).$$

From Sect. 2.1.3, the functions g_i(X) and h_j(X) are concave, while from Sect. 2.1.1.2, the functions u_i and v_j are convex.
Since we have
$$f(X) = \sum_{i,j=1}^{n} x_{ij}\ln\left(t a_{ij}\right) + \sum_{i=1}^{n} g_i(X) + \sum_{j=1}^{n} h_j(X) - \sum_{i=1}^{n} u_i(X) - \sum_{j=1}^{n} v_j(X),$$
we conclude that f(X) is concave. Moreover, since $\sum_{i=1}^{n} u_i(X) + \sum_{j=1}^{n} v_j(X)$ is a strictly convex function, the function f(X) is strictly concave.
To check that X* is indeed the maximum point of f, we compute the gradient of f. We have
$$\frac{\partial}{\partial x_{ij}} f(X) = \ln\left(t a_{ij}\right) - \ln x_{ij} - \ln\left(1 - x_{ij}\right) + \ln\left(1 - \sum_{k=1}^{n} x_{ik}\right) + \ln\left(1 - \sum_{k=1}^{n} x_{kj}\right).$$
Using the equations of Lemma 5.4.4, we write
$$t a_{ij} l_{ij} r_{ij} = \frac{t a_{ij} l_{ij}}{1 + t\sum_{k:\,k\ne i} a_{kj} l_{kj}}$$

and
$$1 + t a_{ij} l_{ij} r_{ij} = 1 + \frac{t a_{ij} l_{ij}}{1 + t\sum_{k:\,k\ne i} a_{kj} l_{kj}} = \frac{1 + t\sum_{k=1}^{n} a_{kj} l_{kj}}{1 + t\sum_{k:\,k\ne i} a_{kj} l_{kj}},$$
from which
$$x_{ij}^* = \frac{t a_{ij} l_{ij}}{1 + t\sum_{k=1}^{n} a_{kj} l_{kj}}. \tag{5.4.5.1}$$
Similarly, we write
$$t a_{ij} l_{ij} r_{ij} = \frac{t a_{ij} r_{ij}}{1 + t\sum_{k:\,k\ne j} a_{ik} r_{ik}}$$
and
$$1 + t a_{ij} l_{ij} r_{ij} = \frac{1 + t\sum_{k=1}^{n} a_{ik} r_{ik}}{1 + t\sum_{k:\,k\ne j} a_{ik} r_{ik}},$$
from which
$$x_{ij}^* = \frac{t a_{ij} r_{ij}}{1 + t\sum_{k=1}^{n} a_{ik} r_{ik}}. \tag{5.4.5.2}$$

It follows from (5.4.5.1) that
$$1 - \sum_{k=1}^{n} x_{kj}^* = \left(1 + t\sum_{k=1}^{n} a_{kj} l_{kj}\right)^{-1} \tag{5.4.5.3}$$
and it follows from (5.4.5.2) that
$$1 - \sum_{k=1}^{n} x_{ik}^* = \left(1 + t\sum_{k=1}^{n} a_{ik} r_{ik}\right)^{-1}. \tag{5.4.5.4}$$

In particular, it follows from (5.4.5.3) and (5.4.5.4) that X* is a feasible point of f.
Using (5.4.5.1)–(5.4.5.4) we obtain that
$$\begin{aligned} \left.\frac{\partial}{\partial x_{ij}} f(X)\right|_{X = X^*} = {}& \ln\left(t a_{ij}\right) - \ln\frac{t a_{ij} l_{ij}}{1 + t\sum_{k=1}^{n} a_{kj} l_{kj}} - \ln\frac{1 + t\sum_{k:\,k\ne j} a_{ik} r_{ik}}{1 + t\sum_{k=1}^{n} a_{ik} r_{ik}} \\ & - \ln\left(1 + t\sum_{k=1}^{n} a_{ik} r_{ik}\right) - \ln\left(1 + t\sum_{k=1}^{n} a_{kj} l_{kj}\right) \\ = {}& -\ln l_{ij} - \ln\left(1 + t\sum_{k:\,k\ne j} a_{ik} r_{ik}\right) = 0. \end{aligned}$$

Hence the gradient of the strictly concave function f vanishes at the feasible point X*, and therefore X* is the maximum point of f on M_n. □

Next, we prove a refinement of Lemma 5.2.7.

5.4.6 Lemma. Let us fix a positive n × n matrix A = (a_ij), a real t > 0 and let X* = X*(A, t) = (x*_ij) be the maximum point of the function f_{A,t}(X) in Lemma 5.4.5.
Let G_m be a sequence of weighted graphs obtained from the complete bipartite graph K_{n,n} with weights A by a repeated application of 2-lifts and such that gr G_m → +∞ as m → ∞. Let v_m be a vertex of G_m. Then
$$\lim_{m\to\infty} \frac{P_{G_m - v_m}(t)}{P_{G_m}(t)} = 1 - \sum_{j=1}^{n} x_{ij}^* \quad \text{provided } v_m \text{ is of type } iL$$
and
$$\lim_{m\to\infty} \frac{P_{G_m - v_m}(t)}{P_{G_m}(t)} = 1 - \sum_{i=1}^{n} x_{ij}^* \quad \text{provided } v_m \text{ is of type } jR.$$
The convergence is uniform over v_m ∈ G_m. Moreover, for any fixed t_0 > 0, the convergence is also uniform over all 0 ≤ t ≤ t_0.

Proof. We begin as in the proof of Lemma 5.2.7. Let us fix t_0 > 0 and an ε > 0. As is discussed in Sect. 5.2.3, there exists m_0 = m_0(ε, A, t_0) such that up to an error ε, the ratio P_{G_m − v_m}(t)/P_{G_m}(t) depends only on the m_0-neighborhood of v_m. However, if gr G_m > m_0, the m_0-neighborhood depends only on the type of the vertex v_m, see Fig. 5.13, from which it follows that the limit indeed exists.
If v_m is of type iL and gr G_m > s then in the s-neighborhood of v_m, the graph G_m − v_m looks like a vertex-disjoint union of n trees R_{s−1}^{ij}, for j = 1, ..., n.
Therefore, by (5.2.1.2) and Lemma 5.4.4, the limit is equal to
$$\left(1 + t\sum_{j=1}^{n} a_{ij} r_{ij}\right)^{-1} = 1 - \sum_{j=1}^{n} x_{ij}^*,$$
where the last equation follows by (5.4.5.4).


If v_m is of type jR and gr G_m > s then in the s-neighborhood of v_m, the graph G_m − v_m looks like a vertex-disjoint union of n trees L_{s−1}^{ij}, for i = 1, ..., n. Therefore, by (5.2.1.2) and Lemma 5.4.4, the limit is equal to
$$\left(1 + t\sum_{i=1}^{n} a_{ij} l_{ij}\right)^{-1} = 1 - \sum_{i=1}^{n} x_{ij}^*,$$
where the last equation follows by (5.4.5.3). □



5.4.7 Proof of Theorem 5.4.1. Let us fix a t_0 > 0. For m = 1, 2, ..., let G_m be a weighted graph obtained from K_{n,n} with matrix A of weights by a sequence of m 2-lifts and such that gr G_m → +∞ as m → ∞, see Lemma 5.3.4. Then G_m has n·2^{m+1} vertices. For each i = 1, ..., n, exactly 2^m of the vertices have type iL and for each j = 1, ..., n, exactly 2^m of the vertices have type jR. Applying (5.2.1.3) and Lemma 5.4.6, we conclude that
$$\lim_{m\to\infty} \frac{d}{dt}\, \frac{\ln P_{G_m}(t)}{n 2^{m+1}} = \frac{1}{2nt} \sum_{i,j=1}^{n} x_{ij}^*(t), \tag{5.4.7.1}$$

and the convergence is uniform for all 0 < t ≤ t_0.
On the other hand, since the function f_{A,t}(X) is smooth in t and strictly concave in X, the maximum point X*(t) depends smoothly on t. Since X*(t) is the maximum point, we get
$$\left.\frac{\partial}{\partial x_{ij}} f_t(X)\right|_{X = X^*(t)} = 0 \quad \text{for all } i, j$$

and, therefore,
$$\frac{d}{dt} f_t\left(X^*(t)\right) = \sum_{i,j=1}^{n} \left.\frac{\partial}{\partial x_{ij}} f_t(X)\right|_{X = X^*(t)} \frac{d}{dt} x_{ij}^*(t) + \left.\frac{\partial}{\partial t} f_t(X)\right|_{X = X^*(t)} = \frac{1}{t} \sum_{i,j=1}^{n} x_{ij}^*(t).$$

Therefore, by (5.4.7.1),
$$\lim_{m\to\infty} \frac{d}{dt}\, \frac{\ln P_{G_m}(t)}{n 2^{m+1}} = \frac{1}{2n}\, \frac{d}{dt} f_t\left(X^*(t)\right) \tag{5.4.7.2}$$
and the convergence is uniform over all 0 < t ≤ t_0.
As is easy to see,
$$\lim_{t\to 0+} f_t\left(X^*(t)\right) = \lim_{t\to 0+} \max_{X \in \mathcal{M}_n} f_t(X) = 0$$
and, as in the proof of Theorem 5.2.8, we have
$$\lim_{t\to 0+} \frac{\ln P_{G_m}(t)}{n 2^{m+1}} = 0.$$

Then from (5.4.7.2), we obtain
$$\lim_{m\to\infty} \frac{\ln P_{G_m}(t)}{n 2^{m+1}} = \frac{1}{2n} f_t\left(X^*(t)\right) = \frac{1}{2n} \max_{X \in \mathcal{M}_n} f_t(X) \tag{5.4.7.3}$$

for all 0 < t ≤ t_0 and, since t_0 was chosen arbitrarily, (5.4.7.3) holds for all t > 0.
Since by Theorem 5.3.3, we have
$$\frac{\ln r_A(t)}{2n} \le \frac{\ln P_{G_m}(t)}{n 2^{m+1}},$$
the proof follows. □
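Theorem 5.4.1 gives ln r_A(t) ≥ f_{A,t}(X) for every feasible X, not only for the maximizer, which is easy to test numerically on a small matrix. A brute-force sketch of ours (the feasible point X below is chosen for convenience, not optimality):

```python
import math
from itertools import combinations, permutations

def r_poly(A, t):
    # r_A(t) = sum_k per_k(A) t^k by brute-force enumeration.
    n = len(A)
    total = 0.0
    for k in range(n + 1):
        for rows in combinations(range(n), k):
            for cols in permutations(range(n), k):
                total += t**k * math.prod(A[i][j] for i, j in zip(rows, cols))
    return total

def f_bethe(A, t, X):
    # The function f_{A,t}(X) of Theorem 5.4.1 for X in M_n.
    n = len(A)
    val = 0.0
    for i in range(n):
        for j in range(n):
            x = X[i][j]
            val += x * math.log(t * A[i][j]) - x * math.log(x)
            val += (1.0 - x) * math.log(1.0 - x)
    for i in range(n):
        s = 1.0 - sum(X[i])
        val -= s * math.log(s)
    for j in range(n):
        s = 1.0 - sum(X[i][j] for i in range(n))
        val -= s * math.log(s)
    return val

A = [[1.0, 2.0], [3.0, 4.0]]
t, n = 1.0, 2
X = [[1.0 / (2 * n)] * n for _ in range(n)]   # a feasible point of M_n
assert math.log(r_poly(A, t)) >= f_bethe(A, t, X)
```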

5.4.8 Proof of Theorem 5.4.2. Since r_A(t) is a polynomial with the highest term (per A) t^n, where per A > 0, we have
$$\lim_{t\to+\infty} \left(\ln r_A(t) - n \ln t\right) = \ln \operatorname{per} A. \tag{5.4.8.1}$$

By Sect. 2.1.3, g_A(X) is a continuous concave function and hence the maximum of g_A on B_n is attained, say, at a doubly stochastic matrix X*. Then X* ∈ M_n, where M_n is the set of matrices defined in Theorem 5.4.1, and hence by Theorem 5.4.1, for every t > 0 we have
$$\ln r_A(t) - n \ln t \ge f_{A,t}\left(X^*\right) - n \ln t = g_A\left(X^*\right) - n \ln t + (\ln t) \sum_{i,j=1}^{n} x_{ij}^* = g_A\left(X^*\right),$$
and the proof follows by (5.4.8.1).

5.5 Hypergraph Matching Polynomial

5.5.1 The matching polynomial of a hypergraph. Let H = (V, E) be a d-uniform


hypergraph with set V of vertices and set E of edges. Hence each edge e ∈ E is a
set of d vertices from V . A matching in H is a set of pairwise vertex-disjoint edges.
Given complex weights w : E −→ C on the edges of H , we define the weight of a
matching {e1 , . . . , ek } by w(e1 ) · · · w(ek ). We consider the matching with no edges
as having weight 1. We define the matching polynomial of H by
$$P_H(w) = \sum_{\substack{e_1,\ldots,e_k \in E:\\ e_1,\ldots,e_k \text{ is a matching}}} w(e_1)\cdots w(e_k),$$

where the sum includes all matchings in H , including the empty matching with the
corresponding product equal to 1.
The following result bounds from below the distance from complex zeros of
PH (w) to the origin.

5.5.2 Theorem. Let H = (V, E) be a d-uniform hypergraph for d > 1 and let w : E → ℂ be complex weights such that
$$\sum_{\substack{e \in E:\\ v \in e}} |w(e)| \le \frac{(d-1)^{d-1}}{d^d} \quad \text{for all } v \in V.$$
Then
$$P_H(w) \ne 0.$$

Proof. Given a set S ⊂ V of vertices, let H − S be the hypergraph with set V \ S


of vertices and consisting of the edges of H that do not intersect S. We denote the
restriction of w : E → ℂ onto H − S also by w. Then for any vertex v ∈ V, we have
$$P_H(w) = P_{H-v}(w) + \sum_{\substack{e \in E:\\ v \in e}} w(e) P_{H-e}(w), \tag{5.5.2.1}$$

where PH −v (w) accounts for all matchings not containing v, whereas the sum
accounts for all matchings containing v (we use H − v as a shorthand for H − {v}).
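The recurrence (5.5.2.1) can be checked by brute force on a tiny hypergraph. A sketch of ours (edges are represented as frozensets of vertices):

```python
from itertools import combinations

def matching_poly(edges, w):
    # P_H(w): sum over all matchings (including the empty one) of the
    # products of edge weights.
    total = 0.0
    E = list(edges)
    for k in range(len(E) + 1):
        for sub in combinations(E, k):
            # Keep only pairwise vertex-disjoint collections of edges.
            seen = set()
            ok = True
            for e in sub:
                if seen & e:
                    ok = False
                    break
                seen |= e
            if ok:
                p = 1.0
                for e in sub:
                    p *= w[e]
                total += p
    return total

# A small 3-uniform hypergraph on vertices 1..6.
E = [frozenset({1, 2, 3}), frozenset({3, 4, 5}), frozenset({4, 5, 6})]
w = {e: 0.1 for e in E}
v = 3
# Right-hand side of (5.5.2.1): H - v drops edges through v; H - e keeps
# only edges disjoint from e.
rhs = matching_poly([e for e in E if v not in e], w) + sum(
    w[e] * matching_poly([f for f in E if not (f & e)], w)
    for e in E if v in e
)
assert abs(matching_poly(E, w) - rhs) < 1e-12
```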
We prove by induction on |V| that P_H(w) ≠ 0 and, moreover, for any vertex v of V, we have
$$\left|1 - \frac{P_{H-v}(w)}{P_H(w)}\right| \le \frac{1}{d-1}.$$

If |V| < d, we have P_H(w) = P_{H-v}(w) = 1 and the inequality holds. If |V| = d, the hypergraph may contain either one edge or no edges. In the former case, we have P_H(w) = 1 + w(e) while P_{H-v}(w) = 1 and the inequality reduces to
$$\left|1 - \frac{1}{1 + w(e)}\right| = \left|\frac{w(e)}{1 + w(e)}\right| \le \frac{1}{d-1},$$

which obviously holds when w(e) = 0. If w(e) ≠ 0, we can further write
$$\left|\frac{w(e)}{1+w(e)}\right| = \frac{1}{\left|1+w(e)^{-1}\right|} \le \left(\frac{d^d}{(d-1)^{d-1}} - 1\right)^{-1} = \frac{(d-1)^{d-1}}{d^d - (d-1)^{d-1}} < \frac{(d-1)^{d-1}}{(d-1)^d} = \frac{1}{d-1}.$$

If H contains no edges then P_{H-v}(w) = P_H(w) = 1 and the inequality holds as well.
Suppose now |V| > d. By the induction hypothesis, for every vertex v_1 ∈ V, we have P_{H-v_1}(w) ≠ 0. We rewrite (5.5.2.1) as

$$\frac{P_H(w)}{P_{H-v_1}(w)} = 1 + \sum_{\substack{e \in E:\\ v_1 \in e}} w(e)\, \frac{P_{H-e}(w)}{P_{H-v_1}(w)}. \tag{5.5.2.2}$$

Telescoping, for every edge e = {v_1, v_2, ..., v_d} containing v_1, we can write
$$\frac{P_{H-e}(w)}{P_{H-v_1}(w)} = \frac{P_{H-e}(w)}{P_{H-\{v_1,\ldots,v_{d-1}\}}(w)} \cdots \frac{P_{H-\{v_1,v_2\}}(w)}{P_{H-\{v_1\}}(w)}, \tag{5.5.2.3}$$

where by the induction hypothesis, we have
$$P_{H-\{v_1,\ldots,v_{k+1}\}}(w) \ne 0 \quad\text{and}\quad \left|1 - \frac{P_{H-\{v_1,\ldots,v_{k+1}\}}(w)}{P_{H-\{v_1,\ldots,v_k\}}(w)}\right| \le \frac{1}{d-1} \quad \text{for all } k = 1, \ldots, d-1,$$
from which
$$\left|\frac{P_{H-\{v_1,\ldots,v_{k+1}\}}(w)}{P_{H-\{v_1,\ldots,v_k\}}(w)}\right| \le \frac{d}{d-1} \quad \text{for } k = 1, \ldots, d-1. \tag{5.5.2.4}$$

Combining (5.5.2.2)–(5.5.2.4), we conclude that
$$\left|1 - \frac{P_H(w)}{P_{H-v_1}(w)}\right| \le \frac{(d-1)^{d-1}}{d^d}\left(\frac{d}{d-1}\right)^{d-1} = \frac{1}{d}, \tag{5.5.2.5}$$
from which it follows that
$$P_H(w) \ne 0.$$

The transformation z → 1/z maps the disc
$$D = \left\{z : |1 - z| \le \frac{1}{d}\right\}$$
onto the disc with center on the real axis whose boundary intersects the real axis in the points d/(d+1) and d/(d−1). Therefore, from (5.5.2.5), we have
$$\left|1 - \frac{P_{H-v_1}(w)}{P_H(w)}\right| \le \left|1 - \frac{d}{d-1}\right| = \frac{1}{d-1},$$
which completes the induction. □

The bound of Theorem 5.5.2 decreases as 1/(ed) as d grows.


For a weight w : E → ℂ and a parameter z ∈ ℂ, let zw denote the weights on the edges of H scaled by z. Then
$$\left.\frac{d^k}{dz^k} P_H(zw)\right|_{z=0} = k! \sum_{\substack{e_1,\ldots,e_k\\ \text{is a matching}}} w(e_1)\cdots w(e_k).$$

In particular, the derivative can be computed in |E|^{O(k)} time by the direct enumeration of all matchings of k edges in H. As follows from Lemma 2.2.1 and Sect. 2.2.2, for any 0 ≤ δ < 1, fixed in advance, for any complex weights w : E → ℂ satisfying
$$\sum_{\substack{e \in E:\\ v \in e}} |w(e)| \le \delta\, \frac{(d-1)^{d-1}}{d^d}$$
and any 0 < ε < 1, the value of P_H(w) can be approximated within relative error ε in |E|^{O(ln |E| − ln ε)} time. If the largest degree of a vertex is bounded above in advance, the computation can be done in genuine polynomial time via the approach of [PR16], see also Sect. 6.6.
The correlation decay approach to computing PH (w) was tried in [D+14], [S+16].
In particular, a polynomial time approximation algorithm was constructed in [D+14]
that counts the number of matchings in a 3-uniform hypergraph such that the degree
of every vertex does not exceed 3.
We apply Theorem 5.5.2 to multidimensional permanents, see Sect. 4.4. We show that if each slice of a d-dimensional tensor A = (a_{i_1...i_d}) is close in the ℓ^1-metric to the tensor of all 1s, then PER A ≠ 0 and, consequently, ln PER A can be approximated in quasi-polynomial time. This contrasts with Theorem 4.4.2, where we require the deviation to be small in the ℓ^∞-metric.
 
5.5.3 Theorem. Let A = (a_{i_1...i_d}) be an n × ... × n array of n^d complex numbers such that
$$\sum_{1 \le i_1, \ldots, i_{j-1}, i_{j+1}, \ldots, i_d \le n} \left|1 - a_{i_1 \ldots i_d}\right| \le \alpha^{d-1}\, \frac{(d-1)^{d-1}}{d^d}\, n^{d-1}$$
for all 1 ≤ i_j ≤ n and all j = 1, ..., d, where
$$\alpha \approx 0.2784645428$$
is the positive solution of the equation
$$x e^{1+x} = 1.$$
Then
$$\operatorname{PER} A \ne 0.$$

The following lemma is a weaker version of a bound from [Wa03].


5.5.4 Lemma. For a positive integer $n$, let us define a polynomial

$$p_n(z) = \sum_{k=0}^{n} \frac{z^k}{k!}.$$

Then
$$p_n(z) \ne 0 \quad \text{provided} \quad |z| \le \alpha n,$$
where $\alpha \approx 0.2784645428$ is the constant of Theorem 5.5.3.

Proof. First, we observe that

$$\left|z e^{1-z}\right| \le |z|\, e^{1+|z|} \le 1 \quad \text{provided} \quad |z| \le \alpha.$$

Then for $|z| \le \alpha$, we have

$$\left|1 - e^{-nz} p_n(nz)\right| = \left|e^{-nz} \sum_{k=n+1}^{\infty} \frac{(nz)^k}{k!}\right| = \left|\sum_{k=n+1}^{\infty} \frac{n^k}{k!}\, z^{k-n} \left(z e^{1-z}\right)^n e^{-n}\right| \le e^{-n} \sum_{k=n+1}^{\infty} \frac{n^k}{k!} < 1$$

and hence $p_n(nz) \ne 0$. □

Szegő proved that as $n$ grows, the zeros of $p_n(nz)$ converge to the curve
$$\left\{ z :\ \left|z e^{1-z}\right| = 1,\ |z| \le 1 \right\},$$
see [PV97].
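The chain of estimates in the proof can be checked numerically. The sketch below (illustrative code, not from the text) verifies the key inequality $|1 - e^{-nz} p_n(nz)| < 1$ on a sample of points of the circle $|z| = \alpha$, which is what forces $p_n(nz) \ne 0$ on that disk:

```python
import cmath
import math

ALPHA = 0.2784645428  # positive root of x * e**(1 + x) = 1

def partial_sum_exp(n, z):
    """p_n(z) = sum_{k=0}^{n} z^k / k!, computed term by term."""
    term, total = 1.0 + 0j, 1.0 + 0j
    for k in range(1, n + 1):
        term *= z / k
        total += term
    return total

# The proof's estimate: |1 - e^{-nz} p_n(nz)| < 1 whenever |z| <= ALPHA.
n = 25
for j in range(60):
    theta = 2.0 * math.pi * j / 60
    z = ALPHA * complex(math.cos(theta), math.sin(theta))
    assert abs(1 - cmath.exp(-n * z) * partial_sum_exp(n, n * z)) < 1.0
print("estimate verified on the circle |z| = alpha for n =", n)
```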

5.5.5 Proof of Theorem 5.5.3. We have

$$\operatorname{PER} A = \sum_{\sigma_2, \ldots, \sigma_d \in S_n} \prod_{i=1}^{n} a_{i \sigma_2(i) \ldots \sigma_d(i)} = \sum_{\sigma_2, \ldots, \sigma_d \in S_n} \prod_{i=1}^{n} \left(1 + \left(a_{i \sigma_2(i) \ldots \sigma_d(i)} - 1\right)\right).$$

We consider the complete $d$-partite hypergraph $H = (V, E)$ with $n + \ldots + n = nd$ vertices and the weight of the edge $(i_1, \ldots, i_d)$ equal to $a_{i_1 \ldots i_d} - 1$. Let

$$W_k = \sum_{\substack{e_1, \ldots, e_k \in E:\\ \{e_1, \ldots, e_k\}\ \text{is a matching}}} w(e_1) \cdots w(e_k)$$

be the total weight of $k$-matchings in $H$, where we agree that $W_0 = 1$. Then

$$\operatorname{PER} A = \sum_{k=0}^{n} \left((n-k)!\right)^{d-1} W_k = (n!)^{d-1} \sum_{k=0}^{n} \left(\frac{1}{k!}\right)^{d-1} \binom{n}{k}^{-(d-1)} W_k.$$

Let us define the univariate polynomial
$$q(z) = \sum_{k=0}^{n} W_k z^k.$$

Interpreting the value of $q(z)$ as the value of the hypergraph matching polynomial $P_H$ on the scaled weights $z\left(a_{i_1 \ldots i_d} - 1\right)$, from Theorem 5.5.2 we deduce that

$$q(z) \ne 0 \quad \text{provided} \quad |z| \le \frac{1}{(\alpha n)^{d-1}}.$$

Let
$$p(z) = \sum_{k=0}^{n} \frac{z^k}{k!}.$$
By Lemma 5.5.4,
$$p(z) \ne 0 \quad \text{provided} \quad |z| \le \alpha n.$$

Applying Corollary 2.5.10 successively to the pairs $\{p, q\}$, $\{p,\, p * q\}$, $\ldots$, $\{p,\, p * \cdots * p * q\}$, we conclude that the polynomial

$$r(z) = \sum_{k=0}^{n} \left(\frac{1}{k!}\right)^{d-1} \binom{n}{k}^{-(d-1)} W_k\, z^k$$

satisfies
$$r(z) \ne 0 \quad \text{provided} \quad |z| \le 1.$$

In particular,

$$r(1) = \sum_{k=0}^{n} \left(\frac{1}{k!}\right)^{d-1} \binom{n}{k}^{-(d-1)} W_k = (n!)^{-(d-1)} \operatorname{PER} A \ne 0,$$

as claimed. □
Chapter 6
The Independence Polynomial

Known in statistical physics as the partition function of a hard core model, the
independence polynomial of a graph is a far-reaching extension of the matching
polynomial, demonstrating a much more complicated behavior. The roots of the
independence polynomial do not have to be real, but the Dobrushin–Scott–Sokal
bound for its complex roots is similar to the bound for the roots of the matching polynomial. The correlation decay is observed for sufficiently small activities but disappears for large activities, so there is a phase transition. The highlight of the
chapter is in establishing the exact point of that phase transition, first for regular
trees, and then, following Weitz, for arbitrary graphs. It also provides us with an
instance where the correlation decay approach outperforms the Taylor polynomial
interpolation method (so far). The two methods would achieve the same degree of
approximation if there are no roots of the independence polynomial near the positive
real axis up to the Weitz bound, as was conjectured by Sokal. We prove a result
of Regts stating that there are indeed no roots near the positive real axis halfway
between the Dobrushin–Scott–Sokal and Weitz bounds.

6.1 The Independence Polynomial of a Graph

6.1.1 Definition. Let G = (V, E) be an undirected graph with set V of vertices, set
E of edges, without loops or multiple edges. A set U ⊂ V of vertices is independent
if no two vertices of U span an edge of G. We consider the empty set ∅ independent.
Let CV be the complex vector space with coordinates indexed by the vertices of G,
hence we write z = (z v : v ∈ V ) for a typical z ∈ CV . For a subset U ⊂ V we
consider the monomial
$$z^U = \prod_{v \in U} z_v,$$

© Springer International Publishing AG 2016 181


A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9_6

where we agree as usual that $z^{\emptyset} = 1$. We define the independence polynomial of $G$, $\operatorname{ind}_G : \mathbb{C}^V \longrightarrow \mathbb{C}$, by

$$\operatorname{ind}_G(z) = \sum_{\substack{U \subset V:\\ U\ \text{is independent}}} z^U.$$

In particular, ind G (0) = 1. We call z v the activity of v. In statistical physics, ind G (z)
is known as the partition function of the “hard core model”. It describes mutually
repelling particles that can occupy positions at the vertices of a graph and avoid
coming too close to each other, that is, avoid occupying adjacent vertices.
Let $v \in V$ be a vertex and let
$$N_v = \{u \in V :\ \{u, v\} \in E\}$$
be the neighborhood of $v$ in $G$ (note that we do not include $v$ in its own neighborhood).


For sets A, B ⊂ V of vertices, by G(A) we denote the subgraph induced by the subset
A of vertices (hence two vertices from A span an edge of G(A) if and only if they
span an edge of G) and by G(A) − B we denote the graph obtained from G(A) by
deleting all vertices from $B$ together with incident edges. If an independent set $U$ contains $v$ then it cannot contain any of the vertices adjacent to $v$, and we arrive at the identity
$$\operatorname{ind}_G(z) = \operatorname{ind}_{G-v}(z) + z_v \operatorname{ind}_{G-v-N_v}(z), \qquad (6.1.1.1)$$
see Fig. 6.1 (we use $G - v - N_v$ as a shorthand for $G - (\{v\} \cup N_v)$).
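The recursion (6.1.1.1) translates directly into an (exponential-time) evaluation procedure. A minimal Python sketch, with an illustrative adjacency-dictionary representation (the code and names are not from the text):

```python
def ind_poly(adj, z):
    """Evaluate ind_G(z) via the recursion (6.1.1.1):
    ind_G = ind_{G-v} + z_v * ind_{G-v-N_v}.
    adj maps each vertex to the set of its neighbors;
    z maps each vertex to its activity."""
    def rec(vertices):
        if not vertices:
            return 1  # only the empty independent set remains
        v = next(iter(vertices))
        rest = vertices - {v}
        return rec(rest) + z[v] * rec(rest - adj[v])
    return rec(frozenset(adj))

# A triangle: the independent sets are the empty set and the three
# single vertices, so ind_G(z) = 1 + z_0 + z_1 + z_2.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(ind_poly(triangle, {0: 2, 1: 3, 2: 5}))  # 11
```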

Fig. 6.1 (a) A graph $G$ with a vertex $v$, (b) the graph $G - v$ and (c) the graph $G - v - N_v$

The following result on the location of zeros of ind G was obtained by Dobrushin,
see [Do96] and [SS05]. We follow [CF16], which, in turn, contains a modification
of an argument from [Bo06].

6.1.2 Theorem. Let $G = (V, E)$ be a graph and let $0 < r_v < 1$, $v \in V$, be reals. Suppose that
$$|z_v| \le (1 - r_v) \prod_{\substack{u \in V:\\ \{u, v\} \in E}} r_u \quad \text{for all } v \in V.$$
Then
$$\operatorname{ind}_G(z) \ne 0.$$

Proof. Recall that for a set $A \subset V$ of vertices, we denote by $G(A)$ the induced subgraph on the set $A$. We formally consider the polynomial $\operatorname{ind}_{G(A)}$ as a function on $\mathbb{C}^V$, although the variables $z_v$ with $v \notin A$ do not enter into it. We prove by induction on $|A|$ the following two inequalities:

$$\operatorname{ind}_{G(A)}(z) \ne 0 \qquad (6.1.2.1)$$

and

$$\left|\frac{\operatorname{ind}_{G(B)}(z)}{\operatorname{ind}_{G(A)}(z)}\right| \le \left(\prod_{u \in A \setminus B} r_u\right)^{-1} \quad \text{for any } B \subset A \qquad (6.1.2.2)$$

and $z \in \mathbb{C}^V$ satisfying the conditions of the theorem. We agree that the right hand side of (6.1.2.2) is 1 if $B = A$. If $A = V$ then (6.1.2.1) is what we need.
If $A = \emptyset$ then (6.1.2.1) and (6.1.2.2) hold trivially. Suppose that $A \ne \emptyset$ and that (6.1.2.1) and (6.1.2.2) hold for all proper subsets of $A$. Let us choose $v \in A$. By the induction hypothesis, $\operatorname{ind}_{G(A)-v}(z) \ne 0$ and using (6.1.1.1) we can write

$$\operatorname{ind}_{G(A)}(z) = \operatorname{ind}_{G(A)-v}(z) + z_v \operatorname{ind}_{G(A)-v-N_v}(z) = \operatorname{ind}_{G(A)-v}(z) \left(1 + z_v\, \frac{\operatorname{ind}_{G(A)-v-N_v}(z)}{\operatorname{ind}_{G(A)-v}(z)}\right). \qquad (6.1.2.3)$$

By the induction hypothesis, from (6.1.2.2) it follows that

$$\left|\frac{\operatorname{ind}_{G(A)-v-N_v}(z)}{\operatorname{ind}_{G(A)-v}(z)}\right| \le \left(\prod_{u \in (A \setminus \{v\}) \setminus (A \setminus (\{v\} \cup N_v))} r_u\right)^{-1} = \left(\prod_{\substack{u \in A:\\ \{u, v\} \in E}} r_u\right)^{-1},$$

where the last equality follows since every vertex $u \in (A \setminus \{v\}) \setminus (A \setminus (\{v\} \cup N_v))$ is necessarily connected to $v$ by an edge, cf. Fig. 6.1. Therefore,
$$\left|z_v\, \frac{\operatorname{ind}_{G(A)-v-N_v}(z)}{\operatorname{ind}_{G(A)-v}(z)}\right| \le (1 - r_v) < 1 \qquad (6.1.2.4)$$

and (6.1.2.1) follows by (6.1.2.3). We also note that from (6.1.2.3) and the first inequality in (6.1.2.4) it follows that

$$\left|\frac{\operatorname{ind}_{G(A)-v}(z)}{\operatorname{ind}_{G(A)}(z)}\right| \le r_v^{-1}. \qquad (6.1.2.5)$$

Let $B \subset A$ be a subset. If $B = A$ then (6.1.2.2) holds trivially, so we assume that $B$ is a proper subset of $A$. Then for some $v \in A$ we have $B \subset A \setminus \{v\}$ and applying the induction hypothesis to the pair $B \subset A \setminus \{v\}$ and using (6.1.2.5) we obtain

$$\left|\frac{\operatorname{ind}_{G(B)}(z)}{\operatorname{ind}_{G(A)}(z)}\right| = \left|\frac{\operatorname{ind}_{G(B)}(z)}{\operatorname{ind}_{G(A)-v}(z)}\right| \cdot \left|\frac{\operatorname{ind}_{G(A)-v}(z)}{\operatorname{ind}_{G(A)}(z)}\right| \le \left(\prod_{u \in (A \setminus \{v\}) \setminus B} r_u\right)^{-1} r_v^{-1} = \prod_{u \in A \setminus B} r_u^{-1},$$

which completes the proof. □

Suppose that the degree of every vertex of $G$ does not exceed some $\Delta \ge 1$. Choosing
$$r_v = \frac{\Delta}{\Delta + 1} \quad \text{for all } v \in V$$
we obtain from Theorem 6.1.2 that
$$\operatorname{ind}_G(z) \ne 0 \quad \text{provided} \quad |z_v| \le \frac{\Delta^{\Delta}}{(\Delta+1)^{\Delta+1}} \quad \text{for all } v \in V.$$

Scott and Sokal [SS05] showed that the bound can be improved somewhat.

6.1.3 Theorem. Suppose that the degree of every vertex of $G$ does not exceed some $\Delta \ge 2$. Then
$$\operatorname{ind}_G(z) \ne 0 \quad \text{provided} \quad |z_v| \le \frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}} \quad \text{for all } v \in V.$$
Proof. The proof is very similar to that of Theorem 5.5.2. We proceed by induction on the number $|V|$ of vertices. If $|V| = 1$, the result holds, so we assume that $|V| > 1$.
We embed into our inductive proof yet another inductive argument (the inner induction as opposed to the outer induction). Namely, we prove by induction on $|V|$ that if $G = (V, E)$ is a graph with the largest degree $\Delta(G) \le \Delta$ of a vertex and if $v \in V$ is a vertex of degree at most $\Delta - 1$ then

$$\operatorname{ind}_G(z) \ne 0 \quad \text{and} \quad \left|1 - \frac{\operatorname{ind}_{G-v}(z)}{\operatorname{ind}_G(z)}\right| < \frac{1}{\Delta - 1}$$
$$\text{provided} \quad |z_u| \le \frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}} \quad \text{for all } u \in V.$$
The case of $|V| = 1$ is easily verified, so we assume that $|V| \ge 2$. By the outer induction hypothesis, $\operatorname{ind}_{G-v}(z) \ne 0$, so we can rewrite (6.1.1.1) as

$$\frac{\operatorname{ind}_G(z)}{\operatorname{ind}_{G-v}(z)} = 1 + z_v\, \frac{\operatorname{ind}_{G-v-N_v}(z)}{\operatorname{ind}_{G-v}(z)}. \qquad (6.1.3.1)$$

Let $N_v = \{v_1, \ldots, v_k\}$ for some $0 \le k \le \Delta - 1$. If $k = 0$, that is, if $v$ is an isolated vertex, then

$$\left|1 - \frac{\operatorname{ind}_{G-v}(z)}{\operatorname{ind}_G(z)}\right| = \left|1 - \frac{1}{1 + z_v}\right| < \frac{1}{\Delta - 1}$$

and the step of the inner induction is completed.


Suppose that k > 0 so that v has neighbors in G. Then

ind G−v−Nv (z) ind G−v−v1 (z) ind G−v−v1 −...−vk (z)
= ··· . (6.1.3.2)
ind G−v (z) ind G−v (z) ind G−v−v1 −...−vk−1 (z)

By the inner induction hypothesis
$$\operatorname{ind}_{G-v-v_1-\ldots-v_i}(z) \ne 0 \quad \text{for } i = 1, \ldots, k.$$
Moreover, since the degree of $v_i$ in the graph $G - v - v_1 - \ldots - v_{i-1}$ does not exceed $\Delta - 1$, by the inner induction hypothesis

$$\left|1 - \frac{\operatorname{ind}_{G-v-v_1-\ldots-v_i}(z)}{\operatorname{ind}_{G-v-v_1-\ldots-v_{i-1}}(z)}\right| < \frac{1}{\Delta - 1} \quad \text{for } i = 1, \ldots, k. \qquad (6.1.3.3)$$

Hence from (6.1.3.2) we conclude that

$$\left|\frac{\operatorname{ind}_{G-v-N_v}(z)}{\operatorname{ind}_{G-v}(z)}\right| < \left(\frac{\Delta}{\Delta-1}\right)^{\Delta-1}$$

and from (6.1.3.1) we conclude that

$$\left|1 - \frac{\operatorname{ind}_G(z)}{\operatorname{ind}_{G-v}(z)}\right| < \frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}} \cdot \left(\frac{\Delta}{\Delta-1}\right)^{\Delta-1} = \frac{1}{\Delta}.$$

Therefore, $\operatorname{ind}_G(z) \ne 0$ and, as in the proof of Theorem 5.5.2, we conclude that

$$\left|1 - \frac{\operatorname{ind}_{G-v}(z)}{\operatorname{ind}_G(z)}\right| < \frac{1}{\Delta - 1},$$

which concludes the inner induction.


To conclude the outer induction, it remains to prove that $\operatorname{ind}_G(z) \ne 0$ if the degree of every vertex $v$ of $G$ is $\Delta$. We choose an arbitrary vertex $v$ and use (6.1.3.1) and (6.1.3.2) as above, only that the right hand side of (6.1.3.2) is a product of $\Delta$ (as opposed to $\Delta - 1$) factors. Still, the degree of $v_i$ in $G - v - v_1 - \ldots - v_{i-1}$ does not exceed $\Delta - 1$ and therefore (6.1.3.3) still holds. Hence from (6.1.3.2) we conclude that

$$\left|\frac{\operatorname{ind}_{G-v-N_v}(z)}{\operatorname{ind}_{G-v}(z)}\right| < \left(\frac{\Delta}{\Delta-1}\right)^{\Delta}$$

and from (6.1.3.1) we have

$$\left|1 - \frac{\operatorname{ind}_G(z)}{\operatorname{ind}_{G-v}(z)}\right| < \frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}} \cdot \left(\frac{\Delta}{\Delta-1}\right)^{\Delta} = \frac{1}{\Delta - 1}$$

and $\operatorname{ind}_G(z) \ne 0$. □

As is discussed in [SS05], the bound of Theorem 6.1.3 is optimal, as it is asymptotically achieved on regular trees. Also, see [SS05] for extensions, generalizations and connections to Lovász's Local Lemma.

6.1.4 Example: the Tutte polynomial of a graph. In [CF16], Csikvári and Frenkel deduced from Theorem 6.1.2 bounds on the zeros of a wide class of graph polynomials, which they call polynomials of exponential type. We consider one example from [CF16], the Tutte polynomial of a graph.

Let $G = (V, E)$ be a graph. Let $w = (w_e : e \in E)$ be a vector of complex variables indexed by the edges $e$ of $G$ and let $\zeta$ be yet another complex variable. We define the Tutte polynomial of $G$ by

$$T_G(\zeta, w) = \sum_{A \subset E} \zeta^{\kappa(A)} \prod_{e \in A} w_e,$$

where the sum is taken over all sets $A$ of edges of $G$ and $\kappa(A)$ is the number of connected components in the graph with set $V$ of vertices and set $A$ of edges. In particular, $T_G$ is a monic polynomial in $\zeta$ of degree $|V|$ since for $A = \emptyset$ the corresponding monomial is just $\zeta^{|V|}$.
We express $T_G(\zeta, w)$ in terms of the independence polynomial of some other graph $\widehat{G} = (\widehat{V}, \widehat{E})$ as follows. The vertex set $\widehat{V}$ consists of all subsets $U \subset V$ such that $|U| \ge 2$. Two subsets $U_1$ and $U_2$ span an edge in $\widehat{G}$ if and only if $U_1 \cap U_2 \ne \emptyset$. Hence the independent sets in $\widehat{G}$ are the collections of pairwise disjoint subsets $U_1, \ldots, U_k \subset V$, each of cardinality at least 2. Let $G(U)$ be the subgraph of $G$ induced on $U$. We define the activity $z_U$ of a vertex $U$ of $\widehat{G}$ by

$$z_U = 0 \quad \text{if } G(U) \text{ is not connected}$$

and by

$$z_U = \zeta^{1 - |U|} \prod_{\substack{e \in E:\\ \text{both endpoints of } e \text{ lie in } U}} w_e \quad \text{if } G(U) \text{ is connected}.$$

If $U_1, \ldots, U_k \subset V$ are pairwise disjoint subsets of cardinality at least 2, such that all induced subgraphs $G(U_1), \ldots, G(U_k)$ are connected then for the set $A \subset E$ of edges that is the union of the sets of edges in $G(U_1), \ldots, G(U_k)$, we have

$$\kappa(A) = k + |V| - \sum_{i=1}^{k} |U_i|,$$

since the connected components of the graph with set $V$ of vertices and set $A$ of edges are the induced subgraphs $G(U_1), \ldots, G(U_k)$ and the remaining isolated vertices. On the other hand,

$$\sum_{i=1}^{k} \left(1 - |U_i|\right) = k - \sum_{i=1}^{k} |U_i|,$$

from which we deduce that

$$T_G(\zeta, w) = \zeta^{|V|} \operatorname{ind}_{\widehat{G}}(z). \qquad (6.1.4.1)$$

Let us consider constant weights $w_e = w_0$ for some $w_0 \in \mathbb{C}$. Using (6.1.4.1) and Theorem 6.1.2, Csikvári and Frenkel [CF16] prove that $T_G(\zeta, w) \ne 0$ if
$$|\zeta| > \gamma\, \Delta(G) \left(1 + |w_0|\right)^{\Delta(G)}$$
for some absolute constant $\gamma > 0$ (one can choose $\gamma = 21$).


For some specializations of the Tutte polynomial better bounds are known. For
example, if we = −1 for all e ∈ E then chr G (ζ) = TG (ζ, w) is the chromatic
polynomial of G, which, for positive integer ζ counts the number of ways to color
the vertices of G into at most ζ colors so that no two vertices spanning an edge of
G are colored with the same color, see also Lemma 6.5.5. Sokal [S01a] proved that
chr G (ζ) = 0 if |ζ| > 8(G).
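The subset expansion of $T_G$ and its chromatic specialization $w_e = -1$ are easy to check by brute force on a tiny graph. The sketch below (illustrative code, not from the text) computes $\kappa(A)$ with a union-find and verifies that the triangle's chromatic polynomial is $\zeta(\zeta-1)(\zeta-2)$:

```python
from itertools import combinations

def tutte(n, edges, zeta, w):
    """T_G(zeta, w) = sum over edge subsets A of
    zeta^{kappa(A)} * prod_{e in A} w_e, where kappa(A) is the number
    of connected components of (V, A).  Brute force over all 2^{|E|}
    subsets; fine only for tiny graphs."""
    total = 0
    for k in range(len(edges) + 1):
        for A in combinations(range(len(edges)), k):
            parent = list(range(n))          # union-find for kappa(A)
            def find(x):
                while parent[x] != x:
                    parent[x] = parent[parent[x]]
                    x = parent[x]
                return x
            comps, prod = n, 1
            for i in A:
                u, v = edges[i]
                prod *= w[i]
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    comps -= 1
            total += zeta ** comps * prod
    return total

# w_e = -1 specializes T_G to the chromatic polynomial.
edges = [(0, 1), (1, 2), (0, 2)]
for zeta in range(1, 6):
    assert tutte(3, edges, zeta, [-1, -1, -1]) == zeta * (zeta - 1) * (zeta - 2)
print("triangle chromatic polynomial verified")
```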

6.1.5 Computing the independence polynomial. As was noticed in [Re15], Theorem 6.1.3 allows one to approximate $\operatorname{ind}_G(z)$ within relative error $\epsilon$ in $|V|^{O(\ln |V| - \ln \epsilon)}$ time provided
$$|z_v| \le \delta\, \frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}}$$
for any $0 < \delta < 1$, fixed in advance, where $\Delta = \Delta(G) \ge 2$ is the largest degree of a vertex of $G$. To see that, let us consider the univariate function $f(\zeta) = \ln \operatorname{ind}_G(\zeta z)$. Let $p_m(\zeta)$ be the Taylor polynomial of degree $m$ of $f(\zeta)$ at $\zeta = 0$. It follows from Lemma 2.2.1 that $p_m(1)$ approximates $f(1)$ within an additive error $\epsilon$ provided $m = O(\ln |V| - \ln \epsilon)$. Moreover, by Sect. 2.2.2, to compute $p_m(\zeta)$ it suffices to compute the derivatives
$$\left.\frac{d^k}{d\zeta^k}\, \operatorname{ind}_G(\zeta z)\right|_{\zeta=0} \quad \text{for } k = 1, \ldots, m,$$
which in turn reduces to the enumeration of all independent sets of $G$ of size at most $m$, which can be accomplished in $|V|^{O(m)}$ time. Patel and Regts show [PR16] that if the largest degree $\Delta(G)$ of a vertex of $G$ is fixed in advance then $\operatorname{ind}_G(z)$ can be approximated in polynomial time $(|V|/\epsilon)^{O(1)}$, see Sect. 6.6.
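The interpolation algorithm just described is short to implement for small graphs: enumerate the numbers $a_k$ of independent $k$-sets for $k \le m$, form the degree-$m$ Taylor polynomial of $\ln \operatorname{ind}_G(\zeta t)$ via the standard power-series logarithm recurrence, and evaluate at $\zeta = 1$. A hedged Python sketch (names and representation are illustrative assumptions):

```python
import math
from itertools import combinations

def taylor_log_ind(adj, t, m):
    """Degree-m Taylor polynomial of f(zeta) = ln ind_G(zeta * t) at
    zeta = 0, evaluated at zeta = 1.  Only the numbers a_k of
    independent k-sets with k <= m are needed (an |V|^{O(m)}-time
    enumeration), mirroring the algorithm described above."""
    V = list(adj)
    a = [1] + [0] * m                            # a_0 = 1 (empty set)
    for k in range(1, m + 1):
        for S in combinations(V, k):
            Sset = set(S)
            if all(not (adj[v] & Sset) for v in S):
                a[k] += 1
    c = [a[k] * t ** k for k in range(m + 1)]    # coefficients of ind_G(zeta*t)
    L = [0.0] * (m + 1)                          # log-series coefficients
    for k in range(1, m + 1):
        L[k] = c[k] - sum(j * L[j] * c[k - j] for j in range(1, k)) / k
    return sum(L[1:])

# 5-cycle at t = 0.2: ind(t) = 1 + 5t + 5t^2 exactly, so we can compare.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
exact = math.log(1 + 5 * 0.2 + 5 * 0.2 ** 2)
print(abs(taylor_log_ind(c5, 0.2, 6) - exact) < 0.02)  # True
```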
Similar algorithms are described in [PR16] and [Re15] for other combinatorial polynomials. As Regts notes [Re15], for some polynomials $p$ computing the values $p(z)$ with $|z|$ large is feasible, for which one should apply Lemma 2.2.1 to the polynomial
$$\tilde{p}(z) = z^{\deg p}\, p(1/z).$$
A natural example is provided by the chromatic polynomial, see Sect. 6.1.4 and Lemma 6.5.5, where Lemma 2.2.1 produces a quasi-polynomial approximation algorithm to compute $\operatorname{chr}_G(\zeta)$ provided $|\zeta| > \gamma \Delta(G)$ for any $\gamma > 8$, fixed in advance. Sokal conjectured, see [Ja03], that $\operatorname{chr}_G(\zeta) \ne 0$ provided $\zeta > \Delta(G)$. Should this conjecture be true, $\operatorname{chr}_G(\zeta)$ can be efficiently approximated provided $\zeta > (1 + \delta)\Delta(G)$ for any fixed $\delta > 0$, see [PR16].
We note that
$$\frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}} = \frac{1}{e \Delta}\left(1 + O\!\left(\frac{1}{\Delta}\right)\right) \quad \text{as } \Delta \longrightarrow +\infty. \qquad (6.1.5.1)$$
It is shown in [LV99] that the problem of approximating $\operatorname{ind}_G(z)$ is NP-hard provided $z = (\lambda, \ldots, \lambda)$ for $\lambda > c/\Delta(G)$, where $c > 0$ is an absolute constant.
There are certain parallels between the matching polynomial considered in
Chap. 5 and the independence polynomial. Given a graph $G = (V, E)$, one can consider its line graph $L(G)$. The vertices of $L(G)$ are the edges of $G$ and two vertices of $L(G)$ span an edge if the corresponding edges in $G$ share a vertex. Then a matching in $G$ corresponds to an independent set in $L(G)$ and vice versa. Line graphs form a rather restricted class of graphs; for example, they are always claw-free, that is, they do not contain an induced subgraph pictured in Fig. 6.2.
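The correspondence between matchings of $G$ and independent sets of $L(G)$ can be checked mechanically. A small illustrative sketch (the helper names are assumptions, not from the text):

```python
from itertools import combinations

def line_graph(edges):
    """Line graph L(G): one vertex per edge of G; two vertices are
    adjacent iff the corresponding edges of G share an endpoint."""
    n = len(edges)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if set(edges[i]) & set(edges[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj

def count_independent_sets(adj):
    """Number of independent sets (including the empty one), brute force."""
    V = list(adj)
    count = 0
    for k in range(len(V) + 1):
        for S in combinations(V, k):
            Sset = set(S)
            if all(not (adj[v] & Sset) for v in S):
                count += 1
    return count

# Matchings of the 4-cycle <-> independent sets of its line graph.
# C_4 has 7 matchings: the empty one, 4 single edges, 2 perfect matchings.
c4_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_independent_sets(line_graph(c4_edges)))  # 7
```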
Fig. 6.2 A claw

Extending Theorem 5.1.2, Chudnovsky and Seymour [CS07] proved that the roots of the univariate independence polynomial (when all activities $z_v$ are equal) of a claw-free graph are real. In that case, using Lemma 2.2.3 and arguing as in Sect. 5.1.7, for any $\delta \ge 1$, fixed in advance, and any complex $z$ such that
$$|z| \le \delta\, \frac{(\Delta-1)^{\Delta-1}}{\Delta^{\Delta}} \quad \text{and} \quad |\pi - \arg z| \ge \frac{1}{\delta},$$
we can approximate $\operatorname{ind}_G$ at $z_v = z$ for all $v \in V$ within a relative error $\epsilon > 0$ in $n^{O(\ln n - \ln \epsilon)}$ time. Furthermore, Patel and Regts show [PR16] that if $\Delta(G)$ is fixed in advance, the algorithm can be made genuinely polynomial, see also Sect. 6.6.
We also note that the nearest to the origin complex root of the univariate independence polynomial of any graph is necessarily negative real [SS05]. More precisely, let us fix any vector of non-negative real activities $x = (x_v \ge 0 : v \in V)$ at the vertices $V$ of a graph $G$ and for $\zeta \in \mathbb{C}$ let us consider its scaling
$$\zeta x = (\zeta x_v : v \in V).$$
Then among the roots of the univariate polynomial
$$g(\zeta) = \operatorname{ind}_G(\zeta x)$$
nearest to the origin, there is necessarily a negative real root. We prove this later in Theorem 6.5.4.

6.2 The Independence Polynomial of Regular Graphs

6.2.1 The probability space of independent sets. Let G = (V, E) be a graph. For
a real t > 0 we consider the value of the independence polynomial ind G (z) where
z v = t for all v ∈ V , which we denote just by ind G (t). We consider the set of all
independent sets S ⊂ V , including the empty set, as a finite probability space with

$$\Pr(S) = \frac{t^{|S|}}{\operatorname{ind}_G(t)}.$$

Then

$$t\, \frac{d}{dt} \ln \operatorname{ind}_G(t) = \frac{t \operatorname{ind}_G'(t)}{\operatorname{ind}_G(t)} = \frac{1}{\operatorname{ind}_G(t)} \sum_{\substack{S \subset V:\\ S\ \text{is independent}}} |S|\, t^{|S|} = \sum_{\substack{S \subset V:\\ S\ \text{is independent}}} |S| \Pr(S) = \mathbf{E}\,|S| \qquad (6.2.1.1)$$

is naturally interpreted as the expected size of a random independent set $S$. Consequently,

$$\frac{t}{|V|}\, \frac{d}{dt} \ln \operatorname{ind}_G(t)$$

is naturally interpreted as the expected fraction of vertices contained in a random independent set $S$.
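Identity (6.2.1.1) is easy to verify on a small graph by computing both sides from the coefficients of the independence polynomial. A brief sketch (illustrative code, not from the text):

```python
from itertools import combinations

def independence_counts(adj):
    """a_k = number of independent k-sets of G (brute force)."""
    V = list(adj)
    a = [0] * (len(V) + 1)
    for k in range(len(V) + 1):
        for S in combinations(V, k):
            Sset = set(S)
            if all(not (adj[v] & Sset) for v in S):
                a[k] += 1
    return a

def expected_size(adj, t):
    """E|S| computed as t * ind_G'(t) / ind_G(t), per (6.2.1.1)."""
    a = independence_counts(adj)
    ind = sum(a[k] * t ** k for k in range(len(a)))
    t_ind_prime = sum(k * a[k] * t ** k for k in range(len(a)))
    return t_ind_prime / ind

# 4-cycle at t = 1: ind(t) = 1 + 4t + 2t^2, so
# E|S| = (4t + 4t^2)/(1 + 4t + 2t^2) = 8/7 at t = 1.
c4 = {i: {(i - 1) % 4, (i + 1) % 4} for i in range(4)}
print(expected_size(c4, 1.0))  # 8/7 ≈ 1.142857...
```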
Assume now that G is k-regular, that is, every vertex of G is incident to exactly k
edges of G. Davies, Jenssen, Perkins and Roberts proved [D+15] that the expected
fraction of vertices contained in a random independent set of a k-regular graph
is maximized when G is the vertex-disjoint union of k-regular complete bipartite
graphs, cf. Fig. 5.11. We follow their proof below, see also [Zh16] for a survey.
6.2.2 Theorem. Let $G = (V, E)$ be a $k$-regular graph. Then for any $t > 0$ we have

$$\frac{t}{|V|}\, \frac{d}{dt} \ln \operatorname{ind}_G(t) \le \frac{t(1+t)^{k-1}}{2(1+t)^k - 1},$$

where equality is attained if and only if $G$ is the vertex-disjoint union of $k$-regular complete bipartite graphs. Consequently,

$$\operatorname{ind}_G(t) \le \left(2(1+t)^k - 1\right)^{\frac{|V|}{2k}},$$

where equality is attained if and only if $G$ is the vertex-disjoint union of $k$-regular complete bipartite graphs.
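For the extremal graphs the bound is indeed attained: every independent set of $K_{k,k}$ lies entirely in one side, so $\operatorname{ind}(t) = 2(1+t)^k - 1$ and the exponent $|V|/(2k)$ equals 1. A brute-force check (illustrative code, not from the text):

```python
from itertools import combinations

def ind_value(adj, t):
    """ind_G(t): brute-force sum of t^{|S|} over independent sets S."""
    V = list(adj)
    total = 0.0
    for k in range(len(V) + 1):
        for S in combinations(V, k):
            Sset = set(S)
            if all(not (adj[v] & Sset) for v in S):
                total += t ** k
    return total

# K_{k,k}: independent sets are exactly the subsets of one side, hence
# ind(t) = 2(1+t)^k - 1, matching the extremal bound with |V| = 2k.
k, t = 3, 0.7
kkk = {v: set(range(k, 2 * k)) if v < k else set(range(k)) for v in range(2 * k)}
print(abs(ind_value(kkk, t) - (2 * (1 + t) ** k - 1)) < 1e-9)  # True
```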
Following [D+15], we start with a lemma.
6.2.3 Lemma. Let $G = (V, E)$ be a graph where $0 < |V| \le n$. Then for $t > 0$ we have

$$\frac{\operatorname{ind}_G'(t)}{\operatorname{ind}_G(t) - 1} \le \frac{n(1+t)^{n-1}}{(1+t)^n - 1},$$

with equality attained if and only if $E = \emptyset$, so that $G$ consists of $n$ isolated vertices.


Proof. First, let us assume that $|V| = n$. Let $G^{\circ} = (V, \emptyset)$ be the graph with set $V$ of vertices and no edges. Then every set $S \subset V$ of vertices is independent and

$$\operatorname{ind}_{G^{\circ}}(t) = \sum_{m=0}^{n} \binom{n}{m} t^m = (1+t)^n.$$

Let
$$\operatorname{ind}_G(t) = \sum_{m=0}^{n} a_m(G)\, t^m,$$
where $a_m$ is the number of independent $m$-sets in $G$. Since each independent $(m+1)$-set in $G$ contains exactly $m+1$ independent $m$-sets and any independent $m$-set in $G$ is contained in at most $|V| - m$ independent sets of size $m+1$, we obtain

$$(m+1)\, a_{m+1} \le (n-m)\, a_m \quad \text{for } m = 0, 1, \ldots, n-1 \qquad (6.2.3.1)$$

and, consequently,
$$\frac{a_{m+1}}{\binom{n}{m+1}} \le \frac{a_m}{\binom{n}{m}} \quad \text{for } m = 0, \ldots, n-1.$$
Iterating, we obtain $a_j \big/ \binom{n}{j} \le a_i \big/ \binom{n}{i}$ provided $j \ge i$, which we write as

$$\binom{n}{i} a_j \le \binom{n}{j} a_i \quad \text{provided} \quad n \ge j \ge i \ge 0. \qquad (6.2.3.2)$$

Let
$$t \operatorname{ind}_{G^{\circ}}'(t) \left(\operatorname{ind}_G(t) - 1\right) = \sum_{m=2}^{2n} b_m(G)\, t^m \quad \text{and} \quad t \operatorname{ind}_G'(t) \left(\operatorname{ind}_{G^{\circ}}(t) - 1\right) = \sum_{m=2}^{2n} c_m(G)\, t^m,$$
where
$$b_m = \sum_{\substack{i+j=m\\ i,j>0}} j \binom{n}{j} a_i \quad \text{and} \quad c_m = \sum_{\substack{i+j=m\\ i,j>0}} i \binom{n}{j} a_i.$$

Hence
$$b_m - c_m = \sum_{\substack{i+j=m\\ i,j>0}} \binom{n}{j} a_i (j - i) = \sum_{\substack{i+j=m\\ j>i>0}} (j-i) \left(\binom{n}{j} a_i - \binom{n}{i} a_j\right) \ge 0$$
by (6.2.3.2). In addition, $b_m = c_m$ for all $m$ if and only if (6.2.3.1) holds with equality for all $m$. Hence
$$t \operatorname{ind}_{G^{\circ}}'(t) \left(\operatorname{ind}_G(t) - 1\right) \ge t \operatorname{ind}_G'(t) \left(\operatorname{ind}_{G^{\circ}}(t) - 1\right) \quad \text{for all } t > 0$$
with equality if and only if $G = G^{\circ}$, and the proof follows assuming that $|V| = n$. Since for $n \ge 2$ we have

$$\frac{(1+t)^n - 1}{n(1+t)^{n-1}} - \frac{(1+t)^{n-1} - 1}{(n-1)(1+t)^{n-2}} = \frac{1 + nt - (1+t)^n}{n(n-1)(1+t)^{n-1}} < 0,$$

we conclude that
$$\frac{n(1+t)^{n-1}}{(1+t)^n - 1} > \frac{(n-1)(1+t)^{n-2}}{(1+t)^{n-1} - 1}$$
and the proof follows for any $|V| \le n$. □

Given an independent set S ⊂ V in G, we call a vertex v ∈ V occupied by S


if v ∈ S and unoccupied otherwise. A vertex v ∈ V is called uncovered by S if
it is not adjacent to any occupied vertex and covered otherwise. In particular, an
occupied vertex is necessarily uncovered but an uncovered vertex may or may not
be occupied. The set of neighbors of v that are not adjacent to any vertex u ∈ S
that is not a neighbor of v is called the free neighborhood of v (the vertex v is not
a neighbor of itself). Vertices in the free neighborhood may or may not be covered,
see Fig. 6.3.
As in Sect. 6.2.1, we consider the set of independent sets in G as a probability
space.
6.2.4 Lemma. Let $v \in V$ be a vertex, let $p_v$ be the probability that $v$ is occupied and let $q_v$ be the probability that $v$ is uncovered. Then
1. We have
$$p_v = \frac{t}{1+t}\, q_v.$$
2. Let us fix a set $U$ of neighbors of $v$ such that the probability that $U$ is the free neighborhood of $v$ with respect to an independent set is positive and let $H$ be the subgraph induced by $U$. Then the conditional probability that $v$ is uncovered given that $U$ is the free neighborhood of $v$ is $1/\operatorname{ind}_H(t)$, where we agree that the ratio is 1 if $U$ is empty.

Fig. 6.3 An independent set (black dots), covered vertices (grey dots), uncovered vertices (white dots) and the free neighborhood of the central vertex (dots inside the shaded region)
3. Let $U$ and $H$ be as in Part (2). Then the conditional expectation of $|U \cap S|$ given that $U$ is the free neighborhood of $v$ is $t \operatorname{ind}_H'(t)/\operatorname{ind}_H(t)$, where we agree that the ratio is 0 if $U$ is empty.

Proof. In Part (1), if $v$ is unoccupied and uncovered by an independent set $S$ then $S' = S \cup \{v\}$ is an independent set, $\Pr(S') = t \Pr(S)$ and $v$ is occupied by $S'$. Similarly, if $v$ is occupied by $S$ then $S' = S \setminus \{v\}$ is an independent set, $\Pr(S') = t^{-1} \Pr(S)$ and $v$ is uncovered by $S'$. Consequently,

$$q_v = \sum_{S:\, v\ \text{is uncovered}} \Pr(S) = \sum_{S:\, v\ \text{is occupied}} \Pr(S) + \sum_{\substack{S:\, v\ \text{is uncovered}\\ \text{and unoccupied}}} \Pr(S) = \sum_{S:\, v\ \text{is occupied}} \Pr(S) + \sum_{S:\, v\ \text{is occupied}} t^{-1} \Pr(S) = \frac{1+t}{t}\, p_v$$

and the proof of Part (1) follows.


In Part (2), if $U = \emptyset$ then every neighbor $u$ of $v$ is covered by a vertex that is not a neighbor of $v$, and hence $u \notin S$ and $v$ is necessarily uncovered. Suppose now that $U \ne \emptyset$ and let $\Omega$ be the set of independent sets $S$ for which $U$ is the free neighborhood of $v$. Then, for $S \in \Omega$, the vertex $v$ is uncovered if and only if $S \cap U = \emptyset$. If $S \in \Omega$ is an independent set then $S_1 = S \cap U$ is an independent set in $H$, $S_2 = S \setminus U$ is an independent set in $G$ such that $S_2 \in \Omega$ and $\Pr(S) = t^{|S_1|} \Pr(S_2)$. Vice versa, if $S_1 \subset U$ is an independent set in $H$ and $S_2 \in \Omega$ is an independent set such that $S_2 \cap U = \emptyset$ then $S = S_1 \cup S_2$ is an independent set such that $S \in \Omega$ and $\Pr(S) = t^{|S_1|} \Pr(S_2)$. Hence

$$\sum_{S \in \Omega} \Pr(S) = \sum_{S_1:\, S_1\ \text{is independent in } H} t^{|S_1|} \sum_{\substack{S_2 \in \Omega:\\ S_2 \cap U = \emptyset}} \Pr(S_2) = \operatorname{ind}_H(t) \sum_{\substack{S_2 \in \Omega:\\ S_2 \cap U = \emptyset}} \Pr(S_2) \qquad (6.2.4.1)$$

and

$$\Pr\left(S \cap U = \emptyset \mid S \in \Omega\right) = \left(\sum_{\substack{S \in \Omega:\\ S \cap U = \emptyset}} \Pr(S)\right) \Bigg/ \left(\sum_{S \in \Omega} \Pr(S)\right) = \frac{1}{\operatorname{ind}_H(t)}$$

and the proof of Part (2) follows.


To prove Part (3), we define $\Omega$ as above. Clearly, if $U = \emptyset$ then the conditional expectation of $|U \cap S|$ is 0. We assume therefore that $U \ne \emptyset$. Arguing as in the proof of Part (2), we obtain

$$\sum_{S \in \Omega} |U \cap S| \cdot \Pr(S) = \sum_{S_1:\, S_1\ \text{is independent in } H} |S_1|\, t^{|S_1|} \sum_{\substack{S_2 \in \Omega:\\ S_2 \cap U = \emptyset}} \Pr(S_2) = t \operatorname{ind}_H'(t) \sum_{\substack{S_2 \in \Omega:\\ S_2 \cap U = \emptyset}} \Pr(S_2)$$

and by (6.2.4.1)

$$\mathbf{E}\left(|S \cap U| \mid S \in \Omega\right) = \left(\sum_{S \in \Omega} |S \cap U| \cdot \Pr(S)\right) \Bigg/ \left(\sum_{S \in \Omega} \Pr(S)\right) = \frac{t \operatorname{ind}_H'(t)}{\operatorname{ind}_H(t)},$$

which concludes the proof of Part (3). □

Now we are ready to prove Theorem 6.2.2.

6.2.5 Proof of Theorem 6.2.2. As before, we consider the set of all independent
sets in G as a probability space. For a vertex v ∈ V , let pv be the probability that
the vertex is occupied and let qv be the probability that the vertex is uncovered. Let
Nv be the neighborhood of v in G and let Uv,S be the free neighborhood of v with
respect to an independent set S.

Let $\mathcal{U}_v$ be the set of all subsets $U \subset V$ that appear as the free neighborhood of $v$ with positive probability $y_{v,U}$, so that
$$\sum_{U \in \mathcal{U}_v} y_{v,U} = 1 \quad \text{for all } v \in V.$$
Let
$$\mathcal{U} = \bigcup_{v \in V} \mathcal{U}_v$$
and for $U \in \mathcal{U}$ let
$$x_U = \sum_{v:\, U \in \mathcal{U}_v} y_{v,U}.$$
Hence
$$\sum_{U \in \mathcal{U}} x_U = \sum_{v \in V} \sum_{U \in \mathcal{U}_v} y_{v,U} = |V|. \qquad (6.2.5.1)$$
Let $G(U)$ denote the subgraph induced by $U$.
Using Part (1) of Lemma 6.2.4, we express the average size of a random independent set $S$ as follows:
$$\mathbf{E}|S| = \sum_{v \in V} p_v = \frac{t}{1+t} \sum_{v \in V} q_v.$$
From Part (2) of Lemma 6.2.4 we further write
$$\sum_{v \in V} q_v = \sum_{v \in V} \sum_{U \in \mathcal{U}_v} \frac{y_{v,U}}{\operatorname{ind}_{G(U)}(t)} = \sum_{U \in \mathcal{U}} \frac{x_U}{\operatorname{ind}_{G(U)}(t)}.$$
Hence
$$\mathbf{E}|S| = \frac{t}{1+t} \sum_{U \in \mathcal{U}} \frac{x_U}{\operatorname{ind}_{G(U)}(t)}. \qquad (6.2.5.2)$$

On the other hand, since every vertex $u \in S$ has $k$ neighbors, for any independent set $S$ we can write
$$|S| = \frac{1}{k} \sum_{v \in V} |N_v \cap S| = \frac{1}{k} \sum_{v \in V} |U_{v,S} \cap S|.$$
Using Part (3) of Lemma 6.2.4, we write

$$\mathbf{E}|S| = \frac{1}{k} \sum_{v \in V} \sum_{U \in \mathcal{U}_v} y_{v,U}\, \frac{t \operatorname{ind}_{G(U)}'(t)}{\operatorname{ind}_{G(U)}(t)} = \frac{1}{k} \sum_{U \in \mathcal{U}} x_U\, \frac{t \operatorname{ind}_{G(U)}'(t)}{\operatorname{ind}_{G(U)}(t)}. \qquad (6.2.5.3)$$

Since $|U| \le k$, from Lemma 6.2.3 we have
$$\frac{t}{k}\, \operatorname{ind}_{G(U)}'(t) \le \frac{t(1+t)^{k-1}}{(1+t)^k - 1} \left(\operatorname{ind}_{G(U)}(t) - 1\right)$$
and hence, using that $x_U > 0$, we obtain from (6.2.5.3)

$$\mathbf{E}|S| \le \frac{t(1+t)^{k-1}}{(1+t)^k - 1} \sum_{U \in \mathcal{U}} x_U - \frac{t(1+t)^{k-1}}{(1+t)^k - 1} \sum_{U \in \mathcal{U}} \frac{x_U}{\operatorname{ind}_{G(U)}(t)}.$$

Using (6.2.5.2), we conclude that

$$\mathbf{E}|S| \le \frac{t(1+t)^{k-1}}{(1+t)^k - 1} \sum_{U \in \mathcal{U}} x_U - \frac{(1+t)^k}{(1+t)^k - 1}\, \mathbf{E}|S|,$$

so that
$$\mathbf{E}|S| \le \frac{t(1+t)^{k-1}}{2(1+t)^k - 1} \sum_{U \in \mathcal{U}} x_U.$$
Applying (6.2.5.1), we obtain

$$\frac{1}{|V|}\, \mathbf{E}|S| \le \frac{t(1+t)^{k-1}}{2(1+t)^k - 1}. \qquad (6.2.5.4)$$

The desired inequality follows by (6.2.1.1). We get equality in (6.2.5.4) if every free neighborhood that appears with positive probability either consists of exactly $k$ disconnected points or is empty. □

6.3 Correlation Decay for Regular Trees

6.3.1 Occupancy probabilities on graphs and trees. Let $G = (V, E)$ be a graph and let $z = (z_v : v \in V)$ be a vector of non-negative activities. We consider the set of independent sets $S$ in a given graph $G$ as a finite probability space where
$$\Pr(S) = \left(\operatorname{ind}_G(z)\right)^{-1} \prod_{v \in S} z_v$$
(if $S = \emptyset$ then the corresponding product is 1). Let $p(v)$ be the probability that a vertex $v$ is occupied, that is, belongs to a random independent set $S$. We rewrite (6.1.1.1) as

$$\frac{\operatorname{ind}_{G-v}(z)}{\operatorname{ind}_G(z)} = \frac{1}{1 + z_v\, \dfrac{\operatorname{ind}_{G-v-N_v}(z)}{\operatorname{ind}_{G-v}(z)}}. \qquad (6.3.1.1)$$

Then
$$1 - p(v) = \frac{\operatorname{ind}_{G-v}(z)}{\operatorname{ind}_G(z)}$$
is the probability that a random independent set $S$ in $G$ does not contain $v$.

If $G$ is a tree then $G - v$ is a vertex-disjoint union of trees and hence the ratio
$$\frac{\operatorname{ind}_{G-v-N_v}(z)}{\operatorname{ind}_{G-v}(z)}$$
is naturally interpreted as the probability that none of the neighbors of $v$ is occupied in each of the trees obtained from $G$ by deleting $v$.
First, we consider the case of an (almost) regular tree $T_k^n$, see Sect. 5.2.4, in detail.

6.3.2 Trees $T_k^n$ and the phase transition. Let us consider a tree $T_k^n$, with vertices
at the levels 0, 1, . . . , n, with one vertex, called the root, at the 0th level connected
to (k − 1) vertices at the level 1 and with every vertex at the i-th level connected
to one vertex at the (i − 1)-st level and k − 1 vertices at the (i + 1)-st level, for
i = 1, . . . , n − 1, see Sect. 5.2.4 (we assume that k ≥ 3). If a vertex v at the i-th
level is connected to a vertex u at the (i + 1)-st level, we call u a descendant of v.

We fix a $t > 0$ and, as in Sect. 6.3.1, consider the set of all independent sets in $T_k^n$ as a probability space, with the probability of an independent set $S$ proportional to $t^{|S|}$. In other words, we set all activities $z_v = t$. Let $p_n = p_{k,n}(t)$ be the probability that the root is occupied, that is, lies in a random independent set of $T_k^n$. We are interested in the asymptotic behavior of $p_n$ when $k$ and $t$ are fixed and $n$ grows.
The equation (6.3.1.1) implies that

$$1 - p_n = \frac{1}{1 + t(1 - p_{n-1})^{k-1}}, \quad \text{where} \quad p_0 = \frac{t}{1+t}. \qquad (6.3.2.1)$$
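The recursion (6.3.2.1) can be iterated numerically. The sketch below (illustrative code, not from the text) exhibits the two regimes for $k = 3$, where the critical activity described next is $t_c = 4$: for subcritical $t$ the sequence $p_n$ settles to a single limit, while for supercritical $t$ the even- and odd-indexed terms split apart.

```python
def iterate_occupancy(t, k, n):
    """Iterate the recursion (6.3.2.1):
    1 - p_j = 1 / (1 + t * (1 - p_{j-1})**(k-1)),  p_0 = t / (1 + t)."""
    p = t / (1 + t)
    history = [p]
    for _ in range(n):
        p = 1 - 1 / (1 + t * (1 - p) ** (k - 1))
        history.append(p)
    return history

# For k = 3 the critical activity is t_c = (k-1)^{k-1}/(k-2)^k = 4.
sub = iterate_occupancy(2.0, 3, 200)   # t < t_c: p_n has a single limit
sup = iterate_occupancy(6.0, 3, 200)   # t > t_c: even and odd terms split
print(abs(sub[-1] - sub[-2]))          # essentially 0
print(abs(sup[-1] - sup[-2]))          # bounded away from 0
```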

It turns out that the asymptotic behaviors of $p_n$ for large and small $t$ are very different. Namely, let
$$t_c = \frac{(k-1)^{k-1}}{(k-2)^k},$$
called the critical $t$. Then for $t < t_c$ there exists
$$p_{\infty} = \lim_{n \longrightarrow \infty} p_n,$$
while for $t > t_c$ there exist limits
$$p_{\text{even}} = \lim_{n \longrightarrow \infty} p_{2n} \quad \text{and} \quad p_{\text{odd}} = \lim_{n \longrightarrow \infty} p_{2n+1}$$
and $p_{\text{even}} \ne p_{\text{odd}}$. The values $t < t_c$ are called subcritical whereas values $t > t_c$ are called supercritical. Physicists say that the model experiences a phase transition at $t = t_c$.
In view of (6.3.2.1), the results follow from the following general theorem, cf.
[Sp75].

6.3.3 Theorem. Fix some $t > 0$ and an integer $k > 2$ and consider the transformation
$$T(x) = T_{t,k}(x) = \frac{1}{1 + t x^{k-1}} \quad \text{for } 0 \le x \le 1.$$
Let
$$t_c = \frac{(k-1)^{k-1}}{(k-2)^k}.$$
For a positive integer $n$, let $T^n$ denote the $n$-th iteration of $T$, so that $T^2(x) = T(T(x))$, $T^3(x) = T(T(T(x)))$, etc.
Then there exists a unique point $x_0 = x_0(t, k)$ such that $T(x_0) = x_0$. If $t < t_c$ then
$$\lim_{n \longrightarrow \infty} T^n(x) = x_0 \quad \text{for all } 0 \le x \le 1.$$
Moreover, the convergence is exponentially fast, meaning that there exist $\gamma = \gamma(t, k) > 0$ and $0 < \delta = \delta(t, k) < 1$ such that
$$\left|\ln T^n(x) - \ln x_0\right| \le \gamma \delta^n \quad \text{for all } 0 \le x \le 1.$$
If $t > t_c$ then there exist $x_- = x_-(t, k)$ and $x_+ = x_+(t, k)$ such that
$$x_- < x_0 < x_+,$$
while
$$\lim_{n \longrightarrow \infty} T^{2n}(x) = x_- \quad \text{for all } 0 \le x < x_0$$
and
$$\lim_{n \longrightarrow \infty} T^{2n}(x) = x_+ \quad \text{for all } x_0 < x \le 1.$$

Proof. It is convenient to parameterize $x = e^{-s}$ for $0 \le s \le +\infty$. In the new coordinates, $T$ can be written as
$$T(s) = \ln\left(1 + t e^{-s(k-1)}\right).$$
Since $T(s)$ is decreasing from $T(0) = \ln(1+t) > 0$ to $T(+\infty) = 0$, there is a unique fixed point $a = a(t)$ such that $T(a) = a$, see Fig. 6.4.
Moreover, if $s > a$ then $T(s) < T(a) = a$ and if $s < a$ then $T(s) > T(a) = a$. Since for $s > 0$ we have $T_{t_1}(s) > T_{t_2}(s)$ if and only if $t_1 > t_2$, we conclude that $a(t)$ is an increasing continuous function of $t$. In addition,
$$\lim_{t \longrightarrow 0+} a(t) = 0 \quad \text{and} \quad \lim_{t \longrightarrow +\infty} a(t) = +\infty$$
and hence the set of possible values of $a(t)$ is the interval $(0, +\infty)$.
We have
$$T'(s) = -\frac{t(k-1)e^{-s(k-1)}}{1 + te^{-s(k-1)}}$$
and
$$T'(a) = -\frac{t(k-1)e^{-a(k-1)}}{1 + te^{-a(k-1)}}.$$
Since
$$1 + te^{-a(k-1)} = e^a,$$
we conclude that
$$t = e^{a(k-1)}\left(e^a - 1\right) \quad \text{and} \quad T'(a) = -(k-1)\left(1 - e^{-a}\right).$$

Fig. 6.4 The graphs of $y = \ln\left(1 + 3e^{-2x}\right)$ and $y = x$

If $a = \ln\frac{k-1}{k-2}$ then $T'(a) = -1$ and
$$t = \left(\frac{k-1}{k-2}\right)^{k-1} \frac{1}{k-2} = \frac{(k-1)^{k-1}}{(k-2)^k} = t_c.$$

Since $a(t)$ is an increasing function of $t$, we conclude that for $t < t_c$ we have $0 > T'(a) > -1$ and for $t > t_c$ we have $T'(a) < -1$.
It is now clear that if $t > t_c$ then $a$ is an unstable fixed point: if $s \ne a$ is sufficiently close to $a$ then $|T(s) - a| > |s - a|$ and hence for any $s \ne a$ the sequence $T^n(s)$ cannot converge to $a$. On the other hand, if $t < t_c$ then $a$ is a locally stable fixed point: if $s$ is sufficiently close to $a$ then $|T(s) - a| \le \delta|s - a|$ for some $0 < \delta < 1$, and for any $s$ sufficiently close to $a$ the sequence $T^n(s)$ converges to $a$.
We consider the second iteration $T(T(s))$. Clearly, $T(T(s))$ is an increasing function of $s$. We claim that $T(T(s))$ is either concave or has exactly one inflection point, where it changes from convex to concave, see Fig. 6.5.

We have
$$\left(T(T(s))\right)' = T'(T(s))\, T'(s) = \left(-\frac{t(k-1)\left(1 + te^{-s(k-1)}\right)^{-(k-1)}}{1 + t\left(1 + te^{-s(k-1)}\right)^{-(k-1)}}\right) \left(-\frac{t(k-1)e^{-s(k-1)}}{1 + te^{-s(k-1)}}\right) = \frac{t^2 (k-1)^2 e^{-s(k-1)}}{\left(1 + te^{-s(k-1)}\right)^k + t\left(1 + te^{-s(k-1)}\right)}.$$

Fig. 6.5 The graph of $T(T(s))$ for $t = 2$ and $k = 11$

Thus we need to show that (T (T (s)) is either decreasing or first increasing and then
decreasing.
Equivalently, letting y = e−s(k−1) , we have to show that the function

(1 + t y)k + t (1 + t y)
f (y) = for 0 ≤ y < 1
y

is either decreasing or first decreasing and then increasing. We write

1 + t  j k j−1
k
f (y) = t (t + k) + + t y ,
y j=2
j

from which it follows that f is convex. Since

$$\lim_{y\to 0^+} f(y) = +\infty,$$

this proves that f(y) is either decreasing for 0 < y ≤ 1 or first decreasing and
then increasing. Consequently, (T(T(s)))' is either decreasing or first increasing and
then decreasing. Therefore, T(T(s)) is either concave for s ≥ 0 or has exactly one
inflection point, where it changes from convex to concave.
Next, we observe that s = a, where a is the unique fixed point of T, must also be
a fixed point of T^2. If b < a is a fixed point of T^2 then c = T(b) > a is another
fixed point of T^2, and if c > a is a fixed point of T^2 then b = T(c) < a is also a
fixed point of T^2. Since T^2 has at most one inflection point, there cannot be more
than 3 fixed points, see Fig. 6.6.
If there are three fixed points of T^2 then we must have (T'(a))^2 = (T(T(s)))'|_{s=a} > 1,
which means that t > t_c, see Fig. 6.6a. If t < t_c, we must have one fixed point of T^2,
which is also the fixed point of T. Moreover, T^2(s) is an increasing function such
that T^2(s) > s for s < a and T^2(s) < s for s > a.
It follows that if t < t_c then for any 0 ≤ s < a, the sequence T^{2n}(s) is an
increasing sequence converging to a, while for any s > a, the sequence T^{2n}(s) is
a decreasing sequence converging to a, see Fig. 6.7.

Fig. 6.6 The map T^2 has either three (a) or one (b) fixed points. In (a), the fixed
point a of T is locally repelling and in (b) it is locally attracting

Fig. 6.7 Iterations T^{2n}(s) for s < a and s > a when t < t_c

Fig. 6.8 Iterations T^{2n}(s) when t > t_c

Therefore,
$$\lim_{n\to\infty} T^n(s) = a \quad\text{for all}\quad 0 \leq s < \infty.$$

If t > t_c then for any 0 ≤ s < b the sequence T^{2n}(s) is an increasing sequence
converging to b and for any b < s < a the sequence T^{2n}(s) is a decreasing sequence
converging to b, while for any a < s < c the sequence T^{2n}(s) is an increasing
sequence converging to c and for any s > c the sequence T^{2n}(s) is a decreasing
sequence converging to c, see Fig. 6.8.
Therefore,
$$\lim_{n\to\infty} T^{2n}(s) = b \quad\text{for all}\quad 0 \leq s < a \quad\text{and}\quad \lim_{n\to\infty} T^{2n}(s) = c \quad\text{for all}\quad s > a.$$

It remains to show that if t < t_c then |T^n(s) − a| decreases exponentially fast
with n. Since T swaps the sets {s < a} and {s > a}, it suffices to prove exponential
decay for one of the two sets. However, see Fig. 6.7, we have that
$$0 \leq \frac{d}{ds}T^2(s) \leq \left.\frac{d}{ds}T^2(s)\right|_{s=a} \leq \delta \quad\text{for all } s \leq a \text{ or for all } s \geq a$$

and some δ = δ(t) < 1. Thus T^2 is a contraction for all s > a or for all s < a, so
that
$$\left|T^{2n}(a) - T^{2n}(s)\right| \leq \delta^n |s-a| \quad\text{for all } s > a \text{ or for all } s < a.$$
Since for
$$a \leq s \leq +\infty \quad\text{we have}\quad T^2(a) = a \leq T^2(s) \leq T^2(+\infty) = \ln(1+t),$$

the proof follows. 
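The dichotomy just proved is easy to observe numerically. The sketch below is our illustration (not part of the book's argument): it iterates the map T(s) = ln(1 + te^{-s(k-1)}) for k = 3, where t_c = (k-1)^{k-1}/(k-2)^k = 4.

```python
import math

def T(s, t, k):
    # The map T(s) = ln(1 + t*exp(-s(k-1))) whose fixed point a = a(t) is studied above.
    return math.log(1.0 + t * math.exp(-s * (k - 1)))

def iterate(s, t, k, n):
    for _ in range(n):
        s = T(s, t, k)
    return s

k = 3   # here t_c = (k-1)^(k-1) / (k-2)^k = 4

# Subcritical t < t_c: T^n(s) converges to the unique fixed point a from any start.
a1 = iterate(0.0, 2.0, k, 500)
a2 = iterate(3.0, 2.0, k, 500)
assert abs(a1 - a2) < 1e-9
assert abs(T(a1, 2.0, k) - a1) < 1e-9      # a is a fixed point of T

# Supercritical t > t_c: the even iterates converge to b, the odd ones to c = T(b).
b = iterate(0.0, 6.0, k, 1000)             # an even number of steps
c = T(b, 6.0, k)
assert abs(iterate(b, 6.0, k, 2) - b) < 1e-9   # b is (numerically) fixed by T^2
assert c - b > 1.0                         # the 2-cycle {b, c} is well separated
```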

6.3.4 Correlation decay in trees T_k^n. Let us consider the tree T_k^n of Sect. 6.3.2 and
fix some subcritical
$$t < t_c = \frac{(k-1)^{k-1}}{(k-2)^k}.$$
Let p_n^o = p_{k,n}^o(t) be the conditional probability that the root of T_k^n is occupied given
that all vertices at the n-th level are occupied, see Fig. 6.9a.
Arguing as in Sects. 6.3.1 and 6.3.2, we conclude that p_n^o satisfies the recursion
$$1 - p_n^o = \frac{1}{1 + t\left(1 - p_{n-1}^o\right)^{k-1}} \quad\text{where}\quad p_0^o = 1.$$

Fig. 6.9 Black dots are occupied vertices, white dots are unoccupied vertices and
grey dots are the vertices that can be occupied or unoccupied

Hence by Theorem 6.3.3, we have
$$\lim_{n\to\infty} p_n^o = 1 - x \qquad (6.3.4.1)$$
where x is the unique real solution of the equation
$$x = \frac{1}{1 + t x^{k-1}}. \qquad (6.3.4.2)$$
Moreover, the convergence in (6.3.4.1) is exponentially fast, meaning that
$$\left|\ln\left(1 - p_n^o\right) - \ln(1 - x)\right| \leq \gamma \delta^n,$$
for some γ = γ(t, k) > 0 and some 0 < δ(t, k) < 1.


Next, let p_n^u = p_{k,n}^u(t) be the conditional probability that the root of T_k^n is occupied
given that all vertices at the n-th level are unoccupied, see Fig. 6.9b. Arguing as in
Sects. 6.3.1 and 6.3.2, we conclude that p_n^u satisfies the recursion
$$1 - p_n^u = \frac{1}{1 + t\left(1 - p_{n-1}^u\right)^{k-1}} \quad\text{where}\quad p_0^u = 0.$$
Hence by Theorem 6.3.3, we have
$$\lim_{n\to\infty} p_n^u = 1 - x, \qquad (6.3.4.3)$$
where x is the same unique real solution of the Eq. (6.3.4.2). Moreover, the convergence
in (6.3.4.3) is exponentially fast, meaning that
$$\left|\ln\left(1 - p_n^u\right) - \ln(1 - x)\right| \leq \gamma \delta^n,$$
for some γ = γ(t, k) > 0 and some 0 < δ(t, k) < 1. In particular, the limits in
(6.3.4.1) and (6.3.4.3) coincide.
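A quick numerical check (ours; k = 3 and t = 2 are sample subcritical values): both recursions reach the same limit 1 − x.

```python
k, t = 3, 2.0   # subcritical: t < t_c = (k-1)^(k-1)/(k-2)^k = 4

def step(p):
    # One level of the recursion 1 - p_n = 1 / (1 + t (1 - p_{n-1})^(k-1)).
    return 1.0 - 1.0 / (1.0 + t * (1.0 - p) ** (k - 1))

po, pu = 1.0, 0.0   # all leaves occupied (p_0^o = 1) / unoccupied (p_0^u = 0)
for _ in range(300):
    po, pu = step(po), step(pu)

assert abs(po - pu) < 1e-10        # the two limits coincide
x = 1.0 - po                       # and equal 1 - x, with x solving (6.3.4.2)
assert abs(x - 1.0 / (1.0 + t * x ** (k - 1))) < 1e-10
```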
Finally, let us impose some arbitrary occupancy constraints σ at the n-th
level of T_k^n, see Fig. 6.9c, and let p_n^σ = p_{n,k}^σ(t) be the conditional probability that
the root is occupied given those constraints. For a vertex v of T_k^n, let p^σ(v) be the
conditional probability that v is occupied given the constraints σ at the n-th level.
Arguing as in Sects. 6.3.1 and 6.3.2, we arrive at the recurrence
$$1 - p^\sigma(v) = \frac{1}{1 + t\left(1 - p^\sigma(v_1)\right)\cdots\left(1 - p^\sigma(v_{k-1})\right)}, \qquad (6.3.4.4)$$
where v_1, ..., v_{k-1} are the descendants of v and the initial conditions are p^σ(v) = 1
if vertex v at the n-th level is occupied and p^σ(v) = 0 if vertex v at the n-th level is
unoccupied.
For vertices v at the n-th level, we clearly have
$$0 = p_0^u \leq p^\sigma(v) \leq 1 = p_0^o.$$
From (6.3.4.4), for the vertices v at the (n−1)-st level, we have
$$0 = p_1^o \leq p^\sigma(v) \leq p_1^u.$$
Iterating, we obtain
$$p_m^u \leq p^\sigma(v) \leq p_m^o$$
when m is even and v is a vertex at the (n−m)-th level, and
$$p_m^o \leq p^\sigma(v) \leq p_m^u$$
when m is odd and v is a vertex at the (n−m)-th level. Therefore,
$$\min\left(p_n^u, p_n^o\right) \leq p_n^\sigma \leq \max\left(p_n^u, p_n^o\right).$$

From (6.3.4.1) and (6.3.4.3), we conclude that
$$\lim_{n\to\infty} p_n^\sigma = 1 - x, \qquad (6.3.4.5)$$
where x is the unique real solution of (6.3.4.2). In other words, asymptotically, as
n grows, the conditional probability that the root is occupied does not depend on
the occupancy constraint σ at the n-th level of T_k^n. Hence we say that for subcritical
t < t_c the model exhibits correlation decay. Moreover, the convergence in (6.3.4.5)
is exponentially fast, meaning that
$$\left|\ln\left(1 - p_n^\sigma\right) - \ln(1 - x)\right| \leq \gamma\delta^n,$$
for some γ = γ(t, k) > 0 and some 0 < δ = δ(t, k) < 1.


For supercritical values t > t_c the root of the tree T_k^n remembers the occupancy
constraint on the leaves, no matter how large n is. If p_n^o = p_n^o(t) is the conditional
probability that the root is occupied given that all leaves are occupied, we have
$$\lim_{n\to\infty} p_{2n}^o > \lim_{n\to\infty} p_{2n+1}^o$$
(both limits exist). Similarly, for the conditional probability p_n^u = p_n^u(t) that the root
is occupied given that all leaves are unoccupied, we have
$$\lim_{n\to\infty} p_{2n}^u < \lim_{n\to\infty} p_{2n+1}^u$$
(both limits exist). Imposing the condition that all leaves are occupied makes the
vertices on the m-th level more likely to be occupied if m ≡ n mod 2 and less likely
to be occupied if m ≡ n + 1 mod 2.
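Numerically (our sketch, again with k = 3), the parity effect for a supercritical activity looks as follows:

```python
k, t = 3, 6.0   # supercritical: t > t_c = 4 for k = 3

def step(p):
    # One level of the recursion 1 - p_n = 1 / (1 + t (1 - p_{n-1})^(k-1)).
    return 1.0 - 1.0 / (1.0 + t * (1.0 - p) ** (k - 1))

def run(p0, steps):
    p = p0
    for _ in range(steps):
        p = step(p)
    return p

# All leaves occupied: the even and the odd subsequences have different limits.
even_o, odd_o = run(1.0, 2000), run(1.0, 2001)
assert even_o > odd_o + 0.1        # lim p_{2n}^o > lim p_{2n+1}^o

# All leaves unoccupied: the inequality is reversed.
even_u, odd_u = run(0.0, 2000), run(0.0, 2001)
assert even_u < odd_u - 0.1        # lim p_{2n}^u < lim p_{2n+1}^u
```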

6.4 Correlation Decay for General Graphs

Our next goal is to show that a similar correlation decay for subcritical t holds
not only for trees but also for general graphs. This was proved by Weitz [We06] and
we follow his exposition. As the first and crucial step, we consider trees T_k^n with
different non-negative real activities z_v at the vertices.

6.4.1 Trees T_k^n with different activities at vertices. Suppose now that each vertex
v of T_k^n has its own real activity z_v ≥ 0. From (6.3.1.1), the probability p(v) that
vertex v is occupied satisfies
$$1 - p(v) = \frac{1}{1 + z_v\left(1 - p(u_1)\right)\cdots\left(1 - p(u_{k-1})\right)}, \qquad (6.4.1.1)$$
where u_1, ..., u_{k-1} are the descendants of v. Following [We06], we introduce the ratios
$$r(v) = \frac{p(v)}{1 - p(v)}$$
for each vertex v of T_k^n. Then
$$0 \leq r(v) \leq +\infty, \quad p(v) = \frac{r(v)}{1 + r(v)}$$
and the recursion (6.4.1.1) is written as
$$r(v) = \frac{z_v}{\left(1 + r(u_1)\right)\cdots\left(1 + r(u_{k-1})\right)}, \qquad (6.4.1.2)$$
where u_1, ..., u_{k-1} are the descendants of v.
Let r_n^max = r_n^max(z) denote the largest possible value of r(v) at the root v of T_k^n
given the vector of activities z = (z_u : u ∈ V), where the maximum is taken over all
possible choices of the initial values r(u) at the leaves u of T_k^n, and let r_n^min = r_n^min(z)
denote the smallest possible value of r(v) at the root v of T_k^n given the vector of
activities z = (z_u : u ∈ V), where the minimum is taken over all possible choices of
the initial values r(u) at the leaves u of T_k^n. We denote by r_n^max(t), respectively r_n^min(t),
the corresponding quantities when z_u = t for some t ≥ 0 and all vertices u.
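At constant activities, r_n^max(t) and r_n^min(t) are computed by alternating the recursion (6.4.1.2) between the two extreme boundary values; the sketch below (ours) also checks the monotonicity asserted in Lemma 6.4.5.

```python
def extreme_ratios(t, k, n):
    # r_n^max(t), r_n^min(t) via the alternating recursion from (6.4.1.2):
    # the extremes at level n come from the opposite extremes at level n-1.
    rmax, rmin = float('inf'), 0.0     # r_0^max = +infinity, r_0^min = 0
    for _ in range(n):
        rmax, rmin = t / (1 + rmin) ** (k - 1), t / (1 + rmax) ** (k - 1)
    return rmax, rmin

t, k = 2.0, 3                          # subcritical: t < t_c = 4
prev = extreme_ratios(t, k, 1)
for n in range(2, 101):
    cur = extreme_ratios(t, k, n)
    # Lemma 6.4.5: r_n^max(t) is non-increasing and r_n^min(t) non-decreasing in n.
    assert cur[0] <= prev[0] + 1e-12 and cur[1] >= prev[1] - 1e-12
    prev = cur
assert prev[0] - prev[1] < 1e-6        # for subcritical t the extremes merge
```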
6.4.2 Theorem. Suppose that
$$0 < z_u \leq t$$
for some t > 0 and all vertices u of T_k^n. Then for n ≥ 2, we have r_n^max(z) < +∞,
r_n^min(z) > 0 and the inequalities
$$\frac{r_n^{\max}(z)}{r_n^{\min}(z)} \leq \frac{r_n^{\max}(t)}{r_n^{\min}(t)} \qquad (6.4.2.1)$$
and
$$\frac{1 + r_n^{\max}(z)}{1 + r_n^{\min}(z)} \leq \frac{1 + r_n^{\max}(t)}{1 + r_n^{\min}(t)} \qquad (6.4.2.2)$$
hold. In addition, r_1^max(z) < +∞ and (6.4.2.2) holds for n = 1.

Some remarks are in order. As follows from (6.4.1.2), if n is odd, the value of
r_n^max is attained when r(v) = 0 for all leaves v of T_k^n and the value of r_n^min is attained
when r(v) = +∞ for all leaves v of T_k^n, while if n is even, the value of r_n^max is
attained when r(v) = +∞ for all leaves v of T_k^n and the value of r_n^min is attained
when r(v) = 0 for all leaves v of T_k^n. By continuity, inequality (6.4.2.2) holds when
0 ≤ z_v ≤ t. It can be written as
$$\frac{1 - p_n^{\min}(z)}{1 - p_n^{\max}(z)} \leq \frac{1 - p_n^{\min}(t)}{1 - p_n^{\max}(t)},$$
where p_n^max, respectively p_n^min, is the maximum, respectively minimum, probability
that the root is occupied, taken over all possible initial occupancy probabilities
0 ≤ p(v) ≤ 1 on the leaves of T_k^n. As is discussed in Sect. 6.3.4, for subcritical values
$$t < t_c = \frac{(k-1)^{k-1}}{(k-2)^k}$$
we have
$$\lim_{n\to\infty} \frac{1 - p_n^{\min}(t)}{1 - p_n^{\max}(t)} = 1$$
and hence necessarily
$$\lim_{n\to\infty} \frac{1 - p_n^{\min}(z)}{1 - p_n^{\max}(z)} = 1.$$
In other words, the tree T_k^n with different subcritical activities at each vertex also
exhibits correlation decay.
We prove Theorem 6.4.2 by induction on n. First, we establish some inequalities.

6.4.3 Lemma.
(1) Let a, b, c and d be non-negative numbers such that b > 0, d > 0,
$$a \leq c \quad\text{and}\quad 1 \leq \frac{a}{b} \leq \frac{c}{d}.$$
Then
$$\frac{1+a}{1+b} \leq \frac{1+c}{1+d}.$$
(2) Let c ≥ b ≥ 0 be reals. Then for any α ≥ δ ≥ 0, we have
$$\frac{1+\alpha c}{1+\alpha b} \geq \frac{1+\delta c}{1+\delta b}.$$

Proof. We have
$$\frac{1+c}{1+d} - \frac{1+a}{1+b} = \frac{(1+c)(1+b) - (1+a)(1+d)}{(1+d)(1+b)} = \frac{(b+c) - (a+d) + (cb - ad)}{(1+d)(1+b)}.$$
Since
$$\frac{c}{d} - \frac{a}{b} = \frac{cb - ad}{db} \geq 0,$$
we conclude that cb − ad ≥ 0.
Writing c = γa for some γ ≥ 1, we conclude that d ≤ γb and hence
$$(b+c) - (a+d) \geq b + \gamma a - a - \gamma b = (\gamma - 1)(a - b) \geq 0$$
and the proof of Part (1) follows.
To prove Part (2), we note that
$$\frac{1+\alpha c}{1+\alpha b} - \frac{1+\delta c}{1+\delta b} = \frac{(1+\alpha c)(1+\delta b) - (1+\delta c)(1+\alpha b)}{(1+\delta b)(1+\alpha b)} = \frac{(c-b)(\alpha-\delta)}{(1+\delta b)(1+\alpha b)} \geq 0.$$

6.4.4 Lemma. Let k > 2 be a positive integer, and let t, b and c be non-negative reals
such that
$$b \leq c, \quad c \geq \frac{t}{(1+b)^{k-1}} \quad\text{and}\quad b \leq \frac{t}{(1+c)^{k-1}}.$$
Let us define a function
$$f(\alpha_1, \ldots, \alpha_{k-1}) = \frac{1 + t(1+\alpha_1 b)^{-1} \cdots (1+\alpha_{k-1} b)^{-1}}{1 + t(1+\alpha_1 c)^{-1} \cdots (1+\alpha_{k-1} c)^{-1}} \quad\text{for}\quad \alpha_1, \ldots, \alpha_{k-1} \geq 0.$$

Then
$$f(\alpha_1,\ldots,\alpha_{k-1}) \leq f(1,\ldots,1) \quad\text{for all}\quad 0 \leq \alpha_1,\ldots,\alpha_{k-1} \leq 1.$$

Proof. We need to prove that
$$f(1,\ldots,1)\left(1 + t(1+\alpha_1 c)^{-1}\cdots(1+\alpha_{k-1}c)^{-1}\right) \geq 1 + t(1+\alpha_1 b)^{-1}\cdots(1+\alpha_{k-1}b)^{-1}$$
provided 0 ≤ α_1, ..., α_{k-1} ≤ 1. Since for α_1 = ... = α_{k-1} = 1 we attain equality
above, it suffices to prove that the function
$$g(\alpha_1,\ldots,\alpha_{k-1}) = 1 + t(1+\alpha_1 b)^{-1}\cdots(1+\alpha_{k-1}b)^{-1} - f(1,\ldots,1) - t f(1,\ldots,1)(1+\alpha_1 c)^{-1}\cdots(1+\alpha_{k-1}c)^{-1}$$
is non-decreasing in every variable 0 ≤ α_i ≤ 1 provided the remaining variables
0 ≤ α_j ≤ 1 are fixed. By symmetry, it suffices to check that
$$\frac{\partial}{\partial \alpha_1}\, g(\alpha_1,\ldots,\alpha_{k-1}) \geq 0$$
provided 0 ≤ α_1, ..., α_{k-1} ≤ 1. Computing the derivative, we obtain
$$\frac{\partial}{\partial \alpha_1}\, g(\alpha_1,\ldots,\alpha_{k-1}) = -tb\,(1+\alpha_1 b)^{-2}(1+\alpha_2 b)^{-1}\cdots(1+\alpha_{k-1}b)^{-1} + tc\,f(1,\ldots,1)\,(1+\alpha_1 c)^{-2}(1+\alpha_2 c)^{-1}\cdots(1+\alpha_{k-1}c)^{-1}.$$
Hence it suffices to prove that
$$\frac{1+\alpha_1 c}{1+\alpha_1 b}\cdot\frac{(1+\alpha_1 c)\cdots(1+\alpha_{k-1}c)}{(1+\alpha_1 b)\cdots(1+\alpha_{k-1}b)} \leq \frac{c}{b}\,f(1,\ldots,1). \qquad (6.4.4.1)$$
On the other hand,
$$\frac{c}{b}\,f(1,\ldots,1) = \frac{c + tc(1+b)^{-(k-1)}}{b + tb(1+c)^{-(k-1)}} \geq \frac{t(1+b)^{-(k-1)} + tc(1+b)^{-(k-1)}}{t(1+c)^{-(k-1)} + tb(1+c)^{-(k-1)}} = \frac{(1+c)(1+b)^{-(k-1)}}{(1+b)(1+c)^{-(k-1)}} = \frac{(1+c)^k}{(1+b)^k}. \qquad (6.4.4.2)$$
By Part (2) of Lemma 6.4.3, we have
$$\frac{1+\alpha_i c}{1+\alpha_i b} \leq \frac{1+c}{1+b} \quad\text{for}\quad i = 1,\ldots,k-1$$
and hence
$$\frac{1+\alpha_1 c}{1+\alpha_1 b}\cdot\frac{(1+\alpha_1 c)\cdots(1+\alpha_{k-1}c)}{(1+\alpha_1 b)\cdots(1+\alpha_{k-1}b)} \leq \frac{(1+c)^k}{(1+b)^k}. \qquad (6.4.4.3)$$
Combining (6.4.4.3) and (6.4.4.2), we obtain (6.4.4.1) and hence complete the proof.

The final lemma before we embark on the proof of Theorem 6.4.2.

6.4.5 Lemma. For any t > 0 we have
$$r_n^{\max}(t) \leq r_{n-1}^{\max}(t) \quad\text{and}\quad r_n^{\min}(t) \geq r_{n-1}^{\min}(t)$$
for all positive integers n.

Proof. We proceed by induction on n. We have
$$r_0^{\max}(t) = +\infty \quad\text{and}\quad r_0^{\min}(t) = 0$$
and, by (6.4.1.2),
$$r_1^{\max}(t) = \frac{t}{\left(1 + r_0^{\min}(t)\right)^{k-1}} = t < r_0^{\max}(t)$$
and
$$r_1^{\min}(t) = \frac{t}{\left(1 + r_0^{\max}(t)\right)^{k-1}} = 0 = r_0^{\min}(t).$$
For n > 1, by (6.4.1.2) and the induction hypothesis, we have
$$r_n^{\max}(t) = \frac{t}{\left(1 + r_{n-1}^{\min}(t)\right)^{k-1}} \leq \frac{t}{\left(1 + r_{n-2}^{\min}(t)\right)^{k-1}} = r_{n-1}^{\max}(t)$$
and, similarly,
$$r_n^{\min}(t) = \frac{t}{\left(1 + r_{n-1}^{\max}(t)\right)^{k-1}} \geq \frac{t}{\left(1 + r_{n-2}^{\max}(t)\right)^{k-1}} = r_{n-1}^{\min}(t),$$
which completes the proof.

6.4.6 Proof of Theorem 6.4.2. Let v be the root of T_k^n. We have
$$r_0^{\max}(z) = r_0^{\max}(t) = +\infty \quad\text{and}\quad r_0^{\min}(z) = r_0^{\min}(t) = 0,$$
from which
$$r_1^{\max}(z) = z_v, \quad r_1^{\max}(t) = t, \quad r_1^{\min}(z) = r_1^{\min}(t) = 0.$$
Since z_v ≤ t, we have
$$\frac{1 + r_1^{\max}(z)}{1 + r_1^{\min}(z)} = 1 + z_v \leq 1 + t = \frac{1 + r_1^{\max}(t)}{1 + r_1^{\min}(t)},$$
which proves (6.4.2.2) for n = 1.

If we remove v with adjacent edges from T_k^n, we obtain a vertex-disjoint union of
k − 1 trees T_k^{n-1}, the i-th tree with activity vector z_i satisfying z_i(u) ≤ t for all u.
Applying (6.4.1.2), we obtain
$$r_n^{\max}(z) = \frac{z_v}{\left(1 + r_{n-1}^{\min}(z_1)\right)\cdots\left(1 + r_{n-1}^{\min}(z_{k-1})\right)} \quad\text{and}\quad r_n^{\min}(z) = \frac{z_v}{\left(1 + r_{n-1}^{\max}(z_1)\right)\cdots\left(1 + r_{n-1}^{\max}(z_{k-1})\right)} \qquad (6.4.6.1)$$
and, similarly,
$$r_n^{\max}(t) = \frac{t}{\left(1 + r_{n-1}^{\min}(t)\right)^{k-1}} \quad\text{and}\quad r_n^{\min}(t) = \frac{t}{\left(1 + r_{n-1}^{\max}(t)\right)^{k-1}}.$$

We proceed by induction on n. For n = 2, by (6.4.6.1) we have
$$r_2^{\max}(z) = z_v, \quad r_2^{\max}(t) = t, \quad r_2^{\min}(z) > 0, \quad r_2^{\min}(t) > 0$$
and
$$\frac{r_2^{\max}(z)}{r_2^{\min}(z)} = \frac{1 + r_1^{\max}(z_1)}{1 + r_1^{\min}(z_1)} \cdots \frac{1 + r_1^{\max}(z_{k-1})}{1 + r_1^{\min}(z_{k-1})} \leq \left(\frac{1 + r_1^{\max}(t)}{1 + r_1^{\min}(t)}\right)^{k-1} = \frac{r_2^{\max}(t)}{r_2^{\min}(t)},$$
which establishes (6.4.2.1) for n = 2. Moreover, since r_2^max(z) ≤ r_2^max(t), the inequality
(6.4.2.2) follows by Part (1) of Lemma 6.4.3.
Suppose that n > 2. Applying the induction hypothesis, we obtain from (6.4.6.1)
$$\frac{r_n^{\max}(z)}{r_n^{\min}(z)} = \frac{1 + r_{n-1}^{\max}(z_1)}{1 + r_{n-1}^{\min}(z_1)} \cdots \frac{1 + r_{n-1}^{\max}(z_{k-1})}{1 + r_{n-1}^{\min}(z_{k-1})} \leq \left(\frac{1 + r_{n-1}^{\max}(t)}{1 + r_{n-1}^{\min}(t)}\right)^{k-1} = \frac{r_n^{\max}(t)}{r_n^{\min}(t)}.$$
In particular, r_n^max(z) < +∞, r_n^min(z) > 0 and (6.4.2.1) follows.
Hence our goal is to prove (6.4.2.2).
First, we observe that if
$$r_n^{\max}(z) \leq r_n^{\max}(t)$$

then (6.4.2.2) follows by (6.4.2.1) and Part (1) of Lemma 6.4.3. Hence, without loss
of generality, we may assume that
$$r_n^{\max}(z) > r_n^{\max}(t). \qquad (6.4.6.2)$$
Let z' be the vector of activities obtained from z by replacing the activity z_v of the
root by t ≥ z_v. From (6.4.6.1) it follows that
$$r_n^{\max}(z') = \frac{t}{z_v}\, r_n^{\max}(z) \quad\text{and}\quad r_n^{\min}(z') = \frac{t}{z_v}\, r_n^{\min}(z),$$
so that by Part (2) of Lemma 6.4.3, we have
$$\frac{1 + r_n^{\max}(z')}{1 + r_n^{\min}(z')} \geq \frac{1 + r_n^{\max}(z)}{1 + r_n^{\min}(z)}.$$
Therefore, without loss of generality, we may assume that
$$z_v = t. \qquad (6.4.6.3)$$
Recall that z_i is the vector of activities at the vertices of the i-th tree T_k^{n-1} obtained
from T_k^n by removing the root v with the adjacent edges. Let I ⊂ {1, ..., k−1} be
the set of indices i such that
$$r_{n-1}^{\min}(z_i) \geq r_{n-1}^{\min}(t).$$
Let z' be the vector of activities at the vertices of T_k^n obtained by replacing the vector
z_i of activities at the vertices of the i-th tree T_k^{n-1} by t for all i ∈ I, and let z_i' be the
corresponding vector of activities at the vertices of the i-th tree T_k^{n-1} (hence z_i' = z_i
if i ∉ I and z_i' is the constant vector of t if i ∈ I).
By the induction hypothesis,
$$\frac{1 + r_{n-1}^{\max}(z_i')}{1 + r_{n-1}^{\min}(z_i')} \leq \frac{1 + r_{n-1}^{\max}(t)}{1 + r_{n-1}^{\min}(t)}$$
and using (6.4.6.1) and (6.4.6.3), we conclude that
$$\frac{r_n^{\max}(z)}{r_n^{\min}(z)} \leq \frac{r_n^{\max}(z')}{r_n^{\min}(z')}.$$
Moreover, we have
$$r_{n-1}^{\min}(z_i') \leq r_{n-1}^{\min}(z_i) \quad\text{for all}\quad i = 1, \ldots, k-1$$

and by (6.4.6.1) and (6.4.6.3) we have
$$r_n^{\max}(z) \leq r_n^{\max}(z').$$
It follows then by Part (1) of Lemma 6.4.3 that
$$\frac{1 + r_n^{\max}(z)}{1 + r_n^{\min}(z)} \leq \frac{1 + r_n^{\max}(z')}{1 + r_n^{\min}(z')}.$$
Therefore, without loss of generality, we may assume that I = ∅ and hence
$$r_{n-1}^{\min}(z_i) \leq r_{n-1}^{\min}(t) \quad\text{for}\quad i = 1, \ldots, k-1. \qquad (6.4.6.4)$$
In view of (6.4.6.4), let us define 0 ≤ α_1, ..., α_{k-1} ≤ 1 such that
$$r_{n-1}^{\min}(z_i) = \alpha_i\, r_{n-1}^{\min}(t) \quad\text{for}\quad i = 1, \ldots, k-1.$$
By the induction hypothesis,
$$\frac{r_{n-1}^{\max}(z_i)}{r_{n-1}^{\min}(z_i)} \leq \frac{r_{n-1}^{\max}(t)}{r_{n-1}^{\min}(t)}$$
and hence
$$r_{n-1}^{\max}(z_i) \leq \alpha_i\, r_{n-1}^{\max}(t) \quad\text{for}\quad i = 1, \ldots, k-1.$$
Applying (6.4.6.1) and (6.4.6.3), we conclude that
$$\frac{1 + r_n^{\max}(z)}{1 + r_n^{\min}(z)} \leq \frac{1 + t\left(1 + \alpha_1 r_{n-1}^{\min}(t)\right)^{-1} \cdots \left(1 + \alpha_{k-1} r_{n-1}^{\min}(t)\right)^{-1}}{1 + t\left(1 + \alpha_1 r_{n-1}^{\max}(t)\right)^{-1} \cdots \left(1 + \alpha_{k-1} r_{n-1}^{\max}(t)\right)^{-1}}.$$
Besides, from Lemma 6.4.5,
$$r_{n-1}^{\max}(t) \geq r_n^{\max}(t) = \frac{t}{\left(1 + r_{n-1}^{\min}(t)\right)^{k-1}} \quad\text{and}\quad r_{n-1}^{\min}(t) \leq r_n^{\min}(t) = \frac{t}{\left(1 + r_{n-1}^{\max}(t)\right)^{k-1}}.$$
Applying Lemma 6.4.4, we obtain
$$\frac{1 + r_n^{\max}(z)}{1 + r_n^{\min}(z)} \leq \frac{1 + t\left(1 + r_{n-1}^{\min}(t)\right)^{-(k-1)}}{1 + t\left(1 + r_{n-1}^{\max}(t)\right)^{-(k-1)}} = \frac{1 + r_n^{\max}(t)}{1 + r_n^{\min}(t)},$$
which proves (6.4.2.2) and completes the proof of the theorem.
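Theorem 6.4.2 can be spot-checked numerically. The sketch below (ours; the tree depth and the random activities are arbitrary choices) evaluates the ratio recursion (6.4.1.2) on a tree with k − 1 = 2 descendants per vertex, using the extreme boundary values 0 and +∞ at the leaves.

```python
import random

random.seed(0)
k, t, n = 3, 2.0, 8          # branching factor k-1 = 2, n levels, 0 < z_u <= t

def build(depth, const=False):
    # A subtree as (activity, children); leaves carry the boundary values.
    act = t if const else random.uniform(0.1, t)
    kids = [] if depth == 0 else [build(depth - 1, const) for _ in range(k - 1)]
    return (act, kids)

def r_root(node, leaf_r):
    # The ratio recursion (6.4.1.2) with boundary value leaf_r at every leaf.
    act, kids = node
    if not kids:
        return leaf_r
    prod = 1.0
    for c in kids:
        prod *= 1.0 + r_root(c, leaf_r)
    return act / prod

def extremes(node):
    # The extreme root ratios are attained at the all-0 / all-infinity boundaries.
    a, b = r_root(node, 0.0), r_root(node, float('inf'))
    return max(a, b), min(a, b)

rmax_z, rmin_z = extremes(build(n))               # varying activities z_u <= t
rmax_t, rmin_t = extremes(build(n, const=True))   # constant activity t
assert rmax_z / rmin_z <= rmax_t / rmin_t + 1e-9                          # (6.4.2.1)
assert (1 + rmax_z) / (1 + rmin_z) <= (1 + rmax_t) / (1 + rmin_t) + 1e-9  # (6.4.2.2)
```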



6.4.7 Correlation decay for general graphs. Let G = (V, E) be a general graph
and suppose that the degrees of vertices do not exceed Δ ≥ 3. Weitz [We06] showed
that if
$$t < t_c = \frac{(\Delta-1)^{\Delta-1}}{(\Delta-2)^{\Delta}}$$
then the probability p(v) that a particular vertex is occupied is asymptotically
independent of whether vertices far away from v are occupied (as in Sect. 5.2.3, we
measure the distance between a pair of vertices by the smallest number of edges in a
path connecting the vertices). Weitz [We06] deduced this result from Theorem 6.4.2,
and we sketch the reduction here.
First, we note that Theorem 6.4.2 implies correlation decay on k-regular trees
with subcritical activities
$$0 \leq z_u \leq t < t_c = \frac{(k-1)^{k-1}}{(k-2)^k} \qquad (6.4.7.1)$$
at the vertices. Indeed, suppose that v is the root of a k-regular tree with n levels,
see Fig. 5.3, and let u_1, ..., u_k be the neighbors of v. Let us impose some occupancy
condition σ on the leaves of the tree (that is, set some leaves as occupied and the rest
as unoccupied). If we remove v with incident edges, the remaining graph splits into
the vertex-disjoint union of k trees T_k^{n-1}, and from (6.3.1.1) we deduce the following
recursive relation
$$1 - p^\sigma(v) = \frac{1}{1 + z_v\left(1 - p^\sigma(u_1)\right)\cdots\left(1 - p^\sigma(u_k)\right)}$$
for the probabilities p^σ(u) of occupancy. Theorem 6.4.2 implies that as n grows, the
probabilities p^σ(u_1), ..., p^σ(u_k) converge to limits independent of the occupancy
condition σ at the leaves of the tree, and hence the probability p^σ(v) that the root is
occupied also converges to a limit independent of σ.
The next observation is that we have correlation decay if G is a tree where the
degree of every vertex is at most k, with subcritical activities (6.4.7.1) at every vertex.
This case reduces to the case of a k-regular tree by adding auxiliary vertices with
zero activities where needed, cf. Fig. 6.10.
Finally, Weitz [We06] reduces the case of a general graph G = (V, E) with largest
degree Δ(G) > 2 of a vertex and subcritical activities at the vertices to the case of a
tree with degrees of the vertices not exceeding Δ(G). We present a modification of
that construction suggested by Gamarnik [Ga16].
We start by rewriting (6.3.1.1) for the case when z_v = t for all v ∈ V:
$$\frac{\mathrm{ind}_{G-v}(t)}{\mathrm{ind}_G(t)} = \frac{1}{1 + t\,\dfrac{\mathrm{ind}_{G-v-N_v}(t)}{\mathrm{ind}_{G-v}(t)}}. \qquad (6.4.7.2)$$
Fig. 6.10 A tree (black nodes) with a vertex v appended (white nodes) to a 3-regular tree with root v
Let v_1, ..., v_k, k ≤ Δ(G), be the vertices of N_v (that is, the neighbors of v), listed
in some order. We can further rewrite
$$\frac{\mathrm{ind}_{G-v-N_v}(t)}{\mathrm{ind}_{G-v}(t)} = \frac{\mathrm{ind}_{G-v-v_1}(t)}{\mathrm{ind}_{G-v}(t)} \cdot \frac{\mathrm{ind}_{G-v-v_1-v_2}(t)}{\mathrm{ind}_{G-v-v_1}(t)} \cdots \frac{\mathrm{ind}_{G-v-v_1-\cdots-v_k}(t)}{\mathrm{ind}_{G-v-v_1-\cdots-v_{k-1}}(t)}. \qquad (6.4.7.3)$$
Let p(v, v_1, ..., v_i; v_{i+1}) be the conditional probability that a random independent
set contains v_{i+1} given that it does not contain any of the vertices v, v_1, ..., v_i. Then
$$1 - p(v, v_1, \ldots, v_i; v_{i+1}) = \frac{\mathrm{ind}_{G-v-v_1-\cdots-v_{i+1}}(t)}{\mathrm{ind}_{G-v-v_1-\cdots-v_i}(t)}$$
and combining (6.4.7.2) and (6.4.7.3), we obtain
$$1 - p(v) = \frac{1}{1 + t\left(1 - p(v; v_1)\right)\left(1 - p(v, v_1; v_2)\right)\cdots\left(1 - p(v, v_1, \ldots, v_{k-1}; v_k)\right)}.$$
On the other hand, each of the probabilities p(v, v_1, ..., v_i; v_{i+1}) can be computed
as the probability of occupancy of v_{i+1} in the graph G − v − v_1 − ... − v_i obtained
from G by removing the vertices v, v_1, ..., v_i together with incident edges. This
allows us to arrange the computation of p(v) recursively into a tree. For example,
suppose we want to compute the probability p(v) of occupancy in the graph of
Fig. 6.11.
Then we obtain the tree pictured on Fig. 6.12.

Fig. 6.11 A graph and a vertex v

Fig. 6.12 Computational tree to compute the occupancy probability p(v) for the graph on Fig. 6.11.
We recursively compute occupancy probabilities for black nodes in the corresponding subgraphs
of the graph

Denoting by p_X(u) the occupancy probability of a vertex u in a graph X, we
obtain recursively:
$$1 - p_M(e) = 1 - p_K(d) = 1 - p_L(c) = 1 - p_I(e) = \frac{1}{1+t},$$
$$1 - p_J(d) = 1 - p_G(e) = 1 - p_H(e) = 1 - p_E(c) = \frac{1}{1 + t\,\frac{1}{1+t}} = \frac{1+t}{1+2t},$$
$$1 - p_F(b) = 1 - p_D(d) = \frac{1}{1 + t\,\frac{1+t}{1+2t}} = \frac{1+2t}{1+3t+t^2},$$
$$1 - p_C(c) = 1 - p_B(b) = \frac{1}{1 + t\,\frac{1+2t}{1+3t+t^2}\cdot\frac{1+t}{1+2t}} = \frac{1+3t+t^2}{1+4t+2t^2},$$
$$1 - p_A(a) = \frac{1}{1 + t\,\frac{1+3t+t^2}{1+4t+2t^2}} = \frac{1+4t+2t^2}{1+5t+5t^2+t^3} \quad\text{and}$$
$$1 - p(v) = \frac{1}{1 + t\,\frac{1+4t+2t^2}{1+5t+5t^2+t^3}\cdot\frac{1+3t+t^2}{1+4t+2t^2}} = \frac{1+5t+5t^2+t^3}{1+6t+8t^2+2t^3},$$
so that finally
$$p(v) = \frac{t + 3t^2 + t^3}{1 + 6t + 8t^2 + 2t^3}.$$

Indeed, it is easy to see that for the graph on Fig. 6.11, there are two independent
sets of 3 vertices, one of which contains v, there are 8 independent sets of 2 vertices,
three of which contain v, there are 6 independent sets of one vertex, one of which
contains v, there is a unique independent set of 0 vertices not containing v and there
are no independent sets of 4 or more vertices.
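These counts can be verified by brute force. The edge list below is our reading of Fig. 6.11 (an assumption, since only the figure's caption survives in this text); it reproduces every count stated above.

```python
from itertools import combinations

# A hypothetical edge list consistent with the counts stated for Fig. 6.11.
vertices = ['v', 'a', 'b', 'c', 'd', 'e']
edges = {('v', 'a'), ('v', 'b'), ('a', 'c'), ('b', 'c'),
         ('b', 'd'), ('c', 'e'), ('d', 'e')}

def independent(S):
    return not any((u, w) in edges or (w, u) in edges
                   for u, w in combinations(S, 2))

ind_sets = [S for r in range(len(vertices) + 1)
            for S in combinations(vertices, r) if independent(S)]
assert len(ind_sets) == 17                 # no independent sets of 4+ vertices

# Coefficients of ind_G(t) = 1 + 6t + 8t^2 + 2t^3 ...
coeffs = [sum(1 for S in ind_sets if len(S) == r) for r in range(4)]
assert coeffs == [1, 6, 8, 2]
# ... and the sets containing v contribute t + 3t^2 + t^3.
with_v = [sum(1 for S in ind_sets if len(S) == r and 'v' in S) for r in range(4)]
assert with_v == [0, 1, 3, 1]

# Hence p(v) = (t + 3t^2 + t^3)/(1 + 6t + 8t^2 + 2t^3), e.g. p(v) = 5/17 at t = 1.
t = 1.0
num = sum(c * t**r for r, c in enumerate(with_v))
den = sum(c * t**r for r, c in enumerate(coeffs))
assert abs(num / den - 5.0 / 17.0) < 1e-12
```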
This construction establishes correlation decay for general graphs of maximum
degree k and subcritical activities z_u satisfying (6.4.7.1). Using telescoping as in
Sect. 5.2.3, Weitz [We06] further deduced that for such a family of graphs one can
approximate ind_G(z) for non-negative weights z = (z_v) within relative error ε in
time polynomial in |V| and ε^{-1} as long as (6.4.7.1) holds. In particular, as long as
Δ(G) ≤ 5, the value of ind_G(1, ..., 1), that is, the number of independent sets in
G, can be efficiently approximated. On the other hand, Sly [Sl10] and Sly and Sun
[SS14] showed that the approximate counting of independent sets is computationally
hard when (6.4.7.1) is violated.

6.5 The Roots on and Near the Real Axis

We note that
$$\frac{(\Delta-1)^{\Delta-1}}{(\Delta-2)^{\Delta}} = \frac{e}{\Delta}\left(1 + O\left(\frac{1}{\Delta}\right)\right) \quad\text{as}\quad \Delta \longrightarrow +\infty.$$
Although the above bound and (6.1.5.1) are both inversely proportional to Δ(G), the
correlation decay bound above achieves a better constant.
Sokal conjectured [S01b] that for any 0 < ε < 1 there exists δ = δ(ε) > 0 such
that for any graph G with the largest degree of a vertex not exceeding Δ > 2, we
have ind_G(z, ..., z) ≠ 0 provided
$$0 \leq \Re z \leq (1-\epsilon)\,\frac{(\Delta-1)^{\Delta-1}}{(\Delta-2)^{\Delta}} \quad\text{and}\quad |\Im z| \leq \delta.$$
Should this conjecture be true, the technique of Lemma 1.2.3, see also [PR16]
and Sections 3.7, 5.1.7 and 6.1.5, would allow us to bridge the gap between the
approximations achievable via the Taylor polynomial method and the correlation
decay method.¹
Below we present a result of Regts [Re16] confirming the absence of roots
near the positive real axis "halfway between" the Dobrushin–Scott–Sokal bound
(6.1.5.1) and the conjectured Sokal bound.

6.5.1 Theorem. Let us choose 0 < ε < 1. Let G be a graph with the largest
degree of a vertex not exceeding Δ ≥ 2. Then
$$\mathrm{ind}_G(z) \neq 0$$
for all activities z = (z_v) such that
$$|z_v| \leq \tan\frac{\pi}{(2+2\epsilon)(\Delta-1)} \quad\text{and}\quad |\arg z_v| \leq \frac{\epsilon\pi}{2+2\epsilon} \quad\text{for all}\quad v \in V.$$
The proof is based on the following geometric lemma.


6.5.2 Lemma. Let us fix a real 0 <  < 1, let d ≥ 1 be an integer and for k ≤ d let
w1 , . . . , wk be complex numbers such that
    π
w j  ≤ 1 and arg w j  ≤ for j = 1, . . . , k.
(2 + 2)d

Let z be a complex number such that


π π
|z| ≤ tan and | arg z| ≤
(2 + 2)d 2 + 2

and let
1
w=
1 + zw1 · · · wk

Then π
|w| ≤ 1 and | arg w| ≤ .
(2 + 2)d

Proof. Clearly,
$$|w_1 \cdots w_k| \leq 1 \quad\text{and}\quad |\arg(w_1 \cdots w_k)| \leq \frac{\pi}{2+2\epsilon}.$$
In particular, ℜ(zw_1 ⋯ w_k) ≥ 0 and hence |1 + zw_1 ⋯ w_k| ≥ 1 and |w| ≤ 1.
Moreover, see Fig. 6.13,
$$|\arg(1 + zw_1 \cdots w_k)| \leq \arctan|zw_1 \cdots w_k| \leq \arctan|z| \leq \frac{\pi}{(2+2\epsilon)d}.$$

¹ Added in Proofs: The conjecture was proved in H. Peters and G. Regts, "On a conjecture of Sokal
concerning roots of the independence polynomial", preprint arXiv:1701.08049 (2017).

Fig. 6.13 The real axis (horizontal), the vectors u = zw_1 ⋯ w_k and 1 + u

The proof now follows.
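A random sampling check of the lemma as stated above (our sketch; ε and d are the lemma's parameters):

```python
import cmath
import math
import random

eps, d = 0.5, 4
bound = math.pi / ((2 + 2 * eps) * d)   # the angular bound of the lemma

random.seed(1)
for _ in range(10000):
    k = random.randint(1, d)
    # w_j in the sector |w_j| <= 1, |arg w_j| <= pi / ((2 + 2 eps) d)
    ws = [random.random() * cmath.exp(1j * random.uniform(-bound, bound))
          for _ in range(k)]
    # z in the sector |z| <= tan(bound), |arg z| <= eps*pi / (2 + 2 eps)
    arg_z = random.uniform(-1, 1) * eps * math.pi / (2 + 2 * eps)
    z = random.random() * math.tan(bound) * cmath.exp(1j * arg_z)
    prod = 1
    for wj in ws:
        prod *= wj
    w = 1 / (1 + z * prod)
    # The conclusion of the lemma: w stays in the same sector as the w_j.
    assert abs(w) <= 1 + 1e-9
    assert abs(cmath.phase(w)) <= bound + 1e-9
```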

6.5.3 Proof of Theorem 6.5.1. The proof is somewhat similar to that of Theorem
6.1.3. We proceed by induction on the number |V| of vertices of G (the outer
induction). If |V| = 1, the result clearly holds, so we assume that |V| > 1.
We embed in the proof another inductive argument. Namely, we prove by induction
on |V| that if G = (V, E) is a graph with largest degree Δ(G) ≤ Δ of a vertex and
if v is a vertex of degree at most Δ − 1 then
$$\mathrm{ind}_G(z) \neq 0, \quad \left|\frac{\mathrm{ind}_{G-v}(z)}{\mathrm{ind}_G(z)}\right| \leq 1 \quad\text{and}\quad \left|\arg\frac{\mathrm{ind}_{G-v}(z)}{\mathrm{ind}_G(z)}\right| \leq \frac{\pi}{(2+2\epsilon)(\Delta-1)}. \qquad (6.5.3.1)$$
The case of |V| = 1 is easy to check, so we assume that |V| ≥ 2. As in the proof
of Theorem 6.1.3, we use the recursive formulas (6.1.3.1) and (6.1.3.2) and note
that the product in the right hand side of (6.1.3.2) contains k ≤ Δ − 1 factors. If
k = 0, so that v is an isolated vertex of G, then ind_{G−v}(z) ≠ 0 by the outer induction
hypothesis and
$$\frac{\mathrm{ind}_{G-v}(z)}{\mathrm{ind}_G(z)} = \frac{1}{1 + z_v},$$
so that (6.5.3.1) holds. Hence we assume that k > 0 and v has neighbors v_1, ..., v_k
in G.
Since the degree of v_i in G − v − v_1 − ... − v_{i-1} does not exceed Δ − 1, by the
induction hypothesis, we have
$$\left|\frac{\mathrm{ind}_{G-v-v_1-\cdots-v_i}(z)}{\mathrm{ind}_{G-v-v_1-\cdots-v_{i-1}}(z)}\right| \leq 1 \quad\text{and}\quad \left|\arg\frac{\mathrm{ind}_{G-v-v_1-\cdots-v_i}(z)}{\mathrm{ind}_{G-v-v_1-\cdots-v_{i-1}}(z)}\right| \leq \frac{\pi}{(2+2\epsilon)(\Delta-1)} \quad\text{for}\quad i = 1, \ldots, k. \qquad (6.5.3.2)$$
Applying Lemma 6.5.2 with d = Δ − 1, we deduce from (6.1.3.1) and (6.1.3.2) that
(6.5.3.1) holds, which completes the inner induction.
It remains to check that ind_G(z) ≠ 0 if the degree of every vertex v of G is Δ.
Let us pick an arbitrary vertex v. We still use (6.1.3.1) and (6.1.3.2), only that the

product in the right hand side of (6.1.3.2) now contains Δ factors. Since the degree
of v_i in G − v − v_1 − ... − v_{i-1} still does not exceed Δ − 1, we still have (6.5.3.2).
From (6.1.3.2) and (6.5.3.2) we conclude that
$$\left|\arg\left(z_v\,\frac{\mathrm{ind}_{G-v-N_v}(z)}{\mathrm{ind}_{G-v}(z)}\right)\right| \leq \frac{\Delta\pi}{(2+2\epsilon)(\Delta-1)} + \frac{\epsilon\pi}{2+2\epsilon} < \pi$$
and ind_G(z) ≠ 0 by (6.1.3.1).

The correlation decay method for complex activities is explored in [H+16].
Our next goal is to prove that among the roots of the univariate independence
polynomial nearest to the origin, one is necessarily real and hence negative real
[SS05]. More generally, we prove the following result.

6.5.4 Theorem. Let G = (V, E) be a graph and let x = (x_v : v ∈ V) be non-negative
real activities at the vertices of G, so that x_v ≥ 0 for all v ∈ V. For ζ ∈ ℂ
let us define ζx = (ζx_v : v ∈ V) and let
$$g(\zeta) = \mathrm{ind}_G(\zeta x)$$
be the corresponding univariate polynomial. Then
$$\min_{\zeta\in\mathbb{C}:\ g(\zeta)=0} |\zeta| = \min_{\zeta\in\mathbb{R}:\ g(\zeta)=0} |\zeta|,$$
that is, among the roots of g(ζ) nearest to the origin, one is negative real.

We follow [Lo12], Sect. 5.3.1. First, we define the chromatic polynomial of a
graph.

6.5.5 Lemma. Let G = (V, E) be a graph without loops or multiple edges. For a
positive integer n, let chr_G(n) be the number of ways to color the vertices of G using
a set of at most n distinct colors so that no two vertices spanning an edge are colored
with the same color. Then
1. For k = 1, ..., |V| there exist integers a_k(G) such that
$$(-1)^{|V|-k}\, a_k(G) \geq 0 \quad\text{for}\quad k = 1, \ldots, |V|$$
and
$$\mathrm{chr}_G(n) = \sum_{k=1}^{|V|} a_k(G)\, n^k \quad\text{for all positive integers } n.$$
2. For k = 1, ..., |V| there exist integers b_k(G) such that
$$b_k(G) \geq 0 \quad\text{for}\quad k = 1, \ldots, |V|$$
and
$$\mathrm{chr}_G(n) = \sum_{k=1}^{|V|} b_k(G) \binom{n}{k} \quad\text{for all positive integers } n.$$

Proof. To prove Part (1), we proceed by induction on the number |E| of edges of G.
If |E| = 0, that is, if G consists of |V| isolated vertices, then chr_G(n) = n^{|V|} and
the result follows. Suppose now that |E| > 0 and let e ∈ E be an edge of G. Let
G − e be the graph with vertex set V and edge set E \ {e}, so that G − e
is obtained from G by deleting the edge e. Let G/e be the graph obtained from G
by contracting the edge e: we obtain the vertex set of G/e by replacing the
endpoints u, v of e in V by a single new vertex w, and we obtain the edge set of
G/e by removing e from E and replacing all edges in E with one endpoint in {u, v}
by the edges with the corresponding endpoint at w (should multiple edges arise, we
replace them by a single edge), see Fig. 6.14.
It is not hard to see that
$$\mathrm{chr}_G(n) = \mathrm{chr}_{G-e}(n) - \mathrm{chr}_{G/e}(n). \qquad (6.5.5.1)$$
Since the graph G − e has |V| vertices, the graph G/e has |V| − 1 vertices, and both
G − e and G/e contain fewer than |E| edges, the proof follows by induction from
(6.5.5.1).
To prove Part (2), we define b_k(G) as the number of ways to color the vertices of
G using exactly k colors so that no two neighbors are colored with the same color.
Clearly b_k(G) ≥ 0. To color the graph using at most n colors, we choose a subset
of k colors in $\binom{n}{k}$ ways and then color the graph in b_k(G) ways using all chosen
colors.
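The deletion-contraction identity (6.5.5.1) translates directly into a small recursive procedure (our sketch, with graphs given as sets of 2-element frozensets):

```python
def chromatic(n, vertices, edges):
    # Deletion-contraction (6.5.5.1): chr_G(n) = chr_{G-e}(n) - chr_{G/e}(n).
    # `vertices` is a frozenset; `edges` is a set of frozensets {u, v}.
    if not edges:
        return n ** len(vertices)          # isolated vertices: n^|V| colorings
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}                  # the graph G - e
    # The graph G/e: merge v into u, drop loops, collapse multiple edges.
    contracted = {frozenset(u if x == v else x for x in f) for f in deleted}
    contracted = {f for f in contracted if len(f) == 2}
    return (chromatic(n, vertices, deleted)
            - chromatic(n, vertices - {v}, contracted))

# Sanity checks: a triangle has n(n-1)(n-2) proper colorings,
# a path on 3 vertices has n(n-1)^2.
tri = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3)]}
assert chromatic(4, frozenset({1, 2, 3}), tri) == 4 * 3 * 2
path = {frozenset(p) for p in [(1, 2), (2, 3)]}
assert chromatic(3, frozenset({1, 2, 3}), path) == 3 * 2 * 2
```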

The polynomial chr_G is called the chromatic polynomial of the graph G. We can
formally define
$$\mathrm{chr}_G(z) = \sum_{k=1}^{|V|} a_k(G)\, z^k = \sum_{k=1}^{|V|} b_k(G) \binom{z}{k}$$
for any complex z ∈ ℂ, where

Fig. 6.14 A graph G, its edge e and graphs G − e and G/e
Fig. 6.15 Graphs G and G(S) for S = {1, 2, 2, 2, 3, 3}

$$\binom{z}{k} = \frac{z(z-1)\cdots(z-k+1)}{k!}.$$

Next, we connect the independence and chromatic polynomials of graphs. Given
a graph G = (V, E) and a multiset S of copies of vertices of G (that is, some vertices
of G can have multiple copies in S and some can have no copies), we define the graph
G(S) with set S of vertices as follows: an edge of G(S) connects two vertices u and
v of S if and only if u and v are copies of the same vertex of G or copies of vertices
connected by an edge in G, see Fig. 6.15.
For a multiset S and activities z_v at the vertices of G, we define the monomial
$$z^S = \prod_{v \in S} z_v,$$
where each vertex in S is accounted for with its multiplicity. Our goal is to obtain a
power series expansion of ln ind_G(z), where z = (z_v : v ∈ V) is a vector of activities
at the vertices of G sufficiently close to 0, so that
$$|1 - \mathrm{ind}_G(z)| < 1.$$
In this case, we choose the branch of ln ind_G(z) that is 0 when z_v = 0 for all v ∈ V.

6.5.6 Lemma. Let G = (V, E) be a graph and let δ > 0 be a sufficiently small real
number such that
$$|1 - \mathrm{ind}_G(z)| < 1 \quad\text{provided}\quad |z_v| \leq \delta \quad\text{for all}\quad v \in V.$$
Then
$$\ln \mathrm{ind}_G(z) = \sum_{S=\{v_1,\ldots,v_1,\ldots,v_r,\ldots,v_r\}} \frac{1}{\mu_1! \cdots \mu_r!}\, a_1(G(S))\, z^S,$$
where the sum is taken over multisets S of vertices of G, μ_i is the multiplicity of v_i
in S and
$$a_1(G(S)) = \left.\frac{d}{dz}\, \mathrm{chr}_{G(S)}(z)\right|_{z=0}$$
is the first coefficient of the chromatic polynomial of G(S). Moreover, the series
converges absolutely and uniformly on the polydisc |z_v| ≤ δ for v ∈ V.

Proof. Let us fix some x ∈ C and consider the function

    z ↦ (1 + z)^x = e^{x \ln(1+z)} for z ∈ C such that |z| < 1,

where we choose the branch of ln(1 + z) that is 0 for z = 0. We have the Taylor
series expansion

    (1 + z)^x = 1 + \sum_{k=1}^{\infty} \binom{x}{k} z^k provided |z| < 1. \qquad (6.5.6.1)

Moreover, the series converges absolutely and uniformly on compact sets inside the
polydisc |z| < 1 and |x| ≤ 1.
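Expansion (6.5.6.1) is Newton's binomial series; a quick numerical sanity check, purely illustrative (all names are ours):

```python
import cmath

def gen_binom(x, k):
    """Generalized binomial coefficient x(x-1)...(x-k+1)/k! for complex x."""
    result = complex(1)
    for j in range(k):
        result *= (x - j) / (j + 1)
    return result

def binomial_series(x, z, terms=80):
    """Partial sum 1 + sum_{k>=1} binom(x,k) z^k of (6.5.6.1)."""
    return 1 + sum(gen_binom(x, k) * z ** k for k in range(1, terms))

x, z = 0.7 + 0.2j, 0.3 - 0.1j            # |z| < 1, |x| <= 1
exact = cmath.exp(x * cmath.log(1 + z))  # the branch with ln(1 + 0) = 0
assert abs(binomial_series(x, z) - exact) < 1e-12
```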
From (6.5.6.1), we get

    \operatorname{ind}_G^x(z) = 1 + \sum_{k=1}^{\infty} \binom{x}{k} \Bigg( \sum_{\substack{S ⊂ V,\, |S| > 0 \\ S \text{ independent}}} z^S \Bigg)^{k}. \qquad (6.5.6.2)

Furthermore, we write

    \Bigg( \sum_{\substack{S ⊂ V,\, |S| > 0 \\ S \text{ independent}}} z^S \Bigg)^{k} = \sum_{\substack{S_1, \ldots, S_k ⊂ V \\ |S_1|, \ldots, |S_k| > 0 \\ S_1, \ldots, S_k \text{ independent}}} z^{S_1} \cdots z^{S_k},

where the sum is taken over all ordered k-tuples of not necessarily distinct non-empty
independent sets S1 , . . . Sk of G. Given such a k-tuple S1 , . . . , Sk of independent sets,
let S = S1  . . .  Sk be the disjoint union of copies of S1 , . . . , Sk and let G(S) be
the corresponding graph with set S of vertices. Then G(S) can be colored using
exactly k colors, so that no two vertices spanning an edge are colored with the same
color (we call this a proper k-coloring). Conversely, given a multiset S of possibly
multiple copies of vertices of G, each proper k-coloring of G(S) corresponds to a
representation S = S1  . . .  Sk , where S1 , . . . , Sk are non-empty independent sets
in G, as follows: if a copy of a vertex v of G in S is colored with the i-th color
then we include v in Si . If S consists of copies of r distinct vertices v1 , . . . , vr with
respective multiplicities μ1 , . . . , μr then exactly μ1 ! · · · μr ! of proper k-colorings of
G(S) correspond to the same ordered k-tuple S1 , . . . , Sk of non-empty independent
sets of G. From (6.5.6.2), we can write

    \operatorname{ind}_G^x(z) = 1 + \sum_{S=\{v_1,\ldots,v_1,\ldots,v_r,\ldots,v_r\}} \frac{z^S}{\mu_1! \cdots \mu_r!} \sum_{k=1}^{|S|} \binom{x}{k} b_k(G(S)),

where the sum is taken over all non-empty multisets S of vertices of G while bk (G(S))
is the number of proper k-colorings of G(S) and μ1 , . . . , μr are the multiplicities of
vertices in S. From Part (2) of Lemma 6.5.5, we obtain

    \operatorname{ind}_G^x(z) = 1 + \sum_{S=\{v_1,\ldots,v_1,\ldots,v_r,\ldots,v_r\}} \operatorname{chr}_{G(S)}(x)\, \frac{z^S}{\mu_1! \cdots \mu_r!} \qquad (6.5.6.3)

and the series converges absolutely and uniformly on the polydisc |z v | ≤ δ and
|x| ≤ 1, say. On the other hand, for a > 0 we can write

    \ln a = \frac{d}{dz}\, e^{z \ln a} \Big|_{z=0}.

Computing the derivative of (6.5.6.3) at x = 0, we obtain


    \ln \operatorname{ind}_G(z) = \sum_{S=\{v_1,\ldots,v_1,\ldots,v_r,\ldots,v_r\}} \frac{1}{\mu_1! \cdots \mu_r!}\, a_1(G(S))\, z^S,

where a1 (G(S)) is the first coefficient of the chromatic polynomial of G(S). 


Now we are ready to prove Theorem 6.5.4.
6.5.7 Proof of Theorem 6.5.4. For sufficiently small δ > 0 we have

|1 − g(ζ)| < 1 provided |ζ| ≤ δ

and hence by Lemma 6.5.6, we have a univariate power series expansion


    \ln g(\zeta) = \sum_{S=\{v_1,\ldots,v_1,\ldots,v_r,\ldots,v_r\}} \frac{1}{\mu_1! \cdots \mu_r!}\, a_1(G(S))\, \zeta^{|S|} x^S. \qquad (6.5.7.1)

It follows then that the distance ρ0 from 0 to the nearest root of g(ζ) is the radius of
convergence of (6.5.7.1), see also Lemma 2.2.1. Since x1 , . . . , xn ≥ 0, we have

x^S ≥ 0 for all S.

By Part (1) of Lemma 6.5.5, we have

(−1)^{|S|} a_1(G(S)) ≤ 0 for all S.

Therefore, the maximum absolute value of the series (6.5.7.1) on any disc |ζ| ≤ ρ
where it converges is attained at ζ = −ρ and equal to the sum
    \sum_{S=\{v_1,\ldots,v_1,\ldots,v_r,\ldots,v_r\}} \frac{1}{\mu_1! \cdots \mu_r!}\, |a_1(G(S))|\, \rho^{|S|} x^S \qquad (6.5.7.2)

of non-negative real numbers. In other words, (6.5.7.1) converges in the disc |ζ| ≤ ρ
if and only if the series of non-negative real numbers (6.5.7.2) converges. Hence the
radius ρ_0 of convergence of (6.5.7.1) is the smallest ρ > 0 where (6.5.7.2) diverges
and −ρ_0 is necessarily a root of g(ζ). □

6.6 On the Local Nature of Independent Sets

Let us compare the correlation decay approach of Sects. 6.3 and 6.4 and the
Taylor polynomial interpolation method of Sect. 6.1.5. The correlation decay method
is based on the observation that for subcritical activities z v , the independence polyno-
mial can be approximated based on the local structure of the graph in a neighborhood
of each vertex. The Taylor polynomial interpolation method, again for sufficiently
small activities, relies on the information about independent sets of a small (loga-
rithmic) size. Such sets can be scattered all over the graph, so it may appear that we
rely on some global structural properties of the graph. Here we show that this is an
illusion, as the Taylor polynomial interpolation can also be done based on the local
information only. Namely, we show that the sum of weights
 
    \sum_{\substack{S ⊂ V \\ S \text{ independent} \\ |S| = k}} \ \prod_{v \in S} z_v

of independent k-subsets in a graph G = (V, E; z) can be computed entirely from


the data contained in the family of (k − 1)-neighborhoods of the vertices of the
graph. Besides, we show that if the maximum degree of a vertex of the graph is
bounded above in advance, then the interpolation in Sect. 6.1.5 can be done in genuine
polynomial and not just in quasi-polynomial time. Our exposition is loosely based
on [PR16].

6.6.1 Definitions A graph with multiplicities is an undirected graph H = (U, R; μ)


with set U of vertices, set R of edges, without loops or multiple edges, and with
positive integers μ(u), called multiplicities, assigned to its vertices u ∈ U . We say
that two such graphs H1 = (U1 , R1 ; μ1 ) and H2 = (U2 , R2 ; μ2 ) are isomorphic if
there is a bijection φ : U1 −→ U2 , called an isomorphism, such that {φ(u), φ(v)} is
an edge of H2 if and only if {u, v} is an edge of H1 and such that the multiplicity of
φ(u) in H2 is equal to the multiplicity of u in H1 .

Let G = (V, E) be a graph and let H = (U, R; μ) be a graph with multiplicities.
A map ψ : U −→ V is called an embedding if ψ is an injection and {ψ(u), ψ(v)}
is an edge of G if and only if {u, v} is an edge of H (multiplicities of vertices of
H play no role here). Given a graph G = (V, E; z) with set V of vertices, set E of
edges and complex activities z v at the vertices of G and a graph H = (U, R; μ) with
multiplicities, we define a partition function
    i_H(G) = \sum_{\substack{ψ:\, U → V \\ ψ \text{ is an embedding}}} \ \prod_{u \in U} z_{ψ(u)}^{μ(u)}.

In particular, if F_k is a graph with k vertices, no edges and multiplicity 1 of each
vertex, then

    i_{F_k}(G) = k! \sum_{\substack{S ⊂ V \\ S \text{ independent} \\ |S| = k}} \ \prod_{v \in S} z_v, \qquad (6.6.1.1)

since every independent k-set of G can be obtained as the image of F_k in exactly k!
ways. Note that i_{F_k}(G) is what we need to reconstruct the independence polynomial
of G, since

    \operatorname{ind}_G(z) = 1 + \sum_{k=1}^{|V|} \frac{1}{k!}\, i_{F_k}(G).
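For small graphs, i_H(G) and the identity (6.6.1.1) can be checked by brute force over all injective maps (a sketch with our own encoding: adjacency dicts for G and H, activities and multiplicities as dicts; nothing here is from the book):

```python
from itertools import permutations, combinations

def i_H(G_adj, z, H_adj, mu):
    """i_H(G): sum over embeddings psi: U -> V of prod_u z[psi(u)]^mu[u].
    An embedding is an injection mapping edges to edges and
    non-edges to non-edges."""
    HV = list(H_adj)
    total = 0.0
    for image in permutations(G_adj, len(HV)):
        psi = dict(zip(HV, image))
        if all((psi[b] in G_adj[psi[a]]) == (b in H_adj[a])
               for a in HV for b in HV if a != b):
            term = 1.0
            for u in HV:
                term *= z[psi[u]] ** mu[u]
            total += term
    return total

G = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}   # a 4-cycle
z = {1: 0.5, 2: 0.25, 3: 2.0, 4: 1.5}

F2 = {'a': set(), 'b': set()}   # two vertices, no edges, multiplicities 1
mu = {'a': 1, 'b': 1}
indep = sum(z[u] * z[v] for u, v in combinations(G, 2) if v not in G[u])
assert abs(i_H(G, z, F2, mu) - 2 * indep) < 1e-12   # (6.6.1.1) with k! = 2
```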

6.6.2 Decomposition into connected graphs. Suppose now that the graph H =
(U, R; μ) is connected. Then i H (G) collects only the local information regarding
G = (V, E). Indeed, let u be an arbitrary vertex of H . Once we know the image
ψ(u) ∈ V under the embedding ψ : U −→ V , we know that for every w ∈ H
the image ψ(w) is connected to ψ(u) by a path of m edges in G if and only if w is
connected to u in H by a path of m edges. Hence the image of H lies entirely in the
(k − 1)-neighborhood of a vertex of G for k = |U |.

The crucial observation is that for any graph H with multiplicities, the value of
i_H(G) can be expressed in terms of i_{H'}(G) for connected graphs H' with multi-
plicities, such that each H' has at most as many vertices as H has and the sum of
multiplicities of the vertices of each H' is at most the sum of multiplicities of the
vertices of H. Indeed, assuming that H is not connected, let us represent it as a
vertex-disjoint union H = H1 ∪ H2 such that there are no edges of H connecting a
vertex of H1 with a vertex of H2 . Expanding the product i H1 (G) · i H2 (G), we observe
that we collect all the terms of i H (G), but also some extra terms, so that

    i_H(G) = i_{H_1}(G) \cdot i_{H_2}(G) − \sum_{H'} i_{H'}(G), \qquad (6.6.2.1)

where H' is a graph with multiplicities obtained from H_1 and H_2 by the following
sequence of two operations (a) and (b), at least one of which is applied:
(a) we identify some vertices of H_1 with some vertices of H_2, so that if u_1 is
identified with u_2 then the new vertex u of H' is assigned the multiplicity μ(u_1) +
μ(u_2); and
(b) we connect some vertices of H_1, unchanged on step (a), with some vertices of
H_2, unchanged on step (a), by edges.
Whenever we create a multiple edge, we replace it by a single edge. We observe
that the number of connected components in each H' so obtained is smaller than the
number of connected components of H. Iterating this procedure, we express i_H(G)
entirely in terms of i_{H'}(G) with connected H'.

Fig. 6.16  A disconnected graph with multiplicities F_2 and connected graphs with multiplicities H_1, H_{2a} and H_{2b}

Fig. 6.17  A disconnected graph with multiplicities F_3 and connected graphs with multiplicities H_1, H_{2a}, H_{2b}, H_{3a}, H_{3b}, H_{3c} and H_{3d}

For example, for the graphs with multiplicities pictured on Fig. 6.16, we have

    i_{F_2}(G) = i_{H_1}(G) \cdot i_{H_1}(G) − i_{H_{2a}}(G) − i_{H_{2b}}(G).

A more tedious computation shows that for the graphs with multiplicities pictured
on Fig. 6.17, we have

    i_{F_3}(G) = i_{H_1}(G) \cdot i_{H_1}(G) \cdot i_{H_1}(G) − 3\, i_{H_{2a}}(G)\, i_{H_1}(G) − 3\, i_{H_{2b}}(G)\, i_{H_1}(G) + 2\, i_{H_{3a}}(G) + 6\, i_{H_{3b}}(G) + 3\, i_{H_{3c}}(G) + 2\, i_{H_{3d}}(G). \qquad (6.6.2.2)
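Reading Fig. 6.16 as H_1 a single vertex of multiplicity 1, H_{2a} a single vertex of multiplicity 2 and H_{2b} a single edge with multiplicities (1, 1) — our interpretation of the figure — the displayed identity for i_{F_2}(G) can be verified numerically:

```python
def i_H1(G, z):                       # one vertex, multiplicity 1
    return sum(z[v] for v in G)

def i_H2a(G, z):                      # one vertex, multiplicity 2
    return sum(z[v] ** 2 for v in G)

def i_H2b(G, z):                      # an edge, multiplicities (1, 1)
    return sum(z[u] * z[v] for u in G for v in G[u])   # ordered pairs

def i_F2(G, z):                       # two isolated vertices
    return sum(z[u] * z[v] for u in G for v in G
               if u != v and v not in G[u])            # ordered pairs

G = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}       # a 4-cycle
z = {1: 0.5, 2: 0.25, 3: 2.0, 4: 1.5}
lhs = i_F2(G, z)
rhs = i_H1(G, z) ** 2 - i_H2a(G, z) - i_H2b(G, z)
assert abs(lhs - rhs) < 1e-12
```

The correction terms subtract exactly the non-injective and non-independent contributions to the square of i_{H_1}(G), which is the content of (6.6.2.1).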

6.6.3 The case of a bounded degree. Suppose now that the maximum degree
Δ(G) of a vertex of G = (V, E) is bounded above in advance. Then the procedure
of computing (6.6.1.1) can be done in polynomial time |V|^{O(1)}, as long as k =
O(ln |V|). The algorithm proceeds as follows.

First, we create a list of connected graphs H = (U, R; μ) with multiplicities


such that there is an embedding ψ : U −→ V and the sum of weights at the
vertices of H does not exceed k. Since H is connected, we can always order the
vertices u 1 , . . . , u m , m ≤ k, of H in such a way that every vertex u i for i ≥ 2 has a
neighbor among the preceding vertices. Once the image ψ(u 1 ) is chosen (in at most
|V | ways) then for each vertex u i there are at most (G) choices of ψ(u i ), given
6.6 On the Local Nature of Independent Sets 227

that the images u 1 , . . . , u i−1 are already chosen. This creates a list H of at most
|V |((G))
 k−1 
k−1
= |V | O(1) graphs H . Note that if H has m vertices then there are
m−1
ways to assign positive integer weights to the vertices of H so that the sum of
weights is k, which is |V | O(1) as long as k = O(ln |V |).
For each graph H from the list \mathcal{H}, we compute i_H(G) in |V|^{O(1)} time.
Next, we create a list \mathcal{H}' consisting of the graphs with multiplicities H that are
represented as a union of a connected graph from \mathcal{H} and some isolated vertices and
such that the sum of multiplicities at the vertices of H does not exceed k. Given a
graph H ∈ \mathcal{H}' \setminus \mathcal{H}, we write H = H_1 ∪ H_2, where H_1 ∈ \mathcal{H} and H_2 consists of the
isolated vertices, and apply the algorithm of Sect. 6.6.2. Note that all graphs H' in
(6.6.2.1) with i_{H'}(G) ≠ 0 (we only need to collect those) also belong to \mathcal{H}' and have
fewer isolated vertices than H has. When applying (6.6.2.1), we should account
for isomorphic graphs H' (this is how we get integer coefficients in the formula
(6.6.2.2)). However, testing the isomorphism of two graphs H'_1, H'_2 ∈ \mathcal{H}' reduces to
testing the isomorphism of their connected components H_1, H_2 ∈ \mathcal{H}, which can be
done in |V|^{O(1)} time as above: once we picked the image of a vertex of H_1 under
a prospective isomorphism φ, we have at most Δ(G) choices for the image of each
next vertex. Thus we recursively compute i_H(G) for all H ∈ \mathcal{H}' in the order of the
increasing number of isolated vertices, so that in the end we compute (6.6.1.1).
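The enumeration underlying the first step — growing connected pieces one neighbor at a time — can be sketched for vertex subsets as follows (illustrative only; the function name and encoding are ours):

```python
def connected_subsets(G, k):
    """All connected vertex subsets of G with at most k vertices.
    Each subset is grown one neighbour at a time, so it admits an
    ordering u_1, ..., u_m in which every u_i (i >= 2) has a
    neighbour among the preceding vertices, as in Sect. 6.6.3."""
    found = set()

    def grow(subset):
        frozen = frozenset(subset)
        if frozen in found:
            return
        found.add(frozen)
        if len(subset) == k:
            return
        boundary = {w for v in subset for w in G[v]} - subset
        for w in boundary:
            grow(subset | {w})

    for v in G:
        grow({v})
    return found

# Path 1-2-3-4-5: the connected subsets are exactly the sub-paths
P5 = {i: {j for j in (i - 1, i + 1) if 1 <= j <= 5} for i in range(1, 6)}
assert len(connected_subsets(P5, 3)) == 5 + 4 + 3
```

With the maximum degree bounded, the number of such subsets is polynomial in |V| for k = O(ln |V|), which is what makes the overall algorithm genuinely polynomial.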
Chapter 7
The Graph Homomorphism Partition
Function

Known in statistical physics as the partition function of a multi-spin system, this is


one of the most general forms of a partition function. It covers permanents, hafnians,
independent sets, graph colorings and some more exotic objects such as the Hamil-
tonian permanent. We apply the Taylor polynomial interpolation to find a domain
where the partition function can be efficiently approximated. This leads to “softer”
(doable) versions of “hard” (impossible) problems of combinatorial enumeration: for
example, instead of counting all independent sets of a given cardinality in a graph,
we compute the total weight of all subsets of vertices of a given cardinality, where
the weight of each subset is exponentially small in the number of edges of the graph
it spans. We discuss one of the oldest and most famous models in statistical physics,
the Ising model for magnetization, which connects various topics: perfect matchings,
graph homomorphisms, cuts in graphs and phase transitions of various kinds. The
Lee–Yang Theorem asserts that the zeros of the partition function of cuts lie on the
unit circle, which is interpreted as the absence of phase transition in the presence of
a magnetic field.

7.1 The Graph Homomorphism Partition Function

7.1.1 Definition. Let G = (V, E) be an undirected graph with set V of vertices, set
E of edges, without multiple edges or loops, and let A = (a_{ij}) be a k × k symmetric
real or complex matrix. We define the graph homomorphism partition function by

    \hom_G(A) = \sum_{\phi:\, V → \{1,\ldots,k\}} \ \prod_{\{u,v\} \in E} a_{\phi(u)\phi(v)}. \qquad (7.1.1.1)

The sum is taken over all maps φ of the set V of vertices into the set {1, . . . , k} of
indices of the matrix entries and the product is taken over all edges of the graph G.

© Springer International Publishing AG 2016 229


A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9_7

If A is the adjacency matrix of a graph H with set U of k vertices then hom G (A)
counts graph homomorphisms, that is, maps φ : V −→ U such that {φ(u), φ(v)} is
an edge of H whenever {u, v} is an edge of G.
Choosing the matrix A in a special way, we obtain various quantities of interest.
7.1.2 Example: independent sets. Let us choose k = 2 and

    A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.

Each map φ : V −→ {1, 2} defines a set of vertices S_φ = φ^{−1}(1) ⊂ V and the
contribution of φ in (7.1.1.1) is 1 if S_φ is an independent set in G and 0 otherwise.
Hence hom_G(A) counts independent sets in G.
7.1.3 Example: colorings. Let us define

    a_{ij} = \begin{cases} 1 & \text{if } i \neq j \\ 0 & \text{if } i = j. \end{cases}

We interpret every map φ : V −→ {1, . . . , k} as a coloring of the vertices of G into


one of the k colors. Then the contribution of φ in (7.1.1.1) is 1 if the coloring is proper,
that is, the endpoints of every edge are colored differently, and the contribution of φ
is 0 otherwise. Hence hom G (A) counts the proper colorings of G with k colors.
For more examples, see Sect. 5.3 of [Lo12].
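Definition 7.1.1 and the two examples above can be made concrete by brute-force evaluation of (7.1.1.1) (an illustrative sketch; we shift the indices to start at 0):

```python
from itertools import product

def hom(V, E, A):
    """hom_G(A): sum over all maps phi: V -> {0, ..., k-1} of the
    product of A[phi(u)][phi(v)] over the edges, as in (7.1.1.1)
    (indices shifted to start at 0)."""
    k = len(A)
    total = 0
    for phi in product(range(k), repeat=len(V)):
        assign = dict(zip(V, phi))
        term = 1
        for u, v in E:
            term *= A[assign[u]][assign[v]]
        total += term
    return total

V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1)]   # the 4-cycle

# Example 7.1.2: the 4-cycle has 7 independent sets
assert hom(V, E, [[0, 1], [1, 1]]) == 7

# Example 7.1.3: proper k-colorings of the cycle C_n number
# (k-1)^n + (-1)^n (k-1); here k = 3, n = 4 gives 18
k = 3
A = [[0 if i == j else 1 for j in range(k)] for i in range(k)]
assert hom(V, E, A) == 18
```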
Recall that by Δ(G) we denote the largest degree of a vertex of G.
7.1.4 Theorem. For a positive integer Δ, let

    δ_Δ = \sin \frac{α}{2} \cos \frac{Δα}{2}

for some α = α_Δ such that

    0 < α < \frac{2π}{3Δ},

so that we can choose δ_3 = 0.18, δ_4 = 0.13 and δ_Δ = Ω(1/Δ). Then for any graph
G with Δ(G) ≤ Δ, we have

    \hom_G(Z) ≠ 0

for any k × k complex symmetric matrix Z = (z_{ij}) such that

    |1 − z_{ij}| ≤ δ_Δ for all 1 ≤ i, j ≤ k.

A version of Theorem 7.1.4 was first proved in [BS14]. We present a simpler


proof achieving better constants.

As before, see Sect. 3.6, Theorem 4.1.5, Sects. 4.4, 5.5 and 6.1.5, we obtain that
hom G (A) is easily computable if the entries ai j satisfy a slightly stronger inequality.

7.1.5 Theorem. Let us fix a constant 0 < δ < δ_Δ, where δ_Δ is the constant in
Theorem 7.1.4. Then there exists a γ = γ(δ_Δ/δ) > 0 such that for any 0 < ε < 1, for
any graph G = (V, E) with Δ(G) ≤ Δ and any k there exists a polynomial
p = p_{G,k,δ,ε} in the entries of a k × k symmetric matrix A = (a_{ij}) such that

    \deg p ≤ γ (\ln |E| − \ln ε)

and

    |\ln \hom_G(A) − p(A)| ≤ ε

provided

    |1 − a_{ij}| ≤ δ for all 1 ≤ i, j ≤ k.

Moreover, given δ, G, ε > 0 and k, the polynomial p can be constructed in
(k|E|)^{O(\ln |E| − \ln ε)} time, where the implied constant in the “O” notation depends on
the ratio δ_Δ/δ alone. The proof is very similar to that of Theorem 3.6.2; we sketch it
below.
Let J = J_k be the k × k matrix filled with 1s. We define a univariate polynomial
g = g_{G,A} by

    g(z) = \hom_G(J + z(A − J)),

so that g(0) = \hom_G(J) = k^{|V|} and g(1) = \hom_G(A). We note that

    \frac{d^s}{dz^s} g(z) \Big|_{z=0} = \frac{d^s}{dz^s} \Big|_{z=0} \sum_{\phi:\, V → \{1,\ldots,k\}} \ \prod_{\{u,v\} \in E} \Big( 1 + z \big( a_{\phi(u)\phi(v)} − 1 \big) \Big)
    = \sum_{\phi:\, V → \{1,\ldots,k\}} \ \sum_{\{u_1,v_1\},\ldots,\{u_s,v_s\} \in E} \big( a_{\phi(u_1)\phi(v_1)} − 1 \big) \cdots \big( a_{\phi(u_s)\phi(v_s)} − 1 \big),

where the inner sum is taken over all ordered s-tuples I of distinct edges of G. Let
V(I) be the set of distinct vertices among u_1, v_1, \ldots, u_s, v_s. Then we can further
write

    \frac{d^s}{dz^s} g(z) \Big|_{z=0} = \sum_{\substack{I = (\{u_1,v_1\},\ldots,\{u_s,v_s\}) \\ \{u_1,v_1\},\ldots,\{u_s,v_s\} \in E}} k^{|V| − |V(I)|} \sum_{\phi:\, V(I) → \{1,\ldots,k\}} \big( a_{\phi(u_1)\phi(v_1)} − 1 \big) \cdots \big( a_{\phi(u_s)\phi(v_s)} − 1 \big).

Here the factor of k^{|V| − |V(I)|} accounts for the number of ways to extend a map
φ : V(I) −→ {1, \ldots, k} to the whole set V ⊃ V(I) of vertices. It follows now that
g^{(s)}(0) is a polynomial of degree s in the entries a_{ij} computable in (|E|k)^{O(s)} time.

We define f (z) = ln g(z) in a neighborhood of z = 0 and the proof proceeds as in


Sect. 3.6.7.
Patel and Regts show [PR16] that if Δ(G) is fixed in advance, then the value of
p(A) can be computed in polynomial time (k|E|/ε)^{O(1)}, where the implied constant
depends on the ratio δ_Δ/δ only.
Using Theorem 7.1.5, we obtain the following relaxed versions of hard counting
problems in Examples 7.1.2 and 7.1.3.

7.1.6 Example: sets weighted by independence. In the context of Example 7.1.2,
let us define A by

    A = \begin{pmatrix} 1−δ & 1+δ \\ 1+δ & 1+δ \end{pmatrix},

where δ is the constant of Theorem 7.1.5. Then

    (1+δ)^{−|E|} \hom_G(A) = \sum_{S ⊂ V} w(S), \quad \text{where} \ w(S) = (1+δ)^{−e(S)} (1−δ)^{e(S)}, \qquad (7.1.6.1)

and e(S) is the number of edges in G with both endpoints in S. In particular, w(S) = 1
if S is independent and

    \exp\{−2δ\, e(S) − δ^3 e(S)\} ≤ w(S) ≤ \exp\{−2δ\, e(S)\}.

Hence (7.1.6.1) is the sum over all subsets of vertices of G, where each subset S is
counted with weight 1 if S is independent and is counted with a weight exponentially
small in the number of edges that vertices of S span, if S is not independent.
As follows by Theorem 7.1.5, we can compute the sum (7.1.6.1) in quasi-
polynomial time (genuinely polynomial time, if Δ(G) is fixed in advance [PR16]).
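The identity (7.1.6.1) is easy to confirm numerically on a small graph (an illustrative sketch; the index 0 encodes membership in S):

```python
from itertools import product, combinations

def hom(V, E, A):
    """Brute-force evaluation of (7.1.1.1), indices starting at 0."""
    total = 0.0
    for phi in product(range(len(A)), repeat=len(V)):
        assign = dict(zip(V, phi))
        term = 1.0
        for u, v in E:
            term *= A[assign[u]][assign[v]]
        total += term
    return total

V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
delta = 0.1
A = [[1 - delta, 1 + delta],
     [1 + delta, 1 + delta]]   # index 0 encodes membership in S

def e(S):
    """Number of edges with both endpoints in S."""
    return sum(1 for u, v in E if u in S and v in S)

lhs = (1 + delta) ** (-len(E)) * hom(V, E, A)
rhs = sum((1 + delta) ** (-e(S)) * (1 - delta) ** e(S)
          for r in range(len(V) + 1)
          for S in map(set, combinations(V, r)))
assert abs(lhs - rhs) < 1e-9
```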

7.1.7 Colorings weighted by properness. In the context of Example 7.1.3, let us
define

    a_{ij} = \begin{cases} 1+δ & \text{if } i \neq j \\ 1−δ & \text{if } i = j, \end{cases}

where δ > 0 is the constant in Theorem 7.1.5. Then

    (1+δ)^{−|E|} \hom_G(A) = \sum_{φ:\, V → \{1,\ldots,k\}} w(φ), \quad \text{where} \ w(φ) = (1+δ)^{−e(φ)} (1−δ)^{e(φ)}, \qquad (7.1.7.1)

and e(φ) is the number of edges of G both of whose endpoints are colored with the same
color by the coloring φ. Thus we have w(φ) = 1 if φ is a proper coloring and

    \exp\{−2δ\, e(φ) − δ^3 e(φ)\} ≤ w(φ) ≤ \exp\{−2δ\, e(φ)\}.



Hence (7.1.7.1) represents the sum over all colorings φ of G, where φ is counted
with weight 1 if φ is proper and is counted with a weight exponentially small in the
number of edges that are not properly colored, if φ is not proper.
Theorem 7.1.5 implies that we can compute the sum (7.1.7.1) in quasi-polynomial
time.
To prove Theorem 7.1.4, we first introduce a multi-affine version of hom G .
7.1.8 Edge-colored graph homomorphisms. Let G = (V, E) be a graph as above
and let Z = (z^{uv}_{ij}) be an |E| × k(k+1)/2 complex matrix with entries indexed by edges
{u, v} ∈ E and unordered pairs 1 ≤ i, j ≤ k. We write z^{uv}_{ij} instead of z^{\{u,v\}}_{\{i,j\}}, assuming
that

    z^{uv}_{ij} = z^{uv}_{ji} = z^{vu}_{ji} = z^{vu}_{ij}.

Equivalently, we can think that a k × k symmetric matrix is attached to every edge
of G. We introduce the partition function

    \operatorname{Hom}_G(Z) = \sum_{φ:\, V → \{1,\ldots,k\}} \ \prod_{\{u,v\} \in E} z^{uv}_{φ(u)φ(v)},

which we call the partition function of edge-colored homomorphisms. If z^{uv}_{ij} = z_{ij}
(that is, the same symmetric matrix is attached to each edge of G), we are in the
situation of Definition 7.1.1 and Hom_G(Z) = hom_G(Z). The advantage of working
with Hom_G(Z) as opposed to hom_G(Z) is that Hom_G(Z) is a multi-affine function,
that is, the degree of Hom_G(Z) in each variable z^{uv}_{ij} does not exceed 1.
We will prove that in fact

    \operatorname{Hom}_G(Z) ≠ 0 provided |1 − z^{uv}_{ij}| ≤ δ_Δ for all \{u, v\} \in E and all 1 ≤ i, j ≤ k,

where δ_Δ is the constant of Theorem 7.1.4.


7.1.9 The recursion. For a sequence W = (v_1, \ldots, v_r) of distinct vertices of G and
a sequence L = (l_1, \ldots, l_r) of not necessarily distinct indices 1 ≤ l_1, \ldots, l_r ≤ k,
we define

    \operatorname{Hom}^W_L(Z) = \sum_{\substack{φ:\, V → \{1,\ldots,k\} \\ φ(v_1)=l_1,\ldots,φ(v_r)=l_r}} \ \prod_{\{u,v\} \in E} z^{uv}_{φ(u)φ(v)}

(we suppress the graph G in the notation). In words: we restrict the sum defining
Hom_G to the maps φ that map prescribed vertices to prescribed indices. We denote
by |W| and by |L| the number of vertices in W and the number of indices in L
respectively. If W is a sequence of distinct vertices and L is a sequence of not
necessarily distinct indices such that |W| = |L|, for a vertex w in W we denote by
l(w) the corresponding index in L, so l(v_i) = l_i in the above definition. We denote

by (W, u) the sequence W appended by a vertex u, different from all vertices in
W, and by (L, l) the sequence of indices L appended by an index l, not necessarily
different from the indices in L. Then, for any vertex u not in the sequence W, we
have

    \operatorname{Hom}^W_L(Z) = \sum_{l=1}^{k} \operatorname{Hom}^{(W,u)}_{(L,l)}(Z). \qquad (7.1.9.1)

For a 0 < δ < 1 we define the polydisc U(δ) = U(δ, G) consisting of all
|E| × k(k+1)/2 matrices Z = (z^{uv}_{ij}) such that

    |1 − z^{uv}_{ij}| ≤ δ for all \{u, v\} \in E and 1 ≤ i, j ≤ k.

We will use the following straightforward observation: suppose that W contains
two vertices u and v such that {u, v} ∈ E with corresponding indices l and m in L,
so that W = (W', u, v) and L = (L', l, m). Let A, B ∈ U(δ) be two matrices that
differ only in the entries z^{uv}_{ij} for 1 ≤ i, j ≤ k. Then

    \operatorname{Hom}^W_L(A) = \frac{a^{uv}_{lm}}{b^{uv}_{lm}}\, \operatorname{Hom}^W_L(B).

In particular, if Hom^W_L(A) ≠ 0 and Hom^W_L(B) ≠ 0 then the angle between the non-zero
complex numbers Hom^W_L(A) and Hom^W_L(B) does not exceed 2 arcsin δ, see Fig. 3.7.
W

7.1.10 Proof of Theorem 7.1.4. Let us denote δ_Δ just by δ and let

    0 < α < \frac{2π}{3Δ}

be a number such that

    δ = \sin \frac{α}{2} \cos \frac{Δα}{2}.
We prove by the descending induction on r = |V |, . . . , 1 the following statements:

Statement 1.r. Let W be a sequence of distinct vertices and let L be a sequence
of not necessarily distinct indices such that |W| = |L| = r. Then Hom^W_L(Z) ≠ 0
for all Z ∈ U(δ).
Statement 2.r. Let W be a sequence of distinct vertices such that |W| = r. Suppose
that W = (W', u) and let L' be a sequence of not necessarily distinct indices such
that |W'| = |L'| = r − 1. Let 1 ≤ l, m ≤ k be indices. Then for any Z ∈ U(δ) the
angle between the complex numbers Hom^{(W',u)}_{(L',l)}(Z) ≠ 0 and Hom^{(W',u)}_{(L',m)}(Z) ≠ 0 does
not exceed Δα.

Statement 3.r. Let W be a sequence of distinct vertices and let L be a sequence
of not necessarily distinct indices such that |W| = |L| = r, and suppose that W =
(W', u) and L = (L', l). Let v be a vertex not from W and let A, B ∈ U(δ) be
two matrices that differ only in the coordinates z^{uv}_{lj} for j = 1, \ldots, k. Then the angle
between Hom^W_L(A) ≠ 0 and Hom^W_L(B) ≠ 0 does not exceed α.
W

Suppose that r = |V|, so that W is a sequence of all vertices V of G. If L is a
sequence of indices such that |L| = |V| then

    \operatorname{Hom}^W_L(Z) = \prod_{\{u,v\} \in E} z^{uv}_{l(u)\,l(v)} ≠ 0

and Statement 1.r follows. Writing W = (W', u), we have

    \operatorname{Hom}^{(W',u)}_{(L',l)}(Z) = \Bigg( \prod_{v:\, \{u,v\} \in E} \frac{z^{uv}_{l\,l(v)}}{z^{uv}_{m\,l(v)}} \Bigg) \operatorname{Hom}^{(W',u)}_{(L',m)}(Z)

and hence the angle between Hom^{(W',u)}_{(L',l)}(Z) ≠ 0 and Hom^{(W',u)}_{(L',m)}(Z) ≠ 0 does not
exceed

    2Δ \arcsin δ ≤ Δα

and Statement 2.r follows. Statement 3.r is vacuous since there are no vertices outside
of W.
Suppose that 1 ≤ r < |V| and that Statements 1.(r + 1), 2.(r + 1) and 3.(r + 1)
hold.
Let W be a sequence of distinct vertices and let L be a sequence of not necessarily
distinct indices such that |W| = |L| = r. Let us choose a vertex v not in W. Then
by (7.1.9.1),

    \operatorname{Hom}^W_L(Z) = \sum_{j=1}^{k} \operatorname{Hom}^{(W,v)}_{(L,j)}(Z). \qquad (7.1.10.1)

From Statement 1.(r + 1) we have Hom^{(W,v)}_{(L,j)}(Z) ≠ 0 for all j = 1, \ldots, k and from
Statement 2.(r + 1) the angle between any two complex numbers Hom^{(W,v)}_{(L,i)}(Z) ≠ 0
and Hom^{(W,v)}_{(L,j)}(Z) ≠ 0 does not exceed Δα < 2π/3. Therefore, by Lemma 3.6.4,
we have

    \operatorname{Hom}^W_L(Z) ≠ 0

and Statement 1.r follows.


Let W and L with |W| = |L| = r be sequences as above and suppose that
W = (W', u) and L = (L', l). Let v be a vertex not in W and let A, B ∈ U(δ) be
the matrices that differ only in the coordinates z^{uv}_{lj} for j = 1, \ldots, k. Let us define a
matrix C ∈ U(δ) such that

    c^{uv}_{lj} = 1 for j = 1, \ldots, k

and C agrees with A and B in all other entries. From Statement 1.(r + 1) we have

    \operatorname{Hom}^{(W,v)}_{(L,j)}(C) ≠ 0 for j = 1, \ldots, k

and from Statement 2.(r + 1) the angle between any two numbers Hom^{(W,v)}_{(L,i)}(C) ≠ 0
and Hom^{(W,v)}_{(L,j)}(C) ≠ 0 does not exceed Δα < 2π/3. We rewrite (7.1.10.1) as

    \operatorname{Hom}^W_L(A) = \sum_{j=1}^{k} a^{uv}_{lj}\, \operatorname{Hom}^{(W,v)}_{(L,j)}(C) \quad \text{and} \quad \operatorname{Hom}^W_L(B) = \sum_{j=1}^{k} b^{uv}_{lj}\, \operatorname{Hom}^{(W,v)}_{(L,j)}(C).

Applying Lemma 3.6.4 again, we conclude that the angle between Hom^W_L(A) ≠ 0
and Hom^W_L(B) ≠ 0 does not exceed

    2 \arcsin \frac{δ}{\cos(Δα/2)} = α

and Statement 3.r holds.


Let W with |W| = r be a sequence as above and suppose that W = (W', u). Let
L' be a sequence of indices such that |L'| = r − 1. Given A ∈ U(δ) and two indices
1 ≤ l, m ≤ k, let us define a matrix B by

    b^{uv}_{lj} = a^{uv}_{mj} for all v such that \{u, v\} \in E and all j = 1, \ldots, k,

keeping all other entries of B the same as in A. Then

    \operatorname{Hom}^{(W',u)}_{(L',l)}(B) = \operatorname{Hom}^{(W',u)}_{(L',m)}(A).

Let d_0 be the number of neighbors of u in the sequence W' and let d_1 be the num-
ber of neighbors of u not in the sequence W', so that d_0 + d_1 ≤ Δ. From Statement 1.r
we have Hom^{(W',u)}_{(L',l)}(A) ≠ 0 and Hom^{(W',u)}_{(L',m)}(A) ≠ 0, while by the observation of
Sect. 7.1.9 (applied to the d_0 edges inside W) and Statement 3.r (applied to the d_1
remaining edges), the angle between Hom^{(W',u)}_{(L',l)}(A) and Hom^{(W',u)}_{(L',m)}(A) = Hom^{(W',u)}_{(L',l)}(B)
does not exceed

    2 d_0 \arcsin δ + d_1 α ≤ d_0 α + d_1 α ≤ Δα,

which proves Statement 2.r.


This concludes the induction and hence the proof of Statements 1.1 and 2.1. For
any vertex v of V, we have

    \operatorname{Hom}_G(Z) = \sum_{j=1}^{k} \operatorname{Hom}^{(v)}_{(j)}(Z)

and the proof of the theorem follows by Statement 1.1, Statement 2.1 and Lemma
3.6.4. □

Fig. 7.1  A graph and a cut of 8 edges associated with the set S of 3 black dots

7.1.11 Cuts and limits of approximability. Let G = (V, E) be a graph and let
S ⊂ V be a set of vertices. The cut associated with S is the set of all edges of G with
one endpoint in S and the other not in S. We denote by cut G (S) the number of edges
in the cut. For example, for the graph G and set S in Fig. 7.1, we have cut G (S) = 8.
Let

    μ(G) = \max_{S ⊂ V} \operatorname{cut}_G(S)

be the largest number of edges in a cut of a graph G. Berman and Karpinski proved
[BK99] that there is an absolute constant β > 1 such that it is an NP-hard problem
to approximate μ(G) within a factor β > 1 for a given graph satisfying Δ(G) ≤ 3.
Clearly, the problem remains NP-hard if we further restrict it to connected graphs,
in which case μ(G) ≥ |V | − 1.

Let k = 2, let us choose 0 < ε < 1 and let

    A_ε = \begin{pmatrix} ε & 1 \\ 1 & ε \end{pmatrix}.

Then

    ε^{−|E|} \hom_G(A_ε) = \sum_{S ⊂ V} ε^{−\operatorname{cut}_G(S)}.

Since the number of terms in the above sum is 2^{|V|}, we obtain

    \frac{\ln \hom_G(A_ε)}{\ln(1/ε)} + |E| − |V| \frac{\ln 2}{\ln(1/ε)} \ \le\ μ(G) \ \le\ \frac{\ln \hom_G(A_ε)}{\ln(1/ε)} + |E|.

Assuming now that G is a connected graph with Δ(G) ≤ 3, we conclude that for
any given δ > 0, by choosing a sufficiently small ε = ε(δ) > 0, we approximate
μ(G) within a relative error δ by

    |E| + \frac{\ln \hom_G(A_ε)}{\ln(1/ε)}.

Hence for some ε_0 > 0, approximating hom_G(A_{ε_0}) is an NP-hard problem. This can
be contrasted with Theorem 3.7.1, where approximability holds for matrices with
positive entries arbitrarily close to 0.
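The sandwich bounds above, which reduce approximating μ(G) to approximating hom_G(A_ε), can be verified exhaustively on a small graph (an illustrative sketch with our own encoding):

```python
from math import log

V = [1, 2, 3, 4, 5]
E = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3)]  # C5 plus a chord

def cut(S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for u, v in E if (u in S) != (v in S))

subsets = [{v for i, v in enumerate(V) if bits & (1 << i)}
           for bits in range(2 ** len(V))]
mu = max(cut(S) for S in subsets)

eps = 1e-4
# hom_G(A_eps) for A_eps = [[eps, 1], [1, eps]] equals
# sum_S eps^(|E| - cut(S)), as in the displayed identity
hom_val = sum(eps ** (len(E) - cut(S)) for S in subsets)
estimate = len(E) + log(hom_val) / log(1 / eps)    # upper bound on mu
lower = estimate - len(V) * log(2) / log(1 / eps)  # lower bound on mu
assert lower <= mu <= estimate
```

As ε shrinks, the gap |V| ln 2 / ln(1/ε) between the two bounds goes to 0, which is exactly why an efficient approximation of hom_G(A_ε) would pin down μ(G).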
For hardness results on exact computation of hom G , see [BG05] and [C+13], for
hardness of approximate computation, see [GJ12] and [GJ15].
For applications of the correlation decay approach to approximating Hom G , see
[LY13].
Closely related edge-coloring models, also known as vertex models, holant prob-
lems or tensor networks were studied in [Re15] and [PR16]. There we consider all
possible colorings φ of the edges of G = (V, E) into k colors, at each vertex v of G
a complex number z(v, φ) is determined by the multiset of the numbers of edges of
each color that have v as an endpoint, and the partition function computes

    \sum_{φ} \ \prod_{v \in V} z(v, φ).

It is shown that the partition function is never zero provided

    |1 − z(v, φ)| ≤ \frac{0.35}{Δ(G) + 1} for all v ∈ V and all φ,

which leads to quasi-polynomial [Re15] and, in the case of a bounded degree Δ(G),
polynomial [PR16] algorithms for approximating the partition function in the
corresponding domains.

7.2 Sharpening in the Case of a Positive Real Matrix

In this section, we sharpen the approximation bounds in Theorem 7.1.5, assuming


that the matrix A is positive real.

7.2.1 Theorem. Let

    δ_3 = \tan \frac{π}{9} ≈ 0.36 \quad \text{and} \quad δ_Δ = \tan \frac{π}{4(Δ−1)} for integer Δ ≥ 4,

so that δ_4 ≈ 0.27, δ_5 ≈ 0.20, etc. Let us fix

    0 ≤ δ < δ_Δ.

Then there exists a constant γ = γ(δ_Δ/δ) > 0 such that for every connected graph
G = (V, E), for every positive integer k and every 0 < ε < 1 there is a polynomial
p = p_{G,k,δ,ε} in the entries of a k × k symmetric matrix A such that

    \deg p ≤ γ (\ln |E| − \ln ε)

and

    |\ln \hom_G(A) − p(A)| ≤ ε

for any k × k real symmetric matrix A = (a_{ij}) such that

    |1 − a_{ij}| ≤ δ for all 1 ≤ i, j ≤ k,

provided Δ(G) ≤ Δ.

As in Sect. 7.1, given δ, G, ε > 0 and k, the polynomial p can be constructed in
(k|E|)^{O(\ln |E| − \ln ε)} time, where the implied constant in the “O” notation depends on
the ratio δ_Δ/δ alone. If Δ is fixed in advance, the value of p(A) can be computed
in polynomial time (k|E|/ε)^{O(1)}, where the implied constant in the “O” notation
depends on the ratio δ_Δ/δ alone, cf. [PR16].
As in Sects. 3.7 and 4.4, we deduce Theorem 7.2.1 by bounding the complex roots
of hom G away from the positive real axis. As in Sect. 7.1, it is more convenient to
work with the multi-affine extension Hom G , see Sect. 7.1.8. We deduce Theorem
7.2.1 from the following result.

7.2.2 Theorem. For Δ ≥ 3, let δ_Δ be the constant of Theorem 7.2.1 and let us
choose

    0 ≤ δ < δ_Δ.

Let

    τ_Δ = (1 − δ) \sin\Big( \frac{π}{18} − \frac{1}{2} \arctan δ \Big) > 0 \quad \text{if } Δ = 3 \quad \text{and}

    τ_Δ = (1 − δ) \sin\Big( \frac{π}{8(Δ−1)} − \frac{1}{2} \arctan δ \Big) > 0 \quad \text{for } Δ ≥ 4.

Then for any connected graph G such that Δ(G) ≤ Δ, we have

    \operatorname{Hom}_G(Z) ≠ 0

for any k × k complex symmetric matrix Z = (z_{ij}) such that

    |1 − \Re z_{ij}| ≤ δ \quad \text{and} \quad |\Im z_{ij}| ≤ τ_Δ \quad \text{for all } 1 ≤ i, j ≤ k.

For the rest of the section, we prove Theorem 7.2.2. Theorem 7.2.1 follows then
as in Sects. 3.7 and 4.4.
As in Sect. 7.1.9, we define restricted functionals Hom^W_L(Z). For 0 ≤ δ < 1 and
0 < τ < 1 − δ, we define a domain U(δ, τ) = U(δ, τ, G) in the space of matrices Z:

    U(δ, τ) = \Big\{ Z = (z^{uv}_{ij}) :\ |1 − \Re z^{uv}_{ij}| ≤ δ, \ |\Im z^{uv}_{ij}| ≤ τ \ \text{for all} \ \{u, v\} \in E \ \text{and all} \ 1 ≤ i, j ≤ k \Big\}.

We will use the following observation. Let W be a sequence of distinct vertices, which
includes some two vertices u and v such that {u, v} is an edge of G, and let L be a
sequence of not necessarily distinct indices such that |W| = |L|. Let A, B ∈ U(δ, τ)
be two matrices that differ only in the entries z^{uv}_{ij} for 1 ≤ i, j ≤ k. Then

    \operatorname{Hom}^W_L(B) = \frac{b^{uv}_{l(u)l(v)}}{a^{uv}_{l(u)l(v)}}\, \operatorname{Hom}^W_L(A),

where l(u) and l(v) are the indices in L corresponding to u and v in W. In particular,
if Hom^W_L(A) ≠ 0 and Hom^W_L(B) ≠ 0 then the angle between the two numbers is at most

    2 \arctan \frac{τ}{1 − δ}.

7.2.3 Proof of Theorem 7.2.2. Let

    α = π/9 + arctan δ  if Δ = 3 and
    α = π/(4(Δ − 1)) + arctan δ  if Δ ≥ 4.

We introduce the following statements.

Statement 1.r. Let W be a sequence of distinct vertices such that the graph induced
on W is connected and let L be a sequence of not necessarily distinct indices such
that |W| = |L| = r. Then

    Hom^W_L(Z) ≠ 0  for all Z ∈ U(δ, τ).

Statement 2.r. Let W be a sequence of distinct vertices such that the graph induced
on W is connected and |W| = r. Suppose that W = (W', u) and let L' be a sequence
of not necessarily distinct indices such that |L'| = r − 1. Then for any two indices
1 ≤ l, m ≤ k and any Z ∈ U(δ, τ) the angle between the complex numbers

    Hom^{(W',u)}_{(L',l)}(Z) ≠ 0  and  Hom^{(W',u)}_{(L',m)}(Z) ≠ 0

does not exceed π/2.

Statement 3.r. Let W be a sequence of distinct vertices such that the graph induced
on W is connected and let L be a sequence of not necessarily distinct indices such
that |W| = |L| = r. Suppose that W = (W', u) and L = (L', l) and let v be a
neighbor of u not in the sequence W. Let A, B ∈ U(δ, τ) be two matrices that differ
only in the entries z_{lj}^{uv} where {u, v} ∈ E and j = 1, ..., k. Then the angle between

    Hom^W_L(A) ≠ 0  and  Hom^W_L(B) ≠ 0

does not exceed α.


First, we claim that Statements 1.r, 2.r and 3.r hold for r = |V|. Indeed, suppose
that r = |V|, so that W is a sequence consisting of all vertices of the graph. Then

    Hom^W_L(Z) = ∏_{{u,v}∈E} z_{l(u)l(v)}^{uv} ≠ 0

and Statement 1.r follows. Writing W = (W', u), we have

    Hom^{(W',u)}_{(L',l)}(Z) = ( ∏_{v: {u,v}∈E} z_{l l(v)}^{uv} / z_{m l(v)}^{uv} ) Hom^{(W',u)}_{(L',m)}(Z)

and hence the angle between Hom^{(W',u)}_{(L',l)}(Z) ≠ 0 and Hom^{(W',u)}_{(L',m)}(Z) ≠ 0 does not
exceed
    2Δ arctan(τ/(1 − δ)),

which does not exceed

    π/3 < π/2  if Δ = 3

and does not exceed

    Δπ/(4(Δ − 1)) < π/2  if Δ ≥ 4.

Hence Statement 2.r follows. Statement 3.r is vacuous since there are no vertices
outside of W.

Next, we claim that Statements 1.(r+1), 2.(r+1) and 3.(r+1) imply Statements
1.r and 3.r for all 1 ≤ r < |V|.

To deduce Statement 1.r, let us choose a sequence W of distinct vertices and a
sequence L of not necessarily distinct indices such that the graph induced on W is
connected and |W| = |L| = r. Since W ≠ V, there is a vertex v not in W with a neighbor
in W, so that the graph induced on (W, v) is connected. Then by (7.1.9.1) we have

    Hom^W_L(Z) = Σ_{j=1}^{k} Hom^{(W,v)}_{(L,j)}(Z).    (7.2.3.1)

By Statement 1.(r+1) we have Hom^{(W,v)}_{(L,j)}(Z) ≠ 0 for all Z ∈ U(δ, τ) and by
Statement 2.(r+1) the angle between any two numbers

    Hom^{(W,v)}_{(L,i)}(Z) ≠ 0  and  Hom^{(W,v)}_{(L,j)}(Z) ≠ 0

does not exceed π/2. By Lemma 3.6.4 we have Hom^W_L(Z) ≠ 0 and hence Statement
1.r follows.
To deduce Statement 3.r, let W and L be sequences as above with |W| = |L| = r,
and suppose that W = (W', u) and L = (L', l). Suppose that v is a neighbor of u
which is not in W and assume that A, B ∈ U(δ, τ) are two matrices that differ in
the entries z_{lj}^{uv} for j = 1, ..., k only. Let C ∈ U(δ, τ) be a matrix such that

    c_{lj}^{uv} = 1  for j = 1, ..., k

and all other entries coincide with those in A and B. By Statement 1.(r+1) we have
Hom^{(W,v)}_{(L,j)}(C) ≠ 0 for j = 1, ..., k and by Statement 2.(r+1) the angle between
any two complex numbers Hom^{(W,v)}_{(L,i)}(C) ≠ 0 and Hom^{(W,v)}_{(L,j)}(C) ≠ 0 does not exceed
π/2. Applying (7.2.3.1), we can write

    Hom^W_L(A) = Σ_{j=1}^{k} a_{lj}^{uv} Hom^{(W',u,v)}_{(L',l,j)}(C)  and
    Hom^W_L(B) = Σ_{j=1}^{k} b_{lj}^{uv} Hom^{(W',u,v)}_{(L',l,j)}(C)

and by Lemma 3.7.3 the angle between

    Hom^W_L(A) ≠ 0  and  Hom^W_L(B) ≠ 0

does not exceed

    2 arctan δ + 2 arcsin(τ/(1 − δ)),

which is equal to

    π/9 + arctan δ = α  if Δ = 3

and is equal to

    π/(4(Δ − 1)) + arctan δ = α  if Δ ≥ 4.

Hence Statement 3.r follows.


Finally, we claim that Statements 1.(r+1), 2.(r+1) and 3.(r+1) imply Statement
2.r for 2 ≤ r < |V|. Let W be a sequence of distinct vertices such that the graph
induced on W is connected, |W| = r and W = (W', u), and let L' be a sequence of
not necessarily distinct indices such that |L'| = r − 1. Let 1 ≤ l, m ≤ k be any two
indices. Given a matrix A ∈ U(δ, τ), we define a matrix B ∈ U(δ, τ) by

    b_{lj}^{uv} = a_{mj}^{uv}  for all v such that {u, v} ∈ E and all j = 1, ..., k    (7.2.3.2)

and letting all other entries of B equal the corresponding entries of A. Then

    Hom^{(W',u)}_{(L',l)}(B) = Hom^{(W',u)}_{(L',m)}(A).

Let d_0 ≥ 1 (we use that r ≥ 2) be the number of neighbors v of u in the sequence W'
and let d_1 ≤ Δ − 1 be the number of neighbors v of u not in W'. Then by Statement
1.r we have

    Hom^{(W',u)}_{(L',l)}(A) ≠ 0  and  Hom^{(W',u)}_{(L',m)}(A) ≠ 0,

while by Statement 3.r the angle between the non-zero complex numbers

    Hom^{(W',u)}_{(L',l)}(A)  and  Hom^{(W',u)}_{(L',m)}(A) = Hom^{(W',u)}_{(L',l)}(B)

does not exceed

    2 d_0 arctan(τ/(1 − δ)) + d_1 α.    (7.2.3.3)

If Δ = 3 then (7.2.3.3) does not exceed

    d_0 π/9 + d_1 π/9 + (d_1 − d_0) arctan δ.    (7.2.3.4)

If d_1 ≥ d_0 then (7.2.3.4) does not exceed

    2 d_1 π/9 ≤ 4π/9 < π/2

and if d_1 < d_0 then (7.2.3.4) does not exceed

    d_0 π/9 + d_1 π/9 ≤ π/3 < π/2.

If Δ ≥ 4 then (7.2.3.3) does not exceed

    d_0 π/(4(Δ − 1)) + d_1 π/(4(Δ − 1)) + (d_1 − d_0) arctan δ.    (7.2.3.5)

If d_1 ≥ d_0 then (7.2.3.5) does not exceed

    2 d_1 π/(4(Δ − 1)) ≤ π/2

and if d_1 < d_0 then (7.2.3.5) does not exceed

    Δπ/(4(Δ − 1)) < π/2.

Hence Statement 2.r holds.


This proves Statements 1.1, 3.1 and 2.2. Let us choose a vertex u of the graph and
two indices 1 ≤ l, m ≤ k. Given a matrix A ∈ U(δ, τ), let us define a matrix
B by (7.2.3.2). By Statement 1.1 we have Hom^u_l(A) ≠ 0 and Hom^u_l(B) ≠ 0,
and by Statement 3.1 the angle between the non-zero complex numbers Hom^u_l(A) and
Hom^u_l(B) = Hom^u_m(A) does not exceed

    3α < 2π/3  if Δ = 3

and does not exceed

    Δα < Δπ/(2(Δ − 1)) ≤ 2π/3  if Δ ≥ 4.

By (7.1.9.1), we have

    Hom_G(Z) = Σ_{l=1}^{k} Hom^u_l(Z)

and by Lemma 3.6.4, we have Hom_G(Z) ≠ 0 for all Z ∈ U(δ, τ). □

7.3 Graph Homomorphisms with Multiplicities

Following [BS16], we consider a refinement of the graph homomorphism partition
function.

7.3.1 Definition. Let G = (V, E) be an undirected graph with set V of vertices and set
E of edges, without loops or multiple edges, and let Δ(G) be the largest degree
of a vertex of G. We assume that Δ(G) ≥ 1. Let m = (m_1, ..., m_k) be a vector of
positive integers such that

    m_1 + ··· + m_k = |V|.

For a k × k symmetric complex matrix A = (a_{ij}) we define the partition function of
graph homomorphisms with multiplicities m by

    hom_{G,m}(A) = Σ_{φ: V → {1,...,k}: |φ^{−1}(i)| = m_i for i=1,...,k}  ∏_{{u,v}∈E} a_{φ(u)φ(v)}.    (7.3.1.1)

Here the sum is taken over all maps φ: V → {1, ..., k} such that precisely m_i
vertices are mapped to every i = 1, ..., k. We observe that hom_{G,m}(A) is a
polynomial in the entries a_{ij} of A and deg hom_{G,m} = |E|.
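For small graphs, (7.3.1.1) can be evaluated by brute force. The following sketch is illustrative only; the graph, matrix and multiplicities in the usage line are arbitrary, and vertices and colors are indexed from 0.

```python
from itertools import product

def hom_Gm(edges, n_vertices, m, A):
    # hom_{G,m}(A): sum over all maps phi: V -> {0,...,k-1} whose color classes
    # have the prescribed sizes m, of the product of A[phi(u)][phi(v)] over edges
    k = len(m)
    total = 0
    for phi in product(range(k), repeat=n_vertices):
        if all(phi.count(i) == m[i] for i in range(k)):
            w = 1
            for u, v in edges:
                w *= A[phi[u]][phi[v]]
            total += w
    return total

# path on 3 vertices, k = 2 colors with multiplicities (2, 1), all-ones matrix:
# the value is the multinomial coefficient |V|!/(m_1! m_2!) = 3
print(hom_Gm([(0, 1), (1, 2)], 3, (2, 1), [[1, 1], [1, 1]]))  # -> 3
```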

In [BS16], the following result is obtained.

7.3.2 Theorem. There is an absolute constant δ_0 > 0 (one can choose δ_0 = 0.108)
such that for every graph G = (V, E) with the largest degree Δ(G) ≥ 1 of a
vertex and every positive integer vector m = (m_1, ..., m_k) of multiplicities such
that m_1 + ... + m_k = |V| we have

    hom_{G,m}(A) ≠ 0,

provided A = (a_{ij}) is a k × k symmetric complex matrix satisfying

    |1 − a_{ij}| ≤ δ_0/Δ(G)  for all 1 ≤ i, j ≤ k.

As in Sect. 7.1, Theorem 7.3.2 implies the following corollary.

7.3.3 Theorem. Let us fix some 0 < δ < δ_0, where δ_0 is the constant in Theorem
7.3.2. Then there exists γ = γ(δ_0/δ) > 0 such that for any 0 < ε < 1, any graph
G = (V, E) and any positive integer k-vector m = (m_1, ..., m_k) there is a polynomial
p = p_{G,k,m,δ,ε} in the entries of a k × k symmetric complex matrix A such that

    deg p ≤ γ(ln |E| − ln ε)

and

    |ln hom_{G,m}(A) − p(A)| ≤ ε

provided

    |1 − a_{ij}| ≤ δ/Δ(G)  for all i, j.

As in Sect. 7.1, given G, m and ε, the polynomial p of Theorem 7.3.3 can be
computed efficiently, in quasi-polynomial (|E|k)^{O(ln |E| − ln ε)} time. Given G, A and
m, we define a univariate polynomial

    g(z) = hom_{G,m}(J + z(A − J)),

where J = J_k is the k × k matrix filled by 1s, so that

    g(0) = hom_{G,m}(J) = |V|!/(m_1! ··· m_k!)  and  g(1) = hom_{G,m}(A).

For an ordered set I = ({u_1, v_1}, ..., {u_s, v_s}) of distinct edges of G, let V(I) be
the set of vertices {u_1, v_1, ..., u_s, v_s}. Arguing as in Sect. 7.1,

    (d^s/dz^s) g(z)|_{z=0}
      = Σ_{I = ({u_1,v_1},...,{u_s,v_s})}  Σ_{φ: V(I) → {1,...,k}: |φ^{−1}(i)| ≤ m_i for i=1,...,k}
            |V \ V(I)|! / ((m_1 − |φ^{−1}(1)|)! ··· (m_k − |φ^{−1}(k)|)!)
            × (a_{φ(u_1)φ(v_1)} − 1) ··· (a_{φ(u_s)φ(v_s)} − 1).

Here the outer sum is taken over all ordered collections I of s edges of G, the inner
sum is taken over all maps φ: V(I) → {1, ..., k} of the endpoints of the edges
from I into the set {1, ..., k} such that the inverse image of every i ∈ {1, ..., k}
consists of at most m_i points from V(I). The multinomial coefficient

    |V \ V(I)|! / ((m_1 − |φ^{−1}(1)|)! ··· (m_k − |φ^{−1}(k)|)!)

accounts for the number of ways to extend φ to the whole set V of vertices of G
in such a way that the inverse image of every i ∈ {1, ..., k} consists of exactly m_i
points.

It follows that g^{(s)}(0) is a polynomial of degree s in the entries of the matrix A,
computable in (k|E|)^{O(s)} time. We define f(z) = ln g(z) and proceed as in Sect. 7.1
and in Sect. 3.6.7 before that.

We obtain a quasi-polynomial algorithm to approximate hom_{G,m}(A) within a
given relative error ε, provided the matrix A = (a_{ij}) satisfies |1 − a_{ij}| ≤ δ/Δ(G) for all
i, j and some fixed 0 < δ < δ_0. Patel and Regts show [PR16] that the algorithm can
be made genuinely polynomial provided Δ(G) is fixed in advance.
The functional hom_{G,m}(A) specializes to some combinatorial quantities of interest.
7.3.4 Hafnian. Suppose that G consists of n pairwise disjoint edges, so that |V| =
2n and Δ(G) = 1. Let k = |V| = 2n and let m = (1, ..., 1). Then

    hom_{G,m}(A) = 2^n n! haf A,    (7.3.4.1)

see Sect. 4.1.1. Indeed, every map φ: V → {1, ..., k} in (7.3.1.1) is necessarily
a bijection and the corresponding term is the product of weights a_{i_1 j_1} ··· a_{i_n j_n} in a
perfect matching in the complete graph with k = 2n vertices. Since 2^n n! different
maps φ result in the same perfect matching (we can switch the vertices of each edge
and also permute the edges), we obtain (7.3.4.1).

Theorem 4.1.5 is a particular case of Theorem 7.3.2, up to the value of δ_0, which
is better in Theorem 4.1.5.

More generally, suppose that k ≥ 2n, that m = (1, ..., 1) and that G consists of
n pairwise disjoint edges and k − 2n isolated points. Then

    hom_{G,m}(A) = (k − 2n)! 2^n n! haf_n A,

where haf_n A enumerates matchings of size n in the complete graph with weights
a_{ij} on the edges, see Sect. 5.1.1.
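Identity (7.3.4.1) is easy to verify numerically for n = 2. The brute-force evaluation below is a sketch for illustration only (0-based indices; the test matrix is arbitrary, with irrelevant diagonal).

```python
from itertools import permutations

# G: two disjoint edges {0,1} and {2,3}; with m = (1,1,1,1), every phi is a bijection
def hom_two_edges(A):
    return sum(A[p[0]][p[1]] * A[p[2]][p[3]] for p in permutations(range(4)))

# hafnian of a 4x4 symmetric matrix: sum over the 3 perfect matchings of K_4
def haf4(A):
    return A[0][1] * A[2][3] + A[0][2] * A[1][3] + A[0][3] * A[1][2]

A = [[0, 2, 3, 4],
     [2, 0, 5, 6],
     [3, 5, 0, 7],
     [4, 6, 7, 0]]
# (7.3.4.1) with n = 2: hom_{G,m}(A) = 2^2 * 2! * haf A = 8 * haf A
print(hom_two_edges(A), 8 * haf4(A))  # -> 416 416
```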
7.3.5 Hamiltonian permanent. Suppose that G is a cycle with n vertices, so that
|V| = n and Δ(G) = 2. Let k = |V| = n and let m = (1, ..., 1). Then

    hom_{G,m}(A) = n ham A,    (7.3.5.1)

where ham A enumerates Hamiltonian cycles in the complete graph with n vertices
and weights a_{ij} on the edges, see Sect. 3.8. The factor n in (7.3.5.1) accounts for the
n distinct functions φ that differ by a cyclic shift of the vertices of G and produce the
same Hamiltonian cycle. It follows from Theorem 7.3.2 that ham A ≠ 0 provided
A = (a_{ij}) is a complex symmetric matrix satisfying |1 − a_{ij}| ≤ δ_0/2 for all i, j, where
δ_0 is the constant in Theorem 7.3.2. Consequently, ham A can be approximated within
relative error ε in quasi-polynomial time provided |1 − a_{ij}| ≤ δ/2, where δ < δ_0 is
fixed in advance.
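Identity (7.3.5.1) can also be checked by brute force for small n. In the sketch below, ham A is taken as the sum over permutations consisting of a single n-cycle, so that each undirected Hamiltonian cycle is counted once per orientation; this convention is an assumption made here so that the factor n in (7.3.5.1) comes out exactly.

```python
from itertools import permutations
from math import prod

def hom_cycle(A, n):
    # G is the cycle v_0 - v_1 - ... - v_{n-1} - v_0; with m = (1,...,1), phi is a bijection
    return sum(prod(A[p[i]][p[(i + 1) % n]] for i in range(n))
               for p in permutations(range(n)))

def ham(A, n):
    # sum of prod_i a_{i sigma(i)} over permutations sigma forming a single n-cycle
    total = 0
    for p in permutations(range(n)):
        j, length = p[0], 1
        while j != 0:                 # follow the cycle through 0
            j, length = p[j], length + 1
        if length == n:               # sigma is one n-cycle
            total += prod(A[i][p[i]] for i in range(n))
    return total

A = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]
print(hom_cycle(A, 4), 4 * ham(A, 4))  # -> 2016 2016
```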
7.3.6 Enumerating independent sets. Let k = 2, let m = (m_1, m_2) and let us
choose

    A = ( 0  1
          1  1 ).

A map φ: V → {1, 2} contributes 1 to hom_{G,m}(A) in (7.3.1.1) if φ^{−1}(1) ⊂ V
is an independent set and contributes 0 otherwise. Consequently, hom_{G,m}(A) is the
number of independent sets in G of cardinality m_1. Detecting an independent set of a
given size in a graph is a notoriously hard problem. For example, for any 0 < ε < 1
fixed in advance, it is an NP-hard problem to approximate the largest cardinality of
an independent set in G = (V, E) within a factor of |V|^{1−ε} [Hå99, Zu07].
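The observation above is easy to test: with the matrix A = [[0, 1], [1, 1]] the brute-force value of hom_{G,m}(A) agrees with a direct count of independent sets. A small sketch (0-based indices, with color 0 playing the role of the color 1 in the text; the 4-cycle is an arbitrary test graph):

```python
from itertools import product, combinations

def hom_Gm(edges, n, m, A):
    # brute-force evaluation of hom_{G,m}(A) as in (7.3.1.1)
    k = len(m)
    total = 0
    for phi in product(range(k), repeat=n):
        if all(phi.count(i) == m[i] for i in range(k)):
            w = 1
            for u, v in edges:
                w *= A[phi[u]][phi[v]]
            total += w
    return total

def count_independent(edges, n, size):
    # direct count of independent sets of the given cardinality
    return sum(1 for S in map(set, combinations(range(n), size))
               if not any(u in S and v in S for u, v in edges))

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]   # the 4-cycle: {0,2} and {1,3} are independent
A = [[0, 1], [1, 1]]
print(hom_Gm(C4, 4, (2, 2), A), count_independent(C4, 4, 2))  # -> 2 2
```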
Let us choose 0 < δ < δ_0 as in Theorem 7.3.3 and let us define

    Â = ( 1 − δ/Δ(G)    1 + δ/Δ(G)
          1 + δ/Δ(G)    1 + δ/Δ(G) ).

Now hom_{G,m}(Â) can be approximated in quasi-polynomial time. For a subset S ⊂ V,
let e(S) be the number of edges of G spanned by the vertices of S. Then

    (1 + δ/Δ(G))^{−|E|} hom_{G,m}(Â) = Σ_{S⊂V: |S|=m_1} w(S)  where
    w(S) = (1 + δ/Δ(G))^{−e(S)} (1 − δ/Δ(G))^{e(S)}.    (7.3.6.1)

In particular,

    w(S) ≤ exp{−2δ e(S)/Δ(G)}  and  w(S) = 1 if S is independent.

Thus the sum (7.3.6.1) accounts for all subsets S ⊂ V of m_1 vertices, where independent
subsets are counted with weight 1 and all other subsets are counted with a
weight exponentially small in the number of edges they span. Computing (7.3.6.1)
allows us to distinguish graphs that are sufficiently far from having an independent
set of size m_1 (for example, when every subset of m_1 vertices spans at least ε|E|
edges for some ε > 0) from graphs that have many independent sets of size m_1 (for
example, when the probability that a randomly chosen m_1-subset is independent is
at least 2e^{−2δ|E|/Δ(G)}). Note that in the latter case, if G is not very far from regular,
so that |E|/Δ(G) ∼ |V|, the probability of hitting such an independent set at random
is exponentially small in |V|.

7.3.7 A multi-affine version of hom_{G,m}(A). We introduce an extension of
hom_{G,m}(A). Let Z = (z_{ij}^{uv}) be an |E| × k(k+1)/2 matrix (tensor) indexed by edges
{u, v} ∈ E of the graph G and unordered pairs {i, j} of not necessarily distinct
indices 1 ≤ i, j ≤ k. We write z_{ij}^{uv} instead of z_{{i,j}}^{{u,v}}, assuming that

    z_{ij}^{uv} = z_{ij}^{vu} = z_{ji}^{vu} = z_{ji}^{uv}.

We define

    Hom_{G,m}(Z) = Σ_{φ: V → {1,...,k}: |φ^{−1}(i)| = m_i for i=1,...,k}  ∏_{{u,v}∈E} z_{φ(u)φ(v)}^{uv}.    (7.3.7.1)

If A = (a_{ij}) is a k × k symmetric matrix and z_{ij}^{uv} = a_{ij} for all {u, v} ∈ E, we clearly
have hom_{G,m}(A) = Hom_{G,m}(Z). The advantage of working with Hom_{G,m}(Z) is
that it is multi-affine, that is, the degree of every variable in Hom_{G,m}(Z) does not
exceed 1. We will prove that Hom_{G,m}(Z) ≠ 0 for complex Z = (z_{ij}^{uv}) provided

    |1 − z_{ij}^{uv}| ≤ δ_0/Δ(G)  for all {u, v} ∈ E and all 1 ≤ i, j ≤ k,

where δ_0 > 0 is an absolute constant (one can choose δ_0 = 0.108).

Given δ > 0, we define U(δ) ⊂ C^{|E| × k(k+1)/2} by

    U(δ) = { Z = (z_{ij}^{uv}) : |1 − z_{ij}^{uv}| ≤ δ }    (7.3.7.2)

(we suppress the dependence on G in the notation). Hence our goal is to prove that

    Hom_{G,m}(Z) ≠ 0  for all Z ∈ U(δ)  where  δ = δ_0/Δ(G).

7.3.8 Recursion. We need a version of the recurrence formula (7.1.9.1). Let
W = (v_1, ..., v_r) be an ordered sequence of vertices of G. The sequence W is called
admissible if all vertices v_1, ..., v_r are distinct. Let L = (i_1, ..., i_r) be a sequence
of indices 1 ≤ i_j ≤ k. The multiplicity m_i(L) of i in L is the number of occurrences
of a given 1 ≤ i ≤ k in L:

    m_i(L) = |{ j : i_j = i }|.

We call a sequence L admissible if m_i(L) ≤ m_i for i = 1, ..., k. For admissible
sequences W = (v_1, ..., v_r) of vertices and L = (i_1, ..., i_r) of indices such that
|W| = |L|, we define

    Hom^W_L(Z) = Σ_{φ: V → {1,...,k}: |φ^{−1}(i)| = m_i for i=1,...,k and φ(v_j) = i_j for j=1,...,r}  ∏_{{u,v}∈E} z_{φ(u)φ(v)}^{uv}.

Hence Hom^W_L(Z) is obtained by restricting the sum (7.3.1.1) to the maps φ that map
the vertex v_j to i_j for j = 1, ..., r. If W = ∅ and L = ∅ then Hom^W_L(Z) =
Hom_{G,m}(Z).

Let W be an admissible sequence of vertices and let L be an admissible sequence of
indices such that |W| = |L|.

Let v ∈ V be a vertex such that the sequence (W, v) obtained by appending v to W
is admissible (that is, v is not in W). Then

    Hom^W_L(Z) = Σ_{i=1,...,k: (L,i) is admissible} Hom^{(W,v)}_{(L,i)}(Z),    (7.3.8.1)

where (L, i) denotes the sequence L appended by i.

Let 1 ≤ i ≤ k be an index such that the sequence (L, i) is admissible. Then
m_i(L) < m_i and

    Hom^W_L(Z) = (1/(m_i − m_i(L))) Σ_{v∈V: (W,v) is admissible} Hom^{(W,v)}_{(L,i)}(Z).    (7.3.8.2)

We note that swapping the values of φ on any two vertices u, v ∈ V does not change
the multiplicities of the values of φ.

To proceed with the induction, we need a simple geometric lemma which says
that the sum of vectors rotates by a small angle if each vector is perturbed slightly
and the vectors point roughly in the same direction.

7.3.9 Lemma. Let a_1, ..., a_n and b_1, ..., b_n be complex numbers such that a_1, ...,
a_n are non-zero and

    |b_j/a_j − 1| ≤ ε  for j = 1, ..., n

and some 0 < ε < 1. Let

    a = Σ_{j=1}^{n} a_j  and  b = Σ_{j=1}^{n} b_j

and suppose that

    |a| ≥ τ Σ_{j=1}^{n} |a_j|

for some 1 ≥ τ > ε. Then a ≠ 0, b ≠ 0 and the angle between a and b does not
exceed

    arcsin(ε/τ).

Proof. Clearly a ≠ 0. Writing

    b_j = (1 + ε_j) a_j  where |ε_j| ≤ ε for j = 1, ..., n,

we obtain

    b = Σ_{j=1}^{n} (1 + ε_j) a_j = a + Σ_{j=1}^{n} ε_j a_j
    where  |Σ_{j=1}^{n} ε_j a_j| ≤ ε Σ_{j=1}^{n} |a_j| ≤ (ε/τ)|a|.

Hence

    |b/a − 1| ≤ ε/τ

and

    |arg(b/a)| ≤ arcsin(ε/τ),

cf. Fig. 3.7. The proof now follows. □
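A quick numerical sanity check of Lemma 7.3.9 (an illustrative sketch; the specific vectors and perturbations are arbitrary): perturb each term of a roughly aligned sum by a random factor within ε of 1 and compare the rotation of the sum with the bound arcsin(ε/τ).

```python
import cmath, math, random

random.seed(0)
eps = 0.2
# roughly aligned non-zero vectors: tau is close to 1, well above eps
a_list = [cmath.exp(1j * random.uniform(-0.3, 0.3)) for _ in range(50)]
# each b_j satisfies |b_j/a_j - 1| <= eps
b_list = [a * (1 + eps * random.random() * cmath.exp(1j * random.uniform(0, 2 * math.pi)))
          for a in a_list]

a, b = sum(a_list), sum(b_list)
tau = abs(a) / sum(abs(x) for x in a_list)
angle = abs(cmath.phase(b / a))
bound = math.asin(eps / tau)
print(angle <= bound)  # -> True
```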

Building on Lemma 7.3.9, we supply the first ingredient of our induction argument.

7.3.10 Lemma. Let us fix an admissible sequence W of vertices, an admissible
sequence L of indices such that 0 ≤ |W| = |L| ≤ |V| − 2, a complex tensor Z, a
real ε > 0 and a real 0 ≤ α < 2π/3 such that ε < cos(α/2), and let

    ω = arcsin(ε/cos(α/2)).

Suppose that for any two vertices u, v ∈ V and for any two indices 1 ≤ i, j ≤ k such
that the sequences (W, u, v) and (L, i, j) are admissible, we have Hom^{(W,u,v)}_{(L,i,j)}(Z)
≠ 0, Hom^{(W,u,v)}_{(L,j,i)}(Z) ≠ 0 and

    |Hom^{(W,u,v)}_{(L,j,i)}(Z) / Hom^{(W,u,v)}_{(L,i,j)}(Z) − 1| ≤ ε.

(1) Let us fix two vertices u, v ∈ V such that the sequence (W, u, v) is admissible
    and an index i such that the sequence (L, i) is admissible. Suppose that for
    any two indices j_1 and j_2 such that the sequences (L, i, j_1) and (L, i, j_2) are
    admissible, the angle between the two non-zero complex numbers Hom^{(W,u,v)}_{(L,i,j_1)}(Z)
    and Hom^{(W,u,v)}_{(L,i,j_2)}(Z) does not exceed α. Then Hom^{(W,u)}_{(L,i)}(Z) ≠ 0,
    Hom^{(W,v)}_{(L,i)}(Z) ≠ 0 and the angle between the two complex numbers does not exceed ω.
(2) Let us fix two indices i and j, possibly equal, such that the sequence (L, i, j) is
    admissible and a vertex u such that the sequence (W, u) is admissible. Suppose
    that for any two vertices v_1 and v_2 such that the sequences (W, u, v_1) and
    (W, u, v_2) are admissible, the angle between the two non-zero complex numbers
    Hom^{(W,u,v_1)}_{(L,i,j)}(Z) and Hom^{(W,u,v_2)}_{(L,i,j)}(Z) does not exceed α. Then Hom^{(W,u)}_{(L,i)}(Z) ≠ 0,
    Hom^{(W,u)}_{(L,j)}(Z) ≠ 0 and the angle between the two complex numbers does not exceed ω.

Proof. To prove Part (1), using (7.3.8.1), we write

    Hom^{(W,u)}_{(L,i)}(Z) = Σ_{j=1,...,k: (L,i,j) is admissible} Hom^{(W,u,v)}_{(L,i,j)}(Z)  and
    Hom^{(W,v)}_{(L,i)}(Z) = Σ_{j=1,...,k: (L,i,j) is admissible} Hom^{(W,u,v)}_{(L,j,i)}(Z).

For j such that (L, i, j) is admissible, let us denote

    a_j = Hom^{(W,u,v)}_{(L,i,j)}(Z),  b_j = Hom^{(W,u,v)}_{(L,j,i)}(Z),
    a = Σ_j a_j  and  b = Σ_j b_j.

By Lemma 3.6.3, we have

    |a| ≥ τ Σ_j |a_j|  for  τ = cos(α/2).

Since

    a = Hom^{(W,u)}_{(L,i)}(Z)  and  b = Hom^{(W,v)}_{(L,i)}(Z),

the result follows by Lemma 7.3.9.

To prove Part (2), using (7.3.8.2), we write

    Hom^{(W,u)}_{(L,i)}(Z) = (1/(m_j − m_j(L,i))) Σ_{v∈V: (W,u,v) is admissible} Hom^{(W,u,v)}_{(L,i,j)}(Z)  and
    Hom^{(W,u)}_{(L,j)}(Z) = (1/(m_i − m_i(L,j))) Σ_{v∈V: (W,u,v) is admissible} Hom^{(W,u,v)}_{(L,j,i)}(Z).

For v such that (W, u, v) is admissible, let us denote

    a_v = Hom^{(W,u,v)}_{(L,i,j)}(Z),  b_v = Hom^{(W,u,v)}_{(L,j,i)}(Z),
    a = Σ_v a_v  and  b = Σ_v b_v.

By Lemma 3.6.3, we have

    |a| ≥ τ Σ_v |a_v|  for  τ = cos(α/2).

By Lemma 7.3.9, the angle between the non-zero complex numbers a and b does not
exceed ω. Since

    Hom^{(W,u)}_{(L,i)}(Z) = a/(m_j − m_j(L,i))  and  Hom^{(W,u)}_{(L,j)}(Z) = b/(m_i − m_i(L,j)),

the proof follows. □


7.3.11 Finding a fixed point. The gist of Lemma 7.3.10 is as follows. Suppose that
the value of Hom^W_L(Z) does not change much if we permute any two indices in L
or, equivalently, any two vertices in W. We would like to know how the argument
of the complex number Hom^W_L(Z) changes if we change one vertex in W or one
index in L. Let r = |W| = |L| be the length of the sequences. In Lemma 7.3.10 we
show that if Hom^W_L(Z) does not rotate much when we change one index in L then
Hom^{W'}_{L'}(Z) does not rotate much if we change one vertex in W' for shorter sequences
|W'| = |L'| = r − 1, and if Hom^W_L(Z) does not rotate much when we change one
vertex in W then Hom^{W'}_{L'}(Z) does not rotate much if we change one index in L' for
shorter sequences |W'| = |L'| = r − 1.

We would like to find a fixed point of the conditions of Lemma 7.3.10 for which
α = ω. That is, we want to find an ε > 0 for which the equation

    α = arcsin(ε/cos(α/2))

has a solution 0 ≤ α < 2π/3. It is clear that for all sufficiently small ε > 0 such a
solution exists. In fact, any

    0 < ε ≤ max_{0≤α<2π/3} (sin α) cos(α/2) = 4/(3√3)    (7.3.11.1)

will do.
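The maximization in (7.3.11.1) and the fixed-point property can be confirmed numerically (a throwaway sketch; the grid size is arbitrary):

```python
import math

f = lambda a: math.sin(a) * math.cos(a / 2)

# grid search for the maximum of f on [0, 2*pi/3)
grid_max = max(f(k * (2 * math.pi / 3) / 10**5) for k in range(10**5))
closed_form = 4 / (3 * math.sqrt(3))
print(abs(grid_max - closed_form) < 1e-6)  # -> True

# for any alpha <= pi/2, eps = f(alpha) solves alpha = arcsin(eps / cos(alpha/2))
alpha = 1.0
eps = f(alpha)
print(abs(math.asin(eps / math.cos(alpha / 2)) - alpha) < 1e-12)  # -> True
```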

Next, we link the property that Hom^W_L(Z) does not change much if any two vertices
in W or any two indices in L are permuted to bounds on partial derivatives. We recall the
definition (7.3.7.2) of the polydisc U(δ).

7.3.12 Lemma. Let us fix an integer 2 ≤ r ≤ |V| and real τ > 0 and 0 < δ < 1.
Suppose that for any admissible sequences W of vertices and L of indices such that
|W| = |L| = r and for any Z ∈ U(δ) we have Hom^W_L(Z) ≠ 0 and the following
condition holds: if W = (W', v) and L = (L', i), then

    |Hom^W_L(Z)| ≥ (τ/Δ(G)) Σ_{w: {w,v}∈E} Σ_{l: 1≤l≤k} |z_{il}^{vw} (∂/∂z_{il}^{vw}) Hom^W_L(Z)|

for any Z ∈ U(δ).

Then for any admissible W and L such that |W| = |L| = r and for any Z ∈ U(δ),
the following condition is satisfied: if W = (W', u, v) and L = (L', i, j) then

    |Hom^{(W',u,v)}_{(L',j,i)}(Z) / Hom^{(W',u,v)}_{(L',i,j)}(Z) − 1| ≤ e^ξ − 1  where  ξ = 4δΔ(G)/((1 − δ)τ).

Proof. Let us choose admissible W and L such that |W| = |L| = r and suppose
that W = (W', u, v) and L = (L', j, i). Without loss of generality, we assume that
i ≠ j. Since Hom^W_L(Z) ≠ 0 for all Z ∈ U(δ), we can choose a continuous branch of
ln Hom^W_L(Z), so that ln Hom^W_L(Z) is real when Z is the matrix of 1s. Then

    (∂/∂z_{il}^{vw}) ln Hom^W_L(Z) = ((∂/∂z_{il}^{vw}) Hom^W_L(Z)) / Hom^W_L(Z)

and, using that the coordinates z_{ab}^{xy} of any Z ∈ U(δ) satisfy |z_{ab}^{xy}| ≥ 1 − δ, we obtain

    Σ_{w: {w,v}∈E} Σ_{l: 1≤l≤k} |(∂/∂z_{il}^{vw}) ln Hom^W_L(Z)| ≤ Δ(G)/((1 − δ)τ)  and
    Σ_{w: {w,u}∈E} Σ_{l: 1≤l≤k} |(∂/∂z_{jl}^{uw}) ln Hom^W_L(Z)| ≤ Δ(G)/((1 − δ)τ).    (7.3.12.1)

Given a matrix A ∈ U(δ), we define a matrix B ∈ U(δ) by

    b_{jl}^{uw} = a_{il}^{uw}  for all w ≠ v such that {u, w} ∈ E and all l = 1, ..., k,
    b_{il}^{vw} = a_{jl}^{vw}  for all w ≠ u such that {v, w} ∈ E and all l = 1, ..., k,

while making all other entries of B equal to the corresponding entries of A. Then

    Hom^{(W',u,v)}_{(L',j,i)}(B) = Hom^{(W',u,v)}_{(L',i,j)}(A)

and from (7.3.12.1) we conclude

    |ln Hom^{(W',u,v)}_{(L',j,i)}(A) − ln Hom^{(W',u,v)}_{(L',i,j)}(A)|
      = |ln Hom^{(W',u,v)}_{(L',j,i)}(A) − ln Hom^{(W',u,v)}_{(L',j,i)}(B)|
      ≤ max_{Z∈U(δ)} ( Σ_{w: {u,w}∈E, l: 1≤l≤k} |(∂/∂z_{jl}^{uw}) ln Hom^W_L(Z)|
                       + Σ_{w: {v,w}∈E, l: 1≤l≤k} |(∂/∂z_{il}^{vw}) ln Hom^W_L(Z)| )
         × max_{w, l} { |a_{jl}^{uw} − b_{jl}^{uw}|, |a_{il}^{vw} − b_{il}^{vw}| }
      ≤ (2Δ(G)/((1 − δ)τ)) × (2δ) = 4δΔ(G)/((1 − δ)τ) = ξ.

Denoting

    ζ = Hom^{(W',u,v)}_{(L',j,i)}(A) / Hom^{(W',u,v)}_{(L',i,j)}(A),

we conclude that |ln ζ| ≤ ξ. Writing s = ln ζ, we conclude

    |ζ − 1| = |e^s − 1| = |Σ_{n=1}^{∞} s^n/n!| ≤ Σ_{n=1}^{∞} |s|^n/n! ≤ e^ξ − 1.  □

7.3.13 Tuning up ξ. We would like to have

    e^ξ − 1 ≤ ε

for some ε satisfying (7.3.11.1), see Sect. 7.3.11, so we choose

    ξ = ln(1 + ε).

Our next (and last) lemma relates the parameter τ in Lemma 7.3.12 to the angles
between various complex numbers Hom^W_L(Z).

7.3.14 Lemma. Let 0 ≤ α < 2π/3 be a real number, let W be an admissible
sequence of vertices and let L be an admissible sequence of indices such that 1 ≤
|W| = |L| ≤ |V| − 1. Suppose that for every Z ∈ U(δ), for every w such that
(W, w) is admissible and for every 1 ≤ l, j ≤ k such that (L, l) and (L, j) are
admissible, we have Hom^{(W,w)}_{(L,l)}(Z) ≠ 0, Hom^{(W,w)}_{(L,j)}(Z) ≠ 0 and the angle between
the two complex numbers does not exceed α.

Suppose that W = (W', v) and L = (L', i). Then

    |Hom^W_L(Z)| ≥ (τ/Δ(G)) Σ_{w: {w,v}∈E} Σ_{j: 1≤j≤k} |z_{ij}^{vw} (∂/∂z_{ij}^{vw}) Hom^W_L(Z)|
    for  τ = cos(α/2).

Proof. Let w be a vertex such that {v, w} ∈ E. If w is an element of W' then

    z_{ij}^{vw} (∂/∂z_{ij}^{vw}) Hom^W_L(Z) = Hom^W_L(Z)  if the element of L corresponding to w is j, and
                                         = 0  otherwise

(here we use that Hom^W_L(Z) is a multi-affine function of Z).

If w is not an element of W' then (W, w) is an admissible sequence of vertices
and

    z_{ij}^{vw} (∂/∂z_{ij}^{vw}) Hom^W_L(Z) = Hom^{(W,w)}_{(L,j)}(Z)  if (L, j) is admissible, and
                                         = 0  otherwise.

By (7.3.8.1), if w is not in W' then

    Hom^W_L(Z) = Σ_{j: 1≤j≤k, (L,j) is admissible} Hom^{(W,w)}_{(L,j)}(Z)

and hence by Lemma 3.6.3,

    |Hom^W_L(Z)| ≥ τ Σ_{j: 1≤j≤k, (L,j) is admissible} |Hom^{(W,w)}_{(L,j)}(Z)|.

Denoting by d_0 the number of vertices w such that {w, v} ∈ E and w is an element
of W' and by d_1 the number of vertices w such that {w, v} ∈ E and w is not an
element of W', we obtain

    Σ_{w: {w,v}∈E} Σ_{j: 1≤j≤k} |z_{ij}^{vw} (∂/∂z_{ij}^{vw}) Hom^W_L(Z)|
      = d_0 |Hom^W_L(Z)| + Σ_{w: {w,v}∈E, w not in W'} Σ_{j: (L,j) is admissible} |Hom^{(W,w)}_{(L,j)}(Z)|
      ≤ d_0 |Hom^W_L(Z)| + d_1 τ^{−1} |Hom^W_L(Z)| ≤ (Δ(G)/τ) |Hom^W_L(Z)|

and the proof follows. □

Now we are ready to prove Theorem 7.3.2.

7.3.15 Proof of Theorem 7.3.2. First we define some constants. For some 0 < α <
2π/3, to be specified later, we choose

    ε = (sin α) cos(α/2)  so that  α = arcsin(ε/cos(α/2)),

see Sect. 7.3.11. Let

    ξ = ln(1 + ε)  so that  e^ξ − 1 = ε,

see Sect. 7.3.13, and let

    τ = cos(α/2),

see Lemma 7.3.14. We define

    δ_0 = ξτ/(4 + ξτ)

and let

    δ = δ_0/Δ(G),

so that

    4δΔ(G)/((1 − δ)τ) ≤ ξ,

see Lemma 7.3.12. As our goal is to maximize δ_0, we choose α to maximize

    ξτ = cos(α/2) ln(1 + (sin α) cos(α/2)).

Numerical computations show that it is reasonable to choose

    α = 1,

so that

    ε ≈ 0.74,  ξ ≈ 0.55,  τ ≈ 0.88

and

    δ_0 > 0.108.
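These numerical claims are straightforward to reproduce (a throwaway sketch; the grid resolution is arbitrary):

```python
import math

def delta0_of(alpha):
    # delta_0 = xi * tau / (4 + xi * tau) with the choices of Sect. 7.3.15
    eps = math.sin(alpha) * math.cos(alpha / 2)
    xi = math.log(1 + eps)
    tau = math.cos(alpha / 2)
    return xi * tau / (4 + xi * tau)

eps = math.sin(1) * math.cos(0.5)   # ~0.7385
xi = math.log(1 + eps)              # ~0.5530
tau = math.cos(0.5)                 # ~0.8776
print(round(delta0_of(1.0), 4))     # -> 0.1082, so delta_0 > 0.108

# a grid search confirms that alpha = 1 is close to the maximizer of xi * tau
best = max((k / 1000 * (2 * math.pi / 3) for k in range(1, 1000)), key=delta0_of)
print(0.9 < best < 1.1)             # -> True
```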

Our goal is to show that

    Hom_{G,m}(Z) ≠ 0  for all Z ∈ U(δ).


We prove, by descending induction on r = |V|, |V| − 1, ..., 2, the following
Statements 1.r–5.r.

Statement 1.r. Let W be an admissible sequence of vertices and let L be an
admissible sequence of indices such that |W| = |L| = r. Then Hom^W_L(Z) ≠ 0.

Statement 2.r. Let W be an admissible sequence of vertices and let L be an
admissible sequence of indices such that |W| = |L| = r. If W = (W', v) and
L = (L', i) then

    |Hom^W_L(Z)| ≥ (τ/Δ(G)) Σ_{w: {w,v}∈E} Σ_{l: 1≤l≤k} |z_{il}^{vw} (∂/∂z_{il}^{vw}) Hom^W_L(Z)|.

Statement 3.r. Let W be an admissible sequence of vertices and let L be an
admissible sequence of indices such that |W| = |L| = r. If W = (W', u, v) and
L = (L', i, j) then

    |Hom^{(W',u,v)}_{(L',j,i)}(Z) / Hom^{(W',u,v)}_{(L',i,j)}(Z) − 1| ≤ ε.

Statement 4.r. Let W be an admissible sequence of vertices such that |W| = r.
Suppose that W = (W', w) and let L' be an admissible sequence of indices such that
|L'| = r − 1. Let i and j be indices such that the sequences (L', i) and (L', j) are
admissible. Then Hom^{(W',w)}_{(L',i)}(Z) ≠ 0, Hom^{(W',w)}_{(L',j)}(Z) ≠ 0 and the angle between the
two complex numbers does not exceed α.

Statement 5.r. Let L be an admissible sequence of indices such that |L| = r.
Suppose that L = (L', i) and let W' be an admissible sequence of vertices such that
|W'| = r − 1. Let u and v be vertices such that the sequences (W', u) and (W', v) are
admissible. Then Hom^{(W',u)}_{(L',i)}(Z) ≠ 0, Hom^{(W',v)}_{(L',i)}(Z) ≠ 0 and the angle between the
two complex numbers does not exceed α.

Suppose that r = |V| and let W = (v_1, ..., v_r) and L = (i_1, ..., i_r). Then

    Hom^W_L(Z) = ∏_{1≤j<l≤r: {v_j,v_l}∈E} z_{i_j i_l}^{v_j v_l},

and hence Statement 1.r holds. Furthermore, if deg v_r is the degree of v_r, we get

    Σ_{w: {w,v_r}∈E} Σ_{l: 1≤l≤k} |z_{i_r l}^{v_r w} (∂/∂z_{i_r l}^{v_r w}) Hom^W_L(Z)| = (deg v_r) |Hom^W_L(Z)|

and Statement 2.r follows as well. Lemma 7.3.12 implies that Statement 3.r holds.
Statements 4.r and 5.r hold since if L' is an admissible sequence of indices such
that |L'| = |V| − 1 then there is a unique index i such that the sequence (L', i) is
admissible, and if W' is an admissible sequence of vertices such that |W'| = |V| − 1
then there is a unique vertex w such that the sequence (W', w) is admissible.
From formula (7.3.8.1) and Lemma 3.6.3, we get the implication:

    Statement 1.r and Statement 4.r =⇒ Statement 1.(r − 1).

From Lemma 7.3.14, we get the implication

    Statement 4.r =⇒ Statement 2.(r − 1).

From Lemma 7.3.12, we get the implication

    Statement 1.(r − 1) and Statement 2.(r − 1) =⇒ Statement 3.(r − 1).

From Part (1) of Lemma 7.3.10, we get the implication

    Statement 3.r and Statement 4.r =⇒ Statement 5.(r − 1).

From Part (2) of Lemma 7.3.10, we get the implication

    Statement 3.r and Statement 5.r =⇒ Statement 4.(r − 1).

This proves Statements 1.2–5.2. Applying again Part (2) of Lemma 7.3.10, we get
the implication

    Statement 3.2 and Statement 5.2 =⇒ Statement 4.1.

Then from formula (7.3.8.1) and Lemma 3.6.3, we get the implication

    Statement 4.1 =⇒ Statement 1.0,

which completes the proof. □

7.4 The Lee–Yang Circle Theorem and the Ising Model

Our goal is to prove the following remarkable theorem of Lee and Yang [LY52].
 
7.4.1 Theorem. Let A = (a_{ij}) be an n × n complex Hermitian matrix (thus we
have a_{ij} = \overline{a_{ji}} for all i, j) such that |a_{ij}| ≤ 1 for all 1 ≤ i, j ≤ n. Let us define a
univariate polynomial

    Cut_A(z) = Σ_{S⊂{1,...,n}} z^{|S|} ∏_{i∈S, j∉S} a_{ij}.

Then every root z_0 of Cut_A satisfies |z_0| = 1.

Fig. 7.2 The cut created by a set S of vertices and the directed edges contributing to the weight of the cut

The polynomial Cut_A(z) enumerates all cuts in the complete directed graph G
with vertex set {1, ..., n} and weight a_{ij} on the edge i → j, cf. Sect. 7.1.11.
Every subset S ⊂ {1, ..., n} of vertices, including S = ∅, creates a cut. The weight
of the cut is the product of the weights of all directed edges of G that originate in S and
end outside of S (for S = ∅ and for S = {1, ..., n} the weight of the cut is 1), see
Fig. 7.2, while the monomial z^{|S|} accounts for the cardinality of the set S.
We note that the weights of the cuts corresponding to a set S and to its complement
are complex conjugates of each other and hence

    z^n \overline{Cut_A(1/\bar{z})} = Cut_A(z).

As follows from Lemma 2.2.1, see also Sect. 3.6, Theorem 4.1.5, Theorem 4.4.2,
Sect. 6.1.5, Theorems 7.1.5 and 7.2.3, for any 0 < δ < 1 fixed in advance and any
0 < ε < 1 there is a polynomial p = p_{n,δ,ε} in z and the entries a_{ij} of an n × n
Hermitian matrix A = (a_{ij}) such that deg p = O(ln n − ln ε) and

    |ln Cut_A(z) − p(A, z)| ≤ ε

provided |a_{ij}| ≤ 1 for all i, j and |z| ≤ δ. As before, the approximating polynomial
p can be computed in n^{O(ln n − ln ε)} time.
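Before turning to the proof, Theorem 7.4.1 can be sanity-checked numerically: for a random Hermitian matrix with entries of modulus at most 1, all roots of Cut_A land on the unit circle. An illustrative sketch (the matrix size and random seed are arbitrary):

```python
import numpy as np
from itertools import combinations

def cut_coeffs(A):
    # coefficients of Cut_A(z): coeffs[r] = sum over |S| = r of prod_{i in S, j not in S} a_ij
    n = A.shape[0]
    coeffs = np.zeros(n + 1, dtype=complex)
    for r in range(n + 1):
        for S in combinations(range(n), r):
            inside = set(S)
            w = 1.0 + 0.0j
            for i in S:
                for j in range(n):
                    if j not in inside:
                        w *= A[i, j]
            coeffs[r] += w
    return coeffs

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (B + B.conj().T) / 2
A = A / np.abs(A).max()                  # Hermitian with |a_ij| <= 1
roots = np.roots(cut_coeffs(A)[::-1])    # np.roots expects highest degree first
print(np.allclose(np.abs(roots), 1.0))   # -> True
```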
Our proof follows [Hi97], see also [Ru71] and [As70].

7.4.2 Lemma. Let $a$ be a complex number such that $|a| \le 1$ and let $z_1$ and $z_2$ be complex numbers such that $|z_1|, |z_2| < 1$. Then

$$1 + a z_1 + \bar{a} z_2 + z_1 z_2 \ne 0.$$

Proof. If $|a| = 1$ then $a \bar{a} = 1$ and

$$1 + a z_1 + \bar{a} z_2 + z_1 z_2 = \left(1 + a z_1\right)\left(1 + \bar{a} z_2\right)$$

and the proof follows. Hence we may assume that $|a| < 1$. Solving the equation

$$1 + a z_1 + \bar{a} z_2 + z_1 z_2 = 0$$

for $z_2$, we obtain

$$z_2 = -\frac{1 + a z_1}{\bar{a} + z_1}. \tag{7.4.2.1}$$

For any $z$ such that $|z| = 1$, we have

$$|1 + a z| = \left|\overline{1 + a z}\right| = \left|1 + \bar{a}\bar{z}\right| \quad\text{and}\quad |\bar{a} + z| = |\bar{a} + z|\,|\bar{z}| = |\bar{a}\bar{z} + 1|,$$

from which it follows that the transformation

$$z \longmapsto -\frac{1 + a z}{\bar{a} + z}$$

maps the unit circle $|z| = 1$ onto itself and the disc $|z| < 1$ onto its complement $|z| > 1$ (we use that $|a| < 1$). Therefore, if $z_2$ satisfies (7.4.2.1) with some $|z_1| < 1$, we must have $|z_2| > 1$ and the proof follows. $\square$
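The mapping property used in the proof can be spot-checked numerically; a minimal sketch (test values chosen arbitrarily):

```python
import cmath
import random

# for |a| < 1 the Moebius transformation z -> -(1 + a z)/(conj(a) + z)
# sends points of the unit circle to points of the unit circle
random.seed(2)
for _ in range(1000):
    a = random.random() * 0.95 * cmath.exp(1j * random.uniform(0, 2 * cmath.pi))
    z = cmath.exp(1j * random.uniform(0, 2 * cmath.pi))   # |z| = 1
    w = -(1 + a * z) / (a.conjugate() + z)
    assert abs(abs(w) - 1.0) < 1e-9
```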

7.4.3 Proof of Theorem 7.4.1. Let us consider an $n$-variate polynomial

$$p_A(z_1, \dots, z_n) = \sum_{S \subset \{1,\dots,n\}} z_S \prod_{\substack{i \in S \\ j \notin S}} a_{ij}, \quad\text{where}\quad z_S = \prod_{i \in S} z_i. \tag{7.4.3.1}$$

For $1 \le i < j \le n$, let us define

$$\begin{aligned} p_{ij}(z_1, \dots, z_n) &= \sum_{\substack{S \subset \{1,\dots,n\} \\ i \notin S,\ j \notin S}} z_S \ +\ \sum_{\substack{S \subset \{1,\dots,n\} \\ i \in S,\ j \in S}} z_S \ +\ a_{ij} \sum_{\substack{S \subset \{1,\dots,n\} \\ i \in S,\ j \notin S}} z_S \ +\ a_{ji} \sum_{\substack{S \subset \{1,\dots,n\} \\ i \notin S,\ j \in S}} z_S \\ &= \left(1 + a_{ij} z_i + a_{ji} z_j + z_i z_j\right) \sum_{\substack{S \subset \{1,\dots,n\} \\ i \notin S,\ j \notin S}} z_S \\ &= \left(1 + a_{ij} z_i + a_{ji} z_j + z_i z_j\right) \prod_{k \in \{1,\dots,n\} \setminus \{i,j\}} \left(1 + z_k\right). \end{aligned}$$

From Lemma 7.4.2 it follows that

$$p_{ij}(z_1, \dots, z_n) \ne 0 \quad\text{provided}\quad |z_1|, \dots, |z_n| < 1.$$

Therefore, for any real $0 < \rho < 1$, the polynomial

$$(z_1, \dots, z_n) \longmapsto p_{ij}(\rho z_1, \dots, \rho z_n)$$

is D-stable, see Sect. 2.5. On the other hand, $p_A$ is the Schur (Hadamard) product of the polynomials $p_{ij}$ over all pairs $1 \le i < j \le n$. Therefore, by Theorem 2.5.1, for any real $0 < \rho < 1$, the polynomial

$$\sum_{S \subset \{1,\dots,n\}} z_S\, \rho^{|S| n(n-1)/2} \prod_{\substack{i \in S \\ j \notin S}} a_{ij}$$

is D-stable. Taking the limit as $\rho \longrightarrow 1$, by Hurwitz's Theorem, cf. the proof of Lemma 2.4.2, we conclude that

$$p_A(z_1, \dots, z_n) \ne 0 \quad\text{provided}\quad |z_1|, \dots, |z_n| < 1.$$

Therefore,
$$\mathrm{Cut}_A(z) = p_A(z, \dots, z) \ne 0 \quad\text{provided}\quad |z| < 1.$$

Since
$$z^n\, \overline{\mathrm{Cut}_A\!\left(1/\bar{z}\right)} = \mathrm{Cut}_A(z),$$

we conclude that
$$\mathrm{Cut}_A(z) \ne 0 \quad\text{provided}\quad |z| > 1$$

and the proof follows. $\square$


As follows from our proof, we have
 
p A (z 1 , . . . , z n ) = zS ai j = 0
S⊂{1,...,n} i∈S
j ∈S
/

provided
|z i | < 1 for i = 1, . . . , n.

Consequently, for any 0 < δ < 1, fixed in advance, there is an algorithm which,
given a Hermitian matrix A = ai j such that |ai j | ≤ 1 for all i and j, complex
z 1 , . . . , z n such that |z i | ≤ δ for i = 1, . . . , n and a real 0 <  < 1 approximates
p A (z 1 , . . . , z n ) within a relative error of  in n O(ln n−ln ) time. For a Markov Chain
Monte Carlo based algorithm, see [JS93].
7.4.4 The Ising model. One of the oldest and most famous models in statistical physics, the Ising model, seeks to explain the phase transition in magnetization. It is described as follows: let $G = (V, E)$ be an undirected graph without loops or multiple edges. Typically, $G$ is a graph of a rectangular region of the 2-dimensional integer grid $\mathbb{Z}^2$ or a cubical region of the 3-dimensional grid $\mathbb{Z}^3$, see Fig. 7.3.

Fig. 7.3 The graph of a rectangular region of $\mathbb{Z}^2$

We think of the vertices of $G$, which we number $1, 2, \dots, |V|$, as atoms. Suppose that some real numbers $b_{ij}$ for $\{i, j\} \in E$ are attached to the edges of $G$, which characterize interactions between neighboring atoms, and that real numbers $c_i$ for $i = 1, \dots, |V|$ are attached to the vertices of $G$, which characterize the external magnetic field. An assignment $\sigma: V \longrightarrow \{-1, 1\}$ of signs to the vertices of $G$ is called a configuration and the signs themselves are interpreted as spins of the atoms. The energy of the configuration $\sigma$ is defined as

$$H(\sigma) = -\sum_{\{i,j\} \in E} b_{ij}\, \sigma(i)\sigma(j) \ -\ \sum_{i \in V} c_i\, \sigma(i).$$

The partition function of the Ising model is just the sum over all $2^{|V|}$ configurations:

$$Z(G, t) = \sum_{\sigma: V \to \{-1,1\}} e^{-\gamma H(\sigma)/t} = \sum_{\sigma: V \to \{-1,1\}} \exp\left\{\gamma t^{-1}\left(\sum_{\{i,j\} \in E} b_{ij}\, \sigma(i)\sigma(j) + \sum_{i \in V} c_i\, \sigma(i)\right)\right\}, \tag{7.4.4.1}$$

where $t > 0$ is a parameter interpreted as the temperature and $\gamma > 0$ is an absolute constant. The partition function defines a probability distribution on the set of all $2^{|V|}$ configurations:

$$\Pr(\sigma) = \frac{e^{-\gamma H(\sigma)/t}}{Z(G, t)} \quad\text{for}\quad \sigma: V \longrightarrow \{-1, 1\}. \tag{7.4.4.2}$$

Some observations are in order. As the temperature $t \longrightarrow +\infty$ grows, the distribution approaches the uniform distribution on the set of all configurations. As the temperature $t \longrightarrow 0+$ falls to 0, the distribution concentrates on the configurations with the lowest energy. Suppose that $c_i = 0$ for all $i$, so that there is no external magnetic field. If all $b_{ij} > 0$ then the configurations where the spins of neighboring atoms coincide have lower energy and hence higher probability. This is called the ferromagnetic case. If all $b_{ij} < 0$ then the configurations where the spins of neighboring atoms are opposite have lower energy and hence higher probability. This is called the anti-ferromagnetic case, see Fig. 7.4.

Fig. 7.4 The most likely configurations in the ferromagnetic case (a) and (b) and in the anti-ferromagnetic case (c) and (d). Black dots denote spins $+1$ while white dots denote spins $-1$
One can observe now that by a change of variables, $Z(G, t)$ is transformed into the partition function of Theorem 7.4.1, more precisely into (7.4.3.1). Namely, we write

$$Z(G, t) = \exp\left\{\gamma t^{-1}\left(\sum_{\{i,j\} \in E} b_{ij} + \sum_{i \in V} c_i\right)\right\} \times \sum_{\sigma: V \to \{-1,1\}} \exp\left\{-2\gamma t^{-1}\left(\sum_{\substack{\{i,j\} \in E: \\ \sigma(i) \ne \sigma(j)}} b_{ij} \ +\ \sum_{\substack{i \in V: \\ \sigma(i) = -1}} c_i\right)\right\}.$$

A configuration $\sigma: V \longrightarrow \{-1, 1\}$ is uniquely determined by the subset $S \subset V$ of vertices where $\sigma(i) = 1$. Hence letting

$$a_{ij} = \exp\left\{-2\gamma t^{-1} b_{ij}\right\} \quad\text{and}\quad z_i = \exp\left\{-2\gamma t^{-1} c_i\right\},$$

we can further write

$$Z(G, t) = \exp\left\{\gamma t^{-1}\left(\sum_{\{i,j\} \in E} b_{ij} + \sum_{i \in V} c_i\right)\right\} \sum_{S \subset V} z_S \prod_{\substack{i \in S \\ j \notin S}} a_{ij},$$

where we agree that $a_{ij} = 1$ if $\{i, j\} \notin E$ (equivalently, we agree that $b_{ij} = 0$ if $\{i, j\} \notin E$) and

$$z_S = \prod_{i \in S} z_i.$$

Hence up to a simple factor, $Z(G, t)$ is indeed transformed into the partition function (7.4.3.1) of cuts. Moreover, the case of $|a_{ij}| \le 1$ treated by Theorem 7.4.1 corresponds to the ferromagnetic case of $b_{ij} \ge 0$. Theorem 7.4.1 thus says that in the ferromagnetic case the roots of $c \longmapsto Z(G, t; c)$, as a function of the constant magnetic field $c_i = c$ interpreted as a complex variable, are purely imaginary, that is, satisfy $\Re c = 0$.
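The change of variables can be verified directly on a toy instance, a hypothetical triangle with arbitrarily chosen couplings $b_{ij}$ and fields $c_i$, taking $\gamma = t = 1$:

```python
import itertools
import math

V = [0, 1, 2]
E = [(0, 1), (0, 2), (1, 2)]
b = {(0, 1): 0.3, (0, 2): -0.7, (1, 2): 1.1}   # couplings b_ij
c = {0: 0.2, 1: -0.5, 2: 0.9}                  # external fields c_i

# direct summation of (7.4.4.1) over all 2^|V| configurations
Z_direct = 0.0
for sigma in itertools.product([-1, 1], repeat=len(V)):
    expo = sum(b[e] * sigma[e[0]] * sigma[e[1]] for e in E) \
         + sum(c[i] * sigma[i] for i in V)
    Z_direct += math.exp(expo)

# the cut form: a_ij = exp(-2 b_ij) on edges (1 off the edge set), z_i = exp(-2 c_i)
def a(i, j):
    e = (min(i, j), max(i, j))
    return math.exp(-2 * b[e]) if e in b else 1.0

prefactor = math.exp(sum(b.values()) + sum(c.values()))
Z_cuts = 0.0
for k in range(len(V) + 1):
    for S in itertools.combinations(V, k):
        S = set(S)
        term = math.prod(math.exp(-2 * c[i]) for i in S)   # z_S
        for i in S:
            for j in V:
                if j not in S:
                    term *= a(i, j)                        # cut weight
        Z_cuts += term
Z_cuts *= prefactor

assert math.isclose(Z_direct, Z_cuts)
```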
There are two related, though not identical, notions of a phase transition in the Ising model. Both are asymptotic, as the graph grows in some regular way (for example, when the square region in $\mathbb{Z}^2$ or the cubical region in $\mathbb{Z}^3$ gets larger, see Fig. 7.3). The first notion has to do with complex zeros of the partition function $Z(G, t)$ defined by (7.4.4.1). Various quantities that have physical interpretation can be expressed in terms of the "free energy per atom"

$$\frac{1}{|V|} \ln Z(G, t),$$

see [Ci87] and references therein. If for a sequence $G_n = (V_n, E_n)$ of growing graphs a complex zero of the function $t \longmapsto Z(G_n, t)$ approaches the positive real axis, it means that the "thermodynamic limit"

$$\lim_{n \longrightarrow \infty} \frac{1}{|V_n|} \ln Z(G_n, t)$$

hits a singularity at some temperature $t_c$, and hence those physical quantities hit a singularity (discontinuity or loss of smoothness) as well, which is an indication of a phase transition (such as the loss of magnetization or gas becoming liquid, etc.) occurring at the temperature $t_c$, see [YL52]. Hence Theorem 7.4.1 implies that as long as the magnetic field remains constant and non-zero, there is no phase transition in the ferromagnetic case, as all the zeros of $t \longmapsto Z(G_n, t)$ stay away from the positive real axis even as the graph $G_n$ grows. If the external magnetic field is zero, Onsager [On44] demonstrated that there is indeed a phase transition at a particular temperature in the case of a growing rectangular region of the 2-dimensional grid as on Fig. 7.3, in the ferromagnetic case with constant interactions $b_{ij} > 0$, see also Chap. 10 of [Ai07] for the computation of the partition function in that case.
The second notion of the phase transition has to do with the correlation decay phenomenon, as in Sects. 5.2, 6.3 and 6.4. Suppose that there is no external magnetic field, so that $c_i = 0$ for all $i \in V$. Let us consider the probability distribution on the set of configurations $\sigma$ defined by (7.4.4.2). We choose a particular vertex $i$ in $G$ and consider the conditional probability that $\sigma(i) = 1$, given that the spins of the vertices far away from $i$ are also equal to 1. For example, $i$ is the central vertex on Fig. 7.3, the spins are fixed to 1 on the boundary of the square and the size of the square is allowed to grow. We say that the phase transition occurs at a particular temperature $t_c$ if for higher temperatures $t > t_c$ the probability that $\sigma(i) = 1$ asymptotically does not depend on the boundary conditions (no long range interactions), while for lower temperatures $t < t_c$ the probability that $\sigma(i) = 1$ asymptotically depends on the boundary conditions (long range interactions appear). In 1936, Peierls found a relatively simple argument, which allows one to show that this kind of phase transition indeed occurs for a variety of graphs, in particular for grids in $\mathbb{Z}^d$ with $d \ge 2$, see [Ci87] for an exposition and references.
7.4.5 Reduction to matchings. Fisher [Fi66] showed that computing the partition function $Z(G, t)$ defined by (7.4.4.1) in the case of zero magnetic field (that is, when $c_i = 0$ for all $i \in V$) can be reduced to counting weighted perfect matchings in some auxiliary graph $\hat{G}$, that is, to computing an appropriate hafnian, see Sect. 4.1. Moreover, if $G$ is a planar graph, the graph $\hat{G}$ is also planar, so one can use Pfaffians to compute $Z(G, t)$, see Sect. 4.3. Heilmann and Lieb [HL72] modified Fisher's argument to account for a non-zero magnetic field and showed that in general computing $Z(G, t)$ reduces to computing the matching polynomial of a graph, see Chap. 5. Below we follow [Fi66].

To simplify the notation, we write (7.4.4.1) in the absence of the magnetic field simply as

$$Z(G) = \sum_{\sigma: V \to \{-1,1\}} \exp\left\{\sum_{\{i,j\} \in E} b_{ij}\, \sigma(i)\sigma(j)\right\},$$

where $b_{ij}$ are some real weights on the edges of $E$. Since the product $\sigma(i)\sigma(j)$ takes only two values, $+1$ and $-1$, we can interpolate $\exp\left\{b_{ij}\, \sigma(i)\sigma(j)\right\}$ by an affine function in $\sigma(i)\sigma(j)$ and write

$$Z(G) = \sum_{\sigma: V \to \{-1,1\}} \prod_{\{i,j\} \in E} \left(f_{ij} + g_{ij}\, \sigma(i)\sigma(j)\right)$$

where
$$f_{ij} = \frac{e^{b_{ij}} + e^{-b_{ij}}}{2} > 0 \quad\text{and}\quad g_{ij} = \frac{e^{b_{ij}} - e^{-b_{ij}}}{2}.$$

Next, we factor out $f_{ij}$ and write

$$Z(G) = \left(\prod_{\{i,j\} \in E} f_{ij}\right) Z_0(G) \quad\text{where}\quad Z_0(G) = \sum_{\sigma: V \to \{-1,1\}} \prod_{\{i,j\} \in E} \left(1 + h_{ij}\, \sigma(i)\sigma(j)\right) \quad\text{and}\quad h_{ij} = \frac{e^{b_{ij}} - e^{-b_{ij}}}{e^{b_{ij}} + e^{-b_{ij}}}. \tag{7.4.5.1}$$

We note that the signs of $h_{ij}$ and $b_{ij}$ coincide. We will be computing $Z_0(G)$.

Let us expand the product in the definition of $Z_0(G)$. We obtain various monomials of the type

$$h_{i_1 j_1} \cdots h_{i_s j_s}\, \sigma(i_1)\sigma(j_1) \cdots \sigma(i_s)\sigma(j_s). \tag{7.4.5.2}$$

The monomials that survive summing over all $\sigma: V \longrightarrow \{-1, 1\}$ correspond to the collections $T$ of distinct edges $\{i_1, j_1\}, \dots, \{i_s, j_s\}$ that cover every vertex $i$ of $V$ an even, possibly zero, number of times. We call such collections Eulerian. Hence we can write

$$Z_0(G) = 2^{|V|} \sum_{\substack{T \subset E \\ T \text{ is Eulerian}}} \prod_{\{i,j\} \in T} h_{ij}. \tag{7.4.5.3}$$
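The identity (7.4.5.3) can be confirmed on a small example, a hypothetical triangle with arbitrary couplings (here the only Eulerian subsets are the empty set and the full triangle):

```python
import itertools
import math

V = [0, 1, 2]
E = [(0, 1), (0, 2), (1, 2)]
b = {(0, 1): 0.4, (0, 2): -0.2, (1, 2): 0.8}
h = {e: math.tanh(b[e]) for e in E}   # h_ij = (e^b - e^-b)/(e^b + e^-b)

# Z_0(G) as a sum over all configurations
Z0_config = sum(
    math.prod(1 + h[e] * sigma[e[0]] * sigma[e[1]] for e in E)
    for sigma in itertools.product([-1, 1], repeat=len(V))
)

# Z_0(G) as 2^|V| times the sum over Eulerian edge subsets
Z0_euler = 0.0
for k in range(len(E) + 1):
    for T in itertools.combinations(E, k):
        deg = {v: 0 for v in V}
        for i, j in T:
            deg[i] += 1
            deg[j] += 1
        if all(d % 2 == 0 for d in deg.values()):   # T is Eulerian
            Z0_euler += math.prod(h[e] for e in T)
Z0_euler *= 2 ** len(V)

assert math.isclose(Z0_config, Z0_euler)
```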

Next, we begin to modify $G$. First, we construct an intermediate graph $\tilde{G} = (\tilde{V}, \tilde{E})$ with weights on edges such that $Z_0(G) = Z_0(\tilde{G})$ and the degree of every vertex of $\tilde{G}$ does not exceed 3. We do it step by step, each time replacing a vertex of degree $d > 3$ by $d$ clones, connected in a circular order as on Fig. 7.5.

The edges of the obtained graph are of two kinds: the inherited edges, connecting clones of the vertex to other vertices (thick lines on Fig. 7.5) and circular edges, connecting clones of the vertex of $G$ within themselves (thin lines on Fig. 7.5). The weights $\tilde{h}_{ij}$ on the inherited edges are copied from those on the corresponding edges of $G$, while for any circular edge $\{i, j\}$ we let $\tilde{h}_{ij} = 1$. Whenever $\sigma(i) \ne \sigma(j)$ for some two clones of a vertex of $G$, we also have $\sigma(i) \ne \sigma(j)$ for two neighboring clones and hence by (7.4.5.1) the contribution of the corresponding configuration $\sigma$ to the partition function is just 0. Repeating this process, we obtain a graph $\tilde{G}$ with vertices of degree 1, 2 and 3 and such that $Z_0(G) = Z_0(\tilde{G})$. Note that if $G$ is planar then $\tilde{G}$ is also planar.
Hence without loss of generality, we may assume that the degree of every vertex of $G$ is 1, 2 or 3. We still denote the weights on the edges of $G$ by $h_{ij}$ and without loss of generality we assume that $h_{ij} > 0$. Next, we construct a weighted graph $\hat{G}$ such that $Z_0(G)$ is expressed as the partition function enumerating perfect matchings in $\hat{G}$, see Sect. 4.1.

Fig. 7.5 Replacing a vertex of degree 6 by 6 vertices of degree 3

Fig. 7.6 Constructing $\hat{G}$ from $G$

We keep vertices of degree 1 intact. If $h_{ij}$ is the weight on the unique edge incident to such a vertex, we assign weight $w_{ij} = 1/h_{ij}$ to the unique inherited edge in $\hat{G}$ incident to the vertex, see Fig. 7.6. Every vertex of $G$ of degree 2 we replace by two clones, connected by an auxiliary edge of weight 1 in $\hat{G}$ (thin line on Fig. 7.6) and connected to other vertices by two inherited edges (thick lines on Fig. 7.6). If an edge incident in $G$ to such a vertex has weight $h_{ij}$, we assign weight $w_{ij} = 1/h_{ij}$ to the corresponding inherited edge in $\hat{G}$. Every vertex of $G$ of degree 3 we replace by three clones connected by auxiliary edges of weight 1 each (thin lines on Fig. 7.6) and connected to other vertices by three inherited edges (thick lines on Fig. 7.6). If an edge incident in $G$ to such a vertex has weight $h_{ij}$, we assign weight $w_{ij} = 1/h_{ij}$ to the corresponding inherited edge in $\hat{G}$.
Given an Eulerian collection $T$ of edges in $G$, we construct a perfect matching in $\hat{G}$ as follows: we include an inherited edge into the perfect matching if and only if the corresponding edge of $G$ is not included in $T$. We then include auxiliary edges to complete the matching (the choice is unique). One can observe that the correspondence is a bijection between Eulerian collections in $G$ (which are just collections of vertex-disjoint cycles) and perfect matchings in $\hat{G}$. One can deduce from (7.4.5.3) that

$$Z_0(G) = 2^{|V|}\left(\prod_{\{i,j\} \in E} h_{ij}\right) \operatorname{haf}(\hat{G}),$$

where $\operatorname{haf}(\hat{G})$ is the sum of the weights of the perfect matchings in $\hat{G}$ and where the weight of a perfect matching is the product of the weights of its edges, see Sect. 4.1. We note that if $G$ is a planar graph then $\hat{G}$ is also planar.
Chapter 8
Partition Functions of Integer Flows

We consider yet another extension of the permanent, and some of the methods and
results of Chap. 3 (capacity of polynomials, connections to H-stable polynomials,
the van der Waerden and Bregman–Minc bounds) are used. Geometrically, with each integer point of a polyhedron in $\mathbb{R}^n$, we associate a monomial in $n$ real variables and the partition function is just the sum of the monomials over the integer points in the
polyhedron. When the variables are non-negative, we prove a general upper bound
for the partition function in terms of the solution to a convex optimization problem
(entropy maximization) on the polyhedron. Although for general polyhedra there can
be no matching lower bound, such a bound indeed exists in the case of polyhedra of
feasible flows in a graph. This allows us to understand what a “typical” random integer
point in a flow polyhedron looks like. Based on this understanding and with intuition
supplied by the Local Central Limit Theorem, we present a heuristic “Gaussian”
formula for the partition function of a general polyhedron. Its validity has indeed
been proven in some particular cases, though not in this book.

8.1 The Partition Function of 0-1 Flows

8.1.1 Definitions. Let us choose positive integer vectors $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ such that

$$r_1 + \ldots + r_m = c_1 + \ldots + c_n = N \tag{8.1.1.1}$$

and let $\mathcal{D}_0(R, C)$ be the set of all $m \times n$ matrices with row sums $R$, column sums $C$ and 0-1 entries:
© Springer International Publishing AG 2016 269


A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9_8
$$\mathcal{D}_0(R, C) = \left\{ D = (d_{ij}): \ \sum_{j=1}^n d_{ij} = r_i \ \text{for} \ i = 1, \dots, m, \quad \sum_{i=1}^m d_{ij} = c_j \ \text{for} \ j = 1, \dots, n \quad\text{and}\quad d_{ij} \in \{0, 1\} \ \text{for all} \ i, j \right\}.$$
The vectors $R$ and $C$ are called margins of a matrix from $\mathcal{D}_0(R, C)$. The Gale–Ryser Theorem, see for example, Sect. 6.2 of [BR91], provides a convenient necessary and sufficient condition for $\mathcal{D}_0(R, C)$ to be non-empty: assuming that

$$m \ge c_1 \ge c_2 \ge \ldots \ge c_n > 0$$

and that
$$n \ge r_i > 0 \quad\text{for}\quad i = 1, \dots, m,$$

there is a 0-1 matrix with row sums $R$ and column sums $C$ if and only if the balance condition (8.1.1.1) holds and

$$\sum_{i=1}^m \min\{r_i, k\} \ \ge\ \sum_{j=1}^k c_j \quad\text{for}\quad k = 1, \dots, n. \tag{8.1.1.2}$$

In the extreme case when $\mathcal{D}_0(R, C)$ consists of a single matrix, that matrix has 1s arranged in a staircase pattern:

$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}.$$

Thus in the above matrix we have $m = 4$, $n = 5$, $R = (5, 3, 3, 2)$, $C = (4, 4, 3, 1, 1)$ and the inequalities (8.1.1.2) are equalities.
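The condition is straightforward to check by machine; a sketch for the staircase example above, where every inequality of (8.1.1.2) holds with equality:

```python
R = [5, 3, 3, 2]        # row sums (m = 4)
C = [4, 4, 3, 1, 1]     # column sums, non-increasing (n = 5)

assert sum(R) == sum(C)                  # balance condition (8.1.1.1)
for k in range(1, len(C) + 1):
    # Gale-Ryser condition (8.1.1.2); equality for every k in this example
    assert sum(min(r, k) for r in R) == sum(C[:k])
```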
Assume that $\mathcal{D}_0(R, C)$ is indeed non-empty. Given a non-negative $m \times n$ matrix $W = (w_{ij})$ of weights, we define the partition function of 0-1 flows by

$$\mathrm{Fl}_0(R, C; W) = \sum_{\substack{D \in \mathcal{D}_0(R,C) \\ D = (d_{ij})}} \prod_{i,j} w_{ij}^{d_{ij}}$$

and we agree that $0^0 = 1$ so that $\mathrm{Fl}_0(R, C; W)$ remains a continuous function of $W$ when $w_{ij} \longrightarrow 0+$.

In particular, if $m = n$ and $R = C = (1, \dots, 1)$ then $\mathrm{Fl}_0(R, C; W) = \operatorname{per} W$, see Sect. 3.1.
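For small margins the partition function can be computed by brute-force enumeration; the sketch below (with an arbitrarily chosen $3 \times 3$ weight matrix) confirms the permanent special case:

```python
import itertools

def fl0(R, C, W):
    """Brute-force Fl_0(R, C; W): sum over all 0-1 matrices with row sums R
    and column sums C of the products of w_ij over the entries with d_ij = 1."""
    m, n = len(R), len(C)
    rows = [[r for r in itertools.product((0, 1), repeat=n) if sum(r) == R[i]]
            for i in range(m)]
    total = 0.0
    for D in itertools.product(*rows):
        if all(sum(D[i][j] for i in range(m)) == C[j] for j in range(n)):
            term = 1.0
            for i in range(m):
                for j in range(n):
                    if D[i][j]:
                        term *= W[i][j]
            total += term
    return total

W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
# for m = n and R = C = (1, ..., 1) the partition function is per W
perm = sum(W[0][p[0]] * W[1][p[1]] * W[2][p[2]]
           for p in itertools.permutations(range(3)))
assert fl0([1, 1, 1], [1, 1, 1], W) == perm
```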
Fig. 8.1 A bipartite graph and a feasible 0-1 flow (thick edges) with supplies (1, 2, 1, 2) and demands (1, 1, 2, 1, 1)

8.1.2 The flow interpretation. When the matrix $W$ of weights is itself a 0-1 matrix, the partition function $\mathrm{Fl}_0(R, C; W)$ is naturally interpreted as the number of feasible flows in the network. Namely, we consider a bipartite graph $G$ with $m + n$ vertices numbered $1L, 2L, \dots, mL$ and $1R, 2R, \dots, nR$ and edges $(iL, jR)$ whenever $w_{ij} = 1$. We assign to each vertex $iL$ the supply $r_i$ and to each vertex $jR$ the demand $c_j$. A feasible 0-1 flow is a subset $F$ of edges of $G$ such that every vertex $iL$ is incident to $r_i$ edges from $F$ and each vertex $jR$ is incident to $c_j$ edges from $F$.

For example, the feasible 0-1 flow on Fig. 8.1 corresponds to $m = 4$, $n = 5$,

$$W = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix}$$

and $R = (1, 2, 1, 2)$, $C = (1, 1, 2, 1, 1)$.
The following estimate of $\mathrm{Fl}_0(R, C; W)$ in terms of the capacity of a certain polynomial, see Sect. 2.4, was obtained by Gurvits [Gu15].

8.1.3 Theorem. Given a non-negative $m \times n$ matrix $W = (w_{ij})$ of weights, we define a polynomial $p_W$ in $m + n$ variables by

$$p_W(x_1, \dots, x_m;\ y_1, \dots, y_n) = \prod_{\substack{1 \le i \le m \\ 1 \le j \le n}} \left(x_i + w_{ij} y_j\right).$$

Given margins $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$, let

$$\alpha(R, C; W) = \inf_{\substack{x_1, \dots, x_m > 0 \\ y_1, \dots, y_n > 0}} \frac{p_W(x_1, \dots, x_m;\ y_1, \dots, y_n)}{x_1^{n-r_1} \cdots x_m^{n-r_m}\ y_1^{c_1} \cdots y_n^{c_n}}.$$

Then

$$\left(\prod_{i=1}^m \frac{r_i^{r_i} (n-r_i)^{n-r_i}\, n!}{r_i!\,(n-r_i)!\, n^n}\right)\left(\prod_{j=1}^n \frac{c_j^{c_j} (m-c_j)^{m-c_j}\, m!}{c_j!\,(m-c_j)!\, m^m}\right) \alpha(R, C; W) \ \le\ \mathrm{Fl}_0(R, C; W) \ \le\ \alpha(R, C; W).$$
Proof. First, we claim that $\mathrm{Fl}_0(R, C; W)$ is the coefficient of the monomial

$$x_1^{n-r_1} \cdots x_m^{n-r_m}\ y_1^{c_1} \cdots y_n^{c_n}$$

in the monomial expansion of $p_W$. Indeed, let us write the monomial expansion of $p_W$ by expanding the product of $mn$ factors $\left(x_i + w_{ij} y_j\right)$. With each monomial of $p_W$, we associate an $m \times n$ matrix $D = (d_{ij})$ of 0s and 1s as follows. We let $d_{ij} = 1$ if from the factor $\left(x_i + w_{ij} y_j\right)$ we pick up the term $w_{ij} y_j$ and we let $d_{ij} = 0$ if we pick up $x_i$. We obtain the monomial $x_1^{n-r_1} \cdots x_m^{n-r_m} y_1^{c_1} \cdots y_n^{c_n}$ precisely when the row sums of $D$ are $r_1, \dots, r_m$ and the column sums of $D$ are $c_1, \dots, c_n$.

Next, we observe that $p_W$ is H-stable, see Sect. 2.4, provided $W = (w_{ij})$ is a non-negative real matrix. Indeed, if $x_1, \dots, x_m$ and $y_1, \dots, y_n$ are complex variables such that $\Im x_1, \dots, \Im x_m > 0$ and $\Im y_1, \dots, \Im y_n > 0$ then $\Im\left(x_i + w_{ij} y_j\right) > 0$ for all $i, j$ and hence $p_W(x_1, \dots, x_m;\ y_1, \dots, y_n) \ne 0$.

Finally, we note that the degree of $x_i$ in $p_W$ does not exceed $n$ for $i = 1, \dots, m$, while the degree of $y_j$ in $p_W$ does not exceed $m$ for $j = 1, \dots, n$. The result now follows from Theorem 2.4.7. $\square$
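Since $\mathrm{Fl}_0(R, C; W)$ is a coefficient of $p_W$, and all coefficients of $p_W$ are non-negative, every positive point $(x, y)$ certifies the upper bound of the theorem. A numerical illustration on a random instance (a sketch; it does not compute the infimum itself):

```python
import itertools
import random

random.seed(1)
m, n = 3, 3
R, C = [2, 2, 2], [2, 2, 2]
W = [[random.uniform(0.1, 1.0) for _ in range(n)] for _ in range(m)]

# brute-force Fl_0(R, C; W)
rows = [[r for r in itertools.product((0, 1), repeat=n) if sum(r) == R[i]]
        for i in range(m)]
fl0 = 0.0
for D in itertools.product(*rows):
    if all(sum(D[i][j] for i in range(m)) == C[j] for j in range(n)):
        t = 1.0
        for i in range(m):
            for j in range(n):
                if D[i][j]:
                    t *= W[i][j]
        fl0 += t

def ratio(x, y):
    """p_W(x; y) / (x^{n-R} y^C): an upper bound for Fl_0 at any positive point."""
    p = 1.0
    for i in range(m):
        for j in range(n):
            p *= x[i] + W[i][j] * y[j]
    for i in range(m):
        p /= x[i] ** (n - R[i])
    for j in range(n):
        p /= y[j] ** C[j]
    return p

for _ in range(200):
    x = [random.uniform(0.1, 5.0) for _ in range(m)]
    y = [random.uniform(0.1, 5.0) for _ in range(n)]
    assert fl0 <= ratio(x, y) + 1e-9
```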
Some remarks are in order. Suppose that $W$ is a 0-1 matrix, so that $\mathrm{Fl}_0(R, C; W)$ enumerates 0-1 flows. In many asymptotic regimes, the quantity $\alpha(R, C; W)$ captures at least the logarithmic order of $\mathrm{Fl}_0(R, C; W)$. For example, if $m$, $n$, $r_i$ and $c_j$ grow roughly proportionately, so that $\ln \mathrm{Fl}_0(R, C; W)$ grows roughly linearly with $mn$, it follows from Stirling's formula

$$x! = \sqrt{2\pi x}\left(\frac{x}{e}\right)^x \left(1 + O\left(x^{-1}\right)\right) \quad\text{as}\quad x \longrightarrow +\infty,$$

that $\alpha(R, C; W)$ approximates $\mathrm{Fl}_0(R, C; W)$ within a factor of $e^{O(m+n)}$, that is, $\alpha(R, C; W)$ captures the logarithmic order of $\mathrm{Fl}_0(R, C; W)$. If $m = n$ and $r_i = c_j = 1$ for all $i, j$ then $\mathrm{Fl}_0(R, C; W) = \operatorname{per} W$ and $\alpha(R, C; W)$ approximates $\mathrm{Fl}_0(R, C; W)$ within a factor of

$$\left(1 - \frac{1}{n}\right)^{2n(n-1)} \approx e^{-2(n-1)},$$

and hence even in some sparse regimes $\alpha(R, C; W)$ captures the logarithmic order of $\mathrm{Fl}_0(R, C; W)$. If $W$ is a sparse matrix, the bounds can be improved further, see [Gu15], since the bounds in Theorem 2.4.7 can be made sharper by a more careful application of Theorem 2.4.3.

In [B10b], a weaker bound for the approximation of $\mathrm{Fl}_0(R, C; W)$ by $\alpha(R, C; W)$ was obtained. Based on that bound, it was shown that a "typical" matrix $D = (d_{ij})$ of 0s and 1s, with row sums $R$ and column sums $C$, concentrates about a particular "maximum entropy" matrix $\Theta = (\theta_{ij})$ that maximizes the strictly concave function

$$\sum_{i,j} x_{ij} \ln \frac{1}{x_{ij}} + \left(1 - x_{ij}\right) \ln \frac{1}{1 - x_{ij}}$$

on the polytope of $m \times n$ matrices $X = (x_{ij})$ with row sums $R$, column sums $C$ and entries between 0 and 1. We discuss the connection in Sect. 8.5.1.
If $W$ is the $m \times n$ matrix filled with 1s, then $\mathrm{Fl}_0(R, C; W)$ is just the number of all 0-1 matrices with row sums $R$ and column sums $C$. There is an extensive literature on approximate and asymptotic formulas for $\mathrm{Fl}_0(R, C; W)$, see [G+06, C+08, BH13, IM16] and references therein.
Since $\mathrm{Fl}_0(R, C; W)$ can be represented as the coefficient of a monomial in a product of $mn$ linear forms, it follows from Sect. 3.2.1 that $\mathrm{Fl}_0(R, C; W)$ can be represented as the permanent of an $(mn) \times (mn)$ matrix. For example, for $m = n = 3$, $R = (3, 2, 1)$ and $C = (2, 2, 2)$, formula (3.2.1.2) gives

$$\mathrm{Fl}_0(R, C; W) = \frac{1}{0!\,1!\,2!\,2!\,2!\,2!} \operatorname{per} \begin{pmatrix} 0 & 0 & 0 & w_{11} & w_{11} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & w_{12} & w_{12} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & w_{13} & w_{13} \\ 1 & 0 & 0 & w_{21} & w_{21} & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & w_{22} & w_{22} & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & w_{23} & w_{23} \\ 0 & 1 & 1 & w_{31} & w_{31} & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & w_{32} & w_{32} & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & w_{33} & w_{33} \end{pmatrix}$$

which simplifies to

$$\mathrm{Fl}_0(R, C; W) = w_{11} w_{31} w_{12} w_{22} w_{13} w_{23} + w_{11} w_{21} w_{12} w_{32} w_{13} w_{23} + w_{11} w_{21} w_{12} w_{22} w_{13} w_{33}.$$

As Jerrum, Sinclair and Vigoda remark in [J+04], their randomized polynomial time
algorithm for approximating the permanent of a non-negative real matrix can be
applied to approximate Fl0 (R, C; W ) in polynomial time.

8.2 The Partition Function of Integer Flows

8.2.1 Definitions. Let us choose positive integer vectors $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ such that

$$r_1 + \ldots + r_m = c_1 + \ldots + c_n = N \tag{8.2.1.1}$$

and let $\mathcal{D}_+(R, C)$ be the set of all $m \times n$ non-negative integer matrices with row sums $R$ and column sums $C$:

$$\mathcal{D}_+(R, C) = \left\{ D = (d_{ij}): \ \sum_{j=1}^n d_{ij} = r_i \ \text{for} \ i = 1, \dots, m, \quad \sum_{i=1}^m d_{ij} = c_j \ \text{for} \ j = 1, \dots, n, \quad d_{ij} \in \mathbb{Z} \ \text{and} \ d_{ij} \ge 0 \ \text{for all} \ i, j \right\}.$$

The vectors $R$ and $C$ are called margins of a matrix from $\mathcal{D}_+(R, C)$. It is not hard to show that for positive integer vectors $R$ and $C$ the set $\mathcal{D}_+(R, C)$ is non-empty if and only if the balance condition (8.2.1.1) is satisfied. Assuming that $\mathcal{D}_+(R, C)$ is non-empty, for an $m \times n$ non-negative matrix $W = (w_{ij})$ of weights, we define the partition function of integer flows by

$$\mathrm{Fl}_+(R, C; W) = \sum_{\substack{D \in \mathcal{D}_+(R,C) \\ D = (d_{ij})}} \prod_{i,j} w_{ij}^{d_{ij}}.$$

As in Sect. 8.1, we agree that $0^0 = 1$, so that $\mathrm{Fl}_+(R, C; W)$ remains a continuous function of $W$ when $w_{ij} \longrightarrow 0+$.
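Again, for small margins the partition function can be evaluated by direct enumeration; a sketch, with a simple contingency-table sanity check:

```python
import itertools

def fl_plus(R, C, W):
    """Brute-force Fl_+(R, C; W): sum over all non-negative integer matrices
    with row sums R and column sums C of prod_{ij} w_ij^{d_ij} (with 0^0 = 1)."""
    m, n = len(R), len(C)
    rows = [[d for d in itertools.product(range(s + 1), repeat=n) if sum(d) == s]
            for s in R]
    total = 0.0
    for D in itertools.product(*rows):
        if all(sum(D[i][j] for i in range(m)) == C[j] for j in range(n)):
            term = 1.0
            for i in range(m):
                for j in range(n):
                    term *= W[i][j] ** D[i][j]
            total += term
    return total

# with all weights 1, Fl_+ counts contingency tables: for 2x2 tables with
# margins R = C = (2, 2), the entry d_11 in {0, 1, 2} determines the rest
ones = [[1.0, 1.0], [1.0, 1.0]]
assert fl_plus([2, 2], [2, 2], ones) == 3.0
```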

8.2.2 The flow interpretation. Suppose that $w_{ij} \in \{0, 1\}$ for all $i, j$. Then $\mathrm{Fl}_+(R, C; W)$ is interpreted as the number of integer feasible flows in a network. As in Sect. 8.1.2, we consider a bipartite graph $G$ with $m + n$ vertices numbered $1L, \dots, mL$ and $1R, \dots, nR$ and edges $(iL, jR)$ whenever $w_{ij} = 1$. We assign to each vertex $iL$ the supply $r_i$ and to each vertex $jR$ the demand $c_j$. A feasible integer flow is an assignment of non-negative integer numbers to the edges of $G$ so that for every vertex $iL$ the sum of the assigned numbers on the incident edges is $r_i$ while for every vertex $jR$ the sum of the assigned numbers on the incident edges is $c_j$.

More generally, suppose that $G$ is a directed graph without loops or multiple edges. Suppose that to every vertex $v$ of $G$ an integer $a(v)$ is assigned, which can be positive ("demand"), negative ("supply") or 0 ("transit"). A feasible integer flow is an assignment of non-negative integers $x(e)$ to every edge $e$ of $G$ so that for every $v$ the balance condition inflow $-$ outflow $= a(v)$ is satisfied, see Fig. 8.2:

$$\sum_{e:\ e = u \to v} x(e) \ - \sum_{e:\ e = v \to u} x(e) = a(v) \quad\text{for all}\quad v.$$

Given that the set of feasible integer flows is non-empty, it is finite if and only if the graph contains no directed cycles of the type $v_1 \to v_2 \to \ldots \to v_n \to v_1$. If
there are no directed cycles, one can construct a bipartite graph $\hat{G}$ and a bijection between the set of feasible integer flows in $G$ and the set of feasible integer flows in $\hat{G}$ as follows. With every vertex $v$ of $G$ we associate two vertices $vL$ and $vR$ of $\hat{G}$, connected by a directed edge $vL \to vR$. For every edge $v \to u$ of $G$, we introduce the edge $vL \to uR$ of $\hat{G}$. For every vertex $v$ of $G$, we choose a positive integer $z(v)$ which is at least as large as a possible outflow from $v$. We let the supply in $vL$ be equal to $z(v)$ and the demand in $vR$ be equal to $z(v) + a(v)$, where $a(v)$ is the demand/supply in $v$. To construct the bijection, if $x(e)$ for $e = v \to u$ is a flow in $G$, we introduce the flow $x(e)$ on the edge $vL \to uR$ in $\hat{G}$ and for every vertex $v$ of $G$, we introduce the flow of $z(v) - (\text{outflow of } v)$ on the edge $vL \to vR$, see Fig. 8.3.

Fig. 8.2 A graph with 4 vertices and a feasible flow, corresponding to demands/supplies written inside each vertex

Fig. 8.3 The feasible integer flow in a bipartite graph with $z(v) = 20$ for all $v$, corresponding to the feasible integer flow on Fig. 8.2

Hence the number of integer feasible flows in an arbitrary directed graph without directed cycles can be encoded as $\mathrm{Fl}_+(R, C; W)$ for appropriate $R$, $C$ and $W$.
In fact, we can also incorporate upper bounds on the size of the flow on edges, that is, enumerate integer feasible flows $x(e)$ in $G$ with additional constraints $x(e) \le c(e)$, where $c(e)$ are given positive integers (frequently referred to as capacities of edges). For that, for every edge $v \to u$, we introduce two auxiliary vertices $w_+$ and $w_-$, replace the edge $v \to u$ by the three edges $v \to w_+$, $w_- \to w_+$ and $w_- \to u$, and let $a(w_+) = c(v \to u)$ and $a(w_-) = -c(v \to u)$. Then a flow $x(e)$ on the edge $v \to u$ in $G$ satisfying $x(e) \le c(e)$ corresponds to the flow $x(e)$ on the edge $v \to w_+$, flow $c(e) - x(e)$ on the edge $w_- \to w_+$ and flow $x(e)$ on the edge $w_- \to u$, see Fig. 8.4.

In particular, we can express the number $\mathrm{Fl}_0(R, C; W)$ of feasible 0-1 flows, see Sect. 8.1, as the number $\mathrm{Fl}_+(R', C'; W')$ of feasible integer flows. One particular case resulting in the Kostant partition function of type $A$ is of interest to representation theory, see [BV09]. There we are interested in the number of feasible integer flows in a graph with vertices numbered $1, \dots, n$ and edges $i \to j$ for $j > i$, see also Fig. 8.2 for an example.
Fig. 8.4 Enforcing the condition $x(e) \le c$

We will use a representation of $\mathrm{Fl}_+(R, C; W)$ as the permanent of a structured random matrix. Recall that a random variable $\omega$ is standard exponential if the density of $\omega$ is

$$\begin{cases} e^{-t} & \text{if} \ t > 0 \\ 0 & \text{if} \ t \le 0. \end{cases}$$

Recall that

$$\mathbf{E}\, \omega^k = \int_0^{+\infty} t^k e^{-t}\, dt = k!$$

for non-negative integer $k$.
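The moment identity is easy to confirm numerically, here by a plain trapezoid rule on a truncated interval (a sketch; the tail beyond the cutoff is negligible):

```python
import math

# check integral_0^inf t^k e^{-t} dt = k! for k = 0, ..., 4,
# using the trapezoid rule on [0, 60] with step h
for k in range(5):
    h = 0.001
    ts = [i * h for i in range(int(60 / h) + 1)]
    vals = [t ** k * math.exp(-t) for t in ts]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    assert abs(integral - math.factorial(k)) < 1e-3 * math.factorial(k)
```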


The following result was obtained in [Ba07].

8.2.3 Lemma. Let $\Omega = (\omega_{ij})$ be the $m \times n$ matrix of independent standard exponential random variables $\omega_{ij}$. Given positive integer vectors $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$ such that

$$r_1 + \ldots + r_m = c_1 + \ldots + c_n = N$$

and an $m \times n$ matrix $W = (w_{ij})$ of weights, let us construct a random $N \times N$ matrix $A(\Omega) = A_{R,C;W}(\Omega)$ as follows: the rows of $A(\Omega)$ are split into $m$ blocks, with the $i$-th block containing $r_i$ rows, the columns of $A(\Omega)$ are split into $n$ blocks, with the $j$-th block containing $c_j$ columns, and the entries in the $(i, j)$-th block are all equal to $w_{ij} \omega_{ij}$, see Fig. 8.5.

Then
$$\mathrm{Fl}_+(R, C; W) = \frac{\mathbf{E} \operatorname{per} A(\Omega)}{r_1! \cdots r_m!\ c_1! \cdots c_n!}.$$

Proof. Let us pick one entry from every row and every column of $A(\Omega)$ and let $d_{ij}$ be the number of entries picked from the $(i, j)$-th block. Clearly, $D = (d_{ij})$ is an $m \times n$ non-negative integer matrix with row sums $R$ and column sums $C$ and the expectation of the product of the picked entries is

$$\mathbf{E} \prod_{i,j} \left(w_{ij} \omega_{ij}\right)^{d_{ij}} = \prod_{i,j} d_{ij}!\ w_{ij}^{d_{ij}}. \tag{8.2.3.1}$$

Fig. 8.5 The structure of the matrix $A(\Omega)$

Let us now compute how many times we obtain a given non-negative integer matrix $D = (d_{ij})$ with row sums $R$ and column sums $C$. For that, for $i = 1, \dots, m$, we split the $i$-th block of $r_i$ rows into $n$ sub-blocks with $d_{i1}, \dots, d_{in}$ rows in

$$\prod_{i=1}^m \frac{r_i!}{d_{i1}! \cdots d_{in}!}$$

ways, for $j = 1, \dots, n$, we split the $j$-th block of $c_j$ columns into $m$ sub-blocks with $d_{1j}, \dots, d_{mj}$ columns in

$$\prod_{j=1}^n \frac{c_j!}{d_{1j}! \cdots d_{mj}!}$$

ways and then for each $i$ and $j$ such that $d_{ij} > 0$ we choose one entry in every row of the $j$-th sub-block of the $i$-th block of rows and every column of the $i$-th sub-block of the $j$-th block of columns, altogether in

$$\prod_{i,j} d_{ij}!$$

ways, see Fig. 8.6.

Fig. 8.6 Subdividing rows and columns further into sub-blocks

Hence we obtain (8.2.3.1) in

$$\left(\prod_{i=1}^m r_i!\right)\left(\prod_{j=1}^n c_j!\right)\left(\prod_{i,j} d_{ij}!\right)^{-1}$$

ways total and

$$\mathbf{E} \operatorname{per} A(\Omega) = \left(\prod_{i=1}^m r_i!\right)\left(\prod_{j=1}^n c_j!\right) \mathrm{Fl}_+(R, C; W),$$

which completes the proof. $\square$
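The lemma can be verified exactly on a small instance, since the expectation of the permanent can be computed symbolically from $\mathbf{E}\, \omega^k = k!$ (a sketch with arbitrarily chosen data):

```python
import itertools
from math import factorial, prod

R, C = [2, 1], [1, 2]
W = [[2.0, 3.0], [5.0, 7.0]]
m, n = len(R), len(C)
N = sum(R)

rb = [i for i in range(m) for _ in range(R[i])]   # row-block of each row of A
cb = [j for j in range(n) for _ in range(C[j])]   # column-block of each column

# E per A(Omega): picking d_ij entries from block (i, j) contributes
# prod d_ij! w_ij^{d_ij}, since the entries in a block are the same variable
E_per = 0.0
for perm in itertools.permutations(range(N)):
    d = [[0] * n for _ in range(m)]
    for row, col in enumerate(perm):
        d[rb[row]][cb[col]] += 1
    E_per += prod(factorial(d[i][j]) * W[i][j] ** d[i][j]
                  for i in range(m) for j in range(n))

def fl_plus(R, C, W):
    m, n = len(R), len(C)
    rows = [[v for v in itertools.product(range(s + 1), repeat=n) if sum(v) == s]
            for s in R]
    return sum(
        prod(W[i][j] ** D[i][j] for i in range(m) for j in range(n))
        for D in itertools.product(*rows)
        if all(sum(D[i][j] for i in range(m)) == C[j] for j in range(n))
    )

denom = prod(factorial(r) for r in R) * prod(factorial(c) for c in C)
assert abs(E_per / denom - fl_plus(R, C, W)) < 1e-9
```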

8.3 Approximate Log-Concavity

For a non-negative integer vector $R = (r_1, \dots, r_m)$, we denote

$$|R| = \sum_{i=1}^m r_i \quad\text{and}\quad \gamma(R) = \prod_{i=1}^m \frac{r_i!}{r_i^{r_i}}.$$

Our goal is to prove the following result from [Ba07].


8.3.1 Theorem. Let $W$ be an $m \times n$ non-negative real matrix, let $R_1, \dots, R_k$ be non-negative integer $m$-vectors and let $C_1, \dots, C_k$ be non-negative integer $n$-vectors such that $|R_i| = |C_i| = N$ for all $i$. Suppose further that $\alpha_1, \dots, \alpha_k \ge 0$ are reals such that $\alpha_1 + \ldots + \alpha_k = 1$ and such that

$$R = \sum_{i=1}^k \alpha_i R_i \quad\text{and}\quad C = \sum_{i=1}^k \alpha_i C_i$$

are positive integer vectors. Then

$$\frac{N^N}{N!}\ \gamma(R)\, \gamma(C)\ \mathrm{Fl}_+(R, C; W) \ \ge\ \prod_{i=1}^k \left(\mathrm{Fl}_+(R_i, C_i; W)\, \max\left\{\gamma(R_i), \gamma(C_i)\right\}\right)^{\alpha_i}.$$
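A numerical sanity check of the inequality on a small hypothetical instance with $k = 2$ and $\alpha_1 = \alpha_2 = 1/2$ (brute-force evaluation of the partition functions):

```python
import itertools
from math import factorial, prod

def fl_plus(R, C, W):
    m, n = len(R), len(C)
    rows = [[v for v in itertools.product(range(s + 1), repeat=n) if sum(v) == s]
            for s in R]
    return sum(
        prod(W[i][j] ** D[i][j] for i in range(m) for j in range(n))
        for D in itertools.product(*rows)
        if all(sum(D[i][j] for i in range(m)) == C[j] for j in range(n))
    )

def gamma(R):
    return prod(factorial(r) / r ** r for r in R)

W = [[1.0, 2.0], [3.0, 1.0]]
R1, C1 = [3, 1], [2, 2]
R2, C2 = [1, 3], [2, 2]
R, C, N = [2, 2], [2, 2], 4        # R = (R1 + R2)/2, C = (C1 + C2)/2

lhs = (N ** N / factorial(N)) * gamma(R) * gamma(C) * fl_plus(R, C, W)
rhs = prod((fl_plus(Ri, Ci, W) * max(gamma(Ri), gamma(Ci))) ** 0.5
           for Ri, Ci in [(R1, C1), (R2, C2)])
assert lhs >= rhs
```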

Theorem 8.3.1 implies an approximate log-concavity of the numbers $\mathrm{Fl}_+(R, C; W)$.

8.3.2 Corollary. Let $W$ be an $m \times n$ non-negative real matrix, let $R_1, \dots, R_k$ be non-negative integer $m$-vectors and let $C_1, \dots, C_k$ be non-negative integer $n$-vectors such that $|R_i| = |C_i| = N$ for all $i$. Suppose further that $\alpha_1, \dots, \alpha_k \ge 0$ are reals such that $\alpha_1 + \ldots + \alpha_k = 1$ and such that

$$R = \sum_{i=1}^k \alpha_i R_i \quad\text{and}\quad C = \sum_{i=1}^k \alpha_i C_i$$

are positive integer vectors. Assuming that $R = (r_1, \dots, r_m)$ and $C = (c_1, \dots, c_n)$, we have

$$\frac{N^N}{N!}\ \min\left\{\prod_{i=1}^m \frac{r_i!}{r_i^{r_i}},\ \prod_{j=1}^n \frac{c_j!}{c_j^{c_j}}\right\} \mathrm{Fl}_+(R, C; W) \ \ge\ \prod_{i=1}^k \left(\mathrm{Fl}_+(R_i, C_i; W)\right)^{\alpha_i}.$$

From Stirling's formula,

$$\frac{N^N}{N!} = \frac{e^N}{\sqrt{2\pi N}} \left( 1 + O(1/N) \right), \quad \frac{r_i!}{r_i^{r_i}} = e^{-r_i} \sqrt{2\pi r_i} \left( 1 + O(1/r_i) \right)$$
$$\text{and} \quad \frac{c_j!}{c_j^{c_j}} = e^{-c_j} \sqrt{2\pi c_j} \left( 1 + O(1/c_j) \right),$$

it follows that

$$\frac{N^N}{N!} \min\left\{ \prod_{i=1}^m \frac{r_i!}{r_i^{r_i}},\ \prod_{j=1}^n \frac{c_j!}{c_j^{c_j}} \right\} = \min\left\{ 2^{O(m)} \sqrt{r_1 \cdots r_m},\ 2^{O(n)} \sqrt{c_1 \cdots c_n} \right\}.$$
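For a concrete feel for Corollary 8.3.2, one can check it by brute force on a tiny instance. The following sketch is our own illustration (the instance with $k = 2$ and $\alpha_1 = \alpha_2 = 1/2$ is made up, not from the book); it enumerates the flow matrices directly.

```python
import itertools
from math import factorial, prod

def compositions(s, k):
    # all k-tuples of non-negative integers summing to s
    if k == 1:
        yield (s,)
        return
    for a in range(s + 1):
        for rest in compositions(s - a, k - 1):
            yield (a,) + rest

def flow_sum(R, C, W):
    # Fl_+(R, C; W) by brute force: sum of prod W[i][j]**d[i][j] over
    # non-negative integer matrices d with row sums R and column sums C
    m, n = len(R), len(C)
    return sum(prod(W[i][j] ** d[i][j] for i in range(m) for j in range(n))
               for d in itertools.product(*[list(compositions(r, n)) for r in R])
               if all(sum(d[i][j] for i in range(m)) == C[j] for j in range(n)))

def gamma(v):
    # gamma(R) = prod r_i! / r_i^{r_i}
    return prod(factorial(x) / x ** x for x in v)

# k = 2, alpha = (1/2, 1/2): R = (R1 + R2)/2, C = (C1 + C2)/2
R1, C1 = (3, 1), (2, 2)
R2, C2 = (1, 3), (2, 2)
R, C = (2, 2), (2, 2)
W = [[1.0, 1.0], [1.0, 1.0]]
N = sum(R)
lhs = (N ** N / factorial(N)) * min(gamma(R), gamma(C)) * flow_sum(R, C, W)
rhs = (flow_sum(R1, C1, W) * flow_sum(R2, C2, W)) ** 0.5
assert lhs >= rhs
```

On this instance the left-hand side evaluates to 8 and the right-hand side to 2, so the inequality holds with room to spare.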

There seems to be neither a counterexample to nor a proof of the hypothetical stronger inequality, which claims genuine log-concavity of $\mathrm{Fl}_+(R, C; W)$:

$$\mathrm{Fl}_+(R, C; W) \ \stackrel{?}{\geq}\ \prod_{i=1}^k \left( \mathrm{Fl}_+(R_i, C_i; W) \right)^{\alpha_i}.$$

The proof of Theorem 8.3.1 uses the permanental representation of $\mathrm{Fl}_+(R, C; W)$ of Lemma 8.2.3, matrix scaling (see Sect. 3.5), the van der Waerden (Sect. 3.3) and Bregman–Minc (Sect. 3.4) inequalities.
For an $m \times n$ positive real matrix $B = \left( b_{ij} \right)$, we define a function $g_B: \mathbb{R}^m \oplus \mathbb{R}^n \longrightarrow \mathbb{R}$ by

$$g_B(x, y) = \sum_{\substack{1 \leq i \leq m \\ 1 \leq j \leq n}} b_{ij} e^{\xi_i + \eta_j} \quad \text{for} \quad x = (\xi_1, \ldots, \xi_m) \ \text{and} \ y = (\eta_1, \ldots, \eta_n).$$

Let $\langle \cdot, \cdot \rangle$ denote the standard inner product in Euclidean space. For $R \in \mathbb{R}^m$ and $C \in \mathbb{R}^n$, we define a subspace $L_{R,C} \subset \mathbb{R}^m \oplus \mathbb{R}^n$ by

$$L_{R,C} = \left\{ (x, y) \in \mathbb{R}^m \oplus \mathbb{R}^n :\ \langle R, x \rangle = \langle C, y \rangle = 0 \right\}.$$

We further define a function $f_{R,C}$ of $B$ by

$$f_{R,C}(B) = \inf_{(x, y) \in L_{R,C}} g_B(x, y) \geq 0,$$

see Theorem 3.5.8.


8.3.3 Lemma. Let $R = (r_1, \ldots, r_m)$ be a positive integer $m$-vector and let $C = (c_1, \ldots, c_n)$ be a positive integer $n$-vector such that $|R| = |C| = N$. Let $W = \left( w_{ij} \right)$ be an $m \times n$ positive matrix. For an $m \times n$ matrix $\Omega = \left( \omega_{ij} \right)$ of independent standard exponential random variables, let us define the $m \times n$ matrix $B = B(\Omega) = \left( b_{ij} \right)$ by $b_{ij} = w_{ij} \omega_{ij}$ for all $i, j$. Then

$$\frac{N!}{N^N} \left( \prod_{i=1}^m \frac{r_i^{r_i}}{r_i!} \right) \left( \prod_{j=1}^n \frac{c_j^{c_j}}{c_j!} \right) \frac{1}{N^N}\, \mathbf{E}\, f_{R,C}^N(B) \ \leq\ \mathrm{Fl}_+(R, C; W)$$
$$\leq\ \min\left\{ \prod_{i=1}^m \frac{r_i^{r_i}}{r_i!},\ \prod_{j=1}^n \frac{c_j^{c_j}}{c_j!} \right\} \frac{1}{N^N}\, \mathbf{E}\, f_{R,C}^N(B).$$

Proof. With probability 1, the matrix $B$ is positive. Using Theorem 3.5.8, we scale $B$ to a matrix with row sums $r_1, \ldots, r_m$ and column sums $c_1, \ldots, c_n$. That is, we compute a positive $m \times n$ matrix $L = L(\Omega)$, $L = \left( l_{ij} \right)$, with row sums $R$ and column sums $C$ and positive $\lambda_i = \lambda_i(\Omega)$ for $i = 1, \ldots, m$ and $\mu_j = \mu_j(\Omega)$ for $j = 1, \ldots, n$, such that

$$b_{ij} = l_{ij} \lambda_i \mu_j \quad \text{for all} \quad i, j.$$

By Theorem 3.5.8, we can choose

$$\lambda_i = \sqrt{\frac{f_{R,C}(B)}{N}}\, e^{-\xi_i} \quad \text{and} \quad \mu_j = \sqrt{\frac{f_{R,C}(B)}{N}}\, e^{-\eta_j},$$

where $x^* = (\xi_1, \ldots, \xi_m)$ and $y^* = (\eta_1, \ldots, \eta_n)$ is the minimum point of $g_B$ on $L_{R,C}$. It follows that

$$\left( \prod_{i=1}^m \lambda_i^{r_i} \right) \left( \prod_{j=1}^n \mu_j^{c_j} \right) = \left( \frac{f_{R,C}(B)}{N} \right)^N \exp\left\{ -\sum_{i=1}^m r_i \xi_i \right\} \exp\left\{ -\sum_{j=1}^n c_j \eta_j \right\} = \frac{f_{R,C}^N(B)}{N^N}. \tag{8.3.3.1}$$

Let $A(\Omega)$ be the $N \times N$ matrix constructed in Lemma 8.2.3. Let us divide the entries in the $(i, j)$-th block of $A(\Omega)$ by $\lambda_i \mu_j r_i c_j$ and let $D = D(\Omega)$ be the $N \times N$ matrix we obtain. Then

$$\operatorname{per} A = \left( \prod_{i=1}^m (r_i \lambda_i)^{r_i} \right) \left( \prod_{j=1}^n (c_j \mu_j)^{c_j} \right) \operatorname{per} D = \left( \prod_{i=1}^m r_i^{r_i} \right) \left( \prod_{j=1}^n c_j^{c_j} \right) \frac{f_{R,C}^N(B)}{N^N} \operatorname{per} D \tag{8.3.3.2}$$

by (8.3.3.1). It is not hard to see that the matrix $D$ is doubly stochastic, and hence by the van der Waerden bound, see Theorem 3.3.2, we have

$$\operatorname{per} D \geq \frac{N!}{N^N}.$$

On the other hand, the entries of the $(i, j)$-th block of $D$ can be written as $l_{ij}/r_i c_j$ and hence do not exceed $\min\{1/r_i, 1/c_j\}$. Therefore, by the Bregman–Minc bound, see Corollary 3.4.5, we have

$$\operatorname{per} D \leq \min\left\{ \prod_{i=1}^m \frac{r_i!}{r_i^{r_i}},\ \prod_{j=1}^n \frac{c_j!}{c_j^{c_j}} \right\}.$$

Hence from (8.3.3.2),

$$\min\left\{ \prod_{i=1}^m r_i! \prod_{j=1}^n c_j^{c_j},\ \prod_{j=1}^n c_j! \prod_{i=1}^m r_i^{r_i} \right\} \frac{f_{R,C}^N(B)}{N^N} \ \geq\ \operatorname{per} A \ \geq\ \left( \prod_{i=1}^m r_i^{r_i} \right) \left( \prod_{j=1}^n c_j^{c_j} \right) \frac{N!}{N^N} \frac{f_{R,C}^N(B)}{N^N}.$$

The proof now follows from Lemma 8.2.3. $\square$

Next, we establish some convexity properties of $f_{R,C}(B)$. We define $u_d \in \mathbb{R}^d$ by $u_d = (1, \ldots, 1)$ (the $d$-vector of all 1s) and note that

$$g_B\left( x + \alpha u_m,\ y + \beta u_n \right) = e^{\alpha + \beta} g_B(x, y) \quad \text{for all} \quad \alpha, \beta \in \mathbb{R}.$$

8.3.4 Lemma. Let $R_1, \ldots, R_k$ be $m$-vectors and let $C_1, \ldots, C_k$ be $n$-vectors such that

$$\langle R_i, u_m \rangle = \langle C_i, u_n \rangle = 1 \quad \text{for} \quad i = 1, \ldots, k.$$

Let $B_1, \ldots, B_k$ be positive real $m \times n$ matrices and let $\alpha_1, \ldots, \alpha_k \geq 0$ be reals such that $\alpha_1 + \ldots + \alpha_k = 1$. Let

$$R = \sum_{i=1}^k \alpha_i R_i, \quad C = \sum_{i=1}^k \alpha_i C_i \quad \text{and} \quad B = \sum_{i=1}^k \alpha_i B_i.$$

Then

$$f_{R,C}(B) \geq \prod_{i=1}^k \left( f_{R_i, C_i}(B_i) \right)^{\alpha_i}.$$

Proof. Let us choose a point $(x, y) \in L_{R,C}$, so that

$$\langle R, x \rangle = \langle C, y \rangle = 0. \tag{8.3.4.1}$$

We define

$$x_i = x - \langle R_i, x \rangle u_m \quad \text{and} \quad y_i = y - \langle C_i, y \rangle u_n \quad \text{for} \quad i = 1, \ldots, k.$$

Hence

$$\langle R_i, x_i \rangle = \langle R_i, x \rangle - \langle R_i, x \rangle \langle R_i, u_m \rangle = 0 \quad \text{and} \quad \langle C_i, y_i \rangle = \langle C_i, y \rangle - \langle C_i, y \rangle \langle C_i, u_n \rangle = 0,$$

so that

$$(x_i, y_i) \in L_{R_i, C_i}. \tag{8.3.4.2}$$

Then

$$g_B(x, y) = \sum_{i=1}^k \alpha_i g_{B_i}(x, y) \geq \prod_{i=1}^k \left( g_{B_i}(x, y) \right)^{\alpha_i} \tag{8.3.4.3}$$

and

$$g_{B_i}(x, y) = g_{B_i}\left( x_i + \langle R_i, x \rangle u_m,\ y_i + \langle C_i, y \rangle u_n \right) = e^{\langle R_i, x \rangle} e^{\langle C_i, y \rangle} g_{B_i}(x_i, y_i). \tag{8.3.4.4}$$

Since by (8.3.4.1)

$$\prod_{i=1}^k e^{\alpha_i \langle R_i, x \rangle} = \exp\left\{ \left\langle \sum_{i=1}^k \alpha_i R_i,\ x \right\rangle \right\} = \exp\left\{ \langle R, x \rangle \right\} = 1 \quad \text{and}$$
$$\prod_{i=1}^k e^{\alpha_i \langle C_i, y \rangle} = \exp\left\{ \left\langle \sum_{i=1}^k \alpha_i C_i,\ y \right\rangle \right\} = \exp\left\{ \langle C, y \rangle \right\} = 1,$$

combining (8.3.4.2)–(8.3.4.4) we obtain

$$g_B(x, y) \geq \prod_{i=1}^k \left( g_{B_i}(x_i, y_i) \right)^{\alpha_i} \geq \prod_{i=1}^k \left( f_{R_i, C_i}(B_i) \right)^{\alpha_i}.$$

Since the point $(x, y) \in L_{R,C}$ was chosen arbitrarily, the proof follows. $\square$

8.3.5 Proof of Theorem 8.3.1. By continuity, it suffices to prove Theorem 8.3.1 assuming, additionally, that the matrix $W = \left( w_{ij} \right)$ is positive.
Given an $m \times n$ matrix $\Omega$ of independent standard exponential random variables, let us construct a random matrix $B = B(\Omega)$ as in Lemma 8.3.3. By Lemma 8.3.3, we have

$$\frac{N^N}{N!}\, \gamma(R) \gamma(C)\, \mathrm{Fl}_+(R, C; W) \geq \frac{1}{N^N}\, \mathbf{E}\, f_{R,C}^N(B) \quad \text{and}$$
$$\max\left\{ \gamma(R_i), \gamma(C_i) \right\} \mathrm{Fl}_+(R_i, C_i; W) \leq \frac{1}{N^N}\, \mathbf{E}\, f_{R_i, C_i}^N(B) \quad \text{for} \quad i = 1, \ldots, k.$$

If $\Omega_1, \ldots, \Omega_k$ are different realizations of $\Omega$ and

$$\Omega_0 = \sum_{i=1}^k \alpha_i \Omega_i,$$

then for the corresponding matrices $B_i = B(\Omega_i)$ for $i = 0, 1, \ldots, k$, we have

$$B_0 = \sum_{i=1}^k \alpha_i B_i$$

and by Lemma 8.3.4,

$$f_{R,C}^N(B_0) \geq \prod_{i=1}^k \left( f_{R_i, C_i}^N(B_i) \right)^{\alpha_i}.$$

Note that we can apply Lemma 8.3.4 since $|R_i| = |C_i| = N$ for all $i$, so that

$$f_{R_i, C_i} = f_{R_i/N,\, C_i/N} \quad \text{and} \quad f_{R,C} = f_{R/N,\, C/N}$$

and the sums of the coordinates of the vectors $R_i/N$, $C_i/N$, $R/N$ and $C/N$ are equal to 1.
Since the density of the random matrix $\Omega = \left( \omega_{ij} \right)$ is

$$\begin{cases} \prod_{ij} e^{-t_{ij}} & \text{if } t_{ij} > 0 \text{ for all } i, j \\ 0 & \text{otherwise}, \end{cases}$$

applying the Prékopa–Leindler inequality of Sect. 2.1.6, we conclude that

$$\mathbf{E}\, f_{R,C}^N(B) \geq \prod_{i=1}^k \left( \mathbf{E}\, f_{R_i, C_i}^N(B) \right)^{\alpha_i},$$

which completes the proof. $\square$


8.3.6 Proof of Corollary 8.3.2. Using that the function

$$x \longmapsto \ln \frac{\Gamma(x+1)}{x^x} \quad \text{for} \quad x \geq 1$$

is concave, we conclude that

$$\gamma(R) \geq \prod_{i=1}^k \gamma^{\alpha_i}(R_i) \quad \text{and} \quad \gamma(C) \geq \prod_{i=1}^k \gamma^{\alpha_i}(C_i)$$

and the proof follows from Theorem 8.3.1. $\square$

8.4 Bounds for the Partition Function

Corollary 8.3.2 allows us to estimate the partition function $\mathrm{Fl}_+(R, C; W)$ in terms of the capacity of a certain polynomial.

8.4.1 Complete symmetric polynomial. The complete symmetric polynomial $h_N(z_1, \ldots, z_d)$ of degree $N$ in $d$ variables $z_1, \ldots, z_d$ is the sum of all $\binom{N+d-1}{d-1}$ monomials in $z_1, \ldots, z_d$ of total degree $N$. It can be defined recursively as $h_N(z_1) = z_1^N$ and

$$h_N(z_1, \ldots, z_d) = \sum_{m=0}^N z_d^m\, h_{N-m}(z_1, \ldots, z_{d-1}),$$

which also provides a fast way to compute $h_N$ at any given $z_1, \ldots, z_d$.
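In code, the recursion becomes a small dynamic program that processes one variable at a time (an illustrative sketch; the function name is ours):

```python
def h(N, z):
    """Evaluate the complete symmetric polynomial h_N at the point z,
    using h_n(z_1,...,z_t) = sum_{m=0}^{n} z_t^m h_{n-m}(z_1,...,z_{t-1})."""
    vals = [1.0] + [0.0] * N        # with no variables, h_0 = 1 and h_n = 0
    for zt in z:                    # absorb one variable at a time
        vals = [sum(zt ** m * vals[n - m] for m in range(n + 1))
                for n in range(N + 1)]
    return vals[N]

# h_2(x, y) = x^2 + xy + y^2, so h_2(2, 3) = 4 + 6 + 9 = 19
assert h(2, (2.0, 3.0)) == 19.0
```

Evaluating $h_N$ this way takes $O(dN^2)$ arithmetic operations, rather than summing all $\binom{N+d-1}{d-1}$ monomials explicitly.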


8.4.2 Theorem. Let $R = (r_1, \ldots, r_m)$ be a positive integer $m$-vector and let $C = (c_1, \ldots, c_n)$ be a positive integer $n$-vector such that

$$r_1 + \ldots + r_m = c_1 + \ldots + c_n = N.$$

Let $W = \left( w_{ij} \right)$ be a non-negative real $m \times n$ matrix of weights. Let us define a polynomial $p = p_{R,C;W}$ in $m + n$ real variables $x_1, \ldots, x_m; y_1, \ldots, y_n$ by

$$p(x_1, \ldots, x_m; y_1, \ldots, y_n) = h_N\left( z_{ij} \right) \quad \text{for} \quad z_{ij} = w_{ij} x_i y_j,$$

where $h_N\left( z_{ij} \right)$ is the complete symmetric polynomial of degree $N$ in the $mn$ variables $z_{ij}$ where $1 \leq i \leq m$ and $1 \leq j \leq n$. Let

$$\alpha_+(R, C; W) = \inf_{\substack{x_1, \ldots, x_m > 0 \\ y_1, \ldots, y_n > 0}} \frac{p(x_1, \ldots, x_m; y_1, \ldots, y_n)}{x_1^{r_1} \cdots x_m^{r_m} y_1^{c_1} \cdots y_n^{c_n}}.$$

Then

$$\binom{N+m-1}{m-1}^{-1} \binom{N+n-1}{n-1}^{-1} \frac{N!}{N^N} \max\left\{ \prod_{i=1}^m \frac{r_i^{r_i}}{r_i!},\ \prod_{j=1}^n \frac{c_j^{c_j}}{c_j!} \right\} \alpha_+(R, C; W)$$
$$\leq \mathrm{Fl}_+(R, C; W) \leq \alpha_+(R, C; W).$$

Proof. Expanding $p$ into the sum of monomials, we get

$$p(x_1, \ldots, x_m; y_1, \ldots, y_n) = \sum_{\substack{A = (a_1, \ldots, a_m) \\ B = (b_1, \ldots, b_n)}} \mathrm{Fl}_+(A, B; W)\, x_1^{a_1} \cdots x_m^{a_m} y_1^{b_1} \cdots y_n^{b_n},$$

where the sum is taken over all pairs of non-negative integer $m$-vectors $(a_1, \ldots, a_m)$ and $n$-vectors $(b_1, \ldots, b_n)$ such that $a_1 + \ldots + a_m = b_1 + \ldots + b_n = N$. Hence the upper bound is immediate. As the total number of monomials is $\binom{N+m-1}{m-1} \binom{N+n-1}{n-1}$, the lower bound follows from Corollary 8.3.2 and (2.1.5.3). $\square$
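The upper bound is easy to test numerically: since $p$ has non-negative coefficients and $\mathrm{Fl}_+(R, C; W)$ is the coefficient of $x^R y^C$, the ratio in the definition of $\alpha_+$ dominates $\mathrm{Fl}_+(R, C; W)$ at every positive point. A self-contained sanity check on a made-up instance (our own illustration, not code from the book):

```python
import itertools, random
from math import prod

def compositions(s, k):
    # all k-tuples of non-negative integers summing to s
    if k == 1:
        yield (s,)
        return
    for a in range(s + 1):
        for rest in compositions(s - a, k - 1):
            yield (a,) + rest

def flow_sum(R, C, W):
    # Fl_+(R, C; W) by brute force over matrices with row sums R, column sums C
    m, n = len(R), len(C)
    return sum(prod(W[i][j] ** d[i][j] for i in range(m) for j in range(n))
               for d in itertools.product(*[list(compositions(r, n)) for r in R])
               if all(sum(d[i][j] for i in range(m)) == C[j] for j in range(n)))

def h(N, z):
    # complete symmetric polynomial of degree N, evaluated at the point z
    vals = [1.0] + [0.0] * N
    for zt in z:
        vals = [sum(zt ** m * vals[n - m] for m in range(n + 1))
                for n in range(N + 1)]
    return vals[N]

R, C = (2, 1), (2, 1)
W = [[1.0, 2.0], [0.5, 1.0]]
N = sum(R)
fl = flow_sum(R, C, W)
random.seed(0)
for _ in range(100):
    x = [random.uniform(0.2, 3.0) for _ in R]
    y = [random.uniform(0.2, 3.0) for _ in C]
    p = h(N, [W[i][j] * x[i] * y[j] for i in range(2) for j in range(2)])
    mono = prod(x[i] ** R[i] for i in range(2)) * prod(y[j] ** C[j] for j in range(2))
    assert p / mono >= fl - 1e-9    # hence Fl_+ <= alpha_+, the infimum of the ratio
```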

Theorem 8.4.2 shows that $\alpha_+(R, C; W)$ approximates $\mathrm{Fl}_+(R, C; W)$ within a factor of $N^{O(m+n)}$ (the implicit constant in the "O" notation is absolute). In many interesting asymptotic regimes, $\alpha_+(R, C; W)$ captures the logarithmic asymptotics of $\mathrm{Fl}_+(R, C; W)$. A similar, though less explicit, bound

$$\frac{\beta_+(R, C; W)}{N^{O(m+n)}} \leq \mathrm{Fl}_+(R, C; W) \leq \beta_+(R, C; W),$$

where

$$\beta_+(R, C; W) = \inf_{\substack{x_1, \ldots, x_m > 0,\ y_1, \ldots, y_n > 0 \\ w_{ij} x_i y_j < 1 \text{ for all } i, j}} \left( \prod_{i=1}^m x_i^{-r_i} \right) \left( \prod_{j=1}^n y_j^{-c_j} \right) \prod_{i,j} \frac{1}{1 - w_{ij} x_i y_j},$$

was obtained in [Ba09]. Based on it, it was shown in [B10a] that a "typical" random non-negative integer matrix $D = \left( d_{ij} \right)$ with row sums $R$ and column sums $C$ concentrates about a particular "maximum entropy" matrix $\Theta = \left( \theta_{ij} \right)$ that maximizes the strictly concave function

$$\sum_{ij} \left( \left( x_{ij} + 1 \right) \ln \left( x_{ij} + 1 \right) - x_{ij} \ln x_{ij} \right)$$

on the polytope of non-negative real matrices $X = \left( x_{ij} \right)$ with row sums $R$ and column sums $C$. We discuss the connection in Sect. 8.5.2.
If $W$ is the $m \times n$ matrix filled with 1s, then $\mathrm{Fl}_+(R, C; W)$ is just the number of non-negative integer matrices with row sums $R$ and column sums $C$. There is an extensive literature on asymptotic and approximate formulas for the number of such matrices, see [DG95, GM08, CM10, BH12, IM16] and references therein.

8.5 Concluding Remarks: Partition Functions for Integer Points in Polyhedra

The partition functions $\mathrm{Fl}_0(R, C; W)$ and $\mathrm{Fl}_+(R, C; W)$ can be considered as special cases of more general partition functions for 0-1, respectively integer, points in polyhedra.

8.5.1 Partition function of 0-1 points. Let $A = \left( a_{ij} \right)$ be an integer $r \times n$ matrix of rank $r$, let $b = (b_1, \ldots, b_r)$ be an integer $r$-vector, and let $w = (w_1, \ldots, w_n)$ be a positive real vector of weights. We consider the set $X_0(A, b)$ of the 0-1 vectors $x$ that lie in the affine subspace defined by the system $Ax = b$:

$$X_0(A, b) = \left\{ x = (x_1, \ldots, x_n) :\ \sum_{j=1}^n a_{ij} x_j = b_i \ \text{for} \ i = 1, \ldots, r \quad \text{and} \quad x_j \in \{0, 1\} \ \text{for} \ j = 1, \ldots, n \right\}.$$

We define a weighted sum (partition function) over $X_0(A, b)$:

$$S_0(A, b; w) = \sum_{\substack{x \in X_0(A, b) \\ x = (x_1, \ldots, x_n)}} w_1^{x_1} \cdots w_n^{x_n}.$$

It is not very hard to come up with an upper bound similar to the bound $\alpha_0$ of Sect. 8.1:

$$S_0(A, b; w) \leq \inf_{t_1, \ldots, t_r > 0} t_1^{-b_1} \cdots t_r^{-b_r} \prod_{j=1}^n \left( 1 + w_j t_1^{a_{1j}} \cdots t_r^{a_{rj}} \right). \tag{8.5.1.1}$$
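Indeed, (8.5.1.1) holds at every positive $t$, since expanding the product over $j$ reproduces each term of $S_0$ (and more). A brute-force sanity check on a made-up instance (our own illustration, not from the book):

```python
import itertools, random
from math import prod

def S0(A, b, w):
    # brute-force partition function over the 0-1 solutions of A x = b
    r, n = len(A), len(w)
    return sum(prod(w[j] ** x[j] for j in range(n))
               for x in itertools.product((0, 1), repeat=n)
               if all(sum(A[i][j] * x[j] for j in range(n)) == b[i]
                      for i in range(r)))

def rhs(A, b, w, t):
    # the right-hand side of (8.5.1.1) evaluated at a positive point t
    r, n = len(A), len(w)
    return prod(t[i] ** (-b[i]) for i in range(r)) * \
           prod(1 + w[j] * prod(t[i] ** A[i][j] for i in range(r))
                for j in range(n))

A = [[1, 1, 1, 0], [0, 1, 1, 1]]      # hypothetical instance
b = [2, 2]
w = [1.0, 0.5, 2.0, 1.0]
s = S0(A, b, w)                       # = 3.5 here
random.seed(1)
assert all(rhs(A, b, w, [random.uniform(0.3, 3.0) for _ in b]) >= s - 1e-9
           for _ in range(100))
```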

In general, there is no non-trivial lower bound for $S_0(A, b; w)$ since there is no guarantee that the set $X_0(A, b)$ of 0-1 vectors satisfying a given system of linear equations is non-empty. There is, however, a dual reformulation of (8.5.1.1) which leads to sharper upper bounds and, sometimes, to good approximations of $S_0(A, b; w)$.
Let $P_0(A, b)$ be the polyhedron that is the intersection of the cube $[0, 1]^n$ with the affine subspace defined by the system $Ax = b$,

$$P_0(A, b) = \left\{ x = (x_1, \ldots, x_n) :\ \sum_{j=1}^n a_{ij} x_j = b_i \ \text{for} \ i = 1, \ldots, r \quad \text{and} \quad 0 \leq x_j \leq 1 \ \text{for} \ j = 1, \ldots, n \right\}.$$

Suppose that $P_0(A, b)$ has a non-empty relative interior, that is, contains a point $x = (x_1, \ldots, x_n)$ such that $0 < x_j < 1$ for $j = 1, \ldots, n$. Let us consider a strictly concave function

$$H_w(x_1, \ldots, x_n) = \sum_{j=1}^n \left( x_j \ln \frac{1}{x_j} + (1 - x_j) \ln \frac{1}{1 - x_j} + x_j \ln w_j \right)$$
$$\text{where} \quad 0 \leq x_j \leq 1 \quad \text{for} \quad j = 1, \ldots, n.$$

It is not hard to show that $H_w$ attains its maximum on $P_0(A, b)$ at a unique point $(\xi_1, \ldots, \xi_n)$ in the relative interior of $P_0(A, b)$, cf. the proof of Theorem 3.5.2 and see [BH10] for detail, and we claim that

$$S_0(A, b; w) \leq \exp\left\{ H_w(\xi_1, \ldots, \xi_n) \right\} \tag{8.5.1.2}$$

and, moreover, that the bounds of (8.5.1.1) and (8.5.1.2) are identical.
Indeed, the Lagrange multiplier optimality condition implies that for some real $\lambda_1, \ldots, \lambda_r$, we have

$$\ln \frac{1 - \xi_j}{\xi_j} + \ln w_j = -\sum_{i=1}^r \lambda_i a_{ij} \quad \text{for} \quad j = 1, \ldots, n, \tag{8.5.1.3}$$

that is,

$$\xi_j = \frac{w_j}{w_j + \exp\left\{ -\sum_{i=1}^r \lambda_i a_{ij} \right\}} \quad \text{for} \quad j = 1, \ldots, n. \tag{8.5.1.4}$$

Since $(\xi_1, \ldots, \xi_n) \in P_0(A, b)$, we also have

$$\sum_{j=1}^n \frac{a_{ij} w_j}{w_j + \exp\left\{ -\sum_{i=1}^r \lambda_i a_{ij} \right\}} = b_i \quad \text{for} \quad i = 1, \ldots, r. \tag{8.5.1.5}$$

Equations (8.5.1.5) imply that $(\lambda_1, \ldots, \lambda_r)$ is the (necessarily unique) critical point of the strictly convex function

$$(s_1, \ldots, s_r) \longmapsto \sum_{j=1}^n \ln\left( 1 + w_j \exp\left\{ \sum_{i=1}^r a_{ij} s_i \right\} \right) - \sum_{i=1}^r b_i s_i,$$

and, consequently, $t_i^* = e^{\lambda_i}$ for $i = 1, \ldots, r$ is the point where the infimum in (8.5.1.1) is attained. Hence the bound in the right hand side of (8.5.1.1) is

$$\exp\left\{ -\sum_{i=1}^r \lambda_i b_i \right\} \prod_{j=1}^n \left( 1 + w_j \exp\left\{ \sum_{i=1}^r a_{ij} \lambda_i \right\} \right), \tag{8.5.1.6}$$

while from (8.5.1.3) and (8.5.1.4), we get

$$\sum_{j=1}^n \left( \xi_j \ln \frac{1}{\xi_j} + \left( 1 - \xi_j \right) \ln \frac{1}{1 - \xi_j} + \xi_j \ln w_j \right)$$
$$= \sum_{j=1}^n \xi_j \ln \frac{\left( 1 - \xi_j \right) w_j}{\xi_j} - \sum_{j=1}^n \ln\left( 1 - \xi_j \right)$$
$$= -\sum_{j=1}^n \xi_j \sum_{i=1}^r \lambda_i a_{ij} + \sum_{j=1}^n \ln\left( 1 + w_j \exp\left\{ \sum_{i=1}^r \lambda_i a_{ij} \right\} \right)$$
$$= -\sum_{i=1}^r \lambda_i b_i + \sum_{j=1}^n \ln\left( 1 + w_j \exp\left\{ \sum_{i=1}^r \lambda_i a_{ij} \right\} \right)$$

and hence the bounds (8.5.1.1) and (8.5.1.2) indeed coincide.
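The coincidence of the two bounds can be watched numerically: gradient descent on the dual function above drives (8.5.1.5) to zero, $\xi$ comes out of (8.5.1.4), and $\exp\{H_w(\xi)\}$ agrees with (8.5.1.6). A self-contained sketch on a made-up instance (our own illustration, not code from the book):

```python
import itertools, math

A = [[1, 1, 1, 0], [0, 1, 1, 1]]      # hypothetical instance
b = [2.0, 2.0]
w = [1.0, 0.5, 2.0, 1.0]
r, n = len(A), len(w)

def xi_of(s):
    # (8.5.1.4): xi_j = w_j / (w_j + exp(-sum_i lambda_i a_ij)), with s for lambda
    z = [w[j] * math.exp(sum(A[i][j] * s[i] for i in range(r))) for j in range(n)]
    return [v / (1 + v) for v in z]

# gradient descent on s -> sum_j ln(1 + w_j e^{<a_j, s>}) - <b, s>
s = [0.0] * r
for _ in range(20000):
    xi = xi_of(s)
    s = [s[i] - 0.01 * (sum(A[i][j] * xi[j] for j in range(n)) - b[i])
         for i in range(r)]
xi = xi_of(s)

# at the optimum, xi satisfies (8.5.1.5), i.e. lies in P_0(A, b)
assert all(abs(sum(A[i][j] * xi[j] for j in range(n)) - b[i]) < 1e-6
           for i in range(r))

# exp{H_w(xi)} equals the bound (8.5.1.1) at t_i = e^{s_i} ...
Hw = sum(xi[j] * math.log(1 / xi[j]) + (1 - xi[j]) * math.log(1 / (1 - xi[j]))
         + xi[j] * math.log(w[j]) for j in range(n))
rhs = math.exp(-sum(b[i] * s[i] for i in range(r))) * \
      math.prod(1 + w[j] * math.exp(sum(A[i][j] * s[i] for i in range(r)))
                for j in range(n))
assert abs(math.exp(Hw) - rhs) < 1e-5

# ... and both dominate S_0(A, b; w), computed by brute force, as in (8.5.1.2)
S0 = sum(math.prod(w[j] ** x[j] for j in range(n))
         for x in itertools.product((0, 1), repeat=n)
         if all(sum(A[i][j] * x[j] for j in range(n)) == b[i] for i in range(r)))
assert math.exp(Hw) >= S0 - 1e-9
```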


The advantage of (8.5.1.2) is that it admits a useful probabilistic interpretation. Let $X = (X_1, \ldots, X_n)$ be an $n$-vector of independent Bernoulli random variables such that

$$\Pr\left( X_j = 1 \right) = \xi_j \quad \text{and} \quad \Pr\left( X_j = 0 \right) = 1 - \xi_j \quad \text{for} \quad j = 1, \ldots, n.$$

Then from (8.5.1.3) and (8.5.1.4), we conclude that for any vector $x \in X_0(A, b)$, $x = (x_1, \ldots, x_n)$, we have

$$\Pr(X = x) = \prod_{j=1}^n \xi_j^{x_j} \left( 1 - \xi_j \right)^{1 - x_j} = \prod_{j=1}^n \left( \frac{\xi_j}{1 - \xi_j} \right)^{x_j} \prod_{j=1}^n \left( 1 - \xi_j \right)$$
$$= \exp\left\{ \sum_{j=1}^n \sum_{i=1}^r \lambda_i x_j a_{ij} + \sum_{j=1}^n x_j \ln w_j \right\} \prod_{j=1}^n \frac{1}{1 + w_j \exp\left\{ \sum_{i=1}^r \lambda_i a_{ij} \right\}}$$
$$= \left( \prod_{j=1}^n w_j^{x_j} \right) \exp\left\{ \sum_{i=1}^r \lambda_i b_i \right\} \prod_{j=1}^n \frac{1}{1 + w_j \exp\left\{ \sum_{i=1}^r \lambda_i a_{ij} \right\}}.$$

We note that the probability that the random vector $X$ hits a particular point $x \in X_0(A, b)$ is proportional to the contribution $w_1^{x_1} \cdots w_n^{x_n}$ of that point $x$ to the partition function $S_0(A, b; w)$. We obtain the following identity for the partition function:

$$S_0(A, b; w) = \Pr\left( X \in X_0(A, b) \right) \times \exp\left\{ -\sum_{i=1}^r \lambda_i b_i \right\} \prod_{j=1}^n \left( 1 + w_j \exp\left\{ \sum_{i=1}^r \lambda_i a_{ij} \right\} \right) \tag{8.5.1.7}$$
$$= \Pr\left( X \in X_0(A, b) \right) \exp\left\{ H_w(\xi_1, \ldots, \xi_n) \right\}.$$

Comparing (8.5.1.6) and (8.5.1.7), we conclude that the upper bound (8.5.1.1)–(8.5.1.2) is just a consequence of the trivial bound

$$\Pr\left( X \in X_0(A, b) \right) \leq 1. \tag{8.5.1.8}$$
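The identity (8.5.1.7) can be confirmed exactly on a small made-up instance: solve the critical-point equations (8.5.1.5) numerically for $\lambda$, read off $\xi$ from (8.5.1.4), and enumerate all $2^n$ points (an illustrative sketch, not code from the book):

```python
import itertools, math

A = [[1, 1, 0], [0, 1, 1]]            # hypothetical instance
b = [1.0, 1.0]
w = [1.0, 2.0, 0.5]
r, n = len(A), len(w)

# maximum-entropy Bernoulli parameters xi_j from the critical point (8.5.1.5)
s = [0.0] * r
for _ in range(20000):
    z = [w[j] * math.exp(sum(A[i][j] * s[i] for i in range(r))) for j in range(n)]
    xi = [v / (1 + v) for v in z]     # (8.5.1.4)
    s = [s[i] - 0.02 * (sum(A[i][j] * xi[j] for j in range(n)) - b[i])
         for i in range(r)]

sols = [x for x in itertools.product((0, 1), repeat=n)
        if all(sum(A[i][j] * x[j] for j in range(n)) == b[i] for i in range(r))]
S0 = sum(math.prod(w[j] ** x[j] for j in range(n)) for x in sols)
hit = sum(math.prod(xi[j] ** x[j] * (1 - xi[j]) ** (1 - x[j]) for j in range(n))
          for x in sols)              # Pr(X lies in X_0(A, b))
Hw = sum(xi[j] * math.log(1 / xi[j]) + (1 - xi[j]) * math.log(1 / (1 - xi[j]))
         + xi[j] * math.log(w[j]) for j in range(n))
assert abs(S0 - hit * math.exp(Hw)) < 1e-5    # the identity (8.5.1.7)
assert hit <= 1.0                              # hence the bound (8.5.1.2)
```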

One can try to improve the bound (8.5.1.1)–(8.5.1.2) by strengthening (8.5.1.8). Thus we want to estimate the probability that a vector $X = (X_1, \ldots, X_n)$ of independent Bernoulli random variables satisfies the system of linear equations $\sum_{j=1}^n a_{ij} X_j = b_i$ for $i = 1, \ldots, r$. In [Sh10], Shapiro used anti-concentration inequalities to sharpen (8.5.1.8). In particular, one obtains

$$\Pr\left( X \in X_0(A, b) \right) \leq \min_{j_1, \ldots, j_r} \max\left\{ \xi_{j_1}, 1 - \xi_{j_1} \right\} \cdots \max\left\{ \xi_{j_r}, 1 - \xi_{j_r} \right\},$$

where the minimum of the products is taken over all collections of $r$ linearly independent columns of the matrix $A$. This results in an improvement, often substantial, of the bound (8.5.1.1)–(8.5.1.2):

$$S_0(A, b; w) \leq \exp\left\{ H_w(\xi_1, \ldots, \xi_n) \right\} \times \min_{j_1, \ldots, j_r} \max\left\{ \xi_{j_1}, 1 - \xi_{j_1} \right\} \cdots \max\left\{ \xi_{j_r}, 1 - \xi_{j_r} \right\}.$$

Another useful observation is that

$$\mathbf{E} \left( \sum_{j=1}^n a_{ij} X_j \right) = \sum_{j=1}^n a_{ij} \xi_j = b_i \quad \text{for} \quad i = 1, \ldots, r.$$

Therefore, one may try to adapt the local Central Limit Theorem approach to estimate $\Pr\left( X \in X_0(A, b) \right)$ in (8.5.1.7). It is not hard to compute the $r \times r$ covariance matrix $Q = \left( q_{ij} \right)$ of the random variables $\sum_{j=1}^n a_{1j} X_j, \ldots, \sum_{j=1}^n a_{rj} X_j$:

$$q_{ij} = \sum_{k=1}^n a_{ik} a_{jk} \left( \xi_k - \xi_k^2 \right)$$

and the local Central Limit Theorem, when applicable, would imply that

$$\Pr\left( X \in X_0(A, b) \right) \approx \frac{\det \Lambda}{(2\pi)^{r/2} \sqrt{\det Q}},$$
where $\det \Lambda$ is the determinant of the lattice $\Lambda \subset \mathbb{Z}^r$ generated by the columns of the matrix $A$. This approach was used in [BH13] to obtain asymptotically exact formulas for the number of graphs with a given degree sequence and for the number of 0-1 matrices with prescribed row and column sums, and in [BH10], [Be14] for the number of 0-1 and non-negative integer $d$-dimensional arrays with prescribed sums over $(d-1)$-dimensional "slices".
Suppose that the set $X_0(A, b)$ is not empty and let us consider it as a finite probability space, where $\Pr(x)$ is proportional to $w_1^{x_1} \cdots w_n^{x_n}$ for $x = (x_1, \ldots, x_n)$. If there is a lower bound for $S_0(A, b; w)$, complementing the upper bound (8.5.1.1) and (8.5.1.2) as in Theorem 8.1.3 for the partition function of 0-1 flows, one can deduce that a random point $x \in X_0(A, b)$ in many respects behaves as a vector $X = (X_1, \ldots, X_n)$ of independent Bernoulli random variables. Indeed, it follows from (8.5.1.7) that $\Pr\left( X \in X_0(A, b) \right)$ is not too small, and hence various averaging statistics on $X$ and $x \in X_0(A, b)$ are sufficiently close. This observation was used in [B10b, BH13, C+11].
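Because the $X_j$ are independent, the covariance formula above is just $q_{ij} = \sum_k a_{ik} a_{jk} \operatorname{Var}(X_k)$; it can be confirmed against an exhaustive enumeration (an illustration with arbitrary made-up parameters, not from the book):

```python
import itertools, math

A = [[1, 1, 1, 0], [0, 1, 1, 1]]
xi = [0.3, 0.6, 0.5, 0.2]             # arbitrary Bernoulli parameters
r, n = len(A), len(xi)

# covariance of the row sums (sum_j a_1j X_j, ..., sum_j a_rj X_j) by the formula
Q = [[sum(A[i][k] * A[l][k] * (xi[k] - xi[k] ** 2) for k in range(n))
      for l in range(r)] for i in range(r)]

# the same covariance by exhaustive enumeration over all 0-1 vectors
mean = [sum(A[i][j] * xi[j] for j in range(n)) for i in range(r)]
Qex = [[0.0] * r for _ in range(r)]
for x in itertools.product((0, 1), repeat=n):
    p = math.prod(xi[j] ** x[j] * (1 - xi[j]) ** (1 - x[j]) for j in range(n))
    y = [sum(A[i][j] * x[j] for j in range(n)) - mean[i] for i in range(r)]
    for i in range(r):
        for l in range(r):
            Qex[i][l] += p * y[i] * y[l]
assert all(abs(Q[i][l] - Qex[i][l]) < 1e-9 for i in range(r) for l in range(r))
```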

8.5.2 Partition functions of non-negative integer points. As in Sect. 8.5.1, let $A = \left( a_{ij} \right)$ be an integer $r \times n$ matrix of rank $r$, let $b = (b_1, \ldots, b_r)$ be an integer $r$-vector, and let $w = (w_1, \ldots, w_n)$ be a positive real vector of weights. We consider the set $X_+(A, b)$ of non-negative integer vectors $x$ that lie in the affine subspace defined by the system $Ax = b$:

$$X_+(A, b) = \left\{ x = (x_1, \ldots, x_n) :\ \sum_{j=1}^n a_{ij} x_j = b_i \ \text{for} \ i = 1, \ldots, r \quad \text{and} \quad x_j \in \mathbb{Z},\ x_j \geq 0 \ \text{for} \ j = 1, \ldots, n \right\}.$$

To avoid convergence issues, we assume that $X_+(A, b)$ is finite and consider a weighted sum (partition function) over $X_+(A, b)$:

$$S_+(A, b; w) = \sum_{\substack{x \in X_+(A, b) \\ x = (x_1, \ldots, x_n)}} w_1^{x_1} \cdots w_n^{x_n}.$$

The estimates for $S_+(A, b; w)$ are similar to those for $S_0(A, b; w)$ in Sect. 8.5.1. Below, we briefly sketch them, see also [BH10]. We get an upper bound

$$S_+(A, b; w) \leq \inf_{\substack{t_1, \ldots, t_r > 0 \\ w_j t_1^{a_{1j}} \cdots t_r^{a_{rj}} < 1 \ \text{for} \ j = 1, \ldots, n}} t_1^{-b_1} \cdots t_r^{-b_r} \prod_{j=1}^n \frac{1}{1 - w_j t_1^{a_{1j}} \cdots t_r^{a_{rj}}}. \tag{8.5.2.1}$$
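As with (8.5.1.1), the bound (8.5.2.1) holds at every feasible $t$: expanding each factor $1/(1 - w_j t^{a_j})$ as a geometric series reproduces every term of $S_+$ and more. A brute-force check on a made-up one-constraint instance (our own illustration, not from the book):

```python
import itertools, random
from math import prod

def S_plus(A, b, w, cap):
    # brute-force S_+(A, b; w); "cap" bounds each coordinate, which is
    # valid here because every a_ij is positive, so x_j <= max_i b_i
    r, n = len(A), len(w)
    return sum(prod(w[j] ** x[j] for j in range(n))
               for x in itertools.product(range(cap + 1), repeat=n)
               if all(sum(A[i][j] * x[j] for j in range(n)) == b[i]
                      for i in range(r)))

def rhs(A, b, w, t):
    # the right-hand side of (8.5.2.1) at t with w_j t^{a_j} < 1 for all j
    r, n = len(A), len(w)
    z = [w[j] * prod(t[i] ** A[i][j] for i in range(r)) for j in range(n)]
    assert all(v < 1 for v in z)
    return prod(t[i] ** (-b[i]) for i in range(r)) * prod(1 / (1 - v) for v in z)

A, b, w = [[1, 2, 1]], [4], [1.0, 0.5, 1.5]
s = S_plus(A, b, w, cap=4)            # = 15.8125 here
random.seed(3)
assert all(rhs(A, b, w, [random.uniform(0.05, 0.6)]) >= s - 1e-9
           for _ in range(100))       # hence S_+ <= the infimum in (8.5.2.1)
```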

The dual form of (8.5.2.1) is as follows. Let $P_+(A, b)$ be the polyhedron that is the intersection of the non-negative orthant $\mathbb{R}^n_+$ with the affine subspace defined by the system $Ax = b$,

$$P_+(A, b) = \left\{ x = (x_1, \ldots, x_n) :\ \sum_{j=1}^n a_{ij} x_j = b_i \ \text{for} \ i = 1, \ldots, r \quad \text{and} \quad x_j \geq 0 \ \text{for} \ j = 1, \ldots, n \right\}.$$

Suppose that $P_+(A, b)$ is bounded and has a non-empty relative interior, that is, contains a point $x = (x_1, \ldots, x_n)$ such that $x_j > 0$ for $j = 1, \ldots, n$. Let us consider a strictly concave function

$$G_w(x_1, \ldots, x_n) = \sum_{j=1}^n \left( (x_j + 1) \ln(x_j + 1) - x_j \ln x_j + x_j \ln w_j \right)$$
$$\text{where} \quad x_j \geq 0 \quad \text{for} \quad j = 1, \ldots, n.$$

Then $G_w$ attains its maximum on $P_+(A, b)$ at a unique point $(\xi_1, \ldots, \xi_n)$ in the relative interior of $P_+(A, b)$, see [BH10] for detail, and

$$S_+(A, b; w) \leq \exp\left\{ G_w(\xi_1, \ldots, \xi_n) \right\}. \tag{8.5.2.2}$$

Moreover, the bounds of (8.5.2.1) and (8.5.2.2) are identical.
The probabilistic interpretation of the bound (8.5.2.1)–(8.5.2.2) is as follows. Let $X = (X_1, \ldots, X_n)$ be an $n$-vector of independent geometric random variables such that

$$\Pr\left( X_j = k \right) = \frac{1}{1 + \xi_j} \left( \frac{\xi_j}{1 + \xi_j} \right)^k \quad \text{for} \quad k = 0, 1, \ldots \ \text{and} \ j = 1, \ldots, n,$$

so that $\mathbf{E}\, X_j = \xi_j$. Then

$$S_+(A, b; w) = \Pr\left( X \in X_+(A, b) \right) \exp\left\{ G_w(\xi_1, \ldots, \xi_n) \right\} \tag{8.5.2.3}$$

and (8.5.2.1), (8.5.2.2) can be written as

$$\Pr\left( X \in X_+(A, b) \right) \leq 1.$$

In [Sh10], using anti-concentration inequalities, Shapiro obtained a stronger bound

$$\Pr\left( X \in X_+(A, b) \right) \leq \min_{j_1, \ldots, j_r} \frac{1}{\left( 1 + \xi_{j_1} \right) \cdots \left( 1 + \xi_{j_r} \right)},$$

where the minimum is taken over all collections $\{j_1, \ldots, j_r\}$ of $r$ linearly independent columns of $A$. The $r \times r$ covariance matrix $Q = \left( q_{ij} \right)$ of the random variables $\sum_{j=1}^n a_{1j} X_j, \ldots, \sum_{j=1}^n a_{rj} X_j$ is computed as

$$q_{ij} = \sum_{k=1}^n a_{ik} a_{jk} \left( \xi_k + \xi_k^2 \right)$$

and the local Central Limit Theorem, when it holds, implies that

$$\Pr\left( X \in X_+(A, b) \right) \approx \frac{\det \Lambda}{(2\pi)^{r/2} \sqrt{\det Q}},$$

where $\Lambda \subset \mathbb{Z}^r$ is the lattice generated by the columns of $A$. In [BH12], this approach was applied to obtain asymptotically exact formulas for the number of non-negative integer matrices with prescribed row and column sums.
Suppose that the set $X_+(A, b)$ is not empty and let us consider it as a finite probability space, where $\Pr(x)$ is proportional to $w_1^{x_1} \cdots w_n^{x_n}$ for $x = (x_1, \ldots, x_n)$. If there is a lower bound for $S_+(A, b; w)$, complementing the upper bound (8.5.2.1) and (8.5.2.2) as in Sect. 8.4 for the partition function of integer flows, one can deduce that a random point $x \in X_+(A, b)$ in many respects behaves as a vector $X = (X_1, \ldots, X_n)$ of independent geometric random variables. Indeed, it follows from (8.5.2.3) that $\Pr\left( X \in X_+(A, b) \right)$ is not too small, and hence various averaging statistics on $X$ and $x \in X_+(A, b)$ are sufficiently close. This observation was used in [B10a].
References

[AA13] S. Aaronson and A. Arkhipov, The computational complexity of linear optics, Theory of
Computing 9 (2013), 143–252.
[Ai07] M. Aigner, A Course in Enumeration, Graduate Texts in Mathematics, 238, Springer,
Berlin, 2007.
[Al38] A.D. Alexandrov, On the theory of mixed volumes of convex bodies. IV. Mixed discrimi-
nants and mixed volumes (Russian), Matematicheskii Sbornik (Novaya Seriya) 3 (1938),
227–251.
[AF08] N. Alon and S. Friedland, The maximum number of perfect matchings in graphs with a
given degree sequence, Electronic Journal of Combinatorics 15 (2008), no. 1, Note 13, 2
pp.
[AL06] A. Amit and N. Linial, Random lifts of graphs: edge expansion, Combinatorics, Proba-
bility and Computing 15 (2006), no. 3, 317–332.
[AB09] S. Arora and B. Barak, Computational Complexity. A Modern Approach, Cambridge
University Press, Cambridge, 2009.
[As70] T. Asano, Theorems on the partition functions of the Heisenberg ferromagnets, Journal
of the Physical Society of Japan 29 (1970), 350–359.
[BV09] W. Baldoni and M. Vergne, Kostant partitions functions and flow polytopes, Transforma-
tion Groups 13 (2008), no. 3–4, 447–469.
[BG08] A. Bandyopadhyay and D. Gamarnik, Counting without sampling: asymptotics of the log-
partition function for certain statistical physics models, Random Structures & Algorithms
33 (2008), no. 4, 452–479.
[Ba89] R.B. Bapat, Mixed discriminants of positive semidefinite matrices, Linear Algebra and
its Applications 126 (1989), 107–124.
[BR97] R.B. Bapat and T.E.S. Raghavan, Nonnegative Matrices and Applications, Encyclopedia
of Mathematics and its Applications, 64 Cambridge University Press, Cambridge, 1997.
[Ba96] A. Barvinok, Two algorithmic results for the traveling salesman problem, Mathematics
of Operations Research 21 (1996), no. 1, 65–84.
[Ba99] A. Barvinok, Polynomial time algorithms to approximate permanents and mixed discrim-
inants within a simply exponential factor, Random Structures & Algorithms 14 (1999),
no. 1, 29–61.
[Ba07] A. Barvinok, Brunn-Minkowski inequalities for contingency tables and integer flows,
Advances in Mathematics 211 (2007), 105–122.
[Ba09] A. Barvinok, Asymptotic estimates for the number of contingency tables, integer flows,
and volumes of transportation polytopes, International Mathematics Research Notices.
IMRN 2009 (2009) no. 2, 348–385.
© Springer International Publishing AG 2016
A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9

[B10a] A. Barvinok, What does a random contingency table look like?, Combinatorics, Proba-
bility and Computing 19 (2010), no. 4, 517–539.
[B10b] A. Barvinok, On the number of matrices and a random matrix with prescribed row and
column sums and 0-1 entries, Advances in Mathematics 224 (2010), no. 1, 316–339.
[Ba15] A. Barvinok, On testing Hamiltonicity of graphs, Discrete Mathematics 338 (2015), no.
1, 53–58.
[B16a] A. Barvinok, Concentration of the mixed discriminant of well-conditioned matrices, Lin-
ear Algebra and its Applications 493 (2016), 120–133.
[B16b] A. Barvinok, Computing the permanent of (some) complex matrices, Foundations of
Computational Mathematics 16 (2016), no. 2, 329–342.
[B16+] A. Barvinok, Approximating permanents and hafnians, preprint arXiv:1601.07518
(2016); Discrete Analysis 2017:2.
[BH10] A. Barvinok and J.A. Hartigan, Maximum entropy Gaussian approximations for the num-
ber of integer points and volumes of polytopes, Advances in Applied Mathematics 45
(2010), no. 2, 252–289.
[BH12] A. Barvinok and J.A. Hartigan, An asymptotic formula for the number of non-negative
integer matrices with prescribed row and column sums, Transactions of the American
Mathematical Society 364 (2012), no. 8, 4323–4368.
[BH13] A. Barvinok and J.A. Hartigan, The number of graphs and a random graph with a given
degree sequence, Random Structures & Algorithms 42 (2013), no. 3, 301–348.
[BS11] A. Barvinok and A. Samorodnitsky, Computing the partition function for perfect match-
ings in a hypergraph, Combinatorics, Probability and Computing 20 (2011) no. 6, 815–
835.
[BS14] A. Barvinok and P. Soberón, Computing the partition function for graph homomorphisms,
preprint arXiv:1406.1771, to appear in Combinatorica, first online doi:10.1007/s00493-016-3357-2 (2014).
[BS16] A. Barvinok and P. Soberón, Computing the partition function for graph homomorphisms
with multiplicities, Journal of Combinatorial Theory, Series A 137 (2016), 1–26.
[Ba82] R.J. Baxter, Exactly Solved Models in Statistical Mechanics (1982), Academic Press,
Inc. [Harcourt Brace Jovanovich, Publishers], London.
[B+07] M. Bayati, D. Gamarnik, D. Katz, C. Nair and P. Tetali, Simple deterministic approxi-
mation algorithms for counting matchings, STOC’07 – Proceedings of the 39th Annual
ACM Symposium on Theory of Computing, ACM, New York, 2007, pp. 122–127.
[Be14] D. Benson-Putnins, Counting integer points in multi-index transportation polytopes,
preprint arXiv:1402.4715 (2014).
[BK99] P. Berman and M. Karpinski, On some tighter inapproximability results (extended
abstract), Automata, languages and programming (Prague, 1999), Lecture Notes in Com-
puter Science, 1644, Springer, Berlin, 1999, pp. 200–209.
[BB09] J. Borcea and P. Brändén, The Lee-Yang and Pólya-Schur programs. II. Theory of stable
polynomials and applications, Communications on Pure and Applied Mathematics 62
(2009), no. 12, 1595–1631.
[Bo06] C. Borgs, Absence of zeros for the chromatic polynomial on bounded degree graphs,
Combinatorics, Probability and Computing 15 (2006), no. 1–2, 63–74.
[B+11] P. Brändén, J. Haglund, M. Visontai, and D.G. Wagner, Proof of the monotone column
permanent conjecture, Notions of positivity and the geometry of polynomials, Trends in
Mathematics, Birkhäuser/Springer Basel AG, Basel, 2011, pp. 63–78.
[Br73] L.M. Bregman, Certain properties of nonnegative matrices and their permanents
(Russian), Doklady Akademii Nauk SSSR 211 (1973), 27–30.
[BR91] R.A. Brualdi and H.J. Ryser, Combinatorial Matrix Theory, Encyclopedia of Mathematics
and its Applications, 39, Cambridge University Press, Cambridge, 1991.
[Bu15] B. Bukh, personal communication (2015).
[BG05] A. Bulatov and M. Grohe, The complexity of partition functions, Theoretical Computer
Science 348 (2005), no. 2–3, 148–186.

[B+97] P. Bürgisser, M. Clausen and M.A. Shokrollahi, Algebraic Complexity Theory. With
the collaboration of Thomas Lickteig, Grundlehren der Mathematischen Wissenschaften
[Fundamental Principles of Mathematical Sciences], 315, Springer-Verlag, Berlin, 1997.
[C+13] J.-Y. Cai, X. Chen and P. Lu, Graph homomorphisms with complex values: a dichotomy
theorem, SIAM Journal on Computing 42 (2013), no. 3, 924–1029.
[C+08] E.R. Canfield, C. Greenhill and B.D. McKay, Asymptotic enumeration of dense 0-1 matri-
ces with specified line sums, Journal of Combinatorial Theory. Series A 115 (2008), no.
1, 32–66.
[CM10] E.R. Canfield and B.D. McKay, Asymptotic enumeration of integer matrices with large
equal row and column sums, Combinatorica 30 (2010), no. 6, 655–680.
[C+11] S. Chatterjee, P. Diaconis and A. Sly, Random graphs with a given degree sequence, The
Annals of Applied Probability 21 (2011), no. 4, 1400–1435.
[CS07] M. Chudnovsky and P. Seymour, The roots of the independence polynomial of a clawfree
graph, Journal of Combinatorial Theory. Series B 97 (2007), no. 3, 350–357.
[CP16] D. Cifuentes and P.A. Parrilo, An efficient tree decomposition method for permanents and
mixed discriminants, Linear Algebra and its Applications 493 (2016), 45–81.
[Ci87] B.A. Cipra, An Introduction to the Ising Model, American Mathematical Monthly 94
(1987), no. 10, 937–959.
[CV09] K. Costello and V. Vu, Concentration of random determinants and permanent estimators,
SIAM Journal on Discrete Mathematics 23 (2009) no. 3, 1356–1371.
[Cs14] P. Csikvári, Lower matching conjecture, and a new proof of Schrijver’s and Gurvits’s the-
orems, preprint arXiv:1406.0766, to appear in the Journal of the European Mathematical
Society (2014).
[CF16] P. Csikvári and P.E. Frenkel, Benjamini - Schramm continuity of root moments of graph
polynomials, European Journal of Combinatorics 52, Part B (2016), 302–320.
[CK09] B. Cuckler and J. Kahn, Entropy bounds for perfect matchings and Hamiltonian cycles,
Combinatorica 29 (2009), no. 3, 327–335.
[D+15] E. Davies, M. Jenssen, W. Perkins and B. Roberts, Independent sets, matchings, and
occupancy fractions, preprint arXiv:1508.04675 (2015).
[DG95] P. Diaconis and A. Gangolli, Rectangular arrays with fixed margins, Discrete Probability
and Algorithms (Minneapolis, MN, 1993), The IMA Volumes in Mathematics and its
Applications, 72, Springer, New York, 1995, pp. 15–41.
[Do96] R.L. Dobrushin, Perturbation methods of the theory of Gibbsian fields, Lectures on prob-
ability theory and statistics (Saint-Flour, 1994), Lecture Notes in Mathematics, vol. 1648,
Springer, Berlin, 1996, pp. 1–66.
[DS87] R.L. Dobrushin and S.B. Shlosman, Completely analytical interactions: constructive
description, Journal of Statistical Physics 46 (1987), no. 5–6, 983–1014.
[DG87] S.J. Dow and P.M. Gibson, An upper bound for the permanent of a 3-dimensional (0,1)-
matrix, Proceedings of the American Mathematical Society 99 (1987), no. 1, 29–34.
[D+14] A. Dudek, M. Karpinski, A. Ruciński and E. Szymańska, Approximate counting of match-
ings in (3,3)-hypergraphs, Algorithm theory–SWAT 2014, Lecture Notes in Computer
Science, 8503, Springer, Cham, 2014, pp. 380–391.
[Eg81] G.P. Egorychev, The solution of van der Waerden’s problem for permanents, Advances
in Mathematics 42 (1981), no. 3, 299–305.
[E+84] V.A. Emelichev, M.M. Kovalev and M.K. Kravtsov, Polytopes, Graphs and Optimization,
Cambridge University Press, New York, 1984.
[E+11] L. Esperet, F. Kardoš, A.D. King, D. Král and S. Norine, Exponentially many perfect
matchings in cubic graphs, Advances in Mathematics 227 (2011), no. 4 1646–1664.
[Fa81] D.I. Falikman, Proof of the van der Waerden conjecture on the permanent of a doubly
stochastic matrix (Russian), Matematicheskie Zametki 29 (1981), no. 6, 931–938.
[Fi66] M.E. Fisher, On the dimer solution of planar Ising models, Journal of Mathematical
Physics 7 (1966), no. 10, 1776–1781.
[Fr79] S. Friedland, A lower bound for the permanent of a doubly stochastic matrix, Annals of
Mathematics (2) 110 (1979), no. 1, 167–176.

[Fr11] S. Friedland Positive diagonal scaling of a nonnegative tensor to one with prescribed
slice sums, Linear Algebra and its Applications 434 (2011), no. 7, 1615–1619.
[FG06] S. Friedland and L. Gurvits, Generalized Friedland-Tverberg inequality: applications
and extensions, preprint arXiv:math/0603410 (2006).
[F+04] S. Friedland, B. Rider, and O. Zeitouni, Concentration of permanent estimators for certain
large matrices, The Annals of Applied Probability 14 (2004), no. 3, 1559–1576.
[GL99] A. Galluccio and M. Loebl, On the theory of Pfaffian orientations. I. Perfect matchings and
permanents, Electronic Journal of Combinatorics 6 (1999), Research Paper 6, 18 pp.
(electronic).
[Ga16] D. Gamarnik, personal communication (2016).
[GK10] D. Gamarnik and D. Katz, A deterministic approximation algorithm for computing the
permanent of a 0, 1 matrix, Journal of Computer and System Sciences 76 (2010), no. 8,
879–883.
[GG78] C.D. Godsil and I. Gutman, On the matching polynomial of a graph, Algebraic methods
in graph theory, Vol. I, II (Szeged, 1978), Colloquia Mathematica Societatis János Bolyai,
25, 1978, pp. 241–249.
[GJ12] L.A. Goldberg and M. Jerrum, Approximating the partition function of the ferromagnetic
Potts model, Journal of the ACM 59 (2012), no. 5, Art. 25, 31 pp.
[GJ15] L.A. Goldberg and M. Jerrum, A complexity classification of spin systems with an external
field, Proceedings of the National Academy of Sciences of the United States of America
112 (2015), no. 43, 13161–13166.
[Go08] O. Goldreich, Computational Complexity. A Conceptual Perspective, Cambridge Univer-
sity Press, Cambridge, 2008.
[GM08] C. Greenhill and B.D. McKay, Asymptotic enumeration of sparse nonnegative integer
matrices with specified row and column sums, Advances in Applied Mathematics 41
(2008), no. 4, 459–481.
[G+06] C. Greenhill, B.D. McKay and X. Wang, Asymptotic enumeration of sparse 0-1 matrices
with irregular row and column sums, Journal of Combinatorial Theory. Series A 113
(2006), no. 2, 291–324.
[GK87] D.Yu. Grigoriev and M. Karpinski, The matching problem for bipartite graphs with
polynomially bounded permanents is in NC, in Proceedings of the 28th IEEE Symposium
on the Foundations of Computer Science, 1987, pp. 166–172.
[Gu04] L. Gurvits, Classical complexity and quantum entanglement, Journal of Computer and
System Sciences 69 (2004), no. 3, 448–484.
[Gu05] L. Gurvits, On the complexity of mixed discriminants and related problems, Mathemat-
ical foundations of computer science 2005, Lecture Notes in Computer Science, 3618,
Springer, Berlin, 2005, pp. 447–458.
[Gu06] L. Gurvits, The van der Waerden conjecture for mixed discriminants, Advances in Math-
ematics 200 (2006), no. 2, 435–454.
[Gu08] L. Gurvits, Van der Waerden/Schrijver-Valiant like conjectures and stable (aka hyper-
bolic) homogeneous polynomials: one theorem for all. With a corrigendum, Electronic
Journal of Combinatorics 15 (2008), no. 1, Research Paper 66, 26 pp.
[Gu11] L. Gurvits, Unleashing the power of Schrijver’s permanental inequality with the help of
the Bethe Approximation, preprint arXiv:1106.2844 (2011).
[Gu15] L. Gurvits, Boolean matrices with prescribed row/column sums and stable homogeneous
polynomials: combinatorial and algorithmic applications, Information and Computation
240 (2015), 42–55.
[GS02] L. Gurvits and A. Samorodnitsky, A deterministic algorithm for approximating the mixed
discriminant and mixed volume, and a combinatorial corollary, Discrete & Computa-
tional Geometry 27 (2002), no. 4, 531–550.
[GS14] L. Gurvits and A. Samorodnitsky, Bounds on the permanent and some applications,
preprint arXiv:1408.0976 (2014).
[H+16] N. J. A. Harvey, P. Srivastava and J. Vondrák, Computing the independence polynomial
in Shearer’s region for the LLL, preprint arXiv:1608.02282 (2016).
[Hå99] J. Håstad, Clique is hard to approximate within n^{1−ε}, Acta Mathematica 182 (1999),
no. 1, 105–142.
[HL72] O.J. Heilmann and E.H. Lieb, Theory of monomer-dimer systems, Communications in
Mathematical Physics 25 (1972), 199–232.
[Hi97] A. Hinkkanen, Schur products of certain polynomials, Lipa’s legacy (New York, 1995),
Contemporary Mathematics, 211, American Mathematical Society, Providence, RI, 1997,
pp. 285–295.
[IM16] M. Isaev and B.D. McKay, Complex martingales and asymptotic enumeration, preprint
arXiv:1604.08305 (2016).
[Ja03] B. Jackson, Zeros of chromatic and flow polynomials of graphs, Journal of Geometry 76
(2003), no. 1–2, 95–109.
[Je03] M. Jerrum, Counting, Sampling and Integrating: Algorithms and Complexity, Lectures
in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2003.
[JS93] M. Jerrum and A. Sinclair, Polynomial-time approximation algorithms for the Ising model,
SIAM Journal on Computing 22 (1993), no. 5, 1087–1116.
[J+04] M. Jerrum, A. Sinclair and E. Vigoda, A polynomial-time approximation algorithm for
the permanent of a matrix with nonnegative entries, Journal of the ACM 51 (2004), no.
4, 671–697.
[Ka01] J. Kahn, An entropy approach to the hard-core model on bipartite graphs, Combinatorics,
Probability and Computing 10 (2001), no. 3, 219–237.
[Ka16] G. Kalai, The quantum computer puzzle (expanded version), preprint arXiv:1605.00992
(2016).
[Ka63] P.W. Kasteleyn, Dimer statistics and phase transitions, Journal of Mathematical Physics
4 (1963), 287–293.
[Ke14] P. Keevash, The existence of designs, preprint arXiv:1401.3665 (2014).
[Ke15] P. Keevash, Counting designs, preprint arXiv:1504.02909, to appear in the Journal of the
European Mathematical Society (2015).
[Kh57] A.I. Khinchin, Mathematical Foundations of Information Theory, Dover Publications,
Inc., New York, N. Y., 1957.
[Kh84] A.G. Khovanskii, Analogues of Aleksandrov-Fenchel inequalities for hyperbolic forms
(Russian), Doklady Akademii Nauk SSSR 276 (1984), no. 6, 1332–1334.
[KW16] M. Kontorovich and H. Wu, REU project on complex roots and approximation of permanents,
manuscript, University of Michigan, available at
http://lsa.umich.edu/content/dam/math-assets/math-document/reu-documents/Kontorovich%20%26%20Han%20Wu.pdf (2016).
[Kr01] S.G. Krantz, Function Theory of Several Complex Variables. Reprint of the 1992 edition,
AMS Chelsea Publishing, Providence, RI, 2001.
[LS10] M. Laurent and A. Schrijver, On Leonid Gurvits’s proof for permanents, American Math-
ematical Monthly 117 (2010), no. 10, 903–911.
[Le01] M. Ledoux, The Concentration of Measure Phenomenon, Mathematical Surveys and
Monographs, 89, American Mathematical Society, Providence, RI, 2001.
[LY52] T.D. Lee and C.N. Yang, Statistical theory of equations of state and phase transitions.
II. Lattice gas and Ising model, Physical Review (2) 87 (1952), 410–419.
[Le93] K. Leichtweiß, Convexity and differential geometry, Handbook of Convex Geometry, Vol.
A, B, North-Holland, Amsterdam, 1993, pp. 1045–1080.
[Le15] M. Lelarge, Counting matchings in irregular bipartite graphs, preprint arXiv:1507.04739
(2015).
[LL13] N. Linial and Z. Luria, An upper bound on the number of Steiner triple systems, Random
Structures & Algorithms 43 (2013), no. 4, 399–406.
[LL14] N. Linial and Z. Luria, An upper bound on the number of high-dimensional permutations,
Combinatorica 34 (2014), no. 4, 471–486.
[LR05] N. Linial and E. Rozenman, Random lifts of graphs: perfect matchings, Combinatorica
25 (2005), no. 4, 407–424.
[L+00] N. Linial, A. Samorodnitsky and A. Wigderson, A deterministic strongly polynomial
algorithm for matrix scaling and approximate permanents, Combinatorica 20 (2000),
no. 4, 545–568.
[Lo12] L. Lovász, Large Networks and Graph Limits, American Mathematical Society Colloquium
Publications, 60, American Mathematical Society, Providence, RI, 2012.
[LP09] L. Lovász and M.D. Plummer, Matching Theory. Corrected reprint of the 1986 original,
AMS Chelsea Publishing, Providence, RI, 2009.
[LY13] P. Lu and Y. Yin, Improved FPTAS for multi-spin systems, Approximation, Random-
ization, and Combinatorial Optimization, Lecture Notes in Computer Science, 8096,
Springer, Heidelberg, 2013, pp. 639–654.
[LV99] M. Luby and E. Vigoda, Fast convergence of the Glauber dynamics for sampling inde-
pendent sets, Random Structures & Algorithms 15 (1999), no. 3–4, 229–241.
[M+15] A.W. Marcus, D.A. Spielman and N. Srivastava, Interlacing families II: Mixed characteristic
polynomials and the Kadison-Singer problem, Annals of Mathematics (2) 182 (2015),
no. 1, 327–350.
[Ma66] M. Marden, Geometry of Polynomials. Second edition, Mathematical Surveys, No. 3,
American Mathematical Society, Providence, R.I., 1966.
[MO68] A.W. Marshall and I. Olkin, Scaling of matrices to achieve specified row and column
sums, Numerische Mathematik 12 (1968), 83–90.
[Mc14] P. McCullagh, An asymptotic approximation for the permanent of a doubly stochastic
matrix, Journal of Statistical Computation and Simulation 84 (2014), no. 2, 404–414.
[Mi78] H. Minc, Permanents, Encyclopedia of Mathematics and its Applications, 6, Addison-
Wesley Publishing Co., Reading, Mass., 1978.
[Ne04] Yu. Nesterov, Introductory Lectures on Convex Optimization. A Basic Course, Applied
Optimization, vol. 87, Kluwer Academic Publishers, Boston, MA, 2004.
[On44] L. Onsager, Crystal statistics. I. A two-dimensional model with an order-disorder tran-
sition, Physical Review (2) 65 (1944), 117–149.
[PS98] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Com-
plexity. Corrected reprint of the 1982 original, Dover Publications, Inc., Mineola, NY,
1998.
[PR16] V. Patel and G. Regts, Deterministic polynomial-time approximation algorithms for par-
tition functions and graph polynomials, preprint arXiv:1607.01167 (2016).
[Po15] V.N. Potapov, On the number of latin hypercubes, pairs of orthogonal latin squares and
MDS codes, preprint arXiv:1510.06212 (2015).
[Pr94] V.V. Prasolov, Problems and Theorems in Linear Algebra, Translations of Mathematical
Monographs, 134, American Mathematical Society, Providence, RI, 1994.
[PV97] I.E. Pritsker and R.S. Varga, The Szegő curve, zero distribution and weighted approxima-
tion, Transactions of the American Mathematical Society 349 (1997), no. 10, 4085–4105.
[Ra97] J. Radhakrishnan, An entropy proof of Bregman’s theorem, Journal of Combinatorial
Theory. Series A 77 (1997), no. 1, 161–164.
[Re15] G. Regts, Zero-free regions of partition functions with applications to algorithms and
graph limits, preprint arXiv:1507.02089, to appear in Combinatorica (2015).
[Re16] G. Regts, On a conjecture of Sokal concerning roots of the independence polynomial,
manuscript (2016).
[R+99] N. Robertson, P.D. Seymour and R. Thomas, Permanents, Pfaffian orientations, and even
directed circuits, Annals of Mathematics (2) 150 (1999), no. 3, 929–975.
[R+16] M. Rudelson, A. Samorodnitsky and O. Zeitouni, Hafnians, perfect matchings and
Gaussian matrices, The Annals of Probability 44 (2016), no. 4, 2858–2888.
[RZ16] M. Rudelson and O. Zeitouni, Singular values of Gaussian matrices and permanent
estimators, Random Structures & Algorithms 48 (2016), no. 1, 183–212.
[Ru71] D. Ruelle, Extension of the Lee-Yang circle theorem, Physical Review Letters 26 (1971),
303–304.
[Ru99] D. Ruelle, Zeros of graph-counting polynomials, Communications in Mathematical
Physics 200 (1999), no. 1, 43–56.
[Sa01] A. Samorodnitsky, personal communication (2001).
[Sa93] J.R. Sangwine-Yager, Mixed volumes, Handbook of convex geometry, Vol. A, North-
Holland, Amsterdam, 1993, pp. 43–71.
[Sc98] A. Schrijver, Counting 1-factors in regular bipartite graphs, Journal of Combinatorial
Theory. Series B 72 (1998), no. 1, 122–135.
[SS05] A.D. Scott and A.D. Sokal, The repulsive lattice gas, the independent-set polynomial, and
the Lovász local lemma, Journal of Statistical Physics 118 (2005), no. 5–6, 1151–1261.
[Sh10] A. Shapiro, Bounds on the number of integer points in a polytope via concentration
estimates, preprint arXiv:1011.6252 (2010).
[Si64] R. Sinkhorn, A relationship between arbitrary positive matrices and doubly stochastic
matrices, Annals of Mathematical Statistics 35 (1964), 876–879.
[Sl10] A. Sly, Computational transition at the uniqueness threshold, 2010 IEEE 51st Annual
Symposium on Foundations of Computer Science FOCS 2010, IEEE Computer Soc.,
Los Alamitos, CA, 2010, pp. 287–296.
[SS14] A. Sly and N. Sun, Counting in two-spin models on d-regular graphs, The Annals of
Probability 42 (2014), no. 6, 2383–2416.
[S01a] A. Sokal, Bounds on the complex zeros of (di)chromatic polynomials and Potts-model
partition functions, Combinatorics, Probability and Computing 10 (2001), no. 1, 41–77.
[S01b] A.D. Sokal, A personal list of unsolved problems concerning lattice gases and anti-
ferromagnetic Potts models, Inhomogeneous random systems (Cergy-Pontoise, 2000),
Markov Processes and Related Fields 7 (2001), no. 1, 21–38.
[S+16] R. Song, Y. Yin and J. Zhao, Counting hypergraph matchings up to uniqueness threshold,
In Proceedings of the 20th International Workshop on Randomization and Computation
(RANDOM), Dagstuhl Publishing, Dagstuhl, Germany, 2016, pp. 46:1–46:29.
[So03] G. Soules, New permanental upper bounds for nonnegative matrices, Linear Multilinear
Algebra 51 (2003), no. 4, 319–337.
[Sp75] F. Spitzer, Markov random fields on an infinite tree, The Annals of Probability 3 (1975),
no. 3, 387–398.
[St89] R.P. Stanley, Log-concave and unimodal sequences in algebra, combinatorics, and geom-
etry, Graph Theory and its Applications: East and West (Jinan, 1986), Annals of the New
York Academy of Sciences, 576, New York Acad. Sci., New York, 1989, pp. 500–535.
[TF61] H.N.V. Temperley and M.E. Fisher, Dimer problem in statistical mechanics–an exact
result, Philosophical Magazine (8) 6 (1961), 1061–1063.
[Th06] R. Thomas, A survey of Pfaffian orientations of graphs, International Congress of Math-
ematicians. Vol. III, European Mathematical Society, Zürich, 2006, pp. 963–984.
[Va79] L.G. Valiant, The complexity of computing the permanent, Theoretical Computer Science
8 (1979), no. 2, 189–201.
[Va08] L.G. Valiant, Holographic algorithms, SIAM Journal on Computing 37 (2008), no. 5,
1565–1594.
[Vi12] N.K. Vishnoi, A permanent approach to the traveling salesman problem, 2012 IEEE 53rd
Annual Symposium on Foundations of Computer Science – FOCS 2012, IEEE Computer
Soc., Los Alamitos, CA, 2012, pp. 76–80.
[Vo13] P.O. Vontobel, The Bethe permanent of a nonnegative matrix, IEEE Transactions on
Information Theory 59 (2013), no. 3, 1866–1901.
[Vo79] M. Voorhoeve, A lower bound for the permanents of certain (0,1)-matrices, Koninklijke
Nederlandse Akademie van Wetenschappen. Indagationes Mathematicae 41 (1979), no.
1, 83–86.
[Wa99] D.G. Wagner, Weighted enumeration of spanning subgraphs with degree constraints,
Journal of Combinatorial Theory. Series B 99 (2009), no. 2, 347–357.
[Wa11] D.G. Wagner, Multivariate stable polynomials: theory and applications, American Math-
ematical Society. Bulletin. New Series 48 (2011), no. 1, 53–84.
[Wa03] P. Walker, The zeros of the partial sums of the exponential series, The American
Mathematical Monthly 110 (2003), no. 4, 337–339.
[We06] D. Weitz, Counting independent sets up to the tree threshold, STOC’06: Proceedings of
the 38th Annual ACM Symposium on Theory of Computing, ACM, New York, 2006,
pp. 140–149.
[We97] H. Weyl, The Classical Groups. Their Invariants and Representations. Fifteenth printing,
Princeton Landmarks in Mathematics. Princeton Paperbacks, Princeton University Press,
Princeton, NJ, 1997.
[YL52] C.N. Yang and T.D. Lee, Statistical theory of equations of state and phase transitions. I.
Theory of condensation, Physical Review (2) 87 (1952), 404–409.
[Zh16] Y. Zhao, Extremal regular graphs: independent sets and graph homomorphisms, preprint
arXiv:1610.09210 (2016).
[Zu07] D. Zuckerman, Linear degree extractors and the inapproximability of max clique and
chromatic number, Theory of Computing 3 (2007), 103–128.
[Zv97] A. Zvonkin, Matrix integrals and map enumeration: an accessible introduction, Com-
binatorics and physics (Marseilles, 1995), Mathematical and Computer Modelling 26
(1997), no. 8–10, 281–304.
Index
C
Capacity
  of a polynomial, 19
α-conditioned
  matrix
    positive, 68
    positive semidefinite, 136
    symmetric, 102
  tensor, 124
Correlation decay, 3
  for general graphs, 213
  for regular trees, 196
  for the matching polynomial, 150, 152, 156
Cycle
  cover, 49
  Hamiltonian, 1, 4, 88
  oriented
    evenly, 109
    oddly, 109

E
Entropy, 13
  Bethe, 15, 166
  conditional, 14
  of a partition, 14

F
Formula
  Euler's, 113
  Wick's, 49, 96
Function
  concave, 10
  convex, 9
  partition, 1, 145, 181, 229, 244, 262, 269, 273
    Kostant, 275

G
Girth, 156
Graph
  adjacency matrix of, 49, 94
  biadjacency matrix of, 48
  bipartite, 48
    matching polynomial of, 158
  claw-free, 188
  coloring, 219, 222
  cut, 237
  directed, 108
  flow, 271, 274
  girth of, 156
  homomorphism, 229
    edge-colored, 233
    partition function, 229
    with multiplicities, 244
  independent set of, 181, 230
  k-regular, 156
  2-lift of, 159
  line, 188
  neighborhood of a vertex, 182
    free, 192
  planar, 2
  polynomial of
    chromatic, 187, 219
    independence, 181
    matching, 145
    Tutte, 186
  regular, 156
  with multiplicities, 224

H
Hafnian, 93
  as an integral, 95
  as the expectation of a random Pfaffian, 115
  concentration of, 101
  recursive formula for, 94
Hypergraph, 116, 174
  d-partite, 116
  matching polynomial of, 174
  perfect matching of, 116

I
Independent set, 181, 230
Induced subgraph, 182
Inequality
  Alexandrov–Fenchel, 53
  Bregman–Minc, 58
  for mixed discriminants, 135
  Prekopa–Leindler, 21
  van der Waerden, 55

L
2-lift
  of a graph, 159
  of a matrix, 159

M
Matching, 145
  perfect, 2, 48, 94
    of a directed graph, 108
    of a hypergraph, 116
  polynomial, 145
    computing, 149
    correlation decay for, 150, 154
    of a hypergraph, 174
Matrix
  adjacency, 49, 94
  biadjacency, 48
  doubly stochastic, 55, 101
  2-lift of, 159
  n-tuple of, 134
    α-conditioned, 136
    doubly stochastic, 132
Mixed discriminant, 129
  as the expectation of a determinant, 142
  computing, 142
Model
  edge-coloring, 238
  hard core, 181
  Ising, 258
  monomer-dimer, 145

P
Permanent
  algorithms for, 91
  and determinant, 88
    expectation, 89
  as an integral, 49
  d-dimensional of a tensor, 116
    slice expansion of, 117
  Hamiltonian, 87, 246
  of a matrix, 47
    complex, 72
    concentration, 62
    positive, 78
    positive semidefinite, 51
    row expansion, 48
Pfaffian, 107
  orientation, 112
Polynomial
  capacity of, 19
  chromatic, 187, 220
  complete symmetric, 284
  D-stable, 38
  H-stable, 31
  independence, 181
  interlacing, 28
  matching, 145
    computing, 149
    of a bipartite graph, 159
  mixed characteristic, 143
  multi-affine, 38
  Tutte, 186
Product
  Hadamard, 39
  Schur, 39, 41

S
Scaling, 3
  of matrices, 64, 67, 71
    of n-tuples of, 134
    symmetric, 102
  of tensors, 122
Sequence
  log-concave, 30
Subgraph
  induced, 182

T
Tensor, 116
  α-conditioned, 124
  d-stochastic, 123, 124
  networks, 238
  slice of, 116
Theorem
  Csikvári and Lelarge, 160
  Davies et al., 190
  Dobrushin, 183
  Gauss–Lucas, 18
  Gurvits and Samorodnitsky, 134
  Gurvits on capacity, 33, 36
  Heilmann–Lieb, 146
  Laguerre, 42
  Lee–Yang, 258
  of Regts, 217
  Scott and Sokal, 184
  Szegő, 41
  Weitz, 205
Tree
  k-regular, 156
  correlation decay for, 202, 213

© Springer International Publishing AG 2016
A. Barvinok, Combinatorics and Complexity of Partition Functions,
Algorithms and Combinatorics 30, DOI 10.1007/978-3-319-51829-9