Multispin
Alexander Barvinok
$j$-th coordinate of $x$. The Hamming distance between two points $x, y \in X$ is the number of coordinates in which they differ:
$$|\lambda| \le \frac{1}{3c\sqrt{r-1}}.$$
Then
$$E\, e^{\lambda f} \ne 0.$$
Moreover, if, additionally,
We prove Theorem 1.1 in Section 2, and in Section 4 we show that the bound for
|λ| is optimal, up to a constant factor. The method of proof extends those of [Ba17]
and [BR19]. The dependence on r is worth noting. One approach frequently used
for problems of this type is the cluster expansion method, see [Je24] for a recent
survey. In the situation of Theorem 1.1 it apparently gives only $|\lambda| = \Omega(1/rc)$
as a bound for the zero-free region, while also requiring $|\phi_i|$ to remain uniformly
bounded [Ca+22].
The dependence on r allows us to obtain zero-free regions for the partition func-
tion of another model that can be considered as a scaling limit of that described by
Theorem 1.1. We consider the standard Gaussian probability measure in Rn with
density
$$\frac{1}{(2\pi)^{n/2}} \exp\left\{-\frac{1}{2}\left(x_1^2 + \ldots + x_n^2\right)\right\} \quad \text{for} \quad x = (x_1, \ldots, x_n)$$
and prove the following result.
(1.2) Theorem. Let φ1 , . . . , φm : Rn −→ C be functions on Euclidean space Rn ,
endowed with the standard Gaussian probability measure. Assume that
(1) Each function $\phi_i$ is 1-Lipschitz in the $\ell^1$ metric of $\mathbb{R}^n$, that is,
$$|\phi_i(x_1, \ldots, x_n) - \phi_i(y_1, \ldots, y_n)| \le \sum_{i=1}^{n} |x_i - y_i|;$$
$$|\lambda| < \frac{1}{6c\sqrt{r-1}}.$$
Then
$$E\, e^{\lambda f} \ne 0.$$
Moreover, if, additionally,
$$|\lambda| \le \frac{1-\delta}{3c\sqrt{r-1}} \ \text{(in Theorem 1.1)} \quad \text{or} \quad |\lambda| \le \frac{1-\delta}{6c\sqrt{r-1}} \ \text{(in Theorem 1.2)},$$
then approximating the value of $E\, e^{\lambda f}$ within relative error $0 < \epsilon < 1$ reduces to the
computation of the $m^k$ expectations $E\left(\phi_{i_1} \cdots \phi_{i_k}\right)$,
where
$$k = O_\delta\left(\ln(m+n) - \ln \epsilon\right).$$
We sketch the algorithm in Section 5. In many applications of Theorem 1.1 the
probability spaces $X_j$ are finite. Assuming that each $X_j$ contains at most $q$ elements,
computing each expectation $E\left(\phi_{i_1} \cdots \phi_{i_k}\right)$ by direct enumeration takes
$O\left(q^{rk}\right)$ time, provided we have access to the values of $\phi_i(x)$ for a given $x \in X$.
If the parameter $r$ is fixed in advance, we obtain a quasi-polynomial algorithm of
$(q(m+n))^{O(\ln(m+n) - \ln\epsilon)}$ complexity. Moreover, the general technique of Patel and
Regts [PR17] allows one to speed it up to a genuinely polynomial time algorithm
of $\left(q(m+n)\epsilon^{-1}\right)^{O(1)}$ complexity, provided the parameter $c$ is also fixed in advance.
In the context of Theorem 1.2, the expectations E (φi1 · · · φik ) are represented by
integrals in the space of dimension at most kr and often can be efficiently computed
or approximated with high accuracy, assuming again that r is fixed in advance. In
that case, we also obtain an algorithm of quasi-polynomial complexity.
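For the finite case discussed above, a single expectation $E\left(\phi_{i_1} \cdots \phi_{i_k}\right)$ can be evaluated by enumerating only the coordinates the $k$ functions actually depend on. The following is a minimal sketch (all names and data structures are ours, purely for illustration):

```python
import itertools

# Each coordinate space is a list of (value, probability) pairs; each phi
# depends only on a small set of coordinates, so the expectation of a product
# of k functions requires enumerating only the union of their supports --
# at most r*k coordinates, hence O(q^{rk}) time rather than O(q^n).

def expectation_of_product(phis, supports, spaces):
    """phis[t]: function of a coordinate dict; supports[t]: indices it
    depends on; spaces[j]: list of (value, probability) pairs."""
    coords = sorted(set().union(*supports))
    total = 0.0
    for combo in itertools.product(*(spaces[j] for j in coords)):
        x = {j: v for j, (v, _) in zip(coords, combo)}
        p = 1.0
        for _, prob in combo:
            p *= prob
        prod = 1.0
        for phi in phis:
            prod *= phi(x)
        total += p * prod
    return total

# Example: X_1 = X_2 = {-1, 1} uniform; E(x1*x2 * x1) = E(x2) = 0.
spaces = {1: [(-1, 0.5), (1, 0.5)], 2: [(-1, 0.5), (1, 0.5)]}
e = expectation_of_product(
    [lambda x: x[1] * x[2], lambda x: x[1]],
    [{1, 2}, {1}],
    spaces,
)
```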
As we mentioned, the expectation E eλf appears in several different, though
closely related contexts.
(1.3) Statistical physics: partition functions of systems with multi-spin
interactions. Here we describe the statistical physics context of Theorem 1.1.
Suppose that we have a system of n particles, where the j-th particle can be in a
state described by a point xj ∈ Xj , also called the spin of the particle. The vector
x = (x1 , . . . , xn ) of all spins is called a spin configuration. The particles interact
in various ways, and the energy of a spin configuration is given by a function
f (x1 , . . . , xn ). Then for a real λ > 0, interpreted as the inverse temperature, the
value of E e−λf is the partition function of the system, see, for example, [FV18] for
the general setup. If the function f is written as a sum of φi , as in Theorem 1.1,
then the energy of the system is a sum over subsystems, each containing at most
r particles, and each particle participating in at most c subsystems. The classical
works of Lee and Yang [YL52], [LY52] relate the complex zeros of the partition
function $\lambda \longmapsto E\, e^{\lambda f}$ to the phenomenon of phase transition. Generally, zero-free
regions like the one described by Theorem 1.1 correspond to regimes with no phase
transition.
In terms of Theorem 1.1, the most studied case is that of r = 2, which includes
the classical Ising, Potts and Heisenberg models, see [FV18]. In that case, we have
a (finite) graph G = (V, E) with set V of vertices and set E of edges. The particles
are identified with the vertices of G, the spins are ±1 in the case of the Ising model,
elements of some finite set in the case of the Potts model, or vectors in Euclidean
space, in the case of the Heisenberg model. The interactions are pairwise and
described by the functions attached to the edges of E. Hence, in the context of
Theorem 1.1, we have r = 2 and c is the largest degree of a vertex of G. Starting
with [LY52], the complex zeros of the partition function of systems with pairwise
interactions were actively studied in a great many papers. We refer to [FV18] for
earlier, and to [B+21], [G+22], [PR20], [P+23] for more recent works.
Zero-free regions for systems with multiple spin interactions were considered by
Suzuki and Fisher [SF71], again in connection with phase transition, see also [L+19]
and [L+16] for recent developments. In this case, the particles are identified with
the vertices of a hypergraph, while functions attached to the edges (sometimes called
hyperedges) of the hypergraph describe interactions. In terms of Theorem 1.1, the
maximum number of vertices of an edge of the hypergraph is $r$, while $c$ is the largest
degree of a vertex. The papers [SF71] and [L+19], respectively [L+16], consider
rather special ferromagnetic, respectively antiferromagnetic, types of interaction,
so their results are not directly comparable to ours. While [SF71], [L+19] and
[L+16] say more about those specific models, our Theorem 1.1 allows one to handle
a wider class of interactions.
where $\|\cdot\|$ is the standard Euclidean norm in $\mathbb{R}^d$. Hence the total energy of the
system is
$$f\left(x^{(1)}, \ldots, x^{(N)}\right) = -\sum_{i \ne j} \left\|x^{(i)} - x^{(j)}\right\|,$$
which is minimized when the particles are far away from each other. However,
the probability of a configuration with large pairwise distances is small, so that
the Gaussian density can be interpreted as an external force pushing the particles
towards the origin. In terms of Theorem 1.2, we have $n = dN$, $m = \binom{N}{2}$, $r = 2d$
and $c = N$. Theorem 1.2 establishes a zero-free region of the order
$$|\lambda| = \Omega\left(\frac{1}{N\sqrt{d}}\right)$$
for the partition function E e−λf . This model is apparently related to the old
problem of finding a configuration of points on the unit sphere in Rd that maximizes
the sum of pairwise distances between points, see [B+23] (we note that for large $d$
the standard Gaussian measure in $\mathbb{R}^d$ is concentrated around the sphere $\|x\| = \sqrt{d}$).
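This particle model is simple enough to explore numerically. Below is a plain Monte Carlo sketch of the partition function $E\, e^{-\lambda f}$ (our own illustration; the parameter choices are arbitrary, and this is not an algorithm from the paper):

```python
import math
import random

# N particles in R^d are drawn from the standard Gaussian measure, and
# f = -(sum of pairwise Euclidean distances), so exp(-lambda*f) >= 1
# for every configuration when lambda > 0.

def energy(points):
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += math.dist(points[i], points[j])
    return -total

def partition_mc(N, d, lam, samples=20000, seed=0):
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        pts = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(N)]
        acc += math.exp(-lam * energy(pts))
    return acc / samples

# We stay well inside the zero-free scale |lambda| = Omega(1/(N*sqrt(d))).
z = partition_mc(N=4, d=3, lam=0.01)
```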
is known as the tensor network contraction, or the partition function of the edge
coloring model, or as a Holant polynomial, see [Re18] for relations between different
models, and references. For relations with spin systems in statistical physics and
also knot invariants, see [HJ93]. Assuming that $\psi_v(x) \ne 0$ for all $x$ and $v$, we can
write $\psi_v(x) = \exp\{\phi_v(x)\}$ and (1.5.1) can be written as the scaled expectation
$E\, e^{\lambda f}$ of Theorem 1.1, assuming the uniform probability measure in each space $X_j$.
In the context of Theorem 1.1, we have c = 2, while r is equal to the largest degree
of a vertex of G. Note that compared to the graph interpretation of the Ising and
Potts models from Section 1.3, the parameters r and c switch places.
An expression similar to (1.5.1) can be built for hypergraphs, in which case $r$ is
still the maximum degree of a vertex, while $c$ becomes the maximum size of an edge.
In combinatorics and computer science, there is a lot of interest in efficient
approximation of (1.5.1), also in connection with zero-free regions [GG16], [B+22],
[Ca+22], [G+21], [Re18], since many interesting counting problems on graphs and
hypergraphs can be stated as a problem of computing (1.5.1). To illustrate, we
consider just one example of perfect matchings in a hypergraph.
Let H = (V, E) be a hypergraph with set V of vertices and set E of edges. We
choose $X = \{1, 2\}^E$, so that every edge of $H$ is colored with one of two
colors, which we interpret as the edge being selected or not selected. We define
$\psi_v(x) = 1$ if exactly one edge containing $v$ is selected, and $\psi_v(x) = 0$ otherwise.
Then (1.5.1) is exactly the number of perfect matchings in $H$. Since deciding whether
a hypergraph contains a perfect matching is a well-known NP-complete problem,
there is little hope to approximate (1.5.1) efficiently. One can try to come up with
a more approachable version of the problem by modifying the definition of ψv so
that ψv (x) = 1 if exactly one edge containing v is selected, and ψv (x) = 1 − δ
otherwise, for some 0 < δ < 1. In this case, the sum (1.5.1), while taken over
all collections of edges of H, is “exponentially tilted” towards perfect matchings.
Thus every perfect matching is counted with weight 1, and an arbitrary collection
of edges is counted with a weight exponentially small in the number of vertices
where the perfect matching condition is violated. In other words, the weight of a
collection of edges is exponentially small in the number of vertices that belong to
any number of edges in the collection other than 1.
The larger $\delta$ we are able to choose, the more (1.5.1) is tilted towards perfect
matchings. It follows from Theorem 1.1 via the interpolation method that for
a fixed $r$, there is a quasi-polynomial algorithm approximating (1.5.1) for some
$\delta = \Omega\left(1/c\sqrt{r}\right)$, where $c$ is the maximum number of vertices in an edge of $H$ and
$r$ is the largest degree of a vertex. The results of [Ca+22] and [Re18] appear to allow
only for $\delta = \Omega(1/cr)$. We note that Theorem 1.1 allows us to choose
$$\psi_v = \exp\left\{-\Omega\left(\frac{|k-1|}{c\sqrt{r}}\right)\right\},$$
where $k$ is the number of selected edges containing $v$, so as to assign smaller weights
to the vertices $v$ with bigger violation of the perfect matching condition, up to the
smallest weight of $\psi_v = \exp\left\{-\Omega\left(\sqrt{r}/c\right)\right\}$, when all edges containing $v$ are selected.
We note also that Theorem 1.1 allows us to select edges with a non-uniform probability,
and hence to zoom in on the perfect matchings even more: if the hypergraph
$H$ is $r$-regular, that is, if each vertex is contained in exactly $r$ edges, it makes sense
to select each edge independently with probability $1/r$, so that for each vertex $v$
the expected number of selected edges containing $v$ is exactly 1. If $H$ is not regular,
one can choose instead the maximum entropy distribution, which also ensures
that the expected number of selected edges containing any given vertex is exactly
1. The maximum entropy distribution exists if and only if there exists a fractional
perfect matching, that is, an assignment of non-negative real weights to the edges
of $H$ such that for every vertex of $H$ the sum of weights of the edges containing it
is exactly 1, see [Ba23] for details.
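On a toy instance, the tilted sum (1.5.1) with these weights can be computed by brute force (the hypergraph below is our own small example, not from the paper):

```python
import itertools

# psi_v = 1 if exactly one selected edge contains v, and 1 - delta otherwise;
# delta = 1 recovers the hard perfect matching count, delta < 1 gives the
# "exponentially tilted" sum over all collections of edges.

def tilted_sum(vertices, edges, delta):
    total = 0.0
    for selection in itertools.product([0, 1], repeat=len(edges)):
        weight = 1.0
        for v in vertices:
            covered = sum(s for s, e in zip(selection, edges) if v in e)
            weight *= 1.0 if covered == 1 else 1.0 - delta
        total += weight
    return total

vertices = range(4)
edges = [{0, 1}, {2, 3}, {0, 2}, {1, 3}, {0, 3}]
exact = tilted_sum(vertices, edges, delta=1.0)   # = number of perfect matchings
soft = tilted_sum(vertices, edges, delta=0.5)    # tilted towards matchings
```

With $\delta = 1$ only the two perfect matchings $\{01, 23\}$ and $\{02, 13\}$ contribute; with $\delta < 1$ every edge collection contributes a weight exponentially small in the number of violated vertices.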
(1.6) Numerical integration in higher dimensions. Computationally efficient
numerical integration of multivariate functions is an old problem that is often asso-
ciated with “the curse of dimensionality”. The most spectacular success is achieved
for integration of log-concave functions on Rn via the Monte Carlo Markov Chain
method, see [LV07] for a survey. Deterministic methods achieved much less success.
In [GS24], the authors, using the deterministic decay of correlations approach, considered
integration of functions over the unit cube $[0,1]^n$. The model of [GS24] fits
the setup of our Theorem 1.1, if we choose $X_1 = \ldots = X_n = [0, 1]$ endowed with
the Lebesgue probability measure. The results of [GS24] appear to be weaker than
our Theorem 1.1, in the dependence on both parameters r and c, as well as in the
class of allowed functions φi .
Theorem 1.2 allows us to integrate efficiently some functions that are decidedly
not log-concave, for example if we choose φi (x) = |xi | for some i, thus reaching
outside the realm of functions efficiently integrated by randomized methods.
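As a quick illustration of this reach, take $f = |x_1| + \ldots + |x_n|$. By independence the expectation factors into one-dimensional integrals with the closed form $E\, e^{\lambda |X|} = 2 e^{\lambda^2/2}\Phi(\lambda)$ for a standard Gaussian $X$ (a standard Gaussian computation, ours rather than the paper's), which we can check against Monte Carlo:

```python
import math
import random

# f = |x_1| + ... + |x_n| is not log-concave, yet E e^{lam*f} under the
# standard Gaussian measure factors coordinate-wise into
# E e^{lam*|X|} = 2 * exp(lam^2/2) * Phi(lam), Phi = standard normal CDF.

def one_dim_exact(lam):
    Phi = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return 2.0 * math.exp(lam ** 2 / 2.0) * Phi(lam)

n, lam = 5, 0.2
exact = one_dim_exact(lam) ** n

rng = random.Random(0)
samples = 100000
mc = sum(
    math.exp(lam * sum(abs(rng.gauss(0.0, 1.0)) for _ in range(n)))
    for _ in range(samples)
) / samples
```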
(1.7) Other applications. Zero-free regions of partition functions of the type
covered by Theorem 1.1 turn out to be relevant to the decay of correlations [Ga23],
[Re23], to the mixing time of Markov Chains [Ch+22], to the validity of the Central
Limit Theorem for combinatorial structures [MS19], as well as to other related
algorithmic applications [J+22].
Proof. This is Lemma 3.3 from [BR19], see also Lemma 3.6.3 of [Ba16].
We will need some technical inequalities.
(2.2) Lemma. The following inequalities hold:
(1)
$$\left(\cos\frac{\alpha}{\sqrt{k}}\right)^{k} \ge \cos\alpha \quad \text{for} \quad 0 \le \alpha \le \frac{\pi}{2} \quad \text{and} \quad k \ge 1;$$
(2)
$$\sin(\tau\alpha) \ge \tau\sin\alpha \quad \text{for} \quad 0 \le \alpha \le \frac{\pi}{2} \quad \text{and} \quad 0 \le \tau \le 1;$$
(3)
$$|e^z - 1| \le 2|z| \quad \text{for} \quad z \in \mathbb{C} \quad \text{such that} \quad |z| \le 1.$$
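Before the proof, a quick numerical spot check of the three inequalities on a small grid of test values (illustrative only; it does not replace the proof):

```python
import cmath
import math

# (1): (cos(a/sqrt(k)))^k >= cos(a) on [0, pi/2] for k >= 1
for k in (1, 2, 5, 20):
    for a in (0.0, 0.5, 1.0, math.pi / 2):
        assert math.cos(a / math.sqrt(k)) ** k >= math.cos(a) - 1e-12

# (2): sin(tau*a) >= tau*sin(a) on [0, pi/2] for 0 <= tau <= 1
for tau in (0.0, 0.3, 1.0):
    for a in (0.0, 0.7, math.pi / 2):
        assert math.sin(tau * a) >= tau * math.sin(a) - 1e-12

# (3): |e^z - 1| <= 2|z| for complex |z| <= 1
for z in (0.3, 0.9, 1j, -0.5 + 0.2j):
    assert abs(cmath.exp(z) - 1) <= 2 * abs(z) + 1e-12
```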
Proof. The inequality of Part (1) is proved, for example, in [Ba23]. To prove the
inequality of Part (2), let
from which the proof follows. To prove the inequality of Part (3), we note that
$$|e^z - 1| = \left|\sum_{k=1}^{\infty} \frac{z^k}{k!}\right| \le \sum_{k=1}^{\infty} \frac{|z|^k}{k!},$$
and hence it suffices to check the inequality assuming that $z$ is non-negative real.
Since the function
$$\frac{e^z - 1}{z} = \sum_{k=0}^{\infty} \frac{z^k}{(k+1)!}$$
$$(2.3.1) \qquad \theta = \frac{\pi}{2\sqrt{r-1}}.$$
Once we have (2.3.2) for $S = \{1, \ldots, n\}$, we conclude that $E\, e^{\lambda f} \ne 0$, which is
what we want to prove.
Suppose that $|S| = 0$, so that $S = \emptyset$ and
$$E_S\, e^{\lambda f} = e^{\lambda f} = \exp\left\{\lambda \sum_{i=1}^{m} \phi_i\right\}.$$
Let $x', x'' \in X$ be two points that differ only in the coordinate $x_j$. Since at most $c$
of the functions $\phi_i$ depend on the coordinate $x_j$ and each function $\phi_i$ is 1-Lipschitz,
we have
$$|\lambda f(x') - \lambda f(x'')| = |\lambda|\,|f(x') - f(x'')| \le |\lambda| c \le \frac{1}{3\sqrt{r-1}} < \theta.$$
It follows that
$$E_S\, e^{\lambda f} \ne 0.$$
Assuming that $k + 1 < n$, let us pick some index not in $S$, without loss of generality
index $n$. We need to prove that as the coordinate $x_n$ changes from some value $x_n'$
to some value $x_n''$, while the other coordinates remain the same, the value of $E_S\, e^{\lambda f}$
rotates through an angle of at most $\theta$. Let
$$I = \{i:\ \phi_i \text{ depends on } x_n\}, \quad \text{so} \quad |I| \le c.$$
Without loss of generality, $I \ne \emptyset$. For each $i \in I$, we define two functions $\phi_i'$ and $\phi_i''$,
obtained from $\phi_i$ by fixing the coordinate $x_n$ to $x_n'$ and $x_n''$ respectively. Although
$\phi_i'$ and $\phi_i''$ are functions of the first $n-1$ coordinates $x_1, \ldots, x_{n-1}$ of $x \in X$, we
formally consider them as functions on $X$, by ignoring the last coordinate $x_n$.
Since each function $\phi_i$ is 1-Lipschitz in the Hamming metric, we have
We pass from (2.3.5) to (2.3.6) step by step, replacing one $\phi_i'$ by $\phi_i''$ at each step.
Our goal is to prove that at each step, the expectation rotates by at most $\theta/c$. Once
we prove that, it will follow that the expectation $E_S\, e^{\lambda f}$ rotates by at most $\theta$
when we replace the value $x_n = x_n'$ by $x_n = x_n''$.
Let us pick an index in $I$, without loss of generality index $m$. We define
$$f' = \phi_m' + \psi \quad \text{and} \quad f'' = \phi_m'' + \psi,$$
where $\psi$ is the sum of some functions $\phi_i'$, $\phi_i''$ and $\phi_i$: for each $i \ne m$ we select
exactly one of the three functions $\phi_i'$, $\phi_i''$ or $\phi_i$ into the sum for $\psi$. Hence our goal
is to show that the angle between $E_S\, e^{\lambda f'}$ and $E_S\, e^{\lambda f''}$ does not exceed $\theta/c$.
Since each of the functions $\phi_i'$ and $\phi_i''$ is obtained from $\phi_i$ by specifying some value
of $x_n$, we can apply the induction hypothesis both to $f'$ and to $f''$. Let $S_0 \subset S$
be the set of indices $j \in S$ such that $\phi_m'$ and $\phi_m''$ depend on $x_j$. In particular,
$|S_0| \le r - 1$.
If $S_0 = \emptyset$, then
$$E_S\, e^{\lambda f'} = e^{\lambda \phi_m'} E_S\, e^{\lambda\psi} \quad \text{and} \quad E_S\, e^{\lambda f''} = e^{\lambda \phi_m''} E_S\, e^{\lambda\psi}.$$
Using (2.3.4), we conclude that the angle between $E_S\, e^{\lambda f'} \ne 0$ and $E_S\, e^{\lambda f''} \ne 0$ does
not exceed
$$|\lambda \phi_m''(x) - \lambda \phi_m'(x)| \le |\lambda| \le \frac{1}{3c\sqrt{r-1}} < \frac{\theta}{c}.$$
Suppose now that $S_0 \ne \emptyset$. Since $\phi_m'$ and $\phi_m''$ do not depend on the coordinates
$x_j$ with $j \notin S_0$, from (2.3.7) we have
$$E_S\, e^{\lambda f''} = E_S\, e^{\lambda \phi_m''} e^{\lambda\psi} = E_S\, e^{\lambda\phi_m'' - \lambda\phi_m'} e^{\lambda\phi_m' + \lambda\psi} = E_S\, e^{\lambda\phi_m'' - \lambda\phi_m'} e^{\lambda f'} = E_{S_0}\left(e^{\lambda\phi_m'' - \lambda\phi_m'} E_{S \setminus S_0}\, e^{\lambda f'}\right)$$
and hence
$$E_S\, e^{\lambda f''} - E_S\, e^{\lambda f'} = E_{S_0}\left(e^{\lambda\phi_m'' - \lambda\phi_m'} E_{S\setminus S_0}\, e^{\lambda f'}\right) - E_{S_0}\left(E_{S\setminus S_0}\, e^{\lambda f'}\right) = E_{S_0}\left(\left(e^{\lambda\phi_m'' - \lambda\phi_m'} - 1\right) E_{S\setminus S_0}\, e^{\lambda f'}\right).$$
By Part (3) of Lemma 2.2,
$$\left|e^{\lambda\phi_m'' - \lambda\phi_m'} - 1\right| \le 2|\lambda|.$$
Therefore,
$$(2.3.8) \qquad \left|E_S\, e^{\lambda f''} - E_S\, e^{\lambda f'}\right| \le 2|\lambda|\, E_{S_0}\left|E_{S\setminus S_0}\, e^{\lambda f'}\right|.$$
Iterating (2.3.3) with $f$ replaced by $f'$, we obtain
$$(2.3.9) \qquad \left|E_S\, e^{\lambda f'}\right| = \left|E_{S_0} E_{S\setminus S_0}\, e^{\lambda f'}\right| \ge \left(\cos\frac{\theta}{2}\right)^{|S_0|} E_{S_0}\left|E_{S\setminus S_0}\, e^{\lambda f'}\right| \ge \left(\cos\frac{\theta}{2}\right)^{r-1} E_{S_0}\left|E_{S\setminus S_0}\, e^{\lambda f'}\right|.$$
Recalling the bound for $|\lambda|$, formula (2.3.1) for $\theta$ and using Part (1) of Lemma 2.2,
we obtain
$$\frac{\left|E_S\, e^{\lambda f''} - E_S\, e^{\lambda f'}\right|}{\left|E_S\, e^{\lambda f'}\right|} \le \frac{2}{3c\sqrt{r-1}}\left(\cos\frac{\pi}{4\sqrt{r-1}}\right)^{-(r-1)} \le \frac{2}{3c\sqrt{r-1}} \cdot \frac{1}{\cos(\pi/4)} = \frac{2\sqrt{2}}{3c\sqrt{r-1}}.$$
It follows now that the angle, call it $\alpha$, between $E_S\, e^{\lambda f'}$ and $E_S\, e^{\lambda f''}$ is acute and
that
$$\sin\alpha \le \frac{2\sqrt{2}}{3c\sqrt{r-1}}.$$
From (2.3.1) and Part (2) of Lemma 2.2, we have
$$\sin\frac{\theta}{c} = \sin\frac{\pi}{2c\sqrt{r-1}} \ge \frac{1}{c\sqrt{r-1}}\sin\frac{\pi}{2} = \frac{1}{c\sqrt{r-1}} \ge \sin\alpha.$$
0 00
Hence the angle between ES eλf and ES eλf indeed does not exceed θ/c and
ES eλf rotates by not more than an angle of θ when the value of one of the coor-
dinates xj with j ∈/ S changes, while the others remain the same. This completes
the induction step in proving Statement 2.3.2.
It remains to prove (1.1.2) assuming (1.1.1). Clearly, we have
and the upper bound in (1.1.2) follows. To prove the lower bound, iterating (2.3.3),
we get
$$\left|E\, e^{\lambda f}\right| \ge \left(\cos\frac{\theta}{2}\right)^{n} E\left|e^{\lambda f}\right| \ge e^{-|\lambda| m L}\left(\cos\frac{\pi}{4\sqrt{r-1}}\right)^{n},$$
as required.
(2.4) Remark. Let X1 , . . . , Xn and X = X1 × . . . × Xn be probability spaces and
let φ1 , . . . , φm : X −→ C be measurable functions. We assume that φi is 1-Lipschitz
for all i = 1, . . . , m and that each φi (x) depends on at most r ≥ 2 coordinates of
x = (x1 , . . . , xn ), x ∈ X. Suppose further that λ1 , . . . , λm are complex numbers
such that
$$\sum_{i:\ \phi_i \text{ depends on } x_j} |\lambda_i| \le \frac{1}{3\sqrt{r-1}} \quad \text{for} \quad j = 1, \ldots, n.$$
and
$$g_N = \sum_{i=1}^{m} \psi_i.$$
Since the functions $\phi_i$ are 1-Lipschitz in the $\ell^1$ metric of $\mathbb{R}^n$, the functions $\psi_i$ are
1-Lipschitz in the Hamming metric of $X$. Moreover, each function $\psi_i$ depends on
not more than $r_N = rN$ coordinates and at most $c$ functions $\psi_i$ depend on any
particular coordinate $x_{jk}$. Finally, from (1.2.1), we conclude that
$$|\psi_i(x)| \le \frac{\sqrt{N}}{2}\, L \quad \text{for} \quad i = 1, \ldots, m.$$
Therefore, from formula (1.1.2) of Theorem 1.1, we conclude that
$$\left|E\, e^{\lambda_N g_N}\right| \ge \exp\left\{-\frac{1}{2}|\lambda_N| m \sqrt{N} L\right\}\left(\cos\frac{\pi}{4\sqrt{r_N - 1}}\right)^{nN},$$
provided
$$|\lambda_N| \le \frac{1}{3c\sqrt{rN}}.$$
Therefore,
$$(3.1) \qquad \left|E \exp\left\{\lambda\frac{2}{\sqrt{N}}\, g_N\right\}\right| \ge \exp\left\{-|\lambda| m L\right\}\left(\cos\frac{\pi}{4\sqrt{rN - 1}}\right)^{nN},$$
provided
$$|\lambda| \le \frac{1}{6c\sqrt{r}}.$$
By the Central Limit Theorem, as N −→ ∞, the random vector
$$\left(\frac{x_{11} + \ldots + x_{1N}}{\sqrt{N}},\ \ldots,\ \frac{x_{j1} + \ldots + x_{jN}}{\sqrt{N}},\ \ldots,\ \frac{x_{n1} + \ldots + x_{nN}}{\sqrt{N}}\right)$$
where the expectation in the right hand side is taken with respect to the standard
Gaussian measure in Rn , see, for example, Section 7.2 of [GS20]. Since
$$\lim_{N \to \infty}\left(\cos\frac{\pi}{4\sqrt{rN - 1}}\right)^{nN} = \lim_{N \to \infty}\left(1 - \frac{\pi^2}{32(rN - 1)}\right)^{nN} = \exp\left\{-\frac{\pi^2 n}{32 r}\right\},$$
from (3.1) we obtain the lower bound in the inequality (1.2.2). The upper bound
in (1.2.2) is trivial.
It remains to consider the general case of not necessarily bounded functions φi .
Shifting, if necessary,
$$\phi_i := \phi_i - \phi_i(0) \quad \text{for} \quad i = 1, \ldots, m,$$
we assume that
$$\phi_i(0) = 0 \quad \text{for} \quad i = 1, \ldots, m.$$
Then
$$(3.2) \qquad |\phi_i(x_1, \ldots, x_n)| \le \sum_{j=1}^{n} |x_j| \quad \text{for} \quad i = 1, \ldots, m.$$
$$\phi_{i,L}(x) = \begin{cases} \phi_i(x) & \text{if } |\phi_i(x)| \le L, \\ L\phi_i(x)/|\phi_i(x)| & \text{if } |\phi_i(x)| > L. \end{cases}$$
Then
$$\left|\phi_{i,L}(x) - \phi_{i,L}(y)\right| \le \left|\phi_i(x) - \phi_i(y)\right|,$$
so the $\phi_{i,L}$ satisfy the conditions of Theorem 1.2, and are bounded. Hence for
$$f_L = \sum_{i=1}^{m} \phi_{i,L}$$
we have
$$E\, e^{\lambda f_L} \ne 0 \quad \text{provided} \quad |\lambda| \le \frac{1}{6c\sqrt{r-1}}.$$
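The Lipschitz property of the truncation used above is just the fact that $w \mapsto Lw/|w|$ (for $|w| > L$) is the nearest-point projection of the plane onto the disc $|w| \le L$, hence a contraction. A randomized spot check (illustrative only):

```python
import random

# trunc is the metric projection of C onto the disc of radius L; projections
# onto convex sets are 1-Lipschitz, which we verify on random pairs.

def trunc(w, L):
    return w if abs(w) <= L else L * w / abs(w)

rng = random.Random(1)
L = 1.0
ok = all(
    abs(trunc(w1, L) - trunc(w2, L)) <= abs(w1 - w2) + 1e-12
    for w1, w2 in (
        (
            complex(rng.uniform(-3, 3), rng.uniform(-3, 3)),
            complex(rng.uniform(-3, 3), rng.uniform(-3, 3)),
        )
        for _ in range(1000)
    )
)
```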
From (3.2) it follows that
and that the convergence is uniform on any compact set of $\lambda$ in $\mathbb{C}$. Then by the
Hurwitz Theorem, see, for example, Section 1 of [Kr01], we have two options: either
$E\, e^{\lambda f} = 0$ for all $\lambda \in \mathbb{C}$ or $E\, e^{\lambda f} \ne 0$ for all $\lambda$ in the open disc $|\lambda| < 1/\left(6c\sqrt{r-1}\right)$.
Since for $\lambda = 0$ we clearly have $E\, e^{\lambda f} = 1$, the first option is not realized.
4. Optimality
Our goal is to show that the bound
$$|\lambda| \le \frac{1}{3c\sqrt{r-1}}$$
Let us define $\psi: \mathbb{R} \longrightarrow \mathbb{C}$ by
$$\psi(x) = \begin{cases} vx & \text{if } x \ge 0, \\ ux & \text{if } x < 0. \end{cases}$$
Then
$$(4.1) \qquad \int_{-\infty}^{+\infty} e^{\psi(x)} e^{-x^2/2}\, dx = 0.$$
$$\psi_L(x) = \begin{cases} \psi(x) & \text{if } |\psi(x)| \le L, \\ L\psi(x)/|\psi(x)| & \text{if } |\psi(x)| > L. \end{cases}$$
Then
and from (4.2) it follows that the convergence is uniform in $z$ on all compact sets
in $\mathbb{C}$. Since the right hand side of (4.3) for $z = 0$ is equal to 1, while for $z = 1$ it is
equal to 0 by (4.1), by the Hurwitz Theorem, see, for example, Section 1 of [Kr01],
we conclude that for all sufficiently large $L$ we must have
$$(4.4) \qquad \int_{-\infty}^{+\infty} e^{z\psi_L(x)} e^{-x^2/2}\, dx = 0 \quad \text{for some } z \text{ with } |z| < \rho.$$
Let us pick some $L$ such that (4.4) holds. For an integer $n$, we consider $n$
probability spaces $X_1 = \ldots = X_n = \{-1, 1\}$ and their product $X = \{-1, 1\}^n$, all
endowed with the uniform probability measure. We define $\phi: X \longrightarrow \mathbb{C}$ by
$$(4.5) \qquad \phi(x) = \frac{\sqrt{n}}{2\tau}\, \psi_L\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i\right) \quad \text{where} \quad x = (x_1, \ldots, x_n).$$
It follows from (4.2) that $\phi$ is 1-Lipschitz in the Hamming metric of $\{-1, 1\}^n$. Hence
$\phi$ satisfies the conditions of Theorem 1.1 with $c = 1$ and $r = n$.
Since by the Central Limit Theorem the normalized sum
$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i$$
where $\phi$ is defined by (4.5). While the parameter $r$ remains the same for all $f_k$
defined by (4.7), the parameter $c$ changes: $c = k$. Furthermore,
$$E\, e^{\lambda f_k} = E\, e^{(k\lambda) f},$$
and hence the zero-free region for $\lambda$ should scale as
$$|\lambda| = O\left(\frac{1}{k}\right) = O\left(\frac{1}{c}\right).$$
Question: computational complexity. It would be interesting to find out
whether the dependence on $r$ is optimal from the computational complexity point
of view, that is, whether for real-valued $f$ and $|\lambda| \gg 1/\left(c\sqrt{r}\right)$, the approximation of
$E\, e^{\lambda f}$ becomes computationally difficult, possibly conditioned on P $\ne$ NP or some
other commonly believed hypothesis. The argument with the "cloning" of $f$ as in (4.7)
shows that the dependence on $c$ is indeed optimal.
5. Approximations
Here we sketch how Theorems 1.1, respectively Theorem 1.2, allow us to approx-
imate E eλf provided
$$(5.1) \qquad |\lambda| \le \frac{1-\delta}{3c\sqrt{r-1}}, \quad \text{respectively,} \quad |\lambda| \le \frac{1-\delta}{6c\sqrt{r-1}},$$
for some 0 < δ < 1, fixed in advance. The approach was used many times before,
in particular in [Ba17] in the context closest to ours.
The algorithm is based on the following result.
(5.2) Lemma. Let $p(z)$ be a univariate polynomial of degree $N$ in a complex variable
$z$. Suppose that for some $\beta > 1$, we have $p(z) \ne 0$ for all $z$ satisfying $|z| < \beta$,
and let us choose a branch of $g(z) = \ln p(z)$ in the disc $|z| < \beta$. For an integer
$k \ge 1$, let $T_k(z)$ be the Taylor polynomial of $g(z)$ of degree $k$ computed at $z = 0$, that
is,
$$T_k(z) = g(0) + \sum_{s=1}^{k} \frac{g^{(s)}(0)}{s!}\, z^s.$$
Then
$$|g(z) - T_k(z)| \le \frac{N}{(k+1)(\beta - 1)\beta^k} \quad \text{for all} \quad |z| \le 1.$$
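The bound of Lemma 5.2 can be illustrated numerically on a small polynomial with all roots outside the disc $|z| < \beta$ (the roots below are our own choice; this is a sanity check, not a proof):

```python
import cmath

# p is normalized so p(0) = 1; then g(z) = ln p(z) = sum_w ln(1 - z/w) has
# explicit Taylor coefficients g^{(s)}(0)/s! = -(sum_w w^{-s})/s, and we
# compare the truncation error on sample points of |z| <= 1 with the bound.

beta = 2.0
roots = [2.5, -3.0, 2.0 + 1.0j]        # all of modulus >= beta
N = len(roots)

def p(z):
    out = 1.0 + 0.0j
    for w in roots:
        out *= 1.0 - z / w
    return out

def taylor(z, k):
    return sum(-sum(w ** (-s) for w in roots) / s * z ** s
               for s in range(1, k + 1))

k = 8
bound = N / ((k + 1) * (beta - 1.0) * beta ** k)
worst = max(abs(cmath.log(p(z)) - taylor(z, k))
            for z in (1.0, -1.0, 1j, -1j, 0.5 + 0.5j, cmath.exp(2.3j)))
```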
While (5.4) is not a polynomial, we can approximate it closely enough by its Taylor
polynomial $p_N(z)$ and then use Lemma 5.2 to approximate $\ln p_N(z)$ by a polynomial
of low degree. Using (5.3) and the bounds (1.1.2) and (1.2.2), for a given
$0 < \epsilon < 1$, one can compute
$$N = \left((m+n)\ln\frac{1}{\epsilon}\right)^{O(1)},$$
such that the Taylor polynomial of (5.4),
$$p_N(z) = \sum_{s=0}^{N} \frac{\lambda^s z^s}{s!}\, E\, f^s = \sum_{s=0}^{N} \frac{\lambda^s z^s}{s!}\, E\left(\sum_{i=1}^{m} \phi_i\right)^{s},$$
satisfies
$$p_N(z) \ne 0 \quad \text{provided} \quad |z| \le \frac{1}{1-\delta},$$
and $p_N(1)$ approximates $E\, e^{\lambda f}$ within a relative error of $\epsilon/3$ (in the context of Theorem 1.2,
we replace $\phi_i$ by their appropriate truncations), see [Ba17] for estimates.
Using Lemma 5.2 with $\beta = (1-\delta)^{-1}$, we further approximate $p_N(1)$ within a
relative error of $\epsilon/3$ using only the first $k = O_\delta\left(\ln(m+n) - \ln\epsilon\right)$ derivatives,
$$p_N^{(s)}(0) = \lambda^s\, E\left(\sum_{i=1}^{m} \phi_i\right)^{s} \quad \text{for} \quad s = 1, \ldots, k,$$
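The whole pipeline — moments of $f$, Taylor coefficients of $p_N$, Taylor coefficients of $\ln p_N$, evaluation at $z = 1$ — can be run end to end on a toy finite example (our own; the power-series recursion for the logarithm below is the standard one, not specific to the paper):

```python
import itertools
import math

# X = {-1,1}^3 uniform, f = x1*x2 + x2*x3 (so m = 2, r = 2, c = 2).
# The coefficients of g = ln p_N satisfy the standard recursion
# s*a_s = sum_{j=1}^{s} j*g_j*a_{s-j}, with a_0 = 1 and g_0 = 0.

points = list(itertools.product([-1, 1], repeat=3))
f_vals = [x[0] * x[1] + x[1] * x[2] for x in points]
lam, k = 0.1, 12

# a[s] = lambda^s * E f^s / s!  (Taylor coefficients of p_N at z = 0)
a = [sum(v ** s for v in f_vals) / len(points) * lam ** s / math.factorial(s)
     for s in range(k + 1)]

g = [0.0] * (k + 1)                    # Taylor coefficients of ln p_N
for s in range(1, k + 1):
    g[s] = (s * a[s] - sum(j * g[j] * a[s - j] for j in range(1, s))) / (s * a[0])

approx = math.exp(sum(g))              # exp of the Taylor polynomial at z = 1
exact = sum(math.exp(lam * v) for v in f_vals) / len(points)
```

Here $\lambda$ is well inside the zero-free region, so the Taylor series of $\ln p_N$ converges quickly and the truncated series already matches the exact expectation to high accuracy.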