Estimating Graph Counts From Signatures: A Maximum Entropy Approach


Henri Lhomond
November 1, 2023

Abstract
This paper presents a maximum entropy model to estimate graph counts
given geodesic ball signatures specifying node reachability. We derive
probability distributions based on entropy maximization subject to
signature constraints. Solving for Lagrange multipliers determines the
maximum entropy distributions, with partition functions estimating valid
graph counts. Computational complexity and signature uncertainty are
analyzed. Numerical simulations demonstrate the approach, indicating
limitations and potential extensions.

1 Introduction
We examine the problem of counting graphs consistent with reachability
signatures. Let G be the set of possible graphs on N nodes. A signature
specifies R(d), the number of nodes reachable at distance d from a node.
We seek the number of valid graphs |G∗|, those whose reachability R∗(d)
equals R(d) for every d.
Our approach models graph generation as a maximum entropy process that
reproduces the signature in expectation. The entropy H measures state
multiplicity. Maximizing H subject to ⟨R∗⟩ = R gives probabilistic graph
counts via the partition function.
We derive the Lagrange conditions for the signature constraints and
analyze computational complexity. Experiments highlight successes and
limitations. Possible extensions incorporating uncertainty are discussed.

2 Maximum Entropy Model
Let p(g) be the probability of generating graph g ∈ G. The entropy is:

$$H = -\sum_{g \in G} p(g) \log p(g) \tag{1}$$

We constrain expected reachability to match the signature:

$$\langle R^* \rangle = \sum_{g \in G} p(g)\, R^*(g) = R \tag{2}$$

where R∗(g) is the vector of reachability counts for graph g.
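As an illustration of R∗(g), the following minimal Python sketch computes a reachability vector by breadth-first search. It assumes that R(d) counts the nodes at exactly distance d from a single source node; the adjacency-dictionary representation and the name `reachability_signature` are illustrative choices, not part of the model above.

```python
from collections import deque

def reachability_signature(adj, source, max_d):
    """Return [R(1), ..., R(max_d)]: node counts at exact BFS distance d."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * max_d
    for d in dist.values():
        if 1 <= d <= max_d:
            counts[d - 1] += 1
    return counts

# Toy graph (hypothetical): path 0-1-2-3 plus the extra edge 0-4.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
signature = reachability_signature(adj, source=0, max_d=3)
print(signature)
```

Nodes that cannot be reached from the source never enter `dist`, so they contribute to no entry of the vector.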


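Maximizing entropy subject to these constraints yields an equilibrium
distribution. We introduce a vector of Lagrange multipliers λ for the
signature constraints and a scalar multiplier μ for the normalization
of p(g):

$$\mathcal{L} = H - \lambda^{T} \left( \langle R^* \rangle - R \right) - \mu \Big( \sum_{g \in G} p(g) - 1 \Big) \tag{3}$$
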
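Taking derivatives with respect to p(g) and λ gives:

$$p(g) = \frac{1}{Z(\lambda)}\, e^{-\lambda^{T} R^*(g)} \tag{4}$$

$$R = -\nabla_{\lambda} \log Z(\lambda) \tag{5}$$

where $Z(\lambda) = \sum_{g \in G} e^{-\lambda^{T} R^*(g)}$ is the partition function.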
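The intermediate steps behind Eqs. 4 and 5 can be sketched as follows. This is a reconstruction under stated assumptions: the Lagrangian is written with the sign convention that reproduces Eq. 4, and the normalization multiplier \mu is an added symbol that enforces \sum_g p(g) = 1.

```latex
% Assumed Lagrangian:
%   L = H - \lambda^T (<R^*> - R) - \mu (\sum_g p(g) - 1)
\begin{align*}
\frac{\partial \mathcal{L}}{\partial p(g)}
  &= -\log p(g) - 1 - \lambda^{T} R^*(g) - \mu = 0
  \;\Longrightarrow\;
  p(g) = e^{-1-\mu}\, e^{-\lambda^{T} R^*(g)}, \\
\sum_{g \in G} p(g) = 1
  &\;\Longrightarrow\;
  e^{1+\mu} = \sum_{g \in G} e^{-\lambda^{T} R^*(g)} = Z(\lambda), \\
\nabla_{\lambda} \log Z(\lambda)
  &= -\frac{1}{Z(\lambda)} \sum_{g \in G} R^*(g)\, e^{-\lambda^{T} R^*(g)}
  = -\langle R^* \rangle = -R .
\end{align*}
```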
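Given a signature R, we solve Eq. 5 for λ∗ and evaluate Z(λ∗) to
approximate |G∗|, the graph count:

$$|G^*| \approx e^{H} \approx Z(\lambda^*) \tag{6}$$

For the distribution in Eq. 4, H = log Z(λ∗) + λ∗ᵀR, so e^H and Z(λ∗)
differ by the factor e^{λ∗ᵀR} and coincide only when that factor is
close to one.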
Low entropy indicates constraints dominate, limiting valid graphs. Next
we analyze computational complexity.

3 Computational Considerations
Counting graph configurations exactly is #P-complete [1]. Our approach
approximates the count via sampling.
Finding λ∗ involves a convex optimization whose dimension equals the
signature length. It can be solved efficiently with interior point
methods or gradient descent.
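The optimization for λ∗ can be illustrated on a toy case small enough to enumerate. The sketch below is a hedged example, not the procedure used in the paper: it assumes N = 4 labelled nodes, reachability counted at exact distances d = 1..3 from node 0, and plain gradient descent on the convex dual log Z(λ) + λᵀR, whose gradient R − ⟨R∗⟩ vanishes exactly when Eq. 5 holds. The target signature is generated from a hypothetical `lam_true` so that a solution is guaranteed to exist.

```python
import itertools
import math
from collections import deque

N = 4        # toy size so that all graphs can be enumerated (assumption)
MAX_D = 3    # signature length
EDGES = list(itertools.combinations(range(N), 2))

def shell_counts(mask):
    """R*(g): number of nodes at exact distance d = 1..MAX_D from node 0."""
    adj = {u: [] for u in range(N)}
    for (u, v), keep in zip(EDGES, mask):
        if keep:
            adj[u].append(v)
            adj[v].append(u)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [sum(1 for d in dist.values() if d == k) for k in range(1, MAX_D + 1)]

# Feature vector R*(g) for each of the 2^6 = 64 labelled graphs.
FEATURES = [shell_counts(mask)
            for mask in itertools.product([0, 1], repeat=len(EDGES))]

def log_Z_and_mean(lam):
    """Return log Z(lam) and <R*> under p(g) = exp(-lam . R*(g)) / Z."""
    weights = [math.exp(-sum(l * r for l, r in zip(lam, f))) for f in FEATURES]
    Z = sum(weights)
    mean = [sum(w * f[k] for w, f in zip(weights, FEATURES)) / Z
            for k in range(MAX_D)]
    return math.log(Z), mean

def solve_lambda(target, step=0.2, iters=5000):
    """Gradient descent on the convex dual log Z(lam) + lam . target."""
    lam = [0.0] * MAX_D
    for _ in range(iters):
        _, mean = log_Z_and_mean(lam)
        # Dual gradient is target - <R*>; it is zero where Eq. 5 is satisfied.
        lam = [l - step * (t - m) for l, t, m in zip(lam, target, mean)]
    return lam

lam_true = [0.5, -0.3, 0.2]            # hypothetical multipliers
_, target = log_Z_and_mean(lam_true)   # signature they reproduce in expectation
lam_star = solve_lambda(target)
log_Z_star, fitted = log_Z_and_mean(lam_star)
print(lam_star, math.exp(log_Z_star))
```

Exact enumeration is only feasible for very small N; for larger graphs the expectation inside `log_Z_and_mean` would itself have to be estimated by sampling.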

Evaluating Z(λ) for a given λ requires summing over the exponentially
large G. We estimate Z using Monte Carlo integration, drawing M samples
g_1, ..., g_M uniformly from G:

$$Z \approx \frac{|G|}{M} \sum_{i=1}^{M} e^{-\lambda^{T} R^*(g_i)} \tag{7}$$

Choosing sufficiently large M gives a close approximation, with errors
decreasing as O(M^{-1/2}).
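The estimator in Eq. 7 can be checked against exact enumeration on a toy case. The following sketch is illustrative only: it assumes N = 4 labelled nodes, uniform sampling of edge subsets, reachability counted at exact distances d = 1..3 from node 0, and a hypothetical multiplier vector `lam`.

```python
import itertools
import math
import random
from collections import deque

N, MAX_D = 4, 3                      # toy size (assumption)
EDGES = list(itertools.combinations(range(N), 2))

def shell_counts(mask):
    """R*(g): number of nodes at exact distance d = 1..MAX_D from node 0."""
    adj = {u: [] for u in range(N)}
    for (u, v), keep in zip(EDGES, mask):
        if keep:
            adj[u].append(v)
            adj[v].append(u)
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return [sum(1 for d in dist.values() if d == k) for k in range(1, MAX_D + 1)]

def weight(mask, lam):
    """Boltzmann weight exp(-lam . R*(g)) of one graph."""
    return math.exp(-sum(l * r for l, r in zip(lam, shell_counts(mask))))

lam = [0.5, -0.3, 0.2]               # hypothetical multipliers
num_graphs = 2 ** len(EDGES)         # |G| = 64 labelled graphs

# Exact partition function by full enumeration (only possible for tiny N).
exact_Z = sum(weight(mask, lam)
              for mask in itertools.product([0, 1], repeat=len(EDGES)))

# Monte Carlo estimate of Eq. 7 with M uniform samples from G.
rng = random.Random(0)
M = 20000
total = 0.0
for _ in range(M):
    mask = [rng.randint(0, 1) for _ in EDGES]   # each edge present w.p. 1/2
    total += weight(mask, lam)
Z_hat = num_graphs * total / M

print(exact_Z, Z_hat)
```

Sampling each edge independently with probability 1/2 is uniform over labelled graphs, which is what the |G|/M prefactor in Eq. 7 requires.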
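Verifying graph validity remains challenging. Reachability in a single
given graph can be computed in polynomial time by breadth-first search,
but repeating the check over very large sample sets is costly, and
related combinatorial decision problems are NP-complete in general [2].
However, our signatures specify only average reachability. We assume that
connectivity distances in small-world graphs are approximately normally
distributed, motivated by the small-world experiments of [3], which
allows approximate verification.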
In summary, computational hardness arises at multiple stages but can be
mitigated through estimation and approximation techniques. Greater
efficiency would facilitate larger graph sizes. Next we demonstrate the
approach on a numerical example.

4 Numerical Example
We estimate the number of graphs on N = 100 nodes with signature
(50, 25, 15, 10, 8, 5) for d = 1 to d = 6.
Solving Eq. 5 gives λ∗ = (0.021, 0.0068, 0.0047, 0.0042, 0.004, 0.0039).
Sampling 10^6 graphs yields Z(λ∗) ≈ 2000, so we estimate approximately
2000 valid graphs.
The distribution of sampled graph reachabilities is concentrated around
the signature, confirming consistency.
However, testing graph validity exactly is prohibitive. Approximate tests
show that 12% of samples violate the signature, suggesting that Z
overestimates the count. The low entropy H∗ = 7.6 also indicates that the
constraints significantly limit graph possibilities, well below typical
random graph entropy.
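The reported figures can be put side by side with a short calculation. This sketch assumes that entropy is measured in nats and that the unconstrained reference is the uniform distribution over all labelled simple graphs on N = 100 nodes; both are assumptions, since the text does not state them.

```python
import math

H_star = 7.6                           # entropy reported in the example (nats assumed)
implied_count = math.exp(H_star)       # e^H, to compare with Z ≈ 2000

N = 100
num_pairs = N * (N - 1) // 2           # 4950 possible edges
H_uniform = num_pairs * math.log(2)    # entropy of the uniform distribution over 2^4950 graphs

print(round(implied_count), round(H_uniform, 1))
```

Under these assumptions e^{7.6} is roughly 1998, in line with the reported Z ≈ 2000, while the unconstrained entropy is on the order of 3400 nats.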
Incorporating uncertainty in R could improve estimates. We discuss
extensions next.

5 Discussion
This work demonstrates the use of maximum entropy models to estimate
graph counts from signatures. Key limitations are computational hardness
and signature violation rates.
Possible extensions include:

• Flexible probability models handling signature uncertainty [4]

• Efficient approximate graph validation methods

• Alternative entropy representations e.g. Bethe approximation [5]

• Variational or message passing algorithms to estimate Z [6]

• Neural network models learning graph likelihoods [7]

The framework connects statistical physics, network science and graph
theory. Significant development could enable useful tools for large graph
analysis and sampling.

References
[1] Valiant, L. G. (1979). The complexity of enumeration and reliability
problems. SIAM Journal on Computing, 8(3), 410-421.
[2] Cook, S. A. (1971). The complexity of theorem-proving procedures.
Proceedings of the third annual ACM symposium on Theory of computing,
151-158.
[3] Travers, J., Milgram, S. (1969). An experimental study of the small
world problem. Sociometry, 425-443.
[4] Presse, S., Ghosh, K., Lee, J., Dill, K. A. (2013). Principles of
maximum entropy and maximum caliber in statistical physics. Reviews of
Modern Physics, 85(3), 1115.
[5] Yedidia, J. S., Freeman, W. T., Weiss, Y. (2005). Constructing free-
energy approximations and generalized belief propagation algorithms. IEEE
Transactions on information theory, 51(7), 2282-2312.
[6] Wainwright, M. J., Jordan, M. I. (2008). Graphical models, exponen-
tial families, and variational inference. Foundations and Trends in Machine
Learning, 1(1–2), 1-305.
[7] You, J., Ying, R., Leskovec, J. (2019). Position-aware graph neural
networks. International Conference on Machine Learning, 7134-7143.
PMLR.
