Open Problems
STEFAN STEINERBERGER
Abstract. This list contains some open problems that I came across and
that are not well known (no Riemann hypothesis...). Some are extremely
difficult, others might be doable and some might be easy (one of the problems
is that I do not know which is which). The presentation is pretty casual,
the relevant papers/references usually have more details – if you have any
questions, comments, remarks or additional references, please email me!
Contents
Part 1. Combinatorics
1. The Motzkin-Schmidt problem
2. Great Circles on S2
3. Strange Patterns in Ulam's Sequence
4. Topological Structures in Irrational Rotations on the Torus
5. Graphical Designs
6. How big is the boundary of a graph?
7. The Constant in the Komlos Conjecture
8. The Inverse of the Star Discrepancy
9. Erdős Distinct Subset Sums Problem
10. Finding Short Paths in Graphs with Spectral Graph Theory
11. A Curve through the ℓ1-norm of eigenvectors of Erdős-Rényi graphs
12. Matching Oscillations in High-Frequency Eigenvectors
13. Balanced Measures are small?
14. Different areas of triangles induced by points on S1
Part 2. Analysis
15. Sums of distances between points on the sphere
16. A Directional Poincaré Inequality: flows of vector fields
17. Auto-Convolution Inequalities and additive combinatorics
18. Maxwell's Conjecture on Point Charges
19. Quadratic Crofton and sets that barely see themselves
20. Opaque Sets
21. A Greedy Energy Sequence on the Unit Interval
22. The Kritzinger sequence
23. A Special Property that some Lattices have?
24. Roots of Classical Orthogonal Polynomials
25. An Estimate for Probability Distributions
Last update: June 10, 2024. Partially supported by the NSF (DMS-1763179, DMS-2123224)
and the Alfred P. Sloan Foundation.
Part 7. Miscellaneous/Recreational
61. Geodesics on compact manifolds
62. The Traveling Salesman Constant
63. Number of Positions in Chess
Part 1. Combinatorics
1. The Motzkin-Schmidt problem
Let x1, ..., xn be n points in [0, 1]^2. The goal is to find a line ℓ such that the ε−neighborhood of ℓ contains at least 3 points. How small can one choose ε (depending on n) to ensure that this is always possible? A simple pigeonholing argument shows that ε ≤ 3/n always works. As far as I know, this trivial bound has never been improved. The question, due to T. Motzkin and W. M. Schmidt (independently), is whether ε = o(1/n) is possible. J. Beck has established this result assuming that the points are somewhat evenly distributed in the square (J. Beck, Almost collinear triples among N points on the plane). I would also be interested in what happens when the points lie on S2 and one wants to capture at least 3 using neighborhoods of great circles.
There is a harder version of the question (in the sense that if one could show that
no such constant exists, then this implies Motzkin-Schmidt).
Jean was very generous with problems. [...] He told me the following problem which, to the best of my knowledge, is still open. Is it possible to find n points in the unit square such that the 1/n−neighborhood of any line contains no more than C of them for some absolute constant C? The motivation for this problem comes
containing at least 3 points. We note that this bound is always ≤ c/n but not
necessarily better than that if the points are nicely distributed in the square.
2. Great Circles on S2
Let C1, ..., Cn denote the 1/n−neighborhoods of n great circles on S2. Here's
a natural question: how much do they have to overlap? I proved (Discrete &
Computational Geometry, 2018) that
$$\sum_{\substack{i,j=1 \\ i \neq j}}^{n} |C_i \cap C_j|^{s} \gtrsim_s \begin{cases} n^{2-2s} & \text{if } 0 \le s < 2 \\ n^{-2}\log n & \text{if } s = 2 \\ n^{1-3s/2} & \text{if } s > 2 \end{cases}$$
and these bounds are sharp.
If we consider the δ−neighborhoods C1,δ, C2,δ, ..., Cn,δ of n fixed great circles where no two great circles coincide and if p1, ..., pn denote one of their 'poles', then

$$\lim_{\delta \to 0} \frac{1}{\delta^{2s}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} |C_{i,\delta} \cap C_{j,\delta}|^{s} = \sum_{\substack{i,j=1 \\ i \neq j}}^{n} \frac{1}{(1 - \langle p_i, p_j \rangle^{2})^{s/2}}.$$

This limiting energy has been investigated by Chen, Hardin, Saff ('On the search for tight frames...', arXiv 2020) who show that minimizing configurations are well separated.
The same seems to be true if one starts with initial values different from 1, 2. For some choices, the arising sequence seems to be a union of arithmetic progressions; whenever that is not the case, it seems to be 'chaotic' in the same sense: there exists a constant α (depending on the initial values) such that $\alpha a_n \bmod 2\pi$ has a strange limiting distribution. There are now several papers showing that this strange type of phenomenon seems to persist even in other settings. I would like to understand what this sequence does – even the most basic things are not known: the sequence is known to be infinite since $a_n + a_{n-1}$ can be uniquely written as the sum of two earlier terms and thus

$$a_{n+1} \le a_n + a_{n-1}.$$
This also shows that the sequence grows at most exponentially and that is the best
bound I am aware of. Empirically, the sequence has density ∼ 7% and it seems like
an ≤ 14n for all n sufficiently large.
Update (May 2022). Rodrigo Angelo (‘A hidden signal in Hofstadter’s H se-
quence’) discovered another example of a sequence with this property and gives
rigorous proofs for this example.
Simultaneously, if one takes the standard van der Corput sequence in base 2, one
seems to end up with a really nice manifold with some strange holes where things
are glued together in a fun way.
5. Graphical Designs
This problem is somewhere between PDEs and Combinatorics. Let G = (V, E) be
a finite, undirected, simple graph. Then we can define functions on the Graph as
mappings f : V → R. We can also define a Laplacian on the graph; this is simply a map that sends functions to other functions. One possible choice is

$$(Lf)(v) = f(v) - \frac{1}{\deg(v)} \sum_{w \sim v} f(w).$$
There are a couple of different definitions of the Laplacian and I don’t know what’s
the best choice for this problem. However, these different definitions of a Laplacian
all agree on d−regular graphs and the phenomenon at hand is already interesting
for d−regular graphs. Once one has a Laplacian matrix, one has eigenvectors
and eigenvalues. We will interpret these eigenvectors again as functions on the
graph. What one observes is that for many interesting graphs, there are subsets
of the vertices W ⊂ V such that for many different eigenvectors ϕk of the Graph Laplacian

$$\sum_{v \in W} \phi_k(v) = 0.$$
What is interesting is that such sets seem to inherit a lot of rich structure. I
proved (Journal of Graph Theory) that if there are large subsets W such that the equation holds for many different eigenvectors ϕk, then the Graph has to be 'non-Euclidean' in the sense that the volume of balls grows quite quickly (exponentially, depending on the precise parameters). This shows that we expect graphs with nice
structure like this to be more like an expander than, say, a path graph. Konstantin
Golubev showed (Lin. Alg. Appl.) that this framework naturally encodes some classical results from combinatorics: both the Erdős-Ko-Rado theorem and the Deza-Frankl theorem can be stated as being special types of these 'Graphical
Designs’. There are many other connections: (1) for certain types of graphs, this
seems to be related to results from coding theory and (2) such points would also
be very good points when one tries to sample an unknown function in just a few
vertices; this is because the definition can be (3) regarded as a Graph equivalent of
the classical notion of ‘spherical design’ on Sd−1 . In fact, seeing as designs can be
regarded as an algebraic definition that gives rise to Platonic bodies in R3 , I like to
think of these ‘graphical designs’ as the analogue of ‘Platonic bodies in a graph’.
There are a great many questions:
(1) when do such sets exist?
(2) are there nice extremal examples?
(3) how does one find them quickly?
(4) how can one prove non-existence?
The theory of spherical designs on Sd−1 is quite rich and full of intricate problems so I would assume the Graph analogue to be at least as difficult and probably
more difficult. But there are many more graphs than there are spheres (only one
per dimension: Sd ), so there should be many more interesting examples that may
themselves be tied to interesting algebraic-combinatorial structures.
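As a concrete toy case, one can check numerically that the cycle graph C8 admits such a set: the alternating vertices average every eigenvector to zero except the constant one and the highest-frequency one. A minimal sketch (the graph and the set W are my choice for illustration, not an example from the papers):

```python
import numpy as np

# Cycle graph C_8 is 2-regular, so the normalized Laplacian is L = I - A/2.
n = 8
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[v, (v - 1) % n] = 1
L = np.eye(n) - A / 2            # symmetric since the graph is regular
eigvals, eigvecs = np.linalg.eigh(L)

# Candidate set W: every other vertex.
W = [0, 2, 4, 6]
sums = np.abs(eigvecs[W, :].sum(axis=0))   # |sum_{v in W} phi_k(v)| for each k
print(int((sums < 1e-10).sum()))           # prints 6: all but two eigenvectors
```

Only the constant eigenvector and the alternating (highest-frequency) one fail to sum to zero over W; the six remaining eigenvectors are integrated exactly.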
The same paper proves an isoperimetric inequality: each vertex v ∈ V will detect a 'large' number of vertices as boundary vertices.
Update (Sep 2022). Chiem, Dudarov, Lee, Lee & Liu (‘A characterization of
graphs with at most four boundary vertices’) have characterized all graphs with at
most 4 boundary vertices.
$$\min_{x \in \{-1,1\}^n} \|Ax\|_{\ell^\infty} \le K.$$
The best known result is $K = O(\sqrt{\log n})$ (Banaszczyk) and the conjecture is that there exists a universal K = O(1) independent of the dimension. What makes the conjecture even more charming is that the constant K might actually be quite small. Remarkably little seems to be known about this. It is not that easy to construct a matrix showing that K > 1.5 and apparently for a while K = 2 was considered a not unreasonable guess. The best known result that I know of is due to Kunisky ('The discrepancy of unsatisfiable matrices and a lower bound for the Komlos conjecture constant') showing that

$$K \ge 1 + \sqrt{2} = 2.4142\ldots$$
What makes the problem of constructing lower bounds hard is that given an n × n matrix, one needs to check all $2^n$ sign vectors to verify that $\|Ax\|_{\ell^\infty}$ is large for every one of them.
$$\frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 \end{pmatrix}$$

Figure 9. A matrix showing $K \ge \sqrt{3}$.
constant cX such that for every 2−coloring χ : X → {−1, 1}, there always exists a line ℓ in the projective plane such that

$$\sum_{x \in \ell} \chi(x) \ge c_X?$$
Update (Aug 2022). Victor Reis reports the bounds in Table 2. They do seem to indicate that there are actually rather effective two-colorings (in the sense of the α being close to 2 or even smaller) which would suggest that finite projective planes will not lead to good lower bounds for the Komlos conjecture.
$$\text{star-discrepancy} = \max_{y \in [0,1]^d} \left| \frac{1}{n} \# \{1 \le i \le n : x_i \in [0, y]\} - \operatorname{vol}([0, y]) \right|,$$

$$\frac{d}{\varepsilon} \lesssim N_{\infty}^{*}(d, \varepsilon) \lesssim \frac{d}{\varepsilon^{2}},$$
where the upper bound is a probabilistic argument by Heinrich, Novak, Wasilkowski
& Wozniakowski (2001). The lower bound was established by Hinrichs (2004) using
Vapnik-Chervonenkis classes and the Sauer–Shelah lemma.
The upper bound construction is relatively easy: take iid random points. This leads
to a fascinating dichotomy
• either random points are essentially as regular as possible
• or there are more regular constructions we do not yet know about.
I could well imagine that random is best possible but am personally hoping for the
existence of better sets (because such sets would probably be pretty interesting). I
gave a new proof of the lower bound (‘An elementary proof of a lower bound for the
inverse of the star discrepancy’) which is relatively simple and entirely elementary
– can any of these ideas be used to construct ‘good’ sets of points?
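In dimension d = 1, at least, the star discrepancy can be computed exactly from the sorted points, which makes small experiments cheap. A hedged sketch (helper names are mine), using the base-2 van der Corput sequence, whose first 2^m points happen to be exactly the grid {k/2^m}:

```python
import numpy as np

def star_discrepancy_1d(points):
    # Exact 1D formula: D*_n = max_i max(i/n - x_(i), x_(i) - (i-1)/n).
    x = np.sort(np.asarray(points))
    n = len(x)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - x), np.max(x - (i - 1) / n))

def van_der_corput(n, base=2):
    # Radical-inverse (van der Corput) sequence in the given base.
    out = []
    for k in range(n):
        q, denom = 0.0, 1.0
        while k:
            k, r = divmod(k, base)
            denom *= base
            q += r / denom
        out.append(q)
    return out

print(star_discrepancy_1d(van_der_corput(128)))   # 1/128 = 0.0078125
```

For N = 2^m the van der Corput points reduce to left endpoints of a uniform grid, so the discrepancy is exactly 1/N; iid random points would typically sit around N^{-1/2}.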
There is another interesting problem that arises from relaxing the conditions: what
if we do not require subset sums to be distinct but merely require them to attain
many different values (some duplicates are allowed as long as there are few).
Part of the motivation is that this question can be interpreted as a discrete version
of the Hot Spots conjecture (in particular, if the graph discretizes a convex domain
in the usual grid-like fashion, then we expect ϕ to be monotonically increasing away
from the vertex and to assume its maximum on the boundary).
Update (Nov 2020). Yancey & Yancey (arXiv:2011.08998) discuss some coun-
terexamples and propose graph curvature as an interesting condition.
Question. When we plot $\|v_i\|_{\ell^1}$ for i = 1, ..., n, they seem to lie on a curve (see Fig. 11). Why would they do that?
This somehow means that eigenvectors at the edge of the spectrum are more lo-
calized: that is perhaps not too surprising. What is truly surprising is that the
eigenvectors seem to uniformly lie very close to this curve. There seems to be
a strong measure concentration phenomenon at work: the curve always looks the
same for many different random realizations of G(n, p) (for fixed n, p).
A second question is what happens in the middle. What we see there is that

$$\mathbb{E} \max_{1 \le i \le n} \frac{\|v_i\|_{\ell^1}}{\sqrt{n}} \sim 0.8.$$

The relevant i seems to be i ∼ n/2. If X ∼ N(0, 1) is a standard Gaussian, then $\mathbb{E}|X| = \sqrt{2/\pi} \sim 0.7978\ldots$. Coincidence?
Update (Dec 2020). I asked the question on mathoverflow. Ofer Zeitouni pointed
out that works by Rudelson-Vershynin and Eldan et al. suggest the lower bound
$$\mathbb{E} \min_{1 \le i \le n} \frac{\|v_i\|_{\ell^1}}{\sqrt{n}} \gtrsim \frac{1}{(\log n)^{c}}.$$
Figure 12. Sign of the 2nd and the 3rd eigenvector and of their
product.
The second and the third eigenvector have sign changes across most of the edges:
they oscillate essentially as quickly as the graph allows. In contrast, the (pointwise)
product of these high-frequency eigenvectors appears to be much smoother and ex-
hibits a sign pattern typical of low-frequency eigenvectors: positive and negative
entries are clustered together and meet across a smooth interface.
In words: the new vertex that is being added has the property of maximizing the
sum of distances to the previous vertices. This procedure is also interesting in the
continuous setting (see 'Sums of distances between points on the sphere').
One could now wonder about the long-term behavior of this construction and what
one observes is that vertices are not taken with equal frequency, things are quite
lopsided and the actual frequency with which a vertex is selected converges to a
very special kind of probability distribution.
Definition 1. A probability measure µ on the vertices V is balanced if

$$\forall\, w \in V: \quad \mu(w) > 0 \implies \sum_{u \in V} d(w, u)\mu(u) = \max_{v \in V} \sum_{u \in V} d(v, u)\mu(u).$$
So, basically, every point in the support of the measure is a point to which the global transport of the entire measure would be maximally expensive. We note that balanced measures are not necessarily unique.
There are certainly graphs where this is not the case: examples are given by the dodecahedral graph or the Desargues graph, for which the uniform measure is balanced. In practice, it seems very difficult for graphs to have the support of a balanced measure contain a large number of vertices and it would be interesting to have a more quantitative understanding of this.
only differs up to a constant. It is easy to see that, since areas of a triangle are determined by the three sides and we constrain the points to lie on S1, that

$$\# \{\operatorname{area}[\Delta(x_i, x_j, x_k)] : 1 \le i < j < k \le n\} \lesssim \left( \# \{\|x_i - x_j\| : 1 \le i, j \le n\} \right)^{2}$$

and irrational rotations on the torus only have ∼ n pairwise distances since $|e^{ik\alpha} - e^{im\alpha}| = |e^{i(k-m)\alpha} - 1|$.
The second reason I like the problem is the formula

$$\operatorname{area} \Delta(e^{is}, e^{it}, e^{iu}) = 2 \left| \sin\left(\frac{s-t}{2}\right) \sin\left(\frac{u-s}{2}\right) \sin\left(\frac{t-u}{2}\right) \right|.$$

This naturally calls for the angles to be in some form of generalized arithmetic progression but it's clear that there is a nonlinear twist to it. The third reason I like the problem is that it is known (Erdős-Purdy and others) that n points in the plane induce at least ⌊(n − 1)/2⌋ different triangle areas: taking equispaced points on two parallel lines shows that this is sharp. This means the curvature of S1 has to somehow play a role.
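The sharpness example (equispaced points on two parallel lines) can be verified exactly; a small sketch using rational arithmetic to count distinct areas (helper name is mine):

```python
from fractions import Fraction
from itertools import combinations

def distinct_triangle_areas(points):
    # Shoelace formula; exact rational arithmetic avoids float collisions.
    areas = set()
    for (ax, ay), (bx, by), (cx, cy) in combinations(points, 3):
        area = abs(Fraction(bx - ax) * (cy - ay)
                   - Fraction(cx - ax) * (by - ay)) / 2
        if area:                       # skip degenerate (collinear) triples
            areas.add(area)
    return areas

n = 8                                  # 4 points on each of two parallel lines
pts = [(i, 0) for i in range(4)] + [(i, 1) for i in range(4)]
print(len(distinct_triangle_areas(pts)))   # 3 = floor((8 - 1)/2)
```

Every non-degenerate triangle here has two vertices on one line at distance d and one on the other, so the areas are d/2 with d in {1, 2, 3}.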
Part 2. Analysis
15. Sums of distances between points on the sphere
It is known that for any set {x1, ..., xn} ⊂ S2, we have

$$\sum_{i,j=1}^{n} \|x_i - x_j\| \le \frac{4}{3} n^{2} - c\sqrt{n}.$$
No good explicit construction of points with this property is known (the only constructions known to attain this bound are randomized constructions involving either jittered sampling or determinantal point processes). A strange mystery (discussed in 'Polarization and Greedy Energy on the Sphere') is the following: if we start with an arbitrary initial set of points {x1, ..., xm} and then define a greedy
sequence by setting the next point so as to maximize the sum of the existing distances:

$$x_{n+1} = \arg\max_{x \in S^2} \sum_{i=1}^{n} \|x - x_i\|.$$
This seems to actually lead to a sequence of maximal growth. We know ('Polarization and Greedy Energy on the Sphere') that, for n sufficiently large (depending on the initial conditions),

$$\sum_{i,j=1}^{n} \|x_i - x_j\| \ge \frac{4}{3} n^{2} - 100 n$$

and it's completely clear from the proof that this is far from optimal (in fact, it is optimal for a much wider class of sequences that is clearly not as well behaved).
The procedure is very stable: one does not need to take the actual maximum, it is
enough to pick a value that is sufficiently close. One can even sometimes replace
xn+1 by some other arbitrary point (perhaps a really bad one) and, as long as one
does not do that too often, this does not seem to cause any issues, it appears to
be an incredibly robust procedure. It also seemingly works in higher dimensions.
Why? We also note that there are at least two different problems here
(1) finding a good construction for fixed n
(2) finding a sequence $(x_k)_{k=1}^{\infty}$ such that $(x_k)_{k=1}^{n}$ is good uniformly in n.
It is clear that the second problem is harder than the first but it also appears, numerically, as if the greedy sequence leads to a sequence $(x_k)_{k=1}^{\infty}$ such that $(x_k)_{k=1}^{n}$ is optimal (up to constants) uniformly in n.
The problem is also interesting on graphs (see 'Balanced Measures are small?').
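The robustness is easy to see in experiments: replacing the argmax over all of S² by an argmax over a fixed random candidate set (my simplification, purely for illustration) still lands the greedy sum close to the (4/3)n² benchmark:

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_points(m):
    p = rng.normal(size=(m, 3))
    return p / np.linalg.norm(p, axis=1, keepdims=True)

candidates = sphere_points(4000)     # crude finite stand-in for all of S^2
pts = sphere_points(5)               # arbitrary initial configuration

while len(pts) < 80:
    # greedy step: candidate maximizing the sum of distances to chosen points
    dists = np.linalg.norm(candidates[:, None, :] - pts[None, :, :], axis=2)
    pts = np.vstack([pts, candidates[np.argmax(dists.sum(axis=1))]])

total = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2).sum()
print(total / len(pts) ** 2)         # close to (and strictly below) 4/3
```

Even with a random start and a discretized argmax, the normalized sum of distances sits just below 4/3, in line with the bounds above.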
One would assume that this is generally possible. For example, let V be a vector
field on T2. When do we have, for some fixed universal δ = δ(V) > 0, that for all f ∈ C∞(T2) with mean value 0

$$\|\nabla f\|_{L^2(\mathbb{T}^2)}^{1-\delta} \, \|\langle \nabla f, V\rangle\|_{L^2(\mathbb{T}^2)}^{\delta} \ge c_{\alpha} \|f\|_{L^2(\mathbb{T}^2)}?$$
How does δ depend on V ? One would expect that it depends on the mixing prop-
erties of V : the better it mixes, the larger δ can be. It should be connected to how
quickly the flow of the vector field transports you from the vicinity of one point to
the vicinity of another point. Is it true that δ can never be larger than 1/2?
The question also has a dual formulation (which also has relevance in Additive
Combinatorics): whenever we have an L1 −function f , then there exists a shift such
that f (x) and f (x − t) have small inner product. We observe that, for f ≥ 0,
$$\min_{0 \le t \le 1} \int_{\mathbb{R}} f(x) f(x-t)\,dx \le \int_0^1 \int_{\mathbb{R}} f(x) f(x-t)\,dx\,dt \le \int_0^{\infty} \int_{\mathbb{R}} f(x) f(x-t)\,dx\,dt = \frac{\|f\|_{L^1}^{2}}{2}.$$
So the statement itself is not complicated but the optimal constant seems to be
quite complicated. We currently know that
Noah Kravitz (arXiv, April 2020) proved that the question is equivalent to an old
question about the cardinality of difference bases. We conclude with a related
question of G. Martin & K. O’Bryant: if f ∈ L1 (R) is nonnegative, can Hölder’s
inequality be improved? Is there a universal δ > 0 such that
$$\frac{\|f * f\|_{L^\infty} \, \|f * f\|_{L^1}}{\|f * f\|_{L^2}^{2}} \ge 1 + \delta?$$

They produce an example ($f(x) = 1/\sqrt{2x}$ if 0 < x < 1/2, f(x) = 0 otherwise) showing that δ cannot exceed 0.13.
Update (Aug 2022). Ramos & Madrid (Comm. Pure Appl Anal, 2021) proved
that the optimal constant c in
$$\min_{0 \le t \le 1} \int_{\mathbb{R}} f(x) f(x+t)\,dx \le c \|f\|_{L^1}^{2}$$

is strictly less than what one obtains from the argument by Rick and myself.
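The chain of inequalities above is easy to illustrate numerically; a hedged sketch with f a Gaussian (my choice), for which the autocorrelation is explicit:

```python
import numpy as np

# Illustrating min_t ∫ f(x)f(x-t) dx ≤ ||f||_1^2 / 2 for f(x) = exp(-x^2).
# Here the autocorrelation has the closed form sqrt(pi/2) * exp(-t^2/2),
# so the minimum over t in [0, 1] is attained at t = 1.
xs = np.linspace(-10, 10, 20001)
dx = xs[1] - xs[0]
f = np.exp(-xs ** 2)
norm1 = f.sum() * dx                        # ≈ sqrt(pi)

def autocorr(t):
    return (f * np.exp(-(xs - t) ** 2)).sum() * dx

m = min(autocorr(t) for t in np.linspace(0, 1, 101))
print(m, norm1 ** 2 / 2)                    # ≈ 0.760 ≤ 1.571
```

For this very spread-out f the inequality is far from tight; the interesting question is precisely how small the optimal constant can be made.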
How often can ∇f vanish or, phrased differently, how many critical points can there
be? It is easy to see that all the critical points have to be in the convex hull of the three points x1, x2, x3. Once one does some basic experimentation, one sees
that there seem to be at most 4 critical points (if the triangle is very flat, it is
possible that there are only 2). Gabrielov, Novikov, Shapiro (Proc. London Math. Soc., 2007) showed that there are at most 12. A more recent 2015 Physica D paper of
Y.-L. Tsai (Maxwell’s conjecture on three point charges with equal magnitudes)
shows that there are at most 4 critical points – the proof is heavily computational
and seems hard to generalize. Is there a ‘simple’ proof for n = 3? What about
other potential functions? Now suppose there are 4 points x1 , x2 , x3 , x4 . In that
case, it is not even known whether there is a finite number! There are even related
one-dimensional problems that are wide open such as
Update (July 2023). Vladimir Zolotov ('Upper bounds for the number of isolated
critical points via Thom-Milnor theorem’) gives a number of very strong bounds
for this and related problems.
One way of thinking about the functional is that it measures the behavior of the set
when projected onto itself in the following sense: consider x, y ∈ L and let us take
small neighborhoods around x and y (we may think of these as approximately being
short line segments). We could then ask for the expected size of the projection of
one such line segment onto the other under a ‘random’ projection. Equivalently,
we can ask for the likelihood that a ‘random’ line intersects both line segments.
Question. Given a convex set Ω, which L ⊂ Ω minimize E(L)?
A partial answer ('Quadratic Crofton and sets that see themselves as little as possible') is given by the following Theorem that shows that, at least for a positive proportion of lengths L, the answer is fairly simple.
Theorem. Let Ω ⊂ Rn be a bounded, convex domain with C1-boundary. There exists a constant cΩ such that if

$$0 \le L - m|\partial\Omega| \le c_\Omega \quad \text{for some } m \in \mathbb{N},$$

then among all (n − 1)−dimensional piecewise differentiable Σ ⊂ Ω with surface area $\mathcal{H}^{n-1}(\Sigma) = L$ the energy

$$E(\Sigma) = \int_{\Sigma} \int_{\Sigma} \frac{|\langle n(x), y-x\rangle \, \langle y-x, n(y)\rangle|}{\|x-y\|^{n+1}} \, d\sigma(x)\,d\sigma(y)$$

is minimized by m copies of the boundary and a segment of a hyperplane.
One can also rephrase the question (see the referenced paper). By Crofton’s for-
mula, the expected number of intersections of a ‘random’ line with a set is only
determined by the co-dimension 1 volume of the set (i.e. the length in R2 ).
Question. Which set minimizes the variance of intersections?
One motivating image could be the following. Suppose we cover Earth with sensors
to detect cosmic radiation: how should we arrange the sensors so that a random
ray is roughly captured by the same number of sensors? The real-life case of this
scenario is covered by the above Theorem: since sensors are expensive, we are in
the setting where 0 < L ≪ 1 and one should put the sensors in a single hyperplane.
Figure 15. Left: three sides of the boundary give an opaque set with length 3. Right: the conjectured shortest opaque set for the unit square with length $\sqrt{2} + \sqrt{3/2} \sim 2.63$.
If Ω ⊂ R2 is convex, then any opaque set has length at least |∂Ω|/2. It is known that this cannot be improved in general (take an extremely thin rectangle and use two short sides and one long side). There are many proofs of this (often using
variants of Crofton’s Formula or, equivalently, averaging over projections). What
is somewhat shocking is that, even for the unit square [0, 1]2 , the best lower bounds
aren’t much better: the best bound is something like 2.000021. It would be very
nice to have some better bounds.
(The full phenomenon seems to hold for much more general functions but this seems
to be the easiest special case.) This function has a maximum at 0 and mean value 0. We can now consider sequences obtained in the following way:

$$x_{n+1} = \arg\min_{x \in \mathbb{T}} \sum_{k=1}^{n} f(x - x_k).$$
What happens is that the arising sequence $(x_n)_{n=1}^{\infty}$ seems to be very regularly distributed in all the usual ways: for any subinterval J ⊂ [0, 1], we have

$$\# \{1 \le i \le N : x_i \in J\} \sim |J| \cdot N + \text{very small error term}.$$
There are many other ways of phrasing the phenomenon, for example it seems to be that

$$\sum_{k,\ell=1}^{N} f(x_k - x_\ell) \quad \text{grows very slowly (logarithmically?) in } N.$$
lead to nearly optimal results (arXiv, June 2020) for this particular function – but
the proof is quite special and uses a number of tricks that are highly tailored to this
particular function; the phenomenon seems to be much, much more robust. Louis
Brown and I (J. Complexity, 2020) proved Wasserstein bounds that get really good
in dimensions d ≥ 3. But the one-dimensional problem seems to be hard and quite
interesting.
This seems maybe a bit arbitrary at first glance but arises naturally when trying to
pick xN +1 in such a way that the L2 −distance between the empirical distribution
and the uniform distribution is as small as possible (see the paper). What is
particularly nice about this greedy sequence is that its consecutive elements are
‘nice’
1 1 5 1 7 5 13
, , , , , , ,...
2 4 6 8 10 12 14
We observe that xn can be written as xn = p/(2n) with p odd (additional cancella-
tion may occur, so the denominator is always a divisor of 2n). The sequence seems
to be very regularly distributed in the sense that
Kritzinger proves
√
max |# {1 ≤ i ≤ N : xi ≤ x} − N x| ≲ N
0≤x≤1
but one could imagine the upper bound being as small as log N . It doesn’t seem
to matter much whether x1 = 1/2. In fact, even starting with an arbitrary initial
set {x1 , . . . , xm } ⊂ [0, 1], one observes this high degree of regularity. Why?
Update (July 2022). The Kritzinger sequence turns out to coincide with the
sequence that one obtains when greedily minimizing the Wasserstein W2 distance
between the empirical measure and the Lebesgue measure on [0, 1]. Using some
other ideas I was able to show ('On Combinatorial Properties of Greedy Wasserstein Minimization') that for infinitely many N ∈ N

$$\max_{0 \le x \le 1} \left| \int_0^x \left( \# \{1 \le i \le N : x_i \le y\} - N y \right) dy \right| \lesssim N^{1/3}.$$
This in particular implies that the sequence is quite a bit more regular than iid random points (for which this quantity would be $\sim N^{1/2}$ with overwhelming likelihood).
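The greedy W2 construction can be simulated directly. One caveat I should flag: some steps have exactly tied minimizers (e.g. by symmetry already at the second step), and breaking ties toward the smaller candidate reproduces the listed terms; that tie-breaking rule is my assumption, not something taken from the papers. A sketch:

```python
import numpy as np

def w2_sq(pts):
    # Squared Wasserstein-2 distance between the empirical measure of pts
    # and Lebesgue measure on [0,1]: the i-th smallest point receives the
    # uniform mass of the interval ((i-1)/m, i/m].
    p = np.sort(np.asarray(pts))
    m = len(p)
    a = np.arange(m) / m
    b = np.arange(1, m + 1) / m
    return np.sum(((b - p) ** 3 - (a - p) ** 3) / 3)

grid = np.linspace(0, 1, 10001)
seq = [0.5]
for _ in range(6):
    costs = np.array([w2_sq(seq + [t]) for t in grid])
    ties = grid[costs <= costs.min() + 1e-12]
    seq.append(ties[0])        # assumed tie-break: smallest candidate
print(seq)                     # ≈ 1/2, 1/4, 5/6, 1/8, 7/10, 5/12, 13/14
```

Up to the grid resolution this recovers exactly the fractions listed above.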
We can now perturb the lattice a little bit: by this I mean that we perturb the basis
vectors a tiny bit (but in such a way that the density, the volume of a fundamental
cell, is preserved). This ‘wiggling of the lattice’ leads to a ‘wiggling of the points’
Λr (by this we mean exactly what it sounds like: each point in Λr has a basis
representation a1 v1 +a2 v2 where v1 , v2 are the basis vectors of the hexagonal lattice
and we now consider a1 w1 + a2 w2 where w1 , w2 are the slightly perturbed vectors).
After wiggling the points in this way, some will move closer and some will move
further away.
Theorem (Faulhuber & S, J. Stat. Phys). The sum of the dis-
tances increases under small perturbations.
I believe this to be quite a curious property: it shows, in a certain sense, ‘points
in the hexagonal lattice are, on average, closer to the origin than the points of any
nearby lattice’. It seems a bit like optimal sphere packing but also like something
else. I would believe that most of the lattices that are optimal w.r.t. sphere packing
have this property but it’s not clear to me whether there are others.
Question. Which other lattices have this property? Even in R3
this already seems tricky. What about D4 or E8? Leech?
Our proof for the hexagonal lattice is actually quite simple: the set Λr has a ro-
tational symmetry by 120◦ so instead of studying Λr , it suffices to study a triple
of points having this symmetry and then the computation becomes explicit. In
principle this should also work for other lattices but one has to identify proper
symmetries and then see whether one can do the computations.
Update (Dec 2022). Paige Helms ('Extremal Property of the Square Lattice', arXiv Dec 2022) has established a similar result for the square lattice Z2.
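The theorem invites a quick numerical check (an illustration, of course, not the proof): fix the points of the hexagonal lattice of norm at most r, apply determinant-1 perturbations to the basis, and compare sums of distances. The specific perturbations below are my choice:

```python
import numpy as np

# Points of the hexagonal lattice with norm <= r (the set Lambda_r).
v1, v2 = np.array([1.0, 0.0]), np.array([0.5, np.sqrt(3) / 2])
r = 10
pts = np.array([a * v1 + b * v2
                for a in range(-2 * r, 2 * r + 1)
                for b in range(-2 * r, 2 * r + 1)])
pts = pts[np.linalg.norm(pts, axis=1) <= r]

base = np.linalg.norm(pts, axis=1).sum()
shear = np.array([[1.0, 0.05], [0.0, 1.0]])       # det = 1
stretch = np.diag([np.exp(0.05), np.exp(-0.05)])  # det = 1
for M in (shear, stretch):
    # apply the area-preserving perturbation to every lattice point
    assert np.linalg.norm(pts @ M.T, axis=1).sum() > base
print("sum of distances increased under both perturbations")
```

The first-order change vanishes by the sixfold symmetry of the lattice, so the increase is of second order in the perturbation, which is exactly what the numerics show.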
I managed to extend this result to all classical polynomials (Proc. AMS 2018).
Theorem. Let p(x), q(x) be polynomials of degree at most 2 and 1, respectively. Then the set {x1, ..., xn}, assumed to be in the domain of definition, satisfies

$$p(x_i) \sum_{\substack{k=1 \\ k \neq i}}^{n} \frac{2}{x_i - x_k} = q(x_i) - p'(x_i) \qquad \text{for all } 1 \le i \le n$$

if and only if

$$y(x) = \prod_{k=1}^{n} (x - x_k) \quad \text{solves} \quad -(p(x) y')' + q(x) y' = \lambda y \quad \text{for some } \lambda \in \mathbb{R}.$$
k=1
What’s particularly interesting is that one can use this to define a system of ODEs
for which the stationary state corresponds exactly to roots of classical orthogonal
polynomials. More precisely, consider
$$\frac{d}{dt} x_i(t) = -p(x_i(t)) \sum_{\substack{k=1 \\ k \neq i}}^{n} \frac{2}{x_k(t) - x_i(t)} + p'(x_i(t)) - q(x_i(t)) \qquad (\diamond)$$
We can then show that the underlying system of ODEs converges exponentially
quickly to the true solution.
Theorem. The system (⋄) converges for all initial values x1 (0) < · · · < xn (0) to
the zeros x1 < · · · < xn of the degree n polynomial solving the equation. Moreover,
$$\max_{1 \le i \le n} |x_i(t) - x_i| \le c\, e^{-\sigma_n t}.$$
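For the Hermite case p(x) = 1, q(x) = 2x, the system (⋄) is easy to integrate with forward Euler and can be compared against the Gauss-Hermite nodes; a rough sketch (step size and time horizon are ad hoc choices of mine):

```python
import numpy as np

# Hermite case of the flow: p = 1, q = 2x gives
#   dx_i/dt = sum_{k != i} 2/(x_i - x_k) - 2 x_i,
# whose stationary points are the zeros of the degree-n Hermite polynomial.
n = 5
x = np.linspace(-2.0, 2.0, n)          # ordered initial values
dt = 5e-4
for _ in range(40000):                  # crude forward Euler up to t = 20
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, np.inf)      # excludes the k = i term
    x = x + dt * ((2.0 / diff).sum(axis=1) - 2.0 * x)

roots = np.polynomial.hermite.hermgauss(n)[0]   # zeros of H_5
print(np.max(np.abs(np.sort(x) - roots)))       # tiny: the flow has converged
```

The repulsion term keeps the particles ordered while the −2x term confines them, and the iteration settles on the Hermite zeros to high accuracy.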
I would like to understand whether one can quantify how this result behaves as n goes to infinity.
Suppose we have a random variable that is not compactly supported (and maybe
has a smooth density?)
• Question 1. Is there always a z > 0 such that

$$\frac{\mathbb{P}(X + Y \ge 2z \text{ and } \min(X, Y) \le z)}{\mathbb{P}(X + Y \ge 2z \text{ and } \min(X, Y) \ge z)} \ge \frac{(2 \log 2) z}{\operatorname{med}(X)}?$$

• Question 2. Is there always a z > 0 such that

$$\frac{\mathbb{P}(X + Y \ge 2z \text{ and } \min(X, Y) \le z)}{\mathbb{P}(X + Y \ge 2z \text{ and } \min(X, Y) \ge z)} \ge \frac{2z}{\mathbb{E}X}?$$
The numbers are coming from assuming that exponential distributions are the worst case (they might not be). In case the constants are wrong: is the growth of the RHS linear in z? If that is wrong: what is it? Note that all these probabilities can be written explicitly as integrals of f(x)f(y) dx dy over certain regions.
I was originally interested in whether the assumption of the random variable not
having a compactly supported distribution is necessary. It turns out that it is: I
proved (Stat. Prob. Lett.)
Theorem. If X, Y are i.i.d. random variables drawn from an absolutely continuous probability distribution with density f(x)dx on R≥0, then

$$\sup_{z > 0} \mathbb{P}(X \le z \text{ and } X + Y \ge 2z) \ge \frac{1}{24 + 8 \log_2(\operatorname{med}(X) \|f\|_{L^\infty})},$$

where med X denotes the median of the probability distribution. This estimate is sharp up to constants and the supremum can be restricted to 0 ≤ z ≤ med(X).
It would be interesting to know whether it is possible to determine the sharp constants and the extremal distribution.
I proved in the original paper (J. Geom. Anal, 2018) that if f : Ω → R is merely subharmonic, i.e. Δf ≥ 0, then we still have

$$\int_{\Omega} f \, dx \le c_n |\Omega|^{1/n} \int_{\partial\Omega} f \, d\sigma.$$
Jianfeng Lu and I then proved (Proc. AMS 2020) that one can take cn = 1. Jeremy Hoskins and I proved (arXiv, Dec 2019) that $c_2 < 1/\sqrt{2\pi} \sim 0.39\ldots$ and have obtained a candidate domain that leads to a constant of ∼ 0.358. We believe that this is probably close to the best possible domain; it is shown in the Figure.
It’s currently not even known whether an extremal shape exists. Does it exist?
And does the curvature of its boundary vanish at exactly one point?
Are there more such inequalities? Are they part of a family? I would be especially
interested in higher-dimensional analogues.
We note that the geodesic is defined as the shortest path γ : [0, 1] → Ω with
γ(0) = x and γ(1) = y. We say that it arrives non-tangentially if ⟨γ ′ (1), ν⟩ ̸= 0,
where ν is the normal vector of ∂Ω in y. Of course (∂Ω)x is a subset of the full
boundary ∂Ω. We were interested in whether this non-tangential boundary (∂Ω)x
still obeys some form of isoperimetric principle.
It is not terribly difficult to show (and was done in 'The Boundary of a Graph and its Isoperimetric Inequality', Jan 2022) that for convex domains Ω ⊂ Rd

$$\forall\, x \in \Omega: \quad |(\partial\Omega)_x| \ge (d-1) \frac{|\Omega|}{\operatorname{diam}(\Omega)}.$$
The constant d − 1 cannot be optimal (but is optimal up to a factor of 2 in d = 2). It seems natural to ask: what is the optimal constant cd such that for convex Ω ⊂ Rd

$$\forall\, x \in \Omega: \quad |(\partial\Omega)_x| \ge c_d \frac{|\Omega|}{\operatorname{diam}(\Omega)}?$$
The other natural question is to ask what sort of conditions one needs on the domain
Ω for this non-tangential isoperimetric principle to hold.
The inequality, if true, would be sharp for orthonormal systems. Pinasco (‘The
n−th linear polarization constant...’, arXiv Aug 2022) has proven the result for
n ≤ 14. It is also relatively easy to prove the result up to constants: by averaging
over points on the sphere and linearity of expectation, we see that
$$\mathbb{E} \log\left( |\langle x, v_1\rangle \cdot \langle x, v_2\rangle \cdots \langle x, v_n\rangle| \right) = n \int_{S^{n-1}} \log |\langle x, v_1\rangle| \, dx.$$
This integral looks unpleasant but we know that, in sufficiently high dimensions, the inner product |⟨x, v⟩| is distributed like a Gaussian with variance 1/n. Thus we expect that

$$\int_{S^{n-1}} \log |\langle x, v_1\rangle| \, dx \sim \int_{\mathbb{R}} \frac{\sqrt{n}\, e^{-n x^2/2}}{\sqrt{2\pi}} \log |x| \, dx.$$
The integral can be evaluated:

$$\int_{\mathbb{R}} \frac{\sqrt{n}\, e^{-n x^2/2}}{\sqrt{2\pi}} \log |x| \, dx = -\frac{\gamma + \log 2}{2} - \frac{1}{2} \log n,$$

where γ ∼ 0.577… is the Euler-Mascheroni constant. Thus
Note that several arguments leading to better constants are known (see Pinasco).
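A quick Monte Carlo sanity check of this asymptotic (the dimension and sample size below are arbitrary choices of mine):

```python
import numpy as np

# For x uniform on S^(n-1) and a fixed unit vector v, the mean of log|<x,v>|
# should be close to -(gamma + log 2)/2 - (1/2) log n.
rng = np.random.default_rng(1)
n, m = 100, 50000
g = rng.standard_normal((m, n))
t = g[:, 0] / np.linalg.norm(g, axis=1)   # first coordinate of uniform sphere points
mc_estimate = float(np.mean(np.log(np.abs(t))))

gamma = 0.5772156649015329                 # Euler-Mascheroni constant
prediction = -(gamma + np.log(2)) / 2 - 0.5 * np.log(n)
```

With these parameters the two numbers agree to a few hundredths, consistent with the Gaussian heuristic.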
This has a number of different interpretations: as an affine map that sends the unit
cube {−1, 1}^n to points with, typically, at least one large coordinate. It can also be
understood as a series of n statistical tests that, while individually fair, are likely
to have at least one give a small p−value when tested on a sequence of fair flips
of a coin. Various other geometric interpretations are conceivable. I proved (‘Bad
Science Matrices’) that
max_{A ∈ R^{n×n}} (1/2^n) Σ_{x ∈ {−1,1}^n} ∥Ax∥_{ℓ∞} = (1 + o(1)) · √(2 log n).
n 2 3 4 5 6 7 8
βn ≥ 1.41 1.57 1.73 1.79 1.86 1.93 2
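The displayed maximum needs a normalization on A to be finite; for matrices with orthonormal rows (my assumption here) the objective is easy to evaluate by brute force, and the 2×2 Hadamard matrix reproduces the first table entry:

```python
import itertools
import math

def objective(A):
    """Average of ||Ax||_infty over all sign vectors x in {-1,1}^n."""
    n = len(A)
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        total += max(abs(sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
    return total / 2 ** n

s = 1 / math.sqrt(2)
H2 = [[s, s], [s, -s]]                    # normalized 2x2 Hadamard matrix
val2 = objective(H2)                       # equals sqrt(2) ~ 1.41

H4 = [[0.5 * a for a in row] for row in
      [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]]
val4 = objective(H4)                       # equals 1.5
```

The 4×4 Hadamard matrix only achieves 1.5, below the table value β4 ≥ 1.73, so Hadamard matrices are not extremal in general.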
If we have the sum of k square roots of integers 1 ≤ ai ≤ n, how close can this be
to an integer? For example
√3 + √20 + √23 ∼ 11 + 0.0000182859.
Note that if all the integers are squares themselves, then this sum of square roots is
an integer and we are not interested in that case, we are interested in ‘near-misses’.
Problem. Fix k ∈ N and use ∥x∥ to denote the distance between
x ∈ R and the nearest integer. How does
min {∥√a1 + · · · + √ak∥ : 1 ≤ a1, . . . , ak ≤ n, √a1 + · · · + √ak ∉ N}
scale as a function of n?
k = 3 is the first open case. We know that if 1 ≤ a, b, c ≤ n are integers and
√a + √b + √c ∉ N, then

∥√a + √b + √c∥ ≳ 1/n^{7/2}
with the following nice argument that we learned from Arturas Dubickas and Roger
Heath-Brown (independently): multiplying over all 8 possible choices of signs,

∏ (±√a ± √b ± √c) ∈ Z,

and since each of the 8 factors is of size ≲ √n, none of these expressions can be closer
than n^{−7/2} to an integer without being one. Conversely, we have the following nice
example due to Nick Marshall
∥√((k − 1)² + 2) + √((k + 1)² + 2) + √((2k)² − 8)∥ ∼ 4/k⁵,

which shows that

1/n^{7/2} ≲ ∥√a + √b + √c∥ ≲ 1/n^{5/2}.
A natural guess would be n^{−3}: the numbers √a mod 1 are pretty rigid but if we
sum 3 of them, maybe there is enough randomness that they start behaving like iid
random variables would?
I proved (‘Sums of square roots that are close to an integer’) that, when k gets
large, there are examples with
0 < ∥√a1 + · · · + √ak∥ ≲ n^{−c·k^{1/3}}.

Surely one would expect the true rate to be closer to n^{−k}.
Update (April 2024). Siddharth Iyer (‘Distribution of sums of square roots modulo
1’) used an entirely different approach to show ≲ n^{−k/2}.
Update (June 2024). Additional results were obtained by Arturas Dubickas (J.
Complexity, 2024).
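Marshall's family above is easy to check numerically; the decay rate 4/k⁵ is already visible for moderate k (the specific values of k are arbitrary choices):

```python
import math

def dist_to_int(x):
    """Distance from x to the nearest integer."""
    return abs(x - round(x))

# Nick Marshall's example: the sum is roughly 4/k^5 away from the integer 4k.
ratios = []
for k in (10, 20, 40):
    s = (math.sqrt((k - 1) ** 2 + 2)
         + math.sqrt((k + 1) ** 2 + 2)
         + math.sqrt((2 * k) ** 2 - 8))
    ratios.append(dist_to_int(s) * k ** 5)   # should be close to 4
```

For k = 10 the sum is 39.99996..., at distance ≈ 4 · 10⁻⁵ from 40, as predicted.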
(4) Consider the gradient ascent started in x0 with respect to the energy E.
This gives rise to a curve γ that starts in γ(0) = x0 and satisfies
γ̇(t) = (∇E)(γ(t)).
(5) We stop the curve once it is within distance 2 of one of the existing points
x1 , . . . , xn ∈ R2 . Once the curve stops, we are at distance 2 of exactly one
existing disk (with likelihood 1).
Question. The ‘most popular’ disk, the one most likely to be hit:
how popular is it? What is the best possible upper bound on
max P (gradient flow hits xj )
1≤j≤n
in terms of n alone?
This estimate is nearly optimal when α is close to 0 but surely not optimal in
general. It also gives no nontrivial results when α ≥ 1 but one would expect n−cα
for some cα > 0 and all 0 < α < ∞. When α → 0, the optimal bound has to be
close to n−1/2 and when α → ∞, there is no nontrivial bound, one only gets ≲ 1.
Such Beurling estimates would, in turn, give rise to improved diameter estimate for
the Gradient Flow Aggregation process, see the paper for more details.
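Here is a toy simulation of the process. I am assuming the energy E(x) = Σ_j ∥x − x_j∥^{−α} (my reading of the Gradient Flow Aggregation model; the step size and release radius are arbitrary discretization choices), with the ascent run at unit speed:

```python
import math
import random

random.seed(0)
alpha = 1.0      # assumed energy exponent
step = 0.02      # Euler step size (arbitrary)
points = [(0.0, 0.0)]

def grad_E(x, y):
    """Gradient of the assumed energy E(x) = sum_j |x - x_j|^(-alpha)."""
    gx = gy = 0.0
    for px, py in points:
        dx, dy = x - px, y - py
        r2 = dx * dx + dy * dy
        c = -alpha * r2 ** (-(alpha + 2) / 2)
        gx += c * dx
        gy += c * dy
    return gx, gy

def nearest(x, y, pts):
    return min(math.hypot(x - px, y - py) for px, py in pts)

min_dists = []
for _ in range(20):
    # release a new particle far outside the current cluster
    R = 10 + max(math.hypot(px, py) for px, py in points)
    theta = random.uniform(0, 2 * math.pi)
    x, y = R * math.cos(theta), R * math.sin(theta)
    for _ in range(200000):               # safety cap on Euler steps
        if nearest(x, y, points) <= 2:    # stop at distance 2 of an old point
            break
        gx, gy = grad_E(x, y)
        g = math.hypot(gx, gy)
        x, y = x + step * gx / g, y + step * gy / g   # unit-speed ascent
    min_dists.append(nearest(x, y, points))
    points.append((x, y))
```

Tallying which existing point is nearest when each particle stops gives an empirical version of the hitting probabilities in the Question above.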
Is this true? If so, then this would be best possible. Is it possible to describe
other sequences having this property? Such sequences are candidates for having
interesting gap statistics.
There are now several papers concerned with questions of this type. One that is
especially worth emphasizing is a very nice result by Cohn - Goncalves (‘An optimal
uncertainty principle in twelve dimensions via modular forms’). They prove that
the optimal constant is √2 in 12 dimensions.
These inequalities arise naturally when looking for the ‘best’ or ‘smoothest’ convolution
kernel. I would be interested in what can be said about the extremizers.
Question. What can be said about the extremizers of this func-
tional? One interesting question would be whether the extremizer
‘exploits’ the L∞−bound fully and assumes it infinitely many times
such that |û(ξ)| ∼ |ξ|^{−β}. This would imply that the extremizer is
not smooth.
When n = 1 and β = 1, then for many values of α, the characteristic function
centered at the origin seems to play a special role. When n = 1 and β = 2, then
u(x) = 1 − |x| seems to play a special role (up to symmetries). It is not clear to me
whether they are global extremizers but it seems conceivable.
Discrete versions of these statements on Z have been proven in joint work with
Noah Kravitz (arXiv, July 2020). More precisely, we showed the following: suppose
u : {−n, . . . , n} → R is a symmetric function normalized to Σ_{k=−n}^{n} u(k) = 1. We
show that every convolution operator is not-too-smooth, in the sense that

sup_{f ∈ ℓ²(Z)} ∥∇(f ∗ u)∥_{ℓ²(Z)} / ∥f∥_{ℓ²} ≥ 2/(2n + 1),
and we show that equality holds if and only if u is constant on the interval {−n, . . . , n}.
In the setting where smoothness is measured by the ℓ2 -norm of the discrete second
derivative and we further restrict our attention to functions u with nonnegative
Fourier transform, we establish the inequality
sup_{f ∈ ℓ²(Z)} ∥∆(f ∗ u)∥_{ℓ²(Z)} / ∥f∥_{ℓ²(Z)} ≥ 4/(n + 1)²,
with equality if and only if u is the triangle function u(k) = (n + 1 − |k|)/(n + 1)².
It would be interesting to have variants of this type of statement for other ways
of measuring smoothness, other L^p spaces... – this seems to be quite interesting
and quite unexplored!
I would also be quite interested in what can be said about the optimal function
u when restricted to functions u : (−∞, 0] → R. This would have practical appli-
cations: when smoothing some real numbers (say, the stock price or the current
temperature) we cannot look into the future. Thus the average has to be taken
with respect to the past (see also S & Tsyvinski ‘On Vickrey’s Income Averaging’).
Update (Feb 2024). Sean Richardson (‘A Sharp Fourier Inequality and the Epanech-
nikov Kernel’) solved the problem for the second derivatives without the assumption
that the kernel have a non-negative Fourier transform.
The extremal kernel is explicit and very close in shape to a parabola max(1 − |x|², 0).
Littlewood originally conjectured that such a function should have ∼ |A| roots
which is now known to be false (Borwein, Erdelyi, Ferguson, Lockhart, Annals).
The best unconditional lower bound is due to Sahasrabudhe (Advances, 2016) which
shows that
number of roots ≳ (log log log |A|)^{1/2−ε}.

Erdélyi (2017) improved the exponent 1/2 − ε to 1 − ε. Surely it must be much bigger than that!
This is a fairly classical object. The following object is less classical: define, for a
given function f : R → R and a given x ∈ R,

r_f(x) = inf { r > 0 : (1/(2r)) ∫_{x−r}^{x+r} f(z) dz = sup_{s>0} (1/(2s)) ∫_{x−s}^{x+s} f(z) dz }.

So r_f(x) is simply the smallest r such that the average of f over [x − r, x + r]
equals the largest possible average.
Vague Problem. rf should assume many different values.
I proved (Studia Math, 2015) that if f is periodic and rf assumes only two values
{0, γ} and r−f also only assumes the same two values {0, γ}, then
f (x) = a + b sin (cx + d)
and c is determined by γ. The proof requires transcendental number theory (the
Lindemann-Weierstrass theorem), I always thought that was strange. Maybe we
even have:
Conjecture. If f ∈ L∞ (R) and rf assumes only finitely many
values, then
f (x) = a + b sin (cx + d).
Motivated by some heuristics (see paper), maybe we also have
Conjecture. Suppose f : R → R is C 1 and satisfies
f ′ (x + 1) − f (x + 1) = −f ′ (x − 1) − f (x − 1) whenever f (x) < 0.
Then
f (x) = a + b sin (cx + d) for some a, b, c, d ∈ R.
In general, it would be nice to have a better understanding of rf and how it depends
on f . Can rf (R) assume infinitely many values while not containing an interval?
where γ ranges over all closed geodesics γ : S1 → T2 and |γ| denotes their length.
The idea is that such an extremal geodesic somehow cannot be very long unless
the function oscillates a lot. If the function is very nice and smooth, then that
supremum should be attained by a relatively short geodesic.
Theorem (S, Bull. Aust. Math. Soc., 2019). Let f : T2 → R be at least s ≥ 2
times differentiable and have mean value 0. Then
sup_{γ closed geodesic} (1/|γ|) ∫_γ f dH¹ ,
I always thought this was a really interesting result. I would expect that it’s not
quite optimal (there should be a loss of derivatives on the right-hand side). I would
also expect that there are analogous results on higher-dimensional tori Td . I would
in fact expect that such results actually exist in a wide variety of settings: a natural
starting point might be a setting where geodesics and Fourier Analysis work well
together.
Question. What is the sharp form in T2 ? Is it possible to prove
analogous results on Td or in other settings? What is the correct
formulation of this underlying phenomenon without geodesics?
It’s not clear to me how to phrase this problem in a setting where geodesics don’t
make sense. What’s a proper way to encode this principle in Euclidean space?
Figure 22. The sign of sin (x) sin (y) for (x, y) ∈ T2 and a closed
geodesic that spends significantly more time in the positive region
than in the negative region. The flow γ(t) = (t, t) would spend
even more time in the positive region but that is not allowed: the
coefficients have to be different.
We will also assume that all the entries of the vector a are distinct. These linear
flows γ can be periodic or not periodic. We only care about the ones that are
periodic, this means that aLγ ∈ Zd for some minimal 0 < Lγ ∈ R in which case
this linear flow has length Lγ ∥a∥. What can be said about
(1/(Lγ ∥a∥)) ∫_{one period} f_d(γ(t)) dt.
Typically it will be close to 0. What is the largest value it can assume? For d = 2
we solve the problem explicitly and find some very short geodesic that is the unique
maximizer. For d ≥ 3, the techniques from our paper might still apply but it seems
more challenging to get good values.
Update (Dec 2022). Dou, Goh, Liu, Legate, Pettigrew (‘Cosine Sign Correla-
tion’, arXiv) proved the following result: if a1 , a2 , a3 are three different positive
integers and if x ∼ U[0, 2π] is a uniformly distributed random variable, then the
likelihood that cos (a1 x), cos (a2 x), cos (a3 x) are all positive or all negative is ≥ 1/9,
with equality if and only if {a1, a2, a3} = {m, 3m, 9m} for some m ∈ N. They con-
jecture that for 4 different numbers the extremal set is {1, 3, 11, 33} (and multiples
thereof).
There is a somewhat dual question: given a cube Q ⊂ Rn whose center is the origin
0 ∈ Rn, can the cube be rotated in such a way so as to capture a lot more lattice
points than predicted by its volume? For the sake of concreteness, we ask
and the result follows by induction. If one had a better understanding of the ques-
tion above, one could try to remove more than 1 frequency at a time which might
lead to stronger results.
Are maximum and minimum assumed at the boundary? This famous conjecture
of J. Rauch has inspired a lot of work. I proved (Comm. PDE, 2020) that if Ω is
a convex domain of dimension ∼ N × 1, then maximum and minimum are at most
distance ∼ 1 from a pair of points whose distance is the diameter of Ω. This is the
optimal form of this statement (think of a rectangle); I always wondered whether
the argument could possibly be sharpened to say more about Hot Spots.
Figure 23. Maximum and minimum are attained close (at most
a universal multiple of the inradius away) to the points achieving
maximal distance (the ‘tips’ of the domain).
Update (Aug 2020). In a recent paper (‘An upper bound on the hot spots
constant’), it is shown that whenever the conjecture fails, it cannot fail too badly: if
Ω ⊂ Rd is a bounded, connected domain with smooth enough boundary, then
∥u∥L∞ (Ω) ≤ 60∥u∥L∞ (∂Ω) .
One naturally wonders about the optimal constant in this inequality. The proof
shows that 60 can be replaced by 4 in sufficiently high dimensions. An example of
Kleefeld shows that the constant is at least 1.001.
Update (May 2022). Mariano, Panzo & Wang (‘Improved upper bounds for
the Hot Spots constant of Lipschitz domains’) have improved the constant in
∥u∥_{L∞(Ω)} ≤ c∥u∥_{L∞(∂Ω)} to c ≤ √e + ε in high dimensions.
and this is sharp on the sphere. Another celebrated result is the local Weyl law: if
the manifold is assumed to have volume 1, then
Σ_{k=1}^{n} ϕk(x)² = n + O(n^{(d−1)/d}).
Question. How large can Σ_{k=1}^{n} ∥ϕk∥_{L∞} be? Is there a bound
along the lines of Σ_{k=1}^{n} ∥ϕk∥_{L∞} ≲ n(log n)^c ?
we see that only ∼ n^{1/d} of the first n eigenfunctions have a mean value different
from 0. In general, many of these eigenfunctions having mean value 0 should
be a sign of great symmetry in the domain.
Problem. Is it true that among the first n Dirichlet eigenfunc-
tions on a domain in d dimensions at least n^{1/d} have a mean value
different from 0, unless the domain is the ball? One might even
conjecture that if there are ≤ Cn^{1/d}, then it already has to be the
ball independently of C (meaning that rate ∼ n^{1/d} already implies
it is the ball).
The ball is the most symmetric and should be the worst. Raghav and I proved
(‘Dirichlet eigenfunctions with nonzero mean value’) that, up to logs, at least
√(n^{1/d}) of the first n have a mean value different from 0. Clearly, the square root
should not be there.
Note that in this model of randomness, the roots are random (other popular models
for random polynomials choose random coefficients).
Question. What is the distribution of roots of the (t · n)-th
derivative of the polynomial for 0 < t < 1?
In ‘A Nonlocal Transport Equation Describing Roots of Polynomials Under
Differentiation’ I gave a formal derivation of a PDE describing the evolution of the
density
∂u/∂t + (1/π) ∂/∂x arctan(Hu/u) = 0,
where Hu is the Hilbert transform of u.
The best result in this direction is the beautiful work of Alexander Kiselev and
Changhui Tan (‘The Flow of Polynomial Roots Under Differentiation’) who study
the analogous problem on S1 (with polynomials replaced by trigonometric polyno-
mials). However, nothing so far seems to be known on the real line.
It’s an extraordinarily beautiful conjecture but it’s not so clear how to approach it.
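The most basic fact underlying all of this is that differentiation keeps real roots real and interlaced (Rolle); a quick check with numpy (the specific roots are an arbitrary choice of mine):

```python
import numpy as np

# A degree-12 polynomial with known real roots, differentiated once.
roots = np.arange(1, 13) * 0.5                      # roots 0.5, 1.0, ..., 6.0
coeffs = np.polynomial.polynomial.polyfromroots(roots)
dcoeffs = np.polynomial.polynomial.polyder(coeffs)
droots_c = np.polynomial.polynomial.polyroots(dcoeffs)

# Rolle: the 11 critical points are real and interlace the original roots.
max_imag = float(np.max(np.abs(np.imag(droots_c))))
droots = np.sort(np.real(droots_c))
interlaced = all(roots[i] < droots[i] < roots[i + 1] for i in range(len(droots)))
```

The PDE above is the mean-field description of what happens when this interlacing step is iterated ∼ t · n times.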
2 If we impose Neumann boundary conditions, then the first eigenfunction is constant, ϕ1 ≡ c,
and then, by orthogonality, all other eigenfunctions have mean value 0.
[Figure: a point x = ty + (1 − t)z between y and z, with f(x) ≤ tf(y) + (1 − t)f(z).]
We play this game until all the measure is on the boundary; call the resulting
measure µ. The total measure on the boundary is then

µ(∂Ω) = H(Ω).
Suppose now someone, additionally, gives you the n numbers (|xi|)_{i=1}^{n}, the absolute
values of these numbers. Is it possible to quickly recover the missing signs
xi = εi |xi|? Since we have strictly more information, the problem becomes easier
and can be solved in O(n³). But it somehow feels as if this additional information
should help us (and potentially help us a lot). Hau-tieng Wu and I (‘Recovering
eigenvectors from the absolute value of their entries’, on arXiv) propose an algo-
rithm that works some of the time. The problem should become a lot easier when
|λ| is very large, i.e. when |λ| ∼ ∥A∥.
which maps the origin to a vector y ∈ Rn satisfying ∥Ay − Ax∥ ≤ ε · ∥A∥ · ∥x∥. This
upper bound on k is independent of the spectral properties of the matrix A.
58. Finding the Center of a Sphere from Many Points on the Sphere
Here is a particularly funny way of solving linear systems Ax = b where A ∈ Rn×n is
assumed to be invertible (taken from ‘Surrounding the Solution of a Linear System
of equations from all sides’, arXiv, Sep. 2020). Denote the rows of A by a1 , . . . , an .
Then, for any y ∈ Rn and any 1 ≤ i ≤ n,
y and y + 2 · ((bi − ⟨y, ai⟩)/∥ai∥²) · ai
have the same distance to the solution x. This means we can very quickly generate
points that all have the same distance from the solution by starting with a random
guess for the solution and then iterating this procedure. Indeed, generating m
points on a sphere around the solution x has computational cost O(n · m); it is very
cheap. In particular, it is very cheap to generate c · n points on the sphere like that,
where c is a constant.
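The map above is literally a reflection of y across the hyperplane {z : ⟨z, ai⟩ = bi}, which contains the solution x; that is why the distance is preserved. A minimal numpy sketch (the specific matrix and sizes are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)        # the solution (used here only to verify distances)
b = A @ x

def reflect(y, i):
    """Reflect y across the hyperplane <z, a_i> = b_i; preserves |y - x|."""
    a = A[i]
    return y + 2 * (b[i] - y @ a) / (a @ a) * a

y = rng.standard_normal(n)        # random initial guess
dist0 = np.linalg.norm(y - x)
points = []
for _ in range(20):               # generate 20 points on the sphere around x
    y = reflect(y, int(rng.integers(n)))
    points.append(y)
dists = [np.linalg.norm(p - x) for p in points]
```

All generated points lie on the sphere of radius ∥y0 − x∥ around the unknown solution, at cost O(n) per point.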
Problem. Given at least n + 1 points on a sphere in Rn , how
would one quickly determine an accurate approximation of its cen-
ter? Does it help if one has c · n points?
The problem can, of course, be solved by setting up a linear system – the question
is whether it can be done (computationally) cheaper if one is okay with only having
an approximation of the center.
A very natural way to do this is to simply average the points. This is not very good
when the points are clustered in some region of space, though. I proved that if you
pick the rows of A with likelihood proportional to ∥ai ∥2 and then average, then the
arising sequence of points satisfies
E ∥x − (1/m) Σ_{k=1}^{m} xk∥ ≤ ((1 + ∥A∥F ∥A^{−1}∥)/√m) · ∥x − x1∥.
This gives rise to an algorithm that is as fast as the Random Kaczmarz method.
A better way of approximating the center would presumably give rise to a faster
method!
can now define, for any 0 < α < 1, the restricted singular value

σ_{α,min}(A) = min_{S ⊂ {1,2,...,m}, |S| = αm} inf_{x ≠ 0} ∥A_S x∥/∥x∥,
where AS is the restriction of A to rows indexed by S. It’s clear that this quantity
will grow as α grows and coincides with the classical smallest singular value of
A when α = 1. Haddock, Needell, Rebrova & Swartworth (Quantile Kaczmarz,
SIMAX 2022) proved that for certain types of random matrices one has
σ_{α,min}(A) ≳ α^{3/2} √(m/n) with high likelihood.
I’d be interested in understanding what the best kind of matrix for this problem
would be, the one maximizing these quantities. Note that since the rows are all
normalized in ℓ2 , we can think of the rows as points on the unit sphere.
Let us consider the case where A ∈ Rm×n has each row sampled uniformly at
random from the surface measure of Sn−1 and suppose that the matrix is large,
m, n ≫ 1, and that the ratio m/n is large. Trying to find a subset S ⊂ {1, 2, . . . , m}
such that AS has a small singular value might be difficult, however, we can flip the
question: for a given x ∈ Sn−1 , how would we choose S to have
∥A_S x∥² = Σ_{i∈S} ⟨x, ai⟩² as small as possible?

This is a much easier problem: compute ⟨x, ai⟩² for 1 ≤ i ≤ m and then pick S
to be the set of desired size corresponding to the smallest of these numbers. Using
rotational invariance of Gaussian vectors, we can suppose that x = (1, 0, . . . , 0).
Then we expect, in high dimensions, that

⟨ai, x⟩ ∼ γ/√n, where γ ∼ N(0, 1).
This suggests a certain picture: large inner products are those where many rows ai
are nicely aligned with x and we know with which likelihood to expect them (these
are just all the points in the two spherical caps centered at x and −x). This would
then suggest that, in the limit as m, n, m/n → ∞, we have an estimate along the
lines of
σ_{α,min}(A)² / (m/n) = (1/√(2π)) ∫_{−β}^{β} e^{−x²/2} x² dx
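This heuristic is easy to test directly: β is the Gaussian quantile with P(|γ| ≤ β) = α, and integration by parts gives the closed form (1/√(2π)) ∫_{−β}^{β} x² e^{−x²/2} dx = α − 2βφ(β). The dimensions and sample sizes below are arbitrary choices:

```python
import math
from statistics import NormalDist
import numpy as np

rng = np.random.default_rng(3)
m, n, alpha = 20000, 100, 0.5

# rows uniform on S^(n-1); by rotational invariance take x = e_1,
# so <x, a_i> is the first coordinate of each normalized row
G = rng.standard_normal((m, n))
t = G[:, 0] / np.linalg.norm(G, axis=1)
t2 = np.sort(t ** 2)
empirical = float(t2[: int(alpha * m)].sum())   # best ||A_S x||^2 over |S| = alpha*m

beta = NormalDist().inv_cdf((1 + alpha) / 2)    # P(|gamma| <= beta) = alpha
phi = math.exp(-beta ** 2 / 2) / math.sqrt(2 * math.pi)
truncated_second_moment = alpha - 2 * beta * phi
predicted = (m / n) * truncated_second_moment
```

With these parameters the greedy choice of S matches the truncated-Gaussian prediction to within a few percent.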
There is an underlying vector x ∈ Rd all of whose entries are either −1, 0, 1 and
most of them are 0. In fact, we may assume that only a relatively small number is
±1. We would like to understand what x looks like but we only have access to
y = Ax + ω,
where A ∈ Rn×d is a random matrix filled with independent N (0, 1) Gaussians and
ω ∈ Rn is a random Gaussian vector.
It is not terribly difficult to see that if n is very, very large, then it is fairly easy
to reconstruct x. The question is: how small can you make n and still reconstruct
x with high likelihood? What is remarkable is that this is doable even when n is
smaller than the number of variables d. Ofir and I propose a fun algorithm: you
take random subsets of the n rows, then do a least squares reconstruction and then
average this over many random subsets. The method seems to differ from other
methods and does work rather well even when n is quite small.
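A minimal sketch of the subset-averaging idea (not the RLS algorithm itself; the sizes, sparsity, subset size and number of rounds are arbitrary choices of mine): averaging min-norm least-squares solutions over random row subsets makes the support stand out.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 30, 25                        # d variables, only n < d noisy measurements
support = np.array([3, 11, 19])      # hypothetical support (arbitrary choice)
x = np.zeros(d)
x[support] = [1.0, -1.0, 1.0]

A = rng.standard_normal((n, d))
y = A @ x + 0.01 * rng.standard_normal(n)

# average min-norm least-squares solutions over random subsets of the rows
T, subset_size = 300, 20
xhat = np.zeros(d)
for _ in range(T):
    S = rng.choice(n, size=subset_size, replace=False)
    sol, *_ = np.linalg.lstsq(A[S], y[S], rcond=None)
    xhat += sol / T

off = np.setdiff1d(np.arange(d), support)
mean_on = np.abs(xhat[support]).mean()    # noticeably larger...
mean_off = np.abs(xhat[off]).mean()       # ...than the off-support average
```

Each min-norm solution is roughly the projection of x onto a random row space, so the averaged estimate concentrates on the true support even though n < d.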
56
Update (Mar 2021). Ofir Lindenbaum and I found a tweak of the method which we
call RLS (Refined Least Squares for Support Recovery, arXiv, March 2021) which
leads to state-of-the-art results in many regimes. It seems very likely that we have
not yet fully exhausted the possibility of the method.
Part 7. Miscellaneous/Recreational
61. Geodesics on compact manifolds
This question is about whether the vector field V(x, y) = (√2, 1) on the two-
dimensional flat torus T2 has, in some sense, the best mixing properties. Let (M, g)
be a smooth, compact two-dimensional Riemannian manifold without boundary: let
x ∈ M be a particular starting point and let γ : [0, ∞) → M be a geodesic starting
in x (in some arbitrary direction; parametrized according to arclength).
For any ε > 0, we can define Lε as the smallest number such that
{γ(t) : 0 ≤ t ≤ Lε } is ε − dense on the manifold.
Put differently, Lε is how long we have to go along the geodesic so that it visits
every point on the manifold up to distance at most ε. Here’s the question: how
long does Lε have to be given ε? Since its ε−neighborhood is the entire manifold,
we expect Lε · ε ≳ vol(M ).
Problem. Suppose (M, g) has the property that there exists a
fixed geodesic such that
Lε ≤ c/ε
for one fixed universal c and all sufficiently small ε. What does this
tell us about (M, g)?
One example would be M = T2 with the canonical metric and the geodesic mov-
ing in a direction whose ratio of x and y−coordinates is badly approximable. It
seems reasonable to assume that one can glue many tori together but the question
is whether these are fundamentally the only type of examples. Does hyperbolicity
help?
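One can watch the phenomenon numerically on T² = [0, 1)²: march along the geodesic in direction (√2, 1), mark cells of an (ε/2)-grid as visited, and record the arclength at which every cell has been seen. This cell count is only a crude proxy for ε-density, and all discretization choices below are mine:

```python
import math

def L_eps(eps, direction=(math.sqrt(2.0), 1.0)):
    """Approximate length at which the geodesic from (0,0) in the given
    direction becomes eps-dense on the unit torus (cell-counting proxy)."""
    dx, dy = direction
    nrm = math.hypot(dx, dy)
    dx, dy = dx / nrm, dy / nrm          # unit speed
    N = int(math.ceil(2.0 / eps))        # N*N cells of side ~ eps/2
    seen = [[False] * N for _ in range(N)]
    unvisited = N * N
    step = eps / 4.0
    t = 0.0
    while unvisited > 0:
        x, y = (t * dx) % 1.0, (t * dy) % 1.0
        i, j = int(x * N), int(y * N)
        if not seen[i][j]:
            seen[i][j] = True
            unvisited -= 1
        t += step
    return t

# L_eps * eps should stay bounded for this badly approximable direction
values = [L_eps(e) * e for e in (0.2, 0.1, 0.05)]
```

The products Lε · ε stay of comparable size as ε shrinks, as the badly approximable slope √2 suggests.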
One would perhaps assume that in a very hyperbolic two-dimensional setting one
has something like
Lε ≲ log(1/ε)/ε

for fairly generic geodesics?
Update (Dec 2022). Apparently this type of property is known as the existence of
a ‘superdense’ geodesic, two very recent papers in this spirit are
• J. Beck and W. Chen, Generalization of a density theorem of Khinchin and
diophantine approximation
• J. Southerland, Superdensity and bounded geodesics in moduli space
The best known lower bound is due to Gaudio & Jaillet (Oper. Res. Lett., 2020)
and is β ≥ 0.6277. I proved (Adv. Appl. Prob.) that β ≤ β_{BHH} − 10^{−6} though,
if numerical evaluation of integrals is permissible, the improvement is a bit bigger.
Numerical experiments suggest that β ∼ 0.7. It seems like such a fundamental
question, it would be nice to understand this a bit better.
Update (Aug. 2020). Bade, Cui, Labelle, Li (arXiv, August 2020) have looked at
these types of sets in other settings as well. Lots and lots of structure!
Update (Dec. 2022). Andrei Mandelshtam (On fractal patterns in Ulam words,
arXiv:2211.14229 ) has found some highly intricate structure in Ulam words.
Figure 28. The set in R3 generated from (1, 0, 0), (0, 1, 0), (0, 0, 1)
(projected onto the plane that is orthogonal to (1, 1, 1)).
This sequence arose out of some fairly unrelated questions (that were further pursued
in a paper with X. Cheng and G. Mishne, J. Number Theory) but turned out
to be quite curious.
Theorem (S, Mathematics Magazine 2018). The function fn (x) has a strict local
minimum in x = p/q for all n ≥ q 2 .
The asymptotically sharp scaling is given by n ≥ (1 + o(1))q²/π. It’s not difficult
to see that fn grows like log n and thus f∞ does not exist. But as n becomes large,
there does seem to be some sort of universal function that emerges. Is it possible
to make some more precise statements about fn ?
Let Ω = [0, 1]d (presumably this holds on much more general domains, manifolds,
etc.) and let f : [0, 1]d → R denote a function with mean value 0. Then
µ = max(f, 0)dx and ν = max(−f, 0)dx
are two measures with the same total mass (since f has mean value 0). How much
does it cost to ‘transport’ µ to ν? If we assume that transporting an ε−unit of
measure distance D costs ε · D, then this naturally leads to the ‘Earth-Mover’
Wasserstein distance W1. The size of W1(µ, ν) depends on the function, of course.
Here’s a basic idea: if W1 (µ, ν) is quite small, then the transport is cheap. But if
the transport is cheap, then most of the positive part of f has to lie pretty close
to most of the negative part of f . But that should somehow force the zero set
{x : f (x) = 0} to have large (d − 1)−dimensional volume. In (Calc Var Elliptic
Equations, 2020) I proved in d = 2 dimensions, i.e. for f : [0, 1]² → R, that

W1(f+, f−) · H¹({x ∈ (0, 1)² : f(x) = 0}) ≳ ∥f∥²_{L¹} / ∥f∥_{L∞}.
This result is sharp. Amir Sagiv and I generalized this to higher dimensions (SIAM
J. Math. Anal). The currently sharpest form in higher dimensions is due to Carroll,
Massaneda & Ortega-Cerda (Bull. London Math. Soc.) and reads
W1(f+, f−) · H^{d−1}({x ∈ (0, 1)^d : f(x) = 0}) ≳_d ∥f∥_{L¹} · (∥f∥_{L¹}/∥f∥_{L∞})^{2−1/d}.
Here, it is not clear whether the power is optimal or not. Of course, for all these
inequalities it would also be interesting to have the same underlying thought ex-
pressed in other ways: certainly the idea behind these things can be expressed in
many different ways.
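A one-dimensional analogue is easy to test numerically (this 1D computation is my own illustration, not from the papers): for f(x) = sin(2πkx) on [0, 1], the transport cost is W1(f+, f−) = ∫|F| with F the antiderivative of f, the zero set has 2k points, and the product is independent of k:

```python
import math

def product(k, N=200000):
    """W1(f+, f-) * (#zeros) for f(x) = sin(2*pi*k*x) on [0,1], using the 1D
    formula W1 = int_0^1 |F(x)| dx with F(x) = (1 - cos(2*pi*k*x))/(2*pi*k)."""
    total = 0.0
    for i in range(N):                      # midpoint rule
        x = (i + 0.5) / N
        F = (1 - math.cos(2 * math.pi * k * x)) / (2 * math.pi * k)
        total += abs(F)
    W1 = total / N
    nzeros = 2 * k                          # zeros of sin(2*pi*k*x) in [0,1)
    return W1 * nzeros

products = [product(k) for k in (2, 5, 11)]
# each product equals 1/pi, independent of k, while ||f||_1^2/||f||_inf = 4/pi^2
```

More oscillation makes the transport cheaper at exactly the rate at which the zero set grows, which is the content of the inequality.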
Update (Nov. 2020). A sharp form of this principle has been established in
Fabio Cavalletti, Sara Farinelli, Indeterminacy estimates and the
size of nodal sets in singular spaces, arXiv:2011.04409
where L^{p,q} is the Lorentz space and W∞ the ∞−Wasserstein distance. This in-
equality is ‘almost’ (in a suitable sense) proven in ‘On a Kantorovich-Rubinstein
inequality’ (arXiv: Oct 2020). The most general question is whether there exist
inequalities of the type
| ∫_{[0,1]^d} f(x) dx − ∫_{[0,1]^d} f(x) dµ | ≤ c · ∥∇f∥_{L^{p,q}} · W_r(µ, dx),
Update (March 2022). The conjectured inequality has been established by Filippo
Santambrogio in the preprint ‘Sharp Wasserstein estimates for integral sampling
and Lorentz summability of transport densities’ (cvgmt: 5463 and Journal of Func-
tional Analysis).
where

θ2(q) = Σ_{k=−∞}^{∞} q^{(k+1/2)²} and θ3(q) = Σ_{k=−∞}^{∞} q^{k²},

with the identities

θ2(q) = (q², q²)_∞ · exp( −π²/(12 log q) + (log q)/12 + Σ_{k=1}^{∞} 1/(k sinh(π²k/log q)) )

and

θ3(q) = (q², q²)_∞ · exp( −π²/(12 log q) + (log q)/12 + Σ_{k=1}^{∞} (−1)^k/(k sinh(π²k/log q)) )

and

(q²; q²)_∞ = exp( −π²/(12 log(1/q)) − (1/2) log( log(1/q)/π ) + log(1/q)/12 − Σ_{k=1}^{∞} (1/k) · q̂^k/(1 − q̂^k) ),

where q̂ is an abbreviation for

q̂ = exp( −2π²/log(1/q) ).
For q ∈ (0, q0 ), one could probably establish it using a computer.
A famous inequality of Alon-Milman shows that already the first term of the sum
is essentially big enough provided one compensates for vertices of large degree: if
we denote the maximal degree by ∆, then
diam(G)² ≤ 2∆(log #V)² / λ2.
The question is now whether one can replace the dependency on ∆ and #V by
summing over the remainder of the spectrum.
Motivation. This inequality would imply that the notion of graph curvature
introduced in ‘Graph Curvature via Resistance Distance’ satisfies a Bonnet-Myers-
type inequality: if the curvature is bounded from below by K > 0, then
diam(G) ≤ c · K^{−1/2}.
Update (May 2023). Using an identity of McKay, we can rephrase the question in
terms of the average commute time. The commute time between two vertices i, j is
the expected time a random walk needs to go from i to j and then back to i. The
question is then whether
(1/|V|²) Σ_{v,w∈V} commute(v, w) ≳ (|E|/|V|) · diam(G)².
The inequality is sharp up to constants for cycle graphs Cn where the average
commute time is ∼ n2 . One of the reasons the inequality is interesting is that |E|
appears: adding more edges increases the global commute time.
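The cycle case can be checked exactly with the Laplacian pseudoinverse, via the standard identity commute(u, v) = 2|E| (L⁺_uu + L⁺_vv − 2L⁺_uv); n = 20 below is an arbitrary choice:

```python
import numpy as np

n = 20                                   # cycle graph C_n
L = 2 * np.eye(n)
for i in range(n):
    L[i, (i + 1) % n] -= 1
    L[i, (i - 1) % n] -= 1
Lp = np.linalg.pinv(L)                   # Laplacian pseudoinverse

E = n                                    # number of edges of C_n
dg = np.diag(Lp)
commute = 2 * E * (dg[:, None] + dg[None, :] - 2 * Lp)  # all pairwise commute times
avg = commute.mean()                     # average over ordered pairs (diagonal = 0)

rhs = (E / n) * (n // 2) ** 2            # (|E|/|V|) * diam(G)^2
ratio = avg / rhs
```

For C_n the average commute time comes out to exactly (n² − 1)/3, a small constant multiple of (|E|/|V|) · diam², which is the claimed sharpness up to constants.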
Update (July 2023). I asked the question on mathoverflow and Yuval Peres pro-
duced the following counterexample.
Let ℓ = ⌊√n/2⌋ and let G1, . . . , Gℓ be disjoint cliques of size ℓ. Let
K be a clique on the remaining n − ℓ² > n/2 nodes. Connect every
node in Gi to every node in Gi+1 for i < ℓ and connect every node
in Gℓ to every node in K. This defines a graph of diameter ℓ. For
i < ℓ/2 the effective resistance between a node v in Gi and a node
w in K is Θ(1/ℓ) so the commute time between v and w is Θ(ℓ³).
Consequently, the average commute time between all pairs is Θ(ℓ³)
which contradicts the conjectured inequality.
Update (June 2023). I asked whether anyone had any thoughts on Pasteczka’s
conjecture on mathoverflow, no replies.
Update (July 2023). Noah Kravitz and Mitchell Lee (‘Hermite–Hadamard in-
equalities for nearly-spherical domains’) just uploaded a short paper to the arXiv
that proves Pasteczka’s conjecture for domains sufficiently close to the ball.
Update (July 2023). Nazarov and Ryabogin prove Pasteczka’s conjecture in two
dimensions and disprove it in general in dimensions ≥ 3. This resolves Pasteczka’s
conjecture but, naturally, raises a new question (especially when combining it with
the inequality of Kravitz-Lee): when do domains admit a Hermite-Hadamard
inequality?
Department of Mathematics, University of Washington, Seattle
Email address: [email protected]