From Counting Blocks To The Lebesgue Measure, With An Application To The Allouche-Hu-Morin Limit Theorem On Block-Constrained Harmonic Series
JEAN-FRANÇOIS BURNOL
arXiv:2405.03625v1 [math.NT] 6 May 2024
Abstract. We consider the harmonic series $S(k) = \sum^{(k)} m^{-1}$ over the integers having
$k$ occurrences of a given block of $b$-ary digits, of length $p$, and relate them to certain
measures on the interval $[0, 1)$. We show that these measures converge weakly to $b^p$ times
the Lebesgue measure, a fact which allows a new proof of the theorem of Allouche, Hu,
and Morin which says $\lim S(k) = b^p \log(b)$. A quantitative error estimate will be given.
Combinatorial aspects involve generating series which fall under the scope of the Goulden-Jackson
cluster generating function formalism and the work of Guibas-Odlyzko on string
overlaps.
1. Introduction
Throughout this work, an integer $b > 1$ and a block of $b$-ary digits $w = d_1 \dots d_p$ having
length $|w| = p \ge 1$ are fixed.
Let $S_w(k) = \sum^{(k)} m^{-1}$ where the denominators are the positive integers whose (minimal)
representations as strings of $b$-ary digits contain exactly $k$ occurrences of the block $w$. The
finiteness of these sums will be reproven in the text body. For single-digit blocks $w$, i.e. $p = 1$, Farhi [8]
proved $\lim S_w(k) = b \log(b)$. Allouche, Hu, and Morin [4] (and [3] earlier for the $b = 2$ case)
have now extended this to all $p \ge 1$:
(1) $\lim_{k \to \infty} S_w(k) = b^p \log(b)$
The approach of [3, 4] exploits special properties of a certain rational function of a combinatorial
nature, which had been defined and studied for binary base already in [1, 2]. Due to the
possibilities of self-overlap of $w$, (1) for $p \ge 2$ seems indeed to be considerably deeper than for $p = 1$.
Using completely different tools, we confirm (1) in the following quantitative form:
This estimate is definitely not sharp; the author studied in [7] the situation for one-digit
blocks $w$, with $k$ fixed and $b \to \infty$, and the results suggest, for $|w| = 1$, that the distance of
$S_w(k)$ to $b \log(b)$ is of the order $b^{-2k}$ up to some factor depending on $b$ and the digit $w$.
Remark 1. The rules of [1, 13, 3, 4] for counting (overlapping) occurrences of w in an integer
n are special if w’s first digit is 0 and not all its digits are 0: the integer n is first extended
to the left with infinitely many zeros. This either will not modify the number of occurrences
or will increase it by exactly one unit. Due to this, our Sw (k) for such w’s is not the same
quantity as considered in [4]. With Sw (k) replaced by the one using the [4] conventions, a
modified upper bound holds: our method obtains Theorem 1 but with an extra b factor and the
requirement k ≥ 2.
In [5] where we investigated the $S_w(k)$’s for $w$ a single-digit block (Irwin series [14]), we
worked with formulae of the type
(2) $S_w(k) = \int_{[b^{-1}, 1)} \frac{\mu_k(dx)}{x}$
for some discrete measures $\mu_k$ on $[0, 1)$. In [6] this was extended to the $|w| = 2$ case. In [5, 6],
the integral formula (2) is subjected to transformations induced from recurrence properties
among the $\mu_k$’s, and then converted, using the moments of the $\mu_k$’s, into series allowing the
numerical computation of the $S_w(k)$’s.
Here, we will not transform the integral formula (2) any further but concentrate on what
can be derived directly from it as k goes to ∞. The essential fact which we establish is:
Theorem 2. The measures $\mu_k$ converge weakly to $b^p \, dx$, i.e. for any interval $I \subset [0, 1)$, there
holds $\lim_{k \to \infty} \mu_k(I) = b^p |I|$, where $|I| = \sup I - \inf I$ is the Lebesgue measure of $I$.
This explains the Farhi and Allouche-Hu-Morin theorem (1), in a manner very different
from the works of these authors. Further, for intervals $I = [t, u)$, with $t < u$ rational numbers
with powers of $b$ denominators (“b-imal numbers”), the sequence $(\mu_k(I))$ becomes constant
for $k$ large enough, which is what will give the quantitative estimate of Theorem 1.
The contents are as follows: the next section will explore underlying combinatorics and
obtain the core result for our aims, which is Theorem 3. A general framework of Goulden-
Jackson ([9], [10, §2.8]) could be applied to obtain the shape of certain generating series which
are parts of the input to Theorem 3, but we shall work from first principles. Although it is
not needed for our aims, a result of Guibas-Odlyzko [12] will also be proven.
Then in the last section we derive Theorems 1 and 2 as easy corollaries.
It has definite advantages in the study of the $S_w(k)$’s to work not only with integers but also
with strings (they may also be called “words”, but we will stick with “string”), i.e. elements
of $\mathcal{D} = \bigcup_{l \ge 0} D^l$, $D = \{0, \dots, b - 1\}$. So all strings considered here have a finite length.
Each such string of $b$-ary digits $X$ defines a non-negative integer $n(X)$ in the usual way
of positional left to right notation. We set $n(\emptyset) = 0$. Conversely each non-negative integer
$n$ has a minimal representation $X(n)$ as a string, all others being with prepended zeros. In
particular $X(0) = \emptyset$. The length of $X$ is denoted $|X|$ or $l(X)$.
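The correspondence between strings and integers can be made concrete with a minimal Python sketch. Strings are modeled here as tuples of digits; the function names `n_of_string`, `minimal_string` and `count_occurrences` are illustrative choices and not notation from the text, and occurrences of a block are counted with overlaps allowed.

```python
def n_of_string(X, b):
    """Integer n(X) read off a tuple X of base-b digits; n(()) = 0."""
    value = 0
    for d in X:
        if not 0 <= d < b:
            raise ValueError("digit out of range for this base")
        value = value * b + d
    return value


def minimal_string(n, b):
    """Minimal representation X(n): no leading zero, and X(0) is empty."""
    digits = []
    while n > 0:
        n, d = divmod(n, b)
        digits.append(d)
    return tuple(reversed(digits))


def count_occurrences(X, w):
    """Number of (possibly overlapping) occurrences of the block w in X."""
    p = len(w)
    return sum(1 for i in range(len(X) - p + 1) if X[i:i + p] == w)
```

Prepending zeros to a string does not change `n_of_string`, which is why only the minimal representation is canonical.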
Whenever we substitute $b^{-1}$ for $t$ in such a generating series, we say that we compute the
“total mass” of the objects counted by the series, i.e. each string $X$ is weighted as $b^{-|X|}$. For
example the strings of length $l$ have a total mass equal to 1 (also for $l = 0$; there is then only one
$X$, the empty string). Here is our main result:
Theorem 3. Let $s$ be any string of length $\ell$, and let $k_0(s) = \max k_w(sz)$ where the maximum
is taken over all strings $z$ of length $p - 1$ (with $p = |w|$). For each $k \ge 1 + k_0(s)$, the total
mass of the $k$-admissible strings having $s$ as prefix is equal to $b^p \cdot b^{-\ell}$. In particular for any
$k \ge 1$, $Z_w(k)(b^{-1}) = b^p$.
Note that $k_0(s) \le \ell$ in the above theorem so the conclusion applies to any $k > \ell$. As a
notational shortcut, in place of appending $(b^{-1})$ to express evaluation at $b^{-1}$, we sometimes
replace the letter $Z$ by $M$ (for “mass”), so the theorem says in particular $M_w(k) = b^p$.
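The statement of Theorem 3 can be probed numerically. The sketch below is not the method of the paper: it simply accumulates, length by length, the mass of the strings extending a prefix `s`, keeping track only of the last $p - 1$ digits and of the number of occurrences of `w` seen so far. The function name `prefixed_mass` and the truncation parameter `extra_len` are assumptions made for illustration, and the result is a floating-point approximation of the infinite sum.

```python
from collections import defaultdict


def count_occurrences(X, w):
    """Number of (possibly overlapping) occurrences of the block w in X."""
    p = len(w)
    return sum(1 for i in range(len(X) - p + 1) if X[i:i + p] == w)


def prefixed_mass(s, w, k, b, extra_len=400):
    """Approximate total mass of the strings with prefix s that contain
    exactly k occurrences of w, each string X weighing b**(-len(X)).
    Only extensions of s by at most extra_len digits are summed."""
    p = len(w)

    def tail(X):
        # Last p - 1 digits of X (all of X if it is shorter than that).
        return X[max(0, len(X) - (p - 1)):]

    # state maps (tail, occurrences so far) to the mass of the strings
    # of the current length that realize it.
    state = {(tail(s), count_occurrences(s, w)): float(b) ** (-len(s))}
    total = 0.0
    for _ in range(extra_len + 1):
        total += sum(m for (_, c), m in state.items() if c == k)
        nxt = defaultdict(float)
        for (t, c), m in state.items():
            for d in range(b):
                ext = t + (d,)
                # A new occurrence ends at the appended digit iff ext == w.
                c2 = c + (1 if ext == w else 0)
                if c2 <= k:
                    nxt[(tail(ext), c2)] += m / b
        state = nxt
    return total
```

For instance, with `b = 2`, `w = (1, 1)` and the empty prefix, `prefixed_mass((), (1, 1), 1, 2)` should be close to $b^p = 4$, and with the prefix `(1, 0, 1)` and `k = 2` close to $b^p \cdot b^{-3} = 0.5$.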
It has turned out that the arguments which allowed us to prove (3) also allow us to express
all $Z_w(k)$’s, $k \ge 1$, in terms of $Z_w(0)$. The existence of such expressions is related to the fact
that the double generating series
(4) $\sum_{k=0}^{\infty} \sum_{l=0}^{\infty} N_w(k, l) r^k t^l = \sum_{k=0}^{\infty} Z_w(k) r^k$
falls under the scope of a general theory of Goulden and Jackson (see [9], [10, §2.8]; see
also Exercise 14 from [15, Chap. 4] as an introduction) where a notion of cluster generating
function is fundamental. It would take us a bit too far to discuss the details of the notion here,
and it is not needed as we will obtain a complete description of the $Z_w(k)$ from first principles.
Guibas and Odlyzko define the auto-correlation polynomial of the block $w$ as follows:
$A_w = \sum_{0 \le i < p} c_i t^i$ with $c_i \in \{0, 1\}$ and $c_i = 1$ if the prefix of $w$ of length $p - i$ is identical with the
suffix of the same length, or equivalently if $i$ is a period of $w$, i.e., $d_j = d_{i+j}$ for $1 \le j \le p - i$
(the same authors have characterized in [11] all possible such $A_w$’s). For example if $p = 3$,
$w$ can be of three types: abc ($A_w = 1$), aba ($A_w = 1 + t^2$), or aaa ($A_w = 1 + t + t^2$). The
Guibas-Odlyzko formula is then:
(5) $Z_w(0) = \frac{A_w}{(1 - bt) A_w + t^p}$
The rationality of Zw (0) can also be obtained as a very special case of Proposition 4.7.6 of
[15] which itself is an application of the general transfer matrix method.
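As an illustration of formula (5), the following Python sketch computes the coefficients $c_i$ of the auto-correlation polynomial, expands the rational function as a power series, and compares the coefficients with a brute-force count of the strings avoiding $w$. The function names are hypothetical, blocks are modeled as tuples of digits, and the brute-force enumeration is only practical for small lengths.

```python
from itertools import product


def autocorrelation(w):
    """Coefficients [c_0, ..., c_{p-1}] of A_w: c_i = 1 exactly when the
    prefix of w of length p - i equals its suffix of the same length."""
    p = len(w)
    return [1 if w[:p - i] == w[i:] else 0 for i in range(p)]


def avoiding_counts_from_formula(w, b, max_len):
    """Power series coefficients of A_w / ((1 - b t) A_w + t^p) up to
    degree max_len, obtained from the linear recurrence Z * D = A."""
    p = len(w)
    A = autocorrelation(w) + [0]  # padded so that index p is valid
    # Denominator D = (1 - b t) A_w + t^p, a polynomial of degree <= p.
    D = [A[j] - (b * A[j - 1] if j >= 1 else 0) for j in range(p + 1)]
    D[p] += 1
    Z = []
    for l in range(max_len + 1):
        a_l = A[l] if l <= p else 0
        Z.append(a_l - sum(D[j] * Z[l - j] for j in range(1, min(l, p) + 1)))
    return Z


def avoiding_counts_brute_force(w, b, max_len):
    """Number of strings of each length 0..max_len with no occurrence of w."""
    p = len(w)
    return [
        sum(1 for X in product(range(b), repeat=l)
            if all(X[i:i + p] != w for i in range(l - p + 1)))
        for l in range(max_len + 1)
    ]
```

For `w = (1, 1)` in base 2 both functions return the shifted Fibonacci numbers 1, 2, 3, 5, 8, ..., in agreement with $A_w = 1 + t$ and $Z_w(0) = (1 + t)/(1 - t - t^2)$.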
It turns out that we don’t really need formula (5), but for completeness we will include a
proof.
Let’s start with proving that b−1 is always in the open disk of convergence. This is most
probably in the literature but we don’t know a reference.
Lemma 4. Each $Z_w(k)$ has a radius of convergence at least equal to $(b^p - 1)^{-1/p} > b^{-1}$. And
(6) $\sum_{0 \le j \le k} Z_w(j)(b^{-1}) \le p(k + 1) b^p$
Proof. Let’s give an upper bound for $\sum_{0 \le j \le k} N_w(j, l)$. We write $l = qp + r$ with $0 \le r < p$.
We take a string of length $l$ and consider it as $q$ successive blocks of length $p$ and $r$ extra
digits. If the string has at most $k$ occurrences of $w$, then in particular among the $q$ blocks of
length $p$, at most $k$ of them are equal to $w$ and the others must be distinct from $w$. Hence
(7) $\sum_{0 \le j \le k} N_w(j, l) \le \sum_{0 \le i \le k} \binom{q}{i} (b^p - 1)^{q-i} \times b^r$, $l = qp + r$
(Of course with $\binom{q}{i} = 0$ for $i > q$). So we obtain a majorant series equal to
(8) $\sum_{0 \le i \le k} \sum_{q=0}^{\infty} \binom{q}{i} (b^p - 1)^{q-i} t^{pq} \times \sum_{0 \le r < p} b^r t^r$
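The following worked computation is a sketch of how the radius of convergence and the bound (6) can be read off from the majorant series (8). It relies only on the negative binomial series $\sum_{q \ge 0} \binom{q}{i} x^{q-i} = (1 - x)^{-(i+1)}$ for $|x| < 1$; the auxiliary variable $x$ is introduced here for convenience and is not notation from the text.

```latex
% Inner sum of (8), with x = (b^p - 1) t^p and t^{pq} = t^{pi} t^{p(q-i)}:
\sum_{q=0}^{\infty} \binom{q}{i} (b^p - 1)^{q-i} t^{pq}
  = t^{pi} \bigl(1 - (b^p - 1) t^p\bigr)^{-(i+1)},
  \qquad |t| < (b^p - 1)^{-1/p}.
% At t = b^{-1} one has 1 - (b^p - 1) b^{-p} = b^{-p}, hence
\left. t^{pi} \bigl(1 - (b^p - 1) t^p\bigr)^{-(i+1)} \right|_{t = b^{-1}}
  = b^{-pi} \, b^{p(i+1)} = b^p .
% The last factor of (8) gives \sum_{0 \le r < p} b^r b^{-r} = p, so that
\sum_{0 \le j \le k} Z_w(j)(b^{-1})
  \le p \sum_{0 \le i \le k} b^p = p (k + 1) b^p .
```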
Proof. Consider the contribution to $S_w(k)$ by integers of a given length $l \ge 1$, i.e. those integers
in $[b^{l-1}, b^l)$ having $k$ occurrences of $w$. This contribution is bounded above by $b^{1-l} N_w(k, l)$,
hence $S_w(k) \le b Z_w(k)(b^{-1}) < \infty$.
We defined the generating series $Z_w(k)$. Let’s similarly define $Z_w(x, j, y)$ as the generating
series of the counts per length of $j$-admissible strings which start with a given block $x$ and
end with a given block $y$. We will here mainly use them for $x$ and $y$ being both of length
$p - 1$. So if $p = 1$, $x$ and $y$ are both the empty string and add no constraints. We also define
$Z_w(x, j)$ with only the condition to start with $x$. This is a shortcut for $Z_w(x, j, \emptyset)$. We will
also need $Z_w(\emptyset, j, y)$, which it would be problematic to denote as $Z_w(j, y)$ because
the two-argument $Z_w$ is already defined.
We let $u$ be the prefix (initial sub-block) $d_1 \dots d_{p-1}$ of $w$ and $v$ its suffix (terminating
sub-block) $d_2 \dots d_p$.
Using this lemma with $s = v$ the $(p - 1)$-suffix of $w$ we obtain in particular, for all $k \ge 0$:
(11) $Z_w(v, k) = \left( t^{2-p} Z_w(v, 0, u) \right)^k Z_w(v, 0)$
To shorten notations, when evaluating at $b^{-1}$, in place of appending $(b^{-1})$ we replace the
letter $Z$ by $M$ (for total mass).
The next lemma will imply that the constant value of $M_w(v, k)$, $k \ge 0$, is $b$.
Lemma 8. The following relations hold and show that knowing one of $Z_w(0)$, $Z_w(v, 0)$ or
$Z_w(v, 0, u)$ determines the other two. They also imply that $M_w(v, 0) = b$ and $M_w(v, 0, u) = b^{2-p}$.
Proof. One surely has $\mathrm{ord}_0(t^{2-p} Z_w(v, 0, u)) \ge 1$, so it is legitimate in the algebra of formal power
series to compute the sums of the equations (11) for all $k \ge 0$, to obtain the identity
(14) $\frac{t^{p-1}}{1 - bt} = \sum_{k=0}^{\infty} Z_w(v, k) = \frac{Z_w(v, 0)}{1 - t^{2-p} Z_w(v, 0, u)}$
hence (12). Let $x = dx'$ be a 0-admissible string, of length at least 1. If $d \ne d_1$ (where $d_1$ is
the first digit of $w$), then $x'$ is an arbitrary (possibly empty) 0-admissible string. If $d = d_1$
then $x = d_1 x'$ with $x'$ an arbitrary (possibly empty) 0-admissible string not starting with the
$(p - 1)$-suffix $v$ of $w$. Hence
Proof. Let $r(w)$ be the reversed block. Its $(p - 1)$-suffix is $r(u)$. We thus have $(1 - bt) Z_{r(w)}(0) =
1 - t Z_{r(w)}(r(u), 0)$. But mapping strings to their reversals, it is clear that $Z_{r(w)}(0) = Z_w(0)$,
and $Z_{r(w)}(r(u), 0) = Z_w(\emptyset, 0, u)$. Thus $Z_w(\emptyset, 0, u) = Z_w(v, 0)$.
which via equations (12) and (13) can be used to express Zw (k) in terms of Zw (0).
Proof. For any given block $s$ of length $p - 1$ we have from Lemma 6 $Z_w(s, k) = t^{2-p} Z_w(s, 0, u) \cdot
Z_w(v, k - 1)$. For $k \ge 1$, a $k$-admissible string $X$ necessarily has length at least $p$, hence it has
a prefix $s$ of length $p - 1$ available. So $Z_w(k)$ is the sum of the $Z_w(s, k)$ over all $s$ of length
$p - 1$. Similarly $\sum_{s, |s| = p-1} Z_w(s, 0, u) = Z_w(\emptyset, 0, u)$ because any string terminating in $u$ has
length at least $p - 1$ hence has a starting part $s$ of length $p - 1$ associated with it. Hence
and the formula for $Z_w(v, k - 1)$ is given by (11). And we know that $Z_w(\emptyset, 0, u) = Z_w(v, 0)$.
Let now $s$ be an arbitrary string and define $k_0(s) = \max_{|z| = p-1} k_w(sz)$ as in the theorem
statement. Suppose $k > k_0(s)$ and let $sx$ be an $s$-prefixed string which is $k$-admissible. Thus
$|x| \ge p$. Let us split $x$ into $zy$ with $z$ its prefix of length $p - 1$ (and $y$ necessarily not empty). If
$w$ occurs in $sx = szy$ it is either in $sz$ or in $zy$, so $k_w(sx) = k_w(sz) + k_w(zy)$. We can thus
partition the set of all $x$ such that $k_w(sx) = k$ according to the $b^{p-1}$ possible $z$’s. And we
obtain the identity associated to this partition:
(18) $k > k_0(s) \implies Z_w(s, k) = t^{|s|} \sum_{|z| = p-1} Z_w(z, k - k_w(sz))$
Using now the masses given in Lemma 8 (and thanks to $j_z = k - k_w(sz) > 0$) we obtain
(20) $Z_w(s, k)(b^{-1}) = b^{-|s|} \sum_{|z| = p-1} b^{p-2} \cdot Z_w(z, 0, u)(b^{-1}) \cdot 1^{j_z - 1} \cdot b$
We have explained earlier that $\sum_{|z| = p-1} Z_w(z, 0, u) = Z_w(\emptyset, 0, u) = Z_w(v, 0)$ and we know
that substituting $t = b^{-1}$ gives a mass equal to $b$. So $Z_w(s, k)(b^{-1}) = b^p b^{-|s|}$.
We will also need the following additional fact (for $k \ge 2$ it is a corollary of the Theorem
3 we just proved) which for $p = 2$ and $k = 1$ had been already observed in [6, §3]:
Proposition 11. For any digit $d \in D$, the total mass of the $k$-admissible strings starting with
$d$ is $b^{p-1}$ if $k \ge 1$. So the total mass of the $k$-admissible integers is $(b - 1) b^{p-1}$ for $k \ge 1$.
Proof. Let $d_1$ be the first digit of $w$. Let $d \ne d_1$ be another digit. Then $X = dx$ is $k$-admissible
if and only if $x$ is, hence the result in that case. Thus the $k$-admissible strings not starting with
$d_1$ make up for a total mass of $(b - 1) b^{-1} \cdot b^p$. This forces the $k$-admissible strings starting
with $d_1$ to have a total mass $b^{p-1}$ too (we have used that the empty string is not admissible
as $k \ge 1$).
We can deduce in a novel manner $M_w(v, k) = M_w(v, k - 1)$ hence Lemma 7 from this. The
above paragraph showed (without using any anterior result apart from finiteness) that the
mass of $k$-admissible strings starting with $d_1$ is $b^{-1}$ times $M_w(k)$. But $X = d_1 x$ is $k$-admissible
if either $x$ does not start with $v$ and is $k$-admissible, or $x$ starts with $v$ and is $(k - 1)$-admissible.
This gives a mass of $b^{-1}(M_w(k) - M_w(v, k)) + b^{-1} M_w(v, k - 1)$. But the final result has to
be $b^{-1} M_w(k)$. So $M_w(v, k) = M_w(v, k - 1)$.
In order to have a fully-rounded picture we now go the extra step of proving the Guibas-Odlyzko
formula (5). We can partition the set $A$ of 0-admissible strings $X$ according to the
value of $k(vX)$. We obtain a partition of $A$ into $A_0, \dots, A_{p-1}$ (some of these possibly empty,
of course). Suppose $X$ belongs to $A_j$, $j \ge 1$ and let $i \in \{1, \dots, p - 1\}$ be the index in $wX$,
counting from the left and starting at 0, of the last occurrence of $w$ in $wX$. We can think
of $i$ as how many times we need to shift $w$ one unit to the right in order to reach that final
occurrence of $w$ in $wX$. It is also the index in $vX$ of the last occurrence of $w$, but starting
the index count at 1, not 0. Recall in what is next that we write $w = d_1 \dots d_p$.
The string $X \in A_j$ thus starts with the last $i$ digits of $w$: $d_{p-i+1} \dots d_p$ is a prefix of $X$.
And it is necessary that $d_{p-i}$ be the first digit to the left of $X$ in $wX$, i.e. $d_{p-i} = d_p$, and
further $d_{p-i-1} = d_{p-1}$, etc... until $d_1 = d_{i+1}$. So the index $i \ge 1$ is necessarily a period of $w$.
We will now prove that $i$ is exactly the $j$th positive period of $w$ in increasing order.
Let $s_i$ be the length $i$ suffix of $w$, and $r_i$ the length $i$ prefix of $w$, for the period $i$. Then
$w s_i = r_i w$. We found that $X$ starts with $s_i$, $X = s_i Y$, so $wX = w s_i Y = r_i w Y$. We
have $k(wX) = k(r_i w Y) = k(r_i w) + k(wY) - 1 = k(r_i w) + k(vY)$. Thus $j + 1 = k(wX) =
k(r_i w) + k(vY)$. We defined $i$ as giving the position in $wX$ of the last occurrence of $w$, and the
$s_i$ in the decomposition $X = s_i Y$ is the suffix of this last occurrence of $w$. So no occurrence of
$w$ in $wX$ can touch $Y$, which means that $k(vY) = 0$ and thus $j + 1 = k(r_i w) = k(w s_i)$. This
characterizes $i$ as being the $j$th positive period of $w$, indeed we observe that for all smaller
periods $\iota$, their $w s_\iota$ is a prefix of $w s_i$. Let’s thus now denote our $i$ also as $i_j$.
For each string $X$, we define the b-imal number $x(X) \in [0, 1)$ simply by putting $X$ immediately
to the right of the b-imal separator, i.e. $x(X) = n(X)/b^{|X|}$. We define the measure $\mu_k$ as
the sum over all $k$-admissible strings of the Dirac masses at points $x(X)$ with weights $b^{-|X|}$.
In [5, §4] it is explained that as the discrete measure has finite total mass, one can use it to integrate
any bounded or any non-negative function (in the latter case possibly obtaining $+\infty$).
In particular, consider integrating the function $g(x)$ being defined as $1/x$ for $b^{-1} \le x < 1$
and zero elsewhere. The computation of $\mu_k(g)$ involves only those strings $X$ starting with a
non-zero $b$-ary digit (in particular the empty string which maps to the real number 0 is not
involved) hence contributions are in one-to-one correspondence with those positive integers
$m$ having exactly $k$ occurrences of the block of digits $w$. And the contribution of an integer
$m \in [b^{l-1}, b^l)$ is $(m/b^l)^{-1}$ (value of the function) times $b^{-l}$ (weight associated to the Dirac
point-mass). That gives exactly $m^{-1}$ hence formula (2) for $S_w(k)$.
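The bookkeeping behind formula (2) can be checked on finite truncations with exact rational arithmetic. In the sketch below, which is only an illustration with hypothetical function names, the first function sums $1/m$ over the admissible integers $m < b^L$, while the second integrates $g$ against the Dirac masses carried by the strings of length at most $L$ starting with a non-zero digit. The two truncations should agree term by term.

```python
from fractions import Fraction
from itertools import product


def count_occurrences(X, w):
    """Number of (possibly overlapping) occurrences of the block w in X."""
    p = len(w)
    return sum(1 for i in range(len(X) - p + 1) if X[i:i + p] == w)


def digits_of(m, b):
    """Minimal base-b digit tuple of a positive integer m."""
    digits = []
    while m > 0:
        m, d = divmod(m, b)
        digits.append(d)
    return tuple(reversed(digits))


def partial_sum_over_integers(w, k, b, max_len):
    """Sum of 1/m over 1 <= m < b**max_len with exactly k occurrences of w."""
    total = Fraction(0)
    for m in range(1, b ** max_len):
        if count_occurrences(digits_of(m, b), w) == k:
            total += Fraction(1, m)
    return total


def partial_integral_of_measure(w, k, b, max_len):
    """Integral of g(x) = 1/x on [1/b, 1) against the part of mu_k carried
    by the strings of length <= max_len starting with a non-zero digit."""
    total = Fraction(0)
    for l in range(1, max_len + 1):
        for X in product(range(b), repeat=l):
            if X[0] == 0 or count_occurrences(X, w) != k:
                continue
            n = 0
            for d in X:
                n = n * b + d
            point = Fraction(n, b ** l)    # x(X), which lies in [1/b, 1)
            weight = Fraction(1, b ** l)   # weight b^{-|X|} of the Dirac mass
            total += weight / point        # g(x(X)) times the weight
    return total
```

In base 2 with `w = (1,)` and `k = 1`, the admissible integers below $2^6$ are the powers of two 1, 2, ..., 32, so both functions return 63/32 for `max_len = 6`.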
This reasoning works for all k ≥ 0 but now we suppose k ≥ 1. Recall in the following that
p = |w|.
Let us first suppose k = 1. We obtain from (2) the bounds
Now, $\mu_1([b^{-1}, 1))$ is the total mass (i.e. each string $X$ weighs $b^{-|X|}$ if kept) of the 1-admissible
strings starting with a non-zero digit. By Proposition 11 this is $(b - 1) b^{p-1}$. We also have
(rather trivial and sub-optimal) bounds
(27) $b^p (1 - b^{-1}) \le \int_{b^{-1}}^{1} \frac{b^p \, dx}{x} \le b \cdot b^p (1 - b^{-1})$
Hence $|b^p \log(b) - S_w(1)| \le (b - 1)^2 b^{p-1}$. It wasn’t really optimal to replace $1/x$ either by 1 or
by $b$! So let’s make a finer decomposition along sub-intervals $I_a = [a/b, (a + 1)/b)$, $1 \le a < b$.
We know from Proposition 11 that $\mu_1(I_a) = b^{p-1}$ for each $a \in D$. Hence we can write
(28) $S_w(1) = \sum_{a=1}^{b-1} b^{p-1} y_a$, $(a + 1)^{-1} b \le y_a \le a^{-1} b$
And we also have regarding $\int b^p \, dx/x$:
(29) $b^p \log(b) = \sum_{a=1}^{b-1} b^{p-1} z_a$, $(a + 1)^{-1} b \le z_a \le a^{-1} b$
Hence
(30) $|S_w(1) - b^p \log(b)| \le b^p \sum_{a=1}^{b-1} \left( \frac{1}{a} - \frac{1}{a + 1} \right) = (b - 1) b^{p-1}$
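The telescoping sum in (30) and its gain over the coarse bound $(b - 1)^2 b^{p-1}$ can be verified with exact arithmetic. This is a small sanity check under the notation of the text; the function names `fine_bound` and `coarse_bound` are illustrative.

```python
from fractions import Fraction


def fine_bound(b, p):
    """Right-hand side of (30) before simplification:
    b^p times the telescoping sum of 1/a - 1/(a+1) for 1 <= a <= b - 1."""
    return b ** p * sum(Fraction(1, a) - Fraction(1, a + 1) for a in range(1, b))


def coarse_bound(b, p):
    """Bound (b - 1)^2 b^(p-1) obtained by replacing 1/x by 1 or by b."""
    return (b - 1) ** 2 * b ** (p - 1)
```

For `b = 10` and `p = 2` the fine bound equals $(b - 1) b^{p-1} = 90$ while the coarse bound is 810, so the finer decomposition gains a factor $b - 1$.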
all strings $s$ of length $k - 1$ and whose first digit is non-zero ($n(s)$ is the associated integer).
The value of $\mu_k(I(s))$ is the total mass of all $k$-admissible strings having $s$ as prefix (we used
that for example with $k = 4$, whatever $w$, 10 cannot contribute any mass to $[0.100, 0.101)$
despite $x(10)$ belonging to it). By Theorem 3, as $k$ is greater than the length of $s$, this total
mass is $b^p \cdot b^{-|s|} = b^{p-k+1}$. We can thus simultaneously write $S_w(k) = \sum_s b^{p-k+1} y_s$ with
$(n(s) + 1)^{-1} b^{k-1} \le y_s \le n(s)^{-1} b^{k-1}$ and $b^p \log(b) = \sum_s b^p \cdot b^{-k+1} z_s$ with $(n(s) + 1)^{-1} b^{k-1} \le$
$[t, u)$ must share the prefix $s$ which represents (perhaps necessarily with leading zeros) the
integer $n$ with $l$ digits. And Theorem 3 says that for $k > l$ the total mass of such $k$-admissible
strings with prefix $s$ is $b^{p-l}$. There remains to do case 1) i.e. singletons $\{x\}$. If $x$ is not b-imal
the measure vanishes. Else we need to check the contribution originating in a string $s$, either
empty or ending in a non-zero digit, and all its extensions $s0 \dots 0$. If $w$ isn’t a string of zeros,
for $k$ large enough nothing contributes and $\mu_k(\{x\}) = 0$. Else for $k \gg 1$, there is only one
choice, the one of length $k + c(s)$ for some constant, hence the total mass goes to zero.
The proof of Theorem 2 is thus complete.
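To illustrate Theorem 2 on intervals with b-imal end points, the sketch below approximates $\mu_k(I)$ for a cell $I = [n(s)/b^l, (n(s) + 1)/b^l)$ attached to a string $s$ of length $l$. It is an assumption-laden illustration, not the argument of the proof: the strings of length at least $l$ landing in $I$ are those with prefix $s$, handled by a truncated length-by-length accumulation, and the finitely many shorter strings landing in $I$ are obtained from $s$ by stripping trailing zeros. The names `prefixed_mass`, `mu_k_of_cell` and the parameter `extra_len` are hypothetical.

```python
from collections import defaultdict


def count_occurrences(X, w):
    """Number of (possibly overlapping) occurrences of the block w in X."""
    p = len(w)
    return sum(1 for i in range(len(X) - p + 1) if X[i:i + p] == w)


def prefixed_mass(s, w, k, b, extra_len=400):
    """Approximate mass of the strings with prefix s and exactly k
    occurrences of w, summing over at most extra_len appended digits."""
    p = len(w)

    def tail(X):
        # Last p - 1 digits of X (all of X if it is shorter than that).
        return X[max(0, len(X) - (p - 1)):]

    state = {(tail(s), count_occurrences(s, w)): float(b) ** (-len(s))}
    total = 0.0
    for _ in range(extra_len + 1):
        total += sum(m for (_, c), m in state.items() if c == k)
        nxt = defaultdict(float)
        for (t, c), m in state.items():
            for d in range(b):
                ext = t + (d,)
                c2 = c + (1 if ext == w else 0)  # new occurrence iff ext == w
                if c2 <= k:
                    nxt[(tail(ext), c2)] += m / b
        state = nxt
    return total


def mu_k_of_cell(s, w, k, b, extra_len=400):
    """Approximate mu_k of the cell [n(s)/b^l, (n(s)+1)/b^l), l = len(s)."""
    mass = prefixed_mass(s, w, k, b, extra_len)
    # Shorter strings X with x(X) in the cell: s with trailing zeros removed.
    t = s
    while t and t[-1] == 0:
        t = t[:-1]
        if count_occurrences(t, w) == k:
            mass += float(b) ** (-len(t))
    return mass
```

With `b = 2`, `w = (1, 1)` and `s = (1, 0)`, the cell is $[1/2, 3/4)$ and Theorem 3 predicts the value $b^{p-l} = 1$ as soon as $k > l$; the string `(1,)`, which also maps into the cell, only contributes for $k = 0$.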
References
[1] Allouche, J.-P., Shallit, J.O.: Infinite products associated with counting
blocks in binary strings. J. London Math. Soc. (2) 39(2), 193–204 (1989)
https://doi.org/10.1112/jlms/s2-39.2.193
[2] Allouche, J.-P., Hajnal, P., Shallit, J.O.: Analysis of an infinite product algorithm. SIAM
J. Discrete Math. 2(1), 1–15 (1989) https://doi.org/10.1137/0402001
[3] Allouche, J.-P., Morin, C.: Kempner-like harmonic series. (AMM, to appear) (2023).
https://arxiv.org/abs/2305.18180
[4] Allouche, J.-P., Hu, Y., Morin, C.: Ellipsephic harmonic series revisited (2024).
https://arxiv.org/abs/2403.05678
[5] Burnol, J.-F.: Measures for the summation of Irwin series (2024).
https://arxiv.org/abs/2402.09083
[6] Burnol, J.-F.: Summing the "exactly one 42" and similar subsums of the harmonic series
(2024). https://arxiv.org/abs/2402.14761
[7] Burnol, J.-F.: Un développement asymptotique des sommes harmoniques de Kempner-Irwin
(2024). https://arxiv.org/abs/2404.13763
[8] Farhi, B.: A curious result related to Kempner’s series. Amer. Math. Monthly 115(10),
933–938 (2008) https://doi.org/10.1080/00029890.2008.11920611
[9] Goulden, I.P., Jackson, D.M.: An inversion theorem for cluster decompositions of sequences
with distinguished subsequences. J. London Math. Soc. (2) 20(3), 567–576 (1979)
https://doi.org/10.1112/jlms/s2-20.3.567
[10] Goulden, I.P., Jackson, D.M.: Combinatorial Enumeration. Wiley-Interscience Series in
Discrete Mathematics, John Wiley & Sons, Inc., New York, (1983). With a foreword by
Gian-Carlo Rota, A Wiley-Interscience Publication
[11] Guibas, L.J., Odlyzko, A.M.: Periods in strings. J. Combin. Theory Ser. A 30(1), 19–42
(1981) https://doi.org/10.1016/0097-3165(81)90038-8
[12] Guibas, L.J., Odlyzko, A.M.: String overlaps, pattern matching, and
nontransitive games. J. Combin. Theory Ser. A 30(2), 183–208 (1981)
https://doi.org/10.1016/0097-3165(81)90005-4
[13] Hu, Y.: Patterns in numbers and infinite sums and products. J. Number Theory 162,
589–600 (2016) https://doi.org/10.1016/j.jnt.2015.09.025
[14] Irwin, F.: A Curious Convergent Series. Amer. Math. Monthly 23(5), 149–152 (1916)
https://doi.org/10.2307/2974352
[15] Stanley, R.P.: Enumerative Combinatorics. Volume 1, 2nd edn. Cambridge Studies in
Advanced Mathematics, vol. 49, Cambridge University Press, Cambridge, (2012)
Université de Lille, Faculté des Sciences et technologies, Département de mathématiques, Cité
Scientifique, F-59655 Villeneuve d’Ascq cedex, France
Email address: [email protected]