Pseudo-Random Number Generators
All content following this page was uploaded by Igor E. Shparlinski on 25 April 2016.
CHAPTER 13
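\[ \gamma_x = \sum_{i=0}^{m-1} a(mx+i)\, M^{-i-1}, \qquad x \in \mathbb{N}, \tag{13.1} \]
\[ \gamma_x = \sum_{i=0}^{m-1} a(x+i)\, M^{-i-1}, \qquad x \in \mathbb{N}, \]
or
\[ \gamma_x = \sum_{i=1}^{m} a(x+h_i)\, M^{-i}, \qquad x \in \mathbb{N}, \]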
for integers $h_1, \ldots, h_m$. Elements of the first sequence look more independent, so
that construction is more common (without any particular theoretical justification,
it must be admitted). The integer $M$ is called the modulus of the pseudo-random
number generator.
In order to use these numbers as pseudo-random numbers, which should as a
minimal requirement be uniformly distributed on the unit interval, information on
their period and their distribution on [0,1] (more generally, about the distribution
of the vectors $(\gamma_x, \ldots, \gamma_{x+s-1})$ in the $s$-dimensional unit cube) is needed. That is,
estimates are needed on their discrepancy. These questions have been addressed
in Chapter 3 and Section 7.2, respectively. Indeed, many of those results can be
directly applied to such pseudo-random number generators, although for m > 1
some adjustments are needed. On the other hand, there are some new aspects to
the problem of pseudo-random number generators. Previously the interest was in
general results which hold for all sequences from some wide class. As a result, these
estimates are not very strong. For pseudo-random numbers it would be enough to
prove stronger results for special sequences. Alternatively, it is sometimes useful to
Licensed to AMS.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
show that for 'almost all' members of some class of sequences (rather than for all)
some refinements can be obtained. Both kinds of results will be discussed below.
Typically, the modulus $M$ is chosen as a prime power $p^k$ with a special emphasis
on the case $M = 2^k$ (because of the connection with computing). The two extreme
cases of $M$ a large prime and $M = 2$ are of great interest as well, and deserve
special treatment.
For the parameter $m$, a reasonable choice is $m = 1$, so $\gamma_x = a(x)/M$. This
generator is the Tausworthe generator; it has the important advantage that all
the results about periods of (see Chapter 3) and distribution for (see Section 7.2)
linear recurrence sequences may be applied without any of the adjustments needed
for (13.1) with $m \ge 2$.
In any case, $m$ should usually not exceed the order of $a$, in order to have some
hope of independent digits of $\gamma_x$ (at least independent in the weakest sense of not
being obviously dependent). Clearly, if $t$ is the period of $a$, then the period of the
sequence $\gamma$ given by (13.1) is at least $\tau = t/\gcd(m,t)$.
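The block construction and the quantity $\tau = t/\gcd(m,t)$ can be checked numerically on a small example. The following Python sketch assumes the block form $\gamma_x = \sum_{i=0}^{m-1} a(mx+i) M^{-i-1}$ with $M = 2$ and uses the binary recurrence $a(x+4) = a(x+1) + a(x)$, of period $t = 15$, purely as an illustration. The helper names `lfsr`, `digital_multistep` and `least_period` are hypothetical and do not come from the text.

```python
from fractions import Fraction
from math import gcd

def lfsr(coeffs, init, p, length):
    """Terms of a(x+n) = sum_j coeffs[j] * a(x+j) (mod p), starting from init."""
    a = list(init)
    n = len(coeffs)
    while len(a) < length:
        a.append(sum(c * v for c, v in zip(coeffs, a[-n:])) % p)
    return a[:length]

def digital_multistep(a, m, M, count):
    """gamma_x = sum_{i<m} a(m*x+i) / M^(i+1): blocks of m terms read as base-M digits."""
    return [sum(Fraction(a[m * x + i], M ** (i + 1)) for i in range(m))
            for x in range(count)]

def least_period(seq):
    """Smallest T with seq[x+T] == seq[x] throughout the observed window."""
    for T in range(1, len(seq)):
        if all(seq[x] == seq[x + T] for x in range(len(seq) - T)):
            return T
    return None

t = 15                                    # period of a(x+4) = a(x+1) + a(x) over F_2
a = lfsr([1, 1, 0, 0], [1, 0, 0, 0], 2, 4 * 3 * t)
for m in (3, 4):
    gamma = digital_multistep(a, m, 2, 3 * t)
    # observed period of gamma next to tau = t / gcd(m, t)
    print(m, least_period(gamma), t // gcd(m, t))
```

In this example the observed period coincides with $\tau$ both for $m = 4$ (where $\gcd(m,t) = 1$) and for $m = 3$ (where $\gcd(m,t) = 3$).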
In a series of papers by Niederreiter the discrepancy of such sequences has been
estimated. His results are given below in a simplified form which can be extracted
from [930, Theorem 3.1, 3.2]. More explicit information about the constants may
be found in [930].
THEOREM 13.1. Let $M = p$ denote a prime, and let $a$ denote a linear recurrence
sequence of order $n$ and period $t$ over $\mathbb{F}_p$ with characteristic polynomial irreducible
over $\mathbb{F}_p$. If $\gcd(m,t) = 1$ and $sm \le n$, then for any $1 \le N \le t$, the $s$-dimensional
discrepancy of the sequence $(\gamma_x, \ldots, \gamma_{x+s-1})$, $x = 1, \ldots, N$, is
13.1. UNIFORMLY DISTRIBUTED PSEUDO-RANDOM NUMBERS
in polynomials $h_0, \ldots, h_{s-1}$ over $\mathbb{F}_p$. These numbers are called figures of merit of
$f$. Clearly $r_s(f,m,p) < n + 1$, and it is known that the bigger $r_s(f,m,p)$ is, the
smaller is the discrepancy $D_{s,m}(N)$. In fact, $r_s(f,m,p)$ appears in lower and upper
bounds for $D_s(N)$ as the term $p^{-r_s(f,m,p)}$ and $p^{-r_s(f,m,p)} \log^s t$ respectively (the
exact results may be found in [930, Sect. 4, 7] and [936, Chap. 9]).
It is also known that both requirements, a large period and a large figure of
merit, can be attained simultaneously. This result may be found in [930, Theorem 5.4].
THEOREM 13.2. For any positive integers $n$, $m$, $s$, $t$ such that $n$ is the multiplicative
order of $p$ modulo $t$, there exists an irreducible polynomial $f$ over $\mathbb{F}_p$ of degree $n$ and of
period $t$ such that
\[ r_s(f,m,p) > \log_p \frac{(m+1)(p-1)\varphi(t)}{(ms-1)(m+1-m/p)^s}. \]
To understand this result better, consider the case $t = p^n - 1$. Then there
exists a primitive polynomial $f$ with
\[ r_s(f,m,p) > n - \log_p \log(n \log p) - s \log_p(m+1) + O(1), \]
so choosing $m = n$ gives a sequence having discrepancy $D_s(N) = O(t^{1/2+\varepsilon} N^{-1})$
for any fixed $s$.
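The step from Theorem 13.2 to the estimate for $t = p^n - 1$ can be spelled out. The following is only a sketch: it assumes that the bound of Theorem 13.2 reads $r_s(f,m,p) > \log_p\bigl((m+1)(p-1)\varphi(t)/((ms-1)(m+1-m/p)^s)\bigr)$ with $ms \ge 2$, it treats $p$ and $s$ as fixed, and it uses the classical lower bound $\varphi(t) \gg t/\log\log t$, which is not stated in the text.

```latex
\begin{align*}
\log_p \varphi(t) &\ge \log_p t - \log_p \log\log t + O(1)
   \ge n - \log_p \log(n \log p) + O(1)
   && (t = p^n - 1,\ \log t < n \log p),\\
\log_p \frac{(m+1)(p-1)}{ms-1} &> \log_p \frac{p-1}{s} = O(1)
   && (ms - 1 < s(m+1)),\\
-\,s \log_p (m + 1 - m/p) &\ge -\,s \log_p (m+1).
\end{align*}
% Adding the three lines:
\[
  r_s(f,m,p) > n - \log_p \log(n \log p) - s \log_p (m+1) + O(1).
\]
```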
As mentioned in Section 9.2, the problem is related to continued fraction expansions
of rational fractions.
All the above estimates of discrepancy need the period to be large enough
to be non-trivial; it should certainly exceed $p^{n/2}$, and the length $N$ of sequence
considered should be at least of the same size. This second requirement is very
difficult to meet. Certainly if $M = p^k$ is a power of a fixed prime number, $m = 1$,
£
s — m then (7.8) gives the bound 0{t^ ) for the full period discrepancy of
$a(x)/p^k$. It is possible that a similar result can be obtained for any $ms \le n$, but
for incomplete periods the corresponding result has not been worked out yet, and
one cannot expect particularly strong estimates here.
The situation is quite different if one agrees to consider 'almost all' sequences
from $\mathcal{L}(f)$, rather than all of them. The following assertion is obtained in [1187] for
$m = 1$ (that is, for the sequence $\gamma_x = a(x)/p$). Levin [710] extends it to arbitrary
$m \le n$.
THEOREM 13.3. Let $f$ denote an irreducible polynomial modulo $p$ of period $t$.
Then for all but $o(p^n)$ sequences $a \in \mathcal{L}(f)$ the following holds. For any $1 \le N \le t$
the $s$-dimensional discrepancy of the sequence $\gamma_x$ given by (13.1) is
\[ O(N^{-1/2} \log^{n+3} p + p^{-m}), \]
where $s \le n/m$ and $\gcd(m,t) = 1$.
The upshot is that the bound holds for all $N$ for all sequences from the same
set of 'good' sequences. For every fixed $N$ such a result is quite simply obtained.
In [710] (but not in [1187]) the result is formulated for primitive polynomials only,
but certainly holds in the form given above.
Ambrosimov [28] obtains results for sequences in residue rings modulo $2^k$.
These concern the distribution of tuples
\[ (a(x), \ldots, a(x+s-1)) \]
(that is, the question discussed in Section 7.2) rather than the discrepancy of the
points
\[ (\gamma_x, \ldots, \gamma_{x+s-1}). \]
Nevertheless, they are in the same spirit, and in particular provide good estimates
which hold for almost all initial values, and certainly could be useful for analysis of
pseudo-random number generators from linear recurrence sequences (see also [661]).
Apparently, these are the simplest in a series of results which can be obtained using
similar methods.
In [326], [327], [329], [679] more general non-linear pseudo-random number
generators are analyzed 'on average'.
Niederreiter, in [936, Sect. 10.1] and in more detail in [939], proposes a multidimensional
matrix generalization of the pseudo-random number generators just
considered, that is, sequences of vectors satisfying linear recurrence equations
with matrix coefficients. He analyzes their periods, discrepancy, and other useful
properties. This work has been continued by Larcher [677].
Returning to the particular case of pseudo-random number generators discussed
above, the most popular, theoretically attractive and practically convenient are
the linear congruential generators, also known as the Lehmer generators. Such a
generator is a sequence $\gamma_x = a(x)/M$, where
\[ a(x) \equiv \lambda a(x-1) + \eta \pmod{M}, \qquad 0 \le a(x) < M, \quad x \in \mathbb{N}. \tag{13.2} \]
Here the initial value $a(0) = a$ and the multiplier $\lambda$ are integers co-prime to the
modulus $M$, and $\eta$ is an arbitrary integer. These generators, and discussions of
their features and applications, can be found everywhere from mathematics books
through computer science sources to programming manuals; [936], [619], [584],
respectively, are representatives of the literature.
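A generator of the form (13.2) takes only a few lines of code. The Python sketch below is a minimal illustration, not a recommended generator: the parameters $\lambda = 5$, $\eta = 3$, $M = 2^8$ and the initial value 1 are arbitrary illustrative choices, and the names `lcg` and `period` are hypothetical.

```python
def lcg(lam, eta, M, seed):
    """Yield a(0), a(1), ... for a(x) = lam * a(x-1) + eta (mod M), with 0 <= a(x) < M."""
    a = seed % M
    while True:
        yield a
        a = (lam * a + eta) % M

def period(lam, eta, M, seed):
    """Length of the cycle through seed (lam coprime to M makes the map a bijection)."""
    gen = lcg(lam, eta, M, seed)
    start = next(gen)
    for steps, value in enumerate(gen, start=1):
        if value == start:
            return steps

M = 2 ** 8
print(period(5, 3, M, 1))   # inhomogeneous generator, eta = 3
print(period(5, 0, M, 1))   # homogeneous generator, eta = 0
gammas = [a / M for a, _ in zip(lcg(5, 3, M, 1), range(5))]
print(gammas)               # first few values gamma_x = a(x) / M
```

For these particular parameters the two periods are $2^8$ and $2^6$ respectively.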
As discussed in Section 1.1, the parameter $\eta$ in (13.2) is a little arbitrary. For
$M = 2^k$, this parameter allows sequences of maximal period $2^k$ to be constructed
(period $2^{k-2}$ is maximal for sequences with $\eta = 0$). From now on only the homogeneous
linear congruential generator
\[ a(x) \equiv \lambda a(x-1) \pmod{M}, \qquad 0 < a(x) < M, \quad x \in \mathbb{N}, \tag{13.3} \]
will be considered. Thus $a(x) \equiv a\lambda^x \pmod{M}$, and
\[ \gamma_x = \left\{ \frac{a\lambda^x}{M} \right\}. \]
\[ \max_{\gcd(m,M)=1} \left| \sum_{x=1}^{N} \exp(2\pi i m \gamma_x) \right| \]
which does not depend on $s$ but does depend on all the other parameters. The
second depends on $s$, $\lambda$ and $M$, but does not depend on the initial value and the
length $N$ of the interval considered:
\[ \rho_s(\lambda, M) = \min (m_0 \ldots m_{s-1}), \]
where the minimum is taken over all non-trivial solutions of the congruence
\[ m_0 + m_1 \lambda + \cdots + m_{s-1} \lambda^{s-1} \equiv 0 \pmod{M}, \tag{13.4} \]
(13.6) p.{X,p) » ^ ^ .
In fact such an inequality holds for almost all $\lambda$. The fact that almost all $\lambda$ are of
large period modulo $p$ gives the following statement (cf. [1194], [710]).
THEOREM 13.4. Let $p$ denote a prime and $s \ge 1$ an integer. Then for all but
$o(p^2)$ pairs $(a, \lambda)$, $1 \le a, \lambda \le p-1$, of initial values and multipliers, the $s$-dimensional
discrepancy of the sequence (13.1) is D8(N) = O I J V - ^ V ) .
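For small moduli the quantity $\rho_s(\lambda, M)$ attached to the congruence $m_0 + m_1\lambda + \cdots + m_{s-1}\lambda^{s-1} \equiv 0 \pmod M$ can be found by exhaustive search. The Python sketch below rests on two assumptions that are not taken from the text: each solution is weighted by the product $\prod \max(1, |m_i|)$, and the search is restricted to $|m_i| \le M/2$. The function name `figure_of_merit` is hypothetical.

```python
from itertools import product

def figure_of_merit(lam, M, s):
    """Brute-force min of prod max(1, |m_i|) over non-zero (m_0..m_{s-1}) solving the congruence."""
    B = M // 2
    powers = [pow(lam, i, M) for i in range(s)]
    best = None
    for m in product(range(-B, B + 1), repeat=s):
        if not any(m):
            continue                        # skip the trivial solution
        if sum(mi * li for mi, li in zip(m, powers)) % M == 0:
            r = 1
            for mi in m:
                r *= max(1, abs(mi))
            if best is None or r < best:
                best = r
    return best

print(figure_of_merit(1, 31, 2))    # lam = 1 admits the solution (1, -1)
print(figure_of_merit(12, 31, 2))
```

The search costs about $M^s$ steps, so it is only meant for experimenting with very small $M$ and $s$.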
From the computational point of view it would be very desirable to find a
similar result for arbitrary $M$, especially for $M = 2^k$ and other prime powers (in
addition to the computational simplifications, in this case typically there are better
bounds for exponential sums, as shown in Section 5.4). Unfortunately, the proof
uses a bound on the number of solutions of polynomial congruences of degree $n$ with
coefficients co-prime to $M$ which can be as large as $M^{1-1/n}$ for highly composite
$M$: the simplest example is $\lambda^n \equiv 0 \pmod{2^k}$ or $(\lambda - 1)^n \equiv 0 \pmod{2^k}$. As a
result, for arbitrary $M$ all that can be said is that for almost all $\lambda$,
\[ \rho_s(\lambda, M) > M^{1/(s-1)-\varepsilon}, \qquad s \ge 2, \tag{13.7} \]
which is much worse than (13.6) for $s \ge 3$. In [629] an 'on-average' estimate
\[ \frac{1}{2^{k-1}} \sum_{\substack{\lambda = 1 \\ \gcd(\lambda, 2) = 1}}^{2^k} \frac{1}{\rho_s(\lambda, M)} \gg M^{-1/(s-1)} \]
is found (notice that $1/\rho_s(\lambda, M)$ rather than $\rho_s(\lambda, M)$ appears in (13.5)). This
method, leading to the bound (13.6), which is almost as good as possible for $M = p$,
cannot produce anything better than (13.7) for arbitrary $M$. Of course, a weak
average bound does not preclude the possibility of there being a good multiplier for
such moduli, so it still makes sense to show that (13.7) can be refined for some $\lambda$,
and even to try to describe such $\lambda$.
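The obstruction for $M = 2^k$ mentioned above, namely that a congruence such as $\lambda^n \equiv 0 \pmod{2^k}$ can have about $M^{1-1/n}$ solutions, is easy to observe directly. The Python sketch below simply counts solutions; the values $n = 3$, $k = 6$ and the name `count_roots` are illustrative choices.

```python
def count_roots(n, k, shift=0):
    """Number of lam in [0, 2^k) with (lam - shift)^n = 0 (mod 2^k)."""
    M = 2 ** k
    return sum(1 for lam in range(M) if pow(lam - shift, n, M) == 0)

n, k = 3, 6
M = 2 ** k
# number of roots of lam^n and of (lam - 1)^n, next to M^(1 - 1/n)
print(count_roots(n, k), count_roots(n, k, shift=1), round(M ** (1 - 1 / n)))
```

When $n$ divides $k$ the count is exactly $M^{1-1/n}$, since the solutions are the multiples of $2^{k/n}$ (shifted by 1 in the second case).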
The case $s = 2$ has been studied independently by many authors (in the different
context of Korobov's optimal coefficients [634], [925], [936]). Notice that for
$s = 2$ (13.6) and (13.7) are not much different. The most convenient tool in this
case is the formula (9.9). In the Fibonacci case, (9.10) applies, but there is still
an open question about the period of $f(h)$ modulo $f(h+1)$. Modulo an arbitrary
fixed $M$, a desirable choice for $\lambda$ has $\lambda/M \approx (5^{1/2} - 1)/2 = [1, 1, 1, \ldots]$. This is
quite a naive approach, and does not guarantee that all the partial quotients of
$\lambda/M$ will be small. Nevertheless, in practical computations this has been successfully
used, and this procedure is presented in many standard numerical analysis
recipes. See also [3] for explicit expressions for the 2-dimensional discrepancy of
linear congruential generators.
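The partial quotients of $\lambda/M$ for the 'golden ratio' choice of $\lambda$ can be inspected with the Euclidean algorithm. The Python sketch below takes $\lambda = \lfloor M(5^{1/2}-1)/2 \rfloor$, which is one possible reading of $\lambda/M \approx (5^{1/2}-1)/2$, and $M = 2^{32}$ as an illustrative modulus; the names `partial_quotients` and `golden_multiplier` are hypothetical.

```python
from math import isqrt

def partial_quotients(num, den):
    """Continued fraction [a0; a1, a2, ...] of num/den via the Euclidean algorithm."""
    quotients = []
    while den:
        q, r = divmod(num, den)
        quotients.append(q)
        num, den = den, r
    return quotients

def golden_multiplier(M):
    """floor(M * (sqrt(5) - 1) / 2), computed in exact integer arithmetic."""
    return (isqrt(5 * M * M) - M) // 2

M = 2 ** 32
lam = golden_multiplier(M)
pq = partial_quotients(lam, M)
print(lam)
print(pq, max(pq))   # inspect how large the partial quotients actually become
```

For a ratio of consecutive Fibonacci numbers the expansion is as regular as possible; for instance `partial_quotients(8, 13)` gives `[0, 1, 1, 1, 1, 2]`.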
For $s = 3$, Larcher and Niederreiter [678] prove that for $M = p^k$, a power of a
fixed prime number, there is a $\lambda$ which is a primitive root modulo $p^k$ if $p \ge 3$ and
$\lambda \equiv 5 \pmod{8}$ if $p = 2$ (thus of period $2^{k-2}$ modulo $2^k$) with the property that
P3(A,M)>-^-.
log M
The proof is based on a detailed study of the structure of solutions of quadratic
congruences. Although it is possible that the method can be extended to higher
dimensions, the technical complications will increase with each step, and it is not
clear how to carry out this approach for arbitrary s.
The following completely explicit construction of [1189] gives an estimate
weaker than (13.6) but stronger than (13.7); however it works for any $M$. If $\lambda$
is defined by the congruence $\lambda r \equiv \vartheta \pmod{M}$, where $1 \le \vartheta < r < M$ are arbitrary
integers with $\gcd(\vartheta r, M) = 1$ and $\vartheta \sim r \sim M^{1/(s+1)}$, then
\[ \rho_s(\lambda, M) \ge \min\{\vartheta r,\ M r^{-s+1}\} \sim M^{2/(s+1)}. \tag{13.8} \]
If in the previous construction one takes # ~ r ~ (M/2s)1^s then
UJS(\, M) > min{(tf + r ) / , s ^ ^ A f r " ^ } ~ 2 / ( 2 s ) - 1 / 2 s M 1 / s .
2 2 1 2 1 1 2
This bound is tight, as for any $\lambda$ of the same order (at least for $s$ fixed) both
meet the upper bound $\omega_s(\lambda, M) \le \gamma(s) M^{1/s}$, where $\gamma(s)$ is the Hermite constant
(see [619, Sect. 3.3.4], [256]).
Similar problems arise in cryptography. Let $W_s(\delta, M)$ denote the size of the
set of $\lambda$, $1 \le \lambda \le M$, with $\omega_s(\lambda, M) \le M^{\delta}$. For $s = 3$, the bound
\[ W_3(\delta, M) = O(M^{1/2 + 3\delta/2 + \varepsilon}) \]
is given in [415]. For the special case of $M = p^k$ a further improvement to (13.8)
is possible. The following result is taken from [629].
THEOREM 13.5. Let $p$ denote a fixed prime and let $\vartheta$, $1 < \vartheta < p^2$, denote a
primitive root modulo $p^2$ if $p \ge 3$, and $\vartheta = 3$ if $p = 2$. Set
\[ \lambda \equiv (p^t + \vartheta)(p^t + 1)^{-1} \pmod{p^k}, \]
where $t = 2 \lfloor k/(2s+1) \rfloor$. Then, for sufficiently large $M = p^k$,
\[ \rho_s(\lambda, M) \gg M^{4/(2s+1)}, \]
and the period of $\lambda$ modulo $M$ is at least $M/4$.
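The multiplier of Theorem 13.5 is explicit enough to compute. The Python sketch below assumes the reading $\lambda \equiv (p^t + \vartheta)(p^t + 1)^{-1} \pmod{p^k}$ with $t = 2\lfloor k/(2s+1) \rfloor$, and checks the period statement for two small illustrative cases ($k = 10$, $s = 2$, with $p = 2$, $\vartheta = 3$ and with $p = 3$, $\vartheta = 2$). The function names are hypothetical, and the modular inverse via `pow(x, -1, M)` needs Python 3.8 or later.

```python
def theorem_multiplier(p, theta, k, s):
    """lam = (p^t + theta) * (p^t + 1)^(-1) mod p^k, with t = 2 * floor(k / (2s + 1))."""
    M = p ** k
    t = 2 * (k // (2 * s + 1))
    return (p ** t + theta) * pow(p ** t + 1, -1, M) % M

def multiplicative_order(lam, M):
    """Smallest e >= 1 with lam^e = 1 (mod M); lam must be coprime to M."""
    e, value = 1, lam % M
    while value != 1:
        value = value * lam % M
        e += 1
    return e

for p, theta in ((2, 3), (3, 2)):
    k, s = 10, 2
    M = p ** k
    lam = theorem_multiplier(p, theta, k, s)
    # multiplier, its period modulo M, and the threshold M / 4
    print(p, lam, multiplicative_order(lam, M), M // 4)
```

In both cases $\lambda \equiv \vartheta \pmod{p^2}$ (respectively modulo 8 for $p = 2$), which is what forces the large period.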
span the vector space $\mathbb{F}_p^s$. As seen in Chapter 3, for several non-linear generators
the maximal period $t = p$ can be guaranteed. Such a generator can be considered
as a map $\mathbb{F}_p \to \mathbb{F}_p$, so there is a unique polynomial $G \in \mathbb{F}_p[X]$ with degree
$D = \deg G < p$ such that $a(x) = G(x)$, $x = 1, \ldots, p$ ($G$ must be a permutation
polynomial over $\mathbb{F}_p$, that is, a polynomial inducing a bijection). The parameter $D$ is
an important characteristic of the sequence, see [935, Sect. 3] or [936, Sect. 8.1].
In particular, the sequence passes the $s$-dimensional lattice test for all $s \le D$. A
lower bound on $D$ is given in [942], which immediately yields the following result.
THEOREM 13.6. Let $p$ denote a prime. Then a non-linear generator of maximal
period $t = p$, given by (3.6), with a polynomial $F \in \mathbb{F}_p[X]$ of degree $d \ge 2$, and
$M = p$, passes the $s$-dimensional lattice test for all $s \le \lceil p/d \rceil$.
It is also known [237], [392] that an inversive generator (3.8) (of maximal
period) passes the $s$-dimensional lattice test for all $s \le (p+1)/2$. A close look at
Theorem 13.6 shows that this test is not very dependable. It seems plausible that
there are sequences which can only be represented by a polynomial of high degree
and which nonetheless have bad distribution properties. Such sequences do indeed
exist; see a discussion in [935, Sect. 3].
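The degree $D$ of the polynomial $G$ representing a sequence of period $p$ can be computed by interpolation over $\mathbb{F}_p$. The Python sketch below uses the identity that $1 - (X - c)^{p-1}$ equals 1 at $X = c$ and 0 at every other point of $\mathbb{F}_p$. The input is a list `values` with `values[c]` equal to the sequence term at the residue $c$ (the index $x = p$ corresponding to $c = 0$); the name `interpolation_degree` and the test maps are illustrative assumptions.

```python
from math import comb

def interpolation_degree(values, p):
    """Degree D of the unique G in F_p[X], deg G < p, with G(c) = values[c] for all c."""
    coeffs = [0] * p
    for c, v in enumerate(values):
        # add v * (1 - (X - c)^(p-1)), expanded with the binomial theorem
        coeffs[0] = (coeffs[0] + v) % p
        for j in range(p):
            term = comb(p - 1, j) * pow(-c, p - 1 - j, p)
            coeffs[j] = (coeffs[j] - v * term) % p
    nonzero = [j for j, cj in enumerate(coeffs) if cj]
    return max(nonzero) if nonzero else 0

p = 7
inversion = [pow(x, p - 2, p) for x in range(p)]   # x -> 1/x, with 0 -> 0
print(interpolation_degree(inversion, p))          # degree of the inversion map
print(interpolation_degree([(2 * x + 1) % p for x in range(p)], p))
```

The inversion map has degree $p - 2$, the largest possible for a permutation polynomial, while a linear map has $D = 1$.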
Motivated by applications to pseudo-random number generators, Anashin [30]
describes polynomials for which the sequences satisfying the congruence (3.6) with
$M = p^k$ a prime power are uniformly distributed modulo $p^k$ (that is, they generate
a one-to-one mapping in the residue ring $\mathbb{Z}/p^k\mathbb{Z}$). Uniform distribution in the ring
of $p$-adic integers $\mathbb{Z}_p$ is also considered.
So far the continuous aspect of the problem, dealing with numbers distributed
in [0,1], has been considered. The discrete aspect of the problem relates to binary
sequences, also known as sequences of pseudo-random bits. For an infinite sequence
$(\delta_x)$ of $g$-ary digits, $0 \le \delta_x \le g - 1$, and a finite $g$-ary string $S = (d_1 \ldots d_k)$, denote by
$F(S, N)$ the number of occurrences of this string in the initial segment of length $N$,
and by $f(S, N) = F(S, N)/N$ the corresponding frequencies. The sequence $(\delta_x)$ is
called a pseudo-random sequence of $g$-ary digits if for any string $S$, $f(S, N) \to g^{-|S|}$
as $N \to \infty$, where $|S|$ is the length of $S$. As shown in Section 8.1, such sequences
correspond to $g$-ary expansions of numbers normal to base $g$, and such $g$-ary sequences
are also known as normal sequences of signs. Postnikov [1045], [1046]
provides a good outline of early approaches to such sequences and their applications.
He also discusses various generalizations of this notion of normality such
as Bernoulli normal sequences, Markov normal sequences and normal continued
fractions. A construction of a Markov normal sequence with a very small discrepancy
(appropriately defined) is due to Levin [712]. Such discrete pseudo-random
sequences (finite or infinite) are of great importance in theoretical computer science
[1263]. They may also be used to rule out random walks on lattices, and
Postnikov [1044] notices that this leads to an algorithm for solving finite-difference
equations arising from the Dirichlet problem in the theory of partial differential
equations.
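The frequencies $f(S, N)$ are straightforward to compute for a concrete digit sequence. The Python sketch below counts the occurrences of $S$ that fit entirely inside the first $N$ digits (one possible convention for the boundary) and applies this to the binary Champernowne sequence $1, 10, 11, 100, \ldots$, which is known to be normal to base 2 and serves here only as an illustrative input. The function names are hypothetical.

```python
def frequency(digits, S, N):
    """f(S, N) = F(S, N) / N, counting occurrences of S inside the first N digits."""
    k = len(S)
    S = list(S)
    F = sum(1 for x in range(N - k + 1) if digits[x:x + k] == S)
    return F / N

def champernowne_bits(N):
    """First N binary digits of the concatenation 1, 10, 11, 100, ..."""
    bits, n = [], 1
    while len(bits) < N:
        bits.extend(int(b) for b in bin(n)[2:])
        n += 1
    return bits[:N]

N = 100_000
bits = champernowne_bits(N)
for S in ([0], [1], [0, 1], [1, 1, 0]):
    # observed frequency next to the limiting value g^(-|S|) with g = 2
    print(S, frequency(bits, S, N), 2 ** -len(S))
```

Convergence for this particular sequence is slow, so the observed frequencies at moderate $N$ should not be expected to match $2^{-|S|}$ closely.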
There is, however, one minor logical irritation here. Theorem 8.5 says that
there are numbers which are normal to some base $g$ but are not normal to some
other base $f$. If such $g$-ary sequences are identified with the corresponding number
\[ \vartheta = \sum_{h=1}^{\infty} \delta_h g^{-h} \]
which they represent, then the property of being pseudo-random is therefore not
invariant under change of base. On the other hand, it seems reasonable that being
pseudo-random should be an intrinsic property of $\vartheta$. This question is discussed by
Calude and Jürgensen [196], who show how to change the definition of pseudo-random
sequence (random in their terminology) in order to make it invariant under
change of base. A dynamical overview of randomness for $g$-ary sequences may be
found in [1340].
Certainly, the smaller the difference $|f(S, N) - g^{-|S|}|$ can be made, the better;
this size is related to the discrepancy of $(\vartheta g^x)$. This relation is not very direct,
and the sequences having the best bounds for the frequencies do not necessarily
correspond to values of $\vartheta$ with the best value of the discrepancy. Indeed, [1185]
gives a construction of a sequence with
\[ f(S, N) = g^{-|S|} + O(N^{-1+\varepsilon}) \]
for any finite string $S$. The result of Levin [713] yields a slight improvement of this
asymptotic formula. On the other hand, the long standing record of Korobov [635]
and Levin [706] giving explicit constructions of $\vartheta$ such that the discrepancy of $(\vartheta g^x)$
is $O(N^{-2/3+\varepsilon})$, has recently been bettered: Levin [714] gives a construction which
leads to the bound $O(N^{-1} \log^2 N)$.
Frequencies $f(S, N)$ can also be considered for finite sequences, and here the
theory of linear recurrence sequences is of great use. For example, if $g = q$ is a prime
power, Theorems 7.1 and 7.2 are nothing but bounds for $f(S, N)$; see also [661],
[926].
Levin [712] also gives a construction of a Markov normal number.
A different point of view on pseudo-random bits obtained from linear recurrence
sequences and their applications is pursued in [26]. Results of these papers are also
used later in [258] to study the statistical properties of shrinking two M-sequences
over $\mathbb{F}_2$, when the sequences $a$ and $s$ are selected at random from the sets of M-sequences
of orders $n$ and $m$. The results of Section 7.1 can be used to obtain
some individual estimates as well. It is shown in [258] that if both $a$ and $s$ are
M-sequences over $\mathbb{F}_2$, and the characteristic polynomial of $a$ is chosen with uniform
probability among all the $\varphi(2^n - 1)/n$ primitive polynomials of degree $n$ over $\mathbb{F}_2$,
then the shrunken sequence $(a_s(x))$ has good statistical properties.
Another important question concerns the output rate of $(a_s(x))$, or equivalently
the upper bound for $h_x$, the position of the $x$th 1 in $s$ (see Chapter 4). It is shown
in [258] that if the characteristic polynomial of the selector sequence $s$ is chosen
with uniform probability among all the $\varphi(2^m - 1)/m$ primitive polynomials of degree
$m$ over $\mathbb{F}_2$, then $h_x = O(x)$. A very natural question is whether it is possible in these
two statements to get individual estimates (rather than estimates which hold with
high probability). An encouraging remark in this direction is that Theorem 14.8
below implies that $h_x = O(x)$ for any $x \ge 2^{\varepsilon m}$. This and several similar results
about the output rate and the distribution of the shrinking generator from two
linear recurrence sequences are presented in [1205]. All these properties make such
sequences very attractive for applications. Moreover, (4.1) shows the shrunken
sequence is of exponential linear complexity (see Section 13.2 for the definition),
which is especially important for applications in cryptography.
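The shrinking operation itself is very simple to state in code. The Python sketch below assumes the usual description (the term $a(x)$ is kept exactly when the selector bit $s(x)$ equals 1) and a 1-based numbering of positions for $h_x$; the precise definitions of Chapter 4 are not reproduced here, and the short input sequences are arbitrary illustrations rather than M-sequences.

```python
def shrink(a, s):
    """Shrunken sequence: keep a(x) exactly when the selector bit s(x) equals 1."""
    return [ax for ax, sx in zip(a, s) if sx == 1]

def positions_of_ones(s):
    """h_1 < h_2 < ...: 1-based positions of the ones in the selector sequence s."""
    return [x for x, sx in enumerate(s, start=1) if sx == 1]

a = [1, 0, 1, 1, 0, 0, 1]
s = [0, 1, 1, 0, 1, 0, 1]
print(shrink(a, s))            # terms of a selected by s
print(positions_of_ones(s))    # h_x; the output rate is governed by how fast h_x grows
```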
Finally, one of the closest relatives of pseudo-random numbers are so-called
quasi-random numbers or points. The main difference is: Instead of seeking
1-dimensional sequences $(\gamma_x)$ for which the $s$-tuples $(\gamma_x, \ldots, \gamma_{x+s-1})$ admit a good
13.2. PSEUDO-RANDOM NUMBER GENERATORS IN CRYPTOGRAPHY
with the initial values ai(l) = a2(l) = 0, ai(l) = ^2(1) = 2, are identical. Boyar
proves that there is a prediction procedure of polynomial complexity $(n \log M)^{O(1)}$
which makes at most
2n + 3 + log n -h | n log n + n log M
mistakes. Several related results have been obtained by Joux and Stern [565].
Clearly one cannot guarantee to do this with fewer than $n - 1$ mistakes.
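A much simpler situation than the one treated by Boyar already shows why such generators are predictable. The Python sketch below assumes that the modulus $M$ is known and prime, and recovers the multiplier and the shift of a first-order generator $a(x) \equiv \lambda a(x-1) + \eta \pmod M$ from three consecutive values by solving two linear congruences. It is not the prediction procedure of [138]; the parameters and the name `recover_lcg` are illustrative.

```python
def recover_lcg(a0, a1, a2, M):
    """Solve a1 = lam*a0 + eta, a2 = lam*a1 + eta (mod M); M prime and a1 != a0 (mod M)."""
    lam = (a2 - a1) * pow(a1 - a0, -1, M) % M
    eta = (a1 - lam * a0) % M
    return lam, eta

M = 2 ** 31 - 1                      # a prime modulus, chosen for illustration
lam, eta, seed = 16807, 12345, 42
outputs = [seed]
for _ in range(3):
    outputs.append((lam * outputs[-1] + eta) % M)

guess_lam, guess_eta = recover_lcg(*outputs[:3], M)
prediction = (guess_lam * outputs[2] + guess_eta) % M
print((guess_lam, guess_eta), prediction == outputs[3])
```

When the modulus is unknown, or when only part of each value is revealed, the problem is genuinely harder, which is the setting of the results quoted in this section.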
For the quadratic generator
\[ a(x+1) \equiv f_2 a(x)^2 + f_1 a(x) + f_0 \pmod{M} \]
a similar result is obtained, with the number of mistakes at most $10 + 3 \log M$ (see
the comment after [138, Theorem 11]).
Lagarias and Reeds [669] and, later, Krawczyk [641] combine these two results
in a general statement about the predictability of non-linear recurrence sequences
(including vector-valued sequences). The methods of [669] work for $k$-dimensional
vector-polynomial sequences $A(x+1) = F(A(x))$ over a commutative ring $\mathcal{R}$ where
$A(x) \in \mathcal{R}^k$, and the map $F : \mathcal{R}^k \to \mathcal{R}^k$ is given by $k$ polynomials in $k$ variables.
This method relies on Hilbert's Basis Theorem and is therefore not computationally
effective. If $\mathcal{R}$ is the residue ring modulo $M$, and all polynomials involved are of
total degree $d$, then the total number of mistakes is estimated as
1 4- <p(d, k) + ±N log N + dN log M,
where
\[ N = \binom{k+d}{d}. \]
The function $\varphi(d, k)$ is generally speaking not explicit, but for two important cases
it is known: $\varphi(1, k) = k + 1$ and $\varphi(d, 1) = d + 1$. These two cases subsume the cases
considered by Boyar [138].
For the polynomial recurrence equation
\[ a(x+n) \equiv F(a(x+n-1), \ldots, a(x)) \pmod{M}, \qquad 0 \le a(x) \le M - 1, \quad x \in \mathbb{N}, \tag{13.11} \]
in which neither the polynomial $F \in \mathbb{Z}[X_1, \ldots, X_n]$ nor the modulus $M$ is known,
the following effective result holds [641].
THEOREM 13.7. Each polynomial recurrence sequence is predictable in polynomial
time, with the number of mistakes at most $O(k^2 \log(kMd))$, where $d$ is the
degree and $k$ is the number of monomials in the polynomial $F$.
Clearly $k \le \binom{n+d}{d}$, but many generators are based on sparse polynomials for
which $k$ may be significantly smaller [641]. For vector-valued sequences similar
results have also been proved.
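The bound on the number of monomials is the standard count of monomials of total degree at most $d$ in $n$ variables. The short Python sketch below checks the identity by direct enumeration for a few small, arbitrarily chosen pairs $(n, d)$; the name `count_monomials` is hypothetical.

```python
from itertools import product
from math import comb

def count_monomials(n, d):
    """Number of exponent vectors (e_1, ..., e_n) with e_1 + ... + e_n <= d."""
    return sum(1 for e in product(range(d + 1), repeat=n) if sum(e) <= d)

for n, d in ((2, 2), (3, 2), (3, 4)):
    # enumeration next to the binomial coefficient C(n + d, d)
    print(n, d, count_monomials(n, d), comb(n + d, d))
```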
Blum, Blum and Shub [127] introduce and study the $1/M$ generator. Let $g \ge 2$
denote a fixed integer, and $M$ a positive integer that can be written with at most
$L$ $g$-ary digits (that is, $M < g^L$). They show that given $k = 2L + 3$ consecutive
digits $d_{h+1}, \ldots, d_{h+k}$ the denominator $M$ (and hence the numerator) can be found
in polynomial time $L^{O(1)}$. Indeed, the fraction
\[ \alpha = 0.(d_{h+1}, \ldots, d_{h+k})_g \]
is an approximation to the fractional part $\{A g^h/M\}$ with error at most
\[ g^{-k} < 1/2M^2. \]
So, ultimately, $\{A g^h/M\}$ is one of the convergents of the continued fraction expansion
of $\alpha$, and thus can be found effectively. On the other hand, it is observed
in [127] that, under Artin's conjecture, $k = L - 1$ digits are not enough to determine
$M$ unambiguously.
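The continued fraction argument can be tried on a small example. The Python sketch below produces $k = 2L + 3$ consecutive digits of $A/M$ and then returns the last convergent of $\alpha$ whose denominator is below $g^L$. It assumes that $\gcd(A g^h, M) = 1$, so that the fractional part $\{A g^h/M\}$ has denominator exactly $M$; the values $g = 10$, $L = 4$, $M = 7919$ and the function names are illustrative choices, not taken from [127].

```python
def expansion_digits(A, M, g, h, k):
    """Digits d_{h+1}, ..., d_{h+k} of the g-ary expansion of A/M, 0 < A < M."""
    r = A * pow(g, h, M) % M          # numerator of the fractional part {A g^h / M}
    digits = []
    for _ in range(k):
        r *= g
        digits.append(r // M)
        r %= M
    return digits

def recover_fraction(digits, g, L):
    """Last convergent of alpha = 0.(digits)_g whose denominator is below g^L."""
    num = 0
    for d in digits:
        num = num * g + d
    den = g ** len(digits)
    p_prev, q_prev, p_cur, q_cur = 0, 1, 1, 0
    best = None
    while den:
        t, rem = divmod(num, den)
        p_prev, q_prev, p_cur, q_cur = p_cur, q_cur, t * p_cur + p_prev, t * q_cur + q_prev
        if q_cur >= g ** L:
            break
        best = (p_cur, q_cur)
        num, den = den, rem
    return best

g, L, M = 10, 4, 7919
digits = expansion_digits(1, M, g, h=5, k=2 * L + 3)
print(digits)
print(recover_fraction(digits, g, L))   # (numerator of {g^h / M}, M)
```

Once the numerator and denominator of $\{A g^h/M\}$ are known, all later digits follow by long division, which is why the generator is predictable from $2L + 3$ digits.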
In [629], similar results are shown for smaller segments of digits without reliance
on any standard conjectures. The first statement claims that a string of $k = \lfloor 3L/37 \rfloor$
consecutive digits provides no information about $M$. Roughly speaking,
we see without any unproven conjectures that $M$ may take almost any value among
all the primes $p < g^L$.
THEOREM 13.8. Given any string of $k = \lfloor 3L/37 \rfloor$ consecutive $g$-digits, there
are at least $(1 + o(1))\pi(g^L)$ prime numbers $p < g^L$ such that the $g$-adic expansion
of $1/p$ contains that string.
Further, using results of [636], [637] it is shown that for an arbitrary $\varepsilon > 0$,
any string of $k \le (1 - \varepsilon)L$ consecutive digits appears in the $g$-adic expansion of $1/M$
for at least $C(g) g^{\varepsilon L/2}$ values of $M < g^L$. Here $C(g)$ is some constant depending on
$g$ only.
THEOREM 13.9. There is a constant $c(g) > 0$, depending only on $g$, such that
for every element $M$ of the set
\[ \mathfrak{M} = \{ M = p^{\alpha}\mu : 1 \le \mu \le Q,\ \gcd(\mu, p) = 1 \}, \]
where
\[ Q = \lfloor c(g) g^{\varepsilon L/2} \rfloor, \]
$p$ is the smallest odd prime number with $\gcd(p, g) = 1$, and $p^{\alpha}$ is the largest power
of $p$ that is less than $g^L/Q$, and any string of $k = \lfloor (1 - \varepsilon)L \rfloor$ consecutive $g$-digits,
the $g$-adic expansion of $1/M$ contains the string.
To make prediction harder several tricks can be used. One of them is to use
two or more generators in parallel (with different initial values, or even of different
types) and to mix their results. For example, a power generator $a_p$ and an
exponential generator $a_e$ can be combined to form $a(x) \equiv a_p(x) + a_e(x) \pmod{M}$.
The shrinkage operation described in Chapter 4 can also be used. There do not
seem to be significant results about the predictability of such combined sequences.
The only exception is the bound (4.1) for the linear complexity of shrinking two
M-sequences. Another way to increase the security of a sequence is to use parts of
the elements. For example, given integers $h$, $s$, instead of a sequence $a$, consider
the truncation
\[ a_{h,s}(x) \equiv \lfloor 2^{-h} a(x) \rfloor \pmod{2^s}, \qquad 0 \le a_{h,s}(x) \le 2^s - 1. \]
Thus, $a_{h,s}$ is formed by the $s$ bits of $a(x)$ starting from the $(h+1)$th least significant
bit. For several common pseudo-random generators, an extensive detailed study of
this and similar truncation procedures has been made in the papers [137], [415],
[666], [669] (and references therein). Some of these results follow. For integers
$s, k \ge 1$ denote by $B_{k,s}(a(0), \lambda, M)$ the binary $sk$-dimensional vector obtained by
adjoining $s$ leading bits of the $k$ first elements of the sequence $a$ produced by the
homogeneous linear congruential generator (13.3) with the initial value $a(0)$ (for
general generators (13.2) similar results can be obtained). All elements of the
sequence having $L = [\log M]$ bits are considered, so $B_{k,s}(a(0), \lambda, M)$ is obtained
from the bits of $\lfloor a(j)/2^{L-s} \rfloor$, $j = 0, \ldots, k - 1$. It is shown in [415, Theorem 3.1],
that for any $k > 1$, $\varepsilon > 0$ and sufficiently large square-free $M > c(k, \varepsilon)$, there is an
exceptional set $E(M, k, \varepsilon)$ of multipliers $\lambda$ of cardinality
\[ |E(M, k, \varepsilon)| < M^{1-\varepsilon} \]
such that for any multiplier not in $E(M, k, \varepsilon)$ the following is true: If
\[ s = \lceil (1/k + \varepsilon) \log M + k(1/2 + \log 3) + 3.5 \log k + 2 - \log 3 \rceil, \]
then the initial value of the sequence $a$ can be uniquely determined in polynomial
time from knowledge of the multiplier $\lambda$, the modulus $M$ and the vector
$B_{k,s}(a(0), \lambda, M)$.
It is also remarked in that paper that an exceptional set $E(M, k, \varepsilon)$ is necessary,
as $\lambda = 1$ is always a bad multiplier. Also, by Dirichlet's principle, if
\[ s \le \left\lfloor \frac{1 - \varepsilon}{k} \log M \right\rfloor \]
then there is an initial value $a_0$ such that $B_{k,s}(a(0), \lambda, M) = B_{k,s}(a_0, \lambda, M)$ for at
least $M^{\varepsilon}$ initial values of $a(0) = 0, \ldots, M - 1$. Such an initial value is secure in
the following sense: Knowledge of $B_{k,s}(a(0), \lambda, M)$ is not enough to determine the
entire sequence.
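Both the truncation $a_{h,s}$ and the counting argument behind Dirichlet's principle are easy to reproduce for a small modulus. In the Python sketch below the number of bits $L$ is taken to be the bit length of $M - 1$, which is an assumption about the intended value of $L$, and the parameters $M = 101$, $\lambda = 7$, $k = s = 2$ are arbitrary illustrations. The function names are hypothetical.

```python
from collections import Counter

def truncate(a, h, s):
    """a_{h,s} = floor(a / 2^h) mod 2^s: the s bits of a starting at bit h + 1."""
    return (a >> h) & ((1 << s) - 1)

def leading_bits_vector(a0, lam, M, k, s):
    """Concatenate the s leading bits of a(0), ..., a(k-1) for a(x) = lam * a(x-1) mod M."""
    L = (M - 1).bit_length()              # bits needed to write any residue below M
    bits, a = [], a0 % M
    for _ in range(k):
        top = a >> (L - s)                # floor(a / 2^(L-s))
        bits.extend((top >> i) & 1 for i in reversed(range(s)))
        a = lam * a % M
    return tuple(bits)

M, lam, k, s = 101, 7, 2, 2
buckets = Counter(leading_bits_vector(a0, lam, M, k, s) for a0 in range(M))
# at most 2^(s*k) distinct vectors, so some vector is shared by many initial values
print(len(buckets), max(buckets.values()))
```

Since there are at most $2^{sk} = 16$ possible vectors and 101 initial values, at least one vector must be produced by 7 or more initial values, so it cannot identify the initial value.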
In the opposite direction, it is shown in [629] that for infinitely many primes
M = p,
s= — — - l o g p - log logp + O(l)
and all but at most $p^{1-\varepsilon}$ multipliers $\lambda$, and any initial value $a(0) = 0, \ldots, p - 1$,
the problem is secure: that is, the knowledge of $\lambda$, $M$ and $B_{k,s}(a(0), \lambda, M)$ for
such $k$ and $s$ is not enough to determine the initial value uniquely (regardless of
the complexity of the algorithm) and moreover, the number of candidates for the
initial value is exponentially large.
THEOREM 13.10. Let $p$ denote an $L$-bit prime with $2^L - 2^{3L/4} < p < 2^L$.
Then for any $k > 5$, $\varepsilon > 0$ there is an exceptional set $F(p, k, \varepsilon)$ of multipliers $\lambda$
of cardinality $|F(p, k, \varepsilon)| < p^{1-\varepsilon}$, such that for any multiplier $\lambda \notin F(p, k, \varepsilon)$ the
following is true. If
s < —-—L — log L — c,
k
where $c$ is an absolute constant, then for any binary $sk$-dimensional vector $B$ there
exist at least $p^{\varepsilon}$ initial values of $a(0) = 0, \ldots, p - 1$ such that $B_{k,s}(a(0), \lambda, p) = B$.
In the results above, only the initial value is unknown. Boyar [137] shows
that even if the multiplier $\lambda$ and the modulus $M$ are unknown and only the
$t < C \log\log M$ lowest bits are available, then the sequence can be predicted in
polynomial time with at most $(2 + \log M) \log M + 3$ errors (provided that each
wrong prediction is corrected).
The results of [641], [669] are applicable to such truncated sequences as well
(see also [666]).
Finally, Kuzmin and Nechaev [662], [663] show that any linear recurrence
sequence $a$ of order $n$ over $\mathbb{Z}/p^k\mathbb{Z}$ can be recovered from its last coordinate $a_{k-1}(x)$
given by (3.5) in time $O(np^h)$. (Unfortunately it is not clear in these papers exactly
what 'recovered' means; perhaps that the characteristic polynomials and initial
values can be found.)
$$\mathcal{L}_b \le \binom{n+p-1}{n}^{r}. \tag{13.12}$$
For fixed $p$, this is of polynomial order $n^{O(1)}$ rather than of exponential order as
over other fields.
The question about finding q from values of the bit sequence b is also considered
in [610], where some probabilistic algorithms are presented.
Statistical properties of $\mathcal{L}_a(h)$ for a random sequence $a$ of elements of $\mathbb{F}_q$ are
summarized by Niederreiter [932]. In particular, denote by $N_h(L)$ the number of
different $h$-term sequences $a$ over $\mathbb{F}_q$ with $\mathcal{L}_a(h) = L$. Then
$$N_h(L) = (q-1)\, q^{\min\{2h-2L,\; 2L-1\}}$$
for $h \ge L > 0$. This formula can be used to show that $\mathcal{L}_a(h) = h/2 + O(\log h)$ for
an infinite uniformly distributed random sequence over $\mathbb{F}_q$. More precisely, for such
a sequence
$$\limsup_{h \to \infty} \frac{|\mathcal{L}_a(h) - h/2|}{\log h} = \frac{1}{2 \log q}$$
with probability 1, see [932], [937].
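The counting formula $N_h(L) = (q-1)\,q^{\min\{2h-2L,\,2L-1\}}$ can be checked by brute force for small $h$. The following Python sketch is an illustration only and is not taken from [932]: it assumes $q = 2$, computes linear complexity with a textbook Berlekamp-Massey routine over $\mathbb{F}_2$, and the function names are hypothetical. The zero sequence ($L = 0$) lies outside the range of the formula and is not compared.

```python
from collections import Counter
from itertools import product


def linear_complexity_profile(bits):
    """Return [L(1), ..., L(n)] for a binary sequence (Berlekamp-Massey over F_2)."""
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # connection polynomial before the last length change
    L, m = 0, -1        # current complexity, index of the last length change
    profile = []
    for i in range(n):
        # Discrepancy between bits[i] and the value predicted by the current LFSR.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:  # the register length has to grow
                L, m, b = i + 1 - L, i, previous
        profile.append(L)
    return profile


def count_by_complexity(h):
    """Count all binary sequences of length h by their linear complexity."""
    return Counter(
        linear_complexity_profile(s)[-1] for s in product((0, 1), repeat=h)
    )


def predicted_count(h, L, q=2):
    """Value of (q - 1) * q**min(2h - 2L, 2L - 1), intended for 1 <= L <= h."""
    return (q - 1) * q ** min(2 * h - 2 * L, 2 * L - 1)


if __name__ == "__main__":
    for h in range(1, 9):
        counts = count_by_complexity(h)
        for L in range(1, h + 1):
            print(h, L, counts[L], predicted_count(h, L))
```

The enumeration visits all $2^h$ sequences, so it is only practical for small $h$; it is meant as a sanity check of the formula, not as a way to study the asymptotic behaviour.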
As seen above, for a random sequence $\mathcal{L}_a(h)$ is about $h/2$. A natural question is
how to construct a pseudo-random sequence having similar behaviour of the linear
complexity profile. The construction of such sequences uses the theory of continued
fractions in the field $\mathbb{F}_q((X^{-1}))$ of formal Laurent series over $\mathbb{F}_q$. For a sequence $a$,
write
$$A(X) = \sum_{h=1}^{\infty} a(h) X^{-h} \in \mathbb{F}_q((X^{-1})).$$
Let $[C_1(X), C_2(X), \ldots]$ denote the continued fraction expansion of $A(X)$ in which
the partial quotients $C_j(X)$, $j = 1, 2, \ldots$, are polynomials over $\mathbb{F}_q$ with positive
degree. Then the following remarkable statement holds [929], [937].
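THEOREM 13.11. Put $d_j = \deg C_j$. Then
$$\mathcal{L}_a(h) = \sum_{j=1}^{m} d_j,$$
where $m$ is uniquely defined by the inequalities
$$\sum_{j=1}^{m-1} d_j + \sum_{j=1}^{m} d_j \le h < \sum_{j=1}^{m} d_j + \sum_{j=1}^{m+1} d_j$$
(empty sums are zero).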
Jl, if x = 2 ^ - 1 ;
I 0, otherwise;
corresponding to the continued fraction [x, x , . . . ] over F2 has perfect linear com-
plexity profile. On the other hand, the bit distribution of this sequence is very
poor, which shows some of the weaknesses of this measure of randomness for se-
quences. Further discussion of the density of unit bits in such sequences can be
found in [853], [1364].
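Both claims about this sequence can be observed numerically. The Python sketch below is an illustration under stated assumptions: 'perfect linear complexity profile' is taken to mean that the linear complexity of the first $h$ terms equals $\lfloor (h+1)/2 \rfloor$ for every $h$ (the usual definition, which is not restated in this section), the sequence is indexed from $x = 1$, and the function names are hypothetical. The profile is computed with a textbook Berlekamp-Massey routine over $\mathbb{F}_2$.

```python
def linear_complexity_profile(bits):
    """Return [L(1), ..., L(n)] for a binary sequence (Berlekamp-Massey over F_2)."""
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # connection polynomial before the last length change
    L, m = 0, -1        # current complexity, index of the last length change
    profile = []
    for i in range(n):
        # Discrepancy between bits[i] and the value predicted by the current LFSR.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:  # the register length has to grow
                L, m, b = i + 1 - L, i, previous
        profile.append(L)
    return profile


def example_sequence(n):
    """Terms a(1), ..., a(n) with a(x) = 1 exactly when x + 1 is a power of two."""
    return [1 if (x & (x + 1)) == 0 else 0 for x in range(1, n + 1)]


def has_perfect_profile(bits):
    """Check L(h) == floor((h + 1) / 2) for every prefix length h (assumed definition)."""
    profile = linear_complexity_profile(bits)
    return all(L == (h + 1) // 2 for h, L in enumerate(profile, start=1))


if __name__ == "__main__":
    a = example_sequence(127)
    print("perfect profile:", has_perfect_profile(a))
    print("ones among the first", len(a), "terms:", sum(a))
```

Among the first $n$ terms only about $\log_2 n$ are equal to 1, which is the poor bit distribution referred to above.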
Niederreiter [934] demonstrates that a finite sequence of positive integers
$$L_1, \ldots, L_h$$
can be realized as the initial segment of the linear complexity profile of some sequence
$a$ over $\mathbb{F}_q$ if and only if the following conditions are satisfied: if $L_k > k/2$
then $L_{k+1} = L_k$, and if $L_k \le k/2$ then $L_{k+1} = L_k$ or $L_{k+1} = k + 1 - L_k$, $k =
0, 1, \ldots, h-1$. Moreover, there are exactly $q^{\min\{L_h,\, h-L_h\}}$ such $h$-element sequences.
It is striking that this number depends on $L_h$ only. The jump complexity profile of
a sequence $a$ is defined as follows: $J_a(h)$ is the number of positive integers among
the list $\mathcal{L}_a(1), \mathcal{L}_a(2) - \mathcal{L}_a(1), \ldots, \mathcal{L}_a(h) - \mathcal{L}_a(h-1)$. Some combinatorial and statistical
properties of linear complexity and jump complexity profiles were considered
in [212], [348], [544], [1327] for the binary case and in [934] for sequences over
where $N_h(L, j)$ is the number of different $h$-term sequences $a$ over $\mathbb{F}_q$ such that
$\mathcal{L}_a(h) = L$ and $J_a(h) = j$ (trivially, $N_h(0, 0) = 1$ and $N_h(L, 0) = 0$ if $L \ge 1$).
The mean value and variance of $J_a(h)$ for an infinite uniformly distributed random
sequence over $\mathbb{F}_q$ are evaluated in [934] as well. One of the results obtained asserts
that
$$\limsup_{h \to \infty}\, (h \log h)^{-1/2}\, \bigl| J_a(h) - (q-1)h/(2q) \bigr| < (2/q)^{1/2}$$
with probability 1. For random binary sequences $a$ the expected value $E(J_a(h))$
and the variance $V(J_a(h))$ of $J_a(h)$ are explicitly evaluated by Wang [1327]. For
instance,
$$E(J_a(h)) = \begin{cases} h/4 + 1/3 - 2^{-h}/3 & \text{if $h$ is even;} \\ h/4 + 5/12 - 2^{-h}/3 & \text{if $h$ is odd.} \end{cases}$$
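The expected value of the jump complexity, $h/4 + 1/3 - 2^{-h}/3$ for even $h$ and $h/4 + 5/12 - 2^{-h}/3$ for odd $h$, can be compared with an exhaustive average over all binary sequences of a small length $h$. The Python sketch below is an illustration only and is not the method of [1327]: it assumes the uniform distribution on binary $h$-term sequences, uses a textbook Berlekamp-Massey routine over $\mathbb{F}_2$, works in exact rational arithmetic, and its function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def linear_complexity_profile(bits):
    """Return [L(1), ..., L(n)] for a binary sequence (Berlekamp-Massey over F_2)."""
    n = len(bits)
    c = [1] + [0] * n   # current connection polynomial
    b = [1] + [0] * n   # connection polynomial before the last length change
    L, m = 0, -1        # current complexity, index of the last length change
    profile = []
    for i in range(n):
        # Discrepancy between bits[i] and the value predicted by the current LFSR.
        d = bits[i]
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:  # the register length has to grow
                L, m, b = i + 1 - L, i, previous
        profile.append(L)
    return profile


def jump_complexity(bits):
    """Number of positive terms among L(1), L(2) - L(1), ..., L(h) - L(h - 1)."""
    profile = linear_complexity_profile(bits)
    return sum(1 for prev, cur in zip([0] + profile, profile) if cur > prev)


def mean_jump_complexity(h):
    """Exact average of the jump complexity over all 2**h binary sequences."""
    total = sum(jump_complexity(s) for s in product((0, 1), repeat=h))
    return Fraction(total, 2 ** h)


def closed_form_mean(h):
    """Closed form quoted in the text for the binary case."""
    offset = Fraction(1, 3) if h % 2 == 0 else Fraction(5, 12)
    return Fraction(h, 4) + offset - Fraction(1, 3 * 2 ** h)


if __name__ == "__main__":
    for h in range(1, 11):
        print(h, mean_jump_complexity(h), closed_form_mean(h))
```

Exact fractions are used so that the comparison is not blurred by floating-point rounding; the cost grows like $2^h$, so only small $h$ are feasible.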
In [1203] a lower bound of order Hp~1/2\og~ p is obtained for the linear
complexity of a sequence of H consecutive values of the discrete logarithm modulo
$p$ modulo any divisor $d$ of $p - 1$. Several other similar results can be found in [674],
[804].
The case d = 2 is of interest because it corresponds to the sequence of values
of the rightmost bit of the discrete logarithm, which determines whether x is a
quadratic residue modulo p. In this case, the linear complexity of the infinite
sequence (which is periodic with period p) has been evaluated in [302]:
$$a(s) = \sum_{i=1}^{\infty} \psi(s_i)\, q^{-i} \in [0, 1].$$
Now let $d : \mathbb{N} \to \mathbb{R}$ be any non-negative function. If $|L_n(s) - n/2| < d(n)$ for all
$n = 1, 2, \ldots$ then the sequence $s$ has $d$-almost perfect linear complexity profile. The
case of constant $d$ has special interest.
The Hausdorff dimension $D(d, q)$ and the Hausdorff measure $M(d, q)$ of the
set of points corresponding to sequences with $d$-almost perfect linear complexity
Wg) = * ,
where $\Phi(d, q)$ is the largest root of the equation
d-l