Pseudo-Random Number Generators

Description of different algorithms for generating pseudo random numbers.


Pseudo-random number generators

Chapter · July 2003


DOI: 10.1090/surv/104/13

4 authors, including:

Igor E. Shparlinski Thomas Boulton Ward


UNSW Sydney Newcastle University
All content following this page was uploaded by Igor E. Shparlinski on 25 April 2016.

http://dx.doi.org/10.1090/surv/104/13

CHAPTER 13

Pseudo-Random Number Generators

Linear and non-linear recurrence sequences may be used to generate pseudo-random
numbers, and in this chapter the effectiveness of this approach is discussed.
Applications to cryptography are described, relating randomness to Kolmogorov
complexity.

13.1. Uniformly Distributed Pseudo-Random Numbers

The most direct way to get pseudo-random numbers $\gamma_1, \gamma_2, \ldots$ from linear
recurrence sequences is the following. Let $M > 1$ denote an integer, and let $a$
denote a linear recurrence sequence taking values in the residue ring modulo $M$,
identified as usual with the set $\{0, 1, \ldots, M-1\}$. Finally, use elements of the
sequence $a$ as $M$-ary digits of some other sequence. This can be done in several
ways; the most popular are to fix some integer $m \ge 1$ and to define $\gamma_x$ to be

\[ \gamma_x = \sum_{i=0}^{m-1} \frac{a(mx+i)}{M^{i+1}}, \qquad x \in \mathbb{N}, \tag{13.1} \]

\[ \gamma_x = \sum_{i=0}^{m-1} \frac{a(x+i)}{M^{i+1}}, \qquad x \in \mathbb{N}, \]

or

\[ \gamma_x = \sum_{i=1}^{m} \frac{a(x+h_i)}{M^{i}}, \qquad x \in \mathbb{N}, \]

for integers $h_1, \ldots, h_m$. Elements of the first sequence look more independent, so
that construction is more common (without any particular theoretical justification,
it must be admitted). The integer $M$ is called the modulus of the pseudo-random
number generator.
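As a minimal sketch of construction (13.1), assume the blocked form in which $\gamma_x$ is built from the $m$ consecutive terms $a(mx), \ldots, a(mx+m-1)$. The helper names and the small recurrence over $\mathbb{F}_2$ below are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def linear_recurrence(coeffs, init, M, length):
    """Terms a(0), ..., a(length-1) of a(x+n) = sum_j coeffs[j]*a(x+j) (mod M)."""
    a = list(init)
    n = len(coeffs)
    while len(a) < length:
        a.append(sum(c * v for c, v in zip(coeffs, a[-n:])) % M)
    return a[:length]

def gamma(a, M, m, x):
    """gamma_x as in (13.1): m consecutive terms of a read as M-ary digits."""
    return sum(Fraction(a[m * x + i], M ** (i + 1)) for i in range(m))

# Hypothetical example: a(x+3) = a(x+1) + a(x) over F_2, which has period 7.
M, m = 2, 3
a = linear_recurrence([1, 1, 0], [1, 0, 0], M, 30)
print([float(gamma(a, M, m, x)) for x in range(7)])
```

Exact fractions are used so that the digit structure stays visible; a practical generator would work with machine integers instead.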
In order to use these numbers as pseudo-random numbers, which should as a
minimal requirement be uniformly distributed on the unit interval, information on
their period and their distribution on $[0,1]$ (more generally, on the distribution
of the vectors $(\gamma_x, \ldots, \gamma_{x+s-1})$ in the $s$-dimensional unit cube) is needed. That is,
estimates are needed on their discrepancy. These questions have been addressed
in Chapter 3 and Section 7.2, respectively. Indeed, many of those results can be
directly applied to such pseudo-random number generators, although for $m > 1$
some adjustments are needed. On the other hand, there are some new aspects to
the problem of pseudo-random number generators. Previously the interest was in
general results which hold for all sequences from some wide class. As a result, these
estimates are not very strong. For pseudo-random numbers it would be enough to
prove stronger results for special sequences. Alternatively, it is sometimes useful to


Licensed to AMS.
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms
show that for 'almost all' members of some class of sequences (rather than for all)
some refinements can be obtained. Both kinds of results will be discussed below.
Typically, the modulus $M$ is chosen as a prime power $p^k$, with a special emphasis
on the case $M = 2^k$ (because of the connection with computing). The two extreme
cases of $M$ a large prime and $M = 2$ are of great interest as well, and deserve
special treatment.
For the parameter $m$, a reasonable choice is $m = 1$, so $\gamma_x = a(x)/M$. This
generator is the Tausworthe generator; it has the important advantage that all
the results about the periods (see Chapter 3) and the distribution (see Section 7.2)
of linear recurrence sequences may be applied without any of the adjustments needed
for (13.1) with $m \ge 2$.
In any case, $m$ should usually not exceed the order of $a$, in order to have some
hope of independent digits of $\gamma_x$ (at least independent in the weakest sense of not
being obviously dependent). Clearly, if $t$ is the period of $a$, then the period of the
sequence $\gamma$ given by (13.1) is at least $\tau = t/\gcd(m, t)$.
In a series of papers by Niederreiter the discrepancy of such sequences has been
estimated. His results are given below in a simplified form which can be extracted
from [930, Theorem 3.1, 3.2]. More explicit information about the constants may
be found in [930].
THEOREM 13.1. Let $M = p$ denote a prime, and let $a$ denote a linear recurrence
sequence of order $n$ and period $t$ over $\mathbb{F}_p$ with characteristic polynomial irreducible
over $\mathbb{F}_p$. If $\gcd(m, t) = 1$ and $sm \le n$, then for any $1 \le N \le t$, the $s$-dimensional
discrepancy of the sequence $(\gamma_x, \ldots, \gamma_{x+s-1})$, $x = 1, \ldots, N$, is

\[ D_s(N) \ll \begin{cases} (2/\pi)^s m^s p^{n/2} t^{-1} \log^s p + s p^{-m}, & \text{for } N = t, \\ (2/\pi)^s m^s p^{n/2} N^{-1} \log t \, \log^s p + s p^{-m}, & \text{for } 1 \le N < t, \end{cases} \]

if $p \ge 3$, and

\[ D_s(N) \ll \begin{cases} 2^{-s} m^s 2^{n/2} t^{-1} + s 2^{-m}, & \text{for } N = t, \\ 2^{-s} m^s 2^{n/2} N^{-1} \log t + s 2^{-m}, & \text{for } 1 \le N < t, \end{cases} \]

if $p = 2$.
The general case of $\gcd(t, m) > 1$ can be considered as well (with slightly worse
constants). Notice that if $a$ is an $M$-sequence (that is, with period $t = p^n - 1$) and
$ms > n$, then $D_s(N) = O(t^{1/2+\varepsilon} N^{-1})$, where the implied constant depends on $n$
only.
Several more results about the discrepancy of such sequences can be found
in [677].
The case of $sm > n$ (that is, the case of manifestly dependent digits) is more
complicated. It turns out that the discrepancy depends not only on the period, but
also on the number
\[ r_s(f, m, p) = \min \sum_{i=0}^{s-1} \deg h_i, \]
where the minimum is taken over all non-zero polynomial solutions of the congruence
\[ \sum_{i=0}^{s-1} h_i(X) X^{im} \equiv 0 \pmod{f(X)} \]
in polynomials $h_0, \ldots, h_{s-1}$ over $\mathbb{F}_p$. These numbers are called figures of merit of
$f$. Clearly $r_s(f, m, p) \le n + 1$, and it is known that the bigger $r_s(f, m, p)$ is, the
smaller is the discrepancy $D_s(N)$. In fact, $r_s(f, m, p)$ appears in lower and upper
bounds for $D_s(N)$ as the terms $p^{-r_s(f,m,p)}$ and $p^{-r_s(f,m,p)} \log^s t$, respectively (the
exact results may be found in [930, Sect. 4, 7] and [936, Chap. 9]).
It is also known that both requirements, a large period and a large figure of
merit, can be attained simultaneously. This result may be found in [930, Theorem 5.4].
THEOREM 13.2. For any positive integers $n$, $m$, $s$, $t$ such that $n$ is the period
of $p$ modulo $t$, there exists an irreducible polynomial $f$ over $\mathbb{F}_p$ of degree $n$ and of
period $t$ such that
\[ r_s(f, m, p) \ge \log_p \frac{(m+1)(p-1)\varphi(t)}{(ms-1)(m+1-m/p)^s}. \]
To understand this result better, consider the case $t = p^n - 1$. Then there
exists a primitive polynomial $f$ with
\[ r_s(f, m, p) \ge n - \log_p \log(n \log p) - s \log_p(m+1) + O(1), \]
so choosing $m = n$ gives a sequence having discrepancy $D_s(N) = O(t^{1/2+\varepsilon} N^{-1})$
for any fixed $s$.
As mentioned in Section 9.2, the problem is related to continued fraction ex-
pansions of rational fractions.
All the above estimates of discrepancy need the period to be large enough
to be non-trivial; it should certainly exceed $p^{n/2}$, and the length $N$ of the sequence
considered should be at least of the same size. This second requirement is very
difficult to meet. Certainly if M — pk is a power of a fixed prime number, m — 1,
£
s — m then (7.8) gives the bound 0{t^ ) for the full period discrepancy of
k
a(x)/p . It is possible that a similar result can be obtained for any ms < n, but
for incomplete periods the corresponding result has not been worked out yet, and
one cannot expect particularly strong estimates here.
The situation is quite different if one agrees to consider 'almost all' sequences
from $\mathcal{L}(f)$, rather than all of them. The following assertion is obtained in [1187] for
$m = 1$ (that is, for the sequence $\gamma_x = a(x)/p$). Levin [710] extends it to arbitrary
$m \le n$.
THEOREM 13.3. Let $f$ denote an irreducible polynomial modulo $p$ of period $t$.
Then for all but $o(p^n)$ sequences $a \in \mathcal{L}(f)$ the following holds. For any $1 \le N \le t$
the $s$-dimensional discrepancy of the sequence $\gamma_x$ given by (13.1) is
\[ O(N^{-1/2} \log^{n+3} p + p^{-m}), \]
where $s \le n/m$ and $\gcd(m, t) = 1$.
The upshot is that the bound holds for all $N$ for all sequences from the same
set of 'good' sequences. For every fixed $N$ such a result is quite simply obtained.
In [710] (but not in [1187]) the result is formulated for primitive polynomials only,
but certainly holds in the form given above.
Ambrosimov [28] obtains results for sequences in residue rings modulo $2^k$.
These concern the distribution of tuples
\[ (a(x), \ldots, a(x+s-1)) \]
(that is, the question discussed in Section 7.2) rather than the discrepancy of the
points
\[ (\gamma_x, \ldots, \gamma_{x+s-1}). \]
Nevertheless, they are in the same spirit, and in particular provide good estimates
which hold for almost all initial values, and certainly could be useful for analysis of
pseudo-random number generators from linear recurrence sequences (see also [661]).
Apparently, these are the simplest in a series of results which can be obtained using
similar methods.
In [326], [327], [329], [679] more general non-linear pseudo-random number
generators are analyzed 'on average'.
Niederreiter, in [936, Sect. 10.1] and in more detail in [939], proposes a multidimensional
matrix generalization of the pseudo-random number generators just
considered, that is, sequences of vectors satisfying linear recurrence equations
with matrix coefficients. He analyzes their periods, discrepancy, and other useful
properties. This work has been continued by Larcher [677].
Returning to the particular case of pseudo-random number generators discussed
above, the most popular, theoretically attractive and practically convenient are
the linear congruential generators, also known as the Lehmer generators. Such a
generator is a sequence $\gamma_x = a(x)/M$, where
\[ a(x) \equiv \lambda a(x-1) + \eta \pmod{M}, \qquad 0 \le a(x) < M, \quad x \in \mathbb{N}. \tag{13.2} \]
Here the initial value $a(0) = a$ and the multiplier $\lambda$ are integers co-prime to the
modulus $M$, and $\eta$ is an arbitrary integer. These generators, and discussions of
their features and applications, can be found everywhere, from mathematics books
through computer science sources to programming manuals; [936], [619], [584],
respectively, are representative of the literature.
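The recurrence (13.2) translates directly into code. The following is a minimal sketch; the function names are hypothetical, and the parameters $M = 2^{31} - 1$ with multiplier 16807 are the classical Lehmer-style choice, used here only as an illustration and not taken from the text.

```python
from itertools import islice

def lcg_states(lam, eta, M, seed):
    """Yield a(1), a(2), ... for a(x) = lam*a(x-1) + eta (mod M), a(0) = seed."""
    a = seed
    while True:
        a = (lam * a + eta) % M
        yield a

def lcg_uniform(lam, eta, M, seed):
    """Yield gamma_x = a(x)/M, a float in [0, 1)."""
    for a in lcg_states(lam, eta, M, seed):
        yield a / M

# Illustrative parameters (an assumption): M = 2^31 - 1, lam = 16807, eta = 0.
M = 2**31 - 1
print(list(islice(lcg_states(16807, 0, M, 1), 3)))
print(list(islice(lcg_uniform(16807, 0, M, 1), 3)))
```

Keeping the integer states and the normalized outputs in separate generators makes it easy to study the residues $a(x)$ themselves, which is what most of the theory refers to.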
As discussed in Section 1.1, the parameter $\eta$ in (13.2) is a little arbitrary. For
$M = 2^k$, this parameter allows sequences of maximal period $2^k$ to be constructed
(period $2^{k-2}$ is maximal for sequences with $\eta = 0$). From now on only the homogeneous
linear congruential generator
\[ a(x) \equiv \lambda a(x-1) \pmod{M}, \qquad 0 \le a(x) < M, \quad x \in \mathbb{N}, \tag{13.3} \]
will be considered. Thus $a(x) \equiv a\lambda^x \pmod{M}$, and
\[ \gamma_x = \left\{ \frac{a\lambda^x}{M} \right\}, \]
where $\{\cdot\}$ denotes the fractional part.
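The remark about periods modulo $2^k$ can be checked numerically for small $k$. This is a sketch only: the multipliers 5 and 3 and the value $k = 8$ are illustrative assumptions, and the helper simply follows the orbit of the seed until it returns.

```python
def period(lam, eta, M, seed):
    """Cycle length of x -> lam*x + eta (mod M) through seed.

    Assumes gcd(lam, M) = 1, so the map is a bijection and the orbit
    of seed is purely periodic.
    """
    a, steps = (lam * seed + eta) % M, 1
    while a != seed:
        a, steps = (lam * a + eta) % M, steps + 1
    return steps

k = 8
M = 2**k
print(period(5, 1, M, 0))   # inhomogeneous case, eta odd
print(period(5, 0, M, 1))   # homogeneous case, eta = 0
print(period(3, 0, M, 1))   # another homogeneous multiplier
```

With these choices the inhomogeneous generator reaches the full period $2^k$, while the homogeneous ones stop at $2^{k-2}$.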

Generally speaking, the $s$-dimensional discrepancy $D_s(N)$ of $(\gamma_x, \ldots, \gamma_{x+s-1})$ depends
on several characteristics. The first is the maximal value of the exponential sum
\[ \max_{\gcd(m,M)=1} \left| \sum_{x=1}^{N} \exp(2\pi i m \gamma_x) \right|, \]
which does not depend on $s$ but does depend on all the other parameters. The
second depends on $s$, $\lambda$ and $M$, but does not depend on the initial value and the
length $N$ of the interval considered:
\[ \rho_s(\lambda, M) = \min (\overline{m}_0 \cdots \overline{m}_{s-1}), \]
where the minimum is taken over all non-trivial solutions of the congruence
\[ m_0 + m_1 \lambda + \cdots + m_{s-1} \lambda^{s-1} \equiv 0 \pmod{M}, \tag{13.4} \]
and $\overline{m} = \max\{1, |m|\}$. If $M = p$ is a prime, then
\[ D_s(N) \ll \frac{\log^s p}{N} \max_{\gcd(m,p)=1} \left| \sum_{x=1}^{N} \exp(2\pi i m \gamma_x) \right| + \frac{\log^s p}{\rho_s(\lambda, p)}; \tag{13.5} \]
for other values of $M$ some simple adjustments are necessary.
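For small moduli the figure of merit attached to the congruence (13.4) can be found by exhaustive search. The sketch below assumes the definition as the minimum of $\prod_i \max\{1, |m_i|\}$ over non-zero integer solutions; it relies on the observation that replacing any $m_i$ by its residue of least absolute value never increases the product, so only that range needs to be scanned. The function name is hypothetical and the cost grows like $M^{s-1}$, so this is for experiments only.

```python
from itertools import product

def figure_of_merit(lam, M, s):
    """Brute-force rho_s(lam, M) for the congruence (13.4)."""
    half = M // 2
    best = M                      # from the solution (M, 0, ..., 0)
    for tail in product(range(-half, half + 1), repeat=s - 1):
        if not any(tail):
            continue              # forces m_0 = 0 (mod M); covered by `best`
        m0 = -sum(m * pow(lam, i, M) for i, m in enumerate(tail, start=1)) % M
        if m0 > half:
            m0 -= M               # residue of least absolute value
        value = max(1, abs(m0))
        for m in tail:
            value *= max(1, abs(m))
        best = min(best, value)
    return best

# Illustrative use: compare all multipliers modulo the prime 31 for s = 2.
print({lam: figure_of_merit(lam, 31, 2) for lam in range(1, 31)})
```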
All these classical facts can be found in [925]; the bound (13.5) is a combination
of Theorem 13.3 and the bound [930, Eqn. (5.11)].
There is also one more characteristic, defined analogously to $\rho_s(\lambda, M)$:
\[ \omega_s(\lambda, M) = \min (m_0^2 + \cdots + m_{s-1}^2)^{1/2}, \]
where the minimum is taken over the same set of non-trivial solutions of (13.4).
This parameter is responsible for the lattice structure of the points $(\gamma_x, \ldots, \gamma_{x+s-1})$;
see [619, Sect. 3.3.4].
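The Euclidean analogue can be computed by the same kind of exhaustive search. This is a sketch under the assumption that the quantity is the length of the shortest non-zero integer solution vector of (13.4); the function name is hypothetical and the squared value is returned to stay in integer arithmetic.

```python
from itertools import product
from math import sqrt

def omega_squared(lam, M, s):
    """Square of the shortest non-zero solution of (13.4), by brute force."""
    half = M // 2
    best = M * M                  # from the solution (M, 0, ..., 0)
    for tail in product(range(-half, half + 1), repeat=s - 1):
        if not any(tail):
            continue              # m_0 would be a multiple of M
        m0 = -sum(m * pow(lam, i, M) for i, m in enumerate(tail, start=1)) % M
        if m0 > half:
            m0 -= M               # residue of least absolute value
        best = min(best, m0 * m0 + sum(m * m for m in tail))
    return best

# Illustrative use with a small prime modulus.
for lam in (3, 5, 12):
    print(lam, sqrt(omega_squared(lam, 31, 2)))
```

For realistic moduli one would use lattice basis reduction instead of enumeration; the brute-force version is only meant to make the definition concrete.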
By employing standard techniques for the exponential sums in (13.5), a good
bound, close to the square-root bound, can be found for almost all initial values.
Moreover, in Section 5.4, it is shown that for many special moduli $M$ and a wide
range of $N$, individual bounds are available.
The parameter $\rho_s(\lambda, M)$ is less well understood. The simplest case is, as usual,
$M = p$ prime. For such moduli, one can prove that there is a primitive root $\lambda$ modulo
$M$ (needed to provide the large period) such that

(13.6) p.{X,p) » ^ ^ .

In fact such an inequality holds for almost all $\lambda$. The fact that almost all $\lambda$ are of
large period modulo $p$ gives the following statement (cf. [1194], [710]).
THEOREM 13.4. Let $p$ denote a prime and $s \ge 1$ an integer. Then for all but
$o(p^2)$ pairs $(a, \lambda)$, $1 \le a, \lambda \le p-1$, of initial values and multipliers, the $s$-dimensional
discrepancy of the sequence (13.1) is D8(N) = O I J V - ^ V ) .
From the computational point of view it would be very desirable to find a
similar result for arbitrary $M$, especially for $M = 2^k$ and other prime powers (in
addition to the computational simplifications, in this case typically there are better
bounds for exponential sums, as shown in Section 5.4). Unfortunately, the proof
uses a bound on the number of solutions of polynomial congruences of degree $n$ with
coefficients co-prime to $M$, which can be as large as $M^{1-1/n}$ for highly composite
$M$: the simplest example is $\lambda^n \equiv 0 \pmod{2^k}$ or $(\lambda - 1)^n \equiv 0 \pmod{2^k}$. As a
result, for arbitrary $M$ all that can be said is that for almost all $\lambda$,
\[ \rho_s(\lambda, M) \ge M^{1/(s-1)-\varepsilon}, \qquad s \ge 2, \tag{13.7} \]
which is much worse than (13.6) for $s \ge 3$. In [629] an 'on-average' estimate
\[ \frac{1}{M} \sum_{\substack{\lambda = 1 \\ \gcd(\lambda, 2) = 1}}^{2^k} \frac{1}{\rho_s(\lambda, M)} \gg M^{-1/(s-1)} \]
is found (notice that $1/\rho_s(\lambda, M)$ rather than $\rho_s(\lambda, M)$ appears in (13.5)). This
method, leading to the bound (13.6), which is almost as good as possible for $M = p$,
cannot produce anything better than (13.7) for arbitrary $M$. Of course, a weak
average bound does not preclude the possibility of there being a good multiplier for
such moduli, so it still makes sense to show that (13.7) can be refined for some $\lambda$,
and even to try to describe such $\lambda$.
The case $s = 2$ has been studied independently by many authors (in the different
context of Korobov's optimal coefficients [634], [925], [936]). Notice that for
$s = 2$ (13.6) and (13.7) are not much different. The most convenient tool in this
case is the formula (9.9). In the Fibonacci case, (9.10) applies, but there is still
an open question about the period of $f(h)$ modulo $f(h+1)$. Modulo an arbitrary
fixed $M$, a desirable choice for $\lambda$ has $\lambda/M \approx (5^{1/2} - 1)/2 = [1, 1, 1, \ldots]$. This is
quite a naive approach, and does not guarantee that all the partial quotients of
$\lambda/M$ will be small. Nevertheless, in practical computations this has been successfully
used, and this procedure is presented in many standard numerical analysis
recipes. See also [3] for explicit expressions for the 2-dimensional discrepancy of
linear congruential generators.
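The golden-ratio heuristic is easy to try out: choose $\lambda$ as the integer nearest to $M(5^{1/2}-1)/2$ and inspect the partial quotients of $\lambda/M$ with the Euclidean algorithm. The sketch below uses hypothetical helper names and the modulus $2^{16}$ purely as an example; as the text warns, nothing guarantees that the later partial quotients stay small.

```python
def partial_quotients(lam, M):
    """Continued fraction expansion [q0; q1, q2, ...] of lam/M via Euclid."""
    q = []
    while M:
        q.append(lam // M)
        lam, M = M, lam % M
    return q

def golden_multiplier(M):
    """Integer nearest to M*(sqrt(5)-1)/2 (floating point, fine for small M)."""
    return round(M * (5 ** 0.5 - 1) / 2)

M = 2**16
lam = golden_multiplier(M)
print(lam, partial_quotients(lam, M))

# Ratios of consecutive Fibonacci numbers give the ideal pattern of ones.
print(partial_quotients(13, 21))
```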
For $s = 3$, Larcher and Niederreiter [678] prove that for $M = p^k$, a power of a
fixed prime number, there is a $\lambda$ which is a primitive root modulo $p^k$ if $p \ge 3$ and
$\lambda \equiv 5 \pmod{8}$ if $p = 2$ (thus of period $2^{k-2}$ modulo $2^k$) with the property that

P3(A,M)>-^-.
log M
The proof is based on a detailed study of the structure of solutions of quadratic
congruences. Although it is possible that the method can be extended to higher
dimensions, the technical complications will increase with each step, and it is not
clear how to carry out this approach for arbitrary $s$.
The following completely explicit construction of [1189] gives an estimate
weaker than (13.6) but stronger than (13.7); however, it works for any $M$. If $\lambda$
is defined by the congruence $\lambda r \equiv \vartheta \pmod{M}$, where $1 \le \vartheta < r < M$ are arbitrary
integers with $\gcd(\vartheta r, M) = 1$ and $\vartheta \sim r \sim M^{1/(s+1)}$, then
\[ \rho_s(\lambda, M) \ge \min\{\vartheta r, M r^{-s+1}\} \sim M^{2/(s+1)}. \tag{13.8} \]
If in the previous construction one takes $\vartheta \sim r \sim (M/(2s)^{1/2})^{1/s}$ then
\[ \omega_s(\lambda, M) \ge \min\{(\vartheta^2 + r^2)^{1/2}, s^{-1/2} M r^{-s+1}\} \sim 2^{1/2} (2s)^{-1/(2s)} M^{1/s}. \]
This bound is tight in order of magnitude (at least for $s$ fixed), since for any $\lambda$
there is the upper bound $\omega_s(\lambda, M) \le \gamma(s) M^{1/s}$, where $\gamma(s)$ is the Hermite constant
(see [619, Sect. 3.3.4], [256]).
Similar problems arise in cryptography. Let $W_s(\delta, M)$ denote the size of the
set of $\lambda$, $1 \le \lambda \le M$, with $\omega_s(\lambda, M) \le M^{\delta}$. For $s = 3$, the bound
\[ W_3(\delta, M) = O(M^{1/2+3\delta/2+\varepsilon}) \]
is given in [415]. For the special case of $M = p^k$ a further improvement to (13.8)
is possible. The following result is taken from [629].

THEOREM 13.5. Let $p$ denote a fixed prime and let $\vartheta$, $1 \le \vartheta < p^2$, denote a
primitive root modulo $p^2$ if $p \ge 3$, and $\vartheta = 3$ if $p = 2$. Set
X = (pf 4- tf)(p* + l ) " 1 (mod pk),
where $t = 2\lfloor k/(2s+1) \rfloor$. Then, for sufficiently large $M = p^k$,
\[ \rho_s(\lambda, M) \gg M^{4/(2s+1)}, \]
and the period of $\lambda$ modulo $M$ is at least $M/4$.

The proof is ad hoc in that it exploits several coincidences rather than general
principles.
Dynamical systems arising from, and motivated by, linear congruential generators
are studied in [39], [1305]. Marsaglia [778] uses 'add-with-carry' and
'subtract-with-borrow' sequences, discussed in Chapter 3, as pseudo-random number
generators of a new type. As shown in Chapter 3, their period structure is
reasonably well understood. Couture and L'Ecuyer [266] show how to combine
such generators with a linear congruential generator and prove several geometric
results about the distribution of such combinations. The work of Tezuka [1276]
goes in the same direction, but with respect to combining several generators from
linear recurrence sequences. Roughly speaking, instead of the standard addition of
several such sequences $a_1(x) + \cdots + a_m(x)$ (which is a linear recurrence sequence
again by Chapter 4) he uses bit-wise addition XOR without carry. That is, the
sequence
\[ a_1(x) \ \mathrm{XOR} \ \cdots \ \mathrm{XOR} \ a_m(x) \]
is studied. Further developments and applications to cryptography can be found
in [617], where $p$-adic analysis is used; see also [46], [456], [457], [458], [618],
[1362] and [938, Sect. 2.2] for additional references.
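The difference between ordinary addition and carry-free addition is easy to see on small examples. The sketch below combines two linear recurrence sequences modulo 16 in both ways; the recurrences, initial values and function names are illustrative assumptions, not the generators studied in the cited papers.

```python
from functools import reduce
from operator import xor

def linear_recurrence(coeffs, init, M, length):
    """Terms a(0), ..., a(length-1) of a(x+n) = sum_j coeffs[j]*a(x+j) (mod M)."""
    a = list(init)
    n = len(coeffs)
    while len(a) < length:
        a.append(sum(c * v for c, v in zip(coeffs, a[-n:])) % M)
    return a[:length]

def sum_combine(M, *sequences):
    """Term-wise sum modulo M; again a linear recurrence sequence."""
    return [sum(terms) % M for terms in zip(*sequences)]

def xor_combine(*sequences):
    """Term-wise bit-wise XOR, that is, binary addition without carry."""
    return [reduce(xor, terms) for terms in zip(*sequences)]

M = 16
a1 = linear_recurrence([1, 1], [1, 1], M, 12)        # Fibonacci modulo 16
a2 = linear_recurrence([1, 0, 1], [1, 2, 3], M, 12)  # a(x+3) = a(x+2) + a(x)
print(sum_combine(M, a1, a2))
print(xor_combine(a1, a2))
```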
'Multiplication-with-carry' sequences are also studied in the literature as useful
sources of pseudo-random numbers [266].
Non-linear recurrence sequences are also successfully used as pseudo-random
number generators. In Chapter 3 periods of quadratic (3.7) and inversive (3.8)
sequences are studied. Bounds for the discrepancy of pseudo-random number generators
arising from such sequences (and more general sequences of types (3.6)
and (13.11)) have been obtained by Eichenauer-Herrmann, Emmerich, Niederreiter,
Tezuka and others [324], [325], [328], [330], [344], [345], [580], [935], [936],
[938], [1260], [1277].
For example, the 2-dimensional discrepancy of the inversive generator (3.8)
modulo a power of a fixed prime is small over the complete period, but
increases to at least $t^{-1/3}$ in the 3-dimensional case; see [324]. On the other hand,
generators modulo a square-free $M$ have discrepancy of order $M^{-1/2+\varepsilon}$ in any
dimension [325], [935].
The 2-dimensional discrepancy of a quadratic generator can be estimated as
well. Moreover, if $M = p^k$ where $k$ is of order $p^{1/2} \log^{-1} p$ then, over the complete
period, the discrepancy is $O(t^{-1/2} \log t)$, which is $\log t$ better than the usual
estimate for similar sequences [330].
The papers mentioned above deal with the distribution of sequences over the
full period, and their methods cannot be used to study the distribution over parts
of the period. A new approach, and the first non-trivial results on the distribution
over parts of the period are given in [942], [945]. Several more applications of this
method are given in [478], [497], [944]. In particular, Theorems 7.7 and 7.8 are
proved. This method has also been applied in [414] (together with some other new
ideas) to studying the power generator, see Section 13.2.
For non-linear generators arising from the congruence (3.6) modulo a prime
$M = p$, a kind of lattice test can be defined. The corresponding sequence of
pseudo-random numbers $\gamma_x = a(x)/p$ passes the $s$-dimensional lattice test
if the vectors
\[ (a(x), \ldots, a(x+s-1)), \qquad x \in \mathbb{N}, \]
span the vector space $\mathbb{F}_p^s$. As seen in Chapter 3, for several non-linear generators
the maximal period $t = p$ can be guaranteed. Such a generator can be considered
as a map $\mathbb{F}_p \to \mathbb{F}_p$, so there is a unique polynomial $G \in \mathbb{F}_p[X]$ with degree
$D = \deg G < p$ such that $a(x) = G(x)$, $x = 1, \ldots, p$ ($G$ must be a permutation
polynomial over $\mathbb{F}_p$, that is, a polynomial inducing a bijection). The parameter $D$ is
an important characteristic of the sequence, see [935, Sect. 3] or [936, Sect. 8.1].
In particular, the sequence passes the $s$-dimensional lattice test for all $s \le D$. A
lower bound on $D$ is given in [942], which immediately yields the following result.
THEOREM 13.6. Let $p$ denote a prime. Then a non-linear generator of maximal
period $t = p$, given by (3.6), with a polynomial $F \in \mathbb{F}_p[X]$ of degree $d \ge 2$, and
$M = p$, passes the $s$-dimensional lattice test for all $s \le \lceil p/d \rceil$.
It is also known [237], [392] that an inversive generator (3.8) (of maximal
period) passes the $s$-dimensional lattice test for all $s \le (p+1)/2$. A close look at
Theorem 13.6 shows that this test is not very dependable. It seems plausible that
there are sequences which can only be represented by a polynomial of high degree
and which nonetheless have bad distribution properties. Such sequences do indeed
exist; see a discussion in [935, Sect. 3].
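The lattice test as stated above amounts to a rank computation over $\mathbb{F}_p$. The sketch below assumes the spanning form of the test given in the text and an inversive recurrence of the form $a(x+1) = \alpha a(x)^{-1} + \beta \pmod p$ with the convention $0^{-1} = 0$; whether this matches (3.8) exactly is an assumption, and the parameters $p = 7$, $\alpha = \beta = 1$ are chosen only because they give period 7.

```python
def inversive(alpha, beta, p, seed, length):
    """a(x+1) = alpha * a(x)^(-1) + beta (mod p), with 0^(-1) taken as 0."""
    a = [seed]
    while len(a) < length:
        inv = pow(a[-1], p - 2, p)        # Fermat inverse; sends 0 to 0
        a.append((alpha * inv + beta) % p)
    return a

def rank_mod_p(vectors, p):
    """Rank over F_p of integer vectors, by Gaussian elimination."""
    rows = [[v % p for v in vec] for vec in vectors]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)
        rows[rank] = [v * inv % p for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(v - f * w) % p for v, w in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def passes_lattice_test(a, s, p):
    """True if the windows (a(x), ..., a(x+s-1)) span F_p^s."""
    windows = [a[x:x + s] for x in range(len(a) - s + 1)]
    return rank_mod_p(windows, p) == s

p = 7
inv_seq = inversive(1, 1, p, 0, 10)
lin_seq = [pow(3, x, p) for x in range(10)]   # homogeneous linear generator
print([passes_lattice_test(inv_seq, s, p) for s in (2, 3, 4)])
print(passes_lattice_test(lin_seq, 2, p))
```

The homogeneous linear generator fails already for $s = 2$, since all its windows are multiples of $(1, \lambda)$, which is one way to see why the test is only a weak necessary condition.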
Motivated by applications to pseudo-random number generators, Anashin [30]
describes polynomials for which the sequences satisfying the congruence (3.6) with
$M = p^k$ a prime power are uniformly distributed modulo $p^k$ (that is, they generate
a one-to-one mapping in the residue ring $\mathbb{Z}/p^k\mathbb{Z}$). Uniform distribution in the ring
of $p$-adic integers $\mathbb{Z}_p$ is also considered.
So far the continuous aspect of the problem, dealing with numbers distributed
in $[0,1]$, has been considered. The discrete aspect of the problem relates to binary
sequences, also known as sequences of pseudo-random bits. For an infinite sequence
$(\delta_x)$ of $g$-ary digits, $0 \le \delta_x \le g-1$, and a finite $g$-ary string $S = (d_1 \ldots d_k)$, denote by
$F(S, N)$ the number of occurrences of this string in the initial segment of length $N$,
and by $f(S, N) = F(S, N)/N$ the corresponding frequencies. The sequence $(\delta_x)$ is
called a pseudo-random sequence of $g$-ary digits if for any string $S$, $f(S, N) \to g^{-|S|}$
as $N \to \infty$, where $|S|$ is the length of $S$. As shown in Section 8.1, such sequences
correspond to $g$-ary expansions of numbers normal to base $g$, and such $g$-ary sequences
are also known as normal sequences of signs. Postnikov [1045], [1046]
provides a good outline of early approaches to such sequences and their applica-
tions. He also discusses various generalizations of this notion of normality such
as Bernoulli normal sequences, Markov normal sequences and normal continued
fractions. A construction of a Markov normal sequence with a very small discrep-
ancy (appropriately defined) is due to Levin [712]. Such discrete pseudo-random
sequences (finite or infinite) are of great importance in theoretical computer sci-
ence [1263]. They may also be used to rule out random walks on lattices, and
Postnikov [1044] notices that this leads to an algorithm for solving finite-difference
equations arising from the Dirichlet problem in the theory of partial differential
equations.
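The frequencies $f(S, N)$ are straightforward to compute for a concrete digit sequence. The sketch below counts occurrences lying entirely inside the initial segment of length $N$, which is one reasonable reading of the definition, and uses the binary Champernowne-type sequence (the binary expansions of 1, 2, 3, ... written one after another) as an illustrative input; both choices are assumptions made for the example.

```python
def frequency(digits, S, N):
    """f(S, N): occurrences of S inside the first N digits, divided by N."""
    k = len(S)
    target = tuple(S)
    hits = sum(1 for i in range(N - k + 1) if tuple(digits[i:i + k]) == target)
    return hits / N

def champernowne_bits(n_bits):
    """First n_bits digits of 1, 10, 11, 100, ... written consecutively."""
    bits, k = [], 1
    while len(bits) < n_bits:
        bits.extend(int(c) for c in bin(k)[2:])
        k += 1
    return bits[:n_bits]

N = 20000
bits = champernowne_bits(N)
for S in [(0,), (1,), (0, 1), (1, 1, 0)]:
    # Compare the observed frequency with the target value g^(-|S|), g = 2.
    print(S, frequency(bits, S, N), 2.0 ** -len(S))
```

Convergence for this particular sequence is known to be slow, so the printed frequencies are only roughly equal to $2^{-|S|}$ at this length.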
There is, however, one minor logical irritation here. Theorem 8.5 says that
there are numbers which are normal to some base $g$ but are not normal to some
other base $f$. If such $g$-ary sequences are identified with the corresponding number
\[ \vartheta = \sum_{h=1}^{\infty} \delta_h g^{-h} \]
which they represent, then the property of being pseudo-random is therefore not
invariant under change of base. On the other hand, it seems reasonable that being
pseudo-random should be an intrinsic property of $\vartheta$. This question is discussed by
Calude and Jürgensen [196], who show how to change the definition of pseudo-random
sequence (random in their terminology) in order to make it invariant under
change of base. A dynamical overview of randomness for $g$-ary sequences may be
found in [1340].
Certainly, the smaller the difference $|f(S, N) - g^{-|S|}|$ can be made, the better;
this size is related to the discrepancy of $(\vartheta g^x)$. This relation is not very direct,
and the sequences having the best bounds for the frequencies do not necessarily
correspond to values of $\vartheta$ with the best value of the discrepancy. Indeed, [1185]
gives a construction of a sequence with
\[ f(S, N) = g^{-|S|} + O(N^{-1+\varepsilon}) \]
for any finite string $S$. The result of Levin [713] yields a slight improvement of this
asymptotic formula. On the other hand, the long-standing record of Korobov [635]
and Levin [706], giving explicit constructions of $\vartheta$ such that the discrepancy of $(\vartheta g^x)$
is $O(N^{-2/3+\varepsilon})$, has recently been bettered: Levin [714] gives a construction which
leads to the bound $O(N^{-1} \log^2 N)$.
Frequencies $f(S, N)$ can also be considered for finite sequences, and here the
theory of linear recurrence sequences is of great use. For example, if $g = q$ is a prime
power, Theorems 7.1 and 7.2 are nothing but bounds for $f(S, N)$; see also [661],
[926].
Levin [712] also gives a construction of a Markov normal number.
A different point of view on pseudo-random bits obtained from linear recurrence
sequences and their applications is pursued in [26]. Results of these papers are also
used later in [258] to study the statistical properties of shrinking two $M$-sequences
over $\mathbb{F}_2$, when the sequences $a$ and $s$ are selected at random from the sets of
$M$-sequences of orders $n$ and $m$. The results of Section 7.1 can be used to obtain
some individual estimates as well. It is shown in [258] that if both $a$ and $s$ are
$M$-sequences over $\mathbb{F}_2$, and the characteristic polynomial of $a$ is chosen with uniform
probability among all the $\varphi(2^n - 1)/n$ primitive polynomials of degree $n$ over $\mathbb{F}_2$,
then the shrunken sequence $(a_s(x))$ has good statistical properties.
Another important question concerns the output rate of $(a_s(x))$, or equivalently
the upper bound for $h_x$, the position of the $x$th 1 in $s$ (see Chapter 4). It is shown
in [258] that if the characteristic polynomial of the selector sequence $s$ is chosen
with uniform probability among all the $\varphi(2^m - 1)/m$ primitive polynomials of degree
$m$ over $\mathbb{F}_2$, then $h_x = O(x)$. A very natural question is whether it is possible in these
two statements to get individual estimates (rather than estimates which hold with
high probability). An encouraging remark in this direction is that Theorem 14.8
below implies that $h_x = O(x)$ for any $x \ge 2^{\varepsilon m}$. This and several similar results
about the output rate and the distribution of the shrinking generator from two
linear recurrence sequences are presented in [1205]. All these properties make such
sequences very attractive for applications. Moreover, (4.1) shows that the shrunken
sequence is of exponential linear complexity (see Section 13.2 for the definition),
which is especially important for applications in cryptography.
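The shrinking construction described above keeps the term $a(x)$ exactly when the selector bit $s(x)$ equals 1. The sketch below uses two tiny $M$-sequences over $\mathbb{F}_2$, of orders 3 and 2, chosen only so that the output can be followed by hand; the helper names are hypothetical.

```python
def linear_recurrence(coeffs, init, M, length):
    """Terms a(0), ..., a(length-1) of a(x+n) = sum_j coeffs[j]*a(x+j) (mod M)."""
    a = list(init)
    n = len(coeffs)
    while len(a) < length:
        a.append(sum(c * v for c, v in zip(coeffs, a[-n:])) % M)
    return a[:length]

def shrink(a, s):
    """Shrunken sequence: the terms a(x) at the positions where s(x) = 1."""
    return [bit for bit, keep in zip(a, s) if keep]

def selector_positions(s):
    """h_1, h_2, ...: positions (from 0) of the successive ones in s."""
    return [x for x, keep in enumerate(s) if keep]

length = 21                                          # lcm of the periods 7 and 3
a = linear_recurrence([1, 1, 0], [1, 0, 0], 2, length)   # order 3, period 7
s = linear_recurrence([1, 1], [1, 0], 2, length)         # order 2, period 3
print(shrink(a, s))
print(selector_positions(s)[:6])
```

Here two out of every three selector bits are 1, so the positions $h_x$ grow linearly in $x$, in line with the output-rate statement $h_x = O(x)$.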
Finally, one of the closest relatives of pseudo-random numbers are so-called
quasi-random numbers or points. The main difference is this: instead of seeking
1-dimensional sequences $(\gamma_x)$ for which the $s$-tuples $(\gamma_x, \ldots, \gamma_{x+s-1})$ admit a good
bound for the $s$-dimensional discrepancy, one constructs sequences of $s$-dimensional
points $(\gamma_{1,x}, \ldots, \gamma_{s,x})$ which are uniformly distributed (with respect to the discrepancy
and other criteria) in the $s$-dimensional unit cube. One of the most celebrated
applications of such points is the Monte Carlo Method, see [925], [936], [1277].
Some 30 years ago, Sobol [1226] found a construction of a family of quasi-random
points which he called $L\Pi_\tau$-nets; these still give the best known order for
the $s$-dimensional discrepancy, $D_s(N) \ll N^{-1} \log^s N$. The construction is based
on properties of sequences of maximal period over $\mathbb{F}_2$. To construct such a net in
$s$-dimensional space, fix $s - 1$ distinct $M$-sequences over $\mathbb{F}_2$ of orders $n_1, \ldots, n_{s-1}$,
say (distinct means that they correspond to distinct primitive polynomials over
$\mathbb{F}_2$). The constant in the bound for the discrepancy, as well as many other useful
features of these sequences, depends on the parameter
\[ \tau = n_1 + \cdots + n_{s-1} - s + 1; \tag{13.9} \]
the smaller $\tau$ is, the better the sequence is. For the minimal values $\tau_s$ of those $\tau$
of the form (13.9), the explicit expression (6.17) and the asymptotic formula (6.18)
follow from the fact that there are $\varphi(2^n - 1)/n$ primitive polynomials of degree $n$
over $\mathbb{F}_2$. The $L\Pi_\tau$-nets of Sobol influenced much further research, and in particular
it appears that their analogues and generalizations with respect to $M$-sequences
over other fields produce better constants in the estimate of the discrepancy [935],
[936]. Moreover, the classical $L\Pi_\tau$-nets have not lost their value as, among many
other advantages, they are directly related to the binary representation of numbers,
the native language of modern computers.
It might seem that building a good pseudo-random number generator is just
a matter of mixing up several functions in an unexpected way. Unfortunately
this wide-spread belief has no real theoretical background: Knuth's example [619,
Algorithm K, Sect. 3.1] of an a priori exotic generator with very poor randomness
is instructive, and in some sense representative.

13.2. Pseudo-Random Number Generators in Cryptography


The previous section dealt with pseudo-random numbers from the point of view
of uniform distribution. Another view of pseudo-random numbers is related to
Kolmogorov's complexity-theoretic approach to random numbers. Roughly speaking, if
$\gamma$ is a sequence exhibiting almost random behaviour, then the next term $\gamma_x$ should
be hard to guess from knowledge of the previous ones $\gamma_1, \ldots, \gamma_{x-1}$ without knowing
the algorithm that produces them. Constructing sequences that are unpredictable
in this sense is an important part of modern cryptography.
The predictability problem is the following. Assume that some segment
$\gamma_h, \ldots, \gamma_{h+k-1}$
of the sequence $\gamma$ is known, as is some partial information about the nature of the
sequence (that is, the general shape of formulae for the sequence, and some of the
parameters involved are known). The problem is to predict the sequence forward
(that is, to generate $\gamma_{h+k}, \gamma_{h+k+1}, \ldots$) and backward (to recover $\gamma_{h-1}, \gamma_{h-2}, \ldots$).
These two aspects are closely related but for certain sequences can be problems
with essentially different complexities.
General approaches to this problem are beyond the scope of these notes; a
good introduction to the area is [666]. Instead, the predictability of the most
common pseudo-random number generators related to recurrence sequences will be
discussed. The majority of results obtained concern linear recurrence sequences,
and in particular sequences generated by a linear congruential generator. Several
non-linear generators have been studied as well. Slightly changing the terminology
of [666] we obtain the following list of sequences.
(1) The power generator
$a(x+1) \equiv a(x)^k \pmod{M}$, $0 \le a(x) \le M-1$,
where $k$ is some fixed integer.
(2) The exponential generator
$a(x+1) \equiv g^{a(x)} \pmod{M}$, $0 \le a(x) \le M-1$,
where $g$ is some fixed integer.
(3) The $1/M$-generator, the sequence $(d_x)$ of $g$-ary digits of $A/M$ (with some
integer $1 \le A < M$) in some fixed base $g$,
$A/M = 0.d_1 d_2 \ldots$.
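The three generators in this list can be sketched in a few lines. The function names and the toy parameters below are assumptions made purely for illustration; the moduli are far too small to have any cryptographic meaning.

```python
from itertools import islice

def power_generator(seed, k, M):
    """Yield a(0), a(1), ... with a(x+1) = a(x)^k mod M."""
    a = seed % M
    while True:
        yield a
        a = pow(a, k, M)

def exponential_generator(seed, g, M):
    """Yield a(0), a(1), ... with a(x+1) = g^a(x) mod M."""
    a = seed % M
    while True:
        yield a
        a = pow(g, a, M)

def one_over_m_digits(A, M, g):
    """Yield the g-ary digits d_1, d_2, ... of A/M by long division."""
    r = A % M
    while True:
        r *= g
        yield r // M
        r %= M

M = 7 * 11  # toy modulus
print(list(islice(power_generator(3, 2, M), 6)))
print(list(islice(exponential_generator(3, 2, M), 4)))
print(list(islice(one_over_m_digits(1, 7, 10), 6)))
```

With $k = 2$ the first function is a toy instance of the squaring map used by the Blum, Blum and Shub generator.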
An important special case of the power generator is the Blum, Blum and Shub
generator, $a(x+1) \equiv a(x)^2 \pmod{M}$, introduced in [127] and studied by several
other authors, see [155], [271], [407], [414], [406], [480], [666], [833], [1207] and
references therein.
Hallgren [508] has proposed using the elliptic curve analogue of the homogeneous
linear congruential generator (13.3). This and similar generators have also
been studied in [337], [445], [448]. In particular, using the bound for exponential
sums from [623], El Mahassni and Shparlinski obtained an upper bound on the
one-dimensional discrepancy of such generators. The same method can probably be
used to estimate the discrepancy in any fixed small dimension $s$, say for $s = 2, 3, 4$,
but it is not clear how to extend it to a general result which would apply to any
dimension. Many other natural questions about these constructions still remain
open, and they definitely deserve more attention. For example, it would be inter-
esting to study the linear complexity of such generators (in some special cases this
is done in [445]).
We now describe the construction of a pseudo-random function given by Naor
and Reingold [905]. Let $p$ denote a prime, and let $\ell$ denote a prime divisor of $p-1$.
Select an element $g \in \mathbb{F}_p^*$ of multiplicative order $\ell$. Then for each $n$-dimensional
vector $a = (a_1, \ldots, a_n) \in (\mathbb{Z}/\ell\mathbb{Z})^n$ one can define the function
(13.10) $f_a(x) = g^{a_1^{x_1} \cdots a_n^{x_n}} \in \mathbb{F}_p$,
where $x = x_1 \ldots x_n$ is the bit representation of an integer $x$, $0 \le x \le 2^n - 1$, with
some extra leading zeros if necessary. Naor and Reingold [905] have shown that
this function has some very desirable security properties, provided certain standard
cryptographic assumptions hold. The results of [65], [479], [1204], [1208], [1210]
demonstrate that for almost all vectors $a \in (\mathbb{Z}/\ell\mathbb{Z})^n$ the function (13.10), as well as
its natural generalization to elliptic curves, possesses very good linear complexity and
uniformity of distribution properties.
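A direct evaluation of (13.10) may help fix the notation. The sketch below assumes that $x_1$ is the most significant bit of $x$; the function name and the tiny parameters ($p = 23$, $\ell = 11$, $g = 2$) are illustrative choices only.

```python
def naor_reingold(a, g, p, ell, x):
    """Evaluate f_a(x) = g^(a_1^{x_1} ... a_n^{x_n}) in F_p.

    a   : list of n exponents modulo ell
    g   : element of multiplicative order ell modulo the prime p
    x   : integer with 0 <= x <= 2^n - 1 (x_1 is taken as its leading bit)
    """
    n = len(a)
    exponent = 1
    for i, a_i in enumerate(a):
        bit = (x >> (n - 1 - i)) & 1      # bits x_1 ... x_n, most significant first
        if bit:
            exponent = (exponent * a_i) % ell
    return pow(g, exponent, p)

# Toy parameters: ell = 11 divides p - 1 = 22 and 2 has order 11 modulo 23.
p, ell, g = 23, 11, 2
a = [3, 5, 7]
print([naor_reingold(a, g, p, ell, x) for x in range(8)])
```

Reducing the exponent modulo $\ell$ is legitimate because $g$ has order $\ell$.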
Several other generators are proposed in [666], including cellular automata
considered in Section 14.1 (see also [1358] and several other papers from [1359]).
Knowledge of $M$ and $k$ for the power generator, or of $M$ and $g$ for the exponential
generator, makes the forward prediction trivial. However, knowledge only of a
segment of the sequence, or trying to compute earlier values, involves computationally
hard problems: discrete root extraction and the discrete logarithm problem,
respectively. If $\gcd(k, \varphi(M)) = 1$ and $g$ is a primitive root modulo $p$, then both
problems have a unique solution but are computationally difficult (for a survey of
the current state of the art see [1202]). If $M = p$ is a prime, or any other number
with known Euler function $\varphi(M)$, then the backward computation for the power
generator can be done via the formula
$a(x) \equiv a(x+1)^{\ell} \pmod{M}$,
where $\ell$ is defined from the congruence $k\ell \equiv 1 \pmod{\varphi(M)}$. In general nothing
better is known than to reduce the problem to computing $\varphi(M)$, and thence to
the integer factorization of $M$. Thus if $M = p_1 p_2$ is a product of two large primes
then the problem is hard; such $M$ are called Rabin primes after the person who
first understood their importance for cryptography. That this problem is difficult
of course provides the security of the celebrated RSA-cryptosystem [621]. In fact,
this idea cannot be used without due care and attention: For example, Wiener
[1342] shows that if $M = p_1 p_2$ with $p_1 \sim p_2 \sim M^{1/2}$, and $k$ has
$k\ell < (0.25 - \varepsilon) M^{3/2}$,
then the RSA scheme can be broken in polynomial time via continued fractions.
For a simple description of this see [1027]. In particular, this problem arises if
$k < \varphi(M) < M$ and $\ell$ is smaller than a suitable constant multiple of $M^{1/4}$. The most
straightforward way to avoid the problem is to choose $k > M^{3/2}$; only the residue
of $k$ modulo $\varphi(M)$ is important, and this can be given by an artificially large value
if desired.
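The backward step for the power generator is short enough to write out. The sketch below assumes that $\varphi(M)$ is known (here because the toy modulus is given in factored form) and that $\gcd(k, \varphi(M)) = 1$; the function name is hypothetical.

```python
from math import gcd

def step_back(a_next, k, M, phi_M):
    """Recover a(x) from a(x+1) = a(x)^k mod M when phi(M) is known."""
    if gcd(k, phi_M) != 1:
        raise ValueError("k must be invertible modulo phi(M)")
    ell = pow(k, -1, phi_M)          # k * ell = 1 (mod phi(M)); needs Python 3.8+
    return pow(a_next, ell, M)

# Toy modulus with known factors, so phi(M) = (p1 - 1)(p2 - 1).
p1, p2 = 11, 23
M, phi_M = p1 * p2, (p1 - 1) * (p2 - 1)
k = 3
a0 = 5
a1 = pow(a0, k, M)
print(a1, step_back(a1, k, M, phi_M))
```

Without the factorization of $M$ the exponent `ell` is not available, which is exactly the obstacle described in the text.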
Turning to the forward prediction problem, recall that the classical Berlekamp-Massey
algorithm to decode linear cyclic codes [85], [767] efficiently solves
this problem if the field of definition is known. To find the characteristic polynomial
of a linear recurrence sequence $a$ of order $n$, the algorithm takes $2n$ consecutive
terms and outputs the characteristic polynomial in $O(n^2)$ field operations. The
basic algorithm is a version of the continued fraction algorithm. A more efficient
version is given by Blahut [122]; this uses a number of field operations that is
nearly linear in $n$, that is, linear up to a power of $\log n$ (see
the survey [796] and the recent papers [27], [117], [390], [391], [968], [969], [970],
[984] for details).
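For concreteness, here is a minimal sketch of the quadratic-time Berlekamp-Massey algorithm over a prime field $\mathbb{F}_p$. Restricting to a prime field, and the function name, are assumptions made to keep the example short; it returns the order $L$ and a connection polynomial rather than the characteristic polynomial in the normalization used in the text.

```python
def berlekamp_massey(s, p):
    """Return (L, C) with C = [1, c_1, ..., c_L] such that
    s[i] + c_1*s[i-1] + ... + c_L*s[i-L] = 0 (mod p) for L <= i < len(s)."""
    C, B = [1], [1]        # current and previous connection polynomials
    L, m, b = 0, 1, 1      # order, shift since last order change, last discrepancy
    for i in range(len(s)):
        d = s[i] % p
        for j in range(1, L + 1):
            d = (d + C[j] * s[i - j]) % p
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = d * pow(b, -1, p) % p
        if len(C) < len(B) + m:
            C += [0] * (len(B) + m - len(C))
        for j, bj in enumerate(B):
            C[j + m] = (C[j + m] - coef * bj) % p
        if 2 * L <= i:
            L, B, b, m = i + 1 - L, T, d, 1
        else:
            m += 1
    return L, C[:L + 1]

fib_mod7 = [0, 1, 1, 2, 3, 5, 1, 6]   # Fibonacci numbers reduced modulo 7
print(berlekamp_massey(fib_mod7, 7))
```

For a sequence of order $n$, feeding in $2n$ consecutive terms is enough for the returned recurrence to be the correct one.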
This leaves the case where the definition domain is not known: The observer
sees integers forming a linear recurrence sequence over some unknown residue ring.
Boyar [138] gives general results about the predictability of linear and non-linear
recurrence sequences. The model is as follows: To predict the sequence a from
several initial values, we are allowed to guess each value and if we make a mistake
the correct value is revealed. The fewer mistakes made, the less secure is the
sequence.
The following results are obtained in [138]. Assume that $a$ satisfies a linear
non-homogeneous recurrent congruence
$a(x+n) \equiv f_{n-1} a(x+n-1) + \cdots + f_1 a(x+1) + f_0 a(x) + e \pmod{M}$
of order $n$, but neither the modulus $M$ nor the coefficients $e, f_0, f_1, \ldots, f_{n-1}$ are
known. It is important to remember that the objective is to predict the sequence
rather than to find its parameters. In particular, the parameters are not uniquely
defined by the sequence, as the following example shows: The two sequences
$a_1(x+2) \equiv a_1(x+1) + a_1(x) \pmod 6$, $a_2(x+2) \equiv 4a_2(x+1) + a_2(x) \pmod 6$
with the initial values $a_1(1) = a_2(1) = 0$, $a_1(2) = a_2(2) = 2$, are identical. Boyar
proves that there is a prediction procedure of polynomial complexity $(n \log M)^{O(1)}$
which makes at most
$2n + 3 + \log n + \tfrac{1}{2} n \log n + n \log M$
mistakes. Several related results have been obtained by Joux and Stern [565].
Clearly one cannot guarantee to do this with fewer than $n - 1$ mistakes.
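The non-uniqueness example is easy to confirm numerically. The sketch below takes the initial values to be $0$ and $2$ for the first two terms (an assumption about the intended reading of the example) and compares the two recurrences modulo 6; the helper name is hypothetical.

```python
def recurrence_mod(coeff_next, coeff_prev, first, second, modulus, length):
    """Terms of a(x+2) = coeff_next*a(x+1) + coeff_prev*a(x) (mod modulus)."""
    terms = [first % modulus, second % modulus]
    while len(terms) < length:
        terms.append((coeff_next * terms[-1] + coeff_prev * terms[-2]) % modulus)
    return terms

a1 = recurrence_mod(1, 1, 0, 2, 6, 24)
a2 = recurrence_mod(4, 1, 0, 2, 6, 24)
print(a1)
print(a1 == a2)
```

The two sequences agree because every term is even, and $4a \equiv a \pmod 6$ whenever $a$ is even.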
For the quadratic generator
$a(x+1) \equiv f_2 a(x)^2 + f_1 a(x) + f_0 \pmod{M}$
a similar result is obtained, with the number of mistakes at most $10 + 3\log M$ (see
the comment after [138, Theorem 11]).
Lagarias and Reeds [669] and, later, Krawczyk [641] combine these two results
in a general statement about the predictability of non-linear recurrence sequences
(including vector-valued sequences). The methods of [669] work for $k$-dimensional
vector-polynomial sequences $A(x+1) = F(A(x))$ over a commutative ring $\mathcal{R}$ where
$A(x) \in \mathcal{R}^k$, and the map $F : \mathcal{R}^k \to \mathcal{R}^k$ is given by $k$ polynomials in $k$ variables.
This method relies on Hilbert's Basis Theorem and is therefore not computationally
effective. If $\mathcal{R}$ is the residue ring modulo $M$, and all polynomials involved are of
total degree $d$, then the total number of mistakes is estimated as
$1 + \varphi(d, k) + \tfrac{1}{2} N \log N + dN \log M$,
where
$N = \binom{k+d}{d}$.
The function $\varphi(d, k)$ is generally speaking not explicit, but for two important cases
it is known: $\varphi(1, k) = k + 1$ and $\varphi(d, 1) = d + 1$. These two cases subsume the cases
considered by Boyar [138].
For the polynomial recurrence equation
(13.11)
$a(x+n) \equiv F(a(x+n-1), \ldots, a(x)) \pmod{M}$, $0 \le a(x) \le M-1$, $x \in \mathbb{N}$,
in which neither the polynomial $F \in \mathbb{Z}[X_1, \ldots, X_n]$ nor the modulus $M$ is known,
the following effective result holds [641].
THEOREM 13.7. Each polynomial recurrence sequence is predictable in polynomial
time, with the number of mistakes at most $O(k^2 \log(kMd))$, where $d$ is the
degree and $k$ is the number of monomials in the polynomial $F$.
Clearly $k \le \binom{n+d}{d}$, but many generators are based on sparse polynomials for
which $k$ may be significantly smaller [641]. For vector-valued sequences similar
results have also been proved.
Blum, Blum and Shub [127] introduce and study the $1/M$ generator. Let $g \ge 2$
denote a fixed integer, and $M$ a positive integer that can be written with at most
$L$ $g$-ary digits (that is, $M < g^L$). They show that given $k = 2L + 3$ consecutive
digits $d_{h+1}, \ldots, d_{h+k}$ the denominator $M$ (and hence the numerator) can be found
in polynomial time $L^{O(1)}$. Indeed, the fraction
$\alpha = 0.(d_{h+1}, \ldots, d_{h+k})_g$
is an approximation to the fractional part $\{A g^h / M\}$ with error at most
$g^{-k} < 1/(2M^2)$.
So, ultimately, $\{A g^h / M\}$ is one of the convergents of the continued fraction
expansion of $\alpha$, and thus can be found effectively. On the other hand, it is observed
in [127] that, under Artin's conjecture, $k = L - 1$ digits are not enough to determine
$M$ unambiguously.
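The recovery of $M$ from $2L + 3$ digits can be tried out directly. The sketch below relies on `Fraction.limit_denominator` from the Python standard library, which returns the closest fraction with a bounded denominator; this is a convenient stand-in for walking through the convergents by hand, not necessarily the procedure of [127]. The secret values $A$, $M$, $h$ are hypothetical.

```python
from fractions import Fraction

def digits_of_fraction(A, M, g, h, k):
    """Return the digits d_{h+1}, ..., d_{h+k} of A/M in base g."""
    r = (A * pow(g, h, M)) % M        # numerator of the fractional part {A g^h / M}
    out = []
    for _ in range(k):
        r *= g
        out.append(r // M)
        r %= M
    return out

def recover_fraction(digits, g, L):
    """Recover {A g^h / M} from k = 2L + 3 digits, assuming M < g^L."""
    k = len(digits)
    value = 0
    for d in digits:
        value = value * g + d
    alpha = Fraction(value, g**k)     # alpha = 0.(d_{h+1} ... d_{h+k})_g
    # The true fraction is within g^(-k) < 1/(2 g^(2L)) of alpha, so it is the
    # unique closest fraction with denominator below g^L.
    return alpha.limit_denominator(g**L - 1)

g, L = 10, 4
A, M, h = 1234, 9973, 7               # hypothetical secret values with M < g^L
observed = digits_of_fraction(A, M, g, h, 2 * L + 3)
guess = recover_fraction(observed, g, L)
print(observed, guess)
```

Once the fraction is known, all later digits follow by continuing the long division, so the generator is predicted forward.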
In [629], similar results are shown for smaller segments of digits without reliance
on any standard conjectures. The first statement claims that a string of
$k = \lfloor 3L/37 \rfloor$ consecutive digits provides no information about $M$. Roughly speaking,
we see without any unproven conjectures that $M$ may take almost any value among
all the primes $p < g^L$.
THEOREM 13.8. Given any string of $k = \lfloor 3L/37 \rfloor$ consecutive $g$-digits, there
are at least $(1 + o(1))\pi(g^L)$ prime numbers $p < g^L$ such that the $g$-adic expansion
of $1/p$ contains that string.
Further, using results of [636], [637] it is shown that for an arbitrary $\varepsilon > 0$,
any string of $k < (1 - \varepsilon)L$ consecutive digits appears in the $g$-adic expansion of $1/M$
for at least $C(g) g^{\varepsilon L/2}$ values of $M < g^L$. Here $C(g)$ is some constant depending on
$g$ only.
THEOREM 13.9. There is a constant $c(g) > 0$, depending only on $g$, such that
for every element $M$ of the set
$\mathfrak{M} = \{M = p^{\alpha}\mu : 1 \le \mu \le Q,\ (\mu, p) = 1\}$,
where
$Q = \lfloor c(g) g^{\varepsilon L/2} \rfloor$,
$p$ is the smallest odd prime number with $\gcd(p, g) = 1$, and $p^{\alpha}$ is the largest power
of $p$ that is less than $g^L/Q$, and any string of $k = \lfloor (1 - \varepsilon)L \rfloor$ consecutive $g$-digits,
the $g$-adic expansion of $1/M$ contains the string.
To make prediction harder several tricks can be used. One of them is to use
two or more generators in parallel (with different initial values, or even of different
types) and to mix their results. For example, a power generator $a_p$ and an
exponential generator $a_e$ can be combined to form $a(x) \equiv a_p(x) + a_e(x) \pmod{M}$.
The shrinkage operation described in Chapter 4 can also be used. There do not
seem to be significant results about the predictability of such combined sequences.
The only exception is the bound (4.1) for the linear complexity of shrinking two
M-sequences. Another way to increase the security of a sequence is to use parts of
the elements. For example, given integers $h$, $s$, instead of a sequence $a$, consider
the truncation
$a_{h,s}(x) \equiv \lfloor 2^{-h} a(x) \rfloor \pmod{2^s}$, $0 \le a_{h,s}(x) \le 2^s - 1$.
Thus, $a_{h,s}$ is formed by the $s$ bits of $a(x)$ starting from the $(h+1)$th least significant
bit. For several common pseudo-random generators, an extensive detailed study of
this and similar truncation procedures has been made in the papers [137], [415],
[666], [669] (and references therein). Some of these results follow. For integers
$s, k > 1$ denote by $B_{k,s}(a(0), A, M)$ the binary $sk$-dimensional vector obtained by
adjoining $s$ leading bits of the $k$ first elements of the sequence $a$ produced by the
homogeneous linear congruential generator (13.3) with the initial value $a(0)$ (for
general generators (13.2) similar results can be obtained). All elements of the
sequence are considered as having $L = \lceil \log M \rceil$ bits, so $B_{k,s}(a(0), A, M)$ is obtained
from the bits of $\lfloor a(j)/2^{L-s} \rfloor$, $j = 0, \ldots, k-1$. It is shown in [415, Theorem 3.1]
that for any $k > 1$, $\varepsilon > 0$ and sufficiently large square-free $M > c(k, \varepsilon)$, there is an
exceptional set $E(M, k, \varepsilon)$ of multipliers $A$ of cardinality
$|E(M, k, \varepsilon)| < M^{1-\varepsilon}$
such that for any multiplier not in $E(M, k, \varepsilon)$ the following is true: If
$s = \lceil (1/k + \varepsilon) \log M + k(1/2 + \log 3) + 3.5 \log k + 2 - \log 3 \rceil$,
then the initial value of the sequence $a$ can be uniquely determined in polynomial
time from knowledge of the multiplier $A$, the modulus $M$ and the vector
$B_{k,s}(a(0), A, M)$.
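The two truncation operations used above can be made concrete as follows. This is only a sketch: the function names and toy parameters are assumptions, and each block of $s$ leading bits is returned as an integer rather than as $s$ separate coordinates of the vector.

```python
def lcg(a0, A, M):
    """Homogeneous linear congruential generator a(x+1) = A * a(x) mod M."""
    a = a0 % M
    while True:
        yield a
        a = (A * a) % M

def truncate(value, h, s):
    """a_{h,s}: the s bits of value starting at the (h+1)th least significant bit."""
    return (value >> h) % (1 << s)

def leading_bits_vector(a0, A, M, k, s):
    """s leading bits of each of the first k terms, every term being written
    with L = ceil(log2 M) bits; each entry encodes s bits as one integer."""
    L = (M - 1).bit_length()          # equals ceil(log2 M) for M >= 2
    gen = lcg(a0, A, M)
    return [next(gen) >> (L - s) for _ in range(k)]

M, A, a0 = 1009, 17, 123              # toy parameters chosen for illustration
print(truncate(0b10110110, 2, 3))
print(leading_bits_vector(a0, A, M, 4, 3))
```

The results quoted in the text concern how large $s$ must be before such a vector determines $a(0)$ uniquely.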
It is also remarked in that paper that an exceptional set $E(M, k, \varepsilon)$ is necessary,
as $A = 1$ is always a bad multiplier. Also, by Dirichlet's principle, if
$s \le \left\lfloor \frac{1 - \varepsilon}{k} \log M \right\rfloor$
then there is an initial value $a_0$ such that $B_{k,s}(a(0), A, M) = B_{k,s}(a_0, A, M)$ for at
least $M^{\varepsilon}$ initial values of $a(0) = 0, \ldots, M - 1$. Such an initial value is secure in
the following sense: Knowledge of $B_{k,s}(a(0), A, M)$ is not enough to determine the
entire sequence.
In the opposite direction, it is shown in [629] that for infinitely many primes
$M = p$,
$s = \frac{1 - \varepsilon}{k} \log p - \log\log p + O(1)$
and all but at most $p^{1-\varepsilon}$ multipliers $A$, and any initial value $a(0) = 0, \ldots, p - 1$,
the problem is secure: That is, the knowledge of $A$, $M$ and $B_{k,s}(a(0), A, M)$ for
such $k$ and $s$ is not enough to determine the initial value uniquely (regardless of
the complexity of the algorithm) and moreover, the number of candidates for the
initial value is exponentially large.
THEOREM 13.10. Let $p$ denote an $L$-bit prime with $2^L - 2^{3L/4} < p < 2^L$.
Then for any $k > 5$, $\varepsilon > 0$ there is an exceptional set $F(p, k, \varepsilon)$ of multipliers $A$
of cardinality $|F(p, k, \varepsilon)| < p^{1-\varepsilon}$, such that for any multiplier $A \notin F(p, k, \varepsilon)$ the
following is true. If
$s \le \frac{1 - \varepsilon}{k} L - \log L - c$,
where $c$ is an absolute constant, then for any binary $sk$-dimensional vector $B$ there
exist at least $p^{\varepsilon}$ initial values of $a(0) = 0, \ldots, p - 1$ such that $B_{k,s}(a(0), A, p) = B$.
In the results above, only the initial value is unknown. Boyar [137] shows
that even if the multiplier A and the modulus M are unknown and only the
$t < C \log\log M$ lowest bits are available, then the sequence can be predicted in
polynomial time with at most (2 + log M) log M + 3 errors (provided that each
wrong prediction is corrected).
The results of [641], [669] are applicable to such truncated sequences as well
(see also [666]).
Finally, Kuzmin and Nechaev [662], [663] show that any linear recurrence
sequence $a$ of order $n$ over $\mathbb{Z}/p^k\mathbb{Z}$ can be recovered from its last coordinate $a_{k-1}(x)$
given by (3.5) in time $O(n p^{k})$. (Unfortunately it is not clear in these papers exactly
what 'recovered' means; perhaps that the characteristic polynomials and initial
values can be found.)

There are other approaches to predictability of sequences. For brevity, only
linear methods of prediction will be considered. This leads to the notion of linear
complexity and the linear complexity profile of a sequence. For a finite or infinite
sequence $a$ of elements over a ring $\mathcal{R}$ the linear complexity profile $\mathcal{L}_a$ is defined as
follows: $\mathcal{L}_a(h)$ is the least $n$ such that $a(1), \ldots, a(h)$ form the first $h$ terms of some
linear recurrence sequence of order $n$ over $\mathcal{R}$. If $\mathcal{R} = \mathbb{F}$ is a field, then this function
is well-defined and satisfies the inequalities
$\mathcal{L}_a(h) \le \min\{h, \mathcal{L}_a(h+1)\}$
(put $\mathcal{L}_a(h) = 0$ for the identically zero sequence). Also define the linear complexity
$\mathcal{L}_a^* = \limsup_{h \to \infty} \mathcal{L}_a(h)$.
Clearly, $\mathcal{L}_a^* < \infty$ if and only if the sequence $a$ is a linear recurrence sequence beyond
some point. It is also clear that $\mathcal{L}_a(h) \le \mathcal{L}_a^* \le n$ if $a$ is a linear recurrence sequence
of order $n$ (in particular, for a periodic sequence of period $t$, $\mathcal{L}_a(h) \le \mathcal{L}_a^* \le t$).
As mentioned above, the Berlekamp-Massey algorithm computes $\mathcal{L}_a(h)$ in
$O(h^2)$ field operations, and the Blahut algorithm improves this to a number of
field operations that is nearly linear in $h$ (linear up to a power of $\log h$).
Linear complexity is a widely accepted measure of randomness and unpre-
dictability of sequences, and has many applications to cryptography [272], [610],
[796], [929], [932], [933], [937], [934], [1027], [1095]. There are many results giving
bounds for, and algorithms to compute, the linear complexity and even the linear
complexity profile of various sequences (de Bruijn sequences, sequences generated by finite
automata, random sequences and others). In addition to the sources quoted above,
the papers [117], [118], [119], [120], [121], [220], [221], [222], [296], [420], [439],
[442], [443], [535], [536], [549], [611], [612], [784], [794], [950], [952], [953],
[966], [984], [1293] discuss this.
A surprising result is obtained in [1361], showing that algebraic curves could
be a source of sequences with very good linear complexity profile properties.
Many of the results of Chapter 4 can be considered as results on the linear com-
plexity of sums, products, convolution, shrinking, self-shrinking and other functions
of linear recurrence sequences, and several of them are obtained in that context.
Thus the works [120], [258], [275], [276], [277], [459], [531], [652], [653], [654],
[655], [656], [663], [680], [806], [1098], [1379] are relevant to this subject as well.
For example, it is shown in [806] that a self-shrunken M-sequence of order $n$ has
linear complexity at least $2^{\lfloor n/2 \rfloor}$.
The following interesting construction, combining linear recurrence sequences
with a knapsack-like construction, has been proposed by Massey and Rueppel [1097]
and studied in [1095, 1096], and [813]. Let $a$ be a linear recurrence sequence of
order $n$ over $\mathbb{F}_2$. For an integer modulus $m > 1$ and an $n$-dimensional vector
$z = (z_1, \ldots, z_n) \in \mathbb{Z}_m^n$ of weights, consider the following subset sum generator of
pseudo-random numbers:
$u_z(x) \equiv \sum_{j=1}^{n} a(x+j-1) z_j \pmod{m}$, $0 \le u_z(x) \le m-1$, $x = 1, 2, \ldots$
of elements of $\mathbb{Z}_m$. For cryptographic applications, it is advisable to use a linear
recurrence sequence of maximal period $r = 2^n - 1$ and the modulus $m = 2^n$.
Relatively little is known about this sequence, although some results about its
multidimensional distribution (for general $r$ and $m$) have recently been established
in [255].
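A toy version of the subset sum generator is sketched below, driven by the M-sequence of order 4 attached to the primitive polynomial $X^4 + X + 1$. The weights are arbitrary illustrative values and the helper names are assumptions.

```python
def lfsr_bits(taps, state, count):
    """Binary linear recurrence a(x+n) = sum of a(x+i) for i in taps (mod 2).

    state holds the first n bits a(1), ..., a(n); returns the first count bits.
    """
    bits = list(state)
    n = len(state)
    while len(bits) < count:
        bits.append(sum(bits[-n + i] for i in taps) % 2)
    return bits[:count]

def subset_sum_generator(bits, weights, m):
    """u_z(x) = sum_{j=1}^{n} a(x+j-1) z_j mod m for x = 1, 2, ..."""
    n = len(weights)
    return [sum(b * z for b, z in zip(bits[x:x + n], weights)) % m
            for x in range(len(bits) - n + 1)]

n = 4
taps = [0, 1]                  # a(x+4) = a(x+1) + a(x), from X^4 + X + 1
bits = lfsr_bits(taps, [1, 0, 0, 0], 2 * (2**n - 1))
weights = [3, 5, 6, 9]         # hypothetical weights z_1, ..., z_4 modulo m = 2^n
outputs = subset_sum_generator(bits, weights, 2**n)
print(bits[:15])
print(outputs[:15])
```

Each output is the sum of the weights selected by a window of $n$ consecutive bits, so the output sequence inherits the period of the underlying M-sequence.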
The linear complexity of various non-linear generators, including power, inver-
sive and Naor-Reingold generators, has been estimated in [65], [479], [480], [498],
[805], [833], [957], [1207], [1210].
In the series of papers [220], [221], [222], [610], [611], [612] (and references
therein) the following construction is considered. Let $\rho$ denote any mapping from
$\mathbb{F}_q$ to $\mathbb{F}_2$ (whose elements are as usual called bits). Notice that $\mathbb{F}_q$ may or may not
be an extension of $\mathbb{F}_2$; odd $q$ are of equal or even greater interest. For example, if
$q = p$ is prime, and $\rho(a)$ is the last bit of the least positive residue of its argument
$a$, then this is related to the truncation operation considered above. For a sequence
$a$ taking values in $\mathbb{F}_q$, the sequence of bits $b$ with $b(x) = \rho(a(x))$ is an important
example. Sometimes it is convenient to view $\rho$ as a function from $\mathbb{F}_q$ to $\{0, 1\} \subset \mathbb{F}_q$;
examples here include polynomials. This is particularly convenient when $q$ is even,
so $\mathbb{F}_q$ is an algebraic extension of $\mathbb{F}_2$ (recall that over a finite field any function
can be represented by a polynomial). If $a$ is a linear recurrence sequence then
Theorems 4.1 and 4.2 apply. Certainly this construction is too complex to handle
in such generality but, for some specializations, it produces very interesting and
useful sequences. For example, the authors of [220], [221], [610], [611], [612]
concentrate on the case when $a$ is an M-sequence.
For odd $q$, the following result appears in [610]. For an M-sequence $a$ of order
$n$ over $\mathbb{F}_q$, define the subsequence $a_\tau(x) = a(\tau x)$ where $\tau = (q^n - 1)/(q - 1)$ is its
restricted period (see Chapter 3 for the definition). Put $b_\tau(x) = \rho(a_\tau(x))$. Since
both $b$ and $b_\tau$ are periodic,
(13.12) $\mathcal{L}_b^* \ge \tau \mathcal{L}_{b_\tau}^*$,
so it is of exponential linear complexity. If the mapping $\rho$ is chosen so that
$\mathcal{L}_{b_\tau}^* = q - 1$ (for small $q$ this should not be difficult), then $\mathcal{L}_b^*$ attains its maximal possible
value $q^n - 1$, which is the period of the sequence $b$.
Klapper [610] notices that since $b$ and $b_\tau$ are 0,1-sequences they can be thought
of over any other finite field $\mathbb{F}$. This potentially gives different linear complexity,
however the formula (13.12) holds for any field of characteristic different from the
characteristic of $\mathbb{F}_q$, $q = p^r$. He shows that over $\mathbb{F}_p$ the maximal value of the linear
complexity is essentially smaller, being at most
$\binom{n + p - 1}{n}^r$.
For fixed $p$, this is of polynomial order $n^{O(1)}$ rather than of exponential order as
over other fields.
The question about finding q from values of the bit sequence b is also considered
in [610], where some probabilistic algorithms are presented.
Statistical properties of $\mathcal{L}_a(h)$ for a random sequence $a$ of elements of $\mathbb{F}_q$ are
summarized by Niederreiter [932]. In particular, denote by $N_h(L)$ the number of
different $h$-term sequences $a$ over $\mathbb{F}_q$ with $\mathcal{L}_a(h) = L$. Then
$N_h(L) = (q - 1) q^{\min\{2h - 2L, 2L - 1\}}$,
for $h \ge L > 0$. This formula can be used to show that $\mathcal{L}_a(h) = h/2 + O(\log h)$ for
an infinite uniformly distributed random sequence over $\mathbb{F}_q$. More precisely, for such
a sequence
$\limsup_{h \to \infty} \frac{|\mathcal{L}_a(h) - h/2|}{\log h} = \frac{1}{2 \log q}$
with probability 1, see [932], [937].
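For small $h$ the counting formula for $N_h(L)$ can be checked by brute force in the binary case. The sketch below uses a compact Berlekamp-Massey routine over $\mathbb{F}_2$ (the function name is an assumption) and tabulates the linear complexity of all $2^h$ sequences of length $h = 8$.

```python
from itertools import product
from collections import Counter

def linear_complexity_gf2(bits):
    """Berlekamp-Massey over F_2: linear complexity of a finite bit sequence."""
    C, B = [1], [1]
    L, m = 0, 1
    for i in range(len(bits)):
        d = bits[i]
        for j in range(1, L + 1):
            d ^= C[j] & bits[i - j]
        if d == 0:
            m += 1
            continue
        T = C[:]
        if len(C) < len(B) + m:
            C += [0] * (len(B) + m - len(C))
        for j, bj in enumerate(B):
            C[j + m] ^= bj
        if 2 * L <= i:
            L, B, m = i + 1 - L, T, 1
        else:
            m += 1
    return L

h, q = 8, 2
counts = Counter(linear_complexity_gf2(seq) for seq in product((0, 1), repeat=h))
for L in range(1, h + 1):
    predicted = (q - 1) * q ** min(2 * h - 2 * L, 2 * L - 1)
    print(L, counts[L], predicted)
```

The counts peak at $L = h/2$, in line with the statement that a random sequence has linear complexity close to $h/2$.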
As seen above, for a random sequence $\mathcal{L}_a(h)$ is about $h/2$. A natural question is
how to construct a pseudo-random sequence having similar behaviour of the linear
complexity profile. The construction of such sequences uses the theory of continued
fractions in the field $\mathbb{F}_q((X^{-1}))$ of formal Laurent series over $\mathbb{F}_q$. For a sequence $a$,
write
$A(X) = \sum_{h=1}^{\infty} a(h) X^{-h} \in \mathbb{F}_q((X^{-1}))$.
Let $[C_1(X), C_2(X), \ldots]$ denote the continued fraction expansion of $A(X)$ in which
the partial quotients $C_j(X)$, $j = 1, 2, \ldots$, are polynomials over $\mathbb{F}_q$ with positive
degree. Then the following remarkable statement holds [929], [937].
THEOREM 13.11. Put $d_j = \deg C_j$. Then
$\mathcal{L}_a(h) = \sum_{j=0}^{m} d_j$,
where $m$ is uniquely defined by the inequalities
$2 \sum_{j=0}^{m-1} d_j + d_m \le h < 2 \sum_{j=0}^{m} d_j + d_{m+1}$.
If all the $d_j = 1$ then the sequence $a$ has a perfect linear complexity profile,
namely $\mathcal{L}_a(h) = \lfloor (h + 1)/2 \rfloor$. The reverse is also true. In particular, the sequence
$a(x) = \begin{cases} 1, & \text{if } x = 2^j - 1; \\ 0, & \text{otherwise;} \end{cases}$
corresponding to the continued fraction $[X, X, \ldots]$ over $\mathbb{F}_2$ has perfect linear
complexity profile. On the other hand, the bit distribution of this sequence is very
poor, which shows some of the weaknesses of this measure of randomness for
sequences. Further discussion of the density of unit bits in such sequences can be
found in [853], [1364].
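Both features of this sequence, the perfect profile and the poor bit distribution, can be observed numerically. The sketch below computes the whole linear complexity profile over $\mathbb{F}_2$ with Berlekamp-Massey; the function name and the cut-off of 64 terms are assumptions.

```python
def linear_complexity_profile_gf2(bits):
    """Berlekamp-Massey over F_2, returning the profile [L(1), ..., L(h)]."""
    C, B = [1], [1]
    L, m = 0, 1
    profile = []
    for i in range(len(bits)):
        d = bits[i]
        for j in range(1, L + 1):
            d ^= C[j] & bits[i - j]
        if d:
            T = C[:]
            if len(C) < len(B) + m:
                C += [0] * (len(B) + m - len(C))
            for j, bj in enumerate(B):
                C[j + m] ^= bj
            if 2 * L <= i:
                L, B, m = i + 1 - L, T, 1
            else:
                m += 1
        else:
            m += 1
        profile.append(L)
    return profile

h_max = 64
# a(x) = 1 exactly when x + 1 is a power of two (x = 1, 3, 7, 15, ...), x = 1..h_max
a = [1 if (x & (x + 1)) == 0 else 0 for x in range(1, h_max + 1)]
profile = linear_complexity_profile_gf2(a)
print(profile[:10])
print(sum(a), "ones among the first", h_max, "terms")
```

The profile follows $\lfloor (h+1)/2 \rfloor$ exactly while only a logarithmic number of terms are nonzero, which is the weakness pointed out in the text.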
Niederreiter [934] demonstrates that a finite sequence of positive integers
$L_1, \ldots, L_h$
can be realized as the initial segment of the linear complexity profile of some
sequence $a$ over $\mathbb{F}_q$ if and only if the following conditions are satisfied: If $L_k > k/2$
then $L_{k+1} = L_k$, and if $L_k \le k/2$ then $L_{k+1} = L_k$ or $L_{k+1} = k + 1 - L_k$,
$k = 0, 1, \ldots, h - 1$. Moreover, there are exactly $q^{\min\{L_h, h - L_h\}}$ such $h$-element sequences.
It is striking that this number depends on $L_h$ only. The jump complexity profile of
a sequence $a$ is defined as follows: $J_a(h)$ is the number of positive integers among
the list $\mathcal{L}_a(1), \mathcal{L}_a(2) - \mathcal{L}_a(1), \ldots, \mathcal{L}_a(h) - \mathcal{L}_a(h-1)$. Some combinatorial and
statistical properties of linear complexity and jump complexity profiles were considered
in [212], [348], [544], [1327] for the binary case and in [934] for sequences over
arbitrary finite fields. For example, it is shown in [934] that


$N_h(L, j) = \begin{cases} \binom{\min\{L, h-L+1\} - 1}{j - 1} (q-1)^j q^{\min\{L, h-L\}}, & \text{if } 1 \le j \le \min\{L, h-L+1\}; \\ 0, & \text{if } j > \min\{L, h-L+1\}, \end{cases}$
where $N_h(L, j)$ is the number of different $h$-term sequences $a$ over $\mathbb{F}_q$ such that
$\mathcal{L}_a(h) = L$ and $J_a(h) = j$ (trivially, $N_h(0, 0) = 1$ and $N_h(L, 0) = 0$ if $L \ge 1$).
The mean value and variance of $J_a(h)$ for an infinite uniformly distributed random
sequence over $\mathbb{F}_q$ are evaluated in [934] as well. One of the results obtained asserts
that
$\limsup_{h \to \infty} (h \log h)^{-1/2} \left| J_a(h) - (q-1)h/(2q) \right| \le (2/q)^{1/2}$
with probability 1. For random binary sequences $a$ the expected value $E(J_a(h))$
and the variance $V(J_a(h))$ of $J_a(h)$ are explicitly evaluated by Wang [1327]. For
instance,
$E(J_a(h)) = \begin{cases} h/4 + 1/3 - 2^{-h}/3, & \text{if $h$ is even;} \\ h/4 + 5/12 - 2^{-h}/3, & \text{if $h$ is odd.} \end{cases}$
In [1203] a lower bound of order $H p^{-1/2}$, up to a logarithmic factor in $p$, is obtained
for the linear complexity of a sequence of $H$ consecutive values of the discrete
logarithm modulo $p$ modulo any divisor $d$ of $p - 1$. Several other similar results can
be found in [674], [804].
The case $d = 2$ is of interest because it corresponds to the sequence of values
of the rightmost bit of the discrete logarithm, which determines whether $x$ is a
quadratic residue modulo $p$. In this case, the linear complexity of the infinite
sequence (which is periodic with period $p$) has been evaluated in [302]:
$\mathcal{L}^* = \begin{cases} (p-1)/2, & \text{if } p \equiv 1 \pmod 8, \\ p, & \text{if } p \equiv 3 \pmod 8, \\ p - 1, & \text{if } p \equiv 5 \pmod 8, \\ (p+1)/2, & \text{if } p \equiv 7 \pmod 8. \end{cases}$
This result has been generalized in [300], see also [272], [297], [298], [299].
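The four cases can be observed on small primes. The sketch below assumes the convention that the bit is 1 for quadratic non-residues and 0 otherwise, including at multiples of $p$ where the discrete logarithm is undefined; with a different convention at those positions the values may change. Two full periods are passed to Berlekamp-Massey over $\mathbb{F}_2$, which suffices since the linear complexity is at most $p$.

```python
def linear_complexity_gf2(bits):
    """Berlekamp-Massey over F_2: linear complexity of a finite bit sequence."""
    C, B = [1], [1]
    L, m = 0, 1
    for i in range(len(bits)):
        d = bits[i]
        for j in range(1, L + 1):
            d ^= C[j] & bits[i - j]
        if d == 0:
            m += 1
            continue
        T = C[:]
        if len(C) < len(B) + m:
            C += [0] * (len(B) + m - len(C))
        for j, bj in enumerate(B):
            C[j + m] ^= bj
        if 2 * L <= i:
            L, B, m = i + 1 - L, T, 1
        else:
            m += 1
    return L

def legendre_bits(p, length):
    """b(x) = 1 if x is a quadratic non-residue mod p, else 0 (also when p | x)."""
    residues = {(y * y) % p for y in range(1, p)}
    return [0 if (x % p == 0 or x % p in residues) else 1 for x in range(length)]

for p in (3, 5, 7, 11, 17, 23):
    b = legendre_bits(p, 2 * p)       # two full periods
    print(p, p % 8, linear_complexity_gf2(b))
```

Comparing the printed values with the residue of $p$ modulo 8 gives a quick sanity check of the case distinction.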
As mentioned above, many properties of the linear complexity can be expressed
in terms of properties of continued fractions related to the sequence. Links between
linear complexity profiles of sequences over $\mathbb{F}_q$ and $q$-adic expansions of real numbers
of the interval $[0,1]$ have been discovered in [951]. Fix a bijection
$\psi : \mathbb{F}_q \to \{0, \ldots, q-1\}$.
Then each infinite sequence $s$ over $\mathbb{F}_q$ can be mapped to the point
$a(s) = \sum_{i=1}^{\infty} \psi(s_i) q^{-i} \in [0,1]$.
Now let $d : \mathbb{N} \to \mathbb{R}$ be any non-negative function. If $|L_n(s) - n/2| \le d(n)$ for all
$n = 1, 2, \ldots$ then the sequence $s$ has $d$-almost perfect linear complexity profile. The
case of constant $d$ has special interest.
The Hausdorff dimension $D(d, q)$ and the Hausdorff measure $M(d, q)$ of the
set of points corresponding to sequences with $d$-almost perfect linear complexity
profile are evaluated explicitly in [951]. In particular, if $d(n) = d$ is constant then
$M(d, q) = 0$ and $D(d, q)$ is given explicitly in terms of $\Phi(d, q)$,
where $\Phi(d, q)$ is the largest root of the equation
$\Phi^d - (q - 1) \sum_{j=0}^{d-1} \Phi^j = 0$.
The case $d(n) \to \infty$ is considered as well. In this case (under some additional
natural assumptions) $\log_q n$ is the threshold: The measure is zero if $d(n)$ grows
slower than $\log_q n$ and is positive if $d(n)$ grows faster than $(1 + \varepsilon) \log_q n$. On the
other hand, $D(d, q) = 1$ for any $d(n) \to \infty$.
For example, the $k$-error linear complexity $\mathcal{L}_a(k, h)$, see [803], [802], [940],
[941], of a sequence $a$ is defined as the smallest possible linear complexity $\mathcal{L}_b(h)$
taken over all sequences $b$ which differ from $a$ in at most $k$ places. This is an
important characteristic of 'stability' of the linear complexity of the sequence.
Efficient algorithms to compute the $k$-error linear complexity are discussed in [568],
[569]. Relations between the linear complexity and the $k$-error linear complexity
have been studied in [659]; see also [570]. A very similar function is called sphere
complexity in [272]. Several results about the distribution of $\mathcal{L}_a(k, h)$ over the set
of all binary sequences are given in [941].
