Approximate Entropy for Testing Randomness
Andrew L. Rukhin
Abstract
Key words: Decomposable Statistics, Entropy, Information Divergence, Multinomial Distribution, Poisson Distribution, \chi^2-distribution.
Andrew L. Rukhin is a Professor in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County, Baltimore, MD 21250. He also has a faculty appointment in the Statistical Engineering Division at the National Institute of Standards and Technology, Gaithersburg, MD 20899-0001. This work has been motivated by a joint project with the Computer Security Division of the National Institute of Standards and Technology.
and

\Phi^{(m)} = \frac{1}{n+1-m} \sum_{i=1}^{n+1-m} \log C_i^m.
Observe that C_i^m is the relative frequency of occurrences of the template Y_i^{(m)} in the sequence, and \Phi^{(m)} is the entropy of the empirical distribution arising on the observed subset of the set of all s^m possible patterns of length m.

The approximate entropy ApEn of order m, m \ge 1, is defined as

ApEn(m) = \Phi^{(m)} - \Phi^{(m+1)}

with ApEn(0) = -\Phi^{(1)}. "ApEn(m) measures the logarithmic frequency with which blocks of length m that are close together remain close together for blocks augmented by one position. Thus, small values of ApEn(m) imply strong regularity, or persistence, in a sequence. Alternatively, large values of ApEn(m) imply substantial fluctuation, or irregularity." (Pincus and Singer, 1996, p. 2083).
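For a binary alphabet (s = 2) the quantities \Phi^{(m)} and ApEn(m) above can be computed directly from the overlapping template frequencies. A minimal sketch follows; the function names are illustrative and not from the paper, and the sums run over the n - m + 1 templates of the non-augmented string.

```python
import math

def phi(seq, m):
    """Phi^(m): average log relative frequency of the length-m template
    starting at each position (overlapping templates, no wrap-around)."""
    n = len(seq)
    templates = [tuple(seq[i:i + m]) for i in range(n - m + 1)]
    counts = {}
    for t in templates:
        counts[t] = counts.get(t, 0) + 1
    total = len(templates)
    # mean of log C_i^m over all template positions i
    return sum(math.log(counts[t] / total) for t in templates) / total

def apen(seq, m):
    """ApEn(m) = Phi^(m) - Phi^(m+1), with ApEn(0) = -Phi^(1)."""
    if m == 0:
        return -phi(seq, 1)
    return phi(seq, m) - phi(seq, m + 1)
```

For the perfectly alternating string 0101... the value apen(seq, 0) equals log 2, the largest possible value for a binary alphabet, while a constant string gives ApEn(1) = 0.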
Pincus and Singer (1996) defined a sequence to be m-irregular (m-random) if its approximate entropy ApEn(m) takes the largest possible value. Pincus and Kalman (1997) evaluated the quantities ApEn(m), m = 0, 1, 2, for the binary and decimal expansions of e, \pi, \sqrt{2} and \sqrt{3}, with the surprising conclusion that the expansion of \sqrt{3} demonstrated much more irregularity than that of \pi.
Since \Phi^{(m)} is the entropy of the empirical distribution, which under the randomness assumption must be almost uniform, one should expect that for fixed m, \Phi^{(m)} \approx -m \log s and ApEn(m) = \Phi^{(m)} - \Phi^{(m+1)} \to \log s; indeed this fact follows from Theorem 2 in Pincus (1991). As far as the limiting behavior of ApEn(m) - \log s is concerned, Pincus and Huang (1992), p. 3072, indicate that "analytic proofs of asymptotic normality and especially explicit variance estimates for ApEn appear to be extremely difficult".
The key step leading to the limiting distribution of approximate entropy is a modification of its definition. Introduce the modified version of the empirical distribution entropy \Phi^{(m)} as

\tilde{\Phi}^{(m)} = \sum_{i_1 \cdots i_m} \pi_{i_1 \cdots i_m} \log \pi_{i_1 \cdots i_m}.   (1)

Here \pi_{i_1 \cdots i_m} = \nu_{i_1 \cdots i_m}/n denotes the relative frequency of the pattern (i_1, \ldots, i_m) in the augmented (or circular) version of the original string, i.e., in the string (\epsilon_1, \ldots, \epsilon_n, \epsilon_1, \ldots, \epsilon_{m-1}). Under this definition \nu_{i_1 \cdots i_m} = \sum_k \nu_{i_1 \cdots i_m k}, so that for any m, \sum_{i_1 \cdots i_m} \nu_{i_1 \cdots i_m} = n.
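The circular frequencies \nu and the modified entropy \tilde{\Phi}^{(m)} can be sketched as follows (a minimal illustration; the helper names are mine, not the paper's):

```python
import math
from collections import Counter

def circular_counts(seq, m):
    """nu_{i1..im}: counts of length-m patterns in the augmented (circular)
    string (eps_1,...,eps_n, eps_1,...,eps_{m-1}); the counts sum to n."""
    n = len(seq)
    ext = seq + seq[:m - 1]  # wrap the first m-1 symbols around
    return Counter(tuple(ext[i:i + m]) for i in range(n))

def phi_tilde(seq, m):
    """Modified entropy sum over observed patterns: sum pi log pi, pi = nu/n."""
    n = len(seq)
    return sum((c / n) * math.log(c / n)
               for c in circular_counts(seq, m).values())
```

By construction every pattern count of length m equals the sum of the counts of its s one-symbol extensions, which is the property used repeatedly below.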
Define the modified approximate entropy as

\widetilde{ApEn}(m) = \tilde{\Phi}^{(m)} - \tilde{\Phi}^{(m+1)}.   (2)

If \nu'_{i_1 \cdots i_m} denotes the frequency of the pattern in the original (non-augmented) string, so that \sum \nu'_{i_1 \cdots i_m} = n - m + 1, and \pi'_{i_1 \cdots i_m} = \nu'_{i_1 \cdots i_m}/(n - m + 1), then

\left| \pi_{i_1 \cdots i_m} - \pi'_{i_1 \cdots i_m} \right| \le \frac{m-1}{n-m+1},   (3)

which suggests that for a fixed m, Pincus' approximate entropy ApEn(m) and \widetilde{ApEn}(m) must be close when n is large.
In the next Section I derive the limiting distribution of 2n[\log s - \widetilde{ApEn}(m)] when n \to \infty and m is fixed. It is also proven that n[ApEn(m) - \widetilde{ApEn}(m)] = O_P(n^{-1}), so that the limiting distributions of Pincus' approximate entropy and of its modified version coincide. It is shown here that the limiting distribution of 2n[\log s - \widetilde{ApEn}(m)], as well as that of 2n[\log s - ApEn(m)], is that of a \chi^2-random variable with (s-1)s^m degrees of freedom.
Proposition 1 For fixed m, as n \to \infty, one has the following convergence in distribution:

2n\left[\log s - \widetilde{ApEn}(m)\right] \to \chi^2(s^{m+1} - s^m).

Also

n\left[ApEn(m) - \widetilde{ApEn}(m)\right] = O_P\left(\frac{1}{n}\right),   (4)

so that

2n\left[\log s - ApEn(m)\right] \to \chi^2(s^{m+1} - s^m).
Proof Let us start with the limit theorem for \widetilde{ApEn}(m). Put

Z_{i_1 \cdots i_m} = \sqrt{n}\left[\pi_{i_1 \cdots i_m} - \frac{1}{s^m}\right].

Then the vector formed by the Z_{i_1 \cdots i_m} has an asymptotic multivariate normal distribution with zero mean and covariance matrix of the form

\Sigma_m = \frac{1}{s^m} I_m - \frac{1}{s^{2m}} e_m e_m^T.

Here I_m denotes the s^m \times s^m identity matrix and e_m^T = (1, \ldots, 1) is an s^m-dimensional vector. Since, with probability one, \sum Z_{i_1 \cdots i_m} = 0, (1) shows that

\tilde{\Phi}^{(m)} = \sum_{i_1 \cdots i_m} \left[\frac{1}{s^m} + \frac{Z_{i_1 \cdots i_m}}{\sqrt{n}}\right] \left[-m \log s + \frac{s^m Z_{i_1 \cdots i_m}}{\sqrt{n}} - \frac{s^{2m} Z_{i_1 \cdots i_m}^2}{2n} + O_P\left(n^{-3/2}\right)\right]
= -m \log s + \frac{s^m}{2n} \sum_{i_1 \cdots i_m} Z_{i_1 \cdots i_m}^2 + O_P\left(n^{-3/2}\right).
Using a similar notation for patterns of length m+1, let \pi_{i_1 \cdots i_m i_{m+1}} be the relative frequencies, and let Z_{i_1 \cdots i_m i_{m+1}} denote the corresponding differences between the empirical and theoretical probabilities. Then

Z_{i_1 \cdots i_m} = \sum_{k=1}^{s} Z_{i_1 \cdots i_m k}

and

\tilde{\Phi}^{(m+1)} = -(m+1) \log s + \frac{s^{m+1}}{2n} \sum_{i_1 \cdots i_m i_{m+1}} Z_{i_1 \cdots i_m i_{m+1}}^2 + O_P\left(n^{-3/2}\right).

Thus

\tilde{\Phi}^{(m)} - \tilde{\Phi}^{(m+1)} = \log s - \frac{s^m}{2n} \left[s \sum_{i_1 \cdots i_m i_{m+1}} Z_{i_1 \cdots i_m i_{m+1}}^2 - \sum_{i_1 \cdots i_m} \Big(\sum_k Z_{i_1 \cdots i_m k}\Big)^2\right] + O_P\left(n^{-3/2}\right)
= \log s - \frac{s^m}{2n} Z^T Q Z + O_P\left(n^{-3/2}\right)
with the s^{m+1} \times s^{m+1} block-diagonal matrix Q formed by s^m blocks Q_0 of size s \times s,

Q_0 = s I_1 - e_1 e_1^T,

and the s^{m+1}-dimensional normal vector Z. The distribution of the quadratic form Z^T Q Z is that of \sum l_{i_1 \cdots i_m i_{m+1}} W_{i_1 \cdots i_m i_{m+1}}^2 with independent standard normal variables W_{i_1 \cdots i_m i_{m+1}} and l_{i_1 \cdots i_m i_{m+1}} denoting the eigenvalues of the matrix \Sigma_{m+1}^{1/2} Q \Sigma_{m+1}^{1/2}.
It is easy to check that

\Sigma_{m+1}^{1/2} = \frac{1}{s^{(m+1)/2}} I_{m+1} - \frac{1}{s^{3(m+1)/2}} e_{m+1} e_{m+1}^T

and \Sigma_{m+1}^{1/2} Q \Sigma_{m+1}^{1/2} = \frac{Q}{s^{m+1}}.

The evaluation of the determinant \det\left[\Sigma_{m+1}^{1/2} Q \Sigma_{m+1}^{1/2} - l I_{m+1}\right] shows that the needed eigenvalues are equal to s^{-m} with multiplicity (s-1)s^m and to 0 with multiplicity s^m. Therefore

\tilde{\Phi}^{(m)} - \tilde{\Phi}^{(m+1)} \approx \log s - \frac{1}{2n} \chi^2\left((s-1)s^m\right)

and

n\left[\log s - \widetilde{ApEn}(m)\right] \to \frac{1}{2} \chi^2(s^{m+1} - s^m).
The estimate (3) shows that if Z'_{i_1 \cdots i_m} = \sqrt{n}\left[\pi'_{i_1 \cdots i_m} - s^{-m}\right], then

\left|Z'_{i_1 \cdots i_m} - Z_{i_1 \cdots i_m}\right| \le \frac{(m-1)\sqrt{n}}{n-m+1}

and

\left|\tilde{\Phi}^{(m)} - \Phi^{(m)} - \frac{s^m}{2n}\Big[\sum_{i_1 \cdots i_m} Z_{i_1 \cdots i_m}^2 - \sum_{i_1 \cdots i_m} Z_{i_1 \cdots i_m}'^2\Big]\right| \le \frac{s^{2m} n (m-1)^2}{2(n-m+1)^2}.

Thus (4) follows, and Proposition 1 is proven. □
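Proposition 1 lends itself to a quick numerical check. The sketch below (my own illustration, not from the paper) averages the statistic 2n[\log 2 - \widetilde{ApEn}(1)] over seeded random binary strings; for s = 2, m = 1 the average should be near the mean of a \chi^2 variable with s^{m+1} - s^m = 2 degrees of freedom.

```python
import math
import random
from collections import Counter

def phi_tilde(seq, m):
    # entropy-type sum over circular (augmented) pattern frequencies
    n = len(seq)
    ext = seq + seq[:m - 1]
    counts = Counter(tuple(ext[i:i + m]) for i in range(n))
    return sum((c / n) * math.log(c / n) for c in counts.values())

def statistic(seq, m=1, s=2):
    # 2n [log s - (phi_tilde(m) - phi_tilde(m+1))]
    n = len(seq)
    apen_t = phi_tilde(seq, m) - phi_tilde(seq, m + 1)
    return 2 * n * (math.log(s) - apen_t)

random.seed(1)
vals = [statistic([random.randint(0, 1) for _ in range(1000)])
        for _ in range(300)]
mean = sum(vals) / len(vals)  # should be close to 2, the chi^2(2) mean
```

With n = 1000 the finite-sample bias is small, and the Monte Carlo average settles near 2 as the proposition predicts.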
For the observed value ApEn(m) one has to define \chi^2(obs) as \chi^2(obs) = 2n\left|\log s - ApEn(m)\right|, whereas, as has been noticed, the difference \log s - \widetilde{ApEn}(m) is always positive. The reported P-value (tail probability) is

P_n(m) = 1 - P\left(2^{m-1}, \chi^2(obs)/2\right)

with P denoting the incomplete gamma-function. The null hypothesis of randomness is rejected for large values of \chi^2(obs).
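For the binary case the first argument of the incomplete gamma-function is the integer 2^{m-1}, so the tail probability reduces to a finite Poisson sum and needs no special-function library. A sketch (the function name is mine):

```python
import math

def apen_pvalue(chi2_obs, m):
    """P-value 1 - P(2^{m-1}, chi2_obs/2) for s = 2, where the chi^2
    degrees of freedom are 2^{m+1} - 2^m = 2^m.  For integer a the
    regularized upper incomplete gamma is Q(a, x) = exp(-x) sum_{j<a} x^j/j!."""
    a = 2 ** (m - 1)            # half the degrees of freedom
    x = chi2_obs / 2.0
    term, total = 1.0, 0.0
    for j in range(a):
        total += term           # term equals x^j / j!
        term *= x / (j + 1)
    return math.exp(-x) * total
```

For m = 1 this is simply e^{-\chi^2(obs)/2}.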
The asymptotic distribution of the statistics 2n\left[\log s - \widetilde{ApEn}(m)\right] and 2n\left[\log s - ApEn(m)\right], evaluated under the alternative of the form \pi_{i_1 \cdots i_m i_{m+1}} = s^{-m-1} + n^{-1/2} \delta_{i_1 \cdots i_m i_{m+1}}, with \delta^T e = 0, is a noncentral \chi^2-distribution with s^{m+1} - s^m degrees of freedom and the noncentrality parameter s^{m+1} \delta^T \delta. This fact allows for an approximate power function of the corresponding test of randomness.
To investigate this case, let us write the formula for the modified approximate entropy in the following form:

\widetilde{ApEn}(m) = \frac{1}{n} \sum_{i_1 \cdots i_m} \left[-\nu_{i_1 \cdots i_m 1} \log \frac{\nu_{i_1 \cdots i_m 1}}{\sum_k \nu_{i_1 \cdots i_m k}} - \cdots - \nu_{i_1 \cdots i_m s} \log \frac{\nu_{i_1 \cdots i_m s}}{\sum_k \nu_{i_1 \cdots i_m k}}\right]
= \frac{1}{n} \sum_{i_1 \cdots i_m} f(\nu_{i_1 \cdots i_m 1}, \ldots, \nu_{i_1 \cdots i_m s})   (6)

with f(u_1, \ldots, u_s) denoting the entropy of the probability distribution defined by the probabilities u_k/\sum_j u_j, k = 1, \ldots, s:

f(u_1, \ldots, u_s) = -u_1 \log \frac{u_1}{\sum_j u_j} - \cdots - u_s \log \frac{u_s}{\sum_j u_j}.
Note that our function f has a special form, namely,

f(u_1, \ldots, u_s) = \sum_j \varphi(u_j) - \varphi\Big(\sum_j u_j\Big)

with \varphi(u) = -u \log u.
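The representation (6) can be verified numerically: with circular counts, (1/n) times the sum of f over the s counts of each pattern's one-symbol extensions reproduces \tilde{\Phi}^{(m)} - \tilde{\Phi}^{(m+1)} exactly. A short illustration (helper names are my own):

```python
import math
from collections import Counter

def circ_counts(seq, m):
    n = len(seq)
    ext = seq + seq[:m - 1]
    return Counter(tuple(ext[i:i + m]) for i in range(n))

def phi_tilde(seq, m):
    n = len(seq)
    return sum((c / n) * math.log(c / n) for c in circ_counts(seq, m).values())

def f(*u):
    # f(u_1,...,u_s) = sum_j phi(u_j) - phi(sum_j u_j) with phi(u) = -u log u
    phi = lambda v: 0.0 if v == 0 else -v * math.log(v)
    return sum(phi(v) for v in u) - phi(sum(u))

seq = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
m, n, s = 1, len(seq), 2
c1 = circ_counts(seq, m + 1)
# (1/n) * sum over m-patterns of f applied to the counts of their s extensions
lhs = sum(f(*(c1[p + (k,)] for k in range(s)))
          for p in circ_counts(seq, m)) / n
rhs = phi_tilde(seq, m) - phi_tilde(seq, m + 1)
```

The agreement is exact (up to rounding) because each circular pattern count of length m equals the sum of the counts of its s extensions.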
A similar representation, with n replaced by n - m + 1, also holds for ApEn(m). Indeed, in the notation of Section 2, \left|\nu'_{i_1 \cdots i_m} - \sum_k \nu'_{i_1 \cdots i_m k}\right| \le 1, and there exists no more than one m-tuple i_1, \ldots, i_m for which \nu'_{i_1 \cdots i_m} \ne \sum_k \nu'_{i_1 \cdots i_m k}. Therefore

\left|\sum_{i_1 \cdots i_m} \Big(\sum_k \nu'_{i_1 \cdots i_m k}\Big) \log \frac{\sum_k \nu'_{i_1 \cdots i_m k}}{n-m+1} - \sum_{i_1 \cdots i_m} \nu'_{i_1 \cdots i_m} \log \frac{\nu'_{i_1 \cdots i_m}}{n-m+1}\right| \le \max_{0 \le x \le n-m+1} \left[(x+1)\log(x+1) - x\log x\right] + \log(n-m+1) \le 2 \log n,

so that

ApEn(m) = \frac{1}{n-m+1} \sum_{i_1 \cdots i_m} \left[-\nu'_{i_1 \cdots i_m 1} \log \frac{\nu'_{i_1 \cdots i_m 1}}{\sum_k \nu'_{i_1 \cdots i_m k}} - \cdots - \nu'_{i_1 \cdots i_m s} \log \frac{\nu'_{i_1 \cdots i_m s}}{\sum_k \nu'_{i_1 \cdots i_m k}}\right] + O_P\left(\frac{\log n}{n}\right).
Thus ApEn also admits the representation (6), and the limiting distribution of both \widetilde{ApEn} and ApEn is that of this decomposable statistic. Sums of this form (with functions f of only one argument) have been extensively studied; see Holst (1972), Morris (1975) and Medvedev (1977). Although our situation, with f depending on the s frequencies \nu_{i_1 \cdots i_m 1}, \ldots, \nu_{i_1 \cdots i_m s}, does not follow directly from these results, the special form of this function leads to the following Proposition 2, which can be derived from Holst (1972) after some modifications.
Let \eta_{i_1 \cdots i_m 1}, \ldots, \eta_{i_1 \cdots i_m s} denote s independent Poisson random variables with parameter \lambda = n/s^{m+1}. It is also convenient to write \eta_1, \ldots, \eta_s or \eta_1(\lambda), \ldots, \eta_s(\lambda) for an s-tuple of such random variables. Put

\mu_n = \frac{1}{n} \sum_{i_1 \cdots i_m} E f(\eta_{i_1 \cdots i_m 1}, \ldots, \eta_{i_1 \cdots i_m s}) = \frac{s^m}{n} E f(\eta_1, \ldots, \eta_s) = \frac{1}{\lambda s} E f(\eta_1, \ldots, \eta_s)

and

\rho = \frac{Cov\left(f(\eta_1, \ldots, \eta_s), \eta_1 + \cdots + \eta_s\right)}{Var\left(\eta_1 + \cdots + \eta_s\right)}.
With

U_n = \sum_{i_1 \cdots i_m} \left[f(\eta_{i_1 \cdots i_m 1}, \ldots, \eta_{i_1 \cdots i_m s}) - E f(\eta_{i_1 \cdots i_m 1}, \ldots, \eta_{i_1 \cdots i_m s}) - \rho\left(\eta_{i_1 \cdots i_m 1} + \cdots + \eta_{i_1 \cdots i_m s} - s\lambda\right)\right]

and

V_n = \frac{1}{\sqrt{n}} \sum_{i_1 \cdots i_m} \left[\eta_{i_1 \cdots i_m 1} + \cdots + \eta_{i_1 \cdots i_m s} - s\lambda\right],

and with

\sigma_n^2 = Var(U_n) = s^m \left[Var f(\eta_1, \ldots, \eta_s) - \frac{Cov^2\left(f(\eta_1, \ldots, \eta_s), \eta_1 + \cdots + \eta_s\right)}{Var\left(\eta_1 + \cdots + \eta_s\right)}\right],

then the joint asymptotic distribution of U_n/\sigma_n and V_n is normal with zero mean and the identity covariance matrix. The conditional distribution of U_n/\sigma_n given V_n = 0 coincides with the distribution of n[\widetilde{ApEn}(m) - \mu_n]/\sigma_n, since the conditional distribution of the (\eta_{i_1 \cdots i_m 1}, \ldots, \eta_{i_1 \cdots i_m s}) given that \sum_{i_1 \cdots i_m} [\eta_{i_1 \cdots i_m 1} + \cdots + \eta_{i_1 \cdots i_m s}] = n is multinomial.
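The fact used here, that independent Poisson variables conditioned on their total are multinomial, can be confirmed exactly for a small case (s = 2 cells; the snippet is my own illustration):

```python
import math

def pois(k, lam):
    # Poisson(lam) probability mass at k
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, tot = 0.7, 4              # the rate is arbitrary; condition on eta_1 + eta_2 = 4
denom = pois(tot, 2 * lam)     # the sum of two Poisson(lam) variables is Poisson(2*lam)
cond = [pois(j, lam) * pois(tot - j, lam) / denom for j in range(tot + 1)]
binom = [math.comb(tot, j) * 0.5 ** tot for j in range(tot + 1)]
# cond and binom agree: the conditional law is Binomial(4, 1/2), free of lam
```

The rate cancels from the conditional probabilities, which is exactly why the Poissonization device of Holst (1972) applies.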
Therefore the following result concerning the convergence of n[\widetilde{ApEn}(m) - \mu_n]/\sigma_n and of (n-m+1)[ApEn(m) - \mu_n]/\sigma_n to a standard normal distribution is not surprising.

Proposition 2 As n \to \infty,

P\left(\frac{n[\widetilde{ApEn}(m) - \mu_n]}{\sigma_n} \le x\right) \to \Phi(x)

and

P\left(\frac{(n-m+1)[ApEn(m) - \mu_n]}{\sigma_n} \le x\right) \to \Phi(x),

with \Phi denoting the standard normal distribution function.
Sketch of the Proof The argument above can be made rigorous by examination of the characteristic function of \widetilde{ApEn}(m), as in Lemmas 2.1, 2.2, A1, A2 and A3 of Holst (1972). With N = s^{m+1}, as in Lemma 2.1, one has

A_N(z) = \sum_{n=0}^{\infty} E_n \left[\prod_{i_1 \cdots i_m} x_{i_1 \cdots i_m}^{f(\nu_{i_1 \cdots i_m 1}, \ldots, \nu_{i_1 \cdots i_m s})}\right] \frac{(Nz)^n e^{-Nz}}{n!}
= \prod_{i_1 \cdots i_m} \sum_{j_1 \cdots j_s} \frac{z^{j_1 + \cdots + j_s} e^{-sz}}{j_1! \cdots j_s!} \, x_{i_1 \cdots i_m}^{f(j_1, \ldots, j_s)},

the last product being the expectation of \prod x_{i_1 \cdots i_m}^{f(\eta_{i_1 \cdots i_m 1}(z), \ldots, \eta_{i_1 \cdots i_m s}(z))} for independent Poisson variables with parameter z. A similar representation for the characteristic function \phi(t) = E \exp\{it \sum_{i_1 \cdots i_m} f(\nu_{i_1 \cdots i_m 1}, \ldots, \nu_{i_1 \cdots i_m s})\} as in Lemma 2.2 follows; the only difference is that the ordinary sum in the right-hand side is replaced by the multiple sum

e^{-se^{i\theta}} \sum_{j_1 \cdots j_s} \frac{(e^{i\theta})^{j_1 + \cdots + j_s}}{j_1! \cdots j_s!} \left[e^{it[\varphi(j_1) + \cdots + \varphi(j_s)] - it\varphi(j_1 + \cdots + j_s)} - 1\right].

The same estimates as in Lemmas A1 and A2 hold for the corresponding function. The convergence result in Lemma A2 also holds by analysis of Taylor's expansion. □
For the binary case s = 2 one has f(u_1, u_2) = \varphi(u_1) + \varphi(u_2) - \varphi(u_1 + u_2). Put

\alpha(\lambda) = Var\left(\eta(\lambda) \log \eta(\lambda)\right), \qquad \beta(\lambda) = Cov\left(\eta(\lambda) \log \eta(\lambda), \eta(\lambda)\right).

Then, with

\gamma(\lambda) = Cov\left([\eta_1(\lambda) + \eta_2(\lambda)] \log[\eta_1(\lambda) + \eta_2(\lambda)], \; \eta_1(\lambda) \log \eta_1(\lambda)\right)
= \lambda e^{-2\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k (k+1) \log(k+1)}{k!} \, \kappa_k - E\left[(\eta_1 + \eta_2) \log(\eta_1 + \eta_2)\right] E\left[\eta_1 \log \eta_1\right],

where

\kappa_k = \sum_{j=1}^{k} \binom{k}{j} \log(j+1),

one has

Var f(\eta_1, \eta_2) = \alpha(2\lambda) + 2\alpha(\lambda) - 4\gamma(\lambda).

Thus

\sigma_n^2 = 2^m \left[\alpha(2\lambda) + 2\alpha(\lambda) - 4\gamma(\lambda) - \frac{[\beta(2\lambda) - 2\beta(\lambda)]^2}{2\lambda}\right].
More generally, the asymptotic distribution of the sum

S = \frac{1}{n} \sum_{i_1 \cdots i_m} f(\nu_{i_1 \cdots i_m 1}, \ldots, \nu_{i_1 \cdots i_m s})

can be studied in the same way for other functions f. The asymptotic power of the test based on the statistic S under the alternative \pi_{i_1 \cdots i_m} is determined by the ratio R = \lim [E S - \mu_n]/\sigma_n, whose absolute value is to be maximized to obtain the optimal Pitman efficiency. Under the alternative of the form \pi_{i_1 \cdots i_m i_{m+1}} = s^{-m-1} + n^{-1/4} \delta_{i_1 \cdots i_m i_{m+1}} with

\delta_{i_1 \cdots i_m i_{m+1}} = \int_{(i_1 + i_2/s + \cdots + i_{m+1} s^{-m})/s}^{(i_1 + i_2/s + \cdots + (i_{m+1}+1) s^{-m})/s} q(u) \, du

for a function q such that \int_0^1 q(u) \, du = 0,

R = \int_0^1 \cdots
4 Examples

Here are two strings of 20 binary bits which have been suggested by Chaitin (1975):

(A) 01010101010101010101
(B) 01101100110111100010

For the non-randomly looking sequence (A), ApEn(0) = -\Phi^{(1)} = -\tilde{\Phi}^{(1)} = \log 2, which is the largest possible value for ApEn. Since there are only two occurring patterns of length 2, namely (0,1) and (1,0), with frequencies 10 and 9 respectively,

\Phi^{(2)} = \frac{1}{19} \left[10 \log \frac{10}{19} + 9 \log \frac{9}{19}\right] = -0.6918\ldots

Thus

ApEn(1) = 0.0014\ldots

with \chi^2(obs) = 40[\log 2 - ApEn(1)] = 27.6699\ldots
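The arithmetic of this example can be reproduced directly (a short sketch; the variable names are mine, and the value 0.0014 for ApEn(1) is taken from the text):

```python
import math

# String (A): 20 alternating bits; its 19 overlapping pairs are
# (0,1) ten times and (1,0) nine times.
phi2 = (10 * math.log(10 / 19) + 9 * math.log(9 / 19)) / 19
chi2_obs = 40 * (math.log(2) - 0.0014)  # 2n|log 2 - ApEn(1)| with n = 20
```

Both values match the ones reported above to the printed precision.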
Figure 1: P-values P_n(1) for the first binary digits of \sqrt{3} (broken line), e and \pi.
In Figure 1 the P-values P_n(1) from Section 2 are plotted against the first digits of the binary expansions of \sqrt{3}, e and \pi. According to these data, the P-values corresponding to \sqrt{3} are much smaller than those of e and \pi. The situation, however, is reversed for m = 7, when the digits of \pi and \sqrt{3} look much more random than those of the expansion of e (Figure 2).
Figure 2: P-values P_n(7) for the first binary digits of \sqrt{3} (broken line), e and \pi.
References

[1] Chaitin, G. (1975), "Randomness and mathematical proof," Scientific American, 232, pp. 47-52.