A Universal Variable-to-Fixed Length Source Code Based On Lawrence's Algorithm
Abstract
The Lawrence algorithm (1977) is a universal binary variable-to-fixed length
source coding algorithm. Here we introduce a modified version of this algorithm and
investigate its asymptotic performance. For $M$ (the segment set cardinality) large
enough, we show that the rate $R(\theta)$ as a function of the source parameter $\theta$ satisfies
$$R(\theta) \le h(\theta)\left(1 + \frac{\log\log M}{2\log M}\right),$$
for $0 < \theta < 1$. Here $h(\cdot)$ is the binary entropy function.
In addition to this, we prove that no codes exist that have a better asymptotic
performance, thereby establishing the asymptotic optimality of our modified Lawrence
code.
The asymptotic bounds show that universal variable-to-fixed length codes can
have a significantly lower redundancy than universal fixed-to-variable length codes
with the same number of codewords.
Index terms: Universal Source Coding, Enumerative Coding, Variable-to-Fixed
Length Codes, Asymptotic Redundancy.
1 Preliminaries
A binary memoryless information source generates a sequence of independent and
identically distributed random variables $\{X_t\}_{t=1}^{\infty}$, each of which assumes
values in the finite set $\mathcal{X} = \{0, 1\}$, called the source alphabet. Let
$\theta = \Pr\{X_t = 1\} = 1 - \Pr\{X_t = 0\}$, $t = 1, 2, \ldots$. Then the entropy of the
source (in bits per symbol) is equal to
$h(\theta) = -\theta\log(\theta) - (1-\theta)\log(1-\theta)$. (We assume throughout this
paper that $\log(\cdot)$'s have base 2 and that $\ln(\cdot)$ has base $e$.)
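The entropy formula above can be evaluated numerically. The following Python sketch is illustrative only: the function name `binary_entropy` is an assumption of this example, and the endpoint values are set to 0 by the usual convention $0 \log 0 = 0$.

```python
from math import log2

def binary_entropy(theta: float) -> float:
    """Binary entropy h(theta) in bits per symbol (log base 2)."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0  # convention: 0 * log 0 = 0
    return -theta * log2(theta) - (1.0 - theta) * log2(1.0 - theta)

if __name__ == "__main__":
    for theta in (0.5, 0.1, 0.01):
        print(f"h({theta}) = {binary_entropy(theta):.5f}")
```

The function is symmetric around 0.5, so a source with parameter 0.1 has the same entropy as one with parameter 0.9.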
Affiliation: Eindhoven University of Technology, Faculty of Electrical Engineering,
P.O. Box 513, 5600 MB Eindhoven.

In what follows we will describe a universal variable-to-fixed length coding strategy
for the class of binary memoryless sources. With these codes, the (infinite length) source
sequence is chopped up into sequences of variable length (segments), chosen from some
finite set $\mathcal{S}$ of segments, and each segment is assigned to a code sequence of
fixed length $N = \log M$, where $M$ is the number of segments in $\mathcal{S}$. (Note that
we ignore the rounding of $\log M$ to an integer.) This set of segments must be complete,
i.e. every infinite sequence has a prefix in the segment set, since every sequence must be
subdividable into segments. We also require that the set is proper, i.e. no segment in the
set is a prefix of another segment in the set. In this way we guarantee a unique
subdivision of every source sequence.
It is assumed that the code alphabet $\mathcal{Y} = \{0, 1\}$. Let $L(x^*) = k$ be the
length of segment $x^* = (x_1, x_2, \ldots, x_k)$. Then instead of sending the $L(x^*)$
source symbols to a receiver we can send the corresponding codeword. This codeword can be
used by the receiver to reconstruct the source segment. If the code is properly chosen, the
average segment length $L_{av}$ can be considerably higher than $N$, where
$$L_{av} = \sum_{x^* \in \mathcal{S}} \Pr\{X^* = x^*\}\, L(x^*). \qquad (1)$$
Therefore the (compression) rate $R$ of a code, which is defined as $R = N/L_{av}$, can be
smaller than one. Note that a universal code can not be designed using the statistics of
the source.
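To make (1) and the rate definition concrete, here is a small Python sketch. The segment set `SEGMENTS` is a hypothetical complete and proper set chosen for this example only (it is not one of the codes of the paper), and the function names are illustrative.

```python
from math import log2

def segment_probability(segment: str, theta: float) -> float:
    """Probability that a memoryless source with Pr{X=1} = theta emits `segment`."""
    ones = segment.count("1")
    zeros = len(segment) - ones
    return theta ** ones * (1.0 - theta) ** zeros

def average_length(segments, theta: float) -> float:
    """L_av of eq. (1): the expected segment length."""
    return sum(segment_probability(s, theta) * len(s) for s in segments)

def rate(segments, theta: float) -> float:
    """Compression rate R = N / L_av with N = log2(M), rounding ignored."""
    return log2(len(segments)) / average_length(segments, theta)

# Hypothetical complete and proper segment set with M = 4, so N = 2.
SEGMENTS = ["000", "001", "01", "1"]

if __name__ == "__main__":
    theta = 0.1
    print(average_length(SEGMENTS, theta), rate(SEGMENTS, theta))
```

For a source that mostly emits zeros, the long all-zero segment is the most probable one, which pushes the average segment length above N and the rate below one.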
Tunstall [10] discovered a procedure for constructing an optimum segment set for a
given memoryless source. For a fixed $N$, this construction maximizes $L_{av}$. If we form
such a code for a binary source with $\theta < 0.5$, then
$N + \log(\theta) \le L_{av} h(\theta) \le N$. A major disadvantage of a Tunstall code is
that the complete code has to be stored by both the encoder and the decoder. Note that
these codes are not universal.
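The bound on the Tunstall code can be checked numerically. The sketch below uses the standard greedy construction (repeatedly split the most probable leaf); it is not taken from [10], and the function names are assumptions of this example. Only the numbers of zeros and ones of each leaf are tracked, since that is all the bound needs.

```python
import heapq
from math import log2

def tunstall_leaves(theta: float, n_bits: int):
    """Leaves of a binary Tunstall tree with 2**n_bits segments (n_bits >= 1).

    Each leaf is (zeros, ones, probability). The most probable leaf is split
    repeatedly; every split replaces one leaf by two, adding one segment.
    """
    heap = [(-(1.0 - theta), 1, 0), (-theta, 0, 1)]  # negated probabilities: max-heap
    heapq.heapify(heap)
    while len(heap) < 2 ** n_bits:
        neg_p, zeros, ones = heapq.heappop(heap)      # most probable leaf
        heapq.heappush(heap, (neg_p * (1.0 - theta), zeros + 1, ones))
        heapq.heappush(heap, (neg_p * theta, zeros, ones + 1))
    return [(zeros, ones, -neg_p) for neg_p, zeros, ones in heap]

def binary_entropy(theta: float) -> float:
    return -theta * log2(theta) - (1.0 - theta) * log2(1.0 - theta)

def average_length(leaves) -> float:
    return sum(p * (zeros + ones) for zeros, ones, p in leaves)

if __name__ == "__main__":
    theta, n_bits = 0.1, 8
    l_av = average_length(tunstall_leaves(theta, n_bits))
    print(n_bits + log2(theta), l_av * binary_entropy(theta), n_bits)
```

The stored tree is what makes the scheme costly: the construction depends on the source parameter, so encoder and decoder both need the resulting segment set.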
Lawrence [5] devised a variable-to-fixed length code that is easier to implement. Only a
part of Pascal's triangle must be stored by the encoder and the decoder now. An additional
feature of this code is that it is universal. This code can be seen as the variable-to-fixed
length counterpart of Schalkwijk's [8] 'Pascal triangle' algorithm.
In this paper we will describe a modification of this Lawrence code. Instead of using
a prefix and a suffix implementation as in Schalkwijk [8] and in Lawrence [5] we compute
the lexicographical indices of the segments. The lexicographical index of a segment
$x^* \in \mathcal{S}$ equals the number of segments in $\mathcal{S}$ which are less than
$x^*$ in a lexicographical ordering. This index can be represented using $\log M$ binary
symbols. We also change the segment set of the code. Both modifications yield a more
natural and simple implementation of the algorithm and reduce the redundancy of the code.
In the next sections we will be more specific about our 'modified Lawrence' code.
see Figure 1, to determine whether or not a sequence is a segment. A new segment starts
at the top of the triangle, $x_t = 0$ corresponds to a step in the a-direction, $x_t = 1$
to a step in the b-direction. Hence 0010001 is a segment since $7 \cdot 6 < 82$ and
$8 \cdot 21 \ge 82$. We say that (5,2) is on the segment set boundary in the triangle,
i.e. (5,2) is a boundary point. See Appendix A.1 for the exact definition of this term. In
Figure 1 we also indicate the path for 001001.
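The segment test described above can be sketched in a few lines of Python. The sketch assumes the boundary function $f(a, b) = (a + b + 1)\binom{a+b}{b}$ of (6) in Appendix A and treats the first point with $f(a, b) \ge C$ as the end of the segment; the function names are illustrative.

```python
from math import comb

def f(a: int, b: int) -> int:
    """Boundary function of eq. (6): (a + b + 1) * binom(a + b, b)."""
    return (a + b + 1) * comb(a + b, b)

def first_segment(bits: str, c: int):
    """Return the prefix of `bits` that forms one segment, or None if `bits` ends first."""
    a = b = 0                      # zeros and ones seen so far
    for t, x in enumerate(bits, start=1):
        if x == "0":
            a += 1                 # step in the a-direction
        else:
            b += 1                 # step in the b-direction
        if f(a, b) >= c:           # first point with f >= C: a boundary point
            return bits[:t]
    return None

if __name__ == "__main__":
    print(first_segment("0010001", 82))
    print(first_segment("001000", 82))
```

With C = 82 the path of 0010001 passes through (5,1), where f = 7 * 6 = 42, and stops at (5,2), where f = 8 * 21 = 168.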
                           1
                         1   1
                       1   2   1
                     1   3   3   1
                   1   4   6   4   1
                 1   5  10  10   5   1
               1   6[15][20][15]   6   1
             1   7[21]        [21]   7   1
           1   8[28]            [28]   8   1
         1 [9][36]                [36] [9]   1
       1[10]                            [10]   1
     1[11]           ↙ a    b ↘           [11]   1
   1[12]                                    [12]   1
                          ...
 [1][81]                                    [81] [1]
Figure 1: Modified Lawrence code with C = 82. Boundary points are shown in brackets.
Observe that $M(0, 0)$ is equal to the total number of segments in the set. $M(0, 0)$ is a
function of $C$ and for $C = 82$ this total number is 256, see Figure 2, where we refilled
the triangle of Figure 1 from bottom to top using (4) and starting at the boundary points.
In Appendix A we show that for $C$ large enough
$2C \le M(0, 0) \le 2C\left(1 + \ln\frac{\log M(0, 0)}{2}\right)$.
                         256
                       128 128
                     106  22 106
                    95  11  11  95
                  88   7   4   7  88
                83   5   2   2   5  83
              79   4   1   1   1   4  79
            76   3   1           1   3  76
          74   2   1               1   2  74
        73   1   1                   1   1  73
      72   1                               1  72
    71   1           ↙ a    b ↘           1  71
  70   1                                       1  70
                          ...
   1   1                                       1   1
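The refilled triangle can be reproduced with a short Python sketch. It assumes that the recursion referred to as (4) is $M(a, b) = M(a+1, b) + M(a, b+1)$ for internal points, with $M(a, b) = 1$ at boundary points, which is consistent with the entries shown above; $f$ is the boundary function of (6). Function names are illustrative.

```python
from math import comb, log, log2

def f(a: int, b: int) -> int:
    """Boundary function of eq. (6): (a + b + 1) * binom(a + b, b)."""
    return (a + b + 1) * comb(a + b, b)

def segment_counts(c: int) -> dict:
    """M(a, b) for every internal point, i.e. every point with f(a, b) < c."""
    counts = {}

    def m(a, b):
        return counts.get((a, b), 1)     # points not stored are boundary points

    for n in range(c - 2, -1, -1):       # internal points have a + b <= c - 2
        for b in range(n + 1):
            a = n - b
            if f(a, b) < c:
                counts[(a, b)] = m(a + 1, b) + m(a, b + 1)
    return counts

if __name__ == "__main__":
    counts = segment_counts(82)
    total = counts[(0, 0)]
    print(total, 2 * 82, 2 * 82 * (1 + log(log2(total) / 2)))
```

Filling the table level by level from the bottom avoids deep recursion, so the same sketch can be used for larger values of C.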
i)  index := 0   a := 0   b := 0
ii) while M(a,b) ≠ 1 do
      if x(next) = 0
        then a := a + 1
        else index := index + M(a+1,b)
             b := b + 1
This lexicographical index is sent to the decoder that reconstructs the segment as
follows:
i)  I := 0   a := 0   b := 0
ii) while M(a,b) ≠ 1 do
      if index < (I + M(a+1,b))
        then x(next) := 0
             a := a + 1
        else x(next) := 1
             I := I + M(a+1,b)
             b := b + 1
It will be clear that the lexicographical index of the segment 0010001 is 95 + 3 = 98,
see Figure 2. We remark that there are M (3; 0) = 95 segments starting with '000' and
M (6; 1) = 3 segments starting with '0010000'.
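The encoder and decoder steps can be tried out with the following Python sketch. It rebuilds the table $M(a, b)$ under the assumption that (4) is the recursion $M(a, b) = M(a+1, b) + M(a, b+1)$ with $M = 1$ at boundary points, and it assumes that `encode` is only called with a valid segment. All function names are illustrative.

```python
from math import comb

def f(a: int, b: int) -> int:
    return (a + b + 1) * comb(a + b, b)

def segment_counts(c: int) -> dict:
    """M(a, b) for internal points; missing keys stand for boundary points (M = 1)."""
    counts = {}
    for n in range(c - 2, -1, -1):
        for b in range(n + 1):
            a = n - b
            if f(a, b) < c:
                counts[(a, b)] = counts.get((a + 1, b), 1) + counts.get((a, b + 1), 1)
    return counts

def encode(segment: str, counts: dict) -> int:
    """Lexicographical index of a segment."""
    index, a, b = 0, 0, 0
    for x in segment:
        if x == "0":
            a += 1
        else:
            index += counts.get((a + 1, b), 1)   # skip all segments continuing with 0
            b += 1
    return index

def decode(index: int, counts: dict) -> str:
    """Segment that belongs to a lexicographical index."""
    offset, a, b, bits = 0, 0, 0, []
    while (a, b) in counts:                      # internal point, i.e. M(a, b) != 1
        zero_branch = counts.get((a + 1, b), 1)
        if index < offset + zero_branch:
            bits.append("0")
            a += 1
        else:
            bits.append("1")
            offset += zero_branch
            b += 1
    return "".join(bits)

if __name__ == "__main__":
    table = segment_counts(82)
    print(encode("0010001", table), decode(98, table))
```

Only the table of M(a, b) values is needed by both sides, so no explicit list of the 256 segments has to be stored.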
4 The performance
The redundancy of a code is defined as the difference between the compression rate $R$
and the source entropy $h(\theta)$. In this section we tabulate the redundancy of our
algorithm and compare it with the universal 'Pascal triangle' algorithm [8] and Lawrence's
algorithm [5]. We compute the redundancy of the three codes for the code sizes 256 (i.e.
8-digit codewords) and 65536 (i.e. 16-digit codewords). The results are listed in Table 1.
                  Code size is 2^8                    Code size is 2^16
theta      Pascal     Lawrence   modified      Pascal     Lawrence   modified
           triangle              algorithm     triangle              algorithm
0.5        0.17857    0.25799    0.24574       0.09943    0.12219    0.19436
0.1        0.22958    0.21492    0.17505       0.13257    0.11983    0.11929
0.01       0.37748    0.20726    0.06196       0.22517    0.04883    0.03849
0.001      0.42016    0.24226    0.09132       0.25925    0.00480    0.00449
0.0001     0.42740    0.24889    0.09768       0.26559    0.00329    0.00063
0.00001    0.42842    0.24986    0.09862       0.26653    0.00381    0.00102
Table 1: The redundancies.
From this table we can see that the two variable-to-fixed length algorithms (Lawrence
and the modified Lawrence algorithms) outperform the fixed-to-variable length Pascal
triangle algorithm, except for high entropy sources. Also we observe that for low entropy
sources the modified algorithm performs better than the original Lawrence algorithm.
From the table we can conclude that the variable-to-fixed length algorithms perform well
for practical code sizes.
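A redundancy of the modified code can be recomputed with the Python sketch below. It assumes the boundary function $f$ of (6) and uses two standard tree facts that are not spelled out in this section: the average segment length equals the sum of the probabilities of all internal tree nodes, and a binary tree has one more leaf than internal nodes. For C = 82 and a source parameter of 0.5 it gives M = 256 and a redundancy of about 0.2457, in agreement with the corresponding entry of Table 1. Function names are illustrative.

```python
from math import comb, log2

def f(a: int, b: int) -> int:
    return (a + b + 1) * comb(a + b, b)

def binary_entropy(theta: float) -> float:
    return -theta * log2(theta) - (1.0 - theta) * log2(1.0 - theta)

def code_statistics(c: int, theta: float):
    """Return (M, L_av) for the segment set with threshold c."""
    internal_nodes = 0
    l_av = 0.0
    for n in range(c - 1):                    # internal points have a + b <= c - 2
        for b in range(n + 1):
            a = n - b
            if f(a, b) < c:
                paths = comb(n, b)            # tree nodes that map to the point (a, b)
                internal_nodes += paths
                l_av += paths * theta ** b * (1.0 - theta) ** a
    return internal_nodes + 1, l_av           # leaves = internal nodes + 1

def redundancy(c: int, theta: float) -> float:
    m, l_av = code_statistics(c, theta)
    return log2(m) / l_av - binary_entropy(theta)

if __name__ == "__main__":
    print(code_statistics(82, 0.5), redundancy(82, 0.5))
```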
rate of this modified Lawrence algorithm converges as the code size increases. An
asymptotic upper bound on the rate is stated in the following theorem and its proof is
given in Appendix A.
Theorem 1 For any $\epsilon > 0$ and any $0 < \theta < 1$, we have for
$C > C(\epsilon, \theta)$ that
$$R = \frac{\log M}{L_{av}} \le \left(1 + (1+\epsilon)\frac{\log\log M}{2\log M}\right) h(\theta). \qquad (5)$$
It should be noted that $M$ increases when $C$ increases. In particular, see the text
before (11) in Appendix A where it is shown that $M \ge 2C$.
$$R \ge \left(1 + (1-\epsilon)\frac{\log\log M}{2\log M}\right) h(\theta),$$
for all $0 \le \theta \le 1$ except for those $\theta$ in a set $B$ whose volume tends to
zero as $M$ increases.
The proof of this theorem is based on Rissanen's converse for fixed-to-variable length
codes for arbitrary sources [7]. We restrict ourselves to variable-to-fixed length codes for
binary memoryless sources, although it is clear that the proof readily extends to arbitrary
finite alphabet memoryless sources.
7 Conclusion
In this contribution we showed that the modified Lawrence algorithm is universal over the
class of binary memoryless sources, and in addition, that the rate converges asymptotically
optimally fast to the source entropy $h(\theta)$. For the class of binary memoryless
sources the asymptotically optimal redundancy is $h(\theta)\log\log M/(2\log M)$ where $M$
is the number of codewords. When we compare this to Rissanen's redundancy for this case,
which is $\log N/(2N) = \log\log M/(2\log M)$ where $M$ again denotes the number of
codewords, we see that in the VF case the asymptotic redundancy is a factor $h(\theta)$
lower than in the FV case.
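The two asymptotic redundancy terms compared above are easy to evaluate. The Python sketch below only evaluates the leading terms, so it says nothing about the lower-order behaviour of actual codes; the function names are illustrative.

```python
from math import log2

def binary_entropy(theta: float) -> float:
    return -theta * log2(theta) - (1.0 - theta) * log2(1.0 - theta)

def fv_redundancy_term(m: int) -> float:
    """Leading FV term log N / (2N) = log log M / (2 log M)."""
    return log2(log2(m)) / (2.0 * log2(m))

def vf_redundancy_term(theta: float, m: int) -> float:
    """Leading VF term h(theta) * log log M / (2 log M)."""
    return binary_entropy(theta) * fv_redundancy_term(m)

if __name__ == "__main__":
    m = 2 ** 16
    for theta in (0.5, 0.1, 0.01):
        print(theta, fv_redundancy_term(m), vf_redundancy_term(theta, m))
```

For M = 2^16 the FV term is 4/32 = 0.125 for every source, while the VF term shrinks with the entropy of the source.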
An earlier converse for VF codes for memoryless sources was given by Trofimov, see [4].
The lower bound stated there showed a uniform convergence in correspondence with
Davisson's [3] result that the class of memoryless sources is 'minimax universal'. However
this bound is expressed in terms of the minimal average message length of a code (with
respect to the class of sources) and we consider this a less realistic approach than our
bound of Theorem 2 which relates the redundancy to the code size.
In a recent paper, Shtarkov [9] presented two universal VF coding schemes for m-ary
memoryless sources. The first scheme achieves Trofimov's lower bound. The upper bound
for the redundancy for the second scheme for binary sources is twice as high as our upper
bound (5).
Finally we want to thank the reviewer who suggested that an upperbound on the rate
in a previous version might be too weak. Motivated by this remark we indeed obtained
the suggested improvement and also the converse result.
Appendices
A An upper bound on the rate of the modified Lawrence algorithm
This appendix consists of 4 subsections. Throughout these subsections we assume that
$0 < \theta < 1$ and that $\epsilon > 0$.
$$f(a, b) = (a + b + 1)\binom{a+b}{b} \qquad (6)$$
Recall that $(a, b)$ corresponds to a sequence containing $a$ zeros and $b$ ones. A point
$(a, b)$ is now said to be internal if and only if $f(a, b) < C$. A point $(a, b)$ is a
boundary point if and only if $f(a, b) \ge C$ and at least one of the points $(a, b-1)$ and
$(a-1, b)$ is internal.
Note that if $(a, b)$ is a boundary (internal) point, $(b, a)$ is a boundary (internal)
point too. The set of boundary points (segments) is therefore symmetric. Let $S$ be the
minimal value of $a + b$ when $(a, b)$ is a boundary point. Then, if $S$ is even,
$(S/2, S/2)$ must be a boundary point and $(S/2, S/2 - 1)$ an internal point. Consequently
$C > f(S/2, S/2 - 1) \ge S \cdot 2^{S-1}/S = 2^{S-1}$. Likewise, for $S$ odd, we can show
that $C > 2^{S-1}$. Hence
$$S < \log 2C. \qquad (7)$$
To avoid degenerate codes we always assume that $S \ge 1$ and thus $C > 1$.
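Inequality (7) can be checked numerically with the Python sketch below. It uses the fact that on a level $a + b = n$ the function $f$ of (6) is largest at the centre of the level, so $S$ is the first level on which the central value reaches $C$; the function name is illustrative.

```python
from math import comb, log2

def minimal_boundary_level(c: int) -> int:
    """S: the smallest a + b over all boundary points, for c > 1."""
    n = 1
    while (n + 1) * comb(n, n // 2) < c:   # largest f on level n sits at the centre
        n += 1
    return n

if __name__ == "__main__":
    for c in (2, 82, 1000):
        print(c, minimal_boundary_level(c), log2(2 * c))
```

For C = 82 this gives S = 6, the level of the boundary point (3,3), against log 2C of about 7.36.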
Now consider a boundary point $(a, b)$ with $a \ge b$. Then, if $b \ge 1$, the point
$(a, b-1)$ must be internal since $f(a, b-1) \le f(a-1, b)$. Note that not both
$(a+1, b-1)$ and $(a+1, b)$ can be boundary points. On the other hand either $(a+1, b-1)$
or $(a+1, b)$ must be a boundary point since $f(a+1, b) \ge f(a, b) \ge C$.
Now for $1 \le b \le S/2$ let $a_{min}(b)$ resp. $a_{max}(b)$ be the minimal resp. maximal
value of $a$ such that $(a, b)$ is a boundary point. The consequence of the above is that
$(a_{min}(b), b), (a_{min}(b)+1, b), \ldots, (a_{max}(b), b)$ all are (adjacent) boundary
points and so is $(a_{max}(b)+1, b-1)$.
If we consider a boundary point $(a, 0)$ then $(a-1, 0)$ must be an internal point. Note
that $(a+1, 0)$ can not be a boundary point too.
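The structure of the boundary can be inspected with the following Python sketch, which applies the definitions of internal and boundary points literally, with $f$ as in (6). The function names are illustrative.

```python
from math import comb

def f(a: int, b: int) -> int:
    return (a + b + 1) * comb(a + b, b)

def is_internal(a: int, b: int, c: int) -> bool:
    return a >= 0 and b >= 0 and f(a, b) < c

def is_boundary(a: int, b: int, c: int) -> bool:
    return (a >= 0 and b >= 0 and f(a, b) >= c
            and (is_internal(a, b - 1, c) or is_internal(a - 1, b, c)))

def boundary_row(b: int, c: int) -> list:
    """All a such that (a, b) is a boundary point; such a never exceeds c - 1."""
    return [a for a in range(c) if is_boundary(a, b, c)]

if __name__ == "__main__":
    for b in range(4):
        row = boundary_row(b, 82)
        print(b, row[0], row[-1])          # a_min(b) and a_max(b)
```

For C = 82 the rows are a = 81 for b = 0, a = 8, ..., 80 for b = 1, a = 4, ..., 7 for b = 2 and a = 3 for b = 3, and each row ends next to the start of the row below it.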
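$$M = 2\sum_{b=0}^{\lfloor S/2\rfloor}\binom{a_{max}(b)+b}{b} - N(S), \qquad (9)$$
where $N(S)$ is 0 for odd $S$ and $\binom{S}{S/2}$ for even $S$. Note that
$(a_{max}(b), b-1)$ is an interior point if $2 \le b \le \lfloor S/2\rfloor$. Hence
$$C > f(a_{max}(b), b-1) = (a_{max}(b)+b)\binom{a_{max}(b)+b-1}{b-1} = b\binom{a_{max}(b)+b}{b}. \qquad (10)$$
From $\binom{a_{max}(0)}{0} + \binom{a_{max}(1)+1}{1} = C$ (and consequently $M \ge 2C$) for
$C$ large enough, and (10),
$$M \le 2C + 2\sum_{b=2}^{\lfloor S/2\rfloor}\frac{C}{b} \le 2C + 2\int_{1}^{S/2}\frac{C}{b}\,db = 2C\left(1 + \ln\frac{S}{2}\right) \le 2C\left(1 + \ln\frac{\log 2C}{2}\right) \le 2C\left(1 + \ln\frac{\log M}{2}\right). \qquad (11)$$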
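The counting argument can be checked numerically. The Python sketch below assumes that the summand in (9) is $\binom{a_{max}(b)+b}{b}$, the number of segments containing exactly $b$ ones, and compares the resulting count with a direct count of the leaves of the segment tree (one more than the number of internal tree nodes). Function names are illustrative.

```python
from math import comb, log

def f(a: int, b: int) -> int:
    return (a + b + 1) * comb(a + b, b)

def is_internal(a: int, b: int, c: int) -> bool:
    return a >= 0 and b >= 0 and f(a, b) < c

def is_boundary(a: int, b: int, c: int) -> bool:
    return (a >= 0 and b >= 0 and f(a, b) >= c
            and (is_internal(a, b - 1, c) or is_internal(a - 1, b, c)))

def min_level(c: int) -> int:
    """S: the smallest a + b over all boundary points."""
    n = 1
    while (n + 1) * comb(n, n // 2) < c:
        n += 1
    return n

def a_max(b: int, c: int) -> int:
    return max(a for a in range(c) if is_boundary(a, b, c))

def count_by_formula(c: int) -> int:
    """M according to the assumed reading of eq. (9)."""
    s = min_level(c)
    total = 2 * sum(comb(a_max(b, c) + b, b) for b in range(s // 2 + 1))
    return total - (comb(s, s // 2) if s % 2 == 0 else 0)

def count_directly(c: int) -> int:
    """M as the number of leaves of the segment tree."""
    internal = sum(comb(n, b) for n in range(c - 1)
                   for b in range(n + 1) if f(n - b, b) < c)
    return internal + 1

if __name__ == "__main__":
    c = 82
    print(count_by_formula(c), count_directly(c), 2 * c * (1 + log(min_level(c) / 2)))
```

For C = 82 the sum in (9) is 1 + 81 + 36 + 20 = 138 and N(6) = 20, which gives 2 * 138 - 20 = 256.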
a+b+1 b
where $\mathcal{S}$ is the set of all segments in the code, $G$ the set of all boundary
points $(a, b)$, and $\Pr\{(A, B) = (a, b)\}$ is the probability that the source generates
a segment with $a$ zeros and $b$ ones.
Now let $\theta^- = \theta/2$ and $\theta^+ = (1+\theta)/2$. If we note that
$0 < \theta^- < \theta < \theta^+ < 1$ we can define
$$G_\theta = \left\{(a, b) \in G : \theta^- \le \frac{b}{a+b} \le \theta^+\right\}. \qquad (13)$$
Note that both $a$ and $b$ tend to infinity for $(a, b) \in G_\theta$ when $C$ increases.
This follows from the fact that for any boundary point $(a, b)$
$$a + b = \frac{1}{2}\log 2^{2(a+b)} \ge \frac{1}{2}\log\left((a+b+1)\,2^{a+b}\right) \ge \frac{1}{2}\log\left((a+b+1)\binom{a+b}{b}\right) \ge \log\sqrt{C}. \qquad (14)$$
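$$\begin{aligned}
\frac{(1-\theta)^a\theta^b}{\frac{1}{a+b+1}\binom{a+b}{b}^{-1}}
&\le (a+b+1)\sqrt{\frac{a+b}{2\pi ab}}\,\exp\left(\frac{1}{12}\right)\exp\left(-(a+b)\,d\left(\frac{b}{a+b}\,\Big\|\,\theta\right)\right) \\
&= \sqrt{a+b}\,\sqrt{\frac{(a+b+1)^2}{2\pi ab}}\,\exp\left(\frac{1}{12}\right)\exp\left(-(a+b)\,d\left(\frac{b}{a+b}\,\Big\|\,\theta\right)\right) \\
&\le \sqrt{a+b}\,\sqrt{\frac{(a+b+1)^2}{2\pi ab}\exp\left(\frac{1}{6}\right)}
\le \sqrt{a+b}\,\sqrt{\frac{9\exp(1/6)}{8\pi\,\theta^-(1-\theta^+)}} \\
&= \sqrt{a+b}\,\sqrt{\frac{9\exp(1/6)}{2\pi\,\theta(1-\theta)}} = K\sqrt{a+b},
\end{aligned} \qquad (15)$$
for $K = \sqrt{(9\exp(1/6))/(2\pi\theta(1-\theta))}$ and where
$d(p\|q) = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}$ (with $0 \le p \le 1$ and
$0 < q < 1$) is the (non-negative) binary divergence function. For any $(a, b) \in G$ we
find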
$$\frac{(1-\theta)^a\theta^b}{\frac{1}{a+b+1}\binom{a+b}{b}^{-1}} \le (a+b+1)\,C. \qquad (16)$$
The last inequality holds since for any point $(a, b)$, $a + b \ge 1$, on the boundary,
there exists an interior point $(a', b')$ with $a' + b' = a + b - 1$ for which
$(a' + b' + 1)B < C$, for some binomial coefficient $B$.
To get an upper bound for the divergence $D(P\|Q)$ in (12) we first consider
$$\Pr\{(A, B) \notin G_\theta\} = \Pr\left\{(A, B) \in G : \frac{B}{A+B} < \theta^-\right\} + \Pr\left\{(A, B) \in G : \frac{B}{A+B} > \theta^+\right\}$$
= Prf (1 )l i i g + Prf (1 )l+ ii g
X X
(a;b)2G
p
Prf(A; B ) = (a; b)g log( a + b) + K + 2C =2 log C
X
(a;b)2G
12 log(Lav ) + K + ; (18)
for C large enough.
where $K' = K - \frac{1}{2}\log h(\theta) + 1 + \epsilon$. The second inequality follows
from $\log M \ge H_{segm} = L_{av} h(\theta)$. For the rate of our modified Lawrence code we
finally obtain
Note that this lower bound for $R$ holds also for $h(\theta) = 0$. Now we easily arrive at
our first implication:
log M log log M
!
A (1
) log
2 log M
) R 1+
2 log M
1
log M
h(): (27)
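Next we introduce the set $\mathcal{X}_\theta$ of segments that have a prefix of length
$L_{min}$ which is '$\theta$-typical', i.e.
$$\mathcal{X}_\theta = \left\{x^* \in \mathcal{S} : L(x^*) \ge L_{min} \wedge \left|\frac{1}{L_{min}}\sum_{i=1}^{L_{min}} x_i - \theta\right| \le \frac{c}{\sqrt{L_{min}}}\right\}, \qquad (28)$$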
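where $c > 0$ is to be specified later. Note that for
$P = \Pr\{x^* \in \mathcal{X}_\theta\}$, by the union bound and Chebyshev's inequality, we
may conclude that
$$\begin{aligned}
P &\ge 1 - \Pr\{x^* \in \mathcal{S} : L(x^*) < L_{min}\} - \Pr\left\{x^* \in \mathcal{S} : \left|\frac{1}{L_{min}}\sum_{i=1}^{L_{min}} x_i - \theta\right| > \frac{c}{\sqrt{L_{min}}}\right\} \\
&\ge 1 - A - \frac{\theta(1-\theta)}{c^2} \ge 1 - A - \frac{1}{4c^2}.
\end{aligned} \qquad (29)$$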
Let $M_\theta$ be the number of segments in $\mathcal{X}_\theta$, then from the 'log-sum'
inequality (Csiszár and Körner [2]) we obtain that
$$T = \sum_{x^* \in \mathcal{X}_\theta} \Pr\{x^*\}\log\frac{\Pr\{x^*\}}{1/M} \ge P\log\frac{P}{M_\theta/M}. \qquad (30)$$
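Furthermore from Massey's leaf-node theorem, the log-sum inequality and the basic
inequality $\ln t < t - 1$ it follows that
$$\begin{aligned}
\log M &= L_{av} h(\theta) + \sum_{x^* \in \mathcal{S}} \Pr\{x^*\}\log\frac{\Pr\{x^*\}}{1/M} \\
&\ge L_{av} h(\theta) + T + (1 - P)\log\frac{1 - P}{1 - M_\theta/M} \\
&\ge L_{av} h(\theta) + T - \log e.
\end{aligned} \qquad (31)$$
This combined with $L_{av} h(\theta) = H_{segm} \le \log M$, leads for $M \ge 2^{\exp 1}$,
to our second implication:
$$T \ge (1-\epsilon)\frac{\log\log M}{2} \;\Rightarrow\; R \ge \left(1 + (1-\epsilon)\frac{\log\log M}{2\log M} - \frac{\log e}{\log M}\right) h(\theta). \qquad (32)$$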
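The identity $H_{segm} = L_{av} h(\theta)$ used above can be verified numerically for the modified Lawrence segment set. The Python sketch below assumes the boundary function $f(a, b) = (a+b+1)\binom{a+b}{b}$ of (6) and counts, for every boundary point, the segments that end there by looking at its internal predecessors; the function names are illustrative.

```python
from math import comb, log2

def f(a: int, b: int) -> int:
    return (a + b + 1) * comb(a + b, b)

def is_internal(a: int, b: int, c: int) -> bool:
    return a >= 0 and b >= 0 and f(a, b) < c

def binary_entropy(theta: float) -> float:
    return -theta * log2(theta) - (1.0 - theta) * log2(1.0 - theta)

def segment_classes(c: int):
    """Yield (a, b, count): the number of segments ending at each boundary point."""
    for a in range(c):
        for b in range(c):
            if f(a, b) < c:
                continue                              # internal point, no segment ends here
            count = 0
            if is_internal(a - 1, b, c):
                count += comb(a + b - 1, b)           # last symbol was a 0
            if is_internal(a, b - 1, c):
                count += comb(a + b - 1, b - 1)       # last symbol was a 1
            if count:
                yield a, b, count

def entropy_and_length(c: int, theta: float):
    """Return (H_segm, L_av) for the segment set with threshold c."""
    h_segm = l_av = 0.0
    for a, b, count in segment_classes(c):
        log_p = b * log2(theta) + a * log2(1.0 - theta)   # log-probability of one segment
        mass = count * 2.0 ** log_p
        h_segm -= mass * log_p
        l_av += mass * (a + b)
    return h_segm, l_av

if __name__ == "__main__":
    h_segm, l_av = entropy_and_length(82, 0.1)
    print(h_segm, l_av * binary_entropy(0.1))
```

Since the segment entropy can never exceed log M, this identity is what bounds the average segment length by log M / h and the rate from below by the entropy.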
From the definition
M M
( )
log log
B = : A (1
) < 2 log M ^ T < (1 ) 2 log log : (33)
and the implications (27) and (32) it follows that for M large enough
log M
!
References
[1] M. Abramowitz and I.A. Stegun, Handbook of Mathematical Functions. New York:
Dover, 1970.
[2] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete
Memoryless Systems. Budapest, Hungary: Akadémiai Kiadó, 1981.
[3] L.D. Davisson, "Universal noiseless coding," IEEE Trans. Inform. Theory, vol. IT-19,
no. 6, 1973, pp. 783-795.
[4] R.E. Krichevsky and V.K. Trofimov, "The performance of universal encoding," IEEE
Trans. Inform. Theory, vol. IT-27, no. 2, 1981, pp. 199-207.
[5] J.C. Lawrence, "A new universal coding scheme for the binary memoryless source,"
IEEE Trans. Inform. Theory, vol. IT-23, no. 4, 1977, pp. 466-472.
[6] J.L. Massey, "The entropy of a rooted tree with probabilities," presented at the IEEE
Int. Symp. Inform. Theory, St. Jovite, Canada, Sept. 26-30, 1983.
[7] J. Rissanen, "Universal Coding, Information, Prediction, and Estimation," IEEE
Trans. Inform. Theory, vol. IT-30, no. 4, July 1984, pp. 629-636.
[8] J.P.M. Schalkwijk, "An algorithm for source coding," IEEE Trans. Inform. Theory,
vol. IT-18, no. 3, 1972, pp. 395-399.
[9] Yu.M. Shtarkov, "The variable-to-block universal encoding of memoryless sources,"
presented at The Fourth Joint Swedish-Soviet International Workshop on Information
Theory, August 1989, Gotland, Sweden.
[10] B.P. Tunstall, Synthesis of noiseless compression codes, Ph.D. dissertation, Georgia
Institute of Technology, Sept. 1967.