Characterization Results For Time-Varying Codes: Fundamenta Informaticae 53 (2), 2002, 185-198
E. Mäkinen, C. Enea, D. Trinca
Abstract
Time-varying codes associate variable length code words to letters being encoded depending on their positions in the input string. These codes
have been introduced in [8] as a proper extension of L-codes.
This paper is devoted to a further study of time-varying codes. First,
we show that adaptive Huffman encodings are special cases of encodings
by time-varying codes. Then, we focus on three kinds of characterization
results: characterization results based on decompositions over families of
sets of words, a Schützenberger-like criterion, and a Sardinas-Patterson-like
characterization theorem. All of them extend the corresponding characterization results known for classical variable length codes.
Originating in Shannon's information theory in the 1950s, the theory of codes
has developed in several directions. Among them, the theory of variable length
codes, strongly related to combinatorics on words, automata theory, formal languages, and the theory of semigroups, has produced a number of beautiful results
applicable to various fields. Intuitively, a variable length code is a set of words
such that any product of these words can be uniquely decoded.
Figure 1: Tree representation of the code {01, 101, 110}

2 Time-Varying Codes
Time-varying codes (TV-codes, for short) were introduced in [8] as a
proper extension of L-codes [4]. The connection to gsm-codes and SE-codes has
also been discussed in [8]. In this section we recall the concept of a TV-code
and show that adaptive Huffman encodings are special cases of encodings by
TV-codes.
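A TV-code over an alphabet $\Sigma$ is any function $h : \Sigma \times \mathbb{N} \to \Delta^+$, where $\Delta$ is an alphabet, such that the function $\bar{h} : \Sigma^* \to \Delta^*$ given by $\bar{h}(\lambda) = \lambda$ and $\bar{h}(\sigma_1 \cdots \sigma_n) = h(\sigma_1, 1) \cdots h(\sigma_n, n)$, for all $\sigma_1 \cdots \sigma_n \in \Sigma^+$, is injective. Such a function can be represented by a table:

  $\Sigma \setminus \mathbb{N}$   1                  2                  3                  ...
  $\sigma_1$      $h(\sigma_1, 1)$   $h(\sigma_1, 2)$   $h(\sigma_1, 3)$   ...
  $\sigma_2$      $h(\sigma_2, 1)$   $h(\sigma_2, 2)$   $h(\sigma_2, 3)$   ...
  ...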
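As a minimal sketch of this definition (it is not taken from the paper), the following Python fragment computes the extension $\bar{h}$ and searches by brute force for two distinct input words with the same image. The names `encode` and `find_collision`, the table-driven function `h`, and the letters `s1`, `s2` are illustrative assumptions. A result of `None` only means that no collision exists up to the chosen length bound; it does not prove that $h$ is a TV-code.

```python
from itertools import product

def encode(h, word):
    """Extension of h: the letter at position i (1-based) is encoded by h(letter, i)."""
    return "".join(h(letter, i) for i, letter in enumerate(word, start=1))

def find_collision(h, alphabet, max_len):
    """Return two distinct words (as tuples) of length <= max_len with equal images, or None."""
    seen = {}
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            image = encode(h, word)
            if image in seen:
                return seen[image], word
            seen[image] = word
    return None

# Hypothetical two-letter example; the last column is assumed to repeat forever.
TABLE = {"s1": ["a", "ba"], "s2": ["ab", "a"]}

def h(letter, i):
    row = TABLE[letter]
    return row[min(i, len(row)) - 1]

print(find_collision(h, ["s1", "s2"], 2))
```

Here the search reports that the words `s1 s1` and `s2 s2` are both encoded as `aba`, so this particular `h` is not a TV-code even though each of its sections is a prefix code.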
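[...] $\dfrac{\text{frequency of } \sigma \text{ in } w}{|w|}$.

[...]

$$\bar{h}_w(u\sigma u') = \bar{h}_w(u)\, h_w(\sigma, |u|+1)\, \bar{h}_w(u') = \bar{h}_w(u)\, \mathrm{code}(\sigma, A_{|u|})\, \bar{h}_w(u')$$

and

$$\bar{h}_w(u\sigma' u'') = \bar{h}_w(u)\, h_w(\sigma', |u|+1)\, \bar{h}_w(u'') = \bar{h}_w(u)\, \mathrm{code}(\sigma', A_{|u|})\, \bar{h}_w(u'').$$

Because $A_i$ is a prefix code for any $i$, it follows that neither $\mathrm{code}(\sigma, A_{|u|})$ is a
prefix of $\mathrm{code}(\sigma', A_{|u|})$ nor $\mathrm{code}(\sigma', A_{|u|})$ is a prefix of $\mathrm{code}(\sigma, A_{|u|})$. Therefore,
$\bar{h}_w(u\sigma u') \neq \bar{h}_w(u\sigma' u'')$, proving that $h_w$ is a TV-code.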
We have proved the following.
Proposition 2.1 Adaptive Huffman encodings are special cases of encodings
by TV-codes.
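The construction behind Proposition 2.1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of the paper: every letter starts with weight 1, the tree $A_i$ is rebuilt from scratch with a standard heap-based Huffman construction after each input letter, ties are broken by insertion order, and the trees are frozen once the input is exhausted. The names `huffman_code` and `induced_tv_code` are hypothetical.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Return a prefix code {letter: bitstring} for positive integer weights (>= 2 letters)."""
    tie = count()  # unique tie-breaker, so the heap never compares dictionaries
    heap = [(w, next(tie), {a: ""}) for a, w in sorted(weights.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {a: "0" + s for a, s in c0.items()}
        merged.update({a: "1" + s for a, s in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]

def induced_tv_code(alphabet, w):
    """Return h with h(sigma, i) = code(sigma, A_{i-1}), for i >= 1."""
    counts = {a: 1 for a in alphabet}       # assumption: initial weight 1 per letter
    trees = [huffman_code(counts)]          # A_0
    for letter in w:
        counts[letter] += 1
        trees.append(huffman_code(counts))  # A_1, ..., A_|w|

    def h(sigma, i):
        # assumption: positions beyond |w| + 1 keep using the last tree
        return trees[min(i - 1, len(w))][sigma]

    return h

h = induced_tv_code("abcd", "dcd")
print("".join(h(s, i) for i, s in enumerate("dcd", start=1)))
```

Since every tree yields a prefix code, each section of the resulting `h` is a prefix code, which is the property the argument above relies on.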
Example 2.1 Figure 2 shows the sequence of Huffman trees needed to encode the
string dcd over the alphabet {a, b, c, d}. The first Huffman tree $A_0$ is
associated with the alphabet. When the first letter d of the input string is read,
it is encoded by $\mathrm{code}(d, A_0)$, and a new Huffman tree $A_1$ is generated. This
procedure is iterated until the last letter of the input string is processed.

[Figure 2: Huffman trees, panels (a) $A_0$, (b) $A_1$, (d) $A_2$, (c) $A_3$]

The TV-code induced by this adaptive Huffman encoding is given in the table below.
  $\Sigma \setminus \mathbb{N}$   1    2    3    4     5
  a       00   00   00   110   110
  b       01   01   01   111   111
  c       10   10   10   10    10
  d       11   11   11   0     0
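The table of Example 2.1 can be used directly for encoding and decoding. The sketch below is only an illustration: the dictionary `TABLE` copies the five columns shown above, and the assumption that column 5 repeats for all later positions is ours. Decoding proceeds position by position and works here because every column is a prefix code.

```python
# Columns 1..5 of the table in Example 2.1.
TABLE = {
    "a": ["00", "00", "00", "110", "110"],
    "b": ["01", "01", "01", "111", "111"],
    "c": ["10", "10", "10", "10", "10"],
    "d": ["11", "11", "11", "0", "0"],
}

def h(letter, i):
    row = TABLE[letter]
    return row[min(i, len(row)) - 1]  # assumption: the last column repeats

def encode(word):
    return "".join(h(letter, i) for i, letter in enumerate(word, start=1))

def decode(bits):
    out, pos, i = [], 0, 1
    while pos < len(bits):
        for letter in TABLE:
            codeword = h(letter, i)
            if bits.startswith(codeword, pos):
                out.append(letter)
                pos += len(codeword)
                i += 1
                break
        else:
            raise ValueError("not a valid encoding")
    return "".join(out)

print(encode("dcd"))          # 11 . 10 . 11
print(decode(encode("dcd")))
```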
3 Characterization Results
The aim of this section is to present several characterization results for TV-codes. First, we characterize the TV-code property by means of decompositions over families of sets of words, and then a Schützenberger criterion and
a Sardinas-Patterson characterization theorem are presented. All these results
extend the corresponding characterization results known for classical codes.
In order to avoid trivial but annoying analysis cases, the alphabet is
assumed to be of cardinality at least 2 throughout this section.
3.1
Let $A = (A_i \mid i \geq 1)$ be a family of subsets of $\Delta^+$. Denote by $A^{\geq i}$ the set
$A^{\geq i} = \bigcup_{j \geq i} A_i \cdots A_j$. A decomposition of a word $w \in \Delta^+$ over the family $A$ is
any sequence of words $u_1, \ldots, u_k$ such that $w = u_1 \cdots u_k$ and $u_i \in A_i$, for all
$1 \leq i \leq k$.
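Definition 3.1 A function $h : \Sigma \times \mathbb{N} \to \Delta^+$ is called regular or injective on
sections if the following property holds true:

$$(\forall i \geq 1)(\forall \sigma, \sigma' \in \Sigma)(\sigma \neq \sigma' \Rightarrow h(\sigma, i) \neq h(\sigma', i)).$$

Here $H_i = \{h(\sigma, i) \mid \sigma \in \Sigma\}$ denotes the $i$-th section of $h$, and $H = (H_i \mid i \geq 1)$ its family of sections. $h$ is called regular of base $C$, where $C$ is a nonempty subset of $\Delta^+$, if it is regular
and $H_i = C$, for all $i \geq 1$.
When $\Sigma$ is finite, $h$ is regular iff $|H_i| = |\Sigma|$, for all $i \geq 1$.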
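For a finite alphabet and finitely many positions, regularity can be checked through the cardinality criterion just stated. The following sketch assumes that $h$ is given as a finite table mapping each letter to its list of code words; the function names and the letters `s1`, `s2` are hypothetical.

```python
def sections(table):
    """Return [H_1, H_2, ...] for a table {letter: [h(letter, 1), h(letter, 2), ...]}."""
    width = min(len(row) for row in table.values())
    return [{row[i] for row in table.values()} for i in range(width)]

def is_regular(table):
    """h is regular on the listed positions iff |H_i| = |Sigma| for every section."""
    return all(len(H) == len(table) for H in sections(table))

print(is_regular({"s1": ["a", "c", "c"], "s2": ["ab", "d", "d"]}))
print(is_regular({"s1": ["a", "c", "c"], "s2": ["a", "d", "d"]}))
```

The second table fails the test because both letters receive the same code word at position 1.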
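Remark 3.1 Any TV-code $h$ must be regular. Indeed, if we assume that $h$ is
a TV-code and there are $\sigma, \sigma' \in \Sigma$ such that $\sigma \neq \sigma'$ and $h(\sigma, i) = h(\sigma', i)$ for
some $i \geq 1$, then the words $\sigma^{i-1}\sigma$ and $\sigma^{i-1}\sigma'$, both of length $i$, are distinct but have the same image under $\bar{h}$, contradicting the injectivity of $\bar{h}$.

[...] The function $h$ given by the table

  $\Sigma \setminus \mathbb{N}$   1   2   3
  $\sigma_1$      a   c   c
  $\sigma_2$      a   d   d

is not regular, but any word $w \in \Delta^+$ has at most one decomposition over its
family of sections. Moreover, $h$ is not a TV-code because $\bar{h}(\sigma_1) = \bar{h}(\sigma_2)$.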
Corollary 3.1 A regular function $h : \Sigma \times \mathbb{N} \to \Delta^+$ of base $C$ is a TV-code iff
$C$ is a code over $\Delta$.
Proof Let $h : \Sigma \times \mathbb{N} \to \Delta^+$ be a regular function of base $C$. Then, $h$ is a TV-code iff any word $w \in \Delta^+$ has at most one decomposition over $H = (H_i \mid i \geq 1)$
iff any word $w \in \Delta^+$ has at most one decomposition over $C$ iff $C$ is a code over
$\Delta$. □
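Remark 3.3 (1) A section of a TV-code is not necessarily a code. Indeed,
let us consider the function $h$ given in the table below.

  $\Sigma \setminus \mathbb{N}$   1    2   3
  $\sigma_1$      a    c   c
  $\sigma_2$      ab   d   d
  $\sigma_3$      ba   e   e

[...]

  $\Sigma \setminus \mathbb{N}$   1    2    3   4
  $\sigma_1$      a    ba   c   c
  $\sigma_2$      ab   a    d   d

[...]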
[...] for all $w \in \Delta^*$ (the symbol $\sum_{uv=w}$ in the right hand side of the second
equation indicates summation in the semiring $R$ over all factorizations $uv$ of
$w$, and $f(u)g(v)$ is the product of $f(u)$ and $g(v)$ in $R$). These operations are
associative, and $R[[\Delta^*]]$ forms a semiring under them.
In what follows we work only with power-series over the semiring $\mathbb{N}$ of natural
numbers with addition and multiplication. Let $\Delta$ be an alphabet and $X \subseteq \Delta^*$.
The characteristic power-series of $X$, denoted $\underline{X}$, is defined by $\underline{X}(w) = 1$ if
$w \in X$, and $\underline{X}(w) = 0$ otherwise.
Let $A_1, \ldots, A_n$ be subsets of $\Delta^+$, where $n \geq 2$. The product $A_1 \cdots A_n$
is called unambiguous if any word $w \in A_1 \cdots A_n$ has only one decomposition
$w = u_1 \cdots u_n$ with $u_i \in A_i$, for all $1 \leq i \leq n$.
The unambiguity property can be easily characterized by power-series as
follows [1]: the product $A_1 \cdots A_n$ is unambiguous iff

$$\underline{A_1 \cdots A_n} = \underline{A_1} \cdots \underline{A_n}.$$
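For finite sets this criterion can be tested mechanically. The sketch below represents a power-series with finite support as a `Counter` from words to natural numbers and multiplies two series by the Cauchy product; a product of sets is then unambiguous exactly when every coefficient of the product of the characteristic series equals 1. The function names are hypothetical, and the restriction to finite sets is an assumption made only to keep the example executable.

```python
from collections import Counter

def char_series(X):
    """Characteristic series of a finite set X of words."""
    return Counter({w: 1 for w in X})

def cauchy_product(f, g):
    """(f g)(w) = sum of f(u) g(v) over all factorizations w = uv."""
    out = Counter()
    for u, a in f.items():
        for v, b in g.items():
            out[u + v] += a * b
    return out

def is_unambiguous(*sets):
    """True iff the product of the given finite sets is unambiguous."""
    prod = char_series(sets[0])
    for A in sets[1:]:
        prod = cauchy_product(prod, char_series(A))
    # each coefficient counts the decompositions of the corresponding word
    return all(c == 1 for c in prod.values())

print(is_unambiguous({"a", "ab"}, {"b", "bb"}))  # abb = a.bb = ab.b
print(is_unambiguous({"a", "b"}, {"a", "b"}))
```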
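Let $h : \Sigma \times \mathbb{N} \to \Delta^+$ be a function. The family $(\underline{H_1} \cdots \underline{H_i} \mid i \geq 1)$ of
formal power-series is locally finite, that is, the following property holds true:

$$(\forall w \in \Delta^*)\big(|\{i \geq 1 \mid (\underline{H_1} \cdots \underline{H_i})(w) \neq 0\}| \in \mathbb{N}\big).$$

Therefore, a power-series $\sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}$ can be defined by

$$\Big(\sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}\Big)(w) = \sum_{\{i \geq 1 \mid (\underline{H_1} \cdots \underline{H_i})(w) \neq 0\}} (\underline{H_1} \cdots \underline{H_i})(w),$$

for all $w \in \Delta^*$ (the right hand side of the equality is a finite sum of natural
numbers).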
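Proposition 3.2 Let $h : \Sigma \times \mathbb{N} \to \Delta^+$ be a regular function. $h$ is a TV-code
iff $\underline{H^{\geq 1}} = \sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}$.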
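Proof Let us first assume that $h$ is a TV-code. Then, the following equivalences hold
true:

$$\begin{array}{rcl}
w \in H^{\geq 1} & \Leftrightarrow & w \in H_1 \cdots H_i, \text{ for exactly one } i \geq 1 \\
 & \Leftrightarrow & \underline{H_1 \cdots H_i}(w) = 1, \text{ for exactly one } i \geq 1 \\
 & \Leftrightarrow & (\underline{H_1} \cdots \underline{H_i})(w) = 1, \text{ for exactly one } i \geq 1 \\
 & \Leftrightarrow & \big(\sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}\big)(w) = 1,
\end{array}$$

for any $w \in \Delta^*$.
Conversely, assume that $\underline{H^{\geq 1}} = \sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}$, but $h$ is not a TV-code.
Then, there is a word $w \in \Delta^+$ having at least two distinct decompositions over
$H$. That is, there are $i \geq 1$ and $j \geq 1$ such that $w \in H_1 \cdots H_i \cap H_1 \cdots H_j$ (in the
case $i = j$ we assume that $w$ has two distinct decompositions over $H_1 \cdots H_i$).
Then:
$(\underline{H_1} \cdots \underline{H_i})(w) \geq 1$ and $(\underline{H_1} \cdots \underline{H_j})(w) \geq 1$, if $j \neq i$, and
$(\underline{H_1} \cdots \underline{H_i})(w) \geq 2$, if $j = i$.
Therefore, $\big(\sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}\big)(w) \geq 2 > 1 = \underline{H^{\geq 1}}(w)$; a contradiction. □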
Remark 3.4 Let $h : \Sigma \times \mathbb{N} \to \Delta^+$ be a regular function of base $C$. By Corollary 3.1 and Proposition 3.2 we obtain that $C$ is a code iff $\underline{C^+} = \sum_{i \geq 1} (\underline{C})^i$.
This is a well-known characterization result for classical codes [3].
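The coefficient of a word $w$ in the series $\sum_{i \geq 1} \underline{H_1} \cdots \underline{H_i}$ is the number of decompositions of $w$ over the family of sections, so it can be computed by a simple dynamic program. The sketch below assumes that finitely many sections are supplied as Python sets (at least $|w|$ of them, since code words are non-empty); the function name and the sample sections are hypothetical.

```python
def count_decompositions(sections, w):
    """Number of sequences u_1, ..., u_i with w = u_1...u_i and u_t in sections[t-1]."""
    ways = {0: 1}   # ways[p]: decompositions of the prefix w[:p] using the sections read so far
    total = 0
    for H in sections:
        nxt = {}
        for p, c in ways.items():
            for u in H:
                if w.startswith(u, p):
                    q = p + len(u)
                    nxt[q] = nxt.get(q, 0) + c
        total += nxt.get(len(w), 0)   # contribution of H_1...H_i to the coefficient of w
        ways = nxt
        if not ways:
            break
    return total

H = [{"a", "ab"}, {"ba", "a"}, {"c", "d"}]
print(count_decompositions(H, "aba"))   # a.ba and ab.a
```

A coefficient greater than 1, as for `aba` here, exhibits a word with two decompositions, so a function with these sections cannot be a TV-code.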
3.2 Schützenberger Criterion for TV-Codes
[...] the word $xwy$ has at least two decompositions over $H$, one of the form $x(wy)$,
beginning with a decomposition of $x \in H_1 \cdots H_i$, and another one of the form
$(xw)y$, beginning with a decomposition of $xw \in H_1 \cdots H_j$. The decomposition of $x$
within the word $xw$ is different from the decomposition of $x \in H_1 \cdots H_i$ because $w$ is
not a member of $H^{\geq i+1}$. Therefore, $xwy$ has at least two distinct decompositions
over $H$, contradicting the fact that $h$ is a TV-code.
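Conversely, suppose that (1), (2), and (3) hold true but $h$ is not a TV-code.
Then, there are two distinct words $\sigma_1 \cdots \sigma_n, \sigma'_1 \cdots \sigma'_m \in \Sigma^+$ such that

$$h(\sigma_1, 1) \cdots h(\sigma_n, n) = h(\sigma'_1, 1) \cdots h(\sigma'_m, m).$$

Let $i$ be the least index such that $\sigma_i \neq \sigma'_i$. By (1), we obtain $h(\sigma_i, i) \neq h(\sigma'_i, i)$.
Moreover, either $h(\sigma_i, i)$ is a proper prefix of $h(\sigma'_i, i)$, or vice versa. Let us
suppose that $h(\sigma'_i, i) = h(\sigma_i, i)w$, where $w$ is non-empty. Since $H$ is catenatively
independent, $w \notin H^{\geq i+1}$.
The equality

$$h(\sigma'_1, 1) \cdots h(\sigma'_i, i) = h(\sigma_1, 1) \cdots h(\sigma_i, i)\,w$$

shows that $H_1 \cdots H_i w \cap H_1 \cdots H_i \neq \emptyset$. Similarly, the equality

$$h(\sigma_{i+1}, i+1) \cdots h(\sigma_n, n) = w\,h(\sigma'_{i+1}, i+1) \cdots h(\sigma'_m, m)$$

shows that $wH^{\geq i+1} \cap H^{\geq i+1} \neq \emptyset$. We have now a contradiction with (2). □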
Remark 3.5 Let $C$ be a nonempty subset of $\Delta^+$. Corollary 3.1 and Theorem
3.1 lead to the following conclusion: $C$ is a code iff the following two properties
hold:
1. $C$ is catenatively independent;
2. for any $w \in \Delta^+$, if $w \notin C^+$ then $C^+ w \cap C^+ = \emptyset$ or $wC^+ \cap C^+ = \emptyset$.
This is the Schützenberger criterion for codes [1].
3.3
  $\Sigma \setminus \mathbb{N}$   1     2     3     4     5
  $\sigma_1$      a     baa   ab    ab    ab
  $\sigma_2$      baa   b     aab   aab   aab
  $\sigma_3$      aba   bab   a     a     a
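[...] $\sigma_1 \neq \sigma'_1$ and [...]

[...] such that $y = y'x$. By the induction hypothesis, there are $\sigma_1 \cdots \sigma_{q-k}, \sigma'_1 \cdots \sigma'_{p-k+1} \in \Sigma^+$ such that

$$h(\sigma_1, k) \cdots h(\sigma_{q-k}, q-1)\,y' = h(\sigma'_1, k) \cdots h(\sigma'_{p-k+1}, p).$$

Further, we have

$$h(\sigma_1, k) \cdots h(\sigma_{q-k}, q-1)\,y'x = h(\sigma'_1, k) \cdots h(\sigma'_{p-k+1}, p)\,x.$$

[...]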
(1) $h$ is regular;
(2) $(\forall k \geq 1)(\forall i, j \geq k)(H_{i,j} \cap H_{i+1} = \emptyset)$.
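Proof Let us assume that $h$ is a TV-code. Clearly, $h$ is regular. In order to
prove (2), consider $k \geq 1$ and $i, j \geq k$.
To derive a contradiction, assume that $H_{i,j} \cap H_{i+1} \neq \emptyset$, and let $x \in H_{i,j}$ [...]
that $\sigma_1 \neq \sigma'_1$ and [...]

[...] are two distinct words $\sigma_1 \cdots \sigma_n, \sigma'_1 \cdots \sigma'_m \in \Sigma^+$ such that

$$\bar{h}(\sigma_1 \cdots \sigma_n) = \bar{h}(\sigma'_1 \cdots \sigma'_m). \qquad (*)$$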
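Without loss of generality we may assume that there are no $k \leq i < n$ and
$k \leq j < m$ such that

$$h(\sigma_k, k) \cdots h(\sigma_i, i) = h(\sigma'_k, k) \cdots h(\sigma'_j, j).$$

From (1) it follows that $h(\sigma_k, k) \neq h(\sigma'_k, k)$, and from $(*)$ it follows that
$h(\sigma_k, k)$ is a proper prefix of $h(\sigma'_k, k)$, or vice versa. Let us assume that $h(\sigma_k, k)$
is a prefix of $h(\sigma'_k, k)$, and let $x$ be such that $h(\sigma_k, k)x = h(\sigma'_k, k)$. Then,
$x \in H_{k,k}$, and $x \notin H_{k+1}$ because $H_{k,k} \cap H_{k+1} = \emptyset$.
The relation $(*)$ leads to

$$h(\sigma_{k+1}, k+1) \cdots h(\sigma_n, n) = x\,h(\sigma'_{k+1}, k+1) \cdots h(\sigma'_m, m). \qquad (**)$$

[...]

$$h(\sigma_{k+2}, k+2) \cdots h(\sigma_n, n) = y\,h(\sigma'_{k+1}, k+1) \cdots h(\sigma'_m, m)$$

or [...]

Continuing this process a finite number of times, we get a word $z$ such that
$h(\sigma_n, n) = z$ and $z \in H_{n-1,m} \cap H_n$, or $z = h(\sigma'_m, m)$ and $z \in H_{m-1,n} \cap H_m$.
Both cases lead to a contradiction. Therefore, $h$ is a TV-code. □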
Example 3.2 Consider the function $h$ in Example 3.1. We have $H_{3,2} \cap H_4 \neq \emptyset$,
which shows that $h$ is not a TV-code. In fact, it is easy to see that

$$h(\sigma_1, 1)h(\sigma_2, 2)h(\sigma_1, 3)h(\sigma_1, 4) = h(\sigma_3, 1)h(\sigma_3, 2).$$
Remark 3.6 We show that the Sardinas-Patterson characterization theorem
for classical codes is a special case of Theorem 3.2.
Let $C \subseteq \Delta^+$ be a non-empty set. Define
$C_1 = \{x \in \Delta^+ \mid Cx \cap C \neq \emptyset\}$;
$C_{i+1} = \{x \in \Delta^+ \mid Cx \cap C_i \neq \emptyset \lor C_i x \cap C \neq \emptyset\}$, for all $i \geq 1$.
The Sardinas-Patterson characterization theorem states that $C$ is a code iff
$C \cap C_i = \emptyset$, for all $i \geq 1$.
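For a finite set $C$ the classical test just recalled is effective, because every $C_i$ consists of suffixes of words of $C$ and the sequence $C_1, C_2, \ldots$ must therefore become periodic. A minimal Python sketch follows; the function names are hypothetical, and $C$ is assumed to be a finite set of non-empty strings.

```python
def residuals(A, B):
    """Non-empty words x such that a x = b for some a in A and b in B (A x meets B)."""
    return {b[len(a):] for a in A for b in B if len(b) > len(a) and b.startswith(a)}

def is_code(C):
    """Sardinas-Patterson test for a finite set C of non-empty words."""
    C = set(C)
    Ci = residuals(C, C)                            # C_1
    seen = set()
    while Ci and frozenset(Ci) not in seen:
        if C & Ci:                                  # C meets C_i: not a code
            return False
        seen.add(frozenset(Ci))
        Ci = residuals(C, Ci) | residuals(Ci, C)    # C_{i+1}
    return True

print(is_code({"01", "101", "110"}))   # the prefix code of Figure 1
print(is_code({"a", "ab", "ba"}))      # a.ba = ab.a
```

The loop stops either when some $C_i$ is empty, in which case all later sets are empty as well, or when a set repeats, in which case every set of the cycle has already been checked.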
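Assume now that $h : \Sigma \times \mathbb{N} \to \Delta^+$ is a regular function of base $C$. Then,
$H_{k,k} = H_{1,1} = C_1$, for all $k \geq 1$. By induction on $i \geq 2$, we can prove that

$$C_i = \bigcup_{j=1}^{i} H_{i-j+1,j}.$$

[...] $\bigcup_{j=1}^{i-1} H_{i-j+1,j}$, then [...] $Cx \cap H_{1,i-1} \neq \emptyset$ [...] $H_{i-2,2}\,x \cap C \neq \emptyset$ [...] $Cx \cap H_{i-1,1} \neq \emptyset\} = \bigcup_{j=1}^{i} H_{i-j+1,j}$.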
Conclusions
Time-varying codes associate variable length code words to letters being encoded depending on their positions in the input string. These codes have been
introduced in [8] as a proper extension of L-codes.
In this paper we have continued the study of time-varying codes. First, we
have shown that adaptive Huffman encodings are special cases of encodings by
time-varying codes. Then, we have provided three kinds of characterization results: characterization results based on decompositions over families of sets of
words, a Schützenberger-like criterion, and a Sardinas-Patterson-like characterization theorem. All of them extend the corresponding characterization results
known for classical variable length codes.
References
[1] J. Berstel, D. Perrin. Theory of Codes, Academic Press, 1985.
[2] W. Kuich, A. Salomaa. Semirings, Automata, and Languages, Springer-Verlag, 1986.
[3] G. Lallement. Semigroups and Combinatorial Applications, John Wiley &
Sons, 1979.
[4] H.A. Maurer, A. Salomaa, D. Wood. L-Codes and Number Systems, Theoretical Computer Science 22, 1983, 331-346.
[5] G. Rozenberg, A. Salomaa (eds.). Handbook of Formal Languages, vol. 1,
Springer-Verlag, 1997.
[6] A. Salomaa. Jewels of Formal Language Theory, Computer Science Press,
1981.
[7] D. Salomon. Data Compression. The Complete Reference, Springer-Verlag,
1998.
[8] F.L. Ţiplea, E. Mäkinen, C. Enea. SE-Systems, Timing Mechanisms, and
Time-Varying Codes, International Journal of Computer Mathematics (to
appear).