Max CROCHEMORE
Laboratoire d'Informatique, Université de Haute-Normandie, BP 67, 76130 Mont-Saint-Aignan, France
the repetitions of primitive factors in a word x in time $O(|x| \log_2 |x|)$. A straightforward adaptation of Knuth, Morris and Pratt's string-matching algorithm [5] also solves the problem, but in time $O(|x|^2)$.

Main and Lorentz have given an $O(|x| \log_2 |x|)$ algorithm to find one square in a word x. Their method cannot be directly extended to solve the present problem, since they eliminate many repetitions when they are guaranteed to find another one later in the search.

Our algorithm uses an improved version of the well-known partitioning technique [1] for refinements of equivalence relations. This version has already been fruitful in a problem concerning partitions on graphs [2].

The optimality of the algorithm is proved by showing that there exist words which indeed have $O(|x| \log_2 |x|)$ repetitions. These particular words are Fibonacci words.

With a slight modification, the algorithm gives the maximal repetitions of a word. This algorithm is also optimal, since it computes all the $O(|x| \log_2 |x|)$ maximal repetitions of a Fibonacci word x in time $O(|x| \log_2 |x|)$.

x is, as in [4], any integer in $\{1, 2, \ldots, n\}$. A word $u$ of length $p$ is said to occur at position $i$ in x if $i + p - 1 \le n$ and $u = x_i x_{i+1} \cdots x_{i+p-1}$.

As usual, a word is said to be primitive if it cannot be written $v^e$ with $v \in A^*$ and $e \ge 2$. Then, a repetition in x is a non-trivial power of a primitive word which occurs in x.

More precisely, a repetition in x is defined to be a triple $(i, p, e)$ such that, setting $u = x_i \cdots x_{i+p-1}$, one has:
    $u^e$ occurs at position $i$ in x;
    $u^{e+1}$ does not occur at position $i$ in x;
    $u$ is primitive.
The integers $p$ and $e$ are called respectively the period and the exponent of the repetition $(i, p, e)$. For instance, $(1, 3, 2)$, $(3, 1, 2)$, $(4, 2, 2)$ and $(5, 2, 2)$ are repetitions in abaababa.

Maximal repetitions are also considered and defined by: a repetition $(i, p, e)$ in x is maximal if $i - p \le 0$ or $x_i x_{i+1} \cdots x_{i+p-1}$ does not occur at position $i - p$.

The algorithm given in this paper computes, for a word x in $A^*$, its set of repetitions or only its set of maximal repetitions.
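To make these definitions concrete, here is a minimal brute-force sketch in Python that follows them literally, with 1-based positions as in the text. The function names are illustrative and not taken from the paper. It recovers exactly the four repetitions listed for abaababa (all of which happen to be maximal), and on aaa it shows that $(2, 1, 2)$ is a repetition but not a maximal one. Its running time is roughly cubic, so it is only meant as a reference against which faster methods can be checked.

```python
def is_primitive(u: str) -> bool:
    """A non-empty word is primitive if it is not v^e for some word v and e >= 2."""
    n = len(u)
    return n > 0 and all(n % d or u != u[:d] * (n // d) for d in range(1, n))


def occurs(x: str, u: str, i: int) -> bool:
    """u occurs at 1-based position i in x: i + |u| - 1 <= |x| and x_i..x_{i+|u|-1} = u."""
    return i >= 1 and i + len(u) - 1 <= len(x) and x[i - 1:i - 1 + len(u)] == u


def repetitions(x: str) -> set:
    """Triples (i, p, e): u = x_i..x_{i+p-1} primitive, u^e occurs at i,
    u^(e+1) does not, and e >= 2 (non-trivial power)."""
    reps = set()
    n = len(x)
    for i in range(1, n + 1):
        for p in range(1, (n - i + 1) // 2 + 1):
            u = x[i - 1:i - 1 + p]
            if not is_primitive(u):
                continue
            e = 1
            while occurs(x, u * (e + 1), i):
                e += 1
            if e >= 2:
                reps.add((i, p, e))
    return reps


def maximal_repetitions(x: str) -> set:
    """Repetitions (i, p, e) with i - p <= 0 or u not occurring at position i - p."""
    return {(i, p, e) for (i, p, e) in repetitions(x)
            if i - p <= 0 or not occurs(x, x[i - 1:i - 1 + p], i - p)}


if __name__ == "__main__":
    print(sorted(repetitions("abaababa")))
    # [(1, 3, 2), (3, 1, 2), (4, 2, 2), (5, 2, 2)]
```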
Then, repetitions in x are characterized in terms of the differences $D_p$:

Lemma 1. $(i, p, e)$ is a repetition iff
$$D_p(i) = D_p(i+p) = \cdots = D_p(i+(e-2)p) = p \quad \text{and} \quad D_p(i+(e-1)p) \neq p.$$

Maximal repetitions are characterized in the same way:

Lemma 2. $(i, p, e)$ is a maximal repetition iff $(i, p, e)$ is a repetition and either $i - p \le 0$ or $D_p(i-p) \neq p$.

Proof of Lemma 1. Of course, the conditions are sufficient. Now, if $(i, p, e)$ is a repetition, we have
$$\forall j \in \{i, i+p, \ldots, i+(e-2)p\},\ D_p(j) \le p, \qquad \text{and} \qquad D_p(i+(e-1)p) \neq p.$$
Suppose that $D_p(j) = p' < p$ for one such $j$. The word $u = x_j \cdots x_{j+p-1}$ then also occurs at positions $j + p'$ and $j + p$. In such a situation, denoting by $v$ the word $x_j \cdots x_{j+p'-1}$ and by $w$ the word $x_{j+p'} \cdots x_{j+p-1}$, one easily checks that $u = vw = wv$. Hence $u$ is a non-trivial power of a word in $A^*$ [6], which contradicts the fact that $u$ is primitive.

Exploiting this relation directly leads to an $O(n^2)$ algorithm to compute the equivalences. The other classical partitioning algorithm, Hopcroft's algorithm [1], does not work for this problem, since it computes $E_N$ via equivalences other than the $E_p$'s. The method retained here was used in [2] to partition graphs. It leads to an $O(n \log_2 n)$ algorithm.

Let us consider two consecutive equivalences, $E_{p-1}$ and $E_p$. Let $(C_1, \ldots, C_h)$ be the equivalence classes according to $E_p$ ($E_p$-classes) and $(C'_1, \ldots, C'_{h'})$ the $E_{p-1}$-classes. $E_p$ being a refinement of $E_{p-1}$, each $E_{p-1}$-class is a union of $E_p$-classes. A choice function is a function
$$f : \{C'_1, \ldots, C'_{h'}\} \to \{C_1, \ldots, C_h\}$$
with the following properties: for any $C'$ in $\{C'_1, \ldots, C'_{h'}\}$, $f(C') \subseteq C'$ and, for any $C$ in $\{C_1, \ldots, C_h\}$, $C \subseteq C' \Rightarrow |C| \le |f(C')|$. So $f$ associates to each $E_{p-1}$-class one of its $E_p$-subclasses of maximal size. Given a choice function $f$, each $E_p$-class $f(C')$ is called a big class; the others are called small classes. Of course, there are as many big $E_p$-classes as $E_{p-1}$-classes. In particular, $E_p = E_{p-1}$ iff there is no small $E_p$-class. By definition, all the $E_1$-classes are small.

INFORMATION PROCESSING LETTERS 13 October 1981
Volume 12, number 5

Now, a new sequence $(S_p)_{p \ge 1}$ of equivalences on the positions of x is defined:
$$(i, j) \in S_p \iff (i, j) \in E_p \text{ or both } i \text{ and } j \text{ are in big } E_p\text{-classes}.$$
Equivalently, $(i, j) \in S_p$ iff, for any small $E_p$-class $C$, $i \in C \iff j \in C$.

Lemma 3. For any $p \ge 1$, $(i, j) \in E_{p+1}$ iff $(i, j) \in E_p$ and $(i+1, j+1) \in S_p$.

of small classes. Thus, the cost of all executions of steps 5 and 6 is
$$\mathcal{C} = \sum_{p \le N} \ \sum_{C \text{ small } E_p\text{-class}} |C|,$$
where $N$ is the first integer such that $E_N = E_{N+1}$ or, equivalently, such that $E_{N+1}$ has no small equivalence class.

Lemma 4. $\mathcal{C} \le n \log_2(n - m + 1)$, where $m$ is the number of distinct letters in x.
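Lemmas 1 and 2 can be checked directly on small words. The Python sketch below assumes that $(i, j) \in E_p$ means the factors of length $p$ at $i$ and $j$ are equal, and that $D_p(i)$ is the distance from $i$ to the next position of its $E_p$-class ($\infty$ if there is none), which is what the update rules for D in step 5.2 indicate; positions whose factor of length $p$ would run past the end also get $\infty$. It then reads off repetitions with Lemma 1, and maximal ones with Lemma 2; no separate primitivity test is needed, as the proof of Lemma 1 shows. The names `differences` and `repetitions_by_lemma1` are hypothetical, and the naive computation of $D_p$ compares whole factors, so it is far from the $O(n \log_2 n)$ bound.

```python
import math

INF = math.inf


def differences(x: str, p: int) -> dict:
    """Naive D_p on 1-based positions: distance to the next position whose factor
    of length p equals the one at i; INF if none (or if the factor runs past the end)."""
    n = len(x)
    d = {i: INF for i in range(1, n + 1)}
    nearest = {}                          # factor -> its nearest occurrence to the right
    for i in range(n - p + 1, 0, -1):     # scan right to left
        factor = x[i - 1:i - 1 + p]
        if factor in nearest:
            d[i] = nearest[factor] - i
        nearest[factor] = i
    return d


def repetitions_by_lemma1(x: str, maximal_only: bool = False) -> set:
    n = len(x)
    reps = set()
    for p in range(1, n // 2 + 1):
        d = differences(x, p)
        for i in range(1, n + 1):
            if d[i] != p:
                continue
            if maximal_only and i - p > 0 and d[i - p] == p:   # Lemma 2
                continue
            # Lemma 1: D_p(i) = D_p(i+p) = ... = p, and the run ends at i + (e-1)p.
            e, j = 2, i + p
            while j <= n and d[j] == p:
                e, j = e + 1, j + p
            reps.add((i, p, e))
    return reps


if __name__ == "__main__":
    print(sorted(repetitions_by_lemma1("abaababa")))
    print(sorted(repetitions_by_lemma1("aaa", maximal_only=True)))  # [(1, 1, 3)]
```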
procedure REP(x)
(1) define E to be E_1 on the word x; define D to be D_1;
    p ← 1; make R empty; SMALL ← {indices of E-classes};
(2) while SMALL ≠ ∅ do
(3)   begin add to R the repetitions of period p (Lemma 1);
(4)     p ← p + 1; if p > |x|/2 then return R;
(5)     refine E as stated in Lemma 3; update D from the value of E;
(6)     SMALL ← {indices of small E-classes};
      end;
    return R.
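The procedure above can be sketched in Python as follows. This is an illustrative reconstruction, not the paper's implementation: the names `rep`, `classes`, `small` and `tag` are hypothetical, the classes are plain sorted lists rebuilt at every round, and D is recomputed from scratch, so the sketch does not reach the $O(|x| \log_2 |x|)$ bound that the data structures of Fig. 2 are designed for. It assumes that $(i, j) \in E_p$ means equal factors of length $p$, and positions near the end of x are handled as if x were followed by a unique end marker, so that truncated factors end up in singleton classes. Step 5 applies Lemma 3: two positions of the same E-class stay together unless their successors are separated by some small class. Passing `maximal_only=True` applies Lemma 2, in the spirit of REPMAX.

```python
import math
from collections import defaultdict

INF = math.inf


def rep(x: str, maximal_only: bool = False) -> set:
    """Sketch of REP; with maximal_only=True it behaves like REPMAX."""
    n = len(x)
    # Step 1: E_1 groups positions by letter; by convention every E_1-class is small.
    letter_id = {}
    classes = defaultdict(list)            # class index -> sorted list of positions
    for i, c in enumerate(x, start=1):
        classes[letter_id.setdefault(c, len(letter_id))].append(i)
    small = set(classes)
    reps = set()
    p = 1
    while small:                           # step 2
        # D_p(i): distance to the next position of i's class, INF for the last one.
        d = {i: INF for i in range(1, n + 1)}
        for members in classes.values():
            for a, b in zip(members, members[1:]):
                d[a] = b - a
        # Step 3: repetitions of period p (Lemma 1; Lemma 2 if maximal_only).
        for i in range(1, n + 1):
            if d[i] != p or (maximal_only and i - p > 0 and d[i - p] == p):
                continue
            e, j = 2, i + p
            while j <= n and d[j] == p:
                e, j = e + 1, j + p
            reps.add((i, p, e))
        # Step 4.
        p += 1
        if p > n / 2:
            return reps
        # Step 5 (Lemma 3): i keeps its class and is tagged with the small class
        # of i + 1, if any; i = n has no successor (unique end marker).
        cls = {i: k for k, members in classes.items() for i in members}
        groups = defaultdict(list)         # (old class, tag) -> positions of a new class
        for i in range(1, n + 1):          # increasing i keeps every list sorted
            tag = cls[i + 1] if i < n and cls[i + 1] in small else None
            groups[(cls[i], tag)].append(i)
        # Step 6: within each old class, every subclass except one largest is small.
        by_parent = defaultdict(list)
        for (parent, _), members in groups.items():
            by_parent[parent].append(members)
        classes, small = {}, set()
        for subclasses in by_parent.values():
            largest = max(subclasses, key=len)
            for members in subclasses:
                k = len(classes)
                classes[k] = members
                if members is not largest:
                    small.add(k)
    return reps
```

Picking any largest subclass as the big one is enough for Lemma 3 to apply, which is why ties are broken arbitrarily by `max`.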
QUEUE in order to preserve the increasing order on the positions in each small class. At the same time, the set SPLIT of E-classes submitted to the 'splitting instruction' 5.2 is created. For each E-class k in SPLIT, a set SUBCLASS(k) is initialized to contain its subclass indices, together with a variable LASTSMALL(k). This indicator gives, in step 5.2, the last small class s that has been used to split the E-class k.

During step 5.2 the equivalence classes are split. One position at a time is transferred from the E-class k to a new class k̄. Let us assume that i' is the last position in ECLASS(k) that has been transferred to a class k', using a small class s'; in this case LASTSMALL(k) = s'. If s' is used again to transfer i into ECLASS(k̄), then i and i' are equivalent according to the value of E being computed, and k̄ is defined to be k'. If not, a new index is extracted from NEWINDEX to define k̄, and LASTSMALL(k) is set to s.

While a position is transferred, D and DCLASS are updated. The computation of D relies heavily on the fact that the positions in each equivalence class are kept in increasing order.

At step 6, a new value of SMALL is calculated. The array that gives the number of elements in each E-class allows the small classes to be found efficiently.

Theorem 5. The procedure REP in Fig. 2 computes all the repetitions in a word x.

Proof. It is easy to see that the algorithm stops. The computation of a new value of the equivalence E is done in steps 5.1 and 5.2 exactly as stated in Lemma 3. If we assume that D is correctly calculated, then it follows from Lemma 1 that all the repetitions of period p are added to R at step 3. It remains to prove that D is correctly updated. At each execution of the while loop 5.2, exactly one position i is transferred from its equivalence class k to another class k̄. If i' is the position that precedes i in ECLASS(k), then the value of $D_p(i')$ after i has been extracted from ECLASS(k) is $D_{p-1}(i') + D_{p-1}(i)$, since the positions in ECLASS(k) are in increasing order. When i is added to ECLASS(k̄), its predecessor i'' in ECLASS(k̄) must satisfy
$$D_p(i'') = i - i'',$$
since the positions in the small classes (copied in QUEUE) are in increasing order. Furthermore, i being the greatest position in ECLASS(k̄), we have $D_p(i) = \infty$. These three points correspond to what is done during step 5.2.

The procedure REP may be immediately modified to calculate the maximal repetitions in the word x. In view of Lemma 2, we only have to move instruction 3.1 after step 3.2. Let this new procedure be called REPMAX.

Theorem 6. The procedure REPMAX computes all the maximal repetitions of a word x.

Theorem 7. The time complexity of procedure REP (or REPMAX) is $O(|x| \log_2 |x| + |A|\,|x|)$.

Proof. Step 1 in Fig. 2 contributes $O(m|x|)$ to the total complexity, where m is the number of distinct letters in the word x. This is bounded by $O(|A|\,|x|)$. Next, we discuss the complexity of the "while" loop 2. All the executions of step 3 take a time proportional to the number of repetitions in the word x. This number is bounded by $|x| \log_2 |x|$ [6]. The cost of the executions of steps 5.1, 5.2 and 6
procedure REP(x)
      for k ← 2n step -1 until 1 do begin push k onto NEWINDEX; make ECLASS(k) empty; end;
(1)   for i ← 1 until n do
      begin if (x_i already occurs at j) then k ← E(j)
            else pop k from NEWINDEX;
            E(i) ← k; add i at the end of ECLASS(k);
      end;
      define D; put in the same DCLASS the positions that have the same value of D;
      p ← 1; make R, QUEUE, SPLIT empty;
      SMALL ← {indices of the E-classes};
(2)   while SMALL ≠ ∅ do
      begin comment computation of the repetitions of period p;
(3)     while DCLASS(p) ≠ ∅ do
        begin i ← a position in DCLASS(p);
              repeat i ← i + p until D(i) ≠ p; e ← 1;
              repeat begin i ← i - p; e ← e + 1;
(3.1)                    add (i, p, e) to R; erase i from DCLASS(p);
                     end
              until (i - p ≤ 0 or D(i - p) ≠ p);
(3.2)         comment see computation of maximal repetitions;
        end;
(4)     p ← p + 1; if p > n/2 then return R;
        comment copy of small classes in QUEUE;
(5.1)   while SMALL ≠ ∅ do
        begin extract s from SMALL;
              for j from the first to the last element of ECLASS(s) do
              begin if j ≠ 1 then
                    begin add (j, s) at the end of QUEUE; k ← E(j - 1);
                          if k ∉ SPLIT then
                          begin add k to SPLIT; set SUBCLASS(k) = {k}; LASTSMALL(k) ← 0;
                          end;
                    end;
              end;
        end;
        comment computation of the new values of E and D;
(5.2)   while QUEUE ≠ ∅ do
        begin extract (j, s), the first pair in QUEUE; i ← j - 1; k ← E(i);
              if LASTSMALL(k) ≠ s then
              begin LASTSMALL(k) ← s; pop NI from NEWINDEX; add NI to SUBCLASS(k);
              end;
              k̄ ← the last index put in SUBCLASS(k);
              if (i has a predecessor i' in ECLASS(k)) then
              begin D(i') ← D(i') + D(i); transfer i' to DCLASS(D(i'));
              end;
              transfer i to the end of ECLASS(k̄); E(i) ← k̄; D(i) ← ∞; transfer i to DCLASS(∞);
              if (i has a predecessor i'' in ECLASS(k̄)) then
              begin D(i'') ← i - i''; transfer i'' to DCLASS(D(i''));
              end;
        end;
        comment determination of the small classes;
(6)     while SPLIT ≠ ∅ do
        begin extract k from SPLIT;
              if |ECLASS(k)| = 0 then
              begin push k onto NEWINDEX; erase k from SUBCLASS(k);
              end;
              add to SMALL all the indices in SUBCLASS(k) but one, corresponding to a greatest E-class;
        end;
      end;
      return R.
Fig. 2. Searching repetitions in a word x.
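The three updates of D performed in step 5.2 can be isolated in a small Python sketch. It assumes that each E-class is a list of positions in increasing order and that positions enter their new class in increasing order, which is what copying the small classes into QUEUE guarantees. The names `Differences`, `assign`, `transfer` and `split_example` are hypothetical, and the NEWINDEX/SUBCLASS/LASTSMALL bookkeeping that chooses the target class k̄ is left out.

```python
import bisect
import math
from collections import defaultdict

INF = math.inf


class Differences:
    """The array D together with DCLASS: positions bucketed by their D value."""

    def __init__(self):
        self.d = {}
        self.dclass = defaultdict(set)

    def assign(self, pos, value):
        """Set D(pos) and move pos to DCLASS(value)."""
        if pos in self.d:
            self.dclass[self.d[pos]].discard(pos)
        self.d[pos] = value
        self.dclass[value].add(pos)


def transfer(i, old, new, diffs):
    """Move i from ECLASS(k) (`old`) to the end of ECLASS(k-bar) (`new`)."""
    t = bisect.bisect_left(old, i)
    if t == len(old) or old[t] != i:
        raise ValueError(f"position {i} is not in the source class")
    if new and new[-1] >= i:
        raise ValueError("positions must enter the new class in increasing order")
    if t > 0:                     # predecessor i' of i in ECLASS(k)
        prev = old[t - 1]
        diffs.assign(prev, diffs.d[prev] + diffs.d[i])   # D(i') <- D(i') + D(i)
    del old[t]
    if new:                       # predecessor i'' of i in ECLASS(k-bar)
        diffs.assign(new[-1], i - new[-1])                # D(i'') <- i - i''
    new.append(i)
    diffs.assign(i, INF)          # i is now the greatest position of ECLASS(k-bar)


def split_example():
    """Move positions 4 and 9 out of the class [1, 4, 6, 9] into a new class."""
    old, new = [1, 4, 6, 9], []
    diffs = Differences()
    for a, b in zip(old, old[1:]):
        diffs.assign(a, b - a)
    diffs.assign(old[-1], INF)
    for i in (4, 9):
        transfer(i, old, new, diffs)
    return old, new, diffs


if __name__ == "__main__":
    old, new, diffs = split_example()
    print(old, new, diffs.d)   # [1, 6] [4, 9] {1: 5, 4: 5, 6: inf, 9: inf}
```

Python lists keep the sketch short, but deleting from them costs linear time; a doubly linked list per class would make each transfer constant-time.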
The proof is a direct consequence of Lemma 10 on the number of squares in Fibonacci words. Observing that Fibonacci words do not contain repetitions of exponent 4, together with Lemma 10, we also obtain the optimality of the procedure REPMAX:

Theorem 9. The procedure REPMAX is optimal in the class of algorithms computing all the maximal repetitions of a word.

Lemma 10. Let us define the sequence of Fibonacci words by $f_0 = b$, $f_1 = a$ and $f_{q+1} = f_q f_{q-1}$ for any integer $q \ge 1$. Then the number $r_q$ of squares (repetitions of exponent 2) in $f_q$ satisfies, for any $q \ge 5$:

To get the results, it suffices to prove:
$$\ldots \ge \ldots\, |f_{q+1}| \log_2 |f_{q+1}|,$$

First, we prove that $f_{q+1}$ contains $|f_{q-3}| + 1$ squares of period $|f_{q-1}|$; these squares contribute to $r_{q+1}$. We have successively:
$$\begin{aligned} f_{q+1} &= f_q f_{q-1} = f_{q-1} f_{q-2} f_{q-2} f_{q-3} \\ &= f_{q-1} f_{q-2} f_{q-3} f_{q-4} f_{q-3} \\ &= f_{q-1} f_{q-1} f_{q-4} f_{q-3}. \end{aligned}$$
The square $f_{q-1} f_{q-1}$ is then a prefix of $f_{q+1}$. For $q \ge 6$, $f_{q-3}$ is a prefix of $f_{q-4} f_{q-3}$, since:
$$\begin{aligned} f_{q-4} f_{q-3} &= f_{q-4} f_{q-4} f_{q-5} \\ &= f_{q-4} f_{q-5} f_{q-6} f_{q-5} \\ &= f_{q-3} f_{q-6} f_{q-5}. \end{aligned}$$

[1] A.V. Aho, J.E. Hopcroft and J.D. Ullman, The Design and Analysis of Computer Algorithms (Addison-Wesley, MA, 1974) 157-162.
[2] A. Cardon and M. Crochemore, Partitioning a graph in O(|A| log2 |V|), Theoret. Comput. Sci., to appear.
[3] A. Ehrenfeucht and G. Rozenberg, On the separating power of EOL systems, RAIRO, to appear.
[4] M.A. Harrison, Introduction to Formal Language Theory (Addison-Wesley, MA, 1978).
[5] D.E. Knuth, J.H. Morris and V.R. Pratt, Fast pattern matching in strings, SIAM J. Comput. 6 (1977) 323-350.
[6] A. Lentin and M.P. Schützenberger, A combinatorial problem in the theory of free monoids, in: Combinatorial Mathematics and Its Applications (University of North Carolina Press, NC, 1969) 128-144.
[7] R. Ross and R. Winklmann, Repetitive strings are not context-free, CS-81-070 (Washington State University, Pullman, WA, 1981).
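The word identities used in the proof of Lemma 10 are easy to check mechanically on the first Fibonacci words. The Python sketch below (all function names are hypothetical) builds $f_q$ from $f_0 = b$, $f_1 = a$; note that $f_5$ is abaababa, the example word used in the definitions. For $6 \le q \le 12$ it verifies $f_{q+1} = f_{q-1} f_{q-1} f_{q-4} f_{q-3}$, that $f_{q-3}$ is a prefix of $f_{q-4} f_{q-3}$, and that squares of period $|f_{q-1}|$ occur at the first $|f_{q-3}| + 1$ positions of $f_{q+1}$; it also checks by brute force that $f_{12}$ contains no fourth power. This is a finite sanity check, not a proof.

```python
def fibonacci_words(count):
    """f_0 = b, f_1 = a and f_{q+1} = f_q f_{q-1}; returns [f_0, ..., f_{count-1}]."""
    f = ["b", "a"]
    while len(f) < count:
        f.append(f[-1] + f[-2])
    return f[:count]


def leading_squares(w, period):
    """Number of consecutive positions 1, 2, ... of w at which a square of this period occurs."""
    i = 0
    while i + 2 * period <= len(w) and w[i:i + period] == w[i + period:i + 2 * period]:
        i += 1
    return i


def has_fourth_power(w):
    """True if some factor of w has the form u^4 with u non-empty (brute force)."""
    n = len(w)
    return any(w[i:i + 4 * p] == w[i:i + p] * 4
               for p in range(1, n // 4 + 1) for i in range(n - 4 * p + 1))


def check_proof_steps(max_q=12):
    """Check the identities of the proof of Lemma 10 for 6 <= q <= max_q."""
    f = fibonacci_words(max_q + 2)
    return all(
        f[q + 1] == f[q - 1] + f[q - 1] + f[q - 4] + f[q - 3]
        and (f[q - 4] + f[q - 3]).startswith(f[q - 3])
        and leading_squares(f[q + 1], len(f[q - 1])) >= len(f[q - 3]) + 1
        for q in range(6, max_q + 1)
    )


if __name__ == "__main__":
    f = fibonacci_words(8)
    print(f[5])                               # abaababa
    print(leading_squares(f[7], len(f[5])))   # 4
    print(check_proof_steps(), has_fourth_power(fibonacci_words(13)[12]))  # True False
```

For $q = 6$ the count is exactly $|f_3| + 1 = 4$: squares of period 8 start at positions 1 to 4 of $f_7$ and at no further consecutive position.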