A Faster Scrabble Move Generation Algorithm
Steven A. Gordon
Department of Mathematics, East Carolina University, Greenville, NC 27858, U.S.A.
(email: magordon@ecuvax.cis.ecu.edu)
SUMMARY
Appel and Jacobson1 presented a fast algorithm for generating every possible move in a given
position in the game of Scrabble using a DAWG, a finite automaton derived from the trie of a
large lexicon. This paper presents a faster algorithm that uses a GADDAG, a finite automaton
that avoids the non-deterministic prefix generation of the DAWG algorithm by encoding a
bidirectional path starting from each letter of each word in the lexicon. For a typical lexicon, the
GADDAG is nearly five times larger than the DAWG, but generates moves more than twice as
fast. This time/space trade-off is justified not only by the decreasing cost of computer memory,
but also by the extensive use of move-generation in the analysis of board positions used by
Gordon2 in the probabilistic search for the most appropriate play in a given position within
realistic time constraints.
KEY WORDS: Finite automata; Lexicons; Backtracking; Games; Artificial intelligence
INTRODUCTION
Appel and Jacobson1 presented a fast algorithm for generating every possible move
given a set of tiles and a position in Scrabble (in this paper Scrabble refers to the
SCRABBLE brand word game, a registered trade mark of Milton Bradley, a
division of Hasbro, Inc.). Their algorithm was based on a large finite automaton
derived from the trie3,4 of the entire lexicon. This large structure was called a
directed acyclic word graph (DAWG).
Structures equivalent to a DAWG have been used to represent large lexicons for
spell-checking, dictionaries, and thesauri.5–7 Although a left-to-right lexical represen-
tation is well-suited for these applications, it is not the most efficient representation
for generating Scrabble moves. This is because, in Scrabble, a word is played by
‘hooking’ any of its letters onto the words already played on the board, not just the
first letter.
The algorithm presented here uses a structure similar to a DAWG, called a
GADDAG, that encodes a bidirectional path starting from each letter in each word
in the lexicon. The minimized GADDAG for a large American English lexicon is
approximately five times larger than the minimized DAWG for the same lexicon,
but the algorithm generates moves more than twice as fast on average. This faster
move generation justifies the GADDAG's additional memory requirements.
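As a concrete illustration (a minimal sketch of my own, not code from the paper), the paths a GADDAG encodes for a word are the reverse of each prefix, a delimiter (the paper's ◇, written '>' below), and the remaining suffix:

    def gaddag_paths(word, sep=">"):
        # One path per letter of the word: REV(prefix) + sep + suffix.
        return [word[:i][::-1] + sep + word[i:] for i in range(1, len(word) + 1)]

    # gaddag_paths("CARE") -> ['C>ARE', 'AC>RE', 'RAC>E', 'ERAC>']

Each path lets move generation start at one letter of the word, work leftward through the reversed prefix, and then cross the delimiter to extend rightward.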
Figure 1. Subgraph of unminimized GADDAG for ‘CARE’ (see Table I for letter sets)
Table I. Letter sets for Figures 1, 5 and 6
S1 = {D | DC is a word} = ∅.
S2 = {D | DA is a word} = {A,B,D,F,H,K,L,M,N,P,T,Y}.
S3 = {D | DR is a word} = {A,E,O}.
S4 = {D | DE is a word} = {A,B,D,H,M,N,O,P,R,W,Y}.
S5 = {D | CD is a word} = ∅.
S6 = {D | DCA is a word} = {O}.
S7 = {D | DAR is a word} = {B,C,E,F,G,J,L,M,O,P,T,V,W,Y}.
S8 = {D | DRE is a word} = {A,E,I,O}.
S9 = {D | CAD is a word} = {B,D,M,N,P,R,T,W,Y}.
S10 = {D | DCAR is a word} = {S}.
S11 = {D | DARE is a word} = {B,C,D,F,H,M,P,R,T,W,Y}.
S12 = {D | CARD is a word} = {B,D,E,K,L,N,P,S,T}.
S13 = {D | DN is a word} = {A,E,I,O,U}.
S14 = {D | DEE is a word} = {B,C,D,F,G,J,L,N,P,R,S,T,V,W,Z}.
S15 = {D | DEN is a word} = {B,D,F,H,K,M,P,S,T,W,Y}.
S16 = {D | DREE is a word} = {B,D,F,G,P,T}.
S17 = {D | DEEN is a word} = {B,K,P,S,T,W}.
S18 = {D | DCARE is a word} = {S}.
S19 = {D | DAREE is a word} = ∅.
S20 = {D | DREEN is a word} = {G,P}.
S21 = {D | CARED is a word} = {D,R,S,T,X}.
S22 = {D | DCAREE is a word} = ∅.
S23 = {D | DAREEN is a word} = {C}.
S24 = {D | CAREED is a word} = {N,R}.
A play can connect in front (above), in back (below), through, or in parallel with
words already on the board, as long as every string formed is a word in the lexicon.
Consider, for example, the steps (corresponding to the numbers in the upper left
corners of the squares) involved in play (c) of Figure 2. CARE can be played
perpendicularly below ABLE as follows:
1. Play R (since ABLER is a word); move left; follow the arc for R.
2. Play A; move left; follow the arc for A.
3. Play C; move left; follow the arc for C.
4. Go to the square right of the original starting point; follow the arc for ◇.
5. Play the E, since it is in the last arc’s letter set.
The GADDAG algorithm for generating every possible move with a given rack
from a given anchor square is presented in Figure 3 in the form of backtracking,
recursive co-routines. Gen(0,NULL,RACK,INIT) is called, where INIT is an arc to the
initial state of the GADDAG with a null letter set. The Gen procedure is independent
of direction. It plays a letter only if it is allowed on the square, whether letters are
being played leftward or rightward. In the GoOn procedure, the direction determines
which side of the current word to concatenate the current letter to, and can be
shifted just once, from leftward to rightward, when the ◇ is encountered.
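Figure 3 is not reproduced here, but the following Python sketch conveys the control structure of Gen and GoOn under heavy simplifications: a single row of squares, no blanks, no cross-word checks or scoring, no anchor-to-anchor duplicate suppression, and word ends marked on trie nodes rather than with the paper's arc letter sets. All names are mine, and the GADDAG is a plain trie of the paths described above.

    SEP = ">"          # stands in for the paper's ◇ delimiter
    END = object()     # marks a node where a complete word ends

    def build_gaddag(words):
        root = {}
        for w in words:
            for i in range(1, len(w) + 1):
                # Drop the separator on the full-reversal path; mark the end instead.
                path = w[:i][::-1] + (SEP + w[i:] if i < len(w) else "")
                node = root
                for ch in path:
                    node = node.setdefault(ch, {})
                node[END] = True
        return root

    def generate(board, anchor, rack, gaddag):
        """Return the set of (start_square, word) plays through board[anchor];
        board is a list of letters, with None for empty squares."""
        plays = set()

        def gen(offset, word, rack, node):
            pos = anchor + offset
            if pos < 0 or pos >= len(board):
                return
            if board[pos] is not None:              # occupied square: must use it
                go_on(offset, board[pos], word, rack, node.get(board[pos]))
            else:
                for i, tile in enumerate(rack):     # try each rack tile in turn
                    if tile in node:
                        go_on(offset, tile, word, rack[:i] + rack[i+1:], node[tile])

        def go_on(offset, ch, word, rack, node):
            if node is None:
                return
            if offset <= 0:                         # still moving leftward
                word = ch + word
                left = anchor + offset - 1
                left_vacant = left < 0 or board[left] is None
                if END in node and left_vacant:
                    plays.add((anchor + offset, word))
                gen(offset - 1, word, rack, node)   # keep extending leftward
                if SEP in node and left_vacant:     # cross the delimiter: shift rightward once
                    gen(1, word, rack, node[SEP])
            else:                                   # now moving rightward
                word = word + ch
                right = anchor + offset + 1
                right_vacant = right >= len(board) or board[right] is None
                if END in node and right_vacant:
                    plays.add((anchor + offset - len(word) + 1, word))
                gen(offset + 1, word, rack, node)

        gen(0, "", rack, gaddag)
        return plays

For instance, with gaddag = build_gaddag(["CARE"]), a seven-square row holding only an A on square 3, anchor 3 and rack "CRE", the sketch returns {(2, "CARE")}: it moves leftward through the A and the C, crosses the delimiter, then extends rightward with R and E.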
A GADDAG also allows a reduction in the number of anchor squares used. There
is no need to generate plays from every other internal anchor square of a sequence
of contiguous anchor squares (e.g. the square left or right of the B in Figure 2),
since every play from a given anchor square would be generated from the adjacent
anchor square either to the right (above) or to the left (below). In order to avoid
generating the same move twice, the GADDAG algorithm was implemented with a
parameter to prevent leftward movement to the previously used anchor square.
The GADDAG algorithm is still non-deterministic in that it runs into many dead-
ends. Nevertheless, it requires fewer anchor squares, hits fewer dead-ends, and
follows fewer arcs before detecting dead-ends than the DAWG algorithm.
Figure 5. Subgraph of semi-minimized GADDAG for ‘CARE’ (see Table I for letter sets)
Figure 6. Subgraph of semi-minimized GADDAG for ‘CAREEN’ (see Table I for letter sets)
… that Z◇AGG, AZ◇GG, GAZ◇G, and GGAZ◇ all lead to. Each arc leading to the
former node has the letter set {S}, whereas each arc leading to the latter node has
a null letter set. After merging, those arcs will all lead to the same node, but their
letter sets will remain distinct. Incidentally, the node that DNUOW leads to cannot
be merged with the node that GGAZ leads to, since these strings can be completed
by different strings (e.g. the path DNUOWER for REWOUND and the path GGAZ-
GIZ◇ED for ZIGZAGGED). The ◇ precludes this.
Compression
A GADDAG (or DAWG) could be represented in various expanded or compressed
forms. The simplest expanded form is a two-dimensional array of arcs indexed
by state and letter. In the current lexicon, the number of distinct letter sets, 2575,
and distinct states, 89,031, are less than 2¹² and 2¹⁷, respectively. So, each arc can
encode the indices of a letter set and a destination state within a 32-bit word. The
array of letter sets takes just over 10 kilobytes. Non-existent arcs or states are just
encoded with a 0.
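To illustrate that packing (a sketch consistent with the bounds just quoted, not the paper's actual layout), 12 bits of each 32-bit arc can index the letter set, leaving ample room for the destination state:

    LETTER_SET_BITS = 12               # 2575 distinct letter sets < 2**12

    def pack_arc(dest_state, letter_set_index):
        # Destination state in the high bits, letter-set index in the low 12.
        assert 0 <= letter_set_index < 1 << LETTER_SET_BITS
        assert 0 <= dest_state < 1 << (32 - LETTER_SET_BITS)
        return (dest_state << LETTER_SET_BITS) | letter_set_index

    def unpack_arc(arc):
        return arc >> LETTER_SET_BITS, arc & ((1 << LETTER_SET_BITS) - 1)

Since 0 encodes a non-existent arc, state 0 with letter set 0 must not be a real destination; reserving index 0 on one side suffices.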
The simplest compressed representation is a single array of 32-bit words. Each state
is a bit map indicating which letters have outgoing arcs. Each arc is encoded in a
32-bit word, as in the expanded representation. In this combined array, arcs directly
follow the states they originate from, in alphabetical order.
Compression has two disadvantages. The first is that encoding destination states
in arcs limits the potential size of the lexicon. Under the expanded representation, a
state is just the first index of a two-dimensional array, whereas, under compression,
a state is the index of its bit map in a single array of both arcs and states. This
second index grows much faster, so a larger lexicon can be accommodated with
32-bit arcs under the expanded representation than under compression.
The second disadvantage is the time it takes to find arcs. Each arc is addressed
directly under expanded representation. Under compression, the bit map in a state
indicates if a given letter has an associated arc. One must count how many of the
preceding bits are set and then skip that many 32-bit words to actually find the arc.
Although the actual number of preceding bits that are set is usually small, each
preceding bit must be examined to compute the correct offset of the associated arc.
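A sketch of that lookup (my formulation of the scheme described above): the arc's offset is the number of set bits below the letter's position in the state's bit map.

    def find_arc(mem, state, letter_index):
        # mem is the combined array; mem[state] is the bit map, and the
        # state's arcs follow it in order of bit position.
        bitmap = mem[state]
        if not (bitmap >> letter_index) & 1:
            return None                           # no arc for this letter
        # Count the set bits below letter_index (a bit-by-bit loop in the
        # paper's setting; a population count here).
        preceding = bin(bitmap & ((1 << letter_index) - 1)).count("1")
        return mem[state + 1 + preceding]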
The advantage of compression is, of course, a saving in space. The histogram of
states by number of arcs in Table II indicates that most states have only a few arcs.
Avoiding the explicit storage of non-existent arcs thus saves a great amount of
space. A compressed GADDAG requires S + A 32-bit words, compared to 27S for
an expanded GADDAG, where S is the number of states and A is the number of
arcs. Nevertheless, the automata in Reference 7 had fewer states with more than 10
arcs, since a lexicon of commonly-used words is less dense than a Scrabble lexicon.
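For a sense of scale, using only the state count quoted earlier (the arc count A is not given in this excerpt, so it stays symbolic): each GADDAG state has at most 27 arcs, one per letter plus the ◇ delimiter.

    S = 89_031                                 # distinct GADDAG states quoted above
    expanded_bytes = 27 * S * 4                # 27 possible arcs/state x 4 bytes, ~9.6 MB
    compressed_bytes = lambda A: (S + A) * 4   # one word per state plus one per real arc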
Table III compares the sizes of DAWG and GADDAG structures built from a
lexicon of 74,988 words from the OSPD2 (Reference 11). Word lengths vary from 2 to 8 letters,
averaging 6·7 letters per word. Since people rarely play words longer than 8 letters,
the OSPD2 only lists base words from 2 to 8 letters in length and their forms. The
longer words that are listed are not representative, having, for example, a dispro-
portionate number of ING endings. Comparing DAWG and GADDAG structures on
a lexicon with just these longer words could be misleading, so longer words
were omitted.
The minimized and compressed DAWG structure represents the lexicon with an
impressive 4·3 bits per character, less than half the number of bits required to
represent the lexicon as text (including one byte per word for a delimiter). Appel
and Jacobson achieved an even better ratio by encoding many arcs in 24 bits. The
GADDAG minimized more effectively than the DAWG, being just less than 5 times
larger rather than the 6·7 times larger expected with equally effective minimization.
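The baseline can be checked directly (a back-of-envelope computation, not from the paper): at 6·7 letters per word plus a one-byte delimiter, plain text costs about 9·2 bits per character, so 4·3 bits is indeed less than half.

    avg_len = 6.7                                     # average letters per word (Table III)
    bits_per_char_text = 8 * (avg_len + 1) / avg_len  # ~9.19 bits per character
    assert bits_per_char_text / 2 > 4.3               # 4.3 is less than half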
PERFORMANCE
Table IV compares the performances of the DAWG and GADDAG move generation
algorithms implemented within identical shells in Pascal playing 1000 randomly
generated games on a VAX4300 using both compressed and expanded representations.
Table II. Histogram of states by number of outgoing arcs in minimized DAWG and GADDAG structures

Arcs   DAWG states     %     GADDAG states     %
 0              0     0·0               0     0·0
 1           5708    32·0          35,103    39·4
 2           5510    30·9          24,350    27·4
 3           2904    16·3          11,291    12·7
 4           1385     7·8            5927     6·7
 5            775     4·3            3654     4·1
 6            445     2·5            2187     2·5
 7            299     1·7            1435     1·6
 8            189     1·1            1009     1·1
 9            148     0·8             821     0·9
10             98     0·5             641     0·7
11             84     0·5             493     0·6
12             53     0·3             389     0·4
13             52     0·3             325     0·4
14             41     0·2             254     0·3
15             30     0·2             198     0·2
16             20     0·1             165     0·2
17             23     0·1             150     0·2
18             16     0·1             128     0·1
19             14     0·1             102     0·1
20             10     0·1              81     0·1
21             18     0·1              87     0·1
22             12     0·1              65     0·1
23              8     0·0              59     0·1
24              6     0·0              52     0·1
25              4     0·0              26     0·0
26              4     0·0              34     0·0
27              –       –               5     0·0
Table IV. Performance of the DAWG and GADDAG algorithms over 1000 games (Ratio = DAWG/GADDAG)

                               DAWG                       GADDAG              Ratio
CPU time, expanded        9:32:44   1·344 s/move     3:38:59   0·518 s/move    2·60
CPU time, compressed      8:11:47   1·154 s/move     3:26:51   0·489 s/move    2·36
Page faults, expanded        6063                      32,305                  0·19
Page faults, compressed      1011                        3120                  0·32
Arcs traversed        668,214,539   26,134/move   265,070,715  10,451/move     2·50
Arcs/s (compressed)        22,646                      21,372
Anchors used            3,222,746   126·04/move     1,946,163   76·73/move     1·64
Number of moves            25,569                      25,363
Average score              389·58                      388·75
The VAX had 64M of memory, a light, mostly interactive work load, and effectively
unlimited image sizes, so performance was not significantly affected by either
memory management or competing work load. The DAWG algorithm traversed 2·5
times as many arcs in its structure as the GADDAG algorithm did in its. CPU times
reflect a slightly smaller ratio. In other words, both algorithms traverse about 22,000
arcs/s, but the GADDAG algorithm traverses the same number of arcs to generate
five moves as the DAWG algorithm traverses to generate two moves.
The shell used the greedy evaluation function (i.e. play the highest scoring move).
Ties for high score were broken by using the move found first. Since ties occurred
frequently and the algorithms do not generate moves in exactly the same order, the
actual games played diverged quickly.
Each expanded implementation ran slightly slower than the respective compressed
implementation. The page faults due to the larger memory demands of the expanded
structures evidently take more time to process on the VAX than searching for arcs
in the compressed structures. On a dedicated machine with enough memory, the
expanded implementations might run faster.
Some additional speed-up could be expected from reordering bit maps and arcs
into letter frequency order (i.e. ◇, E, A, I, …) to reduce the average number of
bits preceding a letter.4 In practice, reordering had little effect on CPU times. This
may be because A and E are already near the beginning and the GADDAG
implementation already placed ◇ before all the letters, so most of the advantage of
reordering had already been achieved.
Table V. Histogram of moves by arcs traversed per move, for racks with and without blanks

                        DAWG                            GADDAG
Arc range     With blank   Without     Total   With blank   Without     Total
0–255                  0        24        24            0        45        45
256–511                0       896       896            0       847       847
512–1K                 0       877       877            0      1618      1618
1K–2K                  0      1075      1075            0      2680      2680
2K–4K                  0      1809      1809            8      6260      6268
4K–8K                  7      4654      4661           45      7935      7980
8K–16K                29      8348      8357          126      3605      3731
16K–32K              107      5209      5316          353       337       690
32K–64K              201       602       803          734         1       735
64K–128K             569         1       570          559         0       559
128K–256K            774         0       774          183         0       183
256K–512K            349         0       349           18         0        18
512K–1M               23         0        23            7         0         7
1M–2M                 13         0        13            2         0         2
2M–4M                  2         0         2            0         0         0
Total               2074    23,495    25,569         2035    23,328    25,363
Average arcs     179,850    11,961    26,134       67,586      5467    10,451
Racks containing blanks traverse many more arcs than average. The worst case
requires about 1 minute for the GADDAG algorithm versus about 2 minutes for the
DAWG algorithm (the frequency of this case, twice on both sides of 1000 games,
suggests these racks contained both blanks).
As measured by arcs traversed, the GADDAG algorithm is 2·19 times faster on
racks without blanks, whereas it is 2·66 times faster on racks with blanks. The
GADDAG algorithm still hesitates when it encounters blanks, but a little less.
Table VI. Performance under rack heuristics (compressed representations)

                            DAWG                        GADDAG                  Ratios
                    s/move  arcs/move  faults   s/move  arcs/move  faults   CPU   arcs  faults
Greedy vs. Greedy    1·154    26,134    1011    0·489     10,451    3120   2·36   2·50   0·32
RackH1 vs. Greedy    1·390    32,213    1013    0·606     12,612    3136   2·29   2·55   0·32
RackH2 vs. Greedy    1·448    33,573    1016    0·630     13,060    3139   2·30   2·57   0·32
RackH3 vs. Greedy    1·507    35,281    1017    0·655     13,783    3141   2·30   2·56   0·32
As measured by arcs traversed per move, the performance advantage of the GADDAG
algorithm over the DAWG algorithm increases slightly under these heuristics. How-
ever, the ratio of seconds per move actually decreases slightly.
Each heuristic is computed for each move found by the move generation algorithms.
Heuristic processing time is therefore a function of the number of moves generated
rather than the number of arcs traversed generating those moves. Heuristic processing
time per move is bounded by the increase in average CPU times for the GADDAG
algorithm. The fact that average CPU times increased by twice as much for the
DAWG algorithm suggests that at least half of this increase was due to poorer
performance generating moves from the better racks that resulted from using the heu-
ristics.
FUTURE WORK
The lexicon should be expanded to include all nine-letter words, and the effect on
the relative size of the data structures and the performance of the algorithms should
be remeasured. Even longer words, up to 15 letters, could also be added.
Research into improving the modeling of Scrabble strategy continues on three
fronts: weighted heuristics for the evaluation of possible moves, the use of simulation
to select the most appropriate candidate move in a given position, and exhaustive
search for the optimal move in end games.
CONCLUSION
Although the Appel and Jacobson algorithm generates every possible move in any
given Scrabble position with any given rack very quickly using a deterministic finite
automaton, the algorithm itself is not deterministic. The algorithm presented here
achieves greater determinism by encoding bidirectional paths for each word starting
at each letter. The resulting program generates moves more than twice as fast, but
takes up five times as much memory for a typical lexicon. In spite of the memory
usage, a faster algorithm makes the construction of a program that plays intelligently
within competitive time constraints a more feasible project.
REFERENCES
1. A. Appel and G. Jacobson, 'The world's fastest Scrabble program', Commun. ACM, 31(5), 572–578, 585 (1988).
2. S. Gordon, 'A comparison between probabilistic search and weighted heuristics in a game with incomplete information', Technical Report, Department of Mathematics, East Carolina University, August 1993. Also to appear in AAAI Fall Symposium Series, Raleigh, NC (October 1993).
3. E. Fredkin, 'Trie memory', Commun. ACM, 3(9), 490–500 (1960).
4. D. Knuth, The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, MA, 1973, pp. 481–500, 681.
5. M. Dunlavey, 'On spelling correction and beyond', Commun. ACM, 24(9), 608 (1981).
6. K. Kukich, 'Techniques for automatically correcting words in text', ACM Comput. Surv., 24(4), 382–383, 395 (1992).
7. C. Lucchesi and T. Kowaltowski, 'Applications of finite automata representing large vocabularies', Software—Practice and Experience, 23, 15–30 (1993).
8. A. Apostolico and R. Giancarlo, 'The Boyer–Moore–Galil string searching strategies revisited', SIAM J. Comput., 15(1), 98–105 (1986).
9. S. Baase, Computer Algorithms: Introduction to Design and Analysis, 2nd edn, Addison-Wesley, Reading, MA, 1988, pp. 209–230.
10. R. Boyer and J. Moore, 'A fast string searching algorithm', Commun. ACM, 20(10), 762–772 (1977).
11. Milton Bradley Company, The Official Scrabble Players Dictionary, 2nd edn, Merriam-Webster Inc., Springfield, MA, 1990.
12. R. Tarjan, 'Depth-first search and linear graph algorithms', SIAM J. Comput., 1(2), 146–160 (1972).
13. A. Nerode, 'Linear automaton transformations', Proc. AMS, 9, 541–544 (1958).
14. N. Ballard, 'Renaissance of Scrabble theory 2', Games Medleys, 17, 4–7 (1992).
15. D. Sankoff and J. B. Kruskal, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, 1983.