
Crossword Compiler: A Data Structure, Algorithms, and Entropy

This document discusses an algorithm developed for crossword puzzle generation. It presents the outline, including crossword puzzles, a filler word tree data structure for word lookup, a word ranking algorithm, and a crossword fill algorithm using backtracking. It also covers information theory concepts like entropy that are relevant to the algorithm design. The algorithm aims to fill crossword grids with themed words and common words to complete the puzzle.

Crossword Compiler: A Data Structure, Algorithms, and Entropy

Matthew Fahrbach

This presentation discusses an algorithm developed for Bobo Strategy, Inc. as part of employment. The ideas within are presented with the permission of Bobo Strategy, Inc. in this academic setting.
Outline
Crossword Compiler
• Crossword Puzzles
• Filler Word Tree
• Word Ranking Algorithm
• Crossword Fill Algorithm

Information Theory
• Claude Elwood Shannon
• Entropy
• Redundancy

Existence of Infinitely Large 2-Dimensional Crosswords

 
New York World Crossword
• The first crosswords appeared in English children’s puzzle books during the 19th century
• Arthur Wynne was a journalist from Liverpool, England
• By the 1920s crosswords appeared in almost all American newspapers
New York Times Crossword
• 3-letter word minimum
• A word may only be used once
• Puzzle must have rotational symmetry
• Theme should be interesting and narrowly-defined
• Difficulty increases throughout the week
• Daily Crossword – 15x15; Sunday Crossword – 21x21 or 23x23
Setup
• Empty crossword template (.txt)
• Themed words (100); percent themed words (10%)
• Filler words – Unix Dictionary (200K)


Filler Word Tree
• Composed of directories and text files
• Allows for quick lookup of partially filled words (regular expressions)
• A node (text file) in the tree contains a subset of the matching words
• Example: Query(Q--B) ⇒ fillerwords_4_1_Q_4_B.txt ⇒ [QUAB, QUIB]
  Count(Q--B) ⇒ 2
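
The Query and Count operations can be sketched in memory as follows. This is only an illustration, not the on-disk tree: the word list is a hypothetical sample, "-" is assumed to mark an unknown letter, and `node_name` merely guesses the file-naming scheme from the single example above (word length, then 1-based position and letter pairs).

```python
import re

# Tiny stand-in for the 200K-word filler dictionary (hypothetical sample).
WORDS = ["QUAB", "QUIB", "QUAY", "BAY", "CAB"]

def node_name(pattern: str) -> str:
    """Guess the node file name for a pattern such as 'Q--B'.

    Assumed convention, inferred from one example in the slides:
    fillerwords_<length>_<position>_<letter>..., with 1-based positions.
    """
    parts = [str(len(pattern))]
    for position, letter in enumerate(pattern, start=1):
        if letter != "-":
            parts += [str(position), letter]
    return "fillerwords_" + "_".join(parts) + ".txt"

def query(pattern: str, words=WORDS) -> list[str]:
    """Return every word that matches the partially filled pattern."""
    regex = re.compile(pattern.replace("-", "."))
    return [w for w in words if len(w) == len(pattern) and regex.fullmatch(w)]

def count(pattern: str, words=WORDS) -> int:
    """Number of words matching the pattern."""
    return len(query(pattern, words))
```

With this sample list, `query("Q--B")` returns `["QUAB", "QUIB"]` and `count("Q--B")` returns 2. Precomputing one node per pattern, as the tree does, trades disk space for avoiding this linear scan.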
 
Filler Word Tree
• Requires hours of preprocessing
• The structure is recyclable
• Words may be added or removed without regenerating the entire tree
• Unfortunately, these 200K words require 200MB and are initially read from the disk (though memory is not a problem)

 
 
 
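Word Rank Algorithm
• Inputs: Crossword, starting cell, and direction
• Outputs: Top ten matching words, ranked by the sum of their stemming word counts (for each letter of a candidate, the number of filler words that fit the perpendicular slot through that letter)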
 
Word Rank Algorithm

Example: RankWords(0, 0, Across)

Query(Q--B) ⇒ [QUAB, QUIB]

QUAB score = Count(Q--) + Count(U--) + Count(A-E) + Count(BAY)
           = 2 + 34 + 17 + 1
           = 54

QUIB score = Count(Q--) + Count(U--) + Count(I-E) + Count(BAY)
           = 2 + 34 + 5 + 1
           = 42

Return [QUAB, QUIB]
Word Rank Algorithm
• Implicitly processes languages for common word structures
• Higher ranked words are more likely to fill and complete the crossword
• If any letter position in a fitting word has no matching perpendicular (stemming) words, that word is not returned in the ranked list (pruning for backtracking)
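
The scoring and pruning rules above can be sketched as below. The counts are copied from the worked example; `rank_words`, `crossing_patterns`, and `example_patterns` are hypothetical names chosen for this sketch, and a real run would obtain the counts from the filler word tree.

```python
# Counts copied from the worked example; anything else is treated as 0 matches.
EXAMPLE_COUNTS = {"Q--": 2, "U--": 34, "A-E": 17, "I-E": 5, "BAY": 1}

def rank_words(candidates, crossing_patterns, count, top_n=10):
    """Rank candidates by the sum of their crossing-pattern counts.

    crossing_patterns(word) returns one pattern per letter: the perpendicular
    slot as it would look with that letter placed. Both callables are
    assumed interfaces, not part of the original program.
    """
    scored = []
    for word in candidates:
        counts = [count(p) for p in crossing_patterns(word)]
        if any(c == 0 for c in counts):
            continue  # prune: some crossing slot could never be completed
        scored.append((sum(counts), word))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

def example_patterns(word):
    # Crossing slots of the example: two empty columns, one ending in E,
    # and one that already holds "AY" below the first row.
    return [word[0] + "--", word[1] + "--", word[2] + "-E", word[3] + "AY"]

def example_count(pattern):
    return EXAMPLE_COUNTS.get(pattern, 0)

ranked = rank_words(["QUAB", "QUIB"], example_patterns, example_count)
# ranked == [(54, "QUAB"), (42, "QUIB")]
```

A candidate such as the made-up string "QUIZ" would be dropped here because its last crossing pattern, "ZAY", has a count of zero.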
Recursive Backtracking
1. Starting at Root, your options are A and B. You choose A.
2. At A, your options are C and D. You choose C.
3. C is bad. Go back to A.
4. At A, you have already tried C, and it failed. Try D.
5. D is bad. Go back to A.
6. At A, you have no options left to try. Go back to Root.
7. At Root, you have already tried A. Try B.
8. At B, your options are E and F. Try E.
9. E is good. You are finished.

[Figure: tree with Root branching to A and B; A branching to C (bad) and D (bad); B branching to E (good) and F (good)]
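
The walk through the tree above can be reproduced with a short recursive function. The dictionary encoding of the tree and the names `TREE`, `GOOD_LEAVES`, and `solve` are assumptions made for this sketch.

```python
# The example tree: each inner node maps to its options, tried left to right.
TREE = {"Root": ["A", "B"], "A": ["C", "D"], "B": ["E", "F"]}
GOOD_LEAVES = {"E", "F"}

def solve(node, visited):
    """Depth-first search; return the path to the first good leaf, or None."""
    visited.append(node)  # record the order in which nodes are entered
    children = TREE.get(node, [])
    if not children:
        return [node] if node in GOOD_LEAVES else None
    for child in children:
        path = solve(child, visited)
        if path is not None:
            return [node] + path  # success propagates upward
    return None  # every option failed: the caller backtracks

visited = []
path = solve("Root", visited)
# path == ["Root", "B", "E"]
# visited == ["Root", "A", "C", "D", "B", "E"]
```

F is never visited because the search stops at the first good leaf, matching step 9.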
Crossword Fill Algorithm
• General Algorithm: Heuristic Backtracking
• Fills an empty crossword with a percentage of themed words and then completes it using the filler word tree

Crossword Fill Algorithm
• Heuristic elements have been implemented experimentally to improve performance
• Select the ranked words at random instead of choosing the highest ranked word at each intersection
• Recursive limits prevent the program from exhaustively searching for a solution down a failing path
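
A heavily simplified sketch of the two heuristics follows. It is not the program described here: it fills a small square grid with no black squares and no themed words, and it swaps the filler word tree for a plain prefix check. The function name `fill_grid` and the `max_calls` budget are assumptions; only the random choice among fitting words and the recursion limit are meant to mirror the bullets above.

```python
import random

def fill_grid(words, size, rng, max_calls=10_000):
    """Fill a size x size grid row by row so that every column is also a word.

    Returns the list of row words, or None if no fill was found within
    the call budget.
    """
    word_set = {w for w in words if len(w) == size}
    prefixes = {w[:k] for w in word_set for k in range(size + 1)}
    calls = 0

    def columns_ok(rows):
        # Every partially built column must still be a prefix of some word.
        return all("".join(r[c] for r in rows) in prefixes for c in range(size))

    def backtrack(rows):
        nonlocal calls
        calls += 1
        if calls > max_calls:
            return None  # budget exhausted: abandon the search
        if len(rows) == size:
            return rows
        candidates = [w for w in sorted(word_set)
                      if w not in rows and columns_ok(rows + [w])]
        rng.shuffle(candidates)  # random order instead of best-first
        for word in candidates:
            result = backtrack(rows + [word])
            if result is not None:
                return result
        return None  # no candidate worked: backtrack

    return backtrack([])

grid = fill_grid(["BIT", "ICE", "TEN"], 3, random.Random(0))
```

With these three words the only possible fill is the rows BIT, ICE, TEN, whose columns read the same three words. Once the budget is exceeded every pending call returns None, so a caller could simply retry with a different random seed.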
Crossword Compiler Demonstration
Claude Elwood Shannon
• A Mathematical Theory of Communication – 1948
• The Father of Information Theory
Entropy (Information Theory)
• Definition: Entropy is the measure of uncertainty in a random variable
• Measured in bits
• High entropy implies less predictability
• Provides a limit on the best possible lossless compression of any transmitted data
Entropy (Information Theory)
Let’s start with a fair coin flip
• Heads and tails are equally likely
• Entropy of one flip is one bit
• Entropy of two flips is two bits

Now suppose the coin always lands on tails.
• How predictable is this?
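
The coin examples can be checked numerically with the standard Shannon entropy formula. The function name `entropy_bits` is an assumption made for this sketch.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with probability 0 contribute nothing and are skipped.
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

fair_flip = entropy_bits([0.5, 0.5])     # one fair flip
two_flips = entropy_bits([0.25] * 4)     # HH, HT, TH, TT equally likely
always_tails = entropy_bits([0.0, 1.0])  # the outcome is certain
```

The three values come out as 1.0, 2.0, and 0.0 bits: a coin that always lands on tails is perfectly predictable and carries no information.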
 
 

 
 
 
Entropy (Information Theory)
Information Content

Shannon’s Entropy

Example
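
In standard notation, the quantities named by these headings are usually written as below. The symbols are assumptions: X is a random variable, x ranges over its outcomes, and p(x) is the probability of x. The example reuses the fair coin.

```latex
\begin{align*}
I(x) &= -\log_2 p(x) \\
H(X) &= -\sum_{x} p(x)\,\log_2 p(x) \\
H(\text{fair coin}) &= -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}
\end{align*}
```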
Redundancy (Information Theory)
• Definition: Number of bits in the transmitted data minus its entropy
• Wasted “space” when transmitting data
• Compression reduces redundancy
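
A small numeric illustration of this definition, per transmitted symbol. The source distribution and the 8-bit encoding are made-up inputs, and the function names are assumptions of this sketch.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

def redundancy_bits(bits_per_symbol, probabilities):
    """Bits spent per symbol minus the entropy per symbol."""
    return bits_per_symbol - entropy_bits(probabilities)

# Four equally likely symbols, each sent as one 8-bit byte.
wasted = redundancy_bits(8, [0.25] * 4)
# The same source with an ideal 2-bit code.
ideal = redundancy_bits(2, [0.25] * 4)
```

Here `wasted` is 6.0 bits per symbol and `ideal` is 0.0, which is the sense in which compression reduces redundancy.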
The Existence of Large 2-Dimensional Crosswords
• The redundancy of a language is related to the existence of crossword puzzles
• Zero redundancy is trivial: every sequence of letters is valid text, so any grid of letters forms a crossword
• If the redundancy is too high the language imposes too many constraints for large crosswords to be possible
The Existence of Large 2-Dimensional Crosswords
• A more detailed analysis shows that large 2-dimensional crossword puzzles are only possible when the redundancy is less than 50%.
• If the redundancy is less than 33%, 3-dimensional crossword puzzles should be possible, etc.
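
The two thresholds above (50% for 2 dimensions, 33% for 3) suggest a rule of the form "redundancy below 1/D permits D-dimensional crosswords". Reading the "etc." this way is an assumption, and `max_crossword_dimension` is a hypothetical helper that applies it.

```python
def max_crossword_dimension(redundancy, limit=10):
    """Largest D up to `limit` with redundancy < 1/D, or 0 if none.

    Assumes the thresholds generalize as 1/D, which extrapolates the
    two cases stated in the slides.
    """
    best = 0
    for d in range(1, limit + 1):
        if redundancy < 1 / d:
            best = d
    return best
```

For example, a redundancy of 0.4 gives 2, a redundancy of 0.3 gives 3, and a redundancy of 0.6 gives 1 (ordinary one-dimensional text only).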
The Existence of Large 2-Dimensional Crosswords
• Edgar Gilbert is an American coding theorist and longtime researcher at Bell Laboratories
• Motivated by Shannon’s assertions he estimated the entropy of English text to be 41.5% when eliminating words of length 1 and 2
• Infinitely large 2-dimensional crosswords are possible to construct, but 3-dimensional crosswords are not
References
• Crossword History - www.crosswordtournament.com/more/wynne.html
• Recursive Backtracking - www.cis.upenn.edu/~matuszek/cit594-2002/pages/backtracking.html
• Claude Shannon and Information Theory - Wikipedia
• Crossword Puzzles and Shannon - IEEE Information Theory Society Newsletter, Vol. 51, No. 3, September 2001
