Crossword Compiler: A Data Structure, Algorithms, and Entropy
Matthew Fahrbach
This presentation discusses an algorithm developed for Bobo Strategy, Inc. as part of employment. The ideas within are presented with the permission of Bobo Strategy, Inc. in this academic setting.
Outline
Crossword Compiler
• Crossword Puzzles
• Filler Word Tree
• Word Ranking Algorithm
• Crossword Fill Algorithm
Information Theory
• Claude Elwood Shannon
• Entropy
• Redundancy
New York World Crossword
Word Rank Algorithm
• Inputs: Crossword, starting cell, and direction
• Outputs: Top ten matching words ranked by the sum of their stemming word count
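A minimal Python sketch of one way such a ranking could work. The slides do not define "stemming word count", so this sketch assumes it means the number of dictionary words that still fit each crossing slot once the candidate is placed. The grid encoding ('.' for an empty cell, '#' for a block) and every function name here are illustrative assumptions, not the original implementation.

```python
from collections import defaultdict

def build_index(words):
    """Group words by length so pattern lookups scan only plausible candidates."""
    by_length = defaultdict(list)
    for w in words:
        by_length[len(w)].append(w.upper())
    return by_length

def matches(pattern, by_length):
    """All words fitting a pattern in which '.' stands for an empty cell."""
    return [w for w in by_length.get(len(pattern), [])
            if all(p in ('.', c) for p, c in zip(pattern, w))]

def slot_cells(grid, row, col, direction):
    """Cells of the slot starting at (row, col), up to a block '#' or the edge."""
    dr, dc = (0, 1) if direction == "across" else (1, 0)
    cells = []
    while 0 <= row < len(grid) and 0 <= col < len(grid[0]) and grid[row][col] != '#':
        cells.append((row, col))
        row, col = row + dr, col + dc
    return cells

def crossing_pattern(grid, row, col, direction, letter):
    """Pattern of the perpendicular slot through (row, col) with `letter` placed."""
    dr, dc = (1, 0) if direction == "across" else (0, 1)
    r, c = row, col
    # Walk back to the first cell of the crossing slot.
    while 0 <= r - dr and 0 <= c - dc and grid[r - dr][c - dc] != '#':
        r, c = r - dr, c - dc
    cross_dir = "down" if direction == "across" else "across"
    cells = slot_cells(grid, r, c, cross_dir)
    return "".join(letter if cell == (row, col) else grid[cell[0]][cell[1]]
                   for cell in cells)

def rank_words(grid, start, direction, words, top=10):
    """Rank candidates for a slot by the summed fill counts of their crossing slots."""
    by_length = build_index(words)
    cells = slot_cells(grid, start[0], start[1], direction)
    pattern = "".join(grid[r][c] for r, c in cells)
    scored = []
    for word in matches(pattern, by_length):
        score = 0
        for (r, c), letter in zip(cells, word):
            cross = crossing_pattern(grid, r, c, direction, letter)
            if len(cross) > 1:  # ignore cells that have no real crossing slot
                score += len(matches(cross, by_length))
        scored.append((score, word))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties alphabetical
    return [w for _, w in scored[:top]]
```

For example, on an empty 3x3 grid with the word list CAT, ARE, TEN, DOG, the call `rank_words(["...", "...", "..."], (0, 0), "across", ["CAT", "ARE", "TEN", "DOG"])` places CAT first, because each of its letters starts a word that can run down. A production version would more likely answer the pattern queries from a prefix tree than from a linear scan.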
Entropy (Information Theory)
Information Content
Shannon’s Entropy
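For reference, the standard textbook definitions of information content and Shannon entropy, in bits. The symbols p, I, and H are notation assumed here rather than taken from the slides:

```latex
I(x) = -\log_2 p(x)
\qquad
H(X) = \sum_{x} p(x)\, I(x) = -\sum_{x} p(x) \log_2 p(x)
```

An outcome with probability 1/2 therefore carries 1 bit of information, and entropy is the expected information content over all outcomes.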
Example
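As one worked example (not necessarily the one shown on the original slide), the entropy of a few simple distributions can be computed directly from the standard definition. The function names below are illustrative assumptions.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy(text):
    """Entropy in bits per symbol, estimated from the symbol frequencies in `text`."""
    counts = Counter(text)
    total = len(text)
    return shannon_entropy(n / total for n in counts.values())

fair_coin = shannon_entropy([0.5, 0.5])   # 1 bit per toss
four_way = shannon_entropy([0.25] * 4)    # 2 bits per outcome
certain = shannon_entropy([1.0])          # 0 bits: nothing is learned
```

A fair coin yields 1 bit per toss, four equally likely outcomes yield 2 bits, and a certain outcome yields 0 bits.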
Redundancy (Information Theory)
• Definition: Number of bits in the transmitted data minus its entropy
• Wasted “space” when transmitting data
• Compression reduces redundancy
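The definition above can be sketched in Python as follows. This is a rough illustration under two assumptions: the message is sent with a fixed-length code (8 bits per symbol by default), and its entropy is estimated from single-symbol frequencies only, which ignores dependencies between neighboring symbols. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy_per_symbol(message):
    """Estimated entropy in bits per symbol, from symbol frequencies alone."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy_bits(message, bits_per_symbol=8):
    """Bits actually transmitted minus the estimated entropy of the message."""
    transmitted = len(message) * bits_per_symbol
    information = len(message) * entropy_per_symbol(message)
    return transmitted - information
```

For the message "AABB" sent as 8-bit characters, 32 bits are transmitted but the estimated entropy is only 4 bits, so 28 bits are redundant. An ideal compressor would remove that excess.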
The Existence of Large 2-Dimensional Crosswords
• The redundancy of a language is related to the existence of crossword puzzles
• With zero redundancy the problem is trivial: any array of letters forms a valid crossword
• If the redundancy is too high, the language imposes too many constraints for large crosswords to be possible
• A more detailed analysis shows that large 2-dimensional crossword puzzles are only possible when the redundancy is less than 50%.
• If the redundancy is less than 33%, 3-dimensional crossword puzzles should be possible, etc.
• Edgar Gilbert is an American coding theorist and longtime researcher at Bell Laboratories
• Motivated by Shannon’s assertions, he estimated the redundancy of English text to be 41.5% when eliminating words of length 1 and 2
• Infinitely large 2-dimensional crosswords are possible to construct, but 3-dimensional crosswords are not
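The thresholds quoted above (50% for 2 dimensions, 33% for 3, "etc.") suggest the pattern that d-dimensional crosswords require a redundancy below 1/d. The Python sketch below assumes that extrapolation, and it reads the 41.5% figure as a redundancy, which is what the conclusion about 2- and 3-dimensional crosswords requires. The function name is hypothetical.

```python
import math

def max_crossword_dimension(redundancy):
    """Largest d with redundancy < 1/d, assuming the 1/d threshold pattern holds."""
    if redundancy <= 0:
        return math.inf  # zero redundancy: every array of letters is valid
    d = 0
    while redundancy < 1 / (d + 1):
        d += 1
    return d

gilbert_estimate = 0.415
largest = max_crossword_dimension(gilbert_estimate)  # 2: below 1/2 but not below 1/3
```

With a redundancy of 0.415 the function returns 2, since 0.415 is below 1/2 but not below 1/3.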