CYK Algorithm - A Haskell Implementation

This document explains how the CYK algorithm works, and provides two parsers -- one implemented in C++, and one in Haskell to deal with problems.

A Haskell Implementation of the CYK Algorithm

Aaron Gorenstein
February 15, 2014
This is a brief literate program written in Haskell implementing the CYK algorithm. I assume the reader
is already familiar with the CYK algorithm. My implementation is very simplistic; this is unavoidable, as I
am a Haskell novice. This has not deterred me, because my target audience is other novices! In particular,
I'm eager to share what I found to be the natural way of expressing the CYK algorithm in Haskell,
especially in comparison to my C++ implementation. In many ways it looks quite similar, but it seems to
more faithfully reflect the recursive structure our dynamic programming algorithm counts on.
There are two main sections in this document. The first details the foundation. There we define how
to express a CFG that is in CNF, and some reverse-lookup functions. That is, for a string α of symbols in
the grammar G, is there a production in G of the form A → α? If so, return all such A. The second section
focuses entirely on the main CYK algorithm and explains how our Haskell code reflects the
imperative definition as found in the C++ version. A final epilogue shows a very simple instantiation
of the algorithm on a grammar and input string.
Without further ado:

Prologue: Building the Foundation

Our code begins in a very staid manner.

import Data.Array -- for Array
import Data.List -- for nub

The function nub takes a list and removes redundant elements. For instance, the list [1, 2, 3, 2, 3] is transformed to [1, 2, 3].
This code assumes that the grammar is in Chomsky normal form, the definition of which I will not address
here; it's very standard. Regardless, we're able to define our grammar productions as having exactly two
forms (tuples, really), as we know that each production involves exactly 2 or 3 symbols.

data Production = NTprod String String String | Tprod String String
type CFG = [Production]

The datatype Production is either a nonterminal production (NTprod) of the form A → BC (observe that
the right-hand side is made of nonterminals), or a terminal production (Tprod) of the form A → a, where a
is a terminal. We specify these options very concretely with tuples of the associated string representations.
A context-free grammar is simply a collection of these productions, hence the naive definition in list form
in line 2 of the above code snippet. Observe that many things about the grammar are not checked (that the
start symbol is not used recursively, for instance).
With our grammar object defined, the only remaining feature we want to define is the reverse lookup:
given a string α of symbols from the grammar, return any nonterminal A such that the production A → α
exists. As we know our grammar is in CNF, there are exactly two cases: either α is two nonterminals, or
α is a single terminal. To facilitate implementation, we first want to be able to take a Production of either
sort and extract its left-hand side.

prodLHS (NTprod a _ _) = a
prodLHS (Tprod a _) = a

This is an obvious utility.
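As noted above, nothing about the grammar is checked. Purely as an illustration of what one such check could look like, here is a minimal sketch that lists nonterminals used on a right-hand side but never defined on a left-hand side. The name undefinedNonterminals is my own and is not part of the original program; the declarations from the text are repeated only so that the sketch stands alone.

```haskell
import Data.List (nub)

-- Declarations repeated from the text so this sketch is self-contained.
data Production = NTprod String String String | Tprod String String
type CFG = [Production]

prodLHS :: Production -> String
prodLHS (NTprod a _ _) = a
prodLHS (Tprod a _)    = a

-- Hypothetical helper (an assumption, not in the original program):
-- nonterminals that occur on the right-hand side of some A -> BC
-- production but have no production of their own.
undefinedNonterminals :: CFG -> [String]
undefinedNonterminals cfg = nub [x | x <- used, x `notElem` defined]
  where
    used    = concat [[b, c] | NTprod _ b c <- cfg]  -- Tprod entries are skipped
    defined = map prodLHS cfg
```

For example, undefinedNonterminals [NTprod "S" "A" "B", Tprod "A" "a"] evaluates to ["B"], since B is used but has no production.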

Now we can do the reverse lookup for the two cases. They're obviously very similar. Open question:
is there a nicer way of implementing this, perhaps as a single function? Regardless, here is the A → BC
lookup (given BC, find all A such that A → BC):

ntGens :: CFG -> (String, String) -> [String]
ntGens cfg (a, b) = map prodLHS (filter filterFunc cfg)
  where filterFunc (NTprod _ x y) = x == a && y == b
        filterFunc _ = False

Here is the A → a lookup (given a, find all A such that A → a):

termGens :: CFG -> String -> [String]
termGens cfg b = map prodLHS (filter filterFunc cfg)
  where filterFunc (Tprod _ y) = y == b
        filterFunc _ = False
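On the open question raised above, one possible answer is to give right-hand sides their own type, so that both lookups become a single comparison. This is only a sketch under that assumption: the names RHS, Pair, Single, splitProd and gens are hypothetical and do not appear in the original program.

```haskell
-- Declarations repeated from the text so this sketch is self-contained.
data Production = NTprod String String String | Tprod String String
type CFG = [Production]

-- Hypothetical type (an assumption): the right-hand side of a CNF production.
data RHS = Pair String String  -- two nonterminals, as in A -> BC
         | Single String       -- one terminal, as in A -> a
  deriving Eq

-- Split a production into its left-hand side and its right-hand side.
splitProd :: Production -> (String, RHS)
splitProd (NTprod a b c) = (a, Pair b c)
splitProd (Tprod a t)    = (a, Single t)

-- A single reverse lookup covering both cases: every left-hand side
-- whose right-hand side equals the one we are looking for.
gens :: CFG -> RHS -> [String]
gens cfg rhs = [lhs | (lhs, rhs') <- map splitProd cfg, rhs' == rhs]
```

With this in place, gens cfg (Pair a b) plays the role of ntGens cfg (a, b), and gens cfg (Single t) plays the role of termGens cfg t. Whether this is actually nicer than two small functions is a matter of taste.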

Those are the only data types and functions we need! We are now ready to define our main algorithm.

The Main Algorithm

Let's jump right into the main algorithm, and we shall explain it line by line:
cykMatrix :: CFG -> String -> Array (Int, Int) [String]
cykMatrix cfg s =
  let termGens' = termGens cfg -- lookups specialised to this grammar
      ntGens' = ntGens cfg
      n = length s
      m = array ((0,0), (n-1,n-1))
            ([((i,i), termGens' [s!!i]) | i <- [0..n-1]] ++
             [((r, r+l), generators r l) | l <- [1..n-1], r <- [0..n-l-1]])
        where generators :: Int -> Int -> [String]
              generators r l =
                nub $ concat [ntGens' (a,b) | t <- [0..l-1],
                                              a <- m!(r,r+t),
                                              b <- m!(r+t+1,r+l)]
  in m

Lines 1-6 are just setting up the function; the real magic starts afterwards. Recall from the CYK algorithm
design that m!(i, j) is the set of all nonterminals which generate the substring of s starting at index i and
ending at j. The base case, as shown on line 7, follows naturally: at m!(i, i), the nonterminals are exactly
those which generate the single terminal found at location i in s. (The curious [s!!i] notation is simply
because s!!i is a character, and for type safety we have to promote it back to a String, which we do
with the [] notation.) On line 8, we say more or less the same thing, conceptually, but in the general case.
Thus, it is entirely the list comprehension beginning on line 11 where the magic happens. It states:
The nonterminals which can generate the substring from index r to r + l are exactly those C involved in a
production C → AB, where A and B are represented by (a, b) in the ntGens call. Moreover, for any a, b, it must
be that a generates some substring prefix s[r, r + t] and b generates the corresponding suffix s[r + t + 1, r + l].
The values of t range from 0 to l - 1. Quite a lot is packed into that single line!
I hope that was somewhat comprehensible. Note that compared with the C++ version, that single list
comprehension replaces 3 for loops! Most importantly, I think the list comprehension better expresses the
relationships between those three integers r, t, l. Hooray!
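To make the relationship between r, t and l concrete, here is a small sketch that lists only the index arithmetic, with no grammar involved. The helper names splits and cells are my own and are not part of the program above.

```haskell
-- Hypothetical helper (an assumption): for the substring that starts at r
-- and ends at r + l, list every way of cutting it into a prefix and a
-- suffix, each given as a (start, end) index pair into the matrix m.
splits :: Int -> Int -> [((Int, Int), (Int, Int))]
splits r l = [((r, r + t), (r + t + 1, r + l)) | t <- [0 .. l - 1]]

-- Hypothetical helper (an assumption): the off-diagonal cells in the order
-- the general-case comprehension lists them, by increasing l and then by
-- increasing r, for an input of length n.
cells :: Int -> [(Int, Int)]
cells n = [(r, r + l) | l <- [1 .. n - 1], r <- [0 .. n - l - 1]]
```

For instance, splits 1 2 evaluates to [((1,1),(2,3)),((1,2),(3,3))], and cells 3 evaluates to [(0,1),(1,2),(0,2)]. Every cell that splits refers to covers a strictly shorter substring, which is why the definition of m can refer to m itself. As far as I understand it, Data.Array evaluates its elements lazily, so the order in which the cells are listed should not affect correctness here, unlike the loop order in an imperative version.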

Epilogue: A Small Execution

That was the major part of the work. At this point we'll simply present a hard-coded example. In the future
this code may be extended to actually take things from input files and so forth, but for now this is just a
quick test.
First, we use the matrix to actually compute our decision algorithm:
cyk cfg s = let n = length s in
  "S" `elem` cykMatrix cfg s ! (0,n-1)

Now let's define a very simple grammar:

exampleCFG :: CFG
exampleCFG = [NTprod "S" "A" "B",
              NTprod "A" "A" "A",
              Tprod "A" "a",
              NTprod "B" "B" "B",
              -- NTprod "B" "A" "A",
              Tprod "B" "b"]

And here is the easy way of using our CYK algorithm!

main = print (cyk exampleCFG "aaabb")

Thank you for reading. I hope this has helped you understand something about Haskell or the CYK algorithm. I also hope I have been able to share my excitement and enthusiasm about how different programming
languages can sometimes express things in intellectually pleasing, contrasting ways.
