AN INTERACTIVE PARSER GENERATOR FOR CONTEXT-FREE GRAMMARS

Article in Information Sciences · December 2012

All content following this page was uploaded by Gabriel John Ferrer on 23 October 2017.

Gabriel J. Ferrer
Department of Mathematics and Computer Science
Hendrix College
Conway, AR 72032
(501) 450-3879
[email protected]

ABSTRACT
This paper describes a parser generator that accepts arbitrary context-free grammars. It
generates a parser using the Earley algorithm [1]. It enables the user to develop, edit, and test a
grammar in an interactive graphical environment. This GUI visualizes both the operation of the
Earley algorithm and the generated parse trees. The generated parsers are saved as
fully-functional Java source files, ready to be incorporated into an application. These Java
programs can be reloaded into the GUI for further editing of the grammar. Employing this parser
generator in a sophomore-level software development course enables students to become
proficient in writing a parser with two days of lecture and one assignment.

INTRODUCTION
In our college's Computer Science program, students are expected to learn to process
text input using both regular expressions and parser generators. While regular expressions are
sufficient for many purposes, they suffer from two intrinsic problems: It is difficult to introduce
named abstractions, and it is not possible to represent recursive structures. For this reason, we
require our students to learn how to process text input using a parser generator.

Parser generators are typically covered as part of a compiler construction course (e.g.
[10] [11]). However, the number of courses we can require for the Computer Science major is
constrained by Hendrix College's liberal arts curriculum; this makes it difficult for us to require
(or offer) a compiler-construction course. We instead cover parsing in a sophomore-level
programming course.

Writing a parser by hand for any but the simplest of input languages is a tedious,
bug-prone task, one typically avoided in practice by using a parser generator.
But parser generators themselves are associated with a number of practical problems,
especially in the context of the limited time we have available for covering this topic:
1. In order to improve runtime efficiency, most parser generators (e.g., [3] [4] [5] [6]) accept
only a subset of context-free grammars.
2. The parsing algorithms (e.g., LALR parsers [2][11]) often require considerable study to
understand.
3. Many parser generators (e.g., [3] [4] [6]) require that input be preprocessed with a lexical
analyzer.
4. When developing a grammar, it is difficult to get immediate feedback on the changes, as
testing each alteration requires first generating a parser and running the compiler.
5. The API for interacting with the output of a parser generator is often intimidating, and
disproportionate to what is needed to build a working parser.

In order to address these problems, we would like a parser generator to have the
following features:
1. Accept an unrestricted context-free grammar.
2. Accept regular expressions as terminal symbols.
3. Employ an algorithm readily comprehensible by a 2nd-year computer science student.
4. Allow the programmer to interact with the grammar without an explicit code-generation
step.
5. Provide an API that enables the programmer to write tree-walking code quickly and easily.
In addition, the API should allow the grammar to change with as few changes as
possible to the tree-walking code that depends upon it.

Furthermore, given the widespread use of automated unit testing (e.g. JUnit), it would be
helpful for a parser generator to support automated unit testing as well.

This led us to create our own parser generator, Grambler, to meet our requirements.
Grambler is an implementation of the Earley parsing algorithm. It allows the user to specify an
arbitrary context-free grammar, and it will generate a Java class that corresponds to that
grammar. The user may mix regular expressions into the context-free grammar as well, thus
seamlessly incorporating lexical analysis into the parser.

Grambler also allows the user to interact with the grammar. Once a grammar is
specified, the user can enter text and request that it be parsed. The GUI will then display a
parse tree for the text using the grammar, and will also show the parse table generated by the
Earley algorithm. The user can also generate JUnit tests to verify that alterations to the
grammar preserve its ability to parse previously successful inputs.

This paper is organized as follows. First, the Earley algorithm is summarized, and some
modifications we made to the algorithm are described. Next, each feature of the GUI is
described, with an emphasis on the pedagogical role of each feature. We then describe the API
for interacting with the parse trees that Grambler generates.

IMPLEMENTING THE EARLEY PARSING ALGORITHM

The Earley algorithm is a dynamic-programming algorithm that builds a table of parses
of input prefixes. The algorithm terminates when the top-level production has matched the entire
input string. Each row of the table corresponds to one character position in the input string. (For
an input string of length n, the first row is row zero, and the final row is row n.) Each row
contains a list of every possible match between a production and the input string at that position.

Each possible match is called a state. A state is defined by the following parameters:
● Input position where this match started (the origin position)
● Current input position (i.e., the current row)
● A production from the grammar
● Next candidate symbol from that production
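
The four parameters above can be sketched as a small immutable Java class. This is an illustration only, not Grambler's source: the class name EarleyState, its field names, and the text format produced by toString() are all assumptions made for this example.

```java
import java.util.List;
import java.util.Objects;

/** Illustrative Earley state (hypothetical class, not taken from Grambler). */
final class EarleyState {
    final String lhs;        // left-hand side of the production
    final List<String> rhs;  // right-hand side symbols of one alternative
    final int dot;           // index in rhs of the next candidate symbol
    final int origin;        // input position where this match started
    final int current;       // current input position (the table row)

    EarleyState(String lhs, List<String> rhs, int dot, int origin, int current) {
        this.lhs = lhs;
        this.rhs = rhs;
        this.dot = dot;
        this.origin = origin;
        this.current = current;
    }

    /** True once every right-hand symbol has been matched. */
    boolean isComplete() { return dot == rhs.size(); }

    /** The next candidate symbol, or null for a complete state. */
    String nextSymbol() { return isComplete() ? null : rhs.get(dot); }

    /** Copy of this state with the dot moved one symbol to the right. */
    EarleyState advance(int newCurrent) {
        return new EarleyState(lhs, rhs, dot + 1, origin, newCurrent);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof EarleyState)) return false;
        EarleyState s = (EarleyState) o;
        return lhs.equals(s.lhs) && rhs.equals(s.rhs)
            && dot == s.dot && origin == s.origin && current == s.current;
    }

    @Override public int hashCode() { return Objects.hash(lhs, rhs, dot, origin, current); }

    /** Renders the production with a period before the next candidate symbol. */
    @Override public String toString() {
        StringBuilder sb = new StringBuilder(lhs).append(" :");
        for (int i = 0; i < rhs.size(); i++) {
            sb.append(i == dot ? " . " : " ").append(rhs.get(i));
        }
        if (isComplete()) sb.append(" .");
        return sb.append(" [").append(origin).append(", ").append(current).append("]").toString();
    }

    public static void main(String[] args) {
        EarleyState s = new EarleyState("sum", List.of("sum", "'+'", "num"), 1, 0, 1);
        System.out.println(s); // prints "sum : sum . '+' num [0, 1]"
    }
}
```

Defining equals() and hashCode() lets a table row reject a state it already contains, which keeps prediction on a left-recursive production from looping.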

The algorithm starts by adding a state to row zero for each alternative of the top-level
production. The first symbol from the production will be the next candidate symbol, and both the
starting and current input positions will be zero. Then, for every state in every table row, a table
update is performed.

The nature of a table update depends upon the type of the state. A state is complete if all
production symbols have been matched. A state is scanning if the next candidate production
symbol is a terminal symbol; if it is a non-terminal symbol, the state is predicting.

If the state is predicting, it will add a new state to the same table row for every right-hand
alternative of the next candidate production symbol. If the state is scanning and the current
character is a match for the terminal symbol, it will add a new state to the next table row, with an
updated current-input position and an updated next-candidate symbol. If the state is complete, it
will loop through all states in the row at its origin position; if the left-hand side of the completed
state's production is the next-candidate symbol for an origin state, a new state derived from the
origin state is inserted at the current table row, with an updated next-candidate symbol.
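
The three kinds of table update can be sketched as a compact recognizer. This is a minimal illustration under stated assumptions, not Grambler's implementation: every terminal is a single character, a symbol counts as a nonterminal exactly when it is a key of the grammar map, the grammar has no empty alternatives, and the names EarleyRecognizer, State, and recognize are invented for the example. The row index stands in for the current input position, so it is not stored in the state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Illustrative Earley recognizer over single-character terminals (hypothetical names). */
final class EarleyRecognizer {
    static final class State {
        final String lhs; final List<String> rhs; final int dot; final int origin;
        State(String lhs, List<String> rhs, int dot, int origin) {
            this.lhs = lhs; this.rhs = rhs; this.dot = dot; this.origin = origin;
        }
        boolean complete() { return dot == rhs.size(); }
        String next() { return rhs.get(dot); }
        State advance() { return new State(lhs, rhs, dot + 1, origin); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof State)) return false;
            State s = (State) o;
            return lhs.equals(s.lhs) && rhs.equals(s.rhs) && dot == s.dot && origin == s.origin;
        }
        @Override public int hashCode() { return Objects.hash(lhs, rhs, dot, origin); }
    }

    /** Adds a state to a row unless the row already contains it. */
    private static void add(List<State> row, State s) {
        if (!row.contains(s)) row.add(s);
    }

    /** Returns true if the start symbol matches the entire input. */
    static boolean recognize(Map<String, List<List<String>>> grammar, String start, String input) {
        int n = input.length();
        List<List<State>> table = new ArrayList<>();
        for (int i = 0; i <= n; i++) table.add(new ArrayList<>());
        for (List<String> alt : grammar.get(start)) add(table.get(0), new State(start, alt, 0, 0));

        for (int row = 0; row <= n; row++) {
            List<State> states = table.get(row);
            for (int i = 0; i < states.size(); i++) {  // the row may grow while it is visited
                State s = states.get(i);
                if (s.complete()) {
                    // Complete: advance every origin-row state that was waiting for s.lhs.
                    List<State> originRow = table.get(s.origin);
                    for (int j = 0; j < originRow.size(); j++) {
                        State waiting = originRow.get(j);
                        if (!waiting.complete() && waiting.next().equals(s.lhs)) {
                            add(states, waiting.advance());
                        }
                    }
                } else if (grammar.containsKey(s.next())) {
                    // Predict: one new state per alternative of the next candidate symbol.
                    for (List<String> alt : grammar.get(s.next())) {
                        add(states, new State(s.next(), alt, 0, row));
                    }
                } else if (row < n && s.next().equals(String.valueOf(input.charAt(row)))) {
                    // Scan: the terminal matches the current character; move to the next row.
                    add(table.get(row + 1), s.advance());
                }
            }
        }
        for (State s : table.get(n)) {
            if (s.complete() && s.origin == 0 && s.lhs.equals(start)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> g = Map.of(
            "sum", List.of(List.of("sum", "+", "num"), List.of("num")),
            "num", List.of(List.of("1"), List.of("2")));
        System.out.println(recognize(g, "sum", "1+2+1")); // prints "true"
        System.out.println(recognize(g, "sum", "1+"));    // prints "false"
    }
}
```

The example grammar is left-recursive, a form that many restricted parser generators reject; the duplicate check in add() is what allows prediction to terminate on it.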

As our implementation of this algorithm permits both arbitrary-length fixed strings and
regular expressions as terminal symbols, it was necessary to modify the update for scanning
states. First, the length of the matching input substring is computed. This length is then added to
the current character position to determine the row into which the new state is to be placed.
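
The modified scanning update can be sketched with java.util.regex as follows. This is an assumption-laden illustration rather than Grambler's code: the class ScanStep, its method names, and the convention of returning -1 when nothing matches are invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative scanning step for multi-character terminals (hypothetical names). */
final class ScanStep {
    /** Target row for a fixed-string terminal matched at the given row, or -1 on failure. */
    static int scanLiteral(String input, int row, String literal) {
        return input.startsWith(literal, row) ? row + literal.length() : -1;
    }

    /** Target row for a regular-expression terminal matched at the given row, or -1 on failure. */
    static int scanRegex(String input, int row, Pattern pattern) {
        Matcher m = pattern.matcher(input);
        m.region(row, input.length());  // only consider input from the current position onward
        if (!m.lookingAt()) return -1;  // the match must begin exactly at the current position
        int length = m.end() - row;     // length of the matching input substring
        return row + length;            // row into which the new state is placed
    }

    public static void main(String[] args) {
        String input = "x = 123;";
        System.out.println(scanRegex(input, 4, Pattern.compile("\\d+"))); // prints "7"
        System.out.println(scanLiteral(input, 2, "= "));                  // prints "4"
    }
}
```

This sketch takes the single match length that lookingAt() reports; whether Grambler considers other possible match lengths for the same regular expression is not stated in this paper. Because new states can skip ahead by more than one position, some rows of the table may receive no states at all.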

To demonstrate the input language for context-free grammars, Figure 1 gives as an
example the grammar for that input language. Grambler is self-hosting; it generated its own
parser.

Figure 1: Grammar for grammars

Each production is structured as follows. The left-hand side symbol is followed by a
colon; alternatives are separated with a vertical line, and the production as a whole is
terminated by a semicolon. The left-hand side of the first production is the start symbol for the
grammar.

The right-hand elements are separated by spaces. Elements without quotation marks
are nonterminals. Elements within single quotation marks are string literal terminals. Elements
within double quotation marks are regular expression terminals. The string literal within each
pair of double quotes is used to construct a java.util.regex.Pattern object for regular expression
matching.
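
As an illustration of these rules, a hypothetical production (not taken from Figure 1) might read: sum: sum '+' num | num; The sketch below classifies the elements of one alternative according to the quoting conventions just described. The class RhsElements and its regular expression are assumptions for this example; it does not handle escaped quotation marks, and it is not Grambler's own grammar reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative classifier for right-hand elements (hypothetical, not Grambler's reader). */
final class RhsElements {
    enum Kind { NONTERMINAL, STRING_LITERAL, REGEX }

    static final class Element {
        final Kind kind;
        final String text;
        Element(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    // One element: a single-quoted run, a double-quoted run, or an unquoted word.
    private static final Pattern ELEMENT = Pattern.compile("'([^']*)'|\"([^\"]*)\"|(\\S+)");

    /** Splits one alternative into its elements and labels each by its quoting. */
    static List<Element> classify(String alternative) {
        List<Element> out = new ArrayList<>();
        Matcher m = ELEMENT.matcher(alternative);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(new Element(Kind.STRING_LITERAL, m.group(1)));
            } else if (m.group(2) != null) {
                Pattern.compile(m.group(2)); // throws PatternSyntaxException if the regex is invalid
                out.add(new Element(Kind.REGEX, m.group(2)));
            } else {
                out.add(new Element(Kind.NONTERMINAL, m.group(3)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The alternative text is: sum '+' "\d+"
        System.out.println(classify("sum '+' \"\\d+\""));
        // prints "[NONTERMINAL:sum, STRING_LITERAL:+, REGEX:\d+]"
    }
}
```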

GRAPHICAL USER INTERFACE

The Grambler GUI gives the user the following options:
● Create/edit a grammar
● Export/import a grammar to/from a Java file
● Create/edit a text input for the grammar to process
● Open/save a text input or grammar as a text file
● Export a text input as a JUnit test
  ○ Check acceptance/rejection only
  ○ Check matching parse tree
● Parse the current text input using the current grammar
  ○ Determine whether the parse succeeded
  ○ View the parse tree resulting from the parse
  ○ View any Earley table row generated while parsing

Figure 2: Graphical User Interface

Figure 2 shows a screenshot of the user interface. The user creates a grammar in the
upper-left area. The user then enters a text input to be parsed in the lower-left area. Once the
grammar and text input are ready, the user clicks the Parse button. The upper-right area shows
a parse tree. The lower-right area shows the table rows generated by the Earley algorithm.

The parse tree is presented based on a preorder traversal (similar to the visualization
from [10]). Each row of the text output is a tree node. Each level of indentation indicates a level
of tree depth. The example above contains a tree with three levels and seven total nodes.
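
A preorder, indentation-based rendering of this kind can be sketched in a few lines. The Node class, the two-space indent, and the node labels below are assumptions for illustration; the exact text Grambler prints for each node is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative preorder tree printer (hypothetical names and layout). */
final class PreorderPrinter {
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }
    }

    /** Renders one node per line, indented two spaces per level of depth. */
    static String render(Node root) {
        StringBuilder sb = new StringBuilder();
        render(root, 0, sb);
        return sb.toString();
    }

    private static void render(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(node.label).append('\n');                      // visit the node first
        for (Node child : node.children) render(child, depth + 1, sb); // then each subtree
    }

    public static void main(String[] args) {
        // Three levels and seven nodes in total.
        Node tree = new Node("pair",
            new Node("item", new Node("key"), new Node("value")),
            new Node("item", new Node("key"), new Node("value")));
        System.out.print(render(tree));
    }
}
```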

The Earley chart is visualized one row at a time. The user can select a row directly, jump
to the start or end, or iterate among consecutive rows. As some rows may contain no states, the
Next and Previous buttons jump only between rows that do have states. Each state is described
by its production, a period (“.”) before the next candidate symbol, and the current and origin
input positions for the state. The purpose of this aspect of the GUI is to enable a user who is
puzzled as to why a parse is failing to inspect the chart to find out precisely how far into the
input the parser got. From there, the user can inspect the states to figure out which productions
were attempted, and from there infer which productions failed to advance.

The File menu allows the user to export the grammar to a Java file that, when compiled
and executed, will parse the language the grammar specifies. The user can also import the
grammar from a Java file in the format it generates.

The Unit Tests menu allows the user to generate two kinds of unit tests: acceptance
checks and tree checks. An acceptance check will generate a unit test that checks to see if the
error status of the generated tree is identical to the error status of parsing the current text input.
A tree check ensures that the generated tree is identical to that in view on the GUI. These unit
tests are appended to a JUnit-compatible file that is automatically generated, based on the
name of the Java file used for saving the grammar.

APPLICATION PROGRAMMER INTERFACE

Once a grammar is complete, Grambler will generate Java code corresponding to it.
Specifically, it will generate a class that extends the Grammar class from the Grambler API.
Objects of the grammar class have a parse() method that takes a String parameter and returns
a Tree object corresponding to the concrete syntax tree for the input parameter. In the event of a
syntax error, a Tree object is still returned containing the error information.

A successful tree walk requires the ability to:

1. Determine the grammar label corresponding to each tree node;
2. Retrieve the children of each interior node; and
3. Reconstruct the original text input corresponding to a node.

To achieve these goals, the Tree class provides the following methods:
1. The getName() method returns the name of the left-hand side element corresponding to
this Tree node.
2. The hasNamed() method determines whether a child with a given name is present for this
Tree node. The getNamed() method retrieves the Tree object corresponding to a given
name. If more than one child shares the same name, an additional integer parameter
can be supplied to resolve the ambiguity.
3. The toString() method returns the input substring corresponding to this Tree object.

Syntax errors are flagged by the isError() method. If a syntax error is present anywhere
within the tree, isError() will be true for all of the ancestor nodes of the error, including the root
node. The getErrorMessage() method returns a String giving the line number of the error as
well as a prefix of the line that was successfully matched immediately prior to the error.
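
To show how tree-walking code might use these methods, the sketch below evaluates a parse tree for the hypothetical grammar sum: sum '+' num | num; with num as a digit-sequence terminal. SketchTree is a stand-in written for this example, not Grambler's Tree class: it borrows only the method names given above, and its constructors, parameter types, zero-based child index, and omission of terminal children are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that mimics the method names of the Tree class described above. */
final class SketchTree {
    private final String name;
    private final String text;
    private final boolean error;
    private final List<SketchTree> children = new ArrayList<>();

    private SketchTree(String name, String text, boolean error, SketchTree... kids) {
        this.name = name;
        this.text = text;
        for (SketchTree k : kids) {
            children.add(k);
            error |= k.error;  // an error anywhere below marks every ancestor
        }
        this.error = error;
    }

    SketchTree(String name, String text, SketchTree... kids) { this(name, text, false, kids); }

    static SketchTree errorLeaf(String name, String text) { return new SketchTree(name, text, true); }

    String getName() { return name; }
    boolean isError() { return error; }
    boolean hasNamed(String childName) { return getNamed(childName, 0) != null; }
    SketchTree getNamed(String childName) { return getNamed(childName, 0); }

    /** Returns the index-th child (counting from zero) with the given name, or null. */
    SketchTree getNamed(String childName, int index) {
        for (SketchTree c : children) {
            if (c.name.equals(childName) && index-- == 0) return c;
        }
        return null;
    }

    @Override public String toString() { return text; }
}

/** Tree walk for the assumed grammar: a sum is an optional nested sum plus a num. */
final class SumWalker {
    static int eval(SketchTree t) {
        if (t.isError()) throw new IllegalArgumentException("syntax error in: " + t);
        if (t.getName().equals("num")) return Integer.parseInt(t.toString());
        int right = eval(t.getNamed("num"));
        return t.hasNamed("sum") ? eval(t.getNamed("sum")) + right : right;
    }

    public static void main(String[] args) {
        SketchTree tree = new SketchTree("sum", "1+2+3",
            new SketchTree("sum", "1+2",
                new SketchTree("sum", "1", new SketchTree("num", "1")),
                new SketchTree("num", "2")),
            new SketchTree("num", "3"));
        System.out.println(eval(tree)); // prints "6"
    }
}
```

Because the walker looks children up by name rather than by position, adding an unrelated element to a production would not require changing it, which matches the API goal stated in the introduction.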

RELATED WORK
The parser generator DParser [7] directly inspired the grammar syntax employed in
Grambler. While DParser addresses most of the problems with existing parser generators listed
in the introduction, it is not suitable for our pedagogical goals for two reasons. First, being a
traditional command-line parser generator, it lacks the level of interactivity we require. Second,
the Earley algorithm was easier for us to conceptualize as an interactive visualization, in
comparison to the Tomita algorithm [9] DParser employs.

In both VAST [12] and Tree-Viewer [10], syntax tree visualizations were created for
educational purposes. We prefer the text-oriented preorder-traversal approach from [10], as
nearly a full line of text is available for describing each node. In the graphical-box approach in
[12], much less information can be displayed for each node.

Grammar Editor [8] is an interactive editor for context-free grammars employing the CYK
algorithm. As with Grambler, a user enters a grammar and text to be parsed, and the program
will display a parse tree. It cannot, however, generate a parser that can be incorporated
into a user program.

CONCLUSION
The Grambler parser generator has been successful in simplifying the presentation of
parsing to students in a third-semester programming course. Every project in the class had a
working parser after two days of lecture and one assignment. We have been able to focus our
valuable course time on higher-level issues involved with incorporating parsers into computer
programs without being distracted by other issues such as obscure syntax and extended edit-
compile-test cycles.

Grambler is freely available for download from http://code.google.com/p/grambler/.

REFERENCES
[1] Earley, J., An efficient context-free parsing algorithm, Communications of the ACM, 13 (2):
94-102, 1970.
[2] Aho, A.V., Sethi, R., Ullman, J.D., Compilers: Principles, Techniques, and Tools,
Addison-Wesley, 1986.
[3] Parr, T., ANTLR, http://www.antlr.org/, retrieved 5/16/12.
[4] JavaCC, https://javacc.dev.java.net/, retrieved 5/16/12.
[5] Grimm, R., Rats!, http://cs.nyu.edu/rgrimm/xtc/rats-intro.html, retrieved 5/16/12.
[6] Gagnon, E., SableCC, http://sablecc.org/, retrieved 5/16/12.
[7] Plevyak, J., DParser, http://dparser.sourceforge.net/, retrieved 5/16/12.
[8] Burch, C., Grammar Editor, http://ozark.hendrix.edu/~burch/proj/grammar/, retrieved 5/16/12.
[9] Tomita, M., An efficient context-free parsing algorithm for natural languages, In Proceedings
of the International Joint Conference on Artificial Intelligence, pp. 756-764, 1985.
[10] Vegdahl, S., Using visualization tools to teach compiler design, Journal of Computing
Sciences in Colleges, 16 (2), January 2001.
[11] Demaille, A., Levillain, R., Perrot, B., A set of tools to teach compiler construction, In
Proceedings of ITiCSE-08, June 2008.
[12] Almeida-Martinez, F.J., Urquiza-Fuentes, J., Velazquez-Iturbide, J.A., VAST: Visualization of
Abstract Syntax Trees within language processors courses, In Proceedings of SoftVis '08,
September 2008.
[13] Wall, L., Apocalypse 5: Pattern Matching, http://perl6.org/archive/doc/design/apo/A05.html,
retrieved 7/31/12.
