Use JavaCC to build a user friendly boolean query language
Now you're talking my language!

JoAnn Brereton January 15, 2004

JavaCC is a very powerful "compiler compiler" tool that generates parsers from context-free
grammars. This article demonstrates how JavaCC can be used to let end users formulate
simple boolean queries against a DB2 UDB database.

Introduction to JavaCC
Many Web-based projects contain ad-hoc query systems that allow end users to search for
information. These systems need some sort of language in which end users can express what they
want to search for. Sometimes, a user query language is defined quite simply. If your end user
is content with a language as simple as a typical Google search, then Java's StringTokenizer
is more than up to the task of parsing it. However, if a customer wants a more robust language,
perhaps adding parentheses and "AND"/"OR" logic, then we quickly find ourselves in need of a
bigger hammer. We need a way to first define the language that the user will be using, and then
use that definition to parse the user's entries and formulate the correct query for whatever
back-end database we have.
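To make the limitation concrete, the short sketch below (the class name SimpleSplit is made up for illustration) splits queries with java.util.StringTokenizer, which by default breaks on whitespace. It copes with a flat list of words, but it cuts a quoted actor name in two and glues parentheses and equal signs to neighboring words, which is why grouping and phrases call for a real grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper: splits a query the way a simple, Google-style search might.
class SimpleSplit {
    static List<String> split(String query) {
        List<String> words = new ArrayList<>();
        // Default delimiters are space, tab, newline, carriage return and form feed.
        StringTokenizer st = new StringTokenizer(query);
        while (st.hasMoreTokens()) {
            words.add(st.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // Fine for a flat list of search words:
        System.out.println(split("superman action adventure"));
        // prints [superman, action, adventure]

        // Six pieces: the quoted name is split in two, and "(" and ")" stick to words.
        System.out.println(split("(actor = \"Christopher Reeve\" and keyword=action)"));
        // prints [(actor, =, "Christopher, Reeve", and, keyword=action)]
    }
}
```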

That's where a tool like JavaCC comes in. JavaCC stands for "Java® Compiler Compiler", a name
that pays homage to YACC, "Yet Another Compiler Compiler", the C-based tool developed by AT&T
for building parsers for C and other high-level languages. YACC, together with its companion
lexical analyzer generator, Lex, accepted as input a definition of a language in a common form
known as Backus-Naur form (BNF, aka Backus Normal Form) and emitted a C program that could
parse input in that language and perform functions on it. JavaCC, like YACC, is designed to
speed the process of developing parser logic for languages. Whereas YACC emitted C code,
however, JavaCC, as you might expect, emits Java code.

JavaCC has a storied history. It began its life at Sun under the name "Jack". Jack later became
JavaCC, which passed through several owners, notably Metamata and WebGain, before coming back
to Sun, which then released it as open source code under the BSD license.

JavaCC's strength is in its simplicity and extensibility. To compile the Java code generated by
JavaCC, no outside JAR files or directories are required; the base Java 1.2 compiler is enough.
The layout of the language also makes it easy to add production rules and behaviors. The Web
site even describes how you can tailor exceptions to give your users targeted syntax hints.

© Copyright IBM Corporation 2004 Trademarks

developerWorks® ibm.com/developerWorks/

The Problem Definition

Suppose your customer runs a video rental store and has a simple database of movies. The
database consists of the tables MOVIE, ACTOR, and KEYWORD. The MOVIE table lists the pertinent
data for each movie in the shop, such as its title and director. The ACTOR table lists the names
of the actors in each movie. The KEYWORD table lists words describing each movie, such as
"action", "drama", "adventure", and so on.

The customer would like to be able to issue somewhat sophisticated queries against this database.
He would, for example, like to enter a query of the form

actor = "Christopher Reeve" and keyword=action and keyword=adventure

and have it return the Superman movies featuring Christopher Reeve. He would also like to use
parentheses to indicate order of evaluation to distinguish queries like

(actor = "Christopher Reeve" and keyword=action) or keyword=romance

which might return movies that do not feature Christopher Reeve from

actor = "Christopher Reeve" and (keyword=action or keyword=romance)

which should always return movies starring Christopher Reeve.

The Solution
For this exercise, you will build the solution in two stages. In stage 1, you will define the
language in JavaCC, ensuring that end-user queries are parsed correctly. In stage 2, you will add
behavior to the JavaCC grammar so that it produces DB2® SQL that returns the correct movies in
answer to the end-user queries.

Stage 1 - Defining the User's Query Language

The language will be defined inside a file named UQLParser.jj. The JavaCC tool "compiles" this
file into a set of Java source (.java) files. To define a language in a JJ file, you need to do
five things:

1. Define the context of parsing
2. Define "white space"
3. Define "tokens"
4. Define the syntax of the language itself in terms of the tokens
5. Define the behavior that will happen at each stage of parsing

You can define your own UQLParser.jj file using the segments presented, or you can follow along
using the code associated with this paper. Use the copy of UQLParser.jj in JavaCCPaper/stage1/src
for steps 1 through 4. Step 5 is covered in JavaCCPaper/stage2/src. DDL for the sample database
can be found in JavaCCPaper/moviedb.sql. The example works best if the database is created and
the parser run using the same userid. Ant files (build.xml) are provided to speed up the
compilation process.

Step 1. Define the context of parsing

The JavaCC executable translates .jj files into .java files. JavaCC copies the portion of the
.jj file between PARSER_BEGIN and PARSER_END directly into the resulting Java file. This is where
you, the parser designer, can place all of the pre- and post-parsing activity related to your
parser. It is also where you can link your Java code to the parser actions you will define in
steps 4 and 5.

In the example shown below, the parser is relatively simple. The UQLParser constructor takes a
String as input, wraps it in a java.io.StringReader, and then calls another constructor,
generated by JavaCC and not shown here, with the StringReader cast to Reader. The only other
method defined here is the static main method, which calls the constructor and then calls a
method named parse() that has not been defined yet.

As you might have guessed, JavaCC already generates a constructor that takes a java.io.Reader.
The String-based constructor is added for ease of use and testing.

Listing 1. The parser's Java context

PARSER_BEGIN(UQLParser)

package com.demo.stage1;

import java.io.StringReader;
import java.io.Reader;

public class UQLParser {

    /**
     A String based constructor for ease of use.
    **/
    public UQLParser(String s)
    {
        this((Reader)(new StringReader(s)));
    }

    public static void main(String args[])
    {
        try
        {
            String query = args[0];
            UQLParser parser = new UQLParser(query);
            parser.parse();
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}
PARSER_END(UQLParser)

Step 2. Define white space

In this language, spaces, tabs, carriage returns, and newlines act as delimiters but are
otherwise ignored. These characters are known as white space. In JavaCC, you define them in the
SKIP section, as shown in Listing 2.

Listing 2. Defining white space in the SKIP section

/** Skip these characters, they are considered "white space" **/
SKIP :
{
    " "
  | "\t"
  | "\r"
  | "\n"
}

Step 3. Define the tokens

Next, you define which tokens the language recognizes. A token is the smallest unit of a parsed
string that has meaning to the parsing program. The component that scans an input string and
determines what the tokens are is called the tokenizer. In the query

actor = "Christopher Reeve"

the tokens are

• actor
• =
• "Christopher Reeve"
In your language, you will make actor and the equal sign (=) reserved tokens, much as the words
if and instanceof are reserved tokens with particular meaning within the Java language. By
reserving words and other special tokens, you guarantee that the parser will recognize them
literally and assign them a specific meaning. While you're reserving words, go ahead and also
reserve the not-equal sign (<>), and, or, and the left and right parentheses. Also reserve
title, director, and keyword to represent specific fields for the user to search.

To define all of these, use the TOKEN directive of JavaCC. Each token definition is enclosed in
angle brackets (< and >). The token's name is given on the left-hand side of the colon (:) and a
regular expression is given on the right-hand side. A regular expression is a way of defining a
portion of text to be matched. In its simplest form, a regular expression matches an exact
sequence of characters. Listing 3, the JavaCC reserved tokens definition, defines six tokens
that match exact words and four others that match symbols. When the parser sees any of the
words, it will match them symbolically as AND, OR, TITLE, ACTOR, DIRECTOR, or KEYWORD. When the
symbols are matched, the parser will return LPAREN, RPAREN, EQUALS, or NOTEQUAL accordingly.

Listing 3. Defining reserved tokens

TOKEN: /*RESERVED TOKENS FOR UQL */
{
<AND: "and">
| <OR: "or">
| <TITLE: "title">
| <ACTOR: "actor">
| <DIRECTOR: "director">
| <KEYWORD: "keyword">
| <LPAREN: "(">
| <RPAREN: ")">
| <EQUALS: "=">
| <NOTEQUAL: "<>">
}

For strings like "Christopher Reeve", you cannot possibly set aside all the names of actors as
reserved words in the language. Instead, you will recognize tokens of type STRING or
QUOTED_STRING using character patterns defined with regular expressions. Defining the regular
expressions that match all strings or quoted strings is a bit trickier than defining exact word
matches.

You will define a STRING as a series of one or more characters, the valid characters being A
through Z, upper and lower case, and the numerals 0 through 9. For the purposes of simplification,
don't worry about movie stars or movie titles with accented characters or other anomalies. You can
write this pattern as a regular expression in the following way.

<STRING : (["A"-"Z", "a"-"z", "0"-"9"])+ >

The plus sign indicates that the pattern enclosed in the parentheses (any character from A to Z,
a to z, or 0 to 9) should occur one or more times in sequence. In JavaCC, you may also use the
asterisk (*) to indicate zero or more occurrences of a pattern, or a question mark (?) to
indicate zero or one occurrence.
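As a point of comparison, the same character class and quantifiers can be written with java.util.regex. This is only an analogy: JavaCC compiles its own token patterns and does not use java.util.regex, and the class StringPattern is hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative java.util.regex analogues of the quantifiers described above.
class StringPattern {
    // Same character classes as <STRING : (["A"-"Z", "a"-"z", "0"-"9"])+ >
    static final Pattern ONE_OR_MORE = Pattern.compile("[A-Za-z0-9]+");
    static final Pattern ZERO_OR_MORE = Pattern.compile("[A-Za-z0-9]*");
    static final Pattern ZERO_OR_ONE = Pattern.compile("[A-Za-z0-9]?");

    // True if the whole input is a STRING token.
    static boolean isString(String s) {
        return ONE_OR_MORE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isString("drama"));                   // true
        System.out.println(isString("Matrix2"));                 // true
        System.out.println(isString(""));                        // false: + needs at least one character
        System.out.println(isString("Reeve's"));                 // false: the apostrophe is not in the class
        System.out.println(ZERO_OR_MORE.matcher("").matches());  // true: * allows zero occurrences
        System.out.println(ZERO_OR_ONE.matcher("ab").matches()); // false: ? allows at most one
    }
}
```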

QUOTED_STRING is a little trickier. You will define a string to be a QUOTED_STRING if it begins
with a quote, ends with a quote, and has one or more other characters in between. The regular
expression for this is "\"" (~["\""])+ "\"", which is certainly an eyeful. To make sense of it,
note that the quote character itself has meaning to JavaCC, so the quote must be escaped to be
meaningful to our language instead of to JavaCC. To escape the quote, precede it with a
backslash. The tilde character (~) means NOT to the JavaCC tokenizer, so (~["\""])+ is shorthand
for one or more non-quote characters. Taken altogether, "\"" (~["\""])+ "\"" means "a quote
followed by one or more non-quotes followed by a quote".
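The same pattern in java.util.regex notation, again only as an illustrative analogy with a made-up class name: [^"] plays the role of JavaCC's ~["\""], matching any character except a double quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative analogue of <QUOTED_STRING: "\"" (~["\""])+ "\"" >
class QuotedStringPattern {
    static final Pattern QUOTED = Pattern.compile("\"[^\"]+\"");

    // Returns the first quoted string in the input (quotes included), or null if none.
    static String firstQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstQuoted("actor = \"Christopher Reeve\"")); // "Christopher Reeve"
        System.out.println(firstQuoted("title = \"\""));                  // null: + rejects an empty pair of quotes
    }
}
```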

Add the tokenizer rules for STRING and QUOTED_STRING after the rules for the reserved words. This
ordering matters: JavaCC's tokenizer always prefers the longest match, and when two rules match
input of the same length, the rule that appears first in the file wins. Because "title" matches
both TITLE and STRING with the same length, listing the reserved words first ensures that
"title" is taken as the reserved word rather than as a STRING (a longer word such as "titles" is
still a STRING). The complete STRING and QUOTED_STRING token definition appears in Listing 4.

Listing 4. Defining STRING and QUOTED_STRING

TOKEN :
{
    <STRING : (["A"-"Z", "a"-"z", "0"-"9"])+ >
  | <QUOTED_STRING: "\"" (~["\""])+ "\"" >
}
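The ordering rule can be illustrated with a small hand-written tokenizer. This is a sketch for illustration, not the code JavaCC generates: the class MiniTokenizer is hypothetical, it uses java.util.regex instead of JavaCC's token manager, and it returns only token kinds. It skips the same white space as Listing 2, keeps the longest match, and on a tie keeps the rule declared first, which is why "title" becomes TITLE while "titles" becomes STRING.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer: longest match wins; on equal length, the earlier rule wins.
class MiniTokenizer {
    // LinkedHashMap keeps declaration order: reserved tokens first, then STRING, QUOTED_STRING.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        String[][] reserved = {
            {"AND", "and"}, {"OR", "or"}, {"TITLE", "title"}, {"ACTOR", "actor"},
            {"DIRECTOR", "director"}, {"KEYWORD", "keyword"},
            {"LPAREN", "\\("}, {"RPAREN", "\\)"}, {"EQUALS", "="}, {"NOTEQUAL", "<>"}
        };
        for (String[] r : reserved) {
            RULES.put(r[0], Pattern.compile(r[1]));
        }
        RULES.put("STRING", Pattern.compile("[A-Za-z0-9]+"));
        RULES.put("QUOTED_STRING", Pattern.compile("\"[^\"]+\""));
    }

    static List<String> tokenize(String input) {
        List<String> kinds = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') { // the SKIP characters
                pos++;
                continue;
            }
            String bestKind = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestKind = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestKind == null) {
                throw new IllegalArgumentException("No token matches at position " + pos);
            }
            kinds.add(bestKind);
            pos = bestEnd;
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("title = titles"));           // [TITLE, EQUALS, STRING]
        System.out.println(tokenize("actor<>\"Harrison Ford\"")); // [ACTOR, NOTEQUAL, QUOTED_STRING]
    }
}
```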

Step 4. Define the language in terms of the tokens

Now that you have the tokens defined, it is time to define the parsing rules in terms of the
tokens. The user enters the query in the form of an expression. An expression is defined as a
series of one or more query terms joined by the boolean operators "and" and "or".

To express this, you need to write a parsing rule, also known as a production. Write the
production in Listing 5 into the JavaCC UQLParser.jj file.

Listing 5. The expression() production

void expression() :
{
}
{
    queryTerm()
    (
        ( <AND> | <OR> )
        queryTerm()
    )*
}
When JavaCC is run against a .jj file, productions are translated into methods. The first set of
braces encloses any declarations needed by the production method; for now, it is left blank.
The second set of braces encloses the production rule, written in a way that JavaCC understands.
Note the use of the AND and OR tokens that were defined earlier. Also note that queryTerm() is
written as a method call; in fact, queryTerm() is another production.

Now, let us define the queryTerm() production. queryTerm() is either a single criterion (such as
title="The Matrix") or an expression surrounded by parentheses. queryTerm() is defined in JavaCC
recursively, via expression(), which allows you to sum up the language succinctly using the code
shown in Listing 6.

Listing 6. The queryTerm() production method in JavaCC (UQLParser.jj)

void queryTerm() :
{
}
{
(<TITLE> | <ACTOR> |
<DIRECTOR> | <KEYWORD>)
( <EQUALS> | <NOTEQUAL>)
( <STRING> | <QUOTED_STRING> )
|
<LPAREN> expression() <RPAREN>
}

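Apart from a small top-level parse() production, which calls expression() and then consumes the
end-of-file token (it appears at the start and end of the trace in Listing 8), the entire
language parser is summed up in these two productions.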
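To see what "productions become methods" means in practice, here is a hand-written recursive-descent recognizer with one method per rule. It is a sketch under simplifying assumptions, not JavaCC output: the class Recognizer is hypothetical and works on a list of token-kind names rather than on real Token objects, and its own parse() method simply calls expression() and then expects end of input.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer mirroring the grammar:
//   expression := queryTerm ( (AND | OR) queryTerm )*
//   queryTerm  := (TITLE|ACTOR|DIRECTOR|KEYWORD) (EQUALS|NOTEQUAL) (STRING|QUOTED_STRING)
//               | LPAREN expression RPAREN
class Recognizer {
    private final List<String> tokens;
    private int pos = 0;

    Recognizer(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    // Consumes the next token if it is one of the expected kinds; otherwise fails.
    private void expect(String... kinds) {
        String next = peek();
        for (String k : kinds) {
            if (k.equals(next)) { pos++; return; }
        }
        throw new IllegalStateException("Expected one of " + Arrays.toString(kinds) + " but found " + next);
    }

    void parse() {
        expression();
        expect("EOF");
    }

    void expression() {
        queryTerm();
        while (peek().equals("AND") || peek().equals("OR")) {
            pos++;           // consume AND or OR
            queryTerm();
        }
    }

    void queryTerm() {
        if (peek().equals("LPAREN")) {
            pos++;
            expression();    // the recursion that handles parenthesized groups
            expect("RPAREN");
        } else {
            expect("TITLE", "ACTOR", "DIRECTOR", "KEYWORD");
            expect("EQUALS", "NOTEQUAL");
            expect("STRING", "QUOTED_STRING");
        }
    }

    static boolean accepts(String... kinds) {
        try {
            new Recognizer(Arrays.asList(kinds)).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // (actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama
        System.out.println(accepts("LPAREN", "ACTOR", "EQUALS", "QUOTED_STRING", "OR",
                "ACTOR", "EQUALS", "QUOTED_STRING", "RPAREN",
                "AND", "KEYWORD", "EQUALS", "STRING"));   // true
        // title=   (the comparand is missing)
        System.out.println(accepts("TITLE", "EQUALS"));  // false
    }
}
```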

Taking JavaCC for a test drive

At this point, you should have a valid JavaCC file. You can compile and "run" this program to see if
your parser operates correctly before moving on to step 5.

The ZIP file provided with this paper should include the sample stage 1 JavaCC file,
UQLParser.jj. Unzip the entire ZIP file into an empty directory. To compile stage1/UQLParser.jj,
first download JavaCC and install it according to the directions on the JavaCC Web page. For
simplicity's sake, be sure to put the directory containing Javacc.bat on your PATH environment
variable. Compilation is easy: cd into the directory where you've unzipped UQLParser.jj and enter
the following.

javacc -debug_parser -output_directory=.\com\demo\stage1 UQLParser.jj

You may also use the enclosed Ant file, build.xml, if you prefer; you will have to adjust the
properties at the top of the file to point to your JavaCC installation. JavaCC should produce
the messages shown in Listing 7 the first time you run it.

Listing 7. Output from compilation of UQLParser.jj

Java Compiler Compiler Version 3.2 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file UQLParser.jj . . .
File "TokenMgrError.java" does not exist. Will create one.
File "ParseException.java" does not exist. Will create one.
File "Token.java" does not exist. Will create one.
File "SimpleCharStream.java" does not exist. Will create one.
Parser generated successfully.

In addition to the four files mentioned, JavaCC will also produce UQLParser.java,
UQLParserConstants.java, and UQLParserTokenManager.java. All of these are written to the
com\demo\stage1 directory. From here, you should be able to compile these files without any
additions to the default runtime classpath. The Ant file default target will perform the Java compile
automatically if the JavaCC step runs successfully. If not, you can compile your files from the top
level directory (JavaCCPaper/stage1) with

javac -d bin src\com\demo\stage1\*.java

Once you have Java class files in place, you can test your new parser by passing a sample user
query to the main() method you defined. If you are using the same code, begin from the
JavaCCPaper/stage1 directory and enter the following on the command line.

java -cp bin com.demo.stage1.UQLParser "actor = \"Tom Cruise\""

The -debug_parser option used during the JavaCC step makes the parser print the following trace
messages, which show how the user query is parsed. The output should appear as in Listing 8.

Listing 8. Output from UQLParser, query actor="Tom Cruise"

Call: parse
Call: expression
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>: ""Tom Cruise"">
Return: queryTerm
Return: expression
Consumed token: <<EOF>>
Return: parse

To test the recursive path for parenthesized expressions, try the following test.

java -cp bin com.demo.stage1.UQLParser "(actor=\"Tom Cruise\" or actor=\"Kelly McGillis\") and keyword=drama"

This should produce the output in Listing 9.

Listing 9. Output from UQLParser (stage 1), query (actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama
Call: parse
Call: expression
Call: queryTerm
Consumed token: <"(">
Call: expression
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>: ""Tom Cruise"">
Return: queryTerm
Consumed token: <"or">
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>: ""Kelly McGillis"">
Return: queryTerm
Return: expression
Consumed token: <")">
Return: queryTerm
Consumed token: <"and">
Call: queryTerm
Consumed token: <"keyword">
Consumed token: <"=">
Consumed token: <<STRING>: "drama">
Return: queryTerm
Return: expression
Consumed token: <<EOF>>
Return: parse

This output is useful because it illustrates the recursion through queryTerm and expression. The
first queryTerm is a parenthesized expression made up of two queryTerms. Figure 1 shows a
pictorial view of this parse path.

Figure 1. Graphical representation of parse of user query

If you're curious as to what Java code was produced by JavaCC, by all means, have a look (but
don't try to change any of it!). Here is what you'll find.

UQLParser.java: In this file, you will find the code that you placed between PARSER_BEGIN and
PARSER_END in your UQLParser.jj file. You will also discover that your JJ productions have been
translated into Java methods.

For example, the expression() rule has been expanded into the code found in Listing 10.

Listing 10. UQLParser.java

static final public void expression() throws ParseException {
    trace_call("expression");
    try {
        queryTerm();
        label_1:
        while (true) {
            switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
            case AND:
            case OR:
                ;
                break;
            default:
                jj_la1[0] = jj_gen;
                break label_1;
            }
            switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
            case AND:
                jj_consume_token(AND);
                break;
            case OR:
                jj_consume_token(OR);
                break;
            default:
                jj_la1[1] = jj_gen;
                jj_consume_token(-1);
                throw new ParseException();
            }
            queryTerm();
        }
    } finally {
        trace_return("expression");
    }
}

It bears some resemblance to what you originally wrote in that queryTerm(), AND, and OR make
an appearance but the rest is parsing detail that JavaCC has added around it.

UQLParserConstants.java: This file is easy to figure out. All of the tokens that you have
defined are here. JavaCC has listed their images in an array (tokenImage) and provided integer
constants that index into that array. Listing 11 shows the contents of UQLParserConstants.java.

Listing 11. UQLParserConstants.java

/* Generated By:JavaCC: Do not edit this line. UQLParserConstants.java */

package com.demo.stage1;

public interface UQLParserConstants {

    int EOF = 0;
    int AND = 5;
    int OR = 6;
    int TITLE = 7;
    int ACTOR = 8;
    int DIRECTOR = 9;
    int KEYWORD = 10;
    int LPAREN = 11;
    int RPAREN = 12;
    int EQUALS = 13;
    int NOTEQUAL = 14;
    int STRING = 15;
    int QUOTED_STRING = 16;

    int DEFAULT = 0;

    String[] tokenImage = {
        "<EOF>",
        "\" \"",
        "\"\\t\"",
        "\"\\r\"",
        "\"\\n\"",
        "\"and\"",
        "\"or\"",
        "\"title\"",
        "\"actor\"",
        "\"director\"",
        "\"keyword\"",
        "\"(\"",
        "\")\"",
        "\"=\"",
        "\"<>\"",
        "<STRING>",
        "<QUOTED_STRING>",
    };
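The tokenImage array is what makes the -debug_parser trace readable. The hypothetical helper below rebuilds a "Consumed token" line from a token kind and its image, using entries copied from Listing 11. It mimics the format visible in Listings 8 and 9 rather than reproducing JavaCC's own tracing code, and it shows why a quoted string appears with doubled quotes: the image of a QUOTED_STRING token already includes its quotes. Note also that kinds 1 through 4 belong to the SKIP characters, which is why AND starts at 5.

```java
// Hypothetical helper that formats one trace line like those in Listings 8 and 9.
class TraceFormat {
    static final int EOF = 0, ACTOR = 8, STRING = 15, QUOTED_STRING = 16;

    // Copied from Listing 11; the index is the token kind. Kinds 1-4 are the SKIP characters.
    static final String[] TOKEN_IMAGE = {
        "<EOF>", "\" \"", "\"\\t\"", "\"\\r\"", "\"\\n\"",
        "\"and\"", "\"or\"", "\"title\"", "\"actor\"", "\"director\"", "\"keyword\"",
        "\"(\"", "\")\"", "\"=\"", "\"<>\"", "<STRING>", "<QUOTED_STRING>"
    };

    static String consumed(int kind, String image) {
        String name = TOKEN_IMAGE[kind];
        // Reserved tokens already show their text; pattern tokens also print the matched image.
        boolean showImage = kind != EOF && !name.equals("\"" + image + "\"");
        return "Consumed token: <" + name + (showImage ? ": \"" + image + "\"" : "") + ">";
    }

    public static void main(String[] args) {
        System.out.println(consumed(ACTOR, "actor"));                  // Consumed token: <"actor">
        System.out.println(consumed(QUOTED_STRING, "\"Tom Cruise\"")); // Consumed token: <<QUOTED_STRING>: ""Tom Cruise"">
        System.out.println(consumed(STRING, "drama"));                 // Consumed token: <<STRING>: "drama">
        System.out.println(consumed(EOF, ""));                         // Consumed token: <<EOF>>
    }
}
```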

UQLParserTokenManager.java: This is one scary file. JavaCC uses this class as the tokenizer; it
is the piece of code that determines what the tokens are. The primary routine of interest here
is getNextToken(), which the parser production methods use to determine which path to take.

SimpleCharStream.java: This file is used by UQLParserTokenManager to represent the ASCII stream
of characters to be parsed.

Token.java: The Token class represents the tokens themselves. The next part of this paper will
demonstrate the usefulness of the Token class.

TokenMgrError.java and ParseException.java: These classes represent error conditions in the
tokenizer and parser, respectively.

Stage 2 - Adding behavior to JavaCC code

Note: For this part of the tutorial, refer to the stage2 subdirectory of the code. The JJ file that is
presented from here on is JavaCCPaper/stage2/UQLParser.jj. In order to run your sample SQL
queries, you should also create the MOVIEDB database using the enclosed moviedb.sql file.
Execute the DDL using db2 -tf moviedb.sql.

Now that the parser recognizes the query language, it needs to take action on the individual
expressions. The goal of this stage is to produce a runnable DB2 SQL query that will return what
the user expects.

The process should start with a boilerplate SELECT containing a blank spot where the parser will
fill in the rest. The SELECT template appears in Listing 12. The query produced by the parser may
not be as optimal as a human DBA might write, but it will return the correct results expected by the
end user.

Listing 12. The SELECT statement

SELECT TITLE, DIRECTOR
FROM MOVIE
WHERE MOVIE_ID IN
(
... parser will fill in here ...
);
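The listings in this section do not show where this template is joined with the parser's output. One possible approach, sketched here with a hypothetical SqlTemplate class, is to wrap the generated fragment in a fixed prefix and suffix. The suffix leaves out the trailing semicolon to match the SQL printed by the stage 2 parser in Listing 24; the semicolon in Listing 12 reads as a terminator for interactive tools such as the db2 command line rather than as part of the query.

```java
// Hypothetical helper: wraps the parser's fragment in the SELECT template.
class SqlTemplate {
    static final String PREFIX = "SELECT TITLE, DIRECTOR\nFROM MOVIE\nWHERE MOVIE_ID IN\n(\n";
    static final String SUFFIX = "\n)";

    // fragment: the subqueries and set operators emitted while parsing the user query
    static String wrap(String fragment) {
        return PREFIX + fragment + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(wrap("SELECT MOVIE_ID FROM keyword\nWHERE keyword = 'drama'"));
    }
}
```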

What the parser fills in depends on the path it takes through the grammar. For example, if the
user enters the query from above:

(actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama

then the parser should emit text into the missing portion of the SQL query according to Figure
2. It will echo the parentheses, emit a subquery for each terminal queryTerm, and substitute
INTERSECT for AND and UNION for OR.

Figure 2. Parser output to SQL query

This would ensure that the SQL query emitted for

(actor = "Tom Cruise" or actor = "Kelly McGillis") and keyword=drama

would appear as in Listing 13.

Listing 13. The complete SELECT statement

SELECT TITLE, DIRECTOR
FROM MOVIE
WHERE MOVIE_ID IN
(
(
SELECT MOVIE_ID
FROM ACTOR
WHERE NAME='Tom Cruise'
UNION
SELECT MOVIE_ID
FROM ACTOR
WHERE NAME='Kelly McGillis'
)
INTERSECT
SELECT MOVIE_ID
FROM KEYWORD
WHERE KEYWORD='drama'
);

As mentioned earlier, there are probably shorter, more efficient ways to write this particular
query, but this SQL will produce correct results. In many cases, the DB2 optimizer can make up
much of the performance difference.
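One way to see why INTERSECT and UNION are the right substitutions is to treat each subquery's result as a set of movie IDs. In the sketch below (the class name and the IDs are made up for illustration), retainAll plays the role of INTERSECT and addAll the role of UNION; like those SQL operators without ALL, sets hold no duplicates. It also shows that moving the parentheses changes the answer, as in the Christopher Reeve example earlier.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Sets of movie IDs stand in for the rows each subquery returns.
class SetSemantics {
    static Set<Integer> ids(Integer... values) { return new TreeSet<>(Arrays.asList(values)); }

    // AND -> INTERSECT: keep only IDs present in both sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // OR -> UNION: keep IDs present in either set.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    public static void main(String[] args) {
        // Made-up IDs, for illustration only.
        Set<Integer> tom = ids(1, 2);
        Set<Integer> kelly = ids(1, 3);
        Set<Integer> drama = ids(1, 3, 4);

        // (actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama
        System.out.println(and(or(tom, kelly), drama)); // [1, 3]
        // actor="Tom Cruise" or (actor="Kelly McGillis" and keyword=drama)
        System.out.println(or(tom, and(kelly, drama))); // [1, 2, 3]
    }
}
```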

So, what needs to be added to the JavaCC source to produce this? You must add actions and other
supporting code to the defined grammar. An action is Java code that is executed in response to a
particular production. Before adding actions, first add a method that will return the completed
SQL to the caller. To do this, add a method called getSQL() to the top part of the JavaCC file.
You should also add a private static StringBuffer, sqlSB, to the parser's members. This variable
holds the SQL string built so far at any stage of the parsing; it is static because the generated
production methods are static, as Listing 10 shows. Finally, add some code to your main() test
method to print out and execute the generated SQL query. Listing 14 shows the
PARSER_BEGIN/PARSER_END section of UQLParser.jj.

Listing 14. PARSER_BEGIN/PARSER_END section

PARSER_BEGIN(UQLParser)

package com.demo.stage2;

import java.sql.DriverManager;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.ResultSet;
import java.io.StringReader;
import java.io.Reader;

public class UQLParser {

    // Internal SQL representation, built up as the query is parsed.
    private static StringBuffer sqlSB;

    public UQLParser(String s)
    {
        this((Reader)(new StringReader(s)));
        sqlSB = new StringBuffer();
    }

    public String getSQL()
    {
        return sqlSB.toString();
    }

    public static void main(String args[])
    {
        try
        {
            String query = args[0];
            UQLParser parser = new UQLParser(query);
            parser.parse();
            System.out.println("\nSQL Query: " + parser.getSQL());

            // Note: This code assumes a default connection
            // (current userid and password).
            System.out.println("\nResults of Query");

            Class.forName("COM.ibm.db2.jdbc.app.DB2Driver").newInstance();
            Connection con = DriverManager.getConnection("jdbc:db2:moviedb");
            Statement stmt = con.createStatement();
            ResultSet rs = stmt.executeQuery(parser.getSQL());
            while(rs.next())
            {
                System.out.println("Movie Title = " + rs.getString("title") +
                                   " Director = " + rs.getString("director"));
            }
            rs.close();
            stmt.close();
            con.close();
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}
PARSER_END(UQLParser)

Now fill in the actions taken by the parser, starting with the easy one. When an expression is
being parsed, the parser should emit the word "INTERSECT" whenever it parses "AND" and "UNION"
whenever it parses "OR". To do that, insert self-contained blocks of Java code after the <AND>
and <OR> tokens in the expression production. The code should append INTERSECT or UNION to the
sqlSB StringBuffer. This is illustrated in Listing 15.

Listing 15. Actions taken for expressions

void expression() :
{
}
{ queryTerm()
(
( <AND>
{ sqlSB.append("\nINTERSECT\n"); }
| <OR>
{ sqlSB.append("\nUNION\n"); }
)
queryTerm() )*
}

Several actions need to be taken within the queryTerm() production. These tasks are as follows:

1. Map the search names to their respective DB2 tables and columns.
2. Save the comparator token.
3. Put the comparands into a form that DB2 can understand; for example, remove the double
quotes from QUOTED_STRING tokens.
4. Output the proper subselect into sqlSB.
5. For the recursive expression case, emit the parentheses as is.

For all of these tasks, you will need some local variables. These are defined between the first pair
of braces in the production, as shown in Listing 16.

Listing 16. Local variables for queryTerm()

void queryTerm() :
{
Token tSearchName, tComparator, tComparand;
String sComparand, table, columnName;
}

The first task can be accomplished with the code in Listing 17. Set the proper DB2 table and
column associated with the token encountered.

Listing 17. Mapping search names to DB2

(
<TITLE> {table = "movie";
columnName = "title"; } |
<DIRECTOR> {table = "movie";
columnName = "director"; } |
<KEYWORD> {table = "keyword";
columnName = "keyword"; } |
<ACTOR> {table = "actor";
columnName = "name"; }
)

The second task can be accomplished with the code in Listing 18. Save the token so that it can be
used in the SQL buffer.

Listing 18. Saving the comparator

( tComparator=<EQUALS> |
tComparator=<NOTEQUAL> )

The third task can be accomplished with the code in Listing 19. Set the comparand value
accordingly, stripping the double quotes from the QUOTED_STRING token if necessary.

Listing 19. Preparing the comparands

tComparand=<STRING> {
sComparand = tComparand.image; }
|
tComparand=<QUOTED_STRING>
{ // need to get rid of quotes.
sComparand =
tComparand.image.substring(1,
tComparand.image.length() - 1);
}

The fourth task can be accomplished with the code in Listing 20, which appends the complete query
term to the SQL buffer.

Listing 20. Writing the SQL expressions

{
sqlSB.append("SELECT MOVIE_ID FROM ").append(table);
sqlSB.append("\nWHERE ").append(columnName);
sqlSB.append(" ").append(tComparator.image);
sqlSB.append(" '").append(sComparand).append("'");
}
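The appends in Listing 20 can be isolated into a small, testable method. The hypothetical TermSql helper below mirrors them, with one addition that the article does not make: because QUOTED_STRING accepts any character except a double quote, a comparand such as Schindler's List can contain a single quote, which would end the SQL string literal early. The sketch doubles embedded single quotes, the standard SQL escape; binding values with a java.sql.PreparedStatement would be a more robust alternative.

```java
// Hypothetical helper mirroring Listing 20, plus single-quote escaping (an addition).
class TermSql {
    static String subselect(String table, String column, String comparator, String comparand) {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT MOVIE_ID FROM ").append(table);
        sb.append("\nWHERE ").append(column);
        sb.append(" ").append(comparator);
        // Double any embedded single quote so it stays inside the SQL literal.
        sb.append(" '").append(comparand.replace("'", "''")).append("'");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(subselect("actor", "name", "=", "Tom Cruise"));
        // SELECT MOVIE_ID FROM actor
        // WHERE name = 'Tom Cruise'
        System.out.println(subselect("movie", "title", "<>", "Schindler's List"));
        // SELECT MOVIE_ID FROM movie
        // WHERE title <> 'Schindler''s List'
    }
}
```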

Finally, for the case of recursive expressions, the parser should simply echo parentheses when it
sees them within the expression recursion, as illustrated in Listing 21.

Listing 21. Echoing parentheses

<LPAREN>
{ sqlSB.append("("); }
expression()
<RPAREN>
{ sqlSB.append(")"); }

Listing 22 shows the completed queryTerm() production.

Listing 22. The complete queryTerm() production

/**
 * Query terms may consist of a parenthesized
 * expression or may be a query criterion
 * of the form queryName = something or
 * queryName <> something.
 */
void queryTerm() :
{
    Token tSearchName, tComparator, tComparand;
    String sComparand, table, columnName;
}
{
    (
        <TITLE>    { table = "movie";   columnName = "title"; }
      | <DIRECTOR> { table = "movie";   columnName = "director"; }
      | <KEYWORD>  { table = "keyword"; columnName = "keyword"; }
      | <ACTOR>    { table = "actor";   columnName = "name"; }
    )

    ( tComparator=<EQUALS> | tComparator=<NOTEQUAL> )

    (
        tComparand=<STRING>
        { sComparand = tComparand.image; }
      | tComparand=<QUOTED_STRING>
        { // need to get rid of quotes.
          sComparand = tComparand.image.substring(1,
              tComparand.image.length() - 1);
        }
    )

    {
        sqlSB.append("SELECT MOVIE_ID FROM ").append(table);
        sqlSB.append("\nWHERE ").append(columnName);
        sqlSB.append(" ").append(tComparator.image);
        sqlSB.append(" '").append(sComparand).append("'");
    }
  |
    <LPAREN>
    { sqlSB.append("("); }
    expression()
    <RPAREN>
    { sqlSB.append(")"); }
}

Compile and run UQLParser.jj as before. Open UQLParser.java and note how the actions in your
production rules have been neatly inserted into the generated code. An example of the expanded
expression() method is shown in Listing 23. Note the code after the jj_consume_token calls.

Listing 23. expression() method in UQLParser.java

/**
 * An expression is defined to be a queryTerm followed by zero
 * or more query terms joined by either an AND or an OR. If two
 * query terms are joined with AND then both conditions must
 * be met. If two query terms are joined with an OR, then
 * one of the two conditions must be met.
 */
static final public void expression() throws ParseException {
  trace_call("expression");
  try {
    queryTerm();
    label_1:
    while (true) {
      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
      case AND:
      case OR:
        ;
        break;
      default:
        jj_la1[0] = jj_gen;
        break label_1;
      }
      switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
      case AND:
        jj_consume_token(AND);
        sqlSB.append("\nINTERSECT\n");
        break;
      case OR:
        jj_consume_token(OR);
        sqlSB.append("\nUNION\n");
        break;
      default:
        jj_la1[1] = jj_gen;
        jj_consume_token(-1);
        throw new ParseException();
      }
      queryTerm();
    }
  } finally {
    trace_return("expression");
  }
}
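To see what the generated expression() and queryTerm() methods are doing, it can help to write the same grammar by hand. The following is a minimal recursive-descent sketch, not JavaCC output: HandUqlTranslator, its regex-based tokenizer, and its error messages are hypothetical, and keywords are assumed to be case-insensitive. The while loop in expression() plays the role of the generated label_1 loop, and each method appends SQL at the same points where the JavaCC actions do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written counterpart of the generated parse()/expression()/queryTerm().
// Keywords are matched case-insensitively (an assumption based on Listing 24).
class HandUqlTranslator {
    // "(", ")", "=", "<>", a double-quoted string (quotes kept), or a bare word.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*(\\(|\\)|=|<>|\"[^\"]*\"|[^\\s()=<\"]+)");

    private final List<String> tokens = new ArrayList<>();
    private final StringBuilder sql = new StringBuilder();
    private int pos = 0;

    HandUqlTranslator(String input) {
        Matcher m = TOKEN.matcher(input);
        int i = 0;
        while (i < input.length()) {
            m.region(i, input.length());
            if (!m.lookingAt()) {
                if (input.substring(i).trim().isEmpty()) break;   // only trailing whitespace left
                throw new IllegalArgumentException("Cannot tokenize input at position " + i);
            }
            tokens.add(m.group(1));
            i = m.end();
        }
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    private String next(String expected) {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Expected " + expected + " but input ended");
        return tokens.get(pos++);
    }

    // parse(): one expression followed by end of input.
    String translate() {
        expression();
        if (peek() != null) throw new IllegalArgumentException("Unexpected token: " + peek());
        return sql.toString();
    }

    // expression(): a queryTerm, then a loop over AND/OR like the generated label_1 loop.
    private void expression() {
        queryTerm();
        while (true) {
            String t = peek();
            if ("and".equalsIgnoreCase(t)) sql.append("\nINTERSECT\n");
            else if ("or".equalsIgnoreCase(t)) sql.append("\nUNION\n");
            else break;   // like the default branch that leaves label_1
            pos++;        // consume the AND/OR token
            queryTerm();
        }
    }

    // queryTerm(): field (= | <>) value, or a parenthesized expression.
    private void queryTerm() {
        String t = next("a search field or '('");
        if (t.equals("(")) {
            sql.append("(");
            expression();
            if (!next("')'").equals(")")) throw new IllegalArgumentException("Expected ')'");
            sql.append(")");
            return;
        }
        String table, column;
        switch (t.toLowerCase()) {
            case "title":    table = "movie";   column = "title";    break;
            case "director": table = "movie";   column = "director"; break;
            case "keyword":  table = "keyword"; column = "keyword";  break;
            case "actor":    table = "actor";   column = "name";     break;
            default: throw new IllegalArgumentException("Unknown search field: " + t);
        }
        String op = next("'=' or '<>'");
        if (!op.equals("=") && !op.equals("<>")) throw new IllegalArgumentException("Expected '=' or '<>', found " + op);
        String value = next("a value");
        if (value.startsWith("\"")) {
            value = value.substring(1, value.length() - 1);   // strip quotes, as the QUOTED_STRING action does
        } else if (value.equals("(") || value.equals(")") || value.equals("=") || value.equals("<>")) {
            throw new IllegalArgumentException("Expected a value, found " + value);
        }
        sql.append("SELECT MOVIE_ID FROM ").append(table)
           .append("\nWHERE ").append(column).append(" ").append(op)
           .append(" '").append(value.replace("'", "''")).append("'");   // double embedded quotes
    }

    public static void main(String[] args) {
        String q = args.length > 0 ? args[0]
                : "(actor=\"Tom Cruise\" or actor=\"Kelly McGillis\") and keyword=drama";
        System.out.println(new HandUqlTranslator(q).translate());
    }
}
```

Given the query (actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama, this sketch emits the same UNION/INTERSECT fragment that appears inside MOVIE_ID IN (...) in Listing 24, and it rejects an incomplete query such as title= with an exception, roughly where the generated parser would throw a ParseException. It also doubles any single quote inside a value, so a name such as O'Toole still yields a valid SQL literal.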

Run this code as before; this time you must also include db2java.zip in the CLASSPATH (the example
uses the Windows ; path separator). When you run

java -cp bin;c:/sqllib/db2java.zip com.demo.stage2.UQLParser "(actor=\"Tom Cruise\" or actor=\"Kelly McGillis\") and keyword=drama"

it should produce the output in Listing 24.

Listing 24. Output from UQLParser, query (actor="Tom Cruise" or actor="Kelly McGillis") and keyword=drama
Call: parse
Call: expression
Call: queryTerm
Consumed token: <"(">
Call: expression
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>: ""Tom Cruise"">
Return: queryTerm
Consumed token: <"or">
Call: queryTerm
Consumed token: <"actor">
Consumed token: <"=">
Consumed token: <<QUOTED_STRING>: ""Kelly McGillis"">
Return: queryTerm
Return: expression
Consumed token: <")">
Return: queryTerm
Consumed token: <"and">
Call: queryTerm
Consumed token: <"keyword">
Consumed token: <"=">
Consumed token: <<STRING>: "drama">
Return: queryTerm
Return: expression
Consumed token: <<EOF>>
Return: parse

SQL Query: SELECT TITLE,DIRECTOR
FROM MOVIE
WHERE MOVIE_ID IN (
(SELECT MOVIE_ID FROM actor
WHERE name = 'Tom Cruise'
UNION
SELECT MOVIE_ID FROM actor
WHERE name = 'Kelly McGillis')
INTERSECT
SELECT MOVIE_ID FROM keyword
WHERE keyword = 'drama')

Results of Query
Movie Title = Top Gun Director = Tony Scott
Movie Title = Witness Director = Peter Weir
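The output shows that the parser wraps the translated fragment in an outer query against the MOVIE table and then runs it, which is why the DB2 JDBC driver in db2java.zip has to be on the CLASSPATH. The sketch below shows one way to do that with plain JDBC. MovieQueryRunner, wrap(), and run() are hypothetical names rather than the article's code, and the JDBC URL is an assumption you supply on the command line; older drivers such as the one in db2java.zip may also need to be registered with Class.forName before DriverManager can find them.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: wraps a translated UQL fragment in the outer query shown in
// Listing 24 and runs it over JDBC. The JDBC URL is supplied by the caller (assumption).
class MovieQueryRunner {

    // Builds "SELECT TITLE,DIRECTOR FROM MOVIE WHERE MOVIE_ID IN ( ... )" around the fragment.
    static String wrap(String fragment) {
        return "SELECT TITLE,DIRECTOR\nFROM MOVIE\nWHERE MOVIE_ID IN (\n" + fragment + ")";
    }

    // Executes the wrapped query and prints each row in the format of Listing 24.
    static void run(Connection conn, String fragment) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(wrap(fragment))) {
            System.out.println("Results of Query");
            while (rs.next()) {
                System.out.println("Movie Title = " + rs.getString("TITLE")
                        + " Director = " + rs.getString("DIRECTOR"));
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        String fragment = "SELECT MOVIE_ID FROM keyword\nWHERE keyword = 'drama'";
        System.out.println("SQL Query: " + wrap(fragment));
        if (args.length > 0) {
            // args[0]: a JDBC URL for your database; the driver must be on the CLASSPATH
            try (Connection conn = DriverManager.getConnection(args[0])) {
                run(conn, fragment);
            }
        }
    }
}
```

Without arguments the program only prints the wrapped SQL, so you can check the query text before pointing it at a real database.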

Try a few more queries to get used to your parser. Try a query using the NOTEQUAL token, as in
actor<>"Harrison Ford". Try some illegal queries like "title=" to see what happens. With very few
lines of JavaCC code, you have produced a very effective end-user query language.
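One subtlety is worth checking when you try NOTEQUAL. Because queryTerm() turns actor<>"Harrison Ford" into SELECT MOVIE_ID FROM actor WHERE name <> 'Harrison Ford', the test is applied to each actor row, so any movie with at least one other actor in the table matches, including movies Harrison Ford appears in. The in-memory sketch below contrasts that row-wise filter with a true "movies without this actor" set difference; NotEqualDemo and its rows are hypothetical sample data. In SQL, such a set difference could be written with EXCEPT, for example SELECT MOVIE_ID FROM movie EXCEPT SELECT MOVIE_ID FROM actor WHERE name = 'Harrison Ford'.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo of row-wise "<>" versus a set difference, using made-up actor rows.
class NotEqualDemo {
    // One row of a hypothetical actor table: (MOVIE_ID, name).
    static final class ActorRow {
        final int movieId;
        final String name;
        ActorRow(int movieId, String name) { this.movieId = movieId; this.name = name; }
    }

    // What "SELECT MOVIE_ID FROM actor WHERE name <> ?" returns: every movie with some other actor.
    static Set<Integer> rowWiseNotEqual(List<ActorRow> rows, String name) {
        Set<Integer> ids = new LinkedHashSet<>();
        for (ActorRow r : rows) {
            if (!r.name.equals(name)) ids.add(r.movieId);
        }
        return ids;
    }

    // Movies with no row for the name at all: all movie ids minus those that match.
    static Set<Integer> moviesWithout(Set<Integer> allMovieIds, List<ActorRow> rows, String name) {
        Set<Integer> ids = new LinkedHashSet<>(allMovieIds);
        for (ActorRow r : rows) {
            if (r.name.equals(name)) ids.remove(r.movieId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<ActorRow> rows = Arrays.asList(
                new ActorRow(1, "Harrison Ford"), new ActorRow(1, "Kelly McGillis"),
                new ActorRow(2, "Tom Cruise"));
        Set<Integer> all = new LinkedHashSet<>(Arrays.asList(1, 2));
        System.out.println(rowWiseNotEqual(rows, "Harrison Ford"));    // prints [1, 2]
        System.out.println(moviesWithout(all, rows, "Harrison Ford")); // prints [2]
    }
}
```

Whether the row-wise behavior is acceptable depends on what your end users expect NOTEQUAL to mean; if they expect exclusion, the grammar action would need to emit a different query shape for that token.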

Final Points to Consider

JavaCC, in addition to providing the parser builder, also provides the JJDOC tool for documenting
your grammar in Backus-Naur Form (BNF). JJDOC makes it easy to give your end users a description
of the language they are using. The Ant build file provided with the accompanying code has a
"bnfdoc" target as an example.

JavaCC also provides a tool called JJTree. This tool provides tree and node classes that make it
easy for you to partition your code into separate parsing and action classes. Moving forward with
this example, you might consider writing a modest optimizer for your query to eliminate unnecessary
INTERSECTs and UNIONs. You could accomplish this by visiting the nodes of the parse tree and
consolidating similar adjacent nodes (actor="Tom Cruise" and actor="Kelly
McGillis", for example).
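As a rough illustration of that consolidation step, the sketch below uses a small hand-rolled tree rather than the node classes JJTree would generate; QNode, EqTerm, OrNode, and UqlOptimizer are hypothetical names. It handles only the OR case, where equality terms on the same table and column can be folded into a single IN list. Merging AND-ed terms on the actor table, as in the example above, would need a different rewrite (for instance GROUP BY MOVIE_ID with a HAVING COUNT(DISTINCT name) condition), which is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query-tree node; JJTree would generate its own node classes instead.
abstract class QNode {
    abstract String toSql();
}

// An equality test on one column; more than one value renders as an IN list.
class EqTerm extends QNode {
    final String table;
    final String column;
    final List<String> values = new ArrayList<>();

    EqTerm(String table, String column, String... values) {
        this.table = table;
        this.column = column;
        for (String v : values) this.values.add(v);
    }

    @Override
    String toSql() {
        StringBuilder sb = new StringBuilder("SELECT MOVIE_ID FROM " + table + "\nWHERE " + column);
        if (values.size() == 1) {
            sb.append(" = ").append(quote(values.get(0)));
        } else {
            sb.append(" IN (");
            for (int i = 0; i < values.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(quote(values.get(i)));
            }
            sb.append(")");
        }
        return sb.toString();
    }

    private static String quote(String v) { return "'" + v.replace("'", "''") + "'"; }
}

// Children joined by OR, rendered with UNION inside parentheses.
class OrNode extends QNode {
    final List<QNode> children = new ArrayList<>();

    OrNode(QNode... kids) { for (QNode k : kids) children.add(k); }

    @Override
    String toSql() {
        if (children.size() == 1) return children.get(0).toSql();
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append("\nUNION\n");
            sb.append(children.get(i).toSql());
        }
        return sb.append(")").toString();
    }
}

class UqlOptimizer {
    // Folds OR-ed equality terms on the same table and column into one IN term.
    // Merged terms are placed first; UNION is order-insensitive, so the result set is unchanged.
    static QNode mergeOr(OrNode or) {
        Map<String, EqTerm> merged = new LinkedHashMap<>();
        List<QNode> others = new ArrayList<>();
        for (QNode child : or.children) {
            if (child instanceof EqTerm) {
                EqTerm t = (EqTerm) child;
                String key = t.table + "." + t.column;
                EqTerm target = merged.get(key);
                if (target == null) {
                    target = new EqTerm(t.table, t.column);
                    merged.put(key, target);
                }
                target.values.addAll(t.values);
            } else {
                others.add(child);
            }
        }
        OrNode result = new OrNode();
        result.children.addAll(merged.values());
        result.children.addAll(others);
        return result.children.size() == 1 ? result.children.get(0) : result;
    }

    public static void main(String[] args) {
        OrNode or = new OrNode(new EqTerm("actor", "name", "Tom Cruise"),
                               new EqTerm("actor", "name", "Kelly McGillis"));
        System.out.println(or.toSql());            // two SELECTs joined by UNION
        System.out.println(mergeOr(or).toSql());   // one SELECT with an IN list
    }
}
```

A visitor over JJTree's generated nodes could apply the same rule bottom-up; this version keeps the tree in plain classes so the rewrite itself is easy to follow.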

JavaCC has a rich library of grammars. Before writing a parser yourself, be sure to look in
JavaCC's examples directory for a possible ready-built solution.

Be sure to read the Frequently Asked Questions on the JavaCC Web page and visit the javacc
newsgroup at comp.compilers.tools.javacc to get a better understanding of all the capabilities and
features of JavaCC.

Conclusion
JavaCC is a robust tool that can be used to define grammars and easily incorporate the parsing
and exception handling for that grammar into your Java business application. With this article, we
have demonstrated that JavaCC can be used to provide a powerful, yet simple query language for
end users of a database system.

Downloadable resources
Name               Size
JavaCCPaper.zip    8.47KB

Related topics
• YACC: Yet Another Compiler Compiler. The very first widely available "compiler compiler",
described in a paper by Stephen C. Johnson, AT&T Bell Laboratories, Murray Hill, New Jersey 07974.
• Build your own Languages with JavaCC: a very good JavaWorld article by Oliver Enseling.

