CSE340 Summer 2016 Project 2: Parsing: 1. Lexical Specification
1. Lexical Specification
Here is the list of tokens that your lexical analyzer needs to support:
PUBLIC = "public"
PRIVATE = "private"
EQUAL = "="
COLON = ":"
COMMA = ","
SEMICOLON = ";"
LBRACE = "{"
RBRACE = "}"
ID = letter (letter + digit)*
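The token list above can be read as a simple classification rule. The following is a minimal sketch, not the required lexer: `classify_lexeme` is a hypothetical helper name, it assumes "letter" means an ASCII letter and "digit" a decimal digit, and it reads the `+` in the ID rule as alternation (letter or digit).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the token name for a complete lexeme,
// or "ERROR" if the lexeme matches none of the tokens listed above.
std::string classify_lexeme(const std::string& s) {
    // Keywords are checked before the ID rule so they are not reported as IDs.
    if (s == "public")  return "PUBLIC";
    if (s == "private") return "PRIVATE";
    if (s == "=") return "EQUAL";
    if (s == ":") return "COLON";
    if (s == ",") return "COMMA";
    if (s == ";") return "SEMICOLON";
    if (s == "{") return "LBRACE";
    if (s == "}") return "RBRACE";

    // ID = letter (letter + digit)*  (assumption: ASCII letters and digits only)
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0]))) {
        return "ERROR";
    }
    for (unsigned char c : s) {
        if (!std::isalnum(c)) return "ERROR";
    }
    return "ID";
}
```

Under these assumptions, `publicx` is an ID rather than a keyword followed by `x`, because the whole lexeme is classified at once.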
2. Grammar
Here is the grammar for our input language:
comments.
We highlight some of the syntactical elements of the language:
Global variables are optional
The scopes have optional public and private variables
Every scope has a body which is a list of statements
A statement can be either a simple assignment or another scope (a nested scope)
Here is the example program from the previous section, with all name references resolved (look
at the comments):
a, b, c;
test {
    public:
        a, b, hello;
    private:
        x, y;
    a = b; // test.a = test.b
    hello = c; // test.hello = ::c
    y = r; // test.y = ?.r
    nested {
        public:
            b;
        a = b; // test.a = nested.b
        x = hello; // ?.x = test.hello
        c = y; // ::c = ?.y
    }
}
4. Examples
The simplest possible program would be:
main {
    a = a; // ?.a = ?.a
}
Let's add a global variable:
a;
main {
    a = a; // ::a = ::a
}
Or a private a:
a;
main {
    private:
        a;
    a = a; // main.a = main.a
}
And a public b:
a, b;
main {
    public: b;
    private: a;
    nested {
        a = b; // ::a = main.b
    }
}
You can find more examples by looking at the test cases and their expected outputs.
5. Expected Output
There are two cases:
In case the input does not follow the grammar, the expected output is:
Syntax Error
NOTE: no extra information is needed here! Also, notice that we need the exact
message and it's case-sensitive.
In case the input follows the grammar:
For every assignment statement in the input program, in order of appearance, output the
following information:
The resolved left-hand-side of the assignment
The resolved right-hand-side of the assignment
in the following format:
resolved_lhs = resolved_rhs
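NOTE: You can assume that scopes have unique names and that variable names within a
single scope (public and private) are not repeated.
For example, the expected output for the earlier example program with the scopes test and
nested is: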
test.a = test.b
test.hello = ::c
test.y = ?.r
test.a = nested.b
?.x = test.hello
::c = ?.y
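The three output shapes seen above (scope.name, ::name for globals, ?.name for unresolved references) can be produced by one small formatting routine. This is a sketch under assumptions: the `ResolvedName` struct and both function names are hypothetical, and an empty scope name is used here to stand for the global scope.

```cpp
#include <string>

// Hypothetical result of resolving one variable reference.
// found == false : no visible declaration, printed as "?.name"
// scope empty    : declared as a global, printed as "::name"
// otherwise      : printed as "scope.name"
struct ResolvedName {
    bool found;
    std::string scope;
    std::string name;
};

std::string format_resolved(const ResolvedName& r) {
    if (!r.found) return "?." + r.name;
    if (r.scope.empty()) return "::" + r.name;
    return r.scope + "." + r.name;
}

// Builds one output line in the "resolved_lhs = resolved_rhs" format.
std::string format_assignment(const ResolvedName& lhs, const ResolvedName& rhs) {
    return format_resolved(lhs) + " = " + format_resolved(rhs);
}
```

Keeping the formatting separate from the lookup makes it easier to check the exact spelling of each line against the expected output.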
6. Implementation
Start by modifying the lexical analyzer from the previous project to make it recognize the
tokens required for parsing this grammar. It should also be able to handle comments
(skip them like spaces). NOTE: make sure you remove the tokens that are not used
in this grammar from your lexer, otherwise you might not be able to pass all test
cases. Your TokenType type declaration should look like this:
typedef enum {
    END_OF_FILE = 0,
    PUBLIC, PRIVATE,
    EQUAL, COLON, COMMA, SEMICOLON,
    LBRACE, RBRACE, ID, ERROR
} TokenType;
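Skipping comments "like spaces" could look roughly like the sketch below. It assumes comments are `//` line comments, as in the example programs, and it works on an in-memory string rather than on the lexer from the previous project, whose interface is not shown here. The function name `skip_space_and_comments` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the index of the first character at or after `pos` that is
// neither whitespace nor part of a `//` comment. Returns src.size()
// if only whitespace and comments remain.
std::size_t skip_space_and_comments(const std::string& src, std::size_t pos) {
    while (pos < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[pos]))) {
            ++pos;  // plain whitespace
        } else if (src.compare(pos, 2, "//") == 0) {
            // Comment: consume up to (not including) the newline;
            // the newline is then skipped as whitespace on the next pass.
            while (pos < src.size() && src[pos] != '\n') ++pos;
        } else {
            break;  // start of a real token (a lone '/' is left for the lexer)
        }
    }
    return pos;
}
```

A lexer built this way would call the function before scanning each token, so that comments never reach the parser.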
Next, write a parser for the given grammar. You will need one function per non-terminal of
the grammar to handle parsing of that non-terminal. I suggest you use the
following signature for these functions:
void parse_X()
Where X would be replaced by the target non-terminal. The lexical analyzer object
needs to be accessible to these functions so that they can use the lexer to get and unget
tokens. These functions can be member functions of a class, and the lexer object can be
a member variable of that class.
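As an illustration of the one-function-per-non-terminal structure, here is a minimal sketch for a single rule. Everything in it is an assumption: the rule `var_list -> ID | ID COMMA var_list` is only inferred from declarations such as `a, b, c;`, the `Parser` class reads from a pre-tokenized vector instead of a real lexer object, and it throws an exception where the actual program would call syntax_error(). It also returns the parsed names instead of using the suggested `void` signature, purely so the result can be inspected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

typedef enum { END_OF_FILE = 0, PUBLIC, PRIVATE, EQUAL, COLON, COMMA,
               SEMICOLON, LBRACE, RBRACE, ID, ERROR } TokenType;

struct Token {
    TokenType type;
    std::string lexeme;
};

// Hypothetical parser skeleton holding its token source as a member.
class Parser {
public:
    explicit Parser(std::vector<Token> tokens)
        : tokens_(std::move(tokens)), pos_(0) {}

    // var_list -> ID | ID COMMA var_list   (assumed rule, see lead-in)
    std::vector<std::string> parse_var_list() {
        std::vector<std::string> names;
        names.push_back(expect(ID).lexeme);
        while (peek().type == COMMA) {
            expect(COMMA);
            names.push_back(expect(ID).lexeme);
        }
        return names;
    }

private:
    // Looks at the next token without consuming it.
    Token peek() const {
        if (pos_ < tokens_.size()) return tokens_[pos_];
        return Token{END_OF_FILE, ""};
    }

    // Consumes the next token if it has the expected type, otherwise fails.
    Token expect(TokenType t) {
        Token tok = peek();
        if (tok.type != t) throw std::runtime_error("Syntax Error");
        ++pos_;
        return tok;
    }

    std::vector<Token> tokens_;
    std::size_t pos_;
};
```

The `peek`/`expect` pair plays the role of getting and ungetting tokens: a function only consumes a token once it knows the token belongs to its rule.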
You also need a syntax_error function that prints the proper message and
terminates the program:
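#include <cstdlib>   // exit
#include <iostream>  // cout
using namespace std;

// Prints the exact expected message and terminates the program.
void syntax_error()
{
    cout << "Syntax Error\n";
    exit(1);
}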
Test your parser thoroughly. Make sure it can detect any syntactical errors.
Next, write a symbol table that stores information about scopes and variables. You
would also need to store assignments in a list to be accessed after parsing is finished.
You need to think about how to organize all this information in a way that is useful for
producing the required output.
Write a function that resolves the left-hand-side and right-hand-side of all assignments
and produces the required output. Call this function in your main() function after
successfully parsing the input.
NOTE: you might need more time to finish the last step compared to previous steps.
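One possible shape for the symbol table and the lookup is sketched below. The visibility rule it implements is inferred from the commented examples and is an assumption: a scope sees its own public and private variables, but only the public variables of enclosing scopes, with globals treated as public variables of an outermost unnamed scope. The `Scope` struct and `find_declaring_scope` are hypothetical names.

```cpp
#include <set>
#include <string>

// Hypothetical symbol-table node. The global scope is modeled as a Scope
// with an empty name and parent == nullptr, holding the globals in `pub`.
struct Scope {
    std::string name;
    const Scope* parent;
    std::set<std::string> pub;   // public variables
    std::set<std::string> priv;  // private variables
};

// Returns the scope whose declaration of `var` is visible from `current`,
// or nullptr if the reference cannot be resolved.
const Scope* find_declaring_scope(const Scope* current, const std::string& var) {
    // The scope containing the statement sees all of its own variables.
    if (current->pub.count(var) || current->priv.count(var)) {
        return current;
    }
    // Enclosing scopes, innermost first, expose only their public variables.
    for (const Scope* s = current->parent; s != nullptr; s = s->parent) {
        if (s->pub.count(var)) return s;
    }
    return nullptr;
}
```

With this layout, a null result corresponds to the `?.` prefix in the output, the scope with the empty name to `::`, and any other scope to `name.`. Each stored assignment would need to remember the scope it appeared in so the lookup can start from there.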
7. Requirements
You should use C/C++; no other programming languages are allowed.
You should test your code on CentOS 6.7.
You should submit your code on the course submission website; no other forms of
submission will be accepted.
8. Evaluation
The submissions are evaluated based on the automated test cases on the submission website.
Your grade will be proportional to the number of test cases passing. If your code does not
compile on the submission website, you will not receive any points.
Here is the breakdown of points for tests in different categories:
Parsing (including inputs with syntax errors and error-free inputs): 50 points
Name resolution: 50 points
Test cases containing comments: 10 points extra credit
The parsing test cases contain cases that are syntactically correct and cases that have syntax
errors. If a syntax test case has no syntax error, your program passes the test case if the output
is not Syntax Error. If a syntax test case has a syntax error, your program passes the test
case if the output is Syntax Error.
Note that if your program prints the syntax error message independently of the input, for
example:
int main()
{
    cout << "Syntax Error\n";
    return 0;
}
it will pass some of the test cases, but you will not receive any points.