
This document summarizes how to develop a recursive descent parser systematically from a grammar. It covers: 1. Expressing the grammar in EBNF and transforming it by left factorization and left-recursion elimination. 2. Creating a parser class with methods that accept tokens from a scanner, plus a public parse method. 3. Implementing one private parsing method per grammar rule, each of which chooses its parsing path by examining the kind of the current token. 4. An algorithm that generates these parsing methods mechanically by rewriting EBNF rules. For the parser to work correctly, the grammar must be LL(1).

Uploaded by anithasudha

Course Overview

PART I: overview material


1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
Syntax Analysis (Chapter 4) 1
Systematic Development of a Recursive Descent Parser
(1) Express the grammar in EBNF
(2) Grammar transformations:
    left factorization and left-recursion elimination
(3) Create a parser class with
    – a private variable currentToken
    – methods to call the scanner: accept and acceptIt
(4) Implement a public method for the main function to call:
    – a public parse method that
      • fetches the first token from the scanner
      • calls parseS (where S is the start symbol of the grammar)
      • verifies that the next token produced by the scanner is the end-of-file token
(5) Implement private parsing methods:
    – add a private parseN method for each non-terminal N
Developing an RD Parser for Mini Triangle
Before we begin:
• The following non-terminals are recognized by the scanner
• Identifiers, integer literals and operators are returned as tokens;
  comments are recognized and discarded
Identifier ::= Letter (Letter|Digit)*
Integer-Literal ::= Digit Digit*
Operator ::= + | - | * | / | < | > | =
Comment ::= ! Graphic* eol
Assume the scanner returns instances of this class:
public class Token {
    byte kind;          // which kind of token (one of the constants below)
    String spelling;    // the characters of the token as written in the source

    final static byte
        IDENTIFIER = 0,
        INTLITERAL = 1;
    ...
}
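The lexical rules above can be sketched as a small hand-written scanner. This is an illustration only: the class name MiniScanner, the Kind enum (including BECOMES for := and IS for ~, echoing the Token.IS used later) and the end-of-text kind EOT are assumptions. Keywords such as if are returned as IDENTIFIER here; a real Mini-Triangle scanner would distinguish them, for example by looking the spelling up in a keyword table.

```java
import java.util.ArrayList;
import java.util.List;

class MiniScanner {
    // Token kinds for this sketch only (hypothetical; the slides elide the full list)
    enum Kind { IDENTIFIER, INTLITERAL, OPERATOR, BECOMES, COLON, SEMICOLON, LPAREN, RPAREN, IS, EOT }

    static final class Tok {
        final Kind kind;
        final String spelling;
        Tok(Kind kind, String spelling) { this.kind = kind; this.spelling = spelling; }
        @Override public String toString() { return kind + ":" + spelling; }
    }

    static List<Tok> scan(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '!') {                            // Comment ::= ! Graphic* eol (discarded)
                while (i < src.length() && src.charAt(i) != '\n') i++;
                continue;
            }
            int start = i;
            if (Character.isLetter(c)) {               // Identifier ::= Letter (Letter|Digit)*
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                out.add(new Tok(Kind.IDENTIFIER, src.substring(start, i)));
            } else if (Character.isDigit(c)) {         // Integer-Literal ::= Digit Digit*
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add(new Tok(Kind.INTLITERAL, src.substring(start, i)));
            } else if ("+-*/<>=".indexOf(c) >= 0) {    // Operator ::= + | - | * | / | < | > | =
                i++;
                out.add(new Tok(Kind.OPERATOR, String.valueOf(c)));
            } else if (c == ':') {                     // either ":=" or ":"
                boolean becomes = i + 1 < src.length() && src.charAt(i + 1) == '=';
                i += becomes ? 2 : 1;
                out.add(becomes ? new Tok(Kind.BECOMES, ":=") : new Tok(Kind.COLON, ":"));
            } else {
                Kind k;
                switch (c) {
                    case ';': k = Kind.SEMICOLON; break;
                    case '(': k = Kind.LPAREN; break;
                    case ')': k = Kind.RPAREN; break;
                    case '~': k = Kind.IS; break;
                    default: throw new IllegalArgumentException("unexpected character: " + c);
                }
                i++;
                out.add(new Tok(k, String.valueOf(c)));
            }
        }
        out.add(new Tok(Kind.EOT, ""));                // end-of-text marker
        return out;
    }
}
```

For example, scanning "x := 42 ! comment" followed by a newline and "; y := x+1" yields IDENTIFIER x, BECOMES, INTLITERAL 42, SEMICOLON, IDENTIFIER y, BECOMES, IDENTIFIER x, OPERATOR +, INTLITERAL 1 and finally EOT; the comment produces no token.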
(1)&(2) Developing an RD Parser for Mini Triangle

Program ::= single-Command
Command ::= single-Command
          | Command ; single-Command          [left-recursion elimination needed]
single-Command                                [left factorization needed]
          ::= V-name := Expression
          | Identifier ( Expression )
          | if Expression then single-Command
               else single-Command
          | while Expression do single-Command
          | let Declaration in single-Command
          | begin Command end
V-name ::= Identifier
...


(1)&(2) Express grammar in EBNF and transform

After left factorization (with substitution of V-name) and left-recursion elimination, we get:


Program ::= single-Command
Command ::= single-Command (; single-Command)*
single-Command
::= Identifier
( := Expression | ( Expression ) )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
V-name ::= Identifier
...



(1)&(2) Developing an RD Parser for Mini Triangle
Expression                                    [left-recursion elimination needed]
    ::= primary-Expression
      | Expression Operator primary-Expression
primary-Expression
    ::= Integer-Literal
      | V-name
      | Operator primary-Expression
      | ( Expression )
Declaration                                   [left-recursion elimination needed]
    ::= single-Declaration
      | Declaration ; single-Declaration
single-Declaration
    ::= const Identifier ~ Expression
      | var Identifier : Type-denoter
Type-denoter ::= Identifier
(1)&(2) Express grammar in EBNF and transform
After left factorization and left-recursion elimination:
Expression
::= primary-Expression
( Operator primary-Expression )*
primary-Expression
::= Integer-Literal
| Identifier
| Operator primary-Expression
| ( Expression )
Declaration
::= single-Declaration (; single-Declaration)*
single-Declaration
::= const Identifier ~ Expression
| var Identifier : Type-denoter
Type-denoter ::= Identifier
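A minimal runnable sketch of the transformed Expression rules, written in the style of the parser class on the next slide. Assumptions not taken from the slides: tokens are given as a space-separated string of spellings (so no scanner is needed), the current token is a String rather than a Token, a syntax error is signalled by throwing IllegalStateException, and the hypothetical marker "<eot>" denotes the end of the input. Because identifiers and literals are not fixed spellings, an if-chain stands in for the switch on currentToken.kind.

```java
import java.util.Set;

// Expression         ::= primary-Expression ( Operator primary-Expression )*
// primary-Expression ::= Integer-Literal | Identifier
//                      | Operator primary-Expression | ( Expression )
class ExpressionParser {
    static final String EOT = "<eot>";                 // hypothetical end-of-text marker
    static final Set<String> OPERATORS = Set.of("+", "-", "*", "/", "<", ">", "=");

    private final String[] tokens;   // token spellings (input assumed space-separated)
    private int next = 0;
    private String currentToken;

    ExpressionParser(String input) {
        String t = input.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    private void acceptIt() {
        currentToken = next < tokens.length ? tokens[next++] : EOT;
    }

    private void accept(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        acceptIt();
    }

    private static boolean isIdentifier(String s) { return s.matches("[A-Za-z][A-Za-z0-9]*"); }
    private static boolean isIntLiteral(String s) { return s.matches("[0-9]+"); }

    // Fetch the first token, parse the start symbol, then require end of text
    boolean parse() {
        try {
            acceptIt();
            parseExpression();
            return currentToken.equals(EOT);
        } catch (IllegalStateException e) {
            return false;                                // a syntax error was reported
        }
    }

    private void parseExpression() {
        parsePrimaryExpression();
        while (OPERATORS.contains(currentToken)) {       // starters[Operator primary-Expression]
            acceptIt();                                  // the Operator has just been checked
            parsePrimaryExpression();
        }
    }

    private void parsePrimaryExpression() {
        if (isIntLiteral(currentToken) || isIdentifier(currentToken)) {
            acceptIt();
        } else if (OPERATORS.contains(currentToken)) {   // Operator primary-Expression
            acceptIt();
            parsePrimaryExpression();
        } else if (currentToken.equals("(")) {           // ( Expression )
            acceptIt();
            parseExpression();
            accept(")");
        } else {
            throw new IllegalStateException("syntax error at " + currentToken);
        }
    }
}
```

With this sketch, "a + ( 1 * - b )" is accepted, while "a +" and "( a" are rejected.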
(3)&(4) Create a parser class and public parse method
public class Parser {
    private Scanner scanner;        // the course's scanner class (not java.util.Scanner)
    private Token currentToken;     // the one-token lookahead

    private void accept (byte expectedKind) {
        if (currentToken.kind == expectedKind)
            currentToken = scanner.scan( );
        else
            reportSyntaxError( );    // hypothetical error-reporting helper
    }

    private void acceptIt( ) {       // accept the current token unconditionally
        currentToken = scanner.scan( );
    }

    public void parse( ) {
        acceptIt( );                 // get the first token
        parseProgram( );             // Program is the start symbol
        if (currentToken.kind != Token.EOT)
            reportSyntaxError( );
    }
    ...
(5) Implement private parsing methods
Program ::= single-Command

private void parseProgram( ) {
    parseSingleCommand( );
}


(5) Implement private parsing methods
single-Command
    ::= Identifier
        ( := Expression | ( Expression ) )
      | if Expression then single-Command
           else single-Command
      | ... other alternatives ...

private void parseSingleCommand( ) {
    switch (currentToken.kind) {
    case Token.IDENTIFIER: ...
    case Token.IF: ...
    ... other cases ...
    default: report a syntax error
    }
}
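A runnable sketch that fills in part of the parseSingleCommand skeleton: the left-factored Identifier alternative plus the if and while alternatives. To keep it self-contained it assumes things the slides do not: tokens are a space-separated string of spellings, the current token is a String, syntax errors throw IllegalStateException, the hypothetical marker "<eot>" ends the input, and Expression is cut down to a single Identifier or Integer-Literal. A Java switch on the spelling stands in for the switch on currentToken.kind.

```java
import java.util.Set;

// single-Command ::= Identifier ( := Expression | ( Expression ) )
//                  | if Expression then single-Command else single-Command
//                  | while Expression do single-Command
// Simplification (assumption): Expression ::= Identifier | Integer-Literal
class SingleCommandParser {
    static final String EOT = "<eot>";                  // hypothetical end-of-text marker
    static final Set<String> KEYWORDS = Set.of(
        "begin", "const", "do", "else", "end", "if", "in", "let", "then", "var", "while");

    private final String[] tokens;   // token spellings (input assumed space-separated)
    private int next = 0;
    private String currentToken;

    SingleCommandParser(String input) {
        String t = input.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    private void acceptIt() { currentToken = next < tokens.length ? tokens[next++] : EOT; }

    private void accept(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        acceptIt();
    }

    private boolean isIdentifier(String s) {
        return s.matches("[A-Za-z][A-Za-z0-9]*") && !KEYWORDS.contains(s);
    }

    boolean parse() {
        try {
            acceptIt();
            parseSingleCommand();
            return currentToken.equals(EOT);
        } catch (IllegalStateException e) {
            return false;                               // a syntax error was reported
        }
    }

    private void parseSingleCommand() {
        switch (currentToken) {
            case "if":
                acceptIt();
                parseExpression();
                accept("then");
                parseSingleCommand();
                accept("else");
                parseSingleCommand();
                break;
            case "while":
                acceptIt();
                parseExpression();
                accept("do");
                parseSingleCommand();
                break;
            default:
                if (!isIdentifier(currentToken))
                    throw new IllegalStateException("syntax error at " + currentToken);
                acceptIt();                             // the common prefix Identifier
                if (currentToken.equals(":=")) {        // ... := Expression
                    acceptIt();
                    parseExpression();
                } else {                                // ... ( Expression )
                    accept("(");
                    parseExpression();
                    accept(")");
                }
        }
    }

    // Simplified Expression (assumption): a single Identifier or Integer-Literal
    private void parseExpression() {
        if (isIdentifier(currentToken) || currentToken.matches("[0-9]+")) acceptIt();
        else throw new IllegalStateException("expected expression, found " + currentToken);
    }
}
```

Because the grammar has been left factored, the Identifier case needs only one token of lookahead after the identifier to choose between := and (.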


Algorithm to convert EBNF into an RD parser
• The conversion of an EBNF specification into a Java or C++
  implementation of a recursive descent parser is so “mechanical”
  that it could easily be automated (such tools exist, but we won’t
  use them in this course)
• We can describe the algorithm by a set of mechanical rewrite
  rules:
N ::= X
    is rewritten to
private void parseN( ) {
    parse X   // as explained on the next two slides
}


Algorithm to convert EBNF into an RD parser

parse t, where t is a terminal:
    accept(t);

parse N, where N is a non-terminal:
    parseN( );

parse ε:
    // a dummy statement

parse X Y:
    parse X
    parse Y


Algorithm to convert EBNF into an RD parser
parse X*:
    while (currentToken.kind is in starters[X]) {
        parse X
    }

parse X | Y:
    switch (currentToken.kind) {
    cases in starters[X]:
        parse X
        break;
    cases in starters[Y]:
        parse Y
        break;
    default:
        if neither X nor Y generates ε then report syntax error
    }
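The rewrite rules are mechanical enough to run as a program. The sketch below (Java 16 or later, for records and instanceof patterns) represents an EBNF right-hand side as a small tree and prints the parsing method the rules produce. The node classes, the starters map supplied by the caller, and the reportSyntaxError( ) helper in the generated text are hypothetical; the sketch also assumes no non-terminal generates ε, and it does not check the LL(1) conditions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RdGenerator {
    // EBNF right-hand sides as a small tree (hypothetical representation)
    interface Ebnf {}
    record T(String kind) implements Ebnf {}          // terminal t (a token kind)
    record N(String name) implements Ebnf {}          // non-terminal N
    record Eps() implements Ebnf {}                   // ε
    record Seq(Ebnf x, Ebnf y) implements Ebnf {}     // X Y
    record Star(Ebnf x) implements Ebnf {}            // X*
    record Alt(Ebnf x, Ebnf y) implements Ebnf {}     // X | Y

    private final Map<String, Set<String>> startersOfN;  // starters[N], supplied by the caller
    private final StringBuilder out = new StringBuilder();

    RdGenerator(Map<String, Set<String>> startersOfN) { this.startersOfN = startersOfN; }

    // Does e generate ε? Non-terminals are assumed never to generate ε.
    boolean generatesEmpty(Ebnf e) {
        if (e instanceof Eps || e instanceof Star) return true;
        if (e instanceof Seq sq) return generatesEmpty(sq.x()) && generatesEmpty(sq.y());
        if (e instanceof Alt al) return generatesEmpty(al.x()) || generatesEmpty(al.y());
        return false;
    }

    Set<String> starters(Ebnf e) {
        Set<String> s = new LinkedHashSet<>();        // keeps a deterministic order
        if (e instanceof T t) {
            s.add(t.kind());
        } else if (e instanceof N n) {
            s.addAll(startersOfN.get(n.name()));
        } else if (e instanceof Seq sq) {
            s.addAll(starters(sq.x()));
            if (generatesEmpty(sq.x())) s.addAll(starters(sq.y()));
        } else if (e instanceof Star st) {
            s.addAll(starters(st.x()));
        } else if (e instanceof Alt al) {
            s.addAll(starters(al.x()));
            s.addAll(starters(al.y()));
        }
        return s;                                     // ε has no starters
    }

    // N ::= X  is rewritten to  private void parseN( ) { parse X }
    String rule(String name, Ebnf body) {
        out.setLength(0);
        out.append("private void parse").append(name).append("( ) {\n");
        gen(body, 1);
        out.append("}\n");
        return out.toString();
    }

    private void line(int depth, String text) {
        out.append("    ".repeat(depth)).append(text).append('\n');
    }

    private void gen(Ebnf e, int d) {
        if (e instanceof T t) {
            line(d, "accept(Token." + t.kind() + ");");
        } else if (e instanceof N n) {
            line(d, "parse" + n.name() + "( );");
        } else if (e instanceof Eps) {
            line(d, ";  // a dummy statement");
        } else if (e instanceof Seq sq) {
            gen(sq.x(), d);
            gen(sq.y(), d);
        } else if (e instanceof Star st) {
            StringBuilder cond = new StringBuilder();
            for (String k : starters(st.x())) {
                if (cond.length() > 0) cond.append(" || ");
                cond.append("currentToken.kind == Token.").append(k);
            }
            line(d, "while (" + cond + ") {");
            gen(st.x(), d + 1);
            line(d, "}");
        } else if (e instanceof Alt al) {
            line(d, "switch (currentToken.kind) {");
            for (Ebnf alt : List.of(al.x(), al.y())) {
                for (String k : starters(alt)) line(d, "case Token." + k + ":");
                gen(alt, d + 1);
                line(d + 1, "break;");
            }
            line(d, "default:");
            if (!generatesEmpty(e)) line(d + 1, "reportSyntaxError( );");  // hypothetical helper
            line(d, "}");
        }
    }
}
```

For Command ::= single-Command ( ; single-Command )* the generator emits a call to parseSingleCommand( ) followed by a while loop on Token.SEMICOLON. It uses accept(Token.SEMICOLON) inside the loop, whereas a hand-written version can use acceptIt( ) because the token kind has just been checked.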
Example: “Generation” of parseCommand

Command ::= single-Command ( ; single-Command )*

private void parseCommand( ) {
    parse single-Command ( ; single-Command )*
}

is rewritten, rule by rule, to

private void parseCommand( ) {
    parseSingleCommand( );                          // parse single-Command
    while (currentToken.kind == Token.SEMICOLON) {  // parse ( ; single-Command )*
        acceptIt( );            // parse ; (acceptIt because SEMICOLON has just been checked)
        parseSingleCommand( );  // parse single-Command
    }
}
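A runnable sketch of the derived parseCommand. To keep it self-contained, it makes assumptions that are not in the slides: tokens are given as a space-separated string of spellings, the current token is a String, syntax errors are signalled with IllegalStateException, the hypothetical marker "<eot>" ends the input, and single-Command is cut down to Identifier := ( Identifier | Integer-Literal ).

```java
// Command ::= single-Command ( ; single-Command )*
// Simplification (assumption): single-Command ::= Identifier := ( Identifier | Integer-Literal )
class CommandParser {
    static final String EOT = "<eot>";   // hypothetical end-of-text marker

    private final String[] tokens;       // token spellings (input assumed space-separated)
    private int next = 0;
    private String currentToken;

    CommandParser(String input) {
        String t = input.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    private void acceptIt() {
        currentToken = next < tokens.length ? tokens[next++] : EOT;
    }

    private void accept(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        acceptIt();
    }

    boolean parse() {
        try {
            acceptIt();
            parseCommand();
            return currentToken.equals(EOT);
        } catch (IllegalStateException e) {
            return false;                 // a syntax error was reported
        }
    }

    private void parseCommand() {
        parseSingleCommand();
        while (currentToken.equals(";")) {
            acceptIt();                   // ";" has just been checked
            parseSingleCommand();
        }
    }

    private void parseSingleCommand() {
        if (!currentToken.matches("[A-Za-z][A-Za-z0-9]*"))
            throw new IllegalStateException("expected identifier, found " + currentToken);
        acceptIt();
        accept(":=");
        if (!currentToken.matches("[A-Za-z][A-Za-z0-9]*|[0-9]+"))
            throw new IllegalStateException("expected expression, found " + currentToken);
        acceptIt();
    }
}
```

With this sketch, "x := 1 ; y := x" is accepted, while "x := 1 ;" is rejected: once the loop has seen a semicolon, another single-Command must follow.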


Example: Generation of parseSingleDeclaration
single-Declaration
    ::= const Identifier ~ Expression
      | var Identifier : Type-denoter

private void parseSingleDeclaration( ) {
    parse const Identifier ~ Expression
        | var Identifier : Type-denoter
}

is rewritten, rule by rule, to

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
    case Token.CONST:              // parse const Identifier ~ Expression
        acceptIt( );
        parseIdentifier( );
        accept(Token.IS);
        parseExpression( );
        break;
    case Token.VAR:                // parse var Identifier : Type-denoter
        acceptIt( );
        parseIdentifier( );
        accept(Token.COLON);
        parseTypeDenoter( );
        break;
    default:
        report syntax error
    }
}
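A runnable sketch of parseSingleDeclaration, together with the transformed rule Declaration ::= single-Declaration (; single-Declaration)*. It assumes, beyond the slides: tokens are a space-separated string of spellings, the current token is a String, syntax errors throw IllegalStateException, the hypothetical marker "<eot>" ends the input, and Expression is cut down to a single Identifier or Integer-Literal. The Java switch on the spelling plays the role of the switch on currentToken.kind.

```java
import java.util.Set;

class DeclarationParser {
    static final String EOT = "<eot>";                  // hypothetical end-of-text marker
    static final Set<String> KEYWORDS = Set.of(
        "begin", "const", "do", "else", "end", "if", "in", "let", "then", "var", "while");

    private final String[] tokens;   // token spellings (input assumed space-separated)
    private int next = 0;
    private String currentToken;

    DeclarationParser(String input) {
        String t = input.trim();
        tokens = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    private void acceptIt() { currentToken = next < tokens.length ? tokens[next++] : EOT; }

    private void accept(String expected) {
        if (!currentToken.equals(expected))
            throw new IllegalStateException("expected " + expected + ", found " + currentToken);
        acceptIt();
    }

    boolean parse() {
        try {
            acceptIt();
            parseDeclaration();
            return currentToken.equals(EOT);
        } catch (IllegalStateException e) {
            return false;                               // a syntax error was reported
        }
    }

    // Declaration ::= single-Declaration ( ; single-Declaration )*
    private void parseDeclaration() {
        parseSingleDeclaration();
        while (currentToken.equals(";")) {
            acceptIt();                                 // ";" has just been checked
            parseSingleDeclaration();
        }
    }

    private void parseSingleDeclaration() {
        switch (currentToken) {
            case "const":                               // const Identifier ~ Expression
                acceptIt();
                parseIdentifier();
                accept("~");
                parseExpression();
                break;
            case "var":                                 // var Identifier : Type-denoter
                acceptIt();
                parseIdentifier();
                accept(":");
                parseTypeDenoter();
                break;
            default:
                throw new IllegalStateException("syntax error at " + currentToken);
        }
    }

    private void parseIdentifier() {
        if (!currentToken.matches("[A-Za-z][A-Za-z0-9]*") || KEYWORDS.contains(currentToken))
            throw new IllegalStateException("expected identifier, found " + currentToken);
        acceptIt();
    }

    private void parseTypeDenoter() { parseIdentifier(); }   // Type-denoter ::= Identifier

    // Simplified Expression (assumption): Identifier | Integer-Literal
    private void parseExpression() {
        if (currentToken.matches("[0-9]+")) acceptIt();
        else parseIdentifier();
    }
}
```

The break statements matter: without them, a successful const branch would fall through into the var branch.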
LL(1) Grammars
• The presented algorithm to convert EBNF into a parser
  does not work for all possible grammars.
• It only works for so-called “LL(1)” grammars.
• Basically, an LL(1) grammar is a grammar that can
  be parsed by a top-down parser with a lookahead (in
  the input stream of tokens) of one token. (The name stands
  for reading the input Left to right, constructing a Leftmost
  derivation, with 1 token of lookahead.)
• Which grammars are LL(1)?
  How can we recognize that a grammar is (or is not) LL(1)?
  => We can deduce the necessary conditions from the
  parser generation algorithm.


LL(1) Grammars
parse X*
    while (currentToken.kind is in starters[X]) {
        parse X
    }
Condition: starters[X] must be disjoint from the set of tokens that
can immediately follow X* (otherwise, on such a token, the loop cannot
tell whether to parse another X or to stop).

parse X | Y
    switch (currentToken.kind) {
    cases in starters[X]:
        parse X
        break;
    cases in starters[Y]:
        parse Y
        break;
    default:
        if neither X nor Y generates ε then report syntax error
    }
Conditions: starters[X] and starters[Y] must be disjoint sets, and if
either X or Y generates ε, then starters[X | Y] must also be disjoint
from the set of tokens that can immediately follow X | Y.
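Both conditions reduce to set intersections once the starters sets, and the sets of tokens that can follow, are known. A minimal sketch, assuming those sets are computed elsewhere and passed in as sets of token-kind names; the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

class Ll1Check {
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // X* : starters[X] must be disjoint from the tokens that can follow X*
    static Set<String> starConflicts(Set<String> startersX, Set<String> followers) {
        return intersect(startersX, followers);        // empty set: condition holds
    }

    // X | Y : starters[X] and starters[Y] must be disjoint; if X or Y generates ε,
    // starters[X | Y] must also be disjoint from the tokens that can follow X | Y
    static Set<String> altConflicts(Set<String> startersX, Set<String> startersY,
                                    boolean eitherGeneratesEmpty, Set<String> followers) {
        Set<String> conflicts = intersect(startersX, startersY);
        if (eitherGeneratesEmpty) {
            Set<String> both = new TreeSet<>(startersX);
            both.addAll(startersY);
            conflicts.addAll(intersect(both, followers));
        }
        return conflicts;                              // empty set: conditions hold
    }
}
```

For the original single-Command, both alternatives start with Identifier, so altConflicts({IDENTIFIER}, {IDENTIFIER}, false, {}) returns {IDENTIFIER}. After left factorization, the inner choice between := Expression and ( Expression ) has disjoint starters and yields no conflict.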


LL(1) grammars and left factorization

The original Mini-Triangle grammar is not LL(1).

For example:
single-Command
    ::= V-name := Expression
      | Identifier ( Expression )
      | ...
V-name ::= Identifier

starters[V-name := Expression]
    = starters[V-name] = starters[Identifier]
starters[Identifier ( Expression )]
    = starters[Identifier]
NOT DISJOINT!
LL(1) grammars: left factorization
What happens when we generate an RD parser from a non-LL(1) grammar?

single-Command
    ::= V-name := Expression
      | Identifier ( Expression )
      | ...

private void parseSingleCommand( ) {
    switch (currentToken.kind) {
    case Token.IDENTIFIER:        // wrong: overlapping cases
        parse V-name := Expression
    case Token.IDENTIFIER:        // (Java even rejects duplicate case labels)
        parse Identifier ( Expression )
    ...other cases...
    default: report syntax error
    }
}
LL(1) grammars: left factorization

single-Command
    ::= V-name := Expression
      | Identifier ( Expression )
      | ...

Left factorization (and substitution of V-name) gives:

single-Command
    ::= Identifier
        ( := Expression | ( Expression ) )
      | ...


LL(1) Grammars: left recursion elimination

Command ::= single-Command
          | Command ; single-Command

What happens if we don’t perform left-recursion elimination?

public void parseCommand( ) {
    switch (currentToken.kind) {
    case in starters[single-Command]:    // wrong: overlapping cases
        parseSingleCommand( );
    case in starters[Command]:
        parseCommand( );
        accept(Token.SEMICOLON);
        parseSingleCommand( );
    default: report syntax error
    }
}

Since starters[Command] = starters[single-Command], the cases overlap.
Worse, the second case calls parseCommand( ) again before consuming any
token, so if it were ever chosen the parser would recurse forever.


LL(1) Grammars: left recursion elimination

Command ::= single-Command
          | Command ; single-Command

Left-recursion elimination gives:

Command
    ::= single-Command (; single-Command)*


Abstract Syntax Trees
• So far we have talked about how to build a recursive
  descent parser that recognizes a given language
  described by an LL(1) EBNF grammar.
• Next we will look at
  – how to represent ASTs as data structures.
  – how to modify the parser to construct an AST data structure.
• We make heavy use of object-oriented programming!
  (classes, inheritance, dynamic method binding)
