UNIT II - Syntax Analysis: CS416 Compiler Design



UNIT II- Syntax Analysis



Role of the parser
Writing Grammars
Context-Free Grammars
Top-Down Parsing
Recursive Descent
Predictive Parsing
Bottom-Up Parsing
Shift-Reduce Parsing
Operator-Precedence Parsing
LR Parsers
SLR Parser
Canonical LR Parser
LALR Parser.
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program.
This syntactic structure is usually a parse tree.
The syntax analyzer is also known as the parser.
The syntax of a programming language is described by a context-free grammar (CFG). We will
use BNF (Backus-Naur Form) notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the rules
implied by a context-free grammar or not.
If it does, the parser creates the parse tree of that program.
Otherwise, the parser reports error messages.
A context-free grammar gives a precise syntactic specification of a programming language.
The design of the grammar is an initial phase of the design of a compiler.
A grammar can be directly converted into a parser by some tools.

Parser
[Figure: source program → Lexical Analyzer → (token) → Parser → parse tree. The parser asks the lexical analyzer for each token with "get next token".]
The parser works on a stream of tokens.
The smallest item is a token.
Parsers (cont.)
We categorize the parsers into two groups:

1. Top-Down Parser
the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
the parse tree is created bottom to top, starting from the leaves.

Both top-down and bottom-up parsers scan the input from left to right
(one symbol at a time).
Efficient top-down and bottom-up parsers can be implemented only for
sub-classes of context-free grammars.
LL for top-down parsing
LR for bottom-up parsing
Context-Free Grammars
Inherently recursive structures of a programming language are defined
by a context-free grammar.

In a context-free grammar, we have:
A finite set of terminals (in our case, this will be the set of tokens)
A finite set of non-terminals (syntactic-variables)
A finite set of production rules of the following form
    A → α    where A is a non-terminal and
             α is a string of terminals and non-terminals (including the empty string)
A start symbol (one of the non-terminal symbols)

Example:
E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
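
As a rough illustration (not part of the course material), the four components of the example grammar could be held in a small Python structure. The names PRODUCTIONS, START_SYMBOL, non_terminals and terminals are assumptions made for this sketch. Each alternative is a tuple of symbols, and an empty tuple would stand for the empty string.

```python
# Productions of the example grammar, keyed by the non-terminal on the left-hand side.
PRODUCTIONS = {
    "E": [
        ("E", "+", "E"),
        ("E", "-", "E"),
        ("E", "*", "E"),
        ("E", "/", "E"),
        ("-", "E"),
        ("(", "E", ")"),
        ("id",),
    ],
}
START_SYMBOL = "E"


def non_terminals(productions):
    """Every symbol that appears on a left-hand side is a non-terminal."""
    return set(productions)


def terminals(productions):
    """Every right-hand-side symbol that is not a non-terminal is a terminal (token)."""
    nts = non_terminals(productions)
    return {
        sym
        for alternatives in productions.values()
        for alt in alternatives
        for sym in alt
        if sym not in nts
    }


print(sorted(non_terminals(PRODUCTIONS)))  # ['E']
print(sorted(terminals(PRODUCTIONS)))      # ['(', ')', '*', '+', '-', '/', 'id']
```

Keeping only the productions and deriving the two symbol sets from them avoids the sets drifting out of step with the rules.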
Derivations
E ⇒ E+E

E+E derives from E
we can replace E by E+E
to be able to do this, we have to have a production rule E → E+E in our grammar.

E ⇒ E+E ⇒ id+E ⇒ id+id

A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.

In general a derivation step is
αAβ ⇒ αγβ if there is a production rule A → γ in our grammar
where α and β are arbitrary strings of terminal and non-terminal symbols

α₁ ⇒ α₂ ⇒ ... ⇒ αₙ (αₙ derives from α₁, or α₁ derives αₙ)

⇒ : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
CFG - Terminology
L(G) is the language of G (the language generated by G) which is a set
of sentences.
A sentence of L(G) is a string of terminal symbols of G.
If S is the start symbol of G then
ω is a sentence of L(G) iff S ⇒+ ω where ω is a string of terminals of G.

If G is a context-free grammar, L(G) is a context-free language.
Two grammars are equivalent if they produce the same language.

S ⇒* α - If α contains non-terminals, it is called a sentential form of G.
       - If α does not contain non-terminals, it is called a sentence of G.
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

At each derivation step, we can choose any of the non-terminals in the sentential form
of G for the replacement.

If we always choose the left-most non-terminal in each derivation step, the derivation
is called a left-most derivation.

If we always choose the right-most non-terminal in each derivation step, the
derivation is called a right-most derivation.





Left-Most and Right-Most Derivations
Left-Most Derivation

E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)

Right-Most Derivation

E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)

We will see that the top-down parsers try to find the left-most derivation
of the given source program.

We will see that the bottom-up parsers try to find the right-most
derivation of the given source program in the reverse order.
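
The difference between the two derivation orders can be replayed with a short Python sketch. This is only an illustration under stated assumptions: the function derive, the table PRODUCTIONS (a subset of the expression grammar) and the list CHOICES are hypothetical names introduced here. The same sequence of production choices is applied twice, once always rewriting the left-most non-terminal and once always rewriting the right-most one.

```python
# Hypothetical encoding of part of the example grammar: E -> E+E | -E | (E) | id
PRODUCTIONS = {
    "E": [("E", "+", "E"), ("-", "E"), ("(", "E", ")"), ("id",)],
}


def derive(start, choices, productions, leftmost=True):
    """Replay a derivation and return every sentential form as a string.

    choices lists, in order, the right-hand side used at each step; the
    non-terminal that gets replaced is the left-most (or right-most) one.
    """
    form = [start]
    forms = ["".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in productions]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if tuple(rhs) not in productions[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(rhs)} is not a production")
        form = form[:i] + list(rhs) + form[i + 1:]
        forms.append("".join(form))
    return forms


CHOICES = [("-", "E"), ("(", "E", ")"), ("E", "+", "E"), ("id",), ("id",)]
LEFTMOST = derive("E", CHOICES, PRODUCTIONS, leftmost=True)
RIGHTMOST = derive("E", CHOICES, PRODUCTIONS, leftmost=False)

print(" => ".join(LEFTMOST))   # E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(RIGHTMOST))  # E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```

Both runs end in the same sentence and correspond to the same parse tree; only the order in which the non-terminals are rewritten differs.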
Parse Tree
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.

A parse tree can be seen as a graphical representation of a derivation.
[Figure: the parse tree for -(id+id), built up step by step alongside the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]
Ambiguity
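A grammar that produces more than one parse tree for some sentence is
called an ambiguous grammar.
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: the two parse trees for id+id*id. The first derivation gives a tree with + at the root, grouping the sentence as id+(id*id); the second gives a tree with * at the root, grouping it as (id+id)*id.]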
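
One way to see the ambiguity concretely is to count parse trees. The sketch below is an illustration only, restricted to the sub-grammar E → E+E | E*E | id; the function name count_parse_trees is an assumption of this sketch. It counts, for every span of the token list, how many ways E can derive that span by trying each operator in the span as the root.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of tokens under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator strictly inside the span as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id"]))             # 1
print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous.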
Ambiguity (cont.)
For most parsers, the grammar must be unambiguous.

An unambiguous grammar gives a unique parse tree for every sentence.

We should eliminate the ambiguity in the grammar during the design
phase of the compiler.
An unambiguous grammar should be written to eliminate the ambiguity.
For each sentence that has several parse trees under the ambiguous grammar, we
choose the parse tree we prefer and rewrite the grammar so that it allows only
that choice.
Ambiguity (cont.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts

if E₁ then if E₂ then S₁ else S₂

[Figure: two parse trees for this statement.
 Tree 1: the else belongs to the outer if: if E₁ then (if E₂ then S₁) else S₂
 Tree 2: the else belongs to the inner if: if E₁ then (if E₂ then S₁ else S₂)]
Ambiguity (cont.)
We prefer the second parse tree (else matches with closest if).
So, we have to disambiguate our grammar to reflect this choice.

The unambiguous grammar will be:

stmt → matchedstmt | unmatchedstmt

matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts

unmatchedstmt → if expr then stmt |
                if expr then matchedstmt else unmatchedstmt
Ambiguity -- Operator Precedence
Ambiguous grammars (because of ambiguous operators) can be
disambiguated according to the precedence and associativity rules.

E → E+E | E*E | E^E | id | (E)
disambiguate the grammar
precedence: ^ (right to left)
            * (left to right)
            + (left to right)
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
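
The effect of the layered grammar E, T, F, G can be checked with a small hand-written parser. This is a sketch under assumptions, not the method prescribed by the slides: the name parse_expression and the nested-tuple tree format are invented here, and the left-recursive rules E → E+T and T → T*F are realised as loops, a common hand-coding technique that keeps the left-to-right grouping. The right-recursive rule F → G^F is coded as a recursive call, which gives the right-to-left grouping.

```python
def parse_expression(tokens):
    """Parse a token list with E -> E+T | T, T -> T*F | F, F -> G^F | G, G -> id | (E).

    Returns a nested tuple (operator, left, right); a leaf is the string "id".
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r} at position {pos}, found {peek()!r}")
        pos += 1

    def parse_e():
        node = parse_t()
        while peek() == "+":  # E -> E+T as a loop: groups left to right
            expect("+")
            node = ("+", node, parse_t())
        return node

    def parse_t():
        node = parse_f()
        while peek() == "*":  # T -> T*F as a loop: groups left to right
            expect("*")
            node = ("*", node, parse_f())
        return node

    def parse_f():
        base = parse_g()
        if peek() == "^":     # F -> G^F recurses on the right: groups right to left
            expect("^")
            return ("^", base, parse_f())
        return base

    def parse_g():
        if peek() == "id":
            expect("id")
            return "id"
        expect("(")
        node = parse_e()
        expect(")")
        return node

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse_expression(["id", "+", "id", "*", "id"]))  # ('+', 'id', ('*', 'id', 'id'))
print(parse_expression(["id", "^", "id", "^", "id"]))  # ('^', 'id', ('^', 'id', 'id'))
print(parse_expression(["id", "+", "id", "+", "id"]))  # ('+', ('+', 'id', 'id'), 'id')
```

Each sentence now gets exactly one tree: * binds tighter than +, and ^ binds tightest and groups to the right.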

Left Recursion
A grammar is left recursive if it has a non-terminal A such that there is
a derivation

A ⇒+ Aα for some string α

Top-down parsing techniques cannot handle left-recursive grammars.
So, we have to convert our left-recursive grammar into an equivalent
grammar which is not left-recursive.
The left-recursion may appear in a single step of the derivation
(immediate left-recursion), or may appear in more than one step of
the derivation.
Immediate Left-Recursion
A → A α | β      where β does not start with A
      ⇓          eliminate immediate left recursion
A → β A'
A' → α A' | ε    an equivalent grammar

In general,
A → A α₁ | ... | A αₘ | β₁ | ... | βₙ      where β₁ ... βₙ do not start with A
      ⇓          eliminate immediate left recursion
A → β₁ A' | ... | βₙ A'
A' → α₁ A' | ... | αₘ A' | ε               an equivalent grammar
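
The general transformation for immediate left recursion can be sketched in Python as follows. The function name eliminate_immediate_left_recursion, the tuple encoding of alternatives and the use of an empty tuple for ε are assumptions of this sketch. The alternatives that start with A supply the α parts, the remaining alternatives are the β parts, and the new non-terminal is named by appending a prime.

```python
EPSILON = ()  # the empty right-hand side


def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without immediate left recursion.

    alternatives is a list of tuples of symbols. The result maps each
    non-terminal (A and, if needed, A') to its new list of alternatives.
    """
    # The alpha parts: what follows A in the left-recursive alternatives.
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nonterminal,)]
    # The beta parts: alternatives that do not start with A.
    betas = [alt for alt in alternatives if alt[:1] != (nonterminal,)]
    if not alphas:
        return {nonterminal: list(alternatives)}  # nothing to eliminate
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [EPSILON],
    }


# E -> E+T | T  becomes  E -> T E'  and  E' -> +T E' | epsilon
RESULT = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
print(RESULT)  # {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```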
Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
      ⇓      eliminate immediate left recursion
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)
Left-Recursion -- Problem
A grammar may not be immediately left-recursive and still be
left-recursive.
By just eliminating the immediate left-recursion, we may not get
a grammar which is not left-recursive.
S → Aa | b
A → Sc | d      This grammar is not immediately left-recursive,
                but it is still left-recursive.

S ⇒ Aa ⇒ Sca    or
A ⇒ Sc ⇒ Aac    causes a left-recursion

So, we have to eliminate all left-recursions from our grammar.
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A₁ ... Aₙ
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
            Aᵢ → Aⱼ γ
        by
            Aᵢ → α₁ γ | ... | αₖ γ
        where Aⱼ → α₁ | ... | αₖ
      }
    - eliminate immediate left-recursions among Aᵢ productions
  }
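
The two nested loops of this algorithm can be sketched in Python. The names eliminate_left_recursion and eliminate_immediate, the dictionary encoding of the grammar and the empty tuple for ε are assumptions of this sketch. For each non-terminal in the chosen order, alternatives that start with an earlier non-terminal are expanded with that non-terminal's current alternatives, and then the immediate left recursion is removed.

```python
def eliminate_immediate(nt, alts):
    """A -> A a | b  becomes  A -> b A'  and  A' -> a A' | epsilon (the empty tuple)."""
    alphas = [alt[1:] for alt in alts if alt[:1] == (nt,)]
    betas = [alt for alt in alts if alt[:1] != (nt,)]
    if not alphas:
        return {nt: alts}
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """grammar maps each non-terminal to a list of alternatives (tuples of symbols)."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            expanded = []
            for alt in g[a_i]:
                if alt[:1] == (a_j,):
                    # Replace A_i -> A_j gamma by A_i -> alpha gamma for each A_j -> alpha.
                    expanded.extend(alpha + alt[1:] for alpha in g[a_j])
                else:
                    expanded.append(alt)
            g[a_i] = expanded
        g.update(eliminate_immediate(a_i, g[a_i]))
    return g


# S -> Aa | b,  A -> Ac | Sd | f
GRAMMAR = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ("f",)]}
RESULT_SA = eliminate_left_recursion(GRAMMAR, ["S", "A"])
RESULT_AS = eliminate_left_recursion(GRAMMAR, ["A", "S"])
```

With the order S, A the sketch yields S → Aa | b, A → bdA' | fA', A' → cA' | adA' | ε. With the order A, S it yields a different but equivalent grammar, so the result depends on the order chosen.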

Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
  So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
    A → bdA' | fA'
    A' → cA' | adA' | ε

So, the resulting equivalent grammar which is not left-recursive is:
S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε
Eliminate Left-Recursion -- Example 2
S → Aa | b
A → Ac | Sd | f

- Order of non-terminals: A, S

for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A
    A → SdA' | fA'
    A' → cA' | ε

for S:
- Replace S → Aa with S → SdA'a | fA'a
  So, we will have S → SdA'a | fA'a | b
- Eliminate the immediate left-recursion in S
    S → fA'aS' | bS'
    S' → dA'aS' | ε

So, the resulting equivalent grammar which is not left-recursive is:
S → fA'aS' | bS'
S' → dA'aS' | ε
A → SdA' | fA'
A' → cA' | ε
Left-Factoring
A predictive parser (a top-down parser without backtracking) requires
that the grammar be left-factored.

Left-factoring turns a grammar into a new equivalent grammar suitable for predictive parsing.

stmt → if expr then stmt else stmt |
       if expr then stmt

When we see if, we cannot know which production rule to choose to
rewrite stmt in the derivation.
Left-Factoring (cont.)
In general,

A → αβ₁ | αβ₂    where α is non-empty and the first symbols
                 of β₁ and β₂ (if they have one) are different.

When processing α we cannot know whether to expand
    A to αβ₁ or
    A to αβ₂

But, if we re-write the grammar as follows
    A → αA'
    A' → β₁ | β₂
we can immediately expand A to αA'

Left-Factoring -- Algorithm
For each non-terminal A with two or more alternatives (production
rules) with a common non-empty prefix, let us say

A → αβ₁ | ... | αβₙ | γ₁ | ... | γₘ

convert it into

A → αA' | γ₁ | ... | γₘ
A' → β₁ | ... | βₙ
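
A possible Python sketch of this conversion is given below. The names left_factor and common_prefix, the dictionary encoding of the grammar and the empty tuple for ε are assumptions of this sketch. It repeatedly picks the longest prefix shared by at least two alternatives of a non-terminal, factors it out into a fresh primed non-terminal, and revisits both non-terminals until no common prefix remains. Because the longest prefix is factored first, the primed names may be handed out in a different order than in a hand-worked example.

```python
def common_prefix(a, b):
    """Longest common prefix of two tuples of symbols."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(grammar):
    """grammar maps each non-terminal to a list of alternatives (tuples of symbols)."""
    g = {nt: list(alts) for nt, alts in grammar.items()}
    work = list(g)
    while work:
        nt = work.pop()
        alts = g[nt]
        # Find the longest prefix (alpha) shared by at least two alternatives.
        best = ()
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                prefix = common_prefix(alts[i], alts[j])
                if len(prefix) > len(best):
                    best = prefix
        if not best:
            continue  # nothing to factor for this non-terminal
        new_nt = nt + "'"
        while new_nt in g:  # keep adding primes until the name is unused
            new_nt += "'"
        n = len(best)
        # The beta parts go to the new non-terminal; an empty tuple stands for epsilon.
        g[new_nt] = [alt[n:] for alt in alts if alt[:n] == best]
        g[nt] = [best + (new_nt,)] + [alt for alt in alts if alt[:n] != best]
        work.extend([nt, new_nt])  # both may still need factoring
    return g


# A -> abB | aB | cdg | cdeB | cdfB
FACTORED = left_factor({
    "A": [("a", "b", "B"), ("a", "B"), ("c", "d", "g"),
          ("c", "d", "e", "B"), ("c", "d", "f", "B")],
})
```

For this input the sketch produces A → aA'' | cdA', A' → g | eB | fB, A'' → bB | B.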




Left-Factoring -- Example 1
A → abB | aB | cdg | cdeB | cdfB
      ⇓
A → aA' | cdg | cdeB | cdfB
A' → bB | B
      ⇓
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB


Left-Factoring -- Example 2
A → ad | a | ab | abc | b
      ⇓
A → aA' | b
A' → d | ε | b | bc
      ⇓
A → aA' | b
A' → d | ε | bA''
A'' → ε | c


Non-Context Free Language Constructs
Some constructs in programming languages are not context-free.
This means that we cannot write a context-free grammar for these constructs.

L1 = { ωcω | ω is in (a|b)* } is not context-free
    L1 abstracts declaring an identifier and later checking whether it
    has been declared. We cannot do this with a context-free language. We need
    a semantic analyzer (which is not context-free).

L2 = { aⁿbᵐcⁿdᵐ | n>1 and m>1 } is not context-free
    L2 abstracts declaring two functions (one with n parameters, the other
    with m parameters), and then calling them with actual parameters.
