0% found this document useful (0 votes)
8 views36 pages

Syntax Analysis: COP5621 Compiler Construction

The document discusses the role of parsers in compiler construction, emphasizing their function in syntax checking and semantic actions. It covers various parsing techniques, error handling strategies, and the structure of context-free grammars. Additionally, it explains concepts like left recursion elimination, left factoring, and the criteria for LL(1) grammars.

Uploaded by

meaad mohammed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views36 pages

Syntax Analysis: COP5621 Compiler Construction

The document discusses the role of parsers in compiler construction, emphasizing their function in syntax checking and semantic actions. It covers various parsing techniques, error handling strategies, and the structure of context-free grammars. Additionally, it explains concepts like left recursion elimination, left factoring, and the criteria for LL(1) grammars.

Uploaded by

meaad mohammed
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 36

1

Syntax Analysis
Part I

Chapter 4

COP5621 Compiler Construction


Copyright Robert van Engelen, Florida State University, 2007-2011

2

Position of a Parser in the


Compiler Model

Token,
Source tokenval
Parser
Lexical Intermediate
Program
and rest of
Analyzer
representation

Get next front-end

token

Lexical error
Syntax error
Semantic error

Symbol Table

3

The Parser

• A parser implements a C-F grammar

• The role of the parser is twofold:

1. To check syntax (= string recognizer)

– And to report syntax errors accurately

2. To invoke semantic actions

– For static semantics checking, e.g. type checking of
expressions, functions, etc.

– For syntax-directed translation of the source code to
an intermediate representation

4

Syntax-Directed Translation

• One of the major roles of the parser is to produce
an intermediate representation (IR) of the source
program using syntax-directed translation methods

• Possible IR output:

– Abstract syntax trees (ASTs)

– Control-flow graphs (CFGs) with triples, three-address
code, or register transfer list notation

– WHIRL (SGI Pro64 compiler) has 5 IR levels!

5

Error Handling

• A good compiler should assist in identifying and
locating errors

– Lexical errors: important, compiler can easily recover
and continue

– Syntax errors: most important for compiler, can almost
always recover

– Static semantic errors: important, can sometimes
recover

– Dynamic semantic errors: hard or impossible to detect
at compile time, runtime checks are required

– Logical errors: hard or impossible to detect

6

Viable-Prefix Property

• The viable-prefix property of parsers allows
early detection of syntax errors

– Goal: detection of an error as soon as possible
without further consuming unnecessary input

– How: detect an error as soon as the prefix of the
input does not match a prefix of any string in
the language

Error is
Error is detected here

… detected here

Prefix
Prefix

for (;) DO 10 I = 1;0
… …
7

Error Recovery Strategies



• Panic mode

– Discard input until a token in a set of designated
synchronizing tokens is found

• Phrase-level recovery

– Perform local correction on the input to repair the error

• Error productions

– Augment grammar with productions for erroneous
constructs

• Global correction

– Choose a minimal sequence of changes to obtain a
global least-cost correction

8

Grammars (Recap)

• Context-free grammar is a 4-tuple
G = (N, T, P, S) where

– T is a finite set of tokens (terminal symbols)

– N is a finite set of nonterminals

– P is a finite set of productions of the form

α → β
where α ∈ (N∪T)* N (N∪T)* and β ∈ (N∪T)*

– S ∈ N is a designated start symbol

9

Notational Conventions Used



• Terminals

a,b,c,… ∈ T

specific terminals: 0, 1, id, +

• Nonterminals

A,B,C,… ∈ N

specific nonterminals: expr, term, stmt

• Grammar symbols

X,Y,Z ∈ (N∪T)

• Strings of terminals

u,v,w,x,y,z ∈ T*

• Strings of grammar symbols

α,β,γ ∈ (N∪T)*

10

Derivations (Recap)

• The one-step derivation is defined by

α A β ⇒ α γ β
where A → γ is a production in the grammar

• In addition, we define

– ⇒ is leftmost ⇒lm if α does not contain a nonterminal

– ⇒ is rightmost ⇒rm if β does not contain a nonterminal

– Transitive closure ⇒* (zero or more steps)

– Positive closure ⇒+ (one or more steps)

• The language generated by G is defined by

L(G) = {w ∈ T* | S ⇒+ w}

11

Derivation (Example)

Grammar G = ({E}, {+,*,(,),-,id}, P, E) with
productions P =
E → E + E



E → E * E



E → ( E )



E → - E



E → id

Example derivations:

E ⇒ - E ⇒ - id

E ⇒rm E + E ⇒rm E + id ⇒rm id + id

E ⇒* E

E ⇒* id + id

E ⇒+ id * id + id

12

Chomsky Hierarchy: Language


Classification

• A grammar G is said to be

– Regular if it is right linear where each production is of
the form

A → w B
or
A → w
or left linear where each production is of the form

A → B w
or
A → w

– Context free if each production is of the form

A → α
where A ∈ N and α ∈ (N∪T)*

– Context sensitive if each production is of the form

α A β → α γ β
where A ∈ N, α,γ,β ∈ (N∪T)*, |γ| > 0

– Unrestricted

13

Chomsky Hierarchy

L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)


Where L(T) = { L(G) | G is of type T }


That is: the set of all languages
generated by grammars G of type T

Examples:

Every finite language is regular! (construct a FSA for strings in L(G))

L1 = { anbn | n ≥ 1 } is context free

L2 = { anbncn | n ≥ 1 } is context sensitive

14

Parsing

• Universal (any C-F grammar)

– Cocke-Younger-Kasimi

– Earley

• Top-down (C-F grammar with restrictions)

– Recursive descent (predictive parsing)

– LL (Left-to-right, Leftmost derivation) methods

• Bottom-up (C-F grammar with restrictions)

– Operator precedence parsing

– LR (Left-to-right, Rightmost derivation) methods

• SLR, canonical LR, LALR

15

Top-Down Parsing

• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing

Grammar: Leftmost derivation:
E → T + T E ⇒lm T + T
T → ( E ) ⇒lm id + T
T → - E ⇒lm id + id

T → id

E
E
E
E

T
T
T
T
T
T

+
id
+
id
+
id

16

Left Recursion (Recap)



• Productions of the form

A → A α 

| β

| γ
are left recursive

• When one of the productions in a grammar
is left recursive then a predictive parser
loops forever on certain inputs

17

A General Systematic Left


Recursion Elimination Method

Input: Grammar G with no cycles or ε-productions

Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do

for j = 1, …, i-1 do


replace each



Ai → Aj γ


with



Ai → δ1 γ | δ2 γ | … | δk γ


where



Aj → δ1 | δ2 | … | δk

enddo

eliminate the immediate left recursion in Ai
enddo

18

Immediate Left-Recursion
Elimination

Rewrite every left-recursive production

A → A α 

| β

| γ

| A δ
into a right-recursive production:

A → β AR

| γ AR

AR → α AR

| δ AR

| ε

19

Example Left Recursion Elim.



A → B C | a
B → C A | A b Choose arrangement: A, B, C

C → A B | C C | a

i = 1:

nothing to do
i = 2, j = 1:
B → C A | A b



B → C A | B C b | a b


⇒(imm)
B → C A BR | a b BR



BR → C b BR | ε 
i = 3, j = 1:
C → A B | C C | a



C → B C B | a B | C C | a
i = 3, j = 2:
C → B C B | a B | C C | a



C → C A BR C B | a b BR C B | a B | C C | a


⇒(imm)
C → a b BR C B CR | a B CR | a CR



CR → A BR C B CR | C CR | ε

20

Left Factoring

• When a nonterminal has two or more productions
whose right-hand sides start with the same
grammar symbols, the grammar is not LL(1) and
cannot be used for predictive parsing

• Replace productions

A → α β1 | α β2 | … | α βn | γ
with

A → α AR | γ

AR → β1 | β2 | … | βn

21

Predictive Parsing

• Eliminate left recursion from grammar

• Left factor the grammar

• Compute FIRST and FOLLOW

• Two variants:

– Recursive (recursive-descent parsing)

– Non-recursive (table-driven parsing)

22

FIRST (Revisited)

• FIRST(α) = { the set of terminals that begin all




strings derived from α }

FIRST(a) = {a}



if a ∈ T
FIRST(ε) = {ε}
FIRST(A) = ∪A→α FIRST(α)

for A→α ∈ P
FIRST(X1X2…Xk) =

if for all j = 1, …, i-1 : ε ∈ FIRST(Xj) then


add non-ε in FIRST(Xi) to FIRST(X1X2…Xk)

if for all j = 1, …, k : ε ∈ FIRST(Xj) then


add ε to FIRST(X1X2…Xk)

23

FOLLOW

• FOLLOW(A) = { the set of terminals that can


immediately follow nonterminal A }

FOLLOW(A) =

for all (B → α A β) ∈ P do


add FIRST(β)\{ε} to FOLLOW(A)

for all (B → α A β) ∈ P and ε ∈ FIRST(β) do


add FOLLOW(B) to FOLLOW(A)

for all (B → α A) ∈ P do


add FOLLOW(B) to FOLLOW(A)

if A is the start symbol S then


add $ to FOLLOW(A)

24

LL(1) Grammar

• A grammar G is LL(1) if it is not left recursive
and for each collection of productions

A → α1 | α2 | … | αn
for nonterminal A the following holds:

1.
FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j
2.
if αi ⇒* ε then

2.a.
αj ⇒* ε for all i ≠ j

2.b.
FIRST(αj) ∩ FOLLOW(A) = ∅



for all i ≠ j

25

Non-LL(1) Examples

Grammar
Not LL(1) because:

S → S a | a
Left recursive

S → a S | a
FIRST(a S) ∩ FIRST(a) ≠ ∅

S → a R | ε
R → S | ε
For R: S ⇒* ε and ε ⇒* ε

S → a R a For R:
R → S | ε
FIRST(S) ∩ FOLLOW(R) ≠ ∅

26

Recursive-Descent Parsing
(Recap)

• Grammar must be LL(1)

• Every nonterminal has one (recursive) procedure
responsible for parsing the nonterminal’s syntactic
category of input tokens

• When a nonterminal has multiple productions,
each production is implemented in a branch of a
selection statement based on input look-ahead
information

27

Using FIRST and FOLLOW in a


Recursive-Descent Parser

procedure rest();
begin
expr → term rest if lookahead in FIRST(+ term rest) then
rest → + term rest match(‘+’); term(); rest()
else if lookahead in FIRST(- term rest) then
| - term rest match(‘-’); term(); rest()
| ε else if lookahead in FOLLOW(rest) then
term → id
return
else error()
end;

where
FIRST(+ term rest) = { + }

FIRST(- term rest) = { - }

FOLLOW(rest) = { $ }

28

Non-Recursive Predictive
Parsing: Table-Driven Parsing

• Given an LL(1) grammar G = (N, T, P, S)
construct a table M[A,a] for A ∈ N, a ∈ T
and use a driver program with a stack

input
a
+
b
$

stack

Predictive parsing
X
output

program (driver)

Y

Z
Parsing table
$
M

29

Constructing an LL(1) Predictive


Parsing Table

for each production A → α do

for each a ∈ FIRST(α) do


add A → α to M[A,a]

enddo

if ε ∈ FIRST(α) then


for each b ∈ FOLLOW(A) do



add A → α to M[A,b]


enddo 

endif
enddo
Mark each undefined entry in M error

30

Example Table
A → α
FIRST(α)
FOLLOW(A)

E → T ER
( id
$ )

ER → + T ER
+
$ )

E → T ER
ER → + T ER | ε  ER → ε
ε
$ )

T → F TR T → F TR
( id
+ $ )

TR → * F TR | ε  TR → * F TR
*
+ $ )

F → ( E ) | id
TR → ε
ε
+ $ )

F → ( E )
(
* + $ )

F → id
id
* + $ )

id
+
*
(
)
$

E
E → T ER
E → T ER

ER
ER → + T ER
ER → ε
ER → ε

T
T → F TR
T → F TR

TR
TR → ε
TR → * F TR
TR → ε
TR → ε

F
F → id
F → ( E )

31

LL(1) Grammars are


Unambiguous

Ambiguous grammar A → α
FIRST(α)
FOLLOW(A)

S → i E t S SR | a S → i E t S SR
i
e $

SR → e S | ε  S → a
a
e $

E → b
SR → e S
e
e $

SR → ε
ε
e $

E → b
b
t

Error: duplicate table entry

a
b
e
i
t
$

S
S → a
S → i E t S SR

SR → ε
SR
SR → ε

SR → e S

E
E → b

32

Predictive Parsing Program


push($)
(Driver)

push(S)
a := lookahead

repeat

X := pop()

if X is a terminal or X = $ then


match(X) // moves to next token and a := lookahead

else if M[X,a] = X → Y1Y2…Yk then


push(Yk, Yk-1, …, Y2, Y1) // such that Y1 is on top


… invoke actions and/or produce IR output …

else
error()

endif
until X = $

33

Example Table-Driven Parsing



Stack
Input
Production applied
$E
id+id*id$ E → T ER
$ERT id+id*id$ T → F TR
$ERTRF id+id*id$ F → id
$ERTRid id+id*id$
$ERTR +id*id$ TR → ε
$ER +id*id$ ER → + T ER

$ERT+ +id*id$
$ERT id*id$ T → F TR
$ERTRF id*id$ F → id
$ERTRid id*id$
$ERTR
*id$ TR → * F TR
$ERTRF* *id$
$ERTRF id$ F → id
$ERTRid id$
$ERTR $ TR → ε
$ER $ ER → ε

$
$

34

Panic Mode Recovery



Add synchronizing actions to FOLLOW(E) = { ) $ }
undefined entries based on FOLLOW
FOLLOW(ER) = { ) $ }
FOLLOW(T) = { + ) $ }
Pro:
Can be automated FOLLOW(TR) = { + ) $ }
Cons:
Error messages are needed
FOLLOW(F) = { + * ) $ }

id
+
*
(
)
$

E
E → T ER
E → T ER
synch
synch

ER
ER → + T ER
ER → ε
ER → ε

T
T → F TR
synch
T → F TR
synch
synch

TR
TR → ε
TR → * F TR
TR → ε
TR → ε

F
F → id
synch
synch
F → ( E )
synch
synch

synch:
the driver pops current nonterminal A and skips input till

synch token or skips input until one of FIRST(A) is found

35

Phrase-Level Recovery
Change input stream by inserting missing tokens
For example: id id is changed into id * id

Pro:
Can be automated
Cons:
Recovery not always intuitive

Can then continue here

id
+
*
(
)
$

E
E → T ER
E → T ER
synch
synch

ER
ER → + T ER
ER → ε
ER → ε

T
T → F TR
synch
T → F TR
synch
synch

TR
insert *
TR → ε
TR → * F TR
TR → ε
TR → ε

F
F → id
synch
synch
F → ( E )
synch
synch

insert *: driver inserts missing * and retries the production



36

Error Productions

E → T ER Add “error production”:
ER → + T ER | ε 
TR → F TR
T → F TR to ignore missing *, e.g.: id id

TR → * F TR | ε  Pro:
Powerful recovery method
F → ( E ) | id
Cons:
Cannot be automated

id
+
*
(
)
$

E
E → T ER
E → T ER
synch
synch

ER
ER → + T ER
ER → ε
ER → ε

T
T → F TR
synch
T → F TR
synch
synch

TR
TR → F TR
TR → ε
TR → * F TR
TR → ε
TR → ε

F
F → id
synch
synch
F → ( E )
synch
synch

You might also like