0% found this document useful (0 votes)
16 views61 pages

Chapter4 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views61 pages

Chapter4 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 61

1

Syntax Analysis
Part I
2

Position of a Parser/Syntax
Analyzer in the Compiler Model
Token,
tokenval Parser/syntax analysis
Source Lexical Intermediate
and rest of
Program Analyzer representation
Get next front-end
token
Lexical error Syntax error
Semantic error

Symbol Table
3

The Parser
• A parser implements a C-F grammar as a
recognizer of strings
• The role of the parser in a compiler is twofold:
1. To check syntax (= string recognizer)
• And to report syntax errors accurately
2. To invoke semantic actions
• For static semantics checking, e.g. type checking of
expressions, functions, etc.
• For syntax-directed translation of the source code to an
intermediate representation
4

Syntax-Directed Translation
• One of the major roles of the parser is to produce
an intermediate representation (IR) of the source
program using syntax-directed translation
methods
• Possible IR output:
– Abstract syntax trees (ASTs)
– Control-flow graphs (CFGs) with triples, three-address
code, or register transfer list notation
– WHIRL (SGI Pro64 compiler) has 5 IR levels!
5

Error Handling
• A good compiler should assist in identifying and
locating errors
– Lexical errors: important, compiler can easily recover
and continue
– Syntax errors: most important for compiler, can almost
always recover
– Static semantic errors: important, can sometimes
recover
– Dynamic semantic errors: hard or impossible to detect
at compile time, runtime checks are required
– Logical errors: hard or impossible to detect
6

Viable-Prefix Property
• The viable-prefix property of parsers allows
early detection of syntax errors
– Goal: detection of an error as soon as possible
without further consuming unnecessary input
– How: detect an error as soon as the prefix of the
input does not match a prefix of any string in
the language
Error is
Error is detected here
… detected here …
Prefix Prefix DO 10 I = 1;0
for (;)
… …
7

Error Recovery Strategies


• Panic mode
– Discard input until a token in a set of designated
synchronizing tokens is found
• Phrase-level recovery
– Perform local correction on the input to repair the error
• Error productions
– Augment grammar with productions for erroneous
constructs
• Global correction
– Choose a minimal sequence of changes to obtain a
global least-cost correction
8

Grammars
• Context-free grammar is a 4-tuple
G = (N, T, P, S) where
– T is a finite set of tokens (terminal symbols)
– N is a finite set of nonterminals
– P is a finite set of productions of the form
α→β
where α ∈ (N∪T)* N (N∪T)* and β ∈
(N∪T)*
– S ∈ N is a designated start symbol
9

Notational Conventions Used


• Terminals
a,b,c,… ∈ T
specific terminals: 0, 1, id, +
• Nonterminals
A,B,C,… ∈ N
specific nonterminals: expr, term, stmt
• Grammar symbols
X,Y,Z ∈ (N∪T)
• Strings of terminals
u,v,w,x,y,z ∈ T*
• Strings of grammar symbols
α,β,γ ∈ (N∪T)*
10

Derivations
• The one-step derivation is defined by
αAβ⇒αγβ
where A → γ is a production in the grammar
• In addition, we define
– ⇒ is leftmost ⇒lm if α does not contain a nonterminal
– ⇒ is rightmost ⇒rm if β does not contain a nonterminal
– Transitive closure ⇒* (zero or more steps)
– Positive closure ⇒+ (one or more steps)
• The language generated by G is defined by
L(G) = {w ∈ T* | S ⇒+ w}
11

Derivation (Example)
Grammar G = ({E}, {+,*,(,),-,id}, P, E) with
productions P = E → E + E
E→E*E
E→(E)
E→-E
E → id
Example derivations:
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id +
id
E ⇒ E*E
E ⇒ E * E+E E ⇒ E * E+id E ⇒ E * id+id
E ⇒id * id + id
12
13
14
15

Chomsky Hierarchy: Language


Classification
• A grammar G is said to be
– Regular if it is right linear where each production is of
the form
A → w B or A → w
or left linear where each production is of the form
A → B w or A → w
– Context free if each production is of the form
A→α
where A ∈ N and α ∈ (N∪T)*
– Context sensitive if each production is of the form
αAβ→αγβ
where A ∈ N, α,γ,β ∈ (N∪T)*, |γ| > 0
– Unrestricted
16

Chomsky Hierarchy

L(regular) ⊂ L(context free) ⊂ L(context sensitive) ⊂ L(unrestricted)

Where L(T) = { L(G) | G is of type T }


That is: the set of all languages
generated by grammars G of type T

Examples:
Every finite language is regular! (construct a FSA for strings in L(G))
L1 = { anbn | n ≥ 1 } is context free
L2 = { anbncn | n ≥ 1 } is context sensitive
17

Parsing
• Universal (any C-F grammar)
– Cocke-Younger-Kasimi
– Earley
• Top-down (C-F grammar with restrictions)
– Recursive descent (predictive parsing)
– LL (Left-to-right, Leftmost derivation) methods
• Bottom-up (C-F grammar with restrictions)
– Operator precedence parsing
– LR (Left-to-right, Rightmost derivation) methods
• SLR, canonical LR, LALR
18

Top-Down Parsing
• LL methods (Left-to-right, Leftmost
derivation) and recursive-descent parsing

Grammar: Leftmost derivation:


E→T+T E ⇒lm T + T
T→(E) ⇒lm id + T
T→-E ⇒lm id + id
T → id
E E E E

T T T T T T

+ id + id + id
19
20

Recursive-Descent Parsing
21

Backtracking
using the second production of A for matching the input string
22

Left Recursion
• Productions of the form
A → A α | β| γ
are left recursive
• When one of the productions in a grammar
is left recursive then a predictive parser
loops forever on certain inputs
23

Immediate Left-Recursion
Elimination
Rewrite every left-recursive production
A → A α | β| γ| A δ
into a right-recursive production:
A → β A’| γ A’
A’ → α A’ | δ A’ | ε
24
25
26

A General Systematic Left


Recursion Elimination Method
Input: Grammar G with no cycles or ε-productions
Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
for j = 1, …, i-1 do
replace each
Ai → Aj γ
with
A i → δ1 γ | δ 2 γ | … | δ k γ
where
A j → δ1 | δ 2 | … | δ k
enddo
eliminate the immediate left recursion in Ai
enddo
27

Example Left Recursion Elim.


S→ Aa|b
A → Ac | Sd | ε Choose arrangement: S, A

i = 1: nothing to do
i = 2, j = 1: A → Ac | Sd | ε
A → Ac | Aad | bd| ε
A → bdA’ |A’
A’ → cA’|adA’ | ε
28

Left Factoring
• When a nonterminal has two or more productions
whose right-hand sides start with the same
grammar symbols, the grammar is not LL(1) and
cannot be used for predictive parsing
• Replace productions
A → α β1 | α β2 | … | α βn | γ
with
A → α A’ | γ
A’ → β1 | β2 | … | βn
29
30

Examples-Left Factoring
31
32

Example-Left Recursion
33
34

Predictive Parsing: Table-Driven


Parsing
• Given an LL(1) grammar G = (N, T, P, S)
construct a table M[A,a] for A ∈ N, a ∈ T
and use a stack
input a + b $

stack
Predictive parsing
X output
program
Y
Z Parsing table
$ M
35
36

Predictive Parsing Program


37

Example
38
39
40
41
42
43

Predictive Parsing
• Eliminate left recursion from grammar
• Left factor the grammar
• Compute FIRST and FOLLOW
• Two variants:
– Recursive (recursive-descent parsing)
– Non-recursive (table-driven parsing)
44

FIRST set (1)


• FIRST(α) = { the set of terminals that begin all
strings derived from α }

FIRST(a) = {a} if a ∈ T
FIRST(ε) = {ε}
FIRST(A) = ∪A→α FIRST(α) for A→α ∈ P
FIRST(X1X2…Xk) =
if for all j = 1, …, i-1 : ε ∈ FIRST(Xj) then
add non-ε in FIRST(Xi) to FIRST(X1X2…Xk)
if for all j = 1, …, k : ε ∈ FIRST(Xj) then
add ε to FIRST(X1X2…Xk)
45

FIRST set (1.1)


46

FOLLOW set (1)


47

FOLLOW set (1.1)


• FOLLOW(A) = { the set of terminals that can
immediately follow nonterminal A }

FOLLOW(A) =
for all (B → α A β) ∈ P do
add FIRST(β)\{ε} to FOLLOW(A)
for all (B → α A β) ∈ P and ε ∈ FIRST(β) do
add FOLLOW(B) to FOLLOW(A)
for all (B → α A) ∈ P do
add FOLLOW(B) to FOLLOW(A)
if A is the start symbol S then
add $ to FOLLOW(A)
48
49
50

Example
51
52

Constructing an LL(1) Predictive


Parsing Table
for each production A → α do
for each a ∈ FIRST(α) do
add A → α to M[A,a]
enddo
if ε ∈ FIRST(α) then
for each b ∈ FOLLOW(A) do
add A → α to M[A,b]
enddo
endif
enddo
Mark each undefined entry in M error
53

LL(1) Grammar
• A grammar G is LL(1) if it is not left recursive and
for each collection of productions
A → α1 | α 2 | … | α n
for nonterminal A the following holds:

1. FIRST(αi) ∩ FIRST(αj) = ∅ for all i ≠ j


2. if αi ⇒* ε then
2.a. αj ⇒* ε for all i ≠ j
2.b. FIRST(αj) ∩ FOLLOW(A) = ∅
for all i ≠ j
54

Non-LL(1) Examples

Grammar Not LL(1) because:


S→Sa|a Left recursive
S→aS|a FIRST(a S) ∩ FIRST(a) ≠ ∅
S→aR|ε
R→S|ε For R: S ⇒* ε and ε ⇒* ε
S→aRa For R:
R→S|ε FIRST(S) ∩ FOLLOW(R) ≠ ∅
55

Consider the grammar- Construct Predictive Parsing


Table

S → i E t S S’ | a
S’ → e S | ε
E→b
56

LL(1) Grammars are


Unambiguous
A→α FIRST(α) FOLLOW(A)
S → i E t S S’ | a S → i E t S S’ i
S’ → e S | ε e$
S→a a
E→b S’ → e S e
e$
S’ → ε ε
E→b b t
Error: duplicate table entry
a b e i t $
S S→a S → i E t S S’
S’ → ε
S’ S’ → ε
S’ → e S
E E→b
57

Panic Mode Recovery


Add synchronizing actions to FOLLOW(E) = { ) $ }
undefined entries based on FOLLOW FOLLOW(E’) = { ) $ }
FOLLOW(T) = { + ) $ }
Pro: Can be automated FOLLOW(T’) = { + ) $ }
Cons: Error messages are needed FOLLOW(F) = { + * ) $ }

id + * ( ) $
E E → T E’ E → T E’ synch synch
E’ E’ → + T E’ E’ → ε E’ → ε
T T → F T’ synch T → F T’ synch synch
T’ TR → ε TR → * F TR T’ → ε T’ → ε
F F → id synch synch F→(E) synch synch
synch: the pparser pops current nonterminal A and skips input till
synch token or skips input until one of FIRST(A) is found
58
59

Phrase-Level Recovery
Change input stream by inserting missing tokens
For example: id id is changed into id * id
Pro: Can be fully automated
Cons: Recovery not always intuitive
Can then continue here

id + * ( ) $
E E → T E’ E → T E’ synch synch
E’ E’ → + T E’ E’ → ε E’ → ε
T T → F T’ synch T → F T’ synch synch
T’ insert * T’→ ε T’ → * F TR T’ → ε T’→ ε
F F → id synch synch F→(E) synch synch

insert *: pprarser inserts missing * and retries the production


60

Error Productions
E → T E’ Add “error production”:
E’ → + T E’ | ε T’ → F T’
T → F T’ to ignore missing *, e.g.: id id
T’ → * F T’ | ε Pro: Powerful recovery method
F → ( E ) | id Cons: Manual addition of productions
id + * ( ) $
E E → T E’ E → T E’ synch synch
E’ E’ → + T E’ E’ → ε E’ → ε
T T → F T’ synch T → F T’ synch synch
T’ T’ → F T’ T’→ ε T’ → * F T’ T’→ ε T’→ ε
F F → id synch synch F→(E) synch synch
61

Construct Predictive parsing table for the


following grammar and show the stack moves
for the input w=t or f and t

E → E or T | T
T → T and F | F
F → not F | (E) | t | f

You might also like