Lecture 4- Syntax Analysis (1)

Syntax Analysis

Compiler construction
Introduction to syntax analysis
• The parser (syntax analyzer) receives the source code as a stream of tokens from the lexical analyzer and performs syntax analysis, which creates a tree-like intermediate representation that depicts the grammatical structure of the token stream.
• Syntax analysis is also called parsing.
• It involves analyzing the structure of the source code to ensure it adheres to the grammatical rules of the language.
• A typical representation is an abstract syntax tree, where:
• Each interior node represents an operation
• The children of the node represent the arguments of the operation
Syntactic Analysis
• Input: sequence of tokens from scanner
• Output: abstract syntax tree
• Actually,
• the parser first builds a parse tree
• the AST is then built by translating the parse tree
• the parse tree is rarely built explicitly; it is often only implicit in, say, how the parser pushes symbols onto its stack

Introduction to syntax analysis
Parse Tree / Abstract Syntax Tree (AST)

• Parse Tree: A tree representation of the syntactic structure of the input based on the grammar rules.

• Abstract Syntax Tree (AST): A simplified version of the parse tree that abstracts away certain syntactic details, focusing instead on the logical structure of the code.
Example
• Source Code
• 4*(2+3)
• Parser input
• NUM(4) TIMES LPAR NUM(2) PLUS
NUM(3) RPAR
• Parser output (AST):

         *
        / \
  NUM(4)   +
          / \
    NUM(2)   NUM(3)

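The AST above can be sketched in code. This is an illustrative sketch only; the `Num` and `BinOp` class names are assumptions, not the lecture's implementation:

```python
# Minimal sketch of the AST for 4*(2+3). Num and BinOp are illustrative
# names chosen here, not part of the lecture material.

class Num:
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self):
        l, r = self.left.eval(), self.right.eval()
        return l * r if self.op == "*" else l + r

# AST for 4 * (2 + 3): the parentheses leave no trace in the tree;
# the grouping is encoded purely by the tree's shape.
ast = BinOp("*", Num(4), BinOp("+", Num(2), Num(3)))
print(ast.eval())  # 20
```

Note how evaluating the tree bottom-up recovers the intended grouping without any explicit precedence logic.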
Another example

• Source Code
• if (x == y) { a=1; }
• Parser input
• IF LPAR ID EQ ID RPAR LBR ID AS INT
SEMI RBR
• Parser output (AST):

       IF-THEN
       /      \
     ==        =
    /  \      / \
  ID    ID  ID   INT

Introduction to syntax analysis
Example:
For a simple expression like a + b * c, the syntax
analysis might involve:
• Tokenizing: a, +, b, *, c
• Using grammar rules to recognize that:
• An expression consists of terms and operators.
• The multiplication operator has higher precedence than the addition operator.
• Constructing an AST, in which * binds tighter than +:

      +
     / \
    a   *
       / \
      b   c

Syntax Analysis Analogy
Syntax analysis for natural languages
• Identify the function of each word
• Recognize if a sentence is grammatically
correct
• Example: I gave Ali the card.

Position of Syntax Analyzer

Overview

Main Task: Take a token sequence from the scanner and verify that it is a syntactically correct program.
Secondary Tasks:
 Process declarations and set up symbol table information accordingly, in preparation for semantic analysis.
 Construct a syntax tree in preparation for intermediate code generation.
Context-free Grammars
• A context-free grammar for a language
specifies the syntactic structure of
programs in that language.
• Components of a grammar:
• a finite set of tokens (obtained from the
scanner);
• a set of variables representing “related”
sets of strings, e.g., declarations,
statements, expressions.
• a set of rules that show the structure of
these strings.
• an indication of the “top-level” set of strings (i.e., the start symbol).
Context-free Grammars: Definition
Formally, a context-free grammar G is a 4-
tuple G = (V, T, P, S), where:
• V is a finite set of variables (or
nonterminals). These describe sets of
“related” strings.
• T is a finite set of terminals (i.e., tokens).
• P is a finite set of productions, each of the form
• A → α
• where A ∈ V is a variable, and α ∈ (V ∪ T)* is a sequence of terminals and nonterminals.

Context-free Grammars: An Example
A grammar for palindromic bit-strings:
G = (V, T, P, S), where:
• V = { S, B }
• T = {0, 1}
• P = { S → B,
        S → ε,
        S → 0 S 0,
        S → 1 S 1,
        B → 0,
        B → 1
      }
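The grammar G can be encoded and tested directly. This is an illustrative sketch, assuming a dict-of-productions encoding that is not from the lecture; the naive search is exponential, but fine for short strings:

```python
# The palindrome grammar above, encoded as a Python dict.
# Nonterminals are the dict keys; each production RHS is a tuple of
# symbols, with the empty tuple () standing for the ε-production.
GRAMMAR = {
    "S": [("B",), (), ("0", "S", "0"), ("1", "S", "1")],
    "B": [("0",), ("1",)],
}

def generates(sym_seq, target):
    """Naive check whether the symbol sequence derives `target`.
    Exponential backtracking search; only suitable for tiny examples."""
    if not sym_seq:
        return target == ""
    head, rest = sym_seq[0], sym_seq[1:]
    if head in GRAMMAR:                      # nonterminal: try each production
        return any(generates(prod + rest, target) for prod in GRAMMAR[head])
    # terminal: must match the next input character
    return target.startswith(head) and generates(rest, target[1:])

print(generates(("S",), "10101"))  # True  (a palindrome)
print(generates(("S",), "01"))     # False (not a palindrome)
```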

Context-free Grammars: Terminology
• Derivation: Suppose that
• α and β are strings of grammar symbols, and
• A → γ is a production.
• Then, αAβ ⇒ αγβ (“αAβ derives αγβ”).

• ⇒ : “derives in one step”
• ⇒* : “derives in 0 or more steps”
• α ⇒* α (0 steps)
• α ⇒* γ if α ⇒ β and β ⇒* γ (≥ 1 steps)

Derivations: Example
• Grammar for palindromes: G = (V, T, P, S),
• V = { S },
• T = { 0, 1 },
• P = { S → 0S0 | 1S1 | 0 | 1 | ε }.
• A derivation of the string 10101:
• S
• ⇒ 1S1 (using S → 1S1)
• ⇒ 1 0S0 1 (using S → 0S0)
• ⇒ 1 0 1 0 1 (using S → 1)

Leftmost and Rightmost Derivations
• A leftmost derivation is one where, at each step,
the leftmost nonterminal is replaced.
• (analogous for rightmost derivation)
• Example: a grammar for arithmetic expressions:
• E → E + E | E * E | id
• Leftmost derivation:
• E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
• Rightmost derivation:
• E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id

Context-free Grammars: Terminology
• The language of a grammar G = (V, T, P, S) is
• L(G) = { w | w ∈ T* and S ⇒* w }.
• The language of a grammar contains only strings of terminal symbols.

• Two grammars G1 and G2 are equivalent if
• L(G1) = L(G2).

Parse Trees
• A parse tree is a tree representation of a
derivation.
• Constructing a parse tree:
• The root is the start symbol S of the grammar.
• Given a parse tree for α X β, if the next derivation step is
• α X β ⇒ α γ1…γn β
• then the parse tree is obtained by adding children γ1, …, γn to the node X.
Approaches to Parsing
• Top-down parsing:
• attempts to figure out the derivation for
the input string, starting from the start
symbol.

• Bottom-up parsing:
• starting with the input string, attempts to
“derive in reverse” and end up with the
start symbol;
• forms the basis for parsers obtained from parser-generator tools such as yacc, bison.

Top-down Parsing
• “top-down:” starting with the start symbol
of the grammar, try to derive the input
string.

• Parsing process: use the current state of the parser, and the next input token, to guide the derivation process.

• Implementation: use a finite state automaton augmented with a runtime stack (“pushdown automaton”).

Bottom-up Parsing

• “bottom-up:” work backwards from the input string to obtain a derivation for it.

• Parsing process: use the parser state to keep track of:
• what has been seen so far, and
• given this, what the rest of the input might look like.

• Implementation: use a finite state automaton augmented with a runtime stack (“pushdown automaton”).


Parsing: Top-down vs. Bottom-up
 .

24

Compiler construction
Parsing Problems: Ambiguity

• A grammar G is ambiguous if some string in L(G) has more than one parse tree.
• Equivalently: if some string in L(G) has more than one leftmost (rightmost) derivation.
• Example: The grammar
• E → E + E | E * E | id
• is ambiguous, since “id + id * id” has two parses:

      E                      E
    / | \                  / | \
   E  +  E                E  *  E
   |    / | \           / | \    |
   id  E  *  E         E  +  E   id
       |     |         |     |
       id    id        id    id
Dealing with Ambiguity
1. Transform the grammar to an equivalent
unambiguous grammar.
2. Use disambiguating rules along with the
ambiguous grammar to specify which
parse to use.
Comment: It is not possible to determine
algorithmically whether:
• Two given CFGs are equivalent;
• A given CFG is ambiguous.


Removing Ambiguity: Operators
• Basic idea: use additional nonterminals to
enforce associativity and precedence:
• Use one nonterminal for each precedence level:
• E → E * E | E + E | id
• needs 2 nonterminals (2 levels of precedence).
• Modify productions so that the lower-precedence nonterminal recurses in the direction of the operator’s associativity:
• E → E + E becomes E → E + T (+ is left-associative)

Example
• Original grammar:
• E → E * E | E / E | E + E | E – E | ( E ) | id
• precedence levels: { *, / } > { +, – }
• associativity: *, /, +, – are all left-associative.

• Transformed grammar:
• E → E + T | E – T | T (precedence level for: +, –)
• T → T * F | T / F | F (precedence level for: *, /)
• F → ( E ) | id
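As an illustration (not from the lecture), the transformed grammar can be parsed by hand: the left-recursive rules are realized as loops (E → T { (+|–) T }), which keeps the operators left-associative. Numeric literals stand in for `id`, and all function names below are my own:

```python
# Hand-written parser-evaluator for the transformed expression grammar.
# Left recursion E -> E + T | T is implemented as iteration E -> T { (+|-) T },
# which preserves left-associativity without recursing on the left.
import re

def tokenize(s):
    return re.findall(r"\d+|[-+*/()]", s)

def parse_expr(toks, i=0):          # E -> T { (+|-) T }
    val, i = parse_term(toks, i)
    while i < len(toks) and toks[i] in "+-":
        op, (rhs, i) = toks[i], parse_term(toks, i + 1)
        val = val + rhs if op == "+" else val - rhs
    return val, i

def parse_term(toks, i):            # T -> F { (*|/) F }
    val, i = parse_factor(toks, i)
    while i < len(toks) and toks[i] in "*/":
        op, (rhs, i) = toks[i], parse_factor(toks, i + 1)
        val = val * rhs if op == "*" else val / rhs
    return val, i

def parse_factor(toks, i):          # F -> ( E ) | num
    if toks[i] == "(":
        val, i = parse_expr(toks, i + 1)
        return val, i + 1           # skip the closing ')'
    return int(toks[i]), i + 1

print(parse_expr(tokenize("4*(2+3)"))[0])   # 20
print(parse_expr(tokenize("8-3-2"))[0])     # 3, i.e. (8-3)-2, left-associative
```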


Bottom-up parsing: Approach
1. Preprocess the grammar to compute some
info about it.
(FIRST and FOLLOW sets)
2. Use this info to construct a pushdown
automaton for the grammar:
• the automaton uses a table (“parsing
table”) to guide its actions;
• constructing a parser amounts to
constructing this table.

FIRST Sets
Defn: For any string of grammar symbols α,
FIRST(α) = { a | a is a terminal and α ⇒* aβ }.
If α ⇒* ε then ε is also in FIRST(α).
 Example: E → T E′
          E′ → + T E′ | ε
          T → F T′
          T′ → * F T′ | ε
          F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E′) = { +, ε }
FIRST(T′) = { *, ε }

Computing FIRST Sets
Given a sequence of grammar symbols α:
 if α is a single terminal or α = ε, then FIRST(α) = { α }.
 if α is a nonterminal A with productions A → α1 | … | αn, then:
• FIRST(A) = FIRST(α1) ∪ … ∪ FIRST(αn).
 if α is a sequence of symbols Y1 … Yk, then:
• for i = 1 to k do:
– add each a ∈ FIRST(Yi), such that a ≠ ε, to FIRST(α).
– if ε ∉ FIRST(Yi) then break;
• if ε is in each of FIRST(Y1), …, FIRST(Yk), then add ε to FIRST(α).
Computing FIRST sets: cont’d
• For each nonterminal A in the grammar, initialize FIRST(A) = ∅.
• repeat {
•   for each nonterminal A in the grammar {
•     compute FIRST(A); /* as described previously */
•   }
• } until there is no change to any FIRST set.

Example (FIRST Sets)
X → YZ | a
Y → b | ε
Z → c | ε

• X → a, so add a to FIRST(X).
• X → YZ, b ∈ FIRST(Y), so add b to FIRST(X).
• Y → ε, i.e. ε ∈ FIRST(Y), so add the non-ε symbols from FIRST(Z) to FIRST(X).
• ► add c to FIRST(X).
• ε ∈ FIRST(Y) and ε ∈ FIRST(Z), so add ε to FIRST(X).
• Final: FIRST(X) = { a, b, c, ε }.
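The fixpoint computation, run on the example above, can be sketched as follows. The grammar encoding (a dict of RHS tuples, `()` for an ε-production, `""` for ε itself) is my own choice, not from the lecture:

```python
# Iterative FIRST-set computation, applied to X -> YZ | a, Y -> b | eps,
# Z -> c | eps. The empty string "" stands for epsilon.
EPS = ""

def first_of_seq(seq, first, grammar):
    """FIRST of a sequence Y1 ... Yk of terminals and nonterminals."""
    out = set()
    for Y in seq:
        f = first[Y] if Y in grammar else {Y}   # terminal a: FIRST(a) = {a}
        out |= f - {EPS}
        if EPS not in f:
            return out                          # Yi cannot vanish: stop here
    out.add(EPS)                                # every Yi can derive epsilon
    return out

def first_sets(grammar):
    first = {A: set() for A in grammar}
    changed = True
    while changed:                              # repeat until no FIRST set changes
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                add = first_of_seq(rhs, first, grammar)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first

# X -> YZ | a,  Y -> b | eps,  Z -> c | eps
G = {"X": [("Y", "Z"), ("a",)], "Y": [("b",), ()], "Z": [("c",), ()]}
print(sorted(first_sets(G)["X"]))   # ['', 'a', 'b', 'c']  i.e. { a, b, c, eps }
```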

FOLLOW Sets
Definition: Given a grammar G = (V, T, P, S), for any nonterminal A ∈ V:
• FOLLOW(A) = { a ∈ T | S ⇒* αAaβ for some α, β }.
i.e., FOLLOW(A) contains those terminals that can appear after A in something derivable from the start symbol S.
• if S ⇒* αA then $ is also in FOLLOW(A). ($ ≡ EOF, “end of input.”)
Example:
E → E + E | id
FOLLOW(E) = { +, $ }.

Computing FOLLOW Sets
Given a grammar G = (V, T, P, S):
1. add $ to FOLLOW(S);
2. repeat {
• for each production A → αBβ in P, add every non-ε symbol in FIRST(β) to FOLLOW(B).
• for each production A → αBβ in P where ε ∈ FIRST(β), add everything in FOLLOW(A) to FOLLOW(B).
• for each production A → αB in P, add everything in FOLLOW(A) to FOLLOW(B).
} until no change to any FOLLOW set.
Example (FOLLOW Sets)
X → YZ | a
Y → b | ε
Z → c | ε
• X is the start symbol: add $ to FOLLOW(X);
• X → YZ, so add everything in FOLLOW(X) to FOLLOW(Z).
• ► add $ to FOLLOW(Z).
• X → YZ, so add every non-ε symbol in FIRST(Z) to FOLLOW(Y).
• ► add c to FOLLOW(Y).
• X → YZ and ε ∈ FIRST(Z), so add everything in FOLLOW(X) to FOLLOW(Y).
• ► add $ to FOLLOW(Y).
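The FOLLOW computation, applied to the same example, can be sketched as below. The FIRST helper is re-declared (with hand-supplied FIRST sets for the example) so the sketch runs standalone; the encoding is my own:

```python
# Iterative FOLLOW-set computation for the grammar above.
EPS = ""   # stands for epsilon

def first_of_seq(seq, first, grammar):
    out = set()
    for Y in seq:
        f = first[Y] if Y in grammar else {Y}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                   # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for A, prods in grammar.items():
            for rhs in prods:
                for i, B in enumerate(rhs):
                    if B not in grammar:
                        continue             # only nonterminals get FOLLOW sets
                    beta = rhs[i + 1:]
                    f = first_of_seq(beta, first, grammar)
                    # non-eps symbols of FIRST(beta); plus FOLLOW(A) if beta
                    # can vanish (covers the A -> alpha B case, beta empty)
                    add = (f - {EPS}) | (follow[A] if EPS in f else set())
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

# X -> YZ | a,  Y -> b | eps,  Z -> c | eps, with FIRST sets as computed before
G = {"X": [("Y", "Z"), ("a",)], "Y": [("b",), ()], "Z": [("c",), ()]}
FIRST = {"X": {"a", "b", "c", EPS}, "Y": {"b", EPS}, "Z": {"c", EPS}}
fol = follow_sets(G, FIRST, "X")
print(sorted(fol["Y"]), sorted(fol["Z"]))   # ['$', 'c'] ['$']
```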

Shift-reduce Parsing
• An instance of bottom-up parsing
• Basic idea: repeat
1. in the string being processed, find a
substring α such that A → α is a
production;
2. replace the substring α by A (i.e., reverse
a derivation step).
until we get the start symbol.
• Technical issues: Figuring out
1. which substring to replace; and
2. which production to reduce with.

Shift-reduce Parsing: Example

Grammar: S → aABe
         A → Abc | b
         B → d

Input: abbcde

abbcde ⇒ aAbcde (using A → b)
       ⇒ aAde   (using A → Abc)
       ⇒ aABe   (using B → d)
       ⇒ S      (using S → aABe)

Shift-Reduce Parsing: cont’d
• Need to choose reductions carefully:
• abbcde ⇒ aAbcde ⇒ aAbcBe ⇒ … doesn’t work.
• A handle of a string s is a substring β s.t.:
• β matches the RHS of a rule A → β; and
• replacing β by A (the LHS of the rule) represents a step in the reverse of a rightmost derivation of s.
Shift-reduce Parsing: Implementation
• Data Structures:
• a stack, its bottom marked by ‘$’. Initially empty.
• the input string, its right end marked by ‘$’. Initially w.
• Actions:
• repeat
• Shift some (≥ 0) symbols from the input string onto the stack, until a handle β appears on top of the stack.
• Reduce β to the LHS of the appropriate production.
• until ready to accept.
• Acceptance: when the input is empty and the stack contains only the start symbol.
Example
Stack (→)   Input      Action             Grammar:
$           abbcde$    shift              S → aABe
$a          bbcde$     shift              A → Abc | b
$ab         bcde$      reduce: A → b      B → d
$aA         bcde$      shift
$aAb        cde$       shift
$aAbc       de$        reduce: A → Abc
$aA         de$        shift
$aAd        e$         reduce: B → d
$aAB        e$         shift
$aABe       $          reduce: S → aABe
$S          $          accept
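The mechanics of the trace above can be sketched in code. The action sequence is supplied by hand, exactly as in the table; deciding between shift and reduce automatically is the job of the LR parse tables introduced later. The `run` function and its encoding are my own:

```python
# Shift-reduce mechanics for the trace above. Actions are given by hand;
# "shift" moves one input symbol to the stack, ("A", "Abc") reduces the
# handle "Abc" on top of the stack to the nonterminal "A".

def run(actions, inp):
    stack, i = [], 0
    for act in actions:
        if act == "shift":
            stack.append(inp[i])
            i += 1
        else:                                  # act = (lhs, rhs)
            lhs, rhs = act
            assert "".join(stack[-len(rhs):]) == rhs, "handle not on top of stack"
            del stack[-len(rhs):]              # pop the handle ...
            stack.append(lhs)                  # ... and push the LHS
    return "".join(stack), inp[i:]             # final stack, remaining input

# the Action column of the trace: S -> aABe, A -> Abc | b, B -> d
actions = ["shift", "shift", ("A", "b"), "shift", "shift", ("A", "Abc"),
           "shift", ("B", "d"), "shift", ("S", "aABe")]
print(run(actions, "abbcde"))   # ('S', '')  -> accept
```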
Conflicts
• Can’t decide whether to shift or to reduce ―
both seem OK (“shift-reduce conflict”).
• Example: S → if E then S | if E then
S else S | …

• Can’t decide which production to reduce with ― several may fit (“reduce-reduce conflict”).
• Example: Stmt → id ( args ) | Expr
•          Expr → id ( args )

LR Parsing
• A kind of shift-reduce parsing. An LR(k)
parser:
• scans the input L-to-R;
• produces a Rightmost derivation (in
reverse); and
• uses k tokens of lookahead.
• Advantages:
• very general and flexible, and handles a
wide class of grammars;
• efficiently implementable.
• Disadvantages:
• difficult to implement by hand (use parser-generator tools such as yacc or bison).
LR Parsing: Schematic

• The driver program is the same for all LR parsers (SLR(1), LALR(1), LR(1), …). Only the parse table changes.
• Different LR parsing algorithms involve different tradeoffs between parsing power and parse table size.
LR Parsing: the parser stack
• The parser stack holds strings of the form
• s0 X1 s1 X2 s2 … Xm sm (sm is on top)
• where the si are parser states and the Xi are grammar symbols.
• (Note: the Xi and si always come in pairs, with the state component si on top.)

• A parser configuration is a pair
• ⟨stack contents, unexpended input⟩

LR Parsing: Roadmap
• LR parsing algorithm:
• parse table structure
• parsing actions

• Parse table construction:
• viable prefix automaton
• parse table construction from this automaton
• improving parsing power: different LR parsing algorithms

LR Parse Tables
• The parse table has two parts: the action function and the goto function.

• At each point, the parser’s next move is given by action[sm, ai], where:
• sm is the state on top of the parser stack, and
• ai is the next input token.

• The goto function is used only during reduce moves.

LR Parser Actions: shift
• Suppose:
• the parser configuration is ⟨s0 X1 s1 … Xm sm, ai … an⟩, and
• action[sm, ai] = ‘shift sn’.
• Effects of shift move:
• push the next input symbol ai; and
• push the state sn.

• New configuration: ⟨s0 X1 s1 … Xm sm ai sn, ai+1 … an⟩

LR Parser Actions: reduce
• Suppose:
• the parser configuration is ⟨s0 X1 s1 … Xm sm, ai … an⟩, and
• action[sm, ai] = ‘reduce A → β’.
• Effects of reduce move:
• pop n states and n grammar symbols off the stack (2n symbols total), where n = |β|.
• suppose the (newly uncovered) state on top of the stack is t, and goto[t, A] = u.
• push A, then u.
• New configuration: ⟨s0 X1 s1 … Xm−n sm−n A u, ai … an⟩

LR Parsing Algorithm
1. set ip to the start of the input string w$.
2. while TRUE do:
1. let s = state on top of parser stack, a = input symbol pointed at by ip.
2. if action[s, a] == ‘shift t’ then: (i) push the input symbol a on the stack, then the state t; (ii) advance ip.
3. if action[s, a] == ‘reduce A → β’ then: (i) pop 2*|β| symbols off the stack; (ii) suppose t is the state that now gets uncovered on the stack; (iii) push the LHS grammar symbol A and the state u = goto[t, A].
4. if action[s, a] == ‘accept’ then accept;
5. else signal a syntax error.
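The algorithm above can be run on a concrete table. The sketch below uses an SLR(1) table that I constructed by hand for the small grammar S → 0 S 1 | ε (the same grammar used for the viable-prefix example later); the encoding is my own, and for brevity the stack holds states only, since symbols and states always come in pairs:

```python
# LR driver loop for S -> 0 S 1 | eps, with a hand-built SLR(1) table.
# States: 0 = start, 1 = accept state, 2 = after '0', 3 = after '0 S',
# 4 = after '0 S 1'. A reduce entry carries the LHS and |RHS|.
ACTION = {
    (0, "0"): ("shift", 2), (0, "1"): ("reduce", "S", 0),
    (0, "$"): ("reduce", "S", 0),
    (1, "$"): ("accept",),
    (2, "0"): ("shift", 2), (2, "1"): ("reduce", "S", 0),
    (2, "$"): ("reduce", "S", 0),
    (3, "1"): ("shift", 4),
    (4, "1"): ("reduce", "S", 3), (4, "$"): ("reduce", "S", 3),
}
GOTO = {(0, "S"): 1, (2, "S"): 3}

def lr_parse(w):
    stack, ip = [0], 0                 # stack of states; the paired grammar
    w = w + "$"                        # symbols are left implicit
    while True:
        s, a = stack[-1], w[ip]
        act = ACTION.get((s, a))
        if act is None:
            return False               # syntax error
        if act[0] == "shift":
            stack.append(act[1])
            ip += 1
        elif act[0] == "reduce":       # act = ("reduce", A, |beta|)
            _, A, n = act
            del stack[len(stack) - n:]             # pop |beta| states
            stack.append(GOTO[(stack[-1], A)])     # then goto on A
        else:
            return True                # accept

print(lr_parse("0011"), lr_parse("01"), lr_parse("001"))  # True True False
```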
LR parsing: Viable Prefixes
• Goal: to be able to identify handles, and so
produce a rightmost derivation in reverse.
• Given a configuration ⟨s0 X1 s1 … Xm sm, ai … an⟩:
• X1 X2 … Xm ai … an is obtainable on a rightmost derivation.
• X1 X2 … Xm is called a viable prefix.
• The set of viable prefixes of a grammar is recognizable using a finite automaton.
• This automaton is used to recognize handles.

Viable Prefix Automata
• An LR(0) item of a grammar G is a production of G with a dot “·” somewhere in the RHS.
• Example: The rule A → a A b gives these LR(0) items:
• A → · a A b
• A → a · A b
• A → a A · b
• A → a A b ·
• Intuition: ‘A → α · β’ denotes that:
• we’ve seen something derivable from α; and
• it would be legal to see something derivable from β.
Overall Approach
Given a grammar G with start symbol S:
• Construct the augmented grammar by adding a new start symbol S′ and a new production S′ → S.

• Construct a finite state automaton whose start state is labeled by the LR(0) item S′ → · S.

• Use this automaton to construct the parsing table.

Viable Prefix NFA for LR(0) items
• Each state is labeled by an LR(0) item. The initial state is labeled S′ → · S.

• Transitions:
1. A → α · X β ──X──► A → α X · β, where X is a terminal or nonterminal.
2. A → α · X β ──ε──► X → · γ, where X is a nonterminal and X → γ is a production.

Viable Prefix NFA: Example
Grammar:
S → 0 S 1
S → ε
Viable Prefix NFA → DFA
• Given a set of LR(0) items I, the set closure(I) is constructed as follows:
• repeat
• add every item in I to closure(I);
• if A → α · B β ∈ closure(I) and B is a nonterminal, then for each production B → γ, add the item B → · γ to closure(I).
• until no new items can be added to closure(I).

• Intuition:
• A → α · B β ∈ closure(I) means something derivable from B is legal at this point. This means that something derivable from γ, for each production B → γ, would also be legal.
Viable Prefix NFA → DFA (cont’d)
• Given a set of LR(0) items I, the set goto(I, X) is defined as
• goto(I, X) = closure({ A → α X · β | A → α · X β ∈ I })

• Intuition:
• if A → α · X β ∈ I then (a) we’ve seen something derivable from α; and (b) something derivable from X would be legal at this point.
• Suppose we now see something derivable from X.
• The parser should “go to” a state where (a) we’ve seen something derivable from α X; and (b) something derivable from β would be legal.

Example

 Let I0 = { S′ → · S }.

 I1 = closure(I0) = { S′ → · S,           /* from I0 */
                      S → · 0 S 1, S → · }
 goto(I1, 0) = closure( { S → 0 · S 1 } )
             = { S → 0 · S 1, S → · 0 S 1, S → · }
Viable Prefix DFA for LR(0) Items
1. Given a grammar G with start symbol S, construct the augmented grammar with new start symbol S′ and new production S′ → S.
2. C = { closure({ S′ → · S }) }; // C = a set of sets of items = set of parser states
3. repeat {
     for each set of items I ∈ C {
       for each grammar symbol X {
         if ( goto(I, X) ≠ ∅ && goto(I, X) ∉ C ) { // new state
           add goto(I, X) to C;
         }
       }
     }
   } until no change to C;

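The closure, goto, and state-set constructions above can be sketched together, using the running grammar S′ → S, S → 0 S 1 | ε. The item encoding (lhs, rhs, dot) and the function names are my own:

```python
# closure(), goto(I, X), and the canonical collection of LR(0) item sets
# for S' -> S, S -> 0 S 1 | eps. An item (lhs, rhs, dot) puts the dot
# immediately before rhs[dot].
GRAMMAR = {"S'": [("S",)], "S": [("0", "S", "1"), ()]}

def closure(items):
    out = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(out):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before nonterminal B
                for gamma in GRAMMAR[rhs[dot]]:          # add B -> . gamma
                    item = (rhs[dot], gamma, 0)
                    if item not in out:
                        out.add(item)
                        changed = True
    return out

def goto(items, X):
    moved = {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == X}        # advance the dot over X
    return closure(moved)

def canonical_collection():
    symbols = {x for prods in GRAMMAR.values() for rhs in prods for x in rhs}
    C = {frozenset(closure({("S'", ("S",), 0)}))}
    changed = True
    while changed:
        changed = False
        for I in list(C):
            for X in symbols:
                J = frozenset(goto(I, X))
                if J and J not in C:                     # new nonempty state
                    C.add(J)
                    changed = True
    return C

I1 = closure({("S'", ("S",), 0)})   # { S' -> .S,  S -> .0S1,  S -> . }
print(len(I1), len(goto(I1, "0")), len(canonical_collection()))  # 3 3 5
```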
SLR(1) Parse Table Construction I
Given a grammar G with start symbol S:
• Construct the augmented grammar G′
with start symbol S′.
• Construct the set of states {I0, I1, …, In}
for the Viable Prefix DFA for the
augmented grammar G′.
• Each DFA state Ii corresponds to a parser
state si.
• The initial parser state s0 corresponds to the DFA state I0 obtained from the item S′ → · S.

SLR(1) Parse Table Construction II
• Parsing action for parser state si:
• action table entries:
• if DFA state Ii contains an item A → α · a β, where a is a terminal, and goto(Ii, a) = Ij: set action[i, a] = shift j.
• if DFA state Ii contains an item A → α ·, where A ≠ S′: for each b ∈ FOLLOW(A), set action[i, b] = reduce A → α.
• if state Ii contains the item S′ → S ·: set action[i, $] = accept.
• goto table entries:
• for each nonterminal A, if goto(Ii, A) = Ij, then goto[i, A] = j.
• any entry not defined by these steps is an error state.
SLR(1) Shortcomings
• SLR(1) parsing uses reduce actions too
liberally. Because of this it fails on many
reasonable grammars.
• Example (simple pointer assignments):
S → R | L = R
L → * R | id
R → L
The SLR parse table has a state { S → L · = R, R → L · }, and FOLLOW(L) = { =, $ }.
⇒ shift-reduce conflict.

Improving LR Parsing
• SLR(1) parsing weaknesses can be
addressed by incorporating lookahead into
the LR items in parser states.
• The lookahead makes it possible to
remove some “spurious” reduce actions
in the parse table.
• The LALR(1) parsers produced by bison
and yacc incorporate such lookahead
items.

• This improves parsing power, but at the cost of larger parse tables.
Error Handling
Possible reactions to lexical and syntax errors:
• ignore the error. Unacceptable!
• crash, or quit, on first error. Unacceptable!
• continue to process the input. No code
generation.
• attempt to repair the error: transform an
erroneous program into a similar but legal
input.
• attempt to correct the error: try to guess
what the programmer meant. Not
worthwhile.
Error Reporting
• Error messages should refer to the source
program.
• prefer “line 11: X redefined” to “conflict
in hash bucket 53”
• Error messages should, as far as possible,
indicate the location and nature of the error.
• avoid “syntax error” or “illegal
character”
• Error messages should be specific.
• prefer “x not declared in function foo”
to “missing declaration”
• They should not be redundant.
Error Recovery
• Lexical errors: pass the illegal character to
the parser and let it deal with the error.
• Syntax errors: “panic mode error
recovery”
• Essential idea: skip part of the input
and pretend as though we saw
something legal, then hope to be able to
continue.
• Pop the stack until we find a state s such that goto[s, A] is defined for some nonterminal A.
• discard input tokens until we find some token a that can legitimately follow A (i.e., a ∈ FOLLOW(A)); then push goto[s, A] and resume parsing.
