Syntax Analysis
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given
source program.
A context-free grammar
Bottom-up methods.
They are not as easy to build, but tools for generating them directly from a grammar are available.
Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
Top-Down Parsing
Top-down parsing is done by starting with the root, labeled with the starting
nonterminal stmt, and repeatedly performing the following two steps:
1. At node N, labeled with nonterminal A, select one of the productions for A and
construct children at N for the symbols in the production body.
2. Find the next node at which a subtree is to be constructed, typically the leftmost
unexpanded nonterminal of the tree.
  S             S       Backtrack         S
 /|\           /|\                       /|\
c A d         c A d                     c A d
               / \                        |
              a   b                       a
Predictive Parsing
Panic-mode recovery
Phrase-level recovery
Error-productions
Global-correction.
Panic-Mode Recovery
The parser discards input symbols one at a time until one of a designated set of
synchronizing tokens is found.
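The skipping step described above can be sketched in a few lines. This is a minimal illustration, not a full parser: the function name panic_mode_skip, the token list, and the choice of ";" and "}" as synchronizing tokens are all assumptions made for the example.

```python
def panic_mode_skip(tokens, pos, sync_tokens):
    """Discard tokens from index pos until a synchronizing token is found.

    Returns the index of the first synchronizing token at or after pos,
    or len(tokens) if no synchronizing token remains.
    """
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1  # discard one input symbol
    return pos


# Hypothetical token stream with an error detected at index 3 ("*" after "+").
tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "1", ";"]
resume = panic_mode_skip(tokens, 3, {";", "}"})
print(resume, tokens[resume])  # prints: 5 ;
```

Parsing would resume at the returned position, so everything between the error and the synchronizing token goes unchecked.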
Phrase-Level Recovery
On discovering an error, the parser may replace a prefix of the remaining input by
some string that allows the parser to continue.
Its main disadvantage is the difficulty of coping with situations in which the
actual error occurred before the point of detection.
Error Productions
Expand the grammar for the language at hand with productions that generate the
erroneous constructs.
The parser can then generate appropriate error diagnostics about the erroneous
construct that has been recognized in the input.
Global Correction
Ideally, the compiler should make as few changes as possible in processing an
incorrect input string.
Given an incorrect input string x and grammar G, algorithms will find a parse
tree for a related string y, such that the number of insertions, deletions, and
changes of tokens required to transform x into y is as small as possible.
Not implemented in practice: these methods are generally too costly in time and space.
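The quantity being minimized, the number of token insertions, deletions, and changes between x and y, is the classic edit distance. The sketch below computes only that cost for one given pair of token sequences. The function name and the sample tokens are assumptions for illustration, and the search over all strings y generated by G, which is the expensive part, is not attempted.

```python
def token_edit_distance(x, y):
    """Minimum number of token insertions, deletions, and changes turning x into y."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, tx in enumerate(x, start=1):
        cur = [i]
        for j, ty in enumerate(y, start=1):
            cur.append(min(
                prev[j] + 1,               # delete tx
                cur[j - 1] + 1,            # insert ty
                prev[j - 1] + (tx != ty),  # change tx into ty, or keep it
            ))
        prev = cur
    return prev[-1]


x = ["id", "+", "*", "id"]   # incorrect input
y = ["id", "+", "id"]        # a related correct string
print(token_edit_distance(x, y))
```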
Syntax Definition
A grammar describes the hierarchical structure of programming language constructs.
Notational Conventions
Terminals include the digits 0, 1, ..., 9.
Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly
empty) strings of terminals.
A set of productions A -> α1, A -> α2, ..., A -> αk with a common head A
may be written A -> α1 | α2 | ... | αk.
Unless stated otherwise, the head of the first production is the start symbol.
Derivations
E => E+E : E+E derives from E
αAβ => αγβ if there is a production rule A -> γ in our grammar and α and β
are arbitrary strings of terminal and non-terminal symbols
Left-Most Derivation
E =>lm -E =>lm -(E) =>lm -(E+E) =>lm -(id+E) =>lm -(id+id)
Right-Most Derivation
E =>rm -E =>rm -(E) =>rm -(E+E) =>rm -(E+id) =>rm -(id+id)
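The difference between the two derivation orders can be replayed mechanically. The sketch below is illustrative only: the function derive and the list-of-symbols encoding are assumptions, and the productions E -> -E, E -> (E), E -> E+E, E -> id are the ones used in the derivations of -(id+id).

```python
def derive(start, steps, nonterminals, leftmost=True):
    """Apply productions in order, always rewriting the leftmost
    (or rightmost) nonterminal, and return every sentential form."""
    form = [start]
    forms = ["".join(form)]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"expected {head!r} at position {i}, found {form[i]!r}")
        form[i:i + 1] = body  # replace the nonterminal by the production body
        forms.append("".join(form))
    return forms


steps = [
    ("E", ["-", "E"]),
    ("E", ["(", "E", ")"]),
    ("E", ["E", "+", "E"]),
    ("E", ["id"]),
    ("E", ["id"]),
]
print(" => ".join(derive("E", steps, {"E"}, leftmost=True)))
print(" => ".join(derive("E", steps, {"E"}, leftmost=False)))
```

Both runs use the same productions and end in the same sentence; only the fourth sentential form differs, -(id+E) versus -(E+id).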
We will see that the top-down parsers try to find the left-most derivation of the
given source program.
We will see that the bottom-up parsers try to find the right-most derivation of
the given source program in the reverse order.
Parse Trees and Derivations
A parse tree is a graphical representation of a derivation that filters out the
order in which productions are applied to replace nonterminals.
Each interior node represents the application of a production and is labeled
with the nonterminal A in the head of that production.
The children of the node are labeled, from left to right, by the symbols in the
body of the production.
The leaves of a parse tree are labeled by nonterminals or terminals and,
read from left to right, constitute a sentential form, called the yield or frontier
of the tree.
There is a many-to-one relationship between derivations and parse trees.
Ambiguity
A grammar that produces more than one parse tree for some sentence is said
to be ambiguous.
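Ambiguity can be checked for a particular sentence by counting its parse trees. The sketch below assumes the grammar E -> E+E | E*E | id, which is chosen here only for illustration. The function name count_parse_trees is also an assumption.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under the assumed grammar
    E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # prints: 2
```

A count above 1 for any sentence shows that the grammar is ambiguous. Here the two trees group the input as id+(id*id) and (id+id)*id.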
Writing a Grammar
Grammars are capable of describing most, but not all, of the syntax of programming
languages.
The grammar should be unambiguous.
Left-recursion elimination and left factoring are useful for rewriting
grammars.
From the resulting grammar we can create top-down parsers without
backtracking.
Such parsers are called predictive parsers; they are a form of recursive-descent parser.
Eliminating Ambiguity
The left-recursion may appear in a single step of the derivation (immediate left-
recursion), or may appear in more than one step of the derivation.
Immediate Left-Recursion
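A -> Aα | β where β does not start with A
It is eliminated by rewriting the productions as A -> βA’ and A’ -> αA’ | ε.
The following grammar is not immediately left-recursive, but it is still left-recursive:
S -> Aa | b
A -> Sc | d
S => Aa => Sca or
A => Sc => Aac causes a left-recursion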
Eliminate Left-Recursion -- Algorithm
- Arrange non-terminals in some order: A1 ... An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
            Ai -> Aj γ
        by
            Ai -> α1 γ | ... | αk γ
        where Aj -> α1 | ... | αk
      }
    - eliminate immediate left-recursions among Ai productions
  }
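The algorithm above can be sketched directly in Python. This is a minimal illustration under stated assumptions: a grammar is encoded as a dict from each non-terminal to a list of bodies, a body is a list of symbols, the empty list stands for ε, and new non-terminals are named by appending an ASCII apostrophe. The function names are hypothetical, and the sketch assumes the usual preconditions of no cycles and no ε-productions in the input grammar.

```python
def eliminate_immediate(head, bodies):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}  # no immediate left recursion
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],  # [] stands for epsilon
    }


def eliminate_left_recursion(grammar, order):
    """Apply the ordering-based algorithm to the non-terminals in `order`."""
    g = {a: [list(b) for b in bodies] for a, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Replace Ai -> Aj gamma by Ai -> delta gamma for each Aj -> delta.
                    expanded += [delta + body[1:] for delta in g[aj]]
                else:
                    expanded.append(body)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g


grammar = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], ["f"]],
}
for head, bodies in eliminate_left_recursion(grammar, ["S", "A"]).items():
    print(head, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

Changing the order argument to ["A", "S"] produces a different but equivalent grammar, which is why the ordering is an explicit parameter.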
Eliminate Left-Recursion
S -> Aa | b
A -> Ac | Sd | f
- Order of non-terminals: S, A
- Replace A -> Sd with A -> Aad | bd, so A -> Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A -> bdA’ | fA’
A’ -> cA’ | adA’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S -> Aa | b
A -> bdA’ | fA’
A’ -> cA’ | adA’ | ε
Eliminate Left-Recursion – Example2
S -> Aa | b
A -> Ac | Sd | f
- Order of non-terminals: A, S
- Eliminate the immediate left-recursion in A
A -> SdA’ | fA’
A’ -> cA’ | ε
- Replace S -> Aa with S -> SdA’a | fA’a
- Eliminate the immediate left-recursion in S
S -> fA’aS’ | bS’
S’ -> dA’aS’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S -> fA’aS’ | bS’
S’ -> dA’aS’ | ε
A -> SdA’ | fA’
A’ -> cA’ | ε
Left-Recursive Grammars III
Here is an example of a (directly) left-recursive grammar:
E -> E+T | T
T -> T*F | F
F -> (E) | id
Eliminating the immediate left recursion gives:
E -> T E’
E’ -> + T E’ | ε
T -> F T’
T’ -> * F T’ | ε
F -> (E) | id
Left Factoring
Left factoring is a grammar transformation that is useful for
producing a grammar suitable for predictive, or top-down,
parsing.
stmt -> if expr then stmt else stmt
      | if expr then stmt
In general, when two alternatives share a common prefix α,
A -> αβ1 | αβ2
the grammar should be left factored as
A -> αA’
A’ -> β1 | β2
Left-Factoring -- Algorithm
For each non-terminal A with two or more alternatives (production rules)
with a common non-empty prefix
A -> αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A -> αA’ | γ1 | ... | γm
A’ -> β1 | ... | βn
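The transformation above can be sketched as follows. As an assumption for this example, a grammar is a dict from each non-terminal to a list of bodies, a body is a list of symbols, and the empty list stands for ε. The function names are hypothetical, and new non-terminals get ASCII apostrophes appended until the name is unused.

```python
def common_prefix(bodies):
    """Longest sequence of symbols that every body starts with."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(grammar):
    """Repeatedly factor out common non-empty prefixes among alternatives."""
    result = {a: [list(b) for b in bodies] for a, bodies in grammar.items()}
    work = list(result)
    while work:
        head = work.pop()
        groups = {}  # first symbol -> alternatives starting with it
        for body in result[head]:
            groups.setdefault(body[0] if body else None, []).append(body)
        new_bodies = []
        for first, group in groups.items():
            if first is None or len(group) < 2:
                new_bodies += group  # nothing to factor
                continue
            prefix = common_prefix(group)
            new = head + "'"
            while new in result:
                new += "'"
            result[new] = [b[len(prefix):] for b in group]  # the beta parts
            new_bodies.append(prefix + [new])               # A -> alpha A'
            work.append(new)  # the new non-terminal may need factoring too
        result[head] = new_bodies
    return result


grammar = {"A": [["a", "b", "B"], ["a", "B"], ["c", "d", "g"],
                 ["c", "d", "e", "B"], ["c", "d", "f", "B"]]}
for head, bodies in left_factor(grammar).items():
    print(head, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

Unlike the step-by-step presentation, this sketch factors all groups of a non-terminal in one pass, so intermediate grammars may differ while the final result is the same up to the order of alternatives.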
Left-Factoring – Example1
A -> abB | aB | cdg | cdeB | cdfB
A -> aA’ | cdg | cdeB | cdfB
A’ -> bB | B
A -> aA’ | cdA’’
A’ -> bB | B
A’’ -> g | eB | fB
Left-Factoring – Example2
A -> ad | a | ab | abc | b
A -> aA’ | b
A’ -> d | ε | b | bc
A -> aA’ | b
A’ -> d | ε | bA’’
A’’ -> ε | c
Top-Down Parsing
The parse tree is created top to bottom.
Top-down parsers:
- Recursive-descent parsing
    - Backtracking is needed.
    - It is a general parsing technique, but not widely used.
    - Not efficient.
- Predictive parsing
    - No backtracking.
    - Efficient.
    - Needs a special form of grammars (LL(1) grammars).
Recursive predictive parsing is a special form of recursive descent parsing without
backtracking.
Non-recursive (table driven) predictive parser is also known as LL(1) parser.
Recursive Predictive Parsing
Each non-terminal corresponds to a procedure.
Ex: A -> aBb
proc A {
    - match the current token with a, and move to the next token;
    - call ‘B’;
    - match the current token with b, and move to the next token;
}
Recursive Predictive Parsing (cont.)
A -> aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
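The procedure above translates almost line for line into a runnable sketch. The productions for B are not given, so B -> c is assumed here purely for illustration. The class and function names, and the use of "$" as an end marker, are also assumptions.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Match the current token with `expected` and move to the next token."""
        if self.current != expected:
            raise ParseError(
                f"expected {expected!r}, found {self.current!r} at position {self.pos}")
        self.pos += 1

    def A(self):
        if self.current == "a":    # A -> a B b
            self.match("a")
            self.B()
            self.match("b")
        elif self.current == "b":  # A -> b A B
            self.match("b")
            self.A()
            self.B()
        else:
            raise ParseError(f"unexpected token {self.current!r} at position {self.pos}")

    def B(self):
        self.match("c")            # assumed production B -> c


def accepts(text):
    """Return True if the whole input is derivable from A."""
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")
        return True
    except ParseError:
        return False


print(accepts("acb"), accepts("bacbc"), accepts("ab"))
```

The current token alone selects the production in A, so no backtracking is ever needed.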
Top-down parse for id + id * id
FIRST and FOLLOW
FIRST and FOLLOW allow us to choose which production to apply, based on the
next input symbol.
Define FIRST(α), where α is any string of grammar symbols, to be the set of terminals that
begin strings derived from α.
If α =>* ε, then ε is also in FIRST(α).
FOLLOW(A) is the set of the terminals which occur immediately after (follow) the
non-terminal A in the strings derived from the starting symbol.
$ is in FOLLOW(A) if S =>* αA for some string α, that is, if A can be the rightmost
symbol in some sentential form.
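The two definitions can be turned into a fixed-point computation. The sketch below is one possible way to do it, not a prescribed algorithm. It uses the non-left-recursive expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id with ASCII apostrophes; the dict encoding, with the empty list standing for ε, and all function names are assumptions.

```python
EPS = "ε"
END = "$"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: FIRST(X) = {X}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can derive ε, or the string is empty
    return result


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:  # iterate until no set grows
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # FOLLOW is defined for non-terminals only
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]  # sym can end a string derived from head
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, "E", FIRST)
for a in GRAMMAR:
    print(a, sorted(FIRST[a]), sorted(FOLLOW[a]))
```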
FIRST
1. If X is a terminal, then FIRST(X) = {X}.