Chapter 3 - Syntax Analyzer
Instructor: Mohammed O.
Email: [email protected]
Samara University
Chapter Three
This Chapter Covers:
Syntax Analyzer
Top-Down Parsing
Bottom-up Parsing
Context Free Grammar
Left Recursion
Left-Factoring
Syntax Analyzer
Syntax Analyzer creates the syntactic structure of the given
source program.
This syntactic structure is mostly a parse tree.
Syntax Analyzer is also known as parser.
The syntax of a programming language is described by a
context-free grammar (CFG).
The syntax analyzer (parser) checks whether a given
source program satisfies the rules implied by a context-free
grammar or not.
If it satisfies, the parser creates the parse tree of that
program.
Otherwise, the parser reports error messages.
Parser
A context-free grammar gives a precise syntactic
specification of a programming language.
The design of the grammar is an initial phase of the
design of a compiler.
A grammar can be directly converted into a parser by
some tools (parser generators).
The parser works on a stream (continuous flow) of tokens.
Parser (cont’d)
We categorize the parsers into two groups:
Top-Down Parser
the parse tree is created top to bottom, starting from the root.
Bottom-Up Parser
the parse tree is created bottom to top, starting from the leaves.
Both top-down and bottom-up parsers scan the input from
left to right (one symbol at a time).
Efficient top-down and bottom-up parsers can be
implemented only for subclasses of context-free grammars:
LL (left-to-right scan, leftmost derivation) for top-down parsing
LR (left-to-right scan, rightmost derivation) for bottom-up parsing
Context-Free Grammars
CFG is a formal grammar which is used to generate all
possible strings in a given formal language.
A CFG G is defined as a 4-tuple (a structure with four
parts): G = (V, T, P, S)
T describes a finite set of terminal symbols.
V describes a finite set of non-terminal symbols.
P describes a finite set of production rules of the form
A → α, where A is a non-terminal and
α is a string of terminals and non-terminals (including the
empty string)
S is the start symbol (one of the non-terminal symbols)
Example: E → E + E | E - E | E * E | E / E | - E
E → (E)
E → id
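The 4-tuple can be written down as plain data. The following is a minimal sketch, assuming symbols are represented as Python strings and each production body as a list of symbols; the helper name is_well_formed is hypothetical and only checks the structural conditions stated in the definition.

```python
# G = (V, T, P, S) for the expression grammar in the example above.
V = {"E"}                                    # non-terminals
T = {"+", "-", "*", "/", "(", ")", "id"}     # terminals
P = {                                        # productions: head -> list of bodies
    "E": [
        ["E", "+", "E"],
        ["E", "-", "E"],
        ["E", "*", "E"],
        ["E", "/", "E"],
        ["-", "E"],
        ["(", "E", ")"],
        ["id"],
    ],
}
S = "E"                                      # start symbol


def is_well_formed(V, T, P, S):
    """Check the structural conditions of the 4-tuple definition."""
    if S not in V or V & T:                  # S must be a non-terminal; V and T are disjoint
        return False
    for head, bodies in P.items():
        if head not in V:                    # every head A must be a non-terminal
            return False
        for body in bodies:                  # every body is a string over V and T
            if any(sym not in V and sym not in T for sym in body):
                return False
    return True


print(is_well_formed(V, T, P, S))
```

An empty list as a body would stand for the empty string in this representation.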
Derivations
In a CFG, the start symbol is used to derive the string. You can
derive the string by repeatedly replacing a non-terminal by
the right-hand side of one of its productions, until all
non-terminals have been replaced by terminal symbols.
E ⇒ E+E
E+E derives from E
we can replace E by E+E
to be able to do this, we must have the production rule
E → E+E in our grammar.
Right-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
We will see that the top-down parsers try to find the left-
most derivation of the given source program.
We will see that the bottom-up parsers try to find the right-
most derivation of the given source program in the reverse
order.
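The difference between the two derivation orders can be sketched in a few lines. This is an illustration only, assuming the single non-terminal E from the example grammar; the function name derive and its parameters are hypothetical.

```python
def derive(start, bodies, leftmost=True):
    """Apply each production body in turn to the leftmost (or rightmost) E.

    Returns the list of sentential forms, written as strings.
    """
    steps = [list(start)]
    for body in bodies:
        form = steps[-1]
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        i = positions[0] if leftmost else positions[-1]
        steps.append(form[:i] + body + form[i + 1:])   # replace that E by the body
    return ["".join(form) for form in steps]


# The same productions, applied in the same order, in both derivations.
bodies = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
leftmost = derive(["E"], bodies, leftmost=True)
rightmost = derive(["E"], bodies, leftmost=False)
print(" => ".join(leftmost))
print(" => ".join(rightmost))
```

The two results differ only in the step after -(E+E): the leftmost derivation passes through -(id+E), the rightmost through -(E+id).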
Parse Tree
Inner nodes of a parse tree are non-terminal symbols.
The leaves of a parse tree are terminal symbols.
A parse tree can be seen as a graphical representation of a
derivation.
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
[Figure: the parse tree grown one derivation step at a time, from the single node E to the complete tree for -(id+id)]
Ambiguity
A grammar that produces more than one parse tree for a
sentence is called an ambiguous grammar.
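E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree: root E + E; the left E yields id, the right E expands to E * E, yielding id * id]
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree: root E * E; the left E expands to E + E, yielding id + id, the right E yields id]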
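The ambiguity can be checked mechanically by enumerating every parse tree of the sentence. The sketch below assumes only the productions E → E+E | E*E | id are needed for this sentence; the function name parse_trees and the nested-tuple tree representation are hypothetical.

```python
def parse_trees(tokens):
    """Return every parse tree of `tokens` under E -> E+E | E*E | id.

    A tree is either the string "id" or a tuple (left, operator, right).
    """
    if tokens == ["id"]:
        return ["id"]
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "*"):                       # try this operator as the root
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tok, right))
    return trees


trees = parse_trees(["id", "+", "id", "*", "id"])
for tree in trees:
    print(tree)
print(len(trees), "parse trees")
```

More than one tree for the same sentence is exactly the condition in the definition of an ambiguous grammar.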
Ambiguity (cont’d)
For most parsers, the grammar must be unambiguous.
unambiguous grammar
unique selection of the parse tree for a sentence
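Left Recursion
A grammar is left-recursive if it has a non-terminal A such that
A ⇒+ Aα for some string α (a derivation of one or more steps).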
Top-down parsing techniques cannot handle left-recursive
grammars.
S ⇒ Aa ⇒ Sca or
A ⇒ Sc ⇒ Aac causes a left-recursion
So, we have to eliminate all left-recursions from our
grammar
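Why a top-down parser fails here can be seen in a naive recursive-descent sketch. It assumes only the two productions S → Aa and A → Sc implied by the derivations above; the function names are hypothetical, and the code shows the failure mode, not a usable parser.

```python
def parse_S(tokens, pos):
    """S -> A a : must first recognize an A starting at `pos`."""
    pos = parse_A(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "a":
        return pos + 1
    raise SyntaxError("expected 'a'")


def parse_A(tokens, pos):
    """A -> S c : must first recognize an S starting at the same `pos`."""
    pos = parse_S(tokens, pos)          # no input consumed before recursing
    if pos < len(tokens) and tokens[pos] == "c":
        return pos + 1
    raise SyntaxError("expected 'c'")


def loops_forever(tokens):
    """Report whether parsing recursed without ever consuming input."""
    try:
        parse_S(tokens, 0)
    except RecursionError:
        return True
    return False


print(loops_forever(["b", "c", "a"]))
```

Each call re-enters the other at the same input position, so the recursion never reaches a token; Python stops it with RecursionError.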
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
S → Aa | b
for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
Eliminate Left-Recursion -- Example (cont’d)
So, the resulting equivalent grammar which is not left-
recursive is:
S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
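The worked example can be reproduced in code. The sketch below assumes the standard textbook procedure that the "inner loop" remark appears to refer to: for each non-terminal in order, substitute the productions of earlier non-terminals, then remove immediate left recursion. Function names are hypothetical, bodies are lists of symbols, and an empty list stands for ε.

```python
def eliminate_immediate(head, bodies):
    """A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | eps."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    others = [b for b in bodies if not b or b[0] != head]
    if not recursive:
        return {head: bodies}
    new = head + "'"
    return {
        head: [b + [new] for b in others],
        new: [a + [new] for a in recursive] + [[]],   # [] is the empty string
    }


def eliminate_left_recursion(grammar, order):
    g = {h: [list(b) for b in bs] for h, bs in grammar.items()}
    result = {}
    for i, a in enumerate(order):
        for b in order[:i]:                  # inner loop: earlier non-terminals
            expanded = []
            for body in g[a]:
                if body and body[0] == b:    # replace A -> B x by A -> y x for each B -> y
                    expanded += [gamma + body[1:] for gamma in g[b]]
                else:
                    expanded.append(body)
            g[a] = expanded
        new_rules = eliminate_immediate(a, g[a])
        g.update(new_rules)
        result.update(new_rules)
    return result


grammar = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["f"]]}
result = eliminate_left_recursion(grammar, ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join("".join(b) or "eps" for b in bodies))
```

With the order S, A this yields the same three rules as the slide; a different order of non-terminals can give a different but equivalent grammar.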
Left-Factoring
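Left factoring is needed when it is not clear which of two
alternative productions to use to expand a non-terminal A:
A → αβ1 | αβ2
In general, for each non-terminal A whose alternatives share a common non-empty prefix α,
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA’ | γ1 | ... | γm
A’ → β1 | ... | βn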
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b
A → aA’ | b
A’ → d | ε | b | bc
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
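Both examples can be reproduced with a short sketch of left factoring. It assumes alternatives are grouped by their first symbol and each group is factored on its longest common prefix; the function names and the naming of new non-terminals (appending primes) are hypothetical choices. Bodies are lists of symbols and an empty list stands for ε.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared by the start of every body."""
    prefix = []
    for symbols in zip(*bodies):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(grammar):
    g = {h: [list(b) for b in bs] for h, bs in grammar.items()}
    work = list(g)
    while work:
        head = work.pop(0)
        groups = {}                          # first symbol -> alternatives starting with it
        for body in g[head]:
            groups.setdefault(body[0] if body else None, []).append(body)
        new_bodies = []
        for first, bodies in groups.items():
            if first is None or len(bodies) < 2:
                new_bodies += bodies         # nothing to factor in this group
                continue
            alpha = common_prefix(bodies)
            new = head + "'"
            while new in g:                  # pick an unused primed name
                new += "'"
            g[new] = [b[len(alpha):] for b in bodies]
            new_bodies.append(alpha + [new])
            work.append(new)                 # the new rules may need factoring too
        g[head] = new_bodies
    return g


ex1 = left_factor({"A": [list("abB"), list("aB"), list("cdg"), list("cdeB"), list("cdfB")]})
ex2 = left_factor({"A": [list("ad"), list("a"), list("ab"), list("abc"), list("b")]})
for ex in (ex1, ex2):
    for head, bodies in ex.items():
        print(head, "->", " | ".join("".join(b) or "eps" for b in bodies))
```

In the second example the new non-terminal A' itself has two alternatives starting with b, which is why the sketch puts every newly created non-terminal back on the work list.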
Assignment 1 - 15%
Explain each of the following in detail, with an example:
Types of three-address statements
Control-flow / positional representation of Boolean
expressions
What are the two methods used to translate
Boolean expressions?
Length: 3 to 5 pages.
Assignment 2 - 15%
Explain each of the following in detail, with an example:
Issues in the design of a code generator
Run-Time Storage Management
Static Allocation and Stack Allocation
Basic blocks and flow graphs
What is backpatching? Generate three-address code for
control-flow statements using backpatching.
Length: 5 to 10 pages.