Chapter 3 Syntax Analyzer

The document provides an overview of syntax analysis in compiler design, detailing the role of parsers in checking source programs against context-free grammars (CFGs). It explains the components of CFGs, the derivation process, and the differences between top-down and bottom-up parsing methods. Additionally, it discusses ambiguous grammars, parse trees, and the implementation of shift-reduce parsing techniques.


Compiler Design

Department of Computer Science

Syntax Analysis

Syntax Analysis
• The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar (CFG).
– If it does, the parser creates the parse tree of that program.
– Otherwise, the parser reports error messages.

• The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.
• Languages that are generated by context-free grammars are called context-free languages.
• Context-free grammars are more expressive than finite automata: if a language L is accepted by a finite automaton, then L can be generated by a context-free grammar.
• Beware: the converse is NOT true. For example, L = { a^n b^n | n >= 0 } is context-free but cannot be accepted by any finite automaton.

Context-Free Grammar (CFG)


• A grammar consists of 4 components: (T, N, s, P)
– T — set of terminal symbols
• Essentially tokens — these appear in the input string
– N — set of non-terminal symbols
• Categories of strings that impose a hierarchical structure on the language
• Useful for analysis
• Example: declaration, statement, loop, ...
– s — a special non-terminal start symbol; every sentence is derivable from it
– P — a set of production rules
• "LHS → RHS": the left-hand side produces the right-hand side
• Each production rule in a context-free grammar must be of the form:
A → α
where A is a single non-terminal and α is any string of terminals and non-terminals.
• A context-free grammar:
– gives a precise syntactic specification of a programming language;
– can be directly converted into a parser by some tools.
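To make the four components concrete, here is a minimal sketch, my own illustration rather than material from the slides, of how the grammar of Example 1 below could be represented in Python; the names terminals, nonterminals, start, productions, and derive_once are hypothetical choices for this sketch.

```python
# A hypothetical, minimal representation of a CFG G = (T, N, s, P).
# The grammar used here is the one from Example 1: S -> a A B d, A -> b, B -> a.
terminals = {"a", "b", "d"}            # T: tokens that appear in the input
nonterminals = {"S", "A", "B"}         # N: syntactic categories
start = "S"                            # s: the start symbol
productions = {                        # P: each rule A -> alpha, alpha a list of symbols
    "S": [["a", "A", "B", "d"]],
    "A": [["b"]],
    "B": [["a"]],
}

def derive_once(sentential_form, nonterminal, alternative):
    """Replace the leftmost occurrence of `nonterminal` with one of its alternatives."""
    i = sentential_form.index(nonterminal)
    return sentential_form[:i] + productions[nonterminal][alternative] + sentential_form[i + 1:]

# One possible derivation of the string "abad":
form = [start]
form = derive_once(form, "S", 0)   # ['a', 'A', 'B', 'd']
form = derive_once(form, "A", 0)   # ['a', 'b', 'B', 'd']
form = derive_once(form, "B", 0)   # ['a', 'b', 'a', 'd']
print("".join(form))               # prints: abad
```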

Example 1
• Grammar: G = { N, T, S, P }; N = { S, A, B }; T = { a, b, d }; S = { S }
• P (Production Rules): S -> a A B d, A -> b, B -> a
• String: abad
• Solution: S -> a A B d -> a b B d -> a b a d. This matches the given string. Accept.

Example 2
• Grammar: G = { N, T, S, P }; N = { E }; T = { +, -, *, /, (, ), id }; S = { E }
• P: E -> E+E | E-E | E*E | E/E | (E) | id
a. String: id + id
   Solution: E -> E + E -> id + E -> id + id. Accept.
b. String: ( id + id )
   Solution: E -> ( E ) -> ( E + E ) -> ( id + E ) -> ( id + id ). Accept.

Example 3
• Grammar: G = { N, T, S, P } over the alphabet { a, b }
• P (Production Rules): S -> aSb | ε
• This grammar generates the context-free (and non-regular) language L = { a^n b^n | n >= 0 }.
• An example derivation: S → aSb → aaSbb → aabb
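To make the link between the grammar S → aSb | ε and the language L = { a^n b^n | n >= 0 } concrete, here is a small sketch, my own illustration and not part of the slides, that expands the two productions to generate the first few strings of L; the helper name generate is hypothetical.

```python
def generate(n: int) -> str:
    """Derive the string a^n b^n by applying S -> aSb exactly n times, then S -> ε."""
    s = "S"
    for _ in range(n):
        s = s.replace("S", "aSb", 1)   # apply S -> aSb to the single non-terminal
    return s.replace("S", "")          # apply S -> ε

# The first few strings of L = { a^n b^n | n >= 0 }:
for n in range(4):
    print(repr(generate(n)))           # '', 'ab', 'aabb', 'aaabbb'
```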

Derivation
• A derivation is a sequence of replacements, using the production rules, that starts from the start symbol and produces a given string.
• There are two types:

• a. Leftmost derivation: Left-most derivations are those in
which the leftmost non-terminal is replaced at each step
of
derivation.
• b. Rightmost derivation: Rightmost derivations are those in
which the rightmost non-terminal is replaced at each step. It
is also called “canonical derivations”.
Example:
– Grammar: G = { N, T, S, P }; N = { E }; T = { +, -, *, /, (, ), id }; S = { E }
– P: E -> E+E | E-E | E*E | E/E | (E) | id
• Find the leftmost derivation and the rightmost derivation of the following string: ( ( id + id ) * id / id ).
• Solution:
a. Leftmost derivation:
   E → ( E ) → ( E * E ) → ( ( E ) * E ) → ( ( E + E ) * E ) → ( ( id + E ) * E )
     → ( ( id + id ) * E ) → ( ( id + id ) * E / E ) → ( ( id + id ) * id / E ) → ( ( id + id ) * id / id )
b. Rightmost derivation:
   E → ( E ) → ( E / E ) → ( E / id ) → ( E * E / id ) → ( E * id / id )
     → ( ( E ) * id / id ) → ( ( E + E ) * id / id ) → ( ( E + id ) * id / id ) → ( ( id + id ) * id / id )
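The only difference between the two derivation styles is which occurrence of the non-terminal is replaced at each step. The sketch below is my own illustration, not part of the slides; the lists of right-hand sides are hard-coded to reproduce derivations (a) and (b) above, and the helper names are hypothetical.

```python
def replace_leftmost(form: str, rhs: str) -> str:
    """Replace the leftmost occurrence of the non-terminal E with rhs."""
    return form.replace("E", rhs, 1)

def replace_rightmost(form: str, rhs: str) -> str:
    """Replace the rightmost occurrence of the non-terminal E with rhs."""
    i = form.rindex("E")
    return form[:i] + rhs + form[i + 1:]

# Leftmost derivation (a): the chosen right-hand sides, in order.
form = "E"
for rhs in ["(E)", "E*E", "(E)", "E+E", "id", "id", "E/E", "id", "id"]:
    form = replace_leftmost(form, rhs)
print(form)    # ((id+id)*id/id)

# Rightmost derivation (b): same final string, different replacement order.
form = "E"
for rhs in ["(E)", "E/E", "id", "E*E", "id", "(E)", "E+E", "id", "id"]:
    form = replace_rightmost(form, rhs)
print(form)    # ((id+id)*id/id)
```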

Exercise
1. Show a left-most derivation for each of the following strings, using the given grammar.
   Expr → Expr + Expr | Expr * Expr | ( Expr ) | var | const
   (a) var + const   (b) var + var * var   (c) ( var )   (d) ( var + var ) * var
Solution:
(a) Expr → Expr + Expr → var + Expr → var + const
(b) Expr → Expr * Expr → Expr + Expr * Expr → var + Expr * Expr → var + var * Expr → var + var * var

Parse Trees
• A parse tree is a labeled tree in which:
– interior nodes are labeled by non-terminals
– leaf nodes are labeled by terminals
– the children of an interior node represent a replacement of the associated non-terminal in a derivation
– the tree as a whole corresponds to a derivation
• What is the relationship between a parse tree and derivations?
– A parse tree is the graphical representation of derivations.
– There is a many-to-one relationship between derivations and parse trees.
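As a small illustration of that relationship, here is a sketch of my own, not from the slides, of a parse-tree node and the tree for id + id under E → E+E | id; both the leftmost and the rightmost derivation of id + id produce this same tree, which is the many-to-one relationship mentioned above. The class name Node is a hypothetical choice.

```python
class Node:
    """A parse-tree node: interior nodes hold a non-terminal, leaves hold a terminal."""
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []   # empty for leaf nodes

    def leaves(self):
        """The frontier of the tree, read left to right, is the derived string."""
        if not self.children:
            return [self.symbol]
        return [leaf for child in self.children for leaf in child.leaves()]

# Parse tree for "id + id" using E -> E + E and E -> id:
tree = Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])])
print(tree.leaves())   # ['id', '+', 'id']
```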

Example
• Parse tree for the expression ( ( id + id ) * id / id ) from the previous example.
• Solution: there is one parse tree for each of the two derivations above, the leftmost (lm) derivation and the rightmost (rm) derivation.

Ambiguous Grammars
• A grammar that produces more than one leftmost (or rightmost) derivation for some string (sentence) is called an ambiguous grammar.
• For most parsers, the grammar must be unambiguous.
• Unambiguous grammar: there is a unique parse tree for each sentence.
• We should eliminate ambiguity during the design phase of the compiler by rewriting the grammar as an equivalent unambiguous grammar.

Ambiguous Grammars
• To disambiguate such a grammar, we prefer one of the parse trees of a sentence generated by the ambiguous grammar and restrict the grammar to that choice.
• Ambiguous grammars can be disambiguated according to precedence and associativity rules.
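As a concrete illustration, my own and not from the slides, the ambiguous grammar E → E+E | E*E | id gives the string id+id*id two different leftmost derivations; the usual fix is the standard unambiguous expression grammar that encodes precedence and associativity. The helper name leftmost_derivation is hypothetical.

```python
# Two different leftmost derivations of "id+id*id" under the ambiguous grammar
# E -> E+E | E*E | id, each corresponding to a different parse tree.
def leftmost_derivation(choices, start="E"):
    """Apply each chosen right-hand side to the leftmost 'E' in turn."""
    form = start
    steps = [form]
    for rhs in choices:
        form = form.replace("E", rhs, 1)
        steps.append(form)
    return steps

# Parse tree 1: '+' at the root, i.e. id + (id * id)
print(leftmost_derivation(["E+E", "id", "E*E", "id", "id"]))
# Parse tree 2: '*' at the root, i.e. (id + id) * id
print(leftmost_derivation(["E*E", "E+E", "id", "id", "id"]))

# The standard unambiguous grammar gives '*' higher precedence than '+' and makes
# both operators left-associative by introducing extra non-terminals:
#   E -> E + T | T
#   T -> T * F | F
#   F -> ( E ) | id
```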

Exercise
1. Determine whether the following grammar is ambiguous. If so, show two different derivation trees for the same string of terminals, and show a left-most derivation corresponding to each tree.
   S → aSbS | aS | c
Solution: The grammar is ambiguous; the string aacbc has two different leftmost derivations, one for each derivation tree:
   S → aSbS → aaSbS → aacbS → aacbc
   S → aS → aaSbS → aacbS → aacbc

Parser
• Parsing is the activity of checking whether a given string of
symbols (usually, the stream of tokens produced by a
lexical analyser) is in the language of some grammar, and
if so, constructing a parse tree for it.
Two general types (methods) of parsers:
• Top-down methods
• Bottom-up methods
These terms refer to the order in which nodes in the parse tree are
constructed.

Top-down Parsing
• Parse trees are built from the root to the leaves.
• The input to the parser is scanned from left to right, one symbol at a time.
• A common top-down parsing method is called "recursive descent" parsing.
• A predictive parser is one kind of recursive-descent parser.
• Also called LL parsing:
– The first L means that tokens are read left to right.
– The second L means that the parser constructs a leftmost derivation.

• Top-down parsers are popular because efficient parsers can be constructed more easily by hand using top-down methods.
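To illustrate this "by hand" construction, here is a sketch of my own, not from the slides, of a tiny recursive-descent parser for the small unambiguous grammar E → T { + T }, T → id | ( E ), with one procedure per non-terminal; the class and token names are hypothetical.

```python
class Parser:
    """A tiny recursive-descent parser for: E -> T ('+' T)* ;  T -> 'id' | '(' E ')'."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):                 # one procedure per non-terminal
        self.parse_T()
        while self.peek() == "+":      # the '{ + T }' repetition
            self.match("+")
            self.parse_T()

    def parse_T(self):
        if self.peek() == "id":
            self.match("id")
        elif self.peek() == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            raise SyntaxError(f"unexpected token {self.peek()!r}")

    def parse(self):
        self.parse_E()
        self.match("$")                # the whole input must be consumed
        return True

print(Parser(["id", "+", "(", "id", "+", "id", ")", "$"]).parse())   # True
```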

Bottom-Up Parsing
• Parse trees are built from the leaves up to the root.
• The input to the parser is scanned from left to right, one symbol at a time.
• A general bottom-up parsing method is called "shift-reduce" (SR) parsing.
• An operator-precedence parser is one kind of shift-reduce parser.
• Also called LR parsing:
– L means that tokens are read left to right.
– R means that the parser constructs a rightmost derivation (in reverse).
• Bottom-up parsing, however, can handle a larger class of
grammars and translation schemes, so software tools
for generating parsers directly from grammars often use
bottom-up methods.

Example
• Grammar for a fragment of a programming language (as used in the derivations below):
    stmt → if cond then stmt | var=var
    cond → var == var

Top-down parsing (derivation from the start symbol):
    stmt → if cond then stmt
         → if var == var then stmt
         → if var == var then var=var

Bottom-up parsing (reduction to the start symbol):
    if var == var then var=var
         → if var == var then stmt
         → if cond then stmt
         → stmt

Shift-Reduce (SR) Parsing


• A shift-reduce parser tries to reduce the given input string to the start symbol.
• At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal on the left side of that production rule.

• If the substrings are chosen correctly, the rightmost derivation of the string is produced in reverse order.
• Example: Grammar S → aABb; A → aA | a; B → bB | b; and the input string aaabb.
– aaabb → aaAbb → aAbb → aABb → S

Shift-Reduce (SR) Parsing


• Handle: a handle of a string is a substring that matches the right side of a production rule, and whose reduction represents one step of a rightmost derivation in reverse.
• Reduction: the process of "reducing" a string w to the start symbol of the grammar. Each replacement of the right side of a production by its left side in this process is called a "reduction".
• Handle Pruning: a rightmost derivation in reverse, often called a "canonical reduction sequence", is obtained by "handle pruning".

SR Parsing: Example
• Example: Consider the following grammar:
  1. E -> E + E   2. E -> E * E   3. E -> ( E )   4. E -> id
  and the input string id1 + id2 + id3. The following sequence of reductions reduces id1 + id2 + id3 to the start symbol E:
  id1 + id2 + id3 → E + id2 + id3 → E + E + id3 → E + id3 → E + E → E
Stack Implementation of Shift-Reduce Parsing
• A convenient way to implement a shift-reduce parser is to use a stack and an input buffer.
• There are four possible actions of a shift-reduce parser:
– Shift: The next input symbol is shifted onto the top of the
stack.
– Reduce: Replace the handle on the top of the stack by the corresponding non-terminal.
– Accept: Successful completion of parsing.
– Error: Parser discovers a syntax error, and calls an error
recovery routine.
• Initially the stack contains only the end-marker $; the end of the input string is also marked by $.
Stack Implementation of SR-Parsing: Example
Consider the previous grammar and derivation.

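Since the table from this slide is not reproduced here, below is a sketch of my own showing the stack/input configurations a shift-reduce parser could go through on the previous grammar E → E+E | E*E | (E) | id with input id + id + id. The shift and reduce actions are hard-coded for illustration; a real parser would choose them from a parsing table.

```python
# Grammar (from the previous slide): E -> E + E | E * E | ( E ) | id
# Input: id + id + id.  Actions are hard-coded; a real parser consults a parsing table.
def run(tokens, actions):
    stack, buf = ["$"], tokens + ["$"]
    for act in actions:
        if act == "shift":
            stack.append(buf.pop(0))
        else:                                    # act == ("reduce", lhs, rhs)
            _, lhs, rhs = act
            assert stack[-len(rhs):] == rhs      # the handle is on top of the stack
            del stack[-len(rhs):]
            stack.append(lhs)
        print(f"{' '.join(stack):<20} | {' '.join(buf)}")
    return stack == ["$", "E"] and buf == ["$"]  # accepting configuration

actions = [
    "shift", ("reduce", "E", ["id"]),            # id on top of stack -> E
    "shift", "shift", ("reduce", "E", ["id"]),   # id -> E, giving E + E on the stack
    ("reduce", "E", ["E", "+", "E"]),            # E + E -> E
    "shift", "shift", ("reduce", "E", ["id"]),   # id -> E, giving E + E on the stack
    ("reduce", "E", ["E", "+", "E"]),            # E + E -> E
]
print("accept:", run(["id", "+", "id", "+", "id"], actions))
```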
Stack Implementation of Shift-Reduce Parsing

• There are four steps to follow in the shift-reduce parsing method:
– Find the rightmost derivation.
– Prepare the handle list.
– Implement the stack.
– Construct the parse tree.

Operator-Precedence Parsing
• A grammar is said to be an operator grammar if no production has two consecutive non-terminals on its right-hand side.
• Example:
  1: E -> id; E -> E + E; E -> E * E; E -> (E).   [ Valid ]
  2: E -> E OP E   [ Invalid, because two consecutive non-terminals appear on the right-hand side ]
     S -> AbB   [ Valid ]
     A -> aBb   [ Valid ]

  3: S -> aABc   [ Invalid, because two consecutive non-terminals appear on the right-hand side ]
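The check described above is mechanical: scan each right-hand side and flag any two adjacent non-terminals. The sketch below is my own illustration of that check; the function name is_operator_grammar is hypothetical.

```python
def is_operator_grammar(productions, nonterminals):
    """Return True if no right-hand side contains two consecutive non-terminals."""
    for alternatives in productions.values():
        for rhs in alternatives:
            for left, right in zip(rhs, rhs[1:]):
                if left in nonterminals and right in nonterminals:
                    return False
    return True

# Example 1 from the slide: a valid operator grammar.
g1 = {"E": [["id"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]]}
print(is_operator_grammar(g1, {"E"}))                      # True

# Examples 2 and 3 from the slide: not operator grammars.
g2 = {"E": [["E", "OP", "E"]]}
print(is_operator_grammar(g2, {"E", "OP"}))                # False
g3 = {"S": [["a", "A", "B", "c"]]}
print(is_operator_grammar(g3, {"S", "A", "B"}))            # False
```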

LR Parsing


• An LR parser consists of two parts: a driver routine and a parsing table.
• The driver routine is the same for all LR parsers; only
the parsing table changes from one parser to another.
• The schematic form of LR parser is:

LR Parsing
• There are three different techniques for producing LR parsing
tables.
1. Simple LR (SLR): This method is the easiest to implement. Unfortunately, it may fail to produce a table for certain grammars on which the other methods succeed.
2. Canonical LR (CLR): This is the most powerful and will
work on a very large class of grammars. Unfortunately this
method can be very expensive to implement.
3. Look-Ahead LR (LALR): This is intermediate in power between the SLR and canonical LR methods. It works on most programming-language grammars and, with some effort, can be implemented efficiently.
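To show the split between the fixed driver routine and the table, here is a minimal sketch, my own illustration and not from the slides, of the standard table-driven LR driver loop. The ACTION/GOTO entries are a hand-built SLR(1) table, given here as an assumption, for the tiny grammar (1) S → ( S ), (2) S → x only.

```python
# Tiny grammar: (1) S -> ( S )   (2) S -> x
# ACTION/GOTO below form a hand-built SLR(1) table for this grammar only (an assumption).
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", "S", 1), (3, "$"): ("reduce", "S", 1),   # reduce by S -> x
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", "S", 3), (5, "$"): ("reduce", "S", 3),   # reduce by S -> ( S )
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """The driver loop is the same for every LR parser; only ACTION/GOTO change."""
    stack, tokens = [0], tokens + ["$"]
    i = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[i]))
        if entry is None:
            return False                           # error entry
        if entry[0] == "accept":
            return True
        if entry[0] == "shift":
            stack.append(entry[1])
            i += 1
        else:                                      # reduce by A -> beta, |beta| states popped
            _, lhs, length = entry
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])

print(lr_parse(["(", "(", "x", ")", ")"]))   # True
print(lr_parse(["(", "x"]))                  # False (missing ')')
```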

Push-down Automata/Machine (PDA)


• A push-down automaton (PDA) is a nondeterministic finite automaton equipped with an additional storage device called a stack, and a stack head which reads from and writes to the top of the stack.

Push-down Automata/Machine (PDA)
• The stack is a last-in first-out storage device with no predetermined size limit.
• The stack head always scans the top element of the stack.
• It performs two basic stack operations:
– Push: add a new symbol at the top of the stack.
– Pop: read and remove the top symbol from the stack.
• PDAs are equivalent in power to context-free grammars.
• A PDA recognizes the strings generated by a context-free grammar.

Formal Definition of PDA

How PDA computes
• At the beginning:
– the PDA is in the initial state q0,
– the PDA has an empty stack, and
– its tape head is scanning the leftmost symbol of the input tape.
• At the end, we say a PDA accepts the input if, in one of its
computation paths, it reads all input symbols, has an
empty stack, and reaches a final state.
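To tie the PDA back to the earlier grammar S → aSb | ε, here is a sketch of my own, not from the slides, of a PDA-style recognizer for L = { a^n b^n | n >= 0 }: it pushes one marker per 'a', pops one per 'b', and accepts when all input is read and the stack is empty. The function name accepts_anbn is hypothetical.

```python
def accepts_anbn(w: str) -> bool:
    """PDA-style recognizer for L = { a^n b^n | n >= 0 } using an explicit stack."""
    stack = []
    seen_b = False
    for ch in w:
        if ch == "a" and not seen_b:
            stack.append("A")        # push one marker per leading 'a'
        elif ch == "b":
            seen_b = True
            if not stack:
                return False         # more b's than a's
            stack.pop()              # pop one marker per 'b'
        else:
            return False             # an 'a' after a 'b', or a foreign symbol
    return not stack                 # accept: all input read and the stack is empty

for w in ["", "ab", "aabb", "aab", "abab"]:
    print(w or "ε", accepts_anbn(w))   # True, True, True, False, False
```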

Error Recovery Techniques
• Panic-Mode Error Recovery
– Skipping the input symbols until a synchronizing token is found.
• Phrase-Level Error Recovery
– Each empty entry in the parsing table is filled with a pointer to a specific error routine that takes care of that error case.
• Error-Productions
– If we have a good idea of the common errors that might be encountered, we
can augment the grammar with productions that generate erroneous constructs.
– When an error production is used by the parser, we can generate appropriate
error diagnostics.
– Since it is almost impossible to know all the errors that can be made by
the programmers, this method is not practical.
• Global-Correction
– Ideally, we would like a compiler to make as few changes as possible when processing an incorrect input.
– We have to analyze the input globally to find the error.
– This is an expensive method, and it is not used in practice.

Panic-Mode Error Recovery in LL(1) Parsing
• In panic-mode error recovery, we skip all the input symbols until a
synchronizing token is found.
• What is the synchronizing token?
– All the terminal-symbols in the follow set of a non-terminal can be
used as a synchronizing token set for that non-terminal.
• So, a simple panic-mode error recovery for the LL(1) parsing:

– All the empty entries in the parsing table are marked as synch, to indicate that the parser will skip input symbols until it sees a symbol in the follow set of the non-terminal A that is on top of the stack. Then the parser pops that non-terminal A from the stack, and parsing continues from that state.
– To handle an unmatched terminal symbol, the parser pops the unmatched terminal from the stack and issues an error message saying that the unmatched terminal was inserted.
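Below is a small sketch of my own, not the slide's table, of the skip-and-pop behaviour just described: on an error with non-terminal A on top of the stack, the parser discards tokens until it reaches one in FOLLOW(A), then pops A and resumes. The token stream and FOLLOW set are hypothetical.

```python
def panic_mode_skip(tokens, i, follow_of_top):
    """Skip input tokens until one in FOLLOW(A) is found (A is the non-terminal on
    top of the parsing stack); the caller then pops A and resumes parsing."""
    skipped = []
    while i < len(tokens) and tokens[i] not in follow_of_top:
        skipped.append(tokens[i])    # these tokens are discarded
        i += 1
    return i, skipped

# Hypothetical situation: the stack top is a non-terminal A with FOLLOW(A) = { ';', '$' },
# and the parser hit an error at position 2 of the token stream below.
tokens = ["id", "=", "@", "#", ";", "id", "$"]
resume_at, skipped = panic_mode_skip(tokens, 2, {";", "$"})
print("skipped:", skipped)                     # skipped: ['@', '#']
print("resume at token:", tokens[resume_at])   # resume at token: ;
```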

Panic-Mode Error Recovery - Example
