
Chapter – 3

Syntax analysis Introduction


• Syntax: the way in which tokens are put together to form expressions,
statements, or blocks of statements.
o The rules governing the formation of statements in a programming
language.
• Syntax analysis: the task concerned with fitting a sequence of tokens into
a specified syntax.
• Parsing: To break a sentence down into its component parts with an
explanation of the form, function, and syntactical relationship of each
part.
• The syntax of a programming language is usually given by the grammar
rules of a context free grammar (CFG).
Parser

• The syntax analyzer (parser) checks whether a given source program


satisfies the rules implied by a CFG or not.
o If it satisfies, the parser creates the parse tree of that program.
o Otherwise, the parser gives the error messages.
Parser
 The parser can be categorized into two groups:
Top-down parser
o The parse tree is created top to bottom, starting from the root to leaves.
Bottom-up parser
o The parse tree is created bottom to top, starting from the leaves to root.
• Both top-down and bottom-up parsers scan the input from left to right (one
symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented by making use of a
context-free grammar:
o LL for top-down parsing
o LR for bottom-up parsing
Context free grammar (CFG)
 A context-free grammar is a specification for the syntactic structure of a
programming language.
 Context-free grammar has 4-tuples: G = (T, N, P, S) where
o T is a finite set of terminals (a set of tokens)
o N is a finite set of non-terminals (syntactic variables)
o P is a finite set of productions of the form A → α, where A is a non-terminal and
α is a string of terminals and non-terminals (including the empty string)
o S ∈ N is a designated start symbol (one of the non-terminal symbols)
Example: grammar for simple arithmetic expressions
Derivation
 A derivation is a sequence of replacements of structure names by choices on the
right hand sides of grammar rules.
 Example: E → E + E | E – E | E * E | E / E | -E
E→(E)
E → id
E => E + E means that E + E is derived from E:
o we can replace E by E + E
o to do so, we must have the production rule E → E + E in our grammar.
E => E + E => id + E => id + id : such a sequence of replacements of non-terminal
symbols is called a derivation of id+id from E.
 If we always choose the left-most non-terminal in each derivation step, this
derivation is called left-most derivation.
Example: E=>-E=>-(E)=>-(E+E)=>-(id+E)=>-(id+id)
 If we always choose the right-most non-terminal in each derivation step, this
derivation is called right-most derivation.
Example: E=>-E=>-(E)=>-(E+E)=>-(E+id)=>-(id+id)
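The leftmost-derivation example above can be replayed mechanically. Below is a small Python sketch (our own illustration, not part of the slides) that treats the sentential form as a list of symbols and always rewrites the leftmost E:

```python
# A minimal sketch that replays the leftmost derivation
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id).
# The sentential form is a list of symbols; 'E' is the only non-terminal.

def apply_leftmost(form, rhs):
    """Replace the leftmost 'E' in the sentential form with rhs."""
    i = form.index('E')               # index of the leftmost non-terminal
    return form[:i] + rhs + form[i+1:]

form = ['E']
# Right-hand sides used at each step, all drawn from the grammar above.
steps = [['-', 'E'], ['(', 'E', ')'], ['E', '+', 'E'], ['id'], ['id']]
history = [''.join(form)]
for rhs in steps:
    form = apply_leftmost(form, rhs)
    history.append(''.join(form))

print(' => '.join(history))
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
```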
Parse tree
 A parse tree is a graphical representation of a derivation.
 It filters out the order in which productions are applied to replace
non-terminals.
 A parse tree corresponding to a derivation is a labeled tree in which:
o the interior nodes are labeled by non-terminals,
o the leaf nodes are labeled by terminals, and
o the children of each internal node represent the replacement of the
associated non-terminal in one step of the derivation.
Parse tree and Derivation

Ambiguity: example
Elimination of ambiguity
Precedence/Association
 These two derivations point out a problem with the grammar:
 The grammar has no notion of precedence, or implied order of evaluation.
To add precedence:
o Create a non-terminal for each level of precedence
o Isolate the corresponding part of the grammar
o Force the parser to recognize high-precedence sub-expressions first
For algebraic expressions:
o Multiplication and division first (level one)
o Subtraction and addition next (level two)
To add associativity:
o Left-associative: the next-level (higher-precedence) non-terminal is placed at
the end of the production
Elimination of ambiguity
Precedence/Association
o We can use precedence of operators as follows:
* : higher precedence (left-associative)
+ : lower precedence (left-associative)
o We get the following unambiguous grammar:
E → E + T | T
T → T * F | F
F → (E) | id

Left Recursion

Elimination of Left recursion :


 A grammar is left-recursive if it has a non-terminal A such that there is a
derivation A ⇒+ Aα for some string α.
Left Recursion
 Top-down parsing methods cannot handle left-recursive grammar. So a
transformation that eliminates left-recursion is needed.
 To eliminate immediate left recursion, a single production pair A → Aα | β can
be replaced by the non-left-recursive productions:
A → βA'
A' → αA' | ε

 Generally, we can eliminate immediate left recursion by the following
technique. First we group the A-productions as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
and replace them by:
A → β1A' | β2A' | … | βnA'
A' → α1A' | α2A' | … | αmA' | ε
Left factoring
 When a non-terminal has two or more productions whose right-hand sides start
with the same grammar symbols, the grammar is not LL(1) and cannot be used
for predictive parsing.
 A predictive parser (a top-down parser without backtracking) insists that the
grammar be left-factored.
• In general: A → αβ1 | αβ2, where α is non-empty and is the common prefix of
the two right-hand sides.
 When processing α we do not know whether to expand A to αβ1 or to αβ2, but
if we re-write the grammar as follows:
A → αA'
A' → β1 | β2
then we can immediately expand A to αA' and defer the choice.
Example: given the following grammar:
S → iEtS | iEtSeS | a
E → b
Left factored, this grammar becomes:
S → iEtSS' | a
S' → eS | ε
E → b
Syntax analysis
 Every language has rules that prescribe the syntactic structure of well-formed
programs.
 The syntax can be described using Context Free Grammars (CFG) notation.
 The use of CFGs has several advantages:
o helps in identifying ambiguities
o it is possible to have a tool which produces automatically a parser using the
grammar
o a properly designed grammar helps in modifying the parser easily when the
language changes
Top-down parsing
Recursive Descent Parsing (RDP)
 This method of top-down parsing can be considered as an attempt to find the
leftmost derivation for an input string. It may involve backtracking.
 To construct the parse tree using RDP:
o We create one node tree consisting of S.
o Two pointers, one for the tree and one for the input, will be used to indicate
where the parsing process is.
o Initially, they will be on S and the first input symbol, respectively.
o Then we use the first S-production to expand the tree. The tree pointer will
be positioned on the left most symbol of the newly created sub-tree.
 As the symbol pointed by the tree pointer matches that of the symbol pointed
by the input pointer, both pointers are moved to the right.
 Whenever the tree pointer points on a non-terminal, we expand it using the first
production of the non-terminal.
Top-down parsing
Recursive Descent Parsing (RDP)
 Whenever the pointers point on different terminals, the production that was
used is not correct, thus another production should be used. We have to go
back to the step just before we replaced the non-terminal and use another
production.
 If we reach the end of the input and the tree pointer passes the last symbol of
the tree, we have finished parsing.
Example:
G: S → cAd
A → ab | a
 Draw the parse tree for the input string cad using the above method.
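A hand-coded sketch of this backtracking process for the example grammar (the function names parse_S and parse_A are ours, and the code is specific to this one grammar, not a general parser):

```python
# Backtracking recursive descent for G: S -> c A d, A -> ab | a.

def parse_A(s, i):
    """Return every position A can end at, trying A -> ab before A -> a."""
    ends = []
    if s[i:i+2] == 'ab':
        ends.append(i + 2)          # A -> ab
    if s[i:i+1] == 'a':
        ends.append(i + 1)          # A -> a (used after backtracking)
    return ends

def parse_S(s, i):
    """Try S -> c A d at position i; return the end position or None."""
    if i < len(s) and s[i] == 'c':
        for j in parse_A(s, i + 1):          # each alternative for A
            if j < len(s) and s[j] == 'd':
                return j + 1
    return None

def accepts(s):
    return parse_S(s, 0) == len(s)

print(accepts('cad'))    # True: A -> ab fails, backtrack to A -> a
print(accepts('cabd'))   # True
print(accepts('cd'))     # False
```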
Non-recursive predictive parsing
 It is possible to build a non-recursive parser by explicitly maintaining a stack.
 This method uses a parsing table that determines the next production to be
applied. The input buffer contains the string to be parsed followed by $ (the
right end marker)

 The stack contains a sequence of grammar symbols with $ at the bottom.


 Initially, the stack contains the start symbol of the grammar followed by $.
 The parsing table is a two dimensional array M[A, a] where A is a non-terminal
of the grammar and a is a terminal or $.
 The parser program behaves as follows.
Non-recursive predictive parsing
 The program always considers
o X, the symbol on top of the stack and
o a, the current input symbol.
Predictive Parsing…
 There are three possibilities:
1. X = a = $ : the parser halts and announces a successful completion of parsing.
2. X = a ≠ $ : the parser pops X off the stack and advances the input pointer to
the next symbol.
3. X is a non-terminal: the program consults entry M[X, a], which can be an
X-production or an error entry.
 If M[X, a] = {X → uvw}, X on top of the stack is replaced by uvw (u at the top
of the stack).
 As an output, any code associated with the X-production can be executed.
 If M[X, a] = error, the parser calls the error recovery method.
Non-recursive predictive parsing
A Predictive Parser table:

A Predictive Parser: Example


Non-recursive predictive parsing
Non-recursive predictive parsing Example:
G: E  TR
R +TR Input: 1+2
R -TR
R ε
T  0|1|…|9
FIRST and FOLLOW
 The construction of both top-down and bottom-up parsers are aided by two
functions, FIRST and FOLLOW, associated with a grammar G.
 During top-down parsing, FIRST and FOLLOW allow us to choose which
production to apply, based on the next input symbol. During panic-mode error
recovery, sets of tokens produced by FOLLOW can be used as synchronizing
tokens.
 We need to build a FIRST set and a FOLLOW set for each symbol in the
grammar. The elements of FIRST and FOLLOW are terminal symbols.
o FIRST(α) is the set of terminal symbols that can begin any string derived
from α.
FIRST and FOLLOW
FIRST
o FIRST(α) = set of terminals that begin the strings derived from α.
o If α ⇒ ε in zero or more steps, ε is in FIRST(α).
o FIRST(X), where X is a grammar symbol, can be found using the following rules:
1- If X is a terminal, then FIRST(X) = {X}
2- If X is a non-terminal, two cases:
a) If X → ε is a production, then add ε to FIRST(X)
b) For each production X → Y1Y2…Yk, place a in FIRST(X) if, for some i,
a ∈ FIRST(Yi) and ε ∈ FIRST(Yj) for all 1 ≤ j < i. If ε ∈ FIRST(Yj) for all
j = 1, …, k, then ε ∈ FIRST(X)
For any string y = X1X2…Xn:
a- Add all non-ε symbols of FIRST(X1) to FIRST(y)
b- Add all non-ε symbols of FIRST(Xi) for i ≠ 1 if, for all j < i, ε ∈ FIRST(Xj)
c- ε ∈ FIRST(y) if ε ∈ FIRST(Xi) for all i
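These rules can be implemented as a fixed-point iteration. The sketch below (our own encoding: a dict from non-terminal to a list of right-hand-side tuples, with 'eps' standing for ε) computes FIRST for a small grammar:

```python
# Fixed-point computation of FIRST, following the rules above.

EPS = 'eps'

def first_sets(grammar):
    first = {A: set() for A in grammar}
    def first_of(sym):
        return first[sym] if sym in grammar else {sym}   # terminal: {sym}
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            for rhs in rhss:
                add = set()
                all_eps = True
                for Y in rhs:                 # scan Y1 Y2 ... Yk left to right
                    f = first_of(Y)
                    add |= f - {EPS}
                    if EPS not in f:
                        all_eps = False
                        break
                if all_eps:                   # every Yi can derive eps
                    add.add(EPS)
                if not add <= first[A]:
                    first[A] |= add
                    changed = True
    return first

G = {'E': [('T', 'R')],
     'R': [('+', 'T', 'R'), (EPS,)],
     'T': [('id',)]}
print(first_sets(G))
# FIRST(E) = {id}, FIRST(R) = {+, eps}, FIRST(T) = {id}
```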
FIRST and FOLLOW
FOLLOW
 FOLLOW(A) = set of terminals that can appear immediately to the right of A in
some sentential form.
o Place $ in FOLLOW(A), where A is the start symbol.
o If there is a production B → αAβ, then everything in FIRST(β), except ε,
should be added to FOLLOW(A).
o If there is a production B → αA, or B → αAβ where ε ∈ FIRST(β), then all
elements of FOLLOW(B) should be added to FOLLOW(A).
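The three FOLLOW rules can likewise be computed by iterating to a fixed point. In this sketch (our own code) the FIRST sets for the small grammar are written out by hand so the example stays self-contained:

```python
# Fixed-point computation of FOLLOW for E -> T R, R -> + T R | eps,
# T -> id. FIRST sets are supplied by hand for brevity.

EPS = 'eps'
G = {'E': [('T', 'R')], 'R': [('+', 'T', 'R'), ()], 'T': [('id',)]}
FIRST = {'E': {'id'}, 'R': {'+', EPS}, 'T': {'id'}, '+': {'+'}, 'id': {'id'}}

def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add('$')                       # rule 1: $ in FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for B, rhss in grammar.items():
            for rhs in rhss:
                for i, A in enumerate(rhs):
                    if A not in grammar:
                        continue                 # terminals have no FOLLOW
                    add = set()
                    nullable = True
                    for Y in rhs[i+1:]:          # rule 2: FIRST(beta) - {eps}
                        add |= first[Y] - {EPS}
                        if EPS not in first[Y]:
                            nullable = False
                            break
                    if nullable:                 # rule 3: beta =>* eps
                        add |= follow[B]
                    if not add <= follow[A]:
                        follow[A] |= add
                        changed = True
    return follow

print(follow_sets(G, FIRST, 'E'))
# FOLLOW(E) = {$}, FOLLOW(R) = {$}, FOLLOW(T) = {+, $}
```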
Construction of predictive parsing table
o Input: Grammar G
o Output: Parsing table M
 For each production of the form A → α of the grammar do:
 For each terminal a in FIRST(α), add A → α to M[A, a]
 If ε ∈ FIRST(α), add A → α to M[A, b] for each b in FOLLOW(A)
 If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $]
 Make each undefined entry of M be an error.
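A sketch of these table-construction rules in Python, again with FIRST and FOLLOW supplied by hand for the small grammar E → TR, R → +TR | ε, T → id (our own encoding):

```python
# Build a predictive parsing table M from the rules above.

EPS = 'eps'
prods = [('E', ('T', 'R')), ('R', ('+', 'T', 'R')), ('R', ()), ('T', ('id',))]
FIRST = {'E': {'id'}, 'R': {'+', EPS}, 'T': {'id'}, '+': {'+'}, 'id': {'id'}}
FOLLOW = {'E': {'$'}, 'R': {'$'}, 'T': {'+', '$'}}

def first_of_string(alpha):
    """FIRST of a string of grammar symbols."""
    out = set()
    for Y in alpha:
        out |= FIRST[Y] - {EPS}
        if EPS not in FIRST[Y]:
            return out
    return out | {EPS}              # every symbol nullable (or alpha empty)

def build_table(prods):
    M = {}
    for A, alpha in prods:
        fa = first_of_string(alpha)
        for a in fa - {EPS}:
            M[(A, a)] = (A, alpha)          # rule: a in FIRST(alpha)
        if EPS in fa:
            for b in FOLLOW[A]:             # rule: eps in FIRST(alpha);
                M[(A, b)] = (A, alpha)      # covers b = $ as well
    return M                                # undefined entries mean error

M = build_table(prods)
print(M[('R', '$')])    # ('R', ()) i.e. R -> eps
print(M[('E', 'id')])   # ('E', ('T', 'R'))
```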
Bottom-Up and Top-Down Parsers
Top-down parsers:
 Starts constructing the parse tree at the top (root) of the tree and move down
towards the leaves.
 Easy to implement by hand, but work with restricted grammars.
Example: predictive parsers
Bottom-up parsers:
 Build the nodes on the bottom of the parse tree first.
 Suitable for automatic parser generation, handle a larger class of grammars.
Example: shift-reduce parser (or LR (k) parsers)
 A bottom-up parser, or shift-reduce parser, begins at the leaves and works up
to the top of the tree. The reduction steps trace out a rightmost derivation in
reverse.
Bottom-Up and Top-Down Parsers
 We want to parse the input string abbcde. This parser is known as an LR Parser
because it scans the input from Left to right, and it constructs a rightmost
derivation in reverse order.
Example of Bottom-up parser (LR parsing)
S aABe
A Abc | b
Bd
abbcde aAbcde aAde aABe S
 At each step, we have to find α such that α is a substring of the sentence and
replace α by A, where A  α
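The reduction sequence above can be reproduced with a brute-force search that repeatedly replaces the leftmost substring matching some right-hand side (an illustration of the idea only; a real LR parser locates handles far more efficiently):

```python
# Replay the reductions abbcde => aAbcde => aAde => aABe => S by naive
# substring search over the productions of the example grammar.

prods = [('S', 'aABe'), ('A', 'Abc'), ('A', 'b'), ('B', 'd')]

def reduce_to_start(sentence, start='S'):
    trace = [sentence]
    while sentence != start:
        for lhs, rhs in prods:               # try the RHSs in order
            i = sentence.find(rhs)           # leftmost occurrence
            if i != -1:
                sentence = sentence[:i] + lhs + sentence[i+len(rhs):]
                trace.append(sentence)
                break
        else:
            return None                      # no reduction applies: error
    return trace

print(' => '.join(reduce_to_start('abbcde')))
# abbcde => aAbcde => aAde => aABe => S
```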
Stack implementation of shift/reduce parsing
 In LR parsing the two major problems are:
o locate the substring that is to be reduced
o locate the production to use
Bottom-Up and Top-Down Parsers
 A shift/reduce parser operates:
o by shifting zero or more input symbols onto the stack until the right side of a
handle is on top of the stack;
o the parser then replaces the handle by the non-terminal of the corresponding
production;
o this is repeated until the start symbol is in the stack and the input is
empty, or until an error is detected.
 Four actions are possible:
o Shift: the next input is shifted on to the top of the stack
o Reduce: the parser knows the right end of the handle is at the top of the
stack. It should then decide what non-terminal should replace that substring
o Accept: the parser announces successful completion of parsing
o Error: the parser discovers a syntax error
Bottom-Up and Top-Down Parsers
 Example: the operations of a shift/reduce parser on the grammar
G: E → E + E | E * E | (E) | id
Conflict during shift/reduce parsing
 Grammars for which we can construct an LR (k) parsing table are called LR (k)
grammars.
 Most of the grammars that are used in practice are LR (1).
 There are two types of conflicts in shift/reduce parsing:
o Shift/reduce conflict: the parser knows the entire stack content and the
next k input symbols but still cannot decide whether it should shift or
reduce (typically a symptom of ambiguity).
o Reduce/reduce conflict: the parser cannot decide which of several
productions it should use for a reduction. For example, given:
E → T
E → id
T → id
with an id on the top of the stack, the parser cannot tell whether to reduce
by E → id or by T → id.

LR parser
 The LR(k) stack stores strings of the form S0 X1 S1 X2 S2 … Xm Sm, where:
o Si is a new symbol, called a state, that summarizes the information contained
in the stack below it
o Sm is the state on top of the stack
o Xi is a grammar symbol
 The parser program decides the next step by using:
 the top of the stack (Sm),
 the input symbol (ai), and
 the parsing table, which has two parts: ACTION and GOTO,
 then consulting the entry ACTION[Sm, ai] in the parsing action table.
Structure of the LR Parsing Table
 The parsing table consists of two parts:
o a parsing-action function ACTION and
o a goto function GOTO.
 The ACTION function takes as arguments a state i and a terminal a (or $, the
input endmarker).
 The value of ACTION[i, a] can have one of four forms:
o Shift j, where j is a state: the parser shifts the input symbol a onto
the top of the stack, but uses state j to represent a.
o Reduce A → β: the parser reduces β on the top of the stack
to the head A.
o Accept: the parser accepts the input and finishes parsing.
o Error: the parser discovers an error.
 The GOTO function, defined on sets of items, maps to states:
o if GOTO[Ii, A] = Ij, then GOTO maps state i and non-terminal A to state j.
LR parser configuration
 A configuration of an LR parser describes the complete state of the parser.
 A configuration of an LR parser is a pair:
(S0 X1 S1 X2 S2 … Xm Sm, ai ai+1 … an $)
whose first component is the stack contents and whose second component is the
remaining input.
 This configuration represents the right-sentential form X1 X2 … Xm ai ai+1 … an.

Behavior of LR parser
 The parser program decides the next step by using:
o the top of the stack (Sm),
o the input symbol (ai), and
o the parsing table which has two parts: ACTION and GOTO.
o then consulting the entry ACTION[Sm , ai] in the parsing action table
1. If Action[Sm, ai] = shift S, the parser program shifts both the current input
symbol ai and state S on the top of the stack, entering the configuration (S0 X1 S1
X2 S2 … Xm Sm ai S, ai+1 … an $)
2. Action[Sm, ai] = reduce A → β: the parser pops the top 2r symbols off the stack,
where r = |β| (at this point, Sm-r will be the state on top of the stack), entering the
configuration (S0 X1 S1 X2 S2 … Xm-r Sm-r A S, ai ai+1 … an $).
o Here A and S are pushed on top of the stack, where S = GOTO[Sm-r, A]. The
input buffer is not modified.
3. Action[Sm, ai] = accept, parsing is completed.
4. Action[Sm, ai] = error, parsing has discovered an error and calls an error
recovery routine.
LR-parsing algorithm
 let a be the first symbol of w$;
while(1) { /* repeat forever */
let S be the state on top of the stack;
if ( ACTION[S, a] = shift t ) {
push t onto the stack;
let a be the next input symbol;
} else if ( ACTION[S, a] = reduce A → β ) {
pop |β| symbols off the stack;
let state t now be on top of the stack;
push GOTO[t, A] onto the stack;
output the production A → β;
} else if ( ACTION[S, a] = accept ) break; /* parsing is done */
else call error-recovery routine;
}
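The driver above is straightforward to transcribe into Python. The sketch below uses hand-built SLR tables for the tiny grammar S → (S) | x (our own example, smaller than the slides' G1), and pushes states only, exactly as in the algorithm:

```python
# LR driver with hand-built SLR tables for S -> ( S ) | x.
# Production 1: S -> ( S )  (|beta| = 3);  production 2: S -> x  (|beta| = 1).

prods = {1: ('S', 3), 2: ('S', 1)}        # prod no. -> (head, |beta|)

ACTION = {
    (0, '('): ('s', 2), (0, 'x'): ('s', 3),
    (1, '$'): ('acc', None),
    (2, '('): ('s', 2), (2, 'x'): ('s', 3),
    (3, ')'): ('r', 2), (3, '$'): ('r', 2),
    (4, ')'): ('s', 5),
    (5, ')'): ('r', 1), (5, '$'): ('r', 1),
}
GOTO = {(0, 'S'): 1, (2, 'S'): 4}

def lr_parse(tokens):
    """Return the list of productions used (a rightmost derivation in
    reverse), or None on a syntax error."""
    stack = [0]                            # states only, as in the algorithm
    toks = list(tokens) + ['$']
    output = []
    while True:
        act = ACTION.get((stack[-1], toks[0]))
        if act is None:
            return None                    # error entry
        kind, arg = act
        if kind == 's':                    # shift state arg
            stack.append(arg); toks.pop(0)
        elif kind == 'r':                  # reduce by production arg
            head, n = prods[arg]
            del stack[len(stack) - n:]     # pop |beta| states
            stack.append(GOTO[(stack[-1], head)])
            output.append(arg)
        else:                              # accept
            return output

print(lr_parse('((x))'))   # [2, 1, 1]
print(lr_parse('(x'))      # None
```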
LR-parsing algorithm
 Example: Let G1 be:

 Legend: Si means shift to state i; Rj means reduce by production j


LR-parsing algorithm
 The following grammar can be parsed with the action and goto tables shown below.

 Example: the following shows how a shift/reduce parser parses the
input string w = id * id + id using the parsing table shown above.
Constructing SLR parsing tables
 This method is the simplest of the three methods used to construct an LR
parsing table. It is called SLR (simple LR) because it is the easiest to implement.
However, it is also the weakest in terms of the number of grammars for which it
succeeds. A parsing table constructed by this method is called SLR table. A
grammar for which an SLR table can be constructed is said to be an SLR
grammar.
LR (0) item
 An LR (0) item (item for short) is a production of a grammar G with a dot at
some position of the right side.
 For example, for the production A → XYZ we have four items:
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
 For the production A → ε we only have one item:
A → .
Constructing SLR parsing tables
 An item indicates how much of a production we have seen so far and what
we hope to see next. The central idea in the SLR method is to construct, from the
grammar, a deterministic finite automaton that recognizes viable prefixes.
 A viable prefix is a prefix of a right-sentential form that can appear on the stack
of a shift/reduce parser.
o If you have a viable prefix in the stack it is possible to have inputs that will
reduce to the start symbol.
o If you don't have a viable prefix on top of the stack you can never reach
the start symbol; therefore you have to call the error recovery procedure.
The closure operation
 If I is a set of items of G, then Closure(I) is the set of items constructed by two
rules:
o Initially, every item in I is added to Closure(I).
o If A → α.Bβ is in Closure(I) and B → γ is a production, then add B → .γ to
Closure(I).
o The second rule is applied until no more new items can be added to Closure(I).
Example G1`: E` E
EE+T
E T
TT*F
TF
F (E)
F id
 I = {[E` .E]}
 Closure (I) = {[E`  .E], [E  .E + T], [E  .T], [T .T * F], [T  .F], [F  .(E)], [F
 .id]}
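The Closure operation for G1' can be sketched directly, encoding an item as a (head, rhs, dot) triple, where dot is the index of the "." within the right-hand side (our own representation):

```python
# Closure of a set of LR(0) items for the grammar G1'.

G1 = {"E'": [('E',)],
      'E':  [('E', '+', 'T'), ('T',)],
      'T':  [('T', '*', 'F'), ('F',)],
      'F':  [('(', 'E', ')'), ('id',)]}

def closure(items, grammar=G1):
    out = set(items)
    changed = True
    while changed:
        changed = False
        for head, rhs, dot in list(out):
            # If the dot stands before a non-terminal B, add B -> .gamma
            if dot < len(rhs) and rhs[dot] in grammar:
                for gamma in grammar[rhs[dot]]:
                    item = (rhs[dot], gamma, 0)
                    if item not in out:
                        out.add(item)
                        changed = True
    return out

I = closure({("E'", ('E',), 0)})
print(len(I))   # 7 items, matching the Closure(I) example above
```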
The Goto operation
 The second useful function is Goto(I, X), where I is a set of items and X is a
grammar symbol. Goto(I, X) is defined as the closure of the set of all items
[A → αX.β] such that [A → α.Xβ] is in I.
 Example: I = {[E' → E.], [E → E. + T]}. Then Goto(I, +) = {[E → E +.T], [T → .T * F],
[T → .F], [F → .(E)], [F → .id]}

The set of Items construction


 Below is an algorithm to construct C, the canonical collection of sets of
LR(0) items for an augmented grammar G'.
Procedure Items(G');
Begin
C := {Closure({[S' → .S]})}
Repeat
For each set of items I in C and each grammar symbol X such that Goto(I, X) is
not empty and not in C do
Add Goto(I, X) to C;
Until no more sets of items can be added to C
End
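The whole Items construction, with Closure and Goto together, can be sketched as follows for the augmented grammar G1' (item encoding as before; the function names are our own):

```python
# Canonical collection of sets of LR(0) items for G1'.
# An item is a (head, rhs, dot) triple.

G1 = {"E'": [('E',)], 'E': [('E', '+', 'T'), ('T',)],
      'T': [('T', '*', 'F'), ('F',)], 'F': [('(', 'E', ')'), ('id',)]}

def closure(items):
    out = set(items)
    work = list(items)
    while work:
        head, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in G1:     # dot before non-terminal B
            for gamma in G1[rhs[dot]]:
                item = (rhs[dot], gamma, 0)       # B -> .gamma
                if item not in out:
                    out.add(item)
                    work.append(item)
    return frozenset(out)

def goto(I, X):
    """Closure of all items with the dot moved past X."""
    moved = {(h, r, d + 1) for h, r, d in I if d < len(r) and r[d] == X}
    return closure(moved) if moved else None

def items():
    symbols = {s for rhss in G1.values() for rhs in rhss for s in rhs}
    C = [closure({("E'", ('E',), 0)})]            # I0
    work = list(C)
    while work:
        I = work.pop()
        for X in symbols:
            J = goto(I, X)
            if J is not None and J not in C:
                C.append(J)
                work.append(J)
    return C

C = items()
print(len(C))   # 12 item sets I0..I11 for this grammar
```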
The set of Items construction

 Example: construction of the set of items for the augmented grammar G1'
above.

LR (0) automation
SLR table construction algorithm
 1. Construct C = {I0, I1, …, In}, the collection of the sets of LR(0) items for G'.
 2. State i is constructed from Ii, and:
a) If [A → α.aβ] is in Ii and Goto(Ii, a) = Ij (a is a terminal), then
action[i, a] = shift j
b) If [A → α.] is in Ii, then action[i, a] = reduce A → α for all a in Follow(A),
for A ≠ S'
c) If [S' → S.] is in Ii, then action[i, $] = accept
o If no conflicting action is created by these rules, the grammar is SLR(1);
otherwise it is not.
 3. For all non-terminals A, if Goto(Ii, A) = Ij then goto[i, A] = j
 4. All entries of the parsing table not defined by rules 2 and 3 are made error
 5. The initial state is the one constructed from the set of items containing
[S' → .S]
SLR table construction algorithm
 Example: Construct the SLR parsing table for the grammar G1':
0 E' → E
1 E → E + T
2 E → T
3 T → T * F
4 T → F
5 F → (E)
6 F → id
Follow(E) = {+, ), $}
Follow(T) = {+, ), $, *}
Follow(F) = {+, ), $, *}
SLR table construction algorithm
 By following the method, we obtain the parsing table used earlier.

 Legend: Si means shift to state i; Rj means reduce by production j


The Parser Generator: Yacc
 Yacc stands for "yet another compiler-compiler". Yacc: a tool for automatically
generating a parser given a grammar written in a yacc specification (.y file). Yacc
parser
– calls lexical analyzer to collect tokens from input stream. Tokens are organized
using grammar rules. When a rule is recognized, its action is executed
Note: lex tokenizes the input and yacc parses the tokens, taking the right actions,
in context.
Scanner, Parser, Lex and Yacc
The Parser Generator: Yacc
The Parser Generator: Yacc
 There are four steps involved in creating a compiler in Yacc:
1) Specify the grammar:
o Write the grammar in a .y file (also specify the actions here that are to be
taken in C).
o Write a lexical analyzer to process input and pass tokens to the parser. This
can be done using Lex.
o Write a function that starts parsing by calling yyparse().
o Write error handling routines (like yyerror()).
2) Generate a parser by running Yacc over the grammar file.
3) Compile the code produced by Yacc as well as any other relevant source files.
4) Link the object files to the appropriate libraries for the executable parser.
