Syntax Analyzer

The syntax analyzer builds a hierarchical structure of the program known as an abstract syntax tree. It relies heavily on context-free grammars (CFGs) and context-free languages (CFLs), which are processed by pushdown automata.
Every programming language has rules that describe the syntactic structure of its programs. These rules are specified by a context-free grammar (CFG), commonly written in Backus-Naur Form (BNF). The CFG is used to check that the stream of tokens returned by the lexical analyzer forms a syntactically correct program.
A grammar is defined by a 4-tuple <V, T, S, P>, where
• V: variables or non-terminals.
• T: terminals, the set of symbols from which strings are constructed; terminals correspond to the tokens of the language.
• S: the start variable, with S ∈ V.
• P: production rules, which combine terminals and non-terminals.
Thus, a grammar
• imparts a structure to a programming language which is useful in translating a program to its object code;
• can be extended with new rules to make the language more expressive.
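The following is a minimal Python sketch of ours (the class name CFG and its fields are illustrative, not from any standard library) of a grammar as the 4-tuple <V, T, S, P>, instantiated with the palindrome grammar of Example (1) further below.

EPSILON = ""  # the empty string stands for an epsilon production

class CFG:
    """A context-free grammar as the 4-tuple <V, T, S, P>."""
    def __init__(self, variables, terminals, start, productions):
        self.V = set(variables)        # non-terminals
        self.T = set(terminals)        # terminals (mapped to tokens)
        self.S = start                 # start variable, must be in V
        self.P = productions           # dict: variable -> list of right-hand sides
        assert start in self.V

# Palindrome grammar of Example (1): S -> epsilon | 0 | 1 | 0S0 | 1S1
palindromes = CFG(
    variables={"S"},
    terminals={"0", "1"},
    start="S",
    productions={"S": [EPSILON, "0", "1", "0S0", "1S1"]},
)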
In the last class we discussed the syntax analyzer in detail. In the following we briefly review CFGs.
Context-free Grammars and Languages
One method of describing formal languages is by regular languages (RLs), represented by regular expressions (REs), generated by regular (type-3) grammars and processed by finite automata (FA). RLs are effective in describing certain simple patterns, and lexical analyzers are built on them. We have also shown that some languages cannot be described by them, for example L = {0^n 1^n | n ≥ 0}, the balancing of parentheses, or the matching of begin-end pairs in Pascal programs. Such languages are handled by a larger class of languages called the context-free languages (CFLs), generated by context-free grammars (CFGs). CFLs are recognized by pushdown automata, which extend FA. Every regular grammar is context-free, and so the family of regular languages is a proper subset of the family of context-free languages. CFLs are used in a variety of applications, such as in the study of human languages and in the specification and compilation of formal languages.
Finding the grammatical derivation of a sentence is called parsing. Accordingly, the component of a compiler called the parser recovers the structure of a program prior to generating the compiled code or performing the interpreted execution. CFGs are also used to describe document formats, via the DTD (document type definition) of XML (extensible markup language), for information exchange on the Web.
Example (1). Consider the set of palindromes over {0,1}. A palindrome is a string that reads the same forward and backward; that is, if w is a palindrome then w = w^R. Using the pumping lemma for RLs, it can be shown that this language is not regular. However, it can be generated by the CFG G = ({S}, {0, 1}, S, P) with production rules S → ε | 0 | 1 | 0S0 | 1S1, where S is the start symbol.
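For illustration, membership in this palindrome language can be tested by following the productions directly. The recursive check below is a minimal sketch of ours (not part of the lecture), written in Python.

def is_palindrome(w: str) -> bool:
    # Mirrors the productions S -> epsilon | 0 | 1 | 0S0 | 1S1.
    if len(w) <= 1:                 # S -> epsilon, S -> 0, S -> 1
        return all(c in "01" for c in w)
    # S -> 0S0 or S -> 1S1: first and last symbols must match
    return w[0] == w[-1] and w[0] in "01" and is_palindrome(w[1:-1])

assert is_palindrome("0110")
assert not is_palindrome("011")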
Example (2). Consider the representation of simple arithmetic expressions involving the operators + and * and identifiers (I) formed from the letters a, b and the digits 0, 1, starting with either a or b. The CFG G for this is given by
G = ({E, I}, {+, *, (, ), a, b, 0, 1}, E, P), where
P: E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
The inference that a*(a+b00) is in the language of the variable E is reflected in a derivation of that string starting from E. One such derivation is
E ⇒ E*E ⇒ I*E ⇒ a*E ⇒ a*(E) ⇒ a*(E+E) ⇒ a*(I+E) ⇒ a*(a+E) ⇒ a*(a+I) ⇒ a*(a+I0) ⇒ a*(a+I00) ⇒ a*(a+b00)
If the leftmost variable is replaced at each step, the derivation is called a leftmost derivation; likewise, it is called a rightmost derivation if the rightmost variable is replaced at each step.
Example (3). Construct the string 0100110 from the following grammar
S → 0S |1AA
A → 0 |1A|0B
B → 1|0BB
by using leftmost and rightmost derivations.
Answer.
i) Leftmost Derivation.
S ⇒ 0S ⇒ 01AA ⇒ 010BA ⇒ 0100BBA ⇒ 01001BA ⇒ 010011A ⇒ 0100110
(ii) Rightmost Derivation.
S ⇒ 0S ⇒ 01AA ⇒ 01A0 ⇒ 010B0 ⇒ 0100BB0 ⇒ 0100B10 ⇒ 0100110

Parse trees
A parse tree of a CFG, G = (VN, VT, S, P), is a tree with the following conditions.
(1) Each interior node is labeled by a variable in VN.
(2) Each leaf is labeled by a variable, a terminal symbol, or ε. However, if a leaf is labeled with ε then it must be the only child of its parent.
(3) If an interior node is labeled A and its children are labeled X1, X2, ..., Xk respectively from the left, then A → X1X2...Xk is a production in P. Note that the only time one of the Xi's can be ε is if it is the label of the only child and A → ε is a production of G.
Example (4). Consider the CFG, G = ({S}, {(, )}, S, P), where P: S → ε | SS | (S), generating the CFL L(G). A string w ∈ L(G) may have many derivations in G. For example, since G generates the set of balanced parentheses, the string ()() can be derived from S by at least two distinct derivations, given by
S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()()
and
S ⇒ SS ⇒ S(S) ⇒ (S)(S) ⇒ (S)() ⇒ ()()
Clearly, both of them use the same rules at the same places; the only difference is the order in which the rules are applied. The parse tree for both derivations can be given by

Likewise, a parse tree for the derivation of E + I from E is


A parse tree for the derivation of a string P * 0110 is

If we look at the leaves of any parse tree and concatenate them from the left, we get a string called the yield of the tree, which is always a string that is derived from the root variable. Formally, for a CFG, G = (VN, VT, S, P):

1. For each a ∈ VT, a single node labeled a is a parse tree. This single node is both root and leaf, and its yield is a.

2. If A → ε is a rule in P, then the tree whose root is a single node labeled A with one child labeled ε is a parse tree. Its only leaf is the node labeled ε, and its yield is ε.

3. If T1, T2, ..., Tn are parse trees, where n ≥ 1, with roots labeled A1, A2, ..., An respectively and with yields y1, y2, ..., yn, and A → A1A2...An is a rule in P, then the tree whose root is a new node labeled A and whose subtrees are T1, T2, ..., Tn is a parse tree. Its leaves are the leaves of its constituent parse trees, and its yield is y1y2...yn.
4. Nothing else is a parse tree.
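The inductive definition above translates directly into a small data structure. The Python sketch below is our own illustration (class and function names are not from the lecture): it builds a parse tree and computes its yield.

class Node:
    def __init__(self, label, children=None):
        self.label = label                 # a variable, a terminal, or "" for epsilon
        self.children = children or []     # empty list for leaves

def tree_yield(node):
    # Concatenate the leaf labels from left to right (the yield of the tree).
    if not node.children:
        return node.label
    return "".join(tree_yield(c) for c in node.children)

# Parse tree for ()() under S -> epsilon | SS | (S), as in Example (4):
tree = Node("S", [
    Node("S", [Node("("), Node("S", [Node("")]), Node(")")]),
    Node("S", [Node("("), Node("S", [Node("")]), Node(")")]),
])
assert tree_yield(tree) == "()()"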
Example (5). Construct a parse tree for the string aabbaa from the following
grammar.
S → a|aAS
A → SS|SbA|ba
Answer. For generating the string from the given grammar, the derivation will be
S ⇒ aAS ⇒ aSbAS ⇒ aabAS ⇒ aabbaS ⇒ aabbaa
The parse tree is
Example (6). Find the parse tree for generating the string 11001010 from the
given grammar.
S → 1B|0A
A → 1|1S|0AA
B → 0|0S|1BB
Answer. For generating the string 11001010 from the given CFG the leftmost
derivation will be
S ⇒ 1B ⇒ 11BB ⇒ 110SB ⇒ 1100AB ⇒ 11001B ⇒ 110010S ⇒ 1100101B ⇒ 11001010
The corresponding parse tree is
The rightmost derivation is
S ⇒ 1B ⇒ 11BB ⇒ 11B0S ⇒ 11B01B ⇒ 11B010S ⇒ 11B0101B ⇒ 11B01010 ⇒ 11001010
The corresponding parse tree is
Parser
• What is parsing?
• Discovering the derivation of a string (if one exists).
• Harder than generating strings.
• Two major approaches
• Top-down parsing
• Bottom-up parsing
Top Down Parser
• A parser is top-down if it discovers a parse tree from the top (root) down to the leaves.
• Grammars that can be handled by predictive top-down parsers are called LL(1) grammars.
• The first L in LL(1) refers to the fact that the input is processed from left to right.
• The second L refers to the fact that LL(1) parsing always determines a leftmost derivation for the input string. The 1 in parentheses means that LL(1) parsing uses only one symbol of input to predict the next grammar rule that should be used. Top-down parsing uses a preorder traversal of the parse tree: we start with the root of the parse tree (the node labeled with the start symbol).
• There are two types of top-down parsers:
• Recursive Descent Parser (may involve backtracking).
• Predictive Parser (predicts which production of the grammar to use, without backtracking).
Recursive Descent Parser.
A top-down parse corresponds to a preorder expansion of the parse tree.
The leftmost non-terminal is expanded first: a leftmost derivation is applied at each derivation step.
Start at the root of the parse tree and grow toward the leaves.
Pick a production and try to match the input.
A bad "pick" may lead to backtracking.
Algorithm.
Declare a pointer which represents the current position of the parser in the input string.
Repeat
    scan the parse tree in preorder, from left to right, matching its symbols character by character against the input string
until
    the yield of the parse tree matches the input string.
Recursive descent parsing may involve backtracking. It is called predictive parsing if no backtracking is needed.

• If the scanned symbol is:
• a terminal: advance the input pointer by one.
• a non-terminal: choose a production for it and add a child node for each symbol of the chosen production.
• If a terminal symbol is added that doesn't match the input, backtrack.
• Find the next node to be expanded (a non-terminal).
• Repeat the process.
• Terminate when
• the leaves of the parse tree match the input string (success), or
• all productions have been exhausted in backtracking (failure).
In the last class we worked through a top-down parse of the input string x-2*y. A minimal recursive-descent sketch for a comparable expression grammar is given below.
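The sketch is our own illustration (not the lecture's code): a recursive-descent parser in Python for the assumed grammar E → T - E | T, T → F * T | F, F → id | num, which is enough to parse x-2*y without backtracking.

import re

def tokenize(s):
    # identifiers, numbers, and the operators - and *
    return re.findall(r"[A-Za-z_]\w*|\d+|[-*]", s)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0                      # current position of the parser in the input

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def parse_E(self):                    # E -> T ('-' E)?
        left = self.parse_T()
        if self.peek() == "-":
            self.eat("-")
            return ("-", left, self.parse_E())
        return left

    def parse_T(self):                    # T -> F ('*' T)?
        left = self.parse_F()
        if self.peek() == "*":
            self.eat("*")
            return ("*", left, self.parse_T())
        return left

    def parse_F(self):                    # F -> id | num
        tok = self.peek()
        if tok is None or tok in ("-", "*"):
            raise SyntaxError(f"unexpected token {tok!r}")
        self.pos += 1
        return tok

print(Parser(tokenize("x-2*y")).parse_E())   # ('-', 'x', ('*', '2', 'y'))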
Bottom-Up Parsing
Here, we construct a parse tree for an input string beginning at the leaves (the bottom) and
working up towards the root (the top). We can think of this process as one of reducing a string w
to the start symbol of the grammar. It is also known as shift-reduce parsing because its two main
actions are shift and reduce.

At each shift action, the current input symbol is pushed onto a stack. At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a production) are replaced by the non-terminal on the left side of that production; in this way a string ω is step by step reduced to the start symbol S. If the substrings are chosen correctly, the rightmost derivation of the string is produced in reverse order:

Rightmost derivation:        S ⇒rm ... ⇒rm ω
Shift-reduce parser finds:   ω ⇐ ... ⇐ S
Example.
Consider the grammar
S → aABe
A → Abc | b
B → d
and the input string abbcde. The parser performs the reductions
abbcde ⇒ aAbcde ⇒ aAde ⇒ aABe ⇒ S
We scan abbcde looking for a substring that matches the right side of some production. The substrings b and d qualify. Let us choose the leftmost b and replace it by A, the left side of the production A → b; we thus obtain the string aAbcde. Now the substrings Abc, b and d match the right side of some production. Although b is the leftmost substring that matches the right side of some production, we choose to replace the substring Abc by A, the left side of the production A → Abc, and obtain aAde. Then we replace d by B, and finally the entire string aABe by S. Thus, by a sequence of four reductions we are able to reduce abbcde to S.
These reductions in fact trace out the following rightmost derivation in reverse:

S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

The intermediate strings of such a derivation are called right-sentential forms. How do we know which substring to replace at each reduction step?
Handle
Informally, a "handle" of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.
But not every substring that matches the right side of a production rule is a handle.
Formally, a "handle" of a right-sentential form γ (= αβω) is a production rule A → β together with a position in γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is, if

S ⇒*rm αAω ⇒rm αβω

then A → β in the position following α is a handle of αβω.
The string ω to the right of the handle contains only terminal symbols.
Example

Consider the example discussed above: abbcde is a right-sentential form whose handle is A → b at position 2. Likewise, aAbcde is a right-sentential form whose handle is A → Abc at position 2.

Sometimes we say "the substring β is a handle of αβω" if the position of β and the production A → β we have in mind are clear.

Handle Pruning
A rightmost derivation in reverse can be obtained by "handle pruning". That is, we start with a string of terminals ω that we wish to parse. If ω is a sentence of the grammar at hand, then ω = γn, where γn is the n-th right-sentential form of some as yet unknown rightmost derivation

S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω

To reconstruct this derivation in reverse, start from γn, find a handle An → βn in γn, and replace βn by An to obtain γn-1. Then find a handle An-1 → βn-1 in γn-1 and replace βn-1 by An-1 to obtain γn-2. Repeat this until we reach S.
A Shift-Reduce Parser
Consider the grammar
E → E+T | T
T → T*F | F
F → (E) | id
The rightmost derivation of id+id*id is
E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id
A shift-reduce parser reduces id+id*id back to E as follows.

Right-sentential form    Handle    Reducing production
id+id*id                 id        F → id
F+id*id                  F         T → F
T+id*id                  T         E → T
E+id*id                  id        F → id
E+F*id                   F         T → F
E+T*id                   id        F → id
E+T*F                    T*F       T → T*F
E+T                      E+T       E → E+T
E                        (start symbol reached)

A Stack Implementation of a Shift-Reduce Parser

There are four possible actions of a shift-reduce parser:

1. Shift: the next input symbol is shifted onto the top of the stack.
2. Reduce: the handle on the top of the stack is replaced by the non-terminal on the left side of the corresponding production.
3. Accept: successful completion of parsing.
4. Error: the parser discovers a syntax error and calls an error-recovery routine.

Initially the stack contains only the end-marker $; the end of the input string is also marked by $.
Stack       Input         Action
$           id+id*id$     shift
$id         +id*id$       reduce by F → id
$F          +id*id$       reduce by T → F
$T          +id*id$       reduce by E → T
$E          +id*id$       shift
$E+         id*id$        shift
$E+id       *id$          reduce by F → id
$E+F        *id$          reduce by T → F
$E+T        *id$          shift
$E+T*       id$           shift
$E+T*id     $             reduce by F → id
$E+T*F      $             reduce by T → T*F
$E+T        $             reduce by E → E+T
$E          $             accept

The parse tree for id+id*id is built bottom-up as these reductions are performed.
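To make the trace concrete, the Python sketch below is our own illustration (not the lecture's code): it reproduces the shift-reduce actions for the grammar above, using hand-written lookahead rules instead of an LR parsing table to decide when to reduce.

# Grammar: E -> E+T | T,  T -> T*F | F,  F -> (E) | id
GRAMMAR = [("E", ["E", "+", "T"]), ("E", ["T"]),
           ("T", ["T", "*", "F"]), ("T", ["F"]),
           ("F", ["(", "E", ")"]), ("F", ["id"])]

def choose_reduction(stack, lookahead):
    # Return a production whose right side lies on top of the stack,
    # or None if the parser should shift instead.
    for head, body in GRAMMAR:
        if stack[-len(body):] != body:
            continue
        # hand-written rule: postpone reductions to E while a
        # higher-precedence '*' is still coming
        if head == "E" and lookahead == "*":
            continue
        return head, body
    return None

def parse(tokens):
    stack, i = ["$"], 0
    tokens = tokens + ["$"]
    while True:
        red = choose_reduction(stack, tokens[i])
        if red:
            head, body = red
            print(f"{' '.join(stack):12} {' '.join(tokens[i:]):12} reduce by {head} -> {''.join(body)}")
            stack = stack[:-len(body)] + [head]
        elif tokens[i] == "$" and stack == ["$", "E"]:
            print(f"{' '.join(stack):12} {' '.join(tokens[i:]):12} accept")
            return
        elif tokens[i] == "$":
            raise SyntaxError("cannot accept, shift or reduce")
        else:
            print(f"{' '.join(stack):12} {' '.join(tokens[i:]):12} shift")
            stack.append(tokens[i])
            i += 1

parse(["id", "+", "id", "*", "id"])   # prints a trace matching the table above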

OPERATOR PRECEDENCE PARSING

• OPERATOR GRAMMAR
• No ε-productions.
• No two adjacent non-terminals on the right side of any production.
E.g.,
E → E op E | id
op → + | *
The above grammar is not an operator grammar (E and op appear as adjacent non-terminals), but the following is:
E → E+E | E*E | id
A mechanical check of the two conditions is sketched below.
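The check below is a minimal sketch of ours (illustrative, not from the slides); it tests the two operator-grammar conditions on a grammar given as a dictionary of productions.

def is_operator_grammar(productions, nonterminals):
    # Conditions: no epsilon productions, and no two adjacent non-terminals.
    for head, bodies in productions.items():
        for body in bodies:                       # body is a list of symbols
            if not body:
                return False                      # epsilon production
            for x, y in zip(body, body[1:]):
                if x in nonterminals and y in nonterminals:
                    return False                  # adjacent non-terminals
    return True

# E -> E op E | id ; op -> + | *  : not an operator grammar
g1 = {"E": [["E", "op", "E"], ["id"]], "op": [["+"], ["*"]]}
# E -> E+E | E*E | id             : an operator grammar
g2 = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}
print(is_operator_grammar(g1, {"E", "op"}))   # False
print(is_operator_grammar(g2, {"E"}))         # True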

OPERATOR PRECEDENCE
• If a has higher precedence than b: a ·> b
• If a has lower precedence than b: a <· b
• If a and b have equal precedence: a =· b
Note:
• id has higher precedence than any other symbol.
• $ has the lowest precedence.
• If two operators have equal precedence, then we check the associativity of that particular operator.

PRECEDENCE TABLE

        id     +      *      $
id             ·>     ·>     ·>
+       <·     ·>     <·     ·>
*       <·     ·>     ·>     ·>
$       <·     <·     <·     ·>

Example: w = $id + id * id$

$ <· id ·> + <· id ·> * <· id ·> $

BASIC PRINCIPLE
• Scan the input string from left to right, try to detect ·> and put a pointer on its location.
• Now scan backwards until reaching <·.
• The string between <· and ·> is our handle.
• Replace the handle by the head (left side) of the respective production.
• REPEAT until reaching the start symbol.

ALGORITHM

w ← input string, terminated by $
repeat
{
    a ← current input symbol
    b ← topmost terminal on the stack
    if (a is $ and b is $)
        accept and return
    else if (b <· a or b =· a)        // shift
        push a onto the stack
        advance the input pointer
    else if (b ·> a)                  // a handle ends at the stack top: reduce
        repeat
            c ← pop the stack
        until (top of stack <· c)
    else
        error()
}
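The following is a minimal Python sketch of this algorithm for the precedence table above (our own illustration; it only checks the input and prints the actions, without building a parse tree).

PREC = {  # PREC[stack_top][input] is '<' or '>' (from the precedence table)
    "$":  {"id": "<", "+": "<", "*": "<"},
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = ["$"]                       # holds terminals only
    i = 0
    while True:
        a, b = tokens[i], stack[-1]     # a: input symbol, b: stack top
        if a == "$" and b == "$":
            print("accept")
            return True
        rel = PREC[b].get(a)
        if rel == "<":                  # shift
            print(f"{b} <· {a} : shift")
            stack.append(a)
            i += 1
        elif rel == ">":                # reduce: pop back to the matching <·
            print(f"{b} ·> {a} : reduce")
            popped = stack.pop()
            while PREC[stack[-1]].get(popped) != "<":
                popped = stack.pop()
        else:
            raise SyntaxError(f"no precedence relation between {b!r} and {a!r}")

parse(["id", "+", "id", "*", "id"])   # prints the remarks of the trace below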

EXAMPLE

Stack      Input              Action/Remark
$          id + id * id $     $ <· id
$ id       + id * id $        id ·> +
$          + id * id $        $ <· +
$ +        id * id $          + <· id
$ + id     * id $             id ·> *
$ +        * id $             + <· *
$ + *      id $               * <· id
$ + * id   $                  id ·> $
$ + *      $                  * ·> $
$ +        $                  + ·> $
$          $                  accept

PRECEDENCE FUNCTIONS
• Operator precedence parsers often use precedence functions f and g that map terminal symbols to integers, chosen so that f(a) < g(b) whenever a <· b and f(a) > g(b) whenever a ·> b.

Algorithm for Constructing Precedence Functions

1. Create symbols fa and ga for each grammar terminal a and for the end-of-string symbol.
2. Partition the symbols into groups so that fa and gb are in the same group if a =· b (there can be symbols in the same group even if they are not directly connected by this relation).
3. Create a directed graph whose nodes are the groups. For each pair of terminals a and b: if a <· b, place an edge from the group of gb to the group of fa; otherwise, if a ·> b, place an edge from the group of fa to that of gb.
4. If the constructed graph has a cycle, then no precedence functions exist. When there are no cycles, let f(a) and g(b) be the lengths of the longest paths starting from the groups of fa and gb respectively.
• Consider the precedence table given above. The resulting graph (figure) has the nodes fid, gid, f+, g+, f*, g*, f$, g$, with edges determined by the <· and ·> relations.

• From the previous graph we extract the following precedence functions:

        id    +     *     $
f       4     2     4     0
g       5     1     3     0
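These values can also be computed mechanically. The Python sketch below is our own illustration: it builds the graph of step 3 from the precedence table and takes longest path lengths as in step 4 (the $ ·> $ entry is omitted, since $ against $ means accept and is not used for the functions).

from functools import lru_cache

# rel[(a, b)] is '<' or '>' meaning a <· b or a ·> b (from the precedence table)
rel = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
}

terminals = ["id", "+", "*", "$"]
nodes = [("f", a) for a in terminals] + [("g", a) for a in terminals]
edges = {n: [] for n in nodes}
for (a, b), r in rel.items():
    if r == "<":                        # a <· b : edge from g_b to f_a
        edges[("g", b)].append(("f", a))
    else:                               # a ·> b : edge from f_a to g_b
        edges[("f", a)].append(("g", b))

@lru_cache(maxsize=None)
def longest_path(node):
    # length of the longest path starting at node (the graph must be acyclic)
    return max((1 + longest_path(m) for m in edges[node]), default=0)

f = {a: longest_path(("f", a)) for a in terminals}
g = {a: longest_path(("g", a)) for a in terminals}
print("f:", f)   # f(id)=4, f(+)=2, f(*)=4, f($)=0, as in the table above
print("g:", g)   # g(id)=5, g(+)=1, g(*)=3, g($)=0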
Predictive Parser