Lec02-Syntax Analysis and LL

The document discusses syntax analysis, which is the second phase of compiler construction. It involves checking if a program satisfies the rules of a context-free grammar through syntax analysis. A syntax analyzer or parser creates a parse tree that represents the syntactic structure of the program. Parsers can be top-down or bottom-up. A grammar defines the rules of a language using terminals, non-terminals, a start symbol, and production rules. Derivations show how strings can be generated from a grammar. Ambiguous grammars allow more than one parse tree for a given string.

Uploaded by

hewokes870


SYNTAX ANALYSIS

2ND PHASE OF COMPILER CONSTRUCTION

SECTION 2.1: CONTEXT-FREE GRAMMAR

SYNTAX ANALYZER

• The syntax analyzer (parser) checks whether a given source program satisfies the rules implied by a context-free grammar.
  • If it does, the parser creates the parse tree of that program.
  • Otherwise, the parser reports error messages.
• It creates the syntactic structure of the given source program.
  • This syntactic structure is usually a parse tree.
• The syntax analyzer is also known as the parser.
• The syntax of a programming language is described by a context-free grammar (CFG).
• A context-free grammar
  • gives a precise syntactic specification of a programming language;
  • is designed in an early phase of the design of a compiler;
  • can be converted directly into a parser by some tools.
PARSER
• The parser works on a stream of tokens.
• The smallest item is a token.

[Figure: source program → Lexical Analyzer → (token) → Parser → parse tree. The parser requests each token from the lexical analyzer with "get next token".]
PARSERS (CONT.)

• We categorize parsers into two groups:

1. Top-Down Parser
  • The parse tree is built from the root to the leaves (top to bottom).
  • The input is scanned from left to right, one symbol at a time.
2. Bottom-Up Parser
  • The parse tree is built from the leaves up to the root.
  • The input is scanned from left to right, one symbol at a time.

• Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars:
  • LL for top-down parsing
  • LR for bottom-up parsing

WHY DO WE NEED A GRAMMAR?

A grammar defines a language.

A language is expressed or defined by a set of rules that must be followed.
These rules are laid down in the form of production rules (P).

A context-free grammar (CFG) generates a language called a context-free language (L).

CONTEXT-FREE GRAMMARS (CFG)

A CFG G consists of four components (T, V, S, P):

• T: a finite set of terminals
• V: a finite set of non-terminals (also denoted by N)
• S: a start symbol (the non-terminal with which the grammar starts)
• P: a finite set of production rules

CONTEXT-FREE GRAMMARS (CFG)

Consider the grammar:

S → aAa | b
A → a

G = (T, V, S, P), where
T = {a, b}
V = {S, A}
S is the start symbol
P = { S → aAa, S → b, A → a }
TERMINAL SYMBOLS

Terminals include:
• Lower-case letters early in the alphabet
• Operator symbols such as +, %
• Punctuation symbols such as ( ) , ;
• Digits 0, 1, 2, …
• Boldface strings such as id or if

Consider the grammar:

S → aAa
S → b;c
A → aA | ε

Here the terminal symbols are {a, b, c, ;}. The symbol ε denotes the empty string; it is not itself a terminal.
NON-TERMINAL SYMBOLS

Non-terminals include:

• Upper-case letters early in the alphabet
• The letter S, the start symbol
• Lower-case italic names such as expr or stmt

Consider the grammar:

S → aAa
S → bB
A → aA | ε
B → b

Here the non-terminal symbols are {A, B, S}.
PRODUCTION RULES

Production rules are the set of rules that define the grammar G.

Consider the grammar:

S → aAa
A → aA | a

Here we have three production rules:
i. S → aAa
ii. A → aA
iii. A → a

DERIVATION OF A STRING

A string w of terminals is generated by the grammar if, starting with the start symbol, one can apply productions and end up with w.

A derivation of w is the sequence of replacements of non-terminal symbols, or equivalently the sequence of strings so obtained.

Consider the grammar:
S → aAa
A → aA | a

We can derive the sentence 'aaa' from this grammar:
S ⇒ aAa   (S → aAa)
  ⇒ aaa   (A → a)

DERIVATION OF A STRING

In general, a derivation step is:

αAβ ⇒ αγβ   if there is a production rule A → γ in the grammar,

where α and β are arbitrary strings of terminal and non-terminal symbols.

α1 ⇒ α2 ⇒ ... ⇒ αn   (αn derives from α1, or α1 derives αn)

⇒  : derives in one step
⇒* : derives in zero or more steps
⇒+ : derives in one or more steps
DERIVATION OF A STRING

Consider the grammar:

S → aSa | b | aA
A → a

Derived in one step:
S ⇒ b

Derived in two steps:
S ⇒ aSa ⇒ aba

Derived in multiple steps:
S ⇒ aSa ⇒ aaSaa ⇒ aaaSaaa ⇒ aaabaaa

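The derivation steps above can be replayed mechanically. The following is a minimal sketch, assuming single-character symbols where upper-case letters are non-terminals; the names `GRAMMAR`, `derive_step`, and `derive` are illustrative and not part of the lecture.

```python
# Grammar from the slide: S -> aSa | b | aA, A -> a.
# Assumption: every symbol is one character; keys of GRAMMAR are the non-terminals.
GRAMMAR = {"S": ["aSa", "b", "aA"], "A": ["a"]}

def derive_step(form, rhs):
    """Replace the leftmost non-terminal in `form` by `rhs` (one derivation step)."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if rhs not in GRAMMAR[sym]:
                raise ValueError(f"{sym} -> {rhs} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to replace")

def derive(start, choices):
    """Apply the chosen right-hand sides in order; return every sentential form."""
    forms = [start]
    for rhs in choices:
        forms.append(derive_step(forms[-1], rhs))
    return forms

steps = derive("S", ["aSa", "aSa", "aSa", "b"])
print(" => ".join(steps))  # S => aSa => aaSaa => aaaSaaa => aaabaaa
```

Each call to `derive_step` corresponds to one ⇒ in the slide; the list returned by `derive` is the whole derivation.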
SENTENCE AND SENTENTIAL FORM

A sentence of L(G) is a string of terminal symbols only.

A sentential form is a combination of terminals and non-terminals.

Say we have a derivation
S ⇒* α
If α contains non-terminals, it is called a sentential form of G.

If α does not contain non-terminals, it is called a sentence of G.

LEFT-MOST AND RIGHT-MOST DERIVATIONS

We can derive a string from the grammar in two ways:

• Left-Most Derivation
• Right-Most Derivation

In a left-most derivation, at each step we replace the leftmost non-terminal of the sentential form, until only terminals remain and the string w is obtained.

In a right-most derivation, at each step we replace the rightmost non-terminal of the sentential form, until only terminals remain and the string w is obtained.
LEFT-MOST DERIVATIONS

Consider the grammar:

E → E+E | E-E | E*E | E/E | (E) | id

Derive the string 'id+id*id'.

First left-most derivation:
E ⇒ E+E        (E → E+E)
  ⇒ id+E       (E → id)
  ⇒ id+E*E     (E → E*E)
  ⇒ id+id*E    (E → id)
  ⇒ id+id*id   (E → id)

Second left-most derivation:
E ⇒ E*E        (E → E*E)
  ⇒ E+E*E      (E → E+E)
  ⇒ id+E*E     (E → id)
  ⇒ id+id*E    (E → id)
  ⇒ id+id*id   (E → id)
PARSE TREE FOR LEFT-MOST DERIVATIONS

Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id

Derive the string 'id+id*id'.

E ⇒ E+E        (E → E+E)
  ⇒ id+E       (E → id)
  ⇒ id+E*E     (E → E*E)
  ⇒ id+id*E    (E → id)
  ⇒ id+id*id   (E → id)

Parse tree:

        E
      / | \
     E  +  E
     |   / | \
    id  E  *  E
        |     |
       id    id
RIGHT-MOST DERIVATIONS

Consider the grammar:

E → E+E | E-E | E*E | E/E | (E) | id

Derive the string 'id+id*id'.

First right-most derivation:
E ⇒ E*E        (E → E*E)
  ⇒ E*id       (E → id)
  ⇒ E+E*id     (E → E+E)
  ⇒ E+id*id    (E → id)
  ⇒ id+id*id   (E → id)

Second right-most derivation:
E ⇒ E+E        (E → E+E)
  ⇒ E+E*E      (E → E*E)
  ⇒ E+E*id     (E → id)
  ⇒ E+id*id    (E → id)
  ⇒ id+id*id   (E → id)

PARSE TREE FOR RIGHT-MOST DERIVATIONS
Consider the grammar:
E → E+E | E-E | E*E | E/E | (E) | id

Derive the string 'id+id*id'.

E ⇒ E*E        (E → E*E)
  ⇒ E*id       (E → id)
  ⇒ E+E*id     (E → E+E)
  ⇒ E+id*id    (E → id)
  ⇒ id+id*id   (E → id)

Parse tree:

            E
          / | \
         E  *  E
       / | \   |
      E  +  E  id
      |     |
     id    id
SECTION 2.2: AMBIGUOUS GRAMMAR

AMBIGUOUS GRAMMAR
A grammar is ambiguous if some sentence has more than one left-most derivation or more than one right-most derivation, i.e. the sentence can be derived in more than one way by LMD or by RMD.

Consider the grammar:

E → E+E | E-E | E*E | E/E | (E) | id

Derive the string 'id+id*id'.

First left-most derivation:
E ⇒ E+E        (E → E+E)
  ⇒ id+E       (E → id)
  ⇒ id+E*E     (E → E*E)
  ⇒ id+id*E    (E → id)
  ⇒ id+id*id   (E → id)

Second left-most derivation:
E ⇒ E*E        (E → E*E)
  ⇒ E+E*E      (E → E+E)
  ⇒ id+E*E     (E → id)
  ⇒ id+id*E    (E → id)
  ⇒ id+id*id   (E → id)

More than one left-most derivation exists, so the grammar is ambiguous.

AMBIGUOUS GRAMMAR
A grammar is ambiguous if some sentence has more than one left-most derivation or more than one right-most derivation, i.e. the sentence can be derived in more than one way by LMD or by RMD.

Consider the grammar:

E → E+E | E-E | E*E | E/E | (E) | id

Derive the string 'id+id*id'.

First right-most derivation:
E ⇒ E*E        (E → E*E)
  ⇒ E*id       (E → id)
  ⇒ E+E*id     (E → E+E)
  ⇒ E+id*id    (E → id)
  ⇒ id+id*id   (E → id)

Second right-most derivation:
E ⇒ E+E        (E → E+E)
  ⇒ E+E*E      (E → E*E)
  ⇒ E+E*id     (E → id)
  ⇒ E+id*id    (E → id)
  ⇒ id+id*id   (E → id)

More than one right-most derivation exists, so the grammar is ambiguous.

AMBIGUITY (CONT.)
stmt → if expr then stmt |
       if expr then stmt else stmt | otherstmts

The statement

if E1 then if E2 then S1 else S2

has two parse trees.

Parse tree 1 (else matches the outer if):

stmt
  if
  expr: E1
  then
  stmt
    if
    expr: E2
    then
    stmt: S1
  else
  stmt: S2

Parse tree 2 (else matches the inner if):

stmt
  if
  expr: E1
  then
  stmt
    if
    expr: E2
    then
    stmt: S1
    else
    stmt: S2
AMBIGUITY (CONT.)

• We prefer the second parse tree (else matches the closest if).
• So, we have to disambiguate our grammar to reflect this choice.

• The unambiguous grammar will be:

stmt → matchedstmt | unmatchedstmt

matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts

unmatchedstmt → if expr then stmt |
                if expr then matchedstmt else unmatchedstmt

SECTION 2.3: LEFT RECURSION AND LEFT FACTORING

LEFT RECURSION
• A grammar is left recursive if it has a non-terminal A such that there is a derivation

  A ⇒+ Aα   for some string α

• Top-down parsing techniques cannot handle left-recursive grammars.
• So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive.
• The left recursion may appear in a single step of the derivation (immediate left-recursion), or may appear in more than one step of the derivation.
IMMEDIATE LEFT-RECURSION

A → Aα | β      where β does not start with A

  ⇓ eliminate immediate left recursion

A → βA'
A' → αA' | ε    (an equivalent grammar)

In general,
A → Aα1 | ... | Aαm | β1 | ... | βn      where β1 ... βn do not start with A

  ⇓ eliminate immediate left recursion

A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε               (an equivalent grammar)
REMOVING IMMEDIATE LEFT-RECURSION (SLIDE 29)

Grammar:
E → E+T | T
T → T*F | F
F → id | (E)

Immediate left recursion in E → E+T | T      (A → Aα | β)
A is E; α is +T and β is T
Applying the rule we get
E → TE'             (A → βA')
E' → +TE' | ε       (A' → αA' | ε)

Immediate left recursion in T → T*F | F      (A → Aα | β)
A is T; α is *F and β is F
Applying the rule we get
T → FT'             (A → βA')
T' → *FT' | ε       (A' → αA' | ε)

Final output:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → id | (E)
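The rewriting rule A → Aα | β ⇒ A → βA', A' → αA' | ε can be sketched in a few lines. This is only an illustration under assumptions: alternatives are lists of symbols, the new non-terminal is named by appending an apostrophe, and the helper names (`eliminate_immediate`, `show`) are hypothetical.

```python
EPS = "ε"  # marker for the empty string

def eliminate_immediate(nt, alternatives):
    """Rewrite A -> Aα1|...|Aαm|β1|...|βn as A -> βiA', A' -> αjA' | ε."""
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [nt]]  # the α parts
    others = [alt for alt in alternatives if alt[:1] != [nt]]         # the β parts
    if not recursive:
        return {nt: alternatives}  # no immediate left recursion: unchanged
    new = nt + "'"
    return {
        nt: [[s for s in beta if s != EPS] + [new] for beta in others],
        new: [alpha + [new] for alpha in recursive] + [[EPS]],
    }

def show(grammar):
    """Render each non-terminal's alternatives as 'x | y | z'."""
    return {a: " | ".join("".join(alt) for alt in alts) for a, alts in grammar.items()}

e_rules = show(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
t_rules = show(eliminate_immediate("T", [["T", "*", "F"], ["F"]]))
for rules in (e_rules, t_rules):
    for head, body in rules.items():
        print(f"{head} -> {body}")
# E -> TE'
# E' -> +TE' | ε
# T -> FT'
# T' -> *FT' | ε
```

The printed rules match the final output of the worked example for E and T; F has no left recursion and is left untouched.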
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE

Consider the grammar:
S → Aa | b
A → Sc | d

There is no immediate left recursion in the grammar. After substitution, however, left recursion appears:

S ⇒ Aa ⇒ Sca
or
A ⇒ Sc ⇒ Aac

We need to check for and eliminate both immediate left recursion and general (indirect) left recursion.

NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE

Consider the grammar:
S → Aa | b
A → Ac | Sd | f

Order of non-terminals: S, A

for S:
- There is no immediate left recursion in S.

for A:
- Substitute A → Sd with A → Aad | bd, which gives

S → Aa | b
A → Ac | Aad | bd | f

- Now there is immediate left recursion in A:
  α1 is c; α2 is ad; β1 is bd and β2 is f

Applying the rule, we get the final output:

S → Aa | b
A → bdA' | fA'
A' → cA' | adA' | ε
NO IMMEDIATE LEFT-RECURSION BUT GRAMMAR IS LEFT RECURSIVE

Rule: A → Aα | β becomes A → βA', A' → αA' | ε

Consider the grammar:

S → Aa | b
A → Ac | Sd | f

Order of non-terminals: A, S

for A:
- Eliminate the immediate left recursion in A → Ac | Sd | f
  α is c; β1 is Sd and β2 is f
A → SdA' | fA'
A' → cA' | ε

for S:
- Replace S → Aa with S → SdA'a | fA'a

So, we will have S → SdA'a | fA'a | b

- Eliminate the immediate left recursion in S
  α is dA'a; β1 is fA'a and β2 is b
S → fA'aS' | bS'
S' → dA'aS' | ε

Exercise: remove the left recursion from the grammar given below.

A → Bxy | x
B → CD
C → A | c
D → d

ELIMINATE LEFT-RECURSION -- ALGORITHM

- Arrange non-terminals in some order: A1 ... An

- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
            Ai → Aj γ
        by
            Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk
      }
    - eliminate immediate left-recursions among Ai productions
  }
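The algorithm above can be sketched as follows. This is a minimal illustration, assuming the input grammar has no ε-productions and no cycles (the usual precondition for this algorithm); the text format accepted by `parse` and all function names are assumptions made for the example.

```python
EPS = "ε"

def parse(text):
    """Read lines like 'A -> A c | S d | f' into {head: [[symbols], ...]}."""
    grammar = {}
    for line in text.strip().splitlines():
        head, body = line.split("->")
        grammar[head.strip()] = [alt.split() for alt in body.split("|")]
    return grammar

def eliminate_immediate(nt, alts):
    rec = [a[1:] for a in alts if a[:1] == [nt]]    # α parts of A -> Aα
    rest = [a for a in alts if a[:1] != [nt]]       # β parts
    if not rec:
        return {nt: alts}
    new = nt + "'"
    return {nt: [[s for s in b if s != EPS] + [new] for b in rest],
            new: [a + [new] for a in rec] + [[EPS]]}

def eliminate_left_recursion(grammar, order):
    g = {a: [list(alt) for alt in alts] for a, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt[:1] == [aj]:                 # Ai -> Aj γ: substitute Aj's rules
                    expanded += [gamma + alt[1:] for gamma in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        g.update(eliminate_immediate(ai, g[ai]))
    return g

def show(grammar):
    return {a: " | ".join("".join(alt) for alt in alts) for a, alts in grammar.items()}

source = parse("S -> A a | b\nA -> A c | S d | f")
order_s_a = show(eliminate_left_recursion(source, ["S", "A"]))
order_a_s = show(eliminate_left_recursion(source, ["A", "S"]))
print(order_s_a)
print(order_a_s)
```

Running it with the two orderings used in the worked examples shows that the resulting grammar depends on the chosen order of non-terminals, although both results are free of left recursion.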
LEFT-FACTORING
Consider the grammars

S → Aa | Ab

stmt → if expr then stmt else stmt |
       if expr then stmt

When we see A or if, we cannot determine which production rule to choose to expand S or stmt, since both productions start with the same left-most symbol (A in the first example and if in the second example).
LEFT-FACTORING (CONT.)

If there is a grammar
A → αβ1 | αβ2
where α is non-empty and the first symbols of β1 and β2 (if they have one) are different,

re-write the grammar as follows:

A → αA'
A' → β1 | β2

Now we can immediately expand A to αA'.

This rewriting of the grammar is called LEFT FACTORING.

LEFT-FACTORING -- ALGORITHM
• For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say

A → αβ1 | ... | αβn | γ1 | ... | γm

convert it into

A → αA' | γ1 | ... | γm
A' → β1 | ... | βn

LEFT-FACTORING -- EXAMPLE 1

A → abB | aB | cdg | cdeB | cdfB

• α is a; β1 is bB; β2 is B

A → aA' | cdg | cdeB | cdfB
A' → bB | B

• α is cd; β1 is g; β2 is eB; β3 is fB

A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB

LEFT-FACTORING -- EXAMPLE 2

A → ad | a | ab | abc | b

• α is a; β1 is d; β2 is ε; β3 is b; β4 is bc

A → aA' | b
A' → d | ε | b | bc

• α is b; β1 is ε; β2 is c

A → aA' | b
A' → d | ε | bA''
A'' → ε | c

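The two left-factoring examples can be reproduced with a small routine. This is a sketch under assumptions: alternatives that share a first symbol are grouped, the longest prefix common to the whole group is factored out, and fresh non-terminals are named by adding apostrophes. The `parse`, `left_factor`, and `show` names are illustrative only.

```python
EPS = "ε"

def parse(text):
    """Read lines like 'A -> a b B | a B' into {head: [[symbols], ...]}."""
    grammar = {}
    for line in text.strip().splitlines():
        head, body = line.split("->")
        grammar[head.strip()] = [alt.split() for alt in body.split("|")]
    return grammar

def common_prefix(seqs):
    prefix = []
    for column in zip(*seqs):          # stops at the shortest alternative
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor(grammar):
    g = {a: [list(alt) for alt in alts] for a, alts in grammar.items()}
    work = list(g)
    while work:
        a = work.pop(0)
        groups = {}                    # first symbol -> alternatives starting with it
        for alt in g[a]:
            groups.setdefault(alt[0], []).append(alt)
        new_alts = []
        for key, group in groups.items():
            if len(group) < 2 or key == EPS:
                new_alts += group      # nothing to factor
                continue
            alpha = common_prefix(group)
            new = a + "'"
            while new in g:            # pick an unused name: A', A'', ...
                new += "'"
            g[new] = [alt[len(alpha):] or [EPS] for alt in group]
            work.append(new)           # the new non-terminal may need factoring too
            new_alts.append(alpha + [new])
        g[a] = new_alts
    return g

def show(grammar):
    return {a: " | ".join("".join(alt) for alt in alts) for a, alts in grammar.items()}

example1 = show(left_factor(parse("A -> a b B | a B | c d g | c d e B | c d f B")))
example2 = show(left_factor(parse("A -> a d | a | a b | a b c | b")))
print(example1)
print(example2)
```

Because newly created non-terminals are put back on the work list, the second example is factored twice, first on a and then on b, as in the slide.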
NON-CONTEXT-FREE LANGUAGE CONSTRUCTS
• Some constructs in programming languages are not context-free. This means that we cannot write a context-free grammar for these constructs.

• L1 = { ωcω | ω is in (a|b)* } is not context-free.
  • It models declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free language; we need a semantic analyzer (which is not context-free).

• L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free.
  • It models declaring two functions (one with n parameters, the other with m parameters) and then calling them with actual parameters.
SECTION 2.4: TOP-DOWN PARSING

TOP-DOWN PARSING
• Beginning with the start symbol, try to guess the productions to apply to end up at the user's program.

CHALLENGES IN TOP-DOWN PARSING

• Top-down parsing begins with virtually no information.
  • It begins with just the start symbol, which matches every program.
• How can we know which productions to apply?
• In general, we can't.
  • There are some grammars for which the best we can do is guess, and backtrack if we're wrong.
• If we have to guess, how do we do it?

TOP-DOWN PARSING
• Top-down parser
  • Recursive-Descent Parsing
    • Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
    • It is a general parsing technique, but not widely used.
    • Not efficient.
  • Predictive Parsing
    • No backtracking.
    • Efficient.
    • Needs a special form of grammars (LL(1) grammars).
    • Recursive Predictive Parsing is a special form of Recursive-Descent parsing without backtracking.
    • The Non-Recursive (Table-Driven) Predictive Parser is also known as the LL(1) parser.

RECURSIVE-DESCENT PARSING (USES BACKTRACKING)
• Backtracking is needed.
• It tries to find the left-most derivation.

S → aBc
B → bc | b

Input: abc

First attempt, using B → bc:

      S
    / | \
   a  B  c
     / \
    b   c        fails, backtrack

Second attempt, using B → b:

      S
    / | \
   a  B  c
      |
      b
RECURSIVE PREDICTIVE PARSING
• Each non-terminal corresponds to a procedure.

Ex: A → aBb (this is the only production rule for A)

proc A {
    - match the current token with a, and move to the next token;
    - call 'B';
    - match the current token with b, and move to the next token;
}

RECURSIVE PREDICTIVE PARSING (CONT.)
A → aBb | bAB

proc A {
    case of the current token
    {
        'a': - match the current token with a, and move to the next token;
             - call 'B';
             - match the current token with b, and move to the next token;
        'b': - match the current token with b, and move to the next token;
             - call 'A';
             - call 'B';
    }
}
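The procedure for A can be turned into a small runnable parser. This is only a sketch: the slides do not give the productions for B, so the rule B → c is assumed here purely for illustration, and the class and function names are hypothetical.

```python
class ParseError(Exception):
    pass

class Parser:
    """Recursive predictive parser for A -> aBb | bAB, with B -> c ASSUMED."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1                        # move to the next token

    def A(self):
        if self.current == "a":              # A -> aBb
            self.match("a")
            self.B()
            self.match("b")
        elif self.current == "b":            # A -> bAB
            self.match("b")
            self.A()
            self.B()
        else:
            raise ParseError(f"unexpected token {self.current!r} while parsing A")

    def B(self):                             # B -> c (assumption, not in the slides)
        self.match("c")

def accepts(text):
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")                    # all input must be consumed
        return True
    except ParseError:
        return False

print(accepts("acb"), accepts("bacbc"), accepts("ab"))  # True True False
```

The `case` in the pseudocode becomes the `if`/`elif` on the current token; no backtracking is ever needed because the first token decides the production.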
RECURSIVE PREDICTIVE PARSING (CONT.)
• When to apply ε-productions.

A → aA | bB | ε

• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.

• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the terminals that can follow A in the sentential forms).

TOP-DOWN, PREDICTIVE PARSING: LL(1)

• L: Left-to-right scan of the tokens
• L: Leftmost derivation
• (1): One token of lookahead

• Construct a leftmost derivation for the sequence of tokens.
• When expanding a non-terminal, we predict the production to use by looking at the next token of the input.

TOP-DOWN, PREDICTIVE PARSING: LL(1)

a grammar → eliminate left recursion → left factor → a grammar suitable for predictive parsing (an LL(1) grammar), with no 100% guarantee.

• When re-writing a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn        input: ... a .......   (a is the current token)

TOP-DOWN, PREDICTIVE PARSING: LL(1)

stmt → if ...... |
       while ...... |
       begin ...... |
       for .....

• When we are trying to rewrite the non-terminal stmt and the current token is if, we have to choose the first production rule.
• When we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token.
• Even after we eliminate the left recursion in the grammar and left factor it, the grammar may still not be suitable for predictive parsing (not an LL(1) grammar).
NON-RECURSIVE PREDICTIVE PARSING -- LL(1) PARSER
• Non-recursive predictive parsing is table-driven.
• It is a top-down parser.
• It is also known as the LL(1) parser.

[Figure: the non-recursive predictive parser reads the input buffer, maintains a stack, consults the parsing table, and produces the output.]
LL(1) PARSER
Input buffer
• Contains the string to be parsed. We will assume that its end is marked with a special symbol $.

Output
• A production rule representing a step of the derivation sequence (left-most derivation) of the string in the input buffer.

Stack
• Contains the grammar symbols.
• At the bottom of the stack, there is a special end marker symbol $.
• Initially the stack contains only the symbol $ and the start symbol S.
  $S ← initial stack
• When the stack is emptied (i.e. only $ is left in the stack), the parsing is completed.

Parsing table
• A two-dimensional array M[A,a].
• Each row is a non-terminal symbol.
• Each column is a terminal symbol or the special symbol $.
• Each entry holds a production rule.
LL(1) PARSER -- PARSER ACTIONS
• The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
• There are four possible parser actions.

1. If X and a are both $ → the parser halts (successful completion).

2. If X and a are the same terminal symbol (different from $) → the parser pops X from the stack and moves to the next symbol in the input buffer.

3. If X is a non-terminal → the parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule X → Y1Y2...Yk, it pops X from the stack and pushes Yk, Yk-1, ..., Y1 onto the stack. The parser also outputs the production rule X → Y1Y2...Yk to represent a step of the derivation.

4. None of the above → error.
  • All empty entries in the parsing table are errors.
  • If X is a terminal symbol different from a, this is also an error case.

CONSTRUCTING LL(1) PARSING TABLES
• Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.

• FIRST(α) is the set of terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols.

• FOLLOW(A) is the set of terminals which occur immediately after (follow) the non-terminal A in the strings derived from the start symbol.
  • A terminal a is in FOLLOW(A) if S ⇒* αAaβ.

COMPUTE FIRST FOR ANY STRING X

• We want to tell whether a particular non-terminal A derives a string starting with a particular terminal t.
• Intuitively, FIRST(A) is the set of terminals that can be at the start of a string produced by A.
• If we can compute FIRST sets for all non-terminals in a grammar, we can efficiently construct the LL(1) parsing table.

COMPUTE FIRST FOR ANY STRING X
• Initially, for all non-terminals A, set
  FIRST(A) = { t | A → tω for some ω }

Consider the grammar:
S → aC | bB
B → b
C → c
FIRST(S) = {a, b}; FIRST(B) = {b} and FIRST(C) = {c}

• For each non-terminal A, for each production A → Bω, set
  FIRST(A) = FIRST(A) ∪ FIRST(B)

Consider the grammar:
S → aC | bB | C
B → b
C → c
FIRST(S) = {a, b, c}; FIRST(B) = {b}; FIRST(C) = {c}

Consider the grammar:
S → Ab
A → a
FIRST(S) = FIRST(A) = {a}
FIRST COMPUTATION WITH EPSILON
• For all non-terminals (NT) A where A → ε is a production, add ε to FIRST(A).
  E.g. S → a | ε gives FIRST(S) = {a, ε}
• For each production A → α, where α is a string of NTs whose FIRST sets all contain ε, set
  FIRST(A) = FIRST(A) ∪ { ε }.
  E.g. S → AB | c; A → a | ε; B → b | ε
  FIRST(S) = {a, b, c, ε}; FIRST(A) = {a, ε}; FIRST(B) = {b, ε}
• For each production A → αtω, where α is a string of NTs whose FIRST sets all contain ε, set
  FIRST(A) = FIRST(A) ∪ { t }
  E.g. S → ABcD; A → a | ε; B → b | ε; D → d
  FIRST(S) = {a, b, c}; FIRST(A) = {a, ε}; FIRST(B) = {b, ε}; FIRST(D) = {d}
• For each production A → αBω, where α is a string of NTs whose FIRST sets all contain ε, set
  FIRST(A) = FIRST(A) ∪ (FIRST(B) - { ε }).
  E.g. S → ABDc | f; A → a | ε; B → b | ε; D → d
  FIRST(S) = {a, b, d, f}; FIRST(A) = {a, ε}; FIRST(B) = {b, ε}; FIRST(D) = {d}

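The FIRST rules above amount to a fixed-point computation: keep applying them until no set grows. A minimal sketch follows, assuming grammars are written as text lines such as `S -> A B | c` with space-separated symbols; `parse`, `first_of_string`, and `compute_first` are illustrative names, not part of the lecture.

```python
EPS = "ε"

def parse(text):
    """Read lines like 'S -> A B | c' into {head: [[symbols], ...]}."""
    grammar = {}
    for line in text.strip().splitlines():
        head, body = line.split("->")
        grammar[head.strip()] = [alt.split() for alt in body.split("|")]
    return grammar

def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first.get(sym, {sym})    # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                    # this symbol cannot vanish: stop here
    result.add(EPS)                          # every symbol can derive ε
    return result

def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:                           # repeat until nothing more can be added
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

first = compute_first(parse("S -> A B D c | f\nA -> a | ε\nB -> b | ε\nD -> d"))
print(sorted(first["S"]))  # ['a', 'b', 'd', 'f']
```

The printed set is the FIRST(S) of the last example above; `first_of_string` is also what is needed later to compute FIRST of a whole right-hand side.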
FOLLOW SET
• The FOLLOW set represents the set of terminals that might come after a given non-terminal.
• Formally:

FOLLOW(A) = { t | S ⇒* αAtω for some α, ω }

where S is the start symbol of the grammar.
• Informally, every terminal that can ever come immediately after A in a derivation.

COMPUTE FOLLOW FOR ANY NON-TERMINAL

RULE 1: If S is the start symbol → $ is in FOLLOW(S).

RULE 2: If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B).

RULE 3(i): If A → αB is a production rule, or
RULE 3(ii): A → αBβ is a production rule and ε is in FIRST(β)
→ everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any FOLLOW set.

FIRST AND FOLLOW SET EXAMPLE

Consider the grammar:

S → Aa
A → BD
B → b | ε
D → d | ε

FIRST(S) = {b, d, a}      FOLLOW(S) = {$}       (Rule 1)
FIRST(A) = {b, d, ε}      FOLLOW(A) = {a}       (Rule 2)
FIRST(B) = {b, ε}         FOLLOW(B) = {d, a}    (Rule 2; Rule 3(ii))
FIRST(D) = {d, ε}         FOLLOW(D) = {a}       (Rule 3)

FIRST(C) = {public, final, class}   FOLLOW(C) = {$}               (Rule 1)
FIRST(P) = {public, ε}              FOLLOW(P) = {final, class}    (Rule 2; Rule 3(ii))
FIRST(F) = {final, ε}               FOLLOW(F) = {class}           (Rule 2)
FIRST(X) = {extends, ε}             FOLLOW(X) = {implements, $}   (Rule 2; Rule 3(ii))
FIRST(Y) = {implements, ε}          FOLLOW(Y) = {$}               (Rule 3(i))
FIRST(I) = {id}                     FOLLOW(I) = {$}               (Rule 3(i))
FIRST(J) = {',', ε}                 FOLLOW(J) = {$}               (Rule 3(i))
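The three FOLLOW rules can be applied mechanically until no set changes. The sketch below takes the FIRST sets of the first grammar in this example (S → Aa, A → BD, B → b | ε, D → d | ε) as given input rather than recomputing them; the data layout and function names are assumptions made for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "S": [["A", "a"]],
    "A": [["B", "D"]],
    "B": [["b"], [EPS]],
    "D": [["d"], [EPS]],
}
# FIRST sets copied from the example above.
FIRST = {"S": {"a", "b", "d"}, "A": {"b", "d", EPS}, "B": {"b", EPS}, "D": {"d", EPS}}

def first_of_string(symbols, first):
    """FIRST of a sequence of symbols; returns {ε} for the empty sequence."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first.get(sym, {sym})    # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)                               # Rule 1
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in grammar:                 # only non-terminals have FOLLOW
                        continue
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}             # Rule 2
                    if EPS in beta_first:                # Rule 3(i) and 3(ii)
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

follow = compute_follow(GRAMMAR, FIRST, "S")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) = {sorted(follow[nt])}")
```

Rule 3(i) needs no special case here because FIRST of the empty string β is {ε}, which triggers the same branch as Rule 3(ii).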
LL(1) PARSING

Consider the grammar:

E → E+T | T
T → T*F | F
F → (E) | id

Remove immediate left recursion:

(Ref: Slide no. 29, REMOVING IMMEDIATE LEFT-RECURSION)

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST EXAMPLE

Consider the grammar:

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}
FOLLOW EXAMPLE

FIRST sets:
FIRST(F) = {(, id}
FIRST(T') = {*, ε}
FIRST(T) = {(, id}
FIRST(E') = {+, ε}
FIRST(E) = {(, id}

Rules:
1. If S is the start symbol → $ is in FOLLOW(S).
2. If A → αBβ is a production rule → everything in FIRST(β) except ε is in FOLLOW(B).
3(i). If A → αB is a production rule, or
3(ii). A → αBβ is a production rule and ε is in FIRST(β)
→ everything in FOLLOW(A) is in FOLLOW(B).

Consider the following grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Applying the rules to each production:

E → TE'
  Rule 1: $ is in FOLLOW(E)
  Rule 2: A → αBβ: α is ε; B is T and β is E'
  Rule 3(i): A → αB: α is T; B is E'
  Rule 3(ii): A → αBβ: α is ε; B is T and β is E'; FIRST(β) has ε
E' → +TE' | ε
  Rule 2: A → αBβ: α is +; B is T and β is E'
  Rule 3(i): A → αB: α is +T; B is E'
  Rule 3(ii): A → αBβ: α is +; B is T; β is E'; FIRST(β) has ε
T → FT'
  Rule 2: A → αBβ: α is ε; B is F and β is T'
  Rule 3(i): A → αB: α is F; B is T'
  Rule 3(ii): A → αBβ: α is ε; B is F and β is T'; FIRST(β) has ε
T' → *FT' | ε
  Rule 2: A → αBβ: α is *; B is F and β is T'
  Rule 3(i): A → αB: α is *F; B is T'
  Rule 3(ii): A → αBβ: α is *; B is F; β is T'; FIRST(β) has ε
F → (E) | id
  Rule 2: A → αBβ: α is '('; B is E and β is ')'

Result:
FOLLOW(E) = {$, )}
FOLLOW(E') = {$, )}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
CONSTRUCTING LL(1) PARSING TABLE -- ALGORITHM
• For each production rule A → α of a grammar G:
  • for each terminal a in FIRST(α), add A → α to M[A,a];
  • if ε is in FIRST(α), for each terminal a in FOLLOW(A) add A → α to M[A,a];
  • if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A,$].

• All other undefined entries of the parsing table are error entries.
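The table-construction algorithm can be sketched directly from its three bullet points. The example below hard-codes the expression grammar together with the FIRST and FOLLOW sets given in the slides; storing the table as a dictionary keyed by (non-terminal, terminal) and raising an error on a doubly defined entry are choices made for this illustration only.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of_string(symbols, first):
    """FIRST of a right-hand side, given FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first.get(sym, {sym})    # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table(grammar, first, follow):
    table = {}
    for a, alts in grammar.items():
        for alt in alts:
            alt_first = first_of_string(alt, first)
            columns = alt_first - {EPS}      # terminals in FIRST(α)
            if EPS in alt_first:
                columns |= follow[a]         # FOLLOW(A), including $ if present
            for t in columns:
                if (a, t) in table:
                    raise ValueError(f"conflict at M[{a},{t}]: grammar is not LL(1)")
                table[(a, t)] = alt
    return table

table = build_table(GRAMMAR, FIRST, FOLLOW)
print(len(table))  # 13 filled entries; every other entry is an error entry
```

Entries that are absent from the dictionary play the role of the error entries of the table.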
CONSTRUCTING LL(1) PARSING TABLE

E → TE'      FIRST(TE') = {(, id}   →  E → TE' into M[E,(] and M[E,id]

E' → +TE'    FIRST(+TE') = {+}      →  E' → +TE' into M[E',+]

E' → ε       FIRST(ε) = {ε}         →  none,
             but since ε is in FIRST(ε) and FOLLOW(E') = {$, )}
                                    →  E' → ε into M[E',$] and M[E',)]

T → FT'      FIRST(FT') = {(, id}   →  T → FT' into M[T,(] and M[T,id]

T' → *FT'    FIRST(*FT') = {*}      →  T' → *FT' into M[T',*]

T' → ε       FIRST(ε) = {ε}         →  none,
             but since ε is in FIRST(ε) and FOLLOW(T') = {$, ), +}
                                    →  T' → ε into M[T',$], M[T',)] and M[T',+]

F → (E)      FIRST((E)) = {(}       →  F → (E) into M[F,(]

F → id       FIRST(id) = {id}       →  F → id into M[F,id]

LL(1) PARSER -- EXAMPLE 1

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

FIRST(F) = {(, id}        FOLLOW(E) = {$, )}
FIRST(T') = {*, ε}        FOLLOW(E') = {$, )}
FIRST(T) = {(, id}        FOLLOW(T) = {+, ), $}
FIRST(E') = {+, ε}        FOLLOW(T') = {+, ), $}
FIRST(E) = {(, id}        FOLLOW(F) = {+, *, ), $}

FIRST(E') has ε, so E' → ε is added under every terminal in FOLLOW(E').
FIRST(T') has ε, so T' → ε is added under every terminal in FOLLOW(T').

      id          +            *            (           )          $
E     E → TE'                               E → TE'
E'                E' → +TE'                             E' → ε     E' → ε
T     T → FT'                               T → FT'
T'                T' → ε       T' → *FT'                T' → ε     T' → ε
F     F → id                                F → (E)
LL(1) PARSER -- EXAMPLE 1

Parsing the input id+id with the table above:

Stack       Input     Output
$E          id+id$    E → TE'
$E'T        id+id$    T → FT'
$E'T'F      id+id$    F → id
$E'T'id     id+id$
$E'T'       +id$      T' → ε
$E'         +id$      E' → +TE'
$E'T+       +id$
$E'T        id$       T → FT'
$E'T'F      id$       F → id
$E'T'id     id$
$E'T'       $         T' → ε
$E'         $         E' → ε
$           $         Accept
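The trace above can be reproduced with a short table-driven driver that implements the four parser actions. In this sketch the parsing table from the slide is hard-coded; the dictionary layout, the function name `ll1_parse`, and the use of `SyntaxError` for error entries are assumptions made for illustration.

```python
EPS = "ε"

# Parsing table M[A, a] for the expression grammar, copied from the slide.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS], ("T'", "$"): [EPS],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {a for a, _ in TABLE}

def ll1_parse(tokens, start="E"):
    """Return the productions used (a left-most derivation) or raise SyntaxError."""
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    pos, output = 0, []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == "$" and a == "$":              # action 1: accept
            return output
        if top in NONTERMINALS:                  # action 3: expand by M[top, a]
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top},{a}]")
            stack.pop()
            stack.extend(s for s in reversed(rhs) if s != EPS)
            output.append(f"{top} -> {' '.join(rhs)}")
        elif top == a:                           # action 2: match a terminal
            stack.pop()
            pos += 1
        else:                                    # action 4: error
            raise SyntaxError(f"expected {top}, found {a}")

productions = ll1_parse(["id", "+", "id"])
print("\n".join(productions))
```

The right-hand side is pushed in reverse so that its first symbol ends up on top of the stack, which is what makes the output a left-most derivation.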
LL(1) PARSER -- EXAMPLE 2

S → aBa
B → bB | ε

LL(1) Parsing Table

      a           b          $
S     S → aBa
B     B → ε       B → bB

Stack     Input     Output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         Accept, successful completion
LL(1) PARSER -- EXAMPLE 2 (CONT.)

Outputs: S → aBa, B → bB, B → bB, B → ε

Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba

Parse tree:

      S
    / | \
   a  B  a
     / \
    b   B
       / \
      b   B
          |
          ε

A GRAMMAR WHICH IS NOT LL(1)

S → iCtSE | a       FOLLOW(S) = {$, e}
E → eS | ε          FOLLOW(E) = {$, e}
C → b               FOLLOW(C) = {t}

FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}

      a        b        e                  i             t      $
S     S → a                                S → iCtSE
E                       E → eS, E → ε                           E → ε
C              C → b

There are two production rules for M[E,e].

Problem → ambiguity
A GRAMMAR WHICH IS NOT LL(1) (CONT.)
• What do we have to do if the resulting parsing table contains multiply defined entries?
  • If we didn't eliminate left recursion, eliminate the left recursion in the grammar.
  • If the grammar is not left factored, we have to left factor the grammar.
  • If the new grammar's parsing table still contains multiply defined entries, that grammar is ambiguous or it is inherently not an LL(1) grammar.
• A left-recursive grammar cannot be an LL(1) grammar.
  • A → Aα | β
    • Any terminal that appears in FIRST(β) also appears in FIRST(Aα), because Aα ⇒ βα.
    • If β is ε, any terminal that appears in FIRST(α) also appears in FIRST(Aα) and FOLLOW(A).
• If a grammar is not left factored, it cannot be an LL(1) grammar.
  • A → αβ1 | αβ2
    • Any terminal that appears in FIRST(αβ1) also appears in FIRST(αβ2).
• An ambiguous grammar cannot be an LL(1) grammar.
PROPERTIES OF LL(1) GRAMMARS
• A grammar G is LL(1) if and only if the following conditions hold for any two distinct production rules A → α and A → β:
1. α and β cannot both derive strings starting with the same terminal.
2. At most one of α and β can derive ε.
3. If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
• In other words, a grammar G is LL(1) iff for any two productions A → ω1 and A → ω2, the sets

FIRST(ω1 FOLLOW(A)) and FIRST(ω2 FOLLOW(A)) are disjoint.

This condition is equivalent to saying that there are no conflicts in the table.

Carol Viviana Ramos Acevedo July, 13 2020 01: OBJETIVO: Practicar El Uso Del Past Perfect Simple. Past Perfect Simple
No ratings yet
Carol Viviana Ramos Acevedo July, 13 2020 01: OBJETIVO: Practicar El Uso Del Past Perfect Simple. Past Perfect Simple
3 pages
Lec03 parserCFG
No ratings yet
Lec03 parserCFG
27 pages
Topic #4: Syntactic Analysis (Parsing) : INF 524 Compiler Construction Spring 2011
No ratings yet
Topic #4: Syntactic Analysis (Parsing) : INF 524 Compiler Construction Spring 2011
44 pages
Context Free Grammars
No ratings yet
Context Free Grammars
10 pages
Basics of Triad Chords: A Little Help…Please!
From Everand
Basics of Triad Chords: A Little Help…Please!
Lynette Haddock
5/5 (1)



• The smallest item is a token.

source Lexical token

• We categorize the parsers into two groups:

• Efficient top-down and bottom-up parsers can be implemented only

• LR for bottom-up parsing

Grammar defines a Language.

Context-free grammar (CFG) is used to generate a language

A CFG G consists of four components (T, V, S, P):

• T: A finite set of terminals

• V: A finite set of non-terminals (also denoted by N)

• S: A start symbol (Non-terminal symbol with which

• P: A finite set of production rules

Consider the Grammar:

Consider the Grammar:

Non-terminals include:

Consider the Grammar:

Production Rules include:

Consider the Grammar:

String ‘w’ of terminals is generated by the grammar if:

We can derive sentence ‘aaa’ from

In general a derivation step is:

1  2  ...  n (n derives from 1 or 1 derives n )

 : derives in one step

Consider the Grammar:

Derived in two steps

A sentence of L(G) is a string of terminal symbols only.

A sentential form is a combination of terminals and non-terminals.

If α does not contain non-terminals, it is called a sentence of G.

We can derive a string from the grammar in two ways:

Consider the Grammar:

Consider the Grammar:

EE*E (EE*E) EE*E (EE+E)

EE*id (Eid) EE+E*E (EE+E)

EE+E*id (EE+E) EE+E*id (Eid)

EE+id*id (Eid) EE+id*id (Eid)

Eid+id*id (Eid) Eid+id*id (Eid)

Consider the Grammar:

EE+E (EE+E) EE*E (EE*E)

Eid+E (Eid) Eid*E (Eid)

Eid+id*id (Eid) Eid+id*id (Eid)

Consider the Grammar:

EE*E (EE*E) EE+E (EE+E)

Eid+id*id (Eid) Eid+id*id (Eid)

if E1 then if E2 then S1 else S2

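[Parse tree 1: stmt → if expr then stmt else stmt, with expr = E1, the inner stmt → if expr then stmt, and the else-branch = S2]

[Parse tree 2: stmt → if expr then stmt, with expr = E1 and the inner stmt → if expr then stmt else stmt]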

• The unambiguous grammar will be:

stmt  matchedstmt | unmatchedstmt

matchedstmt  if expr then matchedstmt else matchedstmt | otherstmts

unmatchedstmt  if expr then stmt |

• Top-down parsing techniques cannot handle left-recursive

• Eliminate immediate left recursion

E  E+T | T Immediate Left Recursion In

S  Aa  Sca Immediate left recursion in the

S  Aa | b Immediate left recursion in A

Consider the Grammar Order of non-terminals: A, S

So, we will have S → SdA'a | fA'a | b
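As a rough illustration of eliminating immediate left recursion, the sketch below applies the standard rewrite A → Aα1 | ... | Aαm | β1 | ... | βn into A → β1A' | ... | βnA' and A' → α1A' | ... | αmA' | ε. The function name and the list-of-symbols representation are assumptions made for this example, and the new non-terminal name (A plus a prime) is assumed to be unused in the grammar.

```python
# Minimal sketch of eliminating immediate left recursion for one non-terminal.
# An alternative is a list of symbols; the empty list [] stands for ε.

def eliminate_immediate_left_recursion(nt, alternatives):
    """Return the rewritten productions as a dict {non-terminal: alternatives}."""
    # Split into A -> A alpha (keep only alpha) and A -> beta.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}  # nothing to do
    new_nt = nt + "'"  # assumed to be a fresh name
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }

# E -> E+T | T
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(a) if a else "ε" for a in alts))
```

For E → E+T | T this yields E → T E' and E' → + T E' | ε. Indirect left recursion (such as S ⇒ Aa ⇒ Sca) is not handled by this function alone; it first requires substituting productions in a fixed order of non-terminals.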

Remove the left recursion from the grammar given below

- Arrange non-terminals in some order: A1 ... An

stmt  if expr then stmt else stmt |

When we see A or if, we cannot determine which production rule to

Re-write the grammar as follows:

Now, we can immediately expand A to αA'

This rewriting of the grammar is called LEFT FACTORING

A  1 | ... | n | 1 | ... | m

A  abB | aB | cdg | cdeB | cdfB

A  aA' | cdg | cdeB | cdfB

• L1 = { ωcω | ω is in (a|b)* } is not context-free

• L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free

• Top-down parsing begins with virtually no information.

• How can we know which productions to apply?

to try other alternatives.)

EEE (EEE) EE*E (EE+E)

EEid (Eid) EE+EE (EE+E)

EE+Eid (EE+E) EE+Eid (Eid)

EE+idid (Eid) EE+idid (Eid)

Eid+idid (Eid) Eid+idid (Eid)

EE+E (EE+E) EEE (EEE)

Eid+idid (Eid) Eid+idid (Eid)

EEE (EEE) EE+E (EE+E)

Eid+idid (Eid) Eid+idid (Eid)

T'  FT' FIRST(FT’ )={}  T'  F' into M[T',*]