Chapter 7 - Compiler Construction
LALR
The LALR parser is the lookahead LR parser. It is less powerful than the CLR parser but more powerful than the SLR parser, and it can handle a large class of grammars. The CLR parsing table is quite large compared to other parsing tables; LALR reduces the size of this table. LALR works like CLR. The only difference is that it combines the CLR states that contain the same items and differ only in their lookaheads into one single state.
The general syntax becomes [A->α.B, a]
where A->α.B is the production and a is a terminal or the right end marker $
LR(1) items = LR(0) items + lookahead
CASE 1 –
A->α.BC, a
Suppose this is the 0th production. Since the ‘.’ precedes B, we have to write B’s productions as well.
B->.D
Suppose this is B’s production (the 1st production). To find its lookahead, we look at the previous production, i.e. the 0th production. We take whatever follows B there and compute FIRST of it; that is the lookahead of the 1st production. Here, in the 0th production, C follows B. Assume FIRST(C)=d; then the 1st production becomes
B->.D, d
CASE 2 –
A->α.B, a
Here, there is nothing after B. So the lookahead of the 0th production becomes the lookahead of the 1st production, i.e.
B->.D, a
CASE 3 –
If the 1st production is a part of the previous production (for example, another alternative of the same non-terminal, or the same production with the ‘.’ moved), the lookahead is the same as that of the previous production.
Example Problem:
Question: Construct CLR parsing table for the given context free grammar
S -> AA
A -> aA|b
Solution:
STEP 1
Add the augment production S'->S and write the LR(1) items of the initial state:
0th production: S'->.S, $
1st production: S->.AA, $
2nd production: A->.aA, a|b
3rd production: A->.b, a|b
The 1st production came into existence because of the ‘.’ before ‘S’ in the 0th production. There is nothing after ‘S’, so the lookahead of the 0th production becomes the lookahead of the 1st production, i.e. S->.AA, $
The 2nd production came into existence because of the ‘.’ before ‘A’ in the 1st production. After that ‘A’, there is another ‘A’, and FIRST(A) is {a, b}. Therefore, the lookahead of the 2nd production becomes a|b.
The 3rd production is a part of the 2nd production (both are alternatives of A). So the lookahead is the same.
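The lookahead rules used in Step 1 can be sketched in C++ as a closure computation. This is only an illustrative sketch, not a full parser generator: the names Production, Item, firstOf, closure and show are made up for this example, Z stands in for S', and firstOf assumes the grammar has no ε-productions and no left recursion (true for this grammar, not in general).

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>
using namespace std;

// Example grammar, augmented with Z -> S (Z plays the role of S').
// Uppercase letters are non-terminals, everything else is a terminal.
struct Production { char lhs; string rhs; };
const vector<Production> GRAMMAR = {
    {'Z', "S"}, {'S', "AA"}, {'A', "aA"}, {'A', "b"}};

// An LR(1) item: production index, dot position, one lookahead terminal.
struct Item {
    int prod;
    size_t dot;
    char look;
    bool operator<(const Item& o) const {
        return tie(prod, dot, look) < tie(o.prod, o.dot, o.look);
    }
};

bool isNonTerminal(char c) { return c >= 'A' && c <= 'Z'; }

// FIRST of one symbol. Assumption: no ε-productions and no left recursion.
set<char> firstOf(char x) {
    if (!isNonTerminal(x)) return {x};
    set<char> out;
    for (const auto& p : GRAMMAR)
        if (p.lhs == x) {
            set<char> f = firstOf(p.rhs[0]);
            out.insert(f.begin(), f.end());
        }
    return out;
}

// Closure of a set of LR(1) items.
set<Item> closure(set<Item> items) {
    vector<Item> work(items.begin(), items.end());
    while (!work.empty()) {
        Item it = work.back();
        work.pop_back();
        const string& rhs = GRAMMAR[it.prod].rhs;
        if (it.dot >= rhs.size() || !isNonTerminal(rhs[it.dot])) continue;
        char next = rhs[it.dot];  // the non-terminal right after the dot
        // CASE 1: something follows the non-terminal, so use FIRST of it.
        // CASE 2: nothing follows, so inherit the lookahead of this item.
        set<char> looks = (it.dot + 1 < rhs.size())
                              ? firstOf(rhs[it.dot + 1])
                              : set<char>{it.look};
        for (size_t i = 0; i < GRAMMAR.size(); ++i) {
            if (GRAMMAR[i].lhs != next) continue;
            for (char la : looks) {
                Item added{static_cast<int>(i), 0, la};
                if (items.insert(added).second) work.push_back(added);
            }
        }
    }
    return items;
}

// Text form of an item, e.g. "S->.AA,$".
string show(const Item& it) {
    string rhs = GRAMMAR[it.prod].rhs;
    rhs.insert(it.dot, ".");
    return string(1, GRAMMAR[it.prod].lhs) + "->" + rhs + "," + it.look;
}
```

Calling closure on the single item Z->.S,$ should give the items of I0: S->.AA with lookahead $, and A->.aA and A->.b each with lookaheads a and b.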
STEP 2
RULE: If any non-terminal has a ‘.’ preceding it, we have to write all its productions and add a ‘.’ preceding each of them.
RULE: From each state to the next state, the ‘.’ shifts one place to the right.
I0 goes to I1 when the ‘.’ of the 0th production is shifted to the right of S (S'->S.). S is seen by the parser. This state is the accept state. Since I1 is a part of the 0th production, the lookahead is the same, i.e. $.
I0 goes to I2 when the ‘.’ of the 1st production is shifted to the right (S->A.A). A is seen by the parser. Since I2 is a part of the 1st production, the lookahead is the same, i.e. $.
I0 goes to I3 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I3 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
I0 goes to I4 when the ‘.’ of the 3rd production is shifted to the right (A->b.). b is seen by the parser. Since I4 is a part of the 3rd production, the lookahead is the same, i.e. a|b.
I2 goes to I5 when the ‘.’ of the 1st production is shifted to the right (S->AA.). A is seen by the parser. Since I5 is a part of the 1st production, the lookahead is the same, i.e. $.
I2 goes to I6 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I6 is a part of the 2nd production, the lookahead is the same, i.e. $.
I2 goes to I7 when the ‘.’ of the 3rd production is shifted to the right (A->b.). b is seen by the parser. Since I7 is a part of the 3rd production, the lookahead is the same, i.e. $.
I3 goes to I3 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I3 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
I3 goes to I8 when the ‘.’ of the 2nd production is shifted to the right (A->aA.). A is seen by the parser. Since I8 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
I6 goes to I9 when the ‘.’ of the 2nd production is shifted to the right (A->aA.). A is seen by the parser. Since I9 is a part of the 2nd production, the lookahead is the same, i.e. $.
I6 goes to I6 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I6 is a part of the 2nd production, the lookahead is the same, i.e. $.
I6 goes to I7 when the ‘.’ of the 3rd production is shifted to the right (A->b.). b is seen by the parser. Since I7 is a part of the 3rd production, the lookahead is the same, i.e. $.
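STEP 3
Once we have made the CLR parsing table, we can easily make the LALR parsing table from it. States that contain the same items and differ only in their lookaheads are merged.
We combine rows 4 and 7 into a single row 47 by combining the entries of the two rows.
We combine rows 8 and 9 into a single row 89 by combining the entries of the two rows.
By the same reasoning, I3 and I6 contain the same items with different lookaheads, so rows 3 and 6 combine into a single row 36.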
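The merging in Step 3 can be sketched as grouping CLR states by their core, that is, their items with the lookaheads dropped. The code below is a minimal illustration under assumptions: items are stored as plain text, and the helper names (coreOf, mergeByCore, withLookaheads, exampleStates) are invented here. Only the states I3, I4, I6, I7, I8 and I9 from the walk-through are encoded.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// One LR(1) item: the core (an LR(0) item written as text) plus a lookahead.
using Item = pair<string, char>;
using State = set<Item>;

// The core of a state: its items with the lookaheads dropped.
set<string> coreOf(const State& s) {
    set<string> core;
    for (const auto& it : s) core.insert(it.first);
    return core;
}

// Groups CLR states that share a core. For each core the result holds the
// numbers of the merged CLR states and the union of their items.
map<set<string>, pair<vector<int>, State>> mergeByCore(const map<int, State>& clr) {
    map<set<string>, pair<vector<int>, State>> merged;
    for (const auto& entry : clr) {
        auto& slot = merged[coreOf(entry.second)];
        slot.first.push_back(entry.first);
        slot.second.insert(entry.second.begin(), entry.second.end());
    }
    return merged;
}

// Helper for writing a state compactly: every core gets every lookahead.
State withLookaheads(const vector<string>& cores, const string& looks) {
    State s;
    for (const auto& c : cores)
        for (char la : looks) s.insert(Item(c, la));
    return s;
}

// The CLR states of the example that take part in merging.
map<int, State> exampleStates() {
    map<int, State> m;
    m[3] = withLookaheads({"A->a.A", "A->.aA", "A->.b"}, "ab");
    m[6] = withLookaheads({"A->a.A", "A->.aA", "A->.b"}, "$");
    m[4] = withLookaheads({"A->b."}, "ab");
    m[7] = withLookaheads({"A->b."}, "$");
    m[8] = withLookaheads({"A->aA."}, "ab");
    m[9] = withLookaheads({"A->aA."}, "$");
    return m;
}
```

Running mergeByCore(exampleStates()) should leave three groups, corresponding to the merged rows 36, 47 and 89; each merged state carries the lookaheads a, b and $.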
==================================================================================
CLR
LR parsers :
LR(k) parsing is an efficient bottom-up syntax analysis technique that can be used to parse a large class of context-free grammars.
L stands for left-to-right scanning of the input
R stands for constructing a rightmost derivation in reverse
k stands for the number of input symbols of lookahead
Advantages of LR parsing :
It recognizes virtually all programming language constructs for which a CFG can be written.
Types of LR parsers :
1. SLR
2. CLR
3. LALR
CLR Parser :
CLR stands for canonical LR parser. It is the most powerful of the LR parsers. It makes use of lookahead symbols and uses a large set of items called LR(1) items. The main difference between LR(0) and LR(1) items is that an LR(1) item carries more information in a state, which rules out useless reductions. This extra information is incorporated into the state by the lookahead symbol. The general syntax becomes [A->α.B, a]
where A->α.B is the production and a is a terminal or the right end marker $
LR(1) items = LR(0) items + lookahead
CASE 1 –
A->α.BC, a
Suppose this is the 0th production. Since the ‘.’ precedes B, we have to write B’s productions as well.
B->.D
Suppose this is B’s production (the 1st production). To find its lookahead, we look at the previous production, i.e. the 0th production. We take whatever follows B there and compute FIRST of it; that is the lookahead of the 1st production. Here, in the 0th production, C follows B. Assume FIRST(C)=d; then the 1st production becomes
B->.D, d
CASE 2 –
A->α.B, a
Here, there is nothing after B. So the lookahead of the 0th production becomes the lookahead of the 1st production, i.e.
B->.D, a
CASE 3 –
If the 1st production is a part of the previous production (for example, another alternative of the same non-terminal, or the same production with the ‘.’ moved), the lookahead is the same as that of the previous production.
These are the rules for computing the lookahead.
Steps for constructing CLR parsing table :
Example Problem:
Question: Construct a CLR parsing table for the given context-free grammar
S-->AA
A-->aA|b
Solution:
STEP 1
Add the augment production S'->S and write the LR(1) items of the initial state:
0th production: S'->.S, $
1st production: S->.AA, $
2nd production: A->.aA, a|b
3rd production: A->.b, a|b
The 1st production came into existence because of the ‘.’ before ‘S’ in the 0th production. There is nothing after ‘S’, so the lookahead of the 0th production becomes the lookahead of the 1st production, i.e. S->.AA, $
The 2nd production came into existence because of the ‘.’ before ‘A’ in the 1st production. After that ‘A’, there is another ‘A’, and FIRST(A) is {a, b}. Therefore, the lookahead of the 2nd production becomes a|b.
The 3rd production is a part of the 2nd production (both are alternatives of A). So the lookahead is the same.
STEP 2
RULE: If any non-terminal has a ‘.’ preceding it, we have to write all its productions and add a ‘.’ preceding each of them.
RULE: From each state to the next state, the ‘.’ shifts one place to the right.
I0 goes to I1 when the ‘.’ of the 0th production is shifted to the right of S (S'->S.). S is seen by the parser. This state is the accept state. Since I1 is a part of the 0th production, the lookahead is the same, i.e. $.
I0 goes to I2 when the ‘.’ of the 1st production is shifted to the right (S->A.A). A is seen by the parser. Since I2 is a part of the 1st production, the lookahead is the same, i.e. $.
I0 goes to I3 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I3 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
I0 goes to I4 when the ‘.’ of the 3rd production is shifted to the right (A->b.). b is seen by the parser. Since I4 is a part of the 3rd production, the lookahead is the same, i.e. a|b.
I2 goes to I5 when the ‘.’ of the 1st production is shifted to the right (S->AA.). A is seen by the parser. Since I5 is a part of the 1st production, the lookahead is the same, i.e. $.
I2 goes to I6 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I6 is a part of the 2nd production, the lookahead is the same, i.e. $.
I2 goes to I7 when the ‘.’ of the 3rd production is shifted to the right (A->b.). b is seen by the parser. Since I7 is a part of the 3rd production, the lookahead is the same, i.e. $.
I3 goes to I3 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I3 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
I3 goes to I8 when the ‘.’ of the 2nd production is shifted to the right (A->aA.). A is seen by the parser. Since I8 is a part of the 2nd production, the lookahead is the same, i.e. a|b.
I6 goes to I9 when the ‘.’ of the 2nd production is shifted to the right (A->aA.). A is seen by the parser. Since I9 is a part of the 2nd production, the lookahead is the same, i.e. $.
I6 goes to I6 when the ‘.’ of the 2nd production is shifted to the right (A->a.A). a is seen by the parser. Since I6 is a part of the 2nd production, the lookahead is the same, i.e. $.
I6 goes to I7 when the ‘.’ of the 3rd production is shifted to the right (A->b.). b is seen by the parser. Since I7 is a part of the 3rd production, the lookahead is the same, i.e. $.
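STEP 3
The parsing table is filled in from the transitions between the states.
A transition on a non-terminal gives a goto entry: similarly, 5 is written in the A column of row 2, 8 is written in the A column of row 3, and 9 is written in the A column of row 6.
A transition on a terminal gives a shift entry: similarly, S6 (shift 6) is added in the ‘a’ column of rows 2 and 6, S7 (shift 7) is added in the ‘b’ column of rows 2 and 6, S3 (shift 3) is added in the ‘a’ column of row 3, and S4 (shift 4) is added in the ‘b’ column of row 3.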
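To show how such a table is used, here is a hedged C++ sketch of a table-driven LR parser for this grammar. The ACTION and GOTO entries were written out by hand from the states I0 to I9 described in Step 2, so treat them as an assumption of this sketch rather than a copy of the original table. The names Action, Rule and parse are illustrative, Z stands in for S', and the reduce numbers refer to 1: S->AA, 2: A->aA, 3: A->b.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// kind: 's' = shift to state arg, 'r' = reduce by rule arg, 'a' = accept.
struct Action { char kind; int arg; };

// Rules: left-hand side and length of the right-hand side.
struct Rule { char lhs; int len; };
const vector<Rule> RULES = {{'Z', 1}, {'S', 2}, {'A', 2}, {'A', 1}};

// ACTION part of the CLR table, keyed by (state, terminal).
const map<pair<int, char>, Action> ACTION = {
    {{0, 'a'}, {'s', 3}}, {{0, 'b'}, {'s', 4}},
    {{1, '$'}, {'a', 0}},
    {{2, 'a'}, {'s', 6}}, {{2, 'b'}, {'s', 7}},
    {{3, 'a'}, {'s', 3}}, {{3, 'b'}, {'s', 4}},
    {{4, 'a'}, {'r', 3}}, {{4, 'b'}, {'r', 3}},
    {{5, '$'}, {'r', 1}},
    {{6, 'a'}, {'s', 6}}, {{6, 'b'}, {'s', 7}},
    {{7, '$'}, {'r', 3}},
    {{8, 'a'}, {'r', 2}}, {{8, 'b'}, {'r', 2}},
    {{9, '$'}, {'r', 2}},
};

// GOTO part of the CLR table, keyed by (state, non-terminal).
const map<pair<int, char>, int> GOTO = {
    {{0, 'S'}, 1}, {{0, 'A'}, 2}, {{2, 'A'}, 5}, {{3, 'A'}, 8}, {{6, 'A'}, 9},
};

// Returns true if the input (a string over a and b) is accepted.
bool parse(string input) {
    input += '$';                 // right end marker
    vector<int> states = {0};     // stack of state numbers
    size_t i = 0;
    while (true) {
        auto it = ACTION.find(make_pair(states.back(), input[i]));
        if (it == ACTION.end()) return false;   // blank cell: syntax error
        Action act = it->second;
        if (act.kind == 'a') return true;
        if (act.kind == 's') {
            states.push_back(act.arg);
            ++i;
        } else {
            // Reduce: pop one state per right-hand-side symbol, then goto.
            const Rule& r = RULES[act.arg];
            states.resize(states.size() - r.len);
            states.push_back(GOTO.at(make_pair(states.back(), r.lhs)));
        }
    }
}
```

For example, parse("aabb") and parse("bb") should return true, while parse("b") should return false because S needs two A's.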
==================================================================================
LL(1)
A top-down parser builds the parse tree from the top down, starting with the start non-terminal.
There are two types of Top-Down Parsers:
1. Top-Down Parsers with backtracking
2. Top-Down Parsers without backtracking
Top-Down Parsers without backtracking can further be divided into two parts:
1. Recursive Descent
2. Non-Recursive Descent
In this article, we are going to discuss Non-Recursive Descent, which is also known as the LL(1) Parser.
LL(1) Parsing: Here the first L means that the input is scanned from left to right, and the second L means that this parsing technique uses the leftmost derivation. Finally, the 1 is the number of lookahead symbols, that is, how many input symbols we look at when making a parsing decision.
Essential conditions to check first are as follows:
1. The grammar is free from left recursion.
2. The grammar should not be ambiguous.
3. The grammar has to be left factored so that it is a deterministic grammar.
These conditions are necessary but not sufficient for a grammar to be LL(1).
Step 1: First check all the essential conditions mentioned above and go to step 2.
Step 2: Calculate First() and Follow() for all non-terminals. To construct the parsing table, we have these two functions:
1. First(): If we derive all possible strings from a variable, the terminal symbols that can begin those strings form its First set.
2. Follow(): The terminal symbols that can follow a variable in the process of derivation form its Follow set.
Step 3: For each production A –> α:
1. Find First(α) and for each terminal in First(α), make entry A –> α in the table.
2. If First(α) contains ε (epsilon), then find Follow(A) and for each terminal in Follow(A), make entry A –> α in the table.
3. If First(α) contains ε and Follow(A) contains $, then make entry A –> α in the table for $.
In the table, the rows contain the Non-Terminals and the columns contain the Terminal Symbols. All the Null Productions of the grammar go under the Follow elements and the remaining productions lie under the elements of the First set.
E --> TE'
E' --> +TE' | ε
T --> FT'
T' --> *FT' | ε
F --> id | (E)
As you can see, all the null productions are put under the Follow set of that symbol and all the remaining productions lie under the First of that symbol.
Note: Not every grammar is feasible for an LL(1) parsing table. It may happen that one cell contains more than one production.
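The table-filling rules can be sketched in C++ for the expression grammar above. This is a sketch under stated assumptions, not a general tool: the First and Follow sets are hard-coded from a hand calculation, symbols use a one-character encoding chosen for this example (X = E', Y = T', i = id, '#' marks ε inside a First set), and the names Production, Table, buildTable and parse are illustrative.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// A production with the First set of its right-hand side ('#' marks ε).
struct Production { char lhs; string rhs; set<char> first; };

const vector<Production> GRAMMAR = {
    {'E', "TX", {'i', '('}},   // E  -> T E'
    {'X', "+TX", {'+'}},       // E' -> + T E'
    {'X', "", {'#'}},          // E' -> ε
    {'T', "FY", {'i', '('}},   // T  -> F T'
    {'Y', "*FY", {'*'}},       // T' -> * F T'
    {'Y', "", {'#'}},          // T' -> ε
    {'F', "i", {'i'}},         // F  -> id
    {'F', "(E)", {'('}},       // F  -> ( E )
};

// Follow sets, worked out by hand for this grammar.
const map<char, set<char>> FOLLOW = {
    {'E', {')', '$'}},      {'X', {')', '$'}},
    {'T', {'+', ')', '$'}}, {'Y', {'+', ')', '$'}},
    {'F', {'*', '+', ')', '$'}},
};

// Cell (non-terminal, terminal) -> right-hand sides placed in that cell.
using Table = map<pair<char, char>, vector<string>>;

Table buildTable() {
    Table t;
    for (const auto& p : GRAMMAR) {
        // Rule 1: enter the production under every terminal in First(α).
        for (char a : p.first)
            if (a != '#') t[make_pair(p.lhs, a)].push_back(p.rhs);
        // Rules 2 and 3: if α can derive ε, enter it under Follow(A), $ included.
        if (p.first.count('#'))
            for (char b : FOLLOW.at(p.lhs)) t[make_pair(p.lhs, b)].push_back(p.rhs);
    }
    return t;
}

bool isNonTerminal(char c) { return c >= 'A' && c <= 'Z'; }

// Non-recursive predictive parse of a string such as "i+i*i".
bool parse(string input, const Table& t) {
    input += '$';
    string stack = "$E";              // the top of the stack is the back
    size_t i = 0;
    while (!stack.empty()) {
        char top = stack.back();
        stack.pop_back();
        if (!isNonTerminal(top)) {    // terminal or $: must match the input
            if (top != input[i]) return false;
            ++i;
        } else {
            auto it = t.find(make_pair(top, input[i]));
            if (it == t.end() || it->second.size() != 1) return false;
            const string& rhs = it->second[0];
            stack.append(rhs.rbegin(), rhs.rend());   // push α reversed
        }
    }
    return i == input.size();
}
```

With Table t = buildTable(), the cell for (E, id) should hold E -> T E' and the cell for (E', $) should hold the null production; parse("i+i*i", t) should then succeed while parse("i+", t) should fail.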
Example 2:
S --> A | a
A --> a
Step 1: The grammar does not satisfy all the properties in step 1, as the grammar is ambiguous. Still, let’s try to make the parser table and see what happens.
Parsing Table:
Here, we can see that there are two productions in the same cell. Hence, this grammar is not feasible
for LL(1) Parser.
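The conflict for S --> A | a, A --> a can be checked mechanically by counting how many productions land in each table cell. The sketch below is illustrative only: the First sets of the right-hand sides are hard-coded by hand (First(A) = First(a) = {a}), and the names Production and conflicts are made up for this example.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;

// A production with the First set of its right-hand side.
struct Production { char lhs; string rhs; set<char> first; };

// S -> A | a and A -> a; no right-hand side derives ε here,
// so only the First sets decide where a production is entered.
const vector<Production> GRAMMAR = {
    {'S', "A", {'a'}}, {'S', "a", {'a'}}, {'A', "a", {'a'}}};

// Returns the cells (non-terminal, terminal) holding more than one production.
vector<pair<char, char>> conflicts() {
    map<pair<char, char>, int> cellCount;
    for (const auto& p : GRAMMAR)
        for (char a : p.first) ++cellCount[make_pair(p.lhs, a)];
    vector<pair<char, char>> out;
    for (const auto& cell : cellCount)
        if (cell.second > 1) out.push_back(cell.first);
    return out;
}
```

Here conflicts() should report exactly one cell, (S, a), which is the cell that receives both S --> A and S --> a.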
Trick – Above grammar is ambiguous grammar. So the grammar does not satisfy the essential
conditions. So we can say that this grammar is not feasible for LL(1) Parser even without making the
parse table.
Parsing Table:
Here, we can see that there are two productions in the same cell. Hence, this grammar is not feasible for an LL(1) Parser. Although the grammar satisfies all the essential conditions in step 1, it is still not feasible for an LL(1) Parser. Example 2 showed that the essential conditions must hold, and example 3 showed that those conditions alone are not sufficient for a grammar to be LL(1).
==================================================================================
Parsing is the process of determining whether the start symbol can derive the program or not. If the parsing is successful, the program is a valid program; otherwise the program is invalid.
1. Top-Down Parsers:
In this parsing technique we expand the start symbol to the whole program.
2. Bottom-Up Parsers:
In this parsing technique we reduce the whole program to the start symbol. Operator Precedence Parser, LR(0) Parser, SLR Parser, LALR Parser and CLR Parser are the Bottom-Up parsers.
A recursive descent parser is a kind of Top-Down Parser. A top-down parser builds the parse tree from the top down, starting with the start non-terminal. A Predictive Parser is a special case of Recursive Descent Parser, where no backtracking is required.
Writing a grammar carefully means eliminating left recursion from it and left factoring it; the resulting grammar can be parsed by a recursive descent parser.
Example:
Before removing left recursion:
E –> E + T | T
T –> T * F | F
F –> ( E ) | id
After removing left recursion:
E –> T E’
E’ –> + T E’ | ε
T –> F T’
T’ –> * F T’ | ε
F –> ( E ) | id
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
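#include <iostream>
#include <string>
using namespace std;
string input;
size_t pos = 0;
string lookahead;  // current token: "id", a single symbol, or "" at end of input
// Read the next token into lookahead, skipping blanks.
void advance() {
    while (pos < input.size() && input[pos] == ' ') pos++;
    if (pos >= input.size()) lookahead = "";
    else if (input.compare(pos, 2, "id") == 0) { lookahead = "id"; pos += 2; }
    else lookahead = string(1, input[pos++]);
}
bool match(const string& t) { if (lookahead != t) return false; advance(); return true; }
bool E();
bool F() { return match("id") || (match("(") && E() && match(")")); }  // F → id | ( E )
bool Tp() { return lookahead != "*" || (match("*") && F() && Tp()); }  // T' → * F T' | ε
bool T() { return F() && Tp(); }                                       // T → F T'
bool Ep() { return lookahead != "+" || (match("+") && T() && Ep()); }  // E' → + T E' | ε
bool E() { return T() && Ep(); }                                       // E → T E'
int main() {
    cout << "Enter an expression: ";
    getline(cin, input);
    advance();  // load the first token
    if (E() && lookahead.empty()) {
        cout << "Parsing successful!" << endl;
    } else {
        cout << "Syntax error!" << endl;
    }
    return 0;
}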
S → AB
A → aA | ε
B → bB | ε
#include <iostream>
#include <string>
using namespace std;
class Parser {
    string input;
    size_t pos;
public:
    Parser(string str) : input(str), pos(0) {}
    // Succeeds only if S consumes the whole input.
    bool parse() {
        S();
        return pos == input.length();
    }
    void S() {  // S → AB
        A();
        B();
    }
    void A() {
        if (pos < input.length() && input[pos] == 'a') {
            pos++;  // A → aA
            A();
        }
        // otherwise A → ε: consume nothing
    }
    void B() {
        if (pos < input.length() && input[pos] == 'b') {
            pos++;  // B → bB
            B();
        }
        // otherwise B → ε: consume nothing
    }
};
int main() {
    string input;
    cout << "Enter String: " << endl;
    cin >> input;
    Parser parser(input);
    if (parser.parse()) {
        cout << "Success" << endl;
    } else {
        cout << "Failure" << endl;
    }
    return 0;
}
S → Be
B → CD
B → bCb
C → cC
C→ε
D → dD
D→ε
#include <iostream>
#include <string>
using namespace std;
// Recursive descent parser with backtracking.
class Parser {
private:
    string input;
    size_t pos;
    // Current input character, or '\0' at the end of the input.
    char currentChar() {
        return pos < input.size() ? input[pos] : '\0';
    }
    // Consume the expected character if it is next in the input.
    bool match(char expected) {
        if (currentChar() == expected) {
            pos++;
            return true;
        }
        return false;
    }
    bool S() {  // S → Be
        if (B()) {
            return match('e');
        }
        return false;
    }
    bool B() {
        size_t entry = pos;
        // B → bCb is tried first. C and D can both derive ε, so B → CD
        // always succeeds; if it were tried first, B → bCb would never
        // be reached.
        if (match('b') && C() && match('b')) {
            return true;
        }
        pos = entry; // BACKTRACKING IS IMPLEMENTED HERE
        // Any further alternative of B would be tried the same way:
        // reset pos to entry, then test that alternative.
        if (C() && D()) {  // B → CD
            return true;
        }
        return false;
    }
    bool C() {
        if (match('c') && C()) {  // C → cC
            return true;
        }
        return true;  // C → ε: succeeds without consuming input
    }
    bool D() {
        if (match('d') && D()) {  // D → dD
            return true;
        }
        return true;  // D → ε: succeeds without consuming input
    }
public:
    Parser(string in) : input(in), pos(0) {}
    bool parse() {
        if (S() && pos == input.size()) {
            return true;
        }
        return false;
    }
};
int main() {
    string input;
    cout << "Enter String: " << endl;
    cin >> input;
    Parser p(input);
    if (p.parse()) {
        cout << "Success" << endl;
    } else {
        cout << "Failure" << endl;
    }
    return 0;
}
S → Be
B → CD
B → bCb
C → cC
C→ε
D → dD
D→ε
First(S) = {b, c, d, e}      Follow(S) = {$}
First(B) = {b, c, d, ε}      Follow(B) = {e}
First(C) = {c, ε}            Follow(C) = {b, d, e}
First(D) = {d, ε}            Follow(D) = {e}
#include <iostream>
#include <string>
using namespace std;
// Predictive (no backtracking) recursive descent parser.
class Parser {
private:
    string input;
    size_t pos;
    // Current input character, or '\0' at the end of the input.
    char currentChar() {
        return pos < input.size() ? input[pos] : '\0';
    }
    // True if the next input character is c; nothing is consumed.
    bool lookahead(char c) {
        return currentChar() == c;
    }
    // Consume the expected character if it is next in the input.
    bool match(char expected) {
        if (currentChar() == expected) {
            pos++;
            return true;
        }
        return false;
    }
    bool S() {  // S → Be
        if (B()) {
            return match('e');
        }
        return false;
    }
    bool B() {
        // The production is chosen from the lookahead:
        // FIRST(CD) = {c, d, ε}
        // FIRST(bCb) = {b}
        // Because CD can derive ε, FOLLOW(B) = {e} also selects B → CD.
        if (lookahead('c') || lookahead('d') || lookahead('e')) {
            // B → CD
            return (C() && D());
        } else if (lookahead('b')) {
            // B → bCb
            return match('b') && C() && match('b');
        } else {
            return false;
        }
    }
    bool C() {
        // The production is chosen from FIRST(cC) and FOLLOW(C):
        // FIRST(cC) = {c}
        // FOLLOW(C) = {b, d, e}
        if (lookahead('c')) {
            // C → cC
            return (match('c') && C());
        } else if (lookahead('b') || lookahead('d') || lookahead('e')) {
            // C → ε
            // NOTE HERE THAT THE LOOKAHEADS ARE FROM THE FOLLOW SET!!
            return true;
        }
        return false;
    }
    bool D() {
        // The production is chosen from FIRST(dD) and FOLLOW(D):
        // FIRST(dD) = {d}
        // FOLLOW(D) = {e}
        if (lookahead('d')) {
            // D → dD
            return (match('d') && D());
        } else if (lookahead('e')) {
            // D → ε
            // NOTE HERE THAT THE LOOKAHEADS ARE FROM THE FOLLOW SET!!
            return true;
        }
        return false;
    }
public:
    Parser(string in) : input(in), pos(0) {}
    bool parse() {
        if (S() && pos == input.size()) {
            return true;
        }
        return false;
    }
};
int main() {
    string input;
    cout << "Enter String: " << endl;
    cin >> input;
    Parser p(input);
    if (p.parse()) {
        cout << "Success" << endl;
    } else {
        cout << "Failure" << endl;
    }
    return 0;
}