Compiler Design Lab Manual

COMPILER DESIGN LABORATORY 0 0 3 2

COURSE OBJECTIVE:

To give students practical, hands-on experience in:


1. Designing a Lexical analyzer
2. Developing and implementing the parser
3. Identifying the similarities and differences among parsers
4. Designing and implementing Code Generator
5. Designing and implementing code optimizer

LIST OF EXPERIMENTS:

1. Construction of NFA.
2. Construction of minimized DFA from a given regular expression.
3. Use LEX tool to implement a lexical analyzer.
4. Use YACC and LEX to implement a parser for the grammar.
5. Implement a recursive descent parser for an expression grammar.
6. Construction of operator precedence parse table.
7. Implementation of a symbol table.
8. Implementation of shift reduce parsing algorithms.
9. Construction of LR parsing table.
10. Generation of code for a given intermediate code.
11. Implementation of code optimization techniques.

Total: 45 hours
1. Construction of NFA

1.1 AIM

Write a C program to implement a lexical analyzer using a Non-deterministic Finite
Automaton (NFA).

1.2 DESCRIPTION

The lexical analyzer is the first stage of the compiler: it reads the input program and
groups its characters into lexemes (the basic elements of the language). In this exercise we
implement the lexical analyzer with an NFA. The steps are:

a. Construct the NFA from the regular expression
b. Recognize the input with the help of the NFA

1.2.1 Constructing NFA


The lexemes are described by regular expressions. The following rules (Thompson's
construction) show how to convert a regular expression to an NFA:

• To recognize the empty string ε: a start state i with an ε-transition to a final state f.
• To recognize a symbol a of the alphabet: a start state i with an a-transition to a final state f.
• If N(r1) and N(r2) are the NFAs for regular expressions r1 and r2:
  - For r1 | r2: create a new start state i with ε-transitions to the start states of N(r1)
    and N(r2), and ε-transitions from the final states of N(r1) and N(r2) to a new final
    state f.
  - For r1 r2: merge the final state of N(r1) with the start state of N(r2); the final
    state of N(r2) becomes the final state of N(r1 r2).
  - For r*: create a new start state i and a new final state f; add ε-transitions from i
    to the start state of N(r) and from the final state of N(r) to f, an ε-transition from
    i to f (zero occurrences), and an ε-transition from the final state of N(r) back to
    its start state (repetition).

1.2.2 Example - (a|b)*a

Applying the rules step by step: build the elementary NFAs for a and for b, combine them
with the union rule into an NFA for a | b, wrap that machine with the star rule to obtain
(a|b)*, and finally concatenate the NFA for a. The resulting NFA accepts every string over
{a, b} that ends in a.

1.2.3 Recognizer
A recognizer for a language is a program that takes a string x, and answers “yes” if x
is a sentence of that language, and “no” otherwise.

Algorithm for the recognizer

1. Start from the initial state (i.e., state 0).
2. For the current state and the next input symbol, consider every possible transition.
3. Move to the next state (and recursively call the same function).
4. If that move does not succeed, repeat step 3 with another candidate transition.
5. If there is no valid move:
6.    If the state is a final state,
7.        return the token value;
8.    else
9.        backtrack one step.
10. If backtracking is not possible, indicate an error.
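The backtracking in steps 3, 4 and 9 maps naturally onto a recursive function. Below is a
minimal C sketch of such a recognizer; the rule table, the final-state array and the
hard-coded machine for (a|b)*a are illustrative assumptions, not part of this manual, and
ε-transitions are omitted for brevity.

#include <stdio.h>
#include <string.h>

#define MAX_RULES  64
#define MAX_STATES 16

struct rule { int p; char a; int q; };   /* transition: state p, symbol a -> state q */

static struct rule T[MAX_RULES];
static int m;                    /* number of transition rules          */
static int final[MAX_STATES];    /* final[s] == 1 if s is a final state */

/* Return 1 if the NFA can consume input[i..] starting from state s.
   The recursion implements the backtracking of steps 3, 4 and 9. */
static int run(int s, const char *input, int i)
{
    if (input[i] == '\0')
        return final[s];                     /* input exhausted: accept iff final */
    for (int k = 0; k < m; k++)              /* try every applicable transition   */
        if (T[k].p == s && T[k].a == input[i] && run(T[k].q, input, i + 1))
            return 1;
    return 0;                                /* no choice worked: backtrack */
}

int main(void)
{
    /* Hard-coded NFA for (a|b)*a: states 0 (start) and 1 (final). */
    struct rule demo[] = { {0, 'a', 0}, {0, 'b', 0}, {0, 'a', 1} };
    memcpy(T, demo, sizeof demo);
    m = 3;
    final[1] = 1;

    printf("%s\n", run(0, "aba", 0) ? "token recognized" : "error");
    return 0;
}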

1.3 Input and output format


The regular expressions for the tokens and their return values are stored in the file
regex.txt. The format is as follows:

Pattern returnvalue

Accept the input from the keyboard and print the return value
in the following format:

Startingnode value endnode returnvalue

1.3.1 Sample input and output


regex.txt contents:
[a-z]* 1
[a-z]+[1-9]* 2
[0-9]+ 3

Output for different input text:


sum 1
var1 2
123 3

1.4 Sample problems:


1. Define a structure rule to store the triple (p, a, q), where q is one possibility for δ(p,a).
Also define an NFA as a structure of the following items: the number n of states, an
array f [ ] to store the information which states are final, the number m of transfer-
function rules, and an array T [ ] of m structures of type rule. Write a function to
populate and return a machine data type M as follows:
i. First read the number of states.
ii. The user then enters the final states one by one (-1 to terminate).
iii. The user finally enters the transition rules (p, a, q) one by one (p =
-1 terminates the reading loop). The user then runs the NFA on one or
more inputs. Each input is read as a sequence of 0's and 1's. If -1 is
entered as the next symbol, the input ends. All the symbols read
(including the trailing -1) are stored in an array. For each input, run
the functions of Parts 2 and 3.

2. L = { w | w ends with 00 }, with three states. Notice that w only has to end with 00;
before the two zeros there can be anything. Construct an NFA to recognize L.
2. Construction of minimized DFA from a given regular expression

2.1 AIM

Write a C program to implement a lexical analyzer using a Deterministic Finite Automaton (DFA).

2.2 DESCRIPTION

A Deterministic Finite Automaton (DFA) is a special form of an NFA in which:

• no state has an ε-transition, and
• for each symbol a and state s, there is at most one edge labeled a leaving s.

2.2.1 Converting Regular Expressions Directly to DFAs

• We may convert a regular expression into a DFA directly, without creating an NFA first.
• First we augment the given regular expression by concatenating it with a special
  symbol #:

  r → (r)#   (the augmented regular expression)

• Then we create a syntax tree for this augmented regular expression.
• In this syntax tree, all alphabet symbols (plus # and the empty string) of the
  augmented regular expression appear at the leaves, and all inner nodes are the
  operators of the augmented regular expression.
• Then each alphabet symbol (plus #) is numbered; these are the position numbers.

We need to calculate firstpos, followpos, lastpos, nullable

2.2.1.1 How to evaluate firstpos, lastpos, nullable

For each node n of the syntax tree:

• leaf labeled ε: nullable = true; firstpos = ∅; lastpos = ∅
• leaf labeled with position i: nullable = false; firstpos = {i}; lastpos = {i}
• n = c1 | c2: nullable = nullable(c1) or nullable(c2);
  firstpos = firstpos(c1) ∪ firstpos(c2); lastpos = lastpos(c1) ∪ lastpos(c2)
• n = c1 c2: nullable = nullable(c1) and nullable(c2);
  firstpos = firstpos(c1) ∪ firstpos(c2) if nullable(c1), else firstpos(c1);
  lastpos = lastpos(c1) ∪ lastpos(c2) if nullable(c2), else lastpos(c2)
• n = c1*: nullable = true; firstpos = firstpos(c1); lastpos = lastpos(c1)

Table 2.1: Evaluating firstpos, lastpos, nullable
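Table 2.1 translates directly into a bottom-up walk over the syntax tree. The following C
sketch computes nullable and firstpos (lastpos is symmetric); the node representation and
the bit-mask encoding of position sets are assumptions made for illustration, not part of
the manual.

#include <stdio.h>

enum kind { LEAF, EPS, ALT, CAT, STAR };

struct node {
    enum kind k;
    int pos;                  /* position number for a LEAF node          */
    struct node *c1, *c2;     /* children (c2 unused by STAR, LEAF, EPS)  */
    int nullable;
    unsigned firstpos;        /* bit i set  <=>  position i is in the set */
};

/* Post-order walk: children first, then apply the rules of Table 2.1. */
static void eval(struct node *n)
{
    if (!n) return;
    eval(n->c1);
    eval(n->c2);
    switch (n->k) {
    case EPS:  n->nullable = 1; n->firstpos = 0;            break;
    case LEAF: n->nullable = 0; n->firstpos = 1u << n->pos; break;
    case ALT:  n->nullable = n->c1->nullable || n->c2->nullable;
               n->firstpos = n->c1->firstpos | n->c2->firstpos;
               break;
    case CAT:  n->nullable = n->c1->nullable && n->c2->nullable;
               n->firstpos = n->c1->nullable
                           ? n->c1->firstpos | n->c2->firstpos
                           : n->c1->firstpos;
               break;
    case STAR: n->nullable = 1; n->firstpos = n->c1->firstpos; break;
    }
}

int main(void)
{
    /* Syntax tree of (a|b)*a#  with positions a=1, b=2, a=3, #=4. */
    struct node a1 = {LEAF, 1}, b = {LEAF, 2}, a2 = {LEAF, 3}, hash = {LEAF, 4};
    struct node alt  = {ALT,  0, &a1,   &b};
    struct node star = {STAR, 0, &alt,  0};
    struct node cat1 = {CAT,  0, &star, &a2};
    struct node root = {CAT,  0, &cat1, &hash};

    eval(&root);
    printf("nullable(root) = %d, firstpos(root) mask = %#x\n",
           root.nullable, root.firstpos);   /* expect 0xe = positions {1,2,3} */
    return 0;
}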


2.2.1.2 Algorithm (RE → DFA)

• Create the syntax tree of (r)#.
• Calculate the functions followpos, firstpos, lastpos, nullable.
• Put firstpos(root) into the states of the DFA as an unmarked state.
• while (there is an unmarked state S in the states of the DFA) do
  - mark S
  - for each input symbol a do
    • let s1,...,sn be the positions in S whose symbol is a
    • S' ← followpos(s1) ∪ ... ∪ followpos(sn)
    • move(S,a) ← S'
    • if (S' is not empty and not already in the states of the DFA)
      - put S' into the states of the DFA as an unmarked state
• The start state of the DFA is firstpos(root).
• The accepting states of the DFA are all states containing the position of #.

Example -- (a|b)*a#   (position numbers: a=1, b=2, a=3, #=4)

followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}

S1 = firstpos(root) = {1,2,3}

mark S1:
a: followpos(1) ∪ followpos(3) = {1,2,3,4} = S2    move(S1,a) = S2
b: followpos(2) = {1,2,3} = S1                     move(S1,b) = S1

mark S2:
a: followpos(1) ∪ followpos(3) = {1,2,3,4} = S2    move(S2,a) = S2
b: followpos(2) = {1,2,3} = S1                     move(S2,b) = S1

start state: S1
accepting states: {S2}
Example -- (a|ε)bc*#   (position numbers: a=1, b=2, c=3, #=4)

followpos(1) = {2}
followpos(2) = {3,4}
followpos(3) = {3,4}
followpos(4) = {}

S1 = firstpos(root) = {1,2}

mark S1:
a: followpos(1) = {2} = S2      move(S1,a) = S2
b: followpos(2) = {3,4} = S3    move(S1,b) = S3

mark S2:
b: followpos(2) = {3,4} = S3    move(S2,b) = S3

mark S3:
c: followpos(3) = {3,4} = S3    move(S3,c) = S3

start state: S1
accepting states: {S3}
2.2.2 Recognizer
Algorithm
1. Start with the initial state.
2. Repeat the following steps until the end of the input file:
   a. Check whether there is a transition for the current state and input symbol.
   b. If there is a transition, move to that state.
   c. If not, check whether the current state is a final state:
      i.  if it is a final state, return the token value;
      ii. else indicate an error.
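A minimal C sketch of this recognizer, using the DFA constructed for (a|b)*a# in the first
example above (the table encoding is an assumption; in a larger table, -1 would mark a
missing transition):

#include <stdio.h>

int main(void)
{
    /* move[state][symbol], symbols: 0 = 'a', 1 = 'b' */
    int move[2][2] = { {1, 0},     /* S1: a -> S2, b -> S1 */
                       {1, 0} };   /* S2: a -> S2, b -> S1 */
    int final[2]   = {0, 1};       /* S2 is the accepting state */
    const char *input = "abba";
    int s = 0;                     /* start in S1 */

    for (int i = 0; input[i]; i++) {
        int c = input[i] == 'a' ? 0 : input[i] == 'b' ? 1 : -1;
        if (c == -1 || move[s][c] == -1) {
            /* no transition: return the token if final, else error */
            printf(final[s] ? "token\n" : "error\n");
            return 0;
        }
        s = move[s][c];
    }
    printf(final[s] ? "token\n" : "error\n");
    return 0;
}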

2.3 Sample Input and output


(a/b)*abb
(0+1)*
2.4 Sample problems:

1. The language corresponding to the regular expression (11)* + (111)* accepts all strings
of 1's whose length is a multiple of 2 or 3. Construct a minimized DFA for this
regular expression.

2. A small application named Mod4Filter reads lines of text from the standard
input, filters out those that are not binary representations of numbers divisible
by four, and echoes the others to the standard output. Write a C program which
models the behavior of Mod4Filter using an appropriate type of finite automaton.
3. Implementing a Lexical analyzer using LEX tool

3.1 AIM

LEX is a tool widely used to specify lexical analyzers. Using the LEX tool, implement a
lexical analyzer.

3.2 Description

The LEX compiler translates a LEX source program (lex.l) into a C program (lex.yy.c).
The C compiler then compiles lex.yy.c into an executable (a.out), which reads an input
stream and produces a sequence of tokens.

3.2.1 LEX specification

A Lex program consists of three parts


Declarations
%%
translation rules
%%
auxiliary procedures

3.2.1.1 Declaration

• Variables
• Manifest constants: an identifier that is declared to represent a constant
• Regular definitions:

• Writing a regular expression for some languages can be difficult,
  because their regular expressions can be quite complex. In those cases,
  we may use regular definitions.
• We can give names to regular expressions, and we can use these
  names as symbols to define other regular expressions.

• A regular definition is a sequence of definitions of the form:

  d1 → r1
  d2 → r2
  ...
  dn → rn

  where each di is a distinct name and each ri is a regular expression over
  the symbols in Σ ∪ {d1, d2, ..., di-1}.

3.2.1.2 Translation rules

The translation rules of the LEX program are statements of the form:

p1 {action1}
p2 {action2}
...
pn {actionn}

where each pi is a regular expression and each actioni is a program fragment describing
the action the lexical analyzer should take when pattern pi matches a lexeme.

3.2.1.3 Auxiliary procedures:


 Needed by the action
 Compiled separately and loaded with the lexical analyzer.

3.2.2 Example
%{
/* Definitions of manifest constants LT, LE, EQ, NE, GT, GE, IF, THEN, ELSE, ID,
   NUMBER, RELOP */
%}
/* regular definitions */
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?
%%
{ws}     {/* no action and no return */}
if       {return (IF);}
then     {return (THEN);}
else     {return (ELSE);}
{id}     {yylval = install_id(); return (ID);}
{number} {yylval = install_num(); return (NUMBER);}
"<"      {yylval = LT; return (RELOP);}
"<="     {yylval = LE; return (RELOP);}
"="      {yylval = EQ; return (RELOP);}
"<>"     {yylval = NE; return (RELOP);}
">"      {yylval = GT; return (RELOP);}
">="     {yylval = GE; return (RELOP);}
%%
install_id()
{
/* Procedure to install the lexeme, whose first character is pointed to by yytext and
   whose length is yyleng, into the symbol table, and return a pointer thereto */
}
install_num()
{
/* Similar procedure to install a lexeme that is a number */
}

3.3 Input and output format


The LEX input file must be stored in the file input.l. LEX produces one C file; after
compiling it you get an executable, which accepts an input string and prints the token
value or an error message.
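For reference, a typical build-and-run sequence looks like this (assuming the classic lex
toolchain; with flex, substitute flex for lex and -lfl for -ll; test.txt is a placeholder
input file):

lex input.l
cc lex.yy.c -o scanner -ll
./scanner < test.txt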

3.3.1 Sample problems

1. Using LEX, find the number of occurrences of predefined words in a file.

For example:
Predefined words:
the, is, was
Sentence:
2. Using LEX, find the number of lines, words, and characters in a given text file.
3. Using LEX, find the different tags and their numbers of occurrences in a given HTML
file.
4. Using LEX, convert student marks available in text format to an HTML table.
The format of the input text file is as follows:
Name:student1
M01:20
M03:30
M04:40
M05:60
M06:70
Name:student2
*
*
*
The output will be as follows
<HTML>
<BODY>
<TABLE>
<TR>
<TH><TD>NAME</TD></TH>
<TH><TD>M01 </TD></TH>
<TH><TD>M02 </TD></TH>
<TH><TD>M03 </TD></TH>
<TH><TD>M04 </TD></TH>
<TH><TD>M05 </TD></TH>
</TR>
<TR>
<TD>STUDENT1</TD>
<TD>20</TD>
<TD>30 </TD>
<TH><TD>40 </TD></TH>
<TH><TD>60 </TD></TH>
<TH><TD>70 </TD></TH>
</TR>
</TABLE>
</BODY>
</HTML>
4. Use YACC and LEX to implement a parser for the grammar

4.1 AIM

Create YACC input file to generate parser for an expression grammar.

4.2 DESCRIPTION

4.2.1 YACC specifications


Introduction
YACC (Yet Another Compiler-Compiler) is the tool used to construct new parsers.

The yacc command takes a parser specification (spec.y) as input and produces .h and .c
files containing the actual code of the parser. The generated code contains one function,
yyparse(), which is the actual parser.

Basic Specifications

declarations
%%
rules
%%
programs

4.2.1.1 Declaration section


o The code between %{ ... %} is copied to the beginning of the resulting C file.
o %token is used to declare the tokens; we can also specify the type of the token value,
  e.g. with %token <intv> INT the token value will be available in the intv field of the union.
o %left and %right are used to declare the associativity of operators.
o %start is used to declare the start symbol.

4.2.1.2 Rules section


Here we specify the context-free grammar and the corresponding actions.
For example, for the CFG
E -> NUM + NUM | NUM - NUM
we can write the rules section as follows:
E : NUM '+' NUM { $$ = $1 + $3; }
  | NUM '-' NUM { $$ = $1 - $3; }
Here $ is a special symbol denoting the value of a node: $$ is the value of the
current node and $i is the value of the ith symbol on the right-hand side.

The general format is:

LHS of CFG : 1st RHS of CFG { actions; may be any valid C code }
           | 2nd RHS of CFG { action }
           | ...

4.2.1.3 Programs Section


It contains the main program and user defined function from the main
function we can call the function yyparse() which is the actual parser

4.2.2 Lexical analyzer part


The parser is designed such a way that it will accept the tokens by using
external program with the function yylex . So you can implement your own lexical
analyzer with yylex function or you can use LEX tool to generate lexical analyzer.

4.2.3 Sample YACC specification

Let us assume the CFG E -> ID + ID | ID - ID

%{
extern int yylval;
int t;
%}
%token ID
%%
E : ID '+' ID { t = $1; t += $3; printf("%d", t); }
  | ID '-' ID { t = $1; t -= $3; printf("%d", t); }
%%
int main()
{
    yyparse();
    return 1;
}
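The generated parser obtains its tokens from yylex. A hand-written companion lexer for the
specification above might look like the following sketch; it assumes yacc was run with the
-d option so that y.tab.h defines ID and declares yylval:

#include <ctype.h>
#include <stdio.h>
#include "y.tab.h"      /* generated by yacc -d; defines ID */

extern int yylval;

int yylex(void)
{
    int c = getchar();
    while (c == ' ' || c == '\t')       /* skip blanks */
        c = getchar();
    if (isdigit(c)) {                   /* a number becomes one ID token */
        yylval = 0;
        while (isdigit(c)) {
            yylval = yylval * 10 + (c - '0');
            c = getchar();
        }
        ungetc(c, stdin);
        return ID;
    }
    if (c == EOF || c == '\n')
        return 0;                       /* end of input for yyparse() */
    return c;                           /* '+', '-', ... pass through */
}

void yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}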
4.3 Sample problems

Implement semantic rules to evaluate an expression containing digits, + and *,
and compute its value.
5. Recursive descent parser

1. Write a C program implementing the functionality of a recursive descent parser
for validating variables declared in a program, using the productions given below.
<identifier> ::= <identifier><letter or digit> | <letter>
<letter or digit> ::= <letter> | <digit>
<letter> ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
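Because <identifier> is left-recursive, a recursive descent implementation first rewrites
it as <identifier> ::= <letter> { <letter> | <digit> }. A minimal C sketch along these
lines (the buffer size and I/O handling are arbitrary choices, not prescribed by the
exercise):

#include <ctype.h>
#include <stdio.h>

static const char *in;   /* cursor into the input string */

/* Each nonterminal becomes one function that consumes input on success. */
static int letter(void) { return islower((unsigned char)*in) ? (in++, 1) : 0; }
static int digit(void)  { return isdigit((unsigned char)*in) ? (in++, 1) : 0; }

static int identifier(void)
{
    if (!letter()) return 0;            /* must start with a letter */
    while (letter() || digit())         /* then letters or digits   */
        ;
    return *in == '\0';                 /* valid iff input consumed */
}

int main(void)
{
    char buf[128];
    if (scanf("%127s", buf) != 1) return 1;
    in = buf;
    printf("%s is %s\n", buf, identifier() ? "valid" : "invalid");
    return 0;
}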
6. Operator Precedence Parser

6.1 AIM

Write a C program to implement an operator precedence parser for an expression
grammar.

6.2 DESCRIPTION:

The operator precedence parser is a bottom-up parsing technique. It is widely used for a
small but important class of grammars: grammars with the property that no production's
right side is ε or has two adjacent nonterminals. Such a grammar is called an operator
grammar.

Eg: E -> E+E / E-E / E*E / E/E / (E) / id

In an operator precedence parser we define three disjoint precedence relations <·, = and ·>.
These precedence relations guide the selection of handles and have the following meanings:

• a <· b   a yields precedence to b
• a = b    a has the same precedence as b
• a ·> b   a takes precedence over b

The rules below select the proper handles so as to reflect a given set of associativity
and precedence rules for binary operators.

1. If operator θ1 has higher precedence than θ2, make θ1 ·> θ2 and θ2 <· θ1.

2. If θ1 and θ2 are operators of equal precedence, make θ1 ·> θ2 and θ2 ·> θ1 if the
operators are left associative, or make θ1 <· θ2 and θ2 <· θ1 if they are right associative.

3. Make θ <· id, id ·> θ, θ <· (, ( <· θ, ) ·> θ, θ ·> ), θ ·> $ and $ <· θ for all operators θ.

4. Also make ( = ), $ <· (, $ <· id, ( <· (, id ·> $, ) ·> $, ( <· id, id ·> ), ) ·> ).

These rules ensure that both id and (E) will be reduced to E. Also, $ serves as both the
left and right end marker, causing handles to be found between $'s wherever possible.
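To make the relations concrete, the following C sketch drives a parse of i+i*i (the sample
input below) from a hard-coded precedence matrix over $, +, *, (, ) and id. The matrix
entries and the simplification of keeping only terminals on the stack are assumptions for
illustration, not the manual's program:

#include <stdio.h>
#include <string.h>

static const char *terms = "$+*()i";          /* 'i' stands for id */
static int idx(char c) { return (int)(strchr(terms, c) - terms); }

/* prec[a][b] holds '<', '>', '=' or 0 (error) for terminals $ + * ( ) id. */
static const char prec[6][6] = {
    /*          $    +    *    (    )    id */
    /* $  */ { 0 , '<', '<', '<', 0 , '<' },
    /* +  */ {'>', '>', '<', '<', '>', '<' },
    /* *  */ {'>', '>', '>', '<', '>', '<' },
    /* (  */ { 0 , '<', '<', '<', '=', '<' },
    /* )  */ {'>', '>', '>', 0 , '>', 0  },
    /* id */ {'>', '>', '>', 0 , '>', 0  },
};

int main(void)
{
    const char *input = "i+i*i$";             /* sample input, $-terminated */
    char stack[64] = "$";                     /* stack of terminals          */
    int top = 0, ip = 0;

    while (!(stack[top] == '$' && input[ip] == '$')) {
        char rel = prec[idx(stack[top])][idx(input[ip])];
        if (rel == '<' || rel == '=') {
            stack[++top] = input[ip++];       /* shift */
        } else if (rel == '>') {
            char popped;
            do {                              /* pop the handle */
                popped = stack[top--];
            } while (prec[idx(stack[top])][idx(popped)] != '<');
            printf("reduced a handle\n");
        } else {
            printf("error\n");
            return 1;
        }
    }
    printf("accepted\n");
    return 0;
}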

6.3 Sample Input:

i + i*i
6.4 Sample problem:
1. Construct the operator precedence parser for the following grammar:
E -> TE'
E' -> +TE'/@   ("@" represents the null string)
T -> FT'
T' -> *FT'/@
F -> (E)/ID
7. Symbol Table

7.1 AIM

Write a C program to implement a symbol table.

7.2 DESCRIPTION

After the syntax tree has been constructed, the compiler must check whether the input
program is type-correct (this is called type checking and is part of semantic analysis).
During type checking, a compiler checks whether the use of names (such as variables,
functions, type names) is consistent with their definitions in the program. Consequently,
it is necessary to remember declarations so that we can detect inconsistencies and misuses
during type checking. This is the task of a symbol table. Note that a symbol table is a
compile-time data structure. It is not used during run time by statically typed languages.
Formally, a symbol table maps names to declarations (called attributes), such as mapping
the variable name x to its type int. More specifically, a symbol table stores:

• For each type name, its type definition.


• For each variable name, its type. If the variable is an array, it also stores
dimension information. It may also store storage class, offset in activation record
etc.
• For each constant name, its type and value.
• For each function and procedure, its formal parameter list and its output type.
Each formal parameter must have name, type, type of passing (by-reference or
by-value), etc.

7.3 OPERATION ON SYMBOL TABLE

We need to implement the following operations for a symbol table:


1. insert ( String key, Object binding )
2. object_lookup ( String key )
3. begin_scope () and end_scope ()

(1) insert(s, t) - return the index of a new entry for string 's'
and token 't'.
(2) lookup(s) - return the index of the entry for string 's', or 0 if 's'
is not found.
is not found.
(3) begin_scope() and end_scope(): When we have a new block (i.e., when we encounter
the token {), we begin a new scope. When we exit a block (i.e., when we encounter the
token }) we remove the scope (this is the end_scope). When we remove a scope, we
remove all declarations inside this scope. So basically, scopes behave like stacks. One
way to implement these functions is to use a stack. When we begin a new scope we push
a special marker onto the stack (e.g., -1). When we insert a new declaration in the hash table
using insert, we also push the bucket number onto the stack. When we end a scope, we pop
the stack up to and including the first -1 marker.

(4) Handling reserved keywords: The symbol table also handles reserved keywords like
'PLUS', 'MINUS', 'MUL' etc. This can be done in the following manner:

insert("PLUS", PLUS);
insert("MINUS", MINUS);

In this case the first "PLUS" and "MINUS" indicate the lexemes and the second ones
indicate the tokens.

7.4 SYMBOL TABLE IMPLEMENTATION

The data structure for a particular implementation of a symbol table is sketched
in figure 7.1. A separate array 'arr_lexemes' holds the character strings forming the
identifiers. Each string is terminated by an end-of-string character, denoted by
EOS, that may not appear in identifiers. Each entry in the symbol-table array
'arr_symbol_table' is a record consisting of two fields, 'lexeme_pointer', pointing to
the beginning of a lexeme, and 'token'. Additional fields can hold attribute values. The
0th entry is left empty, because lookup returns 0 to indicate that there is no entry for
a string. The 1st, 2nd, 3rd, 4th, 5th, 6th, and 7th entries are for 'a', 'plus', 'b',
'AND', 'c', 'minus', and 'd', where the 2nd, 4th and 6th entries are for reserved keywords.

Input: a + b AND c - d

Position   Lexeme_pointer   Token
0          (empty)
1          a                id
2          plus             plus
3          b                id
4          AND              AND
5          c                id
6          minus            minus
7          d                id

Figure 7.1: Implemented symbol table

7.5 DESIGN OF SYMBOL TABLE

When the lexical analyzer reads a letter, it starts saving letters and digits in a buffer
'lex_buffer'. The string collected in lex_buffer is then looked up in the symbol table,
using the lookup operation. Since the symbol table is initialized with entries for the
keywords plus, minus, the AND operator, and some identifiers, as shown in figure 7.1, the
lookup operation will find these entries if lex_buffer contains one of them. If there is no
entry for the string in lex_buffer, i.e., lookup returns 0, then lex_buffer contains a
lexeme for a new identifier. An entry for the identifier is created using insert(). After
the insertion is made, 'n' is the index of the symbol-table entry for the string in
lex_buffer. This index is communicated to the parser by setting tokenval to n, and the
token in the token field of the entry is returned.
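A C sketch of this design, mirroring figure 7.1 (array sizes and the token codes in main
are arbitrary assumptions):

#include <stdio.h>
#include <string.h>

#define STRMAX 999   /* size of the lexemes array */
#define SYMMAX 100   /* size of the symbol table  */

struct entry {
    char *lexeme_pointer;   /* points into arr_lexemes */
    int token;
};

static char arr_lexemes[STRMAX];
static struct entry arr_symbol_table[SYMMAX];
static int lastchar = -1;    /* last used position in arr_lexemes     */
static int lastentry = 0;    /* entry 0 stays empty: lookup returns 0 */

/* lookup: return the index of the entry for s, or 0 if s is not found */
int lookup(const char *s)
{
    for (int p = lastentry; p > 0; p--)
        if (strcmp(arr_symbol_table[p].lexeme_pointer, s) == 0)
            return p;
    return 0;
}

/* insert: store lexeme s with token t and return the new entry's index */
int insert(const char *s, int t)
{
    lastentry++;
    arr_symbol_table[lastentry].token = t;
    arr_symbol_table[lastentry].lexeme_pointer = &arr_lexemes[lastchar + 1];
    strcpy(arr_symbol_table[lastentry].lexeme_pointer, s);
    lastchar += (int)strlen(s) + 1;     /* +1 keeps the EOS terminator */
    return lastentry;
}

int main(void)
{
    enum { ID = 256, PLUS, MINUS };     /* assumed token codes */
    insert("plus", PLUS);               /* reserved words first */
    insert("minus", MINUS);
    if (lookup("a") == 0)
        insert("a", ID);
    printf("'a' is entry %d, 'plus' is entry %d\n", lookup("a"), lookup("plus"));
    return 0;
}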

7.6 Sample Input

a + b AND c-d

7.7 Sample problem:


1. Consider the following sample program written in a hypothetical language.

BEGIN
PRINT “HELLO”
INTEGER A, B, C
REAL D, E
STRING X, Y
A := 2
B := 4
C := 6
D := -3.56E-8
E := 4.567
X := “text1”
Y := “hello there”
PRINT “Values of integers are [A], [B], [C]”
FOR I:= 1 TO 5 STEP +2
PRINT “[I]”
PRINT “Strings are [X] and [Y]”
END

Output of the above hypothetical program is

HELLO
Values of integers are 2, 4, 6
1
3
5
Strings are text1 and hello there.
Generate a symbol table to handle the variables and their types etc. An output file called symtab.sym will be created
which will contain the relevant data.

8. Shift Reduce Parsing

8.1 AIM

Write a C program to implement shift reduce parsing for an expression grammar.

8.2 Description

A shift-reduce parser uses a parse stack which (conceptually) contains grammar symbols.
During the operation of the parser, symbols from the input are shifted onto the stack. If a prefix of
the symbols on top of the stack matches the RHS of a grammar rule which is the correct rule to use
within the current context, then the parser reduces the RHS of the rule to its LHS, replacing the
RHS symbols on top of the stack with the nonterminal occurring on the LHS of the rule. This shift-
reduce process continues until the parser terminates, reporting either success or failure. It terminates
with success when the input is legal and is accepted by the parser. It terminates with failure if an
error is detected in the input.

The parser is nothing but a stack automaton which may be in one of several discrete states. A
state is usually represented simply as an integer. In reality, the parse stack contains states, rather than
grammar symbols. However, since each state corresponds to a unique grammar symbol, the state
stack can be mapped onto the grammar symbol stack mentioned earlier.

The operation of the parser is controlled by a couple of tables:

8.2.1 Action Table

The action table is a table with rows indexed by states and columns indexed by terminal
symbols. When the parser is in some state s and the current lookahead terminal is t, the action
taken by the parser depends on the contents of action[s][t], which can contain four different
kinds of entries:
Shift s'
Shift state s' onto the parse stack.
Reduce r
Reduce by rule r. This is explained in more detail below.
Accept
Terminate the parse with success, accepting the input.
Error
Signal a parse error.

8.2.2 Goto Table

The goto table is a table with rows indexed by states and columns indexed by nonterminal
symbols. When the parser is in state s immediately after reducing by rule N, then the next
state to enter is given by goto[s][N].

The current state of a shift-reduce parser is the state on top of the state stack. The detailed
operation of such a parser is as follows:

1. Initialize the parse stack to contain a single state s0, where s0 is the distinguished initial state
of the parser.
2. Use the state s on top of the parse stack and the current lookahead t to consult the action table
entry action[s][t]:
• If the action table entry is shift s', then push state s' onto the stack and advance the
input so that the lookahead is set to the next token.
• If the action table entry is reduce r and rule r has m symbols in its RHS, then pop m
symbols off the parse stack. Let s' be the state now revealed on top of the parse stack
and N be the LHS nonterminal for rule r. Then consult the goto table and push the
state given by goto[s'][N] onto the stack. The lookahead token is not changed by this
step.
• If the action table entry is accept, then terminate the parse with success.
• If the action table entry is error, then signal an error.
3. Repeat step (2) until the parser terminates.
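The steps above can be captured in a small table-driven loop. The following C skeleton
leaves the tables empty (zero means error); filling action[][], go[][], rhs_len[] and
lhs_nt[] for a concrete grammar is the exercise itself:

#include <stdio.h>

enum { ERROR, SHIFT, REDUCE, ACCEPT };
struct act { int kind; int arg; };     /* arg: target state or rule number */

#define NSTATES 32
#define NTERMS  16
#define NNONTS   8
#define NRULES  16

static struct act action[NSTATES][NTERMS];  /* action table            */
static int go[NSTATES][NNONTS];             /* goto table              */
static int rhs_len[NRULES];                 /* |RHS| of each rule      */
static int lhs_nt[NRULES];                  /* LHS nonterminal of rule */

/* input: terminal codes, terminated by code 0 (the end marker) */
int parse(const int *input)
{
    int stack[128], top = 0, ip = 0;
    stack[0] = 0;                            /* step 1: push initial state s0 */
    for (;;) {                               /* step 3: repeat step 2         */
        struct act a = action[stack[top]][input[ip]];
        switch (a.kind) {
        case SHIFT:                          /* push s', advance lookahead    */
            stack[++top] = a.arg;
            ip++;
            break;
        case REDUCE:                         /* pop m states, consult goto    */
            top -= rhs_len[a.arg];
            stack[top + 1] = go[stack[top]][lhs_nt[a.arg]];
            top++;
            break;
        case ACCEPT:
            return 1;
        default:                             /* ERROR */
            return 0;
        }
    }
}

int main(void)
{
    int input[] = { 1, 0 };                  /* placeholder token stream */
    printf("%s\n", parse(input) ? "accepted" : "rejected");
    return 0;
}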

One possible set of shift-reduce parsing tables is shown below (sn denotes shift n, rn denotes
reduce n, acc denotes accept and blank entries denote error entries):

Parser Tables

                    Action Table                      Goto Table
State    ID     ':='    '+'    '-'    <EOF>       stmt    expr
0        s1                                       g2
1               s3
2                                     s4
3        s5                                               g6
4        acc    acc     acc    acc    acc
5        r4     r4      r4     r4     r4
6        r1     r1      s7     s8     r1
7        s9
8        s10
9        r2     r2      r2     r2     r2
10       r3     r3      r3     r3     r3

Handles during parsing

Definition of "handle": the leftmost simple phrase of a right-sentential form is the handle.

8.3 Shift-Reduce Parsing

Parser performs actions based on parse table information:

1. Shift. Shift the next input symbol onto the top of the stack.
2. Reduce. The right end of the string to be reduced must be at the top of the
stack. Locate the left end of the string within the stack and decide with what
nonterminal to replace the string.
3. Accept. Announce successful completion of parsing.
4. Error. Discover a syntax error and call an error recovery routine.

8.4 Sample Input:


id *id

8.5 Sample Output:

Shift-reduce parsing on input id*id.

8.6 Sample Problem:

1. Design a parser which accepts a mathematical expression (containing integers only). If the expression is valid, then
evaluate the expression else report that the expression is invalid.
[Note: Design first the Grammar and then implement using Shift-Reduce parsing technique. Your program should
generate an output file clearly showing each step of parsing/evaluation of the intermediate sub-expressions. ]

9. LR PARSER

9.1 AIM

Write a C program to construct an LR parsing table.

9.2 Description

• Recognizes CFGs for most programming languages; there is no need to rewrite the grammar.
• The most general O(n) shift-reduce method.
• Optimal syntax error detection.
• LR grammars are a superset of LL grammars.

9.2.1 LR(0) Automaton

• The LR(0) automaton is a DFA which accepts viable prefixes of right-sentential
forms, ending in a handle.
• States of the NFA correspond to LR(0) items.
• Use the subset construction algorithm to convert the NFA to a DFA.

LR(0) Item: a grammar rule with a dot (·) added between symbols on the RHS.
Example: the production rule A → XYZ yields four items:
A → ·XYZ, A → X·YZ, A → XY·Z, A → XYZ·

Items indicate how much of a production has been seen at a given point in the parsing
process. A → α·β indicates that a string derivable from α has already appeared on the
input and a string derivable from β is expected on the input.

A → αβ· indicates that input derivable from the whole RHS has been seen and a reduction
may occur.

Kernel Items: all items whose dots are not at the beginning of the RHS, plus the augmented
initial item S' → ·S.

Nonkernel Items: items with dots at the beginning of the RHS, except S' → ·S.

NFA of LR(0) Items: formed by using each item as a state of an NFA, with transitions
corresponding to moving the dot over one symbol.

Items with a dot preceding a nonterminal have epsilon transitions to all items beginning
with that nonterminal.

The LR(0) automaton is the DFA formed by subset construction from the LR(0) NFA.

9.3 LR parsing algorithm

Input: Input string w and an LR parsing table with functions action and goto for a grammar G.

Output: If w is in L(G), a bottom-up parse for w. Otherwise, an error indication.

Method: Initially the parser has s0, the initial state, on its stack, and w$ in the input buffer.
repeat forever begin
    let s be the state on top of the stack
        and a the symbol pointed to by ip;
    if action[s, a] = shift s' then begin
        push a, then push s' on top of the stack;   /* <symbol, state> pair */
        advance ip to the next input symbol;
    end
    else if action[s, a] = reduce A -> b then begin
        pop 2 * |b| symbols off the stack;
        let s' be the state now on top of the stack;
        push A, then push goto[s', A] on top of the stack;
        output the production A -> b;
    end
    else if action[s, a] = accept then
        return
    else error();
end

9.4 Sample Input:

(1) E -> E * B
(2) E -> E + B
(3) E -> B
(4) B -> 0
(5) B -> 1

9.5 Sample Output:

                action                      goto
state     *     +     0     1     $       E     B
0                     s1    s2            3     4
1         r4    r4    r4    r4    r4
2         r5    r5    r5    r5    r5
3         s5    s6                acc
4         r3    r3    r3    r3    r3
5                     s1    s2                  7
6                     s1    s2                  8
7         r1    r1    r1    r1    r1
8         r2    r2    r2    r2    r2
9.6 Sample problem:

1. Consider the following grammar:


S --> ABC
A--> abA | ab
B--> b | BC
C--> c | cC
Construct an LR parser which accepts a string and tells whether the string is accepted by
the above grammar or not.

10. Intermediate code Generator
10.1 AIM

Write a C program to generate intermediate code for a given source expression.

10.2 Description

• Intermediate codes are machine-independent codes, but they are close to machine
  instructions.
• The given program in a source language is converted to an equivalent program in an
  intermediate language by the intermediate code generator.
• The intermediate language can be one of many different languages; the designer of the
  compiler decides this intermediate language.
  - Syntax trees can be used as an intermediate language.
  - Postfix notation can be used as an intermediate language.
  - Three-address code (quadruples) can be used as an intermediate language; we will
    use quadruples to discuss intermediate code generation.
  - Quadruples are close to machine instructions, but they are not actual machine
    instructions.
• Some programming languages have well-defined intermediate languages:
  - Java: the Java virtual machine
  - Prolog: the Warren abstract machine
  - In fact, there are byte-code emulators to execute instructions in these
    intermediate languages.

10.2.1 Three Address Code

Statements of general form x:=y op z

No built-up arithmetic expressions are allowed.

As a result, x := y + z * w should be represented as:

t1 := z * w
t2 := y + t1
x := t2

Observe that, given the syntax tree or the DAG of the graphical representation, we can
easily derive three-address code for assignments as above.

In fact three-address code is a linearization of the tree.

Three-address code is useful: related to machine-language/ simple/ optimizable.

10.2.2 Types of Three-Address Statements.

Assignment Statement: x := y op z (binary operator)
Assignment Statement: x := op z (unary operator)
Copy Statement: x := z
Unconditional Jump: goto L
Conditional Jump: if x relop y goto L
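As a sketch of how such statements are produced, the following C program emits quadruples
by walking an expression tree bottom-up. The tree for the sample input a = (k+8)*(c-s) is
hard-coded here; a real generator would build it during parsing.

#include <stdio.h>

struct node {
    char op;                 /* operator, or 0 for a leaf     */
    const char *leaf;        /* identifier/constant text      */
    struct node *l, *r;      /* children for an operator node */
};

static int tno;              /* counter for temporaries t0, t1, ... */

/* Emit code for n's subtree; return the name that holds its value. */
static const char *gen(struct node *n, char *buf)
{
    char lb[32], rb[32];
    const char *lhs, *rhs;
    if (!n->op)
        return n->leaf;                      /* leaf: no code needed */
    lhs = gen(n->l, lb);
    rhs = gen(n->r, rb);
    sprintf(buf, "t%d", tno++);              /* fresh temporary      */
    printf("%s = %s %c %s\n", buf, lhs, n->op, rhs);
    return buf;
}

int main(void)
{
    struct node k = {0, "k"}, eight = {0, "8"}, c = {0, "c"}, s = {0, "s"};
    struct node add = {'+', 0, &k, &eight};
    struct node sub = {'-', 0, &c, &s};
    struct node mul = {'*', 0, &add, &sub};
    char buf[32];
    printf("a = %s\n", gen(&mul, buf));      /* prints t0, t1, t2, then a */
    return 0;
}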

10.3 Sample Input:

Enter the expression : a=(k+8)*(c-s)

10.4 Sample Output:

t0 = k + 8

t1 = c - s

t2 = t0 * t1

a = t2

11. Code Optimization Technique

11.1 AIM

Write a C program to optimize the code using optimizing techniques.

11.2 DESCRIPTION

Compiler optimization is the process of tuning the output of a compiler to minimize or
maximize some attributes of an executable computer program. The most common requirement
is to minimize the time taken to execute a program; a less common one is to minimize the
amount of memory occupied. The growth of portable computers has created a market for
minimizing the power consumed by a program. Compiler optimization is generally implemented
using a sequence of optimizing transformations: algorithms which take a program and
transform it to produce a semantically equivalent output program that uses fewer resources.

11.2.1 Types of optimizations

Techniques used in optimization can be broken up among various scopes which can affect anything
from a single statement to the entire program. Generally speaking, locally scoped techniques are
easier to implement than global ones but result in smaller gains. Some examples of scopes include:

• Peephole optimizations
• Local optimizations
• Global optimizations
• Loop optimizations
• Machine code optimization
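As a concrete taste of the peephole/local scope, the following C sketch performs one such
transformation, constant folding over quadruples. The quadruple format and field sizes are
assumptions made for illustration:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

struct quad { char op; char arg1[8], arg2[8], result[8]; };

static int is_const(const char *s) { return isdigit((unsigned char)s[0]); }

/* Fold t := c1 op c2 into the copy statement t := c when both operands
   are integer constants. */
static void fold(struct quad *q, int n)
{
    for (int i = 0; i < n; i++) {
        if (q[i].op && is_const(q[i].arg1) && is_const(q[i].arg2)) {
            int a = atoi(q[i].arg1), b = atoi(q[i].arg2);
            int v = q[i].op == '+' ? a + b
                  : q[i].op == '-' ? a - b
                  : q[i].op == '*' ? a * b
                  : b != 0         ? a / b : 0;
            sprintf(q[i].arg1, "%d", v);   /* quad becomes a copy statement */
            q[i].arg2[0] = '\0';
            q[i].op = 0;
        }
    }
}

int main(void)
{
    struct quad code[] = { {'+', "2", "3", "t0"}, {'*', "t0", "4", "t1"} };
    fold(code, 2);                         /* folds the first quad only */
    for (int i = 0; i < 2; i++) {
        if (code[i].op)
            printf("%s = %s %c %s\n", code[i].result,
                   code[i].arg1, code[i].op, code[i].arg2);
        else
            printf("%s = %s\n", code[i].result, code[i].arg1);
    }
    return 0;
}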

The code optimizer sits between the front end and the code generator. Internally, it
performs flow analysis (control-flow and data-flow analysis) followed by the
transformations themselves.
