Compiler-Group Assignment

The document is a group assignment for a principles of compiler design course at Arba Minch University. It lists 10 students, their names and student IDs, who are working on the group assignment. The assignment is being submitted to the instructor, Mr. Alemayehu M.


ARBA MINCH UNIVERSITY

INSTITUTE OF TECHNOLOGY
FACULTY OF COMPUTING AND SOFTWARE
ENGINEERING (G4-SWE)
Principles of Compiler Design Group Assignment
Group members ID

1. Edilayehu Tadesse……………………………...NSR/835/12
2. Lencho Nugus…………………………………. NSR/1477/12
3. Elias Chane……………………………………. NSR/860/12
4. Hawa Hinnaw…………………………………. NSR/1263/12
5. Rhobot Haile……………………………………NSR/1999/12
6. Tekalign Getachew……………………………. NSR/2267/12
7. Omar Kamil…………………………………… NSR/1947/12
8. Sisay Fikadu…………………………………… NSR/2171/12
9. Rahel Getachew…………………………………NSR/1971/12
10. Tewodros Mekuria……………………………...NSR/2313/12
To: Mr. Alemayehu M.
1. Discuss in detail the three ways of Intermediate representation.

Intermediate representation (IR) is a crucial component of the compilation process, serving as a bridge between the high-level source code and the low-level machine code. It provides a structured and machine-independent representation of the program, enabling various optimization and analysis techniques.
On the way to translating the source code into object code for the target machine, a compiler can produce a middle-level language code, which is referred to as intermediate code or intermediate text. The three ways of representing intermediate code are as follows:

Postfix Notation
In postfix notation, the operator comes after its operands, i.e., the operator follows the operands it applies to.
Example
 Postfix notation for the expression (a + b) * (c + d) is ab+cd+*
 Postfix notation for the expression (a * b) - (c + d) is ab*cd+-
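As a small illustration (a hypothetical sketch, not part of the assignment), the conversion from infix to postfix can be done with the classic shunting-yard approach, using a stack to order operators by precedence:

```python
# Hypothetical sketch: converting infix to postfix with the
# shunting-yard algorithm (operator precedence handled via a stack).
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_postfix(tokens):
    out, ops = [], []
    for tok in tokens:
        if tok == '(':
            ops.append(tok)
        elif tok == ')':
            while ops[-1] != '(':        # pop until the matching '('
                out.append(ops.pop())
            ops.pop()                    # discard the '('
        elif tok in PREC:
            # pop operators of equal or higher precedence first
            while ops and ops[-1] != '(' and PREC[ops[-1]] >= PREC[tok]:
                out.append(ops.pop())
            ops.append(tok)
        else:                            # operand: emit directly
            out.append(tok)
    while ops:
        out.append(ops.pop())
    return ''.join(out)

print(to_postfix(list('(a+b)*(c+d)')))   # ab+cd+*
print(to_postfix(list('(a*b)-(c+d)')))   # ab*cd+-
```

Both examples above reproduce the postfix forms given in the text.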
Syntax Tree
A tree in which each leaf node describes an operand and each interior node describes an operator. The syntax tree is a shortened form of the parse tree.
Example − Syntax tree for the string a + b ∗ c − d.
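As a small illustration (a hypothetical sketch, not from the assignment), the syntax tree for this string can be built by hand with a Node class whose leaves are operands and whose interior nodes are operators, following the usual precedence of * over + and -, i.e. (a + (b * c)) - d:

```python
# Hypothetical sketch: the syntax tree for a + b * c - d built by hand.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

tree = Node('-',
            Node('+',
                 Node('a'),
                 Node('*', Node('b'), Node('c'))),
            Node('d'))

def postorder(n):
    # a post-order walk of the syntax tree yields the postfix form
    if n is None:
        return ''
    return postorder(n.left) + postorder(n.right) + n.value

print(postorder(tree))  # abc*+d-
```

The post-order walk also shows the close relationship between the syntax tree and postfix notation.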

Three-Address Code
The three-address code is a sequence of statements of the form A = B op C, where A, B, and C are either programmer-defined names, constants, or compiler-generated temporary names; op stands for an operator, which can be a fixed- or floating-point arithmetic operator or a Boolean-valued/logical operator. The reason for the name "three-address code" is that each statement generally contains three addresses: two for the operands and one for the result.
There are three types of three address code statements which are as follows −
Quadruples representation:
Three-address statements can be defined by records with fields for the operator and the operands. A record structure with four fields is used: the first holds the operator 'op', the next two hold operands 1 and 2 respectively, and the last one holds the result. This representation of three-address code is called quadruple representation.
Triples representation:
The contents of the operand 1, operand 2, and result fields are generally pointers to the symbol-table records for the names represented by these fields. Therefore, temporary names must be entered into the symbol table as they are generated.
This can be avoided by letting the position of a statement denote its temporary value. If this is done, then a record structure with three fields is enough to represent a three-address statement: the first holds the operator and the next two hold the values of operand 1 and operand 2 respectively. Such a representation is known as triple representation.
Indirect Triples Representation:
The indirect triple representation uses an extra array that lists pointers to the triples in the desired execution sequence. This is known as indirect triple representation.
The triple representation for the statement x := (a + b) * -c / d is as follows −

Statement   Operator   Operand 1   Operand 2
(0)         +          a           b
(1)         uminus     c
(2)         *          (0)         (1)
(3)         /          (2)         d
(4)         :=         x           (3)
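The three forms can be sketched as plain Python data structures (a hypothetical illustration, assuming the statement x := (a + b) * -c / d):

```python
# Hypothetical sketch: the statement x := (a + b) * -c / d held as
# quadruples, triples, and indirect triples using plain tuples/lists.

# Quadruples: (op, arg1, arg2, result) -- results go to named temporaries.
quads = [
    ('+',      'a',  'b',  't1'),
    ('uminus', 'c',  None, 't2'),
    ('*',      't1', 't2', 't3'),
    ('/',      't3', 'd',  't4'),
    (':=',     't4', None, 'x'),
]

# Triples: (op, arg1, arg2) -- a result is referenced by its position,
# so no temporary names need to enter the symbol table.
triples = [
    ('+',      'a', 'b'),    # (0)
    ('uminus', 'c', None),   # (1)
    ('*',      0,   1),      # (2)  operands refer to triples (0) and (1)
    ('/',      2,   'd'),    # (3)
    (':=',     'x', 3),      # (4)
]

# Indirect triples: an extra list of pointers fixes the execution order,
# so statements can be reordered without rewriting position references.
statements = [0, 1, 2, 3, 4]   # indices into the triples list
```

Reordering only the statements list (for example, during optimization) leaves the triples themselves untouched, which is the main advantage of the indirect form.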

2. Discuss and give examples of Bottom-up parsing and Top-down parsing.

A parser is the part of a compiler that breaks the data coming from the lexical analysis phase into smaller elements. A parser takes input in the form of a sequence of tokens and produces output in the form of a parse tree. Parsing is of two types: top-down parsing and bottom-up parsing.

"Top-down parsing" and "bottom-up parsing" refer to two different approaches for constructing parse trees or syntax trees during the parsing process. The two techniques differ in direction: top-down parsing starts from the top (root) of the parse tree, while bottom-up parsing starts from the lowest level of the parse tree and works toward the root. Both approaches are commonly used in compiler design and syntax analysis.

1. Top-Down Parsing:
Top-down parsing is a technique that starts from the top level of the parse tree, moves downwards, and evaluates the rules of the grammar. In other words, it looks at the highest level of the tree at the start and then works down the parse tree.
The top-down parsing technique tries to find the leftmost derivation for an input string. It evaluates the rules of the grammar while parsing; because it uses the leftmost derivation, at each step the leftmost non-terminal determines which production rule is used to construct the string.
Top-down parsing starts from the start symbol of the grammar and tries to derive the input string by repeatedly applying the grammar rules, in a process known as "expansion" or "prediction." It begins with the start symbol and recursively expands it to match the input tokens until the entire input is parsed.
Example:
Consider the grammar
S -> E
E -> T + E | T
T -> num
and the input "num + num". Top-down parsing steps (matched input shown in brackets):
Expand S -> E []
Expand E -> T + E []
Expand T -> num, Match num [num]
Match + [num, +]
Expand E -> T [num, +]
Expand T -> num, Match num [num, +, num]
In this example, the parser starts with the start symbol and repeatedly expands it to match the input tokens until the input is fully parsed.
Types of Top-Down Parsing
There are two types of top-down parsing which are as follows –
 Top-Down Parsing with Backtracking
In backtracking, the parser can make repeated scans of the input. If the required input string is not obtained by applying one production rule, another production rule can be applied at each step to get the required string.
In top-down parsing with backtracking, the parser attempts multiple rules or productions to find a match for the input string, backtracking at every step of the derivation. So, if the applied production does not yield the needed input string, or does not match it, the parser can undo that step and try another alternative.
Example1 − Consider the Grammar
S→aAd
A→bc|b
Make a parse tree for the string abd. Also, show the parse tree when backtracking is required because the wrong alternative is chosen.
Solution
The derivation for the string abd will be −
S ⇒ a A d ⇒ abd (Required String)
If bc is substituted in place of non-terminal A then the string obtained will be abcd.
S ⇒ a A d ⇒ abcd (Wrong Alternative)
Figure (a) represents S → aAd

Figure (b) represents an expansion of tree with production A → bc which gives string
abcd which does not match with string abd. So, it backtracks & chooses another
alternative, i.e., A → b in figure (c) which matches with abd.
 Top-Down Parsing without Backtracking
Once a production rule is applied, it cannot be undone. There are two types of top-down parsing without backtracking, which are:
Predictive Parser − A predictive parser is also known as a non-recursive predictive parser. It is an efficient way of implementing recursive-descent parsing by handling the stack of activation records explicitly. The predictive parser has an input, a stack, a parsing table, and an output. The input contains the string to be parsed, followed by $, the right-end marker.
Top-down parsers often use predictive parsing techniques, in which the parser predicts the next symbol in the input based on the current state of the parse stack and the production rules of the grammar. This permits the parser to quickly determine whether a particular input string is valid under the grammar.
Recursive Descent Parser − A top-down parser that implements a set of recursive procedures to process the input without backtracking is known as a recursive-descent parser, and the parsing is known as recursive-descent parsing.
It uses a set of recursive procedures to match the non-terminals of the grammar. Each non-terminal has a corresponding procedure that is responsible for parsing that non-terminal.
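The one-procedure-per-non-terminal idea can be sketched as follows (a hypothetical illustration for the small grammar E -> T + E | T, T -> num, not a production parser):

```python
# Hypothetical sketch: a recursive-descent parser with one procedure
# per non-terminal, for the grammar  E -> T + E | T ,  T -> num .
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        # consume one token or report a syntax error
        if self.peek() != expected:
            raise SyntaxError(f'expected {expected!r}, got {self.peek()!r}')
        self.pos += 1

    def parse_E(self):          # E -> T + E | T
        self.parse_T()
        if self.peek() == '+':  # one token of lookahead picks the alternative
            self.match('+')
            self.parse_E()

    def parse_T(self):          # T -> num
        self.match('num')

    def parse(self):
        self.parse_E()
        if self.peek() is not None:
            raise SyntaxError('trailing input')
        return True

print(Parser(['num', '+', 'num']).parse())  # True
```

Note the single token of lookahead in parse_E: that is what lets this parser avoid backtracking, which is why predictive parsers require grammars without left recursion or common prefixes.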
2. Bottom-Up Parsing:
Bottom-up parsing starts from the input tokens and constructs the parse tree in a bottom-
up manner by repeatedly applying grammar rules in a process known as "reduction." It
begins with individual tokens and combines them to form higher-level non-terminal
symbols until the entire input is reduced to the start symbol of the grammar.
Example:
Consider the following grammar:
S -> E
E -> E + T
E -> T
T -> T * num
T -> num
And the input expression: "3 + 4 * 5"

Bottom-up parsing steps:


Shift 3 [3]
Reduce T -> num [T]
Reduce E -> T [E]
Shift + [E, +]
Shift 4 [E, +, 4]
Reduce T -> num [E, +, T]
Shift * [E, +, T, *]
Shift 5 [E, +, T, *, 5]
Reduce T -> T * num [E, +, T]
Reduce E -> E + T [E]
In this example, the parser starts by shifting the tokens onto the parsing stack and
eventually reduces them using the grammar rules until the parse tree is constructed.
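The shift/reduce mechanics can be sketched in a few lines for the grammar E -> E + T | T, T -> T * num | num (a hypothetical toy with hand-coded handle patterns and one token of lookahead; real parsers derive this logic from an LR table):

```python
# Hypothetical sketch: a toy shift-reduce parser for
#   E -> E + T | T ,  T -> T * num | num
# Handles are matched against the top of the stack; lookahead keeps
# E -> E + T from firing before a pending * has been shifted.
def parse(tokens):
    stack, i, trace = [], 0, []
    while True:
        la = tokens[i] if i < len(tokens) else None
        if stack[-3:] == ['T', '*', 'num']:                    # T -> T * num
            stack[-3:] = ['T']
            trace.append(('reduce T -> T * num', list(stack)))
        elif stack[-3:] == ['E', '+', 'T'] and la != '*':      # E -> E + T
            stack[-3:] = ['E']
            trace.append(('reduce E -> E + T', list(stack)))
        elif stack[-1:] == ['num'] and stack[-2:-1] != ['*']:  # T -> num
            stack[-1:] = ['T']
            trace.append(('reduce T -> num', list(stack)))
        elif stack == ['T'] and la != '*':                     # E -> T
            stack[:] = ['E']
            trace.append(('reduce E -> T', list(stack)))
        elif la is not None:                                   # no handle: shift
            stack.append(la)
            i += 1
            trace.append((f'shift {la}', list(stack)))
        else:
            break                       # input consumed and no handle remains
    return stack, trace

final, steps = parse(['num', '+', 'num', '*', 'num'])
print(final)   # ['E'] -- the whole input reduced to the start symbol
```

The lookahead checks against '*' are what give * higher precedence than + in this sketch; an LR parser encodes the same decisions in its action table.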
Types of Bottom-Up Parser
There are three types of the bottom-up parser which are as follows −
Shift Reduce Parser − A shift-reduce parser is a type of bottom-up parser. It requires a stack to hold grammar symbols. The parser keeps shifting input symbols onto the stack until a handle appears on top of the stack; when a handle occurs on the top of the stack, it performs a reduction.
Operator Precedence Parser − Shift-reduce parsers can be generated by hand for a small class of grammars. These grammars have the property that no production's right side is ϵ or has two adjacent non-terminals. A grammar with the latter property is known as an operator grammar.
LR Parsers − The LR parser is a shift-reduce parser that makes use of deterministic finite automata, identifying the set of all viable prefixes by reading the stack from bottom to top. It decides what handle, if any, is available.
A viable prefix of a right-sentential form is a prefix that contains a handle but no symbol to the right of the handle. Thus, if a finite state machine that identifies viable prefixes of right-sentential forms is constructed, it can guide handle selection in the shift-reduce parser.
There are three types of LR Parsers which are as follows −
 Simple LR Parser (SLR): It is very easy to implement but it fails to
produce a table for some classes of grammars.
 Canonical LR Parser (CLR): It is the most powerful and works on large
classes of grammars.
 Look Ahead LR Parser (LALR): It is intermediate in power between SLR
and CLR.
Both bottom-up and top-down parsing have their advantages and disadvantages. Bottom-up parsing is generally more powerful and can handle a wider range of grammars, including left-recursive and ambiguous grammars. Top-down parsing is simpler to implement, more intuitive, and typically used with LL(k) grammars.
Overall, bottom-up and top-down parsing are fundamental techniques in syntax analysis
that enable the construction of parse trees from input strings based on a given grammar.

3. Discuss the Symbol table and how to represent scope information stored in the symbol table.

A symbol table is an important data structure created and maintained by the compiler in order to keep track of the semantics of variables, i.e., it stores information about the scope and binding of names, and information about instances of various entities such as variable and function names, classes, objects, etc.
It acts as a central repository for storing and retrieving information related to these
symbols during the various phases of compilation or interpretation.
Symbol table is used by the compiler to achieve compile-time efficiency and used by
various phases of the compiler as follows: -
1. Lexical Analysis: Creates new table entries in the table, for example like entries
about tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension,
line of reference, use, etc.
3. Semantic Analysis: Uses available information in the table to check for
semantics i.e. to verify that expressions and assignments are semantically correct
(type checking) and update it accordingly.
4. Intermediate Code generation: Refers to the symbol table to know how much and what type of run-time storage is allocated; the table also helps in adding temporary-variable information.
5. Code Optimization: Uses information present in the symbol table for machine-
dependent optimization.
6. Target Code generation: Generates code by using address information of
identifier present in the table.
Items stored in Symbol table:
 Variable names and constants
 Procedure and function names
 Literal constants and strings
 Compiler generated temporaries
 Labels in source languages
Information used by the compiler from Symbol table:
 Data type and name
 Declaring procedures
 Offset in storage
 If structure or record then, a pointer to structure table.
 For parameters, whether parameter passing by value or by reference
 Number and type of arguments passed to function
 Base Address
Operations on Symbol Table:
The following basic operations can be performed on a symbol table:
1. Insertion of an item in the symbol table.
2. Deletion of any item from the symbol table.
3. Searching of desired item from symbol table.
Functions of a Symbol Table:
1. Declaration Management: The symbol table keeps track of all declared symbols in
the source code, along with their attributes such as data type, scope, memory location,
etc.
2. Scope Management: It maintains information about the scopes of symbols, including
nested scopes in block-structured languages. This ensures that symbols are accessible
only within their respective scopes and not outside.
3. Conflict Resolution: The symbol table handles naming conflicts, such as multiple
symbols with the same name in different scopes, and helps to resolve them based on the
language's scoping rules.
4. Type Checking: It assists in enforcing type rules, verifying that expressions and
assignments have compatible data types.
5. Code Generation: For compilers, the symbol table can also play a role in code
generation by providing information about memory allocation and addressing modes for
variables.
Representation of Scope Information in Symbol Table:
The representation of scope information in the symbol table varies depending on the
programming language and the scoping rules it follows. Here are some common ways to
represent scope information:
1. Scope Representation by Number
All values are stored in one symbol table, and the same name can be declared many times as a distinct name in different blocks or procedures. So, each procedure or block is given a unique number.
The symbol table then contains not just the name of the identifier; each entry holds a pair (name, procedure-number).
We can recognize a particular identifier in a procedure by matching not only the name of the identifier but also the number of the procedure to which it belongs.
Example:

The following shows the configuration of the symbol table, i.e., identifiers filled in with their corresponding procedure or block numbers.

Here, an identifier or name belongs to the most closely nested of the active subprograms declaring that identifier.

2. Stack of Tables:
In block-structured languages with nested scopes, a stack of symbol tables is often used.
Each table represents the symbols in the current scope, and new tables are pushed onto
the stack when entering a new scope and popped when leaving it. This allows for
efficient scope lookup and resolution.
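The stack-of-tables scheme can be sketched as follows (a hypothetical illustration using a list of dictionaries; the attribute records shown are made up for the example):

```python
# Hypothetical sketch: a stack of symbol tables for nested scopes.
# enter_scope pushes a fresh table; lookup searches innermost-first.
class ScopedSymbolTable:
    def __init__(self):
        self.scopes = [{}]              # start with the global scope

    def enter_scope(self):
        self.scopes.append({})          # push a table for the new block

    def exit_scope(self):
        self.scopes.pop()               # discard the block's local names

    def insert(self, name, attrs):
        self.scopes[-1][name] = attrs   # declare in the current scope

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost declaration wins
            if name in scope:
                return scope[name]
        return None                     # undeclared name

st = ScopedSymbolTable()
st.insert('x', {'type': 'int'})     # global x
st.enter_scope()                    # enter a nested block
st.insert('x', {'type': 'float'})   # local x shadows the global one
print(st.lookup('x'))               # {'type': 'float'}
st.exit_scope()
print(st.lookup('x'))               # {'type': 'int'}
```

Searching the scopes from top of stack downward is exactly the rule described below: an inner declaration shadows an outer one until its block is left.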
3. Linked List:
Another approach is to use a linked list of symbol tables, where each table contains
symbols from a specific scope, and the tables are linked together to represent the scope
hierarchy.
4. Hash Table with Scoping Rules:
A hash table can be used to map symbol names to their corresponding entries in the
symbol table. Scoping rules can be enforced by using separate chaining or open
addressing to handle naming conflicts within the same scope.
5. Tree Structure:
For statically scoped languages, the symbol table can be implemented using a tree
structure, where each node represents a nested scope, and the parent-child relationship
reflects the scope hierarchy. This approach is useful when the scope nesting can be
determined statically during the compilation phase.
6. Dynamic Stack-like Structure:
For dynamically scoped languages, the symbol table may use a stack-like structure to
manage active scopes. When a function is called, a new scope is added to the top of the
stack, and when the function returns, the scope is removed from the stack.

In the source program, every name possesses a region of validity, called the scope of that name. The rules in a block-structured language are as follows:

1. If a name is declared within block B, then it is valid only within B.

2. If block B1 is nested within B2, then a name that is valid for block B2 is also valid for B1, unless the name is re-declared in B1.

o These scope rules require a more complicated organization of the symbol table than a simple list of associations between names and attributes.
o Tables are organized into a stack, and each table contains the list of names and their associated attributes.
o Whenever a new block is entered, a new table is pushed onto the stack. The new table holds the names declared as local to this block.
o When a declaration is compiled, the table is searched for the name; if the name is not found, the new name is inserted.
o When a reference to a name is translated, the tables are searched starting from the top table on the stack.

For example:

int x;
void f(int m) {
    float x, y;
    {
        int i, j;
        int u, v;
    }
}
int g(int n) {
    bool t;
}
The organization and design of the symbol table are crucial for the efficiency and
correctness of the compilation or interpretation process. It should provide fast access to
symbols, handle scope resolution correctly, and enforce the scoping rules specified by the
language. Additionally, the symbol table should be able to handle complex scenarios like
function overloading, nested classes, and other language-specific features.
