Compiler Design Documentation
What is a Compiler?
A compiler is a computer program that transforms source code written in a
high-level language into a low-level language.
Before studying compilers, you first need to understand a few other tools that
work with them.
The language-processing system built around the compiler consists of the following tools:
Pre-processor
The pre-processor is considered part of the compiler. It is a tool that produces
input for the compiler and deals with macro processing, augmentation, language
extension, etc.
Compiler
A compiler is a computer program which helps you transform source code written
in a high-level language into low-level machine language.
Assembler:
It translates assembly language code into machine-understandable language. The
output of the assembler is an object file, which combines machine instructions
with the data required to place these instructions in memory.
Linker:
The linker links and merges various object files to create an executable file.
These files might have been produced by separate assemblers.
Loader:
The loader is part of the OS. It loads executable files into memory and runs
them. It also calculates the size of a program and allocates the additional
memory space the program needs.
Machine code
A computer programming language consisting of binary or hexadecimal
instructions that a computer can respond to directly.
Phases of Compiler
The compilation process consists of a sequence of phases. Each phase takes the
source program in one representation and produces output in another
representation, taking its input from the previous phase.
The phases of a compiler are as follows:
Lexical Analysis
Lexical analysis is the first phase; here the compiler works as a text scanner.
This phase scans the source code as a stream of characters and converts it into
meaningful lexemes. The lexical analyser represents these lexemes in the form of
tokens:
<token-name, attribute-value>
Example:
Sum = oldsum + Rate * 50
id1 = id2 + id3 * 50
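A minimal scanner for this example can be sketched in Python with the standard `re` module. It assumes a tiny token set (identifiers, integer constants and the operators =, + and *); the token names `id`, `num` and the function name `tokenize` are illustrative, not taken from these notes. Each identifier's attribute is its position in a simple symbol table, which is how `Sum`, `oldsum` and `Rate` become id1, id2 and id3.

```python
import re

# Hypothetical token patterns for this small example language.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return <token-name, attribute-value> pairs and the symbol table."""
    symbols = []   # entry k (0-based) is the identifier reported as id(k+1)
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                      # whitespace produces no token
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no attribute
    return tokens, symbols
```

Calling `tokenize("Sum = oldsum + Rate * 50")` yields the token stream id1 = id2 + id3 * 50, with the three identifiers stored in the symbol table.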
Syntax Analysis
The second phase is called syntax analysis or parsing. The parser checks
whether the expression formed by the tokens is syntactically correct and
represents its structure as a parse tree.
Example:
id1=id2+id3*50
Semantic Analysis
Semantic analysis is the third phase of the compilation process. It checks
whether the parse tree follows the rules of the language. The semantic analyser
keeps track of identifiers, their types and expressions. The output of the
semantic analysis phase is the annotated syntax tree.
Intermediate Code Generation
In this phase, the compiler translates the source code into intermediate code,
a representation that lies between the high-level language and the machine
language.
Example
id1=id2+id3*50
temp1 = inttoreal(50)
temp2 = id3*temp1
temp3 = id2+temp2
id1 = temp3
Code Optimization
It removes unnecessary code and rearranges the sequence of statements in order
to speed up program execution.
Example:
id1=id2+id3*50
temp1=id3*50.0
id1=id2+temp1
Code Generation
Code generation is the final stage of the compilation process. It takes the
optimized intermediate code as input and maps it to the target machine
language.
Example:
id1=id2+id3*50
ADD R1, R2
MOV Id1, R1
Parser
A parser takes input in the form of a sequence of tokens and produces output in
the form of a parse tree.
Parsing is of two types:
1. Top-down parsing
2. Bottom-up parsing
Top-down parsing
• The process of constructing the parse tree starting from the root and going
down to the leaves is called top-down parsing.
• Top-down parsers are constructed from grammars that are free from ambiguity
and left recursion. Top-down parsers use leftmost derivation to construct a
parse tree.
Example
S → aABe
A → bc
B → d
Input string: abcde
Recursive descent parser
• A recursive descent parser uses the technique of top-down parsing without
backtracking.
• It can be defined as a parser that uses a set of recursive procedures to
process the input string with no backtracking. It can easily be implemented in
any language that supports recursion.
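As a sketch, here is a recursive descent parser in Python for the top-down example grammar S → aABe, A → bc, B → d. Each nonterminal becomes one procedure, and since every nonterminal here has a single production, no backtracking is needed. The class and method names are illustrative choices; the parser records the leftmost derivation it follows.

```python
class RecursiveDescentParser:
    """Recursive descent parser for S -> aABe, A -> bc, B -> d."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.steps = []          # productions applied, in leftmost order

    def match(self, terminal):
        # Consume one expected terminal or report a syntax error.
        if self.pos < len(self.text) and self.text[self.pos] == terminal:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {terminal!r} at position {self.pos}")

    def S(self):
        self.steps.append("S -> aABe")
        self.match("a")
        self.A()
        self.B()
        self.match("e")

    def A(self):
        self.steps.append("A -> bc")
        self.match("b")
        self.match("c")

    def B(self):
        self.steps.append("B -> d")
        self.match("d")

    def parse(self):
        self.S()
        if self.pos != len(self.text):
            raise SyntaxError("unconsumed input")
        return self.steps
```

For the input abcde, `RecursiveDescentParser("abcde").parse()` returns the productions S → aABe, A → bc, B → d in that order; an input such as abde raises `SyntaxError`.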
Backtracking
Example 1 − Consider the grammar
S → aAd
A → bc | b
Input: abd
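The backtracking in this example can be sketched in Python. For A the parser first tries A → bc; on input abd the c does not match, so it returns to the same input position and tries A → b, after which d matches and the string is accepted. Trying alternatives in the order they are listed, and the function names used here, are assumptions of this sketch. Like any naive backtracking parser, it would loop forever on a left-recursive grammar.

```python
# Grammar: S -> aAd, A -> bc | b (alternatives are tried in the order listed)
GRAMMAR = {
    "S": [["a", "A", "d"]],
    "A": [["b", "c"], ["b"]],
}


def parse(symbol, text, pos, trace):
    """Yield every input position where `symbol` can finish, starting at pos."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if pos < len(text) and text[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:
        trace.append(f"try {symbol} -> {''.join(body)} at {pos}")
        yield from parse_sequence(body, text, pos, trace)


def parse_sequence(body, text, pos, trace):
    # Match the symbols of a production body one after another; a failure
    # later in the body makes the generator fall back to the next alternative.
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos, trace):
        yield from parse_sequence(body[1:], text, mid, trace)


def accepts(text):
    trace = []
    ok = any(end == len(text) for end in parse("S", text, 0, trace))
    return ok, trace
```

`accepts("abd")` returns `True` together with the trace try S -> aAd at 0, try A -> bc at 1, try A -> b at 1, which shows the failed attempt with A → bc followed by the successful retry.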
Predictive Parser:
A predictive parser is a top-down parser that decides which production to apply
by looking at the next input symbol, so it never needs to backtrack. This part
gives an overview of the predictive parser and its role, the algorithm used to
implement it, and a worked example of the predictive parsing algorithm.
Example :
Given grammar
E → E+T | T
T → T*F | F
F → (E) | id
First(
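The grammar above is left-recursive (E → E+T, T → T*F), so a predictive parser cannot use it directly. Assuming the standard left-recursion elimination E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, the Python sketch below hand-codes the resulting LL(1) parsing table (its entries follow from the FIRST and FOLLOW sets of the transformed grammar) and runs the usual table-driven stack algorithm. The table layout and function name are illustrative.

```python
# LL(1) table for the left-recursion-free grammar; [] stands for epsilon.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def predictive_parse(tokens):
    """Return the productions applied (a leftmost derivation) or raise SyntaxError."""
    tokens = tokens + ["$"]
    stack = ["$", "E"]            # start symbol on top of the end marker
    i = 0
    used = []
    while stack:
        top = stack.pop()
        look = tokens[i]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            used.append(f"{top} -> {' '.join(body) or 'ε'}")
            stack.extend(reversed(body))   # first body symbol ends up on top
        elif top == look:
            i += 1                # match a terminal (or the final $)
        else:
            raise SyntaxError(f"expected {top}, found {look}")
    return used
```

For the input id*id, `predictive_parse(["id", "*", "id"])` applies E → T E', T → F T', F → id, T' → * F T', F → id, T' → ε and E' → ε.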
Bottom-Up Parsing
A bottom-up parser takes the stream of tokens from the lexical analyzer and,
after parsing the input string, generates a parse tree.
The bottom-up parser builds the parse tree starting from the leaf nodes and
proceeds towards the root node. This section discusses bottom-up parsing along
with its types.
Example:
S → aABe
A → Abc | b
B → d
Input string: abbcde
abbcde
aAbcde (A → b)
aAde (A → Abc)
aABe (B → d)
S (S → aABe)
Example
E → E+T | T
T → T*F | F
F → (E) | id
Input: id*id
id*id
F*id (F → id)
T*id (T → F)
T*F (F → id)
T (T → T*F)
E (E → T)
Shift reduce parser:
• Shift − The parser shifts zero or more input symbols onto the stack
until the handle is on top of the stack.
• Reduce − The parser reduces, or replaces, the handle on top of the stack
with the left side of the production, i.e., the R.H.S. of the production is
popped and the L.H.S. is pushed.
• Accept − The shift and reduce steps are repeated until the parser detects
an error or until the stack contains the start symbol (S) and the input
buffer is empty, i.e., it contains only $.
Example
Given grammar
E → E+T | T
T → T*F | F
F → (E) | id
Input: id*id
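A sketch of the shift-reduce loop in Python for this grammar. A real LR parser reads its shift/reduce decisions from a parsing table; here, as a simplifying assumption, the handle is recognised by hand-written rules that only work for this particular grammar, using one symbol of lookahead (T is reduced to E only when the next input symbol is not *). The function name and the trace format are illustrative.

```python
def shift_reduce(tokens):
    """Shift-reduce parse of E -> E+T|T, T -> T*F|F, F -> (E)|id.

    Handles are found by rules specific to this grammar with one symbol of
    lookahead; an LR parser would take these decisions from its table.
    """
    tokens = tokens + ["$"]
    stack = ["$"]
    i = 0
    trace = []

    def reduce(n, lhs, rule):
        del stack[-n:]               # pop the handle (R.H.S.)
        stack.append(lhs)            # push the L.H.S.
        trace.append(("reduce", rule))

    while True:
        look = tokens[i]
        if stack[-1] == "id":
            reduce(1, "F", "F -> id")
        elif stack[-3:] == ["(", "E", ")"]:
            reduce(3, "F", "F -> (E)")
        elif stack[-1] == "F":
            if stack[-3:] == ["T", "*", "F"]:
                reduce(3, "T", "T -> T*F")
            else:
                reduce(1, "T", "T -> F")
        elif stack[-1] == "T" and look != "*":
            # '*' binds tighter than '+', so wait if a '*' follows.
            if stack[-3:] == ["E", "+", "T"]:
                reduce(3, "E", "E -> E+T")
            else:
                reduce(1, "E", "E -> T")
        elif look != "$":
            stack.append(look)       # shift the next input symbol
            trace.append(("shift", look))
            i += 1
        elif stack == ["$", "E"]:
            trace.append(("accept", None))
            return True, trace
        else:
            return False, trace      # error: no move is possible
```

For id*id the reductions come out as F → id, T → F, F → id, T → T*F, E → T, the same sequence as in the bottom-up example above, followed by accept.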
Canonical Collection of LR(0) items
Example
Given grammar:
1. S → AA
2. A → aA | b
Add the augmented production S' → S and insert the '•' symbol at the first
position of every production in G:
S' → •S
S → •AA
A → •aA
A → •b
Drawing DFA:
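The states of this DFA (the canonical collection of LR(0) items) can be computed mechanically with the closure and goto operations. The Python sketch below assumes the augmented grammar S' → S, S → AA, A → aA | b and represents an item as a pair (production index, dot position); the function names are illustrative. It finds 7 states and 10 transitions, although the state numbers it assigns may differ from those in a hand-drawn DFA.

```python
# Augmented grammar: S' -> S, S -> AA, A -> aA | b
GRAMMAR = [("S'", "S"), ("S", "AA"), ("A", "aA"), ("A", "b")]
NONTERMINALS = {"S'", "S", "A"}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for prod, dot in list(result):
            body = GRAMMAR[prod][1]
            if dot < len(body) and body[dot] in NONTERMINALS:
                # Dot before a nonterminal B: add B -> .gamma for each B-production.
                for j, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == body[dot] and (j, 0) not in result:
                        result.add((j, 0))
                        changed = True
    return frozenset(result)


def goto_set(items, symbol):
    # Move the dot over `symbol` in every item where it is the next symbol.
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:              # `states` grows while we iterate over it
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto_set(state, sym)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), sym)] = states.index(target)
    return states, transitions
```

The first state is exactly the item set listed above: S' → •S, S → •AA, A → •aA, A → •b.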
LR(0) Table
SLR(1) Parser
Example
S → AA
A → aA | b
Solution:
A → .b [3rd production]
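What distinguishes SLR(1) from LR(0) is that a reduction by A → α is entered only under the terminals in FOLLOW(A). A Python sketch computing FIRST and FOLLOW for S → AA, A → aA | b, under the assumption (true for this grammar) that there are no ε-productions, so FIRST of a production body is decided by its first symbol. The function names are illustrative.

```python
GRAMMAR = {"S": ["AA"], "A": ["aA", "b"]}
START = "S"


def first_sets(grammar):
    # No epsilon-productions: FIRST of a body is FIRST of its first symbol.
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, first):
    follow = {nt: set() for nt in grammar}
    follow[START].add("$")                 # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue           # only nonterminals get FOLLOW sets
                    if i + 1 < len(body):  # something follows sym in the body
                        nxt = body[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:                  # sym ends the body: inherit FOLLOW(nt)
                        new = follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= set(new)
                        changed = True
    return follow
```

This gives FOLLOW(S) = {$} and FOLLOW(A) = {a, b, $}, so in the SLR(1) table the reductions A → aA and A → b go under a, b and $, and S → AA goes under $ only.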
CLR(1) Parser
EXAMPLE
S → AA
A → aA | b
Solution :
STEP 3-
S → .AA, $ [1st production]
A → .aA, a|b [2nd production]
A → .b, a|b [3rd production]
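In CLR(1) each item carries a lookahead, and closure derives the lookaheads of new items from FIRST(βa) for an item [A → α.Bβ, a]. The Python sketch below assumes the augmented grammar S' → S, S → AA, A → aA | b with no ε-productions and a precomputed FIRST table; it represents an item as (production index, dot position, lookahead). The closure of [S' → .S, $] reproduces the items listed above, with each a|b lookahead split into one item per terminal. Function names are illustrative.

```python
# Augmented grammar: S' -> S, S -> AA, A -> aA | b
GRAMMAR = [("S'", "S"), ("S", "AA"), ("A", "aA"), ("A", "b")]
NONTERMINALS = {"S'", "S", "A"}
FIRST = {"S": {"a", "b"}, "A": {"a", "b"}}   # precomputed; no epsilon-productions


def first_of(symbols):
    """FIRST of a non-empty symbol string (no epsilon, so one symbol decides)."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}


def lr1_closure(items):
    """Items are (production index, dot position, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, look = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            # [A -> alpha.B beta, a]: add [B -> .gamma, b] for b in FIRST(beta a)
            rest = body[dot + 1:] + look
            for b in first_of(rest):
                for j, (lhs, _) in enumerate(GRAMMAR):
                    item = (j, 0, b)
                    if lhs == body[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return result


def show(item):
    prod, dot, look = item
    lhs, body = GRAMMAR[prod]
    return f"{lhs} -> {body[:dot]}.{body[dot:]}, {look}"
```

Printing `sorted(show(i) for i in lr1_closure({(0, 0, "$")}))` lists S' -> .S, $ and S -> .AA, $ together with A -> .aA and A -> .b, each once with lookahead a and once with lookahead b.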
STEP 3 –
From step 2
Three address code in Compiler
For the expression a = b * – c + b * – c, the three-address code is:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
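A minimal sketch of how such code can be generated: walk the expression tree bottom-up and give every interior node a new temporary. The tuple-based tree format and the function name `gen_tac` are assumptions made for illustration. Because no optimization is applied, the repeated subexpression b * – c is computed twice, exactly as in the listing above.

```python
from itertools import count


def gen_tac(tree):
    """Translate ('=', target, expr) into a list of three-address instructions.

    Expressions are identifiers (str), ('uminus', e) or (op, left, right).
    """
    code = []
    temps = count(1)                       # t1, t2, ... in order of creation

    def emit(expr):
        if isinstance(expr, str):          # identifier: no code needed
            return expr
        if expr[0] == "uminus":
            arg = emit(expr[1])
            t = f"t{next(temps)}"
            code.append(f"{t} = uminus {arg}")
            return t
        op, left, right = expr
        left_name, right_name = emit(left), emit(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {left_name} {op} {right_name}")
        return t

    _, target, expr = tree
    code.append(f"{target} = {emit(expr)}")
    return code


# a = b * -c + b * -c
TREE = ("=", "a", ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c"))))
```

`gen_tac(TREE)` produces t1 = uminus c, t2 = b * t1, t3 = uminus c, t4 = b * t3, t5 = t2 + t4, a = t5.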
Implementation of Three Address Code –
1. Quadruple –
It is a structure consisting of 4 fields, namely op, arg1, arg2 and result.
op denotes the operator, arg1 and arg2 denote the two operands, and result
stores the result of the expression.
Advantage –
Example – Consider expression a = b * – c + b * – c.
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
2. Triples –
Disadvantage –
Example – Consider expression a = b * – c + b * – c
3. Indirect Triples –
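The three representations can be compared in a short Python sketch for the same expression. It assumes the usual conventions: a quadruple names its result explicitly (op, arg1, arg2, result); a triple drops the result field and refers to an earlier result by its position, written (0), (1), ... here; indirect triples add a separate list of pointers to the triples that fixes the execution order. The names `QUADS`, `to_triples` and `INSTRUCTION` are illustrative.

```python
# Quadruples for a = b * -c + b * -c: (op, arg1, arg2, result)
QUADS = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    """Drop the result field; temporaries become references (i) to triple i."""
    position = {}              # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        # Replace a temporary with a reference to the triple that computed it.
        return f"({position[arg]})" if arg in position else arg

    for op, arg1, arg2, result in quads:
        if op == "=":          # copy into a program variable: (=, a, (i))
            triples.append(("=", result, ref(arg1)))
        else:
            position[result] = len(triples)
            triples.append((op, ref(arg1), ref(arg2)))
    return triples


TRIPLES = to_triples(QUADS)
# Indirect triples: a list of pointers into TRIPLES gives the execution order,
# so an optimizer can reorder statements by permuting this list alone.
INSTRUCTION = list(range(len(TRIPLES)))
```

The last triple becomes (=, a, (4)), and the + triple refers to its operands as (1) and (3) instead of naming t2 and t4.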
LEX:
o First, the lexical analyzer is specified as a program lex.l in the Lex
language. Then the Lex compiler runs on lex.l and produces a C program lex.yy.c.
o Finally, the C compiler compiles lex.yy.c and produces an object program
a.out.
o a.out is the lexical analyzer that transforms an input stream into a sequence
of tokens.
Lex file format:
A Lex program is separated into three sections by %% delimiters. The format of
Lex source is as follows:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
The rules section consists of statements of the form p1 {action1},
p2 {action2}, ..., pn {actionn}, where pi describes a regular expression and
actioni describes the action the lexical analyzer should take when pattern pi
matches a lexeme.
Storage Organization:
o When the target program executes, it runs in its own logical address space,
in which each program value has a location.
o The management and organization of this logical address space is shared among
the compiler, the operating system and the target machine. The operating system
maps logical addresses to physical addresses, which are usually spread
throughout memory.
Storage Allocation
3. Heap storage allocation
Activation Record
o The control stack is a run-time stack used to keep track of live procedure
activations, i.e., to find the procedures whose execution has not yet been
completed.
When a procedure is called (activation begins), its name is pushed onto the
stack, and when it returns (activation ends), it is popped.
An activation record is pushed onto the stack when a procedure is called and
popped when control returns to the caller.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds information about the status of the machine
before the procedure is called.
Local Data: It holds the data that is local to the execution of the procedure.
Heap Management
The heap is the portion of the store that is used for data that lives indefinitely, or
until the program explicitly deletes it. While local variables typically become
inaccessible when their procedures end, many languages enable us to create
objects or other data whose existence is not tied to the procedure activation that
creates them. For example, both C++ and Java give the programmer the new
operator to create objects that may be passed (or pointers to them may be
passed) from procedure to procedure, so they continue to exist long after the
procedure that created them is gone. Such objects are stored on a heap.
The memory manager keeps track of all the free space in heap storage at all times.
It performs two basic functions:
3. Locality in Programs
Most programs exhibit a high degree of locality; that is, they spend most of their
time executing a relatively small fraction of the code and touching only a small
fraction of the data. We say that a program has temporal locality if the memory
locations it accesses are likely to be accessed again within a short period of time.
We say that a program has spatial locality if memory locations close to the
location accessed are likely also to be accessed within a short period of time.
Peephole Optimization :
Peephole optimization is a type of code optimization performed on a small part
of the code, i.e., on a very small set of instructions in a segment of code.
This small set of instructions, or small part of code, on which peephole
optimization is performed is known as the peephole or window.
It works on the principle of replacement: a part of the code is replaced by
shorter and faster code without changing the output. Peephole optimization is
a machine-dependent optimization.
Redundant-instructions elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
Unreachable code
We can delete instruction (2) because whenever (2) is executed, (1) will ensure
that the value of a is already in register R0. If (2) had a label, we could not
be sure that (1) was always executed immediately before (2), and so we could not
remove (2).
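Unreachable Code:
#define debug 0
….
if (debug) {
    print debugging information
}
Since debug is defined as 0, the condition is always false, so the body of the
if statement can never execute and can be removed.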
Flow-of-Control Optimizations:
goto L1
….
L1: goto L2
Since the instruction at L1 only jumps on to L2, the first jump can be replaced
by goto L2.
Algebraic Simplification:
i := i + 1 → i++
i := i - 1 → i--
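A toy peephole pass can be sketched in Python. It assumes hypothetical instruction strings of the form `MOV src,dst` (copy src to dst), `goto L`, and `L: ...` for a labelled instruction; this is not a real target machine's instruction set. With a window of two instructions it drops an unlabelled load `MOV a,R0` that immediately follows the store `MOV R0,a` (the redundant-instruction case above), and it retargets an unlabelled `goto L1` to `L2` when the instruction at L1 is `goto L2` (the flow-of-control case).

```python
import re

MOV = re.compile(r"MOV (\w+),(\w+)$")   # unlabelled MOV src,dst only


def peephole(code):
    """Apply two peephole rules to a list of instruction strings."""
    # Flow of control: remember which labels carry an unconditional jump.
    jump_of = {}
    for line in code:
        m = re.match(r"(\w+): goto (\w+)$", line)
        if m:
            jump_of[m.group(1)] = m.group(2)

    out = []
    for line in code:
        m = re.match(r"goto (\w+)$", line)
        if m:
            target, seen = m.group(1), set()
            while target in jump_of and target not in seen:  # follow goto chains
                seen.add(target)
                target = jump_of[target]
            line = f"goto {target}"
        # Redundant load: "MOV R0,a" followed by unlabelled "MOV a,R0".
        if out:
            prev, cur = MOV.match(out[-1]), MOV.match(line)
            if prev and cur and prev.groups() == cur.groups()[::-1]:
                continue             # the value of a is already in R0
        out.append(line)
    return out
```

On `["MOV R0,a", "MOV a,R0", "goto L1", "L1: goto L2", "L2: MOV b,R1"]` the pass removes the second instruction and rewrites the jump to `goto L2`; a labelled load such as `L3: MOV a,R0` is kept, matching the caveat about labels above.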
A basic block is a straight-line code sequence with no branches in except to
the entry and no branches out except at the end. A basic block is a set of
statements that always execute one after another, in sequence.
The first task is to partition a sequence of three-address code into basic
blocks. A new basic block begins with the first instruction, and instructions
are added to it until a jump or a label is met. In the absence of a jump,
control moves consecutively from one instruction to the next. The idea is
formalized in the following algorithm:
Algorithm:
Partitioning three-address code into basic blocks.
4) t2 = t1 + j
5) t3 = 8 * t2
6) t4 = t3 - 88
7) a[t4] = 0.0
8) j = j + 1
9) if j <= 10 goto (3)
10) i = i + 1 //Leader 4 (Immediately following Conditional goto statement)
11) if i <= 10 goto (2)
12) i = 1 //Leader 5 (Immediately following Conditional goto statement)
13) t5 = i - 1 //Leader 6 (Target of 17th statement)
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
The given code converts a matrix a into an identity matrix, i.e., a matrix with
all diagonal elements 1 and all other elements 0.
Statements (3)-(7) set the elements to 0, and statements (13)-(15) set the
diagonal elements to 1. These statements are executed repeatedly by means of
the goto statements, which form loops.
There are 6 basic blocks in the above code:
B1) Statement 1
B2) Statement 2
B3) Statement 3-9
B4) Statement 10-11
B5) Statement 12
B6) Statement 13-17
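The leader-based partitioning can be sketched in Python using the leader rules that the comments in the listing rely on: the first statement is a leader, every target of a jump is a leader, and every statement immediately following a jump is a leader; each block runs from one leader up to the statement before the next. Statements (1)-(3) are not reproduced above, so the sketch uses placeholders for them and assumes they contain no jumps; the bound 10 in statement (9) is assumed by analogy with statement (11). The function name is illustrative.

```python
import re

# Statements (1)-(3) are placeholders (assumed to contain no jumps);
# statements (4)-(17) follow the listing above.
PROGRAM = [
    "(statement 1)", "(statement 2)", "(statement 3)",
    "t2 = t1 + j", "t3 = 8 * t2", "t4 = t3 - 88", "a[t4] = 0.0",
    "j = j + 1", "if j <= 10 goto (3)", "i = i + 1", "if i <= 10 goto (2)",
    "i = 1", "t5 = i - 1", "t6 = 88 * t5", "a[t6] = 1.0",
    "i = i + 1", "if i <= 10 goto (13)",
]


def basic_blocks(program):
    """Return (first, last) statement numbers of each basic block (1-based)."""
    n = len(program)
    leaders = {1}                                  # rule 1: the first statement
    for k, stmt in enumerate(program, start=1):
        m = re.search(r"goto \((\d+)\)", stmt)
        if m:
            leaders.add(int(m.group(1)))           # rule 2: target of a jump
            if k < n:
                leaders.add(k + 1)                 # rule 3: statement after a jump
    starts = sorted(leaders)
    ends = [s - 1 for s in starts[1:]] + [n]       # each block ends before the next leader
    return list(zip(starts, ends))
```

`basic_blocks(PROGRAM)` finds the leaders 1, 2, 3, 10, 12 and 13 and returns the six blocks (1, 1), (2, 2), (3, 9), (10, 11), (12, 12) and (13, 17), i.e. B1 to B6.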
Issues in the design of a code generator:
• Target program: The target program is the output of the code generator. The
output may be absolute machine language, relocatable machine language, or
assembly language.