Chapter-1 Compiler Design
Chapter-1 Compiler Design
Introduction to Compiler
………………………………………………………………………………………………………
Topics
1.1 Compiler Structure: Analysis and Synthesis Model of Compilation, different sub-phases
within analysis and synthesis phases
1.2 Basic concepts related to Compiler such as interpreter, simple One-Pass Compiler,
preprocessor, macros, and Symbol table and error handler.
………………………………………………………………………………………………………
What is Compiler?
A compiler is a translator software program that takes its input in the form of program written
in one particular programming language (source language) and produce the output in the form
of program in another language (object or target language).
The major features of Compiler are listed below:
A compiler is a translator that converts the high-level language into the machine
language.
High-level language is written by a developer and machine language can be understood
by the processor.
Compiler is used to show errors to the programmer.
The main purpose of compiler is to change the code written in one language without
changing the meaning of the program.
When you execute a program which is written in HLL programming language then it
executes into two parts.
In the first part, the source program compiled and translated into the object program
(low level language).
In the second part, object program translated into the target program through the
assembler.
Source program Object program
Compiler
Phases of a Compiler
A compiler operates in phases. A phase is a logically interrelated operation that takes source
program in one representation and produces output in another representation. The compilation
process contains the sequence of various phases. Each phase takes source program in one
representation and produces output in another representation. The phases of a compiler are
shown in below. There are two phases of compilation.
a. Analysis (Machine Independent/Language Dependent)
b. Synthesis (Machine Dependent/Language independent)
Analysis part
In analysis part, an intermediate representation is created from the given source program.
This part is also called front end of the compiler. This part consists of mainly following four
phases:
Lexical Analysis
Syntax Analysis
Semantic Analysis and
Intermediate code generation
Synthesis part
In synthesis part, the equivalent target program is created from intermediate representation of
the program created by analysis part. This part is also called back end of the compiler. This part
consists of mainly following two phases:
Code Optimization and
Final Code Generation
Input source program
Lexical analyzer
Syntax analyzer
Intermediate code
Generator
Code optimizer
Code generator
Tokens Description
While while keyword
( left parenthesis
I Identifier
> greater than symbol
0 integers constant
) right parenthesis
I Identifier
= Equals
I Identifier
- Minus
2 integers constant
; Semicolon
Assignment Statement
In a parse tree all terminals are at
leaves.
Identifier = Expression
Expressio All inner nodes are non-terminals
n in a context free grammar (CFG).
Expression + Expression
Newval
Identifier Number
Oldval 12
The syntax of a language is specified by a context free grammar (CFG).
The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
– If it satisfies, the syntax analyzer creates a parse tree for the given program.
3. Semantic Analyzer
The next phase of the semantic analyzer is the semantic analyzer and it performs a very
important role to check the semantics rules of the expression according to the source language.
The previous phase output i.e. syntactically correct expression is the input of the semantic
analyzer. Semantic analyzer is required when the compiler may require performing some
additional checks such as determining the type of expressions and checking that all statements
are correct with respect to the typing rules, that variables have been properly declared before
they are used, that functions are called with the proper number of parameters etc. This semantic
analyzer phase is carried out using information from the parse tree and the symbol table.
The parsing phase only verifies that the program consists of tokens arranged in a syntactically
valid combination. Now semantic analyzer checks whether they form a sensible set of
instructions in the programming language or not. Some examples of the things checked in this
phase are listed below:
The type of the right side expression of an assignment statement should match the type
of the left side ie. in the expression newval = oldval + 12, The type of the expression
(oldval+12) must match with type of the variable newval.
The parameter of a function should match the arguments of a function call in both
number and type.
The variable name used in the program must be unique etc.
The main purposes of Semantic analyzer are:
It is used to analyze the parsed code for meaning.
Semantic analyzer fills in assumed or missing information.
It tags groups with meaning information.
5. Code Optimization
Optimization is the process of transforming a piece of code to make more efficient (either in
terms of time or space) without changing its output or side effects. The process of removing
unnecessary part of a code is known as code optimization. Due to code optimization process it
decreases the time and space complexity of the program i.e.
b=0
Detection of redundant function calls t1 = a + b
Detection of loop invariants t2 = c * t1
a = t2
Common sub-expression elimination
Dead code detection and elimination
b=0
a = c *a
The main purposes of Code optimization are:
It examines the object code to determine whether there are more efficient means of
execution.
Important techniques that are used for lexical analyzer:
Loop unrolling.
Common-sub expression elimination
Operator reduction etc.
6. Code Generation
It generates the assembly code for the target CPU from an optimized intermediate
representation of the program. Ex: Assume that we have an architecture with instructions
whose at least one of its operands is a machine register.
A=b+c*d/f
MOVE c, R1
MULT d, R1
DIV f, R1
ADD b, R1
MOVE R1, A
Example: Showing all phases of Compiler
Multi-pass Compiler
Multi pass compiler is used to process the source code of a program several times. In the first
pass, compiler can read the source program, scan it, extract the tokens and store the result in an
output file. In the second pass, compiler can read the output file produced by first pass, build
the syntactic tree and perform the syntactical analysis. The output of this phase is a file that
contains the syntactical tree. In the third pass, compiler can read the output file produced by
second pass and check that the tree follows the rules of language or not. The output of semantic
analysis phase is the annotated tree syntax. This pass is going on, until the target output is
produced.
One-pass Compiler
One-pass compiler is used to traverse the program only once. The one-pass compiler passes
only once through the parts of each compilation unit. It translates each part into its final
machine code. In the one pass compiler, when the line source is processed, it is scanned and the
token is extracted. Then the syntax of each line is analyzed and the tree structure is build. After
the semantic part, the code is generated. The same process is repeated for each line of code until
the entire program is compiled.
Pre-Processor
Pre-processed
code
Compiler
Target Assembly
code
Assembler
Re-locatable
machine code Library files/
Re-locatable
Linker
Modules
Executable
machine code
Loader
Memory
Preprocessor
A preprocessor, generally considered as a part of compiler, is a tool that produces input for
compilers. It deals with macro-processing, augmentation; file inclusion, language extension, etc.
A preprocessor produce input to compilers. They may perform the following functions.
Macro processing: A preprocessor may allow a user to define macros that are short hands
for longer constructs.
File inclusion: A preprocessor may include header files into the program text.
Rational preprocessor: these preprocessors augment older languages with more modern
flow-of-control and data structuring facilities.
Language Extensions: these preprocessor attempts to add capabilities to the language by
certain amounts to build-in macro.
Macros
A macro (which stands for macroinstruction) is a programmable pattern which translates a
certain sequence of input into a preset sequence of output. Macros can make tasks less repetitive
by representing a complicated sequence of keystrokes, mouse movements, commands, or other
types of input.
A Macro instruction is the notational convenience for the programmer. For every occurrence of
macro the whole macro body or macro block of statements gets expanded in the main source
code. Thus Macro instructions make writing code more convenient.
Features of Macro Processor:
Macro represents a group of commonly used statements in the source programming
language.
Macro Processor replaces each macro instruction with the corresponding group of
source language statements. This is known as expansion of macros.
Using Macro instructions programmer can leave the mechanical details to be handled by
the macro processor.
Macro Processor designs is not directly related to the computer architecture on which it
runs.
Macro Processor involves definition, invocation and expansion.
Symbol Tables
Symbol tables are data structures that are used by compilers to hold information about source-
program constructs. The information is collected incrementally by the analysis phase of a
compiler and used by the synthesis phases to generate the target code. Entries in the symbol
table contain information about an identifier such as its type, its position in storage, and any
other relevant information. Symbol tables typically need to support multiple declarations of the
same identifier within a program. The lexical analyzer can create a symbol table entry and can
return token to the parser, say id, along with a pointer to the lexeme. Then the parser can decide
whether to use a previously created symbol table or create new one for the identifier.
The basic operations defined on a symbol table include
allocate – to allocate a new empty symbol table
free – to remove all entries and free the storage of a symbol table
insert – to insert a name in a symbol table and return a pointer to its entry
lookup – to search for a name and return a pointer to its entry
set_attribute – to associate an attribute with a given entry
get_attribute – to get an attribute associated with a given entry
Other operations can be added depending on requirement
For example, a delete operation removes a name previously inserted
Possible entries in a symbol table:
Name: a string.
Attribute:
Reserved word
Variable name
Type name
Procedure name
Constant name
_______
Data type
Scope information: where it can be used.
Storage allocation, size…
………………
Example: Let’s take a portion of a program as below:
void fun ( int A, float B)
{
int D, E;
D = 0;
E = A / round (B);
if (E > 5)
{
Print D
}
}
Its symbol table is created as below:
Symbol Token Data type Initialization?
Fun Id Function name No
A Id Int Yes
B Id Float Yes
D Id Int No
E Id Int No