Unit 1 - CD Cs3501
Introduction to Compilers
Introduction
Translator
A translator is a programming language processor that converts a computer program
from one language to another.
It also detects and reports errors during translation.
Compiler
A compiler is a computer program which helps to transform source code written in a
high-level language into low-level machine language.
It translates the code written in one programming language to some other language
without changing the meaning of the code.
The compiler also tries to make the resulting code efficient, optimizing it for execution
time and memory space.
Interpreter
An interpreter is a computer program that is used to directly execute
program instructions written using one of the many high-level programming languages.
Example
Comparison of Compiler and Interpreter
Compiler                                          | Interpreter
Takes the entire program as input                 | Takes a single instruction as input
Intermediate object code is generated             | No intermediate object code is generated
Conditional control statements execute faster     | Conditional control statements execute slower
Errors are displayed after the entire program     | Errors (if any) are displayed for every
is checked                                        | instruction interpreted
Parts of the Compiler or Architecture of Compiler
A compiler can broadly be divided into two parts, based on the way it compiles:
Analysis Phase
Known as the front-end of the compiler
The analysis phase of the compiler reads the source program, divides it into core parts and then
checks for lexical, grammar and syntax errors.
The analysis phase generates an intermediate representation of the source program and symbol
table, which should be fed to the Synthesis phase as input.
Synthesis Phase
Known as the back-end of the compiler.
The synthesis phase generates the target program with the help of intermediate source code
representation and symbol table.
Language Processing Systems or Cousins of Compiler.
Any computer system is made of hardware and software.
The hardware understands a language that humans cannot easily understand. So we write programs
in a high-level language, which is easier to understand and remember.
These programs are then fed into a series of tools and OS components to get the desired code
that can be used by the machine. This is known as Language Processing System.
Preprocessor: The preprocessor is considered a part of the compiler. It is a tool which
produces input for the compiler. It deals with macro processing, augmentation, language
extension, etc.
Compiler: A compiler is a computer program which helps to transform source code
written in a high-level language into low-level machine language. It translates the code
written in one programming language to some other language without changing the
meaning of the code.
Assembler: It translates assembly language code into machine-understandable language.
The output of the assembler is known as an object file, which is a combination of
machine instructions and the data required to place these instructions in memory.
Linker: The linker helps you to link and merge various object files to create an
executable file. All these files might have been compiled with separate assemblers. The
main task of a linker is to search for called modules in a program and to find out the
memory location where all modules are stored.
Loader: The loader is a part of the OS, which performs the tasks of loading executable
files into memory and running them. It also calculates the size of a program (instructions
and data) and creates the memory space needed for it.
Compiler Construction Tools
Compiler construction tools were introduced as computer-related technologies spread all
over the world. They are also known as compiler-compilers, compiler-generators or
translator-writing systems.
These tools use specific language or algorithm for specifying and implementing the
component of the compiler.
Scanner generators: This tool takes regular expressions as input and generates a lexical
analyzer. An example is LEX on the Unix operating system.
Syntax-directed translation engines: These software tools produce intermediate code by walking
the parse tree. Their goal is to associate one or more translations with each node of the parse
tree.
Parser generators: A parser generator takes a grammar as input and automatically generates
source code which can parse streams of characters with the help of a grammar.
Automatic code generators: These take intermediate code and convert it into machine language.
Data-flow engines: This tool is helpful for code optimization. Here, information is supplied by
the user, and the intermediate code is analyzed to find relationships between its parts. This is
also known as data-flow analysis. It helps you to find out how values are transmitted from one
part of the program to another part.
Advantages
Modifications to the user program can be easily made and implemented as execution
proceeds.
The type of object that a variable denotes may change dynamically.
Debugging a program and finding errors is simplified for a program that is
interpreted.
The interpreter for the language makes it machine independent.
Disadvantages
The execution of the program is slower.
Memory consumption is more.
Topic Structure of Compiler
Phases of a compiler
A compiler operates in phases.
A phase is a logically interrelated operation that takes the source program in one
representation and produces output in another representation. The phases of a
compiler are described below.
There are two phases of compilation.
a. Analysis (Machine Independent/Language Dependent)
b. Synthesis (Machine Dependent/Language Independent)
The compilation process is partitioned into a number of sub-processes called 'phases'.
In practice, several phases may be grouped together, and the intermediate
representations between the grouped phases need not be constructed explicitly.
The symbol table, which stores information about the entire source program, is used
by all phases of the compiler.
Lexical Analysis:
Lexical analysis, or scanning, forms the first phase of a compiler.
The lexical analyzer reads the stream of characters that make up the source program
and groups them into meaningful sequences called lexemes.
For each lexeme, the lexical analyzer produces a token as output.
A token format is shown below.
<token-name, attribute-value>
These tokens pass on to the subsequent phase known as syntax analysis.
The token elements are listed below:
Token-name: This is an abstract symbol used during syntax analysis.
Attribute-value: This points to an entry in the symbol table for the
corresponding token.
Information from the symbol-table entry is needed for semantic analysis and code
generation.
For example, let us analyze a simple arithmetic expression in the compiler
context:
position = initial + rate * 60
The individual units in the above expression can be grouped into lexemes
Lexeme    | Token
position  | Identifier
=         | Assignment Symbol
initial   | Identifier
+         | Addition Operator
rate      | Identifier
*         | Multiplication Operator
60        | Constant
The expression is seen by Lexical analyzer as
<id,1> <=> <id,2> <+> <id,3> <*> <60>
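The grouping of characters into lexemes and tokens described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scanner of any particular compiler: the token pattern names (NUMBER, ID, OP, SKIP) and the function name tokenize are assumptions made for this example, and the constant 60 is emitted as ("num", 60) rather than the bare <60> used above.

```python
import re

# Token patterns, tried in order. The names are illustrative assumptions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[=+*]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []   # identifiers, in order of first appearance
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue    # white space separates lexemes but yields no token
        if kind == "ID":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # Attribute value: 1-based position of the name in the symbol table.
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "NUMBER":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))   # operators need no attribute value
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
for token in tokens:
    print(token)
```

Characters that match no pattern are silently skipped by this sketch; a real scanner would report them as lexical errors.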
Syntax Analysis
Syntax analysis forms the second phase of the compiler.
The list of tokens produced by the lexical analysis phase forms the input to this phase, which
arranges them into a tree structure (called the syntax tree). This reflects the structure of the
program. This phase is also called parsing.
The syntax tree consists of interior nodes, each representing an operation, whose children
represent the arguments of that operation. A syntax tree can be built for the token stream in the
above example.
Operators form the roots of this syntax tree and of its subtrees. In the above case, = has a left
and a right child. The left child is <id,1>; the right-hand side is parsed in turn, and its
top-level operator (here +) becomes the right child.
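One way to build such a tree is a small recursive-descent parser. The sketch below is an assumption-laden illustration: the grammar (assignment, expr, term, factor), the Parser class, and the tuple encoding of tree nodes are all chosen for this example and are not prescribed by the notes. The token list is the one for position = initial + rate * 60.

```python
# Tokens as a lexical analyzer might emit them for:
#   position = initial + rate * 60
TOKENS = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("id", 3), ("*", None), ("num", 60)]

class Parser:
    """Recursive-descent parser for a tiny, assumed grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Name of the next token, or None at end of input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def assignment(self):
        # assignment -> id '=' expr
        target = self.advance()
        if target[0] != "id" or self.peek() != "=":
            raise SyntaxError("expected: id = expression")
        self.advance()
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError("unexpected token after expression")
        return tree

    def expr(self):
        # expr -> term ('+' term)*
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def term(self):
        # term -> factor ('*' factor)*   (binds tighter than '+')
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # factor -> id | num
        if self.peek() in ("id", "num"):
            return self.advance()
        raise SyntaxError("expected identifier or number")

tree = Parser(TOKENS).assignment()
print(tree)
```

Because term is parsed below expr, the multiplication ends up deeper in the tree than the addition, which is how the tree records operator precedence.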
Semantic analysis
Semantic analysis forms the third phase of the compiler. This phase uses the syntax tree
and the information in the symbol table to check the source program for consistency with
the language definition.
This phase also collects type information and saves it in either the syntax tree or the
symbol table, for subsequent use during intermediate-code generation.
Type checking forms an important part of semantic analysis. Here the compiler checks
whether each operator has matching operands.
Coercions (some type conversions) may be permitted by the language specifications.
If the operator is applied to a floating-point number and an integer, the compiler may
convert or coerce the integer into a floating-point number.
Coercion occurs in the example quoted:
position = initial + rate * 60
Suppose the variables position, initial and rate are declared as float. Since <id,3> (rate) is
a floating-point value, 60 is also converted to floating point.
The syntax tree is modified to include the conversion. In the example quoted, 60 is converted
to float by an inttofloat node.
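The insertion of the inttofloat conversion can be sketched as a recursive walk over the syntax tree. This is a minimal sketch under stated assumptions: the tuple encoding of the tree, the TYPES table, and the function name check are hypothetical, and only the two types int and float are handled.

```python
# Declared types of symbol-table entries 1..3 (position, initial, rate),
# assumed to be float as in the example.
TYPES = {1: "float", 2: "float", 3: "float"}

# Syntax tree for: position = initial + rate * 60
TREE = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("num", 60))))

def check(node):
    """Return (typed_node, type), inserting inttofloat where int meets float."""
    kind = node[0]
    if kind == "id":
        return node, TYPES[node[1]]
    if kind == "num":
        return node, "int"
    # Interior node: (operator, left, right)
    op, left, right = node
    left, ltype = check(left)
    right, rtype = check(right)
    if op == "=":
        # Only widening is allowed on assignment in this sketch.
        if ltype == "float" and rtype == "int":
            right = ("inttofloat", right)
        elif ltype != rtype:
            raise TypeError("cannot assign float to int without a cast")
        return (op, left, right), ltype
    if ltype != rtype:
        # Mixed int/float arithmetic: coerce the int operand to float.
        if ltype == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
    result = "float" if "float" in (ltype, rtype) else "int"
    return (op, left, right), result

typed_tree, result_type = check(TREE)
print(typed_tree)
```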
Intermediate Code Generation
Intermediate code generation forms the fourth phase of the compiler. After syntax and semantic
analysis of the source program, many compilers generate a low-level or machine-like intermediate
representation, which can be thought of as a program for an abstract machine.
This intermediate representation must have two important properties:
(a) It should be easy to produce
(b) It should be easy to translate into the target machine.
The above example is converted into the following three-address code sequence:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
There are several points worth noting about three-address instructions.
1. First, each three-address assignment instruction has at most one operator on the right side.
Thus, these instructions fix the order in which operations are done: the multiplication
precedes the addition in the source program.
2. Second, the compiler must generate a temporary name to hold the value computed by a
three-address instruction.
3. Third, some "three-address instructions", like the first and last in the sequence, have
fewer than three operands.
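A three-address sequence like this one can be produced by a post-order walk of the syntax tree, creating one temporary per interior node. The following is a sketch only: the tuple encoding of the tree and the function names gen_tac and operand are assumptions made for illustration.

```python
from itertools import count

# Syntax tree after semantic analysis for: position = initial + rate * 60
TREE = ("=", ("id", 1),
        ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("num", 60)))))

def gen_tac(tree):
    """Flatten a syntax tree into a list of three-address instructions."""
    code = []
    temps = count(1)   # supplies t1, t2, t3, ...

    def operand(node):
        # Emit code for a subtree and return the name that holds its value.
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "num":
            return str(node[1])
        if kind == "inttofloat":
            arg = operand(node[1])
            temp = f"t{next(temps)}"
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        # Binary operator: children first, then one instruction for this node.
        op, left, right = node
        lhs, rhs = operand(left), operand(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    _, target, value = tree   # the root is ("=", target, value)
    code.append(f"{operand(target)} = {operand(value)}")
    return code

tac = gen_tac(TREE)
print("\n".join(tac))
```

Allocating each temporary only after its operands have been generated is what makes the numbering come out in evaluation order.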
Code Optimization
Code Optimization forms the fifth phase in the compiler design.
This is a machine-independent phase which attempts to improve the intermediate code
for generating better (faster) target code.
For example, a straightforward algorithm generates the intermediate code using an
instruction for each operator in the tree representation that comes from the semantic
analyzer.
The optimizer can deduce that the conversion of 60 from integer to floating point can be
done once and for all at compile time, so the inttofloat operation can be eliminated by
replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once to transmit its value to id1 so the optimizer can transform
the above three-address code sequence into a shorter sequence.
t1 = id3 * 60.0
id1 = id2 + t1
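The two improvements described above (folding the conversion at compile time and removing the final copy) can be sketched as two passes over the three-address code. This is an illustrative sketch, not a general optimizer: the tuple layout (dest, op, arg1, arg2), the "copy" opcode name, and the convention that temporaries start with "t" are assumptions, and pass 2 assumes the copied temporary is not used anywhere else.

```python
# Three-address code as (dest, op, arg1, arg2) tuples.
CODE = [
    ("t1",  "inttofloat", "60",  None),
    ("t2",  "*",          "id3", "t1"),
    ("t3",  "+",          "id2", "t2"),
    ("id1", "copy",       "t3",  None),
]

def optimize(code):
    # Pass 1: evaluate inttofloat of an integer constant at compile time and
    # substitute the resulting float constant into later instructions.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            constants[dest] = str(float(int(a)))   # "60" becomes "60.0"
        else:
            folded.append((dest, op, a, b))

    # Pass 2: when a temporary is computed and immediately copied to another
    # name, compute directly into that name (assumes no later use of the temp).
    result = []
    for instr in folded:
        dest, op, a, b = instr
        if op == "copy" and result and result[-1][0] == a and a.startswith("t"):
            previous = result.pop()
            result.append((dest,) + previous[1:])
        else:
            result.append(instr)
    return result

optimized = optimize(CODE)
for instr in optimized:
    print(instr)
```

The sketch does not renumber temporaries, so the surviving temporary keeps the name t2 where the shortened sequence in the notes calls it t1.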
Code Generator
Code Generator forms the sixth phase in the compiler design. This takes the
intermediate representation of the source program as input and maps it to the target
language.
The intermediate instructions are translated into sequences of machine instructions
that perform the same task. A critical aspect of code generation is the assignment of
registers to hold variables.
Using registers R1 and R2, the intermediate code is translated into the following machine code:
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
The first operand of each instruction specifies a destination. The F in each instruction
indicates that it operates on floating-point numbers.
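A naive code generator for this instruction set can be sketched as follows. It is a sketch under stated assumptions: the tuple layout of the intermediate code, the OPCODES table, the rule that names starting with "t" are temporaries, and the simple "next free register" policy are all hypothetical, and there is no register reuse or spilling.

```python
# Optimized three-address code as (dest, op, arg1, arg2) tuples.
CODE = [
    ("t1",  "*", "id3", "60.0"),
    ("id1", "+", "id2", "t1"),
]

OPCODES = {"*": "MULF", "+": "ADDF"}

def is_constant(name):
    try:
        float(name)
        return True
    except ValueError:
        return False

def generate(code):
    asm = []
    reg_of = {}     # temporary name -> register currently holding its value
    next_reg = 1

    def source(name):
        # Constants become immediates (#60.0); variables are used by name.
        return f"#{name}" if is_constant(name) else name

    for dest, op, a, b in code:
        # First operand: reuse the register of a temporary, else load it.
        if a in reg_of:
            ra = reg_of[a]
        else:
            ra = f"R{next_reg}"
            next_reg += 1
            asm.append(f"LDF {ra}, {source(a)}")
        # Second operand: register of a temporary, an immediate, or a load.
        if b in reg_of:
            rb = reg_of[b]
        elif is_constant(b):
            rb = source(b)
        else:
            rb = f"R{next_reg}"
            next_reg += 1
            asm.append(f"LDF {rb}, {b}")
        asm.append(f"{OPCODES[op]} {ra}, {ra}, {rb}")
        if dest.startswith("t"):
            reg_of[dest] = ra                 # temporaries stay in a register
        else:
            asm.append(f"STF {dest}, {ra}")   # variables are stored to memory
    return asm

assembly = generate(CODE)
print("\n".join(assembly))
```

Registers are handed out in the order they are needed, so the output matches the listing above except that the roles of R1 and R2 are swapped.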
Symbol-Table Management
An essential function of a compiler is to record the variable names used in the source
program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its
type, its scope (where in the program its value may be used), and in the case of
procedure names, such things as the number and types of its arguments, the method of
passing each argument (for example, by value or by reference), and the type returned.
The symbol table is a data structure containing a record for each variable name, with
fields for the attributes of the name.
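As a rough illustration of such a data structure, the sketch below stores one record per name, keyed by scope and name. The class names Symbol and SymbolTable, the field names, and the two-level lookup rule (current scope, then global) are assumptions made for this example, and the procedure name area is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Symbol:
    name: str
    type: str
    scope: str = "global"
    # For procedure names: (type, passing mode) per argument, and return type.
    params: list = field(default_factory=list)
    returns: Optional[str] = None

class SymbolTable:
    def __init__(self):
        self._entries = {}   # (scope, name) -> Symbol

    def insert(self, symbol):
        key = (symbol.scope, symbol.name)
        if key in self._entries:
            raise KeyError(f"{symbol.name} already declared in scope {symbol.scope}")
        self._entries[key] = symbol

    def lookup(self, name, scope="global"):
        # Search the given scope first, then fall back to the global scope.
        return self._entries.get((scope, name)) or self._entries.get(("global", name))

table = SymbolTable()
table.insert(Symbol("position", "float"))
table.insert(Symbol("rate", "float"))
table.insert(Symbol("area", "procedure",
                    params=[("float", "by value"), ("float", "by reference")],
                    returns="float"))
print(table.lookup("rate"))
```

A hash-based dictionary is used here because every phase needs fast lookup by name; production compilers typically use a stack or chain of such tables to model nested scopes.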
Error Handling
One of the most important functions of a compiler is the detection and reporting of
errors in the source program. The error message should allow the programmer to
determine exactly where the errors have occurred. Errors may occur in any or all of the
phases of a compiler.
Whenever a phase of the compiler discovers an error, it must report the error to the
error handler, which issues an appropriate diagnostic message. Both the table-management
and error-handling routines interact with all phases of the compiler.
The Grouping of Phases into Passes
Activities from several phases may be grouped together into a pass that reads an input
file and writes an output file.
For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis,
and intermediate code generation might be grouped together into one pass.
Code optimization might be an optional pass.
There could then be a back-end pass consisting of code generation for a particular target machine.
Some compiler collections have been created around carefully designed intermediate
representations that allow the front end for a particular language to interface with the
back end for a certain target machine.
With these collections, we can produce compilers for different source languages for one
target machine by combining different front ends with the back end for that target
machine.
Similarly, we can produce compilers for different target machines, by combining a front
end with back ends for different target machines.
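The idea of combining front ends and back ends through a shared intermediate representation can be illustrated with a toy sketch. Everything here is hypothetical: the two "source languages" (infix and prefix addition), the two "targets" (a stack machine and a register machine), and all function names are invented for this example.

```python
# Front ends: each translates its own source syntax into the same IR,
# here a single tuple ("add", left, right).
def infix_front_end(src):        # e.g. "a + b"
    left, _, right = src.split()
    return ("add", left, right)

def prefix_front_end(src):       # e.g. "(+ a b)"
    _, left, right = src.strip("()").split()
    return ("add", left, right)

# Back ends: each translates the shared IR into code for one target.
def stack_back_end(ir):
    _, a, b = ir
    return [f"push {a}", f"push {b}", "add"]

def register_back_end(ir):
    _, a, b = ir
    return [f"LD R1, {a}", f"LD R2, {b}", "ADD R1, R1, R2"]

def make_compiler(front_end, back_end):
    """Combine any front end with any back end via the shared IR."""
    return lambda src: back_end(front_end(src))

infix_to_stack = make_compiler(infix_front_end, stack_back_end)
prefix_to_register = make_compiler(prefix_front_end, register_back_end)
print(infix_to_stack("a + b"))
print(prefix_to_register("(+ a b)"))
```

With two front ends and two back ends, four compilers can be assembled from four components, which is the economy the grouping into passes is meant to achieve.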