Unit-1: Introduction To Compilers
A translator can be defined as: “A device that changes a sentence from one
language to another without change of meaning.”
1.1.1 Assemblers
In the early days of programming, machine code (binary) was the only option.
Unfortunately this was laborious, prone to error and difficult. A slight
improvement on this was the use of hexadecimal or octal which reduced the
number of errors and the time to enter the program. Eventually assembly
languages were developed which were easier and more productive to use
whilst preserving the speed and compactness of machine code.
Assembly languages vary from one type of computer to another (or, more
correctly, from processor to processor), which makes it difficult to
transport programs from one computer to another.
Assemblers are the simplest of all the translators to understand, since the
majority of the statements in the source code are mnemonics (short words
that help you to remember something) representing specific binary patterns;
the others are labels, and directives (or pseudo-ops) that give instructions
to the assembler itself.
So, for instance, rather than enter the binary pattern 01011100, which might
mean “Increment the contents of the Accumulator by 1”, we could type in the
mnemonic “INC”, which the assembler would translate into the appropriate
binary pattern.
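The translation step can be sketched as a table lookup from mnemonic to bit pattern. This is a toy, one-pass assembler in Python for a hypothetical instruction set: the opcode table (including INC as 01011100) is invented to match the example above, and labels and directives are not handled.

```python
# A toy one-pass assembler for a HYPOTHETICAL 8-bit instruction set.
# The opcode table is invented for illustration; real encodings depend
# on the processor.
OPCODES = {
    "INC": "01011100",  # hypothetical: increment the accumulator by 1
    "DEC": "01011101",  # hypothetical: decrement the accumulator by 1
    "NOP": "00000000",  # hypothetical: do nothing
}


def assemble(source):
    """Translate one mnemonic per line into its binary pattern."""
    output = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        mnemonic = line.split(";")[0].strip().upper()  # drop comments after ';'
        if not mnemonic:
            continue  # skip blank and comment-only lines
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_no}: unknown mnemonic {mnemonic!r}")
        output.append(OPCODES[mnemonic])
    return output


if __name__ == "__main__":
    program = "INC\nINC ; add 2 in total\nDEC\n"
    for pattern in assemble(program):
        print(pattern)
```

An unknown mnemonic is reported with its line number, which is the assembler-level counterpart of the error reporting expected from a compiler.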
Compiler Design
1.1.2 Interpreters
1.1.3 Compilers
A compiler is a program that can read a program in one language (the source
language) and translate it into an equivalent program in another language
(the target language). An important role of the compiler is to report any
errors in the source program that it detects during the translation process.
Figure: Source program (input) -> Compiler -> Target program
Character stream
  -> Lexical Analysis -> token stream
  -> Syntax Analysis -> syntax tree
  -> Semantic Analysis -> syntax tree
  -> Intermediate Code Generator -> intermediate representation
  -> Code Generator
(The Symbol Table is used by all of these phases.)
Phases of a Compiler
The analysis part breaks up the source program into constituent pieces and
imposes a grammatical structure on them. It then uses this structure to
create an intermediate representation of the source program. The analysis
part also collects information about the source program and stores it in a
data structure called a symbol table, which is passed along with the
intermediate representation to the synthesis part.
The synthesis part constructs the desired target program from the
intermediate representation and the information in the symbol table.
The analysis part is often called the front end of the compiler; the synthesis
part is the back end.
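The division of labour between the two parts can be sketched in a few lines of Python. Everything here is hypothetical: a made-up language whose only statement is name = operand op operand, a quadruple IR, a dict as the symbol table, and an invented LOAD/ADD/SUB/STORE target. The point is only that front_end returns the intermediate representation together with the symbol table, and back_end consumes both.

```python
# Front end / back end split for a tiny HYPOTHETICAL language whose only
# statement form is "name = operand op operand" with op in {+, -}.
# The IR is a list of quadruples (op, arg1, arg2, result); the target
# "assembly" (LOAD/ADD/SUB/STORE) is invented for illustration.


def front_end(source):
    """Analysis: check each statement, fill the symbol table, build the IR."""
    symbol_table, ir = {}, []
    for line_no, line in enumerate(source.strip().splitlines(), start=1):
        parts = line.split()
        if (len(parts) != 5 or not parts[0].isidentifier()
                or parts[1] != "=" or parts[3] not in ("+", "-")):
            raise SyntaxError(f"line {line_no}: expected 'name = a op b'")
        target, _, left, op, right = parts
        for name in (target, left, right):
            if name.isidentifier():  # numbers are not entered in the table
                symbol_table.setdefault(name, {"first_line": line_no})
        ir.append((op, left, right, target))
    return ir, symbol_table


def back_end(ir, symbol_table):
    """Synthesis: build target code from the IR and the symbol table."""
    mnemonic = {"+": "ADD", "-": "SUB"}
    code = [f"; variables: {', '.join(sorted(symbol_table))}"]
    for op, left, right, target in ir:
        code += [f"LOAD {left}", f"{mnemonic[op]} {right}", f"STORE {target}"]
    return code


if __name__ == "__main__":
    ir, table = front_end("x = a + b\ny = x - 1\n")
    print("\n".join(back_end(ir, table)))
```

With the sample input, the front end enters a, b, x and y in the symbol table (the constant 1 is not a name), and the back end emits a LOAD, operation, STORE triple for each quadruple.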
The lexical analyzer is the interface between the source program and the
compiler. It reads the source program one character at a time, carving the
source program into a sequence of atomic units called tokens. In other words,
the main function of the lexical analyzer is to identify the tokens, which may
be identifiers, keywords, constants, operators, and punctuation symbols
such as commas and parentheses.
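A lexical analyzer of this kind is often built from regular expressions, one per token class. The sketch below assumes a hypothetical C-like token set (the keyword list and the patterns are invented for illustration) and returns (token-class, lexeme) pairs.

```python
import re

# Token classes for a small HYPOTHETICAL C-like language; the keyword set
# and the regular expressions are assumptions made for illustration.
KEYWORDS = {"if", "else", "while", "int", "float"}
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"==|<=|>=|!=|[+\-*/=<>]"),
    ("PUNCT", r"[(),;{}]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),  # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Carve the character stream into (token-class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # white space separates tokens but is not a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r} at {match.start()}")
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # keywords look like identifiers but are reserved
        tokens.append((kind, lexeme))
    return tokens


if __name__ == "__main__":
    for token in tokenize("if (count >= 10) total = total + 2.5;"):
        print(token)
```

Because the alternatives are tried in order, the two-character operators are listed before the single-character ones, so ">=" is read as one token rather than ">" followed by "=".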
The characters in this statement are mapped into the following eight tokens
passed on to the syntax analyzer:
The second phase of the compiler is syntax analysis, or parsing. A parser has
two functions:
(i) to check whether the tokens occurring in the input are permitted by
the specification of the source language, and
(ii) to impose on the sequence of tokens a tree-like structure, also called
a parse tree.
The parser uses the first components of the tokens (the token names) produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream. A typical representation is a
syntax tree in which each interior node represents an operation and the
children of the node represent the arguments of the operation.
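A recursive-descent parser is one common way to carry out both parser functions at once: it rejects token sequences the grammar does not allow, and it builds a syntax tree whose interior nodes are operators. The sketch below assumes a small expression grammar (shown in the comments) and represents each interior node as a tuple (operator, left, right); neither is taken from a particular compiler.

```python
import re

# Recursive-descent parser for a HYPOTHETICAL expression grammar
# (left-associative; * and / bind tighter than + and -):
#   expr   -> term (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NAME | NUMBER | "(" expr ")"
# Interior nodes are tuples (operator, left, right); leaves are strings.


def parse(text):
    tokens = re.findall(r"\w+|[-+*/()]|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is not None and re.fullmatch(r"\w+", tok):
            return advance()  # a name or number becomes a leaf
        raise SyntaxError(f"unexpected token {tok!r}")

    def binary(operand, operators):
        node = operand()
        while peek() in operators:
            op = advance()
            node = (op, node, operand())  # left-associative grouping
        return node

    def term():
        return binary(factor, ("*", "/"))

    def expr():
        return binary(term, ("+", "-"))

    tree = expr()
    if peek() is not None:  # leftover tokens are not permitted by the grammar
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("A / B * C"))  # ('*', ('/', 'A', 'B'), 'C'), i.e. (A / B) * C
```

Writing one function per nonterminal keeps the code close to the grammar; the `binary` helper is a design shortcut that handles both precedence levels with the same loop.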
The parser checks whether the output of the lexical analyzer satisfies the
context-free grammar (CFG) of the source language.
The semantic analyzer uses the syntax tree and the information in the symbol
table to check the source program for semantic consistency with the language
definition. It also gathers type information and saves it in either the syntax
tree or the symbol table, for subsequent use during intermediate-code
generation. An important part of semantic analysis is type checking, where
the compiler checks that each operator has matching operands. For example,
many programming language definitions require an array index to be an
integer; the compiler must report an error if a floating-point number is used
to index an array.
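The array-index rule can be illustrated with a tiny type checker that computes a type for each node of a syntax tree and reports an error when an index is not an integer. This is a sketch only: the node shapes, the type names such as "array of float", and the symbol-table layout (a dict from name to type) are assumptions made for the example.

```python
# A minimal type checker over HYPOTHETICAL syntax-tree nodes:
#   ("num", value)          literal (int or float)
#   ("var", name)           variable looked up in the symbol table
#   ("index", array, expr)  array element access
#   (op, left, right)       binary arithmetic, op in + - * /


def check(node, symbols):
    """Return the type of `node`, raising TypeError on a semantic error."""
    kind = node[0]
    if kind == "num":
        return "int" if isinstance(node[1], int) else "float"
    if kind == "var":
        if node[1] not in symbols:
            raise TypeError(f"undeclared name {node[1]!r}")
        return symbols[node[1]]
    if kind == "index":
        array_type = check(node[1], symbols)
        if not array_type.startswith("array of "):
            raise TypeError("subscripted value is not an array")
        if check(node[2], symbols) != "int":
            raise TypeError("array index must be an integer")
        return array_type[len("array of "):]  # element type
    left, right = check(node[1], symbols), check(node[2], symbols)
    if not {left, right} <= {"int", "float"}:
        raise TypeError(f"operands of {kind!r} must be numeric")
    return "float" if "float" in (left, right) else "int"


if __name__ == "__main__":
    symbols = {"a": "array of float", "i": "int", "x": "float"}
    print(check(("index", ("var", "a"), ("var", "i")), symbols))  # float
    try:
        check(("index", ("var", "a"), ("var", "x")), symbols)
    except TypeError as err:
        print("error:", err)  # error: array index must be an integer
```

The checker records nothing back into the tree; a real semantic analyzer would typically save the computed types for the intermediate-code generator, as described above.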
For example, the statement A/B*C can be parsed into either of two trees, (a)
or (b), depending on how its operators are grouped:
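(a) A / (B * C): the root is /, with children A and *; the * node has
    children B and C.
(b) (A / B) * C: the root is *, with children / and C; the / node has
    children A and B.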
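To see why the grouping matters, assume the hypothetical values A = 8, B = 4 and C = 2 (chosen only for illustration). Evaluating each tree gives a different result:

```latex
\begin{aligned}
\text{(a)}\quad A/(B \times C) &= 8/(4 \times 2) = 1 \\
\text{(b)}\quad (A/B) \times C &= (8/4) \times 2 = 4
\end{aligned}
```

In most languages, including C, * and / have equal precedence and group from left to right, so A/B*C is read as tree (b).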
Three-address code for trees (a) and (b):
(a) T1 = B * C
    T2 = A / T1
(b) T1 = A / B
    T2 = T1 * C
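Three-address code like this can be produced by a post-order walk of the syntax tree: each interior node is visited after its children and is given a fresh temporary. A minimal sketch in Python, assuming trees are nested (operator, left, right) tuples with string leaves; the temporary names T1, T2, ... mirror the notation above.

```python
import itertools

# Generate three-address code from a syntax tree by a post-order walk.
# Trees have the shape (operator, left, right) with string leaves.


def three_address_code(tree):
    counter = itertools.count(1)  # supplies T1, T2, ...
    code = []

    def gen(node):
        if isinstance(node, str):
            return node  # a leaf is already an address
        op, left, right = node
        left_addr, right_addr = gen(left), gen(right)  # children first
        temp = f"T{next(counter)}"
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    gen(tree)
    return code


if __name__ == "__main__":
    tree_a = ("/", "A", ("*", "B", "C"))  # A / (B * C)
    tree_b = ("*", ("/", "A", "B"), "C")  # (A / B) * C
    print(three_address_code(tree_a))  # ['T1 = B * C', 'T2 = A / T1']
    print(three_address_code(tree_b))  # ['T1 = A / B', 'T2 = T1 * C']
```

Keeping the left and right addresses in order matters for non-commutative operators such as /: for tree (a) the result is A divided by T1, not T1 divided by A.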
if A > B GOTO L2
GOTO L3
L2:
----------
----------
L3:
----------
----------
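The first two statements of this sequence (a conditional jump around an
unconditional jump) can be replaced by the single statement below; the label
L2 is then no longer referenced and can be dropped:
if A <= B GOTO L3
----------
----------
L3:
----------
----------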
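A peephole optimizer can apply this rewrite mechanically by scanning for a conditional jump whose target label immediately follows an unconditional jump. The Python sketch below assumes a tuple representation of instructions (shown in the comments). It also assumes that a label no jump refers to can be deleted, and that negating the comparison is valid (true for integers, but not for floating-point NaN).

```python
# Peephole sketch for the jump-over-jump pattern.
# Instruction tuples (a representation assumed for illustration):
#   ("if", left, relop, right, label)   conditional jump
#   ("goto", label)                     unconditional jump
#   ("label", name)                     jump target
#   ("code", text)                      any other statement
NEGATE = {">": "<=", "<=": ">", "<": ">=", ">=": "<", "==": "!=", "!=": "=="}


def remove_jump_over_jump(code):
    out, i = [], 0
    while i < len(code):
        ins = code[i]
        # Match: if cond GOTO La / GOTO Lb / La:
        if (ins[0] == "if" and i + 2 < len(code)
                and code[i + 1][0] == "goto"
                and code[i + 2] == ("label", ins[4])):
            _, left, relop, right, _ = ins
            out.append(("if", left, NEGATE[relop], right, code[i + 1][1]))
            out.append(code[i + 2])  # keep La for now; removed below if unused
            i += 3
        else:
            out.append(ins)
            i += 1
    # Drop labels that no jump refers to any more.
    targets = {ins[-1] for ins in out if ins[0] in ("if", "goto")}
    return [ins for ins in out if ins[0] != "label" or ins[1] in targets]


if __name__ == "__main__":
    program = [
        ("if", "A", ">", "B", "L2"),
        ("goto", "L3"),
        ("label", "L2"),
        ("code", "statements of L2"),
        ("label", "L3"),
        ("code", "statements of L3"),
    ]
    for ins in remove_jump_over_jump(program):
        print(ins)
```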
Loop Optimization: A typical improvement is to move a computation that
produces the same result each time around the loop to a point in the
program just before the loop is entered.
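This transformation is usually called loop-invariant code motion. The sketch below shows the core test on a hypothetical quadruple representation: a statement is moved to the preheader (the point just before the loop) when none of its operands is assigned in the loop, its target is assigned only once, and the target is not read earlier in the loop body. It assumes a loop body without internal branches that runs at least once; a production compiler would need dominance and liveness information to relax these assumptions.

```python
# Loop-invariant code motion on a straight-line loop body.
# Statements are quadruples (target, op, arg1, arg2); this representation,
# and the assumption that the body has no branches and runs at least once,
# are simplifications made for the sketch.


def hoist_invariants(body):
    """Return (preheader, new_body): invariant statements move to the preheader."""
    preheader, remaining = [], list(body)
    changed = True
    while changed:
        changed = False
        assigned = [stmt[0] for stmt in remaining]  # names written in the loop
        for index, (target, _, arg1, arg2) in enumerate(remaining):
            read_earlier = any(target in (s[2], s[3]) for s in remaining[:index])
            if (arg1 not in assigned and arg2 not in assigned
                    and assigned.count(target) == 1 and not read_earlier):
                preheader.append(remaining.pop(index))
                changed = True
                break  # recompute `assigned` after every move
    return preheader, remaining


if __name__ == "__main__":
    loop_body = [
        ("t1", "*", "n", "4"),      # n and 4 never change in the loop
        ("t2", "+", "t1", "base"),  # invariant once t1 has been hoisted
        ("x", "+", "x", "t2"),      # x changes every iteration
        ("i", "+", "i", "1"),       # loop counter
    ]
    before_loop, new_body = hoist_invariants(loop_body)
    print("before the loop:", before_loop)
    print("loop body:", new_body)
```

Repeating the scan after each move lets invariance propagate: t2 only qualifies once t1 is no longer assigned inside the loop.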
A compiler needs to collect information about all the data objects that appear
in the source program. The information is collected by the early phases of the
compiler (lexical and syntactic analysis) and entered into the symbol table.
The symbol table is a data structure containing a record for each variable
name, with fields for the attributes of the name. The data structure should be
designed to allow the compiler to find the record for each name quickly and to
store or retrieve data from that record quickly.
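A hash table meets this requirement, since finding, inserting or updating a record takes constant time on average. The Python sketch below uses a dict keyed by name; the record fields (type, declaration line, and an open-ended attribute map) and the rejection of a second declaration are design assumptions, not a fixed layout.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolRecord:
    """One record per name; these attribute fields are illustrative assumptions."""
    name: str
    type_name: str
    line_declared: int
    attributes: dict = field(default_factory=dict)  # room for more fields


class SymbolTable:
    """Symbol table backed by a Python dict (a hash table)."""

    def __init__(self):
        self._records = {}

    def insert(self, name, type_name, line):
        """Create the record for `name`; reject a second declaration."""
        if name in self._records:
            first = self._records[name].line_declared
            raise ValueError(f"{name!r} already declared on line {first}")
        record = SymbolRecord(name, type_name, line)
        self._records[name] = record
        return record

    def lookup(self, name):
        """Return the record for `name`, or None if it was never entered."""
        return self._records.get(name)

    def set_attribute(self, name, key, value):
        """Store an extra attribute found by a later phase."""
        self._records[name].attributes[key] = value


if __name__ == "__main__":
    table = SymbolTable()
    table.insert("rate", "float", 3)
    table.set_attribute("rate", "offset", 8)  # hypothetical storage offset
    print(table.lookup("rate"))
    print(table.lookup("undeclared"))  # None
```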
The most successful tools are those that hide the details of the generation
algorithm and produce components that can be easily integrated into the
remainder of the compiler. Some commonly used compiler-construction tools
include:
1. Compiler: spends a lot of time analyzing and processing the program.
   Interpreter: spends relatively little time analyzing and processing the
   program.
2. Compiler: the resulting executable is some form of machine-specific
   binary code.
   Interpreter: the resulting code is some sort of intermediate code.
3. Compiler: the computer hardware interprets (executes) the resulting code.
   Interpreter: the resulting code is interpreted by another program.
Representation of Differences: Compiler vs. Interpreter
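The contrast in the table can be illustrated on the syntax tree for (A / B) * C. In the Python sketch below, `interpret` walks the tree directly each time it runs, while `compile_to_stack_code` translates the tree once into instructions for a hypothetical stack machine; the small `run` function only stands in for the hardware that would execute real machine code. The instruction names PUSH and APPLY are invented for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Trees have the shape (operator, left, right); leaves are variable names
# (strings) or numbers.


def interpret(tree, env):
    """Interpreter: walks the syntax tree directly every time it runs."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](interpret(left, env), interpret(right, env))
    return env[tree] if isinstance(tree, str) else tree


def compile_to_stack_code(tree):
    """Compiler: translates the tree once into stack-machine instructions."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return (compile_to_stack_code(left) + compile_to_stack_code(right)
                + [("APPLY", op)])
    return [("PUSH", tree)]


def run(code, env):
    """Stand-in for the target machine executing the compiled code."""
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(env[arg] if isinstance(arg, str) else arg)
        else:
            right, left = stack.pop(), stack.pop()  # right operand is on top
            stack.append(OPS[arg](left, right))
    return stack.pop()


if __name__ == "__main__":
    tree = ("*", ("/", "A", "B"), "C")  # (A / B) * C
    env = {"A": 8, "B": 4, "C": 2}
    code = compile_to_stack_code(tree)  # translate once
    print(code)
    print(interpret(tree, env), run(code, env))  # 4.0 4.0
```

Both routes compute 4.0 for A = 8, B = 4, C = 2, but the compiled code can be run again with new values without re-examining the tree.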