Introduction
Construction
Lecture 1
Compiler
• A program in a given source language is either
compiled or interpreted for execution.
• A compiler is a program that translates a source
program written in a high-level language (HLL, e.g. C
or Java) into target code: relocatable machine code
or assembly code.
– The generated machine code can later be
executed many times, against different data
each time.
– The generated code is specific to the target
machine and is not portable to other systems.
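The "translate once, execute many times" idea can be sketched with Python's built-in compile() and exec(). This is only an analogy: compile() produces CPython bytecode rather than machine code, and the variable names price and quantity are made up for the example.

```python
# Compile once: translate the source text into a code object (CPython bytecode,
# standing in here for the target code a real compiler would emit).
source = "result = price * quantity"
code = compile(source, "<example>", "exec")

def run(price, quantity):
    """Execute the already-compiled code against one set of data."""
    env = {"price": price, "quantity": quantity}
    exec(code, env)
    return env["result"]

# Run many times: no re-translation is needed for new data.
totals = [run(p, q) for p, q in [(2, 3), (10, 5), (7, 0)]]
print(totals)  # [6, 50, 0]
```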
Interpreter
In an interpreted language, the implementation
executes the program's instructions directly,
without first compiling the program into
machine code.
Translation occurs at the same time as the
program is being executed.
An interpreter reads a source program written
in an HLL together with the data for this
program, and it runs the program against the
data to produce results.
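A minimal sketch of an interpreter that takes a source program and its data and produces results directly. The tiny line-oriented language (assignments of the form "name = a op b" and "print name") is hypothetical and exists only for this illustration.

```python
def interpret(source, data):
    """Run a tiny line-oriented language directly, with no translation step.

    Statements (hypothetical syntax, for illustration only):
        name = a op b     where op is + - * and a, b are names or integers
        print name
    """
    env = dict(data)          # the program's input data
    output = []

    def value(tok):
        # A token is either a known variable or an integer literal.
        return env[tok] if tok in env else int(tok)

    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b}

    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        if parts[0] == "print":
            output.append(value(parts[1]))
        else:
            target, _, left, op, right = parts
            env[target] = ops[op](value(left), value(right))
    return output

program = """
area = width * height
border = area + 4
print border
"""
print(interpret(program, {"width": 3, "height": 5}))   # [19]
print(interpret(program, {"width": 2, "height": 2}))   # [8]
```

Note that every call to interpret() re-reads and re-analyzes the same source text, which is the cost the slides describe as "less efficient".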
Common interpreters include Perl, Python, and
Ruby interpreters, which execute Perl, Python,
and Ruby code respectively.
Others include the Unix shell, which runs
operating system commands interactively.
The source program is re-interpreted every time it
is executed, which is less efficient than running
compiled code.
Interpreted programs are portable because the
source itself is not machine dependent. They can
run on any operating system and platform for which
an interpreter is available.
They are translated on the spot, so the translation
can be tailored to the system on which they’re
being run.
Compilers and Interpreters
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent
program written in a target language
[Figure: Source Program -> Compiler -> Target Program; the target program is then run on its Input]
• “Interpretation”
– Performing the operations implied by the
source program
[Figure: Source Program and Input -> Interpreter -> Output, with Error messages reported by the interpreter]
The Analysis-Synthesis Model of
Compilation
• There are two parts to compilation:
– Analysis Phase
Also known as the front end of the compiler. It
reads the source program, divides it into its constituent
parts and then checks for lexical and syntax (grammar)
errors. The analysis phase generates an intermediate
representation of the source program and a symbol table,
which are fed to the synthesis phase as input.
– Synthesis Phase
Also known as the back end of the compiler.
It generates the target program with the help of the
intermediate representation and the symbol
table.
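A minimal sketch of the two parts, assuming a toy source language whose only statement is "x = a + b". The function names analyze and synthesize, the IR tuples, and the assembly-like output are all invented for illustration.

```python
def analyze(source):
    """Front end: check the source and build an IR plus a symbol table."""
    ir, symbols = [], {}
    for lineno, line in enumerate(source.strip().splitlines(), start=1):
        parts = line.split()
        if len(parts) != 5 or parts[1] != "=" or parts[3] != "+":
            raise SyntaxError(f"line {lineno}: expected 'x = a + b'")
        target, _, left, _, right = parts
        for operand in (left, right):
            if not operand.isdigit() and operand not in symbols:
                raise NameError(f"line {lineno}: '{operand}' used before assignment")
        symbols[target] = {"defined_at": lineno}
        ir.append(("add", target, left, right))
    return ir, symbols

def synthesize(ir, symbols):
    """Back end: turn the IR into target code (a made-up assembly dialect)."""
    # Give every symbol a memory slot, in order of first definition.
    slot = {name: f"[mem+{4 * i}]" for i, name in enumerate(symbols)}

    def operand(x):
        return f"#{x}" if x.isdigit() else slot[x]

    lines = []
    for _, target, left, right in ir:
        lines.append(f"LOAD  r0, {operand(left)}")
        lines.append(f"ADD   r0, {operand(right)}")
        lines.append(f"STORE {slot[target]}, r0")
    return "\n".join(lines)

ir, symbols = analyze("x = 1 + 2\ny = x + 10")
print(synthesize(ir, symbols))
```

The back end never looks at the source text: everything it needs arrives through the IR and the symbol table.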
Other Tools that Use the
Analysis-Synthesis Model
• Editors (syntax highlighting)
• Pretty printers (e.g. Doxygen)
• Static checkers (e.g. Lint and Splint)
• Interpreters
• Text formatters (e.g. TeX and LaTeX)
• Silicon compilers (e.g. VHDL)
• Query interpreters/compilers (Databases)
Preprocessors, Compilers, Assemblers and
Linkers
• A preprocessor, generally considered part of the
compiler, is a tool that produces input for the
compiler. It deals with macro processing, file
inclusion, language extensions, etc.
• Assembler
An assembler translates assembly language programs
into machine code. The output of an assembler is called
an object file, which contains a combination of
machine instructions as well as the data required to
place these instructions in memory.
• Linker
A linker is a program that links and merges various
object files together to make an executable
file.
These files might have been produced by separate
assemblers. The major tasks of a linker are to search for
and locate the modules and routines referenced in a
program and to determine the memory locations where
this code will be loaded, so that the program's
instructions have absolute references.
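The linker's two tasks (locating referenced routines and fixing their memory locations) can be sketched as a two-pass procedure. The object-file format below is a hypothetical dict, each instruction is assumed to occupy one address unit, and the load address 0x1000 is arbitrary.

```python
def link(object_files, load_address=0x1000):
    """Merge object files and replace symbolic references with absolute addresses.

    Each object file is a dict (a hypothetical format for illustration):
        "code":    list of instructions; ("CALL", "name") refers to a symbol
        "symbols": {name: offset of its definition within this file's code}
    """
    # Pass 1: place each file in memory and record where every symbol ends up.
    addresses, base = {}, load_address
    for obj in object_files:
        for name, offset in obj["symbols"].items():
            if name in addresses:
                raise ValueError(f"duplicate symbol: {name}")
            addresses[name] = base + offset
        base += len(obj["code"])

    # Pass 2: copy the code, patching each symbolic reference.
    executable = []
    for obj in object_files:
        for instr in obj["code"]:
            if instr[0] == "CALL":
                target = instr[1]
                if target not in addresses:
                    raise ValueError(f"undefined symbol: {target}")
                instr = ("CALL", addresses[target])
            executable.append(instr)
    return executable

main_o = {"code": [("CALL", "square"), ("HALT",)], "symbols": {"main": 0}}
math_o = {"code": [("MUL",), ("RET",)], "symbols": {"square": 0}}
print(link([main_o, math_o]))
# [('CALL', 4098), ('HALT',), ('MUL',), ('RET',)]
```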
Compiler Design - Architecture of a
Compiler
• A compiler can have many phases and passes.
• Pass: A pass is one traversal by the compiler of
the entire program.
• Phase: A phase of a compiler is a distinguishable
stage, which takes input from the previous stage,
processes it, and yields output that can be used as
input for the next stage. A pass can have more than
one phase.
Phases of a Compiler
• The compilation process is a sequence of various
phases.
• Each phase takes input from the previous phase,
has its own representation of the source program, and
feeds its output to the next phase of the compiler.
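The chaining of phases can be sketched as a list of functions, each consuming the previous one's output. The four phase functions below are deliberately trivial stand-ins (they only handle "number op number") and their names are assumptions for this example.

```python
from functools import reduce

def scan(text):            # characters -> tokens
    return text.split()

def parse(tokens):         # tokens -> IR (here: a flat tuple)
    left, op, right = tokens
    return (op, int(left), int(right))

def optimize(ir):          # IR -> IR (fold a constant addition)
    op, left, right = ir
    return ("const", left + right) if op == "+" else ir

def generate(ir):          # IR -> target code (made-up instruction)
    return f"LOADI r0, {ir[1]}" if ir[0] == "const" else f"; unsupported: {ir}"

PHASES = [scan, parse, optimize, generate]

def compile_source(text):
    """Feed each phase's output to the next phase."""
    return reduce(lambda data, phase: phase(data), PHASES, text)

print(compile_source("2 + 3"))   # LOADI r0, 5
```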
[Figure: Traditional three-pass compiler, with errors reported along the way]
Phases of a Compiler - Front end
The front end analyzes the source code to
build an internal representation of the
program, called the intermediate
representation (IR).
It also manages the symbol table, a data
structure mapping each symbol in the source
code to associated information such as
location, type and scope.
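One possible shape for such a symbol table is a stack of dictionaries, one per scope. The class and method names below are illustrative assumptions, not a standard API.

```python
class SymbolTable:
    """Maps each name to its attributes, with nested scopes."""

    def __init__(self):
        self.scopes = [{}]            # scopes[0] is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_, location):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"'{name}' already declared in this scope")
        scope[name] = {"type": type_, "location": location,
                       "scope": len(self.scopes) - 1}

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is not declared")

table = SymbolTable()
table.declare("x", "int", location=(1, 5))
table.enter_scope()
table.declare("x", "float", location=(3, 9))   # shadows the outer x
print(table.lookup("x")["type"])   # float
table.exit_scope()
print(table.lookup("x")["type"])   # int
```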
Phases of a Compiler - Front end cont’d
[Figure: Source code -> Front end -> IR -> Back end -> Machine code; each stage may report errors]
• Scanner:
– Maps characters into tokens, the basic units of syntax
• x = x + y becomes <id, x> = <id, x> + <id, y>
– Typical tokens: number, id, +, -, *, /, do, end
– Eliminates white space (tabs, blanks) and comments
• A key issue is speed, so instead of using a tool like
LEX it is sometimes necessary to write your own
scanner
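A hand-written scanner for the token kinds listed above can be sketched with Python's re module. The '#' comment syntax and the token-kind names are assumptions made for this example; operators are tagged as ('op', ...) pairs rather than left bare as on the slide.

```python
import re

TOKEN_SPEC = [
    ("number", r"\d+"),
    ("keyword", r"\b(?:do|end)\b"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[+\-*/=]"),
    ("skip", r"[ \t\n]+|#[^\n]*"),   # white space and '#' comments (assumed syntax)
    ("error", r"."),                 # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Map a string of characters to a list of (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens

print(scan("x = x + y  # update x"))
# [('id', 'x'), ('op', '='), ('id', 'x'), ('op', '+'), ('id', 'y')]
```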
Front end
[Figure: Source code -> Scanner -> tokens -> Parser -> IR; both stages may report errors]
• Parser:
– Recognize context-free syntax
– Guide context-sensitive analysis
– Construct IR
– Produce meaningful error messages
– Attempt error correction
• There are parser generators like YACC which
automate much of the work
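For comparison with a generated parser, here is a minimal hand-written recursive-descent parser for arithmetic expressions. The grammar, the nested-tuple IR, and the error messages are choices made for this sketch; the inline tokenizer silently skips characters it does not recognize.

```python
import re

def parse(source):
    """Recursive-descent parser for the context-free grammar
        expr   -> term (('+'|'-') term)*
        term   -> factor (('*'|'/') factor)*
        factor -> NUMBER | ID | '(' expr ')'
    Returns the IR as nested tuples, e.g. ('+', 'x', ('*', 'y', 2))."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input, expected an operand")
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError(f"expected ')' at token {pos}, found {peek()!r}")
            advance()
            return node
        if tok.isdigit():
            return int(advance())
        if tok[0].isalpha() or tok[0] == "_":
            return advance()
        raise SyntaxError(f"unexpected {tok!r} at token {pos}, expected an operand")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at token {pos}")
    return tree

print(parse("x + y * 2"))      # ('+', 'x', ('*', 'y', 2))
print(parse("(x + y) * 2"))    # ('*', ('+', 'x', 'y'), 2)
```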
Phases of a Compiler cont’d
Middle End – The Optimizer
The middle end performs optimizations on the
intermediate representation in order to improve the
performance and the quality of the produced
machine code.
The middle end contains those optimizations that
are independent of the CPU architecture being
targeted.
– Optimization is an effort to make the generated code more efficient
– It can be very computationally intensive
Middle end (optimizer)
• Modern optimizers are usually built as a set
of passes
• Typical passes
– Constant propagation
– Common sub-expression elimination
– Redundant store elimination
– Dead code elimination
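Two of these passes can be sketched on straight-line three-address code. The instruction format (dest, op, a, b) and the "copy" opcode are assumptions for this example, and the sketch ignores control flow and side effects, which real optimizers must handle.

```python
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def propagate_constants(code):
    """Substitute known constants into operands and fold constant expressions."""
    known, out = {}, []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "copy" and isinstance(a, int):
            known[dest] = a
        elif op in OPS and isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)
            op, a, b = "copy", known[dest], None
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
        out.append((dest, op, a, b))
    return out

def eliminate_dead_code(code, live_out):
    """Drop instructions whose result is never used (walk the code backward)."""
    live, kept = set(live_out), []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return kept[::-1]

code = [
    ("a", "copy", 4, None),   # a = 4
    ("b", "+", "a", 1),       # b = a + 1
    ("t", "*", "b", "x"),     # t = b * x
    ("u", "+", "a", "b"),     # u = a + b   (never used)
    ("r", "+", "t", "a"),     # r = t + a
]
optimized = eliminate_dead_code(propagate_constants(code), live_out={"r"})
for instr in optimized:
    print(instr)
# ('t', '*', 5, 'x')
# ('r', '+', 't', 4)
```

Running the passes in this order matters here: propagation turns the uses of a and b into constants, which is what makes their defining instructions dead.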
Back end
[Figure: IR -> Instruction selection -> Register allocation -> Machine code; each stage may report errors]
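A rough sketch of the two back-end steps for an expression tree, done together in one tree walk. The instruction set (LOAD, LOADI, ADD, ...) and the four-register machine are made up, and the allocator is deliberately naive: it raises an error instead of spilling when registers run out.

```python
def generate(tree, registers=("r0", "r1", "r2", "r3")):
    """Emit code for an expression tree like ('+', 'x', ('*', 'y', 2)).
    The instruction names belong to a made-up load/store machine."""
    free = list(registers)
    code = []
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

    def alloc():
        if not free:
            raise RuntimeError("out of registers (a real allocator would spill)")
        return free.pop(0)

    def emit(node):
        """Select instructions for node; return the register holding its value."""
        if isinstance(node, int):
            reg = alloc()
            code.append(f"LOADI {reg}, {node}")
            return reg
        if isinstance(node, str):
            reg = alloc()
            code.append(f"LOAD  {reg}, {node}")
            return reg
        op, left, right = node
        lreg = emit(left)
        rreg = emit(right)
        code.append(f"{opcode[op]}   {lreg}, {lreg}, {rreg}")
        free.insert(0, rreg)       # the right operand's register is free again
        return lreg

    result = emit(tree)
    return code, result

code, result = generate(("+", "x", ("*", "y", 2)))
print("\n".join(code))
print("result in", result)
```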