Compiler Construction CS-4207 Lecture - 01 - 02: Input Output Target Program
Compiler Construction
CS-4207
Lecture – 01 - 02
Introduction
Programming languages are notations for describing computations to people and to machines. The
world as we know it depends on programming languages, because all the software running on all
the computers was written in some programming language. But before a program can be run, it
must first be translated into a form in which it can be executed by a computer.
The software systems that do this translation are called compilers.
What is a Compiler?
A compiler is a program that reads a program in one language (the source language) and translates
it into an equivalent program in another language (the target language). An important role of the
compiler is to report any errors in the source program that it detects during the translation process.
If the target program is an executable machine-language program, the user can then run it to
process inputs and produce outputs.
What is an Interpreter?
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in the
source program on inputs specified by the user.
[Figure: an interpreter takes the source code and the inputs, and produces the outputs.]
Atif Ishaq - Lecturer GC University, Lahore
Compiler Vs Interpreter
The machine-language target program produced by a compiler is usually much faster than an
interpreter at mapping inputs to outputs. However, an interpreter can usually give better error
diagnostics than a compiler, because it executes the source program statement by statement.
Java language processors combine compilation and interpretation. A Java source program may first
be compiled into an intermediate form called bytecode. The bytecodes are then interpreted by a
virtual machine. A benefit of this arrangement is that bytecode compiled on one machine can be
interpreted on another machine, perhaps across a network. To achieve faster processing of inputs
to outputs, a just-in-time compiler translates the bytecodes into machine language immediately
before the intermediate program is run.
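The compile-then-interpret arrangement described above can be sketched with a toy stack machine. This is only an illustration under stated assumptions: the instruction names (PUSH, LOAD, ADD, MUL) and the functions compile_expr and run are hypothetical and are not real Java bytecode or a real virtual machine.

```python
# Toy illustration of "compile to bytecode, then interpret it on a virtual machine".
# The instruction set (PUSH, LOAD, ADD, MUL) is invented for this sketch.

def compile_expr(expr):
    """Compile a nested-tuple expression into a flat list of stack instructions."""
    if isinstance(expr, (int, float)):
        return [("PUSH", expr)]          # constant operand
    if isinstance(expr, str):
        return [("LOAD", expr)]          # named input, looked up at run time
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Operands first, operator last (postfix order suits a stack machine).
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]


def run(bytecode, inputs):
    """Interpret the bytecode on a simple stack-based virtual machine."""
    stack = []
    for opcode, arg in bytecode:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(inputs[arg])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if opcode == "ADD" else left * right)
    return stack.pop()


# Compile once, then run the same bytecode on different inputs.
bytecode = compile_expr(("+", "initial", ("*", "rate", 60)))
result = run(bytecode, {"initial": 10, "rate": 2})
print(bytecode)
print(result)
```

The bytecode list is independent of the machine that produced it, which is the portability benefit mentioned above; a just-in-time compiler would go one step further and translate such a list into machine code instead of looping over it.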
In addition to a compiler, several other programs may be required to create an executable target
program. A source program may be divided into modules stored in separate files. The task of
collecting the source program is entrusted to a separate program called a preprocessor. The
preprocessor may also expand shorthands, called macros, into source-language statements. The
modified source program is then fed to the compiler. The compiler may produce an
assembly-language program as its output. The assembly language is then processed by a program
called an assembler, which produces relocatable machine code as its output.
These pieces of relocatable code are linked together. The linker resolves external memory
addresses, where the code in one file may refer to a location in another file. The loader then puts
all of the executable object files into memory for execution.
Relocatable code is program code that can be loaded anywhere in memory.
The compiler or assembler produces a table of the memory references in that code, and the loader
converts them into absolute addresses as part of the loading process.
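The role of the table of memory references can be sketched as follows. This is a simplified model, not a real object-file format: the code is a list of words, the relocation table is a list of word positions, and the function name load is hypothetical.

```python
# Relocatable code sketch: addresses inside the module are stored relative to 0,
# and a relocation table lists which words hold such addresses.

def load(code, relocation_table, base_address):
    """Return a copy of `code` as it would look when loaded at base_address."""
    loaded = list(code)
    for index in relocation_table:
        loaded[index] += base_address    # relative address -> absolute address
    return loaded


# Hypothetical module: words 1 and 3 are addresses, words 0 and 2 are opcodes.
code = [0x10, 4, 0x20, 0]
relocation_table = [1, 3]

loaded = load(code, relocation_table, base_address=1000)
print(loaded)
```

Because only the listed words are patched, the same module can be loaded at any base address without being recompiled.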
The study of compilers clarifies many deep issues in programming languages and their
execution, e.g. recursion, multithreading, and object orientation. It may also help you design
your own mini-language.
Underlying compiler construction are many seminal concepts of computer science, such as
syntax vs. semantics, generator vs. recognizer, and syntax-directed translation.
Understanding a compiler and its optimization mechanisms enables us to write more efficient
programs.
For example
The analysis part also collects information about the source program and stores it in a data
structure called a symbol table, which is passed along with the intermediate representation to the
synthesis part. The synthesis part constructs the desired target program from the intermediate
representation and the information in the symbol table. The analysis part is often called the front
end of the compiler, and the synthesis part the back end.
Phases of a Compiler
The compilation process operates as a sequence of phases, each of which transforms one
representation of the source program into another. Several phases may be grouped together, and the
intermediate representations between the grouped phases need not be constructed explicitly. The
symbol table, which stores information about the entire source program, is used by all phases of
the compiler.
Semantic Analysis: Checking global consistency. This often does not follow the hierarchical
structure of the program. Type checking is an instance of such analysis.
Lexical Analysis
The first phase of a compiler is lexical analysis, or scanning.
It reads the stream of characters and groups the characters into meaningful sequences called
lexemes.
For each lexeme, it produces as output a token of the form
(token-name, attribute-value)
token-name is an abstract symbol used during syntax analysis.
attribute-value points to an entry in the symbol table.
Each token is passed on to the subsequent phase, syntax analysis.
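A minimal scanner along these lines might look like the sketch below. It is applied to the assignment position = initial + rate * 60, which the later sections of these notes use as their example. The token names ("id", "number"), the function name tokenize, and the choice to number symbol-table entries from 1 are assumptions made for this illustration.

```python
import re

# One named group per kind of lexeme; the names become the token names.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Group characters into lexemes and emit (token-name, attribute-value) pairs."""
    symbol_table = []                    # entry i-1 holds the lexeme of identifier i
    tokens = []
    # Note: this sketch silently skips characters that match no pattern.
    for match in TOKEN_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue                     # whitespace produces no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # The attribute value is the position of the symbol-table entry.
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))    # operators need no attribute value
    return tokens, symbol_table


tokens, symbol_table = tokenize("position = initial + rate * 60")
print(tokens)
print(symbol_table)
```

Here rate becomes the token ("id", 3), which corresponds to the name id3 used in the syntax tree discussion.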
Syntax Analysis
The second phase of the compiler is syntax analysis, or parsing. The parser uses the token names
to create a tree-like structure (the syntax tree) that represents the grammatical structure of the
token stream. In a syntax tree, each interior node represents an operation and the children of the
node represent the arguments of the operation.
The syntax tree shows the order in which the operations in the assignment
position = initial + rate * 60 are to be performed.
The tree has an interior node labeled * with id3 (the identifier rate) as its left child and 60 as
its right child. The tree shows that rate is first multiplied by 60 and that the result is then
added to the value of initial. The assignment (=) is the root of the tree, which indicates that the
result of the addition must be stored into the location for the identifier position. This ordering
follows the usual conventions of arithmetic, in which multiplication is performed before addition.
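The way a parser arrives at such a tree can be sketched with a small recursive-descent parser. The grammar used here (an assignment whose right side is built from + and *) and the representation of the tree as nested tuples are assumptions for this illustration; the tokens are written as plain strings such as "id3" to keep the block short.

```python
def parse_assignment(tokens):
    """Parse `id = expr` into a nested tuple (operator, left child, right child)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()                # an identifier or a number is a leaf
        if token in ("=", "+", "*"):
            raise SyntaxError(f"unexpected operator {token!r}")
        return token

    def term():
        node = factor()
        while peek() == "*":             # * is handled at the lower level, so it binds tighter
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tree = parse_assignment(["id1", "=", "id2", "+", "id3", "*", 60])
print(tree)
```

The nesting of the result mirrors the tree described above: = is the root, + is its right child, and the * node with id3 and 60 sits below the +.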
Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the
source program for semantic consistency with the language definition. It also collects type
information and stores it in the symbol table for later use during intermediate code generation.
An important part of semantic analysis is type checking, in which the compiler checks that each
operator has matching operands. For example, many languages require an array index to be an
integer, so in these languages the compiler reports an error if a floating-point value is used as
an index.
Some languages permit certain type conversions, called coercions. For example, a binary
arithmetic operator may be applied either to a pair of integers or to a pair of floating-point
values. In a mixed expression, where the operator is applied to an integer and a floating-point
value, the compiler may convert, or coerce, the integer into a floating-point value. This point
was already covered in the discussion of the analysis of the source code.
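Coercion can be sketched as a pass over the expression tree that inserts a conversion node wherever an integer meets a floating-point value. The tuple representation of the tree, the function name check, and the assumption that id2 and id3 were declared as floating-point are choices made for this illustration; the node name inttofloat follows the notes.

```python
def check(node, types):
    """Return (tree, type); wrap an int operand in inttofloat when the other is float."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):
        return node, types[node]         # declared type from the symbol table
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if left_type != right_type:
        # Mixed expression: coerce the integer operand to floating point.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    return (op, left, right), left_type


# Assumed declarations: both identifiers are floating-point variables.
types = {"id2": "float", "id3": "float"}
typed_tree, result_type = check(("+", "id2", ("*", "id3", 60)), types)
print(typed_tree)
print(result_type)
```

A language that forbids the mixed expression would raise a type error at the same point instead of inserting the conversion.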
Code Optimization
In this phase the intermediate code is improved so that better target code results.
Better target code usually means faster code, but other objectives may be considered as well.
For example, we may want shorter code, or code that consumes less power.
Consider again the example under discussion. During semantic analysis the value 60 is converted
into a float by adding an extra node. In intermediate code generation one instruction is added that
converts the integer into a float, and one instruction is used for every operation. In the
optimization phase, the optimizer deduces that the conversion of 60 from integer to float can be
done once, at compile time, so the inttofloat(60) operation can be eliminated by replacing it with
60.0. Similarly, the number of instructions can be reduced by removing redundant instructions.
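The two improvements described above can be sketched on a list of three-address instructions. The tuple layout (destination, operator, argument 1, argument 2), the "copy" operator name, and the temporaries t1 to t3 are assumptions for this illustration. The copy-merging step also assumes that each temporary is used only once, which holds for this example but would need checking in general.

```python
def optimize(code):
    """Fold inttofloat of a constant at compile time and remove a redundant copy."""
    constants = {}                       # temporaries known to hold a constant
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)          # substitute known constants for temporaries
        b = constants.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            constants[dest] = float(a)   # conversion done once, by the compiler
            continue                     # the instruction itself disappears
        folded.append((dest, op, a, b))

    result = []
    for dest, op, a, b in folded:
        # Merge "t = x op y" followed by "d = t" into "d = x op y".
        if op == "copy" and result and result[-1][0] == a:
            previous = result.pop()
            result.append((dest,) + previous[1:])
        else:
            result.append((dest, op, a, b))
    return result


code = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
optimized = optimize(code)
for instruction in optimized:
    print(instruction)
```

Four instructions become two: the conversion is replaced by the constant 60.0, and the final copy is absorbed into the addition.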
Code Generation
The code generation phase takes the intermediate code of the source program as input and maps it
onto the target language, producing the target program.
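As a rough sketch of this mapping, the function below turns three-address instructions into lines for an imagined register machine. Everything about the target is an assumption made for illustration: the mnemonics LDF, MULF, ADDF and STF, the unlimited supply of registers, the rule that names starting with "t" are temporaries, and the requirement that the first operand of each instruction is a variable or a temporary rather than a constant.

```python
def generate(code):
    """Translate (dest, op, a, b) tuples into assembly-like text lines."""
    opcodes = {"*": "MULF", "+": "ADDF"}
    temp_register = {}                   # temporary name -> register holding it
    lines = []
    next_register = 1

    def fetch(operand):
        nonlocal next_register
        if isinstance(operand, float):
            return f"#{operand}"         # constants become immediate operands
        if operand in temp_register:
            return temp_register[operand]
        register = f"R{next_register}"   # assume registers never run out
        next_register += 1
        lines.append(f"LDF {register}, {operand}")
        return register

    for dest, op, a, b in code:
        first = fetch(a)
        second = fetch(b)
        lines.append(f"{opcodes[op]} {first}, {first}, {second}")
        if dest.startswith("t"):
            temp_register[dest] = first  # result stays in a register
        else:
            lines.append(f"STF {dest}, {first}")   # result goes back to memory
    return lines


target = generate([("t1", "*", "id3", 60.0), ("id1", "+", "id2", "t1")])
print("\n".join(target))
```

A real code generator must also decide which values to keep in a limited set of registers, which this sketch ignores.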