
Atif Ishaq - Lecturer GC University, Lahore

Compiler Construction
CS-4207
Lecture – 01 - 02
Introduction
Programming languages are notations for describing computations to people and to machines. The
world as we know it depends on programming languages, because all the software running on all
the computers was written in some programming language. But before a program can be run, it
first must be translated into a form in which it can be executed by a computer.
The software systems that do this translation are called compilers.

What is a Compiler?
A compiler is a program that can read a program in one language (the source language) and translate it
into an equivalent program in another language (the target language). An important role of a compiler
is to report any errors in the source program that it detects during the translation process.

If the target program is an executable machine-language program, it can then be called by the user to
process inputs and produce outputs.

Input → Target Program → Output

What is Interpreter?
An interpreter is another common kind of language processor. Instead of producing a target
program as a translation, an interpreter appears to directly execute the operations specified in the
source program on inputs specified by the user.

Source Program + Inputs → Interpreter → Outputs

Compiler vs. Interpreter
The machine-language target program produced by a compiler is usually much faster than an
interpreter at mapping inputs to outputs. However, an interpreter usually gives better error
diagnostics than a compiler, because it executes the source program statement by statement.

Java language processors combine compilation and interpretation. A Java source program may first
be compiled into an intermediate form called bytecode. The bytecodes are then interpreted by a
virtual machine. A benefit of this arrangement is that bytecode compiled on one machine can be
interpreted on another machine, perhaps across a network. To achieve faster processing of inputs
to outputs, some Java virtual machines include a just-in-time compiler that translates the bytecodes
into machine language immediately before they are run.

In addition to a compiler, several other programs may be required to create an executable target
program. A source program may be divided into modules stored in separate files. The task of
collecting the source program is sometimes entrusted to a separate program called a preprocessor.
The preprocessor may also expand shorthands, called macros, into source-language statements. The
modified source program is then fed to the compiler. The compiler may produce an assembly-language
program as its output. The assembly-language program is then processed by a program called an
assembler, which produces relocatable machine code as its output. These relocatable codes are then
linked together; the linker resolves external memory addresses, where the code in one file may refer
to a location in another file. The loader then puts all of the executable object files together into
memory for execution.
Relocatable code is program code that can be loaded anywhere in memory.
The compiler/assembler produces a table of all such memory references, and the loader converts
them into absolute addresses as part of the loading process.
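
Taken together, the programs described above form the following language-processing chain:

    source program
      → preprocessor     → modified source program
      → compiler         → target assembly program
      → assembler        → relocatable machine code
      → linker / loader  → target machine code (executable)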

Why study compiler construction?


The expectation in teaching this course is not that many of you will become compiler
builders. Even so, the course is significant in many respects:
• Many applications use components of compilers, e.g. analysis and translation.

• The study of compilers clarifies many deep issues in programming languages and their
execution, e.g. recursion, multithreading, object orientation. It may help you design your own
mini-language.
• Underlying compiler construction are many seminal Computer Science concepts, such as
syntax vs. semantics, generators vs. recognizers, and syntax-directed translation.
• Understanding a compiler and its optimization mechanisms enables us to write more efficient
programs.
For example:

Maximal Expressibility and Maximal Efficiency


The compiler plays an important role in bridging maximal expressibility and maximal
efficiency. The current trends are
• the development of more expressive (and user-friendly) high-level programming languages
• the development of more advanced (and parallel) architectures that enable more efficient
execution
The compiler should be able to reconcile these two (sometimes conflicting) trends.

Fields and Disciplines


Several fields and disciplines grew out of the study of compilers:
• Semantics of programming languages
• Formal languages and the theory of parsing
• Type theory and its logics
• Theory of abstract interpretation and program analysis
Structure of Compiler
A compiler is divided into two parts: analysis and synthesis.
The analysis part breaks up the source program into constituent pieces and imposes a grammatical
structure on them. It then uses this structure to create an intermediate representation of the source
program. If the analysis part detects that the source program is either syntactically or semantically
incorrect, it must provide informative messages so that the user can take corrective action. The
analysis part also collects information about the source program and stores it in a data structure
called a symbol table, which is passed along with the intermediate representation to the synthesis part.

The synthesis part constructs the desired target program from the intermediate representation and
the information in the symbol table. The analysis part is often called the front end of the compiler
and the synthesis part the back end of the compiler.

Phases of a Compiler
The compilation process operates as a sequence of phases, each of which transforms one representation
of the source program into another. Several phases may be grouped together, and the intermediate
representation between the grouped phases need not be constructed explicitly. The symbol table,
which stores information about the entire source program, is used by all phases of the compiler.
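
The phases discussed in the remainder of this lecture form the following pipeline, with the symbol
table shared by all of them:

    character stream
      → Lexical Analyzer             → token stream
      → Syntax Analyzer              → syntax tree
      → Semantic Analyzer            → syntax tree
      → Intermediate Code Generator  → intermediate representation
      → Code Optimizer               → intermediate representation
      → Code Generator               → target machine code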

Analysis of Source Program


Analysis can be partitioned into three phases:
• Linear (Lexical) Analysis: the stream of characters is read left to right and partitioned into
tokens.
• Hierarchical (Syntax) Analysis: tokens are grouped hierarchically into nested collections.
• Semantic Analysis: checks for global consistency that often do not fit the hierarchical
structure; type checking is an instance of such analysis.

Lexical Analysis
• The first phase of a compiler is lexical analysis, or scanning.
• The scanner reads the stream of characters and groups the characters into meaningful sequences
called lexemes.
• For each lexeme it produces as output a token of the form
(token-name, attribute-value)
• token-name is an abstract symbol used during syntax analysis.
• attribute-value points to an entry in the symbol table.
• Each token is passed to the subsequent phase, syntax analysis.

Consider this example


position = initial + rate * 60
The characters in this assignment can be grouped into the following lexemes and mapped into the
following tokens, which are then passed to the syntax analyzer:
• position is a lexeme that is mapped into the token (id,1), where id is an abstract symbol
standing for identifier and 1 points to the symbol-table entry for position
• the assignment symbol = is a lexeme that is mapped into the token (=)
• initial is a lexeme that is mapped into the token (id,2)
• + is a lexeme that is mapped into the token (+)
• rate is a lexeme that is mapped into the token (id,3)
• * is a lexeme that is mapped into the token (*)
• 60 is a lexeme that is mapped into the token (60)

The lexical analyzer therefore gives us the following representation of the assignment:
(id,1) (=) (id,2) (+) (id,3) (*) (60)
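
As an illustration only (not part of the original lecture), a minimal hand-written scanner for a
statement like this might look like the sketch below; the token names ('id', 'num') and the
symbol-table layout are assumptions made for the example.

    import re

    # A small, illustrative scanner for statements like: position = initial + rate * 60
    TOKEN_SPEC = [
        ('num',  r'\d+'),
        ('id',   r'[A-Za-z_]\w*'),
        ('op',   r'[=+\-*/]'),
        ('skip', r'\s+'),
    ]
    MASTER = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC))

    def tokenize(source, symbol_table):
        """Yield (token-name, attribute-value) pairs; identifiers point to a symbol-table entry."""
        for match in MASTER.finditer(source):
            kind, lexeme = match.lastgroup, match.group()
            if kind == 'skip':
                continue
            if kind == 'id':
                entry = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
                yield ('id', entry)
            elif kind == 'num':
                yield ('num', int(lexeme))
            else:
                yield (lexeme, None)        # operators carry no attribute value

    symtab = {}
    print(list(tokenize('position = initial + rate * 60', symtab)))
    # -> [('id', 1), ('=', None), ('id', 2), ('+', None), ('id', 3), ('*', None), ('num', 60)]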

Syntax Analysis

The second phase of the compiler is syntax analysis, or parsing. The parser uses the token names to
create a tree-like structure called a syntax tree, which represents the grammatical structure of the
token stream. In the syntax tree, an interior node represents an operation while the children of the
node represent the arguments of that operation.
The syntax tree thus represents the order in which the operations in the assignment are to be performed.

position = initial + rate * 60

The tree has an interior node labeled * with id3 as its left child and 60 as its right child. The tree
indicates that rate is first multiplied by 60, and the result is then added to the value
of initial. The assignment operator (=) is the root of the tree, which indicates that the result of the
addition must be stored into the location for the identifier position. The tree follows the usual
precedence conventions of arithmetic operators.
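
To make the tree concrete, here is a small sketch, purely for illustration, that builds this syntax
tree by hand and prints it; the Node/Leaf helper classes are assumptions, not the lecture's notation.

    # Illustrative only: the syntax tree for  position = initial + rate * 60  built by hand.
    class Node:
        def __init__(self, op, left, right):
            self.op, self.left, self.right = op, left, right

    class Leaf:
        def __init__(self, value):
            self.value = value

    # Interior nodes are operations; the children of a node are the arguments of the operation.
    tree = Node('=',
                Leaf('id1'),                       # position
                Node('+',
                     Leaf('id2'),                  # initial
                     Node('*',
                          Leaf('id3'),             # rate
                          Leaf(60))))

    def show(node, depth=0):
        """Print the tree with indentation showing the nesting."""
        if isinstance(node, Leaf):
            print('  ' * depth + str(node.value))
        else:
            print('  ' * depth + node.op)
            show(node.left, depth + 1)
            show(node.right, depth + 1)

    show(tree)      # '=' at the root; '*' applied to id3 and 60 deepest in the tree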

Semantic Analysis
The semantic analyzer checks the source program for semantic consistency with the language
definition, using the syntax tree and the information stored in the symbol table. It also collects
type information and stores it in the symbol table, where it is ultimately used during intermediate
code generation. An important part of semantic analysis is type checking, which verifies that each
operator has matching operands. For example, many languages require an array index to be an integer;
in such a language, the compiler must report an error if the index is a floating-point value.
Some languages permit type conversions; such a conversion is called a coercion. For
example, a binary arithmetic operator may be applied either to a pair of integers or to a pair of
floating-point values. If the operator is applied to a mixed pair, for example an integer and a
floating-point value, the compiler may convert, or coerce, the integer into a floating-point value.
This is exactly what happens to the constant 60 in the running example, as the following phases show.
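
Purely as an illustration (the function names and type tests below are assumptions for this sketch,
not the lecture's code), the effect of such a coercion on a mixed multiplication can be modeled on
values as follows:

    # Illustrative sketch only: coercion of an integer operand in a mixed multiplication.
    def inttofloat(value):
        """The conversion the semantic analyzer inserts for an integer operand."""
        return float(value)

    def checked_multiply(left, right):
        """Multiply two operands; if exactly one of them is an int, coerce it to float first."""
        if isinstance(left, int) and isinstance(right, float):
            left = inttofloat(left)
        elif isinstance(left, float) and isinstance(right, int):
            right = inttofloat(right)
        return left * right

    # rate * 60 with a floating-point rate: the integer constant 60 is coerced to 60.0.
    print(checked_multiply(2.5, 60))    # -> 150.0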

Intermediate Code Generation


During the compilation process (from source program to target program) several intermediate
representations are constructed. One such representation is the syntax tree used during syntax and
semantic analysis.
An intermediate representation should have two important properties: it should be easy to produce,
and it should be easy to translate into the target machine language. Three-address code is one
example of an intermediate representation; it consists of a sequence of assembly-like instructions,
each with at most three operands, and each operand can act like a register. Consider the example
already discussed and note some important points about three-address instructions (a possible
translation is sketched after this list):
i. Each instruction has at most one operator on the right-hand side, so these instructions fix
the order in which the operations are performed.
ii. The compiler generates temporary names to hold the values computed by three-address
instructions.
iii. An instruction can have fewer than three operands.
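
A possible three-address translation of position = initial + rate * 60 is sketched below (the
temporary names t1, t2, t3 are chosen for this illustration; inttofloat is the coercion discussed
under semantic analysis):

    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3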

Code Optimization
In this phase the intermediate code is improved so that a better target code results.
Better target code usually means faster code, but other aspects may also be considered;
for example, we may want shorter code, or code that consumes less power.
Consider again the running example: during semantic analysis the value 60 is converted
into a float by adding an extra node to the syntax tree, and during intermediate code generation a
separate instruction is added just to perform this conversion. During optimization, the optimizer
deduces that the conversion of 60 from integer to float can be done once and for all at compile time,
so the inttofloat(60) operation can be eliminated by replacing 60 with 60.0.
Similarly, the number of instructions can be reduced by removing redundant instructions, as the
shorter code below shows.
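
With the conversion folded into the constant and the redundant copy removed, the optimized
three-address code of the running example (again a reconstruction for illustration) shrinks to:

    t1 = id3 * 60.0
    id1 = id2 + t1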

Code Generation
The code generation phase takes the intermediate representation of the source program and maps it
into the target program, typically machine code or assembly code.
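
As an illustration only, on a hypothetical register machine (the instruction names LDF, MULF, ADDF,
STF and the registers R1, R2 are assumptions for this sketch, not an instruction set given in the
lecture), the optimized intermediate code above might be mapped to:

    LDF  R2, id3          ; load rate into register R2
    MULF R2, R2, #60.0    ; R2 = rate * 60.0
    LDF  R1, id2          ; load initial into register R1
    ADDF R1, R1, R2       ; R1 = initial + rate * 60.0
    STF  id1, R1          ; store the result into the location for position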

Symbol Table Management


One of the important functions of a compiler is to record the variable names used in the source
program and to collect information about various attributes of each name. The information that may
be recorded includes the storage allocated for a name, its type, and its scope. For procedure or
function names, it also includes the number and types of the arguments, the method of passing each
argument (call by value or call by reference), and the type of the value returned. A symbol table is
a data structure that contains a record for each variable name, with fields for these attributes.
The data structure used to implement the symbol table is significant because it must allow the
compiler to find the record for each name quickly. We will discuss symbol tables in detail in
upcoming lectures.
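
As a final illustration (the record fields below are assumptions chosen to mirror the attributes
listed above, not a prescribed layout), a simple dictionary-backed symbol table might look like this:

    from dataclasses import dataclass

    @dataclass
    class SymbolRecord:
        """One record per name; the fields mirror the attributes mentioned above."""
        name: str
        type: str = 'unknown'
        scope: str = 'global'
        storage: int | None = None      # e.g. size or offset of the storage allocated to the name

    class SymbolTable:
        """Dictionary-backed table: records can be inserted and found quickly by name."""
        def __init__(self):
            self._records = {}

        def insert(self, name, **attributes):
            record = self._records.setdefault(name, SymbolRecord(name))
            for key, value in attributes.items():
                setattr(record, key, value)
            return record

        def lookup(self, name):
            return self._records.get(name)

    table = SymbolTable()
    table.insert('position', type='float')
    table.insert('rate', type='float', scope='global')
    print(table.lookup('rate'))   # SymbolRecord(name='rate', type='float', scope='global', storage=None)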
