Introduction To Compiler Design: B.Sc. (SE) - 3rd Year (Session-2017-18)


Dr. Deepak K. Sinha,


AASTU, Addis Ababa

Syllabus
• Prerequisites: Formal Language and Automata
Theory
• Textbook: “Compilers: Principles, Techniques,
and Tools” by Aho, Lam, Sethi, and Ullman, 2nd edition
• Other material: class handouts
• Grade breakdown:
– Exams (two midterms, one final) (35% + 50%)
– Lab/Project assignments (10%)
– Attendance (5%)


Objectives
• Be able to build a compiler for a (simplified)
programming language
• Know how to use compiler construction tools,
such as generators of scanners and parsers
• Be familiar with assembly code and virtual
machines, such as the JVM, and bytecode
• Be familiar with compiler analysis and
optimization techniques


Introduction
• Programming languages are notations for
describing computations to people and to
machines.
• The world as we know it depends on
programming languages, because all the software
running on all the computers was written in some
programming language.
• But before a program can be run, it first must be
translated into a form in which it can be executed
by a computer.
• The software systems that do this translation are called
compilers.
• The study of compiler writing touches upon:
– Programming Languages
– Machine Architecture
– Language Theory
– Algorithms
– Software Engineering


Language Processors
• Simply stated, a compiler is a program that can
read a program in one language (the source
language) and translate it into an equivalent
program in another language (the target
language).
[Figure: source program → Compiler → target program, with the compiler also reporting error messages; input → target program → output]
• An important role of the compiler is to report
any errors in the source program that it detects
during the translation process.
• If the target program is an executable machine-language
program, it can then be called by the
user to process inputs and produce outputs.


Compilers and Interpreters (cont’d)


• An interpreter is another kind of language
processor. Instead of producing a target program
as a translation, an interpreter appears to directly
execute the operations specified in the source
program on inputs supplied by the user.

[Figure: source program and input → Interpreter → output, with error messages]
• The machine-language target program produced
by a compiler is usually much faster than an
interpreter at mapping inputs to outputs.
• An interpreter, however, can usually give better
error diagnostics than a compiler, because it
executes the source program statement by
statement.


A Hybrid Compiler
[Figure: source program → Translator → intermediate program; the intermediate program and the inputs → Virtual Machine → output]
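To make the figure concrete, here is a minimal C++ sketch of the virtual-machine half of a hybrid compiler. It assumes a hypothetical stack-based bytecode (`Op`, `Instr`) and hard-codes the intermediate program a translator might produce for i + r * 60; it is an illustration only, not the way the JVM or any real virtual machine encodes its instructions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical instruction set for a toy stack-based virtual machine.
enum class Op { LoadInput, PushConst, Add, Mul };

struct Instr {
    Op op;
    double arg;  // input slot for LoadInput, constant for PushConst, unused otherwise
};

// An "intermediate program" a translator might emit for  i + r * 60,
// assuming input slot 0 holds i and slot 1 holds r.
const std::vector<Instr> kProgram = {
    {Op::LoadInput, 0}, {Op::LoadInput, 1}, {Op::PushConst, 60},
    {Op::Mul, 0},       {Op::Add, 0},
};

// The virtual machine: executes the intermediate program on inputs
// that are only supplied at run time.
double runVM(const std::vector<Instr>& program, const std::vector<double>& inputs) {
    std::vector<double> stack;
    for (const Instr& ins : program) {
        switch (ins.op) {
        case Op::LoadInput:
            stack.push_back(inputs.at(static_cast<std::size_t>(ins.arg)));
            break;
        case Op::PushConst:
            stack.push_back(ins.arg);
            break;
        case Op::Add:
        case Op::Mul: {
            if (stack.size() < 2) throw std::runtime_error("stack underflow");
            double rhs = stack.back(); stack.pop_back();
            double lhs = stack.back(); stack.pop_back();
            stack.push_back(ins.op == Op::Add ? lhs + rhs : lhs * rhs);
            break;
        }
        }
    }
    if (stack.size() != 1) throw std::runtime_error("malformed program");
    return stack.back();
}
```

With i = 1 and r = 2 supplied as inputs, `runVM(kProgram, {1.0, 2.0})` returns 121. The translation is done once; the virtual machine then maps each new set of inputs to outputs.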
• In addition to a compiler, several other programs
may be required to create an executable target
program:
– A source program may be divided into modules stored in
separate files.
– The task of collecting the source program is sometimes
entrusted to a separate program, called a preprocessor.
– The preprocessor may also expand shorthands, called macros,
into source-language statements.
– The modified source program is then fed to a compiler.
– The compiler may produce an assembly-language program as
its output.
– The assembly language is then processed by a program called
an assembler that produces relocatable machine code as its
output.
• Large programs are often compiled in pieces, so
the relocatable machine code may have to be
linked together with other relocatable object files
and library files into the code that actually runs on
the machine; this is the job of the linker.
• The loader then puts together all the executable
object files into memory for execution.
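The macro expansion done by the preprocessor is purely textual and happens before the compiler proper sees the program. A small C++ illustration follows; `SQUARE`, `SQUARE_BAD`, `areaOfSquare` and `badExample` are hypothetical names chosen for the example.

```cpp
// The preprocessor replaces each macro use with its definition as plain text.
#define SQUARE(x) ((x) * (x))  // parenthesised: expands safely
#define SQUARE_BAD(x) x * x    // unparenthesised: exposes the textual substitution

// After preprocessing, this body reads:  return ((side) * (side));
int areaOfSquare(int side) { return SQUARE(side); }

// SQUARE_BAD(1 + 2) becomes  1 + 2 * 1 + 2,  which the compiler evaluates to 5, not 9.
int badExample() { return SQUARE_BAD(1 + 2); }
```

With GCC or Clang, running `g++ -E` on such a file prints the source as it looks after preprocessing, which is exactly what the compiler is then given.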

Preprocessors, Compilers, Assemblers, and Linkers
[Figure: source code → Preprocessor → modified source code → Compiler → target assembly program → Assembler → relocatable object code → Linker/Loader (together with libraries and relocatable object files) → target machine code]

The Structure of a Compiler

• Up to this point, we have treated a compiler as a
single box that maps a source program into a
semantically equivalent target program.
• If we open up this box a little, we see that there
are two parts to this mapping:
– Analysis, and
– Synthesis.


The Structure of a Compiler (1)

• Any compiler must perform two major tasks:
– Analysis of the source program
– Synthesis of a machine-language program

The Analysis-Synthesis Model of Compilation


– The analysis part breaks up the source program into
constituent pieces and imposes a grammatical structure
on them.
– If the analysis part detects that the source program is
either syntactically ill formed or semantically
unsound, then it must provide informative messages, so
the user can take corrective action.
– The analysis part also collects information about the
source program and stores it in a data structure called a
symbol table, which is passed along with the
intermediate representation to the synthesis part.

• The synthesis part constructs the desired target
program from the intermediate representation and the
information in the symbol table.
• Synthesis takes the tree structure and translates the
operations therein into the target program.
• The analysis part is often called the front end of the
compiler, and the synthesis part the back end.

The Structure of a Compiler (Phases of a Compiler)
[Figure: character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target-machine code → Machine-Dependent Code Optimizer → target-machine code; the Symbol Table is shared by all phases]

[Figure: Scanner [Lexical Analyzer] → tokens → Parser [Syntax Analyzer] → parse tree → Semantic Process [Semantic Analyzer] → abstract syntax tree with attributes → Code Generator [Intermediate Code Generator] → non-optimized intermediate code → Code Optimizer → optimized intermediate code → target machine code]
Lexical Analysis
• The first phase of a compiler is called lexical
analysis or scanning.
• The lexical analyzer reads the stream of
characters making up the source program and
groups the characters into meaningful sequences
called lexemes.
• For each lexeme, the lexical analyzer produces
as output a token of the form
<token-name, attribute-value>
• This token is passed on to the subsequent
phase, syntax analysis.
• In the token, the first component, token-name, is an
abstract symbol that is used during syntax analysis,
and
• the second component, attribute-value, points to an
entry in the symbol table for this token.
(Information from the symbol-table entry is needed
for semantic analysis and code generation.)
Example:
p=i+r*60

• p is a lexeme that would be mapped into a token
<id, 1>, where id is an abstract symbol standing
for identifier and 1 points to the symbol-table
entry for p.
• The symbol-table entry for an identifier holds
information about the identifier, such as its name
and type.
Symbol table:
1   p   …
2   i   …
3   r   …

• The assignment symbol = is a lexeme that is
mapped into the token <=>.
• i → <id, 2>
• + → <+>
• r → <id, 3>
• * → <*>
• 60 → <60>
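The scanning of p=i+r*60 can be sketched as a small hand-written scanner in C++. It assumes identifiers are a letter followed by letters or digits, numbers are digit sequences, and the only operators are =, + and *. The symbol table here is simply a vector of names whose 1-based position is used as the attribute value; `Token`, `Scanner` and `show` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// A token <token-name, attribute-value>.
struct Token {
    std::string name;  // "id", "=", "+", "*", or the digits of a number
    int attribute;     // 1-based symbol-table entry for identifiers, 0 otherwise
};

struct Scanner {
    std::vector<std::string> symbolTable;  // entry k (1-based) holds the k-th new identifier

    // Returns the symbol-table index of name, inserting it the first time it is seen.
    int lookupOrInsert(const std::string& name) {
        for (std::size_t k = 0; k < symbolTable.size(); ++k)
            if (symbolTable[k] == name) return static_cast<int>(k) + 1;
        symbolTable.push_back(name);
        return static_cast<int>(symbolTable.size());
    }

    // Groups characters into lexemes and emits one token per lexeme.
    std::vector<Token> scan(const std::string& src) {
        std::vector<Token> tokens;
        std::size_t pos = 0;
        while (pos < src.size()) {
            unsigned char c = static_cast<unsigned char>(src[pos]);
            std::size_t start = pos;
            if (std::isspace(c)) {
                ++pos;                                      // blanks only separate lexemes
            } else if (std::isalpha(c)) {                   // identifier: letter (letter|digit)*
                while (pos < src.size() && std::isalnum(static_cast<unsigned char>(src[pos]))) ++pos;
                tokens.push_back({"id", lookupOrInsert(src.substr(start, pos - start))});
            } else if (std::isdigit(c)) {                   // number: digit+
                while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
                tokens.push_back({src.substr(start, pos - start), 0});
            } else if (c == '=' || c == '+' || c == '*') {  // single-character operators
                tokens.push_back({std::string(1, src[pos]), 0});
                ++pos;
            } else {
                throw std::runtime_error("unexpected character in input");
            }
        }
        return tokens;
    }
};

// Formats a token the way the slides write it, e.g. <id, 1>, <=> or <60>.
std::string show(const Token& t) {
    if (t.name == "id") return "<id, " + std::to_string(t.attribute) + ">";
    return "<" + t.name + ">";
}
```

Scanning "p=i+r*60" yields <id, 1><=><id, 2><+><id, 3><*><60>, and the symbol table then holds p, i and r in entries 1 to 3. Scanning more text with the same `Scanner` reuses existing entries, so every occurrence of an identifier points to the same symbol-table entry.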


Syntax Analysis
• The second phase of the compiler is syntax
analysis, or parsing.
• The parser uses the first components of the tokens
produced by the lexical analyzer to create a tree-like
intermediate representation that depicts the
grammatical structure of the token stream.
• A typical representation is a syntax tree, in which
each interior node represents an operation and
the children of the node represent the arguments
of the operation.
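A syntax tree for p = i + r * 60 can be built by a small recursive-descent parser. The C++ sketch below assumes a toy grammar (given in the comment) in which * binds tighter than +; for brevity it reads characters directly rather than the token stream from the scanner and does not accept blanks. `Node`, `makeNode`, `Parser` and `toPrefix` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Syntax-tree node: interior nodes hold an operator, leaves an identifier or number.
struct Node {
    std::string label;
    std::unique_ptr<Node> left, right;  // both null for a leaf
};

std::unique_ptr<Node> makeNode(std::string label, std::unique_ptr<Node> l = nullptr,
                               std::unique_ptr<Node> r = nullptr) {
    auto n = std::make_unique<Node>();
    n->label = std::move(label);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Recursive-descent parser for the assumed toy grammar (no blanks allowed):
//   assign -> id '=' expr
//   expr   -> term { '+' term }
//   term   -> factor { '*' factor }
//   factor -> id | number
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    std::unique_ptr<Node> parseAssign() {
        auto target = parseFactor();
        expect('=');
        auto tree = makeNode("=", std::move(target), parseExpr());
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return tree;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    bool peek(char c) const { return pos_ < src_.size() && src_[pos_] == c; }
    void expect(char c) {
        if (!peek(c)) throw std::runtime_error(std::string("expected '") + c + "'");
        ++pos_;
    }

    // '+' is handled one level above '*', which gives '*' the higher precedence;
    // the loops build left-associative trees.
    std::unique_ptr<Node> parseExpr() {
        auto tree = parseTerm();
        while (peek('+')) { ++pos_; tree = makeNode("+", std::move(tree), parseTerm()); }
        return tree;
    }
    std::unique_ptr<Node> parseTerm() {
        auto tree = parseFactor();
        while (peek('*')) { ++pos_; tree = makeNode("*", std::move(tree), parseFactor()); }
        return tree;
    }
    std::unique_ptr<Node> parseFactor() {
        std::size_t start = pos_;
        while (pos_ < src_.size() && std::isalnum(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (start == pos_) throw std::runtime_error("expected identifier or number");
        return makeNode(src_.substr(start, pos_ - start));
    }
};

// Prints a tree in prefix form, e.g. (= p (+ i (* r 60))).
std::string toPrefix(const Node& n) {
    if (!n.left) return n.label;
    return "(" + n.label + " " + toPrefix(*n.left) + " " + toPrefix(*n.right) + ")";
}
```

Parsing "p=i+r*60" yields the tree printed as (= p (+ i (* r 60))): the interior nodes are the operations =, + and *, and their children are the arguments of those operations.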

Semantic Analysis
• The semantic analyzer uses the syntax tree and
the information in the symbol table to check the
source program for semantic consistency with
the language definition.
• It also gathers type information and saves it in
either the syntax tree or the symbol table, for
subsequent use during intermediate code
generation.

• An important part of semantic analysis is type
checking, where the compiler checks that each
operator has matching operands.
• For example, many programming language
definitions require an array index to be an
integer; the compiler must report an error if a
floating-point number is used to index an array.
• The language specification may permit some
type conversions, called coercions.
(For example, suppose that p, i and r have been declared to be
floating-point numbers, and that the lexeme
60 by itself forms an integer.)

• The type checker in the semantic analyzer
discovers that the operator * is applied to a
floating-point number (r) and an integer (60); in
this case, the integer may be converted into a
floating-point number.
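The coercion step can be sketched as a type-checking pass that walks the tree bottom-up and wraps an integer operand in an explicit conversion node when the other operand is floating point. The C++ below is a minimal illustration that assumes only two types, a symbol table given as a `std::map`, and a conversion operator named inttofloat; `Expr`, `mk`, `coerceToFloat` and `check` are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

enum class Type { Int, Float };

// Expression node: a leaf (identifier or integer literal), a binary operator,
// or the unary conversion "inttofloat" inserted by the checker (right is null).
struct Expr {
    std::string label;
    std::unique_ptr<Expr> left, right;
    Type type = Type::Int;  // filled in by check()
};

std::unique_ptr<Expr> mk(std::string label, std::unique_ptr<Expr> l = nullptr,
                         std::unique_ptr<Expr> r = nullptr) {
    auto e = std::make_unique<Expr>();
    e->label = std::move(label);
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Wraps an Int-typed subtree in an explicit conversion node of type Float.
std::unique_ptr<Expr> coerceToFloat(std::unique_ptr<Expr> e) {
    auto conv = mk("inttofloat", std::move(e));
    conv->type = Type::Float;
    return conv;
}

// Computes types bottom-up. Leaves get their type from the symbol table (or Int
// for an integer literal); when an operator mixes Int and Float, the Int operand
// is coerced. Assumes the input tree holds only leaves and binary operators.
void check(Expr& e, const std::map<std::string, Type>& symbols) {
    if (!e.left) {
        if (std::isdigit(static_cast<unsigned char>(e.label[0]))) {
            e.type = Type::Int;
            return;
        }
        auto it = symbols.find(e.label);
        if (it == symbols.end()) throw std::runtime_error("undeclared identifier " + e.label);
        e.type = it->second;
        return;
    }
    check(*e.left, symbols);
    check(*e.right, symbols);
    if (e.left->type == Type::Int && e.right->type == Type::Float)
        e.left = coerceToFloat(std::move(e.left));
    else if (e.left->type == Type::Float && e.right->type == Type::Int)
        e.right = coerceToFloat(std::move(e.right));
    e.type = e.left->type;
}

// Prints a tree in prefix form, e.g. (+ i (* r (inttofloat 60))).
std::string toPrefix(const Expr& e) {
    if (!e.left) return e.label;
    if (!e.right) return "(" + e.label + " " + toPrefix(*e.left) + ")";
    return "(" + e.label + " " + toPrefix(*e.left) + " " + toPrefix(*e.right) + ")";
}
```

With p, i and r declared as floating-point, checking i + r * 60 rewrites the tree to (+ i (* r (inttofloat 60))), and every node above the conversion gets type Float.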


Intermediate Code Generator


• In the process of translating a source program
into target code, a compiler may construct one or
more intermediate representations.
• After syntax and semantic analysis of the source
program, many compilers generate an explicit
low-level or machine-like intermediate
representation.
• The intermediate representation should have two
important properties:
– it should be easy to produce, and
– it should be easy to translate into the target machine.
• The output of the intermediate code generator
consists of a three-address code sequence, as
below (id1, id2 and id3 stand for p, i and r):
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
• Why three-address code?
– Each three-address assignment instruction has at
most one operator on the right side, so these
instructions fix the order in which operations are
done (here the multiplication precedes the addition).
– The compiler must generate a temporary name to hold
the value computed by a three-address instruction.
– Some "three-address instructions", like the first and last
in the sequence, have fewer than three operands.
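One way to produce such code is a post-order walk of the checked syntax tree that creates a fresh temporary for each interior node. The C++ sketch below assumes the tree for id1 = id2 + id3 * inttofloat(60), with id1, id2 and id3 standing for the symbol-table entries of p, i and r; instructions are kept as plain strings, and `Node`, `mk` and `ThreeAddressGen` are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Syntax-tree node after semantic analysis: a leaf (name or constant), the unary
// "inttofloat" (only left is set), or a binary operator (left and right set).
struct Node {
    std::string label;
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> mk(std::string label, std::unique_ptr<Node> l = nullptr,
                         std::unique_ptr<Node> r = nullptr) {
    auto n = std::make_unique<Node>();
    n->label = std::move(label);
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

class ThreeAddressGen {
public:
    // Emits code for "target = e" and returns all instructions generated so far.
    std::vector<std::string> genAssign(const std::string& target, const Node& e) {
        std::string value = gen(e);
        code_.push_back(target + " = " + value);
        return code_;
    }

private:
    std::vector<std::string> code_;
    int tempCount_ = 0;

    std::string newTemp() { return "t" + std::to_string(++tempCount_); }

    // Returns the name holding e's value. Operands are translated first, so
    // each emitted instruction has at most one operator on its right side.
    std::string gen(const Node& e) {
        if (!e.left) return e.label;  // leaves need no instruction
        std::string a = gen(*e.left);
        if (!e.right) {               // unary conversion
            std::string t = newTemp();
            code_.push_back(t + " = " + e.label + "(" + a + ")");
            return t;
        }
        std::string b = gen(*e.right);
        std::string t = newTemp();
        code_.push_back(t + " = " + a + " " + e.label + " " + b);
        return t;
    }
};
```

For that tree it emits t1 = inttofloat(60), t2 = id3 * t1, t3 = id2 + t2 and id1 = t3. Because an operand's code is emitted before the instruction that uses it, the multiplication is computed before the addition.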
Code Optimization
• The code optimization phase attempts to improve the
intermediate code so that better target code will
result.
• Usually better means faster, but sometimes it means
shorter code, or target code that consumes fewer
resources.
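As an example of such an improvement, the conversion of the constant 60 can be done once at compile time, and a final copy into a variable can be merged with the instruction that computes its value. The C++ sketch below applies only these two rewrites to a list of three-address instructions. It assumes temporaries are named t1, t2, ... and that each is used exactly once (a real optimizer would have to establish this, for example with data-flow analysis); `Quad`, `optimize` and `show` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One three-address instruction: result = arg1 op arg2, result = op(arg1) for a
// conversion (arg2 empty), or the copy result = arg1 (op empty).
struct Quad {
    std::string result, op, arg1, arg2;
};

bool isIntConstant(const std::string& s) {
    return !s.empty() && s.find_first_not_of("0123456789") == std::string::npos;
}

// Assumed naming convention: compiler temporaries are t1, t2, ...
bool isTemp(const std::string& s) {
    return s.size() > 1 && s[0] == 't' && isIntConstant(s.substr(1));
}

// Replaces uses of `from` by `to` in code[start..].
void substitute(std::vector<Quad>& code, std::size_t start, const std::string& from,
                const std::string& to) {
    for (std::size_t k = start; k < code.size(); ++k) {
        if (code[k].arg1 == from) code[k].arg1 = to;
        if (code[k].arg2 == from) code[k].arg2 = to;
    }
}

// Two simple rewrites, assuming each temporary is used exactly once:
//  1. t = inttofloat(c) for an integer constant c is evaluated at compile time:
//     "c.0" is substituted for t and the instruction is dropped.
//  2. A copy x = t that follows the instruction computing t is merged into it.
std::vector<Quad> optimize(std::vector<Quad> code) {
    std::vector<Quad> out;
    for (std::size_t k = 0; k < code.size(); ++k) {
        const Quad& q = code[k];
        if (q.op == "inttofloat" && isIntConstant(q.arg1)) {
            substitute(code, k + 1, q.result, q.arg1 + ".0");
            continue;
        }
        if (q.op.empty() && isTemp(q.arg1) && !out.empty() && out.back().result == q.arg1) {
            out.back().result = q.result;
            continue;
        }
        out.push_back(q);
    }
    return out;
}

std::string show(const Quad& q) {
    if (q.op.empty()) return q.result + " = " + q.arg1;
    if (q.arg2.empty()) return q.result + " = " + q.op + "(" + q.arg1 + ")";
    return q.result + " = " + q.arg1 + " " + q.op + " " + q.arg2;
}
```

Given t1 = inttofloat(60), t2 = id3 * t1, t3 = id2 + t2 and id1 = t3, it returns the two instructions t2 = id3 * 60.0 and id1 = id2 + t2.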

Code Generation
• The code generator takes as input an
intermediate representation of the source
program and maps it into the target language.
• If the target language is machine code,
registers or memory locations are selected for
each of the variables used by the program.
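A very naive code generator for optimized three-address code is sketched below in C++. The target is hypothetical: floating-point instructions LDF (load), STF (store), ADDF and MULF in the form OP destination, source1, source2, with # marking an immediate constant. Registers are handed out in order and are never reused or spilled; `Quad` and `CodeGen` are hypothetical names.

```cpp
#include <cctype>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Three-address instruction result = arg1 op arg2 (op is "+" or "*").
struct Quad {
    std::string result, op, arg1, arg2;
};

class CodeGen {
public:
    std::vector<std::string> generate(const std::vector<Quad>& code) {
        for (const Quad& q : code) emit(q);
        return asm_;
    }

private:
    std::vector<std::string> asm_;
    std::map<std::string, std::string> regOf_;  // temporary -> register holding it
    int regCount_ = 0;

    static bool isTemp(const std::string& s) {  // assumed convention: t1, t2, ...
        return s.size() > 1 && s[0] == 't' &&
               s.find_first_not_of("0123456789", 1) == std::string::npos;
    }
    static bool isConst(const std::string& s) {
        return !s.empty() && std::isdigit(static_cast<unsigned char>(s[0]));
    }

    // Returns an operand: "#c" for a constant, otherwise a register, loading
    // the variable from memory first if it is not already in one.
    std::string operand(const std::string& s) {
        if (isConst(s)) return "#" + s;
        auto it = regOf_.find(s);
        if (it != regOf_.end()) return it->second;
        std::string r = "R" + std::to_string(++regCount_);
        asm_.push_back("LDF " + r + ", " + s);
        return r;
    }

    void emit(const Quad& q) {
        std::string opcode;
        if (q.op == "+") opcode = "ADDF";
        else if (q.op == "*") opcode = "MULF";
        else throw std::runtime_error("unsupported operator: " + q.op);
        // Assumes arg1 is a variable or temporary, and that each temporary is used
        // once, so its register may be overwritten with the result.
        std::string dst = operand(q.arg1);
        std::string src = operand(q.arg2);
        asm_.push_back(opcode + " " + dst + ", " + dst + ", " + src);
        if (isTemp(q.result)) regOf_[q.result] = dst;         // keep temporaries in registers
        else asm_.push_back("STF " + q.result + ", " + dst);  // store program variables
    }
};
```

For t2 = id3 * 60.0 and id1 = id2 + t2 it produces LDF R1, id3 / MULF R1, R1, #60.0 / LDF R2, id2 / ADDF R2, R2, R1 / STF id1, R2: the variables are loaded into registers, the temporary t2 lives only in R1, and the result is stored back to the memory location of id1.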

Compiler-Construction Tools
• Software development tools are available to
implement one or more compiler phases:
– Parser generators:
automatically produce syntax analyzers from a
grammatical description of a programming language.
– Scanner generators:
produce lexical analyzers from a regular-expression
description of the tokens of a language.
– Syntax-directed translation engines:
produce collections of routines for walking a
parse tree and generating intermediate code.

– Automatic code generators:
produce a code generator from a collection of
rules for translating each operation of the
intermediate language into the machine language for
a target machine.
– Data-flow engines:
facilitate the gathering of information about
how values are transmitted from one part of a
program to each other part. This is a key part of code
optimization.
– Compiler-construction toolkits:
provide an integrated set of routines for
constructing various phases of a compiler.

Programming Language Basics


• Static/Dynamic Distinction:
– A policy that lets the compiler decide an issue is
static (the issue is decided at compile time); a policy
that only allows a decision to be made when we
execute the program is dynamic (it requires a decision
at run time).
– The scope of a declaration:
• A language uses static scope, or lexical scope, if it is
possible to determine the scope of a declaration by looking
only at the program; otherwise, the language uses
dynamic scope.
• (C, C++ and Java use static scope.)


Environment and Scopes


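• The association of names with locations in memory, and then
with values, can be described by two mappings that change as the
program runs:
names --(environment)--> locations (variables) --(state)--> values
• The environment changes according to the scope rules of a language.
In the fragment below, the local declaration of i inside f binds the
name i to a new location, so the assignment i = 3 does not affect the
global i:
int i;              // global i
int x;              // global x
void f()
{
    int i;          // local i: a new location that hides the global i
    i = 3;          // changes the local i only
}
int main()
{
    f();
    x = x + i;      // uses the global i, which is still 0
}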
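The two mappings can be modelled directly: a stack of maps from names to locations for the environment, and an array of values indexed by location for the state. The C++ sketch below is a toy model under those assumptions (locations are array indices and are never reclaimed); `Bindings` and its member functions are hypothetical names.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model of the two mappings. Locations are indices into `state`.
struct Bindings {
    std::vector<int> state;                       // state: location -> value
    std::vector<std::map<std::string, int>> env;  // environment: name -> location, one map per open scope

    // Entering or leaving a scope changes the environment, not the state.
    void enterScope() { env.emplace_back(); }
    void leaveScope() { env.pop_back(); }

    // A declaration binds the name to a fresh location in the innermost scope.
    // (Locations are never reclaimed in this sketch.)
    void declare(const std::string& name) {
        state.push_back(0);
        env.back()[name] = static_cast<int>(state.size()) - 1;
    }

    // Finds the innermost visible declaration of name.
    int location(const std::string& name) const {
        for (auto scope = env.rbegin(); scope != env.rend(); ++scope) {
            auto found = scope->find(name);
            if (found != scope->end()) return found->second;
        }
        throw std::out_of_range("undeclared name: " + name);
    }

    // An assignment changes the state only; the environment is untouched.
    void assign(const std::string& name, int value) { state.at(location(name)) = value; }
    int value(const std::string& name) const { return state.at(location(name)); }
};
```

Declaring i globally and again inside f gives the two i's different locations, so assigning 3 to the local i changes only its own entry in the state; after leaving f, the name i again denotes the global location, whose value is still 0.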
The Environment and the State Mappings
• Static/dynamic binding of names to locations:
– Most binding of names to locations is dynamic.
• Static/dynamic binding of locations to values:
– The binding of locations to values is generally
dynamic as well, since we cannot tell the value in a location
until we run the program.

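Static Scope and Block Structure
• C and its family use static (block-structured) scope.
• C++, Java and C# also provide explicit control over scope through
the keywords public, private and protected.
#include <iostream>
using namespace std;

int main() {                // block B1
    int a = 1, b = 1;
    {                       // block B2
        int b = 2;
        {                   // block B3
            int a = 3;
            cout << a << b; // prints 32
        }
        {                   // block B4
            int b = 4;
            cout << a << b; // prints 14
        }
        cout << a << b;     // prints 12
    }
    cout << a << b;         // prints 11
}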


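Declaration      Scope
int a = 1;       B1 - B3
int b = 1;       B1 - B2
int b = 2;       B2 - B4
int a = 3;       B3
int b = 4;       B4
(Bi - Bj denotes block Bi excluding the nested block Bj, where a redeclaration hides the name.)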


Parameter-Passing Mechanisms


• Call by Value
• Call by Reference
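A minimal C++ illustration of the two mechanisms follows; the function names are hypothetical. In call by value the callee works on a copy of the argument, while in call by reference it receives the location of the caller's variable. C has only call by value, so there the reference effect is obtained by passing a pointer explicitly.

```cpp
// Call by value: the callee receives a copy; the change does not reach the caller.
void incrementByValue(int n) { n = n + 1; }

// Call by reference: the callee receives the caller's location (a C++ reference),
// so the assignment changes the caller's variable.
void incrementByReference(int& n) { n = n + 1; }

// C-style equivalent: the address is passed (by value) and dereferenced.
void incrementByPointer(int* n) { *n = *n + 1; }
```

Starting from int a = 5, `incrementByValue(a)` leaves a at 5, `incrementByReference(a)` makes it 6, and `incrementByPointer(&a)` makes it 7.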
