Chapter 1

The document discusses compilers and their design. It defines a compiler as a program that translates a program written in one language into an equivalent program in another language. It describes the main phases of a compiler as lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. It also discusses different types of compilers based on their target platform and number of passes.

Uploaded by

Bontu Girma
Copyright
© All Rights Reserved
Compiler Design

Department of Software Engineering


AASTU
Chapter One
What is a Compiler?
• A compiler is a program that reads a program in one language, the
source language, and translates it into an equivalent program in
another language, the target (object) language.

• Purpose: to translate programs written in a human-readable
language into machine language, which the computer can execute.
• The compiler also reports the presence of errors in the source
program during the process.
Analysis-Synthesis Model of Compilation
• Compilation has two parts:
• Analysis Part: breaks up the source program into its constituent
pieces and creates an intermediate representation (IR) of the
source program.
• Synthesis Part: constructs the desired target program from the
intermediate representation.

• During analysis, the structure of the source program is captured in a syntax tree.
[Figure: syntax tree for position := initial + rate * 60]


Classification of Compiler
• Based on Target Platform:
– A native or hosted compiler is one whose output is intended to
run directly on the same type of computer and operating system
that the compiler itself runs on.
– A cross compiler is one whose output is intended to run on a
platform different from the one on which a compiler runs.
• Based on Number of passes (n-pass):
– One-Pass Compiler: The compilation is done in a single pass over
the source program, so compilation is typically fast.
– Multi-Pass Compiler: The compilation is done step by step.
Each step uses the result of the previous step and creates another
intermediate result.
Translator
• A translator is a program that takes as input a program written
in one programming language (source language) and produces
as output a program in another language (object language).

• Classes of Translator:
– Compiler
– Assembler
– Interpreter
– Preprocessor
Compiler Vs Assembler
• Compiler:

• Assembler:
Compiler Vs Interpreter
• Compiler:

• Interpreter: Instead of producing a target program as a
translation, an interpreter appears to directly execute the operations
specified in the source program on inputs supplied by the user.
Cont..
• Preprocessor: A preprocessor is a translator that takes a program
in one high-level language and produces an equivalent program in another
high-level language.
• Compilation and Execution:
– Executing a program written in a high-level programming language is
basically a two-step process:
1. Compilation: Source Program → Compiler → Object / Target Program (the compiler also produces Error Messages)
2. Execution: Object Program Input → Object Program → Object Program Output
Structure / Phases of a Compiler
Phases Vs Passes
• Phases:
– Phases of a compiler are the sub-tasks that must be
performed to complete the compilation process.
– Phases transform the source program from one
representation to another.
• Passes:
– Passes refer to the number of times the compiler
has to traverse through the entire program.
– Passes are a combination of one or more phases.
Phase grouping
Front-end & Back-end Compiler Structure

• Front-End: Analysis
– Read source program and understand its structure and meaning.
– A front-end of a compiler is responsible for the analysis of source code.
• Back-End: Synthesis
– Generate equivalent target language program.
– A back-end of a compiler is responsible for the synthesis of target code.
Front-end

• The front end consists of lexical analysis (scanner) and syntax analysis
(parser) and is machine-independent.
• Recognize legal source code.
• Report errors.
• Produce IR (Intermediate Representation).
• Preliminary storage maps.
Back-end

• The back end consists of code generation and optimization and
is very machine-dependent.
• Translate IR into target machine code.
• Choose instructions for each IR operation.
• Decide what to keep in registers at each point.
1. Lexical Analysis
• Lexical analysis is the first phase of a compiler.
• A lexical analyzer is also called a lexer or a scanner.
• Goal: scan the source program from left to right, character by
character, and group the characters into tokens.

• Tokens are sequences of characters with a collective meaning.

• The output of the scanner, a stream of tokens, is passed to the next
phase (the syntax analyzer) as input.
Cont…
• This phase builds tables that are used by subsequent phases
of the compiler. One such table, the symbol table, stores all
identifiers used in the source program, together with relevant
information and attributes of the identifiers.
• The symbol-table entry for an identifier holds information
about the identifier, such as its name and type.
• Example: position := initial + rate * 60 ;
Specification & Recognition of Tokens
• Regular expressions are an important notation used for
specifying tokens. (But they can't express all possible patterns.)
• Deterministic Finite-state Automata (DFA) are used to
recognize tokens that are specified by regular expressions.
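
As a minimal sketch of how a DFA recognizes tokens, the following Python code drives a transition table over the characters of a lexeme. The two token types (identifiers and unsigned integers), the state names, and the helper `classify` are illustrative assumptions, not part of these notes.

```python
import string

# Hypothetical DFA for two token types:
#   identifier: a letter or underscore followed by letters, digits, underscores
#   number:     one or more digits
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}
ACCEPTING = {"in_id": "identifier", "in_num": "number"}


def char_class(ch):
    """Map a character to the input alphabet of the DFA."""
    if ch in string.ascii_letters or ch == "_":
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"


def classify(lexeme):
    """Run the DFA over the lexeme; return its token type, or None if rejected."""
    state = "start"
    for ch in lexeme:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: the DFA rejects
            return None
    return ACCEPTING.get(state)
```

For example, `classify("rate")` yields `"identifier"`, `classify("60")` yields `"number"`, and `classify("6x")` is rejected because no transition leaves the number state on a letter.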
Tokens
• Each token has a type and a value:
– (token-type, attribute-value)
• Examples of Tokens:
– Operators (arithmetic, relational, logic)
– =, +, -, *, /, >, <, (, ), {, }, :=, <>, …
– Keywords
– if, while, for, int, double, …
– Constant literals (Integer, Double, Char, String, etc.)
– 43, 6.035, -3.6e10, 'a', '`', '\', "3.142", "aBcDe", "hello", …
– Punctuation, Identifiers, etc.
• Examples of non-Tokens:
– White space: space, tab ('\t'), end of line ('\n')
– Comments: /* this is not a token */
Token Type vs. Lexeme vs. Attribute
• Token type: A syntactic category/grouping
– In English: noun, verb, adjective, …
– In programming language: identifier, integer, keyword, ….
• Lexeme: Concrete manifestation of a token in the text.
– Example: position := initial + rate * 60 ;
• position, :=, initial, +, rate, *, and 60 are all lexemes, each of which is mapped to a token.

• Attribute: "Value of interest" about a token.

Example:
• Show the token classes, or “words”, put out by the lexical
analysis phase corresponding to this C++ source input:
a) position = initial + rate * 60 ;
b) sum = sum + unit * /* accumulate sum */ 1.2e-12 ;
• Solution:
a) identifier (position)
assignment (=)
identifier (initial)
operator (+)
identifier (rate)
operator (*)
numeric constant (60)
Cont…
b) sum = sum + unit * /* accumulate sum */ 1.2e-12 ;

identifier (sum)
assignment (=)
identifier (sum)
operator (+)
identifier (unit)
operator (*)
numeric constant (1.2e-12)
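
The token classes in this example can be reproduced with a small regular-expression scanner. This is a sketch under stated assumptions: the pattern table `TOKEN_SPEC`, the class names, and the function `tokenize` are hypothetical, and only the lexemes that occur in the two inputs are covered. Unlike the solution lists, the sketch also reports the trailing `;` as a punctuation token.

```python
import re

# Ordered list of (token class, pattern). Comments come before operators so
# that "/*" is not read as a division operator. All names are illustrative.
TOKEN_SPEC = [
    ("comment", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("whitespace", re.compile(r"\s+")),
    ("numeric constant", re.compile(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("assignment", re.compile(r"=")),
    ("operator", re.compile(r"[+\-*/]")),
    ("punctuation", re.compile(r";")),
]
SKIPPED = {"comment", "whitespace"}  # non-tokens: recognized, then discarded


def tokenize(source):
    """Scan left to right and return a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                if kind not in SKIPPED:
                    tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:  # no pattern matched at this position
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
    return tokens


for kind, lexeme in tokenize("sum = sum + unit * /* accumulate sum */ 1.2e-12 ;"):
    print(f"{kind} ({lexeme})")
```

The comment and the white space are matched but never emitted, which is how non-tokens disappear before the parser sees the input.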
2. Syntax Analysis
• The syntax analyzer is often called the parser.
• The parser has two functions:
– It receives a stream of tokens from the lexer and groups them
into phrases that match specified grammatical patterns.
– It outputs an abstract syntax tree
representing the syntactic structure of the tokens.
Cont…
• Syntax analysis or parsing is about discovering structure in
text and is used to determine whether or not a sequence of tokens
conforms to an expected format (grammatical patterns).
• Grammatical patterns are described by a context-free grammar.
• For example, an assignment statement may be defined as:
stmt → id = expr ;
expr → expr + expr | expr * expr | id | num
• The compiler checks to make sure the statements and
expressions are correctly formed:
– Example: "Is this a correct assignment statement?"
position = initial + rate * 60 ;
Push-down Automata/Machine (PDA)
• Pushdown machines can be used for syntax analysis, just as
finite state machines are used for lexical analysis.
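
The extra power of a pushdown machine comes from its stack. As a hedged illustration, the sketch below checks that brackets are properly nested, something a finite state machine cannot do for arbitrary nesting depth. The function name `is_balanced` and the choice of bracket pairs are assumptions made for this example.

```python
# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "}": "{"}


def is_balanced(text):
    """Accept the text only if every bracket is closed in the right order."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():   # opening bracket: push it
            stack.append(ch)
        elif ch in PAIRS:          # closing bracket: pop and compare
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        # every other character leaves the stack untouched
    return not stack               # accept only with an empty stack
```

Here `is_balanced("(A / B) * C")` accepts, while `is_balanced("A / (B * C")` rejects because an opening bracket is left on the stack at the end of the input.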
Cont…
• Syntax tree (parse tree):
– A parse tree is also known as a concrete syntax tree.
– In a syntax tree, each interior node represents an operation and
the children of the node represent the arguments of the operation.
• Example: Show a syntax tree for each of the following statements:
a) (A / B) * C
b) A / (B * C)
c) position = initial + rate * 60
d) if x > 100 then y := 1 else y := 2; (Assume that an if statement
consists of three subtrees, one for the condition, one for the
consequent statement, and one for the else statement, if necessary.)
Solution

[Figures: syntax trees for (a), (b), (c), and (d)]
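
The trees for (a), (b) and (c) can be derived mechanically. The following is a minimal recursive-descent sketch, assuming the usual precedence (`*` and `/` bind tighter than `+` and `-`) and left associativity, which the ambiguous grammar shown earlier does not itself specify. The names `lex`, `Parser` and `parse` are hypothetical, trees are nested tuples of the form `(operator, left, right)`, and the if statement of (d) is not handled.

```python
import re


def lex(text):
    """Split the input into identifiers, integers and one-character symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def statement(self):
        # stmt -> id = expr | expr
        if self.pos + 1 < len(self.tokens) and self.tokens[self.pos + 1] == "=":
            target = self.advance()
            self.advance()  # skip '='
            return ("=", target, self.expr())
        return self.expr()

    def expr(self):
        # expr -> term { (+|-) term }
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # term -> factor { (*|/) factor }
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor -> id | num | ( expr )
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(text):
    parser = Parser(lex(text))
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("(A / B) * C"))
print(parse("A / (B * C)"))
print(parse("position = initial + rate * 60"))
```

In each printed tuple the first element is the operation at an interior node and the remaining two elements are its children, so statement (c) has `=` at the root, `position` as its left child and the `+` subtree as its right child.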
3. Semantic Analyzer
• A semantic analyzer traverses the abstract syntax tree,
checking that each node is appropriate for its context, i.e., it
checks for semantic errors. It outputs a refined abstract syntax
tree.
• Find remaining errors that would make the program invalid:
– undefined variables, types
– type errors that can be caught statically
• Figure out useful information for later phases:
– types of all expressions
– data layout
Kinds of Checks
• Uniqueness checks
– Certain names must be unique
– Many languages require variable declarations
• Flow-of-control checks
– Match control-flow operators with structures
– Example: break applies to innermost loop/switch
• Type checks
– Check compatibility of operators and operands
• Logical checks
– Program is syntactically and semantically correct, but does
not do the "correct" thing
Examples of Reported Errors
• Undeclared identifier
• Multiply declared identifier
• Index out of bounds
• Wrong number or types of args to call
• Incompatible types for operation
• Break statement outside switch/loop
• Goto with no label
• etc…
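
Two of these errors, the undeclared identifier and incompatible types, can be illustrated with a small tree walk. This is only a sketch: the tuple-shaped tree `(operator, left, right)`, the function `check`, the two types `int` and `real`, and the rule that a real value may not be assigned to an int variable are all assumptions made for the example, not rules of any particular language.

```python
def check(node, symbols, errors):
    """Return the type of node ('int' or 'real'), or None if it has an error.

    symbols maps declared names to their types; diagnostics are appended
    to errors.
    """
    if isinstance(node, str):                      # leaf: constant or name
        if node.replace(".", "", 1).isdigit():     # numeric literal
            return "real" if "." in node else "int"
        if node not in symbols:
            errors.append(f"undeclared identifier '{node}'")
            return None
        return symbols[node]
    op, left, right = node                         # interior node
    left_type = check(left, symbols, errors)
    right_type = check(right, symbols, errors)
    if left_type is None or right_type is None:
        return None                                # error already reported
    if op == "=":
        if left_type == "int" and right_type == "real":
            errors.append(
                f"incompatible types: cannot assign real to int '{left}'"
            )
            return None
        return left_type
    # arithmetic: the result is real if either operand is real
    return "real" if "real" in (left_type, right_type) else "int"


symbols = {"position": "real", "initial": "real", "rate": "real"}
errors = []
tree = ("=", "position", ("+", "initial", ("*", "speed", "60")))
check(tree, symbols, errors)
print(errors)
```

Because `speed` was never declared, the walk records one diagnostic and gives the whole statement no type, while the same tree with `rate` in place of `speed` type-checks as `real`.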
4. Intermediate Code Generation
• Intermediate code is code that represents the semantics of a
program, but is machine-independent.
• An intermediate code generator receives the abstract syntax
tree and it outputs intermediate code that semantically
corresponds to the abstract syntax tree.

• This stage marks the boundary between the front end and the
back end.
• The intermediate representation should have two important
properties: it should be easy to produce, and easy to translate
into the target program.
Cont…
• One popular type of intermediate-language representation is
"Three Address Code (TAC)".
• A three-address code statement has the form: A := B op C
where A, B and C are operands and op is a binary operator.
• The parse tree for (A/B) * C might be converted into the
three-address sequence:

T1 := A/B;
T2 := T1 * C;

(where T1 and T2 are temporary variables)
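
One way to obtain such a sequence is a post-order walk of the tree that creates a fresh temporary for every interior node. The sketch below assumes the tree is a nested tuple `(operator, left, right)` with strings at the leaves; the function name `to_tac` and that representation are illustrative choices.

```python
from itertools import count


def to_tac(tree):
    """Return the three-address statements for an expression tree."""
    code = []
    temps = count(1)  # numbering for the temporaries T1, T2, ...

    def emit(node):
        if isinstance(node, str):        # leaf: the operand names itself
            return node
        op, left, right = node
        left_name = emit(left)           # children first (post-order)
        right_name = emit(right)
        temp = f"T{next(temps)}"
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    emit(tree)
    return code


for line in to_tac(("*", ("/", "A", "B"), "C")):
    print(line)
```

For the tree of (A/B) * C the walk visits the division first, so its result lands in `T1` before the multiplication that uses it is emitted.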
Cont…
• Control Statements are translated into three-address code by
using jump instructions.
• The basic idea of converting any flow of control statement to a
three address code is to simulate the “branching” of the flow
of control.
• Examples:-
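
As a hedged illustration of simulating the branching, the sketch below lays out an if-then-else as a conditional jump, the else part, an unconditional jump, and the then part. The function `if_else_to_tac`, the label names, and the `if ... goto` notation are assumptions for this example; a real translator would generate a fresh pair of labels for every statement.

```python
def if_else_to_tac(condition, then_code, else_code,
                   true_label="L1", end_label="L2"):
    """Translate `if condition then ... else ...` into jump-based code."""
    return (
        [f"if {condition} goto {true_label}"]  # branch when the test holds
        + else_code                            # fall through: else part
        + [f"goto {end_label}", f"{true_label}:"]
        + then_code                            # target of the branch
        + [f"{end_label}:"]
    )


for line in if_else_to_tac("x > 100", ["y := 1"], ["y := 2"]):
    print(line)
```

Placing the else part directly after the test means only one conditional jump is needed; the `goto` in front of the first label keeps the else part from running into the then part.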
5. Code Optimization
• The code optimization phase attempts to improve the
intermediate code.
• An optimizer reviews the code, looking for ways to reduce the
number of operations and the memory requirements.
– A program may be optimized for speed or for size.
– Typically there is a trade-off between speed and size.
• Code optimization aims to obtain more efficient code.

• Optimized code:
– executes faster
– uses memory efficiently
– yields better performance.
Example (Optimization)

• The intermediate code for position = initial + rate * 60; may be optimized as:

Three Address Code (TAC):
temp1 = inttoreal(60)
temp2 = rate * temp1
temp3 = initial + temp2
position = temp3

Optimized code:
temp1 = rate * 60.0
position = initial + temp1

Optimization Transformations
• Optimizations provided by a compiler include:
– Eliminating common sub-expressions
– Variable propagation
– Dead Code Elimination
– Local Optimization
– Compile Time Evaluation
– Removing redundant identifiers
– Loop optimizations: Code motion, Induction variable
elimination, and Reduction in strength.
– Etc…
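
Two of these transformations, compile time evaluation (with the computed constants propagated into later statements) and dead code elimination, can be sketched on straight-line three-address code. The representation is an assumption made for this example: each statement is a tuple `(dest, op, a, b)`, a plain copy is written `(dest, "copy", a, None)`, and the names `fold_constants` and `eliminate_dead_code` are hypothetical. The sketch covers a single block without jumps or side effects.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_number(x):
    return isinstance(x, (int, float))


def fold_constants(code):
    """Evaluate operations whose operands are known at compile time."""
    known = {}    # names whose value is a known constant
    result = []
    for dest, op, a, b in code:
        a = known.get(a, a)      # substitute known constants for names
        b = known.get(b, b)
        if op in OPS and is_number(a) and is_number(b):
            known[dest] = OPS[op](a, b)
            result.append((dest, "copy", known[dest], None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            result.append((dest, op, a, b))
    return result


def eliminate_dead_code(code, live_out):
    """Drop assignments whose destination is never read afterwards."""
    live = set(live_out)         # names needed after the block
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    return kept[::-1]


code = [
    ("t1", "*", 60, 60),
    ("t2", "*", "rate", "t1"),
    ("t3", "+", "initial", "t2"),
    ("unused", "+", "rate", 1),
    ("position", "copy", "t3", None),
]
optimized = eliminate_dead_code(fold_constants(code), live_out={"position"})
for statement in optimized:
    print(statement)
```

After folding, `t2` multiplies by the constant 3600 directly, so the assignment to `t1` is no longer read and is removed together with `unused`.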
6. Code Generation
• The final phase, the code generator, produces the object code.
• The code generator receives the (optimized) intermediate code.
• It produces either
– Machine code for a specific machine, or
– Assembly code for a specific machine and assembler.
• If it produces assembly code, then an assembler is used to
produce the machine code.
Example of Code Generation
• The intermediate code for position = initial + rate * 60; may be
translated into the assembly code:

TAC:
temp1 = inttoreal(60)
temp2 = rate * temp1
temp3 = initial + temp2
position = temp3

Assembly:
MOVF rate, R2
MULF #60.0, R2
MOVF initial, R1
ADDF R2, R1
MOVF R1, position

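
A very naive code generator can be sketched by translating each three-address statement in isolation: load the first operand into a register, apply the operation with the second operand, and store the result. The mnemonics MOVF, MULF and ADDF and the `source, destination` operand order are taken from the example above; the tuple format `(dest, a, op, b)`, the function `generate`, and the use of the single register R1 are assumptions. Unlike the assembly shown above, this sketch keeps no value in a register between statements, so its output is longer.

```python
MNEMONICS = {"+": "ADDF", "*": "MULF"}


def operand(x):
    """Write constants as immediates (#60.0); names are memory locations."""
    return f"#{x}" if isinstance(x, float) else x


def generate(code):
    """Translate (dest, a, op, b) statements one at a time."""
    asm = []
    for dest, a, op, b in code:
        asm.append(f"MOVF {operand(a)}, R1")               # load first operand
        asm.append(f"{MNEMONICS[op]} {operand(b)}, R1")    # R1 := R1 op b
        asm.append(f"MOVF R1, {dest}")                     # store the result
    return asm


optimized_tac = [
    ("temp1", "rate", "*", 60.0),
    ("position", "initial", "+", "temp1"),
]
for line in generate(optimized_tac):
    print(line)
```

Deciding which values to keep in registers, as the hand-written assembly does for the product, is exactly the part of code generation this sketch leaves out.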
Symbol Table
• The symbol table is a data structure that collects information
about names appearing in the source program.
• It keeps track of the scope/binding information about
names.
• It is used during all phases of compilation.
• Each entry in the symbol table is a pair of the form (name,
information).
• The information consists of attributes (e.g. type, location),
depending on the language.
• It may or may not be constructed during lexical and syntax
analysis, depending on the compiler.
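
A minimal sketch of such a structure, assuming block-structured scoping, is a stack of dictionaries in which each dictionary maps a name to its information. The class `SymbolTable` and its method names are hypothetical and chosen only for this illustration.

```python
class SymbolTable:
    """Stack of scopes; each scope maps a name to its attributes."""

    def __init__(self):
        self.scopes = [{}]          # the innermost scope is the last one

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **info):
        """Add a (name, information) entry to the innermost scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"multiply declared identifier '{name}'")
        scope[name] = info

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("rate", type="real")
table.enter_scope()
table.declare("rate", type="int")   # shadows the outer declaration
print(table.lookup("rate"))
table.exit_scope()
print(table.lookup("rate"))
```

Searching the scopes from the inside out is what binds a use of `rate` to the nearest enclosing declaration, and a failed lookup is the point at which an "undeclared identifier" error would be reported.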
Error Handling
• One of the most important functions of a compiler is the
detection and reporting of errors in the source program.
• Errors can be encountered by all of the phases of a compiler.
• What Happens When an Error Is Found?
• Whenever a phase of the compiler discovers an error, it
must report the error to the error handler, which issues an
appropriate diagnostic message.
• What Kinds of Errors Are Found During the Analysis Phase?
The Phases of a Compiler
• Programmer (source code producer)
– Output: Source string
– Sample: A=B+C;
• Scanner (performs lexical analysis)
– Output: Token string and symbol table with names
– Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
• Parser (performs syntax analysis based on the grammar of the programming language)
– Output: Parse tree or abstract syntax tree
– Sample:
      ;
      |
      =
     / \
    A   +
       / \
      B   C
• Semantic analyzer (type checking, etc.)
– Output: Annotated parse tree or abstract syntax tree
• Intermediate code generator
– Output: Three-address code, quads, or RTL
– Sample:
    int2fp B t1
    + t1 C t2
    := t2 A
• Optimizer
– Output: Three-address code, quads, or RTL
– Sample:
    int2fp B t1
    + t1 #2.3 A
• Code generator
– Output: Assembly code
– Sample:
    MOVF #2.3,r1
    ADDF2 r1,r2
    MOVF r2,A
• Peephole optimizer
– Output: Assembly code
– Sample:
    ADDF2 #2.3,r2
    MOVF r2,A
Compiler Writing Tools
• Different tools have been developed to help construct
compilers.
• Tools available to assist in the writing of lexical analyzers:
– lex - produces C source code (UNIX).
– flex - produces C source code (GNU).
– JLex - produces Java source code.
• Tools available to assist in the writing of parsers:
– yacc - produces C source code (UNIX).
– bison - produces C source code (GNU).
– CUP - produces Java source code.
