
Compiler Design

Introduction

Falguni Sinhababu
Government College of Engineering and Leather Technology

Books
 Compilers: Principles, Techniques and Tools by Aho,
Lam, Sethi, Ullman (the "Dragon Book")
 Engineering a Compiler by Cooper and Torczon
 The Essence of Compilers by Hunter (Prentice-
Hall)
 Modern Compiler Design by Grune (Wiley)

Definitions
 What is a compiler?
 A program that accepts as input a program text in a
certain language and produces as output a program
text in another language, while preserving the meaning
of that text (Grune et al, 2000).
 A program that reads a program written in one
language (source language) and translates it into an
equivalent program in another language (target
language) (Aho et al)
General Structure of a Compiler

Source code → Compiler → Object code
                 |
                 v
          Error messages

The compiler:
 must generate correct code.
 must recognise errors.
 analyses and synthesises.

Definitions
 Interpreter
 A computer program that translates an instruction into machine language and
executes it before going to the next instruction.
 Cross-compiler
 A cross-compiler is a compiler that runs on one machine and produces object
code for another machine. If a compiler has been implemented in its own language,
the arrangement is called bootstrapping.
 Assembler
 An assembler translates an assembly language source code to executable or
almost executable object code.
 Macro assembler
 A macro assembler is a type of assembler that supports the use of macros, which are blocks of
code that can be defined and then invoked multiple times within a program, potentially saving time.

Qualities of a good compiler
 Generates correct code (first and foremost).
 Generates fast code.
 Conforms to the specification of the input language.
 Copes with essentially arbitrary input size, variables, etc.
 Compilation time (linearly) proportional to size of source.
 Good diagnostics.
 Consistent optimization.
 Works well with the debugger.

Principles of compilation
 Preserve the meaning of the program being compiled.
 Improve the source code in some discernible way:
 Speed (of compiled code).
 Space (size of compiled code).
 Feedback (information provided to the user).
 Debugging (transformations preserve an observable relationship between source
code and target code).
 Compilation-time efficiency (fast or slow compiler).

Language Processing System

Phases of a Compiler

Compilers and Interpreters
• Compilers generate machine code, whereas interpreters execute an intermediate
representation directly
• Interpreters are easier to write and can provide better error messages (the
symbol table is still available at run time)
• Interpreted code typically runs several times slower than machine code generated by
compilers
• Interpreters also require much more memory than machine code generated by
compilers
• Examples: C, C++, Java, Scala and C# use compilers; Perl, Ruby and PHP use interpreters.

Translation Overview – Lexical Analysis

Lexical Analysis (Scanning)
 Reads the characters in the source program and groups them into a stream of tokens (the basic
units of syntax).
 Each token represents a logically cohesive sequence of characters, such as an identifier,
operator or keyword.
 The character sequence that forms a token is called a lexeme.
 The output is called a token and is a pair of the form <type, lexeme> or <token_class,
attribute>
 a = b + c becomes
 <id, a> <=> <id, b> <+> <id, c>
 position = initial + rate * 60 becomes
 <id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
 The attribute of each id records its entry in the symbol table
 Lexical analysis eliminates white space
 FLEX or LEX is used for generating scanners: programs that recognize lexical patterns in
text.
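As a concrete illustration of scanning, here is a minimal hand-written tokenizer sketch. The token-class names and patterns are assumptions for illustration only; a scanner generated by LEX/FLEX would be driven by a full lexical specification of the source language.

```python
import re

# Hypothetical token patterns for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),      # white space is eliminated, as noted above
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Group characters into <token_class, lexeme> pairs.

    Characters matching no pattern are silently skipped in this sketch;
    a real scanner would report a lexical error instead.
    """
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("position = initial + rate * 60")))
# → [('ID', 'position'), ('ASSIGN', '='), ('ID', 'initial'),
#    ('OP', '+'), ('ID', 'rate'), ('OP', '*'), ('NUMBER', '60')]
```

Listing NUMBER before ID in the alternation mirrors how LEX-style tools disambiguate by rule order.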

Token Types
 Keywords: if, else, int, char, do, while, for, struct, return etc
 Constants: often these are numbers, strings or characters
 Numbers are numeric literals of whatever numeric types the language supports
 Strings are text items the language can recognize
 In C or C++ → “This is string”
 Characters are single letters
 In C or C++ →‘C’
 Identifiers: names the programmer has given to something. These include variables, functions,
classes, enumerations, etc. Each language has rules specifying how these names can be
written.
 Operators: these are the mathematical, logical and other operators that the language can
recognize
 +, -, *, /, % (modulo), -- (decrement), ++ (increment) etc
 Other tokens: punctuation such as { ( ) } may be valid in the language but is not treated as a keyword or operator

Attributes of Tokens
 The lexical analyser returns to the parser a representation of the token it has found. The
representation is an integer code if the token is a simple construct such as a left parenthesis, comma or
colon. The representation is a pair of an integer code and a pointer to a table if the token is a more
complex element such as an identifier or constant. The integer code gives the token type and the
pointer points to the value of that token.
 The token names and the associated attribute values for the FORTRAN statement
 E = M * C ** 2 are written below as a sequence of pairs
 <id, pointer to symbol table entry for E>
 <assign_op>
 <id, pointer to symbol table entry for M>
 <multi_op>
 <id, pointer to symbol table entry for C>
 <exp_op>
 <number, integer value 2>
 For special operators, punctuation and keywords, there is no need for an attribute value. In this example,
the token number has been given an integer-valued attribute.
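The pairing of a token type with a symbol-table pointer can be sketched as follows; here the "pointer" is simply an index into a Python list, and the classification of lexemes is simplified for illustration (the lexemes are assumed to be already split).

```python
class SymbolTable:
    """Interns identifiers; a token carries an index (the 'pointer') into it."""
    def __init__(self):
        self.entries = []
        self.index = {}

    def intern(self, name):
        if name not in self.index:
            self.index[name] = len(self.entries)
            self.entries.append({"name": name})
        return self.index[name]

# Token stream for  E = M * C ** 2
symtab = SymbolTable()
tokens = []
for lexeme in ["E", "=", "M", "*", "C", "**", "2"]:
    if lexeme.isidentifier():
        tokens.append(("id", symtab.intern(lexeme)))
    elif lexeme.isdigit():
        tokens.append(("number", int(lexeme)))
    else:
        tokens.append((lexeme,))      # operators and punctuation: no attribute

print(tokens)
# → [('id', 0), ('=',), ('id', 1), ('*',), ('id', 2), ('**',), ('number', 2)]
```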

Lexical Analysis
• An LA can be generated automatically from a regular-expression specification
• LEX and Flex are two such tools

• LA is a deterministic finite state automaton


• Why is LA separated from parsing?
• Simplification of design – a software-engineering reason
• I/O issues are limited to the LA alone
• An LA based on a finite automaton is more efficient to implement than the pushdown
automaton used for parsing (which needs a stack)

Translation Overview – Syntax Analysis

Parsing or Syntax Analysis
• Syntax Analyzers ( Parsers) can be generated automatically from several
variants of context free grammar specifications
• LL(1) or LALR(1) are the most popular ones
• ANTLR (for LL(*)), YACC and Bison (for LALR(1)) are such tools

• Parsers are deterministic push-down automata


• Parsers cannot handle context-sensitive features of programming languages
• Variables are declared before use
• Types match on both sides of assignments
• Parameter types and numbers match between declaration and use
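To make the deterministic push-down idea concrete, here is a minimal recursive-descent (predictive) parser sketch for the toy grammar E → T { + T }, T → F { * F }, F → id | num. The grammar and the token tuples are assumptions for illustration, not taken from the slides.

```python
def parse_expr(tokens):
    """Parse a token list into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def eat(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # F -> id | num
        if peek() in ("ID", "NUMBER"):
            return eat(peek())[1]
        raise SyntaxError("expected operand")

    def term():                        # T -> F { '*' F }
        node = factor()
        while peek() == "MUL":
            eat("MUL")
            node = ("*", node, factor())
        return node

    def expr():                        # E -> T { '+' T }
        node = term()
        while peek() == "ADD":
            eat("ADD")
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

# initial + rate * 60 parses with '*' binding tighter than '+':
tree = parse_expr([("ID", "initial"), ("ADD", "+"),
                   ("ID", "rate"), ("MUL", "*"), ("NUMBER", 60)])
print(tree)   # → ('+', 'initial', ('*', 'rate', 60))
```

The implicit Python call stack plays the role of the pushdown automaton's explicit stack.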

Translation Overview – Semantic Analysis

Semantic Analysis
• Semantic consistency that cannot be handled at the parsing stage is handled
here
• Type checking of various language constructs is one of the most important tasks
• Stores type information in the symbol table or the syntax tree
• Types of variables, function parameters, array dimensions, etc.
• Used not only for semantic validation but also for subsequent phases of compilation

• The static semantics of a programming language can be specified using an
attribute grammar
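One of these checks can be sketched in a few lines: verifying that the types on both sides of an assignment agree, using type information recorded in the symbol table. The AST encoding, type names, and symbol-table contents here are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical symbol table populated by earlier declarations.
symtab = {"rate": "float", "count": "int"}

def type_of(node):
    """Infer the type of an expression node (int/float literals,
    identifiers, or (op, left, right) tuples)."""
    if isinstance(node, bool):
        raise TypeError("no boolean type in this toy language")
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "float"
    if isinstance(node, str):                 # identifier: look it up
        if node not in symtab:
            raise TypeError(f"{node} used before declaration")
        return symtab[node]
    op, left, right = node                    # binary operator node
    lt, rt = type_of(left), type_of(right)
    if lt != rt:
        raise TypeError(f"type mismatch in {op}: {lt} vs {rt}")
    return lt

def check_assign(target, expr):
    """Types must match on both sides of an assignment."""
    if symtab[target] != type_of(expr):
        raise TypeError("assignment type mismatch")

check_assign("count", ("+", "count", 1))      # OK: int = int + int
try:
    check_assign("count", "rate")             # int = float: rejected
except TypeError as e:
    print("semantic error:", e)
```

A production checker would also apply the language's coercion rules (e.g. int-to-float promotion) rather than requiring exact equality.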

Translation overview – Intermediate Code Generation

Intermediate Code Generation
• Generating machine code directly from source code entails two
problems
• With m languages and n target machines, we would need to write m×n compilers
• The code optimizer, one of the largest and most difficult-to-write components of
any compiler, could not be reused

• By converting source code to an intermediate code, a machine-independent code
optimizer may be used
• Intermediate code must be easy to produce and easy to translate to machine
code
• A sort of universal language
• Should not contain any machine-specific parameters (registers, addresses, etc.)
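A sketch of what "easy to produce" can look like in practice: lowering an expression tree to three-address code with compiler-generated temporaries. The tree shape and temporary naming are illustrative assumptions; note that the output mentions no registers or addresses, only virtual names.

```python
def lower(node):
    """Return (result_name, list of three-address instructions)."""
    code, counter = [], [0]

    def walk(n):
        if not isinstance(n, tuple):          # leaf: identifier or constant
            return str(n)
        op, left, right = n
        a, b = walk(left), walk(right)
        counter[0] += 1
        t = f"t{counter[0]}"                  # fresh compiler temporary
        code.append(f"{t} = {a} {op} {b}")
        return t

    return walk(node), code

# position = initial + rate * 60
result, code = lower(("+", "initial", ("*", "rate", 60)))
code.append(f"position = {result}")
print("\n".join(code))
# → t1 = rate * 60
#   t2 = initial + t1
#   position = t2
```

Each emitted line is a quadruple in disguise: one operator, at most two operands, one result.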

Different types of Intermediate Code
• The type of intermediate code deployed depends on the application
• Quadruples, triples, indirect triples and abstract syntax trees are the classical forms
used for machine-independent optimizations
• Static Single Assignment (SSA) form is a more recent form that enables more effective
optimizations
• Conditional constant propagation and global value numbering are more effective on
SSA
• The Program Dependence Graph is useful in automatic parallelization,
instruction scheduling and software pipelining

Translation Overview – Code Optimization

Machine Independent Code Optimization
• The intermediate code generation process introduces many inefficiencies
• Extra copies of variables, use of variables instead of constants, repeated evaluation
of expressions, etc.

• Code optimization removes such inefficiencies and improves the code
• Improvements may be in time, space or power consumption
• It changes the structure of programs, sometimes beyond recognition
• It may inline functions, unroll loops, eliminate some programmer-defined variables, etc.

• Code optimization consists of a bunch of heuristics; the percentage of
improvement depends on the program (and may even be zero)

Examples of Machine Independent Code Optimization
• Common sub-expression elimination
• Copy propagation
• Loop invariant code motion
• Partial redundancy elimination
• Induction variable elimination and strength reduction
• Code optimization needs information about the program
• Which expressions are being recomputed in a function?
• Which definitions reach a point?

• All such information is gathered through data-flow analysis
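As a toy illustration of the first item in the list above, here is local common sub-expression elimination over a single basic block of quadruples. This is a sketch under a simplifying assumption: a real pass would also invalidate available expressions whenever an operand is redefined, which this block never does.

```python
def local_cse(block):
    """Replace recomputations of an already-available expression
    with a copy of the temporary that holds its value."""
    available = {}        # (op, lhs, rhs) -> temp already holding the value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            available[key] = dest
            out.append((dest, op, a, b))
    return out

block = [
    ("t1", "+", "b", "c"),
    ("t2", "+", "b", "c"),     # recomputes b + c
    ("t3", "*", "t2", "d"),
]
for quad in local_cse(block):
    print(quad)
# → ('t1', '+', 'b', 'c')
#   ('t2', 'copy', 't1', None)
#   ('t3', '*', 't2', 'd')
```

Extending this beyond one basic block is exactly where the data-flow analysis mentioned above (available-expressions analysis) comes in.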

Translation Overview – Code Generation

Code Generation
 Converts intermediate code to machine code
 Each intermediate code instruction may result in many machine instructions or
vice-versa
 Must handle all aspects of machine architecture
 Registers, pipelining, cache, multiple function units, etc.
 Generating optimal code is an NP-complete problem
 Tree pattern matching-based strategies are the best and most common
 Needs tree intermediate code
 Storage allocation decisions are made here
 Register allocation and assignment are the most important problems

Machine-Dependent Optimization
 Peephole optimization
 Analyze a sequence of instructions in a small window (the peephole) and, using preset
patterns, replace them with a more efficient sequence
 Redundant instruction elimination
 E.g. replace the sequence [LD A, R1][ST R1, A] by [LD A, R1]
 Eliminate “jump to jump” instructions
 Use machine idioms (use INC instead of LD and ADD)
 Instruction scheduling (reordering) to eliminate pipeline interlocks and to increase
parallelism
 Trace scheduling to increase the size of basic blocks and increase parallelism
 Software pipelining to increase parallelism in loops
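The redundant-instruction pattern above can be sketched with a two-instruction window. The instruction tuples are an illustrative encoding, not a real ISA: `("LD", "A", "R1")` loads memory A into register R1, `("ST", "R1", "A")` stores R1 back to A.

```python
def peephole(instrs):
    """Drop a store that immediately follows a load of the same
    value: [LD A, R1][ST R1, A] becomes [LD A, R1]."""
    out = []
    for ins in instrs:
        if (out and ins[0] == "ST" and out[-1][0] == "LD"
                and out[-1][1:] == ins[1:][::-1]):
            continue                  # redundant store: R1 already equals A
        out.append(ins)
    return out

code = [("LD", "A", "R1"), ("ST", "R1", "A"), ("ADD", "R2", "R1")]
print(peephole(code))
# → [('LD', 'A', 'R1'), ('ADD', 'R2', 'R1')]
```

A real peephole optimizer applies a whole catalogue of such patterns repeatedly until no window matches.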

Thank You
