
Structure and phases of a compiler

The document provides an overview of compiler design, covering topics such as syntax and semantic analysis, optimization, and code generation. It discusses the theory of computer languages, the evolution of compilers, and the various stages involved in compilation. Additionally, it highlights the features of a good programming language and the challenges faced in compiler development.

Uploaded by

yoju1907

COMPILER DESIGN

1
Introduction to Compilers

In these slides, we will cover the following topics:


• Introduction
• Theory of computer languages
• Design of a language
• Evolution of compilers
• Stages of compilation

2
Introduction

• Syntax analysis (scanning and parsing)

• Semantic analysis (determining what a program should do)

• Optimization (improving the performance of a program as indicated by some metric, typically execution speed and/or space requirement)

• Code generation (generation and output of an equivalent program in some target language, often the instruction set of a CPU)

3
Theory of Computer Languages

• Natural Languages vs. Formal Languages

• Language and Grammar

• Notations and Conventions

• Hierarchy of Formal Languages

4
Natural Language vs. Formal Language

• For a natural language, there is no complete formal grammar

• For each formal language there is a grammar, and each grammar produces a formal language

• For the user, the following two sentences:

THE CHT CAUGHT THE MOUSE
THE CAT CAUGHT THE MOUSE
would convey the same meaning after correcting the spelling of the second word of the first sentence, since a context is already established by the relation between mouse and cat.

5
Natural Language vs. Formal Language
(Contd.)
For the user, the following two sentences:
THE MOUSE CAUGHT THE CAT and
THE CAT CAUGHT THE MOUSE
are both syntactically correct, but the first is semantically (meaning-wise) incorrect.

However, if they are parsed by a compiler, it treats both sentences as syntactically and semantically correct.

It is very difficult (though not impossible) for the computer (compiler) to resolve this kind of problem, due to the complexities involved in representing the sentence in a meaningful form.

6
Language and Grammar
Grammar means a set of rules governing a language.
Components of the language:
• Character Set
• Words (strings or tokens)
• Sentence
The sentence has different parts such as noun, verb, etc.

Consider a sentence:
The boy throws a ball

The components of the sentence are:


– The (article)
– boy (noun)
– throws (verb)
– a (article)
– ball (noun)

7
Parts of a sentence

8
Grammar Notations and Conventions

9
Grammar Notations and Conventions
(Contd.)

A generative grammar (G) can be formally defined as G = (VN, VT, R, S) such that:

VN and VT are finite sets of symbols (non-terminals and terminals), where VN ∩ VT = ∅
R is a set of pairs (P, Q) such that
a) P ∈ (VN ∪ VT)+
b) Q ∈ (VN ∪ VT)*
S ∈ VN (the start symbol)
R is the set of rewriting rules, each represented in the form P ::= Q or P → Q. We say P produces Q.
10
Hierarchy of Formal Languages

Type 0: unrestricted or phrase-structured grammar
Type 1: context-sensitive grammar (CSG)
Type 2: context-free grammar (CFG)
Type 3: regular grammar (RG)

Moving from Type 3 to Type 0:
1. Increasing power of expressing the language
2. Increasing complexity of the implementation

11
Comparison of the grammar and its
generated languages

12
Features of a Good Language

• Easy to understand
• Expressive power
• Interoperability
• Good turnaround time
• Portability
• Automatic error recovery
• Good error reporting

13
Features of a Good Language (Contd.)

• Efficient memory usage

• Provision of good run-time environment


• Support for virtual machine
• Support for concurrent operation
• Support for unblocked operations
• Garbage collection
• Ability to interface with foreign functions
• Ability to model real-world problems
• Ability to expose the functions for usage in other languages

14
Representation of Languages

Representation of a language with semantics: Type 1 grammar or context-sensitive grammar

Representation of a language without semantics: Type 2 grammar or context-free grammar

Representation of the words in the language: Type 3 grammar or regular grammar

15
Grammar of a Language

Formally, a grammar is a collection of a finite set of terminals, a finite set of non-terminals, a finite set of rewriting or production rules, and a start symbol.

Example: Grammar for assignment statement

Se → id = expr
expr → expr + term
expr → term
term → term * fact
term → fact
fact → (expr)
fact → id
fact → const
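The four components of a grammar can be made concrete with a small sketch. The encoding below (a dictionary named RULES and a helper named split_symbols) is a hypothetical choice for illustration, not part of the slides; it stores the assignment-statement grammar and recovers the terminal and non-terminal sets from the rules.

```python
# Hypothetical encoding of the assignment-statement grammar.
# Each non-terminal maps to a list of alternatives (right-hand sides).
RULES = {
    "Se":   [["id", "=", "expr"]],
    "expr": [["expr", "+", "term"], ["term"]],
    "term": [["term", "*", "fact"], ["fact"]],
    "fact": [["(", "expr", ")"], ["id"], ["const"]],
}
START = "Se"


def split_symbols(rules):
    """Return (non_terminals, terminals) implied by the rule set."""
    non_terminals = set(rules)
    used = {sym for alts in rules.values() for rhs in alts for sym in rhs}
    return non_terminals, used - non_terminals


VN, VT = split_symbols(RULES)

# The two sets must not overlap, and the start symbol is a non-terminal.
assert VN.isdisjoint(VT)
assert START in VN
print(sorted(VN))
print(sorted(VT))
```

Treating every symbol that never appears on a left-hand side as a terminal is a simplification that works for this grammar; a fuller tool would declare the terminal set explicitly.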

16
History of Compilers
Language         Year        Introduced by                         Place
lambda calculus  1930        Alonzo Church, Stephen Cole Kleene    Princeton University
FORTRAN          1954-1957   John Backus                           IBM
Simula           1960        Ole-Johan Dahl and Kristen Nygaard    Norwegian Computing Center
Smalltalk        1970        Alan Kay                              Xerox PARC
LISP             1958        John McCarthy                         Massachusetts Institute of Technology
Prolog           1970        Alain Colmerauer                      Marseille, France
Eiffel           1980s       Bertrand Meyer

17
History of Compilers (Contd.)
Language                            Year          Introduced by                                Place
Ada                                 1977 to 1983  Jean David Ichbiah                           CII Honeywell Bull
COBOL                               1960          Grace Hopper                                 Harvard University
BASIC                               1964          John George Kemeny and Thomas Eugene Kurtz   Dartmouth College in New Hampshire, USA
C                                   1972          Dennis Ritchie                               Bell Telephone Laboratories
C with classes                      1979          Bjarne Stroustrup                            Bell Labs
C++ (renamed from C with classes)   1983          Bjarne Stroustrup                            Bell Labs

18
History of Compilers (Contd.)

Language  Year  Introduced by            Place
Java      1995  James Gosling            Sun Microsystems
C#        2000  Led by Anders Hejlsberg  Microsoft

19
Development of Compilers
The typical issues from the language developer's perspective are:
• Where the source code comes from (keyboard, file, socket, etc.)
• What types of data are supported?
– Basic data types (Boolean, character, integer, real, etc.)
– Qualified data types (short, long, signed, unsigned, etc.)
– Derived data types (record, 1-D array, 2-D array, file, pointers, etc.)
• Types of constants (Boolean, char, string, etc.)
• How are variables represented?
• Size of each data type supported
• Scope of variables (static, dynamic)
20
Development of Compilers (Contd.)

• Lifetime of variables (local, global, external)
• Interface with other compiled code
• What kinds of error reports can be produced?
• The operating environment (Microsoft Windows, Unix variants)
• Whether parallel processing is supported and, if so, how to separate the units that are to run in parallel

21
Development of Compilers (Contd.)
The typical issues with respect to the compiler are:
• How to read the source code?
• How to represent the source code?
• How to separate the tokens?
• What are the data structures for storing variable information?
• How to store variables in memory (code, stack, or heap area)?
• How to manage the storage at run time?
• How to report errors linked to multiple lines?

22
Development of Compilers (Contd.)

• To what extent can semantic checking be done?
• Which intermediate code to prefer?
• Where to introduce the optimization process?
• Mapping of intermediate code to machine code
• Interface with the host operating system for any parallel processing support

23
Compiler—At a glance

24
Structure of a Compiler

25
Phases of a Compiler

26
Phases of a Compiler (Contd.)

27
Phases of a Compiler (Contd.)

28
Lexical Analysis
Scans the source code character by character and separates it into tokens, which are delimited by white-space characters, operators, and punctuators.

Given a code segment: D=A+B*C

Output of the scanner is:

Token No.              1   2   3   4   5   6   7
Token Type             Id  op  Id  op  Id  op  Id
Token or Lexeme Value  D   =   A   +   B   *   C

29
Lexical Analysis: Token Representation

Given a code segment: Area = 1 / 2 * base * height

Token representation is by a pair:
1. Token type
2. Lexeme value

<Id, “Area”>
<op, =>
<nConst, 1>
<op, />
<nConst, 2>
<op, *>
<Id, “base”>
<op, *>
<Id, “height”>
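A scanner that emits such (token type, lexeme) pairs can be sketched with regular expressions. The token type names Id, op, and nConst come from the slides; the patterns themselves and the function name tokenize are assumptions made for this sketch, and a production scanner would cover many more token classes.

```python
import re

# Token types follow the slides (Id, op, nConst); the regular
# expressions are assumptions chosen for this illustration.
TOKEN_SPEC = [
    ("nConst", r"\d+(?:\.\d+)?"),   # integer or decimal constant
    ("Id",     r"[A-Za-z_]\w*"),    # identifier
    ("op",     r"[=+\-*/]"),        # single-character operator
    ("skip",   r"\s+"),             # white space separates tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token type, lexeme) pairs for the source text."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":       # white space is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for pair in tokenize("Area = 1 / 2 * base * height"):
    print(pair)
```

The same function handles a segment written without spaces, such as D=A+B*C, because operators act as delimiters as well as tokens.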

30
Syntactic Analysis
Types of Statements
Declarative statement
Assignment statement (Sa)
    Lval = Expression
Control statement
    Selective
        If statement (Sif)
        If-then-else statement (Sie)
        Switch..Case (Ssc)
    Iterative statement
        For statement (Sfor)
        While statement (Swhile)
        Repeat while or Do while statement (Sdw)
    Goto statement (Sgo)
IO statement (Sio)

31
Syntactic Analysis: Syntax for Arithmetic
Statement

Se → Sa | Sif | Sie | Ssc | Sfor | Swhile | Sdw | Sgo

General format for a context-free grammar rule is: A → α

where A is a non-terminal and α is a string of grammar symbols (terminals or non-terminals)

32
Syntactic Analysis: Syntax for Arithmetic
Statement (Contd.)

Sa → Id = Expr
Sif → if Expr Se
Sie → if Expr Se else Se
Ssc → switch Expr Scase
Sfor → for Expr_init Expr_check Expr_incrdcr Se
Swhile → while Expr Se
Sdw → do Se while Expr
Sgo → goto L
Scase → Scase case Expr Se | ε

33
Top-down Parser
Begins with the start symbol and keeps expanding (producing) until the given sentence is produced

Example:
Given sentence: if (a<10) c=a+b else c=a-b

The derivation process is:

Sie ⇒ if Expr Se else Se ⇒ ... ⇒ if (a<10) c=a+b else c=a-b
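A top-down derivation of this kind is commonly implemented as a recursive-descent parser, with one method per non-terminal. The sketch below is a simplified illustration: the class name Parser, its method names, and the restriction of Expr to a single "operand op operand" form are all assumptions made to keep the example short.

```python
class Parser:
    """Recursive-descent (top-down) parser for a tiny statement grammar.

    Se  -> Sie | Sa
    Sie -> if ( Expr ) Se else Se
    Sa  -> Id = Expr
    Expr is simplified here to: operand op operand
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, wanted=None):
        """Consume the next token, optionally checking that it is `wanted`."""
        tok = self.peek()
        if tok is None or (wanted is not None and tok != wanted):
            raise SyntaxError(f"expected {wanted!r}, got {tok!r}")
        self.pos += 1
        return tok

    def statement(self):                  # Se -> Sie | Sa
        if self.peek() == "if":
            return self.if_else()
        return self.assignment()

    def if_else(self):                    # Sie -> if ( Expr ) Se else Se
        self.expect("if")
        self.expect("(")
        cond = self.binary()
        self.expect(")")
        then_part = self.statement()
        self.expect("else")
        else_part = self.statement()
        return ("Sie", cond, then_part, else_part)

    def assignment(self):                 # Sa -> Id = Expr
        target = self.expect()
        self.expect("=")
        return ("Sa", target, self.binary())

    def binary(self):                     # simplified Expr
        left, op, right = self.expect(), self.expect(), self.expect()
        return (op, left, right)


tokens = ["if", "(", "a", "<", "10", ")", "c", "=", "a", "+", "b",
          "else", "c", "=", "a", "-", "b"]
tree = Parser(tokens).statement()
print(tree)
```

Each method expands exactly one grammar rule, so the call sequence mirrors the derivation from the start symbol down to the sentence.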

34
Bottom-up Parser

Scans the given sentence from left to right to find a substring that matches the right-hand side of a grammar rule (a handle), and replaces it with the corresponding non-terminal on the left-hand side of that rule, repeating until the sentence is reduced to the start symbol

Finding the right handle is the central problem in bottom-up parsing
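The handle-finding idea can be sketched as a small shift-reduce loop. The version below is an illustration under stated assumptions, not a general LR parser: it hard-codes the usual expression grammar (Expr → Expr + Term | Term, Term → Term * Factor | Factor, Factor → (Expr) | Id | Const), and it uses a one-token lookahead rule, chosen for this sketch, to decide when a Term may be reduced to Expr. The function name shift_reduce is hypothetical.

```python
def shift_reduce(tokens):
    """Parse terminals (Id, Const, +, *, (, )) bottom-up.

    Returns the stack snapshots taken after every reduction.
    """
    stack, trace = [], []
    rest = list(tokens)

    def reduce_once():
        look = rest[0] if rest else None      # one-token lookahead
        top3 = stack[-3:]
        if stack and stack[-1] in ("Id", "Const"):           # Factor -> Id | Const
            stack[-1:] = ["Factor"]
        elif top3 == ["(", "Expr", ")"]:                      # Factor -> (Expr)
            stack[-3:] = ["Factor"]
        elif top3 == ["Term", "*", "Factor"]:                 # Term -> Term * Factor
            stack[-3:] = ["Term"]
        elif stack and stack[-1] == "Factor":                 # Term -> Factor
            stack[-1:] = ["Term"]
        elif top3 == ["Expr", "+", "Term"] and look != "*":   # Expr -> Expr + Term
            stack[-3:] = ["Expr"]
        elif stack and stack[-1] == "Term" and look != "*":   # Expr -> Term
            stack[-1:] = ["Expr"]
        else:
            return False                      # no handle on top of the stack
        trace.append(list(stack))
        return True

    while True:
        if reduce_once():
            continue
        if not rest:
            break
        stack.append(rest.pop(0))             # shift the next input symbol
    if stack != ["Expr"]:
        raise SyntaxError(f"cannot reduce to Expr: {stack}")
    return trace


for snapshot in shift_reduce(["Id", "+", "Id"]):
    print(" ".join(snapshot))
```

The order of the checks matters: longer handles such as Term * Factor are tried before the single-symbol rule Term → Factor, which is one simple way of picking the right handle for this particular grammar.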

35
Bottom-up Parsing — An Example
Grammar for arithmetic expression
Expr → Expr + Term
Expr → Term
Term → Term * Factor
Term → Factor
Factor → (Expr)
Factor → Id
Factor → Const
Given sentence to be parsed: A+B
A + B ⇒ Id + Id
      ⇒ Factor + Id
      ⇒ Term + Id
      ⇒ Expr + Id
      ⇒ Expr + Factor
      ⇒ Expr + Term
      ⇒ Expr, which is the start symbol for the expression

36
Ambiguous Grammar

• If, for some sentence, more than one sequence of productions or reductions (and hence more than one parse tree) exists, the grammar is said to be ambiguous

• In such cases, the grammar has to be modified to resolve the ambiguities

37
Semantic Analysis

Some of the operations in semantic analysis are:
• Checking the compatibility of data types
• Type conversion
• Static checking of boundary values
• Static checking of division by zero
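Three of these checks (type compatibility, type conversion, and static detection of division by zero) can be illustrated on a small expression tree. Everything named in the sketch is hypothetical: the SYMBOL_TYPES table, the type names int, real, and string, and the rule that int is widened to real are assumptions, not features of any particular language.

```python
# Hypothetical declared types for the variables used in the example.
SYMBOL_TYPES = {"count": "int", "ratio": "real", "name": "string"}


def check(node):
    """Return the type of an expression tree, or raise on a semantic error.

    A node is an int/float constant, a variable name, or (op, left, right).
    """
    if isinstance(node, int):
        return "int"
    if isinstance(node, float):
        return "real"
    if isinstance(node, str):
        return SYMBOL_TYPES[node]         # KeyError means an undeclared name
    op, left, right = node
    left_type, right_type = check(left), check(right)
    # Compatibility check: arithmetic is not defined on strings here.
    if "string" in (left_type, right_type):
        raise TypeError(f"operator {op!r} not defined for string operands")
    # Static check: a literal zero divisor is visible at compile time.
    if op == "/" and right in (0, 0.0):
        raise ZeroDivisionError("division by constant zero")
    # Type conversion: int is widened to real when the operands differ.
    return "real" if "real" in (left_type, right_type) else "int"


print(check(("+", "count", "ratio")))
```

Only a constant divisor can be caught this way; a divisor whose value is known only at run time needs a run-time check instead.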

38
Intermediate Code Generation (IC)

• At the end of syntax analysis, the program is transformed into another form known as intermediate code
• From this point onwards, there is no need to preserve the source code
• Further actions are based on the intermediate code

39
IC-Compilation process without IC

40
IC-Compilation process with IC

41
IC-Advantages

It supports many features, such as:


• Consistent and continuous programming Model
• Develop once and run anywhere
• Simplified deployment
• Wide platform reach
• Programming language integration
• Simplified code reuse
• Interoperability
• Optimization

42
IC-Types

Syntax Trees
Three address code
Quadruple
Triple
Indirect Triple
Any valid and usable IC

43
IC-Example with Syntax Tree

Example: A + B * C

        +
       / \
     Id   *
     |   / \
     A  Id  Id
        |   |
        B   C
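A syntax tree can be flattened into three-address code stored as quadruples by visiting it bottom-up. The sketch below assumes the tree is written as nested (op, left, right) tuples with identifier names as leaves, and that temporaries are named T1, T2, and so on; the function name to_three_address is hypothetical.

```python
def to_three_address(tree):
    """Return (quadruples, result name) for a tree of (op, left, right) tuples."""
    quads = []

    def walk(node):
        if isinstance(node, str):             # leaf: an identifier
            return node
        op, left, right = node
        left_name, right_name = walk(left), walk(right)
        temp = f"T{len(quads) + 1}"           # fresh temporary for this node
        quads.append((op, left_name, right_name, temp))
        return temp

    result = walk(tree)
    return quads, result


# Syntax tree for A + B * C
quads, result = to_three_address(("+", "A", ("*", "B", "C")))
for op, arg1, arg2, target in quads:
    print(f"{target} = {arg1} {op} {arg2}")
```

Because children are visited before their parent, the multiplication is emitted before the addition that uses its result.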

44
Code Optimization

The objective of optimization is to reduce the time requirement and/or the space requirement.

Optimization can be done on:
• Source code
• Intermediate code
• Machine code

Optimization on IC is simpler than on machine code

Optimization is mainly carried out at the code level and the loop level

45
Code Optimization (Contd.)

Code-level optimization takes care of:
• Reduction in cost by using appropriate equivalent operations (for example, replacing a multiplication by 2 with an addition)
• Elimination of repeated calculations
• Elimination of common sub-expression evaluations
• Identification and removal of unreachable code

If the number of iterations in a loop is very high, focusing on loop optimization gives greater benefit
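Two of the code-level optimizations, elimination of repeated constant calculations and elimination of common sub-expressions, can be sketched on quadruples of the form (op, arg1, arg2, result). The sketch makes loud assumptions: temporaries are named T1, T2, and so on, every name is assigned only once, and only temporaries may be removed. The function names optimize and is_temp are hypothetical.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_temp(name):
    """Assumed convention: compiler temporaries are named T1, T2, ..."""
    return name[:1] == "T" and name[1:].isdigit()


def optimize(quads):
    """Fold constant operations and reuse repeated sub-expressions.

    Assumes each result name is assigned once, so cached values never go stale.
    """
    known = {}      # temporary -> constant value or equivalent name
    seen = {}       # (op, arg1, arg2) -> name that already holds the value
    out = []
    for op, a, b, result in quads:
        a, b = known.get(a, a), known.get(b, b)     # substitute known values
        key = (op, a, b)
        if is_temp(result) and op in FOLD and isinstance(a, int) and isinstance(b, int):
            known[result] = FOLD[op](a, b)          # constant folding
        elif is_temp(result) and key in seen:
            known[result] = seen[key]               # common sub-expression
        else:
            seen.setdefault(key, result)
            out.append((op, a, b, result))
    return out


code = [
    ("*", 2, 4, "T1"),
    ("*", "B", "C", "T2"),
    ("*", "B", "C", "T3"),
    ("+", "T2", "T3", "T4"),
    ("+", "T4", "T1", "D"),
]
for quad in optimize(code):
    print(quad)
```

In this example five quadruples shrink to three: the constant product is computed at compile time and B * C is evaluated only once.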

46
Code Generation

• From the IC, the machine code is generated
• Needs knowledge of:
– The target machine (addressing modes, instruction set)
– The operating system under deployment (file format)

47
Code Generation: Example
Given sentence: D = A + B * C
Intermediate code:
T1 = B * C
T2 = A + T1
D = T2

Intermediate code    Machine code
T1 = B * C           MOV R1, B
                     MOV R2, C
                     MUL R1, R2
                     MOV T1, R1

T2 = A + T1          MOV R1, A
                     MOV R2, T1
                     ADD R1, R2
                     MOV T2, R1

D = T2               MOV R1, T2
                     MOV D, R1
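The translation pattern used in this example (load both operands into registers, apply the operation, store the result) can be written as a naive code generator. The instruction names MOV, ADD, SUB, MUL, and DIV and the two fixed registers R1 and R2 describe a hypothetical target machine assumed for the sketch; no real instruction set is implied.

```python
# Mapping from intermediate-code operators to assumed machine mnemonics.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(quads):
    """Translate quadruples into two-operand pseudo-instructions (OP dest, src).

    A quadruple is (op, arg1, arg2, result); a plain copy is ("=", src, None, dst).
    """
    code = []
    for op, a, b, result in quads:
        code.append(f"MOV R1, {a}")              # load the first operand
        if op != "=":
            code.append(f"MOV R2, {b}")          # load the second operand
            code.append(f"{OPCODES[op]} R1, R2")
        code.append(f"MOV {result}, R1")         # store the result
    return code


intermediate = [
    ("*", "B", "C", "T1"),
    ("+", "A", "T1", "T2"),
    ("=", "T2", None, "D"),
]
print("\n".join(generate(intermediate)))
```

This generator reloads every operand from memory, which is why issues such as register allocation and machine-level optimization matter in a real back end.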

48
Code Generation: Issues

Code generation is influenced by many factors, such as:

• Register Allocation
• Register Scheduling
• Code selection
• Addressing modes
• Instruction format
• Power of Instructions
• Optimization at the machine code level
• Back patching

49
Symbol Table: Management

With respect to the symbols in the program, the issues are:
• Where are they stored?
• When are they stored?
• How are they accessed?
• What is the format of an entry?
• What data structure can be used?

50
Symbol Table: Attributes associated with
symbols
• Name
• Type
• Size
• Location

Symbol Table: Data Structures

• Array of records
• Linked list of records
• Tree of records
• Hash data structure etc
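As a minimal sketch of the hash-based option, a symbol table can be built on a Python dictionary, with each entry holding the four attributes listed above. The class names Symbol and SymbolTable, the byte sizes, and the idea of assigning locations as running offsets are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str
    size: int        # size in bytes (assumed unit)
    location: int    # offset from the start of the data area


class SymbolTable:
    def __init__(self):
        self._symbols = {}       # dict acts as the hash table, keyed by name
        self._next_offset = 0    # next free location

    def declare(self, name, type_, size):
        """Store a new symbol; redeclaration is reported as an error."""
        if name in self._symbols:
            raise KeyError(f"{name!r} is already declared")
        symbol = Symbol(name, type_, size, self._next_offset)
        self._symbols[name] = symbol
        self._next_offset += size
        return symbol

    def lookup(self, name):
        """Return the symbol, or None if the name was never declared."""
        return self._symbols.get(name)


table = SymbolTable()
table.declare("A", "int", 4)
table.declare("B", "real", 8)
print(table.lookup("B"))
```

A hash table gives near-constant-time lookup by name, which is why it is a common choice over arrays or linked lists when programs contain many symbols.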

51
Error Management

Desirable characteristics of error management modules are:
• The compiler should not crash on encountering an error
• The compiler should report the error and recover from it so that it can proceed with the remaining lines of code
• The reported errors must be meaningful
• The error correction module should not change the meaning of the intended operations

52
Error Management (Contd.)

Lexical errors
• Caused by misspelling
• Juxtaposition of characters
Syntax errors (context-free grammar errors)
• Unbalanced parentheses
• Missing punctuation or operators
Semantic errors (context-sensitive grammar errors)
• Undeclared variables
• Assignment of incompatible data types
• Truncation of results
Unreachable code

53
Key Terms

• Natural language Vs Formal language


• Language and Grammar
• Notations
• Types of grammar
• History of compiler
• Phases of compiler

54
