
Compiler Construction

About Me

Amir Ali,

PhD Candidate,
Xi'an Jiaotong University, China
MS (CCS), SEECS, NUST, Pakistan

Email: [email protected]
Aims of Course

• Any program written in a programming language must be translated before it
  can be executed. This translation is typically accomplished by a software
  system called a compiler.
• This module aims to introduce students to the principles and techniques used
  to perform this translation and the issues that arise in the construction of a
  compiler.
Learning Outcomes

• A student successfully completing this module should be able to:


• understand the principles governing all phases of the
compilation process.
• understand the role of each of the basic components of a
standard compiler.
• show awareness of the problems of and methods and
techniques applied to each phase of the compilation process.
• apply standard techniques to solve basic problems that arise in
compiler construction.
• understand how the compiler can take advantage of particular
processor characteristics to generate good code.
Books

• Aho, Lam, Sethi, Ullman. “Compilers: Principles, Techniques and


Tools”, 2nd edition. (Aho2) The 1st edition (by Aho, Sethi, Ullman –
Aho1), the “Dragon Book”, has been a classic for over 20 years.
• Cooper & Torczon. “Engineering a Compiler”
• Other books:
• Hunter et al. “The essence of Compilers” (Prentice-Hall)
• Grune et al. “Modern Compiler Design” (Wiley)
Syllabus

• Introduction
• Lexical Analysis (scanning)
• Syntax Analysis (parsing)
• Semantic Analysis
• Intermediate Representations
• Storage Management
• Code Generation
• Code Optimisation
Why Take this Course

Reason #1: understand compilers and languages


• understand the code structure
• understand language semantics
• understand relation between source code and generated
machine code
• become a better programmer

Why Take this Course

Reason #2: nice balance of theory and practice


• Theory
• mathematical models: regular expressions, automata,
grammars, graphs
• algorithms that use these models
• Practice
• Apply theoretical notions to build a real compiler

Why Take this Course

Reason #3: programming experience


• write a large program which manipulates complex data
structures
• learn more about C++/C#/Java

Definitions – Compilers : Language processors
(compile: collect material into a list, volume)
• What is a compiler?
• A program that accepts as input a program text in a certain
language and produces as output a program text in another
language, while preserving the meaning of that text (Grune et al,
2000).
• A program that reads a program written in one language (source
language) and translates it into an equivalent program in another
language (target language) (Aho et al)
• What is an interpreter?
• A program that reads a source program and produces the results
of executing this source.
• This course deals with compilers, though many of the same issues also arise with interpreters!
Compilation - Big Picture
Source Code

    int expr( int n )
    {
        int d;
        d = 4*n*n*(n+1)*(n+1);
        return d;
    }

• Optimized for human readability
• Matches human notions of grammar
• Uses named constructs such as variables and procedures
Assembly Code

    .globl _expr
    _expr:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl 8(%ebp),%eax
        movl %eax,%edx
        leal 0(,%edx,4),%eax
        movl %eax,%edx
        imull 8(%ebp),%edx
        movl 8(%ebp),%eax
        incl %eax
        imull %eax,%edx
        movl 8(%ebp),%eax
        incl %eax
        imull %eax,%edx
        movl %edx,-4(%ebp)
        movl -4(%ebp),%edx
        movl %edx,%eax
        jmp L2
        .align 4
    L2:
        leave
        ret

• Optimized for hardware
• Consists of machine instructions
• Uses registers and unnamed memory locations
• Much harder for humans to understand
How to translate

• The generated machine code must execute precisely the same computation
  as the source code
• Is there a unique translation? No!
• Is there an algorithm for an “ideal translation”? No!

• Translation is a complex process:
  • source language and generated code are very different
  • we need to structure the translation
How to translate

If the target program is an executable machine-language
program, it can then be called by the user to process
inputs and produce outputs.
How to translate
• C is typically compiled
• Lisp is typically interpreted
• Java is compiled to bytecodes, which are then interpreted by a
  Virtual Machine (perhaps across the network)
Structure of a Compiler

• Up to this point we have treated a compiler as a single box that maps a source program into a
  semantically equivalent target program. If we open up this box a little, we see that there are two parts to
  this mapping: analysis and synthesis.

• The analysis part breaks up the source program into constituent pieces and imposes a grammatical
  structure on them, detecting lexical, grammatical, and syntactic errors along the way.
• It then uses this structure to create an intermediate representation of the source program.
• If the analysis part detects that the source program is either syntactically ill formed or semantically
  unsound, it must provide informative messages so the user can take corrective action.
• The analysis part also collects information about the source program and stores it in a data structure
  called a symbol table, which is passed along with the intermediate representation to the synthesis part.
• The synthesis part constructs the desired target program from the intermediate representation and the
  information in the symbol table.

• The analysis part is often called the front end of the compiler; the synthesis part is the back end.
Structure of a Compiler

source code → [ Front End ] → IR → [ Back End ] → machine code
                    ↓ errors

Front end: maps legal source code into IR
• Recognizes legal (& illegal) programs
• Reports errors in a useful way
• Produces IR & preliminary storage map

Back end: maps IR into target machine code

Front End

source code → [ scanner ] → tokens → [ parser ] → IR
                    ↓ errors

Modules:
1. Scanner
2. Parser
Front End
Scanner Example
• Maps character stream into words – basic unit of syntax

  x = x + y   becomes   <id,x> <assign,=> <id,x> <op,+> <id,y>

• Produces pairs – a word (lexeme) and its part of speech (token type)
Parser
• Recognizes context-free syntax and reports
errors
• Guides context-sensitive (“semantic”)
analysis
• Builds IR for source program
Context-Free Grammars

• Context-free syntax is specified with a grammar


  G = (S, N, T, P)
• S is the start symbol
• N is a set of non-terminal symbols
• T is a set of terminal symbols or words
• P is a set of productions or rewrite rules
Context-Free Grammars
Grammar for expressions:

1. goal → expr
2. expr → expr op term
3.      | term
4. term → number
5.      | id
6. op   → +
7.      | -

For this CFG:
S = goal
T = { number, id, +, - }
N = { goal, expr, term, op }
P = { 1, 2, 3, 4, 5, 6, 7 }
Context-Free Grammars
• Given a CFG, we can derive sentences by repeated substitution
• Consider the sentence (expression) x + 2 – y

Production   Result
             goal
1            expr
2            expr op term
5            expr op y
7            expr – y
2            expr op term – y
4            expr op 2 – y
6            expr + 2 – y
3            term + 2 – y
5            x + 2 – y
Context-Free Grammars
The Front End
• To recognize a valid sentence in some CFG, we reverse this
  process and build up a parse
• A parse can be represented by a tree: parse tree or syntax tree

Production   Result
             goal
1            expr
2            expr op term
5            expr op y
7            expr – y
2            expr op term – y
4            expr op 2 – y
6            expr + 2 – y
3            term + 2 – y
5            x + 2 – y
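As an aside (not on the original slides), building up a parse can be sketched as a small recursive-descent recognizer. This is an illustrative Python sketch, assuming the left recursion in expr → expr op term has been rewritten as a loop, which accepts the same language:

```python
# Minimal recognizer for the expression grammar:
#   goal -> expr ; expr -> expr op term | term
#   term -> number | id ; op -> + | -
# Left recursion is rewritten as iteration: expr -> term (op term)*

def recognize(tokens):
    pos = 0

    def term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("number", "id"):
            pos += 1
            return True
        return False

    def expr():
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos][0] == "op":
            pos += 1          # consume + or -
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)

# The sentence x + 2 - y as a token stream:
tokens = [("id", "x"), ("op", "+"), ("number", "2"), ("op", "-"), ("id", "y")]
print(recognize(tokens))   # True
```

A real parser would also build the parse tree while recognizing; here only acceptance is checked.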
A language-processing system

• The task of collecting the source program is
  sometimes entrusted to a separate program, called a
  preprocessor. The preprocessor may also expand
  shorthands, called macros, into source-language
  statements.
• The compiler compiles the program and translates it
  into an assembly program (low-level language).
• An assembler then translates the assembly program
  into machine code.
• A linker tool is used to link all the parts of the
  program together for execution (executable machine
  code).
• A loader loads all of them into memory and then the
  program is executed.
Phases of a Compiler
• The compilation process is a sequence of various phases.

• Each phase takes input from its previous stage, has its own
representation of source program, and feeds its output to the next
phase of the compiler.

• In practice, several phases may be grouped together,
  and the intermediate representations between the grouped phases
  need not be constructed explicitly.
Lexical Analysis
• The first phase of a compiler is called lexical analysis or scanning.
• It reads the stream of characters making up the source program
  and groups the characters into meaningful sequences called
  lexemes.
• For each lexeme, the lexical analyzer produces as output a token
  of the form <token-name, attribute-value> that it passes on to the
  subsequent phase, syntax analysis.

• In the above token, the first component token-name is an
  abstract symbol that is used during syntax analysis, and the
  second component attribute-value points to an entry in the
  symbol table for this token. Information from the symbol-table
  entry is needed for semantic analysis and code generation.

Lexical Analysis
• For example: position = initial + rate * 60
• position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol standing
  for identifier and 1 points to the symbol-table entry for position.
• The assignment symbol = is a lexeme that is mapped into the token <=>. Since this token needs no
  attribute-value, we have omitted the second component. We could have used any abstract symbol such as
  assign for the token-name, but for notational convenience we have chosen to use the lexeme itself as the
  name of the abstract symbol.
• initial is a lexeme that is mapped into the token <id, 2>, where 2 points to the symbol-table entry for
  initial.
• + is a lexeme that is mapped into the token <+>.
• rate is a lexeme that is mapped into the token <id, 3>, where 3 points to the symbol-table entry for rate.
• * is a lexeme that is mapped into the token <*>.
• 60 is a lexeme that is mapped into the token <60> (later we may represent it as <number, 4>).
• Blanks separating the lexemes would be discarded by the lexical analyzer.
• In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and
  multiplication operators, respectively.
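As an illustrative sketch (not from the slides), a scanner producing exactly these <token-name, attribute-value> pairs for the statement above could look like this in Python; the table layout and function names are assumptions for illustration:

```python
import re

# Illustrative scanner: maps the character stream to <token-name, attribute> pairs.
# Identifiers get a symbol-table index as their attribute; operators need none.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("op",     r"[=+*]"),
    ("skip",   r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    symtab, tokens = [], []
    for m in PATTERN.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                      # blanks are discarded
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)     # enter identifier into the symbol table
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme,))      # operators carry no attribute-value
    return tokens, symtab

tokens, symtab = scan("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('number', 60)]
```

The symbol table here ends up as ["position", "initial", "rate"], matching the indices 1, 2, 3 used on the slide.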
Lexical Analysis
Syntax Analysis

• The second phase of the compiler is syntax analysis or parsing.
• The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like
  intermediate representation that depicts the grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node represents an operation and the
  children of the node represent the arguments of the operation.
• The tree has an interior node labeled * with <id, 3> as its left child
  and the integer 60 as its right child. The node <id, 3> represents
  the identifier rate. The node labeled * makes it explicit that we must first multiply the value of rate by 60.
• The node labeled + indicates that we must add the result of this multiplication to the value of initial. The
  root of the tree, labeled =, indicates that we must store the result of this addition into the location for the
  identifier position.
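Encoded as nested tuples (an illustrative encoding, not the slides' notation), this syntax tree and its evaluation order can be sketched as:

```python
# Syntax tree for: position = initial + rate * 60
# Each interior node is (operator, left, right); leaves are tokens.
# Nesting encodes precedence: * is applied first, then +, then =.
tree = ("=",
        ("id", 1),                       # position
        ("+",
         ("id", 2),                      # initial
         ("*",
          ("id", 3),                     # rate
          ("number", 60))))

def postorder(node):
    """Yield operators in evaluation order (children before parents)."""
    if node[0] in ("=", "+", "*"):
        yield from postorder(node[1])
        yield from postorder(node[2])
        yield node[0]

print(list(postorder(tree)))   # ['*', '+', '='] — multiply, then add, then store
```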
Semantic Analysis

• The semantic analyzer uses the syntax tree and the information in the symbol table to check the source
  program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent
  use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks that each operator
  has matching operands.
• For example, many programming language definitions require an array index to be an integer; the
  compiler must report an error if a floating-point number is used to index an array.
• The language specification may permit some type conversions called coercions. For example, a binary
  arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the
  operator is applied to a floating-point number and an integer, the compiler may convert, or coerce, the
  integer into a floating-point number.
• Also, the semantic analyzer keeps track of identifiers, their types and expressions, and whether identifiers
  are declared before use. The semantic analyzer produces an annotated syntax tree as its output.
Intermediate Code Generation

• After syntax and semantic analysis of the source program, many compilers generate an explicit low-level
  or machine-like intermediate representation, which we can think of as a program for an abstract machine.
  This intermediate representation should have two important properties: it should be easy to
  produce and it should be easy to translate into the target machine.
• In later lectures, we consider an intermediate form called three-address code, which consists of a
  sequence of assembly-like instructions with three operands per instruction. Each operand can act like a
  register.
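For the running example, three-address code can be produced by a bottom-up walk of the syntax tree. This sketch is illustrative: the temporary names t1, t2, … and the inttofloat coercion placement are assumptions in the spirit of the example, not a fixed specification:

```python
# Illustrative three-address code generator: walks the syntax tree bottom-up,
# emitting one instruction per operator into fresh temporaries t1, t2, ...
code, temp_count = [], 0

def new_temp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(node):
    kind = node[0]
    if kind == "id":
        return f"id{node[1]}"            # refer to the symbol-table entry
    if kind == "number":
        t = new_temp()
        # coercion recorded by semantic analysis becomes an explicit instruction
        code.append(f"{t} = inttofloat({node[1]})")
        return t
    op, left, right = node
    l, r = gen(left), gen(right)
    t = new_temp()
    code.append(f"{t} = {l} {op} {r}")   # at most three addresses per instruction
    return t

# Right-hand side of: position = initial + rate * 60
tree = ("+", ("id", 2), ("*", ("id", 3), ("number", 60)))
code.append(f"id1 = {gen(tree)}")
print("\n".join(code))
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```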
Code Optimization

• The machine-independent code-optimization phase attempts to improve the intermediate code so that
  better target code will result. Usually better means faster, but other objectives may be desired, such as
  shorter code, or target code that consumes less power. For example, a straightforward algorithm generates
  the intermediate code, using an instruction for each operator in the tree representation that comes from the
  semantic analyzer.

• The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and
  for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the
  floating-point number 60.0.
• Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform the code into a
  shorter sequence.
Code Generation

• The code generator takes as input an intermediate representation of the source program and maps it into
  the target language. If the target language is machine code, registers or memory locations are selected for
  each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine instructions that perform the
  same task. A crucial aspect of code generation is the assignment of registers to hold variables.
• For example, using registers R1 and R2, the intermediate code from the optimization phase might get
  translated into machine code.
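A toy code generator for the optimized two-instruction sequence might assign R1 and R2 like this; the instruction set (LDF/MULF/ADDF/STF) and register-allocation policy are illustrative assumptions, not a real target machine:

```python
# Illustrative code generator: translates optimized three-address code
#   t1 = id3 * 60.0 ; id1 = id2 + t1
# into a toy floating-point machine using registers R1 and R2.

def codegen(tac):
    asm, reg_of, free = [], {}, ["R1", "R2"]

    def load(operand):
        if operand in reg_of:
            return reg_of[operand]           # value already sits in a register
        if operand.replace(".", "").isdigit():
            return f"#{operand}"             # constant becomes an immediate operand
        reg = free.pop()
        asm.append(f"LDF {reg}, {operand}")  # load a variable from memory
        reg_of[operand] = reg
        return reg

    for target, left, op, right in tac:      # each entry: target = left op right
        opcode = {"*": "MULF", "+": "ADDF"}[op]
        l, r = load(left), load(right)
        asm.append(f"{opcode} {l}, {l}, {r}")
        reg_of[target] = l                   # result stays in the left register
        if target.startswith("id"):          # program variable: store it back
            asm.append(f"STF {target}, {l}")
    return asm

tac = [("t1", "id3", "*", "60.0"), ("id1", "id2", "+", "t1")]
for line in codegen(tac):
    print(line)
# LDF R2, id3
# MULF R2, R2, #60.0
# LDF R1, id2
# ADDF R1, R1, R2
# STF id1, R1
```

Note how the allocator keeps t1 in R2 across the two instructions instead of spilling it to memory; choosing which values to keep in registers is exactly the crucial aspect mentioned above.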
Symbol Table

• It is a data structure maintained throughout all the phases of a compiler.
• All the identifiers' names along with their types are stored here.
• The symbol table makes it easier for the compiler to quickly search for an identifier record and retrieve it.
• The symbol table is also used for scope management.

The Grouping of Phases into Passes

The discussion of phases deals with the logical organization of a compiler. In an implementation, activities from
several phases may be grouped together into a pass that reads an input file and writes an output file. For example, the
front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be
grouped together into one pass. Code optimization might be an optional pass. Then there could be a back-end pass
consisting of code generation for a particular target machine.

Compiler-Construction Tools
There are tools available for each stage/phase of the compiler (we will see them and try to get hands-on experience)
Symbol Table
Full Compiler Structure
Qualities of a Good Compiler

What qualities would you want in a compiler?


• generates correct code (first and foremost!)
• generates fast code
• conforms to the specifications of the input language
• copes with essentially arbitrary input size, variables, etc.
• compilation time (linearly) proportional to size of source
• good diagnostics
• consistent optimisations
• works well with the debugger
Principles of Compilation
The compiler must:
• preserve the meaning of the program being compiled.
• “improve” the source code in some way.
Other issues (depending on the setting):
• Speed (of compiled code)
• Space (size of compiled code)
• Feedback (information provided to the user)
• Debugging (transformations obscure the relationship between source and target code)
• Compilation time efficiency (fast or slow compiler?)
Uses of Compiler Technology
• Most common use: translate a high-level program to object code
• Program Translation: binary translation, hardware synthesis, …
• Optimizations for computer architectures:
• Improve program performance, take into account hardware parallelism, RISC, CISC,
etc…
• Automatic parallelisation
• Performance instrumentation: e.g., -pg option of cc or gcc
• Interpreters: e.g., Python, Ruby, Perl, Matlab, sh, …
• Software productivity tools
• Bound Checking, Type Checking, Debugging aids: e.g., Purify (memory-management
  errors, etc.)
• Security: Java VM uses compiler analysis to prove “safety” of Java code.
• Text formatters, just-in-time compilation for Java, power management,
global distributed computing, …
Key: Ability to extract properties of a source program (analysis) and
transform it to construct a target program (synthesis)
