CC 1
About Me
Amir Ali,
PhD Candidate,
Xian Jiaotong University, China
MS (CCS), SEECS, NUST, Pakistan
Email: [email protected]
Aims of Course
Syllabus
• Introduction
• Lexical Analysis (scanning)
• Syntax Analysis (parsing)
• Semantic Analysis
• Intermediate Representations
• Storage Management
• Code Generation
• Code Optimisation
Why Take this Course
Definitions – Compilers : Language processors
(compile: collect material into a list, volume)
• What is a compiler?
• A program that accepts as input a program text in a certain
language and produces as output a program text in another
language, while preserving the meaning of that text (Grune et al,
2000).
• A program that reads a program written in one language (source
language) and translates it into an equivalent program in another
language (target language) (Aho et al)
• What is an interpreter?
• A program that reads a source program and produces the results
of executing this source.
• We deal with compilers! Many of the same issues also arise with interpreters.
Compilation - Big Picture
Source Code
Assembly Code
Optimized for hardware
• Consists of machine instructions
• Uses registers and unnamed memory locations
• Much harder to understand by humans

        .globl _expr
_expr:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl 8(%ebp),%eax
        movl %eax,%edx
        leal 0(,%edx,4),%eax
        movl %eax,%edx
        imull 8(%ebp),%edx
        movl 8(%ebp),%eax
        incl %eax
        imull %eax,%edx
        movl 8(%ebp),%eax
        incl %eax
        imull %eax,%edx
        movl %edx,-4(%ebp)
        movl -4(%ebp),%edx
        movl %edx,%eax
        jmp L2
        .align 4
L2:
        leave
        ret
How to translate
• The generated machine code must execute precisely the same computation
as the source code
• Is there a unique translation? No!
• Is there an algorithm for an “ideal translation”? No!
• Up to this point we have treated a compiler as a single box that maps a source program into a
semantically equivalent target program. If we open up this box a little, we see that there are two parts to
this mapping: analysis and synthesis.
• The analysis part breaks up the source program into constituent pieces and imposes a grammatical
structure on them; lexical and syntax errors are detected at this stage.
• It then uses this structure to create an intermediate representation of the source program.
• If the analysis part detects that the source program is either syntactically ill-formed or semantically
unsound, then it must provide informative messages, so the user can take corrective action.
• The analysis part also collects information about the source program and stores it in a data structure
called a symbol table, which is passed along with the intermediate representation to the synthesis part.
• The synthesis part constructs the desired target program from the intermediate representation and the
information in the symbol table.
• The analysis part is often called the front end of the compiler; the synthesis part is the back end.
Structure of a Compiler
Front end: maps legal source code into IR
• Recognizes legal (& illegal) programs
• Reports errors in a useful way
• Produces IR & preliminary storage map
Modules:
1. Scanner
2. Parser
Front End
Scanner Example
• Maps the character stream into words, the basic units of syntax
    x = x + y
  becomes
    <id,x> <assign,=> <id,x> <op,+> <id,y>
• Produces pairs: a word and its part of speech
  In a pair such as <id,x>, id is the token type and x is the word.
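The mapping from characters to <token type, word> pairs can be sketched in a few lines of Python. This is only an illustrative sketch: the token names follow the slide, but the regular expressions and the function name scan are assumptions made for the example, not a definition from the course.

```python
import re

# Token types and the patterns that recognize them (illustrative choices).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("op",     r"[+\-]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return a list of (token type, word) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "skip":      # whitespace separates words but is not a token
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(scan("x = x + y"))
# [('id', 'x'), ('assign', '='), ('id', 'x'), ('op', '+'), ('id', 'y')]
```

A character that matches no pattern is reported as an error, which is the scanner's share of "recognizes legal (& illegal) programs".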
Parser
• Recognizes context-free syntax and reports
errors
• Guides context-sensitive (“semantic”)
analysis
• Builds IR for source program
Context-Free Grammars
Grammar for expressions
1. goal → expr
2. expr → expr op term
3.      | term
4. term → number
5.      | id
6. op   → +
7.      | -

• For this CFG
  S = goal
  T = { number, id, +, - }
  N = { goal, expr, term, op }
  P = { 1, 2, 3, 4, 5, 6, 7 }
• Given a CFG, we can derive sentences by repeated substitution
• Consider the sentence (expression) x + 2 - y

Production   Result
             goal
1            expr
2            expr op term
5            expr op y
7            expr - y
2            expr op term - y
4            expr op 2 - y
6            expr + 2 - y
3            term + 2 - y
5            x + 2 - y
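Derivation by repeated substitution can be replayed mechanically. The sketch below is an illustration under assumptions: the names PRODUCTIONS and derive are invented for the example, and the sentence is written with token types (id, number) in place of the words x, 2 and y. Each step rewrites the rightmost occurrence of the production's left-hand side, which reproduces the order used in the table.

```python
# Productions of the expression grammar, numbered as on the slide.
PRODUCTIONS = {
    1: ("goal", ["expr"]),
    2: ("expr", ["expr", "op", "term"]),
    3: ("expr", ["term"]),
    4: ("term", ["number"]),
    5: ("term", ["id"]),
    6: ("op",   ["+"]),
    7: ("op",   ["-"]),
}

def derive(steps, start="goal"):
    """Apply each numbered production to the rightmost occurrence of its left-hand side."""
    form = [start]
    history = [" ".join(form)]
    for number in steps:
        lhs, rhs = PRODUCTIONS[number]
        index = len(form) - 1 - form[::-1].index(lhs)   # rightmost occurrence of lhs
        form[index:index + 1] = rhs                     # substitute the right-hand side
        history.append(" ".join(form))
    return history

for line in derive([1, 2, 5, 7, 2, 4, 6, 3, 5]):
    print(line)
# last line printed: id + number - id   (the token types of x + 2 - y)
```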
The Front End
• To recognize a valid sentence in some CFG, we reverse this
process and build up a parse
• A parse can be represented by a tree: parse tree or syntax tree
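A parse tree for x + 2 - y can be built with a small hand-written parser. This is a sketch of one possible technique, not the method the course prescribes: it handles the left-recursive rule expr → expr op term with a loop, and the names parse and leaves as well as the tuple-based tree layout are assumptions made for the example.

```python
def parse(tokens):
    """Build a parse tree from (token type, word) pairs for the expression grammar."""
    pos = 0

    def term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("number", "id"):
            leaf = ("term", tokens[pos])
            pos += 1
            return leaf
        raise SyntaxError(f"expected number or id at token {pos}")

    # expr -> term, then repeatedly expr -> expr op term, growing the tree to the left.
    tree = ("expr", term())
    while pos < len(tokens) and tokens[pos][0] == "op":
        op = ("op", tokens[pos])
        pos += 1
        tree = ("expr", tree, op, term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]} at position {pos}")
    return ("goal", tree)

def leaves(node):
    """Read the words back off the tree, left to right."""
    if node[0] in ("term", "op"):
        return [node[1][1]]
    return [word for child in node[1:] for word in leaves(child)]

tokens = [("id", "x"), ("op", "+"), ("number", "2"), ("op", "-"), ("id", "y")]
tree = parse(tokens)
print(leaves(tree))   # ['x', '+', '2', '-', 'y']
```

Because the tree grows to the left, x + 2 ends up as a subtree of the subtraction, matching the derivation in which production 2 is applied twice.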
A language-processing system
• Each phase takes input from its previous stage, has its own
representation of the source program, and feeds its output to the next
phase of the compiler.
Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the symbol table to check the source
program for semantic consistency with the language definition.
• It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent
use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks that each operator
has matching operands.
• For example, many programming language definitions require an array index to be an integer; the
compiler must report an error if a floating-point number is used to index an array.
• The language specification may permit some type conversions called coercions. For example, a binary
arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers. If the
operator is applied to a floating-point number and an integer, the compiler may convert or coerce the
integer into a floating-point number.
• The semantic analyzer also keeps track of identifiers, their types and expressions, and checks, for example,
whether identifiers are declared before use. It produces an annotated syntax tree as output.
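The two checks described above, operand matching with coercion and integer array indices, can be sketched as follows. The type names "int" and "float" and the functions check_binary and check_index are assumptions made for this illustration; real languages have richer type systems and rules.

```python
def check_binary(op, left_type, right_type):
    """Return (result type, coercions) for an arithmetic operator, or raise TypeError."""
    numeric = ("int", "float")
    if left_type not in numeric or right_type not in numeric:
        raise TypeError(
            f"operator {op!r} needs numeric operands, got {left_type} and {right_type}"
        )
    if left_type == right_type:
        return left_type, []                     # operands already match
    # Mixed int/float: coerce the integer operand to floating point.
    side = "left" if left_type == "int" else "right"
    return "float", [(side, "inttofloat")]

def check_index(index_type):
    """Array indices must be integers in this sketch."""
    if index_type != "int":
        raise TypeError(f"array index must be int, got {index_type}")
    return "int"

print(check_binary("*", "float", "int"))   # ('float', [('right', 'inttofloat')])
```

The returned coercion list is what a semantic analyzer could record in the annotated syntax tree, so that later phases know where a conversion has to be inserted.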
Intermediate Code Generation
• After syntax and semantic analysis of the source program, many compilers generate an explicit low-level
or machine-like intermediate representation, which we can think of as a program for an abstract machine.
This intermediate representation should have two important properties: it should be easy to
produce and it should be easy to translate into the target machine.
• In later lectures, we consider an intermediate form called three-address code, which consists of a
sequence of assembly-like instructions with three operands per instruction. Each operand can act like a
register.
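Three-address code can be produced by walking an expression tree and giving every intermediate result a fresh temporary. The sketch below assumes the assignment id1 = id2 + id3 * 60 as a running example (chosen to fit the names inttofloat, t3 and id1 used in these slides); the tuple-based tree layout and the function name gen_tac are illustrative assumptions.

```python
from itertools import count

def gen_tac(target, expr):
    """Flatten a nested expression into a list of three-address instructions."""
    code = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):          # a name or constant is already an operand
            return node
        op, *args = node
        operands = [walk(arg) for arg in args]
        temp = f"t{next(temps)}"
        if len(operands) == 1:             # unary operation such as inttofloat
            code.append(f"{temp} = {op}({operands[0]})")
        else:
            code.append(f"{temp} = {operands[0]} {op} {operands[1]}")
        return temp

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code

expr = ("+", "id2", ("*", "id3", ("inttofloat", "60")))
for line in gen_tac("id1", expr):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```

Each instruction has at most one operator on its right-hand side, which is what makes the form easy to produce and easy to translate further.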
Code Optimization
• The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and
for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the
floating-point number 60.0.
• Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform the code into a shorter
sequence.
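Both improvements can be illustrated on a small instruction list. This is a sketch under assumptions: the four-instruction input is the assumed running example id1 = id2 + id3 * 60, instructions are represented as (destination, operator, operand, operand) tuples, and optimize is a hypothetical name. Temporaries are not renumbered afterwards.

```python
def optimize(code):
    """Fold inttofloat of integer literals, then remove a trailing copy of a temporary."""
    # Pass 1: evaluate inttofloat on an integer literal at compile time.
    consts = {}
    folded = []
    for dest, op, a, b in code:
        a = consts.get(a, a)
        b = consts.get(b, b)
        if op == "inttofloat" and a.isdigit():
            consts[dest] = f"{int(a)}.0"   # remember the constant, emit nothing
            continue
        folded.append((dest, op, a, b))
    # Pass 2: if the last instruction only copies the temporary computed just
    # before it, write that result directly into the copy's destination.
    if len(folded) >= 2:
        (prev_dest, op, a, b), (dest, last_op, src, _) = folded[-2], folded[-1]
        if last_op == "copy" and src == prev_dest:
            folded[-2:] = [(dest, op, a, b)]
    return folded

code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
for instruction in optimize(code):
    print(instruction)
# ('t2', '*', 'id3', '60.0')
# ('id1', '+', 'id2', 't2')
```

Four instructions become two: the conversion happens at compile time, and t3 disappears because its only purpose was to carry a value to id1.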
Code Generation
• The code generator takes as input an intermediate representation of the source program and maps it into
the target language. If the target language is machine code, registers or memory locations are selected for
each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine instructions that perform the
same task. A crucial aspect of code generation is the assignment of registers to hold variables.
• For example, using registers R1 and R2, the intermediate code from the optimization phase might get translated
into machine code.
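A very small code generator makes the register assignment visible. Everything specific here is an assumption for illustration: the mnemonics LDF, MULF, ADDF and STF (in the style of the Aho et al. textbook), the convention that temporaries are named t1, t2, ..., the two-instruction input, and the function name generate. The sketch does no spilling, so it fails if it runs out of registers.

```python
MNEMONIC = {"+": "ADDF", "-": "SUBF", "*": "MULF"}

def is_constant(operand):
    """True if the operand is a numeric literal rather than a name."""
    try:
        float(operand)
        return True
    except ValueError:
        return False

def generate(code, registers=("R1", "R2")):
    free = list(registers)      # registers not currently holding a value
    location = {}               # temporary name -> register holding it
    out = []

    def into_register(operand):
        """Return a register holding the operand, loading it if necessary."""
        if operand in location:
            return location.pop(operand)
        reg = free.pop()        # IndexError when registers run out (no spilling)
        source = f"#{operand}" if is_constant(operand) else operand
        out.append(f"LDF {reg}, {source}")
        return reg

    for dest, op, a, b in code:
        ra = into_register(a)
        if is_constant(b):
            second = f"#{b}"              # constants are used as immediates
        else:
            second = into_register(b)
            free.append(second)           # that register is free after the operation
        out.append(f"{MNEMONIC[op]} {ra}, {ra}, {second}")
        if dest.startswith("t"):
            location[dest] = ra           # temporaries stay in their register
        else:
            out.append(f"STF {dest}, {ra}")
            free.append(ra)
    return out

code = [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")]
for line in generate(code):
    print(line)
# LDF R2, id3
# MULF R2, R2, #60.0
# LDF R1, id2
# ADDF R1, R1, R2
# STF id1, R1
```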
Symbol Table
The discussion of phases deals with the logical organization of a compiler. In an implementation, activities from
several phases may be grouped together into a pass that reads an input file and writes an output file. For example, the
front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate code generation might be
grouped together into one pass. Code optimization might be an optional pass. Then there could be a back-end pass
consisting of code generation for a particular target machine.
Compiler-Construction Tools
Tools are available for each phase of a compiler; we will look at some of them and try to get hands-on experience.
Full Compiler Structure
Qualities of a Good Compiler