1.1 Compilers
• A compiler is a program that reads a program
written in one language (the source language)
and translates it into an equivalent program in
another language (the target language).
• As an important part of this translation
process, the compiler reports to its user the
presence of errors in the source program.
• At first glance, the variety of compilers may
appear overwhelming.
• Target languages are equally varied.
• Compilers are sometimes classified as:
– single-pass
– multi-pass
– load-and-go
– debugging
– optimizing
• The basic tasks that any compiler must
perform are essentially the same.
• Throughout the 1950s, compilers were
considered notoriously difficult programs to
write.
The Analysis-Synthesis Model of
Compilation:
– Analysis
– Synthesis
Analysis of the Source Program:
– Linear Analysis:
– Hierarchical Analysis:
– Semantic Analysis:
Scanning or Lexical Analysis (Linear
Analysis):
Syntax Analysis or Hierarchical Analysis
(Parsing):
Parse tree for position := initial + rate * 60:
assignment statement
    identifier: position
    :=
    expression
        expression
            identifier: initial
        +
        expression
            expression
                identifier: rate
            *
            expression
                number: 60
• In the expression initial + rate * 60, the phrase
rate * 60 is a logical unit because the usual
conventions of arithmetic expressions tell us
that the multiplication is performed before
addition.
• Because the expression initial + rate is
followed by a *, it is not grouped into a single
phrase by itself.
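• The hierarchical structure of a program is
usually expressed by recursive rules, such as
the following rules for expressions:
• Any identifier is an expression.
• Any number is an expression.
• If expression1 and expression2 are expressions,
then so are
– expression1 + expression2
– expression1 * expression2
– ( expression1 )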
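The recursive rules for expressions map naturally onto a recursive-descent parser. The Python sketch below is only an illustration, not a method prescribed by these notes: the function names (`tokenize`, `parse_expression`, `parse_term`, `parse_factor`, `parse`) and the tuple-based tree are assumptions. To make * group more tightly than +, the sketch splits the rules into three levels (expression, term, factor), which is a common technique but not the only one.

```python
import re

# One token per match: a number, an identifier, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    """Split text into (kind, value) pairs."""
    tokens = []
    for number, ident, other in TOKEN_RE.findall(text):
        if number:
            tokens.append(("number", number))
        elif ident:
            tokens.append(("identifier", ident))
        elif other.strip():
            tokens.append(("op", other))
    return tokens

def parse_expression(tokens, pos=0):
    """expression -> term { '+' term }"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == ("op", "+"):
        right, pos = parse_term(tokens, pos + 1)
        node = ("+", node, right)
    return node, pos

def parse_term(tokens, pos):
    """term -> factor { '*' factor }"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == ("op", "*"):
        right, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, right)
    return node, pos

def parse_factor(tokens, pos):
    """factor -> identifier | number | '(' expression ')'"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    kind, value = tokens[pos]
    if kind in ("identifier", "number"):
        return (kind, value), pos + 1
    if (kind, value) == ("op", "("):
        node, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ("op", ")"):
            raise SyntaxError("expected ')'")
        return node, pos + 1
    raise SyntaxError(f"unexpected token {value!r}")

def parse(text):
    """Parse a whole expression and reject trailing input."""
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tree

tree = parse("initial + rate * 60")
```

Here `tree` groups rate * 60 as the right operand of +, matching the observation that initial + rate is not a phrase by itself.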
Semantic Analysis:
1.3 The Phases of a Compiler:
• A compiler operates in phases.
• Symbol Table Management:
– An essential function of a compiler is to record the
identifiers used in the source program and collect
information about various attributes of each
identifier.
– The symbol table is a data structure containing a
record for each identifier with fields for the
attributes of the identifier.
– When the lexical analyzer detects an identifier,
the identifier is entered into the symbol table.
However, the attributes of an identifier cannot
normally be determined during lexical analysis.
– The remaining phases enter information about
identifiers into the symbol table and then use this
information in various ways.
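A symbol table of this kind can be sketched as a dictionary of records. The Python below is a minimal illustration under stated assumptions: the class and field names (`SymbolRecord`, `SymbolTable`, `enter`, `lookup`, `type`, `address`) are hypothetical, and the type "real" assigned to rate is an assumed declaration, not something these notes state.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SymbolRecord:
    name: str
    type: Optional[str] = None      # unknown during lexical analysis
    address: Optional[int] = None   # storage is allocated by a later phase

class SymbolTable:
    def __init__(self):
        self._records = {}

    def enter(self, name):
        """Return the record for name, creating it the first time it is seen."""
        if name not in self._records:
            self._records[name] = SymbolRecord(name)
        return self._records[name]

    def lookup(self, name):
        """Return the record for name, or None if it was never entered."""
        return self._records.get(name)

# The lexical analyzer enters identifiers as it meets them ...
table = SymbolTable()
for lexeme in ["position", "initial", "rate"]:
    table.enter(lexeme)

# ... and a later phase fills in attributes (assumed declaration: rate is real).
table.lookup("rate").type = "real"
```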
Error Detection and Reporting:
• Each phase can encounter errors.
• A compiler that stops when it finds the first
error is not as helpful as it could be.
• Errors where the token stream violates the
structure rules (syntax) of the language are
detected by the syntax analysis phase.
Intermediate Code Generation:
• After syntax and semantic analysis, some
compilers generate an explicit intermediate
representation of the source program.
• This intermediate representation should have
two important properties:
– it should be easy to produce, and
– it should be easy to translate into the target program.
• We consider an intermediate form called
“three-address code.”
• Three-address code consists of a sequence of
instructions, each of which has at most three
operands.
(1.3)
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
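One way to see where the instructions of (1.3) come from is to walk an expression tree bottom-up, giving each interior node a fresh temporary. The Python sketch below is an illustration only: the function name `gen_three_address` and the nested-tuple tree format are assumptions, and the `inttoreal` node is assumed to have been inserted already by semantic analysis.

```python
import itertools

def gen_three_address(target, expr):
    """Return three-address instructions that assign expr to target.

    expr is an identifier string, ("inttoreal", constant), or (op, left, right).
    """
    code = []
    temps = (f"temp{i}" for i in itertools.count(1))

    def walk(node):
        if isinstance(node, str):            # identifier such as "id2"
            return node
        if node[0] == "inttoreal":           # conversion of an integer constant
            temp = next(temps)
            code.append(f"{temp} := inttoreal({node[1]})")
            return temp
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = next(temps)                   # one new temporary per operator
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    result = walk(expr)
    code.append(f"{target} := {result}")
    return code

# id1 := id2 + id3 * inttoreal(60)
tree = ("+", "id2", ("*", "id3", ("inttoreal", 60)))
code = gen_three_address("id1", tree)
```

Each generated instruction has at most one operator besides the assignment, so it never needs more than three operands.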
Code Optimization:
• The code optimization phase attempts to
improve the intermediate code, so that faster-
running machine code will result.
• Some optimizations are trivial.
(1.4)
temp1 := id3 * 60.0
id1 := id2 + temp1
• That is, the compiler can deduce that the
conversion of 60 from integer to real
representation can be done once and for all at
compile time, so the inttoreal operation can
be eliminated.
• Besides, temp3 is used only once, to transmit
its value to id1. It then becomes safe to
substitute id1 for temp3, whereupon the last
statement of 1.3 is not needed and the code
of 1.4 results.
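The two improvements that turn (1.3) into (1.4) can be sketched on a tuple form of three-address code. This Python is a simplified illustration, not a general optimizer: the tuple layout `(dest, op, arg1, arg2)`, the operator name `"copy"`, and the function name `optimize` are assumptions, and the second step assumes each temporary is defined once and used once. Temporaries are not renumbered, so the result uses temp2 where (1.4) uses temp1.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples; returns a new list."""
    # Step 1: perform inttoreal on integer constants at compile time and
    # substitute the resulting real constant into later instructions.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and isinstance(a, int):
            constants[dest] = float(a)       # no instruction is emitted
        else:
            folded.append((dest, op, a, b))

    # Step 2: if the last instruction only copies the temporary computed by
    # the instruction before it, compute directly into the final target.
    if len(folded) >= 2:
        dest, op, a, _ = folded[-1]
        prev_dest, prev_op, prev_a, prev_b = folded[-2]
        if op == "copy" and a == prev_dest:
            folded[-2:] = [(dest, prev_op, prev_a, prev_b)]
    return folded

code_1_3 = [
    ("temp1", "inttoreal", 60, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
code_1_4 = optimize(code_1_3)
```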
Code Generation
• The final phase of the compiler is the
generation of target code.
• Memory locations are selected for each of the
variables used by the program.
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
• The first and second operands of each
instruction specify a source and destination,
respectively.
• This code moves the contents of the address
id3 into register 2, and then multiplies it by
the real constant 60.0.
• The third instruction moves id2 into register 1,
and the fourth adds to it the value previously
computed in register 2. Finally, the value in
register 1 is moved into the address of id1.
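A very naive code generator for such a two-operand machine can be sketched as follows. This is an illustration under strong assumptions, not how a production back end works: it takes optimized three-address code as `(dest, op, arg1, arg2)` tuples, assumes an unlimited supply of registers, and hands out a fresh register per instruction, so its register numbering differs from the listing above. The names `generate`, `operand`, and `OPCODES` are hypothetical.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}

def operand(x, location):
    """Format an operand: #constant, a register holding a temporary, or a name."""
    if isinstance(x, float):
        return f"#{x}"
    return location.get(x, x)

def generate(code):
    """Translate (dest, op, arg1, arg2) tuples into two-operand assembly."""
    asm = []
    location = {}                  # temporary name -> register that holds it
    next_reg = 1
    for dest, op, a, b in code:
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"MOVF {operand(a, location)}, {reg}")        # load first operand
        asm.append(f"{OPCODES[op]} {operand(b, location)}, {reg}")  # combine with second
        if dest.startswith("temp"):
            location[dest] = reg   # keep the temporary in its register
        else:
            asm.append(f"MOVF {reg}, {dest}")                    # store to a variable
    return asm

asm = generate([
    ("temp1", "*", "id3", 60.0),
    ("id1", "+", "id2", "temp1"),
])
```

As in the listing, the first operand of each instruction is the source and the second is the destination.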
1.4 Cousins of the Compiler:
• As the figure below shows, the input to a
compiler may be produced by one or more
preprocessors, and further processing of the
compiler’s output may be needed before
running machine code is obtained.
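Figure: a language-processing system.
skeletal source program -> preprocessor -> source program -> compiler -> assembler -> loader/link-editor -> absolute machine code
(The loader/link-editor also takes library and relocatable object files as input.)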
Preprocessors:
• Preprocessors produce input to compilers. They
may perform the following functions:
– Macro processing
– File inclusion
– “Rational” preprocessors
– Language extensions
• Macro Processing:
• File inclusion:
– A preprocessor may include header files into the
program text.
Figure: two files, defs.h and main.c. The preprocessor replaces the line #include "defs.h" in main.c with the contents of defs.h.
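File inclusion can be sketched in a few lines. The Python below is an illustration only and is not the real C preprocessor: it handles just the quoted form of #include, and the dictionary `files` (a stand-in for the file system), the function name `preprocess`, and the sample file contents are all assumptions.

```python
import re

INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')

def preprocess(name, files, active=()):
    """Return the lines of files[name] with #include lines expanded.

    files maps a file name to its text; active tracks the chain of files
    currently being expanded so that circular includes are detected.
    """
    if name in active:
        raise ValueError(f"circular include of {name}")
    output = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            # Replace the directive by the (recursively expanded) file.
            output.extend(preprocess(match.group(1), files, active + (name,)))
        else:
            output.append(line)
    return output

files = {
    "defs.h": "#define N 10\nint table[N];",
    "main.c": '#include "defs.h"\nint main(void) { return 0; }',
}
expanded = preprocess("main.c", files)
```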
• “Rational” Preprocessors:
– These processors augment older languages with
more modern flow-of-control and data-structuring
facilities.
• Language extensions:
– These processors attempt to add capabilities to
the language by what amounts to built-in macros.
– For example, the language Equel is a database
query language embedded in C. Statements
beginning with ## are taken by the preprocessor
to be database-access statements, unrelated to C,
and are translated into procedure calls on routines
that perform the database access.
Assemblers:
• Some compilers produce assembly code that
is passed to an assembler for further
processing.
• Assembly code is a mnemonic version of
machine code.
• A typical sequence of assembly instructions
might be
MOV a, R1
ADD #2, R1
MOV R1, b
• This code moves the contents of the address a
into register 1, then adds the constant 2 to it,
reading the contents of register 1 as a fixed-point
number, and finally stores the result in
the location named by b. Thus, it computes
b := a + 2.
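To check that the three instructions really compute b := a + 2, they can be run on a toy interpreter. The Python sketch below is an illustration only: it models just MOV and ADD with a source operand followed by a destination operand, and the names `run` and `is_register` are hypothetical.

```python
def is_register(name):
    """Registers are written R1, R2, ... in this sketch."""
    return name.startswith("R") and name[1:].isdigit()

def run(program, memory):
    """Execute lines of the form 'OP source , destination' and return memory."""
    registers = {}

    def read(operand):
        if operand.startswith("#"):          # immediate constant such as #2
            return int(operand[1:])
        if is_register(operand):
            return registers[operand]
        return memory[operand]               # named memory location

    for line in program:
        opcode, operands = line.split(None, 1)
        source, destination = [part.strip() for part in operands.split(",")]
        if opcode == "MOV":
            value = read(source)
        elif opcode == "ADD":
            value = read(source) + read(destination)
        else:
            raise ValueError(f"unknown opcode {opcode}")
        if is_register(destination):
            registers[destination] = value
        else:
            memory[destination] = value
    return memory

program = ["MOV a , R1", "ADD #2 , R1", "MOV R1 , b"]
result = run(program, {"a": 5})
```

With a holding 5, the location b ends up holding 7.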
Two-Pass Assembly:
• The simplest form of assembler makes two
passes over the input.
• In the first pass, all the identifiers that denote
storage locations are found and stored in a
symbol table.
Assembly code:
MOV a, R1
ADD #2, R1
MOV R1, b

Symbol table after the first pass:
IDENTIFIER  ADDRESS
a           0
b           4
• In the second pass, the assembler scans the input
again.
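The two passes can be sketched as two small functions. This Python is a simplified illustration under stated assumptions: a word is assumed to occupy 4 bytes (which yields the addresses 0 and 4), registers are assumed to be written R1, R2, and so on, and the second pass here only substitutes addresses for identifiers; the translation of operation codes into bit patterns is omitted. The names `first_pass`, `second_pass`, and `WORD_SIZE` are hypothetical.

```python
WORD_SIZE = 4  # assumed bytes per word

def split_operands(line):
    """Return (opcode, [operand, ...]) for a line like 'MOV a , R1'."""
    opcode, rest = line.split(None, 1)
    return opcode, [part.strip() for part in rest.split(",")]

def first_pass(lines):
    """Assign an address to each identifier in order of first appearance."""
    symbols = {}
    for line in lines:
        _, operands = split_operands(line)
        for operand in operands:
            is_register = operand.startswith("R") and operand[1:].isdigit()
            is_constant = operand.startswith("#")
            if not is_register and not is_constant and operand not in symbols:
                symbols[operand] = len(symbols) * WORD_SIZE
    return symbols

def second_pass(lines, symbols):
    """Scan the input again, replacing identifiers by their addresses."""
    output = []
    for line in lines:
        opcode, operands = split_operands(line)
        resolved = [str(symbols.get(operand, operand)) for operand in operands]
        output.append(opcode + " " + ", ".join(resolved))
    return output

source = ["MOV a , R1", "ADD #2 , R1", "MOV R1 , b"]
symbols = first_pass(source)
resolved = second_pass(source, symbols)
```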
Loaders and Link-Editors:
• The process of loading consists of taking
relocatable machine code, altering the
relocatable addresses, and placing the altered
instructions and data in memory at the proper
location.
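The address-altering step of loading can be sketched as follows. This Python is a toy model, not a real loader: each word of relocatable code is assumed to carry a flag saying whether it is a relocatable address, and the function name `relocate` and the load address 15 are assumptions chosen for illustration.

```python
def relocate(words, load_address):
    """Add load_address to every word flagged as a relocatable address.

    words is a list of (value, is_relocatable) pairs; the result is the
    list of values as they would be placed in memory.
    """
    return [
        value + load_address if is_relocatable else value
        for value, is_relocatable in words
    ]

# Addresses 0 and 4 are relocatable; the constant 2 is not.
relocatable_code = [(0, True), (2, False), (4, True)]
absolute_code = relocate(relocatable_code, 15)
```

With a load address of 15, the relocatable addresses 0 and 4 become 15 and 19, while the constant 2 is left unchanged.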
• The link-editor allows us to make a single
program from several files of relocatable
machine code.
1.5 The Grouping of Phases:
Front and Back Ends:
• The phases are collected into a front end and
a back end.
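• The front end consists of those phases that
depend primarily on the source language and
are largely independent of the target machine.
These normally include lexical and syntactic
analysis, the creation of the symbol table,
semantic analysis, and the generation of
intermediate code.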
• The front end also includes the error handling
that goes along with each of these phases.
• The back end includes those portions of the
compiler that depend on the target machine.
• In the back end, we find aspects of the code
optimization phase, and we find code
generation, along with the necessary error
handling and symbol table operations.
Passes:
Compiler-Construction Tools:
• The compiler writer, like any programmer, can
profitably use tools such as
– Debuggers,
– Version managers,
– Profilers and so on.
• In addition to these software-development
tools, other more specialized tools have been
developed for helping implement various
phases of a compiler.
• Shortly after the first compilers were written,
systems to help with the compiler-writing
process appeared.
• These systems have often been referred to as
– Compiler-compilers,
– Compiler-generators,
– or translator-writing systems.
• Some general tools have been created for the
automatic design of specific compiler
components.
• These tools use specialized languages for
specifying and implementing the component,
and many use algorithms that are quite
sophisticated.
• The most successful tools are those that hide
the details of the generation algorithm and
produce components that can be easily
integrated into the remainder of a compiler.
• The following is a list of some useful compiler-
construction tools:
– Parser generators
– Scanner generators
– Syntax directed translation engines
– Automatic code generators
– Data-flow engines
• Parser generators
– These produce syntax analyzers, normally from
input that is based on a context-free grammar.
– In early compilers, syntax analysis consumed not
only a large fraction of the running time of a
compiler, but a large fraction of the intellectual
effort of writing a compiler.
– This phase is now considered one of the easiest to
implement.
• Scanner generators:
– These tools automatically generate lexical
analyzers, normally from a specification based on
regular expressions.
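The idea of deriving a lexical analyzer from regular expressions can be sketched with Python's `re` module. This is an illustration only, not a scanner generator: it combines the patterns into one alternation at run time rather than generating a table-driven analyzer, and the token names and patterns in `TOKEN_SPEC` are assumptions.

```python
import re

# Each token class is specified by a regular expression (assumed patterns).
TOKEN_SPEC = [
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign",     r":="),
    ("operator",   r"[+*()]"),
    ("skip",       r"\s+"),
    ("error",      r"."),
]

# Combine the specification into a single pattern with one named group per class.
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def scan(text):
    """Return the list of (token class, lexeme) pairs found in text."""
    tokens = []
    for match in SCANNER.finditer(text):
        kind = match.lastgroup
        if kind == "skip":
            continue                          # white space separates tokens
        if kind == "error":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

tokens = scan("position := initial + rate * 60")
```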
• Syntax directed translation engines:
– These produce collections of routines that walk
the parse tree, generating intermediate code.
• Automatic code generators:
• Data-flow engines: