
Chapter-1: Introduction

Compiler and its various phases-Cousins of Compiler-The Grouping of Phases-Compiler Construction Tools

Compiler and its various phases


 A compiler is a program that can read a program written in one language – the source language – and
translate it into an equivalent program in another language – the target language (see Figure 1.1)

source program → Compiler → target program

Figure 1.1: Compiler

 A compiler also reports any errors in the source program that it detects during the translation process
If the target program is an executable machine-language program, it can then be called by the user to process
input and produce output
 There are two parts responsible for mapping source program into a semantically equivalent target program:
analysis and synthesis
 The analysis part breaks up the program into constituent pieces and imposes grammatical structure on
them
 It then uses this structure to create an intermediate representation of the source program
If the analysis part detects syntax or semantic errors, it provides informative messages
 The analysis part also collects information about the source program and stores it in a data structure called
symbol table, which is passed along with an intermediate representation to the synthesis part
 The synthesis part constructs the desired target program from the intermediate representation and the
information in the symbol table
 The analysis part is often called the front end of the compiler; the synthesis part the back end
 The compilation process operates as a sequence of phases each of which transforms one representation of
the source program into another
 A typical decomposition of a compiler into phases is shown in Figure 1.3
In practice several phases may be grouped together, and the intermediate representations between them need not
be constructed explicitly
 The symbol table, which stores information about the entire source program, is used by all phases of the
compiler

Lexical Analysis

 The first phase of a compiler is called lexical analysis or scanning


 The lexical analyzer reads the stream of characters making up the source program and groups them into
meaningful sequences called lexemes
 For each lexeme the lexical analyzer produces a token of the form:
<token-name, attribute-value>
that it passes on to the subsequent phase, syntax analysis
In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and
the second component attribute-value points to an entry in the symbol table for this token
 For example, suppose a source program contains the following assignment statement
position = initial + rate * 60 (1.1)

 The characters in this assignment could be grouped into the following lexemes and mapped into the
following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The symbol table entry holds
information about the identifier, such as its name and type
2. = is a lexeme that is mapped into the token <=>. Since it needs no attribute value, the second
component is omitted
3. initial is a lexeme that would be mapped into a token <id, 2>, where 2 points to the symbol table entry
for initial
4. + is a lexeme that is mapped into the token <+>
5. rate is a lexeme that would be mapped into a token <id, 3>, where 3 points to the symbol table entry
for rate
6. * is a lexeme that is mapped into the token <*>
7. 60 is a lexeme that is mapped into the token <60>
After lexical analysis, the assignment statement (1.1) is represented by the token sequence

<id, 1><=><id, 2><+><id, 3><*><60> (1.2)

 In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and
multiplication operators, respectively
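The grouping of characters into <token-name, attribute-value> pairs can be sketched in a few lines of Python. This is a hedged illustration rather than any particular scanner: the token classes and the symbol-table layout are simplified assumptions made for the running example.

```python
import re

# Token classes for the running example; whitespace is discarded.
TOKEN_SPEC = [
    ("id",  r"[A-Za-z_]\w*"),   # identifiers such as position, rate
    ("num", r"\d+"),            # integer literals such as 60
    ("op",  r"[=+*]"),          # single-character operators
    ("ws",  r"\s+"),            # whitespace between lexemes
]
PATTERN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []           # names in order of first appearance; attribute = index + 1
    tokens = []
    for match in PATTERN.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "ws":
            continue            # whitespace separates lexemes but is not a token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            tokens.append((lexeme, None))   # operators and literals carry no attribute here
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
```

Running this on (1.1) yields the token sequence of (1.2), with position, initial, and rate entered into the symbol table in that order.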

Syntax Analysis

 The second phase of a compiler is syntax analysis or parsing


 The parser uses the tokens produced by the lexical analyzer to create a tree-like intermediate representation
– called syntax tree – that depicts the grammatical structure of the token stream
 In a syntax tree, each interior node represents an operator and the children of the node represent the
arguments of the operation (operands)
The syntax tree for the previous token stream (1.2) is shown in Figure 1.4

Figure 1.4: Syntax tree


 This tree shows the order in which the operations in the assignment in (1.1) are to be performed
 Multiplication is done first, followed by addition, and finally assignment
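The tree just described can be sketched as nested (operator, left, right) tuples; the nesting encodes precedence, and a post-order walk yields the operators in exactly the evaluation order stated above. This encoding is purely illustrative, not a representation any particular compiler uses.

```python
# Syntax tree for (1.1): = at the root, * deepest, so * is performed first.
tree = ("=", "position",
        ("+", "initial",
         ("*", "rate", 60)))

def postorder(node):
    """Yield the operators in the order their operations are performed."""
    if isinstance(node, tuple):
        op, left, right = node
        yield from postorder(left)    # operands are evaluated before
        yield from postorder(right)   # the operator is applied
        yield op

order = list(postorder(tree))
```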

Semantic Analysis

 The semantic analyzer uses the syntax tree and the information in the symbol table to check the source
program for semantic consistency with the language definition
 It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent
use during intermediate code generation
An important part of semantic analysis is type checking, where the compiler checks that each operator has
matching operands, e.g., many programming language definitions require an array index to be an integer; the
compiler must report an error if a floating-point number is used instead
A language specification may permit type coercion, e.g., if a binary arithmetic operator is applied to an integer
and a floating-point operand, the compiler may convert or coerce the integer into a floating-point number
Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60
by itself forms an integer
The semantic analyzer first converts the integer 60 to a floating-point number before applying *
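The coercion rule described above can be sketched as a check on a single binary node. The node shapes and the helper name are illustrative assumptions; only the int-to-float conversion itself follows the running example.

```python
def coerce(op, left, right, types):
    """Type-check a binary node, inserting an int-to-float conversion if needed."""
    lt, rt = types[left], types[right]
    if lt == rt:
        return (op, left, right)          # operand types already match
    if {lt, rt} == {"int", "float"}:
        if lt == "int":                   # coerce whichever side is the int
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right)
    raise TypeError(f"operator {op} applied to {lt} and {rt} operands")

types = {"rate": "float", "60": "int"}
node = coerce("*", "rate", "60", types)   # the * from the running example
```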
Intermediate Code Generation

After syntax and semantic analysis, many compilers generate an explicit low-level or machine-like intermediate
representation, which we can think of as a program for an abstract machine
This intermediate representation should have two properties: it should be easy to produce and it should be
easy to translate into the target machine language
One of these intermediate representations, called three-address code, consists of assembly-like instructions
with at most three operands per instruction (and at most one operator on the right side of an
assignment)
 Each operand can act like a register
 The output of the intermediate code generator can consist of the three-address code sequence
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3 (1.3)
The compiler must also generate a temporary name to hold the value computed by each three-address
instruction
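One way to sketch this phase: a post-order walk over the syntax tree that emits one three-address instruction per interior node and invents a fresh temporary for each intermediate result. This is a toy generator covering only the node kinds in the running example, not a general algorithm.

```python
def gen_three_address(tree):
    """Translate a tuple-encoded syntax tree into three-address instructions."""
    code, counter = [], [0]

    def new_temp():
        counter[0] += 1
        return f"t{counter[0]}"

    def walk(node):
        if not isinstance(node, tuple):
            return node                        # leaf: identifier or literal
        op, *kids = node
        args = [walk(k) for k in kids]         # post-order: children first
        if op == "=":
            code.append(f"{args[0]} = {args[1]}")
            return args[0]
        temp = new_temp()                      # fresh name for this result
        if op == "inttofloat":
            code.append(f"{temp} = inttofloat({args[0]})")
        else:
            code.append(f"{temp} = {args[0]} {op} {args[1]}")
        return temp

    walk(tree)
    return code

tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", "60"))))
code = gen_three_address(tree)    # reproduces the sequence (1.3)
```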

Code Optimization

 The machine-independent code-optimization phase attempts to improve the intermediate code so that
better target code will result
 Usually better means faster, but other objectives may be desired, such as shorter code or target code that
consumes less power
For example, a straightforward algorithm generates the intermediate code (1.3), using one instruction for each
operator in the tree representation that comes from the semantic analyzer
The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all
at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 with the
floating-point number 60.0
Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can transform (1.3) into the
shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1 (1.4)
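Both improvements can be sketched as one forward pass over the instruction list: fold an inttofloat of a constant at compile time, and fold a plain copy into the instruction that just computed the value. This is a toy peephole pass on the textual instructions; up to the numbering of temporaries, its output matches (1.4).

```python
import re

def optimize(code):
    """Fold constant conversions and trailing copies in three-address code."""
    subst = {}                      # temporary name -> folded constant text
    out = []
    for line in code:
        dst, rhs = [s.strip() for s in line.split("=", 1)]
        for name, val in subst.items():
            rhs = re.sub(rf"\b{name}\b", val, rhs)   # use folded constants
        m = re.fullmatch(r"inttofloat\((\d+)\)", rhs)
        if m:                       # conversion of a constant: do it now
            subst[dst] = m.group(1) + ".0"
            continue
        if re.fullmatch(r"t\d+", rhs) and out and out[-1].startswith(rhs + " ="):
            # dst is a plain copy of the value just computed:
            # write the result directly into dst instead
            out[-1] = dst + out[-1][len(rhs):]
            continue
        out.append(f"{dst} = {rhs}")
    return out

optimized = optimize(["t1 = inttofloat(60)", "t2 = id3 * t1",
                      "t3 = id2 + t2", "id1 = t3"])
```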

Code Generation

The code generator takes as input an intermediate representation of the source program and maps it into
the target language
 If the target language is machine language, registers or memory locations are selected for each of the
variables used by the program
 Then, the intermediate instructions are translated into sequences of machine instructions to perform the
same task
 A crucial aspect of code generation is the judicious assignment of registers to hold variables
 For example, using registers R1 and R2, the intermediate code in (1.4) might get translated into the
machine code
LDF R2, id3

MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1 (1.5)
 The first operand of each instruction specifies a destination
 The F in each instruction tells us that it deals with floating-point numbers
The code in (1.5) loads the contents of address id3 into register R2, then multiplies it by the floating-point
constant 60.0
 The # signifies that 60.0 is to be treated as an immediate constant
 The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in
register R2
 Finally, the value in register R1 is stored into the address id1, so the code correctly implements the
assignment statement (1.1)
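The translation from (1.4) to (1.5) can be sketched as a small loop over the three-address instructions. The register handling here is deliberately naive (registers are handed out in a fixed order, and names beginning with t are assumed to be temporaries), just to make the mapping concrete.

```python
def codegen(code):
    """Map 'dst = a op b' three-address instructions to LDF/MULF/ADDF/STF."""
    ops = {"*": "MULF", "+": "ADDF"}
    regs = {}                       # temporary name -> register holding its value
    free = ["R2", "R1"]             # naive fixed allocation order
    asm = []

    def operand(x):
        if x in regs:
            return regs[x]          # value is already in a register
        if x.replace(".", "").isdigit():
            return "#" + x          # '#' marks an immediate constant
        return x                    # otherwise a memory address (variable name)

    for line in code:
        dst, rhs = [s.strip() for s in line.split("=", 1)]
        a, op, b = rhs.split()
        reg = free.pop(0)
        asm.append(f"LDF {reg}, {a}")                       # load first operand
        asm.append(f"{ops[op]} {reg}, {reg}, {operand(b)}")
        if dst.startswith("t"):
            regs[dst] = reg          # result stays in the register
        else:
            asm.append(f"STF {dst}, {reg}")                 # store into variable
    return asm

asm = codegen(["t1 = id3 * 60.0", "id1 = id2 + t1"])
```

Run on (1.4), this reproduces the five instructions of (1.5).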

Symbol-Table Management

An essential function of a compiler is to record the variable names used in the source program and to collect
information about various attributes of each name
These attributes may provide information about the storage allocated for a name, its type, its scope (where
in the program its name can be used), and, in the case of procedure names, such things as the number
and types of its arguments, the method of passing each argument (e.g., by value or by reference), and the
type returned
 The symbol table is a data structure containing a record for each variable name, with fields for attributes of
the name
 The data structure should be designed to allow the compiler to find the record for each name quickly and
to store or retrieve data from that record quickly
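A minimal sketch of such a structure: a dictionary keyed by name gives the fast lookup described above, and each record is itself a small dictionary of attribute fields. This is an illustration only; a real compiler's table must also handle nested scopes.

```python
class SymbolTable:
    """One record per name, with quick store and retrieval by name."""

    def __init__(self):
        self._records = {}

    def enter(self, name, **attrs):
        # create the record on first sight of the name, then merge attributes
        record = self._records.setdefault(name, {"name": name})
        record.update(attrs)
        return record

    def lookup(self, name):
        return self._records.get(name)   # None if the name is unknown

table = SymbolTable()
table.enter("position", type="float", scope="global")
record = table.lookup("position")
```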

Cousins of Compiler

An interpreter is another common kind of language processor. Instead of producing a target program as
a translation, an interpreter appears to directly execute the operations specified in the source program on
input supplied by the user
The machine-language target program produced by a compiler is usually much faster than an interpreter at
mapping inputs to outputs
 An interpreter can usually give better error diagnostics than a compiler, because it executes the source
program statement by statement
 Several other programs may be needed in addition to a compiler to create an executable program as shown
in Figure 1.2.
The task of a preprocessor (a separate program) is to collect modules of a program stored in separate files
It may also expand shorthands, called macros, into source language statements
 The modified source program is fed to a compiler
 The compiler may produce an assembly-language program as its output, because assembly language is
easier to produce as an output and easier to debug
The assembly-language program is then processed by a program called an assembler, which produces
relocatable machine code as its output
Large programs are often compiled in pieces, so the relocatable machine code may have to be linked
with other relocatable object files and library files to form the code that actually runs on the machine
 The linker resolves external memory addresses, where the code in one file may refer to a location in
another file
 The loader then puts together all executable object files into memory for execution

The Grouping of Phases


 The discussion of phases deals with the logical organization of a compiler
 In an implementation, activities from several phases may be grouped together into a pass that reads an
input file and writes an output file
For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate
code generation might be grouped together into one pass
 Code optimization may be an optional pass
 Then there could be a back-end pass consisting of code generation for a particular target machine

Compiler Construction Tools


 Some commonly used compiler construction tools include:
1. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens
of the language
2. Parser generators that automatically produce syntax analyzers from a grammatical description of a
programming language
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and
generating intermediate code
4. Code generator generators that produce a code generator from a collection of rules for translating each
operation of the intermediate language into the machine language for a target machine
5. Data-flow analysis engines that facilitate the gathering of information about how values are
transmitted from one part of the program to each other part. Data-flow analysis is a key part of code
optimization
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various
phases of a compiler
