Chapter One-Introduction

The document discusses the various phases of a compiler: 1) Lexical analysis breaks the source code into tokens that are passed to the syntax analyzer. 2) Syntax analysis builds a syntax tree from the tokens to represent the grammatical structure. 3) Semantic analysis checks for semantic consistency and performs type checking using the syntax tree and symbol table. 4) Intermediate code generation produces a low-level intermediate representation like 3-address code. 5) Code optimization improves the intermediate code for better target code generation. 6) Code generation maps the optimized intermediate code to instructions in the target language.

Uploaded by gebrehiwot



Chapter-1: Introduction

Compiler and its various phases; Cousins of Compiler; The Grouping of Phases; Compiler Construction Tools

Compiler and its various phases

 A compiler is a program that can read a program written in one language (the source language) and
translate it into an equivalent program in another language (the target language); see Figure 1.1

source program → Compiler → target program

Figure 1.1: Compiler

 A compiler also reports any errors in the source program that it detects during the translation process
 If the target program is an executable machine-language program, it can then be called by the user to process
inputs and produce outputs
 There are two parts responsible for mapping a source program into a semantically equivalent target program:
analysis and synthesis
 The analysis part breaks up the program into constituent pieces and imposes grammatical structure on
them
 It then uses this structure to create an intermediate representation of the source program
 If the analysis part detects errors (syntactic or semantic), it provides informative messages
 The analysis part also collects information about the source program and stores it in a data structure called
a symbol table, which is passed along with the intermediate representation to the synthesis part
 The synthesis part constructs the desired target program from the intermediate representation and the
information in the symbol table
 The analysis part is often called the front end of the compiler; the synthesis part is the back end
 The compilation process operates as a sequence of phases, each of which transforms one representation of
the source program into another
 A typical decomposition of a compiler into phases is shown in Figure 1.3
 In practice, several phases may be grouped together, and the intermediate representations between grouped
phases need not be constructed explicitly
 The symbol table, which stores information about the entire source program, is used by all phases of the
compiler

Lexical Analysis

 The first phase of a compiler is called lexical analysis or scanning

 The lexical analyzer reads the stream of characters making up the source program and groups them into
meaningful sequences called lexemes
 For each lexeme the lexical analyzer produces a token of the form:
<token-name, attribute-value>
that it passes on to the subsequent phase, syntax analysis
 In the token, the first component, token-name, is an abstract symbol that is used during syntax analysis, and
the second component, attribute-value, points to an entry in the symbol table for this token
 For example, suppose a source program contains the following assignment statement
position = initial + rate * 60 (1.1)

 The characters in this assignment could be grouped into the following lexemes and mapped into the
following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token <id, 1>, where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The symbol table entry holds
information about the identifier, such as its name and type
2. = is a lexeme that is mapped into the token <=>. Since it needs no attribute value, the second
component is omitted
3. initial is a lexeme that would be mapped into a token <id, 2>, where 2 points to the symbol table entry
for initial
4. + is a lexeme that is mapped into the token <+>
5. rate is a lexeme that would be mapped into a token <id, 3>, where 3 points to the symbol table entry
for rate
6. * is a lexeme that is mapped into the token <*>
7. 60 is a lexeme that is mapped into the token <60>
 After lexical analysis, the assignment statement (1.1) is represented as the sequence of tokens

<id, 1><=><id, 2><+><id, 3><*><60> (1.2)

 In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and
multiplication operators, respectively
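To make the grouping of characters into lexemes concrete, here is a minimal hand-written scanner for statements like (1.1). It is a sketch under simplifying assumptions, not a production lexer: identifiers are a letter or underscore followed by letters, digits, or underscores; constants are unsigned integers; only the operators = + - * / are recognized; and the symbol table is a plain list whose 1-based positions serve as attribute values. The function name tokenize is hypothetical.

```python
def tokenize(source):
    """Group the characters of `source` into lexemes; return (tokens, symbol_table).

    Identifiers become ("id", i), where i is the 1-based index of the name's
    symbol-table entry; operators become (op,) with no attribute; integer
    constants become (lexeme,), mirroring <60> in (1.2).
    """
    symbol_table = []  # symbol_table[i - 1] holds the name for <id, i>
    index_of = {}      # name -> 1-based symbol-table index
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace only separates lexemes
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            name = source[start:i]
            if name not in index_of:  # first occurrence: create an entry
                symbol_table.append(name)
                index_of[name] = len(symbol_table)
            tokens.append(("id", index_of[name]))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append((source[start:i],))
        elif ch in "=+-*/":
            tokens.append((ch,))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens, symbol_table
```

Calling tokenize("position = initial + rate * 60") yields the seven tokens of (1.2) and the table ["position", "initial", "rate"]; joining them with `" ".join("<" + ", ".join(map(str, t)) + ">" for t in tokens)` prints them in the same notation as (1.2). A repeated name reuses its existing entry rather than creating a new one.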

Syntax Analysis

 The second phase of a compiler is syntax analysis or parsing

 The parser uses the tokens produced by the lexical analyzer to create a tree-like intermediate representation,
called a syntax tree, that depicts the grammatical structure of the token stream
 In a syntax tree, each interior node represents an operator and the children of the node represent the
arguments of the operation (operands)
 The syntax tree for the token stream (1.2) is shown in Figure 1.4

Figure 1.4: Syntax tree

 This tree shows the order in which the operations in the assignment in (1.1) are to be performed
 Multiplication is done first, followed by addition, and finally assignment
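How a parser arrives at this shape can be sketched with a small recursive-descent parser. It assumes a hypothetical grammar, assignment → id = expr, expr → term {(+|-) term}, term → factor {(*|/) factor}, factor → id | number, in which * and / bind tighter than + and -. Tokens are Python tuples written as in (1.2), and each interior node of the resulting tree is a tuple (operator, left, right); this encoding and the name parse are illustrative assumptions.

```python
def parse(tokens):
    """Build a syntax tree (nested tuples) for a token list of the form: id = expr."""
    if len(tokens) < 3 or tokens[0][0] != "id" or tokens[1] != ("=",):
        raise SyntaxError("expected an assignment of the form: id = expr")
    pos = 2  # index of the next unread token

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None or not (tok[0] == "id" or tok[0].isdigit()):
            raise SyntaxError(f"expected identifier or number, got {tok}")
        return advance()  # leaves of the tree are the tokens themselves

    def term():
        node = factor()
        while peek() in (("*",), ("/",)):
            op = advance()[0]
            node = (op, node, factor())  # left-associative
        return node

    def expr():
        node = term()
        while peek() in (("+",), ("-",)):
            op = advance()[0]
            node = (op, node, term())
        return node

    tree = ("=", tokens[0], expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]}")
    return tree
```

For the tokens of (1.2) the result is ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("60",)))): the * node sits deepest, so it is evaluated first, then +, then =, which is the order described above.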

Semantic Analysis

 The semantic analyzer uses the syntax tree and the information in the symbol table to check the source
program for semantic consistency with the language definition
 It also gathers type information and saves it in either the syntax tree or the symbol table, for subsequent
use during intermediate code generation
 An important part of semantic analysis is type checking, where the compiler checks that each operator has
matching operands; e.g., many programming language definitions require an array index to be an integer, so the
compiler must report an error if a floating-point number is used to index an array
 A language specification may permit type conversions called coercions; e.g., if a binary arithmetic operator is
applied to an integer and a floating-point number, the compiler may convert (coerce) the integer into a floating-point number
 Suppose that position, initial and rate have been declared to be floating-point numbers, and lexeme 60 by
itself forms an integer
 The semantic analyzer therefore converts the integer 60 to a floating-point number before * is applied
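The type check with coercion can be sketched as a recursive walk over the syntax tree. The sketch assumes a hypothetical tuple encoding (leaves ("id", i) and (digits,), interior nodes (op, left, right)), a `types` map from symbol-table index to "int" or "float", and the conversion being recorded as an extra ("inttofloat", ...) node. Rejecting a float assigned to an int variable is an assumed language rule chosen only for illustration; the name type_check is hypothetical.

```python
def type_check(node, types):
    """Return (annotated_node, type) for a syntax-tree node.

    `types` maps a symbol-table index to "int" or "float". Mixed int/float
    arithmetic is handled by wrapping the int operand in an ("inttofloat", ...)
    node, i.e. the coercion is made explicit in the tree.
    """
    if node[0] == "id":
        return node, types[node[1]]
    if node[0].isdigit():  # integer constant such as ("60",)
        return node, "int"
    op, left, right = node
    left, left_type = type_check(left, types)
    right, right_type = type_check(right, types)
    if op == "=":
        if left_type == "int" and right_type == "float":
            # Assumed rule: no implicit float-to-int narrowing on assignment.
            raise TypeError("cannot assign a floating-point value to an integer variable")
        if left_type == "float" and right_type == "int":
            right = ("inttofloat", right)
        return (op, left, right), left_type
    if left_type == right_type:
        return (op, left, right), left_type
    # Mixed operands: coerce the integer side to floating point.
    if left_type == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"
```

With position, initial and rate (indices 1, 2, 3) all declared float, only the * node has mixed operands, so the checker wraps the constant: ("*", ("id", 3), ("inttofloat", ("60",))). Every node above it is then float-typed.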
Intermediate Code Generation

 After syntax and semantic analysis, many compilers generate an explicit low-level or machine-like intermediate
representation, which we can think of as a program for an abstract machine
 This intermediate representation should have two properties: it should be easy to produce, and it should be
easy to translate into the target machine language
 One such intermediate representation, called three-address code, consists of a sequence of assembly-like
instructions with at most three operands per instruction (equivalently, at most one operator on the right side of
an assignment)
 Each operand can act like a register
 The output of the intermediate code generator can consist of the three-address code sequence
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3 (1.3)
 The compiler must also generate a temporary name to hold the value computed by each three-address
instruction
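Generating this sequence amounts to a post-order walk of the type-checked tree that emits one instruction per interior node and invents a fresh temporary t1, t2, ... for each result. The sketch below assumes a hypothetical tuple encoding of the tree (("id", i) and (digits,) leaves, ("inttofloat", x) conversion nodes, and (op, left, right) interior nodes) and writes identifier i as idi, following the naming in (1.3); the function name is hypothetical.

```python
def gen_three_address(tree):
    """Emit three-address code for a typed assignment tree ("=", target, expr).

    Leaves are ("id", i) for identifier i or (digits,) for an integer constant;
    ("inttofloat", x) is a conversion node; other interior nodes are
    (op, left, right).
    """
    code = []
    count = 0

    def new_temp():
        nonlocal count
        count += 1
        return f"t{count}"

    def address(node):
        """Emit code for `node` (operands first) and return where its value lives."""
        if node[0] == "id":
            return f"id{node[1]}"
        if node[0].isdigit():
            return node[0]  # constants are used directly as operands
        if node[0] == "inttofloat":
            arg = address(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        a = address(left)
        b = address(right)
        temp = new_temp()
        code.append(f"{temp} = {a} {op} {b}")
        return temp

    _, target, expr = tree
    value = address(expr)
    code.append(f"{address(target)} = {value}")
    return code
```

Applied to the typed tree ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("60",))))), this sketch produces exactly the four instructions of (1.3). Note that it creates a temporary for every operator, which is why the final copy id1 = t3 appears; removing such redundancy is left to the optimizer.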

Code Optimization

 The machine-independent code-optimization phase attempts to improve the intermediate code so that
better target code will result
 Usually better means faster, but other objectives may be desired, such as shorter code or target code that
consumes less power
 For example, a straightforward algorithm generates the intermediate code (1.3), using one instruction for each
operator in the tree representation that comes from the semantic analyzer
 The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all
at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 with the floating-point number 60.0
 Moreover, t3 is used only once, to transmit its value to id1, so the optimizer can eliminate it and transform (1.3)
into the shorter sequence
t1 = id3 * 60.0
id1 = id2 + t1 (1.4)
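The two improvements just described can be sketched as small passes over a list of instructions. This is a simplified illustration, not a general optimizer, and it rests on several assumptions: instructions are hypothetical (dst, op, arg1, arg2) tuples, a plain copy such as id1 = t3 uses op "copy", temporaries are named t followed by digits and never clash with user variables, and copy elimination only handles a temporary defined by the immediately preceding instruction and used exactly once. A final pass renumbers the temporaries so the output reads like (1.4).

```python
from collections import Counter


def is_temp(name):
    # Assumption: temporaries are named t1, t2, ... and no user variable looks like that.
    return name is not None and name.startswith("t") and name[1:].isdigit()


def optimize(code):
    """Improve a list of (dst, op, arg1, arg2) instructions; op "copy" means dst = arg1."""
    # Pass 1: an int-to-float conversion of an integer literal is done once, at
    # compile time; later uses of its temporary get the float constant instead.
    consts = {}
    folded = []
    for dst, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and a.isdigit():
            consts[dst] = a + ".0"
        else:
            folded.append((dst, op, a, b))

    # Pass 2: "x = t" where t is used only here and was computed by the previous
    # instruction: let that instruction write x directly and drop the copy.
    uses = Counter(x for _, _, a, b in folded for x in (a, b) if is_temp(x))
    result = []
    for dst, op, a, b in folded:
        if op == "copy" and is_temp(a) and uses[a] == 1 and result and result[-1][0] == a:
            result.append((dst,) + result.pop()[1:])
        else:
            result.append((dst, op, a, b))

    # Pass 3: renumber the surviving temporaries t1, t2, ... in order of definition.
    rename = {}
    for dst, *_ in result:
        if is_temp(dst):
            rename[dst] = f"t{len(rename) + 1}"
    return [tuple(rename.get(x, x) for x in instr) for instr in result]
```

Feeding in (1.3) as [("t1", "inttofloat", "60", None), ("t2", "*", "id3", "t1"), ("t3", "+", "id2", "t2"), ("id1", "copy", "t3", None)] returns [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")], which is (1.4). Real optimizers justify such rewrites with data-flow analysis rather than the adjacency check used here.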

Code Generation

 The code generator takes as input an intermediate representation of the source program and maps it into
the target language
 If the target language is machine language, registers or memory locations are selected for each of the
variables used by the program
 Then, the intermediate instructions are translated into sequences of machine instructions to perform the
same task
 A crucial aspect of code generation is the judicious assignment of registers to hold variables
 For example, using registers R1 and R2, the intermediate code in (1.4) might get translated into the
machine code
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1 (1.5)
 The first operand of each instruction specifies a destination
 The F in each instruction tells us that it deals with floating-point numbers
 The code in (1.5) loads the contents of address id3 into register R2, then multiplies it with floating-point
constant 60.0
 The # signifies that 60.0 is to be treated as an immediate constant
 The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in
register R2
 Finally, the value in register R1 is stored into the address id1, so the code correctly implements the
assignment statement (1.1)
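A naive generator for this hypothetical machine can be sketched as follows. It assumes (1.4) is given as (dst, op, arg1, arg2) tuples, uses only the LDF, MULF, ADDF and STF instructions that appear in (1.5), keeps each temporary in a register until it is used, and draws from a pool of just two registers with no spilling. The tuple encoding, the helper names, and the register pool order (chosen so the output matches (1.5)) are all assumptions for illustration; real code generators use far more careful register allocation.

```python
OPCODES = {"*": "MULF", "+": "ADDF"}  # F: floating-point versions, as in (1.5)


def is_temp(name):
    # Assumption: temporaries are named t1, t2, ...
    return name.startswith("t") and name[1:].isdigit()


def is_const(name):
    return name.replace(".", "", 1).isdigit()  # e.g. "60.0"


def gen_machine_code(code):
    """Translate (dst, op, arg1, arg2) instructions into two-register pseudo-assembly."""
    free = ["R1", "R2"]  # pop() hands out R2 first, then R1
    reg_of = {}          # temporary -> register that currently holds it
    asm = []
    for dst, op, a, b in code:
        # Left operand: reuse the register of a temporary, otherwise load from memory.
        if is_temp(a):
            r = reg_of.pop(a)
        else:
            r = free.pop()
            asm.append(f"LDF {r}, {a}")
        # Right operand: a register, an immediate constant (#), or a fresh load.
        if is_temp(b):
            src = reg_of.pop(b)
            free.append(src)  # its value is consumed by this instruction
        elif is_const(b):
            src = f"#{b}"
        else:
            src = free.pop()  # assumes a register is free; no spilling
            asm.append(f"LDF {src}, {b}")
            free.append(src)
        asm.append(f"{OPCODES[op]} {r}, {r}, {src}")
        if is_temp(dst):
            reg_of[dst] = r  # keep the temporary in its register
        else:
            asm.append(f"STF {dst}, {r}")  # program variable: store, then free the register
            free.append(r)
    return asm
```

For [("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")] the sketch emits the five instructions of (1.5) in order. The temporary t1 never touches memory: it lives in R2 from the MULF until the ADDF consumes it.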

Symbol-Table Management

 An essential function of a compiler is to record the variable names used in the source program and collect
information about various attributes of each name
 These attributes may provide information about the storage allocated for a name, its type, its scope (where
in the program its value may be used), and, in the case of procedure names, such things as the number
and types of its arguments, the method of passing each argument (e.g., by value or by reference), and the
type returned
 The symbol table is a data structure containing a record for each variable name, with fields for attributes of
the name
 The data structure should be designed to allow the compiler to find the record for each name quickly and
to store or retrieve data from that record quickly
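One common way to meet these requirements is a hash table keyed by name, which gives fast average-case lookup. The sketch below is a minimal illustration of that choice, not a complete design: it has a single scope, numbers entries from 1 so that the index can serve as the attribute value in a token such as <id, 1>, and stores any further attributes (scope, storage, parameters) in a free-form dictionary. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SymbolEntry:
    name: str
    type_name: Optional[str] = None
    attrs: dict = field(default_factory=dict)  # storage, scope, parameters, ...


class SymbolTable:
    """One record per name, found through a hash map so lookups are fast."""

    def __init__(self):
        self._entries = []  # self._entries[i - 1] is the record for index i
        self._index = {}    # name -> 1-based index (the attribute in <id, i>)

    def insert(self, name, type_name=None, **attrs):
        """Return the index for `name`, creating a record on first use."""
        if name not in self._index:
            self._entries.append(SymbolEntry(name))
            self._index[name] = len(self._entries)
        entry = self[self._index[name]]
        if type_name is not None:
            entry.type_name = type_name
        entry.attrs.update(attrs)  # add or overwrite attributes
        return self._index[name]

    def lookup(self, name):
        """Return the record for `name`, or None if it has not been entered."""
        i = self._index.get(name)
        return None if i is None else self._entries[i - 1]

    def __getitem__(self, index):
        return self._entries[index - 1]
```

For example, inserting position, initial and rate returns indices 1, 2 and 3, matching <id, 1>, <id, 2> and <id, 3> in (1.2); a procedure could be recorded with something like `table.insert("max", type_name="int", params=["int", "int"], passing=["value", "value"])`, where the attribute names are again only illustrative.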

Cousins of Compiler

 An interpreter is another common kind of language processor. Instead of producing a target program as
a translation, an interpreter appears to directly execute the operations specified in the source program on
inputs supplied by the user
 The machine-language target produced by a compiler is usually much faster than an interpreter at mapping
inputs to outputs
 An interpreter can usually give better error diagnostics than a compiler, because it executes the source
program statement by statement
 Several other programs may be needed in addition to a compiler to create an executable program as shown
in Figure 1.2.
 The task of a preprocessor (a separate program) is to collect the modules of a program that are stored in separate files
 It may also expand shorthands, called macros, into source-language statements
 The modified source program is fed to a compiler
 The compiler may produce an assembly-language program as its output, because assembly language is
easier to produce as an output and easier to debug
 The assembly-language program is then processed by a program called an assembler, which produces
relocatable machine code as its output
 Large programs are often compiled in pieces, so the relocatable machine code may have to be linked
with other relocatable object files and library files into the code that actually runs on the machine
 The linker resolves external memory addresses, where the code in one file may refer to a location in
another file
 The loader then puts together all executable object files into memory for execution
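For contrast with the compiler pipeline, a tree-walking interpreter can be sketched in a few lines: it produces no target program and simply performs each operation in the syntax tree as it reaches it, using input values supplied at run time. The tuple encoding of the tree (variable names as strings, constants as numbers) and the function names evaluate and execute are assumptions for illustration, not how any particular interpreter is built.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(node, env):
    """Compute the value of an expression tree, reading variables from env."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        if node not in env:
            raise NameError(f"variable {node!r} has no value")
        return env[node]
    op, left, right = node
    return OPS[op](evaluate(left, env), evaluate(right, env))


def execute(stmt, env):
    """Carry out an assignment ("=", name, expr) directly by updating env."""
    op, target, expr = stmt
    if op != "=":
        raise SyntaxError(f"unsupported statement {op!r}")
    env[target] = evaluate(expr, env)
```

With the statement ("=", "position", ("+", "initial", ("*", "rate", 60))) and sample inputs initial = 10.0 and rate = 2.5, execute stores 160.0 in position. The tree is re-walked every time the statement runs, which is one reason compiled machine code is usually faster; on the other hand, an error such as a missing value is detected at the exact statement being executed.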

The Grouping of Phases

 The discussion of phases deals with the logical organization of a compiler
 In an implementation, activities from several phases may be grouped together into a pass that reads an
input file and writes an output file
 For example, the front-end phases of lexical analysis, syntax analysis, semantic analysis, and intermediate
code generation might be grouped together into one pass
 Code optimization may be an optional pass
 Then there could be a back-end pass consisting of code generation for a particular target machine

Compiler Construction Tools

 Some commonly used compiler construction tools include:
1. Scanner generators that produce lexical analyzers from a regular-expression description of the tokens
of the language
2. Parser generators that automatically produce syntax analyzers from a grammatical description of a
programming language
3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and
generating intermediate code
4. Code generator generators that produce a code generator from a collection of rules for translating each
operation of the intermediate language into the machine language for a target machine
5. Data-flow analysis engines that facilitate the gathering of information about how values are
transmitted from one part of a program to each other part. Data-flow analysis is a key part of code
optimization
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various
phases of a compiler
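The idea behind a scanner generator, building the lexical analyzer from a regular-expression description instead of writing it by hand, can be sketched with Python's standard re module. This is only an analogy under stated assumptions: tools such as Lex or Flex compile the expressions into a finite automaton and choose the longest match, whereas Python's regex alternation takes the first alternative that matches, so the order of the specification list matters here. The names make_scanner and SPEC and the token names in SPEC are hypothetical.

```python
import re

# Hypothetical token specification: (token-name, regular expression); None = skip.
SPEC = [
    (None, r"\s+"),
    ("number", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/]"),
]


def make_scanner(spec):
    """Build a scanning function from an ordered list of (name, regex) pairs."""
    # One named group per rule (T0, T1, ...) so a match can be traced back to its rule.
    master = re.compile("|".join(f"(?P<T{i}>{rx})" for i, (_, rx) in enumerate(spec)))

    def scan(text):
        tokens, pos = [], 0
        while pos < len(text):
            m = master.match(text, pos)
            if m is None or m.end() == pos:
                raise SyntaxError(f"no token matches at position {pos}")
            name = spec[int(m.lastgroup[1:])][0]
            if name is not None:  # rules named None (whitespace) are skipped
                tokens.append((name, m.group()))
            pos = m.end()
        return tokens

    return scan
```

Here `make_scanner(SPEC)("position = initial + rate * 60")` returns pairs of token name and lexeme, starting with ("id", "position") and ending with ("number", "60"). Changing the language's tokens means editing SPEC, not the scanning code, which is the convenience a scanner generator provides.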
