
Chapter 1 (Introduction)


Introduction

A compiler is a program that can read a program in one language - the source language - and translate it into an equivalent program in another language - the target language; see Fig. 1.1.

An important role of the compiler is to report any errors in the source
program that it detects during the translation process.

Compiler
• If the target program is an executable machine-language program, it
can then be called by the user to process inputs and produce outputs;
see Fig. 1.2.

Interpreter
An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user, as shown in Fig. 1.3.

Interpreter vs. Compiler

The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.

Language Processors
Example 1.1: Java language processors combine compilation and interpretation, as shown in Fig. 1.4. A Java source program may first be compiled into an intermediate form called bytecodes. The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted on another machine, perhaps across a network.

In order to achieve faster processing of inputs to outputs, some Java compilers,
called just-in-time compilers, translate the bytecodes into machine language
immediately before they run the intermediate program to process the input.
Pre-processor

In addition to a compiler, several other programs may be required to create an executable target program, as shown in Fig. 1.5.

A source program may be divided into modules stored in separate files. The task of collecting the source program is sometimes entrusted to a separate program, called a pre-processor.

Compiler

• The modified source program is then fed to a compiler. The compiler may produce an assembly-language program as its output, because assembly language is easier to produce as output and is easier to debug.

• The assembly language is then processed by a program called an assembler that produces relocatable machine code as its output.

Linker
• Large programs are often compiled in pieces, so the relocatable machine
code may have to be linked together with other relocatable object files and
library files into the code that actually runs on the machine.

• The linker resolves external memory addresses, where the code in one file
may refer to a location in another file.

• The loader then puts together all of the executable object files into
memory for execution.

Language Processing
Structure of a Compiler

• The analysis part breaks up the source program into constituent pieces and
imposes a grammatical structure on them. It then uses this structure to
create an intermediate representation of the source program.

• If the analysis part detects that the source program is either syntactically ill-formed or semantically unsound, then it must provide informative messages, so the user can take corrective action.

Structure of a Compiler
• The analysis part also collects information about the source program and
stores it in a data structure called a symbol table, which is passed along
with the intermediate representation to the synthesis part.

• The synthesis part constructs the desired target program from the
intermediate representation and the information in the symbol table.

• The analysis part is often called the front end of the compiler; the synthesis
part is the back end.

Structure of a Compiler
If we examine the compilation process in more detail, we see that it operates as a sequence of phases, each of which transforms one representation of the source program to another. A typical decomposition of a compiler into phases is shown in Fig. 1.6.
In practice, several phases may be grouped together, and the intermediate
representations between the grouped phases need not be constructed explicitly.
The symbol table, which stores information about the entire source program, is
used by all phases of the compiler.

Lexical Analysis

 The first phase of a compiler is called lexical analysis or scanning. The lexical analyser reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes.

 For each lexeme, the lexical analyser produces as output a token of the form:
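The token's pair form, reconstructed from the description on the next slide, is:

```
⟨token-name, attribute-value⟩
```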

Lexical Analysis

 In the token, the first component token-name is an abstract symbol that is used during syntax analysis, and the second component attribute-value points to an entry in the symbol table for this token.

 Information from the symbol-table entry is needed for semantic analysis and code generation.

 For example, suppose a source program contains the assignment statement:
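Reconstructed from the lexemes enumerated on the following slides, the assignment statement (1.1) is:

```
position = initial + rate * 60
```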

Lexical Analysis
• The characters in this assignment could be grouped into the
following lexemes and mapped into the following tokens passed on to the
syntax analyser:

1. position is a lexeme that would be mapped into a token ⟨id, 1⟩, where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.

Lexical Analysis

2. The assignment symbol = is a lexeme that is mapped into the token ⟨=⟩. Since this token needs no attribute value, we have omitted the second component. We could have used any abstract symbol such as assign for the token name, but for notational convenience we have chosen to use the lexeme itself as the name of the abstract symbol.

3. initial is a lexeme that is mapped into the token ⟨id, 2⟩, where 2 points to the symbol-table entry for initial.

Lexical Analysis

4. + is a lexeme that is mapped into the token ⟨+⟩.

5. rate is a lexeme that is mapped into the token ⟨id, 3⟩, where 3 points to the symbol-table entry for rate.

6. * is a lexeme that is mapped into the token ⟨*⟩.

7. 60 is a lexeme that is mapped into the token ⟨60⟩.

Blanks separating the lexemes would be discarded by the lexical analyser.
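As a rough illustration of this grouping, here is a minimal Python scanner for the statement above. It is a sketch under simplifying assumptions — only identifiers, integer literals, and the operators = + - * / are recognized — and is not the book's implementation:

```python
import re

def tokenize(source):
    """Group characters into lexemes and map them to tokens."""
    symbol_table = []   # identifier lexemes; list position + 1 is the attribute value
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            # Operators and numbers: use the lexeme itself as the token name,
            # with no attribute value. Blanks never reach this point because
            # the regular expression simply skips over them.
            tokens.append((lexeme,))
    return tokens, symbol_table

tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=',), ('id', 2), ('+',), ('id', 3), ('*',), ('60',)]
```

Note how position, initial, and rate receive symbol-table indices 1, 2, and 3, matching the mapping in items 1-7 above.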

Lexical Analysis

 Figure 1.7 shows the representation of the assignment statement (1.1) after lexical analysis as the sequence of tokens:
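Reconstructed from the lexeme-by-lexeme mapping above, the token sequence (1.2) is:

```
⟨id, 1⟩ ⟨=⟩ ⟨id, 2⟩ ⟨+⟩ ⟨id, 3⟩ ⟨*⟩ ⟨60⟩
```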

 In this representation, the token names =, +, and * are abstract symbols for the assignment, addition, and multiplication operators, respectively.

Syntax Analysis
 The second phase of the compiler is syntax analysis or parsing. The parser
uses the first components of the tokens produced by the lexical analyser to
create a tree-like intermediate representation that depicts the grammatical
structure of the token stream.
 A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation.
 A syntax tree for the token stream (1.2) is shown as the output of the
syntactic analyser in Fig. 1.7.
Syntax Analysis
 This tree shows the order in which the operations in the assignment are to be
performed.

 The tree has an interior node labelled * with ⟨id, 3⟩ as its left child and the integer 60 as its right child. The node ⟨id, 3⟩ represents the identifier rate. The node labelled * makes it explicit that we must first multiply the value of rate by 60.
 The node labelled + indicates that we must add the result of this multiplication to
the value of initial.
Syntax Analysis

• The root of the tree, labelled =, indicates that we must store the
result of this addition into the location for the identifier position.

• This ordering of operations is consistent with the usual conventions of arithmetic, which tell us that multiplication has higher precedence than addition, and hence that the multiplication is to be performed before the addition.
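The tree just described can be sketched as nested Python tuples. The (operator, left, right) encoding is an illustrative choice, not the book's data structure:

```python
# Syntax tree for position = initial + rate * 60, as (operator, left, right)
# tuples; leaves are ('id', n) symbol-table references or integer constants.
tree = ('=',
        ('id', 1),                        # position
        ('+',
         ('id', 2),                       # initial
         ('*', ('id', 3), 60)))           # rate * 60

def postorder(node):
    """Yield operators bottom-up: children are visited before their parent."""
    if isinstance(node, tuple) and node[0] in ('=', '+', '*'):
        yield from postorder(node[1])
        yield from postorder(node[2])
        yield node[0]

print(list(postorder(tree)))   # ['*', '+', '=']
```

The multiplication comes out first in the postorder walk, which is exactly the evaluation order the precedence argument above demands.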

Syntax Analysis

 The subsequent phases of the compiler use the grammatical structure to help analyse the source program and generate the target program.

 We shall use context-free grammars to specify the grammatical structure of programming languages and discuss algorithms for constructing efficient syntax analysers automatically from certain classes of grammars.

Semantic Analysis
 The semantic analyser uses the syntax tree and the information in the symbol table to
check the source program for semantic consistency with the language definition.

 It also gathers type information and saves it in either the syntax tree or the symbol table,
for subsequent use during intermediate-code generation.

 An important part of semantic analysis is type checking, where the compiler checks that
each operator has matching operands. For example, many programming language
definitions require an array index to be an integer; the compiler must report an error if a
floating-point number is used to index an array.

Semantic Analysis
• Such a coercion appears in Fig. 1.7. Suppose that position, initial, and rate have been declared to be floating-point numbers, and that the lexeme 60 by itself forms an integer.

• The type checker in the semantic analyser in Fig. 1.7 discovers that the operator * is applied to a floating-point number rate and an integer 60. In this case, the integer may be converted into a floating-point number.

• In Fig. 1.7, notice that the output of the semantic analyser has an extra node for the operator inttofloat, which explicitly converts its integer argument into a floating-point number.
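The coercion step can be sketched as a small Python check. The node shapes and type names are illustrative assumptions carried over from the tuple-based tree used earlier, not the book's code:

```python
# If an operator mixes a float operand with an int operand, wrap the int
# side in an explicit inttofloat conversion node, as the semantic analyser
# does for rate * 60.
def coerce(op, left, right, types):
    lt, rt = types[left], types[right]
    if lt == 'float' and rt == 'int':
        right = ('inttofloat', right)
    elif lt == 'int' and rt == 'float':
        left = ('inttofloat', left)
    return (op, left, right)

types = {('id', 3): 'float', 60: 'int'}   # rate is declared float; 60 is an int literal
print(coerce('*', ('id', 3), 60, types))
# ('*', ('id', 3), ('inttofloat', 60))
```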
Intermediate Code Generation

• In the process of translating a source program into target code, a compiler may construct one or more intermediate representations, which can have a variety of forms.

• Syntax trees are a form of intermediate representation; they are commonly used during syntax and semantic analysis.

Intermediate Code Generation

 After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation,
which we can think of as a program for an abstract machine.

 This intermediate representation should have two important properties: it should be easy to produce and it should be easy to translate into the target machine.

Intermediate Code Generation
• The output of the intermediate code generator in Fig. 1.7 consists of the three-address code sequence:
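Reconstructed from the points noted in the next paragraph (the temporary names t1, t2, t3 follow the usual convention), the sequence (1.3) is:

```
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
```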

• There are several points worth noting about three-address instructions. First, each three-address assignment instruction has at most one operator on the right side. Thus, these instructions fix the order in which operations are to be done; the multiplication precedes the addition in the source program. Second, the compiler must generate a temporary name to hold the value computed by a three-address instruction. Third, some "three-address instructions" like the first and last in the sequence (1.3), above, have fewer than three operands.
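The translation from tree to three-address code can be sketched as a postorder walk that emits one instruction per interior node and a fresh temporary for each computed value. This is an illustrative generator over the tuple-based tree used earlier, not the book's algorithm:

```python
# Hypothetical three-address code generator: walk the coerced syntax tree
# bottom-up, emitting one instruction per operator and a fresh temporary
# name (t1, t2, ...) for each intermediate result.
def gen_code(tree):
    code, counter = [], 0
    def walk(node):
        nonlocal counter
        if not isinstance(node, tuple):
            return str(node)                 # integer constant
        if node[0] == 'id':
            return f"id{node[1]}"            # symbol-table reference
        if node[0] == 'inttofloat':
            arg = walk(node[1])
            counter += 1
            code.append(f"t{counter} = inttofloat({arg})")
            return f"t{counter}"
        op, left, right = node
        l, r = walk(left), walk(right)
        if op == '=':
            code.append(f"{l} = {r}")        # store: no temporary needed
            return l
        counter += 1
        code.append(f"t{counter} = {l} {op} {r}")
        return f"t{counter}"
    walk(tree)
    return code

tree = ('=', ('id', 1),
        ('+', ('id', 2), ('*', ('id', 3), ('inttofloat', 60))))
for line in gen_code(tree):
    print(line)
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```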
Code Optimization
• The machine-independent code-optimization phase attempts to improve
the intermediate code so that better target code will result. Usually better
means faster, but other objectives may be desired, such as shorter code, or
target code that consumes less power.

• For example, a straightforward algorithm generates the intermediate code (1.3), using an instruction for each operator in the tree representation that comes from the semantic analyser.

Code Optimization
• A simple intermediate code generation algorithm followed by code
optimization is a reasonable way to generate good target code.

• The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform (1.3) into the shorter sequence:
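With the inttofloat operation eliminated and t3 removed as just described, the shorter sequence (1.4) is:

```
t1 = id3 * 60.0
id1 = id2 + t1
```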

Code Optimization

• There is a great variation in the amount of code optimization different compilers perform. In those that do the most, the so-called "optimizing compilers," a significant amount of time is spent on this phase.

• There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.

Code Generation
• The code generator takes as input an intermediate representation of the source program and maps it into the target language. If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine
instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers to
hold variables.

Code Generation

• For example, using registers R1 and R2, the intermediate code in (1.4) might get translated into the machine code:
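Reconstructed from the instruction-by-instruction description on the following slide, the machine code (1.5) is:

```
LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1
```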

• The first operand of each instruction specifies a destination. The F in each instruction tells us that it deals with floating-point numbers.
Code Generation
 The code in (1.5) loads the contents of address id3 into register R2, then multiplies it with floating-point constant 60.0.

 The # signifies that 60.0 is to be treated as an immediate constant. The third instruction moves id2 into register R1 and the fourth adds to it the value previously computed in register R2.

 Finally, the value in register R1 is stored into the address of id1, so the code correctly implements the assignment statement (1.1).

Symbol-Table Management
• An essential function of a compiler is to record the variable names used in
the source program and collect information about various attributes of
each name.

• These attributes may provide information about the storage allocated for a
name, its type, its scope (where in the program its value may be used), and
in the case of procedure names, such things as the number and types of its
arguments, the method of passing each argument (for example, by value or
by reference), and the type returned.
Symbol-Table Management

• The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name.

• The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
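A dictionary keyed by name gives the fast find/store/retrieve behaviour described above. This is a deliberately simplified single-scope sketch, and the attribute names (type, scope, storage) are illustrative:

```python
# One record (a dict of attributes) per variable name; hashed lookup makes
# both finding the record and updating it fast.
symbol_table = {}

def insert(name, **attrs):
    """Create the record for name if needed, then add or update attributes."""
    symbol_table.setdefault(name, {}).update(attrs)

def lookup(name):
    """Return the record for name, or None if it was never entered."""
    return symbol_table.get(name)

insert("rate", type="float", scope="global")
insert("rate", storage="8 bytes")          # later phases add more attributes
print(lookup("rate"))
# {'type': 'float', 'scope': 'global', 'storage': '8 bytes'}
```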

Compiler-Construction Tools

1. Parser generators that automatically produce syntax analysers from a grammatical description of a programming language.

2. Scanner generators that produce lexical analysers from a regular-expression description of the tokens of a language.

3. Syntax-directed translation engines that produce collections of routines for walking a parse tree and generating intermediate code.

Compiler-Construction Tools
4. Code-generator generators that produce a code generator from a collection of
rules for translating each operation of the intermediate language into the
machine language for a target machine.
5. Data-flow analysis engines that facilitate the gathering of information about how
values are transmitted from one part of a program to each other part. Data-flow
analysis is a key part of code optimization.
6. Compiler-construction toolkits that provide an integrated set of routines for constructing various phases of a compiler.

Program Translations

• Binary Translation: Compiler technology can be used to translate the binary code for one machine to that of another, allowing a machine to run programs originally compiled for another instruction set.

• Binary translation technology has been used by various computer companies to increase the availability of software for their machines.

Type Checking
• Type checking is an effective and well-established technique to catch
inconsistencies in programs.
• It can be used to catch errors, for example, where an operation is applied
to the wrong type of object, or if parameters passed to a procedure do not
match the signature of the procedure.
• Program analysis can go beyond finding type errors by analysing the flow
of data through a program.
• For example, if a pointer is assigned null and then immediately
dereferenced, the program is clearly in error.
Bounds Checking

• It is easier to make mistakes when programming in a lower-level language than a higher-level one. For example, many security breaches in systems are caused by buffer overflows in programs written in C.

• Because C does not have array bounds checks, it is up to the user to ensure that arrays are not accessed out of bounds. If a program fails to check that user-supplied data fits in a buffer, it may be tricked into storing that data outside the buffer.
Environments and States

Another important distinction we must make when discussing programming languages is whether changes occurring as the program runs affect the values of data elements or affect the interpretation of names for that data.

Environments and States

 Consider the C program fragment in Fig. 1.9. Integer i is declared a global variable, and also declared as a variable local to function f.

 When f is executing, the environment adjusts so that name i refers to the location reserved for the local i, and any use of i, such as the assignment i = 3 shown explicitly, refers to that location. Typically, the local i is given a place on the run-time stack.
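This distinction can be sketched in Python with two mappings: the environment maps names to storage locations, and the state maps locations to values. The location labels here are made up for illustration:

```python
environment = {'i': 'global_loc'}      # name -> location
state = {'global_loc': 0}              # location -> value

# Entering f: the environment changes so that i now denotes a stack slot.
environment['i'] = 'stack_loc'
state['stack_loc'] = None

# The assignment i = 3 updates the state at i's *current* location;
# the global i's value is untouched, because only the environment moved.
state[environment['i']] = 3

print(state)   # {'global_loc': 0, 'stack_loc': 3}
```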

