SPCC Resource Book
SYSTEM PROGRAMMING
AND
COMPILER CONSTRUCTION
(TE- SEM-VI)
TCET, Mumbai
PREFACE
System Programs and compilers are an essential part of any computer system. Similarly, a course on
System Programming and Compiler Construction is an essential part of any computer-science education.
This field is undergoing rapid change, as computers are now prevalent in virtually every application. Yet
the fundamentals remain the same. The goal of this resource book is to provide TCET students with a
clear description of the concepts that underlie different system software. As prerequisites, we assume
that the reader is familiar with basic data structures, computer organization, theory of computation and a
high-level language such as C or Java. At Mumbai University, System Programming and Compiler
Construction is introduced as a subject in the third year (Sem VI). The objective of this course is to make
the learner appreciate the role of different system software. The resource book is intended for an
undergraduate course in System Programming and Compiler Construction. In this preface, we guide the
reader in the use of the book by briefly summarizing each chapter.
This module differentiates among various parsing techniques and grammar transformations. It also deals
with the concept of Syntax Directed Definition (SDD) and Syntax Directed Translation (SDT).
EXAM SPECIFIC
1. All modules are equally important. Emphasis may be given to modules 1, 2, 3 and 4 from the
examination point of view, because they cover almost 60% of the university question paper, which
includes theory and system-design problems.
2. Neat, labeled diagrams should be drawn as required by the questions in the question paper.
3. Read the question paper thoroughly first, then choose the five questions. Attempt the one that you know
best first, but do not change the internal sequence of the sub-questions.
4. Minimum passing marks: 30/75 in the theory paper and 10/25 in term work.
5. For further clarification/doubts in the subject, students can contact the subject teacher.
Theory:
Write content as per marks distribution.
Highlight the main points.
Write examples for the topic asked in the question.
Write necessary content related to the point.
Draw neat and labeled diagrams wherever necessary.
While writing distinguishing points, write double the number of points as per the marks given,
excluding the example.
Numerical:
Important steps should be written as they carry stepwise marks. The steps are as follows:
Given data
Diagrams (wherever applicable)
Formula
Substitution
Calculation
Answer with proper units
Derivation:
Important steps to be followed while attempting questions involving derivations:
Write statement of theorem / proof.
Mention necessary assumptions to be considered in the derivation.
Draw neat and labeled diagrams wherever required.
Define the variables which are being used.
Mention the formula which is being used.
Write stepwise formulations and necessary substitutions.
Highlight the equation or formula proved in the last step.
Note: To get better results, we recommend that the quality of answers be as per those given in
university sample answers.
Syllabus Detailing
Keywords Used for framing of Learning Objectives and its Case Study (Sample)
A. Keywords
Sr. No. | Remembering | Understanding | Applying | Analyzing | Evaluating | Creating
1 | Label | Compare | Change | Conclude | Choose | Arrange
2 | List | Explain | Use | Deduce | Test | Collect
3 | Select | Illustrate | Show | Question | Revise | Modify
4 | Name | Classify | Complete | Illustrate | Evaluate | Rewrite
5 | State | Derive | Calculate | Outline | Determine | Create
6 | Write | Differentiate | Classify | Identify | Contrast | Construct
7 | Read | Discuss | Illustrate | Revise | |
Course Scheme
Course Objective: The Objective of this course is to compare the role and functioning of various system
programs over application program, understand the role of various system programs from program
development to program execution and design of Assemblers, Macro processor, Linker, Loader, Compiler.
Course Outcomes: Upon completion of the course students will be able to:
4. Compare different language translators: compiler, interpreter and assembler, with the help of examples. (E)
5. Use different system software for program development. (AN)
6. Illustrate the working of different system software. (A)
Students Evaluation:
1. Theory questions to be asked on system software and application software (programs such as web
services, web browsers, spreadsheets, library management systems, etc.).
2. Lab experiments: case study on different system and application software.
3. Corresponding viva questions can be asked on different system and application software.
Module 2: Assemblers and Macro Processors (Hours - 10)
Chapter 2 – Assemblers: Elements of Assembly Language Programming, Basic Assembler functions,
Design of the Assembler, Types of Assemblers, Two-pass assembler – IBM 360/370, Format of
databases, Algorithm, Single-pass Assembler for Intel x86. Macro Processors: Macros, Basic Functions
of a Macro Processor, Features of the Macro Facility, Design of a Two-pass Macro Processor, Format of
Databases and Algorithm.
Purpose: To make students learn the various elements of assembly language programming; to study the
design of single-pass and multi-pass assemblers; to learn about the design and use of two-pass IBM
360/370 assemblers. This chapter also focuses on how to use macros & procedures in programs, the
differentiation between macros & procedures/functions/subroutines, comprehending the definition and
expansion of macro instructions, and the design & implementation of a macro processor.
Scope:
1. Academic Aspect: Explore the data structures, databases, algorithms and flowcharts of the single-pass
and two-pass assembler. Explore the data structures, databases, algorithms and flowchart of the two-pass
macro processor.
2. Technology Aspect: An assembler is a language translator which translates an assembly language
program to machine code. Design and development of a single-pass assembler for the x86 machine and a
two-pass assembler for the IBM 360/370 machine. Design and development of a two-pass macro
processor for the IBM 360/370 assembler.
3. Application Aspect: Role of the assembler as a language translator. Write macros as and when required
to increase the readability and productivity of programs.
Students Evaluation:
1. Subjective questions on the functions of assemblers, macro processors and macro facilities.
2. Listing the database formats.
Learning Objectives:
1. To compare macros and procedures and specify when to use a macro vs. a procedure. (E)
2. To identify macros as and when required to increase the readability and productivity of programs. (AN)
3. To design and develop a two-pass macro processor with the help of databases, and illustrate the working
of the two-pass macro processor. (C)
4. To design and develop a two-pass assembler for the IBM 360/370 machine. (C)
5. Outline of a single-pass assembler for the x86 machine. (AN)
6. Illustrate the working of single-pass and two-pass assemblers. (A)
5. Illustrate and distinguish between compiler, interpreter and assembler. (U)
6. Summarize the working of a compiler with the help of an example and specify the output of each
phase. (AN)
Module 5: Parsing (Hours - 12)
Chapter – Syntax Analysis: The Role of the Parser; Top-down parsing – Predictive parsers (LL);
Bottom-up parsing – Operator precedence parsing, SLR, LR(1), LALR; automatic construction of
parsers using YACC. Introduction to Semantic Analysis: need for semantic analysis, type checking and
type conversion.
Purpose: This chapter covers the role of the parser in compiler design and the different top-down and
bottom-up parsing techniques.
Scope:
1. Academic Aspect: To check the efficiency of bottom-up parsers over top-down parsers. To understand
the role of the parser in compiler design.
2. Technology Aspect: Design and development of various top-down and bottom-up parsing techniques.
3. Application Aspect: Construct the parse tree for the sequence of tokens generated by the lexical
analyzer.
Student Evaluation:
1. Problems on top-down and bottom-up parsing techniques, such as LL(1), LR(0), LR(1), LALR and
operator precedence parsers.
2. Mini project: top-down and bottom-up parsing techniques.
3. Lab experiments based on FIRST() and FOLLOW() sets and left-recursive and left-factored grammars.
4. GATE questions based on parsers and the difference between top-down and bottom-up parsing
techniques.
5. Theory and viva questions based on the Java compiler environment and the LEX and YACC tools.
Learning Objectives:
1. Define Context Free Grammar, describe the structure of a YACC specification, and apply the YACC
compiler for automatic generation of a parser generator. (U)
3. Describe the role of the parser in the compilation process. Explain different top-down and bottom-up
parsing techniques. (E)
4. Specify various parsing techniques to design new language structures with the help of grammars. (C)
5. Explain the construction and role of the syntax tree in the context of the parse tree. (U)
6. Distinguish between parse tree, syntax tree and DAG for graphical representation of the source
program. (U)
Student Evaluation:
1. Theory and viva questions on machine-dependent and machine-independent code optimization
techniques.
2. Lab experiments based on code optimization techniques.
3. GATE questions on code optimization techniques.
CO and PO Mapping
CO1 √ √ √ √ √ √ √ √ √ √
CO2 √ √ √ √ √ √ √ √ √ √ √
CO3 √ √ √ √ √ √ √ √
CO4 √ √ √ √ √ √ √ √ √ √
CO5 √ √ √ √ √ √ √ √ √
CO6 √ √ √ √ √ √ √ √ √ √ √
LL-> LL 1 LL 2 LL 3 LL 4 LL 5 LL 6
Module-I √ √ √ √ √ √
Module-II √ √ √ √ √
Module-III √ √ √ √ √
Module-IV √ √ √ √ √
Module-V √ √ √ √ √
Module-VI √ √ √ √ √ √
List of Experiments
Sr. No | Experiment Name | Resources/Tools/Technology to be used | Learning Objectives | Learning Outcomes
INDEX
Module Contents Page number
1 Overview of System Software
• Introduction 1-53
• Assemblers
• Loaders
• Linkers
• Macro processors
• Compilers
• Interpreters
• Operating systems
• Device drivers
• Objective Questions with answers
• Subjective Questions with answers
• University Questions with answers
5 Parsing
• Role of Parser 141-160
• Top-down parsing
• Recursive descent and predictive parsers (LL)
• Bottom-Up parsing, Operator precedence parsing
• LR, SLR and LALR parsers
• Syntax directed definitions
• Inherited and Synthesized attributes
• Evaluation order for SDDs
• S-attributed definitions
• L-attributed definitions
• YACC compiler-compiler
• Objective Questions with answers
• Subjective Questions with answers
• University Questions with answers
MODULE 1
Overview of System Software
1.1 Motivation:
To provide the students with the knowledge of various system software.
1.2 Learning Objective:
• To learn various system software such as assemblers, loaders, linkers, macro processors,
compilers, interpreters, operating systems and device drivers.
• To differentiate between application software & system software.
1.3 Syllabus:
1.5 Definitions
Compiler: a translator (program) which converts a source program (written in C, C++, etc.)
into machine code.
Assembler: a translator (program) which converts a source program (written in assembly language)
into machine code.
Interpreter: a translator (program) which converts a source program line by line into
intermediate code, which it then executes.
Operating System: a program which provides an interface, or communication, between
hardware & software.
Loader: a program routine that copies a program into memory for execution.
Linker: a program that combines object modules to form an executable program.
Macro Processor: a program which is responsible for processing macros.
Device Driver: a program that controls a particular type of device that is attached to
your computer.
Software: a program; a program is a group of instructions.
Lecture 1 & 2
1.7.1 Types of Software:
Assembler:
An assembler is a program that accepts as input an assembly language program &
produces its machine language equivalent along with information for the loader.
(Figure: the assembler, consulting its databases, takes an ALP as input and produces its machine
language equivalent plus other information for the loader.)
Compiler:
A compiler is a program that reads an input in HLL & translates it into an equivalent
program in machine language.
(Figure: the compiler takes a source program, e.g. in C or C++, and produces machine language,
reporting error messages.)
Interpreter:
An interpreter is a translator that reads a program written in a HLL & translates it line by
line into an intermediate form, which it then executes.
Loader:
Loader is a system program which is responsible for preparing the object program for execution
& initiates the execution. OR
A program routine that copies a program into memory for execution.
OR
Operating system utilities that copy programs from a storage device to main memory, where they
can be executed. In addition to copying a program into main memory, the loader can also replace
virtual addresses with physical addresses.
Function of Loader:
a) Allocation
b) Linking
c) Relocation
d) Loading
Types of Loaders:
a) Assemble ( Compile) & go loader
b) Absolute loader
c) Bootstrap loader
d) Direct linking loader
Linker:
Linker is a program that combines object modules to form an executable program.
Also called Link editor & binder.
Many programming languages allow you to write different pieces of code, called modules,
separately.
In addition to putting all the modules together, a linker also replaces symbolic addresses with real
addresses. Therefore, you need to link a program even if it contains only one module.
Operating System:
Operating system is system software, consisting of programs and data, that runs on a computer and
manages the computer hardware.
It is an integrated set of programs that controls the resources of a computer system and provides its
users with an interface that is easier to use.
Objective of OS:
-Make a computer system easier to use
-Manage the resources of a computer system
Function of OS
a) Process management: It takes care of creation and deletion of processes.
b) File management: This module takes care of creation, organization, storage, naming, sharing, backup
and protection of different files.
(5) Scheduling: The OS also establishes and enforces process priority. That is, it determines and
maintains the order in which jobs are to be executed by the computer system: the most important
job is executed first, followed by less important jobs.
(6)Security management: This module of the OS ensures data security and integrity.
That is, it protects data and program from destruction and unauthorized access. It keeps different
programs and data which are executing concurrently in the memory in such a manner that they do
not interfere with each other.
(7) Processor management: The OS assigns processors to the different tasks that must be performed by
the computer system. If the computer has more than one processor and one is idle, one of the processes
waiting to be executed is assigned to the idle processor.
The OS also maintains an internal time clock and a log of system usage for all users, and it creates
error messages and provides debugging and error-detecting aids for correcting programs.
Types of OS:
1. Multiprogramming: More than one job resides in memory & all are ready for execution.
2. Multiprocessing: is the simultaneous execution of two or more processes by a computer
system having more than one CPU.
3. Multitasking: Switch from one task to another in small fraction of time.
Technically it is the same as multiprogramming, but the term multiprogramming is used for multi-user
systems (systems that are used simultaneously by many users, such as mainframe systems) and
multitasking for single-user systems (systems that are used by only one user at a time).
4. Batch processing: A batch processing system is one where programs and data are collected
together in a batch before processing starts.
5. Multithreading: Allows different parts of a single program to run concurrently.
6. Real-time: Responds to input instantly.
• There are device drivers for printers, displays, CD-ROM readers, diskette drives, and so
on.
• When you buy an OS, many device drivers are built into the product.
• A device driver essentially converts the more general input/output instructions of the
operating system to messages that the device type can understand.
• Some Windows programs are Virtual Device Drivers. These programs interface with the
Windows Virtual Machine Manager. There is a virtual device driver for each main
hardware device in the system, including the hard disk drive controller, keyboard, and
serial and parallel ports. They're used to maintain the status of a hardware device that has
changeable settings. Virtual device drivers handle software interrupts from the system
rather than hardware interrupts.
• In Windows operating systems, a device driver file usually has a file name suffix of DLL
or EXE. A virtual device driver usually has the suffix of VXD.
1.8 References:
D. M. Dhamdhere, "Systems Programming & Operating Systems".
Q.5 Which of the following loader is executed when a system is first turned on or
restarted
(A) Boot loader (B) Compile and Go loader
(C) Bootstrap loader (D) Relocating loader
Q.6 A linker program
(A) Places the program in the memory for the purpose of execution.
(B) Relocates the program to execute from the specific memory area allocated to it.
(C) Links the program with other programs needed for its execution.
(D) Interfaces the program with the entities generating its input data.
Q. 7 An assembly language is a
(A) low level programming language
(B) Middle level programming language
(C) High level programming language
(D) Internet based programming language
Q.2 Differentiate between an application program and a system program. Indicate the order in which
the following system programs are used, from developing a program up to its execution: assembler,
loader, linker, macro processor, compiler, editor.
MODULE 2
ASSEMBLERS AND MACRO PROCESSORS
2.1 Motivation:
2.3. Syllabus:
2.5. Definitions
Assembler: a translator (program) which converts a source program (written in assembly language) into
machine code.
Macro Processor: a program which is responsible for processing macros.
Macro: a single-line abbreviation for a group of instructions.
Lecture 5 &6
2.7.1 Assembler:
An assembler is a program that accepts as input an assembly language program &
produces its machine language equivalent along with information for the loader.
(Figure: the assembler, consulting its databases, takes an ALP as input and produces machine
language & other information for the loader.)
FOUR DC F'4'
Lecture 7
7.2 General Design Procedure
1. Specify the Problem
2) Instruction Formats:
a) RR format:
In this format the first & second operands are present in registers.
Format: Opcode (bits 0-7) | R1, first operand (bits 8-11) | R2, second operand (bits 12-15)
e.g. AR 3,4
3) RX format:
- In this format the first operand is in a register & the second operand is external (present
in a memory location).
- The address of the second operand is given by
C(B2) + C(X2) + D2
i.e. contents of base register + contents of index register + displacement.
Format: Opcode (bits 0-7) | R1 (bits 8-11) | X2 (bits 12-15) | B2 (bits 16-19) | D2 (bits 20-31)
e.g. A 1, 90(2,15)
i.e. the effective address is C(15) + C(2) + 90.
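For instance, if the base register 15 holds 1000 and the index register 2 holds 4 (assumed values,
chosen only to make the arithmetic concrete), the instruction A 1, 90(2,15) refers to the effective
address 1000 + 4 + 90 = 1094.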
c. DROP statement:
It is an assembler directive statement which indicates that the given register is no longer
available for use as a base register.
Format:
DROP Register no.
Lecture 8
7.5 Forward Reference Problem:
a) Definition of forward reference:
- The rules of ALP state that a symbol should be defined somewhere in the program, so there
might be a case in which a reference is made to a symbol prior to its definition. Such a
reference is called a forward reference.
b) What is the FRP?
- The function of the assembler is to replace each symbol by its machine address; if we refer
to a symbol before it is defined, its address is not yet known to the assembler. This problem
is called the forward reference problem (FRP).
c) Solution to FRP:-
- Forward reference problem (For IBM 360) is solved by making two passes
over the assembly code.
Pass1:-
Purpose- Define Symbols & Literals.
1) Keep track of LC.
2) Determine length of m/c instructions.
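The heart of pass 1 can be pictured with a small C sketch (a minimal sketch: the column-1 label
convention, the fixed 4-byte statement length and the table sizes are simplifying assumptions, not
the actual IBM 360/370 rules). Pass 1 records each label against the current location counter, so
that pass 2 can replace every symbol reference, including forward references, by an address:

#include <stdio.h>
#include <string.h>

/* Pass-1 sketch: assume each line is "LABEL OP OPERAND" (label starts in
   column 1) or "       OP OPERAND", and every statement occupies 4 bytes. */
struct sym { char name[16]; int addr; } symtab[100];
int nsyms = 0, lc = 0;                    /* symbol count, location counter */

void pass1_line(const char *line) {
    if (line[0] != ' ') {                 /* a label starts in column 1 */
        sscanf(line, "%15s", symtab[nsyms].name);
        symtab[nsyms].addr = lc;          /* define the symbol at the current LC */
        nsyms++;
    }
    lc += 4;                              /* fixed statement length (assumed) */
}

int lookup(const char *name) {            /* pass 2 resolves symbols with this */
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i].name, name) == 0) return symtab[i].addr;
    return -1;                            /* still undefined: report an error */
}

int main(void) {
    pass1_line("       L     1,FIVE");    /* forward reference to FIVE */
    pass1_line("FIVE   DC    F'5'");      /* its definition comes later */
    printf("FIVE is at %d\n", lookup("FIVE"));  /* pass 2 can now resolve it */
    return 0;
}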
Lecture 9
=F'5' 28 04 R
- LT is used to keep track of the literals encountered in the program.
- In pass 1, whenever a literal is encountered an entry is made in the LT.
- In pass 2, the literal table is used to generate the address of the literal.
Base Table (BT) entries (register number, availability, contents):
1   'N'   -
2   'N'   -
...
15  'N'   00
- Availability codes:
- Y: register specified in a USING pseudo-op.
- N: register never specified in a USING pseudo-op.
- BT is used to keep track of register availability.
- In pass 1, BT is not used.
- In pass 2, BT is consulted to find which registers can be used as base registers,
along with their contents.
Pass2 Databases:
- The purpose of pass2 is to generate the instruction & data.
- The various databases maintained are as follows.
a) Copy file:
It is prepared by pass 1.
b) Location counter (LC):
It is used to assign addresses to the instructions & to the symbols defined in the program.
c) Mnemonic Opcode Table (MOT):
MOT is consulted to obtain the binary opcode, instruction length & instruction format.
d) Pseudo Opcode Table (POT):
POT is consulted to process pseudo-opcodes like DS, DC, USING, DROP.
e) Symbol Table (ST):
It is used to generate the addresses of the symbols.
f) Literal Table (LT):
It is used to generate the addresses of the literals.
g) Base Table (BT):
It is consulted to find the registers which are available for use as base registers.
h) Instruction workspace:
It is used to hold the instruction while its various parts are being assembled.
i) PUNCH workspace:
It is used to punch the assembled instruction onto cards.
j) PRINT workspace:
It is used to generate a printed assembly listing.
Lecture 10
c) The DS and DC pseudo-ops can affect both the location counter and the definition of
symbols in pass 1.
d) When the END pseudo-op is encountered, pass 1 is terminated.
Lecture 11
7.1 Definition of Macro:
The assembly language programmer often finds certain statements being repeated in the
program. The programmer can take advantage of the MACRO facility, where a MACRO is defined
to be a "single-line abbreviation for a group of instructions." A macro definition has the form:

MACRO
macro name
macro body
MEND

Example 1: consider a program in which the sequence

A 1, DATA
A 2, DATA
A 3, DATA

occurs twice, with the operand defined by DATA DC F'5'. The sequence can be captured in a
macro definition:

MACRO
INCR
A 1, DATA
A 2, DATA
A 3, DATA
MEND

and each occurrence in the source replaced by the single call INCR:

.
.
INCR
.
.
INCR
.
.
DATA DC F'5'
.
In this case the macro processor replaces each macro call with the lines
A 1, DATA
A 2, DATA
A 3, DATA
• There are four basic tasks/functions that any macro instruction processor must perform:
recognize macro definitions, save the macro definitions, recognize macro calls, and expand
macro calls.
The processor must substitute, for the dummy (macro definition) arguments, the corresponding
arguments from a macro call; the resulting symbolic (in this case, assembly language) text
is then substituted for the macro call. This text, of course, may contain additional macro
definitions or calls.
Lecture 12
Example 2: consider a program containing two similar sequences,

A 1, DATA1          A 1, DATA2
A 2, DATA1          A 2, DATA2
A 3, DATA1          A 3, DATA2

with

DATA1 DC F'5'
DATA2 DC F'10'

- In this case the instruction sequences are very similar but not identical: the first sequence
performs an operation using DATA1 as operand; the second, using DATA2.
- They can be considered to perform the "same" operation with a variable parameter, or
argument. Such a parameter is called a macro instruction argument or "dummy argument";
it is specified on the macro name line and distinguished (as a macro language symbol
rather than an assembly language symbol) by the ampersand (&), which is always its first
character:

MACRO
INCR &ARG
A 1, &ARG
A 2, &ARG
A 3, &ARG
MEND
.
.
INCR DATA1
INCR DATA2
.
DATA1 DC F'5'
DATA2 DC F'10'
It is possible to supply more than one argument in a macro call.
Example 3:

MACRO
INCR &ARG1, &ARG2, &ARG3
A 1, &ARG1
A 2, &ARG2
A 3, &ARG3
MEND
.
.
INCR DATA1, DATA2, DATA3
.
.
DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'

Here the call INCR DATA1,DATA2,DATA3 expands to A 1,DATA1; A 2,DATA2; A 3,DATA3.
2. Conditional Macro Expansion:
- Two important macro processor pseudo-ops, AIF and AGO, permit conditional
reordering of the sequence of macro expansion.
-This allows conditional selection of the machine instructions that appear in expansions of a macro
call.
Example 4:
.
.
.
LOOP2 A 3,DATA1 Add contents of DATA1 to register 3
.
.
.
DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'
In this example, the operands, labels, and the number of instructions generated change in each
sequence. This program could be written as follows:
.
.
.
MACRO
&ARG0 VARY &COUNT, &ARG1, &ARG2, &ARG3
&ARG0 A 1, &ARG1
AIF (&COUNT EQ 1) .FINI        Test if &COUNT = 1
A 2, &ARG2
AIF (&COUNT EQ 2) .FINI        Test if &COUNT = 2
A 3, &ARG3
.FINI MEND
.
.
Source                                  Expanded Source
LOOP1 VARY 3, DATA1, DATA2, DATA3       LOOP1 A 1, DATA1
                                              A 2, DATA2
                                              A 3, DATA3
LOOP3 VARY 1, DATA1                     LOOP3 A 1, DATA1
DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'
- Labels starting with a period (.), such as .FINI, are macro labels and do not
appear in the output of the macro processor.
- The statement AIF (&COUNT EQ 1) .FINI directs the macro processor to skip to the statement
labeled .FINI if the parameter corresponding to &COUNT is 1; otherwise, the macro processor
is to continue with the statement following the AIF pseudo-op.
- AIF is a conditional branch pseudo-op; it performs an arithmetic test and branches only if the
tested condition is true.
- AGO is an unconditional branch pseudo-op, or 'go to' statement.
Example 5:
MACRO
ADD1 &ARG
L 1, &ARG
A 1, =F'1'
ST 1, &ARG
MEND
MACRO
ADDS &ARG1, &ARG2, &ARG3
ADD1 &ARG1
ADD1 &ARG2
ADD1 &ARG3
MEND
- Within the definition of the macro ADDS there are three separate calls to the previously defined
macro ADD1, which makes ADDS more easily understood.
- Such use of macro results in macro expansions on multiple levels.
Lecture 13
7.2 Two Pass Macro processor:
The macro processor algorithm makes two passes over the input text, searching first for macro
definitions & then for macro calls.
Format of Databases:
1) Argument List Array:
- The Argument List Array (ALA) is used during both pass 1 and pass 2.
During pass 1, in order to simplify later argument replacement during macro expansion, dummy
arguments in the macro definition are replaced with positional indicators when the definition is
stored: The ith dummy argument on the macro name card 'is represented in the body of the macro
by the index marker symbol #.
Where # is a symbol reserved for the use of the macro processor (i.e., not available to
the programmers).
- These symbols are used in conjunction with the argument list prepared before expansion
of a macro call. The symbolic dummy arguments are retained on the macro name card to
enable the macro processor to handle argument replacement by name rather than by
position.
- During pass 2 it is necessary to substitute macro call arguments for the index markers stored
in the macro definition.
MDT (fragment, with dummy arguments replaced by index markers):
17   A 2,#2
18   A 3,#3
19   MEND
7.6 Data bases required for Pass1 & Pass2 Macro processor:
The following data bases are used by the two passes of the macro processor:
6. The Macro Name Table Counter (MNTC), used to indicate the next available entry in the MNT
7. The Argument List Array (ALA), used to substitute index markers for dummy arguments before
storing a macro definition
Pass 2 data bases:
1. The copy of the input macro source deck
2. The output expanded source deck to be used as input to the assembler
3. The Macro Definition Table (MDT), created by pass 1
4. The Macro Name Table (MNT), created by pass 1
5. The Macro Definition Table Pointer (MDTP), used to indicate the next line of text to be used
during macro expansion
6. The Argument List Array (ALA), used to substitute macro call arguments for the index markers
in the stored macro definition
Lecture 14
7.7 Algorithm of Pass1 & Pass2 Macro processor:
PASS 1 - MACRO DEFINITION: The algorithm for pass 1 tests each input line. If it is a MACRO
pseudo-op:
1) The entire macro definition that follows is saved in the next available locations in the
Macro Definition Table (MDT).
2) The first line of the definition is the macro name line. The name is entered into
the Macro Name Table (MNT), along with a pointer to the first location of the
MDT entry of the definition.
3) When the END pseudo-op is encountered, all of the macro definitions have been processed
so control transfers to pass 2 in order to process macro calls.
PASS2-MACRO CALLS AND EXPANSION:
The algorithm for pass 2 tests the operation mnemonic of each input line to see if it is a name in
the MNT. When a call is found:-
1) The macro processor sets a pointer, the Macro Definition Table Pointer (MDTP) , to
the corresponding macro definition stored in the MDT. The initial value of the MDTP
is obtained from the "MDT index" field of the MNT entry.
2) The macro expander prepares the Argument List Array(ALA) consisting of a table of
dummy argument indices and corresponding arguments to the call.
3) Reading proceeds from the MDT; as each successive line is read, the values
from the argument list are substituted for dummy argument indices in the macro
definition.
4) Reading of the MEND line in the MDT terminates expansion of the macro, and
scanning continues from the input file.
5) When the END pseudo-op is encountered, the expanded source deck is transferred to
the assembler for further processing.
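The MDT/ALA interplay of pass 2 can be sketched in C. In this minimal sketch (simplifying
assumptions: a single stored macro, its body already held in memory with the dummy arguments
replaced by the index markers #1, #2, #3 as pass 1 would have left them), expansion substitutes
the ALA entries for the markers:

#include <stdio.h>
#include <string.h>

/* Simplified MDT: the body lines of one stored macro, dummy arguments
   already replaced by index markers (as pass 1 would have stored them). */
const char *mdt[] = { "A 1,#1", "A 2,#2", "A 3,#3", "MEND" };

/* Pass-2 expansion: substitute ALA entries for the index markers. */
void expand(const char *ala[]) {
    for (int i = 0; strcmp(mdt[i], "MEND") != 0; i++) {
        const char *s = mdt[i];
        while (*s) {
            if (*s == '#') {              /* index marker: emit the argument */
                int n = s[1] - '0';
                fputs(ala[n - 1], stdout);
                s += 2;
            } else {
                putchar(*s++);
            }
        }
        putchar('\n');
    }
}

int main(void) {
    /* ALA built from the call  INCR DATA1,DATA2,DATA3 */
    const char *ala[] = { "DATA1", "DATA2", "DATA3" };
    expand(ala);                          /* prints the three expanded A lines */
    return 0;
}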
7.8 Flowchart of Pass1 & Pass2 Macro processor:
2.9 References:
J. J. Donovan, "Systems Programming".
D. M. Dhamdhere, "Systems Programming & Operating Systems".
2.10 Objective Questions with answers.
(A) FIFO rule (First in first out) (B) LIFO (Last in First out)
(C) FILO rule (First in last out) (D) None of the above
Q.5 Which of the following is not part of data structure of macro processor?
A)MNT B)MDT
C)MOT D)ALA
Q.6 Which system software used to define the macros?
A) Compiler B) Interpreter
C) Assembler D) Macro processor
Q.7 Which directive used to start the macro in program?
A) START B)MACRO
C)MEND D)none of the above
Q.8 In a two-pass macro processor, the task of pass 1 is ..
Q.15 In a two-pass assembler, the pseudo-op EQU is evaluated during
(A) Pass 1 (B) Pass 2
(C) Not evaluated by the assembler (D) None of the above
Q.1 What is assembly language? What kinds of statements are present in an assembly language
program? Discuss. Also highlight the advantages of assembly language.
Ans: Assembly language is a family of low-level languages for programming computers,
microprocessors, microcontrollers, etc. They implement a symbolic representation of the numeric
machine codes and other constants needed to program a particular CPU architecture. This
representation is usually defined by the hardware manufacturer, and is based on abbreviations
(called mnemonics) that help the programmer remember individual instructions, registers, etc.
Assembly language programming is writing machine instructions in mnemonic form, using an
assembler to convert these mnemonics into actual processor instructions and associated data.
Q.2 What are the functions of passes used in two-pass assembler? Explain pass-1
algorithm?Describe Data structures used during passes of assembler and their use.
Ans:
Data structure during passes of assembler and their use.
Pass 1 data base
1. Input source program
2. A location counter (LC)
3. A table, the machine-operation table (MOT), that indicates the symbolic
mnemonic for each instruction and its length.
4. Pseudo- operation table
5. Symbol table
6. Literal table
7. Copy of the input to be used later by pass 2
Pass 2
1. Copy of source program input to pass 1
2. Location counter (LC)
3. MOT
4. POT
5. ST
6. Base table that indicates which registers are currently specified as base register.
7. A workspace, INST, that is used to hold the instruction as its various parts are being
assembled together.
8. A print line workspace, used to produce a printed listing.
9. A punch card workspace, for converting assembled instructions into the format needed by the loader.
Q.3 Can the operand expression in an ORG statement contain forward references? If so, outline
how the statement can be processed in a two-pass assembly scheme.
Ans:
ORG (origin) is an assembler directive that:
• indirectly assigns values to symbols;
• resets the location counter to the specified value (ORG value);
• the value can be a constant, another symbol, or an expression;
• allows no forward reference in its operand.
- Assemblers scan the source program, generating machine instructions. Sometimes,
the assembler reaches a reference to a variable, which has not yet been defined. This
is referred to as a forward reference problem. It is resolved in a two-pass assembler as follows:
- On the first pass, the assembler simply reads the source file, counting up the number
of locations that each instruction will take, and builds a symbol table in memory
that lists all the defined variables cross-referenced to their associated memory
address.
- On the second pass, the assembler substitutes opcodes for the mnemonics and variable
names are replaced by the memory locations obtained from the symbol table.
Q.4 Explain the differences between macros and subroutines.
Q.5 What is macro-expansion?
Q.8 What are the advantages and disadvantages of macro pre-processor? (8)
- Ans: The advantage of macro pre-processor is that any existing conventional assembler
can be enhanced in this manner to incorporate macro processing. It would reduce the
programming cost involved in making a macro facility available.
- The disadvantage is that this scheme is probably not very efficient because of the time
spent in generating assembly language statement and processing them again for the purpose
of translation to the target language
Q.6 Explain macro definition, macro call and macro expansion.
i) POT
ii) MOT
iii) ST
iv) LT
Q.4 What are various databases used in two pass assembler design. Explain flowchart with
example.
Q.8) Explain the different pseudo-ops used for conditional macro expansion, along with
examples.
MODULE 3
3.3 Syllabus:
Dynamic Linking
Many operating system environments allow dynamic linking, that is the postponing of the
resolving of some undefined symbols until a program is run. That means that the executable code
still contains undefined symbols, plus a list of objects or libraries that will provide definitions for
these. Loading the program will load these objects/libraries as well, and perform a final linking.
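On POSIX systems this deferred resolution can be observed directly through the dlopen/dlsym
interface. The sketch below (the library name libm.so.6 is an assumption about the platform) loads
the math library at run time and resolves the symbol cos only then; link with -ldl where required:

#include <stdio.h>
#include <dlfcn.h>      /* POSIX dynamic linking interface */

int main(void) {
    /* Resolve cos() from the math library only at run time. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (!cosine) { fprintf(stderr, "%s\n", dlerror()); dlclose(handle); return 1; }

    printf("cos(0.0) = %f\n", cosine(0.0));   /* prints 1.000000 */
    dlclose(handle);
    return 0;
}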
Relocation
It is the process of replacing symbolic references or names of libraries with actual usable addresses
in memory before running a program. It is typically performed by the linker at link time, although
it can also be done at load time or run time by a relocating loader. Compilers or assemblers
typically generate the executable with zero as the lowermost starting address.
Relocation Table
It can also be provided in the header of the object code file. Each "fixup" is a pointer to an address
in the object code that must be changed when the loader relocates the program. Fixups are designed
to support relocation of the program as a complete unit. In some cases, each fixup in the table is
itself relative to a base address of zero, so the fixups themselves must be changed as the loader
moves through the table
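Applying such a fixup table amounts to one loop over the recorded offsets. The following C sketch
uses a toy "object program": an array of words in which two entries are address constants
assembled relative to zero (the words, offsets and load address are all illustrative):

#include <stdio.h>

int main(void) {
    /* Toy object program assembled with starting address 0:
       words 1 and 3 hold addresses and must be relocated.     */
    unsigned code[]   = { 0x5810, 0x0004, 0x5020, 0x0008 };
    unsigned fixups[] = { 1, 3 };        /* relocation table: offsets of address words */
    unsigned load_addr = 0x4000;         /* where the loader placed the program */

    for (int i = 0; i < 2; i++)
        code[fixups[i]] += load_addr;    /* add the relocation factor */

    for (int i = 0; i < 4; i++)
        printf("word %d: 0x%04X\n", i, code[i]);
    return 0;
}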
Self-relocation
It is a process, which an executing computer program may use to effect a change in the base address
at which that computer program executes. This is similar to the relocation process employed by
the loader when a program is copied from external storage into main memory; the difference is the
locus of instructions that compute the relocation. When those instructions reside within the
relocated program, self-relocation occurs.
Linkage editor
An editor program that creates one load module from several object modules by resolving
cross-references among the modules. Its processing follows the source program assembly or
compilation of any problem program. The linkage editor is both a processing program and a
service program used in association with the language translators.
Linking loader
It Performs all linking and relocation operations, including automatic library search, and loads the
linked program into memory for execution.
Linkage editor
It produces a linked version of the program, which is normally written to a file or
library for later execution.
Lecture 15
Basic Functions
• Allocation: allocate space in memory for the programs
• Linking: Resolve symbolic references between object files
– combines two or more separate object programs
– supplies the information needed to allow references between them
• Relocation: Adjust all address dependent locations, such as address constants, to
correspond to the allocated space
– modifies the object program so that it can be loaded at an address different from
the location originally specified
• Loading: Physically place the machine instructions and data into memory
Lecture 16
6.2 Design of Absolute Loader
– place the assembled machine instructions and data, as they are assembled, directly
into their assigned memory locations
– When the assembly is completed, the assembler causes a transfer to the starting
instruction of the program
Lecture 17 & 18
• Assembler records
• External Symbol Dictionary (ESD) record: Entries and Externals
• Text (TXT) records contain the actual object code (the translated version of the source program).
• The Relocation and Linkage Directory (RLD) records relocation information
• The END record specifies the starting address for execution
• SD: Segment Definition
• LD: Local Definition
• ER: External Reference
Linkage editor vs. linking loader:
1. A linkage editor produces a linked version of the program, which is written to a file or library
for later execution. A linking loader performs all linking & relocation operations, including
automatic library search if specified, & loads the linked version of the program directly into
memory for execution.
2. A linkage editor generates a file which is supplied later to a relocating loader for execution.
A linking loader does not generate a file but directly loads the program into memory for
execution.
3. With a linkage editor, resolution of external references & library searching are performed only
once. A linking loader searches libraries & resolves external references every time the program
is executed.
3.8 References:
1. Leland Beck, "System Software", Addison-Wesley.
2. D. M. Dhamdhere, "Systems Programming & Operating Systems", Tata McGraw-Hill.
3. J. J. Donovan, "Systems Programming".
A. Linker
B. Loader
C. Compiler
D. Interpreter
Q 2 Postponing of the resolving of some undefined symbols until a program is run is called as
A. Dynamic Linking
B. Dynamic loading
C. Linking
D. Loading
2. What is the difference between Dynamic Loading and Dynamic Linking explain with an
example.
Module 4
Introduction to Compilers and Lexical Analysis
4.1. Motivation:
To provide the students with the knowledge about compiler and its phases
4.3. Syllabus:
Prerequisites: fundamentals of programming.
Syllabus: Introduction to Compilers: design issues, passes, phases (2 hr, 2 hr self study).
Lexical Analysis: the role of a lexical analyzer.
Lecture 20
Compiler
A compiler is a computer program (or set of programs) that transforms source code written in a
programming language (the source language) into another computer language (the target
language, often having a binary form known as object code). The most common reason for wanting
to transform source code is to create an executable program.
Interpreters
An interpreter is also a program that translates a high-level language into a low-level one, but it
does it at the moment the program is run. You write the program using a text editor or something
similar, and then instruct the interpreter to run the program. It takes the program, one line at a time,
and translates each line before running it: It translates the first line and runs it, then translates the
second line and runs it etc. The interpreter has no "memory" for the translated lines, so if it comes
across lines of the program within a loop, it must translate them afresh every time that particular
line runs.
1. Analysis Phase:
a) Lexical analysis - converts the sequence of source characters into tokens; the input is the
source program & the output is the tokens.
c) Semantic analysis - the input is the parse tree and the output is an expanded version of the
parse tree.
2. Synthesis Phase:
d) Intermediate code generation - here the program is checked for errors & an intermediate
code is produced.
e) Code optimization - the intermediate code is optimized here to obtain an efficient target program.
f) Code generation - this is the final step & here the target program code is generated.
Lecture: 22
Role of a Lexical analyzer, input buffering, specification and recognition of tokens
Learning Objective: In this lecture students will be able to design a lexical analyzer.
Lexical grammar
The specification of a programming language will often include a set of rules which defines the
lexer. These rules are usually called regular expressions and they define the set of possible
character sequences that are used to form individual tokens or lexemes.
Token
A token is a string of characters, categorized according to the rules as a symbol (e.g. IDENTIFIER,
NUMBER, COMMA, etc.). The process of forming tokens from an input stream of characters is
called tokenization and the lexer categorizes them according to a symbol type
Scanner
The scanner is usually based on a finite state machine. It has encoded within it information on
the possible sequences of characters that can be contained within any of the tokens it handles
(individual instances of these character sequences are known as lexemes).
Tokenization
It is the process of demarcating and possibly classifying sections of a string of input characters.
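A hand-written scanner of this finite-state flavor can be sketched in a few lines of C (the two
token classes and the printed token names are illustrative, not a fixed standard):

#include <stdio.h>
#include <ctype.h>

/* Tokenize stdin into NUMBER and IDENTIFIER lexemes; each inner loop below
   is one state of the finite state machine, staying in itself on a match. */
int main(void) {
    int c = getchar();
    while (c != EOF) {
        if (isdigit(c)) {                         /* state: in a number */
            printf("NUMBER: ");
            while (c != EOF && isdigit(c)) { putchar(c); c = getchar(); }
            putchar('\n');
        } else if (isalpha(c) || c == '_') {      /* state: in an identifier */
            printf("IDENTIFIER: ");
            while (c != EOF && (isalnum(c) || c == '_')) { putchar(c); c = getchar(); }
            putchar('\n');
        } else {
            c = getchar();                        /* skip everything else */
        }
    }
    return 0;
}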
Q 1. Postponing of the resolving of some undefined symbols until a program is run is called as
A. Dynamic Linking
B. Dynamic loading
C. Linking
D. Loading
A. NFA
B. DFA
C. Turing Machine
Answers: 1) D 2) B
(A) 3
(B) 26
(C) 10
(D) 21
Answer: (C)
Answer: (C)
3. The lexical analysis for a modern computer language such as Java needs the power of which
one of the following machine models in a necessary and sufficient sense?
(A) Finite state automata
(B) Deterministic pushdown automata
(C) Non-Deterministic pushdown automata
(D) Turing Machine
Answer: (A)
Learning from the lecture 'Role of a lexical analyzer, input buffering, specification and
recognition of tokens': students will be able to list and explain the role of the lexical analyzer
in compiler design.
4.8 References:
Objective Questions
Q1 In a compiler the module that checks every character of the source text is called
A. Syntax analysis
B. Symantic analysis
C. Code generation
D. Error reporting
• Identifier
• Keyword
• Number
• Function
1. What are the different phases of a compiler? Illustrate the compiler's internal representation of
the source program after each phase for the following statement. [Nov 2016]
2. What are the different phases of a compiler? Illustrate the compiler's internal representation of the
source program after each phase for the following statement. [May 2016]
Amount = P + P * N * R / 100
Module 5
Parsing
5.1 Motivation:
The motivation of this chapter is to study & design different top-down & bottom-up parsing
techniques. After studying this module students can easily develop a parser, i.e. the
syntax-analysis phase of a compiler, learn how to design intermediate code using syntax
directed translation, and study syntax directed definitions & translation schemes in detail.
Prerequisite: fundamental knowledge of lexical analysis.
5.2. Syllabus:
Lecture Content Duration Self Study
23 Syntax Analysis: Role of Parser 1 Lecture 2 hours
1. List the functions of the lexical analyzer and describe its role in compiler design. (R)
2. Design and develop a hand-written lexical analyzer and demonstrate its working in compiler
design. (A)
3. Describe the role of the parser in the compilation process. Explain different top-down and
bottom-up parsing techniques. (E)
4. Specify various parsing techniques to design new language structures with the help of
grammars. (C)
5. Explain the construction and role of the syntax tree in the context of the parse tree. (U)
6. Distinguish between parse tree, syntax tree and DAG for graphical representation of the source
program. (U)
7. Summarize different compiler construction tools and describe the structure of a Lex
specification. (AN)
8. Apply the LEX compiler for automatic generation of a lexical analyzer and construct a lexical
analyzer using an open-source tool for compiler design. (C)
9. Define Context Free Grammar, describe the structure of a YACC specification, and apply the
YACC compiler for automatic generation of a parser generator. (U)
Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall learn the
basic concepts used in the construction of a parser. A lexical analyzer can identify tokens with the
help of regular expressions and pattern rules, but it cannot check the syntax of a given sentence
due to the limitations of regular expressions: regular expressions cannot check balancing of
tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG), which is
recognized by push-down automata. CFG, on the other hand, is a superset of Regular Grammar:
every Regular Grammar is also context-free, but there exist problems which are beyond the scope
of Regular Grammar. CFG is a helpful tool in describing the syntax of programming languages.
Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams.
The parser analyzes the source code (token stream) against the production rules to detect any errors
in the code. The output of this phase is a parse tree.
This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and
generating a parse tree as the output of the phase. Parsers are expected to parse the whole code
even if some errors exist in the program. Parsers use error recovering strategies.
4.6. Abbreviations:
LL(1): Left-to-right scan of the input, Leftmost derivation, 1 symbol of lookahead.
LR(k): Left-to-right scan of the input, Rightmost derivation in reverse, k symbols of lookahead.
CLR: Canonical LR
Leftmost Derivation: Derivation in which only the leftmost non terminal in any sentential form
is replaced at each step. Such derivation is called leftmost derivation.
Rightmost Derivation: Derivation in which only the rightmost nonterminal in any sentential form
is replaced at each step. Such derivation is called rightmost derivation.
Handle of a string: Substring that matches the RHS of some production AND whose reduction to
the non-terminal on the LHS is a step along the reverse of some rightmost derivation.
Parse tree: Graphical representation of a derivation ignoring replacement order.
Annotated Parse Tree : A parse tree showing the values of attributes at each node is called an
annotated parse tree.
Annotating (or decorating) of the parse tree: The process of computing the attributes values at
the nodes is called annotating (or decorating) of the parse tree.
A syntax directed definition : specifies the values of attributes by associating semantic rules with
the grammar productions.
Production | Semantic Rule
E → E1+T   | E.code = E1.code || T.code || '+'
In general, a semantic rule has the form b := f(c1, c2, ..., cn) associated with a production
A → α, where b is either a synthesized attribute of A or an inherited attribute of one of the
grammar symbols in α (on the right side of the production), and c1, c2, ..., cn are attributes of
the grammar symbols in the production.
Inherited Attributes: An inherited attribute is one whose value at a node in a parse tree is
defined in terms of attributes at the parent and/or siblings of that node.
Syntax Tree: A syntax tree is a more condensed version of the parse tree useful for representing
language constructs.
Given a CFG, a parse tree according to the grammar is a tree with the following properties.
Example: for the production A → XYZ, the node A has the children X, Y and Z.
1) Determine the syntactic structure of the program from the tokens produced by the scanner,
i.e. check the validity of the source program.
3) For an invalid string, issue diagnostic error messages giving the cause & nature of the error.
Parsing techniques:
a) Top-down parsing
b) Bottom-up parsing
L23. Exercise:
Q.1) Differentiate between top-down parser & Bottom-up parser.
Lecture: 24
Top-down parsing
Learning objective: In this lecture students will be able to design a top-down parser.
4.9.3 Top-Down parser:
Builds the parse tree from the top (root) down to the bottom (leaves).
Top down parsing techniques:
Left Recursion:
• A grammar is left-recursive if it has a non-terminal A such that A ⇒+ Aα for some string α.
The left recursion may appear in a single step of the derivation (immediate left recursion),
or may appear in more than one step of the derivation.
Rule: immediate left recursion
A → Aα | β
is eliminated by rewriting it as the equivalent grammar
A → βA'
A' → αA' | ε
In general, A → Aα1 | ... | Aαm | β1 | ... | βn becomes
A → β1A' | ... | βnA'
A' → α1A' | ... | αmA' | ε
Left-Recursion - Example
E → E+T | T
T → T*F | F
F → id | (E)
becomes
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → id | (E)
Left-Factoring
• When two alternatives of stmt both begin with if, then on seeing if we cannot know which
production rule to choose to rewrite stmt in the derivation.
• In general, with A → αβ1 | αβ2 we cannot decide between the two alternatives.
• For each non-terminal A with two or more alternatives (production rules) with
a common non-empty prefix α, say
A → αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A → αA' | γ1 | ... | γm
A' → β1 | ... | βn
Left-Factoring - example
S → iEtS | iEtSeS | a
E → b
can be rewritten as
S → iEtSS' | a
S' → eS | ε
E → b
a)LL(1) b)LR(0)
c)LR(1) d)LALR
a)Yes b)No
c)Ambiguous grammars
d)Non-ambiguous grammars
a)Stack b)Queue
c)Array d)LinkList
L24. Exercise
Q.1) Check whether the following grammar is left-recursive or not. If yes, remove the left recursion.
S → Aa | b
A → Ac | Sd | ε
A.1) This grammar is not immediately left-recursive, but it is still left-recursive:
S ⇒ Aa ⇒ Sda
Substituting S in A → Sd gives
S → Aa | b
A → Ac | Aad | bd | ε
and eliminating the immediate left recursion gives
S → Aa | b
A → bdA' | A'
A' → cA' | adA' | ε
Q.2) Check whether the following grammar is left-recursive & left-factored or not.
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
Lecture: 25 & 26
Recursive descent and predictive parsers (LL)
Learning Objective: In this lecture students will be able to design recursive descent and predictive
(LL) parsers.
4.9.3 Example on Predictive or LL(1) Parser:-
a. Steps for constructing the parsing table M: for each production A → α of the grammar:
1) For each terminal a in FIRST(α), add A → α to M[A, a].
2) If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A).
3) All other undefined entries of the parsing table are error entries.
Step 1: original (ambiguous) grammar:
E → E+E
E → E*E
E → (E)
E → id
Step 2: equivalent unambiguous grammar:
E → E+T | T
T → T*F | F
F → (E)
F → id
After eliminating the left recursion:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E)
F → id
Step 3: Find the FIRST & FOLLOW sets to construct the predictive parsing table:
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { $, ) }
FOLLOW(T) = FOLLOW(T') = { +, $, ) }
FOLLOW(F) = { *, +, $, ) }
Step 4: Parsing table:

        id        +          *          (         )         $
E       E→TE'                           E→TE'
E'                E'→+TE'                         E'→ε      E'→ε
T       T→FT'                           T→FT'
T'                T'→ε       T'→*FT'              T'→ε      T'→ε
F       F→id                            F→(E)
Step 5: The parsing table does not contain any multiply-defined entries. Hence, the given grammar
is an LL(1) grammar.
Step 6: Parse the input id*id using the parse table and a stack (first and last steps shown):
1    $E     id*id$    E → TE'
...
9    $T'    $         T' → ε
10   $      $         accept
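The same grammar can also be parsed by a hand-written recursive-descent parser, with one C
function per nonterminal. This is a minimal sketch: the token id is abbreviated to the single
character 'i', and error handling simply rejects and exits:

#include <stdio.h>
#include <stdlib.h>

/* Recursive descent for  E->TE'  E'->+TE'|e  T->FT'  T'->*FT'|e  F->(E)|id,
   with 'i' standing for the id token and e for the empty string.          */
const char *p;                                  /* next input symbol */

void error(void) { printf("reject\n"); exit(1); }
void match(char t) { if (*p == t) p++; else error(); }

void E(void); void Ep(void); void T(void); void Tp(void); void F(void);

void E(void)  { T(); Ep(); }
void Ep(void) { if (*p == '+') { match('+'); T(); Ep(); } }   /* else E'->e */
void T(void)  { F(); Tp(); }
void Tp(void) { if (*p == '*') { match('*'); F(); Tp(); } }   /* else T'->e */
void F(void)  {
    if (*p == '(') { match('('); E(); match(')'); }
    else if (*p == 'i') match('i');
    else error();
}

int main(void) {
    p = "i*i";                                  /* the input id*id */
    E();
    if (*p == '\0') printf("accept\n"); else error();
    return 0;
}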
L24 Exercise:
Q.1 Check whether the following grammars are LL(1) or not.
a) S → iCtSE | a
   E → eS | ε
   C → b
b) A → (A)A | ε, with FOLLOW(A) = { $, ) }
c) S → aSa | ε
Lecture: 27 & 28
Bottom-Up parsing and Operator Precedence Parser
Learning Objective: In this lecture students will be able to design bottom-up parsers.
7.3.2 LR parsing
Operator-Precedence Parser:
The input string is w$, the initial stack is $, and a table holds the precedence relations
between certain terminals.
Algorithm (p points to the current input symbol):
repeat forever:
  if the top of the stack is $ and p points to $, then accept and exit;
  else {
    let a be the topmost terminal symbol on the stack and let b be the symbol pointed
    to by p;
    if a <. b or a =. b then { push b onto the stack; advance p; }     /* shift */
    else if a .> b then                                                /* reduce */
      repeat pop the stack
      until (the top-of-stack terminal is related by <. to the terminal most
      recently popped);
    else error();
  }
Operator grammar
In an operator grammar, no production rule can have ε on the right side, nor two adjacent
non-terminals on the right side.
Example: E → E+E | E*E | (E) | id is an operator grammar.
Precedence Relations:
a <. b means a yields precedence to b; a =. b means a has the same precedence as b; and
a .> b means a takes precedence over b, with <. marking the left end of a handle and
.> marking the right end.
4) In the input string $a1a2...an$, we insert the precedence relation between each pair
of adjacent terminals (the precedence relation holds between the terminals in that pair).
5) Then the input string id+id*id with the precedence relations inserted will be:
$ <. id .> + <. id .> * <. id .> $

       id     +      *      $
id            .>     .>     .>
+      <.     .>     <.     .>
*      <.     .>     .>     .>
$      <.     <.     <.
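The algorithm and the table above can be combined into a small C driver. This is a minimal
sketch under simplifying assumptions: id is abbreviated 'i', only terminals are kept on the
stack, and a "reduce" merely pops the handle instead of building a parse tree:

#include <stdio.h>
#include <string.h>

/* terminals: i(=id) + * $ ; rel[a][b] holds '<', '>', '=' or 0 (error) */
const char *TERMS = "i+*$";
char rel[4][4] = {
    /*        i    +    *    $   */
    /* i */ { 0 , '>', '>', '>' },
    /* + */ {'<', '>', '<', '>' },
    /* * */ {'<', '>', '>', '>' },
    /* $ */ {'<', '<', '<',  0  },
};
int idx(char c) { return (int)(strchr(TERMS, c) - TERMS); }

int main(void) {
    const char *p = "i+i*i$";          /* id+id*id followed by the end marker */
    char stack[64] = "$";              /* the stack holds terminals only */
    int top = 0;
    for (;;) {
        if (stack[top] == '$' && *p == '$') { puts("accept"); return 0; }
        char r = rel[idx(stack[top])][idx(*p)];
        if (r == '<' || r == '=') {            /* shift */
            stack[++top] = *p++;
            printf("shift  %c\n", stack[top]);
        } else if (r == '>') {                 /* reduce the handle */
            char popped;
            do { popped = stack[top--]; }
            while (rel[idx(stack[top])][idx(popped)] != '<');
            printf("reduce handle ending in %c\n", popped);
        } else { puts("error"); return 1; }
    }
}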
Disadvantages:
1) It cannot handle the unary minus (the lexical analyzer should handle the
unary minus).
Advantages:
1) Simple
Lecture: 29
LR parsers
Learning Objective: In this lecture students will able to Design LR Parser.
4.9.6 LR Parser:
a) LR (0) or SLR
b) LR (1) or Canonical LR
1) E → E+T
2) E → T
3) T → T*F
4) T → F
5) F → (E)
6) F → id
• To create the SLR parsing tables for a grammar G, we create the canonical LR(0) collection
of the augmented grammar G'.
• Algorithm:
C is { closure({S' → .S}) }
repeat the following until no more sets of LR(0) items can be added to C:
for each I in C and each grammar symbol X with goto(I,X) non-empty and not in C,
add goto(I,X) to C
Steps:
1) Augment the grammar.
2) Find the items and their closures.
6) Draw the DFA.
Step 1. Augmented grammar:
G' is G with a new production rule S' → S, where S' is the new start symbol.
E' → E
E → E+T
E → T
T → T*F
T → F
F → (E)
F → id
An LR(0) item of a grammar G is a production of G with a dot at some position of the right
side, e.g. A → XY.Z and A → XYZ. are items of A → XYZ.
• The sets of LR(0) items will be the states of the action and goto tables of the SLR parser.
I0 = closure({E' → .E}):
E' → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id

If I is the set of items that are valid for some viable prefix γ, then goto(I,X) is the set of
items that are valid for the viable prefix γX.
Example: goto(I0, F) = { T → F. } and goto(I0, id) = { F → id. }.

The remaining sets of the canonical LR(0) collection:
I1: E' → E., E → E.+T
I2: E → T., T → T.*F
I3: T → F.
I4: F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id
I5: F → id.
I6: E → E+.T, T → .T*F, T → .F, F → .(E), F → .id
I7: T → T*.F, F → .(E), F → .id
I8: F → (E.), E → E.+T
I9: E → E+T., T → T.*F
I10: T → T*F.
I11: F → (E).
FOLLOW(E)= { $, +, ) }
FOLLOW(T)= {*, $, +, ) }
FOLLOW(F)= {*, $, +, ) }
(DFA of the LR(0) item sets: from I0, E leads to I1, T to I2, F to I3, '(' to I4 and id to I5;
from I1, '+' leads to I6; from I2, '*' leads to I7; I4 behaves like I0 except that E leads to I8;
from I6, T leads to I9 and F to I3; from I7, F leads to I10; from I8, ')' leads to I11 and
'+' to I6.)
state   id    +     *     (     )     $     E    T    F
0       s5                s4                1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                8    2    3
5             r6    r6          r6    r6
6       s5                s4                     9    3
7       s5                s4                          10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5
Step 9: There are no multiply-defined shift/reduce entries in the parser table; hence the given
grammar is an LR(0) grammar.
Parse of id*id+id (first and last steps shown):
0      id*id+id$     shift 5
...
0E1    $             accept
1)LALR 2)LR(0)
3)LR(1) 4)LL(1)
a)shift-reduce b)reduce-reduce
a)SLR(1) b)LR(1)
4) LR grammar is a:
5) YACC is a:
L27 Exercise:
Q.1 Construct the LR(0) collection for the following arithmetic grammar & construct the LR(0)
parser table:
E → E+T
T → T*F
F → (E)
F → id
Lecture: 30 & 31
LR (1) and LALR parsers
Learning Objective: In this lecture students will be able to design LR(1) and LALR parsers.
4.9.7 Example on LR(1) parser:
Steps: as for the SLR parser, ending with 6) Draw the DFA.
LR(1) Item
• To avoid some invalid reductions, the states need to carry more information: an LR(1) item
has the form [A → α.β, a], where the lookahead a is a terminal or the end-marker $.
goto operation:
If [A → α.Xβ, a] is in I, then every item in closure({[A → αX.β, a]}) is in goto(I,X).
• Algorithm:
C is { closure({[S' → .S, $]}) }
repeat the following until no more sets of LR(1) items can be added to C:
for each I in C and each symbol X, add goto(I,X) to C if it is non-empty and new
Grammar G:
S → CC
C → cC
C → d
Step 1: Augmented grammar G':
1. S' → S
2. S → CC
3. C → cC
4. C → d
Step 2,3,4,5: item, closure & goto operations, i.e. the LR(1) collection:
I0: [S' → .S, $], [S → .CC, $], [C → .cC, c/d], [C → .d, c/d]
I1 = goto(I0, S): [S' → S., $]
I2 = goto(I0, C): [S → C.C, $], [C → .cC, $], [C → .d, $]
I3 = goto(I0, c): [C → c.C, c/d], [C → .cC, c/d], [C → .d, c/d]
I4 = goto(I0, d): [C → d., c/d]
I5 = goto(I2, C): [S → CC., $]
I6 = goto(I2, c): [C → c.C, $], [C → .cC, $], [C → .d, $]
I7 = goto(I2, d): [C → d., $]
I8 = goto(I3, C): [C → cC., c/d]
goto(I3, c) = I3, goto(I3, d) = I4
I9 = goto(I6, C): [C → cC., $]
goto(I6, c) = I6, goto(I6, d) = I7
FIRST(S) = FIRST(C) = { c, d }
FOLLOW(S) = { $ }
FOLLOW(C) = { c, d, $ }
State   c     d     $     S    C
0       s3    s4          1    2
1                   acc
2       s6    s7               5
3       s3    s4               8
4       r3    r3
5                   r1
6       s6    s7               9
7                   r3
8       r2    r2
9                   r2
Step 9: There are no multiply-defined entries in the LR(1) table; hence this grammar is an LR(1)
grammar.
1. LALR parsers are often used in practice because LALR parsing tables are smaller than
LR(1) parsing tables.
2. The number of states in SLR and LALR parsing tables for a grammar G are equal.
• We will do this for all states of a canonical LR(1) parser to get the states of the LALR
parser.
• In fact, the number of the states of the LALR parser for a grammar will be equal to the
number of states of the SLR parser for that grammar.
• We will find the states (sets of LR(1) items) in a canonical LR(1) parser with same cores.
Then we will merge them as a single state.
Merging the states with the same core:
I36 (= I3 ∪ I6): [C → c.C, c/d/$], [C → .cC, c/d/$], [C → .d, c/d/$]
I47 (= I4 ∪ I7): [C → d., c/d/$]
I89 (= I8 ∪ I9): [C → cC., c/d/$]
(DFA over the merged states omitted.) LALR parsing table:
State   c      d      $     S    C
0       s36    s47          1    2
1                     acc
2       s36    s47               5
36      s36    s47               89
47      r3     r3     r3
5                     r1
89      r2     r2     r2
If there is no parsing action conflicts ,then the given grammar is said to be an LALR Grammar.
a)shift-reduce b)reduce-reduce
a)SLR(1) b)LR(1)
Exercise:
Q 1. Consider the following grammar and construct the LALR parsing table.
S → AA
A → aA | b
Q 2. Construct the LALR parsing table for the following grammar.
S' → S
S → CC
C → cC | d
Learning from the lecture 'LR(1) and LALR': students will be able to design and implement
bottom-up parsers.
Learning from the lecture 'Syntax directed definitions': students will be able to design and
implement bottom-up parsers.
Lecture: 33
LEX Compiler
Learning Objective: In this lecture students will able to understand the working of LEX Compiler
Lex is a tool used in the lexical analysis phase to recognize tokens using regular expressions;
the lex tool itself is a lex compiler.
declarations
%%
translation rules
%%
auxiliary functions
Declarations: this section includes declarations of variables and constants, and regular definitions.
Auxiliary functions: this section holds additional functions which are used in the actions. These
functions are compiled separately and loaded with the lexical analyzer.
The lexical analyzer produced by lex starts its process by reading one character at a time until a
valid match for a pattern is found.
Once a match is found, the associated action takes place to produce a token.
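A complete (if tiny) LEX specification along these lines might look as follows; the file name
tokens.l and the printed token names are illustrative. Build with lex tokens.l && cc lex.yy.c
(adding -ll if yywrap is not supplied, as it is here):

%{
/* tokens.l - a minimal sketch: classify numbers and identifiers. */
#include <stdio.h>
%}
%%
[0-9]+                   { printf("NUMBER(%s)\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*   { printf("ID(%s)\n", yytext); }
[ \t\n]+                 ;   /* skip whitespace */
.                        { printf("OTHER(%s)\n", yytext); }
%%
int yywrap(void) { return 1; }          /* auxiliary functions section */
int main(void)   { yylex(); return 0; }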
L23 Exercise:
Q.2 Write short note on: LEX Compiler
Lecture: 34
YACC Compiler- Compiler
Learning Objective: In this lecture students will able to understand the working of YACC Compiler
• Writing a compiler is difficult, requiring lots of time and effort. Construction of the scanner and parser is routine enough that the process may be automated.
• What is YACC?
- A tool which will produce a parser for a given grammar.
- YACC (Yet Another Compiler-Compiler) is a program designed to compile an LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.
• Variants:
- lex, yacc (AT&T)
- bison: a yacc replacement (GNU)
- flex: fast lexical analyzer (GNU)
- BSD yacc
- PCLEX, PCYACC (Abraxas Software)
[Build pipeline: the yacc program (grammar file) is processed by yacc to produce y.tab.c (and the header y.tab.h); y.tab.c is then compiled and linked by the C compiler (cc or gcc) to produce the executable a.out.]
A YACC File Example
[Figure: at compile time, Lex produces yylex() and YACC produces yyparse(); linked together they form a.out, which at run time parses input programs such as 12 + 26.]
The general layout of a YACC file is:
%{
C declarations
%}
yacc declarations
%%
Grammar rules
%%
Additional C code
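As a concrete illustration, here is a minimal complete YACC file for summing expressions like 12 + 26 (a sketch: a tiny hand-written yylex() is included in the additional-C-code section so the example is self-contained; names are illustrative):

%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token NUMBER
%left '+'
%%
line : expr '\n'       { printf("= %d\n", $1); }
     ;
expr : expr '+' expr   { $$ = $1 + $3; }
     | NUMBER          { $$ = $1; }
     ;
%%
/* a tiny hand-written scanner in place of one generated by Lex */
int yylex(void) {
    int c = getchar();
    while (c == ' ') c = getchar();
    if (isdigit(c)) {
        yylval = 0;
        while (isdigit(c)) { yylval = yylval * 10 + (c - '0'); c = getchar(); }
        ungetc(c, stdin);
        return NUMBER;
    }
    return c == EOF ? 0 : c;   /* '+' and '\n' are their own tokens */
}
int main(void) { return yyparse(); }

It can be built with: yacc calc.y && cc y.tab.c -o calc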
Chomsky Hierarchy
A grammar can be classified on the basis of its production rules. Chomsky classified grammars into the following types.
Type and grammar name / Language generated / Form of productions / Accepting device:
Type 0 (Unrestricted grammar): recursively enumerable languages; productions X1 → X2, with X1 ∈ (V∪T)* V (V∪T)* and X2 ∈ (V∪T)*; accepting device: Turing machine.
Type 1 (Context-sensitive grammar): context-sensitive languages; productions X1 → X2 with |X1| <= |X2|; accepting device: linear bounded automaton.
Type 2 (Context-free grammar): context-free languages; productions A → X2, with A ∈ V and X2 ∈ (V∪T)*; accepting device: pushdown automaton.
Type 3 (Regular grammar): regular languages; productions A → aB or A → a, with a ∈ T and A, B ∈ V; accepting device: finite automaton.
1. Know:
a) Students should be able to differentiate between top-down and bottom-up parsers.
b) Understand the role of the lexical and syntax analyzers in compiler design.
2. Comprehend:
a) Students should be able to explain and design a lexical analyzer.
b) Students should be able to describe and design a syntax analyzer.
Example: consider the grammar
S → AaAb
S → BbBa
A → ε
B → ε
Its initial set of LR(0) items is:
I0: S′ → .S
    S → .AaAb
    S → .BbBa
    A → .
    B → .
Problem: in I0, on input a the parser may reduce by A → ε or by B → ε, and on input b it may likewise reduce by either. These reduce-reduce conflicts show that the grammar is not SLR(1), even though it is LL(1).
Further examples:
A → (A)A | ε, for which FOLLOW(A) = { $, ) }
S → aSa | ε
Example (eliminating indirect left recursion):
S → Aa | b
A → Sc | d
Here S ⇒ Aa ⇒ Sca, so the grammar is indirectly left-recursive. Substituting the productions of S into A gives:
S → Aa | b
A → Aac | bc | d
Eliminating the direct left recursion in A:
S → Aa | b
A → bcA' | dA'
A' → acA' | ε
Q.5 Test whether the grammar is LL(1) or not and construct a predictive parsing table for it.
S → AaAb
S → BbBa
A → ε
B → ε
Q.6 What is parsing? Write down the drawbacks of top-down parsing with backtracking.
Ans:Parsing is the process of analyzing a text, made of a sequence of tokens, to determine its
grammatical structure with respect to a given formal grammar. Parsing is also known as syntactic
analysis and parser is used for analyzing a text. The task of the parser is essentially to determine if
and how the input can be derived from the start symbol of the grammar. The input is a valid input
with respect to a given formal grammar if it can be derived from the start symbol of the grammar.
Following are the drawbacks of top-down parsing with backtracking:
(i) Semantic actions cannot be performed while making a prediction. The actions must be
delayed until the prediction is known to be a part of a successful parse.
(ii) Precise error reporting is not possible. A mismatch merely triggers backtracking. A
source string is known to be erroneous only after all predictions have failed.
Q.7 For the following grammar construct the predictive parsing table and explain it step by step.
Grammar G:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E)
F → id
Q.8 Construct the LR(0) collection for the following arithmetic grammar and construct the LR(0) parser table:
E → E+T | T
T → T*F | F
F → (E)
F → id
E → E+T | T
T → T*F | F
F → (E)
F → id
Show the shift-reduce parser actions for the string id+id+id*id.
E → E+T | T
T → T*F | F
F → (E) | I
I → a | b | c
S -> aBDh
B -> cC
C -> bC | ε
D -> EF
E -> g | ε
F -> f | ε
Q.13 Consider the following grammar and construct the LALR parsing table.
S -> AA
A -> aA | b
Q.14 Construct the LALR parsing table for the following grammar.
S' -> S
S -> CC
C -> cC | d
S → AaAb
S → BbBa
A → ε
B → ε
S -> A
A -> Ad | Ae | aB | aC
B -> bBC | f
C -> g
Q.2 Eliminate the left recursion present in the following grammar (remove both direct and indirect recursion):
S → Aa | b
Q.4 For the grammar given below, construct the operator-precedence relations matrix, assuming * and + are binary operators, id is a terminal symbol and E is a non-terminal symbol.
E → E+E
E → E*E
E → id
Apply the operator-precedence parsing algorithm to obtain the skeletal syntax tree for the statement id+id*id. [Dec 2016, Nov 2015]
Q.5 Construct the SLR parsing table for the following grammar. Show how the parsing actions are done for the input string ()()$. Show the stack contents, input buffer and action.
S -> (S)S | ε
E → TE'      E' → +TE' | ε
T → FT'      T' → *FT' | ε
4.15. References
1. A.V. Aho and J.D. Ullman: Principles of Compiler Design, Pearson Education
2. A.V. Aho, R. Sethi and J.D. Ullman: Compilers - Principles, Techniques and Tools, Pearson Education
4.16 Practice for Module No.4 Syntax Analyzer (Based on Gate Exam & University Patterns)
1. What is the maximum number of reduce moves that can be taken by a bottom-up parser for a
grammar with no epsilon- and unit-productions (i.e., of type A → ε and A → a) to parse a string
with n tokens?
(A) n/2
(B) n-1
(C) 2n-1
(D) 2n
Answer: (B)
Which of the following statements related to merging of the two sets in the corresponding LALR
parser is/are FALSE?
(A) 1 only
(B) 2 only
(C) 1 and 4 only
(D) 1, 2, 3, and 4
Answer: (D)
3. For the grammar below, a partial LL(1) parsing table is also presented along with the grammar. Entries that need to be filled are indicated as E1, E2, and E3. ε is the empty string, $ indicates end of input, and | separates alternate right-hand sides of productions.
(A) A
(B) B
(C) C
(D) D
Answer: (A)
4. Consider the data same as above question. The appropriate entries for E1, E2, and E3 are
(A) A
(B) B
(C) C
(D) D
Answer: (C)
Answer: (C)
6. Match all items in Group 1 with correct options from those given in Group 2.
Group 1 Group 2
P. Regular expression 1. Syntax analysis
Q. Pushdown automata 2. Code generation
R. Dataflow analysis 3. Lexical analysis
S. Register allocation 4. Code optimization
Answer: (B)
(B) It is non-terminal whose production will be used for reduction in the next step
(C) It is a production that may be used for reduction in a future step along with a position in the
sentential form where the next shift or reduce operation will occur
(D) It is the production p that will be used for reduction in the next step along with a position in
the sentential form where the right hand side of the production may be found
Answer: (D)
8. An LALR(1) parser for a grammar G can have shift-reduce (S-R) conflicts if and only if
(A) the SLR(1) parser for G has S-R conflicts
(B) the LR(1) parser for G has S-R conflicts
(C) the LR(0) parser for G has S-R conflicts
(D) the LALR(1) parser for G has reduce-reduce conflicts
Answer: (B)
Answer: (A)
10. Consider the grammar with non-terminals N = {S,C,S1 },terminals T={a,b,i,t,e}, with S as the start
symbol, and the following set of rules:
S --> iCtSS1|a
S1 --> eS|ϵ
C --> b
Answer: (C)
Answer: (C)
S -> S * E
S -> E
E -> F + E
E -> F
F -> id
(i) S -> S * .E
(ii) E -> F. + E
(iii) E -> F + .E
Given the items above, which two of them will appear in the same set in the canonical sets-of-
items for the grammar?
(A) (i) and (ii)
(B) (ii) and (iii)
(C) (i) and (iii)
(D) None of the above
Answer: (D)
S --> L. > R
Q --> R.
Answer: (D)
14. Consider the grammar defined by the following production rules, with two operators ∗ and +
S --> T * P
T --> U | T * U
P --> Q + P | Q
Q --> Id
U --> Id
Answer: (B)
15. Which one of the following grammars is free from left recursion?
(A) A
(B) B
(C) C
(D) D
Answer: (B)
S → FR
R→S|ε
F → id
In the predictive parser table M of the grammar, the entries M[S, id] and M[R, $] respectively are:
(A) {S → FR} and {R → ε }
(B) {S → FR} and { }
(C) {S → FR} and {R → *S}
(D) {F → id} and {R → ε}
Answer: (A)
17. The grammar A → AA | (A) | ε is not suitable for predictive-parsing because the grammar is
(A) ambiguous
(B) left-recursive
(C) right-recursive
(D) an operator-grammar
Answer: (B)
S → (S) | a
Let the number of states in SLR(1), LR(1) and LALR(1) parsers for the grammar be n1, n2 and
n3 respectively. The following relationship holds good
(A) n1 < n2 < n3
(B) n1 = n3 < n2
(C) n1 = n2 = n3
(D) n1 ≥ n3 ≥ n2
Answer: (B)
19. Which of the following grammar rules violate the requirements of an operator grammar ? P,
Q, R are nonterminals, and r, s, t are terminals.
1. P → Q R
2. P → Q s R
3. P → ε
4. P → Q t R r
(A) 1 only
(B) 1 and 3 only
(C) 2 and 3 only
(D) 3 and 4 only
Answer: (B)
20. Which of the following suffices to convert an arbitrary CFG to an LL(1) grammar?
(A) Removing left recursion alone
(B) Factoring the grammar alone
(C) Removing left recursion and factoring the grammar
(D) None of these
Answer: (D)
Self-Assessment
Q.1 What is Lexical Analysis? What are the functions of Lexical Analyzer?
Module: 6
Compilers: Synthesis Phase
6.1. Motivation:
1. Describe the role of intermediate code generation in connection with language designing. (E)
2. Comprehend the intermediate language and intermediate code for assignment statements, arrays, Boolean expressions, switch statements and conditional and iterative control flow. (U)
3. Explain backpatching, procedure calls and translation of mixed-mode expressions. (C)
4. State the issues in the design of a code generator. Describe basic blocks and flow graphs. (R)
5. Describe the dynamic programming code generation algorithm and code generator. (U)
If source code can be translated directly into its target machine code, why do we need to translate the source code into an intermediate code which is then translated to its target code? Let us see the reasons why we need an intermediate code.
If a compiler translates the source language to its target machine language without having the option of generating intermediate code, then for each new machine a full native compiler is required.
Intermediate code eliminates the need for a new full compiler for every unique machine by keeping the analysis portion the same for all the compilers.
The second part of compiler, synthesis, is changed according to the target machine.
It becomes easier to apply the source code modifications to improve code performance by
applying code optimization techniques on the intermediate code.
Intermediate Representation
Intermediate codes can be represented in a variety of ways and they have their own benefits.
High-level IR: a high-level intermediate code representation is very close to the source language itself. It can be easily generated from the source code, and we can easily apply code modifications to enhance performance; however, it is less preferred for target-machine optimization.
Low Level IR - This one is close to the target machine, which makes it suitable for
register and memory allocation, instruction set selection, etc. It is good for machine-
dependent optimizations.
Intermediate code can be either language specific (e.g., Byte Code for Java) or language
independent (three-address code).
6.6. Abbreviations:
• DAG: Directed Acyclic Graph
Lecture: 35
Syntax Directed Translation: Syntax directed definitions
Learning Objective: In this lecture students will be able to design intermediate code by using syntax-directed translation.
6.9.7 Contents:
Applications of syntax-directed translation:
• Constructing abstract syntax trees
• Type checking
• Intermediate code generation
Syntax-Directed Translation:
• Grammar symbols are associated with attributes, and the values of these attributes are evaluated by the semantic rules associated with the production rules.
There are two notations for associating semantic rules with productions:
1. Syntax-Directed Definitions
2. Translation Schemes
1. Syntax-Directed Definitions:
- We associate each production rule with a set of semantic rules, and we do not specify when they will be evaluated.
2. Translation Schemes:
- Translation schemes indicate the order of evaluation of the semantic actions associated with a production rule.
- In other words, translation schemes give some information about implementation details.
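As a small illustration (an assumed example, not one prescribed by the syllabus), an S-attributed definition for evaluating arithmetic expressions can be written as:

Production        Semantic Rule
E → E1 + T        E.val := E1.val + T.val
E → T             E.val := T.val
T → T1 * F        T.val := T1.val * F.val
T → F             T.val := F.val
F → digit         F.val := digit.lexval

Written as a translation scheme, the same rules become embedded actions, e.g. E → E1 + T { E.val := E1.val + T.val }, where the position of the action within the production fixes when it executes.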
Lecture: 36
Intermediate Code Generation: Intermediate languages: declarations, Assignment statements
Learning Objective: In this lecture students will be able to understand the different ways to represent intermediate code.
6.9.2 Three-address code corresponding to the tree and DAG; Intermediate Code Generation:
In the analysis-synthesis model of a compiler, the front end translates a source program
into an intermediate representation from which the back end generates target code. Details
of the target language are confined to the back end, as far as possible. Although a source
program can be translated directly into the target language, some benefits of using a
machine-independent intermediate form are:
1. Retargeting is facilitated: a compiler for a different machine can be created by attaching a back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.
Intermediate code can be represented in the following ways:
a) Graphical representations (syntax trees and DAGs)
b) Postfix notation
c) Three-address code
a) Graphical representations of a := b * - c + b * - c are its syntax tree and its DAG.
b) Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree in which a node appears immediately after its children. For the assignment above, the postfix form is: a b c uminus * b c uminus * + assign
c) Three-address code is a linearized representation of a syntax tree or a DAG in which explicit names correspond to the interior nodes of the graph. For example, for X + Y * Z:
t1 := Y * Z
t2 := X + t1
Exercise: represent a := b * - c + b * - c as a) syntax tree, b) DAG, c) parse tree, d) postfix expression.
L33. Exercise:
Q.1 Which are the different types of three address statements?
Lecture: 37
Types of Three Address Statement
Learning objective: In this lecture students will be able to identify the different types of three-address statements.
6.9.3 Types of Three-Address Statements:
Three-address statements are akin to assembly code. Statements can have symbolic labels and there are
statements for flow of control. A symbolic label represents the index of a three-address statement in the array
holding intermediate code.
Actual indices can be substituted for the labels, either by making a separate pass or by using "backpatching", in which labels are filled in once the corresponding indices are known. The common types of three-address statements are:
1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a fixed-point number to a floating-point number.
3. Copy statements of the form x := y, where the value of y is assigned to x.
4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.
5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address statement following if x relop y goto L is executed next, as in the usual sequence.
6. param x and call p, n for procedure calls, and return y, where y, representing a returned value, is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n, indicating the number of actual parameters in "call p, n", is not redundant because calls can be nested.
7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to the value of y. In both these instructions, x, y, and i refer to data objects.
8. Address and pointer assignments of the form x := &y, x := *y, and *x := y.
When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree. The value of the nonterminal E on the left side of E → E1 + E2 will be computed into a new temporary t. In general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by the assignment id.place := t. If an expression is a single identifier, say y, then y itself holds the value of the expression. For the moment, we create a new name every time a temporary is needed.
The S-attributed definition in the figure generates three-address code for assignment statements. Given input a := b * - c + b * - c, it produces the code shown in the figure. The synthesized attribute S.code represents the three-address code for the assignment S.
The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls.
For convenience, we use the notation gen(x ':=' y '+' z) to represent the three-address statement x := y + z. Expressions appearing instead of variables like x, y, and z are evaluated when passed to gen, and quoted operators or operands, like '+', are taken literally. In practice, three-address statements might be sent to an output file, rather than built up into the code attributes.
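The following C sketch mirrors this behaviour (a minimal, assumed implementation: the emitter simply prints statements, and strdup is the POSIX string-duplication function):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int temp_count = 0;

/* newtemp: return a distinct name t1, t2, ... on each call */
static char *newtemp(void) {
    char buf[16];
    snprintf(buf, sizeof buf, "t%d", ++temp_count);
    return strdup(buf);            /* caller keeps the name */
}

/* gen: emit the three-address statement x := y op z */
static void gen(const char *x, const char *y, const char *op, const char *z) {
    printf("%s := %s %s %s\n", x, y, op, z);
}

int main(void) {
    /* emits the code for a := b * - c + b * - c, as in the example above */
    char *t1 = newtemp(); printf("%s := uminus c\n", t1);
    char *t2 = newtemp(); gen(t2, "b", "*", t1);
    char *t3 = newtemp(); printf("%s := uminus c\n", t3);
    char *t4 = newtemp(); gen(t4, "b", "*", t3);
    char *t5 = newtemp(); gen(t5, t2, "+", t4);
    printf("a := %s\n", t5);
    return 0;
}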
L22. Exercise
Q.3 What is difference between syntax tree & DAG?
Lecture: 38
Representation of Three address Code
Learning Objective: In this lecture students will be able to understand the different ways to represent and implement three-address code.
6.9.3 Implementations of Three-Address Statements
Quadruples:
A quadruple is a record structure with four fields: op, arg1, arg2 and result. The contents of the fields arg1, arg2, and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.
Triples:
a) To avoid entering temporary names into the symbol table, we can refer to a temporary value by the position of the statement that computes it.
b) Three-address statements can then be represented by records with only three fields: op, arg1 and arg2, as in Fig.(b).
c) The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure itself. Since three fields are used, this intermediate code format is known as triples.
A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig.(a), while x := y[i] is naturally represented as two operations in Fig.(b).
Indirect Triples:
Another implementation of three-address code is to list pointers to triples, rather than the triples themselves. This implementation is naturally called indirect triples. For example, we may use an array statement to list pointers to triples in the desired order.
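For comparison, one possible set of C record layouts for the three implementations (field names and sizes are illustrative assumptions, not prescribed by the text):

#define MAXQ 100

/* quadruple: four explicit fields */
typedef struct {
    const char *op, *arg1, *arg2, *result;
} Quad;

/* triple: the result is implicit - it is the triple's own index */
typedef struct {
    const char *op;
    int arg1, arg2;    /* index of an earlier triple, or a symbol-table reference */
} Triple;

Quad   quads[MAXQ];
Triple triples[MAXQ];

/* indirect triples: an extra table lists triples in execution order,
   so statements can be reordered without renumbering arg1/arg2 */
int statement[MAXQ];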
Q.7
a) Postfix notation
b) Syntax Tree
c) Parse tree
d) DAG
e) Three-address statements
L23 Exercise:
Q.4 Compare Triples, Quadruples & Indirect triples.
Lecture: 39
Back patching and Issues in the design of Code Generator
Learning Objective: In this lecture students will be able to list the different issues in the design of a code generator.
6.9.4 Backpatching, with an example.
The problem in generating three-address code in a single pass is that we may not know the labels that control must go to at the time the jump statements are generated. To get around this problem, a series of branching statements is generated with the targets of the jumps temporarily left unspecified. Backpatching is putting in the address instead of the label once the proper label is determined. Three functions are used:
1) makelist(i): creates a new list containing only i, an index into the array of quadruples, and returns a pointer to the list it has made.
2) merge(i, j): concatenates the lists pointed to by i and j, and returns a pointer to the concatenated list.
3) backpatch(p, i): inserts i as the target label for each of the statements on the list pointed to by p.
The translation scheme is as follows (M is a marker nonterminal whose rule M → ε { M.quad = nextquad } records the index of the next quadruple):
1) E → E1 or M E2
   { backpatch(E1.falselist, M.quad);
     E.truelist = merge(E1.truelist, E2.truelist);
     E.falselist = E2.falselist }
2) E → E1 and M E2
   { backpatch(E1.truelist, M.quad);
     E.truelist = E2.truelist;
     E.falselist = merge(E1.falselist, E2.falselist) }
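A minimal C sketch of the three list functions, assuming the quadruples' jump-target fields live in an array that is patched in place (the representation is an illustrative assumption):

#include <stdlib.h>

typedef struct node { int quad; struct node *next; } List;

int target[1000];                  /* jump-target field of each quadruple */

List *makelist(int i) {            /* new list holding only quadruple i */
    List *p = malloc(sizeof *p);
    p->quad = i;
    p->next = NULL;
    return p;
}

List *merge(List *a, List *b) {    /* concatenate the two lists */
    if (a == NULL) return b;
    List *p = a;
    while (p->next != NULL) p = p->next;
    p->next = b;
    return a;
}

void backpatch(List *p, int label) {   /* fill label in as each listed target */
    for (; p != NULL; p = p->next)
        target[p->quad] = label;
}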
Issues in the design of a code generator:
1. Memory management.
2. Instruction Selection.
3. Register Utilization (Allocation).
4. Evaluation order.
1. Memory Management
Mapping names in the source program to addresses of data objects is done cooperatively by pass 1 (the front end) and pass 2 (the code generator).
2. Instruction Selection
For example, the statements a := b + c and d := a + e may translate to:
1. MOV b, R0    (R0 ← b)
2. ADD c, R0    (R0 ← c + R0)
3. MOV R0, a    (a ← R0)
4. MOV a, R0    (R0 ← a)
5. ADD e, R0    (R0 ← e + R0)
6. MOV R0, d    (d ← R0)
Here the fourth statement is redundant, and so is the third statement if 'a' is not
subsequently used.
3. Register Allocation
Registers can be accessed faster than memory words. Frequently accessed variables should reside in registers (register allocation). Register assignment is picking a specific register for each such variable.
Complications arise from special uses of hardware; for example, on some machines certain instructions, such as integer multiplication and division, require specific registers.
Familiarity with the target machine and its instruction set is a prerequisite for designing a good
code generator.
Typical Architecture:
1. The target machine is byte-addressable (addressing factor of 1), with 4 bytes per word.
2. It has n general-purpose registers R0, R1, ..., Rn-1.
3. It has three-address instructions of the form:
   op source1, source2, destination
   e.g., ADD A, B, C
Each integer requires 2 bytes (16 bits).
Addressing modes (only the indirect-indexed row survives from the original table):
MODE               FORM     ADDRESS                      EXAMPLE             ADDED COST
Indirect indexed   *c(R)    contents(c + contents(R))    ADD *100(R2), R1    1
Instruction costs:
Each instruction has a cost of 1 plus added costs for the source and destination.
=> cost of an instruction = 1 + costs associated with the source and destination addressing modes.
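As a worked illustration under this cost model (using the statement a := b + c from the instruction-selection example above):
MOV b, R0      cost 2 (1 + 1 for the memory operand b)
ADD c, R0      cost 2
MOV R0, a      cost 2
for a total cost of 6.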
Examples:
a) x := y op z
b) x := op y
c) x := y
d) x := y op z op x
L24 Exercise:
Q.12 Define Backpatching
Q.13 Write a short note on: ICG
Question/ problems for practice
Q.14 Write different issues in the design of Code Generator?
Learning from the lecture 'Backpatching and Issues in the Design of Code Generator': Students will be able to list the different issues in the design of a code generator.
Lecture: 40
Basic Blocks and Flow graphs
Learning Objective: In this lecture students will be able to define basic blocks and draw the flow graph for a given program.
6.9.6 Basic block: a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.
Partitioning a sequence of statements into BBs
1. Determine the leaders (the first statements of basic blocks):
The first statement is a leader.
The target of a conditional or unconditional jump is a leader.
A statement following a branch is a leader.
2. For each leader, its basic block consists of the leader and all the statements up to
but not including the next leader.
Example:
unsigned int fibonacci (unsigned int n) {
unsigned int f0, f1, f2;
f0 = 0;
f1 = 1;
if (n <= 1)
return n;
for (int i=2; i<=n; i++) {
f2 = f0+f1;
f0 = f1;
f1 = f2;
}
return f2;
}
The corresponding three-address code, with the leaders marked (*):
*  (1) read(n)                   (first statement)
   (2) f0 := 0
   (3) f1 := 1
   (4) if n <= 1 goto L0
*  (5) i := 2                    (follows a branch)
*  (6) L2: if i <= n goto L1     (target of a branch)
*  (7) return f2                 (follows a branch)
*  (8) L1: f2 := f0 + f1         (target of a branch)
   (9) f0 := f1
  (10) f1 := f2
  (11) i := i + 1
  (12) goto L2
* (13) L0: return n              (target of a branch)
[Flow graph: entry → B1 {read(n); f0 := 0; f1 := 1; if n <= 1} → B2 {return n} or B3 {i := 2}; B3 → B4 {if i <= n}; B4 → B5 {f2 := f0 + f1; f0 := f1; f1 := f2; i := i + 1; goto B4} or B6 {return f2}; B5 → B4; B2 and B6 → exit.]
Learning from the lecture 'Basic Blocks and Flow Graphs': Students will be able to find basic blocks and draw flow graphs.
Lecture: 41
Code Generation Algorithm and DAG Representation of Basic Blocks
Learning Objective: In this lecture students will be able to apply the code generation algorithm to generate assembly code for the given optimized code.
The function getreg returns the location L to hold the value of x for the assignment
x := y op z.
1. If the name y is in a register that holds the value of no other names (recall that copy
instructions such as x := y could cause a register to hold the value of two or more variables
simultaneously), and y is not live and has no next use after execution of
x := y op z, then return the register of y for L. Update the address descriptor of y to indicate
that y is no longer in L
2. Failing (1), return an empty register for L if there is one.
3. Failing (2), if x has a next use in the block, or op is an operator such as indexing, that
requires a register, find an occupied register R. Store the value of R into memory location
(by MOV R, M) if it is not already in the proper memory location M, update the address
descriptor M, and return R. If R holds the value of several variables, a MOV instruction
must be generated for each variable that needs to be stored. A suitable occupied register
might be one whose datum is referenced furthest in the future, or one whose value is also
in memory.
4. If x is not used in the block, or no suitable occupied register can be found, select the
memory location of x as L.
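A small C sketch of the register and address descriptors that getreg consults (encodings and sizes are illustrative assumptions):

#define NREGS 4
#define NVARS 26

/* address descriptor: where each variable's current value lives */
typedef struct {
    int in_reg;   /* register holding the value, or -1             */
    int in_mem;   /* nonzero if the memory location is up to date  */
} AddrDesc;

/* register descriptor: which variable each register holds */
typedef struct {
    int var;      /* variable index, or -1 if the register is free */
} RegDesc;

AddrDesc addr[NVARS];
RegDesc  reg[NREGS];

/* step (2) of getreg: return an empty register if one exists, else -1 */
int free_register(void) {
    for (int r = 0; r < NREGS; r++)
        if (reg[r].var == -1)
            return r;
    return -1;    /* caller proceeds to steps (3)/(4) */
}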
A directed acyclic graph (DAG) is a graph with no cycles which gives a picture of how the value computed by each statement in a basic block is used in subsequent statements of the block. A DAG has a node for every sub-expression of the expression; an interior node represents an operator and its children represent its operands.
A DAG is mainly used to identify common sub-expressions.
DAGs are useful data structures for implementing transformations on basic blocks. A DAG gives a picture of how the value computed by a statement in a basic block is used in subsequent statements of the block. Constructing a DAG from three-address statements is a good way of determining common sub-expressions (expressions computed more than once) within a block, determining which names are used inside the block but evaluated outside the block, and determining which statements of the block could have their computed value used outside the block. A DAG for a basic block is a directed acyclic graph with the following labels on its nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants. From the operator applied to a name we determine whether the l-value or r-value of a name is needed; most leaves represent r-values. The leaves represent initial values of names, and we subscript them with 0 to avoid confusion with labels denoting "current" values of names as in (3) below.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels. The intention is that
interior nodes represent computed values, and the identifiers labeling a node are deemed
to have that value.
For example, consider the three-address code shown in the figure; the corresponding DAG is also shown. We observe that each node of the DAG represents a formula in terms of the leaves, that is, the values possessed by variables and constants upon entering the block. For example, the node labeled t4 represents the formula b[4 * i], that is, the value of the word whose address is 4*i bytes offset from address b, which is the intended value of t4.
Constructing a DAG
Input: a basic block, with statements of the forms (i) x := y op z, (ii) x := op y, (iii) x := y.
Output: a DAG with
- a label for each node: for leaves an identifier (constants are permitted), for interior nodes an operator symbol;
- for each node, a (possibly empty) list of attached identifiers (constants not permitted).
Method:
(1) If node(y) is undefined, create a leaf labeled y and let node(y) be this node. In case (i), if node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
(2) In case (i), determine whether there is a node labeled op whose left child is node(y) and whose right child is node(z). If not, create such a node; let it be n. Cases (ii) and (iii) are similar.
(3) Delete x from the list of identifiers attached to node(x). Append x to the list of identifiers for node n and set node(x) to n.
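The method above can be sketched in C as follows (a simplified, assumed implementation for statements of form (i) with single-letter names; step (3)'s deletion of x from its previous node's list is omitted for brevity):

#include <stdio.h>
#include <string.h>

#define MAXN 100

typedef struct {
    char op[8];          /* operator, or "" for a leaf     */
    int  left, right;    /* child node indices, -1 if none */
    char ids[64];        /* identifiers attached to node   */
} Node;

Node dag[MAXN];
int  nnodes = 0;
int  node_of[128];       /* node(v) per name, -1 if undefined */

int leaf(char v) {                        /* step (1): find or create leaf */
    if (node_of[(int)v] < 0) {
        Node *n = &dag[nnodes];
        n->op[0] = '\0'; n->left = n->right = -1;
        snprintf(n->ids, sizeof n->ids, "%c0", v);  /* subscript 0: initial value */
        node_of[(int)v] = nnodes++;
    }
    return node_of[(int)v];
}

int interior(const char *op, int l, int r) {   /* step (2): reuse or create */
    for (int i = 0; i < nnodes; i++)
        if (!strcmp(dag[i].op, op) && dag[i].left == l && dag[i].right == r)
            return i;                      /* common subexpression found */
    Node *n = &dag[nnodes];
    snprintf(n->op, sizeof n->op, "%s", op);
    n->left = l; n->right = r; n->ids[0] = '\0';
    return nnodes++;
}

void assign(char x, const char *op, char y, char z) {   /* x := y op z */
    int n = interior(op, leaf(y), leaf(z));
    node_of[(int)x] = n;                   /* step (3): attach x to node n */
    size_t len = strlen(dag[n].ids);
    snprintf(dag[n].ids + len, sizeof dag[n].ids - len, "%s%c", len ? "," : "", x);
}

int main(void) {
    memset(node_of, -1, sizeof node_of);
    assign('a', "+", 'b', 'c');            /* a := b + c                   */
    assign('e', "+", 'b', 'c');            /* e := b + c reuses the + node */
    printf("nodes = %d\n", nnodes);        /* prints 3: b0, c0 and one +   */
    return 0;
}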
Exercise:
Q.10 Write Code Generation Algorithm.
Q.11 Define DAG
Learning from the lecture 'Code Generation Algorithm and DAG Representation of Basic Block': Students will be able to draw the DAG for a given basic block.
1. Know:
a) Students should be able to write the different ways to represent intermediate code.
b) Define basic block and flow graph.
2. Comprehend:
a) Students should be able to describe syntax trees, three-address code and postfix notation.
b) Find the leaders of a given basic block and draw the flow graph.
Lecture: 42
Code Optimization: Principal sources of Optimization
Learning Objective: In this lecture students will be able to optimize the intermediate code using different code-optimization techniques.
Simply stated, the best program transformations are those that yield the most benefit for the least
effort. The transformations provided by an optimizing compiler should have several properties.
First, a transformation must preserve the meaning of programs. That is, an "optimization" must
not change the output produced by a program for a given input, or cause an error, such as a division
by zero, that was not present in the original version of the source program. The influence of this
criterion pervades this chapter; at all times we take the "safe" approach of missing an opportunity
to apply a transformation rather than risk changing what the program does.
Second, a transformation must, on the average, speed up programs by a measurable amount. Third, a transformation must be worth the effort. It does not make sense for a compiler writer to
expend the intellectual effort to implement a code improving transformation and to have the
compiler expend the additional time compiling source programs if this effort is not repaid when
the target programs are executed. Certain local or "peephole" transformations of the kind are
simple enough and beneficial enough to be included in any compiler.
Some transformations can only be applied after detailed, often time-consuming, analysis of the
source program, so there is little point in applying them to programs that will be run only a few
times. For example, a fast, non-optimizing, compiler is likely to be more helpful during debugging
or for "student jobs= that will be run successfully a few times and thrown away. Only when the
program in question takes up a significant fraction of the machine's cycles does improved code
quality justify the time spent running an optimizing compiler on the program.
ALGEBRAIC TRANSFORMATION
Countless algebraic transformations can be used to change the set of expressions computed
by a basic block into an algebraically equivalent set. The useful ones are those that simplify
expressions or replace expensive operations by cheaper ones. For example, statements such as
x := x + 0 or x := x * 1 can be eliminated from a basic block without changing the set
of expressions it computes. The exponentiation operator in the statement
x := y ** 2
usually requires a function call to implement. Using an algebraic transformation, this statement
can be replaced by the cheaper but equivalent statement
x := y * y
FLOW GRAPHS
We can add the flow-of-control information to the set of basic blocks making up a program by constructing a directed graph called a flow graph. The nodes of the flow graph are the basic blocks. One node is distinguished as initial; it is the block whose leader is the first statement. There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some execution sequence; that is, if
(i) there is a conditional or unconditional jump from the last statement of B1 to the first statement of B2, or
(ii) B2 immediately follows B1 in the order of the program, and B1 does not end in an unconditional jump.
Example 4: The flow graph of the program is shown below; B1 is the initial node.
B1: prod := 0
    i := 1
B2: t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * i
    t4 := b[t3]
    t5 := t2 * t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i <= 20 goto B2
Basic blocks are represented by a variety of data structures. For example, after partitioning the three-address statements by Algorithm 1, each basic block can be represented by a record consisting of a count of the number of quadruples in the block, a pointer to the leader of the block, and the lists of predecessors and successors of the block. If the block B2, running from statement (3) through (12) in the intermediate code, were moved elsewhere in the quadruples array or were shrunk, the (3) in "if i <= 20 goto (3)" would have to be changed.
LOOPS
A loop is a collection of nodes in a flow graph such that:
1. All nodes in the collection are strongly connected: from any node in the loop to any other, there is a path of length one or more, wholly within the loop, and
2. The collection of nodes has a unique entry, that is, a node in the loop such that the only way to reach a node of the loop from a node outside the loop is to first go through the entry.
PEEPHOLE OPTIMIZATION
A simple but effective technique for improving the target code is peephole optimization, a method for trying to improve the performance of the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence, whenever possible.
The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. We shall give the following examples of program transformations that are characteristic of peephole optimizations:
• Redundant-instruction elimination
• Unreachable-code elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Reduction in strength
Redundant-instruction elimination: consider the sequence
(1) MOV R0, a
(2) MOV a, R0
We can delete instruction (2) because whenever (2) is executed, (1) will have ensured that the value of a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed immediately before (2), and so we could not remove (2).
UNREACHABLE CODE
#define debug 0
....
if (debug) { print debugging information }
In the intermediate representation this may become:
if debug = 1 goto L1
goto L2
L1: print debugging information
L2: ............................... (a)
One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the value of debug, (a) can be replaced by:
if debug != 1 goto L2
print debugging information
L2: ............................... (b)
As debug is set to 0 at the beginning of the program, constant propagation replaces (b) by:
if 0 != 1 goto L2
print debugging information
L2: ............................... (c)
The argument of the first statement of (c) evaluates to a constant true, so it can be replaced by goto L2. Then all the statements that print debugging aids are manifestly unreachable and can be eliminated one at a time.
FLOW-OF-CONTROL OPTIMIZATIONS
The unnecessary jumps can be eliminated in either the intermediate code or the target
code by the following types of peephole optimizations. We can replace the jump sequence
goto L2
….
L1 : gotoL2
by the sequence
goto L2
….
L1 : goto L2
If there are now no jumps to L1, then it may be possible to eliminate the statement L1:goto L2
provided it is preceded by an unconditional jump .Similarly, the sequence
if a < b goto L1
….
L1 : goto L2
can be replaced by
if a < b goto L2
….
L1 : goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then the sequence
goto L1
........
L1: if a < b goto L2
L3: ............................... (1)
may be replaced by
if a < b goto L2
goto L3
.......
L3: ............................... (2)
While the number of instructions in (1) and (2) is the same, we sometimes skip the unconditional jump in (2), but never in (1). Thus (2) is superior to (1) in execution time.
ALGEBRAIC SIMPLIFICATION
There is no end to the amount of algebraic simplification that can be attempted through
peephole optimization. Only a few algebraic identities occur frequently enough that it is worth considering implementing them. For example, statements such as
x := x+0
Or
x := x * 1
are often produced by straightforward intermediate code-generation algorithms, and they can be
eliminated easily through peephole optimization.
Common sub-expressions need not be computed over and over again. Instead, they can be computed once and kept in store, from where they are referenced when encountered again, provided of course that the variable values in the expression remain unchanged.
It is possible that a large amount of dead (useless) code exists in a program. This is especially likely when variables and procedures are introduced during the construction or error-correction of a program: once declared and defined, one forgets to remove them in case they serve no purpose. Eliminating these will definitely optimize the code.
REDUCTION IN STRENGTH
The target machine may have hardware instructions to implement certain specific
operations efficiently. Detecting situations that permit the use of these instructions can reduce
execution time significantly. For example, some machines have auto-increment and
auto-decrement addressing modes. These add or subtract one from an operand before or after using
its value. The use of these modes greatly improves the quality of code when pushing or popping a
stack, as in parameter passing. These modes can also be used in code for statements like i : =i+1.
Here we introduce some of the most useful code-improving transformations. Techniques for
implementing these transformations are presented in subsequent sections. A transformation of a
program is called local if it can be performed by looking only at the statements in a basic block;
otherwise, it is called global. Many transformations can be performed at both the local and global
levels. Local transformations are usually performed first.
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program without changing the
function it computes. Common subexpression elimination, copy propagation, dead-code
elimination, and constant folding are common examples of such function-preserving
transformations. The other transformations come up primarily when global optimizations are
performed.
Frequently, a program will include several calculations of the same value, such as an offset in an
array. Some of these duplicate calculations cannot be avoided by the programmer because they lie
below the level of detail accessible within the source language.
Common Subexpressions
An occurrence of an expression E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression if we can use the previously computed value. For example, the assignments to t7 and t10 have the common subexpressions 4*i and 4*j, respectively, on the right side in Fig. They have been eliminated in Fig. by using t6 instead of t7 and t8 instead of t10. This change is what would result if we reconstructed the intermediate code from the DAG for the basic block.
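As a free-standing illustration (not the figure referred to above), eliminating the common subexpression 4 * i transforms:

Before:              After:
t1 := 4 * i          t1 := 4 * i
t2 := a[t1]          t2 := a[t1]
t3 := 4 * i          t4 := b[t1]
t4 := b[t3]

Since i is unchanged between the two computations of 4 * i, t3 can be replaced by t1 and its computation deleted.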
Copy Propagation
Block B5 in Fig. can be further improved by eliminating x using two new transformations. One
concerns assignments of the form f:=g called copy statements, or copies for short. Had we gone
into more detail in Example 10.2, copies would have arisen much sooner, because the algorithm
for eliminating common subexpressions introduces them, as do several other algorithms. For
example, when the common subexpression in c:=d+e is eliminated in Fig., the algorithm uses a
new variable t to hold the value of d+e. Since control may reach c:=d+e either after the assignment
to a or after the assignment to b, it would be incorrect to replace c:=d+e by either c:=a or by c:=b.
The idea behind the copy-propagation transformation is to use g for f, wherever possible after the
copy statement f:=g. For example, the assignment x:=t3 in block B5 of Fig. is a copy. Copy
propagation applied to B5 yields:
x:=t3
a[t2]:=t5
a[t4]:=t3
goto B2
Dead-Code Eliminations
A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead
at that point. A related idea is dead or useless code, statements that compute values that never get
used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as
the result of previous transformations. For example, we discussed the use of debug that is set to
true or false at various points in the program, and used in statements like
If (debug) print …
By a data-flow analysis, it may be possible to deduce that each time the program reaches this
statement, the value of debug is false. Usually, it is because there is one particular statement
Debug :=false
That we can deduce to be the last assignment to debug prior to the test no matter what sequence of
branches the program actually takes. If copy propagation replaces debug by false, then the print
statement is dead because it cannot be reached. We can eliminate both the test and printing from
the object code. More generally, deducing at compile time that the value of an expression is a constant and using the constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy statement into dead code. For example, copy propagation followed by dead-code elimination removes the assignment to x and transforms the code above into:
a [t2 ] := t5
a [t4] := t3
goto B2
L21. Exercise:
Q.1 What are the sources of code optimization?
Lecture: 43
Optimization of Basic Blocks, Loops in Flow graph
Learning objective: In this lecture students will be able to optimize a basic block to improve the efficiency of the running code.
Optimization of Basic Blocks: identify common sub-expressions (expressions that compute the same value) by constructing the DAG.
a := b + c
b := b - d
c := c + d
e := b + c
Here b + c occurs twice, so the two occurrences look like common expressions, but they do not generate the same result because b and c are redefined between them.
[DAG for the block: the leaves are b0, c0 and d0; a labels the node b0 + c0, b labels b0 - d0, c labels c0 + d0, and e labels the + node whose operands are the nodes for the redefined b and c, so the nodes for a and e remain distinct.]
Explanation with example:
Source code generally contains runs of instructions that are always executed in sequence; these runs are the basic blocks of the code. Basic blocks contain no jump statements among them: when the first instruction is executed, all the instructions in the same basic block are executed in their order of appearance without losing flow control of the program. The basic blocks of a program are delimited by constructs like IF-THEN-ELSE and SWITCH-CASE conditional statements and loops such as DO-WHILE, FOR, and REPEAT-UNTIL.
We may use the following algorithm to find the basic blocks in a program (a minimal C sketch follows this list):
Search for the header statements of all the basic blocks, i.e., the statements where a basic block starts:
o The first statement of the program.
o Statements that are the target of any branch (conditional or unconditional).
o Statements that follow any branch statement.
A header statement and the statements following it, up to the next header, form a basic block.
A basic block does not include the header statement of any other basic block.
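A minimal C sketch of the header-marking step (the statement representation is an illustrative assumption):

#include <stdbool.h>

#define MAXS 1000

typedef struct {
    bool is_branch;     /* conditional or unconditional jump          */
    int  target;        /* index of the jump target, -1 if no branch  */
} Stmt;

void find_leaders(const Stmt *s, int n, bool leader[]) {
    for (int i = 0; i < n; i++) leader[i] = false;
    if (n > 0) leader[0] = true;              /* first statement          */
    for (int i = 0; i < n; i++) {
        if (s[i].is_branch) {
            if (s[i].target >= 0 && s[i].target < n)
                leader[s[i].target] = true;   /* target of a branch       */
            if (i + 1 < n)
                leader[i + 1] = true;         /* statement after a branch */
        }
    }
    /* each basic block runs from a leader up to, but not including,
       the next leader */
}

int main(void) {
    /* statements 0..4, where statement 2 jumps back to statement 0:
       leaders become 0 (first and branch target) and 3 (follows branch) */
    Stmt s[5] = { {false,-1}, {false,-1}, {true,0}, {false,-1}, {false,-1} };
    bool leader[5];
    find_leaders(s, 5, leader);
    return 0;
}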
Basic blocks are important concepts from both code generation and optimization point of view.
Basic blocks play an important role in identifying variables, which are being used more than once
in a single basic block. If any variable is being used more than once, the register memory allocated
to that variable need not be emptied unless the block finishes execution.
Basic blocks in a program can be represented by means of control-flow graphs. A control-flow graph depicts how program control is passed among the blocks. It is a useful tool that helps in optimization by helping locate any unwanted loops in the program.
Learning from this lecture 'Optimization of Basic Blocks, Loops in Flow Graph': Students will be able to optimize basic blocks using DAGs.
Lecture: 44
Loop Optimization and Peephole Optimization
Learning Objective: In this lecture students will be able to study loop optimization and peephole optimization.
We now give a brief introduction to a very important place for optimizations, namely loops,
especially the inner loops where programs tend to spend the bulk of their time. The running time
of a program may be improved if we decrease the number of instructions in an inner loop, even if
we increase the amount of code outside that loop. Three techniques are important for loop
optimization: code motion, which moves code outside a loop; induction-variable elimination, which we apply to eliminate i and j from the inner loops B2 and B3; and reduction in strength, which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.
Code Motion
An important modification that decreases the amount of code in a loop is code motion. This
transformation takes an expression that yields the same result independent of the number of times
a loop is executed (a loop-invariant computation) and places the expression before the loop. Note that the notion "before the loop" assumes the existence of an entry for the loop. For example,
evaluation of limit-2 is a loop-invariant computation in the following while-statement:
t= limit-2;
while (i<=t)
While code motion is not applicable to the quicksort example we have been considering, the other two transformations are. Loops are usually processed inside out. For example, consider the loop around B3.
Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1 ,that
of t4 decreases by 4 because 4*j is assigned to t4.Such identifiers are called induction variables.
When there are two or more induction variables in a loop, it may be possible to get rid of all but
one, by the process of induction-variable elimination. For the inner loop around B3 in Fig. we cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4. However, we can illustrate reduction in strength and a part of the process of induction-variable elimination. Eventually j will be eliminated when the outer loop of B2 - B5 is considered.
Example: As the relationship t4:=4*j surely holds after such an assignment to t4 in Fig. and t4 is
not changed elsewhere in the inner loop around B3, it follows that just after the statement j:=j-1
the relationship t4:= 4*j-4 must hold. We may therefore replace the assignment t4:= 4*j by t4:=
t4-4. The only problem is that t4 does not have a value when we enter block B3 for the first time.
Since we must maintain the relationship t4 = 4*j on entry to the block B3, we place an initialization of t4 at the end of the block where j itself is initialized, shown by the dashed addition to block B1 in the second figure.
The replacement of a multiplication by a subtraction will speed up the object code if multiplication
takes more time than addition or subtraction, as is the case on many machines.
Peephole Optimization:
In compiler theory, peephole optimization is a kind of optimization performed over a very small
set of instructions in a segment of generated code. The set is called a "peephole" or a "window". It
works by recognizing sets of instructions that can be replaced by shorter or faster sets of
instructions.
L23 Exercise:
Q.7 Differentiate between machine-dependent and machine-independent optimization.
Questions/problems for practice:
Q.8 List the different code Optimization techniques.
Q.1 Explain with an example Quadruples, Triples, Indirect triples. [May 2016]
Q.2 Draw and explain DAG and represent the following example with it. [May 2016]
Q.4 What are the different issues in the design of a code generator? Explain with an example. [May 2016]
5.15. References
Compilers - Principles, Techniques and Tools by A.V. Aho, R. Sethi and J.D. Ullman
Practice for Module No.5 Intermediate Code Generator and Code Generator (based on
University Patterns)
Q.1 A) Write the role of ICG in Compiler Design. (5 Marks)
Q.2 A) Differentiate between Syntax Tree, Parse tree and DAG. (5 marks)
B) What is DAG? Explain with the help of examples. (5 marks)
Q.3 A) List and explain different design issues in compiler design. (10 marks)
Q.4 A) What is Basic Block? Write Algorithm for the same. (10 marks)
Self-Assessment
GATE Questions:
Answer: (C)
Answer: (D)
1. i = 1
2. j = 1
3. t1 = 5 * i
4. t2 = t1 + j
5. t3 = 4 * t2
6. t4 = t3
7. a[t4] = 31
8. j = j + 1
9. if j <= 5 goto(3)
10. i = i + 1
11. if i < 5 goto(2)
The number of nodes and edges in the control-flow-graph constructed for the above code,
respectively, are
(A) 5 and 7
(B) 6 and 7
(C) 5 and 5
(D) 7 and 8
Answer: (B)
x = u - t;
y = x * v;
x = y + w;
y = t - z;
y = x * y;
The minimum number of total variables required to convert the above code segment to static
single assignment form is
(A) 6
(B) 8
(C) 9
(D) 10
Answer: (D)
3. Explain different Code Optimization techniques along with an example. [May 2016]
6.15. References
Compilers - Principles, Techniques and Tools by A.V. Aho, R. Sethi and J.D. Ullman
Practice for Module No. 6 Code Optimization, Run Time storage and Compiler-compilers:
Q.1 A) Write machine-dependent and machine-independent code optimization techniques. (10 marks)
Q.2 A) Explain Peephole optimization? (5 Marks)
Q.3 A) Explain heap and stack organization. (10 marks)
Q.4 A) Explain working of LEX and YACC. (5 marks)
Self-Assessment
GATE Questions:
1. Some code optimizations are carried out on the intermediate code because
(A) they enhance the portability of the compiler to other target processors
(B) program analysis is more accurate on intermediate code than on machine code
(C) the information from dataflow analysis cannot otherwise be used for optimization
(D) the information from the front end cannot otherwise be used for optimization
Answer: (A)
The computer has only two registers, and OP is either ADD or SUB. Consider the following
basic block:
Assume that all operands are initially in memory. The final value of the computation should be
in memory. What is the minimum number of MOV instructions in the code generated for this
basic block?
(A) 2
(B) 3
(C) 5
(D) 6
Answer: (B)
4. Consider the grammar rule E → E1 - E2 for arithmetic expressions. The code generated is targeted to a CPU having a single user register. The subtraction operation requires the first operand to be in the register. If E1 and E2 do not have any common subexpression, in order to get the shortest possible code
(A) E1 should be evaluated first
(B) E2 should be evaluated first
(C) Evaluation of E1 and E2 should necessarily be interleaved
(D) Order of evaluation of E1 and E2 is of no consequence
Answer: (B)
Self-Evaluation
Name of Student:
Class:
Roll No.:
Subject:
Module No.:
S.No.  Question                                                                 Tick your choice
7.     Do you understand the role of the Intermediate Code Generator            o Yes  o No
       in compiler design?
8.     Are you able to list and identify the different types of                 o Yes  o No
       three-address code?
9.     Are you able to write the different ways to represent                    o Yes  o No
       three-address code?
10.    Are you able to list the different issues in code generation?            o Yes  o No
11.    Do you understand the module?                                            o Yes  o No