
SPCC Resource Book

System Programming & Compiler Construction (University of Mumbai)




SYSTEM PROGRAMMING

AND

COMPILER CONSTRUCTION
(TE- SEM-VI)

Compiled, reviewed and edited by:

Mrs. Vaishali Nirgude

TCET, Mumbai


PREFACE
System programs and compilers are an essential part of any computer system. Similarly, a course on
System Programming and Compiler Construction is an essential part of any computer science education.
The field is undergoing rapid change, as computers are now prevalent in virtually every application, yet
the fundamentals remain the same. The goal of this resource book is to give TCET students a clear
description of the concepts that underlie the different kinds of system software. As prerequisites, we
assume that the reader is familiar with basic data structures, computer organization, theory of
computation, and a high-level language such as C or Java. At the University of Mumbai, System
Programming and Compiler Construction is introduced as a subject in the third year (Sem VI). The
objective of this course is to make the learner appreciate the role of different system software. The
resource book is intended for an undergraduate course in System Programming and Compiler
Construction. In this preface we guide the reader in the use of the book by briefly summarizing each chapter.

Module 1: Overview of System Software

This module defines various system software such as assemblers, loaders, linkers, macro processors,
compilers, interpreters, operating systems and device drivers, and differentiates application software
from system software.
Module 2: Assemblers and Macro Processors
This module presents the concept of assemblers and discusses at length macros and macro processors.
It provides the objectives and functionality of this system software.
Module 3: Linkers and Loaders
This module describes the functions of linkers and loaders, the different loader schemes, and the design
of the absolute loader and the direct linking loader.
Module 4: Introduction to Compilers and Lexical Analysis
This module introduces the concept of compilers, describes the various phases of a compiler in detail,
and differentiates compilers and interpreters. It describes the role of the lexical analyzer in the
transformation of a high-level language into machine-level language.
Module 5: Parsing


This module describes the role of the parser and differentiates among various parsing techniques and
grammar transformations. It also deals with the concepts of Syntax Directed Definition (SDD) and
Syntax Directed Translation (SDT).

Module 6: Compilers: Synthesis Phase

This module describes the intermediate code generation and code generation phases of a compiler. It
deals with the various representation techniques for intermediate code and explains the code generation
algorithms. It describes the different code optimization techniques, both machine dependent and
machine independent, explains run-time storage organization in detail, and provides a broad overview
of automated tools for generating compilers.


General Guidelines for Students

1. The resource book supports a structured and guided teaching-learning process; students are therefore
   advised to bring it to every lecture.
2. Teaching will be based on the resource book, and home assignments will be set from it for the benefit
   of the students.
3. The resource book is framed to improve academic results; students are therefore advised to take all
   the module contents, home assignments and exercises seriously.
4. A separate notebook should be maintained for every subject.
5. Lectures should be attended regularly. In case of absence, the topics covered in class should be
   studied from the module before attending the next lecture.
6. Motivation, weightage and prerequisites have been included in every chapter to maintain continuity,
   improve understanding of the content, and clarify topic requirements from the examination point of
   view.
7. Any additional points related to a topic will be given by the subject teacher from time to time.

System Programming & Compiler Construction

SUBJECT RELATED
1. Weightage in the paper is 30-40% system design and 60-70% theory and programming.
2. Questions are expected from all modules; students are advised not to leave out any module.
3. Weightage: Term Work - 25 marks, Practical and Oral - 25 marks, Theory - 75 marks. Importance
   should be given to term work to improve the overall percentage in the university examination.
4. Practice questions and university questions should be solved sincerely to build confidence and excel
   in the university examination.
5. Definitions, key notations, solved examples and system design should be studied thoroughly from the
   modules after the lecture session.

EXAM SPECIFIC

1. All modules are equally important. Emphasis may be given to modules 1, 2, 3 and 4 from the
   examination point of view, because they cover almost 60% of the university question paper, including
   theory and system-design problems.
2. Neat, labeled diagrams should be drawn as required by the questions in the question paper.
3. Read the question paper thoroughly first, then choose the five questions. Attempt the one you know
   best first, but do not change the internal sequence of the sub-questions.
4. Minimum passing marks: theory paper - 30/75; term work - 10/25.


5. For further clarification or doubts in the subject, students can contact the subject teacher.

Guidelines for Writing Quality Answer

Theory:
 Write content as per marks distribution.
 Highlight the main points.
 Write examples for the topic asked in the question.
 Write necessary content related to the point.
 Draw neat and labeled diagrams wherever necessary.
 While writing distinguishing points, write double the number of points as per the marks given,
excluding the example.
Numerical:
Important steps should be written as they carry stepwise marks. The steps are as follows:
 Given data
 Diagrams (wherever applicable)
 Formula
 Substitution
 Calculation
 Answer with proper units
Derivation:
Important steps to be followed while attempting questions involving derivations:
 Write statement of theorem / proof.
 Mention necessary assumptions to be considered in the derivation.
 Draw neat and labeled diagrams wherever required.
 Define the variables which are being used.
 Mention the formula which is being used.
 Write stepwise formulations and necessary substitutions.
 Highlight the equation or formula proved in the last step.

Note: For better results, we recommend that the quality of answers match the sample answers given
for university questions.


Syllabus Detailing
Keywords Used for Framing Learning Objectives and Case Studies (Sample)
A. Keywords

|    | Sr. | Remembering  | Understanding | Applying              | Analyzing     | Evaluating | Creating  |
| FE | 1   | Label        | Compare       | Change                | Conclude      | Choose     | Arrange   |
|    | 2   | List         | Explain       | Use                   | Deduce        | Test       | Collect   |
|    | 3   | Select       | Illustrate    | Show                  | Question      | Revise     | Modify    |
|    | 4   | Name         | Classify      | Complete              | Illustrate    | Evaluate   | Rewrite   |
|    | 5   | State        | Derive        | Calculate             | Outline       | Determine  | Create    |
|    | 6   | Write        | Differentiate | Classify              | Identify      | Contrast   | Construct |
|    | 7   | Read         | Discuss       | Illustrate            | Revise        |            |           |
| SE | 1   | Outline      | Convert       | Compute               | Differentiate | Explain    | Solve     |
|    | 2   | Indicate     | Give examples | Relate                | Summarize     | Relate     | Design    |
|    | 3   | Describe     | Express       | Solve                 | Diagram       | Select     | Develop   |
|    | 4   | Define       | Predict       | Choose                | Determine     | Estimate   | Draw      |
|    | 5   | Draw         | Interpret     | Categorize            | Explain       |            |           |
| TE | 1   | Recall       | Distinguish   | Review                | Select        | Compare    | Specify   |
|    | 2   | Relate       | Describe      | Sketch                | Experiment    | Justify    | Integrate |
|    | 3   | Reproduce    | Comprehend    | Apply                 | Analyze       | Describe   |           |
|    | 4   | Find         | Examine       | Relate                | Assess        |            |           |
|    | 5   | Characterize | Schedule      | Distinguish           |               |            |           |
| BE | 1   | Cite         | Summarize     | Compare, all the above | Conclude     | Synthesize |           |
|    | 2   | Match        | Estimate      |                       | Contrast      | Measure    |           |
|    | 3   | Tabulate     | Contrast      |                       | Investigate   | Summarize  |           |
Course Scheme

B.E. (Computer Engineering), T.E. Sem VI

Course Name: System Programming and Compiler Construction          Course Code: BSC-CS401

Teaching Scheme (hours per week): Theory - 3, Practical - 1, Contact Hours - 4, Credits - 4

Examination Scheme (Formative/Summative): Theory (100): IA - 25, ESE - 75; Practical/Oral - 25;
Term Work - 25; Total - 125

IA: In-Semester Assessment - paper duration 1.5 hours

ESE: End-Semester Examination - paper duration 3 hours
The weightage of marks for continuous evaluation of Term Work/Report: Formative (40%), timely
completion of practicals (40%) and attendance / learning attitude (20%)
Prerequisites: Theoretical Computer Science, Discrete Structures, Operating System


Course Objective: The objective of this course is to compare the role and functioning of various system
programs with application programs, to understand the role of the various system programs from
program development to program execution, and to study the design of assemblers, macro processors,
linkers, loaders and compilers.

Course Outcomes: Upon completion of the course, students will be able to (cognitive levels of
attainment are given as per Bloom's Taxonomy):

1. Identify and use various system and application software for program development. (L1, L2, L3)
2. Design and develop assemblers and macro processors. (L1, L2, L3)
3. List the various functions of a loader and describe the various loading schemes. (L1, L2)
4. Illustrate the working of a compiler and design and develop hand-written and automatic lexical
   analyzers. (L1, L2, L3)
5. Apply various parsing techniques to design new language structures with the help of grammars.
   (L1, L2, L3)
6. Apply code optimization techniques to optimize intermediate code and generate target machine
   code. (L1, L2, L3)

Syllabus Detailing and Learning Objectives

(Bloom's levels used in the learning objectives: R - Remember, U - Understand, A - Apply,
AN - Analyze, E - Evaluate, C - Create.)

Module 1, Chapter 1: Overview of System Software (Hours - 4)

Detailed Content:
Introduction to System Software with examples; Software Hierarchy; Differentiate between system
software and application software. Introduction to Language Processors: Compiler, Assembler,
Interpreter.

Purpose:
To make students understand the concepts of system software and application software. Explain system
programs for language processing such as assemblers, macro processors, compilers and interpreters.
Describe the role of other system programs such as loaders, linkers, operating systems and device
drivers.

Scope:
1. Academic Aspect - Compare the role and functioning of various system programs over application
   programs.
2. Technology Aspect - Understand the different language translators such as assemblers, compilers
   and interpreters.
3. Application Aspect - Understand the role of system programs in running various application
   programs such as web services, web browsers, spreadsheets, library management systems, etc.

Students' Evaluation:
1. Theory questions to be asked on system software and application software.
2. Lab experiments: case study on different system and application software.
3. Corresponding viva questions can be asked on different system and application software.

Learning Objectives:
1. Describe system programs such as assemblers, loaders, linkers, macro processors, compilers,
   interpreters, operating systems, device drivers, etc. (R)
2. Distinguish between system software and application software. (U)
3. Describe the different database formats of a 2-pass assembler with the help of examples. (E)
4. Compare different language translators: compiler, interpreter and assembler. (E)
5. Use different system software for program development. (AN)
6. Illustrate the working of different system software. (A)
Module 2, Chapter 2: Assemblers and Macro Processors (Hours - 10)

Detailed Content:
Assemblers: Elements of Assembly Language Programming, Basic Assembler functions, Design of the
Assembler, Types of Assemblers, Two-pass assembler for IBM 360/370, Format of databases,
Algorithm, Single-pass Assembler for Intel x86.
Macro Processors: Macros, Basic Functions of a Macro Processor, Features of the Macro Facility,
Design of a Two-pass Macro Processor, Format of Databases and Algorithm.

Purpose:
To make students learn the various elements of assembly language programming and study the design
of single-pass and multi-pass assemblers, including the design and use of the two-pass IBM 360/370
assembler. The chapter also focuses on how to use macros and procedures in programs, the differences
between macros and procedures/functions/subroutines, the definition and expansion of macro
instructions, and the design and implementation of a macro processor.

Scope:
1. Academic Aspect - Explore the data structures, databases, algorithms and flowcharts of single-pass
   and two-pass assemblers, and of the two-pass macro processor.
2. Technology Aspect - An assembler is a language translator which translates an assembly language
   program into machine code. Design and develop a single-pass assembler for the x86 machine, a
   two-pass assembler for the IBM 360/370 machine, and a two-pass macro processor for the
   IBM 360/370 assembler.
3. Application Aspect - Role of the assembler as a language translator. Write macros as and when
   required to increase the readability and productivity of a program.

Students' Evaluation:
1. Subjective questions on the functions of assemblers, macro processors and macro facilities.
2. Listing the database formats.
3. Lab experiments based on the design and development of a 2-pass assembler and macro processor.
4. Viva questions on assemblers, macros, procedures and the comparison of both.

Learning Objectives:
1. Compare macros and procedures, and specify when to use each. (E)
2. Identify macros as and when required to increase the readability and productivity of a program. (AN)
3. Design and develop a two-pass macro processor with the help of its databases, and illustrate its
   working. (C)
4. Design and develop a 2-pass assembler for the IBM 360/370 machine. (C)
5. Outline a single-pass assembler for the x86 machine. (AN)
6. Illustrate the working of single-pass and two-pass assemblers. (A)

Module 3, Chapter 3: Linkers and Loaders (Hours - 5)

Detailed Content:
Linkers: Introduction, Relocation and Linking Concept, Design of a Linker.
Loaders: Loader and Functions of a Loader, Loader schemes, Design of a Direct Linking Loader.

Purpose:
To provide students with knowledge of the different functions of loaders and linkers, the different
types of loader schemes and linkers, and the design of the absolute loader and the direct linking loader.

Scope:
1. Academic Aspect - Describe the concept of loaders and linkers and the design of the absolute loader
   and the direct linking loader.
2. Technology Aspect - Design and development of the absolute loader and the direct linking loader.
3. Application Aspect - Understand the role of the loader and linker in running application programs.

Students' Evaluation:
1. Subjective questions on the functions of a loader, loader schemes, and the difference between a
   linkage editor and a linking loader.
2. Listing the database formats.
3. Mini project: design and development of a loader.
4. Viva questions on loaders, linkers and their functions.

Learning Objectives:
1. List the steps involved in program development and describe software tools. (R)
2. Describe the functions of software tools and their use in program development. (A)
3. List and explain different types of editors and their use in program development. (E)
4. Comprehend the definition and expansion of macro instructions. (U)
5. List the functions of a loader and describe the concept of loaders and linkers and the different
   loader schemes. (R)
6. Understand the design of the absolute loader and the direct linking loader. (C)
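As a rough illustration of the two-pass idea described in this module, the following Java sketch (Java being one of the lab languages) builds a symbol table in pass 1 and resolves symbolic operands in pass 2. The toy one-word-per-statement layout and `LABEL:` syntax are simplifying assumptions for illustration, not the IBM 360/370 formats.

```java
import java.util.*;

// Toy two-pass assembler sketch: pass 1 assigns an address to every
// statement and records label definitions in a symbol table; pass 2
// replaces symbolic operands with the addresses found in pass 1.
public class TwoPassAssembler {
    // Assumption: each statement occupies one word; labels end with ':'.
    public static List<String> assemble(List<String> source) {
        Map<String, Integer> symtab = new HashMap<>();
        List<String[]> stmts = new ArrayList<>();   // {mnemonic, operand}
        int lc = 0;                                 // location counter
        // Pass 1: build the symbol table.
        for (String line : source) {
            String s = line.trim();
            if (s.endsWith(":")) {                  // label definition
                symtab.put(s.substring(0, s.length() - 1), lc);
                continue;
            }
            String[] parts = s.split("\\s+", 2);
            stmts.add(new String[]{parts[0], parts.length > 1 ? parts[1] : ""});
            lc++;                                   // one word per statement
        }
        // Pass 2: resolve symbolic operands to addresses.
        List<String> code = new ArrayList<>();
        for (String[] st : stmts) {
            String operand = symtab.containsKey(st[1])
                    ? String.valueOf(symtab.get(st[1])) : st[1];
            code.add((st[0] + " " + operand).trim());
        }
        return code;
    }
}
```

The forward-reference problem motivates the second pass: `JMP LOOP` can be emitted only after pass 1 has seen where `LOOP` is defined.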
Module 4, Chapter 4: Introduction to Compilers and Lexical Analysis (Hours - 4)

Detailed Content:
Introduction to Compilers: Design issues, passes, phases.
Lexical Analysis: The role of a lexical analyzer, input buffering, specification and recognition of
tokens, automatic construction of a lexical analyzer using LEX.

Purpose:
To provide students with knowledge about the compiler and its phases; describe the front end and the
back end of a compiler; understand the role of the lexical analyzer in the compilation process; and
introduce the concepts of finite automata and regular expressions.

Scope:
1. Academic Aspect - Illustrate and differentiate between compiler, interpreter and assembler.
2. Technology Aspect - Design of a hand-written lexical analyzer. A compiler is a language translator
   which converts a high-level language (e.g. C/C++, Pascal, Lisp) into assembly language.
3. Application Aspect - Use of different compilers such as Turbo C and Java for language translation.
   Take as input a source program written in a high-level language and convert it into a sequence of
   tokens.

Student Evaluation:
1. Questions based on the compiler and its phases.
2. Experiments based on the design and development of compiler phases.
3. GATE questions based on compilers.

Learning Objectives:
1. List the functions of the lexical analyzer and describe its role in compiler design. (R)
2. Design and develop a hand-written lexical analyzer and demonstrate its working in compiler
   design. (A)
3. Summarize different compiler construction tools and describe the structure of a LEX
   specification. (AN)
4. Apply the LEX compiler for automatic generation of a lexical analyzer, and construct a lexical
   analyzer using an open-source compiler design tool. (C)
5. Identify and describe the different phases and passes of a compiler. (AN)
6. Illustrate and distinguish between compiler, interpreter and assembler. (U)
7. Summarize the working of a compiler with the help of an example and specify the output of each
   phase. (AN)
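The hand-written lexical analyzer mentioned in this module can be sketched in a few lines of Java. The token categories below (identifiers, integer literals, single-character operators) are a deliberate simplification chosen for illustration; a real scanner would also handle keywords, multi-character operators and comments.

```java
import java.util.*;

// Minimal hand-written lexical analyzer sketch: scans left to right and
// groups characters into ID, NUM and OP tokens; whitespace separates tokens.
public class Lexer {
    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetter(c)) {            // identifier
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                tokens.add("ID:" + src.substring(i, j));
                i = j;
            } else if (Character.isDigit(c)) {      // integer literal
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                tokens.add("NUM:" + src.substring(i, j));
                i = j;
            } else {                                // single-character operator
                tokens.add("OP:" + c);
                i++;
            }
        }
        return tokens;
    }
}
```

Each branch of the scanner corresponds to one token pattern, which is exactly what a LEX specification expresses declaratively as regular expressions.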
Module 5, Chapter 5: Parsing (Hours - 12)

Detailed Content:
Syntax Analysis: The role of the parser. Top-down parsing: predictive parsers (LL). Bottom-up
parsing: operator precedence parsing, SLR, LR(1), LALR; automatic construction of parsers using
YACC.
Introduction to Semantic Analysis: Need for semantic analysis, type checking and type conversion.

Purpose:
This chapter covers the role of the parser in compiler design and the different top-down and bottom-up
parsing techniques.

Scope:
1. Academic Aspect - Check the efficiency of bottom-up parsers over top-down parsers; understand
   the role of the parser in compiler design.
2. Technology Aspect - Design and development of various top-down and bottom-up parsing
   techniques.
3. Application Aspect - Construct the parse tree for the sequence of tokens generated by the lexical
   analyzer.

Student Evaluation:
1. Problems on top-down and bottom-up parsing techniques such as LL(1), LR(0), LR(1), LALR and
   operator precedence parsing.
2. Mini project: top-down and bottom-up parsing techniques.
3. Lab experiments based on First() and Follow() sets and on left-recursive and left-factored
   grammars.
4. GATE questions based on parsers and the difference between top-down and bottom-up parsing
   techniques.
5. Theory and viva questions based on the Java compiler environment and the LEX and YACC tools.
6. Lab experiments on the design of a lexical analyzer using the LEX tool and a parser using the
   YACC tool.

Learning Objectives:
1. Define Context-Free Grammar, describe the structure of a YACC specification, and apply the
   YACC compiler for automatic generation of a parser. (U)
2. Describe the role of the parser in the compilation process and explain different top-down and
   bottom-up parsing techniques. (E)
3. Specify various parsing techniques to design new language structures with the help of grammars. (C)
4. Explain the construction and role of the syntax tree in the context of the parse tree. (U)
5. Distinguish between the parse tree, syntax tree and DAG for graphical representation of the source
   program. (U)
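The top-down predictive approach in this module can be sketched as a recursive-descent parser in Java. The arithmetic-expression grammar below (E -> T { (+|-) T }, T -> F { (*|/) F }, F -> digit | ( E )) is an assumed example grammar, chosen because it is unambiguous and LL(1) after the usual left-recursion removal.

```java
// Recursive-descent predictive parser sketch for the grammar
//   E -> T { (+|-) T } ,  T -> F { (*|/) F } ,  F -> digit | ( E )
// Each nonterminal becomes one method; a single lookahead character
// decides which production to use, as in an LL(1) predictive parser.
public class ExprParser {
    private final String in;
    private int pos;

    public ExprParser(String input) { this.in = input.replace(" ", ""); }

    public boolean parse() {                  // accept iff the whole input is an E
        pos = 0;
        return expr() && pos == in.length();
    }
    private boolean expr() {                  // E -> T { (+|-) T }
        if (!term()) return false;
        while (peek() == '+' || peek() == '-') { pos++; if (!term()) return false; }
        return true;
    }
    private boolean term() {                  // T -> F { (*|/) F }
        if (!factor()) return false;
        while (peek() == '*' || peek() == '/') { pos++; if (!factor()) return false; }
        return true;
    }
    private boolean factor() {                // F -> digit | ( E )
        if (Character.isDigit(peek())) { pos++; return true; }
        if (peek() == '(') {
            pos++;
            if (!expr() || peek() != ')') return false;
            pos++;
            return true;
        }
        return false;
    }
    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }
}
```

For example, `new ExprParser("(1+2)*3").parse()` succeeds, while `new ExprParser("1+*2").parse()` fails at the `*`, because no production of F starts with that lookahead.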

Module 6: Compilers: Synthesis Phase (Hours - 10)

Detailed Content:
Syntax Directed Translation and Intermediate Code Generation: Attribute grammar, S- and
L-attributed grammars, bottom-up and top-down evaluation of S- and L-attributed grammars.
Intermediate code: need, types of intermediate codes, implementation of three-address codes.
Code Optimization: Need and sources of optimization; code optimization techniques: machine
dependent and machine independent.
Code Generation: Issues in the design of a code generator, code generation algorithm, basic blocks
and flow graphs.

Purpose:
Discuss the construction and role of the syntax tree in the context of the parse tree; explain synthesized
and inherited attributes; describe the concept of bottom-up evaluation of attributed definitions.
Provide students with knowledge of the various ways to represent intermediate code, i.e. syntax trees,
postfix notation and three-address code; explain backpatching and conditional and iterative control
flow.
Discuss the need for code optimization in compilation; explain different techniques of loop
optimization and explain peephole optimization.

Scope:
1. Academic Aspect - Explain syntax directed definitions and syntax directed translation schemes.
   Comprehend the intermediate language and intermediate code for assignment statements; explore
   procedure calls and the translation of mixed-mode expressions. Explain the principal sources of
   code optimization to improve the space and time complexity of the target code. Describe basic
   blocks, flow graphs and the DAG representation of basic blocks.
2. Technology Aspect - Explore intermediate code generation for arrays, Boolean expressions,
   conditional and iterative control flow, switch statements and procedure calls. Data flow analysis,
   basic blocks, peephole optimization and loop optimization. Analyze a code generation algorithm
   and describe the dynamic programming code generation algorithm.
3. Application Aspect - Generate machine-independent code for the target machine; make target
   programs fast, small and maintainable.

Student Evaluation:
1. Theory and viva questions on machine-dependent and machine-independent code optimization
   techniques.
2. Lab experiments based on code optimization techniques.
3. GATE questions on code optimization techniques.

Learning Objectives:
1. Define the role of the code optimizer in compiler design and list the different principal sources of
   code optimization. (R)
2. Apply code optimization principles to given code. (A)
3. Apply different code optimization techniques to increase the efficiency of the compiler and
   demonstrate the working of the code optimizer in compiler design. (A)
4. Describe the role of operating system functions, such as memory management, as they pertain to
   run-time storage management. (E)
5. Describe the role of intermediate code generation in connection with language design, and apply
   the code generation algorithm to generate target machine code. (A)
6. State the issues in the design of a code generator and describe basic blocks and flow graphs. (R)
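The three-address representation discussed in this module can be illustrated with a short sketch that walks an expression tree and emits one temporary per operator, so every instruction has at most three addresses. The tree encoding (`Node` class, string instructions) is an assumption made for this example.

```java
import java.util.*;

// Sketch: generate three-address code from a tiny expression tree.
// Each interior node emits one instruction "tN = left op right".
public class TacGen {
    static class Node {
        final String op;          // operator, or null for a leaf
        final String value;       // variable name for leaves
        final Node left, right;
        Node(String value) { this(null, value, null, null); }
        Node(String op, Node l, Node r) { this(op, null, l, r); }
        private Node(String op, String v, Node l, Node r) {
            this.op = op; this.value = v; this.left = l; this.right = r;
        }
    }

    private final List<String> code = new ArrayList<>();
    private int temp = 0;

    public List<String> generate(Node root) {
        emit(root);
        return code;
    }
    private String emit(Node n) {
        if (n.op == null) return n.value;             // leaf: its own name
        String l = emit(n.left), r = emit(n.right);
        String t = "t" + (++temp);                    // fresh temporary
        code.add(t + " = " + l + " " + n.op + " " + r);
        return t;
    }
}
```

For `a + b * c` the post-order walk emits `t1 = b * c` and then `t2 = a + t1`, which is the usual three-address rendering of the expression.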

CO and PO Mapping

     PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3

CO1 √ √ √ √ √ √ √ √ √ √

CO2 √ √ √ √ √ √ √ √ √ √ √

CO3 √ √ √ √ √ √ √ √

CO4 √ √ √ √ √ √ √ √ √ √

CO5 √ √ √ √ √ √ √ √ √

CO6 √ √ √ √ √ √ √ √ √ √ √


F. Module and Learning Levels Mapping

LL-> LL 1 LL 2 LL 3 LL 4 LL 5 LL 6

Modules Remember Understand Apply Analyze Evaluate Create

Module-I √ √ √ √ √ √

Module-II √ √ √ √ √

Module-III √ √ √ √ √

Module-IV √ √ √ √ √

Module-V √ √ √ √ √

Module-VI √ √ √ √ √ √


List of Experiments

The following strategies will be implemented for the SPCC lab experiments:

• A total of 12 experiments are planned, of which one is based on a case study and one is a mini-project.
• Eight experiments should be implemented using the C++/Java programming language and two
  experiments using the compiler construction tools LEX and YACC.
• Journal evaluation will be done on a weekly basis along with a mock viva.

Each experiment below lists its name, the resources/tools/technology to be used, its learning
objective, and the learning outcomes (students will be able to):

Experiment 1: Implement an interactive editor as an application program and compare System and
Application programs.
Resources/Tools: Internet, text and reference books.
Learning Objective: Compare the role and functioning of various system programs over application
programs.
Learning Outcomes:
• List various system software and application software.
• Distinguish between system software and application software.
• Apply system software to run application programs.
• Indicate the order in which different system software are used to run application programs.
Experiment 2: Design and develop a two-pass Assembler for the x86 machine.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design and develop a 2-pass Assembler.
Learning Outcomes:
• Describe the different database formats of a 2-pass Assembler with the help of examples.
• Design a 2-pass Assembler for the x86 machine.
• Develop a 2-pass Assembler for the x86 machine.
• Illustrate the working of a 2-pass Assembler.

Experiment 3: Design and develop a two-pass Macro Processor.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design and develop a 2-pass Macro Processor.
Learning Outcomes:
• Describe the different database formats of a 2-pass Macro Processor with the help of examples.
• Design a 2-pass Macro Processor for the x86 machine.
• Develop a 2-pass Macro Processor for the x86 machine.
• Illustrate the working of a 2-pass Macro Processor.

Experiment 4: Design and develop a hand-written lexical analyzer.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design and develop a lexical analyzer using a programming language.
Learning Outcomes:
• List the functions of the lexical analyzer.
• Describe the role of the lexical analyzer in compiler design.
• Design and develop a hand-written lexical analyzer.
• Demonstrate the working of the lexical analyzer in compiler design.
Experiment 5: Implement a program to find the First() and Follow() sets of a given grammar.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Identify the type of grammar and compute its First() and Follow() sets.
Learning Outcomes:
• Identify the type of grammar G.
• Define First() and Follow() sets.
• Find the First() and Follow() sets for a given grammar G.
• Apply First() and Follow() sets for designing top-down and bottom-up parsers.
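The First() computation of this experiment can be sketched as a fixed-point iteration in Java. The grammar encoding used here is an assumption for illustration: each production is a string "A=rhs", uppercase letters are nonterminals, lowercase letters are terminals, and '#' stands for epsilon.

```java
import java.util.*;

// Sketch of the First() computation: the First sets grow monotonically
// until no production adds anything new (a fixed point).
public class FirstSets {
    // Assumed encoding: "A=X1X2...", uppercase = nonterminal, '#' = epsilon.
    public static Map<Character, Set<Character>> first(List<String> prods) {
        Map<Character, Set<Character>> first = new HashMap<>();
        for (String p : prods) first.putIfAbsent(p.charAt(0), new HashSet<>());
        boolean changed = true;
        while (changed) {                              // iterate to a fixed point
            changed = false;
            for (String p : prods) {
                Set<Character> f = first.get(p.charAt(0));
                int before = f.size();
                boolean allNullable = true;            // prefix so far derives epsilon
                for (char x : p.substring(2).toCharArray()) {
                    if (!Character.isUpperCase(x)) {   // terminal or '#'
                        f.add(x);
                        allNullable = (x == '#');
                        break;
                    }
                    Set<Character> fx = first.get(x);  // nonterminal: copy First(x)\{#}
                    for (char c : fx) if (c != '#') f.add(c);
                    if (!fx.contains('#')) { allNullable = false; break; }
                }
                if (allNullable) f.add('#');           // whole right side is nullable
                changed |= f.size() != before;
            }
        }
        return first;
    }
}
```

For the grammar S -> AB, A -> aA | #, B -> b, the iteration yields First(A) = {a, #}, First(B) = {b} and, because A is nullable, First(S) = {a, b}.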
Experiment 6: Implement a program to remove left recursion from a grammar and make it
left-factored.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design new language structures with the help of grammars.
Learning Outcomes:
• Explain the properties of a top-down parser.
• Analyze whether the given grammar G is left-recursive or left-factored.
• Compute the equivalent non-left-recursive grammar and make it left-factored.
• Reproduce the equivalent grammar for designing top-down parsers.
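The standard transformation behind this experiment rewrites the immediately left-recursive productions A -> A a1 | ... | A am | b1 | ... | bn as A -> b1 A' | ... | bn A' and A' -> a1 A' | ... | am A' | epsilon. A minimal Java sketch of that transformation follows; the single-letter nonterminals, '|' alternative separator and '#' for epsilon are assumptions made for this example.

```java
import java.util.*;

// Sketch of immediate left-recursion removal:
//   A -> A a1 | ... | A am | b1 | ... | bn   becomes
//   A  -> b1 A' | ... | bn A'
//   A' -> a1 A' | ... | am A' | #            ('#' stands for epsilon)
public class LeftRecursion {
    // Assumption: the nonterminal is a single uppercase letter and the
    // alternatives of its right-hand side are separated by '|'.
    public static List<String> eliminate(char a, String rhs) {
        List<String> alpha = new ArrayList<>(), beta = new ArrayList<>();
        for (String alt : rhs.split("\\|")) {
            alt = alt.trim();
            if (alt.charAt(0) == a) alpha.add(alt.substring(1)); // A alpha form
            else beta.add(alt);                                  // beta form
        }
        if (alpha.isEmpty()) return List.of(a + " -> " + rhs);   // nothing to do
        String aPrime = a + "'";
        StringBuilder p1 = new StringBuilder(a + " -> ");
        for (int i = 0; i < beta.size(); i++)
            p1.append(beta.get(i)).append(aPrime).append(i < beta.size() - 1 ? " | " : "");
        StringBuilder p2 = new StringBuilder(aPrime + " -> ");
        for (String s : alpha) p2.append(s).append(aPrime).append(" | ");
        p2.append("#");
        return List.of(p1.toString(), p2.toString());
    }
}
```

Applied to the classic E -> E+T | T, the sketch produces E -> TE' and E' -> +TE' | #, the form required before building a top-down parser.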
Experiment 7: Design and develop an Intermediate Code Generator using three-address code.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design and develop an intermediate code generator.
Learning Outcomes:
• Define the role of the intermediate code generator in compiler design.
• Describe the various ways to implement an intermediate code generator.
• Specify the formats of three-address code.
• Illustrate the working of an intermediate code generator using three-address code.

Experiment 8: Implement code optimization techniques: (1) function-preserving transformations,
(2) loop optimization.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design and implement different code optimization techniques.
Learning Outcomes:
• Define the role of the code optimizer in compiler design.
• List the different principal sources of code optimization.
• Apply different code optimization techniques to increase the efficiency of the compiler.
• Demonstrate the working of the code optimizer in compiler design.
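One machine-independent, function-preserving transformation from this experiment can be sketched as constant folding over a list of three-address instructions. The instruction format "t = x op y" and the forward propagation of known constants are assumptions of this sketch; a full optimizer would also handle copy propagation, dead-code elimination, and so on.

```java
import java.util.*;

// Sketch of constant folding over three-address code "t = x op y".
// Known constant values are propagated forward so that later
// instructions can fold too.
public class ConstantFolder {
    public static List<String> fold(List<String> code) {
        Map<String, Integer> known = new HashMap<>();  // var -> constant value
        List<String> out = new ArrayList<>();
        for (String instr : code) {
            String[] p = instr.split("\\s+");          // {t, =, x, op, y}
            String target = p[0];
            Integer x = valueOf(p[2], known), y = valueOf(p[4], known);
            if (x != null && y != null) {              // both operands are constants
                int v;
                if (p[3].equals("+")) v = x + y;
                else if (p[3].equals("-")) v = x - y;
                else if (p[3].equals("*")) v = x * y;
                else v = x / y;                        // assumes y != 0 for '/'
                known.put(target, v);
                out.add(target + " = " + v);           // folded instruction
            } else {
                known.remove(target);                  // value no longer constant
                out.add(instr);
            }
        }
        return out;
    }
    private static Integer valueOf(String s, Map<String, Integer> known) {
        if (known.containsKey(s)) return known.get(s);
        try { return Integer.parseInt(s); } catch (NumberFormatException e) { return null; }
    }
}
```

For the sequence `t1 = 2 * 4; t2 = t1 + 1; t3 = a + t2`, the first two instructions fold to constants while the third is left alone because `a` is unknown at compile time.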


Experiment 9: Design and develop a code generator for the target machine architecture.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++.
Learning Objective: Design and implement a code generator for the target machine.
Learning Outcomes:
• Define the role of the code generator in compiler design.
• Apply the code generation algorithm to generate machine code.
• Generate target code for the optimized code, considering the target machine.

Experiment 10: Design and develop a lexical analyzer using the LEX / Flex tool.
Resources/Tools: Open-source tools (Ubuntu, LEX tool), Notepad++.
Learning Objective: Develop a lexical analyzer using the LEX / Flex tool.
Learning Outcomes:
• Summarize different compiler construction tools.
• Describe the structure of a LEX specification.
• Apply the LEX compiler for automatic generation of a lexical analyzer.
• Construct a lexical analyzer using an open-source compiler design tool.

Experiment 11: Design and develop a parser generator using the YACC tool (design a calculator
using YACC).
Resources/Tools: Open-source tools (Ubuntu, YACC tool), Notepad++.
Learning Objective: Develop a parser generator using the YACC tool.
Learning Outcomes:
• Define Context-Free Grammar.
• Describe the structure of a YACC specification.
• Apply the YACC compiler for automatic generation of a parser.
• Construct a parser generator using an open-source compiler design tool.

Experiment 12: Mini-project.
Resources/Tools: JDK 1.8, Turbo C/C++, Notepad++, open-source tools, .NET framework.
Learning Objective: To familiarize and encourage students to use various software tools for
developing system and application programs.
Learning Outcomes:
• Comprehend system software and application software.
• Design different system software.
• Construct system software and application software.
• Apply different system software, application software and open-source tools.


INDEX
Module Contents Page number
1 Overview of System Software
• Introduction 1-53
• Assemblers
• Loaders
• Linkers
• Macro processors
• Compilers
• Interpreters
• Operating systems
• Device drivers

• Objective Questions with answers
• Subjective Questions with answers
• University Questions with answers

2 Assemblers and Macro Processors


 Design of Assembler (Single Pass 3Assembler, multi pass 54-63
Assembler )
• Data Structure ,
• format of Databases
• Algorithm of Pass1 And Pass2
• Macro instructions
• Features of Macro facility
• Design of 2 pass macro-processor
• • Objective Questions with answers…………………
• Subjective Questions with answers……………………
• University Questions with answers……………………

3 Linkers and Loaders


• Loader schemes 64-74
• Design of Absolute loader
• Design of Direct linking loader
• Objective Questions with answers………
• Subjective Questions with answers……………
• University Questions with answers……………

Downloaded by super market ([email protected])


lOMoARcPSD|24907930

4 Introduction to Compilers and Lexical Analysis


• Introduction to Compilers 75-140
• Phases of a compiler
• Comparison of compilers and interpreters
 Role of a Lexical analyzer
• Specification and recognition of tokens
• Designing a lexical analyzer generator
• Objective Questions with answers………………
• Subjective Questions with answers………………
• University Questions with answers………………

5 Parsing
• Role of Parser 141-160
• Top-down parsing
• Recursive descent and predictive parsers (LL)
• Bottom-Up parsing, Operator precedence parsing
• LR, SLR and LALR parsers
• Syntax directed definitions
• Inherited and Synthesized attributes
• Evaluation order for SDDs
• S attributed Definitions
• L attributed Definitions
• YACC compiler-compiler
• Objective Questions with answers………………
• Subjective Questions with answers…………………
• University Questions with answers……………..

6 Compilers: Synthesis Phase


• Intermediate languages: declarations 161-199
• Issues in the design of Code Generator
• Basic Blocks and Flow graphs
• Code generation algorithm
• DAG representation of Basic Block
• Principal sources of Optimization
• Optimization of Basic Blocks
• Loops in Flow graph
• Peephole Optimization
• Objective Questions with answers
• Subjective Questions with answers
• University Questions with answers


MODULE 1
Overview of System Software

1.1 Motivation:
To provide the students with the knowledge of

 How to design & implement various types of system software


 Differentiation between application software & system software.
 Different types of system software
 Application of system software & application software
 Study of different types of Translators

1. 2. Learning Objective:
• To learn various system software such as assemblers, loaders, linkers, macro processors,
compilers, interpreters, operating systems, and device drivers.
• Differentiate between application software & system software.

1.3 Syllabus:

Lecture Content Duration Self Study


1 Introduction to System Software with 2 hrs 2 hrs
examples, Software Hierarchy, Differentiate
between system software and application
software

2 Introduction to Language Processors: 2 Hrs 2 Hrs


Compiler, Assembler, Interpreter.

1.4 Learning Outcomes:


After studying this chapter, students will be able to:


• Compare between system software & application software.


• Differentiate between the system software and application software available in the market.
• Understand the applications of all this software.

1.5 Definitions

 Compiler: a translator (program) which converts a source program (written in C, C++, etc.)
into machine code.
 Assembler: a translator (program) which converts a source program (written in assembly
language) into machine code.
 Interpreter: a translator (program) which converts a source program line by line into an
intermediate code, which it then executes.
 Operating System: a program which provides an interface between the hardware, the
software and the user.
 Loader: a program routine that copies a program into memory for execution.
 Linker: a program that combines object modules to form an executable program.
 Macro Processor: a program which is responsible for processing macros.
 Device Driver: a program that controls a particular type of device that is attached to
your computer.
 Software: a collection of programs; a program is a group of instructions.

1.6 Key Definitions:

OS: Operating System


HLL: High Level Language
ALP: Assembly Language Programming

1.7 Course Content:

Lecture 1 & 2
1.7.1 Types of Software:


Software is generally divided into:


a) System software
b) Application software
a) System software are programs which help in the running of a computer system,
e.g. disk operating programs, operating systems, compilers.
b) Application software are programs which perform specific tasks for the user,
e.g. word-processing software, graphics packages, theatre-booking software.

Assembler:
An assembler is a program that accepts as input an assembly language program &
produces its machine language equivalent along with information for the loader.

ALP --> Assembler --> m/c language & other information for the loader
            |
        Databases

Compiler:
A compiler is a program that reads an input in HLL & translates it into an equivalent
program in machine language.

Source Program (e.g. C, C++) --> Compiler --> m/c language
                                     |
                              Error Messages

Phases of Compiler: A compiler operates in the following phases.


1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
In addition, the symbol table and the error handler interact with all phases.
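The first phase, lexical analysis, can be illustrated with a small sketch. The following Python fragment is purely illustrative (the token categories and the sample statement are invented for the example, not taken from any particular compiler):

```python
import re

# Token categories for a toy lexical analyzer; order matters
# (NUMBER is tried before ID, and whitespace is skipped).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

def tokenize(source):
    """Split a source string into (category, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if name != "SKIP":          # whitespace is discarded
                    tokens.append((name, m.group()))
                pos += len(m.group())
                break
        else:
            raise ValueError("lexical error at position %d" % pos)
    return tokens

print(tokenize("sum = sum + 5"))
```

The output token stream is what the syntax analyzer (the next phase) consumes.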
Interpreter:


An interpreter is a translator that reads a program written in an HLL and translates it line by
line into an intermediate form, which it then executes.

1.7.2 Difference between Compiler, Interpreter & Assembler:


Sr. No | Assembler | Compiler | Interpreter
1. | It converts ALP into m/c language. | It converts an HLL program into m/c language. | It converts an HLL program line by line into an intermediate form, which it then executes.
2. | It assembles the whole source program at once. | It compiles the whole source program at once. | It translates line by line.
3. | Speed of execution is fast. | Speed of execution is fast. | Speed of execution is slow.
4. | The program needs to be assembled only once and can then be executed repeatedly. | The program needs to be compiled only once and can then be executed repeatedly. | The program must be translated for every run.
5. | It creates an object file. | It creates an object file. | It does not create an object file.
6. | It requires large memory space to store (but less than a compiler). | It requires large memory space to store. | It requires less memory space to store.
7. | e.g. Microsoft Assembler (MASM) | e.g. MS-DOS C compiler | e.g. BASIC interpreter
8. | Source language: assembly | Source language: C, C++ | Source language: BASIC, LISP

Loader:
Loader is a system program which is responsible for preparing the object program for execution
& initiates the execution. OR
A program routine that copies a program into memory for execution.
OR


Operating system utilities that copy programs from a storage device to main memory, where they
can be executed. In addition to copying a program into main memory, the loader can also replace
virtual addresses with physical addresses.
Function of Loader:
a) Allocation
b) Linking
c) Relocation
d) Loading
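Two of these functions, relocation and loading, can be shown with a toy sketch. The object-record format below is invented for illustration (a real loader works on binary object files produced by an assembler or compiler):

```python
# A fabricated object "module": each word is tagged as absolute
# ("abs", no address inside it) or relocatable ("rel", holds a
# program-relative address that must be adjusted at load time).
object_code = [
    ("abs", 0x10),
    ("rel", 0x04),   # program-relative address -> needs relocation
    ("abs", 0x22),
]

def load(module, load_addr, memory):
    """Copy the module into memory at load_addr, relocating 'rel' words."""
    for offset, (kind, word) in enumerate(module):
        value = word + load_addr if kind == "rel" else word  # relocation
        memory[load_addr + offset] = value                   # loading

memory = [0] * 64
load(object_code, 0x20, memory)   # allocation chose the base address 0x20
print(memory[0x20:0x23])
```

Allocation corresponds to choosing `load_addr`; linking (combining several such modules and resolving external references) is omitted from this sketch.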
Types of Loaders:
a) Assemble-and-go (compile-and-go) loader
b) Absolute loader
c) Bootstrap loader
d) Direct linking loader
Linker:
Linker is a program that combines object modules to form an executable program.
Also called Link editor & binder.
Many programming languages allow you to write different pieces of code, called modules,
separately.
In addition to putting all the modules together, a linker also replaces symbolic addresses with
real addresses. Therefore, you need to link a program even if it contains only one module.

Operating System:
Operating system is system software, consisting of program and data that runs on computer and
manage the computer hardware.
It is an integrated set of programs that controls the resources of a computer system and provides its
users with an interface that is easier to use.
Objective of OS:
-Make a computer system easier to use
-Manage the resources of a computer system

Function of OS
a) Process management: It takes care of creation and deletion of processes.


b) Memory management: It takes care of allocation and de-allocation of memory


space to programs in need of this resource.
c) File management: It takes care of file-related activities.
d) Security: It protects the information of a computer system against unauthorized
access.
e) I/ O Management.
f) Communication
g) Accounting
(1)Process management: A process is a program in execution. It is the job, which is currently
being executed by the processor. During its execution a process would require certain system
resources such as processor, time, main memory, files etc. OS supports multiple processes
simultaneously. The process management module of the OS takes care of the creation and
termination of the processes, assigning resources to the processes, scheduling processor time to
different processes and communication among processes.
(2)Memory management module: It takes care of the allocation and de-allocation of the main
memory to the various processes. It allocates main and secondary memory to the system/user
program and data. To execute a program, its binary image must be loaded into the main memory.
The operating system decides:
(a) which parts of memory are currently in use and by whom;
(b) which processes are to be allocated memory;
(c) allocation and de-allocation of memory space.
(3)I/O management: This module of the OS co-ordinates and assigns different I/O devices namely
terminals, printers, disk drives, tape drives etc. It controls all I/O devices, keeps track of I/O
request, issues command to these devices.
I/O subsystem consists of
(i) Memory management component that includes buffering, caching and spooling.
(ii) Device driver interface
(iii) Device drivers specific to hardware devices.
(4)File management: Data is stored in a computer system as files. The file management module
of the OS would manage files held on various storage devices and transfer of files from one device


to another. This module takes care of creation, organization, storage, naming, sharing, backup and
protection of different files.
(5)Scheduling: The OS also establishes and enforces process priority. That is, it determines and
maintains the order in which the jobs are to be executed by the computer system. This is so because
the most important job must be executed first followed by less important jobs.
(6)Security management: This module of the OS ensures data security and integrity.
That is, it protects data and program from destruction and unauthorized access. It keeps different
programs and data which are executing concurrently in the memory in such a manner that they do
not interfere with each other.
(7)Processor management: The OS assigns processors to the different tasks that must be performed
by the computer system. If the computer has more than one processor and one of them is idle, a
process waiting to be executed is assigned to the idle processor.
The OS maintains an internal time clock and a log of system usage for all users. It also generates
error messages and provides debugging and error-detection facilities for correcting programs.
Types of OS:
1. Multiprogramming: More than one job reside in the memory & all are ready for execution.
2. Multiprocessing: is the simultaneous execution of two or more processes by a computer
system having more than one CPU.
3. Multitasking: Switch from one task to another in small fraction of time.
Technically it is the same as multiprogramming, but the term multiprogramming is used for multi-user
systems (systems that are used simultaneously by many users, such as mainframe systems) and
multitasking for single-user systems (systems that are used by only one user at a time).
4. Batch processing: a batch processing system is one where programs and data are collected
together in a batch before processing starts.
5. Multithreading: Allows different parts of a single program to run concurrently.
6. Real-time: Responds to input instantly.

7.5 Device Drivers:


• A device driver is a program that controls a particular type of device that is attached to
your computer.


• There are device drivers for printers, displays, CD-ROM readers, diskette drives, and so
on.
• When you buy an OS, many device drivers are built into the product.
• A device driver essentially converts the more general input/output instructions of the
operating system to messages that the device type can understand.
• Some Windows programs are virtual device drivers. These programs interface with the
Windows Virtual Machine Manager. There is a virtual device driver for each main
hardware device in the system, including the hard disk drive controller, keyboard, and
serial and parallel ports. They are used to maintain the status of a hardware device that has
changeable settings. Virtual device drivers handle software interrupts from the system
rather than hardware interrupts.
• In Windows operating systems, a device driver file usually has a file name suffix of DLL
or EXE. A virtual device driver usually has the suffix of VXD.

1.8 . References:
D. M. Dhamdhere, "Systems Programming & Operating Systems".

1.9 Multiple Choice Questions.

Q.1 Translator for low level programming language termed as


(A) Assembler (B) Compiler
(C) Linker (D) Loader

Q.2 The translator which perform macro expansion is called a


(A) Macro processor (B) Macro pre-processor
(C) Micro pre-processor (D) assembler

Q.3 Shell is the exclusive feature of


(A) UNIX (B) DOS


(C) System software (D) Application software


Q.4 An assembler is
(A) programming language dependent. (B) syntax dependent.
(C) machine dependent. (D) data dependent.

Q.5 Which of the following loader is executed when a system is first turned on or
restarted
(A) Boot loader (B) Compile and Go loader
(C) Bootstrap loader (D) Relocating loader
Q.6 A linker program
(A) Places the program in the memory for the purpose of execution.
(B) Relocates the program to execute from the specific memory area allocated to it.
(C) Links the program with other programs needed for its execution.
(D) Interfaces the program with the entities generating its input data.
Q. 7 An assembly language is a
(A) low level programming language
(B) Middle level programming language
(C) High level programming language
(D) Internet based programming language

Q.8 Which of the following are language processors?


(A) Assembler (B) Compiler
(C) Interpreter (D) All of the above

Q.9 Which is not a computer translator?


(A) Compiler (B)Assembler
(C) Interpreter (D) Word processor

Q.10 Does Interpreter generate object file?


a) Yes
b) No


1.10 Short questions


Q.1 Differentiate between system software & application software.
a) System software are programs which help in the running of a computer system e.g. Disc
operating programs, OS, Compiler etc.
b) Application software are programs which perform specific tasks for the user. e.g. Word
processing software, Graphics package, Theatre booking software.

Q.2 Define system software.


A.2 Refer Answer No.1

Q.3 Define OS. Which are the different functions of OS?

A. Refer topic no. 7.2


Q.4 Give examples of application and system software
A. Examples of application software are web browsers and editors.
Examples of system software are assemblers, macro processors, linkers, loaders, interpreters,
compilers, operating systems, and device drivers.
Q.5 Which are the different types of OS?
Refer answer No. 3
Q.6 Explain all system software in detail.
Q.7 What are the basic functions of a language translator?

1.14. Long Questions:

Q.1 what is system programming? Explain the evolution of system software.


Ans: System software is a collection of system programs that perform a variety of functions, viz.
file editing, resource accounting, I/O management, storage management, etc. System programming is
the activity of designing and implementing system programs. System programs are standard
components of the software of most computer systems. The twofold motivation mentioned above
arises out of a single primary goal, viz. making the entire program execution process more effective.


Q.2 Differentiate between application programs and system programs. Indicate the order in which
the following system programs are used, from developing a program up to its execution: assembler,
loader, linker, macro processor, compiler, editor.

MODULE 2
ASSEMBLERS AND MACRO PROCESSORS

2.1 Motivation:

Motivation of this course is to provide the students with the knowledge of


 How to design multi pass ( pass1 & pass2 ) assembler
 How to design single pass assembler
 What are the applications of assembler
 How to use macro & procedure in programs
 Differentiation between macros & procedure/function/subroutine.
 How to design & implement macro processor

2.2 Learning Objective:


• List features & functions of assembler.
• Design & implement single pass & multi-pass assembler.
• Identify forward reference problem & find its solution.
• Appreciate role & function of macro processor.
• Design & implement 2- pass macro processor.
• Apply macro & procedure in the program

2.3. Syllabus:

Lecture Content Duration Self Study


5&6 Assemblers: Elements of Assembly Language 2 Lecture 2 hours
Programming

7 Basic Assembler functions , Design of the 1 Lecture 2 hours


Assembler, Types of Assembler


8&9 Two pass assembler for IBM 360/370, Format of 2 Lecture 3 hours
databases, Algorithm

10 Single pass Assembler for Intel x86. 1 Lecture 2 hours


11 & 12 Macro Processors: Macros, Basic Functions of 2 Lecture 2 hours
Macro Processor, Features of Macro Facility

13 & 14 Design of Two pass Macro Processor, Format of 2 Lecture 4 hours


Databases and Algorithm.

2.4. Learning Outcomes:

After studying this chapter, students should be able to:


• Design & implement single pass & multi pass assembler.
• Identify forward reference problem & find its solution.
• Compare between macro & procedure.
• Apply macros and procedure in the program
• Learn different features of macro processor
• Design and implement 2-pass macro processor

2.5. Definitions

Assembler: a translator (program) which converts a source program written in assembly language
into machine code.
Macro Processor: a program which is responsible for processing macros.
Macro: a single-line abbreviation for a group of instructions.

2.6. Key Definitions:


ALP: Assembly Language Programming
FRP: Forward Reference Problem
DC: Declare Constant
MOT: Mnemonic Opcode Table
POT: Pseudo Opcode Table


ST: Symbol Table


LT: Literal Table
BT: Base Table
LC: Location Counter
FRT: Forward Reference Table
CRT: Cross Reference Table
MDT : Macro Definition Table
MNT : Macro Name Table
MDTC: Macro Definition Table Counter
MNTC: Macro Name Table Counter
ALA: Argument List Array

2.7 Course Content:

Lecture 5 &6

2.7.1 Assembler:
An assembler is a program that accepts as input an assembly language program &
produces its machine language equivalent along with information for the loader.

ALP --> Assembler --> m/c language & other information for the loader
            |
        Databases

7.1.1 Functions of Assembler:


- Convert mnemonic operation codes to their machine language equivalents


- Convert symbolic operands to their equivalent machine addresses
- Decide the proper instruction format
- Convert the data constants to internal machine representations
- Write the object program and the assembly listing

7.1.2 Features of assembly language programming:


1) Mnemonic opcode specification:
Instead of using binary opcodes, mnemonics can be used.
E.g. L- load
A- Add
ST- Store
BR- Branch

2) Symbolic Operand Specification:


Instead of referring to an instruction or data item by its address, a symbol can be used
(a symbolic name can be associated with data or instructions).
It is the function of the assembler to replace each symbol by its address.

3) Storage Area specification:


Declaration of data & storage areas.
Assembly language can be used to specify that some part of memory is reserved for
storage.

7.1.3 Statements in ALP:


1. Imperative statements:
These are understood and executed by the machine,
e.g. all machine instructions.
2. Declarative statements:
DC statement: Declare Constant


Label Opcode Operand

FOUR DC F'4'

3. Assembler directive statements:


These are neither executable nor declarative.
They direct the assembler to perform the specific task.
e.g. a) START statement:
-This statement indicates the start of an assembly language program.
b) END statement: indicates the end of the ALP.
c) BYTE, WORD, RESB, RESW

Lecture 7
7.2 General Design Procedure
1. Specify the Problem

2. Specify Data Structure


3. Define Format of Data Structure
4. Specify Algorithm
5. Look for modularity (i.e. capability of one program to be subdivided into independent
programming units)
6. Repeat 1 through 5 on modules

7.3 Design of Assembler:


(For IBM 360/370 m/c)
1) Registers:
- There are 16 general-purpose registers of 32 bits each.
- There are 4 floating-point registers of 64 bits each.
- There is 1 program status word (PSW) of 64 bits.
2) Memory:-


-The basic unit of memory is the byte.

Unit Of Memory Bytes Bits


Byte 1 8
Half Word 2 16
Full Word 4 32
Double Word 8 64

2) Instruction Formats:
a) RR Format:-
In this format the first & second operands present in the registers.
Format:
Opcode R1 (OP1) R2 (OP2)
0 7 8 11 12 15
e.g AR 3,4
b) RX Format:
-In this format the first operand is in a register and the second operand is external (present
in a memory location).
-The address of second operand is given by-
C (B2) + C(X2) + D2
Contents of Base Reg. + Contents of Index Reg. + Displacement

Format:
Opcode R1 X1 B2 D2
0 7 8 11 12 15 16 19 20 31
e.g A 1, 90(2,15)
i.e. C(15) + C(2) +90
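The effective-address rule C(B2) + C(X2) + D2 can be checked with a small sketch. The register contents below are made up for the example; on the 360, specifying register 0 as a base or index means "no register", contributing zero:

```python
# Effective-address computation for an RX instruction:
# address = C(B2) + C(X2) + D2.
def effective_address(regs, base, index, disp):
    # register number 0 means "no base/index register" -> contributes 0
    b = regs[base] if base else 0
    x = regs[index] if index else 0
    return b + x + disp

# Assumed register contents (invented for the illustration).
regs = {2: 0x100, 15: 0x2000}

# A 1, 90(2,15)  ->  C(15) + C(2) + 90
print(hex(effective_address(regs, base=15, index=2, disp=90)))
```

With C(15) = 0x2000 and C(2) = 0x100, the operand address is 0x2000 + 0x100 + 90 = 0x215A.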

b. Using (USING) statement:


It is an assembler directive statement which indicates that the given register is


available to be used as base register along with its value.
Format:
USING value, Reg. no.
-Evaluate the operand
-Make the entries in the BT

c. DROP statement:
It is an assembler directive statement which indicates that the given register is no
longer available to be used as a base register.
Format:
DROP Register no.

Lecture 8
7.5 Forward Reference Problem:
a) Definition of forward reference:
-The rules of ALP state that a symbol should be defined somewhere in the
program, so there might be a case in which a reference is made to a symbol
prior to its definition. Such a reference is called a forward reference.

b) What is the FRP?
-The function of the assembler is to replace each symbol by its m/c address, and
if we refer to a symbol before it is defined, its address is not yet known to the
assembler. This problem is called the forward reference problem (FRP).

c) Solution to FRP:-
- Forward reference problem (For IBM 360) is solved by making two passes
over the assembly code.
Pass1:-
Purpose- Define Symbols & Literals.
1) Keep track of LC.
2) Determine length of m/c instructions.


3) Remember the value of symbols until pass2.


4) Process some Pseudo-ops e.g EQU, DS, DC.
5) Remember Literals.
Pass2:
Purpose- Generate Object Program.
1) Lookup values of symbols.
2) Generate instructions.
3) Generate data.
4) Process some Pseudo-ops e.g. USING, DROP.
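The two-pass idea can be sketched in a few lines of Python. The three-field statement format and the uniform 4-byte instruction length below are simplifications invented for illustration, not the IBM 360 encoding:

```python
# A toy program with a forward reference: AHEAD is used before
# it is defined. Each statement is (label, opcode, operand).
program = [
    ("",      "B",  "AHEAD"),   # forward reference
    ("",      "A",  "AHEAD"),
    ("AHEAD", "DC", "5"),
]

def pass1(src):
    """Pass 1: walk the program once, defining each label at the LC."""
    symtab, lc = {}, 0
    for label, op, operand in src:
        if label:
            symtab[label] = lc
        lc += 4                  # every statement is 4 bytes in this sketch
    return symtab

def pass2(src, symtab):
    """Pass 2: now every symbol is known, so operands can be resolved."""
    code = []
    for label, op, operand in src:
        addr = symtab.get(operand, operand)   # replace symbols by addresses
        code.append((op, addr))
    return code

symtab = pass1(program)
print(symtab)
print(pass2(program, symtab))
```

Because pass 1 has already seen the whole program, the reference to AHEAD on the first line resolves cleanly in pass 2.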

Lecture 9

7.6 Format of Databases:


a) POT (Pseudo Opcode Table):
POT is a fixed-length table, i.e. its contents are not altered during the assembly
process.

Pseudo Opcode          Address of routine to process pseudo-opcode
(5 bytes, character)   (3 bytes = 24-bit address)
'DROPb'                P1DROP
'ENDbb'                P1END
'EQUbb'                P1EQU
'START'                P1START
'USING'                P1USING

(b represents a blank character.)

- The table will actually contain the physical addresses.
- POT is a predefined table.
- In Pass 1, POT is consulted to process some pseudo-opcodes like DS, DC, EQU.
- In Pass 2, POT is consulted to process some pseudo-opcodes like DS, DC, USING, DROP.


b) MOT (Mnemonic Opcode Table):

MOT is a fixed-length table, i.e. its contents are not altered during the assembly
process.

Mnemonic Opcode       Binary Opcode          Instruction Length   Instruction Format   Not used in
(4 bytes, character)  (1 byte, hexadecimal)  (2 bits, binary)     (3 bits, binary)     this design (3 bits)
'Abbb'                5A                     10                   001
'AHbb'                4A                     10                   001
'ALbb'                5E                     10                   001
'ALRb'                1E                     01                   000

b represents a blank character.
Codes:
Instruction Length                Instruction Format
01 = 1 halfword  = 2 bytes        000 = RR
10 = 2 halfwords = 4 bytes        001 = RX
11 = 3 halfwords = 6 bytes        010 = RS
                                  011 = SI
                                  100 = SS
- MOT is a predefined table.
- In Pass 1, MOT is consulted to obtain the instruction length (to update the LC).
- In Pass 2, MOT is consulted to obtain:
a) the binary opcode (to generate the instruction)
b) the instruction length (to update the LC)
c) the instruction format (to assemble the instruction).
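Modelling the MOT as a simple lookup structure makes these two uses concrete. The Python dictionary below is only an illustration; the entries mirror the table above, with lengths given in bytes:

```python
# The MOT as a dictionary keyed by mnemonic. Lengths are in bytes.
MOT = {
    "A":   {"opcode": 0x5A, "length": 4, "format": "RX"},
    "AH":  {"opcode": 0x4A, "length": 4, "format": "RX"},
    "ALR": {"opcode": 0x1E, "length": 2, "format": "RR"},
}

# Pass 1 needs only the instruction length, to advance the LC.
lc = 0
for mnemonic in ["A", "ALR", "AH"]:
    lc += MOT[mnemonic]["length"]
print(lc)

# Pass 2 needs the binary opcode and format to assemble the instruction.
entry = MOT["A"]
print(hex(entry["opcode"]), entry["format"])
```

A real assembler would also use the format field to decide how to parse and encode the operand field.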

C) Symbol table (ST):


Symbol                 Value                   Length                  Relocation (R/A)
(8 bytes, characters)  (4 bytes, hexadecimal)  (1 byte, hexadecimal)   (1 byte, character)
'PRG1bbbb'             0000                    01                      R
'FOURbbbb'             000C                    04                      R

- ST is used to keep track of the symbols defined in the program.
- In Pass 1, whenever a symbol is defined an entry is made in the ST.
- In Pass 2, the symbol table is used to generate the address of the symbol.

D) Literal Table (LT):


Literal   Value   Length   Relocation (R/A)

=F'5'     28      04       R

- LT is used to keep track of the literals encountered in the program.
- In Pass 1, whenever a literal is encountered an entry is made in the LT.
- In Pass 2, the literal table is used to generate the address of the literal.

E) Base Table (BT):

Register   Availability          Contents of base register
           (1 byte, character)   (3 bytes = 24-bit address, hexadecimal)
1          'N'                   -
2          'N'                   -
.          .                     .
.          .                     .
15         'Y'                   00


- Availability codes:
  Y: register specified in a USING pseudo-opcode.
  N: register never specified in a USING pseudo-opcode.
- BT is used to keep track of base register availability.
- In Pass 1, the BT is not used.
- In Pass 2, the BT is consulted to find which registers can be used as base registers,
along with their contents.

F) Location Counter (LC):


- The LC is used to assign an address to each instruction and to each symbol defined in the
program.
- The LC is updated in two cases:
a) For an instruction, it is incremented by the instruction length.
b) For a data definition (DS, DC), it is incremented by the length of the data field.

7.7 Pass1 Databases:

- The purpose of pass1 is to define the symbols & literals.


- The various databases maintained are as follows.
a) Original source cards:
These contain the original program.
b) Location counter (LC):
It is used to assign addresses to the instructions and to the symbols
defined in the program.
c) Mnemonic Opcode Table (MOT):
MOT is consulted to obtain the instruction length (to update the LC).
d) Pseudo Opcode Table (POT):
POT is consulted to process some pseudo-opcodes like DS, DC, EQU.
e) Symbol Table (ST):
Whenever a symbol is defined, an entry is made in the ST.


f) Literal Table (LT):
Whenever a literal is encountered, an entry is made in the LT.
g) Copy file:
It is prepared for use by Pass 2.

Pass 2 Databases:
- The purpose of Pass 2 is to generate the instructions and data.
- The various databases maintained are as follows.
a) Copy file:
It is prepared by Pass 1.
b) Location counter (LC):
It is used to assign addresses to the instructions and to the symbols
defined in the program.
c) Mnemonic Opcode Table (MOT):
MOT is consulted to obtain the binary opcode, instruction length and instruction
format.
d) Pseudo Opcode Table (POT):
POT is consulted to process some pseudo-opcodes like DS, DC, USING, DROP.
e) Symbol Table (ST):
It is used to generate the addresses of the symbols.
f) Literal Table (LT):
It is used to generate the addresses of the literals.
g) Base Table (BT):
It is consulted to find the registers available to be used as base registers.
h) Instruction workspace:
It is used to hold the instruction while its various parts are being assembled.
i) PUNCH workspace:
It is used to punch the assembled instruction onto cards.
j) PRINT workspace:
It is used to generate a printed assembly listing.


k) Assembled object cards:
These contain the object program in the format required by the loader.

Lecture 10

7.8 Algorithm of PASS1 & PASS2 Assembler:

PASS 1: DEFINE SYMBOLS


The purpose of the first pass is to assign a location to each instruction and data defining
pseudo-instruction, and thus to define values for symbols appearing in the label fields of the source
program.
1) Initially, the Location Counter (LC) is set to the first location in the program (relative address
0). Then a source statement is read.
2) The operation-code field is examined to determine if it is a pseudo-op; if it is not, the table of
machine op-codes (MOT) is searched to find a match for the source statement's op-code field. The
matched MOT entry specifies the length (2,4 or 6 bytes) of the instruction.
3) The operand field is scanned for the presence of a literal. If a new literal is found, it is entered
into the Literal Table (LT) for later processing.
4) The label field of the source statement is then examined for the presence of a symbol. If there
is a label, the symbol is saved in the Symbol Table (ST) along with the current value of the location
counter.
5) The current value of the location counter is incremented by the length of the instruction and a
copy of the source card is saved for use by pass 2. The above sequence is then repeated for the
next instruction.
6) Pseudo-ops:
a) USING and DROP do neither. The assembler need only save the USING and DROP
cards for pass 2.
b) In the case of the EQU pseudo-op during pass 1, we are concerned only with defining
the symbol in the label field. This requires evaluating the expression in the operand field.


c) The DS and DC pseudo-ops can affect both the location counter and the definition of
symbols in pass 1.
d) When the END pseudo-op is encountered, pass 1 is terminated.

PASS 2: GENERATE CODE

1) Read copy file generated by pass 1.


2) As in pass 1, the operation code field is examined to determine if it is a pseudo-op; if it is not,
the table of machine op-codes (MOT) is searched to find a match for the card's op-code field. The
matching MOT entry specifies the length, binary op-code, and the format type of the instruction.
The operand fields of the different instruction format types require somewhat different processing.
3) For the RR-format instructions, each of the two register specification fields is evaluated. This
evaluation may be very simple, as in:
AR 2, 3.
4) For RX-format instructions, the register and index fields are evaluated and processed in the
same way as the register specifications for RR-format instructions.

5) Print the EQU card as part of the printed listing.


6) The USING and DROP pseudo-ops, which were largely ignored in pass 1, require additional
processing in pass 2. The operand fields of the pseudo-ops are evaluated; then the corresponding
Base Table entry is either marked as available, if USING, or unavailable, if DROP. The base table
is used extensively in pass 2 to compute the base and displacement fields for machine instructions
with storage operands.
7) The DS and DC pseudo-ops are processed essentially as in pass 1. In pass 2, however, actual
code must be generated for the DC pseudo-op.
8) The END pseudo-op indicates the end of the source program and terminates the assembly.
Various "housekeeping" tasks must now be performed. For example, code must be generated for
any literals remaining in the Literal Table.
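The base/displacement computation mentioned in step 6 can be sketched as follows. The base-table contents are invented for the example; the 0-4095 range reflects the 12-bit displacement field of the 360:

```python
# Given the base table (register -> assumed contents, i.e. the registers
# marked 'Y' by USING), find a register whose contents are at or below
# the target address and within a 12-bit displacement of it.
base_table = {15: 0x0000, 12: 0x1000}

def base_and_disp(address, bt):
    for reg, contents in bt.items():
        disp = address - contents
        if 0 <= disp <= 4095:          # displacement field is 12 bits
            return reg, disp
    raise ValueError("no base register covers this address")

# 0x1010 is out of reach of register 15 (disp 4112 > 4095),
# so register 12 is chosen with displacement 0x10.
print(base_and_disp(0x1010, base_table))
```

A production assembler would typically prefer the register giving the smallest displacement; this sketch simply takes the first one that fits.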

7.9 Detail Flowchart of Pass1 & Pass2 Assembler:


a) Detailed Pass 1 flowchart (figure not reproduced here)

b) Detailed Pass 2 flowchart (figure not reproduced here)


Lecture 11
7.1 Definition of Macro:


The assembly language programmer often finds certain statements being repeated in the
program. The programmer can take advantage of the 'MACRO' facility, where a MACRO is defined
to be a single-line abbreviation for a group of instructions.

The template to be followed for defining a MACRO is as follows:

MACRO Start of Definition

Macro Name

Macro Body

MEND END of Macro definition

7.2 MACRO instructions:


In its simplest form, a macro is an abbreviation for a sequence of operations.
Consider, the following program:
Example 1:
.
A 1, DATA Add contents of DATA to register 1
A 2, DATA Add contents of DATA to register 2
A 3, DATA Add contents of DATA to register 3
.
.
.

A 1, DATA Add contents of DATA to register 1


A 2, DATA Add contents of DATA to register 2
A 3,DATA Add contents of DATA to register 3


.
.
DATA DC F '5'

In the above program the sequence

A 1, DATA
A 2, DATA
A 3, DATA
occurs twice.

The above requirement can be achieved by using macro facility as follows.


Source Expanded source
MACRO
INCR

A 1,DATA
A 2,DATA
A 3,DATA

MEND
.
.
.
INCR            A 1, DATA
                A 2, DATA
                A 3, DATA
.
.


INCR            A 1, DATA
                A 2, DATA
                A 3, DATA

.
DATA DC F'5'
.
In this case the macro processor replaces each macro call with the lines

A 1, DATA
A 2, DATA
A 3, DATA

This process of replacement is called expanding the macro.
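This replacement step can be sketched in a few lines of Python (the table name `MACROS` and the line format are illustrative; the stored body mirrors the INCR example above):

```python
# Minimal sketch of "expanding the macro": every line whose mnemonic
# matches a stored macro name is replaced by the stored macro body.

MACROS = {
    "INCR": ["A 1, DATA", "A 2, DATA", "A 3, DATA"],
}

def expand(source_lines):
    out = []
    for line in source_lines:
        mnemonic = line.strip().split()[0] if line.strip() else ""
        if mnemonic in MACROS:
            out.extend(MACROS[mnemonic])   # substitute the stored body
        else:
            out.append(line)
    return out

print(expand(["L 1, DATA", "INCR", "ST 1, DATA"]))
# → ['L 1, DATA', 'A 1, DATA', 'A 2, DATA', 'A 3, DATA', 'ST 1, DATA']
```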


7.3 Definition & function of Macro processor:
• Macro processor is a program which is responsible for processing the macro.

• There are four basic tasks/ functions that any macro instruction processor must
perform.

1. Recognize macro definitions:

A macro instruction processor must recognize macro definitions identified by the MACRO
and MEND pseudo-ops.

2. Save the definitions:
The processor must store the macro instruction definitions, which it will need for
expanding macro calls.

3. Recognize calls:
The processor must recognize macro calls that appear as operation mnemonics. This
suggests that macro names be handled as a type of op-code.

4. Expand calls and substitute arguments:


The processor must substitute for dummy or macro definition arguments the corresponding
arguments from a macro call; the resulting symbolic (in this case, assembly language) text
is then substituted for the macro call. This text, of course, may contain additional macro
definitions or calls.
In summary: the macro processor must recognize and process macro definitions
and macro calls.

Lecture 12

7.4 Features of Macro facility:


1. Macro Instruction Arguments:
- The macro facility presented thus far is capable of inserting blocks of instructions in place
'of macro calls. All of the calls to any given macro will be replaced by identical blocks.
- This macro facility lacks flexibility: there is no way for a specific macro call to modify the
coding that replaces it,
- An important extension of this facility consists of providing for arguments, or parameters,
in macro calls or also called macro dummy arguments.

Example 2:
.
.
.

A 1, DATA1 Add contents of DATA1 to register 1


A 2, DATA1 Add contents of DATA1 to register 2
A 3, DATA1 Add contents of DATA1 to register 3
.
.
.

A 1, DATA2 Add contents of DATA2 to register 1


A 2, DATA2 Add contents of DATA2 to register 2


A 3,DATA2 Add contents of DATA2 to register 3

.
.
.
DATA1 DC F'5'
DATA2 DC F'10'

- In this case the instruction sequences are very similar but not identical. The first sequence
performs an operation using DATA1 as operand; the second, using DATA2.
- They can be considered to perform the same operation with a variable parameter, or
argument. Such a parameter is called a macro instruction argument or "dummy argument";
it is specified on the macro name line and distinguished by the ampersand (&), which is
always its first character (marking it as a macro language symbol rather than an assembly
language symbol).

- Program could be written as:


Source Expanded source
MACRO
INCR &ARG

A 1, &ARG
A 2, &ARG
A 3, &ARG

MEND
.
.
.
INCR DATA1      A 1, DATA1
                A 2, DATA1
                A 3, DATA1
.
.
INCR DATA2      A 1, DATA2
                A 2, DATA2
                A 3, DATA2

DATA1 DC F'5'
DATA2 DC F'10'
.
.
It is possible to supply more than one argument in a macro call.

Example 3:

A 1, DATA1 Add contents of DATA1 to register 1


A 2, DATA2 Add contents of DATA2 to register 2
A 3, DATA3 Add contents of DATA3 to register 3
.
.
.

A 1, DATA3 Add contents of DATA3 to register 1


A 2, DATA2 Add contents of DATA2 to register 2


A 3, DATA1 Add contents of DATA1 to register 3

.
.
.
DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'

Source Expanded source


MACRO
INCR &ARG1, &ARG2, &ARG3

A 1, &ARG1
A 2, &ARG2
A 3, &ARG3

MEND
.
.
.
INCR DATA1, DATA2, DATA3      A 1, DATA1
                              A 2, DATA2
                              A 3, DATA3
.
.
INCR DATA3, DATA2, DATA1      A 1, DATA3
                              A 2, DATA2
                              A 3, DATA1

DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'
.
.
2. Conditional Macro Expansion:
- Two important macro processor pseudo-ops, AIF and AGO, permit conditional
reordering of the sequence of macro expansion.
-This allows conditional selection of the machine instructions that appear in expansions of a macro
call.
Example 4:

LOOP1 A 1, DATA1 Add contents of DATA1 to register 1
      A 2, DATA2 Add contents of DATA2 to register 2
      A 3, DATA3 Add contents of DATA3 to register 3
.
.
.

LOOP2 A 1, DATA3 Add contents of DATA3 to register 1
      A 2, DATA2 Add contents of DATA2 to register 2


.
.
.
LOOP3 A 1, DATA1 Add contents of DATA1 to register 1

.
.
.
DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'

In this example, the operands, labels, and the number of instructions generated change in each
sequence. This program could be written as follows:
.
.
.

MACRO
&ARG0 VARY &COUNT, &ARG1, &ARG2, &ARG3
&ARG0 A 1, &ARG1
AIF (&COUNT EQ 1) .FINI Test if &COUNT = 1
A 2, &ARG2
AIF (&COUNT EQ 2) .FINI Test if &COUNT = 2
A 3, &ARG3
.FINI MEND
.
. Expanded Source
.
LOOP1 VARY 3, DATA1, DATA2, DATA3      LOOP1 A 1, DATA1
                                             A 2, DATA2
                                             A 3, DATA3
.
.
LOOP2 VARY 2, DATA3, DATA2             LOOP2 A 1, DATA3
                                             A 2, DATA2
.
LOOP3 VARY 1, DATA1                    LOOP3 A 1, DATA1

DATA1 DC F'5'
DATA2 DC F'10'
DATA3 DC F'15'

- Labels starting with a period (.), such as .FINI, are macro labels and do not
appear in the output of the macro processor.
- The statement AIF (&COUNT EQ 1) .FINI directs the macro processor to skip to the statement
labeled .FINI if the parameter corresponding to &COUNT is a 1; otherwise, the macro processor
is to continue with the statement following the AIF pseudo-op.
- AIF is a conditional branch pseudo-op; it performs an arithmetic test and branches only if the
tested condition is true.
- AGO is an unconditional branch pseudo-op, or 'go to' statement.
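A hedged sketch of how an expander might act on AIF follows. The representation of the macro body as (label, text) pairs is an assumption of this sketch, and only the simple "(&COUNT EQ n) .LABEL" form from the VARY example is parsed:

```python
# Illustrative only: conditional macro expansion driven by AIF.
# Macro labels (starting with '.') never appear in the output.

def expand_with_aif(body, args):
    """body: list of (label, text) pairs; args: dummy-argument values."""
    # map macro labels to their positions, for use as branch targets
    labels = {lab: i for i, (lab, _) in enumerate(body) if lab}
    out, i = [], 0
    while i < len(body):
        lab, text = body[i]
        if text.startswith("AIF"):
            # e.g.  AIF (&COUNT EQ 1) .FINI
            cond, target = text[text.index("("):].split(")")
            dummy, _, value = cond.strip("( ").split()
            if args[dummy] == value:          # test true: branch
                i = labels[target.strip()]
                continue
        elif text != "MEND":
            # substitute dummy arguments, then emit the line
            for d, v in args.items():
                text = text.replace(d, v)
            out.append(text)
        i += 1
    return out

body = [
    (None, "A 1, &ARG1"),
    (None, "AIF (&COUNT EQ 1) .FINI"),
    (None, "A 2, &ARG2"),
    (".FINI", "MEND"),
]
print(expand_with_aif(body, {"&COUNT": "1", "&ARG1": "DATA1", "&ARG2": "DATA2"}))
# → ['A 1, DATA1']
```

With &COUNT = 2 the AIF test fails, so expansion falls through and both A instructions are emitted, matching the VARY example above.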

3) Macro Calls Within Macros:


- Since macro calls are "abbreviations" of instruction sequences, it seems reasonable that
such "abbreviations" should be available within other macro definitions.
For example,

Example 5:
MACRO
ADD1 & ARG


L 1, & ARG
A 1, =F'1'
ST 1, & ARG
MEND
MACRO
ADDS &ARG1, &ARG2 , &ARG3
ADD1 &ARG1
ADD1 &ARG2
ADD1 &ARG3
MEND

- Within the definition of the macro 'ADDS' are three separate calls to the previously defined
macro 'ADD1', which makes the definition more easily understood.
- Such use of macro results in macro expansions on multiple levels.

Lecture 13
7.5 Two Pass Macro processor:
The macro processor algorithm makes two passes over the input text, searching first for macro
definitions and then for macro calls.
Format of Databases:
1) Argument List Array:
- The Argument List Array (ALA) is used during both pass 1 and pass 2.
During pass 1, in order to simplify later argument replacement during macro expansion, dummy
arguments in the macro definition are replaced with positional indicators when the definition is
stored: the ith dummy argument on the macro name card is represented in the body of the macro
by the index marker symbol #i, where # is a symbol reserved for the use of the macro processor
(i.e., not available to the programmers).
- These symbols are used in conjunction with the argument list prepared before expansion
of a macro call. The symbolic dummy arguments are retained on the macro name card to


enable the macro processor to handle argument replacement by name rather than by
position.

- During pass 2 it is necessary to substitute macro call arguments for the index markers stored
in the macro definition.

Argument List Array:

Index    8 bytes per entry

0    "bbbbbbbb" (all blank)
1    "DATA1bbb"
2    "DATA2bbb"
3    "DATA3bbb"

2) Macro Definition Table:

- The Macro Definition Table (MDT) is a table of text lines.


- Every line of each macro definition, except the MACRO line, is stored in the MDT. (The
MACRO line is useless during macro expansion.)
- The MEND is kept to indicate the end of the definition; and the macro name line is retained
to facilitate keyword argument replacement.

Macro Definition Table

80 bytes per entry


Index Card
. .
. .
15 &LAB INCR &ARG1,&ARG2,&ARG3
16 #0 A 1, #1


17 A 2, #2
18 A 3,#3
19 MEND
. .
. .
. .

3) The Macro Name Table (MNT):


- MNT serves a function very similar to that of the assembler's Machine-Op Table (MOT)
and Pseudo-Op Table(POT).
- Each MNT entry consists of a character string (the macro name) and a pointer (index) to
the entry in the MDT that corresponds to the beginning of the macro definition.

Index 8 Bytes 4 Bytes


. . .
. . .
3 "INCRbbbb" 15
. . .
. . .

7.6 Data bases required for Pass1 & Pass2 Macro processor:

The following data bases are used by the two passes of the macro processor:

Pass 1 data bases:


1. The input macro source deck
2. The output macro source deck copy for use by pass 2
3. The Macro Definition Table (MDT), used to store the body of the macro definitions
4. The Macro Name Table (MNT), used to store the names of defined macros
5. The Macro Definition Table Counter (MDTC), used to indicate the next available entry in the
MDT


6. The Macro Name Table Counter (MNTC), used to indicate the next available entry in the MNT
7. The Argument List Array (ALA), used to substitute index markers for dummy arguments before
storing a macro definition
Pass 2 data bases:
1. The copy of the input macro source deck
2. The output expanded source deck to be used as input to the assembler
3. The Macro Definition Table (MDT), created by pass 1
4. The Macro Name Table (MNT), created by pass 1
5. The Macro Definition Table Pointer (MDTP), used to indicate the next line of text to be used
during macro expansion
6. The Argument List Array (ALA), used to substitute macro call arguments for the index markers
in the stored macro definition

Lecture 14
7.7 Algorithm of Pass1 & Pass2 Macro processor:
PASS 1 - MACRO DEFINITION: The algorithm for pass 1 tests each input line. If it is a MACRO
pseudo-op:
1) The entire macro definition that follows is saved in the next available locations in the
Macro Definition Table (MDT).
2) The first line of the definition is the macro name line. The name is entered into
the Macro Name Table (MNT), along with a pointer to the first location of the
MDT entry of the definition.
3) When the END pseudo-op is encountered, all of the macro definitions have been processed
so control transfers to pass 2 in order to process macro calls.
PASS2-MACRO CALLS AND EXPANSION:
The algorithm for pass 2 tests the operation mnemonic of each input line to see if it is a name in
the MNT. When a call is found:
1) The macro processor sets a pointer, the Macro Definition Table Pointer (MDTP) , to
the corresponding macro definition stored in the MDT. The initial value of the MDTP
is obtained from the "MDT index" field of the MNT entry.
2) The macro expander prepares the Argument List Array(ALA) consisting of a table of
dummy argument indices and corresponding arguments to the call.


3) Reading proceeds from the MDT; as each successive line is read, the values
from the argument list are substituted for dummy argument indices in the macro
definition.
4) Reading of the MEND line in the MDT terminates expansion of the macro, and
scanning continues from the input file.
5) When the END pseudo-op is encountered, the expanded source deck is transferred to
the assembler for further processing.
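The two passes above can be sketched compactly in Python. This is an illustration, not the full algorithm: card formats, macro labels, and the explicit MDTC/MNTC counters are simplified away, and only positional arguments are handled.

```python
# Two-pass macro processor sketch using the MDT, MNT and ALA databases.

def parse_name_and_args(line):
    parts = line.split(None, 1)
    if not parts:
        return "", []
    args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    return parts[0], args

def pass1(source):
    """Store macro definitions in the MDT and their names in the MNT;
    return the copy of the source with the definitions removed."""
    mdt, mnt, copy = [], {}, []
    i = 0
    while i < len(source):
        if source[i].strip() == "MACRO":
            i += 1
            name, dummies = parse_name_and_args(source[i])
            mnt[name] = len(mdt)          # MNT entry points into the MDT
            mdt.append(source[i])         # keep the macro name card
            i += 1
            while True:                   # store the body up to MEND
                body = source[i]
                for n, d in enumerate(dummies, 1):
                    body = body.replace(d, "#%d" % n)   # index markers
                mdt.append(body)
                i += 1
                if body.strip() == "MEND":
                    break
        else:
            copy.append(source[i])
            i += 1
    return mdt, mnt, copy

def pass2(mdt, mnt, copy):
    """Expand every macro call found in the copy of the source."""
    expanded = []
    for line in copy:
        op, ala = parse_name_and_args(line)   # ALA: call args by position
        if op in mnt:
            mdtp = mnt[op] + 1                # MDTP: skip the name card
            while mdt[mdtp].strip() != "MEND":
                text = mdt[mdtp]
                for n, arg in enumerate(ala, 1):
                    text = text.replace("#%d" % n, arg)
                expanded.append(text)
                mdtp += 1
        else:
            expanded.append(line)
    return expanded

source = [
    "MACRO",
    "INCR &ARG1, &ARG2",
    "A 1, &ARG1",
    "A 2, &ARG2",
    "MEND",
    "INCR DATA1, DATA2",
    "END",
]
print(pass2(*pass1(source)))
# → ['A 1, DATA1', 'A 2, DATA2', 'END']
```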
7.8 Flowchart of Pass1 & Pass2 Macro processor:

Pass 1: processing macro definitions


Pass 2: processing macro calls and expansion


Difference between MACROS and PROCEDURE / Subroutine:

1. Macro: The corresponding machine code is written every time the macro is called in a
program. Procedure: The corresponding machine code is written only once in memory.

2. Macro: Program takes up more memory space. Procedure: Program takes up comparatively
less memory space.

3. Macro: No transfer of program counter. Procedure: Transfer of program counter is required.

4. Macro: No overhead of using a stack for transferring control. Procedure: Overhead of using a
stack for transferring control.

5. Macro: Execution is fast. Procedure: Execution is comparatively slow.

6. Macro: Assembly time is more. Procedure: Assembly time is comparatively less.

7. Macro: More advantageous when the repeated group of instructions is quite short.
Procedure: More advantageous when the repeated group of instructions is quite large.

2.9 References:
J.J. Donovan, "System Programming"
D. M. Dhamdhere, "Systems Programming & Operating Systems"
2.10 Objective Questions with answers:

Q.1 The translator which perform macro expansion is called a


(A) Macro processor (B) Macro pre-processor
(C) Micro pre-processor (D) assembler

Q.2 Nested Macro calls are expanded using the


(A) FIFO rule (First in first out) (B) LIFO (Last in First out)
(C) FILO rule (First in last out) (D) None of the above

Q.3 The expansion of nested macro calls follows


(A) FIFO rule. (B) LIFO rule.
(C) LILO rule. (D) priority rule.
Q.4 A macro definition consists of
(A) A macro prototype statement (B) One or more model statements
(C) Macro pre-processor statements (D) All of the above

Q.5 Which of the following is not part of data structure of macro processor?

A)MNT B)MDT
C)MOT D)ALA
Q.6 Which system software used to define the macros?

A) Compiler B) Interpreter
C) Assembler D) Macro processor
Q.7 Which directive used to start the macro in program?

A) START B)MACRO
C)MEND D)none of the above
Q.8 In a two-pass macro processor, the task of pass 1 is:

A) Processing macro calls


B) Macro Expansion
C) Processing macro definition
D) Processing macro calls & expansion
Q.9 State true/ False:

Macros are faster than procedure. - TRUE

Q.10 Macros are not available in HLL. - FALSE


Q.11 In a two-pass assembler, the task of the Pass II is to


(A) Separate the symbol, mnemonic opcode and operand fields
(B) Build the symbol table
(C) Construct intermediate code
(D) Synthesize the target program

Q.12 The syntax of the assembler directive EQU is


(A) EQU <address space> (B) <symbol>EQU<address space>
(C) <Symbol>EQU (D) None of the above

Q.13 Which of these translate assembly code into m/c code?


(A) Complier (B) Interpreter
(C )None of the above (D) Assembler

Q.14 Which of the following is not an assembler directives?


(A) stack (B)model
(C ) call (D) db

Q.15 In a two pass assembler, the pseudo-op EQU is evaluated during-
(A) Pass1 (B) Pass2
(C )Not evaluated by the assembler (D)None of the above

Q.16 Writing a software in Assembly language is preferred to writing in HLL when


a) Memory space is limited
b) Programmer's productivity is important
c) Portability is important
d) Optimal use of available h/w resources is of primary concern.

Q.17 An assembly language directive is:


a) Same as an instruction
b) Used to define space for variable


c) Used to start a program


d) To give commands to an assembler

Q.18 SPARC stands for:

a) serial processor architecture
b) sun micro system processor architecture
c) scalable processor architecture
d) none of the above
Q.19 IBM stands for:
a) International Business Machine
b) International Business Management
c) Institute of Business Management
d) None of the above
Q.20 MASM stands for:
a) Microsoft Assembler
b) Machine assembler
c)None of the above

2. 11. Short Questions

Q.1 What is assembly language? What kinds of statements are present in an assembly language
program? Discuss. Also highlight the advantages of assembly language.
Ans: Assembly language is a family of low-level languages for programming computers,
microprocessors, microcontrollers, etc. They implement a symbolic representation of the numeric
machine codes and other constants needed to program a particular CPU architecture. This
representation is usually defined by the hardware manufacturer, and is based on abbreviations
(called mnemonics) that help the programmer remember individual instructions, registers, etc.
Assembly language programming is writing machine instructions in mnemonic form, using an
assembler to convert these mnemonics into actual processor instructions and associated data.


An assembly program contains following three kinds of statements:


1. Imperative statements: These indicate an action to be performed during execution of the
assembled program. Each imperative statement typically translates into one machine instruction.
2. Declaration statements: The syntax of declaration statements is as follows:
[Label] DS <constant>
[Label] DC '<value>'
The DS statement reserves areas of memory and associates names with them.
The DC statement constructs memory words containing constants.
3. Assembler directives: These instruct the assembler to perform certain actions during the
assembly of a program. For example
START <constant> directive indicates that the first word of the target program generated by the
assembler should be placed in the memory word with address <constant>.

The advantages of assembly language program would be


• reduced errors
• faster translation times
• changes could be made easier and faster

Q.2 What are the functions of passes used in two-pass assembler? Explain pass-1
algorithm?Describe Data structures used during passes of assembler and their use.
Ans:
Data structure during passes of assembler and their use.
Pass 1 data base
1. Input source program
2. A location counter (LC)
3. A table, the machine-operation table (MOT), that indicates the symbolic
mnemonic for each instruction and its length.
4. Pseudo- operation table
5. Symbol table
6. Literal table
7. Copy of the input to be used later by pass 2


Pass 2
1. Copy of source program input to pass 1
2. Location counter (LC)
3. MOT
4. POT
5. ST
6. Base table that indicates which registers are currently specified as base register.
7. A work space, INST, that's used to hold the instruction as its various parts are being
assembled together
8. Punch line, used to produce a printed listing.
9. Punch card for converting assembled instructions into the format needed by the loader.

Q.3 Can the operand expression in an ORG statement contain forward references? If so, outline
how the statement can be processed in a two-pass assembly scheme.
Ans:
ORG (origin) is an assembler directive that:
• Indirectly assigns values to symbols
• Resets the location counter to the specified value (the ORG value)
• The value can be a constant, another symbol, or an expression
• No forward references are allowed
- Assemblers scan the source program, generating machine instructions. Sometimes,
the assembler reaches a reference to a variable, which has not yet been defined. This
is referred to as a forward reference problem. It is resolved in a two-pass assembler as follows:
- On the first pass, the assembler simply reads the source file, counting up the number
of locations that each instruction will take, and builds a symbol table in memory
that lists all the defined variables cross-referenced to their associated memory
address.
- On the second pass, the assembler substitutes opcodes for the mnemonics and variable
names are replaced by the memory locations obtained from the symbol table.
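The two-pass resolution described above can be sketched as follows. The three-field statement tuples, the one-word-per-statement size, and the opcode values are all invented for illustration:

```python
# Sketch: pass 1 builds the symbol table so that pass 2 can resolve
# forward references when substituting opcodes and operand addresses.

def assemble(source, opcodes):
    # Pass 1: count locations and build the symbol table.
    symtab, lc = {}, 0
    for label, mnemonic, operand in source:
        if label:
            symtab[label] = lc
        lc += 1                      # assume every statement is one word

    # Pass 2: substitute opcodes and resolve symbolic operands,
    # including forward references collected during pass 1.
    out = []
    for label, mnemonic, operand in source:
        out.append((opcodes[mnemonic], symtab.get(operand, operand)))
    return out

source = [
    (None, "L",  "X"),     # forward reference: X is defined later
    (None, "ST", "X"),
    ("X",  "DC", 5),
]
opcodes = {"L": 0x58, "ST": 0x50, "DC": 0x00}
print(assemble(source, opcodes))
# → [(88, 2), (80, 2), (0, 5)]
```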
Q.4 Explain the differences between macros and subroutines.
Q.5 What is macro-expansion?


Q.6 Explain databases used in pass1 & pass2 macro processor?

Q.7 Define macro.

Q.8 What are the advantages and disadvantages of macro pre-processor?
- Ans: The advantage of macro pre-processor is that any existing conventional assembler
can be enhanced in this manner to incorporate macro processing. It would reduce the
programming cost involved in making a macro facility available.
- The disadvantage is that this scheme is probably not very efficient because of the time
spent in generating assembly language statement and processing them again for the purpose
of translation to the target language
Q.9 Explain macro definition, macro call and macro expansion?

2.12 Long Questions:

Q.1 What is forward reference & Forward Reference problem?


Q.2 What is FRP? How to solve forward reference problem in multi pass assembler?
Q.3 With the reference to assembler explain the following tables with suitable examples:

i) POT
ii) MOT
iii) ST
iv) LT

Q.4 What are various databases used in two pass assembler design. Explain flowchart with
example.

Q.5) Explain two pass macro processor with flowchart and databases.

Q.6) Explain different pseudo-ops used for conditional macro expansion, along with
examples.


MODULE 3

Loaders and Linkers


3.1. Motivation:
Motivation of this chapter is to provide the students with the knowledge of
 Understanding different functions of Loaders and Linkers
 Study of different types of Loaders and Linkers
 How to design & implement Loaders and Linkers

3.2. Learning Objective: Students will be able to:


• Understand different functions of Loaders and Linkers
• Design & implement Loader
• Design & implement Linker

3.3 Syllabus:

Lecture Content Duration Self Study


15 Linkers: Introduction, Relocation of 2 Lecture 3 hours
Linking Concept

16 Design of a Linker. 1 Lecture 2 hours

17 Loaders: Loader and Function of Loader 1 Lecture 2 hours

18 Loader schemes 1 Lecture 2 hours

19 Design of Direct linking loader. 1 Lecture 2 hours

3.4. Learning Outcomes:

The student should be able to :


 Learn the concept of loader and linker
 Describe the linkage editor.


 Understand the working of dynamic linking loader

3.5. Key Definitions:


Linker
Linker is a program that takes one or more objects generated by a compiler and combines them
into a single executable program.
Loader
Loader is the part of an operating system that is responsible for loading programs from
executables (i.e., executable files) into memory, preparing them for execution and then executing
them.

Dynamic Linking

Many operating system environments allow dynamic linking, that is the postponing of the
resolving of some undefined symbols until a program is run. That means that the executable code
still contains undefined symbols, plus a list of objects or libraries that will provide definitions for
these. Loading the program will load these objects/libraries as well, and perform a final linking.

Relocation

It is the process of replacing symbolic references or names of libraries with actual usable addresses
in memory before running a program. It is typically done by the linker during compilation (at
compile time), although it can be done at runtime by a relocating loader. Compilers or assemblers
typically generate the executable with zero as the lower-most starting address.

Relocation Table
It can also be provided in the header of the object code file. Each "fixup" is a pointer to an address
in the object code that must be changed when the loader relocates the program. Fixups are designed
to support relocation of the program as a complete unit. In some cases, each fixup in the table is
itself relative to a base address of zero, so the fixups themselves must be changed as the loader
moves through the table
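The fixup mechanism described above can be sketched in a few lines. The object-code layout below is invented for illustration; real relocation entries carry more information (field length, sign, segment):

```python
# Sketch of relocation via a fixup table. Each fixup is an offset into
# the object code whose word holds an address constant assembled relative
# to zero; the loader adds the actual load address to it.

def relocate(code, fixups, load_address):
    code = list(code)                    # don't mutate the caller's copy
    for offset in fixups:
        code[offset] += load_address     # patch the address constant
    return code

# Invented object code: the words at offsets 1 and 3 are address constants.
code   = [0x58, 0x0004, 0x50, 0x0008, 0x00]
fixups = [1, 3]
print(relocate(code, fixups, 0x4000))
# → [88, 16388, 80, 16392, 0]
```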

Self-relocation


It is a process, which an executing computer program may use to effect a change in the base address
at which that computer program executes. This is similar to the relocation process employed by
the loader when a program is copied from external storage into main memory; the difference is the
locus of instructions that compute the relocation. When those instructions reside within the
relocated program, self-relocation occurs.

Linkage editor

An editor program that creates one module from several by resolving cross-references among the
modules.

Linkage editor

Its processing follows the source program assembly or compilation of any problem program. The
linkage editor is both a processing program and a service program used in association with the
language translators.

Linking loader

It performs all linking and relocation operations, including automatic library search, and loads the
linked program into memory for execution. A linkage editor, in contrast, produces a linked version
of the program, which is normally written to a file or library for later execution.

Simple relocating loader (one pass)


It can be used to load the program into memory for execution. The linkage editor performs
relocation of all control sections relative to the start of the linked program. The only object-code
modification necessary is the addition of the actual load address to relative values within the
program.


3.7 Course Contents:

Lecture 15
Basic Functions
• Allocation: allocate space in memory for the programs
• Linking: Resolve symbolic references between object files
– combines two or more separate object programs
– supplies the information needed to allow references between them
• Relocation: Adjust all address dependent locations, such as address constants, to
correspond to the allocated space
– modifies the object program so that it can be loaded at an address different from
the location originally specified
• Loading: Physically place the machine instructions and data into memory

Fig. 1 : Basic Functions

Lecture 16
6.2 Design of Absolute Loader


Its operation is very simple; no linking or relocation

6.2.1 Disadvantages of Absolute Loaders


• Actual load address must be specified
• The programmer must be careful not to assign two subroutines to the same or overlapping
locations
• Difficult to use subroutine libraries (scientific and mathematical) efficiently
– important to be able to select and load exactly those routines that are needed
• Allocation - by programmer
• Linking - by programmer
• Relocation - None required-loaded where assembler assigned
• Loading - by loader

6.2.2 Loader Schemes


• Compile and Go
– The assembler run in one part of memory


– place the assembled machine instructions and data, as they are assembled, directly
into their assigned memory locations
– When the assembly is completed, the assembler causes a transfer to the starting
instruction of the program

6.2.3 Disadvantages of Compile and Go

• A portion of memory is wasted because the memory occupied by the assembler is


unavailable to the object program.
• It is necessary to re-translate (assemble) the user's program file every time it is run.
• It is very difficult to handle multiple segments, especially if the source programs are in
different languages.
• If changes were made to MAIN that increased its length to more than 300 bytes, the end
of MAIN (at 100 + 300 = 400) would overlap the start of SQRT (at 400).
• It would then be necessary to assign SQRT to a new location, changing its START and
re-assembling it.
• Furthermore, it would also be necessary to modify all other subroutines that referred to
the address of SQRT.

Lecture 17 & 18

6.3 Design of Direct Linking Loader


• The assembler provides
1. The length of segment
2. A list of all entries and their relative location within the segment
3. A list of all external symbols
4. Information as to where address constants are loaded in the segment and a
description of how to revise their values.
5. The machine code translation of the source program and the relative addresses
assigned


• Assembler records
• External Symbol Dictionary (ESD) record: Entries and Externals
• Text (TXT) records contain the actual object code: the translated version of the source program.
• The Relocation and Linkage Directory (RLD) records relocation information
• The END record specifies the starting address for execution
• SD: Segment Definition
• Local Definition
• External Reference
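A much-simplified sketch of how a direct-linking loader could use these records follows. The dictionary record layouts, the `gest` (global external symbol table) name, and the segment contents are invented stand-ins for the real card formats:

```python
# Illustration: ESD entries build a global symbol table in pass 1, TXT
# records are placed at relocated addresses in pass 2, and RLD records
# patch address constants using the global symbol table.

def load(segments, start):
    gest, placed, load_addr = {}, {}, start
    # Pass 1: allocate each segment and collect its ESD definitions.
    for seg in segments:
        base = load_addr
        for name, rel in seg["esd"]:          # SD / LD entries
            gest[name] = base + rel
        placed[seg["name"]] = base
        load_addr += seg["length"]
    # Pass 2: place TXT records and apply RLD patches against the table.
    memory = {}
    for seg in segments:
        base = placed[seg["name"]]
        for rel, word in seg["txt"]:
            memory[base + rel] = word
        for rel, symbol in seg["rld"]:        # add the external symbol's value
            memory[base + rel] += gest[symbol]
    return memory

segments = [
    {"name": "MAIN", "length": 8, "esd": [("MAIN", 0)],
     "txt": [(0, 0x58), (4, 0)], "rld": [(4, "SQRT")]},
    {"name": "SQRT", "length": 4, "esd": [("SQRT", 0)],
     "txt": [(0, 0x07)], "rld": []},
]
mem = load(segments, 0x100)
# mem[0x104] == 0x108: the address constant now holds SQRT's load address
```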


6.3.1 Disadvantages of Direct Linking


• It is necessary to allocate, relocate, link, and load all of the subroutines each time in order
to execute a program
• Loading process can be extremely time consuming.
• Though smaller than the assembler, the loader absorbs a considerable amount of space
• Dividing the loading process into two separate programs a binder and a module loader
can solve these problems

6.4 Difference between Linkage editor & linking loader


Sr.No  Linkage Editor / Linking Loader

1. Linkage editor produces a linked version of the program, which is written to a file or
library for later execution. Linking loader performs all linking & relocation operations,
including automatic library search if specified, and loads the linked version of the program
directly into memory for execution.

2. Linkage editor generates a file which is supplied later to the relocating loader for
execution. Linking loader does not generate a file, but directly loads the program into
memory for execution.

3. With a linkage editor, resolution of external references & library searching are performed
only once. A linking loader searches libraries & resolves external references every time the
program is executed.

4. Linkage editor is suitable if a program needs to be executed many times without being
reassembled. Linking loader is suitable in a program development & testing environment.

3.8 References:
1. Leland Beck, "System Software", Addison-Wesley
2. D. M. Dhamdhere, "Systems Programming & Operating Systems", Tata
McGraw Hill
3. J.J. Donovan, "System Programming"

3.9 Objective questions.


Q1 A program that is responsible for loading programs from executables into memory is called:

A. Linker
B. Loader
C. Compiler
D. Interpreter

Q 2 Postponing of the resolving of some undefined symbols until a program is run is called as

A. Dynamic Linking

B. Dynamic loading

C. Linking

D. Loading


3.10 SUBJECTIVE QUESTIONS

1) What is a compiler? Explain its phases.

2) Explain the functions of a loader.
3) What are dynamic linking and loading?

3.11 Long questions

1. What are the different functions of a loader? Explain in brief.

2. What is the difference between dynamic loading and dynamic linking? Explain with an
example.


Module 4
Introduction to Compilers and Lexical Analysis

4.1. Motivation:
• To provide the students with knowledge of the compiler and its phases

4.2. Learning Objective: Students will be able to :


• Define and compare the Compiler and Interpreter
• Illustrate the different phases of Compiler with the help of example

4.3. Syllabus:
Prerequisites: Fundamentals of programming languages
Syllabus: Introduction to Compilers: design issues, passes, phases.
Lexical Analysis: the role of a lexical analyzer, input buffering, specification and
recognition of tokens, automatic construction of a lexical analyzer using LEX
Duration: 2 Hr
Self Study: 2 Hr
4.4. Learning Outcomes: Students should be able to:


• Differentiate between compiler and Interpreter
• Learn/Understand working of compiler

Lecture 20

Compiler
A compiler is a computer program (or set of programs) that transforms source code written in a
programming language (the source language) into another computer language (the target
language, often having a binary form known as object code). The most common reason for wanting
to transform source code is to create an executable program.


Interpreters

An interpreter is also a program that translates a high-level language into a low-level one,
but it does so at the moment the program is run. You write the program using a text editor or
something similar, and then instruct the interpreter to run the program. It takes the program
one line at a time and translates each line before running it: it translates the first line
and runs it, then translates the second line and runs it, and so on. The interpreter has no
"memory" for the translated lines, so if it comes across lines of the program within a loop,
it must translate them afresh every time that particular line runs.


Lecture 21 Phases of Compiler

1. Analysis Phase :

Analysis Phase performs 3 actions namely

a) Lexical analysis - groups the characters of the source program into meaningful sequences
called tokens. The input is the source program and the output is a stream of tokens.

b) Syntax analysis - the input is the token stream and the output is a parse tree.


c) Semantic analysis - the input is the parse tree and the output is an expanded (annotated) version of the parse tree.

2 .Synthesis Phase :

Synthesis Phase performs 3 actions namely

d) Intermediate code generation - the checked program is translated into an intermediate
code.

e) Code optimization - the intermediate code is improved here so that better target code can
be generated.

f) Code generation - this is the final step, and here the target program code is generated.

Lecture: 22
Role of a Lexical analyzer, input buffering, specification and recognition of tokens
Learning Objective: In this lecture students will be able to design a lexical analyzer.

4.9.1 Role of lexical Analyzer:


A program or function that performs lexical analysis is called a lexical analyzer, lexer, or
scanner.

Lexical grammar

The specification of a programming language will often include a set of rules which defines the
lexer. These rules are usually called regular expressions and they define the set of possible
character sequences that are used to form individual tokens or lexemes.

Token
A token is a string of characters, categorized according to the rules as a symbol (e.g. IDENTIFIER,
NUMBER, COMMA, etc.). The process of forming tokens from an input stream of characters is
called tokenization and the lexer categorizes them according to a symbol type
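The tokenization process just described can be sketched with a small scanner driven by regular expressions. This is a minimal illustration, not the LEX tool itself; the token classes and the sample statement are choices made for the example:

```python
import re

# Each token class is specified by a regular expression, as in a lexical grammar.
# The token names and the sample input are illustrative, not fixed by the text.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("STAR",       r"\*"),
    ("SKIP",       r"\s+"),          # whitespace is discarded, not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Demarcate and classify the lexemes of the input string."""
    tokens = []
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
    return tokens

print(tokenize("position = initial + rate * 60"))
```

Each pair in the output is a (token type, lexeme) pair, exactly the categorization step described above.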

Scanner


The scanner is usually based on a finite state machine. It has encoded within it information on
the possible sequences of characters that can be contained within any of the tokens it handles
(individual instances of these character sequences are known as lexemes)
Tokenization
It is the process of demarcating and possibly classifying sections of a string of input characters

Finite-state machine (FSM)


Finite-state automaton or simply a state machine, is a mathematical abstraction sometimes used
to design digital logic or computer programs. It is a behavior model composed of a finite number
of states, transitions between those states, and actions, similar to a flow graph in which one can
inspect the way logic runs when certain conditions are met. It has finite internal memory, an input
feature that reads symbols in a sequence, one at a time without going backward; and an output
feature, which may be in the form of a user interface, once the model is implemented. The
operation of an FSM begins from one of the states (called a start state), goes through transitions
depending on input to different states and can end in any of those available, however only a certain
set of states mark a successful flow of operation (called accept states).
Deterministic finite state machine
Also known as a deterministic finite automaton (DFA), it is a finite state machine accepting finite
strings of symbols. For each state, there is a transition arrow leading out to a next state for each
symbol. Upon reading a symbol, a DFA jumps deterministically from a state to another by
following the transition arrow. A DFA has a start state (denoted graphically by an arrow coming
in from nowhere) where computations begin, and a set of accept states (denoted graphically by a
double circle) which help define when a computation is successful.
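The deterministic jump from state to state can be sketched as a transition table plus a loop. The DFA below, accepting binary strings that end in 1, is a hypothetical example chosen for illustration:

```python
# DFA recognizing binary strings that end in '1'.
# States: 'q0' (start, rejecting) and 'q1' (accepting); transitions are total.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}
START, ACCEPT = "q0", {"q1"}

def dfa_accepts(w):
    """Read symbols one at a time; each step is uniquely determined."""
    state = START
    for ch in w:
        state = DELTA[(state, ch)]
    return state in ACCEPT

print(dfa_accepts("1011"))  # True
print(dfa_accepts("10"))    # False
```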

Nondeterministic finite state machine


Nondeterministic finite automaton (NFA) is a finite state machine where for each pair of state
and input symbol there may be several possible next states. This distinguishes it from the
deterministic finite automaton (DFA), where the next possible state is uniquely determined.
Although the DFA and NFA have distinct definitions, it may be shown in the formal theory that
they are equivalent, in that, for any given NFA, one may construct an equivalent DFA, and vice-
versa.
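The NFA-DFA equivalence mentioned above rests on tracking the set of states the NFA could be in at once; simulating an NFA this way is the core idea behind the subset construction. The example NFA below, accepting strings over {a, b} that contain the substring "ab", is hypothetical:

```python
# NFA accepting strings over {a, b} that contain the substring "ab".
# delta maps (state, symbol) -> set of possible next states.
NFA_DELTA = {
    (0, "a"): {0, 1},  # stay in 0, or guess that this 'a' starts "ab"
    (0, "b"): {0},
    (1, "b"): {2},     # complete "ab"
    (2, "a"): {2},
    (2, "b"): {2},
}
NFA_START, NFA_ACCEPT = {0}, {2}

def nfa_accepts(w):
    """On-the-fly subset construction: track every state the NFA may be in."""
    states = set(NFA_START)
    for ch in w:
        states = set().union(*(NFA_DELTA.get((s, ch), set()) for s in states))
    return bool(states & NFA_ACCEPT)

print(nfa_accepts("aab"))  # True
print(nfa_accepts("ba"))   # False
```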


Let’s check the take away from this lecture

Q 1. Postponing of the resolving of some undefined symbols until a program is run is called as

A. Dynamic Linking

B. Dynamic loading

C. Linking

D. Loading

Q 2 Automaton where the next possible state is uniquely determined is called as

A. NFA

B. DFA

C. Turing Machine

Answers: 1) A 2) B

L21 & 22. Exercise:


1) Explain NFA and DFA with examples.
2) Define finite state automata. What is their role in compiler theory? Explain in detail.
3) Write a short note on lexical analysis.

Questions/problems for practice:


GATE Exam Questions
1. The number of tokens in the following C statement is

printf("i = %d, &i = %x", i, &i);

(A) 3
(B) 26
(C) 10
(D) 21

Answer: (C)

2.In a compiler, keywords of a language are recognized during


(A) parsing of the program


(B) the code generation


(C) the lexical analysis of the program
(D) dataflow analysis

Answer: (C)

3. The lexical analysis for a modern computer language such as Java needs the power of which
one of the following machine models in a necessary and sufficient sense?
(A) Finite state automata
(B) Deterministic pushdown automata
(C) Non-Deterministic pushdown automata
(D) Turing Machine

Answer: (A)

Learning from the lecture "Role of a Lexical Analyzer, Input Buffering, Specification and
Recognition of Tokens": Students will be able to list and explain the role of the lexical
analyzer in compiler design.

4.8 References:

1. A. V. Aho and J. D. Ullman, "Principles of Compiler Design", Pearson Education

2. A. V. Aho, R. Sethi and J. D. Ullman, "Compilers: Principles, Techniques and Tools",
Pearson Education

3. Leland Beck, "System Software", Addison-Wesley

4. D. M. Dhamdhere, "Systems Programming & Operating Systems", Tata McGraw-Hill

4.9. Question Bank


• Objective Questions

Q1 In a compiler the module that checks every character of the source text is called

A the code generator

B the code optimizer

C the lexical analyzer

D the syntax analyzer.

2. Which one is not a function of a compiler?

A. A compiler does the conversion line by line as the program is run
B. A compiler converts the whole of a higher-level program code into machine code in one step
C. A compiler is a general-purpose language providing very efficient execution
D. It does not report the error
Q 3. Which one is not a stage in the compilation process?

A. Syntax analysis

B. Semantic analysis

C. Code generation

D. Error reporting

Q 4. Which of the following is not true of an interpreter?

A. An interpreter does the conversion line by line as the program is run


B. An interpreter is a representation of the system being designed
C. An interpreter is a general-purpose language providing very efficient execution
D. An interpreter does not perform the conversion line by line as the program is run

Q 5. Which of the following is not a token?

• Identifier


• Keyword

• Number

• Function

Answers: Q1-C, Q2-D, Q3-D, Q4-D, Q5-D

4.9. Short Questions

1) What is a compiler? Explain its phases.
2) What are dynamic linking and loading?
3) Compare compiler and interpreter.
4) What is a compiler? Draw and explain the structure of a compiler.
5) What are the phases of a compiler? Explain.

4.10 Long Questions

1. What are the different phases of a compiler? Illustrate the compiler's internal
representation of the source program for the following statement after each phase. [Nov 2016]

Position = initial + Rate * 60

2. What are the different phases of a compiler? Illustrate the compiler's internal representation
of the source program for the following statement after each phase. [May 2016]

Amount = P+ P * N* R/100

3. What is the difference between compiler and interpreter? [May 2016]


Module 5
Parsing

5.1 Motivation:
• To study the design of different top-down and bottom-up parsing techniques.
• After studying this module, students can easily develop a parser, i.e., the syntax
analysis phase of a compiler.
• To learn how to design intermediate code using syntax-directed translation.
• Detailed study of syntax-directed definitions and translation schemes.
• Builds on the fundamental knowledge of lexical analysis.

5.2. Syllabus:
Lecture Content Duration Self Study
23 Syntax Analysis: Role of Parser 1 Lecture 2 hours

24 Top-down parsing 1 Lecture 2 hours

25 & 26 Recursive descent and predictive parsers (LL) 2 Lecture 4 hours

27 & 28 Bottom-Up parsing 2 Lecture 4 hours

29 Operator precedence parsing 1 Lecture 2 hours

30 LR parsers 1 Lecture 2 hours

31 & 32 SLR and LALR parsers 2 Lecture 2 hours

33 Automatic construction of parsers using YACC. 2 Lecture 2 hours

34 Introduction to Semantic Analysis: Need of semantic 1 Lecture 2 hours


analysis, type checking and type conversion

5.3. Weightage: 20 Marks


5.4. Learning Objectives: Students should be able to-


1. List the functions of Lexical Analyzer. Describe the role of Lexical Analyzer in Compiler
Design. (R)

2. Design and develop a hand-written lexical analyzer and demonstrate the working of the
lexical analyzer in compiler design. (A)

3. Describe the role of the parser in the compilation process. Explain different top-down and
bottom-up parsing techniques. (E)

4. Specify various parsing techniques to design new language structures with the help of
grammars.(C)

5. Explain the construction and role of the syntax tree in the context of Parse tree.(U)

6. Distinguish between Parse tree , Syntax tree and DAG for graphical representation of the source
program. (U )

7. Summarize different Compiler Construction tools and Describe the structure of Lex
specification. (AN)

8. Apply LEX Compiler for Automatic Generation of Lexical Analyzer and Construct Lexical
analyzer using open source tool for compiler design.( C )

9. Define Context Free Grammar and Describe the structure of YACC specification and Apply
YACC Compiler for Automatic Generation of Parser Generator. (U)

5.5. Theoretical Background:

Syntax analysis or parsing is the second phase of a compiler. In this chapter, we shall learn the
basic concepts used in the construction of a parser. A lexical analyzer can identify tokens with
the help of regular expressions and pattern rules, but it cannot check the syntax of a given
sentence due to the limitations of regular expressions: regular expressions cannot check
balancing of tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG),
which is recognized by push-down automata. CFG, on the other hand, is a superset of Regular
Grammar, as depicted below:


It implies that every Regular Grammar is also context-free, but there exist some problems which
are beyond the scope of Regular Grammar. CFG is a helpful tool for describing the syntax of
programming languages.

Syntax Analyzers

A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams.
The parser analyzes the source code (token stream) against the production rules to detect any errors
in the code. The output of this phase is a parse tree.

This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and
generating a parse tree as the output of the phase. Parsers are expected to parse the whole code
even if some errors exist in the program. Parsers use error recovering strategies.

5.6. Abbreviations:
LL(1):

"L" - left-to-right scan of input

"L" - leftmost derivation

"1" - predict based on one token of look-ahead

For every non-terminal and token, predict the next production.

LR(k):

"L" - left-to-right scan of input

"R" - rightmost derivation (in reverse)

"k" - predict based on k tokens of look-ahead

For every non-terminal and token, predict the next production.

SLR: simple LR parser

LR: most general LR parser (canonical LR)

LALR: intermediate LR parser (look-ahead LR parser)

CFG: Context-Free Grammar

CLR: Canonical LR

SDD: Syntax-Directed Definition

5.7. Formulae: Nil

5.8. Key Definitions:


Parsing: Parsing is the process of determining whether a string of tokens can be generated by a
grammar, i.e., the task of determining the syntax or structure of a program; for this reason it
is also called syntax analysis.
Grammar: Grammar is a set of rules which check correctness of sentences.
Syntax: The rules used to form sentence is called syntax.
Semantics:-The meaning of the sentence.
Language: The set of rules denote set of valid sentence, such a set of valid sentence is called
language.


Leftmost Derivation: Derivation in which only the leftmost non terminal in any sentential form
is replaced at each step. Such derivation is called leftmost derivation.
Rightmost Derivation: Derivation in which only the rightmost nonterminal in any sentential form
is replaced at each step. Such derivation is called rightmost derivation.
Handle of a string: Substring that matches the RHS of some production AND whose reduction to
the non-terminal on the LHS is a step along the reverse of some rightmost derivation.
Parse tree: Graphical representation of a derivation ignoring replacement order.
Annotated Parse Tree : A parse tree showing the values of attributes at each node is called an
annotated parse tree.

Annotating (or decorating) of the parse tree: The process of computing the attributes values at
the nodes is called annotating (or decorating) of the parse tree.

A syntax directed definition : specifies the values of attributes by associating semantic rules with
the grammar productions.

Production              Semantic Rule

E → E1 + T              E.code = E1.code || T.code || '+'

Detail Syntax-Directed Definition:-

• In a syntax-directed definition, each production A → α is associated with a set of
semantic rules of the form b = f(c1, c2, …, cn), where f is a function and b can be
one of the following:

- b is a synthesized attribute of A, and c1, c2, …, cn are attributes of the
grammar symbols in the production (A → α).

- b is an inherited attribute of one of the grammar symbols in α (on the right side
of the production), and c1, c2, …, cn are attributes of the grammar
symbols in the production (A → α).


Inherited Attributes: An inherited attribute is one whose value at a node in a parse tree is
defined in terms of attributes at the parent and/or siblings of that node.

Attribute grammar:- An attribute grammar is a syntax-directed definition in which the


functions in the semantic rules cannot have side effects (they can only evaluate values of
attributes).

S-Attributed Definitions: only synthesized attributes used in the syntax-directed definitions.

L-Attributed Definitions: in addition to synthesized attributes, we may also use inherited


attributes in a restricted fashion.

Syntax Tree: A syntax tree is a more condensed version of the parse tree useful for representing
language constructs.

Detail definition of Parse Tree:

Given a CFG, a parse tree according to the grammar is a tree with following properties.

• The root of the tree is labeled by the start symbol.
• Each leaf of the tree is labeled by a terminal (token) or ε.
• Each interior node is labeled by a nonterminal.
• If A → X1 X2 … Xn is a production, then node A has immediate
children X1, X2, …, Xn, where each Xi is a (non)terminal or ε (ε denotes
the empty string).

Example: A → XYZ gives a node labeled A with children

X Y Z


5.9. Course Content:


Lecture: 23
Syntax Analysis: Role of Parser
Learning Objective: In this lecture students will be able to list the different roles of the parser.
5.9.1 The role of the parser:

1) Determine the syntactic structure of a program from the tokens produced by the scanner,
i.e., check the validity of the source program.

2) For a valid string, build a parse tree.

3) For an invalid string, issue diagnostic error messages describing the cause and nature of the error.

Parsing technique:

a) Top Down parsing

b) Bottom up parsing

Let’s check the take away from this lecture


1) Syntax analysis is also called:

a) parsing    b) scanning

2) Which of the following is a top-down parsing technique?

a) LR(0)    b) Recursive descent parser

c) LALR parser    d) LL(1)

L23. Exercise:
Q.1) Differentiate between top-down parser & Bottom-up parser.

Q.2) Write Role of Parser in compiler Design.


Questions/problems for practice:


Q. 1 List the Phases of Compiler.

Learning from the lecture "Role of Parser":

Students will be able to list and explain the role of the parser in compiler design.

Lecture: 24
Top-down parsing
Learning objective: In this lecture students will be able to design a top-down parser.
5.9.2 Top-Down Parser:

• Builds the parse tree from the top (root) down to the bottom (leaves).
• Top-down parsing techniques:

1) Recursive descent parser

2) Predictive or LL(1) parser

Rules for a top-down parser:

1) The grammar should not be left recursive.

2) The grammar should be left factored.

Left Recursion:

• A grammar is left recursive if it has a non-terminal A such that there is a
derivation

A ⇒+ Aα for some string α

• Top-down parsing techniques cannot handle left-recursive grammars.

• So, we have to convert our left-recursive grammar into an equivalent grammar
which is not left-recursive.

• The left-recursion may appear in a single step of the derivation (immediate left-
recursion), or may appear in more than one step of the derivation.


Rule:

A → Aα | β    where β does not start with A

⇓ eliminate immediate left recursion

A → β A'

A' → α A' | ε    (an equivalent grammar)

In general,

A → A α1 | ... | A αm | β1 | ... | βn    where β1 ... βn do not start with A

⇓ eliminate immediate left recursion

A → β1 A' | ... | βn A'

A' → α1 A' | ... | αm A' | ε    (an equivalent grammar)

Left-Recursion – Example

E → E+T | T

T → T*F | F

F → id | (E)

⇓ eliminate immediate left recursion

E → T E'

E' → +T E' | ε

T → F T'

T' → *F T' | ε

F → id | (E)

Left-Factoring

• A predictive parser (a top-down parser without backtracking) insists that the
grammar must be left-factored.

grammar → a new equivalent grammar suitable for predictive parsing

stmt → if expr then stmt else stmt | if expr then stmt

• When we see if, we cannot know which production rule to choose to rewrite
stmt in the derivation.

• In general,

A → αβ1 | αβ2    where α is non-empty and the first symbols
of β1 and β2 (if they have one) are different.

• When processing α we cannot know whether to expand
A to αβ1 or A to αβ2.

But, if we re-write the grammar as follows:

A → αA'

A' → β1 | β2    so, we can immediately expand A to αA'

• For each non-terminal A with two or more alternatives (production rules) with
a common non-empty prefix, let us say

A → αβ1 | ... | αβn | γ1 | ... | γm

convert it into

A → αA' | γ1 | ... | γm

A' → β1 | ... | βn


Left-Factoring – Example

S → iEtS | iEtSeS | a

E → b

can be rewritten as

S → iEtSS' | a

S' → ε | eS

E → b

Let’s check the take away from this lecture


1) Which is the most powerful parsing technique?

a)LL(1) b)LR(0)

c)LR(1) d)LALR

2) Does a predictive parser support backtracking?

a) Yes    b) No

3) Which Grammar is called LL(1) Grammar?

a) two adjacent non-terminals at the right side.

b) parsing table with no multiply-defined entries

c)Ambiguous grammars

d)Non-ambiguous grammars

4) Which data structure is used to parse the string?

a)Stack b)Queue

c)Array d)LinkList


L24. Exercise
Q.1) Check whether the following grammar is left recursive. If yes, remove the left recursion.

S → Aa | b

A → Ac | Sd | ε
A.1) This grammar is not immediately left-recursive, but it is still left-recursive.

S ⇒ Aa ⇒ Sda, or

A ⇒ Sd ⇒ Aad causes a left-recursion.

So, we have to eliminate all left-recursions from our grammar.

S → Aa | b

A → Ac | Aad | bd | ε

⇓ eliminate left recursion

S → Aa | b

A → bdA' | A'

A' → cA' | adA' | ε
Q.2) Check whether the following grammar is left recursive and left factored.

A → abB | aB | cdg | cdeB | cdfB

A.2) The grammar is not left recursive,
but it is not left factored, so make it left factored:

A → abB | aB | cdg | cdeB | cdfB

A → aA' | cdg | cdeB | cdfB

A' → bB | B

A → aA' | cdA''

A' → bB | B

A'' → g | eB | fB

Learning from this lecture "Top-Down Parsing":

Students will be able to find and remove left recursion and left-factor a given grammar.

Lecture: 25 & 26
Recursive descent and predictive parsers (LL)
Learning Objective: In this lecture students will be able to design recursive descent and
predictive (LL) parsers.
5.9.3 Example of a Predictive or LL(1) Parser:

a. Steps:

1. Check grammar is left recursive or not.

2. Check grammar is left factored or not.

3. Find FIRST & FOLLOW set to construct Predictive parser table.

4. Construct the predictive parser table.

5. Check grammar is LL(1) or not.

6. Parse i/p string

b. Compute FIRST as follows:

• Let α be a string of terminals and non-terminals.

• FIRST(α) is the set of all terminals that can begin strings derived from α.

Compute FIRST(X) as follows:

a) if X is a terminal, then FIRST(X) = {X}

b) if X → ε is a production, then add ε to FIRST(X)

c) if X is a non-terminal and X → Y1Y2...Yn is a production, add FIRST(Yi) to
FIRST(X) if the preceding Yj's contain ε in their FIRSTs

c. Compute FOLLOW as follows:

a) FOLLOW(S) contains EOF ($)

b) For productions A → αBβ, everything in FIRST(β) except ε goes into
FOLLOW(B)

c) For productions A → αB, or A → αBβ where FIRST(β) contains ε, FOLLOW(B)
contains everything that is in FOLLOW(A)
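The FIRST and FOLLOW rules in steps b and c can be computed by iterating to a fixed point. The sketch below uses the transformed expression grammar that appears later in this lecture, with 'eps' standing for ε and '$' for the end marker; the dictionary representation is an assumption of the sketch:

```python
# Fixed-point computation of FIRST and FOLLOW for the transformed grammar
# E -> TE' ; E' -> +TE' | eps ; T -> FT' ; T' -> *FT' | eps ; F -> (E) | id.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], ["eps"]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], ["eps"]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"

def first_of(seq, first):
    """FIRST of a symbol string; 'eps' is included if the whole string can vanish."""
    out = set()
    for sym in seq:
        if sym not in GRAMMAR:          # terminal (or 'eps' itself)
            out.add(sym)
            return out
        out |= first[sym] - {"eps"}
        if "eps" not in first[sym]:
            return out
    out.add("eps")                      # every symbol can derive the empty string
    return out

first = {nt: set() for nt in GRAMMAR}
follow = {nt: set() for nt in GRAMMAR}
follow[START].add("$")                  # rule a): FOLLOW(S) contains the end marker
changed = True
while changed:                          # iterate until nothing new is added
    changed = False
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            add = first_of(body, first)
            if not add <= first[head]:
                first[head] |= add
                changed = True
            for i, sym in enumerate(body):
                if sym not in GRAMMAR:
                    continue
                trailer = first_of(body[i + 1:], first)
                add = trailer - {"eps"}         # rule b)
                if "eps" in trailer:            # rule c)
                    add |= follow[head]
                if not add <= follow[sym]:
                    follow[sym] |= add
                    changed = True

print("FIRST(E') =", first["E'"])
print("FOLLOW(F) =", follow["F"])
```

The computed sets match the ones listed in step 3 of the worked example below.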

d. Constructing the LL(1) Parsing Table – Algorithm

• For each production rule A → α of a grammar G:

- for each terminal a in FIRST(α), add A → α to M[A, a]

- if ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A, a]

- if ε is in FIRST(α) and $ is in FOLLOW(A), then add A → α to M[A, $]

• All other undefined entries of the parsing table are error entries.

e. Example on LL(1) or Predictive Parser:

Original grammar:


EE+E
EE*E
E(E)
Eid

This grammar is left-recursive, ambiguous and requires left-factoring. It needs to


be modified before we build a predictive parser for it:

Step 1: Remove ambiguity.

E → E+T | T
T → T*F | F
F → (E) | id

The grammar is left recursive, hence remove the left recursion:

E → TE'

E' → +TE' | ε
T → FT'

T' → *FT' | ε
F → (E)
F → id

Step 2: Grammar is already left factored.

Step 3: Find First & Follow set to construct predictive parser table:-

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}

FIRST(E') = {+, ε}

FIRST(T') = {*, ε}

FOLLOW(E) = FOLLOW(E') = {$, )}

FOLLOW(T) = FOLLOW(T') = {+, $, )}

FOLLOW(F) = {*, +, $, )}

Step 4: Now, we can either build a Predictive parser table:

Parsing table

        id          +           *           (           )           $

E       E → TE'                             E → TE'

E'                  E' → +TE'                           E' → ε      E' → ε

T       T → FT'                             T → FT'

T'                  T' → ε      T' → *FT'               T' → ε      T' → ε

F       F → id                              F → (E)

Step 5: The parsing table does not contain any multiply-defined entries. Hence, the given
grammar is an LL(1) grammar.

Step 6: Parse the input id*id using the parse table and a stack

Step    Stack       Input       Next Action

1       $E          id*id$      E → TE'

2       $E'T        id*id$      T → FT'

3       $E'T'F      id*id$      F → id

4       $E'T'id     id*id$      match id

5       $E'T'       *id$        T' → *FT'

6       $E'T'F*     *id$        match *

7       $E'T'F      id$         F → id

8       $E'T'id     id$         match id

9       $E'T'       $           T' → ε

10      $E'         $           E' → ε

11      $           $           accept
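The trace above can be reproduced by a small table-driven parser: the stack is expanded using the parsing table M and terminals are matched against the input. The encoding below (tuples as table keys, an empty body standing for an ε-production) is an assumption of the sketch:

```python
# LL(1) parsing table M from step 4: (nonterminal, lookahead) -> production body.
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Stack-based predictive parse; returns True if the input is accepted."""
    tokens = tokens + ["$"]
    stack = ["$", "E"]                   # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            body = M.get((top, tokens[pos]))
            if body is None:
                return False             # blank (error) entry in the table
            stack.extend(reversed(body)) # expand; [] encodes an epsilon move
        elif top == tokens[pos]:
            pos += 1                     # match a terminal (including final $)
        else:
            return False
    return pos == len(tokens)

print(ll1_parse(["id", "*", "id"]))   # accepted, as in the trace above
print(ll1_parse(["id", "+", "*"]))    # rejected
```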

Let’s check the take away from this lecture


1) A left recursive grammar:

a) cannot be LL(1) b) cannot be LR(1)

c) an infinite set d) none of the above

2) True/False questions:

a) Every regular grammar is context free grammar.- TRUE

b)Every LL(1) grammar is SLR(1) also. - FALSE

c) Every unambiguous grammar belong to the class of either SLR, CLR or


LALR.- FALSE

d)If a grammar G is SLR(1) then it is definitely LALR(1).-TRUE

L25 & 26 Exercise:
Q.1 Check whether the following grammar is LL(1) or not.

S → iCtSE | a
E → eS | ε
C → b


Questions/problems for practice:


Q.1 For the grammar having productions:

A → (A)A | ε

compute the FIRST and FOLLOW sets of A.

A.1 FIRST(A) = { ( , ε }

FOLLOW(A) = { $ , ) }

Q.2 Is the following grammar LL(1)?

S → aSa | ε

Learning from the lecture "Recursive Descent and Predictive Parser (LL)":

Students will be able to design a top-down parser.

Lecture: 27 & 28
Bottom-Up parsing and Operator Precedence Parser
Learning Objective: In this lecture students will be able to design bottom-up parsers.

5.9.4 Bottom-Up Parsing:

• A general style of bottom-up syntax analysis is known as shift-reduce parsing.

• Two types of bottom-up parsing:

1) Operator-precedence parsing

2) LR parsing

Operator-Precedence Parser:

Operator-Precedence Parsing Algorithm

The input string is w$, the initial stack is $ and a table holds precedence relations
between certain terminals


Algorithm:

set p to point to the first symbol of w$ ;

repeat forever

if ( $ is on top of the stack and p points to $ ) then return

else {

let a be the topmost terminal symbol on the stack and let b be the symbol pointed
to by p;

if ( a <. b or a =· b ) then { /* SHIFT */

push b onto the stack;

advance p to the next input symbol;

}

else if ( a .> b ) then /* REDUCE */

repeat pop stack

until ( the top of stack terminal is related by <. to the terminal most
recently popped );

else error();

• Operator grammar
In an operator grammar, no production rule can have:

1) ε at the right side

2) two adjacent non-terminals at the right side.


• Example

E → AB, A → a, B → b             (not an operator grammar)

E → EOE, E → id, O → + | * | /   (not an operator grammar)

E → E+E | E*E | E/E | id         (an operator grammar)

• Precedence Relations:

1) In operator-precedence parsing, we define three disjoint precedence relations


between certain pairs of terminals.

a <. b b has higher precedence than a

a =· b b has same precedence as a

a .> b b has lower precedence than a

2) The determination of the correct precedence relations between terminals is based on
the traditional notions of associativity and precedence of operators. (Unary minus
causes a problem.)

3) The intention of the precedence relations is to find the handle of a right-sentential


form,

<. marking the left end,

=· appearing in the interior of the handle, and

.> marking the right end.

4) In our input string $a1a2...an$, we insert the precedence relation between the pairs
of terminals (the precedence relation holds between the terminals in that pair).


E  E+E | E-E | E*E | E/E | E^E | (E) | -E | id

The partial operator-precedence table for this grammar

5) Then the input string id+id*id with the precedence relations inserted will be:

$ <. id .> + <. id .> * <. id .> $

        id      +       *       $

id              .>      .>      .>

+       <.      .>      <.      .>

*       <.      .>      .>      .>

$       <.      <.      <.
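The shift-reduce algorithm above, driven by this precedence table, can be sketched as follows. To keep the sketch short it keeps only the terminal skeleton on the stack (nonterminals are not pushed), so each reduction records just the terminals of the handle; that simplification is an assumption of the sketch:

```python
# Precedence table for the terminals id, +, * and the end marker $.
# '<' stands for <. and '>' for .> ; a missing entry is a syntax error.
PREC = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

def op_precedence_parse(tokens):
    """Shift-reduce loop of the algorithm above; returns the handles in the
    order they are reduced. Raises KeyError on a precedence-table error."""
    tokens = tokens + ["$"]
    stack, p, reductions = ["$"], 0, []
    while not (stack == ["$"] and tokens[p] == "$"):
        a, b = stack[-1], tokens[p]
        if PREC[(a, b)] in ("<", "="):          # SHIFT
            stack.append(b)
            p += 1
        else:                                   # REDUCE: pop back to the <. mark
            handle = [stack.pop()]
            while PREC[(stack[-1], handle[0])] != "<":
                handle.insert(0, stack.pop())
            reductions.append(handle)
    return reductions

# '*' is reduced before '+', reflecting its higher precedence:
print(op_precedence_parse(["id", "+", "id", "*", "id"]))
```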

Advantages & Disadvantages of Operator Precedence Parsing:-

Disadvantages:

1) It cannot handle the unary minus (the lexical analyzer should handle the
unary minus).

2) Small class of grammars.

3) Difficult to decide which language is recognized by the grammar.

Advantages:

1) Simple

2) Powerful enough for expressions in programming languages


Let’s check the take away from this lecture


1. Analysis which determines the meaning of a statement once its grammatical structure
becomes known is termed as
(A) Semantic analysis (B) Syntax analysis
(C) Regular analysis (D) General analysis

L27 & 28 Exercise:

Q.1 Explain the operator precedence parser with the help of an example.
Q.2 Write a short note on: Virtual 8086 mode.
Questions/problems for practice:
Q.1 Design an operator precedence parser for the arithmetic operators.
Learning from the lecture "Bottom-Up Parsing and Operator Precedence Parsing":
Students will be able to design an operator precedence parser.

Lecture: 29
LR parsers
Learning Objective: In this lecture students will be able to design an LR parser.

5.9.5 LR Parser:

a) LR (0) or SLR

b) LR (1) or Canonical LR

c) LALR or Look ahead LR

A)LR(0) or SLR Parsing Tables for Expression Grammar

1) E  E+T

2) E  T

3) T  T*F

4) T  F


5) F  (E)

6) F  id

Construction of the Canonical LR(0) Collection:

• To create the SLR parsing tables for a grammar G, we will create the canonical
LR(0) collection of the augmented grammar G'.

• Algorithm:

C is { closure({S' → .S}) }

repeat the following until no more sets of LR(0) items can be added to C:

for each I in C and each grammar symbol X

if goto(I,X) is not empty and not in C

add goto(I,X) to C

The goto function is a DFA on the sets in C.

Steps:

1) Find the augmented grammar G'.

2) Find item I.

3) Find Closure operation.

4) Find Goto operation.

5) Construct Canonical Collection or LR(0) collection.

6) Draw DFA.

7) Find FIRST & FOLLOW set.


8) Design SLR table.

9) Check grammar is SLR or not.

10) Parse the i/p string.

Step 1. Augmented Grammar:

G′ is G with a new production rule S′ → S, where S′ is the new start symbol.

E′ → E

E → E+T

E → T

T → T*F

T → F

F → (E)

F → id

Step 2. LR(0) Item:

An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side.

• Ex: A → XYZ has four possible LR(0) items:

A → .XYZ

A → X.YZ

A → XY.Z

A → XYZ.


• Sets of LR(0) items will be the states of action and goto table of the SLR parser.

• A collection of sets of LR(0) items(the canonical LR(0) collection) is the basis


for constructing SLR parsers.

Step 3. The Closure Operation:

• If I is a set of LR(0) items for a grammar G, then closure(I) is the set of


LR(0) items constructed from I by the two rules:

1. Initially, every LR(0) item in I is added to closure(I).

2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ will be in closure(I). We apply this rule until no more new LR(0) items can be added to closure(I).

The Closure Operation -- Example

For the augmented grammar:

E′ → E
E → E+T | T
T → T*F | F
F → (E) | id

closure({E′ → .E}) = { E′ → .E   (kernel item)
                       E → .E+T
                       E → .T
                       T → .T*F
                       T → .F
                       F → .(E)
                       F → .id }

Step 4.Goto Operation


• If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:

– If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I,X).

– If I is the set of items that are valid for some viable prefix γ, then goto(I,X) is the set of items that are valid for the viable prefix γX.

Example:

I = { E′ → .E, E → .E+T, E → .T,
      T → .T*F, T → .F,
      F → .(E), F → .id }

goto(I,E) = { E′ → E., E → E.+T }

goto(I,T) = { E → T., T → T.*F }

goto(I,F) = { T → F. }

goto(I,() = { F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id }

goto(I,id) = { F → id. }
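The closure and goto computations shown here are easy to mechanize. The following is a minimal sketch, not from the resource book, with items represented as (head, body, dot-position) triples for the augmented expression grammar:

```python
# LR(0) closure and goto for the augmented expression grammar.
# An item is (head, body, dot): ("E", ("E", "+", "T"), 0) stands for E -> .E+T
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            # dot in front of a nonterminal B: add every B -> .gamma
            if dot < len(body) and body[dot] in GRAMMAR:
                for prod in GRAMMAR[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

def goto(items, symbol):
    # advance the dot over `symbol`, then take the closure
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
print(len(I0))              # 7 items, matching I0 of the example
print(goto(I0, "id"))       # {('F', ('id',), 1)}, i.e. F -> id.
```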

Step 5.The Canonical LR(0) Collection – Example

I0: E′ → .E, E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id

I1: E′ → E., E → E.+T

I2: E → T., T → T.*F

I3: T → F.

I4: F → (.E), E → .E+T, E → .T, T → .T*F, T → .F, F → .(E), F → .id

I5: F → id.

I6: E → E+.T, T → .T*F, T → .F, F → .(E), F → .id

I7: T → T*.F, F → .(E), F → .id

I8: F → (E.), E → E.+T

I9: E → E+T., T → T.*F

I10: T → T*F.

I11: F → (E).

Step 6. FIRST & FOLLOW set.

FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }

FOLLOW(E)= { $, +, ) }

FOLLOW(T)= {*, $, +, ) }


FOLLOW(F)= {*, $, +, ) }
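These FIRST and FOLLOW sets can be computed by a standard fixed-point iteration. A minimal sketch, not from the resource book, simplified by the fact that no production of this grammar derives ε:

```python
# Fixed-point computation of FIRST and FOLLOW for the expression grammar.
# Simplification: no production of this grammar derives the empty string.
GRAMMAR = [
    ("E", ["E", "+", "T"]), ("E", ["T"]),
    ("T", ["T", "*", "F"]), ("T", ["F"]),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
NONTERMS = {"E", "T", "F"}
FIRST = {x: set() for x in NONTERMS}
FOLLOW = {x: set() for x in NONTERMS}
FOLLOW["E"].add("$")        # the start symbol is followed by the end marker

changed = True
while changed:              # iterate until nothing new can be added
    changed = False
    for head, body in GRAMMAR:
        # FIRST(head) grows from the first symbol of the right-hand side
        lead = body[0]
        new = FIRST[lead] if lead in NONTERMS else {lead}
        if not new <= FIRST[head]:
            FIRST[head] |= new
            changed = True
        for i, sym in enumerate(body):
            if sym not in NONTERMS:
                continue
            if i + 1 < len(body):   # FOLLOW(sym) gets FIRST of the next symbol
                nxt = body[i + 1]
                new = FIRST[nxt] if nxt in NONTERMS else {nxt}
            else:                   # A -> alpha B: FOLLOW(A) flows into FOLLOW(B)
                new = FOLLOW[head]
            if not new <= FOLLOW[sym]:
                FOLLOW[sym] |= new
                changed = True

print(FIRST["E"], FOLLOW["T"])
```

Running it reproduces exactly the sets listed above.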

Step 7: Transition Diagram (DFA) of the goto Function

[DFA of the goto function. Transitions: I0 –E→ I1, I0 –T→ I2, I0 –F→ I3, I0 –(→ I4, I0 –id→ I5; I1 –+→ I6; I2 –*→ I7; I4 –E→ I8, I4 –T→ I2, I4 –F→ I3, I4 –(→ I4, I4 –id→ I5; I6 –T→ I9, I6 –F→ I3, I6 –(→ I4, I6 –id→ I5; I7 –F→ I10, I7 –(→ I4, I7 –id→ I5; I8 –)→ I11, I8 –+→ I6; I9 –*→ I7.]

Step 8. Design the SLR parser table.

Action Table Goto Function

state |  id    +    *    (    )    $   |  E   T   F
------+--------------------------------+------------
  0   |  s5              s4            |  1   2   3
  1   |        s6                 acc  |
  2   |        r2   s7        r2  r2   |
  3   |        r4   r4        r4  r4   |
  4   |  s5              s4            |  8   2   3
  5   |        r6   r6        r6  r6   |
  6   |  s5              s4            |      9   3
  7   |  s5              s4            |         10
  8   |        s6             s11      |
  9   |        r1   s7        r1  r1   |
 10   |        r3   r3        r3  r3   |
 11   |        r5   r5        r5  r5   |


Step 9: There are no multiply-defined (shift/reduce or reduce/reduce) entries in the parser table; hence the given grammar is an SLR(1) grammar.

Step 10:Actions of a (S)LR-Parser -- Example

stack        input       action              output

0            id*id+id$   shift 5
0id5         *id+id$     reduce by F → id    F → id
0F3          *id+id$     reduce by T → F     T → F
0T2          *id+id$     shift 7
0T2*7        id+id$      shift 5
0T2*7id5     +id$        reduce by F → id    F → id
0T2*7F10     +id$        reduce by T → T*F   T → T*F
0T2          +id$        reduce by E → T     E → T
0E1          +id$        shift 6
0E1+6        id$         shift 5
0E1+6id5     $           reduce by F → id    F → id
0E1+6F3      $           reduce by T → F     T → F
0E1+6T9      $           reduce by E → E+T   E → E+T
0E1          $           accept
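This trace is produced by a generic table-driven loop: look up ACTION[state][token], shift or reduce accordingly, and on a reduction consult GOTO. The following sketch, not from the resource book, hard-codes the Step 8 table and replays the parse of id*id+id:

```python
# Table-driven SLR parse of id*id+id, using the ACTION/GOTO table of Step 8.
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}
# production number -> (left-hand side, length of right-hand side)
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1),
         5: ("F", 3), 6: ("F", 1)}

def parse(tokens):
    stack, pos, output = [0], 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[pos])
        if act is None:
            return output, False             # error entry: reject
        if act == "acc":
            return output, True              # accept
        if act[0] == "s":                    # shift: push the new state
            stack.append(int(act[1:]))
            pos += 1
        else:                                # reduce: pop |rhs| states, then GOTO
            head, length = PRODS[int(act[1:])]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][head])
            output.append(act)

reds, ok = parse(["id", "*", "id", "+", "id", "$"])
print(reds, ok)   # ['r6', 'r4', 'r6', 'r3', 'r2', 'r6', 'r4', 'r1'] True
```

The sequence of reductions matches the output column of the trace: F → id, T → F, F → id, T → T*F, E → T, F → id, T → F, E → E+T.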

Let’s check the take away from this lecture


1) yacc creates a ….. parser for the given grammar.

1)LALR 2)LR(0)

3)LR(1) 4)LL(1)

2) Which of the following conflicts can not arise in LR parsing:

a)shift-reduce b)reduce-reduce

c)shift-shift d)none of the above

3) If the grammar is LALR(1) then it is necessarily:

a)SLR(1) b)LR(1)

c)LL(1) d)None of the above

4) LR grammar is a:

a) Context free grammar b) Context sensitive grammar

c) Regular Grammar d) None of the above

5) YACC is a:

a) Lexical analyzer generator b) A parser generator

c) Semantic analyzer d) None of the above

L27 Exercise:
Q.1 Construct the LR(0) collection for the following arithmetic grammar & construct the LR(0) parser table:

E → E+T
T → T*F
F → (E)
F → id


Question/ problems for practice


Q.1 Explain the LR parsers with suitable examples.

Learning from the Lecture 'LR Parser':

Students will be able to design an LR(0) parser.

Lecture: 30 & 31
LR (1) and LALR parsers
Learning Objective: In this lecture students will be able to design LR(1) and LALR parsers.
4.9.7 Example on LR(1) parser:

Steps:

1) Find the augmented grammar G′.

2) Find the LR(1) items.

3) Find the closure operation.

4) Find the goto operation.

5) Construct the canonical collection.

6) Draw the DFA.

7) Find the FIRST & FOLLOW sets.

8) Design the LR(1) table.

9) Check whether the grammar is LR(1) or not.

10) Parse the input string.

LR (1) Item

• To avoid some of invalid reductions, the states need to carry more information.

• Extra information is put into a state by including a terminal symbol as a second


component in an item.


• An LR(1) item is:

[A → α.β, a]   where a is the lookahead of the LR(1) item (a is a terminal or the end-marker $).

• Such an object is called an LR(1) item.

– The "1" refers to the length of the second component (the lookahead).

– The lookahead has no effect in an item of the form [A → α.β, a], where β is not ε.

– But an item of the form [A → α., a] calls for a reduction by A → α only if the next input symbol is a.

closure(I) is (where I is a set of LR(1) items):

– Every LR(1) item in I is in closure(I).

– If [A → α.Bβ, a] is in closure(I) and B → γ is a production rule of G, then [B → .γ, b] will be in closure(I) for each terminal b in FIRST(βa).

goto operation:

• If I is a set of LR(1) items and X is a grammar symbol (terminal or non-terminal), then goto(I,X) is defined as follows:

If [A → α.Xβ, a] is in I, then every item in closure({[A → αX.β, a]}) will be in goto(I,X).

Construction of The Canonical LR(1) Collection:

• Algorithm:

C is { closure({[S′ → .S, $]}) }


repeat the following until no more sets of LR(1) items can be added to C.

for each I in C and each grammar symbol X

if goto(I,X) is not empty and not in C

add goto(I,X) to C

goto function is a DFA on the sets in C.

An Example of an LR(1) PARSER:

Grammar G:

S → CC

C → cC

C → d

Step 1: Augmented Grammar G′

1. S′ → S

2. S → CC

3. C → cC

4. C → d

Step 2,3,4,5: Item, closure & goto operations, i.e. the LR(1) collection:

I0: closure({[S′ → .S, $]}) =
    [S′ → .S, $]
    [S → .CC, $]
    [C → .cC, c/d]
    [C → .d, c/d]

I1: goto(I0, S) = [S′ → S., $]

I2: goto(I0, C) =
    [S → C.C, $]
    [C → .cC, $]
    [C → .d, $]

I3: goto(I0, c) =
    [C → c.C, c/d]
    [C → .cC, c/d]
    [C → .d, c/d]

I4: goto(I0, d) = [C → d., c/d]

I5: goto(I2, C) = [S → CC., $]

I6: goto(I2, c) =
    [C → c.C, $]
    [C → .cC, $]
    [C → .d, $]

I7: goto(I2, d) = [C → d., $]

I8: goto(I3, C) = [C → cC., c/d]

    goto(I3, c) = I3
    goto(I3, d) = I4

I9: goto(I6, C) = [C → cC., $]

    goto(I6, c) = I6
    goto(I6, d) = I7

Step 6: Find FIRST & FOLLOW set.

FIRST (S)=FIRST(C)={ c , d }

FOLLOW(S)={ $ }

FOLLOW(C)={ c,d }

Step 7: Construct the DFA:

[DFA over the item sets I0–I9; the transitions are exactly the goto computations of the previous step.]

Step 8: Construct LR(1) Parsing Table:

Action Table Goto Function

State |  c    d    $   |  S    C
------+----------------+---------
 I0   | s3   s4        |  g1   g2
 I1   |           acc  |
 I2   | s6   s7        |       g5
 I3   | s3   s4        |       g8
 I4   | r3   r3        |
 I5   |           r1   |
 I6   | s6   s7        |       g9
 I7   |           r3   |
 I8   | r2   r2        |
 I9   |           r2   |

Step 9: There are no multiply-defined entries in the LR(1) table; hence this grammar is an LR(1) grammar.

3)LALR or Lookahead parser:

LALR stands for Lookahead LR.

1. LALR parsers are often used in practice because LALR parsing tables are smaller than
LR(1) parsing tables.

2. The number of states in SLR and LALR parsing tables for a grammar G are equal.

3. But LALR parsers recognize more grammars than SLR parsers.

4. yacc creates a LALR parser for the given grammar.

A state of LALR parser will be again a set of LR(1) items.

• We will do this for all states of a canonical LR(1) parser to get the states of the LALR
parser.

• In fact, the number of the states of the LALR parser for a grammar will be equal to the
number of states of the SLR parser for that grammar.

• We will find the states (sets of LR(1) items) in a canonical LR(1) parser with same cores.
Then we will merge them as a single state.
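The merging step can be sketched directly: group item sets whose cores (the items with lookaheads stripped) coincide, then union the lookaheads item by item. An illustrative sketch, not from the resource book, with items written as (core-string, lookahead) pairs:

```python
# LALR construction step: merge LR(1) states that share the same core.
# An LR(1) item is (core, lookahead); a state is a frozenset of items.
def merge_by_core(states):
    groups = {}
    for state in states:
        core = frozenset(item for item, _ in state)   # lookaheads stripped
        groups.setdefault(core, set()).update(state)
    merged = []
    for items in groups.values():
        lookaheads = {}
        for item, la in items:             # union lookaheads per core item
            lookaheads.setdefault(item, set()).add(la)
        merged.append({(item, frozenset(las)) for item, las in lookaheads.items()})
    return merged

I3 = frozenset({("C -> c.C", "c"), ("C -> c.C", "d"),
                ("C -> .cC", "c"), ("C -> .cC", "d"),
                ("C -> .d", "c"), ("C -> .d", "d")})
I6 = frozenset({("C -> c.C", "$"), ("C -> .cC", "$"), ("C -> .d", "$")})

[I36] = merge_by_core([I3, I6])   # identical cores, hence a single merged state
print(sorted(item for item, _ in I36))
```

The merged state carries the lookahead set {c, d, $} on every item, exactly as in the I36 worked out below.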

Consider I3 & I6, which have the same core; replace them by their union I36:

I3: [C → c.C, c/d]
    [C → .cC, c/d]
    [C → .d, c/d]

I6: [C → c.C, $]
    [C → .cC, $]
    [C → .d, $]

I36: [C → c.C, c/d/$]
     [C → .cC, c/d/$]
     [C → .d, c/d/$]

Consider I4 & I7 and replace them by their union:

I4: [C → d., c/d]
I7: [C → d., $]

I47: [C → d., c/d/$]

Consider I8 & I9 and replace them by their union:

I8: [C → cC., c/d]
I9: [C → cC., $]

I89: [C → cC., c/d/$]


DFA:

[Merged DFA with states I0, I1, I2, I36, I47, I5, I89; the transitions are those of the LR(1) DFA with merged states identified.]

Creation of LALR Parsing Tables:

Action Table Goto Function

State |  c     d     $   |  S   C
------+------------------+--------
  0   | s36   s47        |  1   2
  1   |             acc  |
  2   | s36   s47        |      5
 36   | s36   s47        |     89
 47   | r3    r3    r3   |
  5   |             r1   |
 89   | r2    r2    r2   |

If there are no parsing-action conflicts, then the given grammar is said to be an LALR grammar.

Let’s check the take away from this lecture

1) Which of the following conflicts can not arise in LR parsing:

a)shift-reduce b)reduce-reduce

c)shift-shift d)none of the above

2) If the grammar is LALR(1) then it is necessarily:

a)SLR(1) b)LR(1)

c)LL(1) d)None of the above

Exercise:
Q 1. Consider the following grammar and construct the LALR parsing table.

S->AA

A-> aA | b

(Dec 2007 ) (10M)

Questions/problems for Practice


Q.1. Construct the LALR parsing table for the following grammar.


S′ -> S

S-> CC

C-> cC | d

Learning from the lecture 'LR(1) and LALR': Students will be able to design and implement bottom-up parsers.

Let’s check the take away from this lecture

Q.1 Which of the following eliminate common sub expression?


a)Parse tree
b) Syntax tree
c)DAG
Q.2 An inherited attribute is the one whose initial value at a parse tree node is defined in
terms of:
a) attributes at the parent and /or siblings of that node
b) attributes at the children nodes only
c) attributes at both children nodes & parent and / or siblings of that node
d) none of the above


Lecture: 33
LEX Compiler

Learning Objective: In this lecture students will be able to understand the working of the LEX compiler.

6.9.6 LEX Compiler:

Lex is a tool used in the lexical analysis phase to recognize tokens using regular expressions. The lex tool itself is a lex compiler.

Steps to generate LEXER

• lex.l is an input file written in a language which describes the generation of a lexical analyzer. The lex compiler transforms lex.l to a C program known as lex.yy.c.

• lex.yy.c is compiled by the C compiler to a file called a.out.

• The output of the C compiler is the working lexical analyzer, which takes a stream of input characters and produces a stream of tokens.

• yylval is a global variable which is shared by the lexical analyzer and the parser to return the name and an attribute value of a token.

• The attribute value can be a numeric code, a pointer to the symbol table, or nothing.

• Another tool for lexical analyzer generation is Flex.

Structure of Lex Programs

Lex program will be in following form

declarations

%%

translation rules

%%

auxiliary functions

Declarations This section includes declaration of variables, constants and regular definitions.

Translation rules It contains regular expressions and code segments.

Form : Pattern {Action}

Pattern is a regular expression or regular definition.

Action refers to segments of code.

Auxiliary functions This section holds additional functions which are used in actions. These
functions are compiled separately and loaded with lexical analyzer.

Lexical analyzer produced by lex starts its process by reading one character at a time until a valid
match for a pattern is found.

Once a match is found, the associated action takes place to produce token.

The token is then given to parser for further processing.
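The pattern/action mechanism of a lex specification can be imitated in a few lines. The sketch below is only an illustration, not lex itself: real lex chooses the longest match, while this sketch simply tries the rules in order at the current position.

```python
import re

# A lex-like tokenizer: an ordered list of (pattern, token-name) pairs plays
# the role of the "pattern {action}" translation rules of a lex program.
RULES = [
    (r"[ \t\n]+", None),                  # skip whitespace (no token produced)
    (r"[0-9]+", "NUMBER"),
    (r"[A-Za-z_][A-Za-z0-9_]*", "NAME"),
    (r"[+\-*/=]", "OP"),
]

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        for pattern, name in RULES:       # try each rule at the current position
            m = re.match(pattern, text[pos:])
            if m:
                if name is not None:      # the "action": emit a token
                    tokens.append((name, m.group()))
                pos += m.end()
                break
        else:
            raise SyntaxError(f"illegal character {text[pos]!r}")
    return tokens

print(tokenize("rate = 12 + 3"))
# [('NAME', 'rate'), ('OP', '='), ('NUMBER', '12'), ('OP', '+'), ('NUMBER', '3')]
```

Each emitted (name, lexeme) pair corresponds to the token/attribute pair that a real lexer hands to the parser.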

6.9. 7 Let’s check the take away from this lecture

o LEX tool is used to generate :


a) Lexical Analyzer
b) Syntax Analyzer
c) ICG


L23 Exercise:
Q.2 Write short note on: LEX Compiler

Lecture: 34
YACC Compiler- Compiler

Learning Objective: In this lecture students will be able to understand the working of the YACC compiler.

6.9.6 YACC Compiler:

Why use Lex & Yacc ?

• Writing a compiler is difficult requiring lots of time and effort. Construction of the scanner
and parser is routine enough that the process may be automated.

• What is YACC?
  – A tool which will produce a parser for a given grammar.
  – YACC (Yet Another Compiler Compiler) is a program designed to compile a LALR(1) grammar and to produce the source code of the syntactic analyzer of the language produced by this grammar.


• Originally written by Stephen C. Johnson, 1975.

• Variants:
  – lex, yacc (AT&T)
  – bison: a yacc replacement (GNU)
  – flex: fast lexical analyzer (GNU)
  – BSD yacc
  – PCLEX, PCYACC (Abraxas Software)

How YACC Works?

Parser generation time:

gram.y (file containing the desired grammar in yacc format)
   → yacc → y.tab.c (C source program created by yacc) and y.tab.h
   → C compiler/linker (cc or gcc)
   → a.out (an executable program that will parse the grammar given in gram.y)
A YACC File Example

%{
#include <stdio.h>
%}

%token NAME NUMBER

%%

statement: NAME '=' expression
         | expression { printf("= %d\n", $1); }
         ;

expression: expression '+' NUMBER { $$ = $1 + $3; }
          | expression '-' NUMBER { $$ = $1 - $3; }
          | NUMBER { $$ = $1; }
          ;

%%

int yyerror(char *s)
{
    fprintf(stderr, "%s\n", s);
    return 0;
}

Works with Lex

Compile time: the input program (e.g. "12 + 26") is read by yylex() (generated by LEX), which supplies tokens to yyparse() (generated by YACC); linked together they form the executable a.out.
YACC File Format

%{

C declarations

%}

yacc declarations

%%

Grammar rules

%%

Additional C code

– Comments enclosed in /* ... */ may appear in any of the sections.


Add to Knowledge (Content Beyond Syllabus)

Chomsky Hierarchy
A grammar can be classified on the basis of its production rules. Chomsky classified grammars into the following types.

• Type-0 – unrestricted grammar: generates the recursively enumerable languages. Productions: X1 → X2, where X1, X2 ∈ (V ∪ T)* (V is the variable set, T is the terminal set). Accepting device: Turing machine.

• Type-1 – context sensitive grammar: generates the context sensitive languages. Productions: X1 → X2, where X1, X2 ∈ (V ∪ T)* and |X1| ≤ |X2|. Accepting device: Turing machine with bounded tape (the tape length is finite).

• Type-2 – context free grammar: generates the context free languages. Productions: Y → X1, where Y ∈ V and X1 ∈ (V ∪ T)*. Accepting device: PDA (push-down automaton).

• Type-3 – regular grammar: generates the regular languages. Productions: X → aY | a | Ya | ε, where X, Y ∈ V and a ∈ T. Accepting device: FA (finite automaton).

4.10 Learning Outcomes:

1. Know:
a) Student should be able to differentiate between Top Down and Bottom up Parser.
b) Understand the role of Lexical and Syntax analyzer in Compiler Design.

2. Comprehend:
a) Student should be able to explain and design Lexical Analyzer.
b) Student should be able to describe and design Syntax Analyzer.

3. Apply, analyze and synthesize:


Student should be able to:
1. Demonstrate how the lexical analyzer generates tokens and detects errors.
2. Show the working of a bottom-up parser.

4.11. Short Answer questions


Q.1)Test whether the grammar is LR(1) or not.

S  AaAb I0: S9  .S

S  BbBa S  .AaAb

A S  .BbBa

B A.

B.

Problem

A reduce by A   b reduce by A  

reduce by B   reduce by B  

reduce/reduce conflict reduce/reduce conflict

Q.2 For the grammar having productions:

A → (A)A | ε

Compute the FIRST & FOLLOW sets of A.

A.2 FIRST(A) = { ( , ε }

FOLLOW(A) = { $ , ) }

Q.3 Is the following grammar LL(1)?

S → aSa | ε

Q.4 Consider the following CFG:

S → Aa | b

A → Sc | d

Remove the left recursion from the above grammar.

Answer: This grammar is not immediately left-recursive, but it is still left-recursive:

S ⇒ Aa ⇒ Sca, or

A ⇒ Sc ⇒ Aac

causes a left-recursion. So, we have to eliminate all left-recursions from our grammar.

Substituting S into A gives the equivalent grammar, which is immediately left-recursive:

S → Aa | b

A → Aac | bc | d

Eliminating the immediate left recursion:

S → Aa | b

A → bcA′ | dA′

A′ → acA′ | ε

Q.5 Test whether the grammar is LL(1) or not and construct a predictive parsing table for it.

S → AaAb

S → BbBa

A → ε

B → ε

Q.6 What is parsing? Write down the drawback of top down parsing of backtracking.
Ans:Parsing is the process of analyzing a text, made of a sequence of tokens, to determine its
grammatical structure with respect to a given formal grammar. Parsing is also known as syntactic
analysis and parser is used for analyzing a text. The task of the parser is essentially to determine if
and how the input can be derived from the start symbol of the grammar. The input is a valid input
with respect to a given formal grammar if it can be derived from the start symbol of the grammar.
Following are drawbacks of top down parsing of backtracking:
(i) Semantic actions cannot be performed while making a prediction. The actions must be
delayed until the prediction is known to be a part of a successful parse.
(ii) Precise error reporting is not possible. A mismatch merely triggers backtracking. A
source string is known to be erroneous only after all predictions have failed.

Q.7 For the following grammar construct the predictive parsing table and explain the steps.

Grammar G:

E → TE′

E′ → +TE′ | ε

T → FT′

T′ → *FT′ | ε

F → (E)

F → id

Q.8 Construct the LR(0) collection for the following arithmetic grammar & construct the LR(0) parser table:

E → E+T
T → T*F
F → (E)
F → id

Q.9 Write short notes on:

i) LEX & YACC ii) Recursive descent parser

Q.10 Consider the following grammar:

E → E+T | T

T → T*F | F
F → (E)
F → id

Show the shift reduce parser actions for the string id+id+id*id.

(May 2007) (10M)

Q.11 Consider the following CFG:

E → E+T | T
T → T*F | F
F → (E) | I
I → a | b | c

Remove the left recursion from the above grammar.


Q.12 Construct the LL(1) parsing table for the following grammar.

S → aBDh

B → cC

C → bC | ε

D → EF

E → g | ε

F → f | ε

(Dec 2007) (10M)

Q 13. Consider the following grammar and construct the LALR parsing table.

S->AA

A-> aA | b

(Dec 2007 ) (10M)

Q.14. Construct the LALR parsing table for the following grammar.

S′ -> S

S-> CC

C-> cC | d

Q.15. Explain the LR parsers with suitable examples.

Q.16 Construct the predictive parser for the following grammar.

S → AaAb

S → BbBa

A → ε

B → ε

4.13. University Questions Sample Answers:


Q.1. Consider the following grammar-

S->A

A-> Ad|Ae|aB|aC

B->bBC | f

C->g

(Dec 2007) (10M)

Q.2. Eliminate the left recursion present in the following grammar (remove both direct and indirect recursion):

S → Aa | b

A → Ac | Sd | ε    [May 2016]

Q. 3 What is Handle pruning? [Dec 2016]

Q.4 For the grammar given below, construct the operator precedence relations matrix, assuming *, + are binary operators, id a terminal symbol and E a non-terminal symbol.

E → E+E
E → E*E
E → id

Apply the operator precedence parsing algorithm to obtain the skeletal syntax tree for the statement id+id*id. [Dec 2016, Nov 2015]
Q. 5 Construct SLR parsing table for following grammar. Show how parsing actions are
done for the input string ()()$. Show stacks content, i/p buffer, action.

S → (S)S

S → ε    [Dec 2016]

Q. 6 Find First and Follow set for given grammar below:


ETE9 E9+TE9|€

TFT9 T9*FT9|€

F(E) Fid [Nov 2015]

4.15. References
1. A.V. Aho and J.D. Ullman: Principles of Compiler Construction, Pearson Education
2. A.V. Aho, R. Sethi and J.D. Ullman: Compilers – Principles, Techniques and Tools, Pearson Education

4.16 Practice for Module No.4 Syntax Analyzer (Based on Gate Exam & University Patterns)

1. What is the maximum number of reduce moves that can be taken by a bottom-up parser for a grammar with no epsilon- and unit-productions (i.e., of type A → ε and A → a) to parse a string with n tokens?
(A) n/2
(B) n-1
(C) 2n-1
(D) 2n

Answer: (B)

2. Consider the following two sets of LR(1) items of an LR(1) grammar.

X -> c.X, c/d


X -> .cX, c/d
X -> .d, c/d
X -> c.X, $
X -> .cX, $
X -> .d, $

Which of the following statements related to merging of the two sets in the corresponding LALR
parser is/are FALSE?

1. Cannot be merged since look aheads are different.


2. Can be merged but will result in S-R conflict.
3. Can be merged but will result in R-R conflict.
4. Cannot be merged since goto on c will lead to two different sets.

(A) 1 only
(B) 2 only
(C) 1 and 4 only


(D) 1, 2, 3, and 4

Answer: (D)

3. For the grammar below, a partial LL(1) parsing table is also presented along with the grammar. Entries that need to be filled are indicated as E1, E2, and E3. ε is the empty string, $ indicates end of input, and | separates alternate right hand sides of productions.

(A) A
(B) B
(C) C
(D) D

Answer: (A)

4. Consider the data same as above question. The appropriate entries for E1, E2, and E3 are


(A) A
(B) B
(C) C
(D) D

Answer: (C)

5. The grammar S → aSa | bS | c is


(A) LL(1) but not LR(1)
(B) LR(1) but not LL(1)
(C) Both LL(1)and LR(1)
(D) Neither LL(1)nor LR(1)

Answer: (C)

6. Match all items in Group 1 with correct options from those given in Group 2.

Group 1 Group 2
P. Regular expression 1. Syntax analysis
Q. Pushdown automata 2. Code generation
R. Dataflow analysis 3. Lexical analysis
S. Register allocation 4. Code optimization

(A) P-4. Q-1, R-2, S-3


(B) P-3, Q-1, R-4, S-2
(C) P-3, Q-4, R-1, S-2
(D) P-2, Q-1, R-4, S-3

Answer: (B)


7. Which of the following describes a handle (as applicable to LR-parsing) appropriately?


(A) It is the position in a sentential form where the next shift or reduce operation will occur

(B) It is non-terminal whose production will be used for reduction in the next step
(C) It is a production that may be used for reduction in a future step along with a position in the
sentential form where the next shift or reduce operation will occur
(D) It is the production p that will be used for reduction in the next step along with a position in
the sentential form where the right hand side of the production may be found

Answer: (D)

8. An LALR(1) parser for a grammar G can have shift-reduce (S-R) conflicts if and only if
(A) the SLR(1) parser for G has S-R conflicts
(B) the LR(1) parser for G has S-R conflicts
(C) the LR(0) parser for G has S-R conflicts
(D) the LALR(1) parser for G has reduce-reduce conflicts

Answer: (B)

9. Which one of the following is a top-down parser?


(A) Recursive descent parser.
(B) Operator precedence parser.
(C) An LR(k) parser.
(D) An LALR(k) parser

Answer: (A)

10. Consider the grammar with non-terminals N = {S,C,S1 },terminals T={a,b,i,t,e}, with S as the start
symbol, and the following set of rules:

S --> iCtSS1|a
S1 --> eS|ϵ
C --> b

The grammar is NOT LL(1) because:


(A) it is left recursive
(B) it is right recursive
(C) it is ambiguous
(D) It is not context-free.

Answer: (C)


11. Consider the following two statements:

P: Every regular grammar is LL(1)


Q: Every regular set has a LR(1) grammar

Which of the following is TRUE?


(A) Both P and Q are true
(B) P is true and Q is false
(C) P is false and Q is true
(D) Both P and Q are false

Answer: (C)

12. Consider the following grammar.

S -> S * E
S -> E
E -> F + E
E -> F
F -> id

Consider the following LR(0) items corresponding to the grammar above.

(i) S -> S * .E
(ii) E -> F. + E
(iii) E -> F + .E

Given the items above, which two of them will appear in the same set in the canonical sets-of-
items for the grammar?
(A) (i) and (ii)
(B) (ii) and (iii)
(C) (i) and (iii)
(D) None of the above

Answer: (D)

13. A canonical set of items is given below

S --> L. > R
Q --> R.

On input symbol < the set has


(A) a shift-reduce conflict and a reduce-reduce conflict.
(B) a shift-reduce conflict but not a reduce-reduce conflict.
(C) a reduce-reduce conflict but not a shift-reduce conflict.


(D) neither a shift-reduce nor a reduce-reduce conflict.

Answer: (D)

14. Consider the grammar defined by the following production rules, with two operators ∗ and +

S --> T * P
T --> U | T * U
P --> Q + P | Q
Q --> Id
U --> Id

Which one of the following is TRUE?

(A) + is left associative, while ∗ is right associative


(B) + is right associative, while ∗ is left associative
(C) Both + and ∗ are right associative
(D) Both + and ∗ are left associative

Answer: (B)

15. Which one of the following grammars is free from left recursion?


(A) A
(B) B
(C) C
(D) D

Answer: (B)

16. Consider the following grammar:

S → FR
R→S|ε
F → id

In the predictive parser table, M, of the grammar the entries M[S, id] and M[R, $] respectively.
(A) {S → FR} and {R → ε }
(B) {S → FR} and { }
(C) {S → FR} and {R → *S}
(D) {F → id} and {R → ε}

Answer: (A)


17. The grammar A → AA | (A) | ε is not suitable for predictive-parsing because the grammar is
(A) ambiguous
(B) left-recursive
(C) right-recursive
(D) an operator-grammar

Answer: (B)

18. Consider the grammar

S → (S) | a

Let the number of states in SLR(1), LR(1) and LALR(1) parsers for the grammar be n1, n2 and
n3 respectively. The following relationship holds good
(A) n1 < n2 < n3 (B) n1 = n3 < n2 (C) n1 = n2 = n3
(D) n1 ≥ n3 ≥ n2

Answer: (B)

19. Which of the following grammar rules violate the requirements of an operator grammar ? P,
Q, R are nonterminals, and r, s, t are terminals.

1. P → Q R
2. P → Q s R
3. P → ε
4. P → Q t R r

(A) 1 only
(B) 1 and 3 only
(C) 2 and 3 only
(D) 3 and 4 only

Answer: (B)

20. Which of the following suffices to convert an arbitrary CFG to an LL(1) grammar?
(A) Removing left recursion alone
(B) Factoring the grammar alone
(C) Removing left recursion and factoring the grammar
(D) None of these

Answer: (D)

Self-Assessment

Q.1 What is Lexical Analysis? What are the functions of Lexical Analyzer?


Q.2 What is Grammar? Write Different types of grammar.


Q.3 Write Predictive Parser algorithm to design LL(1) Parser.
Q.4 Write Operator Precedence Algorithm.
Q.5 Differentiate between Top Down and Bottom up Parser.
Self-Evaluation
Name of
Student
Class
Roll No.
Subject
Module No.
S.No Tick
Your choice
1. Do you understand the role of the Syntax Analyzer in Compiler Design?   o Yes   o No
2. Are you able to differentiate the different types of parsers?   o Yes   o No
3. Are you able to design Top-down and Bottom-up Parsers?   o Yes   o No
4. Are you able to identify the different types of compilation errors?   o Yes   o No
5. Can you differentiate between Top-down and Bottom-up parsers?   o Yes   o No
6. Do you understand the module?   o Yes, Completely.   o Partially.   o No, Not at all.


Module: 6
Compilers: Synthesis Phase
6.1. Motivation:

At the completion of the module, students should be able to:

• Understand the different representation of Intermediate Code.


• Design and implement ICG.
• Learn and design code generation algorithms.
• Optimize code using a DAG.
• Design & implement a code generator.
• Study the target architecture, which includes its register set and instruction set.
6.2. Syllabus:
Lecture Content Duration Self Study
35 Syntax Directed Translation: 1 Lecture 2 hours
Attribute grammar, S and L attributed grammar, bottom up
and top down evaluations of S and L attributed grammar
36 Intermediate Code Generation: Intermediate code – need 1 Lecture 2 hours

37 Types of Intermediate codes 1 Lecture 2 hours

38 Representation of Three address Code 1 Lecture 2 hours

39 Code Generation: Issues in the design of Code Generator 1 Lecture 2 hours

40 Basic Blocks and Flow graphs 1 Lecture 2 hours

41 Code generation algorithm 1 Lecture 2 hours

42 DAG representation of Basic Block 1 Lecture 2 hours

43 Code Optimization: Need and sources of optimization 1 Lecture 2 hours

44 & 45 Code optimization techniques: Machine Dependent and 1 Lecture 2 hours


Machine Independent


6.3. Weightage: 15 Marks


6.4. Learning Objectives: Students should be able to-

1. Describe the role of Intermediate Code Generation in connection with language designing (E)

2. Comprehend the intermediate language and intermediate code for assignment statements, arrays,
Boolean expression, switch statement and conditional and iterative control flow. (U)

3. Explain back patching, procedure calls and translation of mixed mode expressions.(C)

4 Apply code generation algorithm for generating target machine code.(A)

5. State the issues in the design of a code generator. Describe basic blocks and flow graphs. (R)

6. Describe dynamic programming code generation algorithm and code generator (U).

6.5. Theoretical Background:

If source code can be translated directly into target machine code, why do we need to translate it first into an intermediate code which is then translated to the target code? Let us see the reasons why we need an intermediate code.

• If a compiler translates the source language to its target machine language without having the option of generating intermediate code, then for each new machine, a full native compiler is required.
• Intermediate code eliminates the need for a new full compiler for every unique machine by keeping the analysis portion the same for all the compilers.
• The second part of the compiler, synthesis, is changed according to the target machine.
• It becomes easier to apply source code modifications to improve code performance by applying code optimization techniques on the intermediate code.

Intermediate Representation

Intermediate codes can be represented in a variety of ways and they have their own benefits.

 High Level IR - A high-level intermediate representation is very close to the source
language itself. It can be generated easily from the source code, and code modifications
can be applied to it easily to enhance performance. For target-machine optimization,
however, it is less preferred.
 Low Level IR - This one is close to the target machine, which makes it suitable for
register and memory allocation, instruction selection, etc. It is good for machine-
dependent optimizations.

Intermediate code can be either language-specific (e.g., bytecode for Java) or language-
independent (e.g., three-address code).

6.6. Abbreviations:
• DAG: Directed Acyclic Graph

• ALP: Assembly Language Programming

6.7. Formulae: Nil

6.8. Key Definitions:


Syntax Tree: An (abstract) syntax tree is a condensed form of parse tree useful for representing
language constructs.
DAG (Directed Acyclic Graph): A DAG for an expression identifies the common subexpressions
in the expression. Like a syntax tree, a DAG has a node for every subexpression of the expression; an
interior node represents an operator and its children its operands.
The difference is that a node in a DAG representing a common subexpression has more than one
"parent"; in a syntax tree, the common subexpression would be represented as a duplicate subtree.
Absolute program: The address at which the program must be loaded into memory for
execution is fixed.
Relocatable program: These programs can be loaded anywhere in memory, i.e., addresses are
not fixed.

6.9. Course Content:

Lecture: 35
Syntax Directed Translation: Syntax directed definitions

Learning Objective: In this lecture students will be able to design intermediate code by using
syntax-directed translation.

6.9.7 Contents:
Applications of Syntax-Directed Translation:
• Constructing an abstract syntax tree
• Type checking
• Intermediate code generation

Syntax-Directed Translation:-

• Grammar symbols are associated with attributes to associate information with


the programming language constructs that they represent.

• Values of these attributes are evaluated by the semantic rules associated with
the production rules.

• Evaluation of these semantic rules:

– may generate intermediate code

– may put information into the symbol table

– may perform type checking

– may issue error messages

– may perform some other activities

– in fact, they may perform almost any activity.

• An attribute may hold almost anything:

a string, a number, a memory location, a complex record.

Syntax-Directed Definitions and Translation Schemes :

When we associate semantic rules with productions, we use two notations:

1. Syntax-Directed Definitions

2. Translation Schemes
1. Syntax-Directed Definitions:

- give high-level specifications for translations

- hide many implementation details such as order of evaluation of semantic actions.

- We associate a production rule with a set of semantic actions, and we do not say when they
will be evaluated.

2. Translation Schemes:

- indicate the order of evaluation of semantic actions associated with a production rule.

- In other words, translation schemes give some information about implementation
details.
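To make the idea of attribute evaluation concrete, here is a minimal S-attributed evaluator for a tiny expression grammar, sketched in Python. The grammar, the token handling, and the realization of each nonterminal's .val attribute as a return value are illustrative assumptions, not part of the syllabus text:

```python
# Minimal S-attributed evaluator for the illustrative grammar
#   E -> E + T | T      T -> T * F | F      F -> digit
# Each parsing function implements the semantic rule that synthesizes
# its nonterminal's .val attribute from the children's .val attributes.

def evaluate(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def F():            # F -> digit    { F.val = digit.lexval }
        return int(eat())

    def T():            # T -> T1 * F   { T.val = T1.val * F.val }
        val = F()
        while peek() == '*':
            eat()
            val *= F()
        return val

    def E():            # E -> E1 + T   { E.val = E1.val + T.val }
        val = T()
        while peek() == '+':
            eat()
            val += T()
        return val

    return E()
```

For the input 2 + 3 * 4, evaluate(list("2+3*4")) synthesizes the value 14 bottom-up, in the spirit of bottom-up evaluation of an S-attributed grammar.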

Lecture: 36
Intermediate Code Generation: Intermediate languages: declarations, Assignment statements

Learning Objective: In this lecture students will be able to understand the different ways to
represent intermediate code.
6.9.2 Intermediate Code Generation:

In the analysis-synthesis model of a compiler, the front end translates a source program
into an intermediate representation from which the back end generates target code. Details
of the target language are confined to the back end, as far as possible. Although a source
program can be translated directly into the target language, some benefits of using a
machine-independent intermediate form are:

1. Retargeting is facilitated; a compiler for a different machine can be created by attaching

a back end for the new machine to an existing front end.

2. A machine-independent code optimizer can be applied to the intermediate


representation.

Position of intermediate code generator.

Intermediate Code Representation (Intermediate Languages):

a) Syntax trees or DAG

b) Postfix notation

c) Three address code

Graphical Representations

a) A syntax tree depicts the natural hierarchical structure of a source program.


A DAG gives the same information but in a more compact way because common
subexpressions are identified.

A syntax tree and DAG for the assignment statement a := b * - c + b * - c appear in Fig. (a).

Graphical representations of a = b * - c + b * - c

b) Postfix notation is a linearized representation of a syntax tree; it is a list of the nodes of the tree
in which a node appears immediately after its children.

The postfix notation for the syntax tree in Fig. (a) is

a b c uminus * b c uminus * + assign
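The "node appears immediately after its children" rule is just a postorder traversal. A short sketch (the tuple encoding of the syntax tree below is an assumption for illustration):

```python
# Postfix (postorder) listing of a syntax tree. A node is either a
# leaf name (str) or a tuple (operator, child1, ..., childN).

def postfix(node):
    if isinstance(node, str):
        return [node]            # a leaf is listed by itself
    out = []
    for child in node[1:]:       # list all children first
        out.extend(postfix(child))
    out.append(node[0])          # then the node's own operator
    return out

# The assignment a := b * - c + b * - c as a syntax tree:
tree = ("assign", "a",
        ("+", ("*", "b", ("uminus", "c")),
              ("*", "b", ("uminus", "c"))))
```

Here postfix(tree) reproduces the listing above: a b c uminus * b c uminus * + assign.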

Syntax trees for assignment statements are produced by the syntax-directed

definition -

Syntax-directed definition to produce syntax trees for assignment statements.

Tree representations of the syntax tree

Three-Address Code:

Three-address code is a sequence of statements of the general form x := y op z, where x, y, and z

are names, constants, or compiler-generated temporaries;

op stands for any operator, such as a fixed- or floating-point arithmetic operator,

or a logical operator on Boolean-valued data.

Thus a source language expression like x + y * z might be translated into the sequence

t1 := y * z

t2 := x + t1

where t1 and t2 are compiler-generated temporary names.

Three-address code is a linearized representation of a syntax tree or a DAG in which explicit names
correspond to the interior nodes of the graph.

a := b * - c + b * - c

Let’s check the take away from this lecture


Q.1 Which of the following is not a graphical representation?
a) Syntax tree
b) Parse tree
c) DAG
d) Postfix notation

Q.2 Which of the following identifies common subexpressions?

a) Syntax tree
b) DAG
c) Parse Tree
d) Postfix expression

L33. Exercise:
Q.1 Which are the different types of three address statements?

Questions/problems for practice:


Q.2 Explain Syntax tree, Parse tree & DAG.

Learning from the lecture 'Intermediate languages: declarations, Assignment statements':


Students will be able to understand the role of ICG in compiler design.

Lecture: 37
Types of Three-Address Statements
Learning objective: In this lecture students will be able to identify different types of Three-Address
Statements.
6.9.3 Types of Three-Address Statements:

Three-address statements are akin to assembly code. Statements can have symbolic labels and there are
statements for flow of control. A symbolic label represents the index of a three-address statement in the array
holding intermediate code.

Actual indices can be substituted for the labels either by making a separate pass, or by using "back patching,"

Here are the common three-address statements:

1. Assignment statements of the form x := y op z, where op is a binary arithmetic or logical operation.

2. Assignment instructions of the form x := op y, where op is a unary operation. Essential unary operations
include unary minus, logical negation, shift operators, and conversion operators that, for example, convert a
fixed-point number to a floating-point number.

3. Copy statements of the form x := y, where the value of y is assigned to x.

4. The unconditional jump goto L. The three-address statement with label L is the next to be executed.

5. Conditional jumps such as if x relop y goto L. This instruction applies a relational operator (<, =, >=, etc.) to
x and y, and executes the statement with label L next if x stands in relation relop to y. If not, the three-address
statement following if x relop y goto L is executed next, as in the usual sequence.

6. param x and call p, n for procedure calls, and return y, where y representing a returned value is optional.
Their typical use is as the sequence of three-address statements

param x1

param x2
...
param xn

call p, n

generated as part of a call of the procedure p(x1, x2, ..., xn). The integer n indicating the number of
actual parameters in "call p, n" is not redundant because calls can be nested.
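For instance, a nested call f(g(x), y) (procedure names hypothetical, and t1 assumed to hold the value returned by g) yields a sequence in which the two calls' param lists are adjacent; the counts 1 and 2 are what keep them separate:

```
param x
call g, 1
param t1
param y
call f, 2
```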

7. Indexed assignments of the form x := y[i] and x[i] := y. The first of these sets x to the value in the location
i memory units beyond location y. The statement x[i] := y sets the contents of the location i units beyond x to
the value of y. In both these instructions, x, y, and i refer to data objects.

8. Address and pointer assignments of the form x := &y, x := *y, and *x := y.

Syntax-Directed Translation into Three-Address Code:

When three-address code is generated, temporary names are made up for the interior nodes of a syntax tree.
The value of the nonterminal E on the left side of E -> E1 + E2 will be computed into a new temporary t. In
general, the three-address code for id := E consists of code to evaluate E into some temporary t, followed by
the assignment id.place := t. If an expression is a single identifier, say y, then y itself holds the value of the
expression. For the moment, we create a new name every time a temporary is needed.

The S-attributed definition in Fig. generates three-address code for assignment statements.

Given input a := b * - c + b * - c, it produces the code in Fig. The synthesized attribute S.code represents the
three-address code for the assignment S.

The nonterminal E has two attributes:

1. E.place, the name that will hold the value of E, and

2_ E.code, the sequence of three-address statements evaluating E.

The function newtemp returns a sequence of distinct names t1, t2, ... in response to successive calls.

For convenience, we use the notation gen(x ':=' y '+' z) to represent the three-address statement x :=
y + z. Expressions appearing instead of variables like x, y, and z are evaluated when passed to gen, and quoted
operators or operands, like '+', are taken literally. In practice, three-address statements might be sent to an
output file, rather than built up into the code attributes.

Syntax-directed definition to produce three-address code for assignment
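A minimal sketch of the newtemp/gen machinery in Python (the tuple encoding of expression trees and the exact statement formatting are assumptions for illustration, not the textbook's implementation):

```python
# Sketch of newtemp and gen: temporaries t1, t2, ... are created on
# demand, and gen appends a three-address statement to a code list.

code = []
temp_count = 0

def newtemp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(*parts):
    code.append(" ".join(parts))

def translate(node):
    """Return E.place for an expression tree node.

    A node is an identifier (str), a unary tuple (op, operand),
    or a binary tuple (op, left, right).
    """
    if isinstance(node, str):              # E -> id : E.place = id
        return node
    if len(node) == 2:                     # E -> - E1 (unary operator)
        op, operand = node
        t = newtemp()
        gen(t, ":=", op, translate(operand))
        return t
    op, left, right = node                 # E -> E1 op E2
    place_l, place_r = translate(left), translate(right)
    t = newtemp()
    gen(t, ":=", place_l, op, place_r)
    return t

# a := b * - c + b * - c
rhs = translate(("+", ("*", "b", ("uminus", "c")),
                      ("*", "b", ("uminus", "c"))))
gen("a", ":=", rhs)
```

Because a syntax tree (rather than a DAG) is walked, the common subexpression b * - c is translated twice, yielding six statements for this assignment.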

Let’s check the take away from this lecture


Q.3 The intermediate code generator is an optional phase of the compiler.
State true or false.
- TRUE
Q.4 Three - address code consists of a sequence of instructions, each of which has at the
most ------ operands.
a) 2
b) 3
c) 0
d) 1

Q.5 The front end of the compiler consists of which phase(s)?


a) Lexical analyzer
b) Syntax analyzer
c) Semantic analyzer
d) All of the above

L22. Exercise
Q.3 What is difference between syntax tree & DAG?

Questions/Problems for practice:

Q.6 List different types of three address code.

Learning from this lecture 'Types of Three-Address Statements':


Students will be able to list and identify Three-Address Statements.

Lecture: 38
Representation of Three-address Code
Learning Objective: In this lecture students will be able to understand the different ways to
represent and implement Three-Address Code.
6.9.3 Implementations of Three-Address Statements

A three-address statement is an abstract form of intermediate code. In a compiler, these statements

can be implemented as records with fields for the operator and the operands. Three such
representations are quadruples, triples, and indirect triples.

Quadruples:

a) A quadruple is a record structure with four fields:


op, arg1, arg2, and result.

b) The op field contains an internal code for the operator.


c) The three-address statement x := y op z is represented by placing y in arg1, z in arg2, and
x in result.
d) Statements with unary operators like x := -y or x := y do not use arg2. Operators like
param use neither arg2 nor result.
e) Conditional and unconditional jumps put the target label in result.
f) The quadruples are for the assignment a := b * - c + b * - c.
They are obtained from the three-address code in Fig. (a).

g) The contents of fields arg1, arg2, and result are normally pointers to the symbol-table
entries for the names represented by these fields. If so, temporary names must be entered
into the symbol table as they are created.
Triples:

a) To avoid entering temporary names into the symbol table, we refer to a temporary value by
the position of the statement that computes it.
b) Three-address statements can then be represented by records with only three fields: op, arg1
and arg2, as in Fig. (b).
c) The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or
pointers into the triple structure itself (for temporary values). Since only three fields are used,
this intermediate code format is known as triples.

Quadruple and triple representations of three-address statements.

A ternary operation like x[i] := y requires two entries in the triple structure, as shown in Fig. (a),
while x := y[i] is naturally represented as two operations in Fig. (b).

More triple representations.

Indirect Triples:

Another implementation of three-address code that has been considered is that of listing pointers
to triples, rather than listing the triples themselves. This implementation is naturally called indirect
triples.

For example, let us use an array statement to list pointers to triples in the desired order.

Indirect triples representation of three-address statements
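The three layouts can be sketched side by side in Python; the concrete field ordering and the "(n)" notation for references into the triple table are assumptions for illustration:

```python
# The three-address code for a := b * - c + b * - c held in the three
# layouts described above.

# Quadruples: op, arg1, arg2, result -- temporaries appear by name
# (in practice the fields would point to symbol-table entries).
quadruples = [("uminus", "c",  None, "t1"),
              ("*",      "b",  "t1", "t2"),
              ("uminus", "c",  None, "t3"),
              ("*",      "b",  "t3", "t4"),
              ("+",      "t2", "t4", "t5"),
              (":=",     "t5", None, "a")]

# Triples: op, arg1, arg2 -- a temporary is referred to by the index
# of the triple that computes it, written here as "(n)".
triples = [("uminus", "c",   None),
           ("*",      "b",   "(0)"),
           ("uminus", "c",   None),
           ("*",      "b",   "(2)"),
           ("+",      "(1)", "(3)"),
           ("assign", "a",   "(4)")]

# Indirect triples: an extra list of "pointers" (indices) into the
# triple table; reordering statements only permutes this list, so the
# triples themselves never need renumbering.
statement = list(range(len(triples)))
```

Note how the triples need no temporary names at all, which is exactly the point of that representation.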

Let’s check the take away from this lecture


Q.6 The back end of the compiler consists of which phase(s)?
a) Intermediate code generator
b) Code generator
c)Code optimizer
d) All of the above

Q.7
a) Postfix notation
b) Syntax Tree
c) Parse tree
d)DAG
e)3- address stmt

L23 Exercise:
Q.4 Compare Triples, Quadruples & Indirect triples.

Questions/problems for practice:


Q.10 Write different ways to represent three address code.

Learning from the lecture 'Representation of Three-address Code':


Students will be able to represent three-address code as Triples, Quadruples and Indirect Triples.

Lecture: 39
Back patching and Issues in the design of Code Generator
Learning Objective: In this lecture students will be able to list different issues in the design of a
code generator.
6.9.4 Backpatching with example.

Three Address Code Generation – Backpatching

The problem in generating three-address code in a single pass is that we may not know the labels
that control must go to at the time jump statements are generated. To get around this problem, a
series of branching statements is generated with the targets of the jumps temporarily left
unspecified. Backpatching is putting the actual address in place of such labels once the proper
target is determined.

Backpatching algorithms perform three types of operations:

1) makelist(i) – creates a new list containing only i, an index into the array of quadruples, and
returns a pointer to the list it has made.
2) merge(i,j) – concatenates the lists pointed to by i and j, and returns a pointer to the
concatenated list.
3) backpatch(p,i) – inserts i as the target label for each of the statements on the list pointed to
by p.
The translation scheme is as follows:
1) E ---> E1 or M E2
backpatch(E1.falselist, M.quad)
E.truelist = merge(E1.truelist, E2.truelist)
E.falselist = E2.falselist
2) E ---> E1 and M E2
backpatch(E1.truelist, M.quad)
E.truelist = E2.truelist

E.falselist = merge(E1.falselist, E2.falselist)


3) E ----> not E1
E.truelist = E1.falselist
E.falselist = E1.truelist
4) E -----> (E1)
E.truelist = E1.truelist
E.falselist = E1.falselist
5) E -----> id1 relop id2
E.truelist = makelist(nextquad)
E.falselist = makelist(nextquad +1 )
emit(if id1.place relop id2.place goto __ )
emit(goto ___)
6) E -----> true
E.truelist = makelist(nextquad)
emit(goto ___)
7) E -----> false
E.falselist = makelist(nextquad)
emit(goto ___)
8) M -----> epsilon
M.quad = nextquad
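The three operations can be sketched directly (the quadruple array holds plain strings here, and the "__" placeholder convention is an assumption for illustration):

```python
# Sketch of makelist / merge / backpatch over an array of quadruples.

quads = []                      # generated quadruples, as strings

def nextquad():
    return len(quads)           # index of the next statement to emit

def emit(q):
    quads.append(q)

def makelist(i):
    return [i]                  # new list containing only index i

def merge(l1, l2):
    return l1 + l2              # concatenation of the two lists

def backpatch(lst, target):
    for i in lst:               # fill in the real target label
        quads[i] = quads[i].replace("goto __", f"goto {target}")

# E -> id1 relop id2, e.g. a < b:
E_truelist = makelist(nextquad())
E_falselist = makelist(nextquad() + 1)
emit("if a < b goto __")
emit("goto __")

# Later, once the actual targets become known:
backpatch(E_truelist, 100)
backpatch(E_falselist, 104)
```

After the two backpatch calls, the jumps read "if a < b goto 100" and "goto 104"; until then they safely carry the unfilled placeholder.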

5.9.5 Code generation must do the following things:

 Produce correct code


 Make use of machine architecture.
 Run efficiently.

Position of code generation phase in compiler:

Source Program → Front End → Intermediate Code → Code Optimizer → Intermediate Code → Code Generator → Target Program

(All phases consult the Symbol Table.)

Issues in the Design of Code generator

The code generator is concerned with:

1. Memory management.
2. Instruction Selection.
3. Register Utilization (Allocation).
4. Evaluation order.

1. Memory Management

Mapping names in the source program to addresses of data objects is done cooperatively by
pass 1 (front end) and pass 2 (code generator).

Quadruples → address instructions.

Local variables (local to functions or procedures) are stack-allocated in the activation

record, while global variables are in a static area.

2. Instruction Selection

The nature of instruction set of the target machine determines selection.

- "Easy" if the instruction set is regular, that is, uniform and complete.


Uniform: all triple addresses,
all stack single addresses.

Complete: any register can be used for any operation.

If we don't care about the efficiency of the target program, instruction selection is

straightforward.

For example, the three-address code is:


a := b + c
d := a + e

Inefficient assembly code is:

1. MOV b, R0     R0 ← b
2. ADD c, R0     R0 ← c + R0
3. MOV R0, a     a ← R0
4. MOV a, R0     R0 ← a
5. ADD e, R0     R0 ← e + R0
6. MOV R0, d     d ← R0

Here the fourth statement is redundant, and so is the third statement if 'a' is not
subsequently used.

3. Register Allocation

Register can be accessed faster than memory words. Frequently accessed variables should
reside in registers (register allocation). Register assignment is picking a specific register
for each such variable.

Formally, there are two steps in register allocation:

1. Register allocation


This is a selection process in which we choose the set of variables that will
reside in registers.
2. Register assignment
Here we pick the specific register in which each such variable will reside. Note that this
is an NP-complete problem.

Some of the issues that complicate register allocation:

1. Special uses of hardware: for example, some instructions require specific registers.

2. Software conventions:

For example

 Register R6 (say) always holds the return address.


 Register R5 (say) is used as the stack pointer.
 Similarly, registers are assigned for branch and link, frames, heaps, etc.

3. Choice of Evaluation Order

Changing the order of evaluation may produce more efficient code.


Finding the best order is an NP-complete problem, but we can bypass this hindrance by
generating code for quadruples in the order in which they have been produced by the
intermediate code generator.
ADD x, y, T1
ADD a, b, T2
Reordering these two is legal because x, y and a, b are different (not dependent).

7.2 The Target Machine

Familiarity with the target machine and its instruction set is a prerequisite for designing a good
code generator.

Typical Architecture

Target machine is:

1. Byte-addressable (factor of 4).


2. 4 bytes per word.
3. 16 to 32 (or n) general-purpose registers.
4. Two-address instructions of the form:
op source, destination
e.g., MOV A, B
ADD A, D

An alternative architecture:

Target machine is:
1. Bit-addressable (factor of 1).
2. Word-purpose registers.
3. Three-address instructions of the form:
op source1, source2, destination
e.g.,
ADD A, B, C

 Byte-addressable memory with 4 bytes per word and n general-purpose registers, R0,
R1, . . . , Rn-1. Each integer requires 2 bytes (16-bits).

 Two address instruction of the form


mnemonic source, destination

MODE                FORM     ADDRESS                       EXAMPLE             ADDED COST

Absolute            M        M                             ADD temp, R1        1

Register            R        R                             ADD R0, R1          0

Index               c(R)     c + contents(R)               ADD 100(R2), R1     1

Indirect register   *R       contents(R)                   ADD *R2, R1         0

Indirect index      *c(R)    contents(c + contents(R))     ADD *100(R2), R1    1

Literal             #c       constant c                    ADD #3, R1          1

Instruction costs:

Each instruction has a cost of 1 plus added costs for the source and destination.

=> cost of instruction = 1 + cost associated the source and destination address mode.

This cost corresponds to the length (in words ) of instruction.

Examples

1. Move register to memory R0 ← M.


MOV R0, M cost = 1+1 = 2.
2. Indirect indexed mode:
MOV *4(R0), M
cost = 1 (instruction word) + 1 (indirect indexed source)
+ 1 (absolute destination)
= 1 + 1 + 1 = 3
3. Indexed mode:
MOV 4(R0), M
cost = 1 + 1 + 1 = 3
4. Literal mode:
MOV #1, R0
cost = 1 + 1 = 2

5. Move memory to memory


MOV m, m cost = 1 + 1 + 1 = 3
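The rule "cost = 1 + added cost of each operand's addressing mode" can be checked mechanically against the examples above; the mode names below are descriptive labels assumed for illustration:

```python
# Added cost per addressing mode, from the table above: modes that
# embed a memory address or constant take one extra instruction word.
MODE_COST = {"absolute": 1, "register": 0, "index": 1,
             "indirect register": 0, "indirect index": 1, "literal": 1}

def cost(src_mode, dst_mode):
    # one word for the instruction itself, plus the added cost of
    # the source and destination addressing modes
    return 1 + MODE_COST[src_mode] + MODE_COST[dst_mode]
```

cost("register", "absolute") gives 2 for MOV R0, M, and cost("indirect index", "absolute") gives 3 for MOV *4(R0), M, matching the worked examples.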

Let’s check the take away from this lecture


Q.8 Which of the following is not a three-address statement?

a) x= y op Z,

b)x : = op y,

c)x : = y

d)x=y op z op x

Q.9 Which is the final phase of a compiler?


(A) Code generator (B) Code Optimizer
(C) Syntax analyzer (D) Parser

L24 Exercise:
Q.12 Define Backpatching
Q.13 Write a short note on: ICG
Question/ problems for practice
Q.14 Write different issues in the design of Code Generator?

Learning from the lecture 'Backpatching and Issues in the design of Code Generator':
Students will be able to list the different issues in the design of a code generator.

Lecture: 40
Basic Blocks and Flow graphs
Learning Objective: In this lecture students will be able to define basic blocks and draw the flow
graph for a given code sequence.

5.9.6 Basic block: A sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halt or possibility of branching except at the end.
Partitioning a sequence of statements into basic blocks:
1. Determine the leaders (first statements of basic blocks)
 The first statement is a leader
 The target of a conditional or unconditional jump is a leader
 A statement immediately following a jump is a leader
2. For each leader, its basic block consists of the leader and all the statements up to
but not including the next leader.

Example:
unsigned int fibonacci (unsigned int n) {
unsigned int f0, f1, f2;
f0 = 0;
f1 = 1;
if (n <= 1)
return n;
for (int i=2; i<=n; i++) {
f2 = f0+f1;
f0 = f1;
f1 = f2;
}
return f2;
}
Three-address code for the function, with leaders determined by the rules above:

read(n)
f0 := 0
f1 := 1
if n<=1 goto L0
i := 2
L2: if i<=n goto L1
return f2
L1: f2 := f0+f1
f0 := f1
f1 := f2
i := i+1
goto L2
L0: return n

Leaders: read(n); i := 2; L2: if i<=n goto L1; return f2; L1: f2 := f0+f1; L0: return n
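The leader rules can be sketched for statement lists like the one above (the string conventions — labels written "L1:" and a jump target being the word after "goto" — are assumptions about the encoding):

```python
# Partition a list of three-address statements into basic blocks.

def find_leaders(stmts):
    leaders = {0}                                  # rule: first statement
    targets = set()
    for i, s in enumerate(stmts):
        if "goto" in s:
            targets.add(s.split("goto")[1].strip())
            if i + 1 < len(stmts):
                leaders.add(i + 1)                 # statement after a jump
    for i, s in enumerate(stmts):
        if ":" in s and s.split(":")[0].strip() in targets:
            leaders.add(i)                         # target of a jump
    return sorted(leaders)

def basic_blocks(stmts):
    cut = find_leaders(stmts) + [len(stmts)]
    return [stmts[cut[i]:cut[i + 1]] for i in range(len(cut) - 1)]

stmts = ["read(n)", "f0 := 0", "f1 := 1", "if n<=1 goto L0",
         "i := 2", "L2: if i<=n goto L1", "return f2",
         "L1: f2 := f0+f1", "f0 := f1", "f1 := f2", "i := i+1",
         "goto L2", "L0: return n"]
```

Running basic_blocks(stmts) yields six blocks, matching the six boxes of the control flow graph that follows.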

Control Flow Graph:

entry → B1
B1: read(n); f0 := 0; f1 := 1; if n <= 1        true → B6, false → B2
B2: i := 2                                      → B3
B3: if i <= n                                   true → B5, false → B4
B4: return f2                                   → exit
B5: f2 := f0+f1; f0 := f1; f1 := f2; i := i+1   → B3
B6: return n                                    → exit

Let’s check the take away from this lecture


Q.10 Which phase of compiler generate target code?
(A) Code generator (B) Code Optimizer
(C) Syntax analyzer (D) Lexical analyzer

Q.11 Compiler is ------


a) Machine dependent
b) Machine independent
c) Language Independent

d) None of the above


Exercise:
Q.18 Define Basic Block and Flow Graph.

Questions/problems for Practice


Q.19 Write algorithm to find Basic Block.

Learning from the lecture 'Basic Blocks and Flow graphs': Students will be able to find basic
blocks and draw flow graphs.

Lecture : 41

Code generation algorithm and DAG representation of Basic Block

Learning Objective: In this lecture students will be able to apply the code generation algorithm to
generate assembly code for the given optimized code.

5.9.7 Code generation algorithm:


The code generation algorithm takes as input a sequence of three-address statements constituting
a basic block. For each three-address statement of the form x := y op z we perform the following
actions:
1. Invoke a function getreg to determine the location L where the result of the computation y
op z should be stored. L will usually be a register, but it could also be a memory location.
We shall describe getreg shortly.
2. Consult the address descriptor for y to determine y', (one of) the current location(s) of y.
Prefer the register for y' if the value of y is currently both in memory and a register. If the
value of y is not already in L, generate the instruction MOV y', L to place a copy of y in
L.
3. Generate the instruction OP z', L where z' is a current location of z. Again, prefer a register
to a memory location if z is in both. Update the address descriptor to indicate that x is in
location L. If L is a register, update its descriptor to indicate that it contains the value of x,
and remove x from all other register descriptors.
4. If the current values of y and/or z have no next uses, are not live on exit from the block,
and are in registers, alter the register descriptor to indicate that, after execution of x := y op
z, those registers no longer will contain y and/or z, respectively.

The function getreg returns the location L to hold the value of x for the assignment
x := y op z.

1. If the name y is in a register that holds the value of no other names (recall that copy
instructions such as x := y could cause a register to hold the value of two or more variables
simultaneously), and y is not live and has no next use after execution of
x := y op z, then return the register of y for L. Update the address descriptor of y to indicate
that y is no longer in L
2. Failing (1), return an empty register for L if there is one.
3. Failing (2), if x has a next use in the block, or op is an operator such as indexing, that
requires a register, find an occupied register R. Store the value of R into memory location
(by MOV R, M) if it is not already in the proper memory location M, update the address
descriptor M, and return R. If R holds the value of several variables, a MOV instruction
must be generated for each variable that needs to be stored. A suitable occupied register
might be one whose datum is referenced furthest in the future, or one whose value is also
in memory.
4. If x is not used in the block, or no suitable occupied register can be found, select the
memory location of x as L.

For example, the assignment d := (a - b) + (a - c) + (a - c) might be translated into the


following three-address code sequence:
t1 = a – b
t2 = a – c
t3 = t1 + t2
d = t3 + t2
The code generation algorithm that we discussed would produce the code sequence as shown.
Shown alongside are the values of the register and address descriptors as code generation
progresses.

Stmt         Code           Register descriptor     Address descriptor

t1 = a - b   MOV a, R0      R0 contains t1          t1 in R0
             SUB b, R0
t2 = a - c   MOV a, R1      R0 contains t1          t1 in R0
             SUB c, R1      R1 contains t2          t2 in R1
t3 = t1+t2   ADD R1, R0     R0 contains t3          t3 in R0
                            R1 contains t2          t2 in R1
d = t3+t2    ADD R1, R0     R0 contains d           d in R0
             MOV R0, d                              d in R0 and memory
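A stripped-down sketch of the algorithm reproduces this table: two registers, no spilling, a getreg that handles only cases (1) and (2), and simplified descriptor handling. The tuple encoding of statements and the string form of the output are assumptions for illustration:

```python
# Simplified descriptor-driven code generation for the block
# d := (a - b) + (a - c) + (a - c), already in three-address form.

REGS = ["R0", "R1"]
OPNAME = {"+": "add", "-": "sub"}

reg_desc = {}        # register -> name whose value it holds
addr_desc = {}       # name -> register currently holding it
asm = []             # generated assembly

def used_later(name, block, idx, live_out):
    return (name in live_out or
            any(name in (y, z) for (_, y, _, z) in block[idx + 1:]))

def getreg():
    for r in REGS:                         # getreg case (2): a free register
        if r not in reg_desc:
            return r
    raise RuntimeError("register spilling is not modelled in this sketch")

def codegen(block, live_out):
    for idx, (x, y, op, z) in enumerate(block):
        yloc = addr_desc.get(y, y)         # prefer y's register if any
        if yloc in REGS and not used_later(y, block, idx, live_out):
            L = yloc                       # getreg case (1): reuse y's register
        else:
            L = getreg()
            asm.append(f"mov {yloc},{L}")
        zloc = addr_desc.get(z, z)
        asm.append(f"{OPNAME[op]} {zloc},{L}")
        reg_desc[L] = x                    # update both descriptors
        addr_desc[x] = L
    for name in live_out:                  # store values live on exit
        if addr_desc.get(name) in REGS:
            asm.append(f"mov {addr_desc[name]},{name}")

block = [("t1", "a", "-", "b"),
         ("t2", "a", "-", "c"),
         ("t3", "t1", "+", "t2"),
         ("d", "t3", "+", "t2")]
codegen(block, live_out={"d"})
```

The emitted sequence is exactly the seven instructions of the table, including the final store of d, which is live on exit.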

5.9.8 DAG (Directed Acyclic Graphs):

A directed acyclic graph is a graph with no cycles which gives a picture of how the value
computed by each statement in a basic block is used in subsequent statements in the block.
That is, a DAG has a node for every sub-expression of the expression. An interior node
represents an operator and its children represent its operands.
A DAG is mainly used to identify common subexpressions.

DAGs are useful data structures for implementing transformations on basic blocks. A DAG
gives a picture of how the value computed by a statement in a basic block is used in
subsequent statements of the block. Constructing a DAG from three-address statements is
a good way of determining common sub-expressions (expressions computed more than
once) within a block, determining which names are used inside the block but evaluated
outside the block, and determining which statements of the block could have their
computed value used outside the block. A DAG for a basic block is a directed acyclic graph
with the following labels on nodes:
1. Leaves are labeled by unique identifiers, either variable names or constants. From the
operator applied to a name we determine whether the l-value or r-value of a name is needed;
most leaves represent r-values. The leaves represent initial values of names, and we
subscript them with 0 to avoid confusion with labels denoting "current" values of names
as in (3) below.
2. Interior nodes are labeled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers as labels. The intention is that
interior nodes represent computed values, and the identifiers labeling a node are deemed
to have that value.
4. For example, the slide shows a three-address code. The corresponding DAG is shown. We
observe that each node of the DAG represents a formula in terms of the leaves, that is, the
values possessed by variables and constants upon entering the block. For example, the node
labeled t4 represents the formula b[4 * i], that is, the value of the word whose address is 4*i
bytes offset from address b, which is the intended value of t4.

Constructing a DAG

Input: a basic block. Statements: (i) x:= y op z (ii) x:= op y (iii) x:= y

Output: a dag for the basic block containing:

- a label for each node. For leaves an identifier - constants are permitted. For interior
nodes an operator symbol.

- for each node a (possibly empty) list of attached identifiers - constants not permitted.

Method: Initially assume there are no nodes, and node() is undefined for all arguments.

(1) If node(y) is undefined, create a leaf labeled y, and let node(y) be this node. In case (i), if
node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
(2) In case (i), determine if there is a node labeled op whose left child is node(y) and right
child is node(z). If not, create such a node; let it be n. Cases (ii) and (iii) are similar.
(3) Delete x from the list of identifiers attached to node(x). Append x to the list of identifiers
for node n and set node(x) to n.
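The method can be sketched with dictionaries, in the style of value numbering: an (op, left, right) key maps to exactly one node, so a common subexpression shows up as a dictionary hit instead of a duplicate subtree. The example block a := b + c; b := a - d; c := b + c; d := a - d is a standard illustration assumed here, not taken from the text:

```python
# DAG construction for a basic block of x := y op z statements.

nodes = {}       # (op, node(y), node(z)) -> interior node id
labels = {}      # node id -> identifiers currently attached to it
node_of = {}     # name -> id of the node holding its current value
counter = [0]

def node(name):
    if name not in node_of:                  # step (1): create a leaf
        node_of[name] = counter[0]
        labels[counter[0]] = []
        counter[0] += 1
    return node_of[name]

def assign(x, op, y, z):                     # statement x := y op z
    key = (op, node(y), node(z))
    if key not in nodes:                     # step (2): reuse or create
        nodes[key] = counter[0]
        labels[counter[0]] = []
        counter[0] += 1
    n = nodes[key]
    old = node_of.get(x)                     # step (3): move label x to n
    if old is not None and x in labels[old]:
        labels[old].remove(x)
    labels[n].append(x)
    node_of[x] = n

for x, op, y, z in [("a", "+", "b", "c"),    # a := b + c
                    ("b", "-", "a", "d"),    # b := a - d
                    ("c", "+", "b", "c"),    # c := b + c
                    ("d", "-", "a", "d")]:   # d := a - d
    assign(x, op, y, z)
```

After the run, b and d label the same node, because the second a - d is found in nodes instead of being rebuilt — exactly the common-subexpression detection described above.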

Let’s check the take away from this lecture


13. DAG Stands for?
a) Direct Acyclic Graph b) Directed Acyclic Graph

14. Code Generation Algorithm is used to generate which language?


a) Assembly Language b) C Language c) Object Code d) Machine Code

Exercise:
Q.10 Write Code Generation Algorithm.
Q.11 Define DAG

Learning from the lecture 'Code generation algorithm and DAG representation of Basic Block':
Students will be able to draw a DAG for a given basic block.

Add to Knowledge (Content Beyond Syllabus)

4.10 Learning Outcomes:

1. Know:
a) Student should be able to write different ways to represent Intermediate Code.
b) Define Basic Block and Flow Graph.

2. Comprehend:
a) Student should be able to describe Syntax Tree, Three Address Code and Postfix
notation.
b) Find the leaders of a given basic block and draw its flow graph.


3. Apply, analyze and synthesize:


Student should be able to
1. Draw the DAG for the given three-address code and optimize the code.
2. Design and implement ICG and Code Generator.

4.11. Short Answer questions


Q.1 Explain code generation phase of a compiler.
Q.2 Explain the phases of a compiler.

4.13. University Questions Sample Answers:


Q.1 Discuss various intermediate code forms in detail. [Dec 2016]

Ans: Various forms of intermediate code are as follows:


a) Syntax trees or DAGs
b) Postfix notation
c) Three-address code

Graphical Representations

a) A syntax tree depicts the natural hierarchical structure of a source program.


A dag gives the same information but in a more compact way, because common
subexpressions are identified.

b) Postfix notation is a linearized representation of a syntax tree; it is a list of

the nodes of the tree in which a node appears immediately after its children.

The postfix notation for the syntax tree in Fig. (a) is

a b c uminus * b c uminus * + assign

c)Three-Address Code:

Three-address code is a sequence of statements of the general form x := y op z

where x, y, and z are names, constants, or compiler-generated temporaries;

op stands for any operator, such as a fixed- or floating-point arithmetic operator,


or a logical operator on Boolean-valued data.

Thus a source language expression like x + y * z might be translated into the sequence

t1 := y * z

t2 := x + t1

where t1 and t2 are compiler-generated temporary names.

Three-address code is a linearized representation of a syntax tree or a dag

in which explicit names correspond to the interior nodes of the graph.

a := b * -c + b * -c

Three-address code corresponding to the tree and dag
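As an illustration of this translation, here is a small sketch (the tree encoding and temporary-naming scheme are assumptions for illustration) that walks the syntax tree of b * -c + b * -c and emits three-address statements:

```python
# Sketch: emit three-address code from a syntax tree for a := b * -c + b * -c.

temp_count = 0
code = []

def new_temp():
    global temp_count
    temp_count += 1
    return f"t{temp_count}"

def gen(node):
    # node is a name string, ("uminus", child), or (op, left, right)
    if isinstance(node, str):
        return node
    if node[0] == "uminus":
        child = gen(node[1])
        t = new_temp()
        code.append(f"{t} := uminus {child}")
        return t
    op, left, right = node
    l, r = gen(left), gen(right)
    t = new_temp()
    code.append(f"{t} := {l} {op} {r}")
    return t

tree = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))
code.append(f"a := {gen(tree)}")
print("\n".join(code))
```

Walking the tree yields six statements, t1 := uminus c through a := t5. Translating from the DAG instead would share the repeated subtree b * -c and emit only four statements.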

Lecture: 42
Code Optimization: Principal sources of Optimization
Learning Objective: In this lecture students will be able to optimize the intermediate code using
different code optimization techniques.

6.9.1 Criteria for Code-Improving Transformations:

Simply stated, the best program transformations are those that yield the most benefit for the least
effort. The transformations provided by an optimizing compiler should have several properties.


First, a transformation must preserve the meaning of programs. That is, an "optimization" must
not change the output produced by a program for a given input, or cause an error, such as a division
by zero, that was not present in the original version of the source program. The influence of this
criterion pervades this chapter; at all times we take the "safe" approach of missing an opportunity
to apply a transformation rather than risk changing what the program does.

Second, a transformation must, on the average, speed up programs by a measurable amount.


Sometimes we are interested in reducing the space taken by the compiled code, although the size
of code has less importance than it once had. Of course, not every transformation succeeds in
improving every program, and occasionally an "optimization" may slow down a program slightly,
as long as on the average it improves things.

Third, a transformation must be worth the effort. It does not make sense for a compiler writer to
expend the intellectual effort to implement a code improving transformation and to have the
compiler expend the additional time compiling source programs if this effort is not repaid when
the target programs are executed. Certain local or "peephole" transformations of the kind are
simple enough and beneficial enough to be included in any compiler.

Some transformations can only be applied after detailed, often time-consuming, analysis of the
source program, so there is little point in applying them to programs that will be run only a few
times. For example, a fast, non-optimizing, compiler is likely to be more helpful during debugging
or for "student jobs" that will be run successfully a few times and thrown away. Only when the
program in question takes up a significant fraction of the machine's cycles does improved code
quality justify the time spent running an optimizing compiler on the program.

ALGEBRAIC TRANSFORMATION

Countless algebraic transformations can be used to change the set of expressions computed
by a basic block into an algebraically equivalent set. The useful ones are those that simplify
expressions or replace expensive operations by cheaper ones. For example, statements such as

x := x + 0 or x := x * 1

can be eliminated from a basic block without changing the set of expressions it computes. The exponentiation operator in the statement


x := y ** 2

usually requires a function call to implement. Using an algebraic transformation, this statement
can be replaced by cheaper, but equivalent statement

x := y*y
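These identities can be applied mechanically. A sketch of such a pass follows; the statement tuples and the "copy" pseudo-op are assumptions for illustration:

```python
# Minimal algebraic-simplification pass over three-address statements
# (dest, op, left, right). "copy" stands for a plain assignment dest := left.

def simplify(stmts):
    out = []
    for dest, op, a, b in stmts:
        if op == "+" and b == "0":        # x := y + 0  ->  x := y
            out.append((dest, "copy", a, None))
        elif op == "*" and b == "1":      # x := y * 1  ->  x := y
            out.append((dest, "copy", a, None))
        elif op == "**" and b == "2":     # x := y ** 2 ->  x := y * y
            out.append((dest, "*", a, a))
        else:
            out.append((dest, op, a, b))
    return out

print(simplify([("x", "+", "x", "0"), ("x", "**", "y", "2")]))
# [('x', 'copy', 'x', None), ('x', '*', 'y', 'y')]
```

A resulting self-copy such as x := x can then be deleted outright by a later pass.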

FLOW GRAPHS

We can add the flow-of-control information to the set of basic blocks making up a
program by constructing a directed graph called a flow graph. The nodes of the flow graph are the
basic blocks. One node is distinguished as initial; it is the block whose leader is the first statement.
There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some
execution sequence; that is, if

1. there is a conditional or unconditional jump from the last statement of B1 to the first
statement of B2, or

2. B2 immediately follows B1 in the order of the program, and B1 does not end in an
unconditional jump.

We say B1 is a predecessor of B2, and B2 is a successor of B1.

Example 4: The flow graph of the program of fig. 7 is shown in fig. 9; B1 is the initial node.

B1: prod := 0
    i := 1

B2: t1 := 4 * i
    t2 := a[t1]
    t3 := 4 * i
    t4 := b[t3]
    t5 := t2 * t4
    t6 := prod + t5
    prod := t6
    t7 := i + 1
    i := t7
    if i <= 20 goto B2

Flow graph for the program

REPRESENTATION OF BASIC BLOCKS

Basic blocks can be represented by a variety of data structures. For example, after
partitioning the three-address statements by Algorithm 1, each basic block can be represented by
a record consisting of a count of the number of quadruples in the block, followed by a pointer to the
leader of the block, and by the lists of predecessors and successors of the block. Note that if, for
example, the block B2 running from statements (3) through (12) in the intermediate code of figure 2
were moved elsewhere in the quadruples array or were shrunk, the (3) in if i <= 20 goto (3) would
have to be changed.

LOOPS

A loop is a collection of nodes in a flow graph such that

1. All nodes in the collection are strongly connected; from any node in the loop to any other, there
is a path of length one or more, wholly within the loop, and

2. The collection of nodes has a unique entry, that is, a node in the loop such that the only way
to reach a node of the loop from a node outside the loop is to first go through the entry.


A loop that contains no other loops is called an inner loop.

PEEPHOLE OPTIMIZATION

A statement-by-statement code-generation strategy often produces target code that
contains redundant instructions and suboptimal constructs. The quality of such target code can
be improved by applying "optimizing" transformations to the target program.

A simple but effective technique for improving the target code is peephole
optimization, a method for trying to improve the performance of the target program by
examining a short sequence of target instructions (called the peephole) and replacing these
instructions by a shorter or faster sequence, whenever possible.

The peephole is a small, moving window on the target program. The code in
the peephole need not be contiguous, although some implementations do require this. We shall give
the following examples of program transformations that are characteristic of peephole
optimizations:

• Redundant-instructions elimination

• Flow-of-control optimizations

• Algebraic simplifications

• Use of machine idioms

REDUNDANT LOADS AND STORES

If we see the instruction sequence

(1) MOV R0, a
(2) MOV a, R0

we can delete instruction (2), because whenever (2) is executed, (1) will ensure that the value of
a is already in register R0. If (2) had a label, we could not be sure that (1) was always executed
immediately before (2), and so we could not remove (2).
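This rule fits naturally into a sliding-window peephole driver. Below is a minimal sketch; the instruction strings and the rule encoding are assumptions for illustration:

```python
# Sketch of a two-instruction peephole: delete "MOV a,R0" when it
# immediately follows "MOV R0,a" (labels are not modeled here).

def redundant_store_load(a, b):
    # Matches MOV X,Y followed by MOV Y,X; keeps only the first.
    if a.startswith("MOV ") and b.startswith("MOV "):
        if b[4:].split(",") == list(reversed(a[4:].split(","))):
            return [a]
    return None

def peephole(code, rules=(redundant_store_load,)):
    changed = True
    while changed:
        changed = False
        out, i = [], 0
        while i < len(code):
            repl = None
            if i + 1 < len(code):
                for rule in rules:
                    repl = rule(code[i], code[i + 1])
                    if repl is not None:
                        out.extend(repl)   # replace the window
                        i += 2
                        changed = True
                        break
            if repl is None:
                out.append(code[i])
                i += 1
        code = out
    return code

print(peephole(["MOV R0,a", "MOV a,R0", "ADD R0,b"]))
# ['MOV R0,a', 'ADD R0,b']
```

The driver rescans until no rule fires, so simplifications that expose further simplifications are also caught.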


UNREACHABLE CODE

Another opportunity for peephole optimizations is the removal of unreachable


instructions. An unlabeled instruction immediately following an unconditional jump may be
removed. This operation can be repeated to eliminate a sequence of instructions. For example, for
debugging purposes, a large program may have within it certain segments that are executed only
if a variable debug is 1. In C, the source code might look like:

#define debug 0

….

if (debug) {

print debugging information

}

In the intermediate representation the if-statement may be translated as:

if debug = 1 goto L1

goto L2

L1: print debugging information

L2: …………………………(a)

One obvious peephole optimization is to eliminate jumps over jumps. Thus, no matter what the
value of debug, (a) can be replaced by:

if debug ≠ 1 goto L2

print debugging information

L2: ……………………………(b)


Since debug is defined to be the constant 0, (b) can in turn be replaced
by

if 0 ≠ 1 goto L2

print debugging information

L2: ……………………………(c)

As the argument of the first statement of (c) evaluates to a constant true, it can be replaced
by goto L2. Then all the statements that print debugging aids are manifestly unreachable and can
be eliminated one at a time.

FLOW-OF-CONTROL OPTIMIZATIONS

Unnecessary jumps can be eliminated in either the intermediate code or the target
code by the following types of peephole optimizations. We can replace the jump sequence

goto L1

….

L1: goto L2

by the sequence

goto L2

….

L1: goto L2

If there are now no jumps to L1, then it may be possible to eliminate the statement L1: goto L2,
provided it is preceded by an unconditional jump. Similarly, the sequence

if a < b goto L1

….


L1 : goto L2

can be replaced by

if a < b goto L2

….

L1 : goto L2

Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional goto. Then
the sequence

goto L1

……..

L1:if a<b goto L2

L3: …………………………………..(1)

may be replaced by

if a<b goto L2

goto L3

…….

L3: ………………………………….(2)

While the number of instructions in (1) and (2) is the same, we sometimes skip the unconditional
jump in (2), but never in (1). Thus (2) is superior to (1) in execution time.
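The first of these transformations, retargeting a jump whose target is itself a jump, can be sketched as a small pass. The (label, instruction) encoding below is an assumption for illustration:

```python
# Sketch: retarget any "goto L1" whose target statement is itself "goto L2".

def collapse_jump_chains(code):
    # code: list of (label_or_None, instr); instr like ("goto", "L2") or ("print", "x")
    goto_at = {lab: instr[1] for lab, instr in code
               if lab is not None and instr[0] == "goto"}
    out = []
    for lab, instr in code:
        if instr[0] == "goto" and instr[1] in goto_at:
            instr = ("goto", goto_at[instr[1]])   # skip the intermediate jump
        out.append((lab, instr))
    return out

prog = [(None, ("goto", "L1")),
        ("L1", ("goto", "L2")),
        ("L2", ("print", "x"))]
print(collapse_jump_chains(prog)[0])   # (None, ('goto', 'L2'))
```

After this pass, if no instruction jumps to L1 any more, the statement L1: goto L2 becomes a candidate for deletion when it is preceded by an unconditional jump.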

ALGEBRAIC SIMPLIFICATION


There is no end to the amount of algebraic simplification that can be attempted through
peephole optimization. Only a few algebraic identities occur frequently enough that it is worth
considering implementing them. For example, statements such as

x := x+0

Or

x := x * 1

are often produced by straightforward intermediate code-generation algorithms, and they can be
eliminated easily through peephole optimization.

ELIMINATION OF COMMON SUBEXPRESSIONS

Common subexpressions need not be computed over and over again. Instead they can be computed
once and kept in store, from where they are referenced when encountered again, provided the
variable values in the expression still remain constant.

ELIMINATION OF DEAD CODE

It is possible that a large amount of dead (useless) code exists in a program. This is often
caused when variables and procedures are introduced during construction or error-correction
of a program: once declared and defined, one forgets to remove them when they serve
no purpose. Eliminating these will definitely optimize the code.

REDUCTION IN STRENGTH

Reduction in strength replaces expensive operations by equivalent cheaper ones on


the target machine. Certain machine instructions are considerably cheaper than others and can
often be used as special cases of more expensive operators. For example, x² is invariably cheaper
to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or
division by a power of two is cheaper to implement as a shift. Floating-point division by a constant
can be implemented as multiplication by a constant, which may be cheaper.


USE OF MACHINE IDIOMS

The target machine may have hardware instructions to implement certain specific
operations efficiently. Detecting situations that permit the use of these instructions can reduce
execution time significantly. For example, some machines have auto-increment and

auto-decrement addressing modes. These add or subtract one from an operand before or after using
its value. The use of these modes greatly improves the quality of code when pushing or popping a
stack, as in parameter passing. These modes can also be used in code for statements like i := i + 1.

THE PRINCIPAL SOURCES OF OPTIMIZATION

Here we introduce some of the most useful code-improving transformations. Techniques for
implementing these transformations are presented in subsequent sections. A transformation of a
program is called local if it can be performed by looking only at the statements in a basic block;
otherwise, it is called global. Many transformations can be performed at both the local and global
levels. Local transformations are usually performed first.

Function-Preserving Transformations

There are a number of ways in which a compiler can improve a program without changing the
function it computes. Common subexpression elimination, copy propagation, dead-code
elimination, and constant folding are common examples of such function-preserving
transformations. The other transformations come up primarily when global optimizations are
performed.

Frequently, a program will include several calculations of the same value, such as an offset in an
array. Some of these duplicate calculations cannot be avoided by the programmer because they lie
below the level of detail accessible within the source language.

Common Subexpressions

An occurrence of an expression E is called a common subexpression if E was previously computed,


and the values of variables in E have not changed since the previous computation. We can avoid


recomputing the expression if we can use the previously computed value. For example, the
assignments to t7 and t10 have the common subexpressions 4*i and 4*j, respectively, on the right
side in Fig. They have been eliminated in Fig. by using t6 instead of t7 and t8 instead of t10. This
change is what would result if we reconstructed the intermediate code from the dag for the basic
block.
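Within a single block, this elimination can be sketched as a table of available expressions. The statement tuples and the "copy" pseudo-op below are assumptions for illustration:

```python
# Sketch of local common-subexpression elimination: remember which name
# first computed (op, y, z) and reuse it until an operand is redefined.

def local_cse(stmts):
    available = {}          # (op, y, z) -> name currently holding that value
    out = []
    for dest, op, a, b in stmts:
        key = (op, a, b)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, a, b))
        # dest is redefined: kill expressions that use dest or live in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        if dest not in (a, b):
            available[key] = dest
    return out

block = [("t6", "*", "4", "i"), ("t7", "*", "4", "i")]
print(local_cse(block))
# [('t6', '*', '4', 'i'), ('t7', 'copy', 't6', None)]
```

The kill step is what keeps the transformation safe: once an operand is reassigned, the recorded expression no longer denotes the same value and must not be reused.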

Copy Propagation

Block B5 in Fig. can be further improved by eliminating x using two new transformations. One
concerns assignments of the form f:=g called copy statements, or copies for short. Had we gone
into more detail in Example 10.2, copies would have arisen much sooner, because the algorithm
for eliminating common subexpressions introduces them, as do several other algorithms. For
example, when the common subexpression in c:=d+e is eliminated in Fig., the algorithm uses a
new variable t to hold the value of d+e. Since control may reach c:=d+e either after the assignment
to a or after the assignment to b, it would be incorrect to replace c:=d+e by either c:=a or by c:=b.

The idea behind the copy-propagation transformation is to use g for f, wherever possible after the
copy statement f:=g. For example, the assignment x:=t3 in block B5 of Fig. is a copy. Copy
propagation applied to B5 yields:

x:=t3

a[t2]:=t5

a[t4]:=t3

goto B2

Dead-Code Eliminations

A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead
at that point. A related idea is dead or useless code, statements that compute values that never get
used. While the programmer is unlikely to introduce any dead code intentionally, it may appear as


the result of previous transformations. For example, we discussed the use of debug that is set to
true or false at various points in the program, and used in statements like

If (debug) print …

By a data-flow analysis, it may be possible to deduce that each time the program reaches this
statement, the value of debug is false. Usually, it is because there is one particular statement

debug := false

that we can deduce to be the last assignment to debug prior to the test, no matter what sequence of
branches the program actually takes. If copy propagation replaces debug by false, then the print
statement is dead because it cannot be reached. We can eliminate both the test and the printing from
the object code. More generally, deducing at compile time that the value of an expression is a
constant and using the constant instead is known as constant folding.

One advantage of copy propagation is that it often turns the copy statement into dead code. For
example, copy propagation followed by dead-code elimination removes the assignment to x and
transforms the block above into

a [t2 ] := t5

a [t4] := t3

goto B2
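The two transformations compose naturally. Here is a minimal sketch (the statement format, the "copy"/"store" pseudo-ops, and the array stand-in a_t4 are assumptions for illustration) that propagates copies within a block and then drops assignments whose results are never used:

```python
# Sketch of local copy propagation followed by dead-code elimination.
# Statements are (dest, op, args) with op "copy" for dest := arg.

def copy_propagate(stmts):
    copies = {}                       # f -> g for copy statements f := g
    out = []
    for dest, op, args in stmts:
        args = [copies.get(a, a) for a in args]   # use g wherever f appears
        # dest is redefined: forget copies involving dest.
        copies = {f: g for f, g in copies.items() if dest not in (f, g)}
        if op == "copy":
            copies[dest] = args[0]
        out.append((dest, op, args))
    return out

def eliminate_dead(stmts, live_out):
    live = set(live_out)              # names live on exit from the block
    out = []
    for dest, op, args in reversed(stmts):
        if dest in live or op == "store":   # keep stores and live definitions
            out.append((dest, op, args))
            live.discard(dest)
            live.update(args)
        # otherwise the assignment is dead and is dropped
    return list(reversed(out))

block = [("x", "copy", ["t3"]),
         ("a_t4", "store", ["x"])]   # a_t4 stands in for a[t4] := x
opt = eliminate_dead(copy_propagate(block), live_out=[])
print(opt)   # [('a_t4', 'store', ['t3'])] -- the copy x := t3 became dead
```

Dead-code elimination walks the block backwards so that a use seen later keeps its defining statement alive, exactly the liveness idea described above.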

Let’s check the take away from this lecture


1) How many phases are there in a compiler?

a) 5 b) 6 c) 4 d) 8

L21. Exercise:
Q.1 What are the sources of code optimization?

Questions/problems for practice:


Q. 2 Short note : Code Optimization [5 marks]

Learning from the lecture 'Principal sources of Optimization':

Students will be able to list and explain the different code optimization techniques.

Lecture: 43
Optimization of Basic Blocks, Loops in Flow graph
Learning objective: In this lecture students will be able to optimize basic blocks to improve the
efficiency of the running code.

6.9.2 Course Contents: DAG of a Basic Block

1. A leaf node for the initial value of an id


2. A node n for each statement s
3. The children of node n are the last definition (prior to s) of the operands of n

Optimization of Basic Blocks: Identify common sub-expressions (expressions that compute
the same value) by construction of a DAG. Consider the basic block:

a := b + c
b := b - d
c := c + d
e := b + c

Here a := b + c and e := b + c look like common expressions, but they do not generate the same
result, because b and c are redefined between the two statements. The DAG representation
identifies only those expressions that truly yield the same result.


[DAG of the block: leaves b0, c0, d0 hold the initial values of b, c, and d; the interior nodes are
+ (labeled a) over b0 and c0, - (labeled b) over b0 and d0, + (labeled c) over c0 and d0, and
+ (labeled e) over the nodes for b and c.]

a := b + c
b := b - d
c := c + d
e := b + c
Explanation with example:

Source codes generally have a number of instructions, which are always executed in sequence and
are considered as the basic blocks of the code. These basic blocks do not have any jump statements
among them, i.e., when the first instruction is executed, all the instructions in the same basic block
will be executed in their sequence of appearance without losing the flow control of the program.

A program can have various constructs as basic blocks, like IF-THEN-ELSE, SWITCH-CASE
conditional statements and loops such as DO-WHILE, FOR, and REPEAT-UNTIL, etc.

Basic block identification

We may use the following algorithm to find the basic blocks in a program:

• Search header statements of all the basic blocks, from where a basic block starts:
o First statement of a program.
o Statements that are the target of any branch (conditional/unconditional).
o Statements that follow any branch statement.
• Header statements and the statements following them form a basic block.
• A basic block does not include any header statement of any other basic block.

Basic blocks are important concepts from both code generation and optimization point of view.
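The header (leader) rules above can be sketched directly. The instruction tuples below, with jump targets given as indices, are an assumed encoding for illustration:

```python
# Sketch of basic-block identification: find leaders, then split the code.

def find_leaders(code):
    # code: list of (op, target) tuples; target is an index for jumps, else None
    leaders = {0}                                   # rule 1: first statement
    for i, (op, target) in enumerate(code):
        if op in ("goto", "if_goto"):
            leaders.add(target)                     # rule 2: branch targets
            if i + 1 < len(code):
                leaders.add(i + 1)                  # rule 3: after a branch
    return sorted(leaders)

def basic_blocks(code):
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [code[bounds[k]:bounds[k + 1]] for k in range(len(leaders))]

code = [("assign", None),   # 0: i := 1        (leader: first statement)
        ("assign", None),   # 1: t1 := 4*i     (leader: target of the jump)
        ("assign", None),   # 2: t2 := a[t1]
        ("if_goto", 1),     # 3: if i <= 20 goto 1
        ("assign", None)]   # 4: prod := ...   (leader: follows a branch)
print([len(b) for b in basic_blocks(code)])   # [1, 3, 1]
```

Each leader and the statements up to (but not including) the next leader form one basic block, matching the rule that a block never contains another block's header.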


Basic blocks play an important role in identifying variables which are used more than once
in a single basic block. If a variable is used more than once, the register allocated to that
variable need not be freed until the block finishes execution.

Control Flow Graph

Basic blocks in a program can be represented by means of control flow graphs. A control flow
graph depicts how program control is passed among the blocks. It is a useful tool that
helps in optimization by helping to locate any unwanted loops in the program.


Let’s check the take away from this lecture

R. 1 Which data structure is used to identify common sub expression?


a) Syntax Tree b) Parse Tree c) DAG
L22. Exercise
Q.1 Write short note on : Optimization of Basic Block in compiler design.
Questions/Problems for practice:

Q.6 Draw the DAG for the given Basic Block:


a := b + c
b := b - d
c := c + d
e := b + c

Learning from this lecture 'Optimization of Basic Blocks, Loops in Flow graph':
Students will be able to optimize a basic block using a DAG.


Lecture: 44
Loop Optimization and Peephole Optimization
Learning Objective: In this lecture students will be able to apply loop optimization and peephole optimization.

6.9.3 Loop Optimization:

We now give a brief introduction to a very important place for optimizations, namely loops,
especially the inner loops where programs tend to spend the bulk of their time. The running time
of a program may be improved if we decrease the number of instructions in an inner loop, even if
we increase the amount of code outside that loop. Three techniques are important for loop
optimization: code motion, which moves code outside a loop; induction-variable elimination,
which we apply to eliminate i and j from the inner loops B2 and B3; and reduction in strength,
which replaces an expensive operation by a cheaper one, such as a multiplication by an addition.

Code Motion

An important modification that decreases the amount of code in a loop is code motion. This
transformation takes an expression that yields the same result independent of the number of times
a loop is executed (a loop-invariant computation) and places the expression before the loop. Note
that the notion "before the loop" assumes the existence of an entry for the loop. For example,
evaluation of limit-2 is a loop-invariant computation in the following while-statement:

while (i <= limit-2)

Code motion will result in the equivalent of

t = limit-2;

while (i <= t)

Induction Variables and Reduction in Strength

While code motion is not applicable to the quicksort example we have been considering, the other
two transformations are. Loops are usually processed inside out. For example, consider the loop
around B3.


Note that the values of j and t4 remain in lock-step; every time the value of j decreases by 1, that
of t4 decreases by 4, because 4*j is assigned to t4. Such identifiers are called induction variables.

When there are two or more induction variables in a loop, it may be possible to get rid of all but
one, by the process of induction-variable elimination. For the inner loop around B3 in Fig. we
cannot get rid of either j or t4 completely; t4 is used in B3 and j in B4. However, we can illustrate
reduction in strength and illustrate a part of the process of induction-variable elimination.
Eventually j will be eliminated when the outer loop of B2-B5 is considered.

Example: As the relationship t4 := 4*j surely holds after such an assignment to t4 in Fig., and t4 is
not changed elsewhere in the inner loop around B3, it follows that just after the statement j := j-1
the relationship t4 = 4*j-4 must hold. We may therefore replace the assignment t4 := 4*j by t4 :=
t4-4. The only problem is that t4 does not have a value when we enter block B3 for the first time.
Since we must maintain the relationship t4 = 4*j on entry to block B3, we place an initialization
of t4 at the end of the block where j itself is initialized, shown by the dashed addition to block B1
in the second Fig.

The replacement of a multiplication by a subtraction will speed up the object code if multiplication
takes more time than addition or subtraction, as is the case on many machines.
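The invariant driving this replacement can be checked directly. A small sketch, with concrete loop bounds chosen only for illustration:

```python
# After placing t4 := 4*j before the loop, replacing t4 := 4*j inside the
# loop by t4 := t4 - 4 preserves the relationship t4 == 4*j.

j = 10
t4 = 4 * j              # initialization moved to the end of block B1
while j > 5:            # stands in for the inner loop around B3
    j = j - 1
    t4 = t4 - 4         # strength-reduced: subtraction instead of 4*j
    assert t4 == 4 * j  # the invariant holds on every iteration
print(j, t4)            # 5 20
```

Because the assertion holds on every iteration, the subtraction is a safe replacement for the multiplication wherever the loop body is the only place t4 is changed.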

Peephole Optimization:

In compiler theory, peephole optimization is a kind of optimization performed over a very small
set of instructions in a segment of generated code. The set is called a "peephole" or a "window". It
works by recognizing sets of instructions that can be replaced by shorter or faster sets of
instructions.

Let’s check the take away from this lecture

L23 Exercise:
Q.7 Differentiate between Machine dependent and Machine Independent optimization.
Questions/problems for practice:
Q.8 List the different code Optimization techniques.


Learning from the lecture 'Loop Optimization and Peephole Optimization':

Students will be able to optimize loops by applying different loop optimization techniques.

4.14. University Questions:

Q.1 Explain with an example Quadruples, Triples, Indirect triples. [May 2016]
Q.2 Draw and explain DAG and represent the following example with it. [May 2016]

(a/b) + (a/b) * (c*d)

Q.3 Discuss various intermediate code forms in detail. [Dec 2016]

Q.4 What are the different issues in design of Code Generator? Explain with an example. [May 2016]

4.15. References
Compilers: Principles, Techniques & Tools by A. V. Aho, R. Sethi & J. D. Ullman

Practice for Module No.5 Intermediate Code Generator and Code Generator (based on
University Patterns)
Q.1 A) Write the role of ICG in Compiler Design. (5 Marks)
Q.2 A) Differentiate between Syntax Tree, Parse tree and DAG. (5 marks)
B) What is DAG? Explain with the help of examples. (5 marks)
Q.3 A) List and explain different design issues in compiler design. (10 marks)
Q.4 A) What is Basic Block? Write Algorithm for the same. (10 marks)

Self-Assessment
GATE Questions:

1.One of the purposes of using intermediate code in compilers is to


(A) make parsing and semantic analysis simpler.
(B) improve error recovery and error reporting.
(C) increase the chances of reusing the machine-independent code optimizer in other compilers.
(D) improve the register allocation.

Answer: (C)


Which one of the following is FALSE?


(A) A basic block is a sequence of instructions where control enters the sequence at the
beginning and exits at the end.
(B) Available expression analysis can be used for common subexpression elimination.
(C) Live variable analysis can be used for dead code elimination.
(D) x = 4 ∗ 5 => x = 20 is an example of common subexpression elimination.

Answer: (D)

2. Consider the intermediate code given below:

1. i = 1
2. j = 1
3. t1 = 5 * i
4. t2 = t1 + j
5. t3 = 4 * t2
6. t4 = t3
7. a[t4] = 31
8. j = j + 1
9. if j <= 5 goto(3)
10. i = i + 1
11. if i < 5 goto(2)

The number of nodes and edges in the control-flow-graph constructed for the above code,
respectively, are

(A) 5 and 7
(B) 6 and 7
(C) 5 and 5
(D) 7 and 8

Answer: (B)

3. Consider the following code segment.

x = u - t;
y = x * v;
x = y + w;
y = t - z;
y = x * y;


The minimum number of total variables required to convert the above code segment to static
single assignment form is

(A) 6
(B) 8
(C) 9
(D) 10

Answer: (D)

1. Explain role of code optimization in compiler designing? Explain Peephole optimization


along with an example. [Nov 2015]
2. Explain various loop optimization techniques with example. [Dec 2016]

3. Explain different Code Optimization techniques along with an example. [May 2016]

4. Explain Run time organization in detail. [Dec 2016]


5. What is activation record? Draw the diagram of general activation record and explain
the purpose of different fields of an activation record. [Nov 2015, May 2016]

6. Short note: LEX and YACC. [10 marks]

7. Java Compiler and environment. [5 marks]

4.15. References
Compilers: Principles, Techniques & Tools by A. V. Aho, R. Sethi & J. D. Ullman

Practice for Module No. 6 Code Optimization, Run Time storage and Compiler-compilers:

(based on University Patterns)

Q.1 A) Write machine dependent and independent code optimization techniques (10marks).
Q.2 A) Explain Peephole optimization? (5 Marks)
Q.3 A) Explain heap and stack organization. (10marks)
Q.4 A) Explain working of LEX and YACC. (5 marks)

Self-Assessment


GATE Questions:

1. Some code optimizations are carried out on the intermediate code because
(A) they enhance the portability of the compiler to other target processors
(B) program analysis is more accurate on intermediate code than on machine code
(C) the information from dataflow analysis cannot otherwise be used for optimization
(D) the information from the front end cannot otherwise be used for optimization
Answer: (A)

2.In a simplified computer the instructions are:

The computer has only two registers, and OP is either ADD or SUB. Consider the following

basic block:

Assume that all operands are initially in memory. The final value of the computation should be
in memory. What is the minimum number of MOV instructions in the code generated for this
basic block?
(A) 2
(B) 3
(C) 5
(D) 6


Answer: (B)

3. Consider the following C code segment.

for (i = 0; i < n; i++)


{
for (j=0; j<n; j++)
{
if (i%2)
{
x += (4*j + 5*i);
y += (7 + 4*j);
}
}
}

Which one of the following is false?


(A) The code contains loop invariant computation
(B) There is scope of common sub-expression elimination in this code
(C) There is scope of strength reduction in this code
(D) There is scope of dead code elimination in this code
Answer: (D)

4. Consider the grammar rule E → E1 - E2 for arithmetic expressions. The code generated is
targeted to a CPU having a single user register. The subtraction operation requires the first
operand to be in the register. If E1 and E2 do not have any common sub expression, in order to
get the shortest possible code
(A) E1 should be evaluated first
(B) E2 should be evaluated first
(C) Evaluation of E1 and E2 should necessarily be interleaved


(D) Order of evaluation of E1 and E2 is of no consequence


Answer: (B)

Self-Evaluation

Name of Student:
Class:
Roll No.:
Subject:
Module No.:

S.No. Question (tick your choice)
7. Do you understand the role of Intermediate Code Generator in Compiler Design? o Yes o No
8. Are you able to list and identify different types of three address code? o Yes o No
9. Are you able to write different ways to represent three address code? o Yes o No
10. Are you able to list different issues in code generation? o Yes o No
11. Do you understand the module? o Yes o No
