System Programming Concept


System Programming

Concept
An understanding towards a compiler

Mr. Biswaranjan Mishra


B.Tech, M.Tech, PhD(Cont.), CCNA, MCP, MCTS
All India Institute of Medical Sciences, Odisha

Mr. Bijay Ku Paikaray


MCA, M.Tech, PhD(Cont.)
Centurion University of Technology and Management, Odisha
CONTENTS
Preface IX
Acknowledgments X

Chapter 1 Introduction to Software 1


1.1 Introduction to Software 2
1.2 Characteristics of Software 2
1.3 Basic Hierarchy of Software 2
1.3.1 Application Software 2
1.3.2 Utility Software 3
1.3.3 System Software 3
1.3.4 Computer Hardware 4
1.4 Difference between Software and its Development 4
Exercise 7

Chapter 2 System Programming 8


2.1 Introduction to System Programming 9
2.2 Machine Structure of System Program 10
2.3 Types of Computer Architecture 10
2.3.1. Von Neumann Architecture 10
2.3.2 Harvard Computer Architecture 11
2.4 Difference between Von Neumann and Harvard Computer Architecture 12
2.5 Interfaces in System Programming 12
2.6 Address Space in System Programming 13
2.7 Life Cycle of a Source Program 14
2.8 System Software Development 15
2.9 Recent trends in Software Development 16
2.10 Levels of System Software 17
Exercise 18

Chapter 3 Computer Language 19


3.1 Introduction to Computer Language 20
3.2 Classification of Computer Languages 20
3.2.1 Low Level Languages 20

3.2.2 High Level Languages 22


3.2.3 Advanced High Level Language 23
3.3 What is Compiler? 23
3.4 What is Interpreter? 24
3.5 Different Language Expressions 24
3.6 Language Processing Activity 26
3.6.1 Program Generation Activities 26
3.6.2 Program Execution Activities 27
3.7 Symbol tables 27
3.7.1 Symbol table entry formats 28
3.8 Phases and Passes of Compiler (Toy Compiler) 28
3.8.1 Language Processor Pass 28
3.8.2 Phases of Language Processor 29
Exercise 33

Chapter 4 Introduction of Assembler 34


4.1 Introduction of Assembler 35
4.2 Types of Assembly Statements 35
4.3 Working of Pass-1 38
4.4 Working of Pass-2 40
4.5 Elements of Assembly Language Programming 41
4.5.1 Statement Format 42
4.5.2 Analysis Phase 42
4.5.3 Synthesis Phase 43
4.6 Types of Pass Structure 43
4.6.1 Two pass translation 43
4.6.2 Single pass translation 44
4.7 Forward References Solved Using Back-Patching 44
4.8 Advanced Assembler Directives 44
4.9 Design of Two-pass Assembler 47
4.9.1 Algorithm for Pass I 48
4.9.2 Intermediate Code Forms 49
4.9.3 Intermediate Code for Imperative Statement 50
4.9.4 Comparison of the variants 52
4.9.5 Algorithm for Pass - II 52
4.10 Error reporting of assembler 53
4.10.1 Error reporting in pass I 53
4.10.2 Error reporting in pass II 54
4.11 Algorithm of the Single-Pass Assembler 55
4.11.1 Intermediate Representation 59
Exercise 60

Chapter 5 Introduction to Linker and Loader 61


5.1 Introduction to Linker 62
5.1.1 Static Linking 62
5.1.2 Dynamic linking 62
5.2 Execution Phase 63
5.2.1 Linking Process 64
5.3 Design of a Linker 64
5.3.1 Relocation 64
5.3.2 Linking 65
5.4 Relocation of Linking Concept 66
5.4.1 Performing Relocation 66
5.4.2 Self-Relocating Programs 67
5.4.3 Linking in MS-DOS 67
5.5 Linking of Overlay Structured Programs 68
5.6 Introduction about Loader 69
5.7 Different Loading Schemes 69
5.7.1 Compile-and-Go Loaders 69
5.7.2 General Loader 70
5.7.3 Absolute Loaders 70
5.7.4 Relocating Loaders 71
5.7.5 Practical Relocating Loaders 72
5.7.6 Linking Loaders 73
5.7.7 Relocating Linking Loaders 73
Exercise 74

Chapter 6 Macro Processors 75


6.1 Introduction to Macro 76
6.2 Macro Processors 76
6.2.1 Macro Processor Operation 76
6.2.2 Salient Features of Macro Processor 77
6.3 Macro Definition and Call 77
6.3.1 Macro Expansion 78
6.4 Difference between Macro and Subroutine 79
6.5 Types of formal parameters 80
6.5.1 Positional Parameters 80
6.5.2 Keyword Parameters 80
6.5.3 Specifying Default Values of Parameters 80
6.5.4 Macros with Mixed Parameter Lists 80
6.6 Advanced Macro Facilities 81
6.7 Design of Macro Pre-processor 83
6.8 Design of Macro Assembler 85
6.9 Functions of Macro Processor 86
6.9.1 Basic Tasks of Macro Processor 86
6.10 Design Issues of Macro Processor 87
6.10.1 Design Features of Macro Processor 88
6.10.2 Macro Processor Design Options 88
6.11 One-pass Macro Processors 89
6.12 Design of Two-pass Macro Pre-processor 90
Exercise 92

Chapter 7 Introduction to Compiler 93


7. 1 Introduction to Compiler 94
7.2 Binding and Binding Times 95
7.2.1 Introduction to Binding 95
7.2.2 Importance of Binding Times 95
7.3 Memory Allocation 95
7.3.1 Static Memory Allocation 96
7.3.2 Dynamic Memory Allocation 96
7.3.3 Memory Allocation in Block-Structured Language 96
7.3.4 Dynamic Pointer 98
7.3.5 Static pointer 98
7.4 Compilation of Expression 99
7.4.1 Operand Descriptor 99
7.4.2 Register Descriptors 100
7.5 Intermediate Code for the Expression 100
7.5.1 Quadruple Representation 101
7.5.2 Triples Representation 101
7.5.3 Indirect Triples Representation 102
7.6 Code Optimization 102
7.7 Overview of Interpretation 104
7.7.1 Comparison between Compilers and Interpreters 104
7.7.2 Comparing the Performance of Compilers and Interpreters 105
7.7.3 Benefits of Interpretation 106
Exercise 107

Chapter 8 Programming Language Grammars 108


8.1 Programming Language 109

8.2 Programming Language Grammars 109


8.3 Classification of Grammar 111
8.4 Operator Grammars 112
8.4.1 Ambiguous Grammar 112
8.4.2 Scanning 113
8.5 Parsing 113
8.5.1 Bottom-Up Parsing 116
8.5.2 Shift Reduce Parsing 116
8.5.3 Operator Precedence Parsing 117
8.6 Operator Precedence Parsing Algorithm using Stack 119
8.7 Language Processor Development Tools 121
Exercise 123

Chapter 9 Systems Development 124

9.1 Introduction to Systems Development 125


9.2 Java Language Environment 125

9.2.1 Java Virtual Machine 127


9.3 Types of Errors 128
9.3.1 Syntax Error 128
9.3.2 Semantic Error 128
9.3.3 Logical Error 128
9.4 Debugging Procedures 129
9.4.1 Types of debugging procedures 129
9.5 Classification of Debuggers 130
9.5.1 Static Debugging 130
9.5.2 Dynamic/Interactive Debugger 130
Exercise 132

Miscellaneous Problems 133

MCQ Questions with answers 144

Index 155
Published by Notion Press Xpress Publishing
www.notionpress.com

Copyright © Biswaranjan Mishra & Bijay Ku Paikaray 2021

All rights reserved. No part of this publication may be reproduced, stored


in a retrieval system, or transmitted in any form or by any means,
electronic, mechanical, recording or otherwise, without the prior written
permission of the author.

This book has been published with all reasonable efforts taken to make the
material error-free after the consent of the author. The author of this book is
solely responsible and liable for its content including but not limited to the
views, representations, descriptions, statements, information, opinions and
references [“Content”]. The publisher does not endorse or approve the
Content of this book or guarantee the reliability, accuracy or completeness
of the Content published herein. The publisher and the author make no
representations or warranties of any kind with respect to this book or its
contents. The author and the publisher disclaim all such representations
and warranties, including for example warranties of merchantability and
educational or medical advice for a particular purpose. In addition, the
author and the publisher do not represent or warrant that the information
accessible via this book is accurate, complete or current.

ISBN: 9781637817858
Price ₹ 295.00
First Published – 2021

PREFACE
This book is for system programmers, computer engineers, and others who
want to understand program code by learning what is going
on “under the hood” of a computer system. We aim to explain the
enduring concepts underlying all computer systems, and to show you the
concrete ways that these ideas affect the correctness, performance, and
utility of your application programs. This book aims to make Computer
Science an interesting topic for undergraduate and postgraduate students.
It is not only student-friendly but also covers the
advanced domains of Computer Science used in day-to-day practice. This book is
written from a programmer’s perspective, describing how application
programmers can use their knowledge of a system to write better
programs. Of course, learning what a system is supposed to do provides a
good first step in learning how to build one, and so this book also serves as
a valuable introduction to those who go on to implement systems hardware
and software. As there is a common syllabus for all Graduate students, the
course has been accordingly designed. The syllabus designed by
Universities for ‘System Programming’ is not only an introductory
computing course but also it focuses on new industrial skills in Computer
Science. This book does not claim to be the ultimate textbook ever written,
but sincere efforts have been made to bring it at par with the much sought-
after texts in this field. If you study and learn the concepts in this book, you
will be on your way to becoming the rare “power programmer” who
knows how things work and how to fix them when they break. Our aim is
to present the fundamental concepts in ways that you will find useful right
away. You will also be prepared to delve deeper, studying such topics as
compilers, computer architecture, operating systems, embedded systems,
and networking. Finally, we dedicate this book to all the students who
would be using this text and wish them all our best.

Happy Reading Ahead!!!

Acknowledgments
From Biswaranjan Mishra

At the outset, I offer my holy salutation onto the Lotus


Feet of the Almighty for showering love and grace upon
me to make me write this book. I convey my gratitude to
many people. Firstly, I express my deep sense of gratitude
to my entire Family who are the devotees of great guru
Nigamanada for their constant support and inspiration. I
extend my sweetest love and affection to my spouse Mrs. Swapna
Chatterjee, Assistant Teacher, School and Mass Education, who is my
guide, for her constant effort to make me restless, without whom it would
not have been possible to bring this book from inception to reality. Finally, I am
indebted to my parents, my mother Bijaya Nalini Mishra and father Benudhar
Mishra, and to my father-in-law Tarakinkar Chatterjee and mother-in-law Geetarani
Mukharjee, for their caring, guidance, love and blessings, whose
contributions to my career cannot be summed up in words.

From Bijay Ku Paikaray

The writing of this book was a massive task for me. I have
given my sincere efforts to put it from my brain to paper.
A lot of help was offered by many people, whom I need to
mention. First of all, I want to show my gratitude to my
Family for their constant support and inspiration. My
special thanks to mother Sakuntala Paikaray, father
Shyama Sundar Paikaray and my sister Susama Manjari Parida who have
always motivated me and encouraged me to explore my interests.

Last but not the least, we are thankful to Sri Saswat Chatterjee, who helped
us in preparing the manuscript of this book.

Any value-adding suggestions, comments, and corrections are welcome at


[email protected], [email protected]

Organization of the Book

The book is organized into nine chapters. A brief description of each


chapter follows:

• The first chapter introduces the readers to the different software and
components in detail.

• The second chapter presents the concept of system programming


and its hierarchy.

• Chapter three describes the computer language with its phases and
passes of the compiler.

• Chapter four is about assembler and its passes.

• Chapter five deals with the linker and loader and their usage in
program loading schemes.

• Chapter six explains the different macro pre-processor and its


design.

• Chapter seven covers the basics of compilers and their allocation in


memory.

• Chapter eight explains the basics of grammar classification and


parsing techniques.

• The last chapter nine describes the system development and its
applications.

About the Authors
Biswaranjan Mishra is working as Faculty of Computer Science at the All
India Institute of Medical Sciences, Bhubaneswar, Odisha. In addition to
taking classes there, he has served as guest faculty at various
institutions such as Centurion University of Technology and Management,
Bhubaneswar, and NIPS School of Management, Bhubaneswar. He has
completed his B.Tech and M.Tech in Computer Science and Engineering from
BPUT, and is pursuing his Ph.D. in Computer Science and Engineering at
GIET University, Odisha. He has more than a decade of teaching, research
experience and industrial expertise in many premier institutes. His area of
research interest includes Blockchain and cryptocurrency, Machine
Learning and IoT.

Bijay Ku Paikaray is working as an Instructor in the Dept. of Computer


Science and Engineering, Centurion University of Technology and
Management, Bhubaneswar, India. He has completed his MCA from
Sambalpur University and his M.Tech in Computer Science and Engineering from
Centurion University, and is pursuing his Ph.D. in Computer Science and
Engineering at Centurion University. He is teaching undergraduate and
postgraduate students for the last 9 years at Centurion University. He has
published a number of research papers in peer-reviewed International
Journals and conferences. His area of research includes High-performance
Computing, Information Security and IoT.

Chapter 1
Introduction to Software

1.1 Introduction to Software


1.2 Characteristics of Software
1.3 Basic Hierarchy of Software
1.3.1 Application Software
1.3.2 Utility Software
1.3.3 System Software
1.3.4 Computer Hardware
1.4 Difference between Software and its Development
Exercise

1.1 Introduction to Software
An instruction is an internal command or an external input received from devices such
as a mouse or keyboard. A program is a set of instructions to perform specific tasks,
and software is a collection of one or many programs for a specific purpose. In
computing terms, software is programming code executed on a computer processor;
the code may be machine-level code or code written for an operating system. Examples of software
are Photoshop, MySQL, Google Chrome, Microsoft Word, Excel, PowerPoint, etc.

1.2 Characteristics of Software


The software characteristics include performance, portability, and functionality.
Developing any software requires the understanding of the following software quality
factors:
• Operational characteristics: These include characteristics such as correctness,
usability/learnability, integrity, reliability, efficiency, security, and safety.
• Transitional characteristics: These include interoperability, reusability, and
portability.
• Revision characteristics: These are characteristics related to the 'interior quality'
of software such as efficiency, documentation, and structure. Various revision
characteristics of software are maintainability, flexibility, extensibility,
scalability, testability, and modularity.

1.3 Basic Hierarchy of Software


Broadly, computer software includes various computer programs, system libraries, and
their associated documentation. Based on the nature of the task and goal, computer
software can be classified into application software, utility software, and system
software.

1.3.1 Application Software


Application software is designed to perform special functions other than the basic
operations carried out by a computer. All such software is application-specific and
cannot be directly understood by the underlying hardware of the computer. Application
software is concerned with the solution of some problems; it uses a computer as a tool
and enables the end-user to perform specific and productive tasks. There are different
types of application software based on the range of tasks performed by the computer.
Figure 1.3: Software Hierarchy (layers from top to bottom: User, Application Software, Utility Software, System Software including the OS, and Hardware)

1.3.2 Utility Software


This software is designed for users to assist in maintenance and monitoring activities.
These include anti-virus software, firewalls, and disk utilities. They help maintain and
protect the system but do not directly interface with the hardware.

1.3.3 System Software


System software can be viewed as software that logically binds components of a computer
to work as a single unit and provides the infrastructure over which programs can operate.
It is responsible for controlling computer hardware and other resources and allows the
application software to interact with computers to perform their tasks. System software
includes operating systems, device drivers, language translators, etc. Specific
examples of system software are assemblers, linkers, loaders, macro processors, text editors, compilers,
operating systems, debugging systems, source code control systems, etc.

1.3.4 Computer Hardware
Hardware refers to the physical components of a system, that is, the parts of the
computer that we can touch. These are the primary electronic devices
used to assemble the computer. Examples of hardware in a computer are the Processor,
Memory Devices, Monitor, Printer, Keyboard, Mouse, and the Central Processing Unit.

Figure 1.4: Differentiating Hardware from Software (application software: Internet browsers, games, multimedia, spreadsheets; system software: operating system, utilities; hardware: CPU, mouse, printer)

1.4 Various Software Platforms


1. Free Software vs Open Source Software

Free Software: “Free software” means software that respects users’ freedom and
community. Roughly, it means that the users have the freedom to run, copy,
distribute, study, change, and improve the software. The term “free software” is
sometimes misunderstood; it has nothing to do with price. It is about freedom.

Open Source Software: Open Source Software is something which you can modify
as per your needs and share with others without any licensing violation burden.
When we say Open Source, the source code of the software is available publicly
under Open Source licenses like the GNU GPL, which allow you to edit the source
code and distribute it. Read these licenses and you will realize that they are
created to help us.
1. The term was coined by the development environments around software
produced by the open collaboration of software developers on the internet.
2. It was later specified by the Open Source Initiative (OSI).
3. It does not explicitly state ethical values, besides those directly
associated with software development.
2. Packaged Software vs Custom Software

Packaged Software: Packaged software, often called a software package, is a
bundle of programs that is made available to the general public and sold at a
fixed price. Packaged software is developed by system technicians. It is a
compilation of programs grouped together to provide the public with different
tools in the same suite. It cannot be changed or altered even if there is a need.
In essence, once several software programs are grouped into a package that
delivers solutions to people, it earns this name. The best-known example of
packaged software is Microsoft Office, which has many tools grouped together,
such as Access, Excel, Note, and PowerPoint.

Custom Software: Custom software is a particular program that is developed for
a goal within an organization. Its cost is higher than that of packaged software
because custom software is made for a specific purpose. Unlike bundled software,
a custom software program can be changed or modified if there is a need, because
the software is custom-built. A good example of custom software is an
organization that needs its own management system for its employees and for
keeping track of their working hours. Whenever a task is carried out for a
specific requirement, for instance designing a calculator in the C++ language,
the result is a custom product.

3. Generic Software Development vs Custom Software Development

Generic Software Development: Generic software development is an approach in
which developers build a software product for all kinds of business needs that
have an effective demand in the marketplace over a period of time. Software
development companies develop generic software on their own and market it to a
group of clients having a similar need.

Custom Software Development: Custom software development is a mechanism by
which a company develops a product for an individual client. The individual
client can be a corporation or a group of persons. This product in most cases
addresses a distinct need in the market, often only for a limited time, and
serves specialized business needs. Software development businesses develop
custom software at the cost of specific customers.
4. Traditional Software Development vs Agile Software Development

Traditional Software Development: Traditional software development is the
process used to design and develop simple software. It is used when security
and many other factors of the software are not very important, and it is often
used by freshers to develop software. It consists of five phases:
1. Requirements analysis
2. Design
3. Implementation
4. Coding and testing
5. Maintenance

Agile Software Development: Agile software development is the process used to
design complicated software. It is used when the software is quite sensitive
and complex, and when safety is very important. It is used by specialists to
develop the software. It consists of three phases:
1. Project initiation
2. Sprint planning
3. Demos

Exercise

Short Questions

1. What is a kernel?
2. Can application software without an operating system be installed and why?
3. What are the phases of Traditional Software Development?
4. What are the phases of Agile Software Development?
5. What are the tasks of an operating system?

Long Questions

1. How do Traditional Software Development and Agile Software Development


differ from each other?
2. How do Generic Software Development and Custom Software Development
differ from each other?
3. How do Packaged Software and Custom Software differ from each other?
4. How do Free Software and Open Source Software differ from each other?
5. What are the characteristics of the software?

Chapter 2
System Programming

2.1 Introduction to System Programming


2.2 Machine Structure of System Program
2.3 Types of Computer Architecture
2.3.1 Von Neumann Architecture
2.3.2 Harvard Computer Architecture
2.4 Difference between Von Neumann and Harvard Computer Architecture
2.5 Interfaces in System Programming
2.6 Address Space in System Programming
2.7 Life Cycle of a Source Program
2.8 System Software Development
2.9 Recent trends in Software Development
2.10 Levels of System Software
Exercise

2.1 Introduction to System Programming
System programming is characterized by the fact that it is aimed at producing system
software that provides services to the computer hardware or specialized system
services. Many a time, system programming directly deals with the peripheral devices
with a focus on input, process (storage), and output.

The essential characteristics of system programming are as follows:

• Programmers are expected to know the hardware and internal behavior of the
computer system on which the program will run. System programmers
explore these known hardware properties and write software for specific
hardware using efficient algorithms.
• Uses a low level programming language or some programming dialect.
• Uses little runtime overhead and can execute in a resource-constrained
environment.
• These are very efficient programs with small or no runtime library
requirements.
• Has access to system resources, including memory
• Can be written in assembly language

The following are the limiting factors of system programming:

• Many times, system programs cannot be run in debugging mode.


• Only limited programming facilities are available, which demands high skill
from the system programmer.
• Less powerful runtime library (if available at all), with less error-checking
capabilities.

2.2 Machine Structure of System Program
A generic computer system comprises hardware components, a collection of system
programs, and a set of application programs.

Figure 2.1: Machine Structure of a System Program (application programs such as hospital management, banking, and web browser systems sit above system programs such as compilers, editors, command interpreters, and the operating system, which in turn sit above the machine language, microarchitecture, and physical devices of the hardware)

2.3 Types of Computer Architecture

2.3.1 Von Neumann Architecture


The important parts of this architecture include the following:
• Processing unit: It contains a set of working registers for the processor and an
Arithmetic Logic Unit (ALU).
• Control unit: It encompasses control mechanism to carry out functions and
includes program counter and instruction register.
• Memory: It stores instructions and data used for applications, system
processing, I/O mechanisms and address mapping for external mass storage.

Figure 2.2.1: Von Neumann Architecture (a single memory connects to the ALU with its Accumulator and to the Control Unit, which communicate with the Input and Output units)

The traditional Von Neumann architecture describes a stored-program computer, which
does not allow the fetching of instructions and data operations to occur at the
same time, because the architecture uses a commonly shared bus to access both.
This limits the performance of the system. The structure of a typical Von
Neumann machine is shown in Figure 2.2.1.

2.3.2 Harvard Computer Architecture


The characteristics of the Harvard computer architecture are as follows:
• The Harvard architecture is a stored-program computer system which has
separate sets of address and data buses to read and write data to the
memory and also to fetch instructions.
• Basically, the Harvard architecture has physically distinct storage and signal
paths for access to data and instructions, respectively.
• In the Harvard architecture, the two memories need not share characteristics.
• The structure and width of the word, timing characteristics, mechanism of
implementation, and structure of addresses can vary. While program
instructions reside in read-only memory, the program data often needs read-
write memory.
• The instruction memory requirements can be larger; in fact, some systems
have much more instruction memory than data memory. Therefore, the width
of instruction addresses happens to be wider than that of data addresses.

Figure 2.2.2: Harvard Computer Architecture (the ALU and Control Unit connect to separate Data Memory and Instruction Memory, along with the I/O units)

2.4 Difference between Von Neumann and Harvard Computer Architecture

Von Neumann Architecture:
(i) Does not allow fetching of data operations and instructions to occur at the same time.
(ii) Uses a common shared bus to access both instructions and data.
(iii) Shared bus usage for instruction and data memory results in a performance bottleneck.
(iv) Has a limited transfer rate between the CPU and memory.

Harvard Computer Architecture:
(i) The CPU can read a program instruction and access data from memory simultaneously.
(ii) Instruction fetches and data accesses use separate pathways.
(iii) Faster for a given circuit complexity.
(iv) Uses separate address spaces for code and data.

2.5 Interfaces in System Programming


An interface is defined as a border or an entry point across which distinct components
of a digital computing system interchange data and information. There are three types
of interfaces - software, hardware, and user interfaces.

1) Software Interface
• A software interface comprises a set of statements, predefined functions, user
options, and other methods of conveying instructions and data, provided by a
programming language for programmers.
• Access to resources including CPU, memory and storage, etc., is facilitated by
software interfaces for the underlying computer system.
• While programming, the interface between software components makes use of
program and language facilities such as constants, various data types, libraries
and procedures, specifications for exception, and method handling.
• The operating system provides the interface that allows access to the system
resources from applications. This interface is called Application Programming
Interface (API). These APIs contain the collection of functions, definitions for
type, and constants, and also include some variable definitions. While
developing software applications, the APIs can be used to access and
implement functionalities.
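
As a small illustration (a hedged sketch assuming a POSIX-like operating system), the C program below requests output through the OS API instead of driving any hardware directly; write() is the POSIX system-call wrapper:

#include <string.h>
#include <unistd.h>

int main(void) {
    const char *msg = "Hello via the OS API\n";
    /* The application does not touch the display hardware itself; it asks
       the OS, through its API, to write to file descriptor 1 (stdout). */
    write(1, msg, strlen(msg));
    return 0;
}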

2) Hardware Interface
• Hardware interfaces are primarily designed to exchange data and information
among various hardware components of the system, including internal and
external devices.
• This type of interface is seen between buses, across storage devices, and other
I/O and peripheral devices.
• A hardware interface provides access to electrical, mechanical, and logical
signals and implements signaling protocols for reading and sequencing them.
• These hardware interfaces may be designed to support either parallel or serial
data transfer or both. Hardware interfaces with parallel implementations allow
more than one connection to carry data simultaneously, while serial allows
data to be sent one bit at a time.
• One of the popular standard interfaces is Small Computer System Interface
(SCSI) that defines the standards for physically connecting and
communicating data between peripherals and computers.

3) User Interface
• User interface allows interaction between a computer and a user by providing
various modalities of interaction including graphics, sound, position,
movement, etc. These interfaces facilitate transfer of data between the user and
the computing system. User interface is very important for all systems that
require user inputs.

2.6 Address Space in System Programming
The amount of space allocated for all possible addresses for data and other
computational entities is called address space. The address space is governed by the
architecture and managed by the operating system. Computational entities such as
a server, a networked computer, or a file are all addressed within this space.

There are two types of address space namely,

1. Physical Address Space


Physical address space is the collection of all physical addresses produced by a
computer program and provided by the hardware. Every machine has its own
physical address space, with a valid address range between 0 and some
maximum limit supported by the machine.

2. Logical Address Space


The logical address space is generated by the CPU or provided by the OS kernel. It is
also sometimes called the virtual address space. There is one virtual
address space per process, which may or may not start at zero and extend to the
highest address.
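
The short C program below is a hedged illustration of this idea, assuming an OS with virtual memory: every address it prints belongs to the process's own logical (virtual) address space, and two different processes may print the same logical address while occupying different physical memory.

#include <stdio.h>

int global_var;                  /* placed in the process's data area */

int main(void) {
    int local_var;               /* placed on the process's stack */
    /* %p prints logical (virtual) addresses, not physical ones */
    printf("logical address of global_var: %p\n", (void *)&global_var);
    printf("logical address of local_var : %p\n", (void *)&local_var);
    return 0;
}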

2.7 Life Cycle of a Source Program


The life cycle of a source program defines the program behaviour and extends through
the execution stage, which exhibits the behaviour specified in the program.
program goes through a life cycle of several stages.
• Edit time: It’s the phase where editing of the program code takes place and is
also known as design time. At this stage, the code is in its raw form and may not
be in a consistent state.
• Compile time: At the compile time stage, the source code after editing is passed
to a translator that translates it into machine code. One such translator is a
compiler. This stage checks the program for inconsistencies and errors and
produces an executable file.
• Distribution time: It is the stage that sends or distributes the program from the
entity creating it to an entity invoking it. Mostly executable files are distributed.
• Installation time: Typically, a program goes through the installation process,
which makes it ready for execution within the system. The installation can also
optionally generate calls to other stages of a program's lifecycle.
• Link time: At this stage, the specific implementation of the interface is linked
and associated with the program invoking it. System libraries are linked by
looking up the name and the interface of the library needed, either during compile
time or during installation time, or when invoked at start-up or even
during the execution process.
• Load time: This stage takes the executable image from its stored
repository and places it into active memory to initiate the execution. Load
time activities are influenced by the underlying operating system.
• Run time: This is the final stage of the life cycle in which the programmed
behavior of the source program is demonstrated.

Figure 2.6: Life Cycle of a Source Program (at compile time an assembler or compiler turns the source program into object code; the linkage editor combines object modules into a load module; at load time the loader, using system libraries, and at execution time dynamic system libraries, produce the binary memory image)

2.8 System Software Development
Software development process follows the Software Development Life Cycle (SDLC),
which has each step doing a specific activity till the final software is built. The system
software development process also follows all the stages of SDLC, which are as
follows:
• Preliminary investigation: It determines what problems need to be fixed by the
system software being developed and what would be the better way of solving those
problems.
• System analysis: It investigates the problem on a large scale and gathers all the
information. It identifies the execution environment and interfaces required by
the software to be built.
• System design: This is concerned with designing the blueprint of system
software that specifies how the system software looks like and how it will
perform.
• System tool acquisition: It decides and works around the software tools to
develop the functionalities of the system software.
• Implementation: It builds the software using the software tools with all the
functionality, interfaces, and support for the execution. This may be very specific
as the system software adheres to the architecture. Operating system support is
sought for allocations and other related matters.
• System maintenance: Once the system software is ready, it is installed and used.
Maintenance includes timely updates to the software that is already installed.

2.9 Recent trends in Software Development


Our era is witnessing fast digital transformation across domains including
manufacturing, business, healthcare, and entertainment. In day-to-day life, human
beings use Artificial Intelligence (AI) programs such as Google predictive search,
Gmail features, and many more. Latest trends in program development from the coding
perspective include the following:
• Use of preprocessors against full software stack
• JavaScript MV* frameworks rather than plain JavaScript files
• CSS frameworks against generic cascading styles sheets
• SVG with JavaScript on Canvas in competition with Flash
• Gaming frameworks against native game development
• Single-page Web apps against websites
• Mobile Web apps against native apps

• Android against iOS
• Moving towards GPU from CPU
• Renting against buying (Cloud Services)
• Web interfaces but not IDEs
• Agile development

2.10 Levels of System Software


Various levels of system software are used with modern computer systems. A source
program submitted by the programmer is processed by them during its life cycle.

Figure 2.10: Levels of System Software (system software comprises operating systems; utility programs such as disk defragmenters, firewalls, and debuggers; library programs; and language translators. A pre-processor yields a modified source program, compilers and interpreters translate it, assemblers produce relocatable machine code, and linkers and loaders yield target machine code)


Exercise
Short Questions

1. What is system programming?


2. Write short notes on the limiting factors of system programming.
3. Write short notes on the Harvard computer architecture.
4. Differentiate between physical address space and logical address space.
5. Write about the levels of system software.

Long Questions

1. Explain the different types of computer architecture.


2. What is the machine structure of a system program? Write about interfaces in system
programming.
3. Differentiate between Harvard and Von Neumann computer architecture.
4. Explain about the life cycle of a source program.
5. Write on recent trends in software development.

Chapter 3
Computer Language

3.1 Introduction to Computer Language


3.2 Classification of Computer Languages
3.2.1 Low Level Languages
3.2.2 High Level Languages
3.2.3 Advanced High Level Language
3.3 What is Compiler?
3.4 What is Interpreter?
3.5 Different Language Expressions
3.6 Language Processing Activity
3.6.1 Program Generation Activities
3.6.2 Program Execution Activities
3.7 Symbol tables
3.7.1 Symbol table entry formats
3.8 Phases and Passes of Compiler (Toy Compiler)
3.8.1 Language Processor Pass
3.8.2 Phases of Language Processor
Exercise

3.1 Introduction to Computer Language
A computer needs language to communicate across its components and devices and
carry out instructions. A computer acts on a specific sequence of instructions written
by a programmer for a specific job. This sequence of instructions is known as a
program.

There are mainly three classes of language with the help of which we can develop
computer programs: Low Level Languages, High Level Languages, and
Advanced High Level Languages.

Figure 3.1: Computer Languages (computer languages are classified into Low Level Languages; High Level Languages, comprising problem-oriented and procedure-oriented languages; and Advanced High Level Languages, comprising Fourth Generation Languages (4GL) and Fifth Generation Languages (5GL))

3.2 Classification of Computer Languages

3.2.1 Low Level Languages


Characteristically, low-level languages represent languages that can be directly
understood or are very close to machines. They can be further classified into machine
language and assembly language.

1) Machine Language
Machine language is purely machine-dependent and is written using binary code.
It is also known as machine code. A machine
instruction encompasses a function (Opcode) part and an operand (Address) part. Since a
computer understands machine code directly, programs written in machine language
can be executed immediately without the requirement of any language translator.
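
As a sketch of the Opcode/Address split, assume a hypothetical instruction word with an 8-bit opcode field and a 16-bit address field (the field widths are illustrative, not taken from the text); the C fragment below packs and unpacks such an instruction:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical instruction word: 8-bit opcode, 16-bit operand address */
#define MAKE_INSTR(op, addr) (((uint32_t)(op) << 16) | ((uint32_t)(addr) & 0xFFFFu))
#define OPCODE(instr)        (((instr) >> 16) & 0xFFu)
#define ADDRESS(instr)       ((instr) & 0xFFFFu)

int main(void) {
    uint32_t instr = MAKE_INSTR(0x01, 2000); /* e.g., opcode 01 acting on address 2000 */
    printf("opcode  = %u\n", (unsigned)OPCODE(instr));   /* prints 1 */
    printf("address = %u\n", (unsigned)ADDRESS(instr));  /* prints 2000 */
    return 0;
}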

The disadvantages of machine language are as follows:


• Programs based on machine language are difficult to understand as well as
develop.
• A pure machine-oriented language is difficult to remember and recall, as all the
instructions and data to the computer are fed in numerical form.
• Knowledge of the computer's internal architecture and codes is a must for programming.
• Writing code using machine language is time consuming, cumbersome, and
complicated.
• Debugging of programs is difficult.

2) Assembly Language
This is a type of low-level programming language that uses symbolic codes or
mnemonics as instructions. Some examples of mnemonics include ADD, SUB, LDA,
and STA, which stand for addition, subtraction, load accumulator, and store
accumulator, respectively. A program written in this language is known as an
assembly language program, and it is processed by a language translator called an
assembler, which plays the vital role of translating assembly language code into
machine code. An assembly language program must be translated into the
equivalent machine language code (binary code) before execution.

The advantages of assembly language are as follows:


• Due to use of symbolic codes (mnemonics), an assembly program can be
written faster.
• It makes the programmer free from the burden of remembering the operation
codes and addresses of memory location.
• It is easier to debug.

The disadvantages of assembly language are as follows:

• Familiarity with machine architecture required for machine-oriented language.


• Understanding of the available instruction set is required for this machine-oriented
language.
• The execution time of an assembly language program is more than
that of a machine language program, as a separate language translator
program is required to translate the assembly program into binary machine code.

3.2.2 High Level Languages


High level languages (HLL) were developed to overcome large time consumption
and cost in developing machine and assembly languages. HLLs are much closer to
English-like language. A separate language translator is required for translation of
HLL computer programs into machine readable object code. Some of the
distinguished features of HLL include interactive, support of a variety of data
types, a rich set of operators, flexible control structures, readability, modularity, file
handling, and memory management.

The advantages of high level languages are as follows:


• It is machine-independent.
• It can be used both as problem- and procedure-oriented.
• It follows simple English-like structure for program coding.
• It does not necessitate extensive knowledge of computer architectures.
• Writing code using HLL consumes less time.
• Debugging and program maintenance are easier in HLL.

The disadvantage of high level languages is as follows:


• It requires language translators for converting instructions from high level
language into low level object code.

1) Problem-Oriented Language
Problem-oriented language is a programming model derived from
structured programming procedures. A procedure, also known as a function,
routine, or subroutine, is a series of computational steps to be carried out.
During a program's execution, any given procedure might be called at any point,
including by other procedures or by itself. Languages used as problem-oriented
languages include FORTRAN, COBOL, ALGOL, Pascal, BASIC and C.

2) Procedure-Oriented Language

This language is based upon
the concept of objects, where objects contain data in the form of attributes and
code in the form of methods. In object-oriented programming, computer programs
are designed using the concept of objects that interact with the real world. Object-
oriented programming languages are various, but the most popular ones are class-
based, meaning that objects are instances of classes, which also determine their
types. Languages used in this category: Java, C#, C++, PHP,
Python, Ruby, JavaScript, Perl, Objective-C, Dart, Swift, Scala.

3.2.3 Advanced High Level Language


1) Fourth Generation Programming Languages
Fourth generation programming languages are designed to achieve a specific goal
(such as to develop commercial business applications). 4GLs followed third-
generation programming languages (which were already very user friendly). 4GLs
surpassed 3GLs in user-friendliness and in their higher level of abstraction. This is
achieved through the use of words (or phrases) that are very close to English
language, and sometimes using graphical constructs such as icons, interfaces and
symbols. By designing the languages according to the needs of the domains, it
makes it very efficient to program in 4GL. Furthermore, 4GL rapidly expanded the
number of professionals who engage in application development. Many fourth
generation programming languages are targeted towards processing data and
handling databases, and are based on SQL.

2) Fifth Generation Programming Languages


Fifth generation programming languages (which followed 4GL) are programming
languages that allow programmers to solve problems by defining certain
constraints as opposed to writing an algorithm. This means that 5GL can be used
to solve problems without a programmer. Because of this reason, 5GL are used in
AI (Artificial Intelligence) research. Many constraint-based languages, logic
programming languages and some of the declarative languages are identified as
5GL. Prolog and Lisp are the most widely used 5GL for AI applications. In the
early 90’s when the 5GL came out, it was believed they would become the future
of programming. However, after realizing that the most crucial step (defining
constraints) still needs human intervention, the initial high expectations were
lowered.

3.3 What is Compiler?


A compiler is a computer program that transforms code written in a high-level
programming language into the machine code. It is a program which translates
the human-readable code to a language a computer processor understands (binary

1 and 0 bits). The computer processes the machine code to perform the
corresponding tasks.

Role of Compiler
• A compiler reads the source code and outputs executable code.
• It translates software written in a higher-level language into instructions that the
computer can understand. It converts the text that a programmer writes into a
format the CPU can understand.
• The process of compilation is relatively complicated. It spends a lot of time
analyzing and processing the program.
• The executable result is some form of machine-specific binary code.

3.4 What is Interpreter?


An interpreter is a computer program which converts each high-level program
statement into the machine code. This includes source code, pre-compiled code,
and scripts. Both compilers and interpreters do the same job, which is converting a
higher-level programming language to machine code. However, a compiler
converts the code into machine code (creating an executable) before the program
runs, whereas an interpreter converts code into machine code while the program is run.

Role of Interpreter
• The interpreter converts the source code line by line at run time.
• The interpreter completely translates a program written in a high-level language
into machine-level language.
• The interpreter allows evaluation and modification of the program while it is
executing.
• Relatively less time is spent on analyzing and processing the program.
• Program execution is relatively slow compared to a compiler.

3.5 Different Language Expressions


• Semantic: It represents the rules of the meaning of the domain.
• Semantic gap: It represents the difference between the semantic of two
domains.
• Application domain: The designer expresses the ideas in terms related to
application domain of the software.
• Execution domain: To implement the ideas of designer, their description has
to be interpreted in terms related to the execution domain of computer system.
• Specification gap: The gap between the application domain and the PL
(programming language) domain is called the specification-and-design gap, or simply
the specification gap. It is the semantic gap between two specifications of the same task.

[Diagram: the specification gap lies between the application domain and the PL domain; the execution gap lies between the PL domain and the execution domain.]

• Execution gap: The gap between the semantics of programs written in different
programming languages.
• Language processor: Language processor is software which bridges a
specification or execution gap.
• Language translator: Language translator bridges an execution gap to the
machine language of a computer system.
• Detranslator: It bridges the same execution gap as language translator, but in
the reverse direction.
• Preprocessor: It is a language processor which bridges an execution gap but is
not a language translator.
• Language migrator: It bridges the specification gap between two
programming languages.
• Interpreter: An interpreter is a language processor which bridges an execution
gap without generating a machine language program.
• Source language: The program which forms the input to a language processor
is a source program. The language in which the source program is written is
known as the source language.
• Target language: The output of a language processor is known as the target
program. The language to which the target program belongs is called the target
language.

• Forward Reference: A forward reference of a program entity is a reference to
the entity in some statement of the program that occurs before the statement
containing the definition or declaration of the entity.
• Language processor pass: A Language processor pass is the processing of
every statement in a source program, or in its equivalent representation, to
perform a language processing function (a set of language processing
functions).
• Intermediate representation (IR): An intermediate representation is a
representation of a source program which reflects the effect of some, but not
all analysis and synthesis functions performed during language processing.
An intermediate representation should have the following three properties:
1. Ease of use: It should be easy to construct the intermediate representation
and analyze it.
2. Processing efficiency: Efficient algorithms should be available for accessing
the data structures used in the intermediate representation.
3. Memory efficiency: The intermediate representation should be compact so
that it does not occupy much memory.

3.6 Language Processing Activity


There are mainly two types of language processing activities which bridge the
semantic gap between the source language and the target language.

3.6.1 Program Generation Activities


A program generation activity aims at the automatic generation of a program.
A program generator is software which accepts the specification of a program and
generates a program in the target language. The program generator introduces a new
domain between the application and programming language domains, called the
program generator domain.

Figure 3.6.1: Program Generation Activities (a program specification is fed to the program generator, which reports errors and produces the target program)

3.6.2 Program Execution Activities
Two popular models for program execution are translation and interpretation.

1) Translation
The program translation model bridges the execution gap by translating a
program written in a programming language, called the source program, into an
equivalent program in the machine or assembly language of the computer system,
called the target program.

Figure 3.6.2: Program Execution Activities (the translator converts the source program into a machine-language target program, reporting errors; the target program then executes on its data)

2) Interpretation
The interpreter reads the source program and stores it in its memory. The CPU
uses the program counter (PC) to note the address of the next instruction to be
executed. Each statement is subjected to the interpretation cycle, which
consists of the following steps:

1. Fetch the instruction


2. Analyze the statement and determine its meaning, the computation to be
performed and its operand.
3. Execute the meaning of the statement.
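
A minimal C sketch of this fetch-analyze-execute cycle, using a toy instruction set invented here purely for illustration (an accumulator machine with ADD, SUB, and HALT):

#include <stdio.h>

enum op { ADD, SUB, HALT };
struct instr { enum op code; int operand; };

int main(void) {
    /* a tiny stored "program", already broken into statements */
    struct instr program[] = { {ADD, 5}, {SUB, 2}, {HALT, 0} };
    int acc = 0, pc = 0;                   /* accumulator and program counter */

    for (;;) {
        struct instr st = program[pc++];   /* 1. fetch the statement */
        switch (st.code) {                 /* 2. analyze: determine its meaning */
        case ADD:  acc += st.operand; break;   /* 3. execute the meaning */
        case SUB:  acc -= st.operand; break;
        case HALT: printf("acc = %d\n", acc); return 0;
        }
    }
}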

3.7 Symbol tables


An identifier used in the source program is called a symbol. Thus, names of
variables, functions and procedures are symbols. A language processor uses the
symbol table to maintain the information about attributes of symbols used in a
source program.
It performs the following four kinds of operations on the symbol table:

1. Add a symbol and its attributes: Make a new entry in the symbol table.
2. Locate a symbol’s entry: Find a symbol’s entry in the symbol table.
3. Delete a symbol’s entry: Remove the symbol’s information from
the table.
4. Access a symbol’s entry: Access the entry and set, modify or copy its attribute
information.

The symbol table consists of a set of entries organized in memory. Two kinds of data
structures can be used for organizing its entries:
1. Linear data structure: Entries in the symbol table occupy adjoining areas of
memory. This property is used to facilitate search.
2. Nonlinear data structure: Entries in the symbol table do not occupy
contiguous areas of memory. The entries are searched and accessed using
pointers.
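
A minimal C sketch of a linear symbol table supporting the add and locate operations listed above (the field names and sizes are illustrative assumptions):

#include <stdio.h>
#include <string.h>

struct entry {                     /* one fixed-length symbol table entry */
    char name[32];                 /* symbol field: the key for searches */
    char type[8];                  /* attribute: e.g., "int" or "real" */
    int  address;                  /* attribute: allocated memory address */
};

static struct entry table[100];    /* linear organization: adjoining memory */
static int n_entries = 0;

/* Add a symbol and its attributes: make a new entry in the table */
static void add_symbol(const char *name, const char *type, int address) {
    strcpy(table[n_entries].name, name);
    strcpy(table[n_entries].type, type);
    table[n_entries].address = address;
    n_entries++;
}

/* Locate a symbol's entry: linear search on the key field */
static struct entry *locate_symbol(const char *name) {
    for (int i = 0; i < n_entries; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;                   /* symbol not present */
}

int main(void) {
    add_symbol("i", "int", 2000);
    add_symbol("a", "real", 2001);
    struct entry *e = locate_symbol("a");
    if (e != NULL)
        printf("%s: type=%s, address=%d\n", e->name, e->type, e->address);
    return 0;
}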

3.7.1 Symbol table entry formats


Each entry in the symbol table is comprised of fields that accommodate the
attributes of one symbol. The symbol field stores the symbol to which the
entry pertains. The symbol field is the key field, which forms the basis for a search in
the table.
The following entry formats can be used for accommodating the attributes:
1. Fixed length entries: Each entry in the symbol table has fields for all attributes
specified in the programming language.
2. Variable-length entries: The entry occupied by a symbol has fields only for the
attributes specified for symbols of its class.
3. Hybrid entries: A hybrid entry has fixed-length part and a variable-length part.

3.8 Phases and Passes of Compiler (Toy Compiler)


3.8.1 Language Processor Pass
A language processor pass is the processing of every statement in a source
program to perform a language processing function.
Pass I: Perform analysis of the source program and note deduced information.
Pass II: Perform synthesis of target program.
The classic two-pass schematic of language processor is shown in the figure.

Figure 3.8.1: Language Processor Passes (the front end analyzes the source program and produces an intermediate representation, from which the back end synthesizes the target program)

The first pass performs analysis of the source program and reflects its results in the
intermediate representation. The second pass reads and analyzes the intermediate
representation to perform synthesis of the target program.

3.8.2 Phases of Language Processor


1) Lexical Analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies
the units into different lexical classes, e.g., identifiers, constants, keywords, etc., and
enters them into different tables. The most important table is the symbol table,
which contains information concerning all identifiers used in the source program. The
symbol table is built during lexical analysis. For each lexical unit, lexical analysis
builds a descriptor, called a token. We represent a token as Code #no, where Code can
be Id or Op for an identifier or operator respectively, and no indicates the entry for
the identifier or operator in the symbol or operator table.

Example-1
Consider the following code:
i: integer; a, b: real;
a = b + i;

The statement a = b + i is represented as a string of tokens:

a = b + i

Id#1 Op#1 Id#2 Op#2 Id#3

Here a, b, and i are identifiers declared in the code, so they are entered as Id#1,
Id#2, and Id#3 respectively.
The operators ‘=’ and ‘+’ are entered as Op#1 and Op#2
respectively.
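
A hedged C sketch of how a scanner could emit these Code#no tokens for the statement a = b + i; the classification below is simplified to single-character units and is only an illustration of the idea:

#include <ctype.h>
#include <stdio.h>

int main(void) {
    const char *stmt = "a = b + i";
    int n_ids = 0, n_ops = 0;          /* entry counters for the two tables */

    for (const char *p = stmt; *p != '\0'; p++) {
        if (isspace((unsigned char)*p))
            continue;                  /* skip blanks between lexical units */
        if (isalpha((unsigned char)*p))
            printf("Id#%d ", ++n_ids); /* classify as identifier (symbol table) */
        else
            printf("Op#%d ", ++n_ops); /* classify as operator (operator table) */
    }
    printf("\n");                      /* prints: Id#1 Op#1 Id#2 Op#2 Id#3 */
    return 0;
}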

2) Syntax Analysis (Parsing)


Syntax analysis processes the string of tokens to determine its grammatical
structure and builds an intermediate code that represents the structure. A tree
structure is used to represent the intermediate code.

Example-1
Consider the statement a = b + i which can be represented in a tree form as

        =
       / \
      a   +
         / \
        b   i

Here the root represents the assignment: b and i are added and the result is
assigned to a, so the two sides of the expression form the two subtrees of the tree.
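
A minimal C sketch of this tree for a = b + i, with a node type invented here for illustration; a parser would build the interior nodes bottom-up:

#include <stdio.h>

struct node {
    const char  *label;               /* operator symbol or operand name */
    struct node *left, *right;        /* children; NULL for leaf operands */
};

int main(void) {
    /* leaves: the operands a, b, i */
    struct node a = {"a", NULL, NULL}, b = {"b", NULL, NULL}, i = {"i", NULL, NULL};
    /* interior nodes: + over (b, i), and = over (a, +) as the root */
    struct node plus   = {"+", &b, &i};
    struct node assign = {"=", &a, &plus};

    printf("%s(%s, %s(%s, %s))\n",
           assign.label, assign.left->label,
           assign.right->label, plus.left->label, plus.right->label);
    return 0;                         /* prints: =(a, +(b, i)) */
}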

3) Semantic Analysis
Semantic analysis determines the meaning of a statement by applying the
semantic rules to the structure of the statement. While processing a declaration
statement, it adds information concerning the type, length and dimensionality of a
symbol to the symbol table. While processing an imperative statement, it
determines the sequence of actions that would have to be performed for
implementing the meaning of the statement and represents them in the
intermediate code.

Example-1
Considering the tree structure for the statement a = b + i:
1. If a node is an operand, then the type of the operand is added in the description
field of the operand.

2. While evaluating the expression, the type of b is real and the type of i is int, so
the type of i is converted to real, giving i*.

The analysis ends when the tree has been completely processed.

4) Intermediate Representation
The IR contains the intermediate code and the symbol table.

Example-1
The intermediate code refers to entries of the symbol table, shown below with
symbol types:

Symbol Type Length Address

1 i int

2 a real

3 b real

4 i* real

5 temp real

The intermediate code, referring to the symbol table entries, is:


1. Convert (id#1) to real, giving (id#4)
2. Add(id#4) to (id#3), giving (id#5)
3. Store (id#5) in (id#2)
5) Memory Allocation
The memory requirement of an identifier is computed from its type, length and
dimensionality and memory is allocated to it. The address of the memory area is
entered in the symbol table.

Example-1
Memory is allocated at a specific address for each symbol, e.g., address 2000 for
symbol i of type int.

Symbol Type Length Address

1 i int 2000

2 a real 2001

3 b real 2002

6) Code Generation
The synthesis phase may decide to hold the value of i* and temp in machine
registers and may generate the assembly code

Example-1
CONV_R AREG, I (convert the integer operand I to real in register AREG)
ADD_R  AREG, B (add the real operand B to AREG)
MOVEM  AREG, A (move the result from AREG to memory location A)

Exercise
Short Questions

1. What is machine language? Write the disadvantages of machine language.


2. What is an advanced high level language?
3. What is a compiler, and what is the role of a compiler?
4. What is assembly language? Write the advantages of assembly language.
5. Explain the different language expressions.
6. What are the passes of a compiler?

Long Questions

1. What is computer language? Write on classification of computer languages


2. Differentiate between low level languages and high level languages.
3. Write on different language expressions.
4. What is symbol table? Write the table entry format.
5. Write about the phases of language processor.

Chapter 4
Introduction of Assembler

4.1 Introduction of Assembler


4.2 Types of Assembly Statements
4.3 Working of Pass-1
4.4 Working of Pass-2
4.5 Elements of Assembly Language Programming
4.5.1 Statement Format
4.5.2 Analysis Phase
4.5.3 Synthesis Phase
4.6 Types of Pass Structure
4.6.1 Two pass translation
4.6.2 Single pass translation
4.7 Forward References Solved Using Back-Patching
4.8 Advanced Assembler Directives
4.9 Design of Two-pass Assembler
4.9.1 Algorithm for Pass I
4.9.2 Intermediate Code Forms
4.9.3 Intermediate code for Imperative statement
4.9.4 Comparison of the variants
4.9.5 Algorithm for Pass - II
4.10 Error reporting of assembler
4.10.1 Error reporting in pass I
4.10.2 Error reporting in pass II
4.11 Algorithm of the Single-Pass Assembler
4.11.1 Intermediate Representation
Exercise

4.1 Introduction of Assembler
An assembler is a program that generates the information needed by the loader by
converting instructions written in low-level assembly code into relocatable machine code.

Assembly Code → Assembler → Machine Code

The assembler also generates instructions by evaluating the mnemonics (symbols) in the
operation field and finding the values of symbols and literals to produce machine code.
If the assembler does all this work in one scan, it is called a single-pass assembler;
if it does it in multiple scans, it is called a multiple-pass assembler. A two-pass
assembler divides these tasks between two passes:

● Pass-1:
1. Define symbols and literals and remember them in symbol table and literal table
respectively.
2. Keep track of location counter.
3. Process pseudo-operations.
● Pass-2:
1. Generate object code by converting symbolic op-code into respective numeric
op-code
2. Generate data for literals and look up the values of symbols.
First, we will take a small assembly language program to understand the working of
the respective passes. The assembly language statement format is:

[Label] [Opcode] [Operand]

4.2 Types of Assembly Statements


1. Imperative statement
An imperative statement indicates an action to be performed during the execution of
the assembled program. Each imperative statement typically translates into one
machine instruction. These are executable statements. Some examples of imperative
statements are given below:

MOVER BREG, X
STOP

READ X
PRINT Y
ADD AREG, Z
2. Declaration Statement
Declaration statements are for reserving memory for variables.

The syntax of declaration statement is as follow:

[Label] DS <constant>

[Label] DC '<value>'

DS stands for Declare Storage; DC stands for Declare Constant. The DS statement
reserves areas of memory and associates names with them.

Example: A DS 10

The above statement reserves 10 words of memory for the variable A.

The DC statement constructs memory words containing constants.

Example: ONE DC ‘1’

The above statement associates the name ONE with a memory word containing the value
'1'.

An assembly program can use constants in two ways: as immediate operands, and as
literals. Many machines support immediate operands in machine instructions.

Example: ADD AREG, 5

But our hypothetical machine does not support immediate operands as part of the
machine instruction. It can still handle literals. A literal is an operand with the
syntax ='<value>'.

Example: ADD AREG, = ’5’

A literal differs from a constant because its location cannot be specified in the assembly program.

3. Assembler Directive
Assembler directives instruct the assembler to perform certain actions during the
assembly of the program.

1) START
This directive indicates that the first word of the target program should be placed
in the memory word with address <constant>.

START <Constant>

Example: START 500

First word of the target program is stored from memory location 500 onwards.

2) END
This directive indicates the end of the source program. The operand indicates the
address of the instruction where the execution of the program should begin.

By default it is the first instruction of the program.

END <operand>

Execution control is transferred to the label given in the operand field.

4.2.1 Design of the Assembler

Figure 4.2.1: Design of Assembler

Example-1

Label   Opcode   Operand    LC value (location counter)

JOHN    START    200
        MOVER    R1, ='3'   200
        MOVEM    R1, X      201
L1      MOVER    R2, ='2'   202
        LTORG               203
X       DS       1          204
        END                 205

Let us take a look at how this program works:

• START: This instruction starts the program from location 200, and the label of the
START statement provides the name of the program (JOHN is the name of this program).
• MOVER: It moves the content of literal (=’3′) into register operand R1.
• MOVEM: It moves the content of register into memory operand(X).
• MOVER: It again moves the content of literal (=’2′) into register operand R2 and its
label is specified as L1.
• LTORG: It assigns address to literals (current LC value).
• DS (Data Space): It assigns a data space of 1 to Symbol X.
• END: It finishes the program execution.

4.3 Working of Pass-1


Pass-1 defines the symbol and literal tables with their addresses.
Note: Literal addresses are assigned at LTORG or END.

START 200: Here no symbol or literal is found, so both tables are empty.

MOVER R1, ='3' 200: ='3' is a literal, so an entry is made in the literal table.

LITERAL   ADDRESS
='3'      –––

MOVEM R1, X 201: X is a symbol referred to prior to its declaration, so it is stored
in the symbol table with a blank address field.
SYMBOL ADDRESS

X –––

L1 MOVER R2, ='2' 202: L1 is a label and ='2' is a literal, so store them in their
respective tables.

SYMBOL   ADDRESS        LITERAL   ADDRESS
X        –––            ='3'      –––
L1       202            ='2'      –––

LTORG 203: Assign an address to the first literal, specified by the LC value, i.e. 203.

LITERAL   ADDRESS
='3'      203
='2'      –––

X DS 1 204: This is a data declaration statement, i.e. X is assigned a data space of
1. But X is a symbol that was referred to earlier (in step 3) and defined here (in
step 6). This situation is called the forward reference problem, where a variable is
referred to prior to its declaration; it can be solved by back-patching. So the
assembler will now assign X the address specified by the LC value of the current step.

SYMBOL ADDRESS

X 204

L1 202

END 205: Program finishes execution and the remaining literal will get the address
specified by LC value of END instruction. Here is the complete symbol and literal table
made by pass 1 of assembler.

SYMBOL   ADDRESS        LITERAL   ADDRESS
X        204            ='3'      203
L1       202            ='2'      205

Now tables generated by pass 1 along with their LC value will go to pass-2 of
assembler for further processing of pseudo-opcodes and machine opcodes.
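The bookkeeping traced above can be sketched in a few lines of Python. This is a simplified illustration, not the full algorithm of Section 4.9.1: statement parsing is naive, every allocated item is assumed to occupy one word, and, following the LTORG rule described later in Section 4.8, the whole current literal pool is allocated at the LTORG, so this sketch gives ='2' the address 204 and X the address 205.

# Sketch of pass-1 bookkeeping for the example program above.
program = [
    ("JOHN", "START", "200"),
    (None,   "MOVER", "R1, ='3'"),
    (None,   "MOVEM", "R1, X"),     # forward reference to X
    ("L1",   "MOVER", "R2, ='2'"),
    (None,   "LTORG", None),
    ("X",    "DS",    "1"),         # X is back-patched here
    (None,   "END",   None),
]

symtab, littab = {}, {}             # name -> address, literal -> address
lc, pending = 0, []                 # location counter, current literal pool

for label, opcode, operand in program:
    if opcode == "START":
        lc = int(operand)           # initialize the location counter
        continue
    if label:
        symtab[label] = lc          # defining occurrence: record the address
    if operand and "='" in operand:
        lit = operand.split(",")[-1].strip()
        littab.setdefault(lit, None)
        pending.append(lit)
    if opcode in ("LTORG", "END"):  # allocate the current literal pool
        for lit in pending:
            littab[lit] = lc
            lc += 1
        pending = []
        continue
    lc += 1                         # one word per remaining statement

print(symtab)   # {'L1': 202, 'X': 205}
print(littab)   # {"='3'": 203, "='2'": 204}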

4.4 Working of Pass-2


Pass-2 of assembler generates machine code by converting symbolic machine-opcodes
into their respective bit configuration (machine understandable form). It stores all
machine-opcodes in MOT table (op-code table) with symbolic code, their length and
their bit configuration. It will also process pseudo-ops and will store them in POT table
(pseudo-op table).

Various Databases required by pass-2:

1. MOT table (machine opcode table)


2. POT table (pseudo opcode table)
3. Base table (storing value of base register)
4. LC (location counter)
As a whole assembler works as:

Assembly Program → Pass-1 → IR (intermediate representation) + Symbol Table → Pass-2 → Target Program

The flowchart for pass-2 can be summarized in these steps:

1. READ a statement.
2. Search the POT; for a DS/DC statement, conclude the data space and length, and
output the constants.
3. Otherwise, search the MOT and get the length, type and binary code of the
instruction.
4. Evaluate the operands using the symbol table (ST).
5. Assemble the instruction.
6. Update the LC and repeat from step 1; exit at the end of the program.

4.5 Elements of Assembly Language Programming


An assembly language provides the following three basic facilities that simplify
programming:

1. Mnemonic operation codes: The mnemonic operation codes for machine


instructions (also called mnemonic opcodes) are easier to remember and use than
numeric operation codes. Their use also enables the assembler to detect use of
invalid operation codes in a program.
2. Symbolic operands: A programmer can associate symbolic names with data or
instructions and use these symbolic names as operands in assembly statements.
This facility frees the programmer from having to think of numeric addresses in a
program. We use the term symbolic name only in formal contexts; elsewhere we
simply say name.
3. Data declarations: Data can be declared in a variety of notations, including the
decimal notation. It avoids the need to manually specify constants in
representations that a computer can understand, for example, specify -5 as
(11111011)2 in the two's complement representation.

4.5.1 Statement Format
An assembly language statement has the following format:

[Label] <Opcode> <operand specification> [, <operand specification>.. ]

Where the notation [..] indicates that the enclosed specification is optional. If a label is
specified in a statement, it is associated as a symbolic name with the memory word
generated for the statement. If more than one memory word is generated for a
statement, the label would be associated with the first of these memory words.

< operand specification> has the following syntax:

< symbolic name> [± <displacement> ] [(<index register>)]

Thus, some possible operand forms are as follows:

• The operand AREA refers to the memory word with which the name AREA is
associated.
• The operand AREA+5 refers to the memory word that is 5 words away from the
word with the name AREA. Here '5' is the displacement or offset from AREA.
• The operand AREA(4) implies indexing the operand AREA with index register 4—
that is, the operand address is obtained by adding the contents of index register 4 to
the address of AREA.
• The operand AREA+5 (4) is a combination of the previous two specifications.
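The <operand specification> grammar above is regular, so a small parser can be written directly. The following Python sketch is hypothetical (the name parse_operand and the return shape are assumptions), and it accepts the four forms just listed.

import re

# name, optional +/- displacement, optional (index register)
OPERAND = re.compile(
    r"^(?P<name>[A-Z][A-Z0-9]*)"
    r"(?:\s*(?P<sign>[+-])\s*(?P<disp>\d+))?"
    r"(?:\s*\(\s*(?P<index>\d+)\s*\))?$"
)

def parse_operand(text):
    m = OPERAND.match(text.strip())
    if not m:
        raise ValueError("bad operand: " + text)
    disp = int(m["disp"] or 0)
    if m["sign"] == "-":
        disp = -disp
    index = int(m["index"]) if m["index"] else None
    return m["name"], disp, index

print(parse_operand("AREA"))         # ('AREA', 0, None)
print(parse_operand("AREA+5"))       # ('AREA', 5, None)
print(parse_operand("AREA(4)"))      # ('AREA', 0, 4)
print(parse_operand("AREA+5 (4)"))   # ('AREA', 5, 4)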

4.5.2 Analysis Phase


The primary function performed by the analysis phase is the building of the symbol
table. For this purpose it must determine the addresses of symbolic names. Some
addresses can be determined directly; others must be inferred, and this function is
called memory allocation. To implement memory allocation a data structure called the
location counter (LC) is used; it is initialized to the constant specified in the
START statement. We refer to the processing involved in maintaining the location
counter as LC processing.

Tasks of Analysis phase


1. Isolate the label, mnemonic opcode, and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC content>) in a new entry of symbol
table.
3. Check validity of mnemonics opcode.

4. Perform LC processing.

4.5.3 Synthesis Phase


Consider the assembly statement:

MOVER BREG, ONE

We must have following information to synthesize the machine instruction


corresponding to this statement:

1. Address of name ONE


2. Machine operation code corresponding to the mnemonic MOVER.
The first item of information depends on the source program; hence it must be
made available by the analysis phase. The second item of information does not depend
on the source program; it depends on the assembly language. Based on the above
discussion, we consider the use of two data structures during the synthesis phase:

1. Symbol table: Each entry in the symbol table has two primary fields: name and
address. This table is built by the analysis phase.
2. Mnemonics table: An entry in the mnemonics table has two primary fields-
mnemonics and opcode.
Tasks of Synthesis phase
1. Obtain machine opcode through look up in the mnemonics table.
2. Obtain address of memory operand from the symbol table.
3. Synthesize a machine instruction.
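Putting the two data structures together, the synthesis of MOVER BREG, ONE can be sketched as follows. The table contents come from the N! example later in this chapter (ONE is at address 115); the three-field 'opcode register address' output mirrors the hypothetical machine's instruction format, and the function name is illustrative.

# Mnemonics table: built into the assembler.
MNEMONICS = {"MOVER": "04", "MOVEM": "05", "ADD": "01"}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
# Symbol table: built by the analysis phase.
SYMTAB = {"ONE": 115}

def synthesize(mnemonic, reg, operand):
    opcode = MNEMONICS[mnemonic]        # task 1: mnemonics table lookup
    address = SYMTAB[operand]           # task 2: symbol table lookup
    return "%s %d %03d" % (opcode, REGISTERS[reg], address)  # task 3

print(synthesize("MOVER", "BREG", "ONE"))   # '04 2 115'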

4.6 Types of Pass Structure


4.6.1 Two pass translation
Two pass translation consists of pass I and pass II. LC processing is performed in the
first pass and symbols defined in the program are entered into the symbol table; hence
the first pass performs analysis of the source program. Two pass translation of an
assembly language program can therefore handle forward references easily. The second
pass synthesizes the target form using the address information found in the symbol
table. The first pass constructs an intermediate representation of the source program
which is used by the second pass. The IR consists of two main components: data
structures + IC (intermediate code).

4.6.2 Single pass translation
A one pass assembler requires one scan of the source program to generate machine code.
Forward references are tackled using a process called back-patching. The operand field
of an instruction containing a forward reference is left blank initially. A table of
instructions containing forward references is maintained separately, called the table
of incomplete instructions (TII). This table is used to fill in the addresses in
incomplete instructions. The addresses of the forward referenced symbols are put in
the blank fields with the help of this back-patching list.

4.7 Forward References Solved Using Back-Patching


The assembler implements the back patching technique as follows:

It builds a table of incomplete instructions (TII) to record information about


instructions whose operand fields were left blank. Each entry in this table contains a
pair of the form (instruction address, symbol) to indicate that the address of symbol
should be put in the operand field of the instruction with the address instruction
address. By the time the END statement is processed, the symbol table would contain
the addresses of all symbols defined in the source program and TII would contain
information describing all forward references. The assembler can now process each
entry in TII to complete the concerned instruction. Alternatively, entries in TII can be
processed on the fly during normal processing. In this approach, all forward references
to a symbol symbi would be processed when the statement that defines symbol symbi is
encountered. The instruction corresponding to the statement MOVER BREG, ONE
contains a forward reference to ONE. Hence the assembler leaves the second operand
field blank in the instruction that is assembled to reside in location 101 of memory, and
makes an entry (101, ONE) in the table of incomplete instructions (TII). While
processing the statement ONE DC '1' address of ONE, which is 115, is entered in the
symbol table. After the END statement is processed, the entry (101, ONE) would be
processed by obtaining the address of ONE from the symbol table and inserting it in
the second operand field of the instruction with assembled address 101.
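A minimal sketch of this technique in Python, assuming instructions are kept as [opcode, register, operand address] lists indexed by assembled address:

code = {101: ["04", 2, None]}    # MOVER BREG, ONE; operand left blank
tii = [(101, "ONE")]             # (instruction address, symbol)
symtab = {}

symtab["ONE"] = 115              # ONE DC '1' is processed at address 115

for addr, symbol in tii:         # at (or after) the END statement
    code[addr][2] = symtab[symbol]

print(code[101])                 # ['04', 2, 115]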

4.8 Advanced Assembler Directives


1. ORIGIN
The syntax of this directive is

ORIGIN <address specification>

Where <address specification> is an <operand specification> or <constant>. This directive
instructs the assembler to put the address given by <address specification> in the location
counter. The ORIGIN statement is useful when the target program does not consist of a
single contiguous area of memory. The ability to use an <operand specification> in the
ORIGIN statement provides the ability to change the address in the location counter in
a relative rather than absolute manner.

2. EQU
The EQU directive has the syntax

<symbol> EQU <address specification>

Where <address specification> is either a <constant> or <symbolic name> ±

<displacement>. The EQU statement simply associates the name <symbol> with the
address specified by <address specification>. However, the address in the location
counter is not affected.

3. LTORG
The LTORG directive, which stands for 'origin for literals', allows a programmer to
specify where literals should be placed. The assembler uses the following scheme for
placement of literals: When the use of a literal is seen in a statement, the assembler
enters it into a literal pool unless a matching literal already exists in the pool. At every
LTORG statement, as also at the END statement, the assembler allocates memory to the
literals of the literal pool and clears the literal pool. This way, a literal pool would
contain all literals used in the program since the start of the program or since the
previous LTORG statement. Thus, all references to literals are forward references by
definition. If a program does not use an LTORG statement, the assembler would enter
all literals used in the program into a single pool and allocate memory to them when it
encounters the END statement.

Consider the following assembly program to understand ORIGIN, EQU and LTORG

1              START   200
2              MOVER   AREG, ='5'    200) +04 1 211
3              MOVEM   AREG, A       201) +05 1 217
4    LOOP      MOVER   AREG, A       202) +04 1 217
5              MOVER   CREG, B       203) +04 3 218
6              ADD     CREG, ='1'    204) +01 3 212
7    …
12             BC      ANY, NEXT     210) +07 6 214
13             LTORG
               ='5'                  211) +00 0 005
               ='1'                  212) +00 0 001
14   …
15   NEXT      SUB     AREG, ='1'    214) +02 1 219
16             BC      LT, BACK      215) +07 1 202
17   LAST      STOP                  216) +00 0 000
18             ORIGIN  LOOP + 2
19             MULT    CREG, B       204) +03 3 218
20             ORIGIN  LAST + 1
21   A         DS      1             217)
22   BACK      EQU     LOOP
23   B         DS      1             218)
24             END
25             ='1'                  219) +00 0 001

ORIGIN: Statement number 18 of the above program viz. ORIGIN LOOP + 2 puts the address
204 in the location counter because symbol LOOP is associated with the address 202. The next
statement MULT CREG, B is therefore given the address 204.

EQU: On encountering the statement BACK EQU LOOP, the assembler associates the symbol
BACK with the address of LOOP i.e. with 202.

LTORG: In the assembly program, the literals ='5' and ='1' are added to the literal pool in
Statements 2 and 6, respectively. The first LTORG statement (Statement 13) allocates the
addresses 211 and 212 to the values '5' and '1'. A new literal pool is now started. The
literal ='1' is put into this pool in Statement 15. This literal is allocated the address
219 while processing the END statement. The literal ='1' used in Statement 15 therefore
refers to location 219 of the second pool of literals rather than location 212 of the
first pool.

4.9 Design of Two-pass Assembler


The data structures of assembler pass I, and the role of the mnemonic opcode table,
symbol table, literal table, and pool table in the assembly of an assembly language
program, are described below:

OPTAB, SYMTAB, LITTAB, and POOLTAB.

OPTAB

• A table of mnemonics opcode and related information


• OPTAB contains the field mnemonics opcodes, class and mnemonics info.
• The class field indicates whether the opcode belongs to an imperative statement
(IS), a declaration statement (DS), or an assembler directive (AD).
• If an imperative, the mnemonics info field contains the pair (machine code,
instruction length), else it contains the id of a routine to handle the declaration or
directive statement.

Mnemonic opcode   Class   Mnemonic info
MOVER             IS      (04, 1)
DS                DL      R#7
START             AD      R#11

SYMTAB
A SYMTAB entry contains the fields symbol name, address and length. Some addresses
can be determined directly, e.g. the address of the first instruction in the program;
others must be inferred. To find the address of a program element we must fix the
addresses of all program elements preceding it. This function is called memory allocation.

Symbol Address Length
LOOP 202 1
NEXT 214 1
LAST 216 1
A 217 1
BACK 202 1
B 218 1

LITTAB
A table of literals used in the program. A LITTAB entry contains the fields literal
and address. The first pass uses LITTAB to collect all literals used in a program.

POOLTAB
Awareness of different literal pools is maintained using the auxiliary table POOLTAB.
This table contains the literal number of the starting literal of each literal pool. At any
stage, the current literal pool is the last pool in the LITTAB. On encountering an
LTORG statement (or the END statement), literals in the current pool are allocated
addresses starting with the current value in LC and LC is appropriately incremented.

LITTAB                           POOLTAB
Literal no   Literal   Address   Literal no
1            ='5'                #1
2            ='1'                #3
3            ='1'
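One plausible way to model these four tables in Python is shown below; the entries are the sample values from the tables above, while the representation itself (dicts and lists) is an assumption for illustration.

OPTAB = {                          # mnemonic -> (class, mnemonic info)
    "MOVER": ("IS", ("04", 1)),
    "DS":    ("DL", "R#7"),
    "START": ("AD", "R#11"),
}

SYMTAB = [                         # (symbol, address, length)
    ("LOOP", 202, 1), ("NEXT", 214, 1), ("LAST", 216, 1),
    ("A", 217, 1), ("BACK", 202, 1), ("B", 218, 1),
]

LITTAB = [("='5'", None), ("='1'", None), ("='1'", None)]
POOLTAB = [1, 3]                   # 1-based literal number starting each pool

# The current literal pool is the tail of LITTAB starting at the literal
# number recorded in the last POOLTAB entry.
current_pool = LITTAB[POOLTAB[-1] - 1:]
print(current_pool)                # [("='1'", None)]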

4.9.1 Algorithm for Pass I


1) loc_cntr=0(default value)
pooltab_ptr=1; POOLTAB[1]=1;
littab_ptr=1;
2) While next statement is not END statement
a) If a label is present then
this_label=symbol in label field
Enter (this_label, loc_cntr) in SYMTAB
b) If an LTORG statement then
i) Process literals LITTAB to allocate memory and put the address field.
Update loc_cntr accordingly
ii) pooltab_ptr= pooltab_ptr+1;
iii) POOLTAB[ pooltab_ptr]= littab_ptr
c) If a START or ORIGIN statement then
loc_cntr=value specified in operand field;
d) If an EQU statement then
i) This_address=value specified in <address spec>;
ii) Correct the symtab entry for this_label to (this_label, this_address);
e) If a declaration
i) Code= code of the declaration statement
ii) Size= size of memory area required by DC/DS
iii) loc_cntr=loc_cntr+size;
iv) Generate IC '(DL, code)';
f) If an imperative statement then
i) Code= machine opcode from OPTAB
ii) loc_cntr=loc_cntr+instruction length from OPTAB;
iii) If operand is a literal then
this_literal=literal in operand field;
LITTAB[littab_ptr]=this_literal;
littab_ptr= littab_ptr +1;
Else
this_entry = SYMTAB entry number of operand;
Generate IC '(IS, code)(S, this_entry)';
3) (Processing END statement)
a) Perform step2 (b)
b) Generate IC ‘(AD, 02)’
c) Go to pass II

4.9.2 Intermediate Code Forms


Intermediate code consists of a set of IC units, each unit consisting of the following
three fields:

1. Address
2. Representation of mnemonics opcode
3. Representation of operands
Mnemonics field
The mnemonics field contains a pair of the form (statement class, code), where the
statement class can be one of IS, DL and AD, standing for imperative statement,
declaration statement and assembler directive respectively. For an imperative statement,
code is the instruction opcode in the machine language. For declarations and assembler
directives, code is an ordinal number within the class. Thus, (AD, 01) stands for
assembler directive number 1, which is the directive START. The codes for the various
declaration statements and assembler directives are shown below.

The information in the mnemonics field is assumed to have the same representation in
all the variants.

Declaration statement        Assembler directive
DC   01                      START    01      EQU     04
DS   02                      END      02      LTORG   05
                             ORIGIN   03

4.9.3 Intermediate code for Imperative statement


Variant I
First operand is represented by a single digit number which is a code for a register or
the condition code.

The second operand, which is a memory operand, is represented by a pair of the form
(operand class, code), where the operand class is one of C, S and L, standing for
constant, symbol and literal respectively. For a constant, the code field contains the
internal representation of the constant itself. For example, the operand descriptor
for the statement START 200 is (C, 200). For a symbol or literal, the code field
contains the ordinal number of the operand's entry in SYMTAB or LITTAB.

Register   Code        Condition   Code
AREG       01          LT          01
BREG       02          LE          02
CREG       03          EQ          03
DREG       04          GT          04
                       GE          05
                       ANY         06

Variant II
This variant differs from variant I of the intermediate code in that symbols, condition
codes and CPU registers are not processed in variant II, so no IC units are generated
for them during pass I; they appear in source form in the IC.

Statement                Variant I               Variant II
START 200                (AD, 01) (C, 200)       (AD, 01) (C, 200)
READ A                   (IS, 09) (S, 01)        (IS, 09) A
LOOP MOVER AREG, A       (IS, 04) (1)(S, 01)     (IS, 04) AREG, A
.                        .                       .
SUB AREG, ='1'           (IS, 02) (1)(L, 01)     (IS, 02) AREG, (L, 01)
BC GT, LOOP              (IS, 07) (4)(S, 02)     (IS, 07) GT, LOOP
STOP                     (IS, 00)                (IS, 00)
A DS 1                   (DL, 02) (C, 1)         (DL, 02) (C, 1)
LTORG                    (AD, 05)                (AD, 05)
…..

4.9.4 Comparison of the variants

Variant I                                    Variant II
IS, DL and AD statements all contain         DL and AD statements contain the
the processed form.                          processed form, while for IS statements
                                             the operand field is processed only to
                                             identify literal references.
Extra work in pass I.                        Extra work in pass II.
Simplifies tasks in pass II.                 Simplifies tasks in pass I.
Occupies more memory than variant II.        Memory utilization of the two passes
                                             is better balanced.

4.9.5 Algorithm for Pass - II

It has been assumed that the target code is to be assembled in the area named
code_area.

1. code_area_address := address of code_area; pooltab_ptr := 1; loc_cntr := 0;
2. While next statement is not an END statement
a) Clear machine_code_buffer;
b) If an LTORG statement
i) Process literals in LITTAB and assemble the literals in machine_code_buffer.
ii) size := size of memory area required for literals;
iii) pooltab_ptr := pooltab_ptr + 1;
c) If a START or ORIGIN statement
i) loc_cntr := value specified in operand field;
ii) size := 0;
d) If a declaration statement
i) If a DC statement then assemble the constant in machine_code_buffer;
ii) size := size of memory area required by DC/DS;
e) If an imperative statement
i) Get operand address from SYMTAB or LITTAB;
ii) Assemble instruction in machine_code_buffer;
iii) size := size of instruction;
f) If size ≠ 0 then
i) Move contents of machine_code_buffer to the address code_area_address + loc_cntr;
ii) loc_cntr := loc_cntr + size;
3. (Processing END statement)
a) Perform steps 2(b) and 2(f)
b) Write code_area into output file.

4.10 Error Reporting of Assembler


4.10.1 Error reporting in pass I

Listing errors in the first pass has the advantage that the source program need not be
preserved till pass II. But the listing produced in pass I can report only certain
errors, not all. In the program below, errors are detected at statements 9 and 21.
Statement 9 gives an invalid opcode error because MVER does not match any mnemonic in
OPTAB. Statement 21 gives a duplicate definition error because an entry for A already
exists in the symbol table. The undefined symbol B in statement 10 is harder to detect
during pass I; this error can be detected only after pass I is complete.

Sr. no   Statement           Address
1        START 200
2        MOVER AREG, A       200
.        .                   .
9        MVER BREG, A        207
         **ERROR** invalid opcode
10       ADD BREG, B         208
14       A DS 1              209
.        .                   .
21       A DC '5'            227
         **ERROR** duplicate definition of symbol A
.        .
35       END
         **ERROR** undefined symbol B in statement 10

4.10.2 Error Reporting In Pass II
During pass II, data structures like SYMTAB are available. Reporting the error at
statement 10 is also easy: the symbol table is searched for an entry B and, if no
match is found, the error is reported.

Single Pass Assembler for Intel x86 Design


The algorithm for the Intel 8088 assembler is given at the end of this section. LC
processing in this algorithm differs from LC processing in the first pass of a two-pass
assembler in one significant respect. In Intel 8088, the unit for memory allocation is a
byte; however, certain entities require their first byte to be aligned on specific
boundaries in the address space. While processing declarations and imperative
statements, the assembler first aligns the address contained in the LC on the
appropriate boundary. We call this action LC alignment. Allocation of memory for a
statement is performed after LC alignment. The data structures of the assembler are as
follows:

1) Mnemonics Table (MOT)


The mnemonics table (MOT) is hash organized and contains the following fields:
mnemonic opcode, machine opcode, alignment/format info and routine id. The routine
id field of an entry specifies the routine which handles that opcode. Alignment/format
info is specific to a given routine.

Mnemonic opcode (6)   Machine opcode (2)   Alignment/format info (1)   Routine id (4)
JNE                   75H                  00H                         R2

2) Symbol Table (SYMTAB)


The symbol table (SYMTAB) is hash-organized and contains information about
symbols defined and used in the source program. The contents of some important
fields are as follows: The owner segment field indicates the id of the segment in which
the symbol is defined. It contains the SYMTAB entry # of the segment name. For a non-
EQU symbol the type field indicates the alignment information. For an EQU symbol,
the type field indicates whether the symbol is to be given a numeric value or a textual
value, and the value itself is accommodated in the owner segment and offset fields of
the entry.

3) Segment Register Table (SRTAB)
The segment register table (SRTAB) contains four entries, one for each segment register.
Each entry shows the SYMTAB entry # of the segment whose address is contained in
the segment register. SRTAB_ARRAY is an array of SRTABs.

Segment Register (1) SYMTAB entry # (2)


00 (ES) 23

4) Forward Reference Table (FRT)


Information concerning forward references to a symbol is organized as a linked list.
Thus, the forward reference table (FRT) contains a set of linked lists, one for each
forward referenced symbol.

Pointer (2)   SRTAB# (1)   Instruction address (2)   Usage code (1)   Source stmt # (2)

5) Cross Reference Table (CRT)


A cross reference directory is a report produced by the assembler which lists all
references to a symbol sorted in the ascending order by statement numbers. The
assembler uses the cross reference table (CRT) to collect the relevant information.

4.11 Algorithm of the Single-Pass Assembler


Important data structures used by the Single-Pass Assembler:

SYMTAB, SRTAB_ARRAY, CRT, FRT and ERRTAB


LC : Location Counter
code_area : Area for assembling the target program
code_area_address : Contains address of code_area
srtab_no : Number of the current SRTAB
stmt_no : Number of the current statement
SYMTAB_segment_entry : SYMTAB entry # of current segment
machine_code_buffer : Area for constructing code for one statement

Algorithm (Single pass assembler for Intel 8088)


1) code_area_address : = address of code_area;
srtab_no : = 1;
LC : = 0;

stmt_no : = 1;
SYMTAB_segment_entry : = 0; Clear ERRTAB, SRTAB_ARRAY.
2) While the next statement is not an END statement
a) Clear machine_code_buffer.
b) If a symbol is present in the label field then this_label := symbol in the label field;
c) If an EQU statement
i) this_address := value of <address specification>;
ii) Make an entry for this_label in SYMTAB with offset := this_address;
Defined: = ‘yes’
owner_segment:= owner_segment in SYMTAB entry of the symbol in the
operand field. source_stmt_# := stmt_no;
iii) Enter stmt_no in the CRT list of the label in the operand field.
iv) Process forward references to this_label;
v) Size := 0;
d) If an ASSUME statement
i) Copy the SRTAB in SRTAB_ARRAY[srtab_no] into SRTAB_ARRAY
[srtab_no+1]
ii) srtab_no := srtab_no+1;
iii) For each specification in the ASSUME statement
(a) this_register := register mentioned in the specification.
(b) This_segment:= entry number of SYMTAB entry of the segment
appearing in the specification.
(c) Make the entry (this_register, this_segment) in SRTAB_ARRAY
[srtab_no]. (It overwrites an existing entry for this_register.)
(d) size: = 0;
e) If a SEGMENT statement
i) Make an entry for this_label in SYMTAB and note the entry number.
ii) Set segment name? := true;
iii) SYMTAB_segment_entry := entry no. in SYMTAB;
iv) LC :=0;
v) size := 0;
f) If an ENDS statement then SYMTAB_segment_entry :=0;
g) If a declaration statement
i) Align LC according to the specification in the operand field.
ii) Assemble the constant(s), if any, in the machine_code_buffer.
iii) size: = size of memory area required;
h) If an imperative statement
i) If the operand is a symbol symb then enter stmt_no in CRT list of symb.
ii) If the operand symbol is already defined then check its alignment and
addressability. Generate the address specification (segment register, offset) for
the symbol using its SYMTAB entry and SRTAB_ARRAY[srtab_no].
Else
Make an entry for symb in SYMTAB with defined :=’no’;
Make the entry (srtab_no, LC, usage code, stmt_no) in FRT of symb.
iii) Assemble instruction in machine_code_buffer.
iv) size := size of the instruction;
i) If size ≠ 0 then
i) If label is present then
Make an entry for this_label in
SYMTAB with owner_segment := SYMTAB_segment_entry;
Defined := ‘yes’;
offset := LC;
source_stmt_# := stmt_no;
ii) Move contents of machine_code_buffer to the address code_area_address +
<LC>;
iii) LC := LC + size;
iv) Process forward references to the symbol. Check for alignment and
addressability errors. Enter errors in the ERRTAB.
v) List the statement along with errors pertaining to it found in the ERRTAB.
vi) Clear ERRTAB.
3) (Processing of END statement)
a) Report undefined symbols from the SYMTAB.
b) Produce cross reference listing.
c) Write code_area into the output file.

Example 1:
Assembly Program to compute N! & equivalent machine language program

Opcode (2 digit)   Register operand (1 digit)   Memory operand (3 digit)

1 START 101

2 READ N 101) 09 0 113

3 MOVER BREG, ONE 102) 04 2 115

4 MOVEM BREG, TERM 103) 05 2 116

5 AGAIN MULT BREG,TERM 104) 03 2 116

6 MOVER CREG, TERM 105) 04 3 116

7 ADD CREG, ONE 106) 01 3 115

12 MOVEM CREG, TERM 107) 05 3 116

13 COMP CREG, N 108) 06 3 113

14 BC LE, AGAIN 109) 07 2 104

15 MOVEM BREG, RESULT 110) 05 2 114

16 PRINT RESULT 111) 10 0 114

17 STOP 112) 00 0 000

18 N DS 1 113)

19 RESULT DS 1 114)

20 ONE DC ‘1’ 115) 00 0 001

21 TERM DS 1 116)

22 END

Output of Pass-I: Data Structure of Pass-I of an assembler

OPTAB                                      SYMTAB
Mnemonic opcode   Class   Mnemonic info    Symbol    Address   Length
START             AD      R#11             AGAIN     104       1
READ              IS      (09,1)           N         113       1
MOVER             IS      (04,1)           RESULT    114       1
MOVEM             IS      (05,1)           ONE       115       1
ADD               IS      (01,1)           TERM      116       1
BC                IS      (07,1)
DC                DL      R#5
SUB               IS      (02,1)
STOP              IS      (00,1)
COMP              IS      (06,1)
DS                DL      R#7
PRINT             IS      (10,1)
END               AD
MULT              IS      (03,1)

LITTAB and POOLTAB are empty as there are no Literals defined in the program.

4.11.1 Intermediate Representation


Variant I Variant II

1 START 101 (AD, 01) (C, 101) (AD, 01) (C, 101)

2 READ N (IS, 09) (S, 02) (IS, 09) N

3 MOVER BREG, ONE (IS, 04) (2)(S, 04) (IS, 04) (2) ONE

4 MOVEM BREG, TERM (IS, 05) (2)(S, 05) (IS, 05) (2) TERM

5 AGAIN MULT BREG,TERM (IS, 03) (2)(S, 05) (IS, 03) (2) TERM

6 MOVER CREG, TERM (IS, 04) (3)(S, 05) (IS, 04) (3) TERM

7 ADD CREG, ONE (IS, 01) (3)(S, 04) (IS, 01) (3) ONE

12 MOVEM CREG, TERM (IS, 05) (3)(S, 05) (IS, 05) (3) TERM

13 COMP CREG, N (IS, 06) (3)(S, 02) (IS, 06) (3) N

14 BC LE, AGAIN (IS, 07) (2)(S, 01) (IS, 07) (2)AGAIN

15 MOVEM BREG, RESULT (IS, 05) (2)(S, 03) (IS, 05) (2)RESULT

16 PRINT RESULT (IS, 10) (S, 03) (IS, 10) RESULT

17 STOP (IS, 00) (IS, 00)

18 N DS 1 (DL, 02) (C, 1) (DL, 02) (C, 1)

19 RESULT DS 1 (DL, 02) (C, 1) (DL, 02) (C, 1)

20 ONE DC ‘1’ (DL, 01) (C, 1) (DL, 01) (C, 1)

21 TERM DS 1 (DL, 02) (C, 1) (DL, 02) (C, 1)

22 END (AD, 02) (AD, 02)

Exercise
Short Questions

1. What are the elements of assembly language programming?


2. What are analysis phase and its task?
3. Explain about the types of pass structure.
4. What are intermediate code forms and its statement?
5. Differentiate between the variants of intermediate code.
6. Write the algorithm of the single-pass assembler.

Long Questions

1. What are pass-1 and pass-2? How do they work?


2. What is assembler? Explain about types of assembly statements.
3. How is a two-pass assembler designed? Write the algorithm.

Chapter 5
Introduction to Linker and Loader

5.1 Introduction to Linker


5.1.1 Static Linking
5.1.2 Dynamic linking
5.2 Execution Phase
5.2.1 Linking Process
5.3 Design of a Linker
5.3.1 Relocation
5.3.2 Linking
5.4 Relocation of Linking Concept
5.4.1 Performing Relocation
5.4.2 Self-Relocating Programs
5.4.3 Linking in MS-DOS
5.5 Linking of Overlay Structured Programs
5.6 Introduction about Loader
5.7 Different Loading Schemes
5.7.1 Compile-and-Go Loaders
5.7.2 General Loader
5.7.3 Absolute Loaders
5.7.4 Relocating Loaders
5.7.5 Practical Relocating Loaders
5.7.6 Linking Loaders
5.7.7 Relocating Linking Loaders
Exercise

5.1 Introduction to Linker
The linker is a system program that links the object modules of a program into a
single executable file. It performs the process of linking. The linker is also
known as a link editor. Linking is the process of collecting and combining pieces
of code and data into a single file. The linker also links the required modules
from the system library. It takes the object modules produced by the assembler as
input and produces an executable file as output for the loader.
Linking is carried out both at compile time, when the source code is translated
into machine code, and at load time, when the program is loaded into memory by the
loader. Linking is performed as the final step in compiling a program.
Source code → Compiler → Assembler → Object code → Linker → Executable file → Loader

Linking can be categorized into two types: Static Linking, Dynamic linking.

5.1.1 Static Linking


In static linking, the linker links all modules of a program before its execution
begins; it produces a binary program that does not contain any unresolved external
references. If statically linked programs use the same module from a library, each
program will get a private copy of the module. If many programs that use the
module are in execution at the same time, many copies of the module might be
present in memory.
Static linkers perform two main tasks:
• Symbol resolution: It associates every symbol reference with exactly one
symbol definition.
• Relocation: It relocates the code and data sections and modifies symbol
references to point to the relocated memory locations.

The linker copies all library routines used in the program into the executable
image. It therefore requires extra memory space. But since it no longer requires
the presence of the library on the system when the program is run, it is faster,
more convenient, and carries less risk of failure and error.
5.1.2 Dynamic linking
Dynamic linking is performed at run time. This linking is done by placing the name
of a shareable library in the executable image. There is a greater likelihood of
errors and failures, but it requires less memory space because multiple programs
can share a single copy of the library.

Dynamic linking also allows code sharing: when the same object is used several
times in a program, instead of linking the same object over and over into the
executable, each module shares one copy of the object with the other modules. The
shared library needed for linking is kept in virtual memory to save RAM. With
dynamic linking we can also relocate code for smooth execution, though not all of
the code is relocatable; the addresses are fixed at run time.

5.2 Execution Phase


1) Translation time address: This address is assigned at translation time, by the
translator.
2) Link time address: This address is assigned at link time, by the linker.
3) Load time address: This address is assigned at load time, by the loader.
4) Translated origin: Address of the program's origin assumed by the translator.
5) Linked origin: Address of the program's origin assumed by the linker while
producing a binary program.
6) Load origin: Address of the program's origin assumed by the loader while the
program is loaded for execution.

Figure 5.2 Execution Phase

The translator converts the source program into its corresponding object modules,
which are stored in files for future use. At link time, the linker combines all
these object modules and converts them into binary modules. These binary modules
are in ready-to-execute form and are also stored in files for future use. At
execution time, the loader loads these binary modules into the correct memory
locations, and the required binary program is obtained. The binary program, in
turn, receives input from the user in the form of data, and the result is obtained.

5.2.1 Linking Process


• Consider an application program AP consisting of a set of program units SP
= {Pi}.
• A program unit Pi interacts with another program unit Pj by using the addresses
of Pj's instructions and data in its own instructions.
• To realize such interactions, Pj and Pi must contain public definitions and
external references, defined as follows:

• Public definition: a symbol pub_symb defined in a program unit which may be
referenced in other program units.
• External reference: a reference to a symbol ext_symb which is not defined in
the program unit.

5.3 Design of a Linker


The design of the linker is divided into two parts:
5.3.1 Relocation
• The linker uses an area of memory called the work area for constructing the
binary program.
• It loads the machine language program found in the program component of
an object module into the work area and relocates the address sensitive
instructions in it by processing entries of the RELOCTAB.
• For each RELOCTAB entry, the linker determines the address of the word in
the work area that contains the address sensitive instruction and relocates it.
• The details of the address computation would depend on whether the linker
loads and relocates one object module at a time, or loads all object modules
that are to be linked together into the work area before performing
relocation.
Algorithm: Program Relocation
1. program_linked_origin := <link origin> from the linker command;
2. For each object module mentioned in the linker command
(a) t_origin := translated origin of the object module; OM size := size of the
object module;
(b) relocation_factor := program_linked_origin –t_origin;
(c) Read the machine language program contained in the program
component of the object module into the work-area.
(d) Read RELOCTAB of the object module.
(e) For each entry in RELOCTAB
i. translated_address := address found in the RELOCTAB entry;
ii. address_in_work_area := address of work_area +
translated_address –t_origin;
iii. Add relocation-factor to the operand address found in the word that has
the address address_in_work_area.
(f) Program_linked_origin := program_linked_origin +OM_size;
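A compact Python rendering of this relocation loop, under simplifying assumptions: an object module is a list of integer words, each RELOCTAB entry is just the translated address of an address sensitive word, and the whole word is treated as the operand address.

def relocate(words, t_origin, linked_origin, reloctab):
    relocation_factor = linked_origin - t_origin
    out = list(words)                       # the work-area copy
    for translated_address in reloctab:
        i = translated_address - t_origin   # offset of the word in the module
        out[i] += relocation_factor         # correct the operand address
    return out

# Module translated at origin 500; the word at 502 holds the address 504.
module = [0, 0, 504]
print(relocate(module, t_origin=500, linked_origin=900, reloctab=[502]))
# [0, 0, 904]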

5.3.2 Linking
• An external reference to a symbol alpha can be resolved only if alpha is
declared as a public definition in some object module.
• Using this observation as the basis, program linking can be performed as
follows:
• The linker would process the linking tables (LINKTABs) of all object modules
that are to be linked and copy the information about public definitions found
in the min to a table called the name table (NTAB).
• The external reference to alpha would be resolved simply by searching for
alpha in this table, obtaining its linked address, and copying it into the word
that contains the external reference.
• Accordingly, each entry of the NTAB would contain the following fields:
• Symbol: Symbolic name of an external reference or an object module.
• Linked-address: For a public definition, this field contains the linked address
of the symbol. For an object module, it contains the linked origin of the object
module.
Algorithm: Program Linking
1. program_linked_origin := <link origin> from the linker command.
2. For each object module mentioned in the linker command
(a) t_origin := translated origin of the object module; OM_size := size of the
object module;
(b) relocation_factor := program_linked_origin – t_origin;
(c) Read the machine language program contained in the program
component of the object module into the work area.
(d) Read LINKTAB of the object module.
(e) Enter (object module name, program_linked_origin) in NTAB.
(f) For each LINKTAB entry with type = PD
name := symbol field of the LINKTAB entry;
linked_address := translated_address + relocation_factor;
Enter (name, linked_address) in a new entry of the NTAB.
(g) program_linked_origin := program_linked_origin + OM_size;
3. For each object module mentioned in the linker command
(a) t_origin := translated origin of the object module; program_linked_origin
:= linked_address from NTAB;
(b) For each LINKTAB entry with type = EXT
i. address_in_work_area := address of work_area + program_linked_origin –
<link origin> in linker command + translated address – t_origin;
ii. Search the symbol found in the symbol field of the LINKTAB entry in
NTAB and note its linked address. Copy this address into the operand
address field in the word that has the address address_in_work_area.
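The two passes of this algorithm can be sketched as follows; the module contents and the relocation factor are made-up numbers, and the LINKTAB is reduced to (type, symbol, translated address) triples for illustration.

ntab = {}                                   # symbol -> linked address

# Step 2(f): collect public definitions (type PD) into the NTAB.
linktab = [("PD", "alpha", 37), ("EXT", "alpha", 30)]
relocation_factor = 400                     # program_linked_origin - t_origin
for typ, symbol, translated_address in linktab:
    if typ == "PD":
        ntab[symbol] = translated_address + relocation_factor

# Step 3(b): patch each external reference (type EXT) from the NTAB.
work_area = {430: 0}                        # word holding the EXT reference
for typ, symbol, translated_address in linktab:
    if typ == "EXT":
        address_in_work_area = translated_address + relocation_factor
        work_area[address_in_work_area] = ntab[symbol]

print(work_area)                            # {430: 437}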

5.4 Relocation of Linking Concept


• Program relocation is the procedure of modifying the addresses used in the
address sensitive instructions of a program so that the program can execute
correctly from the designated area of memory.
• If linked origin ≠ translated origin, linker must perform relocation.
• If load origin ≠ linked origin, loader must perform relocation.
• Let AA be the set of absolute address - instruction or data addresses – used in
the instruction of a program P.
• AA ≠ ф implies that program P assumes its instructions and data to occupy
memory words with specific addresses.
• Such a program is known as an address sensitive program – involve one or
both of the following:
• An address sensitive instruction: an instruction that uses an address αi ∈ AA.
• An address constant: a data word that contains an address αi ∈ AA.
• An address sensitive program P can execute correctly only if the start
address of the memory area allocated to it is the same as its translated origin.
• To execute correctly from any other memory area, the address used in each
address sensitive instruction of P must be ‘corrected’.

5.4.1 Performing Relocation


• Let the translated and linked origins of program P be t_origin_P and l_origin_P,
respectively.
• Consider a symbol symb in P. Let its translation time address be t_symb and its
link time address be l_symb.
• The relocation factor of P is defined as

relocation_factor_P = l_origin_P – t_origin_P ............ (1)

• Note that relocation_factor_P can be positive, negative or zero.
• Consider a statement which uses symb as an operand. The translator puts the
address t_symb in the instruction generated for it. Now,

t_symb = t_origin_P + d_symb

where d_symb is the offset of symb in P. Hence

l_symb = l_origin_P + d_symb

Using (1),

l_symb = t_origin_P + relocation_factor_P + d_symb
       = t_origin_P + d_symb + relocation_factor_P
       = t_symb + relocation_factor_P ............ (2)

Let IRP_P designate the set of instructions requiring relocation in program P.
Following (2), relocation of program P can be performed by computing the
relocation factor for P and adding it to the translation time address(es) in every
instruction i ∈ IRP_P.
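A worked instance of equation (2), with illustrative numbers:

t_origin, l_origin, d_symb = 500, 900, 12
relocation_factor = l_origin - t_origin      # 400, by (1)
t_symb = t_origin + d_symb                   # 512, put in by the translator
l_symb = t_symb + relocation_factor          # 912, by (2)
assert l_symb == l_origin + d_symb           # consistent with the derivation
print(l_symb)                                # 912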
5.4.2 Self-Relocating Programs
A program that can perform the relocation of its own address sensitive instructions
is known as a self-relocating program. The provisions for this purpose are:
• A table of information concerning the address sensitive instructions exists
as a part of the program.
• Code to perform the relocation of the address sensitive instructions also
exists as a part of the program. This is called the relocating logic.
• The start address of the relocating logic is specified as the execution start
address of the program, so the relocating logic gains control when the
program is loaded in memory for execution.
• It uses the load address and the information concerning the address
sensitive instructions to perform the relocation.
• Execution control is then transferred to the relocated program.
• A self-relocating program can execute in any area of memory.
• This is very important in time-sharing operating systems, in which the load
address of a program is likely to differ between executions.
5.4.3 Linking in MS-DOS
• We discuss the design of a linker for the Intel 8088/80x86 processors
which resembles LINK of MS-DOS in many respects.
• It may be noted that the object modules of MS-DOS differ from the Intel
specifications in some respects.
• Object Module Format (Explain object module of the program)

• An Intel 8088 object module is a sequence of object records, each object
record describing specific aspects of the programs in the object module.
• There are 14 types of object records containing the following five basic
categories of information:
• Binary image (i.e. code generated by a translator)
• External references
• Public definitions
• Debugging information (e.g. line number in source program).
• Miscellaneous information (e.g. comments in the source program).

• We only consider the object records corresponding to the first three
categories, a total of eight object record types.
• Each object record contains variable-length information and may refer to the
contents of previous object records.
• Each name in an object record is represented in the following format:

Length (1 byte) Name

5.5 Linking of Overlay Structured Programs


An overlay is part of a program (or software package) that has the same load origin
as some other part of the program. An overlay is used to reduce the main memory
requirement of a program.
Overlay structured program
We refer to a program containing overlays as an overlay structured program. Such
a program consists of:
• A permanently resident portion called the root.
• A set of overlays.
Execution of an overlay structured program proceeds as follows:
• In starting, in memory the root is loaded and for the purpose of execution
control is given.
• Other overlays are loaded as and when needed.
• Note that the loading of an overlay overwrites a previously loaded overlay
with the same load origin.
• This reduces the memory requirement of a program.
• It also makes it possible to execute programs whose size exceeds the amount of
memory that can be allocated to them.
• The structure of an overlay program is designed by identifying mutually
exclusive modules that are, modules that do not call each other.
• Such modules do not need to reside simultaneously in memory.

Execution of an overlay structured program
• For linking and execution of an overlay structured program in MS-DOS, the
linker produces a single executable file at the output, which contains two
provisions to support overlays.
• First, an overlay manager module is included in the executable file.
• This module is responsible for loading the overlays when needed.
• Second, all calls that cross overlay boundaries are replaced by an interrupt
producing instruction.
• To start with, the overlay manager receives control and loads the root.
• A procedure call that crosses overlay boundaries leads to an interrupt.
• This interrupt is processed by the overlay manager and the appropriate overlay
is loaded into memory.
• When each overlay is structured into a separate binary program, as in IBM's
mainframe systems, a call that crosses overlay boundaries leads to an interrupt
which is attended by the OS kernel.
• Control is now transferred to the OS loader to load the appropriate binary
program.

5.6 Introduction about Loader


A loader is a routine that loads an object program and prepares it for
execution. There are numerous loading schemes: direct-linking, relocating, and
absolute. Primarily, the loader needs to relocate, link, and load the object program.
The loader is a program that places programs into memory and prepares them for
execution. In a simple loading scheme, the assembler outputs the machine
language translation of a program on a secondary storage device, and a loader is
placed in main memory. The loader places the machine language version of the
user's program into memory and transfers control to it. Since the loader is much
smaller than the assembler, this makes more memory available to the user's
program.

5.7 Different Loading Schemes


5.7.1 Compile-and-Go Loaders
The assembler is loaded in one part of memory and assembles the program directly
into its assigned memory locations. After the loading process is complete, the
assembler transfers control to the starting instruction of the loaded program.

Source Program → Compile-and-go assembler → Program loaded in memory (alongside the assembler)
Advantages of the compile-and-go loaders
• The user need not be troubled with the separate steps of compilation,
assembly, linking, loading, and execution.
• Execution speed is generally much superior to that of interpreted systems.
• They are simple and easy to implement.

Disadvantages of the compile-and-go loaders
• There is wastage of memory space because of the presence of the assembler.
• The code must be re-translated every time it is run.

5.7.2 General Loader


The general loading scheme improves the compile/assemble-and-go scheme by
permitting totally different source programs (or modules of the same program) to
be translated separately into their respective object programs. The object code
(modules) is stored in the secondary storage area; and then, they are loaded. The
loader usually combines the object codes and executes them by loading them into
the memory, including space where the assembler had been in the assemble-and-go
scheme. Rather than the entire assembler sitting in memory, a small utility
component called the loader does the job. Note that the loader program is much
smaller than the assembler; therefore more space is available to the user for
their programs.

Advantages of the general loading


• Saves memory and makes it available for the user program as loaders are
smaller in size than assemblers. The loader replaces the assembler.
• Reassembly of the program is no more needed for later execution of the
program. The object file/deck is available and can be loaded and executed
directly at the desired location.
• This scheme allows the use of subroutines in several different languages
because the object files processed by the loader utility will all be in machine
language.

Disadvantages of the general loading


• The loader is more complicated and needs to manage multiple object files.
• Secondary storage is required to store object files, and they cannot be directly
placed into the memory by assemblers.

5.7.3 Absolute Loaders


The absolute loader loads a binary program into memory for execution.
A binary program stored in a file ought to contain the following:

• A Header record showing the load origin, length, and load time execution start
address of the program.
• A sequence of binary image records containing the program’s code. Each
binary image record contains a part of the program’s code in the form of a
sequence of bytes, the load address of the first byte of this code, and a count of
the number of bytes of code.
• The absolute loader notes the load origin and the length of the program
mentioned in the header record.
• It then enters a loop that reads a binary image record and moves the code
contained in it to the memory area beginning at the address mentioned in
the binary image record.
• At the end, it transfers control to the execution start address of the program.
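The loop just described can be sketched in Python, assuming a header of (load origin, length, start address) and binary image records of (load address, bytes); this record layout is a simplification of the description above.

def absolute_load(memory, header, records):
    load_origin, length, start_address = header   # noted from the header
    for load_address, code in records:            # move each record's code
        memory[load_address:load_address + len(code)] = code
    return start_address                          # control transfers here

memory = bytearray(1024)
header = (100, 6, 100)
records = [(100, b"\x04\x02\x73"), (103, b"\x05\x02\x74")]
print(absolute_load(memory, header, records))     # 100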

Advantages of the absolute loading


• Simple to implement and efficient in execution.
• Saves memory (core), since the size of the loader is smaller than that of
the assembler.
• It permits the use of multiple source programs written in different
languages. In such cases, each language's assembler converts its source
program into machine language, and a single object file is then prepared by
address resolution.
• The loader is simple and merely obeys the instructions concerning where to
place the object code in main memory.

Disadvantages of the absolute loading


• The programmer must know, and must specify to the translator (the assembler),
the memory addresses for the inter-linking and loading of the programs.
Care should be taken so that the addresses do not overlap.
• For programs with multiple subroutines, the programmer must remember the
absolute address of each subroutine and use it explicitly in other subroutines to
perform linking.
• If a subroutine is modified, the program has to be assembled again from
beginning to end.

5.7.4 Relocating Loaders


A relocating loader loads a program into a designated area of memory, relocates it
so that it can execute correctly in that area of memory, and passes control to it
for execution.
The binary program stored in a file contains the following:

• A Header record showing the load origin, length, and load time execution start
address of the program.
• A sequence of binary image records containing the program’s code. Each
binary image record contains a part of the program’s code in the form of a
sequence of bytes, the load address of the first byte of this code, and a count of
the number of bytes of code.
• A table analogous to the RELOCTAB, giving the linked addresses of the
address sensitive instructions in the program.

Algorithm: Relocating Loader


1) program_load_origin = load origin specified in the loader command
2) program_linked_origin = linked origin specified in the header record
3) relocation_factor = program_load_origin - program_linked_origin
4) For each binary image record
   a. code_linked_address = linked address specified in the record
   b. code_load_address = code_linked_address + relocation_factor
   c. byte_count = count of the number of bytes in the record
   d. Move byte_count bytes from the record to the memory area with start address code_load_address
5) Read the RELOCTAB of the program
6) For each entry in the RELOCTAB
   a. instruction_linked_address = address specified in the RELOCTAB entry
   b. instruction_load_address = instruction_linked_address + relocation_factor
   c. Add relocation_factor to the operand address used in the instruction that has the address instruction_load_address
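
The algorithm translates almost directly into Python. The sketch below assumes the object file has already been parsed into a header tuple, a list of binary image records, and a RELOCTAB list; it also simplifies by treating each memory word as directly holding the operand address of an address-sensitive instruction.

def relocating_load(load_origin, header, records, reloctab, memory):
    # steps 1-3: compute the relocation factor
    linked_origin, length, start_address = header
    relocation_factor = load_origin - linked_origin
    # step 4: move each binary image record to its load address
    for linked_address, byte_count, code in records:
        load_address = linked_address + relocation_factor
        memory[load_address:load_address + byte_count] = code[:byte_count]
    # steps 5-6: fix up every address-sensitive instruction
    for instruction_linked_address in reloctab:
        addr = instruction_linked_address + relocation_factor
        # add the relocation factor to the operand address in the instruction
        # (simplification: the whole word is taken to be the operand address)
        memory[addr] = memory[addr] + relocation_factor
    return start_address + relocation_factor   # relocated start address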

5.7.5 Practical Relocating Loaders


To avoid reassembling all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
• The Binary Symbolic Subroutine (BSS) loader is an example of a relocating loader.
• The output of assembler using BSS loader is:
1. Object program
2. Reference about other programs to be accessed
3. Information about address sensitive entities.

Let us consider a program segment as shown below.


Offset = 10
Offset = 30    ADD AREG, X
In the above program, the address of variable X in the instruction ADD AREG, X
will be 30. If this program is loaded from the memory location 500 for execution
then the address of X in the instruction ADD AREG, X must become 530.
After loading at address 500:

Offset = 10    (load address 510)
Offset = 30    ADD AREG, X    (load address 530)

5.7.6 Linking Loaders


A modern program comprises several procedures or subroutines together with the
main program module. The translator, in such cases as a compiler, will translate
them all independently into distinct object modules usually stored in the secondary
memory. Execution of the program in such cases is performed by linking together
these independent object modules and loading them into the main memory.
Linking of various object modules is done by the linker. A special system program
called linking loader gathers various object modules, links them together to
produce a single executable binary program, and loads them into the memory. This
category of loaders leads to a popular class of loaders called direct-linking loaders.
The loaders used in these situations are usually called linking loaders, which link
the necessary library functions and symbolic references. Essentially, linking loaders accept and link together a set of object programs into a single file and load them into the core (main memory). Linking loaders additionally perform relocation and overcome the disadvantages of other loading schemes.

5.7.7 Relocating Linking Loaders


Relocating linking loaders combine the relocating capabilities of relocating loaders with the advanced linking features of linking loaders, presenting a more robust loading scheme. This eliminates the need to use two separate programs for linking and loading; these loaders perform both relocation and linking. Such loaders are especially useful in a dynamic runtime
linking both. These types of loaders are especially useful in a dynamic runtime
environment, wherein the link and load origins are highly dependent upon the
runtime situations. These loaders can work efficiently with support from the
operating system and utilize the memory and other resources efficiently.

Exercise

Short Questions

1. What is a linker? Write about the linking process.


2. Explain the execution phase.
3. Describe the design of a linker.
4. What is linking in MS-DOS?
5. What are relocating loaders? Write the algorithm of a relocating loader.

Long Questions

1. Differentiate between static linking and dynamic linking.


2. What is linking? Write the algorithm for program linking.
3. What is a loader and what are its different loading schemes?
4. Write the advantages and disadvantages of absolute loading.

Chapter 6
Macro Processors

6.1 Introduction to Macro


6.2 Macro Processors
6.2.1 Macro Processor Operation
6.2.2 Salient Features of Macro Processor
6.3 Macro Definition and Call
6.3.1 Macro Expansion
6.4 Difference between Macro and Subroutine
6.5 Types of formal parameters
6.5.1 Positional Parameters
6.5.2 Keyword Parameters
6.5.3 Specifying Default Values of Parameters
6.5.4 Macros with Mixed Parameter Lists
6.6 Advanced Macro Facilities
6.7 Design of Macro Preprocessor
6.8 Design of Macro Assembler
6.9 Functions of Macro Processor
6.9.1 Basic Tasks of Macro Processor
6.10 Design Issues of Macro Processor
6.10.1 Design Features of Macro Processor
6.10.2 Macro Processor Design Options
6.11 One-pass Macro Processors
6.12 Design of Two-pass Macro Preprocessor
Exercise

6.1 Introduction to Macro
Formally, macro instructions (often simply called macros) are single-line abbreviations for groups of instructions. For every occurrence of such a one-line macro instruction within a program, the instruction is replaced by the entire block.
The advantages of using macro are as follows:
• Simplify and reduce the amount of repetitive coding.
• Reduce the possibility of errors caused by repetitive coding.
• Make an assembly program more readable.

6.2 Macro Processors


A preprocessor can be any program that processes its input data to produce output,
which is used as an input to another program. The outputs of the macro processors
are assembly programs that become inputs to the assembler. The macro processor
may exist independently and be called during the assembling process or be a part
of the assembler implementation itself.

6.2.1 Macro Processor Operation

6.2.2 Salient Features of Macro Processor

A macro represents a group of commonly used statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements; this is known as the expansion of macros. Using macro instructions, the programmer can leave the mechanical details to be handled by the macro processor. Macro processor designs are not directly related to the computer architecture on which they run. Macro processing involves definition, invocation, and expansion.

6.3 Macro Definition and Call


As mentioned earlier, a macro consists of a name, a set of formal parameters, and a body of code. A macro can be defined by enclosing a set of statements between a macro header and a macro end statement.
The formal structure of a macro includes the following features:
• Macro prototype statement: Specifies the name of the macro and name and
type of formal parameters.
• Model statements: Specify the statements in the body of the macro from which
assembly language statements are to be generated during expansion.
• Macro preprocessor statement: Specifies the statement used for performing
auxiliary function during macro expansion

A macro prototype statement can be written as follows:


<name_of_macro> [<formal parameter spec> [,...]]
Where [<formal parameter spec> [,...]] defines the parameter name and its kind,
which are of the following form:
&<name_of_parameter> [<parameter_type>]

A macro can be called by writing the name of the macro in the mnemonic field of
the assembly language.
The syntax of a typical macro call can be of the following form:
<name_of_macro> [<actual_parameter_spec> [,…]]

The MACRO directive in the mnemonic field specifies the start of the macro
definition and it should compulsorily have the macro name in the label field. The
MEND directive specifies the end of the macro definition. The statements between

MACRO and MEND directives define the body (model statements) of the macro
and can appear in the expanded code.
Macro Definition Example:

MACRO
INCR   &MEM_VAL, &INC_VAL, &REG
       MOVER  &REG, &MEM_VAL
       ADD    &REG, &INC_VAL
       MOVEM  &REG, &MEM_VAL
MEND

Macro Call Example: INCR A, B, AREG


6.3.1 Macro Expansion
A macro call in a program leads to macro expansion. To expand a macro, the name of the
macro is placed in the operation field, and no special directives are necessary. During
macro expansion, the macro name statement in the program is replaced by the sequence
of assembly statements. Let us consider the following example:

START 100
A DS 1
B DS 1
INCR A, B, AREG
PRINT A
STOP
END

The preceding example uses a statement that calls the macro. The assembly statement INCR A, B, AREG is an example of a macro call, with A, B, and AREG being the actual parameters of the macro. While passing over the assembly program, the assembler recognizes INCR as the name of a macro, expands the macro, and places a copy of the macro definition with the parameters substituted. The expanded code for the program is shown below.

START 100
A DS 1
B DS 1
+ MOVER AREG, A
+ ADD AREG, B
+ MOVEM AREG, A
PRINT A
STOP
END

The statements marked with a ‘+’ sign in the preceding label field denote the expanded
code and differentiate them from the original statements of the program.
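
The substitution step itself can be illustrated with a short Python sketch. It is not a full preprocessor: it assumes the body of INCR is already stored as model statements and simply applies the actual parameter table (APT) built from the call.

body = ["MOVER &REG, &MEM_VAL",
        "ADD   &REG, &INC_VAL",
        "MOVEM &REG, &MEM_VAL"]
formals = ["&MEM_VAL", "&INC_VAL", "&REG"]

def expand(actuals, formals, body):
    apt = dict(zip(formals, actuals))     # actual parameter table
    expanded = []
    for model in body:
        line = model
        # substitute longer names first so one formal that is a prefix
        # of another cannot clobber it
        for formal in sorted(apt, key=len, reverse=True):
            line = line.replace(formal, apt[formal])
        expanded.append("+ " + line)      # '+' marks expanded statements
    return expanded

print("\n".join(expand(["A", "B", "AREG"], formals, body)))
# + MOVER AREG, A
# + ADD   AREG, B
# + MOVEM AREG, A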

6.4 Difference between Macro and Subroutine

Macro: The macro name in the mnemonic field leads to an expansion only.
Subroutine: The subroutine name in a call statement in the program leads to execution.

Macro: Macros are completely handled by the assembler during assembly time.
Subroutine: Subroutines are completely handled by the hardware at runtime.

Macro: Macro definition and macro expansion are executed by the assembler, so the assembler has to know all the features, options, and exceptions associated with them.
Subroutine: The hardware executes the subroutine call instruction, so it has to know how to save the return address and how to branch to the subroutine.

Macro: The hardware knows nothing about macros.
Subroutine: The assembler knows nothing about subroutines.

Macro: The macro processor generates a new copy of the macro and places it in the program.
Subroutine: The subroutine call instruction is assembled in the usual way and treated by the assembler as any other instruction.

Macro: Macro processing increases the size of the resulting code but results in faster execution of the program.
Subroutine: The use of subroutines does not result in bulky object code but has substantial overheads of control transfer during execution.

6.5 Types of formal parameters
6.5.1 Positional parameters
For positional formal parameters, the specification <parameter kind> of the syntax rule is simply omitted. Thus, a positional formal parameter is written as &<parameter name>, e.g., &SAMPLE, where SAMPLE is the name of the parameter. In a call on a macro using positional parameters (see syntax rule (4.2)), the <actual parameter specification> is an ordinary string.
The value of a positional formal parameter XYZ is determined by the rule of positional
association as follows:
• Find the ordinal position of XYZ in the list of formal parameters in the macro
prototype statement.
• Find the actual parameter specification that occupies the same ordinal position in
the list of actual parameters in the macro call statement. If it is an ordinary string
ABC, the value of the formal parameter XYZ would be ABC.

6.5.2 Keyword Parameters


For keyword parameters, the specification <parameter kind> is the string '=' in syntax
rule. The <actual parameter specification> is written as <formal parameter name> =
<ordinary string>. The value of a formal parameter is determined by the rule of
keyword association as follows:
• Find the actual parameter specification which has the form XYZ= <ordinary string>.
• If the <ordinary string> in the specification is some string ABC, the value of formal
parameter XYZ would be ABC.

6.5.3 Specifying Default Values of Parameters


If a parameter has the same value in most calls on a macro, this value can be specified
as its default value in the macro definition itself. If a macro call does not explicitly
specify the value of the parameter, the preprocessor uses its default value; otherwise, it
uses the value specified in the macro call. This way, a programmer would have to
specify a value of the parameter only when it differs from its default value specified in
the macro definition. Default values of keyword parameters can be specified by
extending the syntax of formal parameter specification as follows:
&< parameter name > [< parameter kind > [< default value >]]

6.5.4 Macros with Mixed Parameter Lists


A macro definition may use both positional and keyword parameters. In such a case,
all positional parameters must precede all keyword parameters in a macro call.
For example, in the macro call
SUMUP A, B, G=20, H=X
A and B are positional parameters while G and H are keyword parameters.
Correspondence between actual and formal parameters is established by applying the
rules governing positional and keyword parameters separately.
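
The three association rules (positional, keyword, and default) can be combined into a single binding routine, sketched below in Python. The formal parameter names &P1 and &P2 and the default values are hypothetical, chosen only to exercise the SUMUP call above.

def bind_parameters(formals, defaults, actuals):
    # formals: ordered formal parameter names, positional ones first
    # defaults: default values of keyword parameters (the PDT)
    # actuals:  the strings appearing in the macro call
    apt = dict(defaults)                  # start from the defaults (PDT -> APT)
    position = 0
    for actual in actuals:
        if "=" in actual:                 # keyword association: NAME=value
            name, value = actual.split("=", 1)
            apt["&" + name] = value
        else:                             # positional association, by ordinal position
            apt[formals[position]] = actual
            position += 1
    return apt

# SUMUP A, B, G=20, H=X  with formals &P1, &P2, &G, &H and default &G = 10
print(bind_parameters(["&P1", "&P2"], {"&G": "10", "&H": ""},
                      ["A", "B", "G=20", "H=X"]))
# -> {'&G': '20', '&H': 'X', '&P1': 'A', '&P2': 'B'}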

6.6 Advanced Macro Facilities


Expansion time statement: AIF
An AIF statement has the syntax: AIF (<expression>) <sequencing symbol>
Where <expression> is a relational expression involving ordinary strings, formal
parameters and their attributes, and expansion time variables. If the relational
expression evaluates to true, expansion time control is transferred to the statement
containing <sequencing symbol> in its label field.

AGO
An AGO statement has the syntax: AGO <sequencing symbol>
It unconditionally transfers expansion time control to the statement containing
<sequencing symbol> in its label field.

Expansion time loops:

It is often necessary to generate many similar statements during the expansion of a macro. This can be achieved by writing similar model statements in the macro. Expansion time loops can be written using expansion time variables (EVs) and the expansion time control transfer statements AIF and AGO.

Example
MACRO
CLEAR   &X, &N
        LCL    &M
&M      SET    0
        MOVER  AREG, ='0'
.MORE   MOVEM  AREG, &X+&M
&M      SET    &M+1
        AIF    (&M NE &N) .MORE
MEND
The LCL statement declares M to be a local EV, which is initialized to zero at the start of the expansion of a call. If the actual parameter corresponding to &X is B, the expansion of the model statement MOVEM AREG, &X+&M first generates the statement MOVEM AREG, B+0. The value of M is incremented by 1, and the MOVEM model statement is expanded repeatedly until the value of M equals the value of N.
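
The expansion-time loop behaves like an ordinary loop executed by the preprocessor. A Python rendering of what the preprocessor effectively does for a call such as CLEAR B, 3 might look like this (the function name and the call are illustrative):

def expand_clear(x, n):
    m = 0                                  # the local EV &M, SET to 0
    lines = ["MOVER AREG, ='0'"]
    while True:                            # .MORE ... AIF (&M NE &N) .MORE
        lines.append("MOVEM AREG, %s+%d" % (x, m))
        m += 1                             # &M SET &M+1
        if m == n:                         # AIF test fails: fall through to MEND
            break
    return lines

for stmt in expand_clear("B", 3):
    print(stmt)
# MOVER AREG, ='0'
# MOVEM AREG, B+0
# MOVEM AREG, B+1
# MOVEM AREG, B+2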

Expansion time variable


Expansion time variables (EV's) are variables which can only be used during the
expansion of macro calls. A local EV is created for use only during a particular macro
call. A global EV exists across all macro calls situated in a program and can be used in
any macro which has a declaration for it. Local and global EV's are created through
declaration statements with the following syntax:
LCL <EV specification> [, <EV specification> ..]
GBL <EV specification> [, <EV specification> ..]
<EV specification> has the syntax &<EV name>, where <EV name> is an ordinary
string.
Values of EV's can be manipulated through the preprocessor statement SET.
A SET statement is written as:
< EV specification > SET <SET-expression>
where <EV specification> appears in the label field and SET in the mnemonic field.
A SET statement assigns value of <SET-expression> to the EV specified in <EV
specification>.

Example

MACRO
CONSTANTS
        LCL   &A
&A      SET   1
        DB    &A
&A      SET   &A+1
        DB    &A
MEND

The local EV A is created. The first SET statement assigns the value '1' to it. The first
DB statement thus declares a byte constant ‘1’. The second SET statement assigns the
value '2' to A and the second DB statement declares a constant '2'.

Attributes of the formal parameter


An attribute is written using the syntax
<attribute name> ’ <formal parameter spec>
It represents information about the value of the formal parameter, i.e. about the
corresponding actual parameter. The type, length, and size attributes have the names
T, L, and S.
Example
MACRO
DCL_CONST &A
AIF (L'&A EQ 1).NEXT
--
.NEXT
--
MEND
Here, expansion time control is transferred to the statement having .NEXT in its label field only if the actual parameter corresponding to the formal parameter &A has a length of 1.

6.7 Design of Macro Preprocessor


Macro preprocessors are vital for processing programs that contain macro definitions and/or calls. Language translators such as assemblers and compilers cannot directly generate target code from programs containing macro definitions and calls. Therefore, such programs are first passed through a macro preprocessor. A macro preprocessor accepts an assembly program with macro definitions and calls as its input and processes it into an equivalent expanded assembly program with no macro definitions and calls. The output program of the macro preprocessor is then passed to an assembler to generate the target object program.

The general design schematic of a macro preprocessor is shown below:

Program with Macro Definitions and Calls (Input Program)
        → Macro Preprocessor
        → Expanded Program (a program without macro definitions and calls)
        → Assembler
        → Object Program

The design of a macro preprocessor is influenced by the provisions for performing the
following tasks involved in macro expansion:
• Recognize macro calls: A table is maintained to store names of all macros
defined in a program. Such a table is called Macro Name Table (MNT) in which
an entry is made for every macro definition being processed. During processing
program statements, a match is done to compare strings in the mnemonic field
with entries in the MNT. A successful match in the MNT indicates that the
statement is a macro call.
• Determine the values of formal parameters: A table called Actual Parameter
Table (APT) holds the values of formal parameters during the expansion of a
macro call. The entry into this table will be in pair of the form (<formal
parameter name>, <value>). A table called Parameter Default Table (PDT)
contains information about default parameters stored as pairs of the form
(<formal parameter name>, <default value>) for each macro defined in the
program. If the programmer does not specify the value for any or some
parameters, its corresponding default value is copied from PDT to APT.
• Maintain the values of expansion time variables declared in a macro: A table called Expansion time Variable Table (EVT) maintains information about expansion variables in the form (<EV name>, <value>). It is used when a preprocessor statement or a model statement refers to an EV during expansion.
• Organize expansion time control flow: A table called Macro Definition Table (MDT) is used to store the body of a macro. The flow of control determines which model statement from the MDT is to be visited next during macro expansion. A MEC (Macro Expansion Counter) is defined and initialized to the first statement of the macro body in the MDT, and is updated after the expansion of each model statement.
• Determine the values of sequencing symbols: A table called Sequencing Symbols
Table (SST) maintains information about sequencing symbols in pairs of the form
(<sequencing symbol name>, <MDT entry #>)
Where <MDT entry #> denotes the index of the MDT entry containing the model
statement with the sequencing symbol. Entries are made on encountering a
statement with the sequencing symbol in their label field or on reading a
reference before its definition.
• Perform expansion of a model statement: The expansion task has the following
steps:
• MEC points to the entry in the MDT table with the model statements.
• APT and EVT provide the values of the formal parameters and EVs,
respectively.
• SST enables identifying the model statement and defining sequencing.

6.8 Design of Macro Assembler


A macro processor is functionally independent of the assembler, and the output of the macro processor is a part of the input to the assembler. A macro processor, similar to an assembler, scans and processes statements. Often, the use of a separate macro processor for handling macro instructions leads to less efficient program translation, because many functions are duplicated by the assembler and the macro processor. To overcome these efficiency issues and avoid duplicate work, the macro processor is generally implemented within pass 1 of the assembler. The integration of the macro processor and the assembler is often referred to as a macro assembler. Such implementations help in eliminating the overheads of creating intermediate files, thus improving performance by combining similar functions.

The advantages of a macro assembler are as follows:


• It ensures that many functions need not be implemented twice.
• It results in fewer overheads, because many functions are combined and intermediate (temporary) files need not be created.
• It offers more flexibility in programming and allows the use of all assembler
features in combination with macros.
The disadvantages of a macro assembler are as follows:
• The resulting pass by combining macro processing and pass 1 of the assembler
may be too large and sometimes suffer from core memory problems.
• The combination of macro processing (pass 0) and pass 1 of the assembler may sometimes increase the complexity of program translation, which is not desired.

6.9 Functions of Macro Processor


The design and operation of a macro processor greatly influence the activities performed
by it. In general, a macro processor will perform the following tasks:
• Identifies macro definitions and calls in the program.
• Determines formal parameters and their values.
• Keeps track of the values of expansion time variables and sequencing symbols
declared in a macro.
• Handles expansion time control flow and performs expansion of model
statements.
6.9.1 Basic Tasks of Macro Processor
Macro processing involves the following two separate steps for handling macros in a
program:

1. Handling Macro Definition


In general, a macro in a program can have only one definition, but it can be called
(expanded) many times. To handle macro definitions, the macro assembler
maintains tables called Macro Name Table (MNT) and Macro Definition Table
(MDT). MNT is used to maintain a list of macros names defined in the program,
while MDT contains the actual definition statements (the body of macro). Macro
definition handling starts with a MACRO directive in the statement. The assembler
continually reads the definition from the source file and saves it in the MDT. During
this, the assembler, in most cases, will just save the definition as it is in the MDT and
not try to assemble it or execute it. On encountering the MACRO directive in the
source assembly program, the assembler changes from the regular mode to a special
macro definition mode, wherein it does the following activities:
• Analyzes the available space in the MDT.
• Continually reads statements and writes them to the MDT until the MEND directive is found.
When a MEND directive is encountered in the source file, the assembler reverts to
the normal mode. If the MEND directive is missing, the assembler will stay in the
macro definition mode and continue to save program statements in the MDT until
an obvious error occurs. This will happen in cases such as reading another MACRO
directive or an END statement, where the assembler will generate an error
(runaway definition) and abort the assembly process.
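
A sketch of this definition-mode handling in Python is given below, assuming MACRO appears on a line of its own followed by the prototype line. The table layouts are simplified: the MNT maps a macro name to an index into the MDT, and the MDT is a flat list of lines.

def handle_definitions(source_lines):
    mnt, mdt, output = {}, [], []
    i = 0
    while i < len(source_lines):
        fields = source_lines[i].split()
        if fields and fields[0] == "MACRO":          # enter definition mode
            i += 1                                   # prototype line follows
            if i == len(source_lines):
                raise SyntaxError("runaway definition: missing MEND")
            mnt[source_lines[i].split()[0]] = len(mdt)
            while "MEND" not in source_lines[i]:
                if source_lines[i].strip() in ("MACRO", "END"):
                    raise SyntaxError("runaway definition: missing MEND")
                mdt.append(source_lines[i])
                i += 1
                if i == len(source_lines):
                    raise SyntaxError("runaway definition: missing MEND")
            mdt.append("MEND")                       # back to normal mode
        else:
            output.append(source_lines[i])           # normal mode: copy as is
        i += 1
    return mnt, mdt, output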

2. Handling Macro Expansion


When the assembler encounters a source statement that is not an instruction or a directive, it does not flag an error immediately. The assembler instead searches the MNT and MDT for a macro with that name in the opcode field and, on locating a valid entry for it, changes its mode of operation from normal mode to macro expansion mode. The succession of tasks performed in the macro expansion mode is as follows:
• Reading back a source statement from the MDT entry.
• Writing the statement on the new source listing files unless it is a pass-0
directive (such as a call or definition of another macro); in that case, it is
executed immediately.
• Repeating the previous two steps until the end of the macro is located in the MDT.

6.10 Design Issues of Macro Processor


The most highlighted key issues of a macro processing design are as follows:
• Flexible data structures and databases: They should maintain several data
structures to keep track of locations, nesting structures, values of formal and
positional parameters, and other important information concerning the source
program.
• Attributes of macro arguments: Macro arguments used for expansion have
attributes. These attributes include count, type, length, integer, scaling, and
number attributes. Attributes can be used to make decisions each time a macro is
expanded. Note that attributes are unknown at the time of macro definition and
are known only when the macro is expanded.
• Default arguments: Many assemblers allow the use of default arguments. This means that when the actual argument that binds a formal argument is null in a certain expansion, the argument will be bound to the default value specified in the definition.
• Numeric values of arguments: Although most macro processors treat arguments as strings, some assemblers, like the VAX assembler, optionally allow using the value, rather than the name, of an argument.
• Comments in macros: Comments are printed with macro definition, but they
might or might not be with each expansion. Some comments are meant only for
definitions, while some are expected in the expanded code.
6.10.1 Design Features of Macro Processor

The features of a macro processor design are as follows:


• Associating macro parameters with their arguments: All macro processors
support associating macro parameters by position, name, and numeric position in
the argument list.
• Delimiting macro parameters: Macro processors use specially defined characters
such as delimiters or a scheme where parameters can be delimited in a general
way. Characters like ';' and ‘.’ are used in many macro processor implementations.
• Directives related to arguments: Modern macro processors support arguments
that ease the task of writing sophisticated macros. The directive IF-ELSE-ENDIF
helps decide whether an argument is blank or not, or whether identical or different
arguments are used.
• Automatic label generation: Directives like IRP and PRINT provide facilities to
work with labels. A pair of IRP directives defines a sequence of lines directing the
assembler to repeatedly duplicate and assemble the sequence as many times as
determined by a compound parameter. The PRINT directive suppresses listing of
macro expansions or turns on such a listing.
• Machine-independent features: They include concatenation of macro parameters, generation of unique labels, conditional macro expansion, and keyword macro parameters.
6.10.2 Macro Processor Design Options
1. Recursive Macro Expansion
For efficient preprocessing and expansion, preprocessors use a language that supports recursion and allows the use of variables and data structures for recursive expansions. Support for handling global expansion variables together with local variables is highly desirable.

2. General-purpose Macro Processors


Modern macro processors are more generic and are not restricted to any specific language; facilities to define, call, and process macros are provided by macro languages. Although macros are mostly used in assembly language, they are used with higher-level languages as well, which requires learning a different macro language for programming macros. Although desirable, such a general macro facility is hard to achieve, since each language follows its own way of doing tasks.

3. Macro Processing within Language Translators


All macro definitions must be processed, symbols resolved, and calls expanded before a program is handed to a language translator such as an assembler. Macro processing is thus naturally a preprocessing step for language translation, and integrating macro processing within the translation activity is highly desirable.

4. Line-by-Line Macro Processor


Design options that avoid making an extra pass over the source program are another interesting aspect to look into. Performing macro processing line by line enables the sharing of data structures, utility functions, and procedures, and supports diagnostic messages. A line-by-line macro processor reads the source program, processes macro definitions, expands macro calls, and finally transfers output lines to translators such as assemblers or compilers.

6.11 One-pass Macro Processors


A one-pass macro processor is another design option available for macro processing.
The restriction in working with one-pass macro processors is that they strictly require
the definition of a macro to appear always before any statements that invoke that
macro in the program. The important data structures required in a one-pass macro
processor are:
• DEFTAB (Definition Table): It is a definition table that is used to store the macro
definition including macro prototype and macro body. Comment lines are not
included here, and references to the parameters use the positional notation for
efficiency in substituting arguments.
• NAMTAB (Name Table): This table is used for storing macro names. It serves as an index to DEFTAB and maintains pointers to the beginning and end of each macro definition in DEFTAB.
• ARGTAB (Argument Table): It maintains arguments according to their positions
in the argument list. During expansion, the arguments from this table are
substituted for the corresponding parameters in the macro body.

The one-pass macro processor scheme can be summarized as follows: GETLINE reads the next line and passes it to PROCESSLINE; a macro definition is handled by DEFINE, which fills NAMTAB and DEFTAB, while a macro call is handled by EXPAND, which uses DEFTAB and ARGTAB.
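
Under these assumptions the DEFINE and EXPAND actions can be sketched in Python as follows. The positional notation ?1, ?2, ... stored in DEFTAB follows the description above; the prototype format and the limit of at most nine parameters are simplifications of the sketch.

DEFTAB, NAMTAB = [], {}

def define(prototype, body):
    # store the definition, replacing parameters with ?1, ?2, ...
    name, params = prototype[0], prototype[1:]
    start = len(DEFTAB)
    # substitute longer names first so a parameter name that is a prefix
    # of another is handled safely
    subs = sorted(enumerate(params, start=1), key=lambda kp: -len(kp[1]))
    for line in body:
        for k, p in subs:
            line = line.replace(p, "?%d" % k)
        DEFTAB.append(line)
    NAMTAB[name] = (start, len(DEFTAB) - 1)   # begin and end pointers

def expand(name, args):
    # ARGTAB keeps the arguments by position; ?n picks the n-th argument
    argtab = {"?%d" % k: a for k, a in enumerate(args, start=1)}
    begin, end = NAMTAB[name]
    out = []
    for line in DEFTAB[begin:end + 1]:
        for pos, actual in argtab.items():
            line = line.replace(pos, actual)
        out.append(line)
    return out

define(["INCR", "&MEM_VAL", "&INC_VAL", "&REG"],
       ["MOVER &REG, &MEM_VAL", "ADD &REG, &INC_VAL", "MOVEM &REG, &MEM_VAL"])
print(expand("INCR", ["A", "B", "AREG"]))
# ['MOVER AREG, A', 'ADD AREG, B', 'MOVEM AREG, A']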

6.12 Design of Two-pass Macro Preprocessor

Pass 0 of Assembler
The activities of the pass-0 macro processor are given in the following steps:
1. Read and examine the next source statement.
2. If it is a MACRO statement, continue reading the source and copy the entire macro definition to the MDT. Go to Step 1.
3. If the statement is a pass-0 directive, execute it and go to Step 1. (These directives are written to the new source file in a unique manner, different from normal directives; they are only needed for the listing in pass 2.)
4. If the statement contains a macro name, perform expansion: read model statements from the MDT corresponding to the call, substitute parameters, and write each statement to the new source file (or execute it if it is a pass-0 directive). Go to Step 1.
5. For any other statement, write the statement to the new source file. Go to Step 1.
6. If the current statement contains the END directive, stop (end of pass 0).

The assembler will be in one of the three modes:


• In the normal mode, the assembler reads statement lines from the source file and writes them to the new source file; there is no translation or any change in the statements.
• In the macro definition mode, the assembler continuously copies the source file to the MDT.
• In the macro expansion mode, the assembler reads statements from the MDT, substitutes parameters, and writes them to the new source file. Nested macros can be implemented using the combined Definition and Expansion (DE) mode.

Pass 1 of Macro Processor - Processing Macro Definitions


1. Initialize the MDTC and the MNTC.
2. Read the next source statement of the program.
3. If the statement contains a MACRO pseudo-op, go to Step 6.
4. Output the instruction of the program.
5. If the statement contains an END pseudo-op, go to Pass 2; else go to Step 2.
6. Read the next source statement of the program.
7. Make an entry of the macro name and the MDTC into the MNT at location MNTC, and increment the MNTC by 1.
8. Prepare the parameter (argument) list array.
9. Enter the macro name into the MDT and increment the MDTC by 1.
10. Read the next statement and substitute indices for the parameters (arguments).
11. Enter the statement into the MDT and increment the MDTC by 1.
12. If a MEND pseudo-op is found, go to Step 2; else go to Step 10.

Pass 2 of Macro Processor - Processing for Calls and Expansion of Macro


1. Read the next source statement copied by pass 1.
2. Search the MNT for a match with the operation code.
3. If the operation code is a macro name, go to Step 6.
4. Write the statement to the expanded source file.
5. If an END pseudo-op is found, pass the entire expanded code to the assembler for assembling and stop; else go to Step 1.
6. Set the MDTP to the MDT index from the MNT entry.
7. Prepare the parameter (argument) list array.
8. Increment the MDTP by 1.
9. Read the statement from the MDT at the MDTP and substitute the actual parameters (arguments) from the macro call.
10. If the statement contains a MEND pseudo-op, go to Step 1; else write the expanded source code and go to Step 8.

Exercise
Short Questions

1. What are macro processors?


2. Describe the macro processor operation.
3. Discuss the design issues of a macro processor.
4. Describe the design of a macro assembler.

Long Questions

1. Differentiate between a macro and a subroutine.


2. Discuss the types of formal parameters.
3. What are advanced macro facilities? Describe the design of a macro preprocessor.
4. Differentiate between one-pass and two-pass macro processors.

Chapter 7
Introduction to Compilers

7.1 Introduction to Compiler


7.2 Binding and Binding Times
7.2.1 Introduction to Binding
7.2.2 Importance of Binding Times
7.3 Memory Allocation
7.3.1 Static Memory Allocation
7.3.2 Dynamic Memory Allocation
7.3.3 Memory Allocation in Block-Structured Language
7.3.4 Dynamic Pointer
7.3.5 Static pointer
7.4 Compilation of Expression
7.4.1 Operand Descriptor
7.4.2 Register Descriptors
7.5 Intermediate Code for the Expression
7.5.1 Quadruple Representation
7.5.2 Triples Representation
7.5.3 Indirect Triples Representation
7.6 Code Optimization
7.7 Overview of Interpretation
7.7.1 Comparison between Compilers and Interpreters
7.7.2 Comparing the Performance of Compilers and Interpreters
7.7.3 Benefits of Interpretation
Exercise

7.1 Introduction to Compiler
A compiler is commonly known as a translator that transforms a high-level language into machine-oriented language. A high-level language program is written by a developer, while machine language can be understood by the processor. The compiler is also used to report errors to the programmer. The main purpose of the compiler is to translate code written in one language into another without changing the meaning of the program. When you execute a program written in an HLL programming language, execution proceeds in two parts. In the first part, the source program is compiled and translated into an object program (low-level language). In the second part, the object program is translated into the target program through the assembler.

Execution process of a source program in a compiler:

Source Program → Compiler → Object Program
Object Program → Assembler → Target Program

Two aspects of the compilation are:


a) Generate code to implement the meaning of a source program in the execution domain (target code generation).
b) Provide diagnostics for violations of PL semantics in a program (error reporting).
There are four issues involved in implementing these aspects
1. Data types: the semantics of a data type require the compiler to ensure that variables of a type are assigned or manipulated only through legal operations; the compiler must generate type-specific code to implement an operation.
2. Data structures: to compile a reference to an element of a data structure, the compiler must develop a memory mapping to access the memory word(s) allocated to the element.
3. Scope rules: the compiler performs operations called scope analysis and name resolution to determine the data item designated by the use of a name in the source program.
4. Control structure: the control structure of a language includes conditional transfer of control, conditional execution, iteration control, and procedure calls. The compiler must ensure that the source program does not violate the semantics of control structures.
7.2 Binding and Binding Times
7.2.1 Introduction to Binding
A binding is the association of an attribute of a program entity with a value. The binding of an attribute may be performed at any convenient time, subject to the condition that the value of the attribute should be known when the attribute is referenced. Binding time is the time at which a binding is performed.

The following binding times arise in compilers:


1) Language definition time of a programming language L, which is the time at
which features of the language are specified.
2) Language implementation time of a programming language L, which is the
time at which the design of a language translator for L is finalized.
3) Compilation time of a program P.
4) Execution init time of a procedure proc.
5) The execution time of a procedure proc.

7.2.2 Importance of Binding Times


The binding time of an entity's attributes determines how a language processor can
handle the use of the entity in the program. A compiler can tailor the code
generated to access an entity if a relevant binding was performed before or during
compilation time. However, such tailoring is not possible if the binding is
performed later than compilation time. In that case the compiler has to generate general-purpose code that finds information about the relevant binding during execution and uses it to access the entity appropriately. This affects the execution efficiency of the target program.

7.3 Memory Allocation


Three important tasks of memory allocation are:
1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of data items.
3. Determine appropriate memory mappings to access the values in a non-scalar data item, e.g., the values in an array.

Memory allocation is mainly divided into two types:
1. Static memory allocation
2. Dynamic memory allocation

7.3.1 Static Memory Allocation


In static memory allocation, memory is allocated to a variable before the execution
of a program begins. Static memory allocation is typically performed during
compilation. No memory allocation or de-allocation actions are performed during
the execution of a program. Thus, variables remain permanently allocated.

7.3.2 Dynamic Memory Allocation


In dynamic memory allocation, memory bindings are established and destroyed during the execution of a program. Dynamic memory allocation has two flavors: automatic allocation and program-controlled allocation. In automatic dynamic allocation, memory is allocated to the variables declared in a program unit when the program unit is entered during execution and is de-allocated when the program unit is exited. Thus, the same memory area may be used for the variables of different program units. In program-controlled dynamic allocation, a program can allocate or de-allocate memory at arbitrary points during its execution. Clearly, in both automatic and program-controlled allocation, the address of the memory area allocated to a program unit cannot be determined at compilation time.

7.3.3 Memory Allocation in Block-Structured Language


A block is a sequence of statements containing local data and declarations, which are enclosed within delimiters.
Example:
A
{
Statements
…..
}

The delimiters mark the beginning and the end of the block. Blocks can be nested; for example, block B2 can be completely defined within block B1. A block-structured language uses dynamic memory allocation. Finding the scope of a variable means checking its visibility within the block.

Following are the rules used to determine the scope of the variable:
1. A variable X is accessible within block B1 if it can be accessed by any statement situated in block B1.
2. A variable X declared in block B1 can be accessed by any statement in block B2 if block B2 is situated within block B1.

There are two types of variable situated in the block-structured language


1. Local variable
2. Non-local variable

To understand local and non-local variable consider the following example

Procedure A
{
int x, y, z
Procedure B
{
int a, b
}
Procedure C
{
int m, n
}
}

Table 7.3.3 block-structured


Procedure Local variables Non-local variables
A x, y, z
B a, b x, y, z
C m, n x, y, z

Variables x, y, and z are local to procedure A, but they are non-local to blocks B and C, because these variables are not defined locally within B and C yet are accessible within these blocks. Automatic dynamic allocation is implemented using the extended stack model. Each record in the stack has two reserved pointers rather than one. Each stack record contains the variables for one activation of a block and is called an activation record (AR).

7.3.4 Dynamic Pointer
The first reserved pointer in a block's AR points to the activation record of its dynamic parent. This is referred to as the dynamic pointer and has the address 0(ARB). The dynamic pointer plays a vital role in de-allocating an AR.

Figure 7.3.4 Dynamic Pointer

7.3.5 Static Pointer


Accesses to non-local variables are implemented using the second reserved pointer
in AR. This pointer which has the address 1 (ARB) is called the static pointer.

Activation Record
The activation record could be a block of memory used for managing statistics
wished by way of a single execution of a procedure.

Return value
Actual parameters
Control link
Access link
Saved machine status
Local variables
Temporaries

Table 7.3.5 Activation Record

1. Temporary variables: Such types of variables are needed during the evaluation
of expressions. These types of variables are stored in the temporary field of the
activation record.
2. Local variables: Data that is local to the current execution of the procedure is stored in this field of the activation record.
3. Saved machine registers: Before the procedure is called this field holds the
information regarding the status of the machine. This field also contains the
registers and program counter.
4. Control link: It’s an optional field which points to the activation record of the
calling procedure. This link is alternatively known as a dynamic link.
5. Access link: This field is also optional. It refers to the non-local data in another
activation record. This field is also called a static link field.
6. Actual parameters: This field holds information about the actual parameters.
These actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.
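
As an illustration, the fields listed above can be modeled in Python, with the control (dynamic) link and the access (static) link held as references to other activation records; a non-local variable is found by following the access links. The names below are illustrative and not part of any particular runtime.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ActivationRecord:
    return_value: object = None
    actual_parameters: list = field(default_factory=list)
    control_link: Optional["ActivationRecord"] = None  # dynamic link (caller)
    access_link: Optional["ActivationRecord"] = None   # static link (enclosing block)
    saved_machine_status: dict = field(default_factory=dict)
    local_variables: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)

def lookup(ar, name):
    # local lookup first, then follow the static (access) link outward
    while ar is not None:
        if name in ar.local_variables:
            return ar.local_variables[name]
        ar = ar.access_link
    raise NameError(name)

# AR for procedure A, then an AR for the nested procedure B
a_rec = ActivationRecord(local_variables={"x": 1, "y": 2, "z": 3})
b_rec = ActivationRecord(control_link=a_rec, access_link=a_rec,
                         local_variables={"a": 10, "b": 20})
print(lookup(b_rec, "x"))   # 1 -- x is non-local to B, found via the access link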

7.4 Compilation of Expression

7.4.1 Operand Descriptor


An operand descriptor has the following fields:
1. Attributes: Contains the subfield's type, length, and miscellaneous
information.
2. Addressability: Specifies where the operand is located, and how it can be
accessed.
It has two subfields:
• Addressability code: Takes the values 'M' (operand is in memory) and 'R' (operand is in a register). Other addressability codes, e.g., address in a register ('AR') and address in memory ('AM'), are also possible.
• Address: Address of a CPU register or memory word.

Example: a * b

MOVER AREG, A
MULT  AREG, B

Three operand descriptors are used during code generation. Assuming a and b to be integers occupying 1 memory word each, these are:

#    Attributes    Addressability

1    (int, 1)      Address(a)
2    (int, 1)      Address(b)
3    (int, 1)      Address(AREG)

7.4.2 Register Descriptors


A register descriptor has two fields
1. Status: Contains the code-free or occupied to indicate register status.
2. Operand descriptor #: If status=occupied, this field contains the descriptor for
the operand contained in the register.
Register descriptors are stored in an array called Register_descriptor. One register
descriptor exists for each CPU register. In the above example, the register
descriptor for AREG after generating code for a*b would be Occupied #3. This
indicates that register AREG contains the operand described by descriptor #3.
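
A small Python sketch of the two kinds of descriptors during code generation for a * b is given below, under the assumptions above; the table layouts are illustrative.

operand_descriptors = []       # descriptor #1, #2, ... (1-indexed)
register_descriptor = {"AREG": {"status": "free", "operand": None}}

def new_descriptor(attributes, addressability_code, address):
    operand_descriptors.append({"attributes": attributes,
                                "code": addressability_code,
                                "address": address})
    return len(operand_descriptors)        # the descriptor number

# descriptors for a and b: integers of length 1, resident in memory ('M')
d_a = new_descriptor(("int", 1), "M", "address(a)")
d_b = new_descriptor(("int", 1), "M", "address(b)")

print("MOVER AREG, A")
print("MULT  AREG, B")

# after the code, the partial result a*b lives in AREG ('R')
d_res = new_descriptor(("int", 1), "R", "AREG")
register_descriptor["AREG"] = {"status": "occupied", "operand": d_res}
print(register_descriptor["AREG"])         # {'status': 'occupied', 'operand': 3}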

7.5 Intermediate Code for the Expression

There are two types of intermediate representation


1. Postfix notation
2. Three address code.

1) Postfix notation
• Postfix notation is a linearized representation of a syntax tree.
• It is a list of the nodes of the tree in which a node appears immediately after its children.
• The postfix notation of x = -a*b + -a*b is x a - b * a - b * + = (with - denoting unary minus).

2) Three address code
• In the three-address code format, at most three addresses are used to represent a statement.
• The general form of a three-address code statement is a := b op c,
• where a, b, and c are operands that can be names or constants.
• For an expression like a = b + c + d, the three-address code will be t1 = b + c, t2 = t1 + d, and a = t2.
• Here t1 and t2 are temporary names generated by the compiler. At most three addresses are allowed; hence, this representation is called three-address code.
• There are three representations used for three-address code: quadruples, triples, and indirect triples.

7.5.1 Quadruple Representation


A quadruple is a structure with at most four fields: op, arg1, arg2, and result. The op field is used to represent the internal code for the operator, arg1 and arg2 represent the two operands used, and the result field is used to store the result of the expression.
Consider the input statement x := -a * b + -a * b

Statement            Op       Arg1   Arg2   Result

t1 := -a        (0)  uminus   a             t1
t2 := t1 * b    (1)  *        t1     b      t2
t3 := -a        (2)  uminus   a             t3
t4 := t3 * b    (3)  *        t3     b      t4
t5 := t2 + t4   (4)  +        t2     t4     t5
x := t5         (5)  :=       t5            x
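
The quadruples in this table can be produced by a postorder walk of the expression tree. In the sketch below the walk is written out by hand instead of being driven by a parser, which is enough to show how each emitted quadruple yields a new temporary:

quads, temp_count = [], 0

def emit(op, arg1, arg2=""):
    # each quadruple is (op, arg1, arg2, result); result is a fresh temporary
    global temp_count
    temp_count += 1
    result = "t%d" % temp_count
    quads.append((op, arg1, arg2, result))
    return result

# x := -a * b + -a * b, evaluated bottom-up
t1 = emit("uminus", "a")
t2 = emit("*", t1, "b")
t3 = emit("uminus", "a")
t4 = emit("*", t3, "b")
t5 = emit("+", t2, t4)
quads.append((":=", t5, "", "x"))
for i, q in enumerate(quads):
    print(i, q)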

7.5.2 Triples Representation


In the triple representation, the use of temporary variables is avoided by referring to the numbers (positions) of earlier triples; pointers into the symbol table are used for programmer-defined names. For the expression x := -a * b + -a * b, the triple representation is given below:

Number Op Arg1 Arg2


(0) uminus a
(1) * (0) b
(2) uminus a
(3) * (2) b
(4) + (1) (3)
(5) := X (4)

7.5.3 Indirect Triples Representation
In the indirect triple representation, a listing of triples is maintained, and pointers to this listing are used in place of the statements themselves.

Number Op Arg1 Arg2 Statement

(0) uminus a (0) (11)


(1) * (11) b (1) (12)
(2) uminus a (2) (13)
(3) * (13) b (3) (14)
(4) + (12) (14) (4) (15)

(5) := X (15) (5) (16)

7.6 Code Optimization

a) Compile Time Evaluation

Compile-time evaluation means shifting computations from run time to compile time.
There are two methods used to obtain compile-time evaluation.
1. Folding
In the folding technique, a computation involving only constants is done at compile time instead of at run time.
Example: length = (22/7) * d

Here folding is implied by performing the computation of 22/7 at compile time.


2. Constant propagation
In this technique, the known value of a variable is substituted, and the computation of the expression is done at compile time.
Example: pi = 3.14; r = 5;
area = pi * r * r
Here, at compile time, the value of pi is replaced by 3.14 and r by 5, and the computation of 3.14 * 5 * 5 is done during compilation.
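
Both techniques can be sketched as small transformations over an expression tree. The nested-tuple AST used here is purely illustrative:

def fold(node):
    # node: a constant, a variable name, or (op, left, right)
    if isinstance(node, tuple):
        op, l, r = node
        l, r = fold(l), fold(r)
        if isinstance(l, (int, float)) and isinstance(r, (int, float)):
            # both operands are constants: evaluate at compile time
            return {"+": l + r, "-": l - r, "*": l * r, "/": l / r}[op]
        return (op, l, r)
    return node

def propagate(node, env):
    # replace variables whose values are known constants at compile time
    if isinstance(node, tuple):
        op, l, r = node
        return (op, propagate(l, env), propagate(r, env))
    return env.get(node, node)

# area = pi * r * r, with pi = 3.14 and r = 5 known at compile time
tree = ("*", ("*", "pi", "r"), "r")
print(fold(propagate(tree, {"pi": 3.14, "r": 5})))   # about 78.5, computed at compile time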

b) Common Sub Expression Elimination
A common subexpression is an expression that appears repeatedly in the program and has been computed previously. If the operands of this subexpression do not change, the previously computed result is used instead of recomputing it each time.
Example:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t4 := 4 * i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using common subexpression elimination:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t5 := n
t6 := b[t1] + t5
The common subexpression t4 := 4 * i is eliminated, as its value is already computed in t1 and the value of i has not changed between its definition and use.

c) Loop invariant computation (Frequency reduction)


Loop invariant optimization is obtained by moving some code outside the loop and placing it just before entering the loop. This method is also called code motion.
Example:
while (i <= max-1)
{
sum = sum + a[i];
}
can be optimized as:
N = max - 1;
while (i <= N)
{
sum = sum + a[i];
}

d) Strength Reduction
The strength of certain operators is higher than that of others. For instance, the strength of * is higher than that of +. In this technique, higher-strength operators are replaced by lower-strength operators.
Example:
for (i = 1; i <= 50; i++)
{
count = i * 7;
}

Here we get count values of 7, 14, 21, and so on, for i up to 50.
This code can be replaced using strength reduction as follows:
temp = 7;
for (i = 1; i <= 50; i++)
{
count = temp;
temp = temp + 7;
}

e) Dead Code Elimination


A variable is said to be live at a point in a program if its value is used later on; on the other hand, a variable is said to be dead at a point in a program if the value contained in it is never used afterwards. Code that only computes dead variables, or that can never be reached, is said to be dead code, and optimization can be performed by eliminating such dead code.

Example :
i=0;
if(i==1)
{
a=x+5;
}

The if statement is dead code, as its condition will never be satisfied; hence, the statement can be eliminated and the program optimized.

7.7 Overview of Interpretation


An interpreter is system software that translates a given High-Level Language
(HLL) program into a low-level one, but it differs from compilers. Interpretation is
a real-time activity where an interpreter takes the program, one statement at a time,
and translates each line before executing it.

7.7.1 Comparison between Compilers and Interpreters


Compilers: Compilers are language processors based on the translation-linking-loading model.
Interpreters: Interpreters are a class of language processors based on the interpretation model.

Compilers: Generate a target output program as output, which can be run independently of the source program written in the source language.
Interpreters: Do not produce an output program; rather, they evaluate the source program each time it is to be executed.

Compilers: Program execution is separated from compilation and is performed only after the whole output program is created.
Interpreters: Program execution is a part of the interpretation and is performed on a statement-by-statement basis.

Compilers: The target program executes independently and does not need the presence of the compiler in memory.
Interpreters: The interpreter exists in memory during interpretation, i.e., it coexists with the source program to be interpreted.

Compilers: Do not produce an output program for execution if there is an error in any of the source program statements.
Interpreters: Can evaluate and execute program statements until an error is found.

Compilers: Recompilation is needed to generate a fresh output program in the target language after every modification of the source program.
Interpreters: The interpreter is free of the program modification problem, because it processes the source program afresh on every execution.

Compilers: Compilers are convenient for the production environment.
Interpreters: Interpreters are suited to the program development environment.

Compilers: Compilers are bound to a particular target machine and cannot be ported.
Interpreters: Interpreters can be made portable by carefully writing them in a higher-level language.

7.7.2 Comparing the Performance of Compilers and Interpreters

• The comparative performance of a compiler and an interpreter can be gauged by inspecting the average CPU time cost for the different kinds of processing of a statement.
• Let ti, tc, and te be the CPU time to interpret a statement, to compile a statement, and to execute a compiled statement, respectively.
• It is assumed that tc ≈ ti, since both compilers and interpreters involve lexical, syntax, and semantic analyses of a source statement, and the code generation effort for a statement performed by the compiler is of the same order of magnitude as the effort involved in interpreting the statement.

• If a 400-statement program is executed on test data with only 80 statements being visited during the test run, the total CPU time for compilation followed by execution of the program is 400 * tc + 80 * te, while the total CPU time for interpretation of the program is 80 * ti ≈ 80 * tc. This shows that interpretation is cheaper in such cases.
• However, if many more statements are to be executed, compilation followed by execution becomes cheaper; roughly, using an interpreter remains advantageous up to the execution of about 400 statements. This indicates that, from the viewpoint of CPU time cost, interpreters are a better choice at least for the program development environment.
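
The arithmetic above can be checked with a few lines of Python; the unit costs are assumed values, chosen only so that te is much smaller than tc = ti:

tc = ti = 1.0        # assume compiling and interpreting a statement cost the same
te = 0.05            # executing a compiled statement is much cheaper (assumed)
program_size = 400   # statements in the program

def compile_and_run(statements_executed):
    return program_size * tc + statements_executed * te

def interpret(statements_executed):
    return statements_executed * ti

for n in (80, 400, 1000):
    print(n, compile_and_run(n), interpret(n))
# 80   404.0   80.0  -> interpretation is cheaper for short test runs
# 400  420.0  400.0  -> still slightly cheaper
# 1000 450.0 1000.0  -> compilation wins once enough statements execute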

7.7.3 Benefits of Interpretation


The distinguishing benefits of interpretation are as follows:
• Executes the source code directly. The interpreter translates the source code into some efficient intermediate code (IC) and immediately executes it. Execution can be performed in a single stage, without the need for a compilation stage.
• Handles certain language features that cannot be compiled.
• Ensures portability, since it does not produce a machine language program.
• Suited to a development environment where a program is modified frequently. This means the alteration of code can be performed dynamically.
• Suited to debugging of the code, and facilitates interactive code development.

Exercise
Short Questions

1. What is a compiler?
2. Explain operand and register descriptors.
3. Write about the intermediate code for expressions.
4. What is quadruple representation?
5. Explain indirect triples representation.
6. Compare compilers and interpreters.
7. Write about the benefits of interpretation.

Long Questions

1. What is binding? Explain binding times.


2. Discuss memory allocation and its types.
3. What is memory allocation in block-structured languages?
4. Differentiate between the dynamic pointer and the static pointer.

Chapter 8
Programming Language Grammars
8.1 Programming Language
8.2 Programming Language Grammars
8.3 Classification of Grammar
8.4 Operator Grammars
8.4.1 Ambiguous Grammar
8.4.2 Scanning
8.5 Parsing
8.5.1 Bottom-Up Parsing
8.5.2 Shift Reduce Parsing
8.5.3 Operator Precedence Parsing
8.6 Operator Precedence Parsing Algorithm using Stack
8.7 Language Processor Development Tools
Exercise

8.1 Programming Language
A programming language is defined by its syntax and its semantics. The syntax gives the structure of programs, and the semantics gives the meaning of every construct of the programming language. The semantics of a programming language can be described in many different ways, but there is essentially one standard formalism for describing its syntax: context-free grammars.
Overview of Grammar
A program is normally represented as a linear sequence of ASCII characters. Parsing, guided by a grammar, transforms a syntactically valid program into a syntax tree. This tree is the main data structure through which a compiler or interpreter processes the program: by traversing the tree the compiler can produce machine code or type-check the program, for instance, and by traversing this very tree the interpreter can simulate the execution of the program.

8.2 Programming Language Grammars


Terminal symbol: A symbol in the alphabet is known as a terminal symbol.
Alphabet: The alphabet of a language L is the collection of graphic symbols, such as letters and punctuation marks, used in L. It is denoted by the Greek symbol Σ, e.g., Σ = {a, b, …, z, 0, 1, …, 9}.

String: A string is a finite sequence of symbols. It is denoted by Greek symbols α,


β, etc.
Nonterminal Symbol: A nonterminal symbol is the name of the syntax category
of a language, e.g., noun, verb, etc. A nonterminal symbol is written as a single
capital letter, or as a name enclosed between < … >, e.g., A or <Noun>. During
grammatical analysis, the nonterminal symbol represents an instance of the
category.
Production: A production, also called a rewriting rule, is a rule of the grammar. It has the form
nonterminal symbol → string of terminal and nonterminal symbols
where the notation '→' stands for 'is defined as'.
Example
<Noun Phrase> → <Article><Noun>
Grammar: A grammar G of a language LG is a quadruple (Σ, SNT, S, P), where Σ is the alphabet of LG (i.e., the set of terminal symbols), SNT is the set of nonterminal symbols, S is the distinguished symbol (start symbol), and P is the set of productions.
Example
<Noun Phrase> → <Article><Noun>
<Article> → a | an | the
<Noun> → boy | apple

Derivation: Let production P1 of grammar G be of the form P1: A → α, and let β be a string containing A. Then replacement of A by α in the string β constitutes a derivation according to production P1.
Example:
<Noun Phrase> ⇒ <Article><Noun>
              ⇒ the <Noun>
              ⇒ the boy

Reduction: Let production P1 of grammar G be of the form P1: A → α, and let σ be a string containing α. Then replacement of α by A in the string σ constitutes a reduction according to production P1.
Step String
0 the boy
1 <Article>boy
2 <Article><Noun>
3 <Noun Phrase>

Parse trees: The tree representation of the sequence of derivations that produces a string from the distinguished (start) symbol is termed a parse tree.

Example:

<Noun Phrase>
    <Article>   <Noun>
      the        boy

110
8.3 Classification of Grammar
Type-0 grammars
These grammars are known as phrase structure grammars. Their productions are
of the form

α→β

Where both α and β can be strings of the terminal and nonterminal symbols. Such
productions permit the arbitrary substitution of strings during derivation or
reduction, hence they are not relevant to the specification of programming
languages.

Type-1 grammars
Productions of Type-1 grammars specify that derivation or reduction of strings
can take place only in specific contexts. Hence these grammars are also known as
context-sensitive grammars. A production of a Type-1 grammar has the form

αAβ → απβ

Here, the nonterminal A can be replaced by the string π (or vice versa) only when it is enclosed by the strings α and β in a sentential form. These grammars are also not relevant for programming language specification, since recognition of programming language constructs is not context-sensitive.

Type-2 grammars
These grammars do not impose any context requirements on derivations or
reductions. A typical Type-2 production is of the form

A→π

This can be applied independently of its context. These grammars are therefore
known as context-free grammars (CFG). CFGs are ideally suited for programming
language specifications.

Type-3 grammars
Type-3 grammars are characterized by productions of the form

A → tB|t or A → Bt|t

Note that these productions also satisfy the requirements of Type-2 grammars. The specific form of the RHS alternatives, namely a single terminal symbol or a string containing a single terminal symbol and a single nonterminal symbol, gives some practical advantages in scanning. However, the nature of the productions restricts the expressive power of these grammars; e.g., nesting of constructs or matching of parentheses cannot be specified using such productions. Hence the use of Type-3 productions is restricted to the specification of lexical units, e.g., identifiers, constants, labels, etc. Type-3 grammars are also known as linear grammars or regular grammars.

8.4 Operator Grammars


Productions of an operator grammar do not contain two or more consecutive nonterminal symbols in any RHS alternative. Thus, nonterminal symbols occurring in an RHS string are separated by one or more terminal symbols.

Example

<exp> → <exp> + <term> | <term>


<term> → <term> * <factor> | <factor>
<factor> → <factor> ↑ <primary> | <primary>
<primary> → <id> | <const> | (<exp>)
<id> → <letter> | <id><letter> | <id><digit>
<const> → [+ | -] <digit> | <const><digit>
<letter> → a|b|c|…|z
<digit> → 0|1|2|3|4|5|6|7|8|9

8.4.1 Ambiguous Grammar


A grammar is ambiguous if a string can be interpreted in two or more ways by
using it. In natural languages, ambiguity may concern the meaning of a word, the
syntax category of a word, or the syntactic structure of a construct.
For example, a word can have multiple meanings, or it can be both a noun and a verb, and a sentence can have more than one syntactic structure. In a formal language grammar, ambiguity arises if identical strings can occur on the RHS of two or more productions. For example, if a grammar has the productions

N1 → α
N2 → α

the string α can be derived from, or reduced to, either N1 or N2.
Ambiguity at the level of the syntactic structure of a string means that more than one parse tree can be built for the string.
Example:
<exp> → <id> | <exp> + <exp> | <exp> * <exp>
<id> → a | b | c

Two parse trees can be built for the source string a + b * c according to this grammar: one in which a + b is first reduced to <exp>, and another in which b * c is first reduced to <exp>.

To eliminate the ambiguity, the above grammar can be rewritten as


<exp> → <exp> + <term> | <term>
<term> → <term> * <id> | <id>
<id> → a | b | c
8.4.2 Scanning
Finite State Automaton (FSA): A finite state automaton is a triple (S, Σ, T), where S is a finite set of states, one of which is the initial state s_init and one or more of which are final states; Σ is the alphabet of source symbols; and T is a finite set of state transitions defining transitions out of the states in S on encountering symbols in Σ.

Deterministic finite state automaton (DFA): A DFA is a finite state automaton in which no state has two or more transitions for the same source symbol. The DFA therefore has the property that it reaches a unique state for every source string input to it.

Regular expression: A regular expression is a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e., "find and replace"-like operations.
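
As an illustration, here is a minimal Java sketch (an assumed example, not from the text) of a DFA that recognizes identifiers of the form letter (letter | digit)*, i.e., the lexical units described by Type-3 productions such as <id> → <letter> | <id><letter> | <id><digit>. State 0 is the initial state, state 1 is the final state, and -1 represents a dead state:

public class IdentifierDfa {
    // Returns true if 'input' is accepted, i.e., the DFA reaches its unique
    // final state after consuming the whole source string.
    public static boolean accepts(String input) {
        int state = 0;                                          // initial state s_init
        for (char c : input.toCharArray()) {
            switch (state) {
                case 0:  state = Character.isLetter(c) ? 1 : -1;        break;
                case 1:  state = Character.isLetterOrDigit(c) ? 1 : -1; break;
                default: return false;                          // dead state: no transition
            }
        }
        return state == 1;                                      // accept only in the final state
    }

    public static void main(String[] args) {
        System.out.println(accepts("count1"));                  // true
        System.out.println(accepts("1count"));                  // false
    }
}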

8.5 Parsing
LL(1) parsing is top-down parsing without recursion. In the name LL(1), the first L indicates that the input is scanned from left to right, the second L indicates that the parser produces a leftmost derivation of the input string, and the 1 indicates that one input symbol of lookahead is used to predict the parsing action. An LL(1) parser uses an input buffer, a stack, and a parsing table as its data structures.

The parsing program determines its action from two symbols: the symbol on top of the stack and the current input symbol. Since the parser consults the LL(1) parsing table each time it takes a parsing action, this method is also called the table-driven parsing method. The input is successfully parsed if the parser reaches the halting configuration: when the stack is empty and the next token is $, the parse has succeeded.

Steps to construct LL (1) parser


1. Remove left recursion / perform left factoring.
2. Compute FIRST and FOLLOW of non-terminals.

3. Construct a predictive parsing table.
4. Parse the input string with the help of the parsing table.

Example:
E→E+T/T
T→T*F/F
F→(E)/id

Step 1: Remove left recursion


E→TE’

E’→+TE’ | ϵ
T→FT’
T’→*FT’ | ϵ
F→(E) | id

Step 2: Compute FIRST & FOLLOW

        FIRST      FOLLOW
E       {(, id}    {$, )}
E'      {+, ϵ}     {$, )}
T       {(, id}    {+, $, )}
T'      {*, ϵ}     {+, $, )}
F       {(, id}    {*, +, $, )}

Table 8.5 (a) First & Follow set


Step 3: Predictive Parsing

        id         +           *            (          )         $

E       E→TE'                               E→TE'

E'                 E'→+TE'                             E'→ϵ      E'→ϵ

T       T→FT'                               T→FT'

T'                 T'→ϵ       T'→*FT'                  T'→ϵ      T'→ϵ

F       F→id                                F→(E)

Table 8.5 (b) Predictive parsing tables

Step 4: Parse the string

Stack       Input         Action

$E          id+id*id$
$E'T        id+id*id$     E → TE'
$E'T'F      id+id*id$     T → FT'
$E'T'id     id+id*id$     F → id
$E'T'       +id*id$
$E'         +id*id$       T' → ϵ
$E'T+       +id*id$       E' → +TE'
$E'T        id*id$
$E'T'F      id*id$        T → FT'
$E'T'id     id*id$        F → id
$E'T'       *id$
$E'T'F*     *id$          T' → *FT'
$E'T'F      id$
$E'T'id     id$           F → id
$E'T'       $
$E'         $             T' → ϵ
$           $             E' → ϵ

Table 8.5 (c) Moves made by the predictive parser
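
The moves in Table 8.5 (c) can be reproduced mechanically. Below is a minimal Java sketch (an assumed illustration, not from the text) of the table-driven LL(1) parsing loop for this grammar; the predictive parsing table of Table 8.5 (b) is hard-coded as a map from a "nonterminal:lookahead" pair to the RHS that should be pushed:

import java.util.*;

public class Ll1Parser {
    // Parsing table: entry "X:a" holds the RHS of the production to apply
    // when X is on top of the stack and a is the lookahead token.
    static final Map<String, String[]> TABLE = new HashMap<>();
    static {
        TABLE.put("E:id", new String[]{"T", "E'"});
        TABLE.put("E:(", new String[]{"T", "E'"});
        TABLE.put("E':+", new String[]{"+", "T", "E'"});
        TABLE.put("E':)", new String[]{});                 // E' -> epsilon
        TABLE.put("E':$", new String[]{});                 // E' -> epsilon
        TABLE.put("T:id", new String[]{"F", "T'"});
        TABLE.put("T:(", new String[]{"F", "T'"});
        TABLE.put("T':*", new String[]{"*", "F", "T'"});
        TABLE.put("T':+", new String[]{});                 // T' -> epsilon
        TABLE.put("T':)", new String[]{});
        TABLE.put("T':$", new String[]{});
        TABLE.put("F:id", new String[]{"id"});
        TABLE.put("F:(", new String[]{"(", "E", ")"});
    }

    static boolean isNonterminal(String s) {
        return s.equals("E") || s.equals("E'") || s.equals("T")
                || s.equals("T'") || s.equals("F");
    }

    public static boolean parse(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("E");                                   // start symbol
        int i = 0;
        while (!stack.peek().equals("$")) {
            String top = stack.peek(), lookahead = tokens.get(i);
            if (isNonterminal(top)) {                      // expand using the table
                String[] rhs = TABLE.get(top + ":" + lookahead);
                if (rhs == null) return false;             // blank table entry: error
                stack.pop();
                for (int k = rhs.length - 1; k >= 0; k--) stack.push(rhs[k]);
            } else {                                       // terminal: match and advance
                if (!top.equals(lookahead)) return false;
                stack.pop();
                i++;
            }
        }
        return tokens.get(i).equals("$");                  // halting configuration reached
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("id", "+", "id", "*", "id", "$"))); // true
    }
}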

8.5.1 Bottom-Up Parsing

Handle: A "handle" of a string is a substring of the string that matches the right side of a production, and whose reduction to the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.

Handle pruning: The process of discovering a handle and reducing it to the appropriate left-hand side nonterminal is known as handle pruning.

Right sentential form    Handle    Reducing production
id1 + id2 * id3          id1       E → id
E + id2 * id3            id2       E → id
E + E * id3              id3       E → id
E + E * E                E * E     E → E * E
E + E                    E + E     E → E + E
E
Table 8.5.1 Handle and Handle Pruning

8.5.2 Shift Reduce Parsing

A shift-reduce parser performs the following basic operations:

Shift: Moving a symbol from the input buffer onto the stack.
Reduce: Replacing the handle on the top of the stack by the left-hand side nonterminal of the appropriate production rule.
Accept: The action taken when the input buffer is empty and the stack contains only the start symbol.
Error: The situation in which the parser can neither shift nor reduce the symbols and the accept action is not possible.
Example
Consider the following grammar:
E → E + T | T
T → T * F | F
F → id

Perform shift-reduce parsing for string id + id * id.

Stack Input buffer Action
$ id+id*id$ Shift
$id +id*id$ Reduce F->id
$F +id*id$ Reduce T->F
$T +id*id$ Reduce E->T
$E +id*id$ Shift
$E+ id*id$ Shift
$E+ id *id$ Reduce F->id
$E+F *id$ Reduce T->F
$E+T *id$ Shift
$E+T* id$ Shift
$E+T*id $ Reduce F->id
$E+T*F $ Reduce T->T*F
$E+T $ Reduce E->E+T
$E $ Accept

Table 8.5.2 Configuration of shift-reduce parser on input id + id*id

8.5.3 Operator Precedence Parsing


Operator Grammar: A grammar in which no production has Є (the empty string) on its RHS, and no RHS contains two adjacent nonterminals, is called an operator precedence grammar. Operator precedence parsing defines three disjoint precedence relations, <., .>, and =., between certain pairs of terminals.

Relation    Meaning
a <. b      a "yields precedence to" b
a =. b      a "has the same precedence as" b
a .> b      a "takes precedence over" b
Table 8.5.3 (a) Precedence relations between terminals a and b

Leading: For a nonterminal NT, Leading(NT) is the set of terminals that can appear first (i.e., as the leftmost terminal) in any string derivable from NT.
Trailing: For a nonterminal NT, Trailing(NT) is the set of terminals that can appear last (i.e., as the rightmost terminal) in any string derivable from NT.

Example:
E → E + T | T
T → T * F | F
F → id

Step-1: Find leading and trailing of NT.


Leading              Trailing
(E) = {+, *, id}     (E) = {+, *, id}
(T) = {*, id}        (T) = {*, id}
(F) = {id}           (F) = {id}
Step-2: Establish Relation

1. a <. b
   Op NT      Op <. Leading(NT)
   + T        + <. {*, id}
   * F        * <. {id}

2. a .> b
   NT Op      Trailing(NT) .> Op
   E +        {+, *, id} .> +
   T *        {*, id} .> *

3. $ <. {+, *, id}
4. {+, *, id} .> $

Step-3: Creation of table


        +       *       id      $
+       .>      <.      <.      .>
*       .>      .>      <.      .>
id      .>      .>              .>
$       <.      <.      <.

Step-4: Parse the string <id> + <id> * <id> using the precedence table. The following steps are used to parse the given string:

1. Scan the input string until the first .> is encountered.
2. Scan backward until <. is encountered.
3. The handle is the string between <. and .>.

$ <. id .> + <. id .> * <. id .> $    Handle id is obtained between <. and .>
                                      Reduce by F → id
$ F + <. id .> * <. id .> $           Handle id is obtained between <. and .>
                                      Reduce by F → id
$ F + F * <. id .> $                  Handle id is obtained between <. and .>
                                      Reduce by F → id
$ F + F * F $                         Perform reductions of all appropriate nonterminals
$ E + T * F $                         Remove all nonterminals
$ + * $                               Place relations between the operators
$ <. + <. * .> $                      The * operator is surrounded by <. and .>;
                                      this indicates * becomes the handle, so reduce T * F
$ <. + .> $                           + becomes the handle; reduce E + T
$ $                                   Parsing done

Table 8.5.3 (b) Moves for parsing <id> + <id> * <id>

8.6 Operator Precedence Parsing Algorithm using Stack


Data structures:
Stack: Each stack entry is a record with two fields, operator and operand_pointer.

Node: A node is a record with three fields: symbol, left_pointer, and right_pointer.

Function: newnode(operator, l_operand_pointer, r_operand_pointer) creates a node with appropriate pointer fields and returns a pointer to the node.

(Figure: successive stack configurations (a)–(e) while parsing a + b * c. As each operator is shifted, the operand_pointer fields of the stack entries point to the partial trees built so far; the final configuration (e), reached at the end marker -|, holds the complete tree +(a, *(b, c)).)

1. TOS := SB - 1; ssm := 0;
2. Push '|-' on the stack.
3. ssm := ssm + 1; (i.e., advance to the next source symbol)
4. If the current source symbol is an operand, then
   a. x := newnode(source symbol, null, null); TOS.operand_pointer := x;
   b. Go to step 3;
5. While TOS operator .> current operator,
   a. x := newnode(TOS operator, TOSM.operand_pointer, TOS.operand_pointer), where TOSM denotes the stack entry just below TOS; pop an entry off the stack;
   b. TOS.operand_pointer := x;
6. If TOS operator <. current operator, then push the current operator on the stack and go to step 3;
7. If TOS operator =. current operator, then
   a. if TOS operator = '|-', then exit successfully;
   b. if TOS operator = '(', then temp := TOS.operand_pointer; pop an entry off the stack; TOS.operand_pointer := temp; go to step 3;
8. If no precedence is defined between the TOS operator and the current operator, then report an error and exit unsuccessfully.
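
The following Java sketch (an assumed simplification, not the text's exact algorithm) shows the essence of these steps for operands a–z and the operators + and *: operands are pushed as leaf nodes (step 4), a .> relation triggers a reduction that builds a new node (step 5), and a <. relation shifts the operator (step 6). The markers |- and -| are both modelled by the character '$', and parentheses are omitted:

import java.util.ArrayDeque;
import java.util.Deque;

public class OpPrecedenceParser {
    static class Node {
        char symbol; Node left, right;
        Node(char s, Node l, Node r) { symbol = s; left = l; right = r; }
    }

    // Encodes the precedence table: '$' <. everything, '+' <. '*', and
    // equal-precedence operators relate by .> (left associativity).
    static int prec(char op) { return op == '$' ? 0 : (op == '+' ? 1 : 2); }

    public static Node parse(String input) {
        Deque<Character> operators = new ArrayDeque<>();
        Deque<Node> operands = new ArrayDeque<>();
        operators.push('$');                                // step 2: push the |- marker
        for (char c : (input + "$").toCharArray()) {        // trailing '$' acts as -|
            if (Character.isLetter(c)) {
                operands.push(new Node(c, null, null));     // step 4: leaf for an operand
            } else {
                // step 5: while TOS operator .> current operator, reduce
                while (operators.peek() != '$' && prec(operators.peek()) >= prec(c)) {
                    Node right = operands.pop(), left = operands.pop();
                    operands.push(new Node(operators.pop(), left, right)); // newnode(...)
                }
                if (c != '$') operators.push(c);            // step 6: shift the operator
            }
        }
        return operands.pop();                              // root of the expression tree
    }
}

For the input a+b*c, parse returns the tree +(a, *(b, c)), matching the final stack configuration described in the figure above.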

8.7 Language Processor Development Tools


The two widely used language processor development tools are the lexical analyzer generator LEX and the parser generator YACC. The inputs to these tools are specifications of the lexical and syntactic constructs of a programming language L, together with the semantic actions that should be performed on recognizing the constructs. The figure shows a schematic for developing the analysis phase of a compiler for language L by using LEX and YACC.

Figure 8.7: Using LEX and YACC

LEX
• The input to LEX consists of two components.
• The first component is a specification of strings that represents the
lexical units in L.
• This specification is in the form of regular expressions.
• The second component is a specification of semantic actions that are aimed at building the intermediate representation.
• The intermediate representation produced by a scanner would consist of a set of tables of lexical units and a sequence of tokens for the lexical units occurring in a source statement.
• The scanner generated by LEX would be invoked by a parser
whenever the parser needs the next token.
• Accordingly, each semantic action would perform some table
building actions and return a single token.

YACC
• Each translation rule input to YACC has a string specification that resembles a production of a grammar: it has a nonterminal on the LHS and a few alternatives on the RHS.
• For simplicity, we will refer to a string specification as a production.
YACC generates an LALR (1) parser for language L from the
productions, which is a bottom-up parser.
• The parser would operate as follows: for a shift action, it would invoke the scanner to obtain the next token and continue the parse by using that token. While performing a reduce action according to a production, it would perform the semantic action associated with that production.
• The semantic actions associated with productions achieve the building of an intermediate representation or target code as follows: every nonterminal symbol in the parser has an attribute. The semantic action associated with a production can access the attributes of the symbols used in that production. A symbol '$n' in the semantic action, where n is an integer, designates the attribute of the nth symbol in the RHS of the production, and the symbol '$$' designates the attribute of the LHS nonterminal symbol of the production. The semantic action uses the values of these attributes for building the intermediate representation or target code. The attribute type can be declared in the specification input to YACC.
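
As a minimal, assumed sketch (not from the text), the translation rules below show the flavour of a YACC specification for the expression grammar used earlier in this chapter; mknode is a hypothetical helper that builds a node of the intermediate representation, and ID is a token returned by the scanner:

%token ID
%%
exp  : exp '+' term   { $$ = mknode('+', $1, $3); }   /* $$ = attribute of the LHS */
     | term           { $$ = $1; }
     ;
term : term '*' ID    { $$ = mknode('*', $1, $3); }   /* $1, $3 = attributes of RHS symbols */
     | ID             { $$ = $1; }
     ;
%%

On every reduce action the parser executes the semantic action of the production being applied, so the attribute of each nonterminal ends up designating the subtree built for the string it derives.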

Exercise
Short Questions

1. What is a Programming Language?


2. How are operators used in Grammars?
3. What is Bottom-Up Parsing?
4. Explain Shift Reduce Parsing.
5. How are Language Processor Development Tools used in language processing?

Long Questions

1. Discuss grammar and its classification in Programming Languages.


2. What is parsing? Write about its types.
3. Explain the Operator Precedence Parsing algorithm using a stack.

Chapter 9
Systems Development

9.1 Introduction to Systems Development


9.2 Java Language Environment
9.2.1 Java Virtual Machine
9.3 Types of Errors
9.3.1 Syntax Error
9.3.2 Semantic Error
9.3.3 Logical Error
9.4 Debugging Procedures
9.4.1 Types of debugging procedures
9.5 Classification of Debuggers
9.5.1 Static Debugging
9.5.2 Dynamic/Interactive Debugger
Exercise

9.1 Introduction to Systems Development
Systems development is the process of defining, designing, testing, and implementing a new software application or program. It includes the internal development of customized systems, the creation of database systems, and the acquisition of software developed by third parties. Written standards and procedures must guide all information systems processing functions. The organization's system development life cycle methodology governs the processes of developing, acquiring, implementing, and maintaining computerized information systems and the related technology.

9.2 Java Language Environment


The Java language environment has four key features:
• The Java virtual machine (JVM), which provides portability of Java programs.
• An impure interpretive scheme, whose flexibility is exploited to provide a
capability for the inclusion of program modules dynamically, i.e., during
interpretation.
• A Java bytecode verifier, which provides security by ensuring that
dynamically loaded program modules do not interfere with the operation of
the program and the operating system.
• An optional Java just-in-time (JIT) compiler, which provides efficient
execution.
Figure 9.2 shows a schematic of the Java language environment. The Java
compiler converts a source language program into the Java bytecode, which is a
program in the machine language of the Java virtual machine. The Java virtual
machine is implemented by a software layer on a computer, which is called the
Java virtual machine for simplicity. This scheme provides portability as the Java
bytecode can be 'executed' on any computer that implements the Java virtual
machine.
The Java virtual machine essentially interprets the bytecode form of a
program. The Java compiler and the Java virtual machine thus implement the
impure interpretation scheme. The use of an interpretive scheme allows certain
elements of a program to be specified during interpretation. This feature is
exploited to provide a capability for including program modules called Java class
files during the interpretation of a Java program. The class loader is invoked
whenever a new class file is to be dynamically included in the program. The class

loader locates the desired class file and passes it to the Java bytecode verifier.

(Figure 9.2: The Java language environment. The Java compiler translates a Java source program into Java bytecode. The class loader and the Java bytecode verifier pass the bytecode to the Java virtual machine, which interprets it to produce results or report errors. The Java just-in-time compiler provides mixed-mode execution, and the Java native code compiler translates the program entirely into a machine language program.)

The Java bytecode verifier checks whether


• The program forges pointers, thereby potentially accessing invalid data or
performing branches to invalid locations.
• The program violates access restrictions, e.g., by accessing private data.
• The program has type-mismatches whereby it may access data in an invalid
manner.
• The program may have stack overflows or underflows during execution.
The Java language environment provides the two compilation schemes shown in the lower half of Figure 9.2. The Java just-in-time compiler compiles the parts of the Java bytecode that are consuming a significant fraction of the execution time into the machine language of the computer, to improve their execution efficiency. It is implemented using the scheme of dynamic compilation. After the just-in-time compiler has compiled some part of the program, some parts of the Java source program have been converted into machine language while the remainder of the program still exists in bytecode form; hence the Java virtual machine uses a mixed-mode execution approach. The other compilation option uses the Java native code compiler shown in the lower part of Figure 9.2. It simply compiles the complete Java program into the machine language of a computer. This scheme provides fast execution of the Java program; however, it cannot provide any of the benefits of interpretation or just-in-time compilation.

9.2.1 Java Virtual Machine


A Java compiler produces a binary file called a class file that contains the
bytecode for a Java program. The Java virtual machine loads one or more class
files and executes programs contained in them. To achieve it, the JVM requires
the support of the class loader, which locates a required class file, and a bytecode
verifier, which ensures that execution of the bytecode would not cause any
breaches of security. The Java virtual machine is a stack machine. By contrast, a
stack machine performs computations by using the values existing in the top few
entries on a stack and leaving their results on the stack. This arrangement
requires that a program should load the values on which it wishes to operate on
the stack before performing operations on them and should take their results
from the stack.

The stack machine has the following three kinds of operations:


Push operation: This operation has one operand, which is the address of a
memory location. The operation creates a new entry at the top of the stack
and copies the value that is contained in the specified memory location into
this entry.
Pop operation: This operation also has the address of a memory location as
its operand. It performs the converse of the push operation—it copies the
value contained in the entry that is at the top of the stack into the specified
memory location and also deletes that entry from the stack.
n-ary operation: This operation operates on the values existing in the top n
entries of the stack, deletes the top n entries from the stack, and leaves the
result, if any, in the top entry of the stack. Thus, a unary operation operates
only on the value contained in the top entry of the stack, a binary operation
operates on values contained in the top two entries of the stack, etc.
A stack machine can evaluate expressions very efficiently because partial results
need not be stored in memory—they can be simply left on the stack.
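
A minimal Java sketch (an assumed illustration, not from the text) of these three kinds of operations follows; here memory models the addressable locations, and the computation memory[2] = memory[0] + memory[1] is carried out entirely on the stack:

import java.util.ArrayDeque;
import java.util.Deque;

public class StackMachine {
    private final int[] memory = new int[16];
    private final Deque<Integer> stack = new ArrayDeque<>();

    // Push operation: copy the value at 'addr' into a new top-of-stack entry.
    void push(int addr) { stack.push(memory[addr]); }

    // Pop operation: copy the top entry into 'addr' and delete it from the stack.
    void pop(int addr) { memory[addr] = stack.pop(); }

    // A binary (2-ary) operation: consumes the top two entries, leaves the result.
    void add() { stack.push(stack.pop() + stack.pop()); }

    public static void main(String[] args) {
        StackMachine m = new StackMachine();
        m.memory[0] = 2;
        m.memory[1] = 3;
        m.push(0);
        m.push(1);
        m.add();
        m.pop(2);
        System.out.println(m.memory[2]);   // prints 5; the partial result never
    }                                      // needed a memory location of its own
}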

9.3 Types of Errors
9.3.1 Syntax Error
• Syntax errors occur because the syntax of the programming language is not
followed.
• The errors in token formation, missing operators, unbalanced parenthesis,
etc., constitute syntax errors.
• These are generally programmer induced due to mistakes and negligence
while writing a program.
• Syntax errors are detected early, during the compilation process, and prevent the compiler from proceeding to code generation.
• Let us see the syntax errors with Java language in the following examples.
Example 1:
Missing punctuation ("semicolon"):
int age = 50    // note: the semicolon is missing here

9.3.2 Semantic Error


• Semantic errors occur due to improper use of programming language
statements.
• They include operands whose types are incompatible, undeclared variables,
incompatible arguments to function or procedures, etc.
• Semantic errors are mentioned in the following examples.
Example:
Type incompatibility between operands
int msg="hello"; //note the types String and int are incompatible

9.3.3 Logical Error


• Logical errors occur due to the fact that the software specification is not
followed while writing the program. Although the program is successfully
compiled and executed error-free, the desired results are not obtained.
• Let us look into some logical errors with Java language.
Example:
Errors in computation
public static int mul(int a, int b)
{

return a + b ;
} // this method returns an incorrect value with respect to the specification,
// which requires multiplying the two integers
Example:
Non-terminating loops
String str = br.readLine();
while (str != null)
{
System.out.println(str);
} // the loop in the code did not terminate
Logical errors may cause undesirable effects and program behaviors. Sometimes,
these errors remain undetected unless the results are analyzed carefully.

9.4 Debugging Procedures


• Whenever there is a gap between expected output and the actual output of a
program, the program needs to be debugged.
• An error in a program is called a bug, and debugging means finding and
removing the errors present in the program.
• Debugging involves executing the program in a controlled fashion.
• During debugging, the execution of a program can be monitored at every
step.
• In the debug mode, activities such as starting the execution and stopping the
execution are in the hands of the debugger.
• The debugger provides the facility to execute a program up to the specified
instruction by inserting a breakpoint.
• It gives a chance to examine the values assigned to the variables present in
the program at any instant and, if required, offers an opportunity to update
the program.

9.4.1 Types of debugging procedures


Debug Monitors: A debug monitor is a program that monitors the execution of a
program and reports the state of a program during its execution. It may interfere
in the execution process, depending upon the actions carried out by a debugger
(person). In order to initiate the process of debugging, a programmer must
compile the program with the debug option first. This option, along with other

129
information, generates a table that stores the information about the variables
used in a program and their addresses.

Assertions: Assertions are mechanisms used by a debugger to catch the errors at


a stage before the execution of a program. Sometimes, while programming, some
assumptions are made about the data involved in the computation. If these
assumptions went wrong during the execution of the program, it may lead to
erroneous results. For this, a programmer can make use of an assert statement. Assertions are statements used in programs that are always associated with Boolean conditions. If an assert statement evaluates to true, nothing happens; but if it evaluates to false, the execution of the program halts.
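
For instance, here is a minimal Java sketch (assumed, not from the text); Java's assert statement throws an AssertionError, halting execution, when its condition is false and assertions are enabled with the -ea flag:

public class AssertDemo {
    static double average(int sum, int count) {
        assert count > 0 : "count must be positive";  // assumption about the data
        return (double) sum / count;
    }

    public static void main(String[] args) {
        System.out.println(average(10, 2));   // prints 5.0
        System.out.println(average(10, 0));   // AssertionError when run with java -ea
    }
}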
9.5 Classification of Debuggers
9.5.1 Static Debugging
Static debugging focuses on semantic analysis. In a certain program, suppose
there are two variables: var1 and var2. The type of var1 is an integer, and the
type of var2 is a float. Now, the program assigns the value of var2 to var1; then,
there is a possibility that it may not get correctly assigned to the variable due to
truncation. This type of analysis falls under static debugging. Static debugging
detects errors before the actual execution.
Static code analysis may include detection of the following situations:
• Dereferencing of the variable before assigning a value to it
• Truncation of value due to the wrong assignment
• Redeclaration of variables
• Presence of unreachable code
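
The fragment below is a small assumed Java example (not from the text) of code that a static analysis would flag, covering the truncation scenario described above:

public class TruncationDemo {
    public static void main(String[] args) {
        float var2 = 3.9f;
        int var1 = (int) var2;      // flagged: value truncated from 3.9 to 3
        System.out.println(var1);   // prints 3, not 3.9
    }
}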

9.5.2 Dynamic/Interactive Debugger


Dynamic analysis is carried out during program execution. An interactive
debugging system provides programmers with facilities that aid in testing and
debugging programs interactively.
A dynamic debugging system should provide the following facilities:
Execution sequencing: It is nothing but observation and control of the flow of
program execution. For example, the program may be halted after a fixed
number of instructions are executed.

Breakpoints: Breakpoints specify positions within a program up to which the program gets executed without disturbance. Once control reaches such a position, the debugger allows the user to verify the contents of the variables declared in the program.

Conditional expressions: A debugger can include statements in a program to


ensure that certain conditions are reached in the program. These statements,
known as assertions, can be used to check whether some pre-condition or post-
condition has been met in the program during execution.

Tracing: Tracing monitors step by step the execution of all executable statements
present in a program. The other name for this process is "step into". Another
possible variation is "step over" debugging that can be executed at the level of
procedure or function. This can be implemented by adding a breakpoint at the
last executable statement in a program.

Traceback: This gives a user the chance to trace back over the functions, and the
traceback utility uses a stack data structure. Traceback utility should show the
path by which the current statement in the program was reached.

Program-display capabilities: While debugging is in progress, the program


being debugged must be made visible on the screen along with the line numbers.

Multilingual capability: The debugging system must also consider the language
in which the debugging is done. Generally, different programming languages
involve different user environments and applications systems.

Optimization: Sometimes, to make a program efficient, programmers may use an


optimized code. Debugging of such statements can be tricky. However, to
simplify the debugging process, a debugger may use an optimizing compiler that
deals with the following issues:
• Removing invariant expressions from a loop
• Merging similar loops
• Eliminating unnecessary statements
• Removing branch instructions

Exercise
Short Questions

1. Explain Systems Development.


2. What is an error? Discuss logical errors.
3. Differentiate between Static Debugging and Dynamic/Interactive Debugging.
4. Explain the DFA with a transition table.

Long Questions

1. What is the Java Language Environment? Write a short note on the Java


Virtual Machine.
2. Differentiate between Syntax Errors and Semantic Errors.
3. Explain the Debugging Procedures and their classification.

Miscellaneous Problems

Construct a DFA for each of the following regular expressions by first building an NFA and then applying the subset construction (ϵ-closure and Move computations).

1. (0+1)*010(0+1)*
ϵ- closure (0) = {0, 1, 2, 4, 7} A
Move (A, 0) = {3, 8}
ϵ- closure (Move (A, 0)) = {3, 6, 7, 1, 2,4,8} ----- B
Move (A, 1) = {5}
ϵ- closure (Move (A, 1)) = {5, 6, 7, 1,2,4} ------ C

Move (B, 0) = {3, 8}


ϵ- closure (Move (B, 0)) = {3, 6, 7, 1, 2,4,8} ------ B
Move (B, 1) = {5, 9}
ϵ- closure (Move (B, 1)) = {5, 6, 7, 1, 2,4,9} ------ D

(Figure: Thompson-construction NFA for (0+1)*010(0+1)*, with states 0–17 and the ϵ-transitions used in the closure computations.)

Move (C, 0) = {3, 8}


ϵ- closure (Move(C, 0)) = {3, 6, 7, 1, 2,4,8} ------ B
Move (C, 1) = {5}
ϵ- closure (Move(C, 1)) = {5, 6, 7, 1,2,4} ------ C
Move (D, 0) = {3, 8, 10}
ϵ- closure (Move (D, 0)) = {3, 6, 7, 1, 2, 4, 8, 10, 11, 12,14,17} ------E
Move (D, 1) = {5}
ϵ- closure (Move(D, 1)) = {1, 2, 4, 5,6,7}------ C

Move (E, 0) = {3, 8, 13}
ϵ- closure (Move (E, 0)) = {1, 2, 3, 4, 6, 7, 8, 13, 16, 17, 11,12,14} ----- F
Move (E, 1) = {5, 9, 15}
ϵ- closure (Move (E, 1)) = {1, 2, 4, 5, 6, 7, 9, 15, 16, 17, 11,12,14} ----- G

Move (F, 0) = {3, 8, 13}


ϵ- closure (Move (F, 0)) = {1, 2, 3, 4, 6, 7, 8, 13, 16, 17, 11,12,14} ----- F
Move (F, 1) = {5, 9, 15}
ϵ- closure (Move (F, 1)) = {1, 2, 4, 5, 6, 7, 9, 11, 12, 14, 17,15,16} ---- G

Move (G, 0) = {3, 10, 13}


ϵ- closure (Move (G, 0)) = {1, 2, 3, 4, 6, 7, 10, 13, 16, 17, 11,12,14} ------ H
Move (G, 1) = {5, 15}
ϵ- closure (Move (G, 1)) = {1, 2, 4, 5, 6, 7, 15, 16, 17, 11,12,14}------ I

Move (H, 0) = {3, 8, 13}


ϵ- closure (Move (H, 0)) = {1, 2, 3, 4, 6, 7, 8, 13, 16, 17, 11,12, 14}----- F
Move (H, 1) = {5, 15}
ϵ- closure (Move (H, 1)) = {1, 2, 4, 5, 6, 7,15, 16, 17, 11,12,14} ----- I

Move (I, 0) = {3, 8, 13}


ϵ- closure (Move (I, 0)) = {1, 2, 3, 4, 6, 7, 8, 13, 16, 17, 11,12,14} ------ F
Move (I, 1) = {5, 15}
ϵ- closure (Move (I, 1)) = {1, 2, 4, 5, 6, 7, 15, 16, 17, 11,12,14} ------ I

Transition Table:
States 0 1

A B C
B B D
C B C
D E C
E F G
F F G
G H I
H F I
I F I

DFA: (Figure: the resulting DFA, drawn from the transition table above.)
2. (010+00)*(10)*

(Figure: Thompson-construction NFA for (010+00)*(10)*, with states 0–14 and the ϵ-transitions used in the closure computations.)
ϵ- closure (0) = {0, 1, 2, 6, 10,11,14}--------- A

Move (A, 0) = {3,7}


ϵ- closure (Move (A, 0)) ={3,7} -------- B
Move (A, 1) = {12}
ϵ- closure (Move (A, 1))={12} ---------C

Move (B, 0) ={8}

ϵ- closure (Move (B, 0)) = {8, 9, 10, 11, 14, 1,2,6} -------- D
Move (B, 1) = {4}
ϵ- closure (Move (B, 1))={4} -------- E

Move (C, 0) ={13}


ϵ- closure (Move(C, 0)) = {13,11,14}---------F
Move (C, 1) = φ

Move (D, 0) = {3,7}


ϵ- closure (Move (D, 0)) ={3,7} ---------B

Move (D, 1) = {12}


ϵ- closure (Move(D, 1))={12} ---- C

Move (E, 0) ={5}


ϵ- closure (Move (E, 0)) = {5, 9, 1, 2, 6, 10,11,14} ----- G
Move (E, 1) = φ

Move (F, 0) = φ
Move (F, 1) = {12}


ϵ- closure (Move(F, 1))={12} ---- C

Move (G, 0) = {3,7}


ϵ- closure (Move (G,0))={3,7}----- B
Move (G, 1) = {12}
ϵ- closure (Move (G, 1))={12} ---- C

States 0 1

A B C
B D E
C F φ
D B C
E G φ
F φ C
G B C

DFA: (Figure: the resulting DFA, drawn from the transition table above.)

3. (a+b)*a(a+b)
(Figure: Thompson-construction NFA for (a+b)*a(a+b), with states 0–13 and the ϵ-transitions used in the closure computations.)
ϵ- closure (0)= {0,1,2,4,7} ------ A

Move (A, a) ={3,8}


ϵ- closure (Move (A, a))={3,6,7,1,2,4,8,9,11} ---- B
Move (A, b) = {5}

ϵ- closure (Move (A, b))={5,6,7,1,2,4} --- C

Move (B, a) ={3,8,10}


ϵ- closure (Move (B, a))= {3,6,7,1,2,4,8,9,11,10,13} --- D
Move (B, b) = {5,12}
ϵ- closure (Move (B, b))={5,6,7,1,2,4,12,13} --- E

Move (C, a) ={3,8}


ϵ- closure (Move (C, a)) = {3,6,7,1,2,4,8,9,11} --- B
Move (C, b) = {5}
ϵ- closure (Move (C, b))={5,6,7,1,2,4} ---- C

Move (D, a) ={3,8,10}


ϵ- closure (Move (D, a))={3,6,7,1,2,4,8,9,11,10,13} ---- D
Move (D, b) = {5, 12}
ϵ- closure (Move (D, b))= {5,6,7,1,2,4,12,13} --- E

Move (E, a) = {3,8}


ϵ- closure (Move (E, a)) = {3,6,7,1,2,4,8,9,11} --- B
Move (E, b) = {5}
ϵ- closure (Move (E, b)) = {5,6,7,1,2,4} --- C

Transition Table:
States a b
A B C
B D E
C B C
D D E
E B C

DFA: (Figure: the resulting DFA, drawn from the transition table above.)

4. (a+b)*abb(a+b)*

(Figure: Thompson-construction NFA for (a+b)*abb(a+b)*, with states 0–17 and the ϵ-transitions used in the closure computations.)

ϵ- closure (0) = {0, 1, 2,4,7} ------ A

Move (A, a) = {3,8}


ϵ- closure (Move (A, a)) = {3, 6, 1, 2, 4,7,8} ----- B
Move (A, b) = {5}
ϵ- closure (Move (A, b)) = {5, 6, 1, 2,4,7} ---- C

Move (B, a) ={3,8}
ϵ- closure (Move (B, a)) = {3, 6, 1, 2, 4,7,8} ----- B
Move (B, b) = {5, 9}
ϵ- closure (Move (B, b)) = {5, 6, 7, 1, 2,4,9} ---- D

Move (C, a) = {3,8}


ϵ- closure (Move(C, a)) = {3, 6, 1, 2,4,7,8}------B
Move (C, b) = {5}
ϵ- closure (Move(C, b)) = {5, 6, 1, 2,4,7} ----- C

Move (D, a) = {3,8}


ϵ- closure (Move (D, a)) = {3, 6, 1, 2, 4,7,8} -----B
Move (D, b) = {5, 10}
ϵ- closure (Move(D, b)) = {5, 6, 7, 1, 2, 4, 10, 11, 12,14,17} -----E

Move (E, a) = {8, 3,13}


ϵ- closure (Move (E, a)) = {8, 3, 6, 7, 1, 2, 4, 13, 16, 17, 11,12,14} --- F
Move (E, b) = {5, 15}
ϵ- closure (Move (E, b)) = {5, 6, 7, 1, 2, 4, 15, 16, 17, 11,12,14} ---- G

Move (F, a) = {8, 3,13}


ϵ- closure (Move (F, a)) = {8, 3, 6, 7, 1, 2, 4, 13, 16, 17, 11,12,14} ---- F
Move (F, b) = {5, 9, 15}
ϵ- closure (Move (F, b)) = {9, 5, 6, 7, 1, 2, 4, 15, 16, 17, 11,12,14} ----H

Move (G, a) = {8, 3,13}


ϵ- closure (Move (G, a)) = {8, 3, 6, 7, 1, 2, 4, 13, 16, 17, 11,12,14}----- F
Move (G, b) = {5, 15}
ϵ- closure (Move (G, b)) = {5, 6, 7, 1, 2, 4, 15, 16, 17, 11,12,14} ---- G

Move (H, a) = {8, 3,13}


ϵ- closure (Move (H, a)) = {8, 3, 6, 7, 1, 2, 4, 13, 16, 17, 11,12,14} ---- F
Move (H, b) = {10, 5, 15}
ϵ- closure (Move (H, b)) = {10, 11, 12, 14, 17, 5, 6, 7, 1, 2, 4,15,16} --- I

Move (I, a) = {3, 8,13}


ϵ- closure (Move (I, a)) = {8, 3, 6, 7, 1, 2, 4, 13, 16, 17, 11,12,14} ---- F

Move (I, b) = {5, 15}
ϵ- closure (Move (I, b)) = {5, 6, 7, 1, 2, 4, 15, 16, 17, 11,12,14} ----- G

Transition Table:

States a b
A B C
B B D
C B C
D B E
E F G
F F H
G F G
H F I
I F G

DFA: (Figure: the resulting DFA, drawn from the transition table above.)
5. 10(0+1)*1
(Figure: Thompson-construction NFA for 10(0+1)*1, with states 0–10 and the ϵ-transitions used in the closure computations.)

ϵ- closure (0)= {0} ------ A

Move (A, 0) = φ
ϵ- closure (Move (A, 0)) = φ
Move (A, 1) = {1}
ϵ- closure (Move (A, 1)) = {1} ----- B

Move (B, 0) ={2}


ϵ- closure (Move (B, 0)) = {2, 3, 4,6,9} ---- C
Move (B, 1) = φ
ϵ- closure (Move (B, 1)) = φ

Move (C, 0) ={5}


ϵ- closure (Move (C, 0)) = {5, 8, 9, 3,4,6} ----- D
Move (C, 1) = {7, 10}
ϵ- closure (Move (C, 1)) = {3, 4, 6, 7, 8,9,10} --- E

Move (D, 0) ={5}


ϵ- closure (Move (D, 0)) = {5, 8, 9, 3,4,6} ---- D
Move (D, 1) = {7, 10}
ϵ- closure (Move (D, 1)) = {3, 4, 6, 7, 8,9,10} ---- E

Move (E, 0) ={5}


ϵ- closure (Move (E, 0)) = {5, 8, 9, 3,4,6} -----D
Move (E, 1) = {7, 10}
ϵ- closure (Move (E, 1)) = {3, 4, 6, 7, 8,9,10} ---- E

Transition Table:

States 0 1
A φ B
B C φ
C D E
D D E
E D E

DFA: (Figure: the resulting DFA, drawn from the transition table above.)
MCQ Practice Questions
1. In a two pass assembler the object code generation is done during the?
A Second pass
B First pass
C Zeroeth pass
D Not done by assembler
2. Which of the following is not a type of assembler?
A one pass
B two pass
C three pass
D load and go
3. In a two pass assembler, adding literals to literal table and address
resolution of local symbols are done using?
A First pass and second respectively
B Both second pass
C Second pass and first respectively
D Both first pass

4. In a two pass assembler, the pseudo-opcode EQU is to be evaluated during?


A Pass 1
B Pass 2
C not evaluated by the assembler
D None of above
5. A compiler which allows only the modified section of the source code to be
re-compiled is called
A incremental compiler
B re-configurable compiler
C dynamic compiler
D subjective compiler
6. Translator for low level programming language were termed as
A Assembler
B Compiler
C Linker
D Loader
7. An assembler is
A programming language dependent
B syntax dependent
C machine dependent
D data dependent
8. An imperative statement
A Reserves areas of memory and associates names with them
B Indicates an action to be performed during execution of
assembled program
C Indicates an action to be performed during optimization
D None of the above

9. In a two-pass assembler, the task of the Pass II is to


A Separate the symbol, mnemonic opcode and operand fields.
B Build the symbol table.
C Construct intermediate code.
D Synthesize the target program.
10. TII stands for
A Table of incomplete instructions
B Table of information instructions
C Translation of instructions information
D Translation of information instruction
11. Assembler is a machine dependent, because of?
A Macro definition table(MDT)
B Pseudo operation table(POT)
C Argument list array(ALA)
D Mnemonics operation table(MOT)
12. Assembler is ?
A A program that places programs into memory and prepares them
for execution
B A program that automate the translation of assembly language
into machine language
C A program that accepts a program written in high level
language and produces an object program
D Is a program that appears to execute a source program as if it
were machine language
13. Forward reference table(FRT) is arranged like ?
A Stack
B Queue
C Linked list
D Double linked list

14. Assembler is a program that


A places programs into memory and prepares them for execution
B automates the translation of assembly language into machine
language
C accepts a program written in a high level language and produces
an object program
D appears to execute a source program as if it were machine language
15. A single two pass assembler does which of the following in the first pass
A It allocates space for the literals
B It computes the total length of the program
C It builds the symbol table for the symbols and their values
D all of the above
16. Which of the following is not a function of pass1 of an assembler
A generate data
B keep track of LC
C remember literals
D remember values of symbols until pass 2
17. ____ converts the programs written in assembly language into machine
instructions.
A Machine compiler
B Interpreter
C Assembler
D Converter
18. The instructions like MOV or ADD are called as ______.
A OP-Code
B Operators
C Commands
D None of the above
19. Instructions which won’t appear in the object program are called as _____.
A Redundant instructions
B Exceptions
C Comments
D Assembler Directives
20. The assembler stores all the names and their corresponding values in
______ .
A Special purpose Register
B Symbol Table
C Value map Set
D None of the above
21. When dealing with the branching code, the assembler
A Replaces the target with its address
B Does not replace until the test condition is satisfied
C Finds the Branch offset and replaces the Branch target with it
D Replaces the target with the value specified by the DATAWORD
directive
22. The last statement of the source program should be _______.
A Stop
B Return
C OP
D END
23. _____ Directive specifies the end of execution of a program.
A END
B RETURN
C STOP
D Terminate
24. Address symbol table is generated by the
A memory management software
B assembler
C match logic of associative memory
D generated by operating system
25. Which of these features of assembler are Machine-Dependent?
A Instruction formats
B Addressing modes
C Program relocation
D All of the mentioned
26. In a two pass assembler, the pseudo-opcode EQU is to be evaluated during
A pass 1
B pass 2
C not evaluated by the assembler
D none of these
27. The translator used by second generation languages is?
A assembler
B interpreter
C compiler
D linker
28. A simple two-pass assembler does which of the following in the first pass ?
A It allocates space for the literals
B It computes the total length of the program
C It builds the symbol table for the symbols and their values
D All of these
29. In analyzing the compilation of PL/I program the description " creation of
more optimal matrix " is associated with
A syntax analysis
B code generation
C assembly and output
D machine independent optimization
30. Which of the following translation program converts assembly language
programs to object program
A Loader
B Compiler
C Assembler
D Macroprocessor
31. Pass 2 -
A perform processing of assembler directives not done during pass
1
B write the object program and assembly listing
C assemble instruction and generate data
D all of these
32. Pass I -
A save the values assigned to all labels for use in pass 2
B perform some processing of assembler directives
C assign address to all statements in the program
D all of these
33. Which table is permanent databases that has an entry for each terminal
symbol ?
A Literal table
B Identifier table
C Terminal table
D None of these
34. Which of the following system program forgoes the production of object
code to generate absolute machine code and load it into the physical main
storage location from which it will be executed immediately upon
completion of the assembly?
A compiler
B macroprocessor
C two pass assembler
D load-and-go assembler
35. Which of the following is not a feature of compiler ?
A Scans the entire program first and then translate it into machine
code
B When all the syntax errors are removed execution takes place
C slow for debugging
D Execution time is more
36. Yacc semantic action is a sequence of ?
A Tokens
B Expression
C C statement
D Rules
37. Which of the following software tool is parser generator ?
A LEX
B YACC
C Both a and b
D None of these
38. A Lex compiler generates?
A Lex object code
B Transition tables
C C Tokens
D None of above
39. Analysis which determines the meaning of a statement once its
grammatical structure becomes known is termed as
A Semantic analysis
B Syntax analysis
C Regular analysis
D General analysis
40. Storage mapping is done by ?
A Loader
B Linker
C Operating system
D Compiler
41. A compiler is a program that
A places programs into memory and prepares them for execution
B automates the translation of assembly language into machine
language
C accepts a program written in a high level language and produces
an object program
D appears to execute a source program as if it were machine language
42. Recursive descent parsing is an example of
A top down parsers
B bottom up parsers
C predictive parsing
D none of these
43. A compiler that runs on one machine and produces code for a different
machine is called
A cross compilation
B one pass compilation
C two pass compilation
D none of these
44. YACC stands for
A yet accept compiler constructs
B yet accept compiler compiler
C yet another compiler constructs
D yet another compiler compiler
45. An optimizer compiler
A Is optimized to occupy less space
B Is optimized to take less time for execution
C Optimizes the code
D None of these

Answers

1-A, 2-C, 3-D, 4-A, 5-A, 6-A, 7-C, 8-B, 9-D, 10-A, 11-D, 12-B, 13-C, 14-B, 15-D,
16-A, 17-C, 18-A, 19-C, 20-B, 21-C, 22-D, 23-B, 24-B, 25-D, 26-A, 27-A, 28-D, 29-
D, 30-C, 31-D, 32-D, 33-C, 34-D, 35-D, 36-C, 37-B, 38-B, 39-A, 40-D, 41-C, 42-A,
43-A, 44-D, 45-D