System Software
An Introduction to
Systems Programming
Third Edition
Leland L. Beck
San Diego State University
ADDISON-WESLEY
Beck, Leland L.
An introduction to systems programming / Leland L. Beck
p. cm.
Includes index.
ISBN 0-201-42300-6
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical, photo-
copying, recording, or otherwise, without the prior written permission of the publisher.
Printed in the United States of America.
456789 10—MA—009998
To Marla and Kendra
Preface
This text is an introduction to the design and implementation of various types of sys-
tem software. A central theme of the book is the relationship between machine architec-
ture and system software. For example, the design of an assembler or an operating
system is greatly influenced by the architecture of the machine on which it runs. These
influences are emphasized and demonstrated through the discussion of actual pieces of
system software for a variety of real machines.
However, there are also similarities between software for different systems. For ex-
ample, the basic structure and design of an assembler is essentially the same for most
computers. These fundamental machine-independent aspects of software design are
clearly identified and separated from machine-specific details.
This third edition includes all new examples of machine architecture and software.
The principal computer architectures used as examples are Pentium Pro (x86),
UltraSPARC, PowerPC, and Cray T3E. The text discusses assemblers, loaders, compil-
ers, and operating systems for these machines, focusing on the relationship between
machine architecture and software design. There are also discussions of multiprocessor
and distributed operating systems, and systems structured according to the client-
server model.
This edition also includes an introduction to the principles of object-oriented pro-
gramming and design. These concepts are illustrated by considering an object-oriented
design for an assembler. There are also examples of object-oriented operating systems,
and brief discussions of the Java language, compiler, and run-time environment.
A number of other new topics have been added in the third edition. For example,
the introductory chapter contains a brief discussion of RISC architectures. The chapter
on compilers includes new material on finite automata and shift-reduce parsing. Other
revisions and clarifications have been made throughout the text, and there are more
than 40 new exercises.
This book is intended primarily for use as a text in a junior-, senior-, or graduate-
level course in system software or systems programming. It is also suitable for use as a
reference or for independent study. The reader is assumed to be familiar with the as-
sembler language for at least one machine and with common methods for representing
instructions and data within a computer (for example, octal and hexadecimal notations
and 1’s and 2’s complement representation of negative values). It is also assumed that
the reader is familiar with the implementation and use of basic data structures, particu-
larly linked lists and hash tables.
Chapter 1 contains a brief introduction to the book, and gives a description of the
Simplified Instructional Computer (SIC) that is used to present fundamental software
concepts. It also describes the real machines that are used as examples throughout the
text. These machines have contrasting architectures and were chosen as examples in or-
der to illustrate the variety in software and hardware systems.
Chapter 2 describes the design and implementation of assemblers. The basic con-
cepts of program assembly are presented in Section 2.1, using the SIC machine as a
teaching aid. These basic assembler functions and characteristics should remain essen-
tially the same, regardless of what machine is being used. This gives the student a
starting point from which to begin the design of an assembler for a new or unfamiliar
machine. Section 2.2 discusses machine-dependent extensions to the basic structure
presented in Section 2.1; this serves to emphasize the relationship between machine ar-
chitecture and assembler design and implementation. Section 2.3 introduces a number
of machine-independent assembler features, and Section 2.4 discusses some important
alternatives for the overall structure of the assembler. These features and alternatives
are not dictated by machine considerations; they are choices made by the software de-
signer. In such areas, there is no one “right” way of doing things; a software designer
needs to be aware of the available options in order to make intelligent decisions be-
tween them. Finally, Section 2.5 discusses examples of actual assemblers for a variety
of real computers. This provides an illustration of both machine-dependent and ma-
chine-independent variations that reinforces the points made in previous sections of
the chapter.
The same general approach is followed in discussing loaders and linkers (Chapter
3), macro processors (Chapter 4), compilers (Chapter 5), and operating systems
(Chapter 6). The basic features of each type of software are described first, followed by
discussions of machine-dependent and machine-independent extensions to the basic
features. Design alternatives are then discussed and examples of actual pieces of soft-
ware are presented.
The depth of discussion varies considerably from one chapter to another. Chapters
2-4 give reasonably complete discussions of assemblers, linkers and loaders, and macro
processors. Implementation details, such as algorithms and data structures, are also in-
cluded. The student should be able to write a working assembler, loader, or macro
processor from the outline supplied by the text. (I strongly recommend that such a pro-
ject be assigned in a course based on this book.)
Chapters 5 and 6, on the other hand, deal with the much larger subjects of compil-
ers and operating systems. Each of these topics has, by itself, been the subject of many
entire books, so it is obviously impossible to fully discuss either one in a single chapter.
Instead the goal is to give the reader a brief-but-not-superficial overview of compilers
and operating systems. The most important and fundamental concepts of these types of
software are introduced and illustrated with examples. More advanced topics are men-
tioned, and references are given for the reader who wishes to explore these areas fur-
ther. Because of space limitations, most implementation details have been omitted. A
similar approach is followed in Chapter 7, which discusses database management sys-
tems, text editors, and interactive debugging systems.
Chapter 8 contains an introduction to software engineering concepts and tech-
niques. This chapter does not attempt to cover the full scope of software engineering
practice. Instead, it focuses on techniques that might be most useful in designing and
implementing a piece of system software such as an assembler. Both procedural and
object-oriented methods are discussed. The goal of this chapter is to provide the stu-
dent with a set of tools and methods that he or she can use in a software project based
on this book. The presentation of this material is relatively independent of the rest of
the text. Chapter 8 can be read at any time after the introduction to assemblers in
Section 2.1.
The exercises that appear at the end of each major chapter are an important part of
the text. They are intended to stimulate individual thought and class discussion; some
of the questions are open-ended design problems that have no one “right answer.”
Many of the exercises require the reader to apply concepts that have been covered in
the text, extending them to new situations. This ensures that the reader fully under-
stands the principles involved and is able to put them into actual practice. I have pur-
posely not included answers to the exercises because I believe that a set of answers
would have the effect of stifling thought and creativity rather than stimulating it.
This book contains more material than can usually be covered in a one-semester
course. This allows the instructor to place a varying degree of emphasis on different
topics to suit the needs of a specific curriculum. For example, if students will later take
a course that deals solely with operating systems, the instructor may wish to omit
Chapter 6; the remaining chapters include enough material for a typical one-semester
course. Other instructors may prefer to cover all of the major chapters, eliminating
some of the more advanced sections.
One issue that deserves comment is the use of a hypothetical computer (SIC) for in-
structional purposes. I have used the hypothetical machine primarily because it avoids
the problem of dealing with the irrelevant complexities and “quirks” found on most
real computers. It also clearly separates the fundamental concepts of a piece of software
from implementation details associated with a particular computer. If a real machine is
used in teaching, students are often unsure about which software characteristics are
truly fundamental and which are simply consequences of the particular machine used
in the text.
A secondary benefit to using a hypothetical machine is that all students begin on
equal footing. No student is at an unfair disadvantage because he or she happens to be
unfamiliar with the hardware and software system on which the text is based. I have
found this to be particularly important in my courses, which tend to attract students
who have had experience on a variety of computers.
Finally, it should be noted that some of the original reviewers of this text were ini-
tially skeptical about SIC, but changed their opinion after seeing how it could be used
as an instructional aid.
Of course, students in a course of this type need to be able to write and run pro-
grams for the machine being studied. A SIC simulator and a simple SIC assembler are
available for this purpose. This software enables students to develop and run programs
as though they actually had access to a SIC machine. It is not necessary to use any par-
ticular real computer—the simulator and assembler can be run on almost any comput-
ing system that supports Pascal. The SIC support software is available by anonymous
ftp from rohan.sdsu.edu, in the file faculty/beck/SystemSoftware.tar.
Many people have given their time and energy to help make this a better book. The
manuscript for this third edition was reviewed by Donald Gustafson, Texas Tech
University, Donald E. Merusi, The Hartford Graduate Center, Joseph Migga Kizza,
University of Tennessee at Chattanooga, Thomas W. Page, Jr., Ohio State University,
Martina Schollmeyer, Ph.D., Texas A&M University-Corpus Christi, and Violet R.
Syrotiuk, University of Texas at Dallas. The comments and suggestions of these reviewers were
extremely helpful to me in finding errors and other problems in the manuscript. Any
errors which remain are, of course, entirely my responsibility, and I would be very
grateful to any reader for pointing out such errors. Please send comments and sugges-
tions to me at [email protected].
I am indebted to the fine team of professionals at Addison-Wesley who helped
make this third edition a reality. My editor, Susan Hartman, did an excellent job of co-
ordinating the writing and reviewing processes, and guiding the development of the
project. Kathy Manley was superb in her role as production supervisor. I also want to
acknowledge the contributions of many others, including Pat Brown, Julie Dunn, and
Tom Ziolkowski.
I would like to thank Peter Ashford, John Carroll, Sushant Gargya, Bill Morris, Ron
Nash, Kris Stewart, and Felecia Vlahos for their assistance in finding reference material
on specific systems. I am also indebted to the students who used previous versions of
this material as a text and provided many valuable suggestions.
L.L.B.
San Diego, California
Contents

Chapter 1 Background
1.1 Introduction

Chapter 2 Assemblers
2.1.1 A Simple SIC Assembler
2.1.2 Assembler Algorithm and Data Structures
2.2 Machine-Dependent Assembler Features
2.2.1 Instruction Formats and Addressing Modes

Chapter 3 Loaders and Linkers
3.5.3 Cray MPP Linker
Exercises

Chapter 4 Macro Processors
4.4.2 ANSI C Macro Language
4.4.3 The ELENA Macro Processor
Exercises

Chapter 5 Compilers
Exercises

References
Index
Chapter 1
Background
1.1 INTRODUCTION
discussed in the next section; many other examples appear throughout the
text.
One characteristic in which most system software differs from application soft-
ware is machine dependency. An application program is primarily concerned
with the solution of some problem, using the computer as a tool. The focus is
on the application, not on the computing system. System programs, on the
other hand, are intended to support the operation and use of the computer it-
self, rather than any particular application. For this reason, they are usually re-
lated to the architecture of the machine on which they are to run. For example,
assemblers translate mnemonic instructions into machine code; the instruction
formats, addressing modes, etc., are of direct concern in assembler design.
Similarly, compilers must generate machine language code, taking into ac-
count such hardware characteristics as the number and type of registers and
the machine instructions available. Operating systems are directly concerned
with the management of nearly all of the resources of a computing system.
Many other examples of such machine dependencies may be found through-
out this book.
On the other hand, there are some aspects of system software that do not
directly depend upon the type of computing system being supported. For
example, the general design and logic of an assembler is basically the same on
most computers. Some of the code optimization techniques used by compilers
are independent of the target machine (although there are also machine-
dependent optimizations). Likewise, the process of linking together indepen-
dently assembled subprograms does not usually depend on the computer
being used. We will also see many examples of such machine-independent
features in the chapters that follow.
Because most system software is machine-dependent, we must include real
machines and real pieces of software in our study. However, most real com-
puters have certain characteristics that are unusual or even unique. It can be
difficult to distinguish between those features of the software that are truly
fundamental and those that depend solely on the idiosyncrasies of a particular
machine. To avoid this problem, we present the fundamental functions of each
piece of software through discussion of a Simplified Instructional Computer
(SIC). SIC is a hypothetical computer that has been carefully designed to in-
clude the hardware features most often found on real machines, while avoid-
ing unusual or irrelevant complexities. In this way, the central concepts of a
piece of system software can be clearly separated from the implementation de-
tails associated with a particular machine. This approach provides the reader
with a starting point from which to begin the design of system software for a
new or unfamiliar computer.
Each major chapter in this text first introduces the basic functions of
the type of system software being discussed. We then consider machine-
dependent and machine-independent extensions to these functions, and exam-
ples of implementations on actual machines. Specifically, the major chapters
are divided into the following sections:
This chapter contains brief descriptions of SIC and of the real machines
that are used as examples. You are encouraged to read these descriptions now,
and refer to them as necessary when studying the examples in each chapter.
1.3 THE SIMPLIFIED INSTRUCTIONAL COMPUTER (SIC)

Like many other products, SIC comes in two versions: the standard model
and an XE version (XE stands for “extra equipment,” or perhaps “extra expen-
sive”). The two versions have been designed to be upward compatible—that is,
an object program for the standard SIC machine will also execute properly on
a SIC/XE system. (Such upward compatibility is often found on real comput-
ers that are closely related to one another.) Section 1.3.1 summarizes the stan-
dard features of SIC. Section 1.3.2 describes the additional features that are
included in SIC/XE. Section 1.3.3 presents simple examples of SIC and
SIC/XE programming. These examples are intended to help you become more
familiar with the SIC and SIC/XE instruction sets and assembler language.
Practice exercises in SIC and SIC/XE programming can be found at the end of
this chapter.
1.3.1 SIC Machine Architecture

Memory
Memory consists of 8-bit bytes; any 3 consecutive bytes form a word (24 bits).
All addresses on SIC are byte addresses; words are addressed by the location
of their lowest numbered byte. There are a total of 32,768 (2^15) bytes in the
computer memory.
Registers
There are five registers, all of which have special uses. Each register is 24 bits
in length. The following table indicates the numbers, mnemonics, and uses of
these registers. (The numbering scheme has been chosen for compatibility
with the XE version of SIC.)

Mnemonic   Number   Special use
A          0        Accumulator; used for arithmetic operations
X          1        Index register; used for addressing
L          2        Linkage register; the Jump to Subroutine (JSUB)
                    instruction stores the return address in this register
PC         8        Program counter; contains the address of the next
                    instruction to be fetched for execution
SW         9        Status word; contains a variety of information,
                    including a condition code (CC)
Data Formats

Integers are stored as 24-bit binary numbers; 2’s complement representation
is used for negative values. Characters are stored using their 8-bit ASCII
codes. There is no floating-point hardware on the standard version of SIC.
Instruction Formats
All machine instructions on the standard version of SIC have the following
24-bit format:
opcode (8 bits) | x (1 bit) | address (15 bits)
Addressing Modes
There are two addressing modes available, indicated by the setting of the x bit
in the instruction. The following table describes how the target address is calcu-
lated from the address given in the instruction. Parentheses are used to indi-
cate the contents of a register or a memory location. For example, (X)
represents the contents of register X.

Mode      Indication   Target address calculation
Direct    x = 0        TA = address
Indexed   x = 1        TA = address + (X)
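As an illustration (a sketch of my own in Python, not code from the text), a simulator might compute the target address from a 24-bit SIC instruction word as shown below; the bit positions follow the format described above.

    # Assumed layout: opcode (bits 0-7), x flag (bit 8), address (bits 9-23)
    def sic_target_address(word, x_contents):
        x_bit = (word >> 15) & 0x1        # x = 1 selects indexed addressing
        address = word & 0x7FFF           # 15-bit address field
        if x_bit:
            return (address + x_contents) & 0x7FFF   # TA = address + (X)
        return address                    # TA = address (direct)

    # Example: address field 1000 (hex), indexed, (X) = 0090 -> TA = 1090
    print(hex(sic_target_address((1 << 15) | 0x1000, 0x0090)))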
Instruction Set
SIC provides a basic set of instructions that are sufficient for most simple
tasks. These include instructions that load and store registers (LDA, LDX, STA,
STX, etc.), as well as integer arithmetic operations (ADD, SUB, MUL, DIV). All
arithmetic operations involve register A and a word in memory, with the result
being left in the register. There is an instruction (COMP) that compares the
value in register A with a word in memory; this instruction sets a condition code
CC to indicate the result (<, =, or >). Conditional jump instructions (JLT, JEQ,
JGT) can test the setting of CC, and jump accordingly. Two instructions are
provided for subroutine linkage: JSUB jumps to the subroutine, placing the
return address in register L, and RSUB returns by jumping to the address
contained in register L.

Input and Output
On the standard version of SIC, input and output are performed by transfer-
ring 1 byte at a time to or from the rightmost 8 bits of register A. Each device is
assigned a unique 8-bit code. There are three I/O instructions, each of which
specifies the device code as an operand.
The Test Device (TD) instruction tests whether the addressed device is
ready to send or receive a byte of data. The condition code is set to indicate the
result of this test. (A setting of < means the device is ready to send or receive,
and = means the device is not ready.) A program needing to transfer data must
wait until the device is ready, then execute a Read Data (RD) or Write Data
(WD). This sequence must be repeated for each byte of data to be read or writ-
ten. The program shown in Fig. 2.1 (Chapter 2) illustrates this technique for
performing I/O.
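The polling sequence can be sketched as follows (my own Python paraphrase against a hypothetical device object, not code from the text).

    def read_bytes(device, count):
        buffer = bytearray()
        for _ in range(count):
            while not device.ready():      # TD: test until CC is set to '<'
                pass                       # device not ready ('='); test again
            buffer.append(device.read())   # RD: one byte into (register A)
        return bytes(buffer)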
1.3.2 SIC/XE Machine Architecture

Memory
The memory structure for SIC/XE is the same as that previously described for
SIC. However, the maximum memory available on a SIC/XE system is
1 megabyte (2^20 bytes). This increase leads to a change in instruction formats
and addressing modes.
Registers

SIC/XE provides the registers of the standard version, plus the following
additional registers:

Mnemonic   Number   Special use
B          3        Base register; used for addressing
S          4        General working register
T          5        General working register
F          6        Floating-point accumulator (48 bits)
Data Formats
SIC/XE provides the same data formats as the standard version. In addition,
there is a 48-bit floating-point data type with the following format:
s (1 bit) | exponent (11 bits) | fraction (36 bits)
The fraction is interpreted as a value between 0 and 1; that is, the assumed bi-
nary point is immediately before the high-order bit. For normalized floating-
point numbers, the high-order bit of the fraction must be 1. The exponent is
interpreted as an unsigned binary number between 0 and 2047. If the exponent
has value e and the fraction has value f, the absolute value of the number rep-
resented is
f × 2^(e-1024).
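As a worked illustration (my own sketch in Python, not the book's code), the value of a 48-bit floating-point datum can be computed directly from the field layout shown above.

    def sicxe_float_value(bits48):
        s = (bits48 >> 47) & 0x1                 # sign bit
        e = (bits48 >> 36) & 0x7FF               # 11-bit unsigned exponent
        f = (bits48 & 0xFFFFFFFFF) / 2.0 ** 36   # 36-bit fraction, 0 <= f < 1
        return (-1.0) ** s * f * 2.0 ** (e - 1024)

    # Example: s = 0, e = 1025, fraction = .100... (f = 0.5) represents 1.0
    print(sicxe_float_value((1025 << 36) | (1 << 35)))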
Instruction Formats
The larger memory available on SIC/XE means that an address will (in gen-
eral) no longer fit into a 15-bit field; thus the instruction format used on the
standard version of SIC is no longer suitable. There are two possible options—
either use some form of relative addressing, or extend the address field to 20
bits. Both of these options are included in SIC/XE (Formats 3 and 4 in the fol-
lowing description). In addition, SIC/XE provides some instructions that do
not reference memory at all. Formats 1 and 2 in the following description are
used for such instructions.
The new set of instruction formats is as follows. The settings of the flag bits
in Formats 3 and 4 are discussed under Addressing Modes. Bit e is used to dis-
tinguish between Formats 3 and 4 (e = 0 means Format 3, e = 1 means Format
4). Appendix A indicates the format to be used with each machine instruction.
Format 1 (1 byte):

    op (8)

Format 2 (2 bytes):

    op (8) | r1 (4) | r2 (4)

Format 3 (3 bytes):

    op (6) | n | i | x | b | p | e | disp (12)

Format 4 (4 bytes):

    op (6) | n | i | x | b | p | e | address (20)
Addressing Modes
Two new relative addressing modes are available for use with instructions
assembled using Format 3. These are described in the following table:

Mode                       Indication     Target address calculation
Base relative              b = 1, p = 0   TA = (B) + disp   (0 <= disp <= 4095)
Program-counter relative   b = 0, p = 1   TA = (PC) + disp  (-2048 <= disp <= 2047)

For base relative addressing, the displacement field disp in a Format 3
instruction is interpreted as a 12-bit unsigned integer. For program-counter
relative addressing, this field is interpreted as a 12-bit signed integer, with
negative values represented in 2’s complement notation. If bit x is set to 1,
the contents of register X are added in the target address calculation.
Bits i and n in Formats 3 and 4 are used to specify how the target address is
used. If bit i = 1 and n = 0, the target address itself is used as the operand
value; no memory reference is performed. This is called immediate addressing.
If bit i = 0 and n = 1, the word at the location given by the target address is
fetched; the value contained in this word is then taken as the address of the
operand value. This is called indirect addressing. If bits i and n are both 0 or
both 1, the target address is taken as the location of the operand; we will refer
to this as simple addressing. Indexing cannot be used with immediate or indi-
rect addressing modes.
Many authors use the term effective address to denote what we have called
the target address for an instruction. However, there is disagreement concern-
ing the meaning of effective address when referring to an instruction that uses
indirect addressing. To avoid confusion, we use the term target address
throughout this book.
SIC/XE instructions that specify neither immediate nor indirect addressing
are assembled with bits n and i both set to 1. Assemblers for the standard ver-
sion of SIC will, however, set the bits in both of these positions to 0. (This is be-
cause the 8-bit binary codes for all of the SIC instructions end in 00.) All
SIC/XE machines have a special hardware feature designed to provide the up-
ward compatibility mentioned earlier. If bits n and i are both 0, then bits b, p,
and e are considered to be part of the address field of the instruction (rather
than flags indicating addressing modes). This makes Instruction Format 3
identical to the format used on the standard version of SIC, providing the de-
sired compatibility.
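The following sketch (my own Python illustration of the rules above, not the book's code) decodes the flag bits of a 24-bit Format 3 instruction word, including the SIC-compatibility case.

    def decode_format3(word):
        n = (word >> 17) & 1
        i = (word >> 16) & 1
        x = (word >> 15) & 1
        if n == 0 and i == 0:
            # Standard SIC: b, p, and e become part of a 15-bit address field
            return ("SIC", x, word & 0x7FFF)
        b = (word >> 14) & 1
        p = (word >> 13) & 1
        e = (word >> 12) & 1
        disp = word & 0xFFF
        if p == 1 and disp >= 0x800:          # PC-relative: 12-bit signed disp
            disp -= 0x1000
        mode = {(1, 0): "immediate", (0, 1): "indirect"}.get((i, n), "simple")
        return (mode, x, b, p, e, disp)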
Figure 1.1 gives examples of the different addressing modes available on
SIC/XE. Figure 1.1(a) shows the contents of registers B, PC, and X, and of se-
lected memory locations. (All values are given in hexadecimal.) Figure 1.1(b)
gives the machine code for a series of LDA instructions. The target address
generated by each instruction, and the value that is loaded into register A, are
also shown. You should carefully examine these examples, being sure you un-
derstand the different addressing modes illustrated.
For ease of reference, all of the SIC/XE instruction formats and addressing
modes are summarized in Appendix A.
Instruction Set
SIC/XE provides all of the instructions that are available on the standard
version. In addition, there are instructions to load and store the new registers
(LDB, STB, etc.) and to perform floating-point arithmetic operations (ADDF,
[Figure 1.1(a): register and memory contents for the addressing examples, in
hexadecimal. (B) = 006000, (PC) = 003000, (X) = 000090; selected memory
words are shown at locations 3030, 3600, 6390, and C303.]
SUBF, MULF, DIVF). There are also instructions that take their operands from
registers. Besides the RMO (register move) instruction, these include
register-to-register arithmetic operations (ADDR, SUBR, MULR, DIVR). A spe-
cial supervisor call instruction (SVC) is provided. Executing this instruction
generates an interrupt that can be used for communication with the operating
system. (Supervisor calls and interrupts are discussed in Chapter 6.)
There are also several other new instructions. Appendix A gives a complete
list of all SIC/XE instructions, with their operation codes and a specification of
the function performed by each.
Input and Output

The I/O instructions we discussed for SIC are also available on SIC/XE. In ad-
dition, there are I/O channels that can be used to perform input and output
while the CPU is executing other instructions. This allows overlap of comput-
ing and I/O, resulting in more efficient system operation. The instructions
SIO, TIO, and HIO are used to start, test, and halt the operation of I/O chan-
nels. (These concepts are discussed in detail in Chapter 6.)
1.3.3 SIC Programming Examples

This section presents simple examples of SIC and SIC/XE assembler language
programming. These examples are intended to help you become more familiar
with the SIC and SIC/XE instruction sets and assembler language. It is as-
sumed that the reader is already familiar with the assembler language of at
least one machine and with the basic ideas involved in assembly-level pro-
gramming.
The primary subject of this book is systems programming, not assembler
language programming. The following chapters contain discussions of various
types of system software, and in some cases SIC programs are used to illus-
trate the points being made. This section contains material that may help you
to understand these examples more easily. However, it does not contain any
new material on system software or systems programming. Thus, this section
can be skipped without any loss of continuity.
Figure 1.2 contains examples of data movement operations for SIC and
SIC/XE. There are no memory-to-memory move instructions; thus, all data
movement must be done using registers. Figure 1.2(a) shows two examples of
data movement. In the first, a 3-byte word is moved by loading it into register
A and then storing the register at the desired destination. Exactly the same
thing could be accomplished using register X (and the instructions LDX, STX)
or register L (LDL, STL). In the second example, a single byte of data is moved
using the instructions LDCH (Load Character) and STCH (Store Character).
The assembler directive RESW reserves one or
more words of storage for use by the program. For example, the RESW state-
ment in Fig. 1.2(a) defines one word of storage labeled ALPHA, which will be
used to hold a value generated by the program.
The statements BYTE and RESB perform similar storage-definition func-
tions for data items that are characters instead of words. Thus in Fig. 1.2(a)
CHARZ is a 1-byte data item whose value is initialized to the character “Z”,
and C1 is a 1-byte variable with no initial value.
C1        RESB   1        ONE-BYTE VARIABLE

(a)

          LDA    #5       LOAD VALUE 5 INTO REGISTER A
          STA    ALPHA    STORE IN ALPHA
          LDA    #90      LOAD ASCII CODE FOR 'Z' INTO REG A
          STCH   C1       STORE IN CHARACTER VARIABLE C1

ALPHA     RESW   1        ONE-WORD VARIABLE
C1        RESB   1        ONE-BYTE VARIABLE

(b)

Figure 1.2 Sample data movement operations for (a) SIC and
(b) SIC/XE.
The instructions shown in Fig. 1.2(a) would also work on SIC/XE; how-
ever, they would not take advantage of the more advanced hardware features
available. Figure 1.2(b) shows the same two data-movement operations as
they might be written for SIC/XE. In this example, the value 5 is loaded into
register A using immediate addressing. The operand field for this instruction
contains the flag # (which specifies immediate addressing) and the data value
to be loaded. Similarly, the character “Z” is placed into register A by using im-
mediate addressing to load the value 90, which is the decimal value of the
ASCII code that is used internally to represent the character “Z”.
Figure 1.3(a) shows examples of arithmetic instructions for SIC. All arith-
metic operations are performed using register A, with the result being left in
register A. Thus this sequence of instructions stores the value (ALPHA + INCR
—1) in BETA and the value (GAMMA + INCR - 1) in DELTA.
Figure 1.3(b) illustrates how the same calculations could be performed on
SIC/XE. The value of INCR is loaded into register S initially, and the register-
to-register instruction ADDR is used to add this value to register A when it is
needed. This avoids having to fetch INCR from memory each time it is used in
a calculation, which may make the program more efficient. Immediate ad-
dressing is used for the constant 1 in the subtraction operations.
Looping and indexing operations are illustrated in Fig. 1.4. Figure 1.4(a)
shows a loop that copies one 11-byte character string to another. The index
register (register X) is initialized to zero before the loop begins. Thus, during
the first execution of the loop, the target address for the LDCH instruction will
be the address of the first byte of STR1. Similarly, the STCH instruction will
store the character being copied into the first byte of STR2. The next instruc-
tion, TIX, performs two functions. First it adds 1 to the value in register X, and
then it compares the new value of register X to the value of the operand (in
this case, the constant value 11). The condition code is set to indicate the result
of this comparison. The JLT instruction jumps if the condition code is set to
“less than.” Thus, the JLT causes a jump back to the beginning of the loop if
the new value in register X is less than 11.
During the second execution of the loop, register X will contain the value
1. Thus, the target address for the LDCH instruction will be the second byte of
STR1, and the target address for the STCH instruction will be the second byte
of STR2. The TIX instruction will again add 1 to the value in register X, and the
loop will continue in this way until all 11 bytes have been copied from STR1 to
STR2. Notice that after the TIX instruction is executed, the value in register X
is equal to the number of bytes that have already been copied.
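The effect of this loop can be paraphrased in Python as follows (a sketch of the semantics only; the 11-byte string value is an assumption).

    STR1 = bytearray(b"TEST STRING")   # 11-byte source (value assumed)
    STR2 = bytearray(11)               # 11-byte destination area
    x = 0                              # index register X initialized to zero
    while True:
        STR2[x] = STR1[x]              # LDCH STR1,X then STCH STR2,X
        x += 1                         # TIX: add 1 to X, then compare with 11
        if not x < 11:                 # JLT loops while the comparison is '<'
            break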
Figure 1.4(b) shows the same loop as it might be written for SIC/XE. The
main difference is that the instruction TIXR is used in place of TIX. TIXR
works exactly like TIX, except that the value used for comparison is taken
from another register (in this case, register T), not from memory. This makes
the loop more efficient, because the value does not have to be fetched from
memory each time the loop is executed. Immediate addressing is used to ini-
tialize register T to the value 11 and to initialize register X to 0.
[Figure 1.4 Sample looping and indexing operations for (a) SIC and
(b) SIC/XE.]
          STA    INDEX

(a)

          LDS    #3       INITIALIZE REGISTER S TO 3
          LDT    #300     INITIALIZE REGISTER T TO 300
          LDX    #0       INITIALIZE INDEX REGISTER TO 0
ADDLP     LDA    ALPHA,X  LOAD WORD FROM ALPHA INTO REGISTER A
          ADD    BETA,X   ADD WORD FROM BETA
          STA    GAMMA,X  STORE THE RESULT IN A WORD IN GAMMA
          ADDR   S,X      ADD 3 TO INDEX VALUE
          COMPR  X,T      COMPARE NEW INDEX VALUE TO 300
          JLT    ADDLP    LOOP IF INDEX VALUE IS LESS THAN 300

(b)

Figure 1.5 Sample indexing and looping operations for (a) SIC and
(b) SIC/XE.
If the device is ready to transmit data, the condition code is set to “less than”;
if the device is not ready, the condition code is set to “equal.” As Fig. 1.6
illustrates, the program must execute the TD instruction and then check the
condition code by using a conditional jump. If the condition code is “equal”
(device not ready), the program jumps back to the TD instruction. This two-
instruction loop will continue until the device becomes ready; then the RD will
be executed.
Output is performed in the same way. First the program uses TD to check
whether the output device is ready to receive a byte of data. Then the byte to
be written is loaded into the rightmost byte of register A, and the WD (Write
Data) instruction is used to transmit it to the device.
Figure 1.7 shows how these instructions can be used to read a 100-byte
record from an input device into memory. The read operation in this example
is placed in a subroutine. This subroutine is called from the main program by
using the JSUB (Jump to Subroutine) instruction. At the end of the subroutine
there is an RSUB (Return from Subroutine) instruction, which returns control
to the instruction that follows the JSUB.
The READ subroutine itself consists of a loop. Each execution of this loop
reads 1 byte of data from the input device, using the same techniques illus-
trated in Fig. 1.6. The bytes of data that are read are stored in a 100-byte buffer
area labeled RECORD. The indexing and looping techniques that are used in
storing characters in this buffer are essentially the same as those illustrated in
Fig. 1.4(a).
Figure 1.7(b) shows the same READ subroutine as it might be written for
SIC/XE. The main differences from Fig. 1.7(a) are the use of immediate
addressing and the TIXR instruction, as was illustrated in Fig. 1.4(b).
[Figure 1.7 Sample subroutine call and record input operations for
(a) SIC and (b) SIC/XE.]
1.4 TRADITIONAL (CISC) MACHINES
This section introduces the architectures of two of the machines that will be
used as examples later in the text. Section 1.4.1 describes the VAX architecture,
and Section 1.4.2 describes the architecture of the Intel x86 family of proces-
sors.
1.4.1 VAX Architecture

Memory
The VAX memory consists of 8-bit bytes. All addresses used are byte ad-
dresses. Two consecutive bytes form a word; four bytes form a longword; eight
bytes form a quadword; sixteen bytes form an octaword. Some operations are
more efficient when operands are aligned in a particular way—for example, a
longword operand that begins at a byte address that is a multiple of 4.
All VAX programs operate in a virtual address space of 2^32 bytes. This vir-
tual memory allows programs to operate as though they had access to an ex-
tremely large memory, regardless of the amount of memory actually present
on the system. Routines in the operating system take care of the details of
memory management. We discuss virtual memory in connection with our
study of operating systems in Chapter 6. One half of the VAX virtual address
space is called system space, which contains the operating system, and is shared
by all programs. The other half of the address space is called process space, and
is defined separately for each program. A part of the process space contains
stacks that are available to the program. Special registers and machine instruc-
tions aid in the use of these stacks.
Registers

There are sixteen general-purpose registers, denoted R0 through R15. Each
register is 32 bits in length. Some of these registers have special uses: R15 is
the program counter (PC), R14 is the stack pointer (SP), R13 is the frame
pointer (FP), and R12 is the argument pointer (AP). There is also a processor
status longword (PSL), which contains flags and other status information.
Data Formats
Integers are stored as binary numbers of 8, 16, 32, 64, or 128 bits; 2’s
complement representation is used for negative values. Characters are stored
using their 8-bit ASCII codes. There are also several floating-point formats.
In addition, a packed decimal format stores two decimal digits per byte, and a
numeric string format is used to represent numeric values with one digit per
byte. In this format, the sign may appear either in the last byte, or as a
separate byte preceding the first digit. These two variations are called
trailing numeric and leading separate numeric.
VAX also supports queues and variable-length bit strings. Data structures
such as these can, of course, be implemented on any machine; however, VAX
provides direct hardware support for them. There are single machine instruc-
tions that insert and remove entries in queues, and perform a variety of opera-
tions on bit strings. The existence of such powerful machine instructions and
complex primitive data types is one of the more unusual features of the VAX
architecture.
Instruction Formats

VAX machine instructions use a variable-length instruction format. Each
instruction consists of an operation code (1 or 2 bytes), followed by up to six
operand specifiers whose number and type depend on the instruction.
Because the operand specifiers themselves vary in length, VAX instructions
also vary widely in overall length.
Addressing Modes
VAX provides a large number of addressing modes. With few exceptions, any
of these addressing modes may be used with any instruction. The operand it-
self may be in a register (register mode), or its address may be specified by a
register (register deferred mode). If the operand address is in a register, the reg-
ister contents may be automatically incremented or decremented by the
operand length (autoincrement and autodecrement modes). There are several
base relative addressing modes, with displacement fields of different lengths;
when used with register PC, these become program-counter relative modes.
All of these addressing modes may also include an index register, and many of
them are available in a form that specifies indirect addressing (called deferred
modes on VAX). In addition, there are immediate operands and several spe-
cial-purpose addressing modes. For further details, see Baase (1992).
Instruction Set
One of the goals of the VAX designers was to produce an instruction set that is
symmetric with respect to data type. Many instruction mnemonics are formed
by combining the following elements:
For example, the instruction ADDW2 is an add operation with two operands,
each a word in length. Likewise, MULL3 is a multiply operation with three
longword operands, and CVTWL specifies a conversion from word to long-
word. (In the latter case, a two-operand instruction is assumed.) For a typical
instruction, operands may be located in registers, in memory, or in the instruc-
tion itself (immediate addressing). The same machine instruction code is used,
regardless of operand locations.
VAX provides all of the usual types of instructions for computation, data
movement and conversion, comparison, branching, etc. In addition, there are a
number of operations that are much more complex than the machine instruc-
tions found on most computers. These operations are, for the most part, hard-
ware realizations of frequently occurring sequences of code. They are
implemented as single instructions for efficiency and speed. For example, VAX
provides instructions to load and store multiple registers, and to manipulate
queues and variable-length bit fields. There are also powerful instructions for
calling and returning from procedures. A single instruction saves a designated
set of registers, passes a list of arguments to the procedure, maintains the
stack, frame, and argument pointers, and sets a mask to enable error traps for
arithmetic operations. For further information on all of the VAX instructions,
see Baase (1992).
Input and Output

Input and output on the VAX are accomplished by I/O device controllers.
Each controller has a set of control/status and data registers, which are as-
signed locations in the physical address space. The portion of the address
space into which the device controller registers are mapped is called I/O space.
No special instructions are required to access registers in I/O space. An
I/O device driver issues commands to the device controller by storing values
into the appropriate registers, exactly as if they were physical memory loca-
tions. Likewise, software routines may read these registers to obtain status in-
formation. The association of an address in I/O space with a physical register
in a device controller is handled by the memory management routines.
1.4.2 Pentium Pro Architecture
The Pentium Pro microprocessor, introduced near the end of 1995, is the latest
in the Intel x86 family. Other recent microprocessors in this family are the
80486 and Pentium. Processors of the x86 family are presently used in a major-
ity of personal computers, and there is a vast amount of software for these
processors. It is expected that additional generations of the x86 family will be
developed in the future.
The various x86 processors differ in implementation details and operating
speed. However, they share the same basic architecture. Each succeeding gen-
eration has been designed to be compatible with the earlier versions. This sec-
tion contains an overview of the x86 architecture, which will serve as
background for the examples to be discussed later in the book. Further infor-
mation about the x86 family can be found in Intel (1995), Anderson and
Shanley (1995), and Tabak (1995).
Memory
Memory in the x86 architecture can be described in at least two different ways.
At the physical level, memory consists of 8-bit bytes. All addresses used are
byte addresses. Two consecutive bytes form a word; four bytes form a double-
word (also called a dword). Some operations are more efficient when operands
are aligned in a particular way—for example, a doubleword operand that be-
gins at a byte address that is a multiple of 4.
However, programmers usually view the x86 memory as a collection of
segments. From this point of view, an address consists of two parts—a segment
number and an offset that points to a byte within the segment. Segments can
be of different sizes, and are often used for different purposes. For example,
some segments may contain executable instructions, and other segments may
be used to store data. Some data segments may be treated as stacks that can be
used to save register contents, pass parameters to subroutines, and for other
purposes.
It is not necessary for all of the segments used by a program to be in physi-
cal memory. In some cases, a segment can also be divided into pages. Some of
the pages of a segment may be in physical memory, while others may be
stored on disk. When an x86 instruction is executed, the hardware and the op-
erating system make sure that the needed byte of the segment is loaded into
physical memory. The segment/offset address specified by the programmer is
automatically translated into a physical byte address by the x86 Memory
Management Unit (MMU).
Registers
There are eight general-purpose registers, which are named EAX, EBX, ECX,
EDX, ESI, EDI, EBP, and ESP. Each general-purpose register is 32 bits long (i.e.,
one doubleword). Registers EAX, EBX, ECX, and EDX are generally used for
data manipulation; it is possible to access individual words or bytes from
these registers. The other four registers can also be used for data, but are more
commonly used to hold addresses. The general-purpose register set is identi-
cal for all members of the x86 family beginning with the 80386. This set is also
compatible with the more limited register sets found in earlier members of the
family.
There are also several different types of special-purpose registers in the x86
architecture. EIP is a 32-bit register that contains a pointer to the next instruc-
tion to be executed. FLAGS is a 32-bit register that contains many different bit
flags. Some of these flags indicate the status of the processor; others are used
to record the results of comparisons and arithmetic operations. There are also
six 16-bit segment registers that are used to locate segments in memory.
Segment register CS contains the address of the currently executing code seg-
ment, and SS contains the address of the current stack segment. The other seg-
ment registers (DS, ES, FS, and GS) are used to indicate the addresses of data
segments.
Floating-point computations are performed using a special floating-point
unit (FPU). This unit contains eight 80-bit data registers and several other con-
trol and status registers.
All of the registers discussed so far are available to application programs.
There are also a number of registers that are used only by system programs
such as the operating system. Some of these registers are used by the MMU to
translate segment addresses into physical addresses. Others are used to con-
trol the operation of the processor, or to support debugging operations.
Data Formats
The x86 architecture provides for the storage of integers, floating-point values,
characters, and strings. Integers are normally stored as 8-, 16-, or 32-bit binary
numbers. Both signed and unsigned integers (also called ordinals) are sup-
ported; 2’s complement is used for negative values. The FPU can also handle
64-bit signed integers. In memory, the least significant part of a numeric value
is stored at the lowest-numbered address. (This is commonly called
little-endian byte ordering, because the “little end” of the value comes first in
memory.)
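A one-line check in Python (my illustration, not from the text) makes this byte ordering visible.

    import struct
    # Least significant byte first in memory: 0x12345678 -> 78 56 34 12
    print(struct.pack("<I", 0x12345678).hex())   # prints '78563412'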
Integers can also be stored in binary coded decimal (BCD). In the unpacked
BCD format, each byte represents one decimal digit. The value of this digit is
encoded (in binary) in the low-order 4 bits of the byte; the high-order bits are
normally zero. In the packed BCD format, each byte represents two decimal
digits, with each digit encoded using 4 bits of the byte.
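As a sketch (my own, not from the text), packing a decimal value two digits per byte might look like this in Python.

    def to_packed_bcd(n):
        digits = [int(d) for d in str(n)]
        if len(digits) % 2:                # pad to an even number of digits
            digits.insert(0, 0)
        return bytes((hi << 4) | lo
                     for hi, lo in zip(digits[0::2], digits[1::2]))

    print(to_packed_bcd(1234).hex())       # prints '1234'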
There are three different floating-point data formats. The single-precision
format is 32 bits long. It provides 24 significant bits of the floating-point
value (23 of them explicitly stored) and an 8-bit exponent (power of 2); the
remaining bit is used to store the sign of the floating-point value. The
double-precision format is 64 bits long. It provides 53 significant bits (52
explicitly stored) and an 11-bit exponent. The extended-precision format is
80 bits long. It stores 64 significant bits, and allows for a 15-bit exponent.
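A short Python illustration (my own, not the book's) separates the fields of a single-precision value, showing the 8-bit biased exponent.

    import struct
    bits = struct.unpack("<I", struct.pack("<f", -1.5))[0]
    sign     = bits >> 31                  # 1 (negative)
    exponent = (bits >> 23) & 0xFF         # 127, the biased exponent for 2**0
    fraction = bits & 0x7FFFFF             # stored fraction bits (.100...)
    print(sign, exponent, hex(fraction))   # prints: 1 127 0x400000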
Characters are stored one per byte, using their 8-bit ASCII codes. Strings
may consist of bits, bytes, words, or doublewords; special instructions are
provided to handle each type of string.
Instruction Formats
All of the x86 machine instructions use variations of the same basic format.
This format begins with optional prefixes containing flags that modify the op-
eration of the instruction. For example, some prefixes specify a repetition
count for an instruction. Others specify a segment register that is to be used
for addressing an operand (overriding the normal default assumptions made
by the hardware). Following the prefixes (if any) is an opcode (1 or 2 bytes);
some operations have different opcodes, each specifying a different variant of
the operation. Following the opcode are a number of bytes that specify the
operands and addressing modes to be used. (See the description of addressing
modes in the next section for further information.)
The opcode is the only element that is always present in every instruction.
Other elements may or may not be present, and may be of different lengths,
depending on the operation and the operands involved. Thus, there are a large
number of different potential instruction formats, varying in length from
1 byte to 10 bytes or more.
Addressing Modes
Operands stored in memory are often specified using variations of the gen-
eral target address calculation

    TA = (base register) + (index register) × (scale factor) + displacement

Any of the general-purpose registers may be used as a base register; any of
them except ESP may be used as an index register. The scale factor may be
1, 2, 4, or 8, and the displacement is a constant encoded within the
instruction. Some of these components may be omitted for a particular
operand.
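Expressed as a Python sketch (an assumption based only on the calculation above), the computation is:

    def x86_target_address(base=0, index=0, scale=1, displacement=0):
        assert scale in (1, 2, 4, 8)       # legal scale factors
        return base + index * scale + displacement

    # e.g., indexing into an array of 4-byte elements at offset 8
    print(hex(x86_target_address(base=0x1000, index=3, scale=4, displacement=8)))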
Instruction Set
The x86 architecture has a large and complex instruction set, containing more
than 400 different machine instructions. An instruction may have zero, one,
two, or three operands. There are register-to-register instructions, register-to-
memory instructions, and a few memory-to-memory instructions. In some
cases, operands may also be specified in the instruction as immediate values.
Most data movement and integer arithmetic instructions can use operands
that are 1, 2, or 4 bytes long. String manipulation instructions, which use repe-
tition prefixes, can deal directly with variable-length strings of bytes, words,
or doublewords. There are many instructions that perform logical and bit ma-
nipulations, and support control of the processor and memory-management
systems.
The x86 architecture also includes special-purpose instructions to perform
operations frequently required in high-level programming languages—for ex-
ample, entering and leaving procedures and checking subscript values against
the bounds of an array.
1.5 RISC MACHINES

This section introduces the architectures of three RISC machines that will be
used as examples later in the text. Section 1.5.1 describes the architecture of the
SPARC family of processors. Section 1.5.2 describes the PowerPC family of mi-
croprocessors for personal computers. Section 1.5.3 describes the architecture
of the Cray T3E supercomputing system.
All of these machines are examples of RISC (Reduced Instruction Set
Computers), in contrast to traditional CISC (Complex Instruction Set
Computer) implementations such as Pentium and VAX. The RISC concept, de-
veloped in the early 1980s, was intended to simplify the design of processors.
This simplified design can result in faster and less expensive processor devel-
opment, greater reliability, and faster instruction execution times.
In general, a RISC system is characterized by a standard, fixed instruction
length (usually equal to one machine word), and single-cycle execution of
most instructions. Memory access is usually done by load and store instruc-
tions only. All instructions except for load and store are register-to-register op-
erations. There are typically a relatively large number of general-purpose
registers. The number of machine instructions, instruction formats, and ad-
dressing modes is relatively small.
The discussions in the following sections will illustrate some of these RISC
characteristics. Further information about the RISC approach, including its ad-
vantages and disadvantages, can be found in Tabak (1995).
1.5.1 UltraSPARC Architecture

Memory
Memory consists of 8-bit bytes; all addresses used are byte addresses. Two
consecutive bytes form a halfword; four bytes form a word; eight bytes form a
doubleword. Halfwords are stored in memory beginning at byte addresses that
are multiples of 2. Similarly, words begin at addresses that are multiples of 4,
and doublewords at addresses that are multiples of 8.
UltraSPARC programs can be written using a virtual address space of
2^64 bytes. This address space is divided into pages; multiple page sizes are sup-
ported. Some of the pages used by a program may be in physical memory,
while others may be stored on disk. When an instruction is executed, the hard-
ware and the operating system make sure that the needed page is loaded into
physical memory. The virtual address specified by the instruction is automati-
cally translated into a physical address by the UltraSPARC Memory Manage-
ment Unit (MMU). Chapter 6 contains a brief discussion of methods that can
be used in this kind of address translation.
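The translation step can be sketched in Python (a toy model; the 8-kilobyte page size and the flat page table are assumptions, not details from the text).

    PAGE_SIZE = 8192
    def translate(virtual_addr, page_table):
        page, offset = divmod(virtual_addr, PAGE_SIZE)
        frame = page_table[page]           # may fault if the page is on disk
        return frame * PAGE_SIZE + offset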
Registers
Besides these register files, there are a program counter PC (which contains
the address of the next instruction to be executed), condition code registers,
and a number of other control registers.
Data Formats
Characters are stored one per byte, using their 8-bit ASCII codes.
Instruction Formats
There are three basic instruction formats in the SPARC architecture. All of
these formats are 32 bits long; the first 2 bits of the instruction word identify
which format is being used. Format 1 is used for the Call instruction. Format 2
is used for branch instructions (and one special instruction that enters a value
into a register). The remaining instructions use Format 3, which provides for
register loads and stores, and three-operand arithmetic operations.
The fixed instruction length in the SPARC architecture is typical of RISC
systems, and is intended to speed the process of instruction fetching and de-
coding. Compare this approach with the complex variable-length instructions
found on CISC systems such as VAX and x86.
Addressing Modes
Instruction Set
The basic SPARC architecture has fewer than 100 machine instructions, reflect-
ing its RISC philosophy. (Compare this with the 300 to 400 instructions often
found in CISC systems.) The only instructions that access memory are loads
and stores. All other instructions are register-to-register operations.
Instruction execution on a SPARC system is pipelined—while one instruc-
tion is being executed, the next one is being fetched from memory and de-
coded. In most cases, this technique speeds instruction execution. However, an
ordinary branch instruction might cause the process to “stall.” The instruction
following the branch (which had already been fetched and decoded) would
have to be discarded without being executed.
To make the pipeline work more efficiently, SPARC branch instructions (in-
cluding subroutine calls) are delayed branches. This means that the instruction
immediately following the branch instruction is actually executed before the
branch is taken. For example, in an instruction sequence like

    BA     TARGET
    MOV    %g1, %g2

the MOV instruction is executed before the branch BA. This MOV instruction
is said to be in the delay slot of the branch. The programmer must take this
Further discussions and examples of the use of delayed branches can be found
in Section 2.5.2.
The UltraSPARC architecture also includes special-purpose instructions to
provide support for operating systems and optimizing compilers. For exam-
ple, high-bandwidth block load and store operations can be used to speed up
operations on large blocks of memory.
1.5.2 PowerPC Architecture

Memory
Memory consists of 8-bit bytes; all addresses used are byte addresses. Two
consecutive bytes form a halfword; four bytes form a word; eight bytes form a
doubleword; sixteen bytes form a quadword. Many instructions may execute
more efficiently when operands are aligned in a particular way.
Registers
Data Formats
Instruction Formats
There are seven basic instruction formats in the PowerPC architecture, some of
which have subforms. All of these formats are 32 bits long. Instructions must
be aligned beginning at a word boundary (i.e., a byte address that is a multiple
of 4). The first 6 bits of the instruction word always specify the opcode; some
instruction formats also have an additional “extended opcode” field.
The fixed instruction length in the PowerPC architecture is typical of RISC
systems. The variety and complexity of instruction formats is greater than that
found on most RISC systems (such as SPARC). However, the fixed length
makes instruction decoding faster and simpler than on CISC systems like VAX
and x86.
Addressing Modes

Load and store instructions use one of two addressing modes:

Mode                                  Target address calculation
Register indirect with index          TA = (register) + (index register)
Register indirect with displacement   TA = (register) + displacement
                                           (16 bits, signed)

The register numbers and displacement are encoded as part of the instruction.
Branch instructions use one of the following addressing modes:

Mode             Target address calculation
Absolute         TA = actual address
Relative         TA = (current instruction address) + displacement
                      (25 bits, signed)
Link Register    TA = (LR)
Count Register   TA = (CR)
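For the relative mode, the signed 25-bit displacement must be sign-extended before it is added to the instruction address; a Python sketch (my own, following the table above):

    def relative_target_address(instr_addr, disp25):
        if disp25 & (1 << 24):             # high bit set: negative value
            disp25 -= 1 << 25              # sign-extend the 25-bit field
        return instr_addr + disp25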
Instruction Set

Input and Output

The PowerPC architecture provides two different methods for performing I/O
operations. In one approach, segments in the virtual address space are
mapped onto an external address space (typically an I/O bus). Segments that
are mapped in this way are called direct-store segments. This method is similar
to the approach used in the SPARC architecture.
1.5.3 Cray T3E Architecture

The Cray T3E is a massively parallel processing (MPP) system. Each process-
ing element (PE) in the T3E contains a DEC Alpha microprocessor. Sections
3.5.3 and 5.5.3 discuss some of the ways
programs can take advantage of the multiprocessor architecture of this ma-
chine. Further information about the T3E can be found in Cray Research
(1995c). Further information about the DEC Alpha architecture can be found in
Sites (1992) and Tabak (1995).
Memory
Each processing element in the T3E has its own local memory with a capacity
of from 64 megabytes to 2 gigabytes. The local memory within each PE is part
of a physically distributed, globally addressable memory system.

[Figure: processing elements connected by an interconnect network.]
Registers
Data Formats
The Alpha architecture provides for the storage of integers, floating-point val-
ues, and characters.Integers are stored as longwords or quadwords; 2’s com-
plement is used for negative values. When interpreted as an integer, the bits of
a longword or quadword have steadily increasing significance beginning with
bit 0 (which is stored in the lowest-addressed byte).
There are two different types of floating-point data formats in the Alpha
architecture. One group of three formats is included for compatibility with the
VAX architecture. The other group consists of four IEEE standard formats,
which are compatible with those used on most modern systems.
Characters may be stored one per byte, using their 8-bit ASCII codes.
However, there are no byte load or store operations in the Alpha architecture;
only longwords and quadwords can be transferred between a register and
memory. As a consequence, characters that are to be manipulated separately
are usually stored one per longword.
Instruction Formats
There are five basic instruction formats in the Alpha architecture, some of
which have subforms. All of these formats are 32 bits long. (As we have noted
before, this fixed length is typical of RISC systems.) The first 6 bits of the in-
struction word always specify the opcode; some instruction formats also have
an additional “function” field.
Addressing Modes
Register indirect with displacement mode is used for load and store opera-
tions and for subroutine jumps. PC-relative mode is used for conditional and
unconditional branches.
Instruction Set

Input and Output
The T3E system performs I/O through multiple ports into one or more I/O
channels, which can be configured in a number of ways. These channels are
integrated into the network that interconnects the processing nodes. A system
may be configured with up to one I/O channel for every eight PEs. All chan-
nels are accessible and controllable from all PEs.
Further information about this “scalable” I/O architecture can be found in
Cray Research (1995c).
EXERCISES
Section 1.3
Write a subroutine for SIC/XE that will read a record into a buffer, as
in Fig. 1.7(b). The record may be any length from 1 to 100 bytes. The
end of the record is marked with a "null" character (ASCII code 00).
The subroutine should place the length of the record read into a vari-
able named LENGTH. Use immediate addressing and register-to-
register instructions to make the subroutine as efficient as possible.
Chapter 2
Assemblers
Beyond this most basic level, however, the features and design of an as-
sembler depend heavily upon the source language it translates and the ma-
chine language it produces. One aspect of this dependence is, of course, the
existence of different machine instruction formats and codes to accomplish
(for example) an ADD operation. As we shall see, there are also many subtler
ways that assemblersdepend upon machine architecture. On the other hand,
there are some features of an assembler language (and the corresponding as-
sembler) that have no direct relation to machine architecture—they are, in a
sense,arbitrary decisions made by the designers of the language.
We begin by considering the design of a basic assembler for the standard
version of our Simplified Instructional Computer (SIC). Section 2.1 introduces
the most fundamental operations performed by a typical assembler,and de-
scribes common ways of accomplishing these functions. The algorithms and
data structures that we describe are shared by almost all assemblers.Thus this
level of presentation gives us a starting point from which to approach the
study of more advanced assembler features. We can also use this basic struc-
ture as a framework from which to begin the design of an assembler for a com-
pletely new or unfamiliar machine.
In Section 2.2, we examine some typical extensions to the basic assembler
structure that might be dictated by hardware considerations. We do this by
discussing an assembler for the SIC/XE machine. Although this SIC/XE as-
sembler certainly does not include all possible hardware-dependent features,
it does contain some of the ones most commonly found in real machines. The
principles and techniques should be easily applicable to other computers.
Section 2.3 presents a discussion of some of the most commonly encoun-
tered machine-independent assembler language features and their implemen-
tation. Once again, our purpose is not to cover all possible options, but rather
to introduce concepts and techniques that can be used in new and unfamiliar
situations.
2.1 BASIC ASSEMBLER FUNCTIONS

Figure 2.1 shows an assembler language program for the basic version of SIC.
We use variations of this program throughout this chapter to show different
assembler features. The line numbers are for reference only and are not part of
the program. These numbers also help to relate corresponding parts of differ-
ent versions of the program. The mnemonic instructions used are those intro-
duced in Section 1.3.1 and Appendix A. Indexed addressing is indicated by
adding the modifier ",X" following the operand (see line 160). Lines beginning
with "." contain comments only.
In addition to the mnemonic machine instructions, we have used the fol-
lowing assembler directives:

START   Specify name and starting address for the program.
BYTE    Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant.
WORD    Generate one-word integer constant.
RESB    Reserve the indicated number of bytes for a data area.
RESW    Reserve the indicated number of words for a data area.
END     Indicate the end of the source program and (optionally) specify the first executable instruction in the program.
The program contains a main routine that reads records from an input de-
vice (identified with device code F1) and copies them to an output device
(code 05). This main routine calls subroutine RDREC to read a record into a
buffer and subroutine WRREC to write the record from the buffer to the out-
put device. Each subroutine must transfer the record one character at a time
because the only I/O instructions available are RD and WD. The buffer is nec-
essary because the I/O rates for the two devices, such as a disk and a slow
printing terminal, may be very different. (In Chapter 6, we see how to use
channel programs and operating system calls on a SIC/XE system to accom-
plish the same functions.) The end of each record is marked with a null charac-
ter (hexadecimal 00). If a record is longer than the length of the buffer (4096
bytes), only the first 4096 bytes are copied. (For simplicity, the program does
not deal with error recovery when a record containing 4096 bytes or more is
read.) The end of the file to be copied is indicated by a zero-length record.
When the end of file is detected, the program writes EOF on the output device
and terminates by executing an RSUB instruction. We assume that this pro-
gram was called by the operating system using a JSUB instruction; thus, the
RSUB will return control to the operating system.
Figure 2.2 shows the same program as in Fig. 2.1, with the generated object
code for each statement. The column headed Loc gives the machine address
(in hexadecimal) for each part of the assembled program. We have assumed
that the program starts at address 1000. (In an actual assembler listing, of
course, the comments would be retained; they have been eliminated here to
save space.)
The translation of source program to object code requires us to accomplish
the following functions (not necessarily in the order given):

1. Convert mnemonic operation codes to their machine language equivalents—e.g., translate STL to 14 (line 10).
2. Convert symbolic operands to their equivalent machine addresses—e.g., translate RETADR to 1033 (line 10).
3. Build the machine instructions in the proper format.
4. Convert the data constants specified in the source program into their internal machine representations—e.g., translate EOF to 454F46 (line 80).
5. Write the object program and the assembly listing.
The End record marks the end of the
object program and specifies the address in the program where execution is to
begin. (This is taken from the operand of the program's END statement. If no
operand is specified, the address of the first executable instruction is used.)
The formats we use for these records are as follows. The details of the for-
mats (column numbers, etc.) are arbitrary; however, the information contained
in these records must be present (in some form) in the object program.
Header record:
Col. 1       H
Col. 2-7     Program name
Col. 8-13    Starting address of object program (hexadecimal)
Col. 14-19   Length of object program in bytes (hexadecimal)

Text record:
Col. 1       T
Col. 2-7     Starting address for object code in this record (hexadecimal)
Col. 8-9     Length of object code in this record in bytes (hexadecimal)
Col. 10-69   Object code, represented in hexadecimal (2 columns per
             byte of object code)

End record:
Col. 1       E
Col. 2-7     Address of first executable instruction in object program (hexadecimal)
To avoid confusion, we have used the term column rather than byte to refer to
positions within object program records. This is not meant to imply the use of
any particular medium for the object program.
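As an illustration of these formats (a sketch of our own, not code from the text), the following Python functions build Header, Text, and End records in the column layout just described. The field widths are the ones given above; the function and variable names are ours.

# Format the three basic object program record types.
def header_record(name, start, length):
    return "H%-6s%06X%06X" % (name, start, length)   # e.g. HCOPY  00100000107A

def text_record(start, objcode_bytes):
    assert len(objcode_bytes) <= 30                  # Col. 10-69 holds 30 bytes
    code = "".join("%02X" % b for b in objcode_bytes)
    return "T%06X%02X%s" % (start, len(objcode_bytes), code)

def end_record(first_exec):
    return "E%06X" % first_exec

print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, [0x14, 0x10, 0x33]))
print(end_record(0x1000))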
Figure 2.3 shows the object program corresponding to Fig. 2.2, using this
format. In this figure, and in the other object programs we display, the symbol
^ is used to separate fields visually. Of course, such symbols are not present in
the actual object program. Note that there is no object code corresponding to
addresses 1033-2038. This storage is simply reserved by the loader for use by
the program during execution. (Chapter 3 contains a detailed discussion of the
operation of the loader.)
We can now give a general description of the functions of the two passes of
our simple assembler.
H^COPY ^001000^00107A
T^001000^1E^141033^482039^001036^281030^301015^482061^3C1003^00102A^0C1039^00102D
T^00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T^002039^1E^041030^001030^E0205D^30203F^D8205D^281030^302057^549039^2C205E^38203F
T^002057^1C^101036^4C0000^F1^001000^041030^E02079^302064^509039^DC2079^2C1036
T^002073^07^382064^4C0000^05
E^001000

Figure 2.3 Object program corresponding to Fig. 2.2.
In the next section we discuss these functions in more detail, describe the in-
ternal tables required by the assembler, and give an overall description of the
logic flow of each pass.
2.1.2 Assembler Algorithm and Data Structures

Our simple assembler uses two major internal data structures: the Operation
Code Table (OPTAB) and the Symbol Table (SYMTAB). OPTAB is used to look
up mnemonic operation codes and translate them to their machine language
equivalents. SYMTAB is used to store values (addresses) assigned to labels.
We also need a Location Counter (LOCCTR). This is a variable that is used
to help in the assignment of addresses; it is initialized to the beginning address
specified in the START statement, and the length of each assembled instruction
or data area is added to it as the corresponding source statement is processed.
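To make the data structures concrete, here is a small Python sketch of our own (not the book's code). Both tables are modeled as dictionaries, which are hash tables; the opcode values are from the SIC instruction set, and only a few entries are shown.

OPTAB = {                      # mnemonic -> machine opcode
    "LDA": 0x00, "STA": 0x0C, "LDX": 0x04, "STX": 0x10,
    "JSUB": 0x48, "RSUB": 0x4C, "COMP": 0x28, "J": 0x3C,
}

SYMTAB = {}                    # label -> assigned address
LOCCTR = 0x1000                # initialized from the START statement

SYMTAB["FIRST"] = LOCCTR       # Pass 1 enters each label with the current LOCCTR

opcode = OPTAB["LDA"]          # Pass 2 looks up mnemonics and symbols
address = SYMTAB["FIRST"]      # to build the object code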
It is possible for both passes of the assembler to read the original source
program as input. However, there is certain information (such as location
counter values and error flags for statements) that can or should be communi-
cated between the two passes. For this reason, Pass 1 usually writes an inter-
mediate file that contains each source statement together with its assigned
address, error indicators, etc. This file is used as the input to Pass 2. This work-
ing copy of the source program can also be used to retain the results of certain
operations performed during Pass 1 (such as scanning the operand field for
symbols and addressing flags), so these operations need not be repeated during
Pass 2. Figures 2.4(a) and (b) show the logic flow of the two passes of our
assembler.
Pass 1:
begin
   read first input line
   if OPCODE = 'START' then
      begin
         save #[OPERAND] as starting address
         initialize LOCCTR to starting address
         write line to intermediate file
         read next input line
      end {if START}
   else
      initialize LOCCTR to 0
   while OPCODE ≠ 'END' do
      begin
         if this is not a comment line then
            begin
               if there is a symbol in the LABEL field then
                  begin
                     search SYMTAB for LABEL
                     if found then
                        set error flag (duplicate symbol)
                     else
                        insert (LABEL, LOCCTR) into SYMTAB
                  end {if symbol}
               search OPTAB for OPCODE
               if found then
                  add 3 {instruction length} to LOCCTR
               else if OPCODE = 'WORD' then
                  add 3 to LOCCTR
               else if OPCODE = 'RESW' then
                  add 3 * #[OPERAND] to LOCCTR
               else if OPCODE = 'RESB' then
                  add #[OPERAND] to LOCCTR
               else if OPCODE = 'BYTE' then
                  begin
                     find length of constant in bytes
                     add length to LOCCTR
                  end {if BYTE}
               else
                  set error flag (invalid operation code)
            end {if not a comment}
         write line to intermediate file
         read next input line
      end {while not END}
   write last line to intermediate file
   save (LOCCTR - starting address) as program length
end {Pass 1}

Figure 2.4(a) Algorithm for Pass 1 of assembler.
Pass 2:
begin
   read first input line {from intermediate file}
   if OPCODE = 'START' then
      begin
         write listing line
         read next input line
      end {if START}
   write Header record to object program
   initialize first Text record
   while OPCODE ≠ 'END' do
      begin
         if this is not a comment line then
            begin
               search OPTAB for OPCODE
               if found then
                  begin
                     if there is a symbol in OPERAND field then
                        begin
                           search SYMTAB for OPERAND
                           if found then
                              store symbol value as operand address
                           else
                              begin
                                 store 0 as operand address
                                 set error flag (undefined symbol)
                              end
                        end {if symbol}
                     else
                        store 0 as operand address
                     assemble the object code instruction
                  end {if opcode found}
               else if OPCODE = 'BYTE' or 'WORD' then
                  convert constant to object code
               if object code will not fit into the current Text record then
                  begin
                     write Text record to object program
                     initialize new Text record
                  end
               add object code to Text record
            end {if not comment}
         write listing line
         read next input line
      end {while not END}
   write last Text record to object program
   write End record to object program
   write last listing line
end {Pass 2}

Figure 2.4(b) Algorithm for Pass 2 of assembler.
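To make the division of work between the passes concrete, here is a compressed Python sketch of both passes for a tiny subset of the standard SIC language. It is our own illustration, not the book's code: directives other than WORD/RESW/RESB, error handling, object record output, and the listing are all omitted, and every instruction is assumed to occupy 3 bytes.

OPTAB = {"LDA": 0x00, "STA": 0x0C, "COMP": 0x28, "J": 0x3C, "RSUB": 0x4C}

def pass1(lines, start=0x1000):
    symtab, locctr, inter = {}, start, []
    for label, opcode, operand in lines:
        if label:
            symtab[label] = locctr            # enter (LABEL, LOCCTR) into SYMTAB
        inter.append((locctr, opcode, operand))   # "intermediate file"
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
    return symtab, inter

def pass2(symtab, inter):
    for loc, opcode, operand in inter:
        if opcode in OPTAB:                   # data generation for WORD omitted here
            addr = symtab.get(operand, 0)     # 0 (plus error flag) if undefined
            yield loc, (OPTAB[opcode] << 16) | addr

prog = [("FIRST", "LDA", "FIVE"), (None, "RSUB", None), ("FIVE", "WORD", "5")]
symtab, inter = pass1(prog)
for loc, obj in pass2(symtab, inter):
    print("%04X %06X" % (loc, obj))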
2.2 MACHINE-DEPENDENT ASSEMBLER FEATURES

Figure 2.5 shows the example program from Fig. 2.1, rewritten to make use
of SIC/XE features (the listing itself is not reproduced here). Indirect
addressing is indicated by adding the prefix @ to the operand
(see line 70). Immediate operands are denoted with the prefix # (lines 25, 55,
133). Instructions that refer to memory are normally assembled using either
the program-counter relative or the base relative mode. The assembler direc-
tive BASE (line 13) is used in conjunction with base relative addressing. (See
Section 2.2.1 for a discussion and examples.) If the displacements required for
both program-counter relative and base relative addressing are too large to fit
into a 3-byte instruction, then the 4-byte extended format (Format 4) must be
used. The extended instruction format is specified with the prefix + added to
the operation code in the source statement (see lines 15, 35, 65). It is the pro-
grammer’s responsibility to specify this form of addressing when it is re-
quired.
The main differences between this version of the program and the version
in Fig. 2.1 involve the use of register-to-register instructions (in place of regis-
ter-to-memory instructions) wherever possible. For example, the statement on
line 150 is changed from COMP ZERO to COMPR A,S. Similarly, line 165 is
changed from TIX MAXLEN to TIXR T. In addition, immediate and indirect
addressing have been used as much as possible (for example, lines 25, 55, and
70).
These changes take advantage of the more advanced SIC/XE architecture
to improve the execution speed of the program. Register-to-register instruc-
tions are faster than the corresponding register-to-memory operations because
they are shorter, and, more importantly, because they do not require another
memory reference. (Fetching an operand from a register is much faster than re-
trieving it from main memory.) Likewise, when using immediate addressing,
the operand is already present as part of the instruction and need not be
fetched from anywhere. The use of indirect addressing often avoids the need
for another instruction (as in the “return” operation on line 70). You may no-
tice that some of the changes require the addition of other instructions to the
program. For example, changing COMP to COMPR on line 150 forces us to
add the CLEAR instruction on line 132. This still results in an improvement in
execution speed. The CLEAR is executed only once for each record read,
whereas the benefits of COMPR (as opposed to COMP) are realized for every
byte of data transferred.
In Section 2.2.1, we examine the assembly of this SIC/XE program, focus-
ing on the differences in the assembler that are required by the new addressing
modes. (You may want to briefly review the instruction formats and target ad-
dress calculations described in Section 1.3.2.) These changes are direct conse-
quences of the extended hardware functions.
Section 2.2.2 discusses an indirect consequence of the change to SIC/XE.
The larger main memory of SIC/XE means that we may have room to load
and run several programs at the same time. This kind of sharing of the ma-
chine between programs is called multiprogramming. Such sharing often results
in more productive use of the hardware. (We discuss multiprogramming, and its
implications for operating systems, in Chapter 6.)
2.2.1 Instruction Formats and Addressing Modes

Figure 2.6 shows the object code generated for each statement in the program
of Fig. 2.5. In this section we consider the translation of the source statements,
paying particular attention to the handling of different instruction formats and
different addressing modes. Note that the START statement now specifies a
beginning program address of 0. As we discuss in the next section, this indi-
cates a relocatable program. For the purposes of instruction assembly, how-
ever, the program will be translated exactly as if it were really to be loaded at
machine address 0.
[Figure 2.6 Program from Fig. 2.5, with object code (listing not reproduced)]

Consider first the extended-format instruction on line 15 (+JSUB RDREC). Here
the operand address is 1036. This full address is stored in the instruction, with
bit e set to 1 to indicate extended instruction format.
Note that the programmer must specify the extended format by using the
prefix + (as on line 15). If extended format is not specified, our assemblerfirst
attempts to translate the instruction using program-counter relative address-
ing. If this is not possible (because the required displacement is out of range),
the assembler then attempts to use base relative addressing. If neither form of
relative addressing is applicable and extended format is not specified, then the
instruction cannot be properly assembled. In this case, the assembler must
generate an error message.
We now examine the details of the displacement calculation for program-
counter relative and base relative addressing modes. The computation that the
assembler needs to perform is essentially the target address calculation in
reverse. You may want to review this from Section 1.3.2.
The instruction

40    0017          J       CLOOP         3F2FEC

is a typical example. Here the operand address is 0006. During instruction execution, the program
counter will contain the address 001A. Thus the displacement required is
6 - 1A = -14 (hexadecimal). This is represented (using 2's complement for negative numbers) in
a 12-bit field as FEC, which is the displacement assembled into the object code.
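This "target address calculation in reverse" is easy to express in code. The following Python sketch is our own illustration; the function name is hypothetical, and the test values are those of the J CLOOP example above.

# Displacement for program-counter relative mode: disp = TA - (PC),
# encoded as a 12-bit two's-complement field.
def pc_relative_disp(target, pc):
    disp = target - pc
    if not -2048 <= disp <= 2047:        # must fit in 12 bits, signed
        raise ValueError("displacement out of range")
    return disp & 0xFFF                  # two's-complement encoding

print("%03X" % pc_relative_disp(0x0006, 0x001A))   # prints FEC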
The displacement calculation process for base relative addressing is much
the same as for program-counter relative addressing. The main difference is
that the assembler knows what the contents of the program counter will be at
execution time. The base register, on the other hand, is under control of the
programmer. Therefore, the programmer must tell the assembler what the
base register will contain during execution of the program so that the assem-
bler can compute displacements. This is done in our example with the assem-
bler directive BASE. The statement BASE LENGTH (line 13) informs the
assembler that the base register will contain the address of LENGTH. The pre-
ceding instruction (LDB #LENGTH) loads this value into the register during
program execution. The assembler assumes for addressing purposes that reg-
ister B contains this address until it encounters another BASE statement. Later
in the program, it may be desirable to use register B for another purpose (for
example, as temporary storage for a data value). In such a case, the program-
mer must use another assembler directive (perhaps NOBASE) to inform the
assembler that the contents of the base register can no longer be relied upon
for addressing.
It is important to understand that BASE and NOBASE are assembler direc-
tives, and produce no executable code. The programmer must provide instruc-
tions that load the proper value into the base register during execution. If this
is not done properly, the target address calculation will not produce the correct
operand address.
The instruction

160   104E          STCH    BUFFER,X      57C003

is assembled with a displacement of 003 (the difference between the address of
BUFFER and the contents of register B), and with bits x and b set to 1
to indicate indexed and base relative addressing. Another example is the in-
struction STX LENGTH on line 175. Here the displacement calculated is 0.
Notice the difference between the assembly of the instructions on lines 20
and 175. On line 20, LDA LENGTH is assembled with program-counter rela-
tive addressing. On line 175, STX LENGTH uses base relative addressing, as
noted previously. (If you calculate the program-counter relative displacement
that would be required for the statement on line 175, you will see that it is too
large to fit into the 12-bit displacement field.) The statement on line 20 could
also have used base relative mode. In our assembler, however, we have arbi-
trarily chosen to attempt program-counter relative assembly first.
The assembly of an instruction that specifies immediate addressing is sim-
pler because no memory reference is involved. All that is necessary is to con-
vert the immediate operand to its internal representation and insert it into the
instruction. The instruction

55    0020          LDA     #3            010003

is a typical example of this, with the operand stored in the instruction as 003,
and bit i set to 1 to indicate immediate addressing. Another example can be
found in the instruction

133   103C          +LDT    #4096         75101000

In this case the operand (4096) is too large to fit into the 12-bit displacement
field, so the extended instruction format is called for. (If the operand were too
large even for this 20-bit address field, immediate addressing could not be
used.)
A different way of using immediate addressing is shown in the instruction

13    0003          LDB     #LENGTH       69202D

In this statement the immediate operand is the symbol LENGTH. Since the
value of this symbol is the address assigned to it, this immediate instruction has
the effect of loading register B with the address of LENGTH. Note here that
we have combined program-counter relative addressing with immediate ad-
dressing. Although this may appear unusual, the interpretation is consistent
with our previous uses of immediate operands. In general, the target address
calculation is performed; then, if immediate mode is specified, the target ad-
dress (not the contents stored at that address) becomes the operand. (In the
LDA statement on line 55, for example, bits x, b, and p are all 0. Thus the target
address is simply the displacement 003.)
The assembly of instructions that specify indirect addressing presents
nothing really new. The displacement is computed in the usual way to pro-
duce the target address desired. Then bit n is set to indicate that the contents
stored at this location represent the address of the operand, not the operand it-
self. Line 70 shows a statement that combines program-counter relative and
indirect addressing in this way.
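The flag bits described in this section combine with the opcode and displacement in a mechanical way. The following Python sketch is our own illustration of how n, i, x, b, p, and e pack into a Format 3 instruction word; the function name is hypothetical. The J @RETADR example from line 70 reproduces the object code 3E2003.

# Pack a SIC/XE Format 3 instruction: 6-bit opcode, flags nixbpe, 12-bit disp.
def format3(opcode, disp, n=1, i=1, x=0, b=0, p=0, e=0):
    word = (opcode & 0xFC) << 16           # top 6 bits of the opcode
    word |= (n << 17) | (i << 16)          # addressing-type flags
    word |= (x << 15) | (b << 14) | (p << 13) | (e << 12)
    word |= disp & 0xFFF
    return word

# J @RETADR (line 70): indirect (n=1, i=0), PC-relative (p=1), disp 003.
print("%06X" % format3(0x3C, 0x003, n=1, i=0, p=1))   # 3E2003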
2.2.2 Program Relocation

With multiprogramming, we generally do not know in advance where in memory
a program will be loaded (that depends upon what other programs are occupying
memory, which jobs are being run, etc.). Because of this, it is desirable to be able to load a program into mem-
ory wherever there is room for it. In such a situation the actual starting ad-
dress of the program is not known until load time.
The program we considered in Section 2.1 is an example of an absolute
program (or absolute assembly). This program must be loaded at address 1000
(the address that was specified at assembly time) in order to execute properly.
To see this, consider the instruction

55    101B          LDA     THREE         00102D

from Fig. 2.2. In the object program (Fig. 2.3), this statement is translated as
00102D, specifying that register A is to be loaded from memory address 102D.
Suppose we attempt to load and execute the program at address 2000 instead
of address 1000. If we do this, address 102D will not contain the value that we
expect—in fact, it will probably be part of some other user’s program.
Obviously we need to make some change in the address portion of this in-
struction so we can load and execute our program at address 2000. On the
other hand, there are parts of the program (such as the constant 3 generated
from line 85) that should remain the same regardless of where the program is
loaded. Looking at the object code alone, it is in general not possible to tell
which values represent addresses and which represent constant data items.
Since the assembler does not know the actual location where the program
will be loaded, it cannot make the necessary changes in the addresses used by
the program. However, the assembler can identify for the loader those parts of
the object program that need modification. An object program that contains
the information necessary to perform this kind of modification is called a relo-
catable program.
To look at this in more detail, consider the program from Figs. 2.5 and 2.6.
In the preceding section, we assembled this program using a starting address
of 0000. Figure 2.7(a) shows this program loaded beginning at address 0000.
The JSUB instruction from line 15 is loaded at address 0006. The address field
of this instruction contains 01036, which is the address of the instruction la-
beled RDREC. (These addresses are, of course, the same as those assigned by
the assembler.)
Now suppose that we want to load this program beginning at address
5000, as shown in Fig. 2.7(b). The address of the instruction labeled RDREC is
then 6036. Thus the JSUB instruction must be modified as shown to contain
this new address. Likewise, if we loaded the program beginning at address
7420 (Fig. 2.7c), the JSUB instruction would need to be changed to 4B108456 to
correspond to the new address of RDREC.
[Figure 2.7 Examples of program relocation: (a) program loaded beginning at
address 0000, with 4B101036 (+JSUB RDREC) at address 0006 and RDREC at
1036; (b) loaded beginning at 5000, with 4B106036 at 5006 and RDREC at 6036;
(c) loaded beginning at 7420, with 4B108456 at 7426 and RDREC at 8456]
Note that no matter where the program is loaded, RDREC is always 1036
bytes past the starting address of the program. This means that we can solve
the relocation problem in the following way:
1. When the assembler generates the object code for the JSUB instruc-
tion we are considering, it will insert the address of RDREC relative to
the start of the program. (This is the reason we initialized the location
counter to 0 for the assembly.)
2. The assembler will also produce a command for the loader, instruct-
ing it to add the beginning address of the program to the address
field in the JSUB instruction at load time.
The command for the loader, of course, must also be a part of the object pro-
gram. We can accomplish this with a Modification record having the following
format:
Modification record:
Col. 1       M
Col. 2-7     Starting location of the address field to be modified, relative to the beginning of the program (hexadecimal)
Col. 8-9     Length of the address field to be modified, in half-bytes (hexadecimal)
The Modification record for the JSUB instruction on line 15 would be
M^000007^05. This record specifies that the beginning address of the program
is to be added to a field that begins at address 000007 and is 5 half-bytes in
length. The complete object program is shown in Fig. 2.8:
H^COPY ^000000^001077
T^000000^1D^17202D^69202D^4B101036^032026^290000^332007^4B10105D^3F2FEC^032010
T^00001D^13^0F2016^010003^0F200D^4B10105D^3E2003^454F46
T^001036^1D^B410^B400^B440^75101000^E32019^332FFA^DB2013^A004^332008^57C003^B850
T^001053^1D^3B2FEA^134000^4F0000^F1^B410^774000^E32011^332FFA^53C003^DF2008^B850
T^001070^07^3B2FEF^4F0000^05
M^000007^05
M^000014^05
M^000027^05
E^000000

Figure 2.8 Object program corresponding to Fig. 2.6.
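Although the loader is the subject of Chapter 3, the effect of these Modification records is easy to sketch. The Python fragment below is our own illustration (the memory layout and names are assumed for the example): it adds the program's starting address to the 3-byte field named by a Modification record. For simplicity the whole field is adjusted as one integer; a real loader would modify exactly the indicated half-bytes and handle any carry into the flag half-byte.

# Apply Modification records (address, length-in-half-bytes) to loaded code.
def relocate(memory, mods, prog_start):
    for addr, halfbytes in mods:
        nbytes = (halfbytes + 1) // 2           # 05 half-bytes -> 3 bytes
        field = int.from_bytes(memory[addr:addr + nbytes], "big")
        field += prog_start                     # add beginning address
        memory[addr:addr + nbytes] = field.to_bytes(nbytes, "big")

mem = bytearray(0x30)
mem[0x07:0x0A] = (0x101036).to_bytes(3, "big")  # address field of the +JSUB
relocate(mem, [(0x07, 5)], prog_start=0x5000)
print(mem[0x07:0x0A].hex())                     # 106036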
2.3 MACHINE-INDEPENDENT
ASSEMBLER FEATURES
In this section, we discuss some common assembler features that are not
closely related to machine architecture. Of course, more advanced machines
tend to have more complex software; therefore the features we consider are
more likely to be found on larger and more complex machines. However, the
presence or absence of such capabilities is much more closely related to issues
such as programmer convenience and software environment than it is to
machine architecture.
2.3.1 Literals
A literal allows the programmer to write the value of a constant operand as
part of the instruction that uses it, instead of defining the constant elsewhere.
The statement on line 45 of the program in Fig. 2.9 (ENDFIL LDA =C'EOF')
specifies a 3-byte operand whose value is the character string EOF. Likewise
the statement on line 215 (WD =X'05')
Line  Loc   Source statement                   Object code

5     0000  COPY    START   0
10    0000  FIRST   STL     RETADR             17202D
13    0003          LDB     #LENGTH            69202D
14                  BASE    LENGTH
15    0006  CLOOP   +JSUB   RDREC              4B101036
20    000A          LDA     LENGTH             032026
25    000D          COMP    #0                 290000
30    0010          JEQ     ENDFIL             332007
35    0013          +JSUB   WRREC              4B10105D
40    0017          J       CLOOP              3F2FEC
45    001A  ENDFIL  LDA     =C'EOF'            032010
50    001D          STA     BUFFER             0F2016
55    0020          LDA     #3                 010003
60    0023          STA     LENGTH             0F200D
65    0026          +JSUB   WRREC              4B10105D
70    002A          J       @RETADR            3E2003
93                  LTORG
      002D  *       =C'EOF'                    454F46
95    0030  RETADR  RESW    1
100   0033  LENGTH  RESW    1
105   0036  BUFFER  RESB    4096
106   1036  BUFEND  EQU     *
107   1000  MAXLEN  EQU     BUFEND-BUFFER
110   .
115   .     SUBROUTINE TO READ RECORD INTO BUFFER
120   .
125   1036  RDREC   CLEAR   X                  B410
130   1038          CLEAR   A                  B400
132   103A          CLEAR   S                  B440
133   103C          +LDT    #MAXLEN            75101000
135   1040  RLOOP   TD      INPUT              E32019
140   1043          JEQ     RLOOP              332FFA
145   1046          RD      INPUT              DB2013
150   1049          COMPR   A,S                A004
155   104B          JEQ     EXIT               332008
160   104E          STCH    BUFFER,X           57C003
165   1051          TIXR    T                  B850
170   1053          JLT     RLOOP              3B2FEA
175   1056  EXIT    STX     LENGTH             134000
180   1059          RSUB                       4F0000
185   105C  INPUT   BYTE    X'F1'              F1
195   .
200   .     SUBROUTINE TO WRITE RECORD FROM BUFFER
205   .
210   105D  WRREC   CLEAR   X                  B410
212   105F          LDT     LENGTH             774000
215   1062  WLOOP   TD      =X'05'             E32011
220   1065          JEQ     WLOOP              332FFA
225   1068          LDCH    BUFFER,X           53C003
230   106B          WD      =X'05'             DF2008
235   106E          TIXR    T                  B850
240   1070          JLT     WLOOP              3B2FEF
245   1073          RSUB                       4F0000
255                 END     FIRST
      1076  *       =X'05'                     05

Figure 2.10 Program from Fig. 2.9, with object code.
specifies a 1-byte literal with the hexadecimal value 05. The notation used for
literals varies from assembler to assembler; however, most assemblers use
some symbol (as we have used =) to make literal identification easier.
It is important to understand the difference between a literal and an imme-
diate operand. With immediate addressing, the operand value is assembled as
part of the machine instruction. With a literal, the assembler generates the
specified value as a constant at some other memory location. The address of
this generated constant is used as the target address for the machine instruc-
tion. The effect of using a literal is exactly the same as if the programmer had
defined the constant explicitly and used the label assigned to the constant as
the instruction operand. (In fact, the generated object code for lines 45 and 215
in Fig. 2.10 is identical to the object code for the corresponding lines in
Fig. 2.6.) You should compare the object instructions generated for lines 45 and
55 in Fig. 2.10 to make sure you understand how literals and immediate
operands are handled.
All of the literal operands used in a program are gathered together into
one or more literal pools. Normally literals are placed into a pool at the end of
the program. The assembly listing of a program containing literals usually in-
cludes a listing of this literal pool, which shows the assigned addresses and
the generated data values. Such a literal pool listing is shown in Fig. 2.10 im-
mediately following the END statement. In this case, the pool consists of the
single literal =X'05'.
In some cases,however, it is desirable to place literals into a pool at some
other location in the object program. To allow this, we introduce the assembler
directive LTORG (line 93 in Fig. 2.9). When the assembler encounters a LTORG
statement, it creates a literal pool that contains all of the literal operands used
since the previous LTORG (or the beginning of the program). This literal pool
is placed in the object program at the location where the LTORG directive was
encountered (see Fig. 2.10). Of course, literals placed in a pool by LTORG will
not be repeated in the pool at the end of the program.
If we had not used the LTORG statement on line 93, the literal =C'EOF'
would be placed in the pool at the end of the program. This literal pool would
begin at address 1073. This means that the literal operand would be placed too
far away from the instruction referencing it to allow program-counter relative
addressing. The problem, of course, is the large amount of storage reserved for
BUFFER. By placing the literal pool before this buffer, we avoid having to use
extended format instructions when referring to the literals. The need for an as-
sembler directive such as LTORG usually arises when it is desirable to keep
the literal operand close to the instruction that uses it.
Most assemblers recognize duplicate literals—that is, the same literal used
in more than one place in the program—and store only one copy of the speci-
fied data value. For example, the literal =X'05' is used in our program on lines
215 and 230. However, only one data area with this value is generated. Both
instructions refer to the same address in the literal pool for their operand.
The easiest way to recognize duplicate literals is by comparison of the
character strings defining them (in this case, the string =X'05'). Sometimes a
slight additional saving is possible if we look at the generated data value in-
stead of the defining expression. For example, the literals =C'EOF' and
=X'454F46' would specify identical operand values. The assembler might
avoid storing both literals if it recognized this equivalence. However, the bene-
fits realized in this way are usually not great enough to justify the additional
complexity in the assembler.
If we use the character string defining a literal to recognize duplicates, we
must be careful of literals whose value depends upon their location in the pro-
gram. Suppose, for example, that we allow literals that refer to the current
value of the location counter (often denoted by the symbol *). Such literals are
sometimes useful for loading base registers. For example, the statements
LDB   =*
BASE  *
as the first lines of a program would load the beginning address of the pro-
gram into register B. This value would then be available for base relative ad-
dressing.
Such a notation can, however, cause a problem with the detection of dupli-
cate literals. If a literal =* appeared on line 13 of our example program, it
would specify an operand with value 0003. If the same literal appeared on line
55, it would specify an operand with value 0020. In such a case, the literal
operands have identical names; however, they have different values, and both
must appear in the literal pool. The same problem arises if a literal refers to
any other item whose value changes between one point in the program and
another.
Now we are ready to describe how the assembler handles literal operands.
The basic data structure needed is a literal table LITTAB. For each literal used,
this table contains the literal name, the operand value and length, and the ad-
dress assigned to the operand when it is placed in a literal pool. LITTAB is of-
ten organized as a hash table, using the literal name or value as the key.
As each literal operand is recognized during Pass 1, the assembler searches
LITTAB for the specified literal name (or value). If the literal is already present
in the table, no action is needed; if it is not present, the literal is added to LIT-
TAB (leaving the address unassigned). When Pass 1 encounters a LTORG
statement or the end of the program, the assembler makes a scan of the literal
table. At this time each literal currently in the table is assigned an address (un-
less such an address has already been filled in). As these addresses are as-
signed, the location counter is updated to reflect the number of bytes occupied
by each literal.
During Pass 2, the operand address for use in generating object code is ob-
tained by searching LITTAB for each literal operand encountered. The data
values specified by the literals in each literal pool are inserted at the appropri-
ate places in the object program exactly as if these values had been generated
by BYTE or WORD statements. If a literal value represents an address in the
program (for example, a location counter value), the assemblermust also gen-
erate the appropriate Modification record.
To be sure you understand how LITTAB is created and used by the assem-
bler, you may want to apply the procedure we just described to the source
statements in Fig. 2.9. The object code and literal pools generated should be
the same as those in Fig. 2.10.
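To be concrete about the data structure, LITTAB can be modeled as a dictionary keyed by the literal name. This is a sketch of our own (not the book's code); the function names are illustrative.

LITTAB = {}   # literal name -> [value bytes, assigned address or None]

def note_literal(name, value_bytes):            # Pass 1, per literal operand
    if name not in LITTAB:                      # duplicates are ignored
        LITTAB[name] = [value_bytes, None]

def place_pool(locctr):                         # Pass 1, at LTORG or END
    for entry in LITTAB.values():
        if entry[1] is None:                    # not yet placed in a pool
            entry[1] = locctr
            locctr += len(entry[0])             # advance the location counter
    return locctr

note_literal("=C'EOF'", b"EOF")
note_literal("=X'05'", bytes([5]))
note_literal("=X'05'", bytes([5]))              # second use: no new entry
print(hex(place_pool(0x002D)))                  # LOCCTR after the pool: 0x31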
2.3.2 Symbol-Defining Statements

Most assemblers provide an assembler directive that allows the programmer
to define symbols and specify their values. The directive generally used is
EQU, written as

symbol   EQU   value

This statement defines the given symbol (i.e., enters it into SYMTAB) and as-
signs to it the value specified. The value may be given as a constant or as any
expression involving constants and previously defined symbols. We discuss
the formation and use of expressions in the next section.
One common use of EQU is to establish symbolic names that can be used
for improved readability in place of numeric values. For example, on line 133
of the program in Fig. 2.5 we used the statement
+LDT #4096
to load the value 4096 into register T. This value represents the maximum-
length record we could read with subroutine RDREC. The meaning is not,
however, as clear as it might be. If we instead include the statements

MAXLEN   EQU    4096
         +LDT   #MAXLEN

the meaning is immediately apparent.
When the assembler encounters the EQU statement, it enters MAXLEN into
SYMTAB (with value 4096). During assembly of the LDT instruction, the as-
sembler searches SYMTAB for the symbol MAXLEN, using its value as the
operand in the instruction. The resulting object code is exactly the same as in
the original version of the instruction; however, the source statement is easier
to understand. It is also much easier to find and change the value of MAXLEN
if this becomes necessary—we would not have to search through the source
code looking for places where #4096 is used.
Another common use of EQU is in defining mnemonic names for registers.
We have assumed that our assembler recognizes standard mnemonics for reg-
isters—A, X, L, etc. Suppose, however, that the assembler expected register
numbers instead of names in an instruction like RMO. This would require the
programmer to write (for example) RMO 0,1 instead of RMO A,X. In such a
case the programmer could include a sequence of EQU statements like
A      EQU   0
X      EQU   1
L      EQU   2

These statements cause the symbols A, X, and L to be entered into SYMTAB,
with their corresponding register numbers as values. On a machine with
general-purpose registers named R1, R2, etc., by writing statements such as

BASE   EQU   R1
COUNT  EQU   R2
INDEX  EQU   R3
the programmer can establish and use names that reflect the logical function
of the registers in the program.
Another common assembler directive is ORG. This directive has the form

ORG   value

where value is a constant or an expression involving constants and previously
defined symbols. When this statement is encountered during assembly, the
assembler resets its location counter (LOCCTR) to the specified value. Since
the values of symbols entered into SYMTAB are taken from LOCCTR, the ORG
statement can be used to control the assignment of values to symbols.

Suppose, for example, that we are defining a symbol table STAB with 100
entries. In this table, the SYMBOL field contains a 6-byte user-defined symbol; VALUE
is a one-word representation of the value assigned to the symbol; FLAGS is a
2-byte field that specifies symbol type and other information.
We could reserve space for this table with the statement

STAB    RESB   1100

We might want to refer to entries in the table using indexed addressing (plac-
ing in the index register the offset of the desired entry from the beginning of
the table). Of course, we want to be able to refer to the fields SYMBOL,
VALUE, and FLAGS individually, so we must also define these labels. One
way of doing this would be with EQU statements:

SYMBOL  EQU    STAB
VALUE   EQU    STAB+6
FLAGS   EQU    STAB+9

This would allow us to write, for example,

LDA     VALUE,X
to fetch the VALUE field from the table entry indicated by the contents of reg-
ister X. However, this method of definition simply defines the labels; it does
not make the structure of the table as clear as it might be.
We can accomplish the same symbol definition using ORG in the following
way:

STAB    RESB   1100
        ORG    STAB
SYMBOL  RESB   6
VALUE   RESW   1
FLAGS   RESB   2
        ORG    STAB+1100

The first ORG resets the location counter to the value of STAB (i.e., the begin-
ning address of the table). The label on the following RESB statement defines
SYMBOL to have the current value in LOCCTR; this is the same address as-
signed to STAB. LOCCTR is then advanced so the label on the RESW state-
ment assigns to VALUE the address (STAB+6), and so on. The result is a set of
labels with the same values as those defined with the EQU statements above.
This method of definition makes it clear, however, that each entry in STAB
consists of a 6-byte SYMBOL, followed by a one-word VALUE, followed by a
2-byte FLAGS.
The last ORG statement is very important. It sets LOCCTR back to its pre-
vious value—the address of the next unassigned byte of memory after the
table STAB. This is necessary so that any labels on subsequent statements,
which do not represent part of STAB, are assigned the proper addresses. In
some assemblers the previous value of LOCCTR is automatically remembered,
so we can simply write

ORG

(with no value specified) to return to the normal use of LOCCTR.

All symbols used on the right-hand side of an EQU statement must have
been defined previously in the program. Thus, the sequence

ALPHA   RESW   1
BETA    EQU    ALPHA

would be allowed, whereas the sequence

BETA    EQU    ALPHA
ALPHA   RESW   1
would not. The reason for this is the symbol definition process. In the second
example above, BETA cannot be assigned a value when it is encountered dur-
ing Pass 1 of the assembly (because ALPHA does not yet have a value).
However, our two-pass assembler design requires that all symbols be defined
during Pass 1.
A similar restriction applies to ORG: all symbols used to specify the new
location counter value must have been previously defined. Thus, for example,
the sequence
ORG ALPHA
BYTE1 RESB 1
BYTE2 RESB 1
BYTE3 RESB 1
ORG
ALPHA RESB 1
could not be processed. In this case, the assembler would not know (during
Pass 1) what value to assign to the location counter in response to the first
ORG statement. As a result, the symbols BYTE1, BYTE2, and BYTE3 could not
be assigned addresses during Pass 1.
It may appear that this restriction is a result of the particular way in which
we defined the two passes of our assembler. In fact, it is a more general prod-
uct of the forward-reference problem. You can easily see, for example, that the
sequence of statements

ALPHA   EQU    BETA
BETA    EQU    DELTA
DELTA   RESW   1

could not be resolved in two passes, regardless of how the work is divided
between them.
2.3.3 Expressions
Most assemblers allow the use of expressions wherever such a single operand
is permitted. Each such expression must, of course, be evaluated by the
assembler to produce a single operand address or value.
Assemblers generally allow arithmetic expressions formed according to
the normal rules using the operators +, -, *, and /. Division is usually defined
to produce an integer result. Individual terms in the expression may be con-
stants, user-defined symbols, or special terms. The most common such special
term is the current value of the location counter (often designated by *). This
term represents the value of the next unassigned memory location. Thus in
Fig. 2.9 the statement

106   1036  BUFEND  EQU     *

gives BUFEND a value that is the address of the next byte after the buffer area.
In Section 2.2 we discussed the problem of program relocation. We saw
that some values in the object program are relative to the beginning of the pro-
gram, while others are absolute (independent of program location). Similarly,
the values of terms and expressions are either relative or absolute. A constant
is, of course, an absolute term. Labels on instructions and data areas, and ref-
erences to the location counter value, are relative terms. A symbol whose value
is given by EQU (or some similar assembler directive) may be either an ab-
solute term or a relative term depending upon the expression used to define
its value.
Expressions are classified as either absolute expressions or relative expres-
sions, depending upon the type of value they produce. An expression that
contains only absolute terms is, of course, absolute. An expression may also
be absolute if it contains relative terms, provided the relative terms occur in
pairs and the terms in each such pair have opposite signs. A relative expres-
sion is one in which all of the relative terms except one can be paired in this
way; the remaining unpaired relative term must have a positive sign.
Although these rules may seem arbitrary, they are actually quite
reasonable. The expressions that are legal under these definitions include ex-
actly those expressions whose value remains meaningful when the program is
relocated. A relative term or expression represents some value that may be
written as (S + r), where S is the starting address of the program and r is the
value of the term or expression relative to the starting address. Thus a relative
term usually represents some location within the program. When relative
terms are paired with opposite signs, the dependency on the program starting
address is canceled out; the result is an absolute value. Consider, for example,
the program of Fig. 2.9. In the statement

107   1000  MAXLEN  EQU     BUFEND-BUFFER

both BUFEND and BUFFER are relative terms, each representing an address
within the program. However, the expression represents an absolute value: the
difference between the two addresses, which is the length of the buffer area in
bytes. Notice that the assembler listing in Fig. 2.10 shows the value calculated
for this expression (hexadecimal 1000) in the Loc column. This value does not
represent an address, as do most of the other entries in that column. However,
it does show the value that is associated with the symbol that appears in the
source statement (MAXLEN).
Expressions such as BUFEND + BUFFER, 100 - BUFFER, or 3 * BUFFER
represent neither absolute values nor locations within the program. The values
of these expressions depend upon the program starting address in a way that
is unrelated to anything within the program itself. Because such expressions
are very unlikely to be of any use, they are considered errors.
To determine the type of an expression, we must keep track of the types of
all symbols defined in the program. For this purpose we need a flag in the
symbol table to indicate type of value (absolute or relative) in addition to the
value itself. Thus for the program of Fig. 2.10, some of the symbol table entries
might be

Symbol    Type    Value
BUFFER    R       0036
BUFEND    R       1036
MAXLEN    A       1000
With this information the assembler can easily determine the type of each ex-
pression used as an operand and generate Modification records in the object
program for relative values.
In Section 2.3.5 we consider programs that consist of several parts that can
be relocated independently of each other. As we discuss in that section,
our rules for determining the type of an expression must be modified in such
instances.
2.3.4 Program Blocks

In all of the examples we have seen so far the program being assembled was
treated as a unit. The source programs logically contained subroutines, data
areas, etc. However, they were handled by the assembler as one entity, result-
ing in a single block of object code. Within this object program the generated
machine instructions and data appeared in the same order as they were writ-
ten in the source program.
Program blocks refer to segments of code that are rearranged within a single
object program unit. Figure 2.11 shows our example program revised to make
use of program blocks. The assembler directive USE indicates which portions of the source pro-
gram belong to the various blocks. At the beginning of the program, state-
ments are assumed to be part of the unnamed (default) block; if no USE
statements are included, the entire program belongs to this single block. The
USE statement on line 92 signals the beginning of the block named CDATA.
Source statements are associated with this block until the USE statement on
line 103, which begins the block named CBLKS. The USE statement may also
indicate a continuation of a previously begun block. Thus the statement on
line 123 resumes the default block, and the statement on line 183 resumes the
block named CDATA.
As we can see, each program block may actually contain several separate
segments of the source program. The assembler will (logically) rearrange these
segments to gather together the pieces of each block. These blocks will then be
assigned addresses in the object program, with the blocks appearing in the
same order in which they were first begun in the source program. The result is
the same as if the programmer had physically rearranged the source statements
to group together all the source lines belonging to each block.
[The first portion of Fig. 2.11 (lines 5-105) is not reproduced here; the surviving portion follows.]

106           BUFEND  EQU    *           FIRST LOCATION AFTER BUFFER
107           MAXLEN  EQU    BUFEND-BUFFER   MAXIMUM RECORD LENGTH
110   .
115   .       SUBROUTINE TO READ RECORD INTO BUFFER
120   .
123           USE
125   RDREC   CLEAR   X           CLEAR LOOP COUNTER
130           CLEAR   A           CLEAR A TO ZERO
132           CLEAR   S           CLEAR S TO ZERO
133           +LDT    #MAXLEN
135   RLOOP   TD      INPUT       TEST INPUT DEVICE
140           JEQ     RLOOP       LOOP UNTIL READY
145           RD      INPUT       READ CHARACTER INTO REGISTER A
150           COMPR   A,S         TEST FOR END OF RECORD (X'00')
155           JEQ     EXIT        EXIT LOOP IF EOR
160           STCH    BUFFER,X    STORE CHARACTER IN BUFFER
165           TIXR    T           LOOP UNLESS MAX LENGTH
170           JLT     RLOOP         HAS BEEN REACHED
175   EXIT    STX     LENGTH      SAVE RECORD LENGTH
180           RSUB                RETURN TO CALLER
183           USE     CDATA
185   INPUT   BYTE    X'F1'       CODE FOR INPUT DEVICE
195   .
200   .       SUBROUTINE TO WRITE RECORD FROM BUFFER
205   .
208           USE
210   WRREC   CLEAR   X           CLEAR LOOP COUNTER
212           LDT     LENGTH
215   WLOOP   TD      =X'05'      TEST OUTPUT DEVICE
220           JEQ     WLOOP       LOOP UNTIL READY
225           LDCH    BUFFER,X    GET CHARACTER FROM BUFFER
230           WD      =X'05'      WRITE CHARACTER
235           TIXR    T           LOOP UNTIL ALL CHARACTERS
240           JLT     WLOOP         HAVE BEEN WRITTEN
245           RSUB                RETURN TO CALLER
252           USE     CDATA
253           LTORG
255           END     FIRST

Figure 2.11 Example of a program with multiple program blocks.
The assembler accomplishes this logical rearrangement of code by main-
taining, during Pass 1, a separate location counter for each program block. The
location counter for a block is initialized to 0 when the block is first begun. The
current value of this location counter is saved when switching to another
block, and the saved value is restored when resuming a previous block. Thus
during Pass 1 each label in the program is assigned an address that is relative
to the start of the block that contains it. When labels are entered into the sym-
bol table, the block name or number is stored along with the assigned relative
address. At the end of Pass 1 the latest value of the location counter for each
block indicates the length of that block. The assembler can then assign to each
block a starting address in the object program (beginning with relative loca-
tion 0).
For code generation during Pass 2, the assembler needs the address for
each symbol relative to the start of the object program (not the start of an indi-
vidual program block). This is easily found from the information in SYMTAB.
The assembler simply adds the location of the symbol, relative to the start of
its block, to the assigned block starting address.
Figure 2.12 demonstrates this process applied to our sample program. The
column headed Loc/Block shows the relative address (within a program
block) assigned to each source line and a block number indicating which pro-
gram block is involved (0 = default block, 1 = CDATA, 2 = CBLKS). This is es-
sentially the same information that is stored in SYMTAB for each symbol.
Notice that the value of the symbol MAXLEN (line 107) is shown without a
block number. This indicates that MAXLEN is an absolute symbol, whose
value is not relative to the start of any program block.
At the end of Pass 1 the assembler constructs a table that contains the start-
ing addresses and lengths for all blocks. For our sample program, this table
looks like

Block name    Block number    Address    Length
(default)     0               0000       0066
CDATA         1               0066       000B
CBLKS         2               0071       1000
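Using this table, the Pass 2 address computation described above is a one-line calculation. The Python sketch below is our own illustration; the table values are those shown above.

BLOCKS = {0: 0x0000, 1: 0x0066, 2: 0x0071}    # block number -> starting address

def absolute_address(rel_addr, block):
    """Address of a symbol relative to the start of the object program."""
    return BLOCKS[block] + rel_addr

# LENGTH is at relative location 0003 in block 1 (CDATA):
print("%04X" % absolute_address(0x0003, 1))   # 0069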
[Figure 2.12 (Program from Fig. 2.11, with object code) is reproduced here only in part:]

200   .                 SUBROUTINE TO WRITE RECORD FROM BUFFER
205   .
208   004D  0           USE
210   004D  0  WRREC    CLEAR   X           B410
212   004F  0           LDT     LENGTH      772017
215   0052  0  WLOOP    TD      =X'05'      E3201B
220   0055  0           JEQ     WLOOP       332FFA
225   0058  0           LDCH    BUFFER,X    53A016
230   005B  0           WD      =X'05'      DF2012
235   005E  0           TIXR    T           B850
240   0060  0           JLT     WLOOP       3B2FEF
245   0063  0           RSUB                4F0000
252   0007  1           USE     CDATA
253                     LTORG
      0007  1  *        =C'EOF'             454F46
      000A  1  *        =X'05'              05
255                     END     FIRST
To see how these addresses are used during code generation, consider the
instruction on line 20 (LDA LENGTH). SYMTAB shows the value of the operand
(the symbol LENGTH) as relative lo-
cation 0003 within program block 1 (CDATA). The starting address for CDATA
is 0066. Thus the desired target address for this instruction is 0003 + 0066 =
0069. The instruction is to be assembled using program-counter relative ad-
dressing. When the instruction is executed, the program counter contains the
address of the following instruction (line 25). The address of this instruction is
relative location 0009 within the default block. Since the default block starts at
location 0000, this address is simply 0009. Thus the required displacement is
0069 - 0009 = 0060. The calculation of the other addresses during Pass 2 follows a
similar pattern.
We can immediately see that the separation of the program into blocks has
considerably reduced our addressing problems. Because the large buffer area
is moved to the end of the object program, we no longer need to use extended
format instructions on lines 15, 35, and 65. Furthermore, the base register is no
longer necessary; we have deleted the LDB and BASE statements previously
on lines 13 and 14. The problem of placement of literals (and literal references)
in the program is also much more easily solved. We simply include a LTORG
statement in the CDATA block to be sure that the literals are placed ahead of
any large data areas.
Of course the use of program blocks has not accomplished anything we
could not have done by rearranging the statements of the source program. For
example, program readability is often improved if the definitions of data areas
are placed in the source program close to the statements that reference them.
This could be accomplished in a long subroutine (without using program
blocks) by simply inserting data areas in any convenient position. However,
the programmer would need to provide jump instructions to branch around
the storage thus reserved.
In the situation just discussed, machine considerations suggested that the
parts of the object program appear in memory in a particular order. On the
other hand, human factors suggested that the source program should be in a
different order. The use of program blocks is one way of satisfying both of
these requirements, with the assembler providing the required reorganization.
It is not necessary to physically rearrange the generated code in the object
program to place the pieces of each program block together. The assembler can
simply write the object code as it is generated during Pass 2 and insert the
proper load address in each Text record. These load addresses will, of course,
reflect the starting address of the block as well as the relative location of the
code within the block. This process is illustrated in Fig. 2.13. The first two Text
records are generated from the source program lines 5 through 70. When the
USE statement on line 92 is recognized, the assembler writes out the current
Text record (even though there is still room left in it). The assembler then pre-
pares to begin a new Text record for the new program block. As it happens, the
statements on lines 95 through 105 result in no generated code, so no new Text
H^COPY ^000000^001071
T^000000^1E^172063^4B2021^032060^290000^332006^4B203B^3F2FEE^032055^0F2056^010003
T^00001E^09^0F2048^4B2029^3E203F
T^000027^1D^B410^B400^B440^75101000^E32038^332FFA^DB2032^A004^332008^57A02F^B850
T^000044^09^3B2FEA^13201F^4F0000
T^00006C^01^F1
T^00004D^19^B410^772017^E3201B^332FFA^53A016^DF2012^B850^3B2FEF^4F0000
T^00006D^04^454F46^05
E^000000

Figure 2.13 Object program corresponding to Fig. 2.11.
records are created. The next two Text records come from lines 125 through
180. This time the statements that belong to the next program block do result
in the generation of object code. The fifth Text record contains the single byte
of data from line 185. The sixth Text record resumes the default program block.
[Figure 2.14 Program blocks from Fig. 2.11 traced through the assembly and
loading processes (diagram not reproduced)]
2.3.5 Control Sections and Program Linking

A control section is a part of the program that maintains its identity after
assembly; each such control section can be loaded and relocated indepen-
dently of the others. Control sections are most often used for subroutines or
other logical subdivisions of a program, because the programmer can
assemble, load, and manipulate each of these control sections separately. The
resulting flexibility means, however, that references between control sections
require special handling, called linking.

Figure 2.15 shows our example program as it might be written using multi-
ple control sections. In this case there are three control sections: one for the
main program and one for each subroutine. The START statement identifies
the beginning of the assembly and gives a name (COPY) to the first control
section. The first section continues until the CSECT statement on line 109. This
assembler directive signals the start of a new control section named RDREC.
Similarly, the CSECT statement on line 193 begins the control section named
WRREC. The assembler establishes a separate location counter (beginning at
0) for each control section, just as it does for program blocks.
Control sections differ from program blocks in that they are handled sepa-
rately by the assembler. (It is not even necessary for all control sections in a
program to be assembled at the same time.) Symbols that are defined in one
control section may not be used directly by another control section; they must
be identified as external references for the loader to handle. Figure 2.15 shows
the use of two assembler directives to identify such references: EXTDEF (exter-
nal definition) and EXTREF (external reference). The EXTDEF statement in a
control section names symbols, called external symbols, that are defined in this
control section and may be used by other sections. Control section names (in
this case COPY, RDREC, and WRREC) do not need to be named in an EXTDEF
statement because they are automatically considered to be external symbols.
The EXTREF statement names symbols that are used in this control section
and are defined elsewhere. For example, the symbols BUFFER, BUFEND, and
LENGTH are defined in the control section named COPY and made available
to the other sections by the EXTDEF statement on line 6. The third control sec-
tion (WRREC) uses two of these symbols, as specified in its EXTREF statement
(line 207). The order in which symbols are listed in the EXTDEF and EXTREF
statements is not significant.
Now we are ready to look at how external references are handled by the
assembler. Figure 2.16 shows the generated object code for each statement in
the program. Consider first the instruction

15    0003  CLOOP   +JSUB   RDREC         4B100000

The operand (RDREC) is named in the EXTREF statement for the control sec-
tion, so this is an external reference. The assembler has no idea where the con-
trol section containing RDREC will be loaded, so it cannot assemble the
address for this instruction. Instead the assembler inserts an address of zero
and passes information to the loader, which will cause the proper address to
be inserted at load time. The address of RDREC will have no predictable rela-
tionship to anything in this control section; therefore relative addressing is not
possible. Thus an extended format instruction must be used to provide room
for the actual address to be inserted. This is true of any instruction whose
operand involves an external reference.
[Figure 2.15 (illustration of control sections and program linking) is reproduced here only in part; the WRREC control section follows.]

193   WRREC   CSECT
195   .
200   .       SUBROUTINE TO WRITE RECORD FROM BUFFER
205   .
207           EXTREF  LENGTH,BUFFER
210           CLEAR   X           CLEAR LOOP COUNTER
212           +LDT    LENGTH
215   WLOOP   TD      =X'05'      TEST OUTPUT DEVICE
220           JEQ     WLOOP       LOOP UNTIL READY
225           +LDCH   BUFFER,X    GET CHARACTER FROM BUFFER
230           WD      =X'05'      WRITE CHARACTER
235           TIXR    T           LOOP UNTIL ALL CHARACTERS
240           JLT     WLOOP         HAVE BEEN WRITTEN
245           RSUB                RETURN TO CALLER
255           END     FIRST

Figure 2.15 Illustration of control sections and program linking.
[Figure 2.16 (Program from Fig. 2.15, with object code) is reproduced here only in part; the COPY control section follows.]

Line  Loc   Source statement                     Object code

5     0000  COPY    START   0
6           EXTDEF  BUFFER,BUFEND,LENGTH
7           EXTREF  RDREC,WRREC
10    0000  FIRST   STL     RETADR               172027
15    0003  CLOOP   +JSUB   RDREC                4B100000
20    0007          LDA     LENGTH               032023
25    000A          COMP    #0                   290000
30    000D          JEQ     ENDFIL               332007
35    0010          +JSUB   WRREC                4B100000
40    0014          J       CLOOP                3F2FEC
45    0017  ENDFIL  LDA     =C'EOF'              032016
50    001A          STA     BUFFER               0F2016
55    001D          LDA     #3                   010003
60    0020          STA     LENGTH               0F200A
65    0023          +JSUB   WRREC                4B100000
70    0027          J       @RETADR              3E2000
95    002A  RETADR  RESW    1
100   002D  LENGTH  RESW    1
103                 LTORG
      0030  *       =C'EOF'                      454F46
105   0033  BUFFER  RESB    4096
106   1033  BUFEND  EQU     *
107   1000  MAXLEN  EQU     BUFEND-BUFFER

The handling of the data word defined on line 190 (MAXLEN WORD
BUFEND-BUFFER, in control section RDREC)
is only slightly different. Here the value of the data word to be generated is
specified by an expression involving two external references: BUFEND and
BUFFER. As before, the assembler stores this value as zero. When the program
is loaded, the loader will add to this data area the address of BUFEND and
subtract from it the address of BUFFER, which results in the desired value.
Note the difference between the handling of the expression on line 190 and
the similar expression on line 107. The symbols BUFEND and BUFFER are
defined in the same control section with the EQU statement on line 107. Thus
the value of the expression can be calculated immediately by the assembler.
This could not be done for line 190; BUFEND and BUFFER are defined in an-
other control section, so their values are unknown at assembly time.
As we can see from the above discussion, the assembler must remember
(via entries in SYMTAB) in which control section a symbol is defined. Any
attempt to refer to a symbol in another control section must be flagged as an
error unless the symbol is identified (using EXTREF) as an external reference.
The assembler must also allow the same symbol to be used in different control
sections. For example, the conflicting definitions of MAXLEN on lines 107 and
190 should cause no problem. A reference to MAXLEN in the control section
COPY would use the definition on line 107, whereas a reference to MAXLEN
in RDREC would use the definition on line 190.
So far we have seen how the assembler leaves room in the object code for
the values of external symbols. The assembler must also include information
in the object program that will cause the loader to insert the proper values
where they are required. We need two new record types in the object program
and a change in a previously defined record type. As before, the exact format
of these records is arbitrary; however, the same information must be passed to
the loader in some form.
The two new record types are Define and Refer. A Define record gives in-
formation about external symbols that are defined in this control section—that
is, symbols named by EXTDEF. A Refer record lists symbols that are used as
external references by the control section—that is, symbols named by EXTREF.
The formats of these records are as follows.
Define record:
Col. 1       D
Col. 2-7     Name of external symbol defined in this control section
Col. 8-13    Relative address of symbol within this control section (hexadecimal)
Col. 14-73   Repeat information in Col. 2-13 for other external symbols

Refer record:
Col. 1       R
Col. 2-7     Name of external symbol referred to in this control section
Col. 8-73    Names of other external reference symbols

The Modification record is also revised to allow for external references:

Modification record (revised):
Col. 1       M
Col. 2-7     Starting address of the field to be modified, relative to the beginning of the control section (hexadecimal)
Col. 8-9     Length of the field to be modified, in half-bytes (hexadecimal)
Col. 10      Modification flag (+ or -)
Col. 11-16   External symbol whose value is to be added to or subtracted from the indicated field
The first three items in this record are the same as previously discussed. The
two new items specify the modification to be performed: adding or subtract-
ing the value of some external symbol. The symbol used for modification may
be defined either in this control section or in another one.
Figure 2.17 shows the object program corresponding to the source in Fig.
2.16. Notice that there is a separate set of object program records (from Header
through End) for each control section. The records for each control section are
exactly the same as they would be if the sections were assembled separately.
The Define and Refer records for each control section include the symbols
named in the EXTDEF and EXTREF statements. In the case of Define, the
record also indicates the relative address of each external symbol within the
control section. For EXTREF symbols, no address information is available.
These symbols are simply named in the Refer record.
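As a rough illustration of how an assembler might emit these records, the following Python sketch formats Define and Refer records in the layout just described. The helper names and the blank-padded six-character symbol fields are assumptions for this sketch.

    # Sketch of emitting Define and Refer records for one control
    # section. Record layouts follow the formats given in the text.
    def define_record(extdef_symbols, symtab, csect):
        # e.g. D BUFFER000033 BUFEND001033 LENGTH00002D (no spaces)
        parts = ["D"]
        for sym in extdef_symbols:
            addr = symtab[(csect, sym)]
            parts.append("%-6s%06X" % (sym, addr))
        return "".join(parts)

    def refer_record(extref_symbols):
        return "R" + "".join("%-6s" % s for s in extref_symbols)

    symtab = {("COPY", "BUFFER"): 0x000033,
              ("COPY", "BUFEND"): 0x001033,
              ("COPY", "LENGTH"): 0x00002D}
    print(define_record(["BUFFER", "BUFEND", "LENGTH"], symtab, "COPY"))
    print(refer_record(["RDREC", "WRREC"]))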
HCOPY 000000001033
DBUFFER000033BUFEND001033LENGTH00002D
RRDREC WRREC
T0000001D1720274B1000000320232900003320074B1000003F2FEC0320160F2016
T00001D0D0100030F200A4B1000003E2000
T00003003454F46
M00000405+RDREC
M00001105+WRREC
M00002405+WRREC
E000000

HRDREC 00000000002B
RBUFFERLENGTHBUFEND
T0000001DB410B400B44077201FE3201B332FFADB2015A00433200957900000B850
T00001D0E3B2FE9131000004F0000F1000000
M00001805+BUFFER
M00002105+LENGTH
M00002806+BUFEND
M00002806-BUFFER
E

HWRREC 00000000001C
RLENGTHBUFFER
T0000001CB41077100000E32012332FFA53900000DF2008B8503B2FEE4F000005
M00000305+LENGTH
M00000D05+BUFFER
E

Figure 2.17 Object program corresponding to Fig. 2.16.
The Modification record M00000405+RDREC, for example, directs the loader
to insert the address of RDREC into the operand field of the +JSUB instruction
on line 15; the next two Modification records in COPY do the same with the
address of WRREC for the instructions on lines 35 and 65. Likewise, the first
Modification record in control section RDREC fills in the proper address for
the external reference on line 160.
The handling of the data word generated by line 190 is only slightly differ-
ent. The value of this word is to be BUFEND-BUFFER, where both BUFEND
and BUFFER are defined in another control section. The assembler generates
an initial value of zero for this word (located at relative address 0028 within
control section RDREC). The last two Modification records in RDREC direct
that the address of BUFEND be added to this field, and the address of
BUFFER be subtracted from it. This computation, performed at load time, re-
sults in the desired value for the data word.
In Chapter 3 we discuss in detail how the required modifications are per-
formed by the loader. At this time, however, you should be sure that you un-
derstand the concepts involved in the linking process. You should carefully
examine the other Modification records in Fig. 2.17, and reconstruct for your-
self how they were generated from the source program statements.
Note that the revised Modification record may still be used to perform pro-
gram relocation. In the case of relocation, the modification required is adding
the beginning address of the control section to certain fields in the object pro-
gram. The symbol used as the name of the control section has as its value the
required address. Since the control section name is automatically an external
symbol, it is available for use in Modification records. Thus, for example, the
Modification records from Fig. 2.8 are changed from
M00000705
M00001405
M00002705
to
M00000705+COPY
M00001405+COPY
M00002705+COPY
In this way, exactly the same mechanism can be used for program relocation
and for program linking. There are more examples in the next chapter.
The existence of multiple control sections that can be relocated indepen-
dently of one another makes the handling of expressions slightly more compli-
cated. Our earlier definitions required that all of the relative terms in an
expression be paired (for an absolute expression), or that all except one be
paired (for a relative expression). We must now extend this restriction to spec-
ify that both terms in each pair must be relative within the same control sec-
tion. The reason is simple—if the two terms represent relative locations in the
same control section, their difference is an absolute value (regardless of where
the control section is located). On the other hand, if they are in different con-
trol sections, their difference has a value that is unpredictable (and therefore
probably useless). For example, the expression

    BUFEND-BUFFER

has as its value the length of BUFFER in bytes. On the other hand, the value of
the expression
RDREC-COPY
is the difference in the load addresses of the two control sections. This value
depends on the way run-time storage is allocated; it is unlikely to be of any
use whatsoever to an application program.
When an expression involves external references, the assembler cannot in
general determine whether or not the expression is legal. The pairing of rela-
tive terms to test legality cannot be done without knowing which of the terms
occur in the same control sections, and this is unknown at assembly time. In
such a case, the assembler evaluates all of the terms it can, and combines these
to form an initial value; Modification records are then generated so that the
loader can finish the evaluation when the addresses of all control sections are
known.
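The pairing test itself is mechanical. The following Python sketch (an illustration, not the book's algorithm) classifies an expression from a list of its terms, treating a pair of relative terms as legal only when both belong to the same control section:

    # Sketch: decide whether an expression is absolute, simply
    # relative, or illegal, by pairing relative terms per control
    # section. Terms are (sign, kind, csect) tuples; illustrative only.
    from collections import Counter

    def classify(terms):
        balance = Counter()
        for sign, kind, csect in terms:
            if kind == "relative":
                balance[csect] += 1 if sign == "+" else -1
        unpaired = [c for c, n in balance.items() if n != 0]
        total = sum(balance.values())
        if total == 0 and not unpaired:
            return "absolute"
        if total == 1 and len(unpaired) == 1:
            return "relative"
        return "illegal (or must be deferred to load time)"

    # BUFEND-BUFFER, both relative in the same section: absolute
    print(classify([("+", "relative", "RDREC"), ("-", "relative", "RDREC")]))
    # RDREC-COPY, relative terms in different sections: illegal
    print(classify([("+", "relative", "RDREC"), ("-", "relative", "COPY")]))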
An example should help to make this process clear. Figure 2.19(a) shows
the object code and symbol table entries as they would be after scanning line
40 of the program in Fig. 2.18. The first forward reference occurred on line 15.
Since the operand (RDREC) was not yet defined, the instruction was assem-
bled with no value assigned as the operand address (denoted in the figure by
----). RDREC was then entered into SYMTAB as an undefined symbol (indi-
cated by *); the address of the operand field of the instruction (2013) was in-
serted in a list associated with RDREC. A similar process was followed with
the instructions on lines 30 and 35.
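The list handling just described can be sketched in a few lines of Python (names and addresses are illustrative; a real load-and-go assembler would patch instruction operand fields in memory rather than a dictionary):

    # Sketch of forward-reference handling in a one-pass assembler:
    # an undefined symbol keeps a list of operand-field addresses to
    # patch when its definition appears.
    symtab = {}     # name -> value, or None if not yet defined
    fwd_refs = {}   # name -> list of addresses awaiting the value
    memory = {}     # assembled code: operand-field addr -> value

    def use_symbol(name, operand_addr):
        if symtab.get(name) is not None:
            memory[operand_addr] = symtab[name]   # already defined
        else:
            symtab.setdefault(name, None)         # undefined, flagged *
            fwd_refs.setdefault(name, []).append(operand_addr)
            memory[operand_addr] = None           # shown as ---- in Fig. 2.19

    def define_symbol(name, value):
        symtab[name] = value
        for addr in fwd_refs.pop(name, []):       # walk the list and
            memory[addr] = value                  # patch each operand

    use_symbol("RDREC", 0x2013)      # line 15: operand not yet known
    define_symbol("RDREC", 0x203D)   # line 125: the list is patched
    print(hex(memory[0x2013]))       # 0x203d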
Now consider Fig. 2.19(b), which corresponds to the situation after scan-
ning line 160. Some of the forward references have been resolved by this time,
while others have been added. When the symbol ENDFIL was defined (line
45), the assembler placed its value in the SYMTAB entry; it then inserted this
value into the instruction operand field (at address 201C) as directed by the
forward reference list. From this point on, any references to ENDFIL would
not be forward references, and would not be entered into a list. Similarly, the
definition of RDREC (line 125) resulted in the filling in of the operand address
at location 2013. Meanwhile, two new forward references have been added: to
WRREC (line 65) and EXIT (line 155). You should continue tracing through
this process to the end of the program to show yourself that all of the forward
references are resolved properly.
Figure 2.19(a) Object code in memory and symbol table entries for
the program in Fig. 2.18 after scanning line 40.
Figure 2.19(b) Object code in memory and symbol table entries for
the program in Fig. 2.18 after scanning line 160.
useful when the external storage is slow or is inconvenient to use for some
other reason. One-pass assemblers that produce object programs follow a
slightly different procedure from that previously described. Forward refer-
ences are entered into lists as before. Now, however, when the definition of a
symbol is encountered, instructions that made forward references to that sym-
bol may no longer be available in memory for modification. In general, they
will already have been written out as part of a Text record in the object pro-
gram. In this case the assembler must generate another Text record with the
correct operand address. When the program is loaded, this address will be in-
serted into the instruction by the action of the loader.
Figure 2.20 illustrates this process. The second Text record contains the ob-
ject code generated from lines 10 through 40 in Fig. 2.18. The operand ad-
dresses for the instructions on lines 15, 30, and 35 have been generated as
0000. When the definition of ENDFIL on line 45 is encountered, the assembler
generates the third Text record. This record specifies that the value 2024 (the
address of ENDFIL) is to be loaded at location 201C (the operand address field
of the JEQ instruction on line 30). When the program is loaded, therefore, the
value 2024 will replace the 0000 previously loaded. The other forward refer-
ences in the program are handled in exactly the same way. In effect, the ser-
vices of the loader are being used to complete forward references that could
not be handled by the assembler. Of course, the object program records must
be kept in their original order when they are presented to the loader.
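A sketch of this technique in Python (the record layout follows Section 2.1.1; the helper names are invented): when a definition arrives after the instruction's Text record has already been written, the assembler simply emits an extra one-word Text record that the loader will apply.

    # Sketch: emit a patch Text record for a late-defined symbol
    # instead of modifying object code already written out.
    emitted = []

    def text_record(start, object_code_hex):
        length = len(object_code_hex) // 2
        emitted.append("T%06X%02X%s" % (start, length, object_code_hex))

    def patch_record(operand_addr, value):
        # e.g. T00201C022024: load 2024 at location 201C (2 bytes)
        text_record(operand_addr, "%04X" % value)

    text_record(0x200F, "141009480000")   # JSUB operand still 0000
    patch_record(0x201C, 0x2024)          # definition of ENDFIL found
    for rec in emitted:
        print(rec)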
In this section we considered only simple one-pass assemblers that han-
dled absolute programs. Instruction operands were assumed to be single sym-
bols, and the assembled instructions contained the actual (not relative)
addresses of the operands. More advanced assembler features such as literals
HCOPY 00100000107A
T00100009454F46000003000000
T00200F1514100948000000100C2810063000004800003C2012
T00201C022024
T002024190010000C100F0010030C100C64800000810094C0000F1001000
T00201302203D
T00203D1E041030001030E0205D30203FD8205D2810303020575490392C205D
T00205D1006E02039302043D8203928100630000054900F2C203A382043
T002050022058
T00205B0710100C4C000005
T00201F022062
T002031022062
T00206218041006E0206130206550900FDC20612C100C3820654C000005
E00200F

Figure 2.20 Object program from one-pass assembler for the program in Fig. 2.18.
were not allowed. You are encouraged to think about ways of removing some
of these restrictions (see the Exercises for this section for some suggestions).
MAXLEN has not yet been defined, so no value for HALFSZ can be com-
puted. The defining expression for HALFSZ is stored in the symbol table in
place of its value. The entry &1 indicates that one symbol in the defining ex-
pression is undefined. In an actual implementation, of course, this definition
might be stored at some other location. SYMTAB would then simply contain a
pointer to the defining expression. The symbol MAXLEN is also entered in the
symbol table, with the flag * identifying it as undefined. Associated with this
entry is a list of the symbols whose values depend on MAXLEN (in this case,
HALFSZ). (Note the similarity to the way we handled forward references in a
one-pass assembler.)
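The following Python sketch illustrates this dependency bookkeeping. It stores the defining expression together with the &N count and re-evaluates dependents as symbols become defined; the use of eval and the sample values are purely illustrative.

    # Sketch of the multi-pass EQU technique: an undefined symbol
    # keeps a list of dependents; a dependent entry stores its
    # defining expression and a count (&N) of undefined symbols.
    values = {}                   # fully defined symbols
    pending = {}                  # sym -> (expr, n_undefined)
    dependents = {}               # undefined sym -> [waiting syms]

    def define_equ(sym, expr, undefined_syms):
        if not undefined_syms:
            values[sym] = eval(expr, {}, values)
            resolve(sym)
        else:
            pending[sym] = (expr, len(undefined_syms))
            for u in undefined_syms:
                dependents.setdefault(u, []).append(sym)

    def resolve(sym):
        # sym just became defined: re-examine everything waiting on it
        for dep in dependents.pop(sym, []):
            expr, n = pending[dep]
            if n == 1:
                del pending[dep]
                values[dep] = eval(expr, {}, values)
                resolve(dep)                      # may cascade
            else:
                pending[dep] = (expr, n - 1)

    define_equ("HALFSZ", "MAXLEN//2", ["MAXLEN"])                # &1
    define_equ("MAXLEN", "BUFEND-BUFFER", ["BUFEND", "BUFFER"])  # &2
    define_equ("BUFEND", "4096+51", [])   # illustrative values
    define_equ("BUFFER", "51", [])
    print(values)                         # all four symbols now defined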
Figure 2.21 Example of multi-pass assembler operation.
This section describes some of the features of the Microsoft MASM assembler
for Pentium and other x86 systems. Further information about MASM can be
found in Barkakati (1992).
As we discussed in Section 1.4.2, the programmer of an x86 system views
memory as a collection of segments. An MASM assembler language program
is written as a collection of segments. Each segment is defined as belonging to
a particular class, corresponding to its contents. Commonly used classes are
CODE, DATA, CONST, and STACK.
During program execution, segments are addressed via the x86 segment
registers. In most cases, code segments are addressed using register CS, and
stack segments are addressed using register SS. These segment registers are
automatically set by the system loader when a program is loaded for execu-
tion. Register CS is set to indicate the segment that contains the starting label
specified in the END statement of the program. Register SS is set to indicate
the last stack segment processed by the loader.
Data segments (including constant segments) are normally addressed us-
ing DS, ES, FS, or GS. The segment register to be used can be specified explic-
itly by the programmer (by writing it as part of the assembler language
instruction). If the programmer does not specify a segment register, one is se-
lected by the assembler.
By default, the assembler assumes that all references to data segments use
register DS. This assumption can be changed by the assembler directive
ASSUME. For example, the directive
ASSUME ES:DATASEG2
would set ES to indicate the data segment DATASEG2. Notice the similarities
between the ASSUME directive and the BASE directive we discussed for
SIC/XE. The BASE directive tells a SIC/XE assembler the contents of register
B; the programmer must provide executable instructions to load this value
into the register. Likewise, ASSUME tells MASM the contents of a segment
register; the programmer must provide instructions to load this register when
the program is executed.
JMP TARGET
If the definition of the label TARGET occurs in the program before the JMP in-
struction, the assembler can tell whether this is a near jump or a far jump.
However, if this is a forward reference to TARGET, the assembler does not
know how many bytes to reserve for the instruction.
By default, MASM assumes that a forward jump is a near jump. If the tar-
get of the jump is in another code segment, the programmer must warn the
assembler by writing

    JMP    FAR PTR TARGET

If the jump address is within 128 bytes of the current instruction, the program-
mer can specify the shorter (2-byte) jump by writing

    JMP    SHORT TARGET
If the JMP to TARGET is a far jump, and the programmer does not specify FAR
PTR, a problem occurs. During Pass 1, the assembler reserves 3 bytes for the
jump instruction. However, the actual assembled instruction requires 5 bytes.
jump instruction. However, the actual assembled instruction requires 5 bytes.
In the earlier versions of MASM, this caused an assembly error (called a phase
error). In later versions of MASM, the assembler can repeat Pass 1 to generate
the correct location counter values.
Notice the similarities between the far jump and the forward references in
SIC/XE that require the use of extended format instructions.
There are also many other situations in which the length of an assembled
instruction depends on the operands that are used. For example, the operands
of an ADD instruction may be registers, memory locations, or immediate
operands. Immediate operands may occupy from 1 to 4 bytes in the instruc-
tion. An operand that specifies a memory location may take varying amounts
of space in the instruction, depending upon the location of the operand.
This means that Pass 1 of an x86 assembler must be considerably more
complex than Pass 1 of a SIC assembler. The first pass of the x86 assembler
must analyze the operands of an instruction, in addition to looking at the op-
eration code. The operation code table must also be more complicated, since it
must contain information on which addressing modes are valid for each
operand.
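The flavor of such a Pass 1 can be suggested with a toy length estimator in Python. The byte counts below are simplified placeholders, not an exact x86 encoding table; the point is only that the operands, not just the operation code, determine the instruction length.

    # Sketch of a Pass 1 length estimate for an ADD-like x86
    # instruction. Byte counts are simplified placeholders.
    def add_length(dest, src):
        length = 2                        # opcode + ModR/M (simplified)
        if src.startswith("#"):           # immediate operand
            value = int(src[1:])
            length += 1 if -128 <= value <= 127 else 4
        elif src.startswith("["):         # memory operand
            length += 4                   # displacement (worst case)
        return length

    print(add_length("EAX", "#5"))        # short immediate -> 3
    print(add_length("EAX", "#100000"))   # long immediate  -> 6
    print(add_length("EAX", "[TOTAL]"))   # memory operand  -> 6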
Segments in an MASM source program can be written in more than one
part. If a SEGMENT directive specifies the same name as a previously defined
segment, it is considered to be a continuation of that segment. All of the parts
of a segment are gathered together by the assembly process. Thus, segments
can perform a similar function to the program blocks we discussed for
SIC/XE.
References between segments that are assembled together are automati-
cally handled by the assembler. External references between separately assem-
bled modules must be handled by the linker. The MASM directive PUBLIC
has approximately the same function as the SIC/XE directive EXTDEF. The
MASM directive EXTRN has approximately the same function as EXTREF. We
will consider the action of the linker in more detail in the next chapter.
The object program from MASM may be in several different formats, to
allow easy and efficient execution of the program in a variety of operating
environments. MASM can also produce an instruction timing listing that
shows the number of clock cycles required to execute each machine instruc-
tion. This allows the programmer to exercise a great deal of control in optimiz-
ing timing-critical sections of code.
This section describes some of the features of the SunOS SPARC assembler.
Further information about this assembler can be found in Sun Microsystems
(1994a).
.RODATA    Read-only data
.BSS       Uninitialized data areas
        CMP    %L0, 10
        BLE    LOOP
        ADD    %L2, %L3, %L4
the ADD instruction is executed before the conditional branch BLE. This ADD
instruction is said to be in the delay slot of the branch; it is executed regardless
of whether or not the conditional branch is taken.
If the delay slot is not used, the loop can instead be written with a NOP in
the slot:

LOOP:
        .
        .
        ADD    %L2, %L3, %L4
        CMP    %L0, 10
        BLE    LOOP
        NOP
Moving the ADD instruction into the delay slot would produce the version
discussed earlier. (Notice that the CMP instruction could not be moved into
the delay slot, becauseit sets the condition codes that must be tested by the
BLE.)
However, there is another possibility. Suppose that the original version of
the loop had been
LOOP:   ADD    %L2, %L3, %L4
        .
        .
        CMP    %L0, 10
        BLE    LOOP
        NOP
Now the ADD instruction is logically the first instruction in the loop. It could
still be moved into the delay slot, as previously described. However, this
would create a problem. On the last execution of the loop, the ADD instruction
(which is the beginning of the next loop iteration) should not be executed.
The SPARC architecture defines a solution to this problem. A conditional
branch instruction like BLE can be annulled. If a branch is annulled, the in-
struction in its delay slot is executed if the branch is taken, but not executed if
the branch is not taken. Annulled branches are indicated in SPARC assembler
language by writing ",A" following the operation code. Thus the loop just dis-
cussed could be rewritten as

LOOP:   .
        .
        CMP    %L0, 10
        BLE,A  LOOP
        ADD    %L2, %L3, %L4
This section describes some of the features of the AIX assembler for PowerPC
and other similar systems. Further information about this assembler can be
found in IBM (1994b).
The AIX assembler includes support for various models of PowerPC mi-
croprocessors, as well as earlier machines that implement the original POWER
architecture. The programmer can declare which architecture is being used
with the assembler directive MACHINE. The assembler automatically checks
for POWER or PowerPC instructions that are not valid for the specified envi-
ronment. When the object program is generated, the assembler includes a flag
that indicates which processors are capable of running the program. This flag
depends on which instructions are actually used in the program, not on the
.MACHINE directive. For example, a PowerPC program that contains only in-
structions that are also in the original POWER architecture would be exe-
cutable on either type of system.
As we discussed in Section 1.5.2, PowerPC load and store instructions use
a base register and a displacement value to specify an address in memory. Any
of the general-purpose registers (except GPR0) can be used as a base register.
Decisions about which registers to use in this way are left to the programmer.
In a long program, it is not unusual to have several different base registers in
use at the same time. The programmer specifies which registers are available
for use as base registers, and the contents of these registers, with the .USING
assembler directive. For example, the directives

    .USING   LENGTH, 1
    .USING   BUFFER, 4
would identify GPR1 and GPR4 as base registers. GPR1 would be assumed to
contain the address of LENGTH, and GPR4 would be assumed to contain the
address of BUFFER. As with SIC/XE, the programmer must provide instruc-
tions to place these values into the registers at execution time. Additional
.USING statements may appear at any point in the program. If a base register
is to be used later for some other purpose, the programmer indicates with the
.DROP statement that this register is no longer available for addressing
purposes.
This additional flexibility in register usage means more work for the as-
sembler. A base register table is used to remember which of the general-purpose
registers are currently available as base registers, and what base addresses
they contain. Processing a .USING statement causes an entry to be made in
this table (or an existing entry to be modified); processing a .DROP statement
removes the corresponding table entry. For each instruction whose operand is
an address in memory, the assembler scans the table to find a base register that
can be used to address that operand. If more than one register can be used, the
assembler selects the base register that results in the smallest signed displace-
ment. If no suitable base register is available, the instruction cannot be assem-
bled. The process of displacement calculation is the same as we described for
SIC/XE.
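A base register table of this kind is easy to sketch. In the following Python fragment (illustrative; the 16-bit signed displacement range matches PowerPC load and store instructions, but the function names are invented), .USING and .DROP maintain the table, and each memory operand is resolved to the register giving the smallest displacement.

    # Sketch of a base register table for .USING/.DROP processing.
    DISP_MIN, DISP_MAX = -32768, 32767    # 16-bit signed displacement

    base_table = {}                       # register -> base address

    def using(reg, base_addr):  base_table[reg] = base_addr
    def drop(reg):              base_table.pop(reg, None)

    def pick_base(target_addr):
        best = None
        for reg, base in base_table.items():
            disp = target_addr - base
            if DISP_MIN <= disp <= DISP_MAX:
                if best is None or abs(disp) < abs(best[1]):
                    best = (reg, disp)
        if best is None:
            raise ValueError("no usable base register: cannot assemble")
        return best                       # (register, displacement)

    using(1, 0x3000)                      # .USING LENGTH,1
    using(4, 0x4000)                      # .USING BUFFER,4
    print(pick_base(0x4008))              # -> (4, 8), as in  L 2,8(4)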
The AIX assembler language also allows the programmer to write base
registers and displacements explicitly in the source program. For example, the
instruction
L 2,8(4)
simply inserts the specified values into the object code instruction: in this case
base register GPR4 and displacement 8. The base register table is not involved,
and the register used in this way need not have appeared in a .USING state-
ment.
does not continue to the second pass. In this case, the assembly listing contains
only errors that could be detected during Pass 1.
If no errors are detected during the first pass, the assembler proceeds to
Pass 2. The second pass reads the source program again, instead of using an
intermediate file as we discussed for SIC. This means that location counter val-
ues must be recalculated during Pass 2. It also means that any warning mes-
sages that were generated during Pass 1 (but were not serious enough to
terminate the assembly) are lost. The assembly listing will contain only errors
and warnings that are generated during Pass 2.
Assembled control sections are placed into the object program according to
their storage mapping class. Executable instructions, read-only data, and vari-
ous kinds of debugging tables are assigned to an object program section
named .TEXT. Read/write data and TOC entries are assigned to an object pro-
gram section named .DATA. Uninitialized data is assigned to a section named
.BSS. When the object program is generated, the assembler first writes all of
the .TEXT control sections, followed by all of the .DATA control sections ex-
cept for the TOC. The TOC is written after the other .DATA control sections.
Relocation and linking operations are specified by entries in a relocation table,
similar to the Modification records we discussed for SIC.
EXERCISES
Section 2.1
        JLT    LOOP
        STA    TOTAL
        RSUB

Fig. 2.4 are not explicitly spelled out. (One example would be scan-
ning the instruction operand field for the modifier ",X".) List as
many of these implied operations as you can, and think about how
they might be implemented.

Suppose that you are to write a "disassembler"—that is, a system
errors; however, there are many more such errors that might occur.
List error conditions that might arise during the assembly of a SIC
program. When and how would each type of error be detected, and
what action should the assembler take for each?

Suppose that the SIC assembler language is changed to include a
statement of the form

        RESB   n'c'
this assembler would give an error message only for the second (i.e.,
duplicate) definition. For example, it would give an error message
only for line 5 of the program below.
1 P3 START 1000
2 LDA ALPHA
3 STA ALPHA
4 ALPHA RESW 1
5 ALPHA WORD 0
6 END
Suppose that you want to change the assembler to give error mes-
sages for all definitions of a doubly defined symbol (e.g., lines 4 and
5), and also for all references to a doubly defined symbol (e.g., lines 2
and 3). Describe the changes you would make to accomplish this. In
making this modification, you should change the existing assembler
as little as possible.
Suppose that you have a two-pass assembler that is written accord-
ing to the algorithm in Fig. 2.4. You want to change this assembler so
that it gives a warning message for labels that are not referenced in
the program, as illustrated by the following example.
P3 START 1000
LDA DELTA
ADD BETA
ALPHA RESW 1
DELTA RESW 1
END
Section 2.2
SUM START 0
FIRST LDX #0
LDA #0
+LDB #TABLE2
BASE TABLE2
LOOP ADD TABLE,X
        ADD    TABLE2,X
TIX COUNT
JLT LOOP
+STA TOTAL
RSUB
COUNT RESW 1
TABLE RESW 2000
TABLE2 RESW 2000
TOTAL RESW 1
END FIRST
4. Generate the complete object program for the source program given
in Exercise 3.
6. Modify the algorithm described in Fig. 2.4 to handle relocatable
programs. How would these modifications be reflected in the assembler
Suppose that you are writing an assembler for a machine that has
only program-counter relative addressing. (That is, there are no
direct-addressing instruction formats and no base relative addressing.)
Suppose that you wish to assemble an instruction whose operand is
an absolute address in memory—for example,

        LDA    100
b. ALPHA ADD
[CC ALPHA
I(3)
Section 2.3
        LDA    =W'3'
specifying as the literal operand a word with the value 3. Would this
be a good idea?
4. Immediate operands and literals are both ways of specifying an
operand value in a source statement. What are the advantages and
disadvantages of each? When might each be preferable to the other?
5. Suppose that you have a two-pass SIC/XE assembler that does not
support literals. Now you want to modify the assembler to handle
literals. However, you want to place the literal pool at the beginning
of the assembled program, not at the end as is commonly done. (You
do not have to worry about LTORG statements—your assembler
should always place all literals in a pool at the beginning of the pro-
gram.) Describe how you could accomplish this. If possible, you
should do so without adding another pass to the assembler. Be sure
to describe any data structures that you may need, and explain how
they are used in the assembler.
6. Suppose we made the following changes to the program in Fig. 2.9:
   a. Delete the LTORG statement on line 93.
Show the resulting object code for lines 45, 135, 145, 215, and 230.
Also show the literal pool with addresses and data values. Note: you
do not need to retranslate the entire program to do this.
7. Assume that the symbols ALPHA and BETA are labels in a source
program. What is the difference between the following two
sequences of statements?
a. LDA ALPHA-BETA
b. LDA ALPHA
SUB BETA
a. LDA #3
b. THREE EQU 3
LDA #THREE
c. THREE EQU 3
LDA THREE
CDATA are to be included in the default block. What changes in the
source program would accomplish this? Show the object program
(corresponding to Fig. 2.13) that would result.
a.      LDA    LENGTH
        SUB    #1

b.      LDA    LENGTH-1

17. Referring to the definitions of symbols in Fig. 2.10, give the value,
type, and intuitive meaning (if any) of each of the following expres-
sions:
a. BUFFER-FIRST
b. BUFFER+4095
c. MAXLEN-1
d. BUFFER+MAXLEN-1
e. BUFFER-MAXLEN
f. 2*LENGTH
g. MAXLEN-BUFFER
h. FIRST+BUFFER
i. FIRST-BUFFER+BUFEND
j. 2*MAXLEN-1
In the program of Fig. 2.9, what is the advantage of writing (on line
107)

        +LDT   #MAXLEN

instead of

        +LDT   #4096 ?
change would eliminate the need for the EXTREF statement. Would
this be a good idea?
How could an assembler that allows external references avoid the
24. Assume that the symbols RDREC and COPY are defined as in Fig.
2.15. According to our rules, the expression
RDREC-COPY
would be illegal (that is, the assembler and/or the loader would re-
ject it). Suppose that for some reason the program really needs the
value of this expression. How could such a thing be accomplished
without changing the rules for expressions?
25. We discussed a large number of assembler directives, and many
more could be implemented in an actual assembler. Checking for
them one at a time using comparisons might be quite inefficient.
How could we use a table, perhaps similar to OPTAB, to speed
recognition and handling of assembler directives? (Hint: the answer
to this problem may depend upon the language in which the assem-
bler itself is written.)
26. Other than the listing of the source program with generated object
code, what assembler outputs might be useful to the programmer?
Suggest some optional listings that might be generated and discuss
any data structures or algorithms involved in producing them.
Section 2.4
        JEQ    ENDFIL+3

where ENDFIL has not yet been defined?

Outline the logic flow for a simple one-pass load-and-go assembler.
ture. We also present the design of a linking loader, a more advanced type of
loader that is typical of those found on most modern computing systems.
Section 3.3 presents a selection of commonly encountered loader features
that are not directly related to machine architecture. As before, our purpose is
not to cover all possible options, but to introduce some of the concepts and
techniques most frequently found in loaders.
Section 3.4 discusses alternative ways of accomplishing loader functions.
We consider the various times at which relocation and linking can be per-
formed, and the advantages and disadvantages associated with each. In this
context we study linkage editors (which perform linking before loading) and
dynamic linking schemes (which delay linking until execution time).
Finally, in Section 3.5 we briefly discuss some examples of actual loaders
and linkers. As before, we are primarily concerned with aspects of each piece
of software that are related to hardware or software design decisions.
3.1 BASIC LOADER FUNCTIONS

We consider the design of an absolute loader that might be used with the sort
of assembler described in Section 2.1. The object program format used is the
same as that described in Section 2.1.1. An example of such an object program
is shown in Fig. 3.1(a).
HCOPY 00100000107A
T0010001E1410334820390010362810303010154820613C100300102A0C103900102D
T00101E150C10364820610810334C0000454F46000003000000
T0020391E041030001030E0205D30203FD8205D2810303020575490392C205E38203F
T0020571C1010364C0000F1001000041030E02079302064509039DC20792C1036
T002073073820644C000005
E001000

(a) Object program
(b) Program loaded in memory

Figure 3.1 Loading of an absolute program.

Because our loader does not need to perform such functions as linking and
program relocation, its operation is very simple. All functions are accom-
plished in a single pass. The Header record is checked to verify that the correct
program has been presented for loading (and that it will fit into the available
memory). As each Text record is read, the object code it contains is moved to
the indicated address in memory. When the End record is encountered, the
loader jumps to the specified address to begin execution of the loaded pro-
gram. Figure 3.1(b) shows a representation of the program from Fig. 3.1(a)
loaded into memory. The contents of memory locations for which there is no
Text record are shown as xxxx. This indicates that the previous contents of these
locations remain unchanged.
Figure 3.2 shows an algorithm for the absolute loader we have discussed.
Although this process is extremely simple, there is one aspect that deserves
comment. In our object program, each byte of assembled code is given using
its hexadecimal representation in character form. For example, the machine
operation code for an STL instruction would be represented by the pair of char-
acters "1" and "4". When these are read by the loader (as part of the object pro-
gram), they will occupy two bytes of memory. In the instruction as loaded for
execution, however, this operation code must be stored in a single byte with
hexadecimal value 14. Thus each pair of bytes from the object program record
must be packed together into one byte during loading. It is very important to
realize that in Fig. 3.1(a), each printed character represents one byte of the ob-
ject program record. In Fig. 3.1(b), on the other hand, each printed character
represents one hexadecimal digit in memory (i.e., a half-byte).
This method of representing an object program is inefficient in terms of
both space and execution time. Therefore, most machines store object pro-
grams in a binary form, with each byte of object code stored as a single byte in
the object program. In this type of representation, of course, a byte may con-
tain any binary value. We must be sure that our file and device conventions do
not cause some of the object program bytes to be interpreted as control charac-
ters. For example, the convention described in Section 2.1—indicating the end
of a record with a byte containing hexadecimal 00—would clearly be unsuit-
able for use with a binary object program.
Obviously object programs stored in binary form do not lend themselves
well to printing or to reading by human beings. Therefore, we continue to use
character representationsof object programs in our examples in this book.
begin
   read Header record
   verify program name and length
   read first Text record
   while record type ≠ 'E' do
      begin
         {if object code is in character form, convert into
          internal representation}
         move object code to specified location in memory
         read next object program record
      end
   jump to address specified in End record
end

Figure 3.2 Algorithm for an absolute loader.
As an exercise, you may want to think about the different kinds of error
conditions that might arise during the loading, and how these could be handled.
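For concreteness, here is a small runnable Python version of the algorithm in Fig. 3.2, assuming the character-form records described in Section 2.1.1. Error handling is reduced to the essentials, and the six-character program name field is assumed to be blank-padded.

    # Runnable sketch of the absolute loader of Fig. 3.2.
    def load_absolute(records, memory):
        header = records[0]
        assert header[0] == "H", "missing Header record"
        name, length = header[1:7], int(header[13:19], 16)
        # (verify program name and length here; omitted)
        for rec in records[1:]:
            if rec[0] == "E":
                return int(rec[1:7], 16)        # address to begin execution
            if rec[0] == "T":
                start = int(rec[1:7], 16)
                count = int(rec[7:9], 16)
                body = rec[9:9 + 2 * count]
                for i in range(count):          # pack character pairs
                    memory[start + i] = int(body[2*i:2*i+2], 16)
        raise ValueError("missing End record")

    memory = {}
    program = ["HCOPY  00100000107A",
               "T00100009454F46000003000000",   # sample Text record
               "E001000"]
    entry = load_absolute(program, memory)
    print(hex(entry), hex(memory[0x1000]))      # 0x1000 0x45 ('E' of EOF)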
3.2 MACHINE-DEPENDENT LOADER FEATURES

The absolute loader described in Section 3.1 is certainly simple and efficient;
however, this scheme has several potential disadvantages. One of the most ob-
vious is the need for the programmer to specify (when the program is assem-
bled) the actual address at which it will be loaded into memory. If we are
considering a very simple computer with a small memory (such as the stan-
dard version of SIC), this does not create much difficulty. There is only room
to run one program at a time, and the starting address for this single user pro-
gram is known in advance. On a larger and more advanced machine (such as
SIC/XE), the situation is not quite as easy. We would often like to run several
independent programs together, sharing memory (and other system resources)
between them. This means that we do not know in advance where a program
will be loaded. Efficient sharing of the machine requires that we write relocat-
able programs instead of absolute ones.
Writing absolute programs also makes it difficult to use subroutine li-
braries efficiently. Most such libraries (for example, scientific or mathematical
packages) contain many more subroutines than will be used by any one pro-
gram. To make efficient use of memory, it is important to be able to select and
load exactly those routines that are needed. This could not be done effectively
if all of the subroutines had preassigned absolute addresses.
In this section we consider the design and implementation of a more com-
plex loader. The loader we present is one that is suitable for use on a SIC/XE
system and is typical of those that are found on most modern computers. This
loader provides for program relocation and linking, as well as for the simple
loading functions described in the preceding section. As part of our discus-
sion, we examine the effect of machine architecture on the design of the loader.
The need for program relocation is an indirect consequence of the change
to larger and more powerful computers. The way relocation is implemented in
a loader is also dependent upon machine characteristics. Section 3.2.1 dis-
cusses these dependencies by examining different implementation techniques
and the circumstancesin which they might be used.
Section 3.2.2 examines program linking from the loader’s point of view.
Linking is not a machine-dependent function in the sense that relocation is;
however, the same implementation techniques are often used for these two
functions. In addition, the process of linking usually involves relocation of
some of the routines being linked together. (See, for example, the previous dis-
cussion concerning the use of subroutine libraries.) For these reasons we dis-
cuss linking together with relocation in this section.
3.2.1 Relocation
Loaders that allow for program relocation are called relocating loaders or relative
loaders. The concept of program relocation was introduced in Section 2.2.2; you
may want to briefly review that discussion before reading further. In this
section we discuss two methods for specifying relocation as part of the object
program.
The first method we discuss is essentially the same as that introduced in
Chapter 2. A Modification record is used to describe each part of the object
code that must be changed when the program is relocated. (The format of the
Modification record is given in Section 2.3.5.) Figure 3.4 shows a SIC/XE pro-
gram we use to illustrate this first method of specifying relocation. The pro-
gram is the same as the one in Fig. 2.6; it is reproduced here for convenience.
Most of the instructions in this program use relative or immediate addressing.
The only portions of the assembled program that contain actual addressesare
the extended format instructions on lines 15, 35, and 65. Thus these are the
only items whose values are affected by relocation.
Figure 3.5 displays the object program corresponding to the source in
Fig. 3.4. Notice that there is one Modification record for each value that must
be changed during relocation (in this case, the three instructions previously
mentioned). Each Modification record specifies the starting address and length
of the field whose value is to be altered. It then describes the modification to
be performed. In this example, all modifications add the value of the symbol
COPY, which represents the starting address of the program. The algorithm
the loader uses to perform these modifications is discussed in Section 3.2.3.
More examples of relocation specified in this manner appear in the next sec-
tion when we examine the relationship between relocation and linking.
The Modification record scheme is a convenient means for specifying pro-
gram relocation; however, it is not well suited for use with all machine archi-
tectures. Consider, for example, the program in Fig. 3.6. This is a relocatable
program written for the standard version of SIC. The important difference
between this example and the one in Fig. 3.4 is that the standard SIC machine
does not use relative addressing. In this program the addresses in all the in-
structions except RSUB must be modified when the program is relocated. This
would require 31 Modification records, which results in an object program
more than twice as large as the one in Fig. 3.5.
HCOPY 000000001077
T0000001D17202D69202D4B1010360320262900003320074B10105D3F2FEC032010
T00001D130F20160100030F200D4B10105D3E2003454F46
T0010361DB410B400B44075101000E32019332FFADB2013A00433200857C003B850
T0010531D3B2FEA1340004F0000F1B410774000E32011332FFA53C003DF2008B850
T001070073B2FEF4F000005
M00000705+COPY
M00001405+COPY
M00002705+COPY
E000000

Figure 3.5 Object program corresponding to Fig. 3.4.
cept that there is a relocation bit associated with each word of object code. Since
all SIC instructions occupy one word, this means that there is one relocation
bit for each possible instruction. The relocation bits are gathered together into
a bit mask following the length indicator in each Text record. In Fig. 3.7 this
mask is represented (in character form) as three hexadecimal digits. These
characters are underlined for easier identification in the figure.
If the relocation bit corresponding to a word of object code is set to 1, the
program’s starting address is to be added to this word when the program is re-
located. A bit value of 0 indicates that no modification is necessary. If a Text
record contains fewer than 12 words of object code, the bits corresponding to
unused words are set to 0. Thus the bit mask FFC (representing the bit string
111111111100) in the first Text record specifies that all 10 words of object code
are to be modified during relocation. These words contain the instructions cor-
responding to lines 10 through 55 in Fig. 3.6. The mask E00 in the second Text
record specifies that the first three words are to be modified. The remainder of
the object code in this record represents data constants (and the RSUB instruc-
tion) and thus does not require modification.
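The mechanism can be sketched compactly. In the Python fragment below (illustrative only; the sample words are taken from the first Text record of the figure), the three hexadecimal mask digits are expanded to 12 bits, and each bit that is set causes the program's starting address to be added to the corresponding 3-byte word.

    # Sketch of relocation with a bit mask, as in Fig. 3.7: three hex
    # digits give one relocation bit per 3-byte word of the record.
    def relocate_record(mask_hex, words, prog_start):
        bits = bin(int(mask_hex, 16))[2:].zfill(12)
        out = []
        for i, word in enumerate(words):
            if bits[i] == "1":                    # add program start
                word = (word + prog_start) & 0xFFFFFF
            out.append(word)
        return out

    # Mask FFC: first ten words modified, unused bits are zero.
    words = [0x140033, 0x481039, 0x000036]        # first three words
    print([hex(w) for w in relocate_record("FFC", words, 0x5000)])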
The other Text records follow the same pattern. Note that the object code
generated from the LDX instruction on line 210 begins a new Text record even
though there is room for it in the preceding record. This occurs because each
relocation bit is associated with a 3-byte segment of object code in the Text
record. Any value that is to be modified during relocation must coincide with
one of these 3-byte segmentsso that it corresponds to a relocation bit. The as-
sembled LDX instruction does require modification because of the direct ad-
dress. However, if it were placed in the preceding Text record, it would not be
HCOPY 00000000107A
T0000001EFFC1400334810390000362800303000154810613C000300002A0C103900102D
T00001E15E000C00364810610800334C0000454F46000003000000
T0010361EFC0040030000030E0105D30103FD8105D2800303010575480392C105E38103F
T00105419FE0040030E01079301064508039DC10792C00363810644C000005
T00106D0A8001000364C0000F1001000
E000000

Figure 3.7 Object program with relocation by bit mask.
ginning of the user’s assigned area of memory. The conversion of these rela-
tive addresses to actual addressesis performed as the program is executed.
(We discuss this further when we study memory management in Chapter 6.)
As the next section illustrates, however, the loader must still handle relocation
of subprograms in connection with linking.
are to be linked, relocated, and loaded. The loader has no way of knowing
(and no need to know) which control sections were assembled at the same
time.
Consider the three (separately assembled) programs in Fig. 3.8, each of
which consists of a single control section. Each program contains a list of items
(LISTA, LISTB, LISTC); the ends of these lists are marked by the labels ENDA,
ENDB, ENDC. The labels on the beginnings and ends of the lists are external
symbols (that is, they are available for use in linking). Note that each program
contains exactly the same set of references to these external symbols. Three of
these are instruction operands (REF1 through REF3), and the others are the
values of data words (REF4 through REF8). In considering this example, we
examine the differences in the way these identical expressions are handled
within the three programs. This emphasizes the relationship between the relo-
cation and linking processes. To focus on these issues, we have not attempted
to make these programs appear realistic. All portions of the programs not in-
volved in the relocation and linking process are omitted. The same applies to
the generated object programs shown in Fig. 3.9.
Consider first the reference marked REF1. For the first program (PROGA),
REF1 is simply a reference to a label within the program. It is assembled in the
usual way as a program-counter relative instruction. No modification for relo-
cation or linking is necessary. In PROGB, on the other hand, the same operand
refers to an external symbol. The assembler uses an extended-format instruc-
tion with address field set to 00000. The object program for PROGB (see Fig.
3.9) contains a Modification record instructing the loader to add the value of
the symbol LISTA to this address field when the program is linked. This refer-
ence is handled in exactly the same way for PROGC.
The reference marked REF2 is processed in a similar manner. For PROGA,
the operand expression consists of an external reference plus a constant. The
assembler stores the value of the constant in the address field of the instruc-
tion and a Modification record directs the loader to add to this field the value
Loc    Source statement                               Object code

0000   PROGA   START   0
               EXTDEF  LISTA, ENDA
               EXTREF  LISTB, ENDB, LISTC, ENDC
       .
       .
0020   REF1    LDA     LISTA                          03201D
0023   REF2    +LDT    LISTB+4                        77100004
0027   REF3    LDX     #ENDA-LISTA                    050014
       .
       .
0040   LISTA   EQU     *
       .
0054   ENDA    EQU     *
0054   REF4    WORD    ENDA-LISTA+LISTC               000014
0057   REF5    WORD    ENDC-LISTC-10                  FFFFF6
005A   REF6    WORD    ENDC-LISTC+LISTA-1             00003F
005D   REF7    WORD    ENDA-LISTA-(ENDB-LISTB)        000014
0060   REF8    WORD    LISTB-LISTA                    FFFFC0
               END     REF1
0000   PROGB   START   0
               EXTDEF  LISTB, ENDB
               EXTREF  LISTA, ENDA, LISTC, ENDC
       .
       .
0036   REF1    +LDA    LISTA                          03100000
003A   REF2    LDT     LISTB+4                        772027
003D   REF3    +LDX    #ENDA-LISTA                    05100000
       .
       .
0060   LISTB   EQU     *
       .
0070   ENDB    EQU     *
0070   REF4    WORD    ENDA-LISTA+LISTC               000000
0073   REF5    WORD    ENDC-LISTC-10                  FFFFF6
0076   REF6    WORD    ENDC-LISTC+LISTA-1             FFFFFF
0079   REF7    WORD    ENDA-LISTA-(ENDB-LISTB)        FFFFF0
007C   REF8    WORD    LISTB-LISTA                    000060
               END

Figure 3.8 Sample programs illustrating linking and relocation.
0000   PROGC   START   0
               EXTDEF  LISTC, ENDC
               EXTREF  LISTA, ENDA, LISTB, ENDB
       .
       .
0018   REF1    +LDA    LISTA                          03100000
001C   REF2    +LDT    LISTB+4                        77100004
0020   REF3    +LDX    #ENDA-LISTA                    05100000
       .
       .
0030   LISTC   EQU     *
       .
0042   ENDC    EQU     *
0042   REF4    WORD    ENDA-LISTA+LISTC               000030
0045   REF5    WORD    ENDC-LISTC-10                  000008
0048   REF6    WORD    ENDC-LISTC+LISTA-1             000011
004B   REF7    WORD    ENDA-LISTA-(ENDB-LISTB)        000000
004E   REF8    WORD    LISTB-LISTA                    000000
               END

Figure 3.8 (cont'd)

HPROGA 000000000063
DLISTA 000040ENDA  000054
RLISTB ENDB  LISTC ENDC
T0000200A03201D77100004050014
T0000540F000014FFFFF600003F000014FFFFC0
M00002405+LISTB
M00005406+LISTC
M00005706+ENDC
M00005706-LISTC
M00005A06+ENDC
M00005A06-LISTC
M00005A06+PROGA
M00005D06-ENDB
M00005D06+LISTB
M00006006+LISTB
M00006006-PROGA
E000020
Figure 3.9 Object programs corresponding to Fig. 3.8.
HPROGB 00000000007F
DLISTB 000060ENDB  000070
RLISTA ENDA  LISTC ENDC
T0000360B0310000077202705100000
T0000700F000000FFFFF6FFFFFFFFFFF0000060
M00003705+LISTA
M00003E05+ENDA
M00003E05-LISTA
M00007006+ENDA
M00007006-LISTA
M00007006+LISTC
M00007306+ENDC
M00007306-LISTC
M00007606+ENDC
M00007606-LISTC
M00007606+LISTA
M00007906+ENDA
M00007906-LISTA
M00007C06+PROGB
M00007C06-LISTA
E
HPROGC 000000000051
DLISTC 000030ENDC  000042
RLISTA ENDA  LISTB ENDB
T0000180C031000007710000405100000
T0000420F000030000008000011000000000000
M00001905+LISTA
M00001D05+LISTB
M00002105+ENDA
M00002105-LISTA
M00004206+ENDA
M00004206-LISTA
M00004206+PROGC
M00004806+LISTA
M00004B06+ENDA
M00004B06-LISTA
M00004B06-ENDB
M00004B06+LISTB
M00004E06+LISTB
M00004E06-LISTA
E
of the expression in REF4 except for the value of LISTC. This results in an ini-
tial value of (hexadecimal) 000014 and one Modification record. However, the
same expression in PROGB contains no terms that can be evaluated by the as-
sembler. The object code therefore contains an initial value of 000000 and three
Modification records. For PROGC, the assembler can supply the value of
LISTC relative to the beginning of the program (but not the actual address,
which is not known until the program is loaded). The initial value of this data
word contains the relative address of LISTC (hexadecimal 000030). Modifica-
tion records instruct the loader to add the beginning address of the program
(i.e., the value of PROGC), to add the value of ENDA, and to subtract the
value of LISTA. Thus the expression in REF4 represents a simple external ref-
erence for PROGA, a more complicated external reference for PROGB, and a
combination of relocation and external references for PROGC.
relative) address of the next instruction. We could also think of this process as
automatically providing the needed relocation at execution time through the
target address calculation. In PROGB, on the other hand, reference REF1 is an
extended format instruction that contains a direct (actual) address. This ad-
dress, after linking, is 4040—the same as the target address for the same refer-
ence in PROGA.
You should work through the details of the other references to see that the
target addresses (for REF2 and REF3) or the data values (for REF5 through
REF8) are the same in each of the three programs. You do not need to worry
about how these calculations are actually performed by the loader because the
algorithm and data structures for doing this are discussed in the next section.
It is important, however, that you understand the calculations to be performed,
and that you are able to carry out the computations by hand (following the in-
structions that are contained in the object programs).
Figure 3.10(a) Programs from Fig. 3.8 after linking and loading.
Figure 3.10(b) Relocation and linking operations performed on REF4 from PROGA.
until the later control section is read). Thus a linking loader usually makes two
passes over its input, just as an assembler does. In terms of general function,
the two passes of a linking loader are quite similar to the two passes of an as-
sembler: Pass 1 assigns addresses to all external symbols, and Pass 2 performs
the actual loading, relocation, and linking.
The main data structure needed for our linking loader is an external sym-
bol table ESTAB. This table, which is analogous to SYMTAB in our assembler
algorithm, is used to store the name and address of each external symbol in the
set of control sections being loaded. The table also often indicates in which
control section the symbol is defined. A hashed organization is typically used
for this table. Two other important variables are PROGADDR (program load
address) and CSADDR (control section address). PROGADDR is the beginning
address in memory where the linked program is to be loaded. Its value is sup-
plied to the loader by the operating system. (In Chapter 6 we discuss how
PROGADDR might be generated within the operating system.) CSADDR con-
tains the starting address assigned to the control section currently being
scanned by the loader. This value is added to all relative addresses within the
control section to convert them to actual addresses.
Control
section      Symbol name    Address    Length

PROGA                       4000       0063
             LISTA          4040
             ENDA           4054
PROGB                       4063       007F
             LISTB          40C3
             ENDB           40D3
PROGC                       40E2       0051
             LISTC          4112
             ENDC           4124
Pass 1:

begin
   get PROGADDR from operating system
   set CSADDR to PROGADDR {for first control section}
   while not end of input do
      begin
         read next input record {Header record for control section}
         set CSLTH to control section length
         search ESTAB for control section name
         if found then
            set error flag {duplicate external symbol}
         else
            enter control section name into ESTAB with value CSADDR
         while record type ≠ 'E' do
            begin
               read next input record
               if record type = 'D' then
                  for each symbol in the record do
                     begin
                        search ESTAB for symbol name
                        if found then
                           set error flag {duplicate external symbol}
                        else
                           enter symbol into ESTAB with value
                              (CSADDR + indicated address)
                     end {for}
            end {while ≠ 'E'}
         add CSLTH to CSADDR {starting address for next control section}
      end {while not EOF}
end {Pass 1}

Figure 3.11(a) Algorithm for Pass 1 of a linking loader.
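A condensed Python rendering of Pass 1 may help in following the algorithm. Record parsing is simplified, name fields are assumed to be blank-padded to six characters, and the sample values come from Fig. 3.9.

    # Sketch of Pass 1 of the linking loader (Fig. 3.11a): build
    # ESTAB from the Header and Define records of each control section.
    def pass1(sections, progaddr):
        estab, csaddr = {}, progaddr
        for records in sections:
            header = records[0]
            name, cslth = header[1:7].strip(), int(header[13:19], 16)
            if name in estab:
                raise ValueError("duplicate external symbol " + name)
            estab[name] = csaddr
            for rec in records[1:]:
                if rec[0] == "D":                 # symbols defined here
                    body = rec[1:]
                    for i in range(0, len(body), 12):
                        sym = body[i:i+6].strip()
                        addr = int(body[i+6:i+12], 16)
                        estab[sym] = csaddr + addr
            csaddr += cslth                       # next section follows
        return estab

    proga = ["HPROGA 000000000063", "DLISTA 000040ENDA  000054"]
    print(pass1([proga], 0x4000))
    # {'PROGA': 0x4000, 'LISTA': 0x4040, 'ENDA': 0x4054}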
Pass 2:
begin
set CSADDR to PROGADDR
set EXECADDR to PROGADDR
The last step performed by the loader is usually the transferring of control
to the loaded program to begin execution. (On some systems, the address
where execution is to begin is simply passed back to the operating system. The
user must then enter a separate Execute command.) The End record for each
control section may contain the address of the first instruction in that control
section to be executed. Our loader takes this as the transfer point to begin exe-
cution. If more than one control section specifies a transfer address, the loader
arbitrarily uses the last one encountered. If no control section contains a trans-
fer address, the loader uses the beginning of the linked program (i.e.,
PROGADDR) as the transfer point. This convention is typical of those found
in most linking loaders. Normally, a transfer address would be placed in the
End record for a main program, but not for a subroutine. Thus the correct exe-
cution address would be specified regardless of the order in which the control
sections were presented for loading. (See Fig. 2.17 for an example of this.)
You should apply this algorithm (by hand) to load and link the object pro-
grams in Fig. 3.9. If PROGADDR is taken to be 4000, the result should be the
same as that shown in Fig. 3.10.
This algorithm can be made more efficient if a slight change is made in the
object program format. This modification involves assigning a reference number
to each external symbol referred to in a control section. This reference number
is used (instead of the symbol name) in Modification records.
Suppose we always assign the reference number 01 to the control section
name. The other external reference symbols may be assigned numbers as part
of the Refer record for the control section. Figure 3.12 shows the object
HPROGA 000000000063
DLISTA 000040ENDA  000054
R02LISTB 03ENDB  04LISTC 05ENDC
T0000200A03201D77100004050014
T0000540F000014FFFFF600003F000014FFFFC0
M00002405+02
M00005406+04
M00005706+05
M00005706-04
M00005A06+05
M00005A06-04
M00005A06+01
M00005D06-03
M00005D06+02
M00006006+02
M00006006-01
E000020

HPROGB 00000000007F
DLISTB 000060ENDB  000070
R02LISTA 03ENDA  04LISTC 05ENDC
T0000360B0310000077202705100000
T0000700F000000FFFFF6FFFFFFFFFFF0000060
M00003705+02
M00003E05+03
M00003E05-02
M00007006+03
M00007006-02
M00007006+04
M00007306+05
M00007306-04
M00007606+05
M00007606-04
M00007606+02
M00007906+03
M00007906-02
M00007C06+01
M00007C06-02
E

HPROGC 000000000051
DLISTC 000030ENDC  000042
R02LISTA 03ENDA  04LISTB 05ENDB
T0000180C031000007710000405100000
T0000420F000030000008000011000000000000
M00001905+02
M00001D05+04
M00002105+03
M00002105-02
M00004206+03
M00004206-02
M00004206+01
M00004806+02
M00004B06+03
M00004B06-02
M00004B06-05
M00004B06+04
M00004E06+04
M00004E06-02
E

Figure 3.12 Object programs from Fig. 3.9, modified to use reference numbers in Modification records.
programs from Fig. 3.9 with this change. The reference numbers are under-
lined in the Refer and Modification records for easier reading. The common
use of a technique such as this is one reason we included Refer records in our
object programs. You may have noticed that these records were not used in the
algorithm of Fig. 3.11.
The main advantage of this reference-number mechanism is that it avoids
multiple searches of ESTAB for the same symbol during the loading of a con-
trol section. An external reference symbol can be looked up in ESTAB once for
each control section that uses it. The values for code modification can then be
obtained by simply indexing into an array of these values. You are encouraged
to develop an algorithm that includes this technique, together with any addi-
tional data structures you may require.
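The idea can be sketched as follows (Python; the names are illustrative). ESTAB is searched once per Refer-record symbol, and the resulting array is then indexed directly by the reference numbers appearing in Modification records.

    # Sketch of the reference-number mechanism: one ESTAB lookup per
    # Refer-record symbol, then Modification records index an array.
    def build_ref_values(csect_name, refer_syms, estab):
        values = [None, estab[csect_name]]    # number 01 = section name
        for sym in refer_syms:                # numbers 02, 03, ...
            values.append(estab[sym])
        return values

    estab = {"PROGA": 0x4000, "LISTB": 0x40C3, "ENDB": 0x40D3,
             "LISTC": 0x4112, "ENDC": 0x4124}
    refs = build_ref_values("PROGA",
                            ["LISTB", "ENDB", "LISTC", "ENDC"], estab)
    # Modification record "M00002405+02" now needs no table search:
    print(hex(refs[2]))     # value of LISTB, reference number 02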
3.3 MACHINE-INDEPENDENT LOADER FEATURES

In this section we discuss some loader features that are not directly related to
machine architecture and design. Loading and linking are often thought of as
operating system service functions. The programmer's connection with such
services is not as direct as it is with, for example, the assembler during pro-
gram development. Therefore, most loaders include fewer different features
(and less varied capabilities) than are found in a typical assembler.
Section 3.3.1 discusses the use of an automatic library search process for
handling external references. This feature allows a programmer to use stan-
dard subroutines without explicitly including them in the program to be
loaded. The routines are automatically retrieved from a library as they are
needed during linking.
Section 3.3.2 presents some common options that can be selected at the
time of loading and linking. These include such capabilities as specifying alter-
native sources of input, changing or deleting external references,and control-
ling the automatic processing of external references.
automatically fetched from the library, linked with the main program, and
loaded. The programmer does not need to take any action beyond mentioning
the subroutine names as external references in the source program. On some
systems, this feature is referred to as automatic library call. We use the term
library search to avoid confusion with the call feature found in most program-
ming languages.
Linking loaders that support automatic library search must keep track of
external symbols that are referred to, but not defined, in the primary input to
the loader. One easy way to do this is to enter symbols from each Refer record
into the symbol table (ESTAB) unless these symbols are already present. These
entries are marked to indicate that the symbol has not yet been defined. When
the definition is encountered, the address assigned to the symbol is filled in to
complete the entry. At the end of Pass 1, the symbols in ESTAB that remain un-
defined represent unresolved external references. The loader searches the
library or libraries specified for routines that contain the definitions of these
symbols, and processes the subroutines found by this search exactly as if they
had been part of the primary input stream.
Note that the subroutines fetched from a library in this way may them-
selves contain external references. It is therefore necessary to repeat the library
search process until all references are resolved (or until no further resolution
can be made). If unresolved external references remain after the library search
is completed, these must be treated as errors.
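The iteration can be sketched as follows (Python; the directory and member-loading interfaces are invented for illustration):

    # Sketch of automatic library search: repeat until no more
    # unresolved references can be satisfied from the directory.
    def library_search(estab, undefined, directory, load_member):
        while True:
            found = [s for s in undefined if s in directory]
            if not found:
                break
            for sym in found:
                undefined.discard(sym)
                # loading may define some symbols and refer to others
                defined, referred = load_member(directory[sym])
                estab.update(defined)
                undefined |= {s for s in referred if s not in estab}
        return undefined          # anything left is an error

    directory = {"SQRT": "member1"}
    def load_member(member):      # illustrative stub
        return {"SQRT": 0x5000}, {"ERRMSG"}   # defines SQRT, refers ERRMSG
    left = library_search({}, {"SQRT"}, directory, load_member)
    print(left)                   # {'ERRMSG'} remains unresolved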
The process just described allows the programmer to override the standard
subroutines in the library by supplying his or her own routines. For example,
suppose that the main program refers to a standard subroutine named SQRT.
Ordinarily the subroutine with this name would automatically be included via
the library search function. A programmer who for some reason wanted to use
a different version of SQRT could do so simply by including it as input to the
loader. By the end of Pass 1 of the loader, SQRT would already be defined, so
it would not be included in any library search that might be necessary.
The libraries to be searched by the loader ordinarily contain assembled or
compiled versions of the subroutines (that is, object programs). It is possible to
search these libraries by scanning the Define records for all of the object pro-
grams on the library, but this might be quite inefficient. In most cases a special
file structure is used for the libraries. This structure contains a directory that
gives the name of each routine and a pointer to its address within the file. If a
subroutine is to be callable by more than one name (using different entry
points), both names are entered into the directory. The object program itself, of
course, is only stored once. Both directory entries point to the same copy of the
routine. Thus the library search itself really involves a search of the directory,
followed by reading the object programs indicated by this search. Some oper-
ating systems can keep the directory for commonly used libraries permanently
in memory. This can expedite the search process if a large number of external
references are to be resolved.
The process of library search has been discussed as the resolution of a call
to a subroutine. Obviously the same technique applies equally well to the res-
olution of external references to data items.
3.3.2 Loader Options

Many loaders allow the user to specify options that modify the standard pro-
cessing described in Section 3.2. In this section we discuss some typical loader
options and give examples of their use. Many loaders have a special command
language that is used to specify options. Sometimes there is a separate input
file to the loader that contains such control statements. Sometimes these same
statements can also be embedded in the primary input stream between object
programs.
On a few systems the programmer can even include loader control
statements in the source program, and the assembler or compiler retains these
commands as a part of the object program.
We discuss loader options in this section as though they were specified us-
ing a command language, but there are other possibilities. On some systems
options are specified as a part of the job control language that is processed by
the operating system. When this approach is used, the operating system incor-
porates the options specified into a control block that is made available to the
loader when it is invoked. The implementation of such options is, of course,
the same regardless of the means used to select them.
One typical loader option allows the selection of alternative sources of
input. For example, the command
INCLUDE program-name(library-name)

might direct the loader to read the designated object program from a library
and treat it as if it were part of the primary loader input. The command

DELETE csect-name

might instruct the loader to delete the named control section(s) from the set of
programs being loaded. The command

CHANGE name1,name2

might cause the external symbol name1 to be changed to name2 wherever it
appears in the object programs.
In the examples just discussed, it is assumed that all of the
control sections of the object program will appear in the same file (or as part of
the same library member).
Suppose now that a set of utility subroutines is made available on the com-
puter system. Two of these, READ and WRITE, are designed to perform the
same functions as RDREC and WRREC. It would probably be desirable to
change the source program of COPY to use these utility routines. As a tempo-
rary measure, however, a sequence of loader commands could be used to
make this change without reassembling the program. This might be done, for
example, to test the utility routines before the final conversion is made.
Suppose that a file containing the object programs in Fig. 2.17 is the pri-
mary loader input, with the loader commands
INCLUDE READ(UTLIB)
INCLUDE WRITE(UTLIB)
DELETE RDREC, WRREC
CHANGE RDREC, READ
CHANGE WRREC, WRITE
These commands would direct the loader to include control sections READ
and WRITE from the library UTLIB, and to delete the control sections RDREC
and WRREC from the load. The first CHANGE command would cause all ex-
ternal references to the symbol RDREC to be changed to refer instead to the
symbol READ; similarly, the second would change references to WRREC so
that they refer to WRITE.
Loader options can also control the library search process. For example,
the command

LIBRARY MYLIB

might direct the loader to search the specified library before the standard
system libraries. Conversely, the user can name external references that are
known to be unneeded in a particular execution, for example

NOCALL STDDEV,PLOT,CORREL

to instruct the loader that these external references are to remain unresolved.
This avoids the overhead of loading and linking the unneeded routines, and
saves the memory space that would otherwise be required.
It is also possible to specify that no external references be resolved by li-
brary search. Of course, this means an error will result if the program attempts
to make such an external reference during execution. This option is more use-
ful when programs are to be linked but not executed immediately. It is often
desirable to postpone the resolution of external references in such a case. In
Section 3.4.1 we discuss linkage editors that perform this sort of function.
Another common option involves output from the loader. In Section 3.2.3
we gave an example of a load map that might be generated during the loading
process. Through control statements the user can often specify whether or not
such a map is to be printed at all. If a map is desired, the level of detail can be
selected. For example, the map may include control section names and ad-
dresses only. It may also include external symbol addresses or even a cross-
reference table that shows references to each external symbol.
Loaders often include a variety of other options. One such option is the
ability to specify the location at which execution is to begin (overriding any in-
formation given in the object programs). Another is the ability to control
whether or not the loader should attempt to execute the program if errors are
detected during the load (for example, unresolved external references).
3.4 Loader Design Options

In this section we discuss some common alternatives for organizing the load-
ing functions, including relocation and linking. Linking loaders, as described
in the preceding sections, perform linking and relocation at load time.
Section 3.4.1 discusses linkage editors, which are found on many comput-
ing systems instead of or in addition to the linking loader. A linkage editor
performs linking and some relocation; however, the linked program is written
to a file or library instead of being immediately loaded into memory. This ap-
proach reduces the overhead when the program is executed. All that is re-
quired at load time is a very simple form of relocation.
Section 3.4.2 introduces dynamic linking, which uses facilities of the oper-
ating system to load and link subprograms at the time they are first called. By
delaying the linking process in this way, additional flexibility can be achieved.
However, this approach usually involves more overhead than does a linking
loader.
3.4.1 Linkage Editors

The essential difference between a linkage editor and a linking loader is illus-
trated in Fig. 3.13. The source program is first assembled or compiled, produc-
ing an object program (which may contain several different control sections).
A linking loader performs all linking and relocation operations, including au-
tomatic library search if specified, and loads the linked program directly into
memory for execution. A linkage editor, on the other hand, produces a linked
version of the program (often called a load module or an executable image),
which is written to a file or library for later execution.
When the user is ready to run the linked program, a simple relocating
loader can be used to load the program into memory. The only object code
modification necessary is the addition of an actual load address to relative val-
ues within the program. The linkage editor performs relocation of all control
sections relative to the start of the linked program. Thus, all items that need to
be modified at load time have values that are relative to the start of the linked
program. This means that the loading can be accomplished in one pass with
no external symbol table required. This involves much less overhead than us-
ing a linking loader.
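The load-time step can be sketched in a few lines of Python (the module layout here is invented for illustration):

    def load_linked_program(module, memory, load_addr):
        # Copy the linked program into memory at the actual load address.
        memory[load_addr:load_addr + len(module.image)] = module.image
        # The linkage editor left a single list of word offsets that are
        # relative to the start of the program; each one just needs the
        # actual load address added to it.
        for offset in module.relative_word_offsets:
            addr = load_addr + offset
            word = int.from_bytes(memory[addr:addr+3], 'big')
            memory[addr:addr+3] = ((word + load_addr) % (1 << 24)).to_bytes(3, 'big')

No external symbol table is involved; one pass over this list completes the loading.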
If a program is to be executed many times without being reassembled, the
use of a linkage editor substantially reduces the overhead required. Resolution
of external references and library searching are only performed once (when
the program is link edited). In contrast, a linking loader searches libraries and
resolves external references every time the program is executed.
Sometimes, however, a program is reassembled for nearly every execution.
This situation might occur in a program development and testing environ-
ment (for example, student programs). It also occurs when a program is used
so infrequently that it is not worthwhile to store the assembled version in a li-
brary. In such cases it is more efficient to use a linking loader, which avoids the
steps of writing and reading the linked program.
The linked program produced by the linkage editor is generally in a form
that is suitable for processing by a relocating loader. All external references are
resolved, and relocation is indicated by some mechanism such as Modification
records or a bit mask. Even though all linking has been performed, informa-
tion concerning external references is often retained in the linked program.
This allows subsequent relinking of the program to replace control sections,
modify external references, etc. If this information is not retained, the linked
program cannot be reprocessed by the linkage editor; it can only be loaded
and executed.
[Figure 3.13: Processing of an object program using (a) a linking loader, which links the object program(s) and loads them directly into memory, and (b) a linkage editor, which produces a linked program that is later brought into memory by a relocating loader.]
If the actual address at which the program will be loaded is known in ad-
vance, the linkage editor can perform all of the needed relocation. The result is
a linked program that is an exact image of the way the program will appear in
memory during execution. The content and processing of such an image are
the same as for an absolute object program. Normally, however, the added
flexibility of being able to load the program at any location is easily worth the
slight additional overhead for performing relocation at load time.
Linkage editors can perform many useful functions besides simply prepar-
ing an object program for execution. Consider, for example, a program
(PLANNER) that uses a large number of subroutines. Suppose that one sub-
routine (PROJECT) used by the program is changed to correct an error or to
improve efficiency. After the new version of PROJECT is assembled or com-
piled, the linkage editor can be used to replace this subroutine in the linked
version of PLANNER. It is not necessary to go back to the original (separate)
versions of all of the other subroutines. The following is a typical sequence of
linkage editor commands used to accomplish this. The command language is
similar to that discussed in Section 3.3.2.
INCLUDE PLANNER(PROGLIB)
DELETE  PROJECT             {DELETE from existing PLANNER}
INCLUDE PROJECT(NEWLIB)     {INCLUDE new version}
REPLACE PLANNER(PROGLIB)
Linkage editors can also be used to build packages of subroutines or other
control sections that are generally used together. For example, a sequence of
commands such as

INCLUDE READR(FTNLIB)
INCLUDE WRITER(FTNLIB)

might be used to combine a set of FORTRAN I/O routines from the library
FTNLIB into a single linked module named FTNIO, saved in the library
SUBLIB.
The linked module named FTNIO could be indexed in the directory of SUBLIB
under the same names as the original subroutines. Thus a search of SUBLIB
before FTNLIB would retrieve FTNIO instead of the separate routines. Since
FTNIO already has all of the cross-references between subroutines resolved,
these linkages would not be reprocessed when each user’s program is linked.
The result would be a much more efficient linkage editing operation for each
program and a considerable overall savings for the system.
Linkage editors often allow the user to specify that external references are
not to be resolved by automatic library search. Suppose, for example, that 100
FORTRAN programs using the I/O routines described above were to be
stored on a library. If all external references were resolved, this would mean
that a total of 100 copies of FTNIO would be stored. If library space were an
important resource, this might be highly undesirable. Using commands like
those discussed in Section 3.3.2, the user could specify that no library search
be performed during linkage editing. Thus only the external references be-
tween user-written routines would be resolved. A linking loader could then be
used to combine the linked user routines with FTNIO at execution time.
Because this process involves two separate linking operations, it would re-
quire slightly more overhead; however, it would result in a large savings in
library space.
Linkage editors often include a variety of other options and commands
like those discussed for linking loaders. Compared to linking loaders, linkage
editors in general tend to offer more flexibility and control, with a correspond-
ing increase in complexity and overhead.
3.4.2 Dynamic Linking

Linkage editors perform linking operations before the program is loaded for
execution. Linking loaders perform these same operations at load time. In this
section we discuss a scheme that postpones the linking function until execu-
tion time: a subroutine is loaded and linked to the rest of the program when it
is first called.
[Figure 3.14: Loading and calling of a subroutine (ERRHANDL) using dynamic linking. The user program makes a load-and-call request for ERRHANDL to the dynamic loader, which loads the routine if it is not already in memory and passes control to it; parts (a) through (e) show the successive stages.]
3.5 Implementation Examples

In this section we briefly examine linkers and loaders for actual computers. As
in our previous discussions, we make no attempt to give a full description of
the linkers and loaders used as examples. Instead we concentrate on any par-
ticularly interesting or unusual features, and on differences between these im-
plementations and the more general model discussed earlier in this chapter.
We also point out areas in which the linker or loader design is related to the
assembler design or to the architecture and characteristics of the machine.
The loader and linker examples we discuss are for the Pentium, SPARC,
and T3E architectures. You may want to review the descriptions of these archi-
tectures in Chapter 1, and the related assembler examples in Section 2.5.
3.5.1 MS-DOS Linker

This section describes some of the features of the Microsoft MS-DOS linker for
Pentium and other x86 systems. Further information can be found in Simrin
(1991) and Microsoft (1988).
Most MS-DOS compilers and assemblers (including MASM) produce ob-
ject modules, not executable machine language programs. By convention,
these object modules have the file name extension .OBJ. Each object module
contains a binary image of the translated instructions and data of the program.
It also describes the structure of the program (for example, the grouping of
segments and the use of external references in the program).
MS-DOS LINK is a linkage editor that combines one or more object mod-
ules to produce a complete executable program. By convention, this exe-
cutable program has the file name extension .EXE. LINK can also combine the
translated programs with other modules from object code libraries, as we dis-
cussed previously.
Figure 3.15 illustrates a typical MS-DOS object module. There are also sev-
eral other possible record types (such as comment records), and there is some
flexibility in the order of the records.
The THEADR record specifies the name of the object module. The MOD-
END record marks the end of the module, and can contain a reference to the
entry point of the program. These two records generally correspond to the
Header and End records we discussed for SIC/XE.
[Figure 3.15: Typical MS-DOS object module.

THEADR                    Module name
TYPDEF, PUBDEF, EXTDEF    External symbols and references
LNAMES, SEGDEF, GRPDEF    Segment definition and grouping
LEDATA, LIDATA            Translated instructions and data
FIXUPP                    Relocation and linking information
MODEND                    End of module]
The PUBDEF record contains a list of the external symbols (called public
names) that are defined in this object module. The EXTDEF record contains a
list of the external symbols that are referred to in this object module. These
records are similar in function to the SIC/XE Define and Refer records. Both
PUBDEF and EXTDEF can contain information about the data type designated
by an external name. These types are defined in the TYPDEF record.
SEGDEF records describe the segments in the object module, including
their name, length, and alignment. GRPDEF records specify how these seg-
ments are combined into groups. (See Section 2.5.1 for a discussion of the use
of segmentation in the MASM assembler.) The LNAMES record contains a list
of all the segment and class names used in the program. SEGDEF and
GRPDEF records refer to a segment by giving the position of its name in the
LNAMES record. (This approach to specifying names is similar to the “refer-
ence number” technique described near the end of Section 3.2.3.)
LEDATA records contain translated instructions and data from the source
program, similar to the SIC/XE Text record. LIDATA records specify trans-
lated instructions and data that occur in a repeating pattern. (See Exercise
2.1.7.)
FIXUPP records are used to resolve external references, and to carry out
address modifications that are associated with relocation and grouping of seg-
ments within the program. This is similar to the function performed by the
SIC/XE Modification records. However, FIXUPP records are substantially
more complex, because of the more complicated object program structure. A
FIXUPP record must immediately follow the LEDATA or LIDATA record to
which it applies.
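To give a feel for this structure, the sketch below walks the records of an object module. It is written in Python and assumes the widely documented object-module convention of a 1-byte record type followed by a 2-byte little-endian length covering the body and a trailing checksum byte; treat it as an approximation rather than a complete parser.

    import struct

    RECORD_NAMES = {0x80: 'THEADR', 0x88: 'COMENT', 0x8A: 'MODEND',
                    0x8C: 'EXTDEF', 0x8E: 'TYPDEF', 0x90: 'PUBDEF',
                    0x96: 'LNAMES', 0x98: 'SEGDEF', 0x9A: 'GRPDEF',
                    0x9C: 'FIXUPP', 0xA0: 'LEDATA', 0xA2: 'LIDATA'}

    def walk_records(data):
        # Yield (record name, body bytes) for each record in the module.
        pos = 0
        while pos + 3 <= len(data):
            rtype = data[pos]
            length, = struct.unpack_from('<H', data, pos + 1)
            yield RECORD_NAMES.get(rtype, hex(rtype)), data[pos+3 : pos+2+length]
            pos += 3 + length
            if rtype == 0x8A:            # MODEND marks the end of the module
                break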
LINK performs its processing in two passes, following a similar approach
to that described in Section 3.2.3. Pass 1 computes a starting address for each
segment in the program. In general, segments are placed into the executable
program in the same order that the SEGDEF records are processed. However,
in some cases segments from different object modules that have the same seg-
ment name and class are combined. Segments with the same class, but differ-
ent names, are concatenated. The starting address initially associated with a
segment is updated during Pass 1 as these combinations and concatenations
are performed.
Pass 1 constructs a symbol table that associates an address with each seg-
ment (using the LNAMES, SEGDEF, and GRPDEF records) and each external
symbol (using the EXTDEF and PUBDEF records). If unresolved external sym-
bols remain after all object modules have been processed, LINK searches the
specified libraries as described in Section 3.3.1.
During Pass 2, LINK extracts the translated instructions and data from the
object modules, and builds an image of the executable program in memory. It
does this because the executable program is organized by segment, not by the
order of the object modules. Building a memory image is the most efficient
way to handle the rearrangements caused by combining and concatenating
segments. If there is not enough memory available to contain the entire exe-
cutable image, LINK uses a temporary disk file in addition to all of the avail-
able memory.
Pass 2 of LINK processes each LEDATA and LIDATA record along with the
corresponding FIXUPP record (if there is one). It places the binary data from
LEDATA and LIDATA records into the memory image at locations that reflect
the segment addresses computed during Pass 1. (Repeated data specified in
LIDATA records is expanded at this time.) Relocations within a segment
(caused by combining or grouping segments) are performed, and external ref-
erences are resolved. Relocation operations that involve the starting address of
a segment are added to a table of segment fixups. This table is used to perform
relocations that reflect the actual segment addresses when the program is
loaded for execution.
After the memory image is complete, LINK writes it to the executable
(.EXE) file. This file also includes a header that contains the table of segment
fixups, information about memory requirements and entry points, and the ini-
tial contents for registers CS and SP.
3.5.2 SunOS Linkers

This section describes some of the features of the SunOS linkers for SPARC
systems. Further information can be found in Sun Microsystems (1994b).
SunOS actually provides two different linkers, called the link-editor and the
run-time linker. The link-editor is most commonly invoked in the process of
compiling a program. It takes one or more object modules produced by assem-
blers and compilers, and combines them to produce a single output module.
This output module may be one of the following types:

1. A relocatable object module, suitable for further link-editing

2. A static executable, with all symbolic references bound, ready to run

3. A dynamic executable, in which some symbolic references may need to
be bound at run time

4. A shared object, which provides services that can be bound at run time
to one or more dynamic executables
attributes, such as “executable” and “writeable.” (See Section 2.5.2 for a dis-
cussion of how sections are defined in an assembler language program.) The
object module also includes a list of the relocation and linking operations that
need to be performed, and a symbol table that describes the symbols used in
these operations.
The SunOS link-editor begins by reading the object modules (or other files)
that are presented to it to process. Sections from the input files that have the
same attributes are concatenated to form new sections within the output file.
The symbol tables from the input files are processed to match symbol defini-
tions and references, and relocation and linking operations within the output
file are performed. The linker normally generates a new symbol table, and a
new set of relocation instructions, within the output file. These represent sym-
bols that must be bound at run time, and relocations that must be performed
when the program is loaded.
Symbolic references from the input files that do not have matching defini-
tions are processed by referring to archives and shared objects. An archive is a
collection of relocatable object modules. A directory stored with the archive as-
sociates symbol names with the object modules that contain their definitions.
Selected modules from an archive are automatically included to resolve sym-
bolic references, as described in Section 3.3.1.
A shared object is an indivisible unit that was generated by a previous
link-edit operation. When the link-editor encounters a reference to a symbol
defined in a shared object, the entire contents of the shared object become a
logical part of the output file. All symbols defined in the object are made avail-
able to the link-editing process. However, the shared object is not physically
included in the output file. Instead, the link-editor records the dependency on
the shared object. The actual inclusion of the shared object is deferred until run
time. (This is an example of the dynamic linking approach we discussed in
Section 3.4.2. In this case, the use of dynamic linking allows several executing
programs to share one copy of a shared object.)
The SunOS run-time linker is used to bind dynamic executables and
shared objects at execution time. The linker determines what shared objects
are required by the dynamic executable, and ensures that these objects are in-
cluded. It also inspects the shared objects to detect and process any additional
dependencies on other shared objects.
After it locates and includes the necessary shared objects, the linker per-
forms relocation and linking operations to prepare the program for execution.
These operations are specified in the relocation and linking sections of the dy-
namic executable and shared objects. They bind symbols to the actual memory
addresses at which the segments are loaded. Binding of data references is per-
formed before control is passed to the executable program. Binding of proce-
dure calls is normally deferred until the program is in execution. During
link-editing, calls to globally defined procedures are converted to references to
a procedure linkage table. When a procedure is called for the first time, control
is passed via this table to the run-time linker. The linker looks up the actual
address of the called procedure and inserts it into the linkage table. Thus sub-
sequent calls will go directly to the called procedure, without intervention by
the linker. This process is sometimes referred to as lazy binding.
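The effect of the procedure linkage table can be imitated with ordinary data structures, as in this deliberately simplified Python sketch (the real mechanism patches machine-code stubs, not dictionary entries):

    def make_linkage_table(runtime_lookup):
        table = {}                            # the "procedure linkage table"

        def call(name, *args):
            if name not in table:             # first call: go through the linker
                proc = runtime_lookup(name)   # find the actual procedure
                table[name] = proc            # patch the table entry
            return table[name](*args)         # later calls bypass the linker
        return call

    # call = make_linkage_table(lookup_in_shared_objects)
    # call('sqrt', 2.0)   # first call binds; subsequent calls are direct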
The run-time linker also provides an additional level of flexibility. During
execution, a program can dynamically bind to new shared objects by request-
ing the same services of the linker that we have just described. This feature al-
lows a program to choose between a number of shared objects, depending on
the exact services required. It also reduces the amount of overhead required
for starting a program. If a shared object is not needed during a particular run,
it is not necessary to bind it at all. These advantages are similar to those that
we discussed for dynamic linking in Section 3.4.2.
3.5.3 Cray MPP Linker

This section describes some of the features of the MPP linker for Cray T3E sys-
tems. Further information can be found in Cray Research (1995b).
As we discussed in Chapter 1, a T3E system contains a large number of
processing elements (PEs). Each PE has its own local memory. In addition, any
PE can access the memory of all other PEs (this is sometimes referred to as
remote memory). However, the fastest access time always results from a PE ac-
cessing its own local memory.
An application program on a T3E system is normally allocated a partition
that consists of several PEs. (It is possible to run a program in a partition of
one PE, but this does not take advantage of the parallel architecture of the ma-
chine.) The work to be done by the program is divided between the PEs in the
partition. One common method for doing this is to distribute the elements of
an array among the PEs. For example, if a partition consists of 16 PEs, the ele-
ments of a one-dimensional array might be distributed as shown in Fig. 3.16.
The processing of such an array can also be divided among the PEs.
Suppose, for example, that the program contains a loop that processes all 256
array elements. PE0 could execute this loop for subscripts 1 through 16, PE1
could execute the loop for subscripts 17 through 32, and so on. In this way, all
of the PEs would share in the array processing, with each PE handling the ar-
ray elements from its own local memory. Section 5.5.3 describes how this kind
of processing is handled by the compiler.

[Figure 3.16: Distribution of the elements of array A(1) through A(256) among 16 PEs; each PE holds 16 consecutive elements, with A(17) through A(32) on PE1.]
[Figure 3.17: Memory organization of a T3E partition. Each PE (PE0, PE1, ..., PEn) holds a copy of the code and its own private data; shared data is distributed among the local memories of the PEs.]
EXERCISES
Section 3.1
1. Modify the algorithm given in Fig. 3.11 to use the bit-mask approach
to relocation. Linking will still be performed using Modification
records.
5. Apply the algorithm described in Fig. 3.11 to link and load the re-
vised object programs you generated in Exercise 4.
6. Using the methods outlined in Chapter 8, develop a modular design
for a relocating and linking loader.
7. Extend the algorithm in Fig. 3.11 to include the detection of improper
external reference expressions as suggested in the text. (See Section
2.3.5 for the set of rules to be applied.) What problems arise in per-
forming this kind of error checking?
8. Modify the algorithm in Fig. 3.11 to use the reference-number tech-
nique for code modification that is described in Section 3.2.3.
EXTDEF MAXLEN
10. Suppose that you have been given the task of writing an “un-
loader”—that is, a piece of software that can take the image of a pro-
gram that has been loaded and write out an object program that
could later be loaded and executed. The computer system uses a re-
locating loader, so the object program you produce must be capable
of being loaded at a location in memory that is different from where
your unloader took it. What problems do you see that would prevent
you from accomplishing this task?
11. Suppose that you are given two images of a program as it would ap-
pear after loading at two different locations in memory. Assume that
the images represent the program after it is loaded and relocated, but
before any of the program’s instructions are actually executed.
Describe how this information could be used to accomplish the “un-
loading” task mentioned in Exercise 10.
12. Some loaders have used an indirect linking scheme. To use such a
technique with SIC/XE, the assembler would generate a list of
pointer words from the EXTREF directive (one pointer word for each
external reference symbol). Modification records would direct the
loader to insert the external symbol addresses into the corresponding
words in the pointer list. External references would then be accom-
plished with indirect addressing using these pointers. Thus, for ex-
ample, an instruction like

LDA XYZ

(where XYZ is an external reference) might be assembled as

LDA @PXYZ

where PXYZ is the pointer word containing the address of XYZ.
13. [Sketch of an object program whose Text records are divided into TI... (instructions), TV... (variables), and TC... (constants) records, followed by M (Modification) records.]
Describe how the assembler could separate the object program into
TI, TV, and TC records as described above. Describe how the loader
would use the information in these records in loading the program.
14. Consider the control sections shown in Fig. 3.8. Assume that these
control sections are being loaded and linked at the addresses shown
in Fig. 3.10; thus the loader will set register R to the value 4000. What
value should appear in the External Symbol Table of the loader for
the symbol LISTB? What should the instruction labeled REF2 in con-
trol section PROGC look like after all loading and linking operations
have been performed?
Section 3.3
Section 3.4
d. Store the source program and the linked version with all external
loading need not be removed until the termination of the main pro-
gram. Suggest a way to improve the efficiency of dynamic linking by
making it unnecessary for the operating system to be involved in the
transfer of control after the routine is loaded.

6. Suppose that it may be necessary to remove from memory routines
that were dynamically loaded (to reuse the space). Will the method
that you suggested in Exercise 5 still work? What problems arise,
and how might they be solved?
Section 3.5
Chapter 4

Macro Processors
4.1 Basic Macro Processor Functions

In this section we examine the fundamental functions that are common to all
macro processors. Section 4.1.1 discusses the processes of macro definition, in-
vocation, and expansion with substitution of parameters. These functions are
illustrated with examples using the SIC/XE assembler language. Section 4.1.2
presents a one-pass algorithm for a simple macro processor together with a
description of the data structures needed for macro processing. Later sections
in this chapter discuss extensions to the basic capabilities introduced in this
section.
This program defines and uses two macro instructions, RDBUFF and
WRBUFF. The functions and logic of the RDBUFF macro are similar to those of
the RDREC subroutine in Fig. 2.5; likewise, the WRBUFF macro is similar to
the WRREC subroutine. The definitions of these macro instructions appear in
the source program following the START statement.
Two new assembler directives (MACRO and MEND) are used in macro de-
finitions. The first MACRO statement (line 10) identifies the beginning of a
macro definition. The symbol in the label field (RDBUFF) is the name of the
macro, and the entries in the operand field identify the parameters of the macro
instruction. In our macro language, each parameter begins with the character
&, which facilitates the substitution of parameters during macro expansion.
The macro name and parameters define a pattern or prototype for the macro in-
structions used by the programmer. Following the MACRO directive are the
statements that make up the body of the macro definition (lines 15 through 90).
These are the statements that will be generated as the expansion of the macro.
The MEND assembler directive (line 95) marks the end of the macro defini-
tion. The definition of the WRBUFF macro (lines 100 through 160) follows a
similar pattern.
The main program itself begins on line 180. The statement on line 190 is a
macro invocation statement that gives the name of the macro instruction being
invoked and the arguments to be used in expanding the macro. (A macro invo-
cation statement is often referred to as a macro call. To avoid confusion with the
call statements used for procedures and subroutines, we prefer to use the term
invocation. As we shall see, the processes of macro invocation and subroutine
call are quite different.) You should compare the logic of the main program in
Fig. 4.1 with that of the main program in Fig. 2.5, remembering the similarities
in function between RDBUFF and RDREC and between WRBUFF and
WRREC.
5      COPY    START   0                  COPY FILE FROM INPUT TO OUTPUT
10     RDBUFF  MACRO   &INDEV,&BUFADR,&RECLTH
15     .
20     .       MACRO TO READ RECORD INTO BUFFER
25     .
60             RD      =X'&INDEV'         READ CHARACTER INTO REG A
65             COMPR   A,S                TEST FOR END OF RECORD
70             JEQ     *+11               EXIT LOOP IF EOR
75             STCH    &BUFADR,X          STORE CHARACTER IN BUFFER
80             TIXR    T                  LOOP UNLESS MAXIMUM LENGTH
85             JLT     *-19               HAS BEEN REACHED
90             STX     &RECLTH            SAVE RECORD LENGTH
95             MEND
100    WRBUFF  MACRO   &OUTDEV,&BUFADR,&RECLTH
105    .
110    .       MACRO TO WRITE RECORD FROM BUFFER
115    .
120            CLEAR   X                  CLEAR LOOP COUNTER
125            LDT     &RECLTH
130            LDCH    &BUFADR,X          GET CHARACTER FROM BUFFER
135            TD      =X'&OUTDEV'        TEST OUTPUT DEVICE
140            JEQ     *-3                LOOP UNTIL READY
145            WD      =X'&OUTDEV'        WRITE CHARACTER
165    .
170    .       MAIN PROGRAM
175    .
180    FIRST   STL     RETADR             SAVE RETURN ADDRESS
190    CLOOP   RDBUFF  F1,BUFFER,LENGTH   READ RECORD INTO BUFFER
195            LDA     LENGTH             TEST FOR END OF FILE
200            COMP    #0
205            JEQ     ENDFIL             EXIT IF EOF FOUND
210            WRBUFF  05,BUFFER,LENGTH   WRITE OUTPUT RECORD
215            J       CLOOP              LOOP
[Figure 4.2: Program from Fig. 4.1 with macros expanded. The expansion of the RDBUFF invocation on line 190 appears as lines 190a through 190m (beginning CLEAR X, CLEAR A, CLEAR S, ...); the WRBUFF invocations expand to lines 210a through 210h and 220a through 220h (220 ENDFIL WRBUFF 05,EOF,THREE inserts an EOF marker); the program ends with 225 J @RETADR.]
Lines 190a through 190m show the complete expansion of the macro invo-
cation on line 190. The comment lines within the macro body have been
deleted, but comments on individual statements have been retained. Note that
the macro invocation statement itself has been included as a comment line.
The expanded file can then be used as input to the assembler,
and the statements generated from the macro expansions will be assembled
exactly as though they had been written directly by the programmer.
A comparison of the expanded program in Fig. 4.2 with the program in
Fig. 2.5 shows the most significant differences between macro invocation
and subroutine call. In Fig. 4.2, the statements from the body of the macro
WRBUFF are generated twice: lines 210a through 210h and lines 220a through
220h. In the program of Fig. 2.5, the corresponding statements appear only
once: in the subroutine WRREC (lines 210 through 240). In general, the state-
ments that form the expansion of a macro are generated (and assembled) each
time the macro is invoked. Statements in a subroutine appear only once, re-
gardless of how many times the subroutine is called.
Note also that our macro instructions have been written so that the body of
the macro contains no labels. In Fig. 4.1, for example, line 140 contains the
statement “JEQ *-3” and line 155 contains “JLT *-14.” The corresponding
statements in the WRREC subroutine (Fig. 2.5) are “JEQ WLOOP” and “JLT
WLOOP,” where WLOOP is a label on the TD instruction that tests the output
device. If such a label appeared on line 135 of the macro body, it would be gen-
erated twice—on lines 210d and 220d of Fig. 4.2. This would result in an error
(a duplicate label definition) when the program is assembled. To avoid dupli-
cation of symbols, we have eliminated labels from the body of our macro defi-
nitions.
4.1.2 Macro Processor Algorithm and Data Structures

[Figures 4.3 and 4.4: Fig. 4.3 shows macro definitions contained within a macro body, with (a) a SIC version and (b) a SIC/XE version. Fig. 4.4 shows the macro processor data structures NAMTAB and DEFTAB for the RDBUFF definition, whose prototype is RDBUFF &INDEV,&BUFADR,&RECLTH.]
The macro processor algorithm itself is presented in Fig. 4.5. The proce-
dure DEFINE, which is called when the beginning of a macro definition is rec-
ognized, makes the appropriate entries in DEFTAB and NAMTAB. EXPAND is
called to set up the argument values in ARGTAB and expand a macro invoca-
tion statement. The procedure GETLINE, which is called at several points in
the algorithm, gets the next line to be processed. This line may come from
DEFTAB (the next line of a macro being expanded), or from the input file,
depending upon whether the Boolean variable EXPANDING is set to TRUE or
FALSE.
This simple approach fails for the example in Fig. 4.3, however. The MEND on line 3 (which actually marks the
end of the definition of RDBUFF) would be taken as the end of the definition
of MACROS. To solve this problem, our DEFINE procedure maintains a
counter named LEVEL. Each time a MACRO directive is read, the value of
LEVEL is increased by 1; each time an MEND directive is read, the value of
LEVEL is decreased by 1. When LEVEL reaches 0, the MEND that corre-
sponds to the original MACRO directive has been found. This process is very
much like matching left and right parentheses when scanning an arithmetic
expression.
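In outline, the counting logic might look like this (a Python sketch; the crude opcode test stands in for real SIC statement parsing):

    def collect_definition(input_lines):
        # Gather the body of a macro definition, matching nested
        # MACRO ... MEND pairs with a nesting counter (LEVEL).
        body, level = [], 1
        for line in input_lines:
            fields = line.split()
            if 'MACRO' in fields[:2]:        # a nested definition begins
                level += 1
            elif 'MEND' in fields[:2]:
                level -= 1
                if level == 0:               # the MEND matching the original MACRO
                    return body
            body.append(line)
        raise SyntaxError('macro definition has no matching MEND')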
procedure PROCESSLINE
begin
   search NAMTAB for OPCODE
   if found then
      EXPAND
   else if OPCODE = ‘MACRO’ then
      DEFINE
   else write source line to expanded file
end {PROCESSLINE}
procedure DEFINE
begin
   enter macro name into NAMTAB
   enter macro prototype into DEFTAB
   LEVEL := 1
   while LEVEL > 0 do
      begin
         GETLINE
         if this is not a comment line then
            begin
               substitute positional notation for parameters
               enter line into DEFTAB
               if OPCODE = ‘MACRO’ then
                  LEVEL := LEVEL + 1
               else if OPCODE = ‘MEND’ then
                  LEVEL := LEVEL - 1
            end {if}
      end {while}
   store in NAMTAB pointers to beginning and end of definition
end {DEFINE}
procedure EXPAND
begin
   EXPANDING := TRUE
   set up arguments from macro invocation in ARGTAB
   write macro invocation to expanded file as a comment line
   while not end of macro definition do
      begin
         GETLINE
         PROCESSLINE
      end {while}
   EXPANDING := FALSE
end {EXPAND}
procedure GETLINE
begin
   if EXPANDING then
      begin
         get next line of macro definition from DEFTAB
         substitute arguments from ARGTAB for positional notation
      end {if}
   else
      read next line from input file
end {GETLINE}
You may want to apply this algorithm by hand to the program in Fig. 4.1
to be sure you understand its operation. The result should be the same as
shown in Fig. 4.2.
Most macro processors allow the definitions of commonly used macro in-
structions to appear in a standard system library, rather than in the source pro-
gram. This makes the use of such macros much more convenient. Definitions
are retrieved from this library as they are needed during macro processing.
The extension of the algorithm in Fig. 4.5 to include this sort of processing
appears as an exercise at the end of this chapter.
4.2 Machine-Independent Macro Processor Features

In this section we discuss several extensions to the basic macro processor func-
tions presented in Section 4.1. As we have mentioned before, these extended
features are not directly related to the architecture of the computer for which
the macro processor is written. Section 4.2.1 describes a method for concate-
nating macro instruction parameters with other character strings. Section 4.2.2
discusses one method for generating unique labels within macro expansions,
which avoids the need for extensive use of relative addressing at the source
statement level. Section 4.2.3 introduces the important topic of conditional
macro expansion and illustrates the concepts involved with several examples.
This ability to alter the expansion of a macro by using control statements
makes macro instructions a much more powerful and useful tool for the pro-
grammer. Section 4.2.4 describes the definition and use of keyword parameters
in macro instructions.
4.2.1 Concatenation of Macro Parameters

Suppose that the parameter to such a macro instruction is named &ID. The
body of the macro definition might contain a statement like
LDA X&ID1
in which the parameter &ID is concatenated after the character string X and
before the character string 1. Closer examination, however, reveals a problem
with such a statement. The beginning of the macro parameter is identified by
the starting symbol &; however, the end of the parameter is not marked. Thus
the operand in the foregoing statement could equally well represent the char-
acter string X followed by the parameter &ID1. In this particular case, the
macro processor could potentially deduce the meaning that was intended.
However, if the macro definition contained both &ID and &ID1 as parameters,
the situation would be unavoidably ambiguous.
Most macro processors deal with this problem by providing a special con-
catenation operator. In the SIC macro language, this operator is the character →.
Thus the previous statement would be written as

LDA X&ID→1

so that the end of the parameter &ID is clearly identified. The macro processor
deletes all occurrences of the concatenation operator immediately after per-
forming parameter substitution, so the character → will not appear in the
macro expansion.
Figure 4.6(a) shows a macro definition that uses the concatenation operator
as previously described. Figure 4.6(b) and (c) shows macro invocation state-
ments and the corresponding macro expansions. You should work through the
generation of these macro expansions for yourself to be sure you understand
how the concatenation operators are handled. You are also encouraged to
think about how the concatenation operator would be handled in a macro pro-
cessing algorithm like the one given in Fig. 4.5.
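One possible treatment during substitution is sketched below (Python; the parameter-table format is invented, and parameters are replaced longest name first so that &ID1 is never mistaken for &ID followed by a 1):

    CONCAT = '\u2192'                        # the concatenation operator

    def substitute(line, params):
        # params maps parameter names to argument strings, e.g. {'&ID': 'DEV'}.
        for name in sorted(params, key=len, reverse=True):
            line = line.replace(name, params[name])
        return line.replace(CONCAT, '')      # operator deleted after substitution

    print(substitute('LDA   X&ID' + CONCAT + '1', {'&ID': 'DEV'}))   # LDA   XDEV1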
4.2.2 Generation of Unique Labels

relative addressing at the source statement level. Consider, for example, the
definition of WRBUFF in Fig. 4.1. If a label were placed on the TD instruction
on line 135, this label would be defined twice—once for each invocation of
(a)
       SUM     MACRO   &ID
               LDA     X&ID→1
               ADD     X&ID→2
               ADD     X&ID→3
               STA     X&ID→S
               MEND

(b)
               SUM     A

               LDA     XA1
               ADD     XA2
               ADD     XA3
               STA     XAS

(c)
               SUM     BETA

               LDA     XBETA1
               ADD     XBETA2
               ADD     XBETA3
               STA     XBETAS
Figure 4.7 illustrates one technique for generating unique labels within a
macro expansion. A definition of the RDBUFF macro is shown in Fig. 4.7(a).
Labels used within the macro body begin with the special character $. Figure
4.7(b) shows a macro invocation statement and the resulting macro expansion.
Each symbol beginning with $ has been modified by replacing $ with $AA.
More generally, the character $ will be replaced by $xx, where xx is a two-
character alphanumeric counter of the number of macro instructions ex-
panded. For the first macro expansion in a program, xx will have the value
AA. For succeeding macro expansions, xx will be set to AB, AC, etc. (If only al-
phabetic and numeric characters are allowed in xx, such a two-character
counter provides for as many as 1296 macro expansions in a single program.)
This results in the generation of unique labels for each expansion of a macro
instruction. For further examples, see Figs. 4.8 and 4.10.
The SIC assembler language allows the use of the character $ in symbols;
however, programmers are instructed not to use this character in their source
programs. This avoids any possibility of conflict between programmer-
generated symbols and those created by the macro processor.
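A sketch of the qualification step (Python; a production macro processor would fold this into its argument-substitution pass):

    import itertools, string

    def expansion_counters():
        # Yield AA, AB, ..., 99: a two-character alphanumeric counter,
        # good for 36 * 36 = 1296 expansions.
        chars = string.ascii_uppercase + string.digits
        for pair in itertools.product(chars, repeat=2):
            yield ''.join(pair)

    def qualify(line, xx):
        # Turn every $name in the generated line into $xxname.
        return line.replace('$', '$' + xx)

    counters = expansion_counters()
    xx = next(counters)                           # one value per macro expansion
    print(qualify("$LOOP    TD    =X'F1'", xx))   # $AALOOP    TD    =X'F1'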
tion of RDBUFF.)

(a)

25     RDBUFF  MACRO   &INDEV,&BUFADR,&RECLTH
30             CLEAR   X                  CLEAR LOOP COUNTER
35             CLEAR   A
40             CLEAR   S
45             +LDT    #4096              SET MAXIMUM RECORD LENGTH
50     $LOOP   TD      =X'&INDEV'         TEST INPUT DEVICE
55             JEQ     $LOOP              LOOP UNTIL READY
60             RD      =X'&INDEV'         READ CHARACTER INTO REG A
65             COMPR   A,S                TEST FOR END OF RECORD
70             JEQ     $EXIT              EXIT LOOP IF EOR
75             STCH    &BUFADR,X          STORE CHARACTER IN BUFFER
80             TIXR    T                  LOOP UNLESS MAXIMUM LENGTH
85             JLT     $LOOP              HAS BEEN REACHED
90     $EXIT   STX     &RECLTH            SAVE RECORD LENGTH
95             MEND

[(b): the invocation RDBUFF F1,BUFFER,LENGTH and its expansion, in which each $ label is qualified as $AA; for example, 90 $AAEXIT STX LENGTH (SAVE RECORD LENGTH).]
[Figure 4.8(a): definition of RDBUFF using conditional macro expansion statements (see text).]

(c)
       RDBUFF  F1,BUFF,RLENG,04

30             CLEAR   X                  CLEAR LOOP COUNTER
35             CLEAR   A
40             LDCH    =X'04'             SET EOR CHARACTER
42             RMO     A,S
45             +LDT    #4096              SET MAX LENGTH = 4096
50     $ACLOOP TD      =X'F1'             TEST INPUT DEVICE
55             JEQ     $ACLOOP            LOOP UNTIL READY
60             RD      =X'F1'             READ CHARACTER INTO REG A
65             COMPR   A,S                TEST FOR END OF RECORD
pression that is its operand. If the value of this expression is TRUE, the state-
ments following the IF are generated until an ELSE is encountered.
Otherwise, these statements are skipped, and the statements following the
ELSE are generated. The ENDIF statement terminates the conditional expres-
sion that was begun by the IF statement. (As usual, the ELSE clause can be
omitted entirely.) Thus if the parameter &MAXLTH is equal to the null string
(that is, if the corresponding argument was omitted in the macro invocation
statement), the statement on line 45 is generated. Otherwise, the statement on
line 47 is generated.
A similar structure appears on lines 26 through 28. In this case, however,
the statement controlled by the IF is not a line to be generated into the macro
expansion. Instead, it is another macro processor directive (SET). This SET
statement assigns the value 1 to &EORCK. The symbol &EORCK is a macro-
time variable (also often called a set symbol), which can be used to store working
values during the macro expansion. Any symbol that begins with the character
& and that is not a macro instruction parameter is assumed to be a macro-time
variable. All such variables are initialized to a value of 0. Thus if there is an ar-
gument corresponding to &EOR (that is, if &EOR is not null), the variable
&EORCK is set to 1. Otherwise, it retains its default value of 0. The value of
this macro-time variable is used in the conditional structures on lines 38
and 63 of the macro definition.
The macro processor maintains a table of macro-time variables; entries in it
are made or modified when SET statements are processed. The table is used to
look up the current value of a macro-time variable whenever it is required.
When an IF statement is encountered during the expansion of a macro, the
specified Boolean expression is evaluated. If the value of this expression is
TRUE, the macro processor continues to process lines from DEFTAB until it
encounters the next ELSE or ENDIF statement. If an ELSE is found, the macro
processor then skips lines in DEFTAB until the next ENDIF. Upon reaching the
ENDIF, it resumes expanding the macro in the usual way. If the value of the
specified Boolean expression is FALSE, the macro processor skips ahead in
DEFTAB until it finds the next ELSE or ENDIF statement. The macro processor
then resumes normal macro expansion.
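The skip logic can be sketched as a tiny interpreter over DEFTAB (Python; opcode_of and evaluate are stand-ins for details the text leaves open):

    def after_if(deftab, i, opcode_of, evaluate):
        # deftab[i] holds an IF; return the index of the next line to expand.
        if evaluate(deftab[i]):
            return i + 1                     # generate until ELSE or ENDIF
        j = i + 1
        while opcode_of(deftab[j]) not in ('ELSE', 'ENDIF'):
            j += 1
        return j + 1                         # resume just past ELSE or ENDIF

    def at_else(deftab, j, opcode_of):
        # Reached an ELSE while generating: skip ahead past the ENDIF.
        while opcode_of(deftab[j]) != 'ENDIF':
            j += 1
        return j + 1

As in the text, this sketch does not handle nested IF structures.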
The implementation outlined above does not allow for nested IF struc-
tures. You are encouraged to think about how this technique could be modi-
fied to handle such nested structures (see Exercise 4.2.10).
It is extremely important to understand that the testing of Boolean expres-
sions in IF statements occurs at the time macros are expanded. By the time the
program is assembled, all such decisions have been made. There is only one
sequence of source statements [for example, the statements in Fig. 4.8(c)], and
the conditional macro expansion directives have been removed. Thus macro-
time IF statements correspond to options that might have been selected by the
programmer in writing the source code. They are fundamentally different
from statements such as COMPR (or IF statements in a high-level program-
ming language), which test data values during program execution. The same
applies to the assignment of values to macro-time variables, and to the other
conditional macro expansion directives we discuss.
The macro-time IF-ELSE-ENDIF structure provides a mechanism for either
generating (once) or skipping selectedstatements in the macro body. A differ-
ent type of conditional macro expansion statement is illustrated in Fig. 4.9.
Figure 4.9(a) shows another definition of RDBUFF. The purpose and function
of the macro are the same as before. With this definition, however, the pro-
grammer can specify a list of end-of-record characters. In the macro invocation
statement in Fig. 4.9(b), for example, there is a list (00,03,04) corresponding to
the parameter &EOR. Any one of these characters is to be interpreted as mark-
ing the end of a record. To simplify the macro definition, the parameter
&MAXLTH has been deleted; the maximum record length will always be 4096.
The definition in Fig. 4.9(a) uses a macro-time looping statement WHILE.
The WHILE statement specifies that the following lines, until the next ENDW
statement, are to be generated repeatedly as long as a particular condition is
true. As before, the testing of this condition, and the looping, are done while
the macro is being expanded. The conditions to be tested involve macro-time
variables and arguments, not run-time data values.
The use of the WHILE-ENDW structure is illustrated on lines 63 through
73 of Fig. 4.9(a). The macro-time variable &EORCT has previously been set
(line 27) to the value %NITEMS(&EOR). %NITEMS is a macro processor
function that returns as its value the number of members in an argument list.
For example, if the argument corresponding to &EOR is (00,03,04), then
%NITEMS(&EOR) has the value 3.
The macro-time variable &CTR is used to count the number of times the
lines following the WHILE statement have been generated. The value of
&CTR is initialized to 1 (line 63), and incremented by 1 each time through the
loop (line 71). The WHILE statement itself specifies that the macro-time loop
will continue to be executed as long as the value of &CTR is less than or equal
to the value of &EORCT. This means that the statements on lines 65 and 70
[Figure 4.9(a): definition of RDBUFF using the macro-time looping statements WHILE and ENDW (see text).]
will be generated once for each member of the list corresponding to the para-
meter &EOR. The value of &CTR is used as a subscript to select the proper
member of the list for each iteration of the loop. Thus on the first iteration the
expression &EOR[&CTR] on line 65 has the value 00; on the second iteration it
has the value 03, and so on.
Figure 4.9(b) shows the expansion of a macro invocation statement using
the definition in Fig. 4.9(a). You should examine this example carefully to be
sure you understand how the WHILE statements are handled.
The implementation of a macro-time looping statement such as WHILE is
also relatively simple. When a WHILE statement is encountered during macro
expansion, the specified Boolean expression is evaluated. If the value of this
expression is FALSE, the macro processor skips ahead in DEFTAB until it finds
the next ENDW statement, and then resumes normal macro expansion. If the
value of the Boolean expression is TRUE, the macro processor continues to
process lines from DEFTAB in the usual way until the next ENDW statement.
When the ENDW is encountered, the macro processor returns to the preceding
WHILE, re-evaluates the Boolean expression, and takes action based on the
new value of this expression as previously described.
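A matching sketch for the loop (Python, in the same style as the IF sketch above, with generate standing in for normal line expansion):

    def run_while(deftab, i, opcode_of, evaluate, generate):
        # deftab[i] holds a WHILE; find its ENDW, then expand the body
        # repeatedly, re-evaluating the condition before each iteration.
        end = i + 1
        while opcode_of(deftab[end]) != 'ENDW':
            end += 1
        while evaluate(deftab[i]):           # tested at macro-expansion time
            for line in deftab[i + 1:end]:
                generate(line)
        return end + 1                       # continue past the ENDW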
This method of implementation does not allow for nested WHILE struc-
tures. You are encouraged to think about how such nested structures might be
supported (see Exercise 4.2.14).
4.2.4 Keyword Macro Parameters

All the macro instruction definitions we have seen thus far used positional para-
meters. That is, parameters and arguments were associated with each other ac-
cording to their positions in the macro prototype and the macro invocation
statement. With positional parameters, the programmer must be careful to
specify the arguments in the proper order. If an argument is to be omitted, the
macro invocation statement must contain a null argument (two consecutive
commas) to maintain the correct argument positions. [See, for example, the
macro invocation statement in Fig. 4.8(c).]
Positional parameters are quite suitable for most macro instructions.
However, if a macro has a large number of parameters, and only a few of these
are given values in a typical invocation, a different form of parameter specifi-
cation is more useful. (Such a macro may occur in a situation in which a large
and complex sequence of statements—perhaps even an entire operating sys-
tem—is to be generated from a macro invocation. In such cases, most of the
parameters may have acceptable default values; the macro invocation specifies
only the changes from the default set of values.)
For example, suppose that a certain macro instruction GENER has 10 pos-
sible parameters, but in a particular invocation of the macro, only the third
and ninth parameters are to be specified. If positional parameters were used,
the macro invocation statement might look like

GENER ,,DIRECT,,,,,,3

Using keyword parameters, the same invocation might be written as

GENER TYPE=DIRECT,CHANNEL=3

This statement is obviously much easier to read, and much less error-prone,
than the positional version.
Figure 4.10(a) shows a version of the RDBUFF macro definition using key-
word parameters. Except for the method of specification, the parameters are
the same as those in Fig. 4.8(a). In the macro prototype, each parameter name
is followed by an equal sign, which identifies a keyword parameter. After the
equal sign, a default value is specified for some of the parameters. The para-
meter is assumed to have this default value if its name does not appear in the
macro invocation statement. Thus the default value for the parameter
&INDEV is F1. There is no default value for the parameter &BUFADR.
Default values can simplify the macro definition in many cases. For exam-
ple, the macro definitions in Figs. 4.10(a) and 4.8(a) both provide for setting
the maximum record length to 4096 unless a different value is specified by the
user. The default value established in Fig. 4.10(a) takes care of this automati-
cally. In Fig. 4.8(a), an IF-ELSE-ENDIF structure is required to accomplish the
same thing.
The other parts of Fig. 4.10 contain examples of the expansion of keyword
macro invocation statements. In Fig. 4.10(b), all the default values are ac-
cepted. In Fig. 4.10(c), the value of &INDEV is specified as F3, and the value of
&EOR is specified as null. These values override the corresponding defaults.
Note that the arguments may appear in any order in the macro invocation
statement. You may want to work through these macro expansions for your-
self, concentrating on how the default values are handled.
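The matching rule itself is simple, as this Python sketch shows (defaults as in Fig. 4.10(a)):

    DEFAULTS = {'&INDEV': 'F1', '&BUFADR': '', '&RECLTH': '',
                '&EOR': '04', '&MAXLTH': '4096'}

    def bind_keywords(operand_field, defaults):
        # Start from the prototype's defaults; each NAME=VALUE in the
        # invocation overrides one parameter, in any order.
        values = dict(defaults)
        for item in operand_field.split(','):
            name, _, value = item.partition('=')
            values['&' + name.strip()] = value.strip()
        return values

    print(bind_keywords('BUFADR=BUFFER,RECLTH=LENGTH', DEFAULTS))
    # &INDEV and &EOR keep F1 and 04; &MAXLTH keeps 4096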
4.3 Macro Processor Design Options

In this section we discuss some major design options for a macro proces-
sor. The algorithm presented in Fig. 4.5 does not work properly if a macro
is invoked within the expansion of another macro; Section 4.3.1 examines
this problem of recursive macro expansion.
25     RDBUFF  MACRO   &INDEV=F1,&BUFADR=,&RECLTH=,&EOR=04,&MAXLTH=4096
26             IF      (&EOR NE '')
27     &EORCK  SET     1
28             ENDIF
30             CLEAR   X                  CLEAR LOOP COUNTER
35             CLEAR   A
38             IF      (&EORCK EQ 1)
40             LDCH    =X'&EOR'           SET EOR CHARACTER
42             RMO     A,S
43             ENDIF
47             +LDT    #&MAXLTH           SET MAXIMUM RECORD LENGTH
50     $LOOP   TD      =X'&INDEV'         TEST INPUT DEVICE
55             JEQ     $LOOP              LOOP UNTIL READY
60             RD      =X'&INDEV'         READ CHARACTER INTO REG A
63             IF      (&EORCK EQ 1)
65             COMPR   A,S                TEST FOR END OF RECORD
70             JEQ     $EXIT              EXIT LOOP IF EOR
73             ENDIF
75             STCH    &BUFADR,X          STORE CHARACTER IN BUFFER
80             TIXR    T                  LOOP UNLESS MAXIMUM LENGTH
85             JLT     $LOOP              HAS BEEN REACHED
90     $EXIT   STX     &RECLTH            SAVE RECORD LENGTH
95             MEND

(a)

[Figure 4.10(b): the invocation RDBUFF BUFADR=BUFFER,RECLTH=LENGTH, in which all other parameters take their default values; its expansion ends with 90 $AAEXIT STX LENGTH (SAVE RECORD LENGTH).]

Figure 4.10  Use of keyword parameters in macro instructions.
the programmer who is defining RDBUFF need not worry about the details of
device access and control. (RDCHAR might be written at a different time, or
even by a different programmer.) The advantages of using RDCHAR in this
way would be even greater on a more complex machine, where the code to
read a single character might be longer and more complicated than our simple
three-line version.
30             CLEAR   X                  CLEAR LOOP COUNTER
35             CLEAR   A
40             CLEAR   S
45             +LDT    #4096              SET MAXIMUM RECORD LENGTH
50     $LOOP   RDCHAR  &INDEV             READ CHARACTER INTO REG A
65             COMPR   A,S                TEST FOR END OF RECORD
70             JEQ     $EXIT              EXIT LOOP IF EOR
75             STCH    &BUFADR,X          STORE CHARACTER IN BUFFER
80             TIXR    T                  LOOP UNLESS MAXIMUM LENGTH
85             JLT     $LOOP              HAS BEEN REACHED
90     $EXIT   STX     &RECLTH            SAVE RECORD LENGTH
95             MEND

(a)
(b)
       RDBUFF  BUFFER,LENGTH,F1

(c)
ARGTAB entries for the invocation of RDBUFF:

Parameter   Value
1           BUFFER
2           LENGTH
3           F1
4           (unused)

ARGTAB entries after the nested invocation of RDCHAR:

Parameter   Value
1           F1
2           (unused)
The expansion of RDCHAR would also proceed normally. At the end of this
expansion, however, a problem would appear. When the end of the definition
of RDCHAR was recognized, EXPANDING would be set to FALSE. Thus the
macro processor would “forget” that it had been in the middle of expanding a
macro when it encountered the RDCHAR statement. In addition, the argu-
ments from the original macro invocation (RDBUFF) would be lost because
the values in ARGTAB were overwritten with the arguments from the invoca-
tion of RDCHAR.
These problems are not difficult to solve if the macro processor is being
written in a programming language (such as Pascal or C) that allows recursive
calls. The compiler would be sure that previous values of any variables de-
clared within a procedure were saved when that procedure was called recur-
sively. It would also take care of other details involving return from the
procedure. (In Chapter 5 we consider in detail how such recursive calls are
handled by a compiler.)
If a programming language that supports recursion is not available, the
programmer must take care of handling such items as return addresses and
values of local variables. In such a case, PROCESSLINE and EXPAND would
probably not be procedures at all. Instead, the same logic would be incorpo-
rated into a looping structure, with data values being saved on a stack. The
concepts involved are the same as those that we discuss when we consider re-
cursion in Chapter 5. An example of such an implementation can be found in
Donovan (1972).
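The recursive version is almost a restatement of EXPAND, as this Python sketch suggests (build_argtab, body_lines, substitute_args, opcode_of, and emit are hypothetical helpers for machinery described elsewhere in the text):

    def expand(invocation, deftab, namtab):
        # Each invocation gets its own ARGTAB and its own position in the
        # definition, so a nested invocation cannot clobber the caller's state.
        argtab = build_argtab(invocation)
        for line in body_lines(invocation, deftab, namtab):
            line = substitute_args(line, argtab)
            if opcode_of(line) in namtab:        # nested macro invocation
                expand(line, deftab, namtab)     # recursion saves our variables
            else:
                emit(line)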
4.3.2 General-Purpose Macro Processors

facility for each compiler or assembler language, so much of the time and ex-
pense involved in training are eliminated. The costs involved in producing a
general-purpose macro processor are somewhat greater than those for devel-
oping a language-specific processor.However, this expensedoes not need to
be repeated for each language; the result is a substantial overall saving in soft-
ware development cost. Similar savings in software maintenance effort should
also be realized. Over a period of years, these maintenance costs may be even
more significant than the original cost for software development.
In spite of the advantages noted, there are still relatively few general-
purpose macro processors. One of the reasons for this situation is the large
number of details that must be dealt with in a real programming language. A
special-purpose macro processor can have these details built into its logic and
4.3.3 Macro Processing within Language Translators
The macro processors that we have discussed so far might be called preproces-
sors. That is, they process macro definitions and expand macro invocations,
producing an expanded version of the source program. This expanded pro-
gram is then used as input to an assembleror compiler. In this section we dis-
cuss an alternative: combining the macro processing functions with the
language translator itself.
The simplest method of achieving this sort of combination is a line-by-line
macro processor. Using this approach, the macro processor reads the source
program statements and performs all of its functions as previously described.
However, the output lines are passed to the language translator as they are
generated (one at a time), instead of being written to an expanded source file.
Thus the macro processor operates as a sort of input routine for the assembler
or compiler.
This line-by-line approach has several advantages. It avoids making an ex-
tra pass over the source program (writing and then reading the expanded
source file), so it can be more efficient than using a macro preprocessor. Some
of the data structures required by the macro processor and the language trans-
lator can be combined. For example, OPTAB in an assembler and NAMTAB in
the macro processor could be implemented in the same table. In addition,
many utility subroutines and functions can be used by both the language
translator and the macro processor. These include such operations as scanning
input lines, searching tables, and converting numeric values from external to
internal representations. A line-by-line macro processor also makes it easier to
give diagnostic messages that are related to the source statement containing
the error (i.e., the macro invocation statement). With a macro preprocessor,
such an error might be detected only in relation to some statement in the
macro expansion. The programmer would then need to backtrack to discover
the original source of trouble.
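The following sketch (our own; the three-line source program, the one-line
macro CLRA, and the function names are invented) shows the essential
structure of this approach: the assembler's only view of the source is the
routine next_line, and expansion happens inside that routine, so no expanded
source file is ever written.

#include <stdio.h>
#include <string.h>

static const char *source[]    = { "COPY  START 0", "CLRA", "END", NULL };
static const char *clra_body[] = { "SUB   A,A", NULL };  /* body of macro CLRA */

static int src_i = 0;
static const char **pending = NULL;  /* lines still to deliver from an expansion */

/* Called by the assembler each time it wants one source line. */
static const char *next_line(void) {
    if (pending && *pending) return *pending++;
    pending = NULL;
    const char *line = source[src_i] ? source[src_i++] : NULL;
    if (line && strcmp(line, "CLRA") == 0) {   /* macro invocation found */
        pending = clra_body;
        return next_line();
    }
    return line;
}

int main(void) {                 /* stand-in for the assembler's main loop */
    for (const char *l; (l = next_line()) != NULL; )
        printf("assembler sees: %s\n", l);
    return 0;
}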
Although a line-by-line macro processor may use some of the same utility
routines as the language translator, the functions of macro processing and pro-
gram translation are still relatively independent. The main form of communi-
cation between the two functions is the passing of source statements from one
to the other. It is possible to have even closer cooperation between the macro
processor and the assembler or compiler. Such a scheme can be thought of as a
language translator with an integrated macro processor.
An integrated macro processor can potentially make use of any informa-
tion about the source program that is extracted by the language translator. The
actual degree of integration varies considerably from one system to another.
At a relatively simple level of cooperation, the macro processor may use the
results of such translator operations as scanning for symbols, constants, etc.
Such operations must be performed by the assembler or compiler in any case;
the macro processor can simply use the results without being involved in such
details as multiple-character operators, continuation lines, and the rules for
token formation. This is particularly useful when the rules for such details
vary from one part of the program to another (for example, within FORMAT
statements and character string constants in FORTRAN).
The sort of token scan just mentioned is conceptually quite simple.
However, many real programming languages have certain characteristics that
create unpleasant difficulties. One classic example is the FORTRAN statement
DO 100 I = 1,20

This statement is the beginning of a loop. However, the statement

DO 100 I = 1
has a quite different meaning. This is an assignment statement that gives the
value 1 to the variable DO100I. Thus the proper interpretation of the charac-
ters DO, 100, etc., cannot be decided until the rest of the statement is exam-
ined. Such interpretations would be very important if, for example, a macro
involved substituting for the variable name I. A FORTRAN compiler must be
able to recognize and handle situations such as this. However, it would be
very difficult for an ordinary macro processor (not integrated with a compiler)
to do so. Such a macro processor would be concerned only with character
strings, not with the interpretation of source statements.
With an even closer degree of cooperation, an integrated macro processor
can support macro instructions that depend upon the context in which they
occur. For example, a macro could specify a substitution to be applied only to
variables or constants of a certain type, or only to variables appearing as loop
indices in DO statements. The expansion of a macro could also depend upon a
variety of characteristics of its arguments.
4.4 IMPLEMENTATION EXAMPLES

4.4.1 MASM Macro Processor
This section describes some of the macro processing features of the Microsoft
MASM assembler. Further information about MASM can be found in
Barkakati (1992).
The macro processor of MASM is integrated with Pass 1 of the assembler.
It supports all of the main macro processor functions that we have discussed,
including the definition and invocation of macro instructions within macros.
Macros may be redefined during a program, without causing an error. The
new definition of the macro simply replaces the first one. However, this prac-
tice can be very confusing to a person reading the program; it should proba-
bly be avoided.
One of the main differences between the MASM macro processor and the
one we discussed for SIC lies in the nature of the conditional macro expansion
statements.
Line 2 declares that EXIT is a local label. When the macro is expanded,
each local label is replaced by a unique name. MASM generates these unique
names in the form ??n, where n is a hexadecimal number in the range 0000 to
FFFF. See the macro expansions in Fig. 4.12(b) and (c) for an example of this.
The IFNB on line 3 evaluates to “true” if its operand is not blank. If the pa-
rameter SIZE is not blank (that is, if it is present in the macro invocation), lines
4 through 8 are processed. Otherwise, these lines are skipped. Lines 4 through
8 contain a nested conditional statement. The IFDIF on line 4 is true if the
(a)

(b)
        ABSDIF  M,N,E
L       MOV     EAX,M          ; COMPUTE ABSOLUTE DIFFERENCE
        SUB     EAX,N
        JNS     ??0001
        NEG     EAX
??0001:

(c)
        ABSDIF  P,Q,X
L       ; ERROR -- SIZE MUST BE E OR BLANK

Figure 4.12 Examples of MASM macro definition and expansion.

(a)

        NODE    X

XLEFT   DW      0
XDATA   DW      0
XRIGHT  DW      0

(b)
4.4.2 ANSI C Macro Language

This section describes some of the macro processing features of the ANSI C
programming language. Section 5.5.1 discusses the structure of a typical com-
piler and preprocessor that implement these features. Further information can
be found in Schildt (1990), as well as in many C language reference books.
In the ANSI C language, definitions and invocations of macros are handled
by a preprocessor. This preprocessor is generally not integrated with the rest
of the compiler. Its operation is similar to the macro processor we discussed in
Section 4.1. The preprocessor also performs a number of other functions, some
of which are discussed in Section 5.5.1.
Here are two simple (and commonly used) examples of ANSI C macro
definitions.
#define NULL 0
#define EOF (-1)
After these definitions appear in the program, every occurrence of NULL will
be replaced by 0, and every occurrence of EOF will be replaced by (-1). It is
also possible to use such definitions to alter the appearance of the language
itself. For example, after the definition

#define EQ ==

a programmer could write

while (I EQ 0)...

and the preprocessor would convert this into

while (I == 0)...
Macros can also be defined with parameters that are substituted on each
invocation. Consider, for example, the definition

#define ABSDIFF(X,Y) ((X) > (Y) ? (X) - (Y) : (Y) - (X))

In this case, the macro name is ABSDIFF; the parameters are named X and Y.
The body of the macro makes use of a special C language conditional expres-
sion. If the condition (X) > (Y) is true, the value of this expression is the first al-
ternative specified, (X) - (Y). If the condition is false, the value of the
expression is the second alternative, (Y) - (X).
A macro invocation consists of the name of the macro followed by a paren-
thesized list of parameters separated by commas. When the macro is
expanded, each occurrence of a macro parameter is replaced by the corre-
sponding argument. For example,
ABSDIFF(I+1, J-5)

would be expanded by the preprocessor into

((I+1) > (J-5) ? (I+1) - (J-5) : (J-5) - (I+1))
Notice the similarity between this macro invocation and a function call.
Clearly, we could write a function ABSDIFF to perform this same operation.
However, the macro is more efficient, because the amount of computation re-
quired to compute the absolute difference is quite small—much less than the
overhead of calling a function. The macro version can also be used with differ-
ent types of data. For example, we could invoke the macro as
ABSDIFF(I, 3.14159)
or
ABSDIFF('D', 'A')
The parentheses that appear in the macro body are also important. Suppose,
for example, that ABSDIFF had been defined without them, as
X > Y ? X - Y : Y - X. Then the invocation

ABSDIFF(3 + 1, 10 - 8)

would be expanded into

3 + 1 > 10 - 8 ? 3 + 1 - 10 - 8 : 10 - 8 - 3 + 1

which would not produce the intended result. (The first alternative in this case
has the value -14 instead of 2, as it should be.)
In ANSI C, parameter substitutions are not performed within quoted
strings. For example, consider the macro definition

#define DISPLAY(EXPR) printf("EXPR = %d\n", EXPR)

The invocation

DISPLAY(I*J+1)

would be expanded into

printf("EXPR = %d\n", I*J+1)

Notice that the occurrence of EXPR inside the quoted string was not replaced
by the corresponding argument.
(However, some C compilers would perform the substitution for EXPR inside
the quoted string, possibly with a warning message to the programmer.)
To avoid this problem, ANSI C provides a special “stringizing” operator #.
When the name of a macro parameter is preceded by #, argument substitution
is performed in the usual way. After the substitution, however, the resulting
string is enclosed in quotes. For example, if we define

#define DISPLAY(EXPR) printf(#EXPR " = %d\n", EXPR)

then the invocation DISPLAY(I*J+1) would be expanded into

printf("I*J+1" " = %d\n", I*J+1)

(The two adjacent string literals are automatically concatenated.) Macro invo-
cations may also be nested. Consider, for example, the invocation

DISPLAY(ABSDIFF(3, 8))
(Notice that the ABSDIFF within the quoted string is not treated as a macro
invocation.) When executed, this statement would produce the output
ABSDIFF(3,8) = 5
The ANSI C preprocessor also provides conditional compilation statements.
Consider, for example, the sequence

#ifndef BUFFER_SIZE
#define BUFFER_SIZE 1024
#endif
Here, the #define will be processed only if BUFFER_SIZE has not already been
defined.
Conditionals are also often used to control the inclusion of debugging
statements in a program. Consider, for example, the sequence
#define DEBUG 1
  .
  .
#if DEBUG == 1
printf( ... )
#endif
In this case, the printf statement will be included in the output from the pre-
processor (and therefore compiled into the program). If the first line were
changed to
#define DEBUG 0
the printf would not be included in the program. The same thing could also be
accomplished by writing
#ifdef DEBUG
In this case, the printf would be included if a #define statement for DEBUG
appeared in the source program.
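One common way to package these conditionals, given here as an illustrative
sketch rather than anything from the text, is to hide them inside a macro
definition so that the debugging calls themselves can remain permanently in
the source:

#include <stdio.h>

#define DEBUG 1

#if DEBUG == 1
#define DBG(msg, val)  printf("%s = %d\n", (msg), (val))
#else
#define DBG(msg, val)  ((void)0)   /* expands to a harmless no-op when off */
#endif

int main(void) {
    int i = 42;
    DBG("i", i);    /* printed only when DEBUG is 1 */
    return 0;
}

Changing the definition of DEBUG to 0 removes all of the debugging output
at compile time, without touching any other line of the program.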
4.4.3 The ELENA Macro Processor

This section describes some of the features of the ELENA macro processor.
ELENA is a general-purpose macro processor: a macro definition consists of a
header and a body, and the header is a sequence of keywords and parameter
markers (written %1, %2, etc.). For example, a macro with the header

%1 = %2 + %3

could be invoked by any statement that matches this pattern of tokens.
Figure 4.14 illustrates how ELENA could be used with different languages.
Consider the macro header shown in Fig. 4.14(a). If this macro is to be used
with the C language, its body might be defined as shown in Fig. 4.14(b). An
example of a macro invocation and expansion using this body appears in
Fig. 4.14(c).
On the other hand, suppose that the macro is to be used in an x86 assem-
bler language program. In that case, the body might be defined as shown in
Fig. 4.14(d). An example of a macro invocation and expansion using this body
appears in Fig. 4.14(e). Notice that in this expansion the label &STOR is
changed to STOR0001. The character & identifies &STOR as a local label
within the macro definition. The macro processor appends a numeric value to
create unique labels each time the macro is expanded.
ELENA also provides macro-time variables and macro-time instructions
that can be used to control the macro expansion. Consider the macro header
shown in Fig. 4.15(a) and the associated body in Fig. 4.15(b). The .SET state-
ment on the first line of the macro body sets the macro-time variable .LAA to
1. The next line is a statement to be generated as part of the macro expansion.
After this line is generated, the following .SET statement adds 1 to the value of
.LAA. If this new value is less than or equal to the value of the second parame-
ter, the .IF macro-time instruction causes the macro processor to jump back to
the line with the macro-time label .E. Figure 4.15(c) shows an example of a
macro invocation and expansion using this body.
The macro-time instructions in ELENA represent a different type of approach
to conditional macro expansion. The .IF statement in Fig. 4.15(b) is a macro-
time conditional “go to” instruction. In the SIC macro language, we would
have written this macro definition using a WHILE-ENDW structure, which is
a higher-level macro-time statement.
%1 := ABSDIFF (%2, %3)
(a)

(b)

Z := ABSDIFF (X, Y)
(c)

         MOV   EAX,%2
         SUB   EAX,%3
         JNS   &STOR
         NEG   EAX
&STOR    MOV   %1,EAX
(d)

Z := ABSDIFF (X, Y)

         MOV   EAX,X
         SUB   EAX,Y
         JNS   STOR0001
         NEG   EAX
STOR0001 MOV   Z,EAX
(e)

Figure 4.14 Examples of ELENA macro definition and invocation.
(a)

      .SET .LAA = 1
.E    V(.LAA) = V(.LAA) + %1
      .SET .LAA = .LAA + 1
      .IF .LAA LE %2 .JUMP .E
(b)

V(1) = V(1) + 5
V(2) = V(2) + 5
V(3) = V(3) + 5
(c)

Figure 4.15 Use of ELENA macro-time variables and instructions.
The ELENA macro processor uses a macro definition table that is similar to
the one we discussed for SIC. However, the process of matching a macro invo-
cation with a macro header is more complicated. Notice that there is no single
token that constitutes the macro “name.” Instead, the macro is identified by
the sequence of keywords that appear in its header. Consider, for example, the
two macro headers
ADD %1 TO %2
ADD %1 TO THE FIRST ELEMENT OF %2
Furthermore, it is not even clear from a macro invocation statement which to-
kens are keywords and which are parameters. A sequence of tokens like

DISPLAY TABLE

might be an invocation of a macro with the header

DISPLAY %1
(where the parameter specifies what to display). On the other hand, it could
also be an invocation of a macro with header
%1 TABLE
(where the parameter specifies what to do with TABLE). For example, the
statement

A SUM B,C
would be compared against all macro headers in which the first token is A or
the second token is SUM.
It is also possible for an invocation to match more than one macro header.
For example, the statement

A = B + 1

would match both of the headers

%1 = %2 + %3

and

%1 = %2 + 1
In this situation, ELENA selects the header with the fewest parameters (i.e.,
the second of the two headers just mentioned). If there are two or more match-
ing headers with the same number of parameters, the most recently defined
macro is selected.
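The following C sketch illustrates the matching rule just described; the data
structures are our own invention, not ELENA's actual implementation. Each
header is stored as a sequence of tokens in which %n marks a parameter
position, and the invocation A = B + 1 is tried against the two headers
discussed above, selecting the match with the fewest parameters.

#include <stdio.h>
#include <string.h>

#define MAXTOK 8

/* the two headers from the discussion above */
static const char *headers[][MAXTOK] = {
    { "%1", "=", "%2", "+", "%3", NULL },
    { "%1", "=", "%2", "+", "1",  NULL },
};

static int is_param(const char *t) { return t[0] == '%'; }

/* number of parameters if the invocation matches the header, else -1 */
static int match(const char *inv[], const char *hdr[]) {
    int params = 0, i;
    for (i = 0; hdr[i] && inv[i]; i++) {
        if (is_param(hdr[i])) params++;                    /* parameter slot */
        else if (strcmp(hdr[i], inv[i]) != 0) return -1;   /* keyword mismatch */
    }
    return (hdr[i] == NULL && inv[i] == NULL) ? params : -1;
}

int main(void) {
    const char *inv[] = { "A", "=", "B", "+", "1", NULL };
    int best = -1, best_params = MAXTOK + 1;
    for (int h = 0; h < 2; h++) {
        int p = match(inv, headers[h]);
        if (p >= 0 && p < best_params) { best = h; best_params = p; }
    }
    printf("matched header %d with %d parameters\n", best, best_params);
    return 0;   /* prints: matched header 1 with 2 parameters */
}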
EXERCISES

Section 4.1

1. Apply the algorithm in Fig. 4.5 to process the source program in
   Fig. 4.1; the results should be the same as shown in Fig. 4.2.

2. Using the methods outlined in Chapter 8, develop a modular design
   for a one-pass macro processor.
Section 4.2

3. Modify the algorithm in Fig. 4.5 to include concatenation operators.

4. Modify the algorithm in Fig. 4.5 to include the generation of unique
   labels within macro expansions.

5. Suppose that we want to allow labels within macro expansions with-
   out requiring them to have any special form (such as beginning with
   $). Each such label would be considered to be defined only within
   the macro expansion in which it occurs; this would eliminate the
   problem caused by duplicate labels. How could the macro processor
   and the assembler work together to allow this?

6. What is the most important difference between the following two
   sequences of statements?

   a.        LDA    ALPHA
             COMP   #0
             JEQ    SKIP
             LDA    #3
             STA    BETA
      SKIP

   b.        IF     (&ALPHA NE 0)
      &BETA  SET    3
             ENDIF
7. Show how the following macro invocation statements would be
   expanded:

   a.        RDBUFF F1,BUFFER,LENGTH,00,1024

   b. LOOP   RDBUFF F2,BUFFER,LTH

8. Suppose that you have a simple one-pass macro processor like the
   one described in Section 4.1. How could it support a conditional
   structure of the form

             IFDEF ALPHA
              .
              .
             ENDIF
9. Suppose that you have a simple one-pass macro processor like the
one described in Section 4.1. Now you want to add a built-in func-
tion named %SIZEOF to the macro processor. This function can be
applied to macro parameters, and returns the number of bytes occu-
pied by the corresponding argument. Consider, for example, the
following program:
P8      START  0
MOVE    MACRO  &FROM,&TO
&LENGTH SET    %SIZEOF(&FROM)
        IF     (&LENGTH EQ 1)
        LDCH   &FROM
        STCH   &TO
        ELSE
        LDX    #&LENGTH
        LDS    #&FROM
        LDT    #&TO
        JSUB   MOVERTN
        ENDIF
        MEND
A       RESB   1
B       RESB   1
C       RESB   500
D       RESB   500
        END
        LDT    #D
        JSUB   MOVERTN
10. What is the most important difference between the following two
    sequences of statements?

    a.        LDT    #8
              CLEAR  X
    LOOP
              TIXR   T
              JLT    LOOP

    b. &CTR   SET    0
              WHILE  (&CTR LT 8)
11. Show how the following macro invocation statements would be
    interpreted:

    a.        RDBUFF F1,BUFFER,LENGTH,(04,12)

    b. LABEL  RDBUFF F1,BUFFER,LENGTH,00

    c.        RDBUFF F1,BUFFER,LENGTH
accomplished?
12. Modify the algorithm in Fig. 4.5 to include keyword parameters.

13. Some macro processors allow macro instructions in which some of
    the parameters are keyword parameters and some are positional pa-
    rameters. How could a macro processor handle such mixed-mode
    macro instructions?
14. How might a macro processor detect each of the following errors in
    RDBUFF invocations?

    a.  RDBUFF F3,BUF,RECL,ZZ            {illegal value specified for &EOR}

    b.  RDBUFF F3,BUF,RECL,04,2048,01    {too many arguments}

    c.  RDBUFF F3,,RECL,04               {no value specified for &BUFADR}

    d.  RDBUFF F3,RECL,BUF
Section 4.3
Chapter 5

Compilers
In this chapter we discuss the design and operation of compilers for high-level
programming languages. Many textbooks and courses are entirely devoted to
compiler construction. We obviously cannot hope to cover the subject thor-
oughly in a single chapter. Instead, our goal is to give the reader an under-
standing of how a typical compiler works. We introduce the most important
concepts and issues related to compilers, and illustrate them with examples.
As each subject is discussed, we give references for those readers who want to
explore the topic in more detail.
Section 5.1 presents the basic functions of a simple one-pass compiler. We
illustrate the operation of such a compiler by following an example program
through the entire translation process. This section contains somewhat more
detail than the other parts of the chapter because of the fundamental impor-
tance of the material.
5.1 BASIC COMPILER FUNCTIONS

This section introduces the fundamental operations that are necessary in com-
piling a typical high-level language program. We use as an example the Pascal
program in Fig. 5.1; however, the concepts and approaches that we discuss can
also be applied to the compilation of programs in other languages.
For the purposes of compiler construction, a high-level programming lan-
guage is usually described in terms of a grammar. This grammar specifies the
form, or syntax, of legal statements in the language.
 1   PROGRAM STATS
 2   VAR
 3       SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER
 4   BEGIN
 5       SUM := 0;
 6       SUMSQ := 0;
 7       FOR I := 1 TO 100 DO
 8           BEGIN
 9               READ (VALUE);
10               SUM := SUM + VALUE;
11               SUMSQ := SUMSQ + VALUE * VALUE
12           END;
13       MEAN := SUM DIV 100;
14       VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
15       WRITE (MEAN, VARIANCE)
16   END.

Figure 5.1 Example of a Pascal program.
single pass. Our discussions in this section describe how such a one-pass com-
piler might work. On the other hand, compilers for other languages and com-
pilers that perform sophisticated code optimization or other analysis of the
program generally make several passes. In Section 5.4 we discuss the division
of a compiler into passes. Section 5.5 gives several examples of the structure of
actual compilers.
In the following sections we discuss the basic elements of a simple compi-
lation process, illustrating their application to the example program in Fig. 5.1.
Section 5.1.1 introduces some concepts and notation used in specifying gram-
mars for programming languages. Sections 5.1.2 through 5.1.4 discuss, in turn,
the functions of lexical analysis, syntactic analysis, and code generation.
5.1.1 Grammars
A grammar for a programming language is a formal description of the syntax,
or form, of programs and individual statements written in the language. The
grammar does not describe the semantics, or meaning, of the various state-
ments. To illustrate the difference between syntax and semantics, consider
the two statements

I := J + K

and

X := Y + I
where X and Y are REAL variables and I, J, K are INTEGER variables. These
two statements have identical syntax. Each is an assignment statement; the
value to be assigned is given by an expression that consists of two variable
names separated by the operator +. However, the semantics of the two state-
ments are quite different. The first statement specifies that the variables in the
expression are to be added using integer arithmetic operations. The second
statement specifies a floating-point addition, with the integer operand I being
converted to floating point before adding. Obviously, these two statements
would be compiled into very different sequences of machine instructions.
However, they would be described in the same way by the grammar. The dif-
ferences between the statements would be recognized during code generation.
A number of different notations can be used for writing grammars. The
one we describe is called BNF (for Backus-Naur Form). BNF is not the most
powerful syntax description tool available. In fact, it is not even totally ade-
quate for the description of some real programming languages. It does,
however, have the advantages of being simple and widely used, and it pro-
vides capabilities that are sufficient for most purposes. Figure 5.2 gives one
possible BNF grammar for a highly restricted subset of the Pascal language. A
complete BNF grammar for Pascal can be found in Jensen and Wirth (1974). In
the remainder of this section, we discuss this grammar and show how it re-
lates to the example program in Fig. 5.1.
A BNF grammar consists of a set of rules, each of which defines the syntax
of some construct in the programming language. Consider, for example, Rule
6 in Fig. 5.2:

<id-list> ::= id | <id-list> , id
This rule offers two possibilities, separatedby the | symbol, for the syntax of
an <id-list>. The first alternative specifiesthat an <id-list> may consist simply
of a token id (the notation id denotes an identifier that is recognized by the
scanner). The second syntax alternative is an <id-list>, followed by the token
“,” (comma), followed by a token id. Note that this rule is recursive, which
means the construct <id-list> is defined partially in terms of itself. By trying a
few examples you should be able to see that this rule includes in the definition
of <id-list> any sequence of one or more id’s separated by commas. Thus
ALPHA
ALPHA , BETA

are both examples of an <id-list>. Rule 13 of the grammar defines the syntax
of a <read> statement: <read> ::= READ ( <id-list> ). Thus the statement on
line 9 of the program in Fig. 5.1,

READ ( VALUE )

is a <read> consisting of the token READ, followed by the token (, followed
by an <id-list>, followed by the token ). Similarly, Rule 9 defines the syntax of
an assignment statement:

<assign> ::= id := <exp>

That is, an <assign> consists of an id, followed by the token :=, followed by an
expression <exp>. Rule 10 gives a definition of an <exp>:

<exp> ::= <term> | <exp> + <term> | <exp> - <term>
By reasoning similar to that applied to <id-list>, we can see that this rule de-
fines an expression<exp> to be any sequenceof <term>s connectedby the op-
erators + and -. Similarly, Rule 11 defines a <term> to be any sequence of
(a)  {parse tree for the statement READ ( VALUE ) from line 9}

(b)  {parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
     from line 14}
Figure 5.3 Parse trees for two statements from Fig. 5.1.
<factor>s connected by * and DIV. Rule 12 specifies that a <factor> may con-
sist of an identifier id or an integer int (which is also recognized by the scan-
ner) or an <exp> enclosed in parentheses.
Figure 5.3(b) shows the parse tree for statement 14 from Fig. 5.1 in terms of
the rules just described. You should examine this figure carefully to be sure
you understand the analysis of the source statement according to the rules of
the grammar. In Section 5.1.3, we discuss methods for performing this sort of
syntactic analysis in a compiler.
Note that the parse tree in Fig. 5.3(b) implies that multiplication and divi-
sion are done before addition and subtraction. The terms SUMSQ DIV 100 and
MEAN * MEAN must be calculated first since these intermediate results are
the operands (left and right subtrees) for the - operation. Another way of say-
ing this is that multiplication and division have higher precedence than addi-
tion and subtraction. These rules of precedence are implied by the way Rules
10-12 are constructed (see Exercise 5.1.3). In Section 5.1.3 we see a way to
make use of such precedence relationships during the parsing process.
The parse trees shown in Fig. 5.3 represent the only possible ways to ana-
lyze these two statements in terms of the grammar of Fig. 5.2. For some gram-
mars, this might not be the case. If there is more than one possible parse tree
for a given statement, the grammar is said to be ambiguous. We prefer to use
unambiguous grammars in compiler construction because, in some cases, an
ambiguous grammar would leave doubt about what object code should be
generated.
Figure 5.4 shows the parse tree for the entire program in Fig. 5.1. You
should examine this figure carefully to see how the form and structure of the
program correspond to the rules of the grammar in Fig. 5.2.
{parse tree for the complete program from Fig. 5.1, with root <prog> and
subtrees for the declarations and for each statement}

Figure 5.4 Parse tree for the program from Fig. 5.1.
scanner can perform this same function much more efficiently. Since a large
part of the source program consists of such multiple-character identifiers, this
saving in compilation time can be highly significant. In addition, restrictions
such as a limitation on the length of identifiers are easier to include in a scan-
ner than in a general-purpose parsing routine.
Similarly, the scanner generally recognizes both single- and multiple-
character tokens directly. For example, the character string READ would be in-
terpreted as a single token rather than as a sequenceof four tokens R, E, A, D.
The string := would be recognized as a single assignment operator, not as :
followed by =. It is, of course, possible to handle multiple-character tokens one
character at a time, but such an approach creates considerably more work for
the parser.
The output of the scanner consists of a sequence of tokens. For efficiency of
later use, each token is usually represented by some fixed-length code, such as
an integer, rather than as a variable-length character string. In such a coding
scheme for the grammar of Fig. 5.2 (shown in Fig. 5.5) the token PROGRAM
would be represented by the integer value 1, an identifier id would be repre-
sented by the value 22, and so on.
Token      Code

PROGRAM      1
VAR          2
BEGIN        3
END          4
END.         5
INTEGER      6
FOR          7
READ         8
WRITE        9
TO          10
DO          11
;           12
:           13
,           14
:=          15
+           16
-           17
*           18
DIV         19
(           20
)           21
id          22
int         23
Figure 5.5 Token coding scheme for the grammar from Fig. 5.2.
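As an illustration (our own, not from the text), this coding scheme might be
carried through a compiler written in C as an integer code plus an optional
token specifier:

#include <stdio.h>

enum token_code {
    T_PROGRAM = 1, T_VAR = 2, T_BEGIN = 3, T_END = 4, T_ENDDOT = 5,
    T_INTEGER = 6, T_FOR = 7, T_READ = 8, T_WRITE = 9, T_TO = 10,
    T_DO = 11, T_SEMI = 12, T_COLON = 13, T_COMMA = 14, T_ASSIGN = 15,
    T_PLUS = 16, T_MINUS = 17, T_STAR = 18, T_DIV = 19,
    T_LPAREN = 20, T_RPAREN = 21, T_ID = 22, T_INT = 23
};

struct token {
    enum token_code code;
    const char *specifier;     /* e.g. "^SUM" for an id, "#100" for an int */
};

int main(void) {
    /* the tokens produced for line 5 of Fig. 5.1:  SUM := 0; */
    struct token line5[] = {
        { T_ID, "^SUM" }, { T_ASSIGN, NULL }, { T_INT, "#0" }, { T_SEMI, NULL }
    };
    for (size_t i = 0; i < sizeof line5 / sizeof line5[0]; i++)
        printf("%d %s\n", line5[i].code,
               line5[i].specifier ? line5[i].specifier : "");
    return 0;
}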
{Figure 5.6 Lexical scan of the program from Fig. 5.1. For each line of the
source program, the figure lists the token type codes from Fig. 5.5 and the
token specifiers produced by the scanner, such as ^STATS and ^SUM for
identifiers and #100 for integers.}
In FORTRAN, for example, the statement

DO 10 I = 1,100

is the beginning of a loop, whereas

DO 10 I = 1

is an assignment to the variable DO10I. In scanning such a statement, the
scanner must look ahead to see if there is a comma (,) before it can decide on
the proper interpretation of the characters DO.
{fragment of a further example: a statement in which keywords such as IF,
THEN, ELSE, and ENDIF also appear as variable names, e.g. THEN = IF}
abc        {recognized}
abccabc    {recognized}
ac         {not recognized}

(b)
Similarly, consider the second input string shown in Fig. 5.7(b). The scan-
ning of the first three characters happens exactly as described above. This
time, however, there are still characters left in the input string. The fourth
character of the string (the second c) causes the automaton to remain in State 4
(note the arrow labeled with c that loops back to State 4). The following a takes
the automaton back to State 2. At the end of the input string, the finite automa-
ton is again in State 4, so it recognizes the string abccabc.
On the other hand, consider the third input string in Fig. 5.7(b). The finite
automaton begins in State 1, as before, and the a causesa transition from State
1 to State 2. Now the next character to be scanned is c. However, there is no
transition from State 2 that is labeled with c. Therefore, the automaton must
stop in State 2. Because this is not a final state, the finite automaton fails to rec-
ognize the input string. If you try some other examples, you will discover that
the finite automaton in Fig. 5.7(a) recognizes tokens of the form abc...abc...
where the grouping abc is repeated one or more times, and the c within each
grouping may also be repeated.
Figure 5.8 shows several finite automata that are designed to recognize
typical programming language tokens. Figure 5.8(a) recognizes identifiers and
keywords that begin with a letter and may continue with any sequence of let-
ters and digits. Notice the notation A-Z, which specifies that any character
from A to Z may cause the indicated transition. For simplicity, we have consid-
ered only uppercase letters in this example.
Some languages allow identifiers such as NEXT_LINE, which contains the
underscore character (_). Figure 5.8(b) shows a finite automaton that recog-
nizes identifiers of this type. Notice that this automaton does not allow identi-
fiers that begin or end with an underscore, or that contain two consecutive
underscores.
The finite automaton in Fig. 5.8(c) recognizes integers that consist of a
string of digits, including those that contain leading zeroes, such as 000025.
Figure 5.8(d) shows an automaton that does not allow leading zeroes, except
in the case of the integer 0. An integer that consists only of the digit 0 must be
followed by a space to separate it from the following token. You are encour-
aged to try several example strings with the finite automata in Fig. 5.8, to be
sure you see how they work.
Each of the finite automata we have seen so far was designed to recognize
one particular type of token. Figure 5.9 shows a finite automaton that can rec-
ognize all of the tokens listed in Fig. 5.5. Notice that for simplicity we have
chosen to recognize all identifiers and keywords with one final state (State 2).
A separate table look-up operation could then be used to distinguish key-
words. Likewise, a separate check could be made to ensure that identifiers are
of a length permitted by the language definition. (Finite automata cannot eas-
ily represent limitations on the length of strings being recognized.)
the string being recognized is “END.”. If it is not, the scanner could, in effect,
back up to State 2 (recognizing the END). The period would then be re-
scanned as part of the following token the next time the scanner is called.
Notice that this kind of backup is not required with State 7. The sequence := is
always recognized as an assignment operator, not as : followed by =.
Finite automata provide an easy way to visualize the operation of a scan-
ner. However, the real advantage of this kind of representation is in easeof im-
plementation. Consider again the problem of recognizing identifiers that may
contain underscores. Figure 5.10(a) shows a typical algorithm to recognize
such a token.
in the table. The tabular representation is usually much clearer and less error-
prone than an algorithmic representation such as Fig. 5.10(a). It is also much
easier to change table entries than to modify nested loops or procedure calls.
            Last_Char_Is_Underscore := false
        end {while}
        if Last_Char_Is_Underscore then
            return (Token_Error)
        else
            return (Valid_Token)
    end {if first in ['A'..'Z']}
else
    return (Token_Error)

(a)
State    Letter    Digit    _

  1         2                       {starting state}
  2         2         2      3      {final state}
  3         2         2

(b)
Figure 5.10 Token recognition using (a) algorithmic code and (b) tabu-
lar representation of finite automaton.
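The tabular form translates almost directly into a program. The following C
sketch is our own rendering of the state table in Fig. 5.10(b): a two-dimen-
sional array gives the next state for each combination of current state and
character class, and a string is accepted exactly when the automaton ends in
the final state 2.

#include <ctype.h>
#include <stdio.h>

enum { LETTER, DIGIT, UNDERSCORE, OTHER };

static int class_of(int c) {
    if (isalpha(c)) return LETTER;
    if (isdigit(c)) return DIGIT;
    if (c == '_')   return UNDERSCORE;
    return OTHER;
}

/* next_state[state][class]; 0 means no transition (error) */
static const int next_state[4][3] = {
    { 0, 0, 0 },   /* row 0 unused, so states are numbered from 1 */
    { 2, 0, 0 },   /* state 1: only a letter may start an identifier */
    { 2, 2, 3 },   /* state 2: the final state */
    { 2, 2, 0 },   /* state 3: '_' must be followed by a letter or digit */
};

static int valid_identifier(const char *s) {
    int state = 1;
    for (; *s; s++) {
        int cls = class_of((unsigned char)*s);
        state = (cls == OTHER) ? 0 : next_state[state][cls];
        if (state == 0) return 0;
    }
    return state == 2;
}

int main(void) {
    printf("%d %d %d\n",
           valid_identifier("NEXT_LINE"),   /* 1: accepted */
           valid_identifier("_X"),          /* 0: leading underscore */
           valid_identifier("A__B"));       /* 0: consecutive underscores */
    return 0;
}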
being translated. Parsing techniques are divided into two general classes—
bottom-upand top-down—accordingto the way in which the parse tree is con-
structed. Top-down methods begin with the rule of the grammar that specifies
the goal of the analysis (i.e., the root of the tree), and attempt to construct the
tree so that the terminal nodes match the statements being analyzed. Bottom-
up methods begin with the terminal nodes of the tree (the statements being
analyzed), and attempt to combine these into successively higher-level nodes
until the root is reached.
Operator-Precedence Parsing
A + B * C - D
According to the usual rules of arithmetic, multiplication and division are per-
formed before addition and subtraction—that is, multiplication and division
have higher precedence than addition and subtraction. If we examine the first
two operators ( + and *), we find that + has lower precedencethan *. This is of-
ten written as
+ < *

Similarly, for the next pair of operators (* and -), we would find that * has
higher precedence than -. We may write this as

* > -

Inserting these relations between the operators of the expression gives

A  +  B  *  C  -  D
      <      >
PROGRAM = VAR
The relation = indicates that the two tokens involved have equal precedence
and should be recognized by the parser as part of the same language con-
struct. Note that the precedence relations do not follow the ordinary rules for
comparisons. For example, we have
; > END
but
END > ;
That is, when ; is followed by END, the ; has higher precedence. But when
END is followed by ;, the END has higher precedence.
Also note that in many casesthere is no precedence relation between a pair
of tokens. This means that these two tokens cannot appear together in any le-
gal statement. If such a combination occurs during parsing, it should be recog-
nized as a syntax error.
There are algorithmic methods for constructing a precedence matrix like
Fig. 5.11 from a grammar [see, for example, Aho et al. (1988)]. For the operator-
precedence parsing method to be applied, it is necessary that all the prece-
dence relations be unique. For example, we could not have both ; < BEGIN
and ; > BEGIN. This condition holds true for the grammar in Fig. 5.2;
however, if seemingly minor changes were made to this grammar, some of the
precedence relations would no longer be unique.
{Figure 5.11 Precedence matrix for the grammar from Fig. 5.2. The matrix
has one row and one column for each token; each entry is <, =, or >, and a
blank entry means that no precedence relation exists between the two tokens.}
In parsing the statement READ ( VALUE ) from line 9, the id is first recog-
nized and reduced to a nonterminal, giving

READ ( <N1> )
which corresponds, except for the name of the nonterminal symbol, to Rule 13
of the grammar. This rule is the only one that could be applied in recognizing
this portion of the program. As before, however, we simply interpret the se-
quence as some nonterminal <N2>.
This completes the parsing of the READ statement. If we compare the
parse tree shown in Fig. 5.3(a) with the one just developed, we see that they
are the same except for the names of the nonterminal symbols involved. This
means that we have correctly identified the syntax of the statement, which is
the goal of the parsing process. The names of the nonterminals are arbitrarily
chosen by the person writing the grammar, and have no real bearing on the
syntax of the source statement.
Figure 5.13 shows a similar step-by-step parsing of the assignment state-
ment from line 14 of the program in Fig. 5.1. Note that the left-to-right scan is
continued in each step only far enough to determine the next portion of the
statement to be recognized, which is the first portion delimited by < and >.
Once this portion has been determined, it is interpreted as a nonterminal ac-
cording to some rule of the grammar. This process continues until the com-
plete statement is recognized. You should carefully follow the steps shown in
Fig. 5.13 to be sure you understand how the statement is analyzed with the aid
of the precedence matrix in Fig. 5.11. Note that each portion of the parse tree is
constructed from the terminal nodes up toward the root, hence the term bot-
tom-up parsing.
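The core of such a parser is a short loop: compare the topmost terminal on
the stack with the incoming token, shift when the relation is < or =, and
reduce when it is >. The C sketch below is our own simplified illustration for
a small expression language, not the parser for the grammar of Fig. 5.2; to
keep it short it pops single terminals as handles instead of building
nonterminal nodes.

#include <stdio.h>
#include <string.h>

/* terminals: i(dentifier) + * ( ) $(end marker) */
static const char *terms = "i+*()$";
static int idx(char t) { return (int)(strchr(terms, t) - terms); }

/* prec[a][b] is '<', '=', '>' or ' ' (no relation = syntax error) */
static const char prec[6][6] = {
    /*        i    +    *    (    )    $   */
    /* i */ {' ', '>', '>', ' ', '>', '>'},
    /* + */ {'<', '>', '<', '<', '>', '>'},
    /* * */ {'<', '>', '>', '<', '>', '>'},
    /* ( */ {'<', '<', '<', '<', '=', ' '},
    /* ) */ {' ', '>', '>', ' ', '>', '>'},
    /* $ */ {'<', '<', '<', '<', ' ', ' '},
};

int main(void) {
    const char *input = "i+i*i$";
    char stack[32] = "$";
    int top = 0, i = 0;

    while (!(stack[top] == '$' && input[i] == '$')) {
        char rel = prec[idx(stack[top])][idx(input[i])];
        if (rel == '<' || rel == '=') {
            stack[++top] = input[i++];          /* shift */
            printf("shift  %c\n", stack[top]);
        } else if (rel == '>') {
            printf("reduce %c\n", stack[top]);  /* reduce the handle */
            top--;
        } else {
            printf("syntax error\n");
            return 1;
        }
    }
    printf("accepted\n");
    return 0;
}

The trace printed for i+i*i$ shows the operands of * being reduced before the
operands of +, exactly the order implied by the precedence relations.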
Comparing the parse tree built in Fig. 5.13 with the one in Fig. 5.3(b), we
note a few differences in structure. For example, in Fig. 5.3 the id SUMSQ is
interpreted first as a <factor>, which is then interpreted as a <term> that is one
of the operands of the DIV operation. In Fig. 5.13, however, the id SUMSQ is
interpreted as the single nonterminal <N1>, which is an operand of the DIV.
That is, <N1> in the tree from Fig. 5.13 corresponds to two nonterminals,
<factor> and <term>, in Fig. 5.3(b). There are other similar differences be-
tween the two trees.
These differences are consistent with our use of arbitrary names for the
nonterminal symbols recognized during an operator-precedence parse. In Fig.
5.3(b), the interpretation of SUMSQ as a <factor> and then as a <term> is sim-
ply a reassignment of names. This renaming is necessary because, according to
Rule 11 of the grammar, the first operand in a multiplication operation must
be a <term>, not a <factor>. Since our operator-precedence parse is not con-
cerned with the names of the nonterminals in any case, it is not necessary to
perform this additional step in the recognition process. As a matter of fact, the
three different names <exp>, <term>, and <factor> were originally included in
the grammar only as a means of specifying the precedence of operators (for
example, that multiplication is performed before addition). Since this informa-
tion is incorporated into our precedence matrix, there is no need to be con-
cerned with the different names during the actual parsing.
{Successive steps in the bottom-up parse of the statement
VARIANCE := SUMSQ DIV 100 - MEAN * MEAN: SUMSQ DIV 100 and
MEAN * MEAN are each reduced to nonterminals, their difference is reduced,
and finally the complete <assign> is recognized.}

Figure 5.13 (contd)
{Figure 5.14: parts (a)-(f) show successive steps of the parsing process,
beginning with the token BEGIN and the statement READ ( id ).}
Recursive-Descent Parsing
Figure 5.15 shows the grammar from Fig. 5.2 with left recursion elimi-
nated. Consider, for example, Rule 6a in Fig. 5.15:
<id-list> ::= id { , id }
This notation, which is a common extension to BNF, specifies that the terms
between { and } may be omitted, or repeated one or more times. Thus Rule 6a
defines <id-list> as being composed of an id followed by zero or more occur-
rences of “, id”. This is clearly equivalent to Rule 6 of Fig. 5.2. With the revised
definition, the procedure for <id-list> simply looks first for an id, and then
keeps scanning the input as long as the next two tokens are a comma (,) and
id. This eliminates the problem of left recursion and also the difficulty of de-
ciding which alternative for <id-list> to try.
Similar changeshave been made in Rules 3a, 7a, 10a, and 11a in Fig. 5.15.
You should compare these rules to the corresponding definitions in Fig. 5.2 to
be sure you understand the changes made. Note that the grammar itself is still
recursive: <exp> is defined in terms of <term>, which is defined in terms of
<factor>, and one of the alternatives for <factor> involves <exp>. This means
that recursive calls among the procedures of the parser are still possible.
However, direct left recursion has been eliminated. A chain of calls from
<exp> to <term> to <factor> and back to <exp> must always consume at least
one token from the input statement.
 1   <prog>      ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END.
 2   <prog-name> ::= id
 3a  <dec-list>  ::= <dec> { ; <dec> }
 4   <dec>       ::= <id-list> : <type>
 5   <type>      ::= INTEGER
 6a  <id-list>   ::= id { , id }
 7a  <stmt-list> ::= <stmt> { ; <stmt> }
procedure READ
begin
FOUND := FALSE
if TOKEN = 8 {READ} then
begin
advance to next token
if TOKEN = 20 { ( } then
begin
advance to next token
if IDLIST returns success then
if TOKEN = 21 { ) } then
begin
FOUND := TRUE
advance to next token
end {if ) }
end {if ( }
end {if READ}
if FOUND = TRUE then
return success
else
return failure
end {READ}
procedure IDLIST
begin
FOUND := FALSE
if TOKEN = 22 {id} then
begin
FOUND := TRUE
advance to next token
while (TOKEN = 14 {,}) and (FOUND = TRUE) do
begin
advance to next token
if TOKEN = 22 {id} then
advance to next token
else
FOUND := FALSE
end {while}
end {if id}
if FOUND = TRUE then
return success
else
return failure
end {IDLIST}
(a)
(b)
In the procedure IDLIST, note that a comma (,) that is not followed by an
id is considered to be an error, and the procedure returns an indication of fail-
ure to its caller. If a sequence of tokens such as “id,id,” could be a legal con-
struct according to the grammar, this recursive-descent technique would not
work properly. For such a grammar, it would be necessary to use a more com-
plex parsing method that would allow the top-down parser to backtrack after
recognizing that the last comma was not followed by an id.
Figure 5.16(b) gives a graphic representation of the recursive-descent pars-
ing process for the statement being analyzed. In part (i), the READ procedure
has been invoked and has examined the tokens READ and ( from the input
stream (indicated by the dashed lines). In part (ii), READ has called IDLIST (in-
dicated by the solid line), which has examined the token id. In part (iii), IDLIST
has returned to READ, indicating success; READ has then examined the input
token ). This completes the analysis of the source statement. The procedure
READ will now return to its caller, indicating that a <read> was successfully
found. Note that the sequence of procedure calls and token examinations has
completely defined the structure of the READ statement. The representation in
part (iii) is the same as the parse tree in Fig. 5.3(a). Note also that the parse tree
was constructed beginning at the root, hence the term top-down parsing.
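For comparison with the pseudocode above, here is the IDLIST procedure
rendered as a small runnable C program. The hard-wired token stream and
the function names are our own, but the token codes (22 for id, 14 for comma)
follow Fig. 5.5.

#include <stdio.h>

static const int tokens[] = { 22, 14, 22, 14, 22, 12 };  /* id , id , id ; */
static int pos = 0;

static int  TOKEN(void)   { return tokens[pos]; }
static void advance(void) { pos++; }

/* <id-list> ::= id { , id }   -- returns 1 for success, 0 for failure */
static int idlist(void) {
    if (TOKEN() != 22) return 0;
    advance();
    while (TOKEN() == 14) {       /* a comma must be followed by an id */
        advance();
        if (TOKEN() != 22) return 0;
        advance();
    }
    return 1;
}

int main(void) {
    printf("idlist: %s\n", idlist() ? "success" : "failure");
    return 0;
}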
Figure 5.17 illustrates a recursive-descent parse of the assignment state-
ment on line 14 of Fig. 5.1. Figure 5.17(a) shows the procedures for the nonter-
minal symbols that are involved in parsing this statement. You should
carefully compare these procedures to the corresponding rules of the gram-
mar. Figure 5.17(b) is a step-by-step representation of the procedure calls and
token examinations similar to that shown in Fig. 5.16(b). You are urged to fol-
low through each step of the analysis of this statement, using the procedures
in Fig. 5.17(a). Compare the parse tree built in Fig. 5.17(b) to the one in Fig.
5.3(b). Note that the differences between these two trees correspond exactly to
the differences between the grammars of Figs. 5.15 and 5.2.
procedure ASSIGN
begin
FOUND := FALSE
if TOKEN = 22 {id} then
begin
advance to next token
if TOKEN = 15 { := } then
begin
advance to next token
if EXP returns success then
FOUND := TRUE
end {if := }
end {if id}
if FOUND = TRUE then
return success
else
return failure
end {ASSIGN}
procedure EXP
begin
FOUND := FALSE
if TERM returns success then
begin
FOUND := TRUE
while ((TOKEN = 16 {+}) or (TOKEN = 17 {-}))
and ( FOUND = TRUE ) do
begin
advance to next token
if TERM returns failure then
FOUND := FALSE
end {while}
end {if TERM}
if FOUND = TRUE then
return success
else
return failure
end {EXP}
Fig. 5.1. The result should be similar to the parse tree in Fig. 5.4. The only
differences should be ones created by the modifications made to the grammar
in Fig. 5.15.
procedure TERM
begin
FOUND := FALSE
if FACTOR returns success then
begin
FOUND := TRUE
while ((TOKEN = 18 {*}) or (TOKEN = 19 {DIV}))
      and (FOUND = TRUE) do
begin
advance to next token
if FACTOR returns failure then
FOUND := FALSE
end {while}
end {if FACTOR}
if FOUND = TRUE then
return success
else
return failure
end {TERM}
procedure FACTOR
begin
FOUND := FALSE
if (TOKEN = 22 {id}) or (TOKEN = 23 {int}) then
begin
FOUND := TRUE
advance to next token
end {if id or int}
else
if TOKEN = 20 { ( } then
begin
advance to next token
if EXP returns success then
if TOKEN = 21 { ) } then
begin
FOUND := TRUE
advance to next token
end {if )}
end {if ( }
if FOUND = TRUE then
return success
else
return failure
end {FACTOR}
{Figure 5.17(b): step-by-step representation of the procedure calls and token
examinations in the recursive-descent parse of the statement
id {VARIANCE} := id {SUMSQ} DIV int {100} - id {MEAN} * id {MEAN}.}
5.1.4 Code Generation

After the syntax of a program has been analyzed, the last task of compilation
is the generation of object code. In this section we discuss a simple code-
generation technique that creates the object code for each part of the program
as soon as its syntax has been recognized.
The code-generationtechnique we describe involves a set of routines, one
for each rule or alternative rule in the grammar. When the parser recognizes a
portion of the source program according to some rule of the grammar, the cor-
responding routine is executed. Such routines are often called semantic routines
because the processing performed is related to the meaning we associate with
the corresponding construct in the language. In our simple scheme, these
semantic routines generate object code directly, so we refer to them as code-
generation routines. In more complex compilers, the semantic routines might
generate an intermediate form of the program that would be analyzed further
in an attempt to generatemore efficient object code. We discuss this possibility
in more detail in Sections 5.2 and 5.3.
(popped from the stack) in the opposite order, last in-first out. The variable
LISTCOUNT is used to keep a count of the number of items currently in the
list. The code-generation routines also make use of the token specifiers de-
scribed in Section 5.1.2; these specifiers are denoted by S(token). For a token
id, S(id) is the name of the identifier, or a pointer to the symbol-table entry for
it. For a token int, S(int) is the value of the integer, such as #100.
Many of our code-generation routines, of course, create segments of object
code for the compiled program. We give a symbolic representation of this
code, using SIC assembler language. You should remember, however, that the
actual code generated is usually machine language, not assembler language.
As each piece of object code is generated, we assume that a location counter
LOCCTR is updated to reflect the next available address in the compiled pro-
gram (exactly as it is in an assembler).
Figure 5.18 illustrates the application of this process to the READ state-
ment on line 9 of the program in Fig. 5.1. The parse tree for this statement is
repeated for convenience in Fig. 5.18(a). This tree can be generated with
many different parsing methods. Regardlessof the technique used, however,
the parser always recognizes at each step the leftmost substring of the input
that can be interpreted according to a rule of the grammar. In an operator-
precedence parse, this recognition occurs when a substring of the input is
reduced to some nonterminal <Ni>. In a recursive-descent parse, the recogni-
tion occurs when a procedure returns to its caller, indicating success. Thus
the parser first recognizes the id VALUE as an <id-list>, and then recognizes
the complete statement as a <read>.
Figure 5.18(c) shows a symbolic representation of the object code to be
generated for the READ statement. This code consists of a call to a subroutine
XREAD, which would be part of a standard library associatedwith the com-
piler. The subroutine XREAD can be called by any program that wants to per-
form a READ operation. XREAD is linked together with the generated object
program by a linking loader or a linkage editor. (The compiler includes
enough information in the object program to specify this linking operation,
perhaps using Modification records such as those discussed in Chapter 2.)
This technique is commonly used for the compilation of statements that per-
form relatively complex functions. The use of a subroutine avoids the repeti-
tive generation of large amounts of in-line code, which makes the object
program smaller.
Since XREAD may be used to perform any READ operation, it must be
passed parameters that specify the details of the READ. In this case, the para-
meter list for XREAD is defined immediately after the JSUB that calls it. The
first word in this parameter list contains a value that specifies the number of
variables that will be assigned values by the READ. The following words
give the addresses of these variables. Thus the second line in Fig. 5.18(c) spec-
ifies that one variable is to be read, and the third line gives the address of this
{parse tree for the statement READ ( VALUE ), as in Fig. 5.3(a)}

(a)

<id-list> ::= id

<id-list> ::= <id-list> , id

(b)

        +JSUB   XREAD
         WORD   1
         WORD   VALUE

(c)

Figure 5.18 Code generation for a READ statement.
variable. The address of the first word of the parameter list will automatically
be placed in register L by the JSUB instruction. The subroutine XREAD can use
this address to locate its parameters, and then add the length of the parameter
list to register L to find the true return address.
Figure 5.18(b) shows a set of routines that might be used to accomplish this
code generation. The first two routines correspond to alternative structures for
<id-list>, which are shown in Rule 6 of the grammar in Fig. 5.2. In either case,
the token specifier S(id) for a new identifier being added to the <id-list> is in-
serted into the list used by the code-generation routines, and LISTCOUNT is
updated to reflect this insertion. After the entire <id-list> has been parsed, the
list contains the token specifiers for all the identifiers that are part of the <id-
list>. When the <read> statement is recognized, these token specifiers are re-
moved from the list and used to generate the object code for the READ.
Remember that the parser, in generating the tree shown in Fig. 5.18(a), rec-
ognizes first <id-list> and then <read>. At each step, the parser calls the ap-
propriate code-generation routine. You should work through this example
carefully to be sure you understand how the code-generation routines in Fig.
5.18(b) create the object code that is symbolically represented in Fig. 5.18(c).
Figure 5.19 shows the code-generation process for the assignment state-
ment on line 14 of Fig. 5.1. Figure 5.19(a) displays the parse tree for this state-
ment. Most of the work of parsing involves the analysis of the <exp> on the
right-hand side of the :=. As we can see, the parser first recognizes the id
SUMSQ as a <factor> and a <term>; then it recognizes the int 100 as a
<factor>; then it recognizes SUMSQ DIV 100 as a <term>, and so forth. This is
essentially the same sequence of steps shown in Fig. 5.13. The order in which
the parts of the statement are recognized is the same as the order in which the
calculations are to be performed: SUMSQ DIV 100 and MEAN * MEAN are
computed, and then the second result is subtracted from the first.
As each portion of the statement is recognized, a code-generation routine
is called to create the corresponding object code. For example, suppose we
want to generate code that corresponds to the rule

<term>1 ::= <term>2 * <factor>
The subscripts are used here to distinguish between the two occurrences of
<term>. Our code-generation routines perform all arithmetic operations using
register A, so we clearly need to generate a MUL instruction in the object code.
The result of this multiplication, <term>1, will be left in register A by the MUL.
If either <term>2 or <factor> is already present in register A, perhaps as the re-
sult of a previous computation, the MUL instruction is all we need. Otherwise,
we must generate a LDA instruction preceding the MUL. In that case we must
also save the previous value in register A if it will be required for later use.
{parse tree for the statement VARIANCE := SUMSQ DIV 100 - MEAN * MEAN,
as in Fig. 5.3(b)}

(a)
<assign> ::= id := <exp>

    GETA (<exp>)
    generate [STA S(id)]
    REGA := null

<exp> ::= <term>

    S(<exp>) := S(<term>)
    if S(<exp>) = rA then
        REGA := <exp>

<exp>1 ::= <exp>2 + <term>

    if S(<exp>2) = rA then
        generate [ADD S(<term>)]
    else if S(<term>) = rA then
        generate [ADD S(<exp>2)]
    else
        begin
            GETA (<exp>2)
            generate [ADD S(<term>)]
        end
    S(<exp>1) := rA
    REGA := <exp>1

<term> ::= <factor>

    S(<term>) := S(<factor>)
    if S(<term>) = rA then
        REGA := <term>

<term>1 ::= <term>2 * <factor>

    if S(<term>2) = rA then
        generate [MUL S(<factor>)]
    else if S(<factor>) = rA then
        generate [MUL S(<term>2)]
    else
        begin
            GETA (<term>2)
            generate [MUL S(<factor>)]
        end
    S(<term>1) := rA
    REGA := <term>1

<term>1 ::= <term>2 DIV <factor>

    if S(<term>2) = rA then
        generate [DIV S(<factor>)]
    else
        begin
            GETA (<term>2)
            generate [DIV S(<factor>)]
        end
    S(<term>1) := rA
    REGA := <term>1

<factor> ::= id

    S(<factor>) := S(id)

<factor> ::= int

    S(<factor>) := S(int)

(b)

procedure GETA (NODE)
begin
    if REGA <> null then
        begin
            create a new temporary variable Ti
            generate [STA Ti]
            S(REGA) := Ti
        end
    generate [LDA S(NODE)]
    S(NODE) := rA
    REGA := NODE
end {GETA}

(c)
        LDA     SUMSQ
        DIV     #100
        STA     T1
        LDA     MEAN
        MUL     MEAN
        STA     T2
        LDA     T1
        SUB     T2
        STA     VARIANCE

(d)

Figure 5.19 Code generation for an assignment statement.
Obviously we need to keep track of the result left in register A by each seg-
ment of code that is generated. We do this by extending the token-specifier idea
to nonterminal nodes of the parse tree. In the example just discussed, the node
specifier S(<term>1) would be set to rA, indicating that the result of this compu-
tation is in register A. The variable REGA is used to indicate the highest-level
node of the parse tree whose value is left in register A by the code generated
so far (i.e., the node whose specifier is rA). Clearly, there can be only one such
node at any point in the code-generationprocess.If the value corresponding to
a node is not in register A, the specifier for the node is similar to a token speci-
fier: either a pointer to a symbol table entry for the variable that contains the
value, or an integer constant.
As an illustration of these ideas, consider the code-generation routine in
Fig. 5.19(b) that corresponds to the rule

<term>1 ::= <term>2 * <factor>
If the node specifier for either operand is rA, the corresponding value is al-
ready in register A, so the routine simply generates a MUL instruction. The
operand address for this MUL is given by the node specifier for the other
operand (the one not in the register). Otherwise, the procedure GETA is called.
This procedure, shown in Fig. 5.19(c),generatesa LDA instruction to load the
value associated with <term>2 into register A. Before the LDA, however, the
procedure generates a STA instruction to save the value currently in register A
(unless REGA is null, indicating that this value is no longer needed). The value
is stored in a temporary variable. Such variables are created by the code genera-
tor (with names T1, T2, ...) as they are needed for this purpose. The temporary
variables used during a compilation will be assigned storage locations at the
end of the object program. The node specifier for the node associated with the
value previously in register A, indicated by REGA, is reset to indicate the tem-
porary variable used.
ment has been completely generated, and any intermediate results are no
longer needed.
The remaining rules shown in Fig. 5.19(b) do not require the generation of
any machine instructions since no computation or data movement is involved.
The code-generation routines for these rules simply set the node specifier of
the higher-level node to reflect the location of the corresponding value.
Figure 5.19(d) shows a symbolic representation of the object code gener-
ated for the assignment statement being translated. You should carefully work
through the generation of this code to understand the operation of the routines
in Figs. 5.19(b) and 5.19(c). You should also confirm that this code will perform
the computations specified by the source program statement.
Figure 5.20 shows the other code-generation routines for the grammar in
Fig. 5.2. The routine for <prog-name> generates header information in the ob-
ject program that is similar to that created from the START and EXTREF as-
sembler directives. It also generates instructions to save the return address and
jump to the first executable instruction in the compiled program. When the
complete <prog> is recognized, storage locations are assigned to any tempo-
rary (Ti) variables that have been used. Any references to these variables are
then fixed in the object code using the same process performed for forward
references by a one-pass assembler. The compiler also generates any
Modification records required to describe external references to library subrou-
tines.
<prog-name> ::= id

    generate [START 0]
    generate [EXTREF XREAD,XWRITE]
    generate [STL RETADR]
    add 3 to LOCCTR
        {leave room for jump to first executable instruction}
     .
     .
    generate [RETADR RESW 1]
     .
     .
<dec> ::= <id-list> : <type>

    for each name NAME in the list do
        begin
            remove S(NAME) from list
            enter LOCCTR into symbol table as address for NAME
            generate [S(NAME) RESW 1]
        end
    LISTCOUNT := 0
Figure 5.20 Other code-generation routines for the grammar from Fig. 5.2.
    begin
        remove S(ITEM) from list
        GETA (<exp>1)
        push LOCCTR onto stack       {beginning address of loop}
        push S(id) onto stack        {index variable}
        generate [STA S(id)]
        generate [COMP S(<exp>2)]
        push LOCCTR onto stack       {address of jump out of loop}
        add 3 to LOCCTR              {leave room for jump instruction}
        REGA := null
{Figure 5.21: symbolic representation of the object code generated for the
program from Fig. 5.1, listed line by line.}

Each quadruple can be written as

operation    op1    op2    result

where operation is some function to be performed by the object code, op1 and
op2 are the operands for this operation, and result designates where the result-
ing value is to be placed.

For example, the source program statement

SUM := SUM + VALUE

could be represented with the quadruples

+     SUM    VALUE    i1
:=    i1              SUM

Here i1 denotes an intermediate result generated by the compiler.
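In a compiler written in C, a quadruple might be represented as a simple
record; the structure below is our own illustration, not one from the text.

#include <stdio.h>

struct quad {
    const char *op;       /* operation: "+", ":=", "JGT", "CALL", ... */
    const char *op1;      /* first operand, or NULL */
    const char *op2;      /* second operand, or NULL */
    const char *result;   /* result variable or intermediate i1, i2, ... */
};

int main(void) {
    /* the two quadruples for  SUM := SUM + VALUE  */
    struct quad q[] = {
        { "+",  "SUM", "VALUE", "i1"  },
        { ":=", "i1",  NULL,    "SUM" },
    };
    for (int i = 0; i < 2; i++)
        printf("%-3s %-6s %-6s %s\n", q[i].op,
               q[i].op1 ? q[i].op1 : "", q[i].op2 ? q[i].op2 : "",
               q[i].result);
    return 0;
}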
 (1)  :=    #0              SUM        {SUM := 0}
 (2)  :=    #0              SUMSQ      {SUMSQ := 0}
 (3)  :=    #1              I          {FOR I := 1 TO 100}
 (4)  JGT   I      #100     (15)
 (5)  CALL  XREAD                      {READ(VALUE)}
 (6)  PARAM VALUE
 (7)  +     SUM    VALUE    i1         {SUM := SUM + VALUE}
 (8)  :=    i1              SUM
 (9)  *     VALUE  VALUE    i2         {SUMSQ := SUMSQ +
(10)  +     SUMSQ  i2       i3             VALUE * VALUE}
(11)  :=    i3              SUMSQ
(12)  +     I      #1       i4         {end of FOR loop}
(13)  :=    i4              I
(14)  J                     (4)
(15)  DIV   SUM    #100     i5         {MEAN := SUM DIV 100}
(16)  :=    i5              MEAN
(17)  DIV   SUMSQ  #100     i6         {VARIANCE :=
(18)  *     MEAN   MEAN     i7             SUMSQ DIV 100
(19)  -     i6     i7       i8             - MEAN * MEAN}
(20)  :=    i8              VARIANCE
(21)  CALL  XWRITE                     {WRITE(MEAN,VARIANCE)}
(22)  PARAM MEAN
(23)  PARAM VARIANCE

Figure 5.22 Intermediate code for the program from Fig. 5.1.
These same registers can also often be used for addressing (as base or index
registers). We concentrate here, however, on the use of registers as instruction
operands.
Machine instructions that use registers as operands are usually faster than
the corresponding instructions that refer to locations in memory. Therefore, we
would prefer to keep in registers all variables and intermediate results that
will be used later in the program. Each time a value is fetched from memory,
or calculated as an intermediate result, it can be assigned to some register. The
value will be available for later use without requiring a memory reference.
This approach also avoids unnecessarymovement of values between memory
and registers, which takes time but does not advance the computation. We
used a very simple version of this technique in Section 5.1.4 when we kept
track of the value currently in register A.
Consider, for example, the quadruples shown in Fig. 5.22. The variable VALUE is used once in quadruple 7 and twice in quadruple 9. If enough registers are available, it would be possible to fetch this value only once. The value would be retained in a register for use by the code generated from quadruple 9. Likewise, quadruple 16 stores the value of i5 into the variable MEAN. If i5 is assigned to a register, this value could still be available when the value of MEAN is required in quadruple 18. Such register assignments can also be used to eliminate much of the need for temporary variables. Consider, for example, the machine code in Fig. 5.21, in which the use of only one register (register A) was sufficient to handle six of the eight intermediate results (i1 through i8) in Fig. 5.22.
Of course, there are rarely as many registers available as we would like to
use. The problem then becomes one of selecting which register value to re-
place when it is necessary to assign a register for some other purpose. One rea-
sonable approach is to scan the program for the next point at which each
register value would be used. The value that will not be needed for the longest
time is the one that should be replaced. If the register that is being reassigned
contains the value of some variable already stored in memory, the value can
simply be discarded. Otherwise, this value must be saved using a temporary
variable. This is one of the functions performed by the GETA procedure dis-
cussed in Section 5.1.4, using the temporary variables Ti.
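This "furthest next use" rule can be sketched in C as follows. The sketch is illustrative only: the register count, the function name, and the nextuse values (which a real compiler would obtain by scanning ahead in the quadruples) are assumptions, not part of any particular compiler.

    #include <limits.h>

    #define NUMREGS 4

    /* nextuse[r] holds the number of the next quadruple that will use
       the value currently in register r (INT_MAX if the value is never
       used again). */
    int choose_register(const int nextuse[NUMREGS])
    {
        int victim = 0;
        for (int r = 1; r < NUMREGS; r++)
            if (nextuse[r] > nextuse[victim])
                victim = r;     /* value needed furthest in the future */
        return victim;
    }

If the chosen register holds a value that is not already stored in memory, the code generator must first save that value into a temporary variable Ti, just as the GETA procedure does.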
In making and using register assignments, a compiler must also consider
the control flow of the program. For example, quadruple 1 in Fig. 5.22 assigns
the value 0 to SUM. This value might be retained in some register for later use.
When SUM is next used as an operand in quadruple 7, it might appear that its
value can be taken directly from the register; however, this is not necessarily
the case. The J operation in quadruple 14 jumps to quadruple 4. If control
passes to quadruple 7 in this way, the value of SUM may not be in the desig-
nated register. This would happen, for example, if the register were reassigned
Figure 5.23 Basic blocks and flow graph for the quadruples from
Fig. 5.22.
        LDA   SUMSQ
        DIV   #100
        STA   T1
        LDA   MEAN
        MUL   MEAN
        STA   T2
        LDA   T1
        SUB   T2
        STA   VARIANCE

(a)

        *     MEAN   MEAN   i7
        DIV   SUMSQ  #100   i6
        -     i6     i7     i8
        :=    i8            VARIANCE

        LDA   MEAN
        MUL   MEAN
        STA   T1
        LDA   SUMSQ
        DIV   #100
        SUB   T1
        STA   VARIANCE

(b)
Note that the value of the intermediate result i6 is calculated first and stored in temporary variable T1. Then the value of i7 is calculated. The third quadruple in this series calls for subtracting the value of i7 from i6. Since i7 has just been computed, its value is available in register A; however, this does no good, since the first operand for a - operation must be in the register. It is necessary to store the value of i7 in another temporary variable, T2, and then load the value of i6 from T1 into register A before performing the subtraction.
With a little analysis, an optimizing compiler could recognize this situation and rearrange the quadruples so the second operand of the subtraction is computed first. This rearrangement is illustrated in Fig. 5.24(b). The first two quadruples in the sequence have been interchanged. The resulting machine code requires two fewer instructions and uses only one temporary variable instead of two. The same technique can be applied to rearrange quadruples that
stead of two. The same technique can be applied to rearrange quadruples that
calculate the operands of a DIV operation or any other operation for which the
machine code requires a particular placement of operands.
Other possibilities for machine-dependent code optimization involve tak-
ing advantage of specific characteristics and instructions of the target machine.
For example, there may be special loop-control instructions or addressing
modes that can be used to create more efficient object code. On some comput-
ers there are high-level machine instructions that can perform complicated
functions such as calling procedures and manipulating data structures in a sin-
gle operation. Obviously the use of such features, when possible, can greatly
improve the efficiency of the object program.
Some machines have a CPU that is made up of several functional units. On
such computers, the order in which machine instructions appear can affect the
speed of execution. Consecutive instructions that involve different functional
units can sometimes be executed at the same time. An optimizing compiler for
such a machine could rearrange object code instructions to take advantage of
this property. For examples and references, see Aho et al. (1988).
5.3 MACHINE-INDEPENDENT COMPILER FEATURES
In this section we briefly describe some common compiler features that are
largely independent of the particular machine being used. As in the preceding
section, we do not attempt to give full details of the implementation of these
features. Such details may be found in the references cited.
Section 5.3.1 describes methods for handling structured variables such as
arrays. Section 5.3.2 continues the discussion of code optimization begun in
Section 5.2.2. This time, we are concerned with machine-independent tech-
niques for optimizing the object code.
In the compiler design described in Section 5.1, we dealt only with simple variables that were permanently assigned to storage locations within the object program. Section 5.3.3 describes some alternative ways of performing storage allocation for the compiled program. Section 5.3.4 discusses the problems involved in compiling a block-structured language and indicates some possible solutions for these problems.
5.3.1 Structured Variables

In this section we briefly consider the compilation of programs that use structured variables such as arrays, records, strings, and sets. We are primarily concerned with the allocation of storage for such variables and with the
generation of code to reference them. These issues are discussed in a moderate
amount of detail for arrays. The same principles can also be applied to the
other types of structured variables. Further details concerning these topics can
be found in a number of textbooks on compilers, such as Aho et al. (1988).
Consider first the Pascal array declaration
A : ARRAY[1..10] OF INTEGER
If each INTEGER variable occupies one word of memory, then we must clearly
allocate ten words to store this array. More generally, if an array is declared as
ARRAY[l..u] OF INTEGER

then u - l + 1 words are needed. Now consider the two-dimensional array declaration

B : ARRAY[0..3,1..6] OF INTEGER

Here the first subscript can take on four different values (0-3), and the second subscript can take on six different values. We need to allocate a total of 4 * 6 = 24 words to store the array. In general, if the array declaration is

B : ARRAY[l1..u1,l2..u2] OF INTEGER

then we must allocate

(u1 - l1 + 1) * (u2 - l2 + 1)

words for the array.
{Figure 5.25 (residue): storage of array B (a) by rows, Row 0 through Row 3, and (b) by columns, Column 1 through Column 6.}

Consider again the declaration
A : ARRAY[1..10] OF INTEGER
and suppose that a statement refers to array element A[6]. There are five array
elements preceding A[6]; on a SIC machine, each such element would occupy 3 bytes. Thus the address of A[6] relative to the starting address of the array is given by 5 * 3 = 15.
If an array reference involves only constant subscripts, the relative address
calculation can be performed during compilation. If the subscripts involve
variables, however, the compiler must generate object code to perform this cal-
culation during execution. Suppose the array declaration is
A : ARRAY[/..u] OF INTEGER
and each array element occupies w bytes of storage. If the value of the sub-
script is s, then the relative address of the referenced array element A[s] is
given by
w * (s - l)
Suppose now that the elements of the two-dimensional array B are stored in row-major order. Consider first the array element B[2,5]. If we start at the
beginning of the array, we must skip over two complete rows (row 0 and row
1) before arriving at the beginning of row 2 (i.e., element B[2,1]). Each such row contains six elements, so this involves 2 * 6 = 12 array elements. We must also skip over the first four elements in row 2 to arrive at B[2,5]. This makes a
total of 16 array elements between the beginning of the array and element
B[2,5]. If each element occupies 3 bytes, then B[2,5] is located at relative ad-
dress 48 within the array.
More generally, suppose the array declaration is

B : ARRAY[l1..u1,l2..u2] OF INTEGER

and each array element occupies w bytes. Then the relative address of element B[s1,s2], with the array stored in row-major order, is given by

w * [ (s1 - l1) * (u2 - l2 + 1) + (s2 - l2) ]

Figure 5.26 shows the generation of code to perform such array references.

A : ARRAY[1..10] OF INTEGER

A[I] := 5

(1)  -   I    #1   i1
(2)  *   i1   #3   i2
(3)  :=  #5        A[i2]

(a)

B : ARRAY[0..3,1..6] OF INTEGER

B[I,J] := 5

(1)  *   I    #6   i1
(2)  -   J    #1   i2
(3)  +   i1   i2   i3
(4)  *   i3   #3   i4
(5)  :=  #5        B[i4]

(b)

Figure 5.26 Code for array references.
This specifies that MATRIX is an array of integers that can be allocated dy-
namically. The allocation can be accomplished by a statement like
ALLOCATE(MATRIX(ROWS,COLUMNS))
where the variables ROWS and COLUMNS have previously been assigned
values.
Since the values of ROWS and COLUMNS are not known at compilation
time, the compiler cannot directly generate code like that in Fig. 5.26. Instead,
the compiler creates a descriptor (often called a dope vector) for the array. This
descriptor includes space for storing the lower and upper bounds for each ar-
ray subscript. When storage is allocated for the array, the values of these
bounds are computed and stored in the descriptor. The generated code for an
array reference uses the values from the descriptor to calculate relative ad-
dresses as required. The descriptor may also include the number of dimen-
sions for the array, the type of the array elements, and a pointer to the
beginning of the array. This information can be useful if the allocated array is
passed as a parameter to another procedure.
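One possible C rendering of such a descriptor appears below. The field names and the two-dimensional limit are assumptions made for this sketch; an actual compiler would choose its own layout.

    /* A dope vector for a two-dimensional array, filled in when the
       array is allocated. */
    struct dope_vector {
        void *base;          /* pointer to the first array element */
        int   elem_size;     /* size of each element, in bytes     */
        int   lower[2];      /* lower bound of each subscript      */
        int   upper[2];      /* upper bound of each subscript      */
    };

    /* Code generated for a reference MATRIX[i,j] would compute the
       element address from the descriptor at run time (row-major): */
    char *elem_addr(const struct dope_vector *d, int i, int j)
    {
        int row_len = d->upper[1] - d->lower[1] + 1;
        int offset  = (i - d->lower[0]) * row_len + (j - d->lower[1]);
        return (char *)d->base + offset * d->elem_size;
    }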
The issues discussed for arrays also arise in the compilation of other struc-
tured variables such as records, strings, and sets. The compiler must provide
for the allocation of storage for the variable; it must store information concern-
ing the structure of the variable, and use this information to generate code to
access components of the structure; and it must construct a descriptor for situ-
ations in which the required information is not known at compilation time.
For further discussion of these issues as they relate to specific types of struc-
tured variables, see Aho et al. (1988) and Fischer and LeBlanc (1988).
After this substitution is performed, quadruples 6 and 13 are the same, except for the name of the result. Thus we can remove quadruple 13 and substitute the result of quadruple 6 for its result wherever it is used. Similarly, quadruples 10 and 11 can be removed because they are equivalent to quadruples 3 and 4.

The result of applying this technique is shown in Fig. 5.27(c). The quadruples have been renumbered in this figure. However, the intermediate result names have been left unchanged, except for the substitutions just described, to make the comparison with Fig. 5.27(b) easier. Note that the total number of quadruples has been reduced from 19 to 15. Each of the quadruple operations used here will probably take approximately the same length of time to execute on a typical machine, so there should be a corresponding reduction in the overall execution time of the program.
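A compiler can detect common subexpressions by comparing quadruples. The following C fragment sketches the comparison under the simplifying assumption, noted in the comment, that operand values have not changed between the two quadruples; the structure layout is invented for the sketch.

    #include <string.h>

    struct quad { char op[8], op1[8], op2[8], result[8]; };

    /* Two quadruples compute the same value if they apply the same
       operation to the same operands. A real optimizer must also
       verify that neither operand is reassigned between them. */
    int same_subexpr(const struct quad *a, const struct quad *b)
    {
        return strcmp(a->op,  b->op)  == 0
            && strcmp(a->op1, b->op1) == 0
            && strcmp(a->op2, b->op2) == 0;
    }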
Another common source of code optimization is the removal of loop invariants. These are subexpressions within a loop whose values do not change from one iteration of the loop to the next. Thus their values can be computed once, before the loop is entered, rather than being recalculated for each iteration. Because most programs spend most of their running time in the execution of loops, the time savings from this sort of optimization can be highly significant. We assume the existence of algorithms that can detect loops by analyzing the control flow of the program. One example of such an algorithm is the method for constructing a program flow graph that is described in Section 5.2.2.
An example of a loop-invariant computation is the term 2*J in Fig. 5.27(a) [see quadruple 5 of Fig. 5.27(c)]. The result of this computation depends only on the operand J, which does not change in value during the execution of the loop. Thus we can move quadruple 5 in Fig. 5.27(c) to a point immediately before the loop is entered. A similar argument can be applied to quadruples 6 and 7.
Figure 5.27(d) shows the sequence of quadruples that results from these modifications. The total number of quadruples remains the same as in Fig. 5.27(c); however, the number of quadruples within the body of the loop has been reduced from 14 to 11. Each execution of the FOR statement in Fig. 5.27(a) causes 10 iterations of the loop, which means the total number of quadruple operations required for one execution of the FOR is reduced from 141 to 114.
The programmer could achieve a similar effect by rewriting the source statements of Fig. 5.27(a) as

    T1 := 2 * J;
    T2 := T1 - 1;
    FOR I := 1 TO 10 DO
        X[I,T2] := Y[I,T1]
However, this would achieve only a part of the benefits realized by the opti-
mization process just described. The rest of the optimizations are related to the
process of calculating a relative address from subscript values; these details
are inaccessible to the source programmer. For example, the optimizations in-
volving quadruples 3, 4, 10, and 11 in Fig. 5.27(b) could not be achieved with
any rewriting of the source statement. It could also be argued that the original
statements in Fig. 5.27(a) are preferable because they are clearer than the mod-
ified version involving T1 and T2. An optimizing compiler should allow the
programmer to write source code that is clear and easy to read, and it should
compile such a program into machine code that is efficient to execute.
X, Y : ARRAY[1..10,1..10] OF INTEGER

FOR I := 1 TO 10 DO
    X[I,2*J-1] := Y[I,2*J]

(a)

{panels (b) and (c) not reproduced}

Figure 5.27 Code optimization by (c) elimination of common subexpressions and (d) removal of loop invariants.
 (1)   *    #2    J     i1     {computation of invariants}
 (2)   -    i1    #1    i2
 (3)   -    i2    #1    i3
 (4)   :=   #1          I      {loop initialization}
 (5)   JGT  I     #10   (16)
 (6)   -    I     #1    i4     {subscript calculation for X}
 (7)   *    i4    #10   i5
 (8)   +    i5    i3    i6
 (9)   *    i6    #3    i7
(10)   +    i5    i2    i8     {subscript calculation for Y}
(11)   *    i8    #3    i9
(12)   :=   Y[i9]       X[i7]  {assignment operation}
(13)   +    I     #1    i10    {end of loop}
(14)   :=   i10         I
(15)   J                (5)
(16)                           {next statement}

(d)
        DO 10 I = 1,20
   10   TABLE(I) = 2**I

(a)

 (1)   :=   #1          I          {loop initialization}
 (2)   EXP  #2    I     i1         {calculation of 2**I}
 (3)   -    I     #1    i2         {subscript calculation}
 (4)   *    i2    #3    i3
 (5)   :=   i1          TABLE[i3]  {assignment operation}
 (6)   +    I     #1    i4         {end of loop}
 (7)   :=   i4          I
 (8)   JLE  I     #20   (2)

(b)

 (1)   :=   #1          i1         {initialize temporaries}
 (2)   :=   #(-3)       i2
 (3)   :=   #1          I          {loop initialization}
 (4)   *    i1    #2    i1         {calculation of 2**I}
 (5)   +    i2    #3    i2         {subscript calculation}
 (6)   :=   i1          TABLE[i2]  {assignment operation}
 (7)   +    I     #1    i3         {end of loop}
 (8)   :=   i3          I
 (9)   JLE  I     #20   (4)

(c)

Figure 5.28 Code optimization by reduction in strength.
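The transformation from Fig. 5.28(b) to Fig. 5.28(c) replaces an expensive operation with a cheaper one carried forward from iteration to iteration: the exponentiation becomes a doubling, and the subscript multiplication becomes an addition. A C sketch of the same idea (illustrative only, not from the original text):

    void fill_table(int table[21])
    {
        int power = 1;                /* will hold 2**I             */
        for (int i = 1; i <= 20; i++) {
            power = power * 2;        /* doubling replaces the EXP  */
            table[i] = power;         /* operation of quadruple (2) */
        }
    }

In the quadruples of Fig. 5.28(c), the subscript temporary plays the same incremental role, starting at -3 and increasing by 3 on each iteration.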
5.3.3 Storage Allocation

{Figure content not fully reproduced: panels (a)-(c) show MAIN calling SUB, and SUB calling itself recursively, with a single statically allocated RETADR location for SUB.}

Figure 5.29 Recursive invocation of a procedure using static storage allocation.
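The difficulty illustrated in Fig. 5.29 can be reproduced in C, where a static variable plays the role of the statically allocated RETADR word. The example is illustrative, not from the original text.

    #include <stdio.h>

    static int saved;            /* one shared location, like RETADR */

    void sub(int n)
    {
        saved = n;               /* overwritten by the recursive call */
        if (n > 0)
            sub(n - 1);
        printf("n = %d, saved = %d\n", n, saved);  /* saved is now 0 */
    }

Each recursive call overwrites the single copy of saved, so after the recursion unwinds, every invocation sees the value stored by the deepest call. Automatic (stack) allocation, described next, avoids this problem by giving each invocation its own activation record.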
In Fig. 5.30(b), MAIN has called the procedure SUB. A new activation record has been created on the top of the stack, with register B set to indicate this new current record. The pointers PREV and NEXT in the two records have been set as shown. In Fig. 5.30(c), SUB has called itself recursively; another activation record has been created for this current invocation of SUB. Note that the return addresses and variable values for the two invocations of SUB are kept separate in the two activation records on the stack.

{Figure 5.30 (not fully reproduced): panels (a)-(d) show the stack of activation records for MAIN and SUB. Each record contains the variables for the procedure, RETADR, and the PREV and NEXT pointers, with register B indicating the current record.}
ALLOCATE(MATRIX(ROWS,COLUMNS))

allocates storage for a dynamic array MATRIX with the specified dimensions. The statement

DEALLOCATE(MATRIX)

releases the storage that was assigned to the array. Similarly, the Pascal statement

NEW(P)

allocates storage for a variable and sets the pointer P to indicate the variable just created. The type of the variable created is specified by the way P is declared in the program. The program refers to the created variable by using the pointer P. The statement

DISPOSE(P)

releases the storage occupied by the variable indicated by P. In the C language, the function
MALLOC(SIZE)
allocates a block of storage of the specified size and returns a pointer to it. The function

FREE(P)

frees the storage indicated by the pointer P, which was returned by a previous MALLOC.
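A short C example of the heap-management functions just described (the use of the block as a matrix is arbitrary, chosen to match the earlier MATRIX example):

    #include <stdlib.h>

    int *make_matrix(int rows, int columns)
    {
        /* malloc returns a pointer to an uninitialized block, or
           NULL if no storage is available. */
        return malloc((size_t)rows * (size_t)columns * sizeof(int));
    }

    /* Element [i][j] of the matrix is addressed as m[i*columns + j];
       when the matrix is no longer needed, the block is returned to
       the system with free(m). */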
5.3.4 Block-Structured Languages

In some languages a program can be divided into units called blocks. A block is a portion of a program that has the ability to declare its own identifiers. This definition of a block is also met by units such as procedures and functions in Pascal. In this section we consider some of the issues involved in compiling and executing programs written in such block-structured languages.
 1  PROCEDURE A;
      .
 3    PROCEDURE B;
        .
 5      PROCEDURE C;
 6        VAR V,W : INTEGER;
          .
 7        END {C};
 8      END {B};
 9    PROCEDURE D;
        .
12    END {A};

(a)

Block name    Block number    Level    Surrounding block
A             1               1        --
B             2               2        1
C             3               3        2
D             4               2        1

(b)

Figure 5.31 Nesting of blocks in a source program.
one definition of the identifier. The chain of definitions for that identifier is then searched for the appropriate entry. There are other symbol-table organizations that store the definitions of identifiers according to the nesting of the blocks that define them. This kind of structure can make the search for the desired entry more efficient.

The generated object code commonly uses a data structure called a display, which contains pointers to the most recent activation records for the current block and for all blocks that surround the current one in the source program. When a block refers to a variable that is declared in some surrounding block, the generated object code uses the display to find the activation record that contains this variable.
The use of a display is illustrated in Fig. 5.32. We assume that procedure A has been invoked by the system, A has then called procedure B, and B has called procedure C. The resulting situation is shown in Fig. 5.32(a). The stack contains activation records for the invocations of A, B, and C. The display contains pointers to the activation records for C and for the surrounding blocks (A and B).
Now let us assume procedure C calls itself recursively. Another activation record for C is created on the stack as a result of this call. Any reference to a variable declared by C should use this most recent activation record; the display pointer for C is changed accordingly. Variables that correspond to the previous invocation of C are not accessible for the moment, so there is no display pointer to this activation record. This situation is illustrated in Fig. 5.32(b).
Suppose now that procedure C calls D. (This is allowed because the identifier D is defined in procedure A, which contains C. For simplicity, we have assumed that no special "forward call" declarations are required.) The resulting stack and display are shown in Fig. 5.32(c). An activation record for D has been created in the usual way and added to the stack. Note, however, that the display now contains only two pointers: one each to the activation records for D and A. This is because procedure D cannot refer to variables in B or C, except through parameters that are passed to it, even though it was called from C. According to the rules for the scope of names in a block-structured language, procedure D can refer only to variables that are declared by D or by some block that contains D in the source program (in this case, procedure A).
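The run-time structures just described can be sketched in C. The sizes and field names below are assumptions made for the illustration, not the layout of any particular compiler.

    #define MAXLEVEL 8

    struct actrec {                /* simplified activation record */
        struct actrec *prev;       /* link to previous record      */
        int vars[16];              /* variables, located by offset */
    };

    /* display[k] points to the most recent accessible activation
       record for nesting level k. */
    struct actrec *display[MAXLEVEL];

    /* Object code for a reference to the variable declared at a given
       (level, offset) pair simply indexes through the display: */
    int fetch_var(int level, int offset)
    {
        return display[level]->vars[offset];
    }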
{Figure 5.32 (not fully reproduced): panels (a)-(d) show the stack of activation records and the corresponding display for each of the calling sequences described in the text.}

Figure 5.32 Use of display for procedures in Fig. 5.31.
5.4 COMPILER DESIGN OPTIONS
In this section we consider some of the possible alternatives for the design and
construction of a compiler. The discussions in this section are necessarily very
brief. Our purpose is to introduce terms and concepts rather than to give a
comprehensive discussion of any of these topics.
The compilation scheme presented in Section 5.1 was a simple one-pass
design. Sections 5.2 and 5.3 described many features that usually require more
than one pass to implement. In Section 5.4.1 we briefly discuss the general
question of dividing a compiler into passes, and consider the advantages of one-pass and multi-pass designs.
Section 5.4.2 discusses interpreters, which execute an intermediate form of
the program instead of translating it into machine code. Section 5.4.3 intro-
duces the related topic of P-code systems, which compile high-level language
programs into object code for a hypothetical machine.
Finally, Section 5.4.4 describes compiler-writing systems, which use soft-
ware tools to automate much of the process of compiler construction.
5.4.1 Division into Passes

One important consideration is where the declarations of variables may appear relative to the statements that use these variables. In FORTRAN, declarations may appear at the beginning of the program; any variable that is not declared is assigned characteris-
tics by default. However, in some languages the declaration of an identifier
may appear after it has been used in the program. One-pass compilers must have the ability to fix up forward references in jump instructions, using tech-
niques like those discussed for one-pass assemblers. Forward references to
data items, however, present a much more serious problem.
Consider, for example, the assignment statement

X := Y * Z
If all of the variables X, Y, and Z are of type INTEGER, the object code for this
statement might consist of a simple integer multiplication followed by storage
of the result. If the variables are a mixture of REAL and INTEGER types, one
or more conversion operations will need to be included in the object code, and
floating-point arithmetic instructions may be used. Obviously the compiler
cannot decide what machine instructions to generate for this statement unless
information about the operands is available. The statement may even be illegal
for certain combinations of operand types. Thus a language that allows for-
ward references to data items cannot be compiled in one pass.
Some programming languages, because of other characteristics, require
more than two passes to compile. For example, Hunter (1981) showed that
ALGOL 68 required at least three passes.
There are a number of factors that should be considered in deciding be-
tween one-pass and multi-pass compiler designs (assuming that the language in question can be compiled in one pass). If speed of compilation is important,
a one-pass design might be preferred. For example, computers running stu-
dent jobs tend to spend a large amount of time performing compilations. The
resulting object code is usually executed only once or twice for each compila-
tion; these test runs are normally very short. In such an environment, im-
provements in the speed of compilation can lead to significant benefits in
system performance and job turnaround time.
If programs are executed many times for each compilation, or if they
process large amounts of data, then speed of execution becomes more important than speed of compilation. In such a case, we might prefer a multi-pass
compiler design that could incorporate sophisticated code-optimization
5.4 CompilerDesignOptions 301
techniques. Multi-pass compilers are also used when the amount of memory,
or other system resources, is severely limited. The requirements of each pass can be kept smaller if the work of compilation is divided into several passes.
Other factors may also influence the design of the compiler. If a compiler is
divided into several passes, each pass becomes simpler and, therefore, easier
to understand, write, and test. Different passes can be assigned to different
programmers and can be written and tested in parallel, which shortens the
overall time required for compiler construction.
For further discussion of the problem of dividing a compiler into passes,
see Hunter (1981) and Aho et al. (1988).
5.4.2 Interpreters
other computer. In this way, a P-code compiler can be used without modifica-
tion on a wide variety of systems if a P-code interpreter is written for each dif-
ferent machine. Although writing such an interpreter is not a trivial task, it is
certainly easier than writing a new compiler for each different machine. The
same approach can also be used to transport other types of system software
without rewriting.
The design of a P-machine and the associated P-code is often related to the
requirements of the language being compiled. For example, the P-code for
a Pascal compiler might include single P-instructions that perform array-
subscript calculations, handle the details of procedure entry and exit, and per-
form elementary operations on sets. This simplifies the code-generation
process, leading to a smaller and more efficient compiler. In addition, the P-
code object program is often much smaller than a corresponding machine-
code program would be. This is particularly useful on machines with severely
limited memory size.
{Figure 5.33 (residue): a source program is translated by the P-code compiler into a P-code object program, which is executed by a P-code interpreter.}
approach sacrifices some of the portability associated with the use of P-code
compilers.
Section 5.5.4 of this text describes a recently developed P-code compiler for the Java language.
5.4.4 Compiler-Compilers
The process of writing a compiler usually involves a great deal of time and ef-
fort. In some areas, particularly the construction of scanners and parsers, it is possible to perform much of this work automatically. A compiler-compiler is a software tool that can be used to help in the task of compiler construction. Such tools are also often called compiler generators or translator-writing systems.
The process of using a typical compiler-compiler is illustrated in Fig. 5.34.
The user (i.e., the compiler writer) provides a description of the language to be
translated. This description may consist of a set of lexical rules for defining to-
kens and a grammar for the source language. Some compiler-compilers use
this information to generate a scanner and a parser directly. Others create ta-
bles for use by standard table-driven scanning and parsing routines that are
supplied by the compiler-compiler.
In addition to the description of the source language, the user provides a
set of semantic or code-generation routines. Often there is one such routine for
each rule of the grammar, as we discussed in Section 5.1. This routine is called
by the parser each time it recognizes the language construct described by the
associatedrule. However, some compiler-compilers can parse a larger section
of the program before calling a semantic routine. In that case, an internal form
of the statements that have been analyzed, such as a portion of the parse tree,
may be passed to the semantic routine. This latter approach is often used
when code optimization is to be performed. Compiler-compilers frequently
provide special languages, notations, data structures, and other similar facili-
ties that can be used in the writing of semantic routines.
{Figure 5.34 (residue): lexical rules, a grammar, and semantic routines are supplied to the compiler-compiler, which produces a compiler consisting of a scanner, a parser, and a code generator.}
5.5 IMPLEMENTATION EXAMPLES

In this section we briefly discuss the design of several real compilers. Section
5.5.1 describes the SunOS C compiler, which runs on a variety of hardware
platforms. Section 5.5.2 discusses the GNU NYU Ada Translator (GNAT), a
freely distributed Ada 95 compiler. Section 5.5.3 describes the Cray MPP
FORTRAN compiler. This compiler is designed to produce object code that
runs efficiently on a massively parallel processing machine like the T3E.
Section 5.5.4 discusses the Java programming language and run-time envi-
ronment recently developed by Sun Microsystems. The Java compiler is a
P-codecompiler of the type we discussed in Section 5.4.3.
Section 5.5.5 presents a description of the YACC compiler-compiler, origi-
nally developed at Bell Laboratories for use with the UNIX operating system.
We also briefly describe LEX, a scanner generator that is commonly used with
YACC.
5.5.1 SunOS C Compiler

The compiler provides options that allow the user to specify which sets of language features are to be accepted by the compiler and which are to generate warning messages during compilation.
The translation process begins with the execution of the C preprocessor,
which performs such functions as file inclusion and macro processing. (See
Section 4.4.3 for a brief discussion of some of these features.) The output from
the preprocessor goes to the C compiler itself. Several different levels of code
optimization can be specified by the user. The compiler generates assembler language, which is then translated by an assembler. The preprocessor and
compiler also accept source files that contain assembler language subpro-
grams, and pass these on to the assembly phase.
The preprocessing phase consists of the following conceptual steps. The
logical ordering of these steps is specified by the ANSI standard to eliminate
possible ambiguities. The implementation of a particular preprocessor may in
fact combine several of these steps. However, the effect must be the same as if
they were executed separately in the sequence given.
(step 3 above). Thus the compiler itself begins with syntactic analysis, fol-
lowed by semantic analysis and code generation.
Like most implementations of UNIX, the SunOS operating system itself is
largely written in C. Many compilers, editors, and other pieces of UNIX sys-
tem software are also written in C. Because of this, it is important that a C
compiler for such a system be able to generate efficient object code. It is also
desirable that the compiler include tools to assist programmers in analyzing
the performance of their programs. We will focus on these aspects of the
SunOS C compiler.
Four different levels of code optimization can be specified by the user
when a program is compiled. These levels are designated by O1 through O4.
The O1 level does only a minimal amount of local (peephole) optimization.
This type of optimization is performed at the assembler-language level, after
the compilation itself is complete.
The O2 level provides basic local and global optimization. This includes
register allocation and merging of basic blocks (see Section 5.2.2) as well as
elimination of common subexpressions and removal of loop invariants (see
Section 5.3.2). It also includes a number of other optimizations such as alge-
braic simplification and tail recursion elimination. In general, the O2 level of
optimization results in the minimum object code size. This is the default that is
provided unless otherwise requested by the user.
The O3 and O4 levels include optimizations that can improve execution
speed, but usually produce a larger object program. For example, O3 opti-
mization performs loop unrolling, which partially converts loops into straight-
line code. The O4 level automatically converts calls to user-written functions
into in-line code. This eliminates the overhead of calling and returning from
the functions. Optimizations such as these can be guided by the user via com-
pile-time options. For example, the user can specify the names of functions
that should (or should not) be converted to in-line code.
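Loop unrolling can be illustrated in C. The transformation below is the kind of rewriting the optimizer performs internally, shown here by hand on an invented loop.

    /* Original loop: one addition plus one increment, compare, and
       branch per element. */
    int sum(const int a[1000])
    {
        int s = 0;
        for (int i = 0; i < 1000; i++)
            s += a[i];
        return s;
    }

    /* Unrolled by a factor of four: the loop-control overhead is
       paid once per four elements. */
    int sum_unrolled(const int a[1000])
    {
        int s = 0;
        for (int i = 0; i < 1000; i += 4)
            s += a[i] + a[i+1] + a[i+2] + a[i+3];
        return s;
    }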
When requested, the SunOS C compiler can insert special code into the ob-
ject program to gather information about its execution. For example, one op-
tion accumulates a count of how many times each basic block is executed.
Another option invokes a run-time recording mechanism. The resulting data
can be analyzed by other SunOS software tools to produce profiles of program
execution. For example, one such profile shows the percentage of execution
time spent in different parts of the program.
Other program analysis tools provide support for reordering object code at
the function level. To use these tools, the user instructs the C compiler to place
each function in a separate section. After the execution profile is analyzed, the
object program is relinked to rearrange the functions. This can produce an exe-
cutable program with improved locality of reference that runs more efficiently
*Adapted from “The GNAT Project: A GNU-Ada 9X Compiler” by Edmond Schonberg and
Bernard Banner, from Tri-Ada 94 Proceedings. pp. 48-57. © 1994 Association for Computing
Machinery, Inc.
{Figure (residue): the GNAT front end performs syntax analysis, semantic analysis, and expansion, operating on an AST for the Ada program.}
5.5.3 Cray MPP FORTRAN Compiler

      DIMENSION A(256)
CDIR$ SHARED A(:BLOCK)
The compiler directive SHARED specifies that the elements of the array are to
be divided among the processing elements that are assigned to execute the
program. If 16 PEs are being used, the array elements would be distributed as
shown in Fig. 5.36.
In this case, a contiguous block of 16 elements is assigned to each PE.
Several other methods of distribution can be specified by using options in the
SHARED directive. Each dimension of a multi-dimensional array can use its
own distribution scheme.
{Figure 5.36 (residue): distribution of the 256 elements of array A among 16 PEs, in contiguous blocks of 16 elements per PE.}
In this example, the compiler directive DOSHARED specifies that the itera-
tions of the loop are to be divided among the available PEs. Each PE executes
the loop on the array elements that are assigned to it. If the array is distributed
as shown in Fig. 5.36, PE0 processes elements 1 through 16, PE1 processes ele-
ments 17 through 32, and so on. All the PEs perform their computations at the
same time. When a PE completes its assigned iterations, it automatically waits
at the end of the loop until all of the PEs have finished.
Techniques such as shared loops provide a high-level method of using the
MPP system with minimal extra effort from the programmer. The compiler
also implements lower-level features that can be used if more detailed control
of the processing is needed. For example, the special constant N$PES is equal
to the number of PEs that are available to the program. The intrinsic function
MY_PE returns the number of the PE that executes the function call. This num-
ber is relative to the subset of PEs being used by the program—thus, it is al-
ways a value between 0 and N$PES-1.
When all PEs assigned to a program are working at the same time, the pro-
gram is said to be executing in a parallel region. MPP FORTRAN programs al-
ways begin execution in a parallel region. Occasionally it may be desirable to
enter a serial region in which only one PE is executing. This can be accom-
plished with the compiler directives

CDIR$ MASTER
      ...
CDIR$ END MASTER
The statements between MASTER and END MASTER are executed only on
PE0 (the first PE assigned to the program). All other PEs remain idle until the
serial region is completed.
The MPP FORTRAN compiler provides a number of tools that can be used
to synchronize the parallel execution of programs. A barrier is like a roadblock
within an executing program. When a PE encounters a barrier, it stops execu-
tion and waits until all other PEs have also reached the barrier. Barriers are au-
tomatically inserted at the end of shared loops. They can also be specified by
the programmer using the compiler directive BARRIER. Barriers are imple-
mented via hardware signals using the network that connects the PEs of the
T3E system.
Another type of synchronization allows PEs to signal each other. For example, suppose that the PEs assigned to a program are searching a database for a
particular item. When one PE finds the item, it can signal the others by calling
the intrinsic function SET_EVENT. This function sets the value of an event vari-
able. During the search, each PE periodically calls a function TEST_EVENT to
check the value of this event variable. Thus all PEs can stop searching as soon
as one has found the desired item.
Event variables as described above are implemented by storing data val-
ues, using shared memory locations. A more efficient, but less flexible, mecha-
nism uses the same interconnection network that implements barriers. This
type of event mechanism is known as eureka mode.
MPP FORTRAN also provides several other high-level mechanisms that
can be used to synchronize program execution. In addition, there are low-level
functions that allow a PE to explicitly access any word of memory from any
other PE. These methods allow greater control over the MPP execution of the
program. However, they also demand a much higher level of effort from the
programmer.
Further information about the Cray MPP FORTRAN compiler can be
found in Cray Research (1995a, 1995c).
5.5.4 Java Language and Run-Time Environment

Java is an object-oriented language. (If you are unfamiliar with the princi-
ples of object-oriented programming, you may want to review the discussion
in Section 8.4.1.) The object-orientation in Java is stronger than in many other
languages, such as C++. Except for a few primitive data types, everything in
Java is an object. Arrays and strings are treated as objects. Even the primitive
data types can be encapsulated inside objects if necessary. There are no proce-
dures or functions in Java; classes and methods are used instead. Thus pro-
grammers are constrained to use a “pure” object-oriented style, rather than
mixing the procedural and object-oriented approaches.
Java provides built-in support for multiple threads of execution. This fea-
ture allows different parts of an application’s code to be executed concurrently.
For example, an interactive application might use one thread to run an anima-
tion, another to control sound effects, and a third to scroll a window. In Java,
threads are implemented as objects. The Java library provides methods that
can be invoked to start or stop a thread, check on the status of a thread, and
synchronize the operation of multiple threads.
The Java compiler follows the P-code approach we discussed in Section
5.4.3. It does not generate machine code or assembly language for a particular target machine. Instead, the compiler generates bytecodes—a high-level, machine-independent code for a hypothetical machine (the Java Virtual
Machine). This hypothetical machine is implemented on each target computer
by an interpreter and run-time system. Thus a Java application can be run,
without modification and without recompiling, on any computer for which a
Java interpreter exists. The Java compiler itself is written in Java. Therefore the compiler can be executed on any machine that provides a Java run-time environment.
The Java Virtual Machine supports a standard set of primitive data types:
1-, 2-, 4-, and 8-byte integers, single- and double-precision floating-point num-
bers, and 16-bit character codes. These data representations are independent of
the architecture of the target machine. The interpreter is responsible for emu-
lating these data types using the underlying hardware. Thus, for example, the
floating-point formats and the big- or little-endian storage of integers on the
target machine have no effect on an application program.
A bytecode instruction on the Java Virtual Machine consists of a 1-byte op-
code followed by zero or more operands. Many opcodes require no explicit
operands in the instruction; instead, they take their operand values from a
stack. A stack organization was chosen so that it would be easy to emulate the
machine on a computer with few general-purpose registers (such as the x86
architecture).
For example, the “iadd” instruction adds two integers together. It expects
that the integers to be added are the top two words on the operand stack,
pushed there by previous instructions. Both integers are popped from the
stack, and their sum is pushed back onto the stack. Each primitive data type
has specialized instructions that must be used to operate on items of that type.
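The behavior of an instruction like iadd is easy to see in a toy interpreter loop. The following C sketch uses invented opcode values and a simplified instruction format, not actual Java Virtual Machine opcodes.

    enum { OP_PUSH, OP_ADD, OP_HALT };

    int run(const int *code)
    {
        int stack[64], sp = 0;              /* operand stack */
        for (;;) {
            switch (*code++) {
            case OP_PUSH:
                stack[sp++] = *code++;      /* operand follows opcode */
                break;
            case OP_ADD: {                  /* like iadd */
                int b = stack[--sp];        /* pop two integers...    */
                int a = stack[--sp];
                stack[sp++] = a + b;        /* ...and push their sum  */
                break;
            }
            case OP_HALT:
                return stack[sp - 1];       /* result on top of stack */
            }
        }
    }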
There are also single bytecode instructions that perform higher-level oper-
ations. For example, one instruction allocates a new array of a particular type.
Other instructions can be used to transfer elements of an array to or from the
operand stack.
Similarly, the bytecode instructions provide direct support for the object-
oriented nature of Java. One instruction creates a new object of a specified
type. Another instruction tests whether an object is an instance of a particular
class. There are four instructions that can be used (depending upon the situa-
tion) to invoke a method on an object. Another group of instructions is used to
manipulate fields within an object.
Performance is always a consideration, especially with interpreted execu-
tion. The Java interpreter is designed to run as fast as possible, without need-
ing to check the run-time environment. The automatic garbage collection
system used to manage memory runs as a low-priority background thread.
Experiments conducted on modern (1995) systems such as workstations and
*Adapted from "Language Development Tools on the Unix System" by S. C. Johnson, from the IEEE publication Computer, Vol. 13, No. 8, pp. 16-21, August 1980. © 1980 IEEE.
{Figure 5.37 (residue): (a) a LEX specification, including patterns that recognize the keyword let and identifiers in input such as x = y * z; (b) a portion of a YACC input specification.}
Note that the first pattern that matches the input stream is selected, so the key-
word let is recognized as the token LET, not as ID.
LEX can be used to produce quite complicated scanners. Some languages,
such as FORTRAN, however, have lexical analyzers that must still be gener-
ated or modified by hand.
The YACC parser generator accepts as input a grammar for the language being compiled and a set of actions corresponding to rules of the grammar. A
portion of such an input specification appears in Fig. 5.37(b). The first line
shown is a declaration of the token types used. The other entries are rules of
the grammar. The YACC parser calls the semantic routine associated with each
constructs a portion of the parse tree for the statement, using a tree-building
function build. This subtree is returned from the semantic routine by assigning
the subtree to $$. The arguments passed to the build function are the operator
MUL and the values (i.e., the subtrees)returned when the operands were rec-
ognized. These values are denoted by $1 and $3.
It is sometimes useful to perform semantic processing as each part of a rule
is recognized. YACC permits this by allowing semantic routines to be written
in the middle of a rule as well as at the end. The value returned by such a rou-
tine is available to any of the routines that appear later in the rule. It is also
possible for the user to define global variables that can be used by all of the se-
mantic routines and by the lexical scanner.
The parsers generated by YACC use a bottom-up parsing method called
LALR(1), which is a slightly restricted form of shift-reduce parsing. The
parsers produced by YACC have very good error detection properties. Error
handling permits the reentry of the items in error or a continuation of the in-
put process after the erroneous entries are skipped.
EXERCISES

Section 5.1

1. Draw parse trees, according to the grammar in Fig. 5.2, for the following <id-list>s:
a. ALPHA
b. ALPHA, BETA, GAMMA

2. Draw parse trees, according to the grammar in Fig. 5.2, for the following <exp>s:

a. ALPHA + BETA
c. ALPHA DIV (BETA + GAMMA) - DELTA

3. Suppose Rules 10 and 11 of the grammar in Fig. 5.2 were changed to

<exp>  ::= <term> | <exp> * <term> | <exp> DIV <term>
<term> ::= <factor> | <term> + <factor> | <term> - <factor>

Draw the parse trees for the <exp>s in Exercise 2 according to this modified grammar. How has the change in the grammar affected the precedence of the arithmetic operators?

4. Assume that Rules 10 and 11 of the grammar in Fig. 5.2 are deleted. Draw the parse trees for the <exp>s in Exercise 2 according to this modified grammar. How has the change in the grammar affected the precedence of the arithmetic operators?

5. Modify the grammar in Fig. 5.2 to include exponentiation operations of the form X↑Y. Be sure that exponentiation has higher priority than any other arithmetic operation.

6. Modify the grammar in Fig. 5.2 to include statements of the form

where the ELSE clause may be omitted. Assume that the condition

7. Modify the grammar in Fig. 5.2 so that the I/O list for a WRITE statement may include character strings enclosed in quotation marks, as well as identifiers.
name
name:n
name:n:m
'string'
'string':n

where

name must start with a letter (a-z); all characters after the first letter must be either letters (a-z) or digits (0-9).
string may contain any characters other than quote (').
c. the FOR statement beginning on line 7
Refer to the parse tree in Fig. 5.4 to see the order in which the parser
recognizes the various constructs involved in these statements.
Use the routines in Figs. 5.18-5.20 to generate code for the entire pro-
gram in Fig. 5.1.
Write code-generation routines for the new rules that you added to
the grammar in Exercise 6 to define the IF statement.
Section 5.2

1. Rewrite the code-generation routines given in Figs. 5.18 and 5.19 to
produce quadruples instead of object code.
2. Write a set of routines to generate object code from the quadruples
produced by your routines in Exercise 1. (Hint: You will need a rou-
tine that is similar in function to the GETA procedure in Fig. 5.19.)
3. Use the routines you wrote in Exercise 1 to produce quadruples for
the following program fragment:
READ(X,Y);
Z := 3 * X - 5 * Y + X * Y;
4. Use the routines you wrote in Exercise 2 to produce object code from the quadruples generated in Exercise 3.
5. Rewrite the code-generation routines given in Fig. 5.20 to produce
quadruples instead of object code.
6. Use the routines you wrote in Exercises 1 and 5 to produce quadru-
ples for the program in Fig. 5.1.
7. Divide the quadruples you produced in Exercise 6 into basic blocks and draw a flow graph for the program.
8. Assume that you are generating SIC/XE object code from the quadruples produced in Exercise 6. Show one way of performing register assignments to optimize the object code, using registers S and T to hold variable values and intermediate results.

Section 5.3
1. Assume the array C is declared as

   C : ARRAY[5..20] OF INTEGER

   Generate quadruples for the statement

   C[I] := 0
2. Assume the array D is declared as

   D : ARRAY[-10..10,2..12] OF INTEGER

   Generate quadruples for the statement

   D[I,J] := 0

   E[I,J,K] := 0
8. How could the base address for the array A defined in Fig. 5.26(a) be modified to avoid the need for subtracting 1 from the subscript value (quadruple 1)?

9. How could the technique derived in Exercise 8 be extended to two-dimensional arrays?
10. Assume the array declaration

    and the program fragment

        K := J - 1;
        FOR I := 1 TO 5 DO
           BEGIN
              T[I,J] := K * K;
              J := J + K;
              T[I,J] := K * K - 1;
           END
Chapter 6

Operating Systems
system goals such as efficiency. The details of this resource management can
be quite complicated; however, the operating system usually hides such com-
plexities from the user.
Our discussion of basic features is much shorter and more general than
those in previous chapters. In Chapter 2, for example, we were able to give a
common framework that could be applied to all assemblers—those for micro-
processors as well as those for multi-user supercomputers. However, the oper-
ating systems for two such dissimilar computers would be quite different.
Except for the general approaches described in this section, these operating
systems would have little in common.
We may visualize the basic functions of an operating system as illustrated
in Fig. 6.1. The operating system supports a user interface that governs the interactions with programmers, operators, etc. The user interface is what is usually described in response to the question, What kind of operating system is this? This interface may, for example, provide a control language. The users of such a language could enter a command such as RUN P to invoke the system loader to load and execute a program. Section 6.1.1 introduces some operating system terminology and describes a classification of operating systems based on the kind of user interface they provide. Section 6.1.2 briefly discusses some of the possible functions of the user interface.
{Figure 6.1 (residue): the user communicates through the user interface (e.g., with a command such as RUN P); the operating system provides an extended machine (run-time environment) built on top of the real machine.}
The operating system also provides programs with a set of services that can aid in the performance of many common tasks. For example, suppose program P wants to read data sequentially from a file. An operating system might provide a service routine that could be invoked with a command such as read(f). With such a command, the program would specify a file name; the operating system would take care of the details of performing the actual machine-level I/O.
Such service routines can be thought of as providing a run-time environment
for the programs being executed. Section 6.1.3 gives a general discussion of
such a run-time environment. More detailed descriptions of some common
service functions and routines are contained in Sections 6.2 and 6.3.
The most common ways of classifying operating systems are based on the
kind of user interface provided. Much operating system terminology arises
from the way the system appears to a user. In this section we introduce terms
commonly used to describe operating systems. The types of systems men-
tioned are not always distinct. Some of the classifications overlap and many
real operating systems fall into more than one category.
One way of classifying operating systems is concerned with the number of users the system can support at one time. A single-job system is one that runs
one user job at a time. Single-job systems, which are commonly found today
on microcomputers and personal computers, were the earliest type of operat-
ing system. A single-job operating system would probably be used on a stan-
dard SIC computer. Because of the limited memory size and lack of data
channels and other resources, it would be difficult to support more than one
user on such a machine.
A multiprogramming system permits several user jobs to be executed con-
currently. The operating system takes care of switching the CPU among the
various user jobs. It also provides a suitable run-time environment and other
support functions so the jobs do not interfere with each other. A multiprocessor
system is similar to a multiprogramming system, except that there is more
than one CPU available. In most multiprocessor systems, the processors share
a common memory. Thus the user can view the system as if it were a powerful
single processor.
A network of computers may be organized in a number of different ways.
Each computer may have its own independent operating system, which pro-
vides an interface to allow communication via the network. A user of such a
system is aware of the existenceof the network. He or she may login to remote
machines, copy files from one machine to another, etc. This kind of system is
often called a network operating system. Except for the network interface, such an operating system is quite similar to those found on a single-computer system.
With the support of an operating system, the task of the user program
would be much easier.The program could simply invoke a service routine and
specify the device to be used. The operating system would take care of all de-
tails such as status testing and counting of bytes transferred, and it would also
handle any necessaryerror checking.
A service routine such as the one just described can be thought of as pro-
viding an extension to the underlying machine. In a typical operating system,
there are many such routines. Taken together, these service routines can be
thought of as defining an extended machine for use by programs during execu-
tion. Programs deal with the functions and capabilities provided by this ex-
tended machine; they have no need to be concerned with the underlying real
machine. The extended machine is easier to use than the real machine would
be. For example, the details of performing an I/O operation are much simpler.
The extended machine may also be more attractive in other ways. For exam-
ple, I/O operations on the extended machine may appear to be less error-
prone than on the real machine, because the operating system takes care of
error detection and recovery.
The extended machine is sometimes referred to as a virtual machine.
However, the term virtual machine is also used in a different, although related,
context. This alternative usage of the term is described in Section 6.4.2.
tends to be inconvenient and error-prone, and it also may allow the user to by-
pass certain safeguards built into the operating system. On more advanced
systems, the users generally request operating system functions by means of some special hardware instruction such as a supervisor call (SVC). Execution of
an SVC instruction generates an interrupt that transfers control to an operat-
ing system service routine. A code supplied by the SVC instruction specifies
the type of request. The handling of interrupts by an operating system is dis-
cussed in Section 6.2.1.
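The dispatching performed by an SVC interrupt handler might be sketched in C as follows. The request codes and routine names are invented for the illustration; they are not part of any particular operating system.

    typedef void (*svc_handler)(void);

    static void svc_read(void)  { /* perform machine-level input  */ }
    static void svc_write(void) { /* perform machine-level output */ }
    static void svc_exit(void)  { /* terminate the requesting job */ }

    static const svc_handler svc_table[] = { svc_read, svc_write, svc_exit };

    /* Entered via the SVC interrupt; icode is the code supplied
       with the SVC instruction. */
    void handle_svc(int icode)
    {
        if (icode >= 0 && icode < 3)
            svc_table[icode]();   /* dispatch on the request type */
    }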
On a typical machine, the generation of an interrupt also causes the CPU to switch from user mode to supervisor mode. In supervisor mode, all machine in-
structions and features can be used. Most parts of the operating system are de-
signed to run in supervisor mode. In user mode, however, some instructions
are not available. These might include, for example, instructions that perform
I/O functions, set memory protection flags, or switch the CPU from one mode
to another. We discuss examples of such instructions later in this chapter.
Restricting the use of such privileged instructions forces programs to make use
of the services provided by the run-time environment. That is, user programs
must deal with the extended machine interface, rather than utilizing the un-
derlying hardware functions directly. This restriction also prevents user pro-
grams from interfering, either deliberately or accidentally, with the resource
management functions of the operating system. Privileged instructions and
user/supervisor modes (or their equivalents) are a practical necessity for a
system that supports more than one user at a time.
In Sections 6.2 and 6.3 we discuss many different functions and services
that are commonly provided by the run-time environment. At this level, there
is much similarity between operating systemsthat might appear very different
at the user interface. Most of the techniques discussed can be applied, with a
few modifications, to all types of operating systems.
Consider, for example, a standard SIC computer. This machine has a small
central memory, no I/O channels, no supervisor-call instruction, and no inter-
rupts. Such a machine might be suitable as a personal computer for a single
user; however, it could not reasonably be shared among several concurrent
users. Thus an operating system for a standard SIC machine would probably
be a single-job system, providing a simple user interface and a minimal set of
functions in the run-time environment. It would probably provide few, if any,
capabilities beyond the simple ones discussed in Section 6.1.
On the other hand, a SIC/XE computer has a larger central memory, I/O
channels, and many other hardware features not present on the standard SIC
machine. A computer with these characteristics might well have a multipro-
gramming operating system. Such a system would allow several concurrent
users to share the expanded machine resourcesthat are available, and would
take better advantage of the more advanced hardware. Of course, the sharing
of a computing system between several users creates many problems, such as
resource allocation, that must be solved by the operating system. In addition,
the operating system must provide support for the more advanced hardware
features such as I/O channels and interrupts.
In this section we discuss some important machine-dependent operating
system functions. This discussion is presented in terms of a SIC/XE computer;
however, the same principles can easily be applied to other machines that
have architectural features similar to those of SIC/XE. We describe a number
of significant SIC/XE hardware features as a part of our discussion. For ease of reference, these features are also summarized in Appendix C.
Section 6.2.1 introduces fundamental concepts of interrupts and interrupt
processing that are used throughout the remainder of this chapter. Section
6.2.2 discusses the problem of switching the CPU among the several user jobs being multiprogrammed. Section 6.2.3 describes a method for managing input
and output using I/O channels in a multiprogramming operating system.
Sections 6.2.4 and 6.2.5 discuss the problem of dividing the central memory
between user jobs. Section 6.2.4 presents techniques for managing real mem-
ory, and Section 6.2.5 introduces the important topic of virtual memory.
An interrupt is a signal that causes a computer to alter its normal flow of in-
struction execution. Such signals can be generated by many different condi-
tions, such as the completion of an I/O operation, the expiration of a preset
time interval, or an attempt to divide by zero.
The sequence of events that occurs in response to an interrupt is illustrated
in Fig. 6.2. Suppose program A is being executed when an interrupt signal is
generated by some source. The interrupt automatically transfers control to an
interrupt-processing routine (also called an interrupt handler) that is usually a
part of the operating system. This interrupt-processing routine is designed to
take some action in response to the condition that caused the interrupt. After
the interrupt has been processed, control can be returned to the program that
was interrupted.
[Figure 6.2: An interrupt transfers control from the executing program to an interrupt-processing routine, which returns control when it is finished.]
Class      Interrupt type
I          SVC
II         Program
III        Timer
IV         I/O

Figure 6.3 SIC/XE interrupt classes.
A timer interrupt is generated by the interval timer, a hardware timer that is
set by the operating system and automatically decremented as time passes;
when the value reaches zero, a timer interrupt occurs. The interval timer is used by the
operating system to govern how long a user program can remain in control of
the machine.
When a timer interrupt occurs, the hardware automatically saves the contents
of SW, PC, and the other registers in the timer-interrupt work area, loads new
values of SW and PC from that work area, and continues execution at
the address given by the new value of PC. This address, which is prestored in
the interrupt work area, is the starting address of the interrupt-handling routine
for a timer interrupt. The loading of the status word SW also causes certain
changes, described later in this section, in the state of the CPU.
After taking whatever action is required in response to the interrupt, the
interrupt-handling routine returns control to the interrupted program by exe-
cuting a Load Processor Status (LPS) instruction. This action is illustrated in
Fig. 6.4(b). LPS causes the stored contents of SW, PC, and the other registers to
be loaded from consecutive words beginning at the address specified in the
instruction. This restores the CPU status and register contents that existed at the
time of the interrupt, and transfers control to the instruction following the one
that was being executed when the interrupt occurred.
[Figure 6.4: Interrupt processing for a timer interrupt on SIC/XE. Central memory contains an interrupt work area for each interrupt class (SVC at address 100, program interrupt at 130, timer at 160, I/O at 190); each work area holds a new SW, a new PC, the old SW, the old PC, and a register save area. (a) The timer interrupt stores SW, PC, and the registers in the timer-interrupt work area and loads the new SW and PC. (b) LPS reloads the saved values, returning control to the interrupted program.]
The saving and restoring of the CPU status and register contents are often
called context switching operations.
The status word SW contains several pieces of information that are impor-
tant in the handling of interrupts. We discuss the contents of SW for a SIC/XE
machine. Most computers have a similar register, which is often called a
program status word or a processor status word.
Figure 6.5 shows the contents of the status word SW. The first bit, MODE,
specifies whether the CPU is in user mode or supervisor mode. Ordinary pro-
grams are executed in user mode (MODE = 0). When an interrupt occurs, the
new SW contents that are loaded have MODE = 1, which automatically
switches the CPU to supervisor mode so that privileged instructions may be
used. Before the old value of SW is saved, the ICODE field is automatically set
to a value that indicates the cause of the interrupt. For an SVC interrupt,
ICODE is set to the value supplied by the user in the SVC instruction. This
value specifies the type of service request being made. For a program inter-
rupt, ICODE indicates the type of condition, such as division by zero, that
caused the interrupt. For an I/O interrupt, ICODE gives the number of the
I/O channel that generated the interrupt. Further information about the possi-
ble values of ICODE can be found in Appendix C.
The status word also contains the condition code CC. Saving SW automati-
cally preserves the condition code value that was being used by the inter-
rupted process. The use of the fields IDLE and ID will be described later in
this chapter. IDLE specifies whether the CPU is executing instructions or
is idle. ID contains a 4-bit value that identifies the user program currently
being executed.
Bit          Field
position     name      Use

0            MODE      User (0) or supervisor (1) mode
1            IDLE      CPU running (0) or idle (1)
2-5          ID        Identity of program being executed
6-7          CC        Condition code
8-11         MASK      Interrupt mask
12-15                  Unused
16-23        ICODE     Interruption code

Figure 6.5 Contents of status word SW for SIC/XE.
The remaining status word field, MASK, is used to control whether inter-
rupts are allowed. This control is necessary to prevent loss of the stored
processor status information. Suppose, for example, that an I/O interrupt oc-
curs. The values of SW, PC, and the other registers would be stored in the I/O-
interrupt work area as just described, and the CPU would begin to execute the
I/O-interrupt handler. If another I/O interrupt occurred before the processing
of the first had been completed, another context switch would take place. This
time, however, the register contents stored in the interrupt work area would
be the values currently being used by the interrupt handler. The values that
were saved by the original interrupt would be destroyed, so it would be im-
possible to return control to the user program that was executing at the time of
the first interrupt.
To avoid such a problem, it is necessary to prevent certain interrupts from
occurring while the first one is being processed. This is accomplished by using
the MASK field in the status word. MASK contains one bit that corresponds to
each class of interrupt. If a bit in MASK is set to 1, interrupts of the corre-
sponding class are allowed to occur. If the bit is set to 0, interrupts of the corre-
sponding class are not allowed. When interrupts are prohibited, they are said
to be masked (also often called inhibited or disabled). Interrupts that are masked
are not lost, however, because the hardware saves the signal that would have
caused the interrupt. An interrupt that is being temporarily delayed in this
way is said to be pending. When interrupts of the appropriate class are again
permitted, because MASK has been reset, the signal is recognized and an in-
terrupt occurs.
The masking of interrupts on a SIC/XE machine is under the control of the
operating system. It depends upon the value of MASK in the SW that is pre-
stored in each interrupt work area. One approach is to set all the bits in MASK
to 0, which prevents the occurrence of any other interrupt. However, it is not
really necessary to inhibit all interrupts in this way.
Each class of interrupt on a SIC/XE machine is assigned an interrupt prior-
ity. SVC interrupts (Class I) have the highest priority, program interrupts
(Class II) have the next highest priority, and so on. The MASK field in the sta-
tus word corresponding to each interrupt class is set so that all interrupts of
equal or lower priority are inhibited; however, interrupts of higher priority are
allowed to occur. For example, the status word that is loaded in response to a
program interrupt would have the MASK bits for program, timer, and I/O in-
terrupts set to 0; these classes of interrupt would be inhibited. The MASK bit
for SVC interrupts would be set to 1, so these interrupts would be allowed.
When interrupts are enabled at the end of an interrupt-handling routine, there
may be more than one type of interrupt pending (for example, one timer inter-
rupt and one I/O interrupt). In such a case, the pending interrupt with the
highest priority is recognized first.
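The prestored MASK values just described can be computed mechanically. The following C sketch is our illustration, not part of the SIC/XE definition; the assignment of one MASK bit per class is an assumption made for the example:

#include <stdio.h>

/* Interrupt classes in priority order: Class I (SVC) is highest. */
enum { SVC, PROGRAM, TIMER, IO, NCLASSES };

/* Assumed layout: bit c of MASK corresponds to class c, and a
   1 bit means interrupts of that class are allowed to occur. */
unsigned mask_for(int handler_class)
{
    unsigned mask = 0;
    for (int c = 0; c < NCLASSES; c++)
        if (c < handler_class)      /* strictly higher priority */
            mask |= 1u << c;        /* allow it to interrupt us */
    return mask;                    /* equal or lower stay masked */
}

int main(void)
{
    const char *name[] = { "SVC", "Program", "Timer", "I/O" };
    for (int c = 0; c < NCLASSES; c++) {
        unsigned m = mask_for(c);
        printf("%-8s handler: SVC=%u PROG=%u TIMER=%u IO=%u\n",
               name[c], m & 1u, (m >> PROGRAM) & 1u,
               (m >> TIMER) & 1u, (m >> IO) & 1u);
    }
    return 0;
}

For the program-interrupt handler this yields a MASK bit of 1 for SVC interrupts and 0 for program, timer, and I/O interrupts, matching the example in the text.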
For example, a process might be blocked because it must wait for the completion of
an I/O operation before proceeding. Processes that are neither blocked nor
running are said to be ready. These processes are candidates to be assigned the
CPU when the currently running process gives up control.
Figure 6.7 shows the possible transitions between these three process
states. At any particular time, there can be no more than one process in the
running state (i.e., in control of the CPU). When the operating system transfers
control to a user process, it sets the interval timer to specify a time-slice, which
is a maximum amount of CPU time the process is allowed to use before giving
up control. If this time expires, the process is removed from the running state
and placed in the ready state. The operating system then selects some process
from the ready state, according to its scheduling policy. This process is placed
in the running state and given control of the CPU.
[Figure 6.7: Process state transitions. A running process moves to the blocked state when it must wait for some event, and to the ready state when its time-slice expires; a blocked process moves to the ready state when the awaited event has occurred; a ready process moves to the running state when it is dispatched.]
The selection of a process, and the transfer of control to it, is usually called
dispatching. The part of the operating system that performs this function is
known as the dispatcher.
Before it has used all its assigned time-slice, a running process may find
that it must wait for the occurrence of some event such as the completion of an
I/O operation. In such a case, the running process enters the blocked state,
and a new process is dispatched. When an awaited event occurs, the blocked
process associated with that event is moved to the ready state, where it is
again a candidate for dispatching. The operations of waiting for an event, and
of signaling that an event has occurred, are implemented as operating system
service requests (using SVC). A mechanism often used to associate processes
with awaited events is described later in this section.
[Figure 6.8: Algorithm for the dispatcher (procedure DISPATCH).]
The simplest dispatching technique treats all processes equally. The dispatcher
cycles through the process status blocks (PSBs), selecting the next process that
is in the ready state. Each process dispatched is given the same length time-
slice as all other processes.
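A minimal C sketch of this round-robin selection follows; the PSB structure and its fields are our assumptions for illustration, since the text does not give a concrete layout:

enum state { READY, RUNNING, BLOCKED };

struct psb {                 /* process status block (assumed layout) */
    enum state state;
    /* saved SW, PC, and register contents would also go here */
};

#define NPROC 16
struct psb psb[NPROC];

/* Round-robin dispatch: starting after the process that ran last,
   cycle through the PSBs and pick the next one in the Ready state.
   Returns its index, or -1 if every process is blocked (CPU idles). */
int dispatch(int last)
{
    for (int i = 1; i <= NPROC; i++) {
        int p = (last + i) % NPROC;
        if (psb[p].state == READY) {
            psb[p].state = RUNNING;
            return p;        /* an LPS would then restore its status */
        }
    }
    return -1;               /* no ready process: set the CPU idle */
}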
More complicated dispatching methods may select processes based on a
priority scheme. In some systems, the priorities are predefined, based on the
nature of the user job. The goal of such a system is to provide the desired level
of service for each class of job. On other systems, the priorities may be as-
signed by the operating system itself. In this case, the assignment of priorities
is made in an effort to improve the overall system performance. Priorities may
be allowed to vary dynamically, depending on the system load and perfor-
mance. It is also possible to assign different time-slices to different processes in
conjunction with the priority system. Further discussion of these more sophis-
ticated dispatching techniques can be found in Tanenbaum (1992).
When a running process reaches a point at which it must wait for some
event to occur, the process informs the operating system by making a WAIT
(SVC 0) service request. The occurrence of an event on which other processes
may be waiting is communicated to the operating system by a SIGNAL (SVC 1)
request. In Section 6.2.3 we present examples of the use of WAIT and
SIGNAL; at this time, we are concerned with how these requests are related to
the process-scheduling function.
Figure 6.9 gives the sequence of logical steps that is performed by the oper-
ating system in response to such service requests. The event to be awaited or
signaled is specified by giving the address of an event status block (ESB) that is
associated with the event. The ESB contains a flag bit ESBFLAG that records
whether or not the associated event has occurred. The ESB also contains a
pointer to ESBQUEUE, a list of all processes currently waiting for the event.
Further information about ESBs, and examples of their creation and use, are
presented in Section 6.2.3.
procedure WAIT(ESB)
begin
   if ESBFLAG is set {event has already occurred} then
      return control to requesting process
   else
      begin
         mark requesting process as Blocked
         enter requesting process on ESBQUEUE
         DISPATCH
      end
end

(a)

procedure SIGNAL(ESB)
begin
   set ESBFLAG to record occurrence of event
   for each process on ESBQUEUE do
      begin
         mark process as Ready
         remove process from ESBQUEUE
      end
   return control to requesting process
end

(b)
Figure 6.9 Algorithms for WAIT (SVC 0) and SIGNAL (SVC 1).
The WAIT request is issued by a running process and indicates that the
process cannot proceed until the event associated with ESB has occurred. Thus
the algorithm for WAIT first examines ESBFLAG. If the event has already oc-
curred, control is immediately returned to the requesting process. If the event
has not yet occurred, the running process is placed in the blocked state and is
entered on ESBQUEUE. The dispatcher is then called to select the next process
to be run.
The SIGNAL request is made by a process that detects that some event cor-
responding to ESB has occurred. The algorithm for SIGNAL therefore records
the event occurrence by setting ESBFLAG. It then scans ESBQUEUE, the list of
processes waiting for this event. Each process on the list is moved from the
blocked state to the ready state. Control is then returned to the process that
made the SIGNAL request.
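A compact C rendering of these two algorithms, under assumed data structures (the ESB fields and the linked-list queue representation are illustrative only):

#include <stdbool.h>

enum state { READY, RUNNING, BLOCKED };

struct psb { enum state state; struct psb *next; };

struct esb {
    bool flag;              /* ESBFLAG: has the event occurred?     */
    struct psb *queue;      /* ESBQUEUE: linked list of waiters     */
};

void dispatch(void);        /* selects the next process to run      */

void wait_on(struct esb *e, struct psb *caller)
{
    if (e->flag)
        return;             /* event already occurred: return       */
    caller->state = BLOCKED;
    caller->next = e->queue;
    e->queue = caller;      /* enter caller on ESBQUEUE             */
    dispatch();             /* select the next process to run       */
}

void signal_event(struct esb *e)
{
    e->flag = true;         /* record the occurrence of the event   */
    while (e->queue) {      /* move every waiter to the ready state */
        struct psb *p = e->queue;
        e->queue = p->next;
        p->state = READY;
    }
    /* control then returns to the signaling process */
}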
If the dispatching method being used is based on priorities, a slightly dif-
ferent SIGNAL algorithm is often used. On such systems, it may happen that
one or more of the processes that were made ready has a higher priority than
the currently running process. To take this into account, the SIGNAL algo-
rithm would invoke the dispatcher instead of returning control directly to the
requesting process. The dispatcher would then transfer control to the highest-
priority process that is currently ready. This scheme is known as preemptive
process scheduling. It permits a process that becomes ready to seize control
from a lower-priority process that is currently running, without waiting for
the time-slice of the lower-priority process to expire.
On a typical small computer, such as a standard SIC machine, input and out-
put are usually performed 1 byte at a time. For example, a program that needs
to read data might enter a loop that tests the status of the I/O device and exe-
cutes a series of read-data instructions. On such systems, the CPU is involved
with each byte of data being transferred to or from the I/O device. An exam-
ple of this type of I/O programming can be found in Fig. 2.1.
More advanced computers often have special hardware to take care of the
details of transferring data and controlling I/O devices. On SIC/XE, this func-
tion is performed by simple processors known as I/O channels. Figure 6.10
shows a typical I/O configuration for SIC/XE. There may be as many as 16
channels, and up to 16 devices may be connected to each channel. The identi-
fying number assigned to an I/O device also reflects the channel to which it is
connected. For example, the devices numbered 20-2F are connected to channel 2.
The sequence of operations to be performed by a channel is specified by a
channel program, which consists of a series of channel commands. To perform an
I/O operation, the CPU executes a Start I/O (SIO) instruction, specifying a
channel number and the beginning address of a channel program. The channel
then performs the indicated I/O operation without further assistance from the
CPU. After completing its program, the channel generates an I/O interrupt.
Several channels can operate simultaneously, each executing its own channel
program, so several different I/O operations can be in progress at the same
time. Each channel operates independently of the CPU, so the CPU is free to
continue computing while the I/O operations are carried out.
The operating system for a computer like SIC/XE is involved with the I/O
process in several different ways. The system must accept I/O requests from
user programs and inform these programs when the requested operations
have been completed. It must also control the operation of the I/O channels
and handle the I/O interrupts generated by the channels. In the remainder of
this section we discuss how these functions are performed and illustrate the
process with several examples.
[Figure 6.10: A typical I/O configuration for SIC/XE. The central processor is connected to I/O channels (channel 0, channel 2, and so on), and up to 16 devices are attached to each channel.]
In some cases, the WAIT may come immediately after the I/O request.
However, because computing and I/O can be performed at the same time, it
may be possible for the program to continue processing while awaiting the re-
sults of the I/O operation.
[Figure 6.11: Outline of program P1. After initialization, the program loads register S with the channel number and register T with the address of an ESB (LDS #1, LDT #ESB), issues the next read request with SVC 2, processes the data, and jumps back to the top of its loop.]
The two I/O operations are independent of each
other. Either operation might be completed before the other. It is also possible
that the two operations might actually be performed at the same time. The
program is able to coordinate the related I/O operations because there is a dif-
ferent ESB corresponding to each operation. This program illustrates how I/O
channels can be used to perform several overlapped I/O operations. Later in
this section we consider a detailed example of this kind of overlap.
[Figure 6.12: Outline of program P2 and its channel program. The channel commands include a word giving the address of the input buffer (WORD BUF); the second command, BYTE X'000000000000', halts the channel.]
The programs in Figs. 6.11 and 6.12 illustrate I/O requests from the user’s
point of view. Now we are ready to discuss how such requests are actually
handled by the operating system and the machine. The SIC/XE hardware pro-
vides a channel work area in memory corresponding to each I/O channel. This
work area contains the starting address of the channel program currently be-
ing executed, if any, and the address of the ESB corresponding to the current
operation. When an I/O operation is completed, the outcome is indicated by
status flags that are stored in the channel work area. These flags indicate con-
ditions such as normal completion, I/O error, or device unavailable. The chan-
nel work area also contains a pointer to a queue of I/O requests for the
channel. This queue is maintained by the operating system routines. Appendix
C contains additional details on the location and contents of the channel work
areas for SIC/XE.
Figure 6.13 outlines the actions taken by the operating system in response
to an I/O request from a user program. If the channel on which I/O is being
requested is busy performing another operation, the operating system inserts
the request into the queue for that channel. If the channel is not busy, the oper-
ating system stores the current request in the channel work area and starts the
channel. In either case, contro] is then returned to the process that made
the I/O request so that it can continue to execute while the I/O is being
performed.
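The choice just described can be written out directly. A C sketch (the work-area structure, the queue representation, and the start_io stand-in for SIO are our assumptions):

struct esb;                          /* defined as before            */

struct io_request {
    void *channel_program;           /* starting address of the CP   */
    struct esb *esb;
    struct io_request *next;
};

struct channel {
    int busy;
    struct io_request *current;      /* stored in channel work area  */
    struct io_request *queue;        /* pending requests, FIFO       */
};

void start_io(struct channel *ch);   /* issue SIO for ch->current    */

void request_io(struct channel *ch, struct io_request *rq)
{
    if (ch->busy) {                  /* insert request into queue    */
        struct io_request **pp = &ch->queue;
        while (*pp)
            pp = &(*pp)->next;
        rq->next = 0;
        *pp = rq;
    } else {                         /* store request and start I/O  */
        ch->busy = 1;
        ch->current = rq;
        start_io(ch);
    }
    /* in either case, control returns to the requesting process */
}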
Figure 6.14 describes the actions taken by the operating system in response
to an I/O interrupt. The number of the I/O channel that generated the inter-
rupt can be found in the status word that is stored in the I/O-interrupt work
area. The interrupt-handling routine then examines the status flags in the
work area for this channel to determine the cause of the interrupt.
If the channel status flags indicate normal completion of the I/O opera-
tion, the interrupt handler signals this completion via the ESB that was speci-
fied in the I/O request. This may be done either by making an SVC request,
[Figure 6.13(a): Algorithm for processing an I/O request. If the channel is busy, the request (CP,ESB) is inserted into the queue for that channel; otherwise (CP,ESB) is stored in the channel work area and the channel is started.]
[Figure 6.15: Sequence of events in the overlap example: (a) P1 requests an I/O operation; (b) P2 requests an I/O operation; (c) P1 requests another I/O operation; ...]
state to the I/O-interrupt handler. After determining that the operation was
completed normally, the I/O-interrupt handler issues a SIGNAL request
(SVC 1) for the associated ESB. This switches control to the SVC-interrupt han-
dler (12). Process P1, which is waiting on this ESB, is placed in the ready state.
The SVC handler then returns control to the I/O-interrupt handler (13). The
dispatcher is invoked at sequence (14), and switches control to process P1 at
(15). A similar series of operations occurs when channel 2 completes its opera-
tion (16); this causes process P2 to be made ready. However, since the CPU
was not idle at the time of the interrupt, control does not pass immediately to
P2. The I/O-interrupt handler restores control to the interrupted process P1.
P2 does not receive control until P1 issues its next WAIT request at sequence
(22).
You should carefully follow through the other steps in this example to be
sure you understand the flow of control in response to the various interrupts.
In doing this, you may find it useful to refer to the algorithms in Figs. 6.8, 6.9,
6.13, and 6.14. Note in particular the many different types of overlap between
the CPU execution and the I/O operations of the different processes. The abil-
ity to provide such flexible sequencing of tasks is one of the most important
advantages of an interrupt-driven operating system.
Any operating system that supports more than one user at a time must pro-
vide a mechanism for dividing central memory among the concurrent
processes. Many multiprogramming and multiprocessing systems divide
memory into partitions, with each process being assigned to a different parti-
tion. These partitions may be predefined in size and position (fixed partitions),
or they may be allocated dynamically according to the requirements of the
jobs being executed (variable partitions).
In this section we illustrate and discuss several different forms of parti-
tioned memory management. Our examples use the sequence of jobs de-
scribed in Fig. 6.16. We assume that the level of multiprogramming (i.e., the
number of concurrent jobs) is limited only by the number of jobs that can fit
into central memory.
Figure 6.17 illustrates the allocation of memory in fixed partitions. The to-
tal amount of memory available on the computer is assumed to be 50000
bytes; the operating system occupies the first 10000 bytes. These sizes, and all
other sizes and addresses used in this section, are given in hexadecimal; the
numbers have intentionally been kept small to allow for easier reading. The
memory that is not occupied by the operating system is divided into four par-
titions. Partition 1 begins at address 10000, immediately following the operat-
ing system, and is 18000 bytes in length. The other partitions follow in
sequence: Partitions 2 and 3 are each 8000 bytes in length, and Partition 4 is
A000 bytes in length.

Job      Length (hexadecimal)
1        14000
2        --
3        A800
4        4000
5        E000
6        B000
7        C000

Figure 6.16 The sequence of jobs used in the examples of this section.
[Figure 6.17: Snapshots (a)-(d) of the operating system and the four fixed partitions as jobs from Fig. 6.16 (including Jobs 1, 2, 4, 5, and 6) are loaded and as Jobs 1 and 4 terminate.]

Figure 6.17 Memory allocation for jobs from Fig. 6.16 using fixed partitions.
Note that the partitions themselves remain fixed in size and position re-
gardless of the sizes of the jobs that occupy them. The initial selection of the
partition sizes is very important in a fixed partition scheme. There must be
enough large partitions so that large jobs can be run without too much delay.
If there are too many large partitions, however, a great deal of memory may be
wasted when small jobs are run. The fixed partition technique is most effective
when the sizes of jobs tend to cluster around certain common values, and
when the distribution of job sizes does not change frequently. This makes it
possible to make effective use of the available memory by tailoring a set of
partitions to the expected population of job sizes.
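A minimal C sketch of fixed-partition assignment; the best-fit policy shown (choosing the smallest adequate free partition) is one plausible choice, since the text does not prescribe a policy:

struct partition { unsigned size; int in_use; };

/* Assign a job to the smallest free partition that can hold it.
   Returns the partition index, or -1 if the job must wait. */
int assign(struct partition *p, int n, unsigned job_size)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (!p[i].in_use && p[i].size >= job_size)
            if (best < 0 || p[i].size < p[best].size)
                best = i;
    if (best >= 0)
        p[best].in_use = 1;      /* partition size never changes */
    return best;
}

Note that the difference between a job's size and its partition's size is simply wasted while the job runs; this is the memory cost discussed above.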
Figure 6.18 illustrates the running of the same set of jobs using variable
memory partitions. A new partition is created for each job to be loaded. This
newly created partition is of exactly the size required to contain the job. When
a job terminates, the memory assigned to its partition is released, and this
memory then becomes available for use in allocating other partitions.
Initially, all memory except that assigned to the operating system is unallo-
cated because there are no predefined partitions. When Job 1 is loaded, a parti-
tion is created for it. We assume this partition is allocated immediately after
the operating system. Job 2 is then assigned a partition immediately following
Job 1, and so on. Figure 6.18(a) shows the situation after the first five jobs are
loaded. The free memory that remains following Job 5 is not large enough to
load any other job.
When Job 2 terminates, its partition is released, and a new partition is allo-
cated for Job 6. As shown in Fig. 6.18(b), this new partition occupies part of the
memory previously assigned to Job 2. The rest of Job 2’s former partition re-
mains free. There are now two separate free areas of memory; however, nei-
ther of these is large enough to load another job. Figure 6.18(c) and (d) shows
how the releasing and allocating of memory continue as other jobs terminate.
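Variable-partition bookkeeping is commonly done with a list of free areas. A first-fit sketch in C (the hole list and the placement policy are illustrative assumptions):

#include <stdlib.h>

struct hole { unsigned start, size; struct hole *next; };

/* Carve a partition of exactly `size` bytes out of the first hole
   large enough; whatever is left of the hole remains free. */
unsigned alloc_partition(struct hole **free_list, unsigned size)
{
    for (struct hole **pp = free_list; *pp; pp = &(*pp)->next) {
        struct hole *h = *pp;
        if (h->size >= size) {
            unsigned start = h->start;
            h->start += size;
            h->size  -= size;
            if (h->size == 0) {      /* hole used up entirely */
                *pp = h->next;
                free(h);
            }
            return start;
        }
    }
    return (unsigned)-1;             /* no hole big enough: job waits */
}

Releasing a partition would insert a new hole at the job's address and merge it with any adjacent free areas (not shown); that merging is how the separate free areas of Fig. 6.18 arise and coalesce.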
[Figure 6.18: Snapshots (a)-(d) of memory, beginning at address 10000 after the operating system, as Jobs 2, 1, and 3 terminate and new partitions are allocated.]
Figure 6.18 Memory allocation for jobs from Fig. 6.16 using variable
partitions.
[Figure 6.19: Snapshots (a)-(d) of memory as successive jobs terminate; after each release the remaining partitions are relocated so that the free memory is combined into one area.]
Figure 6.19 Memory allocation for jobs from Fig. 6.16 using relocatable partitions.
[Figure 6.20: Use of a relocation register in address calculation. Program P3 contains, at relative location 1840, the instruction +STA BUFF2 (object code 0F108108), which refers to relative address 8108, the location of BUFF2. (a)-(b) With the program loaded at address 1A000, the instruction is at 1B840 and the relocation register contains 1A000, so the target address is 1A000 + 8108 = 22108. (c)-(d) After the program is moved to 2E000, the relocation register contains 2E000, and the same instruction yields 2E000 + 8108 = 36108.]
represent addresses and which represent other types of data. Thus relocating a
program during execution is much more difficult than relocating it when it is
first loaded.
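Numerically, the relocation-register scheme of Fig. 6.20 costs one addition per memory reference, and moving a program changes only the register. A small C check (the function name is ours):

#include <assert.h>
#include <stdio.h>

/* Target address = relocation register + relative address. */
unsigned translate(unsigned reloc_reg, unsigned rel_addr)
{
    return reloc_reg + rel_addr;
}

int main(void)
{
    /* Program loaded at 1A000; +STA BUFF2 uses relative address 8108. */
    assert(translate(0x1A000, 0x8108) == 0x22108);
    /* After moving the program to 2E000, only the register changes.  */
    assert(translate(0x2E000, 0x8108) == 0x36108);
    puts("relocation examples verified");
    return 0;
}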
In a virtual-memory system, the amount of memory that appears to be
available to a program, called its virtual memory, may even be larger than the
total amount of real memory available
on the computer. The virtual memory used by a program is stored on some ex-
ternal device (the backing store). Portions of this virtual memory are mapped
into real memory as they are needed by the program. The backing store and
the virtual-to-real mapping are completely invisible to the user program. The
program is written exactly as though the virtual memory really existed.
In this section we describe demand paging, which is one common method
for implementing virtual memory. References to discussions of other types of
virtual memories can be found at the end of the section.
In a typical demand-paging system, the virtual memory of a process is di-
vided into pages of some fixed length. The real memory of the computer is di-
vided into page frames of the same length as the pages. Any page from any
process can potentially be loaded into any page frame in real memory. The
mapping of pages onto page frames is described by a page map table (PMT);
there is one PMT for each process in the system. The PMT is used by the hard-
ware to convert addresses in a program’s virtual memory into the correspond-
ing addresses in real memory. This conversion process is similar to the use of
the relocation register described in the last section. However, there is one PMT
entry for each page instead of one relocation register for the entire program.
This conversion of virtual addresses to real addresses is known as dynamic ad-
dress translation.
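In outline, dynamic address translation splits a virtual address into a page number and an offset, indexes the PMT, and substitutes the page frame number. A C sketch (the PMT representation is an assumption; the page size matches the example that follows):

#include <stdio.h>

#define PAGE_SIZE 0x1000            /* 1000 (hexadecimal) bytes   */
#define NO_FRAME  (-1)              /* page not in real memory    */

/* pmt[p] holds the page frame for page p, or NO_FRAME.
   Returns 1 on success; 0 signals a page fault interrupt. */
int translate(const int *pmt, unsigned vaddr, unsigned *raddr)
{
    unsigned page   = vaddr / PAGE_SIZE;
    unsigned offset = vaddr % PAGE_SIZE;
    if (pmt[page] == NO_FRAME)
        return 0;                   /* page fault                 */
    *raddr = (unsigned)pmt[page] * PAGE_SIZE + offset;
    return 1;
}

int main(void)
{
    int pmt[11];
    for (int p = 0; p < 11; p++) pmt[p] = NO_FRAME;
    pmt[0] = 0x1D;                  /* Page 0 is in frame 1D      */

    unsigned r;
    if (translate(pmt, 0x0420, &r))
        printf("0420 -> %05X\n", r);        /* prints 1D420       */
    if (!translate(pmt, 0x6FFA, &r))
        printf("6FFA -> page fault\n");
    return 0;
}

The two calls in main reproduce the translations worked out below for Fig. 6.23.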
These concepts are illustrated by the program outlined in Fig. 6.22. The
program is divided into pages that are 1000 bytes (hexadecimal) in length.
[Figure 6.22: Outline of program P3, divided into Pages 0 through A of 1000 (hexadecimal) bytes each. The listing shows the instructions whose operands lie in other pages; BUFF1 is reserved at virtual address 6FFA, in Page 6, and the program ends at address A800.]
:
®
| Page=0
Offset=420 Page=6 Offset =FFA
1D000
Page map
1D103} 33100420
table
1DFFF-————_
Page
tfault
1D420 interrupt
| (Real address)
Loe a
Dynamic address translation
(a) (b)
Figure 6.23 Examples of dynamic address translation and demand
paging.
Consider first the instruction at virtual address 0103. The operand address for
this instruction is virtual address 0420. We used an instruction format that pro-
vides direct addressing to make this initial example easier to follow. The
operand address 0420 is located within Page 0, at offset 420 from the begin-
ning of the page. The page map table indicates that Page 0 of this program is
loaded in page frame 1D (that is, beginning at address 1D000). Thus the real
address calculated by the dynamic address translation is 1D420.
Next let us consider the LDA instruction at virtual address 0420. The
operand for this instruction is at virtual address 6FFA (Page 6, offset FFA).
However, Page 6 has not yet been loaded into real memory, so the dynamic
address translation hardware is not able to compute a real address. Instead, it
generates a special type of program interrupt called a page fault [see Fig. 6.23(b)].
[Figure 6.23(c): After Page 6 is loaded into page frame 29, the translation of virtual address 6FFA (for the instruction 03106FFA at real address 1D420) succeeds, giving real address 29FFA.]
The operating system responds to this interrupt by loading the required page
into some page frame. Let us assume page frame 29 is chosen. The instruction
that caused the interrupt is then reexecuted. This time, as shown in Fig. 6.23(c),
the dynamic address translation is successful.
The other pages of the program are loaded on demand in a similar way.
Assume that Fig. 6.22 shows all the Jump instructions in the program as well
as all instructions whose operands are located in another page. When control
passes from the last instruction in Page 0 to the first instruction in Page 1, the
instruction-fetch operation causes a page fault, which results in the loading of
Page 1. The STA instruction at virtual address 1840 causes Page 8 to be loaded.
Page 2 is then loaded as a result of the instruction-fetch cycle, just as Page 1
was. Now consider the two Jump instructions at addresses 2020 and 2024. If
the first of these jumps is executed (i.e., if the less-than condition is true), it
causes Page 4 to be loaded; otherwise, Page 4 remains unloaded. In this latter
case, the page need never be brought into memory at all.
[Figure 6.24: Relationship between virtual memory, the backing store, and real memory. Pages of P3's virtual memory (for example, page 8) reside on the backing store and are mapped into page frames of real memory (such as frames 3A, 3B, and 4C).]
if there is no empty page frame then
   begin
      select page to be removed
      mark the selected page frame table entry as committed
      update PMT to reflect the removal of the page
      enable all interrupts using LPS
      if the selected page has been modified then
         begin
            issue I/O request to rewrite page to backing store
            wait for completion of the write operation
         end {if modified}
   end {if no empty page frame}
issue I/O request to read page into the selected page frame
wait for completion of the read operation
update PMT and page frame table
mark process as Ready
restore status of user process that caused the page fault
(b)
Figure 6.25 Algorithms for dynamic address translation and page fault interrupt
processing.
The algorithm description in Fig. 6.25(b) leaves several important ques-
tions unanswered. The most obvious of these is which page to select for re-
moval. Some systems keep records of when each page in memory was last
referenced and replace the page that has been unused for the longest time.
This is called the least recently used (LRU) method. Since the overhead for this
kind of record keeping can be high, simpler approximations to LRU are often
used. Other systems attempt to determine the set of pages that are frequently
used by the process in question (the so-called working set of the process). These
systems attempt to replace pages in such a way that each process always has
its working set in memory. Discussions and evaluations of various page re-
placement strategies can be found in Tanenbaum (1992) and Deitel (1990).
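The record keeping that exact LRU requires can be made concrete with a few lines of C (a deliberately naive sketch; real systems use hardware reference bits and approximations):

#define NFRAMES 8

static unsigned long last_used[NFRAMES];  /* time of last reference */
static unsigned long clock_ticks;

void note_reference(int frame)            /* on every memory access */
{
    last_used[frame] = ++clock_ticks;
}

/* Select for removal the frame whose page has gone unreferenced
   the longest, as required by the LRU policy. */
int lru_victim(void)
{
    int victim = 0;
    for (int f = 1; f < NFRAMES; f++)
        if (last_used[f] < last_used[victim])
            victim = f;
    return victim;
}

The cost of updating last_used on every reference is exactly the overhead mentioned above, which is why approximations are preferred in practice.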
Another unanswered question concerns the implementation of the page ta-
bles themselves. One possible solution is to implement these tables as arrays
in central memory. A register is set by the operating system to point to the be-
ginning of the PMT for the currently executing process. This method can be
very inefficient because it requires an extra memory access for each address
translation. Some systems, however, use such a technique in combination with
a high-speed buffer to improve average access time. Another possibility is to
implement the page map tables in a special high-speed associative memory.
This is very efficient, but may be too expensive for systems with large real
memories. Further discussions of these and other PMT implementation tech-
niques can be found in Tanenbaum (1992).
Demand-paging systems avoid most of the wasted memory due to frag-
mentation that is often associated with partitioning schemes. They also save
memory in other ways. For example, parts of a program that are not used dur-
ing a particular execution need not be loaded. However, demand-paging sys-
tems are vulnerable to other serious problems. For example, suppose that
referencing a word in central memory requires 1 μsec, and that fetching a page
from the backing store requires an average of 10 msec (10,000 μsec). Suppose
also that on the average, considering all jobs in the system, only 1 out of 100
virtual memory references causes a page fault. Even with this apparently low
page fault rate, the system will not perform well. For every 100 memory refer-
ences (requiring 100 μsec), the system will spend 10,000 μsec fetching pages
from the backing store. Thus the computing system will spend approximately
99 percent of its time swapping pages, and only 1 percent of its time doing
useful work. This total collapse of service because of a high paging rate is
known as thrashing.
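The arithmetic behind this example is easy to check (a direct restatement of the figures in the text):

#include <stdio.h>

int main(void)
{
    double ref_us   = 1.0;        /* one memory reference: 1 usec   */
    double fetch_us = 10000.0;    /* one page fetch: 10 msec        */

    /* One fault per 100 references. */
    double compute = 100 * ref_us;              /*    100 usec      */
    double paging  = 1 * fetch_us;              /* 10,000 usec      */
    printf("paging fraction: %.1f%%\n",         /* prints 99.0%     */
           100.0 * paging / (compute + paging));
    return 0;
}

With one fault per 10,000 references instead, the same calculation gives 10,000 usec of paging per 10,000 usec of computing, i.e. 50 percent overhead, which is why even lower fault rates are desirable.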
To avoid thrashing in the situation just described, it is necessary for the
page fault rate to be much lower (perhaps on the order of one fault for every
10,000 memory references). At first glance, this might seem to make demand
paging useless. It appears that all of a program’s pages would need to be in
memory to achieve acceptable performance. However, this is not necessarily
the case, because memory references tend to be localized, as illustrated in
Fig. 6.26(a).
[Figure 6.26(b): Graph of page fault rate versus number of pages in memory; the rate drops sharply once the process has its working set W in memory.]
Figure 6.26 (a) Localized memory references. (b) Effect of localized
references on page fault rate.
Demand paging provides yet another example of delayed binding: the as-
sociation of a virtual-memory address with a real-memory address is not
made until the memory reference is performed. This delayed binding requires
more overhead (for dynamic address translation, page fetching, etc.).
However, it can provide more convenience for the programmer and more ef-
fective use of real memory. You may want to compare these observations with
those made in the previous examples of delayed binding (Sections 3.4.2, 5.3.3,
and 5.4.2).
In this section we have described an implementation of virtual memory us-
ing demand paging. A different type of virtual memory can be implemented
using a technique called segmentation. In a segmented virtual-memory system,
an address consists of a segment number and an offset within the segment be-
ing addressed. The concepts of mapping and dynamic address translation are
similar to those we have discussed. However, in most systems segments may
be of any length (as opposed to pages, which are usually of a fixed length for
the entire system). Also, segments usually correspond to logical program units
such as procedures or data areas (as opposed to pages, which are arbitrary di-
visions of the address space). This makes it possible to associate protection at-
tributes such as read only or execute only with certain segments. It is also
possible for segments to be shared between different user jobs. Segmentation
is often combined with demand paging. This combination requires a two-level
mapping and address-translation procedure. For further information about
segmentation and its implementation, see Deitel (1990) and Tanenbaum (1992).
problem of job scheduling, which selects user jobs as candidates for the lower-
level process scheduling discussed previously.
Section 6.3.3 discusses the general subject of resource allocation by an op-
erating system and describes some of the problems that may occur. Finally,
Section 6.3.4 provides a brief introduction to the important topics of protection
and operating system security.
[Figure 6.27: Overview of the file manager. A logical request ("read next record from file F") is translated, using the catalog and the file information tables, into a physical request (a channel program).]
When the processing of the file is completed, the buffers and any other work areas
and pointers are deleted. This procedure is called closing the file.
One of the most important functions of the file manager is the automatic
performance of blocking and buffering operations on files being read or written.
Figure 6.28 illustrates these operations on a sequential input file. We assume
the user program starts reading records at the beginning of the file and reads
each record in sequence until the end. The file logically consists of records that
are 1024 bytes long; however, the file is physically written in 8192-byte blocks,
with each block containing 8 logical records. This sort of blocking of records is
commonly done with certain types of storage devices to save processing time
and storage space.
Figure 6.28(a) shows the situation after the file has been opened and the
user program has made its first read-record request. The file manager has is-
sued an I/O request to read the first block of the file into buffer B1. The file
manager must wait for the completion of this I/O operation before it can re-
turn the requested record to the user. In Fig. 6.28(b), the first block has been
read. This block, containing logical records 1 through 8, is present in buffer B1.
The file manager can now return the requested record to the user program. In
this case, the requested record is returned by setting a pointer P to the first logical record.
[Figure 6.28: Blocking and double buffering on a sequential input file. (a) The first block is read into buffer B1. (b) Records 1-8 are in B1 and the pointer P is set to record 1 while the second block is read into B2. (c)-(e) P advances through the records of one buffer while the next block is read into the other buffer.]

The file manager also issues a second physical I/O request to read
the second block of the file into buffer B2.
The next time the user program makes a read-record request, it is not nec-
essary to wait for any physical I/O activity. The file manager simply advances
the pointer P to logical record 2, and returns to the user. This operation is illus-
trated in Fig. 6.28(c). Note that the physical I/O operation that reads the sec-
ond block into buffer B2 is still in progress. The same process continues for the
rest of the logical records in the first block [see Fig. 6.28(d)].
If the user program makes its ninth read-record request before the comple-
tion of the I/O operation for block 2, the file manager must again cause the
program to wait. After the second block has been read, the pointer P is
switched to the first record in buffer B2. The file manager then issues another
I/O request to read the third block of the file into buffer B1, and the process
continues as just described. Note that the use of two buffer areas allows over-
lap of the internal processing of one block with the reading of the next. This
technique, often called double buffering, is widely used for input and output of
sequential files.
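In outline, the file manager's loop looks like the following C sketch (start_read, wait_ready, and deliver_record are assumed stand-ins for the physical I/O request, the ESB wait, and the return of a record to the user; end-of-file handling is omitted):

#define RECS_PER_BLOCK 8
#define RECLEN 1024

struct buffer { char data[RECS_PER_BLOCK * RECLEN]; };

void start_read(struct buffer *b);    /* issue physical I/O request */
void wait_ready(struct buffer *b);    /* WAIT on the buffer's ESB   */
void deliver_record(const char *rec); /* hand one record to user    */

void read_file(void)
{
    struct buffer b[2];
    int cur = 0;

    start_read(&b[0]);
    wait_ready(&b[0]);                /* the only unavoidable wait   */
    start_read(&b[1]);                /* read ahead into other buffer*/
    for (;;) {
        for (int r = 0; r < RECS_PER_BLOCK; r++)      /* advance P  */
            deliver_record(&b[cur].data[r * RECLEN]);
        wait_ready(&b[1 - cur]);      /* may already be complete     */
        start_read(&b[cur]);          /* reuse the drained buffer    */
        cur = 1 - cur;
    }
}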
The user program in the previous example simply makes a series of read-
record requests. It is unaware of the buffering operations and of the details of
the physical I/O requests being performed. Compare this with the program in
Fig. 6.11, which performs a similar buffering function by dealing directly with
the I/O supervisor. Clearly, the use of the file manager makes the user pro-
gram much simpler and easier to write, and therefore less error-prone. It also
avoids the duplication of similar code in a large number of programs.
File-management routines also perform many other functions, such as the
allocation of space on external storage devices and the implementation of
rules governing file access and use. For further discussions of such topics, see
Deitel (1990) and Tanenbaum (1992).
Job scheduling is the task of selecting the next user job to begin execution. In a
single-job system, the job scheduler completely specifies the order of job exe-
cution. In a multiprogramming system, the job scheduler specifies the order in
which jobs enter the set of tasks that are being executed concurrently.
Figure 6.29(a) illustrates a typical two-level scheduling scheme for a multi-
programming system. Jobs submitted to the system become part of an input
queue; a job scheduler selects jobs from this workload. The jobs selected become
active, which means they begin to participate in the process-scheduling opera-
tion described in Section 6.2.2. This two-stage procedure is used to limit the
multiprogramming level, which is the number of user jobs sharing the CPU and
the other system resources. Such a limitation is necessary in a multiprogram-
ming system to maintain efficient operation. If the system attempts to run too
many jobs concurrently, the overhead of resource management becomes too
large, and the amount of resources available to each job becomes too small. As
a result, system performance is degraded.
In the scheme just described, the job scheduler is used as a tool to maintain
a desirable level of multiprogramming. However, this ideal multiprogram-
ming level may vary according to the jobs being executed. Consider, for exam-
ple, a system that uses demand-paged memory management. The number of
user jobs that can share the real memory is essentially unlimited. Each job can
potentially be executed with as little as one page in memory. However, thrash-
ing occurs when a job does not have a certain critical number of pages in
memory, and the performance of the overall system suffers. Unfortunately, the
[Figure 6.29: (a) A two-level scheme: jobs flow from the input queue through the job scheduler to the active set, where the dispatcher assigns the CPU. (b) A three-level scheme that adds a set of suspended jobs.]
Figure 6.29 (a) Two-level scheduling system and (b) three-level
scheduling system.
One common goal of a batch system is to
achieve the lowest average turnaround time, which is the time between the sub-
mission of a job by a user and the completion of that job. A related goal for a
time-sharing system is to minimize expected response time, which is the length
of time between entering a command and beginning to receive a response
from the system.
There are many other possible scheduling goals for a computing system.
For example, we might want to provide a guaranteed level of service by limit-
ing the maximum possible turnaround time or response time. Another alterna-
tive is to be equitable by attempting to provide the same level of service for all.
On the other hand, it may be desirable to give certain jobs priority for external
reasons such as meeting deadlines or providing good service to important or
influential users. On some systems it is even possible for users to get higher
priority by paying higher rates for service, in which case the overall schedul-
ing goal of the system might be to make the most money.
The first two goals mentioned above—high throughput and low average
turnaround time or response time—are commonly accepted as desirable sys-
tem characteristics. Unfortunately, these two goals are often incompatible.
Consider, for example, a time-sharing system with a large number of termi-
nals. We might choose to provide better response time by switching control
more rapidly among the active user terminals. This could be accomplished by
giving each process a shorter time-slice when it is dispatched. However, the
use of shorter time-slices would mean a higher frequency of context switching
operations, and would require the operating system to make more frequent
decisions about the allocation of the CPU and other resources. This means the
system overhead would increase, and the overall throughput would be reduced.
On the other hand, consider a batch processing system that runs one job at
a time. The execution of two jobs on such a system is illustrated in Fig. 6.30(a).
Note the periods of CPU idle time, represented by gaps in the horizontal lines
for Jobs 1 and 2. If both jobs are submitted at time 0, then the turnaround time
for Job 1 (T1) is 2 minutes, and the turnaround time for Job 2 (T2) is 5 minutes.
The average turnaround time, Tavg, is 3.5 minutes.
Now consider a multiprogramming system that runs two jobs concur-
rently, as illustrated in Fig. 6.30(b). Note that the two concurrent jobs share the
CPU, so there is less overall idle time; this is the same phenomenon we stud-
ied in Fig. 6.15. Because there is less idle time, the two jobs are completed in
less total time: 4.5 minutes instead of 5 minutes. This means the system
throughput has been improved: we have done the same amount of work in
less time. However, the average turnaround time has become worse: 4.4 min-
utes instead of 3.5 minutes.
[Figure 6.30: Timelines (in minutes) for Jobs 1 and 2, (a) run one at a time and (b) multiprogrammed.]
In Section 6.2 we discussed how an operating system might control the use of
resources such as central memory, I/O channels, and the CPU. These re-
sources are needed by all user jobs, and their allocation is handled automati-
cally by the system. In this section, we describe a more general
resource-allocation function provided by many operating systems. Such a fa-
cility can be used to control the allocation of user-defined resources such as
files and data structures.
Consider, for example, two processes P1 and P2 that are executed concurrently
and that both update the same stack, as outlined in Fig. 6.31. The stack is
stored in the external variable STACK, and the value of TOP contains the
location of the item currently on top of the stack, relative to the base address
of STACK. Items on the stack are 3 bytes in length. Program P1 adds a new
item to the top of the stack (lines 24-27) by loading the value of TOP,
incrementing it by 3, storing the new item at the new top of the stack, and
then saving the new value of TOP. Program P2 removes the top item from the
stack (lines 37-40) by loading the item from the top of the stack into register A
and then subtracting 3 from the value of TOP. For simplicity, we have not
shown the code needed to handle stack overflow and underflow. The external
references to STACK and TOP are handled by linking methods like those
discussed in Chapter 3.
[Figure 6.31(a): Outlines of programs P1 and P2. Each program begins with START 0, declares EXTREF STACK,TOP, and loads register S with the constant 3 (LDS #3). P1's lines 24-27 add an item to the stack; P2's lines 37-40 remove one.]
If processes P1 and P2 are executed concurrently, they may or may not work
properly. For example, suppose the present value of TOP is 12. If P1 executes
its instructions numbered 24-27, it will add a new item in bytes 15-17 of the stack,
and the new value of TOP will be 15.
1     P1     START   0
2            EXTREF  STACK,TOP
3            LDS     #3        REGISTER S = CONSTANT 3
...

1     P2     START   0
2            EXTREF  STACK,TOP
3            LDS     #3        REGISTER S = CONSTANT 3
...

(b)
If P2 then executes its instructions 37-40, it
will remove the item just added by P1, resetting the value of TOP to 12. This
represents a correct functioning of P1 and P2: the two processes perform their
intended operations on the stack without interfering with each other. Another
correct sequence would occur if P2 executed lines 37-40, and then P1 executed
lines 24-27.
On the other hand, suppose that P1 has just executed line 24 when its cur-
rent time-slice expires. The resulting timer interrupt causes all register values
to be saved; the saved value for register X is 12. Suppose now that the dis-
patcher transfers control to P2, which executes lines 37-40. These instructions
will cause P2 to remove the item from bytes 12-14 of the stack because the
value of TOP has not yet been updated by P1; P2 will then set TOP to 9. When
P1 regains control of the CPU, its register X will still contain the value 12. Thus
lines 25-27 will add the new item in bytes 15-17 of the stack, setting TOP to 15.
The sequence of events just described has resulted in an incorrect opera-
tion of P1 and P2. The item that was removed by P2 is still logically a part of
the stack, and the stack appears to contain one more item than it should.
Several other sequences of execution also yield incorrect results. Similar prob-
lems may occur whenever two concurrent processes attempt to update the
same file or data structure.
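The interference just described is a race on the shared variable TOP. A C sketch of the remedy, foreshadowing the request and release operations discussed next (the pthread mutex is our stand-in for those operations; the 3-byte items and TOP index mirror the example):

#include <pthread.h>

#define ITEM 3
static char stack[300];
static int top = 12;               /* relative location of top item */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* P1's lines 24-27: add an item at TOP+3 and update TOP. */
void push(const char item[ITEM])
{
    pthread_mutex_lock(&lock);     /* request exclusive control     */
    top += ITEM;
    for (int i = 0; i < ITEM; i++)
        stack[top + i] = item[i];
    pthread_mutex_unlock(&lock);   /* release the resource          */
}

/* P2's lines 37-40: remove the top item and update TOP. */
void pop(char item[ITEM])
{
    pthread_mutex_lock(&lock);
    for (int i = 0; i < ITEM; i++)
        item[i] = stack[top + i];
    top -= ITEM;
    pthread_mutex_unlock(&lock);
}

Without the lock, the interleaving described above (P1 losing the CPU between reading and rewriting TOP) can corrupt the stack exactly as in the text.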
state. It can then continue with its updating operation when it next receives
control of the CPU. You should trace through this sequence of events carefully
to see how the problems previously discussed are prevented by this scheme.
Unfortunately, the use of request and release operations can lead to other
types of problems. Consider, for example, the programs in Fig. 6.32. P3 first re-
quests control of resource RES1; later, it requests resource RES2. P4 utilizes the
same two resources; however, it requests RES2 before RES1.
Suppose P3 requests, and receives, control of RES1, and that its time-slice
expires before it can request RES2. P4 may then be dispatched. Suppose P4 re-
quests, and receives, control of RES2. This sequence of events creates a situa-
tion in which neither P3 nor P4 can complete its execution. Eventually, P4 will
reach its line 5 and request control of RES1; it will then be placed into the
blocked state because RES1 is assigned to P3. Similarly, P3 will eventually
reach its line 5 and request control of RES2; P3 will then be blocked because
RES2 is assigned to P4. Neither process can acquire the resource it needs to
continue, so neither process will ever release the resource needed by the other.
This situation is an example of a deadlock: a set of processes each of which
is permanently blocked because of resources held by the others. Once a dead-
lock occurs, the only solution is to release some of the resources currently be-
ing held; this usually means canceling one or more of the jobs involved. There
are a number of methods that can prevent deadlocks from occurring. For ex-
ample, the system could require that a process request all its resources at the
same time, or that it request them in a particular order (such as RES1 before
RES2). Unfortunately, such methods may require that resources be tied up for
longer than is really necessary, which can degrade the overall operation of the
system. Discussions of methods for detecting and preventing deadlocks can be
found in Singhal and Shivaratri (1994) and Tanenbaum (1992).
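The fixed-order method is easy to express in code. A C sketch (pthread mutexes stand in for the resources RES1 and RES2 of Fig. 6.32):

#include <pthread.h>

static pthread_mutex_t res1 = PTHREAD_MUTEX_INITIALIZER; /* RES1 */
static pthread_mutex_t res2 = PTHREAD_MUTEX_INITIALIZER; /* RES2 */

/* Every process acquires RES1 before RES2, so the circular wait
   between P3 and P4 described above cannot arise. */
void use_both(void (*work)(void))
{
    pthread_mutex_lock(&res1);     /* always requested first   */
    pthread_mutex_lock(&res2);     /* always requested second  */
    work();
    pthread_mutex_unlock(&res2);   /* release in reverse order */
    pthread_mutex_unlock(&res1);
}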
The problems we have discussed in this section are examples of the more
general problems of mutual exclusion and process synchronization. Discussions of
these problems, and techniques for their solution, can be found in Singhal and
Shivaratri (1994) and Tanenbaum (1992).
[Figure 6.32: Outlines of programs P3 and P4. P3 requests RES1 (SVC 3) before RES2; P4 requests the same two resources in the opposite order. Each process releases a resource by loading its name (e.g., LDT =R1) and issuing SVC 4; lines 10-12 of each program define R1 BYTE C'RES1', R2 BYTE C'RES2', and END.]

Figure 6.32 Resource requests leading to potential deadlock.

6.3.4 Protection
[Figure 6.33: An access matrix. Each row corresponds to a user, each column to a protected object such as a file (f1, f2, f3) or a program (P1, P2), and each entry records the access allowed that user to that object.]
Some systems store the table in an encoded form. Details about this and other
similar techniques can be found in Pfleeger (1996) and Denning (1982).
A system of user identification and authorization does not always solve
the overall security problem, because information must sometimes leave the
secure environment. Consider, for example, Fig. 6.34(a). The user at the time-
sharing terminal is properly identified and authorized to access a certain file F.
We assume the computer system provides adequate access controls and is con-
tained in a physically secure environment. We also assume the terminal itself
is physically protected. However, the communication link between the com-
puter system and the terminal may be difficult or impossible to protect.
(Consider, for example, the problem of preventing wiretapping when the ter-
minal is connected to the computer via public telephone lines.) This means
that any information from file F that is transmitted to the terminal is vulnerable
to interception.
[Figure 6.34: (a) A terminal communicating with a computer system; the terminal and the computer are each in a secure environment, but the link between them is unprotected. (b) The same configuration with encryption applied to the transmitted data.]
Figure 6.34 Use of encryption to protect data during transmission.
The usual solution to this type of security problem is data encryption [see
Fig. 6.34(b)]. Information to be sent over a nonsecure communication link is
encrypted (encoded) while still in the secure environment of the sender. The
transmitted information is decrypted (decoded) after entering the secure envi-
ronment of the receiver. Wiretapping is still possible; however, the eavesdrop-
per would be unable to interpret the encrypted information being transmitted.
The encryption and decryption operations can be performed efficiently by ei-
ther hardware or software. There are a large number of different encryption
techniques available. For a comprehensive discussion of such methods, see
Pfleeger (1996) and Denning (1982).
Of course, the effectiveness of any protection system depends entirely on
the correctness and protection of the security system itself. In the system we
discussed, the access-control information must be protected against unautho-
rized modifications. There must also be a mechanism to ensure that users
cannot access the protected objects without going through the security sys-
tem. Hardware features such as user/supervisor modes, privileged instruc-
tions, and memory protection mechanisms are often useful in dealing with
issues of this kind. It is also important to be sure that the part of the operating
system that applies the protection rules (the security kernel) performs its task
correctly. Further discussions of these issues can be found in Pfleeger (1996).
A survey of common flaws in operating system security mechanisms is given
in Landwehr et al. (1994).
In this section we briefly describe some important concepts related to the de-
sign and structure of an operating system. Section 6.4.1 introduces the notion
of hierarchical operating system structure, which has been used in the design
of many real systems. Section 6.4.2 describes how an operating system may
provide multiple virtual machines. Such a system gives each user the impres-
sion of running on a dedicated piece of hardware. Sections 6.4.3 and 6.4.4 dis-
cuss operating systems for multiprocessors and distributed systems, and
describe some options for the division of tasks between the processors. Section
6.4.5 gives an introduction to object-oriented operating systems, and discusses
some of the advantages of this approach.
[Figure 6.35(a): A hierarchical operating system structure, with the user interface as the outermost level.]

Level     Functions
3         File management
2         Memory management
1         I/O supervision
0         Dispatching, resource management

(b)
Figure 6.35(b) shows the operating system functions assigned to each level
of our sample structure. The placement of the functions is governed by the re-
lationships between the operations that must be performed. In general, func-
tions at one level are allowed to refer only to functions provided by the same
or lower levels; that is, there should be no outward calls. In our example, the
file-management routines (Level 3) must use the memory manager (Level 2) to
allocate buffers, and the I/O supervisor (Level 1) to read and write data
blocks. If demand-paged memory management is being used, the memory
manager must also call on the I/O supervisor to transfer pages between the
real memory and the backing store. All levels of the system use the process-
scheduling and resource-management functions provided by Level 0.
There are many advantages to such a hierarchical structure. Operating sys-
tem routines at a given level can use the relatively simple functions and inter-
faces provided by lower levels. It is not necessary for the programmer to
understand how these lower-level functions are actually implemented. The
operating system can be implemented and tested one level at a time, begin-
ning with Level 0. This greatly reduces the complexity of each part of the sys-
tem and makes the tasks of implementation and debugging much simpler.
The placement of functions shown in Fig. 6.35(b) is typical; however, there
are many variations between different systems. Consider, for example, the
interrupt-handling routines. Many systems place all first-level interrupt handlers
(FLIH) in the kernel (Level 0). After initial interrupt processing, an FLIH can
transfer control to a routine at some higher level; this is an exception to the no-
outward-calls rule. Thus, for example, the FLIH for a page fault interrupt
might save status information, enable other interrupts, and then transfer to a
routine at Level 2. See Fig. 6.25(b) for an example of such processing.
Hierarchical systems also differ in the rules for passing control from one
level to another. In a strict hierarchy, each level may refer only to the level im-
mediately beneath it. Thus Level 3 could communicate directly only with
Level 2. If it were necessary for the file-management routine in our example to
call the I/O supervisor, this request would have to be passed on from Level 2
to Level 1. This approach has the advantage of simplicity of use: each level has
only one interface with which it must be concerned. However, such a restric-
tion can lead to inefficiency because it increases the number of calls that must
be performed to reach the inner level. In a transparent hierarchy, each level may
communicate directly with the interface of any lower level. Thus, for example,
a user program could invoke file-management routines at Level 3, or it could
call the I/O-supervisor functions of Level 1 directly.
Further discussions of hierarchical operating system structures can be
found in Singhal and Shivaratri (1994) and Tanenbaum (1992).
[Figure 6.36: A virtual machine system. Several operating systems and stand-alone users, including a test user, each run on their own virtual machine provided by a single real machine.]

A new or modified operating system can be tested on one virtual
machine. The other users of the real machine can continue their operation
without being disturbed.
Figure 6.37 illustrates how this illusion can be accomplished. The lowest-
level routines of the operating system deal with the virtual machine monitor
(VMM) instead of with the real machine. The VMM, which is completely
invisible to the operating system and the user program, provides resources,
services, and functions that are the same as those available on the underlying
real machine.
Each direct user of a virtual machine, such as OS1 or User5 in Fig. 6.36, actually runs in user mode, not supervisor mode, on the real machine. When such a user attempts to execute a privileged instruction such as SIO, STI, or LPS, a program interrupt occurs. This interrupt transfers control to the VMM. The VMM simulates (with respect to the virtual machine) the effect of the privileged operation that was being attempted, and then returns control to the user of the virtual machine. Similarly, an interrupt on the real machine also activates the VMM. The VMM determines which virtual machine should be affected by the interrupt and makes the appropriate changes in the status of that virtual machine.
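The heart of this trap-and-emulate cycle can be sketched as follows. The opcodes, structure, and function names here are hypothetical, chosen only to mirror the privileged instructions named above; a real VMM is of course far more elaborate.

    #include <stdio.h>

    /* Hypothetical privileged opcodes (SIC/XE-style names). */
    enum opcode { OP_SIO, OP_STI, OP_LPS };

    struct vm_state { int id; };   /* saved status of one virtual machine */

    /* Reached via the program interrupt that occurs when a guest,
       running in user mode, attempts a privileged instruction. */
    static void vmm_trap(struct vm_state *vm, enum opcode op)
    {
        switch (op) {
        case OP_SIO:   /* simulate starting I/O on a *virtual* channel   */
            printf("VM%d: simulating SIO\n", vm->id);
            break;
        case OP_STI:   /* simulate setting the guest's interval timer    */
            printf("VM%d: simulating STI\n", vm->id);
            break;
        case OP_LPS:   /* simulate loading the guest's processor status  */
            printf("VM%d: simulating LPS\n", vm->id);
            break;
        }
        /* ...restore the guest's saved status and resume it... */
    }

    int main(void)
    {
        struct vm_state os1 = { 1 };
        vmm_trap(&os1, OP_SIO);    /* guest OS1 attempted SIO */
        return 0;
    }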
The VMM is actually a complete, but simple, operating system for the real machine. The other operating systems and stand-alone users of virtual machines are the "users" of the real operating system (the VMM). Thus the VMM must provide all of the essential machine-dependent functions we have discussed. The VMM saves status information for each virtual machine and switches the real CPU between the various virtual machines; this is the same as the process-scheduling function discussed in Section 6.2. It also provides a separate virtual memory and a set of virtual I/O channels for each virtual machine, using techniques similar to those we have already discussed.
The most obvious advantages of the virtual-machine approach are flexibility and convenience. Different operating systems can be run concurrently to serve the needs of different types of users. Operating systems and stand-alone programs can be tested while still making the machine available to ordinary users. The use of separate virtual machines can also provide a higher degree of protection, since each virtual machine has no access to the resources of any other. The disadvantage, of course, is the higher system overhead required to simulate virtual-machine operation. For example, if an operating system running on a virtual machine uses virtual-memory techniques itself, it may be necessary to have two separate levels of dynamic address translation. The efficiency of a virtual-machine operating system depends heavily on how many operations must be simulated by the VMM, and how many can be performed directly on the real machine.
Further discussions of virtual machine operating systems can be found in
Deitel (1990).
In a loosely coupled system, the processors communicate with each other, but each can directly access only its own local memory. In a tightly coupled system, all processors share the same logical address space, and there is a common memory that can be accessed by all processors. These types of multiprocessor organization are sometimes referred to as distributed memory systems and shared memory systems. However, it is possible to have an organization in which the memory is physically distributed but logically shared. (See, for example, the description of the Cray T3E architecture in Section 1.5.3.)
Figure 6.39 illustrates the three basic types of multiprocessor operating system. In a separate supervisor system, each processor has its own operating system. There are some common data structures that are used to synchronize communication between the processors. However, each processor acts largely as an independent system. Separate supervisor systems are relatively simple, and the failure of one processor need not affect the others. However, the independence between processors makes it difficult to perform parallel execution of a single user job.
In a master-slave system [Fig. 6.39(b)], one "master" processor performs all resource management and other operating system functions. This processor completely controls the activities of the "slave" processors, which execute user jobs.
[Figure 6.39 Basic types of multiprocessor operating system: (a) separate supervisor, (b) master-slave, (c) symmetric]
In a symmetric system [Fig. 6.39(c)], the operating system functions are shared among the processors. For example, all processors must have access to all the data structures used by the operating system. However, the processors deal with these data structures independently of one another, and problems can arise if two processors attempt to update the same structure at the same time (see Section 6.3.3). Symmetric multiprocessing systems must provide some mechanism for controlling access to critical operating system tables and data structures. The request and release operations described in Section 6.3.3 are not sufficient to handle this problem because two different processors might perform request operations simultaneously. The solution usually requires a special hardware feature that allows one processor to seize control of a critical resource, locking out all other processors in a single step. Discussions of such mechanisms, and further information about multiprocessor operating systems, can be found in Singhal and Shivaratri (1994).
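The classic form of such a hardware feature is an atomic test-and-set instruction. The following sketch shows a simple spin lock built on this idea, written with the portable C11 <stdatomic.h> operations rather than any particular machine's instruction; the lock and function names are illustrative only.

    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_flag os_table_lock = ATOMIC_FLAG_INIT;

    static void acquire(atomic_flag *lock)
    {
        /* The test and the set happen as one indivisible step, locking
           out all other processors. */
        while (atomic_flag_test_and_set(lock))
            ;                       /* busy-wait until the lock is free */
    }

    static void release(atomic_flag *lock)
    {
        atomic_flag_clear(lock);
    }

    int main(void)
    {
        acquire(&os_table_lock);
        /* ...update a critical operating system table... */
        printf("in critical section\n");
        release(&os_table_lock);
        return 0;
    }

Because the test and the set cannot be separated, two processors that attempt to acquire the lock simultaneously cannot both succeed, which is exactly the guarantee the request and release operations alone could not provide.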
Section 6.5.4 describes an example of a symmetric multiprocessing operating system running on a machine with a tightly coupled architecture.
This section describes how the concepts of object-oriented design and programming can be applied to operating systems. (If you are unfamiliar with object-oriented concepts, you may want to review Section 8.4 before proceeding.)

Figure 6.41 illustrates the general structure of an object-oriented operating system. Most of the system is implemented as a collection of objects. Objects belong to classes that designate some of the properties of the object. For example, there may be one class for file objects, one for processes, etc. Each object encapsulates a data structure and defines a set of operations on that data structure. For a file object, for example, typical operations are reading, writing, appending, and deleting. The operations defined for objects are called methods. When a user program (or a part of the operating system) needs to perform some operation on an object, it does so by invoking one of the methods defined for that object.
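A minimal sketch of this idea in C follows; the structure and method names are hypothetical, not from any particular system. Each object carries a pointer to a table of methods for its class, and invoking a method means calling through that table, which is essentially how several operating systems written in C implement objects.

    #include <stdio.h>

    /* A class is represented by a table of methods. */
    struct file_methods {
        int (*read)(void *obj, char *buf, int len);
        int (*write)(void *obj, const char *buf, int len);
    };

    /* A file object encapsulates its data structure plus its class. */
    struct file_object {
        const struct file_methods *methods;
        long position;                        /* encapsulated state */
    };

    static int plain_read(void *obj, char *buf, int len)
    {
        struct file_object *f = obj;
        f->position += len;                   /* ...read from storage... */
        (void)buf;
        return len;
    }

    static int plain_write(void *obj, const char *buf, int len)
    {
        struct file_object *f = obj;
        f->position += len;                   /* ...write to storage... */
        (void)buf;
        return len;
    }

    static const struct file_methods plain_file_class = {
        plain_read, plain_write
    };

    int main(void)
    {
        struct file_object f = { &plain_file_class, 0 };
        char buf[16];

        f.methods->read(&f, buf, sizeof buf); /* invoke a method */
        printf("position after read: %ld\n", f.position);
        return 0;
    }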
6.5 Implementation Examples

In this section we present brief descriptions of several real operating systems. These systems have been chosen to illustrate some of the variety of design and purpose in such software. As in our previous examples, we do not attempt a complete high-level description of any system. Instead, we focus on some of the more interesting or unusual features provided and give references for readers who want more information.
6.5.1 MS-DOS
Version 1 of MS-DOS was written in 1981 by Microsoft for use with the newly announced IBM personal computer. This first version consisted of about 4000 lines of assembler language code and ran in 8KB of memory. The original PC, which was based on the Intel 8088 chip, could address a maximum of 1 megabyte (1MB) of memory.

As the PC evolved, so did MS-DOS. Version 2, released in 1983, ran on the IBM PC/XT. It supported a hard disk drive, and incorporated many features that were similar to ones found in UNIX. Version 3, released in 1984, was designed for use with the new IBM PC/AT; this computer was based on the Intel 80286 chip. Version 4 (released in 1988), Version 5 (1991), and Version 6 (1993) provided further enhancements. They also included support for the more advanced CPU chips that were available (80386, 80486, and Pentium).

By modern standards, MS-DOS is technically obsolete. For example, it can run only one process at a time and can make only very limited use of memory in excess of 1MB. However, it remains the most widely used operating system in the world, and there are a vast number of applications that run under its control. This software base helps to explain the continued popularity of MS-DOS, and the reluctance of many users to change to more modern systems.
Figure 6.42 shows the overall structure of MS-DOS. The BIOS (Basic Input/Output System) contains a collection of device drivers that correspond to the specific hardware components being used. MS-DOS performs input and output by invoking the BIOS routines; this isolates the rest of the operating system from the details of the hardware. The kernel provides service routines to implement memory management, process management, and the file system. The shell is the interface that interprets user commands and calls other operating system routines to execute them. MS-DOS provides a shell that interprets command lines and an alternative screen-oriented interface. Users can also install their own special-purpose shells.
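At its core, a command-line shell is a read-interpret-execute loop. The following is an illustrative sketch in standard C, not the actual logic of the MS-DOS shell; the "exit" built-in is hypothetical, and system() stands in for the system call that loads and runs a program.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char line[128];

        for (;;) {
            printf("> ");                        /* prompt              */
            if (fgets(line, sizeof line, stdin) == NULL)
                break;                           /* end of input        */
            line[strcspn(line, "\n")] = '\0';

            if (strcmp(line, "exit") == 0)       /* built-in command    */
                break;
            if (line[0] == '\0')
                continue;

            /* Anything else is passed to the operating system to load
               and run; the shell waits until the command finishes. */
            system(line);
        }
        return 0;
    }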
MS-DOS does not support multiprogramming. A process can create a child process via a system call. However, the parent process is automatically suspended until the child process terminates. There may be many processes in memory at a particular time; however, only one of them can be active. All the rest will be blocked, waiting for a child to finish. Compare this with a multiprogramming system, in which the CPU can be switched among a number of active processes.

The lack of multiprogramming makes process management in MS-DOS relatively simple. Because of this restriction, however, MS-DOS cannot effectively support features such as background print spooling.
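The suspend-until-child-terminates behavior is visible in the spawn family of routines supplied by C compilers for MS-DOS (for example, <process.h> in Borland and Microsoft C); details vary between compilers, and CHILD.EXE below is a hypothetical program name.

    #include <stdio.h>
    #include <process.h>

    int main(void)
    {
        /* The parent is suspended here: P_WAIT means "do not return
           until the child process terminates."  Under MS-DOS there is
           no way for parent and child to execute concurrently. */
        int status = spawnl(P_WAIT, "CHILD.EXE", "CHILD.EXE", NULL);

        if (status == -1)
            printf("could not run child\n");
        else
            printf("child terminated with status %d\n", status);
        return 0;
    }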
Figure 6.43 illustrates the memory model used in MS-DOS. There are four different areas of memory, each of which can be used in different ways.

[Figure 6.43 MS-DOS memory model: conventional memory (up to 640KB), upper memory (640KB to 1MB), the high memory area (1MB to 1MB + 64KB), and extended memory above that]
Because it is intended for a single-user personal computer, MS-DOS does not provide the protection found on multi-user systems. Users are free to install their own interrupt-processing routines and device drivers. It is also possible for a program to manipulate system data structures to "trick" MS-DOS.
6.5.2 Windows 95
[Figure 6.44 Overall structure of Windows 95: Win32, Win16, and MS-DOS applications running at Ring 3 in the System Virtual Machine and in separate MS-DOS virtual machines]
6.5.3 SunOS
The SunOS operating system is the foundation of Sun's Solaris operating environment. Solaris consists of three major components: the SunOS operating system, the ONC distributed computing environment, and the OpenWindows development environment. ONC provides support for a distributed file system, a network naming service, and a remote procedure call facility. OpenWindows includes productivity tools and utilities that manage system resources, and provides an environment for developing and running application programs. SunOS is the underlying operating system that supports the other components of Solaris.

The original Solaris was developed for SPARC machines. Later versions, released in 1995 and 1996, can run on UltraSPARC, Pentium Pro, and PowerPC computers. This portability makes it easier to create applications for multiple hardware architectures and for networked systems that contain different types of machines. Further information about Solaris can be found in Becker et al. (1995).

This section focuses on the SunOS operating system, which is based on UNIX System V Release 4 (SVR4). We begin by discussing the development and design of UNIX, and then briefly survey the extensions included in SunOS.
UNIX also provides three other ways for processes to communicate with each other. One set of system calls allows processes to send and receive messages. A second set implements integer-valued semaphores that can be set and tested by different processes. A third group of system calls allows processes to define shared regions of memory.
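These three mechanisms survive in modern UNIX systems as the System V IPC interfaces (msgget, semget, shmget, and their companions). A minimal sketch of the shared-memory group, with error handling omitted and the key value chosen arbitrarily:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* Create (or find) a shared region identified by a key. */
        int id = shmget((key_t)0x5353, 4096, IPC_CREAT | 0600);

        /* Attach the region to this process's address space. */
        char *region = shmat(id, NULL, 0);

        /* Any cooperating process that attaches the same key sees
           the same memory. */
        strcpy(region, "hello from shared memory");
        printf("%s\n", region);

        shmdt(region);                      /* detach            */
        shmctl(id, IPC_RMID, NULL);         /* remove the region */
        return 0;
    }

In practice, such a region would be guarded by one of the semaphores mentioned above, so that two processes do not update it at the same time.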
Each process has a separate virtual address space. This virtual memory is implemented via demand paging. Pages in physical memory are selected for replacement using a modified working-set policy. A page that has not been referenced in a certain length of time is considered no longer to be in the working set of any process; thus it becomes a candidate for replacement. It is possible that the working sets of the processes being run might require more pages than can fit into memory. If this happens, one or more of the processes is temporarily suspended. Pages being used by these processes are removed from memory to make more room available.
UNIX organizes files into a hierarchical file system, with files grouped together under directories. Links can be established to allow a single file to be accessed by different names, or to appear in different directories. A pipe is a special type of file that can be used to connect the output of one process directly to the input of another. Physical devices are treated in the same way as files, and can be accessed using the same system calls.
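For example, a pipe is created and then used with the very same read and write calls that apply to ordinary files. A small sketch using the standard POSIX calls (error handling omitted):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int  fd[2];
        char buf[64];

        pipe(fd);             /* fd[0] is the read end, fd[1] the write end */

        if (fork() == 0) {    /* child: writes into the pipe                */
            const char *msg = "data through a pipe";
            write(fd[1], msg, strlen(msg) + 1);
            _exit(0);
        }

        /* parent: reads the child's output with an ordinary read() */
        read(fd[0], buf, sizeof buf);
        printf("parent received: %s\n", buf);
        return 0;
    }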
The current version of SunOS adds several enhancements to the capabilities of SVR4. Symmetric multiprocessing is supported; multiple threads within the kernel can execute concurrently on different processors. Support is also provided for application-level multithreading. Extensions to the real-time scheduling policy have been implemented, with the goal of providing deterministic scheduling response. A number of security enhancements have been added, including several different user authentication modes. Tools are also provided to assist system administrators in monitoring and improving security.
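Solaris supplied its own threads interface (thr_create and related calls); the same idea of application-level multithreading can be sketched with the widely available POSIX threads interface (compile with the pthread library, e.g. cc prog.c -lpthread):

    #include <stdio.h>
    #include <pthread.h>

    static void *worker(void *arg)
    {
        printf("thread %d running\n", *(int *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[2];
        int       id[2] = { 0, 1 };

        /* On a symmetric multiprocessor, these threads may execute
           concurrently on different processors. */
        for (int i = 0; i < 2; i++)
            pthread_create(&tid[i], NULL, worker, &id[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }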
6.5.4 UNICOS/mk
UNICOS/mk is Cray's operating system for the T3E multiprocessor. You may want to review the description of the T3E in Section 1.5.3 before proceeding. The user and application program interfaces of UNICOS/mk are based on UNIX. UNICOS/mk complies with the latest POSIX standards, as well as with a number of other industry standards. However, the method of implementation is quite different from the normal UNIX structure discussed in the previous section (see Fig. 6.45).
Figure 6.46 shows the overall structure of UNICOS/mk. The system consists of a microkernel and a number of servers. The microkernel includes a minimal set of basic operating system functions; the remaining services of the system are provided by the servers.

[Figure 6.46 Overall structure of UNICOS/mk]
6.5.5 Amoeba*
[Figure: The Amoeba system, including specialized servers (file, database, etc.) and a gateway to a WAN]
Many of the servers in Amoeba are just ordinary user processes. For example, the file system server is a collection of user processes that manage file objects. Users who are not happy with the standard file system are free to write and use their own. This provides a high degree of flexibility.
Further information about Amoeba can be found in Tanenbaum (1992) and
Tanenbaum et al. (1990).
EXERCISES
Section 6.2
27. Why was the I/O interrupt assigned a lower priority than the SVC
interrupt on the SIC/XE machine?
Section 6.3
1. Give an algorithm for a file manager routine that performs the block-
ing and buffering operations illustrated in Fig. 6.28.
2. When might it be advantageous to use more than two buffers for a
sequential file?
3. Draw a state-transition diagram similar to Fig. 6.7 for the three-level
scheduling procedure illustrated in Fig. 6.29(b).
4. Is it possible for a job to have a shorter turnaround time under a
multiprogramming operating system than under a single-job system
for the same machine?
Chapter 7
Other System Software

7.1 Database Management Systems
This section describes the basic purpose and functions of a generalized database management system (DBMS). Our main focus in this section is the user's view of a DBMS. We also discuss how the DBMS functions are related to other types of system software.
In an integrated database there is only one copy of each logical data item, so redundancy is eliminated. The database itself consists of a set of files. Different applications that require the same data item may share the file that contains the needed information.

Although the approach just described solves the problem of data redundancy, it may cause other difficulties. Application programs that deal directly with physical files are data dependent, which means they depend on characteristics such as record format and file organization and sequence.
[Figure 7.2 Application programs sharing an integrated database]
Suppose, for example, that a new advising system requires a file of student information indexed by advisor. The content and organization of some existing files would need to be changed, and, because of data dependence, all other programs using these files would also have to be modified.

This situation is illustrated in Fig. 7.2(b). The advising system uses three database files: one new file and two existing ones. It was necessary to modify the format and structure of one of the existing files used by the new system (shown by shading in the figure). All other programs that use this file must therefore be changed. In this case, the changes involve programs in the existing registration and financial aid systems.
Problems like the one just described can be avoided by making application programs independent of details such as file organization and record format. Figure 7.3(a) shows how this can be accomplished. The user programs do not deal directly with the files of the database. Instead, they access the data by making requests to a database management system (DBMS). Application programs request data at a logical level, without regard for how the data is actually stored in files. For example, a program can request current enrollment information for a particular student. The DBMS determines which physical files are involved, and how these files are to be accessed, by referring to a stored data mapping description. It then reads the required records from the files of the database and converts the information into the form requested by the application program. This process is discussed in more detail in Sections 7.1.2 and 7.1.3.
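In code, such a logical-level request might look like the following sketch. The dbms_get function, the record layout, and the key value are all hypothetical, standing in for whatever call interface a particular DBMS provides; the stub body here only fakes a result.

    #include <stdio.h>

    /* Hypothetical logical record delivered by the DBMS. */
    struct enrollment {
        char course[8];
        int  section;
    };

    /* Hypothetical DBMS interface: the program names a logical record
       type and a key; it says nothing about files or storage.  A real
       DBMS would consult its data mapping description and read the
       database files; this stub just fabricates an answer. */
    static int dbms_get(const char *record_type, const char *key,
                        struct enrollment *e)
    {
        (void)record_type; (void)key;
        snprintf(e->course, sizeof e->course, "CS480");
        e->section = 1;
        return 0;
    }

    int main(void)
    {
        struct enrollment e;

        if (dbms_get("ENROLLMENT", "881234", &e) == 0)
            printf("enrolled in %s, section %d\n", e.course, e.section);
        return 0;
    }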
The data independence provided by this approach means that file structures can be changed without affecting the application programs. Consider, for example, Fig. 7.3(b). As before, a new advising system has been added. A new file has been added to the database, and one existing file has been modified (indicated by the shading in the figure). The data mapping description has been modified to reflect these changes, but the application programs themselves remain unchanged. The same logical request from an application program may now result in a different set of operations on the files of the database. The application programs, however, are unaware of this difference because they are concerned only with logical items of information, not with how this information is stored in files.

The data independence provided by a DBMS is also important for other reasons. The techniques used for physical storage of the database can be changed without requiring changes to the application programs.
[Figure 7.3 Application programs accessing data through a DBMS: requests for logical records are translated using the data mapping description]
[Figure 7.4 A schema for the university database: record types DEGREE-OBJECTIVE, COURSE, STUDENT, FACULTY, FINANCIAL-AID, and PAYROLL, connected by the relationships requirement, major-in, enrolled-in, instructor, advisor, aid-recipient, student-employee, and faculty-employee]
Subschema (c) might be used by an application program that processes the payroll for the university. Each logical record in this subschema contains the information needed to issue a paycheck to an employee.

These three subschemas provide quite different views of the database. A subschema must be consistent with the schema—that is, it must be possible to derive the information in the subschema from the schema. Subschema (a) is simply a subset of the schema: the record names, data items, and relationships are the same as those contained in the corresponding part of the schema. This need not be true, however, for all subschemas.

In subschema (b), the application program uses record names that are different from those contained in the schema. It is also possible to use different names for individual data items. The FACULTY record in subschema (b) contains some, but not all, of the information from the FACULTY record of the schema. The COURSE-TAUGHT subschema record contains the same information that is present in the COURSE schema record. The STUDENT-ENROLLED record in the subschema contains information from the STUDENT record in the schema. However, the STUDENT-ENROLLED record also contains information about the student's major, which is contained in the DEGREE-OBJECTIVE record that is logically connected to the STUDENT record by the major-in relationship.

Subschema (c) consists of a single logical record type PAY-RECORD, whose data may come from three different schema records. Information concerning rate of pay and deductions is taken from the PAYROLL schema record.
[Figure 7.5 Three subschemas derived from the schema of Fig. 7.4: (a) a subset of the schema; (b) renamed records COURSE-TAUGHT and STUDENT-ENROLLED; (c) a single PAY-RECORD record with NAME, ADDRESS, S/F, PAY-RATE, and DEDUCTIONS]
Information such as employee name and address is obtained from either the STUDENT record or the FACULTY record, whichever is appropriate, by using the student-employee and faculty-employee relationships. The data field in PAY-RECORD that is designated S/F indicates whether the record pertains to a student employee or to a faculty member.
A subschema provides an application program with a view of the database that is suited to the needs of the particular program. The DBMS takes care of converting information from the database into the form specified by the subschema (see Section 7.1.3). As a result, the application program is simpler and easier to write because the programmer does not have to be concerned with data items and relationships that are not relevant to the application. The subschema is also an aid in providing data security because a program has no way of referring to data items not described in its subschema.

We have now discussed three different levels of data description in database management systems: the subschemas, the schema, and the data mapping description. A DBMS supplies languages, called data description languages, for defining the database at each of these levels. The subschemas are used by application programmers and are written in a subschema description language designed to be convenient for the programmer. Often subschema description languages are extensions of the data description capabilities in the programming language to be used. However, the subschemas are created and maintained by the database administrator. In defining a subschema, the database administrator must be sure that the view of data given in the subschema is derivable from the schema, and that it contains only those data items the application program is authorized to use.

The schema itself, and the physical data mapping description, are normally used only by the database administrator. On many systems, the schema description language is closely related to the subschema description language. It is also possible to use a more generalized language, because the schema is not used directly by application programmers. The physical data description language is influenced by the types of logical structures supported by the schema, and also by the types of files and storage devices supported by the DBMS.

Further discussions and examples of data description languages can be found in Date (1990).
The two principal methods for user interaction with a DBMS are illustrated in Fig. 7.6. The user can write a source program in the normal way, using a general-purpose programming language. However, instead of writing I/O statements of the form provided by the programming language, the programmer writes commands in a data manipulation language (DML) defined for use with the DBMS. These commands are often designed so that the DML appears to be merely an extension of the programming language. As shown in Fig. 7.6(a), a preprocessor may be used to convert the DML commands into programming language statements that call DBMS routines. The modified source program is then compiled in the usual way. Another approach is to modify the compiler to handle the DML statements directly. Some DMLs are defined as a set of CALL statements using the programming language itself, which avoids the need for preprocessing or compiler modification.
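The relationship between an embedded DML command and the call-statement form a preprocessor might produce can be sketched as follows. Both the DML syntax shown in the comment and the dbms_* routines are hypothetical, invented for illustration; the stubs merely print what a real DBMS would do.

    #include <stdio.h>

    /* Hypothetical DBMS call interface that a preprocessor might target. */
    static void dbms_select(const char *record) { printf("select %s\n", record); }
    static void dbms_where(const char *field, int value)
                                                { printf("where %s = %d\n", field, value); }
    static void dbms_fetch(void *dest)          { (void)dest; printf("fetch\n"); }

    int main(void)
    {
        int  id = 12345;
        char record[64];

        /* The programmer originally wrote a DML command such as
               ##GET ENROLLMENT WHERE STUDENT-ID = :id INTO :record
           (hypothetical syntax).  The preprocessor replaces it with
           ordinary statements that call DBMS routines: */
        dbms_select("ENROLLMENT");
        dbms_where("STUDENT-ID", id);
        dbms_fetch(record);
        return 0;
    }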
[Figure 7.6 User interaction with a DBMS: (a) a source program with DML commands is converted by a preprocessor into a source program with call statements, which is then compiled into an object program; (b) commands in a query language are processed by a query-language interpreter]
The other approach to DBMS interaction, illustrated in Fig. 7.6(b), does not require the user to write programs to access the database. Instead, users enter commands in a special query language defined by the DBMS. These commands are processed by a query-language interpreter, which calls DBMS routines to perform the requested operations.
Each of these approaches to user interaction has its own advantages. With a query language, it is possible to obtain results much more quickly because there is no need to write and debug programs. Query languages can also be used effectively by nonprogrammers, or by individuals who program only occasionally. Most query languages, however, have built-in limitations. For example, it may be difficult or impossible to perform a function for which the language was not designed. On the other hand, a DML allows the programmer to use all the flexibility and power of a general-purpose programming language; however, this approach requires much more effort from the user. Most modern database management systems provide both a query language and a DML so that a user can choose the form of interaction that best meets his or her needs.
[Figure 7.7 Processing of a request for data, involving the application program, its subschema, the DBMS, the data mapping description, the operating system, and the database]
The DBMS converts the program's request for a subschema record into physical requests to read data from one or more files. These requests for file I/O are passed to the operating system (step 5) using the types of service calls discussed in Chapter 6. The operating system then issues channel and device commands to perform the necessary physical I/O operations (step 6). These I/O operations read the required records from the database into a DBMS buffer area.

After the physical I/O operations have been completed, all the data requested by the application program is present in central memory. However, this information must still be converted into the form expected by the program. The DBMS accomplishes this conversion (step 7) by again comparing the schema and the subschema. In the example we are discussing, the DBMS would extract data from the PAYROLL record and the associated STUDENT or FACULTY record, constructing the PAY-RECORD that satisfies the program's request for data. Finally, the DBMS returns control to the application program and makes available to the program a variety of status information, including any possible error indications.

Further details concerning the topics discussed in this section can be found in Date (1990).
7.2 Text Editors

The interactive text editor has become an important part of almost any computing environment. No longer are editors thought of as tools only for programmers or for secretaries transcribing from marked-up copy generated by authors. It is now increasingly recognized that a text editor should be considered the primary interface to the computer for all types of "knowledge workers" as they compose, organize, study, and manipulate computer-based information.

In this section we briefly discuss interactive text-editing systems from the points of view of both the user and the system. Section 7.2.1 gives a general overview of the editing process. Section 7.2.2 expands upon this introduction by discussing various types of user interfaces and I/O devices. Section 7.2.3 describes the structure of a typical text editor and discusses a number of system-related issues.
7.2.1 Overview of the Editing Process

An interactive editor is a computer program that allows a user to create and revise a target document. The term document includes objects such as computer programs, text, equations, tables, diagrams, line art, and photographs—anything that one might find on a printed page. In this discussion, we restrict our attention to text editors, in which the primary elements being edited are character strings of the target text.

The document-editing process is an interactive user-computer dialogue designed to accomplish four tasks:

1. Select the part of the target document to be viewed and manipulated.
2. Determine how to format this view on-line and how to display it.
3. Specify and execute operations that modify the target document.
4. Update the view appropriately.

*Adapted from Norman Meyrowitz and Andries van Dam, "Interactive Editing Systems: Part I and Part II," ACM Computing Surveys, September 1982. Copyright 1982, Association for Computing Machinery, Inc. These publications also contain much more detailed discussions of the editing process, descriptions of a large number of actual editors, and a comprehensive bibliography.
Selection of the part of the document to be viewed and edited involves first traveling through the document to locate the area of interest. This search is accomplished with operations such as next screenful, bottom, and find pattern. Traveling specifies where the area of interest is; the selection of what is to be viewed and manipulated there is controlled by filtering. Filtering extracts the relevant subset of the target document at the point of interest, such as the next screenful of text or the next statement. Formatting then determines how the result of the filtering will be seen as a visible representation (the view) on a display screen or other device.
In the actual editing phase, the target document is created or altered with a set of operations such as insert, delete, replace, move, and copy. The editing functions are often specialized to operate on elements meaningful to the type of editor. For example, a manuscript-oriented editor might operate on elements such as single characters, words, lines, sentences, and paragraphs; a program-oriented editor might operate on elements such as identifiers, keywords, and statements.

In a simple scenario, then, the user might travel to the end of the document. A screenful of text would be filtered, this segment would be formatted, and the view would be displayed on an output device. The user could then, for example, delete the first three words of this view.
7.2.2 User Interface

Besides the conceptual model, the user interface is concerned with the input devices, the output devices, and the interaction language of the system. Brief discussions and examples of these aspects of the user interface are presented in the remainder of this section.
Input devices are used to enter elements of the text being edited, to enter commands, and to designate editable elements. These devices, as used with editors, can be divided into three categories: text devices, button devices, and locator devices. Text or string devices are typically typewriter-like keyboards on which a user presses and releases keys, sending a unique code for each key. Virtually all current computer keyboards are of the QWERTY variety (named for the first six letters in the second row of the keyboard). Several alternative keyboard arrangements have been proposed, some of which offer significant advantages over the standard keyboard layout. None of these alternatives, however, seems likely to be widely accepted in the near future because of the retraining effort that would be required.
Button or choice devices generate an interrupt or set a system flag, usually causing invocation of an associated application-program action. Such devices typically include a set of special function keys on an alphanumeric keyboard or on the display itself. Alternatively, buttons can be simulated in software by displaying text strings or symbols on the screen. The user chooses a string or symbol instead of pressing a button.
Locator devices are two-dimensional analog-to-digital converters that position a cursor symbol on the screen by observing the user's movement of the device. The most common such devices for editing applications are the mouse and the data tablet. The data tablet is a flat, rectangular, electromagnetically sensitive panel. Either a ballpoint-pen-like stylus or a puck, a small device similar to a mouse, is moved over the surface. The tablet returns to a system program the coordinates of the position on the data tablet at which the stylus or puck is currently located. The program can then map these data-tablet coordinates to screen coordinates and move the cursor to the corresponding screen position. Locator devices usually incorporate one or more buttons that can be used to specify editing operations.

Text devices with arrow (cursor) keys can be used to simulate locator devices. Each of these keys shows an arrow that points up, down, left, or right. Pressing an arrow key typically generates an appropriate character sequence; the program interprets this sequence and moves the cursor in the direction of the arrow on the key pressed.
Voice-input devices, which translate spoken words to their textual equivalents, may prove to be the text input devices of the future. Voice recognizers are currently available for command input on some systems.
Formerly limited in range, output devices for editing are becoming more diverse. The output device lets the user view the elements being edited and the results of the editing operations. The first output devices were teletypewriters and other character-printing terminals that generated output on paper. Next, "glass teletypes" based on cathode ray tube (CRT) technology used the CRT screen essentially to simulate a hard-copy teletypewriter (although a few operations, such as backspacing, were performed more elegantly). Today's advanced CRT terminals use hardware assistance for such features as moving the cursor, inserting and deleting characters and lines, and scrolling lines and pages. The more modern professional workstations, sometimes based on personal computers with high-resolution displays, support multiple proportionally spaced character fonts to produce realistic facsimiles of hard-copy documents. Thus the user can see the document portrayed essentially as it will look when printed on paper.
The interaction language of a text editor is generally one of several common types. The typing-oriented or text command-oriented method is the oldest of the major editor interfaces. The user communicates with the editor by typing text strings both for command names and for operands. These strings are sent to the editor and are usually echoed to the output device.

Typed specification often requires the user to remember the exact form of all commands, or at least their abbreviations. If the command language is complex, the user must continually refer to a manual or an on-line help function for a description of less frequently used commands. In addition, the typing required can be time consuming, especially for inexperienced users. The function-key interface addresses these deficiencies. Here each command has associated with it a marked key on the user's keyboard. For example, the insert character command might have associated with it a key marked IC. Function-key command specification is typically coupled with cursor-key movement for specifying operands, which eliminates much typing.
For the common commands in a function-key editor, usually only a single key is pressed. For less frequently invoked commands or options, an alternative textual syntax may be used. More commonly, however, special keys are used to shift the standard function-key interpretations, just as the SHIFT key on a typewriter shifts from lowercase to uppercase. As an alternative to shifting function keys, the standard alphanumeric keyboard is often overloaded to simulate function keys. For example, the user may press a control key simultaneously with a normal alphanumeric key to generate a new character that is interpreted like a function key.

Typing-oriented systems require familiarity with the system and language, as well as some expertise in typing. Function key-oriented systems often have either too few keys, requiring multiple-keystroke commands, or have too many unique keys, which results in an unwieldy keyboard. In either case, the function-key systems demand even more agility of the user than a standard keyboard does. The menu-oriented user interface is an attempt to address these problems.
7.2.3 Editor Structure

Most text editors have a structure similar to that shown in Fig. 7.8, regardless of the particular features they offer and the computers on which they are implemented. The command language processor accepts input from the user's input devices, and analyzes the tokens and syntactic structure of the commands. In this sense, the command language processor functions much like the lexical and syntactic phases of a compiler. Just as in a compiler, the command language processor may invoke semantic routines directly. In a text editor, these semantic routines perform functions such as editing and viewing.

Alternatively, the command language processor may produce an intermediate representation of the desired editing operations. This intermediate representation is then decoded by an interpreter that invokes the appropriate semantic routines. The use of an intermediate representation allows the editor to provide a variety of user-interaction languages with a single set of semantic routines that are driven from a common intermediate representation.
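This organization can be sketched in a few lines of C. The operation codes, record layout, and semantic routines below are hypothetical; the point is only that any number of interaction languages could emit these records, and one interpreter dispatches them all.

    #include <stdio.h>

    /* Hypothetical intermediate representation of editing operations. */
    enum op { OP_TRAVEL, OP_INSERT, OP_DELETE };

    struct ir_record {
        enum op     op;
        int         count;       /* operand: how many lines or characters */
        const char *text;        /* operand for insertions                */
    };

    /* Semantic routines (stubs). */
    static void travel(int n)         { printf("travel to line %d\n", n); }
    static void insert(const char *s) { printf("insert \"%s\"\n", s); }
    static void del(int n)            { printf("delete %d\n", n); }

    /* The interpreter decodes IR records and invokes semantic routines. */
    static void interpret(const struct ir_record *r, int n)
    {
        for (int i = 0; i < n; i++)
            switch (r[i].op) {
            case OP_TRAVEL: travel(r[i].count); break;
            case OP_INSERT: insert(r[i].text);  break;
            case OP_DELETE: del(r[i].count);    break;
            }
    }

    int main(void)
    {
        struct ir_record prog[] = {
            { OP_TRAVEL, 75, NULL },
            { OP_DELETE,  3, NULL },
            { OP_INSERT,  0, "swan" },
        };
        interpret(prog, 3);
        return 0;
    }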
The semantic routines involve traveling, editing, viewing, and display functions. Editing operations are always specified explicitly by the user, and display operations are specified implicitly by the other three categories of operations.
[Figure 7.8 Typical editor structure: command language processor, semantic routines, editing and viewing filters and buffers, display component, and paging routines]
In editing a document, the start of the area to be edited is determined by the current editing pointer, which is maintained by the editing component. When the user issues an editing command, the editing component invokes the editing filter, which generates a new editing buffer based on the current editing pointer as well as on the editing filter parameters. These parameters, which are specified both by the user and the system, provide information such as the range of text that can be affected by an operation. Filtering may simply consist of the selection of contiguous characters beginning at the current point. Alternatively, filtering may depend on more complex user specifications pertaining to the content and structure of the document. Such filtering might result in the gathering of portions of the document that are not necessarily contiguous. The semantic routines of the editing component then operate on the editing buffer, which is essentially a filtered subset of the document data structure. Note that this explanation is at a conceptual level—in a given editor, filtering and editing may be interleaved, with no explicit editing buffer being created.
Similarly, in viewing a document, the start of the area to be viewed is determined by the current viewing pointer. This pointer is maintained by the viewing component of the editor, which is a collection of modules responsible for determining the next view. The current viewing pointer can be set or reset explicitly by the user with a traveling command or implicitly by the system as a result of the previous editing operation. When the display needs to be updated, the viewing component invokes the viewing filter. This component filters the document to generate a new viewing buffer based on the current viewing pointer as well as on the viewing filter parameters. These parameters, which are specified both by the user and by the system, provide information such as the number of characters needed to fill the display, and how to select them from the document. In line editors, the viewing buffer may contain the current line; in screen editors, this buffer may contain a rectangular cutout of the quarter-plane of text. This viewing buffer is then passed to the display component of the editor, which produces a display by mapping the buffer to a rectangular subset of the screen, usually called a window.
The editing and viewing buffers, while independent, can be related in many ways. In the simplest case, they are identical: the user edits the material directly on the screen (see Fig. 7.9). On the other hand, the editing and viewing buffers may be completely disjoint. For example, the user of a certain editor might travel to line 75, and after viewing it, decide to change all occurrences of "ugly duckling" to "swan" in lines 1 through 50 of the file by using a change command such as

    1,50s/ugly duckling/swan/

As part of this editing command, there is implicit travel to the first line of the file. Lines 1 through 50 are then filtered from the document to become the editing buffer. Successive substitutions take place in this editing buffer without corresponding updates of the view. If the pattern is found, the current editing
and viewing pointers are moved to the last line on which it is found, and that
line becomes the default contents of both the editing and viewing buffers. If
the pattern is not found, line 75 remains in the editing and viewing buffers.
The editing and viewing buffers can also partially overlap, or one may be
completely contained in the other. For example, the user might specify a
search to the end of the document, starting at a character position in the mid-
dle of the screen. In this case the editing filter creates an editing buffer that
contains the document from the selected character to the end of the document.
The viewing buffer contains the part of the document that is visible on the screen, only the last part of which is in the editing buffer.
Windows typically cover either the entire screen or a rectangular portion of
it. Mapping viewing buffers to windows that cover only part of the screen is
especially useful for editors on modern graphics-based workstations. Such
systems can support multiple windows, simultaneously showing different
portions of the same file or portions of different files. This approach allows the
user to perform inter-file editing operations much more effectively than with a
system having only a single window.
The mapping of the viewing buffer to a window is accomplished by two components of the system. First, the viewing component formulates an ideal view, often expressed in a device-independent intermediate representation. This view may be a very simple one consisting of a window's worth of text arranged so that lines are not broken in the middle of words. At the other extreme, the idealized view may be a facsimile of a page of fully formatted and typeset text with equations, tables, and figures. Second, the display component
takes this idealized view from the viewing component and maps it to a physi-
cal output device in the most efficient manner possible.
Updating of a full-screen display connected over low-speed lines is slow if every modification requires a full rewrite of the display surface. Greatly improved performance can be obtained by using optimal screen-updating algorithms. These algorithms are based on comparing the current version of the screen with the following screen. They make use of the innate capabilities of the terminal, such as insert-character and delete-character functions, transmitting only those characters needed to generate a correct display.
Device-independent output, like device-independent input, helps provide portability of the interaction language. Decoupling editing and viewing operations from display functions for output avoids the need to have a different version of the editor for each output device. Many editors make use of a terminal-control database. Instead of having explicit terminal-control sequences in the display routines, these editors simply call terminal-independent library routines such as scroll down or read cursor position. These library routines use the terminal-control database to look up the appropriate control sequences for a particular terminal. Consequently, adding a new terminal merely entails adding a database description of that terminal.
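This is the idea behind the UNIX termcap and terminfo databases. A tiny sketch of the lookup scheme follows; the table here is a stand-in for a real database, though the ANSI control sequences shown are genuine.

    #include <stdio.h>
    #include <string.h>

    /* A miniature terminal-control database: each entry maps a terminal
       name to the control sequences it understands. */
    struct term_entry {
        const char *name;
        const char *clear_screen;
        const char *cursor_home;
    };

    static const struct term_entry term_db[] = {
        { "ansi", "\033[2J", "\033[H" },
        { "dumb", "",        ""       },      /* no capabilities */
    };

    /* Terminal-independent library routine: display code calls this
       instead of embedding control sequences directly. */
    static void clear_screen(const char *term)
    {
        for (size_t i = 0; i < sizeof term_db / sizeof term_db[0]; i++)
            if (strcmp(term_db[i].name, term) == 0) {
                fputs(term_db[i].clear_screen, stdout);
                return;
            }
    }

    int main(void)
    {
        clear_screen("ansi");   /* adding a new terminal = adding a row */
        return 0;
    }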
The components of the editor deal with a user document on two levels: in main memory and in the disk file system. Loading an entire document into main memory may be infeasible. However, if only part of a document is loaded, and if many user-specified operations require a disk read by the editor to locate the affected portion, editing might be unacceptably slow. In some systems, this problem is solved by mapping the entire file into virtual memory and letting the operating system perform efficient demand paging. An alternative is to provide editor paging routines, which read one or more logical portions of a document into memory as needed. Such portions are often termed pages, although there is usually no relationship between these pages and hardcopy document pages or virtual-memory pages. These pages remain resident in main memory until a user operation requires that another portion of the document be loaded.
In a time-sharing environment, the editor must contend with other processes for shared system resources such as the CPU, main memory, and I/O devices. The editor on a stand-alone system must have access to the functions that the time-sharing editor obtains from its host operating system. These may be provided in part by a small local operating system, or they may be built into the editor itself if the stand-alone system is dedicated to editing. The editor operating in a distributed resource-sharing local network must, like a stand-alone editor, run independently on each user's machine and must, like a time-sharing editor, contend for shared resources such as files.
With an intelligent workstation, part or all of a document can be downloaded and edited locally; after the document has been edited in this way, its updated contents would be transmitted back to the host processor.
The advantage of this scheme is that the host need not be concerned with each minor change or keystroke; however, this is also the major disadvantage. With a nonintelligent terminal, the CPU sees every character as it is typed and can react immediately to perform error checking, to prompt, to update the data structure, etc. With an intelligent workstation, the lack of constant CPU intervention often means that the functionality provided to the user is more limited. Also, local editing operations on the workstation may be lost in the event of a system crash. On the other hand, systems that allow each character to interrupt the CPU may not use the full hardware editing capabilities of the workstation because the CPU needs to see every keystroke and provide character-by-character feedback.
*Adapted from Rich Seidner and Nick Tindall, "Interactive Debug Requirements," Software Engineering Notes and SIGPLAN Notices, August 1983. Copyright 1983, Association for Computing Machinery, Inc.
Tracing can be performed at several levels of detail: procedure, branch, individual instruction, and so on. The tracing can also be based on conditional expressions as previously mentioned. Traceback can show the path by which the current statement was reached. It can also show which statements have modified a given variable or parameter. This kind of information should be displayed symbolically, and should be related to the source program—for example, as statement numbers rather than as hexadecimal displacements.
It is also important for a debugging system to have good program-display
capabilities. It must be possible to display the program being debugged, com-
plete with statement numbers. The user should be able to control the level at
which this display occurs. For example, the program may be displayed as it
was originally written, after macro expansion, and so on. It is also useful to be able to modify and incrementally recompile the program during the debugging session. The system should save all the debugging specifications (breakpoint definitions, display modes, etc.) across such a recompilation, so the programmer does not need to reissue all of these debugging commands. It should be possible to symbolically display or modify the contents of any of the variables and constants in the program, and then resume execution. The intent of these capabilities is to make the debugging session as convenient and productive as possible.
All these types of optimization create problems for the debugger. The user
of a debugging system deals with the source program in its original form, be-
fore optimizations are performed. However, code rearrangement alters the ex-
ecution sequence and may affect tracing, breakpoints, and even statement
counts if entire statements are involved. If an error occurs, it may be difficult
to relate the error to the appropriate location in the original source program.
A different type of problem occurs with respect to the storage of variables. When a program is translated, the compiler normally assigns a home location in main memory (or in an activation record) to each variable. However, as we discussed in Section 5.2.2, variable values may be temporarily held in registers at various times to improve speed of access. Statements referring to these
variables use the value stored in the register, instead of taking the variable
value from its home location. These optimizations present no problem for dis-
playing the values of such variables. However, if the user changes the value of
the variable in its home location while debugging, the modified value might
not be used by the program as intended when execution is resumed. In a simi-
lar type of global optimization, a variable may be permanently assigned to a
register. In this case, there may be no home location at all.
The debugging of optimized code requires a substantial amount of cooperation from the optimizing compiler. In particular, the compiler must retain information about any transformations that it performs on the program. Such information can be made available both to the debugger and to the programmer. Where reasonable, the debugger should use this information to modify the debugging request made by the user, and thereby perform the intended operation. For example, it may be possible to simulate the effect of a breakpoint that was set on an eliminated statement. Similarly, a modified variable value can be stored in the appropriate register as well as at the home location for that variable. However, some more complex optimizations cannot be handled as easily. In such cases, the debugger should merely inform the user that a particular function is unavailable at this level of optimization, instead of attempting some incomplete imitation of the function.
It must not be possible for the debugger to interfere with any aspect of system integrity. Use of the debugger must be subject to the normal authorization mechanisms and must leave the usual audit trails. One benefit of the debugger, at least by comparison with a storage dump, is that it controls the information that is presented. Whereas a dump may include information that happens to have been left in storage, the debugger presents information only for the contents of specific named objects.
The debugger must coordinate its activities with those of existing and future language compilers and interpreters, as described in the preceding section. It is assumed that debugging facilities in existing languages will continue to exist and be maintained. The requirements for a cross-language debugger assume that such a facility would be installed as an alternative to the individual language debuggers.
The user interaction should make use of full-screen displays and window-
ing systems as much as possible. The primary advantage offered by such an
interface is that a great deal of information can be displayed and changed eas-
ily and quickly. With menus and full-screen editors, the user has far less infor-
mation to enter and remember. This can greatly contribute to the perceived
friendliness of an interactive debugging system.
If the tasks a user needs to perform are reflected in the organization of
menus, then the system will feel very natural to use. Menus should have titles
that identify the task they help perform. Directions should precede any
choices available to the user. Techniques such as indentation should be used to
help separate portions of the menu. Often the most frustrating aspect of menu systems is their lack of direct routing. It should be possible to go directly to the
menu that the user wants to select without having to retrace an entire hierar-
chy of menus.
The use of full-screen displays and techniques such as menus is highly
desirable. However, a debugging system should also support interactive
users when a full-screen terminal device is not present. Every action a user
can take at a full-screen terminal should have an equivalent action in a linear command language. The on-line HELP facility should be multi-level and quite specific. One powerful use of HELP with menus is to provide explanatory text for all options present on the screen. These can be selectable by option number or name, or by filling the choice slot with a question mark. HELP should be accessible from any state of the debugging session. The more difficult the situation, the more likely it is that the user will need such information.
Chapter 8
Software Engineering
Issues
The development of software engineering tools and methods began in the late 1960s, largely in response to what many authors have called "the software crisis." This crisis arose from the rapid increase in the size and complexity of computer applications. Systems became much too large and complicated to be programmed by one or two people; instead, large project teams were required. In some extremely large systems, it was difficult for any one individual even to have a full intellectual grasp of the entire project. The problems in managing such a large development project led to increases in development costs and decreases in productivity. Large software systems seemed always to be delivered late, to cost more than anticipated, and to have hidden flaws. For an excellent discussion of such problems, see Brooks (1995).
We can see evidence of the continuing problems today. The purchaser of a new automobile, television set, or personal computer usually expects that the product will correctly perform its intended function. On the other hand, the first releases of a new operating system, compiler, or other software product almost always contain major "bugs" and do not work properly for some users and in some situations. The software then goes through a series of different versions or "fixes" designed to resolve these problems. Even in later releases, however, it is usual to find new flaws from time to time.
The discipline now known as software engineering evolved gradually in response to the problems of cost, productivity, and reliability created by increasingly large and complex software systems. Software engineering has been defined in many different ways.
Many useful tools and techniques have been developed to help with these problems of reliability and cost. For discussions of some of these methods, see Sommerville (1996), Ng and Yeh (1990), and Lamb (1988). In spite of the advances, however, the problems are far from completely solved. Brooks (1995) gives an interesting discussion of the present state of software engineering and his view of the prospects for the future.
This section briefly discusses the various steps in the software development process. Figure 8.1 shows the oldest and best-known model for this process—the so-called waterfall software life-cycle model. As we shall see, this model is an oversimplification of the actual software development cycle. Nevertheless, it serves as a useful starting point for our discussions.

In the waterfall model, the software development effort is pictured as flowing through a sequence of different stages, much like a waterfall moving from one level to the next. In the first stage, requirements analysis, the task is to determine what the software system must do. The focus of this stage is on the
[Figure 8.1 The waterfall software life-cycle model: requirements analysis, system specification, system design, implementation, system testing, and maintenance]
needs of the users of the system, not on the software solution. That is, the requirements specify what the system must do, not how it will be done. In some cases, it is necessary to do much analysis and consultation with the prospective users—there are often hidden assumptions or constraints that must be made precise before the system can be constructed. The result of the requirements analysis stage is a requirements document that states clearly the intent, properties, and constraints of the desired system in terms that can be understood by both the end user and the system developer.
The goal of the second stage, system specification, is to formulate a precise description of the desired system in software development terms. The information contained in the system specification is similar to that contained in the requirements document. The focus is still on what functions the system must perform, rather than on how these functions are to be accomplished. (Many authors, in fact, consider requirements analysis and system specification to be different aspects of the same stage in the development process.) However, the approach is somewhat different. The requirements analysis step looks at the system from the point of view of the end user; the system specifications are written from the point of view of system developers and programmers, using software development terminology. Thus, the system specifications can be considered as a computer-oriented interpretation of the requirements document. We will consider examples of such specifications in Section 8.2.
The first two stages of the software development process are primarily concerned with understanding and describing the problem to be solved. The third stage, system design, begins to address the solution itself. The system design document outlines the most significant characteristics of the software to be developed. If a procedural approach is being taken, the system design may describe the most important data structures and algorithms to be used and specify how the software is to be divided into modules. If an object-oriented approach is taken, the system design may describe the objects defined in the system and the methods that can be invoked on each object. Smaller and less significant details are omitted—the goal is a high-level description of how the software will be constructed and how the various parts will work together to perform the desired function.

The system designer should attempt to make decisions that will minimize the problems of cost, complexity, and reliability that we have mentioned previously. In a procedural design, for example, the software might be divided into a set of relatively independent modules. An effort should be made to keep each module to a manageable size and complexity. The modular structure should also be designed so that the overall system is as easy as possible to implement, test, and modify. We will consider an example of this modular design process in Section 8.3.
The last phase of the software life-cycle model shown in Fig. 8.1 is maintenance. It is tempting to believe that most of the work is done after a system has been designed, implemented, and tested. However, this is often far from true. Systems that are used over a long period of time inevitably must change in order to meet changing requirements. According to some estimates, maintenance can account for as much as two-thirds of the total cost of software (Sommerville, 1996).
Lamb (1988) identifies four major categories of software maintenance. Corrective maintenance fixes errors in the original system design and implementation—that is, cases in which the system does not satisfy the original requirements. Perfective maintenance improves the system by addressing problems that do not involve violations of the original requirements—for example, making the system more efficient or improving the user interface. Adaptive maintenance changes the system in order to meet changing environments and evolving user needs. Enhancement adds new facilities to the system that were not a part of the original requirements and were not planned for in the original design.
Maintenance can be made much easier and less costly by the presence of good documentation. The documents created during the system development (requirements, specifications, system design, and module or object documentation) should be kept throughout the lifetime of the system. It is very important to keep these documents updated as the system evolves (for example, when requirements change or new modules are added). There may be documentation that explicitly addresses questions of maintenance. For example, the designers of a system often plan for the likelihood of future change, identifying points where code or data items can easily be added or modified to meet changing requirements. Sample executions of the system should also be a part of the documentation. The test cases originally used during system development should be preserved so that these tests can be repeated on the modified version of the system.
As the system is modified, it is extremely important to maintain careful control over all of the changes being made. A software system may go through many different versions, or releases. Each such version typically involves a set
8.2 System Specifications
As discussed in the preceding section, the system specification process lies between the steps of requirements analysis and system design. The requirements document is primarily concerned with the end user's view of the system and is usually written at a high level, with less important details omitted. During system specification, these details must be supplied to provide a basis for the system design to follow. The specifications must contain a complete description of all of the functions and constraints of the desired software system. They must be clearly and precisely written, must be consistent, and must contain all of the information needed to write and test the software. In order to create such specifications, the developers must examine the purpose and goals of the system more closely. The process of formulating precise system specifications often reveals areas where the requirements are incomplete or ambiguous; this requires a temporary return to the requirements analysis stage.
Although the system specifications contain more detailed information than the requirements document, they are still concerned with what the system must do, rather than with how it will be done. The selection of algorithms, data representations, etc., belongs to the following phase, system design. However, it is important that the specifications explicitly address issues such as the performance needed from the system and how the software should interact with the users. During design, it is often necessary to make choices between conflicting goals. For example, one choice of data structures might lead to more
efficient processing but consume more storage space; a different choice might
save space but require more processing time. The selection of one type of user
interface might optimize the speed of data entry but require more initial train-
ing time for the users. It is important that the system designers make such
choices in a way that is consistent with the overall objectives of the system and
the needs of the end users. Including such information in the requirements
provides a basis for making the design decisions to follow.
The system specifications also form the basis for the system testing phase. Therefore, they must be written in a way that is objectively testable. In the case of "general" requirements such as efficiency and cost of operation, it is important to specify how these qualities will be measured and what will constitute acceptable performance.
As we can see, the system specifications are related closely to the other parts of the software development process. It is essential to maintain a record of these relationships so that the overall system documentation is coherent and consistent. For example, each specification should be explicitly connected with the appropriate item in the requirements document. Later in the process, design decisions may contain cross-references to the specifications that formed the basis for the decision. In the testing phase, each test case may refer explicitly to the specification that is being verified.
statement. The Operation field is separated from the Label field by at least one blank; if no label is present, the Operation field may begin anywhere after column 1.

Labels may be from 1 to 6 characters in length. The first character must be alphabetic (A-Z); each of the remaining characters may be alphabetic or numeric (0-9).

The Operation field must contain one of the SIC mnemonic opcodes, or one of the assembler directives BYTE, WORD, RESB, RESW, START, END.
A hexadecimal string operand in a BYTE directive must be of the form X'hh...h', where each h is a character that represents a hexadecimal digit (0-9 or A-F). There must be an even number of such hex digits. The maximum length of the string is 32 hex digits (representing 16 bytes of memory).

The source program may contain as many as 500 distinct labels.
Output specifications
The assembly listing should show each source program statement (including any com-
ments), together with the current location counter value, the object code generated,
and any error messages.
The object program should occupy no address greater than hexadecimal FFFF.
The object program should not be generated if any assembly errors are detected.
Quality specifications
The assembler should be able to process at least 50 source statements per second of
compute time.
Experienced SIC programmers using this assembler for the first time should be able to
understand at least 90 percent of all error messages without assistance.
The assembler should fail to process source programs correctly in no more than 0.01 percent of all executions.

Figure 8.2 Sample program specifications.
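Specifications written at this level of precision can be checked mechanically. As an illustration, the hexadecimal-string rule (specification 5 above) translates almost directly into code. The following Pascal function is only a sketch, and its name is ours:

function Valid_hex_string(operand: string): boolean;
var
   i, n: integer;
   ok: boolean;
begin
   n := length(operand);
   ok := false;
   if (n >= 5) and (n <= 35) then               { X' plus 2..32 digits plus ' }
      if (operand[1] = 'X') and (operand[2] = '''')
            and (operand[n] = '''') then
         begin
            ok := not odd(n - 3);               { even number of hex digits }
            for i := 3 to n - 1 do
               if not (operand[i] in ['0'..'9', 'A'..'F']) then
                  ok := false                   { non-hex character found }
         end;
   Valid_hex_string := ok
end;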
Each column after the first describes a rule that specifies which actions to take under a particular combination of conditions. (You may want to refer to Section 2.2 in order to understand the logic being expressed in this specification.)

Thus, for example, Rule R5 states that if (1) the operand value type is "rel" (relative), (2) extended format is not specified in the instruction, and (3) the operand is in range for PC-relative addressing, then the assembler should (1) set bit P to 1, (2) set bit B to 0, and (3) set bit E to 0. Rule R7 states that if (1) the operand value type is "rel", (2) extended format is not specified, (3) the operand is not in range for PC-relative addressing, and (4) the operand is not in range for base-relative addressing, then an error has been detected. The entry "—" as part of a rule indicates that the corresponding condition is irrelevant or does not apply to that rule. Of course, if this decision table were used as a part of the system specification for an assembler, other specifications would be needed to define precisely what is meant by such terms as "absolute operand," "relative operand," and "in range for PC-relative addressing." Further information about the construction and use of decision tables may be found in Gilbert (1983).
Conditions/actions                     R1    R2    R3    R4    R5    R6    R7    R8    R9

Operand value type                     abs   abs   abs   abs   rel   rel   rel   rel   neither abs nor rel
Operand value < 4096?                  Y     Y     N     N     —     —     —     —     —
Extended format specified?             N     Y     N     Y     N     N     N     Y     —
Operand in range for PC-relative?      —     —     —     —     Y     N     N     —     —
Operand in range for base-relative?    —     —     —     —     —     Y     N     —     —

Set bit P to                           0     0           0     1     0           0
Set bit B to                           0     0           0     0     1           0
Set bit E to                           0     1           1     0     0           1
Error                                              X                       X           X

Figure 8.3 Decision table specifying how the addressing-mode flag bits are to be set.
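The same rules can also be expressed directly in code. The following Pascal fragment is a minimal sketch of the logic in the table; the procedure name, the type name, and the parameter list are ours, not part of the specification:

type
   OperandType = (otAbsolute, otRelative, otNeither);

procedure Set_flag_bits(opType: OperandType; operandValue: integer;
                        extended, inPCRange, inBaseRange: boolean;
                        var p, b, e: integer; var error: boolean);
begin
   p := 0;  b := 0;  e := 0;  error := false;
   case opType of
      otAbsolute:                        { rules R1-R4 }
         if extended then
            e := 1                       { R2, R4 }
         else if operandValue >= 4096 then
            error := true;               { R3; R1 leaves p = b = e = 0 }
      otRelative:                        { rules R5-R8 }
         if extended then
            e := 1                       { R8 }
         else if inPCRange then
            p := 1                       { R5 }
         else if inBaseRange then
            b := 1                       { R6 }
         else
            error := true;               { R7 }
      otNeither:
         error := true;                  { R9 }
   end;
end;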
One of the most frequently overlooked aspects of software writing is the handling of error conditions. It is much easier to write a program that simply processes valid inputs properly than it is to write one that detects and processes erroneous input as well. However, effective handling of error conditions is essential to the creation of a usable software product.
Properly written specifications implicitly define what classes of inputs are not acceptable. For example, specification 2 in Fig. 8.2 implies that a label should be considered invalid if it (1) is longer than six characters, (2) does not begin with an alphabetic character, or (3) contains any character that is not either alphabetic or numeric. Figure 8.4 shows a number of input errors derived from the input specifications in Fig. 8.2. Other erroneous input conditions may be explicitly defined by the specifications, as in the decision table in Fig. 8.3. It is extremely important that such error conditions be a part of the testing of the overall software system.
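For example, the label conditions just listed could be checked by a routine like the following Pascal sketch (the function name is ours; the character tests come directly from specification 2 of Fig. 8.2):

function Valid_label(lab: string): boolean;
var
   i: integer;
   ok: boolean;
begin
   ok := (length(lab) >= 1) and (length(lab) <= 6);   { 1 to 6 characters }
   if ok then
      ok := lab[1] in ['A'..'Z'];                     { first character alphabetic }
   if ok then
      for i := 2 to length(lab) do
         if not (lab[i] in ['A'..'Z', '0'..'9']) then { remaining characters }
            ok := false;
   Valid_label := ok
end;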
Software may take many different actions when faced with erroneous input. Sometimes a program simply aborts with a run-time error (such as a subscript out of range) or halts with no output. Obviously, these are unacceptable actions—they leave the user of the program with little or no help in finding the cause of the problem. An even worse alternative is simply to ignore the error and continue normally. This may deceive the user into thinking that everything is correct, which may lead to confusion when the output of the system is not as expected.
Error     Violates
number    specification    Statement

  1            1           ALPHA    LDA    BETA
  2            1           ALPHALDA BETA
  3            1           LDA      BETA
  4            2           ALPHAXX  LDA    BETA
  5            2           1LPHA    LDA    BETA
  6            2           ALP*A    LDA    BETA
  7            3           ALPHA    XXX    BETA
  8            4           ALPHA    LDA    7FD3
  9            4           ALPHA    LDA    010000
 10            5           BETA     BYTE   XA3B2
 11            5           BETA     BYTE   X'01G9'
 12            5           BETA     BYTE   X'A3B'

Figure 8.4 Examples of input errors derived from the specifications in Fig. 8.2.
include the action or actions that produced File 1 and the action or actions that use information from File 2.

Figure 8.5(c) shows an action that reads a source program and produces a symbol table that contains the labels defined in the program and their associated addresses. The action may also set flags to indicate errors that were detected in the source program. Notice that the symbol table is both an input object and an output object for this action. This reflects the fact that the action must first search the table before adding a symbol, in order to detect duplicate symbol definitions.
Data flow diagrams usually represent the most important actions and objects in a system, with less-important details omitted. For example, the action shown in Fig. 8.5(c) probably uses several other data objects, such as a location counter and working-storage variables. Likewise, the action itself could be divided into smaller actions, such as updating the location counter or scanning the source statement for a label. However, the high-level representation shown in Fig. 8.5(c) conveys the overall approach being taken.
During the system design process, data flow diagrams may initially be drawn at a relatively high level. The diagrams are then refined and made more detailed as the design progresses. As an illustration of this, let us consider a simple assembler for a SIC machine (standard version). You may want to refer to Sections 1.3.1 and 2.1 as you read this material.
[Figure: initial data flow diagram for the assembler. A single action, Assemble program, transforms the Source program into an Object program and an Assembly listing.]
produced by the system; this question will be addressed at a later step in the process.

Similarly, in Fig. 8.7(b) we have created an action to write the assembly listing. This action requires as input some of the same information that was needed for the object program. However, it also requires the source program (so that the original assembler language statements can be listed) and information about any errors that were detected during assembly. The source program is a primary input to the system, so it can be used directly by the new action. However, we must introduce a new object that contains the required error flags.
[Figure 8.7: step-by-step development of the data flow diagram for the assembler. Parts (a) and (b) introduce the Write object program and Write assembly listing actions; parts (c) and (d) add the Assemble instructions and data action and the Assign instruction and data addresses action.]

The process continues in Fig. 8.7(c). At this stage, we consider how to produce the intermediate data object that contains the assembled instructions and data with addresses. Obviously, the new action that creates this object must have the source program as one of its inputs. It also requires a table of
operation codes (to translate the mnemonic instructions into machine opcodes) and a symbol table (to translate symbolic operands into machine addresses). In addition, we define a new object that contains the address assigned to each instruction and data item to be assembled. We prefer to separate the assignment of addresses from the translation process itself in order to simplify the translation action and also to make the addresses available to other actions that may be defined.

During the translation of instructions, certain error conditions may be detected. Thus, the data object that contains error flags is also an output from the new action in Fig. 8.7(c). (As we shall see, other actions may also set error flags in this object.) The operation code table contains only constant information that is related to the instruction set of the machine. This information may be predefined as part of the assembler itself, instead of being produced during each assembly of a source program. Thus, the operation code table may be treated as though it were a primary input to the system—we will not need to create any new action to produce this data object.
Figure 8.7(d) shows the final step in the development of the data flow diagram for our simple assembler. We have created one new action to compute the addresses for the instructions and data items for the program being assembled. This action operates by scanning the source program and noting the length of each instruction and data item, as described in Section 2.1. Another new action uses these addresses to make entries in the symbol table for each label in the source program. Both of the new actions may detect errors in the source program, so they may need to write into the data object that is used to store error flags. With these new actions defined, there are no "disconnected" actions or objects. Thus, the data flow diagram in Fig. 8.7(d) is complete.
The data flow diagram is intended to represent the flow and transformation of information from the primary inputs through intermediate data objects to the final outputs of the system. As the diagram is developed, it is important to write down documentation for the data objects and processing actions that are being defined. For example, the documentation should describe what data is stored in each object and how each action transforms the data with which it deals. However, the data structures being used to store the information and the algorithms used to access this information are not a part of the data flow representation. Likewise, the mechanisms by which data is passed from one processing action to another are not specified by the representation. Such implementation details are a part of the modular design process that we discuss in the following section. Thus, the data flow diagram and associated documentation can be considered as an intermediate step between the specifications (which describe what is to be done) and the system design (which describes how the tasks are to be accomplished).
8.3 Procedural System Design
The data flow diagram for a system represents the flow and transformation of
information from the primary inputs through intermediate data objects to the
final outputs of the system. However, there are many different ways in which
these flows and transformations could be implemented in a piece of software.
For example, consider the action in Fig. 8.7(d) that assigns instruction and data
addresses. This action might be implemented as a separate pass over the
source program, computing all addresses before any other processing is done.
In that case, the object that contains the addresses might be a data structure
with one entry for each line of the source program. This structure would then
be used by the other actions of the assembler.
On the other hand, the action that assigns addresses might deal with the source program one line at a time. It might compute the address for each instruction or data item and then pass this address to the other parts of the assembler that require it. In that case, the data object created might be a simple variable, containing only the address for the line currently being processed.
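In Pascal terms, the two alternatives amount to two different declarations for the same data object. Both declarations below are sketches only; the names and the capacity are assumed:

const
   MAXLINES = 1000;                        { assumed capacity }
type
   Address_table = array[1..MAXLINES] of integer;
var
   { alternative 1: a separate pass fills in one entry per source line }
   line_addr: Address_table;
   { alternative 2: a single variable holds the address of the line
     currently being processed }
   curr_addr: integer;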
Similar options exist for many of the other actions and data objects in Fig.
8.7(d). Thus, this data flow diagram could describe an ordinary two-pass as-
sembler. However, it could equally well describe a one-pass assembler or a
multi-pass assembler (see Section 2.4). The choices between such alternatives
are made by the system designer as the data flow diagram is converted into a
modular design for a piece of software.
Obviously, the goal of the modular design process is a software design that meets the system specifications. However, it is almost equally important to design systems that are easy to implement, understand, and maintain. Thus, for example, the modules should be small enough that each could be implemented by one person within a relatively short time period. Modules should have high cohesion—that is, the functions and objects within a module should be closely related to each other. At the same time, the modules in the system should have low coupling—that is, each module should be as independent as possible of the others. Systems organized into modules according to these "divide and conquer" principles tend to be much easier to understand. They are also easier to implement, because a programmer needs to understand and remember fewer details in order to code each module. The resulting system is easier to maintain and modify, because the changes that need to be made are usually isolated within one or two modules, not distributed throughout the entire system. In the remainder of this section, we will see examples of the application of these general principles to the design of an assembler.
[Figure 8.8: the complete data flow diagram for the assembler, enclosed in a boundary that represents the overall Assembler system. The actions are Assign instruction and data addresses, Assemble instructions and data, and Assign symbol addresses; the data objects include the Source program, Instruction and data addresses, Assembled instructions and data with addresses, Object program, and Assembly listing.]
For example, the task of assembling a single line from the source program
might be divided into one module that assembles machine instructions, one
module that assembles data constants, and so on.
Another important factor to consider in modular design is the desirability
of minimizing the coupling between modules. Consider, for example, the por-
tion of the data flow diagram that is shown in Fig. 8.9(a). Suppose that each of the two actions shown is implemented as a separate module. These two modules access the symbol table directly (to add new entries and to search the
table).

[Figure 8.9: in (a), the actions Assign symbol addresses and Assemble instructions and data each use the Symbol table directly; in (b), they reach it only through a separate access module.]

Thus, both modules must know the internal structure of the symbol
table. For example, if the symbol table is a hash table, then both modules must
know the size of the table, the hashing function, and the methods used for re-
solving collisions. This creates more work for the programmers who imple-
ment the modules. It also leads to duplication of effort, because the same
code is written twice, and it creates additional possibilities for errors in the
implementation.
Similar problems may occur during the maintenance phase. If the organization of the table or the methods for accessing it are changed, then both of the processing modules must be modified. As before, this requires more work and may lead to errors if one module is updated and the other is not.
The difficulties just described are a consequence of the undesirable coupling between the two modules—closely related items of information and processing logic occur in both modules. The modules also exhibit relatively poor cohesion—each module must contain information and logic that is related to the design of the symbol table, instead of focusing only on the logical requirements of the specific processing task being performed.
A better design, with increased cohesion and reduced coupling, is shown in Fig. 8.9(b). We have defined a new module whose sole purpose is to access the symbol table. This module is called by the other two whenever they need to perform any operation on the table. Thus, the two original modules need only know the calling interface (parameters, etc.) used to invoke the "access" module. The internal structure of the symbol table—size, organization, algorithms for access, etc.—are of concern only within the new module that performs the actual access. This reduces the amount of knowledge that must be included in the two main processing modules. It also simplifies the maintenance of the system in case the internal details of the table structure need to be changed.
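A sketch of such an access module is shown below. The parameter list matches the one used by the test driver in Fig. 8.16; the request codes, the return codes, and the simple linear table that stands in here for the hash organization are all our assumptions. The point is that callers see only the calling interface:

const
   MAXSYMS = 500;         { the specifications allow up to 500 distinct labels }
var
   sym_names: array[1..MAXSYMS] of string;
   sym_addrs: array[1..MAXSYMS] of integer;
   sym_count: integer;    { must be initialized to 0 before the first call }

procedure Access_symtab(request_code: integer; var return_code: integer;
                        symbol: string; var address: integer);
var
   i, found: integer;
begin
   found := 0;
   for i := 1 to sym_count do
      if sym_names[i] = symbol then
         found := i;
   if request_code = 1 then                     { insert }
      if (found <> 0) or (sym_count >= MAXSYMS) then
         return_code := 1                       { duplicate symbol, or table full }
      else
         begin
            sym_count := sym_count + 1;
            sym_names[sym_count] := symbol;
            sym_addrs[sym_count] := address;
            return_code := 0
         end
   else                                         { request_code = 2: search }
      if found <> 0 then
         begin
            address := sym_addrs[found];
            return_code := 0
         end
      else
         return_code := 1                       { symbol not found }
end;

If the table organization were later changed to a true hash table, only this module would be affected.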
In the process just illustrated, the effect of a design decision (i.e., the internal structure and representation of the symbol table) was "isolated" within a single module. This design principle is sometimes referred to as isolation of design factors, or simply factor isolation. The same general concept is also often called information hiding (because a module "hides" some design decision), or data abstraction (because the rest of the system deals with the data as an "abstract" entity, separated from its actual representation). Further discussions of these topics can be found in Ng and Yeh (1990) and Lamb (1988).
Figure 8.10 shows a modularization of the data flow diagram from Fig. 8.8 according to the principles just described. This design includes the new module introduced in Fig. 8.9 (Access_symtab); there are also several other similar changes. The source statements, instruction and data addresses, and error flags that are communicated from Pass 1 to Pass 2 are included in an intermediate file. (A discussion of the reasons for this design decision may be found in
Section 2.1.)

[Figure 8.10: modular design for the assembler, superimposed on the data flow diagram of Fig. 8.8. Pass 1 contains P1_read_source, P1_assign_loc, and P1_assign_sym; Pass 2 contains P2_assemble_inst, P2_search_optab, P2_write_obj, and P2_write_list; the modules Access_symtab and Access_int_file provide the only access to the Symbol table and the intermediate file.]

A new module (Access_int_file) has been defined to handle all of
the reading and writing of this intermediate file. The reasons for including this module are essentially the same as those discussed above—all of the details concerning the structure and access techniques for the intermediate file are isolated within a single module and removed from the rest of the system.

Likewise, we have defined a module (P2_search_optab) whose sole purpose is to access the operation code table. This design decision is somewhat different from the two just discussed, because it does not materially reduce the coupling between modules. (Whether or not the new module is defined, there is still only one place in the system where the structure of the table must be known.) However, the decision does make the module that assembles instructions smaller and less complex. It also improves module cohesion by separating two logically unrelated functions that were previously part of the same module. Thus, it leads to a modular structure that is easier to implement, understand, and modify. For similar reasons, we have introduced a module (P1_read_source) that reads the source program and passes the source statements to the rest of the assembler. This module could, for example, handle details such as scanning for the various subfields in a free-format source program.
Figure 8.10 represents one possible stage in the modular design of our assembler. Depending upon the detailed specifications for the assembler, however, some of these modules may still be larger and more complex than is desirable. In that case, the decomposition could be carried further. For example, module P2_assemble_inst could be divided into several submodules according to the type of statement being processed—one to assemble Format 3 instructions, one to process BYTE assembler directives, etc. There may also be a number of "common" or "utility" functions that are used by more than one module in the system. These functions might include, for example, conversion of numeric values between internal (integer) and external (hexadecimal character string) representations. Each such function could be isolated within its own module and called by the other modules as needed. This isolation would improve module cohesion and reduce module coupling, resulting in the benefits previously described.
[Figure 8.11(a): hierarchical calling structure of the assembler. The Assembler driver calls Pass_1 and Pass_2; the lowest level of the hierarchy includes modules such as Access_symtab and P2_search_optab.]

Figure 8.11(b) Module interfaces for the assembler (I = input parameter, O = output parameter):

Module             Parameters                       Called by          Calls

Access_symtab      Request code (I)                 P1_assign_sym,
                   Return code (O)                  P2_assemble_inst
                   Symbol (I)
                   Address (I/O)

P1_read_source     Return code (O)                  Pass_1
                   Source statement (O)
                   Error flags (O)

P1_assign_loc      Source statement (I)             Pass_1
                   Error flags (O)
                   Current location counter (I)
                   Next location counter (O)

P1_assign_sym      Source statement (I)             Pass_1             Access_symtab
                   Error flags (O)                                     (for each label)
                   Current location counter (I)

P2_assemble_inst   Source statement (I)             Pass_2             P2_search_optab
                   Error flags (I/O)                                   (for each instruction),
                   Current location counter (I)                        Access_symtab
                   Object code (O)

P2_search_optab    Mnemonic opcode (I)              P2_assemble_inst
                   Return code (O)
                   Machine opcode (O)

P2_write_obj       Current location counter (I)     Pass_2
                   Object code (I)

P2_write_list      Source statement (I)             Pass_2
                   Current location counter (I)
                   Error flags (I)
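To show how these interfaces fit together, here is a sketch of a possible Pass_1 driver. The calling sequence and the convention that a return code of 0 means "statement read" are our assumptions; the parameter lists follow the table above, and stub implementations like those in Fig. 8.17 could stand in for the missing modules:

procedure Pass_1;
var
   return_code, error_flags: integer;
   source_statement: string;
   curr_locctr, next_locctr: integer;
begin
   curr_locctr := 0;
   P1_read_source(return_code, source_statement, error_flags);
   while return_code = 0 do
      begin
         P1_assign_loc(source_statement, error_flags, curr_locctr, next_locctr);
         P1_assign_sym(source_statement, error_flags, curr_locctr);
         curr_locctr := next_locctr;
         P1_read_source(return_code, source_statement, error_flags);
      end
end;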
Consider, for example, an object that represents the symbol table used by an assembler. The methods defined by this object might include operations such as Insert_symbol and Lookup_symbol. The instance variables of the object would be the contents of the hash table (or other data structure) used to store the symbols and their addresses. The representation of the instance variables—for example, the way the hash table is organized—would be invisible to the rest of the assembler.

Compare this approach with the isolation of design factors that we discussed in Section 8.3.3. In effect, the object that represents the symbol table combines the "symbol table" data structure and the "access symbol table" module that are shown in Fig. 8.9(b). The object-oriented representation provides the same advantages of data abstraction and information hiding that we discussed in Section 8.3.3. For example, any changes in the internal organization of data in the object do not affect the rest of the assembler. The OOP term for this kind of abstraction is encapsulation.
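In an object-oriented dialect of Pascal, such an object might be sketched as follows. The method names come from the text; the class syntax (Free Pascal) and the parallel-array representation that stands in for the hash table are our assumptions. Since the representation is private, callers cannot depend on it:

{$mode objfpc}
type
   TSymbol_table = class
   private
      { instance variables: invisible to the rest of the assembler }
      FNames: array of string;
      FAddrs: array of integer;
   public
      procedure Insert_symbol(const sym: string; addr: integer);
      function Lookup_symbol(const sym: string; out addr: integer): boolean;
   end;

procedure TSymbol_table.Insert_symbol(const sym: string; addr: integer);
begin
   SetLength(FNames, Length(FNames) + 1);
   SetLength(FAddrs, Length(FAddrs) + 1);
   FNames[High(FNames)] := sym;
   FAddrs[High(FAddrs)] := addr
end;

function TSymbol_table.Lookup_symbol(const sym: string; out addr: integer): boolean;
var
   i: integer;
begin
   Result := false;
   for i := 0 to High(FNames) do
      if FNames[i] = sym then
         begin
            addr := FAddrs[i];
            Result := true
         end
end;

Changing the internal organization (for example, to a real hash table) would touch only this class.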
However, there is much more to OOP than encapsulation. In the object-oriented paradigm, each object is created as an instance of some class. A class
can be thought of as a template that defines the instance variables and meth-
ods of an object. It is possible to create many objects from the same class. For
example, suppose that an assembler is designed to translate programs for dif-
ferent versions of the target machine (such as SIC and SIC/XE). The assembler
might make use of a class named Opcode_table. A separate instance of this
class (i.e., a separate object) could be created to define the instruction set for
each version of the target machine.
Classes can be related to each other in a variety of ways. Consider, for ex-
ample, Fig. 8.12(a). An object of the class Source_program could be used to
represent an assembler language program. This object might contain a variety
of information about the program itself—for example, the total program size
and an indication of whether or not errors have been detected. It could also in-
clude a collection of objects of the class Source_line. Each of these objects
would represent a single line of the program.
In this example, the relationship between the class Source_program and
the class Source_line is one of inclusion or aggregation. In OOP terms, this is
called a “has-a” relationship. The diagram indicates that there is a 1:N rela-
tionship between one instance of Source_program and many instances of
Source_line.
It is also possible for one class to be a subclass of another. For example, Fig. 8.12(b) shows a class Symbol_table and a class Opcode_table. These are both subclasses of the base class Hash_table. In OOP terms, this is called an "is-a" relationship.
Subclassing is very important in the object-oriented paradigm. When a subclass is created, it automatically inherits all of the instance variables and methods of the base class. For example, suppose the class Hash_table defines methods called Insert_item and Search_for_item. When the classes Symbol_table and Opcode_table are declared, they automatically contain definitions of these same methods. Likewise, they automatically incorporate whatever mechanism is used in the class Hash_table for organizing and accessing the data (instance variables).

The instance variables and methods inherited by a subclass can be overridden to add new or specialized behavior to the subclass. For example, the instance variables of the class Symbol_table could be changed to include symbol addresses, error flags, and other needed information. Likewise, the instance variables of Opcode_table could be changed to include information about instruction formats. The method Insert_item could be deleted from the subclass Opcode_table to prevent accidental changes to the contents of this table.
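The same relationships can be sketched in code. Again the class and method names come from the text, while the Free Pascal syntax and the trivial table representation are our assumptions:

{$mode objfpc}
type
   THash_table = class                      { base class }
   protected
      FKeys, FValues: array of string;      { stand-in for the hash organization }
   public
      procedure Insert_item(const key, value: string); virtual;
      function Search_for_item(const key: string; out value: string): boolean;
   end;

   { is-a Hash_table: inherits Insert_item and Search_for_item unchanged }
   TSymbol_table = class(THash_table)
   end;

   { is-a Hash_table: overrides Insert_item to prevent accidental changes,
     since the opcode table holds only constant information }
   TOpcode_table = class(THash_table)
   public
      procedure Insert_item(const key, value: string); override;
   end;

procedure THash_table.Insert_item(const key, value: string);
begin
   SetLength(FKeys, Length(FKeys) + 1);
   SetLength(FValues, Length(FValues) + 1);
   FKeys[High(FKeys)] := key;
   FValues[High(FValues)] := value
end;

function THash_table.Search_for_item(const key: string; out value: string): boolean;
var
   i: integer;
begin
   Result := false;
   for i := 0 to High(FKeys) do
      if FKeys[i] = key then
         begin
            value := FValues[i];
            Result := true
         end
end;

procedure TOpcode_table.Insert_item(const key, value: string);
begin
   { deliberately does nothing: the table contents cannot be changed }
end;

Separate instances of Opcode_table could still be created from this one class, one for each version of the target machine, as described above.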
[Figure 8.12(b): Symbol_table and Opcode_table are both subclasses (is-a relationships) of the base class Hash_table. Figure 8.12(c): Hash_table and Binary_search_tree are both subclasses of Searchable_data_structure; Symbol_table is a subclass of Hash_table, and Opcode_table is a subclass of Binary_search_tree.]
This macro process repeats itself after each major release of a software product. The overall sequence of events is similar to the waterfall model depicted in Fig. 8.1. However, Booch argues that object-oriented development is inherently iterative, so that reverse flows of information like those discussed in Section 8.1.2 are inevitable.

Booch's micro process essentially represents the daily activities of the system developers. It consists of the following activities:
to indicate the time that the flow of control is focused in each object.

Consider, for example, the interactions shown in Fig. 8.15. The primary focus of control is in the object Source_program. The Assemble method of Source_program begins by creating a new instance of Source_line for each line in the input file for the assembler. The method Assign_location is then invoked for each Source_line object. During its period of control, Source_line may invoke Enter to make an entry in the symbol table. If errors are detected, it may invoke the method Record_error on itself.
[Figure 8.14: object diagram for the assembler (partially recovered). Each object lists its contents and its methods:]

Source_program
   Methods:
      Assemble — Translate the source program, producing an object
         program and an assembly listing.

Source_line
   Methods:
      Assign_location — Assign a location counter value to the line;
         return an updated location counter value to the invoker. Enter
         the label on the line (if any) into the symbol table.
      Translate — Translate the instruction or data definition on the
         line into machine language.

Symbol_table, Opcode_table, Object_program
   (contents and methods not recovered)

Assembly_listing
   Methods:
      Complete
[Figure 8.15: interaction diagram for the assembler. For each line of the input file, Source_program invokes Create, then Assign_location, then Translate on a Source_line object. Source_line invokes Enter and Search on Symbol_table, Search on Opcode_table (if the statement is an instruction), Enter_text on Object_program, and Enter_line on Assembly_listing; if errors are found, it invokes Record_error on itself.]
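The flow of control in Fig. 8.15 can be sketched in code as follows. This is only an illustration of the invocation pattern; the method bodies, and the Free Pascal class syntax, are our assumptions:

{$mode objfpc}
type
   TSource_line = class
      procedure Assign_location;   { would enter any label into the symbol table }
      procedure Translate;         { would search the symbol and opcode tables }
      procedure Record_error;      { notes an error on this line }
   end;

   TSource_program = class
      procedure Assemble;          { the primary focus of control }
   end;

procedure TSource_line.Assign_location;
begin
   { invoke Enter on the symbol table for each label found }
end;

procedure TSource_line.Translate;
begin
   { invoke Search on the symbol table and the opcode table;
     if errors are found, invoke Record_error on itself }
end;

procedure TSource_line.Record_error;
begin
   writeln('error recorded')
end;

procedure TSource_program.Assemble;
var
   line: TSource_line;
begin
   { for each line of the input file: create a Source_line instance,
     then invoke Assign_location and Translate on it }
   line := TSource_line.Create;
   line.Assign_location;
   line.Translate;
   line.Free
end;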
Unit tests are often conducted by the programmer who writes each mod-
ule. However, there are many advantages to having other people involved in
the unit testing. From the programmer’s point of view, a test is successful if it
shows that the program works correctly. However, the purpose of testing is to
reveal bugs in the software under test—thus, a test should be considered suc-
cessful if it discloses a flaw. The programmer who writes a piece of code may
be too close to it, and too psychologically invested in its success, to design rigorous tests.
The term bottom-up testing describes one common sequence in which the modules of a system undergo unit testing and are integrated into a partial system. In the case of a procedural system like the one shown in Fig. 8.11(a), the term bottom-up refers to the hierarchical calling structure. The modules at the lowest level of the hierarchy (i.e., farthest from the root) are tested first, then the modules at the next higher level, and so on.
Thus, in the structure of Fig. 8.11(a), we might first perform unit testing on the modules Access_symtab and P2_search_optab. Then we might unit test module P2_assemble_inst. After this unit testing, we could combine these three modules into a partial system and perform integration testing on them. The other modules at level 3 of the hierarchy would also be unit tested individually. Then they would be integrated together to form Pass_1 and Pass_2, and these larger subsystems would be tested. Finally, the "driver" routine for the assembler would be unit tested, and all of the modules would be combined for system testing.
In the case of an object-oriented system, the situation may not be as clear. In general, bottom-up testing for such a system would begin with objects that are passive—that is, objects that do not invoke methods on other objects. In Fig. 8.14, for example, the objects Symbol_table, Opcode_table, Object_program, and Assembly_listing might be tested individually first. The sequence would then continue with other objects that invoke methods only on objects that have already been tested. Thus, in Fig. 8.14 the next object to be tested would be Source_line. Finally the object Source_program would be tested.
During the unit testing of individual modules and the integration testing of partial systems, it is necessary to simulate the presence of the remainder of the system. This can be done by writing a test driver program for each module. Figure 8.16 shows the outline of a simple test driver for the procedure Access_symtab from Fig. 8.11. This driver reads test cases (i.e., sets of calling parameters) that are supplied by the person performing the test and calls Access_symtab with these parameters. It then displays the results returned by the procedure, so that these can be compared with the correct results.

Essentially the same process would be used to test the behavior of an object. Each test case would include a specification of a method to be invoked on the object, and any parameters that are passed with the invocation. After invoking the method, the test driver would display any values that were returned from the invocation. The details of doing this depend on the syntax of the particular object-oriented language being used.
begin
   while not eof(input) do
      begin
         readln(request_code, symbol, address);
            { read test case from input }
         Access_symtab(request_code, return_code, symbol, address);
            { call Access_symtab with test case }
         writeln(request_code, return_code, symbol, address);
            { display result }
      end;
end.
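A test-case file for this driver might contain lines such as the following. The request codes here are purely hypothetical (1 for insert, 2 for search), and the third field is the address:

1 ALPHA 4096
1 ALPHA 4099
2 ALPHA 0

The first line should insert ALPHA; the second should produce a return code indicating a duplicate symbol; the third should return the address stored by the first.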
Bottom-up testing is the most frequently used strategy for unit testing and module integration. Test cases are delivered directly to the module, instead of being passed through the rest of the system. Thus, it is relatively easy to test a large variety of different conditions. Bottom-up testing also allows for the simultaneous unit testing and integration of many different low-level modules in the system. This can be of real benefit in meeting project deadlines.
However, bottom-up testing has been criticized by a number of authors.
The most frequent objection is that, with bottom-up testing, design errors that
involve interfaces between modules are not discovered until the later stages of
testing. When such errors are discovered, fixing them can be very expensive
and time-consuming. In some complex systems, it can also be difficult to write
drivers that exactly simulate the environment of the unit or partial system be-
ing tested.
parameters it was passed. In some cases, the stub need do nothing except exit
back to the calling procedure. Figure 8.17 shows sample stubs for some of the modules in Fig. 8.11.

The same process can be used to simulate the behavior of an object that has not yet been implemented. The object being simulated would contain a stub like those in Fig. 8.17 for each method it defines. The simulated object would include only enough code to allow communication with the object that invokes its methods. The details of doing this depend on the syntax of the particular object-oriented language being used.
{ Stub for P1_assign_loc }
begin
   next_locctr := curr_locctr + 3;   { treat every statement as 3 bytes long }
end;

{ Stub for P2_search_optab }
begin
   if mnemonic = 'LDA' then
      begin
         return_code := 0;   { mnemonic found in opcode table }
         opcode := '00';     { machine opcode = hex 00 }
      end
   else
      begin
         return_code := 1;   { mnemonic not found in table }
         opcode := 'FF';     { set machine opcode to hex FF }
      end;
end;

{ Stub for P2_write_obj }
begin
   writeln('*** P2_write_obj executed ***');
end;
EXERCISES

Section 8.2

Modify the data flow diagram in Fig. 8.7(d) for a SIC/XE assembler that supports program blocks (see Sections 2.2 and 2.3.4).

Modify the object diagram in Fig. 8.14 for a SIC/XE assembler that supports literals (see Sections 2.2 and 2.3.1).

Modify the object diagram in Fig. 8.14 for a SIC/XE assembler that supports program blocks (see Sections 2.2 and 2.3.4).

Write algorithms for implementing the methods of the object Assembly_listing.

How would the implementations of the Search method be different in the classes Hash_table and Binary_search_tree?

How would the implementations of the Enter method be different in the classes Hash_table and Binary_search_tree?

What methods might be implemented in the class Hash_table that would not be found in the class Binary_search_tree?

What methods might be implemented in the class Binary_search_tree that would not be found in the class Hash_table?

Outline a test driver for module P2_write_obj (see Figs. 8.10 and 8.11).

Outline a test driver for the object named Source_line (see Figs. 8.14 and 8.15).

Write stubs for the methods of the object Source_line (see Figs. 8.14 and 8.15).
Appendix A
SIC/XE Instruction Set and Addressing Modes
Instruction Set

The letters in the Notes column of the instruction table have the following meanings:

   P   Privileged instruction
   X   Instruction available only on XE version
   F   Floating-point instruction
   C   Condition code CC set to indicate result of operation (<, =, or >)
[Instruction set table not reproduced; sample entry:]

Mnemonic   Operands   Formats   Opcode   Effect
J          m          3/4       3C       PC <- m
Instruction Formats
Format 1 (1 byte):
Format 2 (2 bytes):
8 4
6 11114141 20
0
Addressing Modes

The following addressing modes apply to Format 3 and 4 instructions. Combinations of addressing bits not included in this table are treated as errors by the machine. In the description of assembler language notation, c indicates a constant between 0 and 4095 (or a memory address known to be in this range); m indicates a memory address or a constant value larger than 4095. Further information can be found in Section 1.3.2.
The letters in the Notes column have the following meanings:

   4   Format 4 instruction
   D   Direct-addressing instruction
   A   Assembler selects either program-counter relative or base-relative mode

[Addressing-mode table not reproduced. Its columns are: Addressing type; Flag bits n i x b p e; Assembler language notation; Calculation of target address TA; Operand; Notes.]
Appendix B
ASCII Character Codes

00 NUL    20 SP    40 @     60 `
01 SOH    21 !     41 A     61 a
02 STX    22 "     42 B     62 b
03 ETX    23 #     43 C     63 c
04 EOT    24 $     44 D     64 d
05 ENQ    25 %     45 E     65 e
06 ACK    26 &     46 F     66 f
07 BEL    27 '     47 G     67 g
08 BS     28 (     48 H     68 h
09 HT     29 )     49 I     69 i
0A LF     2A *     4A J     6A j
0B VT     2B +     4B K     6B k
0C FF     2C ,     4C L     6C l
0D CR     2D -     4D M     6D m
0E SO     2E .     4E N     6E n
0F SI     2F /     4F O     6F o
10 DLE    30 0     50 P     70 p
11 DC1    31 1     51 Q     71 q
12 DC2    32 2     52 R     72 r
13 DC3    33 3     53 S     73 s
14 DC4    34 4     54 T     74 t
15 NAK    35 5     55 U     75 u
16 SYN    36 6     56 V     76 v
17 ETB    37 7     57 W     77 w
18 CAN    38 8     58 X     78 x
19 EM     39 9     59 Y     79 y
1A SUB    3A :     5A Z     7A z
1B ESC    3B ;     5B [     7B {
1C FS     3C <     5C \     7C |
1D GS     3D =     5D ]     7D }
1E RS     3E >     5E ^     7E ~
1F US     3F ?     5F _     7F DEL
Appendix C
SIC/XE Reference Material

Status Word Contents

[Status word contents table (Bit, Field) not reproduced.]

Interrupts

[Interrupt classes table not reproduced. The interrupt condition codes are:]
00   Illegal instruction
01   Privileged instruction in user mode
02   Address out of range
03   Memory-protection violation
04   Arithmetic overflow
10   Page fault
11   Segment fault
12   Segment-protection violation
13   Segment length exceeded
Channel Command Format

The command codes are:

0   Halt device
1   Read data
2   Write data

The work area for channel n begins at hexadecimal memory address 2n0.

[Channel work area layout table (Bytes, Contents) not reproduced; the recovered entries include status flags, CF, and a reserved field.]
References
Aho, Alfred V., Sethi, Ravi, and Ullman, Jeffrey D., Compilers: Principles, Techniques, and Tools, Addison-Wesley Publishing Co., Reading, Mass., 1988.

Anderson, Don and Shanley, Tom, Pentium Processor System Architecture (2nd edition), Addison-Wesley Publishing Co., Reading, Mass., 1995.

Arnold, Ken and Gosling, James, The Java Programming Language, Addison-Wesley Publishing Co., Reading, Mass., 1996.

Baase, Sara, VAX-11 Assembly Language Programming (2nd edition), Prentice-Hall, Inc., Englewood Cliffs, N.J., 1992.

Barkakati, Nabajyoti, The Waite Group's Microsoft Macro Assembler Bible (2nd edition), H. W. Sams, Indianapolis, 1992.

Barcucci, Elena and Pelacani, Gianluca, "A Software Development System Based on a Macroprocessor," Software: Practice and Experience 14: 519-531, June 1984.

Barnes, J. G. P., Programming in Ada 95, Addison-Wesley Publishing Co., Reading, Mass., 1996.

Bauer, F. L., "Software Engineering," Information Processing 71, North-Holland Publishing Co., Amsterdam, 1972.

Becker, George, Morris, Mary E. S., and Slattery, Kathy, Solaris Implementation: A Guide for System Administrators, SunSoft Press, Mountain View, Ca., 1995.

Booch, Grady, Object-Oriented Analysis and Design with Applications, Benjamin/Cummings Publishing Co., Redwood City, Ca., 1994.

Brooks, Frederick P., The Mythical Man-Month: Essays on Software Engineering (anniversary edition), Addison-Wesley Publishing Co., Reading, Mass., 1995.

Brown, Peter J., Macro Processors and Techniques for Portable Software, John Wiley & Sons, New York, 1974.
Landwehr, Carl E., Bull, Alan R., McDermott, John P., and Choi, William S., "A Taxonomy of Computer Program Security Flaws," ACM Computing Surveys 26: 211-254, September 1994.

Lazzerini, Beatrice, Program Debugging Environments: Design and Utilization, Ellis Horwood, New York, 1992.

Levine, John R., Mason, Tony, and Brown, Doug, Lex & Yacc, O'Reilly & Associates, Sebastopol, Ca., 1992.

Lewis, Harry R. and Denenberg, Larry, Data Structures and Their Algorithms, HarperCollins, New York, 1991.

Lindholm, Tim and Yellin, Frank, The Java Virtual Machine, Addison-Wesley Publishing Co., Reading, Mass., 1996.

Marciniak, John J., "Software Engineering, A Historical Perspective," in Encyclopedia of Software Engineering, John Wiley & Sons, Inc., New York, 1994.

Meyrowitz, Norman and van Dam, Andries, "Interactive Editing Systems: Part I," ACM Computing Surveys 14: 321-352, September 1982.

Meyrowitz, Norman and van Dam, Andries, "Interactive Editing Systems: Part II," ACM Computing Surveys 14: 353-416, September 1982.

Microsoft Corporation, The MS-DOS Encyclopedia, Microsoft Press, Redmond, Wash., 1988.

Ng, Peter, and Yeh, Raymond T. (ed.), Modern Software Engineering: Foundations and Current Perspectives, Van Nostrand Reinhold, New York, 1990.

Norton, Peter, Peter Norton's Complete Guide to DOS 6.22, Sams Publishing, Indianapolis, 1994.

Parnas, David L. and Clements, Paul C., "A Rational Design Process: How and Why to Fake It," IEEE Transactions on Software Engineering SE-12: 251-257, February 1986.

Patterson, David A. and Hennessy, John L., Computer Architecture: A Quantitative Approach (2nd edition), Morgan Kaufmann, San Francisco, 1996.

Pfleeger, Charles P., Security in Computing (2nd edition), Prentice-Hall, Englewood Cliffs, N.J., 1996.

Pietrek, Matt, Windows 95 System Programming Secrets, IDG Books, Foster City, Ca., 1995.

Ramamoorthy, C. V. and Siyan, K., "Software Engineering," in Encyclopedia of Computer Science and Engineering (2nd edition), Van Nostrand Reinhold, New York, 1983.

Rosen, Kenneth H., Rosinski, Richard R., and Farber, James M., UNIX System V Release 4: An Introduction for New and Experienced Users, Osborne McGraw-Hill, Berkeley, Ca., 1990.

Schildt, Herbert, The Annotated ANSI C Standard, Osborne McGraw-Hill, Berkeley, 1990.
Schonberg, Edmond, and Banner, Bernard, "The GNAT Project: A GNU-Ada 9X Compiler," Tri-Ada 94 Proceedings, 48-57, 1994.

Schulman, Andrew, Undocumented DOS (2nd edition), Addison-Wesley Publishing Co., Reading, Mass., 1993.

Sebesta, Robert W., Concepts of Programming Languages, Addison-Wesley Publishing Co., Reading, Mass., 1996.

Seidner, Rich and Tindall, Nick, "Interactive Debug Requirements," Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on High-Level Debugging, March 1983, appearing in Software Engineering Notes and SIGPLAN Notices, August 1983, pp. 9-22.

Simrin, Steven, The Waite Group's MS-DOS Bible (4th edition), H. W. Sams, Indianapolis, 1991.

Singhal, Mukesh and Shivaratri, Niranjan G., Advanced Concepts in Operating Systems, McGraw-Hill, New York, 1994.

Sites, Richard L. (ed.), Alpha Architecture Reference Manual, Digital Press, Burlington, Mass., 1992.

Smith, James E. and Weiss, Shlomo, "PowerPC 601 and Alpha 21064: A Tale of Two RISCs," Computer 27: 46-58, June 1994.

Sommerville, Ian, Software Engineering (5th edition), Addison-Wesley Publishing Co., Reading, Mass., 1996.

Sun Microsystems, "SPARC Assembly Language Reference Manual," 1994a.

Sun Microsystems, "SunOS Linker and Libraries," 1994b.

Sun Microsystems, "C 3.0.1 User's Guide," 1994c.

Sun Microsystems, "The UltraSPARC Processor—A Technology White Paper," http://www.Sun.com:80/sparc/whitepapers/UltraSPARCtechnology, 1995a.

Sun Microsystems, "The Java(tm) Language Environment: A White Paper," http://java.sun.com/whitePaper/java-whitepaper-1.html, 1995b.

Sun Microsystems, "SunOS Reference Manual," 1995c.

Tabak, Daniel, Advanced Microprocessors (2nd edition), McGraw-Hill, Inc., New York, 1995.

Tanenbaum, Andrew S., Modern Operating Systems, Prentice-Hall, Englewood Cliffs, N.J., 1992.

Tanenbaum, Andrew S., van Renesse, Robbert, van Staveren, Hans, Sharp, Gregory J., Mullender, Sape J., Jansen, Jack, and van Rossum, Guido, "Experiences with the Amoeba Distributed Operating System," Communications of the ACM 33: 46-63, December 1990.

UNIX System Laboratories, UNIX System V Release 4: Integrated Software Development Guide for Intel Processors, Prentice-Hall, Englewood Cliffs, N.J., 1992.

Watt, David, Programming Language Processors: Compilers and Interpreters, Prentice-Hall, Englewood Cliffs, N.J., 1993.
Index
80x86. See x86
Absolute expression, 76-77, 91
Absolute loader, 124-127
Absolute program, 62, 124
Absolute symbol, 76-77, 80
Access control, 385
Access matrix, 385
Access method, 373
Activation record, 289-293, 297-299
Ada, 203, 308, 310
Addressing modes
   PowerPC, 35-36
   SIC, 6, 9-10, 498-499
   SPARC, 31-32
   T3E, 39
   VAX, 23
   x86, 27-28
   See also Base-relative addressing, Direct addressing, Immediate
      addressing, Indexed addressing, Indirect addressing,
      Program-counter relative addressing
AIX, 108-111
Alignment, 21, 25, 34, 35, 38
Alpha. See T3E
Ambiguous grammar, 231
Amoeba, 411-413
APL, 302
Applet, 314
ARGTAB. See Argument table
Argument table, 182-183
Arithmetic expression
   code generation, 263-268
   parsing, 247, 255
   syntax, 227, 229-231
   See also Expression
Array
   allocation of, 279-280, 283
   one-dimensional, 279, 280, 281
   references to, 280
   two-dimensional, 279, 280, 281-283
ASCII character codes, 501
Assembler
   algorithms, 46-48, 49-52
   data structures, 50-52
   errors, 51, 59, 76, 88, 96, 102
   intermediate file, 51-52
   one-pass. See One-pass assembler
   pass one, 48, 49, 50, 51, 53, 70-71, 80, 105
   pass two, 48, 49, 50, 51, 54, 57, 71, 80
Assembler directives, 13, 44, 48, 49, 60
Assignment statement
   code generation, 263-268
   parsing, 247, 255
   syntax, 229-230, 247
Associative memory, 370
Asynchronous, 333
Authorization, 385, 445
Authorization list, 385
Automatic allocation, 289-293, 297
Automatic call. See Automatic library search
Automatic library search, 147-149, 151, 155, 163
Backing store, 362, 368
Backus-Naur form. See BNF
BASE, 56, 60
Base class, 476, 484
Base register table, 109
Base-relative addressing
   and relocation, 64, 65
   in compilation, 290, 291, 293, 294
   PowerPC, 35, 108-109
   SIC, 9, 56
   SPARC, 32
   T3E, 39
   VAX, 23
   x86, 28, 103-104
Basic block, 276, 284, 307
Batch processing system, 328, 378
BCD, 27
Best-fit, 357
Beta testing, 486
Big-endian, 31, 34
Binary coded decimal. See BCD
Binary object program, 126
Binding, 158, 294, 302, 372
BIOS, 400
Bit mask, 132
Black-box testing, 485
Block (of data), 374
Block-structured language, 294-299
Blocked state, 341, 382, 383
Blocking (of data), 374-376
BNF, 227-230, 253
Opening a file, 373
Operating system
   goals, 325, 326, 328, 342, 377-379
   services, 326-327, 329-331
   types of, 325, 327-328
   user interface, 326, 328-329, 388
Operation code table, 50-51, 57
Operator-precedence parsing, 243-250, 260, 261. See also Precedence
OPTAB. See Operation code table
ORG, 73-74, 75, 98
P-code compiler, 299, 302-304, 313
P-machine, 302-303
Packed decimal, 22
   x86, 28
Precedence, 231, 243-244
Precedence matrix, 244-246, 250
Precedence relation, 244-245, 246
Preemptive scheduling, 344, 351
Priority scheduling, 342, 343, 344, 377, 404-405, 407-408
Privileged instructions, 331, 337, 358, 387, 391
Procedural design, 459-474, 486, 488
   module interfaces, 470-474
   principles of, 465
   using data flow diagram, 466-470
Process, 339-340, 404, 408. See also Thread
Process identifier, 337, 358
Process scheduling, 339-344, 351-354, 392, 401, 404-405, 407-408
Task. See Process
TD, 7, 18-19
Temporary variable, 267, 268, 275, 278, 289
Terminal symbol, 228
Test driver, 487, 490
Text editor, 431-440
   commands, 434-435
   document representation, 439
User mode, 331, 337, 387
VAX, 21-24
   addressing modes, 23
   data formats, 22-23
   input and output, 24
   instruction formats, 23
   instruction set, 23-24
   memory, 21-22
   registers, 22
Virtual address space, 21-22, 30, 34, 36, 405, 408
Virtual machine, 330, 390-392, 403-404
Virtual machine monitor, 390-392, 404
Virtual resource, 361
Virtual-to-real mapping, 362, 372. See also Dynamic address translation