High Level Programmers Guide To The 68000 1992
McCabe
High-Level Programmer's Guide to the 68000
Prentice Hall International Series in Computer Science
Francis G. McCabe
Prentice Hall
New York London Toronto Sydney Tokyo Singapore
First published 1992 by
Prentice Hall International (UK) Ltd
66 Wood Lane End, Hemel Hempstead
Hertfordshire HP2 4RG
A division of
Simon & Schuster International Group
McCabe, Francis G.
High-level programmer's guide to the 68000.
I. Title
005.13
ISBN 0-13-388034-6
1 2 3 4 5 96 95 94 93 92
Contents
Introduction ................................................................................................................1
1.1 Approach ........................................................................................................4
1.1.1 Structure of the book ...................................................................... 5
1.1.2 Exercise ............................................................................................... 7
Prolog........................................................................................................................184
10.1 Prolog data structures ........................................................................... 185
10.1.1 Unification in Prolog .................................................................. 190
Compiling unification ................................................................ 191
10.2 Controlling a Prolog execution ........................................................... 193
10.2.1 The Prolog evaluation stack ...................................................... 196
10.2.2 A sample evaluation ................................................................... 199
List of figures ....................................................................................................... xv

Aims and objectives
programs written in Pascal, LISP and Prolog are mapped onto the
computer's resources.
Just as we do not aim to teach assembler programming as another
programming language, neither do we aim to teach compiler construction.
Although there is some overlap, in the sense that a compiler construction
course would also cover details of code generation, our objective is to
understand the code that is generated by high quality compilers, not how
the code is actually generated.
CHAPTER ONE
Introduction
1.1 Approach
Given our motivation for introducing assembler language programming,
an obvious approach is to see how a modern programming language is
mapped onto a typical modern processor. We take Pascal as our primary
example of a 'modern' programming language and the 680x0 series of
processors as our modern computer.
Pascal is a reasonable choice as an application language even though it
may not be the most popular programming language for professional
programmers. This is because it contains features which are found in
many other languages - which you are likely to use - such as types,
records, arrays, recursion, scoped procedures etc. It is simple to see how
other programming languages like 'C' are represented by viewing them as
simple modifications to the basic scheme presented for Pascal.
As a target architecture, the 680x0 is appropriate since it is popular in
real computers and it has a clean straightforward architecture. In seeing
how Pascal is mapped to the 680x0 we can appreciate some of the
architectural features that we find in the 680x0 (for example the use of
separate address and data registers in the 680x0 register bank).
There are several models in the 68000 range of processors. Since we are
primarily concerned with the basic instructions common to the whole
range, we shall refer to the 680x0 when we mean any of them. Where a
difference is important (in that it allows us to choose a different
representation of a programming language construct) we will obviously
highlight it. For example, where restrictions apply to the 68010
or 68000, or where we want to discuss additional features available on the
68020 or 68030 which are not available on the base 68000 model, we
shall be more explicit.
Note that this is not a book about how to program; and we shall assume
that you are already familiar with, and are reasonably comfortable writing
programs in, Pascal. However, since it would be unnecessarily restrictive
to make the same assumption about the assembler level, we will give a
basic introduction to the architecture of the 680x0 series from a
programmer's point of view. This includes introducing the concepts of
registers, memory and so forth.
However, we will not be going into the details of the architecture of the
68000 that would be necessary for a computer designer. Thus we will not
be dealing with issues relating to interfacing the 68000 to memory; nor will
and we will see how programs like this can be represented by sequences of
680x0 assembler instructions such as:
languages which are considerably less widespread in their use than the
mainstream languages which are well represented by Pascal.
Most chapters are accompanied by exercises. These are intended to
deepen your understanding of the text. Some of the exercises lead into
areas which go beyond the scope of the book, and the reader is encouraged
to follow this lead.
Chapter 2 concentrates on the issues involved in representing numbers
in the machine. We look at how integers are represented, and the
fundamental nature of computer arithmetic. We also see how algorithms
for multiplication and division can be implemented. It is important to get
to grips with numbers in computers because they appear extremely
frequently within programs.
Quite apart from their role in application programs, multiplication and
division represent the most complex operations which are necessary to
support features of Pascal itself. Other, more complex operations such as
cosine and square root are important for applications but are not needed to
access and generate data structures.
Chapter 3 introduces the basic structure of the 680x0 series of processors.
The aim of this chapter is to familiarize you with the registers and
operations available to the assembler programmer. We also see exactly
what goes into an assembler program, and how it is assembled.
By the end of Chapters 2 and 3 you should be aware of the major
components of the 680x0 and the kind of data objects that are prevalent in
an assembler program. This provides a base for the following chapters in
which we explore the use of 680x0 features to support Pascal.
Chapter 4 looks at the representation of scalar values and expressions.
Some techniques for implementing expressions are presented based on
converting into reverse polish notation and using the system stack and
registers. We also look at the role of run-time errors in making sure that
programs only execute normally when the arithmetic performed is safe:
i.e. it is within the limits set by the program.
Chapter 5 concentrates on non-scalar or compound data structures. We
cover how records are laid out in memory and how fields of records are
accessed. We also see how arrays are mapped onto the machine and how
array elements are indexed and accessed.
The more advanced Pascal data structures such as packed structures and
sets are covered in Chapter 6. We illustrate the large difference between
accessing normal or unpacked structures and packed structures. In this
case, and generally, we show the instructions necessary to access structures
both in standard 68000 instructions and in 68020/68030 instructions (where
their additional instructions and addressing modes can make the tasks
simpler).
In Chapter 7 we tackle the issues of Pascal's control features. We see
how the various basic control structures such as conditional statements
and loop statements are supported by the 680x0. We conclude this chapter
with a section on performance oriented assembler programming.
The procedure and function statements merit a separate chapter:
Chapter 8. In this chapter we see how procedures are called, parameters
are passed to them and how local variables are allocated. This chapter also
examines the complexities of scoped procedures and implementing goto
in the context of scoped procedures.
In Chapter 9 we leave Pascal and look at a completely different style of
programming language, namely LISP. Implementing LISP brings
additional complications over Pascal; in particular we look at how
recursive data structures are represented, and at garbage collection.
Chapter 10 introduces some of the mechanisms needed to support
Prolog. Prolog and LISP are quite a lot further from the machine's level
than Pascal. This increased gap is reflected in the relatively long sequences
of instructions needed to implement simple LISP and Prolog programs.
The two appendices A and B provide reference material on the 680x0
machine. These are primarily intended to support the text, but should
also be helpful beyond the immediate scope of this book.
Appendix A summarises the addressing modes available on the 680x0
range, and Appendix B lists all the instructions which are referred to in
the text and others which are related. This is not a complete listing
of the 680x0 instruction set; however, it does include all the instructions
which are likely to be used in applications level programming. (There are
a number of instructions which are primarily of interest to systems
programmers and are not really relevant to normal programming.)
1.1.2 Exercise
Bits, bytes and numbers

In all, there are 2^8 or 256 possible patterns that 8 bits can take. By
associating each of these patterns with a number we can represent any one
of 256 different numbers in a byte, usually written as the range 0...255,
although the range could be represented by any range of 256 integers:
-128...127 or even 1012...1267. Although we might present the bit pattern
in a byte as a number it is not to be confused with the number itself: a bit
pattern is just that - a pattern.
Bytes are a convenient size because we can represent a character from
the ASCII character set (say) easily in a single byte and character processing
applications are extremely common and important in computing. It
should be said that some character sets - especially the Japanese characters
- require two bytes per character.
Larger groupings of bytes are also common: typically a modern
computer will group two bytes together to form a word (sometimes called
a half-word) and two 16 bit words together form a long word. A 16 bit
word can represent up to 2^16 numbers, for example in the range 0...65535.
A long word is 32 bits long and can represent 2^32 numbers, for example
integers in the range -2,147,483,648...2,147,483,647.
Having said that the fundamental structure in a computer is a bit
pattern, it is also fair to say that the representation of numbers and of
arithmetic play an extremely important role in computer applications.
This is not simply in obvious areas such as spreadsheet programs and
graphics but also within the execution of any program. For example, each
byte that is held in memory, or on disk, has an address associated with it.
That address is also a number; and address arithmetic is vital in accessing
data within the machine.
So, we shall explore some of the issues involved in the representation
of various kinds of numeric values. In particular we look at integers, how
negative numbers are arrived at and how arithmetic is performed using
strings of bits. We also explore other kinds of number systems such as
fixed point and floating point numbers. In this way, we can prepare
ourselves for the issues of representing data in general in computers.
sixty five thousand five hundred and thirty six    English decimal
65536                                              Decimal
$10000                                             Hexadecimal
10000000000000000B                                 Binary
LXVDXXXVI                                          Roman numerals
These expressions are all equivalent in the sense that they denote the
same number: they are numerals. A numeral is an expression which
denotes a number.
The decimal notation that we are familiar with is a shorthand notation
for an expansion into a sum of terms, each of which is a multiple of a
power of 10. Each digit in the numeral corresponds to the factor in a
different term in the expansion; where the position of the digit indicates
which power of 10 is referred to.
For example, we can expand the number 'sixty five thousand five
hundred and thirty six' into a sum of powers of 10; and we can also expand
it as a sum of powers of 16 or 2:
65536 = 6*10^4 + 5*10^3 + 5*10^2 + 3*10^1 + 6*10^0
      = 1*16^4 + 0*16^3 + 0*16^2 + 0*16^1 + 0*16^0
      = 1*2^16 + 0*2^15 + ... + 0*2^0

65535 = 6*10^4 + 5*10^3 + 5*10^2 + 3*10^1 + 5*10^0
      = 0*16^4 + 15*16^3 + 15*16^2 + 15*16^1 + 15*16^0
      = 0*2^16 + 1*2^15 + ... + 1*2^0
The so-called positional notation is used today, in preference over the
Roman system of numbers, because it is useful: we can easily perform
arithmetic on numbers by manipulating their decimal numerals. The
positional notation is so powerful that we can, for example, teach our
children mechanical techniques such as long multiplication and long
division to allow them to multiply and divide numbers beyond the scope
of simple mental recall.
[Figure: the binary expansion of the decimal number 164, alongside the Roman numerals XII, IX and VI]
x + -x = 0
2.2 Arithmetic in fixed length bit strings

In modulo N arithmetic all numbers are within the range 0...N-1,
including the negative numbers. This means that negative numbers
appear to map onto positive numbers; for example if we subtract 4 hours
from 2 o'clock we get 10 o'clock rather than -2 o'clock. To form the
negative of a number in modulo arithmetic we subtract it from the
modulus (12 in this case). For example, to get |-4|₁₂ we subtract 4 from 12,
which gives 8 as the negative of 4. This is because |4+8|₁₂ = |12|₁₂ = 0.
There are some particular properties of modulo numbers when
combined with a binary representation which make negative values easy
to determine. We saw above that we can represent numbers such as 35 as
expansions of powers of 2:
The negative of 35, |-35|₂₅₆, is 221, and if we look at the binary expansion
of 221 we get:

221 = 1*2^7 + 1*2^6 + 0*2^5 + 1*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
    = 11011101B

Similarly, in 9 bit arithmetic (i.e. arithmetic modulo 512), |-35|₅₁₂ = 477:

477 = 1*2^8 + 1*2^7 + 1*2^6 + 0*2^5 + 1*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
    = 111011101B
This expansion has the same terms as the expansion for 221, except for an
additional term at the beginning. We can repeat this for any number of
terms to get an expansion of -35 in any number of bits, so in 16 bit
arithmetic (i.e. arithmetic modulo 65536), |-35|₆₅₅₃₆ = 65501, and the
expansion for 65501 is:

65501 = 1*2^15 + 1*2^14 + ... + 1*2^8 + 1*2^7 + 1*2^6 + 0*2^5 + 1*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
      = 1111111111011101B
The rightmost 8 bits are identical in the representations of 65501, 477 and
221. This is true no matter how many bits we choose to represent our
numbers provided that there are sufficient bits to represent them and their
negatives. We would get into trouble, for example, trying to represent -35
in a 6 bit system. In modulo 64 arithmetic, -35 is equivalent to 29, and this
means that we cannot separate the numbers 29 and -35 in a 6 bit system
(just as we cannot distinguish -35 and 221 in an 8 bit system); we would
have to choose which one the pattern represented. But there is not much
point in having a system which allows us to represent -35 but not to
1*2^7 + 1*2^6 + 0*2^5 + 1*2^4 + 1*2^3 + 1*2^2 + 0*2^1 + 1*2^0
which is the expansion for 221, which we already know is equivalent to -35
in modulo 256 arithmetic.
A good question to ask is given that we can represent -128, what
happens to 128? In fact the 8 bit pattern for -128 is identical to the pattern
for 128; this means that we cannot represent both in 8 bits. All the
negative numbers in the range -127...-1 have the most significant bit set in
their binary numerals; 128 and -128 also have the most significant bit set.
If we choose -128 to be in accord with the other negative numbers, then we
have a simple test for negative numbers: their most significant bit is set.
The signed form of representation is called 2's complement, and
arithmetic using this representation is called 2's complement arithmetic.
Nearly all modern computers use this type of arithmetic as the basic form
of integer arithmetic. Integer arithmetic may be supplemented by some
form of fractional arithmetic, typically floating point, but even that is
sometimes based on 2's complement arithmetic.
In a computer we often mix our use of numbers - sometimes
we regard a bit string as representing unsigned numbers, and at other
times it is interpreted as signed. In both forms the interpretation is the
same for positive numbers (i.e. numbers in the range 0 ... 127 for byte
arithmetic). In fact, the operations needed to perform simple arithmetic
This expression, involving two shifts and an add, is often twice as fast
as using a general purpose multiply instruction to perform the
same multiplication.
This amounts to a shift of the multiplier to the right with the remainder
term dropping from the right hand end of the number. We use the
remainder term to decide whether or not to add the current I<<k term to
the result so far. So, each iteration of the loop performs three operations:
M := M div 2; C := remainder;
This can be done in a single step on most computers because a
division by 2 can be achieved with a right shift. The bit pattern in the
number is shifted one position to the right; the leftmost bit position is
filled with a zero, and the rightmost bit which is lost from the bit
pattern is typically stored in a special 1 bit register or flag. We can test
this C flag and ...
A := A + I;
I := I << 1.
We perform this loop for however many significant bits there are in the
expansion of the multiplier; and the algorithm is initialized by setting the
answer A to 0. Figure 2.3 illustrates how the algorithm applies if we let
M=5 and I=10:

[Figure 2.3: the shift-and-add multiplication of M=5 by I=10; the answer accumulates 32+16+2 = 50]
If the numbers involved are 32 bits long (say) then we know that after
doing the loop 32 times there can be no more significant bits in the
multiplier, so the loop is performed no more than 32 times.
In general, the result of the multiplication may have as many significant
bits in it as are in the multiplier and multiplicand together - that is why
the multiply instructions on a processor tend to produce double precision
answers: a 16 bit multiply generates a 32 bit answer.
x*y = sign(x)*sign(y)*abs(x)*abs(y)
We can separately multiply the signs and the absolute quantities of x and
y. The sign multiplication is a simple calculation:
1989 ÷ 16

which, using standard long division, has as its first step the division of 19
by 16:
      1
16 ) 1989
     16
      3
The quotient of this division step - 1 - is also the first digit of the quotient
of the whole calculation, and the remainder - 3 - is used in the rest of the
division. The next step in the long division is to divide 38 by 16:
     12
16 ) 1989
     16
     38
     32
      6
run out of digits. The remainder of the last sub-division is the remainder
of the whole division:
    124
16 ) 1989
     16
     38          1989 ÷ 16 = 124 remainder 5
     32
     69
     64
      5
We can see why long division works by looking at the first step a little
more closely. In particular, to divide 1989 by 16 we split it into a most
significant part and a least significant part:

1989 ÷ 16 = (19*100 + 89) ÷ 16
          = (1 + 3 ÷ 16)*100 + 89 ÷ 16
          = 100 + 300 ÷ 16 + 89 ÷ 16
          = 100 + 389 ÷ 16
we can choose i so that dividing Da by the divisor V results in a single digit
quotient:
We can now perform a single step of the long division of D ÷ V abstractly:

D ÷ V = (Da*10^i + Db) ÷ V
      = (Da div V)*10^i + ((Da rem V)*10^i + Db) ÷ V

where Da div V refers to the quotient of Da ÷ V and Da rem V refers to the
remainder. Referring to our example, we get:

where d = the most significant digit of Db, and Db' is the rest of Db.
When, finally, Db' becomes zero, and there are no more digits left in the
dividend, the corresponding Da' is the remainder of the whole
division and the quotient can be extracted from the intermediate quotient
digits computed along the way.
In each step of the long division we can concern ourselves with only the
most significant part of the dividend - in particular we can rely on the
property that the quotient of the sub-calculation is a single digit. The full
quotient is calculated one digit at a time. For school children of all ages it
is easier to perform such divisions than full divisions involving
multi-digit quotients, and this is equally true for computers.
We can apply the same kind of reasoning to long division using binary
expansions for numbers as well as for decimal expansions. As with binary
multiplication, the use of binary expansions considerably simplifies the
procedures needed to perform long division.
If we know that the quotient of Da ÷ V is a single binary digit - 0 or 1 -
then we also know that the quotient digit is 1 precisely when V ≤ Da, and
0 otherwise.
10000B ) 11111000101B
The very first step in this long division would be to attempt the division:
1+1000B, which is, of course, 0. Thus the leading binary digit of the
quotient is 0. The next step would be to add an extra digit from the rest of
the dividend and try again: 11B+1000B, which also has a zero quotient.
The first step which results in a non-zero quotient digit is:
   00001
10000B ) 11111000101B
         10000
          1111
In the step that follows this one, we can compute the next dividend-so-far
by left shifting the remainder by one bit. The least significant bit of the
new dividend-so-far can be obtained from the most significant bit of the
rest-of-the-dividend. This is done by shifting the rest-of-the-dividend to
the left and extracting the bit that 'drops off'; this bit is then inserted into
the dividend-so-far as it is shifted to the left. The effect of this bit
twiddling is to bring down the next binary digit from the rest-of-dividend
into the current dividend.
The new dividend looks like:
   000010
10000B ) 11111000101B
         10000
         11110
          1110
   00001111100    ← final quotient
10000B ) 11111000101B
         10000
         11110
         11100
         11000
         10001
            10
           101    ← final remainder
[Figure: the quotient of the sub-division Q, the divisor V, the dividend-so-far, and the rest of the dividend, laid out as registers]
If we want to accumulate the digits of the quotient we can do so by setting
the least significant bit of a quotient register to the quotient digit obtained
at each step, and then left shifting it along with the dividend-so-far and
rest-of-dividend between steps. The rest of the complete division, in
terms of comparisons and left shifting can be seen though the sequence:
[Figure: the sequence of compare-and-shift steps for the division, showing the registers Q, V, R and D at each step. The final remainder is 5 and the final quotient is 64+32+16+8+4 = 124.]
When the last digit has been shifted out of the dividend the
algorithm stops; the quotient register contains the full quotient, and
the remainder is held in the dividend-so-far register. The complete
division algorithm is:

Set R to zero and set Q to zero; set D to the dividend and V to the divisor.
1) Left shift D by one bit, shifting its most significant bit into X;
   Left shift R by one bit, shifting X into R's least significant bit.
2) If V ≤ R then
      set X to 1, and subtract V from R
   else
      set X to 0.
2.2.4 Exercises
2. How many bits are needed to faithfully represent all the integers in
the range 20 ... 26?
a = a0 + E*a1

and

b = b0 + E*b1

Since each of a0, a1, b0 and b1 has half as many bits as a and b
respectively, the complexity of each of these multiplications is 0.25*O(n^2),
which does not achieve very much since we have four of them to do.
However, we can save one multiplication based on the observation
that
[Figure: a 32 bit fixed point number, with the binary point between the 16 bit integral part and the 16 bit fraction part]
The number of bits that we allocate to the integral part of a fixed point
number determines the range of numbers that can be represented; the
number of bits allocated to the fraction part determines the accuracy of the
resulting number. In the example here, where we have allocated 16 bits
for the integral part and 16 bits for the fraction part, we can approximate
numbers in the range -32768...32767 with a fraction accuracy of one part in
65536.
Notice that the fixed point concept includes the case of integers: simply
set the number of bits allocated to the fractional part to zero and the result
is an integer.
Binary fractions
Recall that in the standard positional notation for integers, each bit stands
for the coefficient of a power term. The same applies for the fractional part
0.3125 = 0.0101B
0.4 = 0*2^0 + 0*2^-1 + 1*2^-2 + 1*2^-3 + 0*2^-4 + 0*2^-5 + 1*2^-6 + 1*2^-7 ...
    = 0.011001100110011...B
On the other hand, there are no finite binary fractions which cannot be
represented exactly as a finite decimal fraction. (This is because any term
of the form 2-x can be expressed as a finite sum of powers of 10.)
With this in mind we can perform our fixed point arithmetic separately
on the significant integer and the powers of 10:
Thus, to add two fixed point numbers we can simply add up their bit
patterns as though they represented integers. As we shall see, this
procedure is somewhat simpler than that for adding two floating point
numbers and this simplicity is the reason that fixed point numbers are
computationally efficient.
Multiplying two fixed point numbers is slightly more complicated than
adding them because we are also required to multiply the two powers of
10:
Since this product has four digits in the fraction and in our fixed point
format we only allow two decimal digits for the fraction, we must adjust
the result. This adjustment is accomplished by dividing the result of the
integer multiplication by 100 and ignoring the remainder. The result of
this division is that we 'lose' the two least significant digits to produce an
answer with the same represented accuracy as the two operands:
This loss of accuracy is inevitable with a fixed point number system since
we must always ensure that the result has its decimal point in a fixed
place.
The arithmetic for binary fixed point numbers is exactly the same as for
decimal fixed point numbers; except that in order to adjust the result after
a multiplication (and division) we have to divide by a power of 2 rather
than a power of 10. Such a division is easily achieved on a computer by a
shift instruction.
Fixed point numbers are only slightly more complicated to manipulate
than integers. As a result they are very fast, considerably faster on most
computers than floating point numbers for example. For those
applications where it is relatively easy to predict the required range of
numbers, and that range is not very great, fixed point numbers are
very suitable.
There are many applications that fall into that category. For example, in
a real-time radar tracking application, the input data is likely to consist of
pairs of angles and distances. The angular data will consist of numbers in
the range 0 ... 360 and will therefore all be of a similar size. The distance
information is likely to have a larger range but may still be in a relatively
manageable band. The effect of this constraint is that a fixed point system
may well be sufficient to represent and manipulate the angle and distance
data.
On the other hand, given the loss of accuracy that results when
multiplying two fixed point numbers together, and given the difficulty of
predicting the suitable allocation of bits to the integer and fraction part of a
fixed point number, few programming languages provide direct support
for fixed point numbers. Instead, effort is concentrated on floating point
numbers which are more stable in their accuracy.
[Figure: a floating point number stored as a mantissa field and an exponent field]
We call the string of digits which form the significant digits of the number
the mantissa or fraction and the number which indicates the position of
the floating point the exponent.
One immediate point to notice here is that the binary point need not be
within the mantissa: it can be outside it. That is, we can also represent
very large numbers (by having the pointer to the right of the mantissa)
and numbers which are close to zero (by having the pointer to the left of
the mantissa).
An alternative way of understanding the binary point pointer is that it is
a multiplier: the floating point number is represented as a mantissa
multiplied by a power of 2 (or 10 in the case of decimal floating point
2.3 Other kinds of numerals 29
[Figure: the same significant digits, 1234, stored at different positions within the mantissa]
so that the most significant digit is at one end of the mantissa (usually the
left end). This way we can leave room for the number of digits to expand:
[Figure: the normalized mantissa, 123400000000000, with the most significant digit at the left end and room to expand on the right]
0.567*10^1 = 0.0567*10^2

Now if we add up the numbers we get 0.1801*10^2; but, more importantly,
any digits which might be lost by dividing the mantissa by 10 are the least
significant digits from the smaller of the two numbers. This will
minimize any errors arising from the addition.
Many floating point systems further reduce any errors by adding one or
more guard digits. Guard digits are not stored with the number but are
used to collect the last digits that were shifted out of the number as a result
of aligning it in order to perform the addition. These guard digits are used
during the calculation and only afterwards, when the result is stored in
the normal format, are the guard digits finally lost.
After adding the two mantissae, it is possible that the result is no longer
normalized. Therefore, after the addition has taken place the number
must be re-normalized. This could mean that the guard digits reappear if
normalization implies that the mantissa is shifted to the right. If
normalization means that the mantissa must be left shifted - i.e. if the
addition resulted in a carry being generated - then the guard digits are
really lost. That should not concern us so much since we are storing the
result as accurately as possible in the given number of bits.
Floating point multiplication is less complicated than addition since we
do not need to align the numbers before performing the multiplication.
Instead, we can separately multiply the mantissae and add the exponents:
2.3.4 Exercises
3. Prove that any number of the form 2^-x - where x is a positive integer
- can be represented as a finite decimal fraction.
CHAPTER THREE

The 680x0 programmer's model
compared to memory. For this reason, registers are also used to hold
frequently accessed values and variables.
Given that there are only a few registers - the 680x0 has 16
general registers, and processors rarely have more than 32 - and that
registers can be accessed quickly, there is a strong desire to use registers
to represent a Pascal program's variables; these register variables must
therefore change their meaning from time to time within the program. It
is one of the assembler programmer's greatest tasks to keep track of the
precise meaning of a register in the various parts of the program.
a0    d0
a1    d1
a2    d2
a3    d3
a4    d4
a5    d5
a6    d6        XNZVC   Condition codes register
a7    d7        PC      Program counter

Address regs    Data regs
The split of the 16 registers into 8 data and 8 address registers roughly
reflects a separation of address and data found in Pascal programs. The
data manipulated by an application program has two aspects: literal
data values - such as the characters in a string or the numbers in an
expression - and the locations or addresses of those values in memory.
The two kinds of data require different kinds of operation, and that is
reflected in their separation in the 680x0.
The data registers are used to hold the arguments and results of
operations. So, for example, nearly every arithmetic operation requires at
least one of the operands to be in a data register, with the other being in
memory or in another data register.
The data registers can be accessed and manipulated as 32 bit registers, or
as 8 or 16 bit registers. In the latter cases the least significant 8 or 16 bits of
the data register are used and/or affected by an operation. This flexibility
reflects the different natures of the data commonly processed: 8 bits are
often used for text processing applications, for example.
The address registers are most often referred to in the calculation of
where data is. So we shall see below - as in the various addressing modes
of the 680x0 - that the address registers are often used to establish where
the various values are located in memory. It is as though they form a set
of 8 pointer variables where the data registers are integer variables. The
address registers also have a limited computational power associated with
them: mainly the ability to add and subtract into them.
One of the address registers - a7 - has an additional interpretation: it is
the system stack pointer. It is sometimes also referred to as sp. This
pointer is used by the processor as the address to which to save the state of
the machine during certain instructions. All of the address registers can be
used to implement stacks, but a 7 is used by the processor when it needs a
stack.
So, when a subroutine or function is called, the stack pointer register
indicates where to store the address of the next instruction to be executed
so that it can be returned to when the subroutine has completed. We shall
see later that we can use this stack for many other purposes: we can use it
for allocating space for local variables and for holding temporary values
during complex computations.
The 680x0 also has a special shadow register of the a7 register. This
shadow a7 register is used by the operating system as a second stack
pointer during the processing of special events such as interrupt
processing and virtual memory handling. This allows the operating
system to provide a separate memory area which is guaranteed to be
sufficient to process interrupts and other operating system events without
cluttering up the user's workspace.
In fact, some of the models in the 680x0 range have many other
specialized registers. The 68030 has some 30 odd further registers which
are used to implement operating system functions such as virtual
memory. If a floating point co-processor is attached then there are 8 more
floating point registers and three more control registers making a grand
total of 64 registers in the 68030. However we will only be concerned with
18 of them - the 16 address and data registers, the condition codes register
and the program counter.
[Figure: the condition codes register - the five flags X (extended), N
(negative), Z (zero), V (overflow) and C (carry/borrow).]
The Carry flag is set whenever the last arithmetic operation resulted in a
value which could not be correctly represented in 8/16/32 bit modulo
arithmetic (depending on the size of the operation). For example, in 8 bit
arithmetic, if we add 100 to 200, then the result will be 44; this is because
the true answer - 300 - is represented as 44 in modulo 256 arithmetic. The
fact that there was an overflow is signalled by the carry flag being set. The
carry flag is also used in the shift operations where it holds the last bit that
was shifted out.
The overflow flag is set when the last operation resulted in a value that
could not be faithfully represented as a signed number in 2's complement.
For example, if we add 100 to 50 in 8 bit arithmetic then the result will be
150. But this is a negative number in 2's complement and so the overflow
flag is set.
The oVerflow flag is important for calculations involving signed
arithmetic whereas the Carry flag reflects the result of unsigned
arithmetic.
The Zero flag is set if the last data value processed was zero. This is often
used in comparisons, for example. A comparison is implemented as a
subtraction where the result is used only to set flags. If the result of the
subtraction was zero then the two values were equal.
The Negative flag is set whenever the last value computed was a
negative number in 2's complement. In practice that means that the
Negative flag tracks the most significant bit (i.e. the sign bit) of values
computed in the processor.
The eXtend flag is similar to the Carry flag except that it is used in
implementing multi-precision arithmetic. Therefore it is also input to
certain instructions as well as being generated by them.
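The way the Carry, oVerflow, Negative and Zero flags follow from an 8 bit addition can be sketched in Python. This is an illustration of the arithmetic only, not 680x0 code; the function name add8_flags is our own, and it reproduces the two examples from the text:

```python
# Sketch of how the CCR flags are derived from an 8 bit addition.
# add8_flags is an illustrative name, not anything from the 680x0.

def add8_flags(a, b):
    """Add two 8 bit values; return the result and the C, V, N, Z flags."""
    full = (a & 0xFF) + (b & 0xFF)    # the true sum, which may need 9 bits
    result = full & 0xFF              # what 8 bit modulo arithmetic keeps
    carry = full > 0xFF               # C: the true sum did not fit in 8 bits
    negative = (result & 0x80) != 0   # N: tracks the sign (top) bit
    zero = result == 0                # Z: the result was zero
    # V: 2's complement overflow - both operands have the same sign but
    # the result has the opposite sign.
    same_sign = (a & 0x80) == (b & 0x80)
    overflow = same_sign and ((a & 0x80) != (result & 0x80))
    return result, carry, overflow, negative, zero

print(add8_flags(100, 200))  # 100+200 gives 44 modulo 256, with Carry set
print(add8_flags(100, 50))   # 100+50 gives 150, a 'negative' pattern: oVerflow set
```

Note that 100+200 sets Carry but not oVerflow (as signed numbers this is 100 + (-56) = 44, which is correct), while 100+50 sets oVerflow but not Carry.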
The individual flags in the CCR are rarely used on their own. Instead
various combinations of them are used which represent more meaningful
conditions. These are the conditions which are directly available to the
programmer in instructions such as branch conditional (bee) where cc
refers to one of the 16 conditions listed below:
Unsigned arithmetic conditions:
    CC  carry clear          CS  carry set
    HI  high                 LS  low or same
    EQ  equal/zero           NE  not equal/non-zero
    F   false/never          T   true/always

Signed arithmetic conditions:
    MI  minus                PL  plus
    VC  overflow clear       VS  overflow set
    GE  greater or equal     LT  less than
    GT  greater than         LE  less or equal
parts of the memory array - some of it contains the code itself, other parts
of it contain the data and still other parts belong to the operating system ...
However, from the point of view of the machine itself, there is only a
single interpretation of the memory - it is an array of fixed length bit
patterns arranged as words or cells.
Each memory cell has an index which we call its address. The address of
a memory cell is not part of the memory cell itself, but it allows us to
uniquely identify the cell.
Addresses, like array indices, are just numbers. Such numbers can be
stored in memory cells just as any numbers can be. This allows us to have
some cells 'point' to other cells by allowing them to contain the number of
the target cell's address:
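A small Python sketch can stand in for the picture here - memory modelled as a simple list, purely for illustration:

```python
# Memory modelled as an array of cells; an address is just an index.
memory = [0] * 16

memory[4] = 99    # the cell at address 4 holds a data value
memory[7] = 4     # the cell at address 7 holds the *address* of cell 4,
                  # i.e. cell 7 'points' to cell 4

# Following the pointer: read cell 7 to get an address, then read that cell.
target = memory[memory[7]]
print(target)     # 99
```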
Consider a Pascal statement such as:

a:=a+b*c;
The only point to note here is that a single Pascal statement often requires
many assembler instructions to implement. The result is that assembler
programs usually contain many more lines than their Pascal equivalents.
Apart from the difference in the granularity of assembler statements
compared to statements in high-level languages, the other main difference
is the restricted nature of the data that can be manipulated directly. A
Pascal program variable can range in type from a single boolean value to
complex structures such as arrays of records each containing a sub-array ...
By contrast, in an assembler program we are always dealing with the
contents of registers or of individual memory locations.
3.3 Simple assembler programming

Perhaps one of the simplest assembler programs we might think of takes
two numbers, adds them together and places the result in a third location.
We could do this with the instructions:
move.w 1000,d1
add.w  d1,d0
move.w d0,1002

The first instruction moves the contents of the word whose location in
memory is 1000 into the data register d1. The size specifier '.w' indicates
that a single 16 bit word is transferred:
move.w 1000,d1

[Figure: the word at memory location 1000, containing 35, is copied into
data register d1; register d0 holds -23.]
The second instruction adds the contents of data register d1 to the register
d0, overwriting it in the process. So, if d0 previously contained -23, then
after the add instruction it will contain 12:
add.w d1,d0

[Figure: after the addition, register d0 contains 12 - the sum of 35 and -23
- and d1 still contains 35.]
Again, the '.w' specifier in the add instruction indicates that we want to
use 16 bit addition. The final move instruction overwrites the memory
contents at location 1002 with the contents of register d0, i.e. with the
result of the addition:
move.w d0,1002

[Figure: before and after - the word at location 1002, previously undefined,
now contains 12, the result of the addition; the registers are unchanged.]
[Figure: the general format of a data manipulation instruction -
op.s source,dest - showing the operation mnemonic, the size specifier, the
source operand specifier and the destination operand specifier.]
[Figure: the general format of a program control instruction - op.s label -
showing the operation mnemonic and the program label specifier.]
The program control instructions may also have a size specifier, in which
case it specifies whether a short or long jump (goto) is to be taken. The
existence of this specifier allows the programmer to select the size of
instruction needed to encode a particular jump. Since the program
control instructions are extremely frequent in assembler programs most
computer designers attempt to optimise their representation - a short
branch is shorter (occupies less space) than a long one. Some assemblers
can calculate this specifier automatically since the target address or label is
always known; however most assemblers require at least some assistance.
46 The 680x0 programmer's model
program, this listing would not only identify any errors produced, but
would also indicate the actual bit patterns generated for each instruction.
Listing formats vary between assemblers; however, a typical assembler
might generate the following for our simple three line program:
1:00000000            list 1
2:
3:                    ; a sample program
4:
5:00000000 323803E8   move.w 1000,d1
6:00000004 D041       add.w  d1,d0
7:00000006 31C003EA   move.w d0,1002
8:0000000A            end

Code size = 10
Number of errors = 0, number of warnings = 0
Figure 3.6 A sample listing of an assembled program
If we look at one of the lines in this listing in more detail, we can see what
information the assembler produces:
[Figure: anatomy of listing line 5 - the source line number (5), the address
of the instruction (00000000), the instruction words generated (323803E8),
and the echoed source line move.w 1000,d1 with its opcode mnemonic,
source operand and destination operand.]
The original source line echoes the exact contents of the file being
assembled. This is further identified by the number of the line in the file.
Immediately adjacent to the line number is the address (in hexadecimal)
into which this instruction is being assembled. On a modern computer,
this address is rarely the real address of the instruction in the memory, but
rather its relative location within the program. When the assembled
program is loaded into the machine, these addresses are adjusted to
indicate where it has been loaded into the memory.
The hexadecimal number which follows the instruction address is the
actual instruction word(s) generated by the assembler. This number will
0011001000111000
The first 16 bit word of an instruction determines both the opcode and the
major aspects of operand addressing used in the instruction. In this case,
the two most significant bits of the instruction word are zero, which
indicates a move instruction. The next two bits determine the size of the
move (01 would signify a byte operation, 10 would signify a long
operation and 00 signifies a completely different instruction). The next 6
bits determine the destination of the move - dl here - and the least
significant bits determine that a move from an actual memory location is
specified. The memory address to read from is in the next word of the
instruction.
The length of a 680x0 instruction depends on the complexity of the
addressing being specified: the shortest instructions are one 16 bit word,
whereas the longest instruction on the 68020 occupies 11 words!
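The bit fields in that first instruction word can be pulled apart with a few shifts and masks. A Python sketch, assuming the 0x3238 word for move.w 1000,d1 shown in the listing above:

```python
# First instruction word of move.w 1000,d1 (hexadecimal 3238).
word = 0x3238

opcode = (word >> 14) & 0b11    # top two bits: 00 indicates a move
size   = (word >> 12) & 0b11    # 01 = byte, 11 = word, 10 = long
dest   = (word >> 9)  & 0b111   # destination register number (1 = d1)

print(opcode, size, dest)       # 0 3 1
```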
cmp.b #32,d0
The operand #32 is the source operand and it is an immediate operand
(indicated by the presence of the '#' character in front of the literal
number), and the destination operand - dO - is an example of the register
direct addressing mode.
The effect of executing this cmp instruction is to compare data register
dO with the literal quantity 32. Only the least significant 8 bits in dO take
part in the comparison since the size specifier is ' . b'.
cmp.b #32,d0

[Figure: after cmp.b #32,d0 the Z flag is set if the lowest byte in register d0
is an ASCII space; the other flags will be set accordingly.]
store   ds.l   1
        move.l store,d4

The ds.l directive does not generate any 680x0 instructions - its function
is to ensure that the assembler reserves some space (one long word in this
case). There is no requirement that this long word in memory has an
initial value. If it is required for a variable to have an initial value, or if it
is necessary to have a constant literal in the assembler program (a literal
with a specific location as well as value) then we use the related directive:
the define constant - dc - statement. For example, a statement such as:
move.w 1000,d1
add.w  d1,d0
move.w d0,1002
rts
3.3.3 Exercises
move.l 10,d0

and

move.l #10,d0

move.l #100000,d0
move.l #200000,d1
move.w d0,d1
CHAPTER FOUR
Representing Pascal expressions

Data values in Pascal are built from a set of primitive data types - called
scalar types - and a set of methods for combining and structuring the data.
So we have, for example, integers, 'real' numbers and characters as scalar
types in Pascal. These can be combined into arrays, records and sets or
various combinations of these.
Intimately associated with these data types are the variables which can
have them as values. A Pascal 'variable' is best thought of as a named
location in the computer's memory. The different values that a variable
can take on, through being assigned to, are reflected by the different
contents of the variable's location. Furthermore the structure of a
variable's location will depend on its type.
Apart from simply representing variables and values, it is also necessary
to show how expressions can be computed, how variables can be assigned
to, and how components of complex data structures can be accessed and
updated. In effect, this chapter is concerned with the implementation of a
single type of Pascal statement: the assignment statement; in particular, we
concentrate on expressions involving scalar values and variables.
Although we will look in later chapters at the explicit control aspect of
programs, it is fair to say that we are also interested in control in this
chapter. On the whole though, the control referred to whilst accessing
data is automatic: i.e. it is only indirectly specified by the programmer
through the use of expressions.
From the point of view of the Pascal and assembler programmer a scalar
quantity is also treated as a whole and its structure is not normally
inspected. In practice a stricter interpretation of scalar is also used in
computing - a scalar quantity is one which can reside in a machine
register. This is a more restricted view than the mathematical definition;
for example, the set of integers has infinitely many elements, but a 680x0
register can only handle integers in the range -2,147,483,648 ... 2,147,483,647;
which though it is large it is not an infinite range: larger numbers have to
be constructed from sequences of integers within the range that can be
handled directly.
There are three different types of scalar in the Pascal language - ordinals
which includes sub-ranges of the integers, characters and booleans;
pointers; and the real numbers.
In fact, computer 'real' numbers are not Real but floating point numbers
which are really fractions. There are an uncountably infinite number of
reals, most of which would require an infinitely large amount of
computer storage to represent - just one real number, π, has infinitely
many digits in its decimal (or binary) expansion. Therefore it is not really
practical to have real real numbers!
However reals are a primitive type in Pascal and are treated as scalar. In
assembler, floating point numbers are not primitive as they have an
internal structure consisting of exponent, mantissa and sign bit.
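That internal structure can be seen by pulling a floating point number apart. A Python sketch using the standard math.frexp function (the name decompose is our own):

```python
import math

# Split a floating point number into the three fields mentioned in the
# text - sign, mantissa (fraction) and exponent - so that
# value = sign * mantissa * 2**exponent.
def decompose(value):
    mantissa, exponent = math.frexp(abs(value))   # 0.5 <= mantissa < 1
    sign = -1 if value < 0 else 1
    return sign, mantissa, exponent

print(decompose(-12.0))   # (-1, 0.75, 4): -12 = -(0.75 * 2**4)
```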
4.1.1 Ordinals
An ordinal scalar type expresses a sub-range of some other (possibly
infinite) type. The most fundamental example of this is the type
consisting of the representable integers; other examples include the
characters and the enumerated types. Although intended for different
uses, the different types of ordinal types are handled by Pascal programs in
similar ways and can all be represented in the machine using common
techniques.
These are some example Pascal definitions of ordinal types, together
with the typical number of bits required to represent values in them:
which is not strictly legal Pascal, is intended to denote the most negative
integer.
Since the smallest addressable quantity in the 680x0 is a byte, it is
convenient to allocate space for variables in multiples of bytes. If a scalar
value requires less than 8 bits to represent it then a byte is used none-the-
less. If a scalar requires between 8 and 16 bits to represent it then a 16 bit
word will be used. For example, a number of the sub-range type -512 ..512
occupies the same space as a number of sub-range type -32768 .. 32767.
Similarly if a scalar quantity needs more than 16 bits then the whole 32 bits
are used to represent values of that type.
Later we shall investigate packed data structures where it is essential to
use the least possible amount of space. In a packed structure we make an
effort to use only the absolute minimum number of bits needed to
represent each value; for example, if data items of a particular type need
only 9 bits to represent them then 9 bits are used. This will mean that there
may be more than one data item represented within a word, and even that
an individual item may be spread across word boundaries. However,
since the processor does not easily access such odd size quantities there is a
consequent increase in complexity in accessing packed data.
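Packing and unpacking 9 bit items, including items that straddle word boundaries, can be sketched in Python. This is illustrative only - real packed-record layouts are compiler dependent - and the names pack and unpack are our own:

```python
# Pack a sequence of 9 bit values into one contiguous bit string (held as
# a Python integer); individual items may straddle 16 bit word boundaries.

ITEM_BITS = 9

def pack(items):
    packed = 0
    for i, v in enumerate(items):
        packed |= (v & 0x1FF) << (i * ITEM_BITS)   # 0x1FF masks to 9 bits
    return packed

def unpack(packed, index):
    # Shift the wanted item down to the bottom and mask off its 9 bits.
    return (packed >> (index * ITEM_BITS)) & 0x1FF

values = [300, 7, 511]                     # each fits in 9 bits (0..511)
p = pack(values)
print([unpack(p, i) for i in range(3)])    # [300, 7, 511]
```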
Characters
Logically the character scalar type is also a sub-range: characters from
systems such as the ASCII characters form a subset of all the possible
characters. There are several different character sets in common use
including ASCII, EBCDIC and the various Japanese Katakana/Kanji
character sets; some of these character sets are more common than others.
There are 128 characters in the standard ASCII character set, including 32
control characters. Since we can represent 128 patterns in 7 bits we
typically use a single byte to represent a given character. There are some
variations on the 7 bit ASCII character set: for example there is an 8 bit
ASCII character set, the IBM PC character set and the Apple Macintosh®
character set.
Character processing is extremely important in computing. Characters
and the ability to manipulate them are essential in applications ranging
from word processing to databases. It can be argued that the primary
motivation to have a byte oriented memory structure in processors such
as the 680x0 is the desire to optimise character and string processing.
4.1.2 Pointers
A pointer is a quantity which is a reference to another value. We would
call it a scalar value since it has no internal structure even though we can
use a pointer to access the value identified by it. Pointers are also very
important in programming, although a major desire in the design of
programming languages is to make them either transparent - as in the case
of LISP and Prolog - or to make their definition and use disciplined - as is
attempted in Pascal.
On a computer such as the 680x0 a pointer is represented as a memory
address; which is of course a number which can be placed into an address
register. The effective size of this number is different in the 68000 and in
the 68020 and 68030 (24 bits and 32 bits respectively). However, most
compilers devote a full 32 bit long word to a pointer whether the target
machine is a 68000 or a 68020.
The add instruction adds the integers in its source and destination - 1
and i respectively - and stores the result in the destination, i. The effect
4.2 Scalar expressions
The parse tree shown in Figure 4.1 highlights the dependencies between
the various parts of the expression and the variables and constants
involved. It also shows where there are intermediate points within the
expression which are not directly associated with a variable or numeric
value. We shall see that although these points have no identifiers
associated with them in the Pascal expression, we do have to explicitly
identify them when we come to map the expression into assembler
instructions.
x+y*z
becomes in reverse polish form:
[x y z * +]

The meaning of this expression is

'apply * to y and z, and apply + to the result of that and x'
More complicated reverse polish form expressions often have several
operators in sequence being applied to larger and larger sub-expressions.
Our initial expression would be written in reverse polish form as:
[x y * z 2 ** + x y - /]
x+y
is converted to: [x y +]
A more complex expression is converted recursively, by first converting
sub-expressions:
u * v + x * y
=> [u v *] + x * y
=> [u v *] + [x y *]
=> [u v * x y * +]
I.e. for any expression of the form:
L op R
we recursively map L to [L], and we map R to [R], and then move the
operator to the end to get [L R op].
If we apply the conversion to our original expression we get the
transformations:
(x*y+z**2)/(x-y)
=> ([x y *] + z**2)/(x-y)
=> ([x y *] + [z 2 **])/(x-y)
=> ([x y *] + [z 2 **])/[x y -]
=> [x y * z 2 ** +]/[x y -]
=> [x y * z 2 ** + x y - /]
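The recursive rule - convert L, convert R, then move the operator to the end - can be sketched in Python over a small expression-tree representation (tuples; the representation and the name to_rpn are ours, not Pascal's):

```python
# An expression is either a leaf (a variable name or number) or a tuple
# (op, left, right).  to_rpn applies the rule: convert the left operand,
# convert the right operand, then append the operator.

def to_rpn(e):
    if not isinstance(e, tuple):
        return [e]                 # a leaf is already in reverse polish form
    op, left, right = e
    return to_rpn(left) + to_rpn(right) + [op]

# (x*y + z**2) / (x-y)
expr = ('/', ('+', ('*', 'x', 'y'), ('**', 'z', 2)), ('-', 'x', 'y'))
print(to_rpn(expr))   # ['x', 'y', '*', 'z', 2, '**', '+', 'x', 'y', '-', '/']
```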
Any rules that we have regarding associative operators must be followed
during this conversion process. For example, if we want to convert the
expression
x-y-z
we have first to decide whether this refers to
(x-y) -z
or
x-(y-z)
Both interpretations are possible, though they lead to different, non-equivalent,
reverse polish forms:

[x y - z -]

and

[x y z - -]
Evaluating reverse polish form expressions
We can evaluate reverse polish form expressions with the aid of an
expression stack. This stack is used to hold the temporary intermediate
values generated during the evaluation and as a source of operands for the
arithmetic operators.
[Figure: the expression stack - Value-0, Value-1 and Value-2 held in
memory; Value-2 is the top element, and the location above it is the first
free location in the expression stack.]
c) In general, on encountering an n-ary operator, we remove n operands
from the stack, apply the operator and place the result back on the
stack.
d) After the last symbol in the expression has been processed, the
expression stack contains a single value: the value of the expression as
a whole.
Most arithmetic operators are binary - they take two arguments - but
some are unary - square root for example - and there may even be some
ternary operators such as if-then-else (though not in standard
Pascal).
Using our procedure, we can see how to evaluate the simple expression
[x y +]
Proceeding from left to right we push x onto the stack,
and then push y on to the expression stack. On reaching the + symbol, we
take the top two items from the stack, add them together and put the
result - x+y - back on the stack. This is the last symbol in the expression
so the single value remaining is the value of the expression.
The only difference, in principle, between a simple expression and a
complex one is that the stack gets a little deeper when evaluating the
complex expression:
[Figure: the expression stack while evaluating [x y * z 2 ** + x y - /] -
the stack deepens to hold x, x*y and z**2, x-y is computed on top, and
the value of the expression (x*y+z**2)/(x-y) is left on the stack.]
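The whole evaluation procedure - push operands, and for each binary operator pop two values and push the result - can be sketched in Python. This is an illustration of the method, not of any 680x0 code; eval_rpn and OPS are our own names:

```python
# Evaluate a reverse polish expression using an explicit stack.
# Numbers are pushed; an operator pops two values and pushes the result.

OPS = {'+':  lambda a, b: a + b,
       '-':  lambda a, b: a - b,
       '*':  lambda a, b: a * b,
       '**': lambda a, b: a ** b,
       '/':  lambda a, b: a // b}    # integer division, as divs would give

def eval_rpn(symbols):
    stack = []
    for s in symbols:
        if s in OPS:
            b = stack.pop()          # the second operand is on top
            a = stack.pop()
            stack.append(OPS[s](a, b))
        else:
            stack.append(s)          # an operand: just push it
    return stack[0]                  # the single remaining value

# (x*y + z**2)/(x-y) with x=10, y=4, z=3: (40+9)/6 = 8
x, y, z = 10, 4, 3
print(eval_rpn([x, y, '*', z, 2, '**', '+', x, y, '-', '/']))  # 8
```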
[Figure: a word is pushed onto the system stack - a7 is decremented and
the value is stored at the address it then contains.]
The addressing mode we have used here - -(a7) - is the address register
indirect with pre-decrement mode. In this mode, the address register is
decremented and the value contained in the address register is then used
as the address of the operand. The amount that the address register is
decremented depends on the size of the data transfer; in this case it is two
since the . w specifier indicates a word length transfer. The aim behind the
pre-decrement addressing mode is to always ensure that the stack pointer
is in an appropriate place to place new values on the stack. Without this
addressing mode we would have to adjust the stack pointer explicitly with
extra instructions.
The pre-decrement mode can be used to specify either a source operand
or, as in this case, a destination operand, i.e. where to store the variable on
the expression stack. If the variable was identified with the data register
d0, we could implement a push onto the expression stack with:

move.w d0,-(a7)
which would save the lower word contents of data register dO on the
system stack.
Every time we encounter a binary operator in our processing of a
reverse polish expression we are required to take off two operands from
the stack, apply the operator and replace the result onto the stack. In order
to take an entry off the system stack we use an instruction such as:
move.w (a7)+,d0
Here we have used address register indirect with post-increment
addressing mode as the source operand of the move instruction. This
addressing mode is analogous to the pre-decrement mode except that the
address register is incremented after it is used to determine the address of
the operand. Again, since a word length move is specified, a 7 is
incremented by 2.
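The two addressing modes can be mimicked in Python: a sketch of a downwards growing stack, where a7 is an index into a memory array. The names are ours, and for simplicity each value occupies one cell rather than two bytes:

```python
# A downwards growing stack, as the 680x0 uses a7.
memory = [0] * 100
a7 = 100                 # the stack pointer starts just past the stack area

def push(value):         # like move.w value,-(a7): decrement, then store
    global a7
    a7 -= 1
    memory[a7] = value

def pop():               # like move.w (a7)+,d0: fetch, then increment
    global a7
    value = memory[a7]
    a7 += 1
    return value

push(3)
push(5)
print(pop() + pop())     # 8 - the two pushed values added together
```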
With these two addressing modes we can construct the sequence of
instructions to evaluate the complete reverse polish form expression:
[x y +]

for which we can use the instruction sequence:

move.w x,-(a7)     ;push x onto the stack
move.w y,d0
add.w  d0,(a7)     ;replace x by x+y

The final instruction adds the lower 16 bit contents of d0, together with
the 16 bit word addressed by a7 (which is x in this case), and replaces the
word in memory by the result. In effect we have replaced the old copy of x
on the stack with x+y in a single 680x0 instruction.
Notice that if we knew more about where the variables x and y were
located in memory, if either was in a data register for example, then the
code sequence could be further shortened to a single instruction:
add.w y,d0
move.w #2,-(a7)
<exponentiate>
move.w (a7)+,d0
add.w  d0,(a7)     ;x*y+z**2
move.w x,-(a7)
sub.w  y,(a7)      ;x-y
move.w (a7)+,d0
move.w (a7)+,d1
ext.l  d1
divs.w d0,d1       ;(x*y+z**2)/(x-y)
move.w d1,-(a7)
move.w (a7)+,x     ;x := ...
This rather long sequence of instructions illustrates graphically the
difference in granularity and detail between a simple Pascal statement and
the machine instructions needed to implement it.
The third instruction in the sequence is a signed multiply instruction:
muls. This instruction is unusual in that the destination (which must be
a data register) is different in size to the source. In general multiplication
can double the number of significant digits; for example, multiplying
75x45 is 3375 which has twice the number of digits of either operand. The
same applies to binary multiplication which is why the muls instruction
takes two 16 bit operands and returns a 32 bit answer. However, we are
only saving 16 bits of the result in this sequence so we run a risk of
generating erroneous results.
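The doubling of significant digits, and the risk of keeping only the low 16 bits of the 32 bit product, can be illustrated in Python. This is a sketch of the arithmetic only, not of muls itself, and muls16 is our own name:

```python
# muls takes two 16 bit operands and produces a full 32 bit product.
# Keeping only the low 16 bits of that product can silently lose information.

def muls16(a, b):
    """Full 32 bit signed product of two 16 bit values."""
    return a * b

product = muls16(300, 300)     # 90000 - needs more than 16 bits
low_word = product & 0xFFFF    # what survives if only 16 bits are kept
print(product, low_word)       # 90000 24464
```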
On the other hand, division has the opposite phenomenon to
multiplication: generally the number of significant digits is reduced by a
Doing this unfolding nearly always results in faster program execution but
it can lead to greatly expanded code. In this case the saving in instructions
executed is spectacular; the loop and the initial stack pushes can be
replaced by the sequence:
move.w z,d0
muls.w d0,d0
move.w d0,-(a7)
Notice that when we implemented our stack moves - stack pushes and
stack pops - we used pre-decrement addressing as the destination when
pushing a value onto the stack, and post-increment addressing as the
source when popping a value from the stack. This results in a downwards
growing stack: that is the stack's address register decreases in value as more
is pushed onto the stack.
We could just as easily use post-increment as the destination and
pre-decrement as the source; in which case the resulting stack would be an
'upwards growing stack' - with increasing memory addresses as more is
pushed onto it. However the system stack - as addressed by a 7 - is
assumed to be a 'down' stack by the processor; and therefore it would be
extremely unwise to use a7 to construct an upward stack.
We can use the pre-decrementing and post-incrementing addressing
modes with any of the 680x0's address registers; this means that we can
have stacks pointed at by any address register. It is possible, for example, to
have more than one expression stack - the a 7 register points to the system
stack and we could use a4 (say) to point to a different stack.
One limitation of implementing stacks in the way that we have, is that
there is no bounds checking. Unless extra checking instructions are used
to make sure that the stack pointer remains within the memory allocated
to the expression stack there is a danger of stack overflow or underflow.
Not unusually for assembler programming it is the responsibility of the
programmer to ensure that stacks do not stray outside their allocated space
and overwrite neighbouring areas of memory; this is usually achieved by
allocating a large enough space for the stack and hoping that it will never
overflow when running the application.
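A bounds-checked push, of the kind those extra checking instructions would implement, can be sketched in Python (illustrative only; the names are ours):

```python
# A stack confined to a fixed region of memory; the push fails rather
# than straying outside the allocated space and overwriting neighbours.

STACK_BASE, STACK_LIMIT = 50, 100   # the region allocated to the stack
memory = [0] * 100
sp = STACK_LIMIT                    # downwards growing: start at the top

def checked_push(value):
    global sp
    if sp <= STACK_BASE:            # would stray below the allocated region
        raise OverflowError("expression stack overflow")
    sp -= 1
    memory[sp] = value

checked_push(42)
print(memory[sp])    # 42
```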
Evaluating an expression using registers
Just as we can unfold a bounded loop into a sequence of its bodies, so we
can unfold the various stack operations - instead of using the system stack
for intermediate results we can simulate a stack by using data registers.
In such a scheme we might use d7 for the first stack push; and if there is
a second stack push without there being a stack pop first then we use
register d6 for the second level of the stack. We can use d5 for the third
level of the stack and so on down to d3 (say). Provided that the data
registers d3 to d 7 are not needed for other purposes this technique would
allow us to simulate a stack with 5 levels; this is enough for most of the
expressions that are likely to be encountered in a Pascal program.
Using the data registers to simulate a stack allows further scope for
optimization - we can eliminate some of the stack movement
instructions altogether since data registers can be used directly as the
source/destination for arithmetic instructions. The whole statement can
be computed with the sequence:

move.w x,d7     ;x
muls.w y,d7     ;x*y
move.w z,d6     ;z
muls.w d6,d6    ;z**2
add.w  d6,d7    ;x*y+z**2
move.w x,d6
sub.w  y,d6     ;x-y
ext.l  d7
divs   d6,d7    ;value in d7
move.w d7,x     ;x := (x*y+z**2)/(x-y)
For example, after a multiplication we can test for overflow:

move.w x,d7
muls.w y,d7
bvs    overflow_line_xxx
The bvs instruction checks the ccr register for the overflow condition,
and if the multiply resulted in an overflow then the branch is taken. The
bvs (and the bne) instructions are special cases of the bee instruction
4.2 Scalar expressions 73
which branches on any of the test conditions. See Appendix B for a more
detailed description of bee.
The label overflow_line_xxx is at some suitable place in the
program which would contain the necessary instructions to report the
error and allow the programmer to be aware that an overflow error
occurred at line xxx of the Pascal program. If we repeat this exercise for
the whole statement we see that some 50% of the instructions are error
checking code!
move.w x,d7           ;x
muls.w y,d7           ;x*y
bvs    overflow_xxx
move.w z,d6           ;z
muls.w d6,d6          ;z**2
bvs    overflow_xxx
add.w  d6,d7          ;x*y+z**2
bvs    overflow_xxx
move.w x,d6
sub.w  y,d6           ;x-y
bvs    overflow_xxx
beq    zero_divide_xxx
ext.l  d7
divs   d6,d7          ;value in d7
bvs    overflow_xxx
cmp.w  #max_for_x,d7  ;x in range?
bgt    range_error_xxx
cmp.w  #min_for_x,d7
blt    range_error_xxx
move.w d7,x
The last four instructions prior to the final move instruction implement a
range check - they test that the value of the expression is in the type range
of the variable x. We can slightly optimise this sequence on the
68020/68030 by using a single instruction to perform both comparisons for
x during the assignment. We can use the cmp2 instruction which
compares a register against two quantities:

cmp2.w x_bnd,d7
bcs    range_error_xxx

x_bnd  dc.w min_for_x
       dc.w max_for_x

The two literal constants min_for_x and max_for_x define the
minimum and maximum values that d7 is to be compared with. The
cmp2 instruction compares its destination operand with both bounds and,
if the number is out of range, the Carry flag is set - hence the bcs
instruction.
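The overflow, zero-divide and range tests above can be modelled in C. The following sketch assumes 16-bit Pascal integers; the bounds MIN_FOR_X and MAX_FOR_X are invented stand-ins for the type range of x:

```c
#include <assert.h>
#include <stdint.h>

/* A sketch of the checked evaluation of x := (x*y + z*z) div (x - y)
   that the 680x0 sequence performs with its bvs/beq/cmp tests.
   MIN_FOR_X and MAX_FOR_X are assumed bounds, not from the text. */
enum { MIN_FOR_X = -1000, MAX_FOR_X = 1000 };

typedef enum { OK, OVERFLOW_ERR, ZERO_DIVIDE, RANGE_ERR } status;

static status check16(int32_t v) {          /* models the V flag on a .w op */
    return (v < INT16_MIN || v > INT16_MAX) ? OVERFLOW_ERR : OK;
}

status checked_assign(int16_t *x, int16_t y, int16_t z) {
    int32_t xy = (int32_t)*x * y;           /* muls.w: x*y */
    if (check16(xy)) return OVERFLOW_ERR;
    int32_t zz = (int32_t)z * z;            /* muls.w: z**2 */
    if (check16(zz)) return OVERFLOW_ERR;
    int32_t num = xy + zz;                  /* add.w: x*y+z**2 */
    if (check16(num)) return OVERFLOW_ERR;
    int32_t den = (int32_t)*x - y;          /* sub.w: x-y */
    if (check16(den)) return OVERFLOW_ERR;
    if (den == 0) return ZERO_DIVIDE;       /* beq zero_divide_xxx */
    int32_t v = num / den;                  /* divs */
    if (v < MIN_FOR_X || v > MAX_FOR_X) return RANGE_ERR;
    *x = (int16_t)v;
    return OK;
}
```

Roughly half of the function, like half of the instruction sequence, is error checking.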
4.2.3 Exercises
u-v-w+x-y
assuming that '-' and '+' are left associative: i.e. that
a-b-c = (a-b) -c
a+b+c = (a+b) +c
2. a) Given that the Pascal variables u and v are 16 bit integers show
the reverse polish form of the expression:
(u+v)/(u-15)
b) Show the 680x0 instructions that implement this expression
using the system stack.
((u*32)+(u/v))**w
CHAPTER FIVE
Pascal compound structures
In Pascal structured data, such as vectors, matrices, stacks and queues, are
primarily represented using combinations of records and arrays; in LISP
and Prolog compound structures are formed, in a higher level way, using
lists and trees. Here, it is our intention to examine the representation and
manipulation of Pascal compound structures such as records, arrays and
sets.
The 680x0 is, like most conventional processors, a fundamentally scalar
machine: an individual instruction can only manipulate scalar quantities:
bytes, words and long words. The largest objects routinely handled by
processors as scalar objects are typically floating point numbers; even here
the main 680x0 processor does not have any specific floating point
instructions - instead a co-processor is linked to perform floating point
arithmetic.
The key to handling compound structures in the 680x0 is the
observation that they are nearly always represented - in memory - as
collections of bytes and words. Although the 680x0 cannot deal directly
with a compound structure as a single entity we will see that we can
manipulate expressions involving them a 'piece at a time'.
5.1 Records
A Pascal record cannot be held entirely in a single 680x0 register. Record
values are represented only in memory: as a collection of bytes - in fact as
a contiguous concatenation of the component parts of the record.
Each component part of a record has a size: this size is pre-determined
for scalar types (i.e. the number of bytes needed to represent the scalar
variable) and the size of a compound structure is found by adding up the
sizes of each of the component parts of the structure:
foobar = record
    foo:integer;    {2 bytes}
    bar:record
        a:integer;  {2 bytes}
        b:char;     {1 byte}
    end;            {a filler byte}
    foop:^foobar;   {4 bytes}
end;                {total: 10 bytes}
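The same size-and-offset arithmetic can be checked in C, whose structs are laid out in the same way (the filler byte appears after b so that the following field is aligned). On the 68000 the pointer field would be 4 bytes; a modern compiler may use a larger pointer, but the leading offsets match:

```c
#include <assert.h>
#include <stddef.h>

/* The foobar layout in C; offsetof lets us verify the component offsets
   the text computes by hand. Pointer size is machine-dependent. */
struct bar_rec {
    short a;     /* 2 bytes, offset 0 within bar */
    char  b;     /* 1 byte, offset 2, followed by a filler byte */
};

struct foobar {
    short          foo;   /* offset 0 */
    struct bar_rec bar;   /* offset 2 */
    struct foobar *foop;  /* 4 bytes on a 68000 */
};
```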
[Figure: the address and data registers, with a3 holding 1000 - the address of a foobar record stored in the ten bytes from 1000 to 1009.]

[Figure: the effect of move.b #'9',4(a3) - ASCII '9' (57) is stored into the b field of the record, at address 1004.]

Given two record variables fb and jb of type foobar, whose addresses have been loaded into a3 and a2 respectively, the assignment jb:=fb can be implemented as three move instructions:
move.l 0(a3),0(a2)
move.l 4(a3),4(a2)
move.w 8(a3),8(a2)
Notice that we can use these three instructions no matter how many fields there are in foobar, provided that its length is still 10 bytes. The sequence of move instructions in the assignment does not relate to the individual fields in the records but rather to the total number of bytes that need to be moved. Indeed, in this case one of the fields - foop - is moved in two pieces across the second and third instructions. We can do
this because assignment is an atomic action from the point of view of the
Pascal programmer and the state of the memory is consistent both before
and after the assignment sequence (although not necessarily during it).
If the record is very large then the iterated sequence of move
instructions might lead to a large number of instructions. We could,
instead, implement the record assignment as a loop:
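Such a loop might look like the following sketch, where - as the description below assumes - a3 and a2 hold the addresses of the source and destination records fb and jb, and d0 holds the loop counter:

```
    lea    fb,a3         ; address of the source record fb
    lea    jb,a2         ; address of the destination jb
    move.w #9,d0         ; length of a foobar, less one
@0  move.b (a3)+,(a2)+   ; copy one byte, post-incrementing both
    dbra   d0,@0         ; loop until d0 reaches -1
```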
Here we are using a byte sized transfer together with the post-increment
addressing mode to copy the record across. The address registers a3 and
a2 are successively incremented in each pass of the loop so that they are
always pointing at the next byte to copy. At the end of the loop, when all
the bytes have been copied, a3 and a2 will point to the first byte after the
fb and jb variables respectively. In practice of course, we wouldn't use a
byte transfer - we would probably use a word or long word transfer - and
have fewer passes round the copying loop.
The dbra instruction decrements the bottom half of the data register dO
and if the result is -1 then the loop terminates and execution continues
with the next instruction. Otherwise, the processor jumps to the label @0
and the loop is re-entered. Since the loop finishes when d0 reaches -1, we initialize the counter d0 with one less than the length of the record.
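The shape of a dbra loop is a do-while in C: the counter is initialized to one less than the number of iterations, and the loop stops after the counter passes zero. A sketch (function and parameter names assumed):

```c
#include <assert.h>
#include <string.h>

/* dbra runs the body exactly `length` times when the counter starts at
   length-1 and the loop terminates on reaching -1: a do-while in C. */
void copy_record(const unsigned char *src, unsigned char *dst, int length) {
    int d0 = length - 1;            /* move.w #length-1,d0 */
    do {
        *dst++ = *src++;            /* move.b (a3)+,(a2)+ */
    } while (d0-- != 0);            /* dbra d0,@0 */
}
```

After the loop, both pointers address the first byte past their records, just as a3 and a2 do.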
fbp,jbp:^foobar;
and we might use the jbp variable to update a component in the record addressed by fbp:

fbp^.foop:=jbp;

Notice that while fbp is a scalar variable, fbp^ is a record variable which would have to be manipulated via its address - but we can obtain that from the scalar fbp. The record element fbp^.foop is once again a scalar that can be processed directly. As we come to implement this Pascal statement we no longer need to assume that fbp^ has somehow been loaded into a3 - we can do this directly by loading it from the fbp variable:

while fbp^.foop<>nil do
    fbp:=fbp^.foop;
Figure 5.3 Inserting an element into a list
The old value in fbp^.foop is copied to the new element's foop, before being updated to point to the new element:

nw^.foop:=fbp^.foop;
fbp^.foop:=nw;
move.l fbp,a0
move.l nw,a1
move.l foop(a0),foop(a1)
move.l a1,foop(a0)
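The same insertion, sketched in C with a minimal node type (the field names follow the record in the text):

```c
#include <assert.h>
#include <stddef.h>

/* nw takes over fbp's successor, then becomes fbp's successor -
   mirroring the two foop(a0)/foop(a1) move.l instructions. */
struct node { int foo; struct node *foop; };

void insert_after(struct node *fbp, struct node *nw) {
    nw->foop  = fbp->foop;   /* move.l foop(a0),foop(a1) */
    fbp->foop = nw;          /* move.l a1,foop(a0)       */
}
```

The order matters: storing into fbp->foop first would lose the old successor.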
nw^.foo:=10;
nw^.bar.a:=nw^.foo*23;
nw^.bar.b:='0';
nw^.foop:=nil;
Alternatively, Pascal's with statement declares the components of the records nw^ and nw^.bar to be in scope, in effect defining new variables:
If we were a naive compiler, then the first set of statements would involve repeatedly computing the address fbp^ as we accessed and stored values into the record. However, in the second formulation, we can take the hint of the with statement and allocate an address register a3 (say) to temporarily hold the addresses of fbp^ and fbp^.bar. If we do this, then when we map the statement sequence, we can assume that we know where the variables foo, a, b and foop are in relation to a3:
After the completion of the with statement then we can 'release' the
address register a3 for other roles.
Recall that we constructed the assembler symbols for the offsets of the
elements of the foobar record by means of a series of equates. With a
large program it is quite possible to accumulate large numbers of equates
relating to the various records and other constants. Unless carefully
managed, this can result in some confusion, especially if there are records
with duplicate field names in .them. A few assemblers have a more
elaborate way of declaring record layouts which reduces this problem by
isolating each record description. The technique is reminiscent of
declaring storage:
barr.a equ 0
barr.b equ 2
foobar.foo equ 0
foobar.bar.a equ 2
foobar.bar.b equ 4
foobar.foop equ 6
foobar.length equ 10
The symbolic names that are introduced with this notation can be used
instead of offsets; so for example, a Pascal statement such as
fbp".foop:=nil;
can be written in assembler as the instruction:
move.l fbp,a3
move.l #nil,foobar.foop(a3)
The main advantage of using such record descriptors in assembler
programs is that the names which are declared within the record are not
global: a normal equate directive would declare a symbol for the whole of
the remainder of the source file. With record declarations, we could
describe several records, with possibly overlapping field names, and
reduce the risk of confusion.
The record descriptions can be used to give a similar kind of support to
the assembler programmer as Pascal does with the with statement. The
assembler with directive 'declares' that the symbols within a record
description are made directly available. Using field names within the
scope of an assembler with directive would be automatically converted
into the appropriate offset values. The scope is terminated by a matching endwith directive.
It is still however the programmer's responsibility to ensure that an
address register has been appropriately loaded with the base address of the
record. Our original initializing sequence can now be expressed as:
We also have a new instruction here: the load effective address - lea -
instruction. This is an interesting and important instruction which loads
the address of an operand into an address register rather than the value
addressed by the operand. In this case the instruction
lea bar(a3),a2

is equivalent to the two-instruction sequence:

move.l a3,a2
add.l  #bar,a2
We can use any memory addressing mode as the source of this instruction
- there is no numeric value for the address of a register however, so it is
not possible to load its effective address!
The add instruction above is really an adda . 1 instruction which is a
version of add which adds to an address register as opposed to a data
register, but most assemblers automatically substitute adda for add when
the destination is an address register.
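In C terms, lea is the & operator applied to a member: it computes base-plus-offset without touching the memory it addresses. A sketch:

```c
#include <assert.h>

/* lea loads the address of its operand rather than the value at it.
   lea bar(a3),a2 corresponds to taking &a3->bar: an address
   computation only, with no memory access. */
struct rec {
    short foo;
    struct { short a; char b; } bar;
};

short *lea_bar_a(struct rec *a3) {
    return &a3->bar.a;   /* base register plus offset, like lea */
}
```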
5.1.4 Exercises
2. Show the sizes of the record types below, and determine the
numerical offsets of the components:
d_entry = record
    mark:boolean;
    t:(a_tag,b_tag);
    n:^d_entry;
end;
and
e_entry = record
    mark:boolean;
    n:^e_entry;
    t:(a_tag,b_tag);
end;
the array; we need to map the index into a new range so that the first index
is mapped to the first element of the array - which is always at offset 0 -
and then we need, in general, to multiply the shifted index by the size of
each element of the array:
(I - m) * S

where I is the index, m is the first index of the array and S is the size of an element; this maps the indices onto the range of byte offsets [0..(n-m)*S] into the array memory structure.
For example, if each element of an array occupies 10 bytes then the 3rd element in the array is 2*10 bytes from the beginning of it.
In the ai array above, each element is an integer, which we are
assuming to be a 16 bit integer. Thus in order to convert an index into an
offset we need to multiply the index by 2. For example, the Pascal
assignment statement:
ai[x]:=32;
where x is an integer variable, can be mapped to the instruction sequence:
lea    ai,a2
move.w x,d0
sub.w  #1,d0      ; array starts at 1
muls.w #2,d0      ; convert to offset
add.l  a2,d0      ; add in offset
move.l d0,a0
move.w #32,(a0)   ; the assignment
Here we have assumed, as we did with record variables, that we do not yet
know where ai is or how to find its address; so we have assumed that
register a2 can be set to the base of the array using some kind of operation
similar to the lea instruction. This may indeed be a lea instruction,
however it will not always be so. We shall be better placed to determine
how to find this base address when we look at procedures and variable
allocation within them.
As with scalar expressions, it is our responsibility to ensure that the
expressions that we use to access array elements are within the bounds of
the array itself: it is meaningless to access, say, the 40th element of an array
that only has 30 elements. In principle, checking for array bounds is the
same as range checking for scalar expressions: to check that the value of
the index expression is within the range of indices of the array we compare
the index value against the minimum and maximum values permitted
for the index:
lea    ai,a2
move.w x,d0
sub.w  #1,d0      ; array starts at 1
blt    array_bounds_error
cmp    #9,d0      ; array bounds: [1..10]
bgt    array_bounds_error
muls.w #2,d0      ; convert to offset
add.l  a2,d0      ; add in offset
move.l d0,a0
move.w #32,(a0)   ; the assignment
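In C, the same index arithmetic and bounds test might be sketched as follows; ai is the text's array[1..10] of integer, stored 0-based, and the function name is invented:

```c
#include <assert.h>
#include <stdbool.h>

/* Bounds-checked store into ai: array[1..10] of integer. The index is
   shifted down by the lower bound (sub.w #1,d0) and then checked
   against the remapped range 0..9 before being used as an offset. */
bool checked_store(short ai[10], int x, short value) {
    int d0 = x - 1;               /* sub.w #1,d0: array starts at 1 */
    if (d0 < 0 || d0 > 9)         /* blt/bgt array_bounds_error */
        return false;
    ai[d0] = value;               /* offset = d0 * sizeof element */
    return true;
}
```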
We can use another of the 68000's addressing modes, the address register
indirect with index addressing mode to optimise array access. The index
addressing mode combines the use of an address register (which points to
the base of the array), a second register (to provide the index) and a
displacement.
An operand using this mode is written: Off(ax,ry.s) or (Off,ax,ry.s) where ax is the address register, ry is the index register which can be either an address register or more typically a data register, s is the size of the index (.w for word length or .l for long word length) and Off is a displacement or offset in the range -128..127.
The size specifier determines how much of the index register is to be
used for the index size. In our case, since ai occupies less than 64 Kbytes,
the indices into ai are integer or word length and so our index variables
are also word length.
Using the indexed addressing mode we can shorten the sequence of
instructions above by eliminating one add and one move:
lea    ai,a2
move.w x,d0
sub.w  #1,d0      ; array starts at 1
blt    array_bounds_error
cmp    #9,d0
bgt    array_bounds_error
muls.w #2,d0
move.w #32,0(a2,d0.w)
lea    ai,a2
move.w x,d0
cmp    #1,d0
blt    array_bounds_error
cmp    #10,d0     ; array bounds: [1..10]
bgt    array_bounds_error
add.w  d0,d0      ; double the index
move.w #32,-2(a2,d0.w)
We can compute the initial or base offset for the general case by calculating
the size of the 'array' fragment from the first index to 0. This will be a
negative amount if the first index is positive and positive if the first index
is negative. The base offset can be expressed as the value of the expression:

-(first index) * S

where S is the size of an element. For example, if an array is declared with a first index of -10 and byte-sized elements,
then the offset is found by multiplying the first index (which is -10) by the
size of an element (which is 1 byte) and negating it, giving us an offset of
10 bytes - implying that the offset required to access an element of this
array is 10. Only arrays whose first index is 0 use the address of the array
without offsets to access elements in it.
The 68020 and 68030 offer a further enhancement to the indexed
addressing mode: it is possible to scale the index. The scale factor is the
number by which the index is multiplied before use in the address
computation. The scale factor can be 1, 2, 4 or 8, corresponding to arrays
whose elements are byte sized, word sized, long word or double long word
sized. In our case, ai is an array of words, and so we can use a scale factor
of 2. (The 68020/68030 also allows displacements to be 16 bit as opposed to
just 8 bit on the 68000.)
If we also use the more advanced double comparison instruction cmp2
available on the 68020/68030 then we get the sequence:
lea    ai,a2
move.w x,d0
cmp2   ai_bnds,d0
bcs    array_bounds_error
move.w #32,-2(a2,d0.w*2)

ai_bnds dc.w 1    ; lower bound of ai
        dc.w 10   ; upper bound
[Figure: with x in d0 and a2 holding the address of ai, the instruction move.w #32,-2(a2,d0.w*2) stores 32 into element ai[x] of ai[1]..ai[10].]
fba[x].bar.a:=y;
with foobar
move.w x,d0
cmp.w  #1,d0              ; check lower bound
blt    array_bounds_error
cmp.w  #10,d0             ; check upper bound
bgt    array_bounds_error
mulu.w #length,d0
lea    fba,a2
move.w y,bar.a-length(a2,d0.w)
endwith
In the last move instruction, the expression bar. a-length refers to the
initial base offset (which is -1 * length of a foobar record) but then we
add an offset to address the field bar. a within the foobar record.
The most expensive operation here is the multiplication of the index by
10 which is the length of a foobarray entry. Earlier, in Chapter 2, we
saw that it is possible, given that we know the multiplier involved in the
calculation, to transform the multiplication into a series of 'shifts and
adds'. In this case, the size of each entry in the array is 10, therefore we can
perform a multiplication by ten with the simpler calculation:

(x<<1) + (x<<3)

The lsl instruction performs a logical shift to the left by a given number of bits. On the 68020 the four instructions needed for this calculation execute in around 12 cycles, compared to 29 cycles for the mulu instruction.
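A quick check of the shift-and-add identity in C (16-bit arithmetic assumed, as in the text):

```c
#include <assert.h>
#include <stdint.h>

/* x*10 == x*2 + x*8 == (x<<1) + (x<<3): the strength reduction a
   compiler can apply when the multiplier is a known constant. */
uint16_t mul10(uint16_t x) {
    return (uint16_t)((x << 1) + (x << 3));
}
```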
5.2.2 Arrays of arrays
In Pascal it is possible to have 2-dimensional (or even n-dimensional)
matrices; for example, the declaration:
[Figure: the rows of the two-dimensional matrix bi laid out one after another in memory, from element bi[0,10] through to element bi[20,10].]

An assignment to an element of the matrix, such as:

bi[x,y]:=z;
In this case the advanced versions of the indexed addressing mode (with
the scale factor built in) on the 68020 / 68030 can only help marginally with
the final index: it cannot remove the first multiplication. With higher-dimensional arrays it is even more important that the multiplications
involved be as fast as is possible.
We are using here the asl (arithmetic shift left) instruction to implement
a multiplication by four and by two. The first is needed because a pointer
occupies four bytes, and the vector of row addresses is in effect a vector of
pointers. Using the scale factors available on the 68020/68030 we can
eliminate the left shifts:
"bk[2]
\~/
"bk [1]
lbk[l,K,1] 1 ...... lbk[l,K,L] I
address vectors
lbk[J,K,l] 1 ...... lbk[J,K,L] I
However, this technique for representing matrices does have an extra cost:
for each row in the matrix there is an overhead consisting of a pointer to
it. Similarly, for each plane in a 3-dimensional matrix, there is a pointer
to it. For certain shapes of arrays (such as those which have many short
rows) the memory overhead of row address vectors may be greater than
the memory required for the array itself. Furthermore, the vector must be
initialized when an array is created. This is itself an expensive operation.
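A sketch of the row-address-vector representation in C (the dimensions are invented): indexing costs two loads and no multiplication, while init_rows pays the initialization cost noted above:

```c
#include <assert.h>

/* A matrix reached through a vector of row addresses. Each access is a
   row-pointer load followed by an index into the row; the price is one
   pointer per row plus the work of filling in the vector. */
#define ROWS 3
#define COLS 4

typedef struct {
    short  cells[ROWS][COLS];
    short *row[ROWS];          /* the vector of row addresses */
} matrix;

void init_rows(matrix *m) {    /* the initialization cost the text notes */
    for (int i = 0; i < ROWS; i++)
        m->row[i] = m->cells[i];
}

short get(const matrix *m, int i, int j) {
    return m->row[i][j];       /* load row pointer, then index the row */
}
```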
5.2.3 Exercises
jamjar=record
    jam:integer; {16 bit integers}
    jar:array[1..10] of ^jamjar;
end;

jjp:^array[1..5] of jamjar;
jjn:^jamjar;
assuming that x and y are 16 bit integers. You should ensure that the
array index variables, x and y, do not exceed the bounds of the
various arrays involved.
string=packed record
    length:byte;
    c:packed array[1..255] of char;
end;
If we had not said that c was packed then each entry in the array would
probably have taken a 16 bit word on the 68000 and the whole record
would be nearly twice the size! (In fact the outer packed declaration
would have been meaningless.)
A typical example of a packed structure with more than one field packed
into a word is:
[Figure: the packed date record in a 32-bit long word - unused high bits, then the valid, d, f and mon fields, with day in bits 12 to 16 and year in the low 12 bits.]
A field in a packed record need not be aligned on a byte or word boundary. For example, the day field (which occupies bits 12 through 16) straddles a byte boundary within the record. The lack of a regular addressing discipline means that we have to use more exotic methods for accessing and updating fields in a packed record.
Suppose that we had to change the day of the month in our date
variable, we might do so with an assignment such as:
date.day:=date.day+l;
move.l date,d0
and.l  #$1F000,d0
The number $1F000 we used here is obtained from the bit pattern which
has a 1 in every bit which belongs to the day field, and a 0 elsewhere.
Where there is a 0 in the mask number the and instruction will clear the
corresponding bit in dO, and where there is a 1 in the mask the original bit
pattern will be preserved. This masking of unwanted bit patterns relies on
the equations:
[Figure: and.l #$1F000,d0 clears every bit of the date long word except the five bits of the day field.]
Having masked off the other fields, we now convert the day field into a
normal number by shifting it to the right so that it is moved to the least
lsr.l #8,dO
lsr.l #4,dO
add.w #1,dO
cmp.w #32,dO ;range error?
bge range_error_yyy
The lsr instruction implements a logical shift to the right. The left hand
bits are filled with 0 as the pattern is shifted to the right, and the last bit
which is shifted off the right hand end of the register is collected in the c
flag in the condition codes register. We have seen the lsl instruction
which has the effect of multiplying a number by 2 for every bit shifted; the
lsr instruction can be used to divide a positive number by 2 for every bit
shifted. The lsr instruction is complemented by the asr instruction
which preserves the sign of the number as it is shifted.
The maximum number of bits that can be specified as immediate data to
the lsl/lsr instructions is 8: hence in order to shift by 12 bits we have to
use two instructions and break the 12 bit shift into two shifts.
After the arithmetic operation we shift the answer back to the correct
position in the long word for the day field with a logical left shift:
Normally we would also have to use the day mask again on the result of
the calculation, to make sure that no overflow in the calculation could
contaminate the other fields. In this case the mask operation is not
necessary since we abort the calculation and report an error if there is an
overflow on the arithmetic.
Having performed the calculation on the day field we have to re-insert
it into the original pattern for the record. We do this by first removing the
existing day field from the date variable, with another and instruction:
move.l date,d1
and.l  #$FFFE0FFF,d1

[Figure: and.l #$FFFE0FFF,d1 clears the day field of the date record, leaving all the other fields intact.]
Note that $FFFE0FFF is the complement of the mask we used for day:
each bit in the mask is changed from a 0 to a 1 and vice-versa. Finally, we
or the two patterns from the new day field and the rest of the date
record, to get the final result and replace the new bit pattern into the date
variable:
or.l d0,d1

[Figure: or-ing the shifted new day field into the cleared date pattern.]
Figure 6.6 Insert new data into packed record
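The whole extract-modify-reinsert cycle can be modelled in C; the masks follow the text's day field at bits 12 to 16 (function names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* day occupies bits 12..16 of the packed date, so its mask is $1F000.
   increment_day mirrors the and/lsr/add/lsl/and/or sequence. */
#define DAY_MASK  0x0001F000u
#define DAY_SHIFT 12

uint32_t day_of(uint32_t date) {
    return (date & DAY_MASK) >> DAY_SHIFT;  /* and.l #$1F000 then lsr */
}

uint32_t increment_day(uint32_t date) {
    uint32_t day = day_of(date) + 1;        /* add.w #1,d0 */
    /* the bge range_error_yyy test would reject day >= 32 here */
    return (date & ~DAY_MASK)               /* and.l #$FFFE0FFF,d1 */
         | (day << DAY_SHIFT);              /* lsl back, then or.l d0,d1 */
}
```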
On the 68020/68030 the whole extraction can instead be done with a single bit field instruction, such as:

bfextu date{15:5},d0

which extracts the day field, and shifts it into position for an arithmetic
operation, into the data register dO. The number 15 referred to in the
instruction is the bit offset from the address date (counting from the left)
where the day field starts, and 5 is the width of the field (5 bits). Both
offset and the width components of this instruction can be specified in a
data register. If the offset is specified explicitly, as an immediate operand,
then the offset is in the range 0..31; otherwise, if the offset is in a data
register, then the offset can be up to ±2,147,483,647 bits! The maximum
field size, whether specified in a data register or as part of the instruction,
is 32 bits.
The bfins instruction puts the low order 5 bits from d0 into the appropriate part of the date record.
i_set:set of 0..1023;

may not be accepted by every Pascal compiler.
A set can be represented, in memory, as a (possibly packed) array of booleans: each index in the boolean array corresponds to an element in the base set. So, the i_set declaration might be implemented as though it were:

i_array:packed array[0..1023] of boolean;

Inserting an element, as in the statement:

i_set:=i_set+[I];

then corresponds to the boolean assignment:

i_array[I]:=true;
move.w I,d0
lsr.w  #3,d0   ; index div 8
move.w I,d1
and.w  #7,d1   ; remainder by 8
[Figure: splitting the 10-bit index I - lsr.w #3,d0 leaves the byte offset within the array in d0, and and.w #7,d1 leaves the bit offset within that byte in d1.]
We use the number 7 in the mask to extract the bit offset because 7, being of the form 2^n - 1, has all 1's in its binary expansion: in fact 7 = 0111B. Once we have calculated the byte offset we can insert the element by turning on the required bit in the relevant byte. We do this using the bset instruction:
The bset instruction sets the bit indicated by its source in the memory
byte indicated by its destination. In this case, the bit position to set is
indicated by the contents of dl. If the destination were a data register then
any of the 32 bits in the data register can be set by this instruction, however
in the case of a memory address only bits within a single byte can be set.
Notice that we have once again used the address register indirect with
indexing addressing mode, this time to address the required byte in our
packed array.
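The byte-offset/bit-offset arithmetic and the effect of bset and btst can be sketched in C (type and function names are invented):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A set of 0..1023 as a 128-byte bit array. The byte offset is i >> 3
   (lsr.w #3) and the bit offset is i & 7 (and.w #7); bset and btst then
   become an or and a shift-and-mask on the selected byte. */
typedef struct { uint8_t bytes[128]; } set1024;

void set_insert(set1024 *s, int i) {
    s->bytes[i >> 3] |= (uint8_t)(1u << (i & 7));   /* bset */
}

bool set_member(const set1024 *s, int i) {
    return (s->bytes[i >> 3] >> (i & 7)) & 1;       /* btst */
}
```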
If we had wanted to delete the element [I] from the set then instead of
setting the bit we would have cleared it with a bclr instruction:
To test whether an element is present in the set we can use the btst instruction, which tests a single bit and sets the z flag (leaving the other flags alone) according to the value of the bit that is tested:
The z flag is set by the btst instruction if the bit tested is zero, i.e. if the
element is not present in the set. If the element is present in the set then
z will be cleared.
Having tested for the existence of an element, the next step depends on
what context the test occurs in. If we wanted to set a boolean variable to
true if the element I is in i_set, which we might do with a Pascal
assignment like:
then we have to convert the state of the flags, in particular the z flag, into
a truth value which can be stored in a Pascal boolean variable. The 68000
has a special instruction: sne which sets the lowest byte of a data register to all 1's if the z flag is cleared, and to all 0's if not (in fact, there is a whole class of instructions of the form scc where cc is one of the conditions such as eq, lt etc. which convert conditions into truth values). We can use
the sne instruction to move the state of the z flag to a data register, and
then to a Pascal variable; however, we must first reformat the 68000's
notion of true to be consistent with Pascal's notion of true.
In Pascal, the boolean type is an enumerated type:
boolean=(false,true);
The other context in which a set membership test often occurs is in the
condition part of a conditional statement (i.e. if-then-else). In this
case the state of the z flag is sufficient to guide the rest of the execution.
The use of conditions to control execution is explored in Chapter 7.
j_set:=i_set+j_set;
For example, if i_s and j_s are sets drawn from a base type of no more than 16 elements, each fits in a single word and the union is a single or instruction:

i_s = [1,4,6]
j_s = [2,4]

move.w i_s,d0
or.w   d0,j_s
The extra work needed to union the larger sets with 1024 elements involves or-ing each of the 32 long words/128 bytes of the sets in turn; typically we would perform this using a loop. Intersection has exactly the same shape, with an and instruction in place of the or:

move.w #31,d0
lea    i_set,a0
lea    j_set,a1
@1 move.l (a0)+,d1
   and.l  d1,(a1)+   ; intersection
   dbra   d0,@1
and set difference is also similar, except that in this case there is no single
bit-wise manipulation corresponding to set difference. However we can
observe that set difference can be re-expressed in terms of intersection and
complement:
A - B = A ∩ ¬B
move.w #31,d0
lea    i_set,a0
lea    j_set,a1
@1 move.l (a0)+,d1
   not.l  d1         ; ¬i_set
   and.l  d1,(a1)+   ; j_set ∩ ¬i_set
   dbra   d0,@1
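The word-at-a-time set operations translate directly to C loops over the 32 long words (1024 bits); function names are invented:

```c
#include <assert.h>
#include <stdint.h>

/* Union, intersection and difference as one bitwise operation per long
   word - the C analogue of the dbra loops. The result lands in j_set,
   as in the 680x0 sequences. */
enum { WORDS = 32 };               /* 32 long words = 1024 elements */

void set_union(const uint32_t *i_set, uint32_t *j_set) {
    for (int k = 0; k < WORDS; k++) j_set[k] |= i_set[k];
}
void set_intersect(const uint32_t *i_set, uint32_t *j_set) {
    for (int k = 0; k < WORDS; k++) j_set[k] &= i_set[k];
}
void set_difference(const uint32_t *i_set, uint32_t *j_set) {
    for (int k = 0; k < WORDS; k++) j_set[k] &= ~i_set[k];  /* j - i */
}
```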
6.2.3 Exercises
1. Show how to use the bfset instruction (which sets all the bits in a bit field to 1's) to implement set element insertion more succinctly than using bset. The bfset instruction (which is not available on the 68000) is similar in format to the bfins instruction except that it has only one operand.
bool_var:=i_set<=j_set;
where <= means sub-set and i_set and j_set are sets of sub-range type: 0..1023. Note that there is no direct 68000 instruction to perform the sub-set test, but it can be re-expressed as a combination of intersection and equality.
z:set of 0..1023;
you may assume that address register al contains the base address of
the set variable z.
Ensure that the value of the variable z is faithfully represented in
the machine.
CHAPTER SEVEN
Representing Pascal control
if fbp<>nil then
    if fbp^.foo>20 then
        <s1>
    else
        <s2>
move.l fbp,a0
cmp.l  #nil,a0      ; outer test
beq    @0
move.w foo(a0),d0   ; inner test
cmp.w  #20,d0       ; >20?
ble    @1
<s1>                ; then branch
bra    @0
@1 <s2>             ; else branch
@0                  ; continue
cmp.w #20,dO
compares the lower half of dO with the literal number 20. It does this by
subtracting 20 from the lower half of dO, without updating it. If dO was
greater than 20 then the condition codes would be set by the cmp
instruction so that a subsequent bgt instruction would take the branch
112 Representing Pascal control
given by its label. In this case we only want to execute the else statement (<s2>) if d0 (which contains fbp^.foo) is less than or equal to 20. We can specify this using the same comparison followed by a ble instruction:
cmp.w #20,dO
ble @1
If the label @1 is, say, 10 bytes further on, the assembler encodes this branch relative to the program counter, as:

ble *+10
where '*+10' means 'add 10 to the program counter'. In order to take the
branch - i.e. to go to the program instructions at @1 - all that is required is
to add 10 to the program counter. This is true no matter where in
memory the two instructions - the branch and the target - are located, so
long as they are 10 bytes apart. This, in effect, means that the program
fragment is position independent; and that in turn means that we can
change its position, i.e. move the program, dynamically without changing
its meaning.
Having said that the bra instruction uses program counter relative
addressing, we should also point out that this is the only addressing mode
that bra can use: therefore it is slightly disingenuous to suggest that it has
an addressing mode at all!
The 680x0 has good support for constructing position independent
programs. So much so, that at least one operating system - the Apple
Macintosh OS - requires that all programs are position independent. This
is because the Macintosh OS does not guarantee where a particular
program will be loaded into memory, which in turn allows a
simplification of the hardware requirements for a Macintosh computer.
If a program could rely on being located in a fixed place in the
computer's memory, then, in order to allow more than one program in
the memory at a time, a hardware mapping unit - called a memory
management unit - must be used to map logical addresses within an
executing program to physical addresses in the machine; since it is certain
that two co-resident programs will be in different places in the memory! It
must also be said that enforcing a position independent discipline incurs a
small overhead in the performance of an application.
move.w *+30,d0
for example, which will load the word located 30 bytes from the start of
this move instruction into dO. Some assemblers will always use program
counter relative addressing when the programmer uses a symbolic label
and the label can be determined to be in the program, as in the instruction:
move.w v_loc,d0

v_loc ds.w 1
if (fbp<>nil) and (fbp^.foo<20) then
both of the arms of the conjunction may be evaluated; in much the same
way as they would be had the test been written as:
move.l fbp,a0
cmp.l  #nil,a0      ; fbp<>nil?
sne    d1           ; d1:=fbp<>nil
move.w foo(a0),d0
cmp.w  #20,d0       ; fbp^.foo<20?
slt    d0           ; d0:=fbp^.foo<20
and.b  d0,d1        ; ...and...
beq    @0           ; false -> else
<s1>
bra    @1
@0 <s2>
@1                  ; continue
if (fbp<>nil) && (fbp^.foo<20) then
    <s1>
else
    <s2>

where && indicates the conditional conjunction. Such a statement can be expressed in standard Pascal as the nested conditional statement:

if fbp<>nil then
    if fbp^.foo<20 then
        <s1>
    else
        <s2>
else
    <s2>
move.l fbp,a0
cmp.l  #nil,a0      ; fbp = nil?
beq    @0           ; skip out if nil
move.w foo(a0),d0   ; fbp is valid now
cmp.w  #20,d0       ; fbp^.foo<20?
bge    @0           ; skip out again
<s1>                ; then case
bra    @1
@0 <s2>             ; else case
@1
repeat
<Statement>
until <Test>;
becomes, in 680x0 instructions, the structure:
repeat
    i:=i+1
until ai[i]>10;
The code that we have generated to implement this loop is not especially
efficient since we are having to re-compute an array access for every test,
whereas in fact, we know that in each iteration of the loop the array
element being tested is actually the next one along.
If we could take advantage of this fact, then a much more efficient loop can be constructed which simply moves a pointer along the array:
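The pointer-walking version can be sketched in C: one compare and one pointer bump per element, with no per-iteration address computation (0-based indexing assumed; the array must contain an element greater than 10):

```c
#include <assert.h>

/* Strength-reduced loop: instead of recomputing the address of ai[i]
   from i on each test, move a pointer along the array. */
int first_greater_than_10(const short *ai, int i) {
    const short *p = ai + i;        /* address of ai[i], computed once */
    while (*p <= 10) { p++; i++; }  /* compare, then bump the pointer */
    return i;                       /* index of the first element > 10 */
}
```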
@0 <test>
   bcc @1           ; exit from loop?
   <statement>
   bra @0           ; loop round
@1                  ; continue
Such a saving may be important if the body of the loop only consists of a few instructions and the loop is executed frequently. We might have a while loop such as:

while ai[i]<ai[j] do
    i:=i+1;
in which case we can implement this loop using the sequence of 680x0
instructions:
@0 <statement>
   add.w #1,control   ; increment control
@1 <control > limit?>
   bcc @0             ; loop round if not
In practice, for loops are often used in array processing; any operation
(other than assignment) to a whole array requires a for loop to specify an
iteration over the elements of the array, as in this example of summing
two vectors:
for i:=1 to 10 do
ai[i]:=ai[i]+bi[i];
which we can implement in the 68000 sequence:
Of course a for loop need not have fixed bounds: either or both the initial
value and the limit values can be specified through expressions (although
it is undefined - in standard Pascal - what is meant if the expression
governing the limit value changes in value during the body of the for
loop). A more complex loop, to transpose a square matrix, might be:
for i:=1 to 10 do
for j:=i+1 to 10 do
begin
e:=m[i,j];
m[i,j]:=m[j,i];
m[j,i]:=e;
end;
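The same transposition can be sketched in C (a sketch of ours, using 0-based indexing where the Pascal is 1-based): the inner loop starts at i+1 so each pair of elements is swapped exactly once.

```c
#include <assert.h>

/* In-place transpose of a 10x10 matrix: swap m[i][j] with m[j][i]
   for every pair above the diagonal, exactly once per pair. */
void transpose(int m[10][10]) {
    for (int i = 0; i < 10; i++)
        for (int j = i + 1; j < 10; j++) {
            int e = m[i][j];       /* e := m[i,j] */
            m[i][j] = m[j][i];     /* m[i,j] := m[j,i] */
            m[j][i] = e;           /* m[j,i] := e */
        }
}

/* Fill a matrix with distinguishable values, transpose, probe one cell. */
int transpose_check(void) {
    int m[10][10];
    for (int i = 0; i < 10; i++)
        for (int j = 0; j < 10; j++)
            m[i][j] = 10 * i + j;
    transpose(m);
    return m[2][7];                /* was m[7][2] = 72 before the swap */
}
```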
Although in practice programmers may tend to prefer for loops which
are incrementing, it is possible to have a decrementing loop in Pascal.
Such a loop would be written using the downto keyword as in:
move.w j,d7 ;j=:d7
add.w #1,d7 ;j+1
blt @1 ;early exit if j<0
@0 lea ai,a0 ;ai[......
move.w k,d0
add.w d0,-2(a0,d7.w*2) ;68020
dbra d7,@0
@1 ;continue
move.w #6,d0
lea s1,a0 ;address of A string
lea @0,a1 ;address of "foobar"
bra.s @1 ;branch to loop test
@0 dc.b "foobar"
and we could then use the dbge instruction to implement the loop
controlling branch:
The dbge instruction first of all tests the ge condition. If it is true - i.e. if
the condition codes register matches it - then execution continues with
the following instruction and the loop terminates. If the condition is false
then the dbge instruction is equivalent to a dbra instruction: it
decrements the loop counter register and branches to the label if it was not
0.
Because of the complexity of this instruction, in particular because we
cannot predict the state of the loop control register, we have to assume
that it is invalid after the completion of the loop: the programmer cannot
rely on its value.
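The two-part test that dbge performs can be sketched in C (our rendering, not the book's): first the condition is checked, and only if it fails is the counter decremented and the branch taken, stopping anyway once the counter passes zero.

```c
#include <assert.h>

/* A C rendering of a dbge-controlled scan: return the index of the
   first element >= limit, giving up (like the counter reaching -1)
   after `counter` further iterations. */
int scan_dbge(const int *a, int limit, int counter) {
    int i = 0;
    for (;;) {
        if (a[i] >= limit)      /* ge condition true: fall through, loop ends */
            return i;
        if (counter-- == 0)     /* counter was 0: decrements to -1, no branch */
            return -1;          /* loop exhausted without the condition */
        i++;                    /* branch back to the loop head */
    }
}

static const int vals[] = { 1, 4, 9, 16, 25 };
```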
In practice it may be quite rare for a Pascal compiler to generate such
instructions from a for loop in the program since most programmers
write their for loops in increasing order rather than decreasing order.
However, an important use for the dbcc instructions is to implement the
'hidden loops' generated automatically by the compiler to implement
whole array or record assignments or string comparisons - in such a
situation the compiler 'knows' the whole loop and it is easy to arrange it to
make best use of the dbcc instructions.
7.1.4 Case statements
A Pascal case statement is used as a way of specifying a multi-way branch:
it is often used when the different possible values of an expression can
determine one of several actions to take. A classic example of this might
be in a 'calculator' program which reads in expressions and evaluates
them, by reading in a character and performing a case analysis on the
character:
case ch of
'0'..'9': { read a digit }
'+': { add two numbers }
'-': { subtract ...... }
'Q': { stop }
end;
<S1>; goto @1
<S2>; goto @1
...
@1: <continue>
The main code for a case statement consists of computing the value of the
expression and using this value to index the table. The table would
normally consist of offsets to instructions rather than addresses:
@99 ;continue
We have used a new instruction and a new addressing mode in this
sequence of instructions. The instruction:
jmp @0(d0.w)
uses the program counter indexed addressing mode with an
offset to @0. The jmp instruction is similar to the bra instruction except
that the normal addressing modes are available to specify the target
address - except for the register direct and immediate modes. This means,
for example, that we can specify a jump to the contents of an address
register:
jmp (a0)
which jumps to the program whose address is in a0. This instruction will
be used again when we look at the structure of code which implements
Pascal procedures.
Notice that the addressing mode used in the jmp instruction is not to be
confused with the 'address register indirect with displacement' addressing
mode which would be written:
jmp 0(an)
although the shorter form jmp (an) is to be preferred.
The third new aspect of the code fragment for case selection is the use
of label subtraction in the definition of the offsets in the table. A directive
such as:
dc.w @2-@0
causes the assembler to store in the word the difference between the
addresses @2 and @0. Provided that both @2 and @0 don't move relative
to each other, this difference is a fixed number which the assembler can
determine even without knowing where they are in absolute terms.
@0 jmp @1
jmp @2
and we index into the correct jmp instruction as though we were accessing
a table of long words:
The 'indexed jump' selects one of the jmp instructions to execute, executes
it, which then causes a jump to the instructions for the selected case. If we
are prepared to risk the possibility that an expression might be out of
range, and not do the bounds check, then the index selection becomes
quite short:
<compute e into d0>
jmp @0(d0.w*4) ;68020 only
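The calculator-style case analysis above can be sketched in C (names such as classify and the action enum are ours): a C compiler typically turns a dense switch into exactly the kind of jump table just described, with the default clause playing the role of the bounds check.

```c
#include <assert.h>

enum action { DIGIT, ADD, SUBTRACT, QUIT, OTHER };

/* Multi-way branch on a character, as in the calculator example.
   A dense switch like this is usually compiled to an indexed jump
   through a table; `default` is the out-of-range case. */
enum action classify(char ch) {
    if (ch >= '0' && ch <= '9')
        return DIGIT;           /* the '0'..'9' range arm */
    switch (ch) {
    case '+': return ADD;
    case '-': return SUBTRACT;
    case 'Q': return QUIT;
    default:  return OTHER;     /* bounds check: no matching case */
    }
}
```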
7.1.5 Exercises
while i>j do
begin
if (2*i = j) || (j>10) then
ai[i]:=j
else
ai[j]:=i;
i:=i-1;
end;
2. The standards committee for Pascal has decreed that there
should be an extension to the language: the conditional expression.
An example of the use of this expression might be:
Sketch out how you would compile such an expression, and give the
exact instructions for the statement:
for i:=S to E do
fab[i].foo := fab[i].foo+jab[i].foo;
lea fab,a0
move.w d7,d0
asl.w #1,d0
move.w d0,d1
asl.w #2,d0
add.w d1,d0 ;i*10
add.w d2,foo-length(a0,d0.w)
@2 cmp.w E,d7
ble @1 ;end of loop?
The next iteration of the loop will involve accessing the elements:
move.w S,d7
move.w d7,d0
asl.w #1,d0
move.w d0,d1
asl.w #2,d0
add.w d1,d0 ;S*10
lea fab,a0
lea -length(a0,d0.w),a2 ;a2=fab[S]
lea jab,a0
lea -length(a0,d0.w),a3 ;a3=jab[S]
bra @2
add.w #1,d7
@2 cmp.w E,d7
ble @1
In this version of the loop, only seven instructions are executed in each
iteration (although more are executed during the loop set up).
fp = &fab[i];
jp = &jab[i];
procedure proc;
begin
end;
proc;
The effect of the jsr instruction is to push the return address (i.e. the
address of the instruction which follows the jsr instruction) onto the
system stack, and then cause a jump to the program address specified by
the operand of the jsr:
[Stack diagram: before the call a7 points at the top of the stack; after
jsr proc the return address <ret> has been pushed and a7 points at it.
The address registers a0-a6 are unaffected.]
<ret> is the return address left on the stack by the jsr instruction - it is
the address of the instruction which follows the jsr.
At the end of the instruction sequence which implements the called
procedure proc we insert a rts (return from subroutine) instruction.
This 'undoes' the effect of the jsr instruction: it 'pops off' the address
from the system stack into the program counter, i.e. continues
execution from the point after the call.
This simple mechanism allows us to have an arbitrary depth of
procedure calls - including recursive calls - since we are using a memory
stack to keep track of which procedure calls are in force and where to
return to for each separate invocation.
Although a primary function of the system stack is to keep track of
which procedures have been called, and from where, it is not the only role
that it plays (we have already seen that it is used in the evaluation of
expressions, for example). As programmers, it is our responsibility to
ensure that the system stack is balanced - that after executing a return to a
calling procedure, the stack is returned to the same height/place as before
the jsr instruction. It is a rather common error for assembler
programmers in particular to fail to ensure this simple condition; this
leads, at times, to some spectacular behaviour from our programs.
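The balancing discipline can be sketched with an explicit stack in C (our illustration, not the book's code): every push of a return word must be matched by exactly one pop, and after the outermost call completes the stack pointer is back where it started.

```c
#include <assert.h>

/* An explicit downward-growing stack, like a7 on the 68000. */
#define STACK_SIZE 64
static int stack[STACK_SIZE];
static int sp = STACK_SIZE;

static void push(int v) { stack[--sp] = v; }
static int  pop(void)   { return stack[sp++]; }

/* A recursive routine: each invocation saves a word (as jsr saves
   <ret>), recurses, then pops it again (as rts does).  Because every
   push is matched by a pop, the stack is balanced on return. */
int depth_sum(int n) {
    if (n == 0) return 0;
    push(n);                        /* jsr pushes the return address */
    int below = depth_sum(n - 1);   /* arbitrary depth of recursion */
    return pop() + below;           /* rts pops it: balanced again */
}
```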
procedure qs(fab:foobarray;
f:integer;var t:integer);
qs(fab,i+l,j);
move.w i,d0
add.w #1,d0 ;i+1
move.w d0,-(a7)
pea j
The pea instruction pushes the address specified by its operand onto the
system stack as opposed to the value of the operand. Of course, for us to be
able to use pea here, it must have been the case that we could specify
where the variable j is via a standard addressing mode. For example, it
may be that j was a field in the j_r variable which is a j_rec record
(say), in which case to pass the j parameter we would have:
All the parameters, whether they are var or value parameters, are pushed
onto the system stack just before the call to the procedure. The calling
sequence for the complete call:
pea fab ;fab:foobarray
move.w i,d0
add.w #1,d0
move.w d0,-(a7)
pea j ;j:integer
[Stack diagram: the pushes leave ^fab, i+1 and ^j on the stack; once qs
is entered the return address <ret> lies below them, with a7 pointing at it.]
move.l qs.t(a7),a0
move.w (a0),......
link a6,#-16
This instruction performs three actions to reserve the 16 bytes: the current
value of the address register - a6 - is pushed onto the system stack, the
stack pointer - a7 - is then copied into a6, and finally 16 is subtracted from
a7. This last step ensures that there are 16 bytes on the stack which will
not be overwritten should there be a subsequent stack push. The
instruction semantics is to add the amount to the system stack pointer,
which is why we specify a negative amount to add!
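The three steps of link, and the matching unlk, can be sketched with an explicit stack in C (our simulation; the word-indexed memory and function names are invented for illustration):

```c
#include <assert.h>

/* A small word-addressed memory with a downward-growing stack. */
#define MEM_WORDS 256
static int mem[MEM_WORDS];
static int a7 = MEM_WORDS;     /* system stack pointer */
static int a6 = -1;            /* link / frame register */

/* link a6,#-n: push old a6, point a6 at it, drop a7 to reserve locals. */
static void do_link(int locals) {
    mem[--a7] = a6;            /* push the old link word */
    a6 = a7;                   /* a6 := a7 */
    a7 -= locals;              /* reserve the local space */
}

/* unlk a6: discard the locals, pop the saved link word back into a6. */
static void do_unlk(void) {
    a7 = a6;
    a6 = mem[a7++];
}

/* Enter a frame with 4 words of locals, use one via a negative offset
   from the link register, and leave again with the stack balanced. */
int frame_demo(void) {
    int before = a7;
    do_link(4);
    mem[a6 - 1] = 42;          /* a local, at offset -1 from a6 */
    int v = mem[a6 - 1];
    do_unlk();
    return (a7 == before) ? v : -1;
}
```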
[Stack diagram for link a4,#-16: the old value of a4 (the <link> word) is
pushed onto the stack, a4 is set to point at it, and a7 is then dropped by
16 bytes to leave the free space below the link word.]
link a4,#-16
The fact that the 'free space' is below a6 - which does not directly point to
the allocated block - is largely immaterial: we simply use a negative offset
from a6 when we wish to refer to a location within the free space. The
amount of space that we need to allocate is found by counting the space
needed for each of the local variables and adding to it any space needed for
local copies of non-scalar parameters which have been passed by value. In
the case of qs, the full declaration of the procedure may look like:
procedure qs(fab:foobarray;
f:integer;var t:integer);
var i,j:integer;
begin
in which case we need four bytes for the local variables i and j, and 100
bytes for the copy of fab (assuming that it is an array of 10 foobar
records, each of which is 10 bytes long):
qs link a6,#-104
After the link instruction has been executed, and after copying non-
scalars which have been passed by value, we have a complete stack frame
in which we can execute the newly entered procedure:
[Stack diagram: the complete frame for qs after link a6,#-104 - the
parameters ^fab, i+1 and ^j lie above the return address left by jsr qs,
the saved link word is at a6, and below it sit the locals i and j and the
100-byte local copy of fab, with a7 at the bottom of the frame.]
We can view the set of local variables as being part of a record: the
difference between this record and a normal data record is that the offsets
are negative, but we can assign symbolic names to them all the same:
qs.i equ -2
qs.j equ -4
qs.fab equ -104
i:=i+j;
becomes
move.w qs.j(a6),d0
add.w d0,qs.i(a6)
In fact, we can also use the link register to gain access to the parameters of
the call: whilst a negative offset is needed to access a local variable, an
appropriate positive offset will access the parameters of the call. The long
words at the link register (offset 0) and immediately above (offset 4) are
occupied by the previous link address and the return address respectively,
but above that - from offset 8 upwards - are the parameters of the call.
The link register is fixed, and therefore offsets from it are valid,
throughout the execution of the procedure body; unlike the system stack
pointer which can vary considerably. This applies even if there are
procedures which are called from within the body: provided that the link
register is preserved across a call. (This is why the link instruction saves
the old link register.)
So, we can now completely determine where all local variables and
parameters to a procedure are located: they are accessed via offsets to a link
register which is established on entry to the procedure. In particular, we
can now completely specify all our assembler instructions.
unlk a6
The fact that the link instruction preserves the old link value, and the
unlk restores it, allows us to support local variables and parameters
within recursive procedures. A new invocation of the same procedure
will re-use a6 to access its own locals and parameters; when the recursive
call is completed the unlk instruction restores the previous environment
so that access can be made to its locals and parameters. We can re-use the
link register again in this way because we never need, in a recursive call,
access to the local variables for both the recursive calls at the same time.
Having cleared the local variables with the unlk instruction, it remains
to clear the parameters from the stack and to return to the caller. After the
local variables have been de-allocated a typical stack frame still has the
return address 'on top of' the parameters:
[Stack diagram: after unlk a6 the locals and the link word are gone; the
return address <ret> sits on top of the parameters ^fab, i+1 and ^j, just
as the jsr left it.]
There are two possible ways to clear the parameters: we can immediately
execute a rts instruction (since the top of the stack now contains the
return address left by the jsr instruction), and let the caller remove the
arguments that it has stacked; or we can remove them before returning to
the caller.
The first method can be done using a stack adjustment after the jsr
instruction:
jsr qs
add.l #10,a7 ;adjust stack
In order for us to clear the stack before the return we have to perform
some shuffling:
Notice that we have, in effect, split the sub-routine return operation into
two parts and inserted the stack adjustment into the middle. Although
this sequence of instructions is longer than the first one, it executes in the
same amount of time on the 68000: demonstrating that rts is a relatively
expensive operation.
The 68010/68020/68030 processors have an instruction which simplifies
the stack adjustment somewhat. The rtd instruction combines the effect
of a sub-routine return with the corresponding stack adjustment; we can
return and clean up the parameters from the qs program with the single
instruction:
rtd #10
This procedure has two var parameters: i and j, and one local variable k
which occupies two bytes. If we use the register a6 as our link register, we
can access the local variable k with the negative offset -2, and the two var
parameters i and j have positive offsets of 8 and 12 respectively.
The assignment
k:=i;
can access its own variables, those of its enclosing procedure and the global
variables.
In practice we might not have such complete freedom to choose link
registers. For example, the Macintosh O/S requires that register a5 is used
for global variables. We can still use a6 for the 1st level, and a4 for the
2nd level and so on. The exact order of register usage is not important so
long as the compiler is consistent.
We can use the same address register for all of the procedures at a given
lexical level because although a procedure can access variables in outer
procedures, under the scoping rules it cannot access variables at the same
or inner lexical scope. Similarly, recursion can be safely implemented
because a recursive call is simply a call to a procedure at the same lexical
level as the caller!
In the Pascal program below we need to support three lexical levels, the
program level and two inner levels:
program pr;
var a: integer; {lexical level 0}
begin{main program}
a:=3;
p;
end.
which we can do using a6 for the globals, a5 for the variables within the
level 1 procedures p and r, and a4 for procedure q which is at lexical level
2. Notice that the global variables are allocated using a link instruction,
just as we do for the other procedures. This allows us to consider that the
operating system can call our program just as though it were another
procedure!
p.b equ -2 ;b in p
p link a5,#-2 ;level 1
jsr q ;call q
unlk a5
rts
q.c equ -2 ;c in q
q link a4,#-2 ;level 2
jsr r ;call r
move.w pr.a(a6),d0 ;a
add.w p.b(a5),d0 ;a+b
move.w d0,q.c(a4) ;c:=a+b
unlk a4
rts
;main program
pr.a equ -2
pr link a6,#-2 ;global var
move.w #3,pr.a(a6) ;a:=3;
jsr p
unlk a6
rts ;to O/S......
There are 8 address registers which, in principle, can all be used as link
registers; however, a7 is already in use as the system stack pointer, and if
we wish to be able to implement indirect access to variables then we need
at least one, and preferably two, address registers for use in intermediate
calculations. This leaves 5 address registers that we can potentially use,
and that in turn means that we can support procedures up to five lexical
levels deep. This should support all practical examples of Pascal programs;
however if a deeper level is necessary then we can re-use some of the
registers and provide extra links to the missing lexical levels.
over memory (after all, one of the motivations for using an assembler is to
gain increased speed).
Some compilers also attempt to use the registers for holding program
variables as opposed to using them as pointers to memory locations which
themselves contain the variables. Typically, reflecting the different
capabilities of the two register banks on the 680x0, we might use data
registers to hold numeric or character values and address registers to hold
pointer variables.
Obviously, the scope for using registers to hold variables is limited by
the fact that there are only a fixed number of registers and the compiler
must use some of them to support other necessary features. However, we
do not need all 16 registers to support Pascal and so some can be used
to hold users' variables.
If some of the registers are to be used to hold variables, then we must
ensure that their validity is maintained; if a procedure calls another one
then any variables which are in registers must either be preserved by the
calling procedure, so that they can be restored when the procedure returns,
or the callee can save any registers that it uses and restore them before
returning.
In either case, the likelihood is that several registers will need to be
saved and restored on either side of a call. The 680x0 has an instruction -
the movem move multiple instruction - which simplifies the process of
saving and restoring groups of registers. This instruction has two forms,
the first is used when saving registers and the second when restoring.
In order to save all the data registers except d3, and address registers a2
and a4, on the system stack we would use:
movem.l d0-d2/d4-d7/a2/a4,-(a7)
movem.l (a7)+,d0-d2/d4-d7/a2/a4
expressions. The use to which the registers are put depends on a balance
chosen by the compiler writer. In many situations it is possible that all the
requirements can be met without compromise.
For example, provided that they are preserved prior to their use, the
address registers which are needed to support higher lexical levels can be
made available to the lower lexical level procedures for use in with
statements.
8.2.2 Exercises
1. Given the Pascal program below, show the complete sequence of 680x0
instructions that would implement the program. You may assume
that address registers a4 through a6 are available as link registers,
although you should indicate which you intend to use for a given
lexical level:
begin {split}
less := first;
greater := last + 1;
ref_val := d[less].extn_no;
repeat
repeat
less := less + 1;
until (d[less].extn_no >= ref_val);
repeat
greater := greater - 1;
until
d[greater].extn_no <= ref_val;
if less < greater then
swap(less, greater);
until (less >= greater);
swap(first, greater);
end; {split}
begin {quicksort}
if first < last then
begin
split(middle);
quicksort(d,first,middle-1);
quicksort(d,middle+1,last);
end;
end; {quicksort}
begin {main}
directory[0].extn_no := -1;
directory[max_array].extn_no := MAXINT;
for count := 1 to number_of_entries do
with directory[count] do
readln(extn_no,name);
quicksort(directory,1,number_of_entries);
8.3 Functions
In Pascal, functions can be viewed as being procedures with an extra
argument which is represented as the result variable. A function call is
implemented in the same way as a procedure call except that it always
takes place in the context of an expression evaluation:
i:=max(j,k)+1;
The extra, hidden, parameter to the function call is filled in when the
function variable is assigned in the body of the function:
move.w #0,-(a7) ;result
move.w p.j(a6),-(a7) ;j of context p
move.w q.k(a5),-(a7) ;k of context q
jsr max ;call max
[Stack diagram: before the call a7=1002; after the three pushes and the
jsr the stack holds the result slot, j, k and the return address <ret>,
with a7 pointing at <ret>.]
When the function max has returned, the result of the function will be
left on the stack, although the arguments j and k will have been cleared.
This allows us to use the value of the function in an expression in the
normal way:
Apart from that, we allocate local variables on the stack in the same way as
we do for procedures. The complete max function can be implemented as
the sequence of instructions:
Notice that even if there are no local variables we still use a link
instruction - with an allocation of zero bytes - to establish a convenient
pointer to the parameters of the procedure or function.
This scheme of implementing function calls fits in very well with our
method for implementing expressions using the system stack for
intermediate results. A function call simply leaves its result on the stack
as its contribution to the value of the expression.
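The 'function as a procedure with an extra result parameter' scheme can be sketched in C (a sketch of ours; max_proc and call_max are invented names): the caller reserves a result slot, the body assigns to it, and the value is what remains for the enclosing expression.

```c
#include <assert.h>

/* A Pascal function viewed as a procedure with a hidden result
   parameter: assigning to the function variable fills in the slot. */
static void max_proc(int j, int k, int *result) {
    *result = (j > k) ? j : k;     /* max := ... in the function body */
}

/* The calling sequence: reserve the result slot, pass the arguments,
   call; the result is what is left for the surrounding expression. */
int call_max(int j, int k) {
    int result = 0;                /* move.w #0,-(a7) reserves the slot */
    max_proc(j, k, &result);       /* push j, push k, jsr max */
    return result;                 /* the value "left on the stack" */
}
```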
However, we can also combine function calls with a register based
scheme for expression evaluation provided that we preserve the
intermediate registers prior to making the call, and restore them
afterwards. In this kind of system, we would also use a data register - d0
(say) - to return the value of the function, rather than using the stack.
while fbp^.foo<10 do
begin
fbp:=fbp^.next;
if fbp=nil then goto 10;
end;
10:
For simple situations like this, the goto statement is easily mapped to a
jmp instruction:
bra @1
@0
move.l p.fbp(a6),a0 ;fbp in scope p
move.l next(a0),a0
move.l a0,p.fbp(a6)
cmp.l #nil,a0 ;fbp=nil?
beq @10 ;goto 10
procedure a;
label 10;
procedure b;
begin
goto 10;
end;
begin
b;
10: ...
end;
In such a circumstance, we have to be careful to restore the system stack
and various link registers to their correct state appropriate to the outer
scope. A simple jmp instruction from a procedure to a point in the calling
procedure potentially leaves the stack in a disordered state. Without
adjustment the stack would still reflect the position within the called
procedure - it would be unbalanced. However we can readjust the stack to
take into account the goto.
Since the target label must be in an outer scope to the procedure with
the goto statement it must also be the case that the link register for that
lexical level is still in force, similarly for the lexical levels which are below
it. It is not permissible, in Pascal, for a goto statement to exit to a
procedure at the same or higher lexical level. We could take advantage of
this and simply do nothing - we could ignore the space allocated on the
stack.
movem.l (a7)+,d3-d5/a2-a4
rts
then a goto out of this procedure would also have to restore these
registers:
movem.l (a7)+,d3-d5/a2-a4
jmp <label>
Apart from the data registers, it is vital that any frame registers that may
have been used are restored prior to the jump. This includes the local
variables allocated by the exiting procedure.
In the case of a jump out of more than one lexical level it is often
impossible to predict which frame registers have been used and need to be
restored. In this case it becomes necessary to force the execution of the exit
sequences of the procedures that the goto is passing through before
jumping to the appropriate label in the target procedure. This can be done
by patching return addresses on the stack, but this is a complex operation.
In general, using a goto to exit a procedure is not to be recommended,
and some languages (such as 'C') do not permit non-local gotos.
CHAPTER NINE
Symbolic programming languages
9.1 Recursive data structures
i_list=record
i:integer;
next:^i_list;
end;
i_list=record
i:integer;
next:i_list; { recursive reference }
end;
Our test on the first list element would now be expressed as:
data objects since the programmer does not need to be concerned with the
fiddly details of using pointers.
Since pointers are no longer explicit in a system with recursive data
structures, it follows that the programmer has less control over them. In
Pascal, when we assign a record variable ip (say) to another record
variable jp (say), we can choose whether to change the pointer to the
record or the contents of the record itself:
versus
LISP also has a method for combining data: the dotted pair or CONS pair.
A CONS pair is simply a combination of two LISP objects (each of which
might be a dotted pair also). We write the dotted pair of A and B as:
(A . B)
where A is referred to as the CAR of the pair and B is the CDR. The terms
CAR and CDR are used for historical reasons: the earliest implementations
of LISP were on the IBM 709 computer. This machine had two registers
called the current Address Register and the current Decrement Register. A
frequent operation in LISP is to build the two components of the CONS
pair; so they were typically loaded into the Address register and Decrement
register prior to building the pair in one step.
Where the CDR of a dotted pair is another dotted pair then we can use
an alternative notation: the list notation. This is similar to the dotted pair
except that instead of writing:
(A . (B . C))
we can write:
(A B . C)
There is a special case where the 'last' element is the special atom () or
nil or *nil* - depending on the version of LISP. In this case the final
dot is omitted; this allows us to write lists in a natural way:
(1 2 3 4 5)
instead of
(1 . (2 . (3 . (4 . (5 . nil)))))
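The record structure behind this notation can be sketched in C (a sketch of ours, with integer-only cars for brevity): each dotted pair is a two-field record, and a list is a cdr chain ending in nil.

```c
#include <assert.h>
#include <stdlib.h>

/* A dotted pair as a record: (1 2 3) is 1 . (2 . (3 . nil)). */
struct pair { int car; struct pair *cdr; };

/* cons builds one pair; in a real system the cell would come from a
   managed heap rather than a bare malloc. */
static struct pair *cons(int car, struct pair *cdr) {
    struct pair *p = malloc(sizeof *p);
    p->car = car;
    p->cdr = cdr;
    return p;
}

/* Walk the cdr chain, summing the cars, to show the list structure. */
int sum_list(const struct pair *p) {
    int total = 0;
    for (; p != NULL; p = p->cdr)
        total += p->car;
    return total;
}

/* (1 2 3) built inside-out, with nil (NULL) written last. */
int demo(void) {
    return sum_list(cons(1, cons(2, cons(3, NULL))));
}
```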
Note that the fractional number 23.3 is differentiated from a dotted pair
by the absence of spaces around the dot.
This notation for describing data structures is called the S-expression
notation. The S-expression notation is a fully recursive language for
describing tree-like data objects. It is quite possible to program in LISP
without any understanding of how S-expressions are implemented. The
language of S-expressions is sufficient as a tool for 'thinking' about data.
This is the second great achievement underlying the LISP language.
LISP programs are also valid data objects which are written as lists:
(defun app (x y)
  (cond ((nilp x) y)
        (T (cons (car x)
                 (app (cdr x) y)))))
The lines coming out of the box indicate pointers to the CAR and CDR of
the dotted pair. We can represent an atom as another kind of box:
Each box corresponds to a record in the memory of the computer and each
pointer is represented by the address of the box pointed at. Box diagrams
can easily become quite complicated:
The box corresponding to the nil atom tends to have a large number of
references. To simplify our later box diagrams we will use a special box
symbol to denote pointers to nil, as in Figure 9.4:
tg=(atom,dotted);
s_exp=record
case tag: tg of
atom: ...... { see below for atoms }
dotted: (car,cdr:^s_exp);
end;
cell=record
case tag: tg of
atom: ......
dotted: (ptr:^pair);
end;
This representation, which is the basis of many modern LISP and Prolog
systems, is used because it reduces the overheads for some common types
of atoms: notably integers. Some LISP systems have additional methods
for structuring data: vectors for example, and the tagged pointer scheme
makes this easier.
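The tagged-cell variant record can be sketched in C (our sketch; the tag names and helpers are invented): a tag field distinguishes a small integer held directly in the cell from a pointer to a dotted-pair record, so integer atoms need no extra indirection.

```c
#include <assert.h>

/* The tag distinguishes what the rest of the cell holds. */
enum tag { TAG_INT, TAG_PAIR };

struct pair;                        /* forward declaration */
struct cell {
    enum tag tag;
    union {
        int          i;             /* integer atom: value in the cell */
        struct pair *ptr;           /* dotted pair: pointer to the record */
    } u;
};
struct pair { struct cell car, cdr; };

/* Build an immediate integer cell - no heap record needed. */
static struct cell make_int(int i) {
    struct cell c;
    c.tag = TAG_INT;
    c.u.i = i;
    return c;
}

/* Type dispatch is just a tag test; integers are read directly. */
int cell_int_value(struct cell c) {
    return (c.tag == TAG_INT) ? c.u.i : 0;
}
```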
[Figure: the internal structure of the atom append - a record with
tag=atom whose property list pairs "code" with the instructions for the
append function and "pname" with the list of characters "a","p","p",...]
Notice that the structure for an atom itself uses dotted pairs extensively.
The distinguishing feature of an atom structure and a normal S-
expression is that the tag field of the 'top-most' record is set to atom
rather than dotted. Below this top-level dotted pair the structure of an
atom is a normal S-expression albeit in a particular format. Standard
access functions which operate over S-expressions also operate over the
internal structure of an atom.
Another feature of the atom structure is that the print name of an atom
can be arbitrarily long, since it is represented by a list of characters. (This
would have been a relief for FORTRAN programmers in the 1950s used
to identifiers being restricted to only 6 significant letters.) It is important to
be able to 'hold' all of the characters that make up a name since we need to
be able to read, print and re-read any S-expression. Unless all the letters in
a print name are remembered there would be a risk of two atoms printing
the same way, and hence being confused when the S-expression is re-read.
Symbol dictionary
Finally, especially given the complexity of an atom's structure, we make
some effort to share the memory occupied by it. In particular, all
references to an atom, in all S-expressions, are resolved to a single copy of
the atom's structure in memory. This includes references from within
programs and from within the system stack. This is done using a special
[Figure: a specialised integer atom - a cell with tag=integer holding the
bit pattern for 23 directly, contrasted with the general tag=atom
structure and its list of other properties.]
Other possible special cases of atoms which would benefit from specialised
representations include floating point numbers, single character symbols
and 'special' system pointers (such as nil).
Figure 9.7 98,537,195,986,590,732,017,237 as a list of digits
However, this could be quite expensive in space: each dotted pair (and so
each digit in the big num) occupies 10 bytes (say); whereas a single byte can
hold the equivalent of over two decimal digits: a big num represented in
this way would be 20-25 times as expensive as a regular number. More
economically, we can split the binary expansion of the number into 32 bit
chunks:
00000000000000000001010011011101
10110110010011000000110101011101
01101011010101100101111001010101
This representation is only 2-3 times larger than the space needed to
represent the pure bit string itself.
If we are to perform arithmetic on such numbers then we have to do it
on a piece-meal basis also - preferably in 32 bit chunks. This means that
we need to see how multi-word arithmetic can be done in a fixed word
machine.
Suppose that we were to add the numbers 120 and 150 in 8 bit
arithmetic. The 680x0 processor allows us to do so, using the add. b
instruction:
move.b #120,d0
add.b #150,d0
The result of this addition is 14 (since 270 mod 256 = 14), but the processor also
sets the carry flag to 1 to indicate that the result overflowed the ability of 8
bits to represent the result. It is also true that the binary expansion of 270
consists of a leading 1 (the carry bit) followed by the binary expansion of
14:
= (3+7)*256 + (232+208)
= (3+7+1)*256+184
= 11*256+184
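The chunk-by-chunk scheme can be sketched in C (our sketch, using 8-bit chunks so the 120+150 example is visible; a real big num system would use 32-bit chunks): each position adds two chunks plus the incoming carry, keeps the low part, and passes the overflow on, which is what add followed by addx does on the 68000.

```c
#include <assert.h>
#include <stdint.h>

/* Multi-word addition, least significant chunk first: at each position
   add the two chunks and the incoming carry, store the result mod 256,
   and propagate the overflow as the next carry. */
void bignum_add(const uint8_t *a, const uint8_t *b, uint8_t *sum, int n) {
    unsigned carry = 0;
    for (int i = 0; i < n; i++) {
        unsigned t = (unsigned)a[i] + b[i] + carry;  /* add.b + carry in */
        sum[i] = (uint8_t)(t & 0xFF);                /* t mod 256 */
        carry = t >> 8;                              /* carry flag out */
    }
}

/* The text's example: 120 + 150 = 270 = 1*256 + 14 in 8-bit chunks. */
int demo_low(void)  { uint8_t a[2]={120,0}, b[2]={150,0}, s[2]; bignum_add(a,b,s,2); return s[0]; }
int demo_high(void) { uint8_t a[2]={120,0}, b[2]={150,0}, s[2]; bignum_add(a,b,s,2); return s[1]; }
```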
take care of factors such as aligning the lists of chunks together, and
performing mixed arithmetic between normal numbers and big nums.
(cons a b)
is a dotted pair, whose car is the value of a and whose cdr is the value
of b. Thus, assuming that there is no conflict with side effects, we have
the equivalence:
and
[Figure: the free list - a chain of unallocated dotted-pair records, each
cdr pointing to the next, with a nil pointer marking the end of the free
list.]
A typical LISP program might, on average, 'consume' one dotted pair from
the free list every other expression/statement which is executed. Clearly,
since it is a frequently accessed data structure, it would be a good idea to
reserve a register, a6 say, to point to the head of the free list.
To implement a cons expression involves extracting a dotted pair
record from the free list, updating a6 to point to the remainder of the free
list and filling in the car and cdr fields of the new record (the tag field
of each record in the free-list would normally already be pre-set to
dotted):
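This allocation step can be sketched in Python. It is a hedged illustration only - dict-based records stand in for the Pascal record, and a6 is an ordinary variable playing the role of the dedicated register; the book's own sequence would manipulate a6 directly in assembler.

```python
# A minimal sketch of cons allocation from a free list.

def make_heap(n):
    # chain n records together through their cdr fields to form the free list
    free = None
    for _ in range(n):
        free = {"tag": "dotted", "car": None, "cdr": free}
    return free

a6 = make_heap(100)          # a6 plays the dedicated register: head of the free list

def cons(car, cdr):
    global a6
    if a6 is None:
        raise MemoryError("free list empty: invoke the garbage collector")
    pair = a6                # extract the first record from the free list
    a6 = pair["cdr"]         # a6 now points to the remainder of the free list
    pair["car"] = car        # fill in the new record's fields
    pair["cdr"] = cdr        # (the tag field is already pre-set to dotted)
    return pair

p = cons(1, cons(2, None))   # the list (1 2)
print(p["car"], p["cdr"]["car"])
```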
The free list will become empty - at regular intervals - when all the pairs
in the heap have been allocated. At this point we have two choices - we
can try to increase the size of the heap or we can invoke a garbage collector.
Since no physical computer has an infinite capacity, at some point the first
option becomes impossible, in which case we must investigate collecting
the S-expressions which are no longer in use.
Garbage collection
As we noted above, in a language with recursive data types, it is difficult
for the programmer to keep track of which structures are in use at any one
time. Although it is clear from the text of a LISP program when new
cons pairs are created, it is less obvious when a cons pair is no longer
needed. Sometimes it is obvious, as can be seen in the (rather
tautologous) expression:

(car (cons a b))

After evaluating the car, the newly generated cons pair can be discarded.
However, most cons pairs have a longer, unpredictable, lifetime than
this. It is theoretically impossible to automatically predict when a given
cons pair will become garbage.
In any case it is tedious to have to do so, and if the system can
automatically keep track of which data objects are in use, and which are
not, then this removes a burden from the programmer. It is the task of
the garbage collector to clear up those data objects which have been created
but which are no longer in use.
There are many possible schemes for garbage collection, but they are all
based on the principle of identifying those objects which are still in use
(the mark phase) and removing the rest (the collect phase). Some garbage
collection systems have additional constraints over and above the basic
one of collecting all the unused space - for instance a real time garbage
collector is required to be as fast as possible, possibly at the cost of not
collecting all the unused space at once; other systems are required to
execute in minimal space; and yet other garbage collectors have to be able
to deal with objects of different sizes. We will look at one simple
scheme - stop, mark and collect. We stop when we run out of records in
the free list, mark the objects which are in use and collect the rest into the
free list.
The mark phase of a garbage collector examines all of the data objects -
S-expressions in our case - that are in use and sets a special marked flag
on them. This marked flag appears as a boolean field in the S-expression
record:
s_exp = record
          marked: boolean;        {true if S-exp is in use}
          case tag: tg of
            atom: ......
            dotted: (car, cdr: ^s_exp);
        end;
The collect phase involves trawling over the whole of the space allocated
to the heap and collecting up - into the free list - all those records which
are not in use (i.e. not marked during the mark phase). The collect phase
usually also un-marks those records which are in use.
There are many marking algorithms, many of which attempt to run in
the least possible amount of space. (Recall that the garbage collector is only
called when we have run out of space, therefore it is reasonable to assume
that there is not much space for the garbage collector itself to run in.)
However, the basic marking algorithm is quite simple: if we have a
pointer to a structure which we know is in use, then either it is already
marked, in which case we do nothing, or we mark it and we recursively
mark the car and cdr fields:
procedure mark(m: ^s_exp);
begin
  if not m^.marked then
    begin
      m^.marked := true;
      if m^.tag = dotted then     {only pairs have car and cdr fields}
        begin
          mark(m^.car);
          mark(m^.cdr);
        end;
    end;
end;
Notice that we must mark a dotted pair before marking the car and cdr
fields to prevent looping in the case that a structure is circular, which
frequently happens in a LISP system.
There are only a few places from which ultimately all the structures in
use can be reached. If a given S-expression is not referenced from either
the currently executing LISP program (in which case the reference would
originate from a value on the expression stack) or from the atom
dictionary (which leads in turn to the defined LISP functions) then it is not
possible to access the S-expression and therefore it must be garbage. Thus,
it is relatively simple for the LISP system to ensure that the mark
procedure can access all of the objects in use.
The collect phase of the garbage collector involves going through the
heap space and examining every record in it. It is at this point that we can
appreciate some of the beauty of LISP data structures: since every S-
expression is ultimately built from the same dotted pair record, it is a
simple matter for the collector to distinguish records in use from garbage,
simply from the mark flag:
procedure collect;
  var p: ^s_exp;
begin
  p := start_of_heap;            {trawl the heap}
  repeat
    if not p^.marked then        {in use?}
      begin
        p^.cdr := free_list;     {put in free list}
        free_list := p;
      end;
    p^.marked := false;          {clear mark flag}
    p := succ(p);                {go to next S-exp}
  until p = end_of_heap;
end;
This pseudo-Pascal procedure sketches out the main aspects of the
collection phase of the garbage collector. If the collect phase fails to find
any garbage, then the evaluation of the LISP program must terminate and
return some kind of error condition to the user.
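The mark and collect phases can be sketched together in Python. This is a hedged toy model over a six-cell heap: dicts stand in for the Pascal records, and atoms are represented by plain Python values so that only pairs carry mark flags.

```python
# A sketch of stop-mark-and-collect over a small heap of cons cells.

heap = [{"marked": False, "car": None, "cdr": None} for _ in range(6)]

def mark(cell):
    # mark before recursing, so circular structures terminate
    if isinstance(cell, dict) and not cell["marked"]:
        cell["marked"] = True
        mark(cell["car"])
        mark(cell["cdr"])

def collect():
    # sweep the whole heap: unmarked cells go back on the free list
    free = None
    for cell in heap:
        if not cell["marked"]:
            cell["cdr"] = free
            free = cell
        cell["marked"] = False    # clear mark flags ready for the next collection
    return free

# build the list (1 2) in cells 0 and 1; cells 2-5 are garbage
heap[0]["car"], heap[0]["cdr"] = 1, heap[1]
heap[1]["car"], heap[1]["cdr"] = 2, None

mark(heap[0])                     # root: the program's live data
free_list = collect()

free_count = 0
f = free_list
while f is not None:
    free_count += 1
    f = f["cdr"]
print(free_count)                 # four garbage cells recovered
```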
9.3 Executing LISP programs

Although nowadays few LISP systems are not compiler based, they also
contain - as a library function - an interpreter for LISP programs.
LISP execution can be seen as a two-phase activity: expression
evaluation and function application. Evaluating expressions is the
primary way in which we initiate a LISP execution and function
application is the method used to help expression evaluation.
There are, in fact, three types of expression in LISP: function and
primitive operator application, program sequences and special forms,
which include conditional expressions.
(car x)
is the S-expression contained in the car field of the S-expression
identified by the variable x, and
(cdr x)
refers to the cdr field of x. Since, by Pascal standards, all S-expressions are
non-scalar, all LISP variables contain addresses of values rather than
values directly. The address contained in a variable is usually of an object
which is in the heap. This means that accessing the value of a variable -
such as x here - involves at least one memory indirection. The cdr
expression, for example, is compiled into code which picks up x's contents,
checks that the address contained refers to a dotted pair and then accesses
the cdr field of that pair:
move.l x,a0                ;x is an address
cmp.b #dotted,tag(a0)      ;pair?
bne cdr_error              ;illegal access
move.l cdr(a0), ...        ;cdr is a pointer
Lists are dominant in a LISP system and accessing list structures is always
performed using the car and cdr access functions. To simplify accessing
structures, particularly when the sequence of cars and cdrs is known
already (such as when the programmer wants to access the third element
172 Symbolic programming languages
of a list) LISP has a suite of path access functions, based on car and cdr.
For example, the expression

(caddr x)

abbreviates (car (cdr (cdr x))) - the third element of the list x - and

(cdddr x)

abbreviates (cdr (cdr (cdr x))). The code compiled for (caddr x) is:
move.l x,a0                ;x
cmp.b #dotted,tag(a0)      ;dotted pair?
bne cdr_error
move.l cdr(a0),a0          ;(cdr x)
cmp.b #dotted,tag(a0)      ;dotted pair?
bne cdr_error
move.l cdr(a0),a0          ;(cddr x)
cmp.b #dotted,tag(a0)      ;dotted pair?
bne cdr_error
move.l car(a0), ...        ;(caddr x)
This might be compared with the analogous Pascal expression to access the
second integer along in a list of integers:
move.l x,a0
cmp.l #nil,a0
beq nil_error
move.l next(a0),a0         ;x^.next
cmp.l #nil,a0
beq nil_error
move.l next(a0),a0         ;x^.next^.next
cmp.l #nil,a0
beq nil_error
move.l i(a0), ...          ;x^.next^.next^.i
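The check-then-access pattern that both sequences share can be sketched in Python. This is an illustration only, not the book's code: the dict-based cell representation and the helper names are assumptions.

```python
# Checked car/cdr access over tagged cells, and caddr built from them,
# mirroring the compiled check-then-access sequence.

class CdrError(Exception):
    pass

def make_pair(car, cdr):
    return {"tag": "dotted", "car": car, "cdr": cdr}

def checked(field, x):
    # every access first verifies the tag, like the cmp.b/bne pair above
    if not (isinstance(x, dict) and x["tag"] == "dotted"):
        raise CdrError("illegal access: not a dotted pair")   # cdr_error
    return x[field]

def car(x): return checked("car", x)
def cdr(x): return checked("cdr", x)

def caddr(x):
    # (caddr x) = (car (cdr (cdr x))): the third element of a list
    return car(cdr(cdr(x)))

lst = make_pair(10, make_pair(20, make_pair(30, "nil")))
print(caddr(lst))    # 30
```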
(append X Y)
move.l X,-(a7)
move.l Y,-(a7)
jsr append
Notice that we do not pass a hidden 'space' parameter for the result of the
function as we did for Pascal functions. This is because there is no
possibility of a LISP function assigning its value before returning to the
caller: in Pascal, a function's value is set by assigning the function variable
which could be any time within the execution of the function. When a
LISP function returns, the values on the stack are replaced by the value of
the function which is the value of the outermost expression in the LISP
function's body.
LISP has only one mode for passing arguments to a function: we pass
the value of the argument. However, since all values in LISP are S-
expressions, and since we can't pass an arbitrary S-expression in a fixed
length register, we pass the address of the value of the argument rather
than the value itself. So, in Pascal's terms, LISP is neither call-by-value
nor call-by-reference, but something in between. It is possible, for example
by using an S-expression overwriting primitive such as replaca, to
change the value of an argument to a function.
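This in-between parameter passing can be illustrated with a short Python sketch. It is a hedged model: dicts stand in for S-expression cells, and the function names are only stand-ins for the LISP primitives being described.

```python
# Passing the address of an S-expression means a destructive primitive
# such as replaca can change data the caller still holds.

def replaca(pair, new_car):
    # overwrite the car field of the pair in place
    pair["car"] = new_car
    return pair

def clobber(arg):
    # the callee receives a reference to the caller's structure, not a copy
    replaca(arg, 99)

cell = {"tag": "dotted", "car": 1, "cdr": None}
clobber(cell)
print(cell["car"])    # the caller's data has changed
```

Although the argument itself was "passed by value", the value was an address, so the mutation is visible to the caller - exactly the situation the text describes.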
[Figure: the append atom (tag = atom) - its property list holds a "code" property pointing to the instructions for the append function, a "pname" property spelling out "append", and other properties]
Although the address of the append function's code may not be fixed, we
can fix the address of the append atom. Once an atom has been read into
the system its address does not change; we can use this fact to locate the
code of the append function. Recall that the structure of a LISP atom
includes, as a property, the defining program for any function associated
with the atom.
Given the address of the append atom, we can implement our entry
into the append program by searching its property list for the code:
      move.l append,a0
@0    move.l car(a0),a1
      cmp.l #code,car(a1)    ;code property?
      beq @1                 ;found it!
      move.l cdr(a0),a0
      cmp.l #nil,a0          ;last property?
      beq undefined          ;append is undefined
      bra @0                 ;try next property
[Figure: an atom rearranged so that its first entry is the code of its associated function, followed by the other properties]
Notice that if every atom has a code property in the same place, then we
do not need to identify it explicitly; we can assert that the first entry in the
atom structure is always the code for the atom:
Figure 9.13 Layout of an undefined function's atom
If a given atom has no function associated with it, then it still has an
address of valid code to execute - it is simply the address of a standard
error reporting procedure.
With this structure for atoms, we are guaranteed that there is always
something to execute, even if it is only an error procedure; and that, in
turn, means that we don't need to check for valid code. Furthermore, by
fixing the location of the defining code, we can eliminate the search
through the atom's property list and use instead a simple indirection:
move.l append,a0
move.l car(a0),a0
jsr (a0)
move.l X,-(a7)
move.l Y,-(a7)
move.l append,a0
move.l car(a0),a0
jsr (a0)                   ;(append X Y)
move.l (a7)+,a0
move.l car(a0), ...        ;(car (append ...
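The fixed-slot scheme can be sketched in Python. This is a hedged illustration: the atom layout is reduced to a dict, and the stand-in body for append is an assumption, not the book's compiled code.

```python
# Every atom's first property is its code, so calling is one indirection,
# with a standard error routine as the default for undefined functions.

def undefined_error(*args):
    raise NameError("undefined function")

def make_atom(pname, code=undefined_error):
    # slot 0 is always the code; undefined atoms point at the error routine
    return {"code": code, "pname": pname}

append_atom = make_atom("append", code=lambda x, y: x + y)  # stand-in body
foo_atom = make_atom("foo")                                 # no definition

def call(atom, *args):
    # no property-list search and no validity check are needed:
    # there is always something to execute
    return atom["code"](*args)

print(call(append_atom, [1], [2]))
```

Calling foo_atom simply runs the error routine - the "guaranteed something to execute" that lets the compiled sequence omit any check.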
(foo 2 3)
(bar 3)
(times 3 X)
Since bar has been invoked within the dynamic scope of foo, the value
of x which is available to bar is 2; so the expression that is to be evaluated
is:
(times 3 2)
[Figure: the atom X - its property list holds a "value" property (the value of X) and a "pname" property; since X has no function definition, its code slot points to the 'not defined' error reporting procedure]
Since we are likely to refer to the value of an atom more frequently than
we call its program, we can optimise access slightly by rearranging the
atom structure so that the first two entries in the property list are always
predefined (which also means that we do not need explicit property
identifiers) and that the value is the first entry in the structure and the
function code is the second:
[Figure: the rearranged atom X - the value of X is the first entry and the (undefined) function code the second, followed by the pname and other properties]
To take into account the fact that the atom's value is now at the top of its
structure, the sequence to enter a function must have an additional cdr to
step over the atom's value.
So, as we enter a function body, our first action is to assign the
parameters to their new values; however, because a parameter atom may
already have a value associated with it (from an outer execution scope), we
must save its old value; which we can do by saving it onto the system
stack when we assign the variable.
The prologue of a function steps through each argument, saving the old
value of the parameter variable, and assigning it to its new value. The
space used on the stack for the arguments of the function is used to keep
the old value of the parameters while the function is executing:
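This shallow-binding discipline can be sketched in Python. It is a hedged toy model: a dict stands in for the atoms' value slots, a list for the system stack, and "unbound" is an illustrative marker.

```python
# Shallow binding: each atom has a single value slot; the function
# prologue saves old values on a stack, the epilogue restores them.

values = {"X": "unbound", "Y": "unbound"}   # the atoms' value slots
stack = []                                  # the system stack

def prologue(params, args):
    for p, a in zip(params, args):
        stack.append(values[p])   # save the old value (may be an outer binding)
        values[p] = a             # install the new value

def epilogue(params):
    for p in reversed(params):
        values[p] = stack.pop()   # restore the outer binding

prologue(["X"], [1])              # outer call binds X to 1
prologue(["X"], [2])              # recursive call rebinds X to 2
inner = values["X"]               # inside the recursive call, X is 2
epilogue(["X"])                   # returning: X is 1 again
outer = values["X"]
epilogue(["X"])                   # back at top level: X is unbound
print(inner, outer)
```

The stack slots holding the old values correspond to the argument positions on the 680x0 stack being reused to hold the saved bindings.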
Notice that we can use the absolute addressing mode, in the 680x0
instruction which accesses x, to refer to the address of X's atom structure.
This is because the address of an atom is fixed once it has been entered into
atom dictionary. The complete prologue sequence for append becomes:
move.l X,a1
move.l car(a1), ...        ;value of X
A prog sequence has the form:

(prog (var1 ... varn) exp1 ... expm)

where var1 ... varn are variables which are declared to be in scope
during the execution of the prog, and exp1, ..., expm are the
expressions to evaluate. Some of these expressions may be atoms, in
which case they are not evaluated but are interpreted as labels.
The entry to a prog sequence is similar to the prologue for a function:
we have to save the old values of the new variables - which we can do on
the system stack. When we exit the expression these variables are restored
using the values saved on the system stack.
Two special functions, go and return are used also within prog
expressions. The return function evaluates its argument and that value
becomes the value of the whole prog - thus causing it to exit also. The
go function is the LISP equivalent of goto, it is used to jump to another
point within the prog sequence: a label is indicated by an atom in the
prog sequence as opposed to a normal function application. Jumps out of
the current prog are not permitted in LISP, which considerably simplifies
its implementation.
Clearly, implementing a prog sequence is not all that different to
implementing a single expression. We evaluate the expressions in turn,
pushing arguments to functions etc, and calling the indicated functions.
However, since the values of the expressions are disregarded - by
dropping the results from the stack as soon as the invoked functions
return - it must be the case that they 'operate' by performing side-effects,
such as assigning a value to a variable.
(cond (Pred1 Val1)
      ...
      (Predn Valn))
Each pair of items (Predi Vali) forms an arm of the conditional. The
value of the cond expression is the value of the first of the expressions
Vali whose corresponding test expression Predi evaluates to true,
which in LISP is any non-nil value.
The implementation of a cond expression in 680x0 instructions is
similar to the implementation of a Pascal if-then-else. Each test
predicate P redi is evaluated, and after it returns the returned value is
compared against nil. If it is equal to nil then the next test is tried,
otherwise the value expression is evaluated and the value of that is the
value of the cond as a whole:
We can implement the test for x being equal to nil quite cheaply:
move.l x,a0
cmp.l #nil,car(a0)
bne @1                     ;not nil - try the next arm
some LISP systems use a different version of cond which is more like the
Pascal if-then-else statement.
We can now see the complete set of instructions needed to implement
the simple LISP function for append below:
(defun append (X Y)
  (cond ((eq X nil) Y)
        (T (cons (car X)
                 (append (cdr X) Y)))))
which in 680x0 instructions becomes:
append:
      move.l Y,a1            ;assign parameter y
      move.l car(a1),a0
      move.l 4(a7),car(a1)   ;get parameter
      move.l a0,4(a7)        ;save old value of y
      move.l X,a1            ;assign parameter x
      move.l car(a1),a0
      move.l 8(a7),car(a1)   ;get parameter
      move.l a0,8(a7)        ;save old value of x
      move.l X,a1            ;X=nil?
      cmp.l #nil,car(a1)
      bne @1
      move.l Y,a1            ;return Y as value
      move.l car(a1),-(a7)
      bra @2                 ;go to epilogue
Prolog
10.1 Prolog data structures 185
The representation of a Prolog atom is much the same as for a LISP atom.
However there is no tradition, in Prolog systems, for an atom to have
associated properties in the way that a LISP atom can have. This means
that a Prolog atom is somewhat simpler to implement than a LISP atom.
The 'new' data types in Prolog, compared to LISP, are the logical variable
and the compound term. A variable is written in a similar manner to an
alphanumeric atom except that the first character must be uppercase or
underscore:
Var    X    _variable1
The compound terms are the Prolog equivalent of dotted pairs; except that
we can construct arbitrary tuples not just pairs:
Prolog also has a list notation, which is analogous to LISP's list notation
with a little more punctuation:
[1, 2, 3] which is equivalent to (1 2 3) in LISP
Prolog lists are just special cases of compound terms whose name is '.'.
We could assume that they were implemented in the same way as other
compound structures, but in practice many Prolog systems optimize lists
and use structures which are similar to LISP cons pairs to represent them.
The most interesting difference between Prolog and LISP data structures
relates to the logical variable. Uniquely amongst programming languages,
a Prolog variable is a true place holder: it stands for any - as yet unknown
- value. If we write a term such as:

foo(X, a, bar(X))
then this is a perfectly valid structure and we do not need to know more
about X: it may remain unbound or uninstantiated. It is also possible to
make two variables the same. A Prolog goal (which is analogous to the
Pascal procedure call and the LISP function call) such as:
tags = (variable, atom, number, list, compound);
cell = record
         marked: boolean;          {for garbage collection}
         case tag: tags of
           atom: ......            {atom structure}
           number: (i: integer);
           variable,
           compound: (ptr: ^cell);
         end;
[Figure: the representation of a compound term - a block of tagged cells headed by the functor "foo/3", with one cell per argument]
At some later point Y may become bound to gar(U), in which case both X
and Y become bound to gar(U). Since it is impractical to physically
replace each occurrence of X and Y by their new values, we implement this
shared binding by relying on the variable-variable binding that we made
from X to Y, and then on the binding from Y to gar(U): we determine the
value of X indirectly via a chain of variable links. The new picture for the
term is:
[Figure: the term foo/3 after Y is bound to gar(U) - X's cell links to Y's cell, which in turn points at the term gar(U)]
      move.l p,a0
@1    cmp.b #variable,tag(a0)
      bne @2                 ;p a variable?
      cmp.l ptr(a0),a0
      beq @2                 ;self reference?
      move.l ptr(a0),a0      ;p := p^.ptr
      bra @1
@2                           ;p left in a0
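The same dereferencing loop can be sketched in Python. This is a hedged model of the cell representation above: dicts stand in for the tagged cells, and an unbound variable is a cell that points to itself.

```python
# Dereferencing a chain of variable-variable links.

def make_var():
    cell = {"tag": "variable", "ptr": None}
    cell["ptr"] = cell            # self-reference marks the variable unbound
    return cell

def deref(p):
    while p["tag"] == "variable":
        if p["ptr"] is p:         # self reference: unbound, stop here
            return p
        p = p["ptr"]              # follow the variable link
    return p                      # a non-variable term

x, y = make_var(), make_var()
x["ptr"] = y                                  # bind X to Y
y["ptr"] = {"tag": "compound", "ptr": "gar"}  # bind Y to the term gar(U)
print(deref(x)["tag"])                        # X's value is found via the chain
```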
Although Prolog terms are more complex than LISP S-expressions, the
major difference between the languages lies in the way that data is accessed
foo(f(X,g(Y))) :- ......
the term f (X, g (Y)) is a template which must match the corresponding
term in a call to foo. The nature of this matching may be quite complex,
involving variables in both the head and the call being instantiated. A
typical call to foo might be:

foo(f(U, U))

In order for these templates to match we need to bind the variable X to the
variable U - i.e. establish a variable link between them - and also to bind U
to the term g(Y), with the result that both X and U become bound to the
same term.
Unification can be used for accessing data as well as constructing it; it is
used to pass data into a procedure and to return results out of a procedure.
In this case, the variable x which is local to the clause, is bound to a
component of its incoming data and u which occurs in the call is bound to
the term g (Y). A further difference between Prolog unification and LISP
selector functions is that unification can fail: a match between terms might
not succeed, as in:

g(G) = h(X)

This unification fails because g(G) is not unifiable with h(X). Failure in
Prolog unification leads to the system backtracking and some earlier choice
of rule is abandoned (together with all the consequent execution) and
another rule is tried. We shall see that in order to be able to backtrack to
try another alternative rule, we need to build a data structure in which to
record sufficient information to allow the system to try the alternative.
These records are called choice points to indicate that they represent a
possible choice in the execution of the Prolog program.
All this means that the run-time support needed for Prolog's data
structures is somewhat more complex than that needed for Pascal's or
LISP's data structures.
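As a sketch of what that support must provide, here is a hedged Python model of unification over tagged cells. The representation and helper names are assumptions carried over from the cell record above, and the trail bookkeeping needed for backtracking is deliberately omitted here.

```python
# Unification: variables bind, atoms must be identical, compound terms
# unify component-wise, and any mismatch causes failure.

def make_var():
    cell = {"tag": "variable", "ptr": None}
    cell["ptr"] = cell            # self-reference: unbound
    return cell

def deref(p):
    while p["tag"] == "variable" and p["ptr"] is not p:
        p = p["ptr"]
    return p

def unify(a, b):
    a, b = deref(a), deref(b)
    if a["tag"] == "variable":
        a["ptr"] = b              # bind the variable (trailing omitted)
        return True
    if b["tag"] == "variable":
        b["ptr"] = a
        return True
    if a["tag"] == "atom" or b["tag"] == "atom":
        return a == b             # atoms must match exactly
    # compounds: same functor and arity, then unify arguments pairwise
    if a["name"] != b["name"] or len(a["args"]) != len(b["args"]):
        return False
    return all(unify(x, y) for x, y in zip(a["args"], b["args"]))

def atom(n): return {"tag": "atom", "name": n}
def term(n, *args): return {"tag": "compound", "name": n, "args": list(args)}

X, Y = make_var(), make_var()
print(unify(term("f", X), term("f", atom("a"))))   # succeeds: X bound to a
print(unify(term("g", Y), term("h", Y)))           # fails: g/1 versus h/1
```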
Compiling unification
Normally, a Prolog compiler arranges its data management by 'compiling'
the terms in the head of a clause into a sequence of instructions whose
function it is to unify the appropriate terms in the call.
For example, in the Prolog clause:
[Figure: the constructed term stack, with a register pointing at its top]
The reason for using a stack like this is that when the system backtracks a
previous choice of clause which was made to solve a call is undone, and
all of the terms which have been constructed since then are no longer
needed and can be removed. This garbage can be removed by a simple
adjustment of the top of the constructed term stack. This is a much
simpler operation compared to a full mark and collect style garbage
collection needed to clear up discarded S-expressions in LISP. It is still the
case, however, that a garbage collector is needed to clear the constructed
term stack, and this needs to be more sophisticated than a LISP garbage
collector.
As with LISP's free list, we would normally dedicate an address register
(a6 say) to point to the next free location in the constructed term stack;
constructing a new list pair then consists simply of incrementing the top
of the stack and assigning the old top as the address of the list pair:
We also need to initialise the head and tail of the new list pair to unbound
variables and to bind the local variables E and x to them, just as we would
for an incoming list pair:
move.l ptr(a0),a0
move.b #variable,tag(a0)
move.l a0,ptr(a0)          ;new unbound
move.l a0,ptr(E)           ;bind E
move.b #variable,tag(E)
lea cell(a0),a0            ;next cell
move.b #variable,tag(a0)
move.l a0,ptr(a0)          ;new unbound
move.l a0,ptr(X)           ;bind X
move.b #variable,tag(X)
Since we have bound a variable from outside the clause in this step - by
assigning it to a list pair - we require a little further housekeeping. It may
be that the variable that we are assigning is older than - i.e. created before
- the most recent choice point. This is because the variable was in the call,
fallible(H) :- human(H).     (R1)
human(turing).               (R2)
human(socrates).             (R3)
greek(socrates).             (R4)

:-fallible(X),greek(X).
A Prolog program 'solves' this problem by solving, in turn, the sub-
queries:
In other words, after the goal fallible (X) has been called, and is
successfully completed, then the next goal greek (X) is called. If it also
succeeds, then the whole query terminates successfully.
In order to solve the first goal we attempt to reduce it into simpler sub-
goals by using one of the clauses in the Prolog program - in this case the
only clause that might work is the single fallible rule R1. To use a
clause to reduce a goal we have to match (i.e. unify) the head of the clause
with the goal.
This step also involves introducing any new local variables which are
associated with the clause. In the case of unifying the head of R1 with the
fallible goal we introduce the new variable H, and we bind it to the
variable X from the goal. As a result of using the rule, the original
problem is reduced to showing that there is a human greek:
:-greek(turing).

10.2 Controlling a Prolog execution 195

No clause for greek matches the goal greek(turing), so the system
backtracks: the binding of X to turing is undone, the second human
clause is tried, and the query is replaced with

:-greek(socrates)
Since the second rule for human was also the last one we did not need to
record a choice point this time, and if this new query were to fail then the
whole top-level query would fail also. However, it does not fail because
this goal matches with R4. After solving the greek sub-goal there are no
further goals to solve. The 'answer' socrates may then be displayed as
the proof that there is a fallible greek.
In practice, in a Prolog system, the real top-level query is one which is
not seen by the programmer and it never terminates. This query invokes
a special read-evaluate-print program whose function is to continually
read a query from the terminal, evaluate it and print out an answer if it is
true and print a message if not. After completing one query the loop
carries on for more queries.
[Figure: a call record - with pointers to the parent call record, the next sub-goal to solve, and the clause's variables]
Arguments to a Prolog goal are not normally passed via the evaluation
stack. Instead they are placed in a series of 'argument registers'. These are
usually fixed global locations within the memory, although some Prolog
compilers may use one or more 680x0 registers to hold Prolog arguments.
Using argument registers to pass parameters is analogous to using 680x0
machine registers to pass parameters to a Pascal procedure. However, the
principal role of an argument register is to hold the argument during
unification. Once the unification is completed then the contents of the
argument registers will have been 'read' and either recorded in local
variables or matched against some structure in the head of the clause. In
either case the contents of the argument registers are no longer needed.
This is in contrast with Pascal arguments which can be accessed from any
point within the body of the procedure, or even from within procedures
declared locally to it.
As with the 680x0, a Prolog system usually has a fixed number of these
argument registers, setting an upper limit on the number of arguments a
goal may have. However, 32 seems to be a reasonable limit as there are
few Prolog goals with more than 32 arguments.
[Figure: the argument registers Arg1, Arg2, ... each a cell which can hold any term]
Each argument register is logically a cell with a tag and a value part; i.e. an
argument register can 'hold' any term. If the argument is an integer then
the value part will be the integer itself; otherwise it will be a pointer to
some other structure. The only restriction normally imposed is
that an argument register cannot be an unbound variable: it must always
point to a location within the evaluation stack proper or on the
constructed term stack. An argument register containing an unbound
variable is represented by a variable-variable link to an unbound cell on
the evaluation stack or constructed term stack.
The local variables introduced by a clause when it is used to reduce a
goal are also kept on the evaluation stack. As with Pascal, these variables
are accessed via offsets from a base pointer, usually an address register;
however, unlike Pascal, we are not able to use the simple link and unlk
mechanism to allocate and deallocate space for them.
Each variable 'slot' is, like an argument register, a single cell and can
hold any term. Some Prolog systems initialize variables as they are
allocated to be unbound, others do not. Initializing variables reduces
performance (since the effort to initialize the variables might be wasted
if their first use binds them anyway).
Figure 10.8 Local variables introduced by a clause
The third type of entry in the evaluation stack is the choice point record.
This is used when there is a choice of clause in reducing a goal. In the
choice point record are kept sufficient details to allow us to restore the
evaluation stack to the state just before the choice point record is created.
This allows us to backtrack and to make another choice as necessary.
The Prolog argument registers are also saved as part of the choice point
record. The motivation for saving them is the same as saving registers in
a Pascal procedure: they will be needed again to participate in another
unification; furthermore, the arguments are only needed again in the
event that the system backtracks.
[Figure: a choice point record - the saved argument registers, the top sub-goal, the top of the constructed term stack, the trail, and the next clause to try]
The trail is used to record those variables which have been bound since a
choice point is created. This is generally kept as a separate data structure to
the main evaluation stack although logically it is part of the choice point
record's function to record the bound variables.
It is not necessary to record every binding in the trail; we only need to
record bindings to those variables which will survive a backtrack. When
the system backtracks all the variables created after the choice point will
automatically disappear - their creation will itself be undone - therefore it
is not necessary to record the fact that such variables have been bound.
We only need to create entries in the trail for variables which are older
than the most recent choice point, and which therefore will still be present
after backtracking albeit with possibly different values.
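The trailing rule can be sketched in Python. This is a hedged toy model: a birth-stamp comparison stands in for comparing stack addresses to decide whether a variable is older than the most recent choice point, and all the names are illustrative.

```python
# Only bindings of variables older than the current choice point are
# recorded in the trail, so backtracking can reset them to unbound.

trail = []
choice_point_birth = 0            # creation time of the latest choice point
clock = 0                         # each variable records when it was created

def make_var():
    global clock
    clock += 1
    cell = {"ptr": None, "birth": clock}
    cell["ptr"] = cell            # self-reference: unbound
    return cell

def bind(var, value):
    var["ptr"] = value
    if var["birth"] <= choice_point_birth:
        trail.append(var)         # survives backtracking, so must be trailed

def backtrack(mark):
    # undo every binding recorded since the trail mark
    while len(trail) > mark:
        var = trail.pop()
        var["ptr"] = var          # reset the variable to unbound

X = make_var()                    # created before the choice point
choice_point_birth = clock        # a choice point is now created
mark = len(trail)
bind(X, "turing")                 # trailed: X is older than the choice point
backtrack(mark)                   # failure: X becomes unbound again
bind(X, "socrates")               # the alternative clause binds X afresh
print(X["ptr"])
```

Variables created after the choice point need no trail entry: backtracking discards their cells along with the stack space they live in.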
The final data structure is one we have already seen: the constructed
term stack. Like the LISP heap this is used to record terms which have
been dynamically created during a Prolog evaluation. However, we
organize it like a stack to facilitate backtracking. The constructed term
stack grows as new terms are created during unification, and shrinks as
part of backtracking. One of the fields in the choice point record indicates
the stack top at the point that the choice point is created.
Any terms created after the choice point are placed above this marker;
and so, when the system backtracks all the terms above the marker can be
discarded. This form of garbage collection is so powerful that it can
remove the need for many, if not most, calls to the garbage collector -
indeed early Prolog systems did not have a garbage collector. However, a
real garbage collector is still needed for those programs which do not
backtrack.
[Figure: the initial state - 'Prolog's moving finger' points at the query :-fallible(X),greek(X), whose variable X comes from the top-level query. The diagram shows the argument register Arg1, the trail, the constructed term stack and the evaluation stack, together with the program:
fallible(H) :- human(H).
human(turing).
human(socrates).
greek(socrates).]
We have arbitrarily put the only variable so far in the system - X - in the
constructed term stack (C.T.S.) for convenience. In practice, we cannot
easily predict where this variable would be located.
The first step that the evaluator makes is to enter the fallible
program. This involves setting the first argument register to point to X
and creating a call record indicating that there is another goal to solve
after the human goal:
[Figure: entering fallible - Arg1 points to X, and the moving finger is at the clause fallible(H) :- human(H)]
We now have to unify the head of the fallible clause with the goal.
We must also create a new local variable - H - which is introduced as a
result of using the fallible clause and which is allocated on the
evaluation stack.
As a result of unifying the head and goal we bind H to the first argument
register - which is itself bound to the top-level goal variable:
[Figure: after unifying fallible(H) with the goal - H is bound to Arg1, which is itself bound to X; the moving finger is at the human clauses]
We now enter the human procedure. Since this is the last sub-goal in the
rule for fallible we do not need to create a call record here; however,
since there are two clauses for human we do need to create a choice point
record (sometimes we might need both a call record and a choice point
record). In the choice point record are recorded the current goal, a pointer
to the constructed term stack, the trail and the previous choice point
record. We also record the argument registers. To avoid overly cluttering
up our diagram we only show some of these pointers emanating from the
choice point record:
[Figure: the choice point record for human, with the saved argument register; the moving finger is at human(turing)]
Having created a choice point, we unify the head of the human clause with
the goal. This involves binding the goal variable X to the constant
turing; and an entry is created in the trail because the goal variable is
older than the choice point we have just created. After performing the
unification the next step is to attempt to solve the greek goal:
[Figure: X is bound to turing, the binding is recorded on the trail, and the evaluator moves on to the greek goal]
Solving the greek goal involves using the only clause there is for
greek. As we enter the greek goal, the first argument register is loaded
with the value of X, which is turing. In this case it happens that the first
argument register has not changed much in value; however with deeper
computations we would certainly expect the argument registers to be
constantly changing:
[Figure 10.15 Attempt a greek solution - the evaluator is positioned at the greek clause with Arg1 holding turing]
[Figure: the unification of greek(turing) with greek(socrates) fails, so the evaluator backtracks to the choice point, resetting X via the trail and returning to the human clauses]
We are back in the state where we needed to solve the human goal,
although the first human clause has been tried, and therefore we must try
the second one. We can now proceed to use the second human clause,
which this time succeeds by binding X to the Greek socrates:
[Figure: the second human clause is used and X is bound to socrates]
Notice that, since there is no choice point in the way, we did not need to
create an entry in the trail when we bound X this time.
We can now move on to the final state, where the greek goal has been
entered and completed - and X is bound to the constant socrates.
Since there are no more goals to solve, and there are no choice points
'protecting' the call record for fallible, most Prolog systems optimise
the stack by removing the call record from the stack:
[Figure: the final state - the call record has been removed, X is bound to socrates, and the evaluation succeeds with the answer X = socrates]
[Figure: a Prolog system built as an emulator for a virtual machine, with the emulator itself running on the 680x0]
There are a number of virtual machine designs suitable for Prolog; the
most famous is the Warren Abstract Machine (WAM). In the WAM,
instead of our long instruction sequence for unifying a list pair we have
just three WAM instructions.
repeat
  case pc^ of
    get_list: ...
    unify_var: ...
  end;
  pc := pc+1;
until false;
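The same decode-and-dispatch structure can be written as a tiny C interpreter. This is a sketch with an invented two-opcode instruction set, not the book's emulator; it only shows the shape of the loop: decode the opcode at pc, dispatch through the case statement, advance pc.

```c
#include <assert.h>

/* opcode numbering is invented for this sketch */
enum { OP_HALT, OP_INC, OP_DEC };

/* decode, dispatch, advance - the structure of the loop above */
int run(const int *pc)
{
    int acc = 0;
    for (;;) {
        switch (*pc) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        }
        pc++;                /* pc := pc + 1 */
    }
}
```

Every iteration pays the full cost of the switch and the increment, which is exactly the decode overhead the following paragraphs set out to reduce.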
dc.w unify_var-@0
get_list:
  ;implement get_list
  bra exit
where T is an internal register to the WAM. The T register is used during
unification as a pointer which follows the internal structure of lists and
compound terms. Each unify_ instruction leaves T pointing at the next
argument of a compound term.
Thus, for three instructions which implement the 'meat' of the
unify_var virtual machine instruction, we have 12 'overhead'
instructions. It is quite important to try to optimise the implementation
of the instruction decode loop: in general a single extra instruction in the
decode loop can result in a performance degradation of 10-20%.
If we arrange the decoding of virtual machine instructions more
carefully, then we can optimise the decoding of instructions considerably.
For example, we can eliminate the error checking in the case statement
code: all we need to ensure is that the Prolog compiler generates correct
virtual machine instructions.
A further optimisation could be to use the scaled addressing modes
available on the 68020 and 68030. This would allow us to eliminate an
instruction from the decode cycle:
add.w d0,d0
Furthermore, we can increment the virtual machine's program counter at
the same time as accessing the opcode; and we could allocate an address
register (a4 say) to be the program counter. Together, these optimisations
give the instruction decoding sequence of:
move.w (a4)+,d0
move.w @0(d0.w*2),d0
jmp @0(d0.w)
which, together with a jmp instruction at the end of the 680x0 instructions
used to implement each virtual machine instruction, gives us four 680x0
instructions to decode a virtual machine instruction.
We can improve this still further if, instead of using arbitrary numbers
to represent virtual machine instructions, we use 680x0 addresses as the
opcodes: a virtual machine opcode is also the address of the 680x0
instructions which implement it. Each opcode now occupies 4 bytes
instead of 2, which is still far short of the space needed for the instructions.
This allows us to reduce the instruction decode and increment cycle to just
two 680x0 instructions:
move.l (a4)+,a0
jmp (a0)
unify_var:
  move.l (a4)+,a0         ;acquire Argn
  move.b tag(a2),tag(a0)  ;use a2 for T
  move.l ptr(a2),ptr(a0)
  lea cell(a2),a2         ;T is incremented
  move.l (a4)+,a0         ;decode next ins.
  jmp (a0)
In this regime, for this virtual machine instruction, the overhead for
instruction decode is reduced from 300% to 25%.
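The address-as-opcode idea maps directly onto C function pointers: each virtual instruction in the program stream is the address of the code that implements it, so decode is just 'fetch the next address and call through it'. A sketch with invented instruction names - not the book's Prolog machine:

```c
#include <assert.h>

typedef void (*op_fn)(void);

static int acc;              /* a data register of the virtual machine */
static op_fn *vpc;           /* virtual program counter (a4's role)    */
static int running;

static void op_inc(void)  { acc++; }
static void op_halt(void) { running = 0; }

/* the C analogue of  move.l (a4)+,a0 ; jmp (a0) */
int run_vm(op_fn *program)
{
    acc = 0;
    vpc = program;
    running = 1;
    while (running)
        (*vpc++)();
    return acc;
}
```

As in the text, each opcode now occupies a full pointer rather than a small integer, trading a little space for a much shorter decode path.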
The exercise that we have just gone through of optimising a crucial
section of code is a good example of one of the prime motivations for
programming directly in assembler. We have gained a considerable
performance benefit which it is extremely unlikely that a Pascal compiler
could generate - it simply requires too many assumptions which we, as
programmers, could make but a compiler could not.
There are many other aspects of the implementation of Prolog which
we have not covered in this chapter. To cover fully the techniques needed
to implement a Prolog system would justify a book in its own right! It has
been our intention to outline some of the more interesting aspects of
implementing Prolog rather than to provide a complete guide to its
implementation.
APPENDIX A
move.w a0,d3
uses register direct addressing for both the source and destination operand.
The effect of this instruction is to move the word length contents of
address register a0 to data register d3.
When a data register is addressed as a word length quantity, as in this
case, only the lower half of the register is involved. So, for this instruction
only the lower half of d3 would be affected, and the upper half of the
register remains intact. When a data register is addressed as a byte quantity
then only the lowest quarter of the register takes part in the instruction.
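The word and byte rules can be mimicked in C with masks. This is a sketch of the rule just stated - only the low 16 (or 8) bits of the destination register change - with invented helper names:

```c
#include <assert.h>
#include <stdint.h>

/* move.w: write the low word of src into the low word of dst,
   leaving the upper half of dst intact */
uint32_t move_w(uint32_t dst, uint32_t src)
{
    return (dst & 0xFFFF0000u) | (src & 0x0000FFFFu);
}

/* move.b: only the lowest quarter of the register takes part */
uint32_t move_b(uint32_t dst, uint32_t src)
{
    return (dst & 0xFFFFFF00u) | (src & 0x000000FFu);
}
```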
A. Addressing modes for the 680x0 209
[Figure: register diagram for move.w a0,d3 - the low word yyyy of d3 is replaced by 1003, the word contents of a0, while the high word xxxx is unchanged]
cmp.b #32,d0
the operand #32 is the source operand and it is an immediate operand
(indicated by the presence of the '#' character in front of the literal
number). This instruction compares the lowest byte in register d0 with 32,
which also happens to be the code for an ASCII space character.
Immediate addressing only makes sense in the case of a source operand.
Since the data is actually part of the instruction, using immediate
addressing for the destination would amount to allowing program
instructions to modify themselves. An ability for programs to modify
themselves is important to have, on a theoretical level, but it is not
obviously useful for an addressing mode.
move.l 1000,d4
moves the long word at address 1000 into data register d4, overwriting
the whole of its contents.
[Figure: memory and register diagram for move.l 1000,d4 - the long word 12345678 at address 1000 is copied into the whole of d4]
Address register indirect addressing uses an address register to specify the
address of the operand. The specified register contains the address in
memory of the data value for the instruction or of where to place the
result. In register direct addressing the data value to be manipulated is in
a register, whereas in register indirect addressing the register contains the
address of the data.
Address register indirect is often used for pointer following - where the
address is loaded from some variable into an address register and then
dereferenced - and for storing into records via a pointer.
In the instruction
move.l d0,(a6)
the long word value of d0 (i.e. the whole of d0) is written out to the
address referred to in register a6.
o(an) or (o,an)
where the offset o is a 16 bit number in the range -32768...32767, is a
variation on address register indirect. In this case the address contained in
the address register is offset by means of a fixed displacement in order to
determine the final address of the operand.
The address register indirect with displacement mode is extremely
useful in accessing elements in records and in accessing local variables
within a Pascal procedure or function.
We can load the value which is addressed as being offset four bytes from
a2 into dO with the instruction:
move.w 4(a2),d0
[Figure: before and after move.w 4(a2),d0 - a2 contains 1000, so the word 9012 at address 1004 is loaded into the low word of d0]
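Displacement addressing is what a compiler emits for record field access: the field's offset within the record is a constant added to the record's base pointer. A C sketch (the struct and names are invented for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* a record whose field y lies at a fixed offset from the base; a
   compiler would access p->y as  move.w 2(a2),d0  with the record's
   address in a2 and 2 = offsetof(struct point, y) */
struct point { short x; short y; short z; };

short get_y(const struct point *p)
{
    return p->y;             /* displacement addressing, in effect */
}
```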
move.w (a3)+,d0
moves the word pointed at by address register a3 into the lower half of d0
and adds 2 to a3:
[Figure: before and after move.w (a3)+,d0 - a3 is advanced by 2 past the word it fetched]
move.w d0,-(a7)
In order to pop dO back off the stack we would use the instruction:
move.w (a7)+,d0
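The predecrement push and postincrement pop pair behaves exactly like C's *--p and *p++ on a downward-growing stack. A sketch with invented names:

```c
#include <assert.h>
#include <stdint.h>

uint16_t stack[8];
uint16_t *a7 = stack + 8;    /* stack grows towards lower addresses */

/* move.w d0,-(a7) : decrement first, then store */
void push(uint16_t v) { *--a7 = v; }

/* move.w (a7)+,d0 : load, then increment */
uint16_t pop(void) { return *a7++; }
```

The push and pop forms are mirror images, which is why a value pushed with predecrement must be popped with postincrement to leave the stack pointer balanced.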
Off(ax,ry.w*s) or (Off,ax,ry.w*s)
stores the contents of d0 into the small array of long words based at a2
and indexed through d1:
[Figure: before and after move.w d0,0(a2,d1.w*4) - with a2 = 900 and d1 = 25 the effective address is 900+25*4+0 = 1000]
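Indexed addressing with a scale factor is ordinary C array subscripting: base register plus index times element size. A sketch (the function name is invented):

```c
#include <assert.h>
#include <stdint.h>

/* move.w d0,0(a2,d1.w*4): a2 holds the array base, d1 the element
   index, and the *4 scale matches the 4-byte element size */
void store(int32_t *a2, int d1, int32_t d0)
{
    a2[d1] = d0;             /* effective address = a2 + d1*4 + 0 */
}
```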
jmp 36(PC)
would, in effect, add 36 to the program counter and cause a jump to that
new address. The instruction
move.w *+10,d3
uses an alternative notation for this addressing mode. The effect of this
instruction would be to move the word which is 10 bytes further on from
the start of this instruction into d3. It is up to the programmer, of course,
Off(PC,rn.w*s), (Off,PC,rn.w*s) or *+Off(rn.w*s)
or just
label(rn.w*s)
As with normal indexing the width of the index is specified with the index
register and there is an optional scale factor (on the 68020).
The instruction
move.l charray(d0.w*4),d1
moves a long word from a table built into the program (at relative address
charray) into dl:
charray: dc.l ...
As with the address register indexing mode the form of program counter
relative indexing is restricted on the 68000 compared to the 68020/68030.
In the 68000/68010 the displacement can only be short, i.e. in the range
-128 ... 127 bytes, and the scale factor is restricted to being just 1.
[Figure: memory indirect addressing - an intermediate address fetched via an is combined with an index register scaled by 1, 2, 4 or 8 and an outer offset to form the operand address]
move.w ([6,a2],d0.w*4,0),d1
accesses the long word at 6(a2), adds to it the contents of register d0
(scaled as a long word index) and the final offset of zero to compute the
actual address:
[Figure: the intermediate address for move.w ([6,a2],d0.w*4,0),d1 is fetched from a2+6 = 804]
This addressing mode is not available on the 68000/008/010 models.
[Figure: program counter memory indirect addressing - an intermediate value fetched relative to the PC is used to form the operand address]
([Oi,PC],rn.w*s,Od)
([Oi,PC,rn.w*s],Od)
dc.l labn
will cause a switch to one of labels lab1, ... ,labn depending on the value
of dO.
This addressing mode is not available on the 68000/010/008 models.
APPENDIX B
Below are listed the instructions which are actually referred to in the main
text. Where appropriate, related instructions are also listed.
The exact Motorola mnemonics are given - in many cases, we can use a
generic mnemonic and allow the assembler to choose the correct one. For
example, the adda instruction is a special case of add which adds to an
address register; many assemblers automatically substitute the correct
mnemonics as necessary.
The list is not intended as a complete reference to all the 680x0
instructions; however the main instructions that application assembler
programmers use are all covered; the omitted instructions tend to be for
special system purposes and are often not available to the application
programmer. The format of each description is:
B. The 680x0 instructions used in the text 223
description: Add source to destination along with the extend bit. This
instruction is used to implement multi-word arithmetic.
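Multi-word arithmetic of the kind addx supports can be sketched in C: add the low words, derive a carry, and fold the carry into the high words. The function name and the 32-bit halves are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* add two 64-bit numbers held as 32-bit halves: the low-half add
   produces a carry (the X bit, in effect), which the high-half
   add - the addx step - consumes */
void add64(uint32_t alo, uint32_t ahi,
           uint32_t blo, uint32_t bhi,
           uint32_t *rlo, uint32_t *rhi)
{
    uint32_t lo = alo + blo;
    uint32_t carry = lo < alo;   /* wrapped around => carry out */
    *rlo = lo;
    *rhi = ahi + bhi + carry;
}
```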
description: Logically 'and' the bit pattern in the source with that of the
destination. Each bit in the destination is formed by and-ing
it with the corresponding bit of the source operand.
description: Logically 'and' the bit pattern in the immediate data with that
of the destination. This instruction is used with the and
instruction when the source operand is a literal value.
description: Logically 'and' the bit pattern in the immediate data with the
condition code register. In effect this is used to mask out
certain flags in the ccr.
ccr: Z is set to the new value of the corresponding bit. All other
flags are unaffected.
ccr: Z is set if the old value of the corresponding bit was zero,
reset otherwise. All other flags are unaffected.
description: Sets the condition codes depending on the specified bit field
and then complements the bit field.
description: Sets the condition codes depending on the specified bit field
and then zeroes the bit field.
description: Sets the condition codes depending on the specified bit field
and extracts the bit field extended to a 32 bit signed number
into the data register.
description: Sets the condition codes depending on the specified bit field
and extracts the bit field as a zero-extended 32 bit unsigned
number into the data register.
description: Searches the bit field for a 1 bit. The bit offset of that bit (i.e.
the bit offset given in the instruction plus the offset within
the field) is placed into the data register. If no 1 bit is found
then the data register is loaded with the offset plus field
width. The instruction also sets the condition codes
depending on the specified bit field.
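The search this instruction performs can be sketched in C: scan a width-limited field for the first 1 bit, returning offset plus width when none is found. The sketch assumes a field within a single 32-bit word, with bit offset 0 at the most significant end as in the 680x0 bit-field instructions:

```c
#include <assert.h>
#include <stdint.h>

/* bfffo-style search: scan the field of 'width' bits starting
   'offset' bits from the most significant end of word; return the
   bit offset of the first 1, or offset+width if there is none */
int bfffo(uint32_t word, int offset, int width)
{
    for (int i = 0; i < width; i++)
        if (word & (1u << (31 - (offset + i))))
            return offset + i;
    return offset + width;
}
```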
description: Inserts the value contained in the bottom width bits of the
data register into the specified bit field. It also sets the
condition codes depending on the inserted value of the bit
field.
description: Sets the condition codes depending on the specified bit field
and then sets the bits in the bit field to all ones.
ccr: N is set if the most significant bit of the field was 1, cleared
otherwise; Z is set if the bit field was all 0's and is cleared
otherwise; V and C are cleared and X is unaffected.
description: Sets the condition codes depending on the specified bit field.
ccr: Z is set if the old value of the corresponding bit was zero,
reset otherwise. All other flags are unaffected.
bsr Branch to subroutine
description: A bit in the destination is tested and its state is reflected in the
Z flag in the ccr. If the destination is a data register then the
numbering of the bits is modulo 32, if the destination is
memory then the numbering is modulo 8 and it is a byte
operation.
ccr: Z is set if the value of the corresponding bit was zero, reset
otherwise. All other flags are unaffected.
description: Check the value of the data register dn against the operand
specified in <ea>. If dn<0, or dn is greater than the source
operand (i.e. <ea>) then issue a TRAP which results in
exception processing. The comparison is signed.
description: Check the value of the register rn against the bounds pair
stored at location <ea>. The lower bound is the first byte,
word or long word (depending on the size of the operation)
and the upper bound is the second location.
ccr: N is undefined.
Z is set if rn is equal to either bound, cleared otherwise.
V is undefined.
C is set if rn is out of bounds, cleared otherwise.
X is unaffected.
cmp Compare
cmpm Compare memory
ccr: N Undefined
Z set if rn is equal to either bound, cleared otherwise
V Undefined
C set if rn is out of bounds, cleared otherwise
X not affected.
description: Decrement the register (as a word value) and if the register is
not equal to -1 then branch to the label.
divu Unsigned division
eor Exclusive OR
description: Logically 'exclusive or' the bit pattern in the source with that
of the destination. Each bit in the destination is formed by
exclusive or-ing it with the corresponding bit of the source
data register.
description: Logically 'exclusive or' the bit pattern in the immediate data
with that of the destination. This instruction is used in place
of the eor mnemonic when the source operand is a literal
value.
description: Logically 'exclusive or' the bit pattern in the immediate data
with the condition code register. In effect this is used to
complement certain flags in the ccr.
description: Sign extend the lower byte (or word) in the data register to a
valid word (or long word) quantity. This involves replicating
the most significant bit in the byte (or word) throughout the
upper byte (or word) of the register.
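The replication rule can be sketched in C for the byte-to-word case; the cast chain does exactly what the description says - the byte's most significant bit is copied through the upper byte:

```c
#include <assert.h>
#include <stdint.h>

/* ext.w in C terms: sign extend the low byte of a register into a
   word; the rest of the register is not modelled here */
uint16_t ext_w(uint16_t reg)
{
    return (uint16_t)(int16_t)(int8_t)(reg & 0xFFu);
}
```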
jmp Jump
syntax: jmp <ea>
j sr Jump to sub-routine
see: Chapter 8
syntax: link an,#data
description: Allocate space on the system stack and link address register
an to it. This involves pushing the old value of an onto the
stack, setting an to point to its old value on the stack, and
adding #data to the system stack. (Data is normally
negative.)
description: The destination is left shifted by count bits where count is either
#data or the least significant 6 bits of dx. The rightmost bits
are replaced by 0, and the leftmost bit shifted out is placed
into the extend and Carry flags.
muls Signed multiply
or Inclusive OR
description: Logically OR the bit pattern in the source with that of the
destination. Each bit in the destination is formed by or-ing it
with the corresponding bit of the source operand.
description: Logically OR the bit pattern in the immediate data with that
of the destination. This instruction is used with the or
instruction when the source operand is a literal value.
description: Logically OR the bit pattern in the immediate data with the
condition code register. In effect this is used to set certain
flags in the ccr.
description: Push the memory address specified by <ea> onto the system
stack. This is in contrast with the lea instruction, which
would load the address into an address register.
syntax: rts
description: If the condition is satisfied then set byte value at <ea> to true:
all ones; otherwise set byte at <ea> to false: all zeroes.
GE - greater or equal  N.V + ¬N.¬V        LT - less than      N.¬V + ¬N.V
GT - greater than      N.V.¬Z + ¬N.¬V.¬Z  LE - less or equal  Z + N.¬V + ¬N.V
HI - high              ¬C.¬Z              LS - low or same    C + Z
description: Subtract the source from the destination along with the extend
bit. This instruction is used to implement multi-word
arithmetic.
syntax: swap dn
description: Swap the upper half of data register dn with its lower half.
unlk Deallocate
syntax: unlk an
Exercises 2.2.4
1. a) 1000 = 1*512+1*256+1*128+1*64+1*32+1*8
2. There are 7 numbers in the range 20 ... 26. The fewest number of bits
that can represent 7 numbers is 3 bits (which can represent up to 8
numbers).
C. Answers to selected exercises 271
Where each term of the form (1-ai) is actually the complement of ai: if
ai=0 then (1-ai)=1, and if ai=1 then (1-ai)=0. Finally the +1 term at the
end signifies that we must add 1 after complementing the individual
terms in the expansion.
0x0=0  0x1=0
1x0=0  1x1=1
if n=1 then
  mult := a & b
else ...
otherwise, for the recursive case we have to split our two numbers,
recurse and then recombine. Splitting can be done by a mask and shift
operation. In standard Pascal this appears to be quite expensive since
it involves a multiplication in its own right; however if we
temporarily borrow some 'C' notation we can express it more directly:
begin
  n2 := n/2; mask := (1<<n2)-1;
  a0 := (a & mask); a1 := a >> n2;
  b0 := (b & mask); b1 := b >> n2;
  a0b0 := mult(a0,b0,n2);
  a1b1 := mult(a1,b1,n2);
  temp := mult(a0+a1,b0+b1,n2)
          - a0b0 - a1b1;
  mult := a0b0 + (temp<<n2) + (a1b1<<n);
end;
begin
  if odd(n) then { make sure n is even }
    n := n + 1;
  n2 := n>>1;
  mask := (1<<n2) - 1;
  a0 := a & mask; a1 := a>>n2;
  b0 := b & mask; b1 := b>>n2;
  a0b0 := mult(a0, b0, n2);
  a1b1 := mult(a1, b1, n2);
  if (((a0+a1) & mask) <> a0+a1) or
     (((b0+b1) & mask) <> b0+b1) then
    n0 := n2+2
  else
    n0 := n2;  { adjust for carry }
  temp := mult(a0+a1,b0+b1,n0)
          - a0b0 - a1b1;
  mult := a0b0 + (temp<<n2) + (a1b1<<n);
end;
In each pass of this algorithm there are three recursive calls to mult.
On the other hand, the size of the subsidiary problems is half the
number of bits. Therefore, the average depth of recursion will be
log2N for an N-bit multiplication. The complexity of this algorithm
is, then, O(N^log2 3) - approximately O(N^1.59) - which is less than the
O(N^2) of the conventional multiplication algorithm. However, this
algorithm is considerably more complex to implement and it would
require very careful coding or implementation in silicon to achieve a
speed-up.
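The divide-and-conquer scheme can be written as runnable C. This is a sketch, not the book's answer verbatim: where the book recurses down to a 1-bit AND, this version falls back on the machine multiply for small widths, which also keeps the middle recursion (whose operands can be one bit wider than n2) terminating.

```c
#include <assert.h>
#include <stdint.h>

/* multiply a and b, each less than 2^n, by splitting them into
   n2-bit halves and recombining with three recursive multiplies */
uint64_t mult(uint32_t a, uint32_t b, int n)
{
    if (n <= 4)
        return (uint64_t)a * b;          /* base case (a sketch's shortcut) */
    if (n & 1)
        n++;                             /* make sure n is even */
    int n2 = n >> 1;
    uint32_t mask = (1u << n2) - 1;
    uint32_t a0 = a & mask, a1 = a >> n2;
    uint32_t b0 = b & mask, b1 = b >> n2;
    uint64_t a0b0 = mult(a0, b0, n2);
    uint64_t a1b1 = mult(a1, b1, n2);
    /* (a0+a1) and (b0+b1) may carry one bit past n2 */
    uint64_t temp = mult(a0 + a1, b0 + b1, n2 + 1) - a0b0 - a1b1;
    return a0b0 + (temp << n2) + (a1b1 << n);
}
```

The recombination uses the identity (a0+a1)(b0+b1) - a0b0 - a1b1 = a0b1 + a1b0, which is what saves the fourth multiplication.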
Exercises 2.3.4
0.1000000000000000₂ = 0.0100000000000000
0.0100000000000000₂ = 0.0001000000000000
0.0001000000000000₂ = 0.0000000100000000
0.0000000100000000₂ = 0.0000000000000000
2. If we are to divide two fixed point numbers, a and b (say) with a fixed
point at k bits, then we can express the numbers as:
a = A*2^-k
and
b = B*2^-k
Thus we can divide the integer part of a by the integer part of b, and
divide that by 2^-k to give us the correct result with a fixed binary point
at k. Dividing a number by 2^-k amounts to a left shift of k bits:
a/b = ((A/B)<<k)*2^-k
As with fixed point multiplication, this left shift may lose significant
bits from the answer. In this case however, any bits that we lose are
liable to be the most significant bits rather than the least significant
bits that we lose in multiplication.
= (10^x/2^x)*10^-x
= 5^x*10^-x
Exercises 3.3.3
X ⊕ Y ⊕ X = Y
X := X ⊕ Y    X = X0 ⊕ Y0
Y := X ⊕ Y    Y = (X0 ⊕ Y0) ⊕ Y0 = X0
X := X ⊕ Y    X = (X0 ⊕ Y0) ⊕ X0 = Y0
The instructions to do this are:
eor.l dO,dl
eor.l dl,dO
eor.l dO,dl
The execution time for these three instructions is the same as for the
three move instructions in the previous exercise.
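The three eor instructions translate directly into C. One caveat the book's register version does not hit: the trick breaks if the two operands are the same storage location, so the C sketch assumes distinct objects.

```c
#include <assert.h>
#include <stdint.h>

/* eor.l d0,d1 ; eor.l d1,d0 ; eor.l d0,d1 - swap without a
   temporary; x and y must not alias the same object */
void xor_swap(uint32_t *x, uint32_t *y)
{
    *y ^= *x;    /* y = X0 ^ Y0              */
    *x ^= *y;    /* x = X0 ^ (X0^Y0) = Y0    */
    *y ^= *x;    /* y = (X0^Y0) ^ Y0 = X0    */
}
```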
Exercises 4.2.3
1. u-v-w+x-y
=> [u v -] - w + x - y
=> [u v - w -] + x - y
=> [u v - w - x +] - y
=> [u v - w - x + y -]
The instructions which implement this sequence are:
move.w u,-(a7)
move.w v,d0
sub.w d0,(a7)       ;subtract v from u
move.w w,d0
sub.w d0,(a7)       ;subtract w from u-v
move.w x,d0
add.w d0,(a7)       ;add x to u-v-w
move.w y,d0
sub.w d0,(a7)       ;subtract y
2. a) (u+v)/(u-15)
=> [u v +] / [u 15 -]
=> [u v + u 15 - /]
move.w u,-(a7)
move.w v,d0
add.w d0,(a7)
bov overflow_error
move.w u,-(a7)
move.w #15,d0
sub.w d0,(a7)
bov overflow_error
beq divide_zero_error
move.w (a7)+,d1
move.w (a7)+,d0
ext.l d0
divs.w d1,d0
bov overflow_error
move.w d0,-(a7)
move.w u,d7
add.w v,d7
bov overflow_error
move.w u,d6
sub.w #15,d6
bov overflow_error
beq divide_zero_error
ext.l d7
divs.w d6,d7
bov overflow_error
[u 32 * u v / + w **]
We shall use registers d7, d6 and d5 to simulate an expression stack.
The basic code, without error checking, is:
move.w u,d7
move.w #32,d6
muls d6,d7          ;u 32 *
move.w u,d6
move.w v,d5
ext.l d6            ;extend dividend
divs d5,d6          ;u v /
add.w d6,d7         ;u 32 * u v / +
move.b w,d6         ;start exponentiation
move.l #1,d0        ;compute exp. into d0
cmp.b #0,d6         ;end of loop
beq.s @2
@1 muls d7,d0       ;multiply
sub.w #1,d6
bne.s @1
@2 move.l d0,d7     ;store final answer
move.w u,d7
move.w #32,d6
muls d6,d7          ;u 32 *
bov overflow_error
move.w u,d6
move.w v,d5
beq zero_divide     ;divisor must be non-zero
ext.l d6            ;extend dividend
divs d5,d6          ;u v /
bov overflow_error
add.w d6,d7         ;u 32 * u v / +
bov overflow_error
move.b w,d6         ;start exponentiation
move.l #1,d0        ;compute exp. into d0
cmp.b #0,d6         ;end of loop
beq.s @2
@1 muls d7,d0       ;multiply
bov overflow_error
sub.w #1,d6
bne.s @1
@2 move.l d0,d7     ;store final answer
Exercises 5.1.4
1. We can implement
in three instructions:
move.l f6p,a0
move.l foop(a0),a1
move.l foop(a1),foop(a0)
d_entry = record
    mark: boolean;      { 1 byte }
    t: (a_tag,b_tag);   { 1 byte }
    n: ^d_entry         { 4 bytes }
end;
requires 6 bytes, but the second record requires up to two filler bytes:
e_entry = record
    mark: boolean;      { 1 byte }
                        { filler }
    n: ^e_entry;        { 4 bytes }
    t: (a_tag,b_tag);   { 1 byte }
                        { filler }
end;                    { 8 bytes }
f:file of d_entry;
280 C. Answers to selected exercises
then the data in the record file may be accessible from outside the
system: the program containing this definition may access data from
other programs or even other operating systems. These other
programs may be compiled using compilers which did not make the
same optimizations, therefore the program may not operate correctly
over the data file.
Perhaps an appropriate solution would be for the Pascal compiler to
inform the programmer that a small reorganization of the record
would yield the improvement: this leaves the actual decision to the
programmer.
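The same two layouts can be written in C and measured with sizeof. The exact sizes depend on the machine - the book's 6 and 8 bytes assume the 68000's 4-byte pointers and even alignment, while a modern 64-bit compiler will report larger figures - but the badly ordered record is never smaller than the well-packed one:

```c
#include <assert.h>
#include <stddef.h>

struct d_entry {             /* 1 + 1 + pointer: fields well packed */
    char mark;
    char t;
    struct d_entry *n;
};

struct e_entry {             /* filler needed before n and after t  */
    char mark;
    struct e_entry *n;
    char t;
};
```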
Exercises 5.2.3
1. The assignment:
move.w x,d0
cmp.w #1,d0         ;1<=x?
blt array_error
cmp.w #5,d0         ;x<=5?
bgt array_error
move.l jjp,a0
sub.w #1,d0         ;can't use offset
mulu #42,d0         ;size of a jamjar
lea jar(a0,d0.w),a0
move.w y,d0
cmp.w #1,d0         ;1<=y?
blt array_error
cmp.w #10,d0        ;y<=10?
bgt array_error
add.w d0,d0         ;*2
move.w jjn,-2(a0,d0.w)   ; ... := jjn
6*B_I
where B_I is the number of non-zero bits in I. For the code to be faster
than the general purpose mulu instruction, this must be less than 29,
i.e.:
6*B_I < 29, or B_I < 4.83
If there are more than four significant bits in I then we are better to
use the mulu instruction. Note that this restriction does not refer to
the size of I; if we are multiplying by 32 then there is only one non-
zero bit in I, but if we want to multiply by 31 then there are five non-
zero bits in I and it would be better to use mulu.
Access to the element of the array involves accessing this table as well
as the array itself.
[Figure: the descriptor table for the array bi - it holds the base address of bi and the length of bi[0]; element bi[j] lies at offset j*length of bi[0], up to 20*length of bi[0]]
Exercises 6.2.3
i_set:=i_set+[I];
move.w I,d0
ext.l d0            ;convert to long
bfset i_set{d0:1}
This sets a bit field - of width 1 - to 1's. We need to convert the word
length index I to a long value because the bfset instruction uses a
long value to specify the bit field's offset. We can also set a sub-range
of the set in one instruction:
i_set:=i_set+[I .. I+4];
becomes
move.w I,d0
ext.l d0            ;convert to long
bfset i_set{d0:5}
I ⊆ J ⇔ I ∩ J = I
So, in 68000 instructions, when testing a large set such as i_set, each
segment of the test becomes:
lea i_set,a0
lea j_set,a1
move.w #31,d1
@1 move.l (a0),d0   ;I fragment
and.l (a1)+,d0      ;I n J
cmp.l (a0)+,d0      ;=?
bne not_subset
dbra d1,@1
;yes, i_set<=j_set
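The test the dbra loop performs - I is a subset of J exactly when I AND J equals I, checked one 32-bit fragment at a time - is easily stated in C (the function name is invented):

```c
#include <assert.h>
#include <stdint.h>

/* subset test over a large bit set stored as nwords fragments */
int is_subset(const uint32_t *i_set, const uint32_t *j_set, int nwords)
{
    for (int k = 0; k < nwords; k++)
        if ((i_set[k] & j_set[k]) != i_set[k])
            return 0;        /* some element of I is missing from J */
    return 1;
}
```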
move.w y,-(a7)
move.w #1,-(a7)
move.w (a7)+,d0
add.w (a7)+,d0
bov overflow_line_xxx
muls x,d0
bov overflow_line_xxx
cmp.w #0,d0         ;range check
blt range_error_xxx
cmp.w #1023,d0
bgt range_error_xxx
move.w d0,d1
lsr.w #3,d1         ;compute byte offset
bset d0,0(a1,d1)
Exercises 7 .1.5
bra @1
@0 lea ai,a0
move.w i,d0
add.w d0,d0
move.w -2(a0,d0.w),i
@1 move.w i,d0
cmp.w j,d0          ;i<j?
blt @2
move.w j,d0
@2 add.w d0,d0      ;d0 = i or j
lea ai,a0
cmp.w #10,-2(a0,d0.w)
blt @0
Index